From 1799d8abeabc68ec05679292aaf6cba93b343c05 Mon Sep 17 00:00:00 2001 From: Jiayuan Chen Date: Tue, 27 Jan 2026 19:38:44 +0800 Subject: [PATCH 001/359] xfrm6: fix uninitialized saddr in xfrm6_get_saddr() xfrm6_get_saddr() does not check the return value of ipv6_dev_get_saddr(). When ipv6_dev_get_saddr() fails to find a suitable source address (returns -EADDRNOTAVAIL), saddr->in6 is left uninitialized, but xfrm6_get_saddr() still returns 0 (success). This causes the caller xfrm_tmpl_resolve_one() to use the uninitialized address in xfrm_state_find(), triggering KMSAN warning: ===================================================== BUG: KMSAN: uninit-value in xfrm_state_find+0x2424/0xa940 xfrm_state_find+0x2424/0xa940 xfrm_resolve_and_create_bundle+0x906/0x5a20 xfrm_lookup_with_ifid+0xcc0/0x3770 xfrm_lookup_route+0x63/0x2b0 ip_route_output_flow+0x1ce/0x270 udp_sendmsg+0x2ce1/0x3400 inet_sendmsg+0x1ef/0x2a0 __sock_sendmsg+0x278/0x3d0 __sys_sendto+0x593/0x720 __x64_sys_sendto+0x130/0x200 x64_sys_call+0x332b/0x3e70 do_syscall_64+0xd3/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable tmp.i.i created at: xfrm_resolve_and_create_bundle+0x3e3/0x5a20 xfrm_lookup_with_ifid+0xcc0/0x3770 ===================================================== Fix by checking the return value of ipv6_dev_get_saddr() and propagating the error. Fixes: a1e59abf8249 ("[XFRM]: Fix wildcard as tunnel source") Reported-by: syzbot+e136d86d34b42399a8b1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68bf1024.a70a0220.7a912.02c2.GAE@google.com/T/ Signed-off-by: Jiayuan Chen Signed-off-by: Jiayuan Chen Reviewed-by: Simon Horman Signed-off-by: Steffen Klassert --- net/ipv6/xfrm6_policy.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 1f19b6f14484..125ea9a5b8a0 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -57,6 +57,7 @@ static int xfrm6_get_saddr(xfrm_address_t *saddr, struct dst_entry *dst; struct net_device *dev; struct inet6_dev *idev; + int err; dst = xfrm6_dst_lookup(params); if (IS_ERR(dst)) @@ -68,9 +69,11 @@ static int xfrm6_get_saddr(xfrm_address_t *saddr, return -EHOSTUNREACH; } dev = idev->dev; - ipv6_dev_get_saddr(dev_net(dev), dev, ¶ms->daddr->in6, 0, - &saddr->in6); + err = ipv6_dev_get_saddr(dev_net(dev), dev, ¶ms->daddr->in6, 0, + &saddr->in6); dst_release(dst); + if (err) + return -EHOSTUNREACH; return 0; } From 0a4524bc69882a4ddb235bb6b279597721bda197 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 27 Jan 2026 14:49:23 +0200 Subject: [PATCH 002/359] xfrm: skip templates check for packet offload tunnel mode In packet offload, hardware is responsible to check templates. The result of its operation is forwarded through secpath by relevant drivers. That secpath is actually removed in __xfrm_policy_check2(). In case packet is forwarded, this secpath is reset in RX, but pushed again to TX where policy is rechecked again against dummy secpath in xfrm_policy_ok(). Such situation causes to unexpected XfrmInTmplMismatch increase. As a solution, simply skip template mismatch check. Fixes: 600258d555f0 ("xfrm: delete intermediate secpath entry in packet offload mode") Signed-off-by: Leon Romanovsky Reviewed-by: Jianbo Liu Reviewed-by: Cosmin Ratiu Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_policy.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 62486f866975..5428185196a1 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3801,8 +3801,8 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, struct xfrm_tmpl *tp[XFRM_MAX_DEPTH]; struct xfrm_tmpl *stp[XFRM_MAX_DEPTH]; struct xfrm_tmpl **tpp = tp; + int i, k = 0; int ti = 0; - int i, k; sp = skb_sec_path(skb); if (!sp) @@ -3828,6 +3828,12 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, tpp = stp; } + if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET && sp == &dummy) + /* This policy template was already checked by HW + * and secpath was removed in __xfrm_policy_check2. + */ + goto out; + /* For each tunnel xfrm, find the first matching tmpl. * For each tmpl before that, find corresponding xfrm. * Order is _important_. Later we will implement @@ -3837,7 +3843,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, * verified to allow them to be skipped in future policy * checks (e.g. nested tunnels). */ - for (i = xfrm_nr-1, k = 0; i >= 0; i--) { + for (i = xfrm_nr - 1; i >= 0; i--) { k = xfrm_policy_ok(tpp[i], sp, k, family, if_id); if (k < 0) { if (k < -1) @@ -3853,6 +3859,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, goto reject; } +out: xfrm_pols_put(pols, npols); sp->verified_cnt = k; From 594c11d0e1d445f580898a2b8c850f2e3f099368 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Tue, 27 Jan 2026 07:22:35 -0600 Subject: [PATCH 003/359] ipmi: Fix use-after-free and list corruption on sender error The analysis from Breno: When the SMI sender returns an error, smi_work() delivers an error response but then jumps back to restart without cleaning up properly: 1. intf->curr_msg is not cleared, so no new message is pulled 2. newmsg still points to the message, causing sender() to be called again with the same message 3. If sender() fails again, deliver_err_response() is called with the same recv_msg that was already queued for delivery This causes list_add corruption ("list_add double add") because the recv_msg is added to the user_msgs list twice. Subsequently, the corrupted list leads to use-after-free when the memory is freed and reused, and eventually a NULL pointer dereference when accessing recv_msg->done. The buggy sequence: sender() fails -> deliver_err_response(recv_msg) // recv_msg queued for delivery -> goto restart // curr_msg not cleared! sender() fails again (same message!) -> deliver_err_response(recv_msg) // tries to queue same recv_msg -> LIST CORRUPTION Fix this by freeing the message and setting it to NULL on a send error. Also, always free the newmsg on a send error, otherwise it will leak. Reported-by: Breno Leitao Closes: https://lore.kernel.org/lkml/20260127-ipmi-v1-0-ba5cc90f516f@debian.org/ Fixes: 9cf93a8fa9513 ("ipmi: Allow an SMI sender to return an error") Cc: stable@vger.kernel.org # 4.18 Reviewed-by: Breno Leitao Signed-off-by: Corey Minyard --- drivers/char/ipmi/ipmi_msghandler.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 3f48fc6ab596..a590a67294e2 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -4852,8 +4852,15 @@ restart: if (newmsg->recv_msg) deliver_err_response(intf, newmsg->recv_msg, cc); - else - ipmi_free_smi_msg(newmsg); + if (!run_to_completion) + spin_lock_irqsave(&intf->xmit_msgs_lock, + flags); + intf->curr_msg = NULL; + if (!run_to_completion) + spin_unlock_irqrestore(&intf->xmit_msgs_lock, + flags); + ipmi_free_smi_msg(newmsg); + newmsg = NULL; goto restart; } } From 1d90e6c1a56f6ab83e5c9d30ded19e7ac8155713 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Tue, 27 Jan 2026 07:35:02 -0600 Subject: [PATCH 004/359] ipmi: Consolidate the run to completion checking for xmit msgs lock It made things hard to read, move the check to a function. Signed-off-by: Corey Minyard Reviewed-by: Breno Leitao --- drivers/char/ipmi/ipmi_msghandler.c | 42 ++++++++++++++++------------- 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index a590a67294e2..a042b1596933 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -602,6 +602,22 @@ static int __ipmi_bmc_register(struct ipmi_smi *intf, static int __scan_channels(struct ipmi_smi *intf, struct ipmi_device_id *id, bool rescan); +static void ipmi_lock_xmit_msgs(struct ipmi_smi *intf, int run_to_completion, + unsigned long *flags) +{ + if (run_to_completion) + return; + spin_lock_irqsave(&intf->xmit_msgs_lock, *flags); +} + +static void ipmi_unlock_xmit_msgs(struct ipmi_smi *intf, int run_to_completion, + unsigned long *flags) +{ + if (run_to_completion) + return; + spin_unlock_irqrestore(&intf->xmit_msgs_lock, *flags); +} + static void free_ipmi_user(struct kref *ref) { struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount); @@ -1878,11 +1894,9 @@ static void smi_send(struct ipmi_smi *intf, int run_to_completion = READ_ONCE(intf->run_to_completion); unsigned long flags = 0; - if (!run_to_completion) - spin_lock_irqsave(&intf->xmit_msgs_lock, flags); + ipmi_lock_xmit_msgs(intf, run_to_completion, &flags); smi_msg = smi_add_send_msg(intf, smi_msg, priority); - if (!run_to_completion) - spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags); + ipmi_unlock_xmit_msgs(intf, run_to_completion, &flags); if (smi_msg) handlers->sender(intf->send_info, smi_msg); @@ -4826,8 +4840,7 @@ static void smi_work(struct work_struct *t) * message delivery. */ restart: - if (!run_to_completion) - spin_lock_irqsave(&intf->xmit_msgs_lock, flags); + ipmi_lock_xmit_msgs(intf, run_to_completion, &flags); if (intf->curr_msg == NULL && !intf->in_shutdown) { struct list_head *entry = NULL; @@ -4843,8 +4856,7 @@ restart: intf->curr_msg = newmsg; } } - if (!run_to_completion) - spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags); + ipmi_unlock_xmit_msgs(intf, run_to_completion, &flags); if (newmsg) { cc = intf->handlers->sender(intf->send_info, newmsg); @@ -4852,13 +4864,9 @@ restart: if (newmsg->recv_msg) deliver_err_response(intf, newmsg->recv_msg, cc); - if (!run_to_completion) - spin_lock_irqsave(&intf->xmit_msgs_lock, - flags); + ipmi_lock_xmit_msgs(intf, run_to_completion, &flags); intf->curr_msg = NULL; - if (!run_to_completion) - spin_unlock_irqrestore(&intf->xmit_msgs_lock, - flags); + ipmi_unlock_xmit_msgs(intf, run_to_completion, &flags); ipmi_free_smi_msg(newmsg); newmsg = NULL; goto restart; @@ -4928,16 +4936,14 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf, spin_unlock_irqrestore(&intf->waiting_rcv_msgs_lock, flags); - if (!run_to_completion) - spin_lock_irqsave(&intf->xmit_msgs_lock, flags); + ipmi_lock_xmit_msgs(intf, run_to_completion, &flags); /* * We can get an asynchronous event or receive message in addition * to commands we send. */ if (msg == intf->curr_msg) intf->curr_msg = NULL; - if (!run_to_completion) - spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags); + ipmi_unlock_xmit_msgs(intf, run_to_completion, &flags); if (run_to_completion) smi_work(&intf->smi_work); From 9f235ccecd03c436cb1683eac16b12f119e54aa9 Mon Sep 17 00:00:00 2001 From: Matt Johnston Date: Tue, 13 Jan 2026 17:41:34 +0800 Subject: [PATCH 005/359] ipmi: ipmb: initialise event handler read bytes IPMB doesn't use i2c reads, but the handler needs to set a value. Otherwise an i2c read will return an uninitialised value from the bus driver. Fixes: 63c4eb347164 ("ipmi:ipmb: Add initial support for IPMI over IPMB") Signed-off-by: Matt Johnston Message-ID: <20260113-ipmb-read-init-v1-1-a9cbce7b94e3@codeconstruct.com.au> Signed-off-by: Corey Minyard --- drivers/char/ipmi/ipmi_ipmb.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/char/ipmi/ipmi_ipmb.c b/drivers/char/ipmi/ipmi_ipmb.c index 3a51e58b2487..28818952a7a4 100644 --- a/drivers/char/ipmi/ipmi_ipmb.c +++ b/drivers/char/ipmi/ipmi_ipmb.c @@ -202,11 +202,16 @@ static int ipmi_ipmb_slave_cb(struct i2c_client *client, break; case I2C_SLAVE_READ_REQUESTED: + *val = 0xff; + ipmi_ipmb_check_msg_done(iidev); + break; + case I2C_SLAVE_STOP: ipmi_ipmb_check_msg_done(iidev); break; case I2C_SLAVE_READ_PROCESSED: + *val = 0xff; break; } From 6b157b408d0c7d125e4d7c62e11e7d9376a5d150 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Fri, 16 Jan 2026 17:22:01 -0600 Subject: [PATCH 006/359] ipmi:ls2k: Make ipmi_ls2k_platform_driver static No need for it to be global. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202601170753.3zDBerGP-lkp@intel.com/ Signed-off-by: Corey Minyard --- drivers/char/ipmi/ipmi_si_ls2k.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_ls2k.c b/drivers/char/ipmi/ipmi_si_ls2k.c index 45442c257efd..4c1da80f256c 100644 --- a/drivers/char/ipmi/ipmi_si_ls2k.c +++ b/drivers/char/ipmi/ipmi_si_ls2k.c @@ -168,7 +168,7 @@ static void ipmi_ls2k_remove(struct platform_device *pdev) ipmi_si_remove_by_dev(&pdev->dev); } -struct platform_driver ipmi_ls2k_platform_driver = { +static struct platform_driver ipmi_ls2k_platform_driver = { .driver = { .name = "ls2k-ipmi-si", }, From 211ecfaaef186ee5230a77d054cdec7fbfc6724a Mon Sep 17 00:00:00 2001 From: Brad Spengler Date: Wed, 7 Jan 2026 12:12:36 -0500 Subject: [PATCH 007/359] drm/vmwgfx: Fix invalid kref_put callback in vmw_bo_dirty_release The kref_put() call uses (void *)kvfree as the release callback, which is incorrect. kref_put() expects a function with signature void (*release)(struct kref *), but kvfree has signature void (*)(const void *). Calling through an incompatible function pointer is undefined behavior. The code only worked by accident because ref_count is the first member of vmw_bo_dirty, making the kref pointer equal to the struct pointer. Fix this by adding a proper release callback that uses container_of() to retrieve the containing structure before freeing. Fixes: c1962742ffff ("drm/vmwgfx: Use kref in vmw_bo_dirty") Signed-off-by: Brad Spengler Signed-off-by: Zack Rusin Cc: Ian Forbes Link: https://patch.msgid.link/20260107171236.3573118-1-zack.rusin@broadcom.com --- drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c b/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c index fd4e76486f2d..45561bc1c9ef 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c @@ -260,6 +260,13 @@ out_no_dirty: return ret; } +static void vmw_bo_dirty_free(struct kref *kref) +{ + struct vmw_bo_dirty *dirty = container_of(kref, struct vmw_bo_dirty, ref_count); + + kvfree(dirty); +} + /** * vmw_bo_dirty_release - Release a dirty-tracking user from a buffer object * @vbo: The buffer object @@ -274,7 +281,7 @@ void vmw_bo_dirty_release(struct vmw_bo *vbo) { struct vmw_bo_dirty *dirty = vbo->dirty; - if (dirty && kref_put(&dirty->ref_count, (void *)kvfree)) + if (dirty && kref_put(&dirty->ref_count, vmw_bo_dirty_free)) vbo->dirty = NULL; } From 922f9dec5d19df4cfbb7070275e5c131d10c80f3 Mon Sep 17 00:00:00 2001 From: Ian Forbes Date: Fri, 9 Jan 2026 09:51:39 -0600 Subject: [PATCH 008/359] drm/vmwgfx: Set a unique ID for each submitted command buffer These IDs are logged by the Hypervisor when debug logging is enabled. Having the IDs in the log makes it much easier to see when command buffers start and finish. They can also be used by logging/tracing in the Guest to help correlate between Guest and Hypervisor logs. Signed-off-by: Ian Forbes Signed-off-by: Zack Rusin Link: https://patch.msgid.link/20260109155139.3259493-1-ian.forbes@broadcom.com --- drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c index 94e8982f5616..1ee37690b940 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c @@ -105,6 +105,7 @@ struct vmw_cmdbuf_context { * @handle: DMA address handle for the command buffer space if @using_mob is * false. Immutable. * @size: The size of the command buffer space. Immutable. + * @id: Monotonically increasing ID of the last cmdbuf submitted. * @num_contexts: Number of contexts actually enabled. */ struct vmw_cmdbuf_man { @@ -132,6 +133,7 @@ struct vmw_cmdbuf_man { bool has_pool; dma_addr_t handle; size_t size; + u64 id; u32 num_contexts; }; @@ -303,6 +305,8 @@ static int vmw_cmdbuf_header_submit(struct vmw_cmdbuf_header *header) struct vmw_cmdbuf_man *man = header->man; u32 val; + header->cb_header->id = man->id++; + val = upper_32_bits(header->handle); vmw_write(man->dev_priv, SVGA_REG_COMMAND_HIGH, val); From 5023ca80f9589295cb60735016e39fc5cc714243 Mon Sep 17 00:00:00 2001 From: Ian Forbes Date: Tue, 13 Jan 2026 11:53:57 -0600 Subject: [PATCH 009/359] drm/vmwgfx: Return the correct value in vmw_translate_ptr functions Before the referenced fixes these functions used a lookup function that returned a pointer. This was changed to another lookup function that returned an error code with the pointer becoming an out parameter. The error path when the lookup failed was not changed to reflect this change and the code continued to return the PTR_ERR of the now uninitialized pointer. This could cause the vmw_translate_ptr functions to return success when they actually failed causing further uninitialized and OOB accesses. Reported-by: Kuzey Arda Bulut Fixes: a309c7194e8a ("drm/vmwgfx: Remove rcu locks from user resources") Signed-off-by: Ian Forbes Reviewed-by: Zack Rusin Signed-off-by: Zack Rusin Link: https://patch.msgid.link/20260113175357.129285-1-ian.forbes@broadcom.com --- drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c index 3057f8baa7d2..e1f18020170a 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c @@ -1143,7 +1143,7 @@ static int vmw_translate_mob_ptr(struct vmw_private *dev_priv, ret = vmw_user_bo_lookup(sw_context->filp, handle, &vmw_bo); if (ret != 0) { drm_dbg(&dev_priv->drm, "Could not find or use MOB buffer.\n"); - return PTR_ERR(vmw_bo); + return ret; } vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_MOB, VMW_BO_DOMAIN_MOB); ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo); @@ -1199,7 +1199,7 @@ static int vmw_translate_guest_ptr(struct vmw_private *dev_priv, ret = vmw_user_bo_lookup(sw_context->filp, handle, &vmw_bo); if (ret != 0) { drm_dbg(&dev_priv->drm, "Could not find or use GMR region.\n"); - return PTR_ERR(vmw_bo); + return ret; } vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM); From 52c9ee202edd21d0599ac3b5a6fe1da2a2f053e5 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Fri, 6 Feb 2026 09:59:32 -0600 Subject: [PATCH 010/359] ipmi:si: Handle waiting messages when BMC failure detected If a BMC failure is detected, the current message is returned with an error. However, if there was a waiting message, it would not be handled. Add a check for the waiting message after handling the current message. Suggested-by: Guenter Roeck Reported-by: Rafael J. Wysocki Closes: https://lore.kernel.org/linux-acpi/CAK8fFZ58fidGUCHi5WFX0uoTPzveUUDzT=k=AAm4yWo3bAuCFg@mail.gmail.com/ Fixes: bc3a9d217755 ("ipmi:si: Gracefully handle if the BMC is non-functional") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Corey Minyard --- drivers/char/ipmi/ipmi_si_intf.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 5459ffdde8dc..ff159b1162b9 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -809,6 +809,12 @@ restart: */ return_hosed_msg(smi_info, IPMI_BUS_ERR); } + if (smi_info->waiting_msg != NULL) { + /* Also handle if there was a message waiting. */ + smi_info->curr_msg = smi_info->waiting_msg; + smi_info->waiting_msg = NULL; + return_hosed_msg(smi_info, IPMI_BUS_ERR); + } smi_mod_timer(smi_info, jiffies + SI_TIMEOUT_HOSED); goto out; } From c3bb3295637cc9bf514f690941ca9a385bf30113 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Fri, 6 Feb 2026 10:33:52 -0600 Subject: [PATCH 011/359] ipmi:si: Use a long timeout when the BMC is misbehaving If the driver goes into HOSED state, don't reset the timeout to the short timeout in the timeout handler. Reported-by: Igor Raits Closes: https://lore.kernel.org/linux-acpi/CAK8fFZ58fidGUCHi5WFX0uoTPzveUUDzT=k=AAm4yWo3bAuCFg@mail.gmail.com/ Fixes: bc3a9d217755 ("ipmi:si: Gracefully handle if the BMC is non-functional") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Corey Minyard --- drivers/char/ipmi/ipmi_si_intf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index ff159b1162b9..0049e3792ba1 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -1119,7 +1119,9 @@ static void smi_timeout(struct timer_list *t) * SI_USEC_PER_JIFFY); smi_result = smi_event_handler(smi_info, time_diff); - if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) { + if (smi_info->si_state == SI_HOSED) { + timeout = jiffies + SI_TIMEOUT_HOSED; + } else if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) { /* Running with interrupts, only do long timeouts. */ timeout = jiffies + SI_TIMEOUT_JIFFIES; smi_inc_stat(smi_info, long_timeouts); From 4efa91a28576054aae0e6dad9cba8fed8293aef8 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Fri, 30 Jan 2026 19:42:47 +0900 Subject: [PATCH 012/359] xfrm: always flush state and policy upon NETDEV_UNREGISTER event syzbot is reporting that "struct xfrm_state" refcount is leaking. unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2 ref_tracker: netdev@ffff888052f24618 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline] netdev_tracker_alloc include/linux/netdevice.h:4412 [inline] xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316 xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline] xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022 xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550 xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2646 __sys_sendmsg+0x16d/0x220 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") implemented xfrm_dev_unregister() as no-op despite xfrm_dev_state_add() from xfrm_state_construct() acquires a reference to "struct net_device". I guess that that commit expected that NETDEV_DOWN event is fired before NETDEV_UNREGISTER event fires, and also assumed that xfrm_dev_state_add() is called only if (dev->features & NETIF_F_HW_ESP) != 0. Sabrina Dubroca identified steps to reproduce the same symptoms as below. echo 0 > /sys/bus/netdevsim/new_device dev=$(ls -1 /sys/bus/netdevsim/devices/netdevsim0/net/) ip xfrm state add src 192.168.13.1 dst 192.168.13.2 proto esp \ spi 0x1000 mode tunnel aead 'rfc4106(gcm(aes))' $key 128 \ offload crypto dev $dev dir out ethtool -K $dev esp-hw-offload off echo 0 > /sys/bus/netdevsim/del_device Like these steps indicate, the NETIF_F_HW_ESP bit can be cleared after xfrm_dev_state_add() acquired a reference to "struct net_device". Also, xfrm_dev_state_add() does not check for the NETIF_F_HW_ESP bit when acquiring a reference to "struct net_device". Commit 03891f820c21 ("xfrm: handle NETDEV_UNREGISTER for xfrm device") re-introduced the NETDEV_UNREGISTER event to xfrm_dev_event(), but that commit for unknown reason chose to share xfrm_dev_down() between the NETDEV_DOWN event and the NETDEV_UNREGISTER event. I guess that that commit missed the behavior in the previous paragraph. Therefore, we need to re-introduce xfrm_dev_unregister() in order to release the reference to "struct net_device" by unconditionally flushing state and policy. Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Cc: Sabrina Dubroca Signed-off-by: Tetsuo Handa Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_device.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 52ae0e034d29..550457e4c4f0 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -544,6 +544,14 @@ static int xfrm_dev_down(struct net_device *dev) return NOTIFY_DONE; } +static int xfrm_dev_unregister(struct net_device *dev) +{ + xfrm_dev_state_flush(dev_net(dev), dev, true); + xfrm_dev_policy_flush(dev_net(dev), dev, true); + + return NOTIFY_DONE; +} + static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); @@ -556,8 +564,10 @@ static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void return xfrm_api_check(dev); case NETDEV_DOWN: - case NETDEV_UNREGISTER: return xfrm_dev_down(dev); + + case NETDEV_UNREGISTER: + return xfrm_dev_unregister(dev); } return NOTIFY_DONE; } From fd3634312a04f336dcbfb481060219f0cd320738 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 7 Feb 2026 14:27:05 +0100 Subject: [PATCH 013/359] debugobject: Make it work with deferred page initialization - again debugobjects uses __GFP_HIGH for allocations as it might be invoked within locked regions. That worked perfectly fine until v6.18. It still works correctly when deferred page initialization is disabled and works by chance when no page allocation is required before deferred page initialization has completed. Since v6.18 allocations w/o a reclaim flag cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(), which returns early when deferred page initialization has not yet completed. As the deferred page initialization takes quite a while the debugobject pool is depleted and debugobjects are disabled. This can be worked around when PREEMPT_COUNT is enabled as that allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when the context is preemtible. When PREEMPT_COUNT is disabled the context is unknown and the reclaim bit can't be set because the caller might hold locks which might deadlock in the allocator. In preemptible context the reclaim bit is harmless and not a performance issue as that's usually invoked from slow path initialization context. That makes debugobjects depend on PREEMPT_COUNT || !DEFERRED_STRUCT_PAGE_INIT. Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Signed-off-by: Thomas Gleixner Tested-by: Sebastian Andrzej Siewior Acked-by: Alexei Starovoitov Acked-by: Vlastimil Babka Link: https://patch.msgid.link/87pl6gznti.ffs@tglx --- lib/Kconfig.debug | 1 + lib/debugobjects.c | 19 ++++++++++++++++++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index ba36939fda79..a874146ad828 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -723,6 +723,7 @@ source "mm/Kconfig.debug" config DEBUG_OBJECTS bool "Debug object operations" + depends on PREEMPT_COUNT || !DEFERRED_STRUCT_PAGE_INIT depends on DEBUG_KERNEL help If you say Y here, additional code will be inserted into the diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 89a1d6745dc2..12f50de85b62 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -398,9 +398,26 @@ static void fill_pool(void) atomic_inc(&cpus_allocating); while (pool_should_refill(&pool_global)) { + gfp_t gfp = __GFP_HIGH | __GFP_NOWARN; HLIST_HEAD(head); - if (!kmem_alloc_batch(&head, obj_cache, __GFP_HIGH | __GFP_NOWARN)) + /* + * Allow reclaim only in preemptible context and during + * early boot. If not preemptible, the caller might hold + * locks causing a deadlock in the allocator. + * + * If the reclaim flag is not set during early boot then + * allocations, which happen before deferred page + * initialization has completed, will fail. + * + * In preemptible context the flag is harmless and not a + * performance issue as that's usually invoked from slow + * path initialization context. + */ + if (preemptible() || system_state < SYSTEM_SCHEDULING) + gfp |= __GFP_KSWAPD_RECLAIM; + + if (!kmem_alloc_batch(&head, obj_cache, gfp)) break; guard(raw_spinlock_irqsave)(&pool_lock); From fef0e649f8b42bdffe4a916dd46e1b1e9ad2f207 Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Fri, 30 Jan 2026 00:21:19 +0800 Subject: [PATCH 014/359] drm/logicvc: Fix device node reference leak in logicvc_drm_config_parse() The logicvc_drm_config_parse() function calls of_get_child_by_name() to find the "layers" node but fails to release the reference, leading to a device node reference leak. Fix this by using the __free(device_node) cleanup attribute to automatic release the reference when the variable goes out of scope. Fixes: efeeaefe9be5 ("drm: Add support for the LogiCVC display controller") Signed-off-by: Felix Gu Reviewed-by: Luca Ceresoli Reviewed-by: Kory Maincent Link: https://patch.msgid.link/20260130-logicvc_drm-v1-1-04366463750c@gmail.com Signed-off-by: Luca Ceresoli --- drivers/gpu/drm/logicvc/logicvc_drm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/logicvc/logicvc_drm.c b/drivers/gpu/drm/logicvc/logicvc_drm.c index 204b0fee55d0..bbebf4fc7f51 100644 --- a/drivers/gpu/drm/logicvc/logicvc_drm.c +++ b/drivers/gpu/drm/logicvc/logicvc_drm.c @@ -92,7 +92,6 @@ static int logicvc_drm_config_parse(struct logicvc_drm *logicvc) struct device *dev = drm_dev->dev; struct device_node *of_node = dev->of_node; struct logicvc_drm_config *config = &logicvc->config; - struct device_node *layers_node; int ret; logicvc_of_property_parse_bool(of_node, LOGICVC_OF_PROPERTY_DITHERING, @@ -128,7 +127,8 @@ static int logicvc_drm_config_parse(struct logicvc_drm *logicvc) if (ret) return ret; - layers_node = of_get_child_by_name(of_node, "layers"); + struct device_node *layers_node __free(device_node) = + of_get_child_by_name(of_node, "layers"); if (!layers_node) { drm_err(drm_dev, "Missing non-optional layers node\n"); return -EINVAL; From b777b5e09eabeefc6ba80f4296366a4742701103 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 10 Feb 2026 17:02:25 +0000 Subject: [PATCH 015/359] time/jiffies: Inline jiffies_to_msecs() and jiffies_to_usecs() For common cases (HZ=100, 250 or 1000), these helpers are at most one multiply, so there is no point calling a tiny function. Keep them out of line for HZ=300 and others. This saves cycles in TCP fast path, among other things. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/8 grow/shrink: 25/89 up/down: 530/-3474 (-2944) ... nla_put_msecs 193 - -193 message_stats_print 2131 920 -1211 Total: Before=25365208, After=25362264, chg -0.01% Signed-off-by: Eric Dumazet Signed-off-by: Thomas Gleixner Link: https://patch.msgid.link/20260210170226.57209-1-edumazet@google.com --- include/linux/jiffies.h | 40 ++++++++++++++++++++++++++++++++++++++-- kernel/time/time.c | 19 +++++++------------ 2 files changed, 45 insertions(+), 14 deletions(-) diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index fdef2c155c27..d1c3d4941854 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -434,8 +434,44 @@ extern unsigned long preset_lpj; /* * Convert various time units to each other: */ -extern unsigned int jiffies_to_msecs(const unsigned long j); -extern unsigned int jiffies_to_usecs(const unsigned long j); + +#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ) +/** + * jiffies_to_msecs - Convert jiffies to milliseconds + * @j: jiffies value + * + * This inline version takes care of HZ in {100,250,1000}. + * + * Return: milliseconds value + */ +static inline unsigned int jiffies_to_msecs(const unsigned long j) +{ + return (MSEC_PER_SEC / HZ) * j; +} +#else +unsigned int jiffies_to_msecs(const unsigned long j); +#endif + +#if !(USEC_PER_SEC % HZ) +/** + * jiffies_to_usecs - Convert jiffies to microseconds + * @j: jiffies value + * + * Return: microseconds value + */ +static inline unsigned int jiffies_to_usecs(const unsigned long j) +{ + /* + * Hz usually doesn't go much further MSEC_PER_SEC. + * jiffies_to_usecs() and usecs_to_jiffies() depend on that. + */ + BUILD_BUG_ON(HZ > USEC_PER_SEC); + + return (USEC_PER_SEC / HZ) * j; +} +#else +unsigned int jiffies_to_usecs(const unsigned long j); +#endif /** * jiffies_to_nsecs - Convert jiffies to nanoseconds diff --git a/kernel/time/time.c b/kernel/time/time.c index 0ba8e3c50d62..36fd2313ae7e 100644 --- a/kernel/time/time.c +++ b/kernel/time/time.c @@ -365,20 +365,16 @@ SYSCALL_DEFINE1(adjtimex_time32, struct old_timex32 __user *, utp) } #endif +#if HZ > MSEC_PER_SEC || (MSEC_PER_SEC % HZ) /** * jiffies_to_msecs - Convert jiffies to milliseconds * @j: jiffies value * - * Avoid unnecessary multiplications/divisions in the - * two most common HZ cases. - * * Return: milliseconds value */ unsigned int jiffies_to_msecs(const unsigned long j) { -#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ) - return (MSEC_PER_SEC / HZ) * j; -#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC) +#if HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC) return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC); #else # if BITS_PER_LONG == 32 @@ -390,7 +386,9 @@ unsigned int jiffies_to_msecs(const unsigned long j) #endif } EXPORT_SYMBOL(jiffies_to_msecs); +#endif +#if (USEC_PER_SEC % HZ) /** * jiffies_to_usecs - Convert jiffies to microseconds * @j: jiffies value @@ -405,17 +403,14 @@ unsigned int jiffies_to_usecs(const unsigned long j) */ BUILD_BUG_ON(HZ > USEC_PER_SEC); -#if !(USEC_PER_SEC % HZ) - return (USEC_PER_SEC / HZ) * j; -#else -# if BITS_PER_LONG == 32 +#if BITS_PER_LONG == 32 return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32; -# else +#else return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN; -# endif #endif } EXPORT_SYMBOL(jiffies_to_usecs); +#endif /** * mktime64 - Converts date to seconds. From f66857bafd4f151c5cc6856e47be2e12c1721e43 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Fri, 13 Feb 2026 14:38:12 +0000 Subject: [PATCH 016/359] KVM: arm64: Hide S1POE from guests when not supported by the host When CONFIG_ARM64_POE is disabled, KVM does not save/restore POR_EL1. However, ID_AA64MMFR3_EL1 sanitisation currently exposes the feature to guests whenever the hardware supports it, ignoring the host kernel configuration. If a guest detects this feature and attempts to use it, the host will fail to context-switch POR_EL1, potentially leading to state corruption. Fix this by masking ID_AA64MMFR3_EL1.S1POE in the sanitised system registers, preventing KVM from advertising the feature when the host does not support it (i.e. system_supports_poe() is false). Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") Signed-off-by: Fuad Tabba Link: https://patch.msgid.link/20260213143815.1732675-2-tabba@google.com Signed-off-by: Marc Zyngier --- arch/arm64/kvm/sys_regs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index a7cd0badc20c..1b4cacb6e918 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1816,6 +1816,9 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu, ID_AA64MMFR3_EL1_SCTLRX | ID_AA64MMFR3_EL1_S1POE | ID_AA64MMFR3_EL1_S1PIE; + + if (!system_supports_poe()) + val &= ~ID_AA64MMFR3_EL1_S1POE; break; case SYS_ID_MMFR4_EL1: val &= ~ID_MMFR4_EL1_CCIDX; From 9cb0468d0b335ccf769bd8e161cc96195e82d8b1 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Fri, 13 Feb 2026 14:38:13 +0000 Subject: [PATCH 017/359] KVM: arm64: Optimise away S1POE handling when not supported by host Although ID register sanitisation prevents guests from seeing the feature, adding this check to the helper allows the compiler to entirely eliminate S1POE-specific code paths (such as context switching POR_EL1) when the host kernel is compiled without support (CONFIG_ARM64_POE is disabled). This aligns with the pattern used for other optional features like SVE (kvm_has_sve()) and FPMR (kvm_has_fpmr()), ensuring no POE logic if the host lacks support, regardless of the guest configuration state. Signed-off-by: Fuad Tabba Link: https://patch.msgid.link/20260213143815.1732675-3-tabba@google.com Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_host.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 5d5a3bbdb95e..2ca264b3db5f 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1616,7 +1616,8 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val); (kvm_has_feat((k), ID_AA64MMFR3_EL1, S1PIE, IMP)) #define kvm_has_s1poe(k) \ - (kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP)) + (system_supports_poe() && \ + kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP)) #define kvm_has_ras(k) \ (kvm_has_feat((k), ID_AA64PFR0_EL1, RAS, IMP)) From 7e7c2cf0024d89443a7af52e09e47b1fe634ab17 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Fri, 13 Feb 2026 14:38:14 +0000 Subject: [PATCH 018/359] KVM: arm64: Fix ID register initialization for non-protected pKVM guests In protected mode, the hypervisor maintains a separate instance of the `kvm` structure for each VM. For non-protected VMs, this structure is initialized from the host's `kvm` state. Currently, `pkvm_init_features_from_host()` copies the `KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the underlying `id_regs` data being initialized. This results in the hypervisor seeing the flag as set while the ID registers remain zeroed. Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for non-protected VMs. This breaks logic that relies on feature detection, such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not saved/restored during the world switch, which could lead to state corruption. Fix this by explicitly copying the ID registers from the host `kvm` to the hypervisor `kvm` for non-protected VMs during initialization, since we trust the host with its non-protected guests' features. Also ensure `KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in `pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly initialize them and set the flag once done. Fixes: 41d6028e28bd ("KVM: arm64: Convert the SVE guest vcpu flag to a vm flag") Signed-off-by: Fuad Tabba Link: https://patch.msgid.link/20260213143815.1732675-4-tabba@google.com Signed-off-by: Marc Zyngier --- arch/arm64/kvm/hyp/nvhe/pkvm.c | 35 ++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c index 8e29d7734a15..f3c1c6955163 100644 --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c @@ -342,6 +342,7 @@ static void pkvm_init_features_from_host(struct pkvm_hyp_vm *hyp_vm, const struc /* No restrictions for non-protected VMs. */ if (!kvm_vm_is_protected(kvm)) { hyp_vm->kvm.arch.flags = host_arch_flags; + hyp_vm->kvm.arch.flags &= ~BIT_ULL(KVM_ARCH_FLAG_ID_REGS_INITIALIZED); bitmap_copy(kvm->arch.vcpu_features, host_kvm->arch.vcpu_features, @@ -471,6 +472,35 @@ err: return ret; } +static int vm_copy_id_regs(struct pkvm_hyp_vcpu *hyp_vcpu) +{ + struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu); + const struct kvm *host_kvm = hyp_vm->host_kvm; + struct kvm *kvm = &hyp_vm->kvm; + + if (!test_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &host_kvm->arch.flags)) + return -EINVAL; + + if (test_and_set_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags)) + return 0; + + memcpy(kvm->arch.id_regs, host_kvm->arch.id_regs, sizeof(kvm->arch.id_regs)); + + return 0; +} + +static int pkvm_vcpu_init_sysregs(struct pkvm_hyp_vcpu *hyp_vcpu) +{ + int ret = 0; + + if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) + kvm_init_pvm_id_regs(&hyp_vcpu->vcpu); + else + ret = vm_copy_id_regs(hyp_vcpu); + + return ret; +} + static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu, struct pkvm_hyp_vm *hyp_vm, struct kvm_vcpu *host_vcpu) @@ -490,8 +520,9 @@ static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu, hyp_vcpu->vcpu.arch.cflags = READ_ONCE(host_vcpu->arch.cflags); hyp_vcpu->vcpu.arch.mp_state.mp_state = KVM_MP_STATE_STOPPED; - if (pkvm_hyp_vcpu_is_protected(hyp_vcpu)) - kvm_init_pvm_id_regs(&hyp_vcpu->vcpu); + ret = pkvm_vcpu_init_sysregs(hyp_vcpu); + if (ret) + goto done; ret = pkvm_vcpu_init_traps(hyp_vcpu); if (ret) From 02471a78a052b631204aed051ab718e4d14ae687 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Fri, 13 Feb 2026 14:38:15 +0000 Subject: [PATCH 019/359] KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state() The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a hypervisor virtual address during vCPU initialization in `pkvm_vcpu_init_sve()`. `unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since `kern_hyp_va()` is idempotent, it's not a bug. However, it is unnecessary and potentially confusing. Remove the redundant conversion. Signed-off-by: Fuad Tabba Link: https://patch.msgid.link/20260213143815.1732675-5-tabba@google.com Signed-off-by: Marc Zyngier --- arch/arm64/kvm/hyp/nvhe/pkvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c index f3c1c6955163..2f029bfe4755 100644 --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c @@ -392,7 +392,7 @@ static void unpin_host_sve_state(struct pkvm_hyp_vcpu *hyp_vcpu) if (!vcpu_has_feature(&hyp_vcpu->vcpu, KVM_ARM_VCPU_SVE)) return; - sve_state = kern_hyp_va(hyp_vcpu->vcpu.arch.sve_state); + sve_state = hyp_vcpu->vcpu.arch.sve_state; hyp_unpin_shared_mem(sve_state, sve_state + vcpu_sve_state_size(&hyp_vcpu->vcpu)); } From ee5c38a8d31e5dea52299c43c2ec3213351ab6e1 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 6 Feb 2026 14:30:23 -0800 Subject: [PATCH 020/359] KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "struct gic_kvm_info", but the returned type, while matching, is const qualified. To get them exactly matching, just use the dereferenced pointer for the sizeof(). Signed-off-by: Kees Cook Link: https://patch.msgid.link/20260206223022.it.052-kees@kernel.org Signed-off-by: Marc Zyngier --- arch/arm64/kvm/vgic/vgic-init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c index 86c149537493..a53f93546aa0 100644 --- a/arch/arm64/kvm/vgic/vgic-init.c +++ b/arch/arm64/kvm/vgic/vgic-init.c @@ -654,7 +654,7 @@ static struct gic_kvm_info *gic_kvm_info; void __init vgic_set_kvm_info(const struct gic_kvm_info *info) { BUG_ON(gic_kvm_info != NULL); - gic_kvm_info = kmalloc(sizeof(*info), GFP_KERNEL); + gic_kvm_info = kmalloc(sizeof(*gic_kvm_info), GFP_KERNEL); if (gic_kvm_info) *gic_kvm_info = *info; } From 0b87d51690dd5131cbe9fbd23746b037aab89815 Mon Sep 17 00:00:00 2001 From: Franz Schnyder Date: Fri, 6 Feb 2026 13:37:36 +0100 Subject: [PATCH 021/359] drm/bridge: ti-sn65dsi86: Enable HPD polling if IRQ is not used Fallback to polling to detect hotplug events on systems without interrupts. On systems where the interrupt line of the bridge is not connected, the bridge cannot notify hotplug events. Only add the DRM_BRIDGE_OP_HPD flag if an interrupt has been registered otherwise remain in polling mode. Fixes: 55e8ff842051 ("drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector type") Cc: stable@vger.kernel.org # 6.16: 9133bc3f0564: drm/bridge: ti-sn65dsi86: Add Signed-off-by: Franz Schnyder Reviewed-by: Douglas Anderson [dianders: Adjusted Fixes/stable line based on discussion] Signed-off-by: Douglas Anderson Link: https://patch.msgid.link/20260206123758.374555-1-fra.schnyder@gmail.com --- drivers/gpu/drm/bridge/ti-sn65dsi86.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c b/drivers/gpu/drm/bridge/ti-sn65dsi86.c index 276d05d25ad8..98d64ad791d0 100644 --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c @@ -1415,6 +1415,7 @@ static int ti_sn_bridge_probe(struct auxiliary_device *adev, { struct ti_sn65dsi86 *pdata = dev_get_drvdata(adev->dev.parent); struct device_node *np = pdata->dev->of_node; + const struct i2c_client *client = to_i2c_client(pdata->dev); int ret; pdata->next_bridge = devm_drm_of_get_bridge(&adev->dev, np, 1, 0); @@ -1433,8 +1434,9 @@ static int ti_sn_bridge_probe(struct auxiliary_device *adev, ? DRM_MODE_CONNECTOR_DisplayPort : DRM_MODE_CONNECTOR_eDP; if (pdata->bridge.type == DRM_MODE_CONNECTOR_DisplayPort) { - pdata->bridge.ops = DRM_BRIDGE_OP_EDID | DRM_BRIDGE_OP_DETECT | - DRM_BRIDGE_OP_HPD; + pdata->bridge.ops = DRM_BRIDGE_OP_EDID | DRM_BRIDGE_OP_DETECT; + if (client->irq) + pdata->bridge.ops |= DRM_BRIDGE_OP_HPD; /* * If comms were already enabled they would have been enabled * with the wrong value of HPD_DISABLE. Update it now. Comms From e9e0b48cd15b46dcb2bbc165f6b0fee698b855d6 Mon Sep 17 00:00:00 2001 From: Simon Ser Date: Sun, 8 Feb 2026 22:47:26 +0000 Subject: [PATCH 022/359] drm/fourcc: fix plane order for 10/12/16-bit YCbCr formats The short comments had the correct order, but the long comments had the planes reversed. Fixes: 2271e0a20ef7 ("drm: drm_fourcc: add 10/12/16bit software decoder YCbCr formats") Signed-off-by: Simon Ser Reviewed-by: Daniel Stone Reviewed-by: Robert Mader Link: https://patch.msgid.link/20260208224718.57199-1-contact@emersion.fr --- include/uapi/drm/drm_fourcc.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index e527b24bd824..c89aede3cb12 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -401,8 +401,8 @@ extern "C" { * implementation can multiply the values by 2^6=64. For that reason the padding * must only contain zeros. * index 0 = Y plane, [15:0] z:Y [6:10] little endian - * index 1 = Cr plane, [15:0] z:Cr [6:10] little endian - * index 2 = Cb plane, [15:0] z:Cb [6:10] little endian + * index 1 = Cb plane, [15:0] z:Cb [6:10] little endian + * index 2 = Cr plane, [15:0] z:Cr [6:10] little endian */ #define DRM_FORMAT_S010 fourcc_code('S', '0', '1', '0') /* 2x2 subsampled Cb (1) and Cr (2) planes 10 bits per channel */ #define DRM_FORMAT_S210 fourcc_code('S', '2', '1', '0') /* 2x1 subsampled Cb (1) and Cr (2) planes 10 bits per channel */ @@ -414,8 +414,8 @@ extern "C" { * implementation can multiply the values by 2^4=16. For that reason the padding * must only contain zeros. * index 0 = Y plane, [15:0] z:Y [4:12] little endian - * index 1 = Cr plane, [15:0] z:Cr [4:12] little endian - * index 2 = Cb plane, [15:0] z:Cb [4:12] little endian + * index 1 = Cb plane, [15:0] z:Cb [4:12] little endian + * index 2 = Cr plane, [15:0] z:Cr [4:12] little endian */ #define DRM_FORMAT_S012 fourcc_code('S', '0', '1', '2') /* 2x2 subsampled Cb (1) and Cr (2) planes 12 bits per channel */ #define DRM_FORMAT_S212 fourcc_code('S', '2', '1', '2') /* 2x1 subsampled Cb (1) and Cr (2) planes 12 bits per channel */ @@ -424,8 +424,8 @@ extern "C" { /* * 3 plane YCbCr * index 0 = Y plane, [15:0] Y little endian - * index 1 = Cr plane, [15:0] Cr little endian - * index 2 = Cb plane, [15:0] Cb little endian + * index 1 = Cb plane, [15:0] Cb little endian + * index 2 = Cr plane, [15:0] Cr little endian */ #define DRM_FORMAT_S016 fourcc_code('S', '0', '1', '6') /* 2x2 subsampled Cb (1) and Cr (2) planes 16 bits per channel */ #define DRM_FORMAT_S216 fourcc_code('S', '2', '1', '6') /* 2x1 subsampled Cb (1) and Cr (2) planes 16 bits per channel */ From cb184dd19154fc486fa3d9e02afe70a97e54e055 Mon Sep 17 00:00:00 2001 From: Edward Adam Davis Date: Fri, 6 Feb 2026 14:20:28 +0800 Subject: [PATCH 023/359] fs: init flags_valid before calling vfs_fileattr_get syzbot reported a uninit-value bug in [1]. Similar to the "*get" context where the kernel's internal file_kattr structure is initialized before calling vfs_fileattr_get(), we should use the same mechanism when using fa. [1] BUG: KMSAN: uninit-value in fuse_fileattr_get+0xeb4/0x1450 fs/fuse/ioctl.c:517 fuse_fileattr_get+0xeb4/0x1450 fs/fuse/ioctl.c:517 vfs_fileattr_get fs/file_attr.c:94 [inline] __do_sys_file_getattr fs/file_attr.c:416 [inline] Local variable fa.i created at: __do_sys_file_getattr fs/file_attr.c:380 [inline] __se_sys_file_getattr+0x8c/0xbd0 fs/file_attr.c:372 Reported-by: syzbot+7c31755f2cea07838b0c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7c31755f2cea07838b0c Tested-by: syzbot+7c31755f2cea07838b0c@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis Link: https://patch.msgid.link/tencent_B6C4583771D76766D71362A368696EC3B605@qq.com Signed-off-by: Christian Brauner --- fs/file_attr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/file_attr.c b/fs/file_attr.c index 42721427245a..0e9b4f737546 100644 --- a/fs/file_attr.c +++ b/fs/file_attr.c @@ -376,7 +376,7 @@ SYSCALL_DEFINE5(file_getattr, int, dfd, const char __user *, filename, struct path filepath __free(path_put) = {}; unsigned int lookup_flags = 0; struct file_attr fattr; - struct file_kattr fa; + struct file_kattr fa = { .flags_valid = true }; /* hint only */ int error; BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0); From 9eed043d10f17301c1b5141e16bb98a85a8fd07e Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 3 Feb 2026 17:40:14 +0800 Subject: [PATCH 024/359] writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK Recent changes of fs-writeback cause such warnings if DETECT_HUNG_TASK is not enabled: INFO: The task sync:1342 has been waiting for writeback completion for more than 1 seconds. The reason is sysctl_hung_task_timeout_secs is 0 when DETECT_HUNG_TASK is not enabled, then it causes the warning message even if the writeback lasts for only one second. Guard the wakeup and logging with "#ifdef CONFIG_DETECT_HUNG_TASK" can eliminate the warning messages. But on the other hand, it is possible that sysctl_hung_task_timeout_secs be also 0 when DETECT_HUNG_TASK is enabled. So let's just check the value of sysctl_hung_task_timeout_secs to decide whether do wakeup and logging. Fixes: 1888635532fb ("writeback: Wake up waiting tasks when finishing the writeback of a chunk.") Fixes: d6e621590764 ("writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs)") Signed-off-by: Huacai Chen Link: https://patch.msgid.link/20260203094014.2273240-1-chenhuacai@loongson.cn Reviewed-by: Jan Kara Signed-off-by: Christian Brauner --- fs/fs-writeback.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 68228bf89b82..eb1ca86bf30a 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -198,10 +198,11 @@ static void wb_queue_work(struct bdi_writeback *wb, static bool wb_wait_for_completion_cb(struct wb_completion *done) { + unsigned long timeout = sysctl_hung_task_timeout_secs; unsigned long waited_secs = (jiffies - done->wait_start) / HZ; done->progress_stamp = jiffies; - if (waited_secs > sysctl_hung_task_timeout_secs) + if (timeout && (waited_secs > timeout)) pr_info("INFO: The task %s:%d has been waiting for writeback " "completion for more than %lu seconds.", current->comm, current->pid, waited_secs); @@ -1955,6 +1956,7 @@ static long writeback_sb_inodes(struct super_block *sb, .range_end = LLONG_MAX, }; unsigned long start_time = jiffies; + unsigned long timeout = sysctl_hung_task_timeout_secs; long write_chunk; long total_wrote = 0; /* count both pages and inodes */ unsigned long dirtied_before = jiffies; @@ -2041,9 +2043,8 @@ static long writeback_sb_inodes(struct super_block *sb, __writeback_single_inode(inode, &wbc); /* Report progress to inform the hung task detector of the progress. */ - if (work->done && work->done->progress_stamp && - (jiffies - work->done->progress_stamp) > HZ * - sysctl_hung_task_timeout_secs / 2) + if (work->done && work->done->progress_stamp && timeout && + (jiffies - work->done->progress_stamp) > HZ * timeout / 2) wake_up_all(work->done->waitq); wbc_detach_inode(&wbc); From 81f16c9778d730f573d0d565706bb7227e2405f4 Mon Sep 17 00:00:00 2001 From: Qing Wang Date: Fri, 13 Feb 2026 18:30:06 +0800 Subject: [PATCH 025/359] statmount: Fix the null-ptr-deref in do_statmount() If the mount is internal, it's mnt_ns will be MNT_NS_INTERNAL, which is defined as ERR_PTR(-EINVAL). So, in the do_statmount(), need to check ns of mount by IS_ERR() and return. Fixes: 0e5032237ee5 ("statmount: accept fd as a parameter") Reported-by: syzbot+9e03a9535ea65f687a44@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/698e287a.a70a0220.2c38d7.009e.GAE@google.com/ Signed-off-by: Qing Wang Link: https://patch.msgid.link/20260213103006.2472569-1-wangqing7171@gmail.com Reviewed-by: Bhavik Sachdev Signed-off-by: Christian Brauner --- fs/namespace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/namespace.c b/fs/namespace.c index a67cbe42746d..90700df65f0d 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -5678,6 +5678,8 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id, s->mnt = mnt_file->f_path.mnt; ns = real_mount(s->mnt)->mnt_ns; + if (IS_ERR(ns)) + return PTR_ERR(ns); if (!ns) /* * We can't set mount point and mnt_ns_id since we don't have a From ac83896172798cf82ebc643cf555aa4cdd3a07da Mon Sep 17 00:00:00 2001 From: Hongbo Li Date: Fri, 13 Feb 2026 02:28:12 +0000 Subject: [PATCH 026/359] iomap: Describe @private in iomap_readahead() The kernel test rebot reports the kernel-doc warning: ``` Warning: fs/iomap/buffered-io.c:624 function parameter 'private' not described in 'iomap_readahead' ``` The former commit in "iomap: stash iomap read ctx in the private field of iomap_iter" has added a new parameter @private to iomap_readahead(), so let's describe the parameter. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202601261111.vIL9rhgD-lkp@intel.com/ Fixes: 8806f279244b ("iomap: stash iomap read ctx in the private field of iomap_iter") Signed-off-by: Hongbo Li Link: https://patch.msgid.link/20260213022812.766187-1-lihongbo22@huawei.com Reviewed-by: Gao Xiang Reviewed-by: Christoph Hellwig Signed-off-by: Christian Brauner --- fs/iomap/buffered-io.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 1fe19b4ee2f4..58887513b894 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -625,6 +625,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter, * iomap_readahead - Attempt to read pages from a file. * @ops: The operations vector for the filesystem. * @ctx: The ctx used for issuing readahead. + * @private: The filesystem-specific information for issuing iomap_iter. * * This function is for filesystems to call to implement their readahead * address_space operation. From 9478c166c46934160135e197b049b5a05753f2ad Mon Sep 17 00:00:00 2001 From: Dave Airlie Date: Thu, 21 Nov 2024 11:46:01 +1000 Subject: [PATCH 027/359] nouveau/gsp: drop WARN_ON in ACPI probes These WARN_ONs seem to trigger a lot, and we don't seem to have a plan to fix them, so just drop them, as they are most likely harmless. Cc: stable@vger.kernel.org Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Dave Airlie Link: https://patch.msgid.link/20241121014601.229391-1-airlied@gmail.com Signed-off-by: Danilo Krummrich --- .../gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c index 7fb13434c051..a575a8dbf727 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c @@ -737,8 +737,8 @@ r535_gsp_acpi_caps(acpi_handle handle, CAPS_METHOD_DATA *caps) if (!obj) goto done; - if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) || - WARN_ON(obj->buffer.length != 4)) + if (obj->type != ACPI_TYPE_BUFFER || + obj->buffer.length != 4) goto done; caps->status = 0; @@ -773,8 +773,8 @@ r535_gsp_acpi_jt(acpi_handle handle, JT_METHOD_DATA *jt) if (!obj) goto done; - if (WARN_ON(obj->type != ACPI_TYPE_BUFFER) || - WARN_ON(obj->buffer.length != 4)) + if (obj->type != ACPI_TYPE_BUFFER || + obj->buffer.length != 4) goto done; jt->status = 0; @@ -861,8 +861,8 @@ r535_gsp_acpi_dod(acpi_handle handle, DOD_METHOD_DATA *dod) _DOD = output.pointer; - if (WARN_ON(_DOD->type != ACPI_TYPE_PACKAGE) || - WARN_ON(_DOD->package.count > ARRAY_SIZE(dod->acpiIdList))) + if (_DOD->type != ACPI_TYPE_PACKAGE || + _DOD->package.count > ARRAY_SIZE(dod->acpiIdList)) return; for (int i = 0; i < _DOD->package.count; i++) { From 46120745bb4e7e1f09959624716b4c5d6e2c2e9e Mon Sep 17 00:00:00 2001 From: Ethan Tidmore Date: Sun, 15 Feb 2026 22:04:38 -0600 Subject: [PATCH 028/359] drm/tiny: sharp-memory: fix pointer error dereference The function devm_drm_dev_alloc() returns a pointer error upon failure not NULL. Change null check to pointer error check. Detected by Smatch: drivers/gpu/drm/tiny/sharp-memory.c:549 sharp_memory_probe() error: 'smd' dereferencing possible ERR_PTR() Fixes: b8f9f21716fec ("drm/tiny: Add driver for Sharp Memory LCD") Signed-off-by: Ethan Tidmore Reviewed-by: Thomas Zimmermann Signed-off-by: Thomas Zimmermann Link: https://patch.msgid.link/20260216040438.43702-1-ethantidmore06@gmail.com --- drivers/gpu/drm/tiny/sharp-memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/tiny/sharp-memory.c b/drivers/gpu/drm/tiny/sharp-memory.c index 64272cd0f6e2..cbf69460ebf3 100644 --- a/drivers/gpu/drm/tiny/sharp-memory.c +++ b/drivers/gpu/drm/tiny/sharp-memory.c @@ -541,8 +541,8 @@ static int sharp_memory_probe(struct spi_device *spi) smd = devm_drm_dev_alloc(dev, &sharp_memory_drm_driver, struct sharp_memory_device, drm); - if (!smd) - return -ENOMEM; + if (IS_ERR(smd)) + return PTR_ERR(smd); spi_set_drvdata(spi, smd); From 1072020685f4b81f6efad3b412cdae0bd62bb043 Mon Sep 17 00:00:00 2001 From: Nam Cao Date: Thu, 12 Feb 2026 12:41:25 +0100 Subject: [PATCH 029/359] irqchip/sifive-plic: Fix frozen interrupt due to affinity setting PLIC ignores interrupt completion message for disabled interrupt, explained by the specification: The PLIC signals it has completed executing an interrupt handler by writing the interrupt ID it received from the claim to the claim/complete register. The PLIC does not check whether the completion ID is the same as the last claim ID for that target. If the completion ID does not match an interrupt source that is currently enabled for the target, the completion is silently ignored. This caused problems in the past, because an interrupt can be disabled while still being handled and plic_irq_eoi() had no effect. That was fixed by checking if the interrupt is disabled, and if so enable it, before sending the completion message. That check is done with irqd_irq_disabled(). However, that is not sufficient because the enable bit for the handling hart can be zero despite irqd_irq_disabled(d) being false. This can happen when affinity setting is changed while a hart is still handling the interrupt. This problem is easily reproducible by dumping a large file to uart (which generates lots of interrupts) and at the same time keep changing the uart interrupt's affinity setting. The uart port becomes frozen almost instantaneously. Fix this by checking PLIC's enable bit instead of irqd_irq_disabled(). Fixes: cc9f04f9a84f ("irqchip/sifive-plic: Implement irq_set_affinity() for SMP host") Signed-off-by: Nam Cao Signed-off-by: Thomas Gleixner Link: https://patch.msgid.link/20260212114125.3148067-1-namcao@linutronix.de --- drivers/irqchip/irq-sifive-plic.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c index 60fd8f91762b..70058871d2fb 100644 --- a/drivers/irqchip/irq-sifive-plic.c +++ b/drivers/irqchip/irq-sifive-plic.c @@ -172,8 +172,13 @@ static void plic_irq_disable(struct irq_data *d) static void plic_irq_eoi(struct irq_data *d) { struct plic_handler *handler = this_cpu_ptr(&plic_handlers); + u32 __iomem *reg; + bool enabled; - if (unlikely(irqd_irq_disabled(d))) { + reg = handler->enable_base + (d->hwirq / 32) * sizeof(u32); + enabled = readl(reg) & BIT(d->hwirq % 32); + + if (unlikely(!enabled)) { plic_toggle(handler, d->hwirq, 1); writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM); plic_toggle(handler, d->hwirq, 0); From ce9e40a9a5e5cff0b1b0d2fa582b3d71a8ce68e8 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 6 Feb 2026 15:48:16 +0000 Subject: [PATCH 030/359] irqchip/gic-v3-its: Limit number of per-device MSIs to the range the ITS supports The ITS driver blindly assumes that EventIDs are in abundant supply, to the point where it never checks how many the hardware actually supports. It turns out that some pretty esoteric integrations make it so that only a few bits are available, all the way down to a single bit. Enforce the advertised limitation at the point of allocating the device structure, and hope that the endpoint driver can deal with such limitation. Fixes: 84a6a2e7fc18d ("irqchip: GICv3: ITS: device allocation and configuration") Signed-off-by: Marc Zyngier Signed-off-by: Thomas Gleixner Reviewed-by: Robin Murphy Reviewed-by: Zenghui Yu Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260206154816.3582887-1-maz@kernel.org --- drivers/irqchip/irq-gic-v3-its.c | 4 ++++ include/linux/irqchip/arm-gic-v3.h | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 2988def30972..a51e8e6a8181 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3475,6 +3475,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, int lpi_base; int nr_lpis; int nr_ites; + int id_bits; int sz; if (!its_alloc_device_table(its, dev_id)) @@ -3486,7 +3487,10 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id, /* * Even if the device wants a single LPI, the ITT must be * sized as a power of two (and you need at least one bit...). + * Also honor the ITS's own EID limit. */ + id_bits = FIELD_GET(GITS_TYPER_IDBITS, its->typer) + 1; + nvecs = min_t(unsigned int, nvecs, BIT(id_bits)); nr_ites = max(2, nvecs); sz = nr_ites * (FIELD_GET(GITS_TYPER_ITT_ENTRY_SIZE, its->typer) + 1); sz = max(sz, ITS_ITT_ALIGN); diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 70c0948f978e..0225121f3013 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -394,6 +394,7 @@ #define GITS_TYPER_VLPIS (1UL << 1) #define GITS_TYPER_ITT_ENTRY_SIZE_SHIFT 4 #define GITS_TYPER_ITT_ENTRY_SIZE GENMASK_ULL(7, 4) +#define GITS_TYPER_IDBITS GENMASK_ULL(12, 8) #define GITS_TYPER_IDBITS_SHIFT 8 #define GITS_TYPER_DEVBITS_SHIFT 13 #define GITS_TYPER_DEVBITS GENMASK_ULL(17, 13) From f0617176be5e497b67b3c87bac35b26ebccac499 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 16 Feb 2026 12:04:50 +0100 Subject: [PATCH 031/359] irqchip/mmp: Make icu_irq_chip variable static const File-scope 'icu_irq_chip' is not used outside of this unit and is not modified anywhere, so make it static const to silence sparse warning: irq-mmp.c:139:17: warning: symbol 'icu_irq_chip' was not declared. Should it be static? Signed-off-by: Krzysztof Kozlowski Signed-off-by: Thomas Gleixner Link: https://patch.msgid.link/20260216110449.160277-2-krzysztof.kozlowski@oss.qualcomm.com --- drivers/irqchip/irq-mmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-mmp.c b/drivers/irqchip/irq-mmp.c index 09e640430208..8e210cb451be 100644 --- a/drivers/irqchip/irq-mmp.c +++ b/drivers/irqchip/irq-mmp.c @@ -136,7 +136,7 @@ static void icu_unmask_irq(struct irq_data *d) } } -struct irq_chip icu_irq_chip = { +static const struct irq_chip icu_irq_chip = { .name = "icu_irq", .irq_mask = icu_mask_irq, .irq_mask_ack = icu_mask_ack_irq, From bffda93a51b40afd67c11bf558dc5aae83ca0943 Mon Sep 17 00:00:00 2001 From: Mathias Krause Date: Thu, 12 Feb 2026 11:23:27 -0800 Subject: [PATCH 032/359] scsi: lpfc: Properly set WC for DPP mapping Using set_memory_wc() to enable write-combining for the DPP portion of the MMIO mapping is wrong as set_memory_*() is meant to operate on RAM only, not MMIO mappings. In fact, as used currently triggers a BUG_ON() with enabled CONFIG_DEBUG_VIRTUAL. Simply map the DPP region separately and in addition to the already existing mappings, avoiding any possible negative side effects for these. Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Mathias Krause Signed-off-by: Justin Tee Reviewed-by: Mathias Krause Link: https://patch.msgid.link/20260212192327.141104-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen --- drivers/scsi/lpfc/lpfc_init.c | 2 ++ drivers/scsi/lpfc/lpfc_sli.c | 36 +++++++++++++++++++++++++++++------ drivers/scsi/lpfc/lpfc_sli4.h | 3 +++ 3 files changed, 35 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index a116a16c4a6f..b5e53c7d33e7 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -12039,6 +12039,8 @@ lpfc_sli4_pci_mem_unset(struct lpfc_hba *phba) iounmap(phba->sli4_hba.conf_regs_memmap_p); if (phba->sli4_hba.dpp_regs_memmap_p) iounmap(phba->sli4_hba.dpp_regs_memmap_p); + if (phba->sli4_hba.dpp_regs_memmap_wc_p) + iounmap(phba->sli4_hba.dpp_regs_memmap_wc_p); break; case LPFC_SLI_INTF_IF_TYPE_1: break; diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 734af3d039f8..690763e0aa55 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -15981,6 +15981,32 @@ lpfc_dual_chute_pci_bar_map(struct lpfc_hba *phba, uint16_t pci_barset) return NULL; } +static __maybe_unused void __iomem * +lpfc_dpp_wc_map(struct lpfc_hba *phba, uint8_t dpp_barset) +{ + + /* DPP region is supposed to cover 64-bit BAR2 */ + if (dpp_barset != WQ_PCI_BAR_4_AND_5) { + lpfc_log_msg(phba, KERN_WARNING, LOG_INIT, + "3273 dpp_barset x%x != WQ_PCI_BAR_4_AND_5\n", + dpp_barset); + return NULL; + } + + if (!phba->sli4_hba.dpp_regs_memmap_wc_p) { + void __iomem *dpp_map; + + dpp_map = ioremap_wc(phba->pci_bar2_map, + pci_resource_len(phba->pcidev, + PCI_64BIT_BAR4)); + + if (dpp_map) + phba->sli4_hba.dpp_regs_memmap_wc_p = dpp_map; + } + + return phba->sli4_hba.dpp_regs_memmap_wc_p; +} + /** * lpfc_modify_hba_eq_delay - Modify Delay Multiplier on EQs * @phba: HBA structure that EQs are on. @@ -16944,9 +16970,6 @@ lpfc_wq_create(struct lpfc_hba *phba, struct lpfc_queue *wq, uint8_t dpp_barset; uint32_t dpp_offset; uint8_t wq_create_version; -#ifdef CONFIG_X86 - unsigned long pg_addr; -#endif /* sanity check on queue memory */ if (!wq || !cq) @@ -17132,14 +17155,15 @@ lpfc_wq_create(struct lpfc_hba *phba, struct lpfc_queue *wq, #ifdef CONFIG_X86 /* Enable combined writes for DPP aperture */ - pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK; - rc = set_memory_wc(pg_addr, 1); - if (rc) { + bar_memmap_p = lpfc_dpp_wc_map(phba, dpp_barset); + if (!bar_memmap_p) { lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "3272 Cannot setup Combined " "Write on WQ[%d] - disable DPP\n", wq->queue_id); phba->cfg_enable_dpp = 0; + } else { + wq->dpp_regaddr = bar_memmap_p + dpp_offset; } #else phba->cfg_enable_dpp = 0; diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h index ee58383492b2..b6d90604bb61 100644 --- a/drivers/scsi/lpfc/lpfc_sli4.h +++ b/drivers/scsi/lpfc/lpfc_sli4.h @@ -785,6 +785,9 @@ struct lpfc_sli4_hba { void __iomem *dpp_regs_memmap_p; /* Kernel memory mapped address for * dpp registers */ + void __iomem *dpp_regs_memmap_wc_p;/* Kernel memory mapped address for + * dpp registers with write combining + */ union { struct { /* IF Type 0, BAR 0 PCI cfg space reg mem map */ From 57297736c08233987e5d29ce6584c6ca2a831b12 Mon Sep 17 00:00:00 2001 From: Jan Kiszka Date: Thu, 29 Jan 2026 15:30:39 +0100 Subject: [PATCH 033/359] scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT This resolves the follow splat and lock-up when running with PREEMPT_RT enabled on Hyper-V: [ 415.140818] BUG: scheduling while atomic: stress-ng-iomix/1048/0x00000002 [ 415.140822] INFO: lockdep is turned off. [ 415.140823] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel aesni_intel rapl binfmt_misc nls_ascii nls_cp437 vfat fat snd_pcm hyperv_drm snd_timer drm_client_lib drm_shmem_helper snd sg soundcore drm_kms_helper pcspkr hv_balloon hv_utils evdev joydev drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs autofs4 ext4 crc16 mbcache jbd2 sr_mod sd_mod cdrom hv_storvsc serio_raw hid_generic scsi_transport_fc hid_hyperv scsi_mod hid hv_netvsc hyperv_keyboard scsi_common [ 415.140846] Preemption disabled at: [ 415.140847] [] storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc] [ 415.140854] CPU: 8 UID: 0 PID: 1048 Comm: stress-ng-iomix Not tainted 6.19.0-rc7 #30 PREEMPT_{RT,(full)} [ 415.140856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/04/2024 [ 415.140857] Call Trace: [ 415.140861] [ 415.140861] ? storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc] [ 415.140863] dump_stack_lvl+0x91/0xb0 [ 415.140870] __schedule_bug+0x9c/0xc0 [ 415.140875] __schedule+0xdf6/0x1300 [ 415.140877] ? rtlock_slowlock_locked+0x56c/0x1980 [ 415.140879] ? rcu_is_watching+0x12/0x60 [ 415.140883] schedule_rtlock+0x21/0x40 [ 415.140885] rtlock_slowlock_locked+0x502/0x1980 [ 415.140891] rt_spin_lock+0x89/0x1e0 [ 415.140893] hv_ringbuffer_write+0x87/0x2a0 [ 415.140899] vmbus_sendpacket_mpb_desc+0xb6/0xe0 [ 415.140900] ? rcu_is_watching+0x12/0x60 [ 415.140902] storvsc_queuecommand+0x669/0xbe0 [hv_storvsc] [ 415.140904] ? HARDIRQ_verbose+0x10/0x10 [ 415.140908] ? __rq_qos_issue+0x28/0x40 [ 415.140911] scsi_queue_rq+0x760/0xd80 [scsi_mod] [ 415.140926] __blk_mq_issue_directly+0x4a/0xc0 [ 415.140928] blk_mq_issue_direct+0x87/0x2b0 [ 415.140931] blk_mq_dispatch_queue_requests+0x120/0x440 [ 415.140933] blk_mq_flush_plug_list+0x7a/0x1a0 [ 415.140935] __blk_flush_plug+0xf4/0x150 [ 415.140940] __submit_bio+0x2b2/0x5c0 [ 415.140944] ? submit_bio_noacct_nocheck+0x272/0x360 [ 415.140946] submit_bio_noacct_nocheck+0x272/0x360 [ 415.140951] ext4_read_bh_lock+0x3e/0x60 [ext4] [ 415.140995] ext4_block_write_begin+0x396/0x650 [ext4] [ 415.141018] ? __pfx_ext4_da_get_block_prep+0x10/0x10 [ext4] [ 415.141038] ext4_da_write_begin+0x1c4/0x350 [ext4] [ 415.141060] generic_perform_write+0x14e/0x2c0 [ 415.141065] ext4_buffered_write_iter+0x6b/0x120 [ext4] [ 415.141083] vfs_write+0x2ca/0x570 [ 415.141087] ksys_write+0x76/0xf0 [ 415.141089] do_syscall_64+0x99/0x1490 [ 415.141093] ? rcu_is_watching+0x12/0x60 [ 415.141095] ? finish_task_switch.isra.0+0xdf/0x3d0 [ 415.141097] ? rcu_is_watching+0x12/0x60 [ 415.141098] ? lock_release+0x1f0/0x2a0 [ 415.141100] ? rcu_is_watching+0x12/0x60 [ 415.141101] ? finish_task_switch.isra.0+0xe4/0x3d0 [ 415.141103] ? rcu_is_watching+0x12/0x60 [ 415.141104] ? __schedule+0xb34/0x1300 [ 415.141106] ? hrtimer_try_to_cancel+0x1d/0x170 [ 415.141109] ? do_nanosleep+0x8b/0x160 [ 415.141111] ? hrtimer_nanosleep+0x89/0x100 [ 415.141114] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 415.141116] ? xfd_validate_state+0x26/0x90 [ 415.141118] ? rcu_is_watching+0x12/0x60 [ 415.141120] ? do_syscall_64+0x1e0/0x1490 [ 415.141121] ? do_syscall_64+0x1e0/0x1490 [ 415.141123] ? rcu_is_watching+0x12/0x60 [ 415.141124] ? do_syscall_64+0x1e0/0x1490 [ 415.141125] ? do_syscall_64+0x1e0/0x1490 [ 415.141127] ? irqentry_exit+0x140/0x7e0 [ 415.141129] entry_SYSCALL_64_after_hwframe+0x76/0x7e get_cpu() disables preemption while the spinlock hv_ringbuffer_write is using is converted to an rt-mutex under PREEMPT_RT. Signed-off-by: Jan Kiszka Tested-by: Florian Bezdeka Reviewed-by: Michael Kelley Tested-by: Michael Kelley Link: https://patch.msgid.link/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com Signed-off-by: Martin K. Petersen --- drivers/scsi/storvsc_drv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 19e18939d3a5..19ff0004e259 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1855,8 +1855,9 @@ static enum scsi_qc_status storvsc_queuecommand(struct Scsi_Host *host, cmd_request->payload_sz = payload_sz; /* Invokes the vsc to start an IO */ - ret = storvsc_do_io(dev, cmd_request, get_cpu()); - put_cpu(); + migrate_disable(); + ret = storvsc_do_io(dev, cmd_request, smp_processor_id()); + migrate_enable(); if (ret) scsi_dma_unmap(scmnd); From 2e6b5cd6a4b37a95b78cf8c39a979b58c915c8ed Mon Sep 17 00:00:00 2001 From: Alexey Charkov Date: Mon, 9 Feb 2026 19:17:34 +0400 Subject: [PATCH 034/359] scsi: ufs: core: Fix RPMB region size detection for UFS 2.2 Older UFS spec devices (2.2 and earlier) do not expose per-region RPMB sizes, as only one RPMB region is supported. In such cases, the size of the single RPMB region can be deduced from the Logical Block Count and Logical Block Size fields in the RPMB Unit Descriptor. Add a fallback mechanism to calculate the RPMB region size from these fields if the device implements an older spec, so that the RPMB driver can work with such devices - otherwise it silently skips the whole RPMB. Section 14.1.4.6 (RPMB Unit Descriptor) Link: https://www.jedec.org/system/files/docs/JESD220C-2_2.pdf Cc: stable@vger.kernel.org Fixes: b06b8c421485 ("scsi: ufs: core: Add OP-TEE based RPMB driver for UFS devices") Reviewed-by: Bean Huo Signed-off-by: Alexey Charkov Link: https://patch.msgid.link/20260209-ufs-rpmb-v3-1-b1804e71bd38@flipper.net Signed-off-by: Martin K. Petersen --- drivers/ufs/core/ufshcd.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 8349fe2090db..12f1bdc70587 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -5248,6 +5249,25 @@ static void ufshcd_lu_init(struct ufs_hba *hba, struct scsi_device *sdev) hba->dev_info.rpmb_region_size[1] = desc_buf[RPMB_UNIT_DESC_PARAM_REGION1_SIZE]; hba->dev_info.rpmb_region_size[2] = desc_buf[RPMB_UNIT_DESC_PARAM_REGION2_SIZE]; hba->dev_info.rpmb_region_size[3] = desc_buf[RPMB_UNIT_DESC_PARAM_REGION3_SIZE]; + + if (hba->dev_info.wspecversion <= 0x0220) { + /* + * These older spec chips have only one RPMB region, + * sized between 128 kB minimum and 16 MB maximum. + * No per region size fields are provided (respective + * REGIONX_SIZE fields always contain zeros), so get + * it from the logical block count and size fields for + * compatibility + * + * (See JESD220C-2_2 Section 14.1.4.6 + * RPMB Unit Descriptor,* offset 13h, 4 bytes) + */ + hba->dev_info.rpmb_region_size[0] = + (get_unaligned_be64(desc_buf + + RPMB_UNIT_DESC_PARAM_LOGICAL_BLK_COUNT) + << desc_buf[RPMB_UNIT_DESC_PARAM_LOGICAL_BLK_SIZE]) + / SZ_128K; + } } From 70ca8caa96ce473647054f5c7b9dab5423902402 Mon Sep 17 00:00:00 2001 From: Tomas Henzl Date: Tue, 10 Feb 2026 20:18:50 +0100 Subject: [PATCH 035/359] scsi: ses: Fix devices attaching to different hosts On a multipath SAS system some devices don't end up with correct symlinks from the SCSI device to its enclosure. Some devices even have enclosure links pointing to enclosures attached to different SCSI hosts. ses_match_to_enclosure() calls enclosure_for_each_device() which iterates over all enclosures on the system, not just enclosures attached to the current SCSI host. Replace the iteration with a direct call to ses_enclosure_find_by_addr(). Reviewed-by: David Jeffery Signed-off-by: Tomas Henzl Link: https://patch.msgid.link/20260210191850.36784-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen --- drivers/scsi/ses.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index 789b170da652..98a7367d2f20 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -528,9 +528,8 @@ struct efd { }; static int ses_enclosure_find_by_addr(struct enclosure_device *edev, - void *data) + struct efd *efd) { - struct efd *efd = data; int i; struct ses_component *scomp; @@ -683,7 +682,7 @@ static void ses_match_to_enclosure(struct enclosure_device *edev, if (efd.addr) { efd.dev = &sdev->sdev_gendev; - enclosure_for_each_device(ses_enclosure_find_by_addr, &efd); + ses_enclosure_find_by_addr(edev, &efd); } } From 5b313760059c9df7d60aba7832279bcb81b4aec0 Mon Sep 17 00:00:00 2001 From: Won Jung Date: Wed, 11 Feb 2026 15:01:05 +0900 Subject: [PATCH 036/359] scsi: ufs: core: Reset urgent_bkops_lvl to allow runtime PM power mode Ensures that UFS Runtime PM can achieve power saving after System PM suspend by resetting hba->urgent_bkops_lvl. Also modify the ufshcd_bkops_exception_event_handler to avoid setting urgent_bkops_lvl when status is 0, which helps maintain optimal power management. On UFS devices supporting UFSHCD_CAP_AUTO_BKOPS_SUSPEND, a BKOPS exception event can lead to a situation where UFS Runtime PM can't enter low-power mode states even after the BKOPS exception has been resolved. BKOPS exception with bkops status 0 occurs, the driver logs: "ufshcd_bkops_exception_event_handler: device raised urgent BKOPS exception for bkops status 0" When a BKOPS exception occurs, ufshcd_bkops_exception_event_handler() reads the BKOPS status and sets hba->urgent_bkops_lvl to BKOPS_STATUS_NO_OP(0). This allows the device to perform Runtime PM without changing the UFS power mode. (__ufshcd_wl_suspend(hba, UFS_RUNTIME_PM)) During system PM suspend, ufshcd_disable_auto_bkops() is called, disabling auto bkops. After UFS System PM Resume, when runtime PM attempts to suspend again, ufshcd_urgent_bkops() is invoked. Since hba->urgent_bkops_lvl remains at BKOPS_STATUS_NO_OP(0), ufshcd_enable_auto_bkops() is triggered. However, in ufshcd_bkops_ctrl(), the driver compares the current BKOPS status with hba->urgent_bkops_lvl, and only enables auto bkops if curr_status >= hba->urgent_bkops_lvl. Since both values are 0, the condition is met As a result, __ufshcd_wl_suspend(hba, UFS_RUNTIME_PM) skips power mode transitions and remains in an active state, preventing power saving even though no urgent BKOPS condition exists. Signed-off-by: Won Jung Reviewed-by: Peter Wang Link: https://patch.msgid.link/1891546521.01770806581968.JavaMail.epsvc@epcpadp2new Signed-off-by: Martin K. Petersen --- drivers/ufs/core/ufshcd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 12f1bdc70587..3bbc2b51c87b 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -5982,6 +5982,7 @@ static int ufshcd_disable_auto_bkops(struct ufs_hba *hba) hba->auto_bkops_enabled = false; trace_ufshcd_auto_bkops_state(hba, "Disabled"); + hba->urgent_bkops_lvl = BKOPS_STATUS_PERF_IMPACT; hba->is_urgent_bkops_lvl_checked = false; out: return err; @@ -6085,7 +6086,7 @@ static void ufshcd_bkops_exception_event_handler(struct ufs_hba *hba) * impacted or critical. Handle these device by determining their urgent * bkops status at runtime. */ - if (curr_status < BKOPS_STATUS_PERF_IMPACT) { + if ((curr_status > BKOPS_STATUS_NO_OP) && (curr_status < BKOPS_STATUS_PERF_IMPACT)) { dev_err(hba->dev, "%s: device raised urgent BKOPS exception for bkops status %d\n", __func__, curr_status); /* update the current status as the urgent bkops level */ From fa96392ebebc8fade2b878acb14cce0f71016503 Mon Sep 17 00:00:00 2001 From: Ranjan Kumar Date: Thu, 12 Feb 2026 12:30:26 +0530 Subject: [PATCH 037/359] scsi: mpi3mr: Add NULL checks when resetting request and reply queues The driver encountered a crash during resource cleanup when the reply and request queues were NULL due to freed memory. This issue occurred when the creation of reply or request queues failed, and the driver freed the memory first, but attempted to mem set the content of the freed memory, leading to a system crash. Add NULL pointer checks for reply and request queues before accessing the reply/request memory during cleanup Signed-off-by: Ranjan Kumar Link: https://patch.msgid.link/20260212070026.30263-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen --- drivers/scsi/mpi3mr/mpi3mr_fw.c | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c index 1cfbdb773353..04d4a2aea7d7 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_fw.c +++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c @@ -4806,21 +4806,25 @@ void mpi3mr_memset_buffers(struct mpi3mr_ioc *mrioc) } for (i = 0; i < mrioc->num_queues; i++) { - mrioc->op_reply_qinfo[i].qid = 0; - mrioc->op_reply_qinfo[i].ci = 0; - mrioc->op_reply_qinfo[i].num_replies = 0; - mrioc->op_reply_qinfo[i].ephase = 0; - atomic_set(&mrioc->op_reply_qinfo[i].pend_ios, 0); - atomic_set(&mrioc->op_reply_qinfo[i].in_use, 0); - mpi3mr_memset_op_reply_q_buffers(mrioc, i); + if (mrioc->op_reply_qinfo) { + mrioc->op_reply_qinfo[i].qid = 0; + mrioc->op_reply_qinfo[i].ci = 0; + mrioc->op_reply_qinfo[i].num_replies = 0; + mrioc->op_reply_qinfo[i].ephase = 0; + atomic_set(&mrioc->op_reply_qinfo[i].pend_ios, 0); + atomic_set(&mrioc->op_reply_qinfo[i].in_use, 0); + mpi3mr_memset_op_reply_q_buffers(mrioc, i); + } - mrioc->req_qinfo[i].ci = 0; - mrioc->req_qinfo[i].pi = 0; - mrioc->req_qinfo[i].num_requests = 0; - mrioc->req_qinfo[i].qid = 0; - mrioc->req_qinfo[i].reply_qid = 0; - spin_lock_init(&mrioc->req_qinfo[i].q_lock); - mpi3mr_memset_op_req_q_buffers(mrioc, i); + if (mrioc->req_qinfo) { + mrioc->req_qinfo[i].ci = 0; + mrioc->req_qinfo[i].pi = 0; + mrioc->req_qinfo[i].num_requests = 0; + mrioc->req_qinfo[i].qid = 0; + mrioc->req_qinfo[i].reply_qid = 0; + spin_lock_init(&mrioc->req_qinfo[i].q_lock); + mpi3mr_memset_op_req_q_buffers(mrioc, i); + } } atomic_set(&mrioc->pend_large_data_sz, 0); From 38353c26db28efd984f51d426eac2396d299cca7 Mon Sep 17 00:00:00 2001 From: Salomon Dushimirimana Date: Fri, 13 Feb 2026 19:28:06 +0000 Subject: [PATCH 038/359] scsi: pm8001: Fix use-after-free in pm8001_queue_command() Commit e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") refactors pm8001_queue_command(), however it introduces a potential cause of a double free scenario when it changes the function to return -ENODEV in case of phy down/device gone state. In this path, pm8001_queue_command() updates task status and calls task_done to indicate to upper layer that the task has been handled. However, this also frees the underlying SAS task. A -ENODEV is then returned to the caller. When libsas sas_ata_qc_issue() receives this error value, it assumes the task wasn't handled/queued by LLDD and proceeds to clean up and free the task again, resulting in a double free. Since pm8001_queue_command() handles the SAS task in this case, it should return 0 to the caller indicating that the task has been handled. Fixes: e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") Signed-off-by: Salomon Dushimirimana Reviewed-by: Damien Le Moal Link: https://patch.msgid.link/20260213192806.439432-1-salomondush@google.com Signed-off-by: Martin K. Petersen --- drivers/scsi/pm8001/pm8001_sas.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c index 6a8d35aea93a..645524f3fe2d 100644 --- a/drivers/scsi/pm8001/pm8001_sas.c +++ b/drivers/scsi/pm8001/pm8001_sas.c @@ -525,8 +525,9 @@ int pm8001_queue_command(struct sas_task *task, gfp_t gfp_flags) } else { task->task_done(task); } - rc = -ENODEV; - goto err_out; + spin_unlock_irqrestore(&pm8001_ha->lock, flags); + pm8001_dbg(pm8001_ha, IO, "pm8001_task_exec device gone\n"); + return 0; } ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_dev, task); From af3973e7b4fd91e6884b312526168869fb013d68 Mon Sep 17 00:00:00 2001 From: Thomas Fourier Date: Mon, 16 Feb 2026 15:10:55 +0100 Subject: [PATCH 039/359] scsi: snic: Remove unused linkstatus The (struct vnic_dev).linkstatus buffer is freed in svnic_dev_unregister() and referenced in svnic_dev_link_status() but never alloc'd. This means (struct vnic_dev).linkstatus is always null and the dealloc the reference in svnic_dev_link_status() is dead code. Signed-off-by: Thomas Fourier Acked-by: Karan Tilak Kumar Link: https://patch.msgid.link/20260216141056.59429-2-fourier.thomas@gmail.com Signed-off-by: Martin K. Petersen --- drivers/scsi/snic/vnic_dev.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/drivers/scsi/snic/vnic_dev.c b/drivers/scsi/snic/vnic_dev.c index 760f3f22095c..c4df0b17c86c 100644 --- a/drivers/scsi/snic/vnic_dev.c +++ b/drivers/scsi/snic/vnic_dev.c @@ -42,8 +42,6 @@ struct vnic_dev { struct vnic_devcmd_notify *notify; struct vnic_devcmd_notify notify_copy; dma_addr_t notify_pa; - u32 *linkstatus; - dma_addr_t linkstatus_pa; struct vnic_stats *stats; dma_addr_t stats_pa; struct vnic_devcmd_fw_info *fw_info; @@ -650,8 +648,6 @@ int svnic_dev_init(struct vnic_dev *vdev, int arg) int svnic_dev_link_status(struct vnic_dev *vdev) { - if (vdev->linkstatus) - return *vdev->linkstatus; if (!vnic_dev_notify_ready(vdev)) return 0; @@ -686,11 +682,6 @@ void svnic_dev_unregister(struct vnic_dev *vdev) sizeof(struct vnic_devcmd_notify), vdev->notify, vdev->notify_pa); - if (vdev->linkstatus) - dma_free_coherent(&vdev->pdev->dev, - sizeof(u32), - vdev->linkstatus, - vdev->linkstatus_pa); if (vdev->stats) dma_free_coherent(&vdev->pdev->dev, sizeof(struct vnic_stats), From 97af85787c1964a7e7b146c5d4e021f089af47ea Mon Sep 17 00:00:00 2001 From: Karan Tilak Kumar Date: Tue, 17 Feb 2026 12:46:58 -0800 Subject: [PATCH 040/359] scsi: snic: MAINTAINERS: Update snic maintainers Update snic maintainers. Signed-off-by: Karan Tilak Kumar Link: https://patch.msgid.link/20260217204658.5465-1-kartilak@cisco.com Signed-off-by: Martin K. Petersen --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 70292bd3ed71..0900af786cce 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6115,6 +6115,7 @@ F: drivers/scsi/fnic/ CISCO SCSI HBA DRIVER M: Karan Tilak Kumar +M: Narsimhulu Musini M: Sesidhar Baddela L: linux-scsi@vger.kernel.org S: Supported From a41dbf5e004edbe1260883c43a8bd134d9cb0c1c Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Sat, 14 Feb 2026 16:22:13 +0100 Subject: [PATCH 041/359] mount: hold namespace_sem across copy in create_new_namespace() Fix an oversight when creating a new mount namespace. If someone had the bright idea to make the real rootfs a shared or dependent mount and it is later copied the copy will become a peer of the old real rootfs mount or a dependent mount of it. The namespace semaphore is dropped and we use mount lock exact to lock the new real root mount. If that fails or the subsequent do_loopback() fails we rely on the copy of the real root mount to be cleaned up by path_put(). The problem is that this doesn't deal with mount propagation and will leave the mounts linked in the propagation lists. When creating a new mount namespace create_new_namespace() first acquires namespace_sem to clone the nullfs root, drops it, then reacquires it via LOCK_MOUNT_EXACT which takes inode_lock first to respect the inode_lock -> namespace_sem lock ordering. This drop-and-reacquire pattern is fragile and was the source of the propagation cleanup bug fixed in the preceding commit. Extend lock_mount_exact() with a copy_mount mode that clones the mount under the locks atomically. When copy_mount is true, path_overmounted() is skipped since we're copying the mount, not mounting on top of it - the nullfs root always has rootfs mounted on top so the check would always fail. If clone_mnt() fails after get_mountpoint() has pinned the mountpoint, __unlock_mount() is used to properly unpin the mountpoint and release both locks. This allows create_new_namespace() to use LOCK_MOUNT_EXACT_COPY which takes inode_lock and namespace_sem once and holds them throughout the clone and subsequent mount operations, eliminating the drop-and-reacquire pattern entirely. Reported-by: syzbot+a89f9434fb5a001ccd58@syzkaller.appspotmail.com Fixes: 9b8a0ba68246 ("mount: add OPEN_TREE_NAMESPACE") # mainline only Link: https://lore.kernel.org/699047f6.050a0220.2757fb.0024.GAE@google.com Signed-off-by: Christian Brauner --- fs/namespace.c | 117 +++++++++++++++++++++++++------------------------ 1 file changed, 60 insertions(+), 57 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 90700df65f0d..022e59afcb5e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2791,7 +2791,8 @@ static inline void unlock_mount(struct pinned_mountpoint *m) } static void lock_mount_exact(const struct path *path, - struct pinned_mountpoint *mp); + struct pinned_mountpoint *mp, bool copy_mount, + unsigned int copy_flags); #define LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath) \ struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \ @@ -2799,7 +2800,10 @@ static void lock_mount_exact(const struct path *path, #define LOCK_MOUNT(mp, path) LOCK_MOUNT_MAYBE_BENEATH(mp, (path), false) #define LOCK_MOUNT_EXACT(mp, path) \ struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \ - lock_mount_exact((path), &mp) + lock_mount_exact((path), &mp, false, 0) +#define LOCK_MOUNT_EXACT_COPY(mp, path, copy_flags) \ + struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \ + lock_mount_exact((path), &mp, true, (copy_flags)) static int graft_tree(struct mount *mnt, const struct pinned_mountpoint *mp) { @@ -3073,16 +3077,13 @@ static struct file *open_detached_copy(struct path *path, unsigned int flags) return file; } -DEFINE_FREE(put_empty_mnt_ns, struct mnt_namespace *, - if (!IS_ERR_OR_NULL(_T)) free_mnt_ns(_T)) - static struct mnt_namespace *create_new_namespace(struct path *path, unsigned int flags) { - struct mnt_namespace *new_ns __free(put_empty_mnt_ns) = NULL; - struct path to_path __free(path_put) = {}; struct mnt_namespace *ns = current->nsproxy->mnt_ns; struct user_namespace *user_ns = current_user_ns(); - struct mount *new_ns_root; + struct mnt_namespace *new_ns; + struct mount *new_ns_root, *old_ns_root; + struct path to_path; struct mount *mnt; unsigned int copy_flags = 0; bool locked = false; @@ -3094,71 +3095,63 @@ static struct mnt_namespace *create_new_namespace(struct path *path, unsigned in if (IS_ERR(new_ns)) return ERR_CAST(new_ns); - scoped_guard(namespace_excl) { - new_ns_root = clone_mnt(ns->root, ns->root->mnt.mnt_root, copy_flags); - if (IS_ERR(new_ns_root)) - return ERR_CAST(new_ns_root); + old_ns_root = ns->root; + to_path.mnt = &old_ns_root->mnt; + to_path.dentry = old_ns_root->mnt.mnt_root; - /* - * If the real rootfs had a locked mount on top of it somewhere - * in the stack, lock the new mount tree as well so it can't be - * exposed. - */ - mnt = ns->root; - while (mnt->overmount) { - mnt = mnt->overmount; - if (mnt->mnt.mnt_flags & MNT_LOCKED) - locked = true; - } + VFS_WARN_ON_ONCE(old_ns_root->mnt.mnt_sb->s_type != &nullfs_fs_type); + + LOCK_MOUNT_EXACT_COPY(mp, &to_path, copy_flags); + if (IS_ERR(mp.parent)) { + free_mnt_ns(new_ns); + return ERR_CAST(mp.parent); + } + new_ns_root = mp.parent; + + /* + * If the real rootfs had a locked mount on top of it somewhere + * in the stack, lock the new mount tree as well so it can't be + * exposed. + */ + mnt = old_ns_root; + while (mnt->overmount) { + mnt = mnt->overmount; + if (mnt->mnt.mnt_flags & MNT_LOCKED) + locked = true; } /* - * We dropped the namespace semaphore so we can actually lock - * the copy for mounting. The copied mount isn't attached to any - * mount namespace and it is thus excluded from any propagation. - * So realistically we're isolated and the mount can't be - * overmounted. - */ - - /* Borrow the reference from clone_mnt(). */ - to_path.mnt = &new_ns_root->mnt; - to_path.dentry = dget(new_ns_root->mnt.mnt_root); - - /* Now lock for actual mounting. */ - LOCK_MOUNT_EXACT(mp, &to_path); - if (unlikely(IS_ERR(mp.parent))) - return ERR_CAST(mp.parent); - - /* - * We don't emulate unshare()ing a mount namespace. We stick to the - * restrictions of creating detached bind-mounts. It has a lot - * saner and simpler semantics. + * We don't emulate unshare()ing a mount namespace. We stick + * to the restrictions of creating detached bind-mounts. It + * has a lot saner and simpler semantics. */ mnt = __do_loopback(path, flags, copy_flags); - if (IS_ERR(mnt)) - return ERR_CAST(mnt); - scoped_guard(mount_writer) { + if (IS_ERR(mnt)) { + emptied_ns = new_ns; + umount_tree(new_ns_root, 0); + return ERR_CAST(mnt); + } + if (locked) mnt->mnt.mnt_flags |= MNT_LOCKED; /* - * Now mount the detached tree on top of the copy of the - * real rootfs we created. + * now mount the detached tree on top of the copy + * of the real rootfs we created. */ attach_mnt(mnt, new_ns_root, mp.mp); if (user_ns != ns->user_ns) lock_mnt_tree(new_ns_root); } - /* Add all mounts to the new namespace. */ - for (struct mount *p = new_ns_root; p; p = next_mnt(p, new_ns_root)) { - mnt_add_to_ns(new_ns, p); + for (mnt = new_ns_root; mnt; mnt = next_mnt(mnt, new_ns_root)) { + mnt_add_to_ns(new_ns, mnt); new_ns->nr_mounts++; } - new_ns->root = real_mount(no_free_ptr(to_path.mnt)); + new_ns->root = new_ns_root; ns_tree_add_raw(new_ns); - return no_free_ptr(new_ns); + return new_ns; } static struct file *open_new_namespace(struct path *path, unsigned int flags) @@ -3840,16 +3833,20 @@ static int do_new_mount(const struct path *path, const char *fstype, } static void lock_mount_exact(const struct path *path, - struct pinned_mountpoint *mp) + struct pinned_mountpoint *mp, bool copy_mount, + unsigned int copy_flags) { struct dentry *dentry = path->dentry; int err; + /* Assert that inode_lock() locked the correct inode. */ + VFS_WARN_ON_ONCE(copy_mount && !path_mounted(path)); + inode_lock(dentry->d_inode); namespace_lock(); if (unlikely(cant_mount(dentry))) err = -ENOENT; - else if (path_overmounted(path)) + else if (!copy_mount && path_overmounted(path)) err = -EBUSY; else err = get_mountpoint(dentry, mp); @@ -3857,9 +3854,15 @@ static void lock_mount_exact(const struct path *path, namespace_unlock(); inode_unlock(dentry->d_inode); mp->parent = ERR_PTR(err); - } else { - mp->parent = real_mount(path->mnt); + return; } + + if (copy_mount) + mp->parent = clone_mnt(real_mount(path->mnt), dentry, copy_flags); + else + mp->parent = real_mount(path->mnt); + if (unlikely(IS_ERR(mp->parent))) + __unlock_mount(mp); } int finish_automount(struct vfsmount *__m, const struct path *path) From 4a403d7aa9074f527f064ef0806aaab38d14b07c Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 29 Jan 2026 14:52:22 +0100 Subject: [PATCH 042/359] namespace: fix proc mount iteration The m->index isn't updated when m->show() overflows and retains its value before the current mount causing a restart to start at the same value. If that happens in short order to due a quickly expanding mount table this would cause the same mount to be shown again and again. Ensure that *pos always equals the mount id of the mount that was returned by start/next. On restart after overflow mnt_find_id_at(*pos) finds the exact mount. This should avoid duplicates, avoid skips and should handle concurrent modification just fine. Cc: Fixed: 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree") Link: https://patch.msgid.link/20260129-geleckt-treuhand-4bb940acacd9@brauner Signed-off-by: Christian Brauner --- fs/namespace.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 022e59afcb5e..31483c66ace8 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1531,23 +1531,33 @@ static struct mount *mnt_find_id_at_reverse(struct mnt_namespace *ns, u64 mnt_id static void *m_start(struct seq_file *m, loff_t *pos) { struct proc_mounts *p = m->private; + struct mount *mnt; down_read(&namespace_sem); - return mnt_find_id_at(p->ns, *pos); + mnt = mnt_find_id_at(p->ns, *pos); + if (mnt) + *pos = mnt->mnt_id_unique; + return mnt; } static void *m_next(struct seq_file *m, void *v, loff_t *pos) { - struct mount *next = NULL, *mnt = v; + struct mount *mnt = v; struct rb_node *node = rb_next(&mnt->mnt_node); - ++*pos; if (node) { - next = node_to_mount(node); + struct mount *next = node_to_mount(node); *pos = next->mnt_id_unique; + return next; } - return next; + + /* + * No more mounts. Set pos past current mount's ID so that if + * iteration restarts, mnt_find_id_at() returns NULL. + */ + *pos = mnt->mnt_id_unique + 1; + return NULL; } static void m_stop(struct seq_file *m, void *v) From ef0b64741a53e47ce8022c973099e969094aa536 Mon Sep 17 00:00:00 2001 From: Jori Koolstra Date: Mon, 1 Dec 2025 13:23:38 +0100 Subject: [PATCH 043/359] minix: Correct errno in minix_new_inode The cases (!j || j > sbi->s_ninodes) can never occur unless the filesystem is broken, so this should not return ENOSPC, but EFSCORRUPTED. Signed-off-by: Jori Koolstra Link: https://patch.msgid.link/20251201122338.90568-1-jkoolstra@xs4all.nl Reviewed-by: Jan Kara Signed-off-by: Christian Brauner --- fs/minix/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/minix/bitmap.c b/fs/minix/bitmap.c index 7da66ca184f4..abec438330a7 100644 --- a/fs/minix/bitmap.c +++ b/fs/minix/bitmap.c @@ -247,7 +247,7 @@ struct inode *minix_new_inode(const struct inode *dir, umode_t mode) j += i * bits_per_zone; if (!j || j > sbi->s_ninodes) { iput(inode); - return ERR_PTR(-ENOSPC); + return ERR_PTR(-EFSCORRUPTED); } inode_init_owner(&nop_mnt_idmap, inode, dir, mode); inode->i_ino = j; From 6c4b2243cb6c0755159bd567130d5e12e7b10d9f Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 7 Feb 2026 08:25:24 +0000 Subject: [PATCH 044/359] unshare: fix unshare_fs() handling There's an unpleasant corner case in unshare(2), when we have a CLONE_NEWNS in flags and current->fs hadn't been shared at all; in that case copy_mnt_ns() gets passed current->fs instead of a private copy, which causes interesting warts in proof of correctness] > I guess if private means fs->users == 1, the condition could still be true. Unfortunately, it's worse than just a convoluted proof of correctness. Consider the case when we have CLONE_NEWCGROUP in addition to CLONE_NEWNS (and current->fs->users == 1). We pass current->fs to copy_mnt_ns(), all right. Suppose it succeeds and flips current->fs->{pwd,root} to corresponding locations in the new namespace. Now we proceed to copy_cgroup_ns(), which fails (e.g. with -ENOMEM). We call put_mnt_ns() on the namespace created by copy_mnt_ns(), it's destroyed and its mount tree is dissolved, but... current->fs->root and current->fs->pwd are both left pointing to now detached mounts. They are pinning those, so it's not a UAF, but it leaves the calling process with unshare(2) failing with -ENOMEM _and_ leaving it with pwd and root on detached isolated mounts. The last part is clearly a bug. There is other fun related to that mess (races with pivot_root(), including the one between pivot_root() and fork(), of all things), but this one is easy to isolate and fix - treat CLONE_NEWNS as "allocate a new fs_struct even if it hadn't been shared in the first place". Sure, we could go for something like "if both CLONE_NEWNS *and* one of the things that might end up failing after copy_mnt_ns() call in create_new_namespaces() are set, force allocation of new fs_struct", but let's keep it simple - the cost of copy_fs_struct() is trivial. Another benefit is that copy_mnt_ns() with CLONE_NEWNS *always* gets a freshly allocated fs_struct, yet to be attached to anything. That seriously simplifies the analysis... FWIW, that bug had been there since the introduction of unshare(2) ;-/ Signed-off-by: Al Viro Link: https://patch.msgid.link/20260207082524.GE3183987@ZenIV Tested-by: Waiman Long Signed-off-by: Christian Brauner --- kernel/fork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index e832da9d15a4..65113a304518 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3085,7 +3085,7 @@ static int unshare_fs(unsigned long unshare_flags, struct fs_struct **new_fsp) return 0; /* don't need lock here; in the worst case we'll do useless copy */ - if (fs->users == 1) + if (!(unshare_flags & CLONE_NEWNS) && fs->users == 1) return 0; *new_fsp = copy_fs_struct(fs); From 7be41fb00e2c2a823f271a8318b453ca11812f1e Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 29 Oct 2025 08:30:11 +0300 Subject: [PATCH 045/359] accel: ethosu: Fix shift overflow in cmd_to_addr() The "((cmd[0] & 0xff0000) << 16)" shift is zero. This was intended to be (((u64)cmd[0] & 0xff0000) << 16). Move the cast to the correct location. Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver") Signed-off-by: Dan Carpenter Link: https://patch.msgid.link/aQGmY64tWcwOGFP4@stanley.mountain Signed-off-by: Rob Herring (Arm) --- drivers/accel/ethosu/ethosu_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accel/ethosu/ethosu_gem.c b/drivers/accel/ethosu/ethosu_gem.c index 473b5f5d7514..7b073116314b 100644 --- a/drivers/accel/ethosu/ethosu_gem.c +++ b/drivers/accel/ethosu/ethosu_gem.c @@ -154,7 +154,7 @@ static void cmd_state_init(struct cmd_state *st) static u64 cmd_to_addr(u32 *cmd) { - return ((u64)((cmd[0] & 0xff0000) << 16)) | cmd[1]; + return (((u64)cmd[0] & 0xff0000) << 16) | cmd[1]; } static u64 dma_length(struct ethosu_validated_cmdstream_info *info, From 249013e673fce3506c61063c7cbedd75b4c668d8 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 22:09:21 -0800 Subject: [PATCH 046/359] fsnotify: drop unused helper Remove this helper now that all users have been converted to fserror_report_metadata as of 7.0-rc1. Cc: jack@suse.cz Cc: amir73il@gmail.com Signed-off-by: Darrick J. Wong Link: https://patch.msgid.link/177148129543.716249.980530449513340111.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig Signed-off-by: Christian Brauner --- include/linux/fsnotify.h | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 28a9cb13fbfa..079c18bcdbde 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -495,19 +495,6 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid) fsnotify_dentry(dentry, mask); } -static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode, - int error) -{ - struct fs_error_report report = { - .error = error, - .inode = inode, - .sb = sb, - }; - - return fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, - NULL, NULL, NULL, 0); -} - static inline void fsnotify_mnt_attach(struct mnt_namespace *ns, struct vfsmount *mnt) { fsnotify_mnt(FS_MNT_ATTACH, ns, mnt); From 294f54f849d846f4643a67db9b41b63867dc8bfe Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 22:09:37 -0800 Subject: [PATCH 047/359] fserror: fix lockdep complaint when igrabbing inode Christoph Hellwig reported a lockdep splat in generic/108: ================================ WARNING: inconsistent lock state 6.19.0+ #4827 Tainted: G N -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff88811ed1b140 (&sb->s_type->i_lock_key#33){?.+.}-{3:3}, at: igrab+0x1a/0xb0 {HARDIRQ-ON-W} state was registered at: lock_acquire+0xca/0x2c0 _raw_spin_lock+0x2e/0x40 unlock_new_inode+0x2c/0xc0 xfs_iget+0xcf4/0x1080 xfs_trans_metafile_iget+0x3d/0x100 xfs_metafile_iget+0x2b/0x50 xfs_mount_setup_metadir+0x20/0x60 xfs_mountfs+0x457/0xa60 xfs_fs_fill_super+0x6b3/0xa90 get_tree_bdev_flags+0x13c/0x1e0 vfs_get_tree+0x27/0xe0 vfs_cmd_create+0x54/0xe0 __do_sys_fsconfig+0x309/0x620 do_syscall_64+0x8b/0xf80 entry_SYSCALL_64_after_hwframe+0x76/0x7e irq event stamp: 139080 hardirqs last enabled at (139079): [] do_idle+0x1ec/0x270 hardirqs last disabled at (139080): [] common_interrupt+0x19/0xe0 softirqs last enabled at (139032): [] __irq_exit_rcu+0xc3/0x120 softirqs last disabled at (139025): [] __irq_exit_rcu+0xc3/0x120 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sb->s_type->i_lock_key#33); lock(&sb->s_type->i_lock_key#33); *** DEADLOCK *** 1 lock held by swapper/1/0: #0: ffff8881052c81a0 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x4b/0x110 stack backtrace: CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G N 6.19.0+ #4827 PREEMPT(full) Tainted: [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x5b/0x80 print_usage_bug.part.0+0x22c/0x2c0 mark_lock+0xa6f/0xe90 __lock_acquire+0x10b6/0x25e0 lock_acquire+0xca/0x2c0 _raw_spin_lock+0x2e/0x40 igrab+0x1a/0xb0 fserror_report+0x135/0x260 iomap_finish_ioend_buffered+0x170/0x210 clone_endio+0x8f/0x1c0 blk_update_request+0x1e4/0x4d0 blk_mq_end_request+0x1b/0x100 virtblk_done+0x6f/0x110 vring_interrupt+0x59/0x80 __handle_irq_event_percpu+0x8a/0x2e0 handle_irq_event+0x33/0x70 handle_edge_irq+0xdd/0x1e0 __common_interrupt+0x6f/0x180 common_interrupt+0xb7/0xe0 It looks like the concern here is that inode::i_lock is sometimes taken in IRQ context, and sometimes it is held when going to IRQ context, though it's a little difficult to tell since I think this is a kernel from after the actual 6.19 release but before 7.0-rc1. Either way, we don't need to take i_lock, because filesystems should not report files to fserror if they're about to be freed or have not yet been exposed to other threads, because the resulting fsnotify report will be meaningless. Therefore, bump inode::i_count directly and clarify the preconditions on the inode being passed in. Link: https://lore.kernel.org/linux-fsdevel/aY7BndIgQg3ci_6s@infradead.org/ Reported-by: Christoph Hellwig Signed-off-by: Darrick J. Wong Link: https://patch.msgid.link/177148129564.716249.3069780698231701540.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig Signed-off-by: Christian Brauner --- fs/iomap/ioend.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c index e4d57cb969f1..4d1ef8a2cee9 100644 --- a/fs/iomap/ioend.c +++ b/fs/iomap/ioend.c @@ -69,11 +69,57 @@ static u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend) return folio_count; } +static DEFINE_SPINLOCK(failed_ioend_lock); +static LIST_HEAD(failed_ioend_list); + +static void +iomap_fail_ioends( + struct work_struct *work) +{ + struct iomap_ioend *ioend; + struct list_head tmp; + unsigned long flags; + + spin_lock_irqsave(&failed_ioend_lock, flags); + list_replace_init(&failed_ioend_list, &tmp); + spin_unlock_irqrestore(&failed_ioend_lock, flags); + + while ((ioend = list_first_entry_or_null(&tmp, struct iomap_ioend, + io_list))) { + list_del_init(&ioend->io_list); + iomap_finish_ioend_buffered(ioend); + cond_resched(); + } +} + +static DECLARE_WORK(failed_ioend_work, iomap_fail_ioends); + +static void iomap_fail_ioend_buffered(struct iomap_ioend *ioend) +{ + unsigned long flags; + + /* + * Bounce I/O errors to a workqueue to avoid nested i_lock acquisitions + * in the fserror code. The caller no longer owns the ioend reference + * after the spinlock drops. + */ + spin_lock_irqsave(&failed_ioend_lock, flags); + if (list_empty(&failed_ioend_list)) + WARN_ON_ONCE(!schedule_work(&failed_ioend_work)); + list_add_tail(&ioend->io_list, &failed_ioend_list); + spin_unlock_irqrestore(&failed_ioend_lock, flags); +} + static void ioend_writeback_end_bio(struct bio *bio) { struct iomap_ioend *ioend = iomap_ioend_from_bio(bio); ioend->io_error = blk_status_to_errno(bio->bi_status); + if (ioend->io_error) { + iomap_fail_ioend_buffered(ioend); + return; + } + iomap_finish_ioend_buffered(ioend); } From 023cd6d90f8aa2ef7b72d84be84a18e61ecebd64 Mon Sep 17 00:00:00 2001 From: Piotr Mazek Date: Thu, 5 Feb 2026 23:05:02 +0100 Subject: [PATCH 048/359] ACPI: PM: Save NVS memory on Lenovo G70-35 [821d6f0359b0614792ab8e2fb93b503e25a65079] prevented machines produced later than 2012 from saving NVS region to accelerate S3. Despite being made after 2012, Lenovo G70-35 still needs NVS memory saving during S3. A quirk is introduced for this platform. Signed-off-by: Piotr Mazek [ rjw: Subject adjustment ] Link: https://patch.msgid.link/GV2PPF3CD5B63CC2442EE3F76F8443EAD90D499A@GV2PPF3CD5B63CC.EURP251.PROD.OUTLOOK.COM Signed-off-by: Rafael J. Wysocki --- drivers/acpi/sleep.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c index 66ec81e306d4..132a9df98471 100644 --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -386,6 +386,14 @@ static const struct dmi_system_id acpisleep_dmi_table[] __initconst = { DMI_MATCH(DMI_PRODUCT_NAME, "80E1"), }, }, + { + .callback = init_nvs_save_s3, + .ident = "Lenovo G70-35", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "80Q5"), + }, + }, /* * ThinkPad X1 Tablet(2016) cannot do suspend-to-idle using * the Low Power S0 Idle firmware interface (see From ab140365fb62c0bdab22b2f516aff563b2559e3b Mon Sep 17 00:00:00 2001 From: Lars Ellenberg Date: Thu, 19 Feb 2026 15:20:12 +0100 Subject: [PATCH 049/359] drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Even though we check that we "should" be able to do lc_get_cumulative() while holding the device->al_lock spinlock, it may still fail, if some other code path decided to do lc_try_lock() with bad timing. If that happened, we logged "LOGIC BUG for enr=...", but still did not return an error. The rest of the code now assumed that this request has references for the relevant activity log extents. The implcations are that during an active resync, mutual exclusivity of resync versus application IO is not guaranteed. And a potential crash at this point may not realizs that these extents could have been target of in-flight IO and would need to be resynced just in case. Also, once the request completes, it will give up activity log references it does not even hold, which will trigger a BUG_ON(refcnt == 0) in lc_put(). Fix: Do not crash the kernel for a condition that is harmless during normal operation: also catch "e->refcnt == 0", not only "e == NULL" when being noisy about "al_complete_io() called on inactive extent %u\n". And do not try to be smart and "guess" whether something will work, then be surprised when it does not. Deal with the fact that it may or may not work. If it does not, remember a possible "partially in activity log" state (only possible for requests that cross extent boundaries), and return an error code from drbd_al_begin_io_nonblock(). A latter call for the same request will then resume from where we left off. Cc: stable@vger.kernel.org Signed-off-by: Lars Ellenberg Signed-off-by: Christoph Böhmwalder Signed-off-by: Jens Axboe --- drivers/block/drbd/drbd_actlog.c | 53 +++++++++++++----------------- drivers/block/drbd/drbd_interval.h | 5 ++- 2 files changed, 27 insertions(+), 31 deletions(-) diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c index 742b2908ff68..b3dbf6c76e98 100644 --- a/drivers/block/drbd/drbd_actlog.c +++ b/drivers/block/drbd/drbd_actlog.c @@ -483,38 +483,20 @@ void drbd_al_begin_io(struct drbd_device *device, struct drbd_interval *i) int drbd_al_begin_io_nonblock(struct drbd_device *device, struct drbd_interval *i) { - struct lru_cache *al = device->act_log; /* for bios crossing activity log extent boundaries, * we may need to activate two extents in one go */ unsigned first = i->sector >> (AL_EXTENT_SHIFT-9); unsigned last = i->size == 0 ? first : (i->sector + (i->size >> 9) - 1) >> (AL_EXTENT_SHIFT-9); - unsigned nr_al_extents; - unsigned available_update_slots; unsigned enr; - D_ASSERT(device, first <= last); - - nr_al_extents = 1 + last - first; /* worst case: all touched extends are cold. */ - available_update_slots = min(al->nr_elements - al->used, - al->max_pending_changes - al->pending_changes); - - /* We want all necessary updates for a given request within the same transaction - * We could first check how many updates are *actually* needed, - * and use that instead of the worst-case nr_al_extents */ - if (available_update_slots < nr_al_extents) { - /* Too many activity log extents are currently "hot". - * - * If we have accumulated pending changes already, - * we made progress. - * - * If we cannot get even a single pending change through, - * stop the fast path until we made some progress, - * or requests to "cold" extents could be starved. */ - if (!al->pending_changes) - __set_bit(__LC_STARVING, &device->act_log->flags); - return -ENOBUFS; + if (i->partially_in_al_next_enr) { + D_ASSERT(device, first < i->partially_in_al_next_enr); + D_ASSERT(device, last >= i->partially_in_al_next_enr); + first = i->partially_in_al_next_enr; } + D_ASSERT(device, first <= last); + /* Is resync active in this area? */ for (enr = first; enr <= last; enr++) { struct lc_element *tmp; @@ -529,14 +511,21 @@ int drbd_al_begin_io_nonblock(struct drbd_device *device, struct drbd_interval * } } - /* Checkout the refcounts. - * Given that we checked for available elements and update slots above, - * this has to be successful. */ + /* Try to checkout the refcounts. */ for (enr = first; enr <= last; enr++) { struct lc_element *al_ext; al_ext = lc_get_cumulative(device->act_log, enr); - if (!al_ext) - drbd_info(device, "LOGIC BUG for enr=%u\n", enr); + + if (!al_ext) { + /* Did not work. We may have exhausted the possible + * changes per transaction. Or raced with someone + * "locking" it against changes. + * Remember where to continue from. + */ + if (enr > first) + i->partially_in_al_next_enr = enr; + return -ENOBUFS; + } } return 0; } @@ -556,7 +545,11 @@ void drbd_al_complete_io(struct drbd_device *device, struct drbd_interval *i) for (enr = first; enr <= last; enr++) { extent = lc_find(device->act_log, enr); - if (!extent) { + /* Yes, this masks a bug elsewhere. However, during normal + * operation this is harmless, so no need to crash the kernel + * by the BUG_ON(refcount == 0) in lc_put(). + */ + if (!extent || extent->refcnt == 0) { drbd_err(device, "al_complete_io() called on inactive extent %u\n", enr); continue; } diff --git a/drivers/block/drbd/drbd_interval.h b/drivers/block/drbd/drbd_interval.h index 366489b72fe9..5d3213b81eed 100644 --- a/drivers/block/drbd/drbd_interval.h +++ b/drivers/block/drbd/drbd_interval.h @@ -8,12 +8,15 @@ struct drbd_interval { struct rb_node rb; sector_t sector; /* start sector of the interval */ - unsigned int size; /* size in bytes */ sector_t end; /* highest interval end in subtree */ + unsigned int size; /* size in bytes */ unsigned int local:1 /* local or remote request? */; unsigned int waiting:1; /* someone is waiting for completion */ unsigned int completed:1; /* this has been completed already; * ignore for conflict detection */ + + /* to resume a partially successful drbd_al_begin_io_nonblock(); */ + unsigned int partially_in_al_next_enr; }; static inline void drbd_clear_interval(struct drbd_interval *i) From 81b1f046ff8a5ad5da2c970cff354b61dfa1d6b1 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Mon, 12 Jan 2026 18:04:12 +0100 Subject: [PATCH 050/359] drbd: Replace deprecated strcpy with strscpy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit strcpy() has been deprecated [1] because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. Replace it with the safer strscpy(). No functional changes. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Thorsten Blum Reviewed-by: Christoph Böhmwalder Signed-off-by: Jens Axboe --- drivers/block/drbd/drbd_main.c | 14 +++++++++----- drivers/block/drbd/drbd_receiver.c | 4 ++-- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index 1f6ac9202b66..ba00b5f21a49 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -732,9 +733,9 @@ int drbd_send_sync_param(struct drbd_peer_device *peer_device) } if (apv >= 88) - strcpy(p->verify_alg, nc->verify_alg); + strscpy(p->verify_alg, nc->verify_alg); if (apv >= 89) - strcpy(p->csums_alg, nc->csums_alg); + strscpy(p->csums_alg, nc->csums_alg); rcu_read_unlock(); return drbd_send_command(peer_device, sock, cmd, size, NULL, 0); @@ -745,6 +746,7 @@ int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_packet cm struct drbd_socket *sock; struct p_protocol *p; struct net_conf *nc; + size_t integrity_alg_len; int size, cf; sock = &connection->data; @@ -762,8 +764,10 @@ int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_packet cm } size = sizeof(*p); - if (connection->agreed_pro_version >= 87) - size += strlen(nc->integrity_alg) + 1; + if (connection->agreed_pro_version >= 87) { + integrity_alg_len = strlen(nc->integrity_alg) + 1; + size += integrity_alg_len; + } p->protocol = cpu_to_be32(nc->wire_protocol); p->after_sb_0p = cpu_to_be32(nc->after_sb_0p); @@ -778,7 +782,7 @@ int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_packet cm p->conn_flags = cpu_to_be32(cf); if (connection->agreed_pro_version >= 87) - strcpy(p->integrity_alg, nc->integrity_alg); + strscpy(p->integrity_alg, nc->integrity_alg, integrity_alg_len); rcu_read_unlock(); return __conn_send_command(connection, sock, cmd, size, NULL, 0); diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index 3de919b6f0e1..3d0a061b6c40 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -3801,14 +3801,14 @@ static int receive_SyncParam(struct drbd_connection *connection, struct packet_i *new_net_conf = *old_net_conf; if (verify_tfm) { - strcpy(new_net_conf->verify_alg, p->verify_alg); + strscpy(new_net_conf->verify_alg, p->verify_alg); new_net_conf->verify_alg_len = strlen(p->verify_alg) + 1; crypto_free_shash(peer_device->connection->verify_tfm); peer_device->connection->verify_tfm = verify_tfm; drbd_info(device, "using verify-alg: \"%s\"\n", p->verify_alg); } if (csums_tfm) { - strcpy(new_net_conf->csums_alg, p->csums_alg); + strscpy(new_net_conf->csums_alg, p->csums_alg); new_net_conf->csums_alg_len = strlen(p->csums_alg) + 1; crypto_free_shash(peer_device->connection->csums_tfm); peer_device->connection->csums_tfm = csums_tfm; From 858d2a4f67ff69e645a43487ef7ea7f28f06deae Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 17 Feb 2026 16:12:05 +0000 Subject: [PATCH 051/359] tcp: fix potential race in tcp_v6_syn_recv_sock() Code in tcp_v6_syn_recv_sock() after the call to tcp_v4_syn_recv_sock() is done too late. After tcp_v4_syn_recv_sock(), the child socket is already visible from TCP ehash table and other cpus might use it. Since newinet->pinet6 is still pointing to the listener ipv6_pinfo bad things can happen as syzbot found. Move the problematic code in tcp_v6_mapped_child_init() and call this new helper from tcp_v4_syn_recv_sock() before the ehash insertion. This allows the removal of one tcp_sync_mss(), since tcp_v4_syn_recv_sock() will call it with the correct context. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+937b5bbb6a815b3e5d0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69949275.050a0220.2eeac1.0145.GAE@google.com/ Signed-off-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260217161205.2079883-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/inet_connection_sock.h | 4 +- include/net/tcp.h | 4 +- net/ipv4/syncookies.c | 2 +- net/ipv4/tcp_fastopen.c | 2 +- net/ipv4/tcp_ipv4.c | 8 ++- net/ipv4/tcp_minisocks.c | 2 +- net/ipv6/tcp_ipv6.c | 98 +++++++++++++----------------- net/mptcp/subflow.c | 6 +- net/smc/af_smc.c | 6 +- 9 files changed, 66 insertions(+), 66 deletions(-) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index ecb362025c4e..5cb3056d6ddc 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -42,7 +42,9 @@ struct inet_connection_sock_af_ops { struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, - bool *own_req); + bool *own_req, + void (*opt_child_init)(struct sock *newsk, + const struct sock *sk)); u16 net_header_len; int (*setsockopt)(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); diff --git a/include/net/tcp.h b/include/net/tcp.h index 40e72b9cb85f..eb8bf63fdafc 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -544,7 +544,9 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, - bool *own_req); + bool *own_req, + void (*opt_child_init)(struct sock *newsk, + const struct sock *sk)); int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); int tcp_v4_connect(struct sock *sk, struct sockaddr_unsized *uaddr, int addr_len); int tcp_connect(struct sock *sk); diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 569befcf021b..061751aabc8e 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -203,7 +203,7 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb, bool own_req; child = icsk->icsk_af_ops->syn_recv_sock(sk, skb, req, dst, - NULL, &own_req); + NULL, &own_req, NULL); if (child) { refcount_set(&req->rsk_refcnt, 1); sock_rps_save_rxhash(child, skb); diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index b30090cff3cf..df803f088f45 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -333,7 +333,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk, bool own_req; child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL, - NULL, &own_req); + NULL, &own_req, NULL); if (!child) return NULL; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 6264fc0b2be5..2980c175d326 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1705,7 +1705,9 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, - bool *own_req) + bool *own_req, + void (*opt_child_init)(struct sock *newsk, + const struct sock *sk)) { struct inet_request_sock *ireq; bool found_dup_sk = false; @@ -1757,6 +1759,10 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, } sk_setup_caps(newsk, dst); +#if IS_ENABLED(CONFIG_IPV6) + if (opt_child_init) + opt_child_init(newsk, sk); +#endif tcp_ca_openreq_child(newsk, dst); tcp_sync_mss(newsk, dst4_mtu(dst)); diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index ec128865f5c0..d9c5a43bd281 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -925,7 +925,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, * socket is created, wait for troubles. */ child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL, - req, &own_req); + req, &own_req, NULL); if (!child) goto listen_overflow; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index d10487b4e5bf..e46a0efae012 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1312,11 +1312,48 @@ static void tcp_v6_restore_cb(struct sk_buff *skb) sizeof(struct inet6_skb_parm)); } +/* Called from tcp_v4_syn_recv_sock() for v6_mapped children. */ +static void tcp_v6_mapped_child_init(struct sock *newsk, const struct sock *sk) +{ + struct inet_sock *newinet = inet_sk(newsk); + struct ipv6_pinfo *newnp; + + newinet->pinet6 = newnp = tcp_inet6_sk(newsk); + newinet->ipv6_fl_list = NULL; + + memcpy(newnp, tcp_inet6_sk(sk), sizeof(struct ipv6_pinfo)); + + newnp->saddr = newsk->sk_v6_rcv_saddr; + + inet_csk(newsk)->icsk_af_ops = &ipv6_mapped; + if (sk_is_mptcp(newsk)) + mptcpv6_handle_mapped(newsk, true); + newsk->sk_backlog_rcv = tcp_v4_do_rcv; +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) + tcp_sk(newsk)->af_specific = &tcp_sock_ipv6_mapped_specific; +#endif + + newnp->ipv6_mc_list = NULL; + newnp->ipv6_ac_list = NULL; + newnp->pktoptions = NULL; + newnp->opt = NULL; + + /* tcp_v4_syn_recv_sock() has initialized newinet->mc_{index,ttl} */ + newnp->mcast_oif = newinet->mc_index; + newnp->mcast_hops = newinet->mc_ttl; + + newnp->rcv_flowinfo = 0; + if (inet6_test_bit(REPFLOW, sk)) + newnp->flow_label = 0; +} + static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, - bool *own_req) + bool *own_req, + void (*opt_child_init)(struct sock *newsk, + const struct sock *sk)) { const struct ipv6_pinfo *np = tcp_inet6_sk(sk); struct inet_request_sock *ireq; @@ -1332,61 +1369,10 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * #endif struct flowi6 fl6; - if (skb->protocol == htons(ETH_P_IP)) { - /* - * v6 mapped - */ - - newsk = tcp_v4_syn_recv_sock(sk, skb, req, dst, - req_unhash, own_req); - - if (!newsk) - return NULL; - - newinet = inet_sk(newsk); - newinet->pinet6 = tcp_inet6_sk(newsk); - newinet->ipv6_fl_list = NULL; - - newnp = tcp_inet6_sk(newsk); - newtp = tcp_sk(newsk); - - memcpy(newnp, np, sizeof(struct ipv6_pinfo)); - - newnp->saddr = newsk->sk_v6_rcv_saddr; - - inet_csk(newsk)->icsk_af_ops = &ipv6_mapped; - if (sk_is_mptcp(newsk)) - mptcpv6_handle_mapped(newsk, true); - newsk->sk_backlog_rcv = tcp_v4_do_rcv; -#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) - newtp->af_specific = &tcp_sock_ipv6_mapped_specific; -#endif - - newnp->ipv6_mc_list = NULL; - newnp->ipv6_ac_list = NULL; - newnp->pktoptions = NULL; - newnp->opt = NULL; - newnp->mcast_oif = inet_iif(skb); - newnp->mcast_hops = ip_hdr(skb)->ttl; - newnp->rcv_flowinfo = 0; - if (inet6_test_bit(REPFLOW, sk)) - newnp->flow_label = 0; - - /* - * No need to charge this sock to the relevant IPv6 refcnt debug socks count - * here, tcp_create_openreq_child now does this for us, see the comment in - * that function for the gory details. -acme - */ - - /* It is tricky place. Until this moment IPv4 tcp - worked with IPv6 icsk.icsk_af_ops. - Sync it now. - */ - tcp_sync_mss(newsk, inet_csk(newsk)->icsk_pmtu_cookie); - - return newsk; - } - + if (skb->protocol == htons(ETH_P_IP)) + return tcp_v4_syn_recv_sock(sk, skb, req, dst, + req_unhash, own_req, + tcp_v6_mapped_child_init); ireq = inet_rsk(req); if (sk_acceptq_is_full(sk)) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index f66129f1e649..8b25dd86ffeb 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -808,7 +808,9 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, - bool *own_req) + bool *own_req, + void (*opt_child_init)(struct sock *newsk, + const struct sock *sk)) { struct mptcp_subflow_context *listener = mptcp_subflow_ctx(sk); struct mptcp_subflow_request_sock *subflow_req; @@ -855,7 +857,7 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, create_child: child = listener->icsk_af_ops->syn_recv_sock(sk, skb, req, dst, - req_unhash, own_req); + req_unhash, own_req, opt_child_init); if (child && *own_req) { struct mptcp_subflow_context *ctx = mptcp_subflow_ctx(child); diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index d8201eb3ac5f..18c56b0d7ad5 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -124,7 +124,9 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, struct request_sock *req, struct dst_entry *dst, struct request_sock *req_unhash, - bool *own_req) + bool *own_req, + void (*opt_child_init)(struct sock *newsk, + const struct sock *sk)) { struct smc_sock *smc; struct sock *child; @@ -142,7 +144,7 @@ static struct sock *smc_tcp_syn_recv_sock(const struct sock *sk, /* passthrough to original syn recv sock fct */ child = smc->ori_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, - own_req); + own_req, opt_child_init); /* child must not inherit smc or its ops */ if (child) { rcu_assign_sk_user_data(child, NULL); From f891007ab1c77436950d10e09eae54507f1865ff Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 18 Feb 2026 14:13:37 +0000 Subject: [PATCH 052/359] psp: use sk->sk_hash in psp_write_headers() udp_flow_src_port() is indirectly using sk->sk_txhash as a base, because __tcp_transmit_skb() uses skb_set_hash_from_sk(). This is problematic because this field can change over the lifetime of a TCP flow, thanks to calls to sk_rethink_txhash(). Problem is that some NIC might (ab)use the PSP UDP source port in their RSS computation, and PSP packets for a given flow could jump from one queue to another. In order to avoid surprises, it is safer to let Protective Load Balancing (PLB) get its entropy from the IPv6 flowlabel, and change psp_write_headers() to use sk->sk_hash which does not change for the duration of the flow. We might add a sysctl to select the behavior, if there is a need for it. Fixes: fc724515741a ("psp: provide encapsulation helper for drivers") Signed-off-by: Eric Dumazet Reviewed-By: Daniel Zahka Link: https://patch.msgid.link/20260218141337.999945-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/psp/psp_main.c | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c index a8534124f626..066222eb56c4 100644 --- a/net/psp/psp_main.c +++ b/net/psp/psp_main.c @@ -166,9 +166,46 @@ static void psp_write_headers(struct net *net, struct sk_buff *skb, __be32 spi, { struct udphdr *uh = udp_hdr(skb); struct psphdr *psph = (struct psphdr *)(uh + 1); + const struct sock *sk = skb->sk; uh->dest = htons(PSP_DEFAULT_UDP_PORT); - uh->source = udp_flow_src_port(net, skb, 0, 0, false); + + /* A bit of theory: Selection of the source port. + * + * We need some entropy, so that multiple flows use different + * source ports for better RSS spreading at the receiver. + * + * We also need that all packets belonging to one TCP flow + * use the same source port through their duration, + * so that all these packets land in the same receive queue. + * + * udp_flow_src_port() is using sk_txhash, inherited from + * skb_set_hash_from_sk() call in __tcp_transmit_skb(). + * This field is subject to reshuffling, thanks to + * sk_rethink_txhash() calls in various TCP functions. + * + * Instead, use sk->sk_hash which is constant through + * the whole flow duration. + */ + if (likely(sk)) { + u32 hash = sk->sk_hash; + int min, max; + + /* These operations are cheap, no need to cache the result + * in another socket field. + */ + inet_get_local_port_range(net, &min, &max); + /* Since this is being sent on the wire obfuscate hash a bit + * to minimize possibility that any useful information to an + * attacker is leaked. Only upper 16 bits are relevant in the + * computation for 16 bit port value because we use a + * reciprocal divide. + */ + hash ^= hash << 16; + uh->source = htons((((u64)hash * (max - min)) >> 32) + min); + } else { + uh->source = udp_flow_src_port(net, skb, 0, 0, false); + } uh->check = 0; uh->len = htons(udp_len); From e1512c1db9e8794d8d130addd2615ec27231d994 Mon Sep 17 00:00:00 2001 From: Hyunwoo Kim Date: Wed, 18 Feb 2026 02:16:43 +0900 Subject: [PATCH 053/359] espintcp: Fix race condition in espintcp_close() This issue was discovered during a code audit. After cancel_work_sync() is called from espintcp_close(), espintcp_tx_work() can still be scheduled from paths such as the Delayed ACK handler or ksoftirqd. As a result, the espintcp_tx_work() worker may dereference a freed espintcp ctx or sk. The following is a simple race scenario: cpu0 cpu1 espintcp_close() cancel_work_sync(&ctx->work); espintcp_write_space() schedule_work(&ctx->work); To prevent this race condition, cancel_work_sync() is replaced with disable_work_sync(). Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Hyunwoo Kim Reviewed-by: Simon Horman Link: https://patch.msgid.link/aZSie7rEdh9Nu0eM@v4bel Signed-off-by: Jakub Kicinski --- net/xfrm/espintcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c index bf744ac9d5a7..8709df716e98 100644 --- a/net/xfrm/espintcp.c +++ b/net/xfrm/espintcp.c @@ -536,7 +536,7 @@ static void espintcp_close(struct sock *sk, long timeout) sk->sk_prot = &tcp_prot; barrier(); - cancel_work_sync(&ctx->work); + disable_work_sync(&ctx->work); strp_done(&ctx->strp); skb_queue_purge(&ctx->out_queue); From 64868f5ecadeb359a49bc4485bfa7c497047f13a Mon Sep 17 00:00:00 2001 From: Ziyi Guo Date: Tue, 17 Feb 2026 17:50:12 +0000 Subject: [PATCH 054/359] net: usb: kaweth: remove TX queue manipulation in kaweth_set_rx_mode kaweth_set_rx_mode(), the ndo_set_rx_mode callback, calls netif_stop_queue() and netif_wake_queue(). These are TX queue flow control functions unrelated to RX multicast configuration. The premature netif_wake_queue() can re-enable TX while tx_urb is still in-flight, leading to a double usb_submit_urb() on the same URB: kaweth_start_xmit() { netif_stop_queue(); usb_submit_urb(kaweth->tx_urb); } kaweth_set_rx_mode() { netif_stop_queue(); netif_wake_queue(); // wakes TX queue before URB is done } kaweth_start_xmit() { netif_stop_queue(); usb_submit_urb(kaweth->tx_urb); // URB submitted while active } This triggers the WARN in usb_submit_urb(): "URB submitted while active" This is a similar class of bug fixed in rtl8150 by - commit 958baf5eaee3 ("net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast"). Also kaweth_set_rx_mode() is already functionally broken, the real set_rx_mode action is performed by kaweth_async_set_rx_mode(), which in turn is not a no-op only at ndo_open() time. Suggested-by: Paolo Abeni Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ziyi Guo Link: https://patch.msgid.link/20260217175012.1234494-1-n7l8m4@u.northwestern.edu Signed-off-by: Jakub Kicinski --- drivers/net/usb/kaweth.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c index c9efb7df892e..e01d14f6c366 100644 --- a/drivers/net/usb/kaweth.c +++ b/drivers/net/usb/kaweth.c @@ -765,7 +765,6 @@ static void kaweth_set_rx_mode(struct net_device *net) netdev_dbg(net, "Setting Rx mode to %d\n", packet_filter_bitmap); - netif_stop_queue(net); if (net->flags & IFF_PROMISC) { packet_filter_bitmap |= KAWETH_PACKET_FILTER_PROMISCUOUS; @@ -775,7 +774,6 @@ static void kaweth_set_rx_mode(struct net_device *net) } kaweth->packet_filter_bitmap = packet_filter_bitmap; - netif_wake_queue(net); } /**************************************************************** From f1e2f0ce704e4a14e3f367d3b97d3dd2d8e183b7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Martin=20P=C3=A5lsson?= Date: Wed, 18 Feb 2026 05:28:22 +0000 Subject: [PATCH 055/359] net: usb: lan78xx: scan all MDIO addresses on LAN7801 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The LAN7801 is designed exclusively for external PHYs (unlike the LAN7800/LAN7850 which have internal PHYs), but lan78xx_mdio_init() restricts PHY scanning to MDIO addresses 0-7 by setting phy_mask to ~(0xFF). This prevents discovery of external PHYs wired to addresses outside that range. One such case is the DP83TC814 100BASE-T1 PHY, which is typically configured at MDIO address 10 via PHYAD bootstrap pins and goes undetected with the current mask. Remove the restrictive phy_mask assignment for the LAN7801 so that the default mask of 0 applies, allowing all 32 MDIO addresses to be scanned during bus registration. Fixes: 02dc1f3d613d ("lan78xx: add LAN7801 MAC only support") Signed-off-by: Martin Pålsson Link: https://patch.msgid.link/0110019c6f388aff-98d99cf0-4425-4fff-b16b-dea5ad8fafe0-000000@eu-north-1.amazonses.com Signed-off-by: Jakub Kicinski --- drivers/net/usb/lan78xx.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index 00397a807393..065588c9cfa6 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2094,8 +2094,6 @@ static int lan78xx_mdio_init(struct lan78xx_net *dev) dev->mdiobus->phy_mask = ~(1 << 1); break; case ID_REV_CHIP_ID_7801_: - /* scan thru PHYAD[2..0] */ - dev->mdiobus->phy_mask = ~(0xFF); break; } From bfd264fbbbca7a39ea430d7bc2baf8ee2ea958e4 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 18 Feb 2026 18:05:51 +0200 Subject: [PATCH 056/359] net: dsa: sja1105: protect link replay helpers against NULL phylink instance There is a crash when unbinding the sja1105 driver under special circumstances: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030 Call trace: phylink_run_resolve_and_disable+0x10/0x90 sja1105_static_config_reload+0xc0/0x410 sja1105_vlan_filtering+0x100/0x140 dsa_port_vlan_filtering+0x13c/0x368 dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198 dsa_port_bridge_leave+0x130/0x248 dsa_user_changeupper.part.0+0x74/0x158 dsa_user_netdevice_event+0x50c/0xa50 notifier_call_chain+0x78/0x148 raw_notifier_call_chain+0x20/0x38 call_netdevice_notifiers_info+0x58/0xa8 __netdev_upper_dev_unlink+0xac/0x220 netdev_upper_dev_unlink+0x38/0x70 del_nbp+0x1a4/0x320 br_del_if+0x3c/0xd8 br_device_event+0xf8/0x2d8 notifier_call_chain+0x78/0x148 raw_notifier_call_chain+0x20/0x38 call_netdevice_notifiers_info+0x58/0xa8 unregister_netdevice_many_notify+0x314/0x848 unregister_netdevice_queue+0xe8/0xf8 dsa_user_destroy+0x50/0xa8 dsa_port_teardown+0x80/0x98 dsa_switch_teardown_ports+0x4c/0xb8 dsa_switch_deinit+0x94/0xb8 dsa_switch_put_tree+0x2c/0xc0 dsa_unregister_switch+0x38/0x60 sja1105_remove+0x24/0x40 spi_remove+0x38/0x60 device_remove+0x54/0x90 device_release_driver_internal+0x1d4/0x230 device_driver_detach+0x20/0x38 unbind_store+0xbc/0xc8 ---[ end trace 0000000000000000 ]--- which requires an explanation. When a port offloads a bridge, the switch must be reset to change the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for sja1105_static_config_reload()). When the port leaves a VLAN-aware bridge, it must also be reset for the same reason: it is returning to operation as a VLAN-unaware standalone port. sja1105_static_config_reload() triggers the phylink link replay helpers. Because sja1105 is a switch, it has multiple user ports. During unbind, ports are torn down one by one in dsa_switch_teardown_ports() -> dsa_port_teardown() -> dsa_user_destroy(). The crash happens when the first user port is not part of the VLAN-aware bridge, but any other user port is. Tearing down the first user port causes phylink_destroy() to be called on dp->pl, and this pointer to be set to NULL. Then, when the second user port is torn down, this was offloading a VLAN-aware bridge port, so indirectly it will trigger sja1105_static_config_reload(). The latter function iterates using dsa_switch_for_each_available_port(), and unconditionally dereferences dp->pl, including for the aforementioned torn down previous port, and passes that to phylink. This is where the NULL pointer is coming from. There are multiple levels at which this could be avoided: - add an "if (dp->pl)" in sja1105_static_config_reload() - make the phylink replay helpers NULL-tolerant - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy() has run, such that subsequent dsa_switch_for_each_available_port() iterations skip them - disconnect the entire switch at once from switchdev and NETDEV_CHANGEUPPER events while unbinding, not just port by port, likely using a "ds->unbinding = true" mechanism or similar however options 3 and 4 are quite heavy and might have side effects. Although 2 allows to keep the driver simpler, the phylink API it not NULL-tolerant in general and is not responsible for the NULL pointer (this is something done by dsa_port_phylink_destroy()). So I went with 1. Functionally speaking, skipping the replay helpers for ports without a phylink instance is fine, because that only happens during driver removal (an operation which cannot be cancelled). The ports are not required to work (although they probably still will - untested assumption - as long as we don't overwrite the last port speed with SJA1105_SPEED_AUTO). Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks") Signed-off-by: Vladimir Oltean Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20260218160551.194782-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/sja1105/sja1105_main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 71a817c07a90..94d17b06da40 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -2278,6 +2278,12 @@ int sja1105_static_config_reload(struct sja1105_private *priv, * change it through the dynamic interface later. */ dsa_switch_for_each_available_port(dp, ds) { + /* May be called during unbind when we unoffload a VLAN-aware + * bridge from port 1 while port 0 was already torn down + */ + if (!dp->pl) + continue; + phylink_replay_link_begin(dp->pl); mac[dp->index].speed = priv->info->port_speed[SJA1105_SPEED_AUTO]; } @@ -2334,7 +2340,8 @@ int sja1105_static_config_reload(struct sja1105_private *priv, } dsa_switch_for_each_available_port(dp, ds) - phylink_replay_link_end(dp->pl); + if (dp->pl) + phylink_replay_link_end(dp->pl); rc = sja1105_reload_cbs(priv); if (rc < 0) From f6a495484a27150fb85f943e1a7464da88c2a797 Mon Sep 17 00:00:00 2001 From: Ethan Tidmore Date: Thu, 19 Feb 2026 16:10:01 -0600 Subject: [PATCH 057/359] proc: Fix pointer error dereference The function try_lookup_noperm() can return an error pointer. Add check for error pointer. Detected by Smatch: fs/proc/base.c:2148 proc_fill_cache() error: 'child' dereferencing possible ERR_PTR() Fixes: 1df98b8bbcca ("proc_fill_cache(): clean up, get rid of pointless find_inode_number() use") Signed-off-by: Ethan Tidmore Link: https://patch.msgid.link/20260219221001.1117135-1-ethantidmore06@gmail.com Signed-off-by: Christian Brauner --- fs/proc/base.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/proc/base.c b/fs/proc/base.c index 4eec684baca9..4c863d17dfb4 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2128,6 +2128,9 @@ bool proc_fill_cache(struct file *file, struct dir_context *ctx, ino_t ino = 1; child = try_lookup_noperm(&qname, dir); + if (IS_ERR(child)) + goto end_instantiate; + if (!child) { DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); child = d_alloc_parallel(dir, &qname, &wq); From c5f8658f97ec392eeaf355d4e9775ae1f23ca1d3 Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Wed, 4 Feb 2026 17:06:29 +0800 Subject: [PATCH 058/359] drm/imx: parallel-display: check return value of devm_drm_bridge_add() in imx_pd_probe() Return the value of devm_drm_bridge_add() in order to propagate the error properly, if it fails due to resource allocation failure or bridge registration failure. This ensures that the probe function fails safely rather than proceeding with a potentially incomplete bridge setup. Fixes: bf7e97910b9f ("drm/imx: parallel-display: add the bridge before attaching it") Signed-off-by: Chen Ni Reviewed-by: Luca Ceresoli Link: https://patch.msgid.link/20260204090629.2209542-1-nichen@iscas.ac.cn Signed-off-by: Luca Ceresoli --- drivers/gpu/drm/imx/ipuv3/parallel-display.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/imx/ipuv3/parallel-display.c b/drivers/gpu/drm/imx/ipuv3/parallel-display.c index 6fbf505d2801..590120a33fa0 100644 --- a/drivers/gpu/drm/imx/ipuv3/parallel-display.c +++ b/drivers/gpu/drm/imx/ipuv3/parallel-display.c @@ -256,7 +256,9 @@ static int imx_pd_probe(struct platform_device *pdev) platform_set_drvdata(pdev, imxpd); - devm_drm_bridge_add(dev, &imxpd->bridge); + ret = devm_drm_bridge_add(dev, &imxpd->bridge); + if (ret) + return ret; return component_add(dev, &imx_pd_ops); } From 496daa2759260374bb9c9b2196a849aa3bc513a8 Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Fri, 6 Feb 2026 12:06:21 +0800 Subject: [PATCH 059/359] drm/bridge: synopsys: dw-dp: Check return value of devm_drm_bridge_add() in dw_dp_bind() Return the value of devm_drm_bridge_add() in order to propagate the error properly, if it fails due to resource allocation failure or bridge registration failure. This ensures that the bind function fails safely rather than proceeding with a potentially incomplete bridge setup. Fixes: b726970486d8 ("drm/bridge: synopsys: dw-dp: add bridge before attaching") Signed-off-by: Chen Ni Reviewed-by: Andy Yan Reviewed-by: Luca Ceresoli Link: https://patch.msgid.link/20260206040621.4095517-1-nichen@iscas.ac.cn Signed-off-by: Luca Ceresoli --- drivers/gpu/drm/bridge/synopsys/dw-dp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/bridge/synopsys/dw-dp.c b/drivers/gpu/drm/bridge/synopsys/dw-dp.c index 432342452484..07f7a2e0d9f2 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-dp.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-dp.c @@ -2049,7 +2049,9 @@ struct dw_dp *dw_dp_bind(struct device *dev, struct drm_encoder *encoder, bridge->type = DRM_MODE_CONNECTOR_DisplayPort; bridge->ycbcr_420_allowed = true; - devm_drm_bridge_add(dev, bridge); + ret = devm_drm_bridge_add(dev, bridge); + if (ret) + return ERR_PTR(ret); dp->aux.dev = dev; dp->aux.drm_dev = encoder->dev; From 803ec1faf7c1823e6e3b1f2aaa81be18528c9436 Mon Sep 17 00:00:00 2001 From: Osama Abdelkader Date: Mon, 9 Feb 2026 19:41:14 +0100 Subject: [PATCH 060/359] drm/bridge: samsung-dsim: Fix memory leak in error path In samsung_dsim_host_attach(), drm_bridge_add() is called to add the bridge. However, if samsung_dsim_register_te_irq() or pdata->host_ops->attach() fails afterwards, the function returns without removing the bridge, causing a memory leak. Fix this by adding proper error handling with goto labels to ensure drm_bridge_remove() is called in all error paths. Also ensure that samsung_dsim_unregister_te_irq() is called if the attach operation fails after the TE IRQ has been registered. samsung_dsim_unregister_te_irq() function is moved without changes to be before samsung_dsim_host_attach() to avoid forward declaration. Fixes: e7447128ca4a ("drm: bridge: Generalize Exynos-DSI driver into a Samsung DSIM bridge") Cc: stable@vger.kernel.org Signed-off-by: Osama Abdelkader Reviewed-by: Luca Ceresoli Link: https://patch.msgid.link/20260209184115.10937-1-osama.abdelkader@gmail.com Signed-off-by: Luca Ceresoli --- drivers/gpu/drm/bridge/samsung-dsim.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/bridge/samsung-dsim.c b/drivers/gpu/drm/bridge/samsung-dsim.c index eabc4c32f6ab..ad8c6aa49d48 100644 --- a/drivers/gpu/drm/bridge/samsung-dsim.c +++ b/drivers/gpu/drm/bridge/samsung-dsim.c @@ -1881,6 +1881,14 @@ static int samsung_dsim_register_te_irq(struct samsung_dsim *dsi, struct device return 0; } +static void samsung_dsim_unregister_te_irq(struct samsung_dsim *dsi) +{ + if (dsi->te_gpio) { + free_irq(gpiod_to_irq(dsi->te_gpio), dsi); + gpiod_put(dsi->te_gpio); + } +} + static int samsung_dsim_host_attach(struct mipi_dsi_host *host, struct mipi_dsi_device *device) { @@ -1955,13 +1963,13 @@ of_find_panel_or_bridge: if (!(device->mode_flags & MIPI_DSI_MODE_VIDEO)) { ret = samsung_dsim_register_te_irq(dsi, &device->dev); if (ret) - return ret; + goto err_remove_bridge; } if (pdata->host_ops && pdata->host_ops->attach) { ret = pdata->host_ops->attach(dsi, device); if (ret) - return ret; + goto err_unregister_te_irq; } dsi->lanes = device->lanes; @@ -1969,14 +1977,13 @@ of_find_panel_or_bridge: dsi->mode_flags = device->mode_flags; return 0; -} -static void samsung_dsim_unregister_te_irq(struct samsung_dsim *dsi) -{ - if (dsi->te_gpio) { - free_irq(gpiod_to_irq(dsi->te_gpio), dsi); - gpiod_put(dsi->te_gpio); - } +err_unregister_te_irq: + if (!(device->mode_flags & MIPI_DSI_MODE_VIDEO)) + samsung_dsim_unregister_te_irq(dsi); +err_remove_bridge: + drm_bridge_remove(&dsi->bridge); + return ret; } static int samsung_dsim_host_detach(struct mipi_dsi_host *host, From 0d195d3b205ca90db30d70d09d7bb6909aac178f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christoph=20B=C3=B6hmwalder?= Date: Fri, 20 Feb 2026 12:39:37 +0100 Subject: [PATCH 061/359] drbd: fix null-pointer dereference on local read error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In drbd_request_endio(), READ_COMPLETED_WITH_ERROR is passed to __req_mod() with a NULL peer_device: __req_mod(req, what, NULL, &m); The READ_COMPLETED_WITH_ERROR handler then unconditionally passes this NULL peer_device to drbd_set_out_of_sync(), which dereferences it, causing a null-pointer dereference. Fix this by obtaining the peer_device via first_peer_device(device), matching how drbd_req_destroy() handles the same situation. Cc: stable@vger.kernel.org Reported-by: Tuo Li Link: https://lore.kernel.org/linux-block/20260104165355.151864-1-islituo@gmail.com Signed-off-by: Christoph Böhmwalder Signed-off-by: Jens Axboe --- drivers/block/drbd/drbd_req.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c index d15826f6ee81..70f75ef07945 100644 --- a/drivers/block/drbd/drbd_req.c +++ b/drivers/block/drbd/drbd_req.c @@ -621,7 +621,8 @@ int __req_mod(struct drbd_request *req, enum drbd_req_event what, break; case READ_COMPLETED_WITH_ERROR: - drbd_set_out_of_sync(peer_device, req->i.sector, req->i.size); + drbd_set_out_of_sync(first_peer_device(device), + req->i.sector, req->i.size); drbd_report_io_error(device, req); __drbd_chk_io_error(device, DRBD_READ_ERROR); fallthrough; From 9b8eeccd7110d10f9c1b1aa456f5efac23e4e383 Mon Sep 17 00:00:00 2001 From: Satish Kharat Date: Thu, 19 Feb 2026 11:24:00 -0800 Subject: [PATCH 062/359] MAINTAINERS: update enic and usnic maintainers Remove Christian Benvenuti from the enic/usnic maintainer lists because he is no longer working on the drivers and the email address is no longer valid. Keep Satish Kharat as the enic maintainer and the usnic co-maintainer. Signed-off-by: Satish Kharat Acked-by: Leon Romanovsky Link: https://patch.msgid.link/20260219-enic-maintainers-update-v2-v1-1-c58aa11c2ea8@cisco.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index b8d8a5c41597..c00b5f5d4203 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6219,14 +6219,13 @@ S: Supported F: drivers/scsi/snic/ CISCO VIC ETHERNET NIC DRIVER -M: Christian Benvenuti M: Satish Kharat S: Maintained F: drivers/net/ethernet/cisco/enic/ CISCO VIC LOW LATENCY NIC DRIVER -M: Christian Benvenuti M: Nelson Escobar +M: Satish Kharat S: Supported F: drivers/infiniband/hw/usnic/ From 2bb995e6155cb4f254574598cbd6fe1dcc99766a Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Wed, 18 Feb 2026 16:56:00 -0800 Subject: [PATCH 063/359] net: phy: qcom: qca807x: normalize return value of gpio_get The GPIO get callback is expected to return 0 or 1 (or a negative error code). Ensure that the value returned by qca807x_gpio_get() is normalized to the [0, 1] range. Fixes: 86ef402d805d ("gpiolib: sanitize the return value of gpio_chip::get()") Signed-off-by: Dmitry Torokhov Reviewed-by: Bartosz Golaszewski Reviewed-by: Linus Walleij Link: https://patch.msgid.link/aZZeyr2ysqqk2GqA@google.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/qcom/qca807x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/qcom/qca807x.c b/drivers/net/phy/qcom/qca807x.c index d8f1ce5a7128..6004da5741af 100644 --- a/drivers/net/phy/qcom/qca807x.c +++ b/drivers/net/phy/qcom/qca807x.c @@ -375,7 +375,7 @@ static int qca807x_gpio_get(struct gpio_chip *gc, unsigned int offset) reg = QCA807X_MMD7_LED_FORCE_CTRL(offset); val = phy_read_mmd(priv->phy, MDIO_MMD_AN, reg); - return FIELD_GET(QCA807X_GPIO_FORCE_MODE_MASK, val); + return !!FIELD_GET(QCA807X_GPIO_FORCE_MODE_MASK, val); } static int qca807x_gpio_set(struct gpio_chip *gc, unsigned int offset, int value) From 594163ea88a03bdb412063af50fc7177ef3cbeae Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 19 Feb 2026 12:38:50 +0100 Subject: [PATCH 064/359] net: ethernet: xscale: Check for PTP support properly In ixp4xx_get_ts_info() ixp46x_ptp_find() is called unconditionally despite this feature only existing on ixp46x, leading to the following splat from tcpdump: root@OpenWrt:~# tcpdump -vv -X -i eth0 (...) Unable to handle kernel NULL pointer dereference at virtual address 00000238 when read (...) Call trace: ptp_clock_index from ixp46x_ptp_find+0x1c/0x38 ixp46x_ptp_find from ixp4xx_get_ts_info+0x4c/0x64 ixp4xx_get_ts_info from __ethtool_get_ts_info+0x90/0x108 __ethtool_get_ts_info from __dev_ethtool+0xa00/0x2648 __dev_ethtool from dev_ethtool+0x160/0x234 dev_ethtool from dev_ioctl+0x2cc/0x460 dev_ioctl from sock_ioctl+0x1ec/0x524 sock_ioctl from sys_ioctl+0x51c/0xa94 sys_ioctl from ret_fast_syscall+0x0/0x44 (...) Segmentation fault Check for ixp46x in ixp46x_ptp_find() before trying to set up PTP to avoid this. To avoid altering the returned error code from ixp4xx_hwtstamp_set() which before this patch was -EOPNOTSUPP, we return -EOPNOTSUPP from ixp4xx_hwtstamp_set() if ixp46x_ptp_find() fails no matter the error code. The helper function ixp46x_ptp_find() helper returns -ENODEV. Fixes: 9055a2f59162 ("ixp4xx_eth: make ptp support a platform driver") Signed-off-by: Linus Walleij Link: https://patch.msgid.link/20260219-ixp4xx-fix-ethernet-v3-1-f235ccc3cd46@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/xscale/ixp4xx_eth.c | 5 +---- drivers/net/ethernet/xscale/ptp_ixp46x.c | 3 +++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/xscale/ixp4xx_eth.c b/drivers/net/ethernet/xscale/ixp4xx_eth.c index e1e7f65553e7..b0faa0f1780d 100644 --- a/drivers/net/ethernet/xscale/ixp4xx_eth.c +++ b/drivers/net/ethernet/xscale/ixp4xx_eth.c @@ -403,15 +403,12 @@ static int ixp4xx_hwtstamp_set(struct net_device *netdev, int ret; int ch; - if (!cpu_is_ixp46x()) - return -EOPNOTSUPP; - if (!netif_running(netdev)) return -EINVAL; ret = ixp46x_ptp_find(&port->timesync_regs, &port->phc_index); if (ret) - return ret; + return -EOPNOTSUPP; ch = PORT2CHANNEL(port); regs = port->timesync_regs; diff --git a/drivers/net/ethernet/xscale/ptp_ixp46x.c b/drivers/net/ethernet/xscale/ptp_ixp46x.c index 94203eb46e6b..93c64db22a69 100644 --- a/drivers/net/ethernet/xscale/ptp_ixp46x.c +++ b/drivers/net/ethernet/xscale/ptp_ixp46x.c @@ -232,6 +232,9 @@ static struct ixp_clock ixp_clock; int ixp46x_ptp_find(struct ixp46x_ts_regs *__iomem *regs, int *phc_index) { + if (!cpu_is_ixp46x()) + return -ENODEV; + *regs = ixp_clock.regs; *phc_index = ptp_clock_index(ixp_clock.ptp_clock); From 470c7ca2b4c3e3a51feeb952b7f97a775b5c49cd Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Thu, 19 Feb 2026 17:31:31 +0000 Subject: [PATCH 065/359] udplite: Fix null-ptr-deref in __udp_enqueue_schedule_skb(). syzbot reported null-ptr-deref of udp_sk(sk)->udp_prod_queue. [0] Since the cited commit, udp_lib_init_sock() can fail, as can udp_init_sock() and udpv6_init_sock(). Let's handle the error in udplite_sk_init() and udplitev6_sk_init(). [0]: BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:82 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline] BUG: KASAN: null-ptr-deref in __udp_enqueue_schedule_skb+0x151/0x1480 net/ipv4/udp.c:1719 Read of size 4 at addr 0000000000000008 by task syz.2.18/2944 CPU: 1 UID: 0 PID: 2944 Comm: syz.2.18 Not tainted syzkaller #0 PREEMPTLAZY Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 Call Trace: dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 kasan_report+0xa2/0xe0 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:-1 [inline] kasan_check_range+0x264/0x2c0 mm/kasan/generic.c:200 instrument_atomic_read include/linux/instrumented.h:82 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline] __udp_enqueue_schedule_skb+0x151/0x1480 net/ipv4/udp.c:1719 __udpv6_queue_rcv_skb net/ipv6/udp.c:795 [inline] udpv6_queue_rcv_one_skb+0xa2e/0x1ad0 net/ipv6/udp.c:906 udp6_unicast_rcv_skb+0x227/0x380 net/ipv6/udp.c:1064 ip6_protocol_deliver_rcu+0xe17/0x1540 net/ipv6/ip6_input.c:438 ip6_input_finish+0x191/0x350 net/ipv6/ip6_input.c:489 NF_HOOK+0x354/0x3f0 include/linux/netfilter.h:318 ip6_input+0x16c/0x2b0 net/ipv6/ip6_input.c:500 NF_HOOK+0x354/0x3f0 include/linux/netfilter.h:318 __netif_receive_skb_one_core net/core/dev.c:6149 [inline] __netif_receive_skb+0xd3/0x370 net/core/dev.c:6262 process_backlog+0x4d6/0x1160 net/core/dev.c:6614 __napi_poll+0xae/0x320 net/core/dev.c:7678 napi_poll net/core/dev.c:7741 [inline] net_rx_action+0x60d/0xdc0 net/core/dev.c:7893 handle_softirqs+0x209/0x8d0 kernel/softirq.c:622 do_softirq+0x52/0x90 kernel/softirq.c:523 __local_bh_enable_ip+0xe7/0x120 kernel/softirq.c:450 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline] __dev_queue_xmit+0x109c/0x2dc0 net/core/dev.c:4856 __ip6_finish_output net/ipv6/ip6_output.c:-1 [inline] ip6_finish_output+0x158/0x4e0 net/ipv6/ip6_output.c:219 NF_HOOK_COND include/linux/netfilter.h:307 [inline] ip6_output+0x342/0x580 net/ipv6/ip6_output.c:246 ip6_send_skb+0x1d7/0x3c0 net/ipv6/ip6_output.c:1984 udp_v6_send_skb+0x9a5/0x1770 net/ipv6/udp.c:1442 udp_v6_push_pending_frames+0xa2/0x140 net/ipv6/udp.c:1469 udpv6_sendmsg+0xfe0/0x2830 net/ipv6/udp.c:1759 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0xe5/0x270 net/socket.c:742 __sys_sendto+0x3eb/0x580 net/socket.c:2206 __do_sys_sendto net/socket.c:2213 [inline] __se_sys_sendto net/socket.c:2209 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2209 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xd2/0xf20 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f67b4d9c629 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f67b5c98028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f67b5015fa0 RCX: 00007f67b4d9c629 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f67b4e32b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000040000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f67b5016038 R14: 00007f67b5015fa0 R15: 00007ffe3cb66dd8 Fixes: b650bf0977d3 ("udp: remove busylock and add per NUMA queues") Reported-by: syzbot Signed-off-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260219173142.310741-1-kuniyu@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/udplite.c | 3 +-- net/ipv6/udplite.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c index d3e621a11a1a..826e9e79eb19 100644 --- a/net/ipv4/udplite.c +++ b/net/ipv4/udplite.c @@ -20,10 +20,9 @@ EXPORT_SYMBOL(udplite_table); /* Designate sk as UDP-Lite socket */ static int udplite_sk_init(struct sock *sk) { - udp_init_sock(sk); pr_warn_once("UDP-Lite is deprecated and scheduled to be removed in 2025, " "please contact the netdev mailing list\n"); - return 0; + return udp_init_sock(sk); } static int udplite_rcv(struct sk_buff *skb) diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c index 2cec542437f7..e867721cda4d 100644 --- a/net/ipv6/udplite.c +++ b/net/ipv6/udplite.c @@ -16,10 +16,9 @@ static int udplitev6_sk_init(struct sock *sk) { - udpv6_init_sock(sk); pr_warn_once("UDP-Lite is deprecated and scheduled to be removed in 2025, " "please contact the netdev mailing list\n"); - return 0; + return udpv6_init_sock(sk); } static int udplitev6_rcv(struct sk_buff *skb) From e123d9302d223767bd910bfbcfe607bae909f8ac Mon Sep 17 00:00:00 2001 From: Pavan Chebbi Date: Thu, 19 Feb 2026 10:53:11 -0800 Subject: [PATCH 066/359] bnxt_en: Fix RSS context delete logic We need to free the corresponding RSS context VNIC in FW everytime an RSS context is deleted in driver. Commit 667ac333dbb7 added a check to delete the VNIC in FW only when netif_running() is true to help delete RSS contexts with interface down. Having that condition will make the driver leak VNICs in FW whenever close() happens with active RSS contexts. On the subsequent open(), as part of RSS context restoration, we will end up trying to create extra VNICs for which we did not make any reservation. FW can fail this request, thereby making us lose active RSS contexts. Suppose an RSS context is deleted already and we try to process a delete request again, then the HWRM functions will check for validity of the request and they simply return if the resource is already freed. So, even for delete-when-down cases, netif_running() check is not necessary. Remove the netif_running() condition check when deleting an RSS context. Reported-by: Jakub Kicinski Fixes: 667ac333dbb7 ("eth: bnxt: allow deleting RSS contexts when the device is down") Reviewed-by: Andy Gospodarek Signed-off-by: Pavan Chebbi Signed-off-by: Michael Chan Link: https://patch.msgid.link/20260219185313.2682148-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index fb45e1dd1dd7..7768d9753e2d 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -10889,12 +10889,10 @@ void bnxt_del_one_rss_ctx(struct bnxt *bp, struct bnxt_rss_ctx *rss_ctx, struct bnxt_ntuple_filter *ntp_fltr; int i; - if (netif_running(bp->dev)) { - bnxt_hwrm_vnic_free_one(bp, &rss_ctx->vnic); - for (i = 0; i < BNXT_MAX_CTX_PER_VNIC; i++) { - if (vnic->fw_rss_cos_lb_ctx[i] != INVALID_HW_RING_ID) - bnxt_hwrm_vnic_ctx_free_one(bp, vnic, i); - } + bnxt_hwrm_vnic_free_one(bp, &rss_ctx->vnic); + for (i = 0; i < BNXT_MAX_CTX_PER_VNIC; i++) { + if (vnic->fw_rss_cos_lb_ctx[i] != INVALID_HW_RING_ID) + bnxt_hwrm_vnic_ctx_free_one(bp, vnic, i); } if (!all) return; From c1bbd9900d65ac65b9fce9f129e3369a04871570 Mon Sep 17 00:00:00 2001 From: Pavan Chebbi Date: Thu, 19 Feb 2026 10:53:12 -0800 Subject: [PATCH 067/359] bnxt_en: Fix deleting of Ntuple filters Ntuple filters can be deleted when the interface is down. The current code blindly sends the filter delete command to FW. When the interface is down, all the VNICs are deleted in the FW. When the VNIC is freed in the FW, all the associated filters are also freed. We need not send the free command explicitly. Sending such command will generate FW error in the dmesg. In order to fix this, we can safely return from bnxt_hwrm_cfa_ntuple_filter_free() when BNXT_STATE_OPEN is not true which confirms the VNICs have been deleted. Fixes: 8336a974f37d ("bnxt_en: Save user configured filters in a lookup list") Suggested-by: Michael Chan Signed-off-by: Pavan Chebbi Signed-off-by: Michael Chan Link: https://patch.msgid.link/20260219185313.2682148-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 7768d9753e2d..8c73a723d23f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -6240,6 +6240,9 @@ int bnxt_hwrm_cfa_ntuple_filter_free(struct bnxt *bp, int rc; set_bit(BNXT_FLTR_FW_DELETED, &fltr->base.state); + if (!test_bit(BNXT_STATE_OPEN, &bp->state)) + return 0; + rc = hwrm_req_init(bp, req, HWRM_CFA_NTUPLE_FILTER_FREE); if (rc) return rc; From ce5a0f4612dbacc805fe16719f9f6bd18352b70d Mon Sep 17 00:00:00 2001 From: Pavan Chebbi Date: Thu, 19 Feb 2026 10:53:13 -0800 Subject: [PATCH 068/359] selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up Add a test to verify that RSS contexts persist across interface down/up along with their associated Ntuple filters. Another test that creates contexts/rules keeping interface down and test their persistence is also added. Tested on bnxt_en: TAP version 13 1..1 # timeout set to 0 # selftests: drivers/net/hw: rss_ctx.py # TAP version 13 # 1..2 # ok 1 rss_ctx.test_rss_context_persist_create_and_ifdown # ok 2 rss_ctx.test_rss_context_persist_ifdown_and_create # SKIP Create context not supported with interface down # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:1 error:0 Reviewed-by: Andy Gospodarek Signed-off-by: Pavan Chebbi Signed-off-by: Michael Chan Link: https://patch.msgid.link/20260219185313.2682148-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- .../selftests/drivers/net/hw/rss_ctx.py | 100 +++++++++++++++++- 1 file changed, 98 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py index ed7e405682f0..b9b7527c2c6b 100755 --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py @@ -4,13 +4,15 @@ import datetime import random import re +import time from lib.py import ksft_run, ksft_pr, ksft_exit from lib.py import ksft_eq, ksft_ne, ksft_ge, ksft_in, ksft_lt, ksft_true, ksft_raises from lib.py import NetDrvEpEnv from lib.py import EthtoolFamily, NetdevFamily from lib.py import KsftSkipEx, KsftFailEx +from lib.py import ksft_disruptive from lib.py import rand_port -from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure +from lib.py import cmd, ethtool, ip, defer, GenerateTraffic, CmdExitFailure, wait_file def _rss_key_str(key): @@ -809,6 +811,98 @@ def test_rss_default_context_rule(cfg): 'noise' : (0, 1) }) +@ksft_disruptive +def test_rss_context_persist_ifupdown(cfg, pre_down=False): + """ + Test that RSS contexts and their associated ntuple filters persist across + an interface down/up cycle. + + """ + + require_ntuple(cfg) + + qcnt = len(_get_rx_cnts(cfg)) + if qcnt < 6: + try: + ethtool(f"-L {cfg.ifname} combined 6") + defer(ethtool, f"-L {cfg.ifname} combined {qcnt}") + except Exception as exc: + raise KsftSkipEx("Not enough queues for the test") from exc + + ethtool(f"-X {cfg.ifname} equal 2") + defer(ethtool, f"-X {cfg.ifname} default") + + ifup = defer(ip, f"link set dev {cfg.ifname} up") + if pre_down: + ip(f"link set dev {cfg.ifname} down") + + try: + ctx1_id = ethtool_create(cfg, "-X", "context new start 2 equal 2") + defer(ethtool, f"-X {cfg.ifname} context {ctx1_id} delete") + except CmdExitFailure as exc: + raise KsftSkipEx("Create context not supported with interface down") from exc + + ctx2_id = ethtool_create(cfg, "-X", "context new start 4 equal 2") + defer(ethtool, f"-X {cfg.ifname} context {ctx2_id} delete") + + port_ctx2 = rand_port() + flow = f"flow-type tcp{cfg.addr_ipver} dst-ip {cfg.addr} dst-port {port_ctx2} context {ctx2_id}" + ntuple_id = ethtool_create(cfg, "-N", flow) + defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}") + + if not pre_down: + ip(f"link set dev {cfg.ifname} down") + ifup.exec() + + wait_file(f"/sys/class/net/{cfg.ifname}/carrier", + lambda x: x.strip() == "1", deadline=20) + + remote_addr = cfg.remote_addr_v[cfg.addr_ipver] + for _ in range(10): + if cmd(f"ping -c 1 -W 1 {remote_addr}", fail=False).ret == 0: + break + time.sleep(1) + else: + raise KsftSkipEx("Cannot reach remote host after interface up") + + ctxs = cfg.ethnl.rss_get({'header': {'dev-name': cfg.ifname}}, dump=True) + + data1 = [c for c in ctxs if c.get('context') == ctx1_id] + ksft_eq(len(data1), 1, f"Context {ctx1_id} should persist after ifup") + + data2 = [c for c in ctxs if c.get('context') == ctx2_id] + ksft_eq(len(data2), 1, f"Context {ctx2_id} should persist after ifup") + + _ntuple_rule_check(cfg, ntuple_id, ctx2_id) + + cnts = _get_rx_cnts(cfg) + GenerateTraffic(cfg).wait_pkts_and_stop(20000) + cnts = _get_rx_cnts(cfg, prev=cnts) + + main_traffic = sum(cnts[0:2]) + ksft_ge(main_traffic, 18000, f"Main context traffic distribution: {cnts}") + ksft_lt(sum(cnts[2:6]), 500, f"Other context queues should be mostly empty: {cnts}") + + _send_traffic_check(cfg, port_ctx2, f"context {ctx2_id}", + {'target': (4, 5), + 'noise': (0, 1), + 'empty': (2, 3)}) + + +def test_rss_context_persist_create_and_ifdown(cfg): + """ + Create RSS contexts then cycle the interface down and up. + """ + test_rss_context_persist_ifupdown(cfg, pre_down=False) + + +def test_rss_context_persist_ifdown_and_create(cfg): + """ + Bring interface down first, then create RSS contexts and bring up. + """ + test_rss_context_persist_ifupdown(cfg, pre_down=True) + + def main() -> None: with NetDrvEpEnv(__file__, nsim_test=False) as cfg: cfg.context_cnt = None @@ -823,7 +917,9 @@ def main() -> None: test_rss_context_out_of_order, test_rss_context4_create_with_cfg, test_flow_add_context_missing, test_delete_rss_context_busy, test_rss_ntuple_addition, - test_rss_default_context_rule], + test_rss_default_context_rule, + test_rss_context_persist_create_and_ifdown, + test_rss_context_persist_ifdown_and_create], args=(cfg, )) ksft_exit() From d4f687fbbce45b5e88438e89b5e26c0c15847992 Mon Sep 17 00:00:00 2001 From: Ralf Lici Date: Wed, 18 Feb 2026 21:08:26 +0100 Subject: [PATCH 069/359] ovpn: tcp - fix packet extraction from stream When processing TCP stream data in ovpn_tcp_recv, we receive large cloned skbs from __strp_rcv that may contain multiple coalesced packets. The current implementation has two bugs: 1. Header offset overflow: Using pskb_pull with large offsets on coalesced skbs causes skb->data - skb->head to exceed the u16 storage of skb->network_header. This causes skb_reset_network_header to fail on the inner decapsulated packet, resulting in packet drops. 2. Unaligned protocol headers: Extracting packets from arbitrary positions within the coalesced TCP stream provides no alignment guarantees for the packet data causing performance penalties on architectures without efficient unaligned access. Additionally, openvpn's 2-byte length prefix on TCP packets causes the subsequent 4-byte opcode and packet ID fields to be inherently misaligned. Fix both issues by allocating a new skb for each openvpn packet and using skb_copy_bits to extract only the packet content into the new buffer, skipping the 2-byte length prefix. Also, check the length before invoking the function that performs the allocation to avoid creating an invalid skb. If the packet has to be forwarded to userspace the 2-byte prefix can be pushed to the head safely, without misalignment. As a side effect, this approach also avoids the expensive linearization that pskb_pull triggers on cloned skbs with page fragments. In testing, this resulted in TCP throughput improvements of up to 74%. Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Signed-off-by: Ralf Lici Signed-off-by: Antonio Quartulli Reviewed-by: Sabrina Dubroca Signed-off-by: David S. Miller --- drivers/net/ovpn/tcp.c | 57 ++++++++++++++++++++++++++++-------------- 1 file changed, 38 insertions(+), 19 deletions(-) diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c index ec2bbc28c196..5499c1572f3e 100644 --- a/drivers/net/ovpn/tcp.c +++ b/drivers/net/ovpn/tcp.c @@ -70,37 +70,56 @@ static void ovpn_tcp_to_userspace(struct ovpn_peer *peer, struct sock *sk, peer->tcp.sk_cb.sk_data_ready(sk); } +static struct sk_buff *ovpn_tcp_skb_packet(const struct ovpn_peer *peer, + struct sk_buff *orig_skb, + const int pkt_len, const int pkt_off) +{ + struct sk_buff *ovpn_skb; + int err; + + /* create a new skb with only the content of the current packet */ + ovpn_skb = netdev_alloc_skb(peer->ovpn->dev, pkt_len); + if (unlikely(!ovpn_skb)) + goto err; + + skb_copy_header(ovpn_skb, orig_skb); + err = skb_copy_bits(orig_skb, pkt_off, skb_put(ovpn_skb, pkt_len), + pkt_len); + if (unlikely(err)) { + net_warn_ratelimited("%s: skb_copy_bits failed for peer %u\n", + netdev_name(peer->ovpn->dev), peer->id); + kfree_skb(ovpn_skb); + goto err; + } + + consume_skb(orig_skb); + return ovpn_skb; +err: + kfree_skb(orig_skb); + return NULL; +} + static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb) { struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp); struct strp_msg *msg = strp_msg(skb); - size_t pkt_len = msg->full_len - 2; - size_t off = msg->offset + 2; + int pkt_len = msg->full_len - 2; u8 opcode; - /* ensure skb->data points to the beginning of the openvpn packet */ - if (!pskb_pull(skb, off)) { - net_warn_ratelimited("%s: packet too small for peer %u\n", - netdev_name(peer->ovpn->dev), peer->id); - goto err; - } - - /* strparser does not trim the skb for us, therefore we do it now */ - if (pskb_trim(skb, pkt_len) != 0) { - net_warn_ratelimited("%s: trimming skb failed for peer %u\n", - netdev_name(peer->ovpn->dev), peer->id); - goto err; - } - - /* we need the first 4 bytes of data to be accessible + /* we need at least 4 bytes of data in the packet * to extract the opcode and the key ID later on */ - if (!pskb_may_pull(skb, OVPN_OPCODE_SIZE)) { + if (unlikely(pkt_len < OVPN_OPCODE_SIZE)) { net_warn_ratelimited("%s: packet too small to fetch opcode for peer %u\n", netdev_name(peer->ovpn->dev), peer->id); goto err; } + /* extract the packet into a new skb */ + skb = ovpn_tcp_skb_packet(peer, skb, pkt_len, msg->offset + 2); + if (unlikely(!skb)) + goto err; + /* DATA_V2 packets are handled in kernel, the rest goes to user space */ opcode = ovpn_opcode_from_skb(skb, 0); if (unlikely(opcode != OVPN_DATA_V2)) { @@ -113,7 +132,7 @@ static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb) /* The packet size header must be there when sending the packet * to userspace, therefore we put it back */ - skb_push(skb, 2); + *(__be16 *)__skb_push(skb, sizeof(u16)) = htons(pkt_len); ovpn_tcp_to_userspace(peer, strp->sk, skb); return; } From 08f97454b7fa39bfcf82524955c771d2d693d6fe Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Sun, 22 Feb 2026 13:35:13 +0000 Subject: [PATCH 070/359] KVM: arm64: Fix protected mode handling of pages larger than 4kB Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"), pKVM tracks the memory that has been mapped into a guest in a side data structure. Crucially, it uses it to find out whether a page has already been mapped, and therefore refuses to map it twice. So far, so good. However, this very patch completely breaks non-4kB page support, with guests being unable to boot. The most obvious symptom is that we take the same fault repeatedly, and not making forward progress. A quick investigation shows that this is because of the above rejection code. As it turns out, there are multiple issues at play: - while the HPFAR_EL2 register gives you the faulting IPA minus the bottom 12 bits, it will still give you the extra bits that are part of the page offset for anything larger than 4kB, even for a level-3 mapping - pkvm_pgtable_stage2_map() assumes that the address passed as a parameter is aligned to the size of the intended mapping - the faulting address is only aligned for a non-page mapping When the planets are suitably aligned (pun intended), the guest faults on a page by accessing it past the bottom 4kB, and extra bits get set in the HPFAR_EL2 register. If this results in a page mapping (which is likely with large granule sizes), nothing aligns it further down, and pkvm_mapping_iter_first() finds an intersection that doesn't really exist. We assume this is a spurious fault and return -EAGAIN. And again... This doesn't hit outside of the protected code, as the page table code always aligns the IPA down to a page boundary, hiding the issue for everyone else. Fix it by always forcing the alignment on vma_pagesize, irrespective of the value of vma_pagesize. Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings") Reviewed-by: Fuad Tabba Tested-by: Fuad Tabba Signed-off-by: Marc Zyngier Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org Cc: stable@vger.kernel.org --- arch/arm64/kvm/mmu.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 8c5d259810b2..3952415c4f83 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1753,14 +1753,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, } /* - * Both the canonical IPA and fault IPA must be hugepage-aligned to - * ensure we find the right PFN and lay down the mapping in the right - * place. + * Both the canonical IPA and fault IPA must be aligned to the + * mapping size to ensure we find the right PFN and lay down the + * mapping in the right place. */ - if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) { - fault_ipa &= ~(vma_pagesize - 1); - ipa &= ~(vma_pagesize - 1); - } + fault_ipa = ALIGN_DOWN(fault_ipa, vma_pagesize); + ipa = ALIGN_DOWN(ipa, vma_pagesize); gfn = ipa >> PAGE_SHIFT; mte_allowed = kvm_vma_mte_allowed(vma); From 663c28469d3274d6456f206a6671c91493d85ff1 Mon Sep 17 00:00:00 2001 From: Henrique Carvalho Date: Sat, 21 Feb 2026 01:59:44 -0300 Subject: [PATCH 071/359] smb: client: fix cifs_pick_channel when channels are equally loaded cifs_pick_channel uses (start % chan_count) when channels are equally loaded, but that can return a channel that failed the eligibility checks. Drop the fallback and return the scan-selected channel instead. If none is eligible, keep the existing behavior of using the primary channel. Signed-off-by: Henrique Carvalho Acked-by: Paulo Alcantara (Red Hat) Acked-by: Meetakshi Setiya Reviewed-by: Shyam Prasad N Cc: stable@vger.kernel.org Signed-off-by: Steve French --- fs/smb/client/transport.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index 75697f6d2566..05f8099047e1 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -807,16 +807,21 @@ cifs_cancelled_callback(struct TCP_Server_Info *server, struct mid_q_entry *mid) } /* - * Return a channel (master if none) of @ses that can be used to send - * regular requests. + * cifs_pick_channel - pick an eligible channel for network operations * - * If we are currently binding a new channel (negprot/sess.setup), - * return the new incomplete channel. + * @ses: session reference + * + * Select an eligible channel (not terminating and not marked as needing + * reconnect), preferring the least loaded one. If no eligible channel is + * found, fall back to the primary channel (index 0). + * + * Return: TCP_Server_Info pointer for the chosen channel, or NULL if @ses is + * NULL. */ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) { uint index = 0; - unsigned int min_in_flight = UINT_MAX, max_in_flight = 0; + unsigned int min_in_flight = UINT_MAX; struct TCP_Server_Info *server = NULL; int i, start, cur; @@ -846,14 +851,8 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) min_in_flight = server->in_flight; index = cur; } - if (server->in_flight > max_in_flight) - max_in_flight = server->in_flight; } - /* if all channels are equally loaded, fall back to round-robin */ - if (min_in_flight == max_in_flight) - index = (uint)start % ses->chan_count; - server = ses->chans[index].server; spin_unlock(&ses->chan_lock); From 2692c614f8f05929d692b3dbfd3faef1f00fbaf0 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 10 Feb 2026 14:58:22 +0100 Subject: [PATCH 072/359] device property: Allow secondary lookup in fwnode_get_next_child_node() When device_get_child_node_count() got split to the fwnode and device respective APIs, the fwnode didn't inherit the ability to traverse over the secondary fwnode. Hence any user, that switches from device to fwnode API misses this feature. In particular, this was revealed by the commit 1490cbb9dbfd ("device property: Split fwnode_get_child_node_count()") that effectively broke the GPIO enumeration on Intel Galileo boards. Fix this by moving the secondary lookup from device to fwnode API. Note, in general no device_*() API should go into the depth of the fwnode implementation. Fixes: 114dbb4fa7c4 ("drivers property: When no children in primary, try secondary") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko Reviewed-by: Rafael J. Wysocki (Intel) Reviewed-by: Sakari Ailus Link: https://patch.msgid.link/20260210135822.47335-1-andriy.shevchenko@linux.intel.com Signed-off-by: Danilo Krummrich --- drivers/base/property.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/base/property.c b/drivers/base/property.c index 6a63860579dd..8d9a34be57fb 100644 --- a/drivers/base/property.c +++ b/drivers/base/property.c @@ -797,7 +797,18 @@ struct fwnode_handle * fwnode_get_next_child_node(const struct fwnode_handle *fwnode, struct fwnode_handle *child) { - return fwnode_call_ptr_op(fwnode, get_next_child_node, child); + struct fwnode_handle *next; + + if (IS_ERR_OR_NULL(fwnode)) + return NULL; + + /* Try to find a child in primary fwnode */ + next = fwnode_call_ptr_op(fwnode, get_next_child_node, child); + if (next) + return next; + + /* When no more children in primary, continue with secondary */ + return fwnode_call_ptr_op(fwnode->secondary, get_next_child_node, child); } EXPORT_SYMBOL_GPL(fwnode_get_next_child_node); @@ -841,19 +852,7 @@ EXPORT_SYMBOL_GPL(fwnode_get_next_available_child_node); struct fwnode_handle *device_get_next_child_node(const struct device *dev, struct fwnode_handle *child) { - const struct fwnode_handle *fwnode = dev_fwnode(dev); - struct fwnode_handle *next; - - if (IS_ERR_OR_NULL(fwnode)) - return NULL; - - /* Try to find a child in primary fwnode */ - next = fwnode_get_next_child_node(fwnode, child); - if (next) - return next; - - /* When no more children in primary, continue with secondary */ - return fwnode_get_next_child_node(fwnode->secondary, child); + return fwnode_get_next_child_node(dev_fwnode(dev), child); } EXPORT_SYMBOL_GPL(device_get_next_child_node); From be704107e79696e855aa41e901a926039b6d2410 Mon Sep 17 00:00:00 2001 From: David Lechner Date: Thu, 19 Feb 2026 16:55:30 -0600 Subject: [PATCH 073/359] regulator: dt-bindings: mt6359: make regulator names unique Update the example devicetree with unique regulator names for all regulators. This reflects the same change made to the actual .dtsi file. Signed-off-by: David Lechner Acked-by: Krzysztof Kozlowski Link: https://patch.msgid.link/20260219-mtk-mt6359-fix-regulator-names-v1-2-ee0fcebfe1d9@baylibre.com Signed-off-by: Mark Brown --- .../devicetree/bindings/regulator/mt6359-regulator.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/regulator/mt6359-regulator.yaml b/Documentation/devicetree/bindings/regulator/mt6359-regulator.yaml index d6b3b5a5c0b3..fe4ac9350ba0 100644 --- a/Documentation/devicetree/bindings/regulator/mt6359-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/mt6359-regulator.yaml @@ -287,7 +287,7 @@ examples: regulator-max-microvolt = <1700000>; }; mt6359_vrfck_1_ldo_reg: ldo_vrfck_1 { - regulator-name = "vrfck"; + regulator-name = "vrfck_1"; regulator-min-microvolt = <1240000>; regulator-max-microvolt = <1600000>; }; @@ -309,7 +309,7 @@ examples: regulator-max-microvolt = <3300000>; }; mt6359_vemc_1_ldo_reg: ldo_vemc_1 { - regulator-name = "vemc"; + regulator-name = "vemc_1"; regulator-min-microvolt = <2500000>; regulator-max-microvolt = <3300000>; }; From bfcaf4e8ae9b4f52b97fb04ce03be3a01d41cdd0 Mon Sep 17 00:00:00 2001 From: Danilo Krummrich Date: Mon, 16 Feb 2026 14:14:33 +0100 Subject: [PATCH 074/359] rust: io: macro_export io_define_read!() and io_define_write!() Currently, the define_read!() and define_write!() I/O macros are crate public. The only user outside of the I/O module is PCI (for the configurations space I/O backend). Consequently, when CONFIG_PCI=n this causes a compile time warning [1]. In order to fix this, rename the macros to io_define_read!() and io_define_write!() and use #[macro_export] to export them. This is better than making the crate public visibility conditional, as eventually subsystems will have their own crate. Also, I/O backends are valid to be implemented by drivers as well. For instance, there are devices (such as GPUs) that run firmware which allows to program other devices only accessible through the primary device through indirect I/O. Since the macros are now public, also add the corresponding documentation. Fixes: 121d87b28e1d ("rust: io: separate generic I/O helpers from MMIO implementation") Reported-by: Miguel Ojeda Closes: https://lore.kernel.org/driver-core/CANiq72khOYkt6t5zwMvSiyZvWWHMZuNCMERXu=7K=_5tT-8Pgg@mail.gmail.com/ [1] Reviewed-by: Alice Ryhl Reviewed-by: Daniel Almeida Link: https://patch.msgid.link/20260216131534.65008-1-dakr@kernel.org Signed-off-by: Danilo Krummrich --- rust/kernel/io.rs | 131 ++++++++++++++++++++++++++++-------------- rust/kernel/pci/io.rs | 24 ++++---- 2 files changed, 101 insertions(+), 54 deletions(-) diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs index c1cca7b438c3..e5fba6bf6db0 100644 --- a/rust/kernel/io.rs +++ b/rust/kernel/io.rs @@ -139,9 +139,9 @@ pub struct Mmio(MmioRaw); /// Internal helper macros used to invoke C MMIO read functions. /// -/// This macro is intended to be used by higher-level MMIO access macros (define_read) and provides -/// a unified expansion for infallible vs. fallible read semantics. It emits a direct call into the -/// corresponding C helper and performs the required cast to the Rust return type. +/// This macro is intended to be used by higher-level MMIO access macros (io_define_read) and +/// provides a unified expansion for infallible vs. fallible read semantics. It emits a direct call +/// into the corresponding C helper and performs the required cast to the Rust return type. /// /// # Parameters /// @@ -166,9 +166,9 @@ macro_rules! call_mmio_read { /// Internal helper macros used to invoke C MMIO write functions. /// -/// This macro is intended to be used by higher-level MMIO access macros (define_write) and provides -/// a unified expansion for infallible vs. fallible write semantics. It emits a direct call into the -/// corresponding C helper and performs the required cast to the Rust return type. +/// This macro is intended to be used by higher-level MMIO access macros (io_define_write) and +/// provides a unified expansion for infallible vs. fallible write semantics. It emits a direct call +/// into the corresponding C helper and performs the required cast to the Rust return type. /// /// # Parameters /// @@ -193,7 +193,30 @@ macro_rules! call_mmio_write { }}; } -macro_rules! define_read { +/// Generates an accessor method for reading from an I/O backend. +/// +/// This macro reduces boilerplate by automatically generating either compile-time bounds-checked +/// (infallible) or runtime bounds-checked (fallible) read methods. It abstracts the address +/// calculation and bounds checking, and delegates the actual I/O read operation to a specified +/// helper macro, making it generic over different I/O backends. +/// +/// # Parameters +/// +/// * `infallible` / `fallible` - Determines the bounds-checking strategy. `infallible` relies on +/// `IoKnownSize` for compile-time checks and returns the value directly. `fallible` performs +/// runtime checks against `maxsize()` and returns a `Result`. +/// * `$(#[$attr:meta])*` - Optional attributes to apply to the generated method (e.g., +/// `#[cfg(CONFIG_64BIT)]` or inline directives). +/// * `$vis:vis` - The visibility of the generated method (e.g., `pub`). +/// * `$name:ident` / `$try_name:ident` - The name of the generated method (e.g., `read32`, +/// `try_read8`). +/// * `$call_macro:ident` - The backend-specific helper macro used to emit the actual I/O call +/// (e.g., `call_mmio_read`). +/// * `$c_fn:ident` - The backend-specific C function or identifier to be passed into the +/// `$call_macro`. +/// * `$type_name:ty` - The Rust type of the value being read (e.g., `u8`, `u32`). +#[macro_export] +macro_rules! io_define_read { (infallible, $(#[$attr:meta])* $vis:vis $name:ident, $call_macro:ident($c_fn:ident) -> $type_name:ty) => { /// Read IO data from a given offset known at compile time. @@ -226,9 +249,33 @@ macro_rules! define_read { } }; } -pub(crate) use define_read; +pub use io_define_read; -macro_rules! define_write { +/// Generates an accessor method for writing to an I/O backend. +/// +/// This macro reduces boilerplate by automatically generating either compile-time bounds-checked +/// (infallible) or runtime bounds-checked (fallible) write methods. It abstracts the address +/// calculation and bounds checking, and delegates the actual I/O write operation to a specified +/// helper macro, making it generic over different I/O backends. +/// +/// # Parameters +/// +/// * `infallible` / `fallible` - Determines the bounds-checking strategy. `infallible` relies on +/// `IoKnownSize` for compile-time checks and returns `()`. `fallible` performs runtime checks +/// against `maxsize()` and returns a `Result`. +/// * `$(#[$attr:meta])*` - Optional attributes to apply to the generated method (e.g., +/// `#[cfg(CONFIG_64BIT)]` or inline directives). +/// * `$vis:vis` - The visibility of the generated method (e.g., `pub`). +/// * `$name:ident` / `$try_name:ident` - The name of the generated method (e.g., `write32`, +/// `try_write8`). +/// * `$call_macro:ident` - The backend-specific helper macro used to emit the actual I/O call +/// (e.g., `call_mmio_write`). +/// * `$c_fn:ident` - The backend-specific C function or identifier to be passed into the +/// `$call_macro`. +/// * `$type_name:ty` - The Rust type of the value being written (e.g., `u8`, `u32`). Note the use +/// of `<-` before the type to denote a write operation. +#[macro_export] +macro_rules! io_define_write { (infallible, $(#[$attr:meta])* $vis:vis $name:ident, $call_macro:ident($c_fn:ident) <- $type_name:ty) => { /// Write IO data from a given offset known at compile time. @@ -259,7 +306,7 @@ macro_rules! define_write { } }; } -pub(crate) use define_write; +pub use io_define_write; /// Checks whether an access of type `U` at the given `offset` /// is valid within this region. @@ -509,40 +556,40 @@ impl Io for Mmio { self.0.maxsize() } - define_read!(fallible, try_read8, call_mmio_read(readb) -> u8); - define_read!(fallible, try_read16, call_mmio_read(readw) -> u16); - define_read!(fallible, try_read32, call_mmio_read(readl) -> u32); - define_read!( + io_define_read!(fallible, try_read8, call_mmio_read(readb) -> u8); + io_define_read!(fallible, try_read16, call_mmio_read(readw) -> u16); + io_define_read!(fallible, try_read32, call_mmio_read(readl) -> u32); + io_define_read!( fallible, #[cfg(CONFIG_64BIT)] try_read64, call_mmio_read(readq) -> u64 ); - define_write!(fallible, try_write8, call_mmio_write(writeb) <- u8); - define_write!(fallible, try_write16, call_mmio_write(writew) <- u16); - define_write!(fallible, try_write32, call_mmio_write(writel) <- u32); - define_write!( + io_define_write!(fallible, try_write8, call_mmio_write(writeb) <- u8); + io_define_write!(fallible, try_write16, call_mmio_write(writew) <- u16); + io_define_write!(fallible, try_write32, call_mmio_write(writel) <- u32); + io_define_write!( fallible, #[cfg(CONFIG_64BIT)] try_write64, call_mmio_write(writeq) <- u64 ); - define_read!(infallible, read8, call_mmio_read(readb) -> u8); - define_read!(infallible, read16, call_mmio_read(readw) -> u16); - define_read!(infallible, read32, call_mmio_read(readl) -> u32); - define_read!( + io_define_read!(infallible, read8, call_mmio_read(readb) -> u8); + io_define_read!(infallible, read16, call_mmio_read(readw) -> u16); + io_define_read!(infallible, read32, call_mmio_read(readl) -> u32); + io_define_read!( infallible, #[cfg(CONFIG_64BIT)] read64, call_mmio_read(readq) -> u64 ); - define_write!(infallible, write8, call_mmio_write(writeb) <- u8); - define_write!(infallible, write16, call_mmio_write(writew) <- u16); - define_write!(infallible, write32, call_mmio_write(writel) <- u32); - define_write!( + io_define_write!(infallible, write8, call_mmio_write(writeb) <- u8); + io_define_write!(infallible, write16, call_mmio_write(writew) <- u16); + io_define_write!(infallible, write32, call_mmio_write(writel) <- u32); + io_define_write!( infallible, #[cfg(CONFIG_64BIT)] write64, @@ -566,40 +613,40 @@ impl Mmio { unsafe { &*core::ptr::from_ref(raw).cast() } } - define_read!(infallible, pub read8_relaxed, call_mmio_read(readb_relaxed) -> u8); - define_read!(infallible, pub read16_relaxed, call_mmio_read(readw_relaxed) -> u16); - define_read!(infallible, pub read32_relaxed, call_mmio_read(readl_relaxed) -> u32); - define_read!( + io_define_read!(infallible, pub read8_relaxed, call_mmio_read(readb_relaxed) -> u8); + io_define_read!(infallible, pub read16_relaxed, call_mmio_read(readw_relaxed) -> u16); + io_define_read!(infallible, pub read32_relaxed, call_mmio_read(readl_relaxed) -> u32); + io_define_read!( infallible, #[cfg(CONFIG_64BIT)] pub read64_relaxed, call_mmio_read(readq_relaxed) -> u64 ); - define_read!(fallible, pub try_read8_relaxed, call_mmio_read(readb_relaxed) -> u8); - define_read!(fallible, pub try_read16_relaxed, call_mmio_read(readw_relaxed) -> u16); - define_read!(fallible, pub try_read32_relaxed, call_mmio_read(readl_relaxed) -> u32); - define_read!( + io_define_read!(fallible, pub try_read8_relaxed, call_mmio_read(readb_relaxed) -> u8); + io_define_read!(fallible, pub try_read16_relaxed, call_mmio_read(readw_relaxed) -> u16); + io_define_read!(fallible, pub try_read32_relaxed, call_mmio_read(readl_relaxed) -> u32); + io_define_read!( fallible, #[cfg(CONFIG_64BIT)] pub try_read64_relaxed, call_mmio_read(readq_relaxed) -> u64 ); - define_write!(infallible, pub write8_relaxed, call_mmio_write(writeb_relaxed) <- u8); - define_write!(infallible, pub write16_relaxed, call_mmio_write(writew_relaxed) <- u16); - define_write!(infallible, pub write32_relaxed, call_mmio_write(writel_relaxed) <- u32); - define_write!( + io_define_write!(infallible, pub write8_relaxed, call_mmio_write(writeb_relaxed) <- u8); + io_define_write!(infallible, pub write16_relaxed, call_mmio_write(writew_relaxed) <- u16); + io_define_write!(infallible, pub write32_relaxed, call_mmio_write(writel_relaxed) <- u32); + io_define_write!( infallible, #[cfg(CONFIG_64BIT)] pub write64_relaxed, call_mmio_write(writeq_relaxed) <- u64 ); - define_write!(fallible, pub try_write8_relaxed, call_mmio_write(writeb_relaxed) <- u8); - define_write!(fallible, pub try_write16_relaxed, call_mmio_write(writew_relaxed) <- u16); - define_write!(fallible, pub try_write32_relaxed, call_mmio_write(writel_relaxed) <- u32); - define_write!( + io_define_write!(fallible, pub try_write8_relaxed, call_mmio_write(writeb_relaxed) <- u8); + io_define_write!(fallible, pub try_write16_relaxed, call_mmio_write(writew_relaxed) <- u16); + io_define_write!(fallible, pub try_write32_relaxed, call_mmio_write(writel_relaxed) <- u32); + io_define_write!( fallible, #[cfg(CONFIG_64BIT)] pub try_write64_relaxed, diff --git a/rust/kernel/pci/io.rs b/rust/kernel/pci/io.rs index 6ca4cf75594c..fb6edab2aea7 100644 --- a/rust/kernel/pci/io.rs +++ b/rust/kernel/pci/io.rs @@ -8,8 +8,8 @@ use crate::{ device, devres::Devres, io::{ - define_read, - define_write, + io_define_read, + io_define_write, Io, IoCapable, IoKnownSize, @@ -88,7 +88,7 @@ pub struct ConfigSpace<'a, S: ConfigSpaceKind = Extended> { /// Internal helper macros used to invoke C PCI configuration space read functions. /// /// This macro is intended to be used by higher-level PCI configuration space access macros -/// (define_read) and provides a unified expansion for infallible vs. fallible read semantics. It +/// (io_define_read) and provides a unified expansion for infallible vs. fallible read semantics. It /// emits a direct call into the corresponding C helper and performs the required cast to the Rust /// return type. /// @@ -117,9 +117,9 @@ macro_rules! call_config_read { /// Internal helper macros used to invoke C PCI configuration space write functions. /// /// This macro is intended to be used by higher-level PCI configuration space access macros -/// (define_write) and provides a unified expansion for infallible vs. fallible read semantics. It -/// emits a direct call into the corresponding C helper and performs the required cast to the Rust -/// return type. +/// (io_define_write) and provides a unified expansion for infallible vs. fallible read semantics. +/// It emits a direct call into the corresponding C helper and performs the required cast to the +/// Rust return type. /// /// # Parameters /// @@ -163,13 +163,13 @@ impl<'a, S: ConfigSpaceKind> Io for ConfigSpace<'a, S> { // PCI configuration space does not support fallible operations. // The default implementations from the Io trait are not used. - define_read!(infallible, read8, call_config_read(pci_read_config_byte) -> u8); - define_read!(infallible, read16, call_config_read(pci_read_config_word) -> u16); - define_read!(infallible, read32, call_config_read(pci_read_config_dword) -> u32); + io_define_read!(infallible, read8, call_config_read(pci_read_config_byte) -> u8); + io_define_read!(infallible, read16, call_config_read(pci_read_config_word) -> u16); + io_define_read!(infallible, read32, call_config_read(pci_read_config_dword) -> u32); - define_write!(infallible, write8, call_config_write(pci_write_config_byte) <- u8); - define_write!(infallible, write16, call_config_write(pci_write_config_word) <- u16); - define_write!(infallible, write32, call_config_write(pci_write_config_dword) <- u32); + io_define_write!(infallible, write8, call_config_write(pci_write_config_byte) <- u8); + io_define_write!(infallible, write16, call_config_write(pci_write_config_word) <- u16); + io_define_write!(infallible, write32, call_config_write(pci_write_config_dword) <- u32); } impl<'a, S: ConfigSpaceKind> IoKnownSize for ConfigSpace<'a, S> { From c5794709bc9105935dbedef8b9cf9c06f2b559fa Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 17 Feb 2026 20:28:29 -0800 Subject: [PATCH 075/359] ksmbd: Compare MACs in constant time To prevent timing attacks, MAC comparisons need to be constant-time. Replace the memcmp() with the correct function, crypto_memneq(). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Acked-by: Namjae Jeon Signed-off-by: Steve French --- fs/smb/server/Kconfig | 1 + fs/smb/server/auth.c | 4 +++- fs/smb/server/smb2pdu.c | 5 +++-- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/smb/server/Kconfig b/fs/smb/server/Kconfig index 2775162c535c..12594879cb64 100644 --- a/fs/smb/server/Kconfig +++ b/fs/smb/server/Kconfig @@ -13,6 +13,7 @@ config SMB_SERVER select CRYPTO_LIB_MD5 select CRYPTO_LIB_SHA256 select CRYPTO_LIB_SHA512 + select CRYPTO_LIB_UTILS select CRYPTO_CMAC select CRYPTO_AEAD2 select CRYPTO_CCM diff --git a/fs/smb/server/auth.c b/fs/smb/server/auth.c index 580c4d303dc3..5fe8c667c6b1 100644 --- a/fs/smb/server/auth.c +++ b/fs/smb/server/auth.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -165,7 +166,8 @@ int ksmbd_auth_ntlmv2(struct ksmbd_conn *conn, struct ksmbd_session *sess, ntlmv2_rsp, CIFS_HMAC_MD5_HASH_SIZE, sess->sess_key); - if (memcmp(ntlmv2->ntlmv2_hash, ntlmv2_rsp, CIFS_HMAC_MD5_HASH_SIZE) != 0) + if (crypto_memneq(ntlmv2->ntlmv2_hash, ntlmv2_rsp, + CIFS_HMAC_MD5_HASH_SIZE)) return -EINVAL; return 0; } diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c index 95901a78951c..743c629fe7ec 100644 --- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -4,6 +4,7 @@ * Copyright (C) 2018 Samsung Electronics Co., Ltd. */ +#include #include #include #include @@ -8880,7 +8881,7 @@ int smb2_check_sign_req(struct ksmbd_work *work) ksmbd_sign_smb2_pdu(work->conn, work->sess->sess_key, iov, 1, signature); - if (memcmp(signature, signature_req, SMB2_SIGNATURE_SIZE)) { + if (crypto_memneq(signature, signature_req, SMB2_SIGNATURE_SIZE)) { pr_err("bad smb2 signature\n"); return 0; } @@ -8968,7 +8969,7 @@ int smb3_check_sign_req(struct ksmbd_work *work) if (ksmbd_sign_smb3_pdu(conn, signing_key, iov, 1, signature)) return 0; - if (memcmp(signature, signature_req, SMB2_SIGNATURE_SIZE)) { + if (crypto_memneq(signature, signature_req, SMB2_SIGNATURE_SIZE)) { pr_err("bad smb2 signature\n"); return 0; } From 6b4f875aac344cdd52a1f34cc70ed2f874a65757 Mon Sep 17 00:00:00 2001 From: Nicholas Carlini Date: Thu, 19 Feb 2026 20:58:57 +0900 Subject: [PATCH 076/359] ksmbd: fix signededness bug in smb_direct_prepare_negotiation() smb_direct_prepare_negotiation() casts an unsigned __u32 value from sp->max_recv_size and req->preferred_send_size to a signed int before computing min_t(int, ...). A maliciously provided preferred_send_size of 0x80000000 will return as smaller than max_recv_size, and then be used to set the maximum allowed alowed receive size for the next message. By sending a second message with a large value (>1420 bytes) the attacker can then achieve a heap buffer overflow. This fix replaces min_t(int, ...) with min_t(u32) Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Nicholas Carlini Reviewed-by: Stefan Metzmacher Acked-by: Stefan Metzmacher Acked-by: Namjae Jeon Signed-off-by: Steve French --- fs/smb/server/transport_rdma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c index 7c53b78b818e..188572491d53 100644 --- a/fs/smb/server/transport_rdma.c +++ b/fs/smb/server/transport_rdma.c @@ -2540,9 +2540,9 @@ static int smb_direct_prepare(struct ksmbd_transport *t) goto put; req = (struct smbdirect_negotiate_req *)recvmsg->packet; - sp->max_recv_size = min_t(int, sp->max_recv_size, + sp->max_recv_size = min_t(u32, sp->max_recv_size, le32_to_cpu(req->preferred_send_size)); - sp->max_send_size = min_t(int, sp->max_send_size, + sp->max_send_size = min_t(u32, sp->max_send_size, le32_to_cpu(req->max_receive_size)); sp->max_fragmented_send_size = le32_to_cpu(req->max_fragmented_size); From 47322c469d4a63ac45b705ca83680671ff71c975 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Mon, 9 Feb 2026 16:38:05 +0100 Subject: [PATCH 077/359] dma-mapping: avoid random addr value print out on error path dma_addr is unitialized in dma_direct_map_phys() when swiotlb is forced and DMA_ATTR_MMIO is set which leads to random value print out in warning. Fix that by just returning DMA_MAPPING_ERROR. Fixes: e53d29f957b3 ("dma-mapping: convert dma_direct_*map_page to be phys_addr_t based") Signed-off-by: Jiri Pirko Signed-off-by: Marek Szyprowski Link: https://lore.kernel.org/r/20260209153809.250835-2-jiri@resnulli.us --- kernel/dma/direct.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h index f476c63b668c..e89f175e9c2d 100644 --- a/kernel/dma/direct.h +++ b/kernel/dma/direct.h @@ -85,7 +85,7 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev, if (is_swiotlb_force_bounce(dev)) { if (attrs & DMA_ATTR_MMIO) - goto err_overflow; + return DMA_MAPPING_ERROR; return swiotlb_map(dev, phys, size, dir, attrs); } From d5b5e8149af0f5efed58653cbebf1cb3258ce49a Mon Sep 17 00:00:00 2001 From: Stian Halseth Date: Wed, 18 Feb 2026 13:00:24 +0100 Subject: [PATCH 078/359] sparc: Fix page alignment in dma mapping 'phys' may include an offset within the page, while previously used 'base_paddr' was already page-aligned. This caused incorrect DMA mapping in dma_4u_map_phys and dma_4v_map_phys. Fix both functions by masking 'phys' with IO_PAGE_MASK, covering both generic SPARC code and sun4v. Fixes: 38c0d0ebf520 ("sparc: Use physical address DMA mapping") Reported-by: Stian Halseth Closes: https://github.com/sparclinux/issues/issues/75 Suggested-by: Marek Szyprowski Signed-off-by: Stian Halseth Tested-by: Nathaniel Roach Tested-by: Han Gao # on SPARC Enterprise T5220 [mszyprow: adjusted commit description a bit] Signed-off-by: Marek Szyprowski Link: https://lore.kernel.org/r/20260218120056.3366-2-stian@itx.no --- arch/sparc/kernel/iommu.c | 2 ++ arch/sparc/kernel/pci_sun4v.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 46ef88bc9c26..7613ab0ffb89 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -312,6 +312,8 @@ static dma_addr_t dma_4u_map_phys(struct device *dev, phys_addr_t phys, if (direction != DMA_TO_DEVICE) iopte_protection |= IOPTE_WRITE; + phys &= IO_PAGE_MASK; + for (i = 0; i < npages; i++, base++, phys += IO_PAGE_SIZE) iopte_val(*base) = iopte_protection | phys; diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 440284cc804e..61f14b4c8f90 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -410,6 +410,8 @@ static dma_addr_t dma_4v_map_phys(struct device *dev, phys_addr_t phys, iommu_batch_start(dev, prot, entry); + phys &= IO_PAGE_MASK; + for (i = 0; i < npages; i++, phys += IO_PAGE_SIZE) { long err = iommu_batch_add(phys, mask); if (unlikely(err < 0L)) From c8d7f21ead727485ebf965e2b4d42d4a4f0840f6 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 9 Feb 2026 19:12:20 +0100 Subject: [PATCH 079/359] wifi: cfg80211: wext: fix IGTK key ID off-by-one The IGTK key ID must be 4 or 5, but the code checks against key ID + 1, so must check against 5/6 rather than 4/5. Fix that. Reported-by: Jouni Malinen Fixes: 08645126dd24 ("cfg80211: implement wext key handling") Link: https://patch.msgid.link/20260209181220.362205-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg --- net/wireless/wext-compat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/wext-compat.c b/net/wireless/wext-compat.c index 1241fda78a68..680500fa57cf 100644 --- a/net/wireless/wext-compat.c +++ b/net/wireless/wext-compat.c @@ -684,7 +684,7 @@ static int cfg80211_wext_siwencodeext(struct net_device *dev, idx = erq->flags & IW_ENCODE_INDEX; if (cipher == WLAN_CIPHER_SUITE_AES_CMAC) { - if (idx < 4 || idx > 5) { + if (idx < 5 || idx > 6) { idx = wdev->wext.default_mgmt_key; if (idx < 0) return -EINVAL; From 767d23ade706d5fa51c36168e92a9c5533c351a1 Mon Sep 17 00:00:00 2001 From: Daniil Dulov Date: Wed, 11 Feb 2026 11:20:24 +0300 Subject: [PATCH 080/359] wifi: cfg80211: cancel rfkill_block work in wiphy_unregister() There is a use-after-free error in cfg80211_shutdown_all_interfaces found by syzkaller: BUG: KASAN: use-after-free in cfg80211_shutdown_all_interfaces+0x213/0x220 Read of size 8 at addr ffff888112a78d98 by task kworker/0:5/5326 CPU: 0 UID: 0 PID: 5326 Comm: kworker/0:5 Not tainted 6.19.0-rc2 #2 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: events cfg80211_rfkill_block_work Call Trace: dump_stack_lvl+0x116/0x1f0 print_report+0xcd/0x630 kasan_report+0xe0/0x110 cfg80211_shutdown_all_interfaces+0x213/0x220 cfg80211_rfkill_block_work+0x1e/0x30 process_one_work+0x9cf/0x1b70 worker_thread+0x6c8/0xf10 kthread+0x3c5/0x780 ret_from_fork+0x56d/0x700 ret_from_fork_asm+0x1a/0x30 The problem arises due to the rfkill_block work is not cancelled when wiphy is being unregistered. In order to fix the issue cancel the corresponding work in wiphy_unregister(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1f87f7d3a3b4 ("cfg80211: add rfkill support") Cc: stable@vger.kernel.org Signed-off-by: Daniil Dulov Link: https://patch.msgid.link/20260211082024.1967588-1-d.dulov@aladdin.ru Signed-off-by: Johannes Berg --- net/wireless/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/wireless/core.c b/net/wireless/core.c index 9af85d655027..d35cf04cbc81 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -1212,6 +1212,7 @@ void wiphy_unregister(struct wiphy *wiphy) /* this has nothing to do now but make sure it's gone */ cancel_work_sync(&rdev->wiphy_work); + cancel_work_sync(&rdev->rfkill_block); cancel_work_sync(&rdev->conn_work); flush_work(&rdev->event_work); cancel_delayed_work_sync(&rdev->dfs_update_channels_wk); From c854758abe0b8d86f9c43dc060ff56a0ee5b31e0 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 17 Feb 2026 13:05:26 +0100 Subject: [PATCH 081/359] wifi: radiotap: reject radiotap with unknown bits The radiotap parser is currently only used with the radiotap namespace (not with vendor namespaces), but if the undefined field 18 is used, the alignment/size is unknown as well. In this case, iterator->_next_ns_data isn't initialized (it's only set for skipping vendor namespaces), and syzbot points out that we later compare against this uninitialized value. Fix this by moving the rejection of unknown radiotap fields down to after the in-namespace lookup, so it will really use iterator->_next_ns_data only for vendor namespaces, even in case undefined fields are present. Cc: stable@vger.kernel.org Fixes: 33e5a2f776e3 ("wireless: update radiotap parser") Reported-by: syzbot+b09c1af8764c0097bb19@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/69944a91.a70a0220.2c38d7.00fc.GAE@google.com Link: https://patch.msgid.link/20260217120526.162647-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg --- net/wireless/radiotap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/wireless/radiotap.c b/net/wireless/radiotap.c index 326faea38ca3..c85eaa583a46 100644 --- a/net/wireless/radiotap.c +++ b/net/wireless/radiotap.c @@ -239,14 +239,14 @@ int ieee80211_radiotap_iterator_next( default: if (!iterator->current_namespace || iterator->_arg_index >= iterator->current_namespace->n_bits) { - if (iterator->current_namespace == &radiotap_ns) - return -ENOENT; align = 0; } else { align = iterator->current_namespace->align_size[iterator->_arg_index].align; size = iterator->current_namespace->align_size[iterator->_arg_index].size; } if (!align) { + if (iterator->current_namespace == &radiotap_ns) + return -ENOENT; /* skip all subsequent data */ iterator->_arg = iterator->_next_ns_data; /* give up on this namespace */ From 243307a0d1b0d01538e202c00454c28b21d4432e Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Tue, 3 Feb 2026 11:21:33 +0100 Subject: [PATCH 082/359] wifi: brcmfmac: Fix potential kernel oops when probe fails When probe of the sdio brcmfmac device fails for some reasons (i.e. missing firmware), the sdiodev->bus is set to error instead of NULL, thus the cleanup later in brcmf_sdio_remove() tries to free resources via invalid bus pointer. This happens because sdiodev->bus is set 2 times: first in brcmf_sdio_probe() and second time in brcmf_sdiod_probe(). Fix this by chaning the brcmf_sdio_probe() function to return the error code and set sdio->bus only there. Fixes: 0ff0843310b7 ("wifi: brcmfmac: Add optional lpo clock enable support") Signed-off-by: Marek Szyprowski Acked-by: Arend van Spriel Link: https://patch.msgid.link/20260203102133.1478331-1-m.szyprowski@samsung.com Signed-off-by: Johannes Berg --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 7 +++---- drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 ++++--- drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c index 6a3f187320fc..13952dfeb3e3 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c @@ -951,11 +951,10 @@ int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev) goto out; /* try to attach to the target device */ - sdiodev->bus = brcmf_sdio_probe(sdiodev); - if (IS_ERR(sdiodev->bus)) { - ret = PTR_ERR(sdiodev->bus); + ret = brcmf_sdio_probe(sdiodev); + if (ret) goto out; - } + brcmf_sdiod_host_fixup(sdiodev->func2->card->host); out: if (ret) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 8cf9d7e7c3f7..4e6ed02c1591 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -4445,7 +4445,7 @@ brcmf_sdio_prepare_fw_request(struct brcmf_sdio *bus) return fwreq; } -struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) +int brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) { int ret; struct brcmf_sdio *bus; @@ -4551,11 +4551,12 @@ struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) goto fail; } - return bus; + return 0; fail: brcmf_sdio_remove(bus); - return ERR_PTR(ret); + sdiodev->bus = NULL; + return ret; } /* Detach and free everything */ diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h index 0d18ed15b403..80180d5c6c87 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.h @@ -358,7 +358,7 @@ void brcmf_sdiod_freezer_uncount(struct brcmf_sdio_dev *sdiodev); int brcmf_sdiod_probe(struct brcmf_sdio_dev *sdiodev); int brcmf_sdiod_remove(struct brcmf_sdio_dev *sdiodev); -struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev); +int brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev); void brcmf_sdio_remove(struct brcmf_sdio *bus); void brcmf_sdio_isr(struct brcmf_sdio *bus, bool in_isr); From 25723454f67980712234a42caca606b6710fd1e1 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Tue, 10 Feb 2026 18:03:34 +0800 Subject: [PATCH 083/359] wifi: mwifiex: Fix dev_alloc_name() return value check dev_alloc_name() returns the allocated ID on success, which could be over 0. Fix the return value check to check for negative error codes. Reported-by: Dan Carpenter Closes: https://lore.kernel.org/all/aYmsQfujoAe5qO02@stanley.mountain/ Fixes: 7bab5bdb81e3 ("wifi: mwifiex: Allocate dev name earlier for interface workqueue name") Signed-off-by: Chen-Yu Tsai Reviewed-by: Francesco Dolcini Link: https://patch.msgid.link/20260210100337.1131279-1-wenst@chromium.org Signed-off-by: Johannes Berg --- drivers/net/wireless/marvell/mwifiex/cfg80211.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c index a66d18e380fc..bc0a946e8e6e 100644 --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c @@ -3148,7 +3148,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy, SET_NETDEV_DEV(dev, adapter->dev); ret = dev_alloc_name(dev, name); - if (ret) + if (ret < 0) goto err_alloc_name; priv->dfs_cac_workqueue = alloc_workqueue("MWIFIEX_DFS_CAC-%s", From 03cc8f90d0537fcd4985c3319b4fafbf2e3fb1f0 Mon Sep 17 00:00:00 2001 From: Daniel Hodges Date: Fri, 6 Feb 2026 14:53:56 -0500 Subject: [PATCH 084/359] wifi: libertas: fix use-after-free in lbs_free_adapter() The lbs_free_adapter() function uses timer_delete() (non-synchronous) for both command_timer and tx_lockup_timer before the structure is freed. This is incorrect because timer_delete() does not wait for any running timer callback to complete. If a timer callback is executing when lbs_free_adapter() is called, the callback will access freed memory since lbs_cfg_free() frees the containing structure immediately after lbs_free_adapter() returns. Both timer callbacks (lbs_cmd_timeout_handler and lbs_tx_lockup_handler) access priv->driver_lock, priv->cur_cmd, priv->dev, and other fields, which would all be use-after-free violations. Use timer_delete_sync() instead to ensure any running timer callback has completed before returning. This bug was introduced in commit 8f641d93c38a ("libertas: detect TX lockups and reset hardware") where del_timer() was used instead of del_timer_sync() in the cleanup path. The command_timer has had the same issue since the driver was first written. Fixes: 8f641d93c38a ("libertas: detect TX lockups and reset hardware") Fixes: 954ee164f4f4 ("[PATCH] libertas: reorganize and simplify init sequence") Cc: stable@vger.kernel.org Signed-off-by: Daniel Hodges Link: https://patch.msgid.link/20260206195356.15647-1-git@danielhodges.dev Signed-off-by: Johannes Berg --- drivers/net/wireless/marvell/libertas/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/marvell/libertas/main.c b/drivers/net/wireless/marvell/libertas/main.c index d44e02c6fe38..dd97f1b61f4d 100644 --- a/drivers/net/wireless/marvell/libertas/main.c +++ b/drivers/net/wireless/marvell/libertas/main.c @@ -799,8 +799,8 @@ static void lbs_free_adapter(struct lbs_private *priv) { lbs_free_cmd_buffer(priv); kfifo_free(&priv->event_fifo); - timer_delete(&priv->command_timer); - timer_delete(&priv->tx_lockup_timer); + timer_delete_sync(&priv->command_timer); + timer_delete_sync(&priv->tx_lockup_timer); } static const struct net_device_ops lbs_netdev_ops = { From 2259d14499d16b115ef8d5d2ddc867e2be7cb5b5 Mon Sep 17 00:00:00 2001 From: Ramanathan Choodamani Date: Thu, 5 Feb 2026 15:12:16 +0530 Subject: [PATCH 085/359] wifi: mac80211: set default WMM parameters on all links Currently, mac80211 only initializes default WMM parameters on the deflink during do_open(). For MLO cases, this leaves the additional links without proper WMM defaults if hostapd does not supply per-link WMM parameters, leading to inconsistent QoS behavior across links. Set default WMM parameters for each link during ieee80211_vif_update_links(), because this ensures all individual links in an MLD have valid WMM settings during bring-up and behave consistently across different BSS. Signed-off-by: Ramanathan Choodamani Signed-off-by: Aishwarya R Link: https://patch.msgid.link/20260205094216.3093542-1-aishwarya.r@oss.qualcomm.com Signed-off-by: Johannes Berg --- net/mac80211/link.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/mac80211/link.c b/net/mac80211/link.c index 17bf55dabd31..a1f67bab8ba1 100644 --- a/net/mac80211/link.c +++ b/net/mac80211/link.c @@ -281,6 +281,7 @@ static int ieee80211_vif_update_links(struct ieee80211_sub_if_data *sdata, struct ieee80211_bss_conf *old[IEEE80211_MLD_MAX_NUM_LINKS]; struct ieee80211_link_data *old_data[IEEE80211_MLD_MAX_NUM_LINKS]; bool use_deflink = old_links == 0; /* set for error case */ + bool non_sta = sdata->vif.type != NL80211_IFTYPE_STATION; lockdep_assert_wiphy(sdata->local->hw.wiphy); @@ -337,6 +338,7 @@ static int ieee80211_vif_update_links(struct ieee80211_sub_if_data *sdata, link = links[link_id]; ieee80211_link_init(sdata, link_id, &link->data, &link->conf); ieee80211_link_setup(&link->data); + ieee80211_set_wmm_default(&link->data, true, non_sta); } if (new_links == 0) From 32e0a7ad9c841f46549ccac0f1cca347a40d8685 Mon Sep 17 00:00:00 2001 From: Daniel J Blueman Date: Fri, 20 Feb 2026 17:34:51 +0800 Subject: [PATCH 086/359] gpio: shared: fix memory leaks On a Snapdragon X1 Elite laptop (Lenovo Yoga Slim 7x), kmemleak reports three sets of: unreferenced object 0xffff00080187f400 (size 1024): comm "swapper/0", pid 1, jiffies 4294667327 hex dump (first 32 bytes): 58 bd 70 01 08 00 ff ff 58 bd 70 01 08 00 ff ff X.p.....X.p..... 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace (crc 1665d1f8): kmemleak_alloc+0xf4/0x12c __kmalloc_cache_noprof+0x370/0x49c gpio_shared_make_ref+0x70/0x16c gpio_shared_of_traverse+0x4e8/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_init+0x34/0x1c4 do_one_initcall+0x50/0x280 kernel_init_freeable+0x290/0x33c kernel_init+0x28/0x14c ret_from_fork+0x10/0x20 unreferenced object 0xffff00080170c140 (size 8): comm "swapper/0", pid 1, jiffies 4294667327 hex dump (first 8 bytes): 72 65 73 65 74 00 00 00 reset... backtrace (crc fc24536): kmemleak_alloc+0xf4/0x12c __kmalloc_node_track_caller_noprof+0x3c4/0x584 kstrdup+0x4c/0xcc gpio_shared_make_ref+0x8c/0x16c gpio_shared_of_traverse+0x4e8/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_of_traverse+0x200/0x5f4 gpio_shared_init+0x34/0x1c4 do_one_initcall+0x50/0x280 kernel_init_freeable+0x290/0x33c kernel_init+0x28/0x14c ret_from_fork+0x10/0x20 Fix this by decrementing the reference count of each list entry rather than only the first. Fix verified on the same laptop. Fixes: a060b8c511abb gpiolib: implement low-level, shared GPIO support Signed-off-by: Daniel J Blueman Link: https://patch.msgid.link/20260220093452.101655-1-daniel@quora.org Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpiolib-shared.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpio/gpiolib-shared.c b/drivers/gpio/gpiolib-shared.c index d2614ace4de1..17a7128b6bd9 100644 --- a/drivers/gpio/gpiolib-shared.c +++ b/drivers/gpio/gpiolib-shared.c @@ -748,14 +748,14 @@ static bool gpio_shared_entry_is_really_shared(struct gpio_shared_entry *entry) static void gpio_shared_free_exclusive(void) { struct gpio_shared_entry *entry, *epos; + struct gpio_shared_ref *ref, *rpos; list_for_each_entry_safe(entry, epos, &gpio_shared_list, list) { if (gpio_shared_entry_is_really_shared(entry)) continue; - gpio_shared_drop_ref(list_first_entry(&entry->refs, - struct gpio_shared_ref, - list)); + list_for_each_entry_safe(ref, rpos, &entry->refs, list) + gpio_shared_drop_ref(ref); gpio_shared_drop_entry(entry); } } From 03c0d030f5874eec6ce22750b2b8751d6d4303b5 Mon Sep 17 00:00:00 2001 From: Hongbo Li Date: Sat, 14 Feb 2026 03:02:48 +0000 Subject: [PATCH 087/359] erofs: allow sharing page cache with the same aops only Inode with identical data but different @aops cannot be mixed because the page cache is managed by different subsystems (e.g., @aops for compressed on-disk inodes cannot handle plain on-disk inodes). In this patch, we never allow inodes to share the page cache among plain, compressed, and fileio cases. When a shared inode is created, we initialize @aops that is the same as the initial real inode, and subsequent inodes cannot share the page cache if the inferred @aops differ from the corresponding shared inode. This is reasonable as a first step because, in typical use cases, if an inode is compressible, it will fall into compressed inodes across different filesystem images unless users use plain filesystems. However, in that cases, users will use plain filesystems all the time. Fixes: 5ef3208e3be5 ("erofs: introduce the page cache share feature") Signed-off-by: Hongbo Li Reviewed-by: Gao Xiang Signed-off-by: Gao Xiang --- fs/erofs/inode.c | 7 ++++++- fs/erofs/internal.h | 16 +++++++--------- fs/erofs/ishare.c | 14 +++++++++----- 3 files changed, 22 insertions(+), 15 deletions(-) diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index 4f86169c23f1..4b3d21402e10 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -222,6 +222,7 @@ err_out: static int erofs_fill_inode(struct inode *inode) { + const struct address_space_operations *aops; int err; trace_erofs_fill_inode(inode); @@ -254,7 +255,11 @@ static int erofs_fill_inode(struct inode *inode) } mapping_set_large_folios(inode->i_mapping); - return erofs_inode_set_aops(inode, inode, false); + aops = erofs_get_aops(inode, false); + if (IS_ERR(aops)) + return PTR_ERR(aops); + inode->i_mapping->a_ops = aops; + return 0; } /* diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index d1634455e389..a4f0a42cf8c3 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -471,26 +471,24 @@ static inline void *erofs_vm_map_ram(struct page **pages, unsigned int count) return NULL; } -static inline int erofs_inode_set_aops(struct inode *inode, - struct inode *realinode, bool no_fscache) +static inline const struct address_space_operations * +erofs_get_aops(struct inode *realinode, bool no_fscache) { if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) { if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP)) - return -EOPNOTSUPP; + return ERR_PTR(-EOPNOTSUPP); DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT, erofs_info, realinode->i_sb, "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!"); - inode->i_mapping->a_ops = &z_erofs_aops; - return 0; + return &z_erofs_aops; } - inode->i_mapping->a_ops = &erofs_aops; if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !no_fscache && erofs_is_fscache_mode(realinode->i_sb)) - inode->i_mapping->a_ops = &erofs_fscache_access_aops; + return &erofs_fscache_access_aops; if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) && erofs_is_fileio_mode(EROFS_SB(realinode->i_sb))) - inode->i_mapping->a_ops = &erofs_fileio_aops; - return 0; + return &erofs_fileio_aops; + return &erofs_aops; } int erofs_register_sysfs(struct super_block *sb); diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c index ce980320a8b9..829d50d5c717 100644 --- a/fs/erofs/ishare.c +++ b/fs/erofs/ishare.c @@ -40,10 +40,14 @@ bool erofs_ishare_fill_inode(struct inode *inode) { struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb); struct erofs_inode *vi = EROFS_I(inode); + const struct address_space_operations *aops; struct erofs_inode_fingerprint fp; struct inode *sharedinode; unsigned long hash; + aops = erofs_get_aops(inode, true); + if (IS_ERR(aops)) + return false; if (erofs_xattr_fill_inode_fingerprint(&fp, inode, sbi->domain_id)) return false; hash = xxh32(fp.opaque, fp.size, 0); @@ -56,15 +60,15 @@ bool erofs_ishare_fill_inode(struct inode *inode) } if (inode_state_read_once(sharedinode) & I_NEW) { - if (erofs_inode_set_aops(sharedinode, inode, true)) { - iget_failed(sharedinode); - kfree(fp.opaque); - return false; - } + sharedinode->i_mapping->a_ops = aops; sharedinode->i_size = vi->vfs_inode.i_size; unlock_new_inode(sharedinode); } else { kfree(fp.opaque); + if (aops != sharedinode->i_mapping->a_ops) { + iput(sharedinode); + return false; + } if (sharedinode->i_size != vi->vfs_inode.i_size) { _erofs_printk(inode->i_sb, KERN_WARNING "size(%lld:%lld) not matches for the same fingerprint\n", From aa280a08e7d8fae58557acc345b36b3dc329d595 Mon Sep 17 00:00:00 2001 From: Andrew Cooper Date: Tue, 6 Jan 2026 13:15:04 +0000 Subject: [PATCH 088/359] x86/fred: Correct speculative safety in fred_extint() array_index_nospec() is no use if the result gets spilled to the stack, as it makes the believed safe-under-speculation value subject to memory predictions. For all practical purposes, this means array_index_nospec() must be used in the expression that accesses the array. As the code currently stands, it's the wrong side of irqentry_enter(), and 'index' is put into %ebp across the function call. Remove the index variable and reposition array_index_nospec(), so it's calculated immediately before the array access. Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code") Signed-off-by: Andrew Cooper Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20260106131504.679932-1-andrew.cooper3@citrix.com --- arch/x86/entry/entry_fred.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c index a9b72997103d..88c757ac8ccd 100644 --- a/arch/x86/entry/entry_fred.c +++ b/arch/x86/entry/entry_fred.c @@ -160,8 +160,6 @@ void __init fred_complete_exception_setup(void) static noinstr void fred_extint(struct pt_regs *regs) { unsigned int vector = regs->fred_ss.vector; - unsigned int index = array_index_nospec(vector - FIRST_SYSTEM_VECTOR, - NR_SYSTEM_VECTORS); if (WARN_ON_ONCE(vector < FIRST_EXTERNAL_VECTOR)) return; @@ -170,7 +168,8 @@ static noinstr void fred_extint(struct pt_regs *regs) irqentry_state_t state = irqentry_enter(regs); instrumentation_begin(); - sysvec_table[index](regs); + sysvec_table[array_index_nospec(vector - FIRST_SYSTEM_VECTOR, + NR_SYSTEM_VECTORS)](regs); instrumentation_end(); irqentry_exit(regs, state); } else { From a0cb371b521dde44f32cfe954b6ef6f82b407393 Mon Sep 17 00:00:00 2001 From: Hou Wenlong Date: Sat, 10 Jan 2026 11:47:37 +0800 Subject: [PATCH 089/359] x86/bug: Handle __WARN_printf() trap in early_fixup_exception() The commit 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()") implemented __WARN_printf(), which changed the mechanism to use UD1 instead of UD2. However, it only handles the trap in the runtime IDT handler, while the early booting IDT handler lacks this handling. As a result, the usage of WARN() before the runtime IDT setup can lead to kernel crashes. Since KMSAN is enabled after the runtime IDT setup, it is safe to use handle_bug() directly in early_fixup_exception() to address this issue. Fixes: 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()") Signed-off-by: Hou Wenlong Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/c4fb3645f60d3a78629d9870e8fcc8535281c24f.1768016713.git.houwenlong.hwl@antgroup.com --- arch/x86/include/asm/traps.h | 2 ++ arch/x86/kernel/traps.c | 2 +- arch/x86/mm/extable.c | 7 ++----- 3 files changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 869b88061801..3f24cc472ce9 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -25,6 +25,8 @@ extern int ibt_selftest_noendbr(void); void handle_invalid_op(struct pt_regs *regs); #endif +noinstr bool handle_bug(struct pt_regs *regs); + static inline int get_si_code(unsigned long condition) { if (condition & DR_STEP) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 5a6a772e0a6c..4dbff8ef9b1c 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -397,7 +397,7 @@ static inline void handle_invalid_op(struct pt_regs *regs) ILL_ILLOPN, error_get_trap_addr(regs)); } -static noinstr bool handle_bug(struct pt_regs *regs) +noinstr bool handle_bug(struct pt_regs *regs) { unsigned long addr = regs->ip; bool handled = false; diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c index 2fdc1f1f5adb..6b9ff1c6cafa 100644 --- a/arch/x86/mm/extable.c +++ b/arch/x86/mm/extable.c @@ -411,14 +411,11 @@ void __init early_fixup_exception(struct pt_regs *regs, int trapnr) return; if (trapnr == X86_TRAP_UD) { - if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN) { - /* Skip the ud2. */ - regs->ip += LEN_UD2; + if (handle_bug(regs)) return; - } /* - * If this was a BUG and report_bug returns or if this + * If this was a BUG and handle_bug returns or if this * was just a normal #UD, we want to continue onward and * crash. */ From 24c8147abb39618d74fcc36e325765e8fe7bdd7a Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 11 Feb 2026 13:59:43 +0100 Subject: [PATCH 090/359] x86/cfi: Fix CFI rewrite for odd alignments Rustam reported his clang builds did not boot properly; turns out his .config has: CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y set. Fix up the FineIBT code to deal with this unusual alignment. Fixes: 931ab63664f0 ("x86/ibt: Implement FineIBT") Reported-by: Rustam Kovhaev Signed-off-by: Peter Zijlstra (Intel) Tested-by: Rustam Kovhaev --- arch/x86/include/asm/cfi.h | 12 ++++++++---- arch/x86/include/asm/linkage.h | 4 ++-- arch/x86/kernel/alternative.c | 29 ++++++++++++++++++++++------- arch/x86/net/bpf_jit_comp.c | 13 ++----------- 4 files changed, 34 insertions(+), 24 deletions(-) diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index c40b9ebc1fb4..ab3fbbd947ed 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -111,6 +111,12 @@ extern bhi_thunk __bhi_args_end[]; struct pt_regs; +#ifdef CONFIG_CALL_PADDING +#define CFI_OFFSET (CONFIG_FUNCTION_PADDING_CFI+5) +#else +#define CFI_OFFSET 5 +#endif + #ifdef CONFIG_CFI enum bug_trap_type handle_cfi_failure(struct pt_regs *regs); #define __bpfcall @@ -119,11 +125,9 @@ static inline int cfi_get_offset(void) { switch (cfi_mode) { case CFI_FINEIBT: - return 16; + return /* fineibt_prefix_size */ 16; case CFI_KCFI: - if (IS_ENABLED(CONFIG_CALL_PADDING)) - return 16; - return 5; + return CFI_OFFSET; default: return 0; } diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h index 9d38ae744a2e..a7294656ad90 100644 --- a/arch/x86/include/asm/linkage.h +++ b/arch/x86/include/asm/linkage.h @@ -68,7 +68,7 @@ * Depending on -fpatchable-function-entry=N,N usage (CONFIG_CALL_PADDING) the * CFI symbol layout changes. * - * Without CALL_THUNKS: + * Without CALL_PADDING: * * .align FUNCTION_ALIGNMENT * __cfi_##name: @@ -77,7 +77,7 @@ * .long __kcfi_typeid_##name * name: * - * With CALL_THUNKS: + * With CALL_PADDING: * * .align FUNCTION_ALIGNMENT * __cfi_##name: diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index a888ae0f01fb..e87da25d1236 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1182,7 +1182,7 @@ void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end) poison_endbr(addr); if (IS_ENABLED(CONFIG_FINEIBT)) - poison_cfi(addr - 16); + poison_cfi(addr - CFI_OFFSET); } } @@ -1389,6 +1389,8 @@ extern u8 fineibt_preamble_end[]; #define fineibt_preamble_ud 0x13 #define fineibt_preamble_hash 5 +#define fineibt_prefix_size (fineibt_preamble_size - ENDBR_INSN_SIZE) + /* * : * 0: b8 78 56 34 12 mov $0x12345678, %eax @@ -1634,7 +1636,7 @@ static int cfi_rewrite_preamble(s32 *start, s32 *end) * have determined there are no indirect calls to it and we * don't need no CFI either. */ - if (!is_endbr(addr + 16)) + if (!is_endbr(addr + CFI_OFFSET)) continue; hash = decode_preamble_hash(addr, &arity); @@ -1642,6 +1644,15 @@ static int cfi_rewrite_preamble(s32 *start, s32 *end) addr, addr, 5, addr)) return -EINVAL; + /* + * FineIBT relies on being at func-16, so if the preamble is + * actually larger than that, place it the tail end. + * + * NOTE: this is possible with things like DEBUG_CALL_THUNKS + * and DEBUG_FORCE_FUNCTION_ALIGN_64B. + */ + addr += CFI_OFFSET - fineibt_prefix_size; + text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size); WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678); text_poke_early(addr + fineibt_preamble_hash, &hash, 4); @@ -1664,10 +1675,10 @@ static void cfi_rewrite_endbr(s32 *start, s32 *end) for (s = start; s < end; s++) { void *addr = (void *)s + *s; - if (!exact_endbr(addr + 16)) + if (!exact_endbr(addr + CFI_OFFSET)) continue; - poison_endbr(addr + 16); + poison_endbr(addr + CFI_OFFSET); } } @@ -1772,7 +1783,8 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, if (FINEIBT_WARN(fineibt_preamble_size, 20) || FINEIBT_WARN(fineibt_preamble_bhi + fineibt_bhi1_size, 20) || FINEIBT_WARN(fineibt_caller_size, 14) || - FINEIBT_WARN(fineibt_paranoid_size, 20)) + FINEIBT_WARN(fineibt_paranoid_size, 20) || + WARN_ON_ONCE(CFI_OFFSET < fineibt_prefix_size)) return; if (cfi_mode == CFI_AUTO) { @@ -1885,6 +1897,11 @@ static void poison_cfi(void *addr) */ switch (cfi_mode) { case CFI_FINEIBT: + /* + * FineIBT preamble is at func-16. + */ + addr += CFI_OFFSET - fineibt_prefix_size; + /* * FineIBT prefix should start with an ENDBR. */ @@ -1923,8 +1940,6 @@ static void poison_cfi(void *addr) } } -#define fineibt_prefix_size (fineibt_preamble_size - ENDBR_INSN_SIZE) - /* * When regs->ip points to a 0xD6 byte in the FineIBT preamble, * return true and fill out target and type. diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 8f10080e6fe3..e9b78040d703 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -438,17 +438,8 @@ static void emit_kcfi(u8 **pprog, u32 hash) EMIT1_off32(0xb8, hash); /* movl $hash, %eax */ #ifdef CONFIG_CALL_PADDING - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); - EMIT1(0x90); + for (int i = 0; i < CONFIG_FUNCTION_PADDING_CFI; i++) + EMIT1(0x90); #endif EMIT_ENDBR(); From 237dc6a054f6787c2a8f61c59086030267e5e1c5 Mon Sep 17 00:00:00 2001 From: Thomas Huth Date: Thu, 18 Dec 2025 19:20:29 +0100 Subject: [PATCH 091/359] x86/headers: Replace __ASSEMBLY__ stragglers with __ASSEMBLER__ After converting the __ASSEMBLY__ statements to __ASSEMBLER__ in commit 24a295e4ef1ca ("x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers"), some new code has been added that uses __ASSEMBLY__ again. Convert these stragglers, too. This is a mechanical patch, done with a simple "sed -i" command. Signed-off-by: Thomas Huth Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20251218182029.166993-1-thuth@redhat.com --- arch/x86/include/asm/bug.h | 6 +++--- arch/x86/include/asm/irqflags.h | 4 ++-- arch/x86/include/asm/percpu.h | 2 +- arch/x86/include/asm/runtime-const.h | 6 +++--- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h index 9b4e04690e1a..80c1696d8d59 100644 --- a/arch/x86/include/asm/bug.h +++ b/arch/x86/include/asm/bug.h @@ -7,7 +7,7 @@ #include #include -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ struct bug_entry; extern void __WARN_trap(struct bug_entry *bug, ...); #endif @@ -137,7 +137,7 @@ do { \ #ifdef HAVE_ARCH_BUG_FORMAT_ARGS -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ #include DECLARE_STATIC_CALL(WARN_trap, __WARN_trap); @@ -153,7 +153,7 @@ struct arch_va_list { struct sysv_va_list args; }; extern void *__warn_args(struct arch_va_list *args, struct pt_regs *regs); -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #define __WARN_bug_entry(flags, format) ({ \ struct bug_entry *bug; \ diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index a1193e9d65f2..462754b0bf8a 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -77,7 +77,7 @@ static __always_inline void native_local_irq_restore(unsigned long flags) #endif #ifndef CONFIG_PARAVIRT -#ifndef __ASSEMBLY__ +#ifndef __ASSEMBLER__ /* * Used in the idle loop; sti takes one instruction cycle * to complete: @@ -95,7 +95,7 @@ static __always_inline void halt(void) { native_halt(); } -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif /* CONFIG_PARAVIRT */ #ifdef CONFIG_PARAVIRT_XXL diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index c55058f3d75e..409981468cba 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -20,7 +20,7 @@ #define PER_CPU_VAR(var) __percpu(var)__percpu_rel -#else /* !__ASSEMBLY__: */ +#else /* !__ASSEMBLER__: */ #include #include diff --git a/arch/x86/include/asm/runtime-const.h b/arch/x86/include/asm/runtime-const.h index e5a13dc8816e..4cd94fdcb45e 100644 --- a/arch/x86/include/asm/runtime-const.h +++ b/arch/x86/include/asm/runtime-const.h @@ -6,7 +6,7 @@ #error "Cannot use runtime-const infrastructure from modules" #endif -#ifdef __ASSEMBLY__ +#ifdef __ASSEMBLER__ .macro RUNTIME_CONST_PTR sym reg movq $0x0123456789abcdef, %\reg @@ -16,7 +16,7 @@ .popsection .endm -#else /* __ASSEMBLY__ */ +#else /* __ASSEMBLER__ */ #define runtime_const_ptr(sym) ({ \ typeof(sym) __ret; \ @@ -74,5 +74,5 @@ static inline void runtime_const_fixup(void (*fn)(void *, unsigned long), } } -#endif /* __ASSEMBLY__ */ +#endif /* __ASSEMBLER__ */ #endif From b3d99f43c72b56cf7a104a364e7fb34b0702828b Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 9 Feb 2026 15:28:16 +0100 Subject: [PATCH 092/359] sched/fair: Fix zero_vruntime tracking It turns out that zero_vruntime tracking is broken when there is but a single task running. Current update paths are through __{en,de}queue_entity(), and when there is but a single task, pick_next_task() will always return that one task, and put_prev_set_next_task() will end up in neither function. This can cause entity_key() to grow indefinitely large and cause overflows, leading to much pain and suffering. Furtermore, doing update_zero_vruntime() from __{de,en}queue_entity(), which are called from {set_next,put_prev}_entity() has problems because: - set_next_entity() calls __dequeue_entity() before it does cfs_rq->curr = se. This means the avg_vruntime() will see the removal but not current, missing the entity for accounting. - put_prev_entity() calls __enqueue_entity() before it does cfs_rq->curr = NULL. This means the avg_vruntime() will see the addition *and* current, leading to double accounting. Both cases are incorrect/inconsistent. Noting that avg_vruntime is already called on each {en,de}queue, remove the explicit avg_vruntime() calls (which removes an extra 64bit division for each {en,de}queue) and have avg_vruntime() update zero_vruntime itself. Additionally, have the tick call avg_vruntime() -- discarding the result, but for the side-effect of updating zero_vruntime. While there, optimize avg_vruntime() by noting that the average of one value is rather trivial to compute. Test case: # taskset -c -p 1 $$ # taskset -c 2 bash -c 'while :; do :; done&' # cat /sys/kernel/debug/sched/debug | awk '/^cpu#/ {P=0} /^cpu#2,/ {P=1} {if (P) print $0}' | grep -e zero_vruntime -e "^>" PRE: .zero_vruntime : 31316.407903 >R bash 487 50787.345112 E 50789.145972 2.800000 50780.298364 16 120 0.000000 0.000000 0.000000 / .zero_vruntime : 382548.253179 >R bash 487 427275.204288 E 427276.003584 2.800000 427268.157540 23 120 0.000000 0.000000 0.000000 / POST: .zero_vruntime : 17259.709467 >R bash 526 17259.709467 E 17262.509467 2.800000 16915.031624 9 120 0.000000 0.000000 0.000000 / .zero_vruntime : 18702.723356 >R bash 526 18702.723356 E 18705.523356 2.800000 18358.045513 9 120 0.000000 0.000000 0.000000 / Fixes: 79f3f9bedd14 ("sched/eevdf: Fix min_vruntime vs avg_vruntime") Reported-by: K Prateek Nayak Signed-off-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak Tested-by: Shubhang Kaushik Link: https://patch.msgid.link/20260219080624.438854780%40infradead.org --- kernel/sched/fair.c | 84 ++++++++++++++++++++++++++++++--------------- 1 file changed, 57 insertions(+), 27 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index eea99ec01a3f..56dddd4fd208 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -589,6 +589,21 @@ static inline bool entity_before(const struct sched_entity *a, return vruntime_cmp(a->deadline, "<", b->deadline); } +/* + * Per avg_vruntime() below, cfs_rq::zero_vruntime is only slightly stale + * and this value should be no more than two lag bounds. Which puts it in the + * general order of: + * + * (slice + TICK_NSEC) << NICE_0_LOAD_SHIFT + * + * which is around 44 bits in size (on 64bit); that is 20 for + * NICE_0_LOAD_SHIFT, another 20 for NSEC_PER_MSEC and then a handful for + * however many msec the actual slice+tick ends up begin. + * + * (disregarding the actual divide-by-weight part makes for the worst case + * weight of 2, which nicely cancels vs the fuzz in zero_vruntime not actually + * being the zero-lag point). + */ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se) { return vruntime_op(se->vruntime, "-", cfs_rq->zero_vruntime); @@ -676,39 +691,61 @@ sum_w_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se) } static inline -void sum_w_vruntime_update(struct cfs_rq *cfs_rq, s64 delta) +void update_zero_vruntime(struct cfs_rq *cfs_rq, s64 delta) { /* - * v' = v + d ==> sum_w_vruntime' = sum_runtime - d*sum_weight + * v' = v + d ==> sum_w_vruntime' = sum_w_vruntime - d*sum_weight */ cfs_rq->sum_w_vruntime -= cfs_rq->sum_weight * delta; + cfs_rq->zero_vruntime += delta; } /* - * Specifically: avg_runtime() + 0 must result in entity_eligible() := true + * Specifically: avg_vruntime() + 0 must result in entity_eligible() := true * For this to be so, the result of this function must have a left bias. + * + * Called in: + * - place_entity() -- before enqueue + * - update_entity_lag() -- before dequeue + * - entity_tick() + * + * This means it is one entry 'behind' but that puts it close enough to where + * the bound on entity_key() is at most two lag bounds. */ u64 avg_vruntime(struct cfs_rq *cfs_rq) { struct sched_entity *curr = cfs_rq->curr; - s64 avg = cfs_rq->sum_w_vruntime; - long load = cfs_rq->sum_weight; + long weight = cfs_rq->sum_weight; + s64 delta = 0; - if (curr && curr->on_rq) { - unsigned long weight = scale_load_down(curr->load.weight); + if (curr && !curr->on_rq) + curr = NULL; - avg += entity_key(cfs_rq, curr) * weight; - load += weight; - } + if (weight) { + s64 runtime = cfs_rq->sum_w_vruntime; + + if (curr) { + unsigned long w = scale_load_down(curr->load.weight); + + runtime += entity_key(cfs_rq, curr) * w; + weight += w; + } - if (load) { /* sign flips effective floor / ceiling */ - if (avg < 0) - avg -= (load - 1); - avg = div_s64(avg, load); + if (runtime < 0) + runtime -= (weight - 1); + + delta = div_s64(runtime, weight); + } else if (curr) { + /* + * When there is but one element, it is the average. + */ + delta = curr->vruntime - cfs_rq->zero_vruntime; } - return cfs_rq->zero_vruntime + avg; + update_zero_vruntime(cfs_rq, delta); + + return cfs_rq->zero_vruntime; } /* @@ -777,16 +814,6 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se) return vruntime_eligible(cfs_rq, se->vruntime); } -static void update_zero_vruntime(struct cfs_rq *cfs_rq) -{ - u64 vruntime = avg_vruntime(cfs_rq); - s64 delta = vruntime_op(vruntime, "-", cfs_rq->zero_vruntime); - - sum_w_vruntime_update(cfs_rq, delta); - - cfs_rq->zero_vruntime = vruntime; -} - static inline u64 cfs_rq_min_slice(struct cfs_rq *cfs_rq) { struct sched_entity *root = __pick_root_entity(cfs_rq); @@ -856,7 +883,6 @@ RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity, static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) { sum_w_vruntime_add(cfs_rq, se); - update_zero_vruntime(cfs_rq); se->min_vruntime = se->vruntime; se->min_slice = se->slice; rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline, @@ -868,7 +894,6 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline, &min_vruntime_cb); sum_w_vruntime_sub(cfs_rq, se); - update_zero_vruntime(cfs_rq); } struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq) @@ -5524,6 +5549,11 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued) update_load_avg(cfs_rq, curr, UPDATE_TG); update_cfs_group(curr); + /* + * Pulls along cfs_rq::zero_vruntime. + */ + avg_vruntime(cfs_rq); + #ifdef CONFIG_SCHED_HRTICK /* * queued ticks are scheduled to match the slice, so don't bother From bcd74b2ffdd0a2233adbf26b65c62fc69a809c8e Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 23 Jan 2026 16:49:09 +0100 Subject: [PATCH 093/359] sched/fair: Only set slice protection at pick time We should not (re)set slice protection in the sched_change pattern which calls put_prev_task() / set_next_task(). Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption") Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Vincent Guittot Tested-by: K Prateek Nayak Tested-by: Shubhang Kaushik Link: https://patch.msgid.link/20260219080624.561421378%40infradead.org --- kernel/sched/fair.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 56dddd4fd208..f2b46c33a8c5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5445,7 +5445,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) } static void -set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) +set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, bool first) { clear_buddies(cfs_rq, se); @@ -5460,7 +5460,8 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) __dequeue_entity(cfs_rq, se); update_load_avg(cfs_rq, se, UPDATE_TG); - set_protect_slice(cfs_rq, se); + if (first) + set_protect_slice(cfs_rq, se); } update_stats_curr_start(cfs_rq, se); @@ -8978,13 +8979,13 @@ again: pse = parent_entity(pse); } if (se_depth >= pse_depth) { - set_next_entity(cfs_rq_of(se), se); + set_next_entity(cfs_rq_of(se), se, true); se = parent_entity(se); } } put_prev_entity(cfs_rq, pse); - set_next_entity(cfs_rq, se); + set_next_entity(cfs_rq, se, true); __set_next_task_fair(rq, p, true); } @@ -13598,7 +13599,7 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first) for_each_sched_entity(se) { struct cfs_rq *cfs_rq = cfs_rq_of(se); - set_next_entity(cfs_rq, se); + set_next_entity(cfs_rq, se, first); /* ensure bandwidth has been allocated on our new cfs_rq */ account_cfs_rq_runtime(cfs_rq, 0); } From ff38424030f98976150e42ca35f4b00e6ab8fa23 Mon Sep 17 00:00:00 2001 From: Wang Tao Date: Tue, 20 Jan 2026 12:31:13 +0000 Subject: [PATCH 094/359] sched/eevdf: Update se->vprot in reweight_entity() In the EEVDF framework with Run-to-Parity protection, `se->vprot` is an independent variable defining the virtual protection timestamp. When `reweight_entity()` is called (e.g., via nice/renice), it performs the following actions to preserve Lag consistency: 1. Scales `se->vlag` based on the new weight. 2. Calls `place_entity()`, which recalculates `se->vruntime` based on the new weight and scaled lag. However, the current implementation fails to update `se->vprot`, leading to mismatches between the task's actual runtime and its expected duration. Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption") Suggested-by: Zhang Qiao Signed-off-by: Wang Tao Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Vincent Guittot Tested-by: K Prateek Nayak Tested-by: Shubhang Kaushik Link: https://patch.msgid.link/20260120123113.3518950-1-wangtao554@huawei.com --- kernel/sched/fair.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f2b46c33a8c5..93fa5b8313e4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3815,6 +3815,8 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, unsigned long weight) { bool curr = cfs_rq->curr == se; + bool rel_vprot = false; + u64 vprot; if (se->on_rq) { /* commit outstanding execution time */ @@ -3822,6 +3824,11 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, update_entity_lag(cfs_rq, se); se->deadline -= se->vruntime; se->rel_deadline = 1; + if (curr && protect_slice(se)) { + vprot = se->vprot - se->vruntime; + rel_vprot = true; + } + cfs_rq->nr_queued--; if (!curr) __dequeue_entity(cfs_rq, se); @@ -3837,6 +3844,9 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, if (se->rel_deadline) se->deadline = div_s64(se->deadline * se->load.weight, weight); + if (rel_vprot) + vprot = div_s64(vprot * se->load.weight, weight); + update_load_set(&se->load, weight); do { @@ -3848,6 +3858,8 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, enqueue_load_avg(cfs_rq, se); if (se->on_rq) { place_entity(cfs_rq, se, 0); + if (rel_vprot) + se->vprot = se->vruntime + vprot; update_load_add(&cfs_rq->load, se->load.weight); if (!curr) __enqueue_entity(cfs_rq, se); From 6e3c0a4e1ad1e0455b7880fad02b3ee179f56c09 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 22 Apr 2025 12:16:28 +0200 Subject: [PATCH 095/359] sched/fair: Fix lag clamp Vincent reported that he was seeing undue lag clamping in a mixed slice workload. Implement the max_slice tracking as per the todo comment. Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy") Reported-off-by: Vincent Guittot Signed-off-by: Peter Zijlstra (Intel) Tested-by: Vincent Guittot Tested-by: K Prateek Nayak Tested-by: Shubhang Kaushik Link: https://patch.msgid.link/20250422101628.GA33555@noisy.programming.kicks-ass.net --- include/linux/sched.h | 1 + kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++---- 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 074ad4ef3d81..a7b4a980eb2f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -579,6 +579,7 @@ struct sched_entity { u64 deadline; u64 min_vruntime; u64 min_slice; + u64 max_slice; struct list_head group_node; unsigned char on_rq; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 93fa5b8313e4..f4446cbe8ffa 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -748,6 +748,8 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq) return cfs_rq->zero_vruntime; } +static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq); + /* * lag_i = S - s_i = w_i * (V - v_i) * @@ -761,17 +763,16 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq) * EEVDF gives the following limit for a steady state system: * * -r_max < lag < max(r_max, q) - * - * XXX could add max_slice to the augmented data to track this. */ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se) { + u64 max_slice = cfs_rq_max_slice(cfs_rq) + TICK_NSEC; s64 vlag, limit; WARN_ON_ONCE(!se->on_rq); vlag = avg_vruntime(cfs_rq) - se->vruntime; - limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se); + limit = calc_delta_fair(max_slice, se); se->vlag = clamp(vlag, -limit, limit); } @@ -829,6 +830,21 @@ static inline u64 cfs_rq_min_slice(struct cfs_rq *cfs_rq) return min_slice; } +static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq) +{ + struct sched_entity *root = __pick_root_entity(cfs_rq); + struct sched_entity *curr = cfs_rq->curr; + u64 max_slice = 0ULL; + + if (curr && curr->on_rq) + max_slice = curr->slice; + + if (root) + max_slice = max(max_slice, root->max_slice); + + return max_slice; +} + static inline bool __entity_less(struct rb_node *a, const struct rb_node *b) { return entity_before(__node_2_se(a), __node_2_se(b)); @@ -853,6 +869,15 @@ static inline void __min_slice_update(struct sched_entity *se, struct rb_node *n } } +static inline void __max_slice_update(struct sched_entity *se, struct rb_node *node) +{ + if (node) { + struct sched_entity *rse = __node_2_se(node); + if (rse->max_slice > se->max_slice) + se->max_slice = rse->max_slice; + } +} + /* * se->min_vruntime = min(se->vruntime, {left,right}->min_vruntime) */ @@ -860,6 +885,7 @@ static inline bool min_vruntime_update(struct sched_entity *se, bool exit) { u64 old_min_vruntime = se->min_vruntime; u64 old_min_slice = se->min_slice; + u64 old_max_slice = se->max_slice; struct rb_node *node = &se->run_node; se->min_vruntime = se->vruntime; @@ -870,8 +896,13 @@ static inline bool min_vruntime_update(struct sched_entity *se, bool exit) __min_slice_update(se, node->rb_right); __min_slice_update(se, node->rb_left); + se->max_slice = se->slice; + __max_slice_update(se, node->rb_right); + __max_slice_update(se, node->rb_left); + return se->min_vruntime == old_min_vruntime && - se->min_slice == old_min_slice; + se->min_slice == old_min_slice && + se->max_slice == old_max_slice; } RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity, From 4c652a47722f69c6f2685f05b17490ea97f643a8 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 6 Feb 2026 08:41:13 +0100 Subject: [PATCH 096/359] rseq: Mark rseq_arm_slice_extension_timer() __always_inline objtool warns about this function being called inside of a uaccess section: kernel/entry/common.o: warning: objtool: irqentry_exit+0x1dc: call to rseq_arm_slice_extension_timer() with UACCESS enabled Interestingly, this happens with CONFIG_RSEQ_SLICE_EXTENSION disabled, so this is an empty function, as the normal implementation is already marked __always_inline. I could reproduce this multiple times with gcc-11 but not with gcc-15, so the compiler probably got better at identifying the trivial function. Mark all the empty helpers for !RSEQ_SLICE_EXTENSION as __always_inline for consistency, avoiding this warning. Fixes: 0ac3b5c3dc45 ("rseq: Implement time slice extension enforcement timer") Signed-off-by: Arnd Bergmann Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20260206074122.709580-1-arnd@kernel.org --- include/linux/rseq_entry.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/rseq_entry.h b/include/linux/rseq_entry.h index cbc4a791618b..c6831c93cd6e 100644 --- a/include/linux/rseq_entry.h +++ b/include/linux/rseq_entry.h @@ -216,10 +216,10 @@ efault: } #else /* CONFIG_RSEQ_SLICE_EXTENSION */ -static inline bool rseq_slice_extension_enabled(void) { return false; } -static inline bool rseq_arm_slice_extension_timer(void) { return false; } -static inline void rseq_slice_clear_grant(struct task_struct *t) { } -static inline bool rseq_grant_slice_extension(bool work_pending) { return false; } +static __always_inline bool rseq_slice_extension_enabled(void) { return false; } +static __always_inline bool rseq_arm_slice_extension_timer(void) { return false; } +static __always_inline void rseq_slice_clear_grant(struct task_struct *t) { } +static __always_inline bool rseq_grant_slice_extension(bool work_pending) { return false; } #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */ bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs, unsigned long csaddr); From 5324953c06bd929c135d9e04be391ee2c11b5a19 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 18 Feb 2026 17:33:29 +0100 Subject: [PATCH 097/359] sched/core: Fix wakeup_preempt's next_class tracking Kernel test robot reported that tools/testing/selftests/kvm/hardware_disable_test was failing due to commit 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()") It turns out there were two related problems that could lead to a missed preemption: - when hitting newidle balance from the idle thread, it would elevate rb->next_class from &idle_sched_class to &fair_sched_class, causing later wakeup_preempt() calls to not hit the sched_class_above() case, and not issue resched_curr(). Notably, this modification pattern should only lower the next_class, and never raise it. Create two new helper functions to wrap this. - when doing schedule_idle(), it was possible to miss (re)setting rq->next_class to &idle_sched_class, leading to the very same problem. Cc: Sean Christopherson Fixes: 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202602122157.4e861298-lkp@intel.com Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20260218163329.GQ1395416@noisy.programming.kicks-ass.net --- kernel/sched/core.c | 1 + kernel/sched/ext.c | 4 ++-- kernel/sched/fair.c | 4 ++-- kernel/sched/sched.h | 11 +++++++++++ 4 files changed, 16 insertions(+), 4 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 759777694c78..b7f77c165a6e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6830,6 +6830,7 @@ static void __sched notrace __schedule(int sched_mode) /* SCX must consult the BPF scheduler to tell if rq is empty */ if (!rq->nr_running && !scx_enabled()) { next = prev; + rq->next_class = &idle_sched_class; goto picked; } } else if (!preempt && prev_state) { diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 62b1f3ac5630..06cc0a4aec66 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2460,7 +2460,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx) /* see kick_cpus_irq_workfn() */ smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1); - rq->next_class = &ext_sched_class; + rq_modified_begin(rq, &ext_sched_class); rq_unpin_lock(rq, rf); balance_one(rq, prev); @@ -2475,7 +2475,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx) * If @force_scx is true, always try to pick a SCHED_EXT task, * regardless of any higher-priority sched classes activity. */ - if (!force_scx && sched_class_above(rq->next_class, &ext_sched_class)) + if (!force_scx && rq_modified_above(rq, &ext_sched_class)) return RETRY_TASK; keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f4446cbe8ffa..bf948db905ed 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -12982,7 +12982,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf) t0 = sched_clock_cpu(this_cpu); __sched_balance_update_blocked_averages(this_rq); - this_rq->next_class = &fair_sched_class; + rq_modified_begin(this_rq, &fair_sched_class); raw_spin_rq_unlock(this_rq); for_each_domain(this_cpu, sd) { @@ -13049,7 +13049,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf) pulled_task = 1; /* If a higher prio class was modified, restart the pick */ - if (sched_class_above(this_rq->next_class, &fair_sched_class)) + if (rq_modified_above(this_rq, &fair_sched_class)) pulled_task = -1; out: diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index b82fb70a9d54..43bbf0693cca 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2748,6 +2748,17 @@ static inline const struct sched_class *next_active_class(const struct sched_cla #define sched_class_above(_a, _b) ((_a) < (_b)) +static inline void rq_modified_begin(struct rq *rq, const struct sched_class *class) +{ + if (sched_class_above(rq->next_class, class)) + rq->next_class = class; +} + +static inline bool rq_modified_above(struct rq *rq, const struct sched_class *class) +{ + return sched_class_above(rq->next_class, class); +} + static inline bool sched_stop_runnable(struct rq *rq) { return rq->stop && task_on_rq_queued(rq->stop); From 26d43a90be81fc90e26688a51d3ec83188602731 Mon Sep 17 00:00:00 2001 From: Mathieu Desnoyers Date: Fri, 20 Feb 2026 15:06:40 -0500 Subject: [PATCH 098/359] rseq: Clarify rseq registration rseq_size bound check comment The rseq registration validates that the rseq_size argument is greater or equal to 32 (the original rseq size), but the comment associated with this check does not clearly state this. Clarify the comment to that effect. Fixes: ee3e3ac05c26 ("rseq: Introduce extensible rseq ABI") Signed-off-by: Mathieu Desnoyers Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20260220200642.1317826-2-mathieu.desnoyers@efficios.com --- kernel/rseq.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/rseq.c b/kernel/rseq.c index b0973d19f366..e349f86cc945 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -449,8 +449,9 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flags, u32 * auxiliary vector AT_RSEQ_ALIGN. If rseq_len is the original rseq * size, the required alignment is the original struct rseq alignment. * - * In order to be valid, rseq_len is either the original rseq size, or - * large enough to contain all supported fields, as communicated to + * The rseq_len is required to be greater or equal to the original rseq + * size. In order to be valid, rseq_len is either the original rseq size, + * or large enough to contain all supported fields, as communicated to * user-space through the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE. */ if (rseq_len < ORIG_RSEQ_SIZE || From 3b68df978133ac3d46d570af065a73debbb68248 Mon Sep 17 00:00:00 2001 From: Mathieu Desnoyers Date: Fri, 20 Feb 2026 15:06:41 -0500 Subject: [PATCH 099/359] rseq: slice ext: Ensure rseq feature size differs from original rseq size Before rseq became extensible, its original size was 32 bytes even though the active rseq area was only 20 bytes. This had the following impact in terms of userspace ecosystem evolution: * The GNU libc between 2.35 and 2.39 expose a __rseq_size symbol set to 32, even though the size of the active rseq area is really 20. * The GNU libc 2.40 changes this __rseq_size to 20, thus making it express the active rseq area. * Starting from glibc 2.41, __rseq_size corresponds to the AT_RSEQ_FEATURE_SIZE from getauxval(3). This means that users of __rseq_size can always expect it to correspond to the active rseq area, except for the value 32, for which the active rseq area is 20 bytes. Exposing a 32 bytes feature size would make life needlessly painful for userspace. Therefore, add a reserved field at the end of the rseq area to bump the feature size to 33 bytes. This reserved field is expected to be replaced with whatever field will come next, expecting that this field will be larger than 1 byte. The effect of this change is to increase the size from 32 to 64 bytes before we actually have fields using that memory. Clarify the allocation size and alignment requirements in the struct rseq uapi comment. Change the value returned by getauxval(AT_RSEQ_ALIGN) to return the value of the active rseq area size rounded up to next power of 2, which guarantees that the rseq structure will always be aligned on the nearest power of two large enough to contain it, even as it grows. Change the alignment check in the rseq registration accordingly. This will minimize the amount of ABI corner-cases we need to document and require userspace to play games with. The rule stays simple when __rseq_size != 32: #define rseq_field_available(field) (__rseq_size >= offsetofend(struct rseq_abi, field)) Signed-off-by: Mathieu Desnoyers Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20260220200642.1317826-3-mathieu.desnoyers@efficios.com --- fs/binfmt_elf.c | 3 ++- include/linux/rseq.h | 12 ++++++++++++ include/uapi/linux/rseq.h | 26 ++++++++++++++++++++++---- kernel/rseq.c | 3 ++- 4 files changed, 38 insertions(+), 6 deletions(-) diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 8e89cc5b2820..fb857faaf0d6 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -47,6 +47,7 @@ #include #include #include +#include #include #include @@ -286,7 +287,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, } #ifdef CONFIG_RSEQ NEW_AUX_ENT(AT_RSEQ_FEATURE_SIZE, offsetof(struct rseq, end)); - NEW_AUX_ENT(AT_RSEQ_ALIGN, __alignof__(struct rseq)); + NEW_AUX_ENT(AT_RSEQ_ALIGN, rseq_alloc_align()); #endif #undef NEW_AUX_ENT /* AT_NULL is zero; clear the rest too */ diff --git a/include/linux/rseq.h b/include/linux/rseq.h index 7a01a0760405..b9d62fc2140d 100644 --- a/include/linux/rseq.h +++ b/include/linux/rseq.h @@ -146,6 +146,18 @@ static inline void rseq_fork(struct task_struct *t, u64 clone_flags) t->rseq = current->rseq; } +/* + * Value returned by getauxval(AT_RSEQ_ALIGN) and expected by rseq + * registration. This is the active rseq area size rounded up to next + * power of 2, which guarantees that the rseq structure will always be + * aligned on the nearest power of two large enough to contain it, even + * as it grows. + */ +static inline unsigned int rseq_alloc_align(void) +{ + return 1U << get_count_order(offsetof(struct rseq, end)); +} + #else /* CONFIG_RSEQ */ static inline void rseq_handle_slowpath(struct pt_regs *regs) { } static inline void rseq_signal_deliver(struct ksignal *ksig, struct pt_regs *regs) { } diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h index 863c4a00a66b..f69344fe6c08 100644 --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -87,10 +87,17 @@ struct rseq_slice_ctrl { }; /* - * struct rseq is aligned on 4 * 8 bytes to ensure it is always - * contained within a single cache-line. + * The original size and alignment of the allocation for struct rseq is + * 32 bytes. * - * A single struct rseq per thread is allowed. + * The allocation size needs to be greater or equal to + * max(getauxval(AT_RSEQ_FEATURE_SIZE), 32), and the allocation needs to + * be aligned on max(getauxval(AT_RSEQ_ALIGN), 32). + * + * As an alternative, userspace is allowed to use both the original size + * and alignment of 32 bytes for backward compatibility. + * + * A single active struct rseq registration per thread is allowed. */ struct rseq { /* @@ -180,10 +187,21 @@ struct rseq { */ struct rseq_slice_ctrl slice_ctrl; + /* + * Before rseq became extensible, its original size was 32 bytes even + * though the active rseq area was only 20 bytes. + * Exposing a 32 bytes feature size would make life needlessly painful + * for userspace. Therefore, add a reserved byte after byte 32 + * to bump the rseq feature size from 32 to 33. + * The next field to be added to the rseq area will be larger + * than one byte, and will replace this reserved byte. + */ + __u8 __reserved; + /* * Flexible array member at end of structure, after last feature field. */ char end[]; -} __attribute__((aligned(4 * sizeof(__u64)))); +} __attribute__((aligned(32))); #endif /* _UAPI_LINUX_RSEQ_H */ diff --git a/kernel/rseq.c b/kernel/rseq.c index e349f86cc945..38d3ef540760 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -80,6 +80,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -456,7 +457,7 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int, flags, u32 */ if (rseq_len < ORIG_RSEQ_SIZE || (rseq_len == ORIG_RSEQ_SIZE && !IS_ALIGNED((unsigned long)rseq, ORIG_RSEQ_SIZE)) || - (rseq_len != ORIG_RSEQ_SIZE && (!IS_ALIGNED((unsigned long)rseq, __alignof__(*rseq)) || + (rseq_len != ORIG_RSEQ_SIZE && (!IS_ALIGNED((unsigned long)rseq, rseq_alloc_align()) || rseq_len < offsetof(struct rseq, end)))) return -EINVAL; if (!access_ok(rseq, rseq_len)) From 486ff5ad49bc50315bcaf6d45f04a33ef0a45ced Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Mon, 2 Jun 2025 21:51:05 -0700 Subject: [PATCH 100/359] perf/core: Fix invalid wait context in ctx_sched_in() Lockdep found a bug in the event scheduling when a pinned event was failed and wakes up the threads in the ring buffer like below. It seems it should not grab a wait-queue lock under perf-context lock. Let's do it with irq_work. [ 39.913691] ============================= [ 39.914157] [ BUG: Invalid wait context ] [ 39.914623] 6.15.0-next-20250530-next-2025053 #1 Not tainted [ 39.915271] ----------------------------- [ 39.915731] repro/837 is trying to lock: [ 39.916191] ffff88801acfabd8 (&event->waitq){....}-{3:3}, at: __wake_up+0x26/0x60 [ 39.917182] other info that might help us debug this: [ 39.917761] context-{5:5} [ 39.918079] 4 locks held by repro/837: [ 39.918530] #0: ffffffff8725cd00 (rcu_read_lock){....}-{1:3}, at: __perf_event_task_sched_in+0xd1/0xbc0 [ 39.919612] #1: ffff88806ca3c6f8 (&cpuctx_lock){....}-{2:2}, at: __perf_event_task_sched_in+0x1a7/0xbc0 [ 39.920748] #2: ffff88800d91fc18 (&ctx->lock){....}-{2:2}, at: __perf_event_task_sched_in+0x1f9/0xbc0 [ 39.921819] #3: ffffffff8725cd00 (rcu_read_lock){....}-{1:3}, at: perf_event_wakeup+0x6c/0x470 Fixes: f4b07fd62d4d ("perf/core: Use POLLHUP for a pinned event in error") Closes: https://lore.kernel.org/lkml/aD2w50VDvGIH95Pf@ly-workstation Reported-by: "Lai, Yi" Signed-off-by: Namhyung Kim Signed-off-by: Peter Zijlstra (Intel) Tested-by: "Lai, Yi" Link: https://patch.msgid.link/20250603045105.1731451-1-namhyung@kernel.org --- kernel/events/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index ac70d68217b6..4f86d22f281a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4138,7 +4138,8 @@ static int merge_sched_in(struct perf_event *event, void *data) if (*perf_event_fasync(event)) event->pending_kill = POLL_ERR; - perf_event_wakeup(event); + event->pending_wakeup = 1; + irq_work_queue(&event->pending_irq); } else { struct perf_cpu_pmu_context *cpc = this_cpc(event->pmu_ctx->pmu); From 6a8a48644c4b804123e59dbfc5d6cd29a0194046 Mon Sep 17 00:00:00 2001 From: Zide Chen Date: Mon, 9 Feb 2026 16:52:25 -0800 Subject: [PATCH 101/359] perf/x86/intel/uncore: Add per-scheduler IMC CAS count events IMC on SPR and EMR does not support sub-channels. In contrast, CPUs that use gnr_uncores[] (e.g. Granite Rapids and Sierra Forest) implement two command schedulers (SCH0/SCH1) per memory channel, providing logically independent command and data paths. Do not reuse the spr_uncore_imc[] configuration for these CPUs. Instead, introduce a dedicated gnr_uncore_imc[] with per-scheduler events, so userspace can monitor SCH0 and SCH1 independently. On these CPUs, replace cas_count_{read,write} with cas_count_{read,write}_sch{0,1}. This may break existing userspace that relies on cas_count_{read,write}, prompting it to switch to the per-scheduler events, as the legacy event reports only partial traffic (SCH0). Fixes: 632c4bf6d007 ("perf/x86/intel/uncore: Support Granite Rapids") Fixes: cb4a6ccf3583 ("perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge") Reported-by: Reinette Chatre Signed-off-by: Zide Chen Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Dapeng Mi Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260210005225.20311-1-zide.chen@intel.com --- arch/x86/events/intel/uncore_snbep.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c index 5ed6e0b7e715..0a1d08136cc1 100644 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -6497,6 +6497,32 @@ static struct intel_uncore_type gnr_uncore_ubox = { .attr_update = uncore_alias_groups, }; +static struct uncore_event_desc gnr_uncore_imc_events[] = { + INTEL_UNCORE_EVENT_DESC(clockticks, "event=0x01,umask=0x00"), + INTEL_UNCORE_EVENT_DESC(cas_count_read_sch0, "event=0x05,umask=0xcf"), + INTEL_UNCORE_EVENT_DESC(cas_count_read_sch0.scale, "6.103515625e-5"), + INTEL_UNCORE_EVENT_DESC(cas_count_read_sch0.unit, "MiB"), + INTEL_UNCORE_EVENT_DESC(cas_count_read_sch1, "event=0x06,umask=0xcf"), + INTEL_UNCORE_EVENT_DESC(cas_count_read_sch1.scale, "6.103515625e-5"), + INTEL_UNCORE_EVENT_DESC(cas_count_read_sch1.unit, "MiB"), + INTEL_UNCORE_EVENT_DESC(cas_count_write_sch0, "event=0x05,umask=0xf0"), + INTEL_UNCORE_EVENT_DESC(cas_count_write_sch0.scale, "6.103515625e-5"), + INTEL_UNCORE_EVENT_DESC(cas_count_write_sch0.unit, "MiB"), + INTEL_UNCORE_EVENT_DESC(cas_count_write_sch1, "event=0x06,umask=0xf0"), + INTEL_UNCORE_EVENT_DESC(cas_count_write_sch1.scale, "6.103515625e-5"), + INTEL_UNCORE_EVENT_DESC(cas_count_write_sch1.unit, "MiB"), + { /* end: all zeroes */ }, +}; + +static struct intel_uncore_type gnr_uncore_imc = { + SPR_UNCORE_MMIO_COMMON_FORMAT(), + .name = "imc", + .fixed_ctr_bits = 48, + .fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR, + .fixed_ctl = SNR_IMC_MMIO_PMON_FIXED_CTL, + .event_descs = gnr_uncore_imc_events, +}; + static struct intel_uncore_type gnr_uncore_pciex8 = { SPR_UNCORE_PCI_COMMON_FORMAT(), .name = "pciex8", @@ -6544,7 +6570,7 @@ static struct intel_uncore_type *gnr_uncores[UNCORE_GNR_NUM_UNCORE_TYPES] = { NULL, &spr_uncore_pcu, &gnr_uncore_ubox, - &spr_uncore_imc, + &gnr_uncore_imc, NULL, &gnr_uncore_upi, NULL, From 77de62ad3de3967818c3dbe656b7336ebee461d2 Mon Sep 17 00:00:00 2001 From: Haocheng Yu Date: Tue, 3 Feb 2026 00:20:56 +0800 Subject: [PATCH 102/359] perf/core: Fix refcount bug and potential UAF in perf_mmap Syzkaller reported a refcount_t: addition on 0; use-after-free warning in perf_mmap. The issue is caused by a race condition between a failing mmap() setup and a concurrent mmap() on a dependent event (e.g., using output redirection). In perf_mmap(), the ring_buffer (rb) is allocated and assigned to event->rb with the mmap_mutex held. The mutex is then released to perform map_range(). If map_range() fails, perf_mmap_close() is called to clean up. However, since the mutex was dropped, another thread attaching to this event (via inherited events or output redirection) can acquire the mutex, observe the valid event->rb pointer, and attempt to increment its reference count. If the cleanup path has already dropped the reference count to zero, this results in a use-after-free or refcount saturation warning. Fix this by extending the scope of mmap_mutex to cover the map_range() call. This ensures that the ring buffer initialization and mapping (or cleanup on failure) happens atomically effectively, preventing other threads from accessing a half-initialized or dying ring buffer. Closes: https://lore.kernel.org/oe-kbuild-all/202602020208.m7KIjdzW-lkp@intel.com/ Reported-by: kernel test robot Signed-off-by: Haocheng Yu Signed-off-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20260202162057.7237-1-yuhaocheng035@gmail.com --- kernel/events/core.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 4f86d22f281a..22a0f405585b 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7465,29 +7465,29 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma) ret = perf_mmap_aux(vma, event, nr_pages); if (ret) return ret; + + /* + * Since pinned accounting is per vm we cannot allow fork() to copy our + * vma. + */ + vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP); + vma->vm_ops = &perf_mmap_vmops; + + mapped = get_mapped(event, event_mapped); + if (mapped) + mapped(event, vma->vm_mm); + + /* + * Try to map it into the page table. On fail, invoke + * perf_mmap_close() to undo the above, as the callsite expects + * full cleanup in this case and therefore does not invoke + * vmops::close(). + */ + ret = map_range(event->rb, vma); + if (ret) + perf_mmap_close(vma); } - /* - * Since pinned accounting is per vm we cannot allow fork() to copy our - * vma. - */ - vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP); - vma->vm_ops = &perf_mmap_vmops; - - mapped = get_mapped(event, event_mapped); - if (mapped) - mapped(event, vma->vm_mm); - - /* - * Try to map it into the page table. On fail, invoke - * perf_mmap_close() to undo the above, as the callsite expects - * full cleanup in this case and therefore does not invoke - * vmops::close(). - */ - ret = map_range(event->rb, vma); - if (ret) - perf_mmap_close(vma); - return ret; } From eb4a7139e97374f42b7242cc754e77f1623fbcd5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jouni=20H=C3=B6gander?= Date: Thu, 12 Feb 2026 08:27:31 +0200 Subject: [PATCH 103/359] drm/i915/alpm: ALPM disable fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PORT_ALPM_CTL is supposed to be written only before link training. Remove writing it from ALPM disable. Also clearing ALPM_CTL_ALPM_AUX_LESS_ENABLE and is not about disabling ALPM but switching to AUX-Wake ALPM. Stop touching this bit on ALPM disable. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7153 Fixes: 1ccbf135862b ("drm/i915/psr: Enable ALPM on source side for eDP Panel replay") Cc: Animesh Manna Cc: Jani Nikula Cc: # v6.10+ Signed-off-by: Jouni Högander Reviewed-by: Michał Grzelak Link: https://patch.msgid.link/20260212062731.397801-1-jouni.hogander@intel.com (cherry picked from commit 008304c9ae75c772d3460040de56e12112cdf5e6) Signed-off-by: Joonas Lahtinen --- drivers/gpu/drm/i915/display/intel_alpm.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_alpm.c b/drivers/gpu/drm/i915/display/intel_alpm.c index 7ce8c674bb03..07ffee38974b 100644 --- a/drivers/gpu/drm/i915/display/intel_alpm.c +++ b/drivers/gpu/drm/i915/display/intel_alpm.c @@ -562,12 +562,7 @@ void intel_alpm_disable(struct intel_dp *intel_dp) mutex_lock(&intel_dp->alpm.lock); intel_de_rmw(display, ALPM_CTL(display, cpu_transcoder), - ALPM_CTL_ALPM_ENABLE | ALPM_CTL_LOBF_ENABLE | - ALPM_CTL_ALPM_AUX_LESS_ENABLE, 0); - - intel_de_rmw(display, - PORT_ALPM_CTL(cpu_transcoder), - PORT_ALPM_CTL_ALPM_AUX_LESS_ENABLE, 0); + ALPM_CTL_ALPM_ENABLE | ALPM_CTL_LOBF_ENABLE, 0); drm_dbg_kms(display->drm, "Disabling ALPM\n"); mutex_unlock(&intel_dp->alpm.lock); From ec2cceadfae72304ca19650f9cac4b2a97b8a2fc Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Thu, 19 Feb 2026 10:51:33 +0100 Subject: [PATCH 104/359] gpiolib: normalize the return value of gc->get() on behalf of buggy drivers Commit 86ef402d805d ("gpiolib: sanitize the return value of gpio_chip::get()") started checking the return value of the .get() callback in struct gpio_chip. Now - almost a year later - it turns out that there are quite a few drivers in tree that can break with this change. Partially revert it: normalize the return value in GPIO core but also emit a warning. Cc: stable@vger.kernel.org Fixes: 86ef402d805d ("gpiolib: sanitize the return value of gpio_chip::get()") Reported-by: Dmitry Torokhov Closes: https://lore.kernel.org/all/aZSkqGTqMp_57qC7@google.com/ Reviewed-by: Linus Walleij Reviewed-by: Dmitry Torokhov Link: https://patch.msgid.link/20260219-gpiolib-set-normalize-v2-1-f84630e45796@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski --- drivers/gpio/gpiolib.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 86a171e96b0e..ada572aaebd6 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -3267,8 +3267,12 @@ static int gpiochip_get(struct gpio_chip *gc, unsigned int offset) /* Make sure this is called after checking for gc->get(). */ ret = gc->get(gc, offset); - if (ret > 1) - ret = -EBADE; + if (ret > 1) { + gpiochip_warn(gc, + "invalid return value from gc->get(): %d, consider fixing the driver\n", + ret); + ret = !!ret; + } return ret; } From af12e64ae0661546e8b4f5d30d55c5f53a11efe7 Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Tue, 20 Jan 2026 22:26:46 +0800 Subject: [PATCH 105/359] mmc: mmci: Fix device_node reference leak in of_get_dml_pipe_index() When calling of_parse_phandle_with_args(), the caller is responsible to call of_node_put() to release the reference of device node. In of_get_dml_pipe_index(), it does not release the reference. Fixes: 9cb15142d0e3 ("mmc: mmci: Add qcom dml support to the driver.") Signed-off-by: Felix Gu Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson --- drivers/mmc/host/mmci_qcom_dml.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mmc/host/mmci_qcom_dml.c b/drivers/mmc/host/mmci_qcom_dml.c index 3da6112fbe39..67371389cc33 100644 --- a/drivers/mmc/host/mmci_qcom_dml.c +++ b/drivers/mmc/host/mmci_qcom_dml.c @@ -109,6 +109,7 @@ static int of_get_dml_pipe_index(struct device_node *np, const char *name) &dma_spec)) return -ENODEV; + of_node_put(dma_spec.np); if (dma_spec.args_count) return dma_spec.args[0]; From 6465a8bbb0f6ad98aeb66dc9ea19c32c193a610b Mon Sep 17 00:00:00 2001 From: Shawn Lin Date: Fri, 16 Jan 2026 08:55:30 +0800 Subject: [PATCH 106/359] mmc: dw_mmc-rockchip: Fix runtime PM support for internal phase support RK3576 is the first platform to introduce internal phase support, and subsequent platforms are expected to adopt a similar design. In this architecture, runtime suspend powers off the attached power domain, which resets registers, including vendor-specific ones such as SDMMC_TIMING_CON0, SDMMC_TIMING_CON1, and SDMMC_MISC_CON. These registers must be saved and restored, a requirement that falls outside the scope of the dw_mmc core. Fixes: 59903441f5e4 ("mmc: dw_mmc-rockchip: Add internal phase support") Signed-off-by: Shawn Lin Tested-by: Marco Schirrmeister Reviewed-by: Heiko Stuebner Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson --- drivers/mmc/host/dw_mmc-rockchip.c | 38 +++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/dw_mmc-rockchip.c b/drivers/mmc/host/dw_mmc-rockchip.c index 4e3423a19bdf..ac069d0c42b2 100644 --- a/drivers/mmc/host/dw_mmc-rockchip.c +++ b/drivers/mmc/host/dw_mmc-rockchip.c @@ -36,6 +36,8 @@ struct dw_mci_rockchip_priv_data { int default_sample_phase; int num_phases; bool internal_phase; + int sample_phase; + int drv_phase; }; /* @@ -573,9 +575,43 @@ static void dw_mci_rockchip_remove(struct platform_device *pdev) dw_mci_pltfm_remove(pdev); } +static int dw_mci_rockchip_runtime_suspend(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct dw_mci *host = platform_get_drvdata(pdev); + struct dw_mci_rockchip_priv_data *priv = host->priv; + + if (priv->internal_phase) { + priv->sample_phase = rockchip_mmc_get_phase(host, true); + priv->drv_phase = rockchip_mmc_get_phase(host, false); + } + + return dw_mci_runtime_suspend(dev); +} + +static int dw_mci_rockchip_runtime_resume(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct dw_mci *host = platform_get_drvdata(pdev); + struct dw_mci_rockchip_priv_data *priv = host->priv; + int ret; + + ret = dw_mci_runtime_resume(dev); + if (ret) + return ret; + + if (priv->internal_phase) { + rockchip_mmc_set_phase(host, true, priv->sample_phase); + rockchip_mmc_set_phase(host, false, priv->drv_phase); + mci_writel(host, MISC_CON, MEM_CLK_AUTOGATE_ENABLE); + } + + return ret; +} + static const struct dev_pm_ops dw_mci_rockchip_dev_pm_ops = { SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume) - RUNTIME_PM_OPS(dw_mci_runtime_suspend, dw_mci_runtime_resume, NULL) + RUNTIME_PM_OPS(dw_mci_rockchip_runtime_suspend, dw_mci_rockchip_runtime_resume, NULL) }; static struct platform_driver dw_mci_rockchip_pltfm_driver = { From 79ad471530e0baef0dce991816013df55e401d9c Mon Sep 17 00:00:00 2001 From: Kamal Dasu Date: Mon, 16 Feb 2026 14:15:43 -0500 Subject: [PATCH 107/359] mmc: sdhci-brcmstb: use correct register offset for V1 pin_sel restore The restore path for SDIO_CFG_CORE_V1 was incorrectly using SDIO_CFG_SD_PIN_SEL (offset 0x44) instead of SDIO_CFG_V1_SD_PIN_SEL (offset 0x54), causing the wrong register to be written on resume. The save path already uses the correct V1-specific offset. This affects BCM7445 and BCM72116 platforms which use the V1 config core. Fixes: b7e614802e3f ("mmc: sdhci-brcmstb: save and restore registers during PM") Signed-off-by: Kamal Dasu Cc: stable@vger.kernel.org Tested-by: Florian Fainelli Reviewed-by: Florian Fainelli Signed-off-by: Ulf Hansson --- drivers/mmc/host/sdhci-brcmstb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci-brcmstb.c b/drivers/mmc/host/sdhci-brcmstb.c index c9442499876c..57e45951644e 100644 --- a/drivers/mmc/host/sdhci-brcmstb.c +++ b/drivers/mmc/host/sdhci-brcmstb.c @@ -116,7 +116,7 @@ static void sdhci_brcmstb_restore_regs(struct mmc_host *mmc, enum cfg_core_ver v writel(sr->boot_main_ctl, priv->boot_regs + SDIO_BOOT_MAIN_CTL); if (ver == SDIO_CFG_CORE_V1) { - writel(sr->sd_pin_sel, cr + SDIO_CFG_SD_PIN_SEL); + writel(sr->sd_pin_sel, cr + SDIO_CFG_V1_SD_PIN_SEL); return; } From 162d331d833dc73a3e905a24c44dd33732af1fc5 Mon Sep 17 00:00:00 2001 From: Ariel Silver Date: Fri, 20 Feb 2026 10:11:29 +0000 Subject: [PATCH 108/359] wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration link_id is taken from the ML Reconfiguration element (control & 0x000f), so it can be 0..15. link_removal_timeout[] has IEEE80211_MLD_MAX_NUM_LINKS (15) elements, so index 15 is out-of-bounds. Skip subelements with link_id >= IEEE80211_MLD_MAX_NUM_LINKS to avoid a stack out-of-bounds write. Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element") Reported-by: Ariel Silver Signed-off-by: Ariel Silver Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260220101129.1202657-1-Ariel.Silver@cybereason.com Signed-off-by: Johannes Berg --- net/mac80211/mlme.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index e83582b2c377..d43204ee330e 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -7085,6 +7085,9 @@ static void ieee80211_ml_reconfiguration(struct ieee80211_sub_if_data *sdata, control = le16_to_cpu(prof->control); link_id = control & IEEE80211_MLE_STA_RECONF_CONTROL_LINK_ID; + if (link_id >= IEEE80211_MLD_MAX_NUM_LINKS) + continue; + removed_links |= BIT(link_id); /* the MAC address should not be included, but handle it */ From 901084c51a0a8fb42a3f37d2e9c62083c495f824 Mon Sep 17 00:00:00 2001 From: Penghe Geng Date: Thu, 19 Feb 2026 15:29:54 -0500 Subject: [PATCH 109/359] mmc: core: Avoid bitfield RMW for claim/retune flags Move claimed and retune control flags out of the bitfield word to avoid unrelated RMW side effects in asynchronous contexts. The host->claimed bit shared a word with retune flags. Writes to claimed in __mmc_claim_host() or retune_now in mmc_mq_queue_rq() can overwrite other bits when concurrent updates happen in other contexts, triggering spurious WARN_ON(!host->claimed). Convert claimed, can_retune, retune_now and retune_paused to bool to remove shared-word coupling. Fixes: 6c0cedd1ef952 ("mmc: core: Introduce host claiming by context") Fixes: 1e8e55b67030c ("mmc: block: Add CQE support") Cc: stable@vger.kernel.org Suggested-by: Adrian Hunter Signed-off-by: Penghe Geng Acked-by: Adrian Hunter Signed-off-by: Ulf Hansson --- include/linux/mmc/host.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index e0e2c265e5d1..ba84f02c2a10 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -486,14 +486,12 @@ struct mmc_host { struct mmc_ios ios; /* current io bus settings */ + bool claimed; /* host exclusively claimed */ + /* group bitfields together to minimize padding */ unsigned int use_spi_crc:1; - unsigned int claimed:1; /* host exclusively claimed */ unsigned int doing_init_tune:1; /* initial tuning in progress */ - unsigned int can_retune:1; /* re-tuning can be used */ unsigned int doing_retune:1; /* re-tuning in progress */ - unsigned int retune_now:1; /* do re-tuning at next req */ - unsigned int retune_paused:1; /* re-tuning is temporarily disabled */ unsigned int retune_crc_disable:1; /* don't trigger retune upon crc */ unsigned int can_dma_map_merge:1; /* merging can be used */ unsigned int vqmmc_enabled:1; /* vqmmc regulator is enabled */ @@ -508,6 +506,9 @@ struct mmc_host { int rescan_disable; /* disable card detection */ int rescan_entered; /* used with nonremovable devices */ + bool can_retune; /* re-tuning can be used */ + bool retune_now; /* do re-tuning at next req */ + bool retune_paused; /* re-tuning is temporarily disabled */ int need_retune; /* re-tuning is needed */ int hold_retune; /* hold off re-tuning */ unsigned int retune_period; /* re-tuning period in secs */ From 7a73801fdaf8aee90d23ba77976082a48d156a21 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:29:41 +0100 Subject: [PATCH 110/359] pmdomain: imx: gpcv2: Discard pm_runtime_put() return value Passing pm_runtime_put() return value to the callers is not particularly useful. Returning an error code from pm_runtime_put() merely means that it has not queued up a work item to check whether or not the device can be suspended and there are many perfectly valid situations in which that can happen, like after writing "on" to the devices' runtime PM "control" attribute in sysfs for one example. Accordingly, update imx_pgc_domain_suspend() to simply discard the return value of pm_runtime_put() and always return success to the caller. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki Acked-by: Peng Fan Acked-by: Ulf Hansson Link: https://patch.msgid.link/15658107.tv2OnDr8pf@rafael.j.wysocki --- drivers/pmdomain/imx/gpcv2.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pmdomain/imx/gpcv2.c b/drivers/pmdomain/imx/gpcv2.c index cff738e4d546..a829f8da5be7 100644 --- a/drivers/pmdomain/imx/gpcv2.c +++ b/drivers/pmdomain/imx/gpcv2.c @@ -1416,7 +1416,9 @@ static int imx_pgc_domain_suspend(struct device *dev) static int imx_pgc_domain_resume(struct device *dev) { - return pm_runtime_put(dev); + pm_runtime_put(dev); + + return 0; } #endif From 3afd8df024339c7da1a5a0302f3987866dd16e40 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 22 Dec 2025 21:36:25 +0100 Subject: [PATCH 111/359] PM: runtime: Change pm_runtime_put() return type to void The primary role of pm_runtime_put() is to decrement the runtime PM usage counter of the given device. It always does that regardless of the value returned by it later. In addition, if the runtime PM usage counter after decrementation turns out to be zero, a work item is queued up to check whether or not the device can be suspended. This is not guaranteed to succeed though and even if it is successful, the device may still not be suspended going forward. There are multiple valid reasons why pm_runtime_put() may not decide to queue up the work item mentioned above, including, but not limited to, the case when user space has written "on" to the device's runtime PM "control" file in sysfs. In all of those cases, pm_runtime_put() returns a negative error code (even though the device's runtime PM usage counter has been successfully decremented by it) which is very confusing. In fact, its return value should only be used for debug purposes and care should be taken when doing it even in that case. Accordingly, to avoid the confusion mentioned above, change the return type of pm_runtime_put() to void. Signed-off-by: Rafael J. Wysocki Reviewed-by: Ulf Hansson Reviewed-by: Brian Norris Link: https://patch.msgid.link/14387202.RDIVbhacDa@rafael.j.wysocki --- include/linux/pm_runtime.h | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 41037c513f06..64921b10ac74 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -545,22 +545,10 @@ static inline int pm_runtime_resume_and_get(struct device *dev) * * Decrement the runtime PM usage counter of @dev and if it turns out to be * equal to 0, queue up a work item for @dev like in pm_request_idle(). - * - * Return: - * * 1: Success. Usage counter dropped to zero, but device was already suspended. - * * 0: Success. - * * -EINVAL: Runtime PM error. - * * -EACCES: Runtime PM disabled. - * * -EAGAIN: Runtime PM usage counter became non-zero or Runtime PM status - * change ongoing. - * * -EBUSY: Runtime PM child_count non-zero. - * * -EPERM: Device PM QoS resume latency 0. - * * -EINPROGRESS: Suspend already in progress. - * * -ENOSYS: CONFIG_PM not enabled. */ -static inline int pm_runtime_put(struct device *dev) +static inline void pm_runtime_put(struct device *dev) { - return __pm_runtime_idle(dev, RPM_GET_PUT | RPM_ASYNC); + __pm_runtime_idle(dev, RPM_GET_PUT | RPM_ASYNC); } /** From 4b73231b2a61c4142a027613d277a19c484dfcc3 Mon Sep 17 00:00:00 2001 From: Yufan Chen Date: Sun, 22 Feb 2026 18:40:35 +0800 Subject: [PATCH 112/359] regulator: tps65185: check devm_kzalloc() result in probe tps65185_probe() dereferences the allocation result immediately by using data->regmap. If devm_kzalloc() returns NULL under memory pressure, this leads to a NULL pointer dereference. Add the missing allocation check and return -ENOMEM on failure. Signed-off-by: Yufan Chen Link: https://patch.msgid.link/20260222104035.90790-1-ericterminal@gmail.com Signed-off-by: Mark Brown --- drivers/regulator/tps65185.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/regulator/tps65185.c b/drivers/regulator/tps65185.c index 3286c9ab33d0..786622d8d598 100644 --- a/drivers/regulator/tps65185.c +++ b/drivers/regulator/tps65185.c @@ -332,6 +332,9 @@ static int tps65185_probe(struct i2c_client *client) int i; data = devm_kzalloc(&client->dev, sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + data->regmap = devm_regmap_init_i2c(client, ®map_config); if (IS_ERR(data->regmap)) return dev_err_probe(&client->dev, PTR_ERR(data->regmap), From f895e5df80316a308c2f7d64d13a78494630ea05 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Thu, 12 Feb 2026 21:52:48 -0600 Subject: [PATCH 113/359] ipmi:si: Don't block module unload if the BMC is messed up If the BMC is in a bad state, don't bother waiting for queues messages since there can't be any. Otherwise the unload is blocked until the BMC is back in a good state. Reported-by: Rafael J. Wysocki Fixes: bc3a9d217755 ("ipmi:si: Gracefully handle if the BMC is non-functional") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Corey Minyard Reviewed-by: Rafael J. Wysocki (Intel) --- drivers/char/ipmi/ipmi_si_intf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 0049e3792ba1..3667033fcc51 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -2234,7 +2234,8 @@ static void wait_msg_processed(struct smi_info *smi_info) unsigned long jiffies_now; long time_diff; - while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) { + while (smi_info->si_state != SI_HOSED && + (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL))) { jiffies_now = jiffies; time_diff = (((long)jiffies_now - (long)smi_info->last_timeout_jiffies) * SI_USEC_PER_JIFFY); From 62cd145453d577113f993efd025f258dd86aa183 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Thu, 12 Feb 2026 21:56:54 -0600 Subject: [PATCH 114/359] ipmi:msghandler: Handle error returns from the SMI sender It used to be, until recently, that the sender operation on the low level interfaces would not fail. That's not the case any more with recent changes. So check the return value from the sender operation, and propagate it back up from there and handle the errors in all places. Reported-by: Rafael J. Wysocki Fixes: bc3a9d217755 ("ipmi:si: Gracefully handle if the BMC is non-functional") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Corey Minyard Reviewed-by: Rafael J. Wysocki (Intel) --- drivers/char/ipmi/ipmi_msghandler.c | 102 +++++++++++++++++++--------- 1 file changed, 69 insertions(+), 33 deletions(-) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index a042b1596933..f8c3c1e44520 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -1887,19 +1887,32 @@ static struct ipmi_smi_msg *smi_add_send_msg(struct ipmi_smi *intf, return smi_msg; } -static void smi_send(struct ipmi_smi *intf, +static int smi_send(struct ipmi_smi *intf, const struct ipmi_smi_handlers *handlers, struct ipmi_smi_msg *smi_msg, int priority) { int run_to_completion = READ_ONCE(intf->run_to_completion); unsigned long flags = 0; + int rv = 0; ipmi_lock_xmit_msgs(intf, run_to_completion, &flags); smi_msg = smi_add_send_msg(intf, smi_msg, priority); ipmi_unlock_xmit_msgs(intf, run_to_completion, &flags); - if (smi_msg) - handlers->sender(intf->send_info, smi_msg); + if (smi_msg) { + rv = handlers->sender(intf->send_info, smi_msg); + if (rv) { + ipmi_lock_xmit_msgs(intf, run_to_completion, &flags); + intf->curr_msg = NULL; + ipmi_unlock_xmit_msgs(intf, run_to_completion, &flags); + /* + * Something may have been added to the transmit + * queue, so schedule a check for that. + */ + queue_work(system_wq, &intf->smi_work); + } + } + return rv; } static bool is_maintenance_mode_cmd(struct kernel_ipmi_msg *msg) @@ -2312,6 +2325,7 @@ static int i_ipmi_request(struct ipmi_user *user, struct ipmi_recv_msg *recv_msg; int run_to_completion = READ_ONCE(intf->run_to_completion); int rv = 0; + bool in_seq_table = false; if (supplied_recv) { recv_msg = supplied_recv; @@ -2365,33 +2379,50 @@ static int i_ipmi_request(struct ipmi_user *user, rv = i_ipmi_req_ipmb(intf, addr, msgid, msg, smi_msg, recv_msg, source_address, source_lun, retries, retry_time_ms); + in_seq_table = true; } else if (is_ipmb_direct_addr(addr)) { rv = i_ipmi_req_ipmb_direct(intf, addr, msgid, msg, smi_msg, recv_msg, source_lun); } else if (is_lan_addr(addr)) { rv = i_ipmi_req_lan(intf, addr, msgid, msg, smi_msg, recv_msg, source_lun, retries, retry_time_ms); + in_seq_table = true; } else { - /* Unknown address type. */ + /* Unknown address type. */ ipmi_inc_stat(intf, sent_invalid_commands); rv = -EINVAL; } - if (rv) { + if (!rv) { + dev_dbg(intf->si_dev, "Send: %*ph\n", + smi_msg->data_size, smi_msg->data); + + rv = smi_send(intf, intf->handlers, smi_msg, priority); + if (rv != IPMI_CC_NO_ERROR) + /* smi_send() returns an IPMI err, return a Linux one. */ + rv = -EIO; + if (rv && in_seq_table) { + /* + * If it's in the sequence table, it will be + * retried later, so ignore errors. + */ + rv = 0; + /* But we need to fix the timeout. */ + intf_start_seq_timer(intf, smi_msg->msgid); + ipmi_free_smi_msg(smi_msg); + smi_msg = NULL; + } + } out_err: + if (!run_to_completion) + mutex_unlock(&intf->users_mutex); + + if (rv) { if (!supplied_smi) ipmi_free_smi_msg(smi_msg); if (!supplied_recv) ipmi_free_recv_msg(recv_msg); - } else { - dev_dbg(intf->si_dev, "Send: %*ph\n", - smi_msg->data_size, smi_msg->data); - - smi_send(intf, intf->handlers, smi_msg, priority); } - if (!run_to_completion) - mutex_unlock(&intf->users_mutex); - return rv; } @@ -3965,12 +3996,12 @@ static int handle_ipmb_get_msg_cmd(struct ipmi_smi *intf, dev_dbg(intf->si_dev, "Invalid command: %*ph\n", msg->data_size, msg->data); - smi_send(intf, intf->handlers, msg, 0); - /* - * We used the message, so return the value that - * causes it to not be freed or queued. - */ - rv = -1; + if (smi_send(intf, intf->handlers, msg, 0) == IPMI_CC_NO_ERROR) + /* + * We used the message, so return the value that + * causes it to not be freed or queued. + */ + rv = -1; } else if (!IS_ERR(recv_msg)) { /* Extract the source address from the data. */ ipmb_addr = (struct ipmi_ipmb_addr *) &recv_msg->addr; @@ -4044,12 +4075,12 @@ static int handle_ipmb_direct_rcv_cmd(struct ipmi_smi *intf, msg->data[4] = IPMI_INVALID_CMD_COMPLETION_CODE; msg->data_size = 5; - smi_send(intf, intf->handlers, msg, 0); - /* - * We used the message, so return the value that - * causes it to not be freed or queued. - */ - rv = -1; + if (smi_send(intf, intf->handlers, msg, 0) == IPMI_CC_NO_ERROR) + /* + * We used the message, so return the value that + * causes it to not be freed or queued. + */ + rv = -1; } else if (!IS_ERR(recv_msg)) { /* Extract the source address from the data. */ daddr = (struct ipmi_ipmb_direct_addr *)&recv_msg->addr; @@ -4189,7 +4220,7 @@ static int handle_lan_get_msg_cmd(struct ipmi_smi *intf, struct ipmi_smi_msg *msg) { struct cmd_rcvr *rcvr; - int rv = 0; + int rv = 0; /* Free by default */ unsigned char netfn; unsigned char cmd; unsigned char chan; @@ -4242,12 +4273,12 @@ static int handle_lan_get_msg_cmd(struct ipmi_smi *intf, dev_dbg(intf->si_dev, "Invalid command: %*ph\n", msg->data_size, msg->data); - smi_send(intf, intf->handlers, msg, 0); - /* - * We used the message, so return the value that - * causes it to not be freed or queued. - */ - rv = -1; + if (smi_send(intf, intf->handlers, msg, 0) == IPMI_CC_NO_ERROR) + /* + * We used the message, so return the value that + * causes it to not be freed or queued. + */ + rv = -1; } else if (!IS_ERR(recv_msg)) { /* Extract the source address from the data. */ lan_addr = (struct ipmi_lan_addr *) &recv_msg->addr; @@ -5056,7 +5087,12 @@ static void check_msg_timeout(struct ipmi_smi *intf, struct seq_table *ent, ipmi_inc_stat(intf, retransmitted_ipmb_commands); - smi_send(intf, intf->handlers, smi_msg, 0); + /* If this fails we'll retry later or timeout. */ + if (smi_send(intf, intf->handlers, smi_msg, 0) != IPMI_CC_NO_ERROR) { + /* But fix the timeout. */ + intf_start_seq_timer(intf, smi_msg->msgid); + ipmi_free_smi_msg(smi_msg); + } } else ipmi_free_smi_msg(smi_msg); From cae66f1a1dcd23e17da5a015ef9d731129f9d2dd Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Fri, 13 Feb 2026 00:15:04 -0600 Subject: [PATCH 115/359] ipmi:si: Fix check for a misbehaving BMC There is a race on checking the state in the sender, it needs to be checked under a lock. But you also need a check to avoid issues with a misbehaving BMC for run to completion mode. So leave the check at the beginning for run to completion, and add a check under the lock to avoid the race. Reported-by: Rafael J. Wysocki Fixes: bc3a9d217755 ("ipmi:si: Gracefully handle if the BMC is non-functional") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Corey Minyard Reviewed-by: Rafael J. Wysocki (Intel) --- drivers/char/ipmi/ipmi_si_intf.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 3667033fcc51..6eda61664aaa 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -924,9 +924,14 @@ static int sender(void *send_info, struct ipmi_smi_msg *msg) { struct smi_info *smi_info = send_info; unsigned long flags; + int rv = IPMI_CC_NO_ERROR; debug_timestamp(smi_info, "Enqueue"); + /* + * Check here for run to completion mode. A check under lock is + * later. + */ if (smi_info->si_state == SI_HOSED) return IPMI_BUS_ERR; @@ -940,18 +945,15 @@ static int sender(void *send_info, struct ipmi_smi_msg *msg) } spin_lock_irqsave(&smi_info->si_lock, flags); - /* - * The following two lines don't need to be under the lock for - * the lock's sake, but they do need SMP memory barriers to - * avoid getting things out of order. We are already claiming - * the lock, anyway, so just do it under the lock to avoid the - * ordering problem. - */ - BUG_ON(smi_info->waiting_msg); - smi_info->waiting_msg = msg; - check_start_timer_thread(smi_info); + if (smi_info->si_state == SI_HOSED) { + rv = IPMI_BUS_ERR; + } else { + BUG_ON(smi_info->waiting_msg); + smi_info->waiting_msg = msg; + check_start_timer_thread(smi_info); + } spin_unlock_irqrestore(&smi_info->si_lock, flags); - return IPMI_CC_NO_ERROR; + return rv; } static void set_run_to_completion(void *send_info, bool i_run_to_completion) From 318c58852e686c009825ae8c071080b9ccdd2af0 Mon Sep 17 00:00:00 2001 From: Gregory Price Date: Wed, 11 Feb 2026 14:22:27 -0500 Subject: [PATCH 116/359] cxl/memdev: fix deadlock in cxl_memdev_autoremove() on attach failure cxl_memdev_autoremove() takes device_lock(&cxlmd->dev) via guard(device) and then calls cxl_memdev_unregister() when the attach callback was provided but cxl_mem_probe() failed to bind. cxl_memdev_unregister() calls cdev_device_del() device_del() bus_remove_device() device_release_driver() This path is reached when a driver uses the @attach parameter to devm_cxl_add_memdev() and the CXL topology fails to enumerate (e.g. DVSEC range registers decode outside platform-defined CXL ranges, causing the endpoint port probe to fail). Add cxl_memdev_attach_failed() to set the scope of the check correctly. Reported-by: kreview-c94b85d6d2 Fixes: 29317f8dc6ed ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation") Signed-off-by: Gregory Price Reviewed-by: Dan Williams Reviewed-by: Davidlohr Bueso Link: https://patch.msgid.link/20260211192228.2148713-1-gourry@gourry.net Signed-off-by: Dave Jiang --- drivers/cxl/core/memdev.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index f547d8ac34c7..273c22118d3d 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -1089,10 +1089,8 @@ static int cxlmd_add(struct cxl_memdev *cxlmd, struct cxl_dev_state *cxlds) DEFINE_FREE(put_cxlmd, struct cxl_memdev *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev)) -static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd) +static bool cxl_memdev_attach_failed(struct cxl_memdev *cxlmd) { - int rc; - /* * If @attach is provided fail if the driver is not attached upon * return. Note that failure here could be the result of a race to @@ -1100,7 +1098,14 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd) * succeeded and then cxl_mem unbound before the lock is acquired. */ guard(device)(&cxlmd->dev); - if (cxlmd->attach && !cxlmd->dev.driver) { + return (cxlmd->attach && !cxlmd->dev.driver); +} + +static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd) +{ + int rc; + + if (cxl_memdev_attach_failed(cxlmd)) { cxl_memdev_unregister(cxlmd); return ERR_PTR(-ENXIO); } From ec197dca8735f7627e5cff7e3fa8839b53a28514 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Sun, 22 Feb 2026 08:33:52 +0000 Subject: [PATCH 117/359] KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported") added an early return to several functions in arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error when accessing the pgt union for non-nested VMs. However, this early return was inadvertently applied to kvm_arch_flush_shadow_all() as well, causing it to skip the call to kvm_uninit_stage2_mmu(kvm) for all non-nested VMs. For pKVM, skipping this teardown means the host never unshares the guest's memory with the EL2 hypervisor. When the host kernel later recycles these leaked pages for a new VM, it attempts to re-share them. The hypervisor correctly rejects this with -EPERM, triggering a host WARN_ON and hanging the guest. Fix this by dropping the early return from kvm_arch_flush_shadow_all(). The for-loop guarding the nested MMU cleanup already bounds itself when nested_mmus_size == 0, allowing execution to proceed to kvm_uninit_stage2_mmu() as intended. Reported-by: Mark Brown Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/ Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported") Signed-off-by: Fuad Tabba Tested-by: Mark Brown Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com Signed-off-by: Marc Zyngier --- arch/arm64/kvm/nested.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c index eeea5e692370..7f1ea85dc67a 100644 --- a/arch/arm64/kvm/nested.c +++ b/arch/arm64/kvm/nested.c @@ -1154,9 +1154,6 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm) { int i; - if (!kvm->arch.nested_mmus_size) - return; - for (i = 0; i < kvm->arch.nested_mmus_size; i++) { struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i]; From 822655e6751dde2df7ddaa828c5aba217726c5a2 Mon Sep 17 00:00:00 2001 From: Li Ming Date: Mon, 23 Feb 2026 09:29:00 -0700 Subject: [PATCH 118/359] cxl/port: Introduce port_to_host() helper In CXL subsystem, a port has its own host device for the port creation and removal. The host of CXL root and all the first level ports is the platform firmware device, the host of other ports is their parent port's device. Create this new helper to much easier to get the host of a cxl port. Introduce port_to_host() and use it to replace all places where using open coded to get the host of a port. Remove endpoint_host() as its functionality can be replaced by port_to_host(). [dj: Squashed commit 1 and 3 in the series to commit 1. ] Signed-off-by: Li Ming Tested-by: Alison Schofield Reviewed-by: Dan Williams Link: https://patch.msgid.link/20260210-fix-port-enumeration-failure-v3-1-06acce0b9ead@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/core.h | 18 ++++++++++++++++++ drivers/cxl/core/port.c | 29 +++-------------------------- 2 files changed, 21 insertions(+), 26 deletions(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 007b8aff0238..5b0570df0fd9 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -152,6 +152,24 @@ int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c); int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port, struct access_coordinate *c); +static inline struct device *port_to_host(struct cxl_port *port) +{ + struct cxl_port *parent = is_cxl_root(port) ? NULL : + to_cxl_port(port->dev.parent); + + /* + * The host of CXL root port and the first level of ports is + * the platform firmware device, the host of all other ports + * is their parent port. + */ + if (!parent) + return port->uport_dev; + else if (is_cxl_root(parent)) + return parent->uport_dev; + else + return &parent->dev; +} + static inline struct device *dport_to_host(struct cxl_dport *dport) { struct cxl_port *port = dport->port; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index b69c2529744c..6be150e645b6 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -615,22 +615,8 @@ struct cxl_port *parent_port_of(struct cxl_port *port) static void unregister_port(void *_port) { struct cxl_port *port = _port; - struct cxl_port *parent = parent_port_of(port); - struct device *lock_dev; - /* - * CXL root port's and the first level of ports are unregistered - * under the platform firmware device lock, all other ports are - * unregistered while holding their parent port lock. - */ - if (!parent) - lock_dev = port->uport_dev; - else if (is_cxl_root(parent)) - lock_dev = parent->uport_dev; - else - lock_dev = &parent->dev; - - device_lock_assert(lock_dev); + device_lock_assert(port_to_host(port)); port->dead = true; device_unregister(&port->dev); } @@ -1427,20 +1413,11 @@ static struct device *grandparent(struct device *dev) return NULL; } -static struct device *endpoint_host(struct cxl_port *endpoint) -{ - struct cxl_port *port = to_cxl_port(endpoint->dev.parent); - - if (is_cxl_root(port)) - return port->uport_dev; - return &port->dev; -} - static void delete_endpoint(void *data) { struct cxl_memdev *cxlmd = data; struct cxl_port *endpoint = cxlmd->endpoint; - struct device *host = endpoint_host(endpoint); + struct device *host = port_to_host(endpoint); scoped_guard(device, host) { if (host->driver && !endpoint->dead) { @@ -1456,7 +1433,7 @@ static void delete_endpoint(void *data) int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint) { - struct device *host = endpoint_host(endpoint); + struct device *host = port_to_host(endpoint); struct device *dev = &cxlmd->dev; get_device(host); From 0066688dbcdcf51680f499936faffe6d0e94194e Mon Sep 17 00:00:00 2001 From: Li Ming Date: Tue, 10 Feb 2026 19:46:57 +0800 Subject: [PATCH 119/359] cxl/port: Hold port host lock during dport adding. CXL testing environment can trigger following trace Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497] RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core] Call Trace: cxl_event_trace_record+0xd1/0xa70 [cxl_core] __cxl_event_trace_record+0x12f/0x1e0 [cxl_core] cxl_mem_get_records_log+0x261/0x500 [cxl_core] cxl_mem_get_event_records+0x7c/0xc0 [cxl_core] cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem] platform_probe+0x9d/0x130 really_probe+0x1c8/0x960 __driver_probe_device+0x187/0x3e0 driver_probe_device+0x45/0x120 __device_attach_driver+0x15d/0x280 When CXL subsystem adds a cxl port to a hierarchy, there is a small window where the new port becomes visible before it is bound to a driver. This happens because device_add() adds a device to bus device list before bus_probe_device() binds it to a driver. So if two cxl memdevs are trying to add a dport to a same port via devm_cxl_enumerate_ports(), the second cxl memdev may observe the port and attempt to add a dport, but fails because the port has not yet been attached to cxl port driver. That causes the memdev->endpoint can not be updated. The sequence is like: CPU 0 CPU 1 devm_cxl_enumerate_ports() # port not found, add it add_port_attach_ep() # hold the parent port lock # to add the new port devm_cxl_create_port() device_add() # Add dev to bus devs list bus_add_device() devm_cxl_enumerate_ports() # found the port find_cxl_port_by_uport() # hold port lock to add a dport device_lock(the port) find_or_add_dport() cxl_port_add_dport() return -ENXIO because port->dev.driver is NULL device_unlock(the port) bus_probe_device() # hold the port lock # for attaching device_lock(the port) attaching the new port device_unlock(the port) To fix this race, require that dport addition holds the host lock of the target port(the host of CXL root and all cxl host bridge ports is the platform firmware device, the host of all other ports is their parent port). The CXL subsystem already requires holding the host lock while attaching a new port. Therefore, successfully acquiring the host lock guarantees that port attaching has completed. Fixes: 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports") Signed-off-by: Li Ming Reviewed-by: Dan Williams Tested-by: Alison Schofield Link: https://patch.msgid.link/20260210-fix-port-enumeration-failure-v3-2-06acce0b9ead@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/port.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 6be150e645b6..0c5957d1d329 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1767,7 +1767,16 @@ static struct cxl_dport *find_or_add_dport(struct cxl_port *port, { struct cxl_dport *dport; - device_lock_assert(&port->dev); + /* + * The port is already visible in CXL hierarchy, but it may still + * be in the process of binding to the CXL port driver at this point. + * + * port creation and driver binding are protected by the port's host + * lock, so acquire the host lock here to ensure the port has completed + * driver binding before proceeding with dport addition. + */ + guard(device)(port_to_host(port)); + guard(device)(&port->dev); dport = cxl_find_dport_by_dev(port, dport_dev); if (!dport) { dport = probe_dport(port, dport_dev); @@ -1834,13 +1843,11 @@ retry: * RP port enumerated by cxl_acpi without dport will * have the dport added here. */ - scoped_guard(device, &port->dev) { - dport = find_or_add_dport(port, dport_dev); - if (IS_ERR(dport)) { - if (PTR_ERR(dport) == -EAGAIN) - goto retry; - return PTR_ERR(dport); - } + dport = find_or_add_dport(port, dport_dev); + if (IS_ERR(dport)) { + if (PTR_ERR(dport) == -EAGAIN) + goto retry; + return PTR_ERR(dport); } rc = cxl_add_ep(dport, &cxlmd->dev); From 08fe1b5166fdc81b010d7bf39cd6440620e7931e Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Thu, 5 Feb 2026 22:02:37 -0800 Subject: [PATCH 120/359] accel/amdxdna: Remove buffer size check when creating command BO Large command buffers may be used, and they do not always need to be mapped or accessed by the driver. Performing a size check at command BO creation time unnecessarily rejects valid use cases. Remove the buffer size check from command BO creation, and defer vmap and size validation to the paths where the driver actually needs to map and access the command buffer. Fixes: ac49797c1815 ("accel/amdxdna: Add GEM buffer object management") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260206060237.4050492-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/amdxdna_gem.c | 38 ++++++++++++++--------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/accel/amdxdna/amdxdna_gem.c b/drivers/accel/amdxdna/amdxdna_gem.c index 8c290ddd3251..d60db49ead71 100644 --- a/drivers/accel/amdxdna/amdxdna_gem.c +++ b/drivers/accel/amdxdna/amdxdna_gem.c @@ -21,8 +21,6 @@ #include "amdxdna_pci_drv.h" #include "amdxdna_ubuf.h" -#define XDNA_MAX_CMD_BO_SIZE SZ_32K - MODULE_IMPORT_NS("DMA_BUF"); static int @@ -745,12 +743,6 @@ amdxdna_drm_create_cmd_bo(struct drm_device *dev, { struct amdxdna_dev *xdna = to_xdna_dev(dev); struct amdxdna_gem_obj *abo; - int ret; - - if (args->size > XDNA_MAX_CMD_BO_SIZE) { - XDNA_ERR(xdna, "Command bo size 0x%llx too large", args->size); - return ERR_PTR(-EINVAL); - } if (args->size < sizeof(struct amdxdna_cmd)) { XDNA_DBG(xdna, "Command BO size 0x%llx too small", args->size); @@ -764,17 +756,7 @@ amdxdna_drm_create_cmd_bo(struct drm_device *dev, abo->type = AMDXDNA_BO_CMD; abo->client = filp->driver_priv; - ret = amdxdna_gem_obj_vmap(abo, &abo->mem.kva); - if (ret) { - XDNA_ERR(xdna, "Vmap cmd bo failed, ret %d", ret); - goto release_obj; - } - return abo; - -release_obj: - drm_gem_object_put(to_gobj(abo)); - return ERR_PTR(ret); } int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) @@ -871,6 +853,7 @@ struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, struct amdxdna_dev *xdna = client->xdna; struct amdxdna_gem_obj *abo; struct drm_gem_object *gobj; + int ret; gobj = drm_gem_object_lookup(client->filp, bo_hdl); if (!gobj) { @@ -879,9 +862,26 @@ struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, } abo = to_xdna_obj(gobj); - if (bo_type == AMDXDNA_BO_INVALID || abo->type == bo_type) + if (bo_type != AMDXDNA_BO_INVALID && abo->type != bo_type) + goto put_obj; + + if (bo_type != AMDXDNA_BO_CMD || abo->mem.kva) return abo; + if (abo->mem.size > SZ_32K) { + XDNA_ERR(xdna, "Cmd bo is too big %ld", abo->mem.size); + goto put_obj; + } + + ret = amdxdna_gem_obj_vmap(abo, &abo->mem.kva); + if (ret) { + XDNA_ERR(xdna, "Vmap cmd bo failed, ret %d", ret); + goto put_obj; + } + + return abo; + +put_obj: drm_gem_object_put(gobj); return NULL; } From c68a6af400ca80596e8c37de0a1cb564aa9da8a4 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Thu, 5 Feb 2026 22:02:51 -0800 Subject: [PATCH 121/359] accel/amdxdna: Switch to always use chained command Preempt commands are only supported when submitted as chained commands. To ensure preempt support works consistently, always submit commands in chained command format. Set force_cmdlist to true so that single commands are filled using the chained command layout, enabling correct handling of preempt commands. Fixes: 3a0ff7b98af4 ("accel/amdxdna: Support preemption requests") Reviewed-by: Karol Wachowski Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260206060251.4050512-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_ctx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c index 4503c7c77a3e..7140c3f96362 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -23,9 +23,9 @@ #include "amdxdna_pci_drv.h" #include "amdxdna_pm.h" -static bool force_cmdlist; +static bool force_cmdlist = true; module_param(force_cmdlist, bool, 0600); -MODULE_PARM_DESC(force_cmdlist, "Force use command list (Default false)"); +MODULE_PARM_DESC(force_cmdlist, "Force use command list (Default true)"); #define HWCTX_MAX_TIMEOUT 60000 /* milliseconds */ From 8363c02863332992a1822688da41f881d88d1631 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Thu, 5 Feb 2026 22:03:06 -0800 Subject: [PATCH 122/359] accel/amdxdna: Fix crash when destroying a suspended hardware context If userspace issues an ioctl to destroy a hardware context that has already been automatically suspended, the driver may crash because the mailbox channel pointer is NULL for the suspended context. Fix this by checking the mailbox channel pointer in aie2_destroy_context() before accessing it. Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Reviewed-by: Karol Wachowski Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260206060306.4050531-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_message.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c index 7d7dcfeaf794..ab1178850c47 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -318,6 +318,9 @@ int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwc struct amdxdna_dev *xdna = ndev->xdna; int ret; + if (!hwctx->priv->mbox_chann) + return 0; + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); From 57aa3917a3b3bd805a3679371f97a1ceda3c5510 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Tue, 10 Feb 2026 10:42:51 -0600 Subject: [PATCH 123/359] accel/amdxdna: Reduce log noise during process termination During process termination, several error messages are logged that are not actual errors but expected conditions when a process is killed or interrupted. This creates unnecessary noise in the kernel log. The specific scenarios are: 1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by a signal during process cleanup. This is expected when a process is being terminated and should not be logged as an error. 2. Context destruction returns -ENODEV when the firmware or device has already stopped, which commonly occurs during cleanup if the device was already torn down. This is also an expected condition during orderly shutdown. Downgrade these expected error conditions from error level to debug level to reduce log noise while still keeping genuine errors visible. Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Reviewed-by: Lizhi Hou Signed-off-by: Mario Limonciello Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260210164521.1094274-3-mario.limonciello@amd.com --- drivers/accel/amdxdna/aie2_ctx.c | 6 ++++-- drivers/accel/amdxdna/aie2_message.c | 4 +++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c index 7140c3f96362..e13be7608462 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -497,7 +497,7 @@ static void aie2_release_resource(struct amdxdna_hwctx *hwctx) if (AIE2_FEATURE_ON(xdna->dev_handle, AIE2_TEMPORAL_ONLY)) { ret = aie2_destroy_context(xdna->dev_handle, hwctx); - if (ret) + if (ret && ret != -ENODEV) XDNA_ERR(xdna, "Destroy temporal only context failed, ret %d", ret); } else { ret = xrs_release_resource(xdna->xrs_hdl, (uintptr_t)hwctx); @@ -1070,6 +1070,8 @@ void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, true, MAX_SCHEDULE_TIMEOUT); - if (!ret || ret == -ERESTARTSYS) + if (!ret) XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); + else if (ret == -ERESTARTSYS) + XDNA_DBG(xdna, "Wait for bo interrupted by signal"); } diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c index ab1178850c47..5d80c5837745 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -216,8 +216,10 @@ static int aie2_destroy_context_req(struct amdxdna_dev_hdl *ndev, u32 id) req.context_id = id; ret = aie2_send_mgmt_msg_wait(ndev, &msg); - if (ret) + if (ret && ret != -ENODEV) XDNA_WARN(xdna, "Destroy context failed, ret %d", ret); + else if (ret == -ENODEV) + XDNA_DBG(xdna, "Destroy context: device already stopped"); return ret; } From 1aa82181a3c285c7351523d587f7981ae4c015c8 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Wed, 11 Feb 2026 12:46:44 -0800 Subject: [PATCH 124/359] accel/amdxdna: Fix dead lock for suspend and resume When an application issues a query IOCTL while auto suspend is running, a deadlock can occur. The query path holds dev_lock and then calls pm_runtime_resume_and_get(), which waits for the ongoing suspend to complete. Meanwhile, the suspend callback attempts to acquire dev_lock and blocks, resulting in a deadlock. Fix this by releasing dev_lock before calling pm_runtime_resume_and_get() and reacquiring it after the call completes. Also acquire dev_lock in the resume callback to keep the locking consistent. Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260211204644.722758-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_ctx.c | 4 ++-- drivers/accel/amdxdna/aie2_pci.c | 7 +++---- drivers/accel/amdxdna/aie2_pm.c | 2 +- drivers/accel/amdxdna/amdxdna_ctx.c | 19 +++++++------------ drivers/accel/amdxdna/amdxdna_pm.c | 2 ++ drivers/accel/amdxdna/amdxdna_pm.h | 11 +++++++++++ 6 files changed, 26 insertions(+), 19 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c index e13be7608462..8d79fafd889a 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -629,7 +629,7 @@ int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) goto free_entity; } - ret = amdxdna_pm_resume_get(xdna); + ret = amdxdna_pm_resume_get_locked(xdna); if (ret) goto free_col_list; @@ -760,7 +760,7 @@ static int aie2_hwctx_cu_config(struct amdxdna_hwctx *hwctx, void *buf, u32 size if (!hwctx->cus) return -ENOMEM; - ret = amdxdna_pm_resume_get(xdna); + ret = amdxdna_pm_resume_get_locked(xdna); if (ret) goto free_cus; diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index 2a51b2658bfc..07e369507818 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -451,7 +451,6 @@ static int aie2_hw_suspend(struct amdxdna_dev *xdna) { struct amdxdna_client *client; - guard(mutex)(&xdna->dev_lock); list_for_each_entry(client, &xdna->client_list, node) aie2_hwctx_suspend(client); @@ -951,7 +950,7 @@ static int aie2_get_info(struct amdxdna_client *client, struct amdxdna_drm_get_i if (!drm_dev_enter(&xdna->ddev, &idx)) return -ENODEV; - ret = amdxdna_pm_resume_get(xdna); + ret = amdxdna_pm_resume_get_locked(xdna); if (ret) goto dev_exit; @@ -1044,7 +1043,7 @@ static int aie2_get_array(struct amdxdna_client *client, if (!drm_dev_enter(&xdna->ddev, &idx)) return -ENODEV; - ret = amdxdna_pm_resume_get(xdna); + ret = amdxdna_pm_resume_get_locked(xdna); if (ret) goto dev_exit; @@ -1134,7 +1133,7 @@ static int aie2_set_state(struct amdxdna_client *client, if (!drm_dev_enter(&xdna->ddev, &idx)) return -ENODEV; - ret = amdxdna_pm_resume_get(xdna); + ret = amdxdna_pm_resume_get_locked(xdna); if (ret) goto dev_exit; diff --git a/drivers/accel/amdxdna/aie2_pm.c b/drivers/accel/amdxdna/aie2_pm.c index 579b8be13b18..29bd4403a94d 100644 --- a/drivers/accel/amdxdna/aie2_pm.c +++ b/drivers/accel/amdxdna/aie2_pm.c @@ -31,7 +31,7 @@ int aie2_pm_set_dpm(struct amdxdna_dev_hdl *ndev, u32 dpm_level) { int ret; - ret = amdxdna_pm_resume_get(ndev->xdna); + ret = amdxdna_pm_resume_get_locked(ndev->xdna); if (ret) return ret; diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/amdxdna_ctx.c index 59fa3800b9d3..5173456bbb61 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.c +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -266,9 +266,9 @@ int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct dr struct amdxdna_drm_config_hwctx *args = data; struct amdxdna_dev *xdna = to_xdna_dev(dev); struct amdxdna_hwctx *hwctx; - int ret, idx; u32 buf_size; void *buf; + int ret; u64 val; if (XDNA_MBZ_DBG(xdna, &args->pad, sizeof(args->pad))) @@ -310,20 +310,17 @@ int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct dr return -EINVAL; } - mutex_lock(&xdna->dev_lock); - idx = srcu_read_lock(&client->hwctx_srcu); + guard(mutex)(&xdna->dev_lock); hwctx = xa_load(&client->hwctx_xa, args->handle); if (!hwctx) { XDNA_DBG(xdna, "PID %d failed to get hwctx %d", client->pid, args->handle); ret = -EINVAL; - goto unlock_srcu; + goto free_buf; } ret = xdna->dev_info->ops->hwctx_config(hwctx, args->param_type, val, buf, buf_size); -unlock_srcu: - srcu_read_unlock(&client->hwctx_srcu, idx); - mutex_unlock(&xdna->dev_lock); +free_buf: kfree(buf); return ret; } @@ -334,7 +331,7 @@ int amdxdna_hwctx_sync_debug_bo(struct amdxdna_client *client, u32 debug_bo_hdl) struct amdxdna_hwctx *hwctx; struct amdxdna_gem_obj *abo; struct drm_gem_object *gobj; - int ret, idx; + int ret; if (!xdna->dev_info->ops->hwctx_sync_debug_bo) return -EOPNOTSUPP; @@ -345,17 +342,15 @@ int amdxdna_hwctx_sync_debug_bo(struct amdxdna_client *client, u32 debug_bo_hdl) abo = to_xdna_obj(gobj); guard(mutex)(&xdna->dev_lock); - idx = srcu_read_lock(&client->hwctx_srcu); hwctx = xa_load(&client->hwctx_xa, abo->assigned_hwctx); if (!hwctx) { ret = -EINVAL; - goto unlock_srcu; + goto put_obj; } ret = xdna->dev_info->ops->hwctx_sync_debug_bo(hwctx, debug_bo_hdl); -unlock_srcu: - srcu_read_unlock(&client->hwctx_srcu, idx); +put_obj: drm_gem_object_put(gobj); return ret; } diff --git a/drivers/accel/amdxdna/amdxdna_pm.c b/drivers/accel/amdxdna/amdxdna_pm.c index d024d480521c..b1fafddd7ad5 100644 --- a/drivers/accel/amdxdna/amdxdna_pm.c +++ b/drivers/accel/amdxdna/amdxdna_pm.c @@ -16,6 +16,7 @@ int amdxdna_pm_suspend(struct device *dev) struct amdxdna_dev *xdna = to_xdna_dev(dev_get_drvdata(dev)); int ret = -EOPNOTSUPP; + guard(mutex)(&xdna->dev_lock); if (xdna->dev_info->ops->suspend) ret = xdna->dev_info->ops->suspend(xdna); @@ -28,6 +29,7 @@ int amdxdna_pm_resume(struct device *dev) struct amdxdna_dev *xdna = to_xdna_dev(dev_get_drvdata(dev)); int ret = -EOPNOTSUPP; + guard(mutex)(&xdna->dev_lock); if (xdna->dev_info->ops->resume) ret = xdna->dev_info->ops->resume(xdna); diff --git a/drivers/accel/amdxdna/amdxdna_pm.h b/drivers/accel/amdxdna/amdxdna_pm.h index 77b2d6e45570..3d26b973e0e3 100644 --- a/drivers/accel/amdxdna/amdxdna_pm.h +++ b/drivers/accel/amdxdna/amdxdna_pm.h @@ -15,4 +15,15 @@ void amdxdna_pm_suspend_put(struct amdxdna_dev *xdna); void amdxdna_pm_init(struct amdxdna_dev *xdna); void amdxdna_pm_fini(struct amdxdna_dev *xdna); +static inline int amdxdna_pm_resume_get_locked(struct amdxdna_dev *xdna) +{ + int ret; + + mutex_unlock(&xdna->dev_lock); + ret = amdxdna_pm_resume_get(xdna); + mutex_lock(&xdna->dev_lock); + + return ret; +} + #endif /* _AMDXDNA_PM_H_ */ From fdb65acfe655f844ae1e88696b9656d3ef5bb8fb Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Wed, 11 Feb 2026 12:47:16 -0800 Subject: [PATCH 125/359] accel/amdxdna: Fix suspend failure after enabling turbo mode Enabling turbo mode disables hardware clock gating. Suspend requires hardware clock gating to be re-enabled, otherwise suspend will fail. Fix this by calling aie2_runtime_cfg() from aie2_hw_stop() to re-enable clock gating during suspend. Also ensure that firmware is initialized in aie2_hw_start() before modifying clock-gating settings during resume. Fixes: f4d7b8a6bc8c ("accel/amdxdna: Enhance power management settings") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260211204716.722788-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_pci.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index 07e369507818..4b3e6bb97bd2 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -323,6 +323,7 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna) return; } + aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, NULL); aie2_mgmt_fw_fini(ndev); xdna_mailbox_stop_channel(ndev->mgmt_chann); xdna_mailbox_destroy_channel(ndev->mgmt_chann); @@ -406,18 +407,18 @@ static int aie2_hw_start(struct amdxdna_dev *xdna) goto stop_psp; } - ret = aie2_pm_init(ndev); - if (ret) { - XDNA_ERR(xdna, "failed to init pm, ret %d", ret); - goto destroy_mgmt_chann; - } - ret = aie2_mgmt_fw_init(ndev); if (ret) { XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); goto destroy_mgmt_chann; } + ret = aie2_pm_init(ndev); + if (ret) { + XDNA_ERR(xdna, "failed to init pm, ret %d", ret); + goto destroy_mgmt_chann; + } + ret = aie2_mgmt_fw_query(ndev); if (ret) { XDNA_ERR(xdna, "failed to query fw, ret %d", ret); From 07efce5a6611af6714ea3ef65694e0c8dd7e44f5 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Wed, 11 Feb 2026 12:53:41 -0800 Subject: [PATCH 126/359] accel/amdxdna: Fix command hang on suspended hardware context When a hardware context is suspended, the job scheduler is stopped. If a command is submitted while the context is suspended, the job is queued in the scheduler but aie2_sched_job_run() is never invoked to restart the hardware context. As a result, the command hangs. Fix this by modifying the hardware context suspend routine to keep the job scheduler running so that queued jobs can trigger context restart properly. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260211205341.722982-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_ctx.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c index 8d79fafd889a..25845bd5e507 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -53,6 +53,7 @@ static void aie2_hwctx_stop(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwct { drm_sched_stop(&hwctx->priv->sched, bad_job); aie2_destroy_context(xdna->dev_handle, hwctx); + drm_sched_start(&hwctx->priv->sched, 0); } static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwctx) @@ -80,7 +81,6 @@ static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hw } out: - drm_sched_start(&hwctx->priv->sched, 0); XDNA_DBG(xdna, "%s restarted, ret %d", hwctx->name, ret); return ret; } @@ -297,19 +297,23 @@ aie2_sched_job_run(struct drm_sched_job *sched_job) struct dma_fence *fence; int ret; - if (!hwctx->priv->mbox_chann) + ret = amdxdna_pm_resume_get(hwctx->client->xdna); + if (ret) return NULL; - if (!mmget_not_zero(job->mm)) + if (!hwctx->priv->mbox_chann) { + amdxdna_pm_suspend_put(hwctx->client->xdna); + return NULL; + } + + if (!mmget_not_zero(job->mm)) { + amdxdna_pm_suspend_put(hwctx->client->xdna); return ERR_PTR(-ESRCH); + } kref_get(&job->refcnt); fence = dma_fence_get(job->fence); - ret = amdxdna_pm_resume_get(hwctx->client->xdna); - if (ret) - goto out; - if (job->drv_cmd) { switch (job->drv_cmd->opcode) { case SYNC_DEBUG_BO: From 1110a949675ebd56b3f0286e664ea543f745801c Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Tue, 17 Feb 2026 10:54:15 -0800 Subject: [PATCH 127/359] accel/amdxdna: Fix out-of-bounds memset in command slot handling The remaining space in a command slot may be smaller than the size of the command header. Clearing the command header with memset() before verifying the available slot space can result in an out-of-bounds write and memory corruption. Fix this by moving the memset() call after the size validation. Fixes: 3d32eb7a5ecf ("accel/amdxdna: Fix cu_idx being cleared by memset() during command setup") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260217185415.1781908-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_message.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c index 5d80c5837745..277a27bce850 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -699,11 +699,11 @@ aie2_cmdlist_fill_npu_cf(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t *siz u32 cmd_len; void *cmd; - memset(npu_slot, 0, sizeof(*npu_slot)); cmd = amdxdna_cmd_get_payload(cmd_bo, &cmd_len); if (*size < sizeof(*npu_slot) + cmd_len) return -EINVAL; + memset(npu_slot, 0, sizeof(*npu_slot)); npu_slot->cu_idx = amdxdna_cmd_get_cu_idx(cmd_bo); if (npu_slot->cu_idx == INVALID_CU_IDX) return -EINVAL; @@ -724,7 +724,6 @@ aie2_cmdlist_fill_npu_dpu(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t *si u32 cmd_len; u32 arg_sz; - memset(npu_slot, 0, sizeof(*npu_slot)); sn = amdxdna_cmd_get_payload(cmd_bo, &cmd_len); arg_sz = cmd_len - sizeof(*sn); if (cmd_len < sizeof(*sn) || arg_sz > MAX_NPU_ARGS_SIZE) @@ -733,6 +732,7 @@ aie2_cmdlist_fill_npu_dpu(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t *si if (*size < sizeof(*npu_slot) + arg_sz) return -EINVAL; + memset(npu_slot, 0, sizeof(*npu_slot)); npu_slot->cu_idx = amdxdna_cmd_get_cu_idx(cmd_bo); if (npu_slot->cu_idx == INVALID_CU_IDX) return -EINVAL; @@ -756,7 +756,6 @@ aie2_cmdlist_fill_npu_preempt(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t u32 cmd_len; u32 arg_sz; - memset(npu_slot, 0, sizeof(*npu_slot)); pd = amdxdna_cmd_get_payload(cmd_bo, &cmd_len); arg_sz = cmd_len - sizeof(*pd); if (cmd_len < sizeof(*pd) || arg_sz > MAX_NPU_ARGS_SIZE) @@ -765,6 +764,7 @@ aie2_cmdlist_fill_npu_preempt(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t if (*size < sizeof(*npu_slot) + arg_sz) return -EINVAL; + memset(npu_slot, 0, sizeof(*npu_slot)); npu_slot->cu_idx = amdxdna_cmd_get_cu_idx(cmd_bo); if (npu_slot->cu_idx == INVALID_CU_IDX) return -EINVAL; @@ -792,7 +792,6 @@ aie2_cmdlist_fill_npu_elf(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t *si u32 cmd_len; u32 arg_sz; - memset(npu_slot, 0, sizeof(*npu_slot)); pd = amdxdna_cmd_get_payload(cmd_bo, &cmd_len); arg_sz = cmd_len - sizeof(*pd); if (cmd_len < sizeof(*pd) || arg_sz > MAX_NPU_ARGS_SIZE) @@ -801,6 +800,7 @@ aie2_cmdlist_fill_npu_elf(struct amdxdna_gem_obj *cmd_bo, void *slot, size_t *si if (*size < sizeof(*npu_slot) + arg_sz) return -EINVAL; + memset(npu_slot, 0, sizeof(*npu_slot)); npu_slot->type = EXEC_NPU_TYPE_ELF; npu_slot->inst_buf_addr = pd->inst_buf; npu_slot->save_buf_addr = pd->save_buf; From 03808abb1d868aed7478a11a82e5bb4b3f1ca6d6 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Tue, 17 Feb 2026 11:28:15 -0800 Subject: [PATCH 128/359] accel/amdxdna: Prevent ubuf size overflow The ubuf size calculation may overflow, resulting in an undersized allocation and possible memory corruption. Use check_add_overflow() helpers to validate the size calculation before allocation. Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260217192815.1784689-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/amdxdna_ubuf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/accel/amdxdna/amdxdna_ubuf.c b/drivers/accel/amdxdna/amdxdna_ubuf.c index b509f10b155c..fb71d6e3f44d 100644 --- a/drivers/accel/amdxdna/amdxdna_ubuf.c +++ b/drivers/accel/amdxdna/amdxdna_ubuf.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -176,7 +177,10 @@ struct dma_buf *amdxdna_get_ubuf(struct drm_device *dev, goto free_ent; } - exp_info.size += va_ent[i].len; + if (check_add_overflow(exp_info.size, va_ent[i].len, &exp_info.size)) { + ret = -EINVAL; + goto free_ent; + } } ubuf->nr_pages = exp_info.size >> PAGE_SHIFT; From 901ec3470994006bc8dd02399e16b675566c3416 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Thu, 19 Feb 2026 13:19:46 -0800 Subject: [PATCH 129/359] accel/amdxdna: Validate command buffer payload count The count field in the command header is used to determine the valid payload size. Verify that the valid payload does not exceed the remaining buffer space. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260219211946.1920485-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/amdxdna_ctx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/amdxdna_ctx.c index 5173456bbb61..263d36072540 100644 --- a/drivers/accel/amdxdna/amdxdna_ctx.c +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -104,7 +104,10 @@ void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size) if (size) { count = FIELD_GET(AMDXDNA_CMD_COUNT, cmd->header); - if (unlikely(count <= num_masks)) { + if (unlikely(count <= num_masks || + count * sizeof(u32) + + offsetof(struct amdxdna_cmd, data[0]) > + abo->mem.size)) { *size = 0; return NULL; } From 551d44200152cb26f75d2ef990aeb6185b7e37fd Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 23 Feb 2026 09:33:08 -0800 Subject: [PATCH 130/359] default_gfp(): avoid using the "newfangled" __VA_OPT__ trick The default_gfp() helper that I added is not wrong, but it turns out that it causes unnecessary headaches for 'sparse' which doesn't support the use of __VA_OPT__ (introduced in C++20 and C23, and supported by gcc and clang for a long time). We do already use __VA_OPT__ in some other cases in the kernel (drm/xe and btrfs), but it has been fairly limited. Now it triggers for pretty much everything, and sparse ends up not working at all. We can use the traditional gcc ',##__VA_ARGS__' syntax instead: it may not be the "C standard" way and is slightly less natural in this context, but it is the traditional model for this and avoids the sparse problem. Reported-and-tested-by: Ricardo Ribalda Reported-and-tested-by: Richard Fitzgerald Reported-by: Ben Dooks Fixes: e19e1b480ac7 ("add default_gfp() helper macro and use it in the new *alloc_obj() helpers") Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 2b30a0529d48..90536b2bc42e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -14,8 +14,8 @@ struct vm_area_struct; struct mempolicy; /* Helper macro to avoid gfp flags if they are the default one */ -#define __default_gfp(a,...) a -#define default_gfp(...) __default_gfp(__VA_ARGS__ __VA_OPT__(,) GFP_KERNEL) +#define __default_gfp(a,b,...) b +#define default_gfp(...) __default_gfp(,##__VA_ARGS__,GFP_KERNEL) /* Convert GFP flags to their corresponding migrate type */ #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE) From e7e222ad73d93fe54d6e6e3a15253a0ecf081a1b Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Wed, 11 Feb 2026 17:31:23 -0700 Subject: [PATCH 131/359] cxl: Move devm_cxl_add_nvdimm_bridge() to cxl_pmem.ko Moving the symbol devm_cxl_add_nvdimm_bridge() to drivers/cxl/cxl_pmem.c, so that cxl_pmem can export a symbol that gives cxl_acpi a depedency on cxl_pmem kernel module. This is a prepatory patch to resolve the issue of a race for nvdimm_bus object that is created during cxl_acpi_probe(). No functional changes besides moving code. Suggested-by: Dan Williams Acked-by: Ira Weiny Tested-by: Alison Schofield Reviewed-by: Alison Schofield Link: https://patch.msgid.link/20260205001633.1813643-2-dave.jiang@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/pmem.c | 13 +++---------- drivers/cxl/cxl.h | 2 ++ drivers/cxl/pmem.c | 14 ++++++++++++++ 3 files changed, 19 insertions(+), 10 deletions(-) diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index 3c6e76721522..b652e3486038 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -115,15 +115,8 @@ static void unregister_nvb(void *_cxl_nvb) device_unregister(&cxl_nvb->dev); } -/** - * devm_cxl_add_nvdimm_bridge() - add the root of a LIBNVDIMM topology - * @host: platform firmware root device - * @port: CXL port at the root of a CXL topology - * - * Return: bridge device that can host cxl_nvdimm objects - */ -struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host, - struct cxl_port *port) +struct cxl_nvdimm_bridge *__devm_cxl_add_nvdimm_bridge(struct device *host, + struct cxl_port *port) { struct cxl_nvdimm_bridge *cxl_nvb; struct device *dev; @@ -155,7 +148,7 @@ err: put_device(dev); return ERR_PTR(rc); } -EXPORT_SYMBOL_NS_GPL(devm_cxl_add_nvdimm_bridge, "CXL"); +EXPORT_SYMBOL_FOR_MODULES(__devm_cxl_add_nvdimm_bridge, "cxl_pmem"); static void cxl_nvdimm_release(struct device *dev) { diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 04c673e7cdb0..f5850800f400 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -920,6 +920,8 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv); struct cxl_nvdimm_bridge *to_cxl_nvdimm_bridge(struct device *dev); struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host, struct cxl_port *port); +struct cxl_nvdimm_bridge *__devm_cxl_add_nvdimm_bridge(struct device *host, + struct cxl_port *port); struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev); bool is_cxl_nvdimm(struct device *dev); int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port, diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c index 6a97e4e490b6..c67b30b516bf 100644 --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -13,6 +13,20 @@ static __read_mostly DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX); +/** + * __devm_cxl_add_nvdimm_bridge() - add the root of a LIBNVDIMM topology + * @host: platform firmware root device + * @port: CXL port at the root of a CXL topology + * + * Return: bridge device that can host cxl_nvdimm objects + */ +struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host, + struct cxl_port *port) +{ + return __devm_cxl_add_nvdimm_bridge(host, port); +} +EXPORT_SYMBOL_NS_GPL(devm_cxl_add_nvdimm_bridge, "CXL"); + static void clear_exclusive(void *mds) { clear_exclusive_cxl_commands(mds, exclusive_cmds); From 43d37df67f7770d8d261fdcb64ecc8c314e91303 Mon Sep 17 00:00:00 2001 From: Matt Roper Date: Fri, 6 Feb 2026 14:30:59 -0800 Subject: [PATCH 132/359] drm/xe/wa: Steer RMW of MCR registers while building default LRC When generating the default LRC, if a register is not masked, we apply any save-restore programming necessary via a read-modify-write sequence that will ensure we only update the relevant bits/fields without clobbering the rest of the register. However some of the registers that need to be updated might be MCR registers which require steering to a non-terminated instance to ensure we can read back a valid, non-zero value. The steering of reads originating from a command streamer is controlled by register CS_MMIO_GROUP_INSTANCE_SELECT. Emit additional MI_LRI commands to update the steering before any RMW of an MCR register to ensure the reads are performed properly. Note that needing to perform a RMW of an MCR register while building the default LRC is pretty rare. Most of the MCR registers that are part of an engine's LRCs are also masked registers, so no MCR is necessary. Fixes: f2f90989ccff ("drm/xe: Avoid reading RMW registers in emit_wa_job") Cc: Michal Wajdeczko Reviewed-by: Balasubramani Vivekanandan Link: https://patch.msgid.link/20260206223058.387014-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper (cherry picked from commit 6c2e331c915ba9e774aa847921262805feb00863) Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/regs/xe_engine_regs.h | 6 +++ drivers/gpu/drm/xe/xe_gt.c | 66 +++++++++++++++++++----- 2 files changed, 60 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h index 68172b0248a6..dc5a4fafa70c 100644 --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h @@ -96,6 +96,12 @@ #define ENABLE_SEMAPHORE_POLL_BIT REG_BIT(13) #define RING_CMD_CCTL(base) XE_REG((base) + 0xc4, XE_REG_OPTION_MASKED) + +#define CS_MMIO_GROUP_INSTANCE_SELECT(base) XE_REG((base) + 0xcc) +#define SELECTIVE_READ_ADDRESSING REG_BIT(30) +#define SELECTIVE_READ_GROUP REG_GENMASK(29, 23) +#define SELECTIVE_READ_INSTANCE REG_GENMASK(22, 16) + /* * CMD_CCTL read/write fields take a MOCS value and _not_ a table index. * The lsb of each can be considered a separate enabling bit for encryption. diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index 9d090d0f2438..df6d04704823 100644 --- a/drivers/gpu/drm/xe/xe_gt.c +++ b/drivers/gpu/drm/xe/xe_gt.c @@ -210,11 +210,15 @@ static int emit_nop_job(struct xe_gt *gt, struct xe_exec_queue *q) return ret; } +/* Dwords required to emit a RMW of a register */ +#define EMIT_RMW_DW 20 + static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) { - struct xe_reg_sr *sr = &q->hwe->reg_lrc; + struct xe_hw_engine *hwe = q->hwe; + struct xe_reg_sr *sr = &hwe->reg_lrc; struct xe_reg_sr_entry *entry; - int count_rmw = 0, count = 0, ret; + int count_rmw = 0, count_rmw_mcr = 0, count = 0, ret; unsigned long idx; struct xe_bb *bb; size_t bb_len = 0; @@ -224,6 +228,8 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) xa_for_each(&sr->xa, idx, entry) { if (entry->reg.masked || entry->clr_bits == ~0) ++count; + else if (entry->reg.mcr) + ++count_rmw_mcr; else ++count_rmw; } @@ -231,17 +237,35 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) if (count) bb_len += count * 2 + 1; - if (count_rmw) - bb_len += count_rmw * 20 + 7; + /* + * RMW of MCR registers is the same as a normal RMW, except an + * additional LRI (3 dwords) is required per register to steer the read + * to a nom-terminated instance. + * + * We could probably shorten the batch slightly by eliding the + * steering for consecutive MCR registers that have the same + * group/instance target, but it's not worth the extra complexity to do + * so. + */ + bb_len += count_rmw * EMIT_RMW_DW; + bb_len += count_rmw_mcr * (EMIT_RMW_DW + 3); - if (q->hwe->class == XE_ENGINE_CLASS_RENDER) + /* + * After doing all RMW, we need 7 trailing dwords to clean up, + * plus an additional 3 dwords to reset steering if any of the + * registers were MCR. + */ + if (count_rmw || count_rmw_mcr) + bb_len += 7 + (count_rmw_mcr ? 3 : 0); + + if (hwe->class == XE_ENGINE_CLASS_RENDER) /* * Big enough to emit all of the context's 3DSTATE via * xe_lrc_emit_hwe_state_instructions() */ - bb_len += xe_gt_lrc_size(gt, q->hwe->class) / sizeof(u32); + bb_len += xe_gt_lrc_size(gt, hwe->class) / sizeof(u32); - xe_gt_dbg(gt, "LRC %s WA job: %zu dwords\n", q->hwe->name, bb_len); + xe_gt_dbg(gt, "LRC %s WA job: %zu dwords\n", hwe->name, bb_len); bb = xe_bb_new(gt, bb_len, false); if (IS_ERR(bb)) @@ -276,13 +300,23 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) } } - if (count_rmw) { - /* Emit MI_MATH for each RMW reg: 20dw per reg + 7 trailing dw */ - + if (count_rmw || count_rmw_mcr) { xa_for_each(&sr->xa, idx, entry) { if (entry->reg.masked || entry->clr_bits == ~0) continue; + if (entry->reg.mcr) { + struct xe_reg_mcr reg = { .__reg.raw = entry->reg.raw }; + u8 group, instance; + + xe_gt_mcr_get_nonterminated_steering(gt, reg, &group, &instance); + *cs++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1); + *cs++ = CS_MMIO_GROUP_INSTANCE_SELECT(hwe->mmio_base).addr; + *cs++ = SELECTIVE_READ_ADDRESSING | + REG_FIELD_PREP(SELECTIVE_READ_GROUP, group) | + REG_FIELD_PREP(SELECTIVE_READ_INSTANCE, instance); + } + *cs++ = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO; *cs++ = entry->reg.addr; *cs++ = CS_GPR_REG(0, 0).addr; @@ -308,8 +342,9 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) *cs++ = CS_GPR_REG(0, 0).addr; *cs++ = entry->reg.addr; - xe_gt_dbg(gt, "REG[%#x] = ~%#x|%#x\n", - entry->reg.addr, entry->clr_bits, entry->set_bits); + xe_gt_dbg(gt, "REG[%#x] = ~%#x|%#x%s\n", + entry->reg.addr, entry->clr_bits, entry->set_bits, + entry->reg.mcr ? " (MCR)" : ""); } /* reset used GPR */ @@ -321,6 +356,13 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q) *cs++ = 0; *cs++ = CS_GPR_REG(0, 2).addr; *cs++ = 0; + + /* reset steering */ + if (count_rmw_mcr) { + *cs++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1); + *cs++ = CS_MMIO_GROUP_INSTANCE_SELECT(q->hwe->mmio_base).addr; + *cs++ = 0; + } } cs = xe_lrc_emit_hwe_state_instructions(q, cs); From 7dff99b354601dd01829e1511711846e04340a69 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 23 Feb 2026 11:18:48 -0800 Subject: [PATCH 133/359] Remove WARN_ALL_UNSEEDED_RANDOM kernel config option This config option goes way back - it used to be an internal debug option to random.c (at that point called DEBUG_RANDOM_BOOT), then was renamed and exposed as a config option as CONFIG_WARN_UNSEEDED_RANDOM, and then further renamed to the current CONFIG_WARN_ALL_UNSEEDED_RANDOM. It was all done with the best of intentions: the more limited rate-limited reports were reporting some cases, but if you wanted to see all the gory details, you'd enable this "ALL" option. However, it turns out - perhaps not surprisingly - that when people don't care about and fix the first rate-limited cases, they most certainly don't care about any others either, and so warning about all of them isn't actually helping anything. And the non-ratelimited reporting causes problems, where well-meaning people enable debug options, but the excessive flood of messages that nobody cares about will hide actual real information when things go wrong. I just got a kernel bug report (which had nothing to do with randomness) where two thirds of the the truncated dmesg was just variations of random: get_random_u32 called from __get_random_u32_below+0x10/0x70 with crng_init=0 and in the process early boot messages had been lost (in addition to making the messages that _hadn't_ been lost harder to read). The proper way to find these things for the hypothetical developer that cares - if such a person exists - is almost certainly with boot time tracing. That gives you the option to get call graphs etc too, which is likely a requirement for fixing any problems anyway. See Documentation/trace/boottime-trace.rst for that option. And if we for some reason do want to re-introduce actual printing of these things, it will need to have some uniqueness filtering rather than this "just print it all" model. Fixes: cc1e127bfa95 ("random: remove ratelimiting for in-kernel unseeded randomness") Acked-by: Jason Donenfeld Signed-off-by: Linus Torvalds --- drivers/char/random.c | 12 +----------- kernel/configs/debug.config | 1 - lib/Kconfig.debug | 27 --------------------------- 3 files changed, 1 insertion(+), 39 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index dcd002e1b890..7ff4d29911fd 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -96,8 +96,7 @@ static ATOMIC_NOTIFIER_HEAD(random_ready_notifier); /* Control how we warn userspace. */ static struct ratelimit_state urandom_warning = RATELIMIT_STATE_INIT_FLAGS("urandom_warning", HZ, 3, RATELIMIT_MSG_ON_RELEASE); -static int ratelimit_disable __read_mostly = - IS_ENABLED(CONFIG_WARN_ALL_UNSEEDED_RANDOM); +static int ratelimit_disable __read_mostly = 0; module_param_named(ratelimit_disable, ratelimit_disable, int, 0644); MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression"); @@ -168,12 +167,6 @@ int __cold execute_with_initialized_rng(struct notifier_block *nb) return ret; } -#define warn_unseeded_randomness() \ - if (IS_ENABLED(CONFIG_WARN_ALL_UNSEEDED_RANDOM) && !crng_ready()) \ - printk_deferred(KERN_NOTICE "random: %s called from %pS with crng_init=%d\n", \ - __func__, (void *)_RET_IP_, crng_init) - - /********************************************************************* * * Fast key erasure RNG, the "crng". @@ -434,7 +427,6 @@ static void _get_random_bytes(void *buf, size_t len) */ void get_random_bytes(void *buf, size_t len) { - warn_unseeded_randomness(); _get_random_bytes(buf, len); } EXPORT_SYMBOL(get_random_bytes); @@ -523,8 +515,6 @@ type get_random_ ##type(void) \ struct batch_ ##type *batch; \ unsigned long next_gen; \ \ - warn_unseeded_randomness(); \ - \ if (!crng_ready()) { \ _get_random_bytes(&ret, sizeof(ret)); \ return ret; \ diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config index 774702591d26..307c97ac5fa9 100644 --- a/kernel/configs/debug.config +++ b/kernel/configs/debug.config @@ -29,7 +29,6 @@ CONFIG_SECTION_MISMATCH_WARN_ONLY=y # CONFIG_UBSAN_ALIGNMENT is not set # CONFIG_UBSAN_DIV_ZERO is not set # CONFIG_UBSAN_TRAP is not set -# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set CONFIG_DEBUG_FS=y CONFIG_DEBUG_FS_ALLOW_ALL=y CONFIG_DEBUG_IRQFLAGS=y diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 4e2dfbbd3d78..318df4c75454 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1766,33 +1766,6 @@ config STACKTRACE It is also used by various kernel debugging features that require stack trace generation. -config WARN_ALL_UNSEEDED_RANDOM - bool "Warn for all uses of unseeded randomness" - default n - help - Some parts of the kernel contain bugs relating to their use of - cryptographically secure random numbers before it's actually possible - to generate those numbers securely. This setting ensures that these - flaws don't go unnoticed, by enabling a message, should this ever - occur. This will allow people with obscure setups to know when things - are going wrong, so that they might contact developers about fixing - it. - - Unfortunately, on some models of some architectures getting - a fully seeded CRNG is extremely difficult, and so this can - result in dmesg getting spammed for a surprisingly long - time. This is really bad from a security perspective, and - so architecture maintainers really need to do what they can - to get the CRNG seeded sooner after the system is booted. - However, since users cannot do anything actionable to - address this, by default this option is disabled. - - Say Y here if you want to receive warnings for all uses of - unseeded randomness. This will be of use primarily for - those developers interested in improving the security of - Linux kernels running on their architecture (or - subarchitecture). - config DEBUG_KOBJECT bool "kobject debugging" depends on DEBUG_KERNEL From 16cb1a64dce56708a1684bf755ad52fb52c41f87 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 16 Feb 2026 13:12:00 +0100 Subject: [PATCH 134/359] RDMA/uverbs: select CONFIG_DMA_SHARED_BUFFER The addition of dmabuf support in uverbs means that it is no longer possible to build infiniband support if that is disabled: arm-linux-gnueabi-ld: drivers/infiniband/core/ib_core_uverbs.o: in function `rdma_user_mmap_entry_remove.part.0': ib_core_uverbs.c:(.text+0x508): undefined reference to `dma_buf_move_notify' (dma_buf_move_notify): Unknown destination type (ARM/Thumb) in drivers/infiniband/core/ib_core_uverbs.o ib_core_uverbs.c:(.text+0x518): undefined reference to `dma_resv_wait_timeout' (dma_resv_wait_timeout): Unknown destination type (ARM/Thumb) in drivers/infiniband/core/ib_core_uverbs.o Select this from Kconfig, as we do for the other users. Fixes: 0ac6f4056c4a ("RDMA/uverbs: Add DMABUF object type and operations") Signed-off-by: Arnd Bergmann Link: https://patch.msgid.link/20260216121213.2088910-1-arnd@kernel.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index 794b9778816b..78ac2ff5befd 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -6,6 +6,7 @@ menuconfig INFINIBAND depends on INET depends on m || IPV6 != m depends on !ALPHA + select DMA_SHARED_BUFFER select IRQ_POLL select DIMLIB help From 7accb1c4321acb617faf934af59d928b0b047e2b Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Tue, 3 Feb 2026 15:16:16 -0500 Subject: [PATCH 135/359] Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ This fixes responding with an invalid result caused by checking the wrong size of CID which should have been (cmd_len - sizeof(*req)) and on top of it the wrong result was use L2CAP_CR_LE_INVALID_PARAMS which is invalid/reserved for reconf when running test like L2CAP/ECFC/BI-03-C: > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 64 MPS: 64 Source CID: 64 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reserved (0x000c) Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003) Fiix L2CAP/ECFC/BI-04-C which expects L2CAP_RECONF_INVALID_MPS (0x0002) when more than one channel gets its MPS reduced: > ACL Data RX: Handle 64 flags 0x02 dlen 16 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 8 MTU: 264 MPS: 99 Source CID: 64 ! Source CID: 65 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Fix L2CAP/ECFC/BI-05-C when SCID is invalid (85 unconnected): > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 65 MPS: 64 ! Source CID: 85 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003) Fix L2CAP/ECFC/BI-06-C when MPS < L2CAP_ECRED_MIN_MPS (64): > ACL Data RX: Handle 64 flags 0x02 dlen 14 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6 MTU: 672 ! MPS: 63 Source CID: 64 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Result: Reconfiguration failed - other unacceptable parameters (0x0004) Fix L2CAP/ECFC/BI-07-C when MPS reduced for more than one channel: > ACL Data RX: Handle 64 flags 0x02 dlen 16 LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 3 len 8 MTU: 84 ! MPS: 71 Source CID: 64 ! Source CID: 65 < ACL Data TX: Handle 64 flags 0x00 dlen 10 LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2 ! Result: Reconfiguration successful (0x0000) Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002) Link: https://github.com/bluez/bluez/issues/1865 Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/l2cap.h | 2 + net/bluetooth/l2cap_core.c | 71 ++++++++++++++++++++++++----------- 2 files changed, 51 insertions(+), 22 deletions(-) diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h index ec3af01e4db9..6f9cf7a05986 100644 --- a/include/net/bluetooth/l2cap.h +++ b/include/net/bluetooth/l2cap.h @@ -493,6 +493,8 @@ struct l2cap_ecred_reconf_req { #define L2CAP_RECONF_SUCCESS 0x0000 #define L2CAP_RECONF_INVALID_MTU 0x0001 #define L2CAP_RECONF_INVALID_MPS 0x0002 +#define L2CAP_RECONF_INVALID_CID 0x0003 +#define L2CAP_RECONF_INVALID_PARAMS 0x0004 struct l2cap_ecred_reconf_rsp { __le16 result; diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index b628b0fa39b2..81038458be0c 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5310,14 +5310,14 @@ static inline int l2cap_ecred_reconf_req(struct l2cap_conn *conn, struct l2cap_ecred_reconf_req *req = (void *) data; struct l2cap_ecred_reconf_rsp rsp; u16 mtu, mps, result; - struct l2cap_chan *chan; + struct l2cap_chan *chan[L2CAP_ECRED_MAX_CID] = {}; int i, num_scid; if (!enable_ecred) return -EINVAL; - if (cmd_len < sizeof(*req) || cmd_len - sizeof(*req) % sizeof(u16)) { - result = L2CAP_CR_LE_INVALID_PARAMS; + if (cmd_len < sizeof(*req) || (cmd_len - sizeof(*req)) % sizeof(u16)) { + result = L2CAP_RECONF_INVALID_CID; goto respond; } @@ -5327,42 +5327,69 @@ static inline int l2cap_ecred_reconf_req(struct l2cap_conn *conn, BT_DBG("mtu %u mps %u", mtu, mps); if (mtu < L2CAP_ECRED_MIN_MTU) { - result = L2CAP_RECONF_INVALID_MTU; + result = L2CAP_RECONF_INVALID_PARAMS; goto respond; } if (mps < L2CAP_ECRED_MIN_MPS) { - result = L2CAP_RECONF_INVALID_MPS; + result = L2CAP_RECONF_INVALID_PARAMS; goto respond; } cmd_len -= sizeof(*req); num_scid = cmd_len / sizeof(u16); + + if (num_scid > L2CAP_ECRED_MAX_CID) { + result = L2CAP_RECONF_INVALID_PARAMS; + goto respond; + } + result = L2CAP_RECONF_SUCCESS; + /* Check if each SCID, MTU and MPS are valid */ for (i = 0; i < num_scid; i++) { u16 scid; scid = __le16_to_cpu(req->scid[i]); - if (!scid) - return -EPROTO; - - chan = __l2cap_get_chan_by_dcid(conn, scid); - if (!chan) - continue; - - /* If the MTU value is decreased for any of the included - * channels, then the receiver shall disconnect all - * included channels. - */ - if (chan->omtu > mtu) { - BT_ERR("chan %p decreased MTU %u -> %u", chan, - chan->omtu, mtu); - result = L2CAP_RECONF_INVALID_MTU; + if (!scid) { + result = L2CAP_RECONF_INVALID_CID; + goto respond; } - chan->omtu = mtu; - chan->remote_mps = mps; + chan[i] = __l2cap_get_chan_by_dcid(conn, scid); + if (!chan[i]) { + result = L2CAP_RECONF_INVALID_CID; + goto respond; + } + + /* The MTU field shall be greater than or equal to the greatest + * current MTU size of these channels. + */ + if (chan[i]->omtu > mtu) { + BT_ERR("chan %p decreased MTU %u -> %u", chan[i], + chan[i]->omtu, mtu); + result = L2CAP_RECONF_INVALID_MTU; + goto respond; + } + + /* If more than one channel is being configured, the MPS field + * shall be greater than or equal to the current MPS size of + * each of these channels. If only one channel is being + * configured, the MPS field may be less than the current MPS + * of that channel. + */ + if (chan[i]->remote_mps >= mps && i) { + BT_ERR("chan %p decreased MPS %u -> %u", chan[i], + chan[i]->remote_mps, mps); + result = L2CAP_RECONF_INVALID_MPS; + goto respond; + } + } + + /* Commit the new MTU and MPS values after checking they are valid */ + for (i = 0; i < num_scid; i++) { + chan[i]->omtu = mtu; + chan[i]->remote_mps = mps; } respond: From c28d2bff70444a85b3b86aaf241ece9408c7858c Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Thu, 5 Feb 2026 15:11:34 -0500 Subject: [PATCH 136/359] Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short Test L2CAP/ECFC/BV-26-C expect the response to L2CAP_ECRED_CONN_REQ with and MTU value < L2CAP_ECRED_MIN_MTU (64) to be L2CAP_CR_LE_INVALID_PARAMS rather than L2CAP_CR_LE_UNACCEPT_PARAMS. Also fix not including the correct number of CIDs in the response since the spec requires all CIDs being rejected to be included in the response. Link: https://github.com/bluez/bluez/issues/1868 Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/l2cap.h | 6 +++--- net/bluetooth/l2cap_core.c | 14 ++++++++------ 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h index 6f9cf7a05986..010f1a8fd15f 100644 --- a/include/net/bluetooth/l2cap.h +++ b/include/net/bluetooth/l2cap.h @@ -284,9 +284,9 @@ struct l2cap_conn_rsp { #define L2CAP_CR_LE_BAD_KEY_SIZE 0x0007 #define L2CAP_CR_LE_ENCRYPTION 0x0008 #define L2CAP_CR_LE_INVALID_SCID 0x0009 -#define L2CAP_CR_LE_SCID_IN_USE 0X000A -#define L2CAP_CR_LE_UNACCEPT_PARAMS 0X000B -#define L2CAP_CR_LE_INVALID_PARAMS 0X000C +#define L2CAP_CR_LE_SCID_IN_USE 0x000A +#define L2CAP_CR_LE_UNACCEPT_PARAMS 0x000B +#define L2CAP_CR_LE_INVALID_PARAMS 0x000C /* connect/create channel status */ #define L2CAP_CS_NO_INFO 0x0000 diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 81038458be0c..390d25909104 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5051,13 +5051,15 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, struct l2cap_chan *chan, *pchan; u16 mtu, mps; __le16 psm; - u8 result, len = 0; + u8 result, rsp_len = 0; int i, num_scid; bool defer = false; if (!enable_ecred) return -EINVAL; + memset(pdu, 0, sizeof(*pdu)); + if (cmd_len < sizeof(*req) || (cmd_len - sizeof(*req)) % sizeof(u16)) { result = L2CAP_CR_LE_INVALID_PARAMS; goto response; @@ -5066,6 +5068,9 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, cmd_len -= sizeof(*req); num_scid = cmd_len / sizeof(u16); + /* Always respond with the same number of scids as in the request */ + rsp_len = cmd_len; + if (num_scid > L2CAP_ECRED_MAX_CID) { result = L2CAP_CR_LE_INVALID_PARAMS; goto response; @@ -5075,7 +5080,7 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, mps = __le16_to_cpu(req->mps); if (mtu < L2CAP_ECRED_MIN_MTU || mps < L2CAP_ECRED_MIN_MPS) { - result = L2CAP_CR_LE_UNACCEPT_PARAMS; + result = L2CAP_CR_LE_INVALID_PARAMS; goto response; } @@ -5095,8 +5100,6 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, BT_DBG("psm 0x%2.2x mtu %u mps %u", __le16_to_cpu(psm), mtu, mps); - memset(pdu, 0, sizeof(*pdu)); - /* Check if we have socket listening on psm */ pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src, &conn->hcon->dst, LE_LINK); @@ -5121,7 +5124,6 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, BT_DBG("scid[%d] 0x%4.4x", i, scid); pdu->dcid[i] = 0x0000; - len += sizeof(*pdu->dcid); /* Check for valid dynamic CID range */ if (scid < L2CAP_CID_DYN_START || scid > L2CAP_CID_LE_DYN_END) { @@ -5188,7 +5190,7 @@ response: return 0; l2cap_send_cmd(conn, cmd->ident, L2CAP_ECRED_CONN_RSP, - sizeof(*pdu) + len, pdu); + sizeof(*pdu) + rsp_len, pdu); return 0; } From 21e4271e65094172aadd5beb8caea95dd0fbf6d7 Mon Sep 17 00:00:00 2001 From: Heitor Alves de Siqueira Date: Wed, 11 Feb 2026 15:03:35 -0300 Subject: [PATCH 137/359] Bluetooth: purge error queues in socket destructors When TX timestamping is enabled via SO_TIMESTAMPING, SKBs may be queued into sk_error_queue and will stay there until consumed. If userspace never gets to read the timestamps, or if the controller is removed unexpectedly, these SKBs will leak. Fix by adding skb_queue_purge() calls for sk_error_queue in affected bluetooth destructors. RFCOMM does not currently use sk_error_queue. Fixes: 134f4b39df7b ("Bluetooth: add support for skb TX SND/COMPLETION timestamping") Reported-by: syzbot+7ff4013eabad1407b70a@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=7ff4013eabad1407b70a Cc: stable@vger.kernel.org Signed-off-by: Heitor Alves de Siqueira Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_sock.c | 1 + net/bluetooth/iso.c | 1 + net/bluetooth/l2cap_sock.c | 1 + net/bluetooth/sco.c | 1 + 4 files changed, 4 insertions(+) diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 4e7bf63af9c5..0290dea081f6 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -2166,6 +2166,7 @@ static void hci_sock_destruct(struct sock *sk) mgmt_cleanup(sk); skb_queue_purge(&sk->sk_receive_queue); skb_queue_purge(&sk->sk_write_queue); + skb_queue_purge(&sk->sk_error_queue); } static const struct proto_ops hci_sock_ops = { diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 1459ab161fd2..a38d3774176d 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -746,6 +746,7 @@ static void iso_sock_destruct(struct sock *sk) skb_queue_purge(&sk->sk_receive_queue); skb_queue_purge(&sk->sk_write_queue); + skb_queue_purge(&sk->sk_error_queue); } static void iso_sock_cleanup_listen(struct sock *parent) diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index 3ba3ce7eaa98..62ceda979f39 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -1817,6 +1817,7 @@ static void l2cap_sock_destruct(struct sock *sk) skb_queue_purge(&sk->sk_receive_queue); skb_queue_purge(&sk->sk_write_queue); + skb_queue_purge(&sk->sk_error_queue); } static void l2cap_skb_msg_name(struct sk_buff *skb, void *msg_name, diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index 87ba90336e80..cccfaf560317 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -470,6 +470,7 @@ static void sco_sock_destruct(struct sock *sk) skb_queue_purge(&sk->sk_receive_queue); skb_queue_purge(&sk->sk_write_queue); + skb_queue_purge(&sk->sk_error_queue); } static void sco_sock_cleanup_listen(struct sock *parent) From 5c4e9a8b18457ad28b57069ef0f14661e3192b2e Mon Sep 17 00:00:00 2001 From: Jinwang Li Date: Thu, 5 Feb 2026 14:26:00 +0800 Subject: [PATCH 138/359] Bluetooth: hci_qca: Cleanup on all setup failures The setup process previously combined error handling and retry gating under one condition. As a result, the final failed attempt exited without performing cleanup. Update the failure path to always perform power and port cleanup on setup failure, and reopen the port only when retrying. Fixes: 9e80587aba4c ("Bluetooth: hci_qca: Enhance retry logic in qca_setup") Signed-off-by: Jinwang Li Reviewed-by: Bartosz Golaszewski Signed-off-by: Luiz Augusto von Dentz --- drivers/bluetooth/hci_qca.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index c511546f793e..c6de2602b49d 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -2046,19 +2046,23 @@ retry: } out: - if (ret && retries < MAX_INIT_RETRIES) { - bt_dev_warn(hdev, "Retry BT power ON:%d", retries); + if (ret) { qca_power_shutdown(hu); - if (hu->serdev) { - serdev_device_close(hu->serdev); - ret = serdev_device_open(hu->serdev); - if (ret) { - bt_dev_err(hdev, "failed to open port"); - return ret; + + if (retries < MAX_INIT_RETRIES) { + bt_dev_warn(hdev, "Retry BT power ON:%d", retries); + if (hu->serdev) { + serdev_device_close(hu->serdev); + ret = serdev_device_open(hu->serdev); + if (ret) { + bt_dev_err(hdev, "failed to open port"); + return ret; + } } + retries++; + goto retry; } - retries++; - goto retry; + return ret; } /* Setup bdaddr */ From 05761c2c2b5bfec85c47f60c903c461e9b56cf87 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Wed, 11 Feb 2026 15:18:03 -0500 Subject: [PATCH 139/359] Bluetooth: L2CAP: Fix response to L2CAP_ECRED_CONN_REQ Similar to 03dba9cea72f ("Bluetooth: L2CAP: Fix not responding with L2CAP_CR_LE_ENCRYPTION") the result code L2CAP_CR_LE_ENCRYPTION shall be used when BT_SECURITY_MEDIUM is set since that means security mode 2 which mean it doesn't require authentication which results in qualification test L2CAP/ECFC/BV-32-C failing. Link: https://github.com/bluez/bluez/issues/1871 Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/l2cap_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 390d25909104..9452c6179acb 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5112,7 +5112,8 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, if (!smp_sufficient_security(conn->hcon, pchan->sec_level, SMP_ALLOW_STK)) { - result = L2CAP_CR_LE_AUTHENTICATION; + result = pchan->sec_level == BT_SECURITY_MEDIUM ? + L2CAP_CR_LE_ENCRYPTION : L2CAP_CR_LE_AUTHENTICATION; goto unlock; } From 7cff9a40c6b0f72ccefdaf0ffe03cfac30348f51 Mon Sep 17 00:00:00 2001 From: Mariusz Skamra Date: Thu, 12 Feb 2026 14:46:46 +0100 Subject: [PATCH 140/359] Bluetooth: Fix CIS host feature condition This fixes the condition for sending the LE Set Host Feature command. The command is sent to indicate host support for Connected Isochronous Streams in this case. It has been observed that the system could not initialize BIS-only capable controllers because the controllers do not support the command. As per Core v6.2 | Vol 4, Part E, Table 3.1 the command shall be supported if CIS Central or CIS Peripheral is supported; otherwise, the command is optional. Fixes: 709788b154ca ("Bluetooth: hci_core: Fix using {cis,bis}_capable for current settings") Cc: stable@vger.kernel.org Signed-off-by: Mariusz Skamra Reviewed-by: Paul Menzel Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_sync.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index f04a90bce4a9..0b0dc0965f5a 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -4592,7 +4592,7 @@ static int hci_le_set_host_features_sync(struct hci_dev *hdev) { int err; - if (iso_capable(hdev)) { + if (cis_capable(hdev)) { /* Connected Isochronous Channels (Host Support) */ err = hci_le_set_host_feature_sync(hdev, 32, (iso_enabled(hdev) ? 0x01 : From a8d1d73c81d1e70d2aa49fdaf59d933bb783ffe5 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Tue, 17 Feb 2026 13:29:43 -0500 Subject: [PATCH 141/359] Bluetooth: L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ Upon receiving L2CAP_ECRED_CONN_REQ the given MTU shall be checked against the suggested MTU of the listening socket as that is required by the likes of PTS L2CAP/ECFC/BV-27-C test which expects L2CAP_CR_LE_UNACCEPT_PARAMS if the MTU is lowers than socket omtu. In order to be able to set chan->omtu the code now allows setting setsockopt(BT_SNDMTU), but it is only allowed when connection has not been stablished since there is no procedure to reconfigure the output MTU. Link: https://github.com/bluez/bluez/issues/1895 Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/l2cap_core.c | 8 ++++++++ net/bluetooth/l2cap_sock.c | 15 +++++++++++---- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 9452c6179acb..90676ca0e92b 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5117,6 +5117,14 @@ static inline int l2cap_ecred_conn_req(struct l2cap_conn *conn, goto unlock; } + /* Check if the listening channel has set an output MTU then the + * requested MTU shall be less than or equal to that value. + */ + if (pchan->omtu && mtu < pchan->omtu) { + result = L2CAP_CR_LE_UNACCEPT_PARAMS; + goto unlock; + } + result = L2CAP_CR_LE_SUCCESS; for (i = 0; i < num_scid; i++) { diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index 62ceda979f39..477b4d145459 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -1029,10 +1029,17 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; } - /* Setting is not supported as it's the remote side that - * decides this. - */ - err = -EPERM; + /* Only allow setting output MTU when not connected */ + if (sk->sk_state == BT_CONNECTED) { + err = -EISCONN; + break; + } + + err = copy_safe_from_sockptr(&mtu, sizeof(mtu), optval, optlen); + if (err) + break; + + chan->omtu = mtu; break; case BT_RCVMTU: From 138d7eca445ef37a0333425d269ee59900ca1104 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Fri, 13 Feb 2026 13:33:33 -0500 Subject: [PATCH 142/359] Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ This adds a check for encryption key size upon receiving L2CAP_LE_CONN_REQ which is required by L2CAP/LE/CFC/BV-15-C which expects L2CAP_CR_LE_BAD_KEY_SIZE. Link: https://lore.kernel.org/linux-bluetooth/5782243.rdbgypaU67@n9w6sw14/ Fixes: 27e2d4c8d28b ("Bluetooth: Add basic LE L2CAP connect request receiving support") Signed-off-by: Luiz Augusto von Dentz Tested-by: Christian Eggers --- net/bluetooth/l2cap_core.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 90676ca0e92b..2dcc5bb907b8 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4916,6 +4916,13 @@ static int l2cap_le_connect_req(struct l2cap_conn *conn, goto response_unlock; } + /* Check if Key Size is sufficient for the security level */ + if (!l2cap_check_enc_key_size(conn->hcon, pchan)) { + result = L2CAP_CR_LE_BAD_KEY_SIZE; + chan = NULL; + goto response_unlock; + } + /* Check for valid dynamic CID range */ if (scid < L2CAP_CID_DYN_START || scid > L2CAP_CID_LE_DYN_END) { result = L2CAP_CR_LE_INVALID_SCID; From 1bfd7575092420ba5a0b944953c95b74a5646ff8 Mon Sep 17 00:00:00 2001 From: Shuicheng Lin Date: Thu, 19 Feb 2026 23:35:18 +0000 Subject: [PATCH 143/359] drm/xe/sync: Cleanup partially initialized sync on parse failure xe_sync_entry_parse() can allocate references (syncobj, fence, chain fence, or user fence) before hitting a later failure path. Several of those paths returned directly, leaving partially initialized state and leaking refs. Route these error paths through a common free_sync label and call xe_sync_entry_cleanup(sync) before returning the error. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost Signed-off-by: Shuicheng Lin Reviewed-by: Matthew Brost Signed-off-by: Matthew Brost Link: https://patch.msgid.link/20260219233516.2938172-5-shuicheng.lin@intel.com (cherry picked from commit f939bdd9207a5d1fc55cced5459858480686ce22) Cc: stable@vger.kernel.org Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_sync.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c index eb136390dafd..ebf6c96d7a41 100644 --- a/drivers/gpu/drm/xe/xe_sync.c +++ b/drivers/gpu/drm/xe/xe_sync.c @@ -146,8 +146,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef, if (!signal) { sync->fence = drm_syncobj_fence_get(sync->syncobj); - if (XE_IOCTL_DBG(xe, !sync->fence)) - return -EINVAL; + if (XE_IOCTL_DBG(xe, !sync->fence)) { + err = -EINVAL; + goto free_sync; + } } break; @@ -167,17 +169,21 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef, if (signal) { sync->chain_fence = dma_fence_chain_alloc(); - if (!sync->chain_fence) - return -ENOMEM; + if (!sync->chain_fence) { + err = -ENOMEM; + goto free_sync; + } } else { sync->fence = drm_syncobj_fence_get(sync->syncobj); - if (XE_IOCTL_DBG(xe, !sync->fence)) - return -EINVAL; + if (XE_IOCTL_DBG(xe, !sync->fence)) { + err = -EINVAL; + goto free_sync; + } err = dma_fence_chain_find_seqno(&sync->fence, sync_in.timeline_value); if (err) - return err; + goto free_sync; } break; @@ -216,6 +222,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef, sync->timeline_value = sync_in.timeline_value; return 0; + +free_sync: + xe_sync_entry_cleanup(sync); + return err; } ALLOW_ERROR_INJECTION(xe_sync_entry_parse, ERRNO); From 0879c3f04f67e2a1677c25dcc24669ce21eb6a6c Mon Sep 17 00:00:00 2001 From: Shuicheng Lin Date: Thu, 19 Feb 2026 23:35:19 +0000 Subject: [PATCH 144/359] drm/xe/sync: Fix user fence leak on alloc failure When dma_fence_chain_alloc() fails, properly release the user fence reference to prevent a memory leak. Fixes: 0995c2fc39b0 ("drm/xe: Enforce correct user fence signaling order using") Cc: Matthew Brost Signed-off-by: Shuicheng Lin Reviewed-by: Matthew Brost Signed-off-by: Matthew Brost Link: https://patch.msgid.link/20260219233516.2938172-6-shuicheng.lin@intel.com (cherry picked from commit a5d5634cde48a9fcd68c8504aa07f89f175074a0) Cc: stable@vger.kernel.org Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/xe/xe_sync.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c index ebf6c96d7a41..24d6d9af20d6 100644 --- a/drivers/gpu/drm/xe/xe_sync.c +++ b/drivers/gpu/drm/xe/xe_sync.c @@ -206,8 +206,10 @@ int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef, if (XE_IOCTL_DBG(xe, IS_ERR(sync->ufence))) return PTR_ERR(sync->ufence); sync->ufence_chain_fence = dma_fence_chain_alloc(); - if (!sync->ufence_chain_fence) - return -ENOMEM; + if (!sync->ufence_chain_fence) { + err = -ENOMEM; + goto free_sync; + } sync->ufence_syncobj = ufence_syncobj; } From 96a7b71c4438d3b72d6c95e3efdc9e8e8aee6b78 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 23 Feb 2026 13:43:45 -0800 Subject: [PATCH 145/359] ubd: Use pointer-to-pointers for io_thread_req arrays Having an unbounded array for irq_req_buffer and io_req_buffer doesn't provide any bounds safety, and confuses the needed allocation type, which is returning a pointer to pointers. Instead of the implicit cast, switch the variable types. Reported-by: Nathan Chancellor Reported-by: Guenter Roeck Closes: https://lore.kernel.org/all/b04b6c13-7d0e-4a89-9e68-b572b6c686ac@roeck-us.net Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Acked-by: Richard Weinberger Link: https://patch.msgid.link/20260223214341.work.846-kees@kernel.org Signed-off-by: Kees Cook --- arch/um/drivers/ubd_kern.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index 012b2bcaa8a0..20fc33300a95 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -69,11 +69,11 @@ struct io_thread_req { }; -static struct io_thread_req * (*irq_req_buffer)[]; +static struct io_thread_req **irq_req_buffer; static struct io_thread_req *irq_remainder; static int irq_remainder_size; -static struct io_thread_req * (*io_req_buffer)[]; +static struct io_thread_req **io_req_buffer; static struct io_thread_req *io_remainder; static int io_remainder_size; @@ -398,7 +398,7 @@ static int thread_fd = -1; static int bulk_req_safe_read( int fd, - struct io_thread_req * (*request_buffer)[], + struct io_thread_req **request_buffer, struct io_thread_req **remainder, int *remainder_size, int max_recs @@ -465,7 +465,7 @@ static irqreturn_t ubd_intr(int irq, void *dev) &irq_remainder, &irq_remainder_size, UBD_REQ_BUFFER_SIZE)) >= 0) { for (i = 0; i < len / sizeof(struct io_thread_req *); i++) - ubd_end_request((*irq_req_buffer)[i]); + ubd_end_request(irq_req_buffer[i]); } if (len < 0 && len != -EAGAIN) @@ -1512,7 +1512,7 @@ void *io_thread(void *arg) } for (count = 0; count < n/sizeof(struct io_thread_req *); count++) { - struct io_thread_req *req = (*io_req_buffer)[count]; + struct io_thread_req *req = io_req_buffer[count]; int i; io_count++; From f709859f331b8fbd7fede5769d668008fc5ba789 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Mon, 23 Feb 2026 11:23:12 -0700 Subject: [PATCH 146/359] init/Kconfig: Adjust fixed clang version for __builtin_counted_by_ref Commit d39a1d7486d9 ("compiler_types: Disable __builtin_counted_by_ref for Clang") used 22.0.0 as the fixed version for a compiler crash but the fix was only merged in main (23.0.0) and release/22.x (22.1.0). With the current fixed version number, prerelease or Android LLVM 22 builds will still be able to hit the compiler crash when building the kernel. This can be particularly disruptive when bisecting LLVM. Use 21.1.0 as the fixed version number to ensure the fix for this crash is always present. Fixes: d39a1d7486d9 ("compiler_types: Disable __builtin_counted_by_ref for Clang") Signed-off-by: Nathan Chancellor Link: https://patch.msgid.link/20260223-fix-clang-version-builtin-counted-by-ref-v1-1-3ea478a24f0a@kernel.org Signed-off-by: Kees Cook --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/init/Kconfig b/init/Kconfig index c25869cf59c1..b55deae9256c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -153,7 +153,7 @@ config CC_HAS_COUNTED_BY_PTR config CC_HAS_BROKEN_COUNTED_BY_REF bool # https://github.com/llvm/llvm-project/issues/182575 - default y if CC_IS_CLANG && CLANG_VERSION < 220000 + default y if CC_IS_CLANG && CLANG_VERSION < 220100 config CC_HAS_MULTIDIMENSIONAL_NONSTRING def_bool $(success,echo 'char tag[][4] __attribute__((__nonstring__)) = { };' | $(CC) $(CLANG_FLAGS) -x c - -c -o /dev/null -Werror) From eddb98ad9364b4e778768785d46cfab04ce52100 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Fri, 20 Feb 2026 13:43:00 +0900 Subject: [PATCH 147/359] ata: libata-eh: correctly handle deferred qc timeouts A deferred qc may timeout while waiting for the device queue to drain to be submitted. In such case, since the qc is not active, ata_scsi_cmd_error_handler() ends up calling scsi_eh_finish_cmd(), which frees the qc. But as the port deferred_qc field still references this finished/freed qc, the deferred qc work may eventually attempt to call ata_qc_issue() against this invalid qc, leading to errors such as reported by UBSAN (syzbot run): UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24 shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int' ... Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120 ubsan_epilogue+0xa/0x30 lib/ubsan.c:233 __ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494 ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166 ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679 process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275 process_scheduled_works kernel/workqueue.c:3358 [inline] worker_thread+0x5da/0xe40 kernel/workqueue.c:3439 kthread+0x370/0x450 kernel/kthread.c:467 ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 Fix this by checking if the qc of a timed out SCSI command is a deferred one, and in such case, clear the port deferred_qc field and finish the SCSI command with DID_TIME_OUT. Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Damien Le Moal Reviewed-by: Hannes Reinecke Reviewed-by: Igor Pylypiv --- drivers/ata/libata-eh.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 72a22b6c9682..b373cceb95d2 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -640,12 +640,28 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, set_host_byte(scmd, DID_OK); ata_qc_for_each_raw(ap, qc, i) { - if (qc->flags & ATA_QCFLAG_ACTIVE && - qc->scsicmd == scmd) + if (qc->scsicmd != scmd) + continue; + if ((qc->flags & ATA_QCFLAG_ACTIVE) || + qc == ap->deferred_qc) break; } - if (i < ATA_MAX_QUEUE) { + if (qc == ap->deferred_qc) { + /* + * This is a deferred command that timed out while + * waiting for the command queue to drain. Since the qc + * is not active yet (deferred_qc is still set, so the + * deferred qc work has not issued the command yet), + * simply signal the timeout by finishing the SCSI + * command and clear the deferred qc to prevent the + * deferred qc work from issuing this qc. + */ + WARN_ON_ONCE(qc->flags & ATA_QCFLAG_ACTIVE); + ap->deferred_qc = NULL; + set_host_byte(scmd, DID_TIME_OUT); + scsi_eh_finish_cmd(scmd, &ap->eh_done_q); + } else if (i < ATA_MAX_QUEUE) { /* the scmd has an associated qc */ if (!(qc->flags & ATA_QCFLAG_EH)) { /* which hasn't failed yet, timeout */ From 55db009926634b20955bd8abbee921adbc8d2cb4 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Fri, 20 Feb 2026 12:09:12 +0900 Subject: [PATCH 148/359] ata: libata-core: fix cancellation of a port deferred qc work cancel_work_sync() is a sleeping function so it cannot be called with the spin lock of a port being held. Move the call to this function in ata_port_detach() after EH completes, with the port lock released, together with other work cancellation calls. Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Signed-off-by: Damien Le Moal Reviewed-by: Hannes Reinecke Reviewed-by: Igor Pylypiv --- drivers/ata/libata-core.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 11909417f017..ccbf320524da 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -6269,10 +6269,6 @@ static void ata_port_detach(struct ata_port *ap) } } - /* Make sure the deferred qc work finished. */ - cancel_work_sync(&ap->deferred_qc_work); - WARN_ON(ap->deferred_qc); - /* Tell EH to disable all devices */ ap->pflags |= ATA_PFLAG_UNLOADING; ata_port_schedule_eh(ap); @@ -6283,9 +6279,11 @@ static void ata_port_detach(struct ata_port *ap) /* wait till EH commits suicide */ ata_port_wait_eh(ap); - /* it better be dead now */ + /* It better be dead now and not have any remaining deferred qc. */ WARN_ON(!(ap->pflags & ATA_PFLAG_UNLOADED)); + WARN_ON(ap->deferred_qc); + cancel_work_sync(&ap->deferred_qc_work); cancel_delayed_work_sync(&ap->hotplug_task); cancel_delayed_work_sync(&ap->scsi_rescan_task); From 41e09ec73d7431dffda01b1e7208a272a5c90fb9 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 20 Feb 2026 17:18:58 -0800 Subject: [PATCH 149/359] MAINTAINERS: include all of framer under pef2256 The "framer" infrastructure only has one driver - pef2256 and is not covered by any MAINTAINERS entry of its own. This leads to author not being CCed on patches. Let's include all of framer/ under the pef2256 entry. We can split it in the very unlikely event of another driver appearing. Link: https://lore.kernel.org/aZefB5f3EAkQQM1m@google.com Acked-by: Herve Codina Link: https://patch.msgid.link/20260221011858.3403605-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index c00b5f5d4203..e1a6c6cc2b7c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14410,9 +14410,9 @@ LANTIQ PEF2256 DRIVER M: Herve Codina S: Maintained F: Documentation/devicetree/bindings/net/lantiq,pef2256.yaml -F: drivers/net/wan/framer/pef2256/ +F: drivers/net/wan/framer/ F: drivers/pinctrl/pinctrl-pef2256.c -F: include/linux/framer/pef2256.h +F: include/linux/framer/ LASI 53c700 driver for PARISC M: "James E.J. Bottomley" From 8a8a9fac9efa6423fd74938b940cb7d731780718 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 20 Feb 2026 22:26:05 +0000 Subject: [PATCH 150/359] net: do not pass flow_id to set_rps_cpu() Blamed commit made the assumption that the RPS table for each receive queue would have the same size, and that it would not change. Compute flow_id in set_rps_cpu(), do not assume we can use the value computed by get_rps_cpu(). Otherwise we risk out-of-bound access and/or crashes. Fixes: 48aa30443e52 ("net: Cache hash and flow_id to avoid recalculation") Signed-off-by: Eric Dumazet Cc: Krishna Kumar Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260220222605.3468081-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/core/dev.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 096b3ff13f6b..f3426385f1ba 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4992,8 +4992,7 @@ static bool rps_flow_is_active(struct rps_dev_flow *rflow, static struct rps_dev_flow * set_rps_cpu(struct net_device *dev, struct sk_buff *skb, - struct rps_dev_flow *rflow, u16 next_cpu, u32 hash, - u32 flow_id) + struct rps_dev_flow *rflow, u16 next_cpu, u32 hash) { if (next_cpu < nr_cpu_ids) { u32 head; @@ -5004,6 +5003,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb, struct rps_dev_flow *tmp_rflow; unsigned int tmp_cpu; u16 rxq_index; + u32 flow_id; int rc; /* Should we steer this flow to a different hardware queue? */ @@ -5019,6 +5019,7 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb, if (!flow_table) goto out; + flow_id = rfs_slot(hash, flow_table); tmp_rflow = &flow_table->flows[flow_id]; tmp_cpu = READ_ONCE(tmp_rflow->cpu); @@ -5066,7 +5067,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, struct rps_dev_flow_table *flow_table; struct rps_map *map; int cpu = -1; - u32 flow_id; u32 tcpu; u32 hash; @@ -5113,8 +5113,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, /* OK, now we know there is a match, * we can look at the local (per receive queue) flow table */ - flow_id = rfs_slot(hash, flow_table); - rflow = &flow_table->flows[flow_id]; + rflow = &flow_table->flows[rfs_slot(hash, flow_table)]; tcpu = rflow->cpu; /* @@ -5133,8 +5132,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, ((int)(READ_ONCE(per_cpu(softnet_data, tcpu).input_queue_head) - rflow->last_qtail)) >= 0)) { tcpu = next_cpu; - rflow = set_rps_cpu(dev, skb, rflow, next_cpu, hash, - flow_id); + rflow = set_rps_cpu(dev, skb, rflow, next_cpu, hash); } if (tcpu < nr_cpu_ids && cpu_online(tcpu)) { From 7bb09315f93dce6acc54bf59e5a95ba7365c2be4 Mon Sep 17 00:00:00 2001 From: Hyunwoo Kim Date: Fri, 20 Feb 2026 18:40:36 +0900 Subject: [PATCH 151/359] tls: Fix race condition in tls_sw_cancel_work_tx() This issue was discovered during a code audit. After cancel_delayed_work_sync() is called from tls_sk_proto_close(), tx_work_handler() can still be scheduled from paths such as the Delayed ACK handler or ksoftirqd. As a result, the tx_work_handler() worker may dereference a freed TLS object. The following is a simple race scenario: cpu0 cpu1 tls_sk_proto_close() tls_sw_cancel_work_tx() tls_write_space() tls_sw_write_space() if (!test_and_set_bit(BIT_TX_SCHEDULED, &tx_ctx->tx_bitmask)) set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask); cancel_delayed_work_sync(&ctx->tx_work.work); schedule_delayed_work(&tx_ctx->tx_work.work, 0); To prevent this race condition, cancel_delayed_work_sync() is replaced with disable_delayed_work_sync(). Fixes: f87e62d45e51 ("net/tls: remove close callback sock unlock/lock around TX work flush") Signed-off-by: Hyunwoo Kim Reviewed-by: Simon Horman Reviewed-by: Sabrina Dubroca Link: https://patch.msgid.link/aZgsFO6nfylfvLE7@v4bel Signed-off-by: Jakub Kicinski --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 9937d4c810f2..b1fa62de9dab 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -2533,7 +2533,7 @@ void tls_sw_cancel_work_tx(struct tls_context *tls_ctx) set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask); set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask); - cancel_delayed_work_sync(&ctx->tx_work.work); + disable_delayed_work_sync(&ctx->tx_work.work); } void tls_sw_release_resources_tx(struct sock *sk) From fb868db5f4bccd7a78219313ab2917429f715cea Mon Sep 17 00:00:00 2001 From: Ankit Garg Date: Fri, 20 Feb 2026 13:53:24 -0800 Subject: [PATCH 152/359] gve: fix incorrect buffer cleanup in gve_tx_clean_pending_packets for QPL In DQ-QPL mode, gve_tx_clean_pending_packets() incorrectly uses the RDA buffer cleanup path. It iterates num_bufs times and attempts to unmap entries in the dma array. This leads to two issues: 1. The dma array shares storage with tx_qpl_buf_ids (union). Interpreting buffer IDs as DMA addresses results in attempting to unmap incorrect memory locations. 2. num_bufs in QPL mode (counting 2K chunks) can significantly exceed the size of the dma array, causing out-of-bounds access warnings (trace below is how we noticed this issue). UBSAN: array-index-out-of-bounds in drivers/net/ethernet/drivers/net/ethernet/google/gve/gve_tx_dqo.c:178:5 index 18 is out of range for type 'dma_addr_t[18]' (aka 'unsigned long long[18]') Workqueue: gve gve_service_task [gve] Call Trace: dump_stack_lvl+0x33/0xa0 __ubsan_handle_out_of_bounds+0xdc/0x110 gve_tx_stop_ring_dqo+0x182/0x200 [gve] gve_close+0x1be/0x450 [gve] gve_reset+0x99/0x120 [gve] gve_service_task+0x61/0x100 [gve] process_scheduled_works+0x1e9/0x380 Fix this by properly checking for QPL mode and delegating to gve_free_tx_qpl_bufs() to reclaim the buffers. Cc: stable@vger.kernel.org Fixes: a6fb8d5a8b69 ("gve: Tx path for DQO-QPL") Signed-off-by: Ankit Garg Reviewed-by: Jordan Rhee Reviewed-by: Harshitha Ramamurthy Signed-off-by: Joshua Washington Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260220215324.1631350-1-joshwash@google.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/google/gve/gve_tx_dqo.c | 54 +++++++++----------- 1 file changed, 24 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c index 28e85730f785..b57e8f13cb51 100644 --- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c @@ -167,6 +167,25 @@ gve_free_pending_packet(struct gve_tx_ring *tx, } } +static void gve_unmap_packet(struct device *dev, + struct gve_tx_pending_packet_dqo *pkt) +{ + int i; + + if (!pkt->num_bufs) + return; + + /* SKB linear portion is guaranteed to be mapped */ + dma_unmap_single(dev, dma_unmap_addr(pkt, dma[0]), + dma_unmap_len(pkt, len[0]), DMA_TO_DEVICE); + for (i = 1; i < pkt->num_bufs; i++) { + netmem_dma_unmap_page_attrs(dev, dma_unmap_addr(pkt, dma[i]), + dma_unmap_len(pkt, len[i]), + DMA_TO_DEVICE, 0); + } + pkt->num_bufs = 0; +} + /* gve_tx_free_desc - Cleans up all pending tx requests and buffers. */ static void gve_tx_clean_pending_packets(struct gve_tx_ring *tx) @@ -176,21 +195,12 @@ static void gve_tx_clean_pending_packets(struct gve_tx_ring *tx) for (i = 0; i < tx->dqo.num_pending_packets; i++) { struct gve_tx_pending_packet_dqo *cur_state = &tx->dqo.pending_packets[i]; - int j; - for (j = 0; j < cur_state->num_bufs; j++) { - if (j == 0) { - dma_unmap_single(tx->dev, - dma_unmap_addr(cur_state, dma[j]), - dma_unmap_len(cur_state, len[j]), - DMA_TO_DEVICE); - } else { - dma_unmap_page(tx->dev, - dma_unmap_addr(cur_state, dma[j]), - dma_unmap_len(cur_state, len[j]), - DMA_TO_DEVICE); - } - } + if (tx->dqo.qpl) + gve_free_tx_qpl_bufs(tx, cur_state); + else + gve_unmap_packet(tx->dev, cur_state); + if (cur_state->skb) { dev_consume_skb_any(cur_state->skb); cur_state->skb = NULL; @@ -1157,22 +1167,6 @@ static void remove_from_list(struct gve_tx_ring *tx, } } -static void gve_unmap_packet(struct device *dev, - struct gve_tx_pending_packet_dqo *pkt) -{ - int i; - - /* SKB linear portion is guaranteed to be mapped */ - dma_unmap_single(dev, dma_unmap_addr(pkt, dma[0]), - dma_unmap_len(pkt, len[0]), DMA_TO_DEVICE); - for (i = 1; i < pkt->num_bufs; i++) { - netmem_dma_unmap_page_attrs(dev, dma_unmap_addr(pkt, dma[i]), - dma_unmap_len(pkt, len[i]), - DMA_TO_DEVICE, 0); - } - pkt->num_bufs = 0; -} - /* Completion types and expected behavior: * No Miss compl + Packet compl = Packet completed normally. * Miss compl + Re-inject compl = Packet completed normally. From ca220141fa8ebae09765a242076b2b77338106b0 Mon Sep 17 00:00:00 2001 From: Jiayuan Chen Date: Thu, 19 Feb 2026 09:42:51 +0800 Subject: [PATCH 153/359] kcm: fix zero-frag skb in frag_list on partial sendmsg error Syzkaller reported a warning in kcm_write_msgs() when processing a message with a zero-fragment skb in the frag_list. When kcm_sendmsg() fills MAX_SKB_FRAGS fragments in the current skb, it allocates a new skb (tskb) and links it into the frag_list before copying data. If the copy subsequently fails (e.g. -EFAULT from user memory), tskb remains in the frag_list with zero fragments: head skb (msg being assembled, NOT yet in sk_write_queue) +-----------+ | frags[17] | (MAX_SKB_FRAGS, all filled with data) | frag_list-+--> tskb +-----------+ +----------+ | frags[0] | (empty! copy failed before filling) +----------+ For SOCK_SEQPACKET with partial data already copied, the error path saves this message via partial_message for later completion. For SOCK_SEQPACKET, sock_write_iter() automatically sets MSG_EOR, so a subsequent zero-length write(fd, NULL, 0) completes the message and queues it to sk_write_queue. kcm_write_msgs() then walks the frag_list and hits: WARN_ON(!skb_shinfo(skb)->nr_frags) TCP has a similar pattern where skbs are enqueued before data copy and cleaned up on failure via tcp_remove_empty_skb(). KCM was missing the equivalent cleanup. Fix this by tracking the predecessor skb (frag_prev) when allocating a new frag_list entry. On error, if the tail skb has zero frags, use frag_prev to unlink and free it in O(1) without walking the singly-linked frag_list. frag_prev is safe to dereference because the entire message chain is only held locally (or in kcm->seq_skb) and is not added to sk_write_queue until MSG_EOR, so the send path cannot free it underneath us. Also change the WARN_ON to WARN_ON_ONCE to avoid flooding the log if the condition is somehow hit repeatedly. There are currently no KCM selftests in the kernel tree; a simple reproducer is available at [1]. [1] https://gist.github.com/mrpre/a94d431c757e8d6f168f4dd1a3749daa Reported-by: syzbot+52624bdfbf2746d37d70@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000269a1405a12fdc77@google.com/T/ Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Jiayuan Chen Link: https://patch.msgid.link/20260219014256.370092-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski --- net/kcm/kcmsock.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 5dd7e0509a48..3912e75079f5 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -628,7 +628,7 @@ retry: skb = txm->frag_skb; } - if (WARN_ON(!skb_shinfo(skb)->nr_frags) || + if (WARN_ON_ONCE(!skb_shinfo(skb)->nr_frags) || WARN_ON_ONCE(!skb_frag_page(&skb_shinfo(skb)->frags[0]))) { ret = -EINVAL; goto out; @@ -749,7 +749,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { struct sock *sk = sock->sk; struct kcm_sock *kcm = kcm_sk(sk); - struct sk_buff *skb = NULL, *head = NULL; + struct sk_buff *skb = NULL, *head = NULL, *frag_prev = NULL; size_t copy, copied = 0; long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); int eor = (sock->type == SOCK_DGRAM) ? @@ -824,6 +824,7 @@ start: else skb->next = tskb; + frag_prev = skb; skb = tskb; skb->ip_summed = CHECKSUM_UNNECESSARY; continue; @@ -933,6 +934,22 @@ partial_message: out_error: kcm_push(kcm); + /* When MAX_SKB_FRAGS was reached, a new skb was allocated and + * linked into the frag_list before data copy. If the copy + * subsequently failed, this skb has zero frags. Remove it from + * the frag_list to prevent kcm_write_msgs from later hitting + * WARN_ON(!skb_shinfo(skb)->nr_frags). + */ + if (frag_prev && !skb_shinfo(skb)->nr_frags) { + if (head == frag_prev) + skb_shinfo(head)->frag_list = NULL; + else + frag_prev->next = NULL; + kfree_skb(skb); + /* Update skb as it may be saved in partial_message via goto */ + skb = frag_prev; + } + if (sock->type == SOCK_SEQPACKET) { /* Wrote some bytes before encountering an * error, return partial success. From 4cfe066a82cdf9e83e48b16000f55280efc98325 Mon Sep 17 00:00:00 2001 From: Ivan Vecera Date: Fri, 20 Feb 2026 16:57:54 +0100 Subject: [PATCH 154/359] dpll: zl3073x: fix REF_PHASE_OFFSET_COMP register width for some chip IDs The REF_PHASE_OFFSET_COMP register is 48-bit wide on most zl3073x chip variants, but only 32-bit wide on chip IDs 0x0E30, 0x0E93..0x0E97 and 0x1F60. The driver unconditionally uses 48-bit read/write operations, which on 32-bit variants causes reading 2 bytes past the register boundary (corrupting the value) and writing 2 bytes into the adjacent register. Fix this by storing the chip ID in the device structure during probe and adding a helper to detect the affected variants. Use the correct register width for read/write operations and the matching sign extension bit (31 vs 47) when interpreting the phase compensation value. Fixes: 6287262f761e ("dpll: zl3073x: Add support to adjust phase") Signed-off-by: Ivan Vecera Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260220155755.448185-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski --- drivers/dpll/zl3073x/core.c | 1 + drivers/dpll/zl3073x/core.h | 28 ++++++++++++++++++++++++++++ drivers/dpll/zl3073x/dpll.c | 7 +++++-- drivers/dpll/zl3073x/ref.c | 25 ++++++++++++++++++++----- drivers/dpll/zl3073x/regs.h | 1 + 5 files changed, 55 insertions(+), 7 deletions(-) diff --git a/drivers/dpll/zl3073x/core.c b/drivers/dpll/zl3073x/core.c index 63bd97181b9e..8c5ee28e02f4 100644 --- a/drivers/dpll/zl3073x/core.c +++ b/drivers/dpll/zl3073x/core.c @@ -1026,6 +1026,7 @@ int zl3073x_dev_probe(struct zl3073x_dev *zldev, "Unknown or non-match chip ID: 0x%0x\n", id); } + zldev->chip_id = id; /* Read revision, firmware version and custom config version */ rc = zl3073x_read_u16(zldev, ZL_REG_REVISION, &revision); diff --git a/drivers/dpll/zl3073x/core.h b/drivers/dpll/zl3073x/core.h index dddfcacea5c0..fd2af3c62a7d 100644 --- a/drivers/dpll/zl3073x/core.h +++ b/drivers/dpll/zl3073x/core.h @@ -35,6 +35,7 @@ struct zl3073x_dpll; * @dev: pointer to device * @regmap: regmap to access device registers * @multiop_lock: to serialize multiple register operations + * @chip_id: chip ID read from hardware * @ref: array of input references' invariants * @out: array of outs' invariants * @synth: array of synths' invariants @@ -48,6 +49,7 @@ struct zl3073x_dev { struct device *dev; struct regmap *regmap; struct mutex multiop_lock; + u16 chip_id; /* Invariants */ struct zl3073x_ref ref[ZL3073X_NUM_REFS]; @@ -144,6 +146,32 @@ int zl3073x_write_hwreg_seq(struct zl3073x_dev *zldev, int zl3073x_ref_phase_offsets_update(struct zl3073x_dev *zldev, int channel); +/** + * zl3073x_dev_is_ref_phase_comp_32bit - check ref phase comp register size + * @zldev: pointer to zl3073x device + * + * Some chip IDs have a 32-bit wide ref_phase_offset_comp register instead + * of the default 48-bit. + * + * Return: true if the register is 32-bit, false if 48-bit + */ +static inline bool +zl3073x_dev_is_ref_phase_comp_32bit(struct zl3073x_dev *zldev) +{ + switch (zldev->chip_id) { + case 0x0E30: + case 0x0E93: + case 0x0E94: + case 0x0E95: + case 0x0E96: + case 0x0E97: + case 0x1F60: + return true; + default: + return false; + } +} + static inline bool zl3073x_is_n_pin(u8 id) { diff --git a/drivers/dpll/zl3073x/dpll.c b/drivers/dpll/zl3073x/dpll.c index 78edc36b17fb..8ffbede117c6 100644 --- a/drivers/dpll/zl3073x/dpll.c +++ b/drivers/dpll/zl3073x/dpll.c @@ -475,8 +475,11 @@ zl3073x_dpll_input_pin_phase_adjust_get(const struct dpll_pin *dpll_pin, ref_id = zl3073x_input_pin_ref_get(pin->id); ref = zl3073x_ref_state_get(zldev, ref_id); - /* Perform sign extension for 48bit signed value */ - phase_comp = sign_extend64(ref->phase_comp, 47); + /* Perform sign extension based on register width */ + if (zl3073x_dev_is_ref_phase_comp_32bit(zldev)) + phase_comp = sign_extend64(ref->phase_comp, 31); + else + phase_comp = sign_extend64(ref->phase_comp, 47); /* Reverse two's complement negation applied during set and convert * to 32bit signed int diff --git a/drivers/dpll/zl3073x/ref.c b/drivers/dpll/zl3073x/ref.c index aa2de13effa8..6b65e6103999 100644 --- a/drivers/dpll/zl3073x/ref.c +++ b/drivers/dpll/zl3073x/ref.c @@ -121,8 +121,16 @@ int zl3073x_ref_state_fetch(struct zl3073x_dev *zldev, u8 index) return rc; /* Read phase compensation register */ - rc = zl3073x_read_u48(zldev, ZL_REG_REF_PHASE_OFFSET_COMP, - &ref->phase_comp); + if (zl3073x_dev_is_ref_phase_comp_32bit(zldev)) { + u32 val; + + rc = zl3073x_read_u32(zldev, ZL_REG_REF_PHASE_OFFSET_COMP_32, + &val); + ref->phase_comp = val; + } else { + rc = zl3073x_read_u48(zldev, ZL_REG_REF_PHASE_OFFSET_COMP, + &ref->phase_comp); + } if (rc) return rc; @@ -179,9 +187,16 @@ int zl3073x_ref_state_set(struct zl3073x_dev *zldev, u8 index, if (!rc && dref->sync_ctrl != ref->sync_ctrl) rc = zl3073x_write_u8(zldev, ZL_REG_REF_SYNC_CTRL, ref->sync_ctrl); - if (!rc && dref->phase_comp != ref->phase_comp) - rc = zl3073x_write_u48(zldev, ZL_REG_REF_PHASE_OFFSET_COMP, - ref->phase_comp); + if (!rc && dref->phase_comp != ref->phase_comp) { + if (zl3073x_dev_is_ref_phase_comp_32bit(zldev)) + rc = zl3073x_write_u32(zldev, + ZL_REG_REF_PHASE_OFFSET_COMP_32, + ref->phase_comp); + else + rc = zl3073x_write_u48(zldev, + ZL_REG_REF_PHASE_OFFSET_COMP, + ref->phase_comp); + } if (rc) return rc; diff --git a/drivers/dpll/zl3073x/regs.h b/drivers/dpll/zl3073x/regs.h index d837bee72b17..5573d7188406 100644 --- a/drivers/dpll/zl3073x/regs.h +++ b/drivers/dpll/zl3073x/regs.h @@ -194,6 +194,7 @@ #define ZL_REF_CONFIG_DIFF_EN BIT(2) #define ZL_REG_REF_PHASE_OFFSET_COMP ZL_REG(10, 0x28, 6) +#define ZL_REG_REF_PHASE_OFFSET_COMP_32 ZL_REG(10, 0x28, 4) #define ZL_REG_REF_SYNC_CTRL ZL_REG(10, 0x2e, 1) #define ZL_REF_SYNC_CTRL_MODE GENMASK(2, 0) From 3aa677625c8fad39989496c51bcff3872c1f16f1 Mon Sep 17 00:00:00 2001 From: Tung Nguyen Date: Fri, 20 Feb 2026 05:05:41 +0000 Subject: [PATCH 155/359] tipc: fix duplicate publication key in tipc_service_insert_publ() TIPC uses named table to store TIPC services represented by type and instance. Each time an application calls TIPC API bind() to bind a type/instance to a socket, an entry is created and inserted into the named table. It looks like this: named table: key1, entry1 (type, instance ...) key2, entry2 (type, instance ...) In the above table, each entry represents a route for sending data from one socket to the other. For all publications originated from the same node, the key is UNIQUE to identify each entry. It is calculated by this formula: key = socket portid + number of bindings + 1 (1) where: - socket portid: unique and calculated by using linux kernel function get_random_u32_below(). So, the value is randomized. - number of bindings: the number of times a type/instance pair is bound to a socket. This number is linearly increased, starting from 0. While the socket portid is unique and randomized by linux kernel, the linear increment of "number of bindings" in formula (1) makes "key" not unique anymore. For example: - Socket 1 is created with its associated port number 20062001. Type 1000, instance 1 is bound to socket 1: key1: 20062001 + 0 + 1 = 20062002 Then, bind() is called a second time on Socket 1 to by the same type 1000, instance 1: key2: 20062001 + 1 + 1 = 20062003 Named table: key1 (20062002), entry1 (1000, 1 ...) key2 (20062003), entry2 (1000, 1 ...) - Socket 2 is created with its associated port number 20062002. Type 1000, instance 1 is bound to socket 2: key3: 20062002 + 0 + 1 = 20062003 TIPC looks up the named table and finds out that key2 with the same value already exists and rejects the insertion into the named table. This leads to failure of bind() call from application on Socket 2 with error message EINVAL "Invalid argument". This commit fixes this issue by adding more port id checking to make sure that the key is unique to publications originated from the same port id and node. Fixes: 218527fe27ad ("tipc: replace name table service range array with rb tree") Signed-off-by: Tung Nguyen Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260220050541.237962-1-tung.quang.nguyen@est.tech Signed-off-by: Jakub Kicinski --- net/tipc/name_table.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c index e74940eab3a4..7f42fb6a8481 100644 --- a/net/tipc/name_table.c +++ b/net/tipc/name_table.c @@ -348,7 +348,8 @@ static bool tipc_service_insert_publ(struct net *net, /* Return if the publication already exists */ list_for_each_entry(_p, &sr->all_publ, all_publ) { - if (_p->key == key && (!_p->sk.node || _p->sk.node == node)) { + if (_p->key == key && _p->sk.ref == p->sk.ref && + (!_p->sk.node || _p->sk.node == node)) { pr_debug("Failed to bind duplicate %u,%u,%u/%u:%u/%u\n", p->sr.type, p->sr.lower, p->sr.upper, node, p->sk.ref, key); @@ -388,7 +389,8 @@ static struct publication *tipc_service_remove_publ(struct service_range *r, u32 node = sk->node; list_for_each_entry(p, &r->all_publ, all_publ) { - if (p->key != key || (node && node != p->sk.node)) + if (p->key != key || p->sk.ref != sk->ref || + (node && node != p->sk.node)) continue; list_del(&p->all_publ); list_del(&p->local_publ); From 63c49efc987afefc6b9bb7de083eb8748e0b1789 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:17 -0800 Subject: [PATCH 156/359] selftests/bpf: Add simple strscpy() implementation Replace bpf_strlcpy() in bpf_util.h with a sized_strscpy(), which is a simplified sized_strscpy() from the kernel (lib/string.c [1]). It: * takes a count (destination size) parameter * guarantees NULL-termination * returns the number of characters copied or -E2BIG Re-define strscpy macro similar to in-kernel implementation [2]: allow the count parameter to be optional. Add #ifdef-s to tools/include/linux/args.h, as they may be defined in other system headers (for example, __CONCAT in sys/cdefs.h). Fixup the single existing bpf_strlcpy() call in cgroup_helpers.c [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?h=v6.19#n113 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/string.h?h=v6.19#n91 Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-2-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/include/linux/args.h | 4 ++ tools/testing/selftests/bpf/bpf_util.h | 45 ++++++++++++++------ tools/testing/selftests/bpf/cgroup_helpers.c | 2 +- 3 files changed, 37 insertions(+), 14 deletions(-) diff --git a/tools/include/linux/args.h b/tools/include/linux/args.h index 2e8e65d975c7..14b268f2389a 100644 --- a/tools/include/linux/args.h +++ b/tools/include/linux/args.h @@ -22,7 +22,11 @@ #define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0) /* Concatenate two parameters, but allow them to be expanded beforehand. */ +#ifndef __CONCAT #define __CONCAT(a, b) a ## b +#endif +#ifndef CONCATENATE #define CONCATENATE(a, b) __CONCAT(a, b) +#endif #endif /* _LINUX_ARGS_H */ diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h index 4bc2d25f33e1..6cb56501a505 100644 --- a/tools/testing/selftests/bpf/bpf_util.h +++ b/tools/testing/selftests/bpf/bpf_util.h @@ -8,6 +8,7 @@ #include #include #include /* libbpf_num_possible_cpus */ +#include static inline unsigned int bpf_num_possible_cpus(void) { @@ -21,25 +22,43 @@ static inline unsigned int bpf_num_possible_cpus(void) return possible_cpus; } -/* Copy up to sz - 1 bytes from zero-terminated src string and ensure that dst - * is zero-terminated string no matter what (unless sz == 0, in which case - * it's a no-op). It's conceptually close to FreeBSD's strlcpy(), but differs - * in what is returned. Given this is internal helper, it's trivial to extend - * this, when necessary. Use this instead of strncpy inside libbpf source code. +/* + * Simplified strscpy() implementation. The kernel one is in lib/string.c */ -static inline void bpf_strlcpy(char *dst, const char *src, size_t sz) +static inline ssize_t sized_strscpy(char *dest, const char *src, size_t count) { - size_t i; + long res = 0; - if (sz == 0) - return; + if (count == 0) + return -E2BIG; - sz--; - for (i = 0; i < sz && src[i]; i++) - dst[i] = src[i]; - dst[i] = '\0'; + while (count > 1) { + char c; + + c = src[res]; + dest[res] = c; + if (!c) + return res; + res++; + count--; + } + + /* Force NUL-termination. */ + dest[res] = '\0'; + + /* Return E2BIG if the source didn't stop */ + return src[res] ? -E2BIG : res; } +#define __strscpy0(dst, src, ...) \ + sized_strscpy(dst, src, sizeof(dst)) +#define __strscpy1(dst, src, size) \ + sized_strscpy(dst, src, size) + +#undef strscpy /* Redefine the placeholder from tools/include/linux/string.h */ +#define strscpy(dst, src, ...) \ + CONCATENATE(__strscpy, COUNT_ARGS(__VA_ARGS__))(dst, src, __VA_ARGS__) + #define __bpf_percpu_val_align __attribute__((__aligned__(8))) #define BPF_DECLARE_PERCPU(type, name) \ diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c index 20cede4db3ce..45cd0b479fe3 100644 --- a/tools/testing/selftests/bpf/cgroup_helpers.c +++ b/tools/testing/selftests/bpf/cgroup_helpers.c @@ -86,7 +86,7 @@ static int __enable_controllers(const char *cgroup_path, const char *controllers enable[len] = 0; close(fd); } else { - bpf_strlcpy(enable, controllers, sizeof(enable)); + strscpy(enable, controllers); } snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path); From 6a087ac23f5b6619033ad08e60c355b00bb8ce51 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:18 -0800 Subject: [PATCH 157/359] selftests/bpf: Replace strcpy() calls with strscpy() strcpy() does not perform bounds checking and is considered deprecated [1]. Replace strcpy() calls with strscpy() defined in bpf_util.h. [1] https://docs.kernel.org/process/deprecated.html#strcpy Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-3-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/network_helpers.c | 2 +- tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +- tools/testing/selftests/bpf/prog_tests/setget_sockopt.c | 2 +- tools/testing/selftests/bpf/prog_tests/sockopt_sk.c | 2 +- tools/testing/selftests/bpf/prog_tests/test_veristat.c | 4 ++-- tools/testing/selftests/bpf/xdp_features.c | 3 ++- 6 files changed, 8 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index 0a6a5561bed3..5374b7e16d53 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -432,7 +432,7 @@ int make_sockaddr(int family, const char *addr_str, __u16 port, memset(addr, 0, sizeof(*sun)); sun->sun_family = family; sun->sun_path[0] = 0; - strcpy(sun->sun_path + 1, addr_str); + strscpy(sun->sun_path + 1, addr_str, sizeof(sun->sun_path) - 1); if (len) *len = offsetof(struct sockaddr_un, sun_path) + 1 + strlen(addr_str); return 0; diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c index b7d1b52309d0..f829b6f09bc9 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c @@ -281,7 +281,7 @@ static void test_dctcp_fallback(void) dctcp_skel = bpf_dctcp__open(); if (!ASSERT_OK_PTR(dctcp_skel, "dctcp_skel")) return; - strcpy(dctcp_skel->rodata->fallback_cc, "cubic"); + strscpy(dctcp_skel->rodata->fallback_cc, "cubic"); if (!ASSERT_OK(bpf_dctcp__load(dctcp_skel), "bpf_dctcp__load")) goto done; diff --git a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c index e4dac529d424..77fe1bfb7504 100644 --- a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c +++ b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c @@ -212,7 +212,7 @@ void test_setget_sockopt(void) if (!ASSERT_OK_PTR(skel, "open skel")) goto done; - strcpy(skel->rodata->veth, "binddevtest1"); + strscpy(skel->rodata->veth, "binddevtest1"); skel->rodata->veth_ifindex = if_nametoindex("binddevtest1"); if (!ASSERT_GT(skel->rodata->veth_ifindex, 0, "if_nametoindex")) goto done; diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c index ba6b3ec1156a..53637431ec5d 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c @@ -142,7 +142,7 @@ static int getsetsockopt(void) /* TCP_CONGESTION can extend the string */ - strcpy(buf.cc, "nv"); + strscpy(buf.cc, "nv"); err = setsockopt(fd, SOL_TCP, TCP_CONGESTION, &buf, strlen("nv")); if (err) { log_err("Failed to call setsockopt(TCP_CONGESTION)"); diff --git a/tools/testing/selftests/bpf/prog_tests/test_veristat.c b/tools/testing/selftests/bpf/prog_tests/test_veristat.c index b38c16b4247f..9aff08ac55c0 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_veristat.c +++ b/tools/testing/selftests/bpf/prog_tests/test_veristat.c @@ -24,9 +24,9 @@ static struct fixture *init_fixture(void) /* for no_alu32 and cpuv4 veristat is in parent folder */ if (access("./veristat", F_OK) == 0) - strcpy(fix->veristat, "./veristat"); + strscpy(fix->veristat, "./veristat"); else if (access("../veristat", F_OK) == 0) - strcpy(fix->veristat, "../veristat"); + strscpy(fix->veristat, "../veristat"); else PRINT_FAIL("Can't find veristat binary"); diff --git a/tools/testing/selftests/bpf/xdp_features.c b/tools/testing/selftests/bpf/xdp_features.c index 595c79141cf3..a27ed663967c 100644 --- a/tools/testing/selftests/bpf/xdp_features.c +++ b/tools/testing/selftests/bpf/xdp_features.c @@ -16,6 +16,7 @@ #include +#include "bpf_util.h" #include "xdp_features.skel.h" #include "xdp_features.h" @@ -212,7 +213,7 @@ static void set_env_default(void) env.feature.drv_feature = NETDEV_XDP_ACT_NDO_XMIT; env.feature.action = -EINVAL; env.ifindex = -ENODEV; - strcpy(env.ifname, "unknown"); + strscpy(env.ifname, "unknown"); make_sockaddr(AF_INET6, "::ffff:127.0.0.1", DUT_CTRL_PORT, &env.dut_ctrl_addr, NULL); make_sockaddr(AF_INET6, "::ffff:127.0.0.1", DUT_ECHO_PORT, From 9af0feae8016ba58ad7ff784a903404986b395b1 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 27 Jan 2026 10:38:39 +0100 Subject: [PATCH 158/359] RDMA/core: Fix stale RoCE GIDs during netdev events at registration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit RoCE GID entries become stale when netdev properties change during the IB device registration window. This is reproducible with a udev rule that sets a MAC address when a VF netdev appears: ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth4", \ RUN+="/sbin/ip link set eth4 address 88:22:33:44:55:66" After VF creation, show_gids displays GIDs derived from the original random MAC rather than the configured one. The root cause is a race between netdev event processing and device registration: CPU 0 (driver) CPU 1 (udev/workqueue) ────────────── ────────────────────── ib_register_device() ib_cache_setup_one() gid_table_setup_one() _gid_table_setup_one() ← GID table allocated rdma_roce_rescan_device() ← GIDs populated with OLD MAC ip link set eth4 addr NEW_MAC NETDEV_CHANGEADDR queued netdevice_event_work_handler() ib_enum_all_roce_netdevs() ← Iterates DEVICE_REGISTERED ← Device NOT marked yet, SKIP! enable_device_and_get() xa_set_mark(DEVICE_REGISTERED) ← Too late, event was lost The netdev event handler uses ib_enum_all_roce_netdevs() which only iterates devices marked DEVICE_REGISTERED. However, this mark is set late in the registration process, after the GID cache is already populated. Events arriving in this window are silently dropped. Fix this by introducing a new xarray mark DEVICE_GID_UPDATES that is set immediately after the GID table is allocated and initialized. Use the new mark in ib_enum_all_roce_netdevs() function to iterate devices instead of DEVICE_REGISTERED. This is safe because: - After _gid_table_setup_one(), all required structures exist (port_data, immutable, cache.gid) - The GID table mutex serializes concurrent access between the initial rescan and event handlers - Event handlers correctly update stale GIDs even when racing with rescan - The mark is cleared in ib_cache_cleanup_one() before teardown This also fixes similar races for IP address events (inetaddr_event, inet6addr_event) which use the same enumeration path. Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data") Signed-off-by: Jiri Pirko Link: https://patch.msgid.link/20260127093839.126291-1-jiri@resnulli.us Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/cache.c | 13 +++++++++++ drivers/infiniband/core/core_priv.h | 3 +++ drivers/infiniband/core/device.c | 34 ++++++++++++++++++++++++++++- 3 files changed, 49 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index b415d4aad04f..ee4a2bc68fb2 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -926,6 +926,13 @@ static int gid_table_setup_one(struct ib_device *ib_dev) if (err) return err; + /* + * Mark the device as ready for GID cache updates. This allows netdev + * event handlers to update the GID cache even before the device is + * fully registered. + */ + ib_device_enable_gid_updates(ib_dev); + rdma_roce_rescan_device(ib_dev); return err; @@ -1637,6 +1644,12 @@ void ib_cache_release_one(struct ib_device *device) void ib_cache_cleanup_one(struct ib_device *device) { + /* + * Clear the GID updates mark first to prevent event handlers from + * accessing the device while it's being torn down. + */ + ib_device_disable_gid_updates(device); + /* The cleanup function waits for all in-progress workqueue * elements and cleans up the GID cache. This function should be * called after the device was removed from the devices list and diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index 05102769a918..a2c36666e6fc 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -100,6 +100,9 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter, roce_netdev_callback cb, void *cookie); +void ib_device_enable_gid_updates(struct ib_device *device); +void ib_device_disable_gid_updates(struct ib_device *device); + typedef int (*nldev_callback)(struct ib_device *device, struct sk_buff *skb, struct netlink_callback *cb, diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 1b5f1ee0a557..558b73940d66 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -93,6 +93,7 @@ static struct workqueue_struct *ib_unreg_wq; static DEFINE_XARRAY_FLAGS(devices, XA_FLAGS_ALLOC); static DECLARE_RWSEM(devices_rwsem); #define DEVICE_REGISTERED XA_MARK_1 +#define DEVICE_GID_UPDATES XA_MARK_2 static u32 highest_client_id; #define CLIENT_REGISTERED XA_MARK_1 @@ -2412,11 +2413,42 @@ void ib_enum_all_roce_netdevs(roce_netdev_filter filter, unsigned long index; down_read(&devices_rwsem); - xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) + xa_for_each_marked(&devices, index, dev, DEVICE_GID_UPDATES) ib_enum_roce_netdev(dev, filter, filter_cookie, cb, cookie); up_read(&devices_rwsem); } +/** + * ib_device_enable_gid_updates - Mark device as ready for GID cache updates + * @device: Device to mark + * + * Called after GID table is allocated and initialized. After this mark is set, + * netdevice event handlers can update the device's GID cache. This allows + * events that arrive during device registration to be processed, avoiding + * stale GID entries when netdev properties change during the device + * registration process. + */ +void ib_device_enable_gid_updates(struct ib_device *device) +{ + down_write(&devices_rwsem); + xa_set_mark(&devices, device->index, DEVICE_GID_UPDATES); + up_write(&devices_rwsem); +} + +/** + * ib_device_disable_gid_updates - Clear the GID updates mark + * @device: Device to unmark + * + * Called before GID table cleanup to prevent event handlers from accessing + * the device while it's being torn down. + */ +void ib_device_disable_gid_updates(struct ib_device *device) +{ + down_write(&devices_rwsem); + xa_clear_mark(&devices, device->index, DEVICE_GID_UPDATES); + up_write(&devices_rwsem); +} + /* * ib_enum_all_devs - enumerate all ib_devices * @cb: Callback to call for each found ib_device From 7a23af417d9dd57b4382356b2e7442e5d2bf5bea Mon Sep 17 00:00:00 2001 From: Siva Reddy Kallam Date: Wed, 18 Feb 2026 09:12:45 +0000 Subject: [PATCH 159/359] RDMA/bng_re: Remove unnessary validity checks Fix below smatch warning: drivers/infiniband/hw/bng_re/bng_dev.c:113 bng_re_net_ring_free() warn: variable dereferenced before check 'rdev' (see line 107) current driver has unnessary validity checks. So, removing these unnessary validity checks. Fixes: 4f830cd8d7fe ("RDMA/bng_re: Add infrastructure for enabling Firmware channel") Fixes: 745065770c2d ("RDMA/bng_re: Register and get the resources from bnge driver") Fixes: 04e031ff6e60 ("RDMA/bng_re: Initialize the Firmware and Hardware") Fixes: d0da769c19d0 ("RDMA/bng_re: Add Auxiliary interface") Reported-by: Simon Horman Reported-by: kernel test robot Reported-by: Dan Carpenter Closes: https://lore.kernel.org/r/202601010413.sWadrQel-lkp@intel.com/ Signed-off-by: Siva Reddy Kallam Link: https://patch.msgid.link/20260218091246.1764808-2-siva.kallam@broadcom.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/bng_re/bng_dev.c | 27 ++++---------------------- 1 file changed, 4 insertions(+), 23 deletions(-) diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c index cf3bf08e5f26..b8df807c3a64 100644 --- a/drivers/infiniband/hw/bng_re/bng_dev.c +++ b/drivers/infiniband/hw/bng_re/bng_dev.c @@ -54,9 +54,6 @@ static void bng_re_destroy_chip_ctx(struct bng_re_dev *rdev) { struct bng_re_chip_ctx *chip_ctx; - if (!rdev->chip_ctx) - return; - kfree(rdev->dev_attr); rdev->dev_attr = NULL; @@ -124,12 +121,6 @@ static int bng_re_net_ring_free(struct bng_re_dev *rdev, struct bnge_fw_msg fw_msg = {}; int rc = -EINVAL; - if (!rdev) - return rc; - - if (!aux_dev) - return rc; - bng_re_init_hwrm_hdr((void *)&req, HWRM_RING_FREE); req.ring_type = type; req.ring_id = cpu_to_le16(fw_ring_id); @@ -150,10 +141,7 @@ static int bng_re_net_ring_alloc(struct bng_re_dev *rdev, struct hwrm_ring_alloc_input req = {}; struct hwrm_ring_alloc_output resp; struct bnge_fw_msg fw_msg = {}; - int rc = -EINVAL; - - if (!aux_dev) - return rc; + int rc; bng_re_init_hwrm_hdr((void *)&req, HWRM_RING_ALLOC); req.enables = 0; @@ -184,10 +172,7 @@ static int bng_re_stats_ctx_free(struct bng_re_dev *rdev) struct hwrm_stat_ctx_free_input req = {}; struct hwrm_stat_ctx_free_output resp = {}; struct bnge_fw_msg fw_msg = {}; - int rc = -EINVAL; - - if (!aux_dev) - return rc; + int rc; bng_re_init_hwrm_hdr((void *)&req, HWRM_STAT_CTX_FREE); req.stat_ctx_id = cpu_to_le32(rdev->stats_ctx.fw_id); @@ -208,13 +193,10 @@ static int bng_re_stats_ctx_alloc(struct bng_re_dev *rdev) struct hwrm_stat_ctx_alloc_output resp = {}; struct hwrm_stat_ctx_alloc_input req = {}; struct bnge_fw_msg fw_msg = {}; - int rc = -EINVAL; + int rc; stats->fw_id = BNGE_INVALID_STATS_CTX_ID; - if (!aux_dev) - return rc; - bng_re_init_hwrm_hdr((void *)&req, HWRM_STAT_CTX_ALLOC); req.update_period_ms = cpu_to_le32(1000); req.stats_dma_addr = cpu_to_le64(stats->dma_map); @@ -486,8 +468,7 @@ static void bng_re_remove(struct auxiliary_device *adev) rdev = dev_info->rdev; - if (rdev) - bng_re_remove_device(rdev, adev); + bng_re_remove_device(rdev, adev); kfree(dev_info); } From 3d2e5d12a2eef0ca8a629a422aa593673235c77c Mon Sep 17 00:00:00 2001 From: Siva Reddy Kallam Date: Wed, 18 Feb 2026 09:12:46 +0000 Subject: [PATCH 160/359] RDMA/bng_re: Unwind bng_re_dev_init properly Fix below smatch warning: drivers/infiniband/hw/bng_re/bng_dev.c:270 bng_re_dev_init() warn: missing unwind goto? Current bng_re_dev_init function is not having clear unwinding code. So, added proper unwinding with ladder. Fixes: 4f830cd8d7fe ("RDMA/bng_re: Add infrastructure for enabling Firmware channel") Reported-by: Simon Horman Reported-by: kernel test robot Reported-by: Dan Carpenter Closes: https://lore.kernel.org/r/202601010413.sWadrQel-lkp@intel.com/ Signed-off-by: Siva Reddy Kallam Link: https://patch.msgid.link/20260218091246.1764808-3-siva.kallam@broadcom.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/bng_re/bng_dev.c | 29 +++++++++++++------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/hw/bng_re/bng_dev.c b/drivers/infiniband/hw/bng_re/bng_dev.c index b8df807c3a64..d34b5f88cd40 100644 --- a/drivers/infiniband/hw/bng_re/bng_dev.c +++ b/drivers/infiniband/hw/bng_re/bng_dev.c @@ -285,7 +285,7 @@ static int bng_re_dev_init(struct bng_re_dev *rdev) if (rc) { ibdev_err(&rdev->ibdev, "Failed to register with netedev: %#x\n", rc); - return -EINVAL; + goto reg_netdev_fail; } set_bit(BNG_RE_FLAG_NETDEV_REGISTERED, &rdev->flags); @@ -294,19 +294,16 @@ static int bng_re_dev_init(struct bng_re_dev *rdev) ibdev_err(&rdev->ibdev, "RoCE requires minimum 2 MSI-X vectors, but only %d reserved\n", rdev->aux_dev->auxr_info->msix_requested); - bnge_unregister_dev(rdev->aux_dev); - clear_bit(BNG_RE_FLAG_NETDEV_REGISTERED, &rdev->flags); - return -EINVAL; + rc = -EINVAL; + goto msix_ctx_fail; } ibdev_dbg(&rdev->ibdev, "Got %d MSI-X vectors\n", rdev->aux_dev->auxr_info->msix_requested); rc = bng_re_setup_chip_ctx(rdev); if (rc) { - bnge_unregister_dev(rdev->aux_dev); - clear_bit(BNG_RE_FLAG_NETDEV_REGISTERED, &rdev->flags); ibdev_err(&rdev->ibdev, "Failed to get chip context\n"); - return -EINVAL; + goto msix_ctx_fail; } bng_re_query_hwrm_version(rdev); @@ -315,16 +312,14 @@ static int bng_re_dev_init(struct bng_re_dev *rdev) if (rc) { ibdev_err(&rdev->ibdev, "Failed to allocate RCFW Channel: %#x\n", rc); - goto fail; + goto alloc_fw_chl_fail; } /* Allocate nq record memory */ rdev->nqr = kzalloc_obj(*rdev->nqr); if (!rdev->nqr) { - bng_re_destroy_chip_ctx(rdev); - bnge_unregister_dev(rdev->aux_dev); - clear_bit(BNG_RE_FLAG_NETDEV_REGISTERED, &rdev->flags); - return -ENOMEM; + rc = -ENOMEM; + goto nq_alloc_fail; } rdev->nqr->num_msix = rdev->aux_dev->auxr_info->msix_requested; @@ -393,9 +388,15 @@ disable_rcfw: free_ring: bng_re_net_ring_free(rdev, rdev->rcfw.creq.ring_id, type); free_rcfw: + kfree(rdev->nqr); +nq_alloc_fail: bng_re_free_rcfw_channel(&rdev->rcfw); -fail: - bng_re_dev_uninit(rdev); +alloc_fw_chl_fail: + bng_re_destroy_chip_ctx(rdev); +msix_ctx_fail: + bnge_unregister_dev(rdev->aux_dev); + clear_bit(BNG_RE_FLAG_NETDEV_REGISTERED, &rdev->flags); +reg_netdev_fail: return rc; } From 017c1792525064a723971f0216e6ef86a8c7af11 Mon Sep 17 00:00:00 2001 From: Vahagn Vardanian Date: Mon, 23 Feb 2026 00:00:00 +0000 Subject: [PATCH 161/359] wifi: mac80211: fix NULL pointer dereference in mesh_rx_csa_frame() In mesh_rx_csa_frame(), elems->mesh_chansw_params_ie is dereferenced at lines 1638 and 1642 without a prior NULL check: ifmsh->chsw_ttl = elems->mesh_chansw_params_ie->mesh_ttl; ... pre_value = le16_to_cpu(elems->mesh_chansw_params_ie->mesh_pre_value); The mesh_matches_local() check above only validates the Mesh ID, Mesh Configuration, and Supported Rates IEs. It does not verify the presence of the Mesh Channel Switch Parameters IE (element ID 118). When a received CSA action frame omits that IE, ieee802_11_parse_elems() leaves elems->mesh_chansw_params_ie as NULL, and the unconditional dereference causes a kernel NULL pointer dereference. A remote mesh peer with an established peer link (PLINK_ESTAB) can trigger this by sending a crafted SPECTRUM_MGMT/CHL_SWITCH action frame that includes a matching Mesh ID and Mesh Configuration IE but omits the Mesh Channel Switch Parameters IE. No authentication beyond the default open mesh peering is required. Crash confirmed on kernel 6.17.0-5-generic via mac80211_hwsim: BUG: kernel NULL pointer dereference, address: 0000000000000000 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:ieee80211_mesh_rx_queued_mgmt+0x143/0x2a0 [mac80211] CR2: 0000000000000000 Fix by adding a NULL check for mesh_chansw_params_ie after mesh_matches_local() returns, consistent with how other optional IEs are guarded throughout the mesh code. The bug has been present since v3.13 (released 2014-01-19). Fixes: 8f2535b92d68 ("mac80211: process the CSA frame for mesh accordingly") Cc: stable@vger.kernel.org Signed-off-by: Vahagn Vardanian Signed-off-by: Johannes Berg --- net/mac80211/mesh.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index 68901f1def0d..129e814abe76 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -1636,6 +1636,9 @@ static void mesh_rx_csa_frame(struct ieee80211_sub_if_data *sdata, if (!mesh_matches_local(sdata, elems)) goto free; + if (!elems->mesh_chansw_params_ie) + goto free; + ifmsh->chsw_ttl = elems->mesh_chansw_params_ie->mesh_ttl; if (!--ifmsh->chsw_ttl) fwd_csa = false; From 021fd0f87004e949cba6f27199b9925bc24ec3a0 Mon Sep 17 00:00:00 2001 From: Fernando Fernandez Mancera Date: Thu, 19 Feb 2026 08:57:38 +0100 Subject: [PATCH 162/359] net/rds: fix recursive lock in rds_tcp_conn_slots_available syzbot reported a recursive lock warning in rds_tcp_get_peer_sport() as it calls inet6_getname() which acquires the socket lock that was already held by __release_sock(). kworker/u8:6/2985 is trying to acquire lock: ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline] ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: inet6_getname+0x15d/0x650 net/ipv6/af_inet6.c:533 but task is already holding lock: ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline] ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_sock_set_cork+0x2c/0x2e0 net/ipv4/tcp.c:3694 lock_sock_nested+0x48/0x100 net/core/sock.c:3780 lock_sock include/net/sock.h:1709 [inline] inet6_getname+0x15d/0x650 net/ipv6/af_inet6.c:533 rds_tcp_get_peer_sport net/rds/tcp_listen.c:70 [inline] rds_tcp_conn_slots_available+0x288/0x470 net/rds/tcp_listen.c:149 rds_recv_hs_exthdrs+0x60f/0x7c0 net/rds/recv.c:265 rds_recv_incoming+0x9f6/0x12d0 net/rds/recv.c:389 rds_tcp_data_recv+0x7f1/0xa40 net/rds/tcp_recv.c:243 __tcp_read_sock+0x196/0x970 net/ipv4/tcp.c:1702 rds_tcp_read_sock net/rds/tcp_recv.c:277 [inline] rds_tcp_data_ready+0x369/0x950 net/rds/tcp_recv.c:331 tcp_rcv_established+0x19e9/0x2670 net/ipv4/tcp_input.c:6675 tcp_v6_do_rcv+0x8eb/0x1ba0 net/ipv6/tcp_ipv6.c:1609 sk_backlog_rcv include/net/sock.h:1185 [inline] __release_sock+0x1b8/0x3a0 net/core/sock.c:3213 Reading from the socket struct directly is safe from possible paths. For rds_tcp_accept_one(), the socket has just been accepted and is not yet exposed to concurrent access. For rds_tcp_conn_slots_available(), direct access avoids the recursive deadlock seen during backlog processing where the socket lock is already held from the __release_sock(). However, rds_tcp_conn_slots_available() is also called from the normal softirq path via tcp_data_ready() where the lock is not held. This is also safe because inet_dport is a stable 16 bits field. A READ_ONCE() annotation as the value might be accessed lockless in a concurrent access context. Note that it is also safe to call rds_tcp_conn_slots_available() from rds_conn_shutdown() because the fan-out is disabled. Fixes: 9d27a0fb122f ("net/rds: Trigger rds_send_ping() more than once") Reported-by: syzbot+5efae91f60932839f0a5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5efae91f60932839f0a5 Signed-off-by: Fernando Fernandez Mancera Reviewed-by: Allison Henderson Link: https://patch.msgid.link/20260219075738.4403-1-fmancera@suse.de Signed-off-by: Paolo Abeni --- net/rds/connection.c | 3 +++ net/rds/tcp_listen.c | 26 ++++---------------------- 2 files changed, 7 insertions(+), 22 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index 185f73b01694..a542f94c0214 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -455,6 +455,9 @@ void rds_conn_shutdown(struct rds_conn_path *cp) rcu_read_unlock(); } + /* we do not hold the socket lock here but it is safe because + * fan-out is disabled when calling conn_slots_available() + */ if (conn->c_trans->conn_slots_available) conn->c_trans->conn_slots_available(conn, false); } diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c index b4ab68a1da6d..08a506aa7ce7 100644 --- a/net/rds/tcp_listen.c +++ b/net/rds/tcp_listen.c @@ -59,30 +59,12 @@ void rds_tcp_keepalive(struct socket *sock) static int rds_tcp_get_peer_sport(struct socket *sock) { - union { - struct sockaddr_storage storage; - struct sockaddr addr; - struct sockaddr_in sin; - struct sockaddr_in6 sin6; - } saddr; - int sport; + struct sock *sk = sock->sk; - if (kernel_getpeername(sock, &saddr.addr) >= 0) { - switch (saddr.addr.sa_family) { - case AF_INET: - sport = ntohs(saddr.sin.sin_port); - break; - case AF_INET6: - sport = ntohs(saddr.sin6.sin6_port); - break; - default: - sport = -1; - } - } else { - sport = -1; - } + if (!sk) + return -1; - return sport; + return ntohs(READ_ONCE(inet_sk(sk)->inet_dport)); } /* rds_tcp_accept_one_path(): if accepting on cp_index > 0, make sure the From fdcfce93073d990ed4b71752e31ad1c1d6e9d58b Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 23 Feb 2026 20:59:33 +0100 Subject: [PATCH 163/359] eventpoll: Fix integer overflow in ep_loop_check_proc() If a recursive call to ep_loop_check_proc() hits the `result = INT_MAX`, an integer overflow will occur in the calling ep_loop_check_proc() at `result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1)`, breaking the recursion depth check. Fix it by using a different placeholder value that can't lead to an overflow. Reported-by: Guenter Roeck Fixes: f2e467a48287 ("eventpoll: Fix semi-unbounded recursion") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn Link: https://patch.msgid.link/20260223-epoll-int-overflow-v1-1-452f35132224@google.com Signed-off-by: Christian Brauner --- fs/eventpoll.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 6c36d9dc6926..d20917b03161 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2061,7 +2061,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, * @ep: the &struct eventpoll to be currently checked. * @depth: Current depth of the path being checked. * - * Return: depth of the subtree, or INT_MAX if we found a loop or went too deep. + * Return: depth of the subtree, or a value bigger than EP_MAX_NESTS if we found + * a loop or went too deep. */ static int ep_loop_check_proc(struct eventpoll *ep, int depth) { @@ -2080,7 +2081,7 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth) struct eventpoll *ep_tovisit; ep_tovisit = epi->ffd.file->private_data; if (ep_tovisit == inserting_into || depth > EP_MAX_NESTS) - result = INT_MAX; + result = EP_MAX_NESTS+1; else result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1); if (result > EP_MAX_NESTS) From bae8a5d2e759da2e0cba33ab2080deee96a09373 Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Thu, 19 Feb 2026 20:46:37 +0800 Subject: [PATCH 164/359] net: wan: farsync: Fix use-after-free bugs caused by unfinished tasklets When the FarSync T-series card is being detached, the fst_card_info is deallocated in fst_remove_one(). However, the fst_tx_task or fst_int_task may still be running or pending, leading to use-after-free bugs when the already freed fst_card_info is accessed in fst_process_tx_work_q() or fst_process_int_work_q(). A typical race condition is depicted below: CPU 0 (cleanup) | CPU 1 (tasklet) | fst_start_xmit() fst_remove_one() | tasklet_schedule() unregister_hdlc_device()| | fst_process_tx_work_q() //handler kfree(card) //free | do_bottom_half_tx() | card-> //use The following KASAN trace was captured: ================================================================== BUG: KASAN: slab-use-after-free in do_bottom_half_tx+0xb88/0xd00 Read of size 4 at addr ffff88800aad101c by task ksoftirqd/3/32 ... Call Trace: dump_stack_lvl+0x55/0x70 print_report+0xcb/0x5d0 ? do_bottom_half_tx+0xb88/0xd00 kasan_report+0xb8/0xf0 ? do_bottom_half_tx+0xb88/0xd00 do_bottom_half_tx+0xb88/0xd00 ? _raw_spin_lock_irqsave+0x85/0xe0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? __pfx___hrtimer_run_queues+0x10/0x10 fst_process_tx_work_q+0x67/0x90 tasklet_action_common+0x1fa/0x720 ? hrtimer_interrupt+0x31f/0x780 handle_softirqs+0x176/0x530 __irq_exit_rcu+0xab/0xe0 sysvec_apic_timer_interrupt+0x70/0x80 ... Allocated by task 41 on cpu 3 at 72.330843s: kasan_save_stack+0x24/0x50 kasan_save_track+0x17/0x60 __kasan_kmalloc+0x7f/0x90 fst_add_one+0x1a5/0x1cd0 local_pci_probe+0xdd/0x190 pci_device_probe+0x341/0x480 really_probe+0x1c6/0x6a0 __driver_probe_device+0x248/0x310 driver_probe_device+0x48/0x210 __device_attach_driver+0x160/0x320 bus_for_each_drv+0x101/0x190 __device_attach+0x198/0x3a0 device_initial_probe+0x78/0xa0 pci_bus_add_device+0x81/0xc0 pci_bus_add_devices+0x7e/0x190 enable_slot+0x9b9/0x1130 acpiphp_check_bridge.part.0+0x2e1/0x460 acpiphp_hotplug_notify+0x36c/0x3c0 acpi_device_hotplug+0x203/0xb10 acpi_hotplug_work_fn+0x59/0x80 ... Freed by task 41 on cpu 1 at 75.138639s: kasan_save_stack+0x24/0x50 kasan_save_track+0x17/0x60 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x43/0x70 kfree+0x135/0x410 fst_remove_one+0x2ca/0x540 pci_device_remove+0xa6/0x1d0 device_release_driver_internal+0x364/0x530 pci_stop_bus_device+0x105/0x150 pci_stop_and_remove_bus_device+0xd/0x20 disable_slot+0x116/0x260 acpiphp_disable_and_eject_slot+0x4b/0x190 acpiphp_hotplug_notify+0x230/0x3c0 acpi_device_hotplug+0x203/0xb10 acpi_hotplug_work_fn+0x59/0x80 ... The buggy address belongs to the object at ffff88800aad1000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 28 bytes inside of freed 1024-byte region The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xaad0 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x100000000000040(head|node=0|zone=1) page_type: f5(slab) raw: 0100000000000040 ffff888007042dc0 dead000000000122 0000000000000000 raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000 head: 0100000000000040 ffff888007042dc0 dead000000000122 0000000000000000 head: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000 head: 0100000000000003 ffffea00002ab401 00000000ffffffff 00000000ffffffff head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88800aad0f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800aad0f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88800aad1000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88800aad1080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800aad1100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fix this by ensuring that both fst_tx_task and fst_int_task are properly canceled before the fst_card_info is released. Add tasklet_kill() in fst_remove_one() to synchronize with any pending or running tasklets. Since unregister_hdlc_device() stops data transmission and reception, and fst_disable_intr() prevents further interrupts, it is appropriate to place tasklet_kill() after these calls. The bugs were identified through static analysis. To reproduce the issue and validate the fix, a FarSync T-series card was simulated in QEMU and delays(e.g., mdelay()) were introduced within the tasklet handler to increase the likelihood of triggering the race condition. Fixes: 2f623aaf9f31 ("net: farsync: Fix kmemleak when rmmods farsync") Signed-off-by: Duoming Zhou Reviewed-by: Jijie Shao Link: https://patch.msgid.link/20260219124637.72578-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni --- drivers/net/wan/farsync.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wan/farsync.c b/drivers/net/wan/farsync.c index 5b01642ca44e..6b2d1e63855e 100644 --- a/drivers/net/wan/farsync.c +++ b/drivers/net/wan/farsync.c @@ -2550,6 +2550,8 @@ fst_remove_one(struct pci_dev *pdev) fst_disable_intr(card); free_irq(card->irq, card); + tasklet_kill(&fst_tx_task); + tasklet_kill(&fst_int_task); iounmap(card->ctlmem); iounmap(card->mem); From 82aec772fca2223bc5774bd9af486fd95766e578 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 19 Feb 2026 11:50:21 -0800 Subject: [PATCH 165/359] netconsole: avoid OOB reads, msg is not nul-terminated msg passed to netconsole from the console subsystem is not guaranteed to be nul-terminated. Before recent commit 7eab73b18630 ("netconsole: convert to NBCON console infrastructure") the message would be placed in printk_shared_pbufs, a static global buffer, so KASAN had harder time catching OOB accesses. Now we see: printk: console [netcon_ext0] enabled BUG: KASAN: slab-out-of-bounds in string+0x1f7/0x240 Read of size 1 at addr ffff88813b6d4c00 by task pr/netcon_ext0/594 CPU: 65 UID: 0 PID: 594 Comm: pr/netcon_ext0 Not tainted 6.19.0-11754-g4246fd6547c9 Call Trace: kasan_report+0xe4/0x120 string+0x1f7/0x240 vsnprintf+0x655/0xba0 scnprintf+0xba/0x120 netconsole_write+0x3fe/0xa10 nbcon_emit_next_record+0x46e/0x860 nbcon_kthread_func+0x623/0x750 Allocated by task 1: nbcon_alloc+0x1ea/0x450 register_console+0x26b/0xe10 init_netconsole+0xbb0/0xda0 The buggy address belongs to the object at ffff88813b6d4000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 0 bytes to the right of allocated 3072-byte region [ffff88813b6d4000, ffff88813b6d4c00) Fixes: c62c0a17f9b7 ("netconsole: Append kernel version to message") Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260219195021.2099699-1-kuba@kernel.org Signed-off-by: Paolo Abeni --- drivers/net/netconsole.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index d144787b2947..1b6a4135ec08 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -1679,7 +1679,8 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt, if (release_len) { release = init_utsname()->release; - scnprintf(nt->buf, MAX_PRINT_CHUNK, "%s,%s", release, msg); + scnprintf(nt->buf, MAX_PRINT_CHUNK, "%s,%.*s", release, + msg_len, msg); msg_len += release_len; } else { memcpy(nt->buf, msg, msg_len); From fd80bd7105f88189f47d465ca8cb7d115570de30 Mon Sep 17 00:00:00 2001 From: Kamal Heib Date: Fri, 20 Feb 2026 17:21:26 -0500 Subject: [PATCH 166/359] RDMA/ionic: Fix potential NULL pointer dereference in ionic_query_port The function ionic_query_port() calls ib_device_get_netdev() without checking the return value which could lead to NULL pointer dereference, Fix it by checking the return value and return -ENODEV if the 'ndev' is NULL. Fixes: 2075bbe8ef03 ("RDMA/ionic: Register device ops for miscellaneous functionality") Signed-off-by: Kamal Heib Link: https://patch.msgid.link/20260220222125.16973-2-kheib@redhat.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/ionic/ionic_ibdev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/hw/ionic/ionic_ibdev.c b/drivers/infiniband/hw/ionic/ionic_ibdev.c index 164046d00e5d..bd4c73e530d0 100644 --- a/drivers/infiniband/hw/ionic/ionic_ibdev.c +++ b/drivers/infiniband/hw/ionic/ionic_ibdev.c @@ -81,6 +81,8 @@ static int ionic_query_port(struct ib_device *ibdev, u32 port, return -EINVAL; ndev = ib_device_get_netdev(ibdev, port); + if (!ndev) + return -ENODEV; if (netif_running(ndev) && netif_carrier_ok(ndev)) { attr->state = IB_PORT_ACTIVE; From f22c77ce49db0589103d96487dca56f5b2136362 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 16 Feb 2026 11:02:47 -0400 Subject: [PATCH 167/359] RDMA/efa: Fix typo in efa_alloc_mr() The pattern is to check the entire driver request space, not just sizeof something unrelated. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Signed-off-by: Jason Gunthorpe Link: https://patch.msgid.link/1-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Acked-by: Michael Margolin Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/efa/efa_verbs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c index b5b93b42e6c4..b0aa7c0238ed 100644 --- a/drivers/infiniband/hw/efa/efa_verbs.c +++ b/drivers/infiniband/hw/efa/efa_verbs.c @@ -1661,7 +1661,7 @@ static struct efa_mr *efa_alloc_mr(struct ib_pd *ibpd, int access_flags, struct efa_mr *mr; if (udata && udata->inlen && - !ib_is_udata_cleared(udata, 0, sizeof(udata->inlen))) { + !ib_is_udata_cleared(udata, 0, udata->inlen)) { ibdev_dbg(&dev->ibdev, "Incompatible ABI params, udata not cleared\n"); return ERR_PTR(-EINVAL); From 117942ca43e2e3c3d121faae530989931b7f67e1 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 16 Feb 2026 11:02:48 -0400 Subject: [PATCH 168/359] IB/mthca: Add missed mthca_unmap_user_db() for mthca_create_srq() Fix a user triggerable leak on the system call failure path. Cc: stable@vger.kernel.org Fixes: ec34a922d243 ("[PATCH] IB/mthca: Add SRQ implementation") Signed-off-by: Jason Gunthorpe Link: https://patch.msgid.link/2-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/mthca/mthca_provider.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index ef0635064fba..5c182ae4fc2f 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -428,6 +428,8 @@ static int mthca_create_srq(struct ib_srq *ibsrq, if (context && ib_copy_to_udata(udata, &srq->srqn, sizeof(__u32))) { mthca_free_srq(to_mdev(ibsrq->device), srq); + mthca_unmap_user_db(to_mdev(ibsrq->device), &context->uar, + context->db_tab, ucmd.db_index); return -EFAULT; } @@ -436,6 +438,7 @@ static int mthca_create_srq(struct ib_srq *ibsrq, static int mthca_destroy_srq(struct ib_srq *srq, struct ib_udata *udata) { + mthca_free_srq(to_mdev(srq->device), to_msrq(srq)); if (udata) { struct mthca_ucontext *context = rdma_udata_to_drv_context( @@ -446,8 +449,6 @@ static int mthca_destroy_srq(struct ib_srq *srq, struct ib_udata *udata) mthca_unmap_user_db(to_mdev(srq->device), &context->uar, context->db_tab, to_msrq(srq)->db_index); } - - mthca_free_srq(to_mdev(srq->device), to_msrq(srq)); return 0; } From 74586c6da9ea222a61c98394f2fc0a604748438c Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 16 Feb 2026 11:02:49 -0400 Subject: [PATCH 169/359] RDMA/irdma: Fix kernel stack leak in irdma_create_user_ah() struct irdma_create_ah_resp { // 8 bytes, no padding __u32 ah_id; // offset 0 - SET (uresp.ah_id = ah->sc_ah.ah_info.ah_idx) __u8 rsvd[4]; // offset 4 - NEVER SET <- LEAK }; rsvd[4]: 4 bytes of stack memory leaked unconditionally. Only ah_id is assigned before ib_respond_udata(). The reserved members of the structure were not zeroed. Cc: stable@vger.kernel.org Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Jason Gunthorpe Link: https://patch.msgid.link/3-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/irdma/verbs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c index 15af53237217..7251cd7a2147 100644 --- a/drivers/infiniband/hw/irdma/verbs.c +++ b/drivers/infiniband/hw/irdma/verbs.c @@ -5212,7 +5212,7 @@ static int irdma_create_user_ah(struct ib_ah *ibah, #define IRDMA_CREATE_AH_MIN_RESP_LEN offsetofend(struct irdma_create_ah_resp, rsvd) struct irdma_ah *ah = container_of(ibah, struct irdma_ah, ibah); struct irdma_device *iwdev = to_iwdev(ibah->pd->device); - struct irdma_create_ah_resp uresp; + struct irdma_create_ah_resp uresp = {}; struct irdma_ah *parent_ah; int err; From faa72102b178c7ae6c6afea23879e7c84fc59b4e Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 16 Feb 2026 11:02:50 -0400 Subject: [PATCH 170/359] RDMA/ionic: Fix kernel stack leak in ionic_create_cq() struct ionic_cq_resp resp { __u32 cqid[2]; // offset 0 - PARTIALLY SET (see below) __u8 udma_mask; // offset 8 - SET (resp.udma_mask = vcq->udma_mask) __u8 rsvd[7]; // offset 9 - NEVER SET <- LEAK }; rsvd[7]: 7 bytes of stack memory leaked unconditionally. cqid[2]: The loop at line 1256 iterates over udma_idx but skips indices where !(vcq->udma_mask & BIT(udma_idx)). The array has 2 entries but udma_count could be 1, meaning cqid[1] might never be written via ionic_create_cq_common(). If udma_mask only has bit 0 set, cqid[1] (4 bytes) is also leaked. So potentially 11 bytes leaked. Cc: stable@vger.kernel.org Fixes: e8521822c733 ("RDMA/ionic: Register device ops for control path") Signed-off-by: Jason Gunthorpe Link: https://patch.msgid.link/4-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Acked-by: Abhijit Gangurde Signed-off-by: Leon Romanovsky --- drivers/infiniband/hw/ionic/ionic_controlpath.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/ionic/ionic_controlpath.c b/drivers/infiniband/hw/ionic/ionic_controlpath.c index 5b3f40bd98d8..4842931f5316 100644 --- a/drivers/infiniband/hw/ionic/ionic_controlpath.c +++ b/drivers/infiniband/hw/ionic/ionic_controlpath.c @@ -1218,7 +1218,7 @@ int ionic_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, rdma_udata_to_drv_context(udata, struct ionic_ctx, ibctx); struct ionic_vcq *vcq = to_ionic_vcq(ibcq); struct ionic_tbl_buf buf = {}; - struct ionic_cq_resp resp; + struct ionic_cq_resp resp = {}; struct ionic_cq_req req; int udma_idx = 0, rc; From 983512f3a87fd8dc4c94dfa6b596b6e57df5aad7 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 20 Feb 2026 19:38:58 +0100 Subject: [PATCH 171/359] net: Drop the lock in skb_may_tx_timestamp() skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must not be taken in IRQ context, only softirq is okay. A few drivers receive the timestamp via a dedicated interrupt and complete the TX timestamp from that handler. This will lead to a deadlock if the lock is already write-locked on the same CPU. Taking the lock can be avoided. The socket (pointed by the skb) will remain valid until the skb is released. The ->sk_socket and ->file member will be set to NULL once the user closes the socket which may happen before the timestamp arrives. If we happen to observe the pointer while the socket is closing but before the pointer is set to NULL then we may use it because both pointer (and the file's cred member) are RCU freed. Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a matching WRITE_ONCE() where the pointer are cleared. Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl") Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Willem de Bruijn Reviewed-by: Jason Xing Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20260220183858.N4ERjFW6@linutronix.de Signed-off-by: Paolo Abeni --- include/net/sock.h | 2 +- net/core/skbuff.c | 23 ++++++++++++++++++----- net/socket.c | 2 +- 3 files changed, 20 insertions(+), 7 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 66b56288c1d3..6c9a83016e95 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2098,7 +2098,7 @@ static inline int sk_rx_queue_get(const struct sock *sk) static inline void sk_set_socket(struct sock *sk, struct socket *sock) { - sk->sk_socket = sock; + WRITE_ONCE(sk->sk_socket, sock); if (sock) { WRITE_ONCE(sk->sk_uid, SOCK_INODE(sock)->i_uid); WRITE_ONCE(sk->sk_ino, SOCK_INODE(sock)->i_ino); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index dc47d3efc72e..0e217041958a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5590,15 +5590,28 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb, static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly) { - bool ret; + struct socket *sock; + struct file *file; + bool ret = false; if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data))) return true; - read_lock_bh(&sk->sk_callback_lock); - ret = sk->sk_socket && sk->sk_socket->file && - file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW); - read_unlock_bh(&sk->sk_callback_lock); + /* The sk pointer remains valid as long as the skb is. The sk_socket and + * file pointer may become NULL if the socket is closed. Both structures + * (including file->cred) are RCU freed which means they can be accessed + * within a RCU read section. + */ + rcu_read_lock(); + sock = READ_ONCE(sk->sk_socket); + if (!sock) + goto out; + file = READ_ONCE(sock->file); + if (!file) + goto out; + ret = file_ns_capable(file, &init_user_ns, CAP_NET_RAW); +out: + rcu_read_unlock(); return ret; } diff --git a/net/socket.c b/net/socket.c index 136b98c54fb3..05952188127f 100644 --- a/net/socket.c +++ b/net/socket.c @@ -674,7 +674,7 @@ static void __sock_release(struct socket *sock, struct inode *inode) iput(SOCK_INODE(sock)); return; } - sock->file = NULL; + WRITE_ONCE(sock->file, NULL); } /** From bf4fde7db4a8e2613cba36d81ac271f3d66c28f7 Mon Sep 17 00:00:00 2001 From: Ferry Meng Date: Tue, 24 Feb 2026 14:02:07 +0800 Subject: [PATCH 172/359] erofs: remove more unnecessary #ifdefs Many #ifdefs can be replaced with IS_ENABLED() to improve code readability. No functional changes. Signed-off-by: Ferry Meng Reviewed-by: Gao Xiang Signed-off-by: Gao Xiang --- fs/erofs/super.c | 83 ++++++++++++++++++++---------------------------- 1 file changed, 35 insertions(+), 48 deletions(-) diff --git a/fs/erofs/super.c b/fs/erofs/super.c index d4995686ac6c..972a0c82198d 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -424,26 +424,23 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = { static bool erofs_fc_set_dax_mode(struct fs_context *fc, unsigned int mode) { -#ifdef CONFIG_FS_DAX - struct erofs_sb_info *sbi = fc->s_fs_info; + if (IS_ENABLED(CONFIG_FS_DAX)) { + struct erofs_sb_info *sbi = fc->s_fs_info; - switch (mode) { - case EROFS_MOUNT_DAX_ALWAYS: - set_opt(&sbi->opt, DAX_ALWAYS); - clear_opt(&sbi->opt, DAX_NEVER); - return true; - case EROFS_MOUNT_DAX_NEVER: - set_opt(&sbi->opt, DAX_NEVER); - clear_opt(&sbi->opt, DAX_ALWAYS); - return true; - default: + if (mode == EROFS_MOUNT_DAX_ALWAYS) { + set_opt(&sbi->opt, DAX_ALWAYS); + clear_opt(&sbi->opt, DAX_NEVER); + return true; + } else if (mode == EROFS_MOUNT_DAX_NEVER) { + set_opt(&sbi->opt, DAX_NEVER); + clear_opt(&sbi->opt, DAX_ALWAYS); + return true; + } DBG_BUGON(1); return false; } -#else errorfc(fc, "dax options not supported"); return false; -#endif } static int erofs_fc_parse_param(struct fs_context *fc, @@ -460,31 +457,26 @@ static int erofs_fc_parse_param(struct fs_context *fc, switch (opt) { case Opt_user_xattr: -#ifdef CONFIG_EROFS_FS_XATTR - if (result.boolean) + if (!IS_ENABLED(CONFIG_EROFS_FS_XATTR)) + errorfc(fc, "{,no}user_xattr options not supported"); + else if (result.boolean) set_opt(&sbi->opt, XATTR_USER); else clear_opt(&sbi->opt, XATTR_USER); -#else - errorfc(fc, "{,no}user_xattr options not supported"); -#endif break; case Opt_acl: -#ifdef CONFIG_EROFS_FS_POSIX_ACL - if (result.boolean) + if (!IS_ENABLED(CONFIG_EROFS_FS_POSIX_ACL)) + errorfc(fc, "{,no}acl options not supported"); + else if (result.boolean) set_opt(&sbi->opt, POSIX_ACL); else clear_opt(&sbi->opt, POSIX_ACL); -#else - errorfc(fc, "{,no}acl options not supported"); -#endif break; case Opt_cache_strategy: -#ifdef CONFIG_EROFS_FS_ZIP - sbi->opt.cache_strategy = result.uint_32; -#else - errorfc(fc, "compression not supported, cache_strategy ignored"); -#endif + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP)) + errorfc(fc, "compression not supported, cache_strategy ignored"); + else + sbi->opt.cache_strategy = result.uint_32; break; case Opt_dax: if (!erofs_fc_set_dax_mode(fc, EROFS_MOUNT_DAX_ALWAYS)) @@ -533,24 +525,21 @@ static int erofs_fc_parse_param(struct fs_context *fc, break; #endif case Opt_directio: -#ifdef CONFIG_EROFS_FS_BACKED_BY_FILE - if (result.boolean) + if (!IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE)) + errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name); + else if (result.boolean) set_opt(&sbi->opt, DIRECT_IO); else clear_opt(&sbi->opt, DIRECT_IO); -#else - errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name); -#endif break; case Opt_fsoffset: sbi->dif0.fsoff = result.uint_64; break; case Opt_inode_share: -#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE - set_opt(&sbi->opt, INODE_SHARE); -#else - errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name); -#endif + if (!IS_ENABLED(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)) + errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name); + else + set_opt(&sbi->opt, INODE_SHARE); break; } return 0; @@ -809,8 +798,7 @@ static int erofs_fc_get_tree(struct fs_context *fc) ret = get_tree_bdev_flags(fc, erofs_fc_fill_super, IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) ? GET_TREE_BDEV_QUIET_LOOKUP : 0); -#ifdef CONFIG_EROFS_FS_BACKED_BY_FILE - if (ret == -ENOTBLK) { + if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) && ret == -ENOTBLK) { struct file *file; if (!fc->source) @@ -824,7 +812,6 @@ static int erofs_fc_get_tree(struct fs_context *fc) sbi->dif0.file->f_mapping->a_ops->read_folio) return get_tree_nodev(fc, erofs_fc_fill_super); } -#endif return ret; } @@ -1108,12 +1095,12 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root) seq_puts(seq, ",dax=never"); if (erofs_is_fileio_mode(sbi) && test_opt(opt, DIRECT_IO)) seq_puts(seq, ",directio"); -#ifdef CONFIG_EROFS_FS_ONDEMAND - if (sbi->fsid) - seq_printf(seq, ",fsid=%s", sbi->fsid); - if (sbi->domain_id) - seq_printf(seq, ",domain_id=%s", sbi->domain_id); -#endif + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) { + if (sbi->fsid) + seq_printf(seq, ",fsid=%s", sbi->fsid); + if (sbi->domain_id) + seq_printf(seq, ",domain_id=%s", sbi->domain_id); + } if (sbi->dif0.fsoff) seq_printf(seq, ",fsoffset=%llu", sbi->dif0.fsoff); if (test_opt(opt, INODE_SHARE)) From 3d7e6ce34f4fcc7083510c28b17a7c36462a25d4 Mon Sep 17 00:00:00 2001 From: Ziyi Guo Date: Sun, 22 Feb 2026 05:06:33 +0000 Subject: [PATCH 173/359] net: usb: pegasus: enable basic endpoint checking pegasus_probe() fills URBs with hardcoded endpoint pipes without verifying the endpoint descriptors: - usb_rcvbulkpipe(dev, 1) for RX data - usb_sndbulkpipe(dev, 2) for TX data - usb_rcvintpipe(dev, 3) for status interrupts A malformed USB device can present these endpoints with transfer types that differ from what the driver assumes. Add a pegasus_usb_ep enum for endpoint numbers, replacing magic constants throughout. Add usb_check_bulk_endpoints() and usb_check_int_endpoints() calls before any resource allocation to verify endpoint types before use, rejecting devices with mismatched descriptors at probe time, and avoid triggering assertion. Similar fix to - commit 90b7f2961798 ("net: usb: rtl8150: enable basic endpoint checking") - commit 9e7021d2aeae ("net: usb: catc: enable basic endpoint checking") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ziyi Guo Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260222050633.410165-1-n7l8m4@u.northwestern.edu Signed-off-by: Paolo Abeni --- drivers/net/usb/pegasus.c | 35 ++++++++++++++++++++++++++++++----- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c index 7b6d6eb60709..b84968c5f074 100644 --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -28,6 +28,17 @@ static const char driver_name[] = "pegasus"; BMSR_100FULL | BMSR_ANEGCAPABLE) #define CARRIER_CHECK_DELAY (2 * HZ) +/* + * USB endpoints. + */ + +enum pegasus_usb_ep { + PEGASUS_USB_EP_CONTROL = 0, + PEGASUS_USB_EP_BULK_IN = 1, + PEGASUS_USB_EP_BULK_OUT = 2, + PEGASUS_USB_EP_INT_IN = 3, +}; + static bool loopback; static bool mii_mode; static char *devid; @@ -542,7 +553,7 @@ static void read_bulk_callback(struct urb *urb) goto tl_sched; goon: usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb, - usb_rcvbulkpipe(pegasus->usb, 1), + usb_rcvbulkpipe(pegasus->usb, PEGASUS_USB_EP_BULK_IN), pegasus->rx_skb->data, PEGASUS_MTU, read_bulk_callback, pegasus); rx_status = usb_submit_urb(pegasus->rx_urb, GFP_ATOMIC); @@ -582,7 +593,7 @@ static void rx_fixup(struct tasklet_struct *t) return; } usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb, - usb_rcvbulkpipe(pegasus->usb, 1), + usb_rcvbulkpipe(pegasus->usb, PEGASUS_USB_EP_BULK_IN), pegasus->rx_skb->data, PEGASUS_MTU, read_bulk_callback, pegasus); try_again: @@ -710,7 +721,7 @@ static netdev_tx_t pegasus_start_xmit(struct sk_buff *skb, ((__le16 *) pegasus->tx_buff)[0] = cpu_to_le16(l16); skb_copy_from_linear_data(skb, pegasus->tx_buff + 2, skb->len); usb_fill_bulk_urb(pegasus->tx_urb, pegasus->usb, - usb_sndbulkpipe(pegasus->usb, 2), + usb_sndbulkpipe(pegasus->usb, PEGASUS_USB_EP_BULK_OUT), pegasus->tx_buff, count, write_bulk_callback, pegasus); if ((res = usb_submit_urb(pegasus->tx_urb, GFP_ATOMIC))) { @@ -837,7 +848,7 @@ static int pegasus_open(struct net_device *net) set_registers(pegasus, EthID, 6, net->dev_addr); usb_fill_bulk_urb(pegasus->rx_urb, pegasus->usb, - usb_rcvbulkpipe(pegasus->usb, 1), + usb_rcvbulkpipe(pegasus->usb, PEGASUS_USB_EP_BULK_IN), pegasus->rx_skb->data, PEGASUS_MTU, read_bulk_callback, pegasus); if ((res = usb_submit_urb(pegasus->rx_urb, GFP_KERNEL))) { @@ -848,7 +859,7 @@ static int pegasus_open(struct net_device *net) } usb_fill_int_urb(pegasus->intr_urb, pegasus->usb, - usb_rcvintpipe(pegasus->usb, 3), + usb_rcvintpipe(pegasus->usb, PEGASUS_USB_EP_INT_IN), pegasus->intr_buff, sizeof(pegasus->intr_buff), intr_callback, pegasus, pegasus->intr_interval); if ((res = usb_submit_urb(pegasus->intr_urb, GFP_KERNEL))) { @@ -1133,10 +1144,24 @@ static int pegasus_probe(struct usb_interface *intf, pegasus_t *pegasus; int dev_index = id - pegasus_ids; int res = -ENOMEM; + static const u8 bulk_ep_addr[] = { + PEGASUS_USB_EP_BULK_IN | USB_DIR_IN, + PEGASUS_USB_EP_BULK_OUT | USB_DIR_OUT, + 0}; + static const u8 int_ep_addr[] = { + PEGASUS_USB_EP_INT_IN | USB_DIR_IN, + 0}; if (pegasus_blacklisted(dev)) return -ENODEV; + /* Verify that all required endpoints are present */ + if (!usb_check_bulk_endpoints(intf, bulk_ep_addr) || + !usb_check_int_endpoints(intf, int_ep_addr)) { + dev_err(&intf->dev, "Missing or invalid endpoints\n"); + return -ENODEV; + } + net = alloc_etherdev(sizeof(struct pegasus)); if (!net) goto out; From 4a1ddb0f1c48c2b56f21d8b5200e2e29adf4c1df Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 24 Feb 2026 12:09:00 +0100 Subject: [PATCH 174/359] pidfs: avoid misleading break The break would only break out of the scoped_guard() loop, not the switch statement. It still works correct as is ofc but let's avoid the confusion. Reported-by: David Lechner Link:: https://lore.kernel.org/cd2153f1-098b-463c-bbc1-5c6ca9ef1f12@baylibre.com Signed-off-by: Christian Brauner --- fs/pidfs.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/pidfs.c b/fs/pidfs.c index 1e20e36e0ed5..21f9f011a957 100644 --- a/fs/pidfs.c +++ b/fs/pidfs.c @@ -577,9 +577,8 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct user_namespace *user_ns; user_ns = task_cred_xxx(task, user_ns); - if (!ns_ref_get(user_ns)) - break; - ns_common = to_ns_common(user_ns); + if (ns_ref_get(user_ns)) + ns_common = to_ns_common(user_ns); } #endif break; @@ -589,9 +588,8 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct pid_namespace *pid_ns; pid_ns = task_active_pid_ns(task); - if (!ns_ref_get(pid_ns)) - break; - ns_common = to_ns_common(pid_ns); + if (ns_ref_get(pid_ns)) + ns_common = to_ns_common(pid_ns); } #endif break; From c8dbdc6e380e7e96a51706db3e4b7870d8a9402d Mon Sep 17 00:00:00 2001 From: Andrew Lunn Date: Sun, 22 Feb 2026 16:26:01 +0100 Subject: [PATCH 175/359] net: phy: register phy led_triggers during probe to avoid AB-BA deadlock There is an AB-BA deadlock when both LEDS_TRIGGER_NETDEV and LED_TRIGGER_PHY are enabled: [ 1362.049207] [<8054e4b8>] led_trigger_register+0x5c/0x1fc <-- Trying to get lock "triggers_list_lock" via down_write(&triggers_list_lock); [ 1362.054536] [<80662830>] phy_led_triggers_register+0xd0/0x234 [ 1362.060329] [<8065e200>] phy_attach_direct+0x33c/0x40c [ 1362.065489] [<80651fc4>] phylink_fwnode_phy_connect+0x15c/0x23c [ 1362.071480] [<8066ee18>] mtk_open+0x7c/0xba0 [ 1362.075849] [<806d714c>] __dev_open+0x280/0x2b0 [ 1362.080384] [<806d7668>] __dev_change_flags+0x244/0x24c [ 1362.085598] [<806d7698>] dev_change_flags+0x28/0x78 [ 1362.090528] [<807150e4>] dev_ioctl+0x4c0/0x654 <-- Hold lock "rtnl_mutex" by calling rtnl_lock(); [ 1362.094985] [<80694360>] sock_ioctl+0x2f4/0x4e0 [ 1362.099567] [<802e9c4c>] sys_ioctl+0x32c/0xd8c [ 1362.104022] [<80014504>] syscall_common+0x34/0x58 Here LED_TRIGGER_PHY is registering LED triggers during phy_attach while holding RTNL and then taking triggers_list_lock. [ 1362.191101] [<806c2640>] register_netdevice_notifier+0x60/0x168 <-- Trying to get lock "rtnl_mutex" via rtnl_lock(); [ 1362.197073] [<805504ac>] netdev_trig_activate+0x194/0x1e4 [ 1362.202490] [<8054e28c>] led_trigger_set+0x1d4/0x360 <-- Hold lock "triggers_list_lock" by down_read(&triggers_list_lock); [ 1362.207511] [<8054eb38>] led_trigger_write+0xd8/0x14c [ 1362.212566] [<80381d98>] sysfs_kf_bin_write+0x80/0xbc [ 1362.217688] [<8037fcd8>] kernfs_fop_write_iter+0x17c/0x28c [ 1362.223174] [<802cbd70>] vfs_write+0x21c/0x3c4 [ 1362.227712] [<802cc0c4>] ksys_write+0x78/0x12c [ 1362.232164] [<80014504>] syscall_common+0x34/0x58 Here LEDS_TRIGGER_NETDEV is being enabled on an LED. It first takes triggers_list_lock and then RTNL. A classical AB-BA deadlock. phy_led_triggers_registers() does not require the RTNL, it does not make any calls into the network stack which require protection. There is also no requirement the PHY has been attached to a MAC, the triggers only make use of phydev state. This allows the call to phy_led_triggers_registers() to be placed elsewhere. PHY probe() and release() don't hold RTNL, so solving the AB-BA deadlock. Reported-by: Shiji Yang Closes: https://lore.kernel.org/all/OS7PR01MB13602B128BA1AD3FA38B6D1FFBC69A@OS7PR01MB13602.jpnprd01.prod.outlook.com/ Fixes: 06f502f57d0d ("leds: trigger: Introduce a NETDEV trigger") Cc: stable@vger.kernel.org Signed-off-by: Andrew Lunn Tested-by: Shiji Yang Link: https://patch.msgid.link/20260222152601.1978655-1-andrew@lunn.ch Signed-off-by: Paolo Abeni --- drivers/net/phy/phy_device.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 9b8eaac63b90..cbb4af604aa5 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1866,8 +1866,6 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, goto error; phy_resume(phydev); - if (!phydev->is_on_sfp_module) - phy_led_triggers_register(phydev); /** * If the external phy used by current mac interface is managed by @@ -1982,9 +1980,6 @@ void phy_detach(struct phy_device *phydev) phydev->phy_link_change = NULL; phydev->phylink = NULL; - if (!phydev->is_on_sfp_module) - phy_led_triggers_unregister(phydev); - if (phydev->mdio.dev.driver) module_put(phydev->mdio.dev.driver->owner); @@ -3778,16 +3773,27 @@ static int phy_probe(struct device *dev) /* Set the state to READY by default */ phydev->state = PHY_READY; + /* Register the PHY LED triggers */ + if (!phydev->is_on_sfp_module) + phy_led_triggers_register(phydev); + /* Get the LEDs from the device tree, and instantiate standard * LEDs for them. */ - if (IS_ENABLED(CONFIG_PHYLIB_LEDS) && !phy_driver_is_genphy(phydev)) + if (IS_ENABLED(CONFIG_PHYLIB_LEDS) && !phy_driver_is_genphy(phydev)) { err = of_phy_leds(phydev); + if (err) + goto out; + } + + return 0; out: + if (!phydev->is_on_sfp_module) + phy_led_triggers_unregister(phydev); + /* Re-assert the reset signal on error */ - if (err) - phy_device_reset(phydev, 1); + phy_device_reset(phydev, 1); return err; } @@ -3801,6 +3807,9 @@ static int phy_remove(struct device *dev) if (IS_ENABLED(CONFIG_PHYLIB_LEDS) && !phy_driver_is_genphy(phydev)) phy_leds_unregister(phydev); + if (!phydev->is_on_sfp_module) + phy_led_triggers_unregister(phydev); + phydev->state = PHY_DOWN; phy_cleanup_ports(phydev); From 78437ab3b769f80526416570f60173c89858dd84 Mon Sep 17 00:00:00 2001 From: Danilo Krummrich Date: Fri, 13 Feb 2026 00:58:11 +0100 Subject: [PATCH 176/359] clk: scu/imx8qxp: do not register driver in probe() imx_clk_scu_init() registers the imx_clk_scu_driver while commonly being called from IMX driver's probe() callbacks. However, it neither makes sense to register drivers from probe() callbacks of other drivers, nor does the driver core allow registering drivers with a device lock already being held. The latter was revealed by commit dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()") leading to a deadlock condition described in [1]. Besides that, nothing seems to unregister the imx_clk_scu_driver once the corresponding driver module is unloaded, which leaves the driver-core with a dangling pointer. Also, if there are multiple matching devices for the imx8qxp_clk_driver, imx8qxp_clk_probe() calls imx_clk_scu_init() multiple times. However, any subsequent call after the first one will fail, since the driver-core does not allow to register the same struct platform_driver multiple times. Hence, register the imx_clk_scu_driver from module_init() and unregister it in module_exit(). Note that we first register the imx8qxp_clk_driver and then call imx_clk_scu_module_init() to avoid having to call imx_clk_scu_module_exit() in the unwind path of imx8qxp_clk_init(). Fixes: dc23806a7c47 ("driver core: enforce device_lock for driver_match_device()") Fixes: 220175cd3979 ("clk: imx: scu: fix build break when compiled as modules") Reported-by: Alexander Stein Closes: https://lore.kernel.org/lkml/13955113.uLZWGnKmhe@steina-w/ Tested-by: Alexander Stein # TQMa8x/MBa8x Link: https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/ [1] Acked-by: Abel Vesa Reviewed-by: Daniel Baluta Link: https://patch.msgid.link/20260212235842.85934-1-dakr@kernel.org Signed-off-by: Danilo Krummrich --- drivers/clk/imx/clk-imx8qxp.c | 24 +++++++++++++++++++++++- drivers/clk/imx/clk-scu.c | 12 +++++++++++- drivers/clk/imx/clk-scu.h | 2 ++ 3 files changed, 36 insertions(+), 2 deletions(-) diff --git a/drivers/clk/imx/clk-imx8qxp.c b/drivers/clk/imx/clk-imx8qxp.c index 3ae162625bb1..c781425a005e 100644 --- a/drivers/clk/imx/clk-imx8qxp.c +++ b/drivers/clk/imx/clk-imx8qxp.c @@ -346,7 +346,29 @@ static struct platform_driver imx8qxp_clk_driver = { }, .probe = imx8qxp_clk_probe, }; -module_platform_driver(imx8qxp_clk_driver); + +static int __init imx8qxp_clk_init(void) +{ + int ret; + + ret = platform_driver_register(&imx8qxp_clk_driver); + if (ret) + return ret; + + ret = imx_clk_scu_module_init(); + if (ret) + platform_driver_unregister(&imx8qxp_clk_driver); + + return ret; +} +module_init(imx8qxp_clk_init); + +static void __exit imx8qxp_clk_exit(void) +{ + imx_clk_scu_module_exit(); + platform_driver_unregister(&imx8qxp_clk_driver); +} +module_exit(imx8qxp_clk_exit); MODULE_AUTHOR("Aisheng Dong "); MODULE_DESCRIPTION("NXP i.MX8QXP clock driver"); diff --git a/drivers/clk/imx/clk-scu.c b/drivers/clk/imx/clk-scu.c index 33e637ad3579..a85ec48a798b 100644 --- a/drivers/clk/imx/clk-scu.c +++ b/drivers/clk/imx/clk-scu.c @@ -191,6 +191,16 @@ static bool imx_scu_clk_is_valid(u32 rsrc_id) return p != NULL; } +int __init imx_clk_scu_module_init(void) +{ + return platform_driver_register(&imx_clk_scu_driver); +} + +void __exit imx_clk_scu_module_exit(void) +{ + return platform_driver_unregister(&imx_clk_scu_driver); +} + int imx_clk_scu_init(struct device_node *np, const struct imx_clk_scu_rsrc_table *data) { @@ -215,7 +225,7 @@ int imx_clk_scu_init(struct device_node *np, rsrc_table = data; } - return platform_driver_register(&imx_clk_scu_driver); + return 0; } /* diff --git a/drivers/clk/imx/clk-scu.h b/drivers/clk/imx/clk-scu.h index af7b697f51ca..ca82f2cce897 100644 --- a/drivers/clk/imx/clk-scu.h +++ b/drivers/clk/imx/clk-scu.h @@ -25,6 +25,8 @@ extern const struct imx_clk_scu_rsrc_table imx_clk_scu_rsrc_imx8dxl; extern const struct imx_clk_scu_rsrc_table imx_clk_scu_rsrc_imx8qxp; extern const struct imx_clk_scu_rsrc_table imx_clk_scu_rsrc_imx8qm; +int __init imx_clk_scu_module_init(void); +void __exit imx_clk_scu_module_exit(void); int imx_clk_scu_init(struct device_node *np, const struct imx_clk_scu_rsrc_table *data); struct clk_hw *imx_scu_of_clk_src_get(struct of_phandle_args *clkspec, From fb73d0e19f12b793bfe013171d931587f37e3552 Mon Sep 17 00:00:00 2001 From: Shyam Sundar S K Date: Mon, 23 Feb 2026 13:10:20 +0530 Subject: [PATCH 177/359] MAINTAINERS: Update AMD XGBE driver maintainers Due to additional responsibilities, Shyam Sundar S K will no longer be supporting the AMD XGBE driver. Maintenance will be handled by Raju Rangoju going forward. Cc: Raju Rangoju Signed-off-by: Shyam Sundar S K Link: https://patch.msgid.link/20260223074020.1987884-1-Shyam-sundar.S-k@amd.com Signed-off-by: Paolo Abeni --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index e1a6c6cc2b7c..cf0313ca6039 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1292,7 +1292,6 @@ F: include/trace/events/amdxdna.h F: include/uapi/drm/amdxdna_accel.h AMD XGBE DRIVER -M: "Shyam Sundar S K" M: Raju Rangoju L: netdev@vger.kernel.org S: Maintained From ab39cc4cb8ceecdc2b61747433e7237f1ac2b789 Mon Sep 17 00:00:00 2001 From: David Arcari Date: Tue, 24 Feb 2026 07:21:06 -0500 Subject: [PATCH 178/359] cpufreq: intel_pstate: Fix NULL pointer dereference in update_cpu_qos_request() The update_cpu_qos_request() function attempts to initialize the 'freq' variable by dereferencing 'cpudata' before verifying if the 'policy' is valid. This issue occurs on systems booted with the "nosmt" parameter, where all_cpu_data[cpu] is NULL for the SMT sibling threads. As a result, any call to update_qos_requests() will result in a NULL pointer dereference as the code will attempt to access pstate.turbo_freq using the NULL cpudata pointer. Also, pstate.turbo_freq may be updated by intel_pstate_get_hwp_cap() after initializing the 'freq' variable, so it is better to defer the 'freq' until intel_pstate_get_hwp_cap() has been called. Fix this by deferring the 'freq' assignment until after the policy and driver_data have been validated. Fixes: ae1bdd23b99f ("cpufreq: intel_pstate: Adjust frequency percentage computations") Reported-by: Jirka Hladky Closes: https://lore.kernel.org/all/CAE4VaGDfiPvz3AzrwrwM4kWB3SCkMci25nPO8W1JmTBd=xHzZg@mail.gmail.com/ Signed-off-by: David Arcari Cc: 6.18+ # 6.18+ [ rjw: Added one paragraph to the changelog ] Link: https://patch.msgid.link/20260224122106.228116-1-darcari@redhat.com Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index a48af3540c74..bdc37080d319 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1647,8 +1647,8 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b, static void update_cpu_qos_request(int cpu, enum freq_qos_req_type type) { struct cpudata *cpudata = all_cpu_data[cpu]; - unsigned int freq = cpudata->pstate.turbo_freq; struct freq_qos_request *req; + unsigned int freq; struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu); if (!policy) @@ -1661,6 +1661,8 @@ static void update_cpu_qos_request(int cpu, enum freq_qos_req_type type) if (hwp_active) intel_pstate_get_hwp_cap(cpudata); + freq = cpudata->pstate.turbo_freq; + if (type == FREQ_QOS_MIN) { freq = DIV_ROUND_UP(freq * global.min_perf_pct, 100); } else { From 5ede90206273ff156a778254f0f972a55e973c89 Mon Sep 17 00:00:00 2001 From: Sofia Schneider Date: Sun, 22 Feb 2026 23:52:40 -0300 Subject: [PATCH 179/359] ACPI: OSI: Add DMI quirk for Acer Aspire One D255 The screen backlight turns off during boot (specifically during udev device initialization) when returning true for _OSI("Windows 2009"). Analyzing the device's DSDT reveals that the firmware takes a different code path when Windows 7 is reported, which leads to the backlight shutoff. Add a DMI quirk to invoke dmi_disable_osi_win7 for this model. Signed-off-by: Sofia Schneider Link: https://patch.msgid.link/20260223025240.518509-1-sofia@schn.dev Signed-off-by: Rafael J. Wysocki --- drivers/acpi/osi.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/acpi/osi.c b/drivers/acpi/osi.c index f2c943b934be..9470f1830ff5 100644 --- a/drivers/acpi/osi.c +++ b/drivers/acpi/osi.c @@ -389,6 +389,19 @@ static const struct dmi_system_id acpi_osi_dmi_table[] __initconst = { }, }, + /* + * The screen backlight turns off during udev device creation + * when returning true for _OSI("Windows 2009") + */ + { + .callback = dmi_disable_osi_win7, + .ident = "Acer Aspire One D255", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "AOD255"), + }, + }, + /* * The wireless hotkey does not work on those machines when * returning true for _OSI("Windows 2012") From 297318a1c26dabb5a2d8540fdf436c22094eb2d7 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 24 Feb 2026 12:52:18 +0100 Subject: [PATCH 180/359] spi: dt-bindings: snps,dw-abp-ssi: Remove unused bindings As stated in the da0a672268b3 ("spi: dw: Remove not-going-to-be-supported code for Baikal SoC") the Baikal platforms are not supported and the respective driver code was removed. Remove the currently unused bindings. Signed-off-by: Andy Shevchenko Link: https://patch.msgid.link/20260224115218.3499222-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- .../bindings/spi/snps,dw-apb-ssi.yaml | 31 +------------------ 1 file changed, 1 insertion(+), 30 deletions(-) diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml index 81838577cf9c..8ebebcebca16 100644 --- a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml +++ b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml @@ -22,21 +22,6 @@ allOf: properties: reg: minItems: 2 - - if: - properties: - compatible: - contains: - enum: - - baikal,bt1-sys-ssi - then: - properties: - mux-controls: - maxItems: 1 - required: - - mux-controls - else: - required: - - interrupts - if: properties: compatible: @@ -75,10 +60,6 @@ properties: const: intel,mountevans-imc-ssi - description: AMD Pensando Elba SoC SPI Controller const: amd,pensando-elba-spi - - description: Baikal-T1 SPI Controller - const: baikal,bt1-ssi - - description: Baikal-T1 System Boot SPI Controller - const: baikal,bt1-sys-ssi - description: Canaan Kendryte K210 SoS SPI Controller const: canaan,k210-spi - description: Renesas RZ/N1 SPI Controller @@ -170,6 +151,7 @@ required: - "#address-cells" - "#size-cells" - clocks + - interrupts examples: - | @@ -190,15 +172,4 @@ examples: rx-sample-delay-ns = <7>; }; }; - - | - spi@1f040100 { - compatible = "baikal,bt1-sys-ssi"; - reg = <0x1f040100 0x900>, - <0x1c000000 0x1000000>; - #address-cells = <1>; - #size-cells = <0>; - mux-controls = <&boot_mux>; - clocks = <&ccu_sys>; - clock-names = "ssi_clk"; - }; ... From 96a1fd0d84b17360840f344826897fa71049870e Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Thu, 12 Feb 2026 14:50:38 -0700 Subject: [PATCH 181/359] cxl: Fix race of nvdimm_bus object when creating nvdimm objects Found issue during running of cxl-translate.sh unit test. Adding a 3s sleep right before the test seems to make the issue reproduce fairly consistently. The cxl_translate module has dependency on cxl_acpi and causes orphaned nvdimm objects to reprobe after cxl_acpi is removed. The nvdimm_bus object is registered by the cxl_nvb object when cxl_acpi_probe() is called. With the nvdimm_bus object missing, __nd_device_register() will trigger NULL pointer dereference when accessing the dev->parent that points to &nvdimm_bus->dev. [ 192.884510] BUG: kernel NULL pointer dereference, address: 000000000000006c [ 192.895383] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250812-19.fc42 08/12/2025 [ 192.897721] Workqueue: cxl_port cxl_bus_rescan_queue [cxl_core] [ 192.899459] RIP: 0010:kobject_get+0xc/0x90 [ 192.924871] Call Trace: [ 192.925959] [ 192.926976] ? pm_runtime_init+0xb9/0xe0 [ 192.929712] __nd_device_register.part.0+0x4d/0xc0 [libnvdimm] [ 192.933314] __nvdimm_create+0x206/0x290 [libnvdimm] [ 192.936662] cxl_nvdimm_probe+0x119/0x1d0 [cxl_pmem] [ 192.940245] cxl_bus_probe+0x1a/0x60 [cxl_core] [ 192.943349] really_probe+0xde/0x380 This patch also relies on the previous change where devm_cxl_add_nvdimm_bridge() is called from drivers/cxl/pmem.c instead of drivers/cxl/core.c to ensure the dependency of cxl_acpi on cxl_pmem. 1. Set probe_type of cxl_nvb to PROBE_FORCE_SYNCHRONOUS to ensure the driver is probed synchronously when add_device() is called. 2. Add a check in __devm_cxl_add_nvdimm_bridge() to ensure that the cxl_nvb driver is attached during cxl_acpi_probe(). 3. Take the cxl_root uport_dev lock and the cxl_nvb->dev lock in devm_cxl_add_nvdimm() before checking nvdimm_bus is valid. 4. Set cxl_nvdimm flag to CXL_NVD_F_INVALIDATED so cxl_nvdimm_probe() will exit with -EBUSY. The removal of cxl_nvdimm devices should prevent any orphaned devices from probing once the nvdimm_bus is gone. [ dj: Fixed 0-day reported kdoc issue. ] [ dj: Fix cxl_nvb reference leak on error. Gregory (kreview-0811365) ] Suggested-by: Dan Williams Fixes: 8fdcb1704f61 ("cxl/pmem: Add initial infrastructure for pmem support") Tested-by: Alison Schofield Reviewed-by: Alison Schofield Link: https://patch.msgid.link/20260205001633.1813643-3-dave.jiang@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/pmem.c | 29 +++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 5 +++++ drivers/cxl/pmem.c | 10 ++++++++-- 3 files changed, 42 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index b652e3486038..68462e38a977 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -115,6 +115,15 @@ static void unregister_nvb(void *_cxl_nvb) device_unregister(&cxl_nvb->dev); } +static bool cxl_nvdimm_bridge_failed_attach(struct cxl_nvdimm_bridge *cxl_nvb) +{ + struct device *dev = &cxl_nvb->dev; + + guard(device)(dev); + /* If the device has no driver, then it failed to attach. */ + return dev->driver == NULL; +} + struct cxl_nvdimm_bridge *__devm_cxl_add_nvdimm_bridge(struct device *host, struct cxl_port *port) { @@ -138,6 +147,11 @@ struct cxl_nvdimm_bridge *__devm_cxl_add_nvdimm_bridge(struct device *host, if (rc) goto err; + if (cxl_nvdimm_bridge_failed_attach(cxl_nvb)) { + unregister_nvb(cxl_nvb); + return ERR_PTR(-ENODEV); + } + rc = devm_add_action_or_reset(host, unregister_nvb, cxl_nvb); if (rc) return ERR_PTR(rc); @@ -248,6 +262,21 @@ int devm_cxl_add_nvdimm(struct device *host, struct cxl_port *port, if (!cxl_nvb) return -ENODEV; + /* + * Take the uport_dev lock to guard against race of nvdimm_bus object. + * cxl_acpi_probe() registers the nvdimm_bus and is done under the + * root port uport_dev lock. + * + * Take the cxl_nvb device lock to ensure that cxl_nvb driver is in a + * consistent state. And the driver registers nvdimm_bus. + */ + guard(device)(cxl_nvb->port->uport_dev); + guard(device)(&cxl_nvb->dev); + if (!cxl_nvb->nvdimm_bus) { + rc = -ENODEV; + goto err_alloc; + } + cxl_nvd = cxl_nvdimm_alloc(cxl_nvb, cxlmd); if (IS_ERR(cxl_nvd)) { rc = PTR_ERR(cxl_nvd); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index f5850800f400..9b947286eb9b 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -574,11 +574,16 @@ struct cxl_nvdimm_bridge { #define CXL_DEV_ID_LEN 19 +enum { + CXL_NVD_F_INVALIDATED = 0, +}; + struct cxl_nvdimm { struct device dev; struct cxl_memdev *cxlmd; u8 dev_id[CXL_DEV_ID_LEN]; /* for nvdimm, string of 'serial' */ u64 dirty_shutdowns; + unsigned long flags; }; struct cxl_pmem_region_mapping { diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c index c67b30b516bf..082ec0f1c3a0 100644 --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -14,7 +14,7 @@ static __read_mostly DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX); /** - * __devm_cxl_add_nvdimm_bridge() - add the root of a LIBNVDIMM topology + * devm_cxl_add_nvdimm_bridge() - add the root of a LIBNVDIMM topology * @host: platform firmware root device * @port: CXL port at the root of a CXL topology * @@ -143,6 +143,9 @@ static int cxl_nvdimm_probe(struct device *dev) struct nvdimm *nvdimm; int rc; + if (test_bit(CXL_NVD_F_INVALIDATED, &cxl_nvd->flags)) + return -EBUSY; + set_exclusive_cxl_commands(mds, exclusive_cmds); rc = devm_add_action_or_reset(dev, clear_exclusive, mds); if (rc) @@ -323,8 +326,10 @@ static int detach_nvdimm(struct device *dev, void *data) scoped_guard(device, dev) { if (dev->driver) { cxl_nvd = to_cxl_nvdimm(dev); - if (cxl_nvd->cxlmd && cxl_nvd->cxlmd->cxl_nvb == data) + if (cxl_nvd->cxlmd && cxl_nvd->cxlmd->cxl_nvb == data) { release = true; + set_bit(CXL_NVD_F_INVALIDATED, &cxl_nvd->flags); + } } } if (release) @@ -367,6 +372,7 @@ static struct cxl_driver cxl_nvdimm_bridge_driver = { .probe = cxl_nvdimm_bridge_probe, .id = CXL_DEVICE_NVDIMM_BRIDGE, .drv = { + .probe_type = PROBE_FORCE_SYNCHRONOUS, .suppress_bind_attrs = true, }, }; From 60b5d1f68338aff2c5af0113f04aefa7169c50c2 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Thu, 19 Feb 2026 16:16:17 -0800 Subject: [PATCH 182/359] cxl/mbox: validate payload size before accessing contents in cxl_payload_from_user_allowed() cxl_payload_from_user_allowed() casts and dereferences the input payload without first verifying its size. When a raw mailbox command is sent with an undersized payload (ie: 1 byte for CXL_MBOX_OP_CLEAR_LOG, which expects a 16-byte UUID), uuid_equal() reads past the allocated buffer, triggering a KASAN splat: BUG: KASAN: slab-out-of-bounds in memcmp+0x176/0x1d0 lib/string.c:683 Read of size 8 at addr ffff88810130f5c0 by task syz.1.62/2258 CPU: 2 UID: 0 PID: 2258 Comm: syz.1.62 Not tainted 6.19.0-dirty #3 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xab/0xe0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xce/0x650 mm/kasan/report.c:482 kasan_report+0xce/0x100 mm/kasan/report.c:595 memcmp+0x176/0x1d0 lib/string.c:683 uuid_equal include/linux/uuid.h:73 [inline] cxl_payload_from_user_allowed drivers/cxl/core/mbox.c:345 [inline] cxl_mbox_cmd_ctor drivers/cxl/core/mbox.c:368 [inline] cxl_validate_cmd_from_user drivers/cxl/core/mbox.c:522 [inline] cxl_send_cmd+0x9c0/0xb50 drivers/cxl/core/mbox.c:643 __cxl_memdev_ioctl drivers/cxl/core/memdev.c:698 [inline] cxl_memdev_ioctl+0x14f/0x190 drivers/cxl/core/memdev.c:713 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xa8/0x330 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fdaf331ba79 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdaf1d77038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fdaf3585fa0 RCX: 00007fdaf331ba79 RDX: 00002000000001c0 RSI: 00000000c030ce02 RDI: 0000000000000003 RBP: 00007fdaf33749df R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fdaf3586038 R14: 00007fdaf3585fa0 R15: 00007ffced2af768 Add 'in_size' parameter to cxl_payload_from_user_allowed() and validate the payload is large enough. Fixes: 6179045ccc0c ("cxl/mbox: Block immediate mode in SET_PARTITION_INFO command") Fixes: 206f9fa9d555 ("cxl/mbox: Add Clear Log mailbox command") Signed-off-by: Davidlohr Bueso Reviewed-by: Alison Schofield Reviewed-by: Dave Jiang Link: https://patch.msgid.link/20260220001618.963490-2-dave@stgolabs.net Signed-off-by: Dave Jiang --- drivers/cxl/core/mbox.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index fa6dd0c94656..e7a6452bf544 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -311,6 +311,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode) * cxl_payload_from_user_allowed() - Check contents of in_payload. * @opcode: The mailbox command opcode. * @payload_in: Pointer to the input payload passed in from user space. + * @in_size: Size of @payload_in in bytes. * * Return: * * true - payload_in passes check for @opcode. @@ -325,12 +326,15 @@ static bool cxl_mem_raw_command_allowed(u16 opcode) * * The specific checks are determined by the opcode. */ -static bool cxl_payload_from_user_allowed(u16 opcode, void *payload_in) +static bool cxl_payload_from_user_allowed(u16 opcode, void *payload_in, + size_t in_size) { switch (opcode) { case CXL_MBOX_OP_SET_PARTITION_INFO: { struct cxl_mbox_set_partition_info *pi = payload_in; + if (in_size < sizeof(*pi)) + return false; if (pi->flags & CXL_SET_PARTITION_IMMEDIATE_FLAG) return false; break; @@ -338,6 +342,8 @@ static bool cxl_payload_from_user_allowed(u16 opcode, void *payload_in) case CXL_MBOX_OP_CLEAR_LOG: { const uuid_t *uuid = (uuid_t *)payload_in; + if (in_size < sizeof(uuid_t)) + return false; /* * Restrict the ‘Clear log’ action to only apply to * Vendor debug logs. @@ -365,7 +371,8 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox_cmd, if (IS_ERR(mbox_cmd->payload_in)) return PTR_ERR(mbox_cmd->payload_in); - if (!cxl_payload_from_user_allowed(opcode, mbox_cmd->payload_in)) { + if (!cxl_payload_from_user_allowed(opcode, mbox_cmd->payload_in, + in_size)) { dev_dbg(cxl_mbox->host, "%s: input payload not allowed\n", cxl_mem_opcode_to_name(opcode)); kvfree(mbox_cmd->payload_in); From 0a70b7cd397e545e926c93715ff6366b67c716f6 Mon Sep 17 00:00:00 2001 From: Alison Schofield Date: Mon, 23 Feb 2026 11:13:39 -0800 Subject: [PATCH 183/359] cxl: Test CXL_DECODER_F_LOCK as a bitmask The CXL decoder flags are defined as bitmasks, not bit indices. Using test_bit() to check them interprets the mask value as a bit index, which is the wrong test. For CXL_DECODER_F_LOCK the test reads beyond the defined bits, causing the test to always return false and allowing resets that should have been blocked. Replace test_bit() with a bitmask check. Fixes: 2230c4bdc412 ("cxl: Add handling of locked CXL decoder") Signed-off-by: Alison Schofield Reviewed-by: Gregory Price Tested-by: Gregory Price Link: https://patch.msgid.link/98851c4770e4631753cf9f75b58a3a6daeca2ea2.1771873256.git.alison.schofield@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/hdm.c | 2 +- drivers/cxl/core/region.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index e3f0c39e6812..c222e98ae736 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -904,7 +904,7 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld) if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) return; - if (test_bit(CXL_DECODER_F_LOCK, &cxld->flags)) + if (cxld->flags & CXL_DECODER_F_LOCK) return; if (port->commit_end == id) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index fec37af1dfbf..780ec947ecf2 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1100,7 +1100,7 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr, static void cxl_region_setup_flags(struct cxl_region *cxlr, struct cxl_decoder *cxld) { - if (test_bit(CXL_DECODER_F_LOCK, &cxld->flags)) { + if (cxld->flags & CXL_DECODER_F_LOCK) { set_bit(CXL_REGION_F_LOCK, &cxlr->flags); clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } From e46f25f5a81f6f1a9ab93bcda80d5dfaea9f4897 Mon Sep 17 00:00:00 2001 From: Alison Schofield Date: Mon, 23 Feb 2026 11:13:40 -0800 Subject: [PATCH 184/359] cxl/region: Test CXL_DECODER_F_NORMALIZED_ADDRESSING as a bitmask The CXL decoder flags are defined as bitmasks, not bit indices. Using test_bit() to check them interprets the mask value as a bit index, which is the wrong test. For CXL_DECODER_F_NORMALIZED_ADDRESSING the test reads beyond the flags word, making the flag sometimes appear set and blocking creation of CXL region debugfs attributes that support poison operations. Replace test_bit() with a bitmask check. Found with cxl-test. Fixes: 208f432406b7 ("cxl: Disable HPA/SPA translation handlers for Normalized Addressing") Signed-off-by: Alison Schofield Reviewed-by: Gregory Price Tested-by: Gregory Price Link: https://patch.msgid.link/63fe4a6203e40e404347f1cdc7a1c55cb4959b86.1771873256.git.alison.schofield@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 780ec947ecf2..42874948b589 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1105,7 +1105,7 @@ static void cxl_region_setup_flags(struct cxl_region *cxlr, clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } - if (test_bit(CXL_DECODER_F_NORMALIZED_ADDRESSING, &cxld->flags)) + if (cxld->flags & CXL_DECODER_F_NORMALIZED_ADDRESSING) set_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags); } From 43277d8f57b490550d33f6f4cb73c28ec6024696 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:19 -0800 Subject: [PATCH 185/359] selftests/bpf: Replace strncpy() with strscpy() strncpy() does not guarantee NULL-termination and is considered deprecated [1]. Replace strncpy() calls with strscpy(). [1] https://docs.kernel.org/process/deprecated.html#strncpy-on-nul-terminated-strings Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-4-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/network_helpers.c | 3 +-- tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 3 +-- tools/testing/selftests/bpf/prog_tests/flow_dissector.c | 4 ++-- tools/testing/selftests/bpf/prog_tests/queue_stack_map.c | 4 ++-- tools/testing/selftests/bpf/prog_tests/skc_to_unix_sock.c | 2 +- tools/testing/selftests/bpf/prog_tests/task_local_data.h | 2 +- tools/testing/selftests/bpf/prog_tests/tc_redirect.c | 2 +- tools/testing/selftests/bpf/test_progs.c | 2 +- tools/testing/selftests/bpf/xdp_hw_metadata.c | 4 ++-- 9 files changed, 12 insertions(+), 14 deletions(-) diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index 5374b7e16d53..b82f572641b7 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -581,8 +581,7 @@ int open_tuntap(const char *dev_name, bool need_mac) return -1; ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN); - strncpy(ifr.ifr_name, dev_name, IFNAMSIZ - 1); - ifr.ifr_name[IFNAMSIZ - 1] = '\0'; + strscpy(ifr.ifr_name, dev_name); err = ioctl(fd, TUNSETIFF, &ifr); if (!ASSERT_OK(err, "ioctl(TUNSETIFF)")) { diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index 5225d69bf79b..c69080ca14f5 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -346,8 +346,7 @@ static void test_task_sleepable(void) close(finish_pipe[1]); test_data = malloc(sizeof(char) * 10); - strncpy(test_data, "test_data", 10); - test_data[9] = '\0'; + strscpy(test_data, "test_data", 10); test_data_long = malloc(sizeof(char) * 5000); for (int i = 0; i < 5000; ++i) { diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c index 08bae13248c4..fb4892681464 100644 --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c @@ -570,7 +570,7 @@ static int create_tap(const char *ifname) }; int fd, ret; - strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name)); + strscpy(ifr.ifr_name, ifname); fd = open("/dev/net/tun", O_RDWR); if (fd < 0) @@ -599,7 +599,7 @@ static int ifup(const char *ifname) struct ifreq ifr = {}; int sk, ret; - strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name)); + strscpy(ifr.ifr_name, ifname); sk = socket(PF_INET, SOCK_DGRAM, 0); if (sk < 0) diff --git a/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c b/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c index a043af9cd6d9..41441325e179 100644 --- a/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c +++ b/tools/testing/selftests/bpf/prog_tests/queue_stack_map.c @@ -28,9 +28,9 @@ static void test_queue_stack_map_by_type(int type) vals[i] = rand(); if (type == QUEUE) - strncpy(file, "./test_queue_map.bpf.o", sizeof(file)); + strscpy(file, "./test_queue_map.bpf.o"); else if (type == STACK) - strncpy(file, "./test_stack_map.bpf.o", sizeof(file)); + strscpy(file, "./test_stack_map.bpf.o"); else return; diff --git a/tools/testing/selftests/bpf/prog_tests/skc_to_unix_sock.c b/tools/testing/selftests/bpf/prog_tests/skc_to_unix_sock.c index 3eefdfed1db9..657d897958b6 100644 --- a/tools/testing/selftests/bpf/prog_tests/skc_to_unix_sock.c +++ b/tools/testing/selftests/bpf/prog_tests/skc_to_unix_sock.c @@ -34,7 +34,7 @@ void test_skc_to_unix_sock(void) memset(&sockaddr, 0, sizeof(sockaddr)); sockaddr.sun_family = AF_UNIX; - strncpy(sockaddr.sun_path, sock_path, strlen(sock_path)); + strscpy(sockaddr.sun_path, sock_path); sockaddr.sun_path[0] = '\0'; err = bind(sockfd, (struct sockaddr *)&sockaddr, sizeof(sockaddr)); diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h index 0f86b9275cf9..8342e2fe5260 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h @@ -262,7 +262,7 @@ retry: if (!atomic_compare_exchange_strong(&tld_meta_p->cnt, &cnt, cnt + 1)) goto retry; - strncpy(tld_meta_p->metadata[i].name, name, TLD_NAME_LEN); + strscpy(tld_meta_p->metadata[i].name, name); atomic_store(&tld_meta_p->metadata[i].size, size); return (tld_key_t){(__s16)off}; } diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c index 76d72a59365e..64fbda082309 100644 --- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c @@ -1095,7 +1095,7 @@ static int tun_open(char *name) ifr.ifr_flags = IFF_TUN | IFF_NO_PI; if (*name) - strncpy(ifr.ifr_name, name, IFNAMSIZ); + strscpy(ifr.ifr_name, name); err = ioctl(fd, TUNSETIFF, &ifr); if (!ASSERT_OK(err, "ioctl TUNSETIFF")) diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c index 02a85dda30e6..d1418ec1f351 100644 --- a/tools/testing/selftests/bpf/test_progs.c +++ b/tools/testing/selftests/bpf/test_progs.c @@ -1799,7 +1799,7 @@ static int worker_main_send_subtests(int sock, struct test_state *state) msg.subtest_done.num = i; - strncpy(msg.subtest_done.name, subtest_state->name, MAX_SUBTEST_NAME); + strscpy(msg.subtest_done.name, subtest_state->name, MAX_SUBTEST_NAME); msg.subtest_done.error_cnt = subtest_state->error_cnt; msg.subtest_done.skipped = subtest_state->skipped; diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c index 3d8de0d4c96a..6db3b5555a22 100644 --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -550,7 +550,7 @@ static int rxq_num(const char *ifname) struct ifreq ifr = { .ifr_data = (void *)&ch, }; - strncpy(ifr.ifr_name, ifname, IF_NAMESIZE - 1); + strscpy(ifr.ifr_name, ifname); int fd, ret; fd = socket(AF_UNIX, SOCK_DGRAM, 0); @@ -571,7 +571,7 @@ static void hwtstamp_ioctl(int op, const char *ifname, struct hwtstamp_config *c struct ifreq ifr = { .ifr_data = (void *)cfg, }; - strncpy(ifr.ifr_name, ifname, IF_NAMESIZE - 1); + strscpy(ifr.ifr_name, ifname); int fd, ret; fd = socket(AF_UNIX, SOCK_DGRAM, 0); From 3ed0bc2d4994c718fb5476b45d4aafec42a1d040 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:20 -0800 Subject: [PATCH 186/359] selftests/bpf: Use strscpy in bpftool_helpers.c Replace strncpy() calls in bpftool_helpers.c with strscpy(). Pass the destination buffer size to detect_bpftool_path() instead of hardcoding BPFTOOL_PATH_MAX_LEN. Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-5-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/bpftool_helpers.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/bpftool_helpers.c b/tools/testing/selftests/bpf/bpftool_helpers.c index a5824945a4a5..595a636fa13b 100644 --- a/tools/testing/selftests/bpf/bpftool_helpers.c +++ b/tools/testing/selftests/bpf/bpftool_helpers.c @@ -1,15 +1,17 @@ // SPDX-License-Identifier: GPL-2.0-only -#include "bpftool_helpers.h" #include #include #include +#include "bpf_util.h" +#include "bpftool_helpers.h" + #define BPFTOOL_PATH_MAX_LEN 64 #define BPFTOOL_FULL_CMD_MAX_LEN 512 #define BPFTOOL_DEFAULT_PATH "tools/sbin/bpftool" -static int detect_bpftool_path(char *buffer) +static int detect_bpftool_path(char *buffer, size_t size) { char tmp[BPFTOOL_PATH_MAX_LEN]; @@ -18,7 +20,7 @@ static int detect_bpftool_path(char *buffer) */ snprintf(tmp, BPFTOOL_PATH_MAX_LEN, "./%s", BPFTOOL_DEFAULT_PATH); if (access(tmp, X_OK) == 0) { - strncpy(buffer, tmp, BPFTOOL_PATH_MAX_LEN); + strscpy(buffer, tmp, size); return 0; } @@ -27,7 +29,7 @@ static int detect_bpftool_path(char *buffer) */ snprintf(tmp, BPFTOOL_PATH_MAX_LEN, "../%s", BPFTOOL_DEFAULT_PATH); if (access(tmp, X_OK) == 0) { - strncpy(buffer, tmp, BPFTOOL_PATH_MAX_LEN); + strscpy(buffer, tmp, size); return 0; } @@ -44,7 +46,7 @@ static int run_command(char *args, char *output_buf, size_t output_max_len) int ret; /* Detect and cache bpftool binary location */ - if (bpftool_path[0] == 0 && detect_bpftool_path(bpftool_path)) + if (bpftool_path[0] == 0 && detect_bpftool_path(bpftool_path, sizeof(bpftool_path))) return 1; ret = snprintf(command, BPFTOOL_FULL_CMD_MAX_LEN, "%s %s%s", From 9d8685239e85ba5a4899b5b3534c732e3ae5f2aa Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:21 -0800 Subject: [PATCH 187/359] selftests/bpf: Use memcpy() for bounded non-NULL-terminated copies Replace strncpy() with memcpy() in cases where the source is non-NULL-terminated and the copy length is known. Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-6-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c | 6 ++++-- tools/testing/selftests/bpf/test_verifier.c | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c index dd75ccb03770..469e92869523 100644 --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c @@ -308,8 +308,10 @@ static int find_field_offset(struct btf *btf, char *pattern, regmatch_t *matches return -1; } - strncpy(type_str, type, type_sz); - strncpy(field_str, field, field_sz); + memcpy(type_str, type, type_sz); + type_str[type_sz] = '\0'; + memcpy(field_str, field, field_sz); + field_str[field_sz] = '\0'; btf_id = btf__find_by_name(btf, type_str); if (btf_id < 0) { PRINT_FAIL("No BTF info for type %s\n", type_str); diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 27db34ecf3f5..a8ae03c57bba 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -1320,7 +1320,7 @@ static bool cmp_str_seq(const char *log, const char *exp) printf("FAIL\nTestcase bug\n"); return false; } - strncpy(needle, exp, len); + memcpy(needle, exp, len); needle[len] = 0; q = strstr(log, needle); if (!q) { From 4021848a903e65f74cf88997c12b96d6bc2453e6 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:22 -0800 Subject: [PATCH 188/359] selftests/bpf: Pass through build flags to bpftool and resolve_btfids EXTRA_* and SAN_* build flags were not correctly propagated to bpftool and resolve_btids when building selftests/bpf. This led to various build errors on attempt to build with SAN_CFLAGS="-fsanitize=address", for example. Fix the makefiles to address this: - Pass SAN_CFLAGS/SAN_LDFLAGS to bpftool and resolve_btfids build - Propagate EXTRA_LDFLAGS to resolve_btfids link command - Use pkg-config to detect zlib and zstd for resolve_btfids, similar libelf handling Also check for ASAN flag in selftests/bpf/Makefile for convenience. Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-7-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/bpf/resolve_btfids/Makefile | 7 +++++-- tools/testing/selftests/bpf/Makefile | 13 +++++++++---- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/tools/bpf/resolve_btfids/Makefile b/tools/bpf/resolve_btfids/Makefile index 1733a6e93a07..ef083602b73a 100644 --- a/tools/bpf/resolve_btfids/Makefile +++ b/tools/bpf/resolve_btfids/Makefile @@ -65,6 +65,9 @@ $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(LIBBPF_OU LIBELF_FLAGS := $(shell $(HOSTPKG_CONFIG) libelf --cflags 2>/dev/null) LIBELF_LIBS := $(shell $(HOSTPKG_CONFIG) libelf --libs 2>/dev/null || echo -lelf) +ZLIB_LIBS := $(shell $(HOSTPKG_CONFIG) zlib --libs 2>/dev/null || echo -lz) +ZSTD_LIBS := $(shell $(HOSTPKG_CONFIG) libzstd --libs 2>/dev/null || echo -lzstd) + HOSTCFLAGS_resolve_btfids += -g \ -I$(srctree)/tools/include \ -I$(srctree)/tools/include/uapi \ @@ -73,7 +76,7 @@ HOSTCFLAGS_resolve_btfids += -g \ $(LIBELF_FLAGS) \ -Wall -Werror -LIBS = $(LIBELF_LIBS) -lz +LIBS = $(LIBELF_LIBS) $(ZLIB_LIBS) $(ZSTD_LIBS) export srctree OUTPUT HOSTCFLAGS_resolve_btfids Q HOSTCC HOSTLD HOSTAR include $(srctree)/tools/build/Makefile.include @@ -83,7 +86,7 @@ $(BINARY_IN): fixdep FORCE prepare | $(OUTPUT) $(BINARY): $(BPFOBJ) $(SUBCMDOBJ) $(BINARY_IN) $(call msg,LINK,$@) - $(Q)$(HOSTCC) $(BINARY_IN) $(KBUILD_HOSTLDFLAGS) -o $@ $(BPFOBJ) $(SUBCMDOBJ) $(LIBS) + $(Q)$(HOSTCC) $(BINARY_IN) $(KBUILD_HOSTLDFLAGS) $(EXTRA_LDFLAGS) -o $@ $(BPFOBJ) $(SUBCMDOBJ) $(LIBS) clean_objects := $(wildcard $(OUTPUT)/*.o \ $(OUTPUT)/.*.o.cmd \ diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 6776158f1f3e..72a9ba41f95e 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -27,7 +27,11 @@ ifneq ($(wildcard $(GENHDR)),) endif BPF_GCC ?= $(shell command -v bpf-gcc;) +ifdef ASAN +SAN_CFLAGS ?= -fsanitize=address -fno-omit-frame-pointer +else SAN_CFLAGS ?= +endif SAN_LDFLAGS ?= $(SAN_CFLAGS) RELEASE ?= OPT_FLAGS ?= $(if $(RELEASE),-O2,-O0) @@ -326,8 +330,8 @@ $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/bpftool $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) \ ARCH= CROSS_COMPILE= CC="$(HOSTCC)" LD="$(HOSTLD)" \ - EXTRA_CFLAGS='-g $(OPT_FLAGS) $(EXTRA_CFLAGS)' \ - EXTRA_LDFLAGS='$(EXTRA_LDFLAGS)' \ + EXTRA_CFLAGS='-g $(OPT_FLAGS) $(SAN_CFLAGS) $(EXTRA_CFLAGS)' \ + EXTRA_LDFLAGS='$(SAN_LDFLAGS) $(EXTRA_LDFLAGS)' \ OUTPUT=$(HOST_BUILD_DIR)/bpftool/ \ LIBBPF_OUTPUT=$(HOST_BUILD_DIR)/libbpf/ \ LIBBPF_DESTDIR=$(HOST_SCRATCH_DIR)/ \ @@ -338,8 +342,8 @@ $(CROSS_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ $(BPFOBJ) | $(BUILD_DIR)/bpftool $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) \ ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) \ - EXTRA_CFLAGS='-g $(OPT_FLAGS) $(EXTRA_CFLAGS)' \ - EXTRA_LDFLAGS='$(EXTRA_LDFLAGS)' \ + EXTRA_CFLAGS='-g $(OPT_FLAGS) $(SAN_CFLAGS) $(EXTRA_CFLAGS)' \ + EXTRA_LDFLAGS='$(SAN_LDFLAGS) $(EXTRA_LDFLAGS)' \ OUTPUT=$(BUILD_DIR)/bpftool/ \ LIBBPF_OUTPUT=$(BUILD_DIR)/libbpf/ \ LIBBPF_DESTDIR=$(SCRATCH_DIR)/ \ @@ -404,6 +408,7 @@ $(RESOLVE_BTFIDS): $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/resolve_btfids \ $(Q)$(MAKE) $(submake_extras) -C $(TOOLSDIR)/bpf/resolve_btfids \ CC="$(HOSTCC)" LD="$(HOSTLD)" AR="$(HOSTAR)" \ LIBBPF_INCLUDE=$(HOST_INCLUDE_DIR) \ + EXTRA_LDFLAGS='$(SAN_LDFLAGS) $(EXTRA_LDFLAGS)' \ OUTPUT=$(HOST_BUILD_DIR)/resolve_btfids/ BPFOBJ=$(HOST_BPFOBJ) # Get Clang's default includes on this system, as opposed to those seen by From c5c1e313493cd836863bb673bb2b8beaa915cad9 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:23 -0800 Subject: [PATCH 189/359] resolve_btfids: Fix memory leaks reported by ASAN Running resolve_btfids with ASAN reveals memory leaks in btf_id handling. - Change get_id() to use a local buffer - Make btf_id__add() strdup the name internally - Add btf_id__free_all() that frees all nodese of a tree - Call the cleanup function on exit for every tree Acked-by: Jiri Olsa Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-8-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/bpf/resolve_btfids/main.c | 81 ++++++++++++++++++++++----------- 1 file changed, 54 insertions(+), 27 deletions(-) diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c index ca7fcd03efb6..5208f650080f 100644 --- a/tools/bpf/resolve_btfids/main.c +++ b/tools/bpf/resolve_btfids/main.c @@ -226,7 +226,7 @@ static struct btf_id *btf_id__find(struct rb_root *root, const char *name) } static struct btf_id *__btf_id__add(struct rb_root *root, - char *name, + const char *name, enum btf_id_kind kind, bool unique) { @@ -250,7 +250,11 @@ static struct btf_id *__btf_id__add(struct rb_root *root, id = zalloc(sizeof(*id)); if (id) { pr_debug("adding symbol %s\n", name); - id->name = name; + id->name = strdup(name); + if (!id->name) { + free(id); + return NULL; + } id->kind = kind; rb_link_node(&id->rb_node, parent, p); rb_insert_color(&id->rb_node, root); @@ -258,17 +262,21 @@ static struct btf_id *__btf_id__add(struct rb_root *root, return id; } -static inline struct btf_id *btf_id__add(struct rb_root *root, char *name, enum btf_id_kind kind) +static inline struct btf_id *btf_id__add(struct rb_root *root, + const char *name, + enum btf_id_kind kind) { return __btf_id__add(root, name, kind, false); } -static inline struct btf_id *btf_id__add_unique(struct rb_root *root, char *name, enum btf_id_kind kind) +static inline struct btf_id *btf_id__add_unique(struct rb_root *root, + const char *name, + enum btf_id_kind kind) { return __btf_id__add(root, name, kind, true); } -static char *get_id(const char *prefix_end) +static int get_id(const char *prefix_end, char *buf, size_t buf_sz) { /* * __BTF_ID__func__vfs_truncate__0 @@ -277,28 +285,28 @@ static char *get_id(const char *prefix_end) */ int len = strlen(prefix_end); int pos = sizeof("__") - 1; - char *p, *id; + char *p; if (pos >= len) - return NULL; + return -1; - id = strdup(prefix_end + pos); - if (id) { - /* - * __BTF_ID__func__vfs_truncate__0 - * id = ^ - * - * cut the unique id part - */ - p = strrchr(id, '_'); - p--; - if (*p != '_') { - free(id); - return NULL; - } - *p = '\0'; - } - return id; + if (len - pos >= buf_sz) + return -1; + + strcpy(buf, prefix_end + pos); + /* + * __BTF_ID__func__vfs_truncate__0 + * buf = ^ + * + * cut the unique id part + */ + p = strrchr(buf, '_'); + p--; + if (*p != '_') + return -1; + *p = '\0'; + + return 0; } static struct btf_id *add_set(struct object *obj, char *name, enum btf_id_kind kind) @@ -335,10 +343,9 @@ static struct btf_id *add_set(struct object *obj, char *name, enum btf_id_kind k static struct btf_id *add_symbol(struct rb_root *root, char *name, size_t size) { - char *id; + char id[KSYM_NAME_LEN]; - id = get_id(name + size); - if (!id) { + if (get_id(name + size, id, sizeof(id))) { pr_err("FAILED to parse symbol name: %s\n", name); return NULL; } @@ -346,6 +353,21 @@ static struct btf_id *add_symbol(struct rb_root *root, char *name, size_t size) return btf_id__add(root, id, BTF_ID_KIND_SYM); } +static void btf_id__free_all(struct rb_root *root) +{ + struct rb_node *next; + struct btf_id *id; + + next = rb_first(root); + while (next) { + id = rb_entry(next, struct btf_id, rb_node); + next = rb_next(&id->rb_node); + rb_erase(&id->rb_node, root); + free(id->name); + free(id); + } +} + static void bswap_32_data(void *data, u32 nr_bytes) { u32 cnt, i; @@ -1547,6 +1569,11 @@ dump_btf: out: btf__free(obj.base_btf); btf__free(obj.btf); + btf_id__free_all(&obj.structs); + btf_id__free_all(&obj.unions); + btf_id__free_all(&obj.typedefs); + btf_id__free_all(&obj.funcs); + btf_id__free_all(&obj.sets); if (obj.efile.elf) { elf_end(obj.efile.elf); close(obj.efile.fd); From 45897ced3c1d3940fb5b68fe48c9e144143ae0ae Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:24 -0800 Subject: [PATCH 190/359] selftests/bpf: Add DENYLIST.asan Add a denylist file for tests that should be skipped when built with userspace ASAN: $ make ... SAN_CFLAGS="-fsanitize=address -fno-omit-frame-pointer" Skip the following tests: - *arena*: userspace ASAN does not understand BPF arena maps and gets confused particularly when map_extra is non-zero - non-zero map_extra leads to mmap with MAP_FIXED, and ASAN treats this as an unknown memory region - task_local_data: ASAN complains about "incorrect" aligned_alloc() usage, but it's intentional in the test - uprobe_multi_test: very slow with ASAN enabled Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-9-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/DENYLIST.asan | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 tools/testing/selftests/bpf/DENYLIST.asan diff --git a/tools/testing/selftests/bpf/DENYLIST.asan b/tools/testing/selftests/bpf/DENYLIST.asan new file mode 100644 index 000000000000..d7fe372a2293 --- /dev/null +++ b/tools/testing/selftests/bpf/DENYLIST.asan @@ -0,0 +1,3 @@ +*arena* +task_local_data +uprobe_multi_test From a1a771bd649212ef32cf9b0bcc63213a762d354a Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:25 -0800 Subject: [PATCH 191/359] selftests/bpf: Refactor bpf_get_ksyms() trace helper ASAN reported a memory leak in bpf_get_ksyms(): it allocates a struct ksyms internally and never frees it. Move struct ksyms to trace_helpers.h and return it from the bpf_get_ksyms(), giving ownership to the caller. Add filtered_syms and filtered_cnt fields to the ksyms to hold the filtered array of symbols, previously returned by bpf_get_ksyms(). Fixup the call sites: kprobe_multi_test and bench_trigger. Signed-off-by: Ihor Solodrai Acked-by: Eduard Zingerman Link: https://lore.kernel.org/r/20260223190736.649171-10-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/benchs/bench_trigger.c | 14 ++++++----- .../bpf/prog_tests/kprobe_multi_test.c | 12 ++++------ tools/testing/selftests/bpf/trace_helpers.c | 23 ++++++++++--------- tools/testing/selftests/bpf/trace_helpers.h | 11 +++++++-- 4 files changed, 34 insertions(+), 26 deletions(-) diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c index aeec9edd3851..f74b313d6ae4 100644 --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c @@ -230,8 +230,8 @@ static void trigger_fentry_setup(void) static void attach_ksyms_all(struct bpf_program *empty, bool kretprobe) { LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); - char **syms = NULL; - size_t cnt = 0; + struct bpf_link *link = NULL; + struct ksyms *ksyms = NULL; /* Some recursive functions will be skipped in * bpf_get_ksyms -> skip_entry, as they can introduce sufficient @@ -241,16 +241,18 @@ static void attach_ksyms_all(struct bpf_program *empty, bool kretprobe) * So, don't run the kprobe-multi-all and kretprobe-multi-all on * a debug kernel. */ - if (bpf_get_ksyms(&syms, &cnt, true)) { + if (bpf_get_ksyms(&ksyms, true)) { fprintf(stderr, "failed to get ksyms\n"); exit(1); } - opts.syms = (const char **) syms; - opts.cnt = cnt; + opts.syms = (const char **)ksyms->filtered_syms; + opts.cnt = ksyms->filtered_cnt; opts.retprobe = kretprobe; /* attach empty to all the kernel functions except bpf_get_numa_node_id. */ - if (!bpf_program__attach_kprobe_multi_opts(empty, NULL, &opts)) { + link = bpf_program__attach_kprobe_multi_opts(empty, NULL, &opts); + free_kallsyms_local(ksyms); + if (!link) { fprintf(stderr, "failed to attach bpf_program__attach_kprobe_multi_opts to all\n"); exit(1); } diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c index 9caef222e528..f81dcd609ee9 100644 --- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c +++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c @@ -456,25 +456,23 @@ static void test_kprobe_multi_bench_attach(bool kernel) { LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); struct kprobe_multi_empty *skel = NULL; - char **syms = NULL; - size_t cnt = 0; + struct ksyms *ksyms = NULL; - if (!ASSERT_OK(bpf_get_ksyms(&syms, &cnt, kernel), "bpf_get_ksyms")) + if (!ASSERT_OK(bpf_get_ksyms(&ksyms, kernel), "bpf_get_ksyms")) return; skel = kprobe_multi_empty__open_and_load(); if (!ASSERT_OK_PTR(skel, "kprobe_multi_empty__open_and_load")) goto cleanup; - opts.syms = (const char **) syms; - opts.cnt = cnt; + opts.syms = (const char **)ksyms->filtered_syms; + opts.cnt = ksyms->filtered_cnt; do_bench_test(skel, &opts); cleanup: kprobe_multi_empty__destroy(skel); - if (syms) - free(syms); + free_kallsyms_local(ksyms); } static void test_kprobe_multi_bench_attach_addr(bool kernel) diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c index eeaab7013ca2..0e63daf83ed5 100644 --- a/tools/testing/selftests/bpf/trace_helpers.c +++ b/tools/testing/selftests/bpf/trace_helpers.c @@ -24,12 +24,6 @@ #define TRACEFS_PIPE "/sys/kernel/tracing/trace_pipe" #define DEBUGFS_PIPE "/sys/kernel/debug/tracing/trace_pipe" -struct ksyms { - struct ksym *syms; - size_t sym_cap; - size_t sym_cnt; -}; - static struct ksyms *ksyms; static pthread_mutex_t ksyms_mutex = PTHREAD_MUTEX_INITIALIZER; @@ -54,6 +48,8 @@ void free_kallsyms_local(struct ksyms *ksyms) if (!ksyms) return; + free(ksyms->filtered_syms); + if (!ksyms->syms) { free(ksyms); return; @@ -610,7 +606,7 @@ static int search_kallsyms_compare(const void *p1, const struct ksym *p2) return compare_name(p1, p2->name); } -int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel) +int bpf_get_ksyms(struct ksyms **ksymsp, bool kernel) { size_t cap = 0, cnt = 0; char *name = NULL, *ksym_name, **syms = NULL; @@ -637,8 +633,10 @@ int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel) else f = fopen("/sys/kernel/debug/tracing/available_filter_functions", "r"); - if (!f) + if (!f) { + free_kallsyms_local(ksyms); return -EINVAL; + } map = hashmap__new(symbol_hash, symbol_equal, NULL); if (IS_ERR(map)) { @@ -679,15 +677,18 @@ int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel) syms[cnt++] = ksym_name; } - *symsp = syms; - *cntp = cnt; + ksyms->filtered_syms = syms; + ksyms->filtered_cnt = cnt; + *ksymsp = ksyms; error: free(name); fclose(f); hashmap__free(map); - if (err) + if (err) { free(syms); + free_kallsyms_local(ksyms); + } return err; } diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h index a5576b2dfc26..d5bf1433675d 100644 --- a/tools/testing/selftests/bpf/trace_helpers.h +++ b/tools/testing/selftests/bpf/trace_helpers.h @@ -23,7 +23,14 @@ struct ksym { long addr; char *name; }; -struct ksyms; + +struct ksyms { + struct ksym *syms; + size_t sym_cap; + size_t sym_cnt; + char **filtered_syms; + size_t filtered_cnt; +}; typedef int (*ksym_cmp_t)(const void *p1, const void *p2); typedef int (*ksym_search_cmp_t)(const void *p1, const struct ksym *p2); @@ -53,7 +60,7 @@ ssize_t get_rel_offset(uintptr_t addr); int read_build_id(const char *path, char *build_id, size_t size); -int bpf_get_ksyms(char ***symsp, size_t *cntp, bool kernel); +int bpf_get_ksyms(struct ksyms **ksymsp, bool kernel); int bpf_get_addrs(unsigned long **addrsp, size_t *cntp, bool kernel); #endif From 9d0272c91fbc3ae86f2c3b6ba0a0b0ee77d862e3 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:26 -0800 Subject: [PATCH 192/359] selftests/bpf: Fix memory leaks in tests Fix trivial memory leaks detected by userspace ASAN: - htab_update: free value buffer in test_reenter_update cleanup - test_xsk: inline pkt_stream_replace() in testapp_stats_rx_full() and testapp_stats_fill_empty() - testing_helpers: free buffer allocated by getline() in parse_test_list_file Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-11-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/prog_tests/htab_update.c | 1 + .../selftests/bpf/prog_tests/test_xsk.c | 24 +++++++++++++++---- tools/testing/selftests/bpf/testing_helpers.c | 1 + 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c index d0b405eb2966..ea1a6766fbe9 100644 --- a/tools/testing/selftests/bpf/prog_tests/htab_update.c +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c @@ -61,6 +61,7 @@ static void test_reenter_update(void) ASSERT_EQ(skel->bss->update_err, -EDEADLK, "no reentrancy"); out: + free(value); htab_update__destroy(skel); } diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c index bab4a31621c7..7e38ec6e656b 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c @@ -2003,9 +2003,17 @@ int testapp_stats_tx_invalid_descs(struct test_spec *test) int testapp_stats_rx_full(struct test_spec *test) { - if (pkt_stream_replace(test, DEFAULT_UMEM_BUFFERS + DEFAULT_UMEM_BUFFERS / 2, MIN_PKT_SIZE)) + struct pkt_stream *tmp; + + tmp = pkt_stream_generate(DEFAULT_UMEM_BUFFERS + DEFAULT_UMEM_BUFFERS / 2, MIN_PKT_SIZE); + if (!tmp) return TEST_FAILURE; - test->ifobj_rx->xsk->pkt_stream = pkt_stream_generate(DEFAULT_UMEM_BUFFERS, MIN_PKT_SIZE); + test->ifobj_tx->xsk->pkt_stream = tmp; + + tmp = pkt_stream_generate(DEFAULT_UMEM_BUFFERS, MIN_PKT_SIZE); + if (!tmp) + return TEST_FAILURE; + test->ifobj_rx->xsk->pkt_stream = tmp; test->ifobj_rx->xsk->rxqsize = DEFAULT_UMEM_BUFFERS; test->ifobj_rx->release_rx = false; @@ -2015,9 +2023,17 @@ int testapp_stats_rx_full(struct test_spec *test) int testapp_stats_fill_empty(struct test_spec *test) { - if (pkt_stream_replace(test, DEFAULT_UMEM_BUFFERS + DEFAULT_UMEM_BUFFERS / 2, MIN_PKT_SIZE)) + struct pkt_stream *tmp; + + tmp = pkt_stream_generate(DEFAULT_UMEM_BUFFERS + DEFAULT_UMEM_BUFFERS / 2, MIN_PKT_SIZE); + if (!tmp) return TEST_FAILURE; - test->ifobj_rx->xsk->pkt_stream = pkt_stream_generate(DEFAULT_UMEM_BUFFERS, MIN_PKT_SIZE); + test->ifobj_tx->xsk->pkt_stream = tmp; + + tmp = pkt_stream_generate(DEFAULT_UMEM_BUFFERS, MIN_PKT_SIZE); + if (!tmp) + return TEST_FAILURE; + test->ifobj_rx->xsk->pkt_stream = tmp; test->ifobj_rx->use_fill_ring = false; test->ifobj_rx->validation_func = validate_fill_empty; diff --git a/tools/testing/selftests/bpf/testing_helpers.c b/tools/testing/selftests/bpf/testing_helpers.c index 16eb37e5bad6..66af0d13751a 100644 --- a/tools/testing/selftests/bpf/testing_helpers.c +++ b/tools/testing/selftests/bpf/testing_helpers.c @@ -212,6 +212,7 @@ int parse_test_list_file(const char *path, break; } + free(buf); fclose(f); return err; } From 3eb4a2e399c6766ea0c8291e2adb5e6dc42b6e68 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:27 -0800 Subject: [PATCH 193/359] selftests/bpf: Fix cleanup in check_fd_array_cnt__fd_array_too_big() The Close() macro uses the passed in expression three times, which leads to repeated execution in case it has side effects. That is, Close(i--) would decrement i three times. ASAN caught a stack-buffer-undeflow error at a point where this was overlooked. Fix it. Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-12-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/fd_array.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/fd_array.c b/tools/testing/selftests/bpf/prog_tests/fd_array.c index c534b4d5f9da..3078d8264deb 100644 --- a/tools/testing/selftests/bpf/prog_tests/fd_array.c +++ b/tools/testing/selftests/bpf/prog_tests/fd_array.c @@ -412,8 +412,8 @@ static void check_fd_array_cnt__fd_array_too_big(void) ASSERT_EQ(prog_fd, -E2BIG, "prog should have been rejected with -E2BIG"); cleanup_fds: - while (i > 0) - Close(extra_fds[--i]); + while (i-- > 0) + Close(extra_fds[i]); } void test_fd_array_cnt(void) From ff9511410d5fbdec21a3bbfaa477aa1f56735b33 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:28 -0800 Subject: [PATCH 194/359] veristat: Fix a memory leak for preset ENUMERATOR ASAN detected a memory leak in veristat. The cleanup code handling ENUMERATOR value missed freeing strdup-ed svalue. Fix it. Acked-by: Mykyta Yatsenko Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-13-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/veristat.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/veristat.c b/tools/testing/selftests/bpf/veristat.c index 1be1e353d40a..75f85e0362f5 100644 --- a/tools/testing/selftests/bpf/veristat.c +++ b/tools/testing/selftests/bpf/veristat.c @@ -3378,6 +3378,8 @@ int main(int argc, char **argv) } } free(env.presets[i].atoms); + if (env.presets[i].value.type == ENUMERATOR) + free(env.presets[i].value.svalue); } free(env.presets); return -err; From f7ac552be7f13e834a942f64b1ac87e7755b32f4 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:29 -0800 Subject: [PATCH 195/359] selftests/bpf: Fix use-after-free in xdp_metadata test ASAN reported a use-after-free in close_xsk(). The xsk->socket internally references xsk->umem via socket->ctx->umem, so the socket must be deleted before the umem. Fix the order of operations in close_xsk(). Acked-by: Mykyta Yatsenko Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-14-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/xdp_metadata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c index 19f92affc2da..5c31054ad4a4 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -126,10 +126,10 @@ static int open_xsk(int ifindex, struct xsk *xsk) static void close_xsk(struct xsk *xsk) { - if (xsk->umem) - xsk_umem__delete(xsk->umem); if (xsk->socket) xsk_socket__delete(xsk->socket); + if (xsk->umem) + xsk_umem__delete(xsk->umem); munmap(xsk->umem_area, UMEM_SIZE); } From 68b8bea1eec392aa50736a859b8f1418067a3bc4 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:30 -0800 Subject: [PATCH 196/359] selftests/bpf: Fix double thread join in uprobe_multi_test ASAN reported a "joining already joined thread" error. The release_child() may be called multiple times for the same struct child. Fix by resetting child->thread to 0 after pthread_join. Also memset(0) static child variable in test_attach_api(). Acked-by: Mykyta Yatsenko Acked-by: Jiri Olsa Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-15-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c index 2ee17ef1dae2..56cbea280fbd 100644 --- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c @@ -62,8 +62,10 @@ static void release_child(struct child *child) return; close(child->go[1]); close(child->go[0]); - if (child->thread) + if (child->thread) { pthread_join(child->thread, NULL); + child->thread = 0; + } close(child->c2p[0]); close(child->c2p[1]); if (child->pid > 0) @@ -331,6 +333,8 @@ test_attach_api(const char *binary, const char *pattern, struct bpf_uprobe_multi { static struct child child; + memset(&child, 0, sizeof(child)); + /* no pid filter */ __test_attach_api(binary, pattern, opts, NULL); From 71dca2950fe9abf8e6fd3d9f5e6342da6634b898 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:31 -0800 Subject: [PATCH 197/359] selftests/bpf: Fix resource leaks caused by missing cleanups ASAN reported a number of resource leaks: - Add missing *__destroy(skel) calls - Replace bpf_link__detach() with bpf_link__destroy() where appropriate - cgrp_local_storage: Add bpf_link__destroy() when bpf_iter_create fails - dynptr: Add missing bpf_object__close() Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-16-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- .../bpf/prog_tests/cgrp_local_storage.c | 4 ++- .../testing/selftests/bpf/prog_tests/dynptr.c | 5 +++- .../selftests/bpf/prog_tests/sockmap_basic.c | 28 +++++++++---------- .../selftests/bpf/prog_tests/sockmap_listen.c | 2 +- .../bpf/prog_tests/struct_ops_private_stack.c | 4 +-- .../selftests/bpf/prog_tests/tc_opts.c | 6 ++-- .../selftests/bpf/prog_tests/test_tc_tunnel.c | 5 +++- 7 files changed, 29 insertions(+), 25 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/cgrp_local_storage.c b/tools/testing/selftests/bpf/prog_tests/cgrp_local_storage.c index 9015e2c2ab12..478a77cb67e6 100644 --- a/tools/testing/selftests/bpf/prog_tests/cgrp_local_storage.c +++ b/tools/testing/selftests/bpf/prog_tests/cgrp_local_storage.c @@ -202,7 +202,7 @@ static void test_cgroup_iter_sleepable(int cgroup_fd, __u64 cgroup_id) iter_fd = bpf_iter_create(bpf_link__fd(link)); if (!ASSERT_GE(iter_fd, 0, "iter_create")) - goto out; + goto out_link; /* trigger the program run */ (void)read(iter_fd, buf, sizeof(buf)); @@ -210,6 +210,8 @@ static void test_cgroup_iter_sleepable(int cgroup_fd, __u64 cgroup_id) ASSERT_EQ(skel->bss->cgroup_id, cgroup_id, "cgroup_id"); close(iter_fd); +out_link: + bpf_link__destroy(link); out: cgrp_ls_sleepable__destroy(skel); } diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c index b9f86cb91e81..5fda11590708 100644 --- a/tools/testing/selftests/bpf/prog_tests/dynptr.c +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c @@ -137,11 +137,14 @@ static void verify_success(const char *prog_name, enum test_setup_type setup_typ ); link = bpf_program__attach(prog); - if (!ASSERT_OK_PTR(link, "bpf_program__attach")) + if (!ASSERT_OK_PTR(link, "bpf_program__attach")) { + bpf_object__close(obj); goto cleanup; + } err = bpf_prog_test_run_opts(aux_prog_fd, &topts); bpf_link__destroy(link); + bpf_object__close(obj); if (!ASSERT_OK(err, "test_run")) goto cleanup; diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 256707e7d20d..dd3c757859f6 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -204,7 +204,7 @@ static void test_skmsg_helpers_with_link(enum bpf_map_type map_type) /* Fail since bpf_link for the same prog type has been created. */ link2 = bpf_program__attach_sockmap(prog_clone, map); if (!ASSERT_ERR_PTR(link2, "bpf_program__attach_sockmap")) { - bpf_link__detach(link2); + bpf_link__destroy(link2); goto out; } @@ -230,7 +230,7 @@ static void test_skmsg_helpers_with_link(enum bpf_map_type map_type) if (!ASSERT_OK(err, "bpf_link_update")) goto out; out: - bpf_link__detach(link); + bpf_link__destroy(link); test_skmsg_load_helpers__destroy(skel); } @@ -417,7 +417,7 @@ static void test_sockmap_skb_verdict_attach_with_link(void) if (!ASSERT_OK_PTR(link, "bpf_program__attach_sockmap")) goto out; - bpf_link__detach(link); + bpf_link__destroy(link); err = bpf_prog_attach(bpf_program__fd(prog), map, BPF_SK_SKB_STREAM_VERDICT, 0); if (!ASSERT_OK(err, "bpf_prog_attach")) @@ -426,7 +426,7 @@ static void test_sockmap_skb_verdict_attach_with_link(void) /* Fail since attaching with the same prog/map has been done. */ link = bpf_program__attach_sockmap(prog, map); if (!ASSERT_ERR_PTR(link, "bpf_program__attach_sockmap")) - bpf_link__detach(link); + bpf_link__destroy(link); err = bpf_prog_detach2(bpf_program__fd(prog), map, BPF_SK_SKB_STREAM_VERDICT); if (!ASSERT_OK(err, "bpf_prog_detach2")) @@ -747,13 +747,13 @@ static void test_sockmap_skb_verdict_peek_with_link(void) test_sockmap_skb_verdict_peek_helper(map); ASSERT_EQ(pass->bss->clone_called, 1, "clone_called"); out: - bpf_link__detach(link); + bpf_link__destroy(link); test_sockmap_pass_prog__destroy(pass); } static void test_sockmap_unconnected_unix(void) { - int err, map, stream = 0, dgram = 0, zero = 0; + int err, map, stream = -1, dgram = -1, zero = 0; struct test_sockmap_pass_prog *skel; skel = test_sockmap_pass_prog__open_and_load(); @@ -764,22 +764,22 @@ static void test_sockmap_unconnected_unix(void) stream = xsocket(AF_UNIX, SOCK_STREAM, 0); if (stream < 0) - return; + goto out; dgram = xsocket(AF_UNIX, SOCK_DGRAM, 0); - if (dgram < 0) { - close(stream); - return; - } + if (dgram < 0) + goto out; err = bpf_map_update_elem(map, &zero, &stream, BPF_ANY); - ASSERT_ERR(err, "bpf_map_update_elem(stream)"); + if (!ASSERT_ERR(err, "bpf_map_update_elem(stream)")) + goto out; err = bpf_map_update_elem(map, &zero, &dgram, BPF_ANY); ASSERT_OK(err, "bpf_map_update_elem(dgram)"); - +out: close(stream); close(dgram); + test_sockmap_pass_prog__destroy(skel); } static void test_sockmap_many_socket(void) @@ -1027,7 +1027,7 @@ static void test_sockmap_skb_verdict_vsock_poll(void) if (xrecv_nonblock(conn, &buf, 1, 0) != 1) FAIL("xrecv_nonblock"); detach: - bpf_link__detach(link); + bpf_link__destroy(link); close: xclose(conn); xclose(peer); diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index f1bdccc7e4e7..cc0c68bab907 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -899,7 +899,7 @@ static void test_msg_redir_to_listening_with_link(struct test_sockmap_listen *sk redir_to_listening(family, sotype, sock_map, verdict_map, REDIR_EGRESS); - bpf_link__detach(link); + bpf_link__destroy(link); } static void redir_partial(int family, int sotype, int sock_map, int parser_map) diff --git a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c index 4006879ca3fe..d42123a0fb16 100644 --- a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c +++ b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c @@ -54,9 +54,7 @@ static void test_private_stack_fail(void) } err = struct_ops_private_stack_fail__load(skel); - if (!ASSERT_ERR(err, "struct_ops_private_stack_fail__load")) - goto cleanup; - return; + ASSERT_ERR(err, "struct_ops_private_stack_fail__load"); cleanup: struct_ops_private_stack_fail__destroy(skel); diff --git a/tools/testing/selftests/bpf/prog_tests/tc_opts.c b/tools/testing/selftests/bpf/prog_tests/tc_opts.c index dd7a138d8c3d..2955750ddada 100644 --- a/tools/testing/selftests/bpf/prog_tests/tc_opts.c +++ b/tools/testing/selftests/bpf/prog_tests/tc_opts.c @@ -1360,10 +1360,8 @@ static void test_tc_opts_dev_cleanup_target(int target) assert_mprog_count_ifindex(ifindex, target, 4); - ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth"); - ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed"); - ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed"); - return; + goto cleanup; + cleanup3: err = bpf_prog_detach_opts(fd3, loopback, target, &optd); ASSERT_OK(err, "prog_detach"); diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c index 0fe0a8f62486..7fc4d7dd70ef 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c @@ -699,7 +699,7 @@ void test_tc_tunnel(void) return; if (!ASSERT_OK(setup(), "global setup")) - return; + goto out; for (i = 0; i < ARRAY_SIZE(subtests_cfg); i++) { cfg = &subtests_cfg[i]; @@ -711,4 +711,7 @@ void test_tc_tunnel(void) subtest_cleanup(cfg); } cleanup(); + +out: + test_tc_tunnel__destroy(skel); } From 2bb270a0ac68990648c5e84abf73bb7493230ea6 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:32 -0800 Subject: [PATCH 198/359] selftests/bpf: Free bpf_object in test_sysctl ASAN reported a resource leak due to the bpf_object not being tracked in test_sysctl. Add obj field to struct sysctl_test to properly clean it up. Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-17-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/test_sysctl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/test_sysctl.c b/tools/testing/selftests/bpf/prog_tests/test_sysctl.c index 273dd41ca09e..2a1ff821bc97 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_sysctl.c +++ b/tools/testing/selftests/bpf/prog_tests/test_sysctl.c @@ -27,6 +27,7 @@ struct sysctl_test { OP_EPERM, SUCCESS, } result; + struct bpf_object *obj; }; static struct sysctl_test tests[] = { @@ -1471,6 +1472,7 @@ static int load_sysctl_prog_file(struct sysctl_test *test) return -1; } + test->obj = obj; return prog_fd; } @@ -1573,6 +1575,7 @@ out: /* Detaching w/o checking return code: best effort attempt. */ if (progfd != -1) bpf_prog_detach(cgfd, atype); + bpf_object__close(test->obj); close(progfd); printf("[%s]\n", err ? "FAIL" : "PASS"); return err; From 3e711c8e4707c46cc45d9775b50204d0f2790d77 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:07:33 -0800 Subject: [PATCH 199/359] selftests/bpf: Fix array bounds warning in jit_disasm_helpers Compiler cannot infer upper bound for labels.cnt and warns about potential buffer overflow in snprintf. Add an explicit bounds check (... && i < MAX_LOCAL_LABELS) in the loop condition to fix the warning. Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223190736.649171-18-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- .../testing/selftests/bpf/jit_disasm_helpers.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/bpf/jit_disasm_helpers.c b/tools/testing/selftests/bpf/jit_disasm_helpers.c index febd6b12e372..364c557c5115 100644 --- a/tools/testing/selftests/bpf/jit_disasm_helpers.c +++ b/tools/testing/selftests/bpf/jit_disasm_helpers.c @@ -122,15 +122,15 @@ static int disasm_one_func(FILE *text_out, uint8_t *image, __u32 len) pc += cnt; } qsort(labels.pcs, labels.cnt, sizeof(*labels.pcs), cmp_u32); - for (i = 0; i < labels.cnt; ++i) - /* gcc is unable to infer upper bound for labels.cnt and assumes - * it to be U32_MAX. U32_MAX takes 10 decimal digits. - * snprintf below prints into labels.names[*], - * which has space only for two digits and a letter. - * To avoid truncation warning use (i % MAX_LOCAL_LABELS), - * which informs gcc about printed value upper bound. - */ - snprintf(labels.names[i], sizeof(labels.names[i]), "L%d", i % MAX_LOCAL_LABELS); + /* gcc is unable to infer upper bound for labels.cnt and + * assumes it to be U32_MAX. U32_MAX takes 10 decimal digits. + * snprintf below prints into labels.names[*], which has space + * only for two digits and a letter. To avoid truncation + * warning use (i < MAX_LOCAL_LABELS), which informs gcc about + * printed value upper bound. + */ + for (i = 0; i < labels.cnt && i < MAX_LOCAL_LABELS; ++i) + snprintf(labels.names[i], sizeof(labels.names[i]), "L%d", i); /* now print with labels */ labels.print_phase = true; From ad90ecedad755e2f3e31364ec3130e5bb2f4b64a Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:11:16 -0800 Subject: [PATCH 200/359] selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN - kmem_cache_iter: remove unnecessary debug output - lwt_seg6local: change the type of foobar to char[] - the sizeof(foobar) returned the pointer size and not a string length as intended - verifier_log: increase prog_name buffer size in verif_log_subtest() - compiler has a conservative estimate of fixed_log_sz value, making ASAN complain on snprint() call Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223191118.655185-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c | 7 ++----- tools/testing/selftests/bpf/prog_tests/lwt_seg6local.c | 2 +- tools/testing/selftests/bpf/prog_tests/verifier_log.c | 2 +- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c index 6e35e13c2022..399fe9103f83 100644 --- a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c @@ -104,11 +104,8 @@ void test_kmem_cache_iter(void) if (!ASSERT_GE(iter_fd, 0, "iter_create")) goto destroy; - memset(buf, 0, sizeof(buf)); - while (read(iter_fd, buf, sizeof(buf)) > 0) { - /* Read out all contents */ - printf("%s", buf); - } + while (read(iter_fd, buf, sizeof(buf)) > 0) + ; /* Read out all contents */ /* Next reads should return 0 */ ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read"); diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_seg6local.c b/tools/testing/selftests/bpf/prog_tests/lwt_seg6local.c index 3bc730b7c7fa..1b25d5c5f8fb 100644 --- a/tools/testing/selftests/bpf/prog_tests/lwt_seg6local.c +++ b/tools/testing/selftests/bpf/prog_tests/lwt_seg6local.c @@ -117,7 +117,7 @@ void test_lwt_seg6local(void) const char *ns1 = NETNS_BASE "1"; const char *ns6 = NETNS_BASE "6"; struct nstoken *nstoken = NULL; - const char *foobar = "foobar"; + const char foobar[] = "foobar"; ssize_t bytes; int sfd, cfd; char buf[7]; diff --git a/tools/testing/selftests/bpf/prog_tests/verifier_log.c b/tools/testing/selftests/bpf/prog_tests/verifier_log.c index 8337c6bc5b95..aaa2854974c0 100644 --- a/tools/testing/selftests/bpf/prog_tests/verifier_log.c +++ b/tools/testing/selftests/bpf/prog_tests/verifier_log.c @@ -47,7 +47,7 @@ static int load_prog(struct bpf_prog_load_opts *opts, bool expect_load_error) static void verif_log_subtest(const char *name, bool expect_load_error, int log_level) { LIBBPF_OPTS(bpf_prog_load_opts, opts); - char *exp_log, prog_name[16], op_name[32]; + char *exp_log, prog_name[24], op_name[32]; struct test_log_buf *skel; struct bpf_program *prog; size_t fixed_log_sz; From a2714e730304c79bf85d2178387255ef8348b897 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:11:17 -0800 Subject: [PATCH 201/359] selftests/bpf: Check BPFTOOL env var in detect_bpftool_path() The bpftool_maps_access and bpftool_metadata tests may fail on BPF CI with "command not found", depending on a workflow. This happens because detect_bpftool_path() only checks two hardcoded relative paths: - ./tools/sbin/bpftool - ../tools/sbin/bpftool Add support for a BPFTOOL environment variable that allows specifying the exact path to the bpftool binary. Acked-by: Mykyta Yatsenko Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223191118.655185-2-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/bpftool_helpers.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/bpftool_helpers.c b/tools/testing/selftests/bpf/bpftool_helpers.c index 595a636fa13b..929fc257f431 100644 --- a/tools/testing/selftests/bpf/bpftool_helpers.c +++ b/tools/testing/selftests/bpf/bpftool_helpers.c @@ -14,6 +14,17 @@ static int detect_bpftool_path(char *buffer, size_t size) { char tmp[BPFTOOL_PATH_MAX_LEN]; + const char *env_path; + + /* First, check if BPFTOOL environment variable is set */ + env_path = getenv("BPFTOOL"); + if (env_path && access(env_path, X_OK) == 0) { + strscpy(buffer, env_path, size); + return 0; + } else if (env_path) { + fprintf(stderr, "bpftool '%s' doesn't exist or is not executable\n", env_path); + return 1; + } /* Check default bpftool location (will work if we are running the * default flavor of test_progs) @@ -33,7 +44,7 @@ static int detect_bpftool_path(char *buffer, size_t size) return 0; } - /* Failed to find bpftool binary */ + fprintf(stderr, "Failed to detect bpftool path, use BPFTOOL env var to override\n"); return 1; } From 4c9d07865c06bb9b9faff1ecf6c4b7cc8f7d67a9 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Mon, 23 Feb 2026 11:11:18 -0800 Subject: [PATCH 202/359] selftests/bpf: Don't override SIGSEGV handler with ASAN test_progs has custom SIGSEGV handler, which interferes with the address sanitizer [1]. Add an #ifndef to avoid this. Additionally, declare an __asan_on_error() to dump the test logs in the same way it happens in the custom SIGSEGV handler. [1] https://lore.kernel.org/bpf/73d832948b01dbc0ebc60d85574bdf8537f3a810.camel@gmail.com/ Acked-by: Mykyta Yatsenko Acked-by: Eduard Zingerman Signed-off-by: Ihor Solodrai Link: https://lore.kernel.org/r/20260223191118.655185-3-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_progs.c | 36 +++++++++++++++++------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c index d1418ec1f351..0929f4a7bda4 100644 --- a/tools/testing/selftests/bpf/test_progs.c +++ b/tools/testing/selftests/bpf/test_progs.c @@ -1261,14 +1261,8 @@ int get_bpf_max_tramp_links(void) return ret; } -#define MAX_BACKTRACE_SZ 128 -void crash_handler(int signum) +static void dump_crash_log(void) { - void *bt[MAX_BACKTRACE_SZ]; - size_t sz; - - sz = backtrace(bt, ARRAY_SIZE(bt)); - fflush(stdout); stdout = env.stdout_saved; stderr = env.stderr_saved; @@ -1277,12 +1271,32 @@ void crash_handler(int signum) env.test_state->error_cnt++; dump_test_log(env.test, env.test_state, true, false, NULL); } +} + +#define MAX_BACKTRACE_SZ 128 + +void crash_handler(int signum) +{ + void *bt[MAX_BACKTRACE_SZ]; + size_t sz; + + sz = backtrace(bt, ARRAY_SIZE(bt)); + + dump_crash_log(); + if (env.worker_id != -1) fprintf(stderr, "[%d]: ", env.worker_id); fprintf(stderr, "Caught signal #%d!\nStack trace:\n", signum); backtrace_symbols_fd(bt, sz, STDERR_FILENO); } +#ifdef __SANITIZE_ADDRESS__ +void __asan_on_error(void) +{ + dump_crash_log(); +} +#endif + void hexdump(const char *prefix, const void *buf, size_t len) { for (int i = 0; i < len; i++) { @@ -1944,13 +1958,15 @@ int main(int argc, char **argv) .parser = parse_arg, .doc = argp_program_doc, }; + int err, i; + +#ifndef __SANITIZE_ADDRESS__ struct sigaction sigact = { .sa_handler = crash_handler, .sa_flags = SA_RESETHAND, - }; - int err, i; - + }; sigaction(SIGSEGV, &sigact, NULL); +#endif env.stdout_saved = stdout; env.stderr_saved = stderr; From 56e0a838277b12d48477b9c7478ceb5ae528ae2c Mon Sep 17 00:00:00 2001 From: Shawn Guo Date: Tue, 24 Feb 2026 15:57:53 +0800 Subject: [PATCH 203/359] MAINTAINERS: Update Shawn Guo's address for HiSilicon PCIe controller driver Shawn is no longer with Linaro. Use his korg email address instead. Signed-off-by: Shawn Guo Signed-off-by: Bjorn Helgaas Link: https://patch.msgid.link/20260224075753.122091-1-shawnguo@kernel.org --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 55af015174a5..f02f50c9d695 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -20509,7 +20509,7 @@ F: Documentation/devicetree/bindings/pci/hisilicon,kirin-pcie.yaml F: drivers/pci/controller/dwc/pcie-kirin.c PCIE DRIVER FOR HISILICON STB -M: Shawn Guo +M: Shawn Guo L: linux-pci@vger.kernel.org S: Maintained F: Documentation/devicetree/bindings/pci/hisilicon-histb-pcie.txt From 30df81f2228d65bddf492db3929d9fcaffd38fc5 Mon Sep 17 00:00:00 2001 From: Peter Wang Date: Mon, 23 Feb 2026 14:56:09 +0800 Subject: [PATCH 204/359] scsi: ufs: core: Fix possible NULL pointer dereference in ufshcd_add_command_trace() The kernel log indicates a crash in ufshcd_add_command_trace, due to a NULL pointer dereference when accessing hwq->id. This can happen if ufshcd_mcq_req_to_hwq() returns NULL. This patch adds a NULL check for hwq before accessing its id field to prevent a kernel crash. Kernel log excerpt: [] notify_die+0x4c/0x8c [] __die+0x60/0xb0 [] die+0x4c/0xe0 [] die_kernel_fault+0x74/0x88 [] __do_kernel_fault+0x314/0x318 [] do_page_fault+0xa4/0x5f8 [] do_translation_fault+0x34/0x54 [] do_mem_abort+0x50/0xa8 [] el1_abort+0x3c/0x64 [] el1h_64_sync_handler+0x44/0xcc [] el1h_64_sync+0x80/0x88 [] ufshcd_add_command_trace+0x23c/0x320 [] ufshcd_compl_one_cqe+0xa4/0x404 [] ufshcd_mcq_poll_cqe_lock+0xac/0x104 [] ufs_mtk_mcq_intr+0x54/0x74 [ufs_mediatek_mod] [] __handle_irq_event_percpu+0xc8/0x348 [] handle_irq_event+0x3c/0xa8 [] handle_fasteoi_irq+0xf8/0x294 [] generic_handle_domain_irq+0x54/0x80 [] gic_handle_irq+0x1d4/0x330 [] call_on_irq_stack+0x44/0x68 [] do_interrupt_handler+0x78/0xd8 [] el1_interrupt+0x48/0xa8 [] el1h_64_irq_handler+0x14/0x24 [] el1h_64_irq+0x80/0x88 [] arch_local_irq_enable+0x4/0x1c [] cpuidle_enter+0x34/0x54 [] do_idle+0x1dc/0x2f8 [] cpu_startup_entry+0x30/0x3c [] secondary_start_kernel+0x134/0x1ac [] __secondary_switched+0xc4/0xcc Signed-off-by: Peter Wang Reviewed-by: Bart Van Assche Link: https://patch.msgid.link/20260223065657.2432447-1-peter.wang@mediatek.com Signed-off-by: Martin K. Petersen --- drivers/ufs/core/ufshcd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 571a73860561..4dc7fbc2a6db 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -518,8 +518,8 @@ static void ufshcd_add_command_trace(struct ufs_hba *hba, struct scsi_cmnd *cmd, if (hba->mcq_enabled) { struct ufs_hw_queue *hwq = ufshcd_mcq_req_to_hwq(hba, rq); - - hwq_id = hwq->id; + if (hwq) + hwq_id = hwq->id; } else { doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL); } From 62c015373e1cdb1cdca824bd2dbce2dac0819467 Mon Sep 17 00:00:00 2001 From: Peter Wang Date: Mon, 23 Feb 2026 18:37:57 +0800 Subject: [PATCH 205/359] scsi: ufs: core: Move link recovery for hibern8 exit failure to wl_resume Move the link recovery trigger from ufshcd_uic_pwr_ctrl() to __ufshcd_wl_resume(). Ensure link recovery is only attempted when hibern8 exit fails during resume, not during hibern8 enter in suspend. Improve error handling and prevent unnecessary link recovery attempts. Fixes: 35dabf4503b9 ("scsi: ufs: core: Use link recovery when h8 exit fails during runtime resume") Signed-off-by: Peter Wang Reviewed-by: Bart Van Assche Link: https://patch.msgid.link/20260223103906.2533654-1-peter.wang@mediatek.com Signed-off-by: Martin K. Petersen --- drivers/ufs/core/ufshcd.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 4dc7fbc2a6db..85987373b961 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -4390,14 +4390,6 @@ out_unlock: spin_unlock_irqrestore(hba->host->host_lock, flags); mutex_unlock(&hba->uic_cmd_mutex); - /* - * If the h8 exit fails during the runtime resume process, it becomes - * stuck and cannot be recovered through the error handler. To fix - * this, use link recovery instead of the error handler. - */ - if (ret && hba->pm_op_in_progress) - ret = ufshcd_link_recovery(hba); - return ret; } @@ -10200,7 +10192,15 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) } else { dev_err(hba->dev, "%s: hibern8 exit failed %d\n", __func__, ret); - goto vendor_suspend; + /* + * If the h8 exit fails during the runtime resume + * process, it becomes stuck and cannot be recovered + * through the error handler. To fix this, use link + * recovery instead of the error handler. + */ + ret = ufshcd_link_recovery(hba); + if (ret) + goto vendor_suspend; } } else if (ufshcd_is_link_off(hba)) { /* From 74b6e83942dcc9f3cca9e561b205a5b19940a344 Mon Sep 17 00:00:00 2001 From: Matthew Brost Date: Thu, 19 Feb 2026 12:50:29 -0800 Subject: [PATCH 206/359] drm/gpusvm: Fix drm_gpusvm_pages_valid_unlocked() kernel-doc The kernel-doc for drm_gpusvm_pages_valid_unlocked() was stale and still referenced old range-based arguments and naming. Update the documentation to match the current function arguments and signature. Signed-off-by: Matthew Brost Reviewed-by: Maarten Lankhorst Link: https://patch.msgid.link/20260219205029.1011336-1-matthew.brost@intel.com --- drivers/gpu/drm/drm_gpusvm.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c index 24180bfdf5a2..9ef9e52c0547 100644 --- a/drivers/gpu/drm/drm_gpusvm.c +++ b/drivers/gpu/drm/drm_gpusvm.c @@ -1338,14 +1338,14 @@ bool drm_gpusvm_range_pages_valid(struct drm_gpusvm *gpusvm, EXPORT_SYMBOL_GPL(drm_gpusvm_range_pages_valid); /** - * drm_gpusvm_range_pages_valid_unlocked() - GPU SVM range pages valid unlocked + * drm_gpusvm_pages_valid_unlocked() - GPU SVM pages valid unlocked * @gpusvm: Pointer to the GPU SVM structure - * @range: Pointer to the GPU SVM range structure + * @svm_pages: Pointer to the GPU SVM pages structure * - * This function determines if a GPU SVM range pages are valid. Expected be - * called without holding gpusvm->notifier_lock. + * This function determines if a GPU SVM pages are valid. Expected be called + * without holding gpusvm->notifier_lock. * - * Return: True if GPU SVM range has valid pages, False otherwise + * Return: True if GPU SVM pages are valid, False otherwise */ static bool drm_gpusvm_pages_valid_unlocked(struct drm_gpusvm *gpusvm, struct drm_gpusvm_pages *svm_pages) From 0902010c8d163f7b62e655efda1a843529152c7c Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Tue, 24 Feb 2026 18:07:59 +0800 Subject: [PATCH 207/359] regulator: fp9931: Fix PM runtime reference leak in fp9931_hwmon_read() In fp9931_hwmon_read(), if regmap_read() failed, the function returned the error code without calling pm_runtime_put_autosuspend(), causing a PM reference leak. Fixes: 12d821bd13d4 ("regulator: Add FP9931/JD9930 driver") Signed-off-by: Felix Gu Reviewed-by: Andreas Kemnade Link: https://patch.msgid.link/20260224-fp9931-v1-1-1cf05cabef4a@gmail.com Signed-off-by: Mark Brown --- drivers/regulator/fp9931.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/regulator/fp9931.c b/drivers/regulator/fp9931.c index 7fbcc6327cc6..abea3b69d8a0 100644 --- a/drivers/regulator/fp9931.c +++ b/drivers/regulator/fp9931.c @@ -144,13 +144,12 @@ static int fp9931_hwmon_read(struct device *dev, enum hwmon_sensor_types type, return ret; ret = regmap_read(data->regmap, FP9931_REG_TMST_VALUE, &val); - if (ret) - return ret; + if (!ret) + *temp = (s8)val * 1000; pm_runtime_put_autosuspend(data->dev); - *temp = (s8)val * 1000; - return 0; + return ret; } static umode_t fp9931_hwmon_is_visible(const void *data, From 4baaddaa44af01cd4ce239493060738fd0881835 Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Tue, 24 Feb 2026 19:19:03 +0800 Subject: [PATCH 208/359] regulator: bq257xx: Fix device node reference leak in bq257xx_reg_dt_parse_gpio() In bq257xx_reg_dt_parse_gpio(), if fails to get subchild, it returns without calling of_node_put(child), causing the device node reference leak. Fixes: 981dd162b635 ("regulator: bq257xx: Add bq257xx boost regulator driver") Signed-off-by: Felix Gu Link: https://patch.msgid.link/20260224-bq257-v1-1-8ebbc731c1c3@gmail.com Signed-off-by: Mark Brown --- drivers/regulator/bq257xx-regulator.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/regulator/bq257xx-regulator.c b/drivers/regulator/bq257xx-regulator.c index fc1ccede4468..dab8f1ab4450 100644 --- a/drivers/regulator/bq257xx-regulator.c +++ b/drivers/regulator/bq257xx-regulator.c @@ -115,11 +115,10 @@ static void bq257xx_reg_dt_parse_gpio(struct platform_device *pdev) return; subchild = of_get_child_by_name(child, pdata->desc.of_match); + of_node_put(child); if (!subchild) return; - of_node_put(child); - pdata->otg_en_gpio = devm_fwnode_gpiod_get_index(&pdev->dev, of_fwnode_handle(subchild), "enable", 0, From bfd7db781e2e7a99b086d645a104d16e368f58ff Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Tue, 24 Feb 2026 20:23:30 +0800 Subject: [PATCH 209/359] regulator: Kconfig: fix a typo Fixes a typo in Kconfig, HWWON -> HWMON Signed-off-by: Felix Gu Link: https://patch.msgid.link/20260224-kconfig-v1-1-b0c5459ed7a0@gmail.com Signed-off-by: Mark Brown --- drivers/regulator/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig index a708fc63f581..d10b6f9243d5 100644 --- a/drivers/regulator/Kconfig +++ b/drivers/regulator/Kconfig @@ -508,7 +508,7 @@ config REGULATOR_FP9931 This driver supports the FP9931/JD9930 voltage regulator chip which is used to provide power to Electronic Paper Displays so it is found in E-Book readers. - If HWWON is enabled, it also provides temperature measurement. + If HWMON is enabled, it also provides temperature measurement. config REGULATOR_LM363X tristate "TI LM363X voltage regulators" From e08f2adcf990b8cf272e90898401a9e481c1c667 Mon Sep 17 00:00:00 2001 From: Ioana Ciornei Date: Tue, 24 Feb 2026 13:36:09 +0200 Subject: [PATCH 210/359] Revert "irqchip/ls-extirq: Use for_each_of_imap_item iterator" This reverts commit 3ac6dfe3d7a2396602b67667249b146504dfbd2a. The ls-extirq uses interrupt-map but it's a non-standard use documented in fsl,ls-extirq.yaml: # The driver(drivers/irqchip/irq-ls-extirq.c) have not use standard DT # function to parser interrupt-map. So it doesn't consider '#address-size' # in parent interrupt controller, such as GIC. # # When dt-binding verify interrupt-map, item data matrix is spitted at # incorrect position. Remove interrupt-map restriction because it always # wrong. This means that by using for_each_of_imap_item and the underlying of_irq_parse_imap_parent() on its interrupt-map property will effectively break its functionality Revert the patch making use of for_each_of_imap_item() in ls-extirq. Fixes: 3ac6dfe3d7a2 ("irqchip/ls-extirq: Use for_each_of_imap_item iterator") Signed-off-by: Ioana Ciornei Signed-off-by: Thomas Gleixner Acked-by: Herve Codina Link: https://patch.msgid.link/20260224113610.1129022-2-ioana.ciornei@nxp.com --- drivers/irqchip/irq-ls-extirq.c | 47 +++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/drivers/irqchip/irq-ls-extirq.c b/drivers/irqchip/irq-ls-extirq.c index a7e9c3885b09..96f9c20621cf 100644 --- a/drivers/irqchip/irq-ls-extirq.c +++ b/drivers/irqchip/irq-ls-extirq.c @@ -125,32 +125,45 @@ static const struct irq_domain_ops extirq_domain_ops = { static int ls_extirq_parse_map(struct ls_extirq_data *priv, struct device_node *node) { - struct of_imap_parser imap_parser; - struct of_imap_item imap_item; + const __be32 *map; + u32 mapsize; int ret; - ret = of_imap_parser_init(&imap_parser, node, &imap_item); - if (ret) - return ret; + map = of_get_property(node, "interrupt-map", &mapsize); + if (!map) + return -ENOENT; + if (mapsize % sizeof(*map)) + return -EINVAL; + mapsize /= sizeof(*map); - for_each_of_imap_item(&imap_parser, &imap_item) { + while (mapsize) { struct device_node *ipar; - u32 hwirq; - int i; + u32 hwirq, intsize, j; - hwirq = imap_item.child_imap[0]; - if (hwirq >= MAXIRQ) { - of_node_put(imap_item.parent_args.np); + if (mapsize < 3) + return -EINVAL; + hwirq = be32_to_cpup(map); + if (hwirq >= MAXIRQ) return -EINVAL; - } priv->nirq = max(priv->nirq, hwirq + 1); - ipar = of_node_get(imap_item.parent_args.np); - priv->map[hwirq].fwnode = of_fwnode_handle(ipar); + ipar = of_find_node_by_phandle(be32_to_cpup(map + 2)); + map += 3; + mapsize -= 3; + if (!ipar) + return -EINVAL; + priv->map[hwirq].fwnode = &ipar->fwnode; + ret = of_property_read_u32(ipar, "#interrupt-cells", &intsize); + if (ret) + return ret; - priv->map[hwirq].param_count = imap_item.parent_args.args_count; - for (i = 0; i < priv->map[hwirq].param_count; i++) - priv->map[hwirq].param[i] = imap_item.parent_args.args[i]; + if (intsize > mapsize) + return -EINVAL; + + priv->map[hwirq].param_count = intsize; + for (j = 0; j < intsize; ++j) + priv->map[hwirq].param[j] = be32_to_cpup(map++); + mapsize -= intsize; } return 0; } From fe5669e363b129cde285bfb4d45abb72d1d77cfc Mon Sep 17 00:00:00 2001 From: Ioana Ciornei Date: Tue, 24 Feb 2026 13:36:10 +0200 Subject: [PATCH 211/359] irqchip/ls-extirq: Fix devm_of_iomap() error check The devm_of_iomap() function returns an ERR_PTR() encoded error code on failure. Replace the incorrect check against NULL with IS_ERR(). Fixes: 05cd654829dd ("irqchip/ls-extirq: Convert to a platform driver to make it work again") Reported-by: Dan Carpenter Signed-off-by: Ioana Ciornei Signed-off-by: Thomas Gleixner Reviewed-by: Herve Codina Link: https://patch.msgid.link/20260224113610.1129022-3-ioana.ciornei@nxp.com Closes: https://lore.kernel.org/all/aYXvfbfT6w0TMsXS@stanley.mountain/ --- drivers/irqchip/irq-ls-extirq.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/irqchip/irq-ls-extirq.c b/drivers/irqchip/irq-ls-extirq.c index 96f9c20621cf..d724fe843980 100644 --- a/drivers/irqchip/irq-ls-extirq.c +++ b/drivers/irqchip/irq-ls-extirq.c @@ -190,8 +190,10 @@ static int ls_extirq_probe(struct platform_device *pdev) return dev_err_probe(dev, -ENOMEM, "Failed to allocate memory\n"); priv->intpcr = devm_of_iomap(dev, node, 0, NULL); - if (!priv->intpcr) - return dev_err_probe(dev, -ENOMEM, "Cannot ioremap OF node %pOF\n", node); + if (IS_ERR(priv->intpcr)) { + return dev_err_probe(dev, PTR_ERR(priv->intpcr), + "Cannot ioremap OF node %pOF\n", node); + } ret = ls_extirq_parse_map(priv, node); if (ret) From a46435537a844d0f7b4b620baf962cad136422de Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 24 Feb 2026 11:36:09 -0700 Subject: [PATCH 212/359] io_uring/cmd_net: use READ_ONCE() for ->addr3 read Any SQE read should use READ_ONCE(), to ensure the result is read once and only once. Doesn't really matter for this case, but it's better to keep these 100% consistent and always use READ_ONCE() for the prep side of SQE handling. Fixes: 5d24321e4c15 ("io_uring: Introduce getsockname io_uring cmd") Signed-off-by: Jens Axboe --- io_uring/cmd_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c index 57ddaf874611..125a81c520a6 100644 --- a/io_uring/cmd_net.c +++ b/io_uring/cmd_net.c @@ -146,7 +146,7 @@ static int io_uring_cmd_getsockname(struct socket *sock, return -EINVAL; uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr)); - ulen = u64_to_user_ptr(sqe->addr3); + ulen = u64_to_user_ptr(READ_ONCE(sqe->addr3)); peer = READ_ONCE(sqe->optlen); if (peer > 1) return -EINVAL; From 09833d99db36d74456a4d13eb29c32d56ff8f2b6 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Fri, 13 Feb 2026 10:54:10 +0100 Subject: [PATCH 213/359] mm/kfence: disable KFENCE upon KASAN HW tags enablement KFENCE does not currently support KASAN hardware tags. As a result, the two features are incompatible when enabled simultaneously. Given that MTE provides deterministic protection and KFENCE is a sampling-based debugging tool, prioritize the stronger hardware protections. Disable KFENCE initialization and free the pre-allocated pool if KASAN hardware tags are detected to ensure the system maintains the security guarantees provided by MTE. Link: https://lkml.kernel.org/r/20260213095410.1862978-1-glider@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Alexander Potapenko Suggested-by: Marco Elver Reviewed-by: Marco Elver Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Ernesto Martinez Garcia Cc: Greg KH Cc: Kees Cook Cc: Signed-off-by: Andrew Morton --- mm/kfence/core.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/mm/kfence/core.c b/mm/kfence/core.c index b4ea3262c925..b5aedf505cec 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -916,6 +917,20 @@ void __init kfence_alloc_pool_and_metadata(void) if (!kfence_sample_interval) return; + /* + * If KASAN hardware tags are enabled, disable KFENCE, because it + * does not support MTE yet. + */ + if (kasan_hw_tags_enabled()) { + pr_info("disabled as KASAN HW tags are enabled\n"); + if (__kfence_pool) { + memblock_free(__kfence_pool, KFENCE_POOL_SIZE); + __kfence_pool = NULL; + } + kfence_sample_interval = 0; + return; + } + /* * If the pool has already been initialized by arch, there is no need to * re-allocate the memory pool. From eb9549346f7578eda3755683ac2cfb4d94c0675f Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 16 Feb 2026 13:17:44 +0100 Subject: [PATCH 214/359] mm: change vma_alloc_folio_noprof() macro to inline function In a few rare configurations with extra warnings eanbled, the new drm_pagemap_migrate_populate_ram_pfn() calls vma_alloc_folio_noprof() but that does not use all the arguments, leading to a harmless warning: drivers/gpu/drm/drm_pagemap.c: In function 'drm_pagemap_migrate_populate_ram_pfn': drivers/gpu/drm/drm_pagemap.c:701:63: error: parameter 'addr' set but not used [-Werror=unused-but-set-parameter=] 701 | unsigned long addr) | ~~~~~~~~~~~~~~^~~~ Replace the macro with an inline function so the compiler can see how the argument would be used, but is still able to optimize out the assignments. Link: https://lkml.kernel.org/r/20260216121751.2378374-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Reviewed-by: Lorenzo Stoakes Acked-by: Zi Yan Reviewed-by: Suren Baghdasaryan Cc: Alexei Starovoitov Cc: Brendan Jackman Cc: David Hildenbrand Cc: Johannes Weiner Cc: Joshua Hahn Cc: Kefeng Wang Cc: Liam Howlett Cc: Michal Hocko Cc: Mike Rapoport Cc: Shakeel Butt Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/gfp.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 2b30a0529d48..f82d74a77cad 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -339,8 +339,11 @@ static inline struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int orde { return folio_alloc_noprof(gfp, order); } -#define vma_alloc_folio_noprof(gfp, order, vma, addr) \ - folio_alloc_noprof(gfp, order) +static inline struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, + struct vm_area_struct *vma, unsigned long addr) +{ + return folio_alloc_noprof(gfp, order); +} #endif #define alloc_pages(...) alloc_hooks(alloc_pages_noprof(__VA_ARGS__)) From dd085fe9a8ebfc5d10314c60452db38d2b75e609 Mon Sep 17 00:00:00 2001 From: Deepanshu Kartikey Date: Sat, 14 Feb 2026 05:45:35 +0530 Subject: [PATCH 215/359] mm: thp: deny THP for files on anonymous inodes file_thp_enabled() incorrectly allows THP for files on anonymous inodes (e.g. guest_memfd and secretmem). These files are created via alloc_file_pseudo(), which does not call get_write_access() and leaves inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being true, they appear as read-only regular files when CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP collapse. Anonymous inodes can never pass the inode_is_open_for_write() check since their i_writecount is never incremented through the normal VFS open path. The right thing to do is to exclude them from THP eligibility altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real filesystem files (e.g. shared libraries), not for pseudo-filesystem inodes. For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create large folios in the page cache via the collapse path, but the guest_memfd fault handler does not support large folios. This triggers WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping(). For secretmem, collapse_file() tries to copy page contents through the direct map, but secretmem pages are removed from the direct map. This can result in a kernel crash: BUG: unable to handle page fault for address: ffff88810284d000 RIP: 0010:memcpy_orig+0x16/0x130 Call Trace: collapse_file hpage_collapse_scan_file madvise_collapse Secretmem is not affected by the crash on upstream as the memory failure recovery handles the failed copy gracefully, but it still triggers confusing false memory failure reports: Memory failure: 0x106d96f: recovery action for clean unevictable LRU page: Recovered Check IS_ANON_FILE(inode) in file_thp_enabled() to deny THP for all anonymous inode files. Link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44 Link: https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@mail.gmail.com Link: https://lkml.kernel.org/r/20260214001535.435626-1-kartikey406@gmail.com Fixes: 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP eligibility") Signed-off-by: Deepanshu Kartikey Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44 Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com Tested-by: Lance Yang Acked-by: David Hildenbrand (Arm) Reviewed-by: Barry Song Reviewed-by: Ackerley Tng Tested-by: Ackerley Tng Reviewed-by: Lorenzo Stoakes Cc: Baolin Wang Cc: Dev Jain Cc: Fangrui Song Cc: Liam Howlett Cc: Nico Pache Cc: Ryan Roberts Cc: Yang Shi Cc: Zi Yan Cc: Signed-off-by: Andrew Morton --- mm/huge_memory.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d4ca8cfd7f9d..8e2746ea74ad 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -94,6 +94,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) inode = file_inode(vma->vm_file); + if (IS_ANON_FILE(inode)) + return false; + return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); } From f85b1c6af5bc3872f994df0a5688c1162de07a62 Mon Sep 17 00:00:00 2001 From: "Pratyush Yadav (Google)" Date: Mon, 16 Feb 2026 14:22:19 +0100 Subject: [PATCH 216/359] liveupdate: luo_file: remember retrieve() status LUO keeps track of successful retrieve attempts on a LUO file. It does so to avoid multiple retrievals of the same file. Multiple retrievals cause problems because once the file is retrieved, the serialized data structures are likely freed and the file is likely in a very different state from what the code expects. The retrieve boolean in struct luo_file keeps track of this, and is passed to the finish callback so it knows what work was already done and what it has left to do. All this works well when retrieve succeeds. When it fails, luo_retrieve_file() returns the error immediately, without ever storing anywhere that a retrieve was attempted or what its error code was. This results in an errored LIVEUPDATE_SESSION_RETRIEVE_FD ioctl to userspace, but nothing prevents it from trying this again. The retry is problematic for much of the same reasons listed above. The file is likely in a very different state than what the retrieve logic normally expects, and it might even have freed some serialization data structures. Attempting to access them or free them again is going to break things. For example, if memfd managed to restore 8 of its 10 folios, but fails on the 9th, a subsequent retrieve attempt will try to call kho_restore_folio() on the first folio again, and that will fail with a warning since it is an invalid operation. Apart from the retry, finish() also breaks. Since on failure the retrieved bool in luo_file is never touched, the finish() call on session close will tell the file handler that retrieve was never attempted, and it will try to access or free the data structures that might not exist, much in the same way as the retry attempt. There is no sane way of attempting the retrieve again. Remember the error retrieve returned and directly return it on a retry. Also pass this status code to finish() so it can make the right decision on the work it needs to do. This is done by changing the bool to an integer. A value of 0 means retrieve was never attempted, a positive value means it succeeded, and a negative value means it failed and the error code is the value. Link: https://lkml.kernel.org/r/20260216132221.987987-1-pratyush@kernel.org Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks") Signed-off-by: Pratyush Yadav (Google) Reviewed-by: Mike Rapoport (Microsoft) Cc: Pasha Tatashin Cc: Signed-off-by: Andrew Morton --- include/linux/liveupdate.h | 9 +++++--- kernel/liveupdate/luo_file.c | 41 ++++++++++++++++++++++-------------- mm/memfd_luo.c | 7 +++++- 3 files changed, 37 insertions(+), 20 deletions(-) diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h index fe82a6c3005f..dd11fdc76a5f 100644 --- a/include/linux/liveupdate.h +++ b/include/linux/liveupdate.h @@ -23,8 +23,11 @@ struct file; /** * struct liveupdate_file_op_args - Arguments for file operation callbacks. * @handler: The file handler being called. - * @retrieved: The retrieve status for the 'can_finish / finish' - * operation. + * @retrieve_status: The retrieve status for the 'can_finish / finish' + * operation. A value of 0 means the retrieve has not been + * attempted, a positive value means the retrieve was + * successful, and a negative value means the retrieve failed, + * and the value is the error code of the call. * @file: The file object. For retrieve: [OUT] The callback sets * this to the new file. For other ops: [IN] The caller sets * this to the file being operated on. @@ -40,7 +43,7 @@ struct file; */ struct liveupdate_file_op_args { struct liveupdate_file_handler *handler; - bool retrieved; + int retrieve_status; struct file *file; u64 serialized_data; void *private_data; diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c index 8c79058253e1..5acee4174bf0 100644 --- a/kernel/liveupdate/luo_file.c +++ b/kernel/liveupdate/luo_file.c @@ -134,9 +134,12 @@ static LIST_HEAD(luo_file_handler_list); * state that is not preserved. Set by the handler's .preserve() * callback, and must be freed in the handler's .unpreserve() * callback. - * @retrieved: A flag indicating whether a user/kernel in the new kernel has + * @retrieve_status: Status code indicating whether a user/kernel in the new kernel has * successfully called retrieve() on this file. This prevents - * multiple retrieval attempts. + * multiple retrieval attempts. A value of 0 means a retrieve() + * has not been attempted, a positive value means the retrieve() + * was successful, and a negative value means the retrieve() + * failed, and the value is the error code of the call. * @mutex: A mutex that protects the fields of this specific instance * (e.g., @retrieved, @file), ensuring that operations like * retrieving or finishing a file are atomic. @@ -161,7 +164,7 @@ struct luo_file { struct file *file; u64 serialized_data; void *private_data; - bool retrieved; + int retrieve_status; struct mutex mutex; struct list_head list; u64 token; @@ -298,7 +301,6 @@ int luo_preserve_file(struct luo_file_set *file_set, u64 token, int fd) luo_file->file = file; luo_file->fh = fh; luo_file->token = token; - luo_file->retrieved = false; mutex_init(&luo_file->mutex); args.handler = fh; @@ -577,7 +579,12 @@ int luo_retrieve_file(struct luo_file_set *file_set, u64 token, return -ENOENT; guard(mutex)(&luo_file->mutex); - if (luo_file->retrieved) { + if (luo_file->retrieve_status < 0) { + /* Retrieve was attempted and it failed. Return the error code. */ + return luo_file->retrieve_status; + } + + if (luo_file->retrieve_status > 0) { /* * Someone is asking for this file again, so get a reference * for them. @@ -590,16 +597,19 @@ int luo_retrieve_file(struct luo_file_set *file_set, u64 token, args.handler = luo_file->fh; args.serialized_data = luo_file->serialized_data; err = luo_file->fh->ops->retrieve(&args); - if (!err) { - luo_file->file = args.file; - - /* Get reference so we can keep this file in LUO until finish */ - get_file(luo_file->file); - *filep = luo_file->file; - luo_file->retrieved = true; + if (err) { + /* Keep the error code for later use. */ + luo_file->retrieve_status = err; + return err; } - return err; + luo_file->file = args.file; + /* Get reference so we can keep this file in LUO until finish */ + get_file(luo_file->file); + *filep = luo_file->file; + luo_file->retrieve_status = 1; + + return 0; } static int luo_file_can_finish_one(struct luo_file_set *file_set, @@ -615,7 +625,7 @@ static int luo_file_can_finish_one(struct luo_file_set *file_set, args.handler = luo_file->fh; args.file = luo_file->file; args.serialized_data = luo_file->serialized_data; - args.retrieved = luo_file->retrieved; + args.retrieve_status = luo_file->retrieve_status; can_finish = luo_file->fh->ops->can_finish(&args); } @@ -632,7 +642,7 @@ static void luo_file_finish_one(struct luo_file_set *file_set, args.handler = luo_file->fh; args.file = luo_file->file; args.serialized_data = luo_file->serialized_data; - args.retrieved = luo_file->retrieved; + args.retrieve_status = luo_file->retrieve_status; luo_file->fh->ops->finish(&args); luo_flb_file_finish(luo_file->fh); @@ -788,7 +798,6 @@ int luo_file_deserialize(struct luo_file_set *file_set, luo_file->file = NULL; luo_file->serialized_data = file_ser[i].data; luo_file->token = file_ser[i].token; - luo_file->retrieved = false; mutex_init(&luo_file->mutex); list_add_tail(&luo_file->list, &file_set->files_list); } diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c index 5c17da3880c5..e485b828d173 100644 --- a/mm/memfd_luo.c +++ b/mm/memfd_luo.c @@ -326,7 +326,12 @@ static void memfd_luo_finish(struct liveupdate_file_op_args *args) struct memfd_luo_folio_ser *folios_ser; struct memfd_luo_ser *ser; - if (args->retrieved) + /* + * If retrieve was successful, nothing to do. If it failed, retrieve() + * already cleaned up everything it could. So nothing to do there + * either. Only need to clean up when retrieve was not called. + */ + if (args->retrieve_status) return; ser = phys_to_virt(args->serialized_data); From 319d0bff22f3dd7a982c289e8336da69f0581299 Mon Sep 17 00:00:00 2001 From: "Vlastimil Babka (SUSE)" Date: Tue, 17 Feb 2026 11:21:52 +0100 Subject: [PATCH 217/359] MAINTAINERS, mailmap: update e-mail address for Vlastimil Babka Hopefully improve e-mail performance. Link: https://lkml.kernel.org/r/20260217102151.10425-2-vbabka@kernel.org Signed-off-by: Vlastimil Babka (SUSE) Signed-off-by: Andrew Morton --- .mailmap | 1 + MAINTAINERS | 18 +++++++++--------- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/.mailmap b/.mailmap index e1cf6bb85d33..b3785362f73f 100644 --- a/.mailmap +++ b/.mailmap @@ -876,6 +876,7 @@ Vivien Didelot Vlad Dogaru Vladimir Davydov Vladimir Davydov +Vlastimil Babka WangYuli WangYuli Weiwen Hu diff --git a/MAINTAINERS b/MAINTAINERS index 55af015174a5..02fe14782533 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16656,7 +16656,7 @@ M: Andrew Morton M: David Hildenbrand R: Lorenzo Stoakes R: Liam R. Howlett -R: Vlastimil Babka +R: Vlastimil Babka R: Mike Rapoport R: Suren Baghdasaryan R: Michal Hocko @@ -16786,7 +16786,7 @@ M: Andrew Morton M: David Hildenbrand R: Lorenzo Stoakes R: Liam R. Howlett -R: Vlastimil Babka +R: Vlastimil Babka R: Mike Rapoport R: Suren Baghdasaryan R: Michal Hocko @@ -16841,7 +16841,7 @@ F: mm/oom_kill.c MEMORY MANAGEMENT - PAGE ALLOCATOR M: Andrew Morton -M: Vlastimil Babka +M: Vlastimil Babka R: Suren Baghdasaryan R: Michal Hocko R: Brendan Jackman @@ -16887,7 +16887,7 @@ M: David Hildenbrand M: Lorenzo Stoakes R: Rik van Riel R: Liam R. Howlett -R: Vlastimil Babka +R: Vlastimil Babka R: Harry Yoo R: Jann Horn L: linux-mm@kvack.org @@ -16986,7 +16986,7 @@ MEMORY MAPPING M: Andrew Morton M: Liam R. Howlett M: Lorenzo Stoakes -R: Vlastimil Babka +R: Vlastimil Babka R: Jann Horn R: Pedro Falcato L: linux-mm@kvack.org @@ -17016,7 +17016,7 @@ M: Andrew Morton M: Suren Baghdasaryan M: Liam R. Howlett M: Lorenzo Stoakes -R: Vlastimil Babka +R: Vlastimil Babka R: Shakeel Butt L: linux-mm@kvack.org S: Maintained @@ -17032,7 +17032,7 @@ M: Andrew Morton M: Liam R. Howlett M: Lorenzo Stoakes M: David Hildenbrand -R: Vlastimil Babka +R: Vlastimil Babka R: Jann Horn L: linux-mm@kvack.org S: Maintained @@ -23174,7 +23174,7 @@ K: \b(?i:rust)\b RUST [ALLOC] M: Danilo Krummrich R: Lorenzo Stoakes -R: Vlastimil Babka +R: Vlastimil Babka R: Liam R. Howlett R: Uladzislau Rezki L: rust-for-linux@vger.kernel.org @@ -24350,7 +24350,7 @@ F: Documentation/devicetree/bindings/nvmem/layouts/kontron,sl28-vpd.yaml F: drivers/nvmem/layouts/sl28vpd.c SLAB ALLOCATOR -M: Vlastimil Babka +M: Vlastimil Babka M: Andrew Morton R: Christoph Lameter R: David Rientjes From fdb24a820a5832ec4532273282cbd4f22c291a0d Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Tue, 17 Feb 2026 05:09:55 +0000 Subject: [PATCH 218/359] Squashfs: check metadata block offset is within range Syzkaller reports a "general protection fault in squashfs_copy_data" This is ultimately caused by a corrupted index look-up table, which produces a negative metadata block offset. This is subsequently passed to squashfs_copy_data (via squashfs_read_metadata) where the negative offset causes an out of bounds access. The fix is to check that the offset is within range in squashfs_read_metadata. This will trap this and other cases. Link: https://lkml.kernel.org/r/20260217050955.138351-1-phillip@squashfs.org.uk Fixes: f400e12656ab ("Squashfs: cache operations") Reported-by: syzbot+a9747fe1c35a5b115d3f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/699234e2.a70a0220.2c38d7.00e2.GAE@google.com/ Signed-off-by: Phillip Lougher Cc: Christian Brauner Cc: Signed-off-by: Andrew Morton --- fs/squashfs/cache.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c index 8e958db5f786..67abd4dff222 100644 --- a/fs/squashfs/cache.c +++ b/fs/squashfs/cache.c @@ -344,6 +344,9 @@ int squashfs_read_metadata(struct super_block *sb, void *buffer, if (unlikely(length < 0)) return -EIO; + if (unlikely(*offset < 0 || *offset >= SQUASHFS_METADATA_SIZE)) + return -EIO; + while (length) { entry = squashfs_cache_get(sb, msblk->block_cache, *block, 0); if (entry->error) { From c80f46ac228b48403866d65391ad09bdf0e8562a Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Sat, 14 Feb 2026 13:41:21 -0800 Subject: [PATCH 219/359] mm/damon/core: disallow non-power of two min_region_sz DAMON core uses min_region_sz parameter value as the DAMON region alignment. The alignment is made using ALIGN() and ALIGN_DOWN(), which support only the power of two alignments. But DAMON core API callers can set min_region_sz to an arbitrary number. Users can also set it indirectly, using addr_unit. When the alignment is not properly set, DAMON behavior becomes difficult to expect and understand, makes it effectively broken. It doesn't cause a kernel crash-like significant issue, though. Fix the issue by disallowing min_region_sz input that is not a power of two. Add the check to damon_commit_ctx(), as all DAMON API callers who set min_region_sz uses the function. This can be a sort of behavioral change, but it does not break users, for the following reasons. As the symptom is making DAMON effectively broken, it is not reasonable to believe there are real use cases of non-power of two min_region_sz. There is no known use case or issue reports from the setup, either. In future, if we find real use cases of non-power of two alignments and we can support it with low enough overhead, we can consider moving the restriction. But, for now, simply disallowing the corner case should be good enough as a hot fix. Link: https://lkml.kernel.org/r/20260214214124.87689-1-sj@kernel.org Fixes: d8f867fa0825 ("mm/damon: add damon_ctx->min_sz_region") Signed-off-by: SeongJae Park Cc: Quanmin Yan Cc: [6.18+] Signed-off-by: Andrew Morton --- mm/damon/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/damon/core.c b/mm/damon/core.c index 01eba1a547d4..adfc52fee9dc 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -1252,6 +1252,9 @@ int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src) { int err; + if (!is_power_of_2(src->min_region_sz)) + return -EINVAL; + err = damon_commit_schemes(dst, src); if (err) return err; From d155aab90fffa00f93cea1f107aef0a3d548b2ff Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Fri, 20 Feb 2026 15:49:40 +0100 Subject: [PATCH 220/359] mm/kfence: fix KASAN hardware tag faults during late enablement When KASAN hardware tags are enabled, re-enabling KFENCE late (via /sys/module/kfence/parameters/sample_interval) causes KASAN faults. This happens because the KFENCE pool and metadata are allocated via the page allocator, which tags the memory, while KFENCE continues to access it using untagged pointers during initialization. Use __GFP_SKIP_KASAN for late KFENCE pool and metadata allocations to ensure the memory remains untagged, consistent with early allocations from memblock. To support this, add __GFP_SKIP_KASAN to the allowlist in __alloc_contig_verify_gfp_mask(). Link: https://lkml.kernel.org/r/20260220144940.2779209-1-glider@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Alexander Potapenko Suggested-by: Ernesto Martinez Garcia Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Greg KH Cc: Kees Cook Cc: Marco Elver Cc: Signed-off-by: Andrew Morton --- mm/kfence/core.c | 14 ++++++++------ mm/page_alloc.c | 3 ++- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/mm/kfence/core.c b/mm/kfence/core.c index b5aedf505cec..7393957f9a20 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -1004,14 +1004,14 @@ static int kfence_init_late(void) #ifdef CONFIG_CONTIG_ALLOC struct page *pages; - pages = alloc_contig_pages(nr_pages_pool, GFP_KERNEL, first_online_node, - NULL); + pages = alloc_contig_pages(nr_pages_pool, GFP_KERNEL | __GFP_SKIP_KASAN, + first_online_node, NULL); if (!pages) return -ENOMEM; __kfence_pool = page_to_virt(pages); - pages = alloc_contig_pages(nr_pages_meta, GFP_KERNEL, first_online_node, - NULL); + pages = alloc_contig_pages(nr_pages_meta, GFP_KERNEL | __GFP_SKIP_KASAN, + first_online_node, NULL); if (pages) kfence_metadata_init = page_to_virt(pages); #else @@ -1021,11 +1021,13 @@ static int kfence_init_late(void) return -EINVAL; } - __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL); + __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, + GFP_KERNEL | __GFP_SKIP_KASAN); if (!__kfence_pool) return -ENOMEM; - kfence_metadata_init = alloc_pages_exact(KFENCE_METADATA_SIZE, GFP_KERNEL); + kfence_metadata_init = alloc_pages_exact(KFENCE_METADATA_SIZE, + GFP_KERNEL | __GFP_SKIP_KASAN); #endif if (!kfence_metadata_init) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index fcc32737f451..2d4b6f1a554e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6928,7 +6928,8 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask) { const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM; const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN | - __GFP_ZERO | __GFP_ZEROTAGS | __GFP_SKIP_ZERO; + __GFP_ZERO | __GFP_ZEROTAGS | __GFP_SKIP_ZERO | + __GFP_SKIP_KASAN; const gfp_t cc_action_mask = __GFP_RETRY_MAYFAIL | __GFP_NOWARN; /* From 079c24d5690262e83ee476e2a548e416f3237511 Mon Sep 17 00:00:00 2001 From: Kalesh Singh Date: Thu, 19 Feb 2026 15:36:56 -0800 Subject: [PATCH 221/359] mm/tracing: rss_stat: ensure curr is false from kthread context The rss_stat trace event allows userspace tools, like Perfetto [1], to inspect per-process RSS metric changes over time. The curr field was introduced to rss_stat in commit e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm"). Its intent is to indicate whether the RSS update is for the mm_struct of the current execution context; and is set to false when operating on a remote mm_struct (e.g., via kswapd or a direct reclaimer). However, an issue arises when a kernel thread temporarily adopts a user process's mm_struct. Kernel threads do not have their own mm_struct and normally have current->mm set to NULL. To operate on user memory, they can "borrow" a memory context using kthread_use_mm(), which sets current->mm to the user process's mm. This can be observed, for example, in the USB Function Filesystem (FFS) driver. The ffs_user_copy_worker() handles AIO completions and uses kthread_use_mm() to copy data to a user-space buffer. If a page fault occurs during this copy, the fault handler executes in the kthread's context. At this point, current is the kthread, but current->mm points to the user process's mm. Since the rss_stat event (from the page fault) is for that same mm, the condition current->mm == mm becomes true, causing curr to be incorrectly set to true when the trace event is emitted. This is misleading because it suggests the mm belongs to the kthread, confusing userspace tools that track per-process RSS changes and corrupting their mm_id-to-process association. Fix this by ensuring curr is always false when the trace event is emitted from a kthread context by checking for the PF_KTHREAD flag. Link: https://lkml.kernel.org/r/20260219233708.1971199-1-kaleshsingh@google.com Link: https://perfetto.dev/ [1] Fixes: e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm") Signed-off-by: Kalesh Singh Acked-by: Zi Yan Acked-by: SeongJae Park Reviewed-by: Pedro Falcato Cc: "David Hildenbrand (Arm)" Cc: Joel Fernandes Cc: Lorenzo Stoakes Cc: Minchan Kim Cc: Steven Rostedt Cc: Suren Baghdasaryan Cc: [5.10+] Signed-off-by: Andrew Morton --- include/trace/events/kmem.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index 7f93e754da5c..cd7920c81f85 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -440,7 +440,13 @@ TRACE_EVENT(rss_stat, TP_fast_assign( __entry->mm_id = mm_ptr_to_hash(mm); - __entry->curr = !!(current->mm == mm); + /* + * curr is true if the mm matches the current task's mm_struct. + * Since kthreads (PF_KTHREAD) have no mm_struct of their own + * but can borrow one via kthread_use_mm(), we must filter them + * out to avoid incorrectly attributing the RSS update to them. + */ + __entry->curr = current->mm == mm && !(current->flags & PF_KTHREAD); __entry->member = member; __entry->size = (percpu_counter_sum_positive(&mm->rss_stat[member]) << PAGE_SHIFT); From a4ab97e34bb687a2ca63fc70b47e8762e689797f Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Sun, 22 Feb 2026 19:57:02 +0800 Subject: [PATCH 222/359] mm: fix NULL NODE_DATA dereference for memoryless nodes on boot Commit d49004c5f0c1 ("arch, mm: consolidate initialization of nodes, zones and memory map") moved free_area_init() from setup_arch() to mm_core_init_early(), which runs after setup_arch() returns. This changed the ordering relative to init_cpu_to_node() on x86. Before the commit, free_area_init() ran during paging_init() (called from setup_arch()) *before* init_cpu_to_node(). After the commit, it runs *after* init_cpu_to_node(). On machines with memoryless NUMA nodes (e.g., node 0 has CPUs but no memory), this causes a NULL pointer dereference: 1. numa_register_nodes() skips memoryless nodes: no alloc_node_data() and no node_set_online() for them. 2. init_cpu_to_node() sets memoryless nodes online (they have CPUs) but does not allocate NODE_DATA. 3. free_area_init() checks "if (!node_online(nid))" to decide whether to call alloc_offline_node_data(). Since the memoryless node is now online, the allocation is skipped, leaving NODE_DATA(nid) == NULL. 4. The immediate "pgdat = NODE_DATA(nid)" dereferences NULL. The crash happens before console_init(), so no output is visible without earlyprintk. With earlyprintk enabled, the following panic is observed: BUG: unable to handle page fault for address: 000000000002a1e0 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:free_area_init_node+0x3a/0x540 Call Trace: free_area_init+0x331/0x4e0 start_kernel+0x69/0x4a0 x86_64_start_reservations+0x24/0x30 x86_64_start_kernel+0x125/0x130 common_startup_64+0x13e/0x148 Kernel panic - not syncing: Attempted to kill the idle task! Fix this by checking "if (!NODE_DATA(nid))" instead of "if (!node_online(nid))". This directly tests whether the per-node data structure needs to be allocated, regardless of the node's online status. This change is also safe for non-x86 architectures as they all allocate NODE_DATA for every node including memoryless ones, so the check simply evaluates to false with no change in behavior. Link: https://lkml.kernel.org/r/20260222115702.3659-1-ming.lei@redhat.com Fixes: d49004c5f0c1 ("arch, mm: consolidate initialization of nodes, zones and memory map") Signed-off-by: Ming Lei Reviewed-by: Mike Rapoport (Microsoft) Signed-off-by: Andrew Morton --- mm/mm_init.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/mm/mm_init.c b/mm/mm_init.c index 61d983d23f55..df34797691bd 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1896,7 +1896,11 @@ static void __init free_area_init(void) for_each_node(nid) { pg_data_t *pgdat; - if (!node_online(nid)) + /* + * If an architecture has not allocated node data for + * this node, presume the node is memoryless or offline. + */ + if (!NODE_DATA(nid)) alloc_offline_node_data(nid); pgdat = NODE_DATA(nid); From 37a012c5c10c3364c5cba5def30dd7a17a6b587a Mon Sep 17 00:00:00 2001 From: Daniele Alessandrelli Date: Mon, 23 Feb 2026 17:09:05 +0000 Subject: [PATCH 223/359] mailmap: add entry for Daniele Alessandrelli My Intel email is going to bounce soon. Map it to my personal Gmail address. Link: https://lkml.kernel.org/r/20260223170905.278956-1-daniele.alessandrelli@intel.com Signed-off-by: Daniele Alessandrelli Cc: Daniele Alessandrelli Signed-off-by: Andrew Morton --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index b3785362f73f..cd09bc545586 100644 --- a/.mailmap +++ b/.mailmap @@ -211,6 +211,7 @@ Daniel Borkmann Daniel Borkmann Daniel Borkmann Daniel Thompson +Daniele Alessandrelli Danilo Krummrich David Brownell David Collins From 410aed670cddac1de4f0c2865f30ec623fd20f78 Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Mon, 23 Feb 2026 16:00:26 +0000 Subject: [PATCH 224/359] MAINTAINERS: update Yosry Ahmed's email address Use my kernel.org email address. Link: https://lkml.kernel.org/r/20260223160027.122307-1-yosry@kernel.org Signed-off-by: Yosry Ahmed Cc: Johannes Weiner Cc: Nhat Pham Signed-off-by: Andrew Morton --- .mailmap | 3 ++- MAINTAINERS | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/.mailmap b/.mailmap index cd09bc545586..c124a1306d26 100644 --- a/.mailmap +++ b/.mailmap @@ -892,7 +892,8 @@ Yanteng Si Ying Huang Yixun Lan Yixun Lan -Yosry Ahmed +Yosry Ahmed +Yosry Ahmed Yu-Chun Lin Yusuke Goda Zack Rusin diff --git a/MAINTAINERS b/MAINTAINERS index 02fe14782533..e4572a36afd2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -29186,7 +29186,7 @@ K: zstd ZSWAP COMPRESSED SWAP CACHING M: Johannes Weiner -M: Yosry Ahmed +M: Yosry Ahmed M: Nhat Pham R: Chengming Zhou L: linux-mm@kvack.org From 2f38fd99c0004676d835ae96ac4f3b54edc02c82 Mon Sep 17 00:00:00 2001 From: wangshuaiwei Date: Tue, 24 Feb 2026 14:32:28 +0800 Subject: [PATCH 225/359] scsi: ufs: core: Fix shift out of bounds when MAXQ=32 According to JESD223F, the maximum number of queues (MAXQ) is 32. When MCQ is enabled and ESI is disabled, nr_hw_queues=32 causes a shift overflow problem. Fix this by using 64-bit intermediate values to handle the nr_hw_queues=32 case safely. Signed-off-by: wangshuaiwei Reviewed-by: Bart Van Assche Link: https://patch.msgid.link/20260224063228.50112-1-wangshuaiwei1@xiaomi.com Signed-off-by: Martin K. Petersen --- drivers/ufs/core/ufshcd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 85987373b961..899e663fea6e 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -7110,7 +7110,7 @@ static irqreturn_t ufshcd_handle_mcq_cq_events(struct ufs_hba *hba) ret = ufshcd_vops_get_outstanding_cqs(hba, &outstanding_cqs); if (ret) - outstanding_cqs = (1U << hba->nr_hw_queues) - 1; + outstanding_cqs = (1ULL << hba->nr_hw_queues) - 1; /* Exclude the poll queues */ nr_queues = hba->nr_hw_queues - hba->nr_queues[HCTX_TYPE_POLL]; From bfbc0b5b32a8f28ce284add619bf226716a59bc0 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 24 Feb 2026 11:51:16 -0700 Subject: [PATCH 226/359] media: dvb-core: fix wrong reinitialization of ringbuffer on reopen dvb_dvr_open() calls dvb_ringbuffer_init() when a new reader opens the DVR device. dvb_ringbuffer_init() calls init_waitqueue_head(), which reinitializes the waitqueue list head to empty. Since dmxdev->dvr_buffer.queue is a shared waitqueue (all opens of the same DVR device share it), this orphans any existing waitqueue entries from io_uring poll or epoll, leaving them with stale prev/next pointers while the list head is reset to {self, self}. The waitqueue and spinlock in dvr_buffer are already properly initialized once in dvb_dmxdev_init(). The open path only needs to reset the buffer data pointer, size, and read/write positions. Replace the dvb_ringbuffer_init() call in dvb_dvr_open() with direct assignment of data/size and a call to dvb_ringbuffer_reset(), which properly resets pread, pwrite, and error with correct memory ordering without touching the waitqueue or spinlock. Cc: stable@vger.kernel.org Fixes: 34731df288a5f ("V4L/DVB (3501): Dmxdev: use dvb_ringbuffer") Reported-by: syzbot+ab12f0c08dd7ab8d057c@syzkaller.appspotmail.com Tested-by: syzbot+ab12f0c08dd7ab8d057c@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/698a26d3.050a0220.3b3015.007d.GAE@google.com/ Signed-off-by: Jens Axboe Signed-off-by: Linus Torvalds --- drivers/media/dvb-core/dmxdev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/media/dvb-core/dmxdev.c b/drivers/media/dvb-core/dmxdev.c index 7cfed24054d0..3c8bc75e4d6c 100644 --- a/drivers/media/dvb-core/dmxdev.c +++ b/drivers/media/dvb-core/dmxdev.c @@ -168,7 +168,9 @@ static int dvb_dvr_open(struct inode *inode, struct file *file) mutex_unlock(&dmxdev->mutex); return -ENOMEM; } - dvb_ringbuffer_init(&dmxdev->dvr_buffer, mem, DVR_BUFFER_SIZE); + dmxdev->dvr_buffer.data = mem; + dmxdev->dvr_buffer.size = DVR_BUFFER_SIZE; + dvb_ringbuffer_reset(&dmxdev->dvr_buffer); if (dmxdev->may_do_mmap) dvb_vb2_init(&dmxdev->dvr_vb2_ctx, "dvr", &dmxdev->mutex, From 6acf7860dcc79ed045cc9e6a79c8a8bb6959dba7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 24 Feb 2026 06:21:44 -0800 Subject: [PATCH 227/359] zloop: advertise a volatile write cache Zloop is file system backed and thus needs to sync the underlying file system to persist data. Set BLK_FEAT_WRITE_CACHE so that the block layer actually send flush commands, and fix the flush implementation as sync_filesystem requires s_umount to be held and the code currently misses that. Fixes: eb0570c7df23 ("block: new zoned loop block device driver") Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal Signed-off-by: Jens Axboe --- drivers/block/zloop.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c index 8e334f5025fc..ae9bf2a85c21 100644 --- a/drivers/block/zloop.c +++ b/drivers/block/zloop.c @@ -542,6 +542,21 @@ out: zloop_put_cmd(cmd); } +/* + * Sync the entire FS containing the zone files instead of walking all files. + */ +static int zloop_flush(struct zloop_device *zlo) +{ + struct super_block *sb = file_inode(zlo->data_dir)->i_sb; + int ret; + + down_read(&sb->s_umount); + ret = sync_filesystem(sb); + up_read(&sb->s_umount); + + return ret; +} + static void zloop_handle_cmd(struct zloop_cmd *cmd) { struct request *rq = blk_mq_rq_from_pdu(cmd); @@ -562,11 +577,7 @@ static void zloop_handle_cmd(struct zloop_cmd *cmd) zloop_rw(cmd); return; case REQ_OP_FLUSH: - /* - * Sync the entire FS containing the zone files instead of - * walking all files - */ - cmd->ret = sync_filesystem(file_inode(zlo->data_dir)->i_sb); + cmd->ret = zloop_flush(zlo); break; case REQ_OP_ZONE_RESET: cmd->ret = zloop_reset_zone(zlo, rq_zone_no(rq)); @@ -981,7 +992,8 @@ static int zloop_ctl_add(struct zloop_options *opts) struct queue_limits lim = { .max_hw_sectors = SZ_1M >> SECTOR_SHIFT, .chunk_sectors = opts->zone_size, - .features = BLK_FEAT_ZONED, + .features = BLK_FEAT_ZONED | BLK_FEAT_WRITE_CACHE, + }; unsigned int nr_zones, i, j; struct zloop_device *zlo; From 3c4617117a2b7682cf037be5e5533e379707f050 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 24 Feb 2026 06:21:45 -0800 Subject: [PATCH 228/359] zloop: check for spurious options passed to remove Zloop uses a command option parser for all control commands, but most options are only valid for adding a new device. Check for incorrectly specified options in the remove handler. Fixes: eb0570c7df23 ("block: new zoned loop block device driver") Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal Signed-off-by: Jens Axboe --- drivers/block/zloop.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/block/zloop.c b/drivers/block/zloop.c index ae9bf2a85c21..9e3bb538d5fc 100644 --- a/drivers/block/zloop.c +++ b/drivers/block/zloop.c @@ -1174,7 +1174,12 @@ static int zloop_ctl_remove(struct zloop_options *opts) int ret; if (!(opts->mask & ZLOOP_OPT_ID)) { - pr_err("No ID specified\n"); + pr_err("No ID specified for remove\n"); + return -EINVAL; + } + + if (opts->mask & ~ZLOOP_OPT_ID) { + pr_err("Invalid option specified for remove\n"); return -EINVAL; } From 4b44cbb264d0ed3f2f2bc2659db6ce45882f4670 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 24 Feb 2026 15:24:52 -0800 Subject: [PATCH 229/359] overflow: Make sure size helpers are always inlined With kmalloc_obj() performing implicit size calculations, the embedded size_mul() calls, while marked inline, were not always being inlined. I noticed a couple places where allocations were making a call out for things that would otherwise be compile-time calculated. Force the compilers to always inline these calculations. Reviewed-by: Gustavo A. R. Silva Link: https://patch.msgid.link/20260224232451.work.614-kees@kernel.org Signed-off-by: Kees Cook --- include/linux/overflow.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/overflow.h b/include/linux/overflow.h index eddd987a8513..a8cb6319b4fb 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -42,7 +42,7 @@ * both the type-agnostic benefits of the macros while also being able to * enforce that the return value is, in fact, checked. */ -static inline bool __must_check __must_check_overflow(bool overflow) +static __always_inline bool __must_check __must_check_overflow(bool overflow) { return unlikely(overflow); } @@ -327,7 +327,7 @@ static inline bool __must_check __must_check_overflow(bool overflow) * with any overflow causing the return value to be SIZE_MAX. The * lvalue must be size_t to avoid implicit type conversion. */ -static inline size_t __must_check size_mul(size_t factor1, size_t factor2) +static __always_inline size_t __must_check size_mul(size_t factor1, size_t factor2) { size_t bytes; @@ -346,7 +346,7 @@ static inline size_t __must_check size_mul(size_t factor1, size_t factor2) * with any overflow causing the return value to be SIZE_MAX. The * lvalue must be size_t to avoid implicit type conversion. */ -static inline size_t __must_check size_add(size_t addend1, size_t addend2) +static __always_inline size_t __must_check size_add(size_t addend1, size_t addend2) { size_t bytes; @@ -367,7 +367,7 @@ static inline size_t __must_check size_add(size_t addend1, size_t addend2) * argument may be SIZE_MAX (or the result with be forced to SIZE_MAX). * The lvalue must be size_t to avoid implicit type conversion. */ -static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) +static __always_inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) { size_t bytes; From 2f61f38a217462411fed950e843b82bc119884cf Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 23 Feb 2026 12:19:08 +0000 Subject: [PATCH 230/359] net: stmmac: fix timestamping configuration after suspend/resume When stmmac_init_timestamping() is called, it clears the receive and transmit path booleans that allow timestamps to be read. These are never re-initialised until after userspace requests timestamping features to be enabled. However, our copy of the timestamp configuration is not cleared, which means we return the old configuration to userspace when requested. This is inconsistent. Fix this by clearing the timestamp configuration. Fixes: d6228b7cdd6e ("net: stmmac: implement the SIOCGHWTSTAMP ioctl") Signed-off-by: Russell King (Oracle) Reviewed-by: Simon Horman Link: https://patch.msgid.link/E1vuUu4-0000000Afea-0j9B@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 82375d34ad57..91f638ce132e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -853,6 +853,7 @@ static int stmmac_init_timestamping(struct stmmac_priv *priv) netdev_info(priv->dev, "IEEE 1588-2008 Advanced Timestamp supported\n"); + memset(&priv->tstamp_config, 0, sizeof(priv->tstamp_config)); priv->hwts_tx_en = 0; priv->hwts_rx_en = 0; From c601fd5414315fc515f746b499110e46272e7243 Mon Sep 17 00:00:00 2001 From: Jonathan Cavitt Date: Tue, 24 Feb 2026 22:12:28 +0000 Subject: [PATCH 231/359] drm/client: Do not destroy NULL modes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 'modes' in drm_client_modeset_probe may fail to kcalloc. If this occurs, we jump to 'out', calling modes_destroy on it, which dereferences it. This may result in a NULL pointer dereference in the error case. Prevent that. Fixes: 3039cc0c0653 ("drm/client: Make copies of modes") Signed-off-by: Jonathan Cavitt Cc: Ville Syrjälä Signed-off-by: Ville Syrjälä Link: https://patch.msgid.link/20260224221227.69126-2-jonathan.cavitt@intel.com --- drivers/gpu/drm/drm_client_modeset.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_client_modeset.c b/drivers/gpu/drm/drm_client_modeset.c index 262b1b8773c5..bb49b8361271 100644 --- a/drivers/gpu/drm/drm_client_modeset.c +++ b/drivers/gpu/drm/drm_client_modeset.c @@ -930,7 +930,8 @@ int drm_client_modeset_probe(struct drm_client_dev *client, unsigned int width, mutex_unlock(&client->modeset_mutex); out: kfree(crtcs); - modes_destroy(dev, modes, connector_count); + if (modes) + modes_destroy(dev, modes, connector_count); kfree(modes); kfree(offsets); kfree(enabled); From 29c8b85adb47daefc213381bc1831787f512d89b Mon Sep 17 00:00:00 2001 From: Sascha Bischoff Date: Wed, 25 Feb 2026 08:31:40 +0000 Subject: [PATCH 232/359] irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag It appears that a !! became ! during a cleanup, resulting in inverted logic when detecting if a host GICv5 implementation is capable of virtualization. Re-add the missing !, fixing the behaviour. Fixes: 3227c3a89d65f ("irqchip/gic-v5: Check if impl is virt capable") Signed-off-by: Sascha Bischoff Link: https://patch.msgid.link/20260225083130.3378490-1-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier --- drivers/irqchip/irq-gic-v5-irs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic-v5-irs.c b/drivers/irqchip/irq-gic-v5-irs.c index eeeb40fb0eaa..4ed2bfa17c01 100644 --- a/drivers/irqchip/irq-gic-v5-irs.c +++ b/drivers/irqchip/irq-gic-v5-irs.c @@ -744,7 +744,7 @@ static int __init gicv5_irs_init(struct device_node *node) */ if (list_empty(&irs_nodes)) { idr = irs_readl_relaxed(irs_data, GICV5_IRS_IDR0); - gicv5_global_data.virt_capable = !FIELD_GET(GICV5_IRS_IDR0_VIRT, idr); + gicv5_global_data.virt_capable = !!FIELD_GET(GICV5_IRS_IDR0_VIRT, idr); idr = irs_readl_relaxed(irs_data, GICV5_IRS_IDR1); irs_setup_pri_bits(idr); From 4a2d046e4b13202a6301a993961f5b30ae4d7119 Mon Sep 17 00:00:00 2001 From: Gao Xiang Date: Tue, 24 Feb 2026 18:31:25 +0800 Subject: [PATCH 233/359] erofs: fix interlaced plain identification for encoded extents Only plain data whose start position and on-disk physical length are both aligned to the block size should be classified as interlaced plain extents. Otherwise, it must be treated as shifted plain extents. This issue was found by syzbot using a crafted compressed image containing plain extents with unaligned physical lengths, which can cause OOB read in z_erofs_transform_plain(). Reported-and-tested-by: syzbot+d988dc155e740d76a331@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/699d5714.050a0220.cdd3c.03e7.GAE@google.com Fixes: 1d191b4ca51d ("erofs: implement encoded extent metadata") Signed-off-by: Gao Xiang --- fs/erofs/zmap.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c index c8d8e129eb4b..30775502b56d 100644 --- a/fs/erofs/zmap.c +++ b/fs/erofs/zmap.c @@ -513,6 +513,7 @@ static int z_erofs_map_blocks_ext(struct inode *inode, unsigned int recsz = z_erofs_extent_recsize(vi->z_advise); erofs_off_t pos = round_up(Z_EROFS_MAP_HEADER_END(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize), recsz); + unsigned int bmask = sb->s_blocksize - 1; bool in_mbox = erofs_inode_in_metabox(inode); erofs_off_t lend = inode->i_size; erofs_off_t l, r, mid, pa, la, lstart; @@ -596,17 +597,17 @@ static int z_erofs_map_blocks_ext(struct inode *inode, map->m_flags |= EROFS_MAP_MAPPED | EROFS_MAP_FULL_MAPPED | EROFS_MAP_ENCODED; fmt = map->m_plen >> Z_EROFS_EXTENT_PLEN_FMT_BIT; + if (map->m_plen & Z_EROFS_EXTENT_PLEN_PARTIAL) + map->m_flags |= EROFS_MAP_PARTIAL_REF; + map->m_plen &= Z_EROFS_EXTENT_PLEN_MASK; if (fmt) map->m_algorithmformat = fmt - 1; - else if (interlaced && !erofs_blkoff(sb, map->m_pa)) + else if (interlaced && !((map->m_pa | map->m_plen) & bmask)) map->m_algorithmformat = Z_EROFS_COMPRESSION_INTERLACED; else map->m_algorithmformat = Z_EROFS_COMPRESSION_SHIFTED; - if (map->m_plen & Z_EROFS_EXTENT_PLEN_PARTIAL) - map->m_flags |= EROFS_MAP_PARTIAL_REF; - map->m_plen &= Z_EROFS_EXTENT_PLEN_MASK; } } map->m_llen = lend - map->m_la; From 54e367cb94d6bef941bbc1132d9959dc73bd4b6f Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 25 Feb 2026 10:47:18 +0000 Subject: [PATCH 234/359] KVM: arm64: Deduplicate ASID retrieval code We currently have three versions of the ASID retrieval code, one in the S1 walker, and two in the VNCR handling (although the last two are limited to the EL2&0 translation regime). Make this code common, and take this opportunity to also simplify the code a bit while switching over to the TTBRx_EL1_ASID macro. Reviewed-by: Joey Gouly Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_nested.h | 2 + arch/arm64/kvm/at.c | 27 +------------ arch/arm64/kvm/nested.c | 60 +++++++++++++++-------------- 3 files changed, 35 insertions(+), 54 deletions(-) diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h index 905c658057a4..091544e6af44 100644 --- a/arch/arm64/include/asm/kvm_nested.h +++ b/arch/arm64/include/asm/kvm_nested.h @@ -397,6 +397,8 @@ int kvm_vcpu_allocate_vncr_tlb(struct kvm_vcpu *vcpu); int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu); void kvm_handle_s1e2_tlbi(struct kvm_vcpu *vcpu, u32 inst, u64 val); +u16 get_asid_by_regime(struct kvm_vcpu *vcpu, enum trans_regime regime); + #define vncr_fixmap(c) \ ({ \ u32 __c = (c); \ diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c index 885bd5bb2f41..6588ea251ed7 100644 --- a/arch/arm64/kvm/at.c +++ b/arch/arm64/kvm/at.c @@ -540,31 +540,8 @@ static int walk_s1(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, wr->pa |= va & GENMASK_ULL(va_bottom - 1, 0); wr->nG = (wi->regime != TR_EL2) && (desc & PTE_NG); - if (wr->nG) { - u64 asid_ttbr, tcr; - - switch (wi->regime) { - case TR_EL10: - tcr = vcpu_read_sys_reg(vcpu, TCR_EL1); - asid_ttbr = ((tcr & TCR_A1) ? - vcpu_read_sys_reg(vcpu, TTBR1_EL1) : - vcpu_read_sys_reg(vcpu, TTBR0_EL1)); - break; - case TR_EL20: - tcr = vcpu_read_sys_reg(vcpu, TCR_EL2); - asid_ttbr = ((tcr & TCR_A1) ? - vcpu_read_sys_reg(vcpu, TTBR1_EL2) : - vcpu_read_sys_reg(vcpu, TTBR0_EL2)); - break; - default: - BUG(); - } - - wr->asid = FIELD_GET(TTBR_ASID_MASK, asid_ttbr); - if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || - !(tcr & TCR_ASID16)) - wr->asid &= GENMASK(7, 0); - } + if (wr->nG) + wr->asid = get_asid_by_regime(vcpu, wi->regime); return 0; diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c index 7f1ea85dc67a..787776aaf4ad 100644 --- a/arch/arm64/kvm/nested.c +++ b/arch/arm64/kvm/nested.c @@ -854,6 +854,33 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2) return kvm_inject_nested_sync(vcpu, esr_el2); } +u16 get_asid_by_regime(struct kvm_vcpu *vcpu, enum trans_regime regime) +{ + enum vcpu_sysreg ttbr_elx; + u64 tcr; + u16 asid; + + switch (regime) { + case TR_EL10: + tcr = vcpu_read_sys_reg(vcpu, TCR_EL1); + ttbr_elx = (tcr & TCR_A1) ? TTBR1_EL1 : TTBR0_EL1; + break; + case TR_EL20: + tcr = vcpu_read_sys_reg(vcpu, TCR_EL2); + ttbr_elx = (tcr & TCR_A1) ? TTBR1_EL2 : TTBR0_EL2; + break; + default: + BUG(); + } + + asid = FIELD_GET(TTBRx_EL1_ASID, vcpu_read_sys_reg(vcpu, ttbr_elx)); + if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || + !(tcr & TCR_ASID16)) + asid &= GENMASK(7, 0); + + return asid; +} + static void invalidate_vncr(struct vncr_tlb *vt) { vt->valid = false; @@ -1333,20 +1360,8 @@ static bool kvm_vncr_tlb_lookup(struct kvm_vcpu *vcpu) if (read_vncr_el2(vcpu) != vt->gva) return false; - if (vt->wr.nG) { - u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2); - u64 ttbr = ((tcr & TCR_A1) ? - vcpu_read_sys_reg(vcpu, TTBR1_EL2) : - vcpu_read_sys_reg(vcpu, TTBR0_EL2)); - u16 asid; - - asid = FIELD_GET(TTBR_ASID_MASK, ttbr); - if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || - !(tcr & TCR_ASID16)) - asid &= GENMASK(7, 0); - - return asid == vt->wr.asid; - } + if (vt->wr.nG) + return get_asid_by_regime(vcpu, TR_EL20) == vt->wr.asid; return true; } @@ -1449,21 +1464,8 @@ static void kvm_map_l1_vncr(struct kvm_vcpu *vcpu) if (read_vncr_el2(vcpu) != vt->gva) return; - if (vt->wr.nG) { - u64 tcr = vcpu_read_sys_reg(vcpu, TCR_EL2); - u64 ttbr = ((tcr & TCR_A1) ? - vcpu_read_sys_reg(vcpu, TTBR1_EL2) : - vcpu_read_sys_reg(vcpu, TTBR0_EL2)); - u16 asid; - - asid = FIELD_GET(TTBR_ASID_MASK, ttbr); - if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR0_EL1, ASIDBITS, 16) || - !(tcr & TCR_ASID16)) - asid &= GENMASK(7, 0); - - if (asid != vt->wr.asid) - return; - } + if (vt->wr.nG && get_asid_by_regime(vcpu, TR_EL20) != vt->wr.asid) + return; vt->cpu = smp_processor_id(); From 93a4a9b732fbb479f95d327aa867d094aed3f712 Mon Sep 17 00:00:00 2001 From: Stefan Metzmacher Date: Tue, 24 Feb 2026 17:59:52 +0100 Subject: [PATCH 235/359] RDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev() When listening on wildcard addresses we have a global list for the application layer rdma_cm_id and for any existing device or any device added in future we try to listen on any wildcard listener. When the listener has a restricted_node_type we should prevent listening on devices with a different node type. While there fix the documentation comment of rdma_restrict_node_type() to include rdma_resolve_addr() instead of having rdma_bind_addr() twice. Fixes: a760e80e90f5 ("RDMA/core: introduce rdma_restrict_node_type()") Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Steve French Cc: Namjae Jeon Cc: Tom Talpey Cc: Long Li Cc: linux-rdma@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher Link: https://patch.msgid.link/20260224165951.3582093-2-metze@samba.org Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/cma.c | 6 +++++- include/rdma/rdma_cm.h | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index e54c07c74575..9480d1a51c11 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2729,6 +2729,9 @@ static int cma_listen_on_dev(struct rdma_id_private *id_priv, *to_destroy = NULL; if (cma_family(id_priv) == AF_IB && !rdma_cap_ib_cm(cma_dev->device, 1)) return 0; + if (id_priv->restricted_node_type != RDMA_NODE_UNSPECIFIED && + id_priv->restricted_node_type != cma_dev->device->node_type) + return 0; dev_id_priv = __rdma_create_id(net, cma_listen_handler, id_priv, @@ -2736,6 +2739,7 @@ static int cma_listen_on_dev(struct rdma_id_private *id_priv, if (IS_ERR(dev_id_priv)) return PTR_ERR(dev_id_priv); + dev_id_priv->restricted_node_type = id_priv->restricted_node_type; dev_id_priv->state = RDMA_CM_ADDR_BOUND; memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv), rdma_addr_size(cma_src_addr(id_priv))); @@ -4194,7 +4198,7 @@ int rdma_restrict_node_type(struct rdma_cm_id *id, u8 node_type) } mutex_lock(&lock); - if (id_priv->cma_dev) + if (READ_ONCE(id_priv->state) != RDMA_CM_IDLE) ret = -EALREADY; else id_priv->restricted_node_type = node_type; diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 6de6fd8bd15e..d639ff889e64 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -181,7 +181,7 @@ void rdma_destroy_id(struct rdma_cm_id *id); * * It needs to be called before the RDMA identifier is bound * to an device, which mean it should be called before - * rdma_bind_addr(), rdma_bind_addr() and rdma_listen(). + * rdma_bind_addr(), rdma_resolve_addr() and rdma_listen(). */ int rdma_restrict_node_type(struct rdma_cm_id *id, u8 node_type); From 104016eb671e19709721c1b0048dd912dc2e96be Mon Sep 17 00:00:00 2001 From: Jacob Moroni Date: Tue, 24 Feb 2026 23:41:53 +0000 Subject: [PATCH 236/359] RDMA/umem: Fix double dma_buf_unpin in failure path In ib_umem_dmabuf_get_pinned_with_dma_device(), the call to ib_umem_dmabuf_map_pages() can fail. If this occurs, the dmabuf is immediately unpinned but the umem_dmabuf->pinned flag is still set. Then, when ib_umem_release() is called, it calls ib_umem_dmabuf_revoke() which will call dma_buf_unpin() again. Fix this by removing the immediate unpin upon failure and just let the ib_umem_release/revoke path handle it. This also ensures the proper unmap-unpin unwind ordering if the dmabuf_map_pages call happened to fail due to dma_resv_wait_timeout (and therefore has a non-NULL umem_dmabuf->sgt). Fixes: 1e4df4a21c5a ("RDMA/umem: Allow pinned dmabuf umem usage") Signed-off-by: Jacob Moroni Link: https://patch.msgid.link/20260224234153.1207849-1-jmoroni@google.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/umem_dmabuf.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c index f5298c33e581..d30f24b90bca 100644 --- a/drivers/infiniband/core/umem_dmabuf.c +++ b/drivers/infiniband/core/umem_dmabuf.c @@ -218,13 +218,11 @@ ib_umem_dmabuf_get_pinned_with_dma_device(struct ib_device *device, err = ib_umem_dmabuf_map_pages(umem_dmabuf); if (err) - goto err_unpin; + goto err_release; dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); return umem_dmabuf; -err_unpin: - dma_buf_unpin(umem_dmabuf->attach); err_release: dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); ib_umem_release(&umem_dmabuf->umem); From 18c16f602a67782f5eb4b5ab9ba73350b9f711ec Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Wed, 4 Feb 2026 20:36:26 +0530 Subject: [PATCH 237/359] xfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary() Replace ASSERT(sum > 0) with an XFS_IS_CORRUPT() and place it just after the call to xfs_rtget_summary() so that we don't end up using an illegal value of sum. Signed-off-by: Nirjhar Roy (IBM) Reviewed-by: Carlos Maiolino Reviewed-by: Darrick J. Wong Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_rtalloc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 90a94a5b6f7e..aab59f66384e 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -112,6 +112,10 @@ xfs_rtcopy_summary( error = xfs_rtget_summary(oargs, log, bbno, &sum); if (error) goto out; + if (XFS_IS_CORRUPT(oargs->mp, sum < 0)) { + error = -EFSCORRUPTED; + goto out; + } if (sum == 0) continue; error = xfs_rtmodify_summary(oargs, log, bbno, -sum); @@ -120,7 +124,6 @@ xfs_rtcopy_summary( error = xfs_rtmodify_summary(nargs, log, bbno, sum); if (error) goto out; - ASSERT(sum > 0); } } error = 0; From a49b7ff63f98ba1c4503869c568c99ecffa478f2 Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Wed, 11 Feb 2026 19:25:13 +0530 Subject: [PATCH 238/359] xfs: Refactoring the nagcount and delta calculation Introduce xfs_growfs_compute_delta() to calculate the nagcount and delta blocks and refactor the code from xfs_growfs_data_private(). No functional changes. Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Nirjhar Roy (IBM) Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_ag.c | 28 ++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_ag.h | 3 +++ fs/xfs/xfs_fsops.c | 17 ++--------------- 3 files changed, 33 insertions(+), 15 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index 9c6765cc2d44..bd8fbb40b49e 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -872,6 +872,34 @@ resv_err: return err2; } +void +xfs_growfs_compute_deltas( + struct xfs_mount *mp, + xfs_rfsblock_t nb, + int64_t *deltap, + xfs_agnumber_t *nagcountp) +{ + xfs_rfsblock_t nb_div, nb_mod; + int64_t delta; + xfs_agnumber_t nagcount; + + nb_div = nb; + nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); + if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) + nb_div++; + else if (nb_mod) + nb = nb_div * mp->m_sb.sb_agblocks; + + if (nb_div > XFS_MAX_AGNUMBER + 1) { + nb_div = XFS_MAX_AGNUMBER + 1; + nb = nb_div * mp->m_sb.sb_agblocks; + } + nagcount = nb_div; + delta = nb - mp->m_sb.sb_dblocks; + *deltap = delta; + *nagcountp = nagcount; +} + /* * Extent the AG indicated by the @id by the length passed in */ diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h index 1f24cfa27321..3cd4790768ff 100644 --- a/fs/xfs/libxfs/xfs_ag.h +++ b/fs/xfs/libxfs/xfs_ag.h @@ -331,6 +331,9 @@ struct aghdr_init_data { int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id); int xfs_ag_shrink_space(struct xfs_perag *pag, struct xfs_trans **tpp, xfs_extlen_t delta); +void +xfs_growfs_compute_deltas(struct xfs_mount *mp, xfs_rfsblock_t nb, + int64_t *deltap, xfs_agnumber_t *nagcountp); int xfs_ag_extend_space(struct xfs_perag *pag, struct xfs_trans *tp, xfs_extlen_t len); int xfs_ag_get_geometry(struct xfs_perag *pag, struct xfs_ag_geometry *ageo); diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 17255c41786b..8d64d904d73c 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -95,18 +95,17 @@ xfs_growfs_data_private( struct xfs_growfs_data *in) /* growfs data input struct */ { xfs_agnumber_t oagcount = mp->m_sb.sb_agcount; + xfs_rfsblock_t nb = in->newblocks; struct xfs_buf *bp; int error; xfs_agnumber_t nagcount; xfs_agnumber_t nagimax = 0; - xfs_rfsblock_t nb, nb_div, nb_mod; int64_t delta; bool lastag_extended = false; struct xfs_trans *tp; struct aghdr_init_data id = {}; struct xfs_perag *last_pag; - nb = in->newblocks; error = xfs_sb_validate_fsb_count(&mp->m_sb, nb); if (error) return error; @@ -125,20 +124,8 @@ xfs_growfs_data_private( mp->m_sb.sb_rextsize); if (error) return error; + xfs_growfs_compute_deltas(mp, nb, &delta, &nagcount); - nb_div = nb; - nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); - if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) - nb_div++; - else if (nb_mod) - nb = nb_div * mp->m_sb.sb_agblocks; - - if (nb_div > XFS_MAX_AGNUMBER + 1) { - nb_div = XFS_MAX_AGNUMBER + 1; - nb = nb_div * mp->m_sb.sb_agblocks; - } - nagcount = nb_div; - delta = nb - mp->m_sb.sb_dblocks; /* * Reject filesystems with a single AG because they are not * supported, and reject a shrink operation that would cause a From 4ad85e633bc576a5cc8c8310aab141af7ed20efa Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Wed, 11 Feb 2026 19:25:14 +0530 Subject: [PATCH 239/359] xfs: Replace &rtg->rtg_group with rtg_group() Use the already existing rtg_group() wrapper instead of directly accessing the struct xfs_group member in struct xfs_rtgroup. Reviewed-by: Christoph Hellwig Signed-off-by: Nirjhar Roy (IBM) [cem: Conflict resolution against 06873dbd940d] Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_zone_alloc.c | 6 +++--- fs/xfs/xfs_zone_gc.c | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c index 67e0c8f5800f..e3d19b6dc64a 100644 --- a/fs/xfs/xfs_zone_alloc.c +++ b/fs/xfs/xfs_zone_alloc.c @@ -78,7 +78,7 @@ xfs_zone_account_reclaimable( struct xfs_rtgroup *rtg, uint32_t freed) { - struct xfs_group *xg = &rtg->rtg_group; + struct xfs_group *xg = rtg_group(rtg); struct xfs_mount *mp = rtg_mount(rtg); struct xfs_zone_info *zi = mp->m_zone_info; uint32_t used = rtg_rmap(rtg)->i_used_blocks; @@ -759,7 +759,7 @@ xfs_zone_alloc_blocks( trace_xfs_zone_alloc_blocks(oz, allocated, count_fsb); - *sector = xfs_gbno_to_daddr(&rtg->rtg_group, 0); + *sector = xfs_gbno_to_daddr(rtg_group(rtg), 0); *is_seq = bdev_zone_is_seq(mp->m_rtdev_targp->bt_bdev, *sector); if (!*is_seq) *sector += XFS_FSB_TO_BB(mp, allocated); @@ -1080,7 +1080,7 @@ xfs_init_zone( if (write_pointer == 0) { /* zone is empty */ atomic_inc(&zi->zi_nr_free_zones); - xfs_group_set_mark(&rtg->rtg_group, XFS_RTG_FREE); + xfs_group_set_mark(rtg_group(rtg), XFS_RTG_FREE); iz->available += rtg_blocks(rtg); } else if (write_pointer < rtg_blocks(rtg)) { /* zone is open */ diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c index 48c6cf584447..7efeecd2d85f 100644 --- a/fs/xfs/xfs_zone_gc.c +++ b/fs/xfs/xfs_zone_gc.c @@ -627,7 +627,7 @@ xfs_zone_gc_alloc_blocks( if (!*count_fsb) return NULL; - *daddr = xfs_gbno_to_daddr(&oz->oz_rtg->rtg_group, 0); + *daddr = xfs_gbno_to_daddr(rtg_group(oz->oz_rtg), 0); *is_seq = bdev_zone_is_seq(mp->m_rtdev_targp->bt_bdev, *daddr); if (!*is_seq) *daddr += XFS_FSB_TO_BB(mp, oz->oz_allocated); @@ -702,7 +702,7 @@ xfs_zone_gc_start_chunk( chunk->data = data; chunk->oz = oz; chunk->victim_rtg = iter->victim_rtg; - atomic_inc(&chunk->victim_rtg->rtg_group.xg_active_ref); + atomic_inc(&rtg_group(chunk->victim_rtg)->xg_active_ref); atomic_inc(&chunk->victim_rtg->rtg_gccount); bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock); @@ -788,7 +788,7 @@ xfs_zone_gc_split_write( atomic_inc(&chunk->oz->oz_ref); split_chunk->victim_rtg = chunk->victim_rtg; - atomic_inc(&chunk->victim_rtg->rtg_group.xg_active_ref); + atomic_inc(&rtg_group(chunk->victim_rtg)->xg_active_ref); atomic_inc(&chunk->victim_rtg->rtg_gccount); chunk->offset += split_len; @@ -888,7 +888,7 @@ xfs_zone_gc_finish_reset( goto out; } - xfs_group_set_mark(&rtg->rtg_group, XFS_RTG_FREE); + xfs_group_set_mark(rtg_group(rtg), XFS_RTG_FREE); atomic_inc(&zi->zi_nr_free_zones); xfs_zoned_add_available(mp, rtg_blocks(rtg)); @@ -917,7 +917,7 @@ xfs_submit_zone_reset_bio( XFS_STATS_INC(mp, xs_gc_zone_reset_calls); - bio->bi_iter.bi_sector = xfs_gbno_to_daddr(&rtg->rtg_group, 0); + bio->bi_iter.bi_sector = xfs_gbno_to_daddr(rtg_group(rtg), 0); if (!bdev_zone_is_seq(bio->bi_bdev, bio->bi_iter.bi_sector)) { /* * Also use the bio to drive the state machine when neither From fd81d3fd01a5ee4bd26a7dc440e7a2209277d14b Mon Sep 17 00:00:00 2001 From: Wilfred Mallawa Date: Fri, 13 Feb 2026 08:50:06 +1000 Subject: [PATCH 240/359] xfs: fix code alignment issues in xfs_ondisk.c Fixup some code alignment issues in xfs_ondisk.c Signed-off-by: Wilfred Mallawa Reviewed-by: Christoph Hellwig Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_ondisk.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 2e9715cc1641..70605019383c 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -73,7 +73,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_free_hdr, 64); XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_leaf, 64); XFS_CHECK_STRUCT_SIZE(struct xfs_dir3_leaf_hdr, 64); - XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_entry, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_entry, 8); XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_hdr, 32); XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_map, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_attr_leaf_name_local, 4); @@ -116,7 +116,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(struct xfs_da_intnode, 16); XFS_CHECK_STRUCT_SIZE(struct xfs_da_node_entry, 8); XFS_CHECK_STRUCT_SIZE(struct xfs_da_node_hdr, 16); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_free, 4); + XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_free, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_hdr, 16); XFS_CHECK_OFFSET(struct xfs_dir2_data_unused, freetag, 0); XFS_CHECK_OFFSET(struct xfs_dir2_data_unused, length, 2); From 03a6d6c4c85d2758534638fb2bb5f72e0f8877d0 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 2 Feb 2026 15:14:31 +0100 Subject: [PATCH 241/359] xfs: cleanup inode counter stats Most of them are unused, so mark them as such. Give the remaining ones names that match their use instead of the historic IRIX ones based on vnodes. Note that the names are purely internal to the XFS code, the user interface is based on section names and arrays of counters. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_icache.c | 6 +++--- fs/xfs/xfs_stats.c | 10 +++++----- fs/xfs/xfs_stats.h | 16 ++++++++-------- fs/xfs/xfs_super.c | 4 ++-- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index dbaab4ae709f..f76c6decdaa3 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -106,7 +106,7 @@ xfs_inode_alloc( mapping_set_folio_min_order(VFS_I(ip)->i_mapping, M_IGEO(mp)->min_folio_order); - XFS_STATS_INC(mp, vn_active); + XFS_STATS_INC(mp, xs_inodes_active); ASSERT(atomic_read(&ip->i_pincount) == 0); ASSERT(ip->i_ino == 0); @@ -172,7 +172,7 @@ __xfs_inode_free( /* asserts to verify all state is correct here */ ASSERT(atomic_read(&ip->i_pincount) == 0); ASSERT(!ip->i_itemp || list_empty(&ip->i_itemp->ili_item.li_bio_list)); - XFS_STATS_DEC(ip->i_mount, vn_active); + XFS_STATS_DEC(ip->i_mount, xs_inodes_active); call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback); } @@ -2234,7 +2234,7 @@ xfs_inode_mark_reclaimable( struct xfs_mount *mp = ip->i_mount; bool need_inactive; - XFS_STATS_INC(mp, vn_reclaim); + XFS_STATS_INC(mp, xs_inode_mark_reclaimable); /* * We should never get here with any of the reclaim flags already set. diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c index 017db0361cd8..bc4a5d6dc795 100644 --- a/fs/xfs/xfs_stats.c +++ b/fs/xfs/xfs_stats.c @@ -42,7 +42,7 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf) { "xstrat", xfsstats_offset(xs_write_calls) }, { "rw", xfsstats_offset(xs_attr_get) }, { "attr", xfsstats_offset(xs_iflush_count)}, - { "icluster", xfsstats_offset(vn_active) }, + { "icluster", xfsstats_offset(xs_inodes_active) }, { "vnodes", xfsstats_offset(xb_get) }, { "buf", xfsstats_offset(xs_abtb_2) }, { "abtb2", xfsstats_offset(xs_abtc_2) }, @@ -100,15 +100,15 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf) void xfs_stats_clearall(struct xfsstats __percpu *stats) { int c; - uint32_t vn_active; + uint32_t xs_inodes_active; xfs_notice(NULL, "Clearing xfsstats"); for_each_possible_cpu(c) { preempt_disable(); - /* save vn_active, it's a universal truth! */ - vn_active = per_cpu_ptr(stats, c)->s.vn_active; + /* save xs_inodes_active, it's a universal truth! */ + xs_inodes_active = per_cpu_ptr(stats, c)->s.xs_inodes_active; memset(per_cpu_ptr(stats, c), 0, sizeof(*stats)); - per_cpu_ptr(stats, c)->s.vn_active = vn_active; + per_cpu_ptr(stats, c)->s.xs_inodes_active = xs_inodes_active; preempt_enable(); } } diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h index 153d2381d0a8..64bc0cc18126 100644 --- a/fs/xfs/xfs_stats.h +++ b/fs/xfs/xfs_stats.h @@ -100,14 +100,14 @@ struct __xfsstats { uint32_t xs_iflush_count; uint32_t xs_icluster_flushcnt; uint32_t xs_icluster_flushinode; - uint32_t vn_active; /* # vnodes not on free lists */ - uint32_t vn_alloc; /* # times vn_alloc called */ - uint32_t vn_get; /* # times vn_get called */ - uint32_t vn_hold; /* # times vn_hold called */ - uint32_t vn_rele; /* # times vn_rele called */ - uint32_t vn_reclaim; /* # times vn_reclaim called */ - uint32_t vn_remove; /* # times vn_remove called */ - uint32_t vn_free; /* # times vn_free called */ + uint32_t xs_inodes_active; + uint32_t __unused_vn_alloc; + uint32_t __unused_vn_get; + uint32_t __unused_vn_hold; + uint32_t xs_inode_destroy; + uint32_t xs_inode_destroy2; /* same as xs_inode_destroy */ + uint32_t xs_inode_mark_reclaimable; + uint32_t __unused_vn_free; uint32_t xb_get; uint32_t xb_create; uint32_t xb_get_locked; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index abc45f860a73..f8de44443e81 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -712,8 +712,8 @@ xfs_fs_destroy_inode( trace_xfs_destroy_inode(ip); ASSERT(!rwsem_is_locked(&inode->i_rwsem)); - XFS_STATS_INC(ip->i_mount, vn_rele); - XFS_STATS_INC(ip->i_mount, vn_remove); + XFS_STATS_INC(ip->i_mount, xs_inode_destroy); + XFS_STATS_INC(ip->i_mount, xs_inode_destroy2); xfs_inode_mark_reclaimable(ip); } From 47553dd60b1da88df2354f841a4f71dd4de6478a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 2 Feb 2026 15:14:32 +0100 Subject: [PATCH 242/359] xfs: remove metafile inodes from the active inode stat The active inode (or active vnode until recently) stat can get much larger than expected on file systems with a lot of metafile inodes like zoned file systems on SMR hard disks with 10.000s of rtg rmap inodes. Remove all metafile inodes from the active counter to make it more useful to track actual workloads and add a separate counter for active metafile inodes. This fixes xfs/177 on SMR hard drives. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_inode_buf.c | 4 ++++ fs/xfs/libxfs/xfs_metafile.c | 5 +++++ fs/xfs/xfs_icache.c | 5 ++++- fs/xfs/xfs_stats.c | 11 ++++++++--- fs/xfs/xfs_stats.h | 3 ++- 5 files changed, 23 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index a017016e9075..3794e5412eba 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -268,6 +268,10 @@ xfs_inode_from_disk( } if (xfs_is_reflink_inode(ip)) xfs_ifork_init_cow(ip); + if (xfs_is_metadir_inode(ip)) { + XFS_STATS_DEC(ip->i_mount, xs_inodes_active); + XFS_STATS_INC(ip->i_mount, xs_inodes_meta); + } return 0; out_destroy_data_fork: diff --git a/fs/xfs/libxfs/xfs_metafile.c b/fs/xfs/libxfs/xfs_metafile.c index cf239f862212..71f004e9dc64 100644 --- a/fs/xfs/libxfs/xfs_metafile.c +++ b/fs/xfs/libxfs/xfs_metafile.c @@ -61,6 +61,9 @@ xfs_metafile_set_iflag( ip->i_diflags2 |= XFS_DIFLAG2_METADATA; ip->i_metatype = metafile_type; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + + XFS_STATS_DEC(ip->i_mount, xs_inodes_active); + XFS_STATS_INC(ip->i_mount, xs_inodes_meta); } /* Clear the metadata directory inode flag. */ @@ -74,6 +77,8 @@ xfs_metafile_clear_iflag( ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + XFS_STATS_INC(ip->i_mount, xs_inodes_active); + XFS_STATS_DEC(ip->i_mount, xs_inodes_meta); } /* diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index f76c6decdaa3..f2d4294efd37 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -172,7 +172,10 @@ __xfs_inode_free( /* asserts to verify all state is correct here */ ASSERT(atomic_read(&ip->i_pincount) == 0); ASSERT(!ip->i_itemp || list_empty(&ip->i_itemp->ili_item.li_bio_list)); - XFS_STATS_DEC(ip->i_mount, xs_inodes_active); + if (xfs_is_metadir_inode(ip)) + XFS_STATS_DEC(ip->i_mount, xs_inodes_meta); + else + XFS_STATS_DEC(ip->i_mount, xs_inodes_active); call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback); } diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c index bc4a5d6dc795..c13d600732c9 100644 --- a/fs/xfs/xfs_stats.c +++ b/fs/xfs/xfs_stats.c @@ -59,7 +59,8 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf) { "rtrefcntbt", xfsstats_offset(xs_qm_dqreclaims)}, /* we print both series of quota information together */ { "qm", xfsstats_offset(xs_gc_read_calls)}, - { "zoned", xfsstats_offset(__pad1)}, + { "zoned", xfsstats_offset(xs_inodes_meta)}, + { "metafile", xfsstats_offset(xs_xstrat_bytes)}, }; /* Loop over all stats groups */ @@ -99,16 +100,20 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf) void xfs_stats_clearall(struct xfsstats __percpu *stats) { + uint32_t xs_inodes_active, xs_inodes_meta; int c; - uint32_t xs_inodes_active; xfs_notice(NULL, "Clearing xfsstats"); for_each_possible_cpu(c) { preempt_disable(); - /* save xs_inodes_active, it's a universal truth! */ + /* + * Save the active / meta inode counters, as they are stateful. + */ xs_inodes_active = per_cpu_ptr(stats, c)->s.xs_inodes_active; + xs_inodes_meta = per_cpu_ptr(stats, c)->s.xs_inodes_meta; memset(per_cpu_ptr(stats, c), 0, sizeof(*stats)); per_cpu_ptr(stats, c)->s.xs_inodes_active = xs_inodes_active; + per_cpu_ptr(stats, c)->s.xs_inodes_meta = xs_inodes_meta; preempt_enable(); } } diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h index 64bc0cc18126..57c32b86c358 100644 --- a/fs/xfs/xfs_stats.h +++ b/fs/xfs/xfs_stats.h @@ -142,7 +142,8 @@ struct __xfsstats { uint32_t xs_gc_read_calls; uint32_t xs_gc_write_calls; uint32_t xs_gc_zone_reset_calls; - uint32_t __pad1; +/* Metafile counters */ + uint32_t xs_inodes_meta; /* Extra precision counters */ uint64_t xs_xstrat_bytes; uint64_t xs_write_bytes; From cddfa648f1ab99e30e91455be19cd5ade26338c2 Mon Sep 17 00:00:00 2001 From: Ethan Tidmore Date: Thu, 19 Feb 2026 21:38:25 -0600 Subject: [PATCH 243/359] xfs: Fix error pointer dereference The function try_lookup_noperm() can return an error pointer and is not checked for one. Add checks for error pointer in xrep_adoption_check_dcache() and xrep_adoption_zap_dcache(). Detected by Smatch: fs/xfs/scrub/orphanage.c:449 xrep_adoption_check_dcache() error: 'd_child' dereferencing possible ERR_PTR() fs/xfs/scrub/orphanage.c:485 xrep_adoption_zap_dcache() error: 'd_child' dereferencing possible ERR_PTR() Fixes: 73597e3e42b4 ("xfs: ensure dentry consistency when the orphanage adopts a file") Cc: stable@vger.kernel.org # v6.16 Signed-off-by: Ethan Tidmore Reviewed-by: Darrick J. Wong Reviewed-by: Nirjhar Roy (IBM) Signed-off-by: Carlos Maiolino --- fs/xfs/scrub/orphanage.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c index 52a108f6d5f4..33c6db6b4498 100644 --- a/fs/xfs/scrub/orphanage.c +++ b/fs/xfs/scrub/orphanage.c @@ -442,6 +442,11 @@ xrep_adoption_check_dcache( return 0; d_child = try_lookup_noperm(&qname, d_orphanage); + if (IS_ERR(d_child)) { + dput(d_orphanage); + return PTR_ERR(d_child); + } + if (d_child) { trace_xrep_adoption_check_child(sc->mp, d_child); @@ -479,7 +484,7 @@ xrep_adoption_zap_dcache( return; d_child = try_lookup_noperm(&qname, d_orphanage); - while (d_child != NULL) { + while (!IS_ERR_OR_NULL(d_child)) { trace_xrep_adoption_invalidate_child(sc->mp, d_child); ASSERT(d_is_negative(d_child)); From e764dd439d68cfc16724e469db390d779ab49521 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 15:25:35 -0800 Subject: [PATCH 244/359] xfs: fix copy-paste error in previous fix Chris Mason noticed that there is a copy-paste error in a recent change to xrep_dir_teardown that nulls out pointers after freeing the resources. Fixes: ba408d299a3bb3c ("xfs: only call xf{array,blob}_destroy if we have a valid pointer") Link: https://lore.kernel.org/linux-xfs/20260205194211.2307232-1-clm@meta.com/ Reported-by: Chris Mason Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/scrub/dir_repair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c index 9dc55c918c78..23b80c54aa60 100644 --- a/fs/xfs/scrub/dir_repair.c +++ b/fs/xfs/scrub/dir_repair.c @@ -177,7 +177,7 @@ xrep_dir_teardown( rd->dir_names = NULL; if (rd->dir_entries) xfarray_destroy(rd->dir_entries); - rd->dir_names = NULL; + rd->dir_entries = NULL; } /* Set up for a directory repair. */ From 161456987a1fe4ad73c3f36dec1f684316ac9bdd Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 15:25:35 -0800 Subject: [PATCH 245/359] xfs: fix xfs_group release bug in xfs_verify_report_losses Chris Mason reports that his AI tools noticed that we were using xfs_perag_put and xfs_group_put to release the group reference returned by xfs_group_next_range. However, the iterator function returns an object with an active refcount, which means that we must use the correct function to release the active refcount, which is _rele. Fixes: b8accfd65d31f2 ("xfs: add media verification ioctl") Reported-by: Chris Mason Link: https://lore.kernel.org/linux-xfs/20260206030527.2506821-1-clm@meta.com/ Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_verify_media.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_verify_media.c b/fs/xfs/xfs_verify_media.c index 069cd371619d..8bbd4ec567f8 100644 --- a/fs/xfs/xfs_verify_media.c +++ b/fs/xfs/xfs_verify_media.c @@ -122,7 +122,7 @@ xfs_verify_report_losses( error = xfs_alloc_read_agf(pag, tp, 0, &agf_bp); if (error) { - xfs_perag_put(pag); + xfs_perag_rele(pag); break; } @@ -158,7 +158,7 @@ xfs_verify_report_losses( if (rtg) xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_RMAP); if (error) { - xfs_group_put(xg); + xfs_group_rele(xg); break; } } From eb8550fb75a875657dc29e3925a40244ec6b6bd6 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 15:25:36 -0800 Subject: [PATCH 246/359] xfs: fix xfs_group release bug in xfs_dax_notify_dev_failure Chris Mason reports that his AI tools noticed that we were using xfs_perag_put and xfs_group_put to release the group reference returned by xfs_group_next_range. However, the iterator function returns an object with an active refcount, which means that we must use the correct function to release the active refcount, which is _rele. Cc: # v6.0 Fixes: 6f643c57d57c56 ("xfs: implement ->notify_failure() for XFS") Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_notify_failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index 6be19fa1ebe2..64c8afb935c2 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -304,7 +304,7 @@ xfs_dax_notify_dev_failure( error = xfs_alloc_read_agf(pag, tp, 0, &agf_bp); if (error) { - xfs_perag_put(pag); + xfs_perag_rele(pag); break; } @@ -340,7 +340,7 @@ xfs_dax_notify_dev_failure( if (rtg) xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_RMAP); if (error) { - xfs_group_put(xg); + xfs_group_rele(xg); break; } } From 94014a23e91a3944947048169ccf38b4561cfd0c Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 15:25:37 -0800 Subject: [PATCH 247/359] xfs: fix potential pointer access race in xfs_healthmon_get Pankaj Raghav asks about this code in xfs_healthmon_get: hm = mp->m_healthmon; if (hm && !refcount_inc_not_zero(&hm->ref)) hm = NULL; rcu_read_unlock(); return hm; (slightly edited to compress a mailing list thread) "Nit: Should we do a READ_ONCE(mp->m_healthmon) here to avoid any compiler tricks that can result in an undefined behaviour? I am not sure if I am being paranoid here. "So this is my understanding: RCU guarantees that we get a valid object (actual data of m_healthmon) but does not guarantee the compiler will not reread the pointer between checking if hm is !NULL and accessing the pointer as we are doing it lockless. "So just a barrier() call in rcu_read_lock is enough to make sure this doesn't happen and probably adding a READ_ONCE() is not needed?" After some initial confusion I concluded that he's correct. The compiler could very well eliminate the hm variable in favor of walking the pointers twice, turning the code into: if (mp->m_healthmon && !refcount_inc_not_zero(&mp->m_healthmon->ref)) If this happens, then xfs_healthmon_detach can sneak in between the two sides of the && expression and set mp->m_healthmon to NULL, and thereby cause a null pointer dereference crash. Fix this by using the rcu pointer assignment and dereference functions, which ensure that the proper reordering barriers are in place. Practically speaking, gcc seems to allocate an actual variable for hm and only reads mp->m_healthmon once (as intended), but we ought to be more explicit about requiring this. Reported-by: Pankaj Raghav Fixes: a48373e7d35a89f6f ("xfs: start creating infrastructure for health monitoring") Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Reviewed-by: Pankaj Raghav Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_healthmon.c | 11 +++++++---- fs/xfs/xfs_mount.h | 2 +- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index e37c18cec372..4a06d6632f65 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -69,7 +69,7 @@ xfs_healthmon_get( struct xfs_healthmon *hm; rcu_read_lock(); - hm = mp->m_healthmon; + hm = rcu_dereference(mp->m_healthmon); if (hm && !refcount_inc_not_zero(&hm->ref)) hm = NULL; rcu_read_unlock(); @@ -110,13 +110,13 @@ xfs_healthmon_attach( struct xfs_healthmon *hm) { spin_lock(&xfs_healthmon_lock); - if (mp->m_healthmon != NULL) { + if (rcu_access_pointer(mp->m_healthmon) != NULL) { spin_unlock(&xfs_healthmon_lock); return -EEXIST; } refcount_inc(&hm->ref); - mp->m_healthmon = hm; + rcu_assign_pointer(mp->m_healthmon, hm); hm->mount_cookie = (uintptr_t)mp->m_super; spin_unlock(&xfs_healthmon_lock); @@ -128,13 +128,16 @@ STATIC void xfs_healthmon_detach( struct xfs_healthmon *hm) { + struct xfs_mount *mp; + spin_lock(&xfs_healthmon_lock); if (hm->mount_cookie == DETACHED_MOUNT_COOKIE) { spin_unlock(&xfs_healthmon_lock); return; } - XFS_M((struct super_block *)hm->mount_cookie)->m_healthmon = NULL; + mp = XFS_M((struct super_block *)hm->mount_cookie); + rcu_assign_pointer(mp->m_healthmon, NULL); hm->mount_cookie = DETACHED_MOUNT_COOKIE; spin_unlock(&xfs_healthmon_lock); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 61c71128d171..ddd4028be8d6 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -345,7 +345,7 @@ typedef struct xfs_mount { struct xfs_hooks m_dir_update_hooks; /* Private data referring to a health monitor object. */ - struct xfs_healthmon *m_healthmon; + struct xfs_healthmon __rcu *m_healthmon; } xfs_mount_t; #define M_IGEO(mp) (&(mp)->m_ino_geo) From 75690e5fdd74fc4d2a4aec58be9a82aec7cee721 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 15:25:38 -0800 Subject: [PATCH 248/359] xfs: don't report metadata inodes to fserror Internal metadata inodes are not exposed to userspace programs, so it makes no sense to pass them to the fserror functions (aka fsnotify). Instead, report metadata file problems as general filesystem corruption. Fixes: 5eb4cb18e445d0 ("xfs: convey metadata health events to the health monitor") Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_health.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index 169123772cb3..6475159eb930 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -314,6 +314,18 @@ xfs_rgno_mark_sick( xfs_rtgroup_put(rtg); } +static inline void xfs_inode_report_fserror(struct xfs_inode *ip) +{ + /* Report metadata inodes as general filesystem corruption */ + if (xfs_is_internal_inode(ip)) { + fserror_report_metadata(ip->i_mount->m_super, -EFSCORRUPTED, + GFP_NOFS); + return; + } + + fserror_report_file_metadata(VFS_I(ip), -EFSCORRUPTED, GFP_NOFS); +} + /* Mark the unhealthy parts of an inode. */ void xfs_inode_mark_sick( @@ -339,7 +351,7 @@ xfs_inode_mark_sick( inode_state_clear(VFS_I(ip), I_DONTCACHE); spin_unlock(&VFS_I(ip)->i_lock); - fserror_report_file_metadata(VFS_I(ip), -EFSCORRUPTED, GFP_NOFS); + xfs_inode_report_fserror(ip); if (mask) xfs_healthmon_report_inode(ip, XFS_HEALTHMON_SICK, old_mask, mask); @@ -371,7 +383,7 @@ xfs_inode_mark_corrupt( inode_state_clear(VFS_I(ip), I_DONTCACHE); spin_unlock(&VFS_I(ip)->i_lock); - fserror_report_file_metadata(VFS_I(ip), -EFSCORRUPTED, GFP_NOFS); + xfs_inode_report_fserror(ip); if (mask) xfs_healthmon_report_inode(ip, XFS_HEALTHMON_CORRUPT, old_mask, mask); From 115ea07b94d2f13942fbd93c6acde376db36b16a Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Wed, 18 Feb 2026 15:25:38 -0800 Subject: [PATCH 249/359] xfs: don't report half-built inodes to fserror Sam Sun apparently found a syzbot way to fuzz a filesystem such that xfs_iget_cache_miss would free the inode before the fserror code could catch up. Frustratingly he doesn't use the syzbot dashboard so there's no C reproducer and not even a full error report, so I'm guessing that: Inodes that are being constructed or torn down inside XFS are not visible to the VFS. They should never be reported to fserror. Also, any inode that has been freshly allocated in _cache_miss should be marked INEW immediately because, well, it's an incompletely constructed inode that isn't yet visible to the VFS. Reported-by: Sam Sun Fixes: 5eb4cb18e445d0 ("xfs: convey metadata health events to the health monitor") Signed-off-by: "Darrick J. Wong" Reviewed-by: Christoph Hellwig Reviewed-by: Carlos Maiolino Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_health.c | 8 ++++++-- fs/xfs/xfs_icache.c | 9 ++++++++- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index 6475159eb930..239b843e83d4 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -316,8 +316,12 @@ xfs_rgno_mark_sick( static inline void xfs_inode_report_fserror(struct xfs_inode *ip) { - /* Report metadata inodes as general filesystem corruption */ - if (xfs_is_internal_inode(ip)) { + /* + * Do not report inodes being constructed or freed, or metadata inodes, + * to fsnotify. + */ + if (xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIM) || + xfs_is_internal_inode(ip)) { fserror_report_metadata(ip->i_mount->m_super, -EFSCORRUPTED, GFP_NOFS); return; diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index f2d4294efd37..a7a09e7eec81 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -639,6 +639,14 @@ xfs_iget_cache_miss( if (!ip) return -ENOMEM; + /* + * Set XFS_INEW as early as possible so that the health code won't pass + * the inode to the fserror code if the ondisk inode cannot be loaded. + * We're going to free the xfs_inode immediately if that happens, which + * would lead to UAF problems. + */ + xfs_iflags_set(ip, XFS_INEW); + error = xfs_imap(pag, tp, ip->i_ino, &ip->i_imap, flags); if (error) goto out_destroy; @@ -716,7 +724,6 @@ xfs_iget_cache_miss( ip->i_udquot = NULL; ip->i_gdquot = NULL; ip->i_pdquot = NULL; - xfs_iflags_set(ip, XFS_INEW); /* insert the new inode */ spin_lock(&pag->pag_ici_lock); From 8baa9bccc0156d6952d337bf17f57ce15902dfe4 Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Fri, 20 Feb 2026 12:23:58 +0530 Subject: [PATCH 250/359] xfs: Fix xfs_last_rt_bmblock() Bug description: If the size of the last rtgroup i.e, the rtg passed to xfs_last_rt_bmblock() is such that the last rtextent falls in 0th word offset of a bmblock of the bitmap file tracking this (last) rtgroup, then in that case xfs_last_rt_bmblock() incorrectly returns the next bmblock number instead of the current/last used bmblock number. When xfs_last_rt_bmblock() incorrectly returns the next bmblock, the loop to grow/modify the bmblocks in xfs_growfs_rtg() doesn't execute and xfs_growfs basically does a nop in certain cases. xfs_growfs will do a nop when the new size of the fs will have the same number of rtgroups i.e, we are only growing the last rtgroup. Reproduce: $ mkfs.xfs -m metadir=0 -r rtdev=/dev/loop1 /dev/loop0 \ -r size=32769b -f $ mount -o rtdev=/dev/loop1 /dev/loop0 /mnt/scratch $ xfs_growfs -R $(( 32769 + 1 )) /mnt/scratch $ xfs_info /mnt/scratch | grep rtextents $ # We can see that rtextents hasn't changed Fix: Fix this by returning the current/last used bmblock when the last rtgroup size is not a multiple xfs_rtbitmap_rtx_per_rbmblock() and the next bmblock when the rtgroup size is a multiple of xfs_rtbitmap_rtx_per_rbmblock() i.e, the existing blocks are completely used up. Also, I have renamed xfs_last_rt_bmblock() to xfs_last_rt_bmblock_to_extend() to signify that this function returns the bmblock number to extend and NOT always the last used bmblock number. Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Nirjhar Roy (IBM) Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_rtalloc.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index aab59f66384e..ae53ba2093b2 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1082,17 +1082,27 @@ xfs_last_rtgroup_extents( } /* - * Calculate the last rbmblock currently used. + * This will return the bitmap block number (indexed at 0) that will be + * extended/modified. There are 2 cases here: + * 1. The size of the rtg is such that it is a multiple of + * xfs_rtbitmap_rtx_per_rbmblock() i.e, an integral number of bitmap blocks + * are completely filled up. In this case, we should return + * 1 + (the last used bitmap block number). + * 2. The size of the rtg is not an multiple of xfs_rtbitmap_rtx_per_rbmblock(). + * Here we will return the block number of last used block number. In this + * case, we will modify the last used bitmap block to extend the size of the + * rtgroup. * * This also deals with the case where there were no rtextents before. */ static xfs_fileoff_t -xfs_last_rt_bmblock( +xfs_last_rt_bmblock_to_extend( struct xfs_rtgroup *rtg) { struct xfs_mount *mp = rtg_mount(rtg); xfs_rgnumber_t rgno = rtg_rgno(rtg); xfs_fileoff_t bmbno = 0; + unsigned int mod = 0; ASSERT(!mp->m_sb.sb_rgcount || rgno >= mp->m_sb.sb_rgcount - 1); @@ -1100,9 +1110,16 @@ xfs_last_rt_bmblock( xfs_rtxnum_t nrext = xfs_last_rtgroup_extents(mp); /* Also fill up the previous block if not entirely full. */ - bmbno = xfs_rtbitmap_blockcount_len(mp, nrext); - if (xfs_rtx_to_rbmword(mp, nrext) != 0) - bmbno--; + /* We are doing a -1 to convert it to a 0 based index */ + bmbno = xfs_rtbitmap_blockcount_len(mp, nrext) - 1; + div_u64_rem(nrext, xfs_rtbitmap_rtx_per_rbmblock(mp), &mod); + /* + * mod = 0 means that all the current blocks are full. So + * return the next block number to be used for the rtgroup + * growth. + */ + if (mod == 0) + bmbno++; } return bmbno; @@ -1207,7 +1224,8 @@ xfs_growfs_rtg( goto out_rele; } - for (bmbno = xfs_last_rt_bmblock(rtg); bmbno < bmblocks; bmbno++) { + for (bmbno = xfs_last_rt_bmblock_to_extend(rtg); bmbno < bmblocks; + bmbno++) { error = xfs_growfs_rt_bmblock(rtg, nrblocks, rextsize, bmbno); if (error) goto out_error; From ac1d977096a17d56c55bd7f90be48e81ac4cec3f Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Fri, 20 Feb 2026 12:23:59 +0530 Subject: [PATCH 251/359] xfs: Add a comment in xfs_log_sb() Add a comment explaining why the sb_frextents are updated outside the if (xfs_has_lazycount(mp) check even though it is a lazycounter. RT groups are supported only in v5 filesystems which always have lazycounter enabled - so putting it inside the if(xfs_has_lazycount(mp) check is redundant. Suggested-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Nirjhar Roy (IBM) Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_sb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 38d16fe1f6d8..47322adb7690 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1347,6 +1347,9 @@ xfs_log_sb( * feature was introduced. This counter can go negative due to the way * we handle nearly-lockless reservations, so we must use the _positive * variant here to avoid writing out nonsense frextents. + * + * RT groups are only supported on v5 file systems, which always + * have lazy SB counters. */ if (xfs_has_rtgroups(mp) && !xfs_has_zoned(mp)) { mp->m_sb.sb_frextents = From c2368fc89a684be2900daaa2bbf68cbc147e8d3d Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Fri, 20 Feb 2026 12:24:00 +0530 Subject: [PATCH 252/359] xfs: Update lazy counters in xfs_growfs_rt_bmblock() Update lazy counters in xfs_growfs_rt_bmblock() similar to the way it is done xfs_growfs_data_private(). This is because the lazy counters are not always updated and synching the counters will avoid inconsistencies between frexents and rtextents(total realtime extent count). This will be more useful once realtime shrink is implemented as this will prevent some transient state to occur where frexents might be greater than total rtextents. Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Nirjhar Roy (IBM) Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_rtalloc.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index ae53ba2093b2..153f3c378f9f 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1050,6 +1050,15 @@ xfs_growfs_rt_bmblock( */ xfs_trans_resv_calc(mp, &mp->m_resv); + /* + * Sync sb counters now to reflect the updated values. Lazy counters are + * not always updated and in order to avoid inconsistencies between + * frextents and rtextents, it is better to sync the counters. + */ + + if (xfs_has_lazysbcount(mp)) + xfs_log_sb(args.tp); + error = xfs_trans_commit(args.tp); if (error) goto out_free; From 9a654a8fa3191e9ea32c4494b943c0872a3f5d27 Mon Sep 17 00:00:00 2001 From: "Nirjhar Roy (IBM)" Date: Fri, 20 Feb 2026 12:24:01 +0530 Subject: [PATCH 253/359] xfs: Add comments for usages of some macros. Add comments explaining when to use XFS_IS_CORRUPT() and ASSERT() Suggested-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Nirjhar Roy (IBM) Signed-off-by: Carlos Maiolino --- fs/xfs/xfs_platform.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/xfs/xfs_platform.h b/fs/xfs/xfs_platform.h index 1e59bf94d1f2..59a33c60e0ca 100644 --- a/fs/xfs/xfs_platform.h +++ b/fs/xfs/xfs_platform.h @@ -235,6 +235,10 @@ int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, #ifdef XFS_WARN +/* + * Please note that this ASSERT doesn't kill the kernel. It will if the kernel + * has panic_on_warn set. + */ #define ASSERT(expr) \ (likely(expr) ? (void)0 : asswarn(NULL, #expr, __FILE__, __LINE__)) @@ -245,6 +249,11 @@ int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, #endif /* XFS_WARN */ #endif /* DEBUG */ +/* + * Use this to catch metadata corruptions that are not caught by block or + * structure verifiers. The reason is that the verifiers check corruptions only + * within the scope of the object being verified. + */ #define XFS_IS_CORRUPT(mp, expr) \ (unlikely(expr) ? xfs_corruption_error(#expr, XFS_ERRLEVEL_LOW, (mp), \ NULL, 0, __FILE__, __LINE__, \ From e97cbf863d8918452c9f81bebdade8d04e2e7b60 Mon Sep 17 00:00:00 2001 From: Wilfred Mallawa Date: Wed, 11 Feb 2026 13:29:02 +1000 Subject: [PATCH 254/359] xfs: remove duplicate static size checks In libxfs/xfs_ondisk.h, remove some duplicate entries of XFS_CHECK_STRUCT_SIZE(). Signed-off-by: Wilfred Mallawa Reviewed-by: Christoph Hellwig Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_ondisk.h | 9 --------- 1 file changed, 9 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 70605019383c..7bccfa7b695c 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -136,16 +136,7 @@ xfs_check_ondisk_structs(void) /* ondisk dir/attr structures from xfs/122 */ XFS_CHECK_STRUCT_SIZE(struct xfs_attr_sf_entry, 3); XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_free, 4); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_hdr, 16); XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_data_unused, 6); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_free, 16); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_free_hdr, 16); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf, 16); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_entry, 8); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_hdr, 16); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_leaf_tail, 4); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_entry, 3); - XFS_CHECK_STRUCT_SIZE(struct xfs_dir2_sf_hdr, 10); /* log structures */ XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format, 88); From 650b774cf94495465d6a38c31bb1a6ce697b6b37 Mon Sep 17 00:00:00 2001 From: Wilfred Mallawa Date: Wed, 11 Feb 2026 13:29:04 +1000 Subject: [PATCH 255/359] xfs: add static size checks for ioctl UABI The ioctl structures in libxfs/xfs_fs.h are missing static size checks. It is useful to have static size checks for these structures as adding new fields to them could cause issues (e.g. extra padding that may be inserted by the compiler). So add these checks to xfs/xfs_ondisk.h. Due to different padding/alignment requirements across different architectures, to avoid build failures, some structures are ommited from the size checks. For example, structures with "compat_" definitions in xfs/xfs_ioctl32.h are ommited. Signed-off-by: Wilfred Mallawa Reviewed-by: Christoph Hellwig Signed-off-by: Carlos Maiolino --- fs/xfs/libxfs/xfs_ondisk.h | 39 +++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 7bccfa7b695c..23cde1248f01 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -208,11 +208,6 @@ xfs_check_ondisk_structs(void) XFS_CHECK_OFFSET(struct xfs_dir3_free, hdr.hdr.magic, 0); XFS_CHECK_OFFSET(struct xfs_attr3_leafblock, hdr.info.hdr, 0); - XFS_CHECK_STRUCT_SIZE(struct xfs_bulkstat, 192); - XFS_CHECK_STRUCT_SIZE(struct xfs_inumbers, 24); - XFS_CHECK_STRUCT_SIZE(struct xfs_bulkstat_req, 64); - XFS_CHECK_STRUCT_SIZE(struct xfs_inumbers_req, 64); - /* * Make sure the incore inode timestamp range corresponds to hand * converted values based on the ondisk format specification. @@ -292,6 +287,40 @@ xfs_check_ondisk_structs(void) XFS_CHECK_SB_OFFSET(sb_pad, 281); XFS_CHECK_SB_OFFSET(sb_rtstart, 288); XFS_CHECK_SB_OFFSET(sb_rtreserved, 296); + + /* + * ioctl UABI + * + * Due to different padding/alignment requirements across + * different architectures, some structures are ommited from + * the size checks. In addition, structures with architecture + * dependent size fields are also ommited (e.g. __kernel_long_t). + */ + XFS_CHECK_STRUCT_SIZE(struct xfs_bulkstat, 192); + XFS_CHECK_STRUCT_SIZE(struct xfs_inumbers, 24); + XFS_CHECK_STRUCT_SIZE(struct xfs_bulkstat_req, 64); + XFS_CHECK_STRUCT_SIZE(struct xfs_inumbers_req, 64); + XFS_CHECK_STRUCT_SIZE(struct dioattr, 12); + XFS_CHECK_STRUCT_SIZE(struct getbmap, 32); + XFS_CHECK_STRUCT_SIZE(struct getbmapx, 48); + XFS_CHECK_STRUCT_SIZE(struct xfs_attrlist_cursor, 16); + XFS_CHECK_STRUCT_SIZE(struct xfs_attrlist, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_attrlist, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_attrlist_ent, 4); + XFS_CHECK_STRUCT_SIZE(struct xfs_ag_geometry, 128); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtgroup_geometry, 128); + XFS_CHECK_STRUCT_SIZE(struct xfs_error_injection, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_fsop_geom, 256); + XFS_CHECK_STRUCT_SIZE(struct xfs_fsop_geom_v4, 112); + XFS_CHECK_STRUCT_SIZE(struct xfs_fsop_counts, 32); + XFS_CHECK_STRUCT_SIZE(struct xfs_fsop_resblks, 16); + XFS_CHECK_STRUCT_SIZE(struct xfs_growfs_log, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_bulk_ireq, 64); + XFS_CHECK_STRUCT_SIZE(struct xfs_fs_eofblocks, 128); + XFS_CHECK_STRUCT_SIZE(struct xfs_fsid, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_scrub_metadata, 64); + XFS_CHECK_STRUCT_SIZE(struct xfs_scrub_vec, 16); + XFS_CHECK_STRUCT_SIZE(struct xfs_scrub_vec_head, 40); } #endif /* __XFS_ONDISK_H */ From 6b050482ec40569429d963ac52afa878691b04c9 Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Tue, 24 Feb 2026 16:17:52 -0800 Subject: [PATCH 256/359] cpufreq: intel_pstate: Fix crash during turbo disable When the system is booted with kernel command line argument "nosmt" or "maxcpus" to limit the number of CPUs, disabling turbo via: echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo results in a crash: PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI ... RIP: 0010:store_no_turbo+0x100/0x1f0 ... This occurs because for_each_possible_cpu() returns CPUs even if they are not online. For those CPUs, all_cpu_data[] will be NULL. Since commit 973207ae3d7c ("cpufreq: intel_pstate: Rearrange max frequency updates handling code"), all_cpu_data[] is dereferenced even for CPUs which are not online, causing the NULL pointer dereference. To fix that, pass CPU number to intel_pstate_update_max_freq() and use all_cpu_data[] for those CPUs for which there is a valid cpufreq policy. Fixes: 973207ae3d7c ("cpufreq: intel_pstate: Rearrange max frequency updates handling code") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221068 Signed-off-by: Srinivas Pandruvada Cc: 6.16+ # 6.16+ Link: https://patch.msgid.link/20260225001752.890164-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index bdc37080d319..11c58af41900 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1476,13 +1476,13 @@ static void __intel_pstate_update_max_freq(struct cpufreq_policy *policy, refresh_frequency_limits(policy); } -static bool intel_pstate_update_max_freq(struct cpudata *cpudata) +static bool intel_pstate_update_max_freq(int cpu) { - struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu); + struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu); if (!policy) return false; - __intel_pstate_update_max_freq(policy, cpudata); + __intel_pstate_update_max_freq(policy, all_cpu_data[cpu]); return true; } @@ -1501,7 +1501,7 @@ static void intel_pstate_update_limits_for_all(void) int cpu; for_each_possible_cpu(cpu) - intel_pstate_update_max_freq(all_cpu_data[cpu]); + intel_pstate_update_max_freq(cpu); mutex_lock(&hybrid_capacity_lock); @@ -1910,7 +1910,7 @@ static void intel_pstate_notify_work(struct work_struct *work) struct cpudata *cpudata = container_of(to_delayed_work(work), struct cpudata, hwp_notify_work); - if (intel_pstate_update_max_freq(cpudata)) { + if (intel_pstate_update_max_freq(cpudata->cpu)) { /* * The driver will not be unregistered while this function is * running, so update the capacity without acquiring the driver From c9bc1753b3cc41d0e01fbca7f035258b5f4db0ae Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 24 Feb 2026 13:29:09 +0100 Subject: [PATCH 257/359] perf: Fix __perf_event_overflow() vs perf_remove_from_context() race Make sure that __perf_event_overflow() runs with IRQs disabled for all possible callchains. Specifically the software events can end up running it with only preemption disabled. This opens up a race vs perf_event_exit_event() and friends that will go and free various things the overflow path expects to be present, like the BPF program. Fixes: 592903cdcbf6 ("perf_counter: add an event_list") Reported-by: Simond Hu Signed-off-by: Peter Zijlstra (Intel) Tested-by: Simond Hu Link: https://patch.msgid.link/20260224122909.GV1395416@noisy.programming.kicks-ass.net --- kernel/events/core.c | 42 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 22a0f405585b..1f5699b339ec 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -10777,6 +10777,13 @@ int perf_event_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs) { + /* + * Entry point from hardware PMI, interrupts should be disabled here. + * This serializes us against perf_event_remove_from_context() in + * things like perf_event_release_kernel(). + */ + lockdep_assert_irqs_disabled(); + return __perf_event_overflow(event, 1, data, regs); } @@ -10853,6 +10860,19 @@ static void perf_swevent_event(struct perf_event *event, u64 nr, { struct hw_perf_event *hwc = &event->hw; + /* + * This is: + * - software preempt + * - tracepoint preempt + * - tp_target_task irq (ctx->lock) + * - uprobes preempt/irq + * - kprobes preempt/irq + * - hw_breakpoint irq + * + * Any of these are sufficient to hold off RCU and thus ensure @event + * exists. + */ + lockdep_assert_preemption_disabled(); local64_add(nr, &event->count); if (!regs) @@ -10861,6 +10881,16 @@ static void perf_swevent_event(struct perf_event *event, u64 nr, if (!is_sampling_event(event)) return; + /* + * Serialize against event_function_call() IPIs like normal overflow + * event handling. Specifically, must not allow + * perf_event_release_kernel() -> perf_remove_from_context() to make + * progress and 'release' the event from under us. + */ + guard(irqsave)(); + if (event->state != PERF_EVENT_STATE_ACTIVE) + return; + if ((event->attr.sample_type & PERF_SAMPLE_PERIOD) && !event->attr.freq) { data->period = nr; return perf_swevent_overflow(event, 1, data, regs); @@ -11359,6 +11389,11 @@ void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size, struct perf_sample_data data; struct perf_event *event; + /* + * Per being a tracepoint, this runs with preemption disabled. + */ + lockdep_assert_preemption_disabled(); + struct perf_raw_record raw = { .frag = { .size = entry_size, @@ -11691,6 +11726,11 @@ void perf_bp_event(struct perf_event *bp, void *data) struct perf_sample_data sample; struct pt_regs *regs = data; + /* + * Exception context, will have interrupts disabled. + */ + lockdep_assert_irqs_disabled(); + perf_sample_data_init(&sample, bp->attr.bp_addr, 0); if (!bp->hw.state && !perf_exclude_event(bp, regs)) @@ -12155,7 +12195,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer) if (regs && !perf_exclude_event(event, regs)) { if (!(event->attr.exclude_idle && is_idle_task(current))) - if (__perf_event_overflow(event, 1, &data, regs)) + if (perf_event_overflow(event, &data, regs)) ret = HRTIMER_NORESTART; } From ab6088e7a95943af3452b20e3b96caaaef3eeebd Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Tue, 24 Feb 2026 16:16:30 -0700 Subject: [PATCH 258/359] lib/Kconfig.debug: Require a release version of LLVM 22 for context analysis Using a prerelease version as a minimum supported version for CONFIG_WARN_CONTEXT_ANALYSIS was reasonable to do while LLVM 22 was the development version so that people could immediately build from main and start testing and validating this in their own code. However, it can be problematic when using prerelease versions of LLVM 22, such as Android clang 22.0.1 (the current android mainline compiler) or when bisecting LLVM between llvmorg-22-init and llvmorg-23-init, to build the kernel, as all compiler fixes for the context analysis may not be present, potentially resulting in warnings that can easily turn into errors. Now that LLVM 22 is released as 22.1.0, upgrade the check to require at least this version to ensure that a user's toolchain actually has all the changes needed for a smooth experience with context analysis. Fixes: 3269701cb256 ("compiler-context-analysis: Add infrastructure for Context Analysis with Clang") Signed-off-by: Nathan Chancellor Signed-off-by: Peter Zijlstra (Intel) Acked-by: Marco Elver Link: https://patch.msgid.link/20260224-bump-clang-ver-context-analysis-v1-1-16cc7a90a040@kernel.org --- lib/Kconfig.debug | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 4e2dfbbd3d78..8e2b858078e6 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -630,7 +630,7 @@ config DEBUG_FORCE_WEAK_PER_CPU config WARN_CONTEXT_ANALYSIS bool "Compiler context-analysis warnings" - depends on CC_IS_CLANG && CLANG_VERSION >= 220000 + depends on CC_IS_CLANG && CLANG_VERSION >= 220100 # Branch profiling re-defines "if", which messes with the compiler's # ability to analyze __cond_acquires(..), resulting in false positives. depends on !TRACE_BRANCH_PROFILING @@ -641,7 +641,7 @@ config WARN_CONTEXT_ANALYSIS and releasing user-definable "context locks". Clang's name of the feature is "Thread Safety Analysis". Requires - Clang 22 or later. + Clang 22.1.0 or later. Produces warnings by default. Select CONFIG_WERROR if you wish to turn these warnings into errors. From 85f6c439a69afe4fa8a688512e586971e97e273a Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Wed, 25 Feb 2026 10:35:57 +0000 Subject: [PATCH 259/359] io_uring/timeout: READ_ONCE sqe->addr We should use READ_ONCE when reading from a SQE, make sure timeout gets a stable timespec address. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe --- io_uring/timeout.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/io_uring/timeout.c b/io_uring/timeout.c index 84dda24f3eb2..cb61d4862fc6 100644 --- a/io_uring/timeout.c +++ b/io_uring/timeout.c @@ -462,7 +462,7 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) tr->ltimeout = true; if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS)) return -EINVAL; - if (get_timespec64(&tr->ts, u64_to_user_ptr(sqe->addr2))) + if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2)))) return -EFAULT; if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0) return -EINVAL; @@ -557,7 +557,7 @@ static int __io_timeout_prep(struct io_kiocb *req, data->req = req; data->flags = flags; - if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr))) + if (get_timespec64(&data->ts, u64_to_user_ptr(READ_ONCE(sqe->addr)))) return -EFAULT; if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0) From 0d785e2c324c90662baa4fe07a0d02233ff92824 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:04 +0100 Subject: [PATCH 260/359] s390/idle: Fix cpu idle exit cpu time accounting With the conversion to generic entry [1] cpu idle exit cpu time accounting was converted from assembly to C. This introduced an reversed order of cpu time accounting. On cpu idle exit the current accounting happens with the following call chain: -> do_io_irq()/do_ext_irq() -> irq_enter_rcu() -> account_hardirq_enter() -> vtime_account_irq() -> vtime_account_kernel() vtime_account_kernel() accounts the passed cpu time since last_update_timer as system time, and updates last_update_timer to the current cpu timer value. However the subsequent call of -> account_idle_time_irq() will incorrectly subtract passed cpu time from timer_idle_enter to the updated last_update_timer value from system_timer. Then last_update_timer is updated to a sys_enter_timer, which means that last_update_timer goes back in time. Subsequently account_hardirq_exit() will account too much cpu time as hardirq time. The sum of all accounted cpu times is still correct, however some cpu time which was previously accounted as system time is now accounted as hardirq time, plus there is the oddity that last_update_timer goes back in time. Restore previous behavior by extracting cpu time accounting code from account_idle_time_irq() into a new update_timer_idle() function and call it before irq_enter_rcu(). Fixes: 56e62a737028 ("s390: convert to generic entry") [1] Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/include/asm/idle.h | 1 + arch/s390/kernel/idle.c | 13 +++++++++---- arch/s390/kernel/irq.c | 10 ++++++++-- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/s390/include/asm/idle.h b/arch/s390/include/asm/idle.h index 09f763b9eb40..133059d9a949 100644 --- a/arch/s390/include/asm/idle.h +++ b/arch/s390/include/asm/idle.h @@ -23,5 +23,6 @@ extern struct device_attribute dev_attr_idle_count; extern struct device_attribute dev_attr_idle_time_us; void psw_idle(struct s390_idle_data *data, unsigned long psw_mask); +void update_timer_idle(void); #endif /* _S390_IDLE_H */ diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c index 39cb8d0ae348..0f9e53f0a068 100644 --- a/arch/s390/kernel/idle.c +++ b/arch/s390/kernel/idle.c @@ -21,11 +21,10 @@ static DEFINE_PER_CPU(struct s390_idle_data, s390_idle); -void account_idle_time_irq(void) +void update_timer_idle(void) { struct s390_idle_data *idle = this_cpu_ptr(&s390_idle); struct lowcore *lc = get_lowcore(); - unsigned long idle_time; u64 cycles_new[8]; int i; @@ -35,13 +34,19 @@ void account_idle_time_irq(void) this_cpu_add(mt_cycles[i], cycles_new[i] - idle->mt_cycles_enter[i]); } - idle_time = lc->int_clock - idle->clock_idle_enter; - lc->steal_timer += idle->clock_idle_enter - lc->last_update_clock; lc->last_update_clock = lc->int_clock; lc->system_timer += lc->last_update_timer - idle->timer_idle_enter; lc->last_update_timer = lc->sys_enter_timer; +} + +void account_idle_time_irq(void) +{ + struct s390_idle_data *idle = this_cpu_ptr(&s390_idle); + unsigned long idle_time; + + idle_time = get_lowcore()->int_clock - idle->clock_idle_enter; /* Account time spent with enabled wait psw loaded as idle time. */ WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time); diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c index f81723bc8856..d10a17e6531d 100644 --- a/arch/s390/kernel/irq.c +++ b/arch/s390/kernel/irq.c @@ -146,6 +146,10 @@ void noinstr do_io_irq(struct pt_regs *regs) struct pt_regs *old_regs = set_irq_regs(regs); bool from_idle; + from_idle = test_and_clear_cpu_flag(CIF_ENABLED_WAIT); + if (from_idle) + update_timer_idle(); + irq_enter_rcu(); if (user_mode(regs)) { @@ -154,7 +158,6 @@ void noinstr do_io_irq(struct pt_regs *regs) current->thread.last_break = regs->last_break; } - from_idle = test_and_clear_cpu_flag(CIF_ENABLED_WAIT); if (from_idle) account_idle_time_irq(); @@ -182,6 +185,10 @@ void noinstr do_ext_irq(struct pt_regs *regs) struct pt_regs *old_regs = set_irq_regs(regs); bool from_idle; + from_idle = test_and_clear_cpu_flag(CIF_ENABLED_WAIT); + if (from_idle) + update_timer_idle(); + irq_enter_rcu(); if (user_mode(regs)) { @@ -194,7 +201,6 @@ void noinstr do_ext_irq(struct pt_regs *regs) regs->int_parm = get_lowcore()->ext_params; regs->int_parm_long = get_lowcore()->ext_params2; - from_idle = test_and_clear_cpu_flag(CIF_ENABLED_WAIT); if (from_idle) account_idle_time_irq(); From dbc0fb35679ed5d0adecf7d02137ac2c77244b3b Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:05 +0100 Subject: [PATCH 261/359] s390/vtime: Fix virtual timer forwarding Since delayed accounting of system time [1] the virtual timer is forwarded by do_account_vtime() but also vtime_account_kernel(), vtime_account_softirq(), and vtime_account_hardirq(). This leads to double accounting of system, guest, softirq, and hardirq time. Remove accounting from the vtime_account*() family to restore old behavior. There is only one user of the vtimer interface, which might explain why nobody noticed this so far. Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1] Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/kernel/vtime.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c index 234a0ba30510..122d30b10440 100644 --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -225,10 +225,6 @@ static u64 vtime_delta(void) return timer - lc->last_update_timer; } -/* - * Update process times based on virtual cpu times stored by entry.S - * to the lowcore fields user_timer, system_timer & steal_clock. - */ void vtime_account_kernel(struct task_struct *tsk) { struct lowcore *lc = get_lowcore(); @@ -238,27 +234,17 @@ void vtime_account_kernel(struct task_struct *tsk) lc->guest_timer += delta; else lc->system_timer += delta; - - virt_timer_forward(delta); } EXPORT_SYMBOL_GPL(vtime_account_kernel); void vtime_account_softirq(struct task_struct *tsk) { - u64 delta = vtime_delta(); - - get_lowcore()->softirq_timer += delta; - - virt_timer_forward(delta); + get_lowcore()->softirq_timer += vtime_delta(); } void vtime_account_hardirq(struct task_struct *tsk) { - u64 delta = vtime_delta(); - - get_lowcore()->hardirq_timer += delta; - - virt_timer_forward(delta); + get_lowcore()->hardirq_timer += vtime_delta(); } /* From aefa6ec890f0d39a89fc80e95908ec1996337eee Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:06 +0100 Subject: [PATCH 262/359] s390/idle: Add comment for non obvious code Add a comment to update_timer_idle() which describes why wall time (not steal time) is added to steal_timer. This is not obvious and was reported by Frederic Weisbecker. Reported-by: Frederic Weisbecker Closes: https://lore.kernel.org/all/aXEVM-04lj0lntMr@localhost.localdomain/ Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/kernel/idle.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c index 0f9e53f0a068..4e09f126d4fc 100644 --- a/arch/s390/kernel/idle.c +++ b/arch/s390/kernel/idle.c @@ -34,6 +34,15 @@ void update_timer_idle(void) this_cpu_add(mt_cycles[i], cycles_new[i] - idle->mt_cycles_enter[i]); } + /* + * This is a bit subtle: Forward last_update_clock so it excludes idle + * time. For correct steal time calculation in do_account_vtime() add + * passed wall time before idle_enter to steal_timer: + * During the passed wall time before idle_enter CPU time may have + * been accounted to system, hardirq, softirq, etc. lowcore fields. + * The accounted CPU times will be subtracted again from steal_timer + * when accumulated steal time is calculated in do_account_vtime(). + */ lc->steal_timer += idle->clock_idle_enter - lc->last_update_clock; lc->last_update_clock = lc->int_clock; From 00d8b035eb71262abb547d47717b19f4a55ad4f2 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:07 +0100 Subject: [PATCH 263/359] s390/idle: Slightly optimize idle time accounting Slightly optimize account_idle_time_irq() and update_timer_idle(): - Use fast single instruction __atomic64() primitives to update per cpu idle_time and idle_count, instead of READ_ONCE() / WRITE_ONCE() pairs - stcctm() is an inline assembly with a full memory barrier. This leads to a not necessary extra dereference of smp_cpu_mtid in update_timer_idle(). Avoid this and read smp_cpu_mtid into a variable - Use __this_cpu_add() instead of this_cpu_add() to avoid disabling / enabling of preemption several times in a loop in update_timer_idle(). Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/kernel/idle.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c index 4e09f126d4fc..627d82dd900e 100644 --- a/arch/s390/kernel/idle.c +++ b/arch/s390/kernel/idle.c @@ -26,12 +26,13 @@ void update_timer_idle(void) struct s390_idle_data *idle = this_cpu_ptr(&s390_idle); struct lowcore *lc = get_lowcore(); u64 cycles_new[8]; - int i; + int i, mtid; - if (smp_cpu_mtid) { - stcctm(MT_DIAG, smp_cpu_mtid, cycles_new); - for (i = 0; i < smp_cpu_mtid; i++) - this_cpu_add(mt_cycles[i], cycles_new[i] - idle->mt_cycles_enter[i]); + mtid = smp_cpu_mtid; + if (mtid) { + stcctm(MT_DIAG, mtid, cycles_new); + for (i = 0; i < mtid; i++) + __this_cpu_add(mt_cycles[i], cycles_new[i] - idle->mt_cycles_enter[i]); } /* @@ -58,8 +59,8 @@ void account_idle_time_irq(void) idle_time = get_lowcore()->int_clock - idle->clock_idle_enter; /* Account time spent with enabled wait psw loaded as idle time. */ - WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time); - WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1); + __atomic64_add(idle_time, &idle->idle_time); + __atomic64_add_const(1, &idle->idle_count); account_idle_time(cputime_to_nsecs(idle_time)); } From 257c14e5a1d8de40fded80fd8b525af55a6ded26 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:08 +0100 Subject: [PATCH 264/359] s390/idle: Inline update_timer_idle() Inline update_timer_idle() again to avoid an extra function call. This way the generated code is close to old assembler version again. Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/include/asm/idle.h | 3 ++- arch/s390/include/asm/vtime.h | 34 ++++++++++++++++++++++++++++++++++ arch/s390/kernel/entry.h | 2 -- arch/s390/kernel/idle.c | 34 ++-------------------------------- 4 files changed, 38 insertions(+), 35 deletions(-) diff --git a/arch/s390/include/asm/idle.h b/arch/s390/include/asm/idle.h index 133059d9a949..8c5aa7329461 100644 --- a/arch/s390/include/asm/idle.h +++ b/arch/s390/include/asm/idle.h @@ -19,10 +19,11 @@ struct s390_idle_data { unsigned long mt_cycles_enter[8]; }; +DECLARE_PER_CPU(struct s390_idle_data, s390_idle); + extern struct device_attribute dev_attr_idle_count; extern struct device_attribute dev_attr_idle_time_us; void psw_idle(struct s390_idle_data *data, unsigned long psw_mask); -void update_timer_idle(void); #endif /* _S390_IDLE_H */ diff --git a/arch/s390/include/asm/vtime.h b/arch/s390/include/asm/vtime.h index 9d25fb35a042..b1db75d14e9d 100644 --- a/arch/s390/include/asm/vtime.h +++ b/arch/s390/include/asm/vtime.h @@ -2,6 +2,12 @@ #ifndef _S390_VTIME_H #define _S390_VTIME_H +#include +#include +#include + +DECLARE_PER_CPU(u64, mt_cycles[8]); + static inline void update_timer_sys(void) { struct lowcore *lc = get_lowcore(); @@ -20,4 +26,32 @@ static inline void update_timer_mcck(void) lc->last_update_timer = lc->mcck_enter_timer; } +static inline void update_timer_idle(void) +{ + struct s390_idle_data *idle = this_cpu_ptr(&s390_idle); + struct lowcore *lc = get_lowcore(); + u64 cycles_new[8]; + int i, mtid; + + mtid = smp_cpu_mtid; + if (mtid) { + stcctm(MT_DIAG, mtid, cycles_new); + for (i = 0; i < mtid; i++) + __this_cpu_add(mt_cycles[i], cycles_new[i] - idle->mt_cycles_enter[i]); + } + /* + * This is a bit subtle: Forward last_update_clock so it excludes idle + * time. For correct steal time calculation in do_account_vtime() add + * passed wall time before idle_enter to steal_timer: + * During the passed wall time before idle_enter CPU time may have + * been accounted to system, hardirq, softirq, etc. lowcore fields. + * The accounted CPU times will be subtracted again from steal_timer + * when accumulated steal time is calculated in do_account_vtime(). + */ + lc->steal_timer += idle->clock_idle_enter - lc->last_update_clock; + lc->last_update_clock = lc->int_clock; + lc->system_timer += lc->last_update_timer - idle->timer_idle_enter; + lc->last_update_timer = lc->sys_enter_timer; +} + #endif /* _S390_VTIME_H */ diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h index dd55cc6bbc28..fb67b4abe68c 100644 --- a/arch/s390/kernel/entry.h +++ b/arch/s390/kernel/entry.h @@ -56,8 +56,6 @@ long sys_s390_pci_mmio_write(unsigned long, const void __user *, size_t); long sys_s390_pci_mmio_read(unsigned long, void __user *, size_t); long sys_s390_sthyi(unsigned long function_code, void __user *buffer, u64 __user *return_code, unsigned long flags); -DECLARE_PER_CPU(u64, mt_cycles[8]); - unsigned long stack_alloc(void); void stack_free(unsigned long stack); diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c index 627d82dd900e..1f1b06b6b4ef 100644 --- a/arch/s390/kernel/idle.c +++ b/arch/s390/kernel/idle.c @@ -15,41 +15,11 @@ #include #include #include +#include #include #include -#include "entry.h" -static DEFINE_PER_CPU(struct s390_idle_data, s390_idle); - -void update_timer_idle(void) -{ - struct s390_idle_data *idle = this_cpu_ptr(&s390_idle); - struct lowcore *lc = get_lowcore(); - u64 cycles_new[8]; - int i, mtid; - - mtid = smp_cpu_mtid; - if (mtid) { - stcctm(MT_DIAG, mtid, cycles_new); - for (i = 0; i < mtid; i++) - __this_cpu_add(mt_cycles[i], cycles_new[i] - idle->mt_cycles_enter[i]); - } - - /* - * This is a bit subtle: Forward last_update_clock so it excludes idle - * time. For correct steal time calculation in do_account_vtime() add - * passed wall time before idle_enter to steal_timer: - * During the passed wall time before idle_enter CPU time may have - * been accounted to system, hardirq, softirq, etc. lowcore fields. - * The accounted CPU times will be subtracted again from steal_timer - * when accumulated steal time is calculated in do_account_vtime(). - */ - lc->steal_timer += idle->clock_idle_enter - lc->last_update_clock; - lc->last_update_clock = lc->int_clock; - - lc->system_timer += lc->last_update_timer - idle->timer_idle_enter; - lc->last_update_timer = lc->sys_enter_timer; -} +DEFINE_PER_CPU(struct s390_idle_data, s390_idle); void account_idle_time_irq(void) { From d8b5cf9c63143fae54a734c41e3bb55cf3f365c7 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:09 +0100 Subject: [PATCH 265/359] s390/irq/idle: Remove psw bits early Remove wait, io, external interrupt bits early in do_io_irq()/do_ext_irq() when previous context was idle. This saves one conditional branch and is closer to the original old assembly code. Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/kernel/irq.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c index d10a17e6531d..7fdf960191d3 100644 --- a/arch/s390/kernel/irq.c +++ b/arch/s390/kernel/irq.c @@ -147,8 +147,10 @@ void noinstr do_io_irq(struct pt_regs *regs) bool from_idle; from_idle = test_and_clear_cpu_flag(CIF_ENABLED_WAIT); - if (from_idle) + if (from_idle) { update_timer_idle(); + regs->psw.mask &= ~(PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_WAIT); + } irq_enter_rcu(); @@ -174,9 +176,6 @@ void noinstr do_io_irq(struct pt_regs *regs) set_irq_regs(old_regs); irqentry_exit(regs, state); - - if (from_idle) - regs->psw.mask &= ~(PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_WAIT); } void noinstr do_ext_irq(struct pt_regs *regs) @@ -186,8 +185,10 @@ void noinstr do_ext_irq(struct pt_regs *regs) bool from_idle; from_idle = test_and_clear_cpu_flag(CIF_ENABLED_WAIT); - if (from_idle) + if (from_idle) { update_timer_idle(); + regs->psw.mask &= ~(PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_WAIT); + } irq_enter_rcu(); @@ -209,9 +210,6 @@ void noinstr do_ext_irq(struct pt_regs *regs) irq_exit_rcu(); set_irq_regs(old_regs); irqentry_exit(regs, state); - - if (from_idle) - regs->psw.mask &= ~(PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_WAIT); } static void show_msi_interrupt(struct seq_file *p, int irq) From e282ccd308de7b2a65eb4596da8030c18b543071 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:10 +0100 Subject: [PATCH 266/359] s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE() do_account_vtime() runs always with interrupts disabled, therefore use __this_cpu_read() instead of this_cpu_read() to get rid of a pointless preempt_disable() / preempt_enable() pair. Also there are no concurrent writers to the cpu time accounting fields in lowcore. Therefore get rid of READ_ONCE() usages. Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/kernel/vtime.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c index 122d30b10440..4111ff4d727c 100644 --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -137,23 +137,16 @@ static int do_account_vtime(struct task_struct *tsk) lc->system_timer += timer; /* Update MT utilization calculation */ - if (smp_cpu_mtid && - time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies))) + if (smp_cpu_mtid && time_after64(jiffies_64, __this_cpu_read(mt_scaling_jiffies))) update_mt_scaling(); /* Calculate cputime delta */ - user = update_tsk_timer(&tsk->thread.user_timer, - READ_ONCE(lc->user_timer)); - guest = update_tsk_timer(&tsk->thread.guest_timer, - READ_ONCE(lc->guest_timer)); - system = update_tsk_timer(&tsk->thread.system_timer, - READ_ONCE(lc->system_timer)); - hardirq = update_tsk_timer(&tsk->thread.hardirq_timer, - READ_ONCE(lc->hardirq_timer)); - softirq = update_tsk_timer(&tsk->thread.softirq_timer, - READ_ONCE(lc->softirq_timer)); - lc->steal_timer += - clock - user - guest - system - hardirq - softirq; + user = update_tsk_timer(&tsk->thread.user_timer, lc->user_timer); + guest = update_tsk_timer(&tsk->thread.guest_timer, lc->guest_timer); + system = update_tsk_timer(&tsk->thread.system_timer, lc->system_timer); + hardirq = update_tsk_timer(&tsk->thread.hardirq_timer, lc->hardirq_timer); + softirq = update_tsk_timer(&tsk->thread.softirq_timer, lc->softirq_timer); + lc->steal_timer += clock - user - guest - system - hardirq - softirq; /* Push account value */ if (user) { From 80f63fd09efe9d9c016fd6378116eb7394edd2f2 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:11 +0100 Subject: [PATCH 267/359] s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON() Use lockdep_assert_irqs_disabled() instead of BUG_ON(). This avoids crashing the kernel, and generates better code if CONFIG_PROVE_LOCKING is disabled. Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/kernel/vtime.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c index 4111ff4d727c..bf48744d0912 100644 --- a/arch/s390/kernel/vtime.c +++ b/arch/s390/kernel/vtime.c @@ -48,8 +48,7 @@ static inline void set_vtimer(u64 expires) static inline int virt_timer_forward(u64 elapsed) { - BUG_ON(!irqs_disabled()); - + lockdep_assert_irqs_disabled(); if (list_empty(&virt_timer_list)) return 0; elapsed = atomic64_add_return(elapsed, &virt_timer_elapsed); From acbc0437ca3cf6c19005476a4f1f5a7c96ff4256 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 18 Feb 2026 15:20:12 +0100 Subject: [PATCH 268/359] s390/idle: Remove psw_idle() prototype psw_idle() does not exist anymore. Remove its prototype. Reviewed-by: Sven Schnelle Signed-off-by: Heiko Carstens Signed-off-by: Vasily Gorbik --- arch/s390/include/asm/idle.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/s390/include/asm/idle.h b/arch/s390/include/asm/idle.h index 8c5aa7329461..32536ee34aa0 100644 --- a/arch/s390/include/asm/idle.h +++ b/arch/s390/include/asm/idle.h @@ -24,6 +24,4 @@ DECLARE_PER_CPU(struct s390_idle_data, s390_idle); extern struct device_attribute dev_attr_idle_count; extern struct device_attribute dev_attr_idle_time_us; -void psw_idle(struct s390_idle_data *data, unsigned long psw_mask); - #endif /* _S390_IDLE_H */ From 1623a554c68f352c17d0a358bc62580dc187f06b Mon Sep 17 00:00:00 2001 From: Vasily Gorbik Date: Mon, 23 Feb 2026 23:33:52 +0100 Subject: [PATCH 269/359] s390/kexec: Disable stack protector in s390_reset_system() s390_reset_system() calls set_prefix(0), which switches back to the absolute lowcore. At that point the stack protector canary no longer matches the canary from the lowcore the function was entered with, so the stack check fails. Mark s390_reset_system() __no_stack_protector. This is safe here since its callers (__do_machine_kdump() and __do_machine_kexec()) are effectively no-return and fall back to disabled_wait() on failure. Fixes: f5730d44e05e ("s390: Add stackprotector support") Reported-by: Nikita Dubrovskii Reviewed-by: Heiko Carstens Acked-by: Alexander Gordeev Signed-off-by: Vasily Gorbik --- arch/s390/kernel/ipl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c index dcdc7e274848..049c557c452f 100644 --- a/arch/s390/kernel/ipl.c +++ b/arch/s390/kernel/ipl.c @@ -2377,7 +2377,7 @@ void __init setup_ipl(void) atomic_notifier_chain_register(&panic_notifier_list, &on_panic_nb); } -void s390_reset_system(void) +void __no_stack_protector s390_reset_system(void) { /* Disable prefixing */ set_prefix(0); From d879ac6756b662a085a743e76023c768c3241579 Mon Sep 17 00:00:00 2001 From: Alexander Gordeev Date: Tue, 24 Feb 2026 07:41:07 +0100 Subject: [PATCH 270/359] s390/pfault: Fix virtual vs physical address confusion When Linux is running as guest, runs a user space process and the user space process accesses a page that the host has paged out, the guest gets a pfault interrupt and schedules a different process. Without this mechanism the host would have to suspend the whole virtual CPU until the page has been paged in. To setup the pfault interrupt the real address of parameter list should be passed to DIAGNOSE 0x258, but a virtual address is passed instead. That has a performance impact, since the pfault setup never succeeds, the interrupt is never delivered to a guest and the whole virtual CPU is suspended as result. Cc: stable@vger.kernel.org Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Reported-by: Claudio Imbrenda Reviewed-by: Heiko Carstens Signed-off-by: Alexander Gordeev Signed-off-by: Vasily Gorbik --- arch/s390/mm/pfault.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/mm/pfault.c b/arch/s390/mm/pfault.c index 2f829448c719..6ecd6b0a22a8 100644 --- a/arch/s390/mm/pfault.c +++ b/arch/s390/mm/pfault.c @@ -62,7 +62,7 @@ int __pfault_init(void) "0: nopr %%r7\n" EX_TABLE(0b, 0b) : [rc] "+d" (rc) - : [refbk] "a" (&pfault_init_refbk), "m" (pfault_init_refbk) + : [refbk] "a" (virt_to_phys(&pfault_init_refbk)), "m" (pfault_init_refbk) : "cc"); return rc; } @@ -84,7 +84,7 @@ void __pfault_fini(void) "0: nopr %%r7\n" EX_TABLE(0b, 0b) : - : [refbk] "a" (&pfault_fini_refbk), "m" (pfault_fini_refbk) + : [refbk] "a" (virt_to_phys(&pfault_fini_refbk)), "m" (pfault_fini_refbk) : "cc"); } From 3ea20672d23b21327266534e08ca10d8f281f4ab Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Tue, 24 Feb 2026 13:41:20 +0100 Subject: [PATCH 271/359] MAINTAINERS: Update contact with the kernel.org address Use the kernel.org address as a unified single entry to send patches to. At the same time, update mailmap to group all past contributions. Signed-off-by: Daniel Lezcano --- .mailmap | 4 ++++ MAINTAINERS | 12 ++++++------ 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/.mailmap b/.mailmap index e1cf6bb85d33..463de4784ecb 100644 --- a/.mailmap +++ b/.mailmap @@ -210,6 +210,10 @@ Daniel Borkmann Daniel Borkmann Daniel Borkmann Daniel Borkmann +Daniel Lezcano +Daniel Lezcano +Daniel Lezcano +Daniel Lezcano Daniel Thompson Danilo Krummrich David Brownell diff --git a/MAINTAINERS b/MAINTAINERS index 55af015174a5..6c8752099924 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6280,7 +6280,7 @@ S: Maintained F: include/linux/clk.h CLOCKSOURCE, CLOCKEVENT DRIVERS -M: Daniel Lezcano +M: Daniel Lezcano M: Thomas Gleixner L: linux-kernel@vger.kernel.org S: Supported @@ -6669,7 +6669,7 @@ F: rust/kernel/cpu.rs CPU IDLE TIME MANAGEMENT FRAMEWORK M: "Rafael J. Wysocki" -M: Daniel Lezcano +M: Daniel Lezcano R: Christian Loehle L: linux-pm@vger.kernel.org S: Maintained @@ -6699,7 +6699,7 @@ F: arch/x86/kernel/msr.c CPUIDLE DRIVER - ARM BIG LITTLE M: Lorenzo Pieralisi -M: Daniel Lezcano +M: Daniel Lezcano L: linux-pm@vger.kernel.org L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained @@ -6707,7 +6707,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git F: drivers/cpuidle/cpuidle-big_little.c CPUIDLE DRIVER - ARM EXYNOS -M: Daniel Lezcano +M: Daniel Lezcano M: Kukjin Kim R: Krzysztof Kozlowski L: linux-pm@vger.kernel.org @@ -26217,7 +26217,7 @@ F: drivers/media/radio/radio-raremono.c THERMAL M: Rafael J. Wysocki -M: Daniel Lezcano +M: Daniel Lezcano R: Zhang Rui R: Lukasz Luba L: linux-pm@vger.kernel.org @@ -26247,7 +26247,7 @@ F: drivers/thermal/amlogic_thermal.c THERMAL/CPU_COOLING M: Amit Daniel Kachhap -M: Daniel Lezcano +M: Daniel Lezcano M: Viresh Kumar R: Lukasz Luba L: linux-pm@vger.kernel.org From f6bf47ab32e0863df50f5501d207dcdddb7fc507 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 23 Feb 2026 22:10:10 +0000 Subject: [PATCH 272/359] arm64: io: Rename ioremap_prot() to __ioremap_prot() Rename our ioremap_prot() implementation to __ioremap_prot() and convert all arch-internal callers over to the new function. ioremap_prot() remains as a #define to __ioremap_prot() for generic_access_phys() and will be subsequently extended to handle user permissions in 'prot'. Cc: Zeng Heng Cc: Jinjiang Tu Cc: Catalin Marinas Reviewed-by: Catalin Marinas Signed-off-by: Will Deacon --- arch/arm64/include/asm/io.h | 11 ++++++----- arch/arm64/kernel/acpi.c | 2 +- arch/arm64/mm/ioremap.c | 6 +++--- 3 files changed, 10 insertions(+), 9 deletions(-) diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h index 83e03abbb2ca..cd2fddfe814a 100644 --- a/arch/arm64/include/asm/io.h +++ b/arch/arm64/include/asm/io.h @@ -264,19 +264,20 @@ __iowrite64_copy(void __iomem *to, const void *from, size_t count) typedef int (*ioremap_prot_hook_t)(phys_addr_t phys_addr, size_t size, pgprot_t *prot); int arm64_ioremap_prot_hook_register(const ioremap_prot_hook_t hook); +void __iomem *__ioremap_prot(phys_addr_t phys, size_t size, pgprot_t prot); -#define ioremap_prot ioremap_prot +#define ioremap_prot __ioremap_prot #define _PAGE_IOREMAP PROT_DEVICE_nGnRE #define ioremap_wc(addr, size) \ - ioremap_prot((addr), (size), __pgprot(PROT_NORMAL_NC)) + __ioremap_prot((addr), (size), __pgprot(PROT_NORMAL_NC)) #define ioremap_np(addr, size) \ - ioremap_prot((addr), (size), __pgprot(PROT_DEVICE_nGnRnE)) + __ioremap_prot((addr), (size), __pgprot(PROT_DEVICE_nGnRnE)) #define ioremap_encrypted(addr, size) \ - ioremap_prot((addr), (size), PAGE_KERNEL) + __ioremap_prot((addr), (size), PAGE_KERNEL) /* * io{read,write}{16,32,64}be() macros @@ -297,7 +298,7 @@ static inline void __iomem *ioremap_cache(phys_addr_t addr, size_t size) if (pfn_is_map_memory(__phys_to_pfn(addr))) return (void __iomem *)__phys_to_virt(addr); - return ioremap_prot(addr, size, __pgprot(PROT_NORMAL)); + return __ioremap_prot(addr, size, __pgprot(PROT_NORMAL)); } /* diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c index af90128cfed5..a9d884fd1d00 100644 --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -377,7 +377,7 @@ void __iomem *acpi_os_ioremap(acpi_physical_address phys, acpi_size size) prot = __acpi_get_writethrough_mem_attribute(); } } - return ioremap_prot(phys, size, prot); + return __ioremap_prot(phys, size, prot); } /* diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c index b12cbed9b5ad..6eef48ef6278 100644 --- a/arch/arm64/mm/ioremap.c +++ b/arch/arm64/mm/ioremap.c @@ -14,8 +14,8 @@ int arm64_ioremap_prot_hook_register(ioremap_prot_hook_t hook) return 0; } -void __iomem *ioremap_prot(phys_addr_t phys_addr, size_t size, - pgprot_t pgprot) +void __iomem *__ioremap_prot(phys_addr_t phys_addr, size_t size, + pgprot_t pgprot) { unsigned long last_addr = phys_addr + size - 1; @@ -39,7 +39,7 @@ void __iomem *ioremap_prot(phys_addr_t phys_addr, size_t size, return generic_ioremap_prot(phys_addr, size, pgprot); } -EXPORT_SYMBOL(ioremap_prot); +EXPORT_SYMBOL(__ioremap_prot); /* * Must be called after early_fixmap_init From 8f098037139b294050053123ab2bc0f819d08932 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 23 Feb 2026 22:10:11 +0000 Subject: [PATCH 273/359] arm64: io: Extract user memory type in ioremap_prot() The only caller of ioremap_prot() outside of the generic ioremap() implementation is generic_access_phys(), which passes a 'pgprot_t' value determined from the user mapping of the target 'pfn' being accessed by the kernel. On arm64, the 'pgprot_t' contains all of the non-address bits from the pte, including the permission controls, and so we end up returning a new user mapping from ioremap_prot() which faults when accessed from the kernel on systems with PAN: | Unable to handle kernel read from unreadable memory at virtual address ffff80008ea89000 | ... | Call trace: | __memcpy_fromio+0x80/0xf8 | generic_access_phys+0x20c/0x2b8 | __access_remote_vm+0x46c/0x5b8 | access_remote_vm+0x18/0x30 | environ_read+0x238/0x3e8 | vfs_read+0xe4/0x2b0 | ksys_read+0xcc/0x178 | __arm64_sys_read+0x4c/0x68 Extract only the memory type from the user 'pgprot_t' in ioremap_prot() and assert that we're being passed a user mapping, to protect us against any changes in future that may require additional handling. To avoid falsely flagging users of ioremap(), provide our own ioremap() macro which simply wraps __ioremap_prot(). Cc: Zeng Heng Cc: Jinjiang Tu Cc: Catalin Marinas Fixes: 893dea9ccd08 ("arm64: Add HAVE_IOREMAP_PROT support") Reported-by: Jinjiang Tu Reviewed-by: Catalin Marinas Signed-off-by: Will Deacon --- arch/arm64/include/asm/io.h | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h index cd2fddfe814a..8cbd1e96fd50 100644 --- a/arch/arm64/include/asm/io.h +++ b/arch/arm64/include/asm/io.h @@ -266,10 +266,23 @@ typedef int (*ioremap_prot_hook_t)(phys_addr_t phys_addr, size_t size, int arm64_ioremap_prot_hook_register(const ioremap_prot_hook_t hook); void __iomem *__ioremap_prot(phys_addr_t phys, size_t size, pgprot_t prot); -#define ioremap_prot __ioremap_prot +static inline void __iomem *ioremap_prot(phys_addr_t phys, size_t size, + pgprot_t user_prot) +{ + pgprot_t prot; + ptdesc_t user_prot_val = pgprot_val(user_prot); -#define _PAGE_IOREMAP PROT_DEVICE_nGnRE + if (WARN_ON_ONCE(!(user_prot_val & PTE_USER))) + return NULL; + prot = __pgprot_modify(PAGE_KERNEL, PTE_ATTRINDX_MASK, + user_prot_val & PTE_ATTRINDX_MASK); + return __ioremap_prot(phys, size, prot); +} +#define ioremap_prot ioremap_prot + +#define ioremap(addr, size) \ + __ioremap_prot((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) #define ioremap_wc(addr, size) \ __ioremap_prot((addr), (size), __pgprot(PROT_NORMAL_NC)) #define ioremap_np(addr, size) \ From 8a85b3131225a8c8143ba2ae29c0eef8c1f9117f Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 23 Feb 2026 17:45:30 +0000 Subject: [PATCH 274/359] arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the shareability attribute with bits 50-51 of the output address. The _PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 (this matches the other _PAGE_* definitions) but using this macro directly leads to the following panic when enabling GCS on a system/model with LPA2: Unable to handle kernel paging request at virtual address fffff1ffc32d8008 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000060f4d000 [fffff1ffc32d8008] pgd=100000006184b003, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP CPU: 0 UID: 0 PID: 513 Comm: gcs_write_fault Tainted: G M 7.0.0-rc1 #1 PREEMPT Tainted: [M]=MACHINE_CHECK Hardware name: QEMU QEMU Virtual Machine, BIOS 2025.02-8+deb13u1 11/08/2025 pstate: 03402005 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : zap_huge_pmd+0x168/0x468 lr : zap_huge_pmd+0x2c/0x468 sp : ffff800080beb660 x29: ffff800080beb660 x28: fff00000c2058180 x27: ffff800080beb898 x26: fff00000c2058180 x25: ffff800080beb820 x24: 00c800010b600f41 x23: ffffc1ffc30af1a8 x22: fff00000c2058180 x21: 0000ffff8dc00000 x20: fff00000c2bc6370 x19: ffff800080beb898 x18: ffff800080bebb60 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000007 x14: 000000000000000a x13: 0000aaaacbbbffff x12: 0000000000000000 x11: 0000ffff8ddfffff x10: 00000000000001fe x9 : 0000ffff8ddfffff x8 : 0000ffff8de00000 x7 : 0000ffff8da00000 x6 : fff00000c2bc6370 x5 : 0000ffff8da00000 x4 : 000000010b600000 x3 : ffffc1ffc0000000 x2 : fff00000c2058180 x1 : fffff1ffc32d8000 x0 : 000000c00010b600 Call trace: zap_huge_pmd+0x168/0x468 (P) unmap_page_range+0xd70/0x1560 unmap_single_vma+0x48/0x80 unmap_vmas+0x90/0x180 unmap_region+0x88/0xe4 vms_complete_munmap_vmas+0xf8/0x1e0 do_vmi_align_munmap+0x158/0x180 do_vmi_munmap+0xac/0x160 __vm_munmap+0xb0/0x138 vm_munmap+0x14/0x20 gcs_free+0x70/0x80 mm_release+0x1c/0xc8 exit_mm_release+0x28/0x38 do_exit+0x190/0x8ec do_group_exit+0x34/0x90 get_signal+0x794/0x858 arch_do_signal_or_restart+0x11c/0x3e0 exit_to_user_mode_loop+0x10c/0x17c el0_da+0x8c/0x9c el0t_64_sync_handler+0xd0/0xf0 el0t_64_sync+0x198/0x19c Code: aa1603e2 d34cfc00 cb813001 8b011861 (f9400420) Similarly to how the kernel handles protection_map[], use a gcs_page_prot variable to store the protection bits and clear PTE_SHARED if LPA2 is enabled. Also remove the unused PAGE_GCS{,_RO} macros. Signed-off-by: Catalin Marinas Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") Reported-by: Emanuele Rocca Cc: stable@vger.kernel.org Cc: Mark Brown Cc: Will Deacon Reviewed-by: David Hildenbrand (Arm) Signed-off-by: Will Deacon --- arch/arm64/include/asm/pgtable-prot.h | 3 --- arch/arm64/mm/mmap.c | 8 ++++++-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index d27e8872fe3c..2b32639160de 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -164,9 +164,6 @@ static inline bool __pure lpa2_is_enabled(void) #define _PAGE_GCS (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_WRITE | PTE_USER) #define _PAGE_GCS_RO (_PAGE_DEFAULT | PTE_NG | PTE_UXN | PTE_USER) -#define PAGE_GCS __pgprot(_PAGE_GCS) -#define PAGE_GCS_RO __pgprot(_PAGE_GCS_RO) - #define PIE_E0 ( \ PIRx_ELx_PERM_PREP(pte_pi_index(_PAGE_GCS), PIE_GCS) | \ PIRx_ELx_PERM_PREP(pte_pi_index(_PAGE_GCS_RO), PIE_R) | \ diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c index 08ee177432c2..75f343009b4b 100644 --- a/arch/arm64/mm/mmap.c +++ b/arch/arm64/mm/mmap.c @@ -34,6 +34,8 @@ static pgprot_t protection_map[16] __ro_after_init = { [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED_EXEC }; +static ptdesc_t gcs_page_prot __ro_after_init = _PAGE_GCS_RO; + /* * You really shouldn't be using read() or write() on /dev/mem. This might go * away in the future. @@ -73,9 +75,11 @@ static int __init adjust_protection_map(void) protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY; } - if (lpa2_is_enabled()) + if (lpa2_is_enabled()) { for (int i = 0; i < ARRAY_SIZE(protection_map); i++) pgprot_val(protection_map[i]) &= ~PTE_SHARED; + gcs_page_prot &= ~PTE_SHARED; + } return 0; } @@ -87,7 +91,7 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) /* Short circuit GCS to avoid bloating the table. */ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { - prot = _PAGE_GCS_RO; + prot = gcs_page_prot; } else { prot = pgprot_val(protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); From 47a8aad135ac1aed04b7b0c0a8157fd208075827 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 23 Feb 2026 17:45:31 +0000 Subject: [PATCH 275/359] arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappings vm_get_page_prot() short-circuits the protection_map[] lookup for a VM_SHADOW_STACK mapping since it uses a different PIE index from the typical read/write/exec permissions. However, the side effect is that it also ignores mprotect(PROT_NONE) by creating an accessible PTE. Special-case the !(vm_flags & VM_ACCESS_FLAGS) flags to use the protection_map[VM_NONE] permissions instead. No GCS attributes are required for an inaccessible PTE. Signed-off-by: Catalin Marinas Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") Cc: stable@vger.kernel.org Cc: Mark Brown Cc: Will Deacon Cc: David Hildenbrand Reviewed-by: David Hildenbrand (Arm) Signed-off-by: Will Deacon --- arch/arm64/mm/mmap.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c index 75f343009b4b..92b2f5097a96 100644 --- a/arch/arm64/mm/mmap.c +++ b/arch/arm64/mm/mmap.c @@ -91,7 +91,11 @@ pgprot_t vm_get_page_prot(vm_flags_t vm_flags) /* Short circuit GCS to avoid bloating the table. */ if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { - prot = gcs_page_prot; + /* Honour mprotect(PROT_NONE) on shadow stack mappings */ + if (vm_flags & VM_ACCESS_FLAGS) + prot = gcs_page_prot; + else + prot = pgprot_val(protection_map[VM_NONE]); } else { prot = pgprot_val(protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); From 9d1a7c4a457eac8a7e07e1685f95e54fa551db5e Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 23 Feb 2026 17:45:32 +0000 Subject: [PATCH 276/359] kselftest: arm64: Check access to GCS after mprotect(PROT_NONE) A GCS mapping should not be accessible after mprotect(PROT_NONE). Add a kselftest for this scenario. Signed-off-by: Catalin Marinas Cc: Shuah Khan Cc: Mark Brown Cc: Will Deacon Cc: David Hildenbrand Reviewed-by: Mark Brown Signed-off-by: Will Deacon --- .../signal/testcases/gcs_prot_none_fault.c | 76 +++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 tools/testing/selftests/arm64/signal/testcases/gcs_prot_none_fault.c diff --git a/tools/testing/selftests/arm64/signal/testcases/gcs_prot_none_fault.c b/tools/testing/selftests/arm64/signal/testcases/gcs_prot_none_fault.c new file mode 100644 index 000000000000..2259f454a202 --- /dev/null +++ b/tools/testing/selftests/arm64/signal/testcases/gcs_prot_none_fault.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2026 ARM Limited + */ + +#include +#include +#include + +#include +#include + +#include "test_signals_utils.h" +#include "testcases.h" + +static uint64_t *gcs_page; +static bool post_mprotect; + +#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 453 +#endif + +static bool alloc_gcs(struct tdescr *td) +{ + long page_size = sysconf(_SC_PAGE_SIZE); + + gcs_page = (void *)syscall(__NR_map_shadow_stack, 0, + page_size, 0); + if (gcs_page == MAP_FAILED) { + fprintf(stderr, "Failed to map %ld byte GCS: %d\n", + page_size, errno); + return false; + } + + return true; +} + +static int gcs_prot_none_fault_trigger(struct tdescr *td) +{ + /* Verify that the page is readable (ie, not completely unmapped) */ + fprintf(stderr, "Read value 0x%lx\n", gcs_page[0]); + + if (mprotect(gcs_page, sysconf(_SC_PAGE_SIZE), PROT_NONE) != 0) { + fprintf(stderr, "mprotect(PROT_NONE) failed: %d\n", errno); + return 0; + } + post_mprotect = true; + + /* This should trigger a fault if PROT_NONE is honoured for the GCS page */ + fprintf(stderr, "Read value after mprotect(PROT_NONE) 0x%lx\n", gcs_page[0]); + return 0; +} + +static int gcs_prot_none_fault_signal(struct tdescr *td, siginfo_t *si, + ucontext_t *uc) +{ + ASSERT_GOOD_CONTEXT(uc); + + /* A fault before mprotect(PROT_NONE) is unexpected. */ + if (!post_mprotect) + return 0; + + return 1; +} + +struct tdescr tde = { + .name = "GCS PROT_NONE fault", + .descr = "Read from GCS after mprotect(PROT_NONE) segfaults", + .feats_required = FEAT_GCS, + .timeout = 3, + .sig_ok = SIGSEGV, + .sanity_disabled = true, + .init = alloc_gcs, + .trigger = gcs_prot_none_fault_trigger, + .run = gcs_prot_none_fault_signal, +}; From bfd9c931d19aa59fb8371d557774fa169b15db9a Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 18 Feb 2026 16:43:47 +0000 Subject: [PATCH 277/359] arm64: tlb: Allow XZR argument to TLBI ops The TLBI instruction accepts XZR as a register argument, and for TLBI operations with a register argument, there is no functional difference between using XZR or another GPR which contains zeroes. Operations without a register argument are encoded as if XZR were used. Allow the __TLBI_1() macro to use XZR when a register argument is all zeroes. Today this only results in a trivial code saving in __do_compat_cache_op()'s workaround for Neoverse-N1 erratum #1542419. In subsequent patches this pattern will be used more generally. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Marc Zyngier Cc: Oliver Upton Cc: Ryan Roberts Cc: Will Deacon Signed-off-by: Will Deacon --- arch/arm64/include/asm/tlbflush.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index a2d65d7d6aae..bf1cc9949dc8 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -38,12 +38,12 @@ : : ) #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE \ - "tlbi " #op ", %0\n" \ + "tlbi " #op ", %x0\n" \ ALTERNATIVE("nop\n nop", \ - "dsb ish\n tlbi " #op ", %0", \ + "dsb ish\n tlbi " #op ", %x0", \ ARM64_WORKAROUND_REPEAT_TLBI, \ CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \ - : : "r" (arg)) + : : "rZ" (arg)) #define __TLBI_N(op, arg, n, ...) __TLBI_##n(op, arg) From a8f78680ee6bf795086384e8aea159a52814f827 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 18 Feb 2026 16:43:48 +0000 Subject: [PATCH 278/359] arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several errata where broadcast TLBI;DSB sequences don't provide all the architecturally required synchronization. The workaround performs more work than necessary, and can have significant overhead. This patch optimizes the workaround, as explained below. The workaround was originally added for Qualcomm Falkor erratum 1009 in commit: d9ff80f83ecb ("arm64: Work around Falkor erratum 1009") As noted in the message for that commit, the workaround is applied even in cases where it is not strictly necessary. The workaround was later reused without changes for: * Arm Cortex-A76 erratum #1286807 SDEN v33: https://developer.arm.com/documentation/SDEN-885749/33-0/ * Arm Cortex-A55 erratum #2441007 SDEN v16: https://developer.arm.com/documentation/SDEN-859338/1600/ * Arm Cortex-A510 erratum #2441009 SDEN v19: https://developer.arm.com/documentation/SDEN-1873351/1900/ The important details to note are as follows: 1. All relevant errata only affect the ordering and/or completion of memory accesses which have been translated by an invalidated TLB entry. The actual invalidation of TLB entries is unaffected. 2. The existing workaround is applied to both broadcast and local TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for broadcast invalidation. 3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI sequence, whereas for all relevant errata it is only necessary to execute a single additional TLBI;DSB sequence after any number of TLBIs are completed by a DSB. For example, for a sequence of batched TLBIs: TLBI [, ] TLBI [, ] TLBI [, ] DSB ISH ... the existing workaround will expand this to: TLBI [, ] DSB ISH // additional TLBI [, ] // additional TLBI [, ] DSB ISH // additional TLBI [, ] // additional TLBI [, ] DSB ISH // additional TLBI [, ] // additional DSB ISH ... whereas it is sufficient to have: TLBI [, ] TLBI [, ] TLBI [, ] DSB ISH TLBI [, ] // additional DSB ISH // additional Using a single additional TBLI and DSB at the end of the sequence can have significantly lower overhead as each DSB which completes a TLBI must synchronize with other PEs in the system, with potential performance effects both locally and system-wide. 4. The existing workaround repeats each specific TLBI operation, whereas for all relevant errata it is sufficient for the additional TLBI to use *any* operation which will be broadcast, regardless of which translation regime or stage of translation the operation applies to. For example, for a single TLBI: TLBI ALLE2IS DSB ISH ... the existing workaround will expand this to: TLBI ALLE2IS DSB ISH TLBI ALLE2IS // additional DSB ISH // additional ... whereas it is sufficient to have: TLBI ALLE2IS DSB ISH TLBI VALE1IS, XZR // additional DSB ISH // additional As the additional TLBI doesn't have to match a specific earlier TLBI, the additional TLBI can be implemented in separate code, with no memory of the earlier TLBIs. The additional TLBI can also use a cheaper TLBI operation. 5. The existing workaround is applied to both Stage-1 and Stage-2 TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for Stage-1 invalidation. Architecturally, TLBI operations which invalidate only Stage-2 information (e.g. IPAS2E1IS) are not required to invalidate TLB entries which combine information from Stage-1 and Stage-2 translation table entries, and consequently may not complete memory accesses translated by those combined entries. In these cases, completion of memory accesses is only guaranteed after subsequent invalidation of Stage-1 information (e.g. VMALLE1IS). Taking the above points into account, this patch reworks the workaround logic to reduce overhead: * New __tlbi_sync_s1ish() and __tlbi_sync_s1ish_hyp() functions are added and used in place of any dsb(ish) which is used to complete broadcast Stage-1 TLB maintenance. When the ARM64_WORKAROUND_REPEAT_TLBI workaround is enabled, these helpers will execute an additional TLBI;DSB sequence. For consistency, it might make sense to add __tlbi_sync_*() helpers for local and stage 2 maintenance. For now I've left those with open-coded dsb() to keep the diff small. * The duplication of TLBIs in __TLBI_0() and __TLBI_1() is removed. This is no longer needed as the necessary synchronization will happen in __tlbi_sync_s1ish() or __tlbi_sync_s1ish_hyp(). * The additional TLBI operation is chosen to have minimal impact: - __tlbi_sync_s1ish() uses "TLBI VALE1IS, XZR". This is only used at EL1 or at EL2 with {E2H,TGE}=={1,1}, where it will target an unused entry for the reserved ASID in the kernel's own translation regime, and have no adverse affect. - __tlbi_sync_s1ish_hyp() uses "TLBI VALE2IS, XZR". This is only used in hyp code, where it will target an unused entry in the hyp code's TTBR0 mapping, and should have no adverse effect. * As __TLBI_0() and __TLBI_1() no longer replace each TLBI with a TLBI;DSB;TLBI sequence, batching TLBIs is worthwhile, and there's no need for arch_tlbbatch_should_defer() to consider ARM64_WORKAROUND_REPEAT_TLBI. When building defconfig with GCC 15.1.0, compared to v6.19-rc1, this patch saves ~1KiB of text, makes the vmlinux ~42KiB smaller, and makes the resulting Image 64KiB smaller: | [mark@lakrids:~/src/linux]% size vmlinux-* | text data bss dec hex filename | 21179831 19660919 708216 41548966 279fca6 vmlinux-after | 21181075 19660903 708216 41550194 27a0172 vmlinux-before | [mark@lakrids:~/src/linux]% ls -l vmlinux-* | -rwxr-xr-x 1 mark mark 157771472 Feb 4 12:05 vmlinux-after | -rwxr-xr-x 1 mark mark 157815432 Feb 4 12:05 vmlinux-before | [mark@lakrids:~/src/linux]% ls -l Image-* | -rw-r--r-- 1 mark mark 41007616 Feb 4 12:05 Image-after | -rw-r--r-- 1 mark mark 41073152 Feb 4 12:05 Image-before Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Marc Zyngier Cc: Oliver Upton Cc: Ryan Roberts Cc: Will Deacon Signed-off-by: Will Deacon --- arch/arm64/include/asm/tlbflush.h | 59 ++++++++++++++++++------------- arch/arm64/kernel/sys_compat.c | 2 +- arch/arm64/kvm/hyp/nvhe/mm.c | 2 +- arch/arm64/kvm/hyp/nvhe/tlb.c | 8 ++--- arch/arm64/kvm/hyp/pgtable.c | 2 +- arch/arm64/kvm/hyp/vhe/tlb.c | 10 +++--- 6 files changed, 47 insertions(+), 36 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index bf1cc9949dc8..1416e652612b 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -31,18 +31,10 @@ */ #define __TLBI_0(op, arg) asm (ARM64_ASM_PREAMBLE \ "tlbi " #op "\n" \ - ALTERNATIVE("nop\n nop", \ - "dsb ish\n tlbi " #op, \ - ARM64_WORKAROUND_REPEAT_TLBI, \ - CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \ : : ) #define __TLBI_1(op, arg) asm (ARM64_ASM_PREAMBLE \ "tlbi " #op ", %x0\n" \ - ALTERNATIVE("nop\n nop", \ - "dsb ish\n tlbi " #op ", %x0", \ - ARM64_WORKAROUND_REPEAT_TLBI, \ - CONFIG_ARM64_WORKAROUND_REPEAT_TLBI) \ : : "rZ" (arg)) #define __TLBI_N(op, arg, n, ...) __TLBI_##n(op, arg) @@ -181,6 +173,34 @@ static inline unsigned long get_trans_granule(void) (__pages >> (5 * (scale) + 1)) - 1; \ }) +#define __repeat_tlbi_sync(op, arg...) \ +do { \ + if (!alternative_has_cap_unlikely(ARM64_WORKAROUND_REPEAT_TLBI)) \ + break; \ + __tlbi(op, ##arg); \ + dsb(ish); \ +} while (0) + +/* + * Complete broadcast TLB maintenance issued by the host which invalidates + * stage 1 information in the host's own translation regime. + */ +static inline void __tlbi_sync_s1ish(void) +{ + dsb(ish); + __repeat_tlbi_sync(vale1is, 0); +} + +/* + * Complete broadcast TLB maintenance issued by hyp code which invalidates + * stage 1 translation information in any translation regime. + */ +static inline void __tlbi_sync_s1ish_hyp(void) +{ + dsb(ish); + __repeat_tlbi_sync(vale2is, 0); +} + /* * TLB Invalidation * ================ @@ -279,7 +299,7 @@ static inline void flush_tlb_all(void) { dsb(ishst); __tlbi(vmalle1is); - dsb(ish); + __tlbi_sync_s1ish(); isb(); } @@ -291,7 +311,7 @@ static inline void flush_tlb_mm(struct mm_struct *mm) asid = __TLBI_VADDR(0, ASID(mm)); __tlbi(aside1is, asid); __tlbi_user(aside1is, asid); - dsb(ish); + __tlbi_sync_s1ish(); mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL); } @@ -345,20 +365,11 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr) { flush_tlb_page_nosync(vma, uaddr); - dsb(ish); + __tlbi_sync_s1ish(); } static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) { - /* - * TLB flush deferral is not required on systems which are affected by - * ARM64_WORKAROUND_REPEAT_TLBI, as __tlbi()/__tlbi_user() implementation - * will have two consecutive TLBI instructions with a dsb(ish) in between - * defeating the purpose (i.e save overall 'dsb ish' cost). - */ - if (alternative_has_cap_unlikely(ARM64_WORKAROUND_REPEAT_TLBI)) - return false; - return true; } @@ -374,7 +385,7 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) */ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { - dsb(ish); + __tlbi_sync_s1ish(); } /* @@ -509,7 +520,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma, { __flush_tlb_range_nosync(vma->vm_mm, start, end, stride, last_level, tlb_level); - dsb(ish); + __tlbi_sync_s1ish(); } static inline void local_flush_tlb_contpte(struct vm_area_struct *vma, @@ -557,7 +568,7 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end dsb(ishst); __flush_tlb_range_op(vaale1is, start, pages, stride, 0, TLBI_TTL_UNKNOWN, false, lpa2_is_enabled()); - dsb(ish); + __tlbi_sync_s1ish(); isb(); } @@ -571,7 +582,7 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr) dsb(ishst); __tlbi(vaae1is, addr); - dsb(ish); + __tlbi_sync_s1ish(); isb(); } diff --git a/arch/arm64/kernel/sys_compat.c b/arch/arm64/kernel/sys_compat.c index 4a609e9b65de..b9d4998c97ef 100644 --- a/arch/arm64/kernel/sys_compat.c +++ b/arch/arm64/kernel/sys_compat.c @@ -37,7 +37,7 @@ __do_compat_cache_op(unsigned long start, unsigned long end) * We pick the reserved-ASID to minimise the impact. */ __tlbi(aside1is, __TLBI_VADDR(0, 0)); - dsb(ish); + __tlbi_sync_s1ish(); } ret = caches_clean_inval_user_pou(start, start + chunk); diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c index ae8391baebc3..218976287d3f 100644 --- a/arch/arm64/kvm/hyp/nvhe/mm.c +++ b/arch/arm64/kvm/hyp/nvhe/mm.c @@ -271,7 +271,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot) */ dsb(ishst); __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); } diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c index 48da9ca9763f..3dc1ce0d27fe 100644 --- a/arch/arm64/kvm/hyp/nvhe/tlb.c +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c @@ -169,7 +169,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, */ dsb(ish); __tlbi(vmalle1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); exit_vmid_context(&cxt); @@ -226,7 +226,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu, dsb(ish); __tlbi(vmalle1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); exit_vmid_context(&cxt); @@ -240,7 +240,7 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu) enter_vmid_context(mmu, &cxt, false); __tlbi(vmalls12e1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); exit_vmid_context(&cxt); @@ -266,5 +266,5 @@ void __kvm_flush_vm_context(void) /* Same remark as in enter_vmid_context() */ dsb(ish); __tlbi(alle1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); } diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 0e4ddd28ef5d..9b480f947da2 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -501,7 +501,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx, *unmapped += granule; } - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); mm_ops->put_page(ctx->ptep); diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c index ec2569818629..35855dadfb1b 100644 --- a/arch/arm64/kvm/hyp/vhe/tlb.c +++ b/arch/arm64/kvm/hyp/vhe/tlb.c @@ -115,7 +115,7 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, */ dsb(ish); __tlbi(vmalle1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); exit_vmid_context(&cxt); @@ -176,7 +176,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu, dsb(ish); __tlbi(vmalle1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); exit_vmid_context(&cxt); @@ -192,7 +192,7 @@ void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu) enter_vmid_context(mmu, &cxt); __tlbi(vmalls12e1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); exit_vmid_context(&cxt); @@ -217,7 +217,7 @@ void __kvm_flush_vm_context(void) { dsb(ishst); __tlbi(alle1is); - dsb(ish); + __tlbi_sync_s1ish_hyp(); } /* @@ -358,7 +358,7 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding) default: ret = -EINVAL; } - dsb(ish); + __tlbi_sync_s1ish_hyp(); isb(); if (mmu) From 468711a40d5dfc01bf0a24c1981246a2c93ac405 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Tue, 10 Feb 2026 19:12:25 +0100 Subject: [PATCH 279/359] PCI: dwc: ep: Refresh MSI Message Address cache on change Endpoint drivers use dw_pcie_ep_raise_msi_irq() to raise MSI interrupts to the host. After 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), dw_pcie_ep_raise_msi_irq() caches the Message Address from the MSI Capability in ep->msi_msg_addr. But that Message Address is controlled by the host, and it may change. For example, if: - firmware on the host configures the Message Address and triggers an MSI, - a driver on the Endpoint raises the MSI via dw_pcie_ep_raise_msi_irq(), which caches the Message Address, - a kernel on the host reconfigures the Message Address and the host kernel driver triggers another MSI, dw_pcie_ep_raise_msi_irq() notices that the Message Address no longer matches the cached ep->msi_msg_addr, warns about it, and returns error instead of raising the MSI. The host kernel may hang because it never receives the MSI. This was seen with the nvmet_pci_epf_driver: the host UEFI performs NVMe commands, e.g. Identify Controller to get the name of the controller, nvmet-pci-epf posts the completion queue entry and raises an IRQ using dw_pcie_ep_raise_msi_irq(). When the host boots Linux, we see a WARN_ON_ONCE() from dw_pcie_ep_raise_msi_irq(), and the host kernel hangs because the nvme driver never gets an IRQ. Remove the warning when dw_pcie_ep_raise_msi_irq() notices that Message Address has changed, remap using the new address, and update the ep->msi_msg_addr cache. Fixes: 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping") Signed-off-by: Niklas Cassel [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas Tested-by: Shin'ichiro Kawasaki Tested-by: Koichiro Den Acked-by: Manivannan Sadhasivam Link: https://patch.msgid.link/20260210181225.3926165-2-cassel@kernel.org --- .../pci/controller/dwc/pcie-designware-ep.c | 22 +++++++++++-------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c index 295076cf70de..35d8fc5e1d20 100644 --- a/drivers/pci/controller/dwc/pcie-designware-ep.c +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c @@ -905,6 +905,19 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no, * supported, so we avoid reprogramming the region on every MSI, * specifically unmapping immediately after writel(). */ + if (ep->msi_iatu_mapped && (ep->msi_msg_addr != msg_addr || + ep->msi_map_size != map_size)) { + /* + * The host changed the MSI target address or the required + * mapping size changed. Reprogramming the iATU when there are + * operations in flight is unsafe on this controller. However, + * there is no unified way to check if we have operations in + * flight, thus we don't know if we should WARN() or not. + */ + dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys); + ep->msi_iatu_mapped = false; + } + if (!ep->msi_iatu_mapped) { ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr, @@ -915,15 +928,6 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no, ep->msi_iatu_mapped = true; ep->msi_msg_addr = msg_addr; ep->msi_map_size = map_size; - } else if (WARN_ON_ONCE(ep->msi_msg_addr != msg_addr || - ep->msi_map_size != map_size)) { - /* - * The host changed the MSI target address or the required - * mapping size changed. Reprogramming the iATU at runtime is - * unsafe on this controller, so bail out instead of trying to - * update the existing region. - */ - return -EINVAL; } writel(msg_data | (interrupt_num - 1), ep->msi_mem + offset); From c22533c66ccae10511ad6a7afc34bb26c47577e3 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Wed, 11 Feb 2026 18:55:41 +0100 Subject: [PATCH 280/359] PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry Endpoint drivers use dw_pcie_ep_raise_msix_irq() to raise an MSI-X interrupt to the host using a writel(), which generates a PCI posted write transaction. There's no completion for posted writes, so the writel() may return before the PCI write completes. dw_pcie_ep_raise_msix_irq() also unmaps the outbound ATU entry used for the PCI write, so the write races with the unmap. If the PCI write loses the race with the ATU unmap, the write may corrupt host memory or cause IOMMU errors, e.g., these when running fio with a larger queue depth against nvmet-pci-epf: arm-smmu-v3 fc900000.iommu: 0x0000010000000010 arm-smmu-v3 fc900000.iommu: 0x0000020000000000 arm-smmu-v3 fc900000.iommu: 0x000000090000f040 arm-smmu-v3 fc900000.iommu: 0x0000000000000000 arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x90000f040 ipa: 0x0 arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault" stag: 0x0 Flush the write by performing a readl() of the same address to ensure that the write has reached the destination before the ATU entry is unmapped. The same problem was solved for dw_pcie_ep_raise_msi_irq() in commit 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), but there it was solved by dedicating an outbound iATU only for MSI. We can't do the same for MSI-X because each vector can have a different msg_addr and the msg_addr may be changed while the vector is masked. Fixes: beb4641a787d ("PCI: dwc: Add MSI-X callbacks handler") Signed-off-by: Niklas Cassel [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas Reviewed-by: Frank Li Link: https://patch.msgid.link/20260211175540.105677-2-cassel@kernel.org --- drivers/pci/controller/dwc/pcie-designware-ep.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c index 35d8fc5e1d20..c57ae4d6c5c0 100644 --- a/drivers/pci/controller/dwc/pcie-designware-ep.c +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c @@ -1014,6 +1014,9 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no, writel(msg_data, ep->msi_mem + offset); + /* flush posted write before unmap */ + readl(ep->msi_mem + offset); + dw_pcie_ep_unmap_addr(epc, func_no, 0, ep->msi_mem_phys); return 0; From 75c151ceaacf5ca8f2f34ebf863d88002fb12587 Mon Sep 17 00:00:00 2001 From: Lizhi Hou Date: Wed, 25 Feb 2026 12:47:52 -0800 Subject: [PATCH 281/359] accel/amdxdna: Use a different name for latest firmware Using legacy driver with latest firmware causes a power off issue. Fix this by assigning a different filename (npu_7.sbin) to the latest firmware. The driver attempts to load the latest firmware first and falls back to the previous firmware version if loading fails. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5009 Fixes: f1eac46fe5f7 ("accel/amdxdna: Update firmware version check for latest firmware") Reviewed-by: Mario Limonciello (AMD) Signed-off-by: Lizhi Hou Link: https://patch.msgid.link/20260225204752.2711734-1-lizhi.hou@amd.com --- drivers/accel/amdxdna/aie2_pci.c | 20 +++++++++++++++++++- drivers/accel/amdxdna/amdxdna_pci_drv.c | 3 +++ drivers/accel/amdxdna/npu1_regs.c | 2 +- drivers/accel/amdxdna/npu4_regs.c | 2 +- drivers/accel/amdxdna/npu5_regs.c | 2 +- drivers/accel/amdxdna/npu6_regs.c | 2 +- 6 files changed, 26 insertions(+), 5 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index 4b3e6bb97bd2..85079b6fc5d9 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -32,6 +32,11 @@ static int aie2_max_col = XRS_MAX_COL; module_param(aie2_max_col, uint, 0600); MODULE_PARM_DESC(aie2_max_col, "Maximum column could be used"); +static char *npu_fw[] = { + "npu_7.sbin", + "npu.sbin" +}; + /* * The management mailbox channel is allocated by firmware. * The related register and ring buffer information is on SRAM BAR. @@ -489,6 +494,7 @@ static int aie2_init(struct amdxdna_dev *xdna) struct psp_config psp_conf; const struct firmware *fw; unsigned long bars = 0; + char *fw_full_path; int i, nvec, ret; if (!hypervisor_is_type(X86_HYPER_NATIVE)) { @@ -503,7 +509,19 @@ static int aie2_init(struct amdxdna_dev *xdna) ndev->priv = xdna->dev_info->dev_priv; ndev->xdna = xdna; - ret = request_firmware(&fw, ndev->priv->fw_path, &pdev->dev); + for (i = 0; i < ARRAY_SIZE(npu_fw); i++) { + fw_full_path = kasprintf(GFP_KERNEL, "%s%s", ndev->priv->fw_path, npu_fw[i]); + if (!fw_full_path) + return -ENOMEM; + + ret = firmware_request_nowarn(&fw, fw_full_path, &pdev->dev); + kfree(fw_full_path); + if (!ret) { + XDNA_INFO(xdna, "Load firmware %s%s", ndev->priv->fw_path, npu_fw[i]); + break; + } + } + if (ret) { XDNA_ERR(xdna, "failed to request_firmware %s, ret %d", ndev->priv->fw_path, ret); diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdna/amdxdna_pci_drv.c index 4ada45d06fcf..a4384593bdcc 100644 --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -23,6 +23,9 @@ MODULE_FIRMWARE("amdnpu/1502_00/npu.sbin"); MODULE_FIRMWARE("amdnpu/17f0_10/npu.sbin"); MODULE_FIRMWARE("amdnpu/17f0_11/npu.sbin"); MODULE_FIRMWARE("amdnpu/17f0_20/npu.sbin"); +MODULE_FIRMWARE("amdnpu/1502_00/npu_7.sbin"); +MODULE_FIRMWARE("amdnpu/17f0_10/npu_7.sbin"); +MODULE_FIRMWARE("amdnpu/17f0_11/npu_7.sbin"); /* * 0.0: Initial version diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1_regs.c index 6f36a27b5a02..6e3d3ca69c04 100644 --- a/drivers/accel/amdxdna/npu1_regs.c +++ b/drivers/accel/amdxdna/npu1_regs.c @@ -72,7 +72,7 @@ static const struct aie2_fw_feature_tbl npu1_fw_feature_table[] = { }; static const struct amdxdna_dev_priv npu1_dev_priv = { - .fw_path = "amdnpu/1502_00/npu.sbin", + .fw_path = "amdnpu/1502_00/", .rt_config = npu1_default_rt_cfg, .dpm_clk_tbl = npu1_dpm_clk_table, .fw_feature_tbl = npu1_fw_feature_table, diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4_regs.c index a8d6f76dde5f..ce25eef5fc34 100644 --- a/drivers/accel/amdxdna/npu4_regs.c +++ b/drivers/accel/amdxdna/npu4_regs.c @@ -98,7 +98,7 @@ const struct aie2_fw_feature_tbl npu4_fw_feature_table[] = { }; static const struct amdxdna_dev_priv npu4_dev_priv = { - .fw_path = "amdnpu/17f0_10/npu.sbin", + .fw_path = "amdnpu/17f0_10/", .rt_config = npu4_default_rt_cfg, .dpm_clk_tbl = npu4_dpm_clk_table, .fw_feature_tbl = npu4_fw_feature_table, diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5_regs.c index c0a35cfd886c..c0ac5daf32ee 100644 --- a/drivers/accel/amdxdna/npu5_regs.c +++ b/drivers/accel/amdxdna/npu5_regs.c @@ -63,7 +63,7 @@ #define NPU5_SRAM_BAR_BASE MMNPU_APERTURE1_BASE static const struct amdxdna_dev_priv npu5_dev_priv = { - .fw_path = "amdnpu/17f0_11/npu.sbin", + .fw_path = "amdnpu/17f0_11/", .rt_config = npu4_default_rt_cfg, .dpm_clk_tbl = npu4_dpm_clk_table, .fw_feature_tbl = npu4_fw_feature_table, diff --git a/drivers/accel/amdxdna/npu6_regs.c b/drivers/accel/amdxdna/npu6_regs.c index 1fb07df99186..ce591ed0d483 100644 --- a/drivers/accel/amdxdna/npu6_regs.c +++ b/drivers/accel/amdxdna/npu6_regs.c @@ -63,7 +63,7 @@ #define NPU6_SRAM_BAR_BASE MMNPU_APERTURE1_BASE static const struct amdxdna_dev_priv npu6_dev_priv = { - .fw_path = "amdnpu/17f0_10/npu.sbin", + .fw_path = "amdnpu/17f0_10/", .rt_config = npu4_default_rt_cfg, .dpm_clk_tbl = npu4_dpm_clk_table, .fw_feature_tbl = npu4_fw_feature_table, From 49abfa812617a7f2d0132c70d23ac98b389c6ec1 Mon Sep 17 00:00:00 2001 From: Tvrtko Ursulin Date: Mon, 23 Feb 2026 12:41:30 +0000 Subject: [PATCH 282/359] drm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop reference to syncobj and timeline fence when aborting the ioctl due output array being too small. Reviewed-by: Alex Deucher Signed-off-by: Tvrtko Ursulin Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Alex Deucher Signed-off-by: Alex Deucher (cherry picked from commit 68951e9c3e6bb22396bc42ef2359751c8315dd27) Cc: # v6.16+ --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index 8013260e29dc..9b9947b94b89 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -876,6 +876,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, dma_fence_unwrap_for_each(f, &iter, fence) { if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) { r = -EINVAL; + dma_fence_put(fence); goto free_fences; } @@ -900,6 +901,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) { r = -EINVAL; + dma_fence_put(fence); goto free_fences; } From 7b7d7693a55d606d700beb9549c9f7f0e5d9c24f Mon Sep 17 00:00:00 2001 From: Tvrtko Ursulin Date: Mon, 23 Feb 2026 12:41:31 +0000 Subject: [PATCH 283/359] drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Userspace can either deliberately pass in the too small num_fences, or the required number can legitimately grow between the two calls to the userq wait ioctl. In both cases we do not want the emit the kernel warning backtrace since nothing is wrong with the kernel and userspace will simply get an errno reported back. So lets simply drop the WARN_ONs. Reviewed-by: Alex Deucher Signed-off-by: Tvrtko Ursulin Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Alex Deucher Reviewed-by: Christian König Signed-off-by: Alex Deucher (cherry picked from commit 2c333ea579de6cc20ea7bc50e9595ef72863e65c) --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index 9b9947b94b89..d972dc46f5a8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -833,7 +833,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, dma_resv_for_each_fence(&resv_cursor, gobj_read[i]->resv, DMA_RESV_USAGE_READ, fence) { - if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) { + if (num_fences >= wait_info->num_fences) { r = -EINVAL; goto free_fences; } @@ -850,7 +850,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, dma_resv_for_each_fence(&resv_cursor, gobj_write[i]->resv, DMA_RESV_USAGE_WRITE, fence) { - if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) { + if (num_fences >= wait_info->num_fences) { r = -EINVAL; goto free_fences; } @@ -874,7 +874,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, goto free_fences; dma_fence_unwrap_for_each(f, &iter, fence) { - if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) { + if (num_fences >= wait_info->num_fences) { r = -EINVAL; dma_fence_put(fence); goto free_fences; @@ -899,7 +899,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, if (r) goto free_fences; - if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) { + if (num_fences >= wait_info->num_fences) { r = -EINVAL; dma_fence_put(fence); goto free_fences; From ea78f8c68f4f6211c557df49174c54d167821962 Mon Sep 17 00:00:00 2001 From: Sunil Khatri Date: Fri, 20 Feb 2026 13:47:58 +0530 Subject: [PATCH 284/359] drm/amdgpu: add upper bound check on user inputs in signal ioctl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Huge input values in amdgpu_userq_signal_ioctl can lead to a OOM and could be exploited. So check these input value against AMDGPU_USERQ_MAX_HANDLES which is big enough value for genuine use cases and could potentially avoid OOM. Signed-off-by: Sunil Khatri Reviewed-by: Christian König Signed-off-by: Alex Deucher (cherry picked from commit be267e15f99bc97cbe202cd556717797cdcf79a5) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index d972dc46f5a8..c5f5af20af75 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -35,6 +35,8 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops; static struct kmem_cache *amdgpu_userq_fence_slab; +#define AMDGPU_USERQ_MAX_HANDLES (1U << 16) + int amdgpu_userq_fence_slab_init(void) { amdgpu_userq_fence_slab = kmem_cache_create("amdgpu_userq_fence", @@ -478,6 +480,11 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data, if (!amdgpu_userq_enabled(dev)) return -ENOTSUPP; + if (args->num_syncobj_handles > AMDGPU_USERQ_MAX_HANDLES || + args->num_bo_write_handles > AMDGPU_USERQ_MAX_HANDLES || + args->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES) + return -EINVAL; + num_syncobj_handles = args->num_syncobj_handles; syncobj_handles = memdup_user(u64_to_user_ptr(args->syncobj_handles), size_mul(sizeof(u32), num_syncobj_handles)); From 64ac7c09fc44985ec9bb6a9db740899fa40ca613 Mon Sep 17 00:00:00 2001 From: Sunil Khatri Date: Tue, 24 Feb 2026 12:13:09 +0530 Subject: [PATCH 285/359] drm/amdgpu: add upper bound check on user inputs in wait ioctl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Huge input values in amdgpu_userq_wait_ioctl can lead to a OOM and could be exploited. So check these input value against AMDGPU_USERQ_MAX_HANDLES which is big enough value for genuine use cases and could potentially avoid OOM. v2: squash in Srini's fix Signed-off-by: Sunil Khatri Reviewed-by: Christian König Signed-off-by: Alex Deucher (cherry picked from commit fcec012c664247531aed3e662f4280ff804d1476) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index c5f5af20af75..7e9cf1868cc9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -671,6 +671,11 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data, if (!amdgpu_userq_enabled(dev)) return -ENOTSUPP; + if (wait_info->num_syncobj_handles > AMDGPU_USERQ_MAX_HANDLES || + wait_info->num_bo_write_handles > AMDGPU_USERQ_MAX_HANDLES || + wait_info->num_bo_read_handles > AMDGPU_USERQ_MAX_HANDLES) + return -EINVAL; + num_read_bo_handles = wait_info->num_bo_read_handles; bo_handles_read = memdup_user(u64_to_user_ptr(wait_info->bo_read_handles), size_mul(sizeof(u32), num_read_bo_handles)); From 28dfe4317541e57fe52f9a290394cd29c348228b Mon Sep 17 00:00:00 2001 From: Natalie Vock Date: Mon, 23 Feb 2026 12:45:37 +0100 Subject: [PATCH 286/359] drm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sink This can be called while preemption is disabled, for example by dcn32_internal_validate_bw which is called with the FPU active. Fixes "BUG: scheduling while atomic" messages I encounter on my Navi31 machine. Signed-off-by: Natalie Vock Signed-off-by: Alex Deucher (cherry picked from commit b42dae2ebc5c84a68de63ec4ffdfec49362d53f1) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c index 246893d80f1f..baf820e6eae8 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c @@ -170,11 +170,11 @@ struct dc_stream_state *dc_create_stream_for_sink( if (sink == NULL) goto fail; - stream = kzalloc_obj(struct dc_stream_state); + stream = kzalloc_obj(struct dc_stream_state, GFP_ATOMIC); if (stream == NULL) goto fail; - stream->update_scratch = kzalloc((int32_t) dc_update_scratch_space_size(), GFP_KERNEL); + stream->update_scratch = kzalloc((int32_t) dc_update_scratch_space_size(), GFP_ATOMIC); if (stream->update_scratch == NULL) goto fail; From 5e0bcc7b88bcd081aaae6f481b10d9ab294fcb69 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Mon, 23 Feb 2026 14:00:07 -0800 Subject: [PATCH 287/359] drm/amdgpu: Unlock a mutex before destroying it MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mutexes must be unlocked before these are destroyed. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher Cc: Christian König Cc: Yang Wang Cc: Hawking Zhang Cc: amd-gfx@lists.freedesktop.org Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework") Reviewed-by: Yang Wang Acked-by: Christian König Signed-off-by: Bart Van Assche Signed-off-by: Alex Deucher (cherry picked from commit 270258ba320beb99648dceffb67e86ac76786e55) --- drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c index afe5ca81beec..db7858fe0c3d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c @@ -641,6 +641,7 @@ static void aca_error_fini(struct aca_error *aerr) aca_bank_error_remove(aerr, bank_error); out_unlock: + mutex_unlock(&aerr->lock); mutex_destroy(&aerr->lock); } From 480ad5f6ead4a47b969aab6618573cd6822bb6a4 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Mon, 23 Feb 2026 13:50:23 -0800 Subject: [PATCH 288/359] drm/amdgpu: Fix locking bugs in error paths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Do not unlock psp->ras_context.mutex if it has not been locked. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher Cc: Christian König Cc: YiPeng Chai Cc: Hawking Zhang Cc: amd-gfx@lists.freedesktop.org Fixes: b3fb79cda568 ("drm/amdgpu: add mutex to protect ras shared memory") Acked-by: Christian König Signed-off-by: Bart Van Assche Signed-off-by: Alex Deucher (cherry picked from commit 6fa01b4335978051d2cd80841728fd63cc597970) --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c index 6e8aad91bcd3..0d3c18f04ac3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c @@ -332,13 +332,13 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size if (!context || !context->initialized) { dev_err(adev->dev, "TA is not initialized\n"); ret = -EINVAL; - goto err_free_shared_buf; + goto free_shared_buf; } if (!psp->ta_funcs || !psp->ta_funcs->fn_ta_invoke) { dev_err(adev->dev, "Unsupported function to invoke TA\n"); ret = -EOPNOTSUPP; - goto err_free_shared_buf; + goto free_shared_buf; } context->session_id = ta_id; @@ -346,7 +346,7 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size mutex_lock(&psp->ras_context.mutex); ret = prep_ta_mem_context(&context->mem_context, shared_buf, shared_buf_len); if (ret) - goto err_free_shared_buf; + goto unlock; ret = psp_fn_ta_invoke(psp, cmd_id); if (ret || context->resp_status) { @@ -354,15 +354,17 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size ret, context->resp_status); if (!ret) { ret = -EINVAL; - goto err_free_shared_buf; + goto unlock; } } if (copy_to_user((char *)&buf[copy_pos], context->mem_context.shared_buf, shared_buf_len)) ret = -EFAULT; -err_free_shared_buf: +unlock: mutex_unlock(&psp->ras_context.mutex); + +free_shared_buf: kfree(shared_buf); return ret; From a5fe1a54513196e4bc8f9170006057dc31e7155e Mon Sep 17 00:00:00 2001 From: sguttula Date: Sat, 21 Feb 2026 10:03:32 +0530 Subject: [PATCH 289/359] drm/amdgpu/vcn5: Add SMU dpm interface type This will set AMDGPU_VCN_SMU_DPM_INTERFACE_* smu_type based on soc type and fixing ring timeout issue seen for DPM enabled case. Signed-off-by: sguttula Reviewed-by: Pratik Vishwakarma Signed-off-by: Alex Deucher (cherry picked from commit f0f23c315b38c55e8ce9484cf59b65811f350630) --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index 0202df5db1e1..6109124f852e 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -174,6 +174,10 @@ static int vcn_v5_0_0_sw_init(struct amdgpu_ip_block *ip_block) fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE); fw_shared->sq.is_enabled = 1; + fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG); + fw_shared->smu_dpm_interface.smu_interface_type = (adev->flags & AMD_IS_APU) ? + AMDGPU_VCN_SMU_DPM_INTERFACE_APU : AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU; + if (amdgpu_vcnfw_log) amdgpu_vcn_fwlog_init(&adev->vcn.inst[i]); From b57c4ec98c17789136a4db948aec6daadceb5024 Mon Sep 17 00:00:00 2001 From: Lijo Lazar Date: Tue, 24 Feb 2026 10:18:51 +0530 Subject: [PATCH 290/359] drm/amdgpu: Fix error handling in slot reset If the device has not recovered after slot reset is called, it goes to out label for error handling. There it could make decision based on uninitialized hive pointer and could result in accessing an uninitialized list. Initialize the list and hive properly so that it handles the error situation and also releases the reset domain lock which is acquired during error_detected callback. Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset") Signed-off-by: Lijo Lazar Reviewed-by: Ce Sun Signed-off-by: Alex Deucher (cherry picked from commit bb71362182e59caa227e4192da5a612b09349696) --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d9789e0b5201..3e19b51a2763 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -7059,6 +7059,15 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) dev_info(adev->dev, "PCI error: slot reset callback!!\n"); memset(&reset_context, 0, sizeof(reset_context)); + INIT_LIST_HEAD(&device_list); + hive = amdgpu_get_xgmi_hive(adev); + if (hive) { + mutex_lock(&hive->hive_lock); + list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) + list_add_tail(&tmp_adev->reset_list, &device_list); + } else { + list_add_tail(&adev->reset_list, &device_list); + } if (adev->pcie_reset_ctx.swus) link_dev = adev->pcie_reset_ctx.swus; @@ -7099,19 +7108,13 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev *pdev) reset_context.reset_req_dev = adev; set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags); set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags); - INIT_LIST_HEAD(&device_list); - hive = amdgpu_get_xgmi_hive(adev); if (hive) { - mutex_lock(&hive->hive_lock); reset_context.hive = hive; - list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) { + list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) tmp_adev->pcie_reset_ctx.in_link_reset = true; - list_add_tail(&tmp_adev->reset_list, &device_list); - } } else { set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags); - list_add_tail(&adev->reset_list, &device_list); } r = amdgpu_device_asic_reset(adev, &device_list, &reset_context); From 6b0d812971370c64b837a2db4275410f478272fe Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Wed, 25 Feb 2026 10:51:16 -0600 Subject: [PATCH 291/359] drm/amd: Disable MES LR compute W/A A workaround was introduced in commit 1fb710793ce2 ("drm/amdgpu: Enable MES lr_compute_wa by default") to help with some hangs observed in gfx1151. This WA didn't fully fix the issue. It was actually fixed by adjusting the VGPR size to the correct value that matched the hardware in commit b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151"). There are reports of instability on other products with newer GC microcode versions, and I believe they're caused by this workaround. As we don't need the workaround any more, remove it. Fixes: b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151") Acked-by: Alex Deucher Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher (cherry picked from commit 9973e64bd6ee7642860a6f3b6958cbf14e89cabd) Cc: stable@vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 5 ----- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 5 ----- 2 files changed, 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index 09ebb13ca5e8..a926a330700e 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -720,11 +720,6 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes) mes_set_hw_res_pkt.enable_reg_active_poll = 1; mes_set_hw_res_pkt.enable_level_process_quantum_check = 1; mes_set_hw_res_pkt.oversubscription_timer = 50; - if ((mes->adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 0x7f) - mes_set_hw_res_pkt.enable_lr_compute_wa = 1; - else - dev_info_once(mes->adev->dev, - "MES FW version must be >= 0x7f to enable LR compute workaround.\n"); if (amdgpu_mes_log_enable) { mes_set_hw_res_pkt.enable_mes_event_int_logging = 1; diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index b1c864dc79a8..5bfa5d1d0b36 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c @@ -779,11 +779,6 @@ static int mes_v12_0_set_hw_resources(struct amdgpu_mes *mes, int pipe) mes_set_hw_res_pkt.use_different_vmid_compute = 1; mes_set_hw_res_pkt.enable_reg_active_poll = 1; mes_set_hw_res_pkt.enable_level_process_quantum_check = 1; - if ((mes->adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 0x82) - mes_set_hw_res_pkt.enable_lr_compute_wa = 1; - else - dev_info_once(adev->dev, - "MES FW version must be >= 0x82 to enable LR compute workaround.\n"); /* * Keep oversubscribe timer for sdma . When we have unmapped doorbell From 12133a483dfa832241fbbf09321109a0ea8a520e Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 23 Feb 2026 12:28:30 +0100 Subject: [PATCH 292/359] nfc: pn533: properly drop the usb interface reference on disconnect When the device is disconnected from the driver, there is a "dangling" reference count on the usb interface that was grabbed in the probe callback. Fix this up by properly dropping the reference after we are done with it. Cc: stable Signed-off-by: Greg Kroah-Hartman Reviewed-by: Simon Horman Fixes: c46ee38620a2 ("NFC: pn533: add NXP pn533 nfc device driver") Link: https://patch.msgid.link/2026022329-flashing-ought-7573@gregkh Signed-off-by: Jakub Kicinski --- drivers/nfc/pn533/usb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nfc/pn533/usb.c b/drivers/nfc/pn533/usb.c index 018a80674f06..0f12f86ebb02 100644 --- a/drivers/nfc/pn533/usb.c +++ b/drivers/nfc/pn533/usb.c @@ -628,6 +628,7 @@ static void pn533_usb_disconnect(struct usb_interface *interface) usb_free_urb(phy->out_urb); usb_free_urb(phy->ack_urb); kfree(phy->ack_buffer); + usb_put_dev(phy->udev); nfc_info(&interface->dev, "NXP PN533 NFC device disconnected\n"); } From 11de1d3ae5565ed22ef1f89d73d8f2d00322c699 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 23 Feb 2026 13:58:48 +0100 Subject: [PATCH 293/359] net: usb: pegasus: validate USB endpoints The pegasus driver should validate that the device it is probing has the proper number and types of USB endpoints it is expecting before it binds to it. If a malicious device were to not have the same urbs the driver will crash later on when it blindly accesses these endpoints. Cc: Petko Manolov Cc: stable Signed-off-by: Greg Kroah-Hartman Link: https://patch.msgid.link/2026022347-legibly-attest-cc5c@gregkh Signed-off-by: Jakub Kicinski --- drivers/net/usb/pegasus.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c index b84968c5f074..c497202b78e8 100644 --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -812,8 +812,19 @@ static void unlink_all_urbs(pegasus_t *pegasus) static int alloc_urbs(pegasus_t *pegasus) { + static const u8 bulk_ep_addr[] = { + 1 | USB_DIR_IN, + 2 | USB_DIR_OUT, + 0}; + static const u8 int_ep_addr[] = { + 3 | USB_DIR_IN, + 0}; int res = -ENOMEM; + if (!usb_check_bulk_endpoints(pegasus->intf, bulk_ep_addr) || + !usb_check_int_endpoints(pegasus->intf, int_ep_addr)) + return -ENODEV; + pegasus->rx_urb = usb_alloc_urb(0, GFP_KERNEL); if (!pegasus->rx_urb) { return res; @@ -1168,6 +1179,7 @@ static int pegasus_probe(struct usb_interface *intf, pegasus = netdev_priv(net); pegasus->dev_index = dev_index; + pegasus->intf = intf; res = alloc_urbs(pegasus); if (res < 0) { @@ -1179,7 +1191,6 @@ static int pegasus_probe(struct usb_interface *intf, INIT_DELAYED_WORK(&pegasus->carrier_check, check_carrier); - pegasus->intf = intf; pegasus->usb = dev; pegasus->net = net; From c58b6c29a4c9b8125e8ad3bca0637e00b71e2693 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 23 Feb 2026 13:59:26 +0100 Subject: [PATCH 294/359] net: usb: kalmia: validate USB endpoints The kalmia driver should validate that the device it is probing has the proper number and types of USB endpoints it is expecting before it binds to it. If a malicious device were to not have the same urbs the driver will crash later on when it blindly accesses these endpoints. Cc: stable Signed-off-by: Greg Kroah-Hartman Reviewed-by: Simon Horman Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730") Link: https://patch.msgid.link/2026022326-shack-headstone-ef6f@gregkh Signed-off-by: Jakub Kicinski --- drivers/net/usb/kalmia.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/usb/kalmia.c b/drivers/net/usb/kalmia.c index 613fc6910f14..ee9c48f7f68f 100644 --- a/drivers/net/usb/kalmia.c +++ b/drivers/net/usb/kalmia.c @@ -132,11 +132,18 @@ kalmia_bind(struct usbnet *dev, struct usb_interface *intf) { int status; u8 ethernet_addr[ETH_ALEN]; + static const u8 ep_addr[] = { + 1 | USB_DIR_IN, + 2 | USB_DIR_OUT, + 0}; /* Don't bind to AT command interface */ if (intf->cur_altsetting->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC) return -EINVAL; + if (!usb_check_bulk_endpoints(intf, ep_addr)) + return -ENODEV; + dev->in = usb_rcvbulkpipe(dev->udev, 0x81 & USB_ENDPOINT_NUMBER_MASK); dev->out = usb_sndbulkpipe(dev->udev, 0x02 & USB_ENDPOINT_NUMBER_MASK); dev->status = NULL; From 4b063c002ca759d1b299988ee23f564c9609c875 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 23 Feb 2026 14:00:06 +0100 Subject: [PATCH 295/359] net: usb: kaweth: validate USB endpoints The kaweth driver should validate that the device it is probing has the proper number and types of USB endpoints it is expecting before it binds to it. If a malicious device were to not have the same urbs the driver will crash later on when it blindly accesses these endpoints. Cc: stable Signed-off-by: Greg Kroah-Hartman Reviewed-by: Simon Horman Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://patch.msgid.link/2026022305-substance-virtual-c728@gregkh Signed-off-by: Jakub Kicinski --- drivers/net/usb/kaweth.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c index e01d14f6c366..cb2472b59e10 100644 --- a/drivers/net/usb/kaweth.c +++ b/drivers/net/usb/kaweth.c @@ -883,6 +883,13 @@ static int kaweth_probe( const eth_addr_t bcast_addr = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF }; int result = 0; int rv = -EIO; + static const u8 bulk_ep_addr[] = { + 1 | USB_DIR_IN, + 2 | USB_DIR_OUT, + 0}; + static const u8 int_ep_addr[] = { + 3 | USB_DIR_IN, + 0}; dev_dbg(dev, "Kawasaki Device Probe (Device number:%d): 0x%4.4x:0x%4.4x:0x%4.4x\n", @@ -896,6 +903,12 @@ static int kaweth_probe( (int)udev->descriptor.bLength, (int)udev->descriptor.bDescriptorType); + if (!usb_check_bulk_endpoints(intf, bulk_ep_addr) || + !usb_check_int_endpoints(intf, int_ep_addr)) { + dev_err(dev, "couldn't find required endpoints\n"); + return -ENODEV; + } + netdev = alloc_etherdev(sizeof(*kaweth)); if (!netdev) return -ENOMEM; From 5cc619583c7e735c4fb801bede671fb6f9c79425 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 23 Feb 2026 18:32:18 +0100 Subject: [PATCH 296/359] vsock: Use container_of() to get net namespace in sysctl handlers current->nsproxy is should not be accessed directly as syzbot has found that it could be NULL at times, causing crashes. Fix up the af_vsock sysctl handlers to use container_of() to deal with the current net namespace instead of attempting to rely on current. This is the same type of change done in commit 7f5611cbc487 ("rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy") Signed-off-by: Greg Kroah-Hartman Reviewed-by: Bobby Eshleman Reviewed-by: Stefano Garzarella Fixes: eafb64f40ca4 ("vsock: add netns to vsock core") Link: https://patch.msgid.link/2026022318-rearview-gallery-ae13@gregkh Signed-off-by: Jakub Kicinski --- net/vmw_vsock/af_vsock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 9880756d9eff..f4062c6a1944 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -2825,7 +2825,7 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write, if (write) return -EPERM; - net = current->nsproxy->net_ns; + net = container_of(table->data, struct net, vsock.mode); return __vsock_net_mode_string(table, write, buffer, lenp, ppos, vsock_net_mode(net), NULL); @@ -2838,7 +2838,7 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write, struct net *net; int ret; - net = current->nsproxy->net_ns; + net = container_of(table->data, struct net, vsock.child_ns_mode); ret = __vsock_net_mode_string(table, write, buffer, lenp, ppos, vsock_net_child_mode(net), &new_mode); From 1e3bb184e94125bae7c1703472109a646d0f79d9 Mon Sep 17 00:00:00 2001 From: Simon Baatz Date: Tue, 24 Feb 2026 09:20:12 +0100 Subject: [PATCH 297/359] tcp: re-enable acceptance of FIN packets when RWIN is 0 Commit 2bd99aef1b19 ("tcp: accept bare FIN packets under memory pressure") allowed accepting FIN packets in tcp_data_queue() even when the receive window was closed, to prevent ACK/FIN loops with broken clients. Such a FIN packet is in sequence, but because the FIN consumes a sequence number, it extends beyond the window. Before commit 9ca48d616ed7 ("tcp: do not accept packets beyond window"), tcp_sequence() only required the seq to be within the window. After that change, the entire packet (including the FIN) must fit within the window. As a result, such FIN packets are now dropped and the handling path is no longer reached. Be more lenient by not counting the sequence number consumed by the FIN when calling tcp_sequence(), restoring the previous behavior for cases where only the FIN extends beyond the window. Fixes: 9ca48d616ed7 ("tcp: do not accept packets beyond window") Signed-off-by: Simon Baatz Reviewed-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-1-a16677ea7cea@gmail.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_input.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e7b41abb82aa..1c6b8ca67918 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4858,15 +4858,24 @@ static enum skb_drop_reason tcp_disordered_ack_check(const struct sock *sk, */ static enum skb_drop_reason tcp_sequence(const struct sock *sk, - u32 seq, u32 end_seq) + u32 seq, u32 end_seq, + const struct tcphdr *th) { const struct tcp_sock *tp = tcp_sk(sk); + u32 seq_limit; if (before(end_seq, tp->rcv_wup)) return SKB_DROP_REASON_TCP_OLD_SEQUENCE; - if (after(end_seq, tp->rcv_nxt + tcp_receive_window(tp))) { - if (after(seq, tp->rcv_nxt + tcp_receive_window(tp))) + seq_limit = tp->rcv_nxt + tcp_receive_window(tp); + if (unlikely(after(end_seq, seq_limit))) { + /* Some stacks are known to handle FIN incorrectly; allow the + * FIN to extend beyond the window and check it in detail later. + */ + if (!after(end_seq - th->fin, seq_limit)) + return SKB_NOT_DROPPED_YET; + + if (after(seq, seq_limit)) return SKB_DROP_REASON_TCP_INVALID_SEQUENCE; /* Only accept this packet if receive queue is empty. */ @@ -6379,7 +6388,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb, step1: /* Step 1: check sequence number */ - reason = tcp_sequence(sk, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq); + reason = tcp_sequence(sk, TCP_SKB_CB(skb)->seq, + TCP_SKB_CB(skb)->end_seq, th); if (reason) { /* RFC793, page 37: "In all states except SYN-SENT, all reset * (RST) segments are validated by checking their SEQ-fields." From 74511332309844c3de64970841b3c250d85f34ce Mon Sep 17 00:00:00 2001 From: Simon Baatz Date: Tue, 24 Feb 2026 09:20:13 +0100 Subject: [PATCH 298/359] selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 Add a packetdrill test that verifies we accept bare FIN packets when the advertised receive window is zero. Signed-off-by: Simon Baatz Reviewed-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-2-a16677ea7cea@gmail.com Signed-off-by: Jakub Kicinski --- .../net/packetdrill/tcp_rcv_zero_wnd_fin.pkt | 27 +++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_zero_wnd_fin.pkt diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_zero_wnd_fin.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_zero_wnd_fin.pkt new file mode 100644 index 000000000000..e245359a1a91 --- /dev/null +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_zero_wnd_fin.pkt @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Some TCP stacks send FINs even though the window is closed. We break +// a possible FIN/ACK loop by accepting the FIN. + +--mss=1000 + +`./defaults.sh` + +// Establish a connection. + +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 + +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 + +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0 + +0 bind(3, ..., ...) = 0 + +0 listen(3, 1) = 0 + + +0 < S 0:0(0) win 32792 + +0 > S. 0:0(0) ack 1 + +0 < . 1:1(0) ack 1 win 257 + + +0 accept(3, ..., ...) = 4 + + +0 < P. 1:60001(60000) ack 1 win 257 + * > . 1:1(0) ack 60001 win 0 + + +0 < F. 60001:60001(0) ack 1 win 257 + +0 > . 1:1(0) ack 60002 win 0 From 676c7af91fcd740d34e7cb788cbc58e3bcafde39 Mon Sep 17 00:00:00 2001 From: Felix Gu Date: Tue, 24 Feb 2026 19:04:04 +0800 Subject: [PATCH 299/359] dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() The devm_add_action_or_reset() function already executes the cleanup action on failure before returning an error, so the explicit goto error and subsequent zl3073x_dev_dpll_fini() call causes double cleanup. Fixes: ebb1031c5137 ("dpll: zl3073x: Refactor DPLL initialization") Reviewed-by: Ivan Vecera Signed-off-by: Felix Gu Link: https://patch.msgid.link/20260224-dpll-v2-1-d7786414a830@gmail.com Signed-off-by: Jakub Kicinski --- drivers/dpll/zl3073x/core.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/dpll/zl3073x/core.c b/drivers/dpll/zl3073x/core.c index 8c5ee28e02f4..37f3c33570ee 100644 --- a/drivers/dpll/zl3073x/core.c +++ b/drivers/dpll/zl3073x/core.c @@ -981,11 +981,7 @@ zl3073x_devm_dpll_init(struct zl3073x_dev *zldev, u8 num_dplls) } /* Add devres action to release DPLL related resources */ - rc = devm_add_action_or_reset(zldev->dev, zl3073x_dev_dpll_fini, zldev); - if (rc) - goto error; - - return 0; + return devm_add_action_or_reset(zldev->dev, zl3073x_dev_dpll_fini, zldev); error: zl3073x_dev_dpll_fini(zldev); From 916864e5eda0c52572b11b4cacce41b89622cc35 Mon Sep 17 00:00:00 2001 From: Mohd Ayaan Anwar Date: Tue, 24 Feb 2026 17:58:47 +0530 Subject: [PATCH 300/359] MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER Replace Vinod Koul with Mohd Ayaan Anwar as the maintainer of the QUALCOMM ETHQOS ETHERNET DRIVER. Vinod confirmed he is no longer active in this area and agreed to be removed. Acked-by: Vinod Koul Suggested-by: Russell King (Oracle) Signed-off-by: Mohd Ayaan Anwar Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/20260224-qcom_ethqos_maintainer-v1-1-24e02701ea52@oss.qualcomm.com Signed-off-by: Jakub Kicinski --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index cf0313ca6039..a461baa96483 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21692,7 +21692,7 @@ S: Maintained F: drivers/net/ethernet/qualcomm/emac/ QUALCOMM ETHQOS ETHERNET DRIVER -M: Vinod Koul +M: Mohd Ayaan Anwar L: netdev@vger.kernel.org L: linux-arm-msm@vger.kernel.org S: Maintained From f975a0955276579e2176a134366ed586071c7c6a Mon Sep 17 00:00:00 2001 From: Dipayaan Roy Date: Tue, 24 Feb 2026 04:38:36 -0800 Subject: [PATCH 301/359] net: mana: Fix double destroy_workqueue on service rescan PCI path While testing corner cases in the driver, a use-after-free crash was found on the service rescan PCI path. When mana_serv_reset() calls mana_gd_suspend(), mana_gd_cleanup() destroys gc->service_wq. If the subsequent mana_gd_resume() fails with -ETIMEDOUT or -EPROTO, the code falls through to mana_serv_rescan() which triggers pci_stop_and_remove_bus_device(). This invokes the PCI .remove callback (mana_gd_remove), which calls mana_gd_cleanup() a second time, attempting to destroy the already- freed workqueue. Fix this by NULL-checking gc->service_wq in mana_gd_cleanup() and setting it to NULL after destruction. Call stack of issue for reference: [Sat Feb 21 18:53:48 2026] Call Trace: [Sat Feb 21 18:53:48 2026] [Sat Feb 21 18:53:48 2026] mana_gd_cleanup+0x33/0x70 [mana] [Sat Feb 21 18:53:48 2026] mana_gd_remove+0x3a/0xc0 [mana] [Sat Feb 21 18:53:48 2026] pci_device_remove+0x41/0xb0 [Sat Feb 21 18:53:48 2026] device_remove+0x46/0x70 [Sat Feb 21 18:53:48 2026] device_release_driver_internal+0x1e3/0x250 [Sat Feb 21 18:53:48 2026] device_release_driver+0x12/0x20 [Sat Feb 21 18:53:48 2026] pci_stop_bus_device+0x6a/0x90 [Sat Feb 21 18:53:48 2026] pci_stop_and_remove_bus_device+0x13/0x30 [Sat Feb 21 18:53:48 2026] mana_do_service+0x180/0x290 [mana] [Sat Feb 21 18:53:48 2026] mana_serv_func+0x24/0x50 [mana] [Sat Feb 21 18:53:48 2026] process_one_work+0x190/0x3d0 [Sat Feb 21 18:53:48 2026] worker_thread+0x16e/0x2e0 [Sat Feb 21 18:53:48 2026] kthread+0xf7/0x130 [Sat Feb 21 18:53:48 2026] ? __pfx_worker_thread+0x10/0x10 [Sat Feb 21 18:53:48 2026] ? __pfx_kthread+0x10/0x10 [Sat Feb 21 18:53:48 2026] ret_from_fork+0x269/0x350 [Sat Feb 21 18:53:48 2026] ? __pfx_kthread+0x10/0x10 [Sat Feb 21 18:53:48 2026] ret_from_fork_asm+0x1a/0x30 [Sat Feb 21 18:53:48 2026] Fixes: 505cc26bcae0 ("net: mana: Add support for auxiliary device servicing events") Reviewed-by: Haiyang Zhang Signed-off-by: Dipayaan Roy Reviewed-by: Simon Horman Link: https://patch.msgid.link/aZ2bzL64NagfyHpg@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/microsoft/mana/gdma_main.c | 5 ++++- drivers/net/ethernet/microsoft/mana/mana_en.c | 4 +++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index 0055c231acf6..3926d18f1840 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -1946,7 +1946,10 @@ static void mana_gd_cleanup(struct pci_dev *pdev) mana_gd_remove_irqs(pdev); - destroy_workqueue(gc->service_wq); + if (gc->service_wq) { + destroy_workqueue(gc->service_wq); + gc->service_wq = NULL; + } dev_dbg(&pdev->dev, "mana gdma cleanup successful\n"); } diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index 9b5a72ada5c4..f69e42651359 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -3762,7 +3762,9 @@ void mana_rdma_remove(struct gdma_dev *gd) } WRITE_ONCE(gd->rdma_teardown, true); - flush_workqueue(gc->service_wq); + + if (gc->service_wq) + flush_workqueue(gc->service_wq); if (gd->adev) remove_adev(gd); From bb4c698633c0e19717586a6524a33196cff01a32 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Tue, 24 Feb 2026 14:57:08 +0200 Subject: [PATCH 302/359] team: avoid NETDEV_CHANGEMTU event when unregistering slave syzbot is reporting unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 3 ref_tracker: netdev@ffff88807dcf8618 has 1/2 users at __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline] netdev_hold include/linux/netdevice.h:4429 [inline] inetdev_init+0x201/0x4e0 net/ipv4/devinet.c:286 inetdev_event+0x251/0x1610 net/ipv4/devinet.c:1600 notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85 call_netdevice_notifiers_mtu net/core/dev.c:2318 [inline] netif_set_mtu_ext+0x5aa/0x800 net/core/dev.c:9886 netif_set_mtu+0xd7/0x1b0 net/core/dev.c:9907 dev_set_mtu+0x126/0x260 net/core/dev_api.c:248 team_port_del+0xb07/0xcb0 drivers/net/team/team_core.c:1333 team_del_slave drivers/net/team/team_core.c:1936 [inline] team_device_event+0x207/0x5b0 drivers/net/team/team_core.c:2929 notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85 call_netdevice_notifiers_extack net/core/dev.c:2281 [inline] call_netdevice_notifiers net/core/dev.c:2295 [inline] __dev_change_net_namespace+0xcb7/0x2050 net/core/dev.c:12592 do_setlink+0x2ce/0x4590 net/core/rtnetlink.c:3060 rtnl_changelink net/core/rtnetlink.c:3776 [inline] __rtnl_newlink net/core/rtnetlink.c:3935 [inline] rtnl_newlink+0x15a9/0x1be0 net/core/rtnetlink.c:4072 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 problem. Ido Schimmel found steps to reproduce ip link add name team1 type team ip link add name dummy1 mtu 1499 master team1 type dummy ip netns add ns1 ip link set dev dummy1 netns ns1 ip -n ns1 link del dev dummy1 and also found that the same issue was fixed in the bond driver in commit f51048c3e07b ("bonding: avoid NETDEV_CHANGEMTU event when unregistering slave"). Let's do similar thing for the team driver, with commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") and commit 303a8487a657 ("net: s/__dev_set_mtu/__netif_set_mtu/") also applied. Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Suggested-by: Ido Schimmel Reviewed-by: Jiri Pirko Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Tetsuo Handa Signed-off-by: Ido Schimmel Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20260224125709.317574-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/team/team_core.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index c08a5c1bd6e4..a0fe998cc055 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -1292,7 +1292,7 @@ err_set_mtu: static void __team_port_change_port_removed(struct team_port *port); -static int team_port_del(struct team *team, struct net_device *port_dev) +static int team_port_del(struct team *team, struct net_device *port_dev, bool unregister) { struct net_device *dev = team->dev; struct team_port *port; @@ -1330,7 +1330,13 @@ static int team_port_del(struct team *team, struct net_device *port_dev) __team_port_change_port_removed(port); team_port_set_orig_dev_addr(port); - dev_set_mtu(port_dev, port->orig.mtu); + if (unregister) { + netdev_lock_ops(port_dev); + __netif_set_mtu(port_dev, port->orig.mtu); + netdev_unlock_ops(port_dev); + } else { + dev_set_mtu(port_dev, port->orig.mtu); + } kfree_rcu(port, rcu); netdev_info(dev, "Port device %s removed\n", portname); netdev_compute_master_upper_features(team->dev, true); @@ -1634,7 +1640,7 @@ static void team_uninit(struct net_device *dev) ASSERT_RTNL(); list_for_each_entry_safe(port, tmp, &team->port_list, list) - team_port_del(team, port->dev); + team_port_del(team, port->dev, false); __team_change_mode(team, NULL); /* cleanup */ __team_options_unregister(team, team_options, ARRAY_SIZE(team_options)); @@ -1933,7 +1939,16 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev) ASSERT_RTNL(); - return team_port_del(team, port_dev); + return team_port_del(team, port_dev, false); +} + +static int team_del_slave_on_unregister(struct net_device *dev, struct net_device *port_dev) +{ + struct team *team = netdev_priv(dev); + + ASSERT_RTNL(); + + return team_port_del(team, port_dev, true); } static netdev_features_t team_fix_features(struct net_device *dev, @@ -2926,7 +2941,7 @@ static int team_device_event(struct notifier_block *unused, !!netif_oper_up(port->dev)); break; case NETDEV_UNREGISTER: - team_del_slave(port->team->dev, dev); + team_del_slave_on_unregister(port->team->dev, dev); break; case NETDEV_FEAT_CHANGE: if (!port->team->notifier_ctx) { @@ -2999,3 +3014,4 @@ MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Jiri Pirko "); MODULE_DESCRIPTION("Ethernet team device driver"); MODULE_ALIAS_RTNL_LINK(DRV_NAME); +MODULE_IMPORT_NS("NETDEV_INTERNAL"); From 58f8ef625e23a607f4a89758d6b328a2701f7354 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 24 Feb 2026 14:57:09 +0200 Subject: [PATCH 303/359] selftests: team: Add a reference count leak test Add a test for the issue that was fixed in "team: avoid NETDEV_CHANGEMTU event when unregistering slave". The test hangs due to a reference count leak without the fix: # make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests [...] TAP version 13 1..1 # timeout set to 45 # selftests: drivers/net/team: refleak.sh [ 50.681299][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3 [ 71.185325][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3 And passes with the fix: # make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests [...] TAP version 13 1..1 # timeout set to 45 # selftests: drivers/net/team: refleak.sh ok 1 selftests: drivers/net/team: refleak.sh Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20260224125709.317574-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski --- .../testing/selftests/drivers/net/team/Makefile | 1 + .../selftests/drivers/net/team/refleak.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/team/refleak.sh diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile index 1340b3df9c31..45a3e7ad3dcb 100644 --- a/tools/testing/selftests/drivers/net/team/Makefile +++ b/tools/testing/selftests/drivers/net/team/Makefile @@ -5,6 +5,7 @@ TEST_PROGS := \ dev_addr_lists.sh \ options.sh \ propagation.sh \ + refleak.sh \ # end of TEST_PROGS TEST_INCLUDES := \ diff --git a/tools/testing/selftests/drivers/net/team/refleak.sh b/tools/testing/selftests/drivers/net/team/refleak.sh new file mode 100755 index 000000000000..ef08213ab964 --- /dev/null +++ b/tools/testing/selftests/drivers/net/team/refleak.sh @@ -0,0 +1,17 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# shellcheck disable=SC2154 + +lib_dir=$(dirname "$0") +source "$lib_dir"/../../../net/lib.sh + +trap cleanup_all_ns EXIT + +# Test that there is no reference count leak and that dummy1 can be deleted. +# https://lore.kernel.org/netdev/4d69abe1-ca8d-4f0b-bcf8-13899b211e57@I-love.SAKURA.ne.jp/ +setup_ns ns1 ns2 +ip -n "$ns1" link add name team1 type team +ip -n "$ns1" link add name dummy1 mtu 1499 type dummy +ip -n "$ns1" link set dev dummy1 master team1 +ip -n "$ns1" link set dev dummy1 netns "$ns2" +ip -n "$ns2" link del dev dummy1 From 2700b7e603af39ca55fe9fc876ca123efd44680f Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 24 Feb 2026 13:46:48 +0200 Subject: [PATCH 304/359] net/mlx5: DR, Fix circular locking dependency in dump Fix a circular locking dependency between dbg_mutex and the domain rx/tx mutexes that could lead to a deadlock. The dump path in dr_dump_domain_all() was acquiring locks in the order: dbg_mutex -> rx.mutex -> tx.mutex While the table/matcher creation paths acquire locks in the order: rx.mutex -> tx.mutex -> dbg_mutex This inverted lock ordering creates a circular dependency. Fix this by changing dr_dump_domain_all() to acquire the domain lock before dbg_mutex, matching the order used in mlx5dr_table_create() and mlx5dr_matcher_create(). Lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.19.0-rc6net_next_e817c4e #1 Not tainted ------------------------------------------------------ sos/30721 is trying to acquire lock: ffff888102df5900 (&dmn->info.rx.mutex){+.+.}-{4:4}, at: dr_dump_start+0x131/0x450 [mlx5_core] but task is already holding lock: ffff888102df5bc0 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}, at: dr_dump_start+0x10b/0x450 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}: __mutex_lock+0x91/0x1060 mlx5dr_matcher_create+0x377/0x5e0 [mlx5_core] mlx5_cmd_dr_create_flow_group+0x62/0xd0 [mlx5_core] mlx5_create_flow_group+0x113/0x1c0 [mlx5_core] mlx5_chains_create_prio+0x453/0x2290 [mlx5_core] mlx5_chains_get_table+0x2e2/0x980 [mlx5_core] esw_chains_create+0x1e6/0x3b0 [mlx5_core] esw_create_offloads_fdb_tables.cold+0x62/0x63f [mlx5_core] esw_offloads_enable+0x76f/0xd20 [mlx5_core] mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core] devlink_nl_eswitch_set_doit+0x67/0xe0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x188/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1ed/0x2c0 netlink_sendmsg+0x210/0x450 __sock_sendmsg+0x38/0x60 __sys_sendto+0x119/0x180 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x70/0xd00 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (&dmn->info.tx.mutex){+.+.}-{4:4}: __mutex_lock+0x91/0x1060 mlx5dr_table_create+0x11d/0x530 [mlx5_core] mlx5_cmd_dr_create_flow_table+0x62/0x140 [mlx5_core] __mlx5_create_flow_table+0x46f/0x960 [mlx5_core] mlx5_create_flow_table+0x16/0x20 [mlx5_core] esw_create_offloads_fdb_tables+0x136/0x240 [mlx5_core] esw_offloads_enable+0x76f/0xd20 [mlx5_core] mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core] devlink_nl_eswitch_set_doit+0x67/0xe0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x188/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1ed/0x2c0 netlink_sendmsg+0x210/0x450 __sock_sendmsg+0x38/0x60 __sys_sendto+0x119/0x180 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x70/0xd00 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&dmn->info.rx.mutex){+.+.}-{4:4}: __lock_acquire+0x18b6/0x2eb0 lock_acquire+0xd3/0x2c0 __mutex_lock+0x91/0x1060 dr_dump_start+0x131/0x450 [mlx5_core] seq_read_iter+0xe3/0x410 seq_read+0xfb/0x130 full_proxy_read+0x53/0x80 vfs_read+0xba/0x330 ksys_read+0x65/0xe0 do_syscall_64+0x70/0xd00 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&dmn->dump_info.dbg_mutex); lock(&dmn->info.tx.mutex); lock(&dmn->dump_info.dbg_mutex); lock(&dmn->info.rx.mutex); *** DEADLOCK *** Fixes: 9222f0b27da2 ("net/mlx5: DR, Add support for dumping steering info") Signed-off-by: Shay Drory Reviewed-by: Yevgeny Kliteynik Reviewed-by: Alex Vesker Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260224114652.1787431-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_dbg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_dbg.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_dbg.c index 8803fa071c50..18362e9c3314 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_dbg.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_dbg.c @@ -1051,8 +1051,8 @@ static int dr_dump_domain_all(struct seq_file *file, struct mlx5dr_domain *dmn) struct mlx5dr_table *tbl; int ret; - mutex_lock(&dmn->dump_info.dbg_mutex); mlx5dr_domain_lock(dmn); + mutex_lock(&dmn->dump_info.dbg_mutex); ret = dr_dump_domain(file, dmn); if (ret < 0) @@ -1065,8 +1065,8 @@ static int dr_dump_domain_all(struct seq_file *file, struct mlx5dr_domain *dmn) } unlock_mutex: - mlx5dr_domain_unlock(dmn); mutex_unlock(&dmn->dump_info.dbg_mutex); + mlx5dr_domain_unlock(dmn); return ret; } From bd7b9f83fb9f85228c3ac9748d9cba9fab7fb5a2 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 24 Feb 2026 13:46:49 +0200 Subject: [PATCH 305/359] net/mlx5: LAG, disable MPESW in lag_disable_change() mlx5_lag_disable_change() unconditionally called mlx5_disable_lag() when LAG was active, which is incorrect for MLX5_LAG_MODE_MPESW. Hnece, call mlx5_disable_mpesw() when running in MPESW mode. Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode") Signed-off-by: Shay Drory Reviewed-by: Mark Bloch Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260224114652.1787431-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 8 ++++++-- drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c | 8 ++++---- drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h | 5 +++++ 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c index 9fe47c836ebd..859f042caf79 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c @@ -1869,8 +1869,12 @@ void mlx5_lag_disable_change(struct mlx5_core_dev *dev) mutex_lock(&ldev->lock); ldev->mode_changes_in_progress++; - if (__mlx5_lag_is_active(ldev)) - mlx5_disable_lag(ldev); + if (__mlx5_lag_is_active(ldev)) { + if (ldev->mode == MLX5_LAG_MODE_MPESW) + mlx5_lag_disable_mpesw(ldev); + else + mlx5_disable_lag(ldev); + } mutex_unlock(&ldev->lock); mlx5_devcom_comp_unlock(dev->priv.hca_devcom_comp); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c index 04762562d7d9..a63d48d18878 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c @@ -65,7 +65,7 @@ err_metadata: return err; } -static int enable_mpesw(struct mlx5_lag *ldev) +static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev) { struct mlx5_core_dev *dev0; int err; @@ -126,7 +126,7 @@ err_add_devices: return err; } -static void disable_mpesw(struct mlx5_lag *ldev) +void mlx5_lag_disable_mpesw(struct mlx5_lag *ldev) { if (ldev->mode == MLX5_LAG_MODE_MPESW) { mlx5_mpesw_metadata_cleanup(ldev); @@ -152,9 +152,9 @@ static void mlx5_mpesw_work(struct work_struct *work) } if (mpesww->op == MLX5_MPESW_OP_ENABLE) - mpesww->result = enable_mpesw(ldev); + mpesww->result = mlx5_lag_enable_mpesw(ldev); else if (mpesww->op == MLX5_MPESW_OP_DISABLE) - disable_mpesw(ldev); + mlx5_lag_disable_mpesw(ldev); unlock: mutex_unlock(&ldev->lock); mlx5_devcom_comp_unlock(devcom); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h index f5d9b5c97b0d..b767dbb4f457 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h @@ -31,6 +31,11 @@ int mlx5_lag_mpesw_do_mirred(struct mlx5_core_dev *mdev, bool mlx5_lag_is_mpesw(struct mlx5_core_dev *dev); void mlx5_lag_mpesw_disable(struct mlx5_core_dev *dev); int mlx5_lag_mpesw_enable(struct mlx5_core_dev *dev); +#ifdef CONFIG_MLX5_ESWITCH +void mlx5_lag_disable_mpesw(struct mlx5_lag *ldev); +#else +static inline void mlx5_lag_disable_mpesw(struct mlx5_lag *ldev) {} +#endif /* CONFIG_MLX5_ESWITCH */ #ifdef CONFIG_MLX5_ESWITCH void mlx5_mpesw_speed_update_work(struct work_struct *work); From d7073e8b978ae925f1f0f08754f33f84d8547ea7 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 24 Feb 2026 13:46:50 +0200 Subject: [PATCH 306/359] net/mlx5: E-switch, Clear legacy flag when moving to switchdev The cited commit introduced MLX5_PRIV_FLAGS_SWITCH_LEGACY to identify when a transition to legacy mode is requested via devlink. However, the logic failed to clear this flag if the mode was subsequently changed back to MLX5_ESWITCH_OFFLOADS (switchdev). Consequently, if a user toggled from legacy to switchdev, the flag remained set, leaving the driver with wrong state indicating Fix this by explicitly clearing the MLX5_PRIV_FLAGS_SWITCH_LEGACY bit when the requested mode is MLX5_ESWITCH_OFFLOADS. Fixes: 2a4f56fbcc47 ("net/mlx5e: Keep netdev when leave switchdev for devlink set legacy only") Signed-off-by: Shay Drory Reviewed-by: Mark Bloch Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260224114652.1787431-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 1b439cef3719..9d51d030596c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -4068,6 +4068,8 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode, if (mlx5_mode == MLX5_ESWITCH_LEGACY) esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY; + if (mlx5_mode == MLX5_ESWITCH_OFFLOADS) + esw->dev->priv.flags &= ~MLX5_PRIV_FLAGS_SWITCH_LEGACY; mlx5_eswitch_disable_locked(esw); if (mlx5_mode == MLX5_ESWITCH_OFFLOADS) { if (mlx5_devlink_trap_get_num_active(esw->dev)) { From 60253042c0b87b61596368489c44d12ba720d11c Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Tue, 24 Feb 2026 13:46:51 +0200 Subject: [PATCH 307/359] net/mlx5: Fix missing devlink lock in SRIOV enable error path The cited commit miss to add locking in the error path of mlx5_sriov_enable(). When pci_enable_sriov() fails, mlx5_device_disable_sriov() is called to clean up. This cleanup function now expects to be called with the devlink instance lock held. Add the missing devl_lock(devlink) and devl_unlock(devlink) Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks") Signed-off-by: Shay Drory Reviewed-by: Mark Bloch Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260224114652.1787431-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/sriov.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c index a2fc937d5461..172862a70c70 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c @@ -193,7 +193,9 @@ static int mlx5_sriov_enable(struct pci_dev *pdev, int num_vfs) err = pci_enable_sriov(pdev, num_vfs); if (err) { mlx5_core_warn(dev, "pci_enable_sriov failed : %d\n", err); + devl_lock(devlink); mlx5_device_disable_sriov(dev, num_vfs, true, true); + devl_unlock(devlink); } return err; } From 859380694f434597407632c29f30fdb5e763e6cc Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Tue, 24 Feb 2026 13:46:52 +0200 Subject: [PATCH 308/359] net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query Fix a "scheduling while atomic" bug in mlx5e_ipsec_init_macs() by replacing mlx5_query_mac_address() with ether_addr_copy() to get the local MAC address directly from netdev->dev_addr. The issue occurs because mlx5_query_mac_address() queries the hardware which involves mlx5_cmd_exec() that can sleep, but it is called from the mlx5e_ipsec_handle_event workqueue which runs in atomic context. The MAC address is already available in netdev->dev_addr, so no need to query hardware. This avoids the sleeping call and resolves the bug. Call trace: BUG: scheduling while atomic: kworker/u112:2/69344/0x00000200 __schedule+0x7ab/0xa20 schedule+0x1c/0xb0 schedule_timeout+0x6e/0xf0 __wait_for_common+0x91/0x1b0 cmd_exec+0xa85/0xff0 [mlx5_core] mlx5_cmd_exec+0x1f/0x50 [mlx5_core] mlx5_query_nic_vport_mac_address+0x7b/0xd0 [mlx5_core] mlx5_query_mac_address+0x19/0x30 [mlx5_core] mlx5e_ipsec_init_macs+0xc1/0x720 [mlx5_core] mlx5e_ipsec_build_accel_xfrm_attrs+0x422/0x670 [mlx5_core] mlx5e_ipsec_handle_event+0x2b9/0x460 [mlx5_core] process_one_work+0x178/0x2e0 worker_thread+0x2ea/0x430 Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events") Signed-off-by: Jianbo Liu Reviewed-by: Leon Romanovsky Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Link: https://patch.msgid.link/20260224114652.1787431-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 9c7064187ed0..f03507a522b4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -259,7 +259,6 @@ static void mlx5e_ipsec_init_limits(struct mlx5e_ipsec_sa_entry *sa_entry, static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, struct mlx5_accel_esp_xfrm_attrs *attrs) { - struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry); struct mlx5e_ipsec_addr *addrs = &attrs->addrs; struct net_device *netdev = sa_entry->dev; struct xfrm_state *x = sa_entry->x; @@ -276,7 +275,7 @@ static void mlx5e_ipsec_init_macs(struct mlx5e_ipsec_sa_entry *sa_entry, attrs->type != XFRM_DEV_OFFLOAD_PACKET) return; - mlx5_query_mac_address(mdev, addr); + ether_addr_copy(addr, netdev->dev_addr); switch (attrs->dir) { case XFRM_DEV_OFFLOAD_IN: src = attrs->dmac; From 7c2889af823340d1d410939b9d547bf184d5fa54 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 25 Feb 2026 15:48:59 +0200 Subject: [PATCH 309/359] RDMA/uverbs: Import DMA-BUF module in uverbs_std_types_dmabuf file Fix the following compilation error: ERROR: modpost: module ib_uverbs uses symbol dma_buf_move_notify from namespace DMA_BUF, but does not import it. Fixes: 0ac6f4056c4a ("RDMA/uverbs: Add DMABUF object type and operations") Link: https://patch.msgid.link/20260225-fix-uverbs-compilation-v1-1-acf7b3d0f9fa@nvidia.com Signed-off-by: Leon Romanovsky --- drivers/infiniband/core/uverbs_std_types_dmabuf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/core/uverbs_std_types_dmabuf.c b/drivers/infiniband/core/uverbs_std_types_dmabuf.c index dfdfcd1d1a44..4a7f8b6f9dc8 100644 --- a/drivers/infiniband/core/uverbs_std_types_dmabuf.c +++ b/drivers/infiniband/core/uverbs_std_types_dmabuf.c @@ -10,6 +10,8 @@ #include "rdma_core.h" #include "uverbs.h" +MODULE_IMPORT_NS("DMA_BUF"); + static int uverbs_dmabuf_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) { From a382a34276cb94d1cdc620b622dd85c55589a166 Mon Sep 17 00:00:00 2001 From: Bobby Eshleman Date: Mon, 23 Feb 2026 14:38:32 -0800 Subject: [PATCH 310/359] selftests/vsock: change tests to respect write-once child ns mode The child_ns_mode sysctl parameter becomes write-once in a future patch in this series, which breaks existing tests. This patch updates the tests to respect this new policy. No additional tests are added. Add "global-parent" and "local-parent" namespaces as intermediaries to spawn namespaces in the given modes. This avoids the need to change "child_ns_mode" in the init_ns. nsenter must be used because ip netns unshares the mount namespace so nested "ip netns add" breaks exec calls from the init ns. Adds nsenter to the deps check. Reviewed-by: Stefano Garzarella Signed-off-by: Bobby Eshleman Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-1-c0cde6959923@meta.com Signed-off-by: Paolo Abeni --- tools/testing/selftests/vsock/vmtest.sh | 39 +++++++++++++------------ 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index dc8dbe74a6d0..86e338886b33 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -210,16 +210,21 @@ check_result() { } add_namespaces() { - local orig_mode - orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode) + ip netns add "global-parent" 2>/dev/null + echo "global" | ip netns exec "global-parent" \ + tee /proc/sys/net/vsock/child_ns_mode &>/dev/null + ip netns add "local-parent" 2>/dev/null + echo "local" | ip netns exec "local-parent" \ + tee /proc/sys/net/vsock/child_ns_mode &>/dev/null - for mode in "${NS_MODES[@]}"; do - echo "${mode}" > /proc/sys/net/vsock/child_ns_mode - ip netns add "${mode}0" 2>/dev/null - ip netns add "${mode}1" 2>/dev/null - done - - echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode + nsenter --net=/var/run/netns/global-parent \ + ip netns add "global0" 2>/dev/null + nsenter --net=/var/run/netns/global-parent \ + ip netns add "global1" 2>/dev/null + nsenter --net=/var/run/netns/local-parent \ + ip netns add "local0" 2>/dev/null + nsenter --net=/var/run/netns/local-parent \ + ip netns add "local1" 2>/dev/null } init_namespaces() { @@ -237,6 +242,8 @@ del_namespaces() { log_host "removed ns ${mode}0" log_host "removed ns ${mode}1" done + ip netns del "global-parent" &>/dev/null + ip netns del "local-parent" &>/dev/null } vm_ssh() { @@ -287,7 +294,7 @@ check_args() { } check_deps() { - for dep in vng ${QEMU} busybox pkill ssh ss socat; do + for dep in vng ${QEMU} busybox pkill ssh ss socat nsenter; do if [[ ! -x $(command -v "${dep}") ]]; then echo -e "skip: dependency ${dep} not found!\n" exit "${KSFT_SKIP}" @@ -1231,12 +1238,8 @@ test_ns_local_same_cid_ok() { } test_ns_host_vsock_child_ns_mode_ok() { - local orig_mode - local rc + local rc="${KSFT_PASS}" - orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode) - - rc="${KSFT_PASS}" for mode in "${NS_MODES[@]}"; do local ns="${mode}0" @@ -1246,15 +1249,13 @@ test_ns_host_vsock_child_ns_mode_ok() { continue fi - if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then - log_host "child_ns_mode should be writable to ${mode}" + if ! echo "${mode}" | ip netns exec "${ns}" \ + tee /proc/sys/net/vsock/child_ns_mode &>/dev/null; then rc="${KSFT_FAIL}" continue fi done - echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode - return "${rc}" } From 102eab95f025b4d3f3a6c0a858400aca2af2fe52 Mon Sep 17 00:00:00 2001 From: Bobby Eshleman Date: Mon, 23 Feb 2026 14:38:33 -0800 Subject: [PATCH 311/359] vsock: lock down child_ns_mode as write-once Two administrator processes may race when setting child_ns_mode as one process sets child_ns_mode to "local" and then creates a namespace, but another process changes child_ns_mode to "global" between the write and the namespace creation. The first process ends up with a namespace in "global" mode instead of "local". While this can be detected after the fact by reading ns_mode and retrying, it is fragile and error-prone. Make child_ns_mode write-once so that a namespace manager can set it once and be sure it won't change. Writing a different value after the first write returns -EBUSY. This applies to all namespaces, including init_net, where an init process can write "local" to lock all future namespaces into local mode. Fixes: eafb64f40ca4 ("vsock: add netns to vsock core") Suggested-by: Daan De Meyer Suggested-by: Stefano Garzarella Co-developed-by: Stefano Garzarella Signed-off-by: Stefano Garzarella Signed-off-by: Bobby Eshleman Reviewed-by: Stefano Garzarella Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com Signed-off-by: Paolo Abeni --- include/net/af_vsock.h | 13 +++++++++++-- include/net/netns/vsock.h | 3 +++ net/vmw_vsock/af_vsock.c | 15 ++++++++++----- 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index d3ff48a2fbe0..533d8e75f7bb 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -276,10 +276,19 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk) return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL; } -static inline void vsock_net_set_child_mode(struct net *net, +static inline bool vsock_net_set_child_mode(struct net *net, enum vsock_net_mode mode) { - WRITE_ONCE(net->vsock.child_ns_mode, mode); + int new_locked = mode + 1; + int old_locked = 0; /* unlocked */ + + if (try_cmpxchg(&net->vsock.child_ns_mode_locked, + &old_locked, new_locked)) { + WRITE_ONCE(net->vsock.child_ns_mode, mode); + return true; + } + + return old_locked == new_locked; } static inline enum vsock_net_mode vsock_net_child_mode(struct net *net) diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h index b34d69a22fa8..dc8cbe45f406 100644 --- a/include/net/netns/vsock.h +++ b/include/net/netns/vsock.h @@ -17,5 +17,8 @@ struct netns_vsock { enum vsock_net_mode mode; enum vsock_net_mode child_ns_mode; + + /* 0 = unlocked, 1 = locked to global, 2 = locked to local */ + int child_ns_mode_locked; }; #endif /* __NET_NET_NAMESPACE_VSOCK_H */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index f4062c6a1944..2f7d94d682cb 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -90,16 +90,20 @@ * * - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's * mode, which is set at namespace creation and immutable thereafter. - * - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future + * - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future * child namespaces will inherit when created. The initial value matches * the namespace's own ns_mode. * * Changing child_ns_mode only affects newly created namespaces, not the * current namespace or existing children. A "local" namespace cannot set - * child_ns_mode to "global". At namespace creation, ns_mode is inherited - * from the parent's child_ns_mode. + * child_ns_mode to "global". child_ns_mode is write-once, so that it may be + * configured and locked down by a namespace manager. Writing a different + * value after the first write returns -EBUSY. At namespace creation, ns_mode + * is inherited from the parent's child_ns_mode. * - * The init_net mode is "global" and cannot be modified. + * The init_net mode is "global" and cannot be modified. The init_net + * child_ns_mode is also write-once, so an init process (e.g. systemd) can + * set it to "local" to ensure all new namespaces inherit local mode. * * The modes affect the allocation and accessibility of CIDs as follows: * @@ -2853,7 +2857,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write, new_mode == VSOCK_NET_MODE_GLOBAL) return -EPERM; - vsock_net_set_child_mode(net, new_mode); + if (!vsock_net_set_child_mode(net, new_mode)) + return -EBUSY; } return 0; From b6302e057fdc8f199ddae736ecdf45029f892e5c Mon Sep 17 00:00:00 2001 From: Bobby Eshleman Date: Mon, 23 Feb 2026 14:38:34 -0800 Subject: [PATCH 312/359] vsock: document write-once behavior of the child_ns_mode sysctl Update the vsock child_ns_mode documentation to include the new write-once semantics of setting child_ns_mode. The semantics are implemented in a preceding patch in this series. Signed-off-by: Bobby Eshleman Reviewed-by: Stefano Garzarella Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-3-c0cde6959923@meta.com Signed-off-by: Paolo Abeni --- Documentation/admin-guide/sysctl/net.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index c10530624f1e..3b2ad61995d4 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -594,6 +594,9 @@ Values: their sockets will only be able to connect within their own namespace. +The first write to ``child_ns_mode`` locks its value. Subsequent writes of the +same value succeed, but writing a different value returns ``-EBUSY``. + Changing ``child_ns_mode`` only affects namespaces created after the change; it does not modify the current namespace or any existing children. From 7aa767d0d3d04e50ae94e770db7db8197f666970 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 23 Feb 2026 15:51:00 -0800 Subject: [PATCH 313/359] net: consume xmit errors of GSO frames udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests currently in NIPA. They fail in the same exact way, TCP GRO test stalls occasionally and the test gets killed after 10min. These tests use veth to simulate GRO. They attach a trivial ("return XDP_PASS;") XDP program to the veth to force TSO off and NAPI on. Digging into the failure mode we can see that the connection is completely stuck after a burst of drops. The sender's snd_nxt is at sequence number N [1], but the receiver claims to have received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle is that senders rtx queue is not empty (let's say the block in the rtx queue is at sequence number N - 4 * MSS [3]). In this state, sender sends a retransmission from the rtx queue with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3]. Receiver sees it and responds with an ACK all the way up to N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA because it has no recollection of ever sending data that far out [1]. And we are stuck. The root cause is the mess of the xmit return codes. veth returns an error when it can't xmit a frame. We end up with a loss event like this: ------------------------------------------------- | GSO super frame 1 | GSO super frame 2 | |-----------------------------------------------| | seg | seg | seg | seg | seg | seg | seg | seg | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ------------------------------------------------- x ok ok | ok ok ok \\ snd_nxt "x" means packet lost by veth, and "ok" means it went thru. Since veth has TSO disabled in this test it sees individual segments. Segment 1 is on the retransmit queue and will be resent. So why did the sender not advance snd_nxt even tho it clearly did send up to seg 8? tcp_write_xmit() interprets the return code from the core to mean that data has not been sent at all. Since TCP deals with GSO super frames, not individual segment the crux of the problem is that loss of a single segment can be interpreted as loss of all. TCP only sees the last return code for the last segment of the GSO frame (in <> brackets in the diagram above). Of course for the problem to occur we need a setup or a device without a Qdisc. Otherwise Qdisc layer disconnects the protocol layer from the device errors completely. We have multiple ways to fix this. 1) make veth not return an error when it lost a packet. While this is what I think we did in the past, the issue keeps reappearing and it's annoying to debug. The game of whack a mole is not great. 2) fix the damn return codes We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the documentation, so maybe we should make the return code from ndo_start_xmit() a boolean. I like that the most, but perhaps some ancient, not-really-networking protocol would suffer. 3) make TCP ignore the errors It is not entirely clear to me what benefit TCP gets from interpreting the result of ip_queue_xmit()? Specifically once the connection is established and we're pushing data - packet loss is just packet loss? 4) this fix Ignore the rc in the Qdisc-less+GSO case, since it's unreliable. We already always return OK in the TCQ_F_CAN_BYPASS case. In the Qdisc-less case let's be a bit more conservative and only mask the GSO errors. This path is taken by non-IP-"networks" like CAN, MCTP etc, so we could regress some ancient thing. This is the simplest, but also maybe the hackiest fix? Similar fix has been proposed by Eric in the past but never committed because original reporter was working with an OOT driver and wasn't providing feedback (see Link). Link: https://lore.kernel.org/CANn89iJcLepEin7EtBETrZ36bjoD9LrR=k4cfwWh046GB+4f9A@mail.gmail.com Fixes: 1f59533f9ca5 ("qdisc: validate frames going through the direct_xmit path") Signed-off-by: Jakub Kicinski Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20260223235100.108939-1-kuba@kernel.org Signed-off-by: Paolo Abeni --- net/core/dev.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index f3426385f1ba..3d2d3b18915c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4822,6 +4822,8 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) * to -1 or to their cpu id, but not to our id. */ if (READ_ONCE(txq->xmit_lock_owner) != cpu) { + bool is_list = false; + if (dev_xmit_recursion()) goto recursion_alert; @@ -4832,17 +4834,28 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) HARD_TX_LOCK(dev, txq, cpu); if (!netif_xmit_stopped(txq)) { + is_list = !!skb->next; + dev_xmit_recursion_inc(); skb = dev_hard_start_xmit(skb, dev, txq, &rc); dev_xmit_recursion_dec(); - if (dev_xmit_complete(rc)) { - HARD_TX_UNLOCK(dev, txq); - goto out; - } + + /* GSO segments a single SKB into + * a list of frames. TCP expects error + * to mean none of the data was sent. + */ + if (is_list) + rc = NETDEV_TX_OK; } HARD_TX_UNLOCK(dev, txq); + if (!skb) /* xmit completed */ + goto out; + net_crit_ratelimited("Virtual device %s asks to queue packet!\n", dev->name); + /* NETDEV_TX_BUSY or queue was stopped */ + if (!is_list) + rc = -ENETDOWN; } else { /* Recursion is detected! It is possible, * unfortunately @@ -4850,10 +4863,10 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) recursion_alert: net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n", dev->name); + rc = -ENETDOWN; } } - rc = -ENETDOWN; rcu_read_unlock_bh(); dev_core_stats_tx_dropped_inc(dev); From 8a5752c6dcc085a3bfc78589925182e4e98468c5 Mon Sep 17 00:00:00 2001 From: Junrui Luo Date: Tue, 24 Feb 2026 19:05:56 +0800 Subject: [PATCH 314/359] dpaa2-switch: validate num_ifs to prevent out-of-bounds write The driver obtains sw_attr.num_ifs from firmware via dpsw_get_attributes() but never validates it against DPSW_MAX_IF (64). This value controls iteration in dpaa2_switch_fdb_get_flood_cfg(), which writes port indices into the fixed-size cfg->if_id[DPSW_MAX_IF] array. When firmware reports num_ifs >= 64, the loop can write past the array bounds. Add a bound check for num_ifs in dpaa2_switch_init(). dpaa2_switch_fdb_get_flood_cfg() appends the control interface (port num_ifs) after all matched ports. When num_ifs == DPSW_MAX_IF and all ports match the flood filter, the loop fills all 64 slots and the control interface write overflows by one entry. The check uses >= because num_ifs == DPSW_MAX_IF is also functionally broken. build_if_id_bitmap() silently drops any ID >= 64: if (id[i] < DPSW_MAX_IF) bmap[id[i] / 64] |= ... Fixes: 539dda3c5d19 ("staging: dpaa2-switch: properly setup switching domains") Signed-off-by: Junrui Luo Reviewed-by: Ioana Ciornei Link: https://patch.msgid.link/SYBPR01MB78812B47B7F0470B617C408AAF74A@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c index 66240c340492..78e21b46a5ba 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c @@ -3034,6 +3034,13 @@ static int dpaa2_switch_init(struct fsl_mc_device *sw_dev) goto err_close; } + if (ethsw->sw_attr.num_ifs >= DPSW_MAX_IF) { + dev_err(dev, "DPSW num_ifs %u exceeds max %u\n", + ethsw->sw_attr.num_ifs, DPSW_MAX_IF); + err = -EINVAL; + goto err_close; + } + err = dpsw_get_api_version(ethsw->mc_io, 0, ðsw->major, ðsw->minor); From baed0d9ba91d4f390da12d5039128ee897253d60 Mon Sep 17 00:00:00 2001 From: Vahagn Vardanian Date: Wed, 25 Feb 2026 14:06:18 +0100 Subject: [PATCH 315/359] netfilter: nf_conntrack_h323: fix OOB read in decode_choice() In decode_choice(), the boundary check before get_len() uses the variable `len`, which is still 0 from its initialization at the top of the function: unsigned int type, ext, len = 0; ... if (ext || (son->attr & OPEN)) { BYTE_ALIGN(bs); if (nf_h323_error_boundary(bs, len, 0)) /* len is 0 here */ return H323_ERROR_BOUND; len = get_len(bs); /* OOB read */ When the bitstream is exactly consumed (bs->cur == bs->end), the check nf_h323_error_boundary(bs, 0, 0) evaluates to (bs->cur + 0 > bs->end), which is false. The subsequent get_len() call then dereferences *bs->cur++, reading 1 byte past the end of the buffer. If that byte has bit 7 set, get_len() reads a second byte as well. This can be triggered remotely by sending a crafted Q.931 SETUP message with a User-User Information Element containing exactly 2 bytes of PER-encoded data ({0x08, 0x00}) to port 1720 through a firewall with the nf_conntrack_h323 helper active. The decoder fully consumes the PER buffer before reaching this code path, resulting in a 1-2 byte heap-buffer-overflow read confirmed by AddressSanitizer. Fix this by checking for 2 bytes (the maximum that get_len() may read) instead of the uninitialized `len`. This matches the pattern used at every other get_len() call site in the same file, where the caller checks for 2 bytes of available data before calling get_len(). Fixes: ec8a8f3c31dd ("netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well") Signed-off-by: Vahagn Vardanian Signed-off-by: Florian Westphal Link: https://patch.msgid.link/20260225130619.1248-2-fw@strlen.de Signed-off-by: Paolo Abeni --- net/netfilter/nf_conntrack_h323_asn1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c index 540d97715bd2..62aa22a07876 100644 --- a/net/netfilter/nf_conntrack_h323_asn1.c +++ b/net/netfilter/nf_conntrack_h323_asn1.c @@ -796,7 +796,7 @@ static int decode_choice(struct bitstr *bs, const struct field_t *f, if (ext || (son->attr & OPEN)) { BYTE_ALIGN(bs); - if (nf_h323_error_boundary(bs, len, 0)) + if (nf_h323_error_boundary(bs, 2, 0)) return H323_ERROR_BOUND; len = get_len(bs); if (nf_h323_error_boundary(bs, len, 0)) From 021ca6b670bebebc409d43845efcfe8c11c1dd54 Mon Sep 17 00:00:00 2001 From: Harry Yoo Date: Mon, 23 Feb 2026 22:33:22 +0900 Subject: [PATCH 316/359] mm/slab: pass __GFP_NOWARN to refill_sheaf() if fallback is available When refill_sheaf() is called, failing to refill the sheaf doesn't necessarily mean the allocation will fail because a fallback path might be available and serve the allocation request. Suppress spurious warnings by passing __GFP_NOWARN along with __GFP_NOMEMALLOC whenever a fallback path is available. When the caller is alloc_full_sheaf() or __pcs_replace_empty_main(), the kernel always falls back to the slowpath (__slab_alloc_node()). For __prefill_sheaf_pfmemalloc(), the fallback path is available only when gfp_pfmemalloc_allowed() returns true. Reported-and-tested-by: Chris Bainbridge Closes: https://lore.kernel.org/linux-mm/aZt2-oS9lkmwT7Ch@debian.local Fixes: 1ce20c28eafd ("slab: handle pfmemalloc slabs properly with sheaves") Link: https://lore.kernel.org/linux-mm/aZwSreGj9-HHdD-j@hyeyoo Signed-off-by: Harry Yoo Link: https://patch.msgid.link/20260223133322.16705-1-harry.yoo@oracle.com Tested-by: Mikhail Gavrilov Signed-off-by: Vlastimil Babka (SUSE) --- mm/slub.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 862642c165ed..4ce24d9ee7e4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2822,7 +2822,7 @@ static struct slab_sheaf *alloc_full_sheaf(struct kmem_cache *s, gfp_t gfp) if (!sheaf) return NULL; - if (refill_sheaf(s, sheaf, gfp | __GFP_NOMEMALLOC)) { + if (refill_sheaf(s, sheaf, gfp | __GFP_NOMEMALLOC | __GFP_NOWARN)) { free_empty_sheaf(s, sheaf); return NULL; } @@ -4575,7 +4575,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs, return NULL; if (empty) { - if (!refill_sheaf(s, empty, gfp | __GFP_NOMEMALLOC)) { + if (!refill_sheaf(s, empty, gfp | __GFP_NOMEMALLOC | __GFP_NOWARN)) { full = empty; } else { /* @@ -4890,9 +4890,14 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_noprof); static int __prefill_sheaf_pfmemalloc(struct kmem_cache *s, struct slab_sheaf *sheaf, gfp_t gfp) { - int ret = 0; + gfp_t gfp_nomemalloc; + int ret; - ret = refill_sheaf(s, sheaf, gfp | __GFP_NOMEMALLOC); + gfp_nomemalloc = gfp | __GFP_NOMEMALLOC; + if (gfp_pfmemalloc_allowed(gfp)) + gfp_nomemalloc |= __GFP_NOWARN; + + ret = refill_sheaf(s, sheaf, gfp_nomemalloc); if (likely(!ret || !gfp_pfmemalloc_allowed(gfp))) return ret; From f3ec502b6755a3bfb12c1c47025ef989ff9efc72 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 25 Feb 2026 08:34:07 -0800 Subject: [PATCH 317/359] mm/slab: mark alloc tags empty for sheaves allocated with __GFP_NO_OBJ_EXT alloc_empty_sheaf() allocates sheaves from SLAB_KMALLOC caches using __GFP_NO_OBJ_EXT to avoid recursion, however it does not mark their allocation tags empty before freeing, which results in a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is set. Fix this by marking allocation tags for such sheaves as empty. The problem was technically introduced in commit 4c0a17e28340 but only becomes possible to hit with commit 913ffd3a1bf5. Fixes: 4c0a17e28340 ("slab: prevent recursive kmalloc() in alloc_empty_sheaf()") Fixes: 913ffd3a1bf5 ("slab: handle kmalloc sheaves bootstrap") Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20260223155128.3849-1-00107082@163.com/ Analyzed-by: Harry Yoo Signed-off-by: Suren Baghdasaryan Reviewed-by: Harry Yoo Tested-by: Harry Yoo Tested-by: David Wang <00107082@163.com> Link: https://patch.msgid.link/20260225163407.2218712-1-surenb@google.com Signed-off-by: Vlastimil Babka (SUSE) --- include/linux/gfp_types.h | 2 ++ mm/slab.h | 4 ++-- mm/slub.c | 33 +++++++++++++++++++++++---------- 3 files changed, 27 insertions(+), 12 deletions(-) diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h index 814bb2892f99..6c75df30a281 100644 --- a/include/linux/gfp_types.h +++ b/include/linux/gfp_types.h @@ -139,6 +139,8 @@ enum { * %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg. * * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension. + * mark_obj_codetag_empty() should be called upon freeing for objects allocated + * with this flag to indicate that their NULL tags are expected and normal. */ #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) diff --git a/mm/slab.h b/mm/slab.h index 71c7261bf822..f6ef862b60ef 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -290,14 +290,14 @@ static inline void *nearest_obj(struct kmem_cache *cache, /* Determine object index from a given position */ static inline unsigned int __obj_to_index(const struct kmem_cache *cache, - void *addr, void *obj) + void *addr, const void *obj) { return reciprocal_divide(kasan_reset_tag(obj) - addr, cache->reciprocal_size); } static inline unsigned int obj_to_index(const struct kmem_cache *cache, - const struct slab *slab, void *obj) + const struct slab *slab, const void *obj) { if (is_kfence_address(obj)) return 0; diff --git a/mm/slub.c b/mm/slub.c index 4ce24d9ee7e4..52f021711744 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2041,18 +2041,18 @@ static inline void dec_slabs_node(struct kmem_cache *s, int node, #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG -static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) +static inline void mark_obj_codetag_empty(const void *obj) { - struct slab *obj_exts_slab; + struct slab *obj_slab; unsigned long slab_exts; - obj_exts_slab = virt_to_slab(obj_exts); - slab_exts = slab_obj_exts(obj_exts_slab); + obj_slab = virt_to_slab(obj); + slab_exts = slab_obj_exts(obj_slab); if (slab_exts) { get_slab_obj_exts(slab_exts); - unsigned int offs = obj_to_index(obj_exts_slab->slab_cache, - obj_exts_slab, obj_exts); - struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab, + unsigned int offs = obj_to_index(obj_slab->slab_cache, + obj_slab, obj); + struct slabobj_ext *ext = slab_obj_ext(obj_slab, slab_exts, offs); if (unlikely(is_codetag_empty(&ext->ref))) { @@ -2090,7 +2090,7 @@ static inline void handle_failed_objexts_alloc(unsigned long obj_exts, #else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */ -static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {} +static inline void mark_obj_codetag_empty(const void *obj) {} static inline bool mark_failed_objexts_alloc(struct slab *slab) { return false; } static inline void handle_failed_objexts_alloc(unsigned long obj_exts, struct slabobj_ext *vec, unsigned int objects) {} @@ -2211,7 +2211,7 @@ retry: * assign slabobj_exts in parallel. In this case the existing * objcg vector should be reused. */ - mark_objexts_empty(vec); + mark_obj_codetag_empty(vec); if (unlikely(!allow_spin)) kfree_nolock(vec); else @@ -2254,7 +2254,7 @@ static inline void free_slab_obj_exts(struct slab *slab, bool allow_spin) * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that * the extension for obj_exts is expected to be NULL. */ - mark_objexts_empty(obj_exts); + mark_obj_codetag_empty(obj_exts); if (allow_spin) kfree(obj_exts); else @@ -2312,6 +2312,10 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab) #else /* CONFIG_SLAB_OBJ_EXT */ +static inline void mark_obj_codetag_empty(const void *obj) +{ +} + static inline void init_slab_obj_exts(struct slab *slab) { } @@ -2783,6 +2787,15 @@ static inline struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s, static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf) { + /* + * If the sheaf was created with __GFP_NO_OBJ_EXT flag then its + * corresponding extension is NULL and alloc_tag_sub() will throw a + * warning, therefore replace NULL with CODETAG_EMPTY to indicate + * that the extension for this sheaf is expected to be NULL. + */ + if (s->flags & SLAB_KMALLOC) + mark_obj_codetag_empty(sheaf); + kfree(sheaf); stat(s, SHEAF_FREE); From 2b351ea42820a7ecc2e8305724536512984f4419 Mon Sep 17 00:00:00 2001 From: Sanjay Chitroda Date: Thu, 26 Feb 2026 11:17:12 +0530 Subject: [PATCH 318/359] mm/slub: drop duplicate kernel-doc for ksize() The implementation of ksize() was updated with kernel-doc by commit fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c") However, the public header still contains a kernel-doc comment attached to the ksize() prototype. Having documentation both in the header and next to the implementation causes Sphinx to treat the function as being documented twice, resulting in the warning: WARNING: Duplicate C declaration, also defined at core-api/mm-api:521 Declaration is '.. c:function:: size_t ksize(const void *objp)' Kernel-doc guidelines recommend keeping the documentation with the function implementation. Therefore remove the redundant kernel-doc block from include/linux/slab.h so that the implementation in slub.c remains the canonical source for documentation. No functional change. Fixes: fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c") Signed-off-by: Sanjay Chitroda Link: https://patch.msgid.link/20260226054712.3610744-1-sanjayembedded@gmail.com Signed-off-by: Vlastimil Babka (SUSE) --- include/linux/slab.h | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index a5a5e4108ae5..15a60b501b95 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -517,18 +517,6 @@ void kfree_sensitive(const void *objp); DEFINE_FREE(kfree, void *, if (!IS_ERR_OR_NULL(_T)) kfree(_T)) DEFINE_FREE(kfree_sensitive, void *, if (_T) kfree_sensitive(_T)) -/** - * ksize - Report actual allocation size of associated object - * - * @objp: Pointer returned from a prior kmalloc()-family allocation. - * - * This should not be used for writing beyond the originally requested - * allocation size. Either use krealloc() or round up the allocation size - * with kmalloc_size_roundup() prior to allocation. If this is used to - * access beyond the originally requested allocation size, UBSAN_BOUNDS - * and/or FORTIFY_SOURCE may trip, since they only know about the - * originally allocated size via the __alloc_size attribute. - */ size_t ksize(const void *objp); #ifdef CONFIG_PRINTK From 795469820c638b4449f3bb90ee5e98ebccfbc480 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 23 Feb 2026 14:01:56 -0800 Subject: [PATCH 319/359] kcsan: test: Adjust "expect" allocation type for kmalloc_obj The call to kmalloc_obj(observed.lines) returns "char (*)[3][512]", a pointer to the whole 2D array. But "expect" wants to be "char (*)[512]", the decayed pointer type, as if it were observed.lines itself (though without the "3" bounds). This produces the following build error: ../kernel/kcsan/kcsan_test.c: In function '__report_matches': ../kernel/kcsan/kcsan_test.c:171:16: error: assignment to 'char (*)[512]' from incompatible pointer type 'char (*)[3][512]' [-Wincompatible-pointer-types] 171 | expect = kmalloc_obj(observed.lines); | ^ Instead of changing the "expect" type to "char (*)[3][512]" and requiring a dereference at each use (e.g. "(expect*)[0]"), just explicitly cast the return to the desired type. Note that I'm intentionally not switching back to byte-based "kmalloc" here because I cannot find a way for the Coccinelle script (which will be used going forward to catch future conversions) to exclude this case. Tested with: $ ./tools/testing/kunit/kunit.py run \ --kconfig_add CONFIG_DEBUG_KERNEL=y \ --kconfig_add CONFIG_KCSAN=y \ --kconfig_add CONFIG_KCSAN_KUNIT_TEST=y \ --arch=x86_64 --qemu_args '-smp 2' kcsan Reported-by: Nathan Chancellor Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Signed-off-by: Kees Cook --- kernel/kcsan/kcsan_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c index 79e655ea4ca1..ae758150ccb9 100644 --- a/kernel/kcsan/kcsan_test.c +++ b/kernel/kcsan/kcsan_test.c @@ -168,7 +168,7 @@ static bool __report_matches(const struct expect_report *r) if (!report_available()) return false; - expect = kmalloc_obj(observed.lines); + expect = (typeof(expect))kmalloc_obj(observed.lines); if (WARN_ON(!expect)) return false; From e5cb94ba5f96d691d8885175d4696d6ae6bc5ec9 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 26 Feb 2026 08:22:32 +0000 Subject: [PATCH 320/359] arm64: Fix sampling the "stable" virtual counter in preemptible section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ben reports that when running with CONFIG_DEBUG_PREEMPT, using __arch_counter_get_cntvct_stable() results in well deserves warnings, as we access a per-CPU variable without preemption disabled. Fix the issue by disabling preemption on reading the counter. We can probably do a lot better by not disabling preemption on systems that do not require horrible workarounds to return a valid counter value, but this plugs the issue for the time being. Fixes: 29cc0f3aa7c6 ("arm64: Force the use of CNTVCT_EL0 in __delay()") Reported-by: Ben Horgan Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/aZw3EGs4rbQvbAzV@e134344.arm.com Tested-by: Ben Horgan Tested-by: André Draszik Signed-off-by: Will Deacon --- arch/arm64/lib/delay.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c index d02341303899..e278e060e78a 100644 --- a/arch/arm64/lib/delay.c +++ b/arch/arm64/lib/delay.c @@ -32,7 +32,11 @@ static inline unsigned long xloops_to_cycles(unsigned long xloops) * Note that userspace cannot change the offset behind our back either, * as the vcpu mutex is held as long as KVM_RUN is in progress. */ -#define __delay_cycles() __arch_counter_get_cntvct_stable() +static cycles_t notrace __delay_cycles(void) +{ + guard(preempt_notrace)(); + return __arch_counter_get_cntvct_stable(); +} void __delay(unsigned long cycles) { From df6e4ab654dc482c1d45776257a62ac10e14086c Mon Sep 17 00:00:00 2001 From: Sumit Gupta Date: Thu, 26 Feb 2026 17:29:11 +0530 Subject: [PATCH 321/359] arm64: topology: Fix false warning in counters_read_on_cpu() for same-CPU reads The counters_read_on_cpu() function warns when called with IRQs disabled to prevent deadlock in smp_call_function_single(). However, this warning is spurious when reading counters on the current CPU, since no IPI is needed for same CPU reads. Commit 12eb8f4fff24 ("cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs") changed the CPPC Frequency Invariance Engine to read AMU counters directly from the scheduler tick for non-PCC register spaces (like FFH), instead of deferring to a kthread. This means counters_read_on_cpu() is now called with IRQs disabled from the tick handler, triggering the warning. Fix this by restructuring the logic: when IRQs are disabled (tick context), call the function directly for same-CPU reads. Otherwise use smp_call_function_single(). Fixes: 997c021abc6e ("cpufreq: CPPC: Update FIE arch_freq_scale in ticks for non-PCC regs") Suggested-by: Will Deacon Signed-off-by: Sumit Gupta Signed-off-by: Will Deacon --- arch/arm64/kernel/topology.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 3fe1faab0362..b32f13358fbb 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -400,16 +400,25 @@ static inline int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val) { /* - * Abort call on counterless CPU or when interrupts are - * disabled - can lead to deadlock in smp sync call. + * Abort call on counterless CPU. */ if (!cpu_has_amu_feat(cpu)) return -EOPNOTSUPP; - if (WARN_ON_ONCE(irqs_disabled())) - return -EPERM; - - smp_call_function_single(cpu, func, val, 1); + if (irqs_disabled()) { + /* + * When IRQs are disabled (tick path: sched_tick -> + * topology_scale_freq_tick or cppc_scale_freq_tick), only local + * CPU counter reads are allowed. Remote CPU counter read would + * require smp_call_function_single() which is unsafe with IRQs + * disabled. + */ + if (WARN_ON_ONCE(cpu != smp_processor_id())) + return -EPERM; + func(val); + } else { + smp_call_function_single(cpu, func, val, 1); + } return 0; } From ef06fd16d48704eac868441d98d4ef083d8f3d07 Mon Sep 17 00:00:00 2001 From: Fuad Tabba Date: Thu, 26 Feb 2026 07:55:25 +0000 Subject: [PATCH 322/359] bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing struct bpf_plt contains a u64 target field. Currently, the BPF JIT allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT buffer. Because the base address of the JIT buffer can be 4-byte aligned (e.g., ending in 0x4 or 0xc), the relative padding logic in build_plt() fails to ensure that target lands on an 8-byte boundary. This leads to two issues: 1. UBSAN reports misaligned-access warnings when dereferencing the structure. 2. More critically, target is updated concurrently via WRITE_ONCE() in bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64, 64-bit loads/stores are only guaranteed to be single-copy atomic if they are 64-bit aligned. A misaligned target risks a torn read, causing the JIT to jump to a corrupted address. Fix this by increasing the allocation alignment requirement to 8 bytes (sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of the JIT buffer to an 8-byte boundary, allowing the relative padding math in build_plt() to correctly align the target field. Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Fuad Tabba Acked-by: Will Deacon Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com Signed-off-by: Alexei Starovoitov --- arch/arm64/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 356d33c7a4ae..adf84962d579 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -2119,7 +2119,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align); image_size = extable_offset + extable_size; ro_header = bpf_jit_binary_pack_alloc(image_size, &ro_image_ptr, - sizeof(u32), &header, &image_ptr, + sizeof(u64), &header, &image_ptr, jit_fill_hole); if (!ro_header) { prog = orig_prog; From ad6fface76da42721c15e8fb281570aaa44a2c01 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 25 Feb 2026 12:12:49 +0100 Subject: [PATCH 323/359] bpf: Fix kprobe_multi cookies access in show_fdinfo callback We don't check if cookies are available on the kprobe_multi link before accessing them in show_fdinfo callback, we should. Cc: stable@vger.kernel.org Fixes: da7e9c0a7fbc ("bpf: Add show_fdinfo for kprobe_multi") Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20260225111249.186230-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- kernel/trace/bpf_trace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 9bc0dfd235af..0b040a417442 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2454,8 +2454,10 @@ static void bpf_kprobe_multi_show_fdinfo(const struct bpf_link *link, struct seq_file *seq) { struct bpf_kprobe_multi_link *kmulti_link; + bool has_cookies; kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link); + has_cookies = !!kmulti_link->cookies; seq_printf(seq, "kprobe_cnt:\t%u\n" @@ -2467,7 +2469,7 @@ static void bpf_kprobe_multi_show_fdinfo(const struct bpf_link *link, for (int i = 0; i < kmulti_link->cnt; i++) { seq_printf(seq, "%llu\t %pS\n", - kmulti_link->cookies[i], + has_cookies ? kmulti_link->cookies[i] : 0, (void *)kmulti_link->addrs[i]); } } From b7bf516c3ecd9a2aae2dc2635178ab87b734fef1 Mon Sep 17 00:00:00 2001 From: Kohei Enju Date: Wed, 25 Feb 2026 05:34:44 +0000 Subject: [PATCH 324/359] bpf: Fix stack-out-of-bounds write in devmap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit get_upper_ifindexes() iterates over all upper devices and writes their indices into an array without checking bounds. Also the callers assume that the max number of upper devices is MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack, but that assumption is not correct and the number of upper devices could be larger than MAX_NEST_DEV (e.g., many macvlans), causing a stack-out-of-bounds write. Add a max parameter to get_upper_ifindexes() to avoid the issue. When there are too many upper devices, return -EOVERFLOW and abort the redirect. To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS. Then send a packet to the device to trigger the XDP redirect path. Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/ Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device") Reviewed-by: Toke Høiland-Jørgensen Signed-off-by: Kohei Enju Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp Signed-off-by: Alexei Starovoitov --- kernel/bpf/devmap.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 2625601de76e..2984e938f94d 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -588,18 +588,22 @@ static inline bool is_ifindex_excluded(int *excluded, int num_excluded, int ifin } /* Get ifindex of each upper device. 'indexes' must be able to hold at - * least MAX_NEST_DEV elements. - * Returns the number of ifindexes added. + * least 'max' elements. + * Returns the number of ifindexes added, or -EOVERFLOW if there are too + * many upper devices. */ -static int get_upper_ifindexes(struct net_device *dev, int *indexes) +static int get_upper_ifindexes(struct net_device *dev, int *indexes, int max) { struct net_device *upper; struct list_head *iter; int n = 0; netdev_for_each_upper_dev_rcu(dev, upper, iter) { + if (n >= max) + return -EOVERFLOW; indexes[n++] = upper->ifindex; } + return n; } @@ -615,7 +619,11 @@ int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx, int err; if (exclude_ingress) { - num_excluded = get_upper_ifindexes(dev_rx, excluded_devices); + num_excluded = get_upper_ifindexes(dev_rx, excluded_devices, + ARRAY_SIZE(excluded_devices) - 1); + if (num_excluded < 0) + return num_excluded; + excluded_devices[num_excluded++] = dev_rx->ifindex; } @@ -733,7 +741,11 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, int err; if (exclude_ingress) { - num_excluded = get_upper_ifindexes(dev, excluded_devices); + num_excluded = get_upper_ifindexes(dev, excluded_devices, + ARRAY_SIZE(excluded_devices) - 1); + if (num_excluded < 0) + return num_excluded; + excluded_devices[num_excluded++] = dev->ifindex; } From 60e3cbef3500ad6101bc91946e645f69b1bb9556 Mon Sep 17 00:00:00 2001 From: Ihor Solodrai Date: Tue, 24 Feb 2026 16:33:51 -0800 Subject: [PATCH 325/359] selftests/bpf: Fix a memory leak in xdp_flowtable test test_progs run with ASAN reported [1]: ==126==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7f1ff3cfa340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x5610c15bb520 in bpf_program_attach_fd /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13164 #2 0x5610c15bb740 in bpf_program__attach_xdp /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13204 #3 0x5610c14f91d3 in test_xdp_flowtable /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c:138 #4 0x5610c1533566 in run_one_test /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:1406 #5 0x5610c1537fb0 in main /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:2097 #6 0x7f1ff25df1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb) #7 0x7f1ff25df28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb) #8 0x5610c0bd3180 in _start (/tmp/work/vmtest/vmtest/selftests/bpf/test_progs+0x593180) (BuildId: cdf9f103f42307dc0a2cd6cfc8afcbc1366cf8bd) Fix by properly destroying bpf_link on exit in xdp_flowtable test. [1] https://github.com/kernel-patches/vmtest/actions/runs/22361085418/job/64716490680 Signed-off-by: Ihor Solodrai Reviewed-by: Subbaraya Sundeep Link: https://lore.kernel.org/r/20260225003351.465104-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c index 3f9146d83d79..325e0b64dc35 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c @@ -67,7 +67,7 @@ void test_xdp_flowtable(void) struct nstoken *tok = NULL; int iifindex, stats_fd; __u32 value, key = 0; - struct bpf_link *link; + struct bpf_link *link = NULL; if (SYS_NOFAIL("nft -v")) { fprintf(stdout, "Missing required nft tool\n"); @@ -160,6 +160,7 @@ void test_xdp_flowtable(void) ASSERT_GE(value, N_PACKETS - 2, "bpf_xdp_flow_lookup failed"); out: + bpf_link__destroy(link); xdp_flowtable__destroy(skel); if (tok) close_netns(tok); From 6881af27f9ea0f5ca8f606f573ef5cc25ca31fe4 Mon Sep 17 00:00:00 2001 From: "T.J. Mercier" Date: Tue, 24 Feb 2026 16:33:48 -0800 Subject: [PATCH 326/359] selftests/bpf: Fix OOB read in dmabuf_collector Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters, but bpf_probe_read_kernel always tries to read exactly that many bytes. If a name is less than DMA_BUF_NAME_LEN characters, bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str stops at the first NUL terminator so use it instead, like iter_dmabuf_for_each already does. Fixes: ae5d2c59ecd7 ("selftests/bpf: Add test for dmabuf_iter") Reported-by: Jerome Lee Signed-off-by: T.J. Mercier Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/progs/dmabuf_iter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/progs/dmabuf_iter.c b/tools/testing/selftests/bpf/progs/dmabuf_iter.c index 13cdb11fdeb2..9cbb7442646e 100644 --- a/tools/testing/selftests/bpf/progs/dmabuf_iter.c +++ b/tools/testing/selftests/bpf/progs/dmabuf_iter.c @@ -48,7 +48,7 @@ int dmabuf_collector(struct bpf_iter__dmabuf *ctx) /* Buffers are not required to be named */ if (pname) { - if (bpf_probe_read_kernel(name, sizeof(name), pname)) + if (bpf_probe_read_kernel_str(name, sizeof(name), pname) < 0) return 1; /* Name strings can be provided by userspace */ From e96493229a6399e902062213c6381162464cdd50 Mon Sep 17 00:00:00 2001 From: Alain Volmat Date: Tue, 24 Feb 2026 16:09:22 +0100 Subject: [PATCH 327/359] spi: stm32: fix missing pointer assignment in case of dma chaining Commit c4f2c05ab029 ("spi: stm32: fix pointer-to-pointer variables usage") introduced a regression since dma descriptors generated as part of the stm32_spi_prepare_rx_dma_mdma_chaining function are not well propagated to the caller function, leading to mdma-dma chaining being no more functional. Fixes: c4f2c05ab029 ("spi: stm32: fix pointer-to-pointer variables usage") Signed-off-by: Alain Volmat Acked-by: Antonio Quartulli Link: https://patch.msgid.link/20260224-spi-stm32-chaining-fix-v1-1-5da7a4851b66@foss.st.com Signed-off-by: Mark Brown --- drivers/spi/spi-stm32.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c index b99de8c4cc99..33f211e159ef 100644 --- a/drivers/spi/spi-stm32.c +++ b/drivers/spi/spi-stm32.c @@ -1625,6 +1625,9 @@ static int stm32_spi_prepare_rx_dma_mdma_chaining(struct stm32_spi *spi, return -EINVAL; } + *rx_mdma_desc = _mdma_desc; + *rx_dma_desc = _dma_desc; + return 0; } From 4fc3a433c13944ee5766ec5b9bf6f1eb4d29b880 Mon Sep 17 00:00:00 2001 From: Paulo Alcantara Date: Mon, 23 Feb 2026 13:34:35 -0300 Subject: [PATCH 328/359] smb: client: use atomic_t for mnt_cifs_flags Use atomic_t for cifs_sb_info::mnt_cifs_flags as it's currently accessed locklessly and may be changed concurrently in mount/remount and reconnect paths. Signed-off-by: Paulo Alcantara (Red Hat) Reviewed-by: David Howells Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French --- fs/smb/client/cached_dir.c | 2 +- fs/smb/client/cifs_fs_sb.h | 2 +- fs/smb/client/cifs_ioctl.h | 8 -- fs/smb/client/cifs_unicode.c | 14 ---- fs/smb/client/cifs_unicode.h | 14 +++- fs/smb/client/cifsacl.c | 17 ++-- fs/smb/client/cifsfs.c | 84 ++++++++++---------- fs/smb/client/cifsglob.h | 61 +++++++++++--- fs/smb/client/connect.c | 65 ++++++++------- fs/smb/client/dfs_cache.c | 2 +- fs/smb/client/dir.c | 53 +++++++------ fs/smb/client/file.c | 90 +++++++++++---------- fs/smb/client/fs_context.c | 149 +++++++++++++++++------------------ fs/smb/client/fs_context.h | 2 +- fs/smb/client/inode.c | 146 ++++++++++++++++++---------------- fs/smb/client/ioctl.c | 2 +- fs/smb/client/link.c | 14 ++-- fs/smb/client/misc.c | 16 ++-- fs/smb/client/readdir.c | 39 ++++----- fs/smb/client/reparse.c | 29 +++---- fs/smb/client/reparse.h | 4 +- fs/smb/client/smb1ops.c | 22 ++++-- fs/smb/client/smb2file.c | 2 +- fs/smb/client/smb2misc.c | 18 +---- fs/smb/client/smb2ops.c | 8 +- fs/smb/client/smb2pdu.c | 13 ++- fs/smb/client/xattr.c | 6 +- 27 files changed, 465 insertions(+), 417 deletions(-) diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c index c327c246a9b4..04bb95091f49 100644 --- a/fs/smb/client/cached_dir.c +++ b/fs/smb/client/cached_dir.c @@ -118,7 +118,7 @@ static const char *path_no_prefix(struct cifs_sb_info *cifs_sb, if (!*path) return path; - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) && + if ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_USE_PREFIX_PATH) && cifs_sb->prepath) { len = strlen(cifs_sb->prepath) + 1; if (unlikely(len > strlen(path))) diff --git a/fs/smb/client/cifs_fs_sb.h b/fs/smb/client/cifs_fs_sb.h index 5e8d163cb5f8..84e7e366b0ff 100644 --- a/fs/smb/client/cifs_fs_sb.h +++ b/fs/smb/client/cifs_fs_sb.h @@ -55,7 +55,7 @@ struct cifs_sb_info { struct nls_table *local_nls; struct smb3_fs_context *ctx; atomic_t active; - unsigned int mnt_cifs_flags; + atomic_t mnt_cifs_flags; struct delayed_work prune_tlinks; struct rcu_head rcu; diff --git a/fs/smb/client/cifs_ioctl.h b/fs/smb/client/cifs_ioctl.h index b51ce64fcccf..147496ac9f9f 100644 --- a/fs/smb/client/cifs_ioctl.h +++ b/fs/smb/client/cifs_ioctl.h @@ -122,11 +122,3 @@ struct smb3_notify_info { #define CIFS_GOING_FLAGS_DEFAULT 0x0 /* going down */ #define CIFS_GOING_FLAGS_LOGFLUSH 0x1 /* flush log but not data */ #define CIFS_GOING_FLAGS_NOLOGFLUSH 0x2 /* don't flush log nor data */ - -static inline bool cifs_forced_shutdown(struct cifs_sb_info *sbi) -{ - if (CIFS_MOUNT_SHUTDOWN & sbi->mnt_cifs_flags) - return true; - else - return false; -} diff --git a/fs/smb/client/cifs_unicode.c b/fs/smb/client/cifs_unicode.c index e7891b4406f2..e2edc207cef2 100644 --- a/fs/smb/client/cifs_unicode.c +++ b/fs/smb/client/cifs_unicode.c @@ -11,20 +11,6 @@ #include "cifsglob.h" #include "cifs_debug.h" -int cifs_remap(struct cifs_sb_info *cifs_sb) -{ - int map_type; - - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SFM_CHR) - map_type = SFM_MAP_UNI_RSVD; - else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR) - map_type = SFU_MAP_UNI_RSVD; - else - map_type = NO_MAP_UNI_RSVD; - - return map_type; -} - /* Convert character using the SFU - "Services for Unix" remapping range */ static bool convert_sfu_char(const __u16 src_char, char *target) diff --git a/fs/smb/client/cifs_unicode.h b/fs/smb/client/cifs_unicode.h index 9249db3b78c3..3e9cd9acf0a9 100644 --- a/fs/smb/client/cifs_unicode.h +++ b/fs/smb/client/cifs_unicode.h @@ -22,6 +22,7 @@ #include #include #include "../../nls/nls_ucs2_utils.h" +#include "cifsglob.h" /* * Macs use an older "SFM" mapping of the symbols above. Fortunately it does @@ -65,10 +66,21 @@ char *cifs_strndup_from_utf16(const char *src, const int maxlen, const struct nls_table *codepage); int cifsConvertToUTF16(__le16 *target, const char *source, int srclen, const struct nls_table *cp, int map_chars); -int cifs_remap(struct cifs_sb_info *cifs_sb); __le16 *cifs_strndup_to_utf16(const char *src, const int maxlen, int *utf16_len, const struct nls_table *cp, int remap); wchar_t cifs_toupper(wchar_t in); +static inline int cifs_remap(const struct cifs_sb_info *cifs_sb) +{ + unsigned int sbflags = cifs_sb_flags(cifs_sb); + + if (sbflags & CIFS_MOUNT_MAP_SFM_CHR) + return SFM_MAP_UNI_RSVD; + if (sbflags & CIFS_MOUNT_MAP_SPECIAL_CHR) + return SFU_MAP_UNI_RSVD; + + return NO_MAP_UNI_RSVD; +} + #endif /* _CIFS_UNICODE_H */ diff --git a/fs/smb/client/cifsacl.c b/fs/smb/client/cifsacl.c index 6fa12c901c14..f4cb3018a358 100644 --- a/fs/smb/client/cifsacl.c +++ b/fs/smb/client/cifsacl.c @@ -356,7 +356,7 @@ sid_to_id(struct cifs_sb_info *cifs_sb, struct smb_sid *psid, psid->num_subauth, SID_MAX_SUB_AUTHORITIES); } - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UID_FROM_ACL) || + if ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_UID_FROM_ACL) || (cifs_sb_master_tcon(cifs_sb)->posix_extensions)) { uint32_t unix_id; bool is_group; @@ -1612,7 +1612,8 @@ id_mode_to_cifs_acl(struct inode *inode, const char *path, __u64 *pnmode, struct smb_acl *dacl_ptr = NULL; struct smb_ntsd *pntsd = NULL; /* acl obtained from server */ struct smb_ntsd *pnntsd = NULL; /* modified acl to be sent to server */ - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + unsigned int sbflags; struct tcon_link *tlink; struct smb_version_operations *ops; bool mode_from_sid, id_from_sid; @@ -1643,15 +1644,9 @@ id_mode_to_cifs_acl(struct inode *inode, const char *path, __u64 *pnmode, return rc; } - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID) - mode_from_sid = true; - else - mode_from_sid = false; - - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UID_FROM_ACL) - id_from_sid = true; - else - id_from_sid = false; + sbflags = cifs_sb_flags(cifs_sb); + mode_from_sid = sbflags & CIFS_MOUNT_MODE_FROM_SID; + id_from_sid = sbflags & CIFS_MOUNT_UID_FROM_ACL; /* Potentially, five new ACEs can be added to the ACL for U,G,O mapping */ if (pnmode && *pnmode != NO_CHANGE_64) { /* chmod */ diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c index 99b04234a08e..427558404aa5 100644 --- a/fs/smb/client/cifsfs.c +++ b/fs/smb/client/cifsfs.c @@ -226,16 +226,18 @@ cifs_sb_deactive(struct super_block *sb) static int cifs_read_super(struct super_block *sb) { - struct inode *inode; struct cifs_sb_info *cifs_sb; struct cifs_tcon *tcon; + unsigned int sbflags; struct timespec64 ts; + struct inode *inode; int rc = 0; cifs_sb = CIFS_SB(sb); tcon = cifs_sb_master_tcon(cifs_sb); + sbflags = cifs_sb_flags(cifs_sb); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIXACL) + if (sbflags & CIFS_MOUNT_POSIXACL) sb->s_flags |= SB_POSIXACL; if (tcon->snapshot_time) @@ -311,7 +313,7 @@ cifs_read_super(struct super_block *sb) } #ifdef CONFIG_CIFS_NFSD_EXPORT - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) { + if (sbflags & CIFS_MOUNT_SERVER_INUM) { cifs_dbg(FYI, "export ops supported\n"); sb->s_export_op = &cifs_export_ops; } @@ -389,8 +391,7 @@ statfs_out: static long cifs_fallocate(struct file *file, int mode, loff_t off, loff_t len) { - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file); - struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); + struct cifs_tcon *tcon = cifs_sb_master_tcon(CIFS_SB(file)); struct TCP_Server_Info *server = tcon->ses->server; struct inode *inode = file_inode(file); int rc; @@ -418,11 +419,9 @@ out_unlock: static int cifs_permission(struct mnt_idmap *idmap, struct inode *inode, int mask) { - struct cifs_sb_info *cifs_sb; + unsigned int sbflags = cifs_sb_flags(CIFS_SB(inode)); - cifs_sb = CIFS_SB(inode->i_sb); - - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM) { + if (sbflags & CIFS_MOUNT_NO_PERM) { if ((mask & MAY_EXEC) && !execute_ok(inode)) return -EACCES; else @@ -568,15 +567,17 @@ cifs_show_security(struct seq_file *s, struct cifs_ses *ses) static void cifs_show_cache_flavor(struct seq_file *s, struct cifs_sb_info *cifs_sb) { + unsigned int sbflags = cifs_sb_flags(cifs_sb); + seq_puts(s, ",cache="); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_STRICT_IO) + if (sbflags & CIFS_MOUNT_STRICT_IO) seq_puts(s, "strict"); - else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) + else if (sbflags & CIFS_MOUNT_DIRECT_IO) seq_puts(s, "none"); - else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RW_CACHE) + else if (sbflags & CIFS_MOUNT_RW_CACHE) seq_puts(s, "singleclient"); /* assume only one client access */ - else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RO_CACHE) + else if (sbflags & CIFS_MOUNT_RO_CACHE) seq_puts(s, "ro"); /* read only caching assumed */ else seq_puts(s, "loose"); @@ -637,6 +638,8 @@ cifs_show_options(struct seq_file *s, struct dentry *root) struct cifs_sb_info *cifs_sb = CIFS_SB(root->d_sb); struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); struct sockaddr *srcaddr; + unsigned int sbflags; + srcaddr = (struct sockaddr *)&tcon->ses->server->srcaddr; seq_show_option(s, "vers", tcon->ses->server->vals->version_string); @@ -670,16 +673,17 @@ cifs_show_options(struct seq_file *s, struct dentry *root) (int)(srcaddr->sa_family)); } + sbflags = cifs_sb_flags(cifs_sb); seq_printf(s, ",uid=%u", from_kuid_munged(&init_user_ns, cifs_sb->ctx->linux_uid)); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) + if (sbflags & CIFS_MOUNT_OVERR_UID) seq_puts(s, ",forceuid"); else seq_puts(s, ",noforceuid"); seq_printf(s, ",gid=%u", from_kgid_munged(&init_user_ns, cifs_sb->ctx->linux_gid)); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) + if (sbflags & CIFS_MOUNT_OVERR_GID) seq_puts(s, ",forcegid"); else seq_puts(s, ",noforcegid"); @@ -722,53 +726,53 @@ cifs_show_options(struct seq_file *s, struct dentry *root) seq_puts(s, ",unix"); else seq_puts(s, ",nounix"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_DFS) + if (sbflags & CIFS_MOUNT_NO_DFS) seq_puts(s, ",nodfs"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) + if (sbflags & CIFS_MOUNT_POSIX_PATHS) seq_puts(s, ",posixpaths"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) + if (sbflags & CIFS_MOUNT_SET_UID) seq_puts(s, ",setuids"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UID_FROM_ACL) + if (sbflags & CIFS_MOUNT_UID_FROM_ACL) seq_puts(s, ",idsfromsid"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) + if (sbflags & CIFS_MOUNT_SERVER_INUM) seq_puts(s, ",serverino"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) + if (sbflags & CIFS_MOUNT_RWPIDFORWARD) seq_puts(s, ",rwpidforward"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) + if (sbflags & CIFS_MOUNT_NOPOSIXBRL) seq_puts(s, ",forcemand"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_XATTR) + if (sbflags & CIFS_MOUNT_NO_XATTR) seq_puts(s, ",nouser_xattr"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR) + if (sbflags & CIFS_MOUNT_MAP_SPECIAL_CHR) seq_puts(s, ",mapchars"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SFM_CHR) + if (sbflags & CIFS_MOUNT_MAP_SFM_CHR) seq_puts(s, ",mapposix"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL) + if (sbflags & CIFS_MOUNT_UNX_EMUL) seq_puts(s, ",sfu"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) + if (sbflags & CIFS_MOUNT_NO_BRL) seq_puts(s, ",nobrl"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_HANDLE_CACHE) + if (sbflags & CIFS_MOUNT_NO_HANDLE_CACHE) seq_puts(s, ",nohandlecache"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID) + if (sbflags & CIFS_MOUNT_MODE_FROM_SID) seq_puts(s, ",modefromsid"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) + if (sbflags & CIFS_MOUNT_CIFS_ACL) seq_puts(s, ",cifsacl"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM) + if (sbflags & CIFS_MOUNT_DYNPERM) seq_puts(s, ",dynperm"); if (root->d_sb->s_flags & SB_POSIXACL) seq_puts(s, ",acl"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MF_SYMLINKS) + if (sbflags & CIFS_MOUNT_MF_SYMLINKS) seq_puts(s, ",mfsymlinks"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_FSCACHE) + if (sbflags & CIFS_MOUNT_FSCACHE) seq_puts(s, ",fsc"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOSSYNC) + if (sbflags & CIFS_MOUNT_NOSSYNC) seq_puts(s, ",nostrictsync"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM) + if (sbflags & CIFS_MOUNT_NO_PERM) seq_puts(s, ",noperm"); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_BACKUPUID) + if (sbflags & CIFS_MOUNT_CIFS_BACKUPUID) seq_printf(s, ",backupuid=%u", from_kuid_munged(&init_user_ns, cifs_sb->ctx->backupuid)); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_BACKUPGID) + if (sbflags & CIFS_MOUNT_CIFS_BACKUPGID) seq_printf(s, ",backupgid=%u", from_kgid_munged(&init_user_ns, cifs_sb->ctx->backupgid)); @@ -909,10 +913,10 @@ static int cifs_write_inode(struct inode *inode, struct writeback_control *wbc) static int cifs_drop_inode(struct inode *inode) { - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + unsigned int sbflags = cifs_sb_flags(CIFS_SB(inode)); /* no serverino => unconditional eviction */ - return !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) || + return !(sbflags & CIFS_MOUNT_SERVER_INUM) || inode_generic_drop(inode); } @@ -950,7 +954,7 @@ cifs_get_root(struct smb3_fs_context *ctx, struct super_block *sb) char *s, *p; char sep; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_USE_PREFIX_PATH) return dget(sb->s_root); full_path = cifs_build_path_to_root(ctx, cifs_sb, diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index 080ea601c209..6f9b6c72962b 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -1580,24 +1580,59 @@ CIFS_I(struct inode *inode) return container_of(inode, struct cifsInodeInfo, netfs.inode); } -static inline struct cifs_sb_info * -CIFS_SB(struct super_block *sb) +static inline void *cinode_to_fsinfo(struct cifsInodeInfo *cinode) +{ + return cinode->netfs.inode.i_sb->s_fs_info; +} + +static inline void *super_to_fsinfo(struct super_block *sb) { return sb->s_fs_info; } -static inline struct cifs_sb_info * -CIFS_FILE_SB(struct file *file) +static inline void *inode_to_fsinfo(struct inode *inode) { - return CIFS_SB(file_inode(file)->i_sb); + return inode->i_sb->s_fs_info; +} + +static inline void *file_to_fsinfo(struct file *file) +{ + return file_inode(file)->i_sb->s_fs_info; +} + +static inline void *dentry_to_fsinfo(struct dentry *dentry) +{ + return dentry->d_sb->s_fs_info; +} + +static inline void *const_dentry_to_fsinfo(const struct dentry *dentry) +{ + return dentry->d_sb->s_fs_info; +} + +#define CIFS_SB(_ptr) \ + ((struct cifs_sb_info *) \ + _Generic((_ptr), \ + struct cifsInodeInfo * : cinode_to_fsinfo, \ + const struct dentry * : const_dentry_to_fsinfo, \ + struct super_block * : super_to_fsinfo, \ + struct dentry * : dentry_to_fsinfo, \ + struct inode * : inode_to_fsinfo, \ + struct file * : file_to_fsinfo)(_ptr)) + +/* + * Use atomic_t for @cifs_sb->mnt_cifs_flags as it is currently accessed + * locklessly and may be changed concurrently by mount/remount and reconnect + * paths. + */ +static inline unsigned int cifs_sb_flags(const struct cifs_sb_info *cifs_sb) +{ + return atomic_read(&cifs_sb->mnt_cifs_flags); } static inline char CIFS_DIR_SEP(const struct cifs_sb_info *cifs_sb) { - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) - return '/'; - else - return '\\'; + return (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_POSIX_PATHS) ? '/' : '\\'; } static inline void @@ -2314,9 +2349,8 @@ static inline bool __cifs_cache_state_check(struct cifsInodeInfo *cinode, unsigned int oplock_flags, unsigned int sb_flags) { - struct cifs_sb_info *cifs_sb = CIFS_SB(cinode->netfs.inode.i_sb); + unsigned int sflags = cifs_sb_flags(CIFS_SB(cinode)); unsigned int oplock = READ_ONCE(cinode->oplock); - unsigned int sflags = cifs_sb->mnt_cifs_flags; return (oplock & oplock_flags) || (sflags & sb_flags); } @@ -2336,4 +2370,9 @@ static inline void cifs_reset_oplock(struct cifsInodeInfo *cinode) WRITE_ONCE(cinode->oplock, 0); } +static inline bool cifs_forced_shutdown(const struct cifs_sb_info *sbi) +{ + return cifs_sb_flags(sbi) & CIFS_MOUNT_SHUTDOWN; +} + #endif /* _CIFS_GLOB_H */ diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 33dfe116ca52..87686c804057 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -2915,8 +2915,8 @@ compare_mount_options(struct super_block *sb, struct cifs_mnt_data *mnt_data) { struct cifs_sb_info *old = CIFS_SB(sb); struct cifs_sb_info *new = mnt_data->cifs_sb; - unsigned int oldflags = old->mnt_cifs_flags & CIFS_MOUNT_MASK; - unsigned int newflags = new->mnt_cifs_flags & CIFS_MOUNT_MASK; + unsigned int oldflags = cifs_sb_flags(old) & CIFS_MOUNT_MASK; + unsigned int newflags = cifs_sb_flags(new) & CIFS_MOUNT_MASK; if ((sb->s_flags & CIFS_MS_MASK) != (mnt_data->flags & CIFS_MS_MASK)) return 0; @@ -2971,9 +2971,9 @@ static int match_prepath(struct super_block *sb, struct smb3_fs_context *ctx = mnt_data->ctx; struct cifs_sb_info *old = CIFS_SB(sb); struct cifs_sb_info *new = mnt_data->cifs_sb; - bool old_set = (old->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) && + bool old_set = (cifs_sb_flags(old) & CIFS_MOUNT_USE_PREFIX_PATH) && old->prepath; - bool new_set = (new->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) && + bool new_set = (cifs_sb_flags(new) & CIFS_MOUNT_USE_PREFIX_PATH) && new->prepath; if (tcon->origin_fullpath && @@ -3004,7 +3004,7 @@ cifs_match_super(struct super_block *sb, void *data) cifs_sb = CIFS_SB(sb); /* We do not want to use a superblock that has been shutdown */ - if (CIFS_MOUNT_SHUTDOWN & cifs_sb->mnt_cifs_flags) { + if (cifs_forced_shutdown(cifs_sb)) { spin_unlock(&cifs_tcp_ses_lock); return 0; } @@ -3469,6 +3469,8 @@ ip_connect(struct TCP_Server_Info *server) int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb) { struct smb3_fs_context *ctx = cifs_sb->ctx; + unsigned int sbflags; + int rc = 0; INIT_DELAYED_WORK(&cifs_sb->prune_tlinks, cifs_prune_tlinks); INIT_LIST_HEAD(&cifs_sb->tcon_sb_link); @@ -3493,17 +3495,16 @@ int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb) } ctx->local_nls = cifs_sb->local_nls; - smb3_update_mnt_flags(cifs_sb); + sbflags = smb3_update_mnt_flags(cifs_sb); if (ctx->direct_io) cifs_dbg(FYI, "mounting share using direct i/o\n"); if (ctx->cache_ro) { cifs_dbg(VFS, "mounting share with read only caching. Ensure that the share will not be modified while in use.\n"); - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_RO_CACHE; + sbflags |= CIFS_MOUNT_RO_CACHE; } else if (ctx->cache_rw) { cifs_dbg(VFS, "mounting share in single client RW caching mode. Ensure that no other systems will be accessing the share.\n"); - cifs_sb->mnt_cifs_flags |= (CIFS_MOUNT_RO_CACHE | - CIFS_MOUNT_RW_CACHE); + sbflags |= CIFS_MOUNT_RO_CACHE | CIFS_MOUNT_RW_CACHE; } if ((ctx->cifs_acl) && (ctx->dynperm)) @@ -3512,16 +3513,19 @@ int cifs_setup_cifs_sb(struct cifs_sb_info *cifs_sb) if (ctx->prepath) { cifs_sb->prepath = kstrdup(ctx->prepath, GFP_KERNEL); if (cifs_sb->prepath == NULL) - return -ENOMEM; - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH; + rc = -ENOMEM; + else + sbflags |= CIFS_MOUNT_USE_PREFIX_PATH; } - return 0; + atomic_set(&cifs_sb->mnt_cifs_flags, sbflags); + return rc; } /* Release all succeed connections */ void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx) { + struct cifs_sb_info *cifs_sb = mnt_ctx->cifs_sb; int rc = 0; if (mnt_ctx->tcon) @@ -3533,7 +3537,7 @@ void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx) mnt_ctx->ses = NULL; mnt_ctx->tcon = NULL; mnt_ctx->server = NULL; - mnt_ctx->cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_POSIX_PATHS; + atomic_andnot(CIFS_MOUNT_POSIX_PATHS, &cifs_sb->mnt_cifs_flags); free_xid(mnt_ctx->xid); } @@ -3587,19 +3591,23 @@ out: int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx) { struct TCP_Server_Info *server; + struct cifs_tcon *tcon = NULL; struct cifs_sb_info *cifs_sb; struct smb3_fs_context *ctx; - struct cifs_tcon *tcon = NULL; + unsigned int sbflags; int rc = 0; - if (WARN_ON_ONCE(!mnt_ctx || !mnt_ctx->server || !mnt_ctx->ses || !mnt_ctx->fs_ctx || - !mnt_ctx->cifs_sb)) { - rc = -EINVAL; - goto out; + if (WARN_ON_ONCE(!mnt_ctx)) + return -EINVAL; + if (WARN_ON_ONCE(!mnt_ctx->server || !mnt_ctx->ses || + !mnt_ctx->fs_ctx || !mnt_ctx->cifs_sb)) { + mnt_ctx->tcon = NULL; + return -EINVAL; } server = mnt_ctx->server; ctx = mnt_ctx->fs_ctx; cifs_sb = mnt_ctx->cifs_sb; + sbflags = cifs_sb_flags(cifs_sb); /* search for existing tcon to this server share */ tcon = cifs_get_tcon(mnt_ctx->ses, ctx); @@ -3614,9 +3622,9 @@ int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx) * path (i.e., do not remap / and \ and do not map any special characters) */ if (tcon->posix_extensions) { - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_POSIX_PATHS; - cifs_sb->mnt_cifs_flags &= ~(CIFS_MOUNT_MAP_SFM_CHR | - CIFS_MOUNT_MAP_SPECIAL_CHR); + sbflags |= CIFS_MOUNT_POSIX_PATHS; + sbflags &= ~(CIFS_MOUNT_MAP_SFM_CHR | + CIFS_MOUNT_MAP_SPECIAL_CHR); } #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY @@ -3643,12 +3651,11 @@ int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx) /* do not care if a following call succeed - informational */ if (!tcon->pipe && server->ops->qfs_tcon) { server->ops->qfs_tcon(mnt_ctx->xid, tcon, cifs_sb); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RO_CACHE) { + if (sbflags & CIFS_MOUNT_RO_CACHE) { if (tcon->fsDevInfo.DeviceCharacteristics & cpu_to_le32(FILE_READ_ONLY_DEVICE)) cifs_dbg(VFS, "mounted to read only share\n"); - else if ((cifs_sb->mnt_cifs_flags & - CIFS_MOUNT_RW_CACHE) == 0) + else if (!(sbflags & CIFS_MOUNT_RW_CACHE)) cifs_dbg(VFS, "read only mount of RW share\n"); /* no need to log a RW mount of a typical RW share */ } @@ -3660,11 +3667,12 @@ int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx) * Inside cifs_fscache_get_super_cookie it checks * that we do not get super cookie twice. */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_FSCACHE) + if (sbflags & CIFS_MOUNT_FSCACHE) cifs_fscache_get_super_cookie(tcon); out: mnt_ctx->tcon = tcon; + atomic_set(&cifs_sb->mnt_cifs_flags, sbflags); return rc; } @@ -3783,7 +3791,8 @@ int cifs_is_path_remote(struct cifs_mount_ctx *mnt_ctx) cifs_sb, full_path, tcon->Flags & SMB_SHARE_IS_IN_DFS); if (rc != 0) { cifs_server_dbg(VFS, "cannot query dirs between root and final path, enabling CIFS_MOUNT_USE_PREFIX_PATH\n"); - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH; + atomic_or(CIFS_MOUNT_USE_PREFIX_PATH, + &cifs_sb->mnt_cifs_flags); rc = 0; } } @@ -3863,7 +3872,7 @@ int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx) * Force the use of prefix path to support failover on DFS paths that resolve to targets * that have different prefix paths. */ - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH; + atomic_or(CIFS_MOUNT_USE_PREFIX_PATH, &cifs_sb->mnt_cifs_flags); kfree(cifs_sb->prepath); cifs_sb->prepath = ctx->prepath; ctx->prepath = NULL; @@ -4357,7 +4366,7 @@ cifs_sb_tlink(struct cifs_sb_info *cifs_sb) kuid_t fsuid = current_fsuid(); int err; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)) + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_MULTIUSER)) return cifs_get_tlink(cifs_sb_master_tlink(cifs_sb)); spin_lock(&cifs_sb->tlink_tree_lock); diff --git a/fs/smb/client/dfs_cache.c b/fs/smb/client/dfs_cache.c index 983132735d72..83f8cf2f8d2b 100644 --- a/fs/smb/client/dfs_cache.c +++ b/fs/smb/client/dfs_cache.c @@ -1333,7 +1333,7 @@ int dfs_cache_remount_fs(struct cifs_sb_info *cifs_sb) * Force the use of prefix path to support failover on DFS paths that resolve to targets * that have different prefix paths. */ - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH; + atomic_or(CIFS_MOUNT_USE_PREFIX_PATH, &cifs_sb->mnt_cifs_flags); refresh_tcon_referral(tcon, true); return 0; diff --git a/fs/smb/client/dir.c b/fs/smb/client/dir.c index cb10088197d2..953f1fee8cb8 100644 --- a/fs/smb/client/dir.c +++ b/fs/smb/client/dir.c @@ -82,10 +82,11 @@ char *__build_path_from_dentry_optional_prefix(struct dentry *direntry, void *pa const char *tree, int tree_len, bool prefix) { - int dfsplen; - int pplen = 0; - struct cifs_sb_info *cifs_sb = CIFS_SB(direntry->d_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(direntry); + unsigned int sbflags = cifs_sb_flags(cifs_sb); char dirsep = CIFS_DIR_SEP(cifs_sb); + int pplen = 0; + int dfsplen; char *s; if (unlikely(!page)) @@ -96,7 +97,7 @@ char *__build_path_from_dentry_optional_prefix(struct dentry *direntry, void *pa else dfsplen = 0; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) + if (sbflags & CIFS_MOUNT_USE_PREFIX_PATH) pplen = cifs_sb->prepath ? strlen(cifs_sb->prepath) + 1 : 0; s = dentry_path_raw(direntry, page, PATH_MAX); @@ -123,7 +124,7 @@ char *__build_path_from_dentry_optional_prefix(struct dentry *direntry, void *pa if (dfsplen) { s -= dfsplen; memcpy(s, tree, dfsplen); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) { + if (sbflags & CIFS_MOUNT_POSIX_PATHS) { int i; for (i = 0; i < dfsplen; i++) { if (s[i] == '\\') @@ -152,7 +153,7 @@ char *build_path_from_dentry_optional_prefix(struct dentry *direntry, void *page static int check_name(struct dentry *direntry, struct cifs_tcon *tcon) { - struct cifs_sb_info *cifs_sb = CIFS_SB(direntry->d_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(direntry); int i; if (unlikely(tcon->fsAttrInfo.MaxPathNameComponentLength && @@ -160,7 +161,7 @@ check_name(struct dentry *direntry, struct cifs_tcon *tcon) le32_to_cpu(tcon->fsAttrInfo.MaxPathNameComponentLength))) return -ENAMETOOLONG; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS)) { + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_POSIX_PATHS)) { for (i = 0; i < direntry->d_name.len; i++) { if (direntry->d_name.name[i] == '\\') { cifs_dbg(FYI, "Invalid file name\n"); @@ -181,11 +182,12 @@ static int cifs_do_create(struct inode *inode, struct dentry *direntry, unsigned int rc = -ENOENT; int create_options = CREATE_NOT_DIR; int desired_access; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); struct cifs_tcon *tcon = tlink_tcon(tlink); const char *full_path; void *page = alloc_dentry_path(); struct inode *newinode = NULL; + unsigned int sbflags; int disposition; struct TCP_Server_Info *server = tcon->ses->server; struct cifs_open_parms oparms; @@ -365,6 +367,7 @@ retry_open: * If Open reported that we actually created a file then we now have to * set the mode if possible. */ + sbflags = cifs_sb_flags(cifs_sb); if ((tcon->unix_ext) && (*oplock & CIFS_CREATE_ACTION)) { struct cifs_unix_set_info_args args = { .mode = mode, @@ -374,7 +377,7 @@ retry_open: .device = 0, }; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) { + if (sbflags & CIFS_MOUNT_SET_UID) { args.uid = current_fsuid(); if (inode->i_mode & S_ISGID) args.gid = inode->i_gid; @@ -411,9 +414,9 @@ cifs_create_get_file_info: if (server->ops->set_lease_key) server->ops->set_lease_key(newinode, fid); if ((*oplock & CIFS_CREATE_ACTION) && S_ISREG(newinode->i_mode)) { - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM) + if (sbflags & CIFS_MOUNT_DYNPERM) newinode->i_mode = mode; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) { + if (sbflags & CIFS_MOUNT_SET_UID) { newinode->i_uid = current_fsuid(); if (inode->i_mode & S_ISGID) newinode->i_gid = inode->i_gid; @@ -458,18 +461,20 @@ int cifs_atomic_open(struct inode *inode, struct dentry *direntry, struct file *file, unsigned int oflags, umode_t mode) { - int rc; - unsigned int xid; + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + struct cifs_open_info_data buf = {}; + struct TCP_Server_Info *server; + struct cifsFileInfo *file_info; + struct cifs_pending_open open; + struct cifs_fid fid = {}; struct tcon_link *tlink; struct cifs_tcon *tcon; - struct TCP_Server_Info *server; - struct cifs_fid fid = {}; - struct cifs_pending_open open; + unsigned int sbflags; + unsigned int xid; __u32 oplock; - struct cifsFileInfo *file_info; - struct cifs_open_info_data buf = {}; + int rc; - if (unlikely(cifs_forced_shutdown(CIFS_SB(inode->i_sb)))) + if (unlikely(cifs_forced_shutdown(cifs_sb))) return smb_EIO(smb_eio_trace_forced_shutdown); /* @@ -499,7 +504,7 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry, cifs_dbg(FYI, "parent inode = 0x%p name is: %pd and dentry = 0x%p\n", inode, direntry, direntry); - tlink = cifs_sb_tlink(CIFS_SB(inode->i_sb)); + tlink = cifs_sb_tlink(cifs_sb); if (IS_ERR(tlink)) { rc = PTR_ERR(tlink); goto out_free_xid; @@ -536,13 +541,13 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry, goto out; } - if (file->f_flags & O_DIRECT && - CIFS_SB(inode->i_sb)->mnt_cifs_flags & CIFS_MOUNT_STRICT_IO) { - if (CIFS_SB(inode->i_sb)->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) + sbflags = cifs_sb_flags(cifs_sb); + if ((file->f_flags & O_DIRECT) && (sbflags & CIFS_MOUNT_STRICT_IO)) { + if (sbflags & CIFS_MOUNT_NO_BRL) file->f_op = &cifs_file_direct_nobrl_ops; else file->f_op = &cifs_file_direct_ops; - } + } file_info = cifs_new_fileinfo(&fid, file, tlink, oplock, buf.symlink_target); if (file_info == NULL) { diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index 18f31d4eb98d..f3ddcdf406c8 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -270,7 +270,7 @@ static void cifs_begin_writeback(struct netfs_io_request *wreq) static int cifs_init_request(struct netfs_io_request *rreq, struct file *file) { struct cifs_io_request *req = container_of(rreq, struct cifs_io_request, rreq); - struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode); struct cifsFileInfo *open_file = NULL; rreq->rsize = cifs_sb->ctx->rsize; @@ -281,7 +281,7 @@ static int cifs_init_request(struct netfs_io_request *rreq, struct file *file) open_file = file->private_data; rreq->netfs_priv = file->private_data; req->cfile = cifsFileInfo_get(open_file); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_RWPIDFORWARD) req->pid = req->cfile->pid; } else if (rreq->origin != NETFS_WRITEBACK) { WARN_ON_ONCE(1); @@ -906,7 +906,7 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, * close because it may cause a error when we open this file * again and get at least level II oplock. */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_STRICT_IO) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_STRICT_IO) set_bit(CIFS_INO_INVALID_MAPPING, &cifsi->flags); cifs_set_oplock_level(cifsi, 0); } @@ -955,11 +955,11 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, int cifs_file_flush(const unsigned int xid, struct inode *inode, struct cifsFileInfo *cfile) { - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); struct cifs_tcon *tcon; int rc; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOSSYNC) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOSSYNC) return 0; if (cfile && (OPEN_FMODE(cfile->f_flags) & FMODE_WRITE)) { @@ -1015,24 +1015,24 @@ static int cifs_do_truncate(const unsigned int xid, struct dentry *dentry) int cifs_open(struct inode *inode, struct file *file) { + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + struct cifs_open_info_data data = {}; + struct cifsFileInfo *cfile = NULL; + struct TCP_Server_Info *server; + struct cifs_pending_open open; + bool posix_open_ok = false; + struct cifs_fid fid = {}; + struct tcon_link *tlink; + struct cifs_tcon *tcon; + const char *full_path; + unsigned int sbflags; int rc = -EACCES; unsigned int xid; __u32 oplock; - struct cifs_sb_info *cifs_sb; - struct TCP_Server_Info *server; - struct cifs_tcon *tcon; - struct tcon_link *tlink; - struct cifsFileInfo *cfile = NULL; void *page; - const char *full_path; - bool posix_open_ok = false; - struct cifs_fid fid = {}; - struct cifs_pending_open open; - struct cifs_open_info_data data = {}; xid = get_xid(); - cifs_sb = CIFS_SB(inode->i_sb); if (unlikely(cifs_forced_shutdown(cifs_sb))) { free_xid(xid); return smb_EIO(smb_eio_trace_forced_shutdown); @@ -1056,9 +1056,9 @@ int cifs_open(struct inode *inode, struct file *file) cifs_dbg(FYI, "inode = 0x%p file flags are 0x%x for %s\n", inode, file->f_flags, full_path); - if (file->f_flags & O_DIRECT && - cifs_sb->mnt_cifs_flags & CIFS_MOUNT_STRICT_IO) { - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) + sbflags = cifs_sb_flags(cifs_sb); + if ((file->f_flags & O_DIRECT) && (sbflags & CIFS_MOUNT_STRICT_IO)) { + if (sbflags & CIFS_MOUNT_NO_BRL) file->f_op = &cifs_file_direct_nobrl_ops; else file->f_op = &cifs_file_direct_ops; @@ -1209,7 +1209,7 @@ cifs_relock_file(struct cifsFileInfo *cfile) struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); int rc = 0; #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY - struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(cinode); #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING); @@ -1222,7 +1222,7 @@ cifs_relock_file(struct cifsFileInfo *cfile) #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (cap_unix(tcon->ses) && (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) + ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) == 0)) rc = cifs_push_posix_locks(cfile); else #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ @@ -2011,7 +2011,7 @@ cifs_push_locks(struct cifsFileInfo *cfile) struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); int rc = 0; #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY - struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(cinode); #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ /* we are going to update can_cache_brlcks here - need a write access */ @@ -2024,7 +2024,7 @@ cifs_push_locks(struct cifsFileInfo *cfile) #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (cap_unix(tcon->ses) && (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) + ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) == 0)) rc = cifs_push_posix_locks(cfile); else #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ @@ -2428,11 +2428,11 @@ int cifs_flock(struct file *file, int cmd, struct file_lock *fl) cifs_read_flock(fl, &type, &lock, &unlock, &wait_flag, tcon->ses->server); - cifs_sb = CIFS_FILE_SB(file); + cifs_sb = CIFS_SB(file); if (cap_unix(tcon->ses) && (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) + ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) == 0)) posix_lck = true; if (!lock && !unlock) { @@ -2455,14 +2455,14 @@ int cifs_flock(struct file *file, int cmd, struct file_lock *fl) int cifs_lock(struct file *file, int cmd, struct file_lock *flock) { - int rc, xid; + struct cifs_sb_info *cifs_sb = CIFS_SB(file); + struct cifsFileInfo *cfile; int lock = 0, unlock = 0; bool wait_flag = false; bool posix_lck = false; - struct cifs_sb_info *cifs_sb; struct cifs_tcon *tcon; - struct cifsFileInfo *cfile; __u32 type; + int rc, xid; rc = -EACCES; xid = get_xid(); @@ -2477,12 +2477,11 @@ int cifs_lock(struct file *file, int cmd, struct file_lock *flock) cifs_read_flock(flock, &type, &lock, &unlock, &wait_flag, tcon->ses->server); - cifs_sb = CIFS_FILE_SB(file); set_bit(CIFS_INO_CLOSE_ON_LOCK, &CIFS_I(d_inode(cfile->dentry))->flags); if (cap_unix(tcon->ses) && (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) + ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) == 0)) posix_lck = true; /* * BB add code here to normalize offset and length to account for @@ -2532,11 +2531,11 @@ void cifs_write_subrequest_terminated(struct cifs_io_subrequest *wdata, ssize_t struct cifsFileInfo *find_readable_file(struct cifsInodeInfo *cifs_inode, bool fsuid_only) { + struct cifs_sb_info *cifs_sb = CIFS_SB(cifs_inode); struct cifsFileInfo *open_file = NULL; - struct cifs_sb_info *cifs_sb = CIFS_SB(cifs_inode->netfs.inode.i_sb); /* only filter by fsuid on multiuser mounts */ - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)) + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_MULTIUSER)) fsuid_only = false; spin_lock(&cifs_inode->open_file_lock); @@ -2589,10 +2588,10 @@ cifs_get_writable_file(struct cifsInodeInfo *cifs_inode, int flags, return rc; } - cifs_sb = CIFS_SB(cifs_inode->netfs.inode.i_sb); + cifs_sb = CIFS_SB(cifs_inode); /* only filter by fsuid on multiuser mounts */ - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)) + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_MULTIUSER)) fsuid_only = false; spin_lock(&cifs_inode->open_file_lock); @@ -2787,7 +2786,7 @@ int cifs_fsync(struct file *file, loff_t start, loff_t end, int datasync) struct TCP_Server_Info *server; struct cifsFileInfo *smbfile = file->private_data; struct inode *inode = file_inode(file); - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file); + struct cifs_sb_info *cifs_sb = CIFS_SB(file); rc = file_write_and_wait_range(file, start, end); if (rc) { @@ -2801,7 +2800,7 @@ int cifs_fsync(struct file *file, loff_t start, loff_t end, int datasync) file, datasync); tcon = tlink_tcon(smbfile->tlink); - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOSSYNC)) { + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOSSYNC)) { server = tcon->ses->server; if (server->ops->flush == NULL) { rc = -ENOSYS; @@ -2853,7 +2852,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file->f_mapping->host; struct cifsInodeInfo *cinode = CIFS_I(inode); struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); ssize_t rc; rc = netfs_start_io_write(inode); @@ -2870,7 +2869,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from) if (rc <= 0) goto out; - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) && + if ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) && (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from), server->vals->exclusive_lock_type, 0, NULL, CIFS_WRITE_OP))) { @@ -2893,7 +2892,7 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); struct cifsInodeInfo *cinode = CIFS_I(inode); - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); struct cifsFileInfo *cfile = (struct cifsFileInfo *) iocb->ki_filp->private_data; struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); @@ -2906,7 +2905,7 @@ cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from) if (CIFS_CACHE_WRITE(cinode)) { if (cap_unix(tcon->ses) && (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) { + ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) == 0)) { written = netfs_file_write_iter(iocb, from); goto out; } @@ -2994,7 +2993,7 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); struct cifsInodeInfo *cinode = CIFS_I(inode); - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); struct cifsFileInfo *cfile = (struct cifsFileInfo *) iocb->ki_filp->private_data; struct cifs_tcon *tcon = tlink_tcon(cfile->tlink); @@ -3011,7 +3010,7 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to) if (!CIFS_CACHE_READ(cinode)) return netfs_unbuffered_read_iter(iocb, to); - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0) { + if ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NOPOSIXBRL) == 0) { if (iocb->ki_flags & IOCB_DIRECT) return netfs_unbuffered_read_iter(iocb, to); return netfs_buffered_read_iter(iocb, to); @@ -3130,10 +3129,9 @@ bool is_size_safe_to_change(struct cifsInodeInfo *cifsInode, __u64 end_of_file, if (is_inode_writable(cifsInode) || ((cifsInode->oplock & CIFS_CACHE_RW_FLG) != 0 && from_readdir)) { /* This inode is open for write at least once */ - struct cifs_sb_info *cifs_sb; + struct cifs_sb_info *cifs_sb = CIFS_SB(cifsInode); - cifs_sb = CIFS_SB(cifsInode->netfs.inode.i_sb); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) { + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_DIRECT_IO) { /* since no page cache to corrupt on directio we can change size safely */ return true; @@ -3181,7 +3179,7 @@ void cifs_oplock_break(struct work_struct *work) server = tcon->ses->server; scoped_guard(spinlock, &cinode->open_file_lock) { - unsigned int sbflags = cifs_sb->mnt_cifs_flags; + unsigned int sbflags = cifs_sb_flags(cifs_sb); server->ops->downgrade_oplock(server, cinode, cfile->oplock_level, cfile->oplock_epoch, &purge_cache); diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c index 09fe749e97ee..54090739535f 100644 --- a/fs/smb/client/fs_context.c +++ b/fs/smb/client/fs_context.c @@ -2062,161 +2062,160 @@ smb3_cleanup_fs_context(struct smb3_fs_context *ctx) kfree(ctx); } -void smb3_update_mnt_flags(struct cifs_sb_info *cifs_sb) +unsigned int smb3_update_mnt_flags(struct cifs_sb_info *cifs_sb) { + unsigned int sbflags = cifs_sb_flags(cifs_sb); struct smb3_fs_context *ctx = cifs_sb->ctx; if (ctx->nodfs) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NO_DFS; + sbflags |= CIFS_MOUNT_NO_DFS; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NO_DFS; + sbflags &= ~CIFS_MOUNT_NO_DFS; if (ctx->noperm) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NO_PERM; + sbflags |= CIFS_MOUNT_NO_PERM; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NO_PERM; + sbflags &= ~CIFS_MOUNT_NO_PERM; if (ctx->setuids) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_SET_UID; + sbflags |= CIFS_MOUNT_SET_UID; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_SET_UID; + sbflags &= ~CIFS_MOUNT_SET_UID; if (ctx->setuidfromacl) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_UID_FROM_ACL; + sbflags |= CIFS_MOUNT_UID_FROM_ACL; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_UID_FROM_ACL; + sbflags &= ~CIFS_MOUNT_UID_FROM_ACL; if (ctx->server_ino) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_SERVER_INUM; + sbflags |= CIFS_MOUNT_SERVER_INUM; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_SERVER_INUM; + sbflags &= ~CIFS_MOUNT_SERVER_INUM; if (ctx->remap) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_MAP_SFM_CHR; + sbflags |= CIFS_MOUNT_MAP_SFM_CHR; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_MAP_SFM_CHR; + sbflags &= ~CIFS_MOUNT_MAP_SFM_CHR; if (ctx->sfu_remap) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_MAP_SPECIAL_CHR; + sbflags |= CIFS_MOUNT_MAP_SPECIAL_CHR; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_MAP_SPECIAL_CHR; + sbflags &= ~CIFS_MOUNT_MAP_SPECIAL_CHR; if (ctx->no_xattr) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NO_XATTR; + sbflags |= CIFS_MOUNT_NO_XATTR; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NO_XATTR; + sbflags &= ~CIFS_MOUNT_NO_XATTR; if (ctx->sfu_emul) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_UNX_EMUL; + sbflags |= CIFS_MOUNT_UNX_EMUL; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_UNX_EMUL; + sbflags &= ~CIFS_MOUNT_UNX_EMUL; if (ctx->nobrl) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NO_BRL; + sbflags |= CIFS_MOUNT_NO_BRL; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NO_BRL; + sbflags &= ~CIFS_MOUNT_NO_BRL; if (ctx->nohandlecache) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NO_HANDLE_CACHE; + sbflags |= CIFS_MOUNT_NO_HANDLE_CACHE; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NO_HANDLE_CACHE; + sbflags &= ~CIFS_MOUNT_NO_HANDLE_CACHE; if (ctx->nostrictsync) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NOSSYNC; + sbflags |= CIFS_MOUNT_NOSSYNC; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NOSSYNC; + sbflags &= ~CIFS_MOUNT_NOSSYNC; if (ctx->mand_lock) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_NOPOSIXBRL; + sbflags |= CIFS_MOUNT_NOPOSIXBRL; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_NOPOSIXBRL; + sbflags &= ~CIFS_MOUNT_NOPOSIXBRL; if (ctx->rwpidforward) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_RWPIDFORWARD; + sbflags |= CIFS_MOUNT_RWPIDFORWARD; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_RWPIDFORWARD; + sbflags &= ~CIFS_MOUNT_RWPIDFORWARD; if (ctx->mode_ace) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_MODE_FROM_SID; + sbflags |= CIFS_MOUNT_MODE_FROM_SID; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_MODE_FROM_SID; + sbflags &= ~CIFS_MOUNT_MODE_FROM_SID; if (ctx->cifs_acl) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_CIFS_ACL; + sbflags |= CIFS_MOUNT_CIFS_ACL; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_CIFS_ACL; + sbflags &= ~CIFS_MOUNT_CIFS_ACL; if (ctx->backupuid_specified) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_CIFS_BACKUPUID; + sbflags |= CIFS_MOUNT_CIFS_BACKUPUID; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_CIFS_BACKUPUID; + sbflags &= ~CIFS_MOUNT_CIFS_BACKUPUID; if (ctx->backupgid_specified) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_CIFS_BACKUPGID; + sbflags |= CIFS_MOUNT_CIFS_BACKUPGID; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_CIFS_BACKUPGID; + sbflags &= ~CIFS_MOUNT_CIFS_BACKUPGID; if (ctx->override_uid) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_OVERR_UID; + sbflags |= CIFS_MOUNT_OVERR_UID; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_OVERR_UID; + sbflags &= ~CIFS_MOUNT_OVERR_UID; if (ctx->override_gid) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_OVERR_GID; + sbflags |= CIFS_MOUNT_OVERR_GID; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_OVERR_GID; + sbflags &= ~CIFS_MOUNT_OVERR_GID; if (ctx->dynperm) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DYNPERM; + sbflags |= CIFS_MOUNT_DYNPERM; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_DYNPERM; + sbflags &= ~CIFS_MOUNT_DYNPERM; if (ctx->fsc) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_FSCACHE; + sbflags |= CIFS_MOUNT_FSCACHE; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_FSCACHE; + sbflags &= ~CIFS_MOUNT_FSCACHE; if (ctx->multiuser) - cifs_sb->mnt_cifs_flags |= (CIFS_MOUNT_MULTIUSER | - CIFS_MOUNT_NO_PERM); + sbflags |= CIFS_MOUNT_MULTIUSER | CIFS_MOUNT_NO_PERM; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_MULTIUSER; + sbflags &= ~CIFS_MOUNT_MULTIUSER; if (ctx->strict_io) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_STRICT_IO; + sbflags |= CIFS_MOUNT_STRICT_IO; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_STRICT_IO; + sbflags &= ~CIFS_MOUNT_STRICT_IO; if (ctx->direct_io) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_IO; + sbflags |= CIFS_MOUNT_DIRECT_IO; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_DIRECT_IO; + sbflags &= ~CIFS_MOUNT_DIRECT_IO; if (ctx->mfsymlinks) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_MF_SYMLINKS; + sbflags |= CIFS_MOUNT_MF_SYMLINKS; else - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_MF_SYMLINKS; - if (ctx->mfsymlinks) { - if (ctx->sfu_emul) { - /* - * Our SFU ("Services for Unix") emulation allows now - * creating new and reading existing SFU symlinks. - * Older Linux kernel versions were not able to neither - * read existing nor create new SFU symlinks. But - * creating and reading SFU style mknod and FIFOs was - * supported for long time. When "mfsymlinks" and - * "sfu" are both enabled at the same time, it allows - * reading both types of symlinks, but will only create - * them with mfsymlinks format. This allows better - * Apple compatibility, compatibility with older Linux - * kernel clients (probably better for Samba too) - * while still recognizing old Windows style symlinks. - */ - cifs_dbg(VFS, "mount options mfsymlinks and sfu both enabled\n"); - } - } - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_SHUTDOWN; + sbflags &= ~CIFS_MOUNT_MF_SYMLINKS; - return; + if (ctx->mfsymlinks && ctx->sfu_emul) { + /* + * Our SFU ("Services for Unix") emulation allows now + * creating new and reading existing SFU symlinks. + * Older Linux kernel versions were not able to neither + * read existing nor create new SFU symlinks. But + * creating and reading SFU style mknod and FIFOs was + * supported for long time. When "mfsymlinks" and + * "sfu" are both enabled at the same time, it allows + * reading both types of symlinks, but will only create + * them with mfsymlinks format. This allows better + * Apple compatibility, compatibility with older Linux + * kernel clients (probably better for Samba too) + * while still recognizing old Windows style symlinks. + */ + cifs_dbg(VFS, "mount options mfsymlinks and sfu both enabled\n"); + } + sbflags &= ~CIFS_MOUNT_SHUTDOWN; + atomic_set(&cifs_sb->mnt_cifs_flags, sbflags); + return sbflags; } diff --git a/fs/smb/client/fs_context.h b/fs/smb/client/fs_context.h index 49b2a6f09ca2..0b64fcb5d302 100644 --- a/fs/smb/client/fs_context.h +++ b/fs/smb/client/fs_context.h @@ -374,7 +374,7 @@ int smb3_fs_context_dup(struct smb3_fs_context *new_ctx, struct smb3_fs_context *ctx); int smb3_sync_session_ctx_passwords(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses); -void smb3_update_mnt_flags(struct cifs_sb_info *cifs_sb); +unsigned int smb3_update_mnt_flags(struct cifs_sb_info *cifs_sb); /* * max deferred close timeout (jiffies) - 2^30 diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c index d4d3cfeb6c90..3e844c55ab8a 100644 --- a/fs/smb/client/inode.c +++ b/fs/smb/client/inode.c @@ -40,32 +40,33 @@ static void cifs_set_netfs_context(struct inode *inode) static void cifs_set_ops(struct inode *inode) { - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); struct netfs_inode *ictx = netfs_inode(inode); + unsigned int sbflags = cifs_sb_flags(cifs_sb); switch (inode->i_mode & S_IFMT) { case S_IFREG: inode->i_op = &cifs_file_inode_ops; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) { + if (sbflags & CIFS_MOUNT_DIRECT_IO) { set_bit(NETFS_ICTX_UNBUFFERED, &ictx->flags); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) + if (sbflags & CIFS_MOUNT_NO_BRL) inode->i_fop = &cifs_file_direct_nobrl_ops; else inode->i_fop = &cifs_file_direct_ops; - } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_STRICT_IO) { - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) + } else if (sbflags & CIFS_MOUNT_STRICT_IO) { + if (sbflags & CIFS_MOUNT_NO_BRL) inode->i_fop = &cifs_file_strict_nobrl_ops; else inode->i_fop = &cifs_file_strict_ops; - } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL) + } else if (sbflags & CIFS_MOUNT_NO_BRL) inode->i_fop = &cifs_file_nobrl_ops; else { /* not direct, send byte range locks */ inode->i_fop = &cifs_file_ops; } /* check if server can support readahead */ - if (cifs_sb_master_tcon(cifs_sb)->ses->server->max_read < - PAGE_SIZE + MAX_CIFS_HDR_SIZE) + if (tcon->ses->server->max_read < PAGE_SIZE + MAX_CIFS_HDR_SIZE) inode->i_data.a_ops = &cifs_addr_ops_smallbuf; else inode->i_data.a_ops = &cifs_addr_ops; @@ -194,8 +195,8 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr, inode->i_gid = fattr->cf_gid; /* if dynperm is set, don't clobber existing mode */ - if (inode_state_read(inode) & I_NEW || - !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM)) + if ((inode_state_read(inode) & I_NEW) || + !(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_DYNPERM)) inode->i_mode = fattr->cf_mode; cifs_i->cifsAttrs = fattr->cf_cifsattrs; @@ -248,10 +249,8 @@ cifs_fill_uniqueid(struct super_block *sb, struct cifs_fattr *fattr) { struct cifs_sb_info *cifs_sb = CIFS_SB(sb); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) - return; - - fattr->cf_uniqueid = iunique(sb, ROOT_I); + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM)) + fattr->cf_uniqueid = iunique(sb, ROOT_I); } /* Fill a cifs_fattr struct with info from FILE_UNIX_BASIC_INFO. */ @@ -259,6 +258,8 @@ void cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info, struct cifs_sb_info *cifs_sb) { + unsigned int sbflags; + memset(fattr, 0, sizeof(*fattr)); fattr->cf_uniqueid = le64_to_cpu(info->UniqueId); fattr->cf_bytes = le64_to_cpu(info->NumOfBytes); @@ -317,8 +318,9 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info, break; } + sbflags = cifs_sb_flags(cifs_sb); fattr->cf_uid = cifs_sb->ctx->linux_uid; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)) { + if (!(sbflags & CIFS_MOUNT_OVERR_UID)) { u64 id = le64_to_cpu(info->Uid); if (id < ((uid_t)-1)) { kuid_t uid = make_kuid(&init_user_ns, id); @@ -328,7 +330,7 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info, } fattr->cf_gid = cifs_sb->ctx->linux_gid; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) { + if (!(sbflags & CIFS_MOUNT_OVERR_GID)) { u64 id = le64_to_cpu(info->Gid); if (id < ((gid_t)-1)) { kgid_t gid = make_kgid(&init_user_ns, id); @@ -382,7 +384,7 @@ static int update_inode_info(struct super_block *sb, * * If file type or uniqueid is different, return error. */ - if (unlikely((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) && + if (unlikely((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM) && CIFS_I(*inode)->uniqueid != fattr->cf_uniqueid)) { CIFS_I(*inode)->time = 0; /* force reval */ return -ESTALE; @@ -468,7 +470,7 @@ static int cifs_get_unix_fattr(const unsigned char *full_path, cifs_fill_uniqueid(sb, fattr); /* check for Minshall+French symlinks */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MF_SYMLINKS) { + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_MF_SYMLINKS) { tmprc = check_mf_symlink(xid, tcon, cifs_sb, fattr, full_path); cifs_dbg(FYI, "check_mf_symlink: %d\n", tmprc); } @@ -1081,7 +1083,7 @@ cifs_backup_query_path_info(int xid, else if ((tcon->ses->capabilities & tcon->ses->server->vals->cap_nt_find) == 0) info.info_level = SMB_FIND_FILE_INFO_STANDARD; - else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) + else if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM) info.info_level = SMB_FIND_FILE_ID_FULL_DIR_INFO; else /* no srvino useful for fallback to some netapp */ info.info_level = SMB_FIND_FILE_DIRECTORY_INFO; @@ -1109,7 +1111,7 @@ static void cifs_set_fattr_ino(int xid, struct cifs_tcon *tcon, struct super_blo struct TCP_Server_Info *server = tcon->ses->server; int rc; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM)) { + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM)) { if (*inode) fattr->cf_uniqueid = CIFS_I(*inode)->uniqueid; else @@ -1263,14 +1265,15 @@ static int cifs_get_fattr(struct cifs_open_info_data *data, struct inode **inode, const char *full_path) { - struct cifs_open_info_data tmp_data = {}; - struct cifs_tcon *tcon; - struct TCP_Server_Info *server; - struct tcon_link *tlink; struct cifs_sb_info *cifs_sb = CIFS_SB(sb); + struct cifs_open_info_data tmp_data = {}; void *smb1_backup_rsp_buf = NULL; - int rc = 0; + struct TCP_Server_Info *server; + struct cifs_tcon *tcon; + struct tcon_link *tlink; + unsigned int sbflags; int tmprc = 0; + int rc = 0; tlink = cifs_sb_tlink(cifs_sb); if (IS_ERR(tlink)) @@ -1370,16 +1373,17 @@ static int cifs_get_fattr(struct cifs_open_info_data *data, #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY handle_mnt_opt: #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ + sbflags = cifs_sb_flags(cifs_sb); /* query for SFU type info if supported and needed */ if ((fattr->cf_cifsattrs & ATTR_SYSTEM) && - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL)) { + (sbflags & CIFS_MOUNT_UNX_EMUL)) { tmprc = cifs_sfu_type(fattr, full_path, cifs_sb, xid); if (tmprc) cifs_dbg(FYI, "cifs_sfu_type failed: %d\n", tmprc); } /* fill in 0777 bits from ACL */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID) { + if (sbflags & CIFS_MOUNT_MODE_FROM_SID) { rc = cifs_acl_to_fattr(cifs_sb, fattr, *inode, true, full_path, fid); if (rc == -EREMOTE) @@ -1389,7 +1393,7 @@ handle_mnt_opt: __func__, rc); goto out; } - } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) { + } else if (sbflags & CIFS_MOUNT_CIFS_ACL) { rc = cifs_acl_to_fattr(cifs_sb, fattr, *inode, false, full_path, fid); if (rc == -EREMOTE) @@ -1399,7 +1403,7 @@ handle_mnt_opt: __func__, rc); goto out; } - } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL) + } else if (sbflags & CIFS_MOUNT_UNX_EMUL) /* fill in remaining high mode bits e.g. SUID, VTX */ cifs_sfu_mode(fattr, full_path, cifs_sb, xid); else if (!(tcon->posix_extensions)) @@ -1409,7 +1413,7 @@ handle_mnt_opt: /* check for Minshall+French symlinks */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MF_SYMLINKS) { + if (sbflags & CIFS_MOUNT_MF_SYMLINKS) { tmprc = check_mf_symlink(xid, tcon, cifs_sb, fattr, full_path); cifs_dbg(FYI, "check_mf_symlink: %d\n", tmprc); } @@ -1509,7 +1513,7 @@ static int smb311_posix_get_fattr(struct cifs_open_info_data *data, * 3. Tweak fattr based on mount options */ /* check for Minshall+French symlinks */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MF_SYMLINKS) { + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_MF_SYMLINKS) { tmprc = check_mf_symlink(xid, tcon, cifs_sb, fattr, full_path); cifs_dbg(FYI, "check_mf_symlink: %d\n", tmprc); } @@ -1660,7 +1664,7 @@ struct inode *cifs_root_iget(struct super_block *sb) int len; int rc; - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_USE_PREFIX_PATH) + if ((cifs_sb_flags(cifs_sb) & CIFS_MOUNT_USE_PREFIX_PATH) && cifs_sb->prepath) { len = strlen(cifs_sb->prepath); path = kzalloc(len + 2 /* leading sep + null */, GFP_KERNEL); @@ -2098,8 +2102,9 @@ cifs_mkdir_qinfo(struct inode *parent, struct dentry *dentry, umode_t mode, const char *full_path, struct cifs_sb_info *cifs_sb, struct cifs_tcon *tcon, const unsigned int xid) { - int rc = 0; struct inode *inode = NULL; + unsigned int sbflags; + int rc = 0; if (tcon->posix_extensions) { rc = smb311_posix_get_inode_info(&inode, full_path, @@ -2139,6 +2144,7 @@ cifs_mkdir_qinfo(struct inode *parent, struct dentry *dentry, umode_t mode, if (parent->i_mode & S_ISGID) mode |= S_ISGID; + sbflags = cifs_sb_flags(cifs_sb); #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY if (tcon->unix_ext) { struct cifs_unix_set_info_args args = { @@ -2148,7 +2154,7 @@ cifs_mkdir_qinfo(struct inode *parent, struct dentry *dentry, umode_t mode, .mtime = NO_CHANGE_64, .device = 0, }; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) { + if (sbflags & CIFS_MOUNT_SET_UID) { args.uid = current_fsuid(); if (parent->i_mode & S_ISGID) args.gid = parent->i_gid; @@ -2166,14 +2172,14 @@ cifs_mkdir_qinfo(struct inode *parent, struct dentry *dentry, umode_t mode, { #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ struct TCP_Server_Info *server = tcon->ses->server; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) && + if (!(sbflags & CIFS_MOUNT_CIFS_ACL) && (mode & S_IWUGO) == 0 && server->ops->mkdir_setinfo) server->ops->mkdir_setinfo(inode, full_path, cifs_sb, tcon, xid); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM) + if (sbflags & CIFS_MOUNT_DYNPERM) inode->i_mode = (mode | S_IFDIR); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) { + if (sbflags & CIFS_MOUNT_SET_UID) { inode->i_uid = current_fsuid(); if (inode->i_mode & S_ISGID) inode->i_gid = parent->i_gid; @@ -2686,7 +2692,7 @@ cifs_dentry_needs_reval(struct dentry *dentry) { struct inode *inode = d_inode(dentry); struct cifsInodeInfo *cifs_i = CIFS_I(inode); - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); struct cached_fid *cfid = NULL; @@ -2727,7 +2733,7 @@ cifs_dentry_needs_reval(struct dentry *dentry) } /* hardlinked files w/ noserverino get "special" treatment */ - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) && + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM) && S_ISREG(inode->i_mode) && inode->i_nlink != 1) return true; @@ -2752,10 +2758,10 @@ cifs_wait_bit_killable(struct wait_bit_key *key, int mode) int cifs_revalidate_mapping(struct inode *inode) { - int rc; struct cifsInodeInfo *cifs_inode = CIFS_I(inode); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); unsigned long *flags = &cifs_inode->flags; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + int rc; /* swapfiles are not supposed to be shared */ if (IS_SWAPFILE(inode)) @@ -2768,7 +2774,7 @@ cifs_revalidate_mapping(struct inode *inode) if (test_and_clear_bit(CIFS_INO_INVALID_MAPPING, flags)) { /* for cache=singleclient, do not invalidate */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RW_CACHE) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_RW_CACHE) goto skip_invalidate; cifs_inode->netfs.zero_point = cifs_inode->netfs.remote_i_size; @@ -2892,10 +2898,11 @@ int cifs_revalidate_dentry(struct dentry *dentry) int cifs_getattr(struct mnt_idmap *idmap, const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags) { - struct dentry *dentry = path->dentry; - struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(path->dentry); struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); + struct dentry *dentry = path->dentry; struct inode *inode = d_inode(dentry); + unsigned int sbflags; int rc; if (unlikely(cifs_forced_shutdown(CIFS_SB(inode->i_sb)))) @@ -2952,12 +2959,13 @@ int cifs_getattr(struct mnt_idmap *idmap, const struct path *path, * enabled, and the admin hasn't overridden them, set the ownership * to the fsuid/fsgid of the current process. */ - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER) && - !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) && + sbflags = cifs_sb_flags(cifs_sb); + if ((sbflags & CIFS_MOUNT_MULTIUSER) && + !(sbflags & CIFS_MOUNT_CIFS_ACL) && !tcon->unix_ext) { - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID)) + if (!(sbflags & CIFS_MOUNT_OVERR_UID)) stat->uid = current_fsuid(); - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) + if (!(sbflags & CIFS_MOUNT_OVERR_GID)) stat->gid = current_fsgid(); } return 0; @@ -3102,7 +3110,7 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs) void *page = alloc_dentry_path(); struct inode *inode = d_inode(direntry); struct cifsInodeInfo *cifsInode = CIFS_I(inode); - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); struct tcon_link *tlink; struct cifs_tcon *pTcon; struct cifs_unix_set_info_args *args = NULL; @@ -3113,7 +3121,7 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs) xid = get_xid(); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NO_PERM) attrs->ia_valid |= ATTR_FORCE; rc = setattr_prepare(&nop_mnt_idmap, direntry, attrs); @@ -3266,26 +3274,26 @@ out: static int cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) { - unsigned int xid; + struct inode *inode = d_inode(direntry); + struct cifsInodeInfo *cifsInode = CIFS_I(inode); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + unsigned int sbflags = cifs_sb_flags(cifs_sb); + struct cifsFileInfo *cfile = NULL; + void *page = alloc_dentry_path(); + __u64 mode = NO_CHANGE_64; kuid_t uid = INVALID_UID; kgid_t gid = INVALID_GID; - struct inode *inode = d_inode(direntry); - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); - struct cifsInodeInfo *cifsInode = CIFS_I(inode); - struct cifsFileInfo *cfile = NULL; const char *full_path; - void *page = alloc_dentry_path(); - int rc = -EACCES; __u32 dosattr = 0; - __u64 mode = NO_CHANGE_64; - bool posix = cifs_sb_master_tcon(cifs_sb)->posix_extensions; + int rc = -EACCES; + unsigned int xid; xid = get_xid(); cifs_dbg(FYI, "setattr on file %pd attrs->ia_valid 0x%x\n", direntry, attrs->ia_valid); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_PERM) + if (sbflags & CIFS_MOUNT_NO_PERM) attrs->ia_valid |= ATTR_FORCE; rc = setattr_prepare(&nop_mnt_idmap, direntry, attrs); @@ -3346,8 +3354,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) if (attrs->ia_valid & ATTR_GID) gid = attrs->ia_gid; - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) || - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID)) { + if (sbflags & (CIFS_MOUNT_CIFS_ACL | CIFS_MOUNT_MODE_FROM_SID)) { if (uid_valid(uid) || gid_valid(gid)) { mode = NO_CHANGE_64; rc = id_mode_to_cifs_acl(inode, full_path, &mode, @@ -3358,9 +3365,9 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) goto cifs_setattr_exit; } } - } else - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID)) + } else if (!(sbflags & CIFS_MOUNT_SET_UID)) { attrs->ia_valid &= ~(ATTR_UID | ATTR_GID); + } /* skip mode change if it's just for clearing setuid/setgid */ if (attrs->ia_valid & (ATTR_KILL_SUID|ATTR_KILL_SGID)) @@ -3369,9 +3376,8 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) if (attrs->ia_valid & ATTR_MODE) { mode = attrs->ia_mode; rc = 0; - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) || - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID) || - posix) { + if ((sbflags & (CIFS_MOUNT_CIFS_ACL | CIFS_MOUNT_MODE_FROM_SID)) || + cifs_sb_master_tcon(cifs_sb)->posix_extensions) { rc = id_mode_to_cifs_acl(inode, full_path, &mode, INVALID_UID, INVALID_GID); if (rc) { @@ -3393,7 +3399,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) dosattr = cifsInode->cifsAttrs | ATTR_READONLY; /* fix up mode if we're not using dynperm */ - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM) == 0) + if ((sbflags & CIFS_MOUNT_DYNPERM) == 0) attrs->ia_mode = inode->i_mode & ~S_IWUGO; } else if ((mode & S_IWUGO) && (cifsInode->cifsAttrs & ATTR_READONLY)) { @@ -3404,7 +3410,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) dosattr |= ATTR_NORMAL; /* reset local inode permissions to normal */ - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM)) { + if (!(sbflags & CIFS_MOUNT_DYNPERM)) { attrs->ia_mode &= ~(S_IALLUGO); if (S_ISDIR(inode->i_mode)) attrs->ia_mode |= @@ -3413,7 +3419,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs) attrs->ia_mode |= cifs_sb->ctx->file_mode; } - } else if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM)) { + } else if (!(sbflags & CIFS_MOUNT_DYNPERM)) { /* ignore mode change - ATTR_READONLY hasn't changed */ attrs->ia_valid &= ~ATTR_MODE; } diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c index 8dc2651e237f..9afab3237e54 100644 --- a/fs/smb/client/ioctl.c +++ b/fs/smb/client/ioctl.c @@ -216,7 +216,7 @@ static int cifs_shutdown(struct super_block *sb, unsigned long arg) */ case CIFS_GOING_FLAGS_LOGFLUSH: case CIFS_GOING_FLAGS_NOLOGFLUSH: - sbi->mnt_cifs_flags |= CIFS_MOUNT_SHUTDOWN; + atomic_or(CIFS_MOUNT_SHUTDOWN, &sbi->mnt_cifs_flags); goto shutdown_good; default: rc = -EINVAL; diff --git a/fs/smb/client/link.c b/fs/smb/client/link.c index a2f7bfa8ad1e..434e8fe74080 100644 --- a/fs/smb/client/link.c +++ b/fs/smb/client/link.c @@ -544,14 +544,15 @@ int cifs_symlink(struct mnt_idmap *idmap, struct inode *inode, struct dentry *direntry, const char *symname) { - int rc = -EOPNOTSUPP; - unsigned int xid; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + struct inode *newinode = NULL; struct tcon_link *tlink; struct cifs_tcon *pTcon; const char *full_path; + int rc = -EOPNOTSUPP; + unsigned int sbflags; + unsigned int xid; void *page; - struct inode *newinode = NULL; if (unlikely(cifs_forced_shutdown(cifs_sb))) return smb_EIO(smb_eio_trace_forced_shutdown); @@ -580,6 +581,7 @@ cifs_symlink(struct mnt_idmap *idmap, struct inode *inode, cifs_dbg(FYI, "symname is %s\n", symname); /* BB what if DFS and this volume is on different share? BB */ + sbflags = cifs_sb_flags(cifs_sb); rc = -EOPNOTSUPP; switch (cifs_symlink_type(cifs_sb)) { case CIFS_SYMLINK_TYPE_UNIX: @@ -594,14 +596,14 @@ cifs_symlink(struct mnt_idmap *idmap, struct inode *inode, break; case CIFS_SYMLINK_TYPE_MFSYMLINKS: - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MF_SYMLINKS) { + if (sbflags & CIFS_MOUNT_MF_SYMLINKS) { rc = create_mf_symlink(xid, pTcon, cifs_sb, full_path, symname); } break; case CIFS_SYMLINK_TYPE_SFU: - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL) { + if (sbflags & CIFS_MOUNT_UNX_EMUL) { rc = __cifs_sfu_make_node(xid, inode, direntry, pTcon, full_path, S_IFLNK, 0, symname); diff --git a/fs/smb/client/misc.c b/fs/smb/client/misc.c index 22cde46309fe..bc24c92b8b95 100644 --- a/fs/smb/client/misc.c +++ b/fs/smb/client/misc.c @@ -275,13 +275,15 @@ dump_smb(void *buf, int smb_buf_length) void cifs_autodisable_serverino(struct cifs_sb_info *cifs_sb) { - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) { + unsigned int sbflags = cifs_sb_flags(cifs_sb); + + if (sbflags & CIFS_MOUNT_SERVER_INUM) { struct cifs_tcon *tcon = NULL; if (cifs_sb->master_tlink) tcon = cifs_sb_master_tcon(cifs_sb); - cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_SERVER_INUM; + atomic_andnot(CIFS_MOUNT_SERVER_INUM, &cifs_sb->mnt_cifs_flags); cifs_sb->mnt_cifs_serverino_autodisabled = true; cifs_dbg(VFS, "Autodisabling the use of server inode numbers on %s\n", tcon ? tcon->tree_name : "new server"); @@ -382,11 +384,13 @@ void cifs_done_oplock_break(struct cifsInodeInfo *cinode) bool backup_cred(struct cifs_sb_info *cifs_sb) { - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_BACKUPUID) { + unsigned int sbflags = cifs_sb_flags(cifs_sb); + + if (sbflags & CIFS_MOUNT_CIFS_BACKUPUID) { if (uid_eq(cifs_sb->ctx->backupuid, current_fsuid())) return true; } - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_BACKUPGID) { + if (sbflags & CIFS_MOUNT_CIFS_BACKUPGID) { if (in_group_p(cifs_sb->ctx->backupgid)) return true; } @@ -955,7 +959,7 @@ int cifs_update_super_prepath(struct cifs_sb_info *cifs_sb, char *prefix) convert_delimiter(cifs_sb->prepath, CIFS_DIR_SEP(cifs_sb)); } - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH; + atomic_or(CIFS_MOUNT_USE_PREFIX_PATH, &cifs_sb->mnt_cifs_flags); return 0; } @@ -984,7 +988,7 @@ int cifs_inval_name_dfs_link_error(const unsigned int xid, * look up or tcon is not DFS. */ if (strlen(full_path) < 2 || !cifs_sb || - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_DFS) || + (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NO_DFS) || !is_tcon_dfs(tcon)) return 0; diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c index 8615a8747b7f..be22bbc4a65a 100644 --- a/fs/smb/client/readdir.c +++ b/fs/smb/client/readdir.c @@ -121,7 +121,7 @@ retry: * want to clobber the existing one with the one that * the readdir code created. */ - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM)) + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_SERVER_INUM)) fattr->cf_uniqueid = CIFS_I(inode)->uniqueid; /* @@ -177,6 +177,7 @@ cifs_fill_common_info(struct cifs_fattr *fattr, struct cifs_sb_info *cifs_sb) struct cifs_open_info_data data = { .reparse = { .tag = fattr->cf_cifstag, }, }; + unsigned int sbflags; fattr->cf_uid = cifs_sb->ctx->linux_uid; fattr->cf_gid = cifs_sb->ctx->linux_gid; @@ -215,12 +216,12 @@ out_reparse: * may look wrong since the inodes may not have timed out by the time * "ls" does a stat() call on them. */ - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_CIFS_ACL) || - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID)) + sbflags = cifs_sb_flags(cifs_sb); + if (sbflags & (CIFS_MOUNT_CIFS_ACL | CIFS_MOUNT_MODE_FROM_SID)) fattr->cf_flags |= CIFS_FATTR_NEED_REVAL; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL && - fattr->cf_cifsattrs & ATTR_SYSTEM) { + if ((sbflags & CIFS_MOUNT_UNX_EMUL) && + (fattr->cf_cifsattrs & ATTR_SYSTEM)) { if (fattr->cf_eof == 0) { fattr->cf_mode &= ~S_IFMT; fattr->cf_mode |= S_IFIFO; @@ -345,13 +346,14 @@ static int _initiate_cifs_search(const unsigned int xid, struct file *file, const char *full_path) { + struct cifs_sb_info *cifs_sb = CIFS_SB(file); + struct tcon_link *tlink = NULL; + struct TCP_Server_Info *server; + struct cifsFileInfo *cifsFile; + struct cifs_tcon *tcon; + unsigned int sbflags; __u16 search_flags; int rc = 0; - struct cifsFileInfo *cifsFile; - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file); - struct tcon_link *tlink = NULL; - struct cifs_tcon *tcon; - struct TCP_Server_Info *server; if (file->private_data == NULL) { tlink = cifs_sb_tlink(cifs_sb); @@ -385,6 +387,7 @@ _initiate_cifs_search(const unsigned int xid, struct file *file, cifs_dbg(FYI, "Full path: %s start at: %lld\n", full_path, file->f_pos); ffirst_retry: + sbflags = cifs_sb_flags(cifs_sb); /* test for Unix extensions */ /* but now check for them on the share/mount not on the SMB session */ /* if (cap_unix(tcon->ses) { */ @@ -395,7 +398,7 @@ ffirst_retry: else if ((tcon->ses->capabilities & tcon->ses->server->vals->cap_nt_find) == 0) { cifsFile->srch_inf.info_level = SMB_FIND_FILE_INFO_STANDARD; - } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) { + } else if (sbflags & CIFS_MOUNT_SERVER_INUM) { cifsFile->srch_inf.info_level = SMB_FIND_FILE_ID_FULL_DIR_INFO; } else /* not srvinos - BB fixme add check for backlevel? */ { cifsFile->srch_inf.info_level = SMB_FIND_FILE_FULL_DIRECTORY_INFO; @@ -411,8 +414,7 @@ ffirst_retry: if (rc == 0) { cifsFile->invalidHandle = false; - } else if ((rc == -EOPNOTSUPP) && - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM)) { + } else if (rc == -EOPNOTSUPP && (sbflags & CIFS_MOUNT_SERVER_INUM)) { cifs_autodisable_serverino(cifs_sb); goto ffirst_retry; } @@ -690,7 +692,7 @@ find_cifs_entry(const unsigned int xid, struct cifs_tcon *tcon, loff_t pos, loff_t first_entry_in_buffer; loff_t index_to_find = pos; struct cifsFileInfo *cfile = file->private_data; - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file); + struct cifs_sb_info *cifs_sb = CIFS_SB(file); struct TCP_Server_Info *server = tcon->ses->server; /* check if index in the buffer */ @@ -955,6 +957,7 @@ static int cifs_filldir(char *find_entry, struct file *file, struct cifs_sb_info *cifs_sb = CIFS_SB(sb); struct cifs_dirent de = { NULL, }; struct cifs_fattr fattr; + unsigned int sbflags; struct qstr name; int rc = 0; @@ -1019,15 +1022,15 @@ static int cifs_filldir(char *find_entry, struct file *file, break; } - if (de.ino && (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM)) { + sbflags = cifs_sb_flags(cifs_sb); + if (de.ino && (sbflags & CIFS_MOUNT_SERVER_INUM)) { fattr.cf_uniqueid = de.ino; } else { fattr.cf_uniqueid = iunique(sb, ROOT_I); cifs_autodisable_serverino(cifs_sb); } - if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MF_SYMLINKS) && - couldbe_mf_symlink(&fattr)) + if ((sbflags & CIFS_MOUNT_MF_SYMLINKS) && couldbe_mf_symlink(&fattr)) /* * trying to get the type and mode can be slow, * so just call those regular files for now, and mark @@ -1058,7 +1061,7 @@ int cifs_readdir(struct file *file, struct dir_context *ctx) const char *full_path; void *page = alloc_dentry_path(); struct cached_fid *cfid = NULL; - struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file); + struct cifs_sb_info *cifs_sb = CIFS_SB(file); xid = get_xid(); diff --git a/fs/smb/client/reparse.c b/fs/smb/client/reparse.c index ce9b923498b5..cd1e1eaee67a 100644 --- a/fs/smb/client/reparse.c +++ b/fs/smb/client/reparse.c @@ -55,17 +55,18 @@ static int create_native_symlink(const unsigned int xid, struct inode *inode, const char *full_path, const char *symname) { struct reparse_symlink_data_buffer *buf = NULL; - struct cifs_open_info_data data = {}; - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); const char *symroot = cifs_sb->ctx->symlinkroot; - struct inode *new; - struct kvec iov; - __le16 *path = NULL; - bool directory; - char *symlink_target = NULL; - char *sym = NULL; + struct cifs_open_info_data data = {}; char sep = CIFS_DIR_SEP(cifs_sb); + char *symlink_target = NULL; u16 len, plen, poff, slen; + unsigned int sbflags; + __le16 *path = NULL; + struct inode *new; + char *sym = NULL; + struct kvec iov; + bool directory; int rc = 0; if (strlen(symname) > REPARSE_SYM_PATH_MAX) @@ -83,8 +84,8 @@ static int create_native_symlink(const unsigned int xid, struct inode *inode, .symlink_target = symlink_target, }; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) && - symroot && symname[0] == '/') { + sbflags = cifs_sb_flags(cifs_sb); + if (!(sbflags & CIFS_MOUNT_POSIX_PATHS) && symroot && symname[0] == '/') { /* * This is a request to create an absolute symlink on the server * which does not support POSIX paths, and expects symlink in @@ -164,7 +165,7 @@ static int create_native_symlink(const unsigned int xid, struct inode *inode, * mask these characters in NT object prefix by '_' and then change * them back. */ - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) && symname[0] == '/') + if (!(sbflags & CIFS_MOUNT_POSIX_PATHS) && symname[0] == '/') sym[0] = sym[1] = sym[2] = sym[5] = '_'; path = cifs_convert_path_to_utf16(sym, cifs_sb); @@ -173,7 +174,7 @@ static int create_native_symlink(const unsigned int xid, struct inode *inode, goto out; } - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) && symname[0] == '/') { + if (!(sbflags & CIFS_MOUNT_POSIX_PATHS) && symname[0] == '/') { sym[0] = '\\'; sym[1] = sym[2] = '?'; sym[5] = ':'; @@ -197,7 +198,7 @@ static int create_native_symlink(const unsigned int xid, struct inode *inode, slen = 2 * UniStrnlen((wchar_t *)path, REPARSE_SYM_PATH_MAX); poff = 0; plen = slen; - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) && symname[0] == '/') { + if (!(sbflags & CIFS_MOUNT_POSIX_PATHS) && symname[0] == '/') { /* * For absolute NT symlinks skip leading "\\??\\" in PrintName as * PrintName is user visible location in DOS/Win32 format (not in NT format). @@ -824,7 +825,7 @@ int smb2_parse_native_symlink(char **target, const char *buf, unsigned int len, goto out; } - if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) && + if (!(cifs_sb_flags(cifs_sb) & CIFS_MOUNT_POSIX_PATHS) && symroot && !relative) { /* * This is an absolute symlink from the server which does not diff --git a/fs/smb/client/reparse.h b/fs/smb/client/reparse.h index 570b0d25aeba..0164dc47bdfd 100644 --- a/fs/smb/client/reparse.h +++ b/fs/smb/client/reparse.h @@ -33,7 +33,7 @@ static inline kuid_t wsl_make_kuid(struct cifs_sb_info *cifs_sb, { u32 uid = le32_to_cpu(*(__le32 *)ptr); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_UID) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_OVERR_UID) return cifs_sb->ctx->linux_uid; return make_kuid(current_user_ns(), uid); } @@ -43,7 +43,7 @@ static inline kgid_t wsl_make_kgid(struct cifs_sb_info *cifs_sb, { u32 gid = le32_to_cpu(*(__le32 *)ptr); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_OVERR_GID) return cifs_sb->ctx->linux_gid; return make_kgid(current_user_ns(), gid); } diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c index aed49aaef8c4..9643eca0cb70 100644 --- a/fs/smb/client/smb1ops.c +++ b/fs/smb/client/smb1ops.c @@ -49,6 +49,7 @@ void reset_cifs_unix_caps(unsigned int xid, struct cifs_tcon *tcon, if (!CIFSSMBQFSUnixInfo(xid, tcon)) { __u64 cap = le64_to_cpu(tcon->fsUnixInfo.Capability); + unsigned int sbflags; cifs_dbg(FYI, "unix caps which server supports %lld\n", cap); /* @@ -75,14 +76,16 @@ void reset_cifs_unix_caps(unsigned int xid, struct cifs_tcon *tcon, if (cap & CIFS_UNIX_TRANSPORT_ENCRYPTION_MANDATORY_CAP) cifs_dbg(VFS, "per-share encryption not supported yet\n"); + if (cifs_sb) + sbflags = cifs_sb_flags(cifs_sb); + cap &= CIFS_UNIX_CAP_MASK; if (ctx && ctx->no_psx_acl) cap &= ~CIFS_UNIX_POSIX_ACL_CAP; else if (CIFS_UNIX_POSIX_ACL_CAP & cap) { cifs_dbg(FYI, "negotiated posix acl support\n"); if (cifs_sb) - cifs_sb->mnt_cifs_flags |= - CIFS_MOUNT_POSIXACL; + sbflags |= CIFS_MOUNT_POSIXACL; } if (ctx && ctx->posix_paths == 0) @@ -90,10 +93,12 @@ void reset_cifs_unix_caps(unsigned int xid, struct cifs_tcon *tcon, else if (cap & CIFS_UNIX_POSIX_PATHNAMES_CAP) { cifs_dbg(FYI, "negotiate posix pathnames\n"); if (cifs_sb) - cifs_sb->mnt_cifs_flags |= - CIFS_MOUNT_POSIX_PATHS; + sbflags |= CIFS_MOUNT_POSIX_PATHS; } + if (cifs_sb) + atomic_set(&cifs_sb->mnt_cifs_flags, sbflags); + cifs_dbg(FYI, "Negotiate caps 0x%x\n", (int)cap); #ifdef CONFIG_CIFS_DEBUG2 if (cap & CIFS_UNIX_FCNTL_CAP) @@ -1147,7 +1152,7 @@ static int cifs_oplock_response(struct cifs_tcon *tcon, __u64 persistent_fid, __u64 volatile_fid, __u16 net_fid, struct cifsInodeInfo *cinode, unsigned int oplock) { - unsigned int sbflags = CIFS_SB(cinode->netfs.inode.i_sb)->mnt_cifs_flags; + unsigned int sbflags = cifs_sb_flags(CIFS_SB(cinode)); __u8 op; op = !!((oplock & CIFS_CACHE_READ_FLG) || (sbflags & CIFS_MOUNT_RO_CACHE)); @@ -1282,7 +1287,8 @@ cifs_make_node(unsigned int xid, struct inode *inode, struct dentry *dentry, struct cifs_tcon *tcon, const char *full_path, umode_t mode, dev_t dev) { - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + struct cifs_sb_info *cifs_sb = CIFS_SB(inode); + unsigned int sbflags = cifs_sb_flags(cifs_sb); struct inode *newinode = NULL; int rc; @@ -1298,7 +1304,7 @@ cifs_make_node(unsigned int xid, struct inode *inode, .mtime = NO_CHANGE_64, .device = dev, }; - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) { + if (sbflags & CIFS_MOUNT_SET_UID) { args.uid = current_fsuid(); args.gid = current_fsgid(); } else { @@ -1317,7 +1323,7 @@ cifs_make_node(unsigned int xid, struct inode *inode, if (rc == 0) d_instantiate(dentry, newinode); return rc; - } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL) { + } else if (sbflags & CIFS_MOUNT_UNX_EMUL) { /* * Check if mounted with mount parm 'sfu' mount parm. * SFU emulation should work with all servers diff --git a/fs/smb/client/smb2file.c b/fs/smb/client/smb2file.c index 1ab41de2b634..ed651c946251 100644 --- a/fs/smb/client/smb2file.c +++ b/fs/smb/client/smb2file.c @@ -72,7 +72,7 @@ int smb2_fix_symlink_target_type(char **target, bool directory, struct cifs_sb_i * POSIX server does not distinguish between symlinks to file and * symlink directory. So nothing is needed to fix on the client side. */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_POSIX_PATHS) return 0; if (!*target) diff --git a/fs/smb/client/smb2misc.c b/fs/smb/client/smb2misc.c index e19674d9b92b..973fce3c959c 100644 --- a/fs/smb/client/smb2misc.c +++ b/fs/smb/client/smb2misc.c @@ -455,17 +455,8 @@ calc_size_exit: __le16 * cifs_convert_path_to_utf16(const char *from, struct cifs_sb_info *cifs_sb) { - int len; const char *start_of_path; - __le16 *to; - int map_type; - - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SFM_CHR) - map_type = SFM_MAP_UNI_RSVD; - else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR) - map_type = SFU_MAP_UNI_RSVD; - else - map_type = NO_MAP_UNI_RSVD; + int len; /* Windows doesn't allow paths beginning with \ */ if (from[0] == '\\') @@ -479,14 +470,13 @@ cifs_convert_path_to_utf16(const char *from, struct cifs_sb_info *cifs_sb) } else start_of_path = from; - to = cifs_strndup_to_utf16(start_of_path, PATH_MAX, &len, - cifs_sb->local_nls, map_type); - return to; + return cifs_strndup_to_utf16(start_of_path, PATH_MAX, &len, + cifs_sb->local_nls, cifs_remap(cifs_sb)); } __le32 smb2_get_lease_state(struct cifsInodeInfo *cinode, unsigned int oplock) { - unsigned int sbflags = CIFS_SB(cinode->netfs.inode.i_sb)->mnt_cifs_flags; + unsigned int sbflags = cifs_sb_flags(CIFS_SB(cinode)); __le32 lease = 0; if ((oplock & CIFS_CACHE_WRITE_FLG) || (sbflags & CIFS_MOUNT_RW_CACHE)) diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index fea9a35caa57..7f2d3459cbf9 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -986,7 +986,7 @@ smb2_is_path_accessible(const unsigned int xid, struct cifs_tcon *tcon, rc = -EREMOTE; } if (rc == -EREMOTE && IS_ENABLED(CONFIG_CIFS_DFS_UPCALL) && - (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_DFS)) + (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NO_DFS)) rc = -EOPNOTSUPP; goto out; } @@ -2691,7 +2691,7 @@ static int smb2_oplock_response(struct cifs_tcon *tcon, __u64 persistent_fid, __u64 volatile_fid, __u16 net_fid, struct cifsInodeInfo *cinode, unsigned int oplock) { - unsigned int sbflags = CIFS_SB(cinode->netfs.inode.i_sb)->mnt_cifs_flags; + unsigned int sbflags = cifs_sb_flags(CIFS_SB(cinode)); __u8 op; if (tcon->ses->server->capabilities & SMB2_GLOBAL_CAP_LEASING) @@ -5332,7 +5332,7 @@ static int smb2_make_node(unsigned int xid, struct inode *inode, struct dentry *dentry, struct cifs_tcon *tcon, const char *full_path, umode_t mode, dev_t dev) { - struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); + unsigned int sbflags = cifs_sb_flags(CIFS_SB(inode)); int rc = -EOPNOTSUPP; /* @@ -5341,7 +5341,7 @@ static int smb2_make_node(unsigned int xid, struct inode *inode, * supports block and char device, socket & fifo, * and was used by default in earlier versions of Windows */ - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UNX_EMUL) { + if (sbflags & CIFS_MOUNT_UNX_EMUL) { rc = cifs_sfu_make_node(xid, inode, dentry, tcon, full_path, mode, dev); } else if (CIFS_REPARSE_SUPPORT(tcon)) { diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index ef655acf673d..a2a96d817717 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -3182,22 +3182,19 @@ SMB2_open_init(struct cifs_tcon *tcon, struct TCP_Server_Info *server, } if ((oparms->disposition != FILE_OPEN) && (oparms->cifs_sb)) { + unsigned int sbflags = cifs_sb_flags(oparms->cifs_sb); bool set_mode; bool set_owner; - if ((oparms->cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MODE_FROM_SID) && - (oparms->mode != ACL_NO_MODE)) + if ((sbflags & CIFS_MOUNT_MODE_FROM_SID) && + oparms->mode != ACL_NO_MODE) { set_mode = true; - else { + } else { set_mode = false; oparms->mode = ACL_NO_MODE; } - if (oparms->cifs_sb->mnt_cifs_flags & CIFS_MOUNT_UID_FROM_ACL) - set_owner = true; - else - set_owner = false; - + set_owner = sbflags & CIFS_MOUNT_UID_FROM_ACL; if (set_owner | set_mode) { cifs_dbg(FYI, "add sd with mode 0x%x\n", oparms->mode); rc = add_sd_context(iov, &n_iov, oparms->mode, set_owner); diff --git a/fs/smb/client/xattr.c b/fs/smb/client/xattr.c index e1a7d9a10a53..23227f2f9428 100644 --- a/fs/smb/client/xattr.c +++ b/fs/smb/client/xattr.c @@ -149,7 +149,7 @@ static int cifs_xattr_set(const struct xattr_handler *handler, break; } - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_XATTR) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NO_XATTR) goto out; if (pTcon->ses->server->ops->set_EA) { @@ -309,7 +309,7 @@ static int cifs_xattr_get(const struct xattr_handler *handler, break; } - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_XATTR) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NO_XATTR) goto out; if (pTcon->ses->server->ops->query_all_EAs) @@ -398,7 +398,7 @@ ssize_t cifs_listxattr(struct dentry *direntry, char *data, size_t buf_size) if (unlikely(cifs_forced_shutdown(cifs_sb))) return smb_EIO(smb_eio_trace_forced_shutdown); - if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_XATTR) + if (cifs_sb_flags(cifs_sb) & CIFS_MOUNT_NO_XATTR) return -EOPNOTSUPP; tlink = cifs_sb_tlink(cifs_sb); From d9d1e319b39ea685ede59319002d567c159d23c3 Mon Sep 17 00:00:00 2001 From: Paulo Alcantara Date: Wed, 25 Feb 2026 21:34:55 -0300 Subject: [PATCH 329/359] smb: client: fix broken multichannel with krb5+signing When mounting a share with 'multichannel,max_channels=n,sec=krb5i', the client was duplicating signing key for all secondary channels, thus making the server fail all commands sent from secondary channels due to bad signatures. Every channel has its own signing key, so when establishing a new channel with krb5 auth, make sure to use the new session key as the derived key to generate channel's signing key in SMB2_auth_kerberos(). Repro: $ mount.cifs //srv/share /mnt -o multichannel,max_channels=4,sec=krb5i $ sleep 5 $ umount /mnt $ dmesg ... CIFS: VFS: sign fail cmd 0x5 message id 0x2 CIFS: VFS: \\srv SMB signature verification returned error = -13 CIFS: VFS: sign fail cmd 0x5 message id 0x2 CIFS: VFS: \\srv SMB signature verification returned error = -13 CIFS: VFS: sign fail cmd 0x4 message id 0x2 CIFS: VFS: \\srv SMB signature verification returned error = -13 Reported-by: Xiaoli Feng Reviewed-by: Enzo Matsumiya Signed-off-by: Paulo Alcantara (Red Hat) Cc: David Howells Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French --- fs/smb/client/smb2pdu.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index a2a96d817717..04e361ed2356 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -1714,19 +1714,17 @@ SMB2_auth_kerberos(struct SMB2_sess_data *sess_data) is_binding = (ses->ses_status == SES_GOOD); spin_unlock(&ses->ses_lock); - /* keep session key if binding */ - if (!is_binding) { - kfree_sensitive(ses->auth_key.response); - ses->auth_key.response = kmemdup(msg->data, msg->sesskey_len, - GFP_KERNEL); - if (!ses->auth_key.response) { - cifs_dbg(VFS, "Kerberos can't allocate (%u bytes) memory\n", - msg->sesskey_len); - rc = -ENOMEM; - goto out_put_spnego_key; - } - ses->auth_key.len = msg->sesskey_len; + kfree_sensitive(ses->auth_key.response); + ses->auth_key.response = kmemdup(msg->data, + msg->sesskey_len, + GFP_KERNEL); + if (!ses->auth_key.response) { + cifs_dbg(VFS, "%s: can't allocate (%u bytes) memory\n", + __func__, msg->sesskey_len); + rc = -ENOMEM; + goto out_put_spnego_key; } + ses->auth_key.len = msg->sesskey_len; sess_data->iov[1].iov_base = msg->data + msg->sesskey_len; sess_data->iov[1].iov_len = msg->secblob_len; From 2f37dc436d4e61ff7ae0b0353cf91b8c10396e4d Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Thu, 26 Feb 2026 22:28:45 +0100 Subject: [PATCH 330/359] smb: client: Don't log plaintext credentials in cifs_set_cifscreds When debug logging is enabled, cifs_set_cifscreds() logs the key payload and exposes the plaintext username and password. Remove the debug log to avoid exposing credentials. Fixes: 8a8798a5ff90 ("cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) Signed-off-by: Thorsten Blum Signed-off-by: Steve French --- fs/smb/client/connect.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 87686c804057..4c34d9603e05 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -2236,7 +2236,6 @@ cifs_set_cifscreds(struct smb3_fs_context *ctx, struct cifs_ses *ses) /* find first : in payload */ payload = upayload->data; delim = strnchr(payload, upayload->datalen, ':'); - cifs_dbg(FYI, "payload=%s\n", payload); if (!delim) { cifs_dbg(FYI, "Unable to find ':' in payload (datalen=%d)\n", upayload->datalen); From e9217ca77dc35b4978db0fe901685ddb3f1e223a Mon Sep 17 00:00:00 2001 From: Harry Yoo Date: Mon, 23 Feb 2026 16:58:09 +0900 Subject: [PATCH 331/359] mm/slab: initialize slab->stride early to avoid memory ordering issues When alloc_slab_obj_exts() is called later (instead of during slab allocation and initialization), slab->stride and slab->obj_exts are updated after the slab is already accessible by multiple CPUs. The current implementation does not enforce memory ordering between slab->stride and slab->obj_exts. For correctness, slab->stride must be visible before slab->obj_exts. Otherwise, concurrent readers may observe slab->obj_exts as non-zero while stride is still stale. With stale slab->stride, slab_obj_ext() could return the wrong obj_ext. This could cause two problems: - obj_cgroup_put() is called on the wrong objcg, leading to a use-after-free due to incorrect reference counting [1] by decrementing the reference count more than it was incremented. - refill_obj_stock() is called on the wrong objcg, leading to a page_counter overflow [2] by uncharging more memory than charged. Fix this by unconditionally initializing slab->stride in alloc_slab_obj_exts_early(), before the need_slab_obj_exts() check. In the case of SLAB_OBJ_EXT_IN_OBJ, it is overridden in the function. This ensures updates to slab->stride become visible before the slab can be accessed by other CPUs via the per-node partial slab list (protected by spinlock with acquire/release semantics). Thanks to Shakeel Butt for pointing out this issue [3]. [vbabka@kernel.org: the bug reports [1] and [2] are not yet fully fixed, with investigation ongoing, but it is nevertheless a step in the right direction to only set stride once after allocating the slab and not change it later ] Fixes: 7a8e71bc619d ("mm/slab: use stride to access slabobj_ext") Reported-by: Venkat Rao Bagalkote Link: https://lore.kernel.org/lkml/ca241daa-e7e7-4604-a48d-de91ec9184a5@linux.ibm.com [1] Link: https://lore.kernel.org/all/ddff7c7d-c0c3-4780-808f-9a83268bbf0c@linux.ibm.com [2] Link: https://lore.kernel.org/linux-mm/aZu9G9mVIVzSm6Ft@hyeyoo [3] Signed-off-by: Harry Yoo Signed-off-by: Vlastimil Babka (SUSE) --- mm/slub.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 52f021711744..0c906fefc31b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2196,7 +2196,6 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s, retry: old_exts = READ_ONCE(slab->obj_exts); handle_failed_objexts_alloc(old_exts, vec, objects); - slab_set_stride(slab, sizeof(struct slabobj_ext)); if (new_slab) { /* @@ -2272,6 +2271,9 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab) void *addr; unsigned long obj_exts; + /* Initialize stride early to avoid memory ordering issues */ + slab_set_stride(slab, sizeof(struct slabobj_ext)); + if (!need_slab_obj_exts(s)) return; @@ -2288,7 +2290,6 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab) obj_exts |= MEMCG_DATA_OBJEXTS; #endif slab->obj_exts = obj_exts; - slab_set_stride(slab, sizeof(struct slabobj_ext)); } else if (s->flags & SLAB_OBJ_EXT_IN_OBJ) { unsigned int offset = obj_exts_offset_in_object(s); From f505a45776d149632e3bd0b87f0da1609607161a Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Thu, 26 Feb 2026 23:15:22 +0100 Subject: [PATCH 332/359] smb: client: Use snprintf in cifs_set_cifscreds Replace unbounded sprintf() calls with the safer snprintf(). Avoid using magic numbers and use strlen() to calculate the key descriptor buffer size. Save the size in a local variable and reuse it for the bounded snprintf() calls. Remove CIFSCREDS_DESC_SIZE. Signed-off-by: Thorsten Blum Acked-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French --- fs/smb/client/connect.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 4c34d9603e05..3bad2c5c523d 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -2167,9 +2167,6 @@ void __cifs_put_smb_ses(struct cifs_ses *ses) #ifdef CONFIG_KEYS -/* strlen("cifs:a:") + CIFS_MAX_DOMAINNAME_LEN + 1 */ -#define CIFSCREDS_DESC_SIZE (7 + CIFS_MAX_DOMAINNAME_LEN + 1) - /* Populate username and pw fields from keyring if possible */ static int cifs_set_cifscreds(struct smb3_fs_context *ctx, struct cifs_ses *ses) @@ -2177,6 +2174,7 @@ cifs_set_cifscreds(struct smb3_fs_context *ctx, struct cifs_ses *ses) int rc = 0; int is_domain = 0; const char *delim, *payload; + size_t desc_sz; char *desc; ssize_t len; struct key *key; @@ -2185,7 +2183,9 @@ cifs_set_cifscreds(struct smb3_fs_context *ctx, struct cifs_ses *ses) struct sockaddr_in6 *sa6; const struct user_key_payload *upayload; - desc = kmalloc(CIFSCREDS_DESC_SIZE, GFP_KERNEL); + /* "cifs:a:" and "cifs:d:" are the same length; +1 for NUL terminator */ + desc_sz = strlen("cifs:a:") + CIFS_MAX_DOMAINNAME_LEN + 1; + desc = kmalloc(desc_sz, GFP_KERNEL); if (!desc) return -ENOMEM; @@ -2193,11 +2193,11 @@ cifs_set_cifscreds(struct smb3_fs_context *ctx, struct cifs_ses *ses) switch (server->dstaddr.ss_family) { case AF_INET: sa = (struct sockaddr_in *)&server->dstaddr; - sprintf(desc, "cifs:a:%pI4", &sa->sin_addr.s_addr); + snprintf(desc, desc_sz, "cifs:a:%pI4", &sa->sin_addr.s_addr); break; case AF_INET6: sa6 = (struct sockaddr_in6 *)&server->dstaddr; - sprintf(desc, "cifs:a:%pI6c", &sa6->sin6_addr.s6_addr); + snprintf(desc, desc_sz, "cifs:a:%pI6c", &sa6->sin6_addr.s6_addr); break; default: cifs_dbg(FYI, "Bad ss_family (%hu)\n", @@ -2216,7 +2216,7 @@ cifs_set_cifscreds(struct smb3_fs_context *ctx, struct cifs_ses *ses) } /* didn't work, try to find a domain key */ - sprintf(desc, "cifs:d:%s", ses->domainName); + snprintf(desc, desc_sz, "cifs:d:%s", ses->domainName); cifs_dbg(FYI, "%s: desc=%s\n", __func__, desc); key = request_key(&key_type_logon, desc, ""); if (IS_ERR(key)) { From 39195990e4c093c9eecf88f29811c6de29265214 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Fri, 27 Feb 2026 06:10:08 -0600 Subject: [PATCH 333/359] PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit fb82437fdd8c ("PCI: Change capability register offsets to hex") incorrectly converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex 0x32: -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */ +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 /* end of v2 EPs w/ link */ This broke PCI capabilities in a VMM because subsequent ones weren't DWORD-aligned. Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34. fb82437fdd8c was from Baruch Siach , but this was not Baruch's fault; it's a mistake I made when applying the patch. Fixes: fb82437fdd8c ("PCI: Change capability register offsets to hex") Reported-by: David Woodhouse Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org Signed-off-by: Bjorn Helgaas Reviewed-by: Krzysztof Wilczyński --- include/uapi/linux/pci_regs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index ec1c54b5a310..14f634ab9350 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -712,7 +712,7 @@ #define PCI_EXP_LNKCTL2_HASD 0x0020 /* HW Autonomous Speed Disable */ #define PCI_EXP_LNKSTA2 0x32 /* Link Status 2 */ #define PCI_EXP_LNKSTA2_FLIT 0x0400 /* Flit Mode Status */ -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 /* end of v2 EPs w/ link */ +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x34 /* end of v2 EPs w/ link */ #define PCI_EXP_SLTCAP2 0x34 /* Slot Capabilities 2 */ #define PCI_EXP_SLTCAP2_IBPD 0x00000001 /* In-band PD Disable Supported */ #define PCI_EXP_SLTCTL2 0x38 /* Slot Control 2 */ From 1df97a7453eec80c1912c2d0360290a3970a7671 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 27 Feb 2026 14:48:01 -0800 Subject: [PATCH 334/359] bpf: Register dtor for freeing special fields There is a race window where BPF hash map elements can leak special fields if the program with access to the map value recreates these special fields between the check_and_free_fields done on the map value and its eventual return to the memory allocator. Several ways were explored prior to this patch, most notably [0] tried to use a poison value to reject attempts to recreate special fields for map values that have been logically deleted but still accessible to BPF programs (either while sitting in the free list or when reused). While this approach works well for task work, timers, wq, etc., it is harder to apply the idea to kptrs, which have a similar race and failure mode. Instead, we change bpf_mem_alloc to allow registering destructor for allocated elements, such that when they are returned to the allocator, any special fields created while they were accessible to programs in the mean time will be freed. If these values get reused, we do not free the fields again before handing the element back. The special fields thus may remain initialized while the map value sits in a free list. When bpf_mem_alloc is retired in the future, a similar concept can be introduced to kmalloc_nolock-backed kmem_cache, paired with the existing idea of a constructor. Note that the destructor registration happens in map_check_btf, after the BTF record is populated and (at that point) avaiable for inspection and duplication. Duplication is necessary since the freeing of embedded bpf_mem_alloc can be decoupled from actual map lifetime due to logic introduced to reduce the cost of rcu_barrier()s in mem alloc free path in 9f2c6e96c65e ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc."). As such, once all callbacks are done, we must also free the duplicated record. To remove dependency on the bpf_map itself, also stash the key size of the map to obtain value from htab_elem long after the map is gone. [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr") Fixes: 1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context") Reported-by: Alexei Starovoitov Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20260227224806.646888-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_mem_alloc.h | 6 +++ kernel/bpf/hashtab.c | 87 +++++++++++++++++++++++++++++++++++ kernel/bpf/memalloc.c | 58 ++++++++++++++++++----- 3 files changed, 140 insertions(+), 11 deletions(-) diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index e45162ef59bb..4ce0d27f8ea2 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -14,6 +14,8 @@ struct bpf_mem_alloc { struct obj_cgroup *objcg; bool percpu; struct work_struct work; + void (*dtor_ctx_free)(void *ctx); + void *dtor_ctx; }; /* 'size != 0' is for bpf_mem_alloc which manages fixed-size objects. @@ -32,6 +34,10 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg /* The percpu allocation with a specific unit size. */ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size); void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); +void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, + void (*dtor)(void *obj, void *ctx), + void (*dtor_ctx_free)(void *ctx), + void *ctx); /* Check the allocation size for kmalloc equivalent allocator */ int bpf_mem_alloc_check_size(bool percpu, size_t size); diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 3b9d297a53be..582f0192b7e1 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -125,6 +125,11 @@ struct htab_elem { char key[] __aligned(8); }; +struct htab_btf_record { + struct btf_record *record; + u32 key_size; +}; + static inline bool htab_is_prealloc(const struct bpf_htab *htab) { return !(htab->map.map_flags & BPF_F_NO_PREALLOC); @@ -457,6 +462,84 @@ static int htab_map_alloc_check(union bpf_attr *attr) return 0; } +static void htab_mem_dtor(void *obj, void *ctx) +{ + struct htab_btf_record *hrec = ctx; + struct htab_elem *elem = obj; + void *map_value; + + if (IS_ERR_OR_NULL(hrec->record)) + return; + + map_value = htab_elem_value(elem, hrec->key_size); + bpf_obj_free_fields(hrec->record, map_value); +} + +static void htab_pcpu_mem_dtor(void *obj, void *ctx) +{ + void __percpu *pptr = *(void __percpu **)obj; + struct htab_btf_record *hrec = ctx; + int cpu; + + if (IS_ERR_OR_NULL(hrec->record)) + return; + + for_each_possible_cpu(cpu) + bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu)); +} + +static void htab_dtor_ctx_free(void *ctx) +{ + struct htab_btf_record *hrec = ctx; + + btf_record_free(hrec->record); + kfree(ctx); +} + +static int htab_set_dtor(const struct bpf_htab *htab, void (*dtor)(void *, void *)) +{ + u32 key_size = htab->map.key_size; + const struct bpf_mem_alloc *ma; + struct htab_btf_record *hrec; + int err; + + /* No need for dtors. */ + if (IS_ERR_OR_NULL(htab->map.record)) + return 0; + + hrec = kzalloc(sizeof(*hrec), GFP_KERNEL); + if (!hrec) + return -ENOMEM; + hrec->key_size = key_size; + hrec->record = btf_record_dup(htab->map.record); + if (IS_ERR(hrec->record)) { + err = PTR_ERR(hrec->record); + kfree(hrec); + return err; + } + ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma; + /* Kinda sad, but cast away const-ness since we change ma->dtor. */ + bpf_mem_alloc_set_dtor((struct bpf_mem_alloc *)ma, dtor, htab_dtor_ctx_free, hrec); + return 0; +} + +static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf, + const struct btf_type *key_type, const struct btf_type *value_type) +{ + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); + + if (htab_is_prealloc(htab)) + return 0; + /* + * We must set the dtor using this callback, as map's BTF record is not + * populated in htab_map_alloc(), so it will always appear as NULL. + */ + if (htab_is_percpu(htab)) + return htab_set_dtor(htab, htab_pcpu_mem_dtor); + else + return htab_set_dtor(htab, htab_mem_dtor); +} + static struct bpf_map *htab_map_alloc(union bpf_attr *attr) { bool percpu = (attr->map_type == BPF_MAP_TYPE_PERCPU_HASH || @@ -2281,6 +2364,7 @@ const struct bpf_map_ops htab_map_ops = { .map_seq_show_elem = htab_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, + .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab), .map_btf_id = &htab_map_btf_ids[0], @@ -2303,6 +2387,7 @@ const struct bpf_map_ops htab_lru_map_ops = { .map_seq_show_elem = htab_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, + .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab_lru), .map_btf_id = &htab_map_btf_ids[0], @@ -2482,6 +2567,7 @@ const struct bpf_map_ops htab_percpu_map_ops = { .map_seq_show_elem = htab_percpu_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, + .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab_percpu), .map_btf_id = &htab_map_btf_ids[0], @@ -2502,6 +2588,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = { .map_seq_show_elem = htab_percpu_map_seq_show_elem, .map_set_for_each_callback_args = map_set_for_each_callback_args, .map_for_each_callback = bpf_for_each_hash_elem, + .map_check_btf = htab_map_check_btf, .map_mem_usage = htab_map_mem_usage, BATCH_OPS(htab_lru_percpu), .map_btf_id = &htab_map_btf_ids[0], diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index bd45dda9dc35..682a9f34214b 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -102,6 +102,8 @@ struct bpf_mem_cache { int percpu_size; bool draining; struct bpf_mem_cache *tgt; + void (*dtor)(void *obj, void *ctx); + void *dtor_ctx; /* list of objects to be freed after RCU GP */ struct llist_head free_by_rcu; @@ -260,12 +262,14 @@ static void free_one(void *obj, bool percpu) kfree(obj); } -static int free_all(struct llist_node *llnode, bool percpu) +static int free_all(struct bpf_mem_cache *c, struct llist_node *llnode, bool percpu) { struct llist_node *pos, *t; int cnt = 0; llist_for_each_safe(pos, t, llnode) { + if (c->dtor) + c->dtor((void *)pos + LLIST_NODE_SZ, c->dtor_ctx); free_one(pos, percpu); cnt++; } @@ -276,7 +280,7 @@ static void __free_rcu(struct rcu_head *head) { struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace); - free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size); + free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size); atomic_set(&c->call_rcu_ttrace_in_progress, 0); } @@ -308,7 +312,7 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c) if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) { if (unlikely(READ_ONCE(c->draining))) { llnode = llist_del_all(&c->free_by_rcu_ttrace); - free_all(llnode, !!c->percpu_size); + free_all(c, llnode, !!c->percpu_size); } return; } @@ -417,7 +421,7 @@ static void check_free_by_rcu(struct bpf_mem_cache *c) dec_active(c, &flags); if (unlikely(READ_ONCE(c->draining))) { - free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size); + free_all(c, llist_del_all(&c->waiting_for_gp), !!c->percpu_size); atomic_set(&c->call_rcu_in_progress, 0); } else { call_rcu_hurry(&c->rcu, __free_by_rcu); @@ -635,13 +639,13 @@ static void drain_mem_cache(struct bpf_mem_cache *c) * Except for waiting_for_gp_ttrace list, there are no concurrent operations * on these lists, so it is safe to use __llist_del_all(). */ - free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu); - free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu); - free_all(__llist_del_all(&c->free_llist), percpu); - free_all(__llist_del_all(&c->free_llist_extra), percpu); - free_all(__llist_del_all(&c->free_by_rcu), percpu); - free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu); - free_all(llist_del_all(&c->waiting_for_gp), percpu); + free_all(c, llist_del_all(&c->free_by_rcu_ttrace), percpu); + free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), percpu); + free_all(c, __llist_del_all(&c->free_llist), percpu); + free_all(c, __llist_del_all(&c->free_llist_extra), percpu); + free_all(c, __llist_del_all(&c->free_by_rcu), percpu); + free_all(c, __llist_del_all(&c->free_llist_extra_rcu), percpu); + free_all(c, llist_del_all(&c->waiting_for_gp), percpu); } static void check_mem_cache(struct bpf_mem_cache *c) @@ -680,6 +684,9 @@ static void check_leaked_objs(struct bpf_mem_alloc *ma) static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma) { + /* We can free dtor ctx only once all callbacks are done using it. */ + if (ma->dtor_ctx_free) + ma->dtor_ctx_free(ma->dtor_ctx); check_leaked_objs(ma); free_percpu(ma->cache); free_percpu(ma->caches); @@ -1014,3 +1021,32 @@ int bpf_mem_alloc_check_size(bool percpu, size_t size) return 0; } + +void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, void (*dtor)(void *obj, void *ctx), + void (*dtor_ctx_free)(void *ctx), void *ctx) +{ + struct bpf_mem_caches *cc; + struct bpf_mem_cache *c; + int cpu, i; + + ma->dtor_ctx_free = dtor_ctx_free; + ma->dtor_ctx = ctx; + + if (ma->cache) { + for_each_possible_cpu(cpu) { + c = per_cpu_ptr(ma->cache, cpu); + c->dtor = dtor; + c->dtor_ctx = ctx; + } + } + if (ma->caches) { + for_each_possible_cpu(cpu) { + cc = per_cpu_ptr(ma->caches, cpu); + for (i = 0; i < NUM_CACHES; i++) { + c = &cc->cache[i]; + c->dtor = dtor; + c->dtor_ctx = ctx; + } + } + } +} From ae51772b1e94ba1d76db19085957dbccac189c1c Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 27 Feb 2026 14:48:02 -0800 Subject: [PATCH 335/359] bpf: Lose const-ness of map in map_check_btf() BPF hash map may now use the map_check_btf() callback to decide whether to set a dtor on its bpf_mem_alloc or not. Unlike C++ where members can opt out of const-ness using mutable, we must lose the const qualifier on the callback such that we can avoid the ugly cast. Make the change and adjust all existing users, and lose the comment in hashtab.c. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20260227224806.646888-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 4 ++-- include/linux/bpf_local_storage.h | 2 +- kernel/bpf/arena.c | 2 +- kernel/bpf/arraymap.c | 2 +- kernel/bpf/bloom_filter.c | 2 +- kernel/bpf/bpf_insn_array.c | 2 +- kernel/bpf/bpf_local_storage.c | 2 +- kernel/bpf/hashtab.c | 9 ++++----- kernel/bpf/local_storage.c | 2 +- kernel/bpf/lpm_trie.c | 2 +- kernel/bpf/syscall.c | 2 +- 11 files changed, 15 insertions(+), 16 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b78b53198a2e..05b34a6355b0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -124,7 +124,7 @@ struct bpf_map_ops { u32 (*map_fd_sys_lookup_elem)(void *ptr); void (*map_seq_show_elem)(struct bpf_map *map, void *key, struct seq_file *m); - int (*map_check_btf)(const struct bpf_map *map, + int (*map_check_btf)(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type); @@ -656,7 +656,7 @@ static inline bool bpf_map_support_seq_show(const struct bpf_map *map) map->ops->map_seq_show_elem; } -int map_check_no_btf(const struct bpf_map *map, +int map_check_no_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type); diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h index 85efa9772530..8157e8da61d4 100644 --- a/include/linux/bpf_local_storage.h +++ b/include/linux/bpf_local_storage.h @@ -176,7 +176,7 @@ u32 bpf_local_storage_destroy(struct bpf_local_storage *local_storage); void bpf_local_storage_map_free(struct bpf_map *map, struct bpf_local_storage_cache *cache); -int bpf_local_storage_map_check_btf(const struct bpf_map *map, +int bpf_local_storage_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type); diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c index 144f30e740e8..f355cf1c1a16 100644 --- a/kernel/bpf/arena.c +++ b/kernel/bpf/arena.c @@ -303,7 +303,7 @@ static long arena_map_update_elem(struct bpf_map *map, void *key, return -EOPNOTSUPP; } -static int arena_map_check_btf(const struct bpf_map *map, const struct btf *btf, +static int arena_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { return 0; diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 26763df6134a..33de68c95ad8 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -548,7 +548,7 @@ static void percpu_array_map_seq_show_elem(struct bpf_map *map, void *key, rcu_read_unlock(); } -static int array_map_check_btf(const struct bpf_map *map, +static int array_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c index 35e1ddca74d2..b73336c976b7 100644 --- a/kernel/bpf/bloom_filter.c +++ b/kernel/bpf/bloom_filter.c @@ -180,7 +180,7 @@ static long bloom_map_update_elem(struct bpf_map *map, void *key, return -EINVAL; } -static int bloom_map_check_btf(const struct bpf_map *map, +static int bloom_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) diff --git a/kernel/bpf/bpf_insn_array.c b/kernel/bpf/bpf_insn_array.c index c0286f25ca3c..a2f84afe6f7c 100644 --- a/kernel/bpf/bpf_insn_array.c +++ b/kernel/bpf/bpf_insn_array.c @@ -98,7 +98,7 @@ static long insn_array_delete_elem(struct bpf_map *map, void *key) return -EINVAL; } -static int insn_array_check_btf(const struct bpf_map *map, +static int insn_array_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c index b28f07d3a0db..2bf5ca5ae0df 100644 --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -797,7 +797,7 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr) return 0; } -int bpf_local_storage_map_check_btf(const struct bpf_map *map, +int bpf_local_storage_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 582f0192b7e1..bc6bc8bb871d 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -496,10 +496,10 @@ static void htab_dtor_ctx_free(void *ctx) kfree(ctx); } -static int htab_set_dtor(const struct bpf_htab *htab, void (*dtor)(void *, void *)) +static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *)) { u32 key_size = htab->map.key_size; - const struct bpf_mem_alloc *ma; + struct bpf_mem_alloc *ma; struct htab_btf_record *hrec; int err; @@ -518,12 +518,11 @@ static int htab_set_dtor(const struct bpf_htab *htab, void (*dtor)(void *, void return err; } ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma; - /* Kinda sad, but cast away const-ness since we change ma->dtor. */ - bpf_mem_alloc_set_dtor((struct bpf_mem_alloc *)ma, dtor, htab_dtor_ctx_free, hrec); + bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec); return 0; } -static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf, +static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c index 1ccbf28b2ad9..8fca0c64f7b1 100644 --- a/kernel/bpf/local_storage.c +++ b/kernel/bpf/local_storage.c @@ -364,7 +364,7 @@ static long cgroup_storage_delete_elem(struct bpf_map *map, void *key) return -EINVAL; } -static int cgroup_storage_check_btf(const struct bpf_map *map, +static int cgroup_storage_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index 1adeb4d3b8cf..0f57608b385d 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -751,7 +751,7 @@ free_stack: return err; } -static int trie_check_btf(const struct bpf_map *map, +static int trie_check_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 0378e83b4099..274039e36465 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1234,7 +1234,7 @@ int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size) } EXPORT_SYMBOL_GPL(bpf_obj_name_cpy); -int map_check_no_btf(const struct bpf_map *map, +int map_check_no_btf(struct bpf_map *map, const struct btf *btf, const struct btf_type *key_type, const struct btf_type *value_type) From f41deee082dc7b1c198de21710cb5cb9c604cae0 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 27 Feb 2026 14:48:03 -0800 Subject: [PATCH 336/359] bpf: Delay freeing fields in local storage Currently, when use_kmalloc_nolock is false, the freeing of fields for a local storage selem is done eagerly before waiting for the RCU or RCU tasks trace grace period to elapse. This opens up a window where the program which has access to the selem can recreate the fields after the freeing of fields is done eagerly, causing memory leaks when the element is finally freed and returned to the kernel. Make a few changes to address this. First, delay the freeing of fields until after the grace periods have expired using a __bpf_selem_free_rcu wrapper which is eventually invoked after transitioning through the necessary number of grace period waits. Replace usage of the kfree_rcu with call_rcu to be able to take a custom callback. Finally, care needs to be taken to extend the rcu barriers for all cases, and not just when use_kmalloc_nolock is true, as RCU and RCU tasks trace callbacks can be in flight for either case and access the smap field, which is used to obtain the BTF record to walk over special fields in the map value. While we're at it, drop migrate_disable() from bpf_selem_free_rcu, since migration should be disabled for RCU callbacks already. Fixes: 9bac675e6368 ("bpf: Postpone bpf_obj_free_fields to the rcu callback") Reviewed-by: Amery Hung Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20260227224806.646888-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/bpf_local_storage.c | 42 ++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c index 2bf5ca5ae0df..d7db1ada2dad 100644 --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -164,16 +164,28 @@ static void bpf_local_storage_free(struct bpf_local_storage *local_storage, bpf_local_storage_free_trace_rcu); } +/* rcu callback for use_kmalloc_nolock == false */ +static void __bpf_selem_free_rcu(struct rcu_head *rcu) +{ + struct bpf_local_storage_elem *selem; + struct bpf_local_storage_map *smap; + + selem = container_of(rcu, struct bpf_local_storage_elem, rcu); + /* bpf_selem_unlink_nofail may have already cleared smap and freed fields. */ + smap = rcu_dereference_check(SDATA(selem)->smap, 1); + + if (smap) + bpf_obj_free_fields(smap->map.record, SDATA(selem)->data); + kfree(selem); +} + /* rcu tasks trace callback for use_kmalloc_nolock == false */ static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu) { - struct bpf_local_storage_elem *selem; - - selem = container_of(rcu, struct bpf_local_storage_elem, rcu); if (rcu_trace_implies_rcu_gp()) - kfree(selem); + __bpf_selem_free_rcu(rcu); else - kfree_rcu(selem, rcu); + call_rcu(rcu, __bpf_selem_free_rcu); } /* Handle use_kmalloc_nolock == false */ @@ -181,7 +193,7 @@ static void __bpf_selem_free(struct bpf_local_storage_elem *selem, bool vanilla_rcu) { if (vanilla_rcu) - kfree_rcu(selem, rcu); + call_rcu(&selem->rcu, __bpf_selem_free_rcu); else call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu); } @@ -195,11 +207,8 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu) /* The bpf_local_storage_map_free will wait for rcu_barrier */ smap = rcu_dereference_check(SDATA(selem)->smap, 1); - if (smap) { - migrate_disable(); + if (smap) bpf_obj_free_fields(smap->map.record, SDATA(selem)->data); - migrate_enable(); - } kfree_nolock(selem); } @@ -214,18 +223,12 @@ static void bpf_selem_free_trace_rcu(struct rcu_head *rcu) void bpf_selem_free(struct bpf_local_storage_elem *selem, bool reuse_now) { - struct bpf_local_storage_map *smap; - - smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held()); - if (!selem->use_kmalloc_nolock) { /* * No uptr will be unpin even when reuse_now == false since uptr * is only supported in task local storage, where * smap->use_kmalloc_nolock == true. */ - if (smap) - bpf_obj_free_fields(smap->map.record, SDATA(selem)->data); __bpf_selem_free(selem, reuse_now); return; } @@ -958,10 +961,9 @@ restart: */ synchronize_rcu(); - if (smap->use_kmalloc_nolock) { - rcu_barrier_tasks_trace(); - rcu_barrier(); - } + /* smap remains in use regardless of kmalloc_nolock, so wait unconditionally. */ + rcu_barrier_tasks_trace(); + rcu_barrier(); kvfree(smap->buckets); bpf_map_area_free(smap); } From baa35b3cb6b642b903162aacff13181e170c4ecc Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 27 Feb 2026 14:48:04 -0800 Subject: [PATCH 337/359] bpf: Retire rcu_trace_implies_rcu_gp() from local storage This assumption will always hold going forward, hence just remove the various checks and assume it is true with a comment for the uninformed reader. Reviewed-by: Paul E. McKenney Reviewed-by: Amery Hung Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20260227224806.646888-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/bpf_local_storage.c | 37 +++++++++++++++++----------------- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c index d7db1ada2dad..9c96a4477f81 100644 --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -107,14 +107,12 @@ static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu) { struct bpf_local_storage *local_storage; - /* If RCU Tasks Trace grace period implies RCU grace period, do - * kfree(), else do kfree_rcu(). + /* + * RCU Tasks Trace grace period implies RCU grace period, do + * kfree() directly. */ local_storage = container_of(rcu, struct bpf_local_storage, rcu); - if (rcu_trace_implies_rcu_gp()) - kfree(local_storage); - else - kfree_rcu(local_storage, rcu); + kfree(local_storage); } /* Handle use_kmalloc_nolock == false */ @@ -138,10 +136,11 @@ static void bpf_local_storage_free_rcu(struct rcu_head *rcu) static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu) { - if (rcu_trace_implies_rcu_gp()) - bpf_local_storage_free_rcu(rcu); - else - call_rcu(rcu, bpf_local_storage_free_rcu); + /* + * RCU Tasks Trace grace period implies RCU grace period, do + * kfree() directly. + */ + bpf_local_storage_free_rcu(rcu); } static void bpf_local_storage_free(struct bpf_local_storage *local_storage, @@ -182,10 +181,11 @@ static void __bpf_selem_free_rcu(struct rcu_head *rcu) /* rcu tasks trace callback for use_kmalloc_nolock == false */ static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu) { - if (rcu_trace_implies_rcu_gp()) - __bpf_selem_free_rcu(rcu); - else - call_rcu(rcu, __bpf_selem_free_rcu); + /* + * RCU Tasks Trace grace period implies RCU grace period, do + * kfree() directly. + */ + __bpf_selem_free_rcu(rcu); } /* Handle use_kmalloc_nolock == false */ @@ -214,10 +214,11 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu) static void bpf_selem_free_trace_rcu(struct rcu_head *rcu) { - if (rcu_trace_implies_rcu_gp()) - bpf_selem_free_rcu(rcu); - else - call_rcu(rcu, bpf_selem_free_rcu); + /* + * RCU Tasks Trace grace period implies RCU grace period, do + * kfree() directly. + */ + bpf_selem_free_rcu(rcu); } void bpf_selem_free(struct bpf_local_storage_elem *selem, From 2939d7b3b0e5f35359ce8f69dbbad0bfc4e920b6 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 27 Feb 2026 14:48:05 -0800 Subject: [PATCH 338/359] selftests/bpf: Add tests for special fields races Add a couple of tests to ensure that the refcount drops to zero when we exercise the race where creation of a special field succeeds the logical bpf_obj_free_fields done when deleting an element. Prior to previous changes, the fields would be freed eagerly and repopulate and end up leaking, causing the reference to not drop down correctly. Running this test on a kernel without fixes will cause a hang in delete_module, since the module reference stays active due to the leaked kptr not dropping it. After the fixes tests succeed as expected. Reviewed-by: Amery Hung Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20260227224806.646888-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/prog_tests/map_kptr_race.c | 218 ++++++++++++++++++ .../selftests/bpf/progs/map_kptr_race.c | 197 ++++++++++++++++ 2 files changed, 415 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/map_kptr_race.c create mode 100644 tools/testing/selftests/bpf/progs/map_kptr_race.c diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr_race.c b/tools/testing/selftests/bpf/prog_tests/map_kptr_race.c new file mode 100644 index 000000000000..506ed55e8528 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/map_kptr_race.c @@ -0,0 +1,218 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */ +#include +#include + +#include "map_kptr_race.skel.h" + +static int get_map_id(int map_fd) +{ + struct bpf_map_info info = {}; + __u32 len = sizeof(info); + + if (!ASSERT_OK(bpf_map_get_info_by_fd(map_fd, &info, &len), "get_map_info")) + return -1; + return info.id; +} + +static int read_refs(struct map_kptr_race *skel) +{ + LIBBPF_OPTS(bpf_test_run_opts, opts); + int ret; + + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.count_ref), &opts); + if (!ASSERT_OK(ret, "count_ref run")) + return -1; + if (!ASSERT_OK(opts.retval, "count_ref retval")) + return -1; + return skel->bss->num_of_refs; +} + +static void test_htab_leak(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct map_kptr_race *skel, *watcher; + int ret, map_id; + + skel = map_kptr_race__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_htab_leak), &opts); + if (!ASSERT_OK(ret, "test_htab_leak run")) + goto out_skel; + if (!ASSERT_OK(opts.retval, "test_htab_leak retval")) + goto out_skel; + + map_id = get_map_id(bpf_map__fd(skel->maps.race_hash_map)); + if (!ASSERT_GE(map_id, 0, "map_id")) + goto out_skel; + + watcher = map_kptr_race__open_and_load(); + if (!ASSERT_OK_PTR(watcher, "watcher open_and_load")) + goto out_skel; + + watcher->bss->target_map_id = map_id; + watcher->links.map_put = bpf_program__attach(watcher->progs.map_put); + if (!ASSERT_OK_PTR(watcher->links.map_put, "attach fentry")) + goto out_watcher; + watcher->links.htab_map_free = bpf_program__attach(watcher->progs.htab_map_free); + if (!ASSERT_OK_PTR(watcher->links.htab_map_free, "attach fexit")) + goto out_watcher; + + map_kptr_race__destroy(skel); + skel = NULL; + + kern_sync_rcu(); + + while (!READ_ONCE(watcher->bss->map_freed)) + sched_yield(); + + ASSERT_EQ(watcher->bss->map_freed, 1, "map_freed"); + ASSERT_EQ(read_refs(watcher), 2, "htab refcount"); + +out_watcher: + map_kptr_race__destroy(watcher); +out_skel: + map_kptr_race__destroy(skel); +} + +static void test_percpu_htab_leak(void) +{ + LIBBPF_OPTS(bpf_test_run_opts, opts, + .data_in = &pkt_v4, + .data_size_in = sizeof(pkt_v4), + .repeat = 1, + ); + struct map_kptr_race *skel, *watcher; + int ret, map_id; + + skel = map_kptr_race__open(); + if (!ASSERT_OK_PTR(skel, "open")) + return; + + skel->rodata->nr_cpus = libbpf_num_possible_cpus(); + if (skel->rodata->nr_cpus > 16) + skel->rodata->nr_cpus = 16; + + ret = map_kptr_race__load(skel); + if (!ASSERT_OK(ret, "load")) + goto out_skel; + + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_percpu_htab_leak), &opts); + if (!ASSERT_OK(ret, "test_percpu_htab_leak run")) + goto out_skel; + if (!ASSERT_OK(opts.retval, "test_percpu_htab_leak retval")) + goto out_skel; + + map_id = get_map_id(bpf_map__fd(skel->maps.race_percpu_hash_map)); + if (!ASSERT_GE(map_id, 0, "map_id")) + goto out_skel; + + watcher = map_kptr_race__open_and_load(); + if (!ASSERT_OK_PTR(watcher, "watcher open_and_load")) + goto out_skel; + + watcher->bss->target_map_id = map_id; + watcher->links.map_put = bpf_program__attach(watcher->progs.map_put); + if (!ASSERT_OK_PTR(watcher->links.map_put, "attach fentry")) + goto out_watcher; + watcher->links.htab_map_free = bpf_program__attach(watcher->progs.htab_map_free); + if (!ASSERT_OK_PTR(watcher->links.htab_map_free, "attach fexit")) + goto out_watcher; + + map_kptr_race__destroy(skel); + skel = NULL; + + kern_sync_rcu(); + + while (!READ_ONCE(watcher->bss->map_freed)) + sched_yield(); + + ASSERT_EQ(watcher->bss->map_freed, 1, "map_freed"); + ASSERT_EQ(read_refs(watcher), 2, "percpu_htab refcount"); + +out_watcher: + map_kptr_race__destroy(watcher); +out_skel: + map_kptr_race__destroy(skel); +} + +static void test_sk_ls_leak(void) +{ + struct map_kptr_race *skel, *watcher; + int listen_fd = -1, client_fd = -1, map_id; + + skel = map_kptr_race__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + if (!ASSERT_OK(map_kptr_race__attach(skel), "attach")) + goto out_skel; + + listen_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0); + if (!ASSERT_GE(listen_fd, 0, "start_server")) + goto out_skel; + + client_fd = connect_to_fd(listen_fd, 0); + if (!ASSERT_GE(client_fd, 0, "connect_to_fd")) + goto out_skel; + + if (!ASSERT_EQ(skel->bss->sk_ls_leak_done, 1, "sk_ls_leak_done")) + goto out_skel; + + close(client_fd); + client_fd = -1; + close(listen_fd); + listen_fd = -1; + + map_id = get_map_id(bpf_map__fd(skel->maps.race_sk_ls_map)); + if (!ASSERT_GE(map_id, 0, "map_id")) + goto out_skel; + + watcher = map_kptr_race__open_and_load(); + if (!ASSERT_OK_PTR(watcher, "watcher open_and_load")) + goto out_skel; + + watcher->bss->target_map_id = map_id; + watcher->links.map_put = bpf_program__attach(watcher->progs.map_put); + if (!ASSERT_OK_PTR(watcher->links.map_put, "attach fentry")) + goto out_watcher; + watcher->links.sk_map_free = bpf_program__attach(watcher->progs.sk_map_free); + if (!ASSERT_OK_PTR(watcher->links.sk_map_free, "attach fexit")) + goto out_watcher; + + map_kptr_race__destroy(skel); + skel = NULL; + + kern_sync_rcu(); + + while (!READ_ONCE(watcher->bss->map_freed)) + sched_yield(); + + ASSERT_EQ(watcher->bss->map_freed, 1, "map_freed"); + ASSERT_EQ(read_refs(watcher), 2, "sk_ls refcount"); + +out_watcher: + map_kptr_race__destroy(watcher); +out_skel: + if (client_fd >= 0) + close(client_fd); + if (listen_fd >= 0) + close(listen_fd); + map_kptr_race__destroy(skel); +} + +void serial_test_map_kptr_race(void) +{ + if (test__start_subtest("htab_leak")) + test_htab_leak(); + if (test__start_subtest("percpu_htab_leak")) + test_percpu_htab_leak(); + if (test__start_subtest("sk_ls_leak")) + test_sk_ls_leak(); +} diff --git a/tools/testing/selftests/bpf/progs/map_kptr_race.c b/tools/testing/selftests/bpf/progs/map_kptr_race.c new file mode 100644 index 000000000000..f6f136cd8f60 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/map_kptr_race.c @@ -0,0 +1,197 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */ +#include +#include +#include +#include "../test_kmods/bpf_testmod_kfunc.h" + +struct map_value { + struct prog_test_ref_kfunc __kptr *ref_ptr; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct map_value); + __uint(max_entries, 1); +} race_hash_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERCPU_HASH); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct map_value); + __uint(max_entries, 1); +} race_percpu_hash_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_SK_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct map_value); +} race_sk_ls_map SEC(".maps"); + +int num_of_refs; +int sk_ls_leak_done; +int target_map_id; +int map_freed; +const volatile int nr_cpus; + +SEC("tc") +int test_htab_leak(struct __sk_buff *skb) +{ + struct prog_test_ref_kfunc *p, *old; + struct map_value val = {}; + struct map_value *v; + int key = 0; + + if (bpf_map_update_elem(&race_hash_map, &key, &val, BPF_ANY)) + return 1; + + v = bpf_map_lookup_elem(&race_hash_map, &key); + if (!v) + return 2; + + p = bpf_kfunc_call_test_acquire(&(unsigned long){0}); + if (!p) + return 3; + old = bpf_kptr_xchg(&v->ref_ptr, p); + if (old) + bpf_kfunc_call_test_release(old); + + bpf_map_delete_elem(&race_hash_map, &key); + + p = bpf_kfunc_call_test_acquire(&(unsigned long){0}); + if (!p) + return 4; + old = bpf_kptr_xchg(&v->ref_ptr, p); + if (old) + bpf_kfunc_call_test_release(old); + + return 0; +} + +static int fill_percpu_kptr(struct map_value *v) +{ + struct prog_test_ref_kfunc *p, *old; + + p = bpf_kfunc_call_test_acquire(&(unsigned long){0}); + if (!p) + return 1; + old = bpf_kptr_xchg(&v->ref_ptr, p); + if (old) + bpf_kfunc_call_test_release(old); + return 0; +} + +SEC("tc") +int test_percpu_htab_leak(struct __sk_buff *skb) +{ + struct map_value *v, *arr[16] = {}; + struct map_value val = {}; + int key = 0; + int err = 0; + + if (bpf_map_update_elem(&race_percpu_hash_map, &key, &val, BPF_ANY)) + return 1; + + for (int i = 0; i < nr_cpus; i++) { + v = bpf_map_lookup_percpu_elem(&race_percpu_hash_map, &key, i); + if (!v) + return 2; + arr[i] = v; + } + + bpf_map_delete_elem(&race_percpu_hash_map, &key); + + for (int i = 0; i < nr_cpus; i++) { + v = arr[i]; + err = fill_percpu_kptr(v); + if (err) + return 3; + } + + return 0; +} + +SEC("tp_btf/inet_sock_set_state") +int BPF_PROG(test_sk_ls_leak, struct sock *sk, int oldstate, int newstate) +{ + struct prog_test_ref_kfunc *p, *old; + struct map_value *v; + + if (newstate != BPF_TCP_SYN_SENT) + return 0; + + if (sk_ls_leak_done) + return 0; + + v = bpf_sk_storage_get(&race_sk_ls_map, sk, NULL, + BPF_SK_STORAGE_GET_F_CREATE); + if (!v) + return 0; + + p = bpf_kfunc_call_test_acquire(&(unsigned long){0}); + if (!p) + return 0; + old = bpf_kptr_xchg(&v->ref_ptr, p); + if (old) + bpf_kfunc_call_test_release(old); + + bpf_sk_storage_delete(&race_sk_ls_map, sk); + + p = bpf_kfunc_call_test_acquire(&(unsigned long){0}); + if (!p) + return 0; + old = bpf_kptr_xchg(&v->ref_ptr, p); + if (old) + bpf_kfunc_call_test_release(old); + + sk_ls_leak_done = 1; + return 0; +} + +long target_map_ptr; + +SEC("fentry/bpf_map_put") +int BPF_PROG(map_put, struct bpf_map *map) +{ + if (target_map_id && map->id == (u32)target_map_id) + target_map_ptr = (long)map; + return 0; +} + +SEC("fexit/htab_map_free") +int BPF_PROG(htab_map_free, struct bpf_map *map) +{ + if (target_map_ptr && (long)map == target_map_ptr) + map_freed = 1; + return 0; +} + +SEC("fexit/bpf_sk_storage_map_free") +int BPF_PROG(sk_map_free, struct bpf_map *map) +{ + if (target_map_ptr && (long)map == target_map_ptr) + map_freed = 1; + return 0; +} + +SEC("syscall") +int count_ref(void *ctx) +{ + struct prog_test_ref_kfunc *p; + unsigned long arg = 0; + + p = bpf_kfunc_call_test_acquire(&arg); + if (!p) + return 1; + + num_of_refs = p->cnt.refs.counter; + + bpf_kfunc_call_test_release(p); + return 0; +} + +char _license[] SEC("license") = "GPL"; From 869c63d5975d55e97f6b168e885452b3da20ea47 Mon Sep 17 00:00:00 2001 From: Jiayuan Chen Date: Wed, 25 Feb 2026 20:14:55 +0800 Subject: [PATCH 339/359] bpf: Fix race in cpumap on PREEMPT_RT On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed concurrently by multiple preemptible tasks on the same CPU. The original code assumes bq_enqueue() and __cpu_map_flush() run atomically with respect to each other on the same CPU, relying on local_bh_disable() to prevent preemption. However, on PREEMPT_RT, local_bh_disable() only calls migrate_disable() (when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable preemption, which allows CFS scheduling to preempt a task during bq_flush_to_queue(), enabling another task on the same CPU to enter bq_enqueue() and operate on the same per-CPU bq concurrently. This leads to several races: 1. Double __list_del_clearprev(): after bq->count is reset in bq_flush_to_queue(), a preempting task can call bq_enqueue() -> bq_flush_to_queue() on the same bq when bq->count reaches CPU_MAP_BULK_SIZE. Both tasks then call __list_del_clearprev() on the same bq->flush_node, the second call dereferences the prev pointer that was already set to NULL by the first. 2. bq->count and bq->q[] races: concurrent bq_enqueue() can corrupt the packet queue while bq_flush_to_queue() is processing it. The race between task A (__cpu_map_flush -> bq_flush_to_queue) and task B (bq_enqueue -> bq_flush_to_queue) on the same CPU: Task A (xdp_do_flush) Task B (cpu_map_enqueue) ---------------------- ------------------------ bq_flush_to_queue(bq) spin_lock(&q->producer_lock) /* flush bq->q[] to ptr_ring */ bq->count = 0 spin_unlock(&q->producer_lock) bq_enqueue(rcpu, xdpf) <-- CFS preempts Task A --> bq->q[bq->count++] = xdpf /* ... more enqueues until full ... */ bq_flush_to_queue(bq) spin_lock(&q->producer_lock) /* flush to ptr_ring */ spin_unlock(&q->producer_lock) __list_del_clearprev(flush_node) /* sets flush_node.prev = NULL */ <-- Task A resumes --> __list_del_clearprev(flush_node) flush_node.prev->next = ... /* prev is NULL -> kernel oops */ Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it in bq_enqueue() and __cpu_map_flush(). These paths already run under local_bh_disable(), so use local_lock_nested_bh() which on non-RT is a pure annotation with no overhead, and on PREEMPT_RT provides a per-CPU sleeping lock that serializes access to the bq. To reproduce, insert an mdelay(100) between bq->count = 0 and __list_del_clearprev() in bq_flush_to_queue(), then run reproducer provided by syzkaller. Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/ Reviewed-by: Sebastian Andrzej Siewior Signed-off-by: Jiayuan Chen Signed-off-by: Jiayuan Chen Link: https://lore.kernel.org/r/20260225121459.183121-2-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov --- kernel/bpf/cpumap.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 04171fbc39cb..32b43cb9061b 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -52,6 +53,7 @@ struct xdp_bulk_queue { struct list_head flush_node; struct bpf_cpu_map_entry *obj; unsigned int count; + local_lock_t bq_lock; }; /* Struct for every remote "destination" CPU in map */ @@ -451,6 +453,7 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value, for_each_possible_cpu(i) { bq = per_cpu_ptr(rcpu->bulkq, i); bq->obj = rcpu; + local_lock_init(&bq->bq_lock); } /* Alloc queue */ @@ -722,6 +725,8 @@ static void bq_flush_to_queue(struct xdp_bulk_queue *bq) struct ptr_ring *q; int i; + lockdep_assert_held(&bq->bq_lock); + if (unlikely(!bq->count)) return; @@ -749,11 +754,15 @@ static void bq_flush_to_queue(struct xdp_bulk_queue *bq) } /* Runs under RCU-read-side, plus in softirq under NAPI protection. - * Thus, safe percpu variable access. + * Thus, safe percpu variable access. PREEMPT_RT relies on + * local_lock_nested_bh() to serialise access to the per-CPU bq. */ static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) { - struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq); + struct xdp_bulk_queue *bq; + + local_lock_nested_bh(&rcpu->bulkq->bq_lock); + bq = this_cpu_ptr(rcpu->bulkq); if (unlikely(bq->count == CPU_MAP_BULK_SIZE)) bq_flush_to_queue(bq); @@ -774,6 +783,8 @@ static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) list_add(&bq->flush_node, flush_list); } + + local_unlock_nested_bh(&rcpu->bulkq->bq_lock); } int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf, @@ -810,7 +821,9 @@ void __cpu_map_flush(struct list_head *flush_list) struct xdp_bulk_queue *bq, *tmp; list_for_each_entry_safe(bq, tmp, flush_list, flush_node) { + local_lock_nested_bh(&bq->obj->bulkq->bq_lock); bq_flush_to_queue(bq); + local_unlock_nested_bh(&bq->obj->bulkq->bq_lock); /* If already running, costs spin_lock_irqsave + smb_mb */ wake_up_process(bq->obj->kthread); From 1872e75375c40add4a35990de3be77b5741c252c Mon Sep 17 00:00:00 2001 From: Jiayuan Chen Date: Wed, 25 Feb 2026 20:14:56 +0800 Subject: [PATCH 340/359] bpf: Fix race in devmap on PREEMPT_RT On PREEMPT_RT kernels, the per-CPU xdp_dev_bulk_queue (bq) can be accessed concurrently by multiple preemptible tasks on the same CPU. The original code assumes bq_enqueue() and __dev_flush() run atomically with respect to each other on the same CPU, relying on local_bh_disable() to prevent preemption. However, on PREEMPT_RT, local_bh_disable() only calls migrate_disable() (when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable preemption, which allows CFS scheduling to preempt a task during bq_xmit_all(), enabling another task on the same CPU to enter bq_enqueue() and operate on the same per-CPU bq concurrently. This leads to several races: 1. Double-free / use-after-free on bq->q[]: bq_xmit_all() snapshots cnt = bq->count, then iterates bq->q[0..cnt-1] to transmit frames. If preempted after the snapshot, a second task can call bq_enqueue() -> bq_xmit_all() on the same bq, transmitting (and freeing) the same frames. When the first task resumes, it operates on stale pointers in bq->q[], causing use-after-free. 2. bq->count and bq->q[] corruption: concurrent bq_enqueue() modifying bq->count and bq->q[] while bq_xmit_all() is reading them. 3. dev_rx/xdp_prog teardown race: __dev_flush() clears bq->dev_rx and bq->xdp_prog after bq_xmit_all(). If preempted between bq_xmit_all() return and bq->dev_rx = NULL, a preempting bq_enqueue() sees dev_rx still set (non-NULL), skips adding bq to the flush_list, and enqueues a frame. When __dev_flush() resumes, it clears dev_rx and removes bq from the flush_list, orphaning the newly enqueued frame. 4. __list_del_clearprev() on flush_node: similar to the cpumap race, both tasks can call __list_del_clearprev() on the same flush_node, the second dereferences the prev pointer already set to NULL. The race between task A (__dev_flush -> bq_xmit_all) and task B (bq_enqueue -> bq_xmit_all) on the same CPU: Task A (xdp_do_flush) Task B (ndo_xdp_xmit redirect) ---------------------- -------------------------------- __dev_flush(flush_list) bq_xmit_all(bq) cnt = bq->count /* e.g. 16 */ /* start iterating bq->q[] */ <-- CFS preempts Task A --> bq_enqueue(dev, xdpf) bq->count == DEV_MAP_BULK_SIZE bq_xmit_all(bq, 0) cnt = bq->count /* same 16! */ ndo_xdp_xmit(bq->q[]) /* frames freed by driver */ bq->count = 0 <-- Task A resumes --> ndo_xdp_xmit(bq->q[]) /* use-after-free: frames already freed! */ Fix this by adding a local_lock_t to xdp_dev_bulk_queue and acquiring it in bq_enqueue() and __dev_flush(). These paths already run under local_bh_disable(), so use local_lock_nested_bh() which on non-RT is a pure annotation with no overhead, and on PREEMPT_RT provides a per-CPU sleeping lock that serializes access to the bq. Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Reported-by: Sebastian Andrzej Siewior Reviewed-by: Sebastian Andrzej Siewior Signed-off-by: Jiayuan Chen Signed-off-by: Jiayuan Chen Link: https://lore.kernel.org/r/20260225121459.183121-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov --- kernel/bpf/devmap.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 2984e938f94d..3d619d01088e 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -45,6 +45,7 @@ * types of devmap; only the lookup and insertion is different. */ #include +#include #include #include #include @@ -60,6 +61,7 @@ struct xdp_dev_bulk_queue { struct net_device *dev_rx; struct bpf_prog *xdp_prog; unsigned int count; + local_lock_t bq_lock; }; struct bpf_dtab_netdev { @@ -381,6 +383,8 @@ static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) int to_send = cnt; int i; + lockdep_assert_held(&bq->bq_lock); + if (unlikely(!cnt)) return; @@ -425,10 +429,12 @@ void __dev_flush(struct list_head *flush_list) struct xdp_dev_bulk_queue *bq, *tmp; list_for_each_entry_safe(bq, tmp, flush_list, flush_node) { + local_lock_nested_bh(&bq->dev->xdp_bulkq->bq_lock); bq_xmit_all(bq, XDP_XMIT_FLUSH); bq->dev_rx = NULL; bq->xdp_prog = NULL; __list_del_clearprev(&bq->flush_node); + local_unlock_nested_bh(&bq->dev->xdp_bulkq->bq_lock); } } @@ -451,12 +457,16 @@ static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key) /* Runs in NAPI, i.e., softirq under local_bh_disable(). Thus, safe percpu * variable access, and map elements stick around. See comment above - * xdp_do_flush() in filter.c. + * xdp_do_flush() in filter.c. PREEMPT_RT relies on local_lock_nested_bh() + * to serialise access to the per-CPU bq. */ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, struct net_device *dev_rx, struct bpf_prog *xdp_prog) { - struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); + struct xdp_dev_bulk_queue *bq; + + local_lock_nested_bh(&dev->xdp_bulkq->bq_lock); + bq = this_cpu_ptr(dev->xdp_bulkq); if (unlikely(bq->count == DEV_MAP_BULK_SIZE)) bq_xmit_all(bq, 0); @@ -477,6 +487,8 @@ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, } bq->q[bq->count++] = xdpf; + + local_unlock_nested_bh(&dev->xdp_bulkq->bq_lock); } static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf, @@ -1127,8 +1139,13 @@ static int dev_map_notification(struct notifier_block *notifier, if (!netdev->xdp_bulkq) return NOTIFY_BAD; - for_each_possible_cpu(cpu) - per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev; + for_each_possible_cpu(cpu) { + struct xdp_dev_bulk_queue *bq; + + bq = per_cpu_ptr(netdev->xdp_bulkq, cpu); + bq->dev = netdev; + local_lock_init(&bq->bq_lock); + } break; case NETDEV_UNREGISTER: /* This rcu_read_lock/unlock pair is needed because From 76e954155b45294c502e3d3a9e15757c858ca55e Mon Sep 17 00:00:00 2001 From: Harishankar Vishwanathan Date: Fri, 27 Feb 2026 22:32:21 +0100 Subject: [PATCH 341/359] bpf: Introduce tnum_step to step through tnum's members This commit introduces tnum_step(), a function that, when given t, and a number z returns the smallest member of t larger than z. The number z must be greater or equal to the smallest member of t and less than the largest member of t. The first step is to compute j, a number that keeps all of t's known bits, and matches all unknown bits to z's bits. Since j is a member of the t, it is already a candidate for result. However, we want our result to be (minimally) greater than z. There are only two possible cases: (1) Case j <= z. In this case, we want to increase the value of j and make it > z. (2) Case j > z. In this case, we want to decrease the value of j while keeping it > z. (Case 1) j <= z t = xx11x0x0 z = 10111101 (189) j = 10111000 (184) ^ k (Case 1.1) Let's first consider the case where j < z. We will address j == z later. Since z > j, there had to be a bit position that was 1 in z and a 0 in j, beyond which all positions of higher significance are equal in j and z. Further, this position could not have been unknown in a, because the unknown positions of a match z. This position had to be a 1 in z and known 0 in t. Let k be position of the most significant 1-to-0 flip. In our example, k = 3 (starting the count at 1 at the least significant bit). Setting (to 1) the unknown bits of t in positions of significance smaller than k will not produce a result > z. Hence, we must set/unset the unknown bits at positions of significance higher than k. Specifically, we look for the next larger combination of 1s and 0s to place in those positions, relative to the combination that exists in z. We can achieve this by concatenating bits at unknown positions of t into an integer, adding 1, and writing the bits of that result back into the corresponding bit positions previously extracted from z. >From our example, considering only positions of significance greater than k: t = xx..x z = 10..1 + 1 ----- 11..0 This is the exact combination 1s and 0s we need at the unknown bits of t in positions of significance greater than k. Further, our result must only increase the value minimally above z. Hence, unknown bits in positions of significance smaller than k should remain 0. We finally have, result = 11110000 (240) (Case 1.2) Now consider the case when j = z, for example t = 1x1x0xxx z = 10110100 (180) j = 10110100 (180) Matching the unknown bits of the t to the bits of z yielded exactly z. To produce a number greater than z, we must set/unset the unknown bits in t, and *all* the unknown bits of t candidates for being set/unset. We can do this similar to Case 1.1, by adding 1 to the bits extracted from the masked bit positions of z. Essentially, this case is equivalent to Case 1.1, with k = 0. t = 1x1x0xxx z = .0.1.100 + 1 --------- .0.1.101 This is the exact combination of bits needed in the unknown positions of t. After recalling the known positions of t, we get result = 10110101 (181) (Case 2) j > z t = x00010x1 z = 10000010 (130) j = 10001011 (139) ^ k Since j > z, there had to be a bit position which was 0 in z, and a 1 in j, beyond which all positions of higher significance are equal in j and z. This position had to be a 0 in z and known 1 in t. Let k be the position of the most significant 0-to-1 flip. In our example, k = 4. Because of the 0-to-1 flip at position k, a member of t can become greater than z if the bits in positions greater than k are themselves >= to z. To make that member *minimally* greater than z, the bits in positions greater than k must be exactly = z. Hence, we simply match all of t's unknown bits in positions more significant than k to z's bits. In positions less significant than k, we set all t's unknown bits to 0 to retain minimality. In our example, in positions of greater significance than k (=4), t=x000. These positions are matched with z (1000) to produce 1000. In positions of lower significance than k, t=10x1. All unknown bits are set to 0 to produce 1001. The final result is: result = 10001001 (137) This concludes the computation for a result > z that is a member of t. The procedure for tnum_step() in this commit implements the idea described above. As a proof of correctness, we verified the algorithm against a logical specification of tnum_step. The specification asserts the following about the inputs t, z and output res that: 1. res is a member of t, and 2. res is strictly greater than z, and 3. there does not exist another value res2 such that 3a. res2 is also a member of t, and 3b. res2 is greater than z 3c. res2 is smaller than res We checked the implementation against this logical specification using an SMT solver. The verification formula in SMTLIB format is available at [1]. The verification returned an "unsat": indicating that no input assignment exists for which the implementation and the specification produce different outputs. In addition, we also automatically generated the logical encoding of the C implementation using Agni [2] and verified it against the same specification. This verification also returned an "unsat", confirming that the implementation is equivalent to the specification. The formula for this check is also available at [3]. Link: https://pastebin.com/raw/2eRWbiit [1] Link: https://github.com/bpfverif/agni [2] Link: https://pastebin.com/raw/EztVbBJ2 [3] Co-developed-by: Srinivas Narayana Signed-off-by: Srinivas Narayana Co-developed-by: Santosh Nagarakatte Signed-off-by: Santosh Nagarakatte Signed-off-by: Harishankar Vishwanathan Link: https://lore.kernel.org/r/93fdf71910411c0f19e282ba6d03b4c65f9c5d73.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/tnum.h | 3 +++ kernel/bpf/tnum.c | 56 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) diff --git a/include/linux/tnum.h b/include/linux/tnum.h index fa4654ffb621..ca2cfec8de08 100644 --- a/include/linux/tnum.h +++ b/include/linux/tnum.h @@ -131,4 +131,7 @@ static inline bool tnum_subreg_is_const(struct tnum a) return !(tnum_subreg(a)).mask; } +/* Returns the smallest member of t larger than z */ +u64 tnum_step(struct tnum t, u64 z); + #endif /* _LINUX_TNUM_H */ diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c index 26fbfbb01700..4abc359b3db0 100644 --- a/kernel/bpf/tnum.c +++ b/kernel/bpf/tnum.c @@ -269,3 +269,59 @@ struct tnum tnum_bswap64(struct tnum a) { return TNUM(swab64(a.value), swab64(a.mask)); } + +/* Given tnum t, and a number z such that tmin <= z < tmax, where tmin + * is the smallest member of the t (= t.value) and tmax is the largest + * member of t (= t.value | t.mask), returns the smallest member of t + * larger than z. + * + * For example, + * t = x11100x0 + * z = 11110001 (241) + * result = 11110010 (242) + * + * Note: if this function is called with z >= tmax, it just returns + * early with tmax; if this function is called with z < tmin, the + * algorithm already returns tmin. + */ +u64 tnum_step(struct tnum t, u64 z) +{ + u64 tmax, j, p, q, r, s, v, u, w, res; + u8 k; + + tmax = t.value | t.mask; + + /* if z >= largest member of t, return largest member of t */ + if (z >= tmax) + return tmax; + + /* if z < smallest member of t, return smallest member of t */ + if (z < t.value) + return t.value; + + /* keep t's known bits, and match all unknown bits to z */ + j = t.value | (z & t.mask); + + if (j > z) { + p = ~z & t.value & ~t.mask; + k = fls64(p); /* k is the most-significant 0-to-1 flip */ + q = U64_MAX << k; + r = q & z; /* positions > k matched to z */ + s = ~q & t.value; /* positions <= k matched to t.value */ + v = r | s; + res = v; + } else { + p = z & ~t.value & ~t.mask; + k = fls64(p); /* k is the most-significant 1-to-0 flip */ + q = U64_MAX << k; + r = q & t.mask & z; /* unknown positions > k, matched to z */ + s = q & ~t.mask; /* known positions > k, set to 1 */ + v = r | s; + /* add 1 to unknown positions > k to make value greater than z */ + u = v + (1ULL << k); + /* extract bits in unknown positions > k from u, rest from t.value */ + w = (u & t.mask) | t.value; + res = w; + } + return res; +} From efc11a667878a1d655ff034a93a539debbfedb12 Mon Sep 17 00:00:00 2001 From: Paul Chaignon Date: Fri, 27 Feb 2026 22:35:02 +0100 Subject: [PATCH 342/359] bpf: Improve bounds when tnum has a single possible value We're hitting an invariant violation in Cilium that sometimes leads to BPF programs being rejected and Cilium failing to start [1]. The following extract from verifier logs shows what's happening: from 201 to 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0 ; if (magic == MARK_MAGIC_HOST || magic == MARK_MAGIC_OVERLAY || magic == MARK_MAGIC_ENCRYPT) @ bpf_host.c:1337 236: (16) if w9 == 0xe00 goto pc+45 ; R9=scalar(smin=umin=smin32=umin32=3585,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) 237: (16) if w9 == 0xf00 goto pc+1 verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xe01, 0xe00] s64=[0xe01, 0xe00] u32=[0xe01, 0xe00] s32=[0xe01, 0xe00] var_off=(0xe00, 0x0) We reach instruction 236 with two possible values for R9, 0xe00 and 0xf00. This is perfectly reflected in the tnum, but of course the ranges are less accurate and cover [0xe00; 0xf00]. Taking the fallthrough path at instruction 236 allows the verifier to reduce the range to [0xe01; 0xf00]. The tnum is however not updated. With these ranges, at instruction 237, the verifier is not able to deduce that R9 is always equal to 0xf00. Hence the fallthrough pass is explored first, the verifier refines the bounds using the assumption that R9 != 0xf00, and ends up with an invariant violation. This pattern of impossible branch + bounds refinement is common to all invariant violations seen so far. The long-term solution is likely to rely on the refinement + invariant violation check to detect dead branches, as started by Eduard. To fix the current issue, we need something with less refactoring that we can backport. This patch uses the tnum_step helper introduced in the previous patch to detect the above situation. In particular, three cases are now detected in the bounds refinement: 1. The u64 range and the tnum only overlap in umin. u64: ---[xxxxxx]----- tnum: --xx----------x- 2. The u64 range and the tnum only overlap in the maximum value represented by the tnum, called tmax. u64: ---[xxxxxx]----- tnum: xx-----x-------- 3. The u64 range and the tnum only overlap in between umin (excluded) and umax. u64: ---[xxxxxx]----- tnum: xx----x-------x- To detect these three cases, we call tnum_step(tnum, umin), which returns the smallest member of the tnum greater than umin, called tnum_next here. We're in case (1) if umin is part of the tnum and tnum_next is greater than umax. We're in case (2) if umin is not part of the tnum and tnum_next is equal to tmax. Finally, we're in case (3) if umin is not part of the tnum, tnum_next is inferior or equal to umax, and calling tnum_step a second time gives us a value past umax. This change implements these three cases. With it, the above bytecode looks as follows: 0: (85) call bpf_get_prandom_u32#7 ; R0=scalar() 1: (47) r0 |= 3584 ; R0=scalar(smin=0x8000000000000e00,umin=umin32=3584,smin32=0x80000e00,var_off=(0xe00; 0xfffffffffffff1ff)) 2: (57) r0 &= 3840 ; R0=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) 3: (15) if r0 == 0xe00 goto pc+2 ; R0=3840 4: (15) if r0 == 0xf00 goto pc+1 4: R0=3840 6: (95) exit In addition to the new selftests, this change was also verified with Agni [3]. For the record, the raw SMT is available at [4]. The property it verifies is that: If a concrete value x is contained in all input abstract values, after __update_reg_bounds, it will continue to be contained in all output abstract values. Link: https://github.com/cilium/cilium/issues/44216 [1] Link: https://pchaigno.github.io/test-verifier-complexity.html [2] Link: https://github.com/bpfverif/agni [3] Link: https://pastebin.com/raw/naCfaqNx [4] Fixes: 0df1a55afa83 ("bpf: Warn on internal verifier errors") Acked-by: Eduard Zingerman Tested-by: Marco Schirrmeister Co-developed-by: Harishankar Vishwanathan Signed-off-by: Harishankar Vishwanathan Signed-off-by: Paul Chaignon Link: https://lore.kernel.org/r/ef254c4f68be19bd393d450188946821c588565d.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index bb12ba020649..401d6c4960ec 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2379,6 +2379,9 @@ static void __update_reg32_bounds(struct bpf_reg_state *reg) static void __update_reg64_bounds(struct bpf_reg_state *reg) { + u64 tnum_next, tmax; + bool umin_in_tnum; + /* min signed is max(sign bit) | min(other bits) */ reg->smin_value = max_t(s64, reg->smin_value, reg->var_off.value | (reg->var_off.mask & S64_MIN)); @@ -2388,6 +2391,33 @@ static void __update_reg64_bounds(struct bpf_reg_state *reg) reg->umin_value = max(reg->umin_value, reg->var_off.value); reg->umax_value = min(reg->umax_value, reg->var_off.value | reg->var_off.mask); + + /* Check if u64 and tnum overlap in a single value */ + tnum_next = tnum_step(reg->var_off, reg->umin_value); + umin_in_tnum = (reg->umin_value & ~reg->var_off.mask) == reg->var_off.value; + tmax = reg->var_off.value | reg->var_off.mask; + if (umin_in_tnum && tnum_next > reg->umax_value) { + /* The u64 range and the tnum only overlap in umin. + * u64: ---[xxxxxx]----- + * tnum: --xx----------x- + */ + ___mark_reg_known(reg, reg->umin_value); + } else if (!umin_in_tnum && tnum_next == tmax) { + /* The u64 range and the tnum only overlap in the maximum value + * represented by the tnum, called tmax. + * u64: ---[xxxxxx]----- + * tnum: xx-----x-------- + */ + ___mark_reg_known(reg, tmax); + } else if (!umin_in_tnum && tnum_next <= reg->umax_value && + tnum_step(reg->var_off, tnum_next) > reg->umax_value) { + /* The u64 range and the tnum only overlap in between umin + * (excluded) and umax. + * u64: ---[xxxxxx]----- + * tnum: xx----x-------x- + */ + ___mark_reg_known(reg, tnum_next); + } } static void __update_reg_bounds(struct bpf_reg_state *reg) From e6ad477d1bf8829973cddd9accbafa9d1a6cd15a Mon Sep 17 00:00:00 2001 From: Paul Chaignon Date: Fri, 27 Feb 2026 22:36:30 +0100 Subject: [PATCH 343/359] selftests/bpf: Test refinement of single-value tnum This patch introduces selftests to cover the new bounds refinement logic introduced in the previous patch. Without the previous patch, the first two tests fail because of the invariant violation they trigger. The last test fails because the R10 access is not detected as dead code. In addition, all three tests fail because of R0 having a non-constant value in the verifier logs. In addition, the last two cases are covering the negative cases: when we shouldn't refine the bounds because the u64 and tnum overlap in at least two values. Signed-off-by: Paul Chaignon Link: https://lore.kernel.org/r/90d880c8cf587b9f7dc715d8961cd1b8111d01a8.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/progs/verifier_bounds.c | 137 ++++++++++++++++++ 1 file changed, 137 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/verifier_bounds.c b/tools/testing/selftests/bpf/progs/verifier_bounds.c index 560531404bce..97065a26cf70 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bounds.c +++ b/tools/testing/selftests/bpf/progs/verifier_bounds.c @@ -1863,4 +1863,141 @@ l1_%=: r0 = 1; \ : __clobber_all); } +/* This test covers the bounds deduction when the u64 range and the tnum + * overlap only at umax. After instruction 3, the ranges look as follows: + * + * 0 umin=0xe01 umax=0xf00 U64_MAX + * | [xxxxxxxxxxxxxx] | + * |----------------------------|------------------------------| + * | x x | tnum values + * + * The verifier can therefore deduce that the R0=0xf0=240. + */ +SEC("socket") +__description("bounds refinement with single-value tnum on umax") +__msg("3: (15) if r0 == 0xe0 {{.*}} R0=240") +__success __log_level(2) +__flag(BPF_F_TEST_REG_INVARIANTS) +__naked void bounds_refinement_tnum_umax(void *ctx) +{ + asm volatile(" \ + call %[bpf_get_prandom_u32]; \ + r0 |= 0xe0; \ + r0 &= 0xf0; \ + if r0 == 0xe0 goto +2; \ + if r0 == 0xf0 goto +1; \ + r10 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* This test covers the bounds deduction when the u64 range and the tnum + * overlap only at umin. After instruction 3, the ranges look as follows: + * + * 0 umin=0xe00 umax=0xeff U64_MAX + * | [xxxxxxxxxxxxxx] | + * |----------------------------|------------------------------| + * | x x | tnum values + * + * The verifier can therefore deduce that the R0=0xe0=224. + */ +SEC("socket") +__description("bounds refinement with single-value tnum on umin") +__msg("3: (15) if r0 == 0xf0 {{.*}} R0=224") +__success __log_level(2) +__flag(BPF_F_TEST_REG_INVARIANTS) +__naked void bounds_refinement_tnum_umin(void *ctx) +{ + asm volatile(" \ + call %[bpf_get_prandom_u32]; \ + r0 |= 0xe0; \ + r0 &= 0xf0; \ + if r0 == 0xf0 goto +2; \ + if r0 == 0xe0 goto +1; \ + r10 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* This test covers the bounds deduction when the only possible tnum value is + * in the middle of the u64 range. After instruction 3, the ranges look as + * follows: + * + * 0 umin=0x7cf umax=0x7df U64_MAX + * | [xxxxxxxxxxxx] | + * |----------------------------|------------------------------| + * | x x x x x | tnum values + * | +--- 0x7e0 + * +--- 0x7d0 + * + * Since the lower four bits are zero, the tnum and the u64 range only overlap + * in R0=0x7d0=2000. Instruction 5 is therefore dead code. + */ +SEC("socket") +__description("bounds refinement with single-value tnum in middle of range") +__msg("3: (a5) if r0 < 0x7cf {{.*}} R0=2000") +__success __log_level(2) +__naked void bounds_refinement_tnum_middle(void *ctx) +{ + asm volatile(" \ + call %[bpf_get_prandom_u32]; \ + if r0 & 0x0f goto +4; \ + if r0 > 0x7df goto +3; \ + if r0 < 0x7cf goto +2; \ + if r0 == 0x7d0 goto +1; \ + r10 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* This test cover the negative case for the tnum/u64 overlap. Since + * they contain the same two values (i.e., {0, 1}), we can't deduce + * anything more. + */ +SEC("socket") +__description("bounds refinement: several overlaps between tnum and u64") +__msg("2: (25) if r0 > 0x1 {{.*}} R0=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))") +__failure __log_level(2) +__naked void bounds_refinement_several_overlaps(void *ctx) +{ + asm volatile(" \ + call %[bpf_get_prandom_u32]; \ + if r0 < 0 goto +3; \ + if r0 > 1 goto +2; \ + if r0 == 1 goto +1; \ + r10 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* This test cover the negative case for the tnum/u64 overlap. Since + * they overlap in the two values contained by the u64 range (i.e., + * {0xf, 0x10}), we can't deduce anything more. + */ +SEC("socket") +__description("bounds refinement: multiple overlaps between tnum and u64") +__msg("2: (25) if r0 > 0x10 {{.*}} R0=scalar(smin=umin=smin32=umin32=15,smax=umax=smax32=umax32=16,var_off=(0x0; 0x1f))") +__failure __log_level(2) +__naked void bounds_refinement_multiple_overlaps(void *ctx) +{ + asm volatile(" \ + call %[bpf_get_prandom_u32]; \ + if r0 < 0xf goto +3; \ + if r0 > 0x10 goto +2; \ + if r0 == 0x10 goto +1; \ + r10 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + char _license[] SEC("license") = "GPL"; From 024cea2d647ed8ab942f19544b892d324dba42b4 Mon Sep 17 00:00:00 2001 From: Paul Chaignon Date: Fri, 27 Feb 2026 22:42:45 +0100 Subject: [PATCH 344/359] selftests/bpf: Avoid simplification of crafted bounds test The reg_bounds_crafted tests validate the verifier's range analysis logic. They focus on the actual ranges and thus ignore the tnum. As a consequence, they carry the assumption that the tested cases can be reproduced in userspace without using the tnum information. Unfortunately, the previous change the refinement logic breaks that assumption for one test case: (u64)2147483648 (u32) [4294967294; 0x100000000] The tested bytecode is shown below. Without our previous improvement, on the false branch of the condition, R7 is only known to have u64 range [0xfffffffe; 0x100000000]. With our improvement, and using the tnum information, we can deduce that R7 equals 0x100000000. 19: (bc) w0 = w6 ; R6=0x80000000 20: (bc) w0 = w7 ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff)) 21: (be) if w6 <= w7 goto pc+3 ; R6=0x80000000 R7=0x100000000 R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds refines the s32 range to 0 using u32 and finally also sets u32=0. From this, __reg_bound_offset improves the tnum to (0; 0x100000000). Finally, our previous patch uses this new tnum to deduce that it only intersect with u64=[0xfffffffe; 0x100000000] in a single value: 0x100000000. Because the verifier uses the tnum to reach this constant value, the selftest is unable to reproduce it by only simulating ranges. The solution implemented in this patch is to change the test case such that there is more than one overlap value between u64 and the tnum. The max. u64 value is thus changed from 0x100000000 to 0x300000000. Acked-by: Eduard Zingerman Signed-off-by: Paul Chaignon Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov --- tools/testing/selftests/bpf/prog_tests/reg_bounds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c index d93a0c7b1786..0322f817d07b 100644 --- a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c +++ b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c @@ -2091,7 +2091,7 @@ static struct subtest_case crafted_cases[] = { {U64, S64, {0, 0xffffffffULL}, {0x7fffffff, 0x7fffffff}}, {U64, U32, {0, 0x100000000}, {0, 0}}, - {U64, U32, {0xfffffffe, 0x100000000}, {0x80000000, 0x80000000}}, + {U64, U32, {0xfffffffe, 0x300000000}, {0x80000000, 0x80000000}}, {U64, S32, {0, 0xffffffff00000000ULL}, {0, 0}}, /* these are tricky cases where lower 32 bits allow to tighten 64 From 407fd8b8d8cce03856aa67329715de48b254b529 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 11 Feb 2026 19:03:03 +0100 Subject: [PATCH 345/359] KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini --- arch/arm64/kvm/Kconfig | 1 - arch/loongarch/kvm/Kconfig | 1 - arch/mips/kvm/Kconfig | 1 - arch/powerpc/kvm/Kconfig | 4 ---- arch/powerpc/kvm/powerpc.c | 1 - arch/riscv/kvm/Kconfig | 1 - arch/s390/kvm/Kconfig | 2 -- arch/x86/kvm/Kconfig | 1 - include/linux/kvm_host.h | 7 +------ virt/kvm/Kconfig | 9 +-------- virt/kvm/kvm_main.c | 16 ---------------- 11 files changed, 2 insertions(+), 42 deletions(-) diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 4f803fd1c99a..7d1f22fd490b 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -21,7 +21,6 @@ menuconfig KVM bool "Kernel-based Virtual Machine (KVM) support" select KVM_COMMON select KVM_GENERIC_HARDWARE_ENABLING - select KVM_GENERIC_MMU_NOTIFIER select HAVE_KVM_CPU_RELAX_INTERCEPT select KVM_MMIO select KVM_GENERIC_DIRTYLOG_READ_PROTECT diff --git a/arch/loongarch/kvm/Kconfig b/arch/loongarch/kvm/Kconfig index ed4f724db774..8e5213609975 100644 --- a/arch/loongarch/kvm/Kconfig +++ b/arch/loongarch/kvm/Kconfig @@ -28,7 +28,6 @@ config KVM select KVM_COMMON select KVM_GENERIC_DIRTYLOG_READ_PROTECT select KVM_GENERIC_HARDWARE_ENABLING - select KVM_GENERIC_MMU_NOTIFIER select KVM_MMIO select VIRT_XFER_TO_GUEST_WORK select SCHED_INFO diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig index cc13cc35f208..b1b9a1d67758 100644 --- a/arch/mips/kvm/Kconfig +++ b/arch/mips/kvm/Kconfig @@ -23,7 +23,6 @@ config KVM select KVM_COMMON select KVM_GENERIC_DIRTYLOG_READ_PROTECT select KVM_MMIO - select KVM_GENERIC_MMU_NOTIFIER select KVM_GENERIC_HARDWARE_ENABLING select HAVE_KVM_READONLY_MEM help diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index c9a2d50ff1b0..9a0d1c1aca6c 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -38,7 +38,6 @@ config KVM_BOOK3S_64_HANDLER config KVM_BOOK3S_PR_POSSIBLE bool select KVM_MMIO - select KVM_GENERIC_MMU_NOTIFIER config KVM_BOOK3S_HV_POSSIBLE bool @@ -81,7 +80,6 @@ config KVM_BOOK3S_64_HV tristate "KVM for POWER7 and later using hypervisor mode in host" depends on KVM_BOOK3S_64 && PPC_POWERNV select KVM_BOOK3S_HV_POSSIBLE - select KVM_GENERIC_MMU_NOTIFIER select KVM_BOOK3S_HV_PMU select CMA help @@ -203,7 +201,6 @@ config KVM_E500V2 depends on !CONTEXT_TRACKING_USER select KVM select KVM_MMIO - select KVM_GENERIC_MMU_NOTIFIER help Support running unmodified E500 guest kernels in virtual machines on E500v2 host processors. @@ -220,7 +217,6 @@ config KVM_E500MC select KVM select KVM_MMIO select KVM_BOOKE_HV - select KVM_GENERIC_MMU_NOTIFIER help Support running unmodified E500MC/E5500/E6500 guest kernels in virtual machines on E500MC/E5500/E6500 host processors. diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 9a89a6d98f97..3da40ea8c562 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -625,7 +625,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) break; #endif case KVM_CAP_SYNC_MMU: - BUILD_BUG_ON(!IS_ENABLED(CONFIG_KVM_GENERIC_MMU_NOTIFIER)); r = 1; break; #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig index 77379f77840a..ec2cee0a39e0 100644 --- a/arch/riscv/kvm/Kconfig +++ b/arch/riscv/kvm/Kconfig @@ -30,7 +30,6 @@ config KVM select KVM_GENERIC_HARDWARE_ENABLING select KVM_MMIO select VIRT_XFER_TO_GUEST_WORK - select KVM_GENERIC_MMU_NOTIFIER select SCHED_INFO select GUEST_PERF_EVENTS if PERF_EVENTS help diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index 917ac740513e..5b835bc6a194 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -28,9 +28,7 @@ config KVM select HAVE_KVM_INVALID_WAKEUPS select HAVE_KVM_NO_POLL select KVM_VFIO - select MMU_NOTIFIER select VIRT_XFER_TO_GUEST_WORK - select KVM_GENERIC_MMU_NOTIFIER select KVM_MMU_LOCKLESS_AGING help Support hosting paravirtualized guest machines using the SIE diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index d916bd766c94..801bf9e520db 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -20,7 +20,6 @@ if VIRTUALIZATION config KVM_X86 def_tristate KVM if (KVM_INTEL != n || KVM_AMD != n) select KVM_COMMON - select KVM_GENERIC_MMU_NOTIFIER select KVM_ELIDE_TLB_FLUSH_IF_YOUNG select KVM_MMU_LOCKLESS_AGING select HAVE_KVM_IRQCHIP diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index dde605cb894e..34759a262b28 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -253,7 +253,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #endif -#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER union kvm_mmu_notifier_arg { unsigned long attributes; }; @@ -275,7 +274,6 @@ struct kvm_gfn_range { bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); -#endif enum { OUTSIDE_GUEST_MODE, @@ -849,13 +847,12 @@ struct kvm { struct hlist_head irq_ack_notifier_list; #endif -#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER struct mmu_notifier mmu_notifier; unsigned long mmu_invalidate_seq; long mmu_invalidate_in_progress; gfn_t mmu_invalidate_range_start; gfn_t mmu_invalidate_range_end; -#endif + struct list_head devices; u64 manual_dirty_log_protect; struct dentry *debugfs_dentry; @@ -2118,7 +2115,6 @@ extern const struct _kvm_stats_desc kvm_vm_stats_desc[]; extern const struct kvm_stats_header kvm_vcpu_stats_header; extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[]; -#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) { if (unlikely(kvm->mmu_invalidate_in_progress)) @@ -2196,7 +2192,6 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm, return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq; } -#endif #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 267c7369c765..794976b88c6f 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -5,6 +5,7 @@ config KVM_COMMON bool select EVENTFD select INTERVAL_TREE + select MMU_NOTIFIER select PREEMPT_NOTIFIERS config HAVE_KVM_PFNCACHE @@ -93,24 +94,16 @@ config HAVE_KVM_PM_NOTIFIER config KVM_GENERIC_HARDWARE_ENABLING bool -config KVM_GENERIC_MMU_NOTIFIER - select MMU_NOTIFIER - bool - config KVM_ELIDE_TLB_FLUSH_IF_YOUNG - depends on KVM_GENERIC_MMU_NOTIFIER bool config KVM_MMU_LOCKLESS_AGING - depends on KVM_GENERIC_MMU_NOTIFIER bool config KVM_GENERIC_MEMORY_ATTRIBUTES - depends on KVM_GENERIC_MMU_NOTIFIER bool config KVM_GUEST_MEMFD - depends on KVM_GENERIC_MMU_NOTIFIER select XARRAY_MULTI bool diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 22f8a672e1fd..29ee01747347 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -502,7 +502,6 @@ void kvm_destroy_vcpus(struct kvm *kvm) } EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_destroy_vcpus); -#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) { return container_of(mn, struct kvm, mmu_notifier); @@ -901,15 +900,6 @@ static int kvm_init_mmu_notifier(struct kvm *kvm) return mmu_notifier_register(&kvm->mmu_notifier, current->mm); } -#else /* !CONFIG_KVM_GENERIC_MMU_NOTIFIER */ - -static int kvm_init_mmu_notifier(struct kvm *kvm) -{ - return 0; -} - -#endif /* CONFIG_KVM_GENERIC_MMU_NOTIFIER */ - #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER static int kvm_pm_notifier_call(struct notifier_block *bl, unsigned long state, @@ -1226,10 +1216,8 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname) out_err_no_debugfs: kvm_coalesced_mmio_free(kvm); out_no_coalesced_mmio: -#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER if (kvm->mmu_notifier.ops) mmu_notifier_unregister(&kvm->mmu_notifier, current->mm); -#endif out_err_no_mmu_notifier: kvm_disable_virtualization(); out_err_no_disable: @@ -1292,7 +1280,6 @@ static void kvm_destroy_vm(struct kvm *kvm) kvm->buses[i] = NULL; } kvm_coalesced_mmio_free(kvm); -#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm); /* * At this point, pending calls to invalidate_range_start() @@ -1311,9 +1298,6 @@ static void kvm_destroy_vm(struct kvm *kvm) kvm->mn_active_invalidate_count = 0; else WARN_ON(kvm->mmu_invalidate_in_progress); -#else - kvm_flush_shadow_all(kvm); -#endif kvm_arch_destroy_vm(kvm); kvm_destroy_devices(kvm); for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { From 70295a479da684905c18d96656d781823f418ec2 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 11 Feb 2026 19:06:31 +0100 Subject: [PATCH 346/359] KVM: always define KVM_CAP_SYNC_MMU KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 10 ++++------ arch/arm64/kvm/arm.c | 1 - arch/loongarch/kvm/vm.c | 1 - arch/mips/kvm/mips.c | 1 - arch/powerpc/kvm/powerpc.c | 5 ----- arch/riscv/kvm/vm.c | 1 - arch/s390/kvm/kvm-s390.c | 1 - arch/x86/kvm/x86.c | 1 - virt/kvm/kvm_main.c | 1 + 9 files changed, 5 insertions(+), 17 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index fc5736839edd..6f85e1b321dd 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1396,7 +1396,10 @@ or its flags may be modified, but it may not be resized. Memory for the region is taken starting at the address denoted by the field userspace_addr, which must point at user addressable memory for the entire memory slot size. Any object may back this memory, including -anonymous memory, ordinary files, and hugetlbfs. +anonymous memory, ordinary files, and hugetlbfs. Changes in the backing +of the memory region are automatically reflected into the guest. +For example, an mmap() that affects the region will be made visible +immediately. Another example is madvise(MADV_DROP). On architectures that support a form of address tagging, userspace_addr must be an untagged address. @@ -1412,11 +1415,6 @@ use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it, to make a new slot read-only. In this case, writes to this memory will be posted to userspace as KVM_EXIT_MMIO exits. -When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of -the memory region are automatically reflected into the guest. For example, an -mmap() that affects the region will be made visible immediately. Another -example is madvise(MADV_DROP). - For TDX guest, deleting/moving memory region loses guest memory contents. Read only region isn't supported. Only as-id 0 is supported. diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 29f0326f7e00..410ffd41fd73 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -358,7 +358,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) break; case KVM_CAP_IOEVENTFD: case KVM_CAP_USER_MEMORY: - case KVM_CAP_SYNC_MMU: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: case KVM_CAP_ONE_REG: case KVM_CAP_ARM_PSCI: diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c index 63fd40530aa9..5282158b8122 100644 --- a/arch/loongarch/kvm/vm.c +++ b/arch/loongarch/kvm/vm.c @@ -118,7 +118,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ONE_REG: case KVM_CAP_ENABLE_CAP: case KVM_CAP_READONLY_MEM: - case KVM_CAP_SYNC_MMU: case KVM_CAP_IMMEDIATE_EXIT: case KVM_CAP_IOEVENTFD: case KVM_CAP_MP_STATE: diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index b0fb92fda4d4..29d9f630edfb 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -1035,7 +1035,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ONE_REG: case KVM_CAP_ENABLE_CAP: case KVM_CAP_READONLY_MEM: - case KVM_CAP_SYNC_MMU: case KVM_CAP_IMMEDIATE_EXIT: r = 1; break; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 3da40ea8c562..00302399fc37 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -623,11 +623,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = !!(hv_enabled && kvmppc_hv_ops->enable_nested && !kvmppc_hv_ops->enable_nested(NULL)); break; -#endif - case KVM_CAP_SYNC_MMU: - r = 1; - break; -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE case KVM_CAP_PPC_HTAB_FD: r = hv_enabled; break; diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c index 7cbd2340c190..58bce57dc55b 100644 --- a/arch/riscv/kvm/vm.c +++ b/arch/riscv/kvm/vm.c @@ -181,7 +181,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) break; case KVM_CAP_IOEVENTFD: case KVM_CAP_USER_MEMORY: - case KVM_CAP_SYNC_MMU: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: case KVM_CAP_ONE_REG: case KVM_CAP_READONLY_MEM: diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 7a175d86cef0..bc7d6fa66eaf 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -601,7 +601,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) switch (ext) { case KVM_CAP_S390_PSW: case KVM_CAP_S390_GMAP: - case KVM_CAP_SYNC_MMU: #ifdef CONFIG_KVM_S390_UCONTROL case KVM_CAP_S390_UCONTROL: #endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3fb64905d190..a03530795707 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4805,7 +4805,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) #endif case KVM_CAP_NOP_IO_DELAY: case KVM_CAP_MP_STATE: - case KVM_CAP_SYNC_MMU: case KVM_CAP_USER_NMI: case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_IOEVENTFD: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 29ee01747347..1bc1da66b4b0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4870,6 +4870,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm, static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) { switch (arg) { + case KVM_CAP_SYNC_MMU: case KVM_CAP_USER_MEMORY: case KVM_CAP_USER_MEMORY2: case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: From 9197e5949a41cfb5d44a6b8a860766266340d558 Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Sat, 28 Feb 2026 11:56:03 +0900 Subject: [PATCH 347/359] firewire: ohci: initialize page array to use alloc_pages_bulk() correctly The call of alloc_pages_bulk() skips to fill entries of page array when the entries already have values. While, 1394 OHCI PCI driver passes the page array without initializing. It could cause invalid state at PFN validation in vmap(). Fixes: f2ae92780ab9 ("firewire: ohci: split page allocation from dma mapping") Reported-by: John Ogness Reported-and-tested-by: Harald Arnesen Reported-and-tested-by: David Gow Closes: https://lore.kernel.org/lkml/87tsv1vig5.fsf@jogness.linutronix.de/ Signed-off-by: Takashi Sakamoto Signed-off-by: Linus Torvalds --- drivers/firewire/ohci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c index 1c868c1e4a49..8153d62c58f0 100644 --- a/drivers/firewire/ohci.c +++ b/drivers/firewire/ohci.c @@ -848,7 +848,7 @@ static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci, { struct device *dev = ohci->card.device; unsigned int i; - struct page *pages[AR_BUFFERS + AR_WRAPAROUND_PAGES]; + struct page *pages[AR_BUFFERS + AR_WRAPAROUND_PAGES] = { NULL }; dma_addr_t dma_addrs[AR_BUFFERS]; void *vaddr; struct descriptor *d; From 11439c4635edd669ae435eec308f4ab8a0804808 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 1 Mar 2026 15:39:31 -0800 Subject: [PATCH 348/359] Linux 7.0-rc2 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index e944c6e71e81..2446085983f7 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 7 PATCHLEVEL = 0 SUBLEVEL = 0 -EXTRAVERSION = -rc1 +EXTRAVERSION = -rc2 NAME = Baby Opossum Posse # *DOCUMENTATION* From 54a86cf48eaa6d1ab5130d756b718775e81e1748 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Thu, 5 Feb 2026 00:25:37 +0000 Subject: [PATCH 349/359] ASoC: fsl_easrc: Fix event generation in fsl_easrc_iec958_put_bits() ALSA controls should return 1 if the value in the control changed but the control put operation fsl_easrc_iec958_put_bits() unconditionally returns 0, causing ALSA to not generate any change events. This is detected by mixer-test with large numbers of messages in the form: No event generated for Context 3 IEC958 CS5 Context 3 IEC958 CS5.0 orig 5224 read 5225, is_volatile 0 Add a suitable check. Signed-off-by: Mark Brown Link: https://patch.msgid.link/20260205-asoc-fsl-easrc-fix-events-v1-1-39d4c766918b@kernel.org Signed-off-by: Mark Brown --- sound/soc/fsl/fsl_easrc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c index e64a0d97afd0..5b6204923e8a 100644 --- a/sound/soc/fsl/fsl_easrc.c +++ b/sound/soc/fsl/fsl_easrc.c @@ -52,10 +52,13 @@ static int fsl_easrc_iec958_put_bits(struct snd_kcontrol *kcontrol, struct soc_mreg_control *mc = (struct soc_mreg_control *)kcontrol->private_value; unsigned int regval = ucontrol->value.integer.value[0]; + int ret; + + ret = (easrc_priv->bps_iec958[mc->regbase] != regval); easrc_priv->bps_iec958[mc->regbase] = regval; - return 0; + return ret; } static int fsl_easrc_iec958_get_bits(struct snd_kcontrol *kcontrol, From 31ddc62c1cd92e51b9db61d7954b85ae2ec224da Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Thu, 5 Feb 2026 00:25:38 +0000 Subject: [PATCH 350/359] ASoC: fsl_easrc: Fix event generation in fsl_easrc_iec958_set_reg() ALSA controls should return 1 if the value in the control changed but the control put operation fsl_easrc_set_reg() only returns 0 or a negative error code, causing ALSA to not generate any change events. Add a suitable check by using regmap_update_bits_check() with the underlying regmap, this is more clearly and simply correct than trying to verify that one of the generic ops is exactly equivalent to this one. Signed-off-by: Mark Brown Link: https://patch.msgid.link/20260205-asoc-fsl-easrc-fix-events-v1-2-39d4c766918b@kernel.org Signed-off-by: Mark Brown --- sound/soc/fsl/fsl_easrc.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c index 5b6204923e8a..6c56134c60cc 100644 --- a/sound/soc/fsl/fsl_easrc.c +++ b/sound/soc/fsl/fsl_easrc.c @@ -96,14 +96,17 @@ static int fsl_easrc_set_reg(struct snd_kcontrol *kcontrol, struct snd_soc_component *component = snd_kcontrol_chip(kcontrol); struct soc_mreg_control *mc = (struct soc_mreg_control *)kcontrol->private_value; + struct fsl_asrc *easrc = snd_soc_component_get_drvdata(component); unsigned int regval = ucontrol->value.integer.value[0]; + bool changed; int ret; - ret = snd_soc_component_write(component, mc->regbase, regval); - if (ret < 0) + ret = regmap_update_bits_check(easrc->regmap, mc->regbase, + GENMASK(31, 0), regval, &changed); + if (ret != 0) return ret; - return 0; + return changed; } #define SOC_SINGLE_REG_RW(xname, xreg) \ From 9351cf3fd92dc1349bb75f2f7f7324607dcf596f Mon Sep 17 00:00:00 2001 From: Richard Fitzgerald Date: Thu, 26 Feb 2026 11:01:37 +0000 Subject: [PATCH 351/359] ASoC: cs35l56: Only patch ASP registers if the DAI is part of a DAIlink Move the ASP register patches to a separate struct and apply this from the ASP DAI probe() function so that the registers are only patched if the DAI is part of a DAI link. Some systems use the ASP as a special-purpose interconnect and on these systems the ASP registers are configured by a third party (the firmware, the BIOS, or another device using the amp's secondary host control interface). If the machine driver does not hook up the ASP DAI then the ASP registers must be omitted from the patch to prevent overwriting the third party configuration. If the machine driver includes the ASP DAI in a DAI link, this implies that the machine driver and higher components (such as alsa-ucm) are taking ownership of the ASP. In this case the ASP registers are patched to known defaults and the machine driver should configure the ASP. Signed-off-by: Richard Fitzgerald Link: https://patch.msgid.link/20260226110137.1664562-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown --- include/sound/cs35l56.h | 1 + sound/soc/codecs/cs35l56-shared.c | 16 +++++++++++++++- sound/soc/codecs/cs35l56.c | 8 ++++++++ 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/include/sound/cs35l56.h b/include/sound/cs35l56.h index ae1e1489b671..28f9f5940ab6 100644 --- a/include/sound/cs35l56.h +++ b/include/sound/cs35l56.h @@ -406,6 +406,7 @@ extern const char * const cs35l56_cal_set_status_text[3]; extern const char * const cs35l56_tx_input_texts[CS35L56_NUM_INPUT_SRC]; extern const unsigned int cs35l56_tx_input_values[CS35L56_NUM_INPUT_SRC]; +int cs35l56_set_asp_patch(struct cs35l56_base *cs35l56_base); int cs35l56_set_patch(struct cs35l56_base *cs35l56_base); int cs35l56_mbox_send(struct cs35l56_base *cs35l56_base, unsigned int command); int cs35l56_firmware_shutdown(struct cs35l56_base *cs35l56_base); diff --git a/sound/soc/codecs/cs35l56-shared.c b/sound/soc/codecs/cs35l56-shared.c index 4707f28bfca2..af87ebae98cb 100644 --- a/sound/soc/codecs/cs35l56-shared.c +++ b/sound/soc/codecs/cs35l56-shared.c @@ -26,7 +26,7 @@ #include "cs35l56.h" -static const struct reg_sequence cs35l56_patch[] = { +static const struct reg_sequence cs35l56_asp_patch[] = { /* * Firmware can change these to non-defaults to satisfy SDCA. * Ensure that they are at known defaults. @@ -43,6 +43,20 @@ static const struct reg_sequence cs35l56_patch[] = { { CS35L56_ASP1TX2_INPUT, 0x00000000 }, { CS35L56_ASP1TX3_INPUT, 0x00000000 }, { CS35L56_ASP1TX4_INPUT, 0x00000000 }, +}; + +int cs35l56_set_asp_patch(struct cs35l56_base *cs35l56_base) +{ + return regmap_register_patch(cs35l56_base->regmap, cs35l56_asp_patch, + ARRAY_SIZE(cs35l56_asp_patch)); +} +EXPORT_SYMBOL_NS_GPL(cs35l56_set_asp_patch, "SND_SOC_CS35L56_SHARED"); + +static const struct reg_sequence cs35l56_patch[] = { + /* + * Firmware can change these to non-defaults to satisfy SDCA. + * Ensure that they are at known defaults. + */ { CS35L56_SWIRE_DP3_CH1_INPUT, 0x00000018 }, { CS35L56_SWIRE_DP3_CH2_INPUT, 0x00000019 }, { CS35L56_SWIRE_DP3_CH3_INPUT, 0x00000029 }, diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index 2ff8b172b76e..3bf9e8fc34a2 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -348,6 +348,13 @@ static int cs35l56_dsp_event(struct snd_soc_dapm_widget *w, return wm_adsp_event(w, kcontrol, event); } +static int cs35l56_asp_dai_probe(struct snd_soc_dai *codec_dai) +{ + struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(codec_dai->component); + + return cs35l56_set_asp_patch(&cs35l56->base); +} + static int cs35l56_asp_dai_set_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt) { struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(codec_dai->component); @@ -552,6 +559,7 @@ static int cs35l56_asp_dai_set_sysclk(struct snd_soc_dai *dai, } static const struct snd_soc_dai_ops cs35l56_ops = { + .probe = cs35l56_asp_dai_probe, .set_fmt = cs35l56_asp_dai_set_fmt, .set_tdm_slot = cs35l56_asp_dai_set_tdm_slot, .hw_params = cs35l56_asp_dai_hw_params, From 986841dcad257615a6e3f89231bb38e1f3506b77 Mon Sep 17 00:00:00 2001 From: Shuming Fan Date: Wed, 25 Feb 2026 17:12:10 +0800 Subject: [PATCH 352/359] ASoC: rt1321: fix DMIC ch2/3 mask issue This patch fixed the DMIC ch2/3 mask missing problem. Signed-off-by: Shuming Fan Link: https://patch.msgid.link/20260225091210.3648905-1-shumingf@realtek.com Signed-off-by: Mark Brown --- sound/soc/codecs/rt1320-sdw.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sound/soc/codecs/rt1320-sdw.c b/sound/soc/codecs/rt1320-sdw.c index 50f65662e143..8bb7e8497c72 100644 --- a/sound/soc/codecs/rt1320-sdw.c +++ b/sound/soc/codecs/rt1320-sdw.c @@ -2629,7 +2629,7 @@ static int rt1320_sdw_hw_params(struct snd_pcm_substream *substream, struct sdw_port_config port_config; struct sdw_port_config dmic_port_config[2]; struct sdw_stream_runtime *sdw_stream; - int retval; + int retval, num_channels; unsigned int sampling_rate; dev_dbg(dai->dev, "%s %s", __func__, dai->name); @@ -2661,7 +2661,8 @@ static int rt1320_sdw_hw_params(struct snd_pcm_substream *substream, dmic_port_config[1].num = 10; break; case RT1321_DEV_ID: - dmic_port_config[0].ch_mask = BIT(0) | BIT(1); + num_channels = params_channels(params); + dmic_port_config[0].ch_mask = GENMASK(num_channels - 1, 0); dmic_port_config[0].num = 8; break; default: From 70eddf6a0a3fc6d3ab6f77251676da97cc7f12ae Mon Sep 17 00:00:00 2001 From: Oliver Freyermuth Date: Tue, 24 Feb 2026 20:02:24 +0100 Subject: [PATCH 353/359] ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCD SKU This adds the necessary quirk for the Alienware 18 Area 51 (2025). Complements commit 1b03391d073d ("ASoC: Intel: sof_sdw: Add quirk for Alienware Area 51 (2025) 0CCC SKU"). Signed-off-by: Oliver Freyermuth Tested-by: Oliver Freyermuth Link: https://patch.msgid.link/20260224190224.30630-1-o.freyermuth@googlemail.com Signed-off-by: Mark Brown --- sound/soc/intel/boards/sof_sdw.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/sound/soc/intel/boards/sof_sdw.c b/sound/soc/intel/boards/sof_sdw.c index f230991f5f8e..c18ec607e029 100644 --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -763,6 +763,14 @@ static const struct dmi_system_id sof_sdw_quirk_table[] = { }, .driver_data = (void *)(SOC_SDW_CODEC_SPKR), }, + { + .callback = sof_sdw_quirk_cb, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Alienware"), + DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "0CCD") + }, + .driver_data = (void *)(SOC_SDW_CODEC_SPKR), + }, /* Pantherlake devices*/ { .callback = sof_sdw_quirk_cb, From fd13fc700e3e239826a46448bf7f01847dd26f5a Mon Sep 17 00:00:00 2001 From: Simon Trimmer Date: Tue, 24 Feb 2026 13:03:07 +0000 Subject: [PATCH 354/359] ASoC: amd: acp: Add ACP6.3 match entries for Cirrus Logic parts This adds some match entries for a few system configurations: cs42l43 link 0 UID 0 cs35l56 link 1 UID 0 cs35l56 link 1 UID 1 cs35l56 link 1 UID 2 cs35l56 link 1 UID 3 cs42l45 link 1 UID 0 cs35l63 link 0 UID 0 cs35l63 link 0 UID 2 cs35l63 link 0 UID 4 cs35l63 link 0 UID 6 cs42l45 link 0 UID 0 cs35l63 link 1 UID 0 cs35l63 link 1 UID 1 cs42l45 link 0 UID 0 cs35l63 link 1 UID 1 cs35l63 link 1 UID 3 cs42l45 link 1 UID 0 cs35l63 link 0 UID 0 cs35l63 link 0 UID 1 cs42l43 link 1 UID 0 cs35l56 link 1 UID 0 cs35l56 link 1 UID 1 cs35l56 link 1 UID 2 cs35l56 link 1 UID 3 cs35l56 link 1 UID 0 cs35l56 link 1 UID 1 cs35l56 link 1 UID 2 cs35l56 link 1 UID 3 cs35l63 link 0 UID 0 cs35l63 link 0 UID 2 cs35l63 link 0 UID 4 cs35l63 link 0 UID 6 cs42l43 link 0 UID 1 cs42l43b link 0 UID 1 cs42l45 link 0 UID 0 cs42l45 link 1 UID 0 Signed-off-by: Simon Trimmer Link: https://patch.msgid.link/20260224130307.526626-1-simont@opensource.cirrus.com Signed-off-by: Mark Brown --- sound/soc/amd/acp/amd-acp63-acpi-match.c | 413 +++++++++++++++++++++++ 1 file changed, 413 insertions(+) diff --git a/sound/soc/amd/acp/amd-acp63-acpi-match.c b/sound/soc/amd/acp/amd-acp63-acpi-match.c index 9b6a49c051cd..1dbbaba3c75b 100644 --- a/sound/soc/amd/acp/amd-acp63-acpi-match.c +++ b/sound/soc/amd/acp/amd-acp63-acpi-match.c @@ -30,6 +30,20 @@ static const struct snd_soc_acpi_endpoint spk_r_endpoint = { .group_id = 1 }; +static const struct snd_soc_acpi_endpoint spk_2_endpoint = { + .num = 0, + .aggregated = 1, + .group_position = 2, + .group_id = 1 +}; + +static const struct snd_soc_acpi_endpoint spk_3_endpoint = { + .num = 0, + .aggregated = 1, + .group_position = 3, + .group_id = 1 +}; + static const struct snd_soc_acpi_adr_device rt711_rt1316_group_adr[] = { { .adr = 0x000030025D071101ull, @@ -103,6 +117,345 @@ static const struct snd_soc_acpi_adr_device rt722_0_single_adr[] = { } }; +static const struct snd_soc_acpi_endpoint cs42l43_endpoints[] = { + { /* Jack Playback Endpoint */ + .num = 0, + .aggregated = 0, + .group_position = 0, + .group_id = 0, + }, + { /* DMIC Capture Endpoint */ + .num = 1, + .aggregated = 0, + .group_position = 0, + .group_id = 0, + }, + { /* Jack Capture Endpoint */ + .num = 2, + .aggregated = 0, + .group_position = 0, + .group_id = 0, + }, + { /* Speaker Playback Endpoint */ + .num = 3, + .aggregated = 0, + .group_position = 0, + .group_id = 0, + }, +}; + +static const struct snd_soc_acpi_adr_device cs35l56x4_l1u3210_adr[] = { + { + .adr = 0x00013301FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_l_endpoint, + .name_prefix = "AMP1" + }, + { + .adr = 0x00013201FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_r_endpoint, + .name_prefix = "AMP2" + }, + { + .adr = 0x00013101FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_2_endpoint, + .name_prefix = "AMP3" + }, + { + .adr = 0x00013001FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_3_endpoint, + .name_prefix = "AMP4" + }, +}; + +static const struct snd_soc_acpi_adr_device cs35l63x2_l0u01_adr[] = { + { + .adr = 0x00003001FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_l_endpoint, + .name_prefix = "AMP1" + }, + { + .adr = 0x00003101FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_r_endpoint, + .name_prefix = "AMP2" + }, +}; + +static const struct snd_soc_acpi_adr_device cs35l63x2_l1u01_adr[] = { + { + .adr = 0x00013001FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_l_endpoint, + .name_prefix = "AMP1" + }, + { + .adr = 0x00013101FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_r_endpoint, + .name_prefix = "AMP2" + }, +}; + +static const struct snd_soc_acpi_adr_device cs35l63x2_l1u13_adr[] = { + { + .adr = 0x00013101FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_l_endpoint, + .name_prefix = "AMP1" + }, + { + .adr = 0x00013301FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_r_endpoint, + .name_prefix = "AMP2" + }, +}; + +static const struct snd_soc_acpi_adr_device cs35l63x4_l0u0246_adr[] = { + { + .adr = 0x00003001FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_l_endpoint, + .name_prefix = "AMP1" + }, + { + .adr = 0x00003201FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_r_endpoint, + .name_prefix = "AMP2" + }, + { + .adr = 0x00003401FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_2_endpoint, + .name_prefix = "AMP3" + }, + { + .adr = 0x00003601FA356301ull, + .num_endpoints = 1, + .endpoints = &spk_3_endpoint, + .name_prefix = "AMP4" + }, +}; + +static const struct snd_soc_acpi_adr_device cs42l43_l0u0_adr[] = { + { + .adr = 0x00003001FA424301ull, + .num_endpoints = ARRAY_SIZE(cs42l43_endpoints), + .endpoints = cs42l43_endpoints, + .name_prefix = "cs42l43" + } +}; + +static const struct snd_soc_acpi_adr_device cs42l43_l0u1_adr[] = { + { + .adr = 0x00003101FA424301ull, + .num_endpoints = ARRAY_SIZE(cs42l43_endpoints), + .endpoints = cs42l43_endpoints, + .name_prefix = "cs42l43" + } +}; + +static const struct snd_soc_acpi_adr_device cs42l43b_l0u1_adr[] = { + { + .adr = 0x00003101FA2A3B01ull, + .num_endpoints = ARRAY_SIZE(cs42l43_endpoints), + .endpoints = cs42l43_endpoints, + .name_prefix = "cs42l43" + } +}; + +static const struct snd_soc_acpi_adr_device cs42l43_l1u0_cs35l56x4_l1u0123_adr[] = { + { + .adr = 0x00013001FA424301ull, + .num_endpoints = ARRAY_SIZE(cs42l43_endpoints), + .endpoints = cs42l43_endpoints, + .name_prefix = "cs42l43" + }, + { + .adr = 0x00013001FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_l_endpoint, + .name_prefix = "AMP1" + }, + { + .adr = 0x00013101FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_r_endpoint, + .name_prefix = "AMP2" + }, + { + .adr = 0x00013201FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_2_endpoint, + .name_prefix = "AMP3" + }, + { + .adr = 0x00013301FA355601ull, + .num_endpoints = 1, + .endpoints = &spk_3_endpoint, + .name_prefix = "AMP4" + }, +}; + +static const struct snd_soc_acpi_adr_device cs42l45_l0u0_adr[] = { + { + .adr = 0x00003001FA424501ull, + /* Re-use endpoints, but cs42l45 has no speaker */ + .num_endpoints = ARRAY_SIZE(cs42l43_endpoints) - 1, + .endpoints = cs42l43_endpoints, + .name_prefix = "cs42l45" + } +}; + +static const struct snd_soc_acpi_adr_device cs42l45_l1u0_adr[] = { + { + .adr = 0x00013001FA424501ull, + /* Re-use endpoints, but cs42l45 has no speaker */ + .num_endpoints = ARRAY_SIZE(cs42l43_endpoints) - 1, + .endpoints = cs42l43_endpoints, + .name_prefix = "cs42l45" + } +}; + +static const struct snd_soc_acpi_link_adr acp63_cs35l56x4_l1u3210[] = { + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs35l56x4_l1u3210_adr), + .adr_d = cs35l56x4_l1u3210_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs35l63x4_l0u0246[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs35l63x4_l0u0246_adr), + .adr_d = cs35l63x4_l0u0246_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l43_l0u1[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs42l43_l0u1_adr), + .adr_d = cs42l43_l0u1_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l43b_l0u1[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs42l43b_l0u1_adr), + .adr_d = cs42l43b_l0u1_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l43_l0u0_cs35l56x4_l1u3210[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs42l43_l0u0_adr), + .adr_d = cs42l43_l0u0_adr, + }, + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs35l56x4_l1u3210_adr), + .adr_d = cs35l56x4_l1u3210_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l43_l1u0_cs35l56x4_l1u0123[] = { + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs42l43_l1u0_cs35l56x4_l1u0123_adr), + .adr_d = cs42l43_l1u0_cs35l56x4_l1u0123_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l45_l0u0[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs42l45_l0u0_adr), + .adr_d = cs42l45_l0u0_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l45_l0u0_cs35l63x2_l1u01[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs42l45_l0u0_adr), + .adr_d = cs42l45_l0u0_adr, + }, + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs35l63x2_l1u01_adr), + .adr_d = cs35l63x2_l1u01_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l45_l0u0_cs35l63x2_l1u13[] = { + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs42l45_l0u0_adr), + .adr_d = cs42l45_l0u0_adr, + }, + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs35l63x2_l1u13_adr), + .adr_d = cs35l63x2_l1u13_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l45_l1u0[] = { + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs42l45_l1u0_adr), + .adr_d = cs42l45_l1u0_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l45_l1u0_cs35l63x2_l0u01[] = { + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs42l45_l1u0_adr), + .adr_d = cs42l45_l1u0_adr, + }, + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs35l63x2_l0u01_adr), + .adr_d = cs35l63x2_l0u01_adr, + }, + {} +}; + +static const struct snd_soc_acpi_link_adr acp63_cs42l45_l1u0_cs35l63x4_l0u0246[] = { + { + .mask = BIT(1), + .num_adr = ARRAY_SIZE(cs42l45_l1u0_adr), + .adr_d = cs42l45_l1u0_adr, + }, + { + .mask = BIT(0), + .num_adr = ARRAY_SIZE(cs35l63x4_l0u0246_adr), + .adr_d = cs35l63x4_l0u0246_adr, + }, + {} +}; + static const struct snd_soc_acpi_link_adr acp63_rt722_only[] = { { .mask = BIT(0), @@ -135,6 +488,66 @@ struct snd_soc_acpi_mach snd_soc_acpi_amd_acp63_sdw_machines[] = { .links = acp63_4_in_1_sdca, .drv_name = "amd_sdw", }, + { + .link_mask = BIT(0) | BIT(1), + .links = acp63_cs42l43_l0u0_cs35l56x4_l1u3210, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0) | BIT(1), + .links = acp63_cs42l45_l1u0_cs35l63x4_l0u0246, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0) | BIT(1), + .links = acp63_cs42l45_l0u0_cs35l63x2_l1u01, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0) | BIT(1), + .links = acp63_cs42l45_l0u0_cs35l63x2_l1u13, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0) | BIT(1), + .links = acp63_cs42l45_l1u0_cs35l63x2_l0u01, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(1), + .links = acp63_cs42l43_l1u0_cs35l56x4_l1u0123, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(1), + .links = acp63_cs35l56x4_l1u3210, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0), + .links = acp63_cs35l63x4_l0u0246, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0), + .links = acp63_cs42l43_l0u1, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0), + .links = acp63_cs42l43b_l0u1, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(0), + .links = acp63_cs42l45_l0u0, + .drv_name = "amd_sdw", + }, + { + .link_mask = BIT(1), + .links = acp63_cs42l45_l1u0, + .drv_name = "amd_sdw", + }, {}, }; EXPORT_SYMBOL(snd_soc_acpi_amd_acp63_sdw_machines); From ca5056f5a78ce62588878d138e8141f01d70e61b Mon Sep 17 00:00:00 2001 From: Richard Fitzgerald Date: Thu, 26 Feb 2026 11:35:11 +0000 Subject: [PATCH 355/359] ASoC: cs35l56: Suppress pointless warning about number of GPIO pulls In cs35l56_process_xu_onchip_speaker_id() the warning that the number of pulls != number of GPIOs should only be printed if pulls are defined. Pull settings are optional because there would normally be an external resistor providing the pull. The warning would still be true if pulls are not defined, but in that case is just log noise. While we're changing that block of code, also fix the indenting of the arguments to the dev_warn(). Signed-off-by: Richard Fitzgerald Link: https://patch.msgid.link/20260226113511.1768838-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown --- sound/soc/codecs/cs35l56.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c index 3bf9e8fc34a2..37909a319f88 100644 --- a/sound/soc/codecs/cs35l56.c +++ b/sound/soc/codecs/cs35l56.c @@ -1625,9 +1625,9 @@ static int cs35l56_process_xu_onchip_speaker_id(struct cs35l56_private *cs35l56, if (num_pulls < 0) return num_pulls; - if (num_pulls != num_gpios) { + if (num_pulls && (num_pulls != num_gpios)) { dev_warn(cs35l56->base.dev, "%s count(%d) != %s count(%d)\n", - pull_name, num_pulls, gpio_name, num_gpios); + pull_name, num_pulls, gpio_name, num_gpios); } ret = cs35l56_check_and_save_onchip_spkid_gpios(&cs35l56->base, From a8df7892a9f42b2e2d5851f8835c734bd7fe8ad4 Mon Sep 17 00:00:00 2001 From: Sheetal Date: Mon, 2 Mar 2026 14:23:22 +0530 Subject: [PATCH 356/359] ASoC: dt-bindings: tegra: Add compatible for Tegra238 sound card Tegra238 requires different PLLA and PLLA_OUT0 clock rates compared to other Tegra platforms. Add Tegra238 compatible string to the APE tegra-audio-graph-card bindings. Signed-off-by: Sheetal Acked-by: Krzysztof Kozlowski Link: https://patch.msgid.link/20260302085323.3139571-2-sheetal@nvidia.com Signed-off-by: Mark Brown --- .../devicetree/bindings/sound/nvidia,tegra-audio-graph-card.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-graph-card.yaml b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-graph-card.yaml index da89523ccf5f..92bc3ef56f2c 100644 --- a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-graph-card.yaml +++ b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-graph-card.yaml @@ -23,6 +23,7 @@ properties: enum: - nvidia,tegra210-audio-graph-card - nvidia,tegra186-audio-graph-card + - nvidia,tegra238-audio-graph-card - nvidia,tegra264-audio-graph-card clocks: From 27990181031fdcdbe0f7c46011f6404e5d116386 Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Tue, 3 Mar 2026 14:17:07 +0000 Subject: [PATCH 357/359] ASoC: SDCA: Add allocation failure check for Entity name Currently find_sdca_entity_iot() can allocate a string for the Entity name but it doesn't check if that allocation succeeded. Add the missing NULL check after the allocation. Fixes: 48fa77af2f4a ("ASoC: SDCA: Add terminal type into input/output widget name") Signed-off-by: Charles Keepax Link: https://patch.msgid.link/20260303141707.3841635-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown --- sound/soc/sdca/sdca_functions.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sound/soc/sdca/sdca_functions.c b/sound/soc/sdca/sdca_functions.c index 95b67bb904c3..e0ed593697ba 100644 --- a/sound/soc/sdca/sdca_functions.c +++ b/sound/soc/sdca/sdca_functions.c @@ -1156,9 +1156,12 @@ static int find_sdca_entity_iot(struct device *dev, if (!terminal->is_dataport) { const char *type_name = sdca_find_terminal_name(terminal->type); - if (type_name) + if (type_name) { entity->label = devm_kasprintf(dev, GFP_KERNEL, "%s %s", entity->label, type_name); + if (!entity->label) + return -ENOMEM; + } } ret = fwnode_property_read_u32(entity_node, From fbb143e4a6efa4a175e856fc898754b06cb13c4f Mon Sep 17 00:00:00 2001 From: Biju Das Date: Wed, 4 Mar 2026 07:19:55 +0000 Subject: [PATCH 358/359] ASoC: dt-bindings: renesas,rz-ssi: Document RZ/G3L SoC Document RZ/G3L SSIF-2 bindings. The RZ/G3L SSIF-2 IP is identical to one found on the RZ/G2L SoC. Signed-off-by: Biju Das Link: https://patch.msgid.link/20260304072000.6787-1-biju.das.jz@bp.renesas.com Signed-off-by: Mark Brown --- Documentation/devicetree/bindings/sound/renesas,rz-ssi.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/sound/renesas,rz-ssi.yaml b/Documentation/devicetree/bindings/sound/renesas,rz-ssi.yaml index e4cdbf2202b9..1394f78281fc 100644 --- a/Documentation/devicetree/bindings/sound/renesas,rz-ssi.yaml +++ b/Documentation/devicetree/bindings/sound/renesas,rz-ssi.yaml @@ -20,6 +20,7 @@ properties: - renesas,r9a07g044-ssi # RZ/G2{L,LC} - renesas,r9a07g054-ssi # RZ/V2L - renesas,r9a08g045-ssi # RZ/G3S + - renesas,r9a08g046-ssi # RZ/G3L - const: renesas,rz-ssi reg: From 325291b20f8a6f14b9c82edbf5d12e4e71f6adaa Mon Sep 17 00:00:00 2001 From: Zhang Heng Date: Wed, 4 Mar 2026 14:32:55 +0800 Subject: [PATCH 359/359] ASoC: amd: yc: Add DMI quirk for ASUS EXPERTBOOK PM1503CDA Add a DMI quirk for the ASUS EXPERTBOOK PM1503CDA fixing the issue where the internal microphone was not detected. Link: https://bugzilla.kernel.org/show_bug.cgi?id=221070 Cc: stable@vger.kernel.org Signed-off-by: Zhang Heng Link: https://patch.msgid.link/20260304063255.139331-1-zhangheng@kylinos.cn Signed-off-by: Mark Brown --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index 7af4daeb4c6f..1324543b42d7 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -710,6 +710,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "ASUS EXPERTBOOK BM1503CDA"), } }, + { + .driver_data = &acp6x_card, + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BOARD_NAME, "PM1503CDA"), + } + }, {} };