linux

mirror of https://github.com/torvalds/linux.git synced 2026-03-08 01:04:41 +01:00

Author	SHA1	Message	Date
Calvin Owens	d008ba8be8	tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2G Some of the sizing logic through tracer_alloc_buffers() uses int internally, causing unexpected behavior if the user passes a value that does not fit in an int (on my x86 machine, the result is uselessly tiny buffers). Fix by plumbing the parameter's real type (unsigned long) through to the ring buffer allocation functions, which already use unsigned long. It has always been possible to create larger ring buffers via the sysfs interface: this only affects the cmdline parameter. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org Fixes: `73c5162aa3` ("tracing: keep ring buffer to minimum size till used") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 22:25:53 -05:00
Andrei-Alexandru Tachici	3b1679e086	tracing: Fix enabling multiple events on the kernel command line and bootconfig Multiple events can be enabled on the kernel command line via a comma separator. But if the are specified one at a time, then only the last event is enabled. This is because the event names are saved in a temporary buffer, and each call by the init cmdline code will reset that buffer. This also affects names in the boot config file, as it may call the callback multiple times with an example of: kernel.trace_event = ":mod:rproc_qcom_common", ":mod:qrtr", ":mod:qcom_aoss" Change the cmdline callback function to append a comma and the next value if the temporary buffer already has content. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260302-trace-events-allow-multiple-modules-v1-1-ce4436e37fb8@oss.qualcomm.com Signed-off-by: Andrei-Alexandru Tachici <andrei-alexandru.tachici@oss.qualcomm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 16:54:34 -05:00
Guenter Roeck	457965c13f	tracing: Add NULL pointer check to trigger_data_free() If trigger_data_alloc() fails and returns NULL, event_hist_trigger_parse() jumps to the out_free error path. While kfree() safely handles a NULL pointer, trigger_data_free() does not. This causes a NULL pointer dereference in trigger_data_free() when evaluating data->cmd_ops->set_filter. Fix the problem by adding a NULL pointer check to trigger_data_free(). The problem was found by an experimental code review agent based on gemini-3.1-pro while reviewing backports into v6.18.y. Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260305193339.2810953-1-linux@roeck-us.net Fixes: `0550069cc2` ("tracing: Properly process error handling in event_hist_trigger_parse()") Assisted-by: Gemini:gemini-3.1-pro Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-06 13:04:30 -05:00
Jerome Marchand	f26b098d93	ftrace: Add MAINTAINERS entries for all ftrace headers There is currently no entry for ftrace_irq.h and ftrace_regs.h. Add a generic entry for all ftrace headers to include them and prevent overlooking future ftrace headers. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://patch.msgid.link/20260305093117.853700-1-jmarchan@redhat.com Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-05 10:17:31 -05:00
Qing Wang	e39bb9e02b	tracing: Fix WARN_ON in tracing_buffers_mmap_close When a process forks, the child process copies the parent's VMAs but the user_mapped reference count is not incremented. As a result, when both the parent and child processes exit, tracing_buffers_mmap_close() is called twice. On the second call, user_mapped is already 0, causing the function to return -ENODEV and triggering a WARN_ON. Normally, this isn't an issue as the memory is mapped with VM_DONTCOPY set. But this is only a hint, and the application can call madvise(MADVISE_DOFORK) which resets the VM_DONTCOPY flag. When the application does that, it can trigger this issue on fork. Fix it by incrementing the user_mapped reference count without re-mapping the pages in the VMA's open callback. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://patch.msgid.link/20260227025842.1085206-1-wangqing7171@gmail.com Fixes: `cf9f0f7c4c` ("tracing: Allow user-space mapping of the ring-buffer") Reported-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3b5dd2030fe08afdf65d Tested-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com Signed-off-by: Qing Wang <wangqing7171@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-03 22:25:32 -05:00
Masami Hiramatsu (Google)	a5dd6f5866	tracing: Disable preemption in the tracepoint callbacks handling filtered pids Filtering PIDs for events triggered the following during selftests: [37] event tracing - restricts events based on pid notrace filtering [ 155.874095] [ 155.874869] ============================= [ 155.876037] WARNING: suspicious RCU usage [ 155.877287] 7.0.0-rc1-00004-g8cd473a19bc7 #7 Not tainted [ 155.879263] ----------------------------- [ 155.882839] kernel/trace/trace_events.c:1057 suspicious rcu_dereference_check() usage! [ 155.889281] [ 155.889281] other info that might help us debug this: [ 155.889281] [ 155.894519] [ 155.894519] rcu_scheduler_active = 2, debug_locks = 1 [ 155.898068] no locks held by ftracetest/4364. [ 155.900524] [ 155.900524] stack backtrace: [ 155.902645] CPU: 1 UID: 0 PID: 4364 Comm: ftracetest Not tainted 7.0.0-rc1-00004-g8cd473a19bc7 #7 PREEMPT(lazy) [ 155.902648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 [ 155.902651] Call Trace: [ 155.902655] <TASK> [ 155.902659] dump_stack_lvl+0x67/0x90 [ 155.902665] lockdep_rcu_suspicious+0x154/0x1a0 [ 155.902672] event_filter_pid_sched_process_fork+0x9a/0xd0 [ 155.902678] kernel_clone+0x367/0x3a0 [ 155.902689] __x64_sys_clone+0x116/0x140 [ 155.902696] do_syscall_64+0x158/0x460 [ 155.902700] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 155.902702] ? trace_irq_disable+0x1d/0xc0 [ 155.902709] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 155.902711] RIP: 0033:0x4697c3 [ 155.902716] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 89 c2 85 c0 75 2c 64 48 8b 04 25 10 00 00 [ 155.902718] RSP: 002b:00007ffc41150428 EFLAGS: 00000246 ORIG_RAX: 0000000000000038 [ 155.902721] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004697c3 [ 155.902722] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011 [ 155.902724] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000003fccf990 [ 155.902725] R10: 000000003fccd690 R11: 0000000000000246 R12: 0000000000000001 [ 155.902726] R13: 000000003fce8103 R14: 0000000000000001 R15: 0000000000000000 [ 155.902733] </TASK> [ 155.902747] The tracepoint callbacks recently were changed to allow preemption. The event PID filtering callbacks that were attached to the fork and exit tracepoints expected preemption disabled in order to access the RCU protected PID lists. Add a guard(preempt)() to protect the references to the PID list. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260303215738.6ab275af@fedora Fixes: `a46023d561` ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") Link: https://patch.msgid.link/20260303131706.96057f61a48a34c43ce1e396@kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-03 22:25:32 -05:00
Steven Rostedt	cc337974cd	ftrace: Disable preemption in the tracepoint callbacks handling filtered pids When function trace PID filtering is enabled, the function tracer will attach a callback to the fork tracepoint as well as the exit tracepoint that will add the forked child PID to the PID filtering list as well as remove the PID that is exiting. Commit `a46023d561` ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") removed the disabling of preemption when calling tracepoint callbacks. The callbacks used for the PID filtering accounting depended on preemption being disabled, and now the trigger a "suspicious RCU usage" warning message. Make them explicitly disable preemption. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260302213546.156e3e4f@gandalf.local.home Fixes: `a46023d561` ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-03-03 22:25:31 -05:00
Huiwen He	0a663b764d	tracing: Fix syscall events activation by ensuring refcount hits zero When multiple syscall events are specified in the kernel command line (e.g., trace_event=syscalls:sys_enter_openat,syscalls:sys_enter_close), they are often not captured after boot, even though they appear enabled in the tracing/set_event file. The issue stems from how syscall events are initialized. Syscall tracepoints require the global reference count (sys_tracepoint_refcount) to transition from 0 to 1 to trigger the registration of the syscall work (TIF_SYSCALL_TRACEPOINT) for tasks, including the init process (pid 1). The current implementation of early_enable_events() with disable_first=true used an interleaved sequence of "Disable A -> Enable A -> Disable B -> Enable B". If multiple syscalls are enabled, the refcount never drops to zero, preventing the 0->1 transition that triggers actual registration. Fix this by splitting early_enable_events() into two distinct phases: 1. Disable all events specified in the buffer. 2. Enable all events specified in the buffer. This ensures the refcount hits zero before re-enabling, allowing syscall events to be properly activated during early boot. The code is also refactored to use a helper function to avoid logic duplication between the disable and enable phases. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260224023544.1250787-1-hehuiwen@kylinos.cn Fixes: `ce1039bd3a` ("tracing: Fix enabling of syscall events on the command line") Signed-off-by: Huiwen He <hehuiwen@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-03 22:15:02 -05:00
Shengming Hu	b96d0c59cd	fgraph: Fix thresh_return nosleeptime double-adjust trace_graph_thresh_return() called handle_nosleeptime() and then delegated to trace_graph_return(), which calls handle_nosleeptime() again. When sleep-time accounting is disabled this double-adjusts calltime and can produce bogus durations (including underflow). Fix this by computing rettime once, applying handle_nosleeptime() only once, using the adjusted calltime for threshold comparison, and writing the return event directly via __trace_graph_return() when the threshold is met. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260221113314048jE4VRwIyZEALiYByGK0My@zte.com.cn Fixes: `3c9880f3ab` ("ftrace: Use a running sleeptime instead of saving on shadow stack") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-03 22:11:20 -05:00
Shengming Hu	6ca8379b5d	fgraph: Fix thresh_return clear per-task notrace When tracing_thresh is enabled, function graph tracing uses trace_graph_thresh_return() as the return handler. Unlike trace_graph_return(), it did not clear the per-task TRACE_GRAPH_NOTRACE flag set by the entry handler for set_graph_notrace addresses. This could leave the task permanently in "notrace" state and effectively disable function graph tracing for that task. Mirror trace_graph_return()'s per-task notrace handling by clearing TRACE_GRAPH_NOTRACE and returning early when set. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260221113007819YgrZsMGABff4Rc-O_fZxL@zte.com.cn Fixes: `b84214890a` ("function_graph: Move graph notrace bit to shadow stack global var") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-03-03 22:10:37 -05:00
Linus Torvalds	11439c4635	Linux 7.0-rc2	2026-03-01 15:39:31 -08:00
Linus Torvalds	949d0a46ad	Arm: * Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host * Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set * Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent * Don't leak stage-2 mappings in protected mode * Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB * Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... [his words, not mine] * Remove duplication of code retrieving the ASID for the purpose of S1 PT handling * Fix slightly abusive const-ification in vgic_set_kvm_info() Generic: * Remove internal Kconfigs that are now set on all architectures. * Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all architectures finally enable it in Linux 7.0. -----BEGIN PGP SIGNATURE----- iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmkSMUUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroO2tggAgOHvH9lwgHxFADAOKXPEsv+Tw6cB zuUBPo6e5XQQx6BJQVo8ctveCBbA0ybRJXbju6lgS/Bg9qQYKjZbNrYdKwBwhw3b JBUFTIJgl7lLU5gEANNtvgiiNUc2z1GfG8NQ5ovMJ0/MDEEI0YZjkGoe8ZyJ54vg qrNHBROKvt54wVHXFGhZ1z8u4RqJ/WCmy21wYbwiMgQGXq9ugRRfhnu3iGNO8BcK bSSh4cP7qusO5Nj0m/XYD/68VwvyogaEkh4sNS1VH0FOoONRWs80Q6iT2HjCGgv6 mAzCx/yB9ziSQRis3oYYEBH23HZxRVDVBeSBlPcbxE3VM7SGzp7O8L692g== =+mC+ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Arm: - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... [his words, not mine] - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info() Generic: - Remove internal Kconfigs that are now set on all architectures - Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all architectures finally enable it in Linux 7.0" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: always define KVM_CAP_SYNC_MMU KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER KVM: arm64: Deduplicate ASID retrieval code irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs KVM: arm64: Fix protected mode handling of pages larger than 4kB KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state() KVM: arm64: Fix ID register initialization for non-protected pKVM guests KVM: arm64: Optimise away S1POE handling when not supported by host KVM: arm64: Hide S1POE from guests when not supported by the host	2026-03-01 15:34:47 -08:00
Linus Torvalds	e2bd1b1369	A single fix for debugobjects. The deferred page initialization prevents debug objects from allocating slab pages until the initialization is complete. That causes depletion of the pool and disabling of debugobjects. The reason is that debugobjects uses __GFP_HIGH for allocations as it might be invoked from arbitrary contexts. When PREEMPT_COUNT is disabled there is no way to know whether the context is safe to set __GFP_KSWAPD_RECLAIM. This worked until v6.18. Since then allocations w/o a reclaim flag cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(), which returns early when deferred page initialization has not yet completed. Work around that when PREEMPT_COUNT is enabled as the preempt counter allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when the context is preemtible. When PREEMPT_COUNT is disabled the context is unknown and the reclaim bit can't be set because the caller might hold locks which might deadlock in the allocator. That makes debugobjects depend on PREEMPT_COUNT \|\| !DEFERRED_STRUCT_PAGE_INIT, which limits the coverage slightly, but keeps it functional for most cases. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmka6wQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoaOrEACqslcovaH37JnDkEVuMJAl/RcHJbKfFL/s SN3UcPIDmae4yF+DVnMmLUjBdkuMHHkTQOQRGKuwpOT3PUqGxvYAYtd6JogH4joc LhMA6uJUddINScKvjJmvdUBwFWHRrrcO3QROdqjKrF/DUB1zGxGVsWDrvUT7xv7w sXa2IWNnC59Zo3LlRON4uO6S18wuKh2+Vo8cT579wRs4vrccNv0c6V9OnIyO8YCi yuYmaE+oTKgjZ5NOIwc/h5yV9g2TKod0s1eXdpfbSHdVvWL2qLH/CPUA3xyPgK04 X0Yy62MeaJ4gZZJEXnHJHnlgqZXOB34Xy6FvsrG9RL/iGOV30PNVpnZNxSzCjbvM qobG8R2x8ej+5/1tdybxmY9WLijTKF7rVeihNbyNJ3gcxAXtNx5kQsMeDDlHbyGV kqB2Z+w93GrzAfvUaE79QNehWQM7+3/pe0Z/tBPF78y0Fo50R0d8m35i87rqGFLq xipSNoFOPrvCkPOtb7Cui90Jh0LOvdMll1UT8TWXD5GjHUF02KuwEp80HruLIGTw zgSLLxgjR3EhPI6NtvNoQIpdt5tgIwcHIgUTUFkg1V74HK6Pa/LINREeP5so8VDh dEtHHU5Q9fl50/oPQn7f+tezvxKqrET/oZXnVgW2x21Ypf1IMdbQKawzNYu2YeWf aerqW8SBHw== =kvxN -----END PGP SIGNATURE----- Merge tag 'core-debugobjects-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects fix from Thomas Gleixner: "A single fix for debugobjects. The deferred page initialization prevents debug objects from allocating slab pages until the initialization is complete. That causes depletion of the pool and disabling of debugobjects. The reason is that debugobjects uses __GFP_HIGH for allocations as it might be invoked from arbitrary contexts. When PREEMPT_COUNT is disabled there is no way to know whether the context is safe to set __GFP_KSWAPD_RECLAIM. This worked until v6.18. Since then allocations w/o a reclaim flag cause new_slab() to end up in alloc_frozen_pages_nolock_noprof(), which returns early when deferred page initialization has not yet completed. Work around that when PREEMPT_COUNT is enabled as the preempt counter allows debugobjects to add __GFP_KSWAPD_RECLAIM to the GFP flags when the context is preemtible. When PREEMPT_COUNT is disabled the context is unknown and the reclaim bit can't be set because the caller might hold locks which might deadlock in the allocator. That makes debugobjects depend on PREEMPT_COUNT \|\| !DEFERRED_STRUCT_PAGE_INIT, which limits the coverage slightly, but keeps it functional for most cases" * tag 'core-debugobjects-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobject: Make it work with deferred page initialization - again	2026-03-01 13:32:32 -08:00
Linus Torvalds	5920da4455	Miscellaneous x86 fixes: - Fix speculative safety in fred_extint() - Fix __WARN_printf() trap in early_fixup_exception() - Fix clang-build boot bug for unusual alignments, triggered by CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y - Replace the final few __ASSEMBLY__ stragglers that snuck in lately into non-UAPI x86 headers and use __ASSEMBLER__ consistently (again). Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmkBnwRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hPpw/+IyRN/VoRYWnqJL/BVCT6PRLhtmc+/+9t JLYjCXwDZDf6NGDVOQfeMuubJSt7wSlttPScbQbCSpj1o/xN5AdyBav5jG6fV/PQ B9+p+dh4rQSLQR5PHj/PKC9arBSsnBgnTuQKYxCEd/RDc5czNd+tgy6AWsQKS4Ti a5xOtsE4xZql49ypOm1JeOu8bZKF+yui6zl8alIE1V1P/dDt9VQ7h7tQB5Dki7Vc XVxKOhPAcNpsAa7PXgFihH3g5fqF5WRz1MOvxuVEhlpnJ31LsNhYznUJjwu+VnV0 0rQDPLs1qs2tq0ysboXh6NVmmjH0ojSQXTlkGsi5Qa4r2EwH0ZcmF74YS9fOzwP2 V35vGb6Fl0rtZPdQMN0uA6DfiG8+M2yyky71QwoTE27qvbZtXA8wDYn4N2PQa4ZJ BJG1MVNzttkxwmRNJa2rg8Uyr7FvG2NVpqJUbFwD87McQtlyv9LUWHw4N8z2nqq9 3uhj862MCz136zA+jUSR2+zh9aNnx8y/pNMzmwqRAuj5mpH5CK3FZfUlDKWE2YI0 3tEnbrvL3pv8sIT0/ttyERkCQGBdesHZ5ZaHljb7AUcSVn/Neroo1TZGqrBjyeeV kq0w8SHj+BNBs8dNuuy+ONc/Pzk0BMTPpS/flb5qvbCKT655LnCUwIdwy2IIT3XA 94PISb+ZZQ8= =A61U -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix speculative safety in fred_extint() - Fix __WARN_printf() trap in early_fixup_exception() - Fix clang-build boot bug for unusual alignments, triggered by CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y - Replace the final few __ASSEMBLY__ stragglers that snuck in lately into non-UAPI x86 headers and use __ASSEMBLER__ consistently (again) * tag 'x86-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/headers: Replace __ASSEMBLY__ stragglers with __ASSEMBLER__ x86/cfi: Fix CFI rewrite for odd alignments x86/bug: Handle __WARN_printf() trap in early_fixup_exception() x86/fred: Correct speculative safety in fred_extint()	2026-03-01 13:16:35 -08:00
Linus Torvalds	f6542af922	Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs(), for the common HZ=100, 250 or 1000 cases, only inlining them for odd HZ values like HZ=300. The inlining overhead showed up in performance tests of the TCP code. (Marked as an RFC pull request, as it's not a regression.) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmkA94RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ggmhAAqvNgH1vG+Ad1/r2aZ0iQNOJ3lLL9DKVU V0Q2iXiRPKvzjXaHB+C2B2GkGgBBxyvdl1wGPpknI/OkoC8aBzK8jSchmfLFfWdx C1MC0xoZRtaWDQaMYYQ83ZGQqdKHct4ZrwfsPu+g95NGvNg3m/W8p3cZjruknvH3 bKZmbN2wQiSx6+PB7/FkEjV7eAlaskYYdKLNqZzd/62oQ6is4ppAjEAp+X/FidJv 0lUVr7ILx6lZHjWnavUjex2iSvvZ8XqiaYVKlj5EE9cCaPhXDTpkaO8l/ELcjpHL ZBBV6rpbDo7+Mo3UnUiz+CaYi6XAAOwgj6wZBiBXhVQUAW0PIlMaefDjccWFlDw1 oAcRCg5i3KF8cvrfQYUs4519W52eWOThqQ1fs5ql6P6ycHZ0KTsmaRAbNih811RN A2VMTkiyX25bXOUQ9e5Y7cYOvDMGGCWiocT6C7Is9gZQMfkj92NDCKURLwRYzzBr 2XDjg46YekGXwy8OamMwXMRcdyUC5fAIaWOaq7IL1K3cgbS2qbZx55Y85+AOfngF DvFWDIfjslYMZqzQQ+4+MaJitRQ2V6CqdOP3kQbJ3Z6DmIcwi7DkOzqH1hEglb7O IjXCxjxosZv4iofpxr0FKJPx7KBVSzzxezjMLzeijM+zDdbF4GWFpqRD1ONh/4vm /1tfaC6TU/E= =duTI -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs(), for the common HZ=100, 250 or 1000 cases. Only use a function call for odd HZ values like HZ=300 that generate more code. The function call overhead showed up in performance tests of the TCP code" * tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/jiffies: Inline jiffies_to_msecs() and jiffies_to_usecs()	2026-03-01 12:15:58 -08:00
Linus Torvalds	6170625149	Miscellaneous fixes: - Fix zero_vruntime tracking when there's a single task running - Fix slice protection logic - Fix the ->vprot logic for reniced tasks - Fix lag clamping in mixed slice workloads - Fix objtool uaccess warning (and bug) in the !CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining, which triggers with older compilers - Fix a comment in the rseq registration rseq_size bound check code - Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes differently, which special size we now reached naturally and want to avoid. The visible ugliness of the new reserved field will be avoided the next time the RSEQ area is extended. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmkAl0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hdsg/+IdQpmtEXsugE1FqEuuptm0ld6hcFI9WC mlEXhid5Gq3a3KMBv0CLd73o+k8Ju/BDEdbfLMzY8A9h8OxfnuUL1T6Jt4q7dF1h 76ja1R+i+GNFcXWmSG8z6FUns4bRBJeNWFs3dzCFE9N2qOCCj1xBr/9BqgKvNVfZ cbcaiMvmi3z/vPUmT8hdMdEcA0Zo2gVcKDmny4Tca9sigyLZD8FqtW1FhqL1HX8H Cx8fZ2lkD2z6gKtOAbC3QuWVmP88tvZldaMsGHTAQIa14PP5h2xhyuLxBF1Zjnwy aWl4iYr6ILu3LRi54CmQOiESdEf3Srdbl8JxDcvU9vh8ecqXvDGPUB2xCszlPvOx R+scskNgNyd1WtUF2VYFLTNkj0B7Xe6eTYfIu2d5r8GrRt0YjRzsK/JQallAkV6V KORDm4/Xyl5Ss6tNtfZP7lpHD2qykscRGxgr0HjjJCyjA1ZNtGc1A+JKZ8D8q9Nq rxEbaa65KfAtYJ4i5j9goFPQwNeHXm/emToVzEfyKwZHs3ns0LwffDGSFFOYSm/p FVVmi9iSoxRvRFHBflvBIwFaCnIyBLTJZlB/Bp8MVaFnv+6OzdE/nfcKNaYqcVaT mzCpY2DFTx5KISmJR7DAWsPntoRV6WPcxVApWicTaT5G3C2TLvvTAEq8g2WIYDFB j6oNyEkX/Xw= =Nxqx -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix zero_vruntime tracking when there's a single task running - Fix slice protection logic - Fix the ->vprot logic for reniced tasks - Fix lag clamping in mixed slice workloads - Fix objtool uaccess warning (and bug) in the !CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining, which triggers with older compilers - Fix a comment in the rseq registration rseq_size bound check code - Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes differently, which special size we now reached naturally and want to avoid. The visible ugliness of the new reserved field will be avoided the next time the RSEQ area is extended. * tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: slice ext: Ensure rseq feature size differs from original rseq size rseq: Clarify rseq registration rseq_size bound check comment sched/core: Fix wakeup_preempt's next_class tracking rseq: Mark rseq_arm_slice_extension_timer() __always_inline sched/fair: Fix lag clamp sched/eevdf: Update se->vprot in reweight_entity() sched/fair: Only set slice protection at pick time sched/fair: Fix zero_vruntime tracking	2026-03-01 11:09:24 -08:00
Linus Torvalds	cb36eabcaf	Miscellaneous fixes: - Fix lock ordering bug found by lockdep in perf_event_wakeup() - Fix uncore counter enumeration on Granite Rapids and Sierra Forest - Fix perf_mmap() refcount bug found by Syzkaller - Fix __perf_event_overflow() vs. perf_remove_from_context() race Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmj/7MRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gXPg//V/Qrbnc9jYyyA9ZT9hGg4Oz36HSLLuRe zcpb0Fndmyjt6Fq0vwqN59UcRM2coJjZ6V3TUyhQJjstnzkmMBsPE3frx+VjUqA6 rfUukgSr0mhlT/OtlBx0hUKBaiPvNe9khnKXLo1mO5aEVkIHryPbPcU/VLG45jE1 sF+dFP1cFVgyNqac8Ai4oNLsoRNQqWAvD0UrYHijJpFE6GqW8rBm2ReASbk/RMjv s5CqpdLiRFmOoQ1Vwu9iG/wej0OMVJWUEpbmcqysT7UAMdoYtEm79HYON1Ez+ZEx x6JEV8y3bv01MZc+HmP4mvKDgo5w1zxzNk3Smsx2sscUsZYVcv4zvG9C7UlkSsJ4 uWI6wwAPc1euBAmduTMDEyQr5CkjS3Rdb83s9+I2LtZXCP73+FPEhekbMx9mIhJi Qw+H6QFeacpFso74vjfK4nGEEz0GbjWaT+VLBSJkwhOd/+/fkWyHsQoU8DPfC8nH ETMaYGXpW80XRB5ttz/MoJfmXi2ovJsVpyvd06zJE0JzKdiydJsC0d6xGHY9JBCg 07bg0ux8/hX8grNVDWusvw2S15rso3RUOq9uajsTzlr728+hbCVZba87UYlgUNHL +uA7IyX1WrY9DApXKmOWi9MTRkvdAQz6r43QMk+xkDd6b8JrOOMAJFqYuF8xD5Da mXy3HKkIKag= =0JyK -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fixes from Ingo Molnar: - Fix lock ordering bug found by lockdep in perf_event_wakeup() - Fix uncore counter enumeration on Granite Rapids and Sierra Forest - Fix perf_mmap() refcount bug found by Syzkaller - Fix __perf_event_overflow() vs perf_remove_from_context() race * tag 'perf-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix __perf_event_overflow() vs perf_remove_from_context() race perf/core: Fix refcount bug and potential UAF in perf_mmap perf/x86/intel/uncore: Add per-scheduler IMC CAS count events perf/core: Fix invalid wait context in ctx_sched_in()	2026-03-01 11:07:20 -08:00
Linus Torvalds	b410220870	Now that LLVM 22 has been released officially, require a release version to use the new CONFIG_WARN_CONTEXT_ANALYSIS feature. In particular this avoids the widely used Android clang 22.0.1 pre-release build which is known to be broken for this usecase. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmj/h0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iFhxAAq3gkt6Gd03wqSDR95IfNAtX+Pu3t+5v9 A8DpR281HzC313DpM/RlQ4O2VJRBhGTCXuGeOg5BbBgyD59RHqS47gw7WwNrpruQ 3qa3BMtbibZVfL8uy7sNscPtLgXFRwt2/BBKeSvldya0DdGJQwOuIqQ6MJkxN2Oq HkFbNvdiV/Gns6N893xgGWNdB6RvHIB4VPUxhb9cguq7Nwov75Uyf+adUo+qOTzq ztnQqNbR0DZcjEbJWYvYe5s87GYusWiPHXFh/1DCA9YUqkVMaJXMs6sxl8nRckfM 1RsH74IhlF1qOl3GcXT+tUG0gqnDeGUuK5d8eLlhGJnAQTUNRGLHVANGeQacBdAs wAOlZe5jDHmnW9cipfKTzlgHILt2rVlmIYbBRh4LKNigs/NUav22zy0NvTjQrGzJ P9SnCuXUc7EhLCNxabtwSL2fp/A9PR5hKpR+edxLUgVQBA0BZ1ewY67CDKjcYzet pU/93rroXS8LPEVxwlTM7VUb2xJhXMxxfVU0GTMIpx3vCIMXuzi1hgnGhkRh4Sk2 PrwHqj8hYPDWW0QCoy7AC4iejOYPO8A8tatJQDDNrRfLmnEriGOwA9S7l6v09D/3 yLBdG6lVFGryM4pJN3O8Ukh4cJVV/eOZVfYc2sO0NlO2XJohJLgxISzuw2XZzStp rrwtJI63P/E= =KdpZ -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Now that LLVM 22 has been released officially, require a release version to use the new CONFIG_WARN_CONTEXT_ANALYSIS feature. In particular this avoids the widely used Android clang 22.0.1 pre-release build which is known to be broken for this usecase" * tag 'locking-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/Kconfig.debug: Require a release version of LLVM 22 for context analysis	2026-03-01 11:00:43 -08:00
Linus Torvalds	afa844360b	Miscellaneous irqchip driver fixes: - Fix frozen interrupt bug in the sifive-plic driver - Limit per-device MSI interrupts on uncommon gic-v3-its hardware variants - Address Sparse warning by constifying a variable in the MMP driver - Revert broken commit and also fix an error check in the ls-extirq driver Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmj/MMRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iRdg//eKX6ScDG6IT/hzKHBVP9tGqFcvfR+LWf 3Q5I/5YipYNf2aqZS7UFcfhsOvMkwALF55CrNUCFatW2W4VAUR5pgt5JWNfMS4VO +9sGkkAhUenhLlGgGnIpuIHgIrkJFCqhQIy2pwMpVF5Sp/7jzJfIwonYziv3UUyD cWhv6zQULWFMzkLOjBC8Ba44gIfSSq6wERE0JZiE2aZMJ3Azjh+prKwkH5Q48adb Ni7b19wFFLDSvv8KGzpoFrA3S5Wwzppy4YJpTEKGXUq9AQr08BUtv2KNeLz660ma pWD9ZzPnclxUiWhmAdi3x+vmn8209Id3pYRGD2auZlT0+7cE1yvyFNl30IWwac5K IoGQ54F3nWYynT6ScYNJilKlN/kDDX6Azt1AELl+fHhOvp5VpjxlDpiUahvqJGpX bhNo1GC8fQSZFKehA987IZol1vp9z2utqxEQhm0hZX4FhIRaS8LfUa1e7A66FZ1Q cp+Hz/8oYczZAEb3vXNdEVkhtIFTzS90uRhrjaRLlO67Yo/0xWxwP34DuBv8MiL7 XWK+rj3TSxYR3U6cAIAkU0bft+XFIVI8fKc92duE2Cx5Nx2s0gHRYsM/F2lFrhIw jETLC60lWj1Sf1HP03VKi7EjFht6qSVs4PgPVYopZtn94eXV+HK+sZ5CmkxqgfFT OYpG1VbMxEU= =hIKS -----END PGP SIGNATURE----- Merge tag 'irq-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irqchip driver fixes from Ingo Molnar: - Fix frozen interrupt bug in the sifive-plic driver - Limit per-device MSI interrupts on uncommon gic-v3-its hardware variants - Address Sparse warning by constifying a variable in the MMP driver - Revert broken commit and also fix an error check in the ls-extirq driver * tag 'irq-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/ls-extirq: Fix devm_of_iomap() error check Revert "irqchip/ls-extirq: Use for_each_of_imap_item iterator" irqchip/mmp: Make icu_irq_chip variable static const irqchip/gic-v3-its: Limit number of per-device MSIs to the range the ITS supports irqchip/sifive-plic: Fix frozen interrupt due to affinity setting	2026-03-01 10:58:16 -08:00
Linus Torvalds	39c6332614	SCSI fixes on 20260228 All changes in drivers (well technically ses is enclosure services, but its change is minor). The biggest is the write combining change in lpfc followed by the additional NULL checks in mpi3mr. Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> -----BEGIN PGP SIGNATURE----- iLgEABMIAGAWIQTnYEDbdso9F2cI+arnQslM7pishQUCaaO9chsUgAAAAAAEAA5t YW51MiwyLjUrMS4xMiwyLDImHGphbWVzLmJvdHRvbWxleUBoYW5zZW5wYXJ0bmVy c2hpcC5jb20ACgkQ50LJTO6YrIU8tQEAiyqxkTmD84X/4hL8+x+lxAI+lYmsGRfZ 4A4Xtx+B5vgBAK5mT+TEGBNVoGgpsWoDEa3xj6edz1jQMTiwnzhpnwql =FqRP -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "All changes in drivers (well technically SES is enclosure services, but its change is minor). The biggest is the write combining change in lpfc followed by the additional NULL checks in mpi3mr" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix shift out of bounds when MAXQ=32 scsi: ufs: core: Move link recovery for hibern8 exit failure to wl_resume scsi: ufs: core: Fix possible NULL pointer dereference in ufshcd_add_command_trace() scsi: snic: MAINTAINERS: Update snic maintainers scsi: snic: Remove unused linkstatus scsi: pm8001: Fix use-after-free in pm8001_queue_command() scsi: mpi3mr: Add NULL checks when resetting request and reply queues scsi: ufs: core: Reset urgent_bkops_lvl to allow runtime PM power mode scsi: ses: Fix devices attaching to different hosts scsi: ufs: core: Fix RPMB region size detection for UFS 2.2 scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT scsi: lpfc: Properly set WC for DPP mapping	2026-03-01 09:59:29 -08:00
Linus Torvalds	eb71ab2bf7	bpf-fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmjmDIACgkQ6rmadz2v bTq3gg//QQLOT/FxP2/dDurliDTXvQRr1tUxmIw6s3P6hnz9j/LLEVKpLRVkqd8t XEwbubPd1TXDRsJ4f26Ew01YUtf9xi6ZQoMe/BL1okxi0ZwQGGRVMkiKOQgRT+rj qYSN5JMfPzA2AuM6FjBF/hhw24yVRdgKRYBam6D7XLfFf3s8TOhHHjJ925PqEo0t uJOy4ddDYB9BcGmfoeyiFgUtpPqcYrKIUCLBdwFvT2fnPJvrFFoCF3t7NS9UJu/O wd6ZPuGWSOl9A7vSheldP6cJUDX8L/5WEGO4/LjN7plkySF0HNv8uq/b1T3kKqoY Y3unXerLGJUAA9D5wpYAekx9YmvRTPQ/o39oTbquEB4SSJVU/SPUpvFw7m2Moq10 51yuyXLcPlI3xtk0Bd8c/CESSmkRenjWzsuZQhDGhsR0I9mIaALrhf9LaatHtXI5 f5ct73e+beK7Fc0Ze+b0JxDeFvzA3CKfAF0/fvGt0r9VZjBaMD+a3NnscBlyKztW UCXazcfndMhNfUUWanktbT5YhYPmY7hzVQEOl7HAMGn4yG6XbXXmzzY6BqEXIucM etueW2msZJHGBHQGe2RK3lxtmiB7/FglJHd86xebkIU2gCzqt8fGUha8AIuJ4rLS 7wxC33DycCofRGWdseVu7PsTasdhSGsHKbXz2fOFOFESOczYRw8= =fj3P -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix alignment of arm64 JIT buffer to prevent atomic tearing (Fuad Tabba) - Fix invariant violation for single value tnums in the verifier (Harishankar Vishwanathan, Paul Chaignon) - Fix a bunch of issues found by ASAN in selftests/bpf (Ihor Solodrai) - Fix race in devmpa and cpumap on PREEMPT_RT (Jiayuan Chen) - Fix show_fdinfo of kprobe_multi when cookies are not present (Jiri Olsa) - Fix race in freeing special fields in BPF maps to prevent memory leaks (Kumar Kartikeya Dwivedi) - Fix OOB read in dmabuf_collector (T.J. Mercier) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (36 commits) selftests/bpf: Avoid simplification of crafted bounds test selftests/bpf: Test refinement of single-value tnum bpf: Improve bounds when tnum has a single possible value bpf: Introduce tnum_step to step through tnum's members bpf: Fix race in devmap on PREEMPT_RT bpf: Fix race in cpumap on PREEMPT_RT selftests/bpf: Add tests for special fields races bpf: Retire rcu_trace_implies_rcu_gp() from local storage bpf: Delay freeing fields in local storage bpf: Lose const-ness of map in map_check_btf() bpf: Register dtor for freeing special fields selftests/bpf: Fix OOB read in dmabuf_collector selftests/bpf: Fix a memory leak in xdp_flowtable test bpf: Fix stack-out-of-bounds write in devmap bpf: Fix kprobe_multi cookies access in show_fdinfo callback bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing selftests/bpf: Don't override SIGSEGV handler with ASAN selftests/bpf: Check BPFTOOL env var in detect_bpftool_path() selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN selftests/bpf: Fix array bounds warning in jit_disasm_helpers ...	2026-02-28 19:54:28 -08:00
Linus Torvalds	63a43faf6a	Driver core fixes for 7.0-rc2 - Do not register imx_clk_scu_driver in imx8qxp_clk_probe(); besides fixing two other issues, this avoids a deadlock in combination with commit `dc23806a7c` ("driver core: enforce device_lock for driver_match_device()"). - Move secondary node lookup from device_get_next_child_node() to fwnode_get_next_child_node(); this avoids issues when users switch from the device API to the fwnode API. - Export io_define_{read,write}!() to avoid unused import warnings when CONFIG_PCI=n. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaaNQggAKCRBFlHeO1qrK Lo3AAP91+UfLXYTXh2qhAndszSSJi1Xjq2Ik4cIf4UT3Ed2FrQD/QQuppy8k6zFx tCZGWbmolra9Vnf/oz1OpCaU87JF/Qc= =ct45 -----END PGP SIGNATURE----- Merge tag 'driver-core-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Do not register imx_clk_scu_driver in imx8qxp_clk_probe(); besides fixing two other issues, this avoids a deadlock in combination with commit `dc23806a7c` ("driver core: enforce device_lock for driver_match_device()") - Move secondary node lookup from device_get_next_child_node() to fwnode_get_next_child_node(); this avoids issues when users switch from the device API to the fwnode API - Export io_define_{read,write}!() to avoid unused import warnings when CONFIG_PCI=n * tag 'driver-core-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: clk: scu/imx8qxp: do not register driver in probe() rust: io: macro_export io_define_read!() and io_define_write!() device property: Allow secondary lookup in fwnode_get_next_child_node()	2026-02-28 19:35:30 -08:00
Linus Torvalds	42eb017830	five client changesets -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmmjL1YACgkQiiy9cAdy T1Ec0wwAqUQdX9j+WfcWFZyKbHLUlhiQIPePZJ9vs75aDqWu3BPMrH8vzXP2Fuj5 utz0hlcUcjGtWYONFHZOCrfaL4lWIg0viSLehjfHzBwfz4meHGToIeXaH6aSi23t z308DsT4tV45hs7UX/XsokESawh2w1WMKb37dZipGNvoAI5ZXiUXUfYMc2OmatCN 29y8TFnZ4HK2uDleb/HFCsTBGmziMFDjA4Q+55Wy1ZDYXPa5GojsXqrmtp/yJTeT 0gA4ruuhutIbLIyKTRhSxIURKj+fEMnjDRJbi2f8ZQfj7bUaj20hVzskbUgYQ1AU uAc6r5e8JAGaydmIYECni52sh15LSMT+s3dxvPwxMiYaMcDJk52P3WgFMkIDPf/S uF+rBVWcmUCx4V55T5N/slOb7BS1X3bJWbuCZKDmK7weukaT2hEpqX2ogblJ1PBH RxjQv3Zo9LyrPnasdY4EelW+OZq7zRgUHw0Sjo/BJn0lwxVNU0S8X5MDrsC4N+j1 55r+2s9j =P2Ge -----END PGP SIGNATURE----- Merge tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Two multichannel fixes - Locking fix for superblock flags - Fix to remove debug message that could log password - Cleanup fix for setting credentials * tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: Use snprintf in cifs_set_cifscreds smb: client: Don't log plaintext credentials in cifs_set_cifscreds smb: client: fix broken multichannel with krb5+signing smb: client: use atomic_t for mnt_cifs_flags smb: client: fix cifs_pick_channel when channels are equally loaded	2026-02-28 10:45:56 -08:00
Takashi Sakamoto	9197e5949a	firewire: ohci: initialize page array to use alloc_pages_bulk() correctly The call of alloc_pages_bulk() skips to fill entries of page array when the entries already have values. While, 1394 OHCI PCI driver passes the page array without initializing. It could cause invalid state at PFN validation in vmap(). Fixes: `f2ae92780a` ("firewire: ohci: split page allocation from dma mapping") Reported-by: John Ogness <john.ogness@linutronix.de> Reported-and-tested-by: Harald Arnesen <linux@skogtun.org> Reported-and-tested-by: David Gow <david@davidgow.net> Closes: https://lore.kernel.org/lkml/87tsv1vig5.fsf@jogness.linutronix.de/ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-28 10:09:24 -08:00
Linus Torvalds	2f9339c052	spi: Fixes for v7.0 One fix for the stm32 driver which got broken for DMA chaining cases, plus a removal of some straggling bindings for the Bikal SoC which has been pulled out of the kernel. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmmi5gMACgkQJNaLcl1U h9B+yQf+Ktr/YWwTbT0Cw24otv2CSzoKUPThztW1ydenWv/HYkm5YS9MnnKajDXK bXPV7Nm+cG61sz2wNbGiqUvpdkoKEbyp7cHSoGM8rIs2/ncLk0BltjwVFeTZAk3R 9znWf5MIeYayxajK0dNjtaMKxK2Xn6faKchPXIygpp/XGLQ9ofoBfP15BnOh0x4c H5NxnhQZNWgrkMCT4C/ONXVf52RsD1gWvUL2hq8EZ0n9LgNsgEbbMOpX9VUewEWB AG3vZ5uqJ337KCo7wlUVPqQk1gCowpe1o71JmaaxYDrcYi6zZdPFR88FxsANN28L 4ASaFltuCde2hb39LhO45MURDQpU4Q== =Alum -----END PGP SIGNATURE----- Merge tag 'spi-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "One fix for the stm32 driver which got broken for DMA chaining cases, plus a removal of some straggling bindings for the Bikal SoC which has been pulled out of the kernel" * tag 'spi-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: stm32: fix missing pointer assignment in case of dma chaining spi: dt-bindings: snps,dw-abp-ssi: Remove unused bindings	2026-02-28 09:21:18 -08:00
Linus Torvalds	463e133751	regulator: Fixes for v7.0 A small pile of fixes, none of which are super major - the code fixes are improved error handling and fixing a leak of a device node. We also have a typo fix and an improvement to make the binding example for mt6359 more directly usable. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmmi54oACgkQJNaLcl1U h9CopQf/bMIeiMes34Q6tbLnco1K4Hq09N9eMofIc/7QhPQ3bJCdf4QZRMCgFBa2 VB0mv1b4DZayDJhJHSP6D/Tzdx6S+3cHRTENRfvsARFUEq7hyMDKdV0CWKFSyeQf 2rfhM/XW7NZOfAwZIFJRFtdGglYL958+ZwLvkYHaCQTKcqJQnJAEc9RS6rtKRWAv 95hEDYa8XwgS6cXKeuM508qYif0a5jWfQ41ZfhweXCe4vSJTm5zteOEjP10QzDeT ZaAsfUdQzC+Zps3sSDP7VRdi4KBWVepUtdyIkoPwoH5t4FiFLeL3CiY6arvHD8zK AHnNKPi6myISXasjPvd8F+4BRkwq0Q== =n8e7 -----END PGP SIGNATURE----- Merge tag 'regulator-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A small pile of fixes, none of which are super major - the code fixes are improved error handling and fixing a leak of a device node. We also have a typo fix and an improvement to make the binding example for mt6359 more directly usable" * tag 'regulator-fix-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: Kconfig: fix a typo regulator: bq257xx: Fix device node reference leak in bq257xx_reg_dt_parse_gpio() regulator: fp9931: Fix PM runtime reference leak in fp9931_hwmon_read() regulator: tps65185: check devm_kzalloc() result in probe regulator: dt-bindings: mt6359: make regulator names unique	2026-02-28 09:18:02 -08:00
Linus Torvalds	201795a1b7	s390 updates for 7.0-rc2 - Fix guest pfault init to pass a physical address to DIAG 0x258, restoring pfault interrupts and avoiding vCPU stalls during host page-in - Fix kexec/kdump hangs with stack protector by marking s390_reset_system() __no_stack_protector; set_prefix(0) switches lowcore and the canary no longer matches - Fix idle/vtime cputime accounting (idle-exit ordering, vtimer double-forwarding) and small cleanups -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmmjAwQACgkQjYWKoQLX FBj8owf/TXZVqZbd2BQx0j4Ri+6APzv4BMZCUrWVLkibDsxb0xYQtPufo079JlFn 4xAYiEFgXPNPjw5y6cV1d9LTwDzlX6gsYF5Tc0cuD/KSdw3P5+nY9ohYtOtIrWzL h4Vp46/PfosQZefoQsxiSG6UrBhPxSbQBcXzcn2ZrqQnKLQAg/Fzoq+OW+L1SthX SU9cgISqMI8rBBIG7iOohpUJr12cX60qF1BbLNhcn+DC42TsQQY34HGAHABb8EcS dOWUREGcteBv7oE2nKjTPWfsSaGThOxjYKwDOSkfEq6B/UlOBuvzXHbzBT4eAPCL 8jPO0EdpyCMsMTPNuH70TliFhYLBUw== =6Ref -----END PGP SIGNATURE----- Merge tag 's390-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix guest pfault init to pass a physical address to DIAG 0x258, restoring pfault interrupts and avoiding vCPU stalls during host page-in - Fix kexec/kdump hangs with stack protector by marking s390_reset_system() __no_stack_protector; set_prefix(0) switches lowcore and the canary no longer matches - Fix idle/vtime cputime accounting (idle-exit ordering, vtimer double-forwarding) and small cleanups * tag 's390-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pfault: Fix virtual vs physical address confusion s390/kexec: Disable stack protector in s390_reset_system() s390/idle: Remove psw_idle() prototype s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON() s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE() s390/irq/idle: Remove psw bits early s390/idle: Inline update_timer_idle() s390/idle: Slightly optimize idle time accounting s390/idle: Add comment for non obvious code s390/vtime: Fix virtual timer forwarding s390/idle: Fix cpu idle exit cpu time accounting	2026-02-28 09:01:33 -08:00
Paolo Bonzini	55365ab85a	KVM/arm64 fixes for 7.0, take #1 - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmmfIDAACgkQI9DQutE9 ekNw0w//aFrTa4yoUns0Jd1HeCMnKIP3Iljz1FR0g58KAgIX5tTvgysThhNN7yjW k7Xp03oD7K4UtxqJVQnB4nPTX0PYFFOxfcsHNN1+cxhepo6n9t/61KMa6IM/EE2X lH5Cypz4pxHEynMXEDugnGJQaMKIe5eU9UHs5rTqqUeMGupqUOk2mjBOk77CWfio oIvOIne4h7SD9tzYcgZADZ5YAtc1QcRVZkLTAtfsd2xRzNlpBDNAqVX8PJPzEW35 cffrEvT0NGBu+0MLBzc3bZ3cViGv5/8DHqiTcQhvK14dHMUIxl1GxUrkXVHb8Ca0 hoBEV7dJHS0avPQoKXb+DjB+HkxHda4g8EJ7RxWJisofS1AztWWDZOsZnfgTaaLB Ia0WekWra0yuEiYlU5kE/HPRlXAKUAa3mhih3uu5Uq1c5xuhCzHY7pyB9uNCdYfz KiSsTFG0i+T0ddT8E92LOFTt65G2wZyd498yVoVJ7MDOUSb+eESvgu5BPH4hiJLM txnDDvEvcHn5zlnayUItxk4SSHLZHFEP/tjfWVTrEpFW3s9O86aQlSqCsxOQReYZ ZVSa4Qr/aef8KzS4IMxFcSIMGkUoNjFIe0+SsVj/5vLoX6eK0ppE0GoQhmREU2cc hcicwGH5VYdJXhxJUvxCqtrmX+F14JlENmdXBZsE4N0b41pi7G8= =Ppfu -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.0, take #1 - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info()	2026-02-28 15:33:34 +01:00
Paolo Bonzini	70295a479d	KVM: always define KVM_CAP_SYNC_MMU KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-02-28 15:31:35 +01:00
Paolo Bonzini	407fd8b8d8	KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-02-28 15:31:35 +01:00
Alexei Starovoitov	b9c0a5c483	Merge branch 'fix-invariant-violation-for-single-value-tnums' Paul Chaignon says: ==================== Fix invariant violation for single-value tnums We're hitting an invariant violation in Cilium that sometimes leads to BPF programs being rejected and Cilium failing to start [1]. As far as I know this is the first case of invariant violation found in a real program (i.e., not by a fuzzer). The following extract from verifier logs shows what's happening: from 201 to 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0 ; if (magic == MARK_MAGIC_HOST \|\| magic == MARK_MAGIC_OVERLAY \|\| magic == MARK_MAGIC_ENCRYPT) @ bpf_host.c:1337 236: (16) if w9 == 0xe00 goto pc+45 ; R9=scalar(smin=umin=smin32=umin32=3585,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) 237: (16) if w9 == 0xf00 goto pc+1 verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xe01, 0xe00] s64=[0xe01, 0xe00] u32=[0xe01, 0xe00] s32=[0xe01, 0xe00] var_off=(0xe00, 0x0) More details are given in the second patch, but in short, the verifier should be able to detect that the false branch of instruction 237 is never true. After instruction 236, the u64 range and the tnum overlap in a single value, 0xf00. The long-term solution to invariant violation is likely to rely on the refinement + invariant violation check to detect dead branches, as started by Eduard. To fix the current issue, we need something with less refactoring that we can backport to affected kernels. The solution implemented in the second patch is to improve the bounds refinement to avoid this case. It relies on a new tnum helper, tnum_step, first sent as an RFC in [2]. The last two patches extend and update the selftests. Link: https://github.com/cilium/cilium/issues/44216 [1] Link: https://lore.kernel.org/bpf/20251107192328.2190680-2-harishankar.vishwanathan@gmail.com/ [2] Changes in v3: - Fix commit description error spotted by AI bot. - Simplify constants in first two tests (Eduard). - Rework comment on third test (Eduard). - Add two new negative test cases (Eduard). - Rebased. Changes in v2: - Add guard suggested by Hari in tnum_step, to avoid undefined behavior spotted by AI code review. - Add explanation diagrams in code as suggested by Eduard. - Rework conditions for readability as suggested by Eduard. - Updated reference to SMT formula. - Rebased. ==================== Link: https://patch.msgid.link/cover.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:11:50 -08:00
Paul Chaignon	024cea2d64	selftests/bpf: Avoid simplification of crafted bounds test The reg_bounds_crafted tests validate the verifier's range analysis logic. They focus on the actual ranges and thus ignore the tnum. As a consequence, they carry the assumption that the tested cases can be reproduced in userspace without using the tnum information. Unfortunately, the previous change the refinement logic breaks that assumption for one test case: (u64)2147483648 (u32)<op> [4294967294; 0x100000000] The tested bytecode is shown below. Without our previous improvement, on the false branch of the condition, R7 is only known to have u64 range [0xfffffffe; 0x100000000]. With our improvement, and using the tnum information, we can deduce that R7 equals 0x100000000. 19: (bc) w0 = w6 ; R6=0x80000000 20: (bc) w0 = w7 ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff)) 21: (be) if w6 <= w7 goto pc+3 ; R6=0x80000000 R7=0x100000000 R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds refines the s32 range to 0 using u32 and finally also sets u32=0. From this, __reg_bound_offset improves the tnum to (0; 0x100000000). Finally, our previous patch uses this new tnum to deduce that it only intersect with u64=[0xfffffffe; 0x100000000] in a single value: 0x100000000. Because the verifier uses the tnum to reach this constant value, the selftest is unable to reproduce it by only simulating ranges. The solution implemented in this patch is to change the test case such that there is more than one overlap value between u64 and the tnum. The max. u64 value is thus changed from 0x100000000 to 0x300000000. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:11:50 -08:00
Paul Chaignon	e6ad477d1b	selftests/bpf: Test refinement of single-value tnum This patch introduces selftests to cover the new bounds refinement logic introduced in the previous patch. Without the previous patch, the first two tests fail because of the invariant violation they trigger. The last test fails because the R10 access is not detected as dead code. In addition, all three tests fail because of R0 having a non-constant value in the verifier logs. In addition, the last two cases are covering the negative cases: when we shouldn't refine the bounds because the u64 and tnum overlap in at least two values. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/90d880c8cf587b9f7dc715d8961cd1b8111d01a8.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:11:50 -08:00
Paul Chaignon	efc11a6678	bpf: Improve bounds when tnum has a single possible value We're hitting an invariant violation in Cilium that sometimes leads to BPF programs being rejected and Cilium failing to start [1]. The following extract from verifier logs shows what's happening: from 201 to 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0 ; if (magic == MARK_MAGIC_HOST \|\| magic == MARK_MAGIC_OVERLAY \|\| magic == MARK_MAGIC_ENCRYPT) @ bpf_host.c:1337 236: (16) if w9 == 0xe00 goto pc+45 ; R9=scalar(smin=umin=smin32=umin32=3585,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) 237: (16) if w9 == 0xf00 goto pc+1 verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xe01, 0xe00] s64=[0xe01, 0xe00] u32=[0xe01, 0xe00] s32=[0xe01, 0xe00] var_off=(0xe00, 0x0) We reach instruction 236 with two possible values for R9, 0xe00 and 0xf00. This is perfectly reflected in the tnum, but of course the ranges are less accurate and cover [0xe00; 0xf00]. Taking the fallthrough path at instruction 236 allows the verifier to reduce the range to [0xe01; 0xf00]. The tnum is however not updated. With these ranges, at instruction 237, the verifier is not able to deduce that R9 is always equal to 0xf00. Hence the fallthrough pass is explored first, the verifier refines the bounds using the assumption that R9 != 0xf00, and ends up with an invariant violation. This pattern of impossible branch + bounds refinement is common to all invariant violations seen so far. The long-term solution is likely to rely on the refinement + invariant violation check to detect dead branches, as started by Eduard. To fix the current issue, we need something with less refactoring that we can backport. This patch uses the tnum_step helper introduced in the previous patch to detect the above situation. In particular, three cases are now detected in the bounds refinement: 1. The u64 range and the tnum only overlap in umin. u64: ---[xxxxxx]----- tnum: --xx----------x- 2. The u64 range and the tnum only overlap in the maximum value represented by the tnum, called tmax. u64: ---[xxxxxx]----- tnum: xx-----x-------- 3. The u64 range and the tnum only overlap in between umin (excluded) and umax. u64: ---[xxxxxx]----- tnum: xx----x-------x- To detect these three cases, we call tnum_step(tnum, umin), which returns the smallest member of the tnum greater than umin, called tnum_next here. We're in case (1) if umin is part of the tnum and tnum_next is greater than umax. We're in case (2) if umin is not part of the tnum and tnum_next is equal to tmax. Finally, we're in case (3) if umin is not part of the tnum, tnum_next is inferior or equal to umax, and calling tnum_step a second time gives us a value past umax. This change implements these three cases. With it, the above bytecode looks as follows: 0: (85) call bpf_get_prandom_u32#7 ; R0=scalar() 1: (47) r0 \|= 3584 ; R0=scalar(smin=0x8000000000000e00,umin=umin32=3584,smin32=0x80000e00,var_off=(0xe00; 0xfffffffffffff1ff)) 2: (57) r0 &= 3840 ; R0=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) 3: (15) if r0 == 0xe00 goto pc+2 ; R0=3840 4: (15) if r0 == 0xf00 goto pc+1 4: R0=3840 6: (95) exit In addition to the new selftests, this change was also verified with Agni [3]. For the record, the raw SMT is available at [4]. The property it verifies is that: If a concrete value x is contained in all input abstract values, after __update_reg_bounds, it will continue to be contained in all output abstract values. Link: https://github.com/cilium/cilium/issues/44216 [1] Link: https://pchaigno.github.io/test-verifier-complexity.html [2] Link: https://github.com/bpfverif/agni [3] Link: https://pastebin.com/raw/naCfaqNx [4] Fixes: `0df1a55afa` ("bpf: Warn on internal verifier errors") Acked-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Marco Schirrmeister <mschirrmeister@gmail.com> Co-developed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/ef254c4f68be19bd393d450188946821c588565d.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:11:50 -08:00
Harishankar Vishwanathan	76e954155b	bpf: Introduce tnum_step to step through tnum's members This commit introduces tnum_step(), a function that, when given t, and a number z returns the smallest member of t larger than z. The number z must be greater or equal to the smallest member of t and less than the largest member of t. The first step is to compute j, a number that keeps all of t's known bits, and matches all unknown bits to z's bits. Since j is a member of the t, it is already a candidate for result. However, we want our result to be (minimally) greater than z. There are only two possible cases: (1) Case j <= z. In this case, we want to increase the value of j and make it > z. (2) Case j > z. In this case, we want to decrease the value of j while keeping it > z. (Case 1) j <= z t = xx11x0x0 z = 10111101 (189) j = 10111000 (184) ^ k (Case 1.1) Let's first consider the case where j < z. We will address j == z later. Since z > j, there had to be a bit position that was 1 in z and a 0 in j, beyond which all positions of higher significance are equal in j and z. Further, this position could not have been unknown in a, because the unknown positions of a match z. This position had to be a 1 in z and known 0 in t. Let k be position of the most significant 1-to-0 flip. In our example, k = 3 (starting the count at 1 at the least significant bit). Setting (to 1) the unknown bits of t in positions of significance smaller than k will not produce a result > z. Hence, we must set/unset the unknown bits at positions of significance higher than k. Specifically, we look for the next larger combination of 1s and 0s to place in those positions, relative to the combination that exists in z. We can achieve this by concatenating bits at unknown positions of t into an integer, adding 1, and writing the bits of that result back into the corresponding bit positions previously extracted from z. >From our example, considering only positions of significance greater than k: t = xx..x z = 10..1 + 1 ----- 11..0 This is the exact combination 1s and 0s we need at the unknown bits of t in positions of significance greater than k. Further, our result must only increase the value minimally above z. Hence, unknown bits in positions of significance smaller than k should remain 0. We finally have, result = 11110000 (240) (Case 1.2) Now consider the case when j = z, for example t = 1x1x0xxx z = 10110100 (180) j = 10110100 (180) Matching the unknown bits of the t to the bits of z yielded exactly z. To produce a number greater than z, we must set/unset the unknown bits in t, and all the unknown bits of t candidates for being set/unset. We can do this similar to Case 1.1, by adding 1 to the bits extracted from the masked bit positions of z. Essentially, this case is equivalent to Case 1.1, with k = 0. t = 1x1x0xxx z = .0.1.100 + 1 --------- .0.1.101 This is the exact combination of bits needed in the unknown positions of t. After recalling the known positions of t, we get result = 10110101 (181) (Case 2) j > z t = x00010x1 z = 10000010 (130) j = 10001011 (139) ^ k Since j > z, there had to be a bit position which was 0 in z, and a 1 in j, beyond which all positions of higher significance are equal in j and z. This position had to be a 0 in z and known 1 in t. Let k be the position of the most significant 0-to-1 flip. In our example, k = 4. Because of the 0-to-1 flip at position k, a member of t can become greater than z if the bits in positions greater than k are themselves >= to z. To make that member minimally greater than z, the bits in positions greater than k must be exactly = z. Hence, we simply match all of t's unknown bits in positions more significant than k to z's bits. In positions less significant than k, we set all t's unknown bits to 0 to retain minimality. In our example, in positions of greater significance than k (=4), t=x000. These positions are matched with z (1000) to produce 1000. In positions of lower significance than k, t=10x1. All unknown bits are set to 0 to produce 1001. The final result is: result = 10001001 (137) This concludes the computation for a result > z that is a member of t. The procedure for tnum_step() in this commit implements the idea described above. As a proof of correctness, we verified the algorithm against a logical specification of tnum_step. The specification asserts the following about the inputs t, z and output res that: 1. res is a member of t, and 2. res is strictly greater than z, and 3. there does not exist another value res2 such that 3a. res2 is also a member of t, and 3b. res2 is greater than z 3c. res2 is smaller than res We checked the implementation against this logical specification using an SMT solver. The verification formula in SMTLIB format is available at [1]. The verification returned an "unsat": indicating that no input assignment exists for which the implementation and the specification produce different outputs. In addition, we also automatically generated the logical encoding of the C implementation using Agni [2] and verified it against the same specification. This verification also returned an "unsat", confirming that the implementation is equivalent to the specification. The formula for this check is also available at [3]. Link: https://pastebin.com/raw/2eRWbiit [1] Link: https://github.com/bpfverif/agni [2] Link: https://pastebin.com/raw/EztVbBJ2 [3] Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Link: https://lore.kernel.org/r/93fdf71910411c0f19e282ba6d03b4c65f9c5d73.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:11:50 -08:00
Alexei Starovoitov	8c0d9e178d	Merge branch 'bpf-cpumap-devmap-fix-per-cpu-bulk-queue-races-on-preempt_rt' Jiayuan Chen says: ==================== bpf: Fix per-CPU bulk queue races on PREEMPT_RT On PREEMPT_RT kernels, local_bh_disable() only calls migrate_disable() (when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable preemption. This means CFS scheduling can preempt a task inside the per-CPU bulk queue (bq) operations in cpumap and devmap, allowing another task on the same CPU to concurrently access the same bq, leading to use-after-free, list corruption, and kernel panics. Patch 1 fixes the cpumap race in bq_flush_to_queue(), originally reported by syzbot [1]. Patch 2 fixes the same class of race in devmap's bq_xmit_all(), identified by code inspection after Sebastian Andrzej Siewior pointed out that devmap has the same per-CPU bulk queue pattern [2]. Both patches use local_lock_nested_bh() to serialize access to the per-CPU bq. On non-RT this is a pure lockdep annotation with no overhead; on PREEMPT_RT it provides a per-CPU sleeping lock. To reproduce the devmap race, insert an mdelay(100) in bq_xmit_all() after "cnt = bq->count" and before the actual transmit loop. Then pin two threads to the same CPU, each running BPF_PROG_TEST_RUN with an XDP program that redirects to a DEVMAP entry (e.g. a veth pair). CFS timeslicing during the mdelay window causes interleaving. Without the fix, KASAN reports null-ptr-deref due to operating on freed frames: BUG: KASAN: null-ptr-deref in __build_skb_around+0x22d/0x340 Write of size 32 at addr 0000000000000d50 by task devmap_race_rep/449 CPU: 0 UID: 0 PID: 449 Comm: devmap_race_rep Not tainted 6.19.0+ #31 PREEMPT_RT Call Trace: <TASK> __build_skb_around+0x22d/0x340 build_skb_around+0x25/0x260 __xdp_build_skb_from_frame+0x103/0x860 veth_xdp_rcv_bulk_skb.isra.0+0x162/0x320 veth_xdp_rcv.constprop.0+0x61e/0xbb0 veth_poll+0x280/0xb50 __napi_poll.constprop.0+0xa5/0x590 net_rx_action+0x4b0/0xea0 handle_softirqs.isra.0+0x1b3/0x780 __local_bh_enable_ip+0x12a/0x240 xdp_test_run_batch.constprop.0+0xedd/0x1f60 bpf_test_run_xdp_live+0x304/0x640 bpf_prog_test_run_xdp+0xd24/0x1b70 __sys_bpf+0x61c/0x3e00 </TASK> Kernel panic - not syncing: Fatal exception in interrupt [1] https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/ [2] https://lore.kernel.org/bpf/20260212023634.366343-1-jiayuan.chen@linux.dev/ v3 -> v4: https://lore.kernel.org/all/20260213034018.284146-1-jiayuan.chen@linux.dev/ - Move panic trace to cover letter. (Sebastian Andrzej Siewior) - Add Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> to both patches from cover letter. v2 -> v3: https://lore.kernel.org/bpf/20260212023634.366343-1-jiayuan.chen@linux.dev/ - Fix commit message: remove incorrect "spin_lock() becomes rt_mutex" claim, the per-CPU bq has no spin_lock at all. (Sebastian Andrzej Siewior) - Fix commit message: accurately describe local_lock_nested_bh() behavior instead of referencing local_lock(). (Sebastian Andrzej Siewior) - Remove incomplete discussion of snapshot alternative. (Sebastian Andrzej Siewior) - Remove panic trace from commit message. (Sebastian Andrzej Siewior) - Add patch 2/2 for devmap, same race pattern. (Sebastian Andrzej Siewior) v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@linux.dev/ - Use local_lock_nested_bh()/local_unlock_nested_bh() instead of local_lock()/local_unlock(), since these paths already run under local_bh_disable(). (Sebastian Andrzej Siewior) - Replace "Caller must hold bq->bq_lock" comment with lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior) - Fix Fixes tag to `3253cb49cb` ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") which is the actual commit that makes the race possible. (Sebastian Andrzej Siewior) ==================== Link: https://patch.msgid.link/20260225121459.183121-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:08:34 -08:00
Jiayuan Chen	1872e75375	bpf: Fix race in devmap on PREEMPT_RT On PREEMPT_RT kernels, the per-CPU xdp_dev_bulk_queue (bq) can be accessed concurrently by multiple preemptible tasks on the same CPU. The original code assumes bq_enqueue() and __dev_flush() run atomically with respect to each other on the same CPU, relying on local_bh_disable() to prevent preemption. However, on PREEMPT_RT, local_bh_disable() only calls migrate_disable() (when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable preemption, which allows CFS scheduling to preempt a task during bq_xmit_all(), enabling another task on the same CPU to enter bq_enqueue() and operate on the same per-CPU bq concurrently. This leads to several races: 1. Double-free / use-after-free on bq->q[]: bq_xmit_all() snapshots cnt = bq->count, then iterates bq->q[0..cnt-1] to transmit frames. If preempted after the snapshot, a second task can call bq_enqueue() -> bq_xmit_all() on the same bq, transmitting (and freeing) the same frames. When the first task resumes, it operates on stale pointers in bq->q[], causing use-after-free. 2. bq->count and bq->q[] corruption: concurrent bq_enqueue() modifying bq->count and bq->q[] while bq_xmit_all() is reading them. 3. dev_rx/xdp_prog teardown race: __dev_flush() clears bq->dev_rx and bq->xdp_prog after bq_xmit_all(). If preempted between bq_xmit_all() return and bq->dev_rx = NULL, a preempting bq_enqueue() sees dev_rx still set (non-NULL), skips adding bq to the flush_list, and enqueues a frame. When __dev_flush() resumes, it clears dev_rx and removes bq from the flush_list, orphaning the newly enqueued frame. 4. __list_del_clearprev() on flush_node: similar to the cpumap race, both tasks can call __list_del_clearprev() on the same flush_node, the second dereferences the prev pointer already set to NULL. The race between task A (__dev_flush -> bq_xmit_all) and task B (bq_enqueue -> bq_xmit_all) on the same CPU: Task A (xdp_do_flush) Task B (ndo_xdp_xmit redirect) ---------------------- -------------------------------- __dev_flush(flush_list) bq_xmit_all(bq) cnt = bq->count /* e.g. 16 / / start iterating bq->q[] / <-- CFS preempts Task A --> bq_enqueue(dev, xdpf) bq->count == DEV_MAP_BULK_SIZE bq_xmit_all(bq, 0) cnt = bq->count / same 16! / ndo_xdp_xmit(bq->q[]) / frames freed by driver / bq->count = 0 <-- Task A resumes --> ndo_xdp_xmit(bq->q[]) / use-after-free: frames already freed! */ Fix this by adding a local_lock_t to xdp_dev_bulk_queue and acquiring it in bq_enqueue() and __dev_flush(). These paths already run under local_bh_disable(), so use local_lock_nested_bh() which on non-RT is a pure annotation with no overhead, and on PREEMPT_RT provides a per-CPU sleeping lock that serializes access to the bq. Fixes: `3253cb49cb` ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260225121459.183121-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:08:10 -08:00
Jiayuan Chen	869c63d597	bpf: Fix race in cpumap on PREEMPT_RT On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed concurrently by multiple preemptible tasks on the same CPU. The original code assumes bq_enqueue() and __cpu_map_flush() run atomically with respect to each other on the same CPU, relying on local_bh_disable() to prevent preemption. However, on PREEMPT_RT, local_bh_disable() only calls migrate_disable() (when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable preemption, which allows CFS scheduling to preempt a task during bq_flush_to_queue(), enabling another task on the same CPU to enter bq_enqueue() and operate on the same per-CPU bq concurrently. This leads to several races: 1. Double __list_del_clearprev(): after bq->count is reset in bq_flush_to_queue(), a preempting task can call bq_enqueue() -> bq_flush_to_queue() on the same bq when bq->count reaches CPU_MAP_BULK_SIZE. Both tasks then call __list_del_clearprev() on the same bq->flush_node, the second call dereferences the prev pointer that was already set to NULL by the first. 2. bq->count and bq->q[] races: concurrent bq_enqueue() can corrupt the packet queue while bq_flush_to_queue() is processing it. The race between task A (__cpu_map_flush -> bq_flush_to_queue) and task B (bq_enqueue -> bq_flush_to_queue) on the same CPU: Task A (xdp_do_flush) Task B (cpu_map_enqueue) ---------------------- ------------------------ bq_flush_to_queue(bq) spin_lock(&q->producer_lock) /* flush bq->q[] to ptr_ring / bq->count = 0 spin_unlock(&q->producer_lock) bq_enqueue(rcpu, xdpf) <-- CFS preempts Task A --> bq->q[bq->count++] = xdpf / ... more enqueues until full ... / bq_flush_to_queue(bq) spin_lock(&q->producer_lock) / flush to ptr_ring / spin_unlock(&q->producer_lock) __list_del_clearprev(flush_node) / sets flush_node.prev = NULL / <-- Task A resumes --> __list_del_clearprev(flush_node) flush_node.prev->next = ... / prev is NULL -> kernel oops */ Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it in bq_enqueue() and __cpu_map_flush(). These paths already run under local_bh_disable(), so use local_lock_nested_bh() which on non-RT is a pure annotation with no overhead, and on PREEMPT_RT provides a per-CPU sleeping lock that serializes access to the bq. To reproduce, insert an mdelay(100) between bq->count = 0 and __list_del_clearprev() in bq_flush_to_queue(), then run reproducer provided by syzkaller. Fixes: `3253cb49cb` ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/ Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260225121459.183121-2-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 16:07:14 -08:00
Alexei Starovoitov	5263e30fff	Merge branch 'close-race-in-freeing-special-fields-and-map-value' Kumar Kartikeya Dwivedi says: ==================== Close race in freeing special fields and map value There exists a race across various map types where the freeing of special fields (tw, timer, wq, kptr, etc.) can be done eagerly when a logical delete operation is done on a map value, such that the program which continues to have access to such a map value can recreate the fields and cause them to leak. The set contains fixes for this case. It is a continuation of Mykyta's previous attempt in [0], but applies to all fields. A test is included which reproduces the bug reliably in absence of the fixes. Local Storage Benchmarks ------------------------ Evaluation Setup: Benchmarked on a dual-socket Intel Xeon Gold 6348 (Ice Lake) @ 2.60GHz (56 cores / 112 threads), with the CPU governor set to performance. Bench was pinned to a single NUMA node throughout the test. Benchmark comes from [1] using the following command: ./bench -p 1 local-storage-create --storage-type <socket,task> --batch-size <16,32,64> Before the test, 10 runs of all cases ([socket\|task] x 3 batch sizes x 7 iterations per batch size) are done to warm up and prime the machine. Then, 3 runs of all cases are done (with and without the patch, across reboots). For each comparison, we have 21 samples, i.e. per batch size (e.g. socket 16) of a given local storage, we have 3 runs x 7 iterations. The statistics (mean, median, stddev) and t-test is done for each scenario (local storage and batch size pair) individually (21 samples for either case). All values are for local storage creations in thousand creations / sec (k/s). Baseline (without patch) With patch Delta Case Median Mean Std. Dev. Median Mean Std. Dev. Median % --------------------------------------------------------------------------------------------------- socket 16 432.026 431.941 1.047 431.347 431.953 1.635 -0.679 -0.16% socket 32 432.641 432.818 1.535 432.488 432.302 1.508 -0.153 -0.04% socket 64 431.504 431.996 1.337 429.145 430.326 2.469 -2.359 -0.55% task 16 38.816 39.382 1.456 39.657 39.337 1.831 +0.841 +2.17% task 32 38.815 39.644 2.690 38.721 39.122 1.636 -0.094 -0.24% task 64 37.562 38.080 1.701 39.554 38.563 1.689 +1.992 +5.30% The cases for socket are within the range of noise, and improvements in task local storage are due to high variance (CV ~4%-6% across batch sizes). The only statistically significant case worth mentioning is socket with batch size 64 with p-value from t-test < 0.05, but the absolute difference is small (~2k/s). TL;DR there doesn't appear to be any significant regression or improvement. [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com [1]: https://lore.kernel.org/bpf/20260205222916.1788211-1-ameryhung@gmail.com Changelog: ---------- v2 -> v3 v2: https://lore.kernel.org/bpf/20260227052031.3988575-1-memxor@gmail.com * Add syzbot Tested-by. * Add Amery's Reviewed-by. * Fix missing rcu_dereference_check() in __bpf_selem_free_rcu. (BPF CI Bot) * Remove migrate_disable() in bpf_selem_free_rcu. (Alexei) v1 -> v2 v1: https://lore.kernel.org/bpf/20260225185121.2057388-1-memxor@gmail.com * Add Paul's Reviewed-by. * Fix use-after-free in accessing bpf_mem_alloc embedded in map. (syzbot CI) * Add benchmark numbers for local storage. * Add extra test case for per-cpu hashmap coverage with up to 16 refcount leaks. * Target bpf tree. ==================== Link: https://patch.msgid.link/20260227224806.646888-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 15:39:01 -08:00
Kumar Kartikeya Dwivedi	2939d7b3b0	selftests/bpf: Add tests for special fields races Add a couple of tests to ensure that the refcount drops to zero when we exercise the race where creation of a special field succeeds the logical bpf_obj_free_fields done when deleting an element. Prior to previous changes, the fields would be freed eagerly and repopulate and end up leaking, causing the reference to not drop down correctly. Running this test on a kernel without fixes will cause a hang in delete_module, since the module reference stays active due to the leaked kptr not dropping it. After the fixes tests succeed as expected. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 15:39:00 -08:00
Kumar Kartikeya Dwivedi	baa35b3cb6	bpf: Retire rcu_trace_implies_rcu_gp() from local storage This assumption will always hold going forward, hence just remove the various checks and assume it is true with a comment for the uninformed reader. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 15:39:00 -08:00
Kumar Kartikeya Dwivedi	f41deee082	bpf: Delay freeing fields in local storage Currently, when use_kmalloc_nolock is false, the freeing of fields for a local storage selem is done eagerly before waiting for the RCU or RCU tasks trace grace period to elapse. This opens up a window where the program which has access to the selem can recreate the fields after the freeing of fields is done eagerly, causing memory leaks when the element is finally freed and returned to the kernel. Make a few changes to address this. First, delay the freeing of fields until after the grace periods have expired using a __bpf_selem_free_rcu wrapper which is eventually invoked after transitioning through the necessary number of grace period waits. Replace usage of the kfree_rcu with call_rcu to be able to take a custom callback. Finally, care needs to be taken to extend the rcu barriers for all cases, and not just when use_kmalloc_nolock is true, as RCU and RCU tasks trace callbacks can be in flight for either case and access the smap field, which is used to obtain the BTF record to walk over special fields in the map value. While we're at it, drop migrate_disable() from bpf_selem_free_rcu, since migration should be disabled for RCU callbacks already. Fixes: `9bac675e63` ("bpf: Postpone bpf_obj_free_fields to the rcu callback") Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 15:39:00 -08:00
Kumar Kartikeya Dwivedi	ae51772b1e	bpf: Lose const-ness of map in map_check_btf() BPF hash map may now use the map_check_btf() callback to decide whether to set a dtor on its bpf_mem_alloc or not. Unlike C++ where members can opt out of const-ness using mutable, we must lose the const qualifier on the callback such that we can avoid the ugly cast. Make the change and adjust all existing users, and lose the comment in hashtab.c. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 15:39:00 -08:00
Kumar Kartikeya Dwivedi	1df97a7453	bpf: Register dtor for freeing special fields There is a race window where BPF hash map elements can leak special fields if the program with access to the map value recreates these special fields between the check_and_free_fields done on the map value and its eventual return to the memory allocator. Several ways were explored prior to this patch, most notably [0] tried to use a poison value to reject attempts to recreate special fields for map values that have been logically deleted but still accessible to BPF programs (either while sitting in the free list or when reused). While this approach works well for task work, timers, wq, etc., it is harder to apply the idea to kptrs, which have a similar race and failure mode. Instead, we change bpf_mem_alloc to allow registering destructor for allocated elements, such that when they are returned to the allocator, any special fields created while they were accessible to programs in the mean time will be freed. If these values get reused, we do not free the fields again before handing the element back. The special fields thus may remain initialized while the map value sits in a free list. When bpf_mem_alloc is retired in the future, a similar concept can be introduced to kmalloc_nolock-backed kmem_cache, paired with the existing idea of a constructor. Note that the destructor registration happens in map_check_btf, after the BTF record is populated and (at that point) avaiable for inspection and duplication. Duplication is necessary since the freeing of embedded bpf_mem_alloc can be decoupled from actual map lifetime due to logic introduced to reduce the cost of rcu_barrier()s in mem alloc free path in `9f2c6e96c6` ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc."). As such, once all callbacks are done, we must also free the duplicated record. To remove dependency on the bpf_map itself, also stash the key size of the map to obtain value from htab_elem long after the map is gone. [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com Fixes: `14a324f6a6` ("bpf: Wire up freeing of referenced kptr") Fixes: `1bfbc267ec` ("bpf: Enable bpf_timer and bpf_wq in any context") Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-02-27 15:39:00 -08:00
Linus Torvalds	4d349ee5c7	arm64 fixes for -rc2 - Fix cpufreq warning due to attempting a cross-call with interrupts masked when reading local AMU counters. - Fix DEBUG_PREEMPT warning from the delay loop when it tries to access per-cpu errata workaround state for the virtual counter. - Re-jig and optimise our TLB invalidation errata workarounds in preparation for more hardware brokenness. - Fix GCS mappings to interact properly with PROT_NONE and to avoid corrupting the pte on CPUs with FEAT_LPA2. - Fix ioremap_prot() to extract only the memory attributes from the user pte and ignore all the other 'prot' bits. -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmmh458QHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNOA1B/9RhZoGSDEuTKVjue1UtVIaFXs5v0LFqgIf c5eCtYzHU2djw0zVUv4DHZaPvxmxjzu4G4safzjB23IwMDkeGM2hmDBigw/HgWXN Nm46UWFnmWrd/58w895r3QAe5wbTuhwzVdj9bUbYNGf9/Tqey/hbn8e2BJFdKC1H Dt/uBYNuyRLSe94iYTzKcWPKH5CnSAkQ7lshxOhHSq5WF+ybmNywJnSqcTy+2kHb cU9/TXWd6ck5MqK/pzu9nCLPPGpSmOYCjWPiEB2pWiHRPlykK3r7fhvYlqfySRtx r14TlLIpevstCc8dr7NS2rj6lkjXKzggWOm0fD/rPusSj5Jn7qkk =MmjE -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The diffstat is dominated by changes to our TLB invalidation errata handling and the introduction of a new GCS selftest to catch one of the issues that is fixed here relating to PROT_NONE mappings. - Fix cpufreq warning due to attempting a cross-call with interrupts masked when reading local AMU counters - Fix DEBUG_PREEMPT warning from the delay loop when it tries to access per-cpu errata workaround state for the virtual counter - Re-jig and optimise our TLB invalidation errata workarounds in preparation for more hardware brokenness - Fix GCS mappings to interact properly with PROT_NONE and to avoid corrupting the pte on CPUs with FEAT_LPA2 - Fix ioremap_prot() to extract only the memory attributes from the user pte and ignore all the other 'prot' bits" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: topology: Fix false warning in counters_read_on_cpu() for same-CPU reads arm64: Fix sampling the "stable" virtual counter in preemptible section arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI arm64: tlb: Allow XZR argument to TLBI ops kselftest: arm64: Check access to GCS after mprotect(PROT_NONE) arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappings arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabled arm64: io: Extract user memory type in ioremap_prot() arm64: io: Rename ioremap_prot() to __ioremap_prot()	2026-02-27 13:40:30 -08:00
Linus Torvalds	1c63df24be	pci-v7.0-fixes-2 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmmh9ZEUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vzh5RAAnapJg4KrABnDmsxyk2wFi0KTP1pA O0OUkdaAfpFsreGG9nUpRrrkG8QvMLB8sxoJ0PlFdL4kljp8gN56dTXhhSZuT9UA FZtoJgZWV+kjw8fk9vxnqc/yOzD8knXbF9AILTKRUDCUiEmWsVjcvEq5sDV7hpx/ DRuLtbWH8cY5vKXl7DS1HpORal73X3yCMZ1kRofJ+8CYOWdNotJp3jDss9M61bIw inhDy2iOR6UuxAjfRujT2B/sDWzdZ9jTENahb5PB66lWxbSfHXfnItlhcXUu/ojK ap2oIf/ResUjbtlQZQ2BYtouNilwI9hB23PdyYX/rDcCsvoUeJrYhMIam3G1zfYC xDg7jI7JEctB7hndC/yDE6iN9BJaY+nWrK7dCkL1zstawagT3CLcWq/j3XvYuLSo t1Bp1tTuYW8WFoeSHkPuazpVgm+sM5hHP08TNY/ErZ3nIlEajsQH4/yW+8WTtaYv uUuW+W6ATMVe/K38qBpAqkbAjaQ5m0zFSEsTjQyhssU+Atpqd/pJtmC8FoWZZ4DL uR+o1QT94RFJFiB91m1tbH7n2ReYIl1VvW5z27d4WVCZE1PkBnQo3EU4uUvzGmHK 1qLOGZiq4LEEG0E3KdeoZh6tqmjPsNk/ZJf/wcb02RkrvscpEymMvuvBLQk07U28 nHTEYfryFYzIq1w= =8d7b -----END PGP SIGNATURE----- Merge tag 'pci-v7.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Update MAINTAINERS email address (Shawn Guo) - Refresh cached Endpoint driver MSI Message Address to fix a v7.0 regression when kernel changes the address after firmware has configured it (Niklas Cassel) - Flush Endpoint MSI-X writes so they complete before the outbound ATU entry is unmapped (Niklas Cassel) - Correct the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value, which broke VMM use of PCI capabilities (Bjorn Helgaas) * tag 'pci-v7.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry PCI: dwc: ep: Refresh MSI Message Address cache on change MAINTAINERS: Update Shawn Guo's address for HiSilicon PCIe controller driver	2026-02-27 13:32:52 -08:00
Linus Torvalds	aed968f8a6	cxl fixes for v7.0-rc2 - Fix incorrect usages of decoder flags. cxl/region: Test CXL_DECODER_F_NORMALIZED_ADDRESSING as a bitmask cxl: Test CXL_DECODER_F_LOCK as a bitmask cxl/mbox: validate payload size before accessing contents in cxl_payload_from_user_allowed() - Moving of devm_cxl_add_nvdimm_bridge() is required for the race fix. cxl: Fix race of nvdimm_bus object when creating nvdimm objects cxl: Move devm_cxl_add_nvdimm_bridge() to cxl_pmem.ko - The port_to_host() is used to fix the race window without host lock. cxl/port: Hold port host lock during dport adding. cxl/port: Introduce port_to_host() helper cxl/memdev: fix deadlock in cxl_memdev_autoremove() on attach failure -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmmhzZgACgkQYGjFFmlT OEqKrhAA2DoFTLePsbDXOAqs/9A90amNkWjE7JRuKELqVs0o00vutYiWiThP7lXP v0wsCq5V2+DT35Oo0SpDntNQkYn54vjRF4Q4VYdHvBRuU4mbWFEqctJBvYRHjXSE I4POupbqO8pAIz5R7jaRdq/PgnW1S+4ZARzkmJGXfpzgUC2JdHN+hl2aRkBWr09o 2RCmVu0H5yxLibw7DmZex/s4ksXzuegy0BDo94Xmo+GXabYHCPCiQOZobZzUMhlL 5r+YTGMY+Q3Z+nHm3fuC4lQCMClgTVf6xdHtqqZ/GQ6b69KjhO0V7/RmZENvmlLS LtW4ZwEmgQRX6rnoT3p2UaB1cu4lBscU9kYRZdmP7FnYbu8cJXe6Dfk+56lDnqy8 8bwYAPPmV/IBv4dRecoliYhGMJ4TMmgCoxwxuh2KROUpKI1GkIsTzMeoypviS6Nm 5EAlMvoMZkgygkpJ+z8u009IgIQyxQRWqIxw0aLcutAlBhtZxb8/iq8dOCrgCb64 HFpeglrmechDbXgf9hHWWh+RX+A6MrhLTcjyYIdVJzb7jrfWkwUDZaKg7cRni51W IPZtjaawvEMbYRs+dVP0frZvLDqPYE0U4Y3GWa7d9XRSHx0QwTcLwi8ShXEx3dqS wcL/THCEH0Zhtl3LU6CDfeChPKyyUnt2ChbucccVLjN2thaRH44= =4//d -----END PGP SIGNATURE----- Merge tag 'cxl-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Fix incorrect usages of decoder flags - Validate payload size before accessing contents - Fix race condition when creating nvdimm objects - Fix deadlock on attach failure * tag 'cxl-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Test CXL_DECODER_F_NORMALIZED_ADDRESSING as a bitmask cxl: Test CXL_DECODER_F_LOCK as a bitmask cxl/mbox: validate payload size before accessing contents in cxl_payload_from_user_allowed() cxl: Fix race of nvdimm_bus object when creating nvdimm objects cxl: Move devm_cxl_add_nvdimm_bridge() to cxl_pmem.ko cxl/port: Hold port host lock during dport adding. cxl/port: Introduce port_to_host() helper cxl/memdev: fix deadlock in cxl_memdev_autoremove() on attach failure	2026-02-27 10:52:57 -08:00
Linus Torvalds	962336b9ff	MMC core: - Avoid bitfield RMW for claim/retune flags MMC host: - dw_mmc-rockchip: Fix runtime PM support for internal phase support - mmci: Fix device_node reference leak in of_get_dml_pipe_index() - sdhci-brcmstb: Use correct register offset for V1 pin_sel restore -----BEGIN PGP SIGNATURE----- iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAmmhyS4XHHVsZi5oYW5z c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCnhaA/+IzAHwVL1DGwZNbO/5cO84rkL vx7Hnzt0CMbwkBlEdFFXUDt1/ESSWNQk2e7ydNCElwBIz0PdBp5hlMIyumksx1D0 sQibqz4kNywtAAVtFcgemKb8BVdf7bdwVD+5sGnFhHmX8+GyTjlTd9Qj/Yt7bgcA ywWfl+w45rR0bypGsfMJVQ3l2pslwiXTyo0K4TUJiZkCtpwu9hzSfFShGM2ipYFX HiSTGV240MvxYteuqgVJIFdQmEje1NczGjRS6s6BIvt7zwKcfJgJACpue6ZJu3Ra vy9NFnNXCQ2C1LFtr0MjjrdWBowed6/HJJoxCgfcmdxbnqYqZWMfDGrOtH6xP8HQ xGHJJF5BA3EvvyfBg5hgUKVvQ80yRnIYUJ52Tzwcm0U9gcHe1tu1rxRYwNPUpLBS L/smXIvt0j/WVPEjKO+I1TXB1XMqJFSh88vAut7UXbCtJZN9YBP/aRqXJSVc5gs/ LKNWrNr1qymusvPagnlMnYGOdJKQbBNKYeN7oyEwyN/oW/JF6JGVuWctvJQuigVz xDw2g7cnI/CMh+gAQumd1H23Rnwy+kL16Ld9z27ih4PB54Vq7VlTxbpJCdREDxU6 y8c3C09AKOjMfOPXwqnJ/2HH+y4IpYW9WykuaKJFRKekx87irKuc8BOC5cGwXL8H WXZbxqLrQM9GkPXiu+w= =zG3/ -----END PGP SIGNATURE----- Merge tag 'mmc-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Avoid bitfield RMW for claim/retune flags MMC host: - dw_mmc-rockchip: Fix runtime PM support for internal phase support - mmci: Fix device_node reference leak in of_get_dml_pipe_index() - sdhci-brcmstb: Use correct register offset for V1 pin_sel restore" * tag 'mmc-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Avoid bitfield RMW for claim/retune flags mmc: sdhci-brcmstb: use correct register offset for V1 pin_sel restore mmc: dw_mmc-rockchip: Fix runtime PM support for internal phase support mmc: mmci: Fix device_node reference leak in of_get_dml_pipe_index()	2026-02-27 10:49:54 -08:00
Linus Torvalds	78d964c47f	block-7.0-20260227 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmhxjgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqZND/9ZHjUX2BhtEHWayTLpgBl5JYeYvNnxgeKw MsQjmkDBJDdpxz5s5Y6zY+OKDhWv+o4VgfgykK5bpn5/kXgwqvKSK9QroVSl6OPk JTFo3PQ0KhuZQyBxX+zZPNjgV/Kzop4j7NuNrovrIEJtvgVZuwJx/0h/DdpLez3N +l9jWQcjLjFJHHnTuV9Tv1jqhcPU5vxkkEWlZgwkWMxdghMWWw5JDHxGt1w3g6c8 4Qmn1G3SqAPEItQ+ugU+BL+vYVKVm9/Gaa4JWJTm8d1ADmQxd1xXfkDLSBL2L4EQ VU+lkHwFDhx8hjFSeOTps90lTlrUzIaAqYn532VVsU/RMWFzcdUbPJdED7QddUNG gGj40sUu5XaMoBKzCRJ6KznYMCK3x15gptz8MAj2hfqcUk+GRLKO9crXj0tU35iP sBVMlC66+nVDNQRybqVUMTpizARoFHiRCKSwaFjbsnoe1DkIayn8ghDLhMm9W8Ea hgroXlmgafB7Ad/Kt5SSYv0Nw644Zw4bLTrIO1G+OuQf6DRMGt6z5dmmo9arsgjU fKNs7Dwb+1d0cpKFNHpnQOc4htO3G1B1stGT6WEJUpd5MXDurw4VoNdaRkOSbi75 hd+VM+dyJEDHpc/QuRSFBQQqbFLJ4Hfi3z/+oUP8x1FVbDb2u7b+2H7BFgrn08Y+ EEKRaIQ6PQ== =gtEI -----END PGP SIGNATURE----- Merge tag 'block-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: "Two sets of fixes, one for drbd, and one for the zoned loop driver" * tag 'block-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: zloop: check for spurious options passed to remove zloop: advertise a volatile write cache drbd: fix null-pointer dereference on local read error drbd: Replace deprecated strcpy with strscpy drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()	2026-02-27 10:42:02 -08:00
Linus Torvalds	530b0b61df	io_uring-7.0-20260227 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmhxlEQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpot9EACmBXos2lXMkiFN5TcSOLKW9Lh7PNAuDUns oflfPVkS/F17muPnQIHzn5wqHpbWijx2uH0ehUXNJi+6U7OBdvZyZvEfdslh2ewZ TAPnTFDZZbvdPqnGw7MJ1fLqGOB0RLoMJ72GkwIPhV6SQqmfu6U89ppplyMxybLK MPLOx3j8HK/pX/3uLEyOpZ6XIfZjGyGiiMj8lEN+UZ5a9b5G3W87LnGPTRitl6SL j5QTC5abGVk0vOqEPjm6Qws1icU9MumNAqTBTGL12WVlDRw0bQfSsZQzwySLXQpc pbT3CUurt3jgU873S8xPnbc0v5g/YMLuGv7morMRHJ13h4g1lTR3Q5u5+KX3RTBQ /I+R6C0uptLity+mBymdf0ZSgFOG7J7vfI6MzlqjUXSUqCfF3QVqrw5GrSDiDbI7 oO+JjSExM01w6kf1dcxk1ROJxxNiNWPpuP958poDydpD9jnp+Mr6jHuN66Es0Ctd fBvVMa1w2MzYopIeclN4KNvuPQV+HxITMt9RgNW+6iXo7Fwqy1SCyGbuOr6t1f4Y SD8s6xVYw47OilvGyaFZrERIq0xe8SjoCaNeVgDk9sX8g2XCZL1Utop9FbkzgPWW WzH6Y9by3UobHoCB2iehIbglaQtCKSwJou76Z7nXi15luZNL5MlQbQzYEaDcSlea i1dBMOmi+w== =lbcV -----END PGP SIGNATURE----- Merge tag 'io_uring-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: "Just two minor patches in here, ensuring the use of READ_ONCE() for sqe field reading is consistent across the codebase. There were two missing cases, now they are covered too" * tag 'io_uring-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/timeout: READ_ONCE sqe->addr io_uring/cmd_net: use READ_ONCE() for ->addr3 read	2026-02-27 10:39:11 -08:00

1 2 3 4 5 ...

1427341 commits