linux/kernel
Chaitanya Kulkarni da46b5dfef blktrace: fix __this_cpu_read/write in preemptible context
tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_cmdline_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are only safe
when preemption is off, as they were designed to be called from the
scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).

__blk_add_trace() was calling tracing_record_cmdline(current) early in
the blk_tracer path, before ring buffer reservation, from process
context where preemption is fully enabled. This triggers the following
using blktests/blktrace/002:

blktrace/002 (blktrace ftrace corruption with sysfs trace)   [failed]
    runtime  0.367s  ...  0.437s
    something found in dmesg:
    [   81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
    [   81.239580] null_blk: disk nullb1 created
    [   81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
    [   81.362842] caller is tracing_record_cmdline+0x10/0x40
    [   81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
    [   81.362877] Tainted: [N]=TEST
    [   81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    [   81.362881] Call Trace:
    [   81.362884]  <TASK>
    [   81.362886]  dump_stack_lvl+0x8d/0xb0
    ...
    (See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)

[   81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[   81.239580] null_blk: disk nullb1 created
[   81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[   81.362842] caller is tracing_record_cmdline+0x10/0x40
[   81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
[   81.362877] Tainted: [N]=TEST
[   81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[   81.362881] Call Trace:
[   81.362884]  <TASK>
[   81.362886]  dump_stack_lvl+0x8d/0xb0
[   81.362895]  check_preemption_disabled+0xce/0xe0
[   81.362902]  tracing_record_cmdline+0x10/0x40
[   81.362923]  __blk_add_trace+0x307/0x5d0
[   81.362934]  ? lock_acquire+0xe0/0x300
[   81.362940]  ? iov_iter_extract_pages+0x101/0xa30
[   81.362959]  blk_add_trace_bio+0x106/0x1e0
[   81.362968]  submit_bio_noacct_nocheck+0x24b/0x3a0
[   81.362979]  ? lockdep_init_map_type+0x58/0x260
[   81.362988]  submit_bio_wait+0x56/0x90
[   81.363009]  __blkdev_direct_IO_simple+0x16c/0x250
[   81.363026]  ? __pfx_submit_bio_wait_endio+0x10/0x10
[   81.363038]  ? rcu_read_lock_any_held+0x73/0xa0
[   81.363051]  blkdev_read_iter+0xc1/0x140
[   81.363059]  vfs_read+0x20b/0x330
[   81.363083]  ksys_read+0x67/0xe0
[   81.363090]  do_syscall_64+0xbf/0xf00
[   81.363102]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   81.363106] RIP: 0033:0x7f281906029d
[   81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
[   81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d
[   81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000
[   81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000
[   81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000
[   81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a
[   81.363142]  </TASK>

The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.

The purpose of tracing_record_cmdline() is to cache the task->comm for
a given PID so that the trace can later resolve it. It is only
meaningful when a trace event is actually being recorded. Ring buffer
reservation via ring_buffer_lock_reserve() disables preemption, and
preemption remains disabled until the event is committed :-

__blk_add_trace()
       	__trace_buffer_lock_reserve()
       		__trace_buffer_lock_reserve()
       			ring_buffer_lock_reserve()
       				preempt_disable_notrace();  <---

With this fix blktests for blktrace pass:

  blktests (master) # ./check blktrace
  blktrace/001 (blktrace zone management command tracing)      [passed]
      runtime  3.650s  ...  3.647s
  blktrace/002 (blktrace ftrace corruption with sysfs trace)   [passed]
      runtime  0.411s  ...  0.384s

Fixes: 7ffbd48d5c ("tracing: Cache comms only after an event occurred")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-02 09:14:58 -07:00
..
bpf mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
cgroup mm.git review status for linus..mm-stable 2026-02-12 11:32:37 -08:00
configs watchdog: softlockup: panic when lockup duration exceeds N thresholds 2026-01-20 19:44:20 -08:00
debug SPDX updates for 7.0-rc1 2026-02-17 09:46:03 -08:00
dma dma-mapping update for Linux 7.0 2026-02-13 14:51:39 -08:00
entry Merge branch 'core/entry' into sched/core 2026-01-30 15:40:05 +01:00
events Performance events changes for v7.0: 2026-02-10 12:00:46 -08:00
futex Futex changes for v6.19: 2025-12-10 17:21:30 +09:00
gcov gcov: add support for GCC 15 2025-11-09 21:19:44 -08:00
irq pci-v7.0-changes 2026-02-11 17:20:38 -08:00
kcsan mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
livepatch livepatch: Fix having __klp_objects relics in non-livepatch modules 2026-02-05 08:00:44 -08:00
liveupdate mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
locking test-ww_mutex: Allow test to be run (and re-run) from userland 2025-12-18 10:45:23 +01:00
module mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
power mm.git review status for linus..mm-stable 2026-02-12 11:32:37 -08:00
printk printk, vt, fbcon: Remove console_conditional_schedule() 2026-02-14 11:09:47 +01:00
rcu tracing updates for 7.0: 2026-02-13 19:25:16 -08:00
sched mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
time Updates for the core time subsystem: 2026-02-10 16:41:59 -08:00
trace blktrace: fix __this_cpu_read/write in preemptible context 2026-03-02 09:14:58 -07:00
unwind unwind_user/fp: Use dummies instead of ifdef 2025-12-17 13:31:07 +01:00
.gitignore kheaders: rebuild kheaders_data.tar.xz when a file is modified within a minute 2025-06-24 20:30:37 +09:00
acct.c simplify the callers of file_open_name() 2026-01-13 15:18:08 -05:00
async.c
audit.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
audit.h audit: fix comment misindentation in audit.h 2025-10-22 19:28:06 -04:00
audit_fsnotify.c VFS/audit: introduce kern_path_parent() for audit 2025-09-23 12:37:35 +02:00
audit_tree.c mount-related stuff for this cycle 2025-10-03 10:19:44 -07:00
audit_watch.c VFS/audit: introduce kern_path_parent() for audit 2025-09-23 12:37:35 +02:00
auditfilter.c audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data() 2025-11-07 16:38:34 -05:00
auditsc.c struct filename ->refcnt doesn't need to be atomic 2026-01-13 15:18:07 -05:00
backtracetest.c
bounds.c x86/asm: Remove ANNOTATE_DATA_SPECIAL usage 2025-12-03 16:53:19 +01:00
capability.c capability: Remove unused has_capability 2025-03-07 22:03:09 -06:00
cfi.c cfi: Move BPF CFI types and helpers to generic code 2025-07-31 18:23:53 -07:00
compat.c
configs.c
context_tracking.c context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}() 2026-01-01 16:39:46 +08:00
cpu.c SPDX updates for 7.0-rc1 2026-02-17 09:46:03 -08:00
cpu_pm.c syscore: Pass context data to callbacks 2025-11-14 10:01:52 +01:00
crash_core.c kernel/crash: handle multi-page vmcoreinfo in crash kernel copy 2026-01-20 19:44:20 -08:00
crash_core_test.c crash: add KUnit tests for crash_exclude_mem_range 2025-09-13 17:32:55 -07:00
crash_dump_dm_crypt.c crash_dump: fix dm_crypt keys locking and ref leak 2026-01-31 16:16:08 -08:00
crash_reserve.c crash: let architecture decide crash memory export to iomem_resource 2025-11-12 10:00:15 -08:00
cred.c cred: remove unused set_security_override_from_ctx() 2026-01-06 20:52:57 -05:00
delayacct.c delayacct: fix uapi timespec64 definition 2026-02-08 00:13:32 -08:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c Significant patch series in this pull request: 2025-12-06 14:01:20 -08:00
exit.h
extable.c
fail_function.c
fork.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
freezer.c freezer: Clarify that only cgroup1 freezer uses PM freezer 2025-10-30 20:10:27 +01:00
gen_kheaders.sh kheaders: make it possible to override TAR 2025-08-06 10:23:36 +09:00
groups.c
hung_task.c hung_task: add hung_task_sys_info sysctl to dump sys info on task-hung 2025-11-20 14:03:43 -08:00
iomem.c mm/memremap: Pass down MEMREMAP_* flags to arch_memremap_wb() 2025-02-21 15:05:38 +01:00
irq_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
jump_label.c jump_label: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
kallsyms.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
kallsyms_internal.h kallsyms: Get rid of kallsyms relative base 2026-01-22 15:58:22 -07:00
kallsyms_selftest.c kallsyms: use kmalloc_array() instead of kmalloc() 2025-09-28 11:36:14 -07:00
kallsyms_selftest.h
kcmp.c kcmp: improve performance adding an unlikely hint to task comparisons 2025-02-21 10:25:33 +01:00
Kconfig.freezer
Kconfig.hz kernel: Fix "select" wording on HZ_250 description 2025-02-21 09:20:30 +01:00
Kconfig.kexec liveupdate: kho: move to kernel/liveupdate 2025-11-27 14:24:33 -08:00
Kconfig.locks
Kconfig.preempt sched: Further restrict the preemption modes 2026-01-08 12:43:57 +01:00
kcov.c kcov: Use scoped init guard 2026-01-28 20:45:24 +01:00
kexec.c kexec: enable CMA based contiguous allocation 2025-08-02 12:01:38 -07:00
kexec_core.c kernel/kexec: fix IMA when allocation happens in CMA area 2025-12-23 11:23:14 -08:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-03-16 22:30:47 -07:00
kexec_file.c kexec: derive purgatory entry from symbol 2026-01-31 16:16:07 -08:00
kexec_internal.h kexec: enable CMA based contiguous allocation 2025-08-02 12:01:38 -07:00
kheaders.c
kprobes.c kprobes: Use dedicated kthread for kprobe optimizer 2026-01-30 11:49:38 +09:00
kstack_erase.c sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument 2025-11-27 15:44:53 +01:00
ksyms_common.c
ksysfs.c kexec: move sysfs entries to /sys/kernel/kexec 2025-11-27 14:24:42 -08:00
kthread.c The kthread code provides an infrastructure which manages the preferred 2026-02-09 19:57:30 -08:00
latencytop.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
Makefile kcov: Enable context analysis 2026-01-05 16:43:34 +01:00
module_signature.c
notifier.c
nscommon.c ns: rename is_initial_namespace() 2025-11-11 10:01:31 +01:00
nsproxy.c nsproxy: fix free_nsproxy() and simplify create_new_namespaces() 2025-11-14 13:10:38 +01:00
nstree.c nstree: fix kernel-doc comments for internal functions 2025-11-14 13:10:38 +01:00
padata.c padata: Constify padata_sysfs_entry structs 2026-01-23 13:48:44 +08:00
panic.c panic: add panic_force_cpu= parameter to redirect panic to a specific CPU 2026-02-03 08:21:26 -08:00
params.c params: Replace __modinit with __init_or_module 2025-12-22 16:35:53 +00:00
pid.c Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers" 2026-02-10 11:39:30 +01:00
pid_namespace.c pid: rely on common reference count behavior 2025-11-11 10:01:32 +01:00
pid_sysctl.h treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
profile.c
ptrace.c rseq: Introduce struct rseq_data 2025-11-04 08:30:50 +01:00
range.c
reboot.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
regset.c
relay.c kernel: add SPDX-License-Identifier lines 2026-01-16 15:32:16 +01:00
resource.c cxl changes for v7.0 2026-02-12 16:33:05 -08:00
resource_kunit.c
rseq.c rseq: Lower default slice extension 2026-01-22 11:11:20 +01:00
scftorture.c
scs.c scs: fix a wrong parameter in __scs_magic 2025-11-12 10:00:13 -08:00
seccomp.c Performance events updates for v6.18: 2025-09-30 11:11:21 -07:00
signal.c compiler-context-analysis: Remove __cond_lock() function-like helper 2026-01-05 16:43:33 +01:00
smp.c smp: Introduce a helper function to check for pending IPIs 2025-11-19 18:06:50 +01:00
smpboot.c sched/smp: Use the SMP version of idle_thread_set_boot_cpu() 2025-06-13 08:47:20 +02:00
smpboot.h
softirq.c softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT 2025-09-17 16:25:41 +02:00
stacktrace.c
static_call.c
static_call_inline.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
stop_machine.c sched/core: Fix migrate_swap() vs. hotplug 2025-07-01 15:02:03 +02:00
sys.c RISC-V updates for v7.0 2026-02-12 19:17:44 -08:00
sys_ni.c rseq: Implement sys_rseq_slice_yield() 2026-01-22 11:11:17 +01:00
sysctl-test.c sysctl: move u8 register test to lib/test_sysctl.c 2025-04-14 14:13:41 +02:00
sysctl.c sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv 2025-11-27 15:45:38 +01:00
task_work.c task_work: Fix NMI race condition 2025-10-29 10:29:54 +01:00
taskstats.c
torture.c torture: Delay CPU-hotplug operations until boot completes 2025-08-14 15:26:30 -07:00
tracepoint.c tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast 2026-01-30 10:44:11 -05:00
tsacct.c tsacct: skip all kernel threads 2026-01-26 19:07:13 -08:00
ucount.c ucount: check for CAP_SYS_RESOURCE using ns_capable_noaudit() 2026-01-31 16:16:08 -08:00
uid16.c
uid16.h
umh.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
up.c
user-return-notifier.c
user.c ns: drop custom reference count initialization for initial namespaces 2025-11-11 10:01:32 +01:00
user_namespace.c ns: move ns type into struct ns_common 2025-09-25 09:23:54 +02:00
utsname.c namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
utsname_sysctl.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
vhost_task.c vhost: Take a reference on the task in struct vhost_task. 2025-09-21 17:44:20 -04:00
vmcore_info.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
watch_queue.c watch_queue: Use local kmap in post_one_notification() 2025-11-19 12:17:28 +01:00
watchdog.c watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() 2026-02-08 00:13:34 -08:00
watchdog_buddy.c watchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu() 2025-07-31 11:28:03 -04:00
watchdog_perf.c watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency 2026-02-08 00:13:35 -08:00
workqueue.c workqueue: Changes for v6.20 2026-02-11 13:13:32 -08:00
workqueue_internal.h