linux/kernel/trace
Chaitanya Kulkarni da46b5dfef blktrace: fix __this_cpu_read/write in preemptible context
tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_cmdline_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are only safe
when preemption is off, as they were designed to be called from the
scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).
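
The rule above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every model_* name is hypothetical, the preempt counter is a plain int, and the "read the flag, then clear it" body is an assumed simplification of what tracing_record_cmdline() does with trace_cmdline_save.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; none of these are kernel APIs. */
static int model_preempt_count;	/* > 0 means "preemption disabled" */
static int model_bug_reports;	/* how many times the check complained */
static int model_cmdline_save;	/* stands in for per-CPU trace_cmdline_save */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/* Same idea as check_preemption_disabled(): complain when preemptible. */
static bool model_check_preemption_disabled(const char *what)
{
	if (model_preempt_count > 0)
		return true;
	model_bug_reports++;
	fprintf(stderr, "BUG (model): using %s in preemptible code\n", what);
	return false;
}

static int model_this_cpu_read(void)
{
	model_check_preemption_disabled("__this_cpu_read()");
	return model_cmdline_save;
}

static void model_this_cpu_write(int val)
{
	model_check_preemption_disabled("__this_cpu_write()");
	model_cmdline_save = val;
}

/* Assumed shape of the access pattern: read the flag, then clear it. */
static void model_record_cmdline(void)
{
	if (!model_this_cpu_read())
		return;
	model_this_cpu_write(0);
}
```

In this model, calling model_record_cmdline() with the counter at zero reports once for the read and once for the write; wrapping the call in model_preempt_disable() / model_preempt_enable() reports nothing.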

__blk_add_trace() was calling tracing_record_cmdline(current) early in
the blk_tracer path, before ring buffer reservation, from process
context where preemption is fully enabled. This triggers the following
BUG when running blktests blktrace/002:

blktrace/002 (blktrace ftrace corruption with sysfs trace)   [failed]
    runtime  0.367s  ...  0.437s
    something found in dmesg:
    [   81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
    [   81.239580] null_blk: disk nullb1 created
    [   81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
    [   81.362842] caller is tracing_record_cmdline+0x10/0x40
    [   81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
    [   81.362877] Tainted: [N]=TEST
    [   81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    [   81.362881] Call Trace:
    [   81.362884]  <TASK>
    [   81.362886]  dump_stack_lvl+0x8d/0xb0
    ...
    (See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)

[   81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[   81.239580] null_blk: disk nullb1 created
[   81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[   81.362842] caller is tracing_record_cmdline+0x10/0x40
[   81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G                 N  7.0.0-rc1lblk+ #84 PREEMPT(full)
[   81.362877] Tainted: [N]=TEST
[   81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[   81.362881] Call Trace:
[   81.362884]  <TASK>
[   81.362886]  dump_stack_lvl+0x8d/0xb0
[   81.362895]  check_preemption_disabled+0xce/0xe0
[   81.362902]  tracing_record_cmdline+0x10/0x40
[   81.362923]  __blk_add_trace+0x307/0x5d0
[   81.362934]  ? lock_acquire+0xe0/0x300
[   81.362940]  ? iov_iter_extract_pages+0x101/0xa30
[   81.362959]  blk_add_trace_bio+0x106/0x1e0
[   81.362968]  submit_bio_noacct_nocheck+0x24b/0x3a0
[   81.362979]  ? lockdep_init_map_type+0x58/0x260
[   81.362988]  submit_bio_wait+0x56/0x90
[   81.363009]  __blkdev_direct_IO_simple+0x16c/0x250
[   81.363026]  ? __pfx_submit_bio_wait_endio+0x10/0x10
[   81.363038]  ? rcu_read_lock_any_held+0x73/0xa0
[   81.363051]  blkdev_read_iter+0xc1/0x140
[   81.363059]  vfs_read+0x20b/0x330
[   81.363083]  ksys_read+0x67/0xe0
[   81.363090]  do_syscall_64+0xbf/0xf00
[   81.363102]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   81.363106] RIP: 0033:0x7f281906029d
[   81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
[   81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d
[   81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000
[   81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000
[   81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000
[   81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a
[   81.363142]  </TASK>

The same BUG fires from the blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.

The purpose of tracing_record_cmdline() is to cache the task->comm for
a given PID so that the trace can later resolve it. It is only
meaningful when a trace event is actually being recorded. Ring buffer
reservation via ring_buffer_lock_reserve() disables preemption, and
preemption remains disabled until the event is committed:

__blk_add_trace()
        trace_buffer_lock_reserve()
                __trace_buffer_lock_reserve()
                        ring_buffer_lock_reserve()
                                preempt_disable_notrace();  <---
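
The difference between recording the comm before and after reservation can be sketched in userspace. The patch itself is not shown here, so this is an assumption-laden model rather than the actual change: all model_* names are hypothetical, the counter is a plain int, and model_add_trace_late() is only one ordering consistent with the reasoning above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; none of these are kernel APIs. */
static int model_preempt_count;	/* > 0 means "preemption disabled" */
static int model_unsafe_calls;	/* comm recorded while preemptible */
static int model_recorded;	/* total comm recordings */
static bool model_buffer_full;	/* forces reservation to fail */

struct model_event { int payload; };
static struct model_event model_slot;

/* A successful reservation returns with preemption still disabled. */
static struct model_event *model_reserve(void)
{
	model_preempt_count++;
	if (model_buffer_full) {
		model_preempt_count--;	/* failure path re-enables */
		return NULL;
	}
	return &model_slot;
}

/* Committing the event ends the preemption-disabled window. */
static void model_commit(struct model_event *ev)
{
	assert(ev == &model_slot && model_preempt_count > 0);
	model_preempt_count--;
}

static void model_record_cmdline(void)
{
	if (model_preempt_count == 0)
		model_unsafe_calls++;
	model_recorded++;
}

/* Problematic ordering: record first, reserve afterwards. */
static void model_add_trace_early(int payload)
{
	struct model_event *ev;

	model_record_cmdline();
	ev = model_reserve();
	if (!ev)
		return;
	ev->payload = payload;
	model_commit(ev);
}

/* Assumed safe ordering: record only between reserve and commit. */
static void model_add_trace_late(int payload)
{
	struct model_event *ev = model_reserve();

	if (!ev)
		return;
	model_record_cmdline();
	ev->payload = payload;
	model_commit(ev);
}
```

The late variant also skips the comm recording entirely when no event is reserved, which matches the point that caching the comm is only meaningful when an event is actually recorded.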

With this fix, the blktrace tests in blktests pass:

  blktests (master) # ./check blktrace
  blktrace/001 (blktrace zone management command tracing)      [passed]
      runtime  3.650s  ...  3.647s
  blktrace/002 (blktrace ftrace corruption with sysfs trace)   [passed]
      runtime  0.411s  ...  0.384s

Fixes: 7ffbd48d5c ("tracing: Cache comms only after an event occurred")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-02 09:14:58 -07:00
rv verification/rvgen: Remove unused variable declaration from containers 2026-01-12 07:43:51 +01:00
blktrace.c blktrace: fix __this_cpu_read/write in preemptible context 2026-03-02 09:14:58 -07:00
bpf_trace.c tracing updates for 7.0: 2026-02-13 19:25:16 -08:00
bpf_trace.h tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
error_report-traces.c
fgraph.c function_graph: Restore direct mode when callbacks drop to one 2026-02-13 09:33:14 -05:00
fprobe.c tracing fixes for v6.19: 2025-12-06 13:49:40 -08:00
ftrace.c tracing updates for 7.0: 2026-02-13 19:25:16 -08:00
ftrace_internal.h function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE 2024-06-10 18:08:23 -04:00
Kconfig tracing updates for 7.0: 2026-02-13 19:25:16 -08:00
kprobe_event_gen_test.c
Makefile tracing: Move pid filtering into trace_pid.c 2026-02-08 21:01:13 -05:00
pid_list.c trace/pid_list: optimize pid_list->lock contention 2025-11-13 15:15:54 -05:00
pid_list.h trace/pid_list: optimize pid_list->lock contention 2025-11-13 15:15:54 -05:00
power-traces.c PM: cpufreq: powernv/tracing: Move powernv_throttle trace event 2025-07-21 16:40:56 -04:00
preemptirq_delay_test.c kernel: trace: preemptirq_delay_test: use offstack cpu mask 2025-07-08 18:17:38 -04:00
rethook.c rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get() 2024-05-01 23:18:48 +09:00
ring_buffer.c ring-buffer: Use a housekeeping CPU to wake up waiters 2026-01-26 17:44:53 -05:00
ring_buffer_benchmark.c tracing: Fix typo in ring_buffer_benchmark.c 2025-12-05 15:43:40 -05:00
rpm-traces.c
synth_event_gen_test.c
trace.c mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
trace.h tracing updates for 7.0: 2026-02-13 19:25:16 -08:00
trace_benchmark.c tracing: Improve benchmark test performance by using do_div() 2024-05-13 20:00:57 -04:00
trace_benchmark.h
trace_boot.c
trace_branch.c tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field 2025-05-09 15:19:10 -04:00
trace_btf.c tracing/probes: Fix to search structure fields correctly 2024-02-17 21:25:42 +09:00
trace_btf.h
trace_clock.c tracing: Use atomic64_inc_return() in trace_clock_counter() 2024-10-09 19:59:49 -04:00
trace_dynevent.c tracing: Report wrong dynamic event command 2025-11-10 19:26:14 -05:00
trace_dynevent.h tracing: probes: Fix a possible race in trace_probe_log APIs 2025-05-13 22:23:34 +09:00
trace_entries.h tracing: Fix ftrace event field alignments 2026-02-05 09:47:11 -05:00
trace_eprobe.c Probes for v6.19 2025-12-05 10:55:47 -08:00
trace_event_perf.c perf: Remove unnecessary parameter of security check 2025-02-26 14:13:58 -05:00
trace_events.c tracing: Make tracing_disabled global for tracing system 2026-02-08 21:01:11 -05:00
trace_events_filter.c tracing: Replace use of system_wq with system_dfl_wq 2026-01-26 17:44:05 -05:00
trace_events_filter_test.h
trace_events_hist.c tracing: Move tracing_set_filter_buffering() into trace_events_hist.c 2026-02-08 21:01:11 -05:00
trace_events_inject.c tracing: Have format file honor EVENT_FILE_FL_FREED 2024-08-07 18:12:46 -04:00
trace_events_synth.c tracing: Remove notrace from trace_event_raw_event_synth() 2026-01-28 21:01:09 -05:00
trace_events_trigger.c tracing: Have all triggers expect a file parameter 2026-02-08 21:00:57 -05:00
trace_events_user.c tracing: Fix multiple typos in trace_events_user.c 2025-12-05 15:43:41 -05:00
trace_export.c tracing: Fix ftrace event field alignments 2026-02-05 09:47:11 -05:00
trace_fprobe.c tracing updates for v6.19: 2025-12-05 09:51:37 -08:00
trace_functions.c tracing: Have function tracer define options per instance 2025-11-12 09:59:54 -05:00
trace_functions_graph.c function_graph: Fix args pointer mismatch in print_graph_retval() 2026-01-23 13:34:38 -05:00
trace_hwlat.c tracing: Fix false sharing in hwlat get_sample() 2026-02-10 03:36:39 -05:00
trace_irqsoff.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_kdb.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_kprobe.c Probes for v7.0 2026-02-16 07:04:01 -08:00
trace_kprobe_selftest.c
trace_kprobe_selftest.h
trace_mmiotrace.c tracing/mmiotrace: Remove reference to unused per CPU data pointer 2025-05-08 09:36:09 -04:00
trace_nop.c
trace_osnoise.c tracing: Fix multiple typos in trace_osnoise.c 2025-12-05 15:43:41 -05:00
trace_output.c tracing: Add bitmask-list option for human-readable bitmask display 2026-01-26 17:00:50 -05:00
trace_output.h tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_pid.c tracing: Move pid filtering into trace_pid.c 2026-02-08 21:01:13 -05:00
trace_preemptirq.c tracing: Fix archs that still call tracepoints without RCU watching 2024-12-05 09:28:58 -05:00
trace_printk.c tracing updates for 7.0: 2026-02-13 19:25:16 -08:00
trace_probe.c tracing fixes for v6.19: 2025-12-06 13:49:40 -08:00
trace_probe.h tracing: probes: Use __free() for trace_probe_log 2025-11-01 01:10:28 +09:00
trace_probe_kernel.h
trace_probe_tmpl.h tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS 2024-12-26 10:50:04 -05:00
trace_recursion_record.c
trace_sched_switch.c tracing: Ensure optimized hashing works 2025-09-30 17:27:58 -04:00
trace_sched_wakeup.c tracing: Allow tracer to add more than 32 options 2025-11-04 21:44:00 +09:00
trace_selftest.c tracing: Rename trace_array field max_buffer to snapshot_buffer 2026-02-08 21:01:13 -05:00
trace_selftest_dynamic.c
trace_seq.c tracing: Add bitmask-list option for human-readable bitmask display 2026-01-26 17:00:50 -05:00
trace_stack.c tracing updates for v6.16: 2025-05-29 21:04:36 -07:00
trace_stat.c tracing: Switch trace_stat.c code over to use guard() 2024-12-26 10:38:37 -05:00
trace_stat.h
trace_synth.h
trace_syscalls.c tracing: Hide __NR_utimensat and _NR_mq_timedsend when not defined 2025-11-10 14:23:53 -05:00
trace_uprobe.c tracing: uprobe: eprobes: Allocate traceprobe_parse_context per probe 2025-11-01 01:10:29 +09:00
tracing_map.c tracing: Use vmalloc_array() to improve code 2025-09-23 09:31:58 -04:00
tracing_map.h