Mirror of https://github.com/torvalds/linux.git (synced 2026-03-08 02:44:41 +01:00)
tracing_record_cmdline() internally uses __this_cpu_read() and
__this_cpu_write() on the per-CPU variable trace_cmdline_save, and
trace_save_cmdline() explicitly asserts preemption is disabled via
lockdep_assert_preemption_disabled(). These operations are only safe
when preemption is off, as they were designed to be called from the
scheduler context (probe_wakeup_sched_switch() / probe_wakeup()).
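The preemption requirement can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code. A single global counter stands in for preempt_count() on the current CPU, and one bool stands in for this CPU's copy of trace_cmdline_save. The record function is a simplified stand-in for tracing_record_cmdline(), and every `model_*` name is hypothetical. The check models the idea behind the CONFIG_DEBUG_PREEMPT check_preemption_disabled() frame seen in the splat: a raw `__this_cpu_*()` access while preemptible is reported, because the task could migrate between computing the per-CPU address and using it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model, not kernel code. One global counter stands
 * in for preempt_count() on the current CPU, and one bool stands in for this
 * CPU's copy of the per-CPU flag trace_cmdline_save.
 */
static int model_preempt_count;
static int model_bug_reports;
static bool model_trace_cmdline_save = true;

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);	/* an unbalanced enable is a bug */
	model_preempt_count--;
}

/*
 * Models the CONFIG_DEBUG_PREEMPT check: a raw per-CPU access while
 * preemptible is reported, since the task could migrate to another CPU
 * between computing the per-CPU address and using it.
 */
static void model_check_preemption_disabled(const char *op)
{
	if (model_preempt_count == 0) {
		model_bug_reports++;
		fprintf(stderr, "BUG: using %s() in preemptible code\n", op);
	}
}

/* Stand-ins for __this_cpu_read() / __this_cpu_write() on the flag. */
static bool model_raw_this_cpu_read(void)
{
	model_check_preemption_disabled("__this_cpu_read");
	return model_trace_cmdline_save;
}

static void model_raw_this_cpu_write(bool val)
{
	model_check_preemption_disabled("__this_cpu_write");
	model_trace_cmdline_save = val;
}

/*
 * Simplified model of tracing_record_cmdline(): if this CPU's flag asks for
 * a save, cache the comm (elided here) and clear the flag. Returns true if
 * a save happened.
 */
static bool model_tracing_record_cmdline(void)
{
	if (!model_raw_this_cpu_read())
		return false;
	/* ... the comm for the current PID would be cached here ... */
	model_raw_this_cpu_write(false);
	return true;
}
```

In this model, calling model_tracing_record_cmdline() with the counter at zero reports both the read and the write. Wrapping the same call in model_preempt_disable()/model_preempt_enable() performs the save without any report.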
__blk_add_trace() was calling tracing_record_cmdline(current) early in
the blk_tracer path, before ring buffer reservation, from process
context where preemption is fully enabled. Running blktests
blktrace/002 triggers the following:
blktrace/002 (blktrace ftrace corruption with sysfs trace) [failed]
runtime 0.367s ... 0.437s
something found in dmesg:
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
...
(See '/mnt/sda/blktests/results/nodev/blktrace/002.dmesg' for the entire message)
[ 81.211018] run blktests blktrace/002 at 2026-02-25 22:24:33
[ 81.239580] null_blk: disk nullb1 created
[ 81.357294] BUG: using __this_cpu_read() in preemptible [00000000] code: dd/2516
[ 81.362842] caller is tracing_record_cmdline+0x10/0x40
[ 81.362872] CPU: 16 UID: 0 PID: 2516 Comm: dd Tainted: G N 7.0.0-rc1lblk+ #84 PREEMPT(full)
[ 81.362877] Tainted: [N]=TEST
[ 81.362878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 81.362881] Call Trace:
[ 81.362884] <TASK>
[ 81.362886] dump_stack_lvl+0x8d/0xb0
[ 81.362895] check_preemption_disabled+0xce/0xe0
[ 81.362902] tracing_record_cmdline+0x10/0x40
[ 81.362923] __blk_add_trace+0x307/0x5d0
[ 81.362934] ? lock_acquire+0xe0/0x300
[ 81.362940] ? iov_iter_extract_pages+0x101/0xa30
[ 81.362959] blk_add_trace_bio+0x106/0x1e0
[ 81.362968] submit_bio_noacct_nocheck+0x24b/0x3a0
[ 81.362979] ? lockdep_init_map_type+0x58/0x260
[ 81.362988] submit_bio_wait+0x56/0x90
[ 81.363009] __blkdev_direct_IO_simple+0x16c/0x250
[ 81.363026] ? __pfx_submit_bio_wait_endio+0x10/0x10
[ 81.363038] ? rcu_read_lock_any_held+0x73/0xa0
[ 81.363051] blkdev_read_iter+0xc1/0x140
[ 81.363059] vfs_read+0x20b/0x330
[ 81.363083] ksys_read+0x67/0xe0
[ 81.363090] do_syscall_64+0xbf/0xf00
[ 81.363102] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 81.363106] RIP: 0033:0x7f281906029d
[ 81.363111] Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 63 0a 00 e8 59 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 33 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
[ 81.363113] RSP: 002b:00007ffca127dd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 81.363120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281906029d
[ 81.363122] RDX: 0000000000001000 RSI: 0000559f8bfae000 RDI: 0000000000000000
[ 81.363123] RBP: 0000000000001000 R08: 0000002863a10a81 R09: 00007f281915f000
[ 81.363124] R10: 00007f2818f77b60 R11: 0000000000000246 R12: 0000559f8bfae000
[ 81.363126] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000a
[ 81.363142] </TASK>
The same BUG fires from blk_add_trace_plug(), blk_add_trace_unplug(),
and blk_add_trace_rq() paths as well.
The purpose of tracing_record_cmdline() is to cache the task->comm for
a given PID so that the trace can later resolve it. It is only
meaningful when a trace event is actually being recorded. Ring buffer
reservation via ring_buffer_lock_reserve() disables preemption, and
preemption remains disabled until the event is committed:
__blk_add_trace()
  trace_buffer_lock_reserve()
    __trace_buffer_lock_reserve()
      ring_buffer_lock_reserve()
        preempt_disable_notrace(); <---
Calling tracing_record_cmdline() only after a successful reservation
therefore runs it with preemption disabled, and skips it entirely when
no event is recorded.
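The reordering implied by this call chain can be sketched as another userspace model, under the same simplifications. A counter stands in for preempt_count(). model_reserve() behaves like ring_buffer_lock_reserve() in one respect only: it returns with preemption disabled on success and re-enables it before returning on failure. The pid-to-comm cache is a tiny direct-mapped table, much simpler than the kernel's real saved-cmdlines structure. All `model_*` names and the cache size are hypothetical. The two tracing functions differ only in where the comm is recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MODEL_CMDLINE_SLOTS 8	/* hypothetical, tiny cache size */
#define MODEL_COMM_LEN 16	/* same size as the kernel's TASK_COMM_LEN */

struct model_task {
	int pid;
	char comm[MODEL_COMM_LEN];
};

/* Hypothetical model state; none of these names exist in the kernel. */
static int model_preempt_count;
static int model_bug_reports;
static bool model_buffer_full;		/* forces reservation to fail */
static struct model_task model_cache[MODEL_CMDLINE_SLOTS];

static size_t model_slot(int pid)
{
	return (size_t)(unsigned int)pid % MODEL_CMDLINE_SLOTS;
}

/* On success, returns with preemption disabled; on failure, re-enables it. */
static bool model_reserve(void)
{
	model_preempt_count++;
	if (model_buffer_full) {
		model_preempt_count--;
		return false;
	}
	return true;
}

/* Committing the event re-enables preemption. */
static void model_commit(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Simplified pid -> comm cache; counts any use while preemptible, the
 * condition lockdep_assert_preemption_disabled() would flag. */
static void model_record_cmdline(const struct model_task *tsk)
{
	if (model_preempt_count == 0)
		model_bug_reports++;
	struct model_task *slot = &model_cache[model_slot(tsk->pid)];
	slot->pid = tsk->pid;
	snprintf(slot->comm, sizeof(slot->comm), "%s", tsk->comm);
}

static const char *model_lookup_comm(int pid)
{
	const struct model_task *slot = &model_cache[model_slot(pid)];
	return slot->pid == pid ? slot->comm : "<...>";
}

/* Old ordering: comm recorded before reservation, while preemptible. */
static bool model_add_trace_old(const struct model_task *tsk)
{
	model_record_cmdline(tsk);
	if (!model_reserve())
		return false;
	/* ... fill in the event ... */
	model_commit();
	return true;
}

/* Fixed ordering: comm recorded only after a successful reservation. */
static bool model_add_trace_fixed(const struct model_task *tsk)
{
	if (!model_reserve())
		return false;		/* no event, so no comm is cached */
	model_record_cmdline(tsk);	/* preemption is disabled here */
	/* ... fill in the event ... */
	model_commit();
	return true;
}
```

In this model, the fixed ordering caches nothing when the reservation fails. A lookup for that PID then falls back to `<...>`, the placeholder that trace output uses when a PID's comm is unknown.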
With this fix, the blktrace blktests pass:
blktests (master) # ./check blktrace
blktrace/001 (blktrace zone management command tracing) [passed]
runtime 3.650s ... 3.647s
blktrace/002 (blktrace ftrace corruption with sysfs trace) [passed]
runtime 0.411s ... 0.384s
Fixes: