linux/kernel
Lance Yang 194a9b9e84 hung_task: show the blocker task if the task is hung on semaphore
Inspired by mutex blocker tracking[1], this patch makes a trade-off to
balance the overhead and utility of the hung task detector.

Unlike mutexes, semaphores lack explicit ownership tracking, making it
challenging to identify the root cause of hangs.  To address this, we
introduce a last_holder field to the semaphore structure, which is updated
when a task successfully calls down() and cleared during up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it.  While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show blocker task's info
like below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  <TASK>
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  </TASK>

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Link: https://lkml.kernel.org/r/20250414145945.84916-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11 17:54:08 -07:00
..
bpf bpf: Add namespace to BPF internal symbols 2025-04-25 09:21:23 -07:00
cgroup cgroup/cpuset-v1: Add missing support for cpuset_v2_mode 2025-04-17 07:32:53 -10:00
configs - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
debug TTY/Serial driver updates for 6.15-rc1 2025-04-02 18:17:33 -07:00
dma dma-coherent: Warn if OF reserved memory is beyond current coherent DMA mask 2025-04-22 17:44:09 +02:00
entry Objtool changes for v6.15: 2025-03-24 21:18:05 -07:00
events perf/core: Change to POLLERR for pinned events with error 2025-04-23 09:39:06 +02:00
futex futex: Use a hashmask instead of hashsize 2025-02-26 16:07:59 +01:00
gcov gcov: clang: use correct function param names 2025-01-24 22:47:27 -08:00
irq genirq/msi: Prevent NULL pointer dereference in msi_domain_debug_show() 2025-04-30 23:25:10 +02:00
kcsan treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
livepatch Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
locking hung_task: show the blocker task if the task is hung on semaphore 2025-05-11 17:54:08 -07:00
module ring-buffer updates for v6.15 2025-03-31 13:37:22 -07:00
power This update includes the following changes: 2025-03-29 10:01:55 -07:00
printk printk changes for 6.15 2025-03-27 19:22:24 -07:00
rcu treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
sched Fix sporadic crashes in dequeue_entities() due to ... bad math. 2025-04-26 09:23:20 -07:00
time timekeeping: Prevent coarse clocks going backwards 2025-04-28 11:17:29 +02:00
trace tracing: Do not take trace_event_sem in print_event_fields() 2025-05-01 22:44:52 -04:00
.gitignore
acct.c acct: block access to kernel internal filesystems 2025-02-12 12:24:16 +01:00
async.c
audit.c audit: Initialize lsmctx to avoid memory allocation error 2025-01-29 20:02:04 -05:00
audit.h audit: change context data from secid to lsm_prop 2024-10-11 14:34:16 -04:00
audit_fsnotify.c
audit_tree.c
audit_watch.c fs: add kern_path_locked_negative() 2025-04-15 11:32:34 +02:00
auditfilter.c audit: fix suffixed '/' filename matching 2024-12-05 19:22:38 -05:00
auditsc.c fs: dedup handling of struct filename init and refcounts bumps 2025-03-18 15:34:27 +01:00
backtracetest.c backtracetest: add MODULE_DESCRIPTION() 2024-06-24 22:24:55 -07:00
bounds.c
capability.c capability: Remove unused has_capability 2025-03-07 22:03:09 -06:00
cfi.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
compat.c
configs.c
context_tracking.c context_tracking: Make RCU watch ct_kernel_exit_state() warning 2025-03-04 18:44:29 -08:00
cpu.c hyperv-next for 6.15 2025-03-25 14:47:04 -07:00
cpu_pm.c
crash_core.c crash: Use note name macros 2025-02-10 16:56:58 -08:00
crash_reserve.c crash: remove an unused argument from reserve_crashkernel_generic() 2025-03-16 22:30:47 -07:00
cred.c cred: remove old {override,revert}_creds() helpers 2024-12-02 11:25:09 +01:00
delayacct.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
dma.c
elfcorehdr.c
exec_domain.c
exit.c exit: combine work under lock in synchronize_group_exit() and coredump_task_exit() 2025-05-11 17:54:07 -07:00
exit.h
extable.c
fail_function.c
fork.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
freezer.c sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
gen_kheaders.sh Revert "kheaders: Ignore silly-rename files" 2025-03-15 21:22:52 +09:00
groups.c
hung_task.c hung_task: show the blocker task if the task is hung on semaphore 2025-05-11 17:54:08 -07:00
iomem.c mm/memremap: Pass down MEMREMAP_* flags to arch_memremap_wb() 2025-02-21 15:05:38 +01:00
irq_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
jump_label.c jump_label: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
kallsyms.c kallsyms: Remove KALLSYMS_ABSOLUTE_PERCPU 2025-02-18 10:16:04 +01:00
kallsyms_internal.h kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
kallsyms_selftest.c kallsyms: Use kthread_run_on_cpu() 2025-01-02 22:12:12 +01:00
kallsyms_selftest.h
kcmp.c kcmp: improve performance adding an unlikely hint to task comparisons 2025-02-21 10:25:33 +01:00
Kconfig.freezer
Kconfig.hz kernel: Fix "select" wording on HZ_250 description 2025-02-21 09:20:30 +01:00
Kconfig.kexec crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 2024-11-14 22:43:48 -08:00
Kconfig.locks
Kconfig.preempt sched: No PREEMPT_RT=y for all{yes,mod}config 2024-11-07 15:25:05 +01:00
kcov.c kcov: mark in_softirq_really() as __always_inline 2024-12-30 17:59:08 -08:00
kexec.c
kexec_core.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-03-16 22:30:47 -07:00
kexec_file.c crash: let arch decide usable memory range in reserved area 2025-03-16 22:30:47 -07:00
kexec_internal.h kexec: use atomic_try_cmpxchg_acquire() in kexec_trylock() 2024-09-01 20:43:23 -07:00
kheaders.c kheaders: Simplify attribute through __BIN_ATTR_SIMPLE_RO() 2024-12-24 09:46:49 +01:00
kprobes.c kprobes: Use RCU in all users of __module_text_address(). 2025-03-10 11:54:46 +01:00
ksyms_common.c
ksysfs.c kernel/ksysfs.c: simplify bin_attribute definition 2025-01-07 16:59:15 +01:00
kthread.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
latencytop.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
module_signature.c
notifier.c reboot: move reboot_notifier_list to kernel/reboot.c 2024-11-05 17:12:31 -08:00
nsproxy.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
padata.c padata: switch padata_find_next() to using cpumask_next_wrap() 2025-02-24 16:37:23 -05:00
panic.c These are objtool fixes and updates by Josh Poimboeuf, centered 2025-04-02 10:30:10 -07:00
params.c module: ensure that kobject_put() is safe for module type kobjects 2025-05-07 20:24:59 +02:00
pid.c kernel-6.15-rc1.tasklist_lock 2025-03-24 13:39:27 -07:00
pid_namespace.c pid: Do not set pid_max in new pid namespaces 2025-03-06 10:18:36 +01:00
pid_sysctl.h treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
profile.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
ptrace.c
range.c
reboot.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
regset.c
relay.c relay: use kasprintf() instead of fixed buffer formatting 2025-03-21 22:10:05 -07:00
resource.c resource: replace open coded variant of DEFINE_RES() 2025-03-21 22:10:05 -07:00
resource_kunit.c resource, kunit: fix user-after-free in resource_test_region_intersects() 2024-10-09 12:47:19 -07:00
rseq.c rseq: Fix segfault on registration when rseq_cs is non-zero 2025-03-06 22:26:49 +01:00
scftorture.c scftorture: Handle NULL argument passed to scf_add_to_free_list(). 2024-11-14 16:09:51 -08:00
scs.c
seccomp.c seccomp: avoid the lock trip seccomp_filter_release in common case 2025-02-24 11:17:10 -08:00
signal.c vfs-6.15-rc1.fixes 2025-04-02 16:05:21 -07:00
smp.c CSD-lock pull request for v6.14 2025-01-28 11:34:03 -08:00
smpboot.c
smpboot.h
softirq.c lockdep: Fix wait context check on softirq for PREEMPT_RT 2025-03-25 10:46:44 +01:00
stackleak.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
stacktrace.c
static_call.c
static_call_inline.c Modules changes for 6.15-rc1 2025-03-30 15:44:36 -07:00
stop_machine.c stop-machine: Add comment for rcu_momentary_eqs() 2025-03-11 10:15:52 -07:00
sys.c Updates for the core time/timer subsystem: 2025-03-25 10:33:23 -07:00
sys_ni.c Probes updates for v6.11: 2024-07-18 12:19:20 -07:00
sysctl-test.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
sysctl.c s390 updates for 6.15 merge window 2025-03-29 11:59:43 -07:00
task_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
taskstats.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
torture.c torture: Add get_torture_init_jiffies() for test-start time 2025-02-05 07:14:24 -08:00
tracepoint.c tracepoint: Print the function symbol when tracepoint_debug is set 2025-03-21 15:30:10 -04:00
tsacct.c tsacct: replace strncpy() with strscpy() 2024-07-12 16:39:53 -07:00
ucount.c ucount: use rcuref_t for reference counting 2025-03-16 22:30:50 -07:00
uid16.c
uid16.h
umh.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
up.c
user-return-notifier.c
user.c uidgid: make sure we fit into one cacheline 2024-09-12 12:16:09 +02:00
user_namespace.c uidgid: add map_id_range_up() 2025-02-12 12:12:27 +01:00
usermode_driver.c
utsname.c
utsname_sysctl.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
vhost_task.c vhost_task: fix vhost_task_create() documentation 2025-04-18 10:08:11 -04:00
vmcore_info.c crash: export PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo 2025-05-11 17:54:04 -07:00
watch_queue.c vfs-6.15-rc1.pipe 2025-03-24 09:52:37 -07:00
watchdog.c A treewide hrtimer timer cleanup 2025-03-25 10:54:15 -07:00
watchdog_buddy.c
watchdog_perf.c - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
workqueue.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
workqueue_internal.h