linux/kernel/printk
Petr Mladek 9bd18e1262 printk/nbcon: Restore IRQ in atomic flush after each emitted record
The commit d5d399efff ("printk/nbcon: Release nbcon consoles ownership
in atomic flush after each emitted record") prevented stall of a CPU
which lost nbcon console ownership because another CPU entered
an emergency flush.

But there is still the problem that the CPU doing the emergency flush
might cause a stall on its own.

Let's go even further and restore IRQ in the atomic flush after
each emitted record.

It is not a complete solution. The interrupts and/or scheduling might
still be blocked when the emergency atomic flush was called with
IRQs and/or scheduling disabled. But it should remove the following
lockup:

  mlx5_core 0000:03:00.0: Shutdown was called
  kvm: exiting hardware virtualization
  arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
  smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
  smp:     csd: CSD lock (#1) unresponsive.
  [...]
  Call trace:
  pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
  nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
  __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
  __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
  nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)
  printk_kthreads_shutdown (kernel/printk/printk.c:?)
  syscore_shutdown (drivers/base/syscore.c:120)
  kernel_kexec (kernel/kexec_core.c:1045)
  __arm64_sys_reboot (kernel/reboot.c:794 kernel/reboot.c:722 kernel/reboot.c:722)
  invoke_syscall (arch/arm64/kernel/syscall.c:50)
  el0_svc_common.llvm.14158405452757855239 (arch/arm64/kernel/syscall.c:?)
  do_el0_svc (arch/arm64/kernel/syscall.c:152)
  el0_svc (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:73 arch/arm64/kernel/entry-common.c:169 arch/arm64/kernel/entry-common.c:182 arch/arm64/kernel/entry-common.c:749)
  el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:820)
  el0t_64_sync (arch/arm64/kernel/entry.S:600)

In this case, nbcon_atomic_flush_pending() is called from
printk_kthreads_shutdown() with IRQs and scheduling enabled.

Note that __nbcon_atomic_flush_pending_con() is directly called also from
nbcon_device_release() where the disabled IRQs might break PREEMPT_RT
guarantees. But the atomic flush is called only in emergency or panic
situations where the latencies are irrelevant anyway.

An ultimate solution would be a touching of watchdogs. But it would hide
all problems. Let's do it later when anyone reports a stall which does
not have a better solution.

Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
Tested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251212124520.244483-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-12-15 16:18:41 +01:00
..
.kunitconfig printk: ringbuffer: Add KUnit test 2025-06-18 16:42:42 +02:00
braille.c printk: Replace strncmp() with str_has_prefix() 2019-08-16 09:54:08 +02:00
braille.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
console_cmdline.h printk: Add match_devname_and_update_preferred_console() 2024-07-04 15:41:44 +02:00
index.c kernel/printk/index.c: fix memory leak with using debugfs_lookup() 2023-02-03 10:42:02 +01:00
internal.h Merge branch 'rework/suspend-fixes' into for-linus 2025-12-01 14:16:28 +01:00
Makefile printk: ringbuffer: Add KUnit test 2025-06-18 16:42:42 +02:00
nbcon.c printk/nbcon: Restore IRQ in atomic flush after each emitted record 2025-12-15 16:18:41 +01:00
printk.c Merge branch 'rework/threaded-printk' into for-linus 2025-12-01 14:16:45 +01:00
printk_ringbuffer.c printk_ringbuffer: Create a helper function to decide whether more space is needed 2025-11-10 13:09:43 +01:00
printk_ringbuffer.h printk: nbcon: Init @nbcon_seq to highest possible 2024-09-04 15:56:32 +02:00
printk_ringbuffer_kunit_test.c printk: kunit: support offstack cpumask 2025-09-10 17:18:37 +02:00
printk_safe.c printk: Defer legacy printing when holding printk_cpu_sync 2024-12-16 13:26:31 +01:00
sysctl.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00