Linux kernel source tree
Find a file
Kumar Kartikeya Dwivedi 81d5a6a438 rqspinlock: Use trylock fallback when per-CPU rqnode is busy
In addition to deferring to the trylock fallback in NMIs, only do so
when an rqspinlock waiter is queued on the current CPU. This is detected
by noticing a non-zero node index. This allows NMI waiters to join the
waiter queue if it isn't interrupting an existing rqspinlock waiter, and
increase the chances of fairly obtaining the lock, performing deadlock
detection as the head, and not being starved while attempting the
trylock.

The trylock path in particular is unlikely to succeed under contention,
as it relies on the lock word becoming 0, which indicates no contention.
This means that the most likely result for NMIs attempting a trylock is
a timeout under contention if they don't hit an AA or ABBA case.

The core problem being addressed through the fixed commit was removing
the dependency edge between an NMI queue waiter and the queue waiter it
is interrupting. Whenever a circular dependency forms, and with no way
to break it (as non-head waiters don't poll for deadlocks or timeouts),
we would enter into a deadlock. A trylock either breaks such an edge by
probing for deadlocks, and finally terminating the waiting loop using a
timeout.

By excluding queueing on CPUs where the node index is non-zero for NMIs,
this sort of dependency is broken. The CPU enters the trylock path for
those cases, and falls back to deadlock checks and timeouts. However, in
other case where it doesn't interrupt the CPU in the slow path while its
queued on the lock, it can join the queue as a normal waiter, and avoid
trylock associated starvation and subsequent timeouts.

There are a few remaining cases here that matter: the NMI can still
preempt the owner in its critical section, and if it queues as a
non-head waiter, it can end up impeding the progress of the owner. While
this won't deadlock, since the head waiter will eventually signal the
NMI waiter to either stop (due to a timeout), it can still lead to long
timeouts. These gaps will be addressed in subsequent commits.

Note that while the node count detection approach is less conservative
than simply deferring NMIs to trylock, it is going to return errors
where attempts to lock B in NMI happen while waiters for lock A are in a
lower context on the same CPU. However, this only occurs when the lower
context is queued in the slow path, and the NMI attempt can proceed
without failure in all other cases. To continue to prevent AA deadlocks
(or ABBA in a similar NMI interrupting lower context pattern), we'd need
a more fleshed out algorithm to unlink NMI waiters after they queue and
detect such cases. However, all that complexity isn't appealing yet to
reduce the failure rate in the small window inside the slow path.

It is important to note that reentrancy in the slow path can also happen
through trace_contention_{begin,end}, but in those cases, unlike an NMI,
the forward progress of the head waiter (or the predecessor in general)
is not being blocked.

Fixes: 0d80e7f951 ("rqspinlock: Choose trylock fallback for NMI waiters")
Reported-by: Ritesh Oedayrajsingh Varma <ritesh@superluminal.eu>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:35:35 -08:00
arch bpf: specify the old and new poke_type for bpf_arch_text_poke 2025-11-24 09:47:03 -08:00
block block-6.18-20251031 2025-10-31 12:57:19 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push contains the following changes: 2025-10-10 08:56:16 -07:00
Documentation docs: bpf: map_array: Specify BPF_MAP_TYPE_PERCPU_ARRAY value size limit 2025-11-25 14:32:00 -08:00
drivers pci-v6.18-fixes-5 2025-11-14 15:45:31 -08:00
fs More NFS Client Bugfixes for Linux 6.18-rc 2025-11-14 13:44:23 -08:00
include rqspinlock: Enclose lock/unlock within lock entry acquisitions 2025-11-29 09:35:35 -08:00
init printk changes for 6.18 2025-10-04 11:13:11 -07:00
io_uring io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs 2025-11-12 08:25:33 -07:00
ipc namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
kernel rqspinlock: Use trylock fallback when per-CPU rqnode is busy 2025-11-29 09:35:35 -08:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+ 2025-11-14 17:43:41 -08:00
LICENSES LICENSES: Replace the obsolete address of the FSF in the GFDL-1.2 2025-07-24 11:15:39 +02:00
mm slab fix for 6.18-rc6 2025-11-13 11:42:44 -08:00
net bpf: Remove smap argument from bpf_selem_free() 2025-11-18 16:20:25 -08:00
rust rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags 2025-11-10 08:37:06 +08:00
samples samples/bpf: Fix spelling typos in samples/bpf 2025-10-18 19:26:23 -07:00
scripts Rust fixes for v6.18 (2nd) 2025-11-14 15:36:15 -08:00
security integrity-v6.18 2025-10-05 10:48:33 -07:00
sound ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series 2025-11-14 14:19:47 +01:00
tools bpf: Remove runqslower tool 2025-11-28 15:20:16 -08:00
usr gen_init_cpio: Ignore fsync() returning EINVAL on pipes 2025-10-07 09:53:05 -07:00
virt KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying 2025-11-04 09:16:53 -08:00
.clang-format memblock: drop for_each_free_mem_pfn_range_in_zone_from() 2025-09-14 08:49:03 +03:00
.clippy.toml rust: clean Rust 1.88.0's warning about clippy::disallowed_macros configuration 2025-05-07 00:11:47 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: remove Alyssa Rosenzweig 2025-09-18 21:17:31 +02:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: ignore compile_commands.json globally 2025-08-12 15:53:55 -07:00
.mailmap KVM/arm654 fixes for 6.18, take #2 2025-11-09 08:07:55 +01:00
.pylintrc tools: docs: parse-headers.py: move it from sphinx dir 2025-08-29 15:54:42 -06:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: mark ISDN subsystem as orphan 2025-10-27 17:49:45 -07:00
Kbuild sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+ 2025-11-14 17:43:41 -08:00
Makefile Linux 6.18-rc5 2025-11-09 15:10:19 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.