Linux kernel source tree
Find a file
Jakub Kicinski 2da4def0f4 netpoll: prevent hanging NAPI when netcons gets enabled
Paolo spotted hangs in NIPA running driver tests against virtio.
The tests hang in virtnet_close() -> virtnet_napi_tx_disable().

The problem is only reproducible if running multiple of our tests
in sequence (I used TEST_PROGS="xdp.py ping.py netcons_basic.sh \
netpoll_basic.py stats.py"). Initial suspicion was that this is
a simple case of double-disable of NAPI, but instrumenting the
code reveals:

 Deadlocked on NAPI ffff888007cd82c0 (virtnet_poll_tx):
   state: 0x37, disabled: false, owner: 0, listed: false, weight: 64

The NAPI was not in fact disabled, owner is 0 (rather than -1),
so the NAPI "thinks" it's scheduled for CPU 0 but it's not listed
(!list_empty(&n->poll_list) => false). It seems odd that normal NAPI
processing would wedge itself like this.

Better suspicion is that netpoll gets enabled while NAPI is polling,
and also grabs the NAPI instance. This confuses napi_complete_done():

  [netpoll]                                   [normal NAPI]
                                        napi_poll()
                                          have = netpoll_poll_lock()
                                            rcu_access_pointer(dev->npinfo)
                                              return NULL # no netpoll
                                          __napi_poll()
					    ->poll(->weight)
  poll_napi()
    cmpxchg(->poll_owner, -1, cpu)
      poll_one_napi()
        set_bit(NAPI_STATE_NPSVC, ->state)
                                              napi_complete_done()
                                                if (NAPIF_STATE_NPSVC)
                                                  return false
                                           # exit without clearing SCHED

This feels very unlikely, but perhaps virtio has some interactions
with the hypervisor in the NAPI ->poll that makes the race window
larger?

Best I could to to prove the theory was to add and trigger this
warning in napi_poll (just before netpoll_poll_unlock()):

      WARN_ONCE(!have && rcu_access_pointer(n->dev->npinfo) &&
                napi_is_scheduled(n) && list_empty(&n->poll_list),
                "NAPI race with netpoll %px", n);

If this warning hits the next virtio_close() will hang.

This patch survived 30 test iterations without a hang (without it
the longest clean run was around 10). Credit for triggering this
goes to Breno's recent netconsole tests.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/c5a93ed1-9abe-4880-a3bb-8d1678018b1d@redhat.com
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250726010846.1105875-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-30 18:05:52 -07:00
arch bpf-next-6.17 2025-07-30 09:58:50 -07:00
block Driver core changes for 6.17-rc1 2025-07-29 12:15:39 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto Crypto library updates for 6.17 2025-07-28 17:58:52 -07:00
Documentation bpf-next-6.17 2025-07-30 09:58:50 -07:00
drivers net: ti: icss-iep: fix device and OF node leaks at probe 2025-07-30 18:02:48 -07:00
fs bpf-next-6.17 2025-07-30 09:58:50 -07:00
include bpf-next-6.17 2025-07-30 09:58:50 -07:00
init x86/boot changes for v6.17: 2025-07-29 18:58:22 -07:00
io_uring for-6.17/io_uring-20250728 2025-07-28 16:30:12 -07:00
ipc vfs-6.17-rc1.mmap_prepare 2025-07-28 13:43:25 -07:00
kernel bpf-next-6.17 2025-07-30 09:58:50 -07:00
lib Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
LICENSES LICENSES: Replace the obsolete address of the FSF in the GFDL-1.2 2025-07-24 11:15:39 +02:00
mm Summary 2025-07-29 21:43:08 -07:00
net netpoll: prevent hanging NAPI when netcons gets enabled 2025-07-30 18:05:52 -07:00
rust Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
samples Driver core changes for 6.17-rc1 2025-07-29 12:15:39 -07:00
scripts Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
security powerpc updates for 6.17 2025-07-29 20:28:38 -07:00
sound A treewide cleanup of struct cycle_counter const annotations: 2025-07-29 14:02:53 -07:00
tools bpf-next-6.17 2025-07-30 09:58:50 -07:00
usr usr/include: openrisc: don't HDRTEST bpf_perf_event.h 2025-05-12 15:03:17 +09:00
virt KVM: Allow CPU to reschedule while setting per-page memory attributes 2025-06-24 12:20:17 -07:00
.clang-format Linux 6.15-rc5 2025-05-06 16:39:25 +10:00
.clippy.toml rust: clean Rust 1.88.0's warning about clippy::disallowed_macros configuration 2025-05-07 00:11:47 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: ignore Python compiled bytecode 2025-04-24 10:12:46 -06:00
.mailmap 11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues 2025-07-24 19:13:30 -07:00
.pylintrc docs: add a .pylintrc file with sys path for docs scripts 2025-04-09 12:10:33 -06:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS mm: update MAINTAINERS entry for HMM 2025-07-19 19:26:16 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
Makefile hardening updates for v6.17-rc1 2025-07-28 17:16:12 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.