linux/net
Victor Nogueira f179f5bc15 net/sched: sch_dualpi2: Run prob update timer in softirq to avoid deadlock
When a user creates a dualpi2 qdisc it automatically sets a timer. This
timer will run constantly and update the qdisc's probability field.
The issue is that the timer acquires the qdisc root lock and runs in
hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
it may interrupt the packet processing running in softirq. If that happens
and it runs on the same CPU, it will acquire the same lock and cause a
deadlock. The following splat shows up when running a kernel compiled with
lock debugging:

[  +0.000224] WARNING: inconsistent lock state
[  +0.000224] 6.16.0+ #10 Not tainted
[  +0.000169] --------------------------------
[  +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[  +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
[  +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
[  +0.000000] {IN-HARDIRQ-W} state was registered at:
[  +0.000000]   lock_acquire.part.0+0xb6/0x220
[  +0.000000]   _raw_spin_lock+0x31/0x80
[  +0.000000]   dualpi2_timer+0x6f/0x270
[  +0.000000]   __hrtimer_run_queues+0x1c5/0x360
[  +0.000000]   hrtimer_interrupt+0x115/0x260
[  +0.000000]   __sysvec_apic_timer_interrupt+0x6d/0x1a0
[  +0.000000]   sysvec_apic_timer_interrupt+0x6e/0x80
[  +0.000000]   asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  +0.000000]   pv_native_safe_halt+0xf/0x20
[  +0.000000]   default_idle+0x9/0x10
[  +0.000000]   default_idle_call+0x7e/0x1e0
[  +0.000000]   do_idle+0x1e8/0x250
[  +0.000000]   cpu_startup_entry+0x29/0x30
[  +0.000000]   rest_init+0x151/0x160
[  +0.000000]   start_kernel+0x6f3/0x700
[  +0.000000]   x86_64_start_reservations+0x24/0x30
[  +0.000000]   x86_64_start_kernel+0xc8/0xd0
[  +0.000000]   common_startup_64+0x13e/0x148
[  +0.000000] irq event stamp: 6884
[  +0.000000] hardirqs last  enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
[  +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
[  +0.000000] softirqs last  enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
[  +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
[  +0.000000]
              other info that might help us debug this:
[  +0.000000]  Possible unsafe locking scenario:

[  +0.000000]        CPU0
[  +0.000000]        ----
[  +0.000000]   lock(&sch->root_lock_key);
[  +0.000000]   <Interrupt>
[  +0.000000]     lock(&sch->root_lock_key);
[  +0.000000]
               *** DEADLOCK ***

[  +0.000000] 4 locks held by ping/156:
[  +0.000000]  #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
[  +0.000000]  #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
[  +0.000000]  #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
[  +0.000000]  #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140

I am able to reproduce it consistently when running the following:

tc qdisc add dev lo handle 1: root dualpi2
ping -f 127.0.0.1

To fix it, make the timer run in softirq.

Fixes: 320d031ad6 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250815135317.664993-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19 17:49:01 -07:00
..
6lowpan net: replace ND_PRINTK with dynamic debug 2025-07-10 15:27:32 -07:00
9p netfs: Fix the request's work item to not require a ref 2025-05-21 14:35:20 +02:00
802 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
8021q net: s/dev_close_many/netif_close_many/ 2025-07-18 17:27:47 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
atm atm: clip: Fix NULL pointer dereference in vcc_sendmsg() 2025-07-09 19:09:36 -07:00
ax25 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
batman-adv This cleanup patchset includes the following patches: 2025-07-11 17:50:27 -07:00
bluetooth Bluetooth: hci_conn: do return error from hci_enhanced_setup_sync() 2025-08-15 10:13:09 -04:00
bpf bpf: Add attach_type field to bpf_link 2025-07-11 10:51:55 -07:00
bridge net: bridge: fix soft lockup in br_multicast_query_expired() 2025-08-14 17:49:33 -07:00
caif caif: reduce stack size, again 2025-06-23 16:58:43 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-12 10:09:10 -07:00
ceph libceph: Rename hmac_sha256() to ceph_hmac_sha256() 2025-07-04 10:18:52 -07:00
core net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM 2025-08-18 17:20:06 -07:00
dcb
devlink devlink: let driver opt out of automatic phys_port_name generation 2025-08-12 13:23:39 -07:00
dns_resolver
dsa net: s/dev_close_many/netif_close_many/ 2025-07-18 17:27:47 -07:00
ethernet
ethtool ethtool: rss: support removing contexts via Netlink 2025-07-21 18:21:19 -07:00
handshake net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING' 2025-07-08 15:31:44 +02:00
hsr treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ieee802154 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ife
ipv4 ipsec-2025-08-11 2025-08-12 15:01:09 +02:00
ipv6 ipv6: sr: validate HMAC algorithm ID in seg6_hmac_info_add 2025-08-18 17:35:50 -07:00
iucv s390/drivers: Explicitly include <linux/export.h> 2025-06-17 18:18:02 +02:00
kcm net: kcm: Fix race condition in kcm_unattach() 2025-08-13 18:18:33 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
l2tp net: annotate races around sk->sk_uid 2025-06-23 17:04:03 -07:00
l3mdev net: fib_rules: Fix iif / oif matching on L3 master device 2025-04-15 17:54:56 -07:00
lapb treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
llc net: make sk->sk_rcvtimeo lockless 2025-06-23 17:05:12 -07:00
mac80211 wifi: mac80211: fix WARN_ON for monitor mode on some devices 2025-07-23 12:29:07 +02:00
mac802154
mctp net: mctp: Fix bad kfree_skb in bind lookup test 2025-08-13 17:07:34 -07:00
mpls net: s/dev_get_flags/netif_get_flags/ 2025-07-18 17:27:47 -07:00
mptcp mptcp: disable add_addr retransmission when timeout is 0 2025-08-18 17:39:58 -07:00
ncsi net: ncsi: Fix buffer overflow in fetching version id 2025-06-12 18:21:59 -07:00
netfilter netfilter: nf_tables: reject duplicate device on updates 2025-08-13 08:34:55 +02:00
netlabel calipso: unlock rcu before returning -EAFNOSUPPORT 2025-06-05 08:03:38 -07:00
netlink netlink: avoid infinite retry looping in netlink_unicast() 2025-07-30 19:16:49 -07:00
netrom treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-19 13:00:24 -07:00
nsh
openvswitch net: openvswitch: allow providing upcall pid for the 'execute' command 2025-07-07 14:30:39 -07:00
packet net/packet: fix a race in packet_set_ring() and packet_notifier() 2025-08-04 17:21:27 -07:00
phonet Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-17 11:00:33 -07:00
psample
qrtr
rds don't open-code kernel_accept() in rds_tcp_accept_one() 2025-07-15 16:19:54 -07:00
rfkill
rose net: track pfmemalloc drops via SKB_DROP_REASON_PFMEMALLOC 2025-07-18 16:59:05 -07:00
rxrpc rxrpc: Fix to use conn aborts for conn-wide failures 2025-07-17 07:50:48 -07:00
sched net/sched: sch_dualpi2: Run prob update timer in softirq to avoid deadlock 2025-08-19 17:49:01 -07:00
sctp sctp: linearize cloned gso packets in sctp_rcv 2025-08-08 13:08:06 -07:00
shaper
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-17 11:00:33 -07:00
strparser net: make sk->sk_rcvtimeo lockless 2025-06-23 17:05:12 -07:00
sunrpc nfsd-6.17 fixes: 2025-08-11 07:38:55 -07:00
switchdev net: switchdev: Convert blocking notification chain to a raw one 2025-03-11 11:30:28 +01:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-10 10:10:49 -07:00
tls tls: handle data disappearing from under the TLS ULP 2025-08-12 18:59:05 -07:00
unix Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
vmw_vsock vsock: Do not allow binding to VMADDR_PORT_ANY 2025-08-08 12:55:00 -07:00
wireless Another wireless update: 2025-07-24 17:25:42 -07:00
x25 net/x25: Remove unused x25_terminate_link() 2025-07-14 17:19:13 -07:00
xdp net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt 2025-07-10 14:48:29 +02:00
xfrm xfrm: bring back device check in validate_xmit_xfrm 2025-08-07 08:07:01 +02:00
compat.c
devres.c
Kconfig net: Kconfig: add endif/endmenu comments 2025-07-22 18:17:23 -07:00
Kconfig.debug
Makefile net: Retire DCCP socket. 2025-04-11 18:58:10 -07:00
socket.c net: annotate races around sk->sk_uid 2025-06-23 17:04:03 -07:00
sysctl_net.c