Commit graph

83209 commits

Author SHA1 Message Date
Eric Dumazet
a4c2b8be2e net_sched: sch_fq: clear q->band_pkt_count[] in fq_reset()
When/if a NIC resets, queues are deactivated by dev_deactivate_many(),
then reactivated when the reset operation completes.

fq_reset() removes all the skbs from various queues.

If we do not clear q->band_pkt_count[], these counters keep growing
and can eventually reach sch->limit, preventing new packets to be queued.

Many thanks to Praveen for discovering the root cause.

Fixes: 29f834aa32 ("net_sched: sch_fq: add 3 bands and WRR scheduling")
Diagnosed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260304015640.961780-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 17:54:22 -08:00
Ian Ray
f7d92f11bd net: nfc: nci: Fix zero-length proprietary notifications
NCI NFC controllers may have proprietary OIDs with zero-length payload.
One example is: drivers/nfc/nxp-nci/core.c, NXP_NCI_RF_TXLDO_ERROR_NTF.

Allow a zero length payload in proprietary notifications *only*.

Before:

-- >8 --
kernel: nci: nci_recv_frame: len 3
-- >8 --

After:

-- >8 --
kernel: nci: nci_recv_frame: len 3
kernel: nci: nci_ntf_packet: NCI RX: MT=ntf, PBF=0, GID=0x1, OID=0x23, plen=0
kernel: nci: nci_ntf_packet: unknown ntf opcode 0x123
kernel: nfc nfc0: NFC: RF transmitter couldn't start. Bad power and/or configuration?
-- >8 --

After fixing the hardware:

-- >8 --
kernel: nci: nci_recv_frame: len 27
kernel: nci: nci_ntf_packet: NCI RX: MT=ntf, PBF=0, GID=0x1, OID=0x5, plen=24
kernel: nci: nci_rf_intf_activated_ntf_packet: rf_discovery_id 1
-- >8 --

Fixes: d24b03535e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Signed-off-by: Ian Ray <ian.ray@gehealthcare.com>
Link: https://patch.msgid.link/20260302163238.140576-1-ian.ray@gehealthcare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 17:48:12 -08:00
Eric Dumazet
165573e41f tcp: secure_seq: add back ports to TS offset
This reverts 28ee1b746f ("secure_seq: downgrade to per-host timestamp offsets")

tcp_tw_recycle went away in 2017.

Zhouyan Deng reported off-path TCP source port leakage via
SYN cookie side-channel that can be fixed in multiple ways.

One of them is to bring back TCP ports in TS offset randomization.

As a bonus, we perform a single siphash() computation
to provide both an ISN and a TS offset.

Fixes: 28ee1b746f ("secure_seq: downgrade to per-host timestamp offsets")
Reported-by: Zhouyan Deng <dengzhouyan_nwpu@163.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260302205527.1982836-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 17:44:35 -08:00
Jakub Kicinski
c649e99764 linux-can-fixes-for-7.0-20260302
-----BEGIN PGP SIGNATURE-----
 
 iIkEABYKADEWIQSl+MghEFFAdY3pYJLMOmT6rpmt0gUCaaVwehMcbWtsQHBlbmd1
 dHJvbml4LmRlAAoJEMw6ZPquma3SqFUA/ihDNaZuD1HDNZ6tFugz4gcvytH4LT+R
 CRZXS+a1FRLyAQCuTiN1k080l4pj0sVDNlkymjxcn7a8RZ+Dk/Wy3b7JDg==
 =e56S
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-7.0-20260302' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2026-03-02

The first 2 patches are by Oliver Hartkopp. The first fixes the
locking for CAN Broadcast Manager op runtime updates, the second fixes
the packet statisctics for the CAN dummy driver.

Alban Bedel's patch fixes a potential problem in the error path of the
mcp251x's ndo_open callback.

A patch by Ziyi Guo add USB endpoint type validation to the esd_usb
driver.

The next 6 patches are by Greg Kroah-Hartman and fix URB data parsing
for the ems_usb and ucan driver, fix URB anchoring in the etas_es58x,
and in the f81604 driver fix URB data parsing, add URB error handling
and fix URB anchoring.

A patch by me targets the gs_usb driver and fixes interoperability
with the CANable-2.5 firmware by always configuring the bit rate
before starting the device.

The last patch is by Frank Li and fixes a CHECK_DTBS warning for the
nxp,sja1000 dt-binding.

* tag 'linux-can-fixes-for-7.0-20260302' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  dt-bindings: net: can: nxp,sja1000: add reference to mc-peripheral-props.yaml
  can: gs_usb: gs_can_open(): always configure bitrates before starting device
  can: usb: f81604: correctly anchor the urb in the read bulk callback
  can: usb: f81604: handle bulk write errors properly
  can: usb: f81604: handle short interrupt urb messages properly
  can: usb: etas_es58x: correctly anchor the urb in the read bulk callback
  can: ucan: Fix infinite loop from zero-length messages
  can: ems_usb: ems_usb_read_bulk_callback(): check the proper length of a message
  can: esd_usb: add endpoint type validation
  can: mcp251x: fix deadlock in error path of mcp251x_open
  can: dummy_can: dummy_can_init(): fix packet statistics
  can: bcm: fix locking for bcm_op runtime updates
====================

Link: https://patch.msgid.link/20260302152755.1700177-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 16:47:46 -08:00
Jakub Kicinski
2697c45a48 Some more fixes:
- mt76 gets three almost identical new length checks
  - cw1200 & ti: locking fixes
  - mac80211 has a fix for the recent EML frame handling
  - rsi driver no longer oddly responds to config, which
    had triggered a warning in mac80211
  - ath12k has two fixes for station statistics handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmmoFjoACgkQ10qiO8sP
 aACNYA/9Fqvno+pZXvGHo9yIsKgW1WfOVADhXkcG1rKDLcUS7MAzIXx3eru4Uteq
 wXD4rX07M2E0qa2TGmemCJSZPlYdNWkSn3EXGfn2TDmUH0u991kVa7wHMbUal4XQ
 lOlQns6c5B5r1wA7Cg1MTzDqplkIQacJ4T4Hf+v7n7twz9uKmfBYpR1vNek60Cx3
 smCrdMTbsMO1YBcMjoIoMYs7sPWk/2MFiCS0YxRlOik1bmZnwf34Y9p+sOSvZxRV
 Ydt/9pBlwA2eFzdKPdoSZmMtfs+OdAiICqSC6CSI1KylgLOwnEl/UPN/bUxszGx1
 NHjp2+8/yfOpm+EXwqWjeZ46lbw6YkkjWmxraNqRSCvQpPdc67RbqN405fFjonaR
 YMh6UuBHMlTGAbCLXwSkL5TKBD3EHq4PXnc6aFoBpPgaRxppGuT+Lx8PQny5fMOq
 GbOo3XUyB1Dc6B1A+xCN95DmPdg223RBR/qJCMHRkzaKm1ExO5hyYFQlZcH4bFHJ
 9K2bTzZ7/w5UNYEiqX1yQ7btIlqf/gdeBANjLHNdqlsJ8i3GKm6TZV1rMnvt9MWC
 21yYZ/Bwf4SzLgCH6Ds1RGEJjXusYnea/XJKeSPE0xqicFrXDqKGrmYIDFbntFLa
 mAqWmTqTVFl697h8PFoi47YqKr/5B+Ew3+1ZAGESc1RYWq+GDjI=
 =/SxR
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Some more fixes:
 - mt76 gets three almost identical new length checks
 - cw1200 & ti: locking fixes
 - mac80211 has a fix for the recent EML frame handling
 - rsi driver no longer oddly responds to config, which
   had triggered a warning in mac80211
 - ath12k has two fixes for station statistics handling

* tag 'wireless-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mt76: Fix possible oob access in mt76_connac2_mac_write_txwi_80211()
  wifi: mt76: mt7925: Fix possible oob access in mt7925_mac_write_txwi_80211()
  wifi: mt76: mt7996: Fix possible oob access in mt7996_mac_write_txwi_80211()
  wifi: wlcore: Fix a locking bug
  wifi: cw1200: Fix locking in error paths
  wifi: mac80211: fix missing ieee80211_eml_params member initialization
  wifi: rsi: Don't default to -EOPNOTSUPP in rsi_mac80211_config
  wifi: ath12k: fix station lookup failure when disconnecting from AP
  wifi: ath12k: use correct pdev id when requesting firmware stats
====================

Link: https://patch.msgid.link/20260304112500.169639-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04 15:29:56 -08:00
Eric Biggers
46d0d6f50d net/tcp-md5: Fix MAC comparison to be constant-time
To prevent timing attacks, MACs need to be compared in constant
time.  Use the appropriate helper function for this.

Fixes: cfb6eeb4c8 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Fixes: 658ddaaf66 ("tcp: md5: RST: getting md5 key from listener")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260302203409.13388-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03 18:39:43 -08:00
Yung Chih Su
4ee7fa6cf7 net: ipv4: fix ARM64 alignment fault in multipath hash seed
`struct sysctl_fib_multipath_hash_seed` contains two u32 fields
(user_seed and mp_seed), making it an 8-byte structure with a 4-byte
alignment requirement.

In `fib_multipath_hash_from_keys()`, the code evaluates the entire
struct atomically via `READ_ONCE()`:

    mp_seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).mp_seed;

While this silently works on GCC by falling back to unaligned regular
loads which the ARM64 kernel tolerates, it causes a fatal kernel panic
when compiled with Clang and LTO enabled.

Commit e35123d83e ("arm64: lto: Strengthen READ_ONCE() to acquire
when CONFIG_LTO=y") strengthens `READ_ONCE()` to use Load-Acquire
instructions (`ldar` / `ldapr`) to prevent compiler reordering bugs
under Clang LTO. Since the macro evaluates the full 8-byte struct,
Clang emits a 64-bit `ldar` instruction. ARM64 architecture strictly
requires `ldar` to be naturally aligned, thus executing it on a 4-byte
aligned address triggers a strict Alignment Fault (FSC = 0x21).

Fix the read side by moving the `READ_ONCE()` directly to the `u32`
member, which emits a safe 32-bit `ldar Wn`.

Furthermore, Eric Dumazet pointed out that `WRITE_ONCE()` on the entire
struct in `proc_fib_multipath_hash_set_seed()` is also flawed. Analysis
shows that Clang splits this 8-byte write into two separate 32-bit
`str` instructions. While this avoids an alignment fault, it destroys
atomicity and exposes a tear-write vulnerability. Fix this by
explicitly splitting the write into two 32-bit `WRITE_ONCE()`
operations.

Finally, add the missing `READ_ONCE()` when reading `user_seed` in
`proc_fib_multipath_hash_seed()` to ensure proper pairing and
concurrency safety.

Fixes: 4ee2a8cace ("net: ipv4: Add a sysctl to set multipath hash seed")
Signed-off-by: Yung Chih Su <yuuchihsu@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260302060247.7066-1-yuuchihsu@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03 17:20:37 -08:00
Eric Biggers
67edfec516 net/tcp-ao: Fix MAC comparison to be constant-time
To prevent timing attacks, MACs need to be compared in constant
time.  Use the appropriate helper function for this.

Fixes: 0a3a809089 ("net/tcp: Verify inbound TCP-AO signed segments")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20260302203600.13561-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03 17:16:54 -08:00
Jakub Kicinski
2ffb4f5c2c ipv6: fix NULL pointer deref in ip6_rt_get_dev_rcu()
l3mdev_master_dev_rcu() can return NULL when the slave device is being
un-slaved from a VRF. All other callers deal with this, but we lost
the fallback to loopback in ip6_rt_pcpu_alloc() -> ip6_rt_get_dev_rcu()
with commit 4832c30d54 ("net: ipv6: put host and anycast routes on
device with address").

  KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
  RIP: 0010:ip6_rt_pcpu_alloc (net/ipv6/route.c:1418)
  Call Trace:
   ip6_pol_route (net/ipv6/route.c:2318)
   fib6_rule_lookup (net/ipv6/fib6_rules.c:115)
   ip6_route_output_flags (net/ipv6/route.c:2607)
   vrf_process_v6_outbound (drivers/net/vrf.c:437)

I was tempted to rework the un-slaving code to clear the flag first
and insert synchronize_rcu() before we remove the upper. But looks like
the explicit fallback to loopback_dev is an established pattern.
And I guess avoiding the synchronize_rcu() is nice, too.

Fixes: 4832c30d54 ("net: ipv6: put host and anycast routes on device with address")
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260301194548.927324-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03 17:14:48 -08:00
YiFei Zhu
1a86a1f7d8 net: Fix rcu_tasks stall in threaded busypoll
I was debugging a NIC driver when I noticed that when I enable
threaded busypoll, bpftrace hangs when starting up. dmesg showed:

  rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 10658 jiffies old.
  rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 40793 jiffies old.
  rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 131273 jiffies old.
  rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 402058 jiffies old.
  INFO: rcu_tasks detected stalls on tasks:
  00000000769f52cd: .N nvcsw: 2/2 holdout: 1 idle_cpu: -1/64
  task:napi/eth2-8265  state:R  running task     stack:0     pid:48300 tgid:48300 ppid:2      task_flags:0x208040 flags:0x00004000
  Call Trace:
   <TASK>
   ? napi_threaded_poll_loop+0x27c/0x2c0
   ? __pfx_napi_threaded_poll+0x10/0x10
   ? napi_threaded_poll+0x26/0x80
   ? kthread+0xfa/0x240
   ? __pfx_kthread+0x10/0x10
   ? ret_from_fork+0x31/0x50
   ? __pfx_kthread+0x10/0x10
   ? ret_from_fork_asm+0x1a/0x30
   </TASK>

The cause is that in threaded busypoll, the main loop is in
napi_threaded_poll rather than napi_threaded_poll_loop, where the
latter rarely iterates more than once within its loop. For
rcu_softirq_qs_periodic inside napi_threaded_poll_loop to report its
qs state, the last_qs must be 100ms behind, and this can't happen
because napi_threaded_poll_loop rarely iterates in threaded busypoll,
and each time napi_threaded_poll_loop is called last_qs is reset to
latest jiffies.

This patch changes so that in threaded busypoll, last_qs is saved
in the outer napi_threaded_poll, and whether busy_poll_last_qs
is NULL indicates whether napi_threaded_poll_loop is called for
busypoll. This way last_qs would not reset to latest jiffies on
each invocation of napi_threaded_poll_loop.

Fixes: c18d4b190a ("net: Extend NAPI threaded polling to allow kthread based busy polling")
Cc: stable@vger.kernel.org
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20260227221937.1060857-1-zhuyifei@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-03 13:44:28 +01:00
Allison Henderson
6a877ececd net/rds: Fix circular locking dependency in rds_tcp_tune
syzbot reported a circular locking dependency in rds_tcp_tune() where
sk_net_refcnt_upgrade() is called while holding the socket lock:

======================================================
WARNING: possible circular locking dependency detected
======================================================
kworker/u10:8/15040 is trying to acquire lock:
ffffffff8e9aaf80 (fs_reclaim){+.+.}-{0:0},
at: __kmalloc_cache_noprof+0x4b/0x6f0

but task is already holding lock:
ffff88805a3c1ce0 (k-sk_lock-AF_INET6){+.+.}-{0:0},
at: rds_tcp_tune+0xd7/0x930

The issue occurs because sk_net_refcnt_upgrade() performs memory
allocation (via get_net_track() -> ref_tracker_alloc()) while the
socket lock is held, creating a circular dependency with fs_reclaim.

Fix this by moving sk_net_refcnt_upgrade() outside the socket lock
critical section. This is safe because the fields modified by the
sk_net_refcnt_upgrade() call (sk_net_refcnt, ns_tracker) are not
accessed by any concurrent code path at this point.

v2:
  - Corrected fixes tag
  - check patch line wrap nits
  - ai commentary nits

Reported-by: syzbot+2e2cf5331207053b8106@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2e2cf5331207053b8106
Fixes: 3a58f13a88 ("net: rds: acquire refcount on TCP sockets")
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260227202336.167757-1-achender@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-03 12:57:06 +01:00
MeiChia Chiu
8fb54c7307 wifi: mac80211: fix missing ieee80211_eml_params member initialization
The missing initialization causes driver to misinterpret the EML control bitmap,
resulting in incorrect link bitmap handling.

Fixes: 0d95280a2d ("wifi: mac80211: Add eMLSR/eMLMR action frame parsing support")
Signed-off-by: MeiChia Chiu <MeiChia.Chiu@mediatek.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260303054725.471548-1-MeiChia.Chiu@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-03-03 08:37:29 +01:00
Oliver Hartkopp
c35636e91e can: bcm: fix locking for bcm_op runtime updates
Commit c2aba69d0c ("can: bcm: add locking for bcm_op runtime updates")
added a locking for some variables that can be modified at runtime when
updating the sending bcm_op with a new TX_SETUP command in bcm_tx_setup().

Usually the RX_SETUP only handles and filters incoming traffic with one
exception: When the RX_RTR_FRAME flag is set a predefined CAN frame is
sent when a specific RTR frame is received. Therefore the rx bcm_op uses
bcm_can_tx() which uses the bcm_tx_lock that was only initialized in
bcm_tx_setup(). Add the missing spin_lock_init() when allocating the
bcm_op in bcm_rx_setup() to handle the RTR case properly.

Fixes: c2aba69d0c ("can: bcm: add locking for bcm_op runtime updates")
Reported-by: syzbot+5b11eccc403dd1cea9f8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-can/699466e4.a70a0220.2c38d7.00ff.GAE@google.com/
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20260218-bcm_spin_lock_init-v1-1-592634c8a5b5@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2026-03-02 10:24:40 +01:00
Jiayuan Chen
101bacb303 atm: lec: fix null-ptr-deref in lec_arp_clear_vccs
syzkaller reported a null-ptr-deref in lec_arp_clear_vccs().
This issue can be easily reproduced using the syzkaller reproducer.

In the ATM LANE (LAN Emulation) module, the same atm_vcc can be shared by
multiple lec_arp_table entries (e.g., via entry->vcc or entry->recv_vcc).
When the underlying VCC is closed, lec_vcc_close() iterates over all
ARP entries and calls lec_arp_clear_vccs() for each matched entry.

For example, when lec_vcc_close() iterates through the hlists in
priv->lec_arp_empty_ones or other ARP tables:

1. In the first iteration, for the first matched ARP entry sharing the VCC,
lec_arp_clear_vccs() frees the associated vpriv (which is vcc->user_back)
and sets vcc->user_back to NULL.
2. In the second iteration, for the next matched ARP entry sharing the same
VCC, lec_arp_clear_vccs() is called again. It obtains a NULL vpriv from
vcc->user_back (via LEC_VCC_PRIV(vcc)) and then attempts to dereference it
via `vcc->pop = vpriv->old_pop`, leading to a null-ptr-deref crash.

Fix this by adding a null check for vpriv before dereferencing
it. If vpriv is already NULL, it means the VCC has been cleared
by a previous call, so we can safely skip the cleanup and just
clear the entry's vcc/recv_vcc pointers.

The entire cleanup block (including vcc_release_async()) is placed inside
the vpriv guard because a NULL vpriv indicates the VCC has already been
fully released by a prior iteration — repeating the teardown would
redundantly set flags and trigger callbacks on an already-closing socket.

The Fixes tag points to the initial commit because the entry->vcc path has
been vulnerable since the original code. The entry->recv_vcc path was later
added by commit 8d9f73c0ad ("atm: fix a memory leak of vcc->user_back")
with the same pattern, and both paths are fixed here.

Reported-by: syzbot+72e3ea390c305de0e259@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68c95a83.050a0220.3c6139.0e5c.GAE@google.com/T/
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260225123250.189289-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 09:33:26 -08:00
Nikhil P. Rao
f7387d6579 xsk: Fix zero-copy AF_XDP fragment drop
AF_XDP should ensure that only a complete packet is sent to application.
In the zero-copy case, if the Rx queue gets full as fragments are being
enqueued, the remaining fragments are dropped.

For the multi-buffer case, add a check to ensure that the Rx queue has
enough space for all fragments of a packet before starting to enqueue
them.

Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
Link: https://patch.msgid.link/20260225000456.107806-3-nikhil.rao@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 08:55:11 -08:00
Nikhil P. Rao
60abb0ac11 xsk: Fix fragment node deletion to prevent buffer leak
After commit b692bf9a75 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node"),
the list_node field is reused for both the xskb pool list and the buffer
free list, this causes a buffer leak as described below.

xp_free() checks if a buffer is already on the free list using
list_empty(&xskb->list_node). When list_del() is used to remove a node
from the xskb pool list, it doesn't reinitialize the node pointers.
This means list_empty() will return false even after the node has been
removed, causing xp_free() to incorrectly skip adding the buffer to the
free list.

Fix this by using list_del_init() instead of list_del() in all fragment
handling paths, this ensures the list node is reinitialized after removal,
allowing the list_empty() to work correctly.

Fixes: b692bf9a75 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
Link: https://patch.msgid.link/20260225000456.107806-2-nikhil.rao@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 08:55:11 -08:00
Jakub Kicinski
026dfef287 tcp: give up on stronger sk_rcvbuf checks (for now)
We hit another corner case which leads to TcpExtTCPRcvQDrop

Connections which send RPCs in the 20-80kB range over loopback
experience spurious drops. The exact conditions for most of
the drops I investigated are that:
 - socket exchanged >1MB of data so its not completely fresh
 - rcvbuf is around 128kB (default, hasn't grown)
 - there is ~60kB of data in rcvq
 - skb > 64kB arrives

The sum of skb->len (!) of both of the skbs (the one already
in rcvq and the arriving one) is larger than rwnd.
My suspicion is that this happens because __tcp_select_window()
rounds the rwnd up to (1 << wscale) if less than half of
the rwnd has been consumed.

Eric suggests that given the number of Fixes we already have
pointing to 1d2fbaad7c it's probably time to give up on it,
until a bigger revamp of rmem management.

Also while we could risk tweaking the rwnd math, there are other
drops on workloads I investigated, after the commit in question,
not explained by this phenomenon.

Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/20260225122355.585fd57b@kernel.org
Fixes: 1d2fbaad7c ("tcp: stronger sk_rcvbuf checks")
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260227003359.2391017-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 07:55:39 -08:00
Kuniyuki Iwashima
6996a2d2d0 udp: Unhash auto-bound connected sk from 4-tuple hash table when disconnected.
Let's say we bind() an UDP socket to the wildcard address with a
non-zero port, connect() it to an address, and disconnect it from
the address.

bind() sets SOCK_BINDPORT_LOCK on sk->sk_userlocks (but not
SOCK_BINDADDR_LOCK), and connect() calls udp_lib_hash4() to put
the socket into the 4-tuple hash table.

Then, __udp_disconnect() calls sk->sk_prot->rehash(sk).

It computes a new hash based on the wildcard address and moves
the socket to a new slot in the 4-tuple hash table, leaving a
garbage in the chain that no packet hits.

Let's remove such a socket from 4-tuple hash table when disconnected.

Note that udp_sk(sk)->udp_portaddr_hash needs to be udpated after
udp_hash4_dec(hslot2) in udp_unhash4().

Fixes: 78c91ae2c6 ("ipv4/udp: Add 4-tuple hash for connected socket")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260227035547.3321327-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28 07:46:24 -08:00
Victor Nogueira
11cb63b0d1 net/sched: Only allow act_ct to bind to clsact/ingress qdiscs and shared blocks
As Paolo said earlier [1]:

"Since the blamed commit below, classify can return TC_ACT_CONSUMED while
the current skb being held by the defragmentation engine. As reported by
GangMin Kim, if such packet is that may cause a UaF when the defrag engine
later on tries to tuch again such packet."

act_ct was never meant to be used in the egress path, however some users
are attaching it to egress today [2]. Attempting to reach a middle
ground, we noticed that, while most qdiscs are not handling
TC_ACT_CONSUMED, clsact/ingress qdiscs are. With that in mind, we
address the issue by only allowing act_ct to bind to clsact/ingress
qdiscs and shared blocks. That way it's still possible to attach act_ct to
egress (albeit only with clsact).

[1] https://lore.kernel.org/netdev/674b8cbfc385c6f37fb29a1de08d8fe5c2b0fbee.1771321118.git.pabeni@redhat.com/
[2] https://lore.kernel.org/netdev/cc6bfb4a-4a2b-42d8-b9ce-7ef6644fb22b@ovn.org/

Reported-by: GangMin Kim <km.kim1503@gmail.com>
Fixes: 3f14b377d0 ("net/sched: act_ct: fix skb leak and crash on ooo frags")
CC: stable@vger.kernel.org
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260225134349.1287037-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 19:06:21 -08:00
Jonas Köppeler
15c2715a52 net/sched: sch_cake: fixup cake_mq rate adjustment for diffserv config
cake_mq's rate adjustment during the sync periods did not adjust the
rates for every tin in a diffserv config. This lead to inconsistencies
of rates between the tins. Fix this by setting the rates for all tins
during synchronization.

Fixes: 1bddd758ba ("net/sched: sch_cake: share shaper state across sub-instances of cake_mq")
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260226-cake-mq-skip-sync-bandwidth-unlimited-v1-2-01830bb4db87@tu-berlin.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 18:35:40 -08:00
Jonas Köppeler
0b3cd139be net/sched: sch_cake: avoid sync overhead when unlimited
Skip inter-instance sync when no rate limit is configured, as it serves
no purpose and only adds overhead.

Fixes: 1bddd758ba ("net/sched: sch_cake: share shaper state across sub-instances of cake_mq")
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260226-cake-mq-skip-sync-bandwidth-unlimited-v1-1-01830bb4db87@tu-berlin.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 18:35:40 -08:00
Eric Dumazet
29252397bc inet: annotate data-races around isk->inet_num
UDP/TCP lookups are using RCU, thus isk->inet_num accesses
should use READ_ONCE() and WRITE_ONCE() where needed.

Fixes: 3ab5aee7fe ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260225203545.1512417-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 17:16:59 -08:00
Paul Moses
62413a9c3c net/sched: act_gate: snapshot parameters with RCU on replace
The gate action can be replaced while the hrtimer callback or dump path is
walking the schedule list.

Convert the parameters to an RCU-protected snapshot and swap updates under
tcf_lock, freeing the previous snapshot via call_rcu(). When REPLACE omits
the entry list, preserve the existing schedule so the effective state is
unchanged.

Fixes: a51c328df3 ("net: qos: introduce a gate control flow action")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moses <p@1g4.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260223150512.2251594-2-p@1g4.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 16:10:36 -08:00
Danielle Ratson
93c9475c04 bridge: Check relevant per-VLAN options in VLAN range grouping
The br_vlan_opts_eq_range() function determines if consecutive VLANs can
be grouped together in a range for compact netlink notifications. It
currently checks state, tunnel info, and multicast router configuration,
but misses two categories of per-VLAN options that affect the output:
1. User-visible priv_flags (neigh_suppress, mcast_enabled)
2. Port multicast context (mcast_max_groups, mcast_n_groups)

When VLANs have different settings for these options, they are incorrectly
grouped into ranges, causing netlink notifications to report only one
VLAN's settings for the entire range.

Fix by checking priv_flags equality, but only for flags that affect netlink
output (BR_VLFLAG_NEIGH_SUPPRESS_ENABLED and BR_VLFLAG_MCAST_ENABLED),
and comparing multicast context (mcast_max_groups and mcast_n_groups).

Example showing the bugs before the fix:

$ bridge vlan set vid 10 dev dummy1 neigh_suppress on
$ bridge vlan set vid 11 dev dummy1 neigh_suppress off
$ bridge -d vlan show dev dummy1
  port             vlan-id
  dummy1           10-11
                      ... neigh_suppress on

$ bridge vlan set vid 10 dev dummy1 mcast_max_groups 100
$ bridge vlan set vid 11 dev dummy1 mcast_max_groups 200
$ bridge -d vlan show dev dummy1
  port             vlan-id
  dummy1           10-11
                      ... mcast_max_groups 100

After the fix, VLANs 10 and 11 are shown as separate entries with their
correct individual settings.

Fixes: a1aee20d5d ("net: bridge: Add netlink knobs for number / maximum MDB entries")
Fixes: 83f6d60079 ("bridge: vlan: Allow setting VLAN neighbor suppression state")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260225143956.3995415-2-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 19:24:29 -08:00
Eric Dumazet
2ef2b20cf4 net: annotate data-races around sk->sk_{data_ready,write_space}
skmsg (and probably other layers) are changing these pointers
while other cpus might read them concurrently.

Add corresponding READ_ONCE()/WRITE_ONCE() annotations
for UDP, TCP and AF_UNIX.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: syzbot+87f770387a9e5dc6b79b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/699ee9fc.050a0220.1cd54b.0009.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260225131547.1085509-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 19:23:03 -08:00
Jakub Kicinski
754a3d081a Here is a batman-adv bugfix:
- Avoid double-rtnl_lock ELP metric worker, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmmes84WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSxDD/wI/ssEvqmay/4okfp6Fk/+hjLi
 2BvCLwKei8JKqsnNvUSW7I+inrp0AilwfUuMqQlIiOdz6zJ6O4s4SXdiwl8TH49p
 uVp4dSwoOPHzBKaPH+dU15fcLD4yBqRYnl6gyxem7hWtsDU04fn96se7lagUdJc/
 35LZ2ni9cRmxgmvcLECNGOj4Tm7TxbcG0wkifS/rIO7gd05rXb7c7T1lCGRPeBf4
 2i4RVQXwSEVhff1ig7yU/1gs2FUzIKnrlKHayyfYkynEI37Ggc4IBiqLkdyBuxJ4
 Z+qlCfumrtdrt79kirzezrcWEzQEj5Yn3fnXj0X27QYy5FJVKnLczHnGuLUSUqzl
 QgwvQ87tNwEmz50ODsq+TFY9GuowWJ5yLTMFb18u/5hJrAGvux5wU+mIbloTOpBg
 M/kMv8kZIMNzVEirxbD08Ygx9Fsxu3UWGptDAunlv1GkHBj7XqA2Jkoq77eDfxx+
 lIa0tu1s/y1eTb5tA9JXUn0BsoNrafDIY5zrjz+lDKYpmmeNUgiTbQBGuVCZ+t2o
 EWLYPxdV84QpwuoaXZ/ZkD0YVAx/sfDLptxaBGViWbThLGVYxYSELePO94Mkr6Os
 Fa/8gEg0Z+jNUZ3UfVVnjyjPaa5/BM2vtbwSQFgv1udJGLoa/AkWIcOEgkeZAzWc
 B5cubcmbSHx4mBCEuA==
 =Ng/2
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-pullrequest-20260225' of https://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

 - Avoid double-rtnl_lock ELP metric worker, by Sven Eckelmann

* tag 'batadv-net-pullrequest-20260225' of https://git.open-mesh.org/linux-merge:
  batman-adv: Avoid double-rtnl_lock ELP metric worker
====================

Link: https://patch.msgid.link/20260225084614.229077-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 19:15:09 -08:00
Davide Caratti
e35626f610 net/sched: ets: fix divide by zero in the offload path
Offloading ETS requires computing each class' WRR weight: this is done by
averaging over the sums of quanta as 'q_sum' and 'q_psum'. Using unsigned
int, the same integer size as the individual DRR quanta, can overflow and
even cause division by zero, like it happened in the following splat:

 Oops: divide error: 0000 [#1] SMP PTI
 CPU: 13 UID: 0 PID: 487 Comm: tc Tainted: G            E       6.19.0-virtme #45 PREEMPT(full)
 Tainted: [E]=UNSIGNED_MODULE
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets]
 Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44
 RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246
 RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660
 RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe
 R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000
 FS:  00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ets_qdisc_change+0x870/0xf40 [sch_ets]
  qdisc_create+0x12b/0x540
  tc_modify_qdisc+0x6d7/0xbd0
  rtnetlink_rcv_msg+0x168/0x6b0
  netlink_rcv_skb+0x5c/0x110
  netlink_unicast+0x1d6/0x2b0
  netlink_sendmsg+0x22e/0x470
  ____sys_sendmsg+0x38a/0x3c0
  ___sys_sendmsg+0x99/0xe0
  __sys_sendmsg+0x8a/0xf0
  do_syscall_64+0x111/0xf80
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7f440b81c77e
 Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
 RSP: 002b:00007fff951e4c10 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000000000481820 RCX: 00007f440b81c77e
 RDX: 0000000000000000 RSI: 00007fff951e4cd0 RDI: 0000000000000003
 RBP: 00007fff951e4c20 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff951f4fa8
 R13: 00000000699ddede R14: 00007f440bb01000 R15: 0000000000486980
  </TASK>
 Modules linked in: sch_ets(E) netdevsim(E)
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets]
 Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44
 RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246
 RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660
 RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe
 R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000
 FS:  00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0
 Kernel panic - not syncing: Fatal exception
 Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
 ---[ end Kernel panic - not syncing: Fatal exception ]---

Fix this using 64-bit integers for 'q_sum' and 'q_psum'.

Cc: stable@vger.kernel.org
Fixes: d35eb52bd2 ("net: sch_ets: Make the ETS qdisc offloadable")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/28504887df314588c7255e9911769c36f751edee.1771964872.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26 18:28:47 -08:00
Linus Torvalds
b9c8fc2cae Including fixes from IPsec, Bluetooth and netfilter
Current release - regressions:
 
   - wifi: fix dev_alloc_name() return value check
 
   - rds: fix recursive lock in rds_tcp_conn_slots_available
 
 Current release - new code bugs:
 
   - vsock: lock down child_ns_mode as write-once
 
 Previous releases - regressions:
 
   - core:
     - do not pass flow_id to set_rps_cpu()
     - consume xmit errors of GSO frames
 
   - netconsole: avoid OOB reads, msg is not nul-terminated
 
   - netfilter: h323: fix OOB read in decode_choice()
 
   - tcp: re-enable acceptance of FIN packets when RWIN is 0
 
   - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb().
 
   - wifi: brcmfmac: fix potential kernel oops when probe fails
 
   - phy: register phy led_triggers during probe to avoid AB-BA deadlock
 
   - eth: bnxt_en: fix deleting of Ntuple filters
 
   - eth: wan: farsync: fix use-after-free bugs caused by unfinished tasklets
 
   - eth: xscale: check for PTP support properly
 
 Previous releases - always broken:
 
   - tcp: fix potential race in tcp_v6_syn_recv_sock()
 
   - kcm: fix zero-frag skb in frag_list on partial sendmsg error
 
   - xfrm:
     - fix race condition in espintcp_close()
     - always flush state and policy upon NETDEV_UNREGISTER event
 
   - bluetooth:
     - purge error queues in socket destructors
     - fix response to L2CAP_ECRED_CONN_REQ
 
   - eth: mlx5:
     - fix circular locking dependency in dump
     - fix "scheduling while atomic" in IPsec MAC address query
 
   - eth: gve: fix incorrect buffer cleanup for QPL
 
   - eth: team: avoid NETDEV_CHANGEMTU event when unregistering slave
 
   - eth: usb: validate USB endpoints
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmmgYU4SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkLBgQAINazHstJ0DoDkvmwXapRSN0Ffauyd46
 oX6nfeWOT3BzZbAhZHtGgCSs4aULifJWMevtT7pq7a7PgZwMwfa47BugR1G/u5UE
 hCqalNjRTB/U2KmFk6eViKSacD4FvUIAyAMOotn1aEdRRAkBIJnIW/o/ZR9ZUkm0
 5+UigO64aq57+FOc5EQdGjYDcTVdzW12iOZ8ZqwtSATdNd9aC+gn3voRomTEo+Fm
 kQinkFEPAy/YyHGmfpC/z87/RTgkYLpagmsT4ZvBJeNPrIRvFEibSpPNhuzTzg81
 /BW5M8sJmm3XFiTiRp6Blv+0n6HIpKjAZMHn5c9hzX9cxPZQ24EjkXEex9ClaxLd
 OMef79rr1HBwqBTpIlK7xfLKCdT5Iex88s8HxXRB/Psqk9pVP469cSoK6cpyiGiP
 I+4WT0wn9ukTiu/yV2L2byVr1sanlu54P+UBYJpDwqq3lZ1ngWtkJ+SY369jhwAS
 FYIBmUSKhmWz3FEULaGpgPy4m9Fl/fzN8IFh2Buoc/Puq61HH7MAMjRty2ZSFTqj
 gbHrRhlkCRqubytgjsnCDPLoJF4ZYcXtpo/8ogG3641H1I+dN+DyGGVZ/ioswkks
 My1ds0rKqA3BHCmn+pN/qqkuopDCOB95dqOpgDqHG7GePrpa/FJ1guhxexsCd+nL
 Run2RcgDmd+d
 =HBOu
 -----END PGP SIGNATURE-----

Merge tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from IPsec, Bluetooth and netfilter

  Current release - regressions:

   - wifi: fix dev_alloc_name() return value check

   - rds: fix recursive lock in rds_tcp_conn_slots_available

  Current release - new code bugs:

   - vsock: lock down child_ns_mode as write-once

  Previous releases - regressions:

   - core:
      - do not pass flow_id to set_rps_cpu()
      - consume xmit errors of GSO frames

   - netconsole: avoid OOB reads, msg is not nul-terminated

   - netfilter: h323: fix OOB read in decode_choice()

   - tcp: re-enable acceptance of FIN packets when RWIN is 0

   - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb().

   - wifi: brcmfmac: fix potential kernel oops when probe fails

   - phy: register phy led_triggers during probe to avoid AB-BA deadlock

   - eth:
      - bnxt_en: fix deleting of Ntuple filters
      - wan: farsync: fix use-after-free bugs caused by unfinished tasklets
      - xscale: check for PTP support properly

  Previous releases - always broken:

   - tcp: fix potential race in tcp_v6_syn_recv_sock()

   - kcm: fix zero-frag skb in frag_list on partial sendmsg error

   - xfrm:
      - fix race condition in espintcp_close()
      - always flush state and policy upon NETDEV_UNREGISTER event

   - bluetooth:
      - purge error queues in socket destructors
      - fix response to L2CAP_ECRED_CONN_REQ

   - eth:
      - mlx5:
         - fix circular locking dependency in dump
         - fix "scheduling while atomic" in IPsec MAC address query
      - gve: fix incorrect buffer cleanup for QPL
      - team: avoid NETDEV_CHANGEMTU event when unregistering slave
      - usb: validate USB endpoints"

* tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  netfilter: nf_conntrack_h323: fix OOB read in decode_choice()
  dpaa2-switch: validate num_ifs to prevent out-of-bounds write
  net: consume xmit errors of GSO frames
  vsock: document write-once behavior of the child_ns_mode sysctl
  vsock: lock down child_ns_mode as write-once
  selftests/vsock: change tests to respect write-once child ns mode
  net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query
  net/mlx5: Fix missing devlink lock in SRIOV enable error path
  net/mlx5: E-switch, Clear legacy flag when moving to switchdev
  net/mlx5: LAG, disable MPESW in lag_disable_change()
  net/mlx5: DR, Fix circular locking dependency in dump
  selftests: team: Add a reference count leak test
  team: avoid NETDEV_CHANGEMTU event when unregistering slave
  net: mana: Fix double destroy_workqueue on service rescan PCI path
  MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER
  dpll: zl3073x: Remove redundant cleanup in devm_dpll_init()
  selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0
  tcp: re-enable acceptance of FIN packets when RWIN is 0
  vsock: Use container_of() to get net namespace in sysctl handlers
  net: usb: kaweth: validate USB endpoints
  ...
2026-02-26 08:00:13 -08:00
Vahagn Vardanian
baed0d9ba9 netfilter: nf_conntrack_h323: fix OOB read in decode_choice()
In decode_choice(), the boundary check before get_len() uses the
variable `len`, which is still 0 from its initialization at the top of
the function:

    unsigned int type, ext, len = 0;
    ...
    if (ext || (son->attr & OPEN)) {
        BYTE_ALIGN(bs);
        if (nf_h323_error_boundary(bs, len, 0))  /* len is 0 here */
            return H323_ERROR_BOUND;
        len = get_len(bs);                        /* OOB read */

When the bitstream is exactly consumed (bs->cur == bs->end), the check
nf_h323_error_boundary(bs, 0, 0) evaluates to (bs->cur + 0 > bs->end),
which is false.  The subsequent get_len() call then dereferences
*bs->cur++, reading 1 byte past the end of the buffer.  If that byte
has bit 7 set, get_len() reads a second byte as well.

This can be triggered remotely by sending a crafted Q.931 SETUP message
with a User-User Information Element containing exactly 2 bytes of
PER-encoded data ({0x08, 0x00}) to port 1720 through a firewall with
the nf_conntrack_h323 helper active.  The decoder fully consumes the
PER buffer before reaching this code path, resulting in a 1-2 byte
heap-buffer-overflow read confirmed by AddressSanitizer.

Fix this by checking for 2 bytes (the maximum that get_len() may read)
instead of the uninitialized `len`.  This matches the pattern used at
every other get_len() call site in the same file, where the caller
checks for 2 bytes of available data before calling get_len().

Fixes: ec8a8f3c31 ("netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well")
Signed-off-by: Vahagn Vardanian <vahagn@redrays.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260225130619.1248-2-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26 12:50:42 +01:00
Jakub Kicinski
7aa767d0d3 net: consume xmit errors of GSO frames
udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in the same exact way, TCP GRO
test stalls occasionally and the test gets killed after 10min.

These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.

Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle
is that senders rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).

In this state, sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
Receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.

The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:

  -------------------------------------------------
  |   GSO super frame 1   |   GSO super frame 2   |
  |-----------------------------------------------|
  | seg | seg | seg | seg | seg | seg | seg | seg |
  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
  -------------------------------------------------
     x    ok    ok    <ok>|  ok    ok    ok   <x>
                          \\
			   snd_nxt

"x" means packet lost by veth, and "ok" means it went thru.
Since veth has TSO disabled in this test it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.

So why did the sender not advance snd_nxt even tho it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segment the crux
of the problem is that loss of a single segment can be interpreted
as loss of all. TCP only sees the last return code for the last
segment of the GSO frame (in <> brackets in the diagram above).

Of course for the problem to occur we need a setup or a device
without a Qdisc. Otherwise Qdisc layer disconnects the protocol
layer from the device errors completely.

We have multiple ways to fix this.

 1) make veth not return an error when it lost a packet.
    While this is what I think we did in the past, the issue keeps
    reappearing and it's annoying to debug. The game of whack
    a mole is not great.

 2) fix the damn return codes
    We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
    documentation, so maybe we should make the return code from
    ndo_start_xmit() a boolean. I like that the most, but perhaps
    some ancient, not-really-networking protocol would suffer.

 3) make TCP ignore the errors
    It is not entirely clear to me what benefit TCP gets from
    interpreting the result of ip_queue_xmit()? Specifically once
    the connection is established and we're pushing data - packet
    loss is just packet loss?

 4) this fix
    Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
    We already always return OK in the TCQ_F_CAN_BYPASS case.
    In the Qdisc-less case let's be a bit more conservative and only
    mask the GSO errors. This path is taken by non-IP-"networks"
    like CAN, MCTP etc, so we could regress some ancient thing.
    This is the simplest, but also maybe the hackiest fix?

Similar fix has been proposed by Eric in the past but never committed
because original reporter was working with an OOT driver and wasn't
providing feedback (see Link).

Link: https://lore.kernel.org/CANn89iJcLepEin7EtBETrZ36bjoD9LrR=k4cfwWh046GB+4f9A@mail.gmail.com
Fixes: 1f59533f9c ("qdisc: validate frames going through the direct_xmit path")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260223235100.108939-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26 11:35:00 +01:00
Bobby Eshleman
102eab95f0 vsock: lock down child_ns_mode as write-once
Two administrator processes may race when setting child_ns_mode as one
process sets child_ns_mode to "local" and then creates a namespace, but
another process changes child_ns_mode to "global" between the write and
the namespace creation. The first process ends up with a namespace in
"global" mode instead of "local". While this can be detected after the
fact by reading ns_mode and retrying, it is fragile and error-prone.

Make child_ns_mode write-once so that a namespace manager can set it
once and be sure it won't change. Writing a different value after the
first write returns -EBUSY. This applies to all namespaces, including
init_net, where an init process can write "local" to lock all future
namespaces into local mode.

Fixes: eafb64f40c ("vsock: add netns to vsock core")
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26 11:10:03 +01:00
Jakub Kicinski
6668c6f2dd A good number of fixes:
- cfg80211:
    - cancel rfkill work appropriately
    - fix radiotap parsing to correctly reject field 18
    - fix wext (yes...) off-by-one for IGTK key ID
  - mac80211:
    - fix for mesh NULL pointer dereference
    - fix for stack out-of-bounds (2 bytes) write on
      specific multi-link action frames
    - set default WMM parameters for all links
  - mwifiex: check dev_alloc_name() return value correctly
  - libertas: fix potential timer use-after-free
  - brcmfmac: fix crash on probe failure
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmme3O0ACgkQ10qiO8sP
 aAAhBA//UhqBeXsJd7dfSfGcz4ztzw/m4BDDxwWhJd0wq/ZHVwGvLfOXN1lXG1yR
 OsMaSQkT8UGv4NI0V/+7vcKlTvCe0oF0RPyzNtGL8CCYASyM0WbD6EqqpaLKdBIE
 Qg/PQ3n7mtPiKHYz9fmL/Yku8uNvHaYJ18HIki9Zn1kgcKvJegf4VqYoMa4m5zK3
 ShaNERSsrks2cgBQGwRMxNDfmbn2lr/YnyavFd+RoOdlIjN4FiU7zelgeCKapL6B
 URkn/NTp92ga3zcb5b57K3fjHucSKc7Lvf7l/ie5m8tw+Omr7zooBzjvtUzd6lfy
 gIFaPUuiKe3Zzq8fUKqgdSivyVOv6VdX6ieKi+mS0CkhfURqQUwNTZPM1Cn5MAkt
 lOPwaBpO7iZ2pP56jr29sEXz2komhTZLDv4bssrPvH6si6zToSd+wY10b6hESfTw
 wQBxdZl/YqnzngaojQhKTwlQRYATp1h60yEj2SKXpx+DMCtNkAmfxDhAzBCuIaDI
 eggswVy97Fn11WuDF3d8nthgyULrAzaK9LIGDCGObHZQYqROJmXtyNyeCmJJHvM7
 5/4l61H2nfMIymcSItVo/0ZQKmgiaSeU3t7Arp13uX6jbiWEbmGcdV35fmorwq+u
 p9Y3ay8o5yWfpb/XKx7mdurFBrYXTwry7xlaOkUzqCuEhRNRbTU=
 =VLWl
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2026-02-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
A good number of fixes:
 - cfg80211:
   - cancel rfkill work appropriately
   - fix radiotap parsing to correctly reject field 18
   - fix wext (yes...) off-by-one for IGTK key ID
 - mac80211:
   - fix for mesh NULL pointer dereference
   - fix for stack out-of-bounds (2 bytes) write on
     specific multi-link action frames
   - set default WMM parameters for all links
 - mwifiex: check dev_alloc_name() return value correctly
 - libertas: fix potential timer use-after-free
 - brcmfmac: fix crash on probe failure

* tag 'wireless-2026-02-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: fix NULL pointer dereference in mesh_rx_csa_frame()
  wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration
  wifi: mac80211: set default WMM parameters on all links
  wifi: libertas: fix use-after-free in lbs_free_adapter()
  wifi: mwifiex: Fix dev_alloc_name() return value check
  wifi: brcmfmac: Fix potential kernel oops when probe fails
  wifi: radiotap: reject radiotap with unknown bits
  wifi: cfg80211: cancel rfkill_block work in wiphy_unregister()
  wifi: cfg80211: wext: fix IGTK key ID off-by-one
====================

Link: https://patch.msgid.link/20260225113159.360574-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:54:28 -08:00
Simon Baatz
1e3bb184e9 tcp: re-enable acceptance of FIN packets when RWIN is 0
Commit 2bd99aef1b ("tcp: accept bare FIN packets under memory
pressure") allowed accepting FIN packets in tcp_data_queue() even when
the receive window was closed, to prevent ACK/FIN loops with broken
clients.

Such a FIN packet is in sequence, but because the FIN consumes a
sequence number, it extends beyond the window. Before commit
9ca48d616e ("tcp: do not accept packets beyond window"),
tcp_sequence() only required the seq to be within the window. After
that change, the entire packet (including the FIN) must fit within the
window. As a result, such FIN packets are now dropped and the handling
path is no longer reached.

Be more lenient by not counting the sequence number consumed by the
FIN when calling tcp_sequence(), restoring the previous behavior for
cases where only the FIN extends beyond the window.

Fixes: 9ca48d616e ("tcp: do not accept packets beyond window")
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-1-a16677ea7cea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:07:02 -08:00
Greg Kroah-Hartman
5cc619583c vsock: Use container_of() to get net namespace in sysctl handlers
current->nsproxy is should not be accessed directly as syzbot has found
that it could be NULL at times, causing crashes.  Fix up the af_vsock
sysctl handlers to use container_of() to deal with the current net
namespace instead of attempting to rely on current.

This is the same type of change done in commit 7f5611cbc4 ("rds:
sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy")

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Fixes: eafb64f40c ("vsock: add netns to vsock core")
Link: https://patch.msgid.link/2026022318-rearview-gallery-ae13@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 18:59:18 -08:00
Paolo Abeni
1348659dc9 bluetooth pull request for net:
- purge error queues in socket destructors
  - hci_sync: Fix CIS host feature condition
  - L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
  - L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
  - L2CAP: Fix response to L2CAP_ECRED_CONN_REQ
  - L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ
  - L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ
  - hci_qca: Cleanup on all setup failures
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmmcw1EZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKUTyD/4jtQwDrveC19zamF5n7lFY
 Oils6eftANcLFzLwTrMqGO7IxESga4qdNOf2vc/UgVSUfNqsPIUJ5El+LzpXZXAa
 sYBP/KudEX53CfU3fEVyPTUaWkZ4CdMRZeiCmgXqW7GxYbGw92SFuaSIHAP6Ep4s
 Z7Ryd1H0xhX9QPMc4g4IgoMiBiKzNs4GtlLSbDJcivAtbC/34nkMOxK9g+1DbU0F
 qzW+oPfYCpPzXTf20I1QIAMt5smnSM3Tuvo9u2pZRuEGpKjENxeY4hdAejfjeKA6
 RLWXm6JvMP2lUBT68plMQQdYyQ8DxG75sVjgSoQYIu2YTVnsX76t/kD2hhiHXH/Y
 nQoy4dtA1/5V7Ka0cfMhcvino4Rb9Gh3dsFKJOuWRT+aTY+gNhpyr56SuJh24Y3C
 7tUeEDI4fBkJGaRAbreVbaI5vw4kbSfi7IDOM/ccWDSLaG8HGaLOtn0IU8q4AgMa
 IkYzB5zwtiyM/zaSTO1k0HkpjR0wwftnTd+Fj2mUWdTwSeek64R9enmKYmg5UJrv
 14yhfLHFsbAQo+o1B3ZslnCdYQJpgFmyAInV6Jpunc78IE9+g/YA55K22JbDDSzI
 t9Zy25OWLyYZyuD1PzDkMlYU5OARNYeyRXbJ3w037LrpqRoEuFsK0qTmgi+kR9C7
 VR9IpCqgf4SJbL7ge83H8g==
 =JBaa
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - purge error queues in socket destructors
 - hci_sync: Fix CIS host feature condition
 - L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
 - L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
 - L2CAP: Fix response to L2CAP_ECRED_CONN_REQ
 - L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ
 - L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ
 - hci_qca: Cleanup on all setup failures

* tag 'for-net-2026-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ
  Bluetooth: L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ
  Bluetooth: Fix CIS host feature condition
  Bluetooth: L2CAP: Fix response to L2CAP_ECRED_CONN_REQ
  Bluetooth: hci_qca: Cleanup on all setup failures
  Bluetooth: purge error queues in socket destructors
  Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
  Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
====================

Link: https://patch.msgid.link/20260223211634.3800315-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24 15:03:08 +01:00
Sebastian Andrzej Siewior
983512f3a8 net: Drop the lock in skb_may_tx_timestamp()
skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
not be taken in IRQ context, only softirq is okay. A few drivers receive
the timestamp via a dedicated interrupt and complete the TX timestamp
from that handler. This will lead to a deadlock if the lock is already
write-locked on the same CPU.

Taking the lock can be avoided. The socket (pointed by the skb) will
remain valid until the skb is released. The ->sk_socket and ->file
member will be set to NULL once the user closes the socket which may
happen before the timestamp arrives.
If we happen to observe the pointer while the socket is closing but
before the pointer is set to NULL then we may use it because both
pointer (and the file's cred member) are RCU freed.

Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
matching WRITE_ONCE() where the pointer are cleared.

Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de
Fixes: b245be1f4d ("net-timestamp: no-payload only sysctl")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260220183858.N4ERjFW6@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24 11:27:29 +01:00
Fernando Fernandez Mancera
021fd0f870 net/rds: fix recursive lock in rds_tcp_conn_slots_available
syzbot reported a recursive lock warning in rds_tcp_get_peer_sport() as
it calls inet6_getname() which acquires the socket lock that was already
held by __release_sock().

 kworker/u8:6/2985 is trying to acquire lock:
 ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
 ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: inet6_getname+0x15d/0x650 net/ipv6/af_inet6.c:533

 but task is already holding lock:
 ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
 ffff88807a07aa20 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_sock_set_cork+0x2c/0x2e0 net/ipv4/tcp.c:3694
   lock_sock_nested+0x48/0x100 net/core/sock.c:3780
   lock_sock include/net/sock.h:1709 [inline]
   inet6_getname+0x15d/0x650 net/ipv6/af_inet6.c:533
   rds_tcp_get_peer_sport net/rds/tcp_listen.c:70 [inline]
   rds_tcp_conn_slots_available+0x288/0x470 net/rds/tcp_listen.c:149
   rds_recv_hs_exthdrs+0x60f/0x7c0 net/rds/recv.c:265
   rds_recv_incoming+0x9f6/0x12d0 net/rds/recv.c:389
   rds_tcp_data_recv+0x7f1/0xa40 net/rds/tcp_recv.c:243
   __tcp_read_sock+0x196/0x970 net/ipv4/tcp.c:1702
   rds_tcp_read_sock net/rds/tcp_recv.c:277 [inline]
   rds_tcp_data_ready+0x369/0x950 net/rds/tcp_recv.c:331
   tcp_rcv_established+0x19e9/0x2670 net/ipv4/tcp_input.c:6675
   tcp_v6_do_rcv+0x8eb/0x1ba0 net/ipv6/tcp_ipv6.c:1609
   sk_backlog_rcv include/net/sock.h:1185 [inline]
   __release_sock+0x1b8/0x3a0 net/core/sock.c:3213

Reading from the socket struct directly is safe from possible paths. For
rds_tcp_accept_one(), the socket has just been accepted and is not yet
exposed to concurrent access. For rds_tcp_conn_slots_available(), direct
access avoids the recursive deadlock seen during backlog processing
where the socket lock is already held from the __release_sock().

However, rds_tcp_conn_slots_available() is also called from the normal
softirq path via tcp_data_ready() where the lock is not held. This is
also safe because inet_dport is a stable 16 bits field. A READ_ONCE()
annotation as the value might be accessed lockless in a concurrent
access context.

Note that it is also safe to call rds_tcp_conn_slots_available() from
rds_conn_shutdown() because the fan-out is disabled.

Fixes: 9d27a0fb12 ("net/rds: Trigger rds_send_ping() more than once")
Reported-by: syzbot+5efae91f60932839f0a5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5efae91f60932839f0a5
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260219075738.4403-1-fmancera@suse.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-24 10:11:04 +01:00
Vahagn Vardanian
017c179252 wifi: mac80211: fix NULL pointer dereference in mesh_rx_csa_frame()
In mesh_rx_csa_frame(), elems->mesh_chansw_params_ie is dereferenced
at lines 1638 and 1642 without a prior NULL check:

    ifmsh->chsw_ttl = elems->mesh_chansw_params_ie->mesh_ttl;
    ...
    pre_value = le16_to_cpu(elems->mesh_chansw_params_ie->mesh_pre_value);

The mesh_matches_local() check above only validates the Mesh ID,
Mesh Configuration, and Supported Rates IEs.  It does not verify the
presence of the Mesh Channel Switch Parameters IE (element ID 118).
When a received CSA action frame omits that IE, ieee802_11_parse_elems()
leaves elems->mesh_chansw_params_ie as NULL, and the unconditional
dereference causes a kernel NULL pointer dereference.

A remote mesh peer with an established peer link (PLINK_ESTAB) can
trigger this by sending a crafted SPECTRUM_MGMT/CHL_SWITCH action frame
that includes a matching Mesh ID and Mesh Configuration IE but omits the
Mesh Channel Switch Parameters IE.  No authentication beyond the default
open mesh peering is required.

Crash confirmed on kernel 6.17.0-5-generic via mac80211_hwsim:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  Oops: Oops: 0000 [#1] SMP NOPTI
  RIP: 0010:ieee80211_mesh_rx_queued_mgmt+0x143/0x2a0 [mac80211]
  CR2: 0000000000000000

Fix by adding a NULL check for mesh_chansw_params_ie after
mesh_matches_local() returns, consistent with how other optional IEs
are guarded throughout the mesh code.

The bug has been present since v3.13 (released 2014-01-19).

Fixes: 8f2535b92d ("mac80211: process the CSA frame for mesh accordingly")
Cc: stable@vger.kernel.org
Signed-off-by: Vahagn Vardanian <vahagn@redrays.io>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-02-24 10:03:10 +01:00
Tung Nguyen
3aa677625c tipc: fix duplicate publication key in tipc_service_insert_publ()
TIPC uses named table to store TIPC services represented by type and
instance. Each time an application calls TIPC API bind() to bind a
type/instance to a socket, an entry is created and inserted into the
named table. It looks like this:

named table:
key1, entry1 (type, instance ...)
key2, entry2 (type, instance ...)

In the above table, each entry represents a route for sending data
from one socket to the other. For all publications originated from
the same node, the key is UNIQUE to identify each entry.
It is calculated by this formula:
key = socket portid + number of bindings + 1 (1)

where:
 - socket portid: unique and calculated by using linux kernel function
                  get_random_u32_below(). So, the value is randomized.
 - number of bindings: the number of times a type/instance pair is bound
                       to a socket. This number is linearly increased,
                       starting from 0.

While the socket portid is unique and randomized by linux kernel, the
linear increment of "number of bindings" in formula (1) makes "key" not
unique anymore. For example:
- Socket 1 is created with its associated port number 20062001. Type 1000,
instance 1 is bound to socket 1:
key1: 20062001 + 0 + 1 = 20062002

Then, bind() is called a second time on Socket 1 to by the same type 1000,
instance 1:
key2: 20062001 + 1 + 1 = 20062003

Named table:
key1 (20062002), entry1 (1000, 1 ...)
key2 (20062003), entry2 (1000, 1 ...)

- Socket 2 is created with its associated port number 20062002. Type 1000,
instance 1 is bound to socket 2:
key3: 20062002 + 0 + 1 = 20062003

TIPC looks up the named table and finds out that key2 with the same value
already exists and rejects the insertion into the named table.
This leads to failure of bind() call from application on Socket 2 with error
message EINVAL "Invalid argument".

This commit fixes this issue by adding more port id checking to make sure
that the key is unique to publications originated from the same port id
and node.

Fixes: 218527fe27 ("tipc: replace name table service range array with rb tree")
Signed-off-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260220050541.237962-1-tung.quang.nguyen@est.tech
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-23 17:40:52 -08:00
Jiayuan Chen
ca220141fa kcm: fix zero-frag skb in frag_list on partial sendmsg error
Syzkaller reported a warning in kcm_write_msgs() when processing a
message with a zero-fragment skb in the frag_list.

When kcm_sendmsg() fills MAX_SKB_FRAGS fragments in the current skb,
it allocates a new skb (tskb) and links it into the frag_list before
copying data. If the copy subsequently fails (e.g. -EFAULT from
user memory), tskb remains in the frag_list with zero fragments:

  head skb (msg being assembled, NOT yet in sk_write_queue)
  +-----------+
  | frags[17] |  (MAX_SKB_FRAGS, all filled with data)
  | frag_list-+--> tskb
  +-----------+    +----------+
                   | frags[0] |  (empty! copy failed before filling)
                   +----------+

For SOCK_SEQPACKET with partial data already copied, the error path
saves this message via partial_message for later completion. For
SOCK_SEQPACKET, sock_write_iter() automatically sets MSG_EOR, so a
subsequent zero-length write(fd, NULL, 0) completes the message and
queues it to sk_write_queue. kcm_write_msgs() then walks the
frag_list and hits:

  WARN_ON(!skb_shinfo(skb)->nr_frags)

TCP has a similar pattern where skbs are enqueued before data copy
and cleaned up on failure via tcp_remove_empty_skb(). KCM was
missing the equivalent cleanup.

Fix this by tracking the predecessor skb (frag_prev) when allocating
a new frag_list entry. On error, if the tail skb has zero frags,
use frag_prev to unlink and free it in O(1) without walking the
singly-linked frag_list. frag_prev is safe to dereference because
the entire message chain is only held locally (or in kcm->seq_skb)
and is not added to sk_write_queue until MSG_EOR, so the send path
cannot free it underneath us.

Also change the WARN_ON to WARN_ON_ONCE to avoid flooding the log
if the condition is somehow hit repeatedly.

There are currently no KCM selftests in the kernel tree; a simple
reproducer is available at [1].

[1] https://gist.github.com/mrpre/a94d431c757e8d6f168f4dd1a3749daa

Reported-by: syzbot+52624bdfbf2746d37d70@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000269a1405a12fdc77@google.com/T/
Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260219014256.370092-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-23 17:26:55 -08:00
Hyunwoo Kim
7bb09315f9 tls: Fix race condition in tls_sw_cancel_work_tx()
This issue was discovered during a code audit.

After cancel_delayed_work_sync() is called from tls_sk_proto_close(),
tx_work_handler() can still be scheduled from paths such as the
Delayed ACK handler or ksoftirqd.
As a result, the tx_work_handler() worker may dereference a freed
TLS object.

The following is a simple race scenario:

          cpu0                         cpu1

tls_sk_proto_close()
  tls_sw_cancel_work_tx()
                                 tls_write_space()
                                   tls_sw_write_space()
                                     if (!test_and_set_bit(BIT_TX_SCHEDULED, &tx_ctx->tx_bitmask))
    set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask);
    cancel_delayed_work_sync(&ctx->tx_work.work);
                                     schedule_delayed_work(&tx_ctx->tx_work.work, 0);

To prevent this race condition, cancel_delayed_work_sync() is
replaced with disable_delayed_work_sync().

Fixes: f87e62d45e ("net/tls: remove close callback sock unlock/lock around TX work flush")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/aZgsFO6nfylfvLE7@v4bel
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-23 17:08:14 -08:00
Eric Dumazet
8a8a9fac9e net: do not pass flow_id to set_rps_cpu()
Blamed commit made the assumption that the RPS table for each receive
queue would have the same size, and that it would not change.

Compute flow_id in set_rps_cpu(), do not assume we can use the value
computed by get_rps_cpu(). Otherwise we risk out-of-bound access
and/or crashes.

Fixes: 48aa30443e ("net: Cache hash and flow_id to avoid recalculation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Krishna Kumar <krikku@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260220222605.3468081-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-23 17:07:34 -08:00
Luiz Augusto von Dentz
138d7eca44 Bluetooth: L2CAP: Fix missing key size check for L2CAP_LE_CONN_REQ
This adds a check for encryption key size upon receiving
L2CAP_LE_CONN_REQ which is required by L2CAP/LE/CFC/BV-15-C which
expects L2CAP_CR_LE_BAD_KEY_SIZE.

Link: https://lore.kernel.org/linux-bluetooth/5782243.rdbgypaU67@n9w6sw14/
Fixes: 27e2d4c8d2 ("Bluetooth: Add basic LE L2CAP connect request receiving support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Christian Eggers <ceggers@arri.de>
2026-02-23 16:08:15 -05:00
Luiz Augusto von Dentz
a8d1d73c81 Bluetooth: L2CAP: Fix not checking output MTU is acceptable on L2CAP_ECRED_CONN_REQ
Upon receiving L2CAP_ECRED_CONN_REQ the given MTU shall be checked
against the suggested MTU of the listening socket as that is required
by the likes of PTS L2CAP/ECFC/BV-27-C test which expects
L2CAP_CR_LE_UNACCEPT_PARAMS if the MTU is lowers than socket omtu.

In order to be able to set chan->omtu the code now allows setting
setsockopt(BT_SNDMTU), but it is only allowed when connection has not
been stablished since there is no procedure to reconfigure the output
MTU.

Link: https://github.com/bluez/bluez/issues/1895
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 16:07:55 -05:00
Mariusz Skamra
7cff9a40c6 Bluetooth: Fix CIS host feature condition
This fixes the condition for sending the LE Set Host Feature command.
The command is sent to indicate host support for Connected Isochronous
Streams in this case. It has been observed that the system could not
initialize BIS-only capable controllers because the controllers do not
support the command.

As per Core v6.2 | Vol 4, Part E, Table 3.1 the command shall be
supported if CIS Central or CIS Peripheral is supported; otherwise,
the command is optional.

Fixes: 709788b154 ("Bluetooth: hci_core: Fix using {cis,bis}_capable for current settings")
Cc: stable@vger.kernel.org
Signed-off-by: Mariusz Skamra <mariusz.skamra@codecoup.pl>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 16:07:37 -05:00
Luiz Augusto von Dentz
05761c2c2b Bluetooth: L2CAP: Fix response to L2CAP_ECRED_CONN_REQ
Similar to 03dba9cea7 ("Bluetooth: L2CAP: Fix not responding with
L2CAP_CR_LE_ENCRYPTION") the result code L2CAP_CR_LE_ENCRYPTION shall
be used when BT_SECURITY_MEDIUM is set since that means security mode 2
which mean it doesn't require authentication which results in
qualification test L2CAP/ECFC/BV-32-C failing.

Link: https://github.com/bluez/bluez/issues/1871
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 15:31:10 -05:00
Heitor Alves de Siqueira
21e4271e65 Bluetooth: purge error queues in socket destructors
When TX timestamping is enabled via SO_TIMESTAMPING, SKBs may be queued
into sk_error_queue and will stay there until consumed. If userspace never
gets to read the timestamps, or if the controller is removed unexpectedly,
these SKBs will leak.

Fix by adding skb_queue_purge() calls for sk_error_queue in affected
bluetooth destructors. RFCOMM does not currently use sk_error_queue.

Fixes: 134f4b39df ("Bluetooth: add support for skb TX SND/COMPLETION timestamping")
Reported-by: syzbot+7ff4013eabad1407b70a@syzkaller.appspotmail.com
Closes: https://syzbot.org/bug?extid=7ff4013eabad1407b70a
Cc: stable@vger.kernel.org
Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 15:30:16 -05:00
Luiz Augusto von Dentz
c28d2bff70 Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short
Test L2CAP/ECFC/BV-26-C expect the response to L2CAP_ECRED_CONN_REQ with
and MTU value < L2CAP_ECRED_MIN_MTU (64) to be L2CAP_CR_LE_INVALID_PARAMS
rather than L2CAP_CR_LE_UNACCEPT_PARAMS.

Also fix not including the correct number of CIDs in the response since
the spec requires all CIDs being rejected to be included in the
response.

Link: https://github.com/bluez/bluez/issues/1868
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 15:28:56 -05:00
Luiz Augusto von Dentz
7accb1c432 Bluetooth: L2CAP: Fix invalid response to L2CAP_ECRED_RECONF_REQ
This fixes responding with an invalid result caused by checking the
wrong size of CID which should have been (cmd_len - sizeof(*req)) and
on top of it the wrong result was use L2CAP_CR_LE_INVALID_PARAMS which
is invalid/reserved for reconf when running test like L2CAP/ECFC/BI-03-C:

> ACL Data RX: Handle 64 flags 0x02 dlen 14
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
        MTU: 64
        MPS: 64
        Source CID: 64
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!        Result: Reserved (0x000c)
         Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003)

Fiix L2CAP/ECFC/BI-04-C which expects L2CAP_RECONF_INVALID_MPS (0x0002)
when more than one channel gets its MPS reduced:

> ACL Data RX: Handle 64 flags 0x02 dlen 16
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 8
        MTU: 264
        MPS: 99
        Source CID: 64
!       Source CID: 65
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!        Result: Reconfiguration successful (0x0000)
         Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)

Fix L2CAP/ECFC/BI-05-C when SCID is invalid (85 unconnected):

> ACL Data RX: Handle 64 flags 0x02 dlen 14
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
        MTU: 65
        MPS: 64
!        Source CID: 85
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!        Result: Reconfiguration successful (0x0000)
         Result: Reconfiguration failed - one or more Destination CIDs invalid (0x0003)

Fix L2CAP/ECFC/BI-06-C when MPS < L2CAP_ECRED_MIN_MPS (64):

> ACL Data RX: Handle 64 flags 0x02 dlen 14
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 2 len 6
        MTU: 672
!       MPS: 63
        Source CID: 64
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!       Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)
        Result: Reconfiguration failed - other unacceptable parameters (0x0004)

Fix L2CAP/ECFC/BI-07-C when MPS reduced for more than one channel:

> ACL Data RX: Handle 64 flags 0x02 dlen 16
      LE L2CAP: Enhanced Credit Reconfigure Request (0x19) ident 3 len 8
        MTU: 84
!       MPS: 71
        Source CID: 64
!        Source CID: 65
< ACL Data TX: Handle 64 flags 0x00 dlen 10
      LE L2CAP: Enhanced Credit Reconfigure Respond (0x1a) ident 2 len 2
!       Result: Reconfiguration successful (0x0000)
        Result: Reconfiguration failed - reduction in size of MPS not allowed for more than one channel at a time (0x0002)

Link: https://github.com/bluez/bluez/issues/1865
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-02-23 15:23:37 -05:00
Ariel Silver
162d331d83 wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration
link_id is taken from the ML Reconfiguration element (control & 0x000f),
so it can be 0..15. link_removal_timeout[] has IEEE80211_MLD_MAX_NUM_LINKS
(15) elements, so index 15 is out-of-bounds. Skip subelements with
link_id >= IEEE80211_MLD_MAX_NUM_LINKS to avoid a stack out-of-bounds
write.

Fixes: 8eb8dd2ffb ("wifi: mac80211: Support link removal using Reconfiguration ML element")
Reported-by: Ariel Silver <arielsilver77@gmail.com>
Signed-off-by: Ariel Silver <arielsilver77@gmail.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260220101129.1202657-1-Ariel.Silver@cybereason.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-02-23 12:35:34 +01:00