The driver obtains sw_attr.num_ifs from firmware via dpsw_get_attributes()
but never validates it against DPSW_MAX_IF (64). This value controls
iteration in dpaa2_switch_fdb_get_flood_cfg(), which writes port indices
into the fixed-size cfg->if_id[DPSW_MAX_IF] array. When firmware reports
num_ifs >= 64, the loop can write past the array bounds.
Add a bound check for num_ifs in dpaa2_switch_init().
dpaa2_switch_fdb_get_flood_cfg() appends the control interface (port
num_ifs) after all matched ports. When num_ifs == DPSW_MAX_IF and all
ports match the flood filter, the loop fills all 64 slots and the control
interface write overflows by one entry.
The check uses >= because num_ifs == DPSW_MAX_IF is also functionally
broken.
build_if_id_bitmap() silently drops any ID >= 64:
if (id[i] < DPSW_MAX_IF)
bmap[id[i] / 64] |= ...
Fixes: 539dda3c5d ("staging: dpaa2-switch: properly setup switching domains")
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/SYBPR01MB78812B47B7F0470B617C408AAF74A@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix a "scheduling while atomic" bug in mlx5e_ipsec_init_macs() by
replacing mlx5_query_mac_address() with ether_addr_copy() to get the
local MAC address directly from netdev->dev_addr.
The issue occurs because mlx5_query_mac_address() queries the hardware
which involves mlx5_cmd_exec() that can sleep, but it is called from
the mlx5e_ipsec_handle_event workqueue which runs in atomic context.
The MAC address is already available in netdev->dev_addr, so no need
to query hardware. This avoids the sleeping call and resolves the bug.
Call trace:
BUG: scheduling while atomic: kworker/u112:2/69344/0x00000200
__schedule+0x7ab/0xa20
schedule+0x1c/0xb0
schedule_timeout+0x6e/0xf0
__wait_for_common+0x91/0x1b0
cmd_exec+0xa85/0xff0 [mlx5_core]
mlx5_cmd_exec+0x1f/0x50 [mlx5_core]
mlx5_query_nic_vport_mac_address+0x7b/0xd0 [mlx5_core]
mlx5_query_mac_address+0x19/0x30 [mlx5_core]
mlx5e_ipsec_init_macs+0xc1/0x720 [mlx5_core]
mlx5e_ipsec_build_accel_xfrm_attrs+0x422/0x670 [mlx5_core]
mlx5e_ipsec_handle_event+0x2b9/0x460 [mlx5_core]
process_one_work+0x178/0x2e0
worker_thread+0x2ea/0x430
Fixes: cee137a634 ("net/mlx5e: Handle ESN update events")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cited commit miss to add locking in the error path of
mlx5_sriov_enable(). When pci_enable_sriov() fails,
mlx5_device_disable_sriov() is called to clean up. This cleanup function
now expects to be called with the devlink instance lock held.
Add the missing devl_lock(devlink) and devl_unlock(devlink)
Fixes: 84a433a40d ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cited commit introduced MLX5_PRIV_FLAGS_SWITCH_LEGACY to identify
when a transition to legacy mode is requested via devlink. However, the
logic failed to clear this flag if the mode was subsequently changed
back to MLX5_ESWITCH_OFFLOADS (switchdev). Consequently, if a user
toggled from legacy to switchdev, the flag remained set, leaving the
driver with wrong state indicating
Fix this by explicitly clearing the MLX5_PRIV_FLAGS_SWITCH_LEGACY bit
when the requested mode is MLX5_ESWITCH_OFFLOADS.
Fixes: 2a4f56fbcc ("net/mlx5e: Keep netdev when leave switchdev for devlink set legacy only")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlx5_lag_disable_change() unconditionally called mlx5_disable_lag() when
LAG was active, which is incorrect for MLX5_LAG_MODE_MPESW.
Hnece, call mlx5_disable_mpesw() when running in MPESW mode.
Fixes: a32327a3a0 ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot is reporting
unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 3
ref_tracker: netdev@ffff88807dcf8618 has 1/2 users at
__netdev_tracker_alloc include/linux/netdevice.h:4400 [inline]
netdev_hold include/linux/netdevice.h:4429 [inline]
inetdev_init+0x201/0x4e0 net/ipv4/devinet.c:286
inetdev_event+0x251/0x1610 net/ipv4/devinet.c:1600
notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
call_netdevice_notifiers_mtu net/core/dev.c:2318 [inline]
netif_set_mtu_ext+0x5aa/0x800 net/core/dev.c:9886
netif_set_mtu+0xd7/0x1b0 net/core/dev.c:9907
dev_set_mtu+0x126/0x260 net/core/dev_api.c:248
team_port_del+0xb07/0xcb0 drivers/net/team/team_core.c:1333
team_del_slave drivers/net/team/team_core.c:1936 [inline]
team_device_event+0x207/0x5b0 drivers/net/team/team_core.c:2929
notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2281 [inline]
call_netdevice_notifiers net/core/dev.c:2295 [inline]
__dev_change_net_namespace+0xcb7/0x2050 net/core/dev.c:12592
do_setlink+0x2ce/0x4590 net/core/rtnetlink.c:3060
rtnl_changelink net/core/rtnetlink.c:3776 [inline]
__rtnl_newlink net/core/rtnetlink.c:3935 [inline]
rtnl_newlink+0x15a9/0x1be0 net/core/rtnetlink.c:4072
rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
problem. Ido Schimmel found steps to reproduce
ip link add name team1 type team
ip link add name dummy1 mtu 1499 master team1 type dummy
ip netns add ns1
ip link set dev dummy1 netns ns1
ip -n ns1 link del dev dummy1
and also found that the same issue was fixed in the bond driver in
commit f51048c3e0 ("bonding: avoid NETDEV_CHANGEMTU event when
unregistering slave").
Let's do similar thing for the team driver, with commit ad7c7b2172 ("net:
hold netdev instance lock during sysfs operations") and commit 303a8487a6
("net: s/__dev_set_mtu/__netif_set_mtu/") also applied.
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260224125709.317574-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While testing corner cases in the driver, a use-after-free crash
was found on the service rescan PCI path.
When mana_serv_reset() calls mana_gd_suspend(), mana_gd_cleanup()
destroys gc->service_wq. If the subsequent mana_gd_resume() fails
with -ETIMEDOUT or -EPROTO, the code falls through to
mana_serv_rescan() which triggers pci_stop_and_remove_bus_device().
This invokes the PCI .remove callback (mana_gd_remove), which calls
mana_gd_cleanup() a second time, attempting to destroy the already-
freed workqueue. Fix this by NULL-checking gc->service_wq in
mana_gd_cleanup() and setting it to NULL after destruction.
Call stack of issue for reference:
[Sat Feb 21 18:53:48 2026] Call Trace:
[Sat Feb 21 18:53:48 2026] <TASK>
[Sat Feb 21 18:53:48 2026] mana_gd_cleanup+0x33/0x70 [mana]
[Sat Feb 21 18:53:48 2026] mana_gd_remove+0x3a/0xc0 [mana]
[Sat Feb 21 18:53:48 2026] pci_device_remove+0x41/0xb0
[Sat Feb 21 18:53:48 2026] device_remove+0x46/0x70
[Sat Feb 21 18:53:48 2026] device_release_driver_internal+0x1e3/0x250
[Sat Feb 21 18:53:48 2026] device_release_driver+0x12/0x20
[Sat Feb 21 18:53:48 2026] pci_stop_bus_device+0x6a/0x90
[Sat Feb 21 18:53:48 2026] pci_stop_and_remove_bus_device+0x13/0x30
[Sat Feb 21 18:53:48 2026] mana_do_service+0x180/0x290 [mana]
[Sat Feb 21 18:53:48 2026] mana_serv_func+0x24/0x50 [mana]
[Sat Feb 21 18:53:48 2026] process_one_work+0x190/0x3d0
[Sat Feb 21 18:53:48 2026] worker_thread+0x16e/0x2e0
[Sat Feb 21 18:53:48 2026] kthread+0xf7/0x130
[Sat Feb 21 18:53:48 2026] ? __pfx_worker_thread+0x10/0x10
[Sat Feb 21 18:53:48 2026] ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026] ret_from_fork+0x269/0x350
[Sat Feb 21 18:53:48 2026] ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026] ret_from_fork_asm+0x1a/0x30
[Sat Feb 21 18:53:48 2026] </TASK>
Fixes: 505cc26bca ("net: mana: Add support for auxiliary device servicing events")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aZ2bzL64NagfyHpg@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kaweth driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://patch.msgid.link/2026022305-substance-virtual-c728@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The kalmia driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: d40261236e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Link: https://patch.msgid.link/2026022326-shack-headstone-ef6f@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The pegasus driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: Petko Manolov <petkan@nucleusys.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022347-legibly-attest-cc5c@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When stmmac_init_timestamping() is called, it clears the receive and
transmit path booleans that allow timestamps to be read. These are
never re-initialised until after userspace requests timestamping
features to be enabled.
However, our copy of the timestamp configuration is not cleared, which
means we return the old configuration to userspace when requested.
This is inconsistent. Fix this by clearing the timestamp configuration.
Fixes: d6228b7cdd ("net: stmmac: implement the SIOCGHWTSTAMP ioctl")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1vuUu4-0000000Afea-0j9B@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is an AB-BA deadlock when both LEDS_TRIGGER_NETDEV and
LED_TRIGGER_PHY are enabled:
[ 1362.049207] [<8054e4b8>] led_trigger_register+0x5c/0x1fc <-- Trying to get lock "triggers_list_lock" via down_write(&triggers_list_lock);
[ 1362.054536] [<80662830>] phy_led_triggers_register+0xd0/0x234
[ 1362.060329] [<8065e200>] phy_attach_direct+0x33c/0x40c
[ 1362.065489] [<80651fc4>] phylink_fwnode_phy_connect+0x15c/0x23c
[ 1362.071480] [<8066ee18>] mtk_open+0x7c/0xba0
[ 1362.075849] [<806d714c>] __dev_open+0x280/0x2b0
[ 1362.080384] [<806d7668>] __dev_change_flags+0x244/0x24c
[ 1362.085598] [<806d7698>] dev_change_flags+0x28/0x78
[ 1362.090528] [<807150e4>] dev_ioctl+0x4c0/0x654 <-- Hold lock "rtnl_mutex" by calling rtnl_lock();
[ 1362.094985] [<80694360>] sock_ioctl+0x2f4/0x4e0
[ 1362.099567] [<802e9c4c>] sys_ioctl+0x32c/0xd8c
[ 1362.104022] [<80014504>] syscall_common+0x34/0x58
Here LED_TRIGGER_PHY is registering LED triggers during phy_attach
while holding RTNL and then taking triggers_list_lock.
[ 1362.191101] [<806c2640>] register_netdevice_notifier+0x60/0x168 <-- Trying to get lock "rtnl_mutex" via rtnl_lock();
[ 1362.197073] [<805504ac>] netdev_trig_activate+0x194/0x1e4
[ 1362.202490] [<8054e28c>] led_trigger_set+0x1d4/0x360 <-- Hold lock "triggers_list_lock" by down_read(&triggers_list_lock);
[ 1362.207511] [<8054eb38>] led_trigger_write+0xd8/0x14c
[ 1362.212566] [<80381d98>] sysfs_kf_bin_write+0x80/0xbc
[ 1362.217688] [<8037fcd8>] kernfs_fop_write_iter+0x17c/0x28c
[ 1362.223174] [<802cbd70>] vfs_write+0x21c/0x3c4
[ 1362.227712] [<802cc0c4>] ksys_write+0x78/0x12c
[ 1362.232164] [<80014504>] syscall_common+0x34/0x58
Here LEDS_TRIGGER_NETDEV is being enabled on an LED. It first takes
triggers_list_lock and then RTNL. A classical AB-BA deadlock.
phy_led_triggers_registers() does not require the RTNL, it does not
make any calls into the network stack which require protection. There
is also no requirement the PHY has been attached to a MAC, the
triggers only make use of phydev state. This allows the call to
phy_led_triggers_registers() to be placed elsewhere. PHY probe() and
release() don't hold RTNL, so solving the AB-BA deadlock.
Reported-by: Shiji Yang <yangshiji66@outlook.com>
Closes: https://lore.kernel.org/all/OS7PR01MB13602B128BA1AD3FA38B6D1FFBC69A@OS7PR01MB13602.jpnprd01.prod.outlook.com/
Fixes: 06f502f57d ("leds: trigger: Introduce a NETDEV trigger")
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Shiji Yang <yangshiji66@outlook.com>
Link: https://patch.msgid.link/20260222152601.1978655-1-andrew@lunn.ch
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
pegasus_probe() fills URBs with hardcoded endpoint pipes without
verifying the endpoint descriptors:
- usb_rcvbulkpipe(dev, 1) for RX data
- usb_sndbulkpipe(dev, 2) for TX data
- usb_rcvintpipe(dev, 3) for status interrupts
A malformed USB device can present these endpoints with transfer types
that differ from what the driver assumes.
Add a pegasus_usb_ep enum for endpoint numbers, replacing magic
constants throughout. Add usb_check_bulk_endpoints() and
usb_check_int_endpoints() calls before any resource allocation to
verify endpoint types before use, rejecting devices with mismatched
descriptors at probe time, and avoid triggering assertion.
Similar fix to
- commit 90b7f29617 ("net: usb: rtl8150: enable basic endpoint checking")
- commit 9e7021d2ae ("net: usb: catc: enable basic endpoint checking")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260222050633.410165-1-n7l8m4@u.northwestern.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
msg passed to netconsole from the console subsystem is not guaranteed
to be nul-terminated. Before recent
commit 7eab73b186 ("netconsole: convert to NBCON console infrastructure")
the message would be placed in printk_shared_pbufs, a static global
buffer, so KASAN had harder time catching OOB accesses. Now we see:
printk: console [netcon_ext0] enabled
BUG: KASAN: slab-out-of-bounds in string+0x1f7/0x240
Read of size 1 at addr ffff88813b6d4c00 by task pr/netcon_ext0/594
CPU: 65 UID: 0 PID: 594 Comm: pr/netcon_ext0 Not tainted 6.19.0-11754-g4246fd6547c9
Call Trace:
kasan_report+0xe4/0x120
string+0x1f7/0x240
vsnprintf+0x655/0xba0
scnprintf+0xba/0x120
netconsole_write+0x3fe/0xa10
nbcon_emit_next_record+0x46e/0x860
nbcon_kthread_func+0x623/0x750
Allocated by task 1:
nbcon_alloc+0x1ea/0x450
register_console+0x26b/0xe10
init_netconsole+0xbb0/0xda0
The buggy address belongs to the object at ffff88813b6d4000
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 0 bytes to the right of
allocated 3072-byte region [ffff88813b6d4000, ffff88813b6d4c00)
Fixes: c62c0a17f9 ("netconsole: Append kernel version to message")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260219195021.2099699-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the FarSync T-series card is being detached, the fst_card_info is
deallocated in fst_remove_one(). However, the fst_tx_task or fst_int_task
may still be running or pending, leading to use-after-free bugs when the
already freed fst_card_info is accessed in fst_process_tx_work_q() or
fst_process_int_work_q().
A typical race condition is depicted below:
CPU 0 (cleanup) | CPU 1 (tasklet)
| fst_start_xmit()
fst_remove_one() | tasklet_schedule()
unregister_hdlc_device()|
| fst_process_tx_work_q() //handler
kfree(card) //free | do_bottom_half_tx()
| card-> //use
The following KASAN trace was captured:
==================================================================
BUG: KASAN: slab-use-after-free in do_bottom_half_tx+0xb88/0xd00
Read of size 4 at addr ffff88800aad101c by task ksoftirqd/3/32
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcb/0x5d0
? do_bottom_half_tx+0xb88/0xd00
kasan_report+0xb8/0xf0
? do_bottom_half_tx+0xb88/0xd00
do_bottom_half_tx+0xb88/0xd00
? _raw_spin_lock_irqsave+0x85/0xe0
? __pfx__raw_spin_lock_irqsave+0x10/0x10
? __pfx___hrtimer_run_queues+0x10/0x10
fst_process_tx_work_q+0x67/0x90
tasklet_action_common+0x1fa/0x720
? hrtimer_interrupt+0x31f/0x780
handle_softirqs+0x176/0x530
__irq_exit_rcu+0xab/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
...
Allocated by task 41 on cpu 3 at 72.330843s:
kasan_save_stack+0x24/0x50
kasan_save_track+0x17/0x60
__kasan_kmalloc+0x7f/0x90
fst_add_one+0x1a5/0x1cd0
local_pci_probe+0xdd/0x190
pci_device_probe+0x341/0x480
really_probe+0x1c6/0x6a0
__driver_probe_device+0x248/0x310
driver_probe_device+0x48/0x210
__device_attach_driver+0x160/0x320
bus_for_each_drv+0x101/0x190
__device_attach+0x198/0x3a0
device_initial_probe+0x78/0xa0
pci_bus_add_device+0x81/0xc0
pci_bus_add_devices+0x7e/0x190
enable_slot+0x9b9/0x1130
acpiphp_check_bridge.part.0+0x2e1/0x460
acpiphp_hotplug_notify+0x36c/0x3c0
acpi_device_hotplug+0x203/0xb10
acpi_hotplug_work_fn+0x59/0x80
...
Freed by task 41 on cpu 1 at 75.138639s:
kasan_save_stack+0x24/0x50
kasan_save_track+0x17/0x60
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x43/0x70
kfree+0x135/0x410
fst_remove_one+0x2ca/0x540
pci_device_remove+0xa6/0x1d0
device_release_driver_internal+0x364/0x530
pci_stop_bus_device+0x105/0x150
pci_stop_and_remove_bus_device+0xd/0x20
disable_slot+0x116/0x260
acpiphp_disable_and_eject_slot+0x4b/0x190
acpiphp_hotplug_notify+0x230/0x3c0
acpi_device_hotplug+0x203/0xb10
acpi_hotplug_work_fn+0x59/0x80
...
The buggy address belongs to the object at ffff88800aad1000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 28 bytes inside of
freed 1024-byte region
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xaad0
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x100000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0100000000000040 ffff888007042dc0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
head: 0100000000000040 ffff888007042dc0 dead000000000122 0000000000000000
head: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
head: 0100000000000003 ffffea00002ab401 00000000ffffffff 00000000ffffffff
head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88800aad0f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800aad0f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88800aad1000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88800aad1080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88800aad1100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fix this by ensuring that both fst_tx_task and fst_int_task are properly
canceled before the fst_card_info is released. Add tasklet_kill() in
fst_remove_one() to synchronize with any pending or running tasklets.
Since unregister_hdlc_device() stops data transmission and reception,
and fst_disable_intr() prevents further interrupts, it is appropriate
to place tasklet_kill() after these calls.
The bugs were identified through static analysis. To reproduce the issue
and validate the fix, a FarSync T-series card was simulated in QEMU and
delays(e.g., mdelay()) were introduced within the tasklet handler to
increase the likelihood of triggering the race condition.
Fixes: 2f623aaf9f ("net: farsync: Fix kmemleak when rmmods farsync")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260219124637.72578-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In DQ-QPL mode, gve_tx_clean_pending_packets() incorrectly uses the RDA
buffer cleanup path. It iterates num_bufs times and attempts to unmap
entries in the dma array.
This leads to two issues:
1. The dma array shares storage with tx_qpl_buf_ids (union).
Interpreting buffer IDs as DMA addresses results in attempting to
unmap incorrect memory locations.
2. num_bufs in QPL mode (counting 2K chunks) can significantly exceed
the size of the dma array, causing out-of-bounds access warnings
(trace below is how we noticed this issue).
UBSAN: array-index-out-of-bounds in
drivers/net/ethernet/drivers/net/ethernet/google/gve/gve_tx_dqo.c:178:5 index 18 is out of
range for type 'dma_addr_t[18]' (aka 'unsigned long long[18]')
Workqueue: gve gve_service_task [gve]
Call Trace:
<TASK>
dump_stack_lvl+0x33/0xa0
__ubsan_handle_out_of_bounds+0xdc/0x110
gve_tx_stop_ring_dqo+0x182/0x200 [gve]
gve_close+0x1be/0x450 [gve]
gve_reset+0x99/0x120 [gve]
gve_service_task+0x61/0x100 [gve]
process_scheduled_works+0x1e9/0x380
Fix this by properly checking for QPL mode and delegating to
gve_free_tx_qpl_bufs() to reclaim the buffers.
Cc: stable@vger.kernel.org
Fixes: a6fb8d5a8b ("gve: Tx path for DQO-QPL")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260220215324.1631350-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The lbs_free_adapter() function uses timer_delete() (non-synchronous)
for both command_timer and tx_lockup_timer before the structure is
freed. This is incorrect because timer_delete() does not wait for
any running timer callback to complete.
If a timer callback is executing when lbs_free_adapter() is called,
the callback will access freed memory since lbs_cfg_free() frees the
containing structure immediately after lbs_free_adapter() returns.
Both timer callbacks (lbs_cmd_timeout_handler and lbs_tx_lockup_handler)
access priv->driver_lock, priv->cur_cmd, priv->dev, and other fields,
which would all be use-after-free violations.
Use timer_delete_sync() instead to ensure any running timer callback
has completed before returning.
This bug was introduced in commit 8f641d93c3 ("libertas: detect TX
lockups and reset hardware") where del_timer() was used instead of
del_timer_sync() in the cleanup path. The command_timer has had the
same issue since the driver was first written.
Fixes: 8f641d93c3 ("libertas: detect TX lockups and reset hardware")
Fixes: 954ee164f4 ("[PATCH] libertas: reorganize and simplify init sequence")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Hodges <git@danielhodges.dev>
Link: https://patch.msgid.link/20260206195356.15647-1-git@danielhodges.dev
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
dev_alloc_name() returns the allocated ID on success, which could be
over 0.
Fix the return value check to check for negative error codes.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aYmsQfujoAe5qO02@stanley.mountain/
Fixes: 7bab5bdb81 ("wifi: mwifiex: Allocate dev name earlier for interface workqueue name")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://patch.msgid.link/20260210100337.1131279-1-wenst@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When probe of the sdio brcmfmac device fails for some reasons (i.e.
missing firmware), the sdiodev->bus is set to error instead of NULL, thus
the cleanup later in brcmf_sdio_remove() tries to free resources via
invalid bus pointer. This happens because sdiodev->bus is set 2 times:
first in brcmf_sdio_probe() and second time in brcmf_sdiod_probe(). Fix
this by chaning the brcmf_sdio_probe() function to return the error code
and set sdio->bus only there.
Fixes: 0ff0843310 ("wifi: brcmfmac: Add optional lpo clock enable support")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Arend van Spriel<arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20260203102133.1478331-1-m.szyprowski@samsung.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
When processing TCP stream data in ovpn_tcp_recv, we receive large
cloned skbs from __strp_rcv that may contain multiple coalesced packets.
The current implementation has two bugs:
1. Header offset overflow: Using pskb_pull with large offsets on
coalesced skbs causes skb->data - skb->head to exceed the u16 storage
of skb->network_header. This causes skb_reset_network_header to fail
on the inner decapsulated packet, resulting in packet drops.
2. Unaligned protocol headers: Extracting packets from arbitrary
positions within the coalesced TCP stream provides no alignment
guarantees for the packet data causing performance penalties on
architectures without efficient unaligned access. Additionally,
openvpn's 2-byte length prefix on TCP packets causes the subsequent
4-byte opcode and packet ID fields to be inherently misaligned.
Fix both issues by allocating a new skb for each openvpn packet and
using skb_copy_bits to extract only the packet content into the new
buffer, skipping the 2-byte length prefix. Also, check the length before
invoking the function that performs the allocation to avoid creating an
invalid skb.
If the packet has to be forwarded to userspace the 2-byte prefix can be
pushed to the head safely, without misalignment.
As a side effect, this approach also avoids the expensive linearization
that pskb_pull triggers on cloned skbs with page fragments. In testing,
this resulted in TCP throughput improvements of up to 74%.
Fixes: 11851cbd60 ("ovpn: implement TCP transport")
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ntuple filters can be deleted when the interface
is down. The current code blindly sends the filter
delete command to FW. When the interface is down, all
the VNICs are deleted in the FW. When the VNIC is
freed in the FW, all the associated filters are also
freed. We need not send the free command explicitly.
Sending such command will generate FW error in the
dmesg.
In order to fix this, we can safely return from
bnxt_hwrm_cfa_ntuple_filter_free() when BNXT_STATE_OPEN
is not true which confirms the VNICs have been deleted.
Fixes: 8336a974f3 ("bnxt_en: Save user configured filters in a lookup list")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260219185313.2682148-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to free the corresponding RSS context VNIC
in FW everytime an RSS context is deleted in driver.
Commit 667ac333db added a check to delete the VNIC
in FW only when netif_running() is true to help delete
RSS contexts with interface down.
Having that condition will make the driver leak VNICs
in FW whenever close() happens with active RSS contexts.
On the subsequent open(), as part of RSS context restoration,
we will end up trying to create extra VNICs for which we
did not make any reservation. FW can fail this request,
thereby making us lose active RSS contexts.
Suppose an RSS context is deleted already and we try to
process a delete request again, then the HWRM functions
will check for validity of the request and they simply
return if the resource is already freed. So, even for
delete-when-down cases, netif_running() check is not
necessary.
Remove the netif_running() condition check when deleting
an RSS context.
Reported-by: Jakub Kicinski <kicinski@meta.com>
Fixes: 667ac333db ("eth: bnxt: allow deleting RSS contexts when the device is down")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260219185313.2682148-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In ixp4xx_get_ts_info() ixp46x_ptp_find() is called
unconditionally despite this feature only existing on
ixp46x, leading to the following splat from tcpdump:
root@OpenWrt:~# tcpdump -vv -X -i eth0
(...)
Unable to handle kernel NULL pointer dereference at virtual address
00000238 when read
(...)
Call trace:
ptp_clock_index from ixp46x_ptp_find+0x1c/0x38
ixp46x_ptp_find from ixp4xx_get_ts_info+0x4c/0x64
ixp4xx_get_ts_info from __ethtool_get_ts_info+0x90/0x108
__ethtool_get_ts_info from __dev_ethtool+0xa00/0x2648
__dev_ethtool from dev_ethtool+0x160/0x234
dev_ethtool from dev_ioctl+0x2cc/0x460
dev_ioctl from sock_ioctl+0x1ec/0x524
sock_ioctl from sys_ioctl+0x51c/0xa94
sys_ioctl from ret_fast_syscall+0x0/0x44
(...)
Segmentation fault
Check for ixp46x in ixp46x_ptp_find() before trying to set up
PTP to avoid this.
To avoid altering the returned error code from ixp4xx_hwtstamp_set()
which before this patch was -EOPNOTSUPP, we return -EOPNOTSUPP
from ixp4xx_hwtstamp_set() if ixp46x_ptp_find() fails no matter
the error code. The helper function ixp46x_ptp_find() helper
returns -ENODEV.
Fixes: 9055a2f591 ("ixp4xx_eth: make ptp support a platform driver")
Signed-off-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260219-ixp4xx-fix-ethernet-v3-1-f235ccc3cd46@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GPIO get callback is expected to return 0 or 1 (or a negative error
code). Ensure that the value returned by qca807x_gpio_get() is
normalized to the [0, 1] range.
Fixes: 86ef402d80 ("gpiolib: sanitize the return value of gpio_chip::get()")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/aZZeyr2ysqqk2GqA@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a crash when unbinding the sja1105 driver under special
circumstances:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
Call trace:
phylink_run_resolve_and_disable+0x10/0x90
sja1105_static_config_reload+0xc0/0x410
sja1105_vlan_filtering+0x100/0x140
dsa_port_vlan_filtering+0x13c/0x368
dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
dsa_port_bridge_leave+0x130/0x248
dsa_user_changeupper.part.0+0x74/0x158
dsa_user_netdevice_event+0x50c/0xa50
notifier_call_chain+0x78/0x148
raw_notifier_call_chain+0x20/0x38
call_netdevice_notifiers_info+0x58/0xa8
__netdev_upper_dev_unlink+0xac/0x220
netdev_upper_dev_unlink+0x38/0x70
del_nbp+0x1a4/0x320
br_del_if+0x3c/0xd8
br_device_event+0xf8/0x2d8
notifier_call_chain+0x78/0x148
raw_notifier_call_chain+0x20/0x38
call_netdevice_notifiers_info+0x58/0xa8
unregister_netdevice_many_notify+0x314/0x848
unregister_netdevice_queue+0xe8/0xf8
dsa_user_destroy+0x50/0xa8
dsa_port_teardown+0x80/0x98
dsa_switch_teardown_ports+0x4c/0xb8
dsa_switch_deinit+0x94/0xb8
dsa_switch_put_tree+0x2c/0xc0
dsa_unregister_switch+0x38/0x60
sja1105_remove+0x24/0x40
spi_remove+0x38/0x60
device_remove+0x54/0x90
device_release_driver_internal+0x1d4/0x230
device_driver_detach+0x20/0x38
unbind_store+0xbc/0xc8
---[ end trace 0000000000000000 ]---
which requires an explanation.
When a port offloads a bridge, the switch must be reset to change
the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
sja1105_static_config_reload()). When the port leaves a VLAN-aware
bridge, it must also be reset for the same reason: it is returning
to operation as a VLAN-unaware standalone port.
sja1105_static_config_reload() triggers the phylink link replay helpers.
Because sja1105 is a switch, it has multiple user ports. During unbind,
ports are torn down one by one in dsa_switch_teardown_ports() ->
dsa_port_teardown() -> dsa_user_destroy().
The crash happens when the first user port is not part of the VLAN-aware
bridge, but any other user port is.
Tearing down the first user port causes phylink_destroy() to be called
on dp->pl, and this pointer to be set to NULL. Then, when the second
user port is torn down, this was offloading a VLAN-aware bridge port, so
indirectly it will trigger sja1105_static_config_reload().
The latter function iterates using dsa_switch_for_each_available_port(),
and unconditionally dereferences dp->pl, including for the
aforementioned torn down previous port, and passes that to phylink.
This is where the NULL pointer is coming from.
There are multiple levels at which this could be avoided:
- add an "if (dp->pl)" in sja1105_static_config_reload()
- make the phylink replay helpers NULL-tolerant
- mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
has run, such that subsequent dsa_switch_for_each_available_port()
iterations skip them
- disconnect the entire switch at once from switchdev and
NETDEV_CHANGEUPPER events while unbinding, not just port by port,
likely using a "ds->unbinding = true" mechanism or similar
however options 3 and 4 are quite heavy and might have side effects.
Although 2 allows to keep the driver simpler, the phylink API it not
NULL-tolerant in general and is not responsible for the NULL pointer
(this is something done by dsa_port_phylink_destroy()). So I went
with 1.
Functionally speaking, skipping the replay helpers for ports without
a phylink instance is fine, because that only happens during driver
removal (an operation which cannot be cancelled). The ports are not
required to work (although they probably still will - untested
assumption - as long as we don't overwrite the last port speed with
SJA1105_SPEED_AUTO).
Fixes: 0b2edc531e ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260218160551.194782-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The LAN7801 is designed exclusively for external PHYs (unlike the
LAN7800/LAN7850 which have internal PHYs), but lan78xx_mdio_init()
restricts PHY scanning to MDIO addresses 0-7 by setting phy_mask to
~(0xFF). This prevents discovery of external PHYs wired to addresses
outside that range.
One such case is the DP83TC814 100BASE-T1 PHY, which is typically
configured at MDIO address 10 via PHYAD bootstrap pins and goes
undetected with the current mask.
Remove the restrictive phy_mask assignment for the LAN7801 so that the
default mask of 0 applies, allowing all 32 MDIO addresses to be
scanned during bus registration.
Fixes: 02dc1f3d61 ("lan78xx: add LAN7801 MAC only support")
Signed-off-by: Martin Pålsson <martin@poleshift.se>
Link: https://patch.msgid.link/0110019c6f388aff-98d99cf0-4425-4fff-b16b-dea5ad8fafe0-000000@eu-north-1.amazonses.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kaweth_set_rx_mode(), the ndo_set_rx_mode callback, calls
netif_stop_queue() and netif_wake_queue(). These are TX queue flow
control functions unrelated to RX multicast configuration.
The premature netif_wake_queue() can re-enable TX while tx_urb is still
in-flight, leading to a double usb_submit_urb() on the same URB:
kaweth_start_xmit() {
netif_stop_queue();
usb_submit_urb(kaweth->tx_urb);
}
kaweth_set_rx_mode() {
netif_stop_queue();
netif_wake_queue(); // wakes TX queue before URB is done
}
kaweth_start_xmit() {
netif_stop_queue();
usb_submit_urb(kaweth->tx_urb); // URB submitted while active
}
This triggers the WARN in usb_submit_urb():
"URB submitted while active"
This is a similar class of bug fixed in rtl8150 by
- commit 958baf5eae ("net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast").
Also kaweth_set_rx_mode() is already functionally broken, the
real set_rx_mode action is performed by kaweth_async_set_rx_mode(),
which in turn is not a no-op only at ndo_open() time.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Link: https://patch.msgid.link/20260217175012.1234494-1-n7l8m4@u.northwestern.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmmXUh8ACgkQMUZtbf5S
IrufYA//ZVj+4gvegqKwKZYXNBndVW00GGTYqaILbaenK1olUVUelVB91eV2Klc/
dXCeKG/MgEPuT89IjkPzVr2Yg4x6uhjcQL1rsahORn+GuQfSI/P8y7ysDOPnHVeM
Rtsg1m8z3EizJcHPeAJe7nEqFzfvZ2m+FCEGe++z8BYaUZUVApytgpIWOHO/aB+p
t13bCNzd05XxPphMl610T00Fncj2jCVDHILMgTB5rmFmkeJuQwNrRGXQSoQame46
+g+yCZjT0eVTrBaH1EUssWfrOT3VJj3BEee6gSp7k9mxMkbW18i8shBgmxS+EHjk
u19wwBzSrHK+JY1UExim+1E/rZisQVmEE1Gs0ALedxAu9zC/Julzfa2/+BFsc0j7
QTXd4jukG3aTPIX8v3TV2Igu0j+bAT4WdpzvnsXXBMVKy7wFYMd1+aSOLyFH2W9L
qRbg50oUATcsz77bZt6YUTJEgua4HXNYGtn15FMZOR7HJVR2L44Q5TK5mQxGp5iM
GabeKMzg6bsjE98STM3nbWks3pIb9ptIk++i0913eSqKgn84bDPtp3Gabfgle2SJ
8gjKS61K8rDt5x8StXVod7oGQ4asL8RJyOtE/avgbWUu9BNH8/oKqsE6TQrpXauv
1ndiyim/mPe4fBCxkVAi2+uq5/ph9z8XyleESz9VYwyL3Rl4nsg=
=qSCj
-----END PGP SIGNATURE-----
Merge tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
The max number of channels is always an unsigned int, use the correct
type to fix compilation errors done with strict type checking, e.g.:
error: call to ‘__compiletime_assert_1110’ declared with attribute
error: min(mlx5e_get_devlink_param_num_doorbells(mdev),
mlx5e_get_max_num_channels(mdev)) signedness error
Fixes: 74a8dadac1 ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the mentioned "Fixes" commit, various work tasks triggering devlink
health reporter recovery were switched to use netdev_trylock to protect
against concurrent tear down of the channels being recovered. But this
had the side effect of introducing potential deadlocks because of
incorrect lock ordering.
The correct lock order is described by the init flow:
probe_one -> mlx5_init_one (acquires devlink lock)
-> mlx5_init_one_devl_locked -> mlx5_register_device
-> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
-> register_netdev (acquires rtnl lock)
-> register_netdevice (acquires netdev lock)
=> devlink lock -> rtnl lock -> netdev lock.
But in the current recovery flow, the order is wrong:
mlx5e_tx_err_cqe_work (acquires netdev lock)
-> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
-> devlink_health_report (acquires devlink lock => boom!)
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
-> mlx5e_tx_reporter_err_cqe_recover
The same pattern exists in:
mlx5e_reporter_rx_timeout
mlx5e_reporter_tx_ptpsq_unhealthy
mlx5e_reporter_tx_timeout
Fix these by moving the netdev_trylock calls from the work handlers
lower in the call stack, in the respective recovery functions, where
they are actually necessary.
Fixes: 8f7b00307b ("net/mlx5e: Convert mlx5 netdevs to instance locking")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The macsec_aso_set_arm_event function calls mlx5_aso_poll_cq once
without a retry loop. If the CQE is not immediately available after
posting the WQE, the function fails unnecessarily.
Use read_poll_timeout() to poll 3-10 usecs for CQE, consistent with
other ASO polling code paths in the driver.
Fixes: 739cfa3451 ("net/mlx5: Make ASO poll CQ usable in atomic context")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The write combining completion poll loop uses usleep_range() which can
sleep much longer than requested due to scheduler latency. Under load,
we witnessed a 20ms+ delay until the process was rescheduled, causing
the jiffies based timeout to expire while the thread is sleeping.
The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.
Instead of the open-coded while loop, use the kernel's poll_timeout_us()
which always performs an additional check after the sleep expiration,
and is less error-prone.
Note: poll_timeout_us() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.
Fixes: d98995b4bf ("net/mlx5: Reimplement write combining test")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ASO completion poll loop uses usleep_range() which can sleep much
longer than requested due to scheduler latency. Under load, we witnessed
a 20ms+ delay until the process was rescheduled, causing the jiffies
based timeout to expire while the thread is sleeping.
The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.
Instead of the open-coded while loop, use the kernel's
read_poll_timeout() which always performs an additional check after the
sleep expiration, and is less error-prone.
Note: read_poll_timeout() doesn't accept a sleep range, by passing 10
sleep_us the sleep range effectively changes from 2-10 to 3-10 usecs.
Fixes: 739cfa3451 ("net/mlx5: Make ASO poll CQ usable in atomic context")
Fixes: 7e3fce82d9 ("net/mlx5e: Overcome slow response for first macsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260218072904.1764634-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
HWRM_FUNC_CFG is used to reserve resources, whereas HWRM_FUNC_QCFG is
intended for querying resource information from the firmware.
Since __bnge_hwrm_reserve_pf_rings() reserves resources for a specific
PF, the command type should be HWRM_FUNC_CFG.
Fixes: 627c67f038 ("bng_en: Add resource management support")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260218052755.4097468-1-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As per design, AF should update the default MCAM action only when
mcam_index is -1. A bug in the previous patch caused default entries
to be changed even when the request was not for them.
Fixes: 570ba37898 ("octeontx2-af: Update RSS algorithm index")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260216090338.1318976-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XSK wakeup must use the async ICOSQ (with proper locking), as it is not
guaranteed to run on the same CPU as the channel.
The commit that converted the NAPI trigger path to use the sync ICOSQ
incorrectly applied the same change to XSK, causing XSK wakeups to use
the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ.
XDP program attach/detach triggers channel reopen, while XSK pool
enable/disable can happen on-the-fly via NDOs without reopening
channels. As a result, xsk_pool state cannot be reliably used at
mlx5e_open_channel() time to decide whether an async ICOSQ is needed.
Update the async_icosq_needed logic to depend on the presence of an XDP
program rather than the xsk_pool, ensuring the async ICOSQ is available
when XSK wakeups are enabled.
This fixes multiple issues:
1. Illegal synchronize_rcu() in an RCU read- side critical section via
mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() ->
synchronize_net(). The stack holds RCU read-lock in xsk_poll().
2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup():
[] BUG: kernel NULL pointer dereference, address: 0000000000000240
[] #PF: supervisor read access in kernel mode
[] #PF: error_code(0x0000) - not-present page
[] PGD 0 P4D 0
[] Oops: Oops: 0000 [#1] SMP
[] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
[] Hardware name: [...]
[] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/
Fixes: 56aca3e0f7 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Alice Mikityanska <alice.kernel@fastmail.im>
Link: https://patch.msgid.link/20260217074525.1761454-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Increasing the MTU beyond the HDS threshold causes the hardware to
fragment packets across multiple buffers. If a single-buffer XDP program
is attached, the driver will drop all multi-frag frames. While we can't
prevent a remote sender from sending non-TCP packets larger than the MTU,
this will prevent users from inadvertently breaking new TCP streams.
Traditionally, drivers supported XDP with MTU less than 4Kb
(packet per page). Fbnic currently prevents attaching XDP when MTU is too high.
But it does not prevent increasing MTU after XDP is attached.
Fixes: 1b0a3950db ("eth: fbnic: Add XDP pass, drop, abort support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dma_free_coherent() in error path takes priv->rx_buf.alloc_len as
the dma handle. This would lead to improper unmapping of the buffer.
Change the dma handle to priv->rx_buf.alloc_phys.
Fixes: 6af55ff52b ("Driver for Beckhoff CX5020 EtherCAT master module.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://patch.msgid.link/20260213164340.77272-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
valis reported that a race condition still happens after my prior patch.
macvlan_common_newlink() might have made @dev visible before
detecting an error, and its caller will directly call free_netdev(dev).
We must respect an RCU period, either in macvlan or the core networking
stack.
After adding a temporary mdelay(1000) in macvlan_forward_source_one()
to open the race window, valis repro was:
ip link add p1 type veth peer p2
ip link set address 00:00:00:00:00:20 dev p1
ip link set up dev p1
ip link set up dev p2
ip link add mv0 link p2 type macvlan mode source
(ip link add invalid% link p2 type macvlan mode source macaddr add
00:00:00:00:00:20 &) ; sleep 0.5 ; ping -c1 -I p1 1.2.3.4
PING 1.2.3.4 (1.2.3.4): 56 data bytes
RTNETLINK answers: Invalid argument
BUG: KASAN: slab-use-after-free in macvlan_forward_source
(drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
Read of size 8 at addr ffff888016bb89c0 by task e/175
CPU: 1 UID: 1000 PID: 175 Comm: e Not tainted 6.19.0-rc8+ #33 NONE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:379 mm/kasan/report.c:482)
? macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
kasan_report (mm/kasan/report.c:597)
? macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
macvlan_forward_source (drivers/net/macvlan.c:408 drivers/net/macvlan.c:444)
? tasklet_init (kernel/softirq.c:983)
macvlan_handle_frame (drivers/net/macvlan.c:501)
Allocated by task 169:
kasan_save_stack (mm/kasan/common.c:58)
kasan_save_track (./arch/x86/include/asm/current.h:25
mm/kasan/common.c:70 mm/kasan/common.c:79)
__kasan_kmalloc (mm/kasan/common.c:419)
__kvmalloc_node_noprof (./include/linux/kasan.h:263 mm/slub.c:5657
mm/slub.c:7140)
alloc_netdev_mqs (net/core/dev.c:12012)
rtnl_create_link (net/core/rtnetlink.c:3648)
rtnl_newlink (net/core/rtnetlink.c:3830 net/core/rtnetlink.c:3957
net/core/rtnetlink.c:4072)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
Freed by task 169:
kasan_save_stack (mm/kasan/common.c:58)
kasan_save_track (./arch/x86/include/asm/current.h:25
mm/kasan/common.c:70 mm/kasan/common.c:79)
kasan_save_free_info (mm/kasan/generic.c:587)
__kasan_slab_free (mm/kasan/common.c:287)
kfree (mm/slub.c:6674 mm/slub.c:6882)
rtnl_newlink (net/core/rtnetlink.c:3845 net/core/rtnetlink.c:3957
net/core/rtnetlink.c:4072)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6958)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
Fixes: f8db6475a8 ("macvlan: fix error recovery in macvlan_common_newlink()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: valis <sec@valis.email>
Link: https://patch.msgid.link/20260213142557.3059043-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>