linux/include
Florian Westphal 8e1a1bc4f5 netfilter: nf_tables: avoid chain re-validation if possible
Hamza Mahfooz reports cpu soft lock-ups in
nft_chain_validate():

 watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [iptables-nft-re:37547]
[..]
 RIP: 0010:nft_chain_validate+0xcb/0x110 [nf_tables]
[..]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_table_validate+0x6b/0xb0 [nf_tables]
  nf_tables_validate+0x8b/0xa0 [nf_tables]
  nf_tables_commit+0x1df/0x1eb0 [nf_tables]
[..]

Currently nf_tables will traverse the entire table (chain graph), starting
from the entry points (base chains), exploring all possible paths
(chain jumps).  But there are cases where we could avoid revalidation.

Consider:
1  input -> j2 -> j3
2  input -> j2 -> j3
3  input -> j1 -> j2 -> j3

Then the second rule does not need to revalidate j2, and, by extension j3,
because this was already checked during validation of the first rule.
We need to validate it only for rule 3.

This is needed because chain loop detection also ensures we do not exceed
the jump stack: Just because we know that j2 is cycle free, its last jump
might now exceed the allowed stack size.  We also need to update all
reachable chains with the new largest observed call depth.

Care has to be taken to revalidate even if the chain depth won't be an
issue: chain validation also ensures that expressions are not called from
invalid base chains.  For example, the masquerade expression can only be
called from NAT postrouting base chains.

Therefore we also need to keep record of the base chain context (type,
hooknum) and revalidate if the chain becomes reachable from a different
hook location.

Reported-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Closes: https://lore.kernel.org/netfilter-devel/20251118221735.GA5477@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
Tested-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-15 15:02:44 +01:00
..
acpi Revert "ACPI: processor: idle: Optimize ACPI idle driver registration" 2025-11-25 16:08:06 +01:00
asm-generic bpf-next-6.19 2025-12-03 16:54:54 -08:00
clocksource
crypto This update includes the following changes: 2025-12-03 11:28:38 -08:00
cxl
drm drm/pcids: Split PTL pciids group to make wcl subplatform 2025-11-18 08:47:58 -05:00
dt-bindings There's a bunch of patches here across drivers/clk/ to migrate drivers to use 2025-10-07 09:28:37 -07:00
hyperv hyperv: Remove the spurious null directive line 2025-10-02 21:21:24 +00:00
keys keys: Annotate struct asymmetric_key_id with __counted_by 2025-10-31 17:43:56 +08:00
kunit linux_kselftest-kunit-6.18-rc1 2025-10-01 19:15:11 -07:00
kvm KVM: arm64: Kill leftovers of ad-hoc timer userspace access 2025-10-13 14:42:41 +01:00
linux Networking changes for 6.19. 2025-12-03 17:24:33 -08:00
math-emu
media
memory
misc
net netfilter: nf_tables: avoid chain re-validation if possible 2025-12-15 15:02:44 +01:00
pcmcia
ras
rdma
rv
scsi scsi: core: Fix the unit attention counter implementation 2025-10-21 21:09:36 -04:00
soc KEYS: trusted: caam based protected key 2025-10-20 12:10:28 +08:00
sound ASoC: tas2781: Support more newly-released amplifiers tas58xx in the driver 2025-10-13 11:08:09 +01:00
target
trace Networking changes for 6.19. 2025-12-03 17:24:33 -08:00
uapi mptcp: pm: ignore unknown endpoint flags 2025-12-08 23:54:02 -08:00
ufs scsi: ufs: core: Add a quirk to suppress link_startup_again 2025-10-29 23:20:19 -04:00
vdso
video
xen
Kbuild