linux/include/net/netfilter
Florian Westphal 8e1a1bc4f5 netfilter: nf_tables: avoid chain re-validation if possible
Hamza Mahfooz reports cpu soft lock-ups in
nft_chain_validate():

 watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [iptables-nft-re:37547]
[..]
 RIP: 0010:nft_chain_validate+0xcb/0x110 [nf_tables]
[..]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_immediate_validate+0x36/0x50 [nf_tables]
  nft_chain_validate+0xc9/0x110 [nf_tables]
  nft_table_validate+0x6b/0xb0 [nf_tables]
  nf_tables_validate+0x8b/0xa0 [nf_tables]
  nf_tables_commit+0x1df/0x1eb0 [nf_tables]
[..]

Currently nf_tables will traverse the entire table (chain graph), starting
from the entry points (base chains), exploring all possible paths
(chain jumps).  But there are cases where we could avoid revalidation.

Consider:
1  input -> j2 -> j3
2  input -> j2 -> j3
3  input -> j1 -> j2 -> j3

Then the second rule does not need to revalidate j2, and, by extension j3,
because this was already checked during validation of the first rule.
We need to validate it only for rule 3.

This is needed because chain loop detection also ensures we do not exceed
the jump stack: Just because we know that j2 is cycle free, its last jump
might now exceed the allowed stack size.  We also need to update all
reachable chains with the new largest observed call depth.

Care has to be taken to revalidate even if the chain depth won't be an
issue: chain validation also ensures that expressions are not called from
invalid base chains.  For example, the masquerade expression can only be
called from NAT postrouting base chains.

Therefore we also need to keep record of the base chain context (type,
hooknum) and revalidate if the chain becomes reachable from a different
hook location.

Reported-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Closes: https://lore.kernel.org/netfilter-devel/20251118221735.GA5477@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
Tested-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-15 15:02:44 +01:00
..
ipv4 netfilter: nf_reject: remove unneeded exports 2025-09-02 15:28:17 +02:00
ipv6 netfilter: nf_reject: remove unneeded exports 2025-09-02 15:28:17 +02:00
br_netfilter.h netfilter: remove CONFIG_NETFILTER checks from headers. 2019-09-13 12:47:36 +02:00
nf_bpf_link.h bpf: minimal support for programs hooked into netfilter framework 2023-04-21 11:34:14 -07:00
nf_conntrack.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-17 11:00:33 -07:00
nf_conntrack_acct.h netfilter: conntrack: Remove unused function declarations 2023-08-08 13:02:00 +02:00
nf_conntrack_act_ct.h net/sched: act_ct: Always fill offloading tuple iifidx 2023-11-08 17:47:08 -08:00
nf_conntrack_bpf.h net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c 2022-10-03 09:17:32 -07:00
nf_conntrack_bridge.h netfilter: remove CONFIG_NETFILTER checks from headers. 2019-09-13 12:47:36 +02:00
nf_conntrack_core.h netfilter: conntrack: fix wrong ct->timeout value 2023-04-19 12:08:38 +02:00
nf_conntrack_count.h netfilter: nf_conncount: rework API to use sk_buff directly 2025-11-28 00:05:49 +00:00
nf_conntrack_ecache.h netfilter: conntrack: add conntrack event timestamp 2025-01-09 14:42:16 +01:00
nf_conntrack_expect.h netfilter: allow exp not to be removed in nf_ct_find_expectation 2023-07-20 10:06:36 +02:00
nf_conntrack_extend.h netfilter: extensions: introduce extension genid count 2022-05-13 18:52:16 +02:00
nf_conntrack_helper.h netfilter: helper: Remove unused function declarations 2023-08-08 13:01:59 +02:00
nf_conntrack_l4proto.h netfilter: fix typo in nf_conntrack_l4proto.h comment 2025-10-30 12:52:45 +01:00
nf_conntrack_labels.h netfilter: conntrack: switch connlabels to atomic_t 2023-10-24 13:16:30 +02:00
nf_conntrack_seqadj.h netfilter: conntrack: remove extension register api 2022-02-04 06:30:28 +01:00
nf_conntrack_synproxy.h netfilter: conntrack: wrap two inline functions in config checks. 2019-09-13 12:47:10 +02:00
nf_conntrack_timeout.h netfilter: nf_conntrack: add missing __rcu annotations 2022-07-11 16:25:15 +02:00
nf_conntrack_timestamp.h netfilter: conntrack: remove extension register api 2022-02-04 06:30:28 +01:00
nf_conntrack_tuple.h netfilter: conntrack: don't fold port numbers into addresses before hashing 2023-07-05 14:42:16 +02:00
nf_conntrack_zones.h netfilter: conntrack: remove CONFIG_NF_CONNTRACK checks from nf_conntrack_zones.h. 2019-09-13 12:47:41 +02:00
nf_dup_netdev.h netfilter: nft_{fwd,dup}_netdev: add offload support 2019-09-10 22:44:29 +02:00
nf_flow_table.h netfilter: flowtable: Add IPIP rx sw acceleration 2025-11-28 00:00:38 +00:00
nf_hooks_lwtunnel.h sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
nf_log.h netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid 2025-07-25 18:35:41 +02:00
nf_nat.h net: move the nat function to nf_nat_ovs for ovs and tc 2022-12-12 10:14:03 +00:00
nf_nat_helper.h netfilter: nat: move repetitive nat port reserve loop to a helper 2022-09-07 16:46:04 +02:00
nf_nat_masquerade.h netfilter: update include directives. 2019-09-13 12:33:06 +02:00
nf_nat_redirect.h netfilter: nft_redir: use struct nf_nat_range2 throughout and deduplicate eval call-backs 2023-03-22 21:48:59 +01:00
nf_queue.h netfilter: move nf_reinject into nfnetlink_queue modules 2024-02-21 12:03:22 +01:00
nf_reject.h netfilter: conntrack: remove DCCP protocol support 2025-07-03 13:51:39 +02:00
nf_socket.h netfilter: Decrease code duplication regarding transparent socket option 2018-06-03 00:02:01 +02:00
nf_synproxy.h netfilter: remove CONFIG_NETFILTER checks from headers. 2019-09-13 12:47:36 +02:00
nf_tables.h netfilter: nf_tables: avoid chain re-validation if possible 2025-12-15 15:02:44 +01:00
nf_tables_core.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-11 17:40:13 -07:00
nf_tables_ipv4.h netfilter: nf_tables: restore IP sanity checks for netdev/egress 2024-08-26 13:05:28 +02:00
nf_tables_ipv6.h netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation 2024-08-27 18:11:56 +02:00
nf_tables_offload.h netfilter: nf_tables: bail out early if hardware offload is not supported 2022-06-06 19:19:15 +02:00
nf_tproxy.h net: reformat kdoc return statements 2024-12-09 14:44:59 -08:00
nft_fib.h netfilter: nf_tables: nft_fib: consistent l3mdev handling 2025-05-23 13:57:09 +02:00
nft_meta.h netfilter: nf_tables: drop unused 3rd argument from validate callback ops 2024-09-03 10:47:17 +02:00
nft_reject.h netfilter: nf_tables: drop unused 3rd argument from validate callback ops 2024-09-03 10:47:17 +02:00
xt_rateest.h net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types 2021-10-18 12:54:41 +01:00