Linux kernel source tree
Find a file
Martin KaFai Lau 6e375b2363 Merge branch 'bpf-tcp-exactly-once-socket-iteration'
Jordan Rife says:

====================
bpf: tcp: Exactly-once socket iteration

TCP socket iterators use iter->offset to track progress through a
bucket, which is a measure of the number of matching sockets from the
current bucket that have been seen or processed by the iterator. On
subsequent iterations, if the current bucket has unprocessed items, we
skip at least iter->offset matching items in the bucket before adding
any remaining items to the next batch. However, iter->offset isn't
always an accurate measure of "things already seen" when the underlying
bucket changes between reads, which can lead to repeated or skipped
sockets. Instead, this series remembers the cookies of the sockets we
haven't seen yet in the current bucket and resumes from the first cookie
in that list that we can find on the next iteration.

This is a continuation of the work started in [1]. This series largely
replicates the patterns applied to UDP socket iterators, applying them
instead to TCP socket iterators.

CHANGES
=======
v5 -> v6:
* In patch ten ("selftests/bpf: Create established sockets in socket
  iterator tests"), use poll() to choose a socket that has a connection
  ready to be accept()ed. Before, connect_to_server would set the
  O_NONBLOCK flag on all listening sockets so that accept_from_one could
  loop through them all and find the one that connect_to_addr_str
  connected to. However, this is subtly buggy and could potentially lead
  to test flakes, since the 3 way handshake isn't necessarily done when
  connect returns, so it's possible none of the accept() calls succeed.
  Use poll() instead to guarantee that the socket we accept() from is
  ready and eliminate the need for the O_NONBLOCK flag (Martin).

v4 -> v5:
* Move WARN_ON_ONCE before the `done` label in patch two ("bpf: tcp:
  Make sure iter->batch always contains a full bucket snapshot"")
  (Martin).
* Remove unnecessary kfunc declaration in patch eleven ("selftests/bpf:
  Create iter_tcp_destroy test program") (Martin).
* Make sure to close the socket fd at the end of `destroy` in patch
  twelve ("selftests/bpf: Add tests for bucket resume logic in
  established sockets") (Martin).

v3 -> v4:
* Drop braces around sk_nulls_for_each_from in patch five ("bpf: tcp:
  Avoid socket skips and repeats during iteration") (Stanislav).
* Add a break after the TCP_SEQ_STATE_ESTABLISHED case in patch five
  (Stanislav).
* Add an `if (sock_type == SOCK_STREAM)` check before assigning
  TCP_LISTEN to skel->rodata->ss in patch eight ("selftests/bpf: Allow
  for iteration over multiple states") to more clearly express the
  intent that the option is only consumed for SOCK_STREAM tests
  (Stanislav).
* Move the `i = 0` assignment into the for loop in patch ten
  ("selftests/bpf: Create established sockets in socket iterator
  tests") (Stanislav).

v2 -> v3:
* Unroll the loop inside bpf_iter_tcp_batch to make the logic easier to
  follow in patch two ("bpf: tcp: Make sure iter->batch always contains
  a full bucket snapshot"). This gets rid of the `resizes` variable from
  v2 and eliminates the extra conditional that checks how many batch
  resize attempts have occurred so far (Stanislav).
    Note: This changes the behavior slightly. Before, in the case that
    the second call to tcp_seek_last_pos (and later bpf_iter_tcp_resume)
    advances to a new bucket, which may happen if the current bucket is
    emptied after releasing its lock, the `resizes` "budget" would be
    reset, the net effect being that we would try a batch resize with
    GFP_USER at most once per bucket. Now, we try to resize the batch
    with GFP_USER at most once per call, so it makes it slightly more
    likely that we hit the GFP_NOWAIT scenario. However, this edge case
    should be rare in practice anyway, and the new behavior is more or
    less consistent with the original retry logic, so avoid the loop and
    prefer code clarity.
* Move the call to bpf_iter_tcp_put_batch out of
  bpf_iter_tcp_realloc_batch and call it directly before invoking
  bpf_iter_tcp_realloc_batch with GFP_USER inside bpf_iter_tcp_batch.
  /Don't/ call it before invoking bpf_iter_tcp_realloc_batch the second
  time while we hold the lock with GFP_NOWAIT. This avoids a conditional
  inside bpf_iter_tcp_realloc_batch from v2 that only calls
  bpf_iter_tcp_put_batch if flags != GFP_NOWAIT and is a bit more
  explicit (Stanislav).
* Adjust patch five ("bpf: tcp: Avoid socket skips and repeats during
  iteration") to fit with the new logic in patch two.

v1 -> v2:
* In patch five ("bpf: tcp: Avoid socket skips and repeats during
  iteration"), remove unnecessary bucket bounds checks in
  bpf_iter_tcp_resume. In either case, if st->bucket is outside the
  current table's range then bpf_iter_tcp_resume_* calls *_get_first
  which immediately returns NULL anyway and the logic will fall through.
  (Martin)
* Add a check at the top of bpf_iter_tcp_resume_listening and
  bpf_iter_tcp_resume_established to see if we're done with the current
  bucket and advance it immediately instead of wasting time finding the
  first matching socket in that bucket with
  (listening|established)_get_first. In v1, we originally discussed
  adding logic to advance the bucket in bpf_iter_tcp_seq_next and
  bpf_iter_tcp_seq_stop, but after trying this the logic seemed harder
  to track. Overall, keeping everything inside bpf_iter_tcp_resume_*
  seemed a bit clearer. (Martin)
* Instead of using a timeout in the last patch ("selftests/bpf: Add
  tests for bucket resume logic in established sockets") to wait for
  sockets to leave the ehash table after calling close(), use
  bpf_sock_destroy to deterministically destroy and remove them. This
  introduces one more patch ("selftests/bpf: Create iter_tcp_destroy
  test program") to create the iterator program that destroys a selected
  socket. Drive this through a destroy() function in the last patch
  which, just like close(), accepts a socket file descriptor. (Martin)
* Introduce one more patch ("selftests/bpf: Allow for iteration over
  multiple states") to fix a latent bug in iter_tcp_soreuse where the
  sk->sk_state != TCP_LISTEN check was ignored. Add the "ss" variable to
  allow test code to configure which socket states to allow.

[1]: https://lore.kernel.org/bpf/20250502161528.264630-1-jordan@jrife.io/
====================

Link: https://patch.msgid.link/20250714180919.127192-1-jordan@jrife.io
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-07-14 15:12:54 -07:00
arch um: vector: Reduce stack usage in vector_eth_configure() 2025-06-25 09:28:17 +02:00
block block-6.16-20250614 2025-06-14 09:25:22 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: ahash - Fix infinite recursion in ahash_def_finup 2025-06-18 17:02:02 +08:00
Documentation dt-bindings: net: Document support for Airoha AN7583 MDIO Controller 2025-06-27 10:09:36 +01:00
drivers tg3: spelling corrections 2025-06-27 10:25:57 +01:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-26 10:40:50 -07:00
include net: Remove unused function first_net_device_rcu() 2025-06-26 17:52:01 -07:00
init init: fix build warnings about export.h 2025-06-11 22:42:36 -07:00
io_uring io_uring-6.16-20250621 2025-06-21 08:40:45 -07:00
ipc - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
kernel bpf-fixes 2025-06-25 21:09:02 -07:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-26 10:40:50 -07:00
LICENSES LICENSES: add CC0-1.0 license text 2025-05-21 14:54:17 +02:00
mm 20 hotfixes. 7 are cc:stable and the remainder address post-6.15 issues 2025-06-23 09:20:39 -07:00
net bpf: tcp: Avoid socket skips and repeats during iteration 2025-07-14 12:09:09 -07:00
rust Driver core fixes for 6.16-rc3 2025-06-18 14:31:16 -07:00
samples - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
scripts checkpatch: check for comment explaining rgmii(|-rxid|-txid) PHY modes 2025-06-26 14:49:10 +02:00
security selinux: change security_compute_sid to return the ssid or tsid on match 2025-06-19 16:13:16 -04:00
sound ALSA: hda/realtek: Enable headset Mic on Positivo P15X 2025-06-20 10:05:46 +02:00
tools selftests/bpf: Add tests for bucket resume logic in established sockets 2025-07-14 15:12:54 -07:00
usr usr/include: openrisc: don't HDRTEST bpf_perf_event.h 2025-05-12 15:03:17 +09:00
virt Merge branch 'kvm-lockdep-common' into HEAD 2025-05-28 06:29:17 -04:00
.clang-format Linux 6.15-rc5 2025-05-06 16:39:25 +10:00
.clippy.toml rust: clean Rust 1.88.0's warning about clippy::disallowed_macros configuration 2025-05-07 00:11:47 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: ignore Python compiled bytecode 2025-04-24 10:12:46 -06:00
.mailmap Including fixes from wireless. The ath12k fix to avoid FW crashes 2025-06-19 10:21:32 -07:00
.pylintrc docs: add a .pylintrc file with sys path for docs scripts 2025-04-09 12:10:33 -06:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: Add entry for Shannon Nelson 2025-06-21 07:34:28 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-26 10:40:50 -07:00
Makefile Linux 6.16-rc3 2025-06-22 13:30:08 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.