linux/net
Linus Torvalds 672dcda246 vfs-6.17-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCiQAKCRCRxhvAZXjc
 orltAQDq3y1anYETz5/FD6P2gXY1W5hXdSm3EHHeacQ1JjTXvgEA2g1lWO7J4anf
 oOVE8aSvMow/FOjivLZBYmI65pkYJAE=
 =oDKB
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:

 - persistent info

   Persist exit and coredump information independent of whether anyone
   currently holds a pidfd for the struct pid.

   The current scheme repeatedly allocated pidfs dentries on demand.
   This scheme is reaching its limits as it makes it impossible to pin
   information that needs to be available after the task has exited or
   coredumped and that should not be lost simply because the pidfd got
   closed temporarily. The next opener should still see the stashed
   information.

   This is also a prerequisite for supporting extended attributes on
   pidfds to allow attaching meta information to them.

   If someone opens a pidfd for a struct pid, a pidfs dentry is allocated
   and stashed in pid->stashed. Once the last pidfd for the struct pid
   is closed, the pidfs dentry is released and removed from pid->stashed.

   So if 10 callers create a pidfs dentry for the same struct pid
   sequentially, i.e., each closing its pidfd before the next one creates
   a new one, then a new pidfs dentry is allocated every time.

   Because multiple tasks acquiring and releasing a pidfd for the same
   struct pid can race with each other, a task may still find a valid
   pidfs entry from the previous task in pid->stashed and reuse it. Or
   it might find a dead dentry there, fail to reuse it, and stash a new
   pidfs dentry. Multiple tasks may race to stash a new pidfs dentry but
   only one will succeed; the others will put their dentry.

   The current scheme aims to ensure that a pidfs dentry for a struct
   pid can only be created if the task is still alive or if a pidfs
   dentry already existed before the task was reaped and so exit
   information has been stashed in the pidfs inode.

   That's great except that it's buggy. If a pidfs dentry is stashed in
   pid->stashed after pidfs_exit() but before __unhash_process() is
   called, we will return a pidfd for a reaped task without exit
   information being available.

   The pidfds_pid_valid() check does not guard against this race as it
   doesn't sync with pidfs_exit() at all. The pid_has_task() check might
   succeed simply because we're before __unhash_process() but after
   pidfs_exit().

   Introduce a new scheme where the lifetime of information associated
   with a pidfs entry (coredump and exit information) isn't bound to the
   lifetime of the pidfs inode but the struct pid itself.

   The first time a pidfs dentry is allocated for a struct pid a struct
   pidfs_attr will be allocated which will be used to store exit and
   coredump information.

   If all pidfds for the pidfs dentry are closed, the dentry and inode can
   be cleaned up, but the struct pidfs_attr will stick around until the
   struct pid itself is freed. This ensures minimal memory usage while
   persisting relevant information.

   The new scheme has various advantages. First, it allows closing the
   race where we end up handing out a pidfd for a reaped task for which
   no exit information is available. Second, it minimizes memory usage.
   Third, it allows removing complex lifetime tracking via dentries
   when registering a struct pid with pidfs. There's no need to get or
   put a reference. Instead, the lifetime of exit and coredump
   information associated with a struct pid is bound to the lifetime of
   struct pid itself.

 - extended attributes

   Now that we have a way to persist information for pidfs dentries, we
   can start supporting extended attributes on pidfds. This will allow
   userspace to attach meta information to tasks.

   One natural extension would be to introduce a custom pidfs.* extended
   attribute space and allow for the inheritance of extended attributes
   across fork() and exec().

   The first simple scheme will allow privileged userspace to set
   trusted extended attributes on pidfs inodes.

 - Allow autonomous pidfs file handles

   Various filesystems such as pidfs and drm support opening file
   handles without requiring a file descriptor to identify the
   filesystem. These filesystems are global single instances and can be
   trivially identified based solely on the information encoded in the
   file handle.

   This removes the need to keep or acquire a sentinel file descriptor
   just to pass it to open_by_handle_at() to identify the filesystem.
   That's especially useful when such a sentinel file descriptor cannot
   or should not be acquired.

   For pidfs this means a file handle can function as a full replacement
   for storing a pid in a file. Instead, a file handle can be stored and
   reopened purely based on the file handle.

   Such autonomous file handles can be opened with or without specifying
   a file descriptor. If no proper file descriptor is used, the
   FD_PIDFS_ROOT sentinel must be passed. This allows us to define
   further special negative fd sentinels in the future.

   Userspace can trivially test for support by trying to open the file
   handle with an invalid file descriptor.

 - Allow pidfds for reaped tasks with SCM_PIDFD messages

   This is a logical continuation of the earlier work to create pidfds
   for reaped tasks through the SO_PEERPIDFD socket option merged in
   923ea4d448 ("Merge patch series "net, pidfs: enable handing out
   pidfds for reaped sk->sk_peer_pid"").

 - Two minor fixes:

    * Fold fs_struct->{lock,seq} into a seqlock

    * Don't bother with path_{get,put}() in unix_open_file()
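
The persistent-info scheme above can be modelled in userspace to show the intended lifetimes. This is a minimal sketch under stated assumptions, not kernel code: `struct model_pid`, `struct model_pidfs_attr` and every `model_*` function are hypothetical stand-ins for struct pid, struct pidfs_attr and the pidfs open/close/exit paths. The attr is allocated once on the first pidfd open (a racing allocator that loses a compare-and-swap frees its copy), survives the last pidfd being closed, and is freed only together with the pid. It also assumes exit information is recorded only if the pid was registered with pidfs before exiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are not kernel types. */
struct model_pidfs_attr {
	bool exited;	/* has exit information been recorded? */
	int exit_code;
};

struct model_pid {
	/* Allocated on first pidfd open, freed only with the pid itself. */
	_Atomic(struct model_pidfs_attr *) attr;
	/* Stands in for the dentry/inode lifetime (number of open pidfds). */
	atomic_int open_pidfds;
};

void model_pid_init(struct model_pid *pid)
{
	atomic_init(&pid->attr, NULL);
	atomic_init(&pid->open_pidfds, 0);
}

/* Allocate the attr once; a task that loses the race frees its copy. */
static struct model_pidfs_attr *model_pid_get_attr(struct model_pid *pid)
{
	struct model_pidfs_attr *cur = atomic_load(&pid->attr);
	struct model_pidfs_attr *fresh;

	if (cur)
		return cur;
	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;
	if (atomic_compare_exchange_strong(&pid->attr, &cur, fresh))
		return fresh;
	free(fresh);	/* lost the race: cur now holds the winner's attr */
	return cur;
}

bool model_open_pidfd(struct model_pid *pid)
{
	if (!model_pid_get_attr(pid))
		return false;
	atomic_fetch_add(&pid->open_pidfds, 1);
	return true;
}

/* Closing the last pidfd may free the dentry/inode; the attr stays. */
void model_close_pidfd(struct model_pid *pid)
{
	int prev = atomic_fetch_sub(&pid->open_pidfds, 1);

	assert(prev > 0);
}

/* Assumption: exit info is only recorded if the pid was registered. */
void model_pid_exit(struct model_pid *pid, int code)
{
	struct model_pidfs_attr *attr = atomic_load(&pid->attr);

	if (attr) {
		attr->exited = true;
		attr->exit_code = code;
	}
}

/* The attr's lifetime ends with the struct pid, not with any pidfd. */
void model_pid_free(struct model_pid *pid)
{
	free(atomic_exchange(&pid->attr, NULL));
}
```

In this model, opening and closing a pidfd, recording an exit, and then opening again still finds the stashed exit code, because the attr outlived the closed pidfd.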
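
For the extended attributes item above, privileged userspace would set a trusted.* attribute through an ordinary pidfd. The sketch below assumes a kernel that includes this merge (6.17+) and a caller with the privilege the trusted namespace requires (CAP_SYS_ADMIN). `tag_task()` is a hypothetical helper; its up-front check for the "trusted." prefix only mirrors the description that trusted xattrs are what is supported first and does not claim which errno the kernel itself returns. The `SYS_pidfd_open` fallback is for older libc headers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434	/* unified syscall table number; alpha differs */
#endif

/* Hypothetical helper: attach a trusted.* xattr to a task via a pidfd.
 * Returns 0 on success or a negative errno value. */
int tag_task(pid_t pid, const char *name, const char *value)
{
	int pidfd, ret = 0;

	/* Only trusted.* is expected to be settable in this first scheme. */
	if (strncmp(name, "trusted.", strlen("trusted.")) != 0)
		return -EINVAL;

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -errno;

	if (fsetxattr(pidfd, name, value, strlen(value), 0) < 0)
		ret = -errno;	/* e.g. missing privilege or pre-6.17 kernel */

	/* Assumption: the xattr is kept with the struct pid, not this fd. */
	close(pidfd);
	return ret;
}
```

A caller might run `tag_task(getpid(), "trusted.note", "batch-job")` as root and read the value back with fgetxattr() on a later pidfd for the same task.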
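
The autonomous file handle item above suggests storing a pidfs file handle instead of a numeric pid. A minimal sketch, assuming a kernel with pidfs file handle support: `pidfd_to_handle()` and `handle_to_pidfd()` are hypothetical helpers built on the real name_to_handle_at() and open_by_handle_at() calls. FD_PIDFS_ROOT is deliberately not hard-coded here; the caller is assumed to take it from 6.17 uapi headers, or to pass any fd that lives on pidfs (such as another pidfd) on kernels that support pidfs handles but not the sentinel. open_by_handle_at() may also require privileges depending on the kernel version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Hypothetical helper: encode the file handle of an open pidfd so it can
 * be stored (e.g. written to a file) in place of a numeric pid.
 * Returns NULL with errno set on failure; free() the result. */
struct file_handle *pidfd_to_handle(int pidfd)
{
	struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	int mount_id, saved;

	if (!fh)
		return NULL;
	fh->handle_bytes = MAX_HANDLE_SZ;
	/* "" plus AT_EMPTY_PATH: encode the object pidfd itself refers to. */
	if (name_to_handle_at(pidfd, "", fh, &mount_id, AT_EMPTY_PATH) < 0) {
		saved = errno;
		free(fh);
		errno = saved;
		return NULL;
	}
	/* On success the kernel wrote the real handle size. */
	assert(fh->handle_bytes <= MAX_HANDLE_SZ);
	return fh;
}

/* Hypothetical helper: reopen a stored handle as a pidfd. anchor_fd is
 * FD_PIDFS_ROOT (assumed to come from the uapi headers) or a pidfs fd. */
int handle_to_pidfd(int anchor_fd, struct file_handle *fh)
{
	return open_by_handle_at(anchor_fd, fh, O_RDONLY);
}
```

Persisting `sizeof(*fh) + fh->handle_bytes` bytes is enough to rebuild the handle later, which is what makes it usable as a replacement for a pid file.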
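
For the SCM_PIDFD item above, a receiver enables SO_PASSPIDFD on an AF_UNIX socket and then finds the sender's pidfd in an SCM_PIDFD control message. This is a sketch, not the selftest from the shortlog: `enable_passpidfd()` and `recv_pidfd()` are hypothetical helpers, and the fallback constant values are assumptions taken from the asm-generic uapi headers (some architectures use a different SO_PASSPIDFD number). With this merge, the pidfd may refer to a task that has already been reaped.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SO_PASSPIDFD
#define SO_PASSPIDFD 76	/* asm-generic value; some arches differ */
#endif
#ifndef SCM_PIDFD
#define SCM_PIDFD 0x04
#endif

/* Hypothetical helper: ask the kernel to attach SCM_PIDFD to messages. */
int enable_passpidfd(int sock)
{
	int one = 1;

	if (setsockopt(sock, SOL_SOCKET, SO_PASSPIDFD, &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}

/* Hypothetical helper: receive one message and return the sender's pidfd
 * from the SCM_PIDFD control message, or a negative errno value. */
int recv_pidfd(int sock)
{
	char data[64];
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* keeps buf suitably aligned */
	} ctrl;
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *c;

	if (recvmsg(sock, &msg, 0) < 0)
		return -errno;

	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_PIDFD) {
			int pidfd;

			/* Assumption: the payload may hold a negative error
			 * instead of an fd; negative means no usable pidfd. */
			memcpy(&pidfd, CMSG_DATA(c), sizeof(pidfd));
			return pidfd;
		}
	}
	return -ENODATA;
}
```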

* tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits)
  don't bother with path_get()/path_put() in unix_open_file()
  fold fs_struct->{lock,seq} into a seqlock
  selftests: net: extend SCM_PIDFD test to cover stale pidfds
  af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD
  af_unix: stash pidfs dentry when needed
  af_unix/scm: fix whitespace errors
  af_unix: introduce and use scm_replace_pid() helper
  af_unix: introduce unix_skb_to_scm helper
  af_unix: rework unix_maybe_add_creds() to allow sleep
  selftests/pidfd: decode pidfd file handles withou having to specify an fd
  fhandle, pidfs: support open_by_handle_at() purely based on file handle
  uapi/fcntl: add FD_PIDFS_ROOT
  uapi/fcntl: add FD_INVALID
  fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP
  uapi/fcntl: mark range as reserved
  fhandle: reflow get_path_anchor()
  pidfs: add pidfs_root_path() helper
  fhandle: rename to get_path_anchor()
  fhandle: hoist copy_from_user() above get_path_from_fd()
  fhandle: raise FILEID_IS_DIR in handle_type
  ...
2025-07-28 14:10:15 -07:00
6lowpan
9p netfs: Fix the request's work item to not require a ref 2025-05-21 14:35:20 +02:00
802 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
8021q net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime 2025-07-17 07:44:26 -07:00
appletalk net: appletalk: Fix use-after-free in AARP proxy probe 2025-07-21 16:55:08 -07:00
atm atm: clip: Fix NULL pointer dereference in vcc_sendmsg() 2025-07-09 19:09:36 -07:00
ax25 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
batman-adv treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
bluetooth Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU 2025-07-17 10:26:53 -04:00
bpf selftests/bpf: Add test to access const void pointer argument in tracing program 2025-04-23 11:26:22 -07:00
bridge net: bridge: Do not offload IGMP/MLD messages 2025-07-17 07:46:41 -07:00
caif rtnetlink: Pack newlink() params into struct 2025-02-21 15:28:02 -08:00
can treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ceph A small CephFS encryption-related fix and a dead code cleanup. 2025-04-25 15:51:28 -07:00
core vfs-6.17-rc1.pidfs 2025-07-28 14:10:15 -07:00
dcb dcb: Use rtnl_register_many(). 2024-10-15 18:52:26 -07:00
devlink devlink: use DEVLINK_VAR_ATTR_TYPE_* instead of NLA_* in fmsg 2025-05-06 18:21:11 -07:00
dns_resolver
dsa net: dsa: tag_brcm: legacy: fix pskb_may_pull length 2025-05-30 19:20:18 -07:00
ethernet
ethtool Including fixes from bluetooth and wireless. 2025-06-12 09:50:36 -07:00
handshake module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
hsr treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ieee802154 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
ife
ipv4 ipsec-2025-07-23 2025-07-24 12:30:40 +02:00
ipv6 ipsec-2025-07-23 2025-07-24 12:30:40 +02:00
iucv s390: Convert MACHINE_IS_[LPAR|VM|KVM], etc, machine_is_[lpar|vm|kvm]() 2025-03-04 17:18:07 +01:00
kcm
key Revert "xfrm: destroy xfrm_state synchronously on net exit path" 2025-07-08 13:28:29 +02:00
l2tp net: move misc netdev_lock flavors to a separate header 2025-03-08 09:06:50 -08:00
l3mdev net: fib_rules: Fix iif / oif matching on L3 master device 2025-04-15 17:54:56 -07:00
lapb treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
llc treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
mac80211 wifi: mac80211: add the virtual monitor after reconfig complete 2025-07-10 13:27:14 +02:00
mac802154 mac802154: Switch to use hrtimer_setup() 2025-02-18 10:35:44 +01:00
mctp net: mctp: use nlmsg_payload() for netlink message data extraction 2025-05-26 17:38:27 +02:00
mpls mpls: Use rcu_dereference_rtnl() in mpls_route_input_rcu(). 2025-06-17 18:21:59 -07:00
mptcp mptcp: reset fallback status gracefully at disconnect() time 2025-07-15 17:31:25 -07:00
ncsi treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
netfilter netfilter: nf_conntrack: fix crash due to removal of uninitialised entry 2025-07-17 11:23:33 +02:00
netlabel calipso: unlock rcu before returning -EAFNOSUPPORT 2025-06-05 08:03:38 -07:00
netlink netlink: make sure we allow at least one dump skb 2025-07-11 07:31:47 -07:00
netrom treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
nfc NFC: nci: uart: Set tty->disc_data only in success path 2025-06-19 08:33:54 -07:00
nsh
openvswitch openvswitch: Allocate struct ovs_pcpu_storage dynamically 2025-06-17 14:47:46 +02:00
packet af_packet: fix soft lockup issue caused by tpacket_snd() 2025-07-13 01:28:51 +01:00
phonet phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept() 2025-07-17 07:30:27 -07:00
psample psample: adjust size if rate_as_probability is set 2024-12-18 19:23:04 -08:00
qrtr
rds replace strncpy with strscpy_pad 2025-05-26 22:28:44 +02:00
rfkill net: rfkill: gpio: allow booting in blocked state 2025-02-11 11:55:55 +01:00
rose rose: fix dangling neighbour pointers in rose_rt_device_down() 2025-07-01 19:28:48 -07:00
rxrpc rxrpc: Fix to use conn aborts for conn-wide failures 2025-07-17 07:50:48 -07:00
sched net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class 2025-07-22 11:48:34 +02:00
sctp treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
shaper net: add netdev_lock() / netdev_unlock() helpers 2025-01-15 19:13:33 -08:00
smc smc: Fix various oops due to inet_sock type confusion. 2025-07-14 17:12:42 -07:00
strparser strparser: Remove unused __strp_unpause 2025-05-05 16:48:12 -07:00
sunrpc Massage rpc_pipefs to use saner primitives and clean up the 2025-07-28 09:56:09 -07:00
switchdev net: switchdev: Convert blocking notification chain to a raw one 2025-03-11 11:30:28 +01:00
tipc tipc: Fix use-after-free in tipc_conn_close(). 2025-07-07 18:38:24 -07:00
tls tls: always refresh the queue when reading sock 2025-07-17 07:39:02 -07:00
unix vfs-6.17-rc1.pidfs 2025-07-28 14:10:15 -07:00
vmw_vsock vsock: Fix IOCTL_VM_SOCKETS_GET_LOCAL_CID to check also transport_local 2025-07-08 08:39:49 -07:00
wireless wifi: prevent A-MSDU attacks in mesh networks 2025-07-07 10:54:13 +02:00
x25 treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-05-22 09:42:41 -07:00
xfrm ipsec-2025-07-23 2025-07-24 12:30:40 +02:00
compat.c
devres.c
Kconfig net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR 2025-05-27 17:31:42 -07:00
Kconfig.debug
Makefile net: Retire DCCP socket. 2025-04-11 18:58:10 -07:00
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-03-26 09:32:10 -07:00
sysctl_net.c