linux/kernel
Linus Torvalds 5f85bd6aec vfs-6.14-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ4pRdwAKCRCRxhvAZXjc
 otQjAP9ooUH2d/jHZ49Rw4q/3BkhX8R2fFEZgj2PMvtYlr0jQwD/d8Ji0k4jINTL
 AIFRfPdRwrD+X35IUK3WPO42YFZ4rAg=
 =5wgo
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:

 - Rework inode number allocation

   Recently we received a patchset that aims to enable file handle
   encoding and decoding via name_to_handle_at(2) and
   open_by_handle_at(2).

   A crucical step in the patch series is how to go from inode number to
   struct pid without leaking information into unprivileged contexts.
   The issue is that in order to find a struct pid the pid number in the
   initial pid namespace must be encoded into the file handle via
   name_to_handle_at(2).

   This can be used by containers using a separate pid namespace to
   learn what the pid number of a given process in the initial pid
   namespace is. While this is a weak information leak it could be used
   in various exploits and in general is an ugly wart in the design.

   To solve this problem a new way is needed to lookup a struct pid
   based on the inode number allocated for that struct pid. The other
   part is to remove the custom inode number allocation on 32bit systems
   that is also an ugly wart that should go away.

   Allocate unique identifiers for struct pid by simply incrementing a
   64 bit counter and insert each struct pid into the rbtree so it can
   be looked up to decode file handles avoiding to leak actual pids
   across pid namespaces in file handles.

   On both 64 bit and 32 bit the same 64 bit identifier is used to
   lookup struct pid in the rbtree. On 64 bit the unique identifier for
   struct pid simply becomes the inode number. Comparing two pidfds
   continues to be as simple as comparing inode numbers.

   On 32 bit the 64 bit number assigned to struct pid is split into two
   32 bit numbers. The lower 32 bits are used as the inode number and
   the upper 32 bits are used as the inode generation number. Whenever a
   wraparound happens on 32 bit the 64 bit number will be incremented by
   2 so inode numbering starts at 2 again.

   When a wraparound happens on 32 bit multiple pidfds with the same
   inode number are likely to exist. This isn't a problem since before
   pidfs pidfds used the anonymous inode meaning all pidfds had the same
   inode number. On 32 bit sserspace can thus reconstruct the 64 bit
   identifier by retrieving both the inode number and the inode
   generation number to compare, or use file handles. This gives the
   same guarantees on both 32 bit and 64 bit.

 - Implement file handle support

   This is based on custom export operation methods which allows pidfs
   to implement permission checking and opening of pidfs file handles
   cleanly without hacking around in the core file handle code too much.

 - Support bind-mounts

   Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts
   for pidfds. This allows pidfds to be safely recovered and checked for
   process recycling.

   Instead of checking d_ops for both nsfs and pidfs we could in a
   follow-up patch add a flag argument to struct dentry_operations that
   functions similar to file_operations->fop_flags.

* tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests: add pidfd bind-mount tests
  pidfs: allow bind-mounts
  pidfs: lookup pid through rbtree
  selftests/pidfd: add pidfs file handle selftests
  pidfs: check for valid ioctl commands
  pidfs: implement file handle support
  exportfs: add permission method
  fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh()
  exportfs: add open method
  fhandle: simplify error handling
  pseudofs: add support for export_ops
  pidfs: support FS_IOC_GETVERSION
  pidfs: remove 32bit inode number handling
  pidfs: rework inode number allocation
2025-01-20 09:59:00 -08:00
..
bpf bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP 2024-12-17 16:09:24 -08:00
cgroup cgroup/cpuset: remove kernfs active break 2025-01-08 15:54:39 -10:00
configs configs/debug: make sure PROVE_RCU_LIST=y takes effect 2024-10-28 10:21:09 -07:00
debug kdb: fix ctrl+e/a/f/b/d/p/n broken in keyboard mode 2024-11-18 15:20:22 +00:00
dma dma-debug: fix physical address calculation for struct dma_debug_entry 2024-11-28 10:19:16 +01:00
entry sched: Add TIF_NEED_RESCHED_LAZY infrastructure 2024-11-05 12:55:37 +01:00
events uprobes: Fix race in uprobe_free_utask 2025-01-10 09:28:01 +01:00
futex futex: fix user access on powerpc 2024-12-09 10:00:25 -08:00
gcov gcov: add support for GCC 14 2024-06-15 10:43:06 -07:00
irq genirq/proc: Add missing space separator back 2024-12-03 14:59:34 +01:00
kcsan kcsan: Remove redundant call of kallsyms_lookup_name() 2024-10-14 16:44:56 +02:00
livepatch livepatch: Replace snprintf() with sysfs_emit() 2024-07-02 16:56:18 +02:00
locking locking/rtmutex: Make sure we wake anything on the wake_q when we release the lock->wait_lock 2024-12-17 17:47:24 +01:00
module module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
power The biggest change here is eliminating the awful idea that KVM had, of 2024-11-23 16:00:50 -08:00
printk printk changes for 6.13 2024-11-20 09:21:11 -08:00
rcu A set of updates for the interrupt subsystem: 2024-11-19 15:54:19 -08:00
sched - Do not adjust the weight of empty group entities and avoid scheduling 2025-01-19 09:01:17 -08:00
time hrtimers: Handle CPU state correctly on hotplug 2025-01-16 13:06:14 +01:00
trace tracing: Print lazy preemption model 2025-01-14 09:44:33 -05:00
.gitignore
acct.c kernel misc: Remove the now superfluous sentinel elements from ctl_table array 2024-04-24 09:43:53 +02:00
async.c async: Use a dedicated unbound workqueue with raised min_active 2024-02-09 11:13:59 -10:00
audit.c lsm/stable-6.13 PR 20241112 2024-11-18 17:34:05 -08:00
audit.h audit: change context data from secid to lsm_prop 2024-10-11 14:34:16 -04:00
audit_fsnotify.c
audit_tree.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit_watch.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
auditfilter.c audit: change context data from secid to lsm_prop 2024-10-11 14:34:16 -04:00
auditsc.c audit: workaround a GCC bug triggered by task comm changes 2024-12-04 22:57:46 -05:00
backtracetest.c backtracetest: add MODULE_DESCRIPTION() 2024-06-24 22:24:55 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-04-29 08:29:29 -07:00
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c
compat.c
configs.c
context_tracking.c context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching 2024-08-15 21:30:43 +05:30
cpu.c hrtimers: Handle CPU state correctly on hotplug 2025-01-16 13:06:14 +01:00
cpu_pm.c
crash_core.c kexec/crash: no crash update when kexec in progress 2024-11-05 17:12:27 -08:00
crash_reserve.c crash: fix crash memory reserve exceed system memory bug 2024-09-01 20:43:30 -07:00
cred.c cred: Add a light version of override/revert_creds() 2024-11-11 10:45:04 +01:00
delayacct.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
dma.c
elfcorehdr.c crash: remove dependency of FA_DUMP on CRASH_DUMP 2024-02-23 17:48:22 -08:00
exec_domain.c
exit.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
exit.h exit: add internal include file with helpers 2023-09-21 12:03:50 -06:00
extable.c
fail_function.c
fork.c fork: avoid inappropriate uprobe access to invalid mm 2024-12-18 19:04:44 -08:00
freezer.c sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
gen_kheaders.sh kheaders: Ignore silly-rename files 2024-12-20 22:07:55 +01:00
groups.c groups: Convert group_info.usage to refcount_t 2023-09-29 11:28:39 -07:00
hung_task.c hung_task: add detect count for hung tasks 2024-11-11 17:17:03 -08:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-09-10 11:57:27 +02:00
kallsyms.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kallsyms_internal.h kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
kallsyms_selftest.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kallsyms_selftest.h
kcmp.c get rid of ...lookup...fdget_rcu() family 2024-10-07 13:34:41 -04:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 2024-11-14 22:43:48 -08:00
Kconfig.locks
Kconfig.preempt sched: No PREEMPT_RT=y for all{yes,mod}config 2024-11-07 15:25:05 +01:00
kcov.c kcov: mark in_softirq_really() as __always_inline 2024-12-30 17:59:08 -08:00
kexec.c crash: add a new kexec flag for hotplug support 2024-04-23 14:59:01 +10:00
kexec_core.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
kexec_elf.c
kexec_file.c kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y 2024-09-01 17:59:01 -07:00
kexec_internal.h kexec: use atomic_try_cmpxchg_acquire() in kexec_trylock() 2024-09-01 20:43:23 -07:00
kheaders.c
kprobes.c kprobes: Use struct_size() in __get_insn_slot() 2024-10-31 11:00:58 +09:00
ksyms_common.c
ksysfs.c profiling: remove prof_cpu_mask 2024-07-29 10:45:54 -07:00
kthread.c get rid of __get_task_comm() 2024-11-05 17:12:28 -08:00
latencytop.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
Makefile mm: move kernel/numa.c to mm/ 2024-09-03 21:15:26 -07:00
module_signature.c
notifier.c reboot: move reboot_notifier_list to kernel/reboot.c 2024-11-05 17:12:31 -08:00
nsproxy.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
padata.c padata: Clean up in padata_do_multithreaded() 2024-11-10 11:50:54 +08:00
panic.c drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
params.c params: Fix multi-line comment style 2023-12-01 09:51:44 -08:00
pid.c pidfs: lookup pid through rbtree 2024-12-17 09:16:18 +01:00
pid_namespace.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
pid_sysctl.h sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
profile.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
ptrace.c ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped() 2024-02-22 15:38:52 -08:00
range.c
reboot.c kernel/reboot: replace sprintf() with sysfs_emit() 2024-11-11 17:17:05 -08:00
regset.c regset: use kvzalloc() for regset_get_alloc() 2024-04-25 21:07:03 -07:00
relay.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
resource.c module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
resource_kunit.c resource, kunit: fix user-after-free in resource_test_region_intersects() 2024-10-09 12:47:19 -07:00
rseq.c
scftorture.c scftorture: Handle NULL argument passed to scf_add_to_free_list(). 2024-11-14 16:09:51 -08:00
scs.c
seccomp.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
signal.c signal/posixtimers: Handle ignore/blocked sequences correctly 2025-01-15 18:08:01 +01:00
smp.c locking/csd-lock: Switch from sched_clock() to ktime_get_mono_fast_ns() 2024-10-11 09:31:21 -07:00
smpboot.c kthread: add kthread_stop_put 2023-10-04 10:41:57 -07:00
smpboot.h
softirq.c softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel 2024-12-02 12:01:27 +01:00
stackleak.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
stacktrace.c stacktrace: fix kernel-doc typo 2023-12-29 12:22:29 -08:00
static_call.c
static_call_inline.c x86/static-call: provide a way to do very early static-call updates 2024-12-13 09:28:32 +01:00
stop_machine.c rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs() 2024-08-15 21:30:42 +05:30
sys.c arm64 updates for 6.13: 2024-11-18 18:10:37 -08:00
sys_ni.c Probes updates for v6.11: 2024-07-18 12:19:20 -07:00
sysctl-test.c sysctl: Add module description to sysctl-testing 2024-06-03 15:20:37 +02:00
sysctl.c sysctl: Reorganize kerneldoc parameter names 2024-10-23 15:28:40 +02:00
task_work.c sched/core: Disable page allocation in task_tick_mm_cid() 2024-10-11 10:49:32 +02:00
taskstats.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
torture.c torture: Add MODULE_DESCRIPTION() 2024-05-30 15:31:38 -07:00
tracepoint.c tracing: Fix syscall tracepoint use-after-free 2024-11-01 14:37:31 -04:00
tsacct.c tsacct: replace strncpy() with strscpy() 2024-07-12 16:39:53 -07:00
ucount.c Summary 2024-11-22 20:36:11 -08:00
uid16.c
uid16.h
umh.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
up.c smp: Change function signatures to use call_single_data_t 2023-09-13 14:59:24 +02:00
user-return-notifier.c
user.c uidgid: make sure we fit into one cacheline 2024-09-12 12:16:09 +02:00
user_namespace.c user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation 2024-09-09 16:47:42 -07:00
usermode_driver.c
utsname.c
utsname_sysctl.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
vhost_task.c vhost_task: Handle SIGKILL by flushing work and exiting 2024-05-22 08:31:15 -04:00
vmcore_info.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
watch_queue.c watch_queue: Use page->private instead of page->index 2024-12-22 11:29:51 +01:00
watchdog.c - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-07-17 21:11:34 -07:00
workqueue.c workqueue: warn if delayed_work is queued to an offlined cpu. 2025-01-10 08:33:39 -10:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00