Commit graph

4499 commits

Author SHA1 Message Date
Linus Torvalds
1b37ac211a nfsd-7.0 fixes:
NFSD fixes that arrived too late for the 7.0 merge window.
 
 Fixes for commits merged in 7.0:
 - Restore previous nfsd thread count reporting behavior
 
 Issues that need expedient stable backports:
 - Fix credential reference leaks in the NFSD netlink admin protocol
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmmlmfQACgkQM2qzM29m
 f5ckDA//eSSeZG+Ld2MX+DrYH7aYUSQzlUJ7ENKpt904tw6qy8o3dPFg5gcF2ZKQ
 2cK05l0G96eaYYZu/nto3RdS7lx3iDgQuWq9KorQMYXoCad/tNl8RYC8HiuH+aDu
 Fir7RApMknXe54Mz7uiPaZBUZHkb+hqe9wHOVJkZyMlRMYNAtijsI4wfaY9a5ACK
 dh03lCMOJyU3emBXizNsZ9lysuRbPVpHQEmcZsJUTnA7f6xcCTF/CyEtxjCHX9Z5
 KZ0Ltb/kG9V1VFyuGAm1S0dQmAKbl2WUo5k5eslRXmHxFx072BFOpwXlr4qd4yWt
 zjY9VY5q0anXWNgwz1U897R5xDfx43C+OdnRcMxWF7bRnNmNyCNeXnYUgSuh4HYF
 Y2IHBJk9HXSlxeiSZAq45lDgNOfg5ZBgGVVfcuKqUxgcCqG5r56FqGFkJiPvuDiI
 CEW6dIn7OQuUzDnSK0vXWFR1KGu39nKaunJHAq2BTLxbW42K5EPDw+Vhibym2LQG
 uSsBNHtviWKMONkb3jrkK5sIZryL07M/fLsYKYkSmF/B1XVwtvZHGG2k7qCCCM7B
 5IKjAFeFCRqiyYO8lm3dhLz/SbH5jpqUb3V7OpxAytk8FAEsUGX5y8fu43rfQCZD
 g2spjlrtoAhg3dSsAmrw9bDSs2TdAagWkSj1NfNgbsJs2irEgyU=
 =kHSn
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Restore previous nfsd thread count reporting behavior

 - Fix credential reference leaks in the NFSD netlink admin protocol

* tag 'nfsd-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: report the requested maximum number of threads instead of number running
  nfsd: Fix cred ref leak in nfsd_nl_listener_set_doit().
  nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit().
2026-03-02 09:05:20 -08:00
Jeff Layton
364410170a nfsd: report the requested maximum number of threads instead of number running
The current netlink and /proc interfaces deviate from their traditional
values when dynamic threading is enabled, and there is currently no way
to know what the current setting is. This patch brings the reporting
back in line with traditional behavior.

Make these interfaces report the requested maximum number of threads
instead of the number currently running. Also, update documentation and
comments to reflect that this value represents a maximum and not the
number currently running.

Fixes: d8316b837c ("nfsd: add controls to set the minimum number of threads per pool")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-02-24 10:27:51 -05:00
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
45a43ac5ac vfs-7.0-rc1.misc.2
Please consider pulling these changes from the signed vfs-7.0-rc1.misc.2 tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaZMOCwAKCRCRxhvAZXjc
 oswrAP9r1zjzMimjX2J0hBoMnYjNzQfLLew8+IRygImQ+yaqWgD9Fiw/cQ9eE1Hm
 TMLqck/ky588ywSDaBzfztrXAY3ISgg=
 =4yr2
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull more misc vfs updates from Christian Brauner:
 "Features:

   - Optimize close_range() from O(range size) to O(active FDs) by using
     find_next_bit() on the open_fds bitmap instead of linearly scanning
     the entire requested range. This is a significant improvement for
     large-range close operations on sparse file descriptor tables.

   - Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable
     via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only.
     Add tracepoints for fs-verity enable and verify operations,
     replacing the previously removed debug printk's.

   - Prevent nfsd from exporting special kernel filesystems like pidfs
     and nsfs. These filesystems have custom ->open() and ->permission()
     export methods that are designed for open_by_handle_at(2) only and
     are incompatible with nfsd. Update the exportfs documentation
     accordingly.

  Fixes:

   - Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used
     on a non-null-terminated decrypted directory entry name from
     fscrypt. This triggered on encrypted lower layers when the
     decrypted name buffer contained uninitialized tail data.

     The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and
     name_is_dot_dotdot() helpers, replacing various open-coded "." and
     ".." checks across the tree.

   - Fix read-only fsflags not being reset together with xflags in
     vfs_fileattr_set(). Currently harmless since no read-only xflags
     overlap with flags, but this would cause inconsistencies for any
     future shared read-only flag

   - Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the
     target process is in a different pid namespace. This lets userspace
     distinguish "process exited" from "process in another namespace",
     matching glibc's pidfd_getpid() behavior

  Cleanups:

   - Use C-string literals in the Rust seq_file bindings, replacing the
     kernel::c_str!() macro (available since Rust 1.77)

   - Fix typo in d_walk_ret enum comment, add porting notes for the
     readlink_copy() calling convention change"

* tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add porting notes about readlink_copy()
  pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns
  nfsd: do not allow exporting of special kernel filesystems
  exportfs: clarify the documentation of open()/permission() expotrfs ops
  fsverity: add tracepoints
  fs: add FS_XFLAG_VERITY for fs-verity files
  rust: seq_file: replace `kernel::c_str!` with C-Strings
  fs: dcache: fix typo in enum d_walk_ret comment
  ovl: use name_is_dot* helpers in readdir code
  fs: add helpers name_is_dot{,dot,_dotdot}
  ovl: Fix uninit-value in ovl_fill_real
  fs: reset read-only fsflags together with xflags
  fs/file: optimize close_range() complexity from O(N) to O(Sparse)
2026-02-16 13:00:36 -08:00
Kuniyuki Iwashima
92978c83bb nfsd: Fix cred ref leak in nfsd_nl_listener_set_doit().
nfsd_nl_listener_set_doit() uses get_current_cred() without
put_cred().

As we can see from other callers, svc_xprt_create_from_sa()
does not require the extra refcount.

nfsd_nl_listener_set_doit() is always in the process context,
sendmsg(), and current->cred does not go away.

Let's use current_cred() in nfsd_nl_listener_set_doit().

Fixes: 16a4711774 ("NFSD: add listener-{set,get} netlink command")
Cc: stable@vger.kernel.org
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-02-14 12:50:24 -05:00
Kuniyuki Iwashima
1cb968a201 nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit().
syzbot reported memory leak of struct cred. [0]

nfsd_nl_threads_set_doit() passes get_current_cred() to
nfsd_svc(), but put_cred() is not called after that.

The cred is finally passed down to _svc_xprt_create(),
which calls get_cred() with the cred for struct svc_xprt.

The ownership of the refcount by get_current_cred() is not
transferred to anywhere and is just leaked.

nfsd_svc() is also called from write_threads(), but it does
not bump file->f_cred there.

nfsd_nl_threads_set_doit() is called from sendmsg() and
current->cred does not go away.

Let's use current_cred() in nfsd_nl_threads_set_doit().

[0]:
BUG: memory leak
unreferenced object 0xffff888108b89480 (size 184):
  comm "syz-executor", pid 5994, jiffies 4294943386
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 369454a7):
    kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
    slab_post_alloc_hook mm/slub.c:4958 [inline]
    slab_alloc_node mm/slub.c:5263 [inline]
    kmem_cache_alloc_noprof+0x412/0x580 mm/slub.c:5270
    prepare_creds+0x22/0x600 kernel/cred.c:185
    copy_creds+0x44/0x290 kernel/cred.c:286
    copy_process+0x7a7/0x2870 kernel/fork.c:2086
    kernel_clone+0xac/0x6e0 kernel/fork.c:2651
    __do_sys_clone+0x7f/0xb0 kernel/fork.c:2792
    do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
    do_syscall_64+0xa4/0xf80 arch/x86/entry/syscall_64.c:94
    entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 924f4fb003 ("NFSD: convert write_threads to netlink command")
Cc: stable@vger.kernel.org
Reported-by: syzbot+dd3b43aa0204089217ee@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69744674.a00a0220.33ccc7.0000.GAE@google.com/
Tested-by: syzbot+dd3b43aa0204089217ee@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-02-14 12:48:51 -05:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
2831fa8b8b NFSD 7.0 Release Notes
Neil Brown and Jeff Layton contributed a dynamic thread pool sizing
 mechanism for NFSD. The sunrpc layer now tracks minimum and maximum
 thread counts per pool, and NFSD adjusts running thread counts based
 on workload: idle threads exit after a timeout when the pool exceeds
 its minimum, and new threads spawn automatically when all threads
 are busy. Administrators control this behavior via the nfsdctl
 netlink interface.
 
 Rick Macklem, FreeBSD NFS maintainer, generously contributed server-
 side support for the POSIX ACL extension to NFSv4, as specified in
 draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to
 get and set POSIX access and default ACLs using native NFSv4
 operations, eliminating the need for sideband protocols. The feature
 is gated by a Kconfig option since the IETF draft has not yet been
 ratified.
 
 Chuck Lever delivered numerous improvements to the xdrgen tool.
 Error reporting now covers parsing, AST transformation, and invalid
 declarations. Generated enum decoders validate incoming values
 against valid enumerator lists. New features include pass-through
 line support for embedding C directives in XDR specifications,
 16-bit integer types, and program number definitions. Several code
 generation issues were also addressed.
 
 When an administrator revokes NFSv4 state for a filesystem via the
 unlock_fs interface, ongoing async COPY operations referencing that
 filesystem are now cancelled, with CB_OFFLOAD callbacks notifying
 affected clients.
 
 The remaining patches in this pull request are clean-ups and minor
 optimizations. Sincere thanks to all contributors, reviewers,
 testers, and bug reporters who participated in the v7.0 NFSD
 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmmJ8kAACgkQM2qzM29m
 f5ejCQ//RdoWNgN1VZdNoUrh1tm1Fhi1YN/RJS26G25OxgTBc3/qtGxrpW+ZAW6+
 mIAJ2bT/l66741drki4/x6WJU4OMI/4mJxrLd0WCb1POaeRQWnL1MdzNY+IP/QZv
 3DgcTv1T6FKE7pFmAqW0nFPCgaK+vlR+fo4uJognbB6+hZB3HlrLkfeZOWMAmchC
 y3U6nzrtP+IljAtdzKZ120E+LHp0PtTbJwPCPSt3/FR/dkA0DcjnOS9jybIYlJOu
 0ByX24BcrW/c3rJUdL8lL4G7gsPWjdARqczFiN8sufI9Q3zlHOxtYdUT7BNjd+04
 jcSKLlAXwcbNcK2f54B/QFKmNxllvoHLB3wo2hfEPig4LQELuxcUHYxmmD4vNKen
 lp6zmaLq3PiRGlew6eLRFxRxbdLds+9l0xjXV+J+rtQmjppXdXUoVNMm+D+tD6bF
 T5TUq4WNCGJIrpkR7pdF7uMD51s8fphvaDxOCjhSi3WHAtZAhOR8HFUU97qddM34
 KqF6Gph3tN/C4oNb8kKvzxBRpRhHIzKHZbreiu5fZr9pPe9IRBHnn/Dg4p/yYQcw
 K3/y1EnKrIlprfbFFkY1LzNFpf309uoZTVzwBcMfSJVsFgUqWD7KHJ/rmCJQ/pS6
 k0+YLRoUmtUHDYk2QNlstlt7r6FwA0d2GjT8n7viGoNQ3PA7rJQ=
 =hqla
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Neil Brown and Jeff Layton contributed a dynamic thread pool sizing
  mechanism for NFSD. The sunrpc layer now tracks minimum and maximum
  thread counts per pool, and NFSD adjusts running thread counts based
  on workload: idle threads exit after a timeout when the pool exceeds
  its minimum, and new threads spawn automatically when all threads are
  busy. Administrators control this behavior via the nfsdctl netlink
  interface.

  Rick Macklem, FreeBSD NFS maintainer, generously contributed server-
  side support for the POSIX ACL extension to NFSv4, as specified in
  draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to
  get and set POSIX access and default ACLs using native NFSv4
  operations, eliminating the need for sideband protocols. The feature
  is gated by a Kconfig option since the IETF draft has not yet been
  ratified.

  Chuck Lever delivered numerous improvements to the xdrgen tool. Error
  reporting now covers parsing, AST transformation, and invalid
  declarations. Generated enum decoders validate incoming values against
  valid enumerator lists. New features include pass-through line support
  for embedding C directives in XDR specifications, 16-bit integer
  types, and program number definitions. Several code generation issues
  were also addressed.

  When an administrator revokes NFSv4 state for a filesystem via the
  unlock_fs interface, ongoing async COPY operations referencing that
  filesystem are now cancelled, with CB_OFFLOAD callbacks notifying
  affected clients.

  The remaining patches in this pull request are clean-ups and minor
  optimizations. Sincere thanks to all contributors, reviewers, testers,
  and bug reporters who participated in the v7.0 NFSD development cycle"

* tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
  NFSD: Add POSIX ACL file attributes to SUPPATTR bitmasks
  NFSD: Add POSIX draft ACL support to the NFSv4 SETATTR operation
  NFSD: Add support for POSIX draft ACLs for file creation
  NFSD: Add support for XDR decoding POSIX draft ACLs
  NFSD: Refactor nfsd_setattr()'s ACL error reporting
  NFSD: Do not allow NFSv4 (N)VERIFY to check POSIX ACL attributes
  NFSD: Add nfsd4_encode_fattr4_posix_access_acl
  NFSD: Add nfsd4_encode_fattr4_posix_default_acl
  NFSD: Add nfsd4_encode_fattr4_acl_trueform_scope
  NFSD: Add nfsd4_encode_fattr4_acl_trueform
  Add RPC language definition of NFSv4 POSIX ACL extension
  NFSD: Add a Kconfig setting to enable support for NFSv4 POSIX ACLs
  xdrgen: Implement pass-through lines in specifications
  nfsd: cancel async COPY operations when admin revokes filesystem state
  nfsd: add controls to set the minimum number of threads per pool
  nfsd: adjust number of running nfsd threads based on activity
  sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY
  sunrpc: split new thread creation into a separate function
  sunrpc: introduce the concept of a minimum number of threads per pool
  sunrpc: track the max number of requested threads in a pool
  ...
2026-02-12 08:23:53 -08:00
Linus Torvalds
8113b3998d vfs-7.0-rc1.atomic_open
Please consider pulling these changes from the signed vfs-7.0-rc1.atomic_open tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
 ogUIAQDJTGgoi7H5a8OllRLXU/6D4OXhIhvZtvrK31HfLLDTRAEAw8JFnvFrCJP9
 xf3yVklTJ9aW65zeh2mG0uiJ87JORgE=
 =qP0i
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs atomic_open updates from Christian Brauner:
 "Allow knfsd to use atomic_open()

  While knfsd offers combined exclusive create and open results to
  clients, on some filesystems those results are not atomic. The
  separate vfs_create() + vfs_open() sequence in dentry_create() can
  produce races and unexpected errors. For example, open O_CREAT with
  mode 0 will succeed in creating the file but return -EACCES from
  vfs_open(). Additionally, network filesystems benefit from reducing
  remote round-trip operations by using a single atomic_open() call.

  Teach dentry_create() -- whose sole caller is knfsd -- to use
  atomic_open() for filesystems that support it"

* tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs/namei: fix kernel-doc markup for dentry_create
  VFS/knfsd: Teach dentry_create() to use atomic_open()
  VFS: Prepare atomic_open() for dentry_create()
  VFS: move dentry_create() from fs/open.c to fs/namei.c
2026-02-09 14:25:37 -08:00
Amir Goldstein
b3c78bc536
nfsd: do not allow exporting of special kernel filesystems
pidfs and nsfs recently gained support for encode/decode of file handles
via name_to_handle_at(2)/open_by_handle_at(2).

These special kernel filesystems have custom ->open() and ->permission()
export methods, which nfsd does not respect and it was never meant to be
used for exporting those filesystems by nfsd.

Therefore, do not allow nfsd to export filesystems with custom ->open()
or ->permission() methods.

Fixes: b3caba8f7a ("pidfs: implement file handle support")
Fixes: 5222470b2f ("nsfs: support file handles")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260129100212.49727-3-amir73il@gmail.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-29 17:26:30 +01:00
Chuck Lever
e939bd6756 NFSD: Add POSIX ACL file attributes to SUPPATTR bitmasks
Now that infrastructure for NFSv4 POSIX draft ACL has been added
to NFSD, it should be safe to advertise support to NFS clients.

NFSD_SUPPATTR_EXCLCREAT_WORD2 includes NFSv4.2-only attributes,
but version filtering occurs via nfsd_suppattrs[] before this
mask is applied, ensuring pre-4.2 clients never see unsupported
attributes.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
318579c093 NFSD: Add POSIX draft ACL support to the NFSv4 SETATTR operation
The POSIX ACL extension to NFSv4 enables clients to set access
and default ACLs via FATTR4_POSIX_ACCESS_ACL and
FATTR4_POSIX_DEFAULT_ACL attributes. Integration of these
attributes into SETATTR processing requires wiring them through
the nfsd_attrs structure and ensuring proper cleanup on all
code paths.

This patch connects the na_pacl and na_dpacl fields in
nfsd_attrs to the decoded ACL pointers from the NFSv4 SETATTR
decoder. Ownership of these ACL references transfers to attrs
immediately after initialization, with the decoder's pointers
cleared to NULL. This transfer ensures nfsd_attrs_free()
releases the ACLs on normal completion, while new error paths
call posix_acl_release() directly when cleanup occurs before
nfsd_attrs_free() runs.

Early returns in the nfsd4_setattr() function gain conversions
to goto statements that branch to proper cleanup handlers.
Error paths before fh_want_write() branch to out_err for ACL
release only; paths after fh_want_write() use the existing out
label for full cleanup via nfsd_attrs_free().

The patch adds mutual exclusion between NFSv4 ACLs (sa_acl) and
POSIX ACLs. Setting both types simultaneously returns
nfserr_inval because these ACL models cannot coexist on the
same file object.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
d2ca50606f NFSD: Add support for POSIX draft ACLs for file creation
NFSv4.2 clients can specify POSIX draft ACLs when creating file
objects via OPEN(CREATE) and CREATE operations. The previous patch
added POSIX ACL support to the NFSv4 SETATTR operation for modifying
existing objects, but file creation follows different code paths
that also require POSIX ACL handling.

This patch integrates POSIX ACL support into nfsd4_create() and
nfsd4_create_file(). Ownership of the decoded ACL pointers
(op_dpacl, op_pacl, cr_dpacl, cr_pacl) transfers to the nfsd_attrs
structure immediately, with the original fields cleared to NULL.
This transfer ensures nfsd_attrs_free() releases the ACLs upon
completion while preventing double-free on error paths.

Mutual exclusion between NFSv4 ACLs and POSIX ACLs is enforced:
setting both op_acl and op_dpacl/op_pacl simultaneously returns
nfserr_inval. Errors during ACL application clear the corresponding
bits in the result bitmask (fattr->bmval), signaling partial
completion to the client.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
5fc51dfc2e NFSD: Add support for XDR decoding POSIX draft ACLs
The POSIX ACL extension to NFSv4 defines FATTR4_POSIX_ACCESS_ACL
and FATTR4_POSIX_DEFAULT_ACL for setting access and default ACLs
via CREATE, OPEN, and SETATTR operations. This patch adds the XDR
decoders for those attributes.

The nfsd4_decode_fattr4() function gains two additional parameters
for receiving decoded POSIX ACLs. CREATE, OPEN, and SETATTR
decoders pass pointers to these new parameters, enabling clients
to set POSIX ACLs during object creation or modification.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
345c4b7734 NFSD: Refactor nfsd_setattr()'s ACL error reporting
Support for FATTR4_POSIX_ACCESS_ACL and FATTR4_POSIX_DEFAULT_ACL
attributes in subsequent patches allows clients to set both ACL
types simultaneously during SETATTR and file creation. Each ACL
type can succeed or fail independently, requiring the server to
clear individual attribute bits in the reply bitmap when one
fails while the other succeeds.

The existing na_aclerr field cannot distinguish which ACL type
encountered an error. Separate error fields (na_paclerr for
access ACLs, na_dpaclerr for default ACLs) enable the server to
report per-ACL-type failures accurately.

This refactoring also adds validation previously absent: default
ACL processing rejects non-directory targets with EINVAL and
passes NULL to set_posix_acl() when a_count is zero to delete
the ACL. Access ACL processing rejects zero a_count with EINVAL
for ACL_SCOPE_FILE_SYSTEM semantics (the only scope currently
supported).

The changes preserve compatibility with existing NFSv4 ACL code.
NFSv4 ACL conversion (nfs4_acl_nfsv4_to_posix()) never produces
POSIX ACLs with a_count == 0, so the new validation logic only
affects future POSIX ACL attribute handling.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
9ac6fc0fab NFSD: Do not allow NFSv4 (N)VERIFY to check POSIX ACL attributes
Section 9.3 of draft-ietf-nfsv4-posix-acls-00 prohibits use of
the POSIX ACL attributes with VERIFY and NVERIFY operations: the
server MUST reply NFS4ERR_INVAL when a client attempts this.

Beyond the protocol requirement, comparison of POSIX draft ACLs
via (N)VERIFY presents an implementation challenge. Clients are
not required to order the ACEs within a POSIX ACL in any
particular way, making reliable attribute comparison impractical.

Return nfserr_inval when the client requests FATTR4_POSIX_ACCESS_ACL
or FATTR4_POSIX_DEFAULT_ACL in a VERIFY or NVERIFY operation.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
97e9a9ec32 NFSD: Add nfsd4_encode_fattr4_posix_access_acl
The POSIX ACL extension to NFSv4 defines FATTR4_POSIX_ACCESS_ACL
for retrieving the access ACL of a file or directory. This patch
adds the XDR encoder for that attribute.

The access ACL is retrieved via get_inode_acl(). If the filesystem
provides no explicit access ACL, one is synthesized from the file
mode via posix_acl_from_mode(). Each entry is encoded as a
posixace4: tag type, permission bits, and principal name (empty
for structural entries, resolved via idmapping for USER/GROUP
entries).

Unlike the default ACL encoder which applies only to directories,
this encoder handles all inode types and ensures an access ACL is
always available through mode-based synthesis when needed.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
5e62c904e4 NFSD: Add nfsd4_encode_fattr4_posix_default_acl
The POSIX ACL extension to NFSv4 defines FATTR4_POSIX_DEFAULT_ACL
for retrieving a directory's default ACL. This patch adds the
XDR encoder for that attribute.

For directories, the default ACL is retrieved via get_inode_acl()
and each entry is encoded as a posixace4: tag type, permission
bits, and principal name (empty for structural entries like
USER_OBJ/GROUP_OBJ/MASK/OTHER, resolved via idmapping for
USER/GROUP entries).

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
8093c31f2c NFSD: Add nfsd4_encode_fattr4_acl_trueform_scope
The FATTR4_ACL_TRUEFORM_SCOPE attribute indicates the granularity at
which the ACL model can vary: per file object, per file system, or
uniformly across the entire server.

In Linux, the ACL model is determined by the SB_POSIXACL superblock
flag, which applies uniformly to all files within a file system.
Different exported file systems can have different ACL models, but
individual files cannot differ from their containing file system.
ACL_SCOPE_FILE_SYSTEM accurately reflects this behavior.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Rick Macklem
4a639a727f NFSD: Add nfsd4_encode_fattr4_acl_trueform
Mapping between NFSv4 ACLs and POSIX ACLs is semantically imprecise:
a client that sets an NFSv4 ACL and reads it back may see a different
ACL than it wrote. The proposed NFSv4 POSIX ACL extension introduces
the FATTR4_ACL_TRUEFORM attribute, which reports whether a file
object stores its access control permissions using NFSv4 ACLs or
POSIX ACLs.

A client aware of this extension can avoid lossy translation by
requesting and setting ACLs in their native format.

When NFSD is built with CONFIG_NFSD_V4_POSIX_ACLS, report
ACL_MODEL_POSIX_DRAFT for file objects on file systems with the
SB_POSIXACL flag set, and ACL_MODEL_NONE otherwise. Linux file
systems do not store NFSv4 ACLs natively, so ACL_MODEL_NFS4 is never
reported.

Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Chuck Lever
91dc464fbe Add RPC language definition of NFSv4 POSIX ACL extension
The language definition was extracted from the new
draft-ietf-nfsv4-posix-acls specification. This ensures good
constant and type name alignment between the spec and the Linux
kernel source code, and brings in some basic XDR utilities for
handling NFSv4 POSIX draft ACLs.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Chuck Lever
feb8a46b14 NFSD: Add a Kconfig setting to enable support for NFSv4 POSIX ACLs
A new IETF draft extends NFSv4.2 with POSIX ACL attributes:

  https://www.ietf.org/archive/id/draft-ietf-nfsv4-posix-acls-00.txt

This draft has not yet been ratified. A build-time configuration
option allows developers and distributors to decide whether to
expose this experimental protocol extension to NFSv4 clients. The
option is disabled by default to prevent unintended deployment of
potentially unstable protocol features in production environments.

This approach mirrors the existing NFSD_V4_DELEG_TIMESTAMPS option,
which gates another experimental NFSv4 extension based on an
unratified IETF draft.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Chuck Lever
6bc85baba4 xdrgen: Implement pass-through lines in specifications
XDR specification files can contain lines prefixed with '%' that
pass through unchanged to generated output. Traditional rpcgen
removes the '%' and emits the remainder verbatim, allowing direct
insertion of C includes, pragma directives, or other language-
specific content into the generated code.

Until now, xdrgen silently discarded these lines during parsing.
This prevented specifications from including necessary headers or
preprocessor directives that might be required for the generated
code to compile correctly.

The grammar now captures pass-through lines instead of ignoring
them. A new AST node type represents pass-through content, and
the AST transformer strips the leading '%' character. Definition
and source generators emit pass-through content in document order,
preserving the original placement within the specification.

This brings xdrgen closer to feature parity with traditional
rpcgen while maintaining the existing document-order processing
model.

Existing generated xdrgen source code has been regenerated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-29 09:48:33 -05:00
Chuck Lever
3daab3112f nfsd: cancel async COPY operations when admin revokes filesystem state
Async COPY operations hold copy stateids that represent NFSv4 state.
Thus, when the NFS server administrator revokes all NFSv4 state for
a filesystem via the unlock_fs interface, ongoing async COPY
operations referencing that filesystem must also be canceled.

Each cancelled copy triggers a CB_OFFLOAD callback carrying the
NFS4ERR_ADMIN_REVOKED status to notify the client that the server
terminated the operation.

The static drop_client() function is renamed to nfsd4_put_client()
and exported. The function must be exported because both the new
nfsd4_cancel_copy_by_sb() and the CB_OFFLOAD release callback in
nfs4proc.c need to release client references.

Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
d8316b837c nfsd: add controls to set the minimum number of threads per pool
Add a new "min_threads" variable to the nfsd_net, along with the
corresponding netlink interface, to set that value from userland.
Pass that value to svc_set_pool_threads() and svc_set_num_threads().

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
1c87a0c39a nfsd: adjust number of running nfsd threads based on activity
nfsd() is changed to pass a timeout to svc_recv() when there is a min
number of threads set, and to handle error returns from it:

In the case of -ETIMEDOUT, if the service mutex can be taken (via
trylock), the thread becomes an RQ_VICTIM so that it will exit,
providing that the actual number of threads is above pool->sp_nrthrmin.

In the case of -EBUSY, if the actual number of threads is below
pool->sp_nrthrmax, it will attempt to start a new thread. This attempt
is gated on a new SP_TASK_STARTING pool flag that serializes thread
creation attempts within a pool, and further by mutex_trylock().

Neil says: "I think we want memory pressure to be able to push a thread
into returning -ETIMEDOUT.  That can come later."

Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
a0022a38be sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY
To dynamically adjust the thread count, nfsd requires some information
about how busy things are.

Change svc_recv() to take a timeout value, and then allow the wait for
work to time out if it's set. If a timeout is not defined, then the
schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the
full timeout, then have it return -ETIMEDOUT to the caller.

If it wakes up, finds that there is more work and that no threads are
available, then attempt to set SP_TASK_STARTING. If wasn't already set,
have the task return -EBUSY to cue to the caller that the service could
use more threads.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
7ffc7ade2c sunrpc: introduce the concept of a minimum number of threads per pool
Add a new pool->sp_nrthrmin field to track the minimum number of threads
in a pool. Add min_threads parameters to both svc_set_num_threads() and
svc_set_pool_threads(). If min_threads is non-zero and less than the
max, svc_set_num_threads() will ensure that the number of running
threads is between the min and the max.

If the min is 0 or greater than the max, then it is ignored, and the
maximum number of threads will be started, and never spun down.

For now, the min_threads is always 0, but a later patch will pass the
proper value through from nfsd.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
e344f87262 sunrpc: split svc_set_num_threads() into two functions
svc_set_num_threads() will set the number of running threads for a given
pool. If the pool argument is set to NULL however, it will distribute
the threads among all of the pools evenly.

These divergent codepaths complicate the move to dynamic threading.
Simplify the API by splitting these two cases into different helpers:

Add a new svc_set_pool_threads() function that sets the number of
threads in a single, given pool. Modify svc_set_num_threads() to
distribute the threads evenly between all of the pools and then call
svc_set_pool_threads() for each.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Chuck Lever
5288993c4d xdrgen: Add enum value validation to generated decoders
XDR enum decoders generated by xdrgen do not verify that incoming
values are valid members of the enum. Incoming out-of-range values
from malicious or buggy peers propagate through the system
unchecked.

Add validation logic to generated enum decoders using a switch
statement that explicitly lists valid enumerator values. The
compiler optimizes this to a simple range check when enum values
are dense (contiguous), while correctly rejecting invalid values
for sparse enums with gaps in their value ranges.

The --no-enum-validation option on the source subcommand disables
this validation when not needed.

The minimum and maximum fields in _XdrEnum, which were previously
unused placeholders for a range-based validation approach, have
been removed since the switch-based validation handles both dense
and sparse enums correctly.

Because the new mechanism results in substantive changes to
generated code, existing .x files are regenerated. Unrelated white
space and semicolon changes in the generated code are due to recent
commit 1c873a2fd1 ("xdrgen: Don't generate unnecessary semicolon")
and commit 38c4df91242b ("xdrgen: Address some checkpatch whitespace
complaints").

Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Anthony Iliopoulos
404d779466 nfsd: fix return error code for nfsd_map_name_to_[ug]id
idmap lookups can time out while the cache is waiting for a userspace
upcall reply. In that case cache_check() returns -ETIMEDOUT to callers.

The nfsd_map_name_to_[ug]id functions currently proceed with attempting
to map the id to a kuid despite a potentially temporary failure to
perform the idmap lookup. This results in the code returning the error
NFSERR_BADOWNER which can cause client operations to return to userspace
with failure.

Fix this by returning the failure status before attempting kuid mapping.

This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients
can retry the operation instead of aborting it.

Fixes: 65e10f6d0a ("nfsd: Convert idmap to use kuids and kgids")
Cc: stable@vger.kernel.org
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Anthony Iliopoulos
f9c206cdc4 nfsd: never defer requests during idmap lookup
During v4 request compound arg decoding, some ops (e.g. SETATTR)
can trigger idmap lookup upcalls. When those upcall responses get
delayed beyond the allowed time limit, cache_check() will mark the
request for deferral and cause it to be dropped.

This prevents nfs4svc_encode_compoundres from being executed, and
thus the session slot flag NFSD4_SLOT_INUSE never gets cleared.
Subsequent client requests will fail with NFSERR_JUKEBOX, given
that the slot will be marked as in-use, making the SEQUENCE op
fail.

Fix this by making sure that the RQ_USEDEFERRAL flag is always
clear during nfs4svc_decode_compoundargs(), since no v4 request
should ever be deferred.

Fixes: 2f425878b6 ("nfsd: don't use the deferral service, return NFS4ERR_DELAY")
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Olga Kornievskaia
41b0a87bc6 NFSD: fix setting FMODE_NOCMTIME in nfs4_open_delegation
fstests generic/215 and generic/407 were failing because the server
wasn't updating mtime properly. When deleg attribute support is not
compiled in and thus no attribute delegation was given, the server
was skipping updating mtime and ctime because FMODE_NOCMTIME was
uncoditionally set for the write delegation.

Fixes: e5e9b24ab8 ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Jeff Layton
789477b849 nfsd: fix nfs4_file refcount leak in nfsd_get_dir_deleg()
Claude pointed out that there is a nfs4_file refcount leak in
nfsd_get_dir_deleg(). Ensure that the reference to "fp" is released
before returning.

Fixes: 8b99f6a8c1 ("nfsd: wire up GET_DIR_DELEGATION handling")
Cc: stable@vger.kernel.org
Cc: Chris Mason <clm@meta.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
NeilBrown
27e383ddeb nfsd: use workqueue enable/disable APIs for v4_end_grace sync
"nfsd: provide locking for v4_end_grace" introduced a
client_tracking_active flag protected by nn->client_lock to prevent
the laundromat from being scheduled before client tracking
initialization or after shutdown begins. That commit is suitable for
backporting to LTS kernels that predate commit 86898fa6b8
("workqueue: Implement disable/enable for (delayed) work items").

However, the workqueue subsystem in recent kernels provides
enable_delayed_work() and disable_delayed_work_sync() for this
purpose. Using this mechanism enable us to remove the
client_tracking_active flag and associated spinlock operations
while preserving the same synchronization guarantees, which is
a cleaner long-term approach.

Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Chuck Lever
0ac903d1bf NFS: NFSERR_INVAL is not defined by NFSv2
A documenting comment in include/uapi/linux/nfs.h claims incorrectly
that NFSv2 defines NFSERR_INVAL. There is no such definition in either
RFC 1094 or https://pubs.opengroup.org/onlinepubs/9629799/chap7.htm

NFS3ERR_INVAL is introduced in RFC 1813.

NFSD returns NFSERR_INVAL for PROC_GETACL, which has no
specification (yet).

However, nfsd_map_status() maps nfserr_symlink and nfserr_wrong_type
to nfserr_inval, which does not align with RFC 1094. This logic was
introduced only recently by commit 438f81e0e9 ("nfsd: move error
choice for incorrect object types to version-specific code."). Given
that we have no INVAL or SERVERFAULT status in NFSv2, probably the
only choice is NFSERR_IO.

Fixes: 438f81e0e9 ("nfsd: move error choice for incorrect object types to version-specific code.")
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Jeff Layton
9be4b7e74e nfsd: prefix notification in nfsd4_finalize_deleg_timestamps() with "nfsd: "
Make it distinct that this message comes from nfsd.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Chuck Lever
e344a031a4 NFSD: Add instructions on how to deal with xdrgen files
xdrgen requires a number of Python packages on the build system. We
don't want to add these to the kernel build dependency list, which
is long enough already.

The generated files are generated manually using

  $ cd fs/nfsd && make xdrgen

whenever the .x files are modified, then they are checked into the
kernel repo so others do not need to rebuild them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Chuck Lever
1f1fe81acb NFSD: Clean up nfsd4_check_open_attributes()
The @rqstp parameter was introduced in commit 3c8e03166a ("NFSv4:
do exact check about attribute specified") but has never been used.

Reduce indentation in callers to improve readability.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Linus Torvalds
2bfe3e0da6 vfs-6.19-rc5.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaWDTUAAKCRCRxhvAZXjc
 okDeAQDsEpgFy8hpD08HBs4TpUXv7MSRqwZS7emlsfUqEjeprwEAgu95YuJ+z6hV
 RFk/Lior/+YlB5FN5VcKzyQGMuRDUwc=
 =snsQ
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Remove incorrect __user annotation from struct xattr_args::value

 - Documentation fix: Add missing kernel-doc description for the @isnew
   parameter in ilookup5_nowait() to silence Sphinx warnings

 - Documentation fix: Fix kernel-doc comment for __start_dirop() - the
   function name in the comment was wrong and the @state parameter was
   undocumented

 - Replace dynamic folio_batch allocation with stack allocation in
   iomap_zero_range(). The dynamic allocation was problematic for
   ext4-on-iomap work (didn't handle allocation failure properly) and
   triggered lockdep complaints. Uses a flag instead to control batch
   usage

 - Re-add #ifdef guards around PIDFD_GET_<ns-type>_NAMESPACE ioctls.
   When a namespace type is disabled, ns->ops is NULL, causes crashes
   during inode eviction when closing the fd. The ifdefs were removed in
   a recent simplification but are still needed

 - Fixe a race where a folio could be unlocked before the trailing zeros
   (for EOF within the page) were written

 - Split out a dedicated lease_dispose_list() helper since lease code
   paths always know they're disposing of leases. Removes unnecessary
   runtime flag checks and prepares for upcoming lease_manager
   enhancements

 - Fix userland delegation requests succeeding despite conflicting
   opens. Previously, FL_LAYOUT and FL_DELEG leases bypassed conflict
   checks (a hack for nfsd). Adds new ->lm_open_conflict() lease_manager
   operation so userland delegations get proper conflict checking while
   nfsd can continue its own conflict handling

 - Fix LOOKUP_CACHED path lookups incorrectly falling through to the
   slow path. After legitimize_links() calls were conditionally elided,
   the routine would always fail with LOOKUP_CACHED regardless of
   whether there were any links. Now the flag is checked at the two
   callsites before calling legitimize_links()

 - Fix bug in media fd allocation in media_request_alloc()

 - Fix mismatched API calls in ecryptfs_mknod(): was calling
   end_removing() instead of end_creating() after
   ecryptfs_start_creating_dentry()

 - Fix dentry reference count leak in ecryptfs_mkdir(): a dget() of the
   lower parent dir was added but never dput()'d, causing BUG during
   lower filesystem unmount due to the still-in-use dentry

* tag 'vfs-6.19-rc5.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: protect PIDFD_GET_* ioctls() via ifdef
  ecryptfs: Release lower parent dentry after creating dir
  ecryptfs: Fix improper mknod pairing of start_creating()/end_removing()
  get rid of bogus __user in struct xattr_args::value
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()
  fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED
  netfs: Fix early read unlock of page with EOF in middle
  filelock: allow lease_managers to dictate what qualifies as a conflict
  filelock: add lease_dispose_list() helper
  iomap: replace folio_batch allocation with stack allocation
  media: mc: fix potential use-after-free in media_request_alloc()
2026-01-09 05:57:57 -10:00
Linus Torvalds
f0b9d8eb98 nfsd-6.19 fixes:
A set of NFSD fixes that arrived after the 6.19 merge window.
 
 Issues that need expedient stable backports:
 - Remove an invalid NFS status code
 - Fix an fstests failure when using pNFS
 - Fix a UAF in v4_end_grace()
 - Fix the administrative interface used to revoke NFSv4 state
 - Fix a memory leak reported by syzbot
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmlb2WUACgkQM2qzM29m
 f5cqaA/+MbO1kop63/TiNE0tRc34yTBnApg1XVza4vSmcpSgpB8ZKGZ5xOjnRpwg
 yBw9+/puEJhyogPE6JKEGnLiFr+s3ApInFHaxnXnrGZz1RR1qkqfioKudIcpC0s1
 /pKx7y/fktltgo/5Dl0gp2QH3Oytg375ge+dcSQbopSTQYPbsAw7AmoHDPBQd8Nr
 Q/pIu1q/tAM8R2zyijU3eAiUMyYRCrxNVYnlsdYmj7Dn0ypybOyKufkpVCEaS3kO
 a7SV/QSVKdNbZOf8annwAhW+VN4urFmA9nnnr/yirrLJ0i2h18E0txrPFBszhftf
 xpOvaDR7okfEvzqwrHvVfRsqB4nYq9f0TSvvpPsS8vCtq34pWKZPa6iiSxeVL/jb
 EmFtiesUWClZzTIQSpUdbuU80cST6WEoNJJKDPZwF1XbA2navsDqgxKiYxsczjt6
 M5SStHcafK5LrXPruqOhfco/uKTmHNJJlvBWxUGCMQEDvdXdEJ4MIlg8VxxvoWPR
 FQDwU+iSdPOwlG7L3Tl9/PGSNe0MxJSgvzK6JNoKL3LvDx80FtMErWxPJdqdIL0+
 RpBsW7zaCyX9lwD866Frs4K2H1w2XFeQjOMI0Pz1SG9dZ8NoKJ+lzcwVY7GgHUvq
 NUNJLzL6MVCHytwTfqrSY7PGvUCrDqR102FQusQyplT4edcUv0M=
 =okQh
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "A set of NFSD fixes for stable that arrived after the merge window:

   - Remove an invalid NFS status code

   - Fix an fstests failure when using pNFS

   - Fix a UAF in v4_end_grace()

   - Fix the administrative interface used to revoke NFSv4 state

   - Fix a memory leak reported by syzbot"

* tag 'nfsd-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: net ref data still needs to be freed even if net hasn't startup
  nfsd: check that server is running in unlock_filesystem
  nfsd: use correct loop termination in nfsd4_revoke_states()
  nfsd: provide locking for v4_end_grace
  NFSD: Fix permission check for read access to executable-only files
  NFSD: Remove NFSERR_EAGAIN
2026-01-06 09:12:52 -08:00
Edward Adam Davis
0b88bfa42e NFSD: net ref data still needs to be freed even if net hasn't startup
When the NFSD instance doesn't to startup, the net ref data memory is
not properly reclaimed, which triggers the memory leak issue reported
by syzbot [1].

To avoid the problem reported in [1], the net ref data memory reclamation
action is moved outside of nfsd_net_up when the net is shutdown.

[1]
unreferenced object 0xffff88812a39dfc0 (size 64):
  backtrace (crc a2262fc6):
    percpu_ref_init+0x94/0x1e0 lib/percpu-refcount.c:76
    nfsd_create_serv+0xbe/0x260 fs/nfsd/nfssvc.c:605
    nfsd_nl_listener_set_doit+0x62/0xb00 fs/nfsd/nfsctl.c:1882
    genl_family_rcv_msg_doit+0x11e/0x190 net/netlink/genetlink.c:1115
    genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
    genl_rcv_msg+0x2fd/0x440 net/netlink/genetlink.c:1210

BUG: memory leak

Reported-by: syzbot+6ee3b889bdeada0a6226@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6ee3b889bdeada0a6226
Fixes: 39972494e3 ("nfsd: update percpu_ref to manage references on nfsd_net")
Cc: stable@vger.kernel.org
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-02 13:50:14 -05:00
Olga Kornievskaia
d0424066fc nfsd: check that server is running in unlock_filesystem
If we are trying to unlock the filesystem via an administrative
interface and nfsd isn't running, it crashes the server. This
happens currently because nfsd4_revoke_states() access state
structures (eg., conf_id_hashtbl) that has been freed as a part
of the server shutdown.

[   59.465072] Call trace:
[   59.465308]  nfsd4_revoke_states+0x1b4/0x898 [nfsd] (P)
[   59.465830]  write_unlock_fs+0x258/0x440 [nfsd]
[   59.466278]  nfsctl_transaction_write+0xb0/0x120 [nfsd]
[   59.466780]  vfs_write+0x1f0/0x938
[   59.467088]  ksys_write+0xfc/0x1f8
[   59.467395]  __arm64_sys_write+0x74/0xb8
[   59.467746]  invoke_syscall.constprop.0+0xdc/0x1e8
[   59.468177]  do_el0_svc+0x154/0x1d8
[   59.468489]  el0_svc+0x40/0xe0
[   59.468767]  el0t_64_sync_handler+0xa0/0xe8
[   59.469138]  el0t_64_sync+0x1ac/0x1b0

Ensure this can't happen by taking the nfsd_mutex and checking that
the server is still up, and then holding the mutex across the call to
nfsd4_revoke_states().

Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 1ac3629bf0 ("nfsd: prepare for supporting admin-revocation of state")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-02 13:49:55 -05:00
NeilBrown
fb321998de nfsd: use correct loop termination in nfsd4_revoke_states()
The loop in nfsd4_revoke_states() stops one too early because
the end value given is CLIENT_HASH_MASK where it should be
CLIENT_HASH_SIZE.

This means that an admin request to drop all locks for a filesystem will
miss locks held by clients which hash to the maximum possible hash value.

Fixes: 1ac3629bf0 ("nfsd: prepare for supporting admin-revocation of state")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-02 13:49:38 -05:00
NeilBrown
2857bd59fe nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.

We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.

nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.

However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it.  For this we
add a new flag protected with nn->client_lock.  It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.

So this patch adds a nfsd_net field "client_tracking_active" which is
set as described.  Another field "grace_end_forced", is set when
v4_end_grace is written.  After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.

This resolves a race which can result in use-after-free.

Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t
Fixes: 7f5ef2e900 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neil@brown.name>
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-02 13:48:22 -05:00