Commit graph

5058 commits

Author SHA1 Message Date
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
7449f86baf NFS Client Updates for Linux 7.0
New Features:
  * Use an LRU list for returning unused delegations
  * Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default
 
 Bugfixes:
  * NFS/localio: Handle short writes by retrying
  * NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages
  * NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit
  * NFS/localio: remove -EAGAIN handling in nfs_local_doio()
  * pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
  * fs/nfs: Fix a readdir slow-start regression
  * SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path
 
 Other Cleanups and Improvements:
   * A few other NFS/localio cleanups
   * Various other delegation handling cleanups from Christoph
   * Unify security_inode_listsecurity() calls
   * Improvements to NFSv4 lease handling
   * Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmmORX0ACgkQ18tUv7Cl
 QOtFORAAyCwTst5iEPRJ9rKZ/Kl39zHbA/QUn3CmmVkGlOBj0j7mWRyU5X0vlIQ9
 mUF3Ikm1XYpsxPTKBEELVumPkggT2nfsFx5518BrpRTODibzc/CZ10/z7q4qarvI
 UhdFlt9SRG4RhhOdAaThF6XVUsRSwGwVZo/YyYemCc/evjNVyXa0wfwbDl9l4Nzr
 1Sxt2/zeq3Eu4IfrxQpFM+0UuSScmVODSe8Jm4GnmlU/Q7x+onW35IvyuzTkgDwG
 8PAeH4b5uADY9VWnTHpvr1fQNnBoEw8b4qr9a7AXQKRIcPGMvgKkdK+f6hOh1cEs
 +O+L4+uixo7QXudnWC27brZSyHwDIVVaJGPF/kNv4O2GKDyEcbsHtQv2G1+1+PtR
 FCtRFGpLq2pZxb9SY/s73FKp6a8bd81FAtzAL7iYU+2FDtvEDKss1nG6sQNG1+Z4
 G8rI79PoimR4I6Jr5hk4sl8pM8wJVLZdcW+ytrEKl9FC+rFDrP9lVzHYArTFgIky
 N/IjEflejRfZ9bYIZ9/CYnFZC3Htrm8K9zerCRDsf96tvhxkX8FZM8tuZpHEMIbx
 Cx8XKCk+ubqLIF2mT+FKOc5T6CUmMiGRNagLkx0h0mbvRSI8HTpgQZGrbYkMk0Hs
 abUhvH73pRi0LRvkzPHfcNaZ7Y/mFBYfwBMwTUWJzh6CEgXnpks=
 =CwZq
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Use an LRU list for returning unused delegations
   - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1
     the default

  Bugfixes:
   - NFS/localio:
       - Handle short writes by retrying
       - Prevent direct reclaim recursion into NFS via nfs_writepages
       - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit
       - Remove -EAGAIN handling in nfs_local_doio()
   - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
   - fs/nfs: Fix a readdir slow-start regression
   - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path

  Other cleanups and improvements:
   - A few other NFS/localio cleanups
   - Various other delegation handling cleanups from Christoph
   - Unify security_inode_listsecurity() calls
   - Improvements to NFSv4 lease handling
   - Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set"

* tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits)
  SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path
  nfs: nfs4proc: Convert comma to semicolon
  SUNRPC: Change list definition method
  sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset
  NFSv4: limit lease period in nfs4_set_lease_period()
  NFSv4: pass lease period in seconds to nfs4_set_lease_period()
  nfs: unify security_inode_listsecurity() calls
  fs/nfs: Fix readdir slow-start regression
  pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
  NFS: fix delayed delegation return handling
  NFS: simplify error handling in nfs_end_delegation_return
  NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return
  NFS: remove the delegation == NULL check in nfs_end_delegation_return
  NFS: use bool for the issync argument to nfs_end_delegation_return
  NFS: return void from ->return_delegation
  NFS: return void from nfs4_inode_make_writeable
  NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4
  NFS: Add a way to disable NFS v4.0 via KConfig
  NFS: Move sequence slot operations into minorversion operations
  NFS: Pass a struct nfs_client to nfs4_init_sequence()
  ...
2026-02-12 17:49:33 -08:00
Linus Torvalds
311aa68319 RDMA v7.0 merge window
Usual smallish cycle:
 
 - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe
 
 - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana,
   mlx5, irdma
 
 - Robusness improvements in completion processing for EFA
 
 - New query_port_speed() verb to move past limited IBA defined speed steps
 
 - Support for SG_GAPS in rts and many other small improvements
 
 - Rare list corruption fix in iwcm
 
 - Better support different page sizes in rxe
 
 - Device memory support for mana
 
 - Direct bio vec to kernel MR for use by NFS-RDMA
 
 - QP rate limiting for bnxt_re
 
 - Remote triggerable NULL pointer crash in siw
 
 - DMA-buf exporter support for RDMA mmaps like doorbells
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaY44vgAKCRCFwuHvBreF
 YfiZAP91cMZfogN7r1FMD75xDZu55dI3Jvy8OaixyRxlWLGPcQEAjritdL0o7fZp
 YrD1OXNS/1XG//rPBVw7xj+54Aa8hAU=
 =AVcu
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual smallish cycle. The NFS biovec work to push it down into RDMA
  instead of indirecting through a scatterlist is pretty nice to see,
  been talked about for a long time now.

   - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe

   - Small driver improvements and minor bug fixes to hns, mlx5, rxe,
     mana, mlx5, irdma

   - Robusness improvements in completion processing for EFA

   - New query_port_speed() verb to move past limited IBA defined speed
     steps

   - Support for SG_GAPS in rts and many other small improvements

   - Rare list corruption fix in iwcm

   - Better support different page sizes in rxe

   - Device memory support for mana

   - Direct bio vec to kernel MR for use by NFS-RDMA

   - QP rate limiting for bnxt_re

   - Remote triggerable NULL pointer crash in siw

   - DMA-buf exporter support for RDMA mmaps like doorbells"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
  RDMA/mlx5: Implement DMABUF export ops
  RDMA/uverbs: Add DMABUF object type and operations
  RDMA/uverbs: Support external FD uobjects
  RDMA/siw: Fix potential NULL pointer dereference in header processing
  RDMA/umad: Reject negative data_len in ib_umad_write
  IB/core: Extend rate limit support for RC QPs
  RDMA/mlx5: Support rate limit only for Raw Packet QP
  RDMA/bnxt_re: Report QP rate limit in debugfs
  RDMA/bnxt_re: Report packet pacing capabilities when querying device
  RDMA/bnxt_re: Add support for QP rate limiting
  MAINTAINERS: Drop RDMA files from Hyper-V section
  RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc
  svcrdma: use bvec-based RDMA read/write API
  RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
  RDMA/core: add MR support for bvec-based RDMA operations
  RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations
  RDMA/core: add bio_vec based RDMA read/write API
  RDMA/irdma: Use kvzalloc for paged memory DMA address array
  RDMA/rxe: Fix race condition in QP timer handlers
  RDMA/mana_ib: Add device‑memory support
  ...
2026-02-12 17:05:20 -08:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Daniel Hodges
dd2fdc3504 SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path
Commit 5940d1cf9f ("SUNRPC: Rebalance a kref in auth_gss.c") added
a kref_get(&gss_auth->kref) call to balance the gss_put_auth() done
in gss_release_msg(), but forgot to add a corresponding kref_put()
on the error path when kstrdup_const() fails.

If service_name is non-NULL and kstrdup_const() fails, the function
jumps to err_put_pipe_version which calls put_pipe_version() and
kfree(gss_msg), but never releases the gss_auth reference. This leads
to a kref leak where the gss_auth structure is never freed.

Add a forward declaration for gss_free_callback() and call kref_put()
in the err_put_pipe_version error path to properly release the
reference taken earlier.

Fixes: 5940d1cf9f ("SUNRPC: Rebalance a kref in auth_gss.c")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Hodges <git@danielhodges.dev>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09 16:39:50 -05:00
Chenguang Zhao
afb24505ff SUNRPC: Change list definition method
The LIST_HEAD macro can both define a linked list and initialize
it in one step. To simplify code, we replace the separate operations
of linked list definition and manual initialization with the LIST_HEAD
macro.

Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09 14:24:19 -05:00
Jeff Layton
a0022a38be sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY
To dynamically adjust the thread count, nfsd requires some information
about how busy things are.

Change svc_recv() to take a timeout value, and then allow the wait for
work to time out if it's set. If a timeout is not defined, then the
schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the
full timeout, then have it return -ETIMEDOUT to the caller.

If it wakes up, finds that there is more work and that no threads are
available, then attempt to set SP_TASK_STARTING. If wasn't already set,
have the task return -EBUSY to cue to the caller that the service could
use more threads.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
7f221b340d sunrpc: split new thread creation into a separate function
Break out the part of svc_start_kthreads() that creates a thread into
svc_new_thread(), as a new exported helper function.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
7ffc7ade2c sunrpc: introduce the concept of a minimum number of threads per pool
Add a new pool->sp_nrthrmin field to track the minimum number of threads
in a pool. Add min_threads parameters to both svc_set_num_threads() and
svc_set_pool_threads(). If min_threads is non-zero and less than the
max, svc_set_num_threads() will ensure that the number of running
threads is between the min and the max.

If the min is 0 or greater than the max, then it is ignored, and the
maximum number of threads will be started, and never spun down.

For now, the min_threads is always 0, but a later patch will pass the
proper value through from nfsd.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
6cd60f4274 sunrpc: track the max number of requested threads in a pool
The kernel currently tracks the number of threads running in a pool in
the "sp_nrthreads" field. In the future, where threads are dynamically
spun up and down, it'll be necessary to keep track of the maximum number
of requested threads separately from the actual number running.

Add a pool->sp_nrthrmax parameter to track this. When userland changes
the number of threads in a pool, update that value accordingly.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
2c01f0cf32 sunrpc: remove special handling of NULL pool from svc_start/stop_kthreads()
Now that svc_set_num_threads() handles distributing the threads among
the available pools, remove the special handling of a NULL pool pointer
from svc_start_kthreads() and svc_stop_kthreads().

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
e344f87262 sunrpc: split svc_set_num_threads() into two functions
svc_set_num_threads() will set the number of running threads for a given
pool. If the pool argument is set to NULL however, it will distribute
the threads among all of the pools evenly.

These divergent codepaths complicate the move to dynamic threading.
Simplify the API by splitting these two cases into different helpers:

Add a new svc_set_pool_threads() function that sets the number of
threads in a single, given pool. Modify svc_set_num_threads() to
distribute the threads evenly between all of the pools and then call
svc_set_pool_threads() for each.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Chuck Lever
5ee62b4a91 svcrdma: use bvec-based RDMA read/write API
Convert svcrdma to the bvec-based RDMA API introduced earlier in
this series.

The bvec-based RDMA API eliminates the intermediate scatterlist
conversion step, allowing direct DMA mapping from bio_vec arrays.
This simplifies the svc_rdma_rw_ctxt structure by removing the
chained SG table management.

The structure retains an inline array approach similar to the
previous scatterlist implementation: an inline bvec array sized
to max_send_sge handles most I/O operations without additional
allocation. Larger requests fall back to dynamic allocation.
This preserves the allocation-free fast path for typical NFS
operations while supporting arbitrarily large transfers.

The bvec API handles all device types internally, including iWARP
devices which require memory registration. No explicit fallback
path is needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-6-cel@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-28 05:54:53 -05:00
Chuck Lever
afcae7d7b8 RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
svc_rdma_accept() computes sc_sq_depth as the sum of rq_depth and the
number of rdma_rw contexts (ctxts). This value is used to allocate the
Send CQ and to initialize the sc_sq_avail credit pool.

However, when the device uses memory registration for RDMA operations,
rdma_rw_init_qp() inflates the QP's max_send_wr by a factor of three
per context to account for REG and INV work requests. The Send CQ and
credit pool remain sized for only one work request per context,
causing Send Queue exhaustion under heavy NFS WRITE workloads.

Introduce rdma_rw_max_sge() to compute the actual number of Send Queue
entries required for a given number of rdma_rw contexts. Upper layer
protocols call this helper before creating a Queue Pair so that their
Send CQs and credit accounting match the QP's true capacity.

Update svc_rdma_accept() to use rdma_rw_max_sge() when computing
sc_sq_depth, ensuring the credit pool reflects the work requests
that rdma_rw_init_qp() will reserve.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 00bd1439f4 ("RDMA/rw: Support threshold for registration vs scattering to local pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-5-cel@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-28 05:54:53 -05:00
Chuck Lever
3e6397b056 SUNRPC: auth_gss: fix memory leaks in XDR decoding error paths
The gssx_dec_ctx(), gssx_dec_status(), and gssx_dec_name()
functions allocate memory via gssx_dec_buffer(), which calls
kmemdup(). When a subsequent decode operation fails, these
functions return immediately without freeing previously
allocated buffers, causing memory leaks.

The leak in gssx_dec_ctx() is particularly relevant because
the caller (gssp_accept_sec_context_upcall) initializes several
buffer length fields to non-zero values, resulting in memory
allocation:

    struct gssx_ctx rctxh = {
        .exported_context_token.len = GSSX_max_output_handle_sz,
        .mech.len = GSS_OID_MAX_LEN,
        .src_name.display_name.len = GSSX_max_princ_sz,
        .targ_name.display_name.len = GSSX_max_princ_sz
    };

If, for example, gssx_dec_name() succeeds for src_name but
fails for targ_name, the memory allocated for
exported_context_token, mech, and src_name.display_name
remains unreferenced and cannot be reclaimed.

Add error handling with goto-based cleanup to free any
previously allocated buffers before returning an error.

Reported-by: Xingjing Deng <micro6947@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/CAK+ZN9qttsFDu6h1FoqGadXjMx1QXqPMoYQ=6O9RY4SxVTvKng@mail.gmail.com/
Fixes: 1d658336b0 ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Linus Torvalds
ccd1cdca5c nfsd-6.19 fixes:
A set of NFSD fixes that arrived just a bit late for the 6.19 merge
 window.
 
 Issues reported with v6.19-rc:
 - Mark variable __maybe_unused to avoid W=1 build break
 
 Issues that need expedient stable backports:
 - NFSv4 file creation neglects setting ACL
 - Clear TIME_DELEG in the suppattr_exclcreat bitmap
 - Clear SECLABEL in the suppattr_exclcreat bitmap
 - Fix memory leak in nfsd_create_serv error paths
 - Bound check rq_pages index in inline path
 - Return 0 on success from svc_rdma_copy_inline_range
 - Use rc_pageoff for memcpy byte offset
 - Avoid NULL deref on zero length gss_token in gss_read_proxy_verf
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmlKp+EACgkQM2qzM29m
 f5e0IhAAwkgBfMIKn9S9eIJ4jYIMAlZGTv24Pt4DkTHcSMYzwZ/8bnCNqL5S1UUV
 V8DeVe4pYKoMsXZYkFt9ZyPE77D6ZrWaOHdwHeq/UI624f1KIccOGrnDHfz6Xt+Q
 7xuWsfblKPQQMalRgbfiVq+mRfBfAzdR/cjT+11RaagMEEOP/jMdXIn0qel03Yuy
 v+4L92wg0gD5GIpQqKZiNXWsPWbYvw/kj1a1hIpOFv//4R9L5oPESC/2jXFweTUE
 fCKuAa+CntaagJcTqXHokPB7DO0lvSNoP5wt/eS/uFmrnnLFPTHu0q5+EAangIV7
 laqEJiIqT/QR/mfunJ1v07ifGGmXCeg209olyM95enkPgaBCVaYg1OU52YMgGFPY
 GxnUCCEDQVBjPVnETRN4LpvnTcnDlceuzwzw9O99x0A3pZ1++FCfFl0VtcGI227u
 BZu8+/1jfSCGTOqvx+bzHJ+XR9h9edKZMIFruceEXKfagOkyjvknbWaTBs1+lPsZ
 9EswllY7V9MzRsD951sd8KiCVChhtOvNFutuhRz30KHBxT+alrP8D90TQBAYpUrz
 wN3eNqDKkD+YOIx8T8P3Qn4MxLWd/xX8roG/9TzwgmqD+5c3D12IKouYo4tiNqc5
 d+cH/CpqhwMqsM72XNw4+RQvZFsZyfubrKPUnVidc0t0Ong73zM=
 =jkV1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "A set of NFSD fixes that arrived just a bit late for the 6.19 merge
  window.

  Regression fixes:
   - Mark variable __maybe_unused to avoid W=1 build break

  Stable fixes:
   - NFSv4 file creation neglects setting ACL
   - Clear TIME_DELEG in the suppattr_exclcreat bitmap
   - Clear SECLABEL in the suppattr_exclcreat bitmap
   - Fix memory leak in nfsd_create_serv error paths
   - Bound check rq_pages index in inline path
   - Return 0 on success from svc_rdma_copy_inline_range
   - Use rc_pageoff for memcpy byte offset
   - Avoid NULL deref on zero length gss_token in gss_read_proxy_verf"

* tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: NFSv4 file creation neglects setting ACL
  NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
  NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
  nfsd: fix memory leak in nfsd_create_serv error paths
  nfsd: Mark variable __maybe_unused to avoid W=1 build break
  svcrdma: bound check rq_pages index in inline path
  svcrdma: return 0 on success from svc_rdma_copy_inline_range
  svcrdma: use rc_pageoff for memcpy byte offset
  SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
2025-12-24 09:23:04 -08:00
Linus Torvalds
6bb34aff1e NFS client updates for Linux 6.19
Highlights include:
 
 Bugfixes:
 - Fix 'nlink' attribute update races when unlinking a file.
 - Add missing initialisers for the directory verifier in various places.
 - Don't regress the NFSv4 open state due to misordered racing replies.
 - Ensure the NFSv4.x callback server uses the correct transport
   connection.
 - Fix potential use-after-free races when shutting down the NFSv4.x
   callback server.
 - Fix a pNFS layout commit crash.
 - Assorted fixes to ensure correct propagation of mount options when the
   client crosses a filesystem boundary and triggers the VFS automount
   code.
 - More localio fixes.
 
 Features and cleanups:
 - Add initial support for basic directory delegations.
 - SunRPC back channel code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR8xgHcVzJNfOYElJo6EXfx2a6V0QUCaTrPnQAKCRA6EXfx2a6V
 0aLQAPwOs+bfoaPuk/EsC87m3rAtFJNPMg65toJY/6JnXnTGXgEAs1XlCcSCgc10
 bsX+D6QIudOoWExwFfLlVlFAFikGKQc=
 =DoZq
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Bugfixes:
   - Fix 'nlink' attribute update races when unlinking a file
   - Add missing initialisers for the directory verifier in various
     places
   - Don't regress the NFSv4 open state due to misordered racing replies
   - Ensure the NFSv4.x callback server uses the correct transport
     connection
   - Fix potential use-after-free races when shutting down the NFSv4.x
     callback server
   - Fix a pNFS layout commit crash
   - Assorted fixes to ensure correct propagation of mount options when
     the client crosses a filesystem boundary and triggers the VFS
     automount code
   - More localio fixes

  Features and cleanups:
   - Add initial support for basic directory delegations
   - SunRPC back channel code cleanups"

* tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
  NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegations
  nfs/localio: remove 61 byte hole from needless ____cacheline_aligned
  nfs/localio: remove alignment size checking in nfs_is_local_dio_possible
  NFS: Fix up the automount fs_context to use the correct cred
  NFS: Fix inheritance of the block sizes when automounting
  NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags
  Revert "nfs: ignore SB_RDONLY when mounting nfs"
  Revert "nfs: clear SB_RDONLY before getting superblock"
  Revert "nfs: ignore SB_RDONLY when remounting nfs"
  NFS: Add a module option to disable directory delegations
  NFS: Shortcut lookup revalidations if we have a directory delegation
  NFS: Request a directory delegation during RENAME
  NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK
  NFS: Add support for sending GDD_GETATTR
  NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid
  NFSv4.1: protect destroying and nullifying bc_serv structure
  SUNRPC: new helper function for stopping backchannel server
  SUNRPC: cleanup common code in backchannel request
  NFSv4.1: pass transport for callback shutdown
  NFSv4: ensure the open stateid seqid doesn't go backwards
  ...
2025-12-12 21:52:42 +12:00
Joshua Rogers
d1bea0ce35 svcrdma: bound check rq_pages index in inline path
svc_rdma_copy_inline_range indexed rqstp->rq_pages[rc_curpage] without
verifying rc_curpage stays within the allocated page array. Add guards
before the first use and after advancing to a new page.

Fixes: d7cc739726 ("svcrdma: support multiple Read chunks per RPC")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 10:51:26 -05:00
Joshua Rogers
94972027ab svcrdma: return 0 on success from svc_rdma_copy_inline_range
The function comment specifies 0 on success and -EINVAL on invalid
parameters. Make the tail return 0 after a successful copy loop.

Fixes: d7cc739726 ("svcrdma: support multiple Read chunks per RPC")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 10:51:26 -05:00
Joshua Rogers
a8ee9099f3 svcrdma: use rc_pageoff for memcpy byte offset
svc_rdma_copy_inline_range added rc_curpage (page index) to the page
base instead of the byte offset rc_pageoff. Use rc_pageoff so copies
land within the current page.

Found by ZeroPath (https://zeropath.com)

Fixes: 8e12258268 ("svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 10:51:26 -05:00
Joshua Rogers
d4b69a6186 SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
A zero length gss_token results in pages == 0 and in_token->pages[0]
is NULL. The code unconditionally evaluates
page_address(in_token->pages[0]) for the initial memcpy, which can
dereference NULL even when the copy length is 0. Guard the first
memcpy so it only runs when length > 0.

Fixes: 5866efa8cb ("SUNRPC: Fix svcauth_gss_proxy_init()")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Rogers <linux@joshua.hu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 10:51:26 -05:00
Linus Torvalds
b0319c4642 NFSD 6.19 Release Notes
Mike Snitzer's mechanism for disabling I/O caching introduced in
 v6.18 is extended to include using direct I/O. The goal is to
 further reduce the memory footprint consumed by NFS clients
 accessing large data sets via NFSD.
 
 The NFSD community adopted a maintainer entry profile during this
 cycle. See
 
   Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
 
 Work continues on hardening NFSD's implementation of the pNFS block
 layout type. This type enables pNFS clients to directly access the
 underlying block devices that contain an exported file system,
 reducing server overhead and increasing data throughput.
 
 The remaining patches in this pull request are clean-ups and minor
 optimizations. Many thanks to the contributors, reviewers, testers,
 and bug reporters who participated during the v6.19 NFSD development
 cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmk0SkQACgkQM2qzM29m
 f5cI3RAAoqg53ctyWC8B76mC2kT/bugWWXNwVAVd58UGy3yptMlJG2TBBAHbV6rs
 NgMzP1gg470eccYonntS6Pk259hJimi/REifk+2bFuvjqym7OruKXJBxn2FZrCCw
 mBu/ptfj9SFoTezAHh/wNuHTEy68gLn6V2LB3jdMcxeqUaC39kFnB0sZP4+xiFcx
 PfL6uiXj8JtGpYXf8AKf7HniZCBrtkia1ByRrFHrcPX5A6S9dL85rDQbm/O8L3AA
 hS3cp2UQUSwvFUED9N2QXPpRQ3nytNSG08f/wOaXhIXAmXq/sZFGjlNqjpX8i5jV
 jsSQIeYy1BwLsdxo25wbShrqmYsFH3zWRELKoCs0g9lXbpJWyK0O93zETXtYRRae
 4O9iJ5VcQMyMpg4/Gh1jsRU9RUCc88T15F2HQVYTIgHrLKnz1sWMKI//1G3kYrf3
 L7hYk2ZU6QYqSNOFkrL0Jf7geqf8FR4nUi/nv+A/2gFqm+WWIhNTKQt2hsQlqD66
 HaYOhFX5SOj8YnjsbpGM8W65n8ELaON0PWhq+SBpRCm7i2llod0HTIpyv00NTwVL
 xmr5MfmO7989kVVy7DA/h/xg/dfOpUCNjImrXz+QDbsmCMdLjT/6YGD47FLkxzEO
 O+PuRIORuez8KosEaIRVjdXcNFPSeGfDdBAveHH9IqRX7ZvIZQY=
 =E0oM
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:

 - Mike Snitzer's mechanism for disabling I/O caching introduced in
   v6.18 is extended to include using direct I/O. The goal is to further
   reduce the memory footprint consumed by NFS clients accessing large
   data sets via NFSD.

 - The NFSD community adopted a maintainer entry profile during this
   cycle. See

      Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst

 - Work continues on hardening NFSD's implementation of the pNFS block
   layout type. This type enables pNFS clients to directly access the
   underlying block devices that contain an exported file system,
   reducing server overhead and increasing data throughput.

 - The remaining patches are clean-ups and minor optimizations. Many
   thanks to the contributors, reviewers, testers, and bug reporters who
   participated during the v6.19 NFSD development cycle.

* tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (38 commits)
  NFSD: nfsd-io-modes: Separate lists
  NFSD: nfsd-io-modes: Wrap shell snippets in literal code blocks
  NFSD: Add toctree entry for NFSD IO modes docs
  NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst
  NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
  NFSD: Make FILE_SYNC WRITEs comply with spec
  NFSD: Add trace point for SCSI fencing operation.
  NFSD: use correct reservation type in nfsd4_scsi_fence_client
  xdrgen: Don't generate unnecessary semicolon
  xdrgen: Fix union declarations
  NFSD: don't start nfsd if sv_permsocks is empty
  xdrgen: handle _XdrString in union encoder/decoder
  xdrgen: Fix the variable-length opaque field decoder template
  xdrgen: Make the xdrgen script location-independent
  xdrgen: Generalize/harden pathname construction
  lockd: don't allow locking on reexported NFSv2/3
  MAINTAINERS: add a nfsd blocklayout reviewer
  nfsd: Use MD5 library instead of crypto_shash
  nfsd: stop pretending that we cache the SEQUENCE reply.
  NFS: nfsd-maintainer-entry-profile: Inline function name prefixes
  ...
2025-12-06 10:57:02 -08:00
Linus Torvalds
7cd122b552 Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
 Reference is grabbed and not released, but it's not actually _stored_
 anywhere.  That works, but it's hard to follow and verify; among other
 things, we have no way to tell _which_ of the increments is intended
 to be an unpaired one.  Worse, on removal we need to decide whether
 the reference had already been dropped, which can be non-trivial if
 that removal is on umount and we need to figure out if this dentry is
 pinned due to e.g. unlink() not done.  Usually that is handled by using
 kill_litter_super() as ->kill_sb(), but there are open-coded special
 cases of the same (consider e.g. /proc/self).
 
 Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
 marking those "leaked" dentries.  Having it set claims responsibility
 for +1 in refcount.
 
 The end result this series is aiming for:
 
 * get these unbalanced dget() and dput() replaced with new primitives that
   would, in addition to adjusting refcount, set and clear persistency flag.
 * instead of having kill_litter_super() mess with removing the remaining
   "leaked" references (e.g. for all tmpfs files that hadn't been removed
   prior to umount), have the regular shrink_dcache_for_umount() strip
   DCACHE_PERSISTENT of all dentries, dropping the corresponding
   reference if it had been set.  After that kill_litter_super() becomes
   an equivalent of kill_anon_super().
 
 Doing that in a single step is not feasible - it would affect too many places
 in too many filesystems.  It has to be split into a series.
 
 This work has really started early in 2024; quite a few preliminary pieces
 have already gone into mainline.  This chunk is finally getting to the
 meat of that stuff - infrastructure and most of the conversions to it.
 
 Some pieces are still sitting in the local branches, but the bulk of
 that stuff is here.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaTEq1wAKCRBZ7Krx/gZQ
 643uAQC1rRslhw5l7OjxEpIYbGG4M+QaadN4Nf5Sr2SuTRaPJQD/W4oj/u4C2eCw
 Dd3q071tqyvm/PXNgN2EEnIaxlFUlwc=
 =rKq+
 -----END PGP SIGNATURE-----

Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
Olga Kornievskaia
6f8b26c90a SUNRPC: new helper function for stopping backchannel server
Create a new backchannel function to stop the backchannel server
and clear the bc_serv in transport protected under the bc_pa_lock.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23 15:30:12 -05:00
Olga Kornievskaia
441244d427 SUNRPC: cleanup common code in backchannel request
Create a helper function for common code between rdma
and tcp backchannel handling of the backchannel request.
Make sure that access is protected by the bc_pa_lock
lock.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23 15:30:12 -05:00
Al Viro
946e225677 convert rpc_pipefs
Just use d_make_persistent() + dput() (and fold the latter into
simple_finish_creating()) and that's it...

NOTE: pipe->dentry is a borrowed reference - it does not contribute
to dentry refcount.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Jeff Layton
6b3b697d65 sunrpc: allocate a separate bvec array for socket sends
svc_tcp_sendmsg() calls xdr_buf_to_bvec() with the second slot of
rq_bvec as the start, but doesn't reduce the array length by one, which
could lead to an array overrun. Also, rq_bvec is always rq_maxpages in
length, which can be too short in some cases, since the TCP record
marker consumes a slot.

Fix both problems by adding a separate bvec array to the svc_sock that
is specifically for sending. For TCP, make this array one slot longer
than rq_maxpages, to account for the record marker. For UDP, only
allocate as large an array as we need since it's limited to 64k of
payload.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-16 18:20:11 -05:00
Chuck Lever
ebd3330d1c SUNRPC: Improve "fragment too large" warning
Including the client IP address that generated the overrun traffic
seems like it would be helpful. The message now reads:

  kernel: svc: nfsd oversized RPC fragment (1064958 octets) from 100.64.0.11:45866

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-16 18:20:11 -05:00
Chuck Lever
bf94dea7fd svcrdma: Release transport resources synchronously
NFSD has always supported added network listeners. The new netlink
protocol now enables the removal of listeners.

Olga noticed that if an RDMA listener is removed and immediately
re-added, the deferred __svc_rdma_free() function might not have
run yet, so some or all of the old listener's RDMA resources
linger, which prevents a new listener on the same address from
being created.

Also, svc_xprt_free() does a module_put() just after calling
->xpo_free(). That means if there is deferred work going on, the
module could be unloaded before that work is even started,
resulting in a UAF.

Neil asks:
> What particular part of __svc_rdma_free() needs to run in order for a
> subsequent registration to succeed?
> Can that bit be run directory from svc_rdma_free() rather than be
> delayed?
> (I know almost nothing about rdma so forgive me if the answers to these
> questions seems obvious)

The reasons I can recall are:

 - Some of the transport tear-down work can sleep
 - Releasing a cm_id is tricky and can deadlock

We might be able to mitigate the second issue with judicious
application of transport reference counting.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/linux-nfs/20250821204328.89218-1-okorniev@redhat.com/
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-16 18:20:11 -05:00
Jakub Kicinski
c99ebb6132 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc6).

No conflicts, adjacent changes in:

drivers/net/phy/micrel.c
  96a9178a29 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
  61b7ade9ba ("net: phy: micrel: Add support for non PTP SKUs for lan8814")

and a trivial one in tools/testing/selftests/drivers/net/Makefile.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13 12:35:38 -08:00
Linus Torvalds
6fa9041b71 nfsd-6.18 fixes:
Address recently reported issues or issues found at the recent NFS
 bake-a-thon held in Raleigh, NC.
 
 Issues reported with v6.18-rc:
 - Address a kernel build issue
 - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED
 
 Issues that need expedient stable backports:
 - Close a refcount leak exposure
 - Report support for NFSv4.2 CLONE correctly
 - Fix oops during COPY_NOTIFY processing
 - Prevent rare crash after XDR encoding failure
 - Prevent crash due to confused or malicious NFSv4.1 client
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmkR+oIACgkQM2qzM29m
 f5dH4Q//cJWZs9TE6GpC0PbGU+vknUgfSZmRpopmzP2mGk5ECrS/BFGY+KScCbkn
 sRsUmG8iSaXyXUg0XNBh+NB3dWSadlbD9ww5FDQh+V1kV1O3HxvxaMGZ/y7lOjt5
 h29MDnkuRvLgk54vuQy34kpXLrYL4QT9KR6FZqqvtkTt/by+bOqAtdSkM2JcX1ZM
 7ggWUjB2Fyqz+/kryqVyYK2O7hwrdisDboe6/kn6eRpXqNW78RjQBOMbmqPFko00
 XQ6PcYeBzGgDHdc11JciyXa8uK14/17QuiFplHZJSYwqTWXPDPXF5rb1CHfjJECY
 grQWOcz82vXwu4H/4r/doPWTxIU/37O67+ERNqlCiu8ldZdb9oKIWPfshISWQoor
 k9UnoL5MCKiBibspX6MNz9PPqfUdd+ipwo4atft+sscAZL0G6DpKwvyuMUxztGOl
 PqyQ0KbhT1B9PO7wlonSfIgy8oCpz6j2LR17hUKvRsefapU/uQD57npmNlCrCVWs
 SFI2/vtlOxRcNMUdGjM/ih9i/foFgocxweal5uEc5s2nqdQvLp+apDccERDWIYlT
 YgZlz1HgjA5aXKFwQw3t6kUk656IdZGuadzUOk90TSlJxRirfU18u3sBMl6klwtV
 iTOyN96IcwxY6eIvQhTovfSJ9aZmCv5CdwrmJuhPxO+aaxD2MrA=
 =hxkc
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Address recently reported issues or issues found at the recent NFS
  bake-a-thon held in Raleigh, NC.

  Issues reported with v6.18-rc:
   - Address a kernel build issue
   - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED

  Issues that need expedient stable backports:
   - Close a refcount leak exposure
   - Report support for NFSv4.2 CLONE correctly
   - Fix oops during COPY_NOTIFY processing
   - Prevent rare crash after XDR encoding failure
   - Prevent crash due to confused or malicious NFSv4.1 client"

* tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"
  nfsd: ensure SEQUENCE replay sends a valid reply.
  NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
  NFSD: Skip close replay processing if XDR encoding fails
  NFSD: free copynotify stateid in nfs4_free_ol_stateid()
  nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes
  nfsd: fix refcount leak in nfsd_set_fh_dentry()
2025-11-12 18:41:01 -08:00
Chuck Lever
324be6dcbf Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"
Geert reports:
> This is now commit d8e97cc476 ("SUNRPC: Make RPCSEC_GSS_KRB5
> select CRYPTO instead of depending on it") in v6.18-rc1.
> As RPCSEC_GSS_KRB5 defaults to "y", CRYPTO is now auto-enabled in
> defconfigs that didn't enable it before.

Revert while we work out a proper solution and then test it.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/linux-nfs/b97cea29-4ab7-4fb6-85ba-83f9830e524f@kernel.org/T/#t
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-10 09:31:52 -05:00
Kees Cook
85cb0757d7 net: Convert proto_ops connect() callbacks to use sockaddr_unsized
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Kees Cook
0e50474fa5 net: Convert proto_ops bind() callbacks to use sockaddr_unsized
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Linus Torvalds
81538c8e42 NFSD 6.18 Release Notes
Mike Snitzer has prototyped a mechanism for disabling I/O caching in
 NFSD. This is introduced in v6.18 as an experimental feature. This
 enables scaling NFSD in /both/ directions:
 
 - NFS service can be supported on systems with small memory
   footprints, such as low-cost cloud instances
 - Large NFS workloads will be less likely to force the eviction of
   server-local activity, helping it avoid thrashing
 
 Jeff Layton contributed a number of fixes to the new attribute
 delegation implementation (based on a pending Internet RFC) that we
 hope will make attribute delegation reliable enough to enable by
 default, as it is on the Linux NFS client.
 
 The remaining patches in this pull request are clean-ups and minor
 optimizations. Many thanks to the contributors, reviewers, testers,
 and bug reporters who participated during the v6.18 NFSD development
 cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmjjxZMACgkQM2qzM29m
 f5dhcA//f6ha4lQ2qq3HfJCa4jLe55LEEYiPPgsBLyuAJtQcEQEVJtb1+0qmdeoE
 4ROW/ooE6TmgRX3hJxpUqMeJPj99lC/Y29gsDfWvl9+EM1Dii4JborEBXFwKXn5b
 KQhM/UK0H8nsZSDys5nr45eai0LQhjL6kbG2k2OoRxDjj1VpOdqVMYWKBVFDwYLc
 LDqv/zh4mB30faMATfLkLpGQb2J9sWd6+Ks45m0H6+KkEEjINlHB5WTX601Omyw5
 7YgIiR9/OPLiEcY0rK1iDjiY5g3sFza/W+hqzBADi9i5BDdadYUhUvQWgR+Y4SDD
 04UltS5eVRtLM74S3JErWqQ4yuS5/aiUcinNeHV6wyfunuipusG2Hc5ZehcJAH78
 Pt8D9KcBCsCMv2sBJl3vH7++rpBeiP9WHONrydK25Omk5as7eOGFjxE++KQ69BOB
 +AtHIKEtw1VqX8IpLaG1EL3Ux22z4/PRw2sVx1Oql3mgOCXIsBTr7KPVhjwfBoQO
 qlGpnj+xGV1b082BBMfTp1TLtRdD2PKOMaUQeF8XhyEXw92iGQfi0fCrAsbpFKfn
 D3+BK3mR+tOcaEhpHhqU/XwiYfSTU0NYrimmjEvGbFhsT0Tz+RSoNQUBth2tJaqZ
 hMEBmVujtu5LWnD0+dUNuylL10I4XDge+NVEmJa9pg2nFmjcojw=
 =/nLS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Mike Snitzer has prototyped a mechanism for disabling I/O caching in
  NFSD. This is introduced in v6.18 as an experimental feature. This
  enables scaling NFSD in /both/ directions:

   - NFS service can be supported on systems with small memory
     footprints, such as low-cost cloud instances

   - Large NFS workloads will be less likely to force the eviction of
     server-local activity, helping it avoid thrashing

  Jeff Layton contributed a number of fixes to the new attribute
  delegation implementation (based on a pending Internet RFC) that we
  hope will make attribute delegation reliable enough to enable by
  default, as it is on the Linux NFS client.

  The remaining patches in this pull request are clean-ups and minor
  optimizations. Many thanks to the contributors, reviewers, testers,
  and bug reporters who participated during the v6.18 NFSD development
  cycle"

* tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  nfsd: discard nfserr_dropit
  SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
  NFSD: Add io_cache_{read,write} controls to debugfs
  NFSD: Do the grace period check in ->proc_layoutget
  nfsd: delete unnecessary NULL check in __fh_verify()
  NFSD: Allow layoutcommit during grace period
  NFSD: Disallow layoutget during grace period
  sunrpc: fix "occurence"->"occurrence"
  nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in
  nfsd: nfserr_jukebox in nlm_fopen should lead to a retry
  NFSD: Reduce DRC bucket size
  NFSD: Delay adding new entries to LRU
  SUNRPC: Move the svc_rpcb_cleanup() call sites
  NFS: Remove rpcbind cleanup for NFSv4.0 callback
  nfsd: unregister with rpcbind when deleting a transport
  NFSD: Drop redundant conversion to bool
  sunrpc: eliminate return pointer in svc_tcp_sendmsg()
  sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length
  nfsd: decouple the xprtsec policy check from check_nfsd_access()
  NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()
  ...
2025-10-06 13:22:21 -07:00
Eric Biggers
d8e97cc476 SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it.  This
unblocks the eventual removal of the selection of CRYPTO from NFSD_V4,
which will no longer be needed by nfsd itself due to switching to the
crypto library functions.  But NFSD_V4 selects RPCSEC_GSS_KRB5, which
still needs CRYPTO.  It makes more sense for RPCSEC_GSS_KRB5 to select
CRYPTO itself, like most other kconfig options that need CRYPTO do.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-01 15:54:01 -04:00
Jeff Layton
ffe381923d sunrpc: unexport rpc_malloc() and rpc_free()
These are not used outside of sunrpc code.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30 16:04:03 -04:00
Anna Schumaker
cc6ac66f1c SUNRPC: Update gssx_accept_sec_context() to use xdr_set_scratch_folio()
This was the last caller of xdr_set_scratch_page(), so I remove this
function while I'm at it.

Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23 13:29:50 -04:00
Anna Schumaker
d57e43b72b SUNRPC: Update svcxdr_init_decode() to call xdr_set_scratch_folio()
The only snag here is that __folio_alloc_node() doesn't handle
NUMA_NO_NODE, so I also need to update svc_pool_map_get_node() to return
numa_mem_id() instead. I arrived at this approach by  looking at what
other users of __folio_alloc_node() do for this case.

Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23 13:29:50 -04:00
Qianfeng Rong
040058a8f7 SUNRPC: Remove redundant __GFP_NOWARN
GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant
__GFP_NOWARN.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23 13:29:50 -04:00
Chuck Lever
62c0c0e749 SUNRPC: Move the svc_rpcb_cleanup() call sites
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all()
are always invoked in pairs, we can deduplicate code by moving
the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23 13:28:19 -04:00
Jeff Layton
ec7d8e68ef sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer
We have a lot of old dprintk() call sites that aren't going anywhere
anytime soon. At the same time, turning them up is a serious burden on
the host due to the console locking overhead.

Add a new Kconfig option that redirects dfprintk() output to the trace
buffer. This is more efficient than logging to the console and allows
for proper interleaving of dprintk and static tracepoint events.

Since using trace_printk() causes scary warnings to pop at boot time,
this new option defaults to "n".

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23 13:28:19 -04:00
Xichao Zhao
6c15463c45 sunrpc: fix "occurence"->"occurrence"
Trivial fix to spelling mistake in comment text.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 19:24:50 -04:00
Chuck Lever
d73d06dac6 SUNRPC: Move the svc_rpcb_cleanup() call sites
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all()
are always invoked in pairs, we can deduplicate code by moving
the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all().

Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 19:24:50 -04:00
Olga Kornievskaia
898374fdd7 nfsd: unregister with rpcbind when deleting a transport
When a listener is added, a part of creation of transport also registers
program/port with rpcbind. However, when the listener is removed,
while transport goes away, rpcbind still has the entry for that
port/type.

When deleting the transport, unregister with rpcbind when appropriate.

---v2 created a new xpt_flag XPT_RPCB_UNREG to mark TCP and UDP
transport and at xprt destroy send rpcbind unregister if flag set.

Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: d093c90892 ("nfsd: fix management of listener transports")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 19:24:50 -04:00
Jeff Layton
7569065fb1 sunrpc: eliminate return pointer in svc_tcp_sendmsg()
Return a positive value if something was sent, or a negative error code.
Eliminate the "err" variable in the only caller as well.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 19:24:50 -04:00