Commit graph

694 commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Jeff Layton
a0022a38be sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY
To dynamically adjust the thread count, nfsd requires some information
about how busy things are.

Change svc_recv() to take a timeout value, and then allow the wait for
work to time out if it's set. If a timeout is not defined, then the
schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the
full timeout, then have it return -ETIMEDOUT to the caller.

If it wakes up, finds that there is more work and that no threads are
available, then attempt to set SP_TASK_STARTING. If wasn't already set,
have the task return -EBUSY to cue to the caller that the service could
use more threads.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
7ffc7ade2c sunrpc: introduce the concept of a minimum number of threads per pool
Add a new pool->sp_nrthrmin field to track the minimum number of threads
in a pool. Add min_threads parameters to both svc_set_num_threads() and
svc_set_pool_threads(). If min_threads is non-zero and less than the
max, svc_set_num_threads() will ensure that the number of running
threads is between the min and the max.

If the min is 0 or greater than the max, then it is ignored, and the
maximum number of threads will be started, and never spun down.

For now, the min_threads is always 0, but a later patch will pass the
proper value through from nfsd.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
Jeff Layton
e344f87262 sunrpc: split svc_set_num_threads() into two functions
svc_set_num_threads() will set the number of running threads for a given
pool. If the pool argument is set to NULL however, it will distribute
the threads among all of the pools evenly.

These divergent codepaths complicate the move to dynamic threading.
Simplify the API by splitting these two cases into different helpers:

Add a new svc_set_pool_threads() function that sets the number of
threads in a single, given pool. Modify svc_set_num_threads() to
distribute the threads evenly between all of the pools and then call
svc_set_pool_threads() for each.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 10:15:42 -05:00
NeilBrown
96f04d24fc locks: ensure vfs_test_lock() never returns FILE_LOCK_DEFERRED
FILE_LOCK_DEFERRED can be returned when creating or removing a lock, but
not when testing for a lock.  This support was explicitly removed in
Commit 09802fd2a8 ("lockd: rip out deferred lock handling from testlock codepath")

However the test in nlmsvc_testlock() suggests that it *can* be returned,
only nlm cannot handle it.

To aid clarity, remove the test and instead put a similar test and
warning in vfs_test_lock().  If the impossible happens, convert
FILE_LOCK_DEFERRED to -EIO.

Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26 10:10:58 -05:00
Linus Torvalds
c8ebd43345 nfsd-6.19 fixes:
A set of NFSD fixes that arrived just a bit late for the 6.19 merge
 window.
 
 Issues reported with v6.19-rc:
 - Avoid unnecessarily breaking a timestamp delegation
 
 Issues that need expedient stable backports:
 - Fix a crasher in nlm4svc_proc_test()
 - Fix nfsd_file reference leak during write delegation
 - Fix error flow in client_states_open()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmlSktAACgkQM2qzM29m
 f5caSRAAhgbXuw3O1Fo9l+FOnbwH58mbIOEcykp0IgsId0+07yMiY2x9Ll4oxAzs
 Y6qMOeEz5K/8rgk8DdBf6lOZ03HMXiUEPP6cUVMjbv4XdOFsGtpTAIycoMGf71v8
 pk42uo/I437vYQHyC5UIdJ/kQ4nkxrfK9j0vRsS6LirIhZrfJohrgo80ou2FPw5L
 BQO3pd/Va5yzbVQ4cHHh1w5NCp4qEMJx1hQcwg2Ll/WJyk2zISHFPIusD+RLYLnP
 w4vwaCCZLbnYMbHcE9TfAWkkT7S5YftMYqkB8X/NkbiiSgw4V14CKUMaPEXYhLbq
 Us5C6cCMXKVunOlVrRfPISvkcDwkTjgAiSM8xur0d5NEgb8fXJx9KEoflstcXbNr
 iZ3UWg6FRHGMWx6HkXNKCIS5LF9xvY+APjO20VY+Q1A/+AxzNyfmmRsD5oL3SeKd
 EovR7QIZUycWUGkT7n69vB49x4QqpIaqgH5F0ZUu7dmKdPDnLB/PDI3Gxj9MjaBr
 zwt1BTcGnlGtrCR8x+GqV3RXiWNL3yQGrvIZDKBUww0QNJ9KBPrOAvP0AxRjoIvr
 WANMjVoK3bXULmiHejpcBuRi+h6kGzpE8wGPIkmcJRa3kAId2v1l7rbbWUhc1TnO
 /HomN/qQhZLoxijAonKl/KxL71/EMnzNfEdgwTyK3ZlVGwFhPSs=
 =QzWs
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "A set of NFSD fixes that arrived just a bit late for the 6.19 merge
  window.

  Regression fix:
   - Avoid unnecessarily breaking a timestamp delegation

  Stable fixes:
   - Fix a crasher in nlm4svc_proc_test()
   - Fix nfsd_file reference leak during write delegation
   - Fix error flow in client_states_open()"

* tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: Drop the client reference in client_states_open()
  nfsd: use ATTR_DELEG in nfsd4_finalize_deleg_timestamps()
  nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
  lockd: fix vfs_test_lock() calls
2025-12-30 17:56:26 -08:00
NeilBrown
a49a2a1baa lockd: fix vfs_test_lock() calls
Usage of vfs_test_lock() is somewhat confused.  Documentation suggests
it is given a "lock" but this is not the case.  It is given a struct
file_lock which contains some details of the sort of lock it should be
looking for.

In particular passing a "file_lock" containing fl_lmops or fl_ops is
meaningless and possibly confusing.

This is particularly problematic in lockd.  nlmsvc_testlock() receives
an initialised "file_lock" from xdr-decode, including manager ops and an
owner.  It then mistakenly passes this to vfs_test_lock() which might
replace the owner and the ops.  This can lead to confusion when freeing
the lock.

The primary role of the 'struct file_lock' passed to vfs_test_lock() is
to report a conflicting lock that was found, so it makes more sense for
nlmsvc_testlock() to pass "conflock", which it uses for returning the
conflicting lock.

With this change, freeing of the lock is not confused and code in
__nlm4svc_proc_test() and __nlmsvc_proc_test() can be simplified.

Documentation for vfs_test_lock() is improved to reflect its real
purpose, and a WARN_ON_ONCE() is added to avoid a similar problem in the
future.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/all/20251021130506.45065-1-okorniev@redhat.com
Signed-off-by: NeilBrown <neil@brown.name>
Fixes: 20fa190272 ("nfs: add export operations")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-24 21:18:46 -05:00
Linus Torvalds
b0319c4642 NFSD 6.19 Release Notes
Mike Snitzer's mechanism for disabling I/O caching introduced in
 v6.18 is extended to include using direct I/O. The goal is to
 further reduce the memory footprint consumed by NFS clients
 accessing large data sets via NFSD.
 
 The NFSD community adopted a maintainer entry profile during this
 cycle. See
 
   Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
 
 Work continues on hardening NFSD's implementation of the pNFS block
 layout type. This type enables pNFS clients to directly access the
 underlying block devices that contain an exported file system,
 reducing server overhead and increasing data throughput.
 
 The remaining patches in this pull request are clean-ups and minor
 optimizations. Many thanks to the contributors, reviewers, testers,
 and bug reporters who participated during the v6.19 NFSD development
 cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmk0SkQACgkQM2qzM29m
 f5cI3RAAoqg53ctyWC8B76mC2kT/bugWWXNwVAVd58UGy3yptMlJG2TBBAHbV6rs
 NgMzP1gg470eccYonntS6Pk259hJimi/REifk+2bFuvjqym7OruKXJBxn2FZrCCw
 mBu/ptfj9SFoTezAHh/wNuHTEy68gLn6V2LB3jdMcxeqUaC39kFnB0sZP4+xiFcx
 PfL6uiXj8JtGpYXf8AKf7HniZCBrtkia1ByRrFHrcPX5A6S9dL85rDQbm/O8L3AA
 hS3cp2UQUSwvFUED9N2QXPpRQ3nytNSG08f/wOaXhIXAmXq/sZFGjlNqjpX8i5jV
 jsSQIeYy1BwLsdxo25wbShrqmYsFH3zWRELKoCs0g9lXbpJWyK0O93zETXtYRRae
 4O9iJ5VcQMyMpg4/Gh1jsRU9RUCc88T15F2HQVYTIgHrLKnz1sWMKI//1G3kYrf3
 L7hYk2ZU6QYqSNOFkrL0Jf7geqf8FR4nUi/nv+A/2gFqm+WWIhNTKQt2hsQlqD66
 HaYOhFX5SOj8YnjsbpGM8W65n8ELaON0PWhq+SBpRCm7i2llod0HTIpyv00NTwVL
 xmr5MfmO7989kVVy7DA/h/xg/dfOpUCNjImrXz+QDbsmCMdLjT/6YGD47FLkxzEO
 O+PuRIORuez8KosEaIRVjdXcNFPSeGfDdBAveHH9IqRX7ZvIZQY=
 =E0oM
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:

 - Mike Snitzer's mechanism for disabling I/O caching introduced in
   v6.18 is extended to include using direct I/O. The goal is to further
   reduce the memory footprint consumed by NFS clients accessing large
   data sets via NFSD.

 - The NFSD community adopted a maintainer entry profile during this
   cycle. See

      Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst

 - Work continues on hardening NFSD's implementation of the pNFS block
   layout type. This type enables pNFS clients to directly access the
   underlying block devices that contain an exported file system,
   reducing server overhead and increasing data throughput.

 - The remaining patches are clean-ups and minor optimizations. Many
   thanks to the contributors, reviewers, testers, and bug reporters who
   participated during the v6.19 NFSD development cycle.

* tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (38 commits)
  NFSD: nfsd-io-modes: Separate lists
  NFSD: nfsd-io-modes: Wrap shell snippets in literal code blocks
  NFSD: Add toctree entry for NFSD IO modes docs
  NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst
  NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
  NFSD: Make FILE_SYNC WRITEs comply with spec
  NFSD: Add trace point for SCSI fencing operation.
  NFSD: use correct reservation type in nfsd4_scsi_fence_client
  xdrgen: Don't generate unnecessary semicolon
  xdrgen: Fix union declarations
  NFSD: don't start nfsd if sv_permsocks is empty
  xdrgen: handle _XdrString in union encoder/decoder
  xdrgen: Fix the variable-length opaque field decoder template
  xdrgen: Make the xdrgen script location-independent
  xdrgen: Generalize/harden pathname construction
  lockd: don't allow locking on reexported NFSv2/3
  MAINTAINERS: add a nfsd blocklayout reviewer
  nfsd: Use MD5 library instead of crypto_shash
  nfsd: stop pretending that we cache the SEQUENCE reply.
  NFS: nfsd-maintainer-entry-profile: Inline function name prefixes
  ...
2025-12-06 10:57:02 -08:00
Asbjørn Sloth Tønnesen
68e83f3472 tools: ynl-gen: add regeneration comment
Add a comment on regeneration to the generated files.

The comment is placed after the YNL-GEN line[1], as to not interfere
with ynl-regen.sh's detection logic.

[1] and after the optional YNL-ARG line.

Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:20:42 -08:00
Jeff Layton
898f944652 lockd: don't allow locking on reexported NFSv2/3
Since commit 9254c8ae9b ("nfsd: disallow file locking and delegations
for NFSv4 reexport"), file locking when reexporting an NFS mount via
NFSv4 is expressly prohibited by nfsd. Do the same in lockd:

Add a new  nlmsvc_file_cannot_lock() helper that will test whether file
locking is allowed for a given file, and return nlm_lck_denied_nolocks
if it isn't.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-20 16:29:35 -05:00
Chuck Lever
d73d06dac6 SUNRPC: Move the svc_rpcb_cleanup() call sites
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all()
are always invoked in pairs, we can deduplicate code by moving
the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all().

Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 19:24:50 -04:00
Colin Ian King
6ecdfd7aa8 lockd: Remove space before newline
There is an extraneous space before a newline in a dprintk message.
Remove the space.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 19:24:50 -04:00
Nicolas Bouchinet
87480a8ce5 sysctl: Fixes nsm_local_state bounds
Bound nsm_local_state sysctl writings between SYSCTL_ZERO
and SYSCTL_INT_MAX.

The proc_handler has thus been updated to proc_dointvec_minmax.

Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
[ cel: updated to handle zero - UINT_MAX instead ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:13 -04:00
Jeff Layton
9a28ac1762 lockd: add netlink control interface
The legacy rpc.nfsd tool will set the nlm_grace_period if the NFSv4
grace period is set. nfsdctl is missing this functionality, so add a new
netlink control interface for lockd that it can use. For now, it only
allows setting the grace period, and the tcp and udp listener ports.

lockd currently uses module parameters and sysctls for configuration, so
all of its settings are global. With this change, lockd now tracks these
values on a per-net-ns basis. It will only fall back to using the global
values if any of them are 0.

Finally, as a backward compatibility measure, if updating the nlm
settings in the init_net namespace, also update the legacy global
values to match.

Link: https://issues.redhat.com/browse/RHEL-71698
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:10:53 -04:00
Joel Granados
1751f872cc treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-28 13:48:37 +01:00
NeilBrown
a4b853f183 sunrpc: remove all connection limit configuration
Now that the connection limit only apply to unconfirmed connections,
there is no need to configure it.  So remove all the configuration and
fix the number of unconfirmed connections as always 64 - which is
now given a name: XPT_MAX_TMP_CONN

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06 09:37:36 -05:00
Linus Torvalds
445d9f05fa NFSD 6.13 Release Notes
Jeff Layton contributed a scalability improvement to NFSD's NFSv4
 backchannel session implementation. This improvement is intended to
 increase the rate at which NFSD can safely recall NFSv4 delegations
 from clients, to avoid the need to revoke them. Revoking requires
 a slow state recovery process.
 
 A wide variety of bug fixes and other incremental improvements make
 up the bulk of commits in this series. As always I am grateful to
 the NFSD contributors, reviewers, testers, and bug reporters who
 participated during this cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmdEgLQACgkQM2qzM29m
 f5cwmg/9HcfG7blepU/2qNHopzSYRO5vZw1YNJQ5/Wi3bmqIea83lf8OcCY1G/aj
 6K+jnenzHrwfhaA4u7N2FPXPVl8sPSMuOrJXY5zC4yE5QnIbranjcyEW5l5zlj3n
 ukkTYQgjUsKre3pHlvn3JmDHfUhNPEfzirsJeorP7DS3omne+OFA1LNncNP6emRu
 h0aEC6EJ43zUkYiz9nZYqPwIAwrUIA0WOrvVnq7vsi6gR4/Muk7nS+X/y4qFjli3
 9enVskEv8sFmmOAIMK3CHJq+exEeKtKEKUuYkD23QgPt2R4+IwqS70o9IM/S1ypf
 APiv958BIhxm/SwUn1IjoxIckTB5EdksMxU5/4qGr1ZxprPG4/ruKO80BkrxLzW2
 n1HmJ4ZNnpWPQvHN7RQ0WOsPNzL8byxJbGr1bpNgU4AGXnTFWPrAnB6juiyX4xb+
 YNfgkQGDY79o7r1OJ5UUdCyx0QBSnaLNACTGm2u2FpI/ukMFPdrWIE99QbBgSe1p
 MgWaiPwSY+9crFfGPJeQ4t6/siRAec6L3RO9KT9Epcd2S7/Uts3NXYRdJfwZ+Qza
 TkPY2bm7T/WCcMhW7DN372hqgfRHPWOf4tacJ1Tob+As1d6p6qXEX2zi6piCCOLj
 dmTVDSVPClRXt8YigF9WqosyWv1jUzSnh9ne+eYPBpj93Ag2YBY=
 =wBvS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Jeff Layton contributed a scalability improvement to NFSD's NFSv4
  backchannel session implementation. This improvement is intended to
  increase the rate at which NFSD can safely recall NFSv4 delegations
  from clients, to avoid the need to revoke them. Revoking requires a
  slow state recovery process.

  A wide variety of bug fixes and other incremental improvements make up
  the bulk of commits in this series. As always I am grateful to the
  NFSD contributors, reviewers, testers, and bug reporters who
  participated during this cycle"

* tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits)
  nfsd: allow for up to 32 callback session slots
  nfs_common: must not hold RCU while calling nfsd_file_put_local
  nfsd: get rid of include ../internal.h
  nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur
  NFSD: Add nfsd4_copy time-to-live
  NFSD: Add a laundromat reaper for async copy state
  NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations
  NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD
  NFSD: Free async copy information in nfsd4_cb_offload_release()
  NFSD: Fix nfsd4_shutdown_copy()
  NFSD: Add a tracepoint to record canceled async COPY operations
  nfsd: make nfsd4_session->se_flags a bool
  nfsd: remove nfsd4_session->se_bchannel
  nfsd: make use of warning provided by refcount_t
  nfsd: Don't fail OP_SETCLIENTID when there are too many clients.
  svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
  xdrgen: Remove program_stat_to_errno() call sites
  xdrgen: Update the files included in client-side source code
  xdrgen: Remove check for "nfs_ok" in C templates
  xdrgen: Remove tracepoint call site
  ...
2024-11-26 12:59:30 -08:00
Chuck Lever
9189d23b83 lockd: Remove unneeded initialization of file_lock::c.flc_flags
Since commit 75c7940d2a ("lockd: set missing fl_flags field when
retrieving args"), nlmsvc_retrieve_args() initializes the flc_flags
field. svcxdr_decode_lock() no longer needs to do this.

This clean up removes one dependency on the nlm_lock:fl field. No
behavior change is expected.

Analysis:

svcxdr_decode_lock() is called by:

nlm4svc_decode_testargs()
nlm4svc_decode_lockargs()
nlm4svc_decode_cancargs()
nlm4svc_decode_unlockargs()

nlm4svc_decode_testargs() is used by:
- NLMPROC4_TEST and NLMPROC4_TEST_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_GRANTED and NLMPROC4_GRANTED_MSG, which don't pass the
  lock's file_lock to the generic lock API

nlm4svc_decode_lockargs() is used by:
- NLMPROC4_LOCK and NLM4PROC4_LOCK_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_UNLOCK and NLM4PROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args()
- NLMPROC4_NM_LOCK, which calls nlmsvc_retrieve_args()

nlm4svc_decode_cancargs() is used by:
- NLMPROC4_CANCEL and NLMPROC4_CANCEL_MSG, which call nlmsvc_retrieve_args()

nlm4svc_decode_unlockargs() is used by:
- NLMPROC4_UNLOCK and NLMPROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args()

All callers except GRANTED/GRANTED_MSG eventually call
nlmsvc_retrieve_args() before using nlm_lock::fl.c.flc_flags. Thus
this change is safe.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:23:04 -05:00
Chuck Lever
8994a512e2 lockd: Remove unused parameter to nlmsvc_testlock()
The nlm_cookie parameter has been unused since commit 09802fd2a8
("lockd: rip out deferred lock handling from testlock codepath").

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:23:04 -05:00
Chuck Lever
a872c7313e lockd: Remove some snippets of unfinished code
Clean up.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:23:04 -05:00
Chuck Lever
e594884128 lockd: Remove unnecessary memset()
Since commit 103cc1fafe ("SUNRPC: Parametrize how much of argsize
should be zeroed") (and possibly long before that, even) all of the
memory underlying rqstp->rq_argp is zeroed already. There's no need
for the memset() in nlm4svc_decode_shareargs().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:23:03 -05:00
Pali Rohár
2dc84a7522 lockd: Fix comment about NLMv3 backwards compatibility
NLMv2 is completely different protocol than NLMv1 and NLMv3, and in
original Sun implementation is used for RPC loopback callbacks from statd
to lockd services. Linux does not use nor does not implement NLMv2.

Hence, NLMv3 is not backward compatible with NLMv2. But NLMv3 is backward
compatible with NLMv1. Fix comment.

Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18 20:22:58 -05:00
Linus Torvalds
70e7730c2a vfs-6.13.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc
 osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli
 pqlG+38UdATImpfqqWjPbb72sBYcfQg=
 =wLUh
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only setup
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and grooms the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     ilebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array that returns filesystem specific mount options
         separated by zero bytes and unescaped. This allows caller's to
         retrieve filesystem specific mount options and immediately pass
         them to e.g., fsconfig() without having to unescape or split
         them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up task. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Christian Brauner
09ee2a670d
Merge patch series "Fixup NLM and kNFSD file lock callbacks"
Benjamin Coddington <bcodding@redhat.com> says:

Last year both GFS2 and OCFS2 had some work done to make their locking more
robust when exported over NFS.  Unfortunately, part of that work caused both
NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send
lock notifications to clients.

This in itself is not a huge problem because most NFS clients will still
poll the server in order to acquire a conflicted lock, but now that I've
noticed it I can't help but try to fix it because there are big advantages
for setups that might depend on timely lock notifications, and we've
supported that as a feature for a long time.

Its important for NLM and kNFSD that they do not block their kernel threads
inside filesystem's file_lock implementations because that can produce
deadlocks.  We used to make sure of this by only trusting that
posix_lock_file() can correctly handle blocking lock calls asynchronously,
so the lock managers would only setup their file_lock requests for async
callbacks if the filesystem did not define its own lock() file operation.

However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started signalling this
behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting
posix_lock_file() was inadvertently dropped, so now most filesystems no
longer produce lock notifications when exported over NFS.

I tried to fix this by simply including the old check for lock(), but the
resulting include mess and layering violations was more than I could accept.
There's a much cleaner way presented here using an fop_flag, which while
potentially flag-greedy, greatly simplifies the problem and grooms the
way for future uses by both filesystems and lock managers alike.

* patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com:
  exportfs: Remove EXPORT_OP_ASYNC_LOCK
  NLM/NFSD: Fix lock notifications for async-capable filesystems
  gfs2/ocfs2: set FOP_ASYNC_LOCK
  fs: Introduce FOP_ASYNC_LOCK
  NFS: trace: show TIMEDOUT instead of 0x6e
  nfsd: use system_unbound_wq for nfsd_file_gc_worker()
  nfsd: count nfsd_file allocations
  nfsd: fix refcount leak when file is unhashed after being found
  nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire
  nfsd: add list_head nf_gc to struct nfsd_file

Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02 07:52:07 +02:00
Benjamin Coddington
7e64c5bc49
NLM/NFSD: Fix lock notifications for async-capable filesystems
Instead of checking just the exportfs flag, use the new
locks_can_async_lock() helper which allows NLM and NFSD to once again
support lock notifications for all filesystems which use posix_lock_file().

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/865c40da44af67939e8eb560d17a26c9c50f23e0.1726083391.git.bcodding@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 17:00:18 +02:00
NeilBrown
3391fc92db sunrpc: allow svc threads to fail initialisation cleanly
If an svc thread needs to perform some initialisation that might fail,
it has no good way to handle the failure.

Before the thread can exit it must call svc_exit_thread(), but that
requires the service mutex to be held.  The thread cannot simply take
the mutex as that could deadlock if there is a concurrent attempt to
shut down all threads (which is unlikely, but not impossible).

nfsd currently call svc_exit_thread() unprotected in the unlikely event
that unshare_fs_struct() fails.

We can clean this up by introducing svc_thread_init_status() by which an
svc thread can report whether initialisation has succeeded.  If it has,
it continues normally into the action loop.  If it has not,
svc_thread_init_status() immediately aborts the thread.
svc_start_kthread() waits for either of these to happen, and calls
svc_exit_thread() (under the mutex) if the thread aborted.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20 19:31:03 -04:00
NeilBrown
4ed9ef3260 lockd: discard nlmsvc_timeout
nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use
that in the one place that nlmsvc_timeout is used.

In truth it *might* not always be the same as nlmsvc_timeout is only set
when lockd is started while nlm_timeout can be set at anytime via
sysctl.  I think this difference it not helpful so removing it is good.

Also remove the test for nlm_timout being 0.  This is not possible -
unless a module parameter is used to set the minimum timeout to 0, and
if that happens then it probably should be honoured.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01 10:04:56 -04:00
Andy Shevchenko
e5d85ec52b lockd: Use *-y instead of *-objs in Makefile
*-objs suffix is reserved rather for (user-space) host programs while
usually *-y suffix is used for kernel drivers (although *-objs works
for that purpose for now).

Let's correct the old usages of *-objs in Makefiles.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08 14:10:03 -04:00
Li kunyu
03b0036f45 lockd: host: Remove unnecessary statements'host = NULL;'
In 'nlm_alloc_host', the host has already been assigned a value of NULL
when defined, so 'host=NULL;' Can be deleted.

Signed-off-by: Li kunyu <kunyu@nfschina.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06 09:07:20 -04:00
Linus Torvalds
a01c9fe323 NFSD 6.9 Release Notes
The bulk of the patches for this release are optimizations, code
 clean-ups, and minor bug fixes.
 
 One new feature to mention is that NFSD administrators now have the
 ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support
 has had this capability for some time.
 
 As always I am grateful to NFSD contributors, reviewers, and
 testers.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmXwV4QACgkQM2qzM29m
 f5c7cg/8CRe0mGbeEMonoSycBjANDuiRolCM+DhVccUvSyWPqf4blF5yrNHcf5zN
 WmjQHVXIJUMVpLovcakj+4aBIuXGgdSmBJamFTy9fVfcFadiWYRceNgMMXpLMDDI
 fMAszRUyfL/r0Evj0Zajt86R5/gGn+W9X6HlDc1k7VV0Z+fzRw9WMxADy11cgHLp
 mh2bzyPmwu0EfBYlWNWLqzWVZm1C5UCGnlInyr0KXImCLOkpJqAVXTDvDkGFW2Qw
 1kJhodyabf6fRV2ZqPjLUuR4aRqABey83rB0N5z7MumO/dJUBW3CHR3uNMqvkmh3
 XevI8bPzS2Kypijcx7dONtkDWwU+fsvCdepNpmVDB73B19BFiLG+HDbMypJ0dmp+
 rvvfILRDCmIb+FA1DUeT3lIc6ac1f1+qAVc7hi3E7rGctEJWeHDsZg+E1PuTvpxM
 3XfRaFnucY5vwyiB2/uI4eblBHcVXoKho+pUqQMegLPRbgsEUyFUfg3+ZMtntagd
 OVUXvWYIARP97HNh0J5ChcGI72UpXtFWMlbbiTiCzYx4FeiCffeczIERXNJ4FYAg
 fKUaiBhdAN1PPFCRXJORZ5XlSIeZttUNSJUPfmuOpkscMdkpRUIhuEUYo9K8/1eL
 O+YZeGW/kTG+llxOERfEHJoekLf1TgGdU7oBmTIgQIK03hTUih8=
 =75G4
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "The bulk of the patches for this release are optimizations, code
  clean-ups, and minor bug fixes.

  One new feature to mention is that NFSD administrators now have the
  ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support has
  had this capability for some time.

  As always I am grateful to NFSD contributors, reviewers, and testers"

* tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (75 commits)
  NFSD: Clean up nfsd4_encode_replay()
  NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches its limit
  NFSD: Document nfsd_setattr() fill-attributes behavior
  nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()
  nfsd: Fix a regression in nfsd_setattr()
  NFSD: OP_CB_RECALL_ANY should recall both read and write delegations
  NFSD: handle GETATTR conflict with write delegation
  NFSD: add support for CB_GETATTR callback
  NFSD: Document the phases of CREATE_SESSION
  NFSD: Fix the NFSv4.1 CREATE_SESSION operation
  nfsd: clean up comments over nfs4_client definition
  svcrdma: Add Write chunk WRs to the RPC's Send WR chain
  svcrdma: Post WRs for Write chunks in svc_rdma_sendto()
  svcrdma: Post the Reply chunk and Send WR together
  svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt
  svcrdma: Post Send WR chain
  svcrdma: Fix retry loop in svc_rdma_send()
  svcrdma: Prevent a UAF in svc_rdma_send()
  svcrdma: Fix SQ wake-ups
  svcrdma: Increase the per-transport rw_ctx count
  ...
2024-03-12 14:27:37 -07:00
Josef Bacik
a2214ed588 nfsd: stop setting ->pg_stats for unused stats
A lot of places are setting a blank svc_stats in ->pg_stats and never
utilizing these stats.  Remove all of these extra structs as we're not
reporting these stats anywhere.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01 09:12:08 -05:00
Jeff Layton
eb8ed7c6ab
lockd: adapt to breakup of struct file_lock
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-40-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05 13:11:43 +01:00
Jeff Layton
a69ce85ec9
filelock: split common fields into struct file_lock_core
In a future patch, we're going to split file leases into their own
structure. Since a lot of the underlying machinery uses the same fields
move those into a new file_lock_core, and embed that inside struct
file_lock.

For now, add some macros to ensure that we can continue to build while
the conversion is in progress.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05 13:11:38 +01:00
Jeff Layton
872584f1bb
lockd: convert to using new filelock helpers
Convert to using the new file locking helper functions. Also in later
patches we're going to introduce some macros with names that clash with
the variable names in nlmclnt_lock. Rename them.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-10-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05 13:11:36 +01:00
Linus Torvalds
a05aea98d4 sysctl-6.8-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size
 penalty sysctl has been changed to allow us to not require the sentinel, the
 final empty element on the sysctl array. Joel Granados has been doing all this
 work. On the v6.6 kernel we got the major infrastructure changes required to
 support this. For v6.7 we had all arch/ and drivers/ modified to remove
 the sentinel. For v6.8-rc1 we get a few more updates for fs/ directory only.
 The kernel/ directory is left but we'll save that for v6.9-rc1 as those patches
 are still being reviewed. After that we then can expect also the removal of the
 no longer needed check for procname == NULL.
 
 Let us recap the purpose of this work:
 
   - this helps reduce the overall build time size of the kernel and run time
     memory consumed by the kernel by about ~64 bytes per array
   - the extra 64-byte penalty is no longer inncurred now when we move sysctls
     out from kernel/sysctl.c to their own files
 
 Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to see further
 work by Thomas Weißschuh with the constificatin of the struct ctl_table.
 
 Due to Joel Granados's work, and to help bring in new blood, I have suggested
 for him to become a maintainer and he's accepted. So for v6.9-rc1 I look forward
 to seeing him sent you a pull request for further sysctl changes. This also
 removes Iurii Zaikin as a maintainer as he has moved on to other projects and
 has had no time to help at all.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmWdWDESHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinjJAP/jTNNoyzWisvrrvmXqR5txFGLOE+wW6x
 Xv9avuiM+DTHsH/wK8CkXEivwDqYNAZEHU7NEcolS5bJX/ddSRwN9b5aSVlCrUdX
 Ab4rXmpeSCNFp9zNszWJsDuBKIqjvsKw7qGleGtgZ2qAUHbbH30VROLWCggaee50
 wU3icDLdwkasxrcMXy4Sq5dT5wYC4j/QelqBGIkYPT14Arl1im5zqPZ95gmO/s/6
 mdicTAmq+hhAUfUBJBXRKtsvxY6CItxe55Q4fjpncLUJLHUw+VPVNoBKFWJlBwlh
 LO3liKFfakPSkil4/en+/+zuMByd0JBkIzIJa+Kk5kjpbHRhK0RkmU4+Y5G5spWN
 jjLfiv6RxInNaZ8EWQBMfjE95A7PmYDQ4TOH08+OvzdDIi6B0BB5tBGQpG9BnyXk
 YsLg1Uo4CwE/vn1/a9w0rhadjUInvmAryhb/uSJYFz/lmApLm2JUpY3/KstwGetb
 z+HmLstJb24Djkr6pH8DcjhzRBHeWQ5p0b4/6B+v1HqAUuEhdbyw1F2GrDywyF3R
 h/UOAaKLm1+ffdA246o9TejKiDU96qEzzXMaCzPKyestaRZuiyuYEMDhYbvtsMV5
 zIdMJj5HQ+U1KHDv4IN99DEj7+/vjE3f4Sjo+POFpQeQ8/d+fxpFNqXVv449dgnb
 6xEkkxsR0ElM
 =2qBt
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "To help make the move of sysctls out of kernel/sysctl.c not incur a
  size penalty sysctl has been changed to allow us to not require the
  sentinel, the final empty element on the sysctl array. Joel Granados
  has been doing all this work.

  In the v6.6 kernel we got the major infrastructure changes required to
  support this. For v6.7 we had all arch/ and drivers/ modified to
  remove the sentinel. For v6.8-rc1 we get a few more updates for fs/
  directory only.

  The kernel/ directory is left but we'll save that for v6.9-rc1 as
  those patches are still being reviewed. After that we then can expect
  also the removal of the no longer needed check for procname == NULL.

  Let us recap the purpose of this work:

   - this helps reduce the overall build time size of the kernel and run
     time memory consumed by the kernel by about ~64 bytes per array

   - the extra 64-byte penalty is no longer inncurred now when we move
     sysctls out from kernel/sysctl.c to their own files

  Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to
  see further work by Thomas Weißschuh with the constificatin of the
  struct ctl_table.

  Due to Joel Granados's work, and to help bring in new blood, I have
  suggested for him to become a maintainer and he's accepted. So for
  v6.9-rc1 I look forward to seeing him sent you a pull request for
  further sysctl changes. This also removes Iurii Zaikin as a maintainer
  as he has moved on to other projects and has had no time to help at
  all"

* tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  sysctl: remove struct ctl_path
  sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR
  coda: Remove the now superfluous sentinel elements from ctl_table array
  sysctl: Remove the now superfluous sentinel elements from ctl_table array
  fs: Remove the now superfluous sentinel elements from ctl_table array
  cachefiles: Remove the now superfluous sentinel element from ctl_table array
  sysclt: Clarify the results of selftest run
  sysctl: Add a selftest for handling empty dirs
  sysctl: Fix out of bounds access for empty sysctl registers
  MAINTAINERS: Add Joel Granados as co-maintainer for proc sysctl
  MAINTAINERS: remove Iurii Zaikin from proc sysctl
2024-01-10 17:44:36 -08:00
NeilBrown
1e3577a452 SUNRPC: discard sv_refcnt, and svc_get/svc_put
sv_refcnt is no longer useful.
lockd and nfs-cb only ever have the svc active when there are a non-zero
number of threads, so sv_refcnt mirrors sv_nrthreads.

nfsd also keeps the svc active between when a socket is added and when
the first thread is started, but we don't really need a refcount for
that.  We can simply not destroy the svc while there are any permanent
sockets attached.

So remove sv_refcnt and the get/put functions.
Instead of a final call to svc_put(), call svc_destroy() instead.
This is changed to also store NULL in the passed-in pointer to make it
easier to avoid use-after-free situations.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:33 -05:00
Joel Granados
9d5b947535 fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel elements ctl_table struct. Special attention was placed in
making sure that an empty directory for fs/verity was created when
CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the
register sysctl call that expects a size.

Signed-off-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-28 04:57:57 -08:00
NeilBrown
fa341560ca SUNRPC: change how svc threads are asked to exit.
svc threads are currently stopped using kthread_stop().  This requires
identifying a specific thread.  However we don't care which thread
stops, just as long as one does.

So instead, set a flag in the svc_pool to say that a thread needs to
die, and have each thread check this flag instead of calling
kthread_should_stop().  The first thread to find and clear this flag
then moves towards exiting.

This removes an explicit dependency on sp_all_threads which will make a
future patch simpler.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:04 -04:00
NeilBrown
f4578ba11c lockd: hold a reference to nlmsvc_serv while stopping the thread.
Both nfsd and nfsv4-callback take a temporary reference to the svc_serv
while calling svc_set_num_threads() to stop the last thread.  lockd does
not.

This extra reference prevents the scv_serv from being freed when the
last thread drops its reference count.  This is not currently needed
for lockd as the svc_serv is not accessed after the last thread is told
to exit.

However a future patch will require svc_exit_thread() to access the
svc_serv after the svc_put() so it will need the code that calls
svc_set_num_threads() to keep a reference and keep the svc_serv active.

So copy the pattern from nfsd and nfsv4-cb to lockd, and take a
reference around svc_set_num_threads(.., 0)

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:04 -04:00
Alexander Aring
afb13302aa lockd: fix race in async lock request handling
This patch fixes a race in async lock request handling between adding
the relevant struct nlm_block to nlm_blocked list after the request was
sent by vfs_lock_file() and nlmsvc_grant_deferred() does a lookup of the
nlm_block in the nlm_blocked list. It could be that the async request is
completed before the nlm_block was added to the list. This would end
in a -ENOENT and a kernel log message of "lockd: grant for unknown
block".

To solve this issue we add the nlm_block before the vfs_lock_file() call
to be sure it has been added when a possible nlmsvc_grant_deferred() is
called. If the vfs_lock_file() results in an case when it wouldn't be
added to nlm_blocked list, the nlm_block struct will be removed from
this list again.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:02 -04:00
Alexander Aring
b743612c0a lockd: don't call vfs_lock_file() for pending requests
This patch returns nlm_lck_blocked in nlmsvc_lock() when an asynchronous
lock request is pending. During testing I ran into the case with the
side-effects that lockd is waiting for only one lm_grant() callback
because it's already part of the nlm_blocked list. If another
asynchronous for the same nlm_block is triggered two lm_grant()
callbacks will occur but lockd was only waiting for one.

To avoid any change of existing users this handling will only being made
when export_op_support_safe_async_lock() returns true.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:01 -04:00
Alexander Aring
2dd10de8e6 lockd: introduce safe async lock op
This patch reverts mostly commit 40595cdc93 ("nfs: block notification
on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
export flag to signal that the "own ->lock" implementation supports
async lock requests. The only main user is DLM that is used by GFS2 and
OCFS2 filesystem. Those implement their own lock() implementation and
return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93
("nfs: block notification on fs with its own ->lock") the DLM
implementation were never updated. This patch should prepare for DLM
to set the EXPORT_OP_ASYNC_LOCK export flag and update the DLM
plock implementation regarding to it.

Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:01 -04:00
Chuck Lever
78c542f916 SUNRPC: Add enum svc_auth_status
In addition to the benefits of using an enum rather than a set of
macros, we now have a named type that can improve static type
checking of function return values.

As part of this change, I removed a stale comment from svcauth.h;
the return values from current implementations of the
auth_ops::release method are all zero/negative errno, not the SVC_OK
enum values as the old comment suggested.

Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
NeilBrown
c743b4259c SUNRPC: remove timeout arg from svc_recv()
Most svc threads have no interest in a timeout.
nfsd sets it to 1 hour, but this is a wart of no significance.

lockd uses the timeout so that it can call nlmsvc_retry_blocked().
It also sometimes calls svc_wake_up() to ensure this is called.

So change lockd to be consistent and always use svc_wake_up() to trigger
nlmsvc_retry_blocked() - using a timer instead of a timeout to
svc_recv().

And change svc_recv() to not take a timeout arg.

This makes the sp_threads_timedout counter always zero.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
NeilBrown
7b719e2bf3 SUNRPC: change svc_recv() to return void.
svc_recv() currently returns a 0 on success or one of two errors:
 - -EAGAIN means no message was successfully received
 - -EINTR means the thread has been told to stop

Previously nfsd would stop as the result of a signal as well as
following kthread_stop().  In that case the difference was useful: EINTR
means stop unconditionally.  EAGAIN means stop if kthread_should_stop(),
continue otherwise.

Now threads only exit when kthread_should_stop() so we don't need the
distinction.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
NeilBrown
f78116d3bf SUNRPC: call svc_process() from svc_recv().
All callers of svc_recv() go on to call svc_process() on success.
Simplify callers by having svc_recv() do that for them.

This loses one call to validate_process_creds() in nfsd.  That was
debugging code added 14 years ago.  I don't think we need to keep it.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
NeilBrown
8db14cad28 lockd: remove SIGKILL handling
lockd allows SIGKILL and responds by dropping all locks and restarting
the grace period.  This functionality has been present since 2.1.32 when
lockd was added to Linux.

This functionality is undocumented and most likely added as a useful
debug aid.  When there is a need to drop locks, the better approach is
to use /proc/fs/nfsd/unlock_*.

This patch removes SIGKILL handling as part of preparation for removing
all signal handling from sunrpc service threads.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Su Hui
de8d38cf44 fs: lockd: avoid possible wrong NULL parameter
clang's static analysis warning: fs/lockd/mon.c: line 293, column 2:
Null pointer passed as 2nd argument to memory copy function.

Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will
pass NULL as 2nd argument to memory copy function 'memcpy()'. So return
NULL if 'hostname' is invalid.

Fixes: 77a3ef33e2 ("NSM: More clean up of nsm_get_handle()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00