Commit graph

810 commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
233a0c0f44 eCryptfs fixes for 7.0-rc1
The set of eCryptfs patches for the 7.0-rc1 merge window consists of
 cleanups that are not intended to have any functional changes:
 - Comment typo fixes
 - Removal of an unused function declaration
 - Use strscpy() instead of the deprecated strcpy()
 - Use string copying helpers instead of memcpy() and manually
   terminating strings
 
 The patches have all spent time in linux-next and they do not regress
 the tests in the ecryptfs-utils tree.
 
 Signed-off-by: Tyler Hicks <code@tyhicks.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEKvuQkp28KvPJn/fVfL7QslYSS/0FAmmWWPwRHGNvZGVAdHlo
 aWNrcy5jb20ACgkQfL7QslYSS/3zQhAAhyHq55LJVxfSOACyabXxxQM4cpmN3vlu
 fgIsfly+mBYPKQg0/vfTAfC8ln5YhjFMcubVbohXcj6FmHGulqoB8A5w74wvksDN
 6kzFIUEg/k10AKijt36gb6ef6BjN00j+LeeSUWZ9uMtJt3SR7UuIKpbxB2sRJSHv
 Q7DguW6Wr4APDyZxmmyPPjM4CFHjTO0a5ePSRkqtLV0Qebtnk1q/v9Ddidmz1N5t
 dWNwcn8mXYSzjorFhzhFy0wVML2nStF1DgEy+7FQVzmpHxPfGN/F07TGLj7rW+hr
 BI9GQpvLOJ/EJ8ssBFqe4Ns44upo/WLnBo4PoxD+iuvPBTJLeP/b7wnHduu6ndjy
 4iZqStOhYNMWolQlnx/PmUAzBYn6K2hKOR/egUcHazWnpq5/FnXMKu6bWz4RE+fS
 n/Pi3XwIwos7zqToWnU7alcCKDwhzamRRicSYWX3ezzj3TbacFJTLv0KO+7W5HpE
 wqnLrwqSpC2tWl/4Qrm70ZekVK8ZMGW0+HwabXupWcAUZN6OmXy/B4sNkRCsZiuA
 TGeaiOMlH7AbF7QKvCI3YKRuPxKutgbzyqFfzSRYH97wPmf+1H0DRXfTVqMs1Wa8
 97KBzOxKpiJdJkvMC82xKa+1usc8JrzeGcErhleSQTgzIijQvoh08wF+isunJROx
 aWOTbdbl508=
 =Ak1f
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs updates from Tyler Hicks:
 "This consists of some really minor typo fixes that fell through the
  cracks and some more recent code cleanups:

   - Comment typo fixes

   - Removal of an unused function declaration

   - Use strscpy() instead of the deprecated strcpy()

   - Use string copying helpers instead of memcpy() and manually
     terminating strings"

* tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename
  ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context
  ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options
  ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string
  ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals
  ecryptfs: simplify list initialization in ecryptfs_parse_packet_set()
  ecryptfs: Remove unused declartion ecryptfs_fill_zeros()
  ecryptfs: Fix packet format comment in parse_tag_67_packet()
  ecryptfs: comment typo fix
  ecryptfs: keystore: Fix typo 'the the' in comment
2026-02-20 14:46:31 -08:00
Linus Torvalds
45a43ac5ac vfs-7.0-rc1.misc.2
Please consider pulling these changes from the signed vfs-7.0-rc1.misc.2 tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaZMOCwAKCRCRxhvAZXjc
 oswrAP9r1zjzMimjX2J0hBoMnYjNzQfLLew8+IRygImQ+yaqWgD9Fiw/cQ9eE1Hm
 TMLqck/ky588ywSDaBzfztrXAY3ISgg=
 =4yr2
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull more misc vfs updates from Christian Brauner:
 "Features:

   - Optimize close_range() from O(range size) to O(active FDs) by using
     find_next_bit() on the open_fds bitmap instead of linearly scanning
     the entire requested range. This is a significant improvement for
     large-range close operations on sparse file descriptor tables.

   - Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable
     via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only.
     Add tracepoints for fs-verity enable and verify operations,
     replacing the previously removed debug printk's.

   - Prevent nfsd from exporting special kernel filesystems like pidfs
     and nsfs. These filesystems have custom ->open() and ->permission()
     export methods that are designed for open_by_handle_at(2) only and
     are incompatible with nfsd. Update the exportfs documentation
     accordingly.

  Fixes:

   - Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used
     on a non-null-terminated decrypted directory entry name from
     fscrypt. This triggered on encrypted lower layers when the
     decrypted name buffer contained uninitialized tail data.

     The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and
     name_is_dot_dotdot() helpers, replacing various open-coded "." and
     ".." checks across the tree.

   - Fix read-only fsflags not being reset together with xflags in
     vfs_fileattr_set(). Currently harmless since no read-only xflags
     overlap with flags, but this would cause inconsistencies for any
     future shared read-only flag

   - Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the
     target process is in a different pid namespace. This lets userspace
     distinguish "process exited" from "process in another namespace",
     matching glibc's pidfd_getpid() behavior

  Cleanups:

   - Use C-string literals in the Rust seq_file bindings, replacing the
     kernel::c_str!() macro (available since Rust 1.77)

   - Fix typo in d_walk_ret enum comment, add porting notes for the
     readlink_copy() calling convention change"

* tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add porting notes about readlink_copy()
  pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns
  nfsd: do not allow exporting of special kernel filesystems
  exportfs: clarify the documentation of open()/permission() expotrfs ops
  fsverity: add tracepoints
  fs: add FS_XFLAG_VERITY for fs-verity files
  rust: seq_file: replace `kernel::c_str!` with C-Strings
  fs: dcache: fix typo in enum d_walk_ret comment
  ovl: use name_is_dot* helpers in readdir code
  fs: add helpers name_is_dot{,dot,_dotdot}
  ovl: Fix uninit-value in ovl_fill_real
  fs: reset read-only fsflags together with xflags
  fs/file: optimize close_range() complexity from O(N) to O(Sparse)
2026-02-16 13:00:36 -08:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
85f24b0ace hardening updates for v7.0-rc1
- Various missed __counted_by annotations (Thorsten Blum)
 
 - Various missed -Wflex-array-member-not-at-end fixes (Gustavo A. R. Silva)
 
 - Avoid leftover tempfiles for interrupted compile-time FORTIFY tests
   (Nicolas Schier)
 
 - Remove non-existant CONFIG_UBSAN_REPORT_FULL from docs (Stefan Wiehler)
 
 - fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
   (David Laight)
 
 - Add __counted_by_ptr attribute, tests, and first user (Bill Wendling,
   Kees Cook)
 
 - Update MAINTAINERS file to make hardening section not include pstore
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaYopPQAKCRA2KwveOeQk
 u5QDAQDj9ygsbzLj9EtXtU3T03GTvix4Rx7RkaBAMPSDEJGhNQD+M3dP6Z2ogEMz
 1Km6dAC7nTEujsVFur9BVpyEgoBjjQM=
 =G1gb
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "Mostly small cleanups and various scattered annotations and flex array
  warning fixes that we reviewed by unlanded in other trees. Introduces
  new annotation for expanding counted_by to pointer members, now that
  compiler behavior between GCC and Clang has been normalized.

   - Various missed __counted_by annotations (Thorsten Blum)

   - Various missed -Wflex-array-member-not-at-end fixes (Gustavo A. R.
     Silva)

   - Avoid leftover tempfiles for interrupted compile-time FORTIFY tests
     (Nicolas Schier)

   - Remove non-existant CONFIG_UBSAN_REPORT_FULL from docs (Stefan
     Wiehler)

   - fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
     (David Laight)

   - Add __counted_by_ptr attribute, tests, and first user (Bill
     Wendling, Kees Cook)

   - Update MAINTAINERS file to make hardening section not include
     pstore"

* tag 'hardening-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  MAINTAINERS: pstore: Remove L: entry
  nfp: tls: Avoid -Wflex-array-member-not-at-end warnings
  carl9170: Avoid -Wflex-array-member-not-at-end warning
  coredump: Use __counted_by_ptr for struct core_name::corename
  lkdtm/bugs: Add __counted_by_ptr() test PTR_BOUNDS
  compiler_types.h: Attributes: Add __counted_by_ptr macro
  fortify: Cleanup temp file also on non-successful exit
  fortify: Rename temporary file to match ignore pattern
  fortify: Use C arithmetic not FIELD_xxx() in FORTIFY_REASON defines
  ecryptfs: Annotate struct ecryptfs_message with __counted_by
  fs/xattr: Annotate struct simple_xattr with __counted_by
  crypto: af_alg - Annotate struct af_alg_iv with __counted_by
  Kconfig.ubsan: Remove CONFIG_UBSAN_REPORT_FULL from documentation
  drm/nouveau: fifo: Avoid -Wflex-array-member-not-at-end warning
2026-02-10 08:54:13 -08:00
Amir Goldstein
55fb177d3a
fs: add helpers name_is_dot{,dot,_dotdot}
Rename the helper is_dot_dotdot() into the name_ namespace
and add complementary helpers to check for dot and dotdot
names individually.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-3-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-29 10:06:59 +01:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Thorsten Blum
cc34c669ab ecryptfs: Annotate struct ecryptfs_message with __counted_by
Add the __counted_by() compiler attribute to the flexible array member
'data' to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Tyler Hicks <code@tyhicks.com>
Link: https://patch.msgid.link/20260112115314.739612-2-thorsten.blum@linux.dev
Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-14 14:43:19 -08:00
Thorsten Blum
99853d9dae ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename
Use kmemdup_nul() to copy 'name' instead of using memcpy() followed by a
manual NUL termination.  Remove the local return variable and the goto
label to simplify the code.  No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Tyler Hicks <code@tyhicks.com>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-01-12 20:21:27 -06:00
Tyler Hicks
5c56afd204
ecryptfs: Release lower parent dentry after creating dir
Fix a mkdir-induced usage count imbalance that tripped a umount_check()
BUG while unmounting the lower filesystem. Commit f046fbb4d8
("ecryptfs: use new start_creating/start_removing APIs") added a new
dget() of the lower parent dir, in ecryptfs_mkdir(), but did not dput()
the dentry before returning from that function.

The BUG output as seen while running the eCryptfs test suite:

$ ./run_tests.sh -b 131072 -c safe,destructive -f ext4 -K -t lp-926292.sh
...
Running eCryptfs filesystem tests on ext4
lp-926292
------------[ cut here ]------------
BUG: Dentry ffff8e6692d11988{i=c,n=ECRYPTFS_FNEK_ENCRYPTED.FXZuRGZL7QAFtER.JeA46DtdKqkkQx9H2Vpmv234J5CU8YSsrUwZJK4AbXbrN5WkZ348wnqstovKKxA-}  still in use (1) [unmount of ext4 loop0]
WARNING: CPU: 7 PID: 950 at fs/dcache.c:1590 umount_check+0x5e/0x80
Modules linked in: md5 libmd5 ecryptfs encrypted_keys ext4 crc16 mbcache jbd2
CPU: 7 UID: 0 PID: 950 Comm: umount Not tainted 6.18.0-rc1-00013-gf046fbb4d81d #17 PREEMPT(full)
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: 0010:umount_check+0x5e/0x80
Code: 88 38 06 00 00 48 8b 40 28 4c 8b 08 48 8b 46 68 48 85 c0 74 04 48 8b 50 38 51 48 c7 c7 60 32 9c b5 48 89 f1 e8 43 5e ca ff 90 <0f> 0b 90 90 58 31 c0 e9 46 9d 6c 00 41 83 f8 01 75 b8 eb a3 66 66
RSP: 0018:ffffa19940c4bdd0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8e6692fad4c0 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffffa19940c4bc70 RDI: 00000000ffffffff
RBP: ffffffffb4eb5930 R08: 00000000ffffdfff R09: 0000000000000001
R10: 00000000ffffdfff R11: ffffffffb5c8a9e0 R12: ffff8e6692fad4c0
R13: ffff8e6692fad4c0 R14: ffff8e6692d11a40 R15: ffff8e6692d11988
FS:  00007f6b4b491800(0000) GS:ffff8e670506e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b4b5f8d40 CR3: 0000000114eb7001 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
 <TASK>
 d_walk+0xfd/0x370
 shrink_dcache_for_umount+0x4d/0x140
 generic_shutdown_super+0x20/0x160
 kill_block_super+0x1a/0x40
 ext4_kill_sb+0x22/0x40 [ext4]
 deactivate_locked_super+0x33/0xa0
 cleanup_mnt+0xba/0x150
 task_work_run+0x5c/0xa0
 exit_to_user_mode_loop+0xac/0xb0
 do_syscall_64+0x2ab/0xfa0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6b4b6c2a2b
Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 b9 83 0d 00 f7 d8
RSP: 002b:00007ffcd5b8b498 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055b84af0b9e0 RCX: 00007f6b4b6c2a2b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055b84af0bdf0
RBP: 00007ffcd5b8b570 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000103 R11: 0000000000000246 R12: 000055b84af0bae0
R13: 0000000000000000 R14: 000055b84af0bdf0 R15: 0000000000000000
 </TASK>
---[ end trace 0000000000000000 ]---
EXT4-fs (loop0): unmounting filesystem 00d9ea41-f61e-43d0-a449-6be03e7e8428.
EXT4-fs (loop0): sb orphan head is 12
sb_info orphan list:
  inode loop0:12 at ffff8e66950e1df0: mode 40700, nlink 0, next 0
Assertion failure in ext4_put_super() at fs/ext4/super.c:1345: 'list_empty(&sbi->s_orphan)'

Fixes: f046fbb4d8 ("ecryptfs: use new start_creating/start_removing APIs")
Signed-off-by: Tyler Hicks <code@tyhicks.com>
Link: https://patch.msgid.link/20251223194153.2818445-3-code@tyhicks.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:58:04 +01:00
Tyler Hicks
5f9ad16bcc
ecryptfs: Fix improper mknod pairing of start_creating()/end_removing()
The ecryptfs_start_creating_dentry() function must be paired with the
end_creating() function. Fix ecryptfs_mknod() so that end_creating() is
properly called in the return path, instead of end_removing().

Fixes: f046fbb4d8 ("ecryptfs: use new start_creating/start_removing APIs")
Signed-off-by: Tyler Hicks <code@tyhicks.com>
Link: https://patch.msgid.link/20251223194153.2818445-2-code@tyhicks.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-24 13:58:04 +01:00
Thorsten Blum
6ba6733313 ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex
ecryptfs_to_hex() already NUL-terminates the destination buffers. Drop
the manual NUL terminations.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Thorsten Blum
e8fb5ec893 ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context
Use strscpy() to copy the NUL-terminated '->global_default_cipher_name'
to the destination buffer instead of using memcpy() followed by a manual
NUL termination. Remove the now-unused local variable 'cipher_name_len'.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Thorsten Blum
0529a80409 ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options
strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. Replace
it with the safer strscpy().

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1]
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Thorsten Blum
c82f77a4ac ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string
strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. Since
the parameter 'char *str' is just a pointer with no size information,
extend the function with a 'size' parameter to pass the destination
buffer's size as an additional argument. Adjust the call sites
accordingly.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1]
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Thorsten Blum
3bdc6cace2 ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals
strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. Replace
it with the safer strscpy().

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1]
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Baolin Liu
5c31c9bf9e ecryptfs: simplify list initialization in ecryptfs_parse_packet_set()
In ecryptfs_parse_packet_set(),use LIST_HEAD() to declare and
initialize the 'auth_tok_list' list in one step instead of
using INIT_LIST_HEAD() separately.

No functional change.

Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Zhang Zekun
111625ba8a ecryptfs: Remove unused declartion ecryptfs_fill_zeros()
The definition of ecryptfs_fill_zeros() has been removed since
commit b6c1d8fcba ("eCryptfs: remove unused functions and kmem_cache")
So, Remove the empty declartion in header files.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Thorsten Blum
ec25c4cf2d ecryptfs: Fix packet format comment in parse_tag_67_packet()
s/TAG 65/TAG 67/

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:23 -06:00
Zipeng Zhang
9383d8205c ecryptfs: comment typo fix
Comment typo fix "vitual" -> "virtual".

Signed-off-by: Zipeng Zhang <zhangzipeng0@foxmail.com>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:23:20 -06:00
Slark Xiao
0f9b0076ff ecryptfs: keystore: Fix typo 'the the' in comment
Replace 'the the' with 'the' in the comment.

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
2025-12-23 15:22:59 -06:00
Linus Torvalds
a8058f8442 vfs-6.19-rc1.directory.locking
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZwAKCRCRxhvAZXjc
 op9tAQCJ//STOkvYHfqgsdRD+cW9MRg/gPzfVZgnV1FTyf8sMgEA0IsY5zCZB9eh
 9FdD0E57P8PlWRwWZ+LktnWBzRAUqwI=
 =MOVR
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull directory locking updates from Christian Brauner:
 "This contains the work to add centralized APIs for directory locking
  operations.

  This series is part of a larger effort to change directory operation
  locking to allow multiple concurrent operations in a directory. The
  ultimate goal is to lock the target dentry(s) rather than the whole
  parent directory.

  To help with changing the locking protocol, this series centralizes
  locking and lookup in new helper functions. The helpers establish a
  pattern where it is the dentry that is being locked and unlocked
  (currently the lock is held on dentry->d_parent->d_inode, but that can
  change in the future).

  This also changes vfs_mkdir() to unlock the parent on failure, as well
  as dput()ing the dentry. This allows end_creating() to only require
  the target dentry (which may be IS_ERR() after vfs_mkdir()), not the
  parent"

* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nfsd: fix end_creating() conversion
  VFS: introduce end_creating_keep()
  VFS: change vfs_mkdir() to unlock on failure.
  ecryptfs: use new start_creating/start_removing APIs
  Add start_renaming_two_dentries()
  VFS/ovl/smb: introduce start_renaming_dentry()
  VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
  VFS: add start_creating_killable() and start_removing_killable()
  VFS: introduce start_removing_dentry()
  smb/server: use end_removing_noperm for for target of smb2_create_link()
  VFS: introduce start_creating_noperm() and start_removing_noperm()
  VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
  VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
  VFS: tidy up do_unlinkat()
  VFS: introduce start_dirop() and end_dirop()
  debugfs: rename end_creating() to debugfs_end_creating()
2025-12-01 16:13:46 -08:00
Linus Torvalds
db74a7d02a vfs-6.19-rc1.directory.delegations
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
 ooiEAPwNZfkqiSs6G1B2EmjFpMrA2BDqskaOsnN2sywra0sNewD9EQxJwlYXUn+z
 nNUIAvmegJGg2OiU2UaNGwxMR3lR3w8=
 =YELr
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull directory delegations update from Christian Brauner:
 "This contains the work for recall-only directory delegations for
  knfsd.

  Add support for simple, recallable-only directory delegations. This
  was decided at the fall NFS Bakeathon where the NFS client and server
  maintainers discussed how to merge directory delegation support.

  The approach starts with recallable-only delegations for several reasons:

   1. RFC8881 has gaps that are being addressed in RFC8881bis. In
      particular, it requires directory position information for
      CB_NOTIFY callbacks, which is difficult to implement properly
      under Linux. The spec is being extended to allow that information
      to be omitted.

   2. Client-side support for CB_NOTIFY still lags. The client side
      involves heuristics about when to request a delegation.

   3. Early indication shows simple, recallable-only delegations can
      help performance. Anna Schumaker mentioned seeing a multi-minute
      speedup in xfstests runs with them enabled.

  With these changes, userspace can also request a read lease on a
  directory that will be recalled on conflicting accesses. This may be
  useful for applications like Samba. Users can disable leases
  altogether via the fs.leases-enable sysctl if needed.

  VFS changes:

   - Dedicated Type for Delegations

     Introduce struct delegated_inode to track inodes that may have
     delegations that need to be broken. This replaces the previous
     approach of passing raw inode pointers through the delegation
     breaking code paths, providing better type safety and clearer
     semantics for the delegation machinery.

   - Break parent directory delegations in open(..., O_CREAT) codepath

   - Allow mkdir to wait for delegation break on parent

   - Allow rmdir to wait for delegation break on parent

   - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
     and vfs_unlink()

   - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
     on parent directory

   - Clean up argument list for vfs_create()

   - Expose delegation support to userland

  Filelock changes:

   - Make lease_alloc() take a flags argument

   - Rework the __break_lease API to use flags

   - Add struct delegated_inode

   - Push the S_ISREG check down to ->setlease handlers

   - Lift the ban on directory leases in generic_setlease

  NFSD changes:

   - Allow filecache to hold S_IFDIR files

   - Allow DELEGRETURN on directories

   - Wire up GET_DIR_DELEGATION handling

  Fixes:

   - Fix kernel-doc warnings in __fcntl_getlease

   - Add needed headers for new struct delegation definition"

* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: add needed headers for new struct delegation definition
  filelock: __fcntl_getlease: fix kernel-doc warnings
  vfs: expose delegation support to userland
  nfsd: wire up GET_DIR_DELEGATION handling
  nfsd: allow DELEGRETURN on directories
  nfsd: allow filecache to hold S_IFDIR files
  filelock: lift the ban on directory leases in generic_setlease
  vfs: make vfs_symlink break delegations on parent dir
  vfs: make vfs_mknod break delegations on parent directory
  vfs: make vfs_create break delegations on parent directory
  vfs: clean up argument list for vfs_create()
  vfs: break parent dir delegations in open(..., O_CREAT) codepath
  vfs: allow rmdir to wait for delegation break on parent
  vfs: allow mkdir to wait for delegation break on parent
  vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
  filelock: push the S_ISREG check down to ->setlease handlers
  filelock: add struct delegated_inode
  filelock: rework the __break_lease API to use flags
  filelock: make lease_alloc() take a flags argument
2025-12-01 15:34:41 -08:00
Linus Torvalds
9368f0f941 vfs-6.19-rc1.inode
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc
 omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC
 RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM=
 =zo/J
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode updates from Christian Brauner:
 "Features:

   - Hide inode->i_state behind accessors. Open-coded accesses prevent
     asserting they are done correctly. One obvious aspect is locking,
     but significantly more can be checked. For example it can be
     detected when the code is clearing flags which are already missing,
     or is setting flags when it is illegal (e.g., I_FREEING when
     ->i_count > 0)

   - Provide accessors for ->i_state, converts all filesystems using
     coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
     overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
     compile

   - Rework I_NEW handling to operate without fences, simplifying the
     code after the accessor infrastructure is in place

  Cleanups:

   - Move wait_on_inode() from writeback.h to fs.h

   - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
     for clarity

   - Cosmetic fixes to LRU handling

   - Push list presence check into inode_io_list_del()

   - Touch up predicts in __d_lookup_rcu()

   - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage

   - Assert on ->i_count in iput_final()

   - Assert ->i_lock held in __iget()

  Fixes:

   - Add missing fences to I_NEW handling"

* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  dcache: touch up predicts in __d_lookup_rcu()
  fs: push list presence check into inode_io_list_del()
  fs: cosmetic fixes to lru handling
  fs: rework I_NEW handling to operate without fences
  fs: make plain ->i_state access fail to compile
  xfs: use the new ->i_state accessors
  nilfs2: use the new ->i_state accessors
  overlayfs: use the new ->i_state accessors
  gfs2: use the new ->i_state accessors
  f2fs: use the new ->i_state accessors
  smb: use the new ->i_state accessors
  ceph: use the new ->i_state accessors
  btrfs: use the new ->i_state accessors
  Manual conversion to use ->i_state accessors of all places not covered by coccinelle
  Coccinelle-based conversion to use ->i_state accessors
  fs: provide accessors for ->i_state
  fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
  fs: move wait_on_inode() from writeback.h to fs.h
  fs: add missing fences to I_NEW handling
  ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
  ...
2025-12-01 09:02:34 -08:00
NeilBrown
fe497f0759
VFS: change vfs_mkdir() to unlock on failure.
vfs_mkdir() already drops the reference to the dentry on failure but it
leaves the parent locked.
This complicates end_creating() which needs to unlock the parent even
though the dentry is no longer available.

If we change vfs_mkdir() to unlock on failure as well as releasing the
dentry, we can remove the "parent" arg from end_creating() and simplify
the rules for calling it.

Note that cachefiles_get_directory() can choose to substitute an error
instead of actually calling vfs_mkdir(), for fault injection.  In that
case it needs to call end_creating(), just as vfs_mkdir() now does on
error.

ovl_create_real() will now unlock on error.  So the conditional
end_creating() after the call is removed, and end_creating() is called
internally on error.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-15-neilb@ownmail.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14 13:15:58 +01:00
NeilBrown
f046fbb4d8
ecryptfs: use new start_creating/start_removing APIs
This requires the addition of start_creating_dentry() which is given the
dentry which has already been found, and asks for it to be locked and
its parent validated.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-14-neilb@ownmail.net
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14 13:15:58 +01:00
Jeff Layton
92bf53577f
vfs: make vfs_symlink break delegations on parent dir
In order to add directory delegation support, we must break delegations
on the parent on any change to the directory.

Add a delegated_inode parameter to vfs_symlink() and have it break the
delegation. do_symlinkat() can then wait on the delegation break before
proceeding.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-12-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:36 +01:00
Jeff Layton
e8960c1b2e
vfs: make vfs_mknod break delegations on parent directory
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a new delegated_inode pointer to vfs_mknod() and have the
appropriate callers wait when there is an outstanding delegation. All
other callers just set the pointer to NULL.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-11-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:36 +01:00
Jeff Layton
c826229c6a
vfs: make vfs_create break delegations on parent directory
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a delegated_inode parameter to vfs_create. Most callers are
converted to pass in NULL, but do_mknodat() is changed to wait for a
delegation break if there is one.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-10-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:36 +01:00
Jeff Layton
85bbffcad7
vfs: clean up argument list for vfs_create()
As Neil points out:

"I would be in favour of dropping the "dir" arg because it is always
d_inode(dentry->d_parent) which is stable."

...and...

"Also *every* caller of vfs_create() passes ".excl = true".  So maybe we
don't need that arg at all."

Drop both arguments from vfs_create() and fix up the callers.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-9-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:36 +01:00
Jeff Layton
4fa76319cd
vfs: allow rmdir to wait for delegation break on parent
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a delegated_inode struct to vfs_rmdir() and populate that
pointer with the parent inode if it's non-NULL. Most existing in-kernel
callers pass in a NULL pointer.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-7-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:35 +01:00
Jeff Layton
e12d203b8c
vfs: allow mkdir to wait for delegation break on parent
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a new delegated_inode parameter to vfs_mkdir. All of the existing
callers set that to NULL for now, except for do_mkdirat which will
properly block until the lease is gone.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-6-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:35 +01:00
Eric Biggers
0bbb838f38
ecryptfs: Use MD5 library instead of crypto_shash
eCryptfs uses MD5 for a couple unusual purposes: to "mix" the key into
the IVs for file contents encryption (similar to ESSIV), and to prepend
some key-dependent bytes to the plaintext when encrypting filenames
(which is useless since eCryptfs encrypts the filenames with ECB).

Currently, eCryptfs computes these MD5 hashes using the crypto_shash
API.  Update it to instead use the MD5 library API.  This is simpler and
faster: the library doesn't require memory allocations, can't fail, and
provides direct access to MD5 without overhead such as indirect calls.

To preserve the existing behavior of eCryptfs support being disabled
when the kernel is booted with "fips=1", make ecryptfs_get_tree() check
fips_enabled itself.  Previously it relied on crypto_alloc_shash("md5")
failing.  I don't know for sure that this is actually needed; e.g., it
could be argued that eCryptfs's use of MD5 isn't for a security purpose
as far as FIPS is concerned.  But this preserves the existing behavior.

Tested by verifying that an existing eCryptfs can still be mounted with
a kernel that has this commit, with all the files matching.  Also tested
creating a filesystem with this commit and mounting+reading it without.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20251011200010.193140-1-ebiggers@kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-31 10:12:35 +01:00
Mateusz Guzik
b4dbfd8653
Coccinelle-based conversion to use ->i_state accessors
All places were patched by coccinelle with the default expecting that
->i_lock is held, afterwards entries got fixed up by hand to use
unlocked variants as needed.

The script:
@@
expression inode, flags;
@@

- inode->i_state & flags
+ inode_state_read(inode) & flags

@@
expression inode, flags;
@@

- inode->i_state &= ~flags
+ inode_state_clear(inode, flags)

@@
expression inode, flag1, flag2;
@@

- inode->i_state &= ~flag1 & ~flag2
+ inode_state_clear(inode, flag1 | flag2)

@@
expression inode, flags;
@@

- inode->i_state |= flags
+ inode_state_set(inode, flags)

@@
expression inode, flags;
@@

- inode->i_state = flags
+ inode_state_assign(inode, flags)

@@
expression inode, flags;
@@

- flags = inode->i_state
+ flags = inode_state_read(inode)

@@
expression inode, flags;
@@

- READ_ONCE(inode->i_state) & flags
+ inode_state_read(inode) & flags

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20 20:22:26 +02:00
Linus Torvalds
e64aeecbbb mount-related stuff for this cycle
* saner handling of guards in fs/namespace.c, getting
 rid of needlessly strong locking in some of the users.
 
 	* lock_mount() calling conventions change - have it set
 the environment for attaching to given location, storing the
 results in caller-supplied object, without altering the passed
 struct path.  Make unlock_mount() called as __cleanup for those
 objects.  It's not exactly guard(), but similar to it.
 
 	* MNT_WRITE_HOLD done right - mnt_hold_writers() does *not*
 mess with ->mnt_flags anymore, so insertion of a new mount into
 ->s_mounts of underlying superblock does not, in itself, expose
 ->mnt_flags of that mount to concurrent modifications.
 
 	* getting rid of pathological cases when umount() spends
 quadratic time removing the victims from propagation graph -
 part of that had been dealt with last cycle, this should finish
 it.
 
 	* a bunch of stuff constified.
 
 	* assorted cleanups.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaNhzLAAKCRBZ7Krx/gZQ
 63/IAP4yxJ6e3Pt66Uw0MeuSNmeLsQwb7mYo72lsYHpxjYANZAEAspMaLDU9NHxM
 Dy6WDVoJnf7+aDlD6E443YMfPX8XRQM=
 =5T+t
 -----END PGP SIGNATURE-----

Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:
 "Several piles this cycle, this mount-related one being the largest and
  trickiest:

   - saner handling of guards in fs/namespace.c, getting rid of
     needlessly strong locking in some of the users

   - lock_mount() calling conventions change - have it set the
     environment for attaching to given location, storing the results in
     caller-supplied object, without altering the passed struct path.

     Make unlock_mount() called as __cleanup for those objects. It's not
     exactly guard(), but similar to it

   - MNT_WRITE_HOLD done right.

     mnt_hold_writers() does *not* mess with ->mnt_flags anymore, so
     insertion of a new mount into ->s_mounts of underlying superblock
     does not, in itself, expose ->mnt_flags of that mount to concurrent
     modifications

   - getting rid of pathological cases when umount() spends quadratic
     time removing the victims from propagation graph - part of that had
     been dealt with last cycle, this should finish it

   - a bunch of stuff constified

   - assorted cleanups

* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  constify {__,}mnt_is_readonly()
  WRITE_HOLD machinery: no need for to bump mount_lock seqcount
  struct mount: relocate MNT_WRITE_HOLD bit
  preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  setup_mnt(): primitive for connecting a mount to filesystem
  simplify the callers of mnt_unhold_writers()
  copy_mnt_ns(): use guards
  copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  open_detached_copy(): separate creation of namespace into helper
  open_detached_copy(): don't bother with mount_lock_hash()
  path_has_submounts(): use guard(mount_locked_reader)
  fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
  ecryptfs: get rid of pointless mount references in ecryptfs dentries
  umount_tree(): take all victims out of propagation graph at once
  do_mount(): use __free(path_put)
  do_move_mount_old(): use __free(path_put)
  constify can_move_mount_beneath() arguments
  path_umount(): constify struct path argument
  may_copy_tree(), __do_loopback(): constify struct path argument
  path_mount(): constify struct path argument
  ...
2025-10-03 10:19:44 -07:00
NeilBrown
d7fb2c4102
VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
A rename operation can only rename within a single mount.  Callers of
vfs_rename() must and do ensure this is the case.

So there is no point in having two mnt_idmaps in renamedata as they are
always the same.  Only one of them is passed to ->rename in any case.

This patch replaces both with a single "mnt_idmap" and changes all
callers.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-23 12:37:35 +02:00
Al Viro
fc812c40f5 ecryptfs: get rid of pointless mount references in ecryptfs dentries
->lower_path.mnt has the same value for all dentries on given ecryptfs
instance and if somebody goes for mountpoint-crossing variant where that
would not be true, we can deal with that when it happens (and _not_
with duplicating these reference into each dentry).

As it is, we are better off just sticking a reference into ecryptfs-private
part of superblock and keeping it pinned until ->kill_sb().

That way we can stick a reference to underlying dentry right into ->d_fsdata
of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info,
along with the entire struct ecryptfs_dentry_info machinery.

[kudos to Dan Carpenter for spotting a bug in ecryptfs_get_tree() part]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15 21:26:44 -04:00
Linus Torvalds
57fcb7d930 vfs-6.17-rc1.fileattr
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCpgAKCRCRxhvAZXjc
 oqfFAQDcy3rROUF3W34KcSi7rDmaKVSX53d1tUoqH+1zDRpSlwEAriKDNC1ybudp
 YAnxVzkRHjHs1296WIuwKq5lfhJ60Q4=
 =geAl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new
  inodes in these directories inherit project ID set on parent
  directory.

  The project is created from userspace by opening and calling
  FS_IOC_FSSETXATTR on each inode. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with empty project ID. Those inodes then are not shown in the quota
  accounting but still exist in the directory. This is not critical but
  in the case when special files are created in the directory with
  already existing project quota, these new inodes inherit extended
  attributes. This creates a mix of special files with and without
  attributes. Moreover, special files with attributes don't have a
  possibility to become clear or change the attributes. This, in turn,
  prevents userspace from re-creating quota project on these existing
  files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-28 15:24:14 -07:00
Linus Torvalds
7031769e10 vfs-6.17-rc1.mmap_prepare
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
 os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
 pcYE40CacaekD8rFWwYUyszpgmv6ewc=
 =wCwp
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull mmap_prepare updates from Christian Brauner:
 "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b ("mm:
  introduce new .mmap_prepare() file callback").

  This is preferred to the existing f_op->mmap() hook as it does require
  a VMA to be established yet, thus allowing the mmap logic to invoke
  this hook far, far earlier, prior to inserting a VMA into the virtual
  address space, or performing any other heavy handed operations.

  This allows for much simpler unwinding on error, and for there to be a
  single attempt at merging a VMA rather than having to possibly
  reattempt a merge based on potentially altered VMA state.

  Far more importantly, it prevents inappropriate manipulation of
  incompletely initialised VMA state, which is something that has been
  the cause of bugs and complexity in the past.

  The intent is to gradually deprecate f_op->mmap, and in that vein this
  series coverts the majority of file systems to using f_op->mmap_prepare.

  Prerequisite steps are taken - firstly ensuring all checks for mmap
  capabilities use the file_has_valid_mmap_hooks() helper rather than
  directly checking for f_op->mmap (which is now not a valid check) and
  secondly updating daxdev_mapping_supported() to not require a VMA
  parameter to allow ext4 and xfs to be converted.

  Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
  nested file systems") handles the nasty edge-case of nested file
  systems like overlayfs, which introduces a compatibility shim to allow
  f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.

  This allows for nested filesystems to continue to function correctly
  with all file systems regardless of which callback is used. Once we
  finally convert all file systems, this shim can be removed.

  As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
  can nest all other file systems.

  We additionally do not update resctl - as this requires an update to
  remap_pfn_range() (or an alternative to it) which we defer to a later
  series, equally we do not update cramfs which needs a mixed mapping
  insertion with the same issue, nor do we update procfs, hugetlbfs,
  syfs or kernfs all of which require VMAs for internal state and hooks.
  We shall return to all of these later"

* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: update porting, vfs documentation to describe mmap_prepare()
  fs: replace mmap hook with .mmap_prepare for simple mappings
  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
  fs/dax: make it possible to check dev dax support without a VMA
  fs: consistently use can_mmap_file() helper
  mm/nommu: use file_has_valid_mmap_hooks() helper
  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28 13:43:25 -07:00
Linus Torvalds
7879d7aff0 vfs-6.17-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM/KwAKCRCRxhvAZXjc
 opT+AP407JwhRSBjUEmHg5JzUyDoivkOySdnthunRjaBKD8rlgEApM6SOIZYucU7
 cPC3ZY6ORFM6Mwaw+iDW9lasM5ucHQ8=
 =CHha
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc VFS updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add ext4 IOCB_DONTCACHE support

     This refactors the address_space_operations write_begin() and
     write_end() callbacks to take const struct kiocb * as their first
     argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
     to the filesystem's buffered I/O path.

     Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
     and advertises support via the FOP_DONTCACHE file operation flag.

     Additionally, the i915 driver's shmem write paths are updated to
     bypass the legacy write_begin/write_end interface in favor of
     directly calling write_iter() with a constructed synchronous kiocb.
     Another i915 change replaces a manual write loop with
     kernel_write() during GEM shmem object creation.

  Cleanups:

   - don't duplicate vfs_open() in kernel_file_open()

   - proc_fd_getattr(): don't bother with S_ISDIR() check

   - fs/ecryptfs: replace snprintf with sysfs_emit in show function

   - vfs: Remove unnecessary list_for_each_entry_safe() from
     evict_inodes()

   - filelock: add new locks_wake_up_waiter() helper

   - fs: Remove three arguments from block_write_end()

   - VFS: change old_dir and new_dir in struct renamedata to dentrys

   - netfs: Remove unused declaration netfs_queue_write_request()

  Fixes:

   - eventpoll: Fix semi-unbounded recursion

   - eventpoll: fix sphinx documentation build warning

   - fs/read_write: Fix spelling typo

   - fs: annotate data race between poll_schedule_timeout() and
     pollwake()

   - fs/pipe: set FMODE_NOWAIT in create_pipe_files()

   - docs/vfs: update references to i_mutex to i_rwsem

   - fs/buffer: remove comment about hard sectorsize

   - fs/buffer: remove the min and max limit checks in __getblk_slow()

   - fs/libfs: don't assume blocksize <= PAGE_SIZE in
     generic_check_addressable

   - fs_context: fix parameter name in infofc() macro

   - fs: Prevent file descriptor table allocations exceeding INT_MAX"

* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
  netfs: Remove unused declaration netfs_queue_write_request()
  eventpoll: fix sphinx documentation build warning
  ext4: support uncached buffered I/O
  mm/pagemap: add write_begin_get_folio() helper function
  fs: change write_begin/write_end interface to take struct kiocb *
  drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
  drm/i915: Use kernel_write() in shmem object create
  eventpoll: Fix semi-unbounded recursion
  vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
  fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
  fs/buffer: remove the min and max limit checks in __getblk_slow()
  fs: Prevent file descriptor table allocations exceeding INT_MAX
  fs: Remove three arguments from block_write_end()
  fs/ecryptfs: replace snprintf with sysfs_emit in show function
  fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
  docs/vfs: update references to i_mutex to i_rwsem
  fs/buffer: remove comment about hard sectorsize
  fs_context: fix parameter name in infofc() macro
  VFS: change old_dir and new_dir in struct renamedata to dentrys
  proc_fd_getattr(): don't bother with S_ISDIR() check
  ...
2025-07-28 11:22:56 -07:00
Taotao Chen
e9d8e2bf23
fs: change write_begin/write_end interface to take struct kiocb *
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.

Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.

Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-16 14:48:18 +02:00
Christian Brauner
ca115d7e75
tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Ankit Chauhan
06a705356d
fs/ecryptfs: replace snprintf with sysfs_emit in show function
Use sysfs_emit() instead of snprintf() in version_show() function to
follow the preferred kernel API.

Signed-off-by: Ankit Chauhan <ankitchauhan2065@gmail.com>
Link: https://lore.kernel.org/20250619031536.19352-1-ankitchauhan2065@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 14:12:36 +02:00
Lorenzo Stoakes
b013ed4031
fs: consistently use can_mmap_file() helper
Since commit c84bf6dd2b ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().

Additionally, commit bb666b7c27 ("mm: add mmap_prepare() compatibility
layer for nested file systems") permits the use of the .mmap_prepare() hook
even in nested filesystems like overlayfs.

There are a number of places where we check only for f_op->mmap - this is
incorrect now mmap_prepare exists, so update all of these to use the
general helper can_mmap_file().

Most notably, this updates the elf logic to allow for the ability to
execute binaries on filesystems which have the .mmap_prepare hook, but
additionally we update nested filesystems.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/b68145b609532e62bab603dd9686faa6562046ec.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-17 13:47:22 +02:00
NeilBrown
bc9241367a
VFS: change old_dir and new_dir in struct renamedata to dentrys
all users of 'struct renamedata' have the dentry for the old and new
directories, and often have no use for the inode except to store it in
the renamedata.

This patch changes struct renamedata to hold the dentry, rather than
the inode, for the old and new directories, and changes callers to
match.  The names are also changed from a _dir suffix to _parent.  This
is consistent with other usage in namei.c and elsewhere.

This results in the removal of several local variables and several
dereferences of ->d_inode at the cost of adding ->d_inode dereferences
to vfs_rename().

Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/174977089072.608730.4244531834577097454@noble.neil.brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-16 16:30:45 +02:00
Al Viro
05fb0e6664 new helper: set_default_d_op()
... to be used instead of manually assigning to ->s_d_op.
All in-tree filesystem converted (and field itself is renamed,
so any out-of-tree ones in need of conversion will be caught
by compiler).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10 22:21:16 -04:00
NeilBrown
fa6fe07d15
VFS: rename lookup_one_len family to lookup_noperm and remove permission check
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
  virtual filesystem populating itself, or xfs accessing its ORPHANAGE
  or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
  been performed such as a network filesystem finding in "silly-rename"
  file in the same directory.  This is also the context after the
  _parentat() functions where currently lookup_one_qstr_excl() is used.

So the permission check is pointless.

The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed.  Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.

This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm".  They are changed to receive a 'struct qstr' instead
of separate name and len.  In a few cases the use of QSTR() results in a
new call to strlen().

try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr.  This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().

The new lookup_noperm_common() doesn't take a qstr yet.  That will be
tidied up in a subsequent patch.

Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-08 11:24:36 +02:00
Linus Torvalds
26d8e43079 vfs-6.15-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc
 onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q
 N8YtzXArHWt7u0YbcVpy9WK3F72BdwU=
 =VJgY
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs async dir updates from Christian Brauner:
 "This contains cleanups that fell out of the work from async directory
  handling:

   - Change kern_path_locked() and user_path_locked_at() to never return
     a negative dentry. This simplifies the usability of these helpers
     in various places

   - Drop d_exact_alias() from the remaining place in NFS where it is
     still used. This also allows us to drop the d_exact_alias() helper
     completely

   - Drop an unnecessary call to fh_update() from nfsd_create_locked()

   - Change i_op->mkdir() to return a struct dentry

     Change vfs_mkdir() to return a dentry provided by the filesystems
     which is hashed and positive. This allows us to reduce the number
     of cases where the resulting dentry is not positive to very few
     cases. The code in these places becomes simpler and easier to
     understand.

   - Repack DENTRY_* and LOOKUP_* flags"

* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: fix inline emphasis warning
  VFS: Change vfs_mkdir() to return the dentry.
  nfs: change mkdir inode_operation to return alternate dentry if needed.
  fuse: return correct dentry for ->mkdir
  ceph: return the correct dentry on mkdir
  hostfs: store inode in dentry after mkdir if possible.
  Change inode_operations.mkdir to return struct dentry *
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()
  VFS: add common error checks to lookup_one_qstr_excl()
  VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  VFS: repack LOOKUP_ bit flags.
  VFS: repack DENTRY_ flags.
2025-03-24 10:47:14 -07:00
NeilBrown
c54b386969
VFS: Change vfs_mkdir() to return the dentry.
vfs_mkdir() does not guarantee to leave the child dentry hashed or make
it positive on success, and in many such cases the filesystem had to use
a different dentry which it can now return.

This patch changes vfs_mkdir() to return the dentry provided by the
filesystems which is hashed and positive when provided.  This reduces
the number of cases where the resulting dentry is not positive to a
handful which don't deserve extra efforts.

The only callers of vfs_mkdir() which are interested in the resulting
inode are in-kernel filesystem clients: cachefiles, nfsd, smb/server.
The only filesystems that don't reliably provide the inode are:
- kernfs, tracefs which these clients are unlikely to be interested in
- cifs in some configurations would need to do a lookup to find the
  created inode, but doesn't.  cifs cannot be exported via NFS, is
  unlikely to be used by cachefiles, and smb/server only has a soft
  requirement for the inode, so this is unlikely to be a problem in
  practice.
- hostfs, nfs, cifs may need to do a lookup (rarely for NFS) and it is
  possible for a race to make that lookup fail.  Actual failure
  is unlikely and providing callers handle negative dentries graceful
  they will fail-safe.

So this patch removes the lookup code in nfsd and smb/server and adjusts
them to fail safe if a negative dentry is provided:
- cache-files already fails safe by restarting the task from the
  top - it still does with this change, though it no longer calls
  cachefiles_put_directory() as that will crash if the dentry is
  negative.
- nfsd reports "Server-fault" which it what it used to do if the lookup
  failed. This will never happen on any file-systems that it can actually
  export, so this is of no consequence.  I removed the fh_update()
  call as that is not needed and out-of-place.  A subsequent
  nfsd_create_setattr() call will call fh_update() when needed.
- smb/server only wants the inode to call ksmbd_smb_inherit_owner()
  which updates ->i_uid (without calling notify_change() or similar)
  which can be safely skipping on cifs (I hope).

If a different dentry is returned, the first one is put.  If necessary
the fact that it is new can be determined by comparing pointers.  A new
dentry will certainly have a new pointer (as the old is put after the
new is obtained).
Similarly if an error is returned (via ERR_PTR()) the original dentry is
put.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-7-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 11:52:50 +01:00