This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmmUaqQACgkQEVvVyTe/
1WrRVRAAil+jvxM26SR3lPp/EmGdlsvRwflin1wb6laey96Ezfos4KLBFkoz9cgZ
gW4aZBGx2MDBQy3DNlVbsQZgc4CAnQxrH5HLeR8y5zSbEI3WlcI1vQonzsh61/Cd
Ajnu0asivUTWnuoFicq7syT4TcDVUJFMNxjNCTBkRDWmQIkQZAA2Dy/0nWpur83s
QjAOLcCljxp88oqY1YE5Lbt7xyToL8vhxF2vN1cFrIZ6Ec/BN9x5LizAYGRjEQ6o
qN6o6dAZ4VDyLHms8YQ4KHuj+1psTEPyCNeC3tReek5VKYNS/I9WAwawD4QnmfSu
DF5CYkErFW3eVczFkaKl3K65CR6YkBFJhcNeEIViDh4TKSTtmdFURZPH+IVsB6KT
/Z+4LnNMxdN55tleYDMoxBBecurLQSyjVMDoCrMlRB5iWJL+/G5TO2ahDp+wGWCu
b9FS3DTa6XE/e7xRzDvitRfBX5N8xHNTn8keKRHe4Q7j0DRMYfdltbMz5wyhXfUW
YOZNINFpwwvn0Hq8NK86/lhctOi0qkBmqRbMIzAYwgYJI8Kdbe3wqcDeo2k7K4fC
J4O+puL263+vA85jqyXGmk+H4EK8Z3BNX+uFh8ucFtGi/rBGpLHBS1HiFEFBYRoW
IX6y2EGxgDHvrRbyfVG2EJMjd9XMflOjMYkurJ0r38Cl5Vnreqg=
=v9Yt
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs update from Amir Goldstein:
"Relax the semantics of uuid=off to cater to a use case of overlayfs
lower layers on btrfs clones, whose UUID are ephemeral and an upper
layer on a different filesystem"
* tag 'ovl-update-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: relax requirement for uuid=off,index=on
Please consider pulling these changes from the signed vfs-7.0-rc1.misc.2 tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaZMOCwAKCRCRxhvAZXjc
oswrAP9r1zjzMimjX2J0hBoMnYjNzQfLLew8+IRygImQ+yaqWgD9Fiw/cQ9eE1Hm
TMLqck/ky588ywSDaBzfztrXAY3ISgg=
=4yr2
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull more misc vfs updates from Christian Brauner:
"Features:
- Optimize close_range() from O(range size) to O(active FDs) by using
find_next_bit() on the open_fds bitmap instead of linearly scanning
the entire requested range. This is a significant improvement for
large-range close operations on sparse file descriptor tables.
- Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable
via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only.
Add tracepoints for fs-verity enable and verify operations,
replacing the previously removed debug printk's.
- Prevent nfsd from exporting special kernel filesystems like pidfs
and nsfs. These filesystems have custom ->open() and ->permission()
export methods that are designed for open_by_handle_at(2) only and
are incompatible with nfsd. Update the exportfs documentation
accordingly.
Fixes:
- Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used
on a non-null-terminated decrypted directory entry name from
fscrypt. This triggered on encrypted lower layers when the
decrypted name buffer contained uninitialized tail data.
The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and
name_is_dot_dotdot() helpers, replacing various open-coded "." and
".." checks across the tree.
- Fix read-only fsflags not being reset together with xflags in
vfs_fileattr_set(). Currently harmless since no read-only xflags
overlap with flags, but this would cause inconsistencies for any
future shared read-only flag
- Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the
target process is in a different pid namespace. This lets userspace
distinguish "process exited" from "process in another namespace",
matching glibc's pidfd_getpid() behavior
Cleanups:
- Use C-string literals in the Rust seq_file bindings, replacing the
kernel::c_str!() macro (available since Rust 1.77)
- Fix typo in d_walk_ret enum comment, add porting notes for the
readlink_copy() calling convention change"
* tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add porting notes about readlink_copy()
pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns
nfsd: do not allow exporting of special kernel filesystems
exportfs: clarify the documentation of open()/permission() expotrfs ops
fsverity: add tracepoints
fs: add FS_XFLAG_VERITY for fs-verity files
rust: seq_file: replace `kernel::c_str!` with C-Strings
fs: dcache: fix typo in enum d_walk_ret comment
ovl: use name_is_dot* helpers in readdir code
fs: add helpers name_is_dot{,dot,_dotdot}
ovl: Fix uninit-value in ovl_fill_real
fs: reset read-only fsflags together with xflags
fs/file: optimize close_range() complexity from O(N) to O(Sparse)
Total patches: 107
Reviews/patch: 1.07
Reviewed rate: 67%
- The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
suballocator free bg" from Heming Zhao saves disk space by teaching
ocfs2 to reclaim suballocator block group space.
- The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
various places.
- The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
VMCOREINFO_BYTES ever exceeds the page size.
- The 7 patch series "kallsyms: Prevent invalid access when showing
module buildid" from Petr Mladek cleans up kallsyms code related to
module buildid and fixes an invalid access crash when printing
backtraces.
- The 3 patch series "Address page fault in
ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
kexec-related crash that can occur when booting the second-stage kernel
on x86.
- The 6 patch series "kho: ABI headers and Documentation updates" from
Mike Rapoport updates the kexec handover ABI documentation.
- The 4 patch series "Align atomic storage" from Finn Thain adds the
__aligned attribute to atomic_t and atomic64_t definitions to get
natural alignment of both types on csky, m68k, microblaze, nios2,
openrisc and sh.
- The 2 patch series "kho: clean up page initialization logic" from
Pratyush Yadav simplifies the page initialization logic in
kho_restore_page().
- The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
several things out of kernel.h and into more appropriate places.
- The 7 patch series "don't abuse task_struct.group_leader" from Oleg
Nesterov removes the usage of ->group_leader when it is "obviously
unnecessary".
- The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
adds some infrastructure improvements to the live update orchestrator.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
=mmsH
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
disk space by teaching ocfs2 to reclaim suballocator block group
space (Heming Zhao)
- "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
ARRAY_END() macro and uses it in various places (Alejandro Colomar)
- "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
page size (Pnina Feder)
- "kallsyms: Prevent invalid access when showing module buildid" cleans
up kallsyms code related to module buildid and fixes an invalid
access crash when printing backtraces (Petr Mladek)
- "Address page fault in ima_restore_measurement_list()" fixes a
kexec-related crash that can occur when booting the second-stage
kernel on x86 (Harshit Mogalapalli)
- "kho: ABI headers and Documentation updates" updates the kexec
handover ABI documentation (Mike Rapoport)
- "Align atomic storage" adds the __aligned attribute to atomic_t and
atomic64_t definitions to get natural alignment of both types on
csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)
- "kho: clean up page initialization logic" simplifies the page
initialization logic in kho_restore_page() (Pratyush Yadav)
- "Unload linux/kernel.h" moves several things out of kernel.h and into
more appropriate places (Yury Norov)
- "don't abuse task_struct.group_leader" removes the usage of
->group_leader when it is "obviously unnecessary" (Oleg Nesterov)
- "list private v2 & luo flb" adds some infrastructure improvements to
the live update orchestrator (Pasha Tatashin)
* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
procfs: fix missing RCU protection when reading real_parent in do_task_stat()
watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
kho: fix doc for kho_restore_pages()
tests/liveupdate: add in-kernel liveupdate test
liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
liveupdate: luo_file: Use private list
list: add kunit test for private list primitives
list: add primitives for private list manipulations
delayacct: fix uapi timespec64 definition
panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
netclassid: use thread_group_leader(p) in update_classid_task()
RDMA/umem: don't abuse current->group_leader
drm/pan*: don't abuse current->group_leader
drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
drm/amdgpu: don't abuse current->group_leader
android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
android/binder: don't abuse current->group_leader
kho: skip memoryless NUMA nodes when reserving scratch areas
...
Please consider pulling these changes from the signed vfs-7.0-rc1.leases tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
olR/AP40iNOTRn7LosXbRWqGGZqzy9v64QYoLzk3QdsWuGmbRAD/egNQzof8mkAf
IscefWTOjY7xyDzmEBEBnfHftgMiEwM=
=zre0
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs lease updates from Christian Brauner:
"This contains updates for lease support to require filesystems to
explicitly opt-in to lease support
Currently kernel_setlease() falls through to generic_setlease() when a
a filesystem does not define ->setlease(), silently granting lease
support to every filesystem regardless of whether it is prepared for
it.
This is a poor default: most filesystems never intended to support
leases, and the silent fallthrough makes it impossible to distinguish
"supports leases" from "never thought about it".
This inverts the default. It adds explicit
.setlease = generic_setlease;
assignments to every in-tree filesystem that should retain lease
support, then changes kernel_setlease() to return -EINVAL when
->setlease is NULL.
With the new default in place, simple_nosetlease() is redundant and
is removed along with all references to it"
* tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
fuse: add setlease file operation
fs: remove simple_nosetlease()
filelock: default to returning -EINVAL when ->setlease operation is NULL
xfs: add setlease file operation
ufs: add setlease file operation
udf: add setlease file operation
tmpfs: add setlease file operation
squashfs: add setlease file operation
overlayfs: add setlease file operation
orangefs: add setlease file operation
ocfs2: add setlease file operation
ntfs3: add setlease file operation
nilfs2: add setlease file operation
jfs: add setlease file operation
jffs2: add setlease file operation
gfs2: add a setlease file operation
fat: add setlease file operation
f2fs: add setlease file operation
exfat: add setlease file operation
ext4: add setlease file operation
...
uuid=off,index=on required that all upper/lower directories are on the
same filesystem.
Relax the requirement so that only all the lower directories need to be
on the same filesystem.
Reported-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20260114-tonyk-get_disk_uuid-v1-3-e6a319e25d57@igalia.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Use the helpers in place of all the different open coded variants.
This makes the code more readable and robust.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-4-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Rename the helper is_dot_dotdot() into the name_ namespace
and add complementary helpers to check for dot and dotdot
names individually.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-3-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Syzbot reported a KMSAN uninit-value issue in ovl_fill_real.
This iusse's call chain is:
__do_sys_getdents64()
-> iterate_dir()
...
-> ext4_readdir()
-> fscrypt_fname_alloc_buffer() // alloc
-> fscrypt_fname_disk_to_usr // write without tail '\0'
-> dir_emit()
-> ovl_fill_real() // read by strcmp()
The string is used to store the decrypted directory entry name for an
encrypted inode. As shown in the call chain, fscrypt_fname_disk_to_usr()
write it without null-terminate. However, ovl_fill_real() uses strcmp() to
compare the name against "..", which assumes a null-terminated string and
may trigger a KMSAN uninit-value warning when the buffer tail contains
uninit data.
Reported-by: syzbot+d130f98b2c265fae5297@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d130f98b2c265fae5297
Fixes: 4edb83bb10 ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-2-amir73il@gmail.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.
Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.
This change has been build-tested with allmodconfig on most ARCHes. Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).
Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Currently file_update_time_flags unconditionally returns -EAGAIN if any
timestamp needs to be updated and IOCB_NOWAIT is passed. This makes
non-blocking direct writes impossible on file systems with granular
enough timestamps.
Pass IOCB_NOWAIT to ->update_time and return -EAGAIN if it could block.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-9-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Pass the type of update (atime vs c/mtime plus version) as an enum
instead of a set of flags that caused all kinds of confusion.
Because inode_update_timestamps now can't return a modified version
of those flags, return the I_DIRTY_* flags needed to persist the
update, which is what the main caller in generic_update_time wants
anyway, and which is suitable for the other callers that only want
to know if an update happened.
The whole update_time path keeps the flags argument, which will be used
to support non-blocking updates soon even if it is unused, and (the
slightly renamed) inode_update_time also gains the possibility to return
a negative errno to support this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-6-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add the setlease file_operation to ovl_file_operations and
ovl_dir_operations, pointing to generic_setlease. A future patch will
change the default behavior to reject lease attempts with -EINVAL when
there is no setlease file operation defined. Add generic_setlease to
retain the ability to set leases on this filesystem.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-17-ea4dec9b67fa@kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
When creating new files the security layer expects the original
credentials to be passed. When cleaning up the code this was accidently
changed to pass the mounter's credentials by relying on current->cred
which is already overriden at this point. Pass the original credentials
directly.
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: Paul Moore <paul@paul-moore.com>
Fixes: e566bff963 ("ovl: port ovl_create_or_link() to new ovl_override_creator_creds")
Link: https://lore.kernel.org/CAFqZXNvL1ciLXMhHrnoyBmQu1PAApH41LkSWEhrcvzAAbFij8Q@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Tested-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCaTBU0gAKCRDh3BK/laaZ
PD2xAP9BIorxA5cDSyFtAJtj909xe80ai8RNgmenLy4P4RvrvgD9EU7nIFKM6B5O
beaFjUaK7Q3z0oWGxkcfDtLV8CUS0Qc=
=m8wh
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add mechanism for cleaning out unused, stale dentries; controlled via
a module option (Luis Henriques)
- Fix various bugs
- Cleanups
* tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: Uninitialized variable in fuse_epoch_work()
fuse: fix io-uring list corruption for terminated non-committed requests
fuse: signal that a fuse inode should exhibit local fs behaviors
fuse: Always flush the page cache before FOPEN_DIRECT_IO write
fuse: Invalidate the page cache after FOPEN_DIRECT_IO write
fuse: rename 'namelen' to 'namesize'
fuse: use strscpy instead of strcpy
fuse: refactor fuse_conn_put() to remove negative logic.
fuse: new work queue to invalidate dentries from old epochs
fuse: new work queue to periodically invalidate expired dentries
dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()
fuse: add WARN_ON and comment for RCU revalidate
fuse: Fix whitespace for fuse_uring_args_to_ring() comment
fuse: missing copy_finish in fuse-over-io-uring argument copies
fuse: fix readahead reclaim deadlock
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZwAKCRCRxhvAZXjc
othbAP97FNSOMUMAXTUxocE7vMgq3B/MG54e22ZrYhnZeP8NsgEA3zo4GpPCeM0p
e8EjiLz0wUlveF68MZUg52eXT5/nTAE=
=tbY/
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs cred guard conversion from Christian Brauner:
"This converts all of overlayfs to use credential guards, eliminating
manual credential management throughout the filesystem.
Credential guard conversion:
- Convert all of overlayfs to use credential guards, replacing the
manual ovl_override_creds()/ovl_revert_creds() pattern with scoped
guards.
This makes credential handling visually explicit and eliminates a
class of potential bugs from mismatched override/revert calls.
(1) Basic credential guard (with_ovl_creds)
(2) Creator credential guard (ovl_override_creator_creds):
Introduced a specialized guard for file creation operations
that handles the two-phase credential override (mounter
credentials, then fs{g,u}id override). The new pattern is much
clearer:
with_ovl_creds(dentry->d_sb) {
scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) {
if (IS_ERR(cred))
return PTR_ERR(cred);
/* creation operations */
}
}
(3) Copy-up credential guard (ovl_cu_creds):
Introduced a specialized guard for copy-up operations,
simplifying the previous struct ovl_cu_creds helper and
associated functions.
Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this
pattern.
Cleanups:
- Remove ovl_revert_creds() after all callers converted to guards
- Remove struct ovl_cu_creds and associated functions
- Drop ovl_setup_cred_for_create() after conversion
- Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(),
ovl_rename() for cleaner credential guard scope
- Introduce struct ovl_renamedata to simplify rename handling
- Don't override credentials for ovl_check_whiteouts() (unnecessary)
- Remove unneeded semicolon"
* tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits)
ovl: remove unneeded semicolon
ovl: remove struct ovl_cu_creds and associated functions
ovl: port ovl_copy_up_tmpfile() to cred guard
ovl: mark *_cu_creds() as unused temporarily
ovl: port ovl_copy_up_workdir() to cred guard
ovl: add copy up credential guard
ovl: drop ovl_setup_cred_for_create()
ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard
ovl: mark ovl_setup_cred_for_create() as unused temporarily
ovl: reflow ovl_create_or_link()
ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard
ovl: add ovl_override_creator_creds cred guard
ovl: remove ovl_revert_creds()
ovl: port ovl_fill_super() to cred guard
ovl: refactor ovl_fill_super()
ovl: port ovl_lower_positive() to cred guard
ovl: port ovl_lookup() to cred guard
ovl: refactor ovl_lookup()
ovl: port ovl_copyfile() to cred guard
ovl: port ovl_rename() to cred guard
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZwAKCRCRxhvAZXjc
op9tAQCJ//STOkvYHfqgsdRD+cW9MRg/gPzfVZgnV1FTyf8sMgEA0IsY5zCZB9eh
9FdD0E57P8PlWRwWZ+LktnWBzRAUqwI=
=MOVR
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull directory locking updates from Christian Brauner:
"This contains the work to add centralized APIs for directory locking
operations.
This series is part of a larger effort to change directory operation
locking to allow multiple concurrent operations in a directory. The
ultimate goal is to lock the target dentry(s) rather than the whole
parent directory.
To help with changing the locking protocol, this series centralizes
locking and lookup in new helper functions. The helpers establish a
pattern where it is the dentry that is being locked and unlocked
(currently the lock is held on dentry->d_parent->d_inode, but that can
change in the future).
This also changes vfs_mkdir() to unlock the parent on failure, as well
as dput()ing the dentry. This allows end_creating() to only require
the target dentry (which may be IS_ERR() after vfs_mkdir()), not the
parent"
* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nfsd: fix end_creating() conversion
VFS: introduce end_creating_keep()
VFS: change vfs_mkdir() to unlock on failure.
ecryptfs: use new start_creating/start_removing APIs
Add start_renaming_two_dentries()
VFS/ovl/smb: introduce start_renaming_dentry()
VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
VFS: add start_creating_killable() and start_removing_killable()
VFS: introduce start_removing_dentry()
smb/server: use end_removing_noperm for for target of smb2_create_link()
VFS: introduce start_creating_noperm() and start_removing_noperm()
VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
VFS: tidy up do_unlinkat()
VFS: introduce start_dirop() and end_dirop()
debugfs: rename end_creating() to debugfs_end_creating()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
ooiEAPwNZfkqiSs6G1B2EmjFpMrA2BDqskaOsnN2sywra0sNewD9EQxJwlYXUn+z
nNUIAvmegJGg2OiU2UaNGwxMR3lR3w8=
=YELr
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull directory delegations update from Christian Brauner:
"This contains the work for recall-only directory delegations for
knfsd.
Add support for simple, recallable-only directory delegations. This
was decided at the fall NFS Bakeathon where the NFS client and server
maintainers discussed how to merge directory delegation support.
The approach starts with recallable-only delegations for several reasons:
1. RFC8881 has gaps that are being addressed in RFC8881bis. In
particular, it requires directory position information for
CB_NOTIFY callbacks, which is difficult to implement properly
under Linux. The spec is being extended to allow that information
to be omitted.
2. Client-side support for CB_NOTIFY still lags. The client side
involves heuristics about when to request a delegation.
3. Early indication shows simple, recallable-only delegations can
help performance. Anna Schumaker mentioned seeing a multi-minute
speedup in xfstests runs with them enabled.
With these changes, userspace can also request a read lease on a
directory that will be recalled on conflicting accesses. This may be
useful for applications like Samba. Users can disable leases
altogether via the fs.leases-enable sysctl if needed.
VFS changes:
- Dedicated Type for Delegations
Introduce struct delegated_inode to track inodes that may have
delegations that need to be broken. This replaces the previous
approach of passing raw inode pointers through the delegation
breaking code paths, providing better type safety and clearer
semantics for the delegation machinery.
- Break parent directory delegations in open(..., O_CREAT) codepath
- Allow mkdir to wait for delegation break on parent
- Allow rmdir to wait for delegation break on parent
- Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
and vfs_unlink()
- Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
on parent directory
- Clean up argument list for vfs_create()
- Expose delegation support to userland
Filelock changes:
- Make lease_alloc() take a flags argument
- Rework the __break_lease API to use flags
- Add struct delegated_inode
- Push the S_ISREG check down to ->setlease handlers
- Lift the ban on directory leases in generic_setlease
NFSD changes:
- Allow filecache to hold S_IFDIR files
- Allow DELEGRETURN on directories
- Wire up GET_DIR_DELEGATION handling
Fixes:
- Fix kernel-doc warnings in __fcntl_getlease
- Add needed headers for new struct delegation definition"
* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: add needed headers for new struct delegation definition
filelock: __fcntl_getlease: fix kernel-doc warnings
vfs: expose delegation support to userland
nfsd: wire up GET_DIR_DELEGATION handling
nfsd: allow DELEGRETURN on directories
nfsd: allow filecache to hold S_IFDIR files
filelock: lift the ban on directory leases in generic_setlease
vfs: make vfs_symlink break delegations on parent dir
vfs: make vfs_mknod break delegations on parent directory
vfs: make vfs_create break delegations on parent directory
vfs: clean up argument list for vfs_create()
vfs: break parent dir delegations in open(..., O_CREAT) codepath
vfs: allow rmdir to wait for delegation break on parent
vfs: allow mkdir to wait for delegation break on parent
vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
filelock: push the S_ISREG check down to ->setlease handlers
filelock: add struct delegated_inode
filelock: rework the __break_lease API to use flags
filelock: make lease_alloc() take a flags argument
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc
omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC
RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM=
=zo/J
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"Features:
- Hide inode->i_state behind accessors. Open-coded accesses prevent
asserting they are done correctly. One obvious aspect is locking,
but significantly more can be checked. For example it can be
detected when the code is clearing flags which are already missing,
or is setting flags when it is illegal (e.g., I_FREEING when
->i_count > 0)
- Provide accessors for ->i_state, converts all filesystems using
coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
compile
- Rework I_NEW handling to operate without fences, simplifying the
code after the accessor infrastructure is in place
Cleanups:
- Move wait_on_inode() from writeback.h to fs.h
- Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
for clarity
- Cosmetic fixes to LRU handling
- Push list presence check into inode_io_list_del()
- Touch up predicts in __d_lookup_rcu()
- ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
- Assert on ->i_count in iput_final()
- Assert ->i_lock held in __iget()
Fixes:
- Add missing fences to I_NEW handling"
* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
dcache: touch up predicts in __d_lookup_rcu()
fs: push list presence check into inode_io_list_del()
fs: cosmetic fixes to lru handling
fs: rework I_NEW handling to operate without fences
fs: make plain ->i_state access fail to compile
xfs: use the new ->i_state accessors
nilfs2: use the new ->i_state accessors
overlayfs: use the new ->i_state accessors
gfs2: use the new ->i_state accessors
f2fs: use the new ->i_state accessors
smb: use the new ->i_state accessors
ceph: use the new ->i_state accessors
btrfs: use the new ->i_state accessors
Manual conversion to use ->i_state accessors of all places not covered by coccinelle
Coccinelle-based conversion to use ->i_state accessors
fs: provide accessors for ->i_state
fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
fs: move wait_on_inode() from writeback.h to fs.h
fs: add missing fences to I_NEW handling
ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
...
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the
semantic patch at scripts/coccinelle/misc/semicolon.cocci.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Fixed: 7ab96df840 ("VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
They will become unused in the next patch and we'll drop them after the
conversion is finished together with the struct. This keeps the changes
small and reviewable.
Link: https://patch.msgid.link/20251114-work-ovl-cred-guard-copyup-v1-3-ea3fb15cf427@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The current code to override credentials for creation operations is
pretty difficult to understand. We effectively override the credentials
twice:
(1) override with the mounter's credentials
(2) copy the mounts credentials and override the fs{g,u}id with the inode {u,g}id
And then we elide the revert because it would be an idempotent revert.
That elision doesn't buy us anything anymore though because I've made it
all work without any reference counting anyway. All it does is mix the
two credential overrides together.
We can use a cleanup guard to clarify the creation codepaths and make
them easier to understand.
This just introduces the cleanup guard keeping the patch reviewable.
We'll convert the caller in follow-up patches and then drop the
duplicated code.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-1-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Extract the code that runs under overridden credentials into a separate
ovl_rename_upper() helper function and the code that runs before/after to
ovl_rename_start/end(). Error handling is simplified.
The helpers returns errors directly instead of using goto labels.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-34-b31603935724@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add a struct ovl_renamedata to group rename-related state that was
previously stored in local variables. Embedd struct renamedata directly
aligning with the vfs.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-33-b31603935724@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>