linux

mirror of https://github.com/torvalds/linux.git synced 2026-03-08 04:04:43 +01:00

Author	SHA1	Message	Date
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	45a43ac5ac	vfs-7.0-rc1.misc.2 Please consider pulling these changes from the signed vfs-7.0-rc1.misc.2 tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaZMOCwAKCRCRxhvAZXjc oswrAP9r1zjzMimjX2J0hBoMnYjNzQfLLew8+IRygImQ+yaqWgD9Fiw/cQ9eE1Hm TMLqck/ky588ywSDaBzfztrXAY3ISgg= =4yr2 -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull more misc vfs updates from Christian Brauner: "Features: - Optimize close_range() from O(range size) to O(active FDs) by using find_next_bit() on the open_fds bitmap instead of linearly scanning the entire requested range. This is a significant improvement for large-range close operations on sparse file descriptor tables. - Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only. Add tracepoints for fs-verity enable and verify operations, replacing the previously removed debug printk's. - Prevent nfsd from exporting special kernel filesystems like pidfs and nsfs. These filesystems have custom ->open() and ->permission() export methods that are designed for open_by_handle_at(2) only and are incompatible with nfsd. Update the exportfs documentation accordingly. Fixes: - Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used on a non-null-terminated decrypted directory entry name from fscrypt. This triggered on encrypted lower layers when the decrypted name buffer contained uninitialized tail data. The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and name_is_dot_dotdot() helpers, replacing various open-coded "." and ".." checks across the tree. - Fix read-only fsflags not being reset together with xflags in vfs_fileattr_set(). Currently harmless since no read-only xflags overlap with flags, but this would cause inconsistencies for any future shared read-only flag - Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the target process is in a different pid namespace. This lets userspace distinguish "process exited" from "process in another namespace", matching glibc's pidfd_getpid() behavior Cleanups: - Use C-string literals in the Rust seq_file bindings, replacing the kernel::c_str!() macro (available since Rust 1.77) - Fix typo in d_walk_ret enum comment, add porting notes for the readlink_copy() calling convention change" * tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add porting notes about readlink_copy() pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns nfsd: do not allow exporting of special kernel filesystems exportfs: clarify the documentation of open()/permission() expotrfs ops fsverity: add tracepoints fs: add FS_XFLAG_VERITY for fs-verity files rust: seq_file: replace `kernel::c_str!` with C-Strings fs: dcache: fix typo in enum d_walk_ret comment ovl: use name_is_dot* helpers in readdir code fs: add helpers name_is_dot{,dot,_dotdot} ovl: Fix uninit-value in ovl_fill_real fs: reset read-only fsflags together with xflags fs/file: optimize close_range() complexity from O(N) to O(Sparse)	2026-02-16 13:00:36 -08:00
Linus Torvalds	26c9342bb7	struct filename series [mostly] sanitize struct filename hanling Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaYlcJgAKCRBZ7Krx/gZQ 6xlKAP9c9J13sJ/mcobsj1Ov7nSHISNbnYqvRRCu09Wq3UQvJgEApNQYOEdLtpff zUnWOAQ0nOKY7w9VMLkRRustXpuGjAc= =Fld4 -----END PGP SIGNATURE----- Merge tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'struct filename' updates from Al Viro: "[Mostly] sanitize struct filename handling" * tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (68 commits) sysfs(2): fs_index() argument is _not_ a pathname alpha: switch osf_mount() to strndup_user() ksmbd: use CLASS(filename_kernel) mqueue: switch to CLASS(filename) user_statfs(): switch to CLASS(filename) statx: switch to CLASS(filename_maybe_null) quotactl_block(): switch to CLASS(filename) chroot(2): switch to CLASS(filename) move_mount(2): switch to CLASS(filename_maybe_null) namei.c: switch user pathname imports to CLASS(filename{,_flags}) namei.c: convert getname_kernel() callers to CLASS(filename_kernel) do_f{chmod,chown,access}at(): use CLASS(filename_uflags) do_readlinkat(): switch to CLASS(filename_flags) do_sys_truncate(): switch to CLASS(filename) do_utimes_path(): switch to CLASS(filename_uflags) chdir(2): unspaghettify a bit... do_fchownat(): unspaghettify a bit... fspick(2): use CLASS(filename_flags) name_to_handle_at(): use CLASS(filename_uflags) vfs_open_tree(): use CLASS(filename_uflags) ...	2026-02-09 16:58:28 -08:00
Linus Torvalds	9e355113f0	vfs-7.0-rc1.misc Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49QAKCRCRxhvAZXjc ojrZAQD1VJzY46r5FnAVf4jlEHyjIbDnZCP/n+c4x6XnqpU6EQEAgB0yAtAGP6+u SBuytElqHoTT5VtmEXTAabCNQ9Ks8wo= =JwZz -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...	2026-02-09 15:13:05 -08:00
Linus Torvalds	8113b3998d	vfs-7.0-rc1.atomic_open Please consider pulling these changes from the signed vfs-7.0-rc1.atomic_open tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc ogUIAQDJTGgoi7H5a8OllRLXU/6D4OXhIhvZtvrK31HfLLDTRAEAw8JFnvFrCJP9 xf3yVklTJ9aW65zeh2mG0uiJ87JORgE= =qP0i -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs atomic_open updates from Christian Brauner: "Allow knfsd to use atomic_open() While knfsd offers combined exclusive create and open results to clients, on some filesystems those results are not atomic. The separate vfs_create() + vfs_open() sequence in dentry_create() can produce races and unexpected errors. For example, open O_CREAT with mode 0 will succeed in creating the file but return -EACCES from vfs_open(). Additionally, network filesystems benefit from reducing remote round-trip operations by using a single atomic_open() call. Teach dentry_create() -- whose sole caller is knfsd -- to use atomic_open() for filesystems that support it" * tag 'vfs-7.0-rc1.atomic_open' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/namei: fix kernel-doc markup for dentry_create VFS/knfsd: Teach dentry_create() to use atomic_open() VFS: Prepare atomic_open() for dentry_create() VFS: move dentry_create() from fs/open.c to fs/namei.c	2026-02-09 14:25:37 -08:00
Linus Torvalds	6124fa45e2	vfs-7.0-rc1.btrfs Please consider pulling these changes from the signed vfs-7.0-rc1.btrfs tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc oogiAP0bJ72jxff4CcV1VDltO/mDT2XcCBRz3hYSZdC12Q+AYAD/XlozEUrUgbgg V2pWb1Xo+NrbNyhtNQ+2btHFmkzJ1gY= =7Xrx -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs updates for btrfs from Christian Brauner: "This contains some changes for btrfs that are taken to the vfs tree to stop duplicating VFS code for subvolume/snapshot dentry Btrfs has carried private copies of the VFS may_delete() and may_create() functions in fs/btrfs/ioctl.c for permission checks during subvolume creation and snapshot destruction. These copies have drifted out of sync with the VFS originals — btrfs_may_delete() is missing the uid/gid validity check and btrfs_may_create() is missing the audit_inode_child() call. Export the VFS functions as may_{create,delete}_dentry() and switch btrfs to use them, removing ~70 lines of duplicated code" * tag 'vfs-7.0-rc1.btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: btrfs: use may_create_dentry() in btrfs_mksubvol() btrfs: use may_delete_dentry() in btrfs_ioctl_snap_destroy() fs: export may_create() as may_create_dentry() fs: export may_delete() as may_delete_dentry()	2026-02-09 13:05:35 -08:00
Amir Goldstein	55fb177d3a	fs: add helpers name_is_dot{,dot,_dotdot} Rename the helper is_dot_dotdot() into the name_ namespace and add complementary helpers to check for dot and dotdot names individually. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20260128132406.23768-3-amir73il@gmail.com Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-29 10:06:59 +01:00
Jay Winston	6ea258d1f6	fs/namei: fix kernel-doc markup for dentry_create O_ is interpreted as a broken hyperlink target. Escape _ with a backslash. The asterisk in "struct file *" is interpreted as an opening emphasis string that never closes. Replace double quotes with rST backticks. Change "a ERR_PTR" to "an ERR_PTR". Signed-off-by: Jay Winston <jaybenjaminwinston@gmail.com> Link: https://patch.msgid.link/20260118110401.2651-1-jaybenjaminwinston@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-20 14:54:01 +01:00
Al Viro	904f58b507	namei.c: switch user pathname imports to CLASS(filename{,_flags}) filename_flags is used by user_path_at(). I suspect that mixing LOOKUP_EMPTY with real lookup flags had been a mistake all along; the former belongs to pathname import, the latter - to pathwalk. Right now none of the remaining in-tree callers of user_path_at() are getting LOOKUP_EMPTY in flags, so user_path_at() could probably be switched to CLASS(filename)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:52:03 -05:00
Al Viro	e9817d5b8c	namei.c: convert getname_kernel() callers to CLASS(filename_kernel) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:52:03 -05:00
Al Viro	e50aae1d39	non-consuming variants of do_{unlinkat,rmdir}() similar to previous commit; replacements are filename_{unlinkat,rmdir}() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:51:50 -05:00
Al Viro	88fdc27617	non-consuming variant of do_mknodat() similar to previous commit; replacement is filename_mknodat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:49:26 -05:00
Al Viro	dc912db15a	non-consuming variant of do_mkdirat() similar to previous commit; replacement is filename_mkdirat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:48:49 -05:00
Al Viro	da72b76aae	non-consuming variant of do_symlinkat() similar to previous commit; replacement is filename_symlinkat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:48:16 -05:00
Al Viro	037193b0ae	non-consuming variant of do_linkat() similar to previous commit; replacement is filename_linkat() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:47:42 -05:00
Al Viro	e6d50234cc	non-consuming variant of do_renameat2() filename_renameat2() replaces do_renameat2(); unlike the latter, it does not drop filename references - these days it can be just as easily arranged in the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-16 12:46:57 -05:00
Filipe Manana	26aab3a485	fs: export may_create() as may_create_dentry() For many years btrfs as been using a copy of may_create() in fs/btrfs/ioctl.c:btrfs_may_create(). Everytime may_create() is updated we need to update the btrfs copy, and this is a maintenance burden. Currently there are minor differences between both because the btrfs side lacks updates done in may_create(). Export may_create() so that btrfs can use it and with the less generic name may_create_dentry(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Link: https://patch.msgid.link/ce5174bca079f4cdcbb8dd145f0924feb1f227cd.1768307858.git.fdmanana@suse.com Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-14 17:17:47 +01:00
Filipe Manana	173e937552	fs: export may_delete() as may_delete_dentry() For many years btrfs as been using a copy of may_delete() in fs/btrfs/ioctl.c:btrfs_may_delete(). Everytime may_delete() is updated we need to update the btrfs copy, and this is a maintenance burden. Currently there are minor differences between both because the btrfs side lacks updates done in may_delete(). Export may_delete() so that btrfs can use it and with the less generic name may_delete_dentry(). While at it change the calls in vfs_rmdir() to pass a boolean literal instead of 1 and 0 as the last argument since the argument has a bool type. Signed-off-by: Filipe Manana <fdmanana@suse.com> Link: https://patch.msgid.link/e09128fd53f01b19d0a58f0e7d24739f79f47f6d.1768307858.git.fdmanana@suse.com Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-14 17:17:47 +01:00
Al Viro	541003b576	rename do_filp_open() to do_file_open() "filp" thing never made sense; seeing that there are exactly 4 callers in the entire tree (and it's neither exported nor even declared in linux//.h), there's no point keeping that ugliness. FWIW, the 'filp' thing did originate in OSD&I; for some reason Tanenbaum decided to call the object representing an opened file 'struct filp', the last letter standing for 'position'. In all Unices, Linux included, the corresponding object had always been 'struct file'... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:18:07 -05:00
Al Viro	2e2d64aea5	do_filp_open(): DTRT when getting ERR_PTR() as pathname The rest of the set_nameidata() callers treat IS_ERR(pathname) as "bail out immediately with PTR_ERR(pathname) as error". Makes life simpler for callers; do_filp_open() is the only exception and its callers would also benefit from such calling conventions change. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:18:07 -05:00
Al Viro	741c97fecb	struct filename ->refcnt doesn't need to be atomic ... or visible outside of audit, really. Note that references held in delayed_filename always have refcount 1, and from the moment of complete_getname() or equivalent point in getname...() there won't be any references to struct filename instance left in places visible to other threads. Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:18:07 -05:00
Al Viro	9fa3ec8458	allow incomplete imports of filenames There are two filename-related problems in io_uring and its interplay with audit. Filenames are imported when request is submitted and used when it is processed. Unfortunately, the latter may very well happen in a different thread. In that case the reference to filename is put into the wrong audit_context - that of submitting thread, not the processing one. Audit logics is called by the latter, and it really wants to be able to find the names in audit_context current (== processing) thread. Another related problem is the headache with refcounts - normally all references to given struct filename are visible only to one thread (the one that uses that struct filename). io_uring violates that - an extra reference is stashed in audit_context of submitter. It gets dropped when submitter returns to userland, which can happen simultaneously with processing thread deciding to drop the reference it got. We paper over that by making refcount atomic, but that means pointless headache for everyone. Solution: the notion of partially imported filenames. Namely, already copied from userland, but not exposed to audit yet. io_uring can create that in submitter thread, and complete the import (obtaining the usual reference to struct filename) in processing thread. Object: struct delayed_filename. Primitives for working with it: delayed_getname(&delayed_filename, user_string) - copies the name from userland, returning 0 and stashing the address of (still incomplete) struct filename in delayed_filename on success and returning -E... on error. delayed_getname_uflags(&delayed_filename, user_string, atflags) - similar, in the same relation to delayed_getname() as getname_uflags() is to getname() complete_getname(&delayed_filename) - completes the import of filename stashed in delayed_filename and returns struct filename to caller, emptying delayed_filename. CLASS(filename_complete_delayed, name)(&delayed_filename) - variant of CLASS(filename) with complete_getname() for constructor. dismiss_delayed_filename(&delayed_filename) - destructor; drops whatever might be stashed in delayed_filename, emptying it. putname_to_delayed(&delayed_filename, name) - if name is shared, stashes its copy into delayed_filename and drops the reference to name, otherwise stashes the name itself in there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:18:07 -05:00
Al Viro	a9900a27df	switch __getname_maybe_null() to CLASS(filename_flags) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:18:07 -05:00
Mateusz Guzik	7ca83f8ebe	fs: hide names_cache behind runtime const machinery s/names_cachep/names_cache/ for consistency with dentry cache. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:17:26 -05:00
Al Viro	8c888b3190	struct filename: saner handling of long names Always allocate struct filename from names_cachep, long name or short; short names would be embedded into struct filename. Longer ones do not cannibalize the original struct filename - put them into kmalloc'ed buffers (PATH_MAX-sized for import from userland, strlen() + 1 - for ones originating kernel-side, where we know the length beforehand). Cutoff length for short names is chosen so that struct filename would be 192 bytes long - that's both a multiple of 64 and large enough to cover the majority of real-world uses. Simplifies logics in getname()/putname() and friends. [fixed an embarrassing braino in EMBEDDED_NAME_MAX, first reported by Dan Carpenter] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:16:44 -05:00
Al Viro	c3a3577cdb	struct filename: use names_cachep only for getname() and friends Instances of struct filename come from names_cachep (via __getname()). That is done by getname_flags() and getname_kernel() and these two are the main callers of __getname(). However, there are other callers that simply want to allocate PATH_MAX bytes for uses that have nothing to do with struct filename. We want saner allocation rules for long pathnames, so that struct filename would always come from names_cachep, with the out-of-line pathname getting kmalloc'ed. For that we need to be able to change the size of objects allocated by getname_flags()/getname_kernel(). That requires the rest of __getname() users to stop using names_cachep; we could explicitly switch all of those to kmalloc(), but that would cause quite a bit of noise. So the plan is to switch getname_...() to new helpers and turn __getname() into a wrapper for kmalloc(). Remaining __getname() users could be converted to explicit kmalloc() at leisure, hopefully along with figuring out what size do they really want - PATH_MAX is an overkill for some of them, used out of laziness ("we have a convenient helper that does 4K allocations and that's large enough, let's use it"). As a side benefit, names_cachep is no longer used outside of fs/namei.c, so we can move it there and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:16:44 -05:00
Al Viro	8f2ac84817	getname_flags() massage, part 2 Take the "long name" case into a helper (getname_long()). In case of failure have the caller deal with freeing the original struct filename. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:16:44 -05:00
Al Viro	8ba29c85e2	getname_flags() massage, part 1 In case of long name don't reread what we'd already copied. memmove() it instead. That avoids the possibility of ending up with empty name there and the need to look at the flags on the slow path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:16:44 -05:00
Al Viro	41670a5900	get rid of audit_reusename() Originally we tried to avoid multiple insertions into audit names array during retry loop by a cute hack - memorize the userland pointer and if there already is a match, just grab an extra reference to it. Cute as it had been, it had problems - two identical pointers had audit aux entries merged, two identical strings did not. Having different behaviour for syscalls that differ only by addresses of otherwise identical string arguments is obviously wrong - if nothing else, compiler can decide to merge identical string literals. Besides, this hack does nothing for non-audited processes - they get a fresh copy for retry. It's not time-critical, but having behaviour subtly differ that way is bogus. These days we have very few places that import filename more than once (9 functions total) and it's easy to massage them so we get rid of all re-imports. With that done, we don't need audit_reusename() anymore. There's no need to memorize userland pointer either. Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:16:44 -05:00
Al Viro	4bfe0692d6	init_mknod(): turn into a trivial wrapper for do_mknodat() Same as init_unlink() and init_rmdir() already are; the only obstacle is do_mknodat() being static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-01-13 15:01:32 -05:00
Bagas Sanjaya	ba4c74f80e	VFS: fix __start_dirop() kernel-doc warnings Sphinx report kernel-doc warnings: WARNING: ./fs/namei.c:2853 function parameter 'state' not described in '__start_dirop' WARNING: ./fs/namei.c:2853 expecting prototype for start_dirop(). Prototype was for __start_dirop() instead Fix them up. Fixes: `ff7c4ea11a` ("VFS: add start_creating_killable() and start_removing_killable()") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251219024620.22880-3-bagasdotme@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-06 23:17:52 +01:00
Breno Leitao	a6b9f5b2f0	fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu The check for DCACHE_MANAGED_DENTRY at the start of __follow_mount_rcu() is redundant because the only caller (handle_mounts) already verifies d_managed(dentry) before calling this function, so, dentry in __follow_mount_rcu() has always DCACHE_MANAGED_DENTRY set. This early-out optimization never fires in practice - but it is marking as likely(). This was detected with branch profiling, which shows 100% misprediction in this likely. Remove the whole if clause instead of removing the likely, given we know for sure that dentry is not DCACHE_MANAGED_DENTRY. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260105-dcache-v1-1-f0d904b4a7c2@debian.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-06 22:48:15 +01:00
Mateusz Guzik	729d015ab2	fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS Calls to the 2 modified routines are explicitly gated with checks for the flag, so there is no use for this in production kernels. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251229125751.826050-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-06 22:47:07 +01:00
Bagas Sanjaya	73a91ef328	VFS: fix __start_dirop() kernel-doc warnings Sphinx report kernel-doc warnings: WARNING: ./fs/namei.c:2853 function parameter 'state' not described in '__start_dirop' WARNING: ./fs/namei.c:2853 expecting prototype for start_dirop(). Prototype was for __start_dirop() instead Fix them up. Fixes: `ff7c4ea11a` ("VFS: add start_creating_killable() and start_removing_killable()") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251219024620.22880-3-bagasdotme@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:33:24 +01:00
Mateusz Guzik	46af9ae130	fs: make sure to fail try_to_unlazy() and try_to_unlazy() for LOOKUP_CACHED Otherwise the slowpath can be taken by the caller, defeating the flag. This regressed after calls to legitimize_links() started being conditionally elided and stems from the routine always failing after seeing the flag, regardless if there were any links. In order to address both the bug and the weird semantics make it illegal to call legitimize_links() with LOOKUP_CACHED and handle the problem at the two callsites. Fixes: `7c179096e7` ("fs: add predicts based on nd->depth") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251220054023.142134-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-24 13:31:56 +01:00
Benjamin Coddington	64a989dbd1	VFS/knfsd: Teach dentry_create() to use atomic_open() While knfsd offers combined exclusive create and open results to clients, on some filesystems those results may not be atomic. This behavior can be observed. For example, an open O_CREAT with mode 0 will succeed in creating the file but unexpectedly return -EACCES from vfs_open(). Additionally reducing the number of remote RPC calls required for O_CREAT on network filesystem provides a performance benefit in the open path. Teach knfsd's helper dentry_create() to use atomic_open() for filesystems that support it. The previously const @path is passed up to atomic_open() and may be modified depending on whether an existing entry was found or if the atomic_open() returned an error and consumed the passed-in dentry. Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> Link: https://patch.msgid.link/8e449bfb64ab055abb9fd82641a171531415a88c.1764259052.git.bcodding@hammerspace.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-15 14:12:45 +01:00
Benjamin Coddington	36411554e8	VFS: Prepare atomic_open() for dentry_create() The next patch allows dentry_create() to call atomic_open(), but it does not have fabricated nameidata. Let atomic_open() take a path instead. Since atomic_open() currently takes a nameidata of which it only uses the path and the flags, and flags are only used to update open_flags, then the flag update can happen before calling atomic_open(). Then, only the path needs be passed to atomic_open() rather than the whole nameidata. This makes it easier for dentry_create() To call atomic_open(). Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> Link: https://patch.msgid.link/e8c1d2ca28de4a972d37e78599502108148fe17d.1764259052.git.bcodding@hammerspace.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-15 14:12:45 +01:00
Benjamin Coddington	977de00dfc	VFS: move dentry_create() from fs/open.c to fs/namei.c To prepare knfsd's helper dentry_create(), move it to namei.c so that it can access static functions within. Callers of dentry_create() can be viewed as being mostly done with lookup, but still need to perform a few final checks. In order to use atomic_open() we want dentry_create() to be able to access: - vfs_prepare_mode - may_o_create - atomic_open .. all of which have static declarations. Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> Link: https://patch.msgid.link/42deec53a50e1676e5501f8f1e17967d47b83681.1764259052.git.bcodding@hammerspace.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-12-15 14:12:41 +01:00
Linus Torvalds	a8058f8442	vfs-6.19-rc1.directory.locking -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZwAKCRCRxhvAZXjc op9tAQCJ//STOkvYHfqgsdRD+cW9MRg/gPzfVZgnV1FTyf8sMgEA0IsY5zCZB9eh 9FdD0E57P8PlWRwWZ+LktnWBzRAUqwI= =MOVR -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory locking updates from Christian Brauner: "This contains the work to add centralized APIs for directory locking operations. This series is part of a larger effort to change directory operation locking to allow multiple concurrent operations in a directory. The ultimate goal is to lock the target dentry(s) rather than the whole parent directory. To help with changing the locking protocol, this series centralizes locking and lookup in new helper functions. The helpers establish a pattern where it is the dentry that is being locked and unlocked (currently the lock is held on dentry->d_parent->d_inode, but that can change in the future). This also changes vfs_mkdir() to unlock the parent on failure, as well as dput()ing the dentry. This allows end_creating() to only require the target dentry (which may be IS_ERR() after vfs_mkdir()), not the parent" * tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfsd: fix end_creating() conversion VFS: introduce end_creating_keep() VFS: change vfs_mkdir() to unlock on failure. ecryptfs: use new start_creating/start_removing APIs Add start_renaming_two_dentries() VFS/ovl/smb: introduce start_renaming_dentry() VFS/nfsd/ovl: introduce start_renaming() and end_renaming() VFS: add start_creating_killable() and start_removing_killable() VFS: introduce start_removing_dentry() smb/server: use end_removing_noperm for for target of smb2_create_link() VFS: introduce start_creating_noperm() and start_removing_noperm() VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() VFS: tidy up do_unlinkat() VFS: introduce start_dirop() and end_dirop() debugfs: rename end_creating() to debugfs_end_creating()	2025-12-01 16:13:46 -08:00
Linus Torvalds	db74a7d02a	vfs-6.19-rc1.directory.delegations -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc ooiEAPwNZfkqiSs6G1B2EmjFpMrA2BDqskaOsnN2sywra0sNewD9EQxJwlYXUn+z nNUIAvmegJGg2OiU2UaNGwxMR3lR3w8= =YELr -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory delegations update from Christian Brauner: "This contains the work for recall-only directory delegations for knfsd. Add support for simple, recallable-only directory delegations. This was decided at the fall NFS Bakeathon where the NFS client and server maintainers discussed how to merge directory delegation support. The approach starts with recallable-only delegations for several reasons: 1. RFC8881 has gaps that are being addressed in RFC8881bis. In particular, it requires directory position information for CB_NOTIFY callbacks, which is difficult to implement properly under Linux. The spec is being extended to allow that information to be omitted. 2. Client-side support for CB_NOTIFY still lags. The client side involves heuristics about when to request a delegation. 3. Early indication shows simple, recallable-only delegations can help performance. Anna Schumaker mentioned seeing a multi-minute speedup in xfstests runs with them enabled. With these changes, userspace can also request a read lease on a directory that will be recalled on conflicting accesses. This may be useful for applications like Samba. Users can disable leases altogether via the fs.leases-enable sysctl if needed. VFS changes: - Dedicated Type for Delegations Introduce struct delegated_inode to track inodes that may have delegations that need to be broken. This replaces the previous approach of passing raw inode pointers through the delegation breaking code paths, providing better type safety and clearer semantics for the delegation machinery. - Break parent directory delegations in open(..., O_CREAT) codepath - Allow mkdir to wait for delegation break on parent - Allow rmdir to wait for delegation break on parent - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(), and vfs_unlink() - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations on parent directory - Clean up argument list for vfs_create() - Expose delegation support to userland Filelock changes: - Make lease_alloc() take a flags argument - Rework the __break_lease API to use flags - Add struct delegated_inode - Push the S_ISREG check down to ->setlease handlers - Lift the ban on directory leases in generic_setlease NFSD changes: - Allow filecache to hold S_IFDIR files - Allow DELEGRETURN on directories - Wire up GET_DIR_DELEGATION handling Fixes: - Fix kernel-doc warnings in __fcntl_getlease - Add needed headers for new struct delegation definition" * tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: add needed headers for new struct delegation definition filelock: __fcntl_getlease: fix kernel-doc warnings vfs: expose delegation support to userland nfsd: wire up GET_DIR_DELEGATION handling nfsd: allow DELEGRETURN on directories nfsd: allow filecache to hold S_IFDIR files filelock: lift the ban on directory leases in generic_setlease vfs: make vfs_symlink break delegations on parent dir vfs: make vfs_mknod break delegations on parent directory vfs: make vfs_create break delegations on parent directory vfs: clean up argument list for vfs_create() vfs: break parent dir delegations in open(..., O_CREAT) codepath vfs: allow rmdir to wait for delegation break on parent vfs: allow mkdir to wait for delegation break on parent vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink} filelock: push the S_ISREG check down to ->setlease handlers filelock: add struct delegated_inode filelock: rework the __break_lease API to use flags filelock: make lease_alloc() take a flags argument	2025-12-01 15:34:41 -08:00
Linus Torvalds	9368f0f941	vfs-6.19-rc1.inode -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM= =zo/J -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...	2025-12-01 09:02:34 -08:00
Mateusz Guzik	177fdbae39	fs: inline step_into() and walk_component() The primary consumer is link_path_walk(), calling walk_component() every time which in turn calls step_into(). Inlining these saves overhead of 2 function calls per path component, along with allowing the compiler to do better job optimizing them in place. step_into() had absolutely atrocious assembly to facilitate the slowpath. In order to lessen the burden at the callsite all the hard work is moved into step_into_slowpath() and instead an inline-able fastpath is implemented for rcu-walk. The new fastpath is a stripped down step_into() RCU handling with a d_managed() check from handle_mounts(). Benchmarked as follows on Sapphire Rapids: 1. the "before" was a kernel with not-yet-merged optimizations (notably elision of calls to security_inode_permission() and marking ext4 inodes as not having acls as applicable) 2. "after" is the same + the prep patch + this patch 3. benchmark consists of issuing 205 calls to access(2) in a loop with pathnames lifted out of gcc and the linker building real code, most of which have several path components and 118 of which fail with -ENOENT. Result in terms of ops/s: before: 21619 after: 22536 (+4%) profile before: 20.25% [kernel] [k] __d_lookup_rcu 10.54% [kernel] [k] link_path_walk 10.22% [kernel] [k] entry_SYSCALL_64 6.50% libc.so.6 [.] __GI___access 6.35% [kernel] [k] strncpy_from_user 4.87% [kernel] [k] step_into 3.68% [kernel] [k] kmem_cache_alloc_noprof 2.88% [kernel] [k] walk_component 2.86% [kernel] [k] kmem_cache_free 2.14% [kernel] [k] set_root 2.08% [kernel] [k] lookup_fast after: 23.38% [kernel] [k] __d_lookup_rcu 11.27% [kernel] [k] entry_SYSCALL_64 10.89% [kernel] [k] link_path_walk 7.00% libc.so.6 [.] __GI___access 6.88% [kernel] [k] strncpy_from_user 3.50% [kernel] [k] kmem_cache_alloc_noprof 2.01% [kernel] [k] kmem_cache_free 2.00% [kernel] [k] set_root 1.99% [kernel] [k] lookup_fast 1.81% [kernel] [k] do_syscall_64 1.69% [kernel] [k] entry_SYSCALL_64_safe_stack While walk_component() and step_into() of course disappear from the profile, the link_path_walk() barely gets more overhead despite the inlining thanks to the fast path added and while completing more walks per second. I did not investigate why overhead grew a lot on __d_lookup_rcu(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251120003803.2979978-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-26 14:52:02 +01:00
Mateusz Guzik	9d2a6211a7	fs: tidy up step_into() & friends before inlining Symlink handling is already marked as unlikely and pushing out some of it into pick_link() reduces register spillage on entry to step_into() with gcc 14.2. The compiler needed additional convincing that handle_mounts() is unlikely to fail. At the same time neither clang nor gcc could be convinced to tail-call into pick_link(). While pick_link() takes an address of stack-based object as an argument (which definitely prevents the optimization), splitting it into separate <dentry, mount> tuple did not help. The issue persists even when compiled without stack protector. As such nothing was done about this for the time being to not grow the diff. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251120003803.2979978-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-26 14:52:02 +01:00
Mateusz Guzik	8d79ec9e7f	fs: mark lookup_slow() as noinline Otherwise it gets inlined notably in walk_component(), which convinces the compiler to push/pop additional registers in the fast path to accomodate existence of the inlined version. Shortens the fast path of that routine from 87 to 71 bytes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251119144930.2911698-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-25 10:04:38 +01:00
Mateusz Guzik	7c179096e7	fs: add predicts based on nd->depth Stats from nd->depth usage during the venerable kernel build collected like so: bpftrace -e 'kprobe:terminate_walk,kprobe:walk_component,kprobe:legitimize_links { @[probe] = lhist(((struct nameidata )arg0)->depth, 0, 8, 1); }' @[kprobe:legitimize_links]: [0, 1) 6554906 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [1, 2) 3534 \| \| @[kprobe:terminate_walk]: [0, 1) 12153664 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| @[kprobe:walk_component]: [0, 1) 53075749 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [1, 2) 971421 \| \| [2, 3) 84946 \| \| Additionally a custom probe was added for depth within link_path_walk(): bpftrace -e 'kprobe:link_path_walk_probe { @[probe] = lhist(arg0, 0, 8, 1); }' @[kprobe:link_path_walk_probe]: [0, 1) 7528231 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [1, 2) 407905 \|@@ \| Given these results: 1. terminate_walk() is called towards the end of the lookup and in this test it never had any links to clean up. 2. legitimize_links() is also called towards the end of lookup and most of the time there s 0 depth. Patch consumers to avoid calling into it in that case. 3. walk_component() is typically called with WALK_MORE and zero depth, checked in that order. Check depth first and predict it is 0. 4. link_path_walk() also does not deal with a symlink most of the time when !name Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251119142954.2909394-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-25 10:04:01 +01:00
NeilBrown	fe497f0759	VFS: change vfs_mkdir() to unlock on failure. vfs_mkdir() already drops the reference to the dentry on failure but it leaves the parent locked. This complicates end_creating() which needs to unlock the parent even though the dentry is no longer available. If we change vfs_mkdir() to unlock on failure as well as releasing the dentry, we can remove the "parent" arg from end_creating() and simplify the rules for calling it. Note that cachefiles_get_directory() can choose to substitute an error instead of actually calling vfs_mkdir(), for fault injection. In that case it needs to call end_creating(), just as vfs_mkdir() now does on error. ovl_create_real() will now unlock on error. So the conditional end_creating() after the call is removed, and end_creating() is called internally on error. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-15-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:58 +01:00
NeilBrown	f046fbb4d8	ecryptfs: use new start_creating/start_removing APIs This requires the addition of start_creating_dentry() which is given the dentry which has already been found, and asks for it to be locked and its parent validated. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-14-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:58 +01:00
NeilBrown	833d2b3a07	Add start_renaming_two_dentries() A few callers want to lock for a rename and already have both dentries. Also debugfs does want to perform a lookup but doesn't want permission checking, so start_renaming_dentry() cannot be used. This patch introduces start_renaming_two_dentries() which is given both dentries. debugfs performs one lookup itself. As it will only continue with a negative dentry and as those cannot be renamed or unlinked, it is safe to do the lookup before getting the rename locks. overlayfs uses start_renaming_two_dentries() in three places and selinux uses it twice in sel_make_policy_nodes(). In sel_make_policy_nodes() we now lock for rename twice instead of just once so the combined operation is no longer atomic w.r.t the parent directory locks. As selinux_state.policy_mutex is held across the whole operation this does not open up any interesting races. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-13-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:58 +01:00
NeilBrown	ac50950ca1	VFS/ovl/smb: introduce start_renaming_dentry() Several callers perform a rename on a dentry they already have, and only require lookup for the target name. This includes smb/server and a few different places in overlayfs. start_renaming_dentry() performs the required lookup and takes the required lock using lock_rename_child() It is used in three places in overlayfs and in ksmbd_vfs_rename(). In the ksmbd case, the parent of the source is not important - the source must be renamed from wherever it is. So start_renaming_dentry() allows rd->old_parent to be NULL and only checks it if it is non-NULL. On success rd->old_parent will be the parent of old_dentry with an extra reference taken. Other start_renaming function also now take the extra reference and end_renaming() now drops this reference as well. ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are all removed as they are no longer needed. OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so that ovl_check_rename_whiteout() can access them. ovl_copy_up_workdir() now always cleans up on error. Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-12-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:57 +01:00
NeilBrown	5c87527299	VFS/nfsd/ovl: introduce start_renaming() and end_renaming() start_renaming() combines name lookup and locking to prepare for rename. It is used when two names need to be looked up as in nfsd and overlayfs - cases where one or both dentries are already available will be handled separately. __start_renaming() avoids the inode_permission check and hash calculation and is suitable after filename_parentat() in do_renameat2(). It subsumes quite a bit of code from that function. start_renaming() does calculate the hash and check X permission and is suitable elsewhere: - nfsd_rename() - ovl_rename() In ovl, ovl_do_rename_rd() is factored out of ovl_do_rename(), which itself will be gone by the end of the series. Acked-by: Chuck Lever <chuck.lever@oracle.com> (for nfsd parts) Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> -- Changes since v3: - added missig dput() in ovl_rename when "whiteout" is not-NULL. Changes since v2: - in __start_renaming() some label have been renamed, and err is always set before a "goto out_foo" rather than passing the error in a dentry*. - ovl_do_rename() changed to call the new ovl_do_rename_rd() rather than keeping duplicate code - code around ovl_cleanup() call in ovl_rename() restructured. Link: https://patch.msgid.link/20251113002050.676694-11-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-14 13:15:57 +01:00

1 2 3 4 5 ...

1476 commits