This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
Total patches: 107
Reviews/patch: 1.07
Reviewed rate: 67%
- The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
suballocator free bg" from Heming Zhao saves disk space by teaching
ocfs2 to reclaim suballocator block group space.
- The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
various places.
- The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
VMCOREINFO_BYTES ever exceeds the page size.
- The 7 patch series "kallsyms: Prevent invalid access when showing
module buildid" from Petr Mladek cleans up kallsyms code related to
module buildid and fixes an invalid access crash when printing
backtraces.
- The 3 patch series "Address page fault in
ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
kexec-related crash that can occur when booting the second-stage kernel
on x86.
- The 6 patch series "kho: ABI headers and Documentation updates" from
Mike Rapoport updates the kexec handover ABI documentation.
- The 4 patch series "Align atomic storage" from Finn Thain adds the
__aligned attribute to atomic_t and atomic64_t definitions to get
natural alignment of both types on csky, m68k, microblaze, nios2,
openrisc and sh.
- The 2 patch series "kho: clean up page initialization logic" from
Pratyush Yadav simplifies the page initialization logic in
kho_restore_page().
- The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
several things out of kernel.h and into more appropriate places.
- The 7 patch series "don't abuse task_struct.group_leader" from Oleg
Nesterov removes the usage of ->group_leader when it is "obviously
unnecessary".
- The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
adds some infrastructure improvements to the live update orchestrator.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
=mmsH
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
disk space by teaching ocfs2 to reclaim suballocator block group
space (Heming Zhao)
- "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
ARRAY_END() macro and uses it in various places (Alejandro Colomar)
- "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
page size (Pnina Feder)
- "kallsyms: Prevent invalid access when showing module buildid" cleans
up kallsyms code related to module buildid and fixes an invalid
access crash when printing backtraces (Petr Mladek)
- "Address page fault in ima_restore_measurement_list()" fixes a
kexec-related crash that can occur when booting the second-stage
kernel on x86 (Harshit Mogalapalli)
- "kho: ABI headers and Documentation updates" updates the kexec
handover ABI documentation (Mike Rapoport)
- "Align atomic storage" adds the __aligned attribute to atomic_t and
atomic64_t definitions to get natural alignment of both types on
csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)
- "kho: clean up page initialization logic" simplifies the page
initialization logic in kho_restore_page() (Pratyush Yadav)
- "Unload linux/kernel.h" moves several things out of kernel.h and into
more appropriate places (Yury Norov)
- "don't abuse task_struct.group_leader" removes the usage of
->group_leader when it is "obviously unnecessary" (Oleg Nesterov)
- "list private v2 & luo flb" adds some infrastructure improvements to
the live update orchestrator (Pasha Tatashin)
* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
procfs: fix missing RCU protection when reading real_parent in do_task_stat()
watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
kho: fix doc for kho_restore_pages()
tests/liveupdate: add in-kernel liveupdate test
liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
liveupdate: luo_file: Use private list
list: add kunit test for private list primitives
list: add primitives for private list manipulations
delayacct: fix uapi timespec64 definition
panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
netclassid: use thread_group_leader(p) in update_classid_task()
RDMA/umem: don't abuse current->group_leader
drm/pan*: don't abuse current->group_leader
drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
drm/amdgpu: don't abuse current->group_leader
android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
android/binder: don't abuse current->group_leader
kho: skip memoryless NUMA nodes when reserving scratch areas
...
- Prevent rename() from failing with -ESTALE when there are locking
conflicts and retry the operation instead.
- Don't fail when fiemap triggers a page fault (xfstest generic/742).
- Fix another locking request cancellation bug.
- Minor other fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmmJ5mQUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTooqg/+MzASj1LM37uKjYPAQkfF6nvujd9G
BPMspoT+JzZDc0+btNPxsoOgwmju2ZeCY2ZNzGXbQ/V2wcTT3ZnuWummHOwhs07G
XilDp+Ohzk4IcQ1uCvOpIMY7mmRdSTbCo/Ztny/nPLxHOfbe6AgWo+YU/Saxh6xI
ndkbzSq0yjW7zjxdMSKjVbRAhvQGaW892s9orjb36isVDj+4hvA/aIWLrm7Zm5sf
xWDlzx9mUMMqlM8zImUedUjyG8zyTSBz80NthQmnRtt6fZ+Hau0DCwbkbKz9J8mH
Pksm7Xz6I9eft0axEQh7KrvCiHWCEUl3zoknbC/QtusJcx/r7Vpe48lBimIN7xi0
u1e2k7MPpOeVPzsP1nhKxD9W/IRC/WaKZCKb9fYmtbhzYVk8gE/lrLLGTWnATTa2
X6OCBpdg5niohXeVDNmE9ZtU5xsB/UH9AZ8p2+iPhhUC5dPPcF//H7mXNQhDC6m9
Az7KxPu4VlTiRD9YCU+SV7uBM1YLocLQg8qLN4mdrxPegRs2EuVVEuQxb3isGrQV
yMl61bbazc028AOeWu0iu7iggUXczwdjlEZVRyBpmjJjD6JwZ40WUKl8JqqiNH/K
09CWNQO7GZOmOcXzCHz34sE2bekv3OMnncJ/z+zFdSnwc71r7Eo5H+XG7Vyf7hQ9
OPHJe0tsyMDKPl8=
=GBfL
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Prevent rename() from failing with -ESTALE when there are locking
conflicts and retry the operation instead
- Don't fail when fiemap triggers a page fault (xfstest generic/742)
- Fix another locking request cancellation bug
- Minor other fixes and cleanups
* tag 'gfs2-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: fiemap page fault fix
gfs2: fix memory leaks in gfs2_fill_super error path
gfs2: Fix use-after-free in iomap inline data write path
gfs2: Fix slab-use-after-free in qd_put
gfs2: Introduce glock_{type,number,sbd} helpers
gfs2: gfs2_glock_hold cleanup
gfs: Use fixed GL_GLOCK_MIN_HOLD time
gfs2: Fix gfs2_log_get_bio argument type
gfs2: gfs2_chain_bio start sector fix
gfs2: Initialize bio->bi_opf early
gfs2: Rename gfs2_log_submit_{bio -> write}
gfs2: Do not cancel internal demote requests
gfs2: run_queue cleanup
gfs2: Retries missing in gfs2_{rename,exchange}
gfs2: glock cancelation flag fix
Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49QAKCRCRxhvAZXjc
ojrZAQD1VJzY46r5FnAVf4jlEHyjIbDnZCP/n+c4x6XnqpU6EQEAgB0yAtAGP6+u
SBuytElqHoTT5VtmEXTAabCNQ9Ks8wo=
=JwZz
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains a mix of VFS cleanups, performance improvements, API
fixes, documentation, and a deprecation notice.
Scalability and performance:
- Rework pid allocation to only take pidmap_lock once instead of
twice during alloc_pid(), improving thread creation/teardown
throughput by 10-16% depending on false-sharing luck. Pad the
namespace refcount to reduce false-sharing
- Track file lock presence via a flag in ->i_opflags instead of
reading ->i_flctx, avoiding false-sharing with ->i_readcount on
open/close hot paths. Measured 4-16% improvement on 24-core
open-in-a-loop benchmarks
- Use a consume fence in locks_inode_context() to match the
store-release/load-consume idiom, eliminating a hardware fence on
some architectures
- Annotate cdev_lock with __cacheline_aligned_in_smp to prevent
false-sharing
- Remove a redundant DCACHE_MANAGED_DENTRY check in
__follow_mount_rcu() that never fires since the caller already
verifies it, eliminating a 100% mispredicted branch
- Fix a 100% mispredicted likely() in devcgroup_inode_permission()
that became wrong after a prior code reorder
Bug fixes and correctness:
- Make insert_inode_locked() wait for inode destruction instead of
skipping, fixing a corner case where two matching inodes could
exist in the hash
- Move f_mode initialization before file_ref_init() in alloc_file()
to respect the SLAB_TYPESAFE_BY_RCU ordering contract
- Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with
no buffers attached, preventing a null pointer dereference when
AS_RELEASE_ALWAYS is set but no release_folio op exists
- Fix select restart_block to store end_time as timespec64, avoiding
truncation of tv_sec on 32-bit architectures
- Make dump_inode() use get_kernel_nofault() to safely access inode
and superblock fields, matching the dump_mapping() pattern
API modernization:
- Make posix_acl_to_xattr() allocate the buffer internally since
every single caller was doing it anyway. Reduces boilerplate and
unnecessary error checking across ~15 filesystems
- Replace deprecated simple_strtoul() with kstrtoul() for the
ihash_entries, dhash_entries, mhash_entries, and mphash_entries
boot parameters, adding proper error handling
- Convert chardev code to use guard(mutex) and __free(kfree) cleanup
patterns
- Replace min_t() with min() or umin() in VFS code to avoid silently
truncating unsigned long to unsigned int
- Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers
already check the flag
Deprecation:
- Begin deprecating legacy BSD process accounting (acct(2)). The
interface has numerous footguns and better alternatives exist
(eBPF)
Documentation:
- Fix and complete kernel-doc for struct export_operations, removing
duplicated documentation between ReST and source
- Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait()
Testing:
- Add a kunit test for initramfs cpio handling of entries with
filesize > PATH_MAX
Misc:
- Add missing <linux/init_task.h> include in fs_struct.c"
* tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
posix_acl: make posix_acl_to_xattr() alloc the buffer
fs: make insert_inode_locked() wait for inode destruction
initramfs_test: kunit test for cpio.filesize > PATH_MAX
fs: improve dump_inode() to safely access inode fields
fs: add <linux/init_task.h> for 'init_fs'
docs: exportfs: Use source code struct documentation
fs: move initializing f_mode before file_ref_init()
exportfs: Complete kernel-doc for struct export_operations
exportfs: Mark struct export_operations functions at kernel-doc
exportfs: Fix kernel-doc output for get_name()
acct(2): begin the deprecation of legacy BSD process accounting
device_cgroup: remove branch hint after code refactor
VFS: fix __start_dirop() kernel-doc warnings
fs: Describe @isnew parameter in ilookup5_nowait()
fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
select: store end_time as timespec64 in restart block
chardev: Switch to guard(mutex) and __free(kfree)
namespace: Replace simple_strtoul with kstrtoul to parse boot params
dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
...
Please consider pulling these changes from the signed vfs-7.0-rc1.leases tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
olR/AP40iNOTRn7LosXbRWqGGZqzy9v64QYoLzk3QdsWuGmbRAD/egNQzof8mkAf
IscefWTOjY7xyDzmEBEBnfHftgMiEwM=
=zre0
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs lease updates from Christian Brauner:
"This contains updates for lease support to require filesystems to
explicitly opt-in to lease support
Currently kernel_setlease() falls through to generic_setlease() when a
a filesystem does not define ->setlease(), silently granting lease
support to every filesystem regardless of whether it is prepared for
it.
This is a poor default: most filesystems never intended to support
leases, and the silent fallthrough makes it impossible to distinguish
"supports leases" from "never thought about it".
This inverts the default. It adds explicit
.setlease = generic_setlease;
assignments to every in-tree filesystem that should retain lease
support, then changes kernel_setlease() to return -EINVAL when
->setlease is NULL.
With the new default in place, simple_nosetlease() is redundant and
is removed along with all references to it"
* tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
fuse: add setlease file operation
fs: remove simple_nosetlease()
filelock: default to returning -EINVAL when ->setlease operation is NULL
xfs: add setlease file operation
ufs: add setlease file operation
udf: add setlease file operation
tmpfs: add setlease file operation
squashfs: add setlease file operation
overlayfs: add setlease file operation
orangefs: add setlease file operation
ocfs2: add setlease file operation
ntfs3: add setlease file operation
nilfs2: add setlease file operation
jfs: add setlease file operation
jffs2: add setlease file operation
gfs2: add a setlease file operation
fat: add setlease file operation
f2fs: add setlease file operation
exfat: add setlease file operation
ext4: add setlease file operation
...
Please consider pulling these changes from the signed vfs-7.0-rc1.nonblocking_timestamps tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
oqNMAQCjHw9iwYDu63n96QAipWopJb8onqc0rTEvi0OOl1zDNwEAufN3EqTzV3uQ
JbNgSwBWD/+ICd2aUOuAX0GgU6teyAQ=
=lJlI
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
"This contains the changes to support non-blocking timestamp updates.
Since commit 66fa3cedf1 ("fs: Add async write file modification
handling") file_update_time_flags() unconditionally returns -EAGAIN
when any timestamp needs updating and IOCB_NOWAIT is set. This makes
non-blocking direct writes impossible on file systems with granular
enough timestamps, which in practice means all of them.
This reworks the timestamp update path to propagate IOCB_NOWAIT
through ->update_time so that file systems which can update timestamps
without blocking are no longer penalized.
With that groundwork in place, the core change passes IOCB_NOWAIT into
->update_time and returns -EAGAIN only when the file system indicates
it would block.
XFS implements non-blocking timestamp updates by using the new
->sync_lazytime and open-coding generic_update_time without the
S_NOWAIT check, since the lazytime path through the generic helpers
can never block in XFS"
* tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: enable non-blocking timestamp updates
xfs: implement ->sync_lazytime
fs: refactor file_update_time_flags
fs: add support for non-blocking timestamp updates
fs: add a ->sync_lazytime method
fs: factor out a sync_lazytime helper
fs: refactor ->update_time handling
fat: cleanup the flags for fat_truncate_time
nfs: split nfs_update_timestamps
fs: allow error returns from generic_update_time
fs: remove inode_update_time
In gfs2_fiemap(), we are calling iomap_fiemap() while holding the inode
glock. This can lead to recursive glock taking if the fiemap buffer is
memory mapped to the same inode and accessing it triggers a page fault.
Fix by disabling page faults for iomap_fiemap() and faulting in the
buffer by hand if necessary.
Fixes xfstest generic/742.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fix two memory leaks in the gfs2_fill_super() error handling path when
transitioning a filesystem to read-write mode fails.
First leak: kthread objects (thread_struct, task_struct, etc.)
When gfs2_freeze_lock_shared() fails after init_threads() succeeds, the
created kernel threads (logd and quotad) are never destroyed. This
occurs because the fail_per_node label doesn't call
gfs2_destroy_threads().
Second leak: quota bitmap buffer (8192 bytes)
When gfs2_make_fs_rw() fails after gfs2_quota_init() succeeds but
before other operations complete, the allocated quota bitmap is never
freed.
The fix moves thread cleanup to the fail_per_node label to handle all
error paths uniformly. gfs2_destroy_threads() is safe to call
unconditionally as it checks for NULL pointers. Quota cleanup is added
in gfs2_make_fs_rw() to properly handle the withdrawal case where
quota initialization succeeds but the filesystem is then withdrawn.
Thread leak backtrace (gfs2_freeze_lock_shared failure):
unreferenced object 0xffff88801d7bca80 (size 4480):
copy_process+0x3a1/0x4670 kernel/fork.c:2422
kernel_clone+0xf3/0x6e0 kernel/fork.c:2779
kthread_create_on_node+0x100/0x150 kernel/kthread.c:478
init_threads+0xab/0x350 fs/gfs2/ops_fstype.c:611
gfs2_fill_super+0xe5c/0x1240 fs/gfs2/ops_fstype.c:1265
Quota leak backtrace (gfs2_make_fs_rw failure):
unreferenced object 0xffff88812de7c000 (size 8192):
gfs2_quota_init+0xe5/0x820 fs/gfs2/quota.c:1409
gfs2_make_fs_rw+0x7a/0xe0 fs/gfs2/super.c:149
gfs2_fill_super+0xfbb/0x1240 fs/gfs2/ops_fstype.c:1275
Reported-by: syzbot+aac438d7a1c44071e04b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aac438d7a1c44071e04b
Fixes: 6c7410f449 ("gfs2: gfs2_freeze_lock_shared cleanup")
Fixes: b66f723bb5 ("gfs2: Improve gfs2_make_fs_rw error handling")
Link: https://lore.kernel.org/all/20260131062509.77974-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The inline data buffer head (dibh) is being released prematurely in
gfs2_iomap_begin() via release_metapath() while iomap->inline_data
still points to dibh->b_data. This causes a use-after-free when
iomap_write_end_inline() later attempts to write to the inline data
area.
The bug sequence:
1. gfs2_iomap_begin() calls gfs2_meta_inode_buffer() to read inode
metadata into dibh
2. Sets iomap->inline_data = dibh->b_data + sizeof(struct gfs2_dinode)
3. Calls release_metapath() which calls brelse(dibh), dropping refcount
to 0
4. kswapd reclaims the page (~39ms later in the syzbot report)
5. iomap_write_end_inline() tries to memcpy() to iomap->inline_data
6. KASAN detects use-after-free write to freed memory
Fix by storing dibh in iomap->private and incrementing its refcount
with get_bh() in gfs2_iomap_begin(). The buffer is then properly
released in gfs2_iomap_end() after the inline write completes,
ensuring the page stays alive for the entire iomap operation.
Note: A C reproducer is not available for this issue. The fix is based
on analysis of the KASAN report and code review showing the buffer head
is freed before use.
[agruenba: Take buffer head reference in gfs2_iomap_begin() to avoid
leaks in gfs2_iomap_get() and gfs2_iomap_alloc().]
Reported-by: syzbot+ea1cd4aa4d1e98458a55@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ea1cd4aa4d1e98458a55
Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaXc4IwAKCRCRxhvAZXjc
oo0jAQDOV580l4wHiY6eT1QGY2QYa7u8fYDOi6mqfgHa+EH5twD9ETnQ0xQHIKYP
oruFJXLf3ihBBsum+pTpAO2XFVjM7Qs=
=pM8o
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix the the buggy conversion of fuse_reverse_inval_entry() introduced
during the creation rework
- Disallow nfs delegation requests for directories by setting
simple_nosetlease()
- Require an opt-in for getting readdir flag bits outside of S_DT_MASK
set in d_type
- Fix scheduling delayed writeback work by only scheduling when the
dirty time expiry interval is non-zero and cancel the delayed work if
the interval is set to zero
- Use rounded_jiffies_interval for dirty time work
- Check the return value of sb_set_blocksize() for romfs
- Wait for batched folios to be stable in __iomap_get_folio()
- Use private naming for fuse hash size
- Fix the stale dentry cleanup to prevent a race that causes a UAF
* tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: document d_dispose_if_unused()
fuse: shrink once after all buckets have been scanned
fuse: clean up fuse_dentry_tree_work()
fuse: add need_resched() before unlocking bucket
fuse: make sure dentry is evicted if stale
fuse: fix race when disposing stale dentries
fuse: use private naming for fuse hash size
writeback: use round_jiffies_relative for dirtytime_work
iomap: wait for batched folios to be stable in __iomap_get_folio
romfs: check sb_set_blocksize() return value
docs: clarify that dirtytime_expire_seconds=0 disables writeback
writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
readdir: require opt-in for d_type flags
vboxsf: don't allow delegations to be set on directories
ceph: don't allow delegations to be set on directories
gfs2: don't allow delegations to be set on directories
9p: don't allow delegations to be set on directories
smb/client: properly disallow delegations on directories
nfs: properly disallow delegation requests on directories
fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
Commit a475c5dd16 ("gfs2: Free quota data objects synchronously")
started freeing quota data objects during filesystem shutdown instead of
putting them back onto the LRU list, but it failed to remove these
objects from the LRU list, causing LRU list corruption. This caused
use-after-free when the shrinker (gfs2_qd_shrink_scan) tried to access
already-freed objects on the LRU list.
Fix this by removing qd objects from the LRU list before freeing them in
qd_put().
Initial fix from Deepanshu Kartikey <kartikey406@gmail.com>.
Fixes: a475c5dd16 ("gfs2: Free quota data objects synchronously")
Reported-by: syzbot+046b605f01802054bff0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=046b605f01802054bff0
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Introduce glock_type(), glock_number(), and glock_sbd() helpers for
accessing a glock's type, number, and super block pointer more easily.
Created with Coccinelle using the following semantic patch:
@@ struct gfs2_glock *gl; @@
- gl->gl_name.ln_type
+ glock_type(gl)
@@ struct gfs2_glock *gl; @@
- gl->gl_name.ln_number
+ glock_number(gl)
@@ struct gfs2_glock *gl; @@
- gl->gl_name.ln_sbd
+ glock_sbd(gl)
glock_sbd() is a macro because it is used with const as well as
non-const struct gfs2_glock * arguments.
Instances in macro definitions, particularly in tracepoint definitions,
replaced by hand.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
GL_GLOCK_MIN_HOLD represents the minimum time (in jiffies) that a glock
should be held before being eligible for release. It is currently
defined as 10, meaning that the duration depends on the timer interrupt
frequency (CONFIG_HZ). Change that time to a constant 10ms independent
of CONFIG_HZ. On CONFIG_HZ=1000 systems, the value remains the same.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fix the type of gfs2_log_get_bio()'s op argument: callers pass in a
blk_opf_t value and the function passes that value on as a blk_opf_t
value, so the current argument type makes no sense.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pass the start sector into gfs2_chain_bio(): the new bio isn't
necessarily contiguous with the previous one.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pass the right blk_opf_t value to bio_alloc() so that ->bi_ops is
initialized correctly and doesn't have to be changed later. Adjust the
call chain to pass that value through to where it is needed (and only
there).
Add a separate blk_opf_t argument to gfs2_chain_bio() instead of copying
the value from the previous bio.
Fixes: 8a157e0a0a ("gfs2: Fix use of bio_chain")
Reported-by: syzbot+f6539d4ce3f775aee0cc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f6539d4ce3f775aee0cc
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename gfs2_log_submit_bio() to gfs2_log_submit_write(): this function
isn't used for submitting log reads.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Trying to change the state of a glock may result in a "conversion
deadlock" error, indicating that the requested state transition would
cause a deadlock. In this case, we unlock and retry the state change.
It makes no sense to try canceling those unlock requests, but the
current code is not aware of that.
In addition, if a locking request is canceled shortly after it is made,
the cancelation request can currently overtake the locking request.
This may result in deadlocks.
Fix both of these bugs by repurposing the GLF_PENDING_REPLY flag into a
GLF_MAY_CANCEL flag which is set only when the current locking request
can be canceled. When trying to cancel a locking request in
gfs2_glock_dq(), wait for this flag to be set.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In run_queue(), instead of always setting the GLF_LOCK flag, only set it
when the flag is actually needed. This avoids having to undo the flag
setting later.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fix a bug in gfs2's asynchronous glock handling for rename and exchange
operations. The original async implementation from commit ad26967b9a
("gfs2: Use async glocks for rename") mentioned that retries were needed
but never implemented them, causing operations to fail with -ESTALE
instead of retrying on timeout.
Also makes the waiting interruptible.
In addition, the timeouts used were too high for situations in which
timing out is a rare but expected scenario. Switch to shorter timeouts
with randomization and exponentional backoff.
Fixes: ad26967b9a ("gfs2: Use async glocks for rename")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When an asynchronous glock holder is dequeued that hasn't been granted
yet (HIF_HOLDER not set) and no dlm operation is in progress on behalf
of that holder (GLF_LOCK not set), the dequeuing takes place in
__gfs2_glock_dq(). There, we are not clearing the HIF_WAIT flag and
waking up waiters. Fix that.
This bug prevents the same holder from being enqueued later (gfs2_glock_nq())
without first reinitializing it (gfs2_holder_reinit()). The code
doesn't currently use this pattern, but this will change in the next
commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.
Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.
This change has been build-tested with allmodconfig on most ARCHes. Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).
Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Without exception all caller do that. So move the allocation into the
helper.
This reduces boilerplate and removes unnecessary error checking.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit 8a157e0a0a.
That commit incorrectly assumed that the bio_chain() arguments were
swapped in gfs2. However, gfs2 intentionally constructs bio chains so
that the first bio's bi_end_io callback is invoked when all bios in the
chain have completed, unlike bio chains where the last bio's callback is
invoked.
Fixes: 8a157e0a0a ("gfs2: Fix use of bio_chain")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Currently file_update_time_flags unconditionally returns -EAGAIN if any
timestamp needs to be updated and IOCB_NOWAIT is passed. This makes
non-blocking direct writes impossible on file systems with granular
enough timestamps.
Pass IOCB_NOWAIT to ->update_time and return -EAGAIN if it could block.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-9-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Pass the type of update (atime vs c/mtime plus version) as an enum
instead of a set of flags that caused all kinds of confusion.
Because inode_update_timestamps now can't return a modified version
of those flags, return the I_DIRTY_* flags needed to persist the
update, which is what the main caller in generic_update_time wants
anyway, and which is suitable for the other callers that only want
to know if an update happened.
The whole update_time path keeps the flags argument, which will be used
to support non-blocking updates soon even if it is unused, and (the
slightly renamed) inode_update_time also gains the possibility to return
a negative errno to support this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-6-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Now that no caller looks at the updated flags, switch generic_update_time
to the same calling convention as the ->update_time method and return 0
or a negative errno.
This prepares for adding non-blocking timestamp updates that could return
-EAGAIN.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-3-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Setting ->setlease() to a NULL pointer now has the same effect as
setting it to simple_nosetlease(). Remove all of the setlease
file_operations that are set to simple_nosetlease, and the function
itself.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-24-ea4dec9b67fa@kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
gfs2_file_fops_nolock() already has this explicitly set, so it's only
necessary to set this in gfs2_dir_fops_nolock(). A future patch
will change the default behavior to reject lease attempts with -EINVAL
when there is no setlease file operation defined.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-10-ea4dec9b67fa@kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
With the advent of directory leases, it's necessary to set the
->setlease() handler in directory file_operations to properly deny them.
In the "nolock" case however, there is no need to deny them.
Fixes: e6d28ebc17 ("filelock: push the S_ISREG check down to ->setlease handlers")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260107-setlease-6-19-v1-4-85f034abcc57@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
- Major withdraw / error handling overhaul based on dlm's new
DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
node failures. Make withdraws asynchronous.
- Fix a bug in commit e4a8b5481c that caused 'df' to remain out of
sync. ('df' is still allowed to go slightly out of sync for short
periods of time.)
- Prevent recusive memory reclaim in gfs2_unstuff_dinode().
- Clean up SDF_JOURNAL_LIVE flag handling.
- Fix remote evict for read-only filesystems.
- Fix a misuse of bio_chain().
- Various other minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmkvM/oUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqFLQ//Ra+FUIMYRrU83xd8OB+rXaFixymk
riPnoXZHrccoRAARc6d/p/nTYmV7FSnTa2jlsUFshsO2jl/1OIRI6xqScL+W5dp9
ToxHKnezOsRmy5tXvomYy9Teo/ngz6dpv6ixVrn7qTGJJ5zpir0ji2fO+xUvCUQq
IVGgvQN0O7ECUmmRXxjkRLhoEdus2Ye4jQkpJmqY9I1+H/0E43HlF2zDgoisTkiH
baNtFkTgE65Hba2xhoTCt3NO17POgqGcRaLVOLv5dqGhLAQ33+jXeYduM6ujbqwk
1FznGf09NRBJVQkyJvbgZELdX+mWXvWf/2kPmW6TXd7phFiO7cByxN2aHHcu/wBp
kh0ZgLFtvjyuR7wiyKf0tAKKPh250YiSkBuJU64dlHha2ZQuME6Z4dlYqBKwvPXf
AwxAu0tB2bAjc2OXvPaHhLHjdpySEWqNZIITbfQPVPonRKed10IqCwlTd+j5/64H
mnZCCxr2p2akrSYbQjl7h1k4LO7eX3cYJ23JoKIjwQ2hJE+Pd4li9+xhQCEQUVyE
ou/ezUKc19Xe7uxG1412PrbmOtIMgKfpq2KW151b1KA3pfCUt7GMvpYlvmnf9Wzs
XGjrQ1p4XdlxzQqe9XTrdO0IrNUNGDcky1Kqishje9SPa8FthwrtTuBSYL8nkcX6
3KRRzlFCRHfM5ng=
=AWK2
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Major withdraw / error handling overhaul based on dlm's new
DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
node failures. Make withdraws asynchronous
- Fix a bug in commit e4a8b5481c that caused 'df' to remain out of
sync. ('df' is still allowed to go slightly out of sync for short
periods of time)
- Prevent recusive memory reclaim in gfs2_unstuff_dinode()
- Clean up SDF_JOURNAL_LIVE flag handling
- Fix remote evict for read-only filesystems
- Fix a misuse of bio_chain()
- Various other minor cleanups
* tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits)
gfs2: Fix use of bio_chain
gfs2: Clean up SDF_JOURNAL_LIVE flag handling
gfs2: No longer thaw filesystems during a withdraw
gfs2: Withdraw immediately in gfs2_trans_add_meta
gfs2: New gfs2_withdraw_helper
gfs2: Clean up properly during a withdraw
gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks}
Revert "gfs2: fix infinite loop when checking ail item count before go_inval"
Revert "gfs2: Allow some glocks to be used during withdraw"
Revert "gfs2: Check for log write errors before telling dlm to unlock"
Revert "gfs2: fix a deadlock on withdraw-during-mount"
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6)
Revert "gfs2: don't stop reads while withdraw in progress"
gfs2: Rename LM_FLAG_{NOEXP -> RECOVER}
gfs2: Kill gfs2_io_error_bh_wd
...
In gfs2_chain_bio(), the call to bio_chain() has its arguments swapped.
The result is leaked bios and incorrect synchronization (only the last
bio will actually be waited for). This code is only used during mount
and filesystem thaw, so the bug normally won't be noticeable.
Reported-by: Stephen Zhang <starzhangzsd@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
onGBAQDtqeO0jZzS7q9UxlJ84Wj/H9w+9INpO4jMxtWK4svhUAEAghG4qVxRvkE2
Qh+wrpTPIC7OCQ78k8psDRmkj9cn8QA=
=FCVN
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull folio updates from Christian Brauner:
"Add a new folio_next_pos() helper function that returns the file
position of the first byte after the current folio. This is a common
operation in filesystems when needing to know the end of the current
folio.
The helper is lifted from btrfs which already had its own version, and
is now used across multiple filesystems and subsystems:
- btrfs
- buffer
- ext4
- f2fs
- gfs2
- iomap
- netfs
- xfs
- mm
This fixes a long-standing bug in ocfs2 on 32-bit systems with files
larger than 2GiB. Presumably this is not a common configuration, but
the fix is backported anyway. The other filesystems did not have bugs,
they were just mildly inefficient.
This also introduce uoff_t as the unsigned version of loff_t. A recent
commit inadvertently changed a comparison from being unsigned (on
64-bit systems) to being signed (which it had always been on 32-bit
systems), leading to sporadic fstests failures.
Generally file sizes are restricted to being a signed integer, but in
places where -1 is passed to indicate "up to the end of the file", it
is convenient to have an unsigned type to ensure comparisons are
always unsigned regardless of architecture"
* tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: Add uoff_t
mm: Use folio_next_pos()
xfs: Use folio_next_pos()
netfs: Use folio_next_pos()
iomap: Use folio_next_pos()
gfs2: Use folio_next_pos()
f2fs: Use folio_next_pos()
ext4: Use folio_next_pos()
buffer: Use folio_next_pos()
btrfs: Use folio_next_pos()
filemap: Add folio_next_pos()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
or4UAP9FbpFsZd0DpsYnKuv7kFepl291PuR0x2dKmseJ/wcf8AEAzI8FR5wd/fey
25ZNdExoUojAOj5wVn+jUep3u54jBws=
=/toi
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull writeback updates from Christian Brauner:
"Features:
- Allow file systems to increase the minimum writeback chunk size.
The relatively low minimal writeback size of 4MiB means that
written back inodes on rotational media are switched a lot. Besides
introducing additional seeks, this also can lead to extreme file
fragmentation on zoned devices when a lot of files are cached
relative to the available writeback bandwidth.
This adds a superblock field that allows the file system to
override the default size, and sets it to the zone size for zoned
XFS.
- Add logging for slow writeback when it exceeds
sysctl_hung_task_timeout_secs. This helps identify tasks waiting
for a long time and pinpoint potential issues. Recording the
starting jiffies is also useful when debugging a crashed vmcore.
- Wake up waiting tasks when finishing the writeback of a chunk
Cleanups:
- filemap_* writeback interface cleanups.
Adding filemap_fdatawrite_wbc ended up being a mistake, as all but
the original btrfs caller should be using better high level
interfaces instead.
This series removes all these low-level interfaces, switches btrfs
to a more specific interface, and cleans up other too low-level
interfaces. With this the writeback_control that is passed to the
writeback code is only initialized in three places.
- Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and
filemap_fdatawrite_wbc
- Add filemap_flush_nr helper for btrfs
- Push struct writeback_control into start_delalloc_inodes in btrfs
- Rename filemap_fdatawrite_range_kick to filemap_flush_range
- Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm
- Make wbc_to_tag() inline and use it in fs"
* tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: Make wbc_to_tag() inline and use it in fs.
xfs: set s_min_writeback_pages for zoned file systems
writeback: allow the file system to override MIN_WRITEBACK_PAGES
writeback: cleanup writeback_chunk_size
mm: rename filemap_fdatawrite_range_kick to filemap_flush_range
mm: remove __filemap_fdatawrite_range
mm: remove filemap_fdatawrite_wbc
mm: remove __filemap_fdatawrite
mm,btrfs: add a filemap_flush_nr helper
btrfs: push struct writeback_control into start_delalloc_inodes
btrfs: use the local tmp_inode variable in start_delalloc_inodes
ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers
9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close
mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode
writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs)
writeback: Wake up waiting tasks when finishing the writeback of a chunk.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc
omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC
RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM=
=zo/J
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"Features:
- Hide inode->i_state behind accessors. Open-coded accesses prevent
asserting they are done correctly. One obvious aspect is locking,
but significantly more can be checked. For example it can be
detected when the code is clearing flags which are already missing,
or is setting flags when it is illegal (e.g., I_FREEING when
->i_count > 0)
- Provide accessors for ->i_state, converts all filesystems using
coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
compile
- Rework I_NEW handling to operate without fences, simplifying the
code after the accessor infrastructure is in place
Cleanups:
- Move wait_on_inode() from writeback.h to fs.h
- Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
for clarity
- Cosmetic fixes to LRU handling
- Push list presence check into inode_io_list_del()
- Touch up predicts in __d_lookup_rcu()
- ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
- Assert on ->i_count in iput_final()
- Assert ->i_lock held in __iget()
Fixes:
- Add missing fences to I_NEW handling"
* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
dcache: touch up predicts in __d_lookup_rcu()
fs: push list presence check into inode_io_list_del()
fs: cosmetic fixes to lru handling
fs: rework I_NEW handling to operate without fences
fs: make plain ->i_state access fail to compile
xfs: use the new ->i_state accessors
nilfs2: use the new ->i_state accessors
overlayfs: use the new ->i_state accessors
gfs2: use the new ->i_state accessors
f2fs: use the new ->i_state accessors
smb: use the new ->i_state accessors
ceph: use the new ->i_state accessors
btrfs: use the new ->i_state accessors
Manual conversion to use ->i_state accessors of all places not covered by coccinelle
Coccinelle-based conversion to use ->i_state accessors
fs: provide accessors for ->i_state
fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
fs: move wait_on_inode() from writeback.h to fs.h
fs: add missing fences to I_NEW handling
ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
...
Change do_withdraw() to clear the SDF_JOURNAL_LIVE flag under the log
flush lock. In addition, change __gfs2_trans_begin() to check if the
filesystem is already known to be withdrawn using gfs2_withdrawn().
Then, once we are holding the log flush lock, check if the
SDF_JOURNAL_LIVE flag is still set. This second check ensures that the
filesystem will remain live until the transaction is submitted.
With these changes, it is no longer useful to clear SDF_JOURNAL_LIVE in
gfs2_end_log_write() after calling gfs2_withdraw().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Previously, when a withdraw occurred, we would wait for another node to
recover our journal. This also meant that frozen filesystem needed to
be thawed because otherwise, other nodes wouldn't be able to recover the
filesystem. With the reversal of commit 601ef0d52e ("gfs2: Force
withdraw to replay journals and wait for it to finish"), we are no
longer waiting for journal recovery during a withdraw, so we no longer
need to thaw frozen filesystems, either. This also fixes a potential
deadlock reported by lockdep when running xfstest generic/108.
In addition, there is nothing left in do_withdraw() that would require
taking sd_freeze_mutex, so don't bother taking that lock there anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Currently, when a gfs2 filesystem is withdrawn, an "offline" uevent is
triggered that invokes gfs2-util's gfs2_withdraw_helper script. The
purpose of this script is to deactivate the filesystem's block device so
that it can be withdrawn immediately, even before all the filesystem's
caches have been discarded. The script provided by gfs2-utils never did
anything useful, and there was no way for it to report back its status
to the kernel.
To fix that, extend the gfs2_withdraw_helper mechanism so that the
script can report one of the following results by writing the
corresponding value into "/sys$DEVPATH/lock_module/withdraw":
0 - The shared block device has been marked inactive. Future write
operations will fail.
1 - The shared block device may still be active and carry out
write operations.
If the "offline" uevent isn't reacted upon within the timeout configured
in /sys$DEVPATH/tune/withdraw_helper_timeout (default 5 seconds), the
event handler is assumed to have failed.
In addition, add an additional "errors=deactivate" mount option.
With these changes, if fatal errors are detected on a gfs2 filesystem
and the filesystem is mounted with the "errors=panic" option, the kernel
will panic immediately. Otherwise, an attempt will be made to
deactivate the underlying block device. If successful, the kernel will
release all cluster-wide locks immediately so that the rest of the
cluster can continue. If unsuccessful, the kernel will either panic
("errors=deactivate"), or it will purge all filesystem I/O before
releasing all cluster-wide locks ("errors=withdraw").
Note that the gfs2_withdraw_helper script still needs to be fixed to
take advantage of these improvements. It could be changed to use a
mechanism like LVM Persistent Reservations. "dmsetup suspend" is not a
suitable mechanism as it infinitely postpones I/O operations, which may
prevent withdraw from completing.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
During a withdraw, we don't want to write out any more data than we have
to, so in do_xmote(), skip the ->go_sync() glock operation. We still
want to keep calling ->go_inval() to discard any cached data or
metadata, whether clean or dirty.
We do still allow glocks to transition into state LM_ST_UNLOCKED. This
has the desired side effect of calling ->go_inval() and invalidating the
glock caches.
Function gfs2_withdraw_glocks() is already used for dequeuing any
left-over waiters. We still want that to happen, but additionally, we
want all glocks to be unlocked.
Finally, we change function do_promote() to refuse any further
promotions.
This commit cleans up the leftovers of commit 86934198ee ("gfs2: Clear
flags when withdraw prevents xmote").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename function gfs2_gl_dq_holders() to gfs2_withdraw_glocks(). This
function will soon be used for more than just dequeuing holders.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts commit 33dbd1e41a ("gfs2: fix infinite loop when checking ail
item count before go_inval").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts commit a72d2401f5 ("gfs2: Allow some glocks to be used during
withdraw").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts the rest of d93ae386ef ("gfs2: Check for log write errors
before telling dlm to unlock").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts commit 865cc3e9cc ("gfs2: fix a deadlock on
withdraw-during-mount").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts parts of commit 601ef0d52e ("gfs2: Force withdraw to replay
journals and wait for it to finish").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts parts of commit 601ef0d52e ("gfs2: Force withdraw to replay
journals and wait for it to finish").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly. That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery. We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.
Reverts parts of commit 601ef0d52e ("gfs2: Force withdraw to replay
journals and wait for it to finish").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>