3175 commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
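As a rough illustration of what the sed expression in bf4afc53b7 does, the following Python sketch applies an equivalent regular expression one line at a time. The translation of the pattern into Python syntax and the sample call sites (`struct foo`, `p`, `buf`, `q`, `r`) are assumptions made for illustration; they are not taken from the kernel tree.

```python
import re

# Python spelling of the sed expression quoted in the commit message:
#   s/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/
# "objs*" matches both the _obj and the _objs variants.
PATTERN = re.compile(r"(alloc_objs*\(.*), GFP_KERNEL\)")


def drop_default_gfp(source: str) -> str:
    """Apply the substitution once per line, the way sed does without /g."""
    return "\n".join(
        PATTERN.sub(r"\1)", line, count=1) for line in source.split("\n")
    )


# Hypothetical call sites, for illustration only.
single = "p = kmalloc_obj(*p, GFP_KERNEL);"
array = "buf = kzalloc_objs(struct foo, n, GFP_KERNEL);"
atomic = "q = kmalloc_obj(*q, GFP_ATOMIC);"
multi_line = "r = kzalloc_objs(struct foo, n,\n                 GFP_KERNEL);"

for sample in (single, array, atomic, multi_line):
    print(drop_default_gfp(sample))
```

The last sample shows the limitation the commit message mentions: because the pattern is matched within a single line, a call whose GFP_KERNEL argument sits on a continuation line is left untouched.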
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
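The three rewrite shapes listed in 69050f8d6d can be approximated textually. The sketch below is not the Coccinelle script; it is a regex-based stand-in that covers only plain kmalloc()/kmalloc_array() and recognises TYPE purely by spelling ("struct X", "union X" or "*VAR"), whereas Coccinelle works on real type information. All sample call sites are hypothetical.

```python
import re

# Textual heuristic for TYPE: "struct X", "union X" or "*VAR".
# Unlike Coccinelle, this cannot check what VAR actually points to.
TYPE = r"((?:struct|union)\s+\w+|\*\w+)"

RULES = [
    # kmalloc(sizeof(TYPE), ...) -> kmalloc_obj(TYPE, ...)
    (re.compile(r"\bkmalloc\(sizeof\(" + TYPE + r"\),\s*"),
     r"kmalloc_obj(\1, "),
    # kmalloc_array(COUNT, sizeof(TYPE), ...) -> kmalloc_objs(TYPE, COUNT, ...)
    (re.compile(r"\bkmalloc_array\(([^(),]+),\s*sizeof\(" + TYPE + r"\),\s*"),
     r"kmalloc_objs(\2, \1, "),
    # kmalloc(struct_size(PTR, FAM, COUNT), ...) -> kmalloc_flex(*PTR, FAM, COUNT, ...)
    (re.compile(r"\bkmalloc\(struct_size\((\w+),\s*(\w+),\s*([^(),]+)\),\s*"),
     r"kmalloc_flex(*\1, \2, \3, "),
]


def rewrite(line: str) -> str:
    """Apply each rewrite rule in turn to one line of source text."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


# Hypothetical call sites, for illustration only.
print(rewrite("p = kmalloc(sizeof(*p), GFP_KERNEL);"))
print(rewrite("a = kmalloc_array(n, sizeof(struct foo), GFP_KERNEL);"))
print(rewrite("f = kmalloc(struct_size(f, items, n), GFP_KERNEL);"))
print(rewrite("x = kmalloc(sizeof(int), GFP_KERNEL);"))  # scalar: left alone
```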
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
7141433fbe gfs2 changes
- Prevent rename() from failing with -ESTALE when there are locking
   conflicts and retry the operation instead.
 
 - Don't fail when fiemap triggers a page fault (xfstest generic/742).
 
 - Fix another locking request cancellation bug.
 
 - Minor other fixes and cleanups.

Merge tag 'gfs2-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Prevent rename() from failing with -ESTALE when there are locking
   conflicts and retry the operation instead

 - Don't fail when fiemap triggers a page fault (xfstest generic/742)

 - Fix another locking request cancellation bug

 - Minor other fixes and cleanups

* tag 'gfs2-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: fiemap page fault fix
  gfs2: fix memory leaks in gfs2_fill_super error path
  gfs2: Fix use-after-free in iomap inline data write path
  gfs2: Fix slab-use-after-free in qd_put
  gfs2: Introduce glock_{type,number,sbd} helpers
  gfs2: gfs2_glock_hold cleanup
  gfs: Use fixed GL_GLOCK_MIN_HOLD time
  gfs2: Fix gfs2_log_get_bio argument type
  gfs2: gfs2_chain_bio start sector fix
  gfs2: Initialize bio->bi_opf early
  gfs2: Rename gfs2_log_submit_{bio -> write}
  gfs2: Do not cancel internal demote requests
  gfs2: run_queue cleanup
  gfs2: Retries missing in gfs2_{rename,exchange}
  gfs2: glock cancelation flag fix
2026-02-09 16:29:57 -08:00
Linus Torvalds
9e355113f0 vfs-7.0-rc1.misc
Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag.
 
 Thanks!
 Christian

Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains a mix of VFS cleanups, performance improvements, API
  fixes, documentation, and a deprecation notice.

  Scalability and performance:

   - Rework pid allocation to only take pidmap_lock once instead of
     twice during alloc_pid(), improving thread creation/teardown
     throughput by 10-16% depending on false-sharing luck. Pad the
     namespace refcount to reduce false-sharing

   - Track file lock presence via a flag in ->i_opflags instead of
     reading ->i_flctx, avoiding false-sharing with ->i_readcount on
     open/close hot paths. Measured 4-16% improvement on 24-core
     open-in-a-loop benchmarks

   - Use a consume fence in locks_inode_context() to match the
     store-release/load-consume idiom, eliminating a hardware fence on
     some architectures

   - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent
     false-sharing

   - Remove a redundant DCACHE_MANAGED_DENTRY check in
     __follow_mount_rcu() that never fires since the caller already
     verifies it, eliminating a 100% mispredicted branch

   - Fix a 100% mispredicted likely() in devcgroup_inode_permission()
     that became wrong after a prior code reorder

  Bug fixes and correctness:

   - Make insert_inode_locked() wait for inode destruction instead of
     skipping, fixing a corner case where two matching inodes could
     exist in the hash

   - Move f_mode initialization before file_ref_init() in alloc_file()
     to respect the SLAB_TYPESAFE_BY_RCU ordering contract

   - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with
     no buffers attached, preventing a null pointer dereference when
     AS_RELEASE_ALWAYS is set but no release_folio op exists

   - Fix select restart_block to store end_time as timespec64, avoiding
     truncation of tv_sec on 32-bit architectures

   - Make dump_inode() use get_kernel_nofault() to safely access inode
     and superblock fields, matching the dump_mapping() pattern

  API modernization:

   - Make posix_acl_to_xattr() allocate the buffer internally since
     every single caller was doing it anyway. Reduces boilerplate and
     unnecessary error checking across ~15 filesystems

   - Replace deprecated simple_strtoul() with kstrtoul() for the
     ihash_entries, dhash_entries, mhash_entries, and mphash_entries
     boot parameters, adding proper error handling

   - Convert chardev code to use guard(mutex) and __free(kfree) cleanup
     patterns

   - Replace min_t() with min() or umin() in VFS code to avoid silently
     truncating unsigned long to unsigned int

   - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers
     already check the flag

  Deprecation:

   - Begin deprecating legacy BSD process accounting (acct(2)). The
     interface has numerous footguns and better alternatives exist
     (eBPF)

  Documentation:

   - Fix and complete kernel-doc for struct export_operations, removing
     duplicated documentation between ReST and source

   - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait()

  Testing:

   - Add a kunit test for initramfs cpio handling of entries with
     filesize > PATH_MAX

  Misc:

   - Add missing <linux/init_task.h> include in fs_struct.c"

* tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  posix_acl: make posix_acl_to_xattr() alloc the buffer
  fs: make insert_inode_locked() wait for inode destruction
  initramfs_test: kunit test for cpio.filesize > PATH_MAX
  fs: improve dump_inode() to safely access inode fields
  fs: add <linux/init_task.h> for 'init_fs'
  docs: exportfs: Use source code struct documentation
  fs: move initializing f_mode before file_ref_init()
  exportfs: Complete kernel-doc for struct export_operations
  exportfs: Mark struct export_operations functions at kernel-doc
  exportfs: Fix kernel-doc output for get_name()
  acct(2): begin the deprecation of legacy BSD process accounting
  device_cgroup: remove branch hint after code refactor
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()
  fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
  fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
  select: store end_time as timespec64 in restart block
  chardev: Switch to guard(mutex) and __free(kfree)
  namespace: Replace simple_strtoul with kstrtoul to parse boot params
  dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
  ...
2026-02-09 15:13:05 -08:00
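The min_t() item in the pull message above concerns silent truncation. The following Python model is a sketch of that hazard, assuming a 64-bit unsigned long and a 32-bit unsigned int; the function names and the sample values are illustrative and do not come from the VFS code.

```python
U32_MASK = 0xFFFFFFFF  # assumption: unsigned int is 32 bits wide


def min_t_uint(a: int, b: int) -> int:
    """Model min_t(unsigned int, a, b): cast both operands, then compare."""
    return min(a & U32_MASK, b & U32_MASK)


def full_width_min(a: int, b: int) -> int:
    """Model a minimum that compares the operands at their full width."""
    return min(a, b)


# Hypothetical values: a byte count that needs more than 32 bits, and a
# small per-iteration limit.
remaining = 1 << 32
chunk = 4096

truncated = min_t_uint(remaining, chunk)    # the cast wraps 1 << 32 to 0 first
expected = full_width_min(remaining, chunk)
print(truncated, expected)
```

With the cast applied first, the larger operand wraps to zero and wins the comparison, which is the kind of result the switch to min() or umin() is meant to rule out.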
Linus Torvalds
aa2a0fcd4c vfs-7.0-rc1.leases
Please consider pulling these changes from the signed vfs-7.0-rc1.leases tag.
 
 Thanks!
 Christian

Merge tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs lease updates from Christian Brauner:
 "This contains updates for lease support to require filesystems to
  explicitly opt-in to lease support

  Currently kernel_setlease() falls through to generic_setlease() when
  a filesystem does not define ->setlease(), silently granting lease
  support to every filesystem regardless of whether it is prepared for
  it.

  This is a poor default: most filesystems never intended to support
  leases, and the silent fallthrough makes it impossible to distinguish
  "supports leases" from "never thought about it".

  This inverts the default. It adds explicit

	.setlease = generic_setlease;

  assignments to every in-tree filesystem that should retain lease
  support, then changes kernel_setlease() to return -EINVAL when
  ->setlease is NULL.

  With the new default in place, simple_nosetlease() is redundant and
  is removed along with all references to it"

* tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  fuse: add setlease file operation
  fs: remove simple_nosetlease()
  filelock: default to returning -EINVAL when ->setlease operation is NULL
  xfs: add setlease file operation
  ufs: add setlease file operation
  udf: add setlease file operation
  tmpfs: add setlease file operation
  squashfs: add setlease file operation
  overlayfs: add setlease file operation
  orangefs: add setlease file operation
  ocfs2: add setlease file operation
  ntfs3: add setlease file operation
  nilfs2: add setlease file operation
  jfs: add setlease file operation
  jffs2: add setlease file operation
  gfs2: add a setlease file operation
  fat: add setlease file operation
  f2fs: add setlease file operation
  exfat: add setlease file operation
  ext4: add setlease file operation
  ...
2026-02-09 11:59:07 -08:00
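The inverted default described in the lease pull message can be modelled in a few lines. This is a sketch only: the dictionary standing in for a file_operations table, the function signatures, and the always-successful generic_setlease() stub are assumptions for illustration, not the kernel's interfaces.

```python
import errno


def generic_setlease(file, arg):
    """Stand-in for the generic lease implementation; always succeeds here."""
    return 0


def kernel_setlease_old(file_ops, file, arg):
    """Old behaviour: a missing ->setlease falls through to the generic one."""
    setlease = file_ops.get("setlease")
    if setlease is None:
        return generic_setlease(file, arg)
    return setlease(file, arg)


def kernel_setlease_new(file_ops, file, arg):
    """New behaviour: a missing ->setlease means leases are not supported."""
    setlease = file_ops.get("setlease")
    if setlease is None:
        return -errno.EINVAL
    return setlease(file, arg)


# A filesystem that opted in explicitly, and one that never set the hook.
opted_in = {"setlease": generic_setlease}
never_considered = {}

print(kernel_setlease_old(never_considered, None, 0))
print(kernel_setlease_new(never_considered, None, 0))
print(kernel_setlease_new(opted_in, None, 0))
```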
Linus Torvalds
74554251df vfs-7.0-rc1.nonblocking_timestamps
Please consider pulling these changes from the signed vfs-7.0-rc1.nonblocking_timestamps tag.
 
 Thanks!
 Christian

Merge tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs timestamp updates from Christian Brauner:
 "This contains the changes to support non-blocking timestamp updates.

  Since commit 66fa3cedf1 ("fs: Add async write file modification
  handling") file_update_time_flags() unconditionally returns -EAGAIN
  when any timestamp needs updating and IOCB_NOWAIT is set. This makes
  non-blocking direct writes impossible on file systems with granular
  enough timestamps, which in practice means all of them.

  This reworks the timestamp update path to propagate IOCB_NOWAIT
  through ->update_time so that file systems which can update timestamps
  without blocking are no longer penalized.

  With that groundwork in place, the core change passes IOCB_NOWAIT into
  ->update_time and returns -EAGAIN only when the file system indicates
  it would block.

  XFS implements non-blocking timestamp updates by using the new
  ->sync_lazytime and open-coding generic_update_time without the
  S_NOWAIT check, since the lazytime path through the generic helpers
  can never block in XFS"

* tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: enable non-blocking timestamp updates
  xfs: implement ->sync_lazytime
  fs: refactor file_update_time_flags
  fs: add support for non-blocking timestamp updates
  fs: add a ->sync_lazytime method
  fs: factor out a sync_lazytime helper
  fs: refactor ->update_time handling
  fat: cleanup the flags for fat_truncate_time
  nfs: split nfs_update_timestamps
  fs: allow error returns from generic_update_time
  fs: remove inode_update_time
2026-02-09 11:25:01 -08:00
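The behavioural change in the timestamp pull message can be sketched as a before/after pair. Everything here is a simplified model: the flag value, the function names, and the two toy filesystems are hypothetical and only mirror the control flow the message describes.

```python
import errno

IOCB_NOWAIT = 1 << 0  # illustrative flag value, not the kernel's definition


def file_update_time_old(needs_update, flags):
    """Old flow: any pending timestamp update under NOWAIT gives -EAGAIN."""
    if needs_update and flags & IOCB_NOWAIT:
        return -errno.EAGAIN
    return 0


def file_update_time_new(needs_update, flags, fs_update_time):
    """New flow: pass the flag down and let the filesystem decide."""
    if not needs_update:
        return 0
    return fs_update_time(flags)


def blocking_fs_update_time(flags):
    """Toy filesystem whose timestamp update might sleep."""
    return -errno.EAGAIN if flags & IOCB_NOWAIT else 0


def nonblocking_fs_update_time(flags):
    """Toy filesystem that can always update timestamps without blocking."""
    return 0


print(file_update_time_old(True, IOCB_NOWAIT))
print(file_update_time_new(True, IOCB_NOWAIT, nonblocking_fs_update_time))
print(file_update_time_new(True, IOCB_NOWAIT, blocking_fs_update_time))
```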
Andreas Gruenbacher
e411d74cc5 gfs2: fiemap page fault fix
In gfs2_fiemap(), we are calling iomap_fiemap() while holding the inode
glock.  This can lead to recursive glock taking if the fiemap buffer is
memory mapped to the same inode and accessing it triggers a page fault.

Fix by disabling page faults for iomap_fiemap() and faulting in the
buffer by hand if necessary.

Fixes xfstest generic/742.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-02-05 18:00:45 +01:00
Deepanshu Kartikey
da6f5bbc2e gfs2: fix memory leaks in gfs2_fill_super error path
Fix two memory leaks in the gfs2_fill_super() error handling path when
transitioning a filesystem to read-write mode fails.

First leak: kthread objects (thread_struct, task_struct, etc.)
When gfs2_freeze_lock_shared() fails after init_threads() succeeds, the
created kernel threads (logd and quotad) are never destroyed. This
occurs because the fail_per_node label doesn't call
gfs2_destroy_threads().

Second leak: quota bitmap buffer (8192 bytes)
When gfs2_make_fs_rw() fails after gfs2_quota_init() succeeds but
before other operations complete, the allocated quota bitmap is never
freed.

The fix moves thread cleanup to the fail_per_node label to handle all
error paths uniformly. gfs2_destroy_threads() is safe to call
unconditionally as it checks for NULL pointers. Quota cleanup is added
in gfs2_make_fs_rw() to properly handle the withdrawal case where
quota initialization succeeds but the filesystem is then withdrawn.

Thread leak backtrace (gfs2_freeze_lock_shared failure):
  unreferenced object 0xffff88801d7bca80 (size 4480):
    copy_process+0x3a1/0x4670 kernel/fork.c:2422
    kernel_clone+0xf3/0x6e0 kernel/fork.c:2779
    kthread_create_on_node+0x100/0x150 kernel/kthread.c:478
    init_threads+0xab/0x350 fs/gfs2/ops_fstype.c:611
    gfs2_fill_super+0xe5c/0x1240 fs/gfs2/ops_fstype.c:1265

Quota leak backtrace (gfs2_make_fs_rw failure):
  unreferenced object 0xffff88812de7c000 (size 8192):
    gfs2_quota_init+0xe5/0x820 fs/gfs2/quota.c:1409
    gfs2_make_fs_rw+0x7a/0xe0 fs/gfs2/super.c:149
    gfs2_fill_super+0xfbb/0x1240 fs/gfs2/ops_fstype.c:1275

Reported-by: syzbot+aac438d7a1c44071e04b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aac438d7a1c44071e04b
Fixes: 6c7410f449 ("gfs2: gfs2_freeze_lock_shared cleanup")
Fixes: b66f723bb5 ("gfs2: Improve gfs2_make_fs_rw error handling")
Link: https://lore.kernel.org/all/20260131062509.77974-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-02-03 10:57:54 +01:00
Deepanshu Kartikey
faddeb8483 gfs2: Fix use-after-free in iomap inline data write path
The inline data buffer head (dibh) is being released prematurely in
gfs2_iomap_begin() via release_metapath() while iomap->inline_data
still points to dibh->b_data. This causes a use-after-free when
iomap_write_end_inline() later attempts to write to the inline data
area.

The bug sequence:
1. gfs2_iomap_begin() calls gfs2_meta_inode_buffer() to read inode
   metadata into dibh
2. Sets iomap->inline_data = dibh->b_data + sizeof(struct gfs2_dinode)
3. Calls release_metapath() which calls brelse(dibh), dropping refcount
   to 0
4. kswapd reclaims the page (~39ms later in the syzbot report)
5. iomap_write_end_inline() tries to memcpy() to iomap->inline_data
6. KASAN detects use-after-free write to freed memory

Fix by storing dibh in iomap->private and incrementing its refcount
with get_bh() in gfs2_iomap_begin(). The buffer is then properly
released in gfs2_iomap_end() after the inline write completes,
ensuring the page stays alive for the entire iomap operation.

Note: A C reproducer is not available for this issue. The fix is based
on analysis of the KASAN report and code review showing the buffer head
is freed before use.

[agruenba: Take buffer head reference in gfs2_iomap_begin() to avoid
leaks in gfs2_iomap_get() and gfs2_iomap_alloc().]

Reported-by: syzbot+ea1cd4aa4d1e98458a55@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ea1cd4aa4d1e98458a55
Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-30 13:21:57 +01:00
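The bug sequence and fix in faddeb8483 come down to reference counting. The toy model below is a sketch under strong simplifications: the BufferHead class and its methods are hypothetical, and a refcount of zero is treated as immediate reclaim, whereas in the syzbot report the page was reclaimed by kswapd some time later.

```python
class BufferHead:
    """Toy reference-counted buffer; not the kernel's struct buffer_head."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.refcount = 1
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # model: memory is reclaimed right away


def write_inline(bh, payload):
    """Model of the late inline-data write that touches the buffer."""
    if bh.freed:
        raise RuntimeError("use-after-free: buffer already released")
    bh.data[:len(payload)] = payload


def buggy_sequence():
    dibh = BufferHead(8)
    dibh.put()                  # release_metapath() drops the only reference
    write_inline(dibh, b"abc")  # write lands in freed memory


def fixed_sequence():
    dibh = BufferHead(8)
    dibh.get()                  # begin: extra reference kept for the operation
    dibh.put()                  # release_metapath()
    write_inline(dibh, b"abc")  # buffer is still alive
    dibh.put()                  # end: drop the extra reference
    return dibh
```

In the fixed ordering the extra reference spans the whole operation, so the final put() is the one that releases the buffer.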
Linus Torvalds
fcb70a56f4 vfs-6.19-rc8.fixes

Merge tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix the buggy conversion of fuse_reverse_inval_entry() introduced
   during the creation rework

 - Disallow nfs delegation requests for directories by setting
   simple_nosetlease()

 - Require an opt-in for getting readdir flag bits outside of S_DT_MASK
   set in d_type

 - Fix scheduling delayed writeback work by only scheduling when the
   dirty time expiry interval is non-zero and cancel the delayed work if
   the interval is set to zero

 - Use round_jiffies_relative() for dirty time work

 - Check the return value of sb_set_blocksize() for romfs

 - Wait for batched folios to be stable in __iomap_get_folio()

 - Use private naming for fuse hash size

 - Fix the stale dentry cleanup to prevent a race that causes a UAF

* tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: document d_dispose_if_unused()
  fuse: shrink once after all buckets have been scanned
  fuse: clean up fuse_dentry_tree_work()
  fuse: add need_resched() before unlocking bucket
  fuse: make sure dentry is evicted if stale
  fuse: fix race when disposing stale dentries
  fuse: use private naming for fuse hash size
  writeback: use round_jiffies_relative for dirtytime_work
  iomap: wait for batched folios to be stable in __iomap_get_folio
  romfs: check sb_set_blocksize() return value
  docs: clarify that dirtytime_expire_seconds=0 disables writeback
  writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
  readdir: require opt-in for d_type flags
  vboxsf: don't allow delegations to be set on directories
  ceph: don't allow delegations to be set on directories
  gfs2: don't allow delegations to be set on directories
  9p: don't allow delegations to be set on directories
  smb/client: properly disallow delegations on directories
  nfs: properly disallow delegation requests on directories
  fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-26 09:30:48 -08:00
Andreas Gruenbacher
22150a7d40 gfs2: Fix slab-use-after-free in qd_put
Commit a475c5dd16 ("gfs2: Free quota data objects synchronously")
started freeing quota data objects during filesystem shutdown instead of
putting them back onto the LRU list, but it failed to remove these
objects from the LRU list, causing LRU list corruption.  This caused
use-after-free when the shrinker (gfs2_qd_shrink_scan) tried to access
already-freed objects on the LRU list.

Fix this by removing qd objects from the LRU list before freeing them in
qd_put().

Initial fix from Deepanshu Kartikey <kartikey406@gmail.com>.

Fixes: a475c5dd16 ("gfs2: Free quota data objects synchronously")
Reported-by: syzbot+046b605f01802054bff0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=046b605f01802054bff0
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:18 +01:00
Andreas Gruenbacher
0ec49e7ea6 gfs2: Introduce glock_{type,number,sbd} helpers
Introduce glock_type(), glock_number(), and glock_sbd() helpers for
accessing a glock's type, number, and super block pointer more easily.

Created with Coccinelle using the following semantic patch:

@@ struct gfs2_glock *gl; @@
- gl->gl_name.ln_type
+ glock_type(gl)

@@ struct gfs2_glock *gl; @@
- gl->gl_name.ln_number
+ glock_number(gl)

@@ struct gfs2_glock *gl; @@
- gl->gl_name.ln_sbd
+ glock_sbd(gl)

glock_sbd() is a macro because it is used with const as well as
non-const struct gfs2_glock * arguments.

Instances in macro definitions, particularly in tracepoint definitions,
were replaced by hand.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:18 +01:00
Andreas Gruenbacher
d3b39fcb39 gfs2: gfs2_glock_hold cleanup
Use lockref_get_not_dead() instead of an unguarded __lockref_is_dead()
check.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:18 +01:00
Andreas Gruenbacher
536f48e8bb gfs: Use fixed GL_GLOCK_MIN_HOLD time
GL_GLOCK_MIN_HOLD represents the minimum time (in jiffies) that a glock
should be held before being eligible for release.  It is currently
defined as 10, meaning that the duration depends on the timer interrupt
frequency (CONFIG_HZ).  Change that time to a constant 10ms independent
of CONFIG_HZ.  On CONFIG_HZ=1000 systems, the value remains the same.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
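The CONFIG_HZ dependence described in 536f48e8bb is simple arithmetic. The sketch below assumes that a fixed duration is converted to jiffies by rounding up; the helper names are illustrative, and the commit message does not say which conversion helper the patch actually uses.

```python
def jiffies_to_ms(jiffies: int, hz: int) -> int:
    """Wall-clock duration, in milliseconds, of a jiffies count at a given HZ."""
    return jiffies * 1000 // hz


def ms_to_jiffies(ms: int, hz: int) -> int:
    """Jiffies needed to cover a duration, rounding up (an assumption)."""
    return (ms * hz + 999) // 1000


for hz in (100, 250, 1000):
    old_ms = jiffies_to_ms(10, hz)       # old: hold time fixed at 10 jiffies
    new_jiffies = ms_to_jiffies(10, hz)  # new: hold time fixed at 10 ms
    print(f"HZ={hz}: 10 jiffies = {old_ms} ms, 10 ms = {new_jiffies} jiffies")
```

Only at HZ=1000 do the two definitions coincide, which matches the commit message's remark that the value is unchanged on such systems.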
Andreas Gruenbacher
c45fefe3a9 gfs2: Fix gfs2_log_get_bio argument type
Fix the type of gfs2_log_get_bio()'s op argument: callers pass in a
blk_opf_t value and the function passes that value on as a blk_opf_t
value, so the current argument type makes no sense.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Andreas Gruenbacher
08ca56ffcd gfs2: gfs2_chain_bio start sector fix
Pass the start sector into gfs2_chain_bio(): the new bio isn't
necessarily contiguous with the previous one.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Andreas Gruenbacher
4a94f052e0 gfs2: Initialize bio->bi_opf early
Pass the right blk_opf_t value to bio_alloc() so that ->bi_opf is
initialized correctly and doesn't have to be changed later.  Adjust the
call chain to pass that value through to where it is needed (and only
there).

Add a separate blk_opf_t argument to gfs2_chain_bio() instead of copying
the value from the previous bio.

Fixes: 8a157e0a0a ("gfs2: Fix use of bio_chain")
Reported-by: syzbot+f6539d4ce3f775aee0cc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f6539d4ce3f775aee0cc
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Andreas Gruenbacher
59d81037d3 gfs2: Rename gfs2_log_submit_{bio -> write}
Rename gfs2_log_submit_bio() to gfs2_log_submit_write(): this function
isn't used for submitting log reads.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Andreas Gruenbacher
4928c36536 gfs2: Do not cancel internal demote requests
Trying to change the state of a glock may result in a "conversion
deadlock" error, indicating that the requested state transition would
cause a deadlock.  In this case, we unlock and retry the state change.
It makes no sense to try canceling those unlock requests, but the
current code is not aware of that.

In addition, if a locking request is canceled shortly after it is made,
the cancelation request can currently overtake the locking request.
This may result in deadlocks.

Fix both of these bugs by repurposing the GLF_PENDING_REPLY flag into a
GLF_MAY_CANCEL flag which is set only when the current locking request
can be canceled.  When trying to cancel a locking request in
gfs2_glock_dq(), wait for this flag to be set.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Andreas Gruenbacher
5e3319932a gfs2: run_queue cleanup
In run_queue(), instead of always setting the GLF_LOCK flag, only set it
when the flag is actually needed.  This avoids having to undo the flag
setting later.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Andreas Gruenbacher
11d763f0b0 gfs2: Retries missing in gfs2_{rename,exchange}
Fix a bug in gfs2's asynchronous glock handling for rename and exchange
operations.  The original async implementation from commit ad26967b9a
("gfs2: Use async glocks for rename") mentioned that retries were needed
but never implemented them, causing operations to fail with -ESTALE
instead of retrying on timeout.

Also makes the waiting interruptible.

In addition, the timeouts used were too high for situations in which
timing out is a rare but expected scenario.  Switch to shorter timeouts
with randomization and exponential backoff.

Fixes: ad26967b9a ("gfs2: Use async glocks for rename")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
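Retrying with short, randomized, exponentially growing timeouts, as 11d763f0b0 describes, can be sketched as follows. The base timeout, the cap, the jitter range, and the function name are all hypothetical; the commit message does not give the values or the exact scheme used in gfs2.

```python
import random


def backoff_timeouts(base_ms, cap_ms, attempts, rng):
    """Return one randomized timeout per attempt, doubling up to a cap."""
    timeouts = []
    for attempt in range(attempts):
        ceiling = min(cap_ms, base_ms * 2 ** attempt)
        # Jitter: pick uniformly from the upper half of the current window
        # so that competing waiters are unlikely to retry in lockstep.
        timeouts.append(rng.uniform(ceiling / 2, ceiling))
    return timeouts


# Hypothetical parameters: start at 10 ms, never wait longer than 1 s.
schedule = backoff_timeouts(base_ms=10, cap_ms=1000, attempts=8,
                            rng=random.Random(0))
for attempt, timeout in enumerate(schedule):
    print(f"attempt {attempt}: wait up to {timeout:.1f} ms, then retry")
```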
Andreas Gruenbacher
f8f04248c7 gfs2: glock cancelation flag fix
When an asynchronous glock holder is dequeued that hasn't been granted
yet (HIF_HOLDER not set) and no dlm operation is in progress on behalf
of that holder (GLF_LOCK not set), the dequeuing takes place in
__gfs2_glock_dq().  There, we are not clearing the HIF_WAIT flag and
waking up waiters.  Fix that.

This bug prevents the same holder from being enqueued later (gfs2_glock_nq())
without first reinitializing it (gfs2_holder_reinit()).  The code
doesn't currently use this pattern, but this will change in the next
commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26 14:28:17 +01:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Miklos Szeredi
6cbfdf8947 posix_acl: make posix_acl_to_xattr() alloc the buffer
Without exception all callers do that.  So move the allocation into the
helper.

This reduces boilerplate and removes unnecessary error checking.
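
The shape of such a change can be sketched with a toy serializer.  The
struct toy_acl type and toy_acl_to_blob() are hypothetical names for this
sketch and do not reflect the real posix_acl_to_xattr() signature; the
point is only that the helper computes the size, allocates, and hands
back both the buffer and its length.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ACL; not the kernel's struct posix_acl. */
struct toy_acl {
	uint32_t count;
	uint32_t entries[4];
};

/*
 * Serialize @acl into a freshly allocated buffer.  Returns the buffer and
 * stores its length in @sizep, or returns NULL if the allocation fails.
 * The caller owns the result and must free() it.
 */
static void *toy_acl_to_blob(const struct toy_acl *acl, size_t *sizep)
{
	size_t size;
	uint8_t *buf;

	assert(acl->count <= 4);
	size = sizeof(acl->count) + acl->count * sizeof(acl->entries[0]);
	buf = malloc(size);
	if (!buf)
		return NULL;

	memcpy(buf, &acl->count, sizeof(acl->count));
	memcpy(buf + sizeof(acl->count), acl->entries,
	       acl->count * sizeof(acl->entries[0]));
	*sizep = size;
	return buf;
}
```

With this shape, callers no longer size and allocate the buffer
themselves; they only check for NULL and free the result.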

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 10:51:12 +01:00
Andreas Gruenbacher
469d71512d Revert "gfs2: Fix use of bio_chain"
This reverts commit 8a157e0a0a.

That commit incorrectly assumed that the bio_chain() arguments were
swapped in gfs2.  However, gfs2 intentionally constructs bio chains so
that the first bio's bi_end_io callback is invoked when all bios in the
chain have completed, unlike bio chains where the last bio's callback is
invoked.
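
The completion order described above can be illustrated with a toy model.
The struct toy_bio type and the toy_bio_*() helpers are hypothetical and
only mimic the idea of chained completions; they are not the block
layer's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a bio; not the kernel's struct bio. */
struct toy_bio {
	int remaining;				/* outstanding completions */
	struct toy_bio *parent;			/* bio that waits for this one */
	void (*end_io)(struct toy_bio *bio);	/* completion callback, if any */
	int done;				/* set by toy_mark_done() */
};

static void toy_mark_done(struct toy_bio *bio)
{
	bio->done = 1;
}

static void toy_bio_init(struct toy_bio *bio,
			 void (*end_io)(struct toy_bio *bio))
{
	bio->remaining = 1;	/* the bio's own completion */
	bio->parent = NULL;
	bio->end_io = end_io;
	bio->done = 0;
}

/* Make @parent's completion wait for @bio as well. */
static void toy_bio_chain(struct toy_bio *bio, struct toy_bio *parent)
{
	bio->parent = parent;
	bio->end_io = NULL;	/* a chained bio only forwards its completion */
	parent->remaining++;
}

/* Complete @bio and propagate up the chain once nothing is outstanding. */
static void toy_bio_endio(struct toy_bio *bio)
{
	while (bio) {
		assert(bio->remaining > 0);
		if (--bio->remaining > 0)
			return;
		if (bio->end_io) {
			bio->end_io(bio);
			return;
		}
		bio = bio->parent;
	}
}
```

Chaining each new bio to the previous one, toy_bio_chain(new, prev),
makes the first bio the root of the chain, so its callback runs only
after every bio in the chain has completed.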

Fixes: 8a157e0a0a ("gfs2: Fix use of bio_chain")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-12 14:58:32 +01:00
Christoph Hellwig
85c871a02b fs: add support for non-blocking timestamp updates
Currently file_update_time_flags unconditionally returns -EAGAIN if any
timestamp needs to be updated and IOCB_NOWAIT is passed.  This makes
non-blocking direct writes impossible on file systems with granular
enough timestamps.

Pass IOCB_NOWAIT to ->update_time and return -EAGAIN if it could block.
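
A minimal userspace sketch of that behavior follows.  TOY_NOWAIT, struct
toy_inode, and toy_update_time() are hypothetical names for this
illustration; the real ->update_time method takes different arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NOWAIT 0x1u		/* hypothetical flag modeled on IOCB_NOWAIT */

struct toy_inode {
	long mtime;
	bool update_may_block;	/* e.g. the update would need a transaction */
};

/* Returns 0 on success or -EAGAIN if a non-blocking update would block. */
static int toy_update_time(struct toy_inode *inode, long now,
			   unsigned int flags)
{
	assert(inode != NULL);

	if (inode->mtime == now)
		return 0;		/* timestamp already current */
	if ((flags & TOY_NOWAIT) && inode->update_may_block)
		return -EAGAIN;		/* retry from a blocking context */
	inode->mtime = now;
	return 0;
}
```

The difference from the old behavior is that a non-blocking caller only
sees -EAGAIN when the update really could block, not whenever a
timestamp needs updating.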

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-9-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 14:01:33 +01:00
Christoph Hellwig
761475268f fs: refactor ->update_time handling
Pass the type of update (atime vs c/mtime plus version) as an enum
instead of a set of flags that caused all kinds of confusion.
Because inode_update_timestamps now can't return a modified version
of those flags, return the I_DIRTY_* flags needed to persist the
update, which is what the main caller in generic_update_time wants
anyway, and which is suitable for the other callers that only want
to know if an update happened.

The whole update_time path keeps the flags argument, which will be used
to support non-blocking updates soon even if it is unused, and (the
slightly renamed) inode_update_time also gains the possibility to return
a negative errno to support this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-6-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 14:01:32 +01:00
Christoph Hellwig
dc9629faef fs: allow error returns from generic_update_time
Now that no caller looks at the updated flags, switch generic_update_time
to the same calling convention as the ->update_time method and return 0
or a negative errno.

This prepares for adding non-blocking timestamp updates that could return
-EAGAIN.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260108141934.2052404-3-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 14:01:32 +01:00
Jeff Layton
51e49111c0 fs: remove simple_nosetlease()
Setting ->setlease() to a NULL pointer now has the same effect as
setting it to simple_nosetlease(). Remove all of the setlease
file_operations that are set to simple_nosetlease, and the function
itself.
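
The dispatch rule can be sketched as follows.  The toy_* types and
functions are hypothetical stand-ins for struct file, struct
file_operations, and the lease code; the -EINVAL result for a missing
handler follows the behavior described in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Hypothetical stand-in for struct file_operations. */
struct toy_file_operations {
	int (*setlease)(struct toy_file *file, int arg);
};

struct toy_file {
	const struct toy_file_operations *f_op;
};

/* A handler that accepts every lease request. */
static int toy_generic_setlease(struct toy_file *file, int arg)
{
	(void)file;
	(void)arg;
	return 0;
}

/* A missing ->setlease handler means leases are not supported. */
static int toy_setlease(struct toy_file *file, int arg)
{
	assert(file != NULL && file->f_op != NULL);

	if (!file->f_op->setlease)
		return -EINVAL;
	return file->f_op->setlease(file, arg);
}
```

With the NULL case handled in the dispatcher, a stub handler that only
returns -EINVAL carries no extra information and can be dropped.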

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-24-ea4dec9b67fa@kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:55:48 +01:00
Jeff Layton
3b514c3333 gfs2: add a setlease file operation
gfs2_file_fops_nolock() already has this explicitly set, so it's only
necessary to set this in gfs2_dir_fops_nolock().  A future patch
will change the default behavior to reject lease attempts with -EINVAL
when there is no setlease file operation defined.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-10-ea4dec9b67fa@kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:55:46 +01:00
Jeff Layton
ce946c4fb9 gfs2: don't allow delegations to be set on directories
With the advent of directory leases, it's necessary to set the
->setlease() handler in directory file_operations to properly deny them.

In the "nolock" case however, there is no need to deny them.

Fixes: e6d28ebc17 ("filelock: push the S_ISREG check down to ->setlease handlers")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260107-setlease-6-19-v1-4-85f034abcc57@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:54:47 +01:00
Linus Torvalds
afcbce74f3 gfs2 changes
- Major withdraw / error handling overhaul based on dlm's new
   DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
   node failures.  Make withdraws asynchronous.
 
 - Fix a bug in commit e4a8b5481c that caused 'df' to remain out of
   sync.  ('df' is still allowed to go slightly out of sync for short
   periods of time.)
 
 - Prevent recursive memory reclaim in gfs2_unstuff_dinode().
 
 - Clean up SDF_JOURNAL_LIVE flag handling.
 
 - Fix remote evict for read-only filesystems.
 
 - Fix a misuse of bio_chain().
 
 - Various other minor cleanups.

Merge tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Major withdraw / error handling overhaul based on dlm's new
   DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
   node failures. Make withdraws asynchronous

 - Fix a bug in commit e4a8b5481c that caused 'df' to remain out of
   sync. ('df' is still allowed to go slightly out of sync for short
   periods of time)

 - Prevent recursive memory reclaim in gfs2_unstuff_dinode()

 - Clean up SDF_JOURNAL_LIVE flag handling

 - Fix remote evict for read-only filesystems

 - Fix a misuse of bio_chain()

 - Various other minor cleanups

* tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits)
  gfs2: Fix use of bio_chain
  gfs2: Clean up SDF_JOURNAL_LIVE flag handling
  gfs2: No longer thaw filesystems during a withdraw
  gfs2: Withdraw immediately in gfs2_trans_add_meta
  gfs2: New gfs2_withdraw_helper
  gfs2: Clean up properly during a withdraw
  gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks}
  Revert "gfs2: fix infinite loop when checking ail item count before go_inval"
  Revert "gfs2: Allow some glocks to be used during withdraw"
  Revert "gfs2: Check for log write errors before telling dlm to unlock"
  Revert "gfs2: fix a deadlock on withdraw-during-mount"
  Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6)
  Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6)
  Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6)
  Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6)
  Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6)
  Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6)
  Revert "gfs2: don't stop reads while withdraw in progress"
  gfs2: Rename LM_FLAG_{NOEXP -> RECOVER}
  gfs2: Kill gfs2_io_error_bh_wd
  ...
2025-12-03 20:28:50 -08:00
Andreas Gruenbacher
8a157e0a0a gfs2: Fix use of bio_chain
In gfs2_chain_bio(), the call to bio_chain() has its arguments swapped.
The result is leaked bios and incorrect synchronization (only the last
bio will actually be waited for).  This code is only used during mount
and filesystem thaw, so the bug normally won't be noticeable.

Reported-by: Stephen Zhang <starzhangzsd@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-12-02 16:44:54 +00:00
Linus Torvalds
f2e74ecfba vfs-6.19-rc1.folio

Merge tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull folio updates from Christian Brauner:
 "Add a new folio_next_pos() helper function that returns the file
  position of the first byte after the current folio. This is a common
  operation in filesystems when needing to know the end of the current
  folio.

  The helper is lifted from btrfs which already had its own version, and
  is now used across multiple filesystems and subsystems:
   - btrfs
   - buffer
   - ext4
   - f2fs
   - gfs2
   - iomap
   - netfs
   - xfs
   - mm

  This fixes a long-standing bug in ocfs2 on 32-bit systems with files
  larger than 2GiB. Presumably this is not a common configuration, but
  the fix is backported anyway. The other filesystems did not have bugs,
  they were just mildly inefficient.

  This also introduces uoff_t as the unsigned version of loff_t. A recent
  commit inadvertently changed a comparison from being unsigned (on
  64-bit systems) to being signed (which it had always been on 32-bit
  systems), leading to sporadic fstests failures.

  Generally file sizes are restricted to being a signed integer, but in
  places where -1 is passed to indicate "up to the end of the file", it
  is convenient to have an unsigned type to ensure comparisons are
  always unsigned regardless of architecture"
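
Both ideas in the pull message above can be sketched in userspace C.  The
toy_* names, the 4 KiB page size, and the 32-bit page index are
assumptions for this illustration; the truncating variant shows one way a
32-bit bug of this kind can arise and does not reproduce the actual ocfs2
code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12		/* assume 4 KiB pages */

typedef int64_t toy_loff_t;		/* signed file offset */
typedef uint64_t toy_uoff_t;		/* unsigned counterpart */

/* Page index kept in 32 bits, like unsigned long on a 32-bit system. */
struct toy_folio {
	uint32_t index;			/* first page of the folio */
	uint32_t nr_pages;		/* folio size in pages */
};

/* File position of the first byte after the folio, widened before shifting. */
static toy_loff_t toy_folio_next_pos(const struct toy_folio *folio)
{
	assert(folio->nr_pages > 0);
	return ((toy_loff_t)folio->index + folio->nr_pages) << TOY_PAGE_SHIFT;
}

/* Truncating variant: the shift happens in 32 bits and wraps around. */
static toy_loff_t toy_folio_next_pos_32bit(const struct toy_folio *folio)
{
	uint32_t end = (folio->index + folio->nr_pages) << TOY_PAGE_SHIFT;

	return (toy_loff_t)end;
}

/* Compare as unsigned so that an end of -1 means "up to the end of the file". */
static int toy_pos_before_end(toy_loff_t pos, toy_loff_t end)
{
	return (toy_uoff_t)pos < (toy_uoff_t)end;
}
```

A signed comparison of a valid position against -1 is always false,
whereas the unsigned comparison treats -1 as the largest possible offset.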

* tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: Add uoff_t
  mm: Use folio_next_pos()
  xfs: Use folio_next_pos()
  netfs: Use folio_next_pos()
  iomap: Use folio_next_pos()
  gfs2: Use folio_next_pos()
  f2fs: Use folio_next_pos()
  ext4: Use folio_next_pos()
  buffer: Use folio_next_pos()
  btrfs: Use folio_next_pos()
  filemap: Add folio_next_pos()
2025-12-01 10:26:38 -08:00
Linus Torvalds
ebaeabfa5a vfs-6.19-rc1.writeback

Merge tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull writeback updates from Christian Brauner:
 "Features:

   - Allow file systems to increase the minimum writeback chunk size.

     The relatively low minimal writeback size of 4MiB means that
     written back inodes on rotational media are switched a lot. Besides
     introducing additional seeks, this also can lead to extreme file
     fragmentation on zoned devices when a lot of files are cached
     relative to the available writeback bandwidth.

     This adds a superblock field that allows the file system to
     override the default size, and sets it to the zone size for zoned
     XFS.

   - Add logging for slow writeback when it exceeds
     sysctl_hung_task_timeout_secs. This helps identify tasks waiting
     for a long time and pinpoint potential issues. Recording the
     starting jiffies is also useful when debugging a crashed vmcore.

   - Wake up waiting tasks when finishing the writeback of a chunk

  Cleanups:

   - filemap_* writeback interface cleanups.

     Adding filemap_fdatawrite_wbc ended up being a mistake, as all but
     the original btrfs caller should be using better high level
     interfaces instead.

     This series removes all these low-level interfaces, switches btrfs
     to a more specific interface, and cleans up other too low-level
     interfaces. With this the writeback_control that is passed to the
     writeback code is only initialized in three places.

   - Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and
     filemap_fdatawrite_wbc

   - Add filemap_flush_nr helper for btrfs

   - Push struct writeback_control into start_delalloc_inodes in btrfs

   - Rename filemap_fdatawrite_range_kick to filemap_flush_range

   - Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm

   - Make wbc_to_tag() inline and use it in fs"

* tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: Make wbc_to_tag() inline and use it in fs.
  xfs: set s_min_writeback_pages for zoned file systems
  writeback: allow the file system to override MIN_WRITEBACK_PAGES
  writeback: cleanup writeback_chunk_size
  mm: rename filemap_fdatawrite_range_kick to filemap_flush_range
  mm: remove __filemap_fdatawrite_range
  mm: remove filemap_fdatawrite_wbc
  mm: remove __filemap_fdatawrite
  mm,btrfs: add a filemap_flush_nr helper
  btrfs: push struct writeback_control into start_delalloc_inodes
  btrfs: use the local tmp_inode variable in start_delalloc_inodes
  ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers
  9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close
  mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode
  writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs)
  writeback: Wake up waiting tasks when finishing the writeback of a chunk.
2025-12-01 09:20:51 -08:00
Linus Torvalds
9368f0f941 vfs-6.19-rc1.inode

Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode updates from Christian Brauner:
 "Features:

   - Hide inode->i_state behind accessors. Open-coded accesses prevent
     asserting they are done correctly. One obvious aspect is locking,
     but significantly more can be checked. For example it can be
     detected when the code is clearing flags which are already missing,
     or is setting flags when it is illegal (e.g., I_FREEING when
     ->i_count > 0)

   - Provide accessors for ->i_state, converts all filesystems using
     coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2,
     overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to
     compile

   - Rework I_NEW handling to operate without fences, simplifying the
     code after the accessor infrastructure is in place

  Cleanups:

   - Move wait_on_inode() from writeback.h to fs.h

   - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
     for clarity

   - Cosmetic fixes to LRU handling

   - Push list presence check into inode_io_list_del()

   - Touch up predicts in __d_lookup_rcu()

   - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage

   - Assert on ->i_count in iput_final()

   - Assert ->i_lock held in __iget()

  Fixes:

   - Add missing fences to I_NEW handling"
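
The kind of checking that accessors make possible can be sketched with a
toy model.  struct toy_inode, the toy_state_*() helpers, and the
TOY_I_DIRTY and TOY_I_FREEING flags are hypothetical; the kernel's
accessors and assertions differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_I_DIRTY	0x1u	/* hypothetical state flags */
#define TOY_I_FREEING	0x2u

struct toy_inode {
	bool lock_held;		/* stand-in for holding ->i_lock */
	unsigned int i_state;
};

static unsigned int toy_state_read(const struct toy_inode *inode)
{
	return inode->i_state;
}

static void toy_state_set(struct toy_inode *inode, unsigned int flags)
{
	assert(inode->lock_held);	/* modifications require the lock */
	inode->i_state |= flags;
}

static void toy_state_clear(struct toy_inode *inode, unsigned int flags)
{
	assert(inode->lock_held);
	/* Clearing flags that are not set points to a logic error. */
	assert((inode->i_state & flags) == flags);
	inode->i_state &= ~flags;
}
```

Because every access goes through one of these helpers, such checks live
in one place instead of being repeated (or forgotten) at each call site.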

* tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  dcache: touch up predicts in __d_lookup_rcu()
  fs: push list presence check into inode_io_list_del()
  fs: cosmetic fixes to lru handling
  fs: rework I_NEW handling to operate without fences
  fs: make plain ->i_state access fail to compile
  xfs: use the new ->i_state accessors
  nilfs2: use the new ->i_state accessors
  overlayfs: use the new ->i_state accessors
  gfs2: use the new ->i_state accessors
  f2fs: use the new ->i_state accessors
  smb: use the new ->i_state accessors
  ceph: use the new ->i_state accessors
  btrfs: use the new ->i_state accessors
  Manual conversion to use ->i_state accessors of all places not covered by coccinelle
  Coccinelle-based conversion to use ->i_state accessors
  fs: provide accessors for ->i_state
  fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb
  fs: move wait_on_inode() from writeback.h to fs.h
  fs: add missing fences to I_NEW handling
  ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage
  ...
2025-12-01 09:02:34 -08:00
Andreas Gruenbacher
83348905e4 gfs2: Clean up SDF_JOURNAL_LIVE flag handling
Change do_withdraw() to clear the SDF_JOURNAL_LIVE flag under the log
flush lock.  In addition, change __gfs2_trans_begin() to check if the
filesystem is already known to be withdrawn using gfs2_withdrawn().
Then, once we are holding the log flush lock, check if the
SDF_JOURNAL_LIVE flag is still set.  This second check ensures that the
filesystem will remain live until the transaction is submitted.
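
The check, lock, re-check sequence can be sketched as a single-threaded
model.  struct toy_sb and the toy_*() functions are hypothetical, the
lock is modeled as a plain flag, and the -EROFS return value is an
assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_sb {
	bool withdrawn;
	bool journal_live;
	bool log_flush_locked;	/* single-threaded stand-in for the lock */
};

static void toy_log_flush_lock(struct toy_sb *sb)
{
	assert(!sb->log_flush_locked);
	sb->log_flush_locked = true;
}

static void toy_log_flush_unlock(struct toy_sb *sb)
{
	assert(sb->log_flush_locked);
	sb->log_flush_locked = false;
}

/* On success, returns 0 with the lock held until toy_trans_end(). */
static int toy_trans_begin(struct toy_sb *sb)
{
	if (sb->withdrawn)		/* cheap check without the lock */
		return -EROFS;
	toy_log_flush_lock(sb);
	if (!sb->journal_live) {	/* authoritative check under the lock */
		toy_log_flush_unlock(sb);
		return -EROFS;
	}
	return 0;
}

static void toy_trans_end(struct toy_sb *sb)
{
	toy_log_flush_unlock(sb);
}

/* The flag is only cleared under the lock, so holders see a stable value. */
static void toy_do_withdraw(struct toy_sb *sb)
{
	sb->withdrawn = true;
	toy_log_flush_lock(sb);
	sb->journal_live = false;
	toy_log_flush_unlock(sb);
}
```

Since the flag can only change while the lock is held, a transaction
that passed the second check knows the journal stays live until it
releases the lock.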

With these changes, it is no longer useful to clear SDF_JOURNAL_LIVE in
gfs2_end_log_write() after calling gfs2_withdraw().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:28 +00:00
Andreas Gruenbacher
16c3197984 gfs2: No longer thaw filesystems during a withdraw
Previously, when a withdraw occurred, we would wait for another node to
recover our journal.  This also meant that frozen filesystems needed to
be thawed because otherwise, other nodes wouldn't be able to recover the
filesystem.  With the reversal of commit 601ef0d52e ("gfs2: Force
withdraw to replay journals and wait for it to finish"), we are no
longer waiting for journal recovery during a withdraw, so we no longer
need to thaw frozen filesystems, either.  This also fixes a potential
deadlock reported by lockdep when running xfstest generic/108.

In addition, there is nothing left in do_withdraw() that would require
taking sd_freeze_mutex, so don't bother taking that lock there anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:28 +00:00
Andreas Gruenbacher
3a88edc165 gfs2: Withdraw immediately in gfs2_trans_add_meta
We can now withdraw while the log is locked.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:28 +00:00
Andreas Gruenbacher
bbbf1529ea gfs2: New gfs2_withdraw_helper
Currently, when a gfs2 filesystem is withdrawn, an "offline" uevent is
triggered that invokes gfs2-utils' gfs2_withdraw_helper script.  The
purpose of this script is to deactivate the filesystem's block device so
that it can be withdrawn immediately, even before all the filesystem's
caches have been discarded.  The script provided by gfs2-utils never did
anything useful, and there was no way for it to report back its status
to the kernel.

To fix that, extend the gfs2_withdraw_helper mechanism so that the
script can report one of the following results by writing the
corresponding value into "/sys$DEVPATH/lock_module/withdraw":

 0 - The shared block device has been marked inactive.  Future write
     operations will fail.

 1 - The shared block device may still be active and carry out
     write operations.

If the "offline" uevent isn't reacted upon within the timeout configured
in /sys$DEVPATH/tune/withdraw_helper_timeout (default 5 seconds), the
event handler is assumed to have failed.

In addition, add a new "errors=deactivate" mount option.

With these changes, if fatal errors are detected on a gfs2 filesystem
and the filesystem is mounted with the "errors=panic" option, the kernel
will panic immediately.  Otherwise, an attempt will be made to
deactivate the underlying block device.  If successful, the kernel will
release all cluster-wide locks immediately so that the rest of the
cluster can continue.  If unsuccessful, the kernel will either panic
("errors=deactivate"), or it will purge all filesystem I/O before
releasing all cluster-wide locks ("errors=withdraw").
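
The decision logic in the paragraph above can be written down as a small
pure function.  The enum names and toy_withdraw_action() are hypothetical,
and treating a helper timeout like an unsuccessful deactivation is an
assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical encoding of the "errors=" mount option. */
enum toy_errors_opt {
	TOY_ERRORS_WITHDRAW,
	TOY_ERRORS_DEACTIVATE,
	TOY_ERRORS_PANIC,
};

/* Values 0 and 1 mirror what the helper writes; timeout is modeled separately. */
enum toy_helper_result {
	TOY_HELPER_INACTIVE = 0,	/* device marked inactive */
	TOY_HELPER_MAYBE_ACTIVE = 1,	/* device may still carry out writes */
	TOY_HELPER_TIMEOUT,		/* no reaction within the timeout */
};

enum toy_action {
	TOY_PANIC,
	TOY_RELEASE_LOCKS,		/* release cluster-wide locks immediately */
	TOY_PURGE_IO_THEN_RELEASE,	/* purge filesystem I/O, then release */
};

static enum toy_action toy_withdraw_action(enum toy_errors_opt opt,
					   enum toy_helper_result res)
{
	assert(opt >= TOY_ERRORS_WITHDRAW && opt <= TOY_ERRORS_PANIC);

	if (opt == TOY_ERRORS_PANIC)
		return TOY_PANIC;		/* helper is never consulted */
	if (res == TOY_HELPER_INACTIVE)
		return TOY_RELEASE_LOCKS;	/* deactivation succeeded */
	if (opt == TOY_ERRORS_DEACTIVATE)
		return TOY_PANIC;		/* deactivation was required */
	return TOY_PURGE_IO_THEN_RELEASE;	/* errors=withdraw fallback */
}
```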

Note that the gfs2_withdraw_helper script still needs to be fixed to
take advantage of these improvements.  It could be changed to use a
mechanism like LVM Persistent Reservations.  "dmsetup suspend" is not a
suitable mechanism as it infinitely postpones I/O operations, which may
prevent withdraw from completing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
0e10da69d1 gfs2: Clean up properly during a withdraw
During a withdraw, we don't want to write out any more data than we have
to, so in do_xmote(), skip the ->go_sync() glock operation.  We still
want to keep calling ->go_inval() to discard any cached data or
metadata, whether clean or dirty.

We do still allow glocks to transition into state LM_ST_UNLOCKED.  This
has the desired side effect of calling ->go_inval() and invalidating the
glock caches.

Function gfs2_withdraw_glocks() is already used for dequeuing any
left-over waiters.  We still want that to happen, but additionally, we
want all glocks to be unlocked.

Finally, we change function do_promote() to refuse any further
promotions.

This commit cleans up the leftovers of commit 86934198ee ("gfs2: Clear
flags when withdraw prevents xmote").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
473678ccb9 gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks}
Rename function gfs2_gl_dq_holders() to gfs2_withdraw_glocks().  This
function will soon be used for more than just dequeuing holders.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
655531c95b Revert "gfs2: fix infinite loop when checking ail item count before go_inval"
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts commit 33dbd1e41a ("gfs2: fix infinite loop when checking ail
item count before go_inval").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
af572efef1 Revert "gfs2: Allow some glocks to be used during withdraw"
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts commit a72d2401f5 ("gfs2: Allow some glocks to be used during
withdraw").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
41ad1f7c8b Revert "gfs2: Check for log write errors before telling dlm to unlock"
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts the rest of d93ae386ef ("gfs2: Check for log write errors
before telling dlm to unlock").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
6bb7c1bf5a Revert "gfs2: fix a deadlock on withdraw-during-mount"
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts commit 865cc3e9cc ("gfs2: fix a deadlock on
withdraw-during-mount").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
dcc42d5541 Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6)
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts parts of commit 601ef0d52e ("gfs2: Force withdraw to replay
journals and wait for it to finish").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
406058184c Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6)
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts parts of commit 601ef0d52e ("gfs2: Force withdraw to replay
journals and wait for it to finish").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00
Andreas Gruenbacher
a07a1e46d2 Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6)
The current withdraw code duplicates the journal recovery code gfs2
already has for dealing with node failures, and it does so poorly.  That
code was added because when releasing a lockspace, we didn't have a way
to indicate that the lockspace needs recovery.  We now do have this
feature, so the current withdraw code can be removed almost entirely.
This is one of several steps towards that.

Reverts parts of commit 601ef0d52e ("gfs2: Force withdraw to replay
journals and wait for it to finish").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 23:52:27 +00:00