Commit graph

104278 commits

Author SHA1 Message Date
Linus Torvalds
0e335a7745 vfs-7.0-rc2.fixes
Please consider pulling these changes from the signed vfs-7.0-rc2.fixes tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaZ7xWAAKCRCRxhvAZXjc
 onpeAP4qOrTURIAX9M/NGCHywvjI91ZJt20J6vm0X6KbVV/ebQD/eoJ21xzPhG9M
 gN7oRcZ9SW3e/AdtdnlqB0PEP+cyGwM=
 =9Ji+
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix an uninitialized variable in file_getattr().

   The flags_valid field wasn't initialized before calling
   vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse

 - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is
   not enabled.

   sysctl_hung_task_timeout_secs is 0 in that case causing spurious
   "waiting for writeback completion for more than 1 seconds" warnings

 - Fix a null-ptr-deref in do_statmount() when the mount is internal

 - Add missing kernel-doc description for the @private parameter in
   iomap_readahead()

 - Fix mount namespace creation to hold namespace_sem across the mount
   copy in create_new_namespace().

   The previous drop-and-reacquire pattern was fragile and failed to
   clean up mount propagation links if the real rootfs was a shared or
   dependent mount

 - Fix /proc mount iteration where m->index wasn't updated when
   m->show() overflows, causing a restart to repeatedly show the same
   mount entry in a rapidly expanding mount table

 - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the
   inode number is out of range

 - Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared.

   copy_mnt_ns() received the live fs_struct so if a subsequent
   namespace creation failed the rollback would leave pwd and root
   pointing to detached mounts. Always allocate a new fs_struct when
   CLONE_NEWNS is requested

 - fserror bug fixes:

    - Remove the unused fsnotify_sb_error() helper now that all callers
      have been converted to fserror_report_metadata

    - Fix a lockdep splat in fserror_report() where igrab() takes
      inode::i_lock which can be held in IRQ context.

      Replace igrab() with a direct i_count bump since filesystems
      should not report inodes that are about to be freed or not yet
      exposed

 - Handle error pointer in procfs for try_lookup_noperm()

 - Fix an integer overflow in ep_loop_check_proc() where recursive calls
   returning INT_MAX would overflow when +1 is added, breaking the
   recursion depth check

 - Fix a misleading break in pidfs

* tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pidfs: avoid misleading break
  eventpoll: Fix integer overflow in ep_loop_check_proc()
  proc: Fix pointer error dereference
  fserror: fix lockdep complaint when igrabbing inode
  fsnotify: drop unused helper
  unshare: fix unshare_fs() handling
  minix: Correct errno in minix_new_inode
  namespace: fix proc mount iteration
  mount: hold namespace_sem across copy in create_new_namespace()
  iomap: Describe @private in iomap_readahead()
  statmount: Fix the null-ptr-deref in do_statmount()
  writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK
  fs: init flags_valid before calling vfs_fileattr_get
2026-02-25 10:34:23 -08:00
Wilfred Mallawa
650b774cf9 xfs: add static size checks for ioctl UABI
The ioctl structures in libxfs/xfs_fs.h are missing static size checks.
It is useful to have static size checks for these structures as adding
new fields to them could cause issues (e.g. extra padding that may be
inserted by the compiler). So add these checks to xfs/xfs_ondisk.h.

Due to different padding/alignment requirements across different
architectures, to avoid build failures, some structures are ommited from
the size checks. For example, structures with "compat_" definitions in
xfs/xfs_ioctl32.h are ommited.

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:50 +01:00
Wilfred Mallawa
e97cbf863d xfs: remove duplicate static size checks
In libxfs/xfs_ondisk.h, remove some duplicate entries of
XFS_CHECK_STRUCT_SIZE().

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)
9a654a8fa3 xfs: Add comments for usages of some macros.
Add comments explaining when to use XFS_IS_CORRUPT() and ASSERT()

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)
c2368fc89a xfs: Update lazy counters in xfs_growfs_rt_bmblock()
Update lazy counters in xfs_growfs_rt_bmblock() similar to the way it
is done xfs_growfs_data_private(). This is because the lazy counters are
not always updated and synching the counters will avoid inconsistencies
between frexents and rtextents(total realtime extent count). This will
be more useful once realtime shrink is implemented as this will prevent
some transient state to occur where frexents might be greater than
total rtextents.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)
ac1d977096 xfs: Add a comment in xfs_log_sb()
Add a comment explaining why the sb_frextents are updated outside the
if (xfs_has_lazycount(mp) check even though it is a lazycounter.
RT groups are supported only in v5 filesystems which always have
lazycounter enabled - so putting it inside the if(xfs_has_lazycount(mp)
check is redundant.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Nirjhar Roy (IBM)
8baa9bccc0 xfs: Fix xfs_last_rt_bmblock()
Bug description:

If the size of the last rtgroup i.e, the rtg passed to
xfs_last_rt_bmblock() is such that the last rtextent falls in 0th word
offset of a bmblock of the bitmap file tracking this (last) rtgroup,
then in that case xfs_last_rt_bmblock() incorrectly returns the next
bmblock number instead of the current/last used bmblock number.
When xfs_last_rt_bmblock() incorrectly returns the next bmblock,
the loop to grow/modify the bmblocks in xfs_growfs_rtg() doesn't
execute and xfs_growfs basically does a nop in certain cases.

xfs_growfs will do a nop when the new size of the fs will have the same
number of rtgroups i.e, we are only growing the last rtgroup.

Reproduce:
$ mkfs.xfs -m metadir=0 -r rtdev=/dev/loop1 /dev/loop0 \
	-r size=32769b -f
$ mount -o rtdev=/dev/loop1 /dev/loop0 /mnt/scratch
$ xfs_growfs -R $(( 32769 + 1 )) /mnt/scratch
$ xfs_info /mnt/scratch | grep rtextents
$ # We can see that rtextents hasn't changed

Fix:
Fix this by returning the current/last used bmblock when the last
rtgroup size is not a multiple xfs_rtbitmap_rtx_per_rbmblock()
and the next bmblock when the rtgroup size is a multiple of
xfs_rtbitmap_rtx_per_rbmblock() i.e, the existing blocks are
completely used up.
Also, I have renamed xfs_last_rt_bmblock() to
xfs_last_rt_bmblock_to_extend() to signify that this function
returns the bmblock number to extend and NOT always the last used
bmblock number.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Darrick J. Wong
115ea07b94 xfs: don't report half-built inodes to fserror
Sam Sun apparently found a syzbot way to fuzz a filesystem such that
xfs_iget_cache_miss would free the inode before the fserror code could
catch up.  Frustratingly he doesn't use the syzbot dashboard so there's
no C reproducer and not even a full error report, so I'm guessing that:

Inodes that are being constructed or torn down inside XFS are not
visible to the VFS.  They should never be reported to fserror.
Also, any inode that has been freshly allocated in _cache_miss should be
marked INEW immediately because, well, it's an incompletely constructed
inode that isn't yet visible to the VFS.

Reported-by: Sam Sun <samsun1006219@gmail.com>
Fixes: 5eb4cb18e4 ("xfs: convey metadata health events to the health monitor")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Darrick J. Wong
75690e5fdd xfs: don't report metadata inodes to fserror
Internal metadata inodes are not exposed to userspace programs, so it
makes no sense to pass them to the fserror functions (aka fsnotify).
Instead, report metadata file problems as general filesystem corruption.

Fixes: 5eb4cb18e4 ("xfs: convey metadata health events to the health monitor")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Darrick J. Wong
94014a23e9 xfs: fix potential pointer access race in xfs_healthmon_get
Pankaj Raghav asks about this code in xfs_healthmon_get:

  hm = mp->m_healthmon;
  if (hm && !refcount_inc_not_zero(&hm->ref))
    hm = NULL;
  rcu_read_unlock();
  return hm;

(slightly edited to compress a mailing list thread)

"Nit: Should we do a READ_ONCE(mp->m_healthmon) here to avoid any
compiler tricks that can result in an undefined behaviour? I am not sure
if I am being paranoid here.

"So this is my understanding: RCU guarantees that we get a valid object
(actual data of m_healthmon) but does not guarantee the compiler will
not reread the pointer between checking if hm is !NULL and accessing the
pointer as we are doing it lockless.

"So just a barrier() call in rcu_read_lock is enough to make sure this
doesn't happen and probably adding a READ_ONCE() is not needed?"

After some initial confusion I concluded that he's correct.  The
compiler could very well eliminate the hm variable in favor of walking
the pointers twice, turning the code into:

  if (mp->m_healthmon && !refcount_inc_not_zero(&mp->m_healthmon->ref))

If this happens, then xfs_healthmon_detach can sneak in between the
two sides of the && expression and set mp->m_healthmon to NULL, and
thereby cause a null pointer dereference crash.  Fix this by using the
rcu pointer assignment and dereference functions, which ensure that the
proper reordering barriers are in place.

Practically speaking, gcc seems to allocate an actual variable for hm
and only reads mp->m_healthmon once (as intended), but we ought to be
more explicit about requiring this.

Reported-by: Pankaj Raghav <pankaj.raghav@linux.dev>
Fixes: a48373e7d3 ("xfs: start creating infrastructure for health monitoring")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Darrick J. Wong
eb8550fb75 xfs: fix xfs_group release bug in xfs_dax_notify_dev_failure
Chris Mason reports that his AI tools noticed that we were using
xfs_perag_put and xfs_group_put to release the group reference returned
by xfs_group_next_range.  However, the iterator function returns an
object with an active refcount, which means that we must use the correct
function to release the active refcount, which is _rele.

Cc: <stable@vger.kernel.org> # v6.0
Fixes: 6f643c57d5 ("xfs: implement ->notify_failure() for XFS")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:49 +01:00
Darrick J. Wong
161456987a xfs: fix xfs_group release bug in xfs_verify_report_losses
Chris Mason reports that his AI tools noticed that we were using
xfs_perag_put and xfs_group_put to release the group reference returned
by xfs_group_next_range.  However, the iterator function returns an
object with an active refcount, which means that we must use the correct
function to release the active refcount, which is _rele.

Fixes: b8accfd65d ("xfs: add media verification ioctl")
Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-xfs/20260206030527.2506821-1-clm@meta.com/
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Darrick J. Wong
e764dd439d xfs: fix copy-paste error in previous fix
Chris Mason noticed that there is a copy-paste error in a recent change
to xrep_dir_teardown that nulls out pointers after freeing the
resources.

Fixes: ba408d299a ("xfs: only call xf{array,blob}_destroy if we have a valid pointer")
Link: https://lore.kernel.org/linux-xfs/20260205194211.2307232-1-clm@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Ethan Tidmore
cddfa648f1 xfs: Fix error pointer dereference
The function try_lookup_noperm() can return an error pointer and is not
checked for one.

Add checks for error pointer in xrep_adoption_check_dcache() and
xrep_adoption_zap_dcache().

Detected by Smatch:
fs/xfs/scrub/orphanage.c:449 xrep_adoption_check_dcache() error:
'd_child' dereferencing possible ERR_PTR()

fs/xfs/scrub/orphanage.c:485 xrep_adoption_zap_dcache() error:
'd_child' dereferencing possible ERR_PTR()

Fixes: 73597e3e42 ("xfs: ensure dentry consistency when the orphanage adopts a file")
Cc: stable@vger.kernel.org # v6.16
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Christoph Hellwig
47553dd60b xfs: remove metafile inodes from the active inode stat
The active inode (or active vnode until recently) stat can get much larger
than expected on file systems with a lot of metafile inodes like zoned
file systems on SMR hard disks with 10.000s of rtg rmap inodes.

Remove all metafile inodes from the active counter to make it more useful
to track actual workloads and add a separate counter for active metafile
inodes.

This fixes xfs/177 on SMR hard drives.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Christoph Hellwig
03a6d6c4c8 xfs: cleanup inode counter stats
Most of them are unused, so mark them as such.  Give the remaining ones
names that match their use instead of the historic IRIX ones based on
vnodes.  Note that the names are purely internal to the XFS code, the
user interface is based on section names and arrays of counters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Wilfred Mallawa
fd81d3fd01 xfs: fix code alignment issues in xfs_ondisk.c
Fixup some code alignment issues in xfs_ondisk.c

Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Nirjhar Roy (IBM)
4ad85e633b xfs: Replace &rtg->rtg_group with rtg_group()
Use the already existing rtg_group() wrapper instead of directly
accessing the struct xfs_group member in struct xfs_rtgroup.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
[cem: Conflict resolution against 06873dbd94]
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Nirjhar Roy (IBM)
a49b7ff63f xfs: Refactoring the nagcount and delta calculation
Introduce xfs_growfs_compute_delta() to calculate the nagcount
and delta blocks and refactor the code from xfs_growfs_data_private().
No functional changes.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Nirjhar Roy (IBM)
18c16f602a xfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary()
Replace ASSERT(sum > 0) with an XFS_IS_CORRUPT() and place it just
after the call to xfs_rtget_summary() so that we don't end up using
an illegal value of sum.

Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-02-25 13:58:48 +01:00
Gao Xiang
4a2d046e4b erofs: fix interlaced plain identification for encoded extents
Only plain data whose start position and on-disk physical length are
both aligned to the block size should be classified as interlaced
plain extents. Otherwise, it must be treated as shifted plain extents.

This issue was found by syzbot using a crafted compressed image
containing plain extents with unaligned physical lengths, which can
cause OOB read in z_erofs_transform_plain().

Reported-and-tested-by: syzbot+d988dc155e740d76a331@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/699d5714.050a0220.cdd3c.03e7.GAE@google.com
Fixes: 1d191b4ca5 ("erofs: implement encoded extent metadata")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-25 17:40:58 +08:00
Phillip Lougher
fdb24a820a Squashfs: check metadata block offset is within range
Syzkaller reports a "general protection fault in squashfs_copy_data"

This is ultimately caused by a corrupted index look-up table, which
produces a negative metadata block offset.

This is subsequently passed to squashfs_copy_data (via
squashfs_read_metadata) where the negative offset causes an out of bounds
access.

The fix is to check that the offset is within range in
squashfs_read_metadata.  This will trap this and other cases.

Link: https://lkml.kernel.org/r/20260217050955.138351-1-phillip@squashfs.org.uk
Fixes: f400e12656 ("Squashfs: cache operations")
Reported-by: syzbot+a9747fe1c35a5b115d3f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/699234e2.a70a0220.2c38d7.00e2.GAE@google.com/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-24 11:13:27 -08:00
Jeff Layton
364410170a nfsd: report the requested maximum number of threads instead of number running
The current netlink and /proc interfaces deviate from their traditional
values when dynamic threading is enabled, and there is currently no way
to know what the current setting is. This patch brings the reporting
back in line with traditional behavior.

Make these interfaces report the requested maximum number of threads
instead of the number currently running. Also, update documentation and
comments to reflect that this value represents a maximum and not the
number currently running.

Fixes: d8316b837c ("nfsd: add controls to set the minimum number of threads per pool")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-02-24 10:27:51 -05:00
Christian Brauner
4a1ddb0f1c
pidfs: avoid misleading break
The break would only break out of the scoped_guard() loop, not the
switch statement. It still works correct as is ofc but let's avoid the
confusion.

Reported-by: David Lechner <dlechner@baylibre.com>
Link:: https://lore.kernel.org/cd2153f1-098b-463c-bbc1-5c6ca9ef1f12@baylibre.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-24 12:09:00 +01:00
Ferry Meng
bf4fde7db4 erofs: remove more unnecessary #ifdefs
Many #ifdefs can be replaced with IS_ENABLED() to improve code
readability.  No functional changes.

Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-24 18:36:52 +08:00
Jann Horn
fdcfce9307
eventpoll: Fix integer overflow in ep_loop_check_proc()
If a recursive call to ep_loop_check_proc() hits the `result = INT_MAX`,
an integer overflow will occur in the calling ep_loop_check_proc() at
`result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1)`,
breaking the recursion depth check.

Fix it by using a different placeholder value that can't lead to an
overflow.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: f2e467a482 ("eventpoll: Fix semi-unbounded recursion")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://patch.msgid.link/20260223-epoll-int-overflow-v1-1-452f35132224@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-24 10:21:30 +01:00
Mathieu Desnoyers
3b68df9781 rseq: slice ext: Ensure rseq feature size differs from original rseq size
Before rseq became extensible, its original size was 32 bytes even
though the active rseq area was only 20 bytes. This had the following
impact in terms of userspace ecosystem evolution:

* The GNU libc between 2.35 and 2.39 expose a __rseq_size symbol set
  to 32, even though the size of the active rseq area is really 20.
* The GNU libc 2.40 changes this __rseq_size to 20, thus making it
  express the active rseq area.
* Starting from glibc 2.41, __rseq_size corresponds to the
  AT_RSEQ_FEATURE_SIZE from getauxval(3).

This means that users of __rseq_size can always expect it to
correspond to the active rseq area, except for the value 32, for
which the active rseq area is 20 bytes.

Exposing a 32 bytes feature size would make life needlessly painful
for userspace. Therefore, add a reserved field at the end of the
rseq area to bump the feature size to 33 bytes. This reserved field
is expected to be replaced with whatever field will come next,
expecting that this field will be larger than 1 byte.

The effect of this change is to increase the size from 32 to 64 bytes
before we actually have fields using that memory.

Clarify the allocation size and alignment requirements in the struct
rseq uapi comment.

Change the value returned by getauxval(AT_RSEQ_ALIGN) to return the
value of the active rseq area size rounded up to next power of 2, which
guarantees that the rseq structure will always be aligned on the nearest
power of two large enough to contain it, even as it grows. Change the
alignment check in the rseq registration accordingly.

This will minimize the amount of ABI corner-cases we need to document
and require userspace to play games with. The rule stays simple when
__rseq_size != 32:

  #define rseq_field_available(field)	(__rseq_size >= offsetofend(struct rseq_abi, field))

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-3-mathieu.desnoyers@efficios.com
2026-02-23 11:19:19 +01:00
Hongbo Li
03c0d030f5 erofs: allow sharing page cache with the same aops only
Inode with identical data but different @aops cannot be mixed
because the page cache is managed by different subsystems (e.g.,
@aops for compressed on-disk inodes cannot handle plain on-disk
inodes).

In this patch, we never allow inodes to share the page cache
among plain, compressed, and fileio cases. When a shared inode
is created, we initialize @aops that is the same as the initial
real inode, and subsequent inodes cannot share the page cache
if the inferred @aops differ from the corresponding shared inode.

This is reasonable as a first step because, in typical use cases,
if an inode is compressible, it will fall into compressed
inodes across different filesystem images unless users use plain
filesystems. However, in that cases, users will use plain
filesystems all the time.

Fixes: 5ef3208e3b ("erofs: introduce the page cache share feature")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-23 18:04:18 +08:00
Nicholas Carlini
6b4f875aac ksmbd: fix signededness bug in smb_direct_prepare_negotiation()
smb_direct_prepare_negotiation() casts an unsigned __u32 value
from sp->max_recv_size and req->preferred_send_size to a signed
int before computing min_t(int, ...). A maliciously provided
preferred_send_size of 0x80000000 will return as smaller than
max_recv_size, and then be used to set the maximum allowed
alowed receive size for the next message.

By sending a second message with a large value (>1420 bytes)
the attacker can then achieve a heap buffer overflow.

This fix replaces min_t(int, ...) with min_t(u32)

Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Nicholas Carlini <nicholas@carlini.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-22 21:27:33 -06:00
Eric Biggers
c5794709bc ksmbd: Compare MACs in constant time
To prevent timing attacks, MAC comparisons need to be constant-time.
Replace the memcmp() with the correct function, crypto_memneq().

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-22 21:27:28 -06:00
Henrique Carvalho
663c28469d smb: client: fix cifs_pick_channel when channels are equally loaded
cifs_pick_channel uses (start % chan_count) when channels are equally
loaded, but that can return a channel that failed the eligibility
checks.

Drop the fallback and return the scan-selected channel instead. If none
is eligible, keep the existing behavior of using the primary channel.

Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Acked-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-22 16:52:50 -06:00
Linus Torvalds
fbf3380361 fsverity fixes for v7.0-rc1
- Fix a build error on parisc
 
 - Remove the non-large-folio-aware function fsverity_verify_page()
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaZtg+xQcZWJpZ2dlcnNA
 a2VybmVsLm9yZwAKCRDzXCl4vpKOKwQ+AQCiXEYAibl3vHRgQo7qEPCC5or4FtkF
 HZ0ERRArhsU17AD/TKHE/AJkyFrwK4rGTb6I9Wi1OXnpG7jihZlYjj03Ag4=
 =CUql
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fixes from Eric Biggers:

 - Fix a build error on parisc

 - Remove the non-large-folio-aware function fsverity_verify_page()

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: fix build error by adding fsverity_readahead() stub
  fsverity: remove fsverity_verify_page()
  f2fs: make f2fs_verify_cluster() partially large-folio-aware
  f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()
2026-02-22 13:12:04 -08:00
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
8934827db5 kmalloc_obj treewide refactoring for v7.0-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaZl14wAKCRA2KwveOeQk
 uz8aAQCBFLYlij3Y3ivVADkBxuVF3xECaznFya41ENYsBwlHdwEArXqMyNrw+DiG
 TvWCK/tiddNmGIRpI2sxBFzyRpsHfAY=
 =rVD3
 -----END PGP SIGNATURE-----

Merge tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kmalloc_obj conversion from Kees Cook:
 "This does the tree-wide conversion to kmalloc_obj() and friends using
  coccinelle, with a subsequent small manual cleanup of whitespace
  alignment that coccinelle does not handle.

  This uncovered a clang bug in __builtin_counted_by_ref(), so the
  conversion is preceded by disabling that for current versions of
  clang.  The imminent clang 22.1 release has the fix.

  I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I
  did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv,
  s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc"

* tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kmalloc_obj: Clean up after treewide replacements
  treewide: Replace kmalloc with kmalloc_obj for non-scalar types
  compiler_types: Disable __builtin_counted_by_ref for Clang
2026-02-21 11:02:58 -08:00
Linus Torvalds
f9d66e64a2 io_uring-20260221
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmZszYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvW9D/9qI0JrQ7zw9/zjFVOqFHVbwJUErTwUiSE/
 bKpbDOUcm1utiTIw8nWuhhmXbPq8MT+3Uyyk24UFpcIM+QZrqUHRRPYcqHwNy0mw
 3QXZut2QkwHKO/tWoKXIguCVQKtPteBjfH+La1IRZPLKK+MozQEloGMulbZwIP0j
 ynEX7tA8VYwuGtgmzBiTt7Ba2W5EIJ0SSfmdY9XajKDHrZCksx5iHnC/uJg0qEc0
 QQr4YZisyz4VxG4MxsJvpv6/wiic07MRSYa5cJXntknmPGc9zj6wg79WWB/lkbuA
 w95+X+EorHyCvKk2Ny9uwl1/UOp9VQbKQnVIZ58LBC4ILPj6Qk7a7WyxTbfl358d
 QNhrcspgcLwakQkVkhR1Z4SaCqMdc6i1hXx5m2nhXFevKipeyg11uoM75bZEzWsN
 63+6+S1upwSqFTk2nto8gKg+7aMGOvjT/Ae0kNvUD/WPXSkC66gSzADZdv2C4i4Y
 QdFs8UN8jKRXd5Rx7pthAM0E//XFHko/tGJnHUIzQ3Nk7xeXp4BeXOdp/m/FvpDe
 BakTvLHvvk5s1X9Bkv5hSXua6ZBFlgcsqn5VdVplQOdWPKwU0Mjg28z6AAzdZxLF
 mCcgDqDqls19E9HI7nFJOUJtBQjoCl4sVEuPvs06agY04sJTXWkYph5pwhkNJOKD
 NwumZJpb2A==
 =Pkge
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

 - A fix for a missing URING_CMD128 opcode check, fixing an issue with
   the SQE mixed mode support introduced in 6.19. Merged late due to
   having multiple dependencies

 - Add sqe->cmd size checking for big SQEs, similar to what we have for
   normal sized SQEs

 - Fix a race condition in zcrx, that leads to a double free

* tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: Add size check for sqe->cmd
  io_uring: add IORING_OP_URING_CMD128 to opcode checks
  io_uring/zcrx: fix user_ref race between scrub and refill paths
2026-02-21 10:05:49 -08:00
Linus Torvalds
8eb604d4ee two small server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmmYg8MACgkQiiy9cAdy
 T1HLbAwAjr1Eg4hDbI1GBLxFkB5pTxYmnHtj8eFJLNXpmpfe4/4f29e7S7hs3ChX
 UIIbFAFKPWAC7dFwAQ8rnYw4e3HPv+eFZxDzfi9Org8TxYGL9kew8JnqZtvCYtQd
 bHFndrZW3C8x2kj08O6uA0WWIhpCffZ4avig4Azkw5a6ukLzEpsLcODyJb/9Dx4Y
 EbHNjYZX3MMbbtNh47Bo2JVNjk5Nyxzf1kSdA46eHa5aQoy8DbMuJqFj0b9oUxqh
 E6QBPeZ0aZuSQZQUh3LtHnDS/aqFAn4zchtwALhHephU3SQZ1pLR8Uz+IwFI70MK
 bOCWI//LCGvdcXqJ6rspLplte9O6SoFNyqo+PuH1lnWCknnFNVKJWlOiu9LZnNe0
 G99VONqwdKlXs5MDVp2LVpW5VbxJEw5TsrWGQJL6FGMGWiSDPt4rpcA/ugokKhVr
 aeqlV3dWXBJcKIE9G+8XHNn5lwsgfI4bfs8phdrKxEVSshEZbu/B2hg6X+f2guWJ
 poF2ZJ4g
 =UcGf
 -----END PGP SIGNATURE-----

Merge tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two small fixes:

   - fix potential deadlock

   - minor cleanup"

* tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: call ksmbd_vfs_kern_path_end_removing() on some error paths
  smb: server: Remove duplicate include of misc.h
2026-02-21 09:11:32 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
b3f1da2a4d for-7.0-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmmYpGMACgkQxWXV+ddt
 WDvKNQ//cy7gb2nItc9hbYXUM/88Ks3Usu94u/4gREL9I97u2vHer6sT8cRfWA2k
 OSOZF2L7yPWIcxcj39YC66ubs7uebSwt/bZAL1TKAyA7wUvR/kdhD7DUTWX4ySf3
 2+1BANv1Bng8C7vGnWDhYPHcb1u8LvKxKcn+9h8SzBGpW5dyx3k4xUrneaMYq+jf
 D9sPjkkM6fxsKn+S3OJP/zFUIQ2DQiv7nF+Jv4Ke2h9c9nCVfn8fRK0AuTlYXFY/
 mWkKWo1ATGVd0fBg/otRp/ZlZczoKs3/1YBUMYTxZZngyweIms4Q6I4/GIGHO+RD
 QFFoIQ7OQd0aqBGhuKTDlYMlc6OS2jwoTgVYr6vSIxSRUsCK/grHdPL+s+9dLc3h
 p7+/URH9Gpfad46wFypb5w7zmmc8jCRkR1Ff+jf6Pi8GgffqocCro3C3HlGRKwcf
 CAj6gI3ypNPNFfYidcKbS+ehXhjmMVb9xhNa8YwCC1CdgM54ZMmEs/ksAN+uBc/u
 EfcAbB3T15LQgzUJs2WKvCI3E/0XUYEi54ng8UwCJ6P01p3egfvQo8t6jZal9Vx8
 ba/LUG50W1xRRjxgG1AU5s42GmGkO8WNyIixmLlT+Pwog0I2auPVDQBudbXZK4ps
 +FOtNnN9hYLmuZyRSTT03MHHf0Rqtckdjvq3413KMFILVh+ZM+Q=
 =CQsu
 -----END PGP SIGNATURE-----

Merge tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - multiple error handling fixes of unexpected conditions

 - reset block group size class once it becomes empty so that
   its class can be changed

 - error message level adjustments

 - fixes of returned error values

 - use correct block reserve for delayed refs

* tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found
  btrfs: fix lost error return in btrfs_find_orphan_roots()
  btrfs: fix lost return value on error in finish_verity()
  btrfs: change unaligned root messages to error level in btrfs_validate_super()
  btrfs: use the correct type to initialize block reserve for delayed refs
  btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()
  btrfs: reset block group size class when it becomes empty
  btrfs: replace BUG() with error handling in __btrfs_balance()
  btrfs: handle unexpected exact match in btrfs_set_inode_index_count()
2026-02-20 14:57:09 -08:00
Linus Torvalds
233a0c0f44 eCryptfs fixes for 7.0-rc1
The set of eCryptfs patches for the 7.0-rc1 merge window consists of
 cleanups that are not intended to have any functional changes:
 - Comment typo fixes
 - Removal of an unused function declaration
 - Use strscpy() instead of the deprecated strcpy()
 - Use string copying helpers instead of memcpy() and manually
   terminating strings
 
 The patches have all spent time in linux-next and they do not regress
 the tests in the ecryptfs-utils tree.
 
 Signed-off-by: Tyler Hicks <code@tyhicks.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEKvuQkp28KvPJn/fVfL7QslYSS/0FAmmWWPwRHGNvZGVAdHlo
 aWNrcy5jb20ACgkQfL7QslYSS/3zQhAAhyHq55LJVxfSOACyabXxxQM4cpmN3vlu
 fgIsfly+mBYPKQg0/vfTAfC8ln5YhjFMcubVbohXcj6FmHGulqoB8A5w74wvksDN
 6kzFIUEg/k10AKijt36gb6ef6BjN00j+LeeSUWZ9uMtJt3SR7UuIKpbxB2sRJSHv
 Q7DguW6Wr4APDyZxmmyPPjM4CFHjTO0a5ePSRkqtLV0Qebtnk1q/v9Ddidmz1N5t
 dWNwcn8mXYSzjorFhzhFy0wVML2nStF1DgEy+7FQVzmpHxPfGN/F07TGLj7rW+hr
 BI9GQpvLOJ/EJ8ssBFqe4Ns44upo/WLnBo4PoxD+iuvPBTJLeP/b7wnHduu6ndjy
 4iZqStOhYNMWolQlnx/PmUAzBYn6K2hKOR/egUcHazWnpq5/FnXMKu6bWz4RE+fS
 n/Pi3XwIwos7zqToWnU7alcCKDwhzamRRicSYWX3ezzj3TbacFJTLv0KO+7W5HpE
 wqnLrwqSpC2tWl/4Qrm70ZekVK8ZMGW0+HwabXupWcAUZN6OmXy/B4sNkRCsZiuA
 TGeaiOMlH7AbF7QKvCI3YKRuPxKutgbzyqFfzSRYH97wPmf+1H0DRXfTVqMs1Wa8
 97KBzOxKpiJdJkvMC82xKa+1usc8JrzeGcErhleSQTgzIijQvoh08wF+isunJROx
 aWOTbdbl508=
 =Ak1f
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs updates from Tyler Hicks:
 "This consists of some really minor typo fixes that fell through the
  cracks and some more recent code cleanups:

   - Comment typo fixes

   - Removal of an unused function declaration

   - Use strscpy() instead of the deprecated strcpy()

   - Use string copying helpers instead of memcpy() and manually
     terminating strings"

* tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename
  ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context
  ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options
  ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string
  ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals
  ecryptfs: simplify list initialization in ecryptfs_parse_packet_set()
  ecryptfs: Remove unused declartion ecryptfs_fill_zeros()
  ecryptfs: Fix packet format comment in parse_tag_67_packet()
  ecryptfs: comment typo fix
  ecryptfs: keystore: Fix typo 'the the' in comment
2026-02-20 14:46:31 -08:00
Ethan Tidmore
f6a495484a
proc: Fix pointer error dereference
The function try_lookup_noperm() can return an error pointer. Add check
for error pointer.

Detected by Smatch:
fs/proc/base.c:2148 proc_fill_cache() error:
'child' dereferencing possible ERR_PTR()

Fixes: 1df98b8bbc ("proc_fill_cache(): clean up, get rid of pointless find_inode_number() use")
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
Link: https://patch.msgid.link/20260219221001.1117135-1-ethantidmore06@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-20 11:21:16 +01:00
Govindarajulu Varadarajan
ea129e55c9 io_uring: Add size check for sqe->cmd
For SQE128, sqe->cmd provides 80 bytes for uring_cmd. Add macro to
check if size of user struct does not exceed 80 bytes at compile time.
User doesn't have to track this manually during development.

Replace io_uring_sqe_cmd() inline func with macro and add
io_uring_sqe128_cmd() which checks struct
size for 16 bytes cmd and 80 bytes cmd respectively.

Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-19 07:26:26 -07:00
Darrick J. Wong
294f54f849
fserror: fix lockdep complaint when igrabbing inode
Christoph Hellwig reported a lockdep splat in generic/108:

 ================================
 WARNING: inconsistent lock state
 6.19.0+ #4827 Tainted: G                 N
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
 ffff88811ed1b140 (&sb->s_type->i_lock_key#33){?.+.}-{3:3}, at: igrab+0x1a/0xb0
 {HARDIRQ-ON-W} state was registered at:
   lock_acquire+0xca/0x2c0
   _raw_spin_lock+0x2e/0x40
   unlock_new_inode+0x2c/0xc0
   xfs_iget+0xcf4/0x1080
   xfs_trans_metafile_iget+0x3d/0x100
   xfs_metafile_iget+0x2b/0x50
   xfs_mount_setup_metadir+0x20/0x60
   xfs_mountfs+0x457/0xa60
   xfs_fs_fill_super+0x6b3/0xa90
   get_tree_bdev_flags+0x13c/0x1e0
   vfs_get_tree+0x27/0xe0
   vfs_cmd_create+0x54/0xe0
   __do_sys_fsconfig+0x309/0x620
   do_syscall_64+0x8b/0xf80
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
 irq event stamp: 139080
 hardirqs last  enabled at (139079): [<ffffffff813a923c>] do_idle+0x1ec/0x270
 hardirqs last disabled at (139080): [<ffffffff828a8d09>] common_interrupt+0x19/0xe0
 softirqs last  enabled at (139032): [<ffffffff8134a853>] __irq_exit_rcu+0xc3/0x120
 softirqs last disabled at (139025): [<ffffffff8134a853>] __irq_exit_rcu+0xc3/0x120

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&sb->s_type->i_lock_key#33);
   <Interrupt>
     lock(&sb->s_type->i_lock_key#33);

  *** DEADLOCK ***

 1 lock held by swapper/1/0:
  #0: ffff8881052c81a0 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x4b/0x110

 stack backtrace:
 CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G                 N  6.19.0+ #4827 PREEMPT(full)
 Tainted: [N]=TEST
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x5b/0x80
  print_usage_bug.part.0+0x22c/0x2c0
  mark_lock+0xa6f/0xe90
  __lock_acquire+0x10b6/0x25e0
  lock_acquire+0xca/0x2c0
  _raw_spin_lock+0x2e/0x40
  igrab+0x1a/0xb0
  fserror_report+0x135/0x260
  iomap_finish_ioend_buffered+0x170/0x210
  clone_endio+0x8f/0x1c0
  blk_update_request+0x1e4/0x4d0
  blk_mq_end_request+0x1b/0x100
  virtblk_done+0x6f/0x110
  vring_interrupt+0x59/0x80
  __handle_irq_event_percpu+0x8a/0x2e0
  handle_irq_event+0x33/0x70
  handle_edge_irq+0xdd/0x1e0
  __common_interrupt+0x6f/0x180
  common_interrupt+0xb7/0xe0
  </IRQ>

It looks like the concern here is that inode::i_lock is sometimes taken
in IRQ context, and sometimes it is held when going to IRQ context,
though it's a little difficult to tell since I think this is a kernel
from after the actual 6.19 release but before 7.0-rc1.

Either way, we don't need to take i_lock, because filesystems should
not report files to fserror if they're about to be freed or have not
yet been exposed to other threads, because the resulting fsnotify report
will be meaningless.

Therefore, bump inode::i_count directly and clarify the preconditions on
the inode being passed in.

Link: https://lore.kernel.org/linux-fsdevel/aY7BndIgQg3ci_6s@infradead.org/
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/177148129564.716249.3069780698231701540.stgit@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-19 09:12:08 +01:00
Linus Torvalds
eeccf287a2 mm.git review status for linus..mm-stable
Total patches:       36
 Reviews/patch:       1.77
 Reviewed rate:       83%
 
 - The 2 patch series "mm/vmscan: fix demotion targets checks in
   reclaim/demotion" from Bing Jiao fixes a couple of issues in the
   demotion code - pages were failed demotion and were finding themselves
   demoted into disallowed nodes.
 
 - The 11 patch series "Remove XA_ZERO from error recovery of dup_mmap()"
   from Liam Howlett fixes a rare mapledtree race and performs a number of
   cleanups.
 
 - The 13 patch series "mm: add bitmap VMA flag helpers and convert all
   mmap_prepare to use them" from Lorenzo Stoakes implements a lot of
   cleanups following on from the conversion of the VMA flags into a
   bitmap.
 
 - The 5 patch series "support batch checking of references and unmapping
   for large folios" from Baolin Wang implements batching to greatly
   improve the performance of reclaiming clean file-backed large folios.
 
 - The 3 patch series "selftests/mm: add memory failure selftests" from
   Miaohe Lin does as claimed.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaZaIEQAKCRDdBJ7gKXxA
 jj73AQCQDwLoipDiQRGyjB5BDYydymWuDoiB1tlDPHfYAP3b/QD/UQtVlOEXqwM3
 naOKs3NQ1pwnfhDaQMirGw2eAnJ1SQY=
 =6Iif
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM  updates from Andrew Morton:

 - "mm/vmscan: fix demotion targets checks in reclaim/demotion" fixes a
   couple of issues in the demotion code - pages were failed demotion
   and were finding themselves demoted into disallowed nodes (Bing Jiao)

 - "Remove XA_ZERO from error recovery of dup_mmap()" fixes a rare
   mapledtree race and performs a number of cleanups (Liam Howlett)

 - "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use
   them" implements a lot of cleanups following on from the conversion
   of the VMA flags into a bitmap (Lorenzo Stoakes)

 - "support batch checking of references and unmapping for large folios"
   implements batching to greatly improve the performance of reclaiming
   clean file-backed large folios (Baolin Wang)

 - "selftests/mm: add memory failure selftests" does as claimed (Miaohe
   Lin)

* tag 'mm-stable-2026-02-18-19-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (36 commits)
  mm/page_alloc: clear page->private in free_pages_prepare()
  selftests/mm: add memory failure dirty pagecache test
  selftests/mm: add memory failure clean pagecache test
  selftests/mm: add memory failure anonymous page test
  mm: rmap: support batched unmapping for file large folios
  arm64: mm: implement the architecture-specific clear_flush_young_ptes()
  arm64: mm: support batch clearing of the young flag for large folios
  arm64: mm: factor out the address and ptep alignment into a new helper
  mm: rmap: support batched checks of the references for large folios
  tools/testing/vma: add VMA userland tests for VMA flag functions
  tools/testing/vma: separate out vma_internal.h into logical headers
  tools/testing/vma: separate VMA userland tests into separate files
  mm: make vm_area_desc utilise vma_flags_t only
  mm: update all remaining mmap_prepare users to use vma_flags_t
  mm: update shmem_[kernel]_file_*() functions to use vma_flags_t
  mm: update secretmem to use VMA flags on mmap_prepare
  mm: update hugetlbfs to use VMA flags on mmap_prepare
  mm: add basic VMA flag operation helper functions
  tools: bitmap: add missing bitmap_[subset(), andnot()]
  mm: add mk_vma_flags() bitmap flag macro helper
  ...
2026-02-18 20:50:32 -08:00
Linus Torvalds
23b0f90ba8 Summary
* Removed macros from proc handler converters
 
   Replace the proc converter macros with "regular" functions. Though it is more
   verbose than the macro version, it helps when debugging and better aligns with
   coding-style.rst.
 
 * General cleanup
 
   Remove superfluous ctl_table forward declarations. Const qualify the
   memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays. Add
   missing kernel doc to proc_dointvec_conv.
 
 * Testing
 
   This series was run through sysctl selftests/kunit test suite in
   x86_64. And went into linux-next after rc4, giving it a good 3 weeks of
   testing
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmmUabYACgkQupfNUreW
 QU8y2Qv/d2y35uQPRDh0HKWKWXJy41C2RJzd/rFCWJPCwo150whTSHIHkWYnu76g
 10QblBXQmXi9TVqFnJ7Il7PWgqkMPjzA13tfT9eXNWU8j2OB/mcVKNl9X4wm/jWi
 QxtGmBsIQ/nxb2pUzMCykzgfc5mLi2NQ8qhZ5bOnq7UW3zdYmzEqx+tRdvIacyIk
 adComi5v8xUDqyEbVFaBovuX2WHQkPyBMnD64nwWG93JpNG/+9PxGzv/DNUXY11Y
 epVOfSoKdJbSLjYoHEPEhT0aHjSydq3QHru7uF6wzKOFTfHej/XkXXbUnFXPO2Pn
 c5J0u/HziYG5eN2QTqGfrhECZYuCFPemtUozltbcgGebkl1wKH+k9K5vsCaz/mhk
 ihUC3mui++W/n9B9HJRYh1XeEpk6C1pWERCOx27XFZ25fSek2YO6ZWkT0q+gceC0
 t4+eIFSGJ3OzheJgHNK9XhTMWiQPmHyA6brXYGx4WeRvJFLpVddPF7k3Z89zIAu/
 Fut7FGTH
 =0Z+I
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:

 - Remove macros from proc handler converters

   Replace the proc converter macros with "regular" functions. Though it
   is more verbose than the macro version, it helps when debugging and
   better aligns with coding-style.rst.

 - General cleanup

   Remove superfluous ctl_table forward declarations. Const qualify the
   memory_allocation_profiling_sysctl and loadpin_sysctl_table arrays.
   Add missing kernel doc to proc_dointvec_conv.

 - Testing

   This series was run through sysctl selftests/kunit test suite in
   x86_64. And went into linux-next after rc4, giving it a good 3 weeks
   of testing

* tag 'sysctl-7.00-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: replace SYSCTL_INT_CONV_CUSTOM macro with functions
  sysctl: Replace unidirectional INT converter macros with functions
  sysctl: Add kernel doc to proc_douintvec_conv
  sysctl: Replace UINT converter macros with functions
  sysctl: Add CONFIG_PROC_SYSCTL guards for converter macros
  sysctl: clarify proc_douintvec_minmax doc
  sysctl: Return -ENOSYS from proc_douintvec_conv when CONFIG_PROC_SYSCTL=n
  sysctl: Remove unused ctl_table forward declarations
  loadpin: Implement custom proc_handler for enforce
  alloc_tag: move memory_allocation_profiling_sysctls into .rodata
  sysctl: Add missing kernel-doc for proc_dointvec_conv
2026-02-18 10:45:36 -08:00
Filipe Manana
ecb7c2484c btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found
If btrfs_search_slot_for_read() returns 1, it means we did not find any
key greater than or equals to the key we asked for, meaning we have
reached the end of the tree and therefore the path is not valid. If
this happens we need to break out of the loop and stop, instead of
continuing and accessing an invalid path.

Fixes: 5223cc60b4 ("btrfs: drop the path before adding qgroup items when enabling qgroups")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-18 15:25:54 +01:00
Filipe Manana
7b54e08f2e btrfs: fix lost error return in btrfs_find_orphan_roots()
If the call to btrfs_get_fs_root() returns an error different from -ENOENT
we break out of the loop and then return 0, losing the error. Fix this
by returning the error instead of breaking from the loop.

Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-btrfs/20260208185321.1128472-1-clm@meta.com/
Fixes: 8670a25ecb ("btrfs: use single return variable in btrfs_find_orphan_roots()")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-18 15:25:54 +01:00
Filipe Manana
29e525665a btrfs: fix lost return value on error in finish_verity()
If btrfs_update_inode() or del_orphan() fail, we jump to the 'end_trans'
label and then return 0 instead of the error returned by one of those
calls. Fix this and return the error.

Fixes: 61fb7f04ee ("btrfs: remove out label in finish_verity()")
Reported-by: Chris Mason <clm@meta.com>
Link: https://lore.kernel.org/linux-btrfs/20260208161129.3888234-1-clm@meta.com/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-18 15:25:54 +01:00