2177 commits

Author SHA1 Message Date
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
                      kzalloc_obj,kzalloc_objs,kzalloc_flex,
                      kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
                      kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
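For readers wondering how a call can drop its trailing GFP_KERNEL argument at all (the conversion in 189f164e57 above), the following is a minimal userspace sketch of one way to give a macro a defaulted last argument, using the standard count-the-arguments trick. Everything prefixed toy_ or TOY_ is hypothetical, and nothing here is claimed to match how the kernel's kmalloc_obj() family is actually implemented.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for GFP flags; the values are arbitrary. */
#define TOY_GFP_KERNEL 0x1u
#define TOY_GFP_ATOMIC 0x2u

/* Remembers the flags of the most recent call so the default is observable. */
static unsigned int toy_last_flags;

static void *toy_alloc(size_t size, unsigned int flags)
{
	toy_last_flags = flags;
	return malloc(size);
}

/* Two fixed-arity back ends: without and with an explicit flags argument. */
#define TOY_ALLOC_OBJ_1(type)        toy_alloc(sizeof(type), TOY_GFP_KERNEL)
#define TOY_ALLOC_OBJ_2(type, flags) toy_alloc(sizeof(type), (flags))

/* Expands to its third argument; which name lands there depends on arity. */
#define TOY_PICK(_1, _2, name, ...) name

/* One argument selects TOY_ALLOC_OBJ_1, two arguments select TOY_ALLOC_OBJ_2. */
#define toy_alloc_obj(...) \
	TOY_PICK(__VA_ARGS__, TOY_ALLOC_OBJ_2, TOY_ALLOC_OBJ_1, unused)(__VA_ARGS__)

struct toy_item {
	int id;
	long payload[4];
};

/* Allocates with the flags omitted and reports what the allocator saw. */
static unsigned int toy_flags_default(void)
{
	struct toy_item *p = toy_alloc_obj(struct toy_item);

	free(p);
	return toy_last_flags;
}

/* Allocates with explicit flags and reports what the allocator saw. */
static unsigned int toy_flags_explicit(void)
{
	struct toy_item *p = toy_alloc_obj(struct toy_item, TOY_GFP_ATOMIC);

	free(p);
	return toy_last_flags;
}
```

With a macro shaped like this, callers that want the common flags write the short form, and only the unusual call sites carry an explicit argument.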
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that for a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
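To make the limits of the line-based conversion in bf4afc53b7 concrete, here is a small C sketch that applies the same rewrite as the sed substitution to a single line held in memory. It models only the substitution, not the git grep file filter, and toy_drop_gfp() is a hypothetical helper written for this illustration.

```c
#include <string.h>

#define TOY_SUFFIX ", GFP_KERNEL)"

/*
 * Mimics s/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/ on one line, in place.
 * Returns 1 if the line was rewritten, 0 if it was left alone.
 */
static int toy_drop_gfp(char *line)
{
	const size_t suffix_len = sizeof(TOY_SUFFIX) - 1;

	for (char *start = strstr(line, "alloc_obj"); start;
	     start = strstr(start + 1, "alloc_obj")) {
		char *open = start + strlen("alloc_obj");
		char *last = NULL;

		/* "objs*(" : any number of 's', then the opening parenthesis. */
		while (*open == 's')
			open++;
		if (*open != '(')
			continue;

		/* ".*" is greedy, so take the last ", GFP_KERNEL)" on the line. */
		for (char *hit = strstr(open + 1, TOY_SUFFIX); hit;
		     hit = strstr(hit + 1, TOY_SUFFIX))
			last = hit;
		if (!last)
			return 0;

		/* Keep the ")" and everything after it, dropping ", GFP_KERNEL". */
		memmove(last, last + suffix_len - 1,
			strlen(last + suffix_len - 1) + 1);
		return 1;
	}
	return 0;
}
```

A call whose GFP_KERNEL sits on a continuation line never matches, because each line is examined on its own, and the 'flex' helpers are skipped because their names do not contain "alloc_obj".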
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
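As a rough userspace illustration of the three call shapes described in 69050f8d6d, the sketch below builds typed allocation macros on top of malloc() and calloc(). The toy_ names are hypothetical, __typeof__ is a GCC/Clang extension, and the kernel's real macros are not reproduced here; the point is only that the result has type "TYPE *" rather than "void *", so assigning it to an unrelated pointer type draws a compiler diagnostic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Size of a struct plus COUNT trailing elements, saturating on overflow. */
static size_t toy_flex_size(size_t base, size_t elem, size_t count)
{
	if (elem != 0 && count > (SIZE_MAX - base) / elem)
		return SIZE_MAX;	/* malloc(SIZE_MAX) will simply fail */
	return base + elem * count;
}

/* TYPE may be a type name or an expression such as *ptr. */
#define toy_alloc_obj(TYPE) \
	((__typeof__(TYPE) *)malloc(sizeof(TYPE)))

/*
 * calloc() is used because it checks COUNT * size for overflow; unlike
 * kmalloc_array() it also zeroes the memory.
 */
#define toy_alloc_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)calloc((COUNT), sizeof(TYPE)))

/* VAR is the object (typically *ptr); FAM names its flexible array member. */
#define toy_alloc_flex(VAR, FAM, COUNT) \
	((__typeof__(VAR) *)malloc(toy_flex_size(sizeof(VAR), \
		sizeof(((__typeof__(VAR) *)0)->FAM[0]), (COUNT))))

struct toy_point {
	int x;
	int y;
};

struct toy_msg {
	size_t len;
	unsigned char data[];	/* flexible array member */
};
```

A caller would then write, for example, `struct toy_msg *m = toy_alloc_flex(*m, data, 16);`, which mirrors the kmalloc_flex(*PTR, FAM, COUNT, ...) form quoted in the commit message.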
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
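The ARRAY_END() series pulled in by 136114e0ab is about loops and bounds checks that miscompute the end of an array. A minimal sketch of the idea follows; the macro body is an assumption about what such a helper looks like (a pointer one past the last element), and the TOY_ names are hypothetical rather than the kernel's definitions.

```c
#include <stddef.h>

#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed shape: address one past the last element, valid only to compare. */
#define TOY_ARRAY_END(a) (&(a)[TOY_ARRAY_SIZE(a)])

/* Walks the whole array with a half-open [begin, end) range. */
static int toy_sum(void)
{
	static const int values[] = { 1, 2, 3, 4 };
	int sum = 0;

	for (const int *p = values; p < TOY_ARRAY_END(values); p++)
		sum += *p;
	return sum;
}

/* Counts the elements by pointer difference, as a cross-check. */
static size_t toy_count(void)
{
	static const int values[] = { 1, 2, 3, 4 };

	return (size_t)(TOY_ARRAY_END(values) - values);
}
```

Deriving the end pointer from the array itself avoids hand-written bounds such as a stray "<=" or a byte count used where an element count was meant.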
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Paul Moore
ea64aa57d5 selinux: drop the BUG() in cred_has_capability()
With the compile time check located immediately above the
cred_has_capability() function ensuring that we will notice if the
capability set grows beyond 63 capabilities, we can safely remove
the BUG() call from cred_has_capability().

Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-14 16:26:21 -05:00
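The reasoning in ea64aa57d5 is that a build-time bound on the number of capabilities makes a runtime BUG() unreachable. The sketch below shows that pattern in userspace C11; the TOY_ constants and toy_cap_to_check() are hypothetical stand-ins with values chosen for the illustration, not the SELinux definitions.

```c
/* Hypothetical stand-ins; the value is chosen for this sketch only. */
#define TOY_CAP_LAST_CAP 40

enum toy_class {
	TOY_SECCLASS_CAPABILITY,	/* capabilities 0..31 */
	TOY_SECCLASS_CAPABILITY2,	/* capabilities 32..63 */
};

/* The build fails here if the capability set outgrows two 32-bit classes. */
_Static_assert(TOY_CAP_LAST_CAP <= 63,
	       "capability set no longer fits in two 32-bit classes");

struct toy_capcheck {
	enum toy_class cls;	/* which security class to query */
	unsigned int bit;	/* permission bit within that class */
};

/*
 * Maps a capability number to (class, bit). Given the assertion above the
 * index (cap >> 5) is 0 or 1 for every valid capability, so there is no
 * "unknown index" branch that would need a runtime BUG().
 */
static struct toy_capcheck toy_cap_to_check(int cap)
{
	struct toy_capcheck check;

	check.cls = (cap >> 5) ? TOY_SECCLASS_CAPABILITY2
			       : TOY_SECCLASS_CAPABILITY;
	check.bit = 1u << (cap & 31);
	return check;
}
```

The same split into two classes is what makes a mix-up between the first and second class, as fixed in b07b6f0c5d, easy to introduce by copy and paste.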
Paul Moore
b07b6f0c5d selinux: fix a capabilities parsing typo in selinux_bpf_token_capable()
There was a typo, likely a cut-n-paste bug, where we were checking for
SECCLASS_CAPABILITY instead of SECCLASS_CAPABILITY2.

Fixes: 5473a722f7 ("selinux: add support for BPF token access control")
Reported-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-14 16:15:09 -05:00
Eric Suen
5473a722f7 selinux: add support for BPF token access control
BPF token support was introduced to allow a privileged process to delegate
limited BPF functionality—such as map creation and program loading—to
an unprivileged process:
  https://lore.kernel.org/linux-security-module/20231130185229.2688956-1-andrii@kernel.org/

This patch adds SELinux support for controlling BPF token access. With
this change, SELinux policies can now enforce constraints on BPF token
usage based on both the delegating (privileged) process and the recipient
(unprivileged) process.

Supported operations currently include:
  - map_create
  - prog_load

High-level workflow:
  1. An unprivileged process creates a VFS context via `fsopen()` and
     obtains a file descriptor.
  2. This descriptor is passed to a privileged process, which configures
     BPF token delegation options and mounts a BPF filesystem.
  3. SELinux records the `creator_sid` of the privileged process during
     mount setup.
  4. The unprivileged process then uses this BPF fs mount to create a
     token and attach it to subsequent BPF syscalls.
  5. During verification of `map_create` and `prog_load`, SELinux uses
     `creator_sid` and the current SID to check policy permissions via:
       avc_has_perm(creator_sid, current_sid, SECCLASS_BPF,
                    BPF__MAP_CREATE, NULL);

The implementation introduces two new permissions:
  - map_create_as
  - prog_load_as

At token creation time, SELinux verifies that the current process has the
appropriate `*_as` permission (depending on the `allowed_cmds` value in
the bpf_token) to act on behalf of the `creator_sid`.

Example SELinux policy:
  allow test_bpf_t self:bpf {
      map_create map_read map_write prog_load prog_run
      map_create_as prog_load_as
  };

Additionally, a new policy capability bpf_token_perms is added to ensure
backward compatibility. If disabled, the previous behavior (checks based
on the current process SID) is preserved.

Signed-off-by: Eric Suen <ericsu@linux.microsoft.com>
Tested-by: Daniel Durning <danieldurning.work@gmail.com>
Reviewed-by: Daniel Durning <danieldurning.work@gmail.com>
[PM: merge fuzz, subject tweaks, whitespace tweaks, line length tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-13 15:42:37 -05:00
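The two checks described in 5473a722f7 can be pictured with a toy rule table. The sketch below is a userspace model only: toy_has_perm() is a hypothetical stand-in for avc_has_perm(), the source/target direction used for the `*_as` check is an assumption, and the "legacy" branch assumes that the previous behaviour checked the current SID against itself.

```c
#include <stddef.h>

enum {
	TOY_MAP_CREATE    = 1u << 0,
	TOY_PROG_LOAD     = 1u << 1,
	TOY_MAP_CREATE_AS = 1u << 2,
	TOY_PROG_LOAD_AS  = 1u << 3,
};

/* One "allow" rule: source SID may use perms on target SID. */
struct toy_rule {
	unsigned int ssid;
	unsigned int tsid;
	unsigned int perms;
};

/* Returns 0 if some rule grants every bit in perm, -1 otherwise. */
static int toy_has_perm(const struct toy_rule *rules, size_t n,
			unsigned int ssid, unsigned int tsid,
			unsigned int perm)
{
	for (size_t i = 0; i < n; i++)
		if (rules[i].ssid == ssid && rules[i].tsid == tsid &&
		    (rules[i].perms & perm) == perm)
			return 0;
	return -1;
}

/* Token creation: current needs the matching *_as permission on creator. */
static int toy_token_create(const struct toy_rule *rules, size_t n,
			    unsigned int creator_sid,
			    unsigned int current_sid,
			    unsigned int allowed_cmds)
{
	if ((allowed_cmds & TOY_MAP_CREATE) &&
	    toy_has_perm(rules, n, current_sid, creator_sid, TOY_MAP_CREATE_AS))
		return -1;
	if ((allowed_cmds & TOY_PROG_LOAD) &&
	    toy_has_perm(rules, n, current_sid, creator_sid, TOY_PROG_LOAD_AS))
		return -1;
	return 0;
}

/* map_create: creator vs. current when the policy capability is enabled. */
static int toy_map_create(const struct toy_rule *rules, size_t n,
			  unsigned int creator_sid, unsigned int current_sid,
			  int bpf_token_perms)
{
	if (!bpf_token_perms)	/* assumed legacy check: current vs. itself */
		return toy_has_perm(rules, n, current_sid, current_sid,
				    TOY_MAP_CREATE);
	return toy_has_perm(rules, n, creator_sid, current_sid,
			    TOY_MAP_CREATE);
}
```

In this model a policy has to grant both sides: the recipient's right to act "as" the creator at token creation, and the creator-to-recipient permission at the time of the BPF call.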
Paul Moore
27a7cef9c3 selinux: move the selinux_blob_sizes struct
Move the selinux_blob_sizes struct so it is adjacent to the rest of the
SELinux initialization code and not in the middle of the LSM hook
callbacks.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-13 11:53:38 -05:00
Linus Torvalds
7cd122b552 Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
 A reference is grabbed and not released, but it's not actually _stored_
 anywhere.  That works, but it's hard to follow and verify; among other
 things, we have no way to tell _which_ of the increments is intended
 to be an unpaired one.  Worse, on removal we need to decide whether
 the reference had already been dropped, which can be non-trivial if
 that removal is on umount and we need to figure out if this dentry is
 pinned due to e.g. unlink() not done.  Usually that is handled by using
 kill_litter_super() as ->kill_sb(), but there are open-coded special
 cases of the same (consider e.g. /proc/self).
 
 Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
 marking those "leaked" dentries.  Having it set claims responsibility
 for +1 in refcount.
 
 The end result this series is aiming for:
 
 * get these unbalanced dget() and dput() replaced with new primitives that
   would, in addition to adjusting refcount, set and clear persistency flag.
 * instead of having kill_litter_super() mess with removing the remaining
   "leaked" references (e.g. for all tmpfs files that hadn't been removed
   prior to umount), have the regular shrink_dcache_for_umount() strip
   DCACHE_PERSISTENT of all dentries, dropping the corresponding
   reference if it had been set.  After that kill_litter_super() becomes
   an equivalent of kill_anon_super().
 
 Doing that in a single step is not feasible - it would affect too many places
 in too many filesystems.  It has to be split into a series.
 
 This work has really started early in 2024; quite a few preliminary pieces
 have already gone into mainline.  This chunk is finally getting to the
 meat of that stuff - infrastructure and most of the conversions to it.
 
 Some pieces are still sitting in the local branches, but the bulk of
 that stuff is here.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
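The bookkeeping idea behind the pull in 7cd122b552, a flag that "claims responsibility for +1 in refcount", can be modelled in a few lines. This is a toy userspace model under stated assumptions: struct toy_dentry and the toy_ helpers are hypothetical, locking is ignored, and making an already persistent dentry persistent again is treated as a no-op purely as a choice for this sketch.

```c
#include <stddef.h>

#define TOY_DCACHE_PERSISTENT 0x1u

struct toy_dentry {
	unsigned int flags;
	int refcount;
};

/* Sets the flag and takes the reference that the flag now accounts for. */
static void toy_make_persistent(struct toy_dentry *d)
{
	if (d->flags & TOY_DCACHE_PERSISTENT)
		return;
	d->flags |= TOY_DCACHE_PERSISTENT;
	d->refcount++;
}

/* Clears the flag and drops its reference; -1 models the "warn" case. */
static int toy_make_discardable(struct toy_dentry *d)
{
	if (!(d->flags & TOY_DCACHE_PERSISTENT))
		return -1;
	d->flags &= ~TOY_DCACHE_PERSISTENT;
	d->refcount--;
	return 0;
}

/* At umount, strip the flag everywhere and drop the matching references. */
static void toy_shrink_for_umount(struct toy_dentry *all, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(all[i].flags & TOY_DCACHE_PERSISTENT))
			continue;
		all[i].flags &= ~TOY_DCACHE_PERSISTENT;
		all[i].refcount--;
	}
}
```

Because the flag records which increment was the deliberate one, teardown no longer has to guess whether a given dentry still holds its pinned reference.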
Linus Torvalds
777f817160 integrity-v6.19

Merge tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity updates from Mimi Zohar:
 "Bug fixes:

   - defer credentials checking from the bprm_check_security hook to the
     bprm_creds_from_file security hook

   - properly ignore IMA policy rules based on undefined SELinux labels

  IMA policy rule extensions:

   - extend IMA to limit including file hashes in the audit logs
     (dont_audit action)

   - define a new filesystem subtype policy option (fs_subtype)

  Misc:

   - extend IMA to support in-kernel module decompression by deferring
     the IMA signature verification in kernel_read_file() to after the
     kernel module is decompressed"

* tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: Handle error code returned by ima_filter_rule_match()
  ima: Access decompressed kernel module to verify appended signature
  ima: add fs_subtype condition for distinguishing FUSE instances
  ima: add dont_audit action to suppress audit actions
  ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
2025-12-03 11:08:03 -08:00
Linus Torvalds
51e3b98d73 selinux/stable-6.19 PR 20251201

Merge tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Improve the granularity of SELinux labeling for memfd files

   Currently when creating a memfd file, SELinux treats it the same as
   any other tmpfs, or hugetlbfs, file. While simple, the drawback is
   that it is not possible to differentiate between memfd and tmpfs
   files.

   This adds a call to the security_inode_init_security_anon() LSM hook
   and wires up SELinux to provide a set of memfd specific access
   controls, including the ability to control the execution of memfds.

   As usual, the commit message has more information.

 - Improve the SELinux AVC lookup performance

   Adopt MurmurHash3 for the SELinux AVC hash function instead of the
   custom hash function currently used. MurmurHash3 is already used for
   the SELinux access vector table so the impact to the code is minimal,
   and performance tests have shown improvements in both hash
   distribution and latency.

   See the commit message for the performance measurements.

 - Introduce a Kconfig option for the SELinux AVC bucket/slot size

   While we have the ability to grow the number of AVC hash buckets
   today, the size of the buckets (slot size) is fixed at 512. This pull
   request makes that slot size configurable at build time through a new
   Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS.

* tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: improve bucket distribution uniformity of avc_hash()
  selinux: Move avtab_hash() to a shared location for future reuse
  selinux: Introduce a new config to make avc cache slot size adjustable
  memfd,selinux: call security_inode_init_security_anon()
2025-12-03 10:45:47 -08:00
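To give a feel for the AVC changes pulled in by 51e3b98d73, here is a sketch of a MurmurHash3-style hash over the (source SID, target SID, class) triple, reduced to a slot index by a configurable number of bits. It uses the published 32-bit MurmurHash3 mixing constants, but the way the three inputs are fed in, the zero seed, and the TOY_ names are assumptions; the kernel's avc_hash() may differ in detail. The value 9 is chosen to match the 512 slots mentioned above.

```c
#include <stdint.h>

/* Stand-in for the build-time knob; 2^9 = 512 slots. */
#define TOY_AVC_HASH_BITS 9
#define TOY_AVC_SLOTS     (1u << TOY_AVC_HASH_BITS)

static uint32_t toy_rotl32(uint32_t x, unsigned int r)
{
	return (x << r) | (x >> (32 - r));
}

/* One MurmurHash3 (32-bit) block step: mix word k into state h. */
static uint32_t toy_mix(uint32_t h, uint32_t k)
{
	k *= 0xcc9e2d51u;
	k = toy_rotl32(k, 15);
	k *= 0x1b873593u;

	h ^= k;
	h = toy_rotl32(h, 13);
	return h * 5u + 0xe6546b64u;
}

/* MurmurHash3 32-bit finalizer: spreads entropy into the low bits. */
static uint32_t toy_fmix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Returns a slot index in [0, TOY_AVC_SLOTS). */
static uint32_t toy_avc_hash(uint32_t ssid, uint32_t tsid, uint16_t tclass)
{
	uint32_t h = 0;	/* seed; zero is an arbitrary choice here */

	h = toy_mix(h, ssid);
	h = toy_mix(h, tsid);
	h = toy_mix(h, tclass);
	return toy_fmix32(h) & (TOY_AVC_SLOTS - 1);
}
```

Masking with a power-of-two slot count is what lets a single "bits" setting control the table size: raising TOY_AVC_HASH_BITS by one doubles the number of slots.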
Linus Torvalds
121cc35cfb lsm/stable-6.19 PR 20251201

Merge tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:

 - Rework the LSM initialization code

   What started as a "quick" patch to enable a notification event once
   all of the individual LSMs were initialized, snowballed a bit into a
   30+ patch patchset when everything was done. Most of the patches, and
   diffstat, are due to splitting out the initialization code into
   security/lsm_init.c and cleaning up some of the mess that was there.
   While not strictly necessary, it does clean up the code significantly,
   and hopefully makes the upkeep a bit easier in the future.

   Aside from the new LSM_STARTED_ALL notification, these changes also
   ensure that individual LSM initcalls are only called when the LSM is
   enabled at boot time. There should be a minor reduction in boot times
   for those who build multiple LSMs into their kernels, but only enable
   a subset at boot.

   It is worth mentioning that nothing at present makes use of the
   LSM_STARTED_ALL notification, but there is work in progress which is
   dependent upon LSM_STARTED_ALL.

 - Make better use of the seq_put*() helpers in device_cgroup

* tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits)
  lsm: use unrcu_pointer() for current->cred in security_init()
  device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
  lsm: add a LSM_STARTED_ALL notification event
  lsm: consolidate all of the LSM framework initcalls
  selinux: move initcalls to the LSM framework
  ima,evm: move initcalls to the LSM framework
  lockdown: move initcalls to the LSM framework
  apparmor: move initcalls to the LSM framework
  safesetid: move initcalls to the LSM framework
  tomoyo: move initcalls to the LSM framework
  smack: move initcalls to the LSM framework
  ipe: move initcalls to the LSM framework
  loadpin: move initcalls to the LSM framework
  lsm: introduce an initcall mechanism into the LSM framework
  lsm: group lsm_order_parse() with the other lsm_order_*() functions
  lsm: output available LSMs when debugging
  lsm: cleanup the debug and console output in lsm_init.c
  lsm: add/tweak function header comment blocks in lsm_init.c
  lsm: fold lsm_init_ordered() into security_init()
  lsm: cleanup initialize_lsm() and rename to lsm_init_single()
  ...
2025-12-03 09:53:48 -08:00
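Two behaviours from the LSM pull in 121cc35cfb lend themselves to a small model: initcalls run only for LSMs that are enabled at boot, and a notification fires once all of them have been initialized. The sketch below is a hypothetical userspace rendering; struct toy_lsm and the toy_ functions are invented for illustration and do not mirror security/lsm_init.c.

```c
#include <stddef.h>

struct toy_lsm {
	const char *name;
	int enabled;		/* selected at "boot" */
	int (*initcall)(void);	/* per-LSM setup, may be NULL */
};

static int toy_initcalls_run;	/* how many initcalls were invoked */
static int toy_started_all;	/* set once the notification has fired */

/* A trivial initcall used by every LSM in this model. */
static int toy_initcall(void)
{
	toy_initcalls_run++;
	return 0;
}

/* Stand-in for delivering the LSM_STARTED_ALL event to listeners. */
static void toy_notify_started_all(void)
{
	toy_started_all = 1;
}

/* Runs initcalls for enabled LSMs only, then sends the notification. */
static int toy_security_init(const struct toy_lsm *lsms, size_t n)
{
	int ok = 0;

	for (size_t i = 0; i < n; i++) {
		if (!lsms[i].enabled)
			continue;	/* built in but not enabled: skip */
		if (lsms[i].initcall && lsms[i].initcall() == 0)
			ok++;
	}
	toy_notify_started_all();
	return ok;
}
```

Skipping disabled entries is where the small boot-time saving mentioned in the pull message would come from in a kernel that builds in more LSMs than it enables.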
Linus Torvalds
a8058f8442 vfs-6.19-rc1.directory.locking

Merge tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull directory locking updates from Christian Brauner:
 "This contains the work to add centralized APIs for directory locking
  operations.

  This series is part of a larger effort to change directory operation
  locking to allow multiple concurrent operations in a directory. The
  ultimate goal is to lock the target dentry(s) rather than the whole
  parent directory.

  To help with changing the locking protocol, this series centralizes
  locking and lookup in new helper functions. The helpers establish a
  pattern where it is the dentry that is being locked and unlocked
  (currently the lock is held on dentry->d_parent->d_inode, but that can
  change in the future).

  This also changes vfs_mkdir() to unlock the parent on failure, as well
  as dput()ing the dentry. This allows end_creating() to only require
  the target dentry (which may be IS_ERR() after vfs_mkdir()), not the
  parent"

* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nfsd: fix end_creating() conversion
  VFS: introduce end_creating_keep()
  VFS: change vfs_mkdir() to unlock on failure.
  ecryptfs: use new start_creating/start_removing APIs
  Add start_renaming_two_dentries()
  VFS/ovl/smb: introduce start_renaming_dentry()
  VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
  VFS: add start_creating_killable() and start_removing_killable()
  VFS: introduce start_removing_dentry()
  smb/server: use end_removing_noperm for for target of smb2_create_link()
  VFS: introduce start_creating_noperm() and start_removing_noperm()
  VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
  VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
  VFS: tidy up do_unlinkat()
  VFS: introduce start_dirop() and end_dirop()
  debugfs: rename end_creating() to debugfs_end_creating()
2025-12-01 16:13:46 -08:00
Paul Moore
3ded250b97 selinux: rename the cred_security_struct variables to "crsec"
Along with the renaming from task_security_struct to cred_security_struct,
rename the local variables to "crsec" from "tsec".  This both fits with
existing conventions and helps distinguish between task and cred related
variables.

No functional changes.

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20 16:47:50 -05:00
Stephen Smalley
dde3a5d0f4 selinux: move avdcache to per-task security struct
The avdcache is meant to be per-task; move it to a new
task_security_struct that is duplicated per-task.

Cc: stable@vger.kernel.org
Fixes: 5d7ddc59b3 ("selinux: reduce path walk overhead")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line length fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20 16:43:51 -05:00
Stephen Smalley
75f72fe289 selinux: rename task_security_struct to cred_security_struct
Before Linux had cred structures, the SELinux task_security_struct was
per-task and although the structure was switched to being per-cred
long ago, the name was never updated. This change renames it to
cred_security_struct to avoid confusion and pave the way for the
introduction of an actual per-task security structure for SELinux. No
functional change.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20 16:43:50 -05:00
Coiby Xu
c200892b46 ima: Access decompressed kernel module to verify appended signature
Currently, when in-kernel module decompression (CONFIG_MODULE_DECOMPRESS)
is enabled, IMA has no way to verify the appended module signature as it
can't decompress the module.

Define a new kernel_read_file_id enumerator, READING_MODULE_COMPRESSED, so
IMA can calculate the compressed kernel module data hash on
READING_MODULE_COMPRESSED and defer appraising/measuring it until
READING_MODULE, when the module has been decompressed.

Before enabling in-kernel module decompression, a kernel module in
initramfs can still be loaded with ima_policy=secure_boot. So adjust the
kernel module rule in secure_boot policy to allow either an IMA
signature OR an appended signature i.e. to use
"appraise func=MODULE_CHECK appraise_type=imasig|modsig".

Reported-by: Karel Srot <ksrot@redhat.com>
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-11-19 09:19:42 -05:00
Al Viro
cd08d17f39 convert selinuxfs
Tree has invariant part + two subtrees that get replaced upon each
policy load.  Invariant parts stay for the lifetime of filesystem,
these two subdirs - from policy load to policy load (serialized
on lock_rename(root, ...)).

All object creations are via d_alloc_name()+d_add() inside selinuxfs,
all removals are via simple_recursive_removal().

Turn those d_add() into d_make_persistent()+dput() and that's mostly it.

Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
d1e4a99358 selinuxfs: new helper for attaching files to tree
allocating dentry after the inode has been set up reduces the amount
of boilerplate - "attach this inode under that name and this parent
or drop inode in case of failure" simplifies quite a few places.

Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro
d297622875 selinuxfs: don't stash the dentry of /policy_capabilities
Don't bother to store the dentry of /policy_capabilities - it belongs
to invariant part of tree and we only use it to populate that directory,
so there's no reason to keep it around afterwards.

Same situation as with /avc, /ss, etc.  There are two directories that
get replaced on policy load - /class and /booleans.  These we need to
stash (and update the pointers on policy reload); /policy_capabilities
is not in the same boat.

Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
NeilBrown
833d2b3a07 Add start_renaming_two_dentries()
A few callers want to lock for a rename and already have both dentries.
Also debugfs does want to perform a lookup but doesn't want permission
checking, so start_renaming_dentry() cannot be used.

This patch introduces start_renaming_two_dentries() which is given both
dentries.  debugfs performs one lookup itself.  As it will only continue
with a negative dentry and as those cannot be renamed or unlinked, it is
safe to do the lookup before getting the rename locks.

overlayfs uses start_renaming_two_dentries() in three places and selinux
uses it twice in sel_make_policy_nodes().

In sel_make_policy_nodes() we now lock for rename twice instead of just
once so the combined operation is no longer atomic w.r.t the parent
directory locks.  As selinux_state.policy_mutex is held across the whole
operation this does not open up any interesting races.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-13-neilb@ownmail.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14 13:15:58 +01:00
Hongru Zhang
20d387d7ce selinux: improve bucket distribution uniformity of avc_hash()
Reuse the already implemented MurmurHash3 algorithm. Under heavy stress
testing (on an 8-core system sustaining over 50,000 authentication events
per second), sample once per second and take the mean of 1800 samples:

1. Bucket utilization rate and length of longest chain
+--------------------------+-----------------------------------------+
|                          | bucket utilization rate / longest chain |
|                          +--------------------+--------------------+
|                          |      no-patch      |     with-patch     |
+--------------------------+--------------------+--------------------+
|  512 nodes,  512 buckets |      52.5%/7.5     |     60.2%/5.7      |
+--------------------------+--------------------+--------------------+
| 1024 nodes,  512 buckets |      68.9%/12.1    |     80.2%/9.7      |
+--------------------------+--------------------+--------------------+
| 2048 nodes,  512 buckets |      83.7%/19.4    |     93.4%/16.3     |
+--------------------------+--------------------+--------------------+
| 8192 nodes, 8192 buckets |      49.5%/11.4    |     60.3%/7.4      |
+--------------------------+--------------------+--------------------+

2. avc_search_node latency (total latency of hash operation and table
lookup)
+--------------------------+-----------------------------------------+
|                          |   latency of function avc_search_node   |
|                          +--------------------+--------------------+
|                          |      no-patch      |     with-patch     |
+--------------------------+--------------------+--------------------+
|  512 nodes,  512 buckets |        87ns        |        84ns        |
+--------------------------+--------------------+--------------------+
| 1024 nodes,  512 buckets |        97ns        |        96ns        |
+--------------------------+--------------------+--------------------+
| 2048 nodes,  512 buckets |       118ns        |       113ns        |
+--------------------------+--------------------+--------------------+
| 8192 nodes, 8192 buckets |       106ns        |        99ns        |
+--------------------------+--------------------+--------------------+

Although MurmurHash3 has higher overhead than the bitwise operations in
the original algorithm, the data shows that the MurmurHash3 achieves
better distribution, reducing average lookup time. Consequently, the
total latency of hashing and table lookup is lower than before.
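
For readers who want to see the shape of the hashing step, here is a
minimal userspace sketch of a MurmurHash3-style mix over three key fields,
masked down to a power-of-two bucket count. The function names, the field
order, and the choice of (ssid, tsid, tclass) as inputs are assumptions
made for illustration; only the constants are the standard 32-bit
MurmurHash3 ones. It is not the kernel's avc_hash().

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (0 < r < 32). */
static uint32_t sketch_rotl32(uint32_t v, unsigned int r)
{
	return (v << r) | (v >> (32 - r));
}

/* Fold one 32-bit input word into the running hash (MurmurHash3 body). */
static uint32_t sketch_mix(uint32_t hash, uint32_t v)
{
	v *= 0xcc9e2d51u;
	v = sketch_rotl32(v, 15);
	v *= 0x1b873593u;
	hash ^= v;
	hash = sketch_rotl32(hash, 13);
	return hash * 5u + 0xe6546b64u;
}

/*
 * Hypothetical stand-in for an AVC bucket hash: mix the three key fields,
 * run the MurmurHash3 finalizer, then mask to the bucket count.
 * "buckets" must be a non-zero power of two.
 */
static uint32_t sketch_avc_hash(uint32_t ssid, uint32_t tsid, uint16_t tclass,
				uint32_t buckets)
{
	uint32_t hash = 0;

	assert(buckets != 0 && (buckets & (buckets - 1)) == 0);

	hash = sketch_mix(hash, tclass);
	hash = sketch_mix(hash, tsid);
	hash = sketch_mix(hash, ssid);

	/* Finalizer: spread the remaining entropy across all bits. */
	hash ^= hash >> 16;
	hash *= 0x85ebca6bu;
	hash ^= hash >> 13;
	hash *= 0xc2b2ae35u;
	hash ^= hash >> 16;

	return hash & (buckets - 1);
}
```

The extra multiplies and rotates cost more than a few shifts and XORs, but
they make nearby key values land in unrelated buckets, which is the effect
the tables above measure.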

Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
[PM: whitespace fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23 18:24:30 -04:00
Hongru Zhang
929126ef4a selinux: Move avtab_hash() to a shared location for future reuse
This is a preparation patch, no functional change.

Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23 18:24:30 -04:00
Hongru Zhang
641e021758 selinux: Introduce a new config to make avc cache slot size adjustable
In high-load situations on mobile devices, permission checks can happen
more than 90,000 times per second (8-core system). With the default
configuration of 512 cache nodes, avc cache misses happen more often and
occasionally lead to long periods (>2ms) with irqs off on both big and
little cores, which degrades the system's real-time capability.

An actual call stack is as follows:
 => avc_compute_av
 => avc_perm_nonode
 => avc_has_perm_noaudit
 => selinux_capable
 => security_capable
 => capable
 => __sched_setscheduler
 => do_sched_setscheduler
 => __arm64_sys_sched_setscheduler
 => invoke_syscall
 => el0_svc_common
 => do_el0_svc
 => el0_svc
 => el0t_64_sync_handler
 => el0t_64_sync

Although we can expand avc nodes through
/sys/fs/selinux/avc/cache_threshold to mitigate the long irqs-off periods,
hash conflicts make the average bucket length longer because the number of
cache slots is fixed, which increases avc_search_node() latency.

So introduce a new config to make the avc cache slot size configurable as
well. With fine tuning, we can mitigate the long irqs-off periods at the
cost of only a slight avc_search_node() performance regression.

Theoretically, the main overhead is memory consumption.

Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23 18:24:30 -04:00
Thiébaud Weksteen
094e94d13b memfd,selinux: call security_inode_init_security_anon()
Prior to this change, no security hooks were called at the creation of a
memfd file. It means that, for SELinux as an example, it will receive
the default type of the filesystem that backs the in-memory inode. In
most cases, that would be tmpfs, but if MFD_HUGETLB is passed, it will
be hugetlbfs. Both can be considered implementation details of memfd.

It also means that it is not possible to differentiate between a file
coming from memfd_create and a file coming from a standard tmpfs mount
point.

Additionally, no permission is validated at creation, which differs from
the similar memfd_secret syscall.

Call security_inode_init_security_anon() during creation. This ensures
that the file is set up similarly to other anonymous inodes. On SELinux,
it means that the file will receive the security context of its task.

The ability to limit fexecve on memfd has been of interest to avoid
potential pitfalls where /proc/self/exe or similar would be executed
[1][2]. Reuse the "execute_no_trans" and "entrypoint" access vectors,
similarly to the file class. These access vectors may not make sense for
the existing "anon_inode" class. Therefore, define and assign a new
class "memfd_file" to support such access vectors.

Guard these changes behind a new policy capability named "memfd_class".

[1] https://crbug.com/1305267
[2] https://lore.kernel.org/lkml/20221215001205.51969-1-jeffxu@google.com/

Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
[PM: subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:28:27 -04:00
Paul Moore
3156bc814f selinux: move initcalls to the LSM framework
SELinux currently has a number of initcalls so we've created a new
function, selinux_initcall(), which wraps all of these initcalls so
that we have a single initcall function that can be registered with the
LSM framework.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:28 -04:00
Paul Moore
9f9dc69e06 lsm: replace the name field with a pointer to the lsm_id struct
Reduce the duplication between the lsm_id struct and the DEFINE_LSM()
definition by linking the lsm_id struct directly into the individual
LSM's DEFINE_LSM() instance.

Linking the lsm_id into the LSM definition also allows us to simplify
the security_add_hooks() function by removing the code which populates
the lsm_idlist[] array and moving it into the normal LSM startup code
where the LSM list is parsed and the individual LSMs are enabled,
making for a cleaner implementation with less overhead at boot.

Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22 19:24:18 -04:00
Linus Torvalds
33fc69a05c Simplifying ->d_name audits, easy part.
Turn dentry->d_name into an anon union of const struct qstr (d_name
 itself) and a writable alias (__d_name).  With constification of some
 struct qstr * arguments of functions that get &dentry->d_name passed
 to them, that ends up with all modifications provably done only in
 fs/dcache.c (and a fairly small part of it).
 
 Any new places doing modifications will be easy to find - grep for
 __d_name will suffice.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaNh6XAAKCRBZ7Krx/gZQ
 6wIFAP9nJ9RIsTq2eiqb3YUTQsaFZNu7aqFWiHCFPeHVLzylPwEAgeoGrGdL8zNO
 JqAuPPbQxN6Q6n79qAI/vfFvYQCsAQ0=
 =88fF
 -----END PGP SIGNATURE-----

Merge tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull d_name audit update from Al Viro:
 "Simplifying ->d_name audits, easy part.

  Turn dentry->d_name into an anon union of const struct qstr (d_name
  itself) and a writable alias (__d_name).

  With constification of some struct qstr * arguments of functions that
  get &dentry->d_name passed to them, that ends up with all
  modifications provably done only in fs/dcache.c (and a fairly small
  part of it).

  Any new places doing modifications will be easy to find - grep for
  __d_name will suffice"
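
The const/writable alias arrangement described above can be sketched in
plain C11 as follows. All sketch_* names and d_name_rw are hypothetical
stand-ins (d_name_rw plays the role of __d_name); the kernel's real struct
dentry has many more fields, and the sketch relies on the same compiler
behaviour the kernel does, namely that both union members share storage.

```c
#include <string.h>

/* Simplified stand-in for struct qstr. */
struct sketch_qstr {
	unsigned int len;
	const char *name;
};

struct sketch_dentry {
	union {
		/* What ordinary code reads; writes through it do not compile. */
		const struct sketch_qstr d_name;
		/* Writable alias, meant for one small part of the code only. */
		struct sketch_qstr d_name_rw;
	};
};

/* The only place allowed to change the name goes through the alias. */
static void sketch_set_name(struct sketch_dentry *d, const char *name)
{
	d->d_name_rw.name = name;
	d->d_name_rw.len = (unsigned int)strlen(name);
}

/* Readers use the const view; "d->d_name.len = 0;" here would be rejected. */
static unsigned int sketch_name_len(const struct sketch_dentry *d)
{
	return d->d_name.len;
}
```

Because every modification has to spell out the alias name, a grep for
that one identifier lists all writers.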

* tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make it easier to catch those who try to modify ->d_name
  generic_ci_validate_strict_name(): constify name argument
  afs_dir_search: constify qstr argument
  afs_edit_dir_{add,remove}(): constify qstr argument
  exfat_find(): constify qstr argument
  security_dentry_init_security(): constify qstr argument
2025-10-03 11:14:02 -07:00
Linus Torvalds
76f01a4f22 lsm/stable-6.18 PR 20250926
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmjWq9QUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMHIQ//dYegdfQvUB/eD4rnnlNEgmGAFiAg
 pAdAA5F+6bINfm2X622cqxKa71f9hqhAgt+k7KPuZyr1dSOhv55HkO8ibCwpGj6G
 IIYnhf9PPQXH1hdHI0uLlaxNSCf2T3uJf5I261g1zS54XaVhgYZKzlwaMjvF1w/u
 6DmvOJleIOH+Qu8D6+B79XxTtmEbgdJ2yNpb6tSgUhbD3a1zBuM+8EBBRf6q1IaL
 xoCTBzkEWR2M2V1bwqPycSbSqKmOnQwROTICRCXjOCjOlxbXXKQtnfb+26mU3ZAy
 5hnNkGjopgrvLSDE8Y9uX3WmLr3o1JmJGcRPrmXseOdd+mxcmZiXeWnVA5ZG5rxI
 ObFbj4nnn1VnayU7zFl/FW5weezqIEUC1+bfGh1PUWHwlbdF1Z2+eObbTWGjx0ev
 T42OC9MnfzU8poGEi+Wudg9LixzWkto1J2rCnHatQ/9FMpQoMCbTPNWfkPnf7pGc
 stml9Xd/3pxm6ah3VVLPiNpQJYidLgAT2REYvYLaiJTyu+OPi2zAvzcov3KaGQHV
 bQ6NGhZ0NdoM5L00N2yfeEuzh/NNwdDvhcp5hlTBSjbNqdgU1XE/PD5TKwzH6291
 Fjy4U/9UkWTJclrGYCiN87lfVpjvtk5vc0+tjS/908Pi4pIAsLtLZ9tJ9d7yqH/7
 FFA5bwob7mQ08fk=
 =jK6L
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Move the management of the LSM BPF security blobs into the framework

   In order to enable multiple LSMs we need to allocate and free the
   various security blobs in the LSM framework and not the individual
   LSMs as they would end up stepping all over each other.

 - Leverage the lsm_blob_alloc() helper in lsm_bdev_alloc()

   Make better use of our existing helper functions to reduce some code
   duplication.

 - Update the Rust cred code to use 'sync::aref'

   Part of a larger effort to move the Rust code over to the 'sync'
   module.

 - Make CONFIG_LSM dependent on CONFIG_SECURITY

   As the CONFIG_LSM Kconfig setting is an ordered list of the LSMs to
   enable at boot, it obviously doesn't make much sense to enable this
   when CONFIG_SECURITY is disabled.

 - Update the LSM and CREDENTIALS sections in MAINTAINERS with Rusty
   bits

   Add the Rust helper files to the associated LSM and CREDENTIALS
   entries in the MAINTAINERS file. We're trying to improve the
   communication between the two groups and making sure we're all aware
   of what is going on via cross-posting to the relevant lists is a good
   way to start.

* tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: CONFIG_LSM can depend on CONFIG_SECURITY
  MAINTAINERS: add the associated Rust helper to the CREDENTIALS section
  MAINTAINERS: add the associated Rust helper to the LSM section
  rust,cred: update AlwaysRefCounted import to sync::aref
  security: use umax() to improve code
  lsm,selinux: Add LSM blob support for BPF objects
  lsm: use lsm_blob_alloc() in lsm_bdev_alloc()
2025-09-30 08:48:29 -07:00
Linus Torvalds
57bc683896 selinux/stable-6.18 PR 20250926
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmjWq78UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMgnw/9Hc3ZGataAiWXNqRiMxVGB39rHidN
 zKVOIqAG4NT9J0koIt2FEI6LmI6KZ8JRpHAygLdb5nVHTmpOhDYIvZwIMU6Y91UF
 1Q041TQXfcv8Wc1cCab1Sj+28LAhUoFap2QesnoudV+5xpl0DhLBcDDXEneN90bp
 zrXS3Yq6wg+QCtY9CkUHuKjHC0tAZI56x9XO2nrzgAJgCzBT2F7WOlucMVA+40OB
 9pWoL5PZ+MF3/y7dtMqZV/vqM45mwj9jQ6UIi6T/QvL5XoH8bmi97xjVPtb3GZG6
 UWDrWJIUam7oRjYootHP59+6QjNquJkScjuLqsx82pUwrhWKJHLSKrOCagH0eWAj
 liFfqAwHznQL6IkHwtwdfchlnf4/LCkLeB510kK/Asr4723E9/DdOReB+lPjmwIo
 ibuZYfmVoETSjRv2LVva9hqGdT3aZezpx2qLnItc8WJgmQ3pSy1nb5iY5wS3S3/7
 aMArggG+3k3YOoVBmQjWjWNrDH9ky/efAxxTc6N6fmF7EqUaymWSxixYMM/hsoCQ
 JjVmUp/fOOAKweNnEHMlYy/aKKMCpB3hgJyV6GPNzTOvbG/Trz/QP1BH2xsiHsvg
 iWvHc+6L6euOmW8laaxoKUnnfweuUaXWtDLxE+aLqGyoX2XVMXSmD45hZ2QmsIyo
 acQEHZ38V4M3gqI=
 =eLHt
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Support per-file labeling for functionfs

   Both genfscon and user defined labeling methods are supported. This
   should help users who want to provide separation between the control
   endpoint file, "ep0", and other endpoints.

 - Remove our use of get_zeroed_page() in sel_read_bool()

   Update sel_read_bool() to use a four byte stack buffer instead of a
   memory page fetched via get_zeroed_page(), and fix a memory leak in
   the process.

   Needless to say we should have done this a long time ago, but it was
   in a very old chunk of code that "just worked" and I don't think
   anyone had taken a real look at it in many years.

 - Better use of the netdev skb/sock helper functions

   Convert a sk_to_full_sk(skb->sk) into a skb_to_full_sk(skb) call.

 - Remove some old, dead, and/or redundant code

* tag 'selinux-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: enable per-file labeling for functionfs
  selinux: fix sel_read_bool() allocation and error handling
  selinux: Remove redundant __GFP_NOWARN
  selinux: use a consistent method to get full socket from skb
  selinux: Remove unused function selinux_policycap_netif_wildcard()
2025-09-30 08:30:32 -07:00
Linus Torvalds
56a0810d8c audit/stable-6.18 PR 20250926
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmjWq5oUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPTjRAAwYapnw+ZGdFtTGIDT63HtlKjCGHg
 DRR8J1RYWhxQL78dInjl7hlGPd4ULdpdF6zsh27X/8OsdFotw4NhDyPJwS1qWZv9
 uBJMy/s1Qi1V/rrtDygLGgkQ9ICfl/hgVh3L+g9AXU8H9IapMULp33z+2ueFU4rA
 PXgXppgNQTOhIQml0tagY7iPlLaaI1uPv/Dbvt792CSrKZReC+uiDSQKD6SUy5oJ
 NBRs0emdCqbllo8Eo7wTGdfzUttsPWYHe7X9BGCMK2bHp0BpMnFBDtuipUAgjNE8
 O16EkAtBMpEBW9VEFvDYW1jMFO7ccD8b09CbqPLdE7E0GeigTiODg+FdncKEpZn0
 Dl4xPbIoPBHVrDHKFK3HcuEdUs0FZH3NpTLFRg0/nWbg3CfSOFq1ZKhSbwLTZ48V
 2Iq22G0hIIl3yTEePSoR8xCSQkWf6hA1SVvzBqw5Xn1tnkdIUuM+KzeZUPKxCOiH
 r5b3ufrN5YMAcmc59q393sNuSMd7s97fohhK8/HouB93EcVNM2UjLEKVJnhMhYRE
 N21O17jwQG9F+OYTnmtMzuUF6yxwSAmkzQOg6F+lalJ8MECnNrZOEeyuA3d5ISi5
 4ZrXHWw90qaDy9lCV1o0UwWt9na+WxeMCJNpI07h5V1k3x7BULiI6WeP7J1qnY9r
 YlLv/6Hgx29dtqE=
 =iQal
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:

 - Proper audit support for multiple LSMs

   As the audit subsystem predated the work to enable multiple LSMs,
   some additional work was needed to support logging the different LSM
   labels for the subjects/tasks and objects on the system. Casey's
   patches add new auxiliary records for subjects and objects that
   convey the additional labels.

 - Ensure fanotify audit events are always generated

   Generally speaking security relevant subsystems always generate audit
   events, unless explicitly ignored. However, up to this point fanotify
   events had been ignored by default, but starting with this pull
   request fanotify follows convention and generates audit events by
   default.

 - Replace an instance of strcpy() with strscpy()

 - Minor indentation, style, and comment fixes

* tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix skb leak when audit rate limit is exceeded
  audit: init ab->skb_list earlier in audit_buffer_alloc()
  audit: add record for multiple object contexts
  audit: add record for multiple task security contexts
  lsm: security_lsmblob_to_secctx module selection
  audit: create audit_stamp structure
  audit: add a missing tab
  audit: record fanotify event regardless of presence of rules
  audit: fix typo in auditfilter.c comment
  audit: Replace deprecated strcpy() with strscpy()
  audit: fix indentation in audit_log_exit()
2025-09-30 08:22:16 -07:00
Al Viro
f9fadf23c7 security_dentry_init_security(): constify qstr argument
Nothing outside of fs/dcache.c has any business modifying
dentry names; passing &dentry->d_name as an argument should
have that argument declared as a const pointer.

Acked-by: Casey Schaufler <casey@schaufler-ca.com> # smack part
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15 21:08:33 -04:00
Neill Kapron
68e1e908cb selinux: enable per-file labeling for functionfs
This patch adds support for genfscon per-file labeling of functionfs
files as well as support for userspace to apply labels after new
functionfs endpoints are created.

This allows for separate labels and therefore access control on a
per-endpoint basis. An example use case would be for the default
endpoint EP0 used as a restricted control endpoint, and additional
usb endpoints to be used by other more permissive domains.

It should be noted that if there are multiple functionfs mounts on a
system, genfs file labels will apply to all mounts, and therefore will not
likely be as useful as the userspace relabeling portion of this patch -
the addition to selinux_is_genfs_special_handling().

This patch introduces the functionfs_seclabel policycap to maintain
existing functionfs genfscon behavior unless explicitly enabled.

Signed-off-by: Neill Kapron <nkapron@google.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: trim changelog, apply boolean logic fixup]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-09-07 12:54:56 -04:00
Stephen Smalley
59ffc9beeb selinux: fix sel_read_bool() allocation and error handling
Switch sel_read_bool() from using get_zeroed_page() and free_page()
to a stack-allocated buffer. This also fixes a memory leak in the
error path when security_get_bool_value() returns an error.
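
As a rough userspace illustration of why a stack buffer is enough here:
the output is two single-digit values separated by a space, so four bytes
hold the text plus its terminating NUL. The helper name, the buffer
constant, and the "%d %d" layout are assumptions for the sketch, not a
copy of sel_read_bool().

```c
#include <stdio.h>

#define SKETCH_BOOL_BUF_LEN 4 /* "1 0" plus the terminating NUL */

/*
 * Format the current and pending boolean values into a caller-provided
 * four-byte buffer. Returns the number of characters written (excluding
 * the NUL) or -1 if the text would not fit.
 */
static int sketch_format_bool(char out[SKETCH_BOOL_BUF_LEN],
			      int current, int pending)
{
	/* Normalize to 0/1 so each value is exactly one digit. */
	int len = snprintf(out, SKETCH_BOOL_BUF_LEN, "%d %d",
			   !!current, !!pending);

	if (len < 0 || len >= SKETCH_BOOL_BUF_LEN)
		return -1;
	return len;
}
```

With no page allocation there is nothing to free, so an early return on
error cannot leak memory.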

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-09-03 17:34:32 -04:00
Simon Schuster
edd3cb05c0 copy_process: pass clone_flags as u64 across calltree
With the introduction of clone3 in commit 7f192e3cd3 ("fork: add
clone3") the effective bit width of clone_flags on all architectures was
increased from 32-bit to 64-bit, with a new type of u64 for the flags.
However, for most consumers of clone_flags the interface was not
changed from the previous type of unsigned long.

While this works fine as long as none of the new 64-bit flag bits
(CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still
undesirable in terms of the principle of least surprise.

Thus, this commit fixes all relevant interfaces of callees to
sys_clone3/copy_process (excluding the architecture-specific
copy_thread) to consistently pass clone_flags as u64, so that
no truncation to 32-bit integers occurs on 32-bit architectures.
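
The truncation in question can be shown with a small userspace sketch.
The two flag values mirror the uapi definitions of CLONE_CLEAR_SIGHAND and
CLONE_INTO_CGROUP; the helper names are hypothetical, and uint32_t is used
here to model unsigned long on a 32-bit architecture.

```c
#include <stdint.h>

/* Flag bits above bit 31, mirroring the uapi clone3 flag values. */
#define SKETCH_CLONE_CLEAR_SIGHAND 0x100000000ULL
#define SKETCH_CLONE_INTO_CGROUP   0x200000000ULL

/* Callee that receives the flags as a full 64-bit value. */
static int sketch_flag_set_u64(uint64_t clone_flags, uint64_t flag)
{
	return (clone_flags & flag) != 0;
}

/*
 * Callee that receives the flags through a 32-bit wide parameter, as
 * unsigned long would be on a 32-bit architecture: the upper half is
 * dropped before the test is made.
 */
static int sketch_flag_set_ulong32(uint64_t clone_flags, uint64_t flag)
{
	uint32_t narrowed = (uint32_t)clone_flags;

	return (narrowed & flag) != 0;
}
```

Passing SKETCH_CLONE_INTO_CGROUP to the second helper reports the flag as
clear even though the caller set it, which is the surprise the u64
conversion avoids.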

Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com>
Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01 15:31:34 +02:00
Casey Schaufler
0ffbc876d0 audit: add record for multiple object contexts
Create a new audit record AUDIT_MAC_OBJ_CONTEXTS.
An example of the MAC_OBJ_CONTEXTS record is:

    type=MAC_OBJ_CONTEXTS
      msg=audit(1601152467.009:1050):
      obj_selinux=unconfined_u:object_r:user_home_t:s0

When an audit event includes an AUDIT_MAC_OBJ_CONTEXTS record
the "obj=" field in other records in the event will be "obj=?".
An AUDIT_MAC_OBJ_CONTEXTS record is supplied when the system has
multiple security modules that may make access decisions based
on an object security context.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj tweak, audit example readability indents]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30 10:15:30 -04:00
Casey Schaufler
eb59d494ee audit: add record for multiple task security contexts
Replace the single skb pointer in an audit_buffer with a list of
skb pointers. Add the audit_stamp information to the audit_buffer as
there's no guarantee that there will be an audit_context containing
the stamp associated with the event. At audit_log_end() time create
auxiliary records as have been added to the list. Functions are
created to manage the skb list in the audit_buffer.

Create a new audit record AUDIT_MAC_TASK_CONTEXTS.
An example of the MAC_TASK_CONTEXTS record is:

    type=MAC_TASK_CONTEXTS
      msg=audit(1600880931.832:113)
      subj_apparmor=unconfined
      subj_smack=_

When an audit event includes an AUDIT_MAC_TASK_CONTEXTS record the
"subj=" field in other records in the event will be "subj=?".
An AUDIT_MAC_TASK_CONTEXTS record is supplied when the system has
multiple security modules that may make access decisions based on a
subject security context.
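
A small userspace sketch of the field layout described above, assuming a
single-LSM system keeps the plain "subj=<context>" form. The struct and
function names are hypothetical and the real audit code builds skbs rather
than strings; this only shows which text ends up where.

```c
#include <stddef.h>
#include <stdio.h>

/* One (LSM name, subject context) pair, e.g. ("smack", "_"). */
struct sketch_lsm_ctx {
	const char *lsm;
	const char *context;
};

/*
 * Fill subj_field with the "subj=" text for the main record and aux with
 * the body of a MAC_TASK_CONTEXTS-style record (left empty when only one
 * LSM supplies a context). Returns 0 on success, -1 on bad input or if a
 * buffer is too small.
 */
static int sketch_format_subj(const struct sketch_lsm_ctx *ctxs, size_t n,
			      char *subj_field, size_t subj_len,
			      char *aux, size_t aux_len)
{
	size_t used = 0;
	size_t i;
	int ret;

	if (n == 0 || aux_len == 0)
		return -1;
	aux[0] = '\0';

	if (n == 1) {
		/* Single LSM: the context goes straight into "subj=". */
		ret = snprintf(subj_field, subj_len, "subj=%s", ctxs[0].context);
		return (ret < 0 || (size_t)ret >= subj_len) ? -1 : 0;
	}

	/* Multiple LSMs: placeholder in the main record ... */
	ret = snprintf(subj_field, subj_len, "subj=?");
	if (ret < 0 || (size_t)ret >= subj_len)
		return -1;

	/* ... and one subj_<lsm>=<context> pair each in the auxiliary record. */
	for (i = 0; i < n; i++) {
		ret = snprintf(aux + used, aux_len - used, "%ssubj_%s=%s",
			       i ? " " : "", ctxs[i].lsm, ctxs[i].context);
		if (ret < 0 || (size_t)ret >= aux_len - used)
			return -1;
		used += (size_t)ret;
	}
	return 0;
}
```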

Refactor audit_log_task_context(), creating a new audit_log_subj_ctx().
This is used in netlabel auditing to provide multiple subject security
contexts as necessary.

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj tweak, audit example readability indents]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30 10:15:30 -04:00
Qianfeng Rong
f20e70a341 selinux: Remove redundant __GFP_NOWARN
Commit 16f5dfbc85 ("gfp: include __GFP_NOWARN in GFP_NOWAIT")
made GFP_NOWAIT implicitly include __GFP_NOWARN.

Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT
(e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean
up these redundant flags across subsystems.

No functional changes.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: fixed horizontal spacing / alignment, line wraps]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-12 13:18:13 -04:00
Blaise Boscaccy
5816bf4273 lsm,selinux: Add LSM blob support for BPF objects
This patch introduces LSM blob support for BPF maps, programs, and
tokens to enable LSM stacking and multiplexing of LSM modules that
govern BPF objects. Additionally, the existing BPF hooks used by
SELinux have been updated to utilize the new blob infrastructure,
removing the assumption of exclusive ownership of the security
pointer.

Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
[PM: dropped local variable init, style fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11 17:56:09 -04:00
Tianjia Zhang
d4e8dc8e8b selinux: use a consistent method to get full socket from skb
In order to maintain code consistency and readability,
skb_to_full_sk() is used to get full socket from skb.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11 11:59:11 -04:00
Yue Haibing
5f9383bd41 selinux: Remove unused function selinux_policycap_netif_wildcard()
This is unused since commit a3d3043ef2 ("selinux: get netif_wildcard
policycap from policy instead of cache").

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11 11:59:11 -04:00
Linus Torvalds
dffb641bea selinux/stable-6.17 PR 20250725
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmiD08EUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOFzRAAwTbLKcFggDs0bG6AfqT5KCQHRtRW
 z/0+kvDZeKzJvmJ4XvNMdW3orC2iIYgGCYJgj8umEXh1BOGIOSd4edpGaUKvJB9Z
 S4py0I3udGjlchInmDBZqW2qGaN0EAl/s0AWBbf6QLYmMV1r9+nb5lH71pVBbEQv
 4Vltg61KkQPc6PuDpk7F+BD7yYKSCqxxWyy7zEJsFMntg4jCyoI2PfRhrHM5xuCE
 RRqctblhCc4nNqzhPvse4Vt9PD+cVQx0Pi7arxvSD8M7mVrFTsUV17OoSKlCcLGh
 1LlFk7heoDNC/UQy7xFg5hsy8GeI6gYYY/Fomu7Ue3SYJL2fqqr7xsbJyCCNg8kC
 S17gRPBxqftzl/2SE76V8y5toIRMtT7b+6aXudtukfFbxtBq09XNflq1G3iR9fTo
 o6f9MGpyWW3t/e8wk2yPYfXNXrLe3yJoxV6lIsDaHejOwGi098ArcahKKuxNzRit
 AikIoqn8GEO+j3/X/4YS1xbdS/HkG7N3t+Rbr2GNO4XsmU6iqZRBkmz5UlCcL8u1
 OTQNjzqqSy/DKAfrPAyqN7wiBVXt5siBPwAXIcPbh9JfAI/kRN7LUxAQeGnUps+o
 kJn2zCV3G4kD2qtCVCQ7VifIYGCMxa1PTYsiwu5S4Wgm1Ducvtsx4r3TVKiCuObC
 bwv/1k9kpkvLy/E=
 =M55j
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Introduce the concept of a SELinux "neveraudit" type which prevents
   all auditing of the given type/domain.

   Taken by itself, the benefit of marking a SELinux domain with the
   "neveraudit" tag is likely not very interesting, especially given the
   significant overlap with the "dontaudit" tag.

   However, given that the "neveraudit" tag applies to *all* auditing of
   the tagged domain, we can do some fairly interesting optimizations
   when a SELinux domain is marked as both "permissive" and "neveraudit"
   (think of the unconfined_t domain).

   While this pull request includes optimized inode permission and
   getattr hooks, these optimizations require SELinux policy changes,
   therefore the improvements may not be visible on standard downstream
   Linux distros for a period of time.

 - Continue the deprecation process of /sys/fs/selinux/user.

   After removing the associated userspace code in 2020, we marked the
   /sys/fs/selinux/user interface as deprecated in Linux v6.13 with
   pr_warn() and the usual documentation update.

   This adds a five second sleep after the pr_warn(), following a
   previous deprecation process pattern that has worked well for us in
   the past in helping identify any existing users that we haven't yet
   reached.

 - Add a __GFP_NOWARN flag to our initial hash table allocation.

   Fuzzers such as syzbot often attempt abnormally large SELinux policy
   loads, which the SELinux code gracefully handles by checking for
   allocation failures, but not before the allocator emits a warning
   which causes the automated fuzzing to flag this as an error and
   report it to the list. While we want to continue to support the work
   done by the fuzzing teams, we want to focus on proper issues and not
   an error case that is already handled safely. Add a NOWARN flag to
   quiet the allocator and prevent syzbot from tripping on this again.

 - Remove some unnecessary selinuxfs cleanup code, courtesy of Al.

 - Update the SELinux in-kernel documentation with pointers to
   additional information.
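
The reasoning behind the "neveraudit" optimization in the first item can
be sketched as follows: a permissive domain is never denied, and a
neveraudit domain is never logged, so for a domain with both properties
the result of the full check cannot change anything. The struct, enum, and
function names below are hypothetical and do not reflect the actual
SELinux hook code.

```c
#include <stdbool.h>

/* Hypothetical per-domain properties taken from the loaded policy. */
struct sketch_domain {
	bool permissive;  /* denials are not enforced */
	bool neveraudit;  /* nothing about this domain is audited */
};

enum sketch_result {
	SKETCH_ALLOW_FAST,  /* skip the access vector lookup entirely */
	SKETCH_FULL_CHECK   /* fall through to the normal permission check */
};

/*
 * Decide whether a permission hook can return early. Only the combination
 * of both flags qualifies: permissive alone still needs the check so that
 * would-be denials can be audited, and neveraudit alone still needs it so
 * that denials are enforced.
 */
static enum sketch_result sketch_inode_permission(const struct sketch_domain *d)
{
	if (d->permissive && d->neveraudit)
		return SKETCH_ALLOW_FAST;
	return SKETCH_FULL_CHECK;
}
```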

* tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: don't bother with selinuxfs_info_free() on failures
  selinux: add __GFP_NOWARN to hashtab_init() allocations
  selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
  selinux: introduce neveraudit types
  documentation: add links to SELinux resources
  selinux: add a 5 second sleep to /sys/fs/selinux/user
2025-07-28 18:25:57 -07:00
Linus Torvalds
57fcb7d930 vfs-6.17-rc1.fileattr
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCpgAKCRCRxhvAZXjc
 oqfFAQDcy3rROUF3W34KcSi7rDmaKVSX53d1tUoqH+1zDRpSlwEAriKDNC1ybudp
 YAnxVzkRHjHs1296WIuwKq5lfhJ60Q4=
 =geAl
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls
  after lengthy discussions.

  Both system calls serve as successors and extensible companions to
  the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR ioctls which have
  started to show their age in addition to being named in a way that
  makes it easy to conflate them with extended attribute related
  operations.

  These syscalls allow userspace to set filesystem inode attributes on
  special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new
  inodes in these directories inherit project ID set on parent
  directory.

  The project is created from userspace by opening each inode and
  calling FS_IOC_FSSETXATTR on it. This is not possible for special
  files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
  with an empty project ID. Those inodes are then not shown in the
  quota accounting but still exist in the directory. This is not
  critical, but when special files are created in a directory with an
  already existing project quota, these new inodes inherit the extended
  attributes. This creates a mix of special files with and without
  attributes. Moreover, special files with attributes have no way to
  clear or change them. This, in turn, prevents userspace from
  re-creating the quota project on these existing files.

  In addition, these new system calls allow the implementation of
  additional attributes that we couldn't or didn't want to fit into the
  legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-28 15:24:14 -07:00
Christian Brauner
ca115d7e75
tree-wide: s/struct fileattr/struct file_kattr/g
Now that we expose struct file_attr as our uapi struct, rename the
internal struct to struct file_kattr to clearly communicate that it is a
kernel-internal struct. This is similar to struct mount_{k}attr and
others.

Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04 16:14:39 +02:00
Andrey Albershteyn
bd14e462bb
selinux: implement inode_file_[g|s]etattr hooks
These hooks are called on inode extended attribute retrieval/change.

Cc: selinux@vger.kernel.org
Cc: Paul Moore <paul@paul-moore.com>

Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-3-c4e3bc35227b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-01 22:44:29 +02:00
Al Viro
ee79ba39b3 selinux: don't bother with selinuxfs_info_free() on failures
Failures in sel_fill_super() will be followed by sel_kill_sb(), which
will call selinuxfs_info_free() anyway.
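
The ownership rule behind this change can be sketched in plain userspace C. This is only a model: the model_* names and struct model_sb are hypothetical stand-ins, not the kernel functions. It assumes, as the message states, that the teardown callback always runs after a failed fill and is the only place that frees the private info.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a superblock with private fs info. */
struct model_sb {
	void *fs_info;
	int free_calls;		/* counts how often the info was freed */
};

/* Frees the private info exactly once. */
void model_info_free(struct model_sb *sb)
{
	if (sb->fs_info) {
		free(sb->fs_info);
		sb->fs_info = NULL;
		sb->free_calls++;
	}
}

/* Teardown callback: always responsible for freeing the info. */
void model_kill_sb(struct model_sb *sb)
{
	model_info_free(sb);
}

/* Setup: on failure it returns without freeing anything itself. */
int model_fill_super(struct model_sb *sb, int simulate_failure)
{
	sb->fs_info = malloc(16);
	if (!sb->fs_info)
		return -1;
	if (simulate_failure)
		return -1;	/* no cleanup here, teardown handles it */
	return 0;
}

/* Caller: a failed fill is followed by the teardown callback. */
int model_mount(struct model_sb *sb, int simulate_failure)
{
	int err = model_fill_super(sb, simulate_failure);

	if (err)
		model_kill_sb(sb);
	return err;
}
```

With this split, an extra free in the failure path of model_fill_super() would be redundant at best, which mirrors the reasoning in the commit message.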

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christian Brauner <brauner@kernel.org>
[PM: subj and description tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-24 19:39:28 -04:00
Paul Moore
9ab71d9204 selinux: add __GFP_NOWARN to hashtab_init() allocations
As reported by syzbot, hashtab_init() can be affected by abnormally
large policy loads which would cause the kernel's allocator to emit
a warning in some configurations.  Since the SELinux hashtab_init()
code handles the case where the allocation fails, due to a large
request or some other reason, we can safely add the __GFP_NOWARN flag
to squelch these abnormally large allocation warnings.
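
A rough userspace model of why the flag is safe, under stated assumptions: MODEL_MAX_ALLOC, MODEL_NOWARN and the model_* functions are hypothetical stand-ins for the kernel allocator and hashtab_init(), and the size cap is arbitrary. The point is that the caller already handles a NULL return, so suppressing the warning changes only the log output, not the behaviour.

```c
#include <stdio.h>
#include <stdlib.h>

#define MODEL_MAX_ALLOC	(4u << 20)	/* hypothetical allocation cap */
#define MODEL_NOWARN	0x1u		/* hypothetical "no warning" flag */

int model_warnings;			/* number of warnings emitted */

/* Allocator model: oversized requests fail and warn unless told not to. */
void *model_calloc(size_t n, size_t size, unsigned int flags)
{
	if (size != 0 && n > MODEL_MAX_ALLOC / size) {
		if (!(flags & MODEL_NOWARN)) {
			model_warnings++;
			fprintf(stderr, "model: oversized allocation\n");
		}
		return NULL;
	}
	return calloc(n, size);
}

struct model_hashtab {
	void **htable;
	size_t size;
};

/* Init model: the failure path exists regardless of the warning. */
int model_hashtab_init(struct model_hashtab *h, size_t nslots,
		       unsigned int flags)
{
	h->htable = NULL;
	h->size = 0;
	if (nslots == 0)
		return 0;

	h->htable = model_calloc(nslots, sizeof(*h->htable), flags);
	if (!h->htable)
		return -1;	/* caller sees an ordinary allocation failure */

	h->size = nslots;
	return 0;
}
```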

Reported-by: syzbot+bc2c99c2929c3d219fb3@syzkaller.appspotmail.com
Tested-by: syzbot+bc2c99c2929c3d219fb3@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 17:24:57 -04:00
Stephen Smalley
951b2de06a selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
Extend the task avdcache to also cache whether the task SID is both
permissive and neveraudit, and return immediately if so in both
selinux_inode_getattr() and selinux_inode_permission().

The same approach could be applied to many of the hook functions
although the avdcache would need to be updated for more than directory
search checks in order for this optimization to be beneficial for checks
on objects other than directories.
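
The early return described above can be sketched as follows. This is a simplified userspace model, not the kernel code: struct model_task_sec, model_full_check() and model_inode_permission() are hypothetical names. The idea is that a domain that is both permissive (denials are not enforced) and neveraudit (nothing is logged) always gets the same answer, so the cached flag lets the hook skip the full check.

```c
#include <stdbool.h>

/* Hypothetical per-task security state with the cached flag. */
struct model_task_sec {
	unsigned int sid;
	bool permissive_neveraudit;	/* cached: permissive AND neveraudit */
};

int model_full_checks;			/* how often the slow path ran */

/* Stand-in for the full access check; here it simply allows. */
int model_full_check(const struct model_task_sec *tsec,
		     unsigned int requested)
{
	(void)tsec;
	(void)requested;
	model_full_checks++;
	return 0;
}

/* Hook model: return immediately when the outcome cannot differ. */
int model_inode_permission(const struct model_task_sec *tsec,
			   unsigned int requested)
{
	if (tsec->permissive_neveraudit)
		return 0;	/* nothing to enforce, nothing to audit */

	return model_full_check(tsec, requested);
}
```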

To test, apply https://github.com/SELinuxProject/selinux/pull/473 to
your selinux userspace, build and install libsepol, and use the following
CIL policy module:
$ cat neverauditpermissive.cil
(typeneveraudit unconfined_t)
(typepermissive unconfined_t)

Without this module inserted, running the following commands:
   perf record make -jN # on an already built allmodconfig tree
   perf report --sort=symbol,dso
yields the following percentages (only showing __d_lookup_rcu for
reference and only showing relevant SELinux functions):
   1.65%  [k] __d_lookup_rcu
   0.53%  [k] selinux_inode_permission
   0.40%  [k] selinux_inode_getattr
   0.15%  [k] avc_lookup
   0.05%  [k] avc_has_perm
   0.05%  [k] avc_has_perm_noaudit
   0.02%  [k] avc_policy_seqno
   0.02%  [k] selinux_file_permission
   0.01%  [k] selinux_inode_alloc_security
   0.01%  [k] selinux_file_alloc_security
for a total of 1.24% for SELinux compared to 1.65% for
__d_lookup_rcu().

After inserting this module with:
   semodule -i neverauditpermissive.cil
re-running the same perf commands from above yields
the following non-zero percentages:
   1.74%  [k] __d_lookup_rcu
   0.31%  [k] selinux_inode_permission
   0.03%  [k] selinux_inode_getattr
   0.03%  [k] avc_policy_seqno
   0.01%  [k] avc_lookup
   0.01%  [k] selinux_file_permission
   0.01%  [k] selinux_file_open
for a total of 0.40% for SELinux compared to 1.74% for
__d_lookup_rcu().
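
The two totals can be re-derived from the listed SELinux rows with a few lines of C. The values below are copied from the two perf listings above and stored in hundredths of a percent to avoid floating-point rounding. The array and function names are illustrative only.

```c
#include <stddef.h>

/* SELinux rows before inserting the module, in hundredths of a percent. */
const int selinux_before[] = { 53, 40, 15, 5, 5, 2, 2, 1, 1 };

/* SELinux rows after inserting the module, in hundredths of a percent. */
const int selinux_after[] = { 31, 3, 3, 1, 1, 1 };

/* Sum a list of profile shares given in hundredths of a percent. */
int sum_hundredths(const int *values, size_t count)
{
	int total = 0;
	size_t i;

	for (i = 0; i < count; i++)
		total += values[i];
	return total;
}
```

Summing the first array gives 124 (1.24%) and the second gives 40 (0.40%), matching the totals quoted in the message.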

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 17:23:05 -04:00