Boot parameters prefixed with "sysctl." are processed during the final
stage of system initialization via kernel_init() -> do_sysctl_args(). When
CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the sysctl.vm.mem_profiling
entry is not writable and will cause a warning.
Before run_init_process(), system initialization executes in kernel thread
context. Use current->mm to distinguish sysctl writes during
do_sysctl_args() from user-space triggered ones.
When the proc_handler is called from do_sysctl_args(), always return
success: the same value was already set by setup_early_mem_profiling(), so
this eliminates a permission denied warning.
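The check described above can be sketched in user space with mock types. This is only an illustration: `mock_task`, `mock_mm`, `mock_mem_profiling_handler()` and the `writable` flag are hypothetical stand-ins, and returning -EPERM for a rejected write is an assumption based on the "permission denied" wording, not a copy of the kernel handler.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm and task structures. */
struct mock_mm { int unused; };
struct mock_task { struct mock_mm *mm; /* NULL models a kernel thread */ };

/*
 * Mock handler. 'writable' models whether the sysctl entry accepts writes
 * (false when the debug option makes the entry read-only).
 */
static int mock_mem_profiling_handler(const struct mock_task *cur,
                                      bool write, bool writable)
{
    if (write && !writable) {
        /* No mm: kernel-thread context, i.e. a boot-time "sysctl." write. */
        if (cur->mm == NULL)
            return 0;       /* value was already applied by early setup */
        return -EPERM;      /* a user-space write is still rejected */
    }
    return 0;               /* reads and permitted writes proceed */
}
```

With this shape, a write issued before user space exists is silently accepted, while the same write from a process with an mm still fails.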
Link: https://lkml.kernel.org/r/20260115031536.164254-1-ranxiaokai627@163.com
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove __assume_ctx_lock() from lock initializers.
Implicitly asserting an active context during initialization caused
false-positive double-lock errors when acquiring a lock immediately after its
initialization. Moving forward, guarded member initialization must either:
1. Use guard(type_init)(&lock) or scoped_guard(type_init, ...).
2. Use context_unsafe() for simple initialization.
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/57062131-e79e-42c2-aa0b-8f931cb8cac2@acm.org/
Link: https://patch.msgid.link/20260119094029.1344361-7-elver@google.com
Add scoped init guard definitions for common synchronization primitives
supported by context analysis.
The scoped init guards treat the context as active within initialization
scope of the underlying context lock, given initialization implies
exclusive access to the underlying object. This allows initialization of
guarded members without disabling context analysis, while documenting
initialization from subsequent usage.
The documentation is updated with the new recommendation. Where scoped
init guards are not provided or cannot be implemented (ww_mutex omitted
for lack of multi-arg guard initializers), the alternative is to just
disable context analysis where guarded members are initialized.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20251212095943.GM3911114@noisy.programming.kicks-ass.net/
Link: https://patch.msgid.link/20260119094029.1344361-3-elver@google.com
Massage __bio_iov_iter_get_pages so that it doesn't need the bio, and
move it to lib/iov_iter.c so that it can be used by block code for
other things than filling a bio and by other subsystems like netfs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that there are no users of the low-level SHA-1 interface, remove it.
Specifically:
- Remove SHA1_DIGEST_WORDS (no longer used)
- Remove sha1_init_raw() (no longer used)
- Rename sha1_transform() to sha1_block_generic() and make it static
- Move SHA1_WORKSPACE_WORDS into lib/crypto/sha1.c
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20260123051656.396371-3-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As CPU core counts increase, the number of NVMe IRQs may be smaller than
the total number of CPUs. This forces multiple CPUs to share the same
IRQ. If the IRQ affinity and the CPU's cluster do not align, a
performance penalty can be observed on some platforms.
This patch improves IRQ affinity by grouping CPUs by cluster within each
NUMA domain, ensuring better locality between CPUs and their assigned NVMe
IRQs.
Details:
Intel Xeon E platforms pack 4 CPU cores into 1 module (cluster) that shares
the L2 cache. Suppose there are 40 CPUs in 1 NUMA domain and 11 IRQs to
dispatch. The existing algorithm maps the first 7 IRQs to 4 CPUs each and
the remaining 4 IRQs to 3 CPUs each. The last 4 IRQs may span clusters.
For example, if the 9th IRQ is pinned to CPU32, then CPU31, which shares
that IRQ but sits in a different cluster, incurs cross-L2 memory access.
CPU |28 29 30 31|32 33 34 35|36 ...
     -------- -------- --------
IRQ     8        9        10
With this patch applied, the first 2 IRQs are each mapped to 2 CPUs and the
remaining 9 IRQs are each mapped to 4 CPUs, which avoids the cross-cluster
memory access.
CPU |00 01 02 03|04 05 06 07|08 09 10 11| ...
     ----- ----- ----------- -----------
IRQ    1     2        3           4
As a result, a 15%+ performance difference is observed in FIO
libaio/randread/bs=8k.
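The arithmetic in the example above can be reproduced with a small sketch. It assumes a plain even split in which the first (ncpus % nirqs) groups receive one extra CPU, which matches the quoted 7x4 + 4x3 layout; `even_split()` and `crosses_cluster()` are hypothetical helpers, not the kernel's grouping code.

```c
#include <stddef.h>

/*
 * Even split of ncpus over nirqs: the first (ncpus % nirqs) groups get one
 * extra CPU. Writes the size of each group into sizes[0..nirqs-1].
 */
static void even_split(unsigned int ncpus, unsigned int nirqs,
                       unsigned int *sizes)
{
    unsigned int base = ncpus / nirqs;
    unsigned int extra = ncpus % nirqs;

    for (unsigned int i = 0; i < nirqs; i++)
        sizes[i] = base + (i < extra ? 1 : 0);
}

/* Returns 1 if CPUs [start, start + len) straddle a cluster boundary. */
static int crosses_cluster(unsigned int start, unsigned int len,
                           unsigned int cluster_size)
{
    return start / cluster_size != (start + len - 1) / cluster_size;
}
```

For 40 CPUs and 11 IRQs the 9th group covers CPUs 31-33, so `crosses_cluster(31, 3, 4)` is true, whereas a 2-CPU group starting at CPU 0 stays inside one cluster.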
Changes since V1:
- Add more performance details in commit messages.
- Fix endless loop when topology_cluster_cpumask returns an invalid mask.
History:
v1: https://lore.kernel.org/all/20251024023038.872616-1-wangyang.guo@intel.com/
v1 [RESEND]: https://lore.kernel.org/all/20251111020608.1501543-1-wangyang.guo@intel.com/
Link: https://lkml.kernel.org/r/20260113022958.3379650-1-wangyang.guo@intel.com
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Dan Liang <dan.liang@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Radu Rendec <rrendec@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a new Kconfig symbol to make CONFIG_DEBUG_ATOMIC more useful on those
architectures which do not align dynamic allocations to 8-byte boundaries.
Without this, CONFIG_DEBUG_ATOMIC produces excessive WARN splats.
Link: https://lkml.kernel.org/r/6d25a12934fe9199332f4d65d17c17de450139a8.1768281748.git.fthain@linux-m68k.org
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Daniel Borkman <daniel@iogearbox.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sasha Levin (Microsoft) <sashal@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Using a work queue to reset the static key for a DO_ONCE_SLEEPABLE()
invocation is pointless overhead.
Note that the previous code path included a BUG_ON() if the static key
was already disabled. Dropped that as part of this change because:
1) Use of BUG_ON() is highly discouraged.
2) There is a WARN_ON() in the static_branch_disable() code path
that would provide adequate breadcrumbs to debug any issue.
Link: https://lkml.kernel.org/r/aWU4tfTju1l3oZCu@agluck-desk3
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During the initialization phase, the test_kho module invokes the
kho_preserve_folio function, which internally configures bitmaps within
kho_mem_track and establishes chunk linked lists in KHO. Upon unloading
the test_kho module, it is necessary to clean up these states.
Link: https://lkml.kernel.org/r/20260107022427.4114424-1-longwei27@huawei.com
Signed-off-by: Long Wei <longwei27@huawei.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: hewenliang <hewenliang4@huawei.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch converts the existing glob selftest (lib/globtest.c) to use the
KUnit framework (lib/tests/glob_kunit.c).
The new test:
- Migrates all 64 test cases from the original test to the KUnit suite.
- Removes the custom 'verbose' module parameter as KUnit handles logging.
- Updates Kconfig.debug and Makefile to support the new KUnit test.
- Updates Kconfig and Makefile to remove the original selftest.
- Updates GLOB_SELFTEST to GLOB_KUNIT_TEST for arch/m68k/configs.
This commit is verified by `./tools/testing/kunit/kunit.py run`
with the .kunit/.kunitconfig:
CONFIG_KUNIT=y
CONFIG_GLOB_KUNIT_TEST=y
Link: https://lkml.kernel.org/r/20260108120753.27339-1-note351@hotmail.com
Signed-off-by: Kir Chou <note351@hotmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: <kirchou@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The comment for CONFIG_BOOTPARAM_HUNG_TASK_PANIC says:
Say N if unsure.
but since commit 9544f9e694 ("hung_task: panic when there are more than
N hung tasks at the same time"), N is not a valid value for the option,
leading to a warning at build time:
.config:11736:warning: symbol value 'n' invalid for BOOTPARAM_HUNG_TASK_PANIC
as well as an error when given to menuconfig.
Fix the comment to say '0' instead of 'N'.
Link: https://lkml.kernel.org/r/20260106140140.136446-1-tglozar@redhat.com
Fixes: 9544f9e694 ("hung_task: panic when there are more than N hung tasks at the same time")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reported-by: Johnny Mnemonic <jm@machine-hall.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The `struct kho_vmalloc` defines the in-memory layout for preserving
vmalloc regions across kexec. This layout is a contract between kernels
and part of the KHO ABI.
To reflect this relationship, the related structs and helper macros are
relocated to the ABI header, `include/linux/kho/abi/kexec_handover.h`.
This move places the structure's definition under the protection of the
KHO_FDT_COMPATIBLE version string.
The structure and its components are now also documented within the ABI
header to describe the contract and prevent ABI breaks.
[rppt@kernel.org: update comment, per Pratyush]
Link: https://lkml.kernel.org/r/aW_Mqp6HcqLwQImS@kernel.org
Link: https://lkml.kernel.org/r/20260105165839.285270-6-rppt@kernel.org
Signed-off-by: Jason Miu <jasonmiu@google.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit ae5b350085 ("kstrtox: add support for enabled and disabled in
kstrtobool()") added support for 'e'/'E' (enabled) and 'd'/'D' (disabled)
inputs, but did not update the docstring accordingly.
Update the docstring to include 'Ee' (for true) and 'Dd' (for false) in
the list of accepted first characters.
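As an illustration of first-character parsing of this kind, here is a minimal user-space sketch. `sketch_strtobool()` is a hypothetical name; the 'e'/'E' and 'd'/'D' cases come from the commit above, while the remaining accepted characters are written from memory of lib/kstrtox.c and should be checked against the source.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Parse a boolean from the first character(s) of s; 0 on success. */
static int sketch_strtobool(const char *s, bool *res)
{
    if (s == NULL)
        return -EINVAL;

    switch (s[0]) {
    case 'e': case 'E':     /* "enabled" */
    case 'y': case 'Y':
    case 't': case 'T':
    case '1':
        *res = true;
        return 0;
    case 'd': case 'D':     /* "disabled" */
    case 'n': case 'N':
    case 'f': case 'F':
    case '0':
        *res = false;
        return 0;
    case 'o': case 'O':     /* "on" / "off" need the second character */
        if (s[1] == 'n' || s[1] == 'N') {
            *res = true;
            return 0;
        }
        if (s[1] == 'f' || s[1] == 'F') {
            *res = false;
            return 0;
        }
        break;
    default:
        break;
    }
    return -EINVAL;
}
```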
Link: https://lkml.kernel.org/r/20251227092229.57330-1-chaitanyamishra.ai@gmail.com
Fixes: ae5b350085 ("kstrtox: add support for enabled and disabled in kstrtobool()")
Signed-off-by: Chaitanya Mishra <chaitanyamishra.ai@gmail.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move lib/test_min_heap.c to lib/tests/min_heap_kunit.c and convert it to
use KUnit.
This change switches the ad-hoc test code to standard KUnit test cases.
The test data remains the same, but the verification logic is updated to
use KUNIT_EXPECT_* macros.
Also remove CONFIG_TEST_MIN_HEAP from arch/*/configs/* because it is no
longer used. The new CONFIG_MIN_HEAP_KUNIT_TEST will be automatically
enabled by CONFIG_KUNIT_ALL_TESTS.
The reasons for converting to KUnit are:
1. Standardization:
Switching from ad-hoc printk-based reporting to the standard
KTAP format makes it easier for CI systems to parse and report test
results
2. Better Diagnostics:
Using KUNIT_EXPECT_* macros automatically provides detailed
diagnostics on failure.
3. Tooling Integration:
It allows the test to be managed and executed using standard
KUnit tools.
Link: https://lkml.kernel.org/r/20251221133516.321846-1-sakamo.ryota@gmail.com
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Gow <davidgow@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
To be consistent, pass the kmalloc_array_node() parameters in the order
(number_of_elements, element_size). Since only the product of the two
values is used, this is not a bug fix.
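A small sketch of why the argument order does not change the result: an overflow-checked size computation is symmetric in its two factors. `sketch_array_bytes()` is a hypothetical helper that only models the size calculation, not the allocator itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute n * size with an overflow check. Returns false if the product
 * would not fit in size_t; otherwise stores it in *bytes.
 */
static bool sketch_array_bytes(size_t n, size_t size, size_t *bytes)
{
    if (size != 0 && n > SIZE_MAX / size)
        return false;
    *bytes = n * size;
    return true;
}
```

Swapping `n` and `size` yields the same byte count and the same overflow verdict, so the reordering is purely a readability fix.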
Link: https://lkml.kernel.org/r/20251220054541.2295599-1-rdunlap@infradead.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216015
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reinitialize metadata for large zone device private folios in
zone_device_page_init prior to creating a higher-order zone device private
folio. This step is necessary when the folio's order changes dynamically
between zone_device_page_init calls to avoid building a corrupt folio. As
part of the metadata reinitialization, the dev_pagemap must be passed in
from the caller because the pgmap stored in the folio page may have been
overwritten with a compound head.
Without this fix, individual pages could have invalid pgmap fields and
flags (with PG_locked being notably problematic) due to prior different
order allocations, which can, and will, result in kernel crashes.
Link: https://lkml.kernel.org/r/20260116111325.1736137-2-francois.dugast@intel.com
Fixes: d245f9b4ab ("mm/zone_device: support large zone device private folios")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bernd has reported a lockdep splat from flexible proportions code that is
essentially complaining about the following race:
<timer fires>
run_timer_softirq - we are in softirq context
call_timer_fn
writeout_period
fprop_new_period
write_seqcount_begin(&p->sequence);
<hardirq is raised>
...
blk_mq_end_request()
blk_update_request()
ext4_end_bio()
folio_end_writeback()
__wb_writeout_add()
__fprop_add_percpu_max()
if (unlikely(max_frac < FPROP_FRAC_BASE)) {
fprop_fraction_percpu()
seq = read_seqcount_begin(&p->sequence);
- sees odd sequence so loops indefinitely
Note that a deadlock like this is only possible if the bdi has a
configured maximum fraction of writeout throughput, which is very rare in
general but frequent, for example, for FUSE bdis. To fix this problem, we
have to make sure the write section of the sequence counter is irqsafe.
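The reader's behaviour in the trace above can be modelled in user space. This is a single-threaded sketch with hypothetical `mock_*` names: the writer makes the sequence odd, and a reader that runs before the writer finishes never sees an even value. The real read_seqcount_begin() would spin forever; here the retries are bounded so the effect can be observed.

```c
#include <stdbool.h>

struct mock_seqcount { unsigned int sequence; };

/* Entering the write section makes the sequence odd. */
static void mock_write_begin(struct mock_seqcount *s) { s->sequence++; }

/* Leaving the write section makes the sequence even again. */
static void mock_write_end(struct mock_seqcount *s) { s->sequence++; }

/*
 * Bounded reader: retry up to max_tries times while the sequence is odd.
 * Returns true and stores the observed value once it is even.
 */
static bool mock_read_begin_bounded(const struct mock_seqcount *s,
                                    unsigned int max_tries,
                                    unsigned int *seq)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        unsigned int v = s->sequence;

        if ((v & 1) == 0) {
            *seq = v;
            return true;
        }
    }
    return false;   /* writer never finished: the real reader would hang */
}
```

If the reader runs in a hardirq on the same CPU that is inside the write section, the writer cannot resume until the reader returns, which is why the write section has to keep interrupts off.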
Link: https://lkml.kernel.org/r/20260121112729.24463-2-jack@suse.cz
Fixes: a91befde35 ("lib/flex_proportions.c: remove local_irq_ops in fprop_new_period()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Bernd Schubert <bernd@bsbernd.com>
Link: https://lore.kernel.org/all/9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com/
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With CONFIG_CC_OPTIMIZE_FOR_SIZE=y, GCC may decide not to inline
__cvdso_clock_getres_common(). This introduces spurious internal
function calls in the vDSO fastpath.
Furthermore, with automatic stack variable initialization
(CONFIG_INIT_STACK_ALL_ZERO or CONFIG_INIT_STACK_ALL_PATTERN) GCC can emit
a call to memset() which is not valid in the vDSO.
Mark __cvdso_clock_getres_common() as __always_inline to avoid both issues.
Paradoxically the inlining even reduces the size of the code:
$ ./scripts/bloat-o-meter arch/powerpc/kernel/vdso/vgettimeofday-32.o.before arch/powerpc/kernel/vdso/vgettimeofday-32.o.after
add/remove: 0/1 grow/shrink: 1/1 up/down: 52/-148 (-96)
Function old new delta
__c_kernel_clock_getres_time64 92 144 +52
__c_kernel_clock_getres 136 132 -4
__cvdso_clock_getres_common 144 - -144
Total: Before=2788, After=2692, chg -3.44%
With CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y the functions are always inlined
and therefore the behaviour stays the same.
See also the equivalent change for clock_gettime() in commit b91c8c42ff
("lib/vdso: Force inlining of __cvdso_clock_gettime_common()").
Fixes: 21bbfd7404 ("x86/vdso: Provide clock_getres_time64() for x86-32")
Fixes: 1149dcdfc9 ("ARM: VDSO: Provide clock_getres_time64()")
Fixes: f10c2e72b5 ("arm64: vdso32: Provide clock_getres_time64()")
Fixes: bec06cd6a1 ("MIPS: vdso: Provide getres_time64() for 32-bit ABIs")
Fixes: 759a1f9737 ("powerpc/vdso: Provide clock_getres_time64()")
Reported-by: Sverdlin, Alexander <alexander.sverdlin@siemens.com>
Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://lore.kernel.org/lkml/230c749f-ebd6-4829-93ee-601d88000a45@kernel.org/
Link: https://patch.msgid.link/20260123-vdso-clock_getres-inline-v1-1-4d6203b90cd3@linutronix.de
Closes: https://lore.kernel.org/lkml/f45316f65a46da638b3c6aa69effd8980e6677b9.camel@siemens.com/
The softlockup_panic sysctl is currently a binary option: panic
immediately or never panic on soft lockups.
Panicking on any soft lockup, regardless of duration, can be overly
aggressive for brief stalls that may be caused by legitimate operations.
Conversely, never panicking may allow severe system hangs to persist
undetected.
Extend softlockup_panic to accept an integer threshold, allowing the
kernel to panic only when the normalized lockup duration exceeds N
watchdog threshold periods. This provides finer-grained control to
distinguish between transient delays and persistent system failures.
The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic when duration >= 1 * threshold (20s default, original behavior)
- N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.)
The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility, and systems can now tolerate brief lockups while
still catching severe, persistent hangs.
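The rule in the list above can be written as a one-line predicate. This is a sketch only: `sketch_should_panic()` and its parameter names are hypothetical, and durations are taken in whole seconds for simplicity.

```c
#include <stdbool.h>

/*
 * panic_n:    value written to softlockup_panic (0 disables the panic)
 * duration_s: how long the CPU has been stuck, in seconds
 * thresh_s:   soft lockup threshold in seconds (20 in the example above)
 */
static bool sketch_should_panic(unsigned int panic_n,
                                unsigned int duration_s,
                                unsigned int thresh_s)
{
    if (panic_n == 0)
        return false;
    return duration_s >= panic_n * thresh_s;
}
```

With a 20s threshold, a value of 2 tolerates a 39s stall but panics at 40s, while a value of 1 panics as soon as the lockup is reportable.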
[lirongqing@baidu.com: v2]
Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com
Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.
Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.
This change has been build-tested with allmodconfig on most ARCHes. Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).
Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move lib/test_uuid.c to lib/tests/uuid_kunit.c and convert it to use KUnit.
This change switches the ad-hoc test code to standard KUnit test cases.
The test data remains the same, but the verification logic is updated to
use KUNIT_EXPECT_* macros.
Also remove CONFIG_TEST_UUID from arch/*/configs/* because it is no longer
used. The new CONFIG_UUID_KUNIT_TEST will be automatically enabled by
CONFIG_KUNIT_ALL_TESTS.
[lukas.bulwahn@redhat.com: MAINTAINERS: adjust file entry in UUID HELPERS]
Link: https://lkml.kernel.org/r/20251217053907.2778515-1-lukas.bulwahn@redhat.com
Link: https://lkml.kernel.org/r/20251215134322.12949-1-sakamo.ryota@gmail.com
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: David Gow <davidgow@google.com>
Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current OID registry parser uses 64-bit arithmetic, which limits us to
supporting 64-bit or smaller OIDs. This isn't usually a problem, except
that it prevents us from representing the 2.25. prefix OIDs, which are the
OID representation of UUIDs and have a 128-bit number following the
prefix. Rather than import rarely used Perl arithmetic modules, replace
the current Perl 64-bit arithmetic with a callout to bc, which is
arbitrary precision, for decimal to base 2 conversion, then do pure string
operations on the base 2 number.
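To illustrate why fixed-width integers are not enough here, the sketch below encodes one OID component of arbitrary size into base-128 bytes, with the high bit set on every byte except the last. Note that it uses a different technique from the commit: repeated long division of the decimal string by 128, rather than a bc callout followed by string operations on the binary form. `div128()` and `encode_component()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Divide the decimal number in digits[0..len-1] (values 0-9, most
 * significant first) by 128 in place and return the remainder.
 */
static unsigned int div128(unsigned char *digits, size_t len)
{
    unsigned int rem = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned int cur = rem * 10 + digits[i];

        digits[i] = (unsigned char)(cur / 128);
        rem = cur % 128;
    }
    return rem;
}

/*
 * Encode a decimal string as base-128 bytes. Returns the number of bytes
 * written to out, or 0 on bad input or if out is too small.
 */
static size_t encode_component(const char *dec, unsigned char *out,
                               size_t outlen)
{
    unsigned char digits[64];
    unsigned char groups[32];   /* 7-bit groups, least significant first */
    size_t len = strlen(dec);
    size_t ngroups = 0;

    if (len == 0 || len > sizeof(digits))
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (dec[i] < '0' || dec[i] > '9')
            return 0;
        digits[i] = (unsigned char)(dec[i] - '0');
    }

    for (;;) {
        int nonzero = 0;

        if (ngroups == sizeof(groups))
            return 0;
        groups[ngroups++] = (unsigned char)div128(digits, len);
        for (size_t i = 0; i < len; i++)
            nonzero |= digits[i];
        if (!nonzero)
            break;              /* quotient reached zero: all groups found */
    }

    if (ngroups > outlen)
        return 0;
    for (size_t i = 0; i < ngroups; i++) {
        unsigned char g = groups[ngroups - 1 - i];

        /* Continuation bit on every byte except the final one. */
        out[i] = (i + 1 < ngroups) ? (unsigned char)(g | 0x80) : g;
    }
    return ngroups;
}
```

For instance, the component 840 becomes the two bytes 0x86 0x48, and a full 128-bit value needs 19 bytes, which is more than any 64-bit integer can carry.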
[James.Bottomley@HansenPartnership.com: tidy up perl with better my placement also set bc to arbitrary size]
Link: https://lkml.kernel.org/r/dbc90c344c691ed988640a28367ff895b5ef2604.camel@HansenPartnership.com
Link: https://lkml.kernel.org/r/833c858cd74533203b43180208734b84f1137af0.camel@HansenPartnership.com
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If PAGE_SIZE is larger than 4k and if you have a system with a large
number of CPUs, this test can require a very large amount of memory
leading to oom-killer firing. Given the type of allocation, the kernel
won't have anything to kill, causing the system to stall.
Add a parameter to the test_vmalloc driver to represent the number of
times a percpu object will be allocated. Calculate this in
test_vmalloc.sh to be 90% of available memory or the current default of
35000, whichever is smaller.
Link: https://lkml.kernel.org/r/20251201181848.1216197-1-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove the change in file mode permissions done before initializing the
sysctl. It is not necessary as the writing of the kernel variable will be
blocked by the proc_mem_profiling_handler when writing is disallowed (also
controlled by mem_profiling_support).
Link: https://lkml.kernel.org/r/20251215-jag-alloc_tag_const-v1-1-35ea56a1ce13@kernel.org
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The volatile keyword is no longer necessary or useful on aes_sbox and
aes_inv_sbox, since the table prefetching is now done using a helper
function that casts to volatile itself and also includes an optimization
barrier. Since it prevents some compiler optimizations, remove it.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-36-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Now that all callers of the aes_encrypt() and aes_decrypt() type-generic
macros are using the new types, remove the old functions.
Then, replace the macro with direct calls to the new functions, dropping
the "_new" suffix from them.
This completes the change in the type of the key struct that is passed
to aes_encrypt() and aes_decrypt().
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-35-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Switch from the old AES library functions (which use struct
crypto_aes_ctx) to the new ones (which use struct aes_enckey). This
eliminates the unnecessary computation and caching of the decryption
round keys. The new AES en/decryption functions are also much faster
and use AES instructions when supported by the CPU.
Note that in addition to the change in the key preparation function and
the key struct type itself, the change in the type of the key struct
results in aes_encrypt() (which is temporarily a type-generic macro)
calling the new encryption function rather than the old one.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-34-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Switch from the old AES library functions (which use struct
crypto_aes_ctx) to the new ones (which use struct aes_enckey). This
eliminates the unnecessary computation and caching of the decryption
round keys. The new AES en/decryption functions are also much faster
and use AES instructions when supported by the CPU.
Note that in addition to the change in the key preparation function and
the key struct type itself, the change in the type of the key struct
results in aes_encrypt() (which is temporarily a type-generic macro)
calling the new encryption function rather than the old one.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-33-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Optimize the AES library with x86 AES-NI instructions.
The relevant existing assembly functions, aesni_set_key(), aesni_enc(),
and aesni_dec(), are a bit difficult to extract into the library:
- They're coupled to the code for the AES modes.
- They operate on struct crypto_aes_ctx. The AES library now uses
different structs.
- They assume the key is 16-byte aligned. The AES library only
*prefers* 16-byte alignment; it doesn't require it.
Moreover, they're not all that great in the first place:
- They use unrolled loops, which isn't a great choice on x86.
- They use the 'aeskeygenassist' instruction, which is unnecessary, is
slow on Intel CPUs, and forces the loop to be unrolled.
- They have special code for AES-192 key expansion, despite that being
kind of useless. AES-128 and AES-256 are the ones used in practice.
These are small functions anyway.
Therefore, I opted to just write replacements of these functions for the
library. They address all the above issues.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-18-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Move the SPARC64 AES assembly code into lib/crypto/, wire the key
expansion and single-block en/decryption functions up to the AES library
API, and remove the "aes-sparc64" crypto_cipher algorithm.
The result is that both the AES library and crypto_cipher APIs use the
SPARC64 AES opcodes, whereas previously only crypto_cipher did (and it
wasn't enabled by default, which this commit fixes as well).
Note that some of the functions in the SPARC64 AES assembly code are
still used by the AES mode implementations in
arch/sparc/crypto/aes_glue.c. For now, just export these functions.
These exports will go away once the AES mode implementations are
migrated to the library as well. (Trying to split up the assembly file
seemed like much more trouble than it would be worth.)
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-17-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Implement aes_preparekey_arch(), aes_encrypt_arch(), and
aes_decrypt_arch() using the CPACF AES instructions.
Then, remove the superseded "aes-s390" crypto_cipher.
The result is that both the AES library and crypto_cipher APIs use the
CPACF AES instructions, whereas previously only crypto_cipher did (and
it wasn't enabled by default, which this commit fixes as well).
Note that this preserves the optimization where the AES key is stored in
raw form rather than expanded form. CPACF just takes the raw key.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20260112192035.10427-16-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Prevent a "BUG: unable to handle kernel NULL pointer dereference in
filemap_read_folio".
For the sleepable context, convert freader to use __kernel_read() instead
of direct page cache access via read_cache_folio(). This simplifies the
faultable code path by using the standard kernel file reading interface
which handles all the complexity of reading file data.
The non-sleepable path is left unchanged for now. It uses
filemap_get_folio() and only succeeds if the target folios are already in
memory and up-to-date. This keeps the patch simple and easier to backport
to stable kernels.
With this change, the syzbot reproducer no longer crashes the kernel and
the selftests run successfully.
A follow-up will make __kernel_read() with IOCB_NOWAIT work for
non-sleepable contexts. In addition, I would like to replace the
secretmem check with a more generic approach and will add an fstest for
the buildid code.
Link: https://lkml.kernel.org/r/20251222205859.3968077-1-shakeel.butt@linux.dev
Fixes: ad41251c29 ("lib/buildid: implement sleepable build_id_parse() API")
Reported-by: syzbot+09b7d050e4806540153d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=09b7d050e4806540153d
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jinchao Wang <wangjinchao600@gmail.com>
Link: https://lkml.kernel.org/r/aUteBPWPYzVWIZFH@ndev
Reviewed-by: Christian Brauner <brauner@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkman <daniel@iogearbox.net>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cross-merge BPF and other fixes after downstream PR.
No conflicts.
Adjacent:
Auto-merging MAINTAINERS
Auto-merging Makefile
Auto-merging kernel/bpf/verifier.c
Auto-merging kernel/sched/ext.c
Auto-merging mm/memcontrol.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move the aes_encrypt_zvkned() and aes_decrypt_zvkned() assembly
functions into lib/crypto/, wire them up to the AES library API, and
remove the "aes-riscv64-zvkned" crypto_cipher algorithm.
To make this possible, change the prototypes of these functions to
take (rndkeys, key_len) instead of a pointer to crypto_aes_ctx, and
change the RISC-V AES-XTS code to implement tweak encryption using the
AES library instead of directly calling aes_encrypt_zvkned().
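As a rough user-space illustration of the (rndkeys, key_len) calling
convention: a helper that takes only the round keys and the key length
does not need to know which struct the round keys live in, and can derive
the round count from key_len. All names and struct layouts below are
hypothetical placeholders, not the prototypes used by the RISC-V code.

```c
#include <assert.h>
#include <stdint.h>

/* AES uses 10, 12 or 14 rounds for 16-, 24- or 32-byte keys. */
static int nrounds_sketch(int key_len)
{
	return 6 + key_len / 4;
}

/*
 * Hypothetical helper using the (rndkeys, key_len) convention. It returns
 * the final round key, which starts 4 words per round into the array.
 */
static const uint32_t *last_rndkey_sketch(const uint32_t *rndkeys, int key_len)
{
	return rndkeys + 4 * nrounds_sketch(key_len);
}

/* Two unrelated containers (placeholder layouts) can share the helper. */
struct legacy_ctx_sketch {
	uint32_t key_enc[60];
	uint32_t key_dec[60];
	uint32_t key_length;
};

struct lib_key_sketch {
	uint32_t len;
	uint32_t rndkeys[60];
};

static void calling_convention_demo(void)
{
	static struct legacy_ctx_sketch legacy = { .key_length = 16 };
	static struct lib_key_sketch lib = { .len = 32 };

	/* AES-128: 10 rounds, so the last round key is at word 40. */
	assert(last_rndkey_sketch(legacy.key_enc, (int)legacy.key_length) ==
	       &legacy.key_enc[40]);
	/* AES-256: 14 rounds, so the last round key is at word 56. */
	assert(last_rndkey_sketch(lib.rndkeys, (int)lib.len) ==
	       &lib.rndkeys[56]);
}
```

Calling calling_convention_demo() from a test program exercises both
containers through the same helper.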
The result is that both the AES library and crypto_cipher APIs use
RISC-V's AES instructions, whereas previously only crypto_cipher did
(and it wasn't enabled by default, which this commit fixes as well).
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Move the POWER8 AES assembly code into lib/crypto/, wire the key
expansion and single-block en/decryption functions up to the AES library
API, and remove the superseded "p8_aes" crypto_cipher algorithm.
The result is that both the AES library and crypto_cipher APIs are now
optimized for POWER8, whereas previously only crypto_cipher was (and
optimizations weren't enabled by default, which this commit fixes too).
Note that many of the functions in the POWER8 assembly code are still
used by the AES mode implementations in arch/powerpc/crypto/. For now,
just export these functions. These exports will go away once the AES
modes are migrated to the library as well. (Trying to split up the
assembly file seemed like much more trouble than it would be worth.)
Another challenge with this code is that the POWER8 assembly code uses a
custom format for the expanded AES key. Since that code is imported
from OpenSSL and is also targeted to POWER8 (rather than POWER9 which
has better data movement and byteswap instructions), that is not easily
changed. For now I've just kept the custom format. To maintain full
correctness, this requires executing some slow fallback code in the case
where the usability of VSX changes between key expansion and use. This
should be tolerable, as this case shouldn't happen in practice.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Move the PowerPC SPE AES assembly code into lib/crypto/, wire the key
expansion and single-block en/decryption functions up to the AES library
API, and remove the superseded "aes-ppc-spe" crypto_cipher algorithm.
The result is that both the AES library and crypto_cipher APIs are now
optimized with SPE, whereas previously only crypto_cipher was (and
optimizations weren't enabled by default, which this commit fixes too).
Note that many of the functions in the PowerPC SPE assembly code are
still used by the AES mode implementations in arch/powerpc/crypto/. For
now, just export these functions. These exports will go away once the
AES modes are migrated to the library as well. (Trying to split up the
assembly files seemed like much more trouble than it would be worth.)
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Move the ARM64 optimized AES key expansion and single-block AES
en/decryption code into lib/crypto/, wire it up to the AES library API,
and remove the superseded crypto_cipher algorithms.
The result is that both the AES library and crypto_cipher APIs are now
optimized for ARM64, whereas previously only crypto_cipher was (and the
optimizations weren't enabled by default, which this fixes as well).
Note: to see the diff from arch/arm64/crypto/aes-ce-glue.c to
lib/crypto/arm64/aes.h, view this commit with 'git show -M10'.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Move the ARM optimized single-block AES en/decryption code into
lib/crypto/, wire it up to the AES library API, and remove the
superseded "aes-arm" crypto_cipher algorithm.
The result is that both the AES library and crypto_cipher APIs are now
optimized for ARM, whereas previously only crypto_cipher was (and the
optimizations weren't enabled by default, which this fixes as well).
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-11-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
The kernel's AES library currently has the following issues:
- It doesn't take advantage of the architecture-optimized AES code,
including the implementations using AES instructions.
- It's much slower than even the other software AES implementations: 2-4
times slower than "aes-generic", "aes-arm", and "aes-arm64".
- It requires that both the encryption and decryption round keys be
computed and cached. This is wasteful for users that need only the
forward (encryption) direction of the cipher: the key struct is 484
bytes when only 244 are actually needed. This missed optimization is
very common, as many AES modes (e.g. GCM, CFB, CTR, CMAC, and even the
tweak key in XTS) use the cipher only in the forward (encryption)
direction even when doing decryption.
- It doesn't provide the flexibility to customize the prepared key
format. The API is defined to do key expansion, and several callers
in drivers/crypto/ use it specifically to expand the key. This is an
issue when integrating the existing powerpc, s390, and sparc code,
which is necessary to provide full parity with the traditional API.
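The 484-byte and 244-byte figures above can be reproduced with a small
user-space sketch. The first struct is assumed to mirror the layout of
struct crypto_aes_ctx (60 encryption round key words, 60 decryption round
key words, and a length field). The second struct is a hypothetical
encryption-only layout used purely for the arithmetic; it is not claimed
to be the layout of any struct in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AES-256 needs 15 round keys of 16 bytes each: 240 bytes, or 60 words. */
#define SKETCH_MAX_ROUND_KEYS	15
#define SKETCH_MAX_KEY_WORDS	(SKETCH_MAX_ROUND_KEYS * 16 / sizeof(uint32_t))

/* Assumed to mirror the layout of struct crypto_aes_ctx. */
struct full_ctx_sketch {
	uint32_t key_enc[SKETCH_MAX_KEY_WORDS];
	uint32_t key_dec[SKETCH_MAX_KEY_WORDS];
	uint32_t key_length;
};

/* Hypothetical encryption-only layout, for illustration only. */
struct enc_only_key_sketch {
	uint32_t key_enc[SKETCH_MAX_KEY_WORDS];
	uint32_t key_length;
};

static_assert(sizeof(struct full_ctx_sketch) == 484,
	      "both directions: 2 * 240 + 4 bytes");
static_assert(sizeof(struct enc_only_key_sketch) == 244,
	      "encryption only: 240 + 4 bytes");

/* Bytes wasted per key when only the forward direction is needed. */
static size_t bytes_saved_per_key(void)
{
	return sizeof(struct full_ctx_sketch) -
	       sizeof(struct enc_only_key_sketch);
}
```

The difference is exactly the 240 bytes of decryption round keys that
forward-only modes never read.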
To resolve these issues, I'm proposing the following changes:
1. New structs 'aes_key' and 'aes_enckey' are introduced, with
corresponding functions aes_preparekey() and aes_prepareenckey().
Generally these structs will include the encryption+decryption round
keys and the encryption round keys, respectively. However, the exact
format will be under control of the architecture-specific AES code.
(The verb "prepare" is chosen over "expand" since key expansion isn't
necessarily done. It's also consistent with hmac*_preparekey().)
2. aes_encrypt() and aes_decrypt() will be changed to operate on the new
structs instead of struct crypto_aes_ctx.
3. aes_encrypt() and aes_decrypt() will use architecture-optimized code
when available, or else fall back to a new generic AES implementation
that unifies the existing two fragmented generic AES implementations.
The new generic AES implementation uses tables for both SubBytes and
MixColumns, making it almost as fast as "aes-generic". However,
instead of aes-generic's huge 8192-byte tables per direction, it uses
only 1024 bytes for encryption and 1280 bytes for decryption (similar
to "aes-arm"). The cost is just some extra rotations.
The new generic AES implementation also includes table prefetching,
which gives it some "constant-time hardening". That's an improvement
over aes-generic, which has none.
It does slightly regress in constant-time hardening compared with the old
lib/crypto/aes.c, which had smaller tables, and with aes-fixed-time,
which also disabled IRQs. But I think this is tolerable.
The real solutions for constant-time AES are AES instructions or
bit-slicing. The table-based code remains a best-effort fallback for
the increasingly-rare case where a real solution is unavailable.
4. crypto_aes_ctx and aes_expandkey() will remain for now, but only for
callers that are using them specifically for the AES key expansion
(as opposed to en/decrypting data with the AES library).
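The table-size trade-off described in item 3 can be sketched in user
space. The code below assumes the 1024-byte encryption table is a single
256-entry table of 32-bit words that folds SubBytes and MixColumns
together, with the other three column positions obtained by rotating its
entries. It is an illustration of that general technique, not the code
added by this commit; all names are made up, and ShiftRows and
AddRoundKey are left out.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sbox[256];
static uint32_t enc_tab[256];	/* single table: 256 entries * 4 bytes */

static_assert(sizeof(enc_tab) == 1024, "encryption table is 1024 bytes");

static uint8_t rotl8(uint8_t x, int n)
{
	return (uint8_t)((x << n) | (x >> (8 - n)));
}

static uint32_t rol32(uint32_t x, int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Multiply by 2 in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1. */
static uint8_t xtime(uint8_t x)
{
	return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0));
}

/* Build the S-box, then fold SubBytes and MixColumns into enc_tab. */
static void init_tables(void)
{
	uint8_t p = 1, q = 1;

	do {
		p ^= xtime(p);		/* p *= 3, a generator of the field */
		q ^= (uint8_t)(q << 1);	/* q /= 3, so q stays equal to 1/p */
		q ^= (uint8_t)(q << 2);
		q ^= (uint8_t)(q << 4);
		if (q & 0x80)
			q ^= 0x09;
		/* Affine transformation of the inverse gives the S-box entry. */
		sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^
			  rotl8(q, 4) ^ 0x63;
	} while (p != 1);
	sbox[0] = 0x63;			/* 0 has no inverse */

	for (int i = 0; i < 256; i++) {
		uint32_t s = sbox[i], s2 = xtime(sbox[i]);

		/* Bytes, low to high: 2*s, s, s, 3*s. */
		enc_tab[i] = s2 | (s << 8) | (s << 16) | ((s2 ^ s) << 24);
	}
}

/* SubBytes + MixColumns on one column: one table plus three rotations. */
static uint32_t column_via_table(uint8_t a0, uint8_t a1, uint8_t a2, uint8_t a3)
{
	return enc_tab[a0] ^ rol32(enc_tab[a1], 8) ^
	       rol32(enc_tab[a2], 16) ^ rol32(enc_tab[a3], 24);
}

/* The same column computed directly from the MixColumns definition. */
static uint32_t column_direct(uint8_t a0, uint8_t a1, uint8_t a2, uint8_t a3)
{
	uint8_t s0 = sbox[a0], s1 = sbox[a1], s2 = sbox[a2], s3 = sbox[a3];
	uint32_t b0 = xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3;
	uint32_t b1 = s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3;
	uint32_t b2 = s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3);
	uint32_t b3 = (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3);

	return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
}

/* Returns 1 if the table-based and direct computations always agree. */
static int tables_agree(void)
{
	init_tables();
	for (int a = 0; a < 256; a++) {
		uint8_t a0 = (uint8_t)a, a1 = (uint8_t)(a * 7 + 1);
		uint8_t a2 = (uint8_t)(a ^ 0x5a), a3 = (uint8_t)(255 - a);

		if (column_via_table(a0, a1, a2, a3) !=
		    column_direct(a0, a1, a2, a3))
			return 0;
	}
	return 1;
}
```

With four separate tables the rotations would disappear, at the cost of
four times the table memory; that is the trade-off item 3 refers to.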
This commit begins the migration process by introducing the new structs
and functions, backed by the new generic AES implementation.
To allow callers to be incrementally converted, aes_encrypt() and
aes_decrypt() are temporarily changed into macros that use a _Generic
expression to call either the old functions (which take crypto_aes_ctx)
or the new functions (which take the new types). Once all callers have
been updated, these macros will go away, the old functions will be
removed, and the "_new" suffix will be dropped from the new functions.
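For readers unfamiliar with this use of _Generic, a minimal user-space
sketch of the dispatch idea follows. The struct names, function names,
and the toy XOR "cipher" are placeholders chosen for illustration; only
the macro pattern is the point, and it is not the macro added by this
commit.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the old and new key types; layouts are placeholders. */
struct old_ctx_sketch { uint8_t pad; };
struct new_key_sketch { uint8_t pad; };

enum impl_sketch { IMPL_OLD, IMPL_NEW };

/* Toy "encryption": XOR with one byte. It only marks which path ran. */
static enum impl_sketch encrypt_old(const struct old_ctx_sketch *ctx,
				    uint8_t out[16], const uint8_t in[16])
{
	for (int i = 0; i < 16; i++)
		out[i] = in[i] ^ ctx->pad;
	return IMPL_OLD;
}

static enum impl_sketch encrypt_new(const struct new_key_sketch *key,
				    uint8_t out[16], const uint8_t in[16])
{
	for (int i = 0; i < 16; i++)
		out[i] = in[i] ^ key->pad;
	return IMPL_NEW;
}

/*
 * Select the function from the static type of the key pointer, then call
 * it. Unconverted callers keep passing the old type and converted callers
 * pass the new one, all through the same name.
 */
#define encrypt_sketch(key, out, in)					\
	_Generic((key),							\
		 struct old_ctx_sketch *: encrypt_old,			\
		 const struct old_ctx_sketch *: encrypt_old,		\
		 struct new_key_sketch *: encrypt_new,			\
		 const struct new_key_sketch *: encrypt_new)((key), (out), (in))

static void dispatch_demo(void)
{
	struct old_ctx_sketch old_ctx = { .pad = 0x11 };
	const struct new_key_sketch new_key = { .pad = 0x22 };
	uint8_t in[16] = { 0 }, out[16];

	assert(encrypt_sketch(&old_ctx, out, in) == IMPL_OLD);
	assert(out[0] == 0x11);
	assert(encrypt_sketch(&new_key, out, in) == IMPL_NEW);
	assert(out[15] == 0x22);
}
```

Passing a pointer of any other type fails at compile time because no
association matches, which helps catch unconverted or mistyped callers
during the transition.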
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260112192035.10427-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since ML-DSA is FIPS-approved, add the boot-time self-test which is
apparently required.
Just add a test vector manually for now, borrowed from
lib/crypto/tests/mldsa-testvecs.h (where in turn it's borrowed from
leancrypto). The SHA-* FIPS test vectors are generated by
scripts/crypto/gen-fips-testvecs.py instead, but the common Python
libraries don't support ML-DSA yet.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20260107044215.109930-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since the architecture-specific implementations of NH initialize memory
in assembly code, they aren't compatible with KMSAN as-is.
Fixes: 382de740759a ("lib/crypto: nh: Add NH library")
Link: https://lore.kernel.org/r/20260105053652.1708299-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Migrate the x86_64 implementations of NH into lib/crypto/. As a result,
the nh() function is optimized on x86_64 kernels.
Note: this temporarily stops the adiantum template from using the
x86_64 optimized NH code. This is resolved in a later commit that
converts the adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Migrate the arm64 NEON implementation of NH into lib/crypto/. As a
result, the nh() function is optimized on arm64 kernels.
Note: this temporarily stops the adiantum template from using the arm64
optimized NH code. This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Migrate the arm32 NEON implementation of NH into lib/crypto/. As a
result, the nh() function is optimized on arm32 kernels.
Note: this temporarily stops the adiantum template from using the arm32
optimized NH code. This is resolved in a later commit that converts the
adiantum template to use nh() instead of "nhpoly1305".
Link: https://lore.kernel.org/r/20251211011846.8179-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>