Commit graph

9334 commits

Author SHA1 Message Date
Linus Torvalds
4ae12d8bd9 Second round of Kbuild fixes for 7.0
- Split out .modinfo section from ELF_DETAILS macro, as that macro may
   be used in other areas that expect to discard .modinfo, breaking
   certain image layouts
 
 - Adjust genksyms parser to handle optional attributes in certain
   declarations, necessary after commit 07919126ec ("netfilter:
   annotate NAT helper hook pointers with __rcu")
 
 - Include resolve_btfids in external module build created by
   scripts/package/install-extmod-build when it may be run on
   external modules
 
 - Avoid removing objtool binary with 'make clean', as it is required for
   external module builds
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaat33gAKCRAdayaRccAa
 lizMAQCxm0P5WsJf3ydYR+5ZzzM7wreNtpMVMXsCbwOKBGY3VwEAyvB7om1a00Ex
 Z6WFa9P4VKW+L4PWMnWoyxcnvl/CdgM=
 =mvIb
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nathan Chancellor:

 - Split out .modinfo section from ELF_DETAILS macro, as that macro may
   be used in other areas that expect to discard .modinfo, breaking
   certain image layouts

 - Adjust genksyms parser to handle optional attributes in certain
   declarations, necessary after commit 07919126ec ("netfilter:
   annotate NAT helper hook pointers with __rcu")

 - Include resolve_btfids in external module build created by
   scripts/package/install-extmod-build when it may be run on external
   modules

 - Avoid removing objtool binary with 'make clean', as it is required
   for external module builds

* tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: Leave objtool binary around with 'make clean'
  kbuild: install-extmod-build: Package resolve_btfids if necessary
  genksyms: Fix parsing a declarator with a preceding attribute
  kbuild: Split .modinfo out from ELF_DETAILS
2026-03-06 20:27:13 -08:00
Nilay Shroff
2185904ff8 powerpc/pci: Initialize msi_addr_mask for OF-created PCI devices
Recent changes replaced the use of no_64bit_msi with msi_addr_mask.  As a
result, msi_addr_mask is now expected to be initialized to DMA_BIT_MASK(64)
when a pci_dev is set up. However, this initialization was missed on
powerpc due to differences in the device initialization path compared to
other (x86) architecture. Due to this, now PCI device probe method fails on
powerpc system.

On powerpc systems, struct pci_dev instances are created from device tree
nodes via of_create_pci_dev(). Because msi_addr_mask was not initialized
there, it remained zero. Later, during MSI setup, msi_verify_entries()
validates the programmed MSI address against pdev->msi_addr_mask. Since the
mask was not set correctly, the validation fails, causing PCI driver probe
failures for devices on powerpc systems.

Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev()
so that MSI address validation succeeds and device probe works as expected.

Fixes: 386ced19e9 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Tested-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260220070239.1693303-2-nilay@linux.ibm.com
2026-03-03 10:29:15 -06:00
Nathan Chancellor
8678591b47
kbuild: Split .modinfo out from ELF_DETAILS
Commit 3e86e4d74c ("kbuild: keep .modinfo section in
vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it
from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and
ELF_DETAILS was present in all architecture specific vmlinux linker
scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and
COMMON_DISCARDS may be used by other linker scripts, such as the s390
and x86 compressed boot images, which may not expect to have a .modinfo
section. In certain circumstances, this could result in a bootloader
failing to load the compressed kernel [1].

Commit ddc6cbef3e ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with
SecureBoot trailer") recently addressed this for the s390 bzImage but
the same bug remains for arm, parisc, and x86. The presence of .modinfo
in the x86 bzImage was the root cause of the issue worked around with
commit d50f210913 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes
lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its
MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition.

Split .modinfo out from ELF_DETAILS into its own macro and handle it in
all vmlinux linker scripts. Discard .modinfo in the places where it was
previously being discarded from being in COMMON_DISCARDS, as it has
never been necessary in those uses.

Cc: stable@vger.kernel.org
Fixes: 3e86e4d74c ("kbuild: keep .modinfo section in vmlinux.unstripped")
Reported-by: Ed W <lists@wildgooses.com>
Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1]
Tested-by: Ed W <lists@wildgooses.com> # x86_64
Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-26 11:50:19 -07:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
4cff5c05e0 mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       325
 Reviews/patch:       1.39
 Reviewed rate:       72%
 
 Excluding DAMON:
 
 Total patches:       262
 Reviews/patch:       1.63
 Reviewed rate:       82%
 
 Excluding DAMON and zram:
 
 Total patches:       248
 Reviews/patch:       1.72
 Reviewed rate:       86%
 
 - The 14 patch series "powerpc/64s: do not re-activate batched TLB
   flush" from Alexander Gordeev makes arch_{enter|leave}_lazy_mmu_mode()
   nest properly.
 
   It adds a generic enter/leave layer and switches architectures to use
   it.  Various hacks were removed in the process.
 
 - The 7 patch series "zram: introduce compressed data writeback" from
   Richard Chang and Sergey Senozhatsky implements data compression for
   zram writeback.
 
 - The 8 patch series "mm: folio_zero_user: clear page ranges" from David
   Hildenbrand adds clearing of contiguous page ranges for hugepages.
   Large improvements during demand faulting are demonstrated.
 
 - The 2 patch series "memcg cleanups" from Chen Ridong tideis up some
   memcg code.
 
 - The 12 patch series "mm/damon: introduce {,max_}nr_snapshots and
   tracepoint for damos stats" from SeongJae Park improves DAMOS stat's
   provided information, deterministic control, and readability.
 
 - The 3 patch series "selftests/mm: hugetlb cgroup charging: robustness
   fixes" from Li Wang fixes a few issues in the hugetlb cgroup charging
   selftests.
 
 - The 5 patch series "Fix va_high_addr_switch.sh test failure - again"
   from Chunyu Hu addresses several issues in the va_high_addr_switch test.
 
 - The 5 patch series "mm/damon/tests/core-kunit: extend existing test
   scenarios" from Shu Anzai improves the KUnit test coverage for DAMON.
 
 - The 2 patch series "mm/khugepaged: fix dirty page handling for
   MADV_COLLAPSE" from Shivank Garg fixes a glitch in khugepaged which was
   causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN.
 
 - The 29 patch series "arch, mm: consolidate hugetlb early reservation"
   from Mike Rapoport reworks and consolidates a pile of straggly code
   related to reservation of hugetlb memory from bootmem and creation of
   CMA areas for hugetlb.
 
 - The 9 patch series "mm: clean up anon_vma implementation" from Lorenzo
   Stoakes cleans up the anon_vma implementation in various ways.
 
 - The 3 patch series "tweaks for __alloc_pages_slowpath()" from
   Vlastimil Babka does a little streamlining of the page allocator's
   slowpath code.
 
 - The 8 patch series "memcg: separate private and public ID namespaces"
   from Shakeel Butt cleans up the memcg ID code and prevents the
   internal-only private IDs from being exposed to userspace.
 
 - The 6 patch series "mm: hugetlb: allocate frozen gigantic folio" from
   Kefeng Wang cleans up the allocation of frozen folios and avoids some
   atomic refcount operations.
 
 - The 11 patch series "mm/damon: advance DAMOS-based LRU sorting" from
   SeongJae Park improves DAMOS's movement of memory betewwn the active and
   inactive LRUs and adds auto-tuning of the ratio-based quotas and of
   monitoring intervals.
 
 - The 18 patch series "Support page table check on PowerPC" from Andrew
   Donnellan makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc.
 
 - The 3 patch series "nodemask: align nodes_and{,not} with underlying
   bitmap ops" from Yury Norov makes nodes_and() and nodes_andnot()
   propagate the return values from the underlying bit operations, enabling
   some cleanup in calling code.
 
 - The 5 patch series "mm/damon: hide kdamond and kdamond_lock from API
   callers" from SeongJae Park cleans up some DAMON internal interfaces.
 
 - The 4 patch series "mm/khugepaged: cleanups and scan limit fix" from
   Shivank Garg does some cleanup work in khupaged and fixes a scan limit
   accounting issue.
 
 - The 24 patch series "mm: balloon infrastructure cleanups" from David
   Hildenbrand goes to town on the balloon infrastructure and its page
   migration function.  Mainly cleanups, also some locking simplification.
 
 - The 2 patch series "mm/vmscan: add tracepoint and reason for
   kswapd_failures reset" from Jiayuan Chen adds additional tracepoints to
   the page reclaim code.
 
 - The 3 patch series "Replace wq users and add WQ_PERCPU to
   alloc_workqueue() users" from Marco Crivellari is part of Marco's
   kernel-wide migration from the legacy workqueue APIs over to the
   preferred unbound workqueues.
 
 - The 9 patch series "Various mm kselftests improvements/fixes" from
   Kevin Brodsky provides various unrelated improvements/fixes for the mm
   kselftests.
 
 - The 5 patch series "mm: accelerate gigantic folio allocation" from
   Kefeng Wang greatly speeds up gigantic folio allocation, mainly by
   avoiding unnecessary work in pfn_range_valid_contig().
 
 - The 5 patch series "selftests/damon: improve leak detection and wss
   estimation reliability" from SeongJae Park improves the reliability of
   two of the DAMON selftests.
 
 - The 8 patch series "mm/damon: cleanup kdamond, damon_call(), damos
   filter and DAMON_MIN_REGION" from SeongJae Park does some cleanup work
   in the core DAMON code.
 
 - The 8 patch series "Docs/mm/damon: update intro, modules, maintainer
   profile, and misc" from SeongJae Park performs maintenance work on the
   DAMON documentation.
 
 - The 10 patch series "mm: add and use vma_assert_stabilised() helper"
   from Lorenzo Stoakes refactors and cleans up the core VMA code.  The
   main aim here is to be able to use the mmap write lock's lockdep state
   to perform various assertions regarding the locking which the VMA code
   requires.
 
 - The 19 patch series "mm, swap: swap table phase II: unify swapin use"
   from Kairui Song removes some old swap code (swap cache bypassing and
   swap synchronization) which wasn't working very well.  Various other
   cleanups and simplifications were made.  The end result is a 20% speedup
   in one benchmark.
 
 - The 8 patch series "enable PT_RECLAIM on more 64-bit architectures"
   from Qi Zheng makes PT_RECLAIM available on 64-bit alpha, loongarch,
   mips, parisc, um,  Various cleanups were performed along the way.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY1HfAAKCRDdBJ7gKXxA
 jqhZAP9H8ZlKKqCEgnr6U5XXmJ63Ep2FDQpl8p35yr9yVuU9+gEAgfyWiJ43l1fP
 rT0yjsUW3KQFBi/SEA3R6aYarmoIBgI=
 =+HLt
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "powerpc/64s: do not re-activate batched TLB flush" makes
   arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)

   It adds a generic enter/leave layer and switches architectures to use
   it. Various hacks were removed in the process.

 - "zram: introduce compressed data writeback" implements data
   compression for zram writeback (Richard Chang and Sergey Senozhatsky)

 - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
   page ranges for hugepages. Large improvements during demand faulting
   are demonstrated (David Hildenbrand)

 - "memcg cleanups" tidies up some memcg code (Chen Ridong)

 - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
   stats" improves DAMOS stat's provided information, deterministic
   control, and readability (SeongJae Park)

 - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
   issues in the hugetlb cgroup charging selftests (Li Wang)

 - "Fix va_high_addr_switch.sh test failure - again" addresses several
   issues in the va_high_addr_switch test (Chunyu Hu)

 - "mm/damon/tests/core-kunit: extend existing test scenarios" improves
   the KUnit test coverage for DAMON (Shu Anzai)

 - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
   glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
   transiently return -EAGAIN (Shivank Garg)

 - "arch, mm: consolidate hugetlb early reservation" reworks and
   consolidates a pile of straggly code related to reservation of
   hugetlb memory from bootmem and creation of CMA areas for hugetlb
   (Mike Rapoport)

 - "mm: clean up anon_vma implementation" cleans up the anon_vma
   implementation in various ways (Lorenzo Stoakes)

 - "tweaks for __alloc_pages_slowpath()" does a little streamlining of
   the page allocator's slowpath code (Vlastimil Babka)

 - "memcg: separate private and public ID namespaces" cleans up the
   memcg ID code and prevents the internal-only private IDs from being
   exposed to userspace (Shakeel Butt)

 - "mm: hugetlb: allocate frozen gigantic folio" cleans up the
   allocation of frozen folios and avoids some atomic refcount
   operations (Kefeng Wang)

 - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
   of memory betewwn the active and inactive LRUs and adds auto-tuning
   of the ratio-based quotas and of monitoring intervals (SeongJae Park)

 - "Support page table check on PowerPC" makes
   CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)

 - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
   nodes_and() and nodes_andnot() propagate the return values from the
   underlying bit operations, enabling some cleanup in calling code
   (Yury Norov)

 - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
   some DAMON internal interfaces (SeongJae Park)

 - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
   in khupaged and fixes a scan limit accounting issue (Shivank Garg)

 - "mm: balloon infrastructure cleanups" goes to town on the balloon
   infrastructure and its page migration function. Mainly cleanups, also
   some locking simplification (David Hildenbrand)

 - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
   additional tracepoints to the page reclaim code (Jiayuan Chen)

 - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
   part of Marco's kernel-wide migration from the legacy workqueue APIs
   over to the preferred unbound workqueues (Marco Crivellari)

 - "Various mm kselftests improvements/fixes" provides various unrelated
   improvements/fixes for the mm kselftests (Kevin Brodsky)

 - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
   folio allocation, mainly by avoiding unnecessary work in
   pfn_range_valid_contig() (Kefeng Wang)

 - "selftests/damon: improve leak detection and wss estimation
   reliability" improves the reliability of two of the DAMON selftests
   (SeongJae Park)

 - "mm/damon: cleanup kdamond, damon_call(), damos filter and
   DAMON_MIN_REGION" does some cleanup work in the core DAMON code
   (SeongJae Park)

 - "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
   performs maintenance work on the DAMON documentation (SeongJae Park)

 - "mm: add and use vma_assert_stabilised() helper" refactors and cleans
   up the core VMA code. The main aim here is to be able to use the mmap
   write lock's lockdep state to perform various assertions regarding
   the locking which the VMA code requires (Lorenzo Stoakes)

 - "mm, swap: swap table phase II: unify swapin use" removes some old
   swap code (swap cache bypassing and swap synchronization) which
   wasn't working very well. Various other cleanups and simplifications
   were made. The end result is a 20% speedup in one benchmark (Kairui
   Song)

 - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
   available on 64-bit alpha, loongarch, mips, parisc, and um. Various
   cleanups were performed along the way (Qi Zheng)

* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
  mm/memory: handle non-split locks correctly in zap_empty_pte_table()
  mm: move pte table reclaim code to memory.c
  mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
  um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
  zsmalloc: make common caches global
  mm: add SPDX id lines to some mm source files
  mm/zswap: use %pe to print error pointers
  mm/vmscan: use %pe to print error pointers
  mm/readahead: fix typo in comment
  mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
  mm: refactor vma_map_pages to use vm_insert_pages
  mm/damon: unify address range representation with damon_addr_range
  mm/cma: replace snprintf with strscpy in cma_new_area
  ...
2026-02-12 11:32:37 -08:00
Linus Torvalds
192c015940 powerpc updates for 7.0
- Implement masked user access
  - Add support for internal only per-CPU instructions and inline the bpf_get_smp_processor_id() and bpf_get_current_task()
  - Fix pSeries MSI-X allocation failure when quota is exceeded
  - Fix recursive pci_lock_rescan_remove locking in EEH event handling
  - Support tailcalls with subprogs & BPF exceptions on 64bit
  - Extend "trusted" keys to support the PowerVM Key Wrapping Module (PKWM)
 
 Thanks to: Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li, Jarkko
 Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam Cao, Narayana
 Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket Kumar Bhaskar, Sourabh
 Jain, Srish Srinivasan, Venkat Rao Bagalkote,
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmmL7S0ACgkQpnEsdPSH
 ZJR2xA/9G+tZp9+2TidbLSyPT5o063uC5H5+j5QcvvHWi/ImGUNtixlDm5HcZLCR
 yKywE1piKYBU8HoISjzAt0+JCVd3ZjJ8chTpKgCHLXPRSBTgdR1MG+SXQysDJSWb
 yA4pwDikmwoLlfi+pf500F0nX2DRCRdU2Yi28ZFeaF/khJ7ebwj41QJ7LjN22+Q1
 G8Kq543obZluzSoVvfG4xUK4ByWER+Zdd2F6699iMP68yw5PJ8PPc0SUGt8nuD4i
 FUs0Lw7XV7i/K3+zm/ZgH5+Cvn7wOIcMNkXgFlxJXkFit97KXUDijifYPoXQyKLL
 ksD7SPFdV0++Sc+3mWcgW4j+hQZC0Pn864unmh8C6ug3SagQ+3pE1JYWKwCmoyXd
 49ROH0y+npArJ4NAc79eweunhafGcRYTSG+zV7swQvpRocMujEqa4CDz4uk1ll5W
 1yAac08AN6PnfcXl2VMrcDboziTlQVFcnNQbK/ieYMC7KpgA+udw1hd2rOWNZCPd
 u0byXxR1ak5YaAEuyMztd/39hrExx8306Jtkh5FIRZKWGAO+3np5bi3vxk11rDni
 c9BGh2JIMtuPUGys3wcFPGMRTKwF2bDFW/pB+5hMHeLUdlkni9WGCX8eLe2klYsy
 T7fBVb4d99IVrJGYv3J1lwELgjrgXvv35XOaUiyJyZhcbng15cc=
 =zJoL
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates for 7.0

 - Implement masked user access

 - Add bpf support for internal only per-CPU instructions and inline the
   bpf_get_smp_processor_id() and bpf_get_current_task() functions

 - Fix pSeries MSI-X allocation failure when quota is exceeded

 - Fix recursive pci_lock_rescan_remove locking in EEH event handling

 - Support tailcalls with subprogs & BPF exceptions on 64bit

 - Extend "trusted" keys to support the PowerVM Key Wrapping Module
   (PKWM)

Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li,
Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam
Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket
Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote.

* tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits)
  powerpc/pseries: plpks: export plpks_wrapping_is_supported
  docs: trusted-encryped: add PKWM as a new trust source
  keys/trusted_keys: establish PKWM as a trusted source
  pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
  pseries/plpks: expose PowerVM wrapping features via the sysfs
  powerpc/pseries: move the PLPKS config inside its own sysfs directory
  pseries/plpks: fix kernel-doc comment inconsistencies
  powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
  powerpc: kgdb: Remove OUTBUFMAX constant
  powerpc64/bpf: Additional NVR handling for bpf_throw
  powerpc64/bpf: Support exceptions
  powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT
  powerpc64/bpf: Avoid tailcall restore from trampoline
  powerpc64/bpf: Support tailcalls with subprogs
  powerpc64/bpf: Moving tail_call_cnt to bottom of frame
  powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
  powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded
  powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
  powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
  powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
  ...
2026-02-10 21:46:12 -08:00
Linus Torvalds
f1c538ca81 Updates for the VDSO subsystem:
- Provide the missing 64-bit variant of clock_getres()
 
     This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and
     finally the removal of 32-bit time types from the kernel and UAPI.
 
   - Remove the useless and broken getcpu_cache from the VDSO
 
     The intention was to provide a trivial way to retrieve the CPU number from
     the VDSO, but as the VDSO data is per process there is no way to make it
     work.
 
   - Switch get/put_unaligned() from packed struct to memcpy()
 
     The packed struct violates strict aliasing rules which requires to pass
     -fno-strict-aliasing to the compiler. As this are scalar values
     __builtin_memcpy() turns them into simple loads and stores
 
   - Use __typeof_unqual__() for __unqual_scalar_typeof()
 
     The get/put_unaligned() changes triggered a new sparse warning when __beNN
     types are used with get/put_unaligned() as sparse builds add a special
     'bitwise' attribute to them which prevents sparse to evaluate the Generic
     in __unqual_scalar_typeof().
 
     Newer sparse versions support __typeof_unqual__() which avoids the problem,
     but requires a recent sparse install. So this adds a sanity check to sparse
     builds, which validates that sparse is available and capable of handling it.
 
   - Force inline __cvdso_clock_getres_common()
 
     Compilers sometimes un-inline agressively, which results in function call
     overhead and problems with automatic stack variable initialization.
 
     Interestingly enough the force inlining results in smaller code than the
     un-inlined variant produced by GCC when optimizing for size.
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJ2eAQHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYoXhQD/4mjneVlRBbQ6Nt9LxxhzPlYXvED6u8b4SX
 R//DQ4qqqagh2fSpE57tr3f56HqhUmXptGJSgEvr6tVKoyugsLqP6sN9J9J3o181
 jnPQR0VBP9dihQZ5X93pKo9NZWxLIn0uLD0RsdCbE9NTx4ciw4VIPFYQqkC4Rw6b
 jiRrDR2l8EhV8cmxB6puW5WaQ932M6Awabw9RumzwH3MzIIlbc5Ero51S9eS64LL
 byU5XeWUe295W1Gxze5RHHJWyNQEyx1eUCFfe3LWvfpz7FMzc2AQsKnIJDzW3GiO
 UGu5MGptbLpG+ccvhVEs6/Ls5pWXcoCw4WuDNAunCCOmqda98oDniKf2LwRRbLT0
 nAfLNatMnhXdTPk2zbS45z9uipUQAGKmVAE3/LVqB+ekcutmIGMyqHgR75QX0b4l
 CQPkC9rBsV6gGsScWTnhRydhqioNO/uhhrQv0vEXnKZa0ysTbgZKt3JDZbgUEL2B
 uDxXKyrqjpnqDZKlMMaoLtwd+l+T80ya4/NhHd4ZNGUpTUrHVw2H47lgE7ahCxEk
 /SvXTZSU4Jp8sVQIQ+J6y5z2AQ/xGy++zvNKiZMyP9fQuPqhiDqYkLpmOp61bARx
 wqyVsfGXZYSB1l16AiSC/CyUDcMqqsFohGXQ/Yf0SOiVtXu2WUyfofY1N0IXIGu4
 WOVV9mH1yg==
 =0ogX
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull VDSO updates from Thomas Gleixner:

 - Provide the missing 64-bit variant of clock_getres()

   This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and
   finally the removal of 32-bit time types from the kernel and UAPI.

 - Remove the useless and broken getcpu_cache from the VDSO

   The intention was to provide a trivial way to retrieve the CPU number
   from the VDSO, but as the VDSO data is per process there is no way to
   make it work.

 - Switch get/put_unaligned() from packed struct to memcpy()

   The packed struct violates strict aliasing rules which requires to
   pass -fno-strict-aliasing to the compiler. As this are scalar values
   __builtin_memcpy() turns them into simple loads and stores

 - Use __typeof_unqual__() for __unqual_scalar_typeof()

   The get/put_unaligned() changes triggered a new sparse warning when
   __beNN types are used with get/put_unaligned() as sparse builds add a
   special 'bitwise' attribute to them which prevents sparse to evaluate
   the Generic in __unqual_scalar_typeof().

   Newer sparse versions support __typeof_unqual__() which avoids the
   problem, but requires a recent sparse install. So this adds a sanity
   check to sparse builds, which validates that sparse is available and
   capable of handling it.

 - Force inline __cvdso_clock_getres_common()

   Compilers sometimes un-inline agressively, which results in function
   call overhead and problems with automatic stack variable
   initialization.

   Interestingly enough the force inlining results in smaller code than
   the un-inlined variant produced by GCC when optimizing for size.

* tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vdso/gettimeofday: Force inlining of __cvdso_clock_getres_common()
  x86/percpu: Make CONFIG_USE_X86_SEG_SUPPORT work with sparse
  compiler: Use __typeof_unqual__() for __unqual_scalar_typeof()
  powerpc/vdso: Provide clock_getres_time64()
  tools headers: Remove unneeded ignoring of warnings in unaligned.h
  tools headers: Update the linux/unaligned.h copy with the kernel sources
  vdso: Switch get/put_unaligned() from packed struct to memcpy()
  parisc: Inline a type punning version of get_unaligned_le32()
  vdso: Remove struct getcpu_cache
  MIPS: vdso: Provide getres_time64() for 32-bit ABIs
  arm64: vdso32: Provide clock_getres_time64()
  ARM: VDSO: Provide clock_getres_time64()
  ARM: VDSO: Patch out __vdso_clock_getres() if unavailable
  x86/vdso: Provide clock_getres_time64() for x86-32
  selftests: vDSO: vdso_test_abi: Add test for clock_getres_time64()
  selftests: vDSO: vdso_test_abi: Use UAPI system call numbers
  selftests: vDSO: vdso_config: Add configurations for clock_getres_time64()
  vdso: Add prototype for __vdso_clock_getres_time64()
2026-02-10 17:02:23 -08:00
Peter Zijlstra
3e4067169c Linux 6.19-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAml/zSkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+bwIAJ0jbbeKDyeJJxPo
 8PgScnPJx9vBL3hGpphZrhbV3GOe9bDhKM/0Xk9qMDbpAm9C6qiBMTiDWyvWv5Qi
 qzDlZfoymMaDLPMxw9WHjJ++i1Z2StNdrz57Vze98C3/iG6gBcKnUEUzvF9nigri
 HIoxoOKlbSXLPUIzt49xE7YX+CRJhLF/kXmfoauZn5ghpv+uqSpWvRbUQJa3dmc0
 S4Ie/nbPtdVHmy1Fz9LJFDOzsdhGyjzHF4kc4shDkjAs8RAr8fJh74mQHO5a3MWA
 3WZ7GAAAc4XXNqj76X2dnVlMWpQNJ4p2e+OalsuXGA6VQ7OgbrJGMX8P6dMFn5AF
 8hFsXn4=
 =IdZ1
 -----END PGP SIGNATURE-----

Merge branch 'v6.19-rc8'

Update to avoid conflicts with /urgent patches.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2026-02-03 12:04:13 +01:00
Srish Srinivasan
40850c909f powerpc/pseries: move the PLPKS config inside its own sysfs directory
The /sys/firmware/secvar/config directory represents Power LPAR Platform
KeyStore (PLPKS) configuration properties such as max_object_size, signed_
update_algorithms, supported_policies, total_size, used_space, and version.
These attributes describe the PLPKS, and not the secure boot variables
(secvars).

Create /sys/firmware/plpks directory and move the PLPKS config inside this
directory. For backwards compatibility, create a soft link from the secvar
sysfs directory to this config and emit a warning stating that the older
sysfs path has been deprecated. Separate out the plpks specific
documentation from secvar.

Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-3-ssrish@linux.ibm.com
2026-01-30 09:27:26 +05:30
Guangshuo Li
33c1c6d8a2 powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
As kcalloc() may fail, check its return value to avoid a NULL pointer
dereference when passing it to of_property_read_u32_array().

Fixes: 790a1662d3 ("powerpc/smp: Parse ibm,thread-groups with multiple properties")
Cc: stable@vger.kernel.org
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250923133235.1862108-1-lgs201920130244@gmail.com
2026-01-29 09:06:01 +05:30
Mike Rapoport (Microsoft)
9fac145b6d mm, arch: consolidate hugetlb CMA reservation
Every architecture that supports hugetlb_cma command line parameter
reserves CMA areas for hugetlb during setup_arch().

This obfuscates the ordering of hugetlb CMA initialization with respect to
the rest initialization of the core MM.

Introduce arch_hugetlb_cma_order() callback to allow architectures report
the desired order-per-bit of CMA areas and provide a week implementation
of arch_hugetlb_cma_order() for architectures that don't support hugetlb
with CMA.

Use this callback in hugetlb_cma_reserve() instead if passing the order as
parameter and call hugetlb_cma_reserve() from mm_core_init_early() rather
than have it spread over architecture specific code.

Link: https://lkml.kernel.org/r/20260111082105.290734-28-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:19 -08:00
Thomas Gleixner
99d2592023 rseq: Implement sys_rseq_slice_yield()
Provide a new syscall which has the only purpose to yield the CPU after the
kernel granted a time slice extension.

sched_yield() is not suitable for that because it unconditionally
schedules, but the end of the time slice extension is not required to
schedule when the task was already preempted. This also allows to have a
strict check for termination to catch user space invoking random syscalls
including sched_yield() from a time slice extension region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251215155708.929634896@linutronix.de
2026-01-22 11:11:17 +01:00
Randy Dunlap
24c776355f kernel.h: drop hex.h and update all hex.h users
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of
hex.h interfaces to directly #include <linux/hex.h> as part of the process
of putting kernel.h on a diet.

Removing hex.h from kernel.h means that 36K C source files don't have to
pay the price of parsing hex.h for the roughly 120 C source files that
need it.

This change has been build-tested with allmodconfig on most ARCHes.  Also,
all users/callers of <linux/hex.h> in the entire source tree have been
updated if needed (if not already #included).

Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:19 -08:00
Alexander Gordeev
58852f24f9 powerpc/64s: do not re-activate batched TLB flush
Patch series "Nesting support for lazy MMU mode", v6.

When the lazy MMU mode was introduced eons ago, it wasn't made clear
whether such a sequence was legal:

	arch_enter_lazy_mmu_mode()
	...
		arch_enter_lazy_mmu_mode()
		...
		arch_leave_lazy_mmu_mode()
	...
	arch_leave_lazy_mmu_mode()

It seems fair to say that nested calls to
arch_{enter,leave}_lazy_mmu_mode() were not expected, and most
architectures never explicitly supported it.

Nesting does in fact occur in certain configurations, and avoiding it has
proved difficult.  This series therefore enables lazy_mmu sections to
nest, on all architectures.

Nesting is handled using a counter in task_struct (patch 8), like other
stateless APIs such as pagefault_{disable,enable}().  This is fully
handled in a new generic layer in <linux/pgtable.h>; the arch_* API
remains unchanged.  A new pair of calls, lazy_mmu_mode_{pause,resume}(),
is also introduced to allow functions that are called with the lazy MMU
mode enabled to temporarily pause it, regardless of nesting.

An arch now opts in to using the lazy MMU mode by selecting
CONFIG_ARCH_LAZY_MMU; this is more appropriate now that we have a generic
API, especially with state conditionally added to task_struct.


This patch (of 14):

Since commit b9ef323ea1 ("powerpc/64s: Disable preemption in hash lazy
mmu mode") a task can not be preempted while in lazy MMU mode.  Therefore,
the batch re-activation code is never called, so remove it.

Link: https://lkml.kernel.org/r/20251215150323.2218608-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20251215150323.2218608-2-kevin.brodsky@arm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: levi.yun <yeoreum.yun@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:32 -08:00
Feng Tang
e561383a39 powerpc/watchdog: add support for hardlockup_sys_info sysctl
Commit a9af76a787 ("watchdog: add sys_info sysctls to dump sys info on
system lockup") adds 'hardlock_sys_info' systcl knob for general kernel
watchdog to control what kinds of system debug info to be dumped on
hardlockup.

Add similar support in powerpc watchdog code to make the sysctl knob more
general, which also fixes a compiling warning in general watchdog code
reported by 0day bot.

Link: https://lkml.kernel.org/r/20251231080309.39642-1-feng.tang@linux.alibaba.com
Fixes: a9af76a787 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512030920.NFKtekA7-lkp@intel.com/
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14 22:16:22 -08:00
Thomas Weißschuh
759a1f9737 powerpc/vdso: Provide clock_getres_time64()
For consistency with __vdso_clock_gettime64() there should also be a
64-bit variant of clock_getres(). This will allow the extension of
CONFIG_COMPAT_32BIT_TIME to the vDSO and finally the removal of 32-bit
time types from the kernel and UAPI.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260114-vdso-powerpc-align-v1-1-acf09373d568@linutronix.de
2026-01-14 14:57:39 +01:00
Narayana Murty N
815a8d2feb powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
The recent commit 1010b4c012 ("powerpc/eeh: Make EEH driver device
hotplug safe") restructured the EEH driver to improve synchronization
with the PCI hotplug layer.

However, it inadvertently moved pci_lock_rescan_remove() outside its
intended scope in eeh_handle_normal_event(), leading to broken PCI
error reporting and improper EEH event triggering. Specifically,
eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
acquire the same lock internally, causing nested locking and disrupting
normal EEH event handling paths.

This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
with two public wrappers:
    eeh_pe_bus_get() with locking enabled.
    eeh_pe_bus_get_nolock() that skips locking.

Callers that already hold pci_lock_rescan_remove() now use
eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.

Additionally, pci_lock_rescan_remove() calls are restored to the correct
position—after eeh_pe_bus_get() and immediately before iterating affected
PEs and devices. This ensures EEH-triggered PCI removes occur under proper
bus rescan locking without recursive lock contention.

The eeh_pe_loc_get() function has been split into two functions:
    eeh_pe_loc_get(struct eeh_pe *pe) which retrieves the loc for given PE.
    eeh_pe_loc_get_bus(struct pci_bus *bus) which retrieves the location
    code for given bus.

This resolves lockdep warnings such as:
<snip>
[   84.964298] [    T928] ============================================
[   84.964304] [    T928] WARNING: possible recursive locking detected
[   84.964311] [    T928] 6.18.0-rc3 #51 Not tainted
[   84.964315] [    T928] --------------------------------------------
[   84.964320] [    T928] eehd/928 is trying to acquire lock:
[   84.964324] [    T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40
[   84.964342] [    T928]
                       but task is already holding lock:
[   84.964347] [    T928] c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40
[   84.964357] [    T928]
                       other info that might help us debug this:
[   84.964363] [    T928]  Possible unsafe locking scenario:

[   84.964367] [    T928]        CPU0
[   84.964370] [    T928]        ----
[   84.964373] [    T928]   lock(pci_rescan_remove_lock);
[   84.964378] [    T928]   lock(pci_rescan_remove_lock);
[   84.964383] [    T928]
                       *** DEADLOCK ***

[   84.964388] [    T928]  May be due to missing lock nesting notation

[   84.964393] [    T928] 1 lock held by eehd/928:
[   84.964397] [    T928]  #0: c000000003b29d58 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x28/0x40
[   84.964408] [    T928]
                       stack backtrace:
[   84.964414] [    T928] CPU: 2 UID: 0 PID: 928 Comm: eehd Not tainted 6.18.0-rc3 #51 VOLUNTARY
[   84.964417] [    T928] Hardware name: IBM,9080-HEX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries
[   84.964419] [    T928] Call Trace:
[   84.964420] [    T928] [c0000011a7157990] [c000000001705de4] dump_stack_lvl+0xc8/0x130 (unreliable)
[   84.964424] [    T928] [c0000011a71579d0] [c0000000002f66e0] print_deadlock_bug+0x430/0x440
[   84.964428] [    T928] [c0000011a7157a70] [c0000000002fd0c0] __lock_acquire+0x1530/0x2d80
[   84.964431] [    T928] [c0000011a7157ba0] [c0000000002fea54] lock_acquire+0x144/0x410
[   84.964433] [    T928] [c0000011a7157cb0] [c0000011a7157cb0] __mutex_lock+0xf4/0x1050
[   84.964436] [    T928] [c0000011a7157e00] [c000000000de21d8] pci_lock_rescan_remove+0x28/0x40
[   84.964439] [    T928] [c0000011a7157e20] [c00000000004ed98] eeh_pe_bus_get+0x48/0xc0
[   84.964442] [    T928] [c0000011a7157e50] [c000000000050434] eeh_handle_normal_event+0x64/0xa60
[   84.964446] [    T928] [c0000011a7157f30] [c000000000051de8] eeh_event_handler+0xf8/0x190
[   84.964450] [    T928] [c0000011a7157f90] [c0000000002747ac] kthread+0x16c/0x180
[   84.964453] [    T928] [c0000011a7157fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
</snip>

Fixes: 1010b4c012 ("powerpc/eeh: Make EEH driver device hotplug safe")
Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251210142559.8874-1-nnmlinux@linux.ibm.com
2026-01-08 17:56:53 +05:30
Gaurav Batra
1471c517cf powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
Leverage ARCH_HAS_DMA_MAP_DIRECT config option for coherent allocations as
well. This will bypass DMA ops for memory allocations that have been
pre-mapped.

Always set device bus_dma_limit when memory is pre-mapped. In some
architectures, like PowerPC, pmemory can be converted to regular memory via
daxctl command. This will gate the coherent allocations to pre-mapped RAM
only, by dma_coherent_ok().

Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251107161105.85999-1-gbatra@linux.ibm.com
2026-01-07 09:33:55 +05:30
Christophe Leroy
fb7903771c powerpc/32s: Fix segments setup when TASK_SIZE is not a multiple of 256M
For book3s/32 it is assumed that TASK_SIZE is a multiple of 256 Mbytes,
but Kconfig allows any value for TASK_SIZE.

In all relevant calculations, align TASK_SIZE to the upper 256 Mbytes
boundary.

Also use ASM_CONST() in the definition of TASK_SIZE to ensure it is
seen as an unsigned constant.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8928d906079e156c59794c41e826a684eaaaebb4.1766574657.git.chleroy@kernel.org
2026-01-07 09:31:04 +05:30
Christophe Leroy (CS GROUP)
608328ba5b powerpc/32: Restore disabling of interrupts at interrupt/syscall exit
Commit 2997876c4a ("powerpc/32: Restore clearing of MSR[RI] at
interrupt/syscall exit") delayed clearing of MSR[RI], but missed that
both MSR[RI] and MSR[EE] are cleared at the same time, so the commit
also delayed the disabling of interrupts, leading to unexpected
behaviour.

To fix that, mostly revert the blamed commit and restore the clearing
of MSR[RI] in interrupt_exit_kernel_prepare() instead. For 8xx it
implies adding a synchronising instruction after the mtspr in order to
make sure no instruction counter interrupt (used for perf events) will
fire just after clearing MSR[RI].

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Closes: https://lore.kernel.org/all/4d0bd05d-6158-1323-3509-744d3fbe8fc7@xenosoft.de/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6b05eb1c-fdef-44e0-91a7-8286825e68f1@roeck-us.net/
Fixes: 2997876c4a ("powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit")
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/585ea521b2be99d293b539bbfae148366cfb3687.1766146895.git.chleroy@kernel.org
2025-12-22 18:25:07 +05:30
Finn Thain
b94b735675 powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()
Since Linux v6.7, booting using BootX on an Old World PowerMac produces
an early crash. Stan Johnson writes, "the symptoms are that the screen
goes blank and the backlight stays on, and the system freezes (Linux
doesn't boot)."

Further testing revealed that the failure can be avoided by disabling
CONFIG_BOOTX_TEXT. Bisection revealed that the regression was caused by
a change to the font bitmap pointer that's used when btext_init() begins
painting characters on the display, early in the boot process.

Christophe Leroy explains, "before kernel text is relocated to its final
location ... data is addressed with an offset which is added to the
Global Offset Table (GOT) entries at the start of bootx_init()
by function reloc_got2(). But the pointers that are located inside a
structure are not referenced in the GOT and are therefore not updated by
reloc_got2(). It is therefore needed to apply the offset manually by using
PTRRELOC() macro."

Cc: stable@vger.kernel.org
Link: https://lists.debian.org/debian-powerpc/2025/10/msg00111.html
Link: https://lore.kernel.org/linuxppc-dev/d81ddca8-c5ee-d583-d579-02b19ed95301@yahoo.com/
Reported-by: Cedar Maxwell <cedarmaxwell@mac.com>
Closes: https://lists.debian.org/debian-powerpc/2025/09/msg00031.html
Bisected-by: Stan Johnson <userm57@yahoo.com>
Tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 0ebc7feae7 ("powerpc: Use shared font data")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/22b3b247425a052b079ab84da926706b3702c2c7.1762731022.git.fthain@linux-m68k.org
2025-12-22 17:59:07 +05:30
Linus Torvalds
a7405aa92f dma-mapping updates for Linux 6.19:
- next part of DMA mapping API refactoring to physical addresses as the primary
 interface instead of page+offset parameters; this time dma_map_ops callbacks
 are converted to physical addresses, what in turn results also in some
 simplification of architecture specific code (Leon Romanovsky and Jason
 Gunthorpe)
 - clarify that dma_map_benchmark is not a kernel self-test, but standalone
 tool (Qinxin Xia)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaTLpeQAKCRCJp1EFxbsS
 RKlAAQCo/gVslheoJ+h4Hk5oUjLfVQaiamJOlzxw12EVHVAs3AEA7GILIAL1GbGn
 EpkgwjIuz/J/4aGhNbYO6J+C1qMAHww=
 =aM6P
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping updates from Marek Szyprowski:

 - More DMA mapping API refactoring to physical addresses as the primary
   interface instead of page+offset parameters.

   This time dma_map_ops callbacks are converted to physical addresses,
   what in turn results also in some simplification of architecture
   specific code (Leon Romanovsky and Jason Gunthorpe)

 - Clarify that dma_map_benchmark is not a kernel self-test, but
   standalone tool (Qinxin Xia)

* tag 'dma-mapping-6.19-2025-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: remove unused map_page callback
  xen: swiotlb: Convert mapping routine to rely on physical address
  x86: Use physical address for DMA mapping
  sparc: Use physical address DMA mapping
  powerpc: Convert to physical address DMA mapping
  parisc: Convert DMA map_page to map_phys interface
  MIPS/jazzdma: Provide physical address directly
  alpha: Convert mapping routine to rely on physical address
  dma-mapping: remove unused mapping resource callbacks
  xen: swiotlb: Switch to physical address mapping callbacks
  ARM: dma-mapping: Switch to physical address mapping callbacks
  ARM: dma-mapping: Reduce struct page exposure in arch_sync_dma*()
  dma-mapping: convert dummy ops to physical address mapping
  dma-mapping: prepare dma_map_ops to conversion to physical address
  tools/dma: move dma_map_benchmark from selftests to tools/dma
2025-12-06 09:25:05 -08:00
Linus Torvalds
ad952db4a8 powerpc updates for 6.19
- Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit.
 
  - Fix unpaired stwcx on interrupt exit on 32-bit.
 
  - Fix race condition leading to double list-add in mac_hid_toggle_emumouse().
 
  - Fix mprotect on book3s 32-bit.
 
  - Fix SLB multihit issue during SLB preload with 64-bit hash MMU.
 
  - Add support for crashkernel CMA reservation.
 
  - Add die_id and die_cpumask for Power10 & later to expose chip hemispheres.
 
  - A series of minor fixes and improvements to the hash SLB code.
 
 Thanks to: Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury, Christophe
 Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom, J. Neuschäfer, Kunwu Chan,
 Long Li, Naresh Kamboju, Nathan Chancellor, Ritesh Harjani (IBM), Shirisha G,
 Shrikanth Hegde, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Thomas
 Zimmermann, Venkat Rao Bagalkote, Vishal Chourasia.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRjvi15rv0TSTaE+SIF0oADX8seIQUCaTFrpQAKCRAF0oADX8se
 IaVSAPwJazU+gxYwIe9mB7Mt9w1N04voW7LmX4tcj83i/Xd5QgEAsXfdpYeo3Tvb
 VPIOXVGVxOxAccyQ7Yw5QF4BPv+8FAc=
 =h6Ed
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Restore clearing of MSR[RI] at interrupt/syscall exit on 32-bit

 - Fix unpaired stwcx on interrupt exit on 32-bit

 - Fix race condition leading to double list-add in
   mac_hid_toggle_emumouse()

 - Fix mprotect on book3s 32-bit

 - Fix SLB multihit issue during SLB preload with 64-bit hash MMU

 - Add support for crashkernel CMA reservation

 - Add die_id and die_cpumask for Power10 & later to expose chip
   hemispheres

 - A series of minor fixes and improvements to the hash SLB code

Thanks to Antonio Alvarez Feijoo, Ben Collins, Bhaskar Chowdhury,
Christophe Leroy, Daniel Thompson, Dave Vasilevsky, Donet Tom,
J. Neuschäfer, Kunwu Chan, Long Li, Naresh Kamboju, Nathan Chancellor,
Ritesh Harjani (IBM), Shirisha G, Shrikanth Hegde, Sourabh Jain, Srikar
Dronamraju, Stephen Rothwell, Thomas Zimmermann, Venkat Rao Bagalkote,
and Vishal Chourasia.

* tag 'powerpc-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (32 commits)
  macintosh/via-pmu-backlight: Include <linux/fb.h> and <linux/of.h>
  powerpc/powermac: backlight: Include <linux/of.h>
  powerpc/64s/slb: Add no_slb_preload early cmdline param
  powerpc/64s/slb: Make preload_add return type as void
  powerpc/ptdump: Dump PXX level info for kernel_page_tables
  powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash
  powerpc/64s/hash: Update directMap page counters for Hash
  powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU
  powerpc/64s/hash: Improve hash mmu printk messages
  powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize()
  powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
  powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit
  powerpc/64s/slb: Fix SLB multihit issue during SLB preload
  powerpc, mm: Fix mprotect on book3s 32-bit
  powerpc/smp: Expose die_id and die_cpumask
  powerpc/83xx: Add a null pointer check to mcu_gpiochip_add
  arch:powerpc:tools This file was missing shebang line, so added it
  kexec: Include kernel-end even without crashkernel
  powerpc: p2020: Rename wdt@ nodes to watchdog@
  powerpc: 86xx: Rename wdt@ nodes to watchdog@
  ...
2025-12-05 16:18:21 -08:00
Linus Torvalds
ce5cfb0fa2 IOMMU Updates for Linux v6.19
Including:
 
 	- Introduction of the generic IO page-table framework with support for
 	  Intel and AMD IOMMU formats from Jason. This has good potential for
 	  unifying more IO page-table implementations and making future
 	  enhancements more easy. But this also needed quite some fixes during
 	  development. All known issues have been fixed, but my feeling is that
 	  there is a higher potential than usual that more might be needed.
 
 	- Intel VT-d updates:
 	  - Use right invalidation hint in qi_desc_iotlb().
 
 	  - Reduce the scope of INTEL_IOMMU_FLOPPY_WA.
 
 	- ARM-SMMU updates:
 	  - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
 	    and a new clock for the TBU.
 
 	  - Fix error handling if level 1 CD table allocation fails.
 
 	  - Permit more than the architectural maximum number of SMRs for funky
 	    Qualcomm mis-implementations of SMMUv2.
 
 	- Mediatek driver:
 	  - MT8189 iommu support.
 
 	- Move ARM IO-pgtable selftests to kunit.
 
 	- Device leak fixes for a couple of drivers.
 
 	- Random smaller fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmktjWgACgkQK/BELZcB
 GuMQgA//YljqMVbMmYpFQF/9nXsyTvkpzaaVqj3CscjfnLJQ7YNIrWHLMw4xcZP7
 c2zsSDMPc5LAe2nkyNPvMeFufsbDOYE1CcbqFfZhNqwKGWIVs7J1j73aqJXKfXB4
 RaVv5GA1NFeeIRRKPx/ZpD9W1WiR9PiUQXBxNTAJLzbBNhcTJzo+3YuSpJ6ckOak
 aG67Aox6Dwq/u0gHy8gWnuA2XL6Eit+nQbod7TfchHoRu+TUdbv8qWL+sUChj+u/
 IlyBt1YL/do3bJC0G1G2E81J1KGPU/OZRfB34STQKlopEdXX17ax3b2X0bt3Hz/h
 9Yk3BLDtGMBQ0aVZzAZcOLLlBlEpPiMKBVuJQj29kK9KJSYfmr2iolOK0cGyt+kv
 DfQ8+nv6HRFMbwjetfmhGYf6WemPcggvX44Hm/rgR2qbN3P+Q8/klyyH8MLuQeqO
 ttoQIwDd9DYKJelmWzbLgpb2vGE3O0EAFhiTcCKOk643PaudfengWYKZpJVIIqtF
 nEUEpk17HlpgFkYrtmIE7CMONqUGaQYO84R3j7DXcYXYAvqQJkhR3uJejlWQeh8x
 uMc9y04jpg3p5vC5c7LfkQ3at3p/jf7jzz4GuNoZP5bdVIUkwXXolir0ct3/oby3
 /bXXNA1pSaRuUADm7pYBBhKYAFKC7vCSa8LVaDR3CB95aNZnvS0=
 =KG7j
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:

 - Introduction of the generic IO page-table framework with support for
   Intel and AMD IOMMU formats from Jason.

   This has good potential for unifying more IO page-table
   implementations and making future enhancements more easy. But this
   also needed quite some fixes during development. All known issues
   have been fixed, but my feeling is that there is a higher potential
   than usual that more might be needed.

 - Intel VT-d updates:
    - Use right invalidation hint in qi_desc_iotlb()
    - Reduce the scope of INTEL_IOMMU_FLOPPY_WA

 - ARM-SMMU updates:
    - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
      and a new clock for the TBU.
    - Fix error handling if level 1 CD table allocation fails.
    - Permit more than the architectural maximum number of SMRs for
      funky Qualcomm mis-implementations of SMMUv2.

 - Mediatek driver:
    - MT8189 iommu support

 - Move ARM IO-pgtable selftests to kunit

 - Device leak fixes for a couple of drivers

 - Random smaller fixes and improvements

* tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits)
  iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
  iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
  powerpc/pseries/svm: Make mem_encrypt.h self contained
  genpt: Make GENERIC_PT invisible
  iommupt: Avoid a compiler bug with sw_bit
  iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal
  iommupt: Fix unlikely flows in increase_top()
  iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga()
  MAINTAINERS: Update my email address
  iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables
  dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock
  iommu/vt-d: Restore previous domain::aperture_end calculation
  iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
  iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD
  iommu/tegra: fix device leak on probe_device()
  iommu/sun50i: fix device leak on of_xlate()
  iommu/omap: simplify probe_device() error handling
  iommu/omap: fix device leaks on probe_device()
  iommu/mediatek-v1: add missing larb count sanity check
  iommu/mediatek-v1: fix device leaks on probe()
  ...
2025-12-04 18:05:06 -08:00
Donet Tom
00312419f0 powerpc/64s/slb: Fix SLB multihit issue during SLB preload
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction — typically after every 256 context
switches — to remove old entry.

To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.

If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.

The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without a
MMU context switch.

CPU 0                                   CPU 1
-----                                    -----
Process P
exec                                    swapper/1
 load_elf_binary
  begin_new_exc
    activate_mm
     switch_mm_irqs_off
      switch_mmu_context
       switch_slb
       /*
        * This invalidates all
        * the entries in the HW
        * and setup the new HW
        * SLB entries as per the
        * preload cache.
        */
context_switch
sched_migrate_task migrates process P to cpu-1

Process swapper/0                       context switch (to process P)
(uses mm_struct of Process P)           switch_mm_irqs_off()
                                         switch_slb
                                           load_slb++
                                            /*
                                            * load_slb becomes 0 here
                                            * and we evict an entry from
                                            * the preload cache with
                                            * preload_age(). We still
                                            * keep HW SLB and preload
                                            * cache in sync, that is
                                            * because all HW SLB entries
                                            * anyways gets evicted in
                                            * switch_slb during SLBIA.
                                            * We then only add those
                                            * entries back in HW SLB,
                                            * which are currently
                                            * present in preload_cache
                                            * (after eviction).
                                            */
                                        load_elf_binary continues...
                                         setup_new_exec()
                                          slb_setup_new_exec()

                                        sched_switch event
                                        sched_migrate_task migrates
                                        process P to cpu-0

context_switch from swapper/0 to Process P
 switch_mm_irqs_off()
  /*
   * Since both prev and next mm struct are same we don't call
   * switch_mmu_context(). This will cause the HW SLB and SW preload
   * cache to go out of sync in preload_new_slb_context. Because there
   * was an SLB entry which was evicted from both HW and preload cache
   * on cpu-1. Now later in preload_new_slb_context(), when we will try
   * to add the same preload entry again, we will add this to the SW
   * preload cache and then will add it to the HW SLB. Since on cpu-0
   * this entry was never invalidated, hence adding this entry to the HW
   * SLB will cause a SLB multi-hit error.
   */
load_elf_binary continues...
 START_THREAD
  start_thread
   preload_new_slb_context
   /*
    * This tries to add a new EA to preload cache which was earlier
    * evicted from both cpu-1 HW SLB and preload cache. This caused the
    * HW SLB of cpu-0 to go out of sync with the SW preload cache. The
    * reason for this was, that when we context switched back on CPU-0,
    * we should have ideally called switch_mmu_context() which will
    * bring the HW SLB entries on CPU-0 in sync with SW preload cache
    * entries by setting up the mmu context properly. But we didn't do
    * that since the prev mm_struct running on cpu-0 was same as the
    * next mm_struct (which is true for swapper / kernel threads). So
    * now when we try to add this new entry into the HW SLB of cpu-0,
    * we hit a SLB multi-hit error.
    */

WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149]
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP:  c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700   Not tainted  (6.16.0-rc3-dirty)
MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28888482  XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
  0x7fffceb5ffff (unreliable)
  preload_new_slb_context+0x100/0x1a0
  start_thread+0x26c/0x420
  load_elf_binary+0x1b04/0x1c40
  bprm_execve+0x358/0x680
  do_execveat_common+0x1f8/0x240
  sys_execve+0x58/0x70
  system_call_exception+0x114/0x300
  system_call_common+0x160/0x2c4

>From the above analysis, during early exec the hardware SLB is cleared,
and entries from the software preload cache are reloaded into hardware
by switch_slb. However, preload_new_slb_context and slb_setup_new_exec
also attempt to load some of the same entries, which can trigger a
multi-hit. In most cases, these additional preloads simply hit existing
entries and add nothing new. Removing these functions avoids redundant
preloads and eliminates the multi-hit issue. This patch removes these
two functions.

We tested process switching performance using the context_switch
benchmark on POWER9/hash, and observed no regression.

Without this patch: 129041 ops/sec
With this patch:    129341 ops/sec

We also measured SLB faults during boot, and the counts are essentially
the same with and without this patch.

SLB faults without this patch: 19727
SLB faults with this patch:    19786

Fixes: 5434ae7462 ("powerpc/64s/hash: Add a SLB preload cache")
cc: stable@vger.kernel.org
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/0ac694ae683494fe8cadbd911a1a5018d5d3c541.1761834163.git.ritesh.list@gmail.com
2025-11-18 12:35:52 +05:30
Srikar Dronamraju
fb2ff9fa72 powerpc/smp: Expose die_id and die_cpumask
>From Power10 processors onwards, each chip has 2 hemispheres. For LPARs
running on PowerVM Hypervisor, hypervisor determines the allocation of
CPU groups to each LPAR, resulting in two LPARs with the same number of
CPUs potentially having different numbers of CPUs from each hemisphere.
Additionally, it is not feasible to ascertain the hemisphere based
solely on the CPU number.

Users wishing to assign their workload to all CPUs, or a subset of CPUs
within a specific hemisphere, encounter difficulties in identifying the
cpumask. To address this, it is proposed to expose hemisphere
information as a die in sysfs. This aligns with other architectures
and facilitates the identification of CPUs within the same hemisphere.
Tools such as lstopo can also access this information.

Please note: The hypervisor reveals the locality of the CPUs to
hemispheres only in dedicated mode. Consequently, in systems where
hemisphere information is unavailable, such as shared LPARs, the
die_cpus information in sysfs will mirror package_cpus, with
die_id set to -1.

Without this change.
$ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null
/sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000
/sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39

With this change.
$ grep . /sys/devices/system/cpu/cpu16/topology/{die*,package*} 2>/dev/null
/sys/devices/system/cpu/cpu16/topology/die_cpus:000000,00000000,00ff0000
/sys/devices/system/cpu/cpu16/topology/die_cpus_list:16-23
/sys/devices/system/cpu/cpu16/topology/die_id:2
/sys/devices/system/cpu/cpu16/topology/package_cpus:000000,000000ff,ffff0000
/sys/devices/system/cpu/cpu16/topology/package_cpus_list:16-39

snipped lstopo-no-graphics o/p
  Group0 L#0 (total=8747584KB)
    Package L#0 (total=3564096KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
      NUMANode L#0 (P#0 local=3564096KB total=3564096KB)
      Die L#0 (P#0)
        Core L#0 (P#0)
<snipped>
    Package L#1 (total=5183488KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
      NUMANode L#1 (P#1 local=5183488KB total=5183488KB)
      Die L#2 (P#2)
        Core L#2 (P#16)
          L3Cache L#4 (size=4096KB linesize=128 ways=16)
            L2Cache L#4 (size=1024KB linesize=128 ways=8)
              L1dCache L#4 (size=32KB linesize=128 ways=8)
                L1iCache L#4 (size=48KB linesize=128 ways=6)
                  PU L#16 (P#16)
                  PU L#17 (P#18)
                  PU L#18 (P#20)
                  PU L#19 (P#22)
          L3Cache L#5 (size=4096KB linesize=128 ways=16)
            L2Cache L#5 (size=1024KB linesize=128 ways=8)
              L1dCache L#5 (size=32KB linesize=128 ways=8)
                L1iCache L#5 (size=48KB linesize=128 ways=6)
                  PU L#20 (P#17)
                  PU L#21 (P#19)
                  PU L#22 (P#21)
                  PU L#23 (P#23)
      Die L#3 (P#3)
        Core L#3 (P#24)
          L3Cache L#6 (size=4096KB linesize=128 ways=16)
            L2Cache L#6 (size=1024KB linesize=128 ways=8)
              L1dCache L#6 (size=32KB linesize=128 ways=8)
                L1iCache L#6 (size=48KB linesize=128 ways=6)
                  PU L#24 (P#24)
                  PU L#25 (P#26)
                  PU L#26 (P#28)
                  PU L#27 (P#30)
          L3Cache L#7 (size=4096KB linesize=128 ways=16)
            L2Cache L#7 (size=1024KB linesize=128 ways=8)
              L1dCache L#7 (size=32KB linesize=128 ways=8)
                L1iCache L#7 (size=48KB linesize=128 ways=6)
                  PU L#28 (P#25)
                  PU L#29 (P#27)
                  PU L#30 (P#29)
                  PU L#31 (P#31)
        Core L#4 (P#32)
          L3Cache L#8 (size=4096KB linesize=128 ways=16)
            L2Cache L#8 (size=1024KB linesize=128 ways=8)
              L1dCache L#8 (size=32KB linesize=128 ways=8)
                L1iCache L#8 (size=48KB linesize=128 ways=6)
                  PU L#32 (P#32)
                  PU L#33 (P#34)
                  PU L#34 (P#36)
                  PU L#35 (P#38)
          L3Cache L#9 (size=4096KB linesize=128 ways=16)
            L2Cache L#9 (size=1024KB linesize=128 ways=8)
              L1dCache L#9 (size=32KB linesize=128 ways=8)
                L1iCache L#9 (size=48KB linesize=128 ways=6)
                  PU L#36 (P#33)
                  PU L#37 (P#35)
                  PU L#38 (P#37)
                  PU L#39 (P#39)
  Group0 L#1 (total=7736896KB)
    Package L#2 (total=5170880KB CPUModel="POWER10 (architected), altivec supported" CPURevision="2.0 (pvr 0080 0200)")
      NUMANode L#2 (P#2 local=5170880KB total=5170880KB)
      Die L#4 (P#4)
<snipped>

Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251112074859.814087-1-srikar@linux.ibm.com
2025-11-14 11:12:56 +05:30
Nathan Chancellor
d2be62d585 powerpc/vmlinux.lds: Drop .interp description
Commit da30705c46 ("arch/powerpc: Remove .interp section in vmlinux")
intended to drop the .interp section from vmlinux but even with this
change, relocatable kernels linked with ld.lld contain an empty .interp
section, which ends up causing crashes in GDB [1].

  $ make -skj"$(nproc)" ARCH=powerpc LLVM=1 clean pseries_le_defconfig vmlinux

  $ llvm-readelf -S vmlinux | grep interp
    [44] .interp           PROGBITS        c0000000021ddb34 21edb34 000000 00   A  0   0  1

There appears to be a subtle difference between GNU ld and ld.lld when
it comes to discarding sections that specify load addresses [2].

Since '--no-dynamic-linker' prevents emission of the .interp section,
there is no need to describe it in the output sections of the vmlinux
linker script. Drop the .interp section description from vmlinux.lds.S
to avoid this issue altogether.

Link: https://sourceware.org/bugzilla/show_bug.cgi?id=33481 [1]
Link: https://github.com/ClangBuiltLinux/linux/issues/2137 [2]
Reported-by: Vishal Chourasia <vishalc@linux.ibm.com>
Closes: https://lore.kernel.org/20251013040148.560439-1-vishalc@linux.ibm.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251018-ppc-fix-lld-interp-v1-1-a083de6dccc9@kernel.org
2025-11-11 14:23:27 +05:30
Christophe Leroy
10e1c77c36 powerpc/32: Fix unpaired stwcx. on interrupt exit
Commit b96bae3ae2 ("powerpc/32: Replace ASM exception exit by C
exception exit from ppc64") erroneouly copied to powerpc/32 the logic
from powerpc/64 based on feature CPU_FTR_STCX_CHECKS_ADDRESS which is
always 0 on powerpc/32.

Re-instate the logic implemented by commit b64f87c16f ("[POWERPC]
Avoid unpaired stwcx. on some processors") which is based on
CPU_FTR_NEED_PAIRED_STWCX feature.

Fixes: b96bae3ae2 ("powerpc/32: Replace ASM exception exit by C exception exit from ppc64")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/6040b5dbcf5cdaa1cd919fcf0790f12974ea6e5a.1757666244.git.christophe.leroy@csgroup.eu
2025-11-11 14:14:07 +05:30
Christophe Leroy
2997876c4a powerpc/32: Restore clearing of MSR[RI] at interrupt/syscall exit
Commit 13799748b9 ("powerpc/64: use interrupt restart table to speed
up return from interrupt") removed the inconditional clearing of
MSR[RI] when returning from interrupt into kernel. But powerpc/32
doesn't implement interrupt restart table hence still need MSR[RI]
to be cleared.

It could be added back in interrupt_exit_kernel_prepare() but it is
easier and better to add it back in entry_32.S for following reasons:
- Writing to MSR must be followed by a synchronising instruction
- The smaller the non recoverable section is the better it is

So add a macro called clr_ri and use it in the three places that play
up with SRR0/SRR1. Use it just before another mtspr for synchronisation
to avoid having to add an isync.

Now that's done in entry_32.S, exit_must_hard_disable() can return
false for non book3s/64, taking into account that BOOKE doesn't have
MSR_RI.

Also add back blacklisting syscall_exit_finish for kprobe. This was
initially added by commit 7cdf440138 ("powerpc/entry32: Blacklist
syscall exit points for kprobe.") then lost with
commit 6f76a01173 ("powerpc/syscall: implement system call
entry/exit logic in C for PPC32").

Fixes: 6f76a01173 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32")
Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/66d0ab070563ad460ed481328ab0887c27f21a2c.1757593807.git.christophe.leroy@csgroup.eu
2025-11-11 14:13:37 +05:30
Christophe Leroy
98fa236044 powerpc/8xx: Remove specific code from fast_exception_return
The label 2: in fast_exception_return is a leftover from
commit b96bae3ae2 ("powerpc/32: Replace ASM exception exit by C
exception exit from ppc64"). Once removed, we see that
fast_exception_return is a standalone function that is called only
from pieces of assembly dedicated to book3s/32 or booke, never by
common code or 8xx code.

So remove the clear of MSR[RI] enclosed in #ifdef CONFIG_PPC_8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/39de3e0f0122b571474b1ba352a2dc3ad8cb71dd.1756304318.git.christophe.leroy@csgroup.eu
2025-11-11 14:12:48 +05:30
Sourabh Jain
b4a96ab50f powerpc/kdump: Add support for crashkernel CMA reservation
Commit 35c18f2933 ("Add a new optional ",cma" suffix to the
crashkernel= command line option") and commit ab475510e0 ("kdump:
implement reserve_crashkernel_cma") added CMA support for kdump
crashkernel reservation.

Extend crashkernel CMA reservation support to powerpc.

The following changes are made to enable CMA reservation on powerpc:

- Parse and obtain the CMA reservation size along with other crashkernel
  parameters
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump
- Include the CMA-reserved ranges in the usable memory ranges for the
  kdump kernel to use.
- Exclude the CMA-reserved ranges from the crash kernel memory to
  prevent them from being exported through /proc/vmcore.

With the introduction of the CMA crashkernel regions,
crash_exclude_mem_range() needs to be called multiple times to exclude
both crashk_res and crashk_cma_ranges from the crash memory ranges. To
avoid repetitive logic for validating mem_ranges size and handling
reallocation when required, this functionality is moved to a new wrapper
function crash_exclude_mem_range_guarded().

To ensure proper CMA reservation, reserve_crashkernel_cma() is called
after pageblock_order is initialized.

Update kernel-parameters.txt to document CMA support for crashkernel on
powerpc architecture.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251107080334.708028-1-sourabhjain@linux.ibm.com
2025-11-11 14:11:08 +05:30
Christian Brauner
b36d4b6aa8
arch: hookup listns() system call
Add the listns() system call to all architectures.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-20-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:18 +01:00
Leon Romanovsky
a10d648d13 powerpc: Convert to physical address DMA mapping
Adapt PowerPC DMA to use physical addresses in order to prepare code
to removal .map_page and .unmap_page.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-10-3bbfe3a25cdf@kernel.org
2025-10-29 10:27:30 +01:00
Nicolin Chen
fd714986e4 iommu: Pass in old domain to attach_dev callback functions
The IOMMU core attaches each device to a default domain on probe(). Then,
every new "attach" operation has a fundamental meaning of two-fold:
 - detach from its currently attached (old) domain
 - attach to a given new domain

Modern IOMMU drivers following this pattern usually want to clean up the
things related to the old domain, so they call iommu_get_domain_for_dev()
to fetch the old domain.

Pass in the old domain pointer from the core to drivers, aligning with the
set_dev_pasid op that does so already.

Ensure all low-level attach fcuntions in the core can forward the correct
old domain pointer. Thus, rework those functions as well.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-10-27 13:55:35 +01:00
Sourabh Jain
0843ba4584 powerpc/fadump: skip parameter area allocation when fadump is disabled
Fadump allocates memory to pass additional kernel command-line argument
to the fadump kernel. However, this allocation is not needed when fadump
is disabled. So avoid allocating memory for the additional parameter
area in such cases.

Fixes: f4892c68ec ("powerpc/fadump: allocate memory for additional parameters early")
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Fixes: f4892c68ec ("powerpc/fadump: allocate memory for additional  parameters early")
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251008032934.262683-1-sourabhjain@linux.ibm.com
2025-10-13 09:41:31 +05:30
Linus Torvalds
2f2c725493 pci-v6.18-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmjgOAkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxzlA//QxoUF4p1cN7+rPwuzCPNi2ZmKNyU
 T7mLfUciV/t8nPLPFdtxdttHB3F+BsA/E9WYFiUUGBzvdYafnoZ/Qnio1WdMIIYz
 0eVrTpnMUMBXrUwGFnnIER3b4GCJb2WR3RPfaBrbqQRHoAlDmv/ijh7rIKhgWIeR
 NsCmPiFnsxPjgVusn2jXWLheUHEbZh2dVTk9lceQXFRdrUELC9wH7zigAA6GviGO
 ssPC1pKfg5DrtuuM6k9JCcEYibQIlynxZ8sbT6YfQ2bs1uSEd2pEcr7AORb4l2yQ
 rcirHwGTpvZ/QvzKpDY8FcuzPFRP7QPd+34zMEQ2OW04y1k61iKE/4EE2Z9w/OoW
 esFQXbevy9P5JHu6DBcaJ2uwvnLiVesry+9CmkKCc6Dxyjbcbgeta1LR5dhn1Rv0
 dMtRnkd/pxzIF5cRnu+WlOFV2aAw2gKL9pGuimH5TO4xL2qCZKak0hh8PAjUN2c/
 12GAlrwAyBK1FeY2ZflTN7Vr8o2O0I6I6NeaF3sCW1VO2e6E9/bAIhrduUO4lhGq
 BHTVRBefFRtbFVaxTlUAj+lSCyqES3Wzm8y/uLQvT6M3opunTziSDff1aWbm1Y2t
 aASl1IByuKsGID8VrT5khHeBKSWtnd/v7LLUjCeq+g6eKdfN2arInPvw5X1NpVMj
 tzzBYqwHgBoA4u8=
 =BUw/
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that
     take config space accessor functions.

     Implement pci_find_capability(), pci_find_ext_capability(), and
     dwc, dwc endpoint, and cadence capability search interfaces with
     them (Hans Zhang)

   - Leave parent unit address 0 in 'interrupt-map' so that when we
     build devicetree nodes to describe PCI functions that contain
     multiple peripherals, we can build this property even when
     interrupt controllers lack 'reg' properties (Lorenzo Pieralisi)

   - Add a Xeon 6 quirk to disable Extended Tags and limit Max Read
     Request Size to 128B to avoid a performance issue (Ilpo Järvinen)

   - Add sysfs 'serial_number' file to expose the Device Serial Number
     (Matthew Wood)

   - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)

  Resource management:

   - Align m68k pcibios_enable_device() with other arches (Ilpo
     Järvinen)

   - Remove sparc pcibios_enable_device() implementations that don't do
     anything beyond what pci_enable_resources() does (Ilpo Järvinen)

   - Remove mips pcibios_enable_resources() and use
     pci_enable_resources() instead (Ilpo Järvinen)

   - Clean up bridge window sizing and assignment (Ilpo Järvinen),
     including:

       - Leave non-claimed bridge windows disabled

       - Enable bridges even if a window wasn't assigned because not all
         windows are required by downstream devices

       - Preserve bridge window type when releasing the resource, since
         the type is needed for reassignment

       - Consolidate selection of bridge windows into two new
         interfaces, pbus_select_window() and
         pbus_select_window_for_type(), so this is done consistently

       - Compute bridge window start and end earlier to avoid logging
         stale information

  MSI:

   - Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
     Vives)

  Error handling:

   - Align AER with EEH by allowing drivers to request a Bus Reset on
     Non-Fatal Errors (in addition to the reset on Fatal Errors that we
     already do) (Lukas Wunner)

   - If error recovery fails, emit FAILED_RECOVERY uevents for the
     devices, not for the bridge leading to them.

     This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner)

   - Align AER with EEH by calling err_handler.error_detected()
     callbacks to notify drivers if error recovery fails (Lukas Wunner)

   - Align AER with EEH by restoring device error_state to
     pci_channel_io_normal before the err_handler.slot_reset() callback.

     This is earlier than before the err_handler.resume() callback
     (Lukas Wunner)

   - Emit a BEGIN_RECOVERY uevent when driver's
     err_handler.error_detected() requests a reset, as well as when it
     says recovery is complete or can be done without a reset (Niklas
     Schnelle)

   - Align s390 with AER and EEH by emitting uevents during error
     recovery (Niklas Schnelle)

   - Align EEH with AER and s390 by emitting BEGIN_RECOVERY,
     SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the
     result of err_handler.error_detected() (Niklas Schnelle)

   - Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES
     error information identifies a device without an AER Capability
     (Breno Leitao)

   - Update error decoding and TLP Log printing for new errors in
     current PCIe base spec (Lukas Wunner)

   - Update error recovery documentation to match the current code
     and use consistent nomenclature (Lukas Wunner)

  ASPM:

   - Enable all ClockPM and ASPM states for devicetree platforms, since
     there's typically no firmware that enables ASPM

     This is a risky change that may uncover hardware or configuration
     defects at boot-time rather than when users enable ASPM via sysfs
     later. Booting with "pcie_aspm=off" prevents this enabling
     (Manivannan Sadhasivam)

   - Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)

  Power management:

   - If a device has already been disconnected, e.g., by a hotplug
     removal, don't bother trying to resume it to D0 when detaching the
     driver.

     This avoids annoying "Unable to change power state from D3cold to
     D0" messages (Mario Limonciello)

   - Ensure devices are powered up before config reads for
     'max_link_width', 'current_link_speed', 'current_link_width',
     'secondary_bus_number', and 'subordinate_bus_number' sysfs files.

     This prevents using invalid data (~0) in drivers or lspci and,
     depending on how the PCIe controller reports errors, may avoid
     error interrupts or crashes (Brian Norris)

  Virtualization:

   - Add rescan/remove locking when enabling/disabling SR-IOV, which
     avoids list corruption on s390, where disabling SR-IOV also
     generates hotplug events (Niklas Schnelle)

  Peer-to-peer DMA:

   - Free struct p2p_pgmap, not a member within it, in the
     pci_p2pdma_add_resource() error path (Sungho Kim)

  Endpoint framework:

   - Document sysfs interface for BAR assignment of vNTB endpoint
     functions (Jerome Brunet)

   - Fix array underflow in endpoint BAR test case (Dan Carpenter)

   - Skip endpoint IRQ test if the IRQ is out of range to avoid false
     errors (Christian Bruel)

   - Fix endpoint test case for controllers with fixed-size BARs smaller
     than requested by the test (Marek Vasut)

   - Restore inbound translation when disabling doorbell so the endpoint
     doorbell test case can be run more than once (Niklas Cassel)

   - Avoid a NULL pointer dereference when releasing DMA channels in
     endpoint DMA test case (Shin'ichiro Kawasaki)

   - Convert tegra194 interrupt number to MSI vector to fix endpoint
     Kselftest MSI_TEST test case (Niklas Cassel)

   - Reset tegra194 BARs when running in endpoint mode so the BAR tests
     don't overwrite the ATU settings in BAR4 (Niklas Cassel)

   - Handle errors in tegra194 BPMP transactions so we don't mistakenly
     skip future PERST# assertion (Vidya Sagar)

  AMD MDB PCIe controller driver:

   - Update DT binding example to separate PERST# to a Root Port stanza
     to make multiple Root Ports possible in the future (Sai Krishna
     Musham)

   - Add driver support for PERST# being described in a Root Port
     stanza, falling back to the host bridge if not found there (Sai
     Krishna Musham)

  Freescale i.MX6 PCIe controller driver:

   - Enable the 3.3V Vaux supply if available so devices can request
     wakeup with either Beacon or WAKE# (Richard Zhu)

  MediaTek PCIe Gen3 controller driver:

   - Add optional sys clock ready time setting to avoid sys_clk_rdy
     signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)

   - Add DT binding and driver support for MT6991 and MT8196
     (AngeloGioacchino Del Regno)

  NVIDIA Tegra PCIe controller driver:

   - When asserting PERST#, disable the controller instead of mistakenly
     disabling the PLL twice (Nagarjuna Kristam)

   - Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  Qualcomm PCIe controller driver:

   - Select PCI Power Control Slot driver so slot voltage rails can be
     turned on/off if described in Root Port devicetree node (Qiang Yu)

   - Parse only PCI bridge child nodes in devicetree, skipping unrelated
     nodes such as OPP (Operating Performance Points), which caused
     probe failures (Krishna Chaitanya Chundru)

   - Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang)

   - Consolidate Root Port 'phy' and 'reset' properties in struct
     qcom_pcie_port, regardless of whether we got them from the Root
     Port node or the host bridge node (Manivannan Sadhasivam)

   - Fetch and map the ELBI register space in the DWC core rather than
     in each driver individually (Krishna Chaitanya Chundru)

   - Enable ECAM mechanism in DWC core by setting up iATU with 'CFG
     Shift Feature' and use this in the qcom driver (Krishna Chaitanya
     Chundru)

   - Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
     Chundru)

   - Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
     Glymur, which is compatible with X1E80100 but doesn't have the
     cnoc_sf_axi clock (Qiang Yu)

  Renesas R-Car PCIe controller driver:

   - Fix a typo that prevented correct PHY initialization (Marek Vasut)

   - Add a missing 1ms delay after PWR reset assertion as required by
     the V4H manual (Marek Vasut)

   - Assure reset has completed before DBI access to avoid SError (Marek
     Vasut)

   - Fix inverted PHY initialization check, which sometimes led to
     timeouts and failure to start the controller (Marek Vasut)

   - Pass the correct IRQ domain to generic_handle_domain_irq() to fix a
     regression when converting to msi_create_parent_irq_domain()
     (Claudiu Beznea)

   - Drop the spinlock protecting the PMSR register - it's no longer
     required since pci_lock already serializes accesses (Marek Vasut)

   - Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  SOPHGO PCIe controller driver:

   - Check for existence of struct cdns_pcie.ops before using it to
     allow Cadence drivers that don't need to supply ops (Chen Wang)

   - Add DT binding and driver for the SOPHGO SG2042 PCIe controller
     (Chen Wang)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Update pinctrl documentation of initial states and use in runtime
     suspend/resume (Christian Bruel)

   - Add pinctrl_pm_select_init_state() for use by stm32 driver, which
     needs it during resume (Christian Bruel)

   - Add devicetree bindings and drivers for the STMicroelectronics
     STM32MP25 in host and endpoint modes (Christian Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Add support for x16 in devicetree 'num-lanes' property (Konrad
     Dybcio)

   - Verify that if DT specifies a single IRQ for all eDMA channels, it
     is named 'dma' (Niklas Cassel)

  TI J721E PCIe driver:

   - Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth
     Vadapalli)

   - Power controller off before configuring the glue layer so the
     controller latches the correct values on power-on (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver
     exits with error (Siddharth Vadapalli)

   - Add Peripheral Virtualization Unit (PVU), which restricts DMA from
     PCIe devices to specific regions of host memory, to the ti,am65
     binding (Jan Kiszka)

  Xilinx NWL PCIe controller driver:

   - Clear bootloader E_ECAM_CONTROL before merging in the new driver
     value to avoid writing invalid values (Jani Nurminen)"

* tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits)
  PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
  MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers
  PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25
  dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings
  PCI: stm32: Add PCIe host support for STM32MP25
  PCI: xilinx-nwl: Fix ECAM programming
  PCI: j721e: Fix incorrect error message in probe()
  PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
  dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
  PCI: dwc: Support 16-lane operation
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
  PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock
  PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0()
  PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock
  PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  PCI: rcar-gen4: Assure reset occurs before DBI access
  PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
  PCI: Set up bridge resources earlier
  PCI: rcar-host: Drop PMSR spinlock
  ...
2025-10-06 10:41:03 -07:00
Linus Torvalds
a498d59c46 dma-mapping updates for Linux 6.18:
- refactoring of DMA mapping API to physical addresses as the primary
 interface instead of page+offset parameters; this gets much closer to
 Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is
 supporting memdesc project which seeks to substantially transform how
 struct page works; an advantage of this approach is the possibility of
 introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow
 in the common paths, what in turn lets to use recently introduced
 dma_iova_link() API to map PCI P2P MMIO without creating struct page;
 developped by Leon Romanovsky and Jason Gunthorpe
 - minor clean-up by Petr Tesarik and Qianfeng Rong
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaNugqAAKCRCJp1EFxbsS
 RBvDAQCEd4P6pz6ROQHf5BfiF5J1db2H6bWsFLjajx3KfNWf8gD+P0eQ0hTzLrcd
 zuSKZTivviOiyjXlt/9GOaXXPnmTwA0=
 =b0nZ
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping updates from Marek Szyprowski:

 - Refactoring of DMA mapping API to physical addresses as the primary
   interface instead of page+offset parameters

   This gets much closer to Matthew Wilcox's long term wish for
   struct-pageless IO to cacheable DRAM and is supporting memdesc
   project which seeks to substantially transform how struct page works.

   An advantage of this approach is the possibility of introducing
   DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the
   common paths, what in turn lets to use recently introduced
   dma_iova_link() API to map PCI P2P MMIO without creating struct page

   Developped by Leon Romanovsky and Jason Gunthorpe

 - Minor clean-up by Petr Tesarik and Qianfeng Rong

* tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  kmsan: fix missed kmsan_handle_dma() signature conversion
  mm/hmm: properly take MMIO path
  mm/hmm: migrate to physical address-based DMA mapping API
  dma-mapping: export new dma_*map_phys() interface
  xen: swiotlb: Open code map_resource callback
  dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()
  kmsan: convert kmsan_handle_dma to use physical addresses
  dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys()
  iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  dma-debug: refactor to use physical addresses for page mapping
  iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
  dma-mapping: introduce new DMA attribute to indicate MMIO memory
  swiotlb: Remove redundant __GFP_NOWARN
  dma-direct: clean up the logic in __dma_direct_alloc_pages()
2025-10-03 17:41:12 -07:00
Linus Torvalds
6c7340a7a8 Scheduler updates for v6.18:
Core scheduler changes:
 
  - Make migrate_{en,dis}able() inline, to improve performance
    (Menglong Dong)
 
  - Move STDL_INIT() functions out-of-line (Peter Zijlstra)
 
  - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)
 
 Fair scheduling:
 
  - Defer throttling when tasks exit to user-space, to reduce the
    chance & impact of throttle-preemption with held locks and
    other resources. (Aaron Lu, Valentin Schneider)
 
  - Get rid of sched_domains_curr_level hack for tl->cpumask(),
    as the warning was getting triggered on certain topologies.
    (Peter Zijlstra)
 
 Misc cleanups & fixes:
 
  - Header cleanups (Menglong Dong)
 
  - Fix race in push_dl_task() (Harshit Agarwal)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmjWn0IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i9ng/+KJYEsNilPA6nGzZLurU3HXm6S5NrNBPc
 xJU43eo3QdFUymKqYaCoXCwaMah3m8fFW+DC8ZoEvWC5wHnlIw6+u/ZEtsWQ4R8N
 8+sjQddQeb9Gez+nkC7pXX8h1nqlhHyGWU88+9hOyEEMk21KgH7tJ5gOuGQdPhiA
 v24D8dkS1sT6CAeqZAycYIkos+kfNNV+wAEQCXAxfvSSslpzJVZcj9kdQInxF4O4
 9+WrvFZFVYJWdJgmgfbpBMw7N8a4Gpv+Lr6U0ZVXHNeLjNdn60I+5Me+9BW9GRcO
 pPL50RfGgGgVIqyEHvBhhz8p0tlKUJo9NkRJ9hmNB5SKT3P442SU789ppTYgI1il
 5P4g3TiuomqPVuo99/mYHYOpT6NkYC1fkwMP9Hfk4NRUX0EA6lKXelVnMXaC3Z7R
 U+0xVYtK4BBLRFBo4HIEM1HChSIcOnutuufFX4lPPt+ewnAmJwSt619w+lBF0v+n
 cpiLIlQ74LrYlrULw3bznnwzh5dV8XHm4CT3XAShm6qB97/SaLr0rI5W7njdSvte
 ym9z4yFWidawowvLWjFX1CykSnX/T5MV+uzuIH64MEIA39B4VKJWCAC3l3AMJn/5
 Ckhf93B5uG0q+wUtE66x/JrN7wtbz9PMpwh01fXNWs/eWc1VKkVrvq+PmHODngmz
 drSUoI1P570=
 =8ZTZ
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Make migrate_{en,dis}able() inline, to improve performance
     (Menglong Dong)

   - Move STDL_INIT() functions out-of-line (Peter Zijlstra)

   - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)

  Fair scheduling:

   - Defer throttling to when tasks exit to user-space, to reduce the
     chance & impact of throttle-preemption with held locks and other
     resources (Aaron Lu, Valentin Schneider)

   - Get rid of sched_domains_curr_level hack for tl->cpumask(), as the
     warning was getting triggered on certain topologies (Peter
     Zijlstra)

  Misc cleanups & fixes:

   - Header cleanups (Menglong Dong)

   - Fix race in push_dl_task() (Harshit Agarwal)"

* tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix some typos in include/linux/preempt.h
  sched: Make migrate_{en,dis}able() inline
  rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h
  arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
  sched/fair: Do not balance task to a throttled cfs_rq
  sched/fair: Do not special case tasks in throttled hierarchy
  sched/fair: update_cfs_group() for throttled cfs_rqs
  sched/fair: Propagate load for throttled cfs_rq
  sched/fair: Get rid of throttled_lb_pair()
  sched/fair: Task based throttle time accounting
  sched/fair: Switch to task based throttle model
  sched/fair: Implement throttle task work and related helpers
  sched/fair: Add related data structure for task based throttle
  sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
  sched: Move STDL_INIT() functions out-of-line
  sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
  sched/deadline: Fix race in push_dl_task()
2025-09-30 10:35:11 -07:00
Linus Torvalds
417552999d powerpc updates for 6.18
- powerpc support for BPF arena and arena atomics
  - Patches to switch to msi parent domain (per-device MSI domains)
  - Add a lock contention tracepoint in the queued spinlock slowpath
  - Fixes for underflow in pseries/powernv msi and pci paths
  - Switch from legacy-of-mm-gpiochip dependency to platform driver
  - Fixes for handling TLB misses
  - Introduce support for powerpc papr-hvpipe
  - Add vpa-dtl PMU driver for pseries platform
  - Misc fixes and cleanups
 
 Thanks to: Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira Rajeev, Cédric Le Goater, Christophe Leroy,
 Erhard Furtner, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence, Kajol Jain, Kienan Stewart,
 Linus Walleij, Mahesh Salgaonkar, Nam Cao, Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
 Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas Gleixner, Thomas Huth, Thorsten Blum,
 Tyrel Datwyler, Venkat Rao Bagalkote,
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmjatlYACgkQpnEsdPSH
 ZJRQRQ//fCFV+ZMAgg54N/PMaobYSQdjvAGLGj+ynTfLas0qkhV62x3X1R85pLbW
 cqiDTcBeVT9nW7+2w+eIcS4kWku2/oorlPnsDtKTadLa78Sk9r12kzsPn+vOobwA
 /dril1kRHNqdpqUG8P751EUIFWULsKFOwNMdp24q8inpr2YOy3csVbAxUqVkSVbk
 8aafr5/hl5uvj38KdE4ditLlphNenA+kB1Tf67/R4T7MjHTqZtLPd+crqrLtEQlk
 eYvcJ1c5t/+ltgiSnuR5Go48CKDaw4YKCtqnaBiOBxMad2xnzqRo+NWmcxH1HmwI
 MZsSyHeri/sF7zBRCpytqGlci/wFQwiNeI3bpE31lV0wwu8Bmoypommxly/fPUSU
 23Qi7UuTmOXu537aFcMAfqmA05JlJh79JCEuWVN5LUgQUIalATeiyoKNPAuYYw2V
 FNYyURBBe1FNVhGyJdTg+766npUX8eqOWMImcyuAzdAV4MuBPzUbhVKJazHax+4r
 BzqDD5DeApl53+wUzLjWnN4s1rKxOuRUdWgOsrzD3DQZJkIP3p8WDZkgQY4YiI3P
 EWbHEx7q1XSiMJD4lhyAYHY01yc1bwYrrZ0/NCpl4G1trbdLb0sZmGEH9ykJlzSC
 cVV/kvH9tyy95I7HbDKc4lK1c0dC5kwFmcArTaHnDaRpRvc+eBQ=
 =U2s/
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Madhavan Srinivasan:

 - powerpc support for BPF arena and arena atomics

 - Patches to switch to msi parent domain (per-device MSI domains)

 - Add a lock contention tracepoint in the queued spinlock slowpath

 - Fixes for underflow in pseries/powernv msi and pci paths

 - Switch from legacy-of-mm-gpiochip dependency to platform driver

 - Fixes for handling TLB misses

 - Introduce support for powerpc papr-hvpipe

 - Add vpa-dtl PMU driver for pseries platform

 - Misc fixes and cleanups

Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira
Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence,
Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao,
Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas
Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao
Bagalkote.

* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
  powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
  genirq/msi: Remove msi_post_free()
  powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
  powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
  powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
  powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
  docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
  powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
  powerpc/time: Expose boot_tb via accessor
  powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
  powerpc/fprobe: fix updated fprobe for function-graph tracer
  powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
  powerpc64/modules: replace stub allocation sentinel with an explicit counter
  powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
  powerpc/ftrace: ensure ftrace record ops are always set for NOPs
  powerpc/603: Really copy kernel PGD entries into all PGDIRs
  powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
  powerpc/pseries: HVPIPE changes to support migration
  powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
  powerpc/pseries: Enable HVPIPE event message interrupt
  ...
2025-09-29 19:28:50 -07:00
Linus Torvalds
722df25ddf kernel-6.18-rc1.clone3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZgMQAKCRCRxhvAZXjc
 ornXAP954dZjz+OJw6lJLCf0j9TXJOczGHvK3oW5ZD9KnqtTdwEA7p1A6WMOKJyl
 8VtTgCS0yNt8QlznUnsSDfVm0jXVGAY=
 =tUXG
 -----END PGP SIGNATURE-----

Merge tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull copy_process updates from Christian Brauner:
 "This contains the changes to enable support for clone3() on nios2
  which apparently is still a thing.

  The more exciting part of this is that it cleans up the inconsistency
  in how the 64-bit flag argument is passed from copy_process() into the
  various other copy_*() helpers"

[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]

* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nios2: implement architecture-specific portion of sys_clone3
  arch: copy_thread: pass clone_flags as u64
  copy_process: pass clone_flags as u64 across calltree
  copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
2025-09-29 10:36:50 -07:00
Menglong Dong
35561bab76 arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
The include/generated/asm-offsets.h is generated in Kbuild during
compiling from arch/SRCARCH/kernel/asm-offsets.c. When we want to
generate another similar offset header file, circular dependency can
happen.

For example, we want to generate a offset file include/generated/test.h,
which is included in include/sched/sched.h. If we generate asm-offsets.h
first, it will fail, as include/sched/sched.h is included in asm-offsets.c
and include/generated/test.h doesn't exist; If we generate test.h first,
it can't success neither, as include/generated/asm-offsets.h is included
by it.

In x86_64, the macro COMPILE_OFFSETS is used to avoid such circular
dependency. We can generate asm-offsets.h first, and if the
COMPILE_OFFSETS is defined, we don't include the "generated/test.h".

And we define the macro COMPILE_OFFSETS for all the asm-offsets.c for this
purpose.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-09-25 09:57:15 +02:00
Aboorva Devarajan
2dc019ca39 powerpc/time: Expose boot_tb via accessor
- Define accessor function get_boot_tb() to safely return boot_tb value,
  this is only needed when running in SPLPAR environments, so the
  accessor is built conditionally under CONFIG_PPC_SPLPAR.

- Tag boot_tb as __ro_after_init since it is written once at initialized
  and never updated afterwards.

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Tejas Manhas <tejas05@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250915102947.26681-2-atrajeev@linux.ibm.com
2025-09-22 14:48:56 +05:30
Aditya Bodkhe
d733f18a6d powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
commit a1be9ccc57 ("function_graph: Support recording and printing the
return value of function") introduced support for function graph return
value tracing.

Additionally, commit a3ed4157b7 ("fgraph: Replace fgraph_ret_regs with
ftrace_regs") further refactored and optimized the implementation,
making `struct fgraph_ret_regs` unnecessary.

This patch enables the above modifications for powerpc all, ensuring that
function graph return value tracing is available on this architecture.

In this patch we have redefined two functions:
- 'ftrace_regs_get_return_value()' - the existing implementation on
ppc returns -ve of return value based on some conditions not
relevant to our patch.
- 'ftrace_regs_get_frame_pointer()' - always returns 0 in current code .

We also allocate stack space to equivalent of 'SWITCH_FRAME_SIZE',
allowing us to directly use predefined offsets like 'GPR3' and 'GPR4'
this keeps code clean and consistent with already defined offsets .

After this patch, v6.14+ kernel can also be built with FPROBE on powerpc
but there are a few other build and runtime dependencies for FPROBE to
work properly. The next patch addresses them.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Aditya Bodkhe <adityab1@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250916044035.29033-1-adityab1@linux.ibm.com
2025-09-16 16:13:00 +05:30
Joe Lawrence
b137312fbf powerpc64/modules: replace stub allocation sentinel with an explicit counter
The logic for allocating ppc64_stub_entry trampolines in the .stubs
section relies on an inline sentinel, where a NULL .funcdata member
indicates an available slot.

While preceding commits fixed the initialization bugs that led to ftrace
stub corruption, the sentinel-based approach remains fragile: it depends
on an implicit convention between subsystems modifying different
struct types in the same memory area.

Replace the sentinel with an explicit counter, module->arch.num_stubs.
Instead of iterating through memory to find a NULL marker, the module
loader uses this counter as the boundary for the next free slot.

This simplifies the allocation code, hardens it against future changes
to stub structures, and removes the need for an extra relocation slot
previously reserved to terminate the sentinel search.

Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-4-joe.lawrence@redhat.com
2025-09-15 16:40:52 +05:30
Joe Lawrence
f6b4df37eb powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
CONFIG_PPC_FTRACE_OUT_OF_LINE introduced setup_ftrace_ool_stubs() to
extend the ppc64le module .stubs section with an array of
ftrace_ool_stub structures for each patchable function.

Fix its ppc64_stub_entry stub reservation loop to properly write across
all of the num_stubs used and not just the first entry.

Fixes: eec37961a5 ("powerpc64/ftrace: Move ftrace sequence out of line")
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-3-joe.lawrence@redhat.com
2025-09-15 16:40:52 +05:30
Joe Lawrence
5337609a31 powerpc/ftrace: ensure ftrace record ops are always set for NOPs
When an ftrace call site is converted to a NOP, its corresponding
dyn_ftrace record should have its ftrace_ops pointer set to
ftrace_nop_ops.

Correct the powerpc implementation to ensure the
ftrace_rec_set_nop_ops() helper is called on all successful NOP
initialization paths. This ensures all ftrace records are consistent
before being handled by the ftrace core.

Fixes: eec37961a5 ("powerpc64/ftrace: Move ftrace sequence out of line")
Suggested-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-2-joe.lawrence@redhat.com
2025-09-15 16:40:52 +05:30
Christophe Leroy
d9e46de4bf powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
Commit ac9f97ff8b ("powerpc/8xx: Inconditionally use task PGDIR in
DTLB misses") removed the test that needed the valeur in SPRN_EPN but
failed to remove the read.

Remove it.

And remove related comments, including the very same comment
in InstructionTLBMiss that should have been removed by
commit 33c527522f ("powerpc/8xx: Inconditionally use task PGDIR in
ITLB misses").

Also update the comment about absence of a second level table which
has been handled implicitely since commit 5ddb75cee5 ("powerpc/8xx:
remove tests on PGDIR entry validity").

Fixes: ac9f97ff8b ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/5811c8d1d6187f280ad140d6c0ad6010e41eeaeb.1755361995.git.christophe.leroy@csgroup.eu
2025-09-15 13:48:22 +05:30
Haren Myneni
26b4fcecea powerpc/pseries: Define HVPIPE specific macros
Define HVPIPE specific macros which are needed to support
ibm,send-hvpipe-msg and ibm,receive-hvpipe-msg RTAS calls
and used to handle HVPIPE message events.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Shashank MS <shashank.gowda@in.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250909084402.1488456-3-haren@linux.ibm.com
2025-09-15 13:38:40 +05:30