Commit graph

370 commits

Author SHA1 Message Date
Linus Torvalds
cb5573868e Loongarch:
- Add more CPUCFG mask bits.
 
 - Improve feature detection.
 
 - Add lazy load support for FPU and binary translation (LBT) register state.
 
 - Fix return value for memory reads from and writes to in-kernel devices.
 
 - Add support for detecting preemption from within a guest.
 
 - Add KVM steal time test case to tools/selftests.
 
 ARM:
 
 - Add support for FEAT_IDST, allowing ID registers that are not
   implemented to be reported as a normal trap rather than as an UNDEF
   exception.
 
 - Add sanitisation of the VTCR_EL2 register, fixing a number of
   UXN/PXN/XN bugs in the process.
 
 - Full handling of RESx bits, instead of only RES0, and resulting in
   SCTLR_EL2 being added to the list of sanitised registers.
 
 - More pKVM fixes for features that are not supposed to be exposed to
   guests.
 
 - Make sure that MTE being disabled on the pKVM host doesn't give it
   the ability to attack the hypervisor.
 
 - Allow pKVM's host stage-2 mappings to use the Force Write Back
   version of the memory attributes by using the "pass-through'
   encoding.
 
 - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
   guest.
 
 - Preliminary work for guest GICv5 support.
 
 - A bunch of debugfs fixes, removing pointless custom iterators stored
   in guest data structures.
 
 - A small set of FPSIMD cleanups.
 
 - Selftest fixes addressing the incorrect alignment of page
   allocation.
 
 - Other assorted low-impact fixes and spelling fixes.
 
 RISC-V:
 
 - Fixes for issues discoverd by KVM API fuzzing in
   kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(),
   and kvm_riscv_vcpu_aia_imsic_update()
 
 - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
 
 - Transparent huge page support for hypervisor page tables
 
 - Adjust the number of available guest irq files based on MMIO
   register sizes found in the device tree or the ACPI tables
 
 - Add RISC-V specific paging modes to KVM selftests
 
 - Detect paging mode at runtime for selftests
 
 s390:
 
 - Performance improvement for vSIE (aka nested virtualization)
 
 - Completely new memory management.  s390 was a special snowflake that enlisted
   help from the architecture's page table management to build hypervisor
   page tables, in particular enabling sharing the last level of page
   tables.  This however was a lot of code (~3K lines) in order to support
   KVM, and also blocked several features.  The biggest advantages is
   that the page size of userspace is completely independent of the
   page size used by the guest: userspace can mix normal pages, THPs and
   hugetlbfs as it sees fit, and in fact transparent hugepages were not
   possible before.  It's also now possible to have nested guests and
   guests with huge pages running on the same host.
 
 - Maintainership change for s390 vfio-pci
 
 - Small quality of life improvement for protected guests
 
 x86:
 
 - Add support for giving the guest full ownership of PMU hardware (contexted
   switched around the fastpath run loop) and allowing direct access to data
   MSRs and PMCs (restricted by the vPMU model).  KVM still intercepts
   access to control registers, e.g. to enforce event filtering and to
   prevent the guest from profiling sensitive host state.  This is more
   accurate, since it has no risk of contention and thus dropped events, and
   also has significantly less overhead.
 
   For more information, see the commit message for merge commit bf2c3138ae
   ("Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD").
 
 - Disallow changing the virtual CPU model if L2 is active, for all the same
   reasons KVM disallows change the model after the first KVM_RUN.
 
 - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs
   when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those
   were advertised as supported to userspace,
 
 - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM
   would attempt to read CR3 configuring an async #PF entry.
 
 - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86
   only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.  Only a few exports
   that are intended for external usage, and those are allowed explicitly.
 
 - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead
   of WARNing.  Userspace can sometimes put the vCPU into what should be an
   impossible state, and spurious exit to userspace on -EBUSY does not really
   do anything to solve the issue.
 
 - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU
   is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller
   stuffing architecturally impossible states into KVM.
 
 - Add support for new Intel instructions that don't require anything beyond
   enumerating feature flags to userspace.
 
 - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2.
 
 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended
   setup flow, as nested VMX in particular is sensitive to unexpected changes
   in KVM's golden configuration.
 
 - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts
   when the suppression feature is enabled by the guest (currently limited to
   split IRQCHIP, i.e. userspace I/O APIC).  Sadly, simply fixing KVM to honor
   Suppress EOI Broadcasts isn't an option as some userspaces have come to rely
   on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective
   of whether or not userspace I/O APIC supports Directed EOIs).
 
 - Clean up KVM's handling of marking mapped vCPU pages dirty.
 
 - Drop a pile of *ancient* sanity checks hidden behind in KVM's unused
   ASSERT() macro, most of which could be trivially triggered by the guest
   and/or user, and all of which were useless.
 
 - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it
   more obvious what the weird parameter is used for, and to allow fropping
   these RTC shenanigans if CONFIG_KVM_IOAPIC=n.
 
 - Bury all of ioapic.h, i8254.h and related ioctls (including
   KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y.
 
 - Add a regression test for recent APICv update fixes.
 
 - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv()
   to consolidate the updates, and to co-locate SVI updates with the updates
   for KVM's own cache of ISR information.
 
 - Drop a dead function declaration.
 
 - Minor cleanups.
 
 x86 (Intel):
 
 - Rework KVM's handling of VMCS updates while L2 is active to temporarily
   switch to vmcs01 instead of deferring the update until the next nested
   VM-Exit.  The deferred updates approach directly contributed to several
   bugs, was proving to be a maintenance burden due to the difficulty in
   auditing the correctness of deferred updates, and was polluting
   "struct nested_vmx" with a growing pile of booleans.
 
 - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults,
   and instead always reflect them into the guest.  Since KVM doesn't shadow
   EPCM entries, EPCM violations cannot be due to KVM interference and
   can't be resolved by KVM.
 
 - Fix a bug where KVM would register its posted interrupt wakeup handler even
   if loading kvm-intel.ko ultimately failed.
 
 - Disallow access to vmcb12 fields that aren't fully supported, mostly to
   avoid weirdness and complexity for FRED and other features, where KVM wants
   enable VMCS shadowing for fields that conditionally exist.
 
 - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or
   refuses to online a CPU) due to a VMCS config mismatch.
 
 x86 (AMD):
 
 - Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
 
 - Add support for virtualizing ERAPS.  Note, correct virtualization of ERAPS
   relies on an upcoming, publicly announced change in the APM to reduce the
   set of conditions where hardware (i.e. KVM) *must* flush the RAP.
 
 - Ignore nSVM intercepts for instructions that are not supported according to
   L1's virtual CPU model.
 
 - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
   for EPT Misconfig.
 
 - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
   and allow userspace to restore nested state with GIF=0.
 
 - Treat exit_code as an unsigned 64-bit value through all of KVM.
 
 - Add support for fetching SNP certificates from userspace.
 
 - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
   or VMSAVE on behalf of L2.
 
 - Misc fixes and cleanups.
 
 x86 selftests:
 
 - Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
 
 - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
   and extend x86's infrastructure to support EPT and NPT (for L2 guests).
 
 - Extend several nested VMX tests to also cover nested SVM.
 
 - Add a selftest for nested VMLOAD/VMSAVE.
 
 - Rework the nested dirty log test, originally added as a regression test for
   PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
   and to hopefully make the test easier to understand and maintain.
 
 guest_memfd:
 
 - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage
   handling.  SEV/SNP was the only user of the tracking and it can do it via
   the RMP.
 
 - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE
   and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to
   avoid non-trivial complexity for something that no known VMM seems to be
   doing and to avoid an API special case for in-place conversion, which
   simply can't support unaligned sources.
 
 - When populating guest_memfd memory, GUP the source page in common code and
   pass the refcounted page to the vendor callback, instead of letting vendor
   code do the heavy lifting.  Doing so avoids a looming deadlock bug with
   in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap
   invalidate lock.
 
 Generic:
 
 - Fix a bug where KVM would ignore the vCPU's selected address space when
   creating a vCPU-specific mapping of guest memory.  Actually this bug
   could not be hit even on x86, the only architecture with multiple
   address spaces, but it's a bug nevertheless.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmNqwwUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPaZAf/cJx5B67lnST272esz0j29MIuT/Ti
 jnf6PI9b7XubKYOtNvlu5ZW4Jsa5dqRG0qeO/JmcXDlwBf5/UkWOyvqIXyiuTl0l
 KcSUlKPtTgKZSoZpJpTppuuDE8FSYqEdcCmjNvoYzcJoPjmaeJbK6aqO0AkBbb6e
 L5InrLV7nV9iua6rFvA0s/G8/Eq2DG8M9hTRHe6NcI/z4hvslOudvpUXtC8Jygoo
 cV8vFavUwc+atrmvhAOLvSitnrjfNa4zcG6XMOlwXPfIdvi3zqTlQTgUpwGKiAGQ
 RIDUVZ/9bcWgJqbPRsdEWwaYRkNQWc5nmrAHRpEEaYV/NeBBNf4v6qfKSw==
 =SkJ1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Loongarch:

   - Add more CPUCFG mask bits

   - Improve feature detection

   - Add lazy load support for FPU and binary translation (LBT) register
     state

   - Fix return value for memory reads from and writes to in-kernel
     devices

   - Add support for detecting preemption from within a guest

   - Add KVM steal time test case to tools/selftests

  ARM:

   - Add support for FEAT_IDST, allowing ID registers that are not
     implemented to be reported as a normal trap rather than as an UNDEF
     exception

   - Add sanitisation of the VTCR_EL2 register, fixing a number of
     UXN/PXN/XN bugs in the process

   - Full handling of RESx bits, instead of only RES0, and resulting in
     SCTLR_EL2 being added to the list of sanitised registers

   - More pKVM fixes for features that are not supposed to be exposed to
     guests

   - Make sure that MTE being disabled on the pKVM host doesn't give it
     the ability to attack the hypervisor

   - Allow pKVM's host stage-2 mappings to use the Force Write Back
     version of the memory attributes by using the "pass-through'
     encoding

   - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
     guest

   - Preliminary work for guest GICv5 support

   - A bunch of debugfs fixes, removing pointless custom iterators
     stored in guest data structures

   - A small set of FPSIMD cleanups

   - Selftest fixes addressing the incorrect alignment of page
     allocation

   - Other assorted low-impact fixes and spelling fixes

  RISC-V:

   - Fixes for issues discoverd by KVM API fuzzing in
     kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
     kvm_riscv_vcpu_aia_imsic_update()

   - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM

   - Transparent huge page support for hypervisor page tables

   - Adjust the number of available guest irq files based on MMIO
     register sizes found in the device tree or the ACPI tables

   - Add RISC-V specific paging modes to KVM selftests

   - Detect paging mode at runtime for selftests

  s390:

   - Performance improvement for vSIE (aka nested virtualization)

   - Completely new memory management. s390 was a special snowflake that
     enlisted help from the architecture's page table management to
     build hypervisor page tables, in particular enabling sharing the
     last level of page tables. This however was a lot of code (~3K
     lines) in order to support KVM, and also blocked several features.
     The biggest advantages is that the page size of userspace is
     completely independent of the page size used by the guest:
     userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
     and in fact transparent hugepages were not possible before. It's
     also now possible to have nested guests and guests with huge pages
     running on the same host

   - Maintainership change for s390 vfio-pci

   - Small quality of life improvement for protected guests

  x86:

   - Add support for giving the guest full ownership of PMU hardware
     (contexted switched around the fastpath run loop) and allowing
     direct access to data MSRs and PMCs (restricted by the vPMU model).

     KVM still intercepts access to control registers, e.g. to enforce
     event filtering and to prevent the guest from profiling sensitive
     host state. This is more accurate, since it has no risk of
     contention and thus dropped events, and also has significantly less
     overhead.

     For more information, see the commit message for merge commit
     bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")

   - Disallow changing the virtual CPU model if L2 is active, for all
     the same reasons KVM disallows change the model after the first
     KVM_RUN

   - Fix a bug where KVM would incorrectly reject host accesses to PV
     MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
     even if those were advertised as supported to userspace,

   - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
     where KVM would attempt to read CR3 configuring an async #PF entry

   - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
     (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
     Only a few exports that are intended for external usage, and those
     are allowed explicitly

   - When checking nested events after a vCPU is unblocked, ignore
     -EBUSY instead of WARNing. Userspace can sometimes put the vCPU
     into what should be an impossible state, and spurious exit to
     userspace on -EBUSY does not really do anything to solve the issue

   - Also throw in the towel and drop the WARN on INIT/SIPI being
     blocked when vCPU is in Wait-For-SIPI, which also resulted in
     playing whack-a-mole with syzkaller stuffing architecturally
     impossible states into KVM

   - Add support for new Intel instructions that don't require anything
     beyond enumerating feature flags to userspace

   - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2

   - Add WARNs to guard against modifying KVM's CPU caps outside of the
     intended setup flow, as nested VMX in particular is sensitive to
     unexpected changes in KVM's golden configuration

   - Add a quirk to allow userspace to opt-in to actually suppress EOI
     broadcasts when the suppression feature is enabled by the guest
     (currently limited to split IRQCHIP, i.e. userspace I/O APIC).
     Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
     option as some userspaces have come to rely on KVM's buggy behavior
     (KVM advertises Supress EOI Broadcast irrespective of whether or
     not userspace I/O APIC supports Directed EOIs)

   - Clean up KVM's handling of marking mapped vCPU pages dirty

   - Drop a pile of *ancient* sanity checks hidden behind in KVM's
     unused ASSERT() macro, most of which could be trivially triggered
     by the guest and/or user, and all of which were useless

   - Fold "struct dest_map" into its sole user, "struct rtc_status", to
     make it more obvious what the weird parameter is used for, and to
     allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n

   - Bury all of ioapic.h, i8254.h and related ioctls (including
     KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y

   - Add a regression test for recent APICv update fixes

   - Handle "hardware APIC ISR", a.k.a. SVI, updates in
     kvm_apic_update_apicv() to consolidate the updates, and to
     co-locate SVI updates with the updates for KVM's own cache of ISR
     information

   - Drop a dead function declaration

   - Minor cleanups

  x86 (Intel):

   - Rework KVM's handling of VMCS updates while L2 is active to
     temporarily switch to vmcs01 instead of deferring the update until
     the next nested VM-Exit.

     The deferred updates approach directly contributed to several bugs,
     was proving to be a maintenance burden due to the difficulty in
     auditing the correctness of deferred updates, and was polluting
     "struct nested_vmx" with a growing pile of booleans

   - Fix an SGX bug where KVM would incorrectly try to handle EPCM page
     faults, and instead always reflect them into the guest. Since KVM
     doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
     interference and can't be resolved by KVM

   - Fix a bug where KVM would register its posted interrupt wakeup
     handler even if loading kvm-intel.ko ultimately failed

   - Disallow access to vmcb12 fields that aren't fully supported,
     mostly to avoid weirdness and complexity for FRED and other
     features, where KVM wants enable VMCS shadowing for fields that
     conditionally exist

   - Print out the "bad" offsets and values if kvm-intel.ko refuses to
     load (or refuses to online a CPU) due to a VMCS config mismatch

  x86 (AMD):

   - Drop a user-triggerable WARN on nested_svm_load_cr3() failure

   - Add support for virtualizing ERAPS. Note, correct virtualization of
     ERAPS relies on an upcoming, publicly announced change in the APM
     to reduce the set of conditions where hardware (i.e. KVM) *must*
     flush the RAP

   - Ignore nSVM intercepts for instructions that are not supported
     according to L1's virtual CPU model

   - Add support for expedited writes to the fast MMIO bus, a la VMX's
     fastpath for EPT Misconfig

   - Don't set GIF when clearing EFER.SVME, as GIF exists independently
     of SVM, and allow userspace to restore nested state with GIF=0

   - Treat exit_code as an unsigned 64-bit value through all of KVM

   - Add support for fetching SNP certificates from userspace

   - Fix a bug where KVM would use vmcb02 instead of vmcb01 when
     emulating VMLOAD or VMSAVE on behalf of L2

   - Misc fixes and cleanups

  x86 selftests:

   - Add a regression test for TPR<=>CR8 synchronization and IRQ masking

   - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
     support, and extend x86's infrastructure to support EPT and NPT
     (for L2 guests)

   - Extend several nested VMX tests to also cover nested SVM

   - Add a selftest for nested VMLOAD/VMSAVE

   - Rework the nested dirty log test, originally added as a regression
     test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
     improve test coverage and to hopefully make the test easier to
     understand and maintain

  guest_memfd:

   - Remove kvm_gmem_populate()'s preparation tracking and half-baked
     hugepage handling. SEV/SNP was the only user of the tracking and it
     can do it via the RMP

   - Retroactively document and enforce (for SNP) that
     KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
     source page to be 4KiB aligned, to avoid non-trivial complexity for
     something that no known VMM seems to be doing and to avoid an API
     special case for in-place conversion, which simply can't support
     unaligned sources

   - When populating guest_memfd memory, GUP the source page in common
     code and pass the refcounted page to the vendor callback, instead
     of letting vendor code do the heavy lifting. Doing so avoids a
     looming deadlock bug with in-place due an AB-BA conflict betwee
     mmap_lock and guest_memfd's filemap invalidate lock

  Generic:

   - Fix a bug where KVM would ignore the vCPU's selected address space
     when creating a vCPU-specific mapping of guest memory. Actually
     this bug could not be hit even on x86, the only architecture with
     multiple address spaces, but it's a bug nevertheless"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
  KVM: s390: Increase permitted SE header size to 1 MiB
  MAINTAINERS: Replace backup for s390 vfio-pci
  KVM: s390: vsie: Fix race in acquire_gmap_shadow()
  KVM: s390: vsie: Fix race in walk_guest_tables()
  KVM: s390: Use guest address to mark guest page dirty
  irqchip/riscv-imsic: Adjust the number of available guest irq files
  RISC-V: KVM: Transparent huge page support
  RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
  RISC-V: KVM: Allow Zalasr extensions for Guest/VM
  KVM: riscv: selftests: Add riscv vm satp modes
  KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
  riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
  RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
  RISC-V: KVM: Remove unnecessary 'ret' assignment
  KVM: s390: Add explicit padding to struct kvm_s390_keyop
  KVM: LoongArch: selftests: Add steal time test case
  LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
  LoongArch: KVM: Add paravirt preempt feature in hypervisor side
  ...
2026-02-13 11:31:15 -08:00
Linus Torvalds
36ae1c45b2 Scheduler changes for v7.0:
Scheduler Kconfig space updates:
 
  - Further consolidate configurable preemption modes: reduce
    the number of architectures that are allowed to offer
    PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number
    of preemption models from four to just two: 'full' and 'lazy'
    on up-to-date architectures (arm64, loongarch, powerpc,
    riscv, s390, x86).
 
    None and voluntary are only available as legacy features
    on platforms that don't implement lazy preemption yet,
    or which don't even support preemption.
 
    The goal is to eventually remove cond_resched() and
    voluntary preemption altogether.
 
    (Peter Zijlstra)
 
 RSEQ based 'scheduler time slice extension' support:
 
 This allows a thread to request a time slice extension when it
 enters a critical section to avoid contention on a resource when
 the thread is scheduled out inside of the critical section.
 
  - Add fields and constants for time slice extension
  - Provide static branch for time slice extensions
  - Add statistics for time slice extensions
  - Add prctl() to enable time slice extensions
  - Implement sys_rseq_slice_yield()
  - Implement syscall entry work for time slice extensions
  - Implement time slice extension enforcement timer
  - Reset slice extension when scheduled
  - Implement rseq_grant_slice_extension()
  - entry: Hook up rseq time slice extension
  - selftests: Implement time slice extension test
 
    (Thomas Gleixner)
 
  - Allow registering RSEQ with slice extension
  - Move slice_ext_nsec to debugfs
  - Lower default slice extension
  - selftests/rseq: Add rseq slice histogram script
 
    (Peter Zijlstra)
 
 Scheduler performance/scalability improvements:
 
  - Update rq->avg_idle when a task is moved to an idle CPU,
    which improves the scalability of various workloads.
    (Shubhang Kaushik)
 
  - Reorder fields in 'struct rq' for better caching
    (Blake Jones)
 
  - Fair scheduler SMP NOHZ balancing code speedups:
 
    - Move checking for nohz cpus after time check
    - Change likelyhood of nohz.nr_cpus
    - Remove nohz.nr_cpus and use weight of cpumask instead
 
      (Shrikanth Hegde)
 
  - Avoid false sharing for sched_clock_irqtime (Wangyang Guo)
 
  - Drop useless cpumask_empty() in find_energy_efficient_cpu()
  - Simplify task_numa_find_cpu()
  - Use cpumask_weight_and() in sched_balance_find_dst_group()
 
    (Yury Norov)
 
 DL scheduler updates:
 
  - Add a deadline server for sched_ext tasks (by Andrea Righi and
    Joel Fernandes, with fixes by Peter Zijlstra)
 
 RT scheduler updates:
 
  - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang)
 
 Entry code updates and performance improvements, which is part of the
 scheduler tree in this cycle due to interdependencies with the RSEQ
 based time slice extension work:
 
   - Remove unused syscall argument from syscall_trace_enter()
   - Rework syscall_exit_to_user_mode_work() for architecture reuse
   - Add arch_ptrace_report_syscall_entry/exit()
   - Inline syscall_exit_work() and syscall_trace_enter()
 
     (Jinjie Ruan)
 
 Scheduler core updates:
 
  - Rework sched_class::wakeup_preempt() and rq_modified_*()
  - Avoid rq->lock bouncing in sched_balance_newidle()
  - Rename rcu_dereference_check_sched_domain() =>
           rcu_dereference_sched_domain()
  - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper
 
    (Peter Zijlstra)
 
 Fair scheduler updates/refactoring:
 
  - Fold the sched_avg update
  - Change rcu_dereference_check_sched_domain() to rcu-sched
  - Switch to rcu_dereference_all()
  - Remove superfluous rcu_read_lock()
  - Limit hrtick work
 
    (Peter Zijlstra)
 
  - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks
  - Clean up comments in 'struct cfs_rq'
  - Separate se->vlag from se->vprot
  - Rename cfs_rq::avg_load to cfs_rq::sum_weight
  - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions
  - Introduce and use the vruntime_cmp() and vruntime_op() wrappers
    for wrapped-signed aritmetics
  - Sort out 'blocked_load*' namespace noise
 
    (Ingo Molnar)
 
 Scheduler debugging code updates:
 
  - Export hidden tracepoints to modules (Gabriele Monaco)
 
  - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()
    (Fushuai Wang)
 
  - Add assertions to QUEUE_CLASS (Peter Zijlstra)
 
  - hrtimer: Fix tracing oddity (Thomas Gleixner)
 
 Misc fixes and cleanups:
 
  - Re-evaluate scheduling when migrating queued tasks out of
    throttled cgroups (Zicheng Qu)
 
  - Remove task_struct->faults_disabled_mapping (Christoph Hellwig)
 
  - Fix math notation errors in avg_vruntime comment (Zhan Xusheng)
 
  - sched/cpufreq: Use %pe format for PTR_ERR() printing (zenghongling)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJj+IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1grtQ//WyXYGVE/WicdqslfaCY2Mr0uJnL0tLSM
 CJp+0LROdkmy+ChJmftO8RgjCUSsjhC4/xcBhUQXApf/ffQi3b2jH6nkTp/Z64Ms
 p2IXLkBiZjwdcO6fGbB0JE2G1J4hGRC5BlqfgkZzWidMf3kIbmrHg99mVWGzODLY
 N/cPW4d0WGf9TScl1FgEiOqgF3czMLlqvTDJqaFMpsTzSUcRBnrG4xushb4W/bBx
 573eqxgZJ6urNSGu8niY9PAl9F7gskXW3YxI3k8SH7VmJKSevWlwI9vMEhcRDzud
 E0XxD7J8iPOKtr7ypXm7anMBv4jWVUdAnPbYi4TDsyDDU/HguqMqT1McTGn8wQ+F
 jmdhmMC9/TEIzq93SNLbCYieibqDsmJoNVFFi0FWfPLMtYbcZd5a884SIz532vx4
 DegdlDXdazUwhxzDiQR3sq1CsHXpxNS2YdrpadAtF/r2gU86DQjsEew8yBvXi7bb
 Wrkzpax70sU1AFI23wJQkEb/OnnXyehAHAhhQN6GVvuiGr9P7C02WLEGLlmSmJrx
 zl2F750P76yhTfGcvTfJ/5LTfSB+yRozGvcdXnIkyzWotY6a2D1MKNusAfVax+IR
 kyfAWqVdxBhlKnqYbu92lTogvnPh3Lymd6G4TZZRkSH2jixyGd2oS7nZaDBAeBEM
 NHQtr9R+KyU=
 =Xj2f
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Scheduler Kconfig space updates:

   - Further consolidate configurable preemption modes (Peter Zijlstra)

     Reduce the number of architectures that are allowed to offer
     PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of
     preemption models from four to just two: 'full' and 'lazy' on
     up-to-date architectures (arm64, loongarch, powerpc, riscv, s390,
     x86).

     None and voluntary are only available as legacy features on
     platforms that don't implement lazy preemption yet, or which don't
     even support preemption.

     The goal is to eventually remove cond_resched() and voluntary
     preemption altogether.

  RSEQ based 'scheduler time slice extension' support (Thomas Gleixner
  and Peter Zijlstra):

  This allows a thread to request a time slice extension when it enters
  a critical section to avoid contention on a resource when the thread
  is scheduled out inside of the critical section.

   - Add fields and constants for time slice extension
   - Provide static branch for time slice extensions
   - Add statistics for time slice extensions
   - Add prctl() to enable time slice extensions
   - Implement sys_rseq_slice_yield()
   - Implement syscall entry work for time slice extensions
   - Implement time slice extension enforcement timer
   - Reset slice extension when scheduled
   - Implement rseq_grant_slice_extension()
   - entry: Hook up rseq time slice extension
   - selftests: Implement time slice extension test
   - Allow registering RSEQ with slice extension
   - Move slice_ext_nsec to debugfs
   - Lower default slice extension
   - selftests/rseq: Add rseq slice histogram script

  Scheduler performance/scalability improvements:

   - Update rq->avg_idle when a task is moved to an idle CPU, which
     improves the scalability of various workloads (Shubhang Kaushik)

   - Reorder fields in 'struct rq' for better caching (Blake Jones)

   - Fair scheduler SMP NOHZ balancing code speedups (Shrikanth Hegde):
      - Move checking for nohz cpus after time check
      - Change likelyhood of nohz.nr_cpus
      - Remove nohz.nr_cpus and use weight of cpumask instead

   - Avoid false sharing for sched_clock_irqtime (Wangyang Guo)

   - Cleanups (Yury Norov):
      - Drop useless cpumask_empty() in find_energy_efficient_cpu()
      - Simplify task_numa_find_cpu()
      - Use cpumask_weight_and() in sched_balance_find_dst_group()

  DL scheduler updates:

   - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel
     Fernandes, with fixes by Peter Zijlstra)

  RT scheduler updates:

   - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang)

  Entry code updates and performance improvements (Jinjie Ruan)

  This is part of the scheduler tree in this cycle due to inter-
  dependencies with the RSEQ based time slice extension work:

    - Remove unused syscall argument from syscall_trace_enter()
    - Rework syscall_exit_to_user_mode_work() for architecture reuse
    - Add arch_ptrace_report_syscall_entry/exit()
    - Inline syscall_exit_work() and syscall_trace_enter()

  Scheduler core updates (Peter Zijlstra):

   - Rework sched_class::wakeup_preempt() and rq_modified_*()
   - Avoid rq->lock bouncing in sched_balance_newidle()
   - Rename rcu_dereference_check_sched_domain() =>
            rcu_dereference_sched_domain()
   - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper

  Fair scheduler updates/refactoring (Peter Zijlstra and Ingo Molnar):

   - Fold the sched_avg update
   - Change rcu_dereference_check_sched_domain() to rcu-sched
   - Switch to rcu_dereference_all()
   - Remove superfluous rcu_read_lock()
   - Limit hrtick work
   - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks
   - Clean up comments in 'struct cfs_rq'
   - Separate se->vlag from se->vprot
   - Rename cfs_rq::avg_load to cfs_rq::sum_weight
   - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions
   - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for
     wrapped-signed aritmetics
   - Sort out 'blocked_load*' namespace noise

  Scheduler debugging code updates:

   - Export hidden tracepoints to modules (Gabriele Monaco)

   - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()
     (Fushuai Wang)

   - Add assertions to QUEUE_CLASS (Peter Zijlstra)

   - hrtimer: Fix tracing oddity (Thomas Gleixner)

  Misc fixes and cleanups:

   - Re-evaluate scheduling when migrating queued tasks out of throttled
     cgroups (Zicheng Qu)

   - Remove task_struct->faults_disabled_mapping (Christoph Hellwig)

   - Fix math notation errors in avg_vruntime comment (Zhan Xusheng)

   - sched/cpufreq: Use %pe format for PTR_ERR() printing
     (zenghongling)"

* tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
  sched/cpufreq: Use %pe format for PTR_ERR() printing
  sched/rt: Skip currently executing CPU in rto_next_cpu()
  sched/clock: Avoid false sharing for sched_clock_irqtime
  selftests/sched_ext: Add test for DL server total_bw consistency
  selftests/sched_ext: Add test for sched_ext dl_server
  sched/debug: Fix dl_server (re)start conditions
  sched/debug: Add support to change sched_ext server params
  sched_ext: Add a DL server for sched_ext tasks
  sched/debug: Stop and start server based on if it was active
  sched/debug: Fix updating of ppos on server write ops
  sched/deadline: Clear the defer params
  entry: Inline syscall_exit_work() and syscall_trace_enter()
  entry: Add arch_ptrace_report_syscall_entry/exit()
  entry: Rework syscall_exit_to_user_mode_work() for architecture reuse
  entry: Remove unused syscall argument from syscall_trace_enter()
  sched: remove task_struct->faults_disabled_mapping
  sched: Update rq->avg_idle when a task is moved to an idle CPU
  selftests/rseq: Add rseq slice histogram script
  hrtimer: Fix trace oddity
  ...
2026-02-10 12:50:10 -08:00
Marc Zyngier
1df3f01ebf Merge branch kvm-arm64/resx into kvmarm-master/next
* kvm-arm64/resx:
  : .
  : Add infrastructure to deal with the full gamut of RESx bits
  : for NV. As a result, it is now possible to have the expected
  : semantics for some bits such as SCTLR_EL2.SPAN.
  : .
  KVM: arm64: Add debugfs file dumping computed RESx values
  KVM: arm64: Add sanitisation to SCTLR_EL2
  KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
  KVM: arm64: Remove all traces of FEAT_TME
  KVM: arm64: Simplify handling of full register invalid constraint
  KVM: arm64: Get rid of FIXED_VALUE altogether
  KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
  KVM: arm64: Move RESx into individual register descriptors
  KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
  KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
  KVM: arm64: Simplify FIXED_VALUE handling
  KVM: arm64: Convert HCR_EL2.RW to AS_RES1
  KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
  KVM: arm64: Allow RES1 bits to be inferred from configuration
  KVM: arm64: Inherit RESx bits from FGT register descriptors
  KVM: arm64: Extend unified RESx handling to runtime sanitisation
  KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
  KVM: arm64: Introduce standalone FGU computing primitive
  KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
  arm64: Convert SCTLR_EL2 to sysreg infrastructure

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:48 +00:00
Marc Zyngier
47e89febd3 Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/next
* kvm-arm64/gicv5-prologue:
  : .
  : Prologue to GICv5 support, courtesy of Sascha Bischoff.
  :
  : This is preliminary work that sets the scene for the full-blow
  : support.
  : .
  irqchip/gic-v5: Check if impl is virt capable
  KVM: arm64: gic: Set vgic_model before initing private IRQs
  arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
  KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:17:30 +00:00
Marc Zyngier
fb40cb15e8 KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
MIOCNCE had the potential to eat your data, and also was never
implemented by anyone. It's been retrospectively removed from
the architecture, and we're happy to follow that lead.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-19-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
d65bf6e317 KVM: arm64: Remove all traces of FEAT_TME
FEAT_TME has been dropped from the architecture. Retrospectively.
I'm sure someone is crying somewhere, but most of us won't.

Clean-up time.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 09:02:13 +00:00
Marc Zyngier
0a35bd285f arm64: Convert SCTLR_EL2 to sysreg infrastructure
Convert SCTLR_EL2 to the sysreg infrastructure, as per the 2025-12_rel
revision of the Registers.json file.

Note that we slightly deviate from the above, as we stick to the ARM
ARM M.a definition of SCTLR_EL2[9], which is RES0, in order to avoid
dragging the POE2 definitions...

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 08:59:28 +00:00
Sascha Bischoff
b583177aaf arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
The GICv5 architecture is dropping the ICC_HAPR_EL1 and ICV_HAPR_EL1
system registers. These registers were never added to the sysregs, but
the traps for them were.

Drop the trap bit from the ICH_HFGRTR_EL2 and make it Res1 as per the
upcoming GICv5 spec change. Additionally, update the EL2 setup code to
not attempt to set that bit.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260128175919.3828384-4-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-30 11:10:46 +00:00
Will Deacon
2f8aed5e97 Merge branch 'for-next/errata' into for-next/core
* for-next/errata:
  arm64: errata: Workaround for SI L1 downstream coherency issue
2026-01-29 12:05:33 +00:00
Lucas Wei
3fed7e0059 arm64: errata: Workaround for SI L1 downstream coherency issue
When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.

This erratum affects any system topology in which the following
conditions apply:
 - The Point of Serialization (PoS) is located downstream of the
   interconnect.
 - A downstream observer accesses memory directly, bypassing the
   interconnect.

Conditions:
This erratum occurs only when all of the following conditions are met:
 1. Software executes a data cache maintenance operation, specifically,
    a clean or clean&invalidate by virtual address (DC CVAC or DC
    CIVAC), that hits on unique dirty data in the CPU or DSU cache.
    This results in a combined CopyBack and CMO being issued to the
    interconnect.
 2. The interconnect splits the combined transaction into separate Write
    and CMO transactions and returns an early completion response to the
    CPU or DSU before the write has completed at the downstream memory
    or PoS.
 3. A downstream observer accesses the affected memory address after the
    early completion response is issued but before the actual memory
    write has completed. This allows the observer to read stale data
    that has not yet been updated at the PoS or downstream memory.

The implementation of workaround put a second loop of CMOs at the same
virtual address whose operation meet erratum conditions to wait until
cache data be cleaned to PoC. This way of implementation mitigates
performance penalty compared to purely duplicate original CMO.

Signed-off-by: Lucas Wei <lucaswei@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23 13:30:38 +00:00
Marc Zyngier
cb6cd8a86d Merge branch kvm-arm64/feat_idst into kvmarm-master/next
* kvm-arm64/feat_idst:
  : .
  : Add support for FEAT_IDST, allowing ID registers that are not implemented
  : to be reported as a normal trap rather than as an UNDEF exception.
  : .
  KVM: arm64: selftests: Add a test for FEAT_IDST
  KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
  KVM: arm64: pkvm: Add a generic synchronous exception injection primitive
  KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
  KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way
  KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
  KVM: arm64: Add a generic synchronous exception injection primitive
  KVM: arm64: Add trap routing for GMID_EL1
  arm64: Repaint ID_AA64MMFR2_EL1.IDS description

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 10:04:35 +00:00
Marc Zyngier
1c07f35ff7 Merge branch kvm-arm64/vtcr into kvmarm-master/next
* kvm-arm64/vtcr:
  : .
  : VTCR_EL2 conversion to the configuration-driven RESx framework,
  : fixing a couple of UXN/PXN/XN bugs in the process.
  : .
  KVM: arm64: nv: Return correct RES0 bits for FGT registers
  KVM: arm64: Always populate FGT masks at boot time
  KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings
  KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation
  KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co
  arm64: Convert VTCR_EL2 to sysreg infratructure
  arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum
  KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
  KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit
  KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF
  KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()
  KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()
  KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp()
  KVM: arm64: Inject UNDEF for a register trap without accessor
  KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load
  KVM: arm64: Fix EL2 S1 XN handling for hVHE setups
  KVM: arm64: gic: Check for vGICv3 when clearing TWI

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 10:03:03 +00:00
Yicong Yang
58ce78667a arm64: Add support for FEAT_{LS64, LS64_V}
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
These features are identified by ID_AA64ISAR1_EL1.LS64 and the
use of such instructions in userspace (EL0) can be trapped.

As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell
apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later
patch will be exported to userspace, FEAT_LS64_V will be enabled only
in kernel.

In order to support the use of corresponding instructions in userspace:
- Make ID_AA64ISAR1_EL1.LS64 visbile to userspace
- Add identifying and enabling in the cpufeature list
- Expose these support of these features to userspace through HWCAP3
  and cpuinfo

ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for
special memory (device memory) so requires support by the CPU, system
and target memory location (device that support these instructions).
The HWCAP3_LS64, implies the support of CPU and system (since no
identification method from system, so SoC vendors should advertise
support in the CPU if system also support them).

Otherwise for ld64b/st64b the atomicity may not be guaranteed or a
DABT will be generated, so users (probably userspace driver developer)
should make sure the target memory (device) also have the support.
For st64bv 0xffffffffffffffff will be returned as status result for
unsupported memory so user should check it.

Document the restrictions along with HWCAP3_LS64.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 13:24:49 +00:00
Thomas Gleixner
99d2592023 rseq: Implement sys_rseq_slice_yield()
Provide a new syscall which has the only purpose to yield the CPU after the
kernel granted a time slice extension.

sched_yield() is not suitable for that because it unconditionally
schedules, but the end of the time slice extension is not required to
schedule when the task was already preempted. This also allows to have a
strict check for termination to catch user space invoking random syscalls
including sched_yield() from a time slice extension region.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20251215155708.929634896@linutronix.de
2026-01-22 11:11:17 +01:00
Marc Zyngier
4a7fe842b8 arm64: Repaint ID_AA64MMFR2_EL1.IDS description
ID_AA64MMFR2_EL1.IDS, as described in the sysreg file, is pretty horrible
as it diesctly give the ESR value. Repaint it using the usual NI/IMP
identifiers to describe the absence/presence of FEAT_IDST.

Also add the new EL3 routing feature, even if we really don't care about it.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://patch.msgid.link/20260108173233.2911955-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 11:58:56 +00:00
Marc Zyngier
a035001dea arm64: Convert VTCR_EL2 to sysreg infratructure
Our definition of VTCR_EL2 is both partial (tons of fields are
missing) and totally inconsistent (some constants are shifted,
some are not). They are also expressed in terms of TCR, which is
rather inconvenient.

Replace the ad-hoc definitions with the the generated version.
This results in a bunch of additional changes to make the code
with the unshifted nature of generated enumerations.

The register data was extracted from the BSD licenced AARCHMRS
(AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0).

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 10:58:20 +00:00
Marc Zyngier
f1640174c8 arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum
ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 are currently represented as unordered
enumerations. However, the architecture treats them as Unsigned,
as hinted to by the MRS data:

(FEAT_S2TGran4K <=> (((UInt(ID_AA64MMFR0_EL1.TGran4_2) == 0) &&
		       FEAT_TGran4K) ||
		     (UInt(ID_AA64MMFR0_EL1.TGran4_2) >= 2))))

and similar descriptions exist for 16 and 64k.

This is also confirmed by D24.1.3.3 ("Alternative ID scheme used for
ID_AA64MMFR0_EL1 stage 2 granule sizes") in the L.b revision of
the ARM ARM.

Turn these fields into UnsignedEnum so that we can use the above
description more or less literally.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 10:58:20 +00:00
Linus Torvalds
51d90a15fe ARM:
- Support for userspace handling of synchronous external aborts (SEAs),
   allowing the VMM to potentially handle the abort in a non-fatal
   manner.
 
 - Large rework of the VGIC's list register handling with the goal of
   supporting more active/pending IRQs than available list registers in
   hardware. In addition, the VGIC now supports EOImode==1 style
   deactivations for IRQs which may occur on a separate vCPU than the
   one that acked the IRQ.
 
 - Support for FEAT_XNX (user / privileged execute permissions) and
   FEAT_HAF (hardware update to the Access Flag) in the software page
   table walkers and shadow MMU.
 
 - Allow page table destruction to reschedule, fixing long need_resched
   latencies observed when destroying a large VM.
 
 - Minor fixes to KVM and selftests
 
 Loongarch:
 
 - Get VM PMU capability from HW GCFG register.
 
 - Add AVEC basic support.
 
 - Use 64-bit register definition for EIOINTC.
 
 - Add KVM timer test cases for tools/selftests.
 
 RISC/V:
 
 - SBI message passing (MPXY) support for KVM guest
 
 - Give a new, more specific error subcode for the case when in-kernel
   AIA virtualization fails to allocate IMSIC VS-file
 
 - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
   in small chunks
 
 - Fix guest page fault within HLV* instructions
 
 - Flush VS-stage TLB after VCPU migration for Andes cores
 
 s390:
 
 - Always allocate ESCA (Extended System Control Area), instead of
   starting with the basic SCA and converting to ESCA with the
   addition of the 65th vCPU.  The price is increased number of
   exits (and worse performance) on z10 and earlier processor;
   ESCA was introduced by z114/z196 in 2010.
 
 - VIRT_XFER_TO_GUEST_WORK support
 
 - Operation exception forwarding support
 
 - Cleanups
 
 x86:
 
 - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE
   caching is disabled, as there can't be any relevant SPTEs to zap.
 
 - Relocate a misplaced export.
 
 - Fix an async #PF bug where KVM would clear the completion queue when the
   guest transitioned in and out of paging mode, e.g. when handling an SMI and
   then returning to paged mode via RSM.
 
 - Leave KVM's user-return notifier registered even when disabling
   virtualization, as long as kvm.ko is loaded.  On reboot/shutdown, keeping
   the notifier registered is ok; the kernel does not use the MSRs and the
   callback will run cleanly and restore host MSRs if the CPU manages to
   return to userspace before the system goes down.
 
 - Use the checked version of {get,put}_user().
 
 - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
   timers can result in a hard lockup in the host.
 
 - Revert the periodic kvmclock sync logic now that KVM doesn't use a
   clocksource that's subject to NTP corrections.
 
 - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
   behind CONFIG_CPU_MITIGATIONS.
 
 - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path;
   the only reason they were handled in the fast path was to paper of a bug
   in the core #MC code, and that has long since been fixed.
 
 - Add emulator support for AVX MOV instructions, to play nice with emulated
   devices whose guest drivers like to access PCI BARs with large multi-byte
   instructions.
 
 x86 (AMD):
 
 - Fix a few missing "VMCB dirty" bugs.
 
 - Fix the worst of KVM's lack of EFER.LMSLE emulation.
 
 - Add AVIC support for addressing 4k vCPUs in x2AVIC mode.
 
 - Fix incorrect handling of selective CR0 writes when checking intercepts
   during emulation of L2 instructions.
 
 - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on
   VMRUN and #VMEXIT.
 
 - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft
   interrupt if the guest patched the underlying code after the VM-Exit, e.g.
   when Linux patches code with a temporary INT3.
 
 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to
   userspace, and extend KVM "support" to all policy bits that don't require
   any actual support from KVM.
 
 x86 (Intel):
 
 - Use the root role from kvm_mmu_page to construct EPTPs instead of the
   current vCPU state, partly as worthwhile cleanup, but mostly to pave the
   way for tracking per-root TLB flushes, and elide EPT flushes on pCPU
   migration if the root is clean from a previous flush.
 
 - Add a few missing nested consistency checks.
 
 - Rip out support for doing "early" consistency checks via hardware as the
   functionality hasn't been used in years and is no longer useful in general;
   replace it with an off-by-default module param to WARN if hardware fails
   a check that KVM does not perform.
 
 - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
   on VM-Enter.
 
 - Misc cleanups.
 
 - Overhaul the TDX code to address systemic races where KVM (acting on behalf
   of userspace) could inadvertantly trigger lock contention in the TDX-Module;
   KVM was either working around these in weird, ugly ways, or was simply
   oblivious to them (though even Yan's devilish selftests could only break
   individual VMs, not the host kernel)
 
 - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU,
   if creating said vCPU failed partway through.
 
 - Fix a few sparse warnings (bad annotation, 0 != NULL).
 
 - Use struct_size() to simplify copying TDX capabilities to userspace.
 
 - Fix a bug where TDX would effectively corrupt user-return MSR values if the
   TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.
 
 Selftests:
 
 - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
 
 - Forcefully override ARCH from x86_64 to x86 to play nice with specifying
   ARCH=x86_64 on the command line.
 
 - Extend a bunch of nested VMX to validate nested SVM as well.
 
 - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
   verify KVM can save/restore nested VMX state when L1 is using 5-level
   paging, but L2 is not.
 
 - Clean up the guest paging code in anticipation of sharing the core logic for
   nested EPT and nested NPT.
 
 guest_memfd:
 
 - Add NUMA mempolicy support for guest_memfd, and clean up a variety of
   rough edges in guest_memfd along the way.
 
 - Define a CLASS to automatically handle get+put when grabbing a guest_memfd
   from a memslot to make it harder to leak references.
 
 - Enhance KVM selftests to make it easer to develop and debug selftests like
   those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
   often result in hard-to-debug SIGBUS errors.
 
 - Misc cleanups.
 
 Generic:
 
 - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for
   irqfd cleanup.
 
 - Fix a goof in the dirty ring documentation.
 
 - Fix choice of target for directed yield across different calls to
   kvm_vcpu_on_spin(); the function was always starting from the first
   vCPU instead of continuing the round-robin search.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmkvMa8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMlFwf+Ow7zOYUuELSQ+Jn+hOYXiCNrdBDx
 ZamvMU8kLPr7XX0Zog6HgcMm//qyA6k5nSfqCjfsQZrIhRA/gWJ61jz1OX/Jxq18
 pJ9Vz6epnEPYiOtBwz+v8OS8MqDqVNzj2i6W1/cLPQE50c1Hhw64HWS5CSxDQiHW
 A7PVfl5YU12lW1vG3uE0sNESDt4Eh/spNM17iddXdF4ZUOGublserjDGjbc17E7H
 8BX3DkC2plqkJKwtjg0ae62hREkITZZc7RqsnftUkEhn0N0H9+rb6NKUyzIVh9NZ
 bCtCjtrKN9zfZ0Mujnms3ugBOVqNIputu/DtPnnFKXtXWSrHrgGSNv5ewA==
 =PEcw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Support for userspace handling of synchronous external aborts
     (SEAs), allowing the VMM to potentially handle the abort in a
     non-fatal manner

   - Large rework of the VGIC's list register handling with the goal of
     supporting more active/pending IRQs than available list registers
     in hardware. In addition, the VGIC now supports EOImode==1 style
     deactivations for IRQs which may occur on a separate vCPU than the
     one that acked the IRQ

   - Support for FEAT_XNX (user / privileged execute permissions) and
     FEAT_HAF (hardware update to the Access Flag) in the software page
     table walkers and shadow MMU

   - Allow page table destruction to reschedule, fixing long
     need_resched latencies observed when destroying a large VM

   - Minor fixes to KVM and selftests

  Loongarch:

   - Get VM PMU capability from HW GCFG register

   - Add AVEC basic support

   - Use 64-bit register definition for EIOINTC

   - Add KVM timer test cases for tools/selftests

  RISC/V:

   - SBI message passing (MPXY) support for KVM guest

   - Give a new, more specific error subcode for the case when in-kernel
     AIA virtualization fails to allocate IMSIC VS-file

   - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
     in small chunks

   - Fix guest page fault within HLV* instructions

   - Flush VS-stage TLB after VCPU migration for Andes cores

  s390:

   - Always allocate ESCA (Extended System Control Area), instead of
     starting with the basic SCA and converting to ESCA with the
     addition of the 65th vCPU. The price is increased number of exits
     (and worse performance) on z10 and earlier processor; ESCA was
     introduced by z114/z196 in 2010

   - VIRT_XFER_TO_GUEST_WORK support

   - Operation exception forwarding support

   - Cleanups

  x86:

   - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
     SPTE caching is disabled, as there can't be any relevant SPTEs to
     zap

   - Relocate a misplaced export

   - Fix an async #PF bug where KVM would clear the completion queue
     when the guest transitioned in and out of paging mode, e.g. when
     handling an SMI and then returning to paged mode via RSM

   - Leave KVM's user-return notifier registered even when disabling
     virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
     keeping the notifier registered is ok; the kernel does not use the
     MSRs and the callback will run cleanly and restore host MSRs if the
     CPU manages to return to userspace before the system goes down

   - Use the checked version of {get,put}_user()

   - Fix a long-lurking bug where KVM's lack of catch-up logic for
     periodic APIC timers can result in a hard lockup in the host

   - Revert the periodic kvmclock sync logic now that KVM doesn't use a
     clocksource that's subject to NTP corrections

   - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
     latter behind CONFIG_CPU_MITIGATIONS

   - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
     path; the only reason they were handled in the fast path was to
     paper of a bug in the core #MC code, and that has long since been
     fixed

   - Add emulator support for AVX MOV instructions, to play nice with
     emulated devices whose guest drivers like to access PCI BARs with
     large multi-byte instructions

  x86 (AMD):

   - Fix a few missing "VMCB dirty" bugs

   - Fix the worst of KVM's lack of EFER.LMSLE emulation

   - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

   - Fix incorrect handling of selective CR0 writes when checking
     intercepts during emulation of L2 instructions

   - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
     on VMRUN and #VMEXIT

   - Fix a bug where KVM corrupt the guest code stream when re-injecting
     a soft interrupt if the guest patched the underlying code after the
     VM-Exit, e.g. when Linux patches code with a temporary INT3

   - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
     to userspace, and extend KVM "support" to all policy bits that
     don't require any actual support from KVM

  x86 (Intel):

   - Use the root role from kvm_mmu_page to construct EPTPs instead of
     the current vCPU state, partly as worthwhile cleanup, but mostly to
     pave the way for tracking per-root TLB flushes, and elide EPT
     flushes on pCPU migration if the root is clean from a previous
     flush

   - Add a few missing nested consistency checks

   - Rip out support for doing "early" consistency checks via hardware
     as the functionality hasn't been used in years and is no longer
     useful in general; replace it with an off-by-default module param
     to WARN if hardware fails a check that KVM does not perform

   - Fix a currently-benign bug where KVM would drop the guest's
     SPEC_CTRL[63:32] on VM-Enter

   - Misc cleanups

   - Overhaul the TDX code to address systemic races where KVM (acting
     on behalf of userspace) could inadvertantly trigger lock contention
     in the TDX-Module; KVM was either working around these in weird,
     ugly ways, or was simply oblivious to them (though even Yan's
     devilish selftests could only break individual VMs, not the host
     kernel)

   - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
     TDX vCPU, if creating said vCPU failed partway through

   - Fix a few sparse warnings (bad annotation, 0 != NULL)

   - Use struct_size() to simplify copying TDX capabilities to userspace

   - Fix a bug where TDX would effectively corrupt user-return MSR
     values if the TDX Module rejects VP.ENTER and thus doesn't clobber
     host MSRs as expected

  Selftests:

   - Fix a math goof in mmu_stress_test when running on a single-CPU
     system/VM

   - Forcefully override ARCH from x86_64 to x86 to play nice with
     specifying ARCH=x86_64 on the command line

   - Extend a bunch of nested VMX to validate nested SVM as well

   - Add support for LA57 in the core VM_MODE_xxx macro, and add a test
     to verify KVM can save/restore nested VMX state when L1 is using
     5-level paging, but L2 is not

   - Clean up the guest paging code in anticipation of sharing the core
     logic for nested EPT and nested NPT

  guest_memfd:

   - Add NUMA mempolicy support for guest_memfd, and clean up a variety
     of rough edges in guest_memfd along the way

   - Define a CLASS to automatically handle get+put when grabbing a
     guest_memfd from a memslot to make it harder to leak references

   - Enhance KVM selftests to make it easer to develop and debug
     selftests like those added for guest_memfd NUMA support, e.g. where
     test and/or KVM bugs often result in hard-to-debug SIGBUS errors

   - Misc cleanups

  Generic:

   - Use the recently-added WQ_PERCPU when creating the per-CPU
     workqueue for irqfd cleanup

   - Fix a goof in the dirty ring documentation

   - Fix choice of target for directed yield across different calls to
     kvm_vcpu_on_spin(); the function was always starting from the first
     vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...
2025-12-05 17:01:20 -08:00
Linus Torvalds
44fc84337b arm64 updates for 6.19:
Core features:
 
  - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
    driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
 
 Perf and PMU:
 
  - Avoid cycle counter on multi-threaded CPUs
 
  - Extend CSPMU device probing and add additional filtering support for
    NVIDIA implementations
 
  - Add support for the PMUs on the NoC S3 interconnect
 
  - Add additional compatible strings for new Cortex and C1 CPUs
 
  - Add support for data source filtering to the SPE driver
 
  - Add support for i.MX8QM and "DB" PMU in the imx PMU driver
 
 Memory managemennt:
 
  - Avoid broadcast TLBI if page reused in write fault
 
  - Elide TLB invalidation if the old PTE was not valid
 
  - Drop redundant cpu_set_*_tcr_t0sz() macros
 
  - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
 
  - Propagate return value from __change_memory_common()
 
 ACPI and EFI:
 
  - Call EFI runtime services without disabling preemption
 
  - Remove unused ACPI function
 
 Miscellaneous:
 
  - ptrace support to disable streaming on SME-only systems
 
  - Improve sysreg generation to include a 'Prefix' descriptor
 
  - Replace __ASSEMBLY__ with __ASSEMBLER__
 
  - Align register dumps in the kselftest zt-test
 
  - Remove some no longer used macros/functions
 
  - Various spelling corrections
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmkvMjkACgkQa9axLQDI
 XvGaGg//dtT/ZAqrWa6Yniv1LOlh837C07YdxAYTTuJ+I87DnrxIqjwbW+ye+bF+
 61RTkioeCUm3PH+ncO9gPVNi4ASZ1db3/Rc8Fb6rr1TYOI1sMIeBsbbVdRJgsbX6
 zu9197jOBHscTAeDceB6jZBDyW8iSLINPZ7LN6lGxXsZM/Vn5zfE0heKEEio6Fsx
 +AzO2vos0XcwBR9vFGXtiCDx57T+/cXUtrWfA0Cjz4nvHSgD8+ghS+Jwv+kHMt1L
 zrarqbeQfj+Iixm9PVHiazv+8THo9QdNl1yGLxDmJ4LEVPewjW5jBs8+5e8e3/Gj
 p5JEvmSyWvKTTbFoM5vhxC72A7yuT1QwAk2iCyFIxMbQ25PndHboKVp/569DzOkT
 +6CjI88sVSP6D7bVlN6pFlzc/Fa07YagnDMnMCSfk4LBjUfE3jYb+usaFydyv/rl
 jwZbJrnSF/H+uQlyoJFgOEXSoQdDsll3dv6yEsUCwbd8RqXbAe3svbguOUHSdvIj
 sCViezGZQ7Rkn6D21AfF9j6e7ceaSDaf5DWMxPI3dAxFKG8TJbCBsToR59NnoSj+
 bNEozbZ1mCxmwH8i43wZ6P0RkClvJnoXcvRA+TJj02fSZACO39d3XDNswfXWL41r
 KiWGUJZyn2lPKtiAWVX6pSBtDJ+5rFhuoFgADLX6trkxDe9/EMQ=
 =4Sb6
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "These are the arm64 updates for 6.19.

  The biggest part is the Arm MPAM driver under drivers/resctrl/.
  There's a patch touching mm/ to handle spurious faults for huge pmd
  (similar to the pte version). The corresponding arm64 part allows us
  to avoid the TLB maintenance if a (huge) page is reused after a write
  fault. There's EFI refactoring to allow runtime services with
  preemption enabled and the rest is the usual perf/PMU updates and
  several cleanups/typos.

  Summary:

  Core features:

   - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
     driver under drivers/resctrl/ which makes use of the fs/rectrl/ API

  Perf and PMU:

   - Avoid cycle counter on multi-threaded CPUs

   - Extend CSPMU device probing and add additional filtering support
     for NVIDIA implementations

   - Add support for the PMUs on the NoC S3 interconnect

   - Add additional compatible strings for new Cortex and C1 CPUs

   - Add support for data source filtering to the SPE driver

   - Add support for i.MX8QM and "DB" PMU in the imx PMU driver

  Memory managemennt:

   - Avoid broadcast TLBI if page reused in write fault

   - Elide TLB invalidation if the old PTE was not valid

   - Drop redundant cpu_set_*_tcr_t0sz() macros

   - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()

   - Propagate return value from __change_memory_common()

  ACPI and EFI:

   - Call EFI runtime services without disabling preemption

   - Remove unused ACPI function

  Miscellaneous:

   - ptrace support to disable streaming on SME-only systems

   - Improve sysreg generation to include a 'Prefix' descriptor

   - Replace __ASSEMBLY__ with __ASSEMBLER__

   - Align register dumps in the kselftest zt-test

   - Remove some no longer used macros/functions

   - Various spelling corrections"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
  arm64/pageattr: Propagate return value from __change_memory_common
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
  Documentation/arm64: Fix the typo of register names
  ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
  perf: arm_spe: Add support for filtering on data source
  perf: Add perf_event_attr::config4
  perf/imx_ddr: Add support for PMU in DB (system interconnects)
  perf/imx_ddr: Get and enable optional clks
  perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
  dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
  arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
  arm64: mm: use untagged address to calculate page index
  MAINTAINERS: new entry for MPAM Driver
  arm_mpam: Add kunit tests for props_mismatch()
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add helper to reset saved mbwu state
  ...
2025-12-02 17:03:55 -08:00
Oliver Upton
3eef0c83c3 Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next
* kvm-arm64/nv-xnx-haf: (22 commits)
  : Support for FEAT_XNX and FEAT_HAF in nested
  :
  : Add support for a couple of MMU-related features that weren't
  : implemented by KVM's software page table walk:
  :
  :  - FEAT_XNX: Allows the hypervisor to describe execute permissions
  :    separately for EL0 and EL1
  :
  :  - FEAT_HAF: Hardware update of the Access Flag, which in the context of
  :    nested means software walkers must also set the Access Flag.
  :
  : The series also adds some basic support for testing KVM's emulation of
  : the AT instruction, including the implementation detail that AT sets the
  : Access Flag in KVM.
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:41 -08:00
Marc Zyngier
2a28810cbb KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping
A long time ago, an unsuspecting architect forgot to add a trap
bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but
what's a bit of spec between friends? Thankfully, this was fixed
in a later revision, and ARM "deprecates" the lack of trapping
ability.

Unfortuantely, a few (billion) CPUs went out with that defect,
anything ARMv8.0 from ARM, give or take. And on these CPUs,
you can't trap DIR on its own, full stop.

As the next best thing, we can trap everything in the common group,
which is a tad expensive, but hey ho, that's what you get. You can
otherwise recycle the HW in the neaby bin.

Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:29:11 -08:00
Oliver Upton
dc31124379 arm64: Detect FEAT_XNX
Detect the feature in anticipation of using it in KVM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24 14:24:44 -08:00
Sascha Bischoff
a04fbfb8a1 arm64/sysreg: Add ICH_VMCR_EL2
Add the ICH_VMCR_EL2 register, which is required for the upcoming
GICv5 KVM support. This register has two different field encodings,
based on if it is used for GICv3 or GICv5-based VMs. The
GICv5-specific field encodings are generated with a FEAT_GCIE prefix.

This register is already described in the GICv3 KVM code
directly. This will be ported across to use the generated encodings as
part of an upcoming change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Sascha Bischoff
a0b130eedd arm64/sysreg: Move generation of RES0/RES1/UNKN to function
The RESx and UNKN define generation happens in two places
(EndSysreg and EndSysregFields), and was using nearly identical
code. Split this out into a function, and call that instead, rather
then keeping the dupliated code.

There are no changes to the generated sysregs as part of this change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Sascha Bischoff
fe2ef46995 arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
Some system register field encodings change based on, for example the
in-use architecture features, or the context in which they are
accessed. In order to support these different field encodings,
introduce the Prefix descriptor (Prefix, EndPrefix) for describing
such sysregs.

The Prefix descriptor can be used in the following way:

        Sysreg  EXAMPLE 0    1    2    3    4
        Prefix    FEAT_A
	Field   63:0    Foo
	EndPrefix
	Prefix    FEAT_B
	Field   63:1    Bar
 	Res0    0
        EndPrefix
        Field   63:0    Baz
        EndSysreg

This will generate a single set of system register encodings (REG_,
SYS_, ...), and then generate three sets of field definitions for the
system register called EXAMPLE. The first set is prefixed by FEAT_A,
e.g. FEAT_A_EXAMPLE_Foo. The second set is prefixed by FEAT_B, e.g.,
FEAT_B_EXAMPLE_Bar. The third set is not given a prefix at all,
e.g. EXAMPLE_BAZ. For each set, a corresponding set of defines for
Res0, Res1, and Unkn is generated.

The intent for the final prefix-less fields is to describe default or
legacy field encodings. This ensure that prefixed encodings can be
added to already-present sysregs without affecting existing legacy
code. Prefixed fields must be defined before those without a prefix,
and this is checked by the generator. This ensures consisnt ordering
within the sysregs definitions.

The Prefix descriptor can be used within Sysreg or SysregFields
blocks. Field, Res0, Res1, Unkn, Rax, SignedEnum, Enum can all be used
within a Prefix block. Fields and Mapping can not. Fields that vary
with features must be described as part of a SysregFields block,
instead. Mappings, which are just a code comment, make little sense in
this context, and have hence not been included.

There are no changes to the generated system register definitions as
part of this change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Sascha Bischoff
0aab5772a5 arm64/sysreg: Fix checks for incomplete sysreg definitions
The checks for incomplete sysreg definitions were checking if the
next_bit was greater than 0, which is incorrect and missed occasions
where bit 0 hasn't been defined for a sysreg. The reason is that
next_bit is -1 when all bits have been processed (LSB - 1).

Change the checks to use >= 0, instead. Also, set next_bit in Mapping
to -1 instead of 0 to match these new checks.

There are no changes to the generated sysreg definitons as part of
this change, and conveniently no definitions lack definitions for bit
0.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Christian Brauner
b36d4b6aa8
arch: hookup listns() system call
Add the listns() system call to all architectures.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-20-2e6f823ebdc0@kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:18 +01:00
Linus Torvalds
f3826aa996 guest_memfd:
* Add support for host userspace mapping of guest_memfd-backed memory for VM
   types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
   precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
   no way to detect private vs. shared).
 
   This lays the groundwork for removal of guest memory from the kernel direct
   map, as well as for limited mmap() for guest_memfd-backed memory.
 
   For more information see:
   * a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD", 2025-08-27)
   * https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
     (guest_memfd in Firecracker)
   * https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
     (direct map removal)
   * https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
     (mmap support)
 
 ARM:
 
 * Add support for FF-A 1.2 as the secure memory conduit for pKVM,
   allowing more registers to be used as part of the message payload.
 
 * Change the way pKVM allocates its VM handles, making sure that the
   privileged hypervisor is never tricked into using uninitialised
   data.
 
 * Speed up MMIO range registration by avoiding unnecessary RCU
   synchronisation, which results in VMs starting much quicker.
 
 * Add the dump of the instruction stream when panic-ing in the EL2
   payload, just like the rest of the kernel has always done. This will
   hopefully help debugging non-VHE setups.
 
 * Add 52bit PA support to the stage-1 page-table walker, and make use
   of it to populate the fault level reported to the guest on failing
   to translate a stage-1 walk.
 
 * Add NV support to the GICv3-on-GICv5 emulation code, ensuring
   feature parity for guests, irrespective of the host platform.
 
 * Fix some really ugly architecture problems when dealing with debug
   in a nested VM. This has some bad performance impacts, but is at
   least correct.
 
 * Add enough infrastructure to be able to disable EL2 features and
   give effective values to the EL2 control registers. This then allows
   a bunch of features to be turned off, which helps cross-host
   migration.
 
 * Large rework of the selftest infrastructure to allow most tests to
   transparently run at EL2. This is the first step towards enabling
   NV testing.
 
 * Various fixes and improvements all over the map, including one BE
   fix, just in time for the removal of the feature.
 
 LoongArch:
 
 * Detect page table walk feature on new hardware
 
 * Add sign extension with kernel MMIO/IOCSR emulation
 
 * Improve in-kernel IPI emulation
 
 * Improve in-kernel PCH-PIC emulation
 
 * Move kvm_iocsr tracepoint out of generic code
 
 RISC-V:
 
 * Added SBI FWFT extension for Guest/VM with misaligned delegation and
   pointer masking PMLEN features
 
 * Added ONE_REG interface for SBI FWFT extension
 
 * Added Zicbop and bfloat16 extensions for Guest/VM
 
 * Enabled more common KVM selftests for RISC-V
 
 * Added SBI v3.0 PMU enhancements in KVM and perf driver
 
 s390:
 
 * Improve interrupt cpu for wakeup, in particular the heuristic to decide
   which vCPU to deliver a floating interrupt to.
 
 * Clear the PTE when discarding a swapped page because of CMMA; this
   bug was introduced in 6.16 when refactoring gmap code.
 
 x86 selftests:
 
 * Add #DE coverage in the fastops test (the only exception that's guest-
   triggerable in fastop-emulated instructions).
 
 * Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra
   Forest (SRF) and Clearwater Forest (CWF).
 
 * Minor cleanups and improvements
 
 x86 (guest side):
 
 * For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
   overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping
   could map devices as WB and prevent the device drivers from mapping their
   devices with UC/UC-.
 
 * Make kvm_async_pf_task_wake() a local static helper and remove its
   export.
 
 * Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU
   bindings even when PV_UNHALT is unsupported.
 
 Generic:
 
 * Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is
   now included in GFP_NOWAIT.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjcGSkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPSPAgAnJDswU4fZ5YdJr6jGzsbSQ6utlIV
 FeEltLKQIM7Aq/uvL6PLN5Kx1Pb/d9r9ag39mDT6lq9fOfJdOLjJr2SBXPTCsrPS
 6hyNL1mlgo5qzs54T8dkMbQThlSgA4zaehsc0zl8vnwil6ygoAdrtTHqZm6V0hu/
 F/sVlikCsLix1hC0KtzwscyWYcjWtXfVoi9eU5WY6ALpQaVXfRUtwyOhGDkldr+m
 i3iDiGiLAZ5Iu3igUCIOEzSSQY0FgLJpzbwJAeUxIvomDkHGJLaR14ijvM+NkRZi
 FBo2CLbjrwXb56Rbh2ABcq0CGJ3EiU3L+CC34UaRLzbtl/2BtpetkC3irA==
 =fyov
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This excludes the bulk of the x86 changes, which I will send
  separately. They have two not complex but relatively unusual conflicts
  so I will wait for other dust to settle.

  guest_memfd:

   - Add support for host userspace mapping of guest_memfd-backed memory
     for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
     (which isn't precisely the same thing as CoCo VMs, since x86's
     SEV-MEM and SEV-ES have no way to detect private vs. shared).

     This lays the groundwork for removal of guest memory from the
     kernel direct map, as well as for limited mmap() for
     guest_memfd-backed memory.

     For more information see:
       - commit a6ad54137a ("Merge branch 'guest-memfd-mmap' into HEAD")
       - guest_memfd in Firecracker:
           https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
       - direct map removal:
           https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
       - mmap support:
           https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/

  ARM:

   - Add support for FF-A 1.2 as the secure memory conduit for pKVM,
     allowing more registers to be used as part of the message payload.

   - Change the way pKVM allocates its VM handles, making sure that the
     privileged hypervisor is never tricked into using uninitialised
     data.

   - Speed up MMIO range registration by avoiding unnecessary RCU
     synchronisation, which results in VMs starting much quicker.

   - Add the dump of the instruction stream when panic-ing in the EL2
     payload, just like the rest of the kernel has always done. This
     will hopefully help debugging non-VHE setups.

   - Add 52bit PA support to the stage-1 page-table walker, and make use
     of it to populate the fault level reported to the guest on failing
     to translate a stage-1 walk.

   - Add NV support to the GICv3-on-GICv5 emulation code, ensuring
     feature parity for guests, irrespective of the host platform.

   - Fix some really ugly architecture problems when dealing with debug
     in a nested VM. This has some bad performance impacts, but is at
     least correct.

   - Add enough infrastructure to be able to disable EL2 features and
     give effective values to the EL2 control registers. This then
     allows a bunch of features to be turned off, which helps cross-host
     migration.

   - Large rework of the selftest infrastructure to allow most tests to
     transparently run at EL2. This is the first step towards enabling
     NV testing.

   - Various fixes and improvements all over the map, including one BE
     fix, just in time for the removal of the feature.

  LoongArch:

   - Detect page table walk feature on new hardware

   - Add sign extension with kernel MMIO/IOCSR emulation

   - Improve in-kernel IPI emulation

   - Improve in-kernel PCH-PIC emulation

   - Move kvm_iocsr tracepoint out of generic code

  RISC-V:

   - Added SBI FWFT extension for Guest/VM with misaligned delegation
     and pointer masking PMLEN features

   - Added ONE_REG interface for SBI FWFT extension

   - Added Zicbop and bfloat16 extensions for Guest/VM

   - Enabled more common KVM selftests for RISC-V

   - Added SBI v3.0 PMU enhancements in KVM and perf driver

  s390:

   - Improve interrupt cpu for wakeup, in particular the heuristic to
     decide which vCPU to deliver a floating interrupt to.

   - Clear the PTE when discarding a swapped page because of CMMA; this
     bug was introduced in 6.16 when refactoring gmap code.

  x86 selftests:

   - Add #DE coverage in the fastops test (the only exception that's
     guest- triggerable in fastop-emulated instructions).

   - Fix PMU selftests errors encountered on Granite Rapids (GNR),
     Sierra Forest (SRF) and Clearwater Forest (CWF).

   - Minor cleanups and improvements

  x86 (guest side):

   - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
     overriding guest MTRR for TDX/SNP to fix an issue where ACPI
     auto-mapping could map devices as WB and prevent the device drivers
     from mapping their devices with UC/UC-.

   - Make kvm_async_pf_task_wake() a local static helper and remove its
     export.

   - Use native qspinlocks when running in a VM with dedicated
     vCPU=>pCPU bindings even when PV_UNHALT is unsupported.

  Generic:

   - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
     __GFP_NOWARN is now included in GFP_NOWAIT.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
  KVM: s390: Fix to clear PTE when discarding a swapped page
  KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
  KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
  KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
  KVM: arm64: selftests: Add basic test for running in VHE EL2
  KVM: arm64: selftests: Enable EL2 by default
  KVM: arm64: selftests: Initialize HCR_EL2
  KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
  KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
  KVM: arm64: selftests: Select SMCCC conduit based on current EL
  KVM: arm64: selftests: Provide helper for getting default vCPU target
  KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
  KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
  KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
  KVM: arm64: selftests: Add helper to check for VGICv3 support
  KVM: arm64: selftests: Initialize VGICv3 only once
  KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
  KVM: selftests: Add ex_str() to print human friendly name of exception vectors
  selftests/kvm: remove stale TODO in xapic_state_test
  KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
  ...
2025-10-04 08:52:16 -07:00
Linus Torvalds
feafee2845 arm64 updates for 6.18
Confidential computing:
  - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
    and mapping them with appropriate attributes.
 
 CPU features:
  - Advertise atomic floating-point instructions to userspace.
 
  - Extend Spectre workarounds to cover additional Arm CPU variants.
 
  - Extend list of CPUs that support break-before-make level 2 and
    guarantee not to generate TLB conflict aborts for changes of mapping
    granularity (BBML2_NOABORT).
 
  - Add GCS support to our uprobes implementation.
 
 Documentation:
  - Remove bogus SME documentation concerning register state when
    entering/exiting streaming mode.
 
 Entry code:
  - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY).
 
  - Micro-optimise syscall entry path with a compiler branch hint.
 
 Memory management:
  - Enable huge mappings in vmalloc space even when kernel page-table
    dumping is enabled.
 
  - Tidy up the types used in our early MMU setup code.
 
  - Rework rodata= for closer parity with the behaviour on x86.
 
  - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
    linear map even when rodata= applies to virtual aliases.
 
  - Don't re-allocate the virtual region between '_text' and '_stext',
    as doing so confused tools parsing /proc/vmcore.
 
 Miscellaneous:
  - Clean-up Kconfig menuconfig text for architecture features.
 
  - Avoid redundant bitmap_empty() during determination of supported
    SME vector lengths.
 
  - Re-enable warnings when building the 32-bit vDSO object.
 
  - Avoid breaking our eggs at the wrong end.
 
 Perf and PMUs:
  - Support for v3 of the Hisilicon L3C PMU.
 
  - Support for Hisilicon's MN and NoC PMUs.
 
  - Support for Fujitsu's Uncore PMU.
 
  - Support for SPE's extended event filtering feature.
 
  - Preparatory work to enable data source filtering in SPE.
 
  - Support for multiple lanes in the DWC PCIe PMU.
 
  - Support for i.MX94 in the IMX DDR PMU driver.
 
  - MAINTAINERS update (Thank you, Yicong).
 
  - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
 
 Selftests:
  - Add basic LSFE check to the existing hwcaps test.
 
  - Support nolibc in GCS tests.
 
  - Extend SVE ptrace test to pass unsupported regsets and invalid vector
    lengths.
 
  - Minor cleanups (typos, cosmetic changes).
 
 System registers:
  - Fix ID_PFR1_EL1 definition.
 
  - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1.
 
  - Sync TCR_EL1 definition with the latest Arm ARM (L.b).
 
  - Be stricter about the input fed into our AWK sysreg generator script.
 
  - Typo fixes and removal of redundant definitions.
 
 ACPI, EFI and PSCI:
  - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
    support from the ACPI GHES code so that it can be used by platforms
    booted with device-tree.
 
  - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
    runtime calls.
 
  - Fix a node refcount imbalance in the PSCI device-tree code.
 
 CPU Features:
  - Ensure register sanitisation is applied to fields in ID_AA64MMFR4.
 
  - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests
    can reliably query the underlying CPU types from the VMM.
 
  - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
    to our context-switching, signal handling and ptrace code.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmjWlMIQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNIVYB/sHaFYmToEaK4ldMfTn4ZQ8yr9ZuTMLr/ji
 zOm7wULP08wKIcp6t+1N6e7A8Wb3YSErxTywc4b9MqctIel1WvBMewjJd38xb2qO
 hFCuRuWemfyt6Rw/EGMCP54yueLzKbnN4q+/Aks8jedWtS+AXY6McGCJOzMjuECL
 zKb68Hqqb09YQ2b2BPSAiU9g42rotiIYppLaLbEdxUnw/eqfEN3wSrl92/LuBlqH
 r2wZDnJwA5q8Iy9HWlnhg4Jax0Cs86jSJPIQLB6v3WWhc3HR49pm5cqYzYkUht9L
 nA6eaWJiBl/+k3S+ftPKNzU8n04r15lqyNZE2FYfRk+0BPSacJOf
 =kO15
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's good stuff across the board, including some nice mm
  improvements for CPUs with the 'noabort' BBML2 feature and a clever
  patch to allow ptdump to play nicely with block mappings in the
  vmalloc area.

  Confidential computing:

   - Add support for accepting secrets from firmware (e.g. ACPI CCEL)
     and mapping them with appropriate attributes.

  CPU features:

   - Advertise atomic floating-point instructions to userspace

   - Extend Spectre workarounds to cover additional Arm CPU variants

   - Extend list of CPUs that support break-before-make level 2 and
     guarantee not to generate TLB conflict aborts for changes of
     mapping granularity (BBML2_NOABORT)

   - Add GCS support to our uprobes implementation.

  Documentation:

   - Remove bogus SME documentation concerning register state when
     entering/exiting streaming mode.

  Entry code:

   - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)

   - Micro-optimise syscall entry path with a compiler branch hint.

  Memory management:

   - Enable huge mappings in vmalloc space even when kernel page-table
     dumping is enabled

   - Tidy up the types used in our early MMU setup code

   - Rework rodata= for closer parity with the behaviour on x86

   - For CPUs implementing BBML2_NOABORT, utilise block mappings in the
     linear map even when rodata= applies to virtual aliases

   - Don't re-allocate the virtual region between '_text' and '_stext',
     as doing so confused tools parsing /proc/vmcore.

  Miscellaneous:

   - Clean-up Kconfig menuconfig text for architecture features

   - Avoid redundant bitmap_empty() during determination of supported
     SME vector lengths

   - Re-enable warnings when building the 32-bit vDSO object

   - Avoid breaking our eggs at the wrong end.

  Perf and PMUs:

   - Support for v3 of the Hisilicon L3C PMU

   - Support for Hisilicon's MN and NoC PMUs

   - Support for Fujitsu's Uncore PMU

   - Support for SPE's extended event filtering feature

   - Preparatory work to enable data source filtering in SPE

   - Support for multiple lanes in the DWC PCIe PMU

   - Support for i.MX94 in the IMX DDR PMU driver

   - MAINTAINERS update (Thank you, Yicong)

   - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).

  Selftests:

   - Add basic LSFE check to the existing hwcaps test

   - Support nolibc in GCS tests

   - Extend SVE ptrace test to pass unsupported regsets and invalid
     vector lengths

   - Minor cleanups (typos, cosmetic changes).

  System registers:

   - Fix ID_PFR1_EL1 definition

   - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1

   - Sync TCR_EL1 definition with the latest Arm ARM (L.b)

   - Be stricter about the input fed into our AWK sysreg generator
     script

   - Typo fixes and removal of redundant definitions.

  ACPI, EFI and PSCI:

   - Decouple Arm's "Software Delegated Exception Interface" (SDEI)
     support from the ACPI GHES code so that it can be used by platforms
     booted with device-tree

   - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
     runtime calls

   - Fix a node refcount imbalance in the PSCI device-tree code.

  CPU Features:

   - Ensure register sanitisation is applied to fields in ID_AA64MMFR4

   - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
     guests can reliably query the underlying CPU types from the VMM

   - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
     to our context-switching, signal handling and ptrace code"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29 18:48:39 -07:00
Will Deacon
c7c7eb4f0e Merge branch 'for-next/sysregs' into for-next/core
* for-next/sysregs:
  arm64/sysreg: Update TCR_EL1 register
  arm64: sysreg: Add validation checks to sysreg header generation script
  arm64: sysreg: Correct sign definitions for EIESB and DoubleLock
  arm64: sysreg: Fix and tidy up sysreg field definitions
2025-09-24 16:35:01 +01:00
Anshuman Khandual
14f158552e arm64/sysreg: Update TCR_EL1 register
Update TCR_EL1 register fields as per latest ARM ARM DDI 0487 L.B and while
here drop an explicit sysreg definition SYS_TCR_EL1 from sysreg.h, which is
now redundant.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-22 11:57:15 +01:00
James Clark
a7005ff2d0 arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 register
Add new fields and register that are introduced for the features
FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source
filtering).

Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 14:17:02 +01:00
Sascha Bischoff
7847f51189 arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capability
Implement the GCIE_LEGACY capability as a system feature to be able to
check for support from KVM. The type is explicitly
ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, which means that the capability
is enabled early if all boot CPUs support it. Additionally, if this
capability is enabled during boot, it prevents late onlining of CPUs
that lack it, thereby avoiding potential mismatched configurations
which would break KVM.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-17 17:41:02 +01:00
Fuad Tabba
382cbbe7fb arm64: sysreg: Add validation checks to sysreg header generation script
The gen_sysreg.awk script processes the system register specification in
the sysreg text file to generate C macro definitions. The current script
will silently accept certain errors in the specification file, leading
to incorrect header generation.

For example, a Sysreg or SysregFields can be accidentally duplicated,
causing its macros to be emitted twice. An Enum can contain duplicate
values for different items, which is architecturally incorrect.

Add checks to catch these errors at build time. The script now tracks
all seen Sysreg and SysregFields definitions and checks for duplicates.
It also tracks values within each Enum block to ensure entries are
unique.

Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-11 14:46:07 +01:00
Fuad Tabba
f4d4ebc849 arm64: sysreg: Correct sign definitions for EIESB and DoubleLock
The `ID_AA64MMFR4_EL1.EIESB` field, is an unsigned enumeration, but was
incorrectly defined as a `SignedEnum` when introduced in commit
cfc680bb04 ("arm64: sysreg: Add layout for ID_AA64MMFR4_EL1"). This is
corrected to `UnsignedEnum`.

Conversely, the `ID_AA64DFR0_EL1.DoubleLock` field, is a signed
enumeration, but was incorrectly defined as an `UnsignedEnum`. This is
corrected to `SignedEnum`, which wasn't correctly set when annotated as
such in commit ad16d4cf0b ("arm64/sysreg: Initial unsigned annotations
for ID registers").

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-11 14:46:07 +01:00
Fuad Tabba
2c2529e470 arm64: sysreg: Fix and tidy up sysreg field definitions
Fix the value of ID_PFR1_EL1.Security NSACR_RFR to be 0b0010, as per
DDI0601/2025-06, which wasn't correctly set when introduced in commit
1224308075 ("arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation").

While at it, remove redundant definitions of CPACR_EL12 and
RCWSMASK_EL1 and fix some typos.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-09-11 14:46:07 +01:00
Paolo Bonzini
42a0305ab1 KVM/arm64 changes for 6.17, take #2
- Correctly handle 'invariant' system registers for protected VMs
 
  - Improved handling of VNCR data aborts, including external aborts
 
  - Fixes for handling of FEAT_RAS for NV guests, providing a sane
    fault context during SEA injection and preventing the use of
    RASv1p1 fault injection hardware
 
  - Ensure that page table destruction when a VM is destroyed gives an
    opportunity to reschedule
 
  - Large fix to KVM's infrastructure for managing guest context loaded
    on the CPU, addressing issues where the output of AT emulation
    doesn't get reflected to the guest
 
  - Fix AT S12 emulation to actually perform stage-2 translation when
    necessary
 
  - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
    disabled for a VM
 
  - Minor KVM + selftest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaLC0JBccb2xpdmVyLnVw
 dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFogJAQCyxHd5tuvXWWT/iC2EYFlPWYkU
 LOQbNhus16QjQ9f2ggD8CoA+6UAxzYW7ZU6IzYkDhJkN/3dKQEQhh8Cx0GXXRAs=
 =uky+
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, take #2

 - Correctly handle 'invariant' system registers for protected VMs

 - Improved handling of VNCR data aborts, including external aborts

 - Fixes for handling of FEAT_RAS for NV guests, providing a sane
   fault context during SEA injection and preventing the use of
   RASv1p1 fault injection hardware

 - Ensure that page table destruction when a VM is destroyed gives an
   opportunity to reschedule

 - Large fix to KVM's infrastructure for managing guest context loaded
   on the CPU, addressing issues where the output of AT emulation
   doesn't get reflected to the guest

 - Fix AT S12 emulation to actually perform stage-2 translation when
   necessary

 - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
   disabled for a VM

 - Minor KVM + selftest fixes
2025-08-29 12:57:31 -04:00
Marc Zyngier
8049164653 arm64: Add capability denoting FEAT_RASv1p1
Detecting FEAT_RASv1p1 is rather complicated, as there are two
ways for the architecture to advertise the same thing (always a
delight...).

Add a capability that will advertise this in a synthetic way to
the rest of the kernel.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20250817202158.395078-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-21 16:28:46 -07:00
Linus Torvalds
63eb28bb14 ARM:
- Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.
 
 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.
 
 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.
 
 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.
 
 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.
 
 - Convert more system register sanitization to the config-driven
   implementation.
 
 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.
 
 - Various cleanups and minor fixes.
 
 LoongArch:
 
 - Add stat information for in-kernel irqchip
 
 - Add tracepoints for CPUCFG and CSR emulation exits
 
 - Enhance in-kernel irqchip emulation
 
 - Various cleanups.
 
 RISC-V:
 
 - Enable ring-based dirty memory tracking
 
 - Improve perf kvm stat to report interrupt events
 
 - Delegate illegal instruction trap to VS-mode
 
 - MMU improvements related to upcoming nested virtualization
 
 s390x
 
 - Fixes
 
 x86:
 
 - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC,
   PIC, and PIT emulation at compile time.
 
 - Share device posted IRQ code between SVM and VMX and
   harden it against bugs and runtime errors.
 
 - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
   instead of O(n).
 
 - For MMIO stale data mitigation, track whether or not a vCPU has access to
   (host) MMIO based on whether the page tables have MMIO pfns mapped; using
   VFIO is prone to false negatives
 
 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.
 
 - Recalculate all MSR intercepts from scratch on MSR filter changes,
   instead of maintaining shadow bitmaps.
 
 - Advertise support for LKGS (Load Kernel GS base), a new instruction
   that's loosely related to FRED, but is supported and enumerated
   independently.
 
 - Fix a user-triggerable WARN that syzkaller found by setting the vCPU
   in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU
   into VMX Root Mode (post-VMXON).  Trying to detect every possible path
   leading to architecturally forbidden states is hard and even risks
   breaking userspace (if it goes from valid to valid state but passes
   through invalid states), so just wait until KVM_RUN to detect that
   the vCPU state isn't allowed.
 
 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can access
   APERF/MPERF.  This has many caveats (APERF/MPERF cannot be zeroed
   on vCPU creation or saved/restored on suspend and resume, or preserved
   over thread migration let alone VM migration) but can be useful whenever
   you're interested in letting Linux guests see the effective physical CPU
   frequency in /proc/cpuinfo.
 
 - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
   created, as there's no known use case for changing the default
   frequency for other VM types and it goes counter to the very reason
   why the ioctl was added to the vm file descriptor.  And also, there
   would be no way to make it work for confidential VMs with a "secure"
   TSC, so kill two birds with one stone.
 
 - Dynamically allocation the shadow MMU's hashed page list, and defer
   allocating the hashed list until it's actually needed (the TDP MMU
   doesn't use the list).
 
 - Extract many of KVM's helpers for accessing architectural local APIC
   state to common x86 so that they can be shared by guest-side code for
   Secure AVIC.
 
 - Various cleanups and fixes.
 
 x86 (Intel):
 
 - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
   Failure to honor FREEZE_IN_SMM can leak host state into guests.
 
 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent
   L1 from running L2 with features that KVM doesn't support, e.g. BTF.
 
 x86 (AMD):
 
 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty
   much a static condition and therefore should never happen, but still).
 
 - Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
 
 - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
   supports) instead of rejecting vCPU creation.
 
 - Extend enable_ipiv module param support to SVM, by simply leaving
   IsRunning clear in the vCPU's physical ID table entry.
 
 - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
   erratum #1235, to allow (safely) enabling AVIC on such CPUs.
 
 - Request GA Log interrupts if and only if the target vCPU is blocking,
   i.e. only if KVM needs a notification in order to wake the vCPU.
 
 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.
 
 - Accept any SNP policy that is accepted by the firmware with respect to
   SMT and single-socket restrictions.  An incompatible policy doesn't put
   the kernel at risk in any way, so there's no reason for KVM to care.
 
 - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
   use WBNOINVD instead of WBINVD when possible for SEV cache maintenance.
 
 - When reclaiming memory from an SEV guest, only do cache flushes on CPUs
   that have ever run a vCPU for the guest, i.e. don't flush the caches for
   CPUs that can't possibly have cache lines with dirty, encrypted data.
 
 Generic:
 
 - Rework irqbypass to track/match producers and consumers via an xarray
   instead of a linked list.  Using a linked list leads to O(n^2) insertion
   times, which is hugely problematic for use cases that create large
   numbers of VMs.  Such use cases typically don't actually use irqbypass,
   but eliminating the pointless registration is a future problem to
   solve as it likely requires new uAPI.
 
 - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
   to avoid making a simple concept unnecessarily difficult to understand.
 
 - Decouple device posted IRQs from VFIO device assignment, as binding a VM
   to a VFIO group is not a requirement for enabling device posted IRQs.
 
 - Clean up and document/comment the irqfd assignment code.
 
 - Disallow binding multiple irqfds to an eventfd with a priority waiter,
   i.e.  ensure an eventfd is bound to at most one irqfd through the entire
   host, and add a selftest to verify eventfd:irqfd bindings are globally
   unique.
 
 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
   related to private <=> shared memory conversions.
 
 - Drop guest_memfd's .getattr() implementation as the VFS layer will call
   generic_fillattr() if inode_operations.getattr is NULL.
 
 - Fix issues with dirty ring harvesting where KVM doesn't bound the
   processing of entries in any way, which allows userspace to keep KVM
   in a tight loop indefinitely.
 
 - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking,
   now that KVM no longer uses assigned_device_count as a heuristic for
   either irqbypass usage or MDS mitigation.
 
 Selftests:
 
 - Fix a comment typo.
 
 - Verify KVM is loaded when getting any KVM module param so that attempting
   to run a selftest without kvm.ko loaded results in a SKIP message about
   KVM not being loaded/enabled (versus some random parameter not existing).
 
 - Skip tests that hit EACCES when attempting to access a file, and rpint
   a "Root required?" help message.  In most cases, the test just needs to
   be run with elevated permissions.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmiKXMgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhMQf/QDhC/CP1aGXph2whuyeD2NMqPKiU
 9KdnDNST+ftPwjg9QxZ9mTaa8zeVz/wly6XlxD9OQHy+opM1wcys3k0GZAFFEEQm
 YrThgURdzEZ3nwJZgb+m0t4wjJQtpiFIBwAf7qq6z1VrqQBEmHXJ/8QxGuqO+BNC
 j5q/X+q6KZwehKI6lgFBrrOKWFaxqhnRAYfW6rGBxRXxzTJuna37fvDpodQnNceN
 zOiq+avfriUMArTXTqOteJNKU0229HjiPSnjILLnFQ+B3akBlwNG0jk7TMaAKR6q
 IZWG1EIS9q1BAkGXaw6DE1y6d/YwtXCR5qgAIkiGwaPt5yj9Oj6kRN2Ytw==
 =j2At
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocation the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
2025-07-30 17:14:01 -07:00
Linus Torvalds
6fb44438a5 arm64 updates for 6.17:
Perf and PMU updates:
 
  - Add support for new (v3) Hisilicon SLLC and DDRC PMUs
 
  - Add support for Arm-NI PMU integrations that share interrupts between
    clock domains within a given instance
 
  - Allow SPE to be configured with a lower sample period than the
    minimum recommendation advertised by PMSIDR_EL1.Interval
 
  - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
 
  - Adjust the perf watchdog period according to cpu frequency changes
 
  - Minor driver fixes and cleanups
 
 Hardware features:
 
  - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
 
  - Support for reporting the non-address bits during a synchronous MTE
    tag check fault (FEAT_MTE_TAGGED_FAR)
 
  - Optimise the TLBI when folding/unfolding contiguous PTEs on hardware
    with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts
 
 Software features:
 
  - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
    and using the text-poke API for late module relocations
 
  - Force VMAP_STACK always on and change arm64_efi_rt_init() to use
    arch_alloc_vmap_stack() in order to avoid KASAN false positives
 
 ACPI:
 
  - Improve SPCR handling and messaging on systems lacking an SPCR table
 
 Debug:
 
  - Simplify the debug exception entry path
 
  - Drop redundant DBG_MDSCR_* macros
 
 Kselftests:
 
  - Cleanups and improvements for SME, SVE and FPSIMD tests
 
 Miscellaneous:
 
  - Optimise loop to reduce redundant operations in contpte_ptep_get()
 
  - Remove ISB when resetting POR_EL0 during signal handling
 
  - Mark the kernel as tainted on SEA and SError panic
 
  - Remove redundant gcs_free() call
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmiDkgoACgkQa9axLQDI
 XvFucQ//bYugRP5/Sdlrq5eDKWBGi1HufYzwfDEBLc4S75Eu8mGL/tuThfu9yFn+
 qCowtt4U84HdWsZDTSVo6lym6v2vJUpGOMgXzepvJaFBRnqGv9X9NxH6RQO1LTnu
 Pm7rO+7I9tNpfuc7Zu9pHDggsJEw+WzVfmEF6WPSFlT9mUNv6NbSx4rbLQKU86Dm
 ouTqXaePEQZ5oiRXVasxyT0otGtiACD20WpgOtNjYGzsfUVwCf/C83V/2DLwwbhr
 9cW9lCtFxA/yFdQcA9ThRzWZ9Eo5LAHqjGIq00+zOjuzgDbBtcTT79gpChkhovIR
 FBIsWHd9j9i3nYxzf4V4eRKQnyqS3NQWv7g7uKFwNgARif1Zk0VJ77QIlAYk5xLI
 ENTRjLKz5WNGGnhdkeCvDlVyxX+OktgcVTp3vqRxAKCRahMMUqBrwxiM8RzVF37e
 yzkEQayL8F7uZqy9H7Sjn48UpHZux6frJ1bBQw1oEvR9QmAoAdqavPMSAYIOT3Zr
 ze4WIljq/cFr3kBPIFP5pK1e0qYMHXZpSKIm8MAv6y/7KmQuVbMjZthpuPbLSIw0
 Q7C0KalB8lToPIbO7qMni/he0dCN4K2+E1YHFTR+pzfcoLuW4rjSg7i8tqMLKMJ8
 H+SeGLyPtM5A6bdAPTTpqefcgUUe7064ENUqrGUpDEynGXA7boE=
 =5h1C
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "A quick summary: perf support for Branch Record Buffer Extensions
  (BRBE), typical PMU hardware updates, small additions to MTE for
  store-only tag checking and exposing non-address bits to signal
  handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on.

  There is also a TLBI optimisation on hardware that does not require
  break-before-make when changing the user PTEs between contiguous and
  non-contiguous.

  More details:

  Perf and PMU updates:

   - Add support for new (v3) Hisilicon SLLC and DDRC PMUs

   - Add support for Arm-NI PMU integrations that share interrupts
     between clock domains within a given instance

   - Allow SPE to be configured with a lower sample period than the
     minimum recommendation advertised by PMSIDR_EL1.Interval

   - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)

   - Adjust the perf watchdog period according to cpu frequency changes

   - Minor driver fixes and cleanups

  Hardware features:

   - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)

   - Support for reporting the non-address bits during a synchronous MTE
     tag check fault (FEAT_MTE_TAGGED_FAR)

   - Optimise the TLBI when folding/unfolding contiguous PTEs on
     hardware with FEAT_BBM (break-before-make) level 2 and no TLB
     conflict aborts

  Software features:

   - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
     and using the text-poke API for late module relocations

   - Force VMAP_STACK always on and change arm64_efi_rt_init() to use
     arch_alloc_vmap_stack() in order to avoid KASAN false positives

  ACPI:

   - Improve SPCR handling and messaging on systems lacking an SPCR
     table

  Debug:

   - Simplify the debug exception entry path

   - Drop redundant DBG_MDSCR_* macros

  Kselftests:

   - Cleanups and improvements for SME, SVE and FPSIMD tests

  Miscellaneous:

   - Optimise loop to reduce redundant operations in contpte_ptep_get()

   - Remove ISB when resetting POR_EL0 during signal handling

   - Mark the kernel as tainted on SEA and SError panic

   - Remove redundant gcs_free() call"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64/gcs: task_gcs_el0_enable() should use passed task
  arm64: Kconfig: Keep selects somewhat alphabetically ordered
  arm64: signal: Remove ISB when resetting POR_EL0
  kselftest/arm64: Handle attempts to disable SM on SME only systems
  kselftest/arm64: Fix SVE write data generation for SME only systems
  kselftest/arm64: Test SME on SME only systems in fp-ptrace
  kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
  kselftest/arm64: Allow sve-ptrace to run on SME only systems
  arm64/mm: Drop redundant addr increment in set_huge_pte_at()
  kselftest/arm4: Provide local defines for AT_HWCAP3
  arm64: Mark kernel as tainted on SAE and SError panic
  arm64/gcs: Don't call gcs_free() when releasing task_struct
  drivers/perf: hisi: Support PMUs with no interrupt
  drivers/perf: hisi: Relax the event number check of v2 PMUs
  drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
  drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
  drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
  drivers/perf: hisi: Simplify the probe process for each DDRC version
  perf/arm-ni: Support sharing IRQs within an NI instance
  perf/arm-ni: Consolidate CPU affinity handling
  ...
2025-07-29 20:21:54 -07:00
Oliver Upton
d9b9fa2c32 Merge branch 'kvm-arm64/config-masks' into kvmarm/next
* kvm-arm64/config-masks:
  : More config-driven mask computation, courtesy of Marc Zyngier
  :
  : Converts more system registers to the config-driven computation of RESx
  : masks based on the advertised feature set
  KVM: arm64: Tighten the definition of FEAT_PMUv3p9
  KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
  KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
  KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
  arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-28 08:03:08 -07:00
Oliver Upton
1f315e99bd Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/next
* kvm-arm64/gcie-legacy:
  : Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff
  :
  : FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to
  : support the legacy GICv3 for VMs, including a backwards-compatible VGIC
  : implementation that we all know and love.
  :
  : As a starting point for GICv5 enablement in KVM, enable + use the
  : GICv3-compatible feature when running VMs on GICv5 hardware.
  KVM: arm64: gic-v5: Probe for GICv5
  KVM: arm64: gic-v5: Support GICv3 compat
  arm64/sysreg: Add ICH_VCTLR_EL2
  irqchip/gic-v5: Populate struct gic_kvm_info
  irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:50:06 -07:00
Oliver Upton
ccd73c5782 GICv5 initial host support
Add host kernel support for the new arm64 GICv5 architecture, which is
 quite a departure from the previous ones.
 
 Include support for the full gamut of the architecture (interrupt
 routing and delivery to CPUs, wired interrupts, MSIs, and interrupt
 translation).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmh45MYACgkQI9DQutE9
 ekPa3w//b5FfQAXwSco2+zqfR80a914CkBchHWJ50S1XHxymikI0VWin+4nsFXz1
 90/k52hz4a1rhjpMA0Z0rnEpzTpvyPckrfKDzUqf2Q8aAmfHMRw91kYvl2BII39O
 iWqEQKFRIxK5QR3mRt6C7mV8xth8zUbk/jPBdFbuB7iS/s8+Ayrxul9H4gHQsZqL
 f8fFZmFMKIIoshnWSr604510j0/jhj2lTXyesXGoNa/bBpPYsjOZeZByPaw+3RLS
 wGluBhMsbRk3gPzplVuPzMtQYLMinf2i08bhg4113zVvF1nvi1cs8ah28+HRH33X
 ZFIzClvWmCOu1zsYes49X8A6U2iJ4BL5Ndh9W6M3E7iH+pnzmYPsSuKL69welyvz
 7qRJnoAkIooaWrgES+TVCDGqC4gBTClBWUZKMRa21GMwyyPLaPQZBnAmHzqbbFO1
 k8WMcOVtvStc/Hd4Jc8GgbdWn5IRI6YAqIOEht1vYP9bKka8oj0nEt4I275bUlJP
 K2Qife4C6If8oAG+5Qu0dD6pAh7Pp6wylPm0EQ9AE5KCR4wWONOluvrSvU0WaAw6
 2uk5H/lTl0l9onO84YKP2dkYNawkKLVWeYnKFtpT1HKRUt1OkF01NsGKYivE5xp3
 qdsgyOYXR6r/MKa0ymfQ58y0txqTY7IQ/GSl44Sjh2WVU94Sp8A=
 =pB67
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-gic-v5-host' into kvmarm/next

GICv5 initial host support

Add host kernel support for the new arm64 GICv5 architecture, which is
quite a departure from the previous ones.

Include support for the full gamut of the architecture (interrupt
routing and delivery to CPUs, wired interrupts, MSIs, and interrupt
translation).

* tag 'irqchip-gic-v5-host': (32 commits)
  arm64: smp: Fix pNMI setup after GICv5 rework
  arm64: Kconfig: Enable GICv5
  docs: arm64: gic-v5: Document booting requirements for GICv5
  irqchip/gic-v5: Add GICv5 IWB support
  irqchip/gic-v5: Add GICv5 ITS support
  irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling
  irqchip/gic-v3: Rename GICv3 ITS MSI parent
  PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function
  of/irq: Add of_msi_xlate() helper function
  irqchip/gic-v5: Enable GICv5 SMP booting
  irqchip/gic-v5: Add GICv5 LPI/IPI support
  irqchip/gic-v5: Add GICv5 IRS/SPI support
  irqchip/gic-v5: Add GICv5 PPI support
  arm64: Add support for GICv5 GSB barriers
  arm64: smp: Support non-SGIs for IPIs
  arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability
  arm64: cpucaps: Rename GICv3 CPU interface capability
  arm64: Disable GICv5 read/write/instruction traps
  arm64/sysreg: Add ICH_HFGITR_EL2
  arm64/sysreg: Add ICH_HFGWTR_EL2
  ...

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26 08:49:42 -07:00
Catalin Marinas
5b1ae9de71 Merge branch 'for-next/feat_mte_store_only' into for-next/core
* for-next/feat_mte_store_only:
  : MTE feature to restrict tag checking to store only operations
  kselftest/arm64/mte: Add MTE_STORE_ONLY testcases
  kselftest/arm64/mte: Preparation for mte store only test
  kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test
  KVM: arm64: Expose MTE_STORE_ONLY feature to guest
  arm64/hwcaps: Add MTE_STORE_ONLY hwcaps
  arm64/kernel: Support store-only mte tag check
  prctl: Introduce PR_MTE_STORE_ONLY
  arm64/cpufeature: Add MTE_STORE_ONLY feature
2025-07-24 16:03:34 +01:00
Catalin Marinas
3ae8cef210 Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', 'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (23 commits)
  drivers/perf: hisi: Support PMUs with no interrupt
  drivers/perf: hisi: Relax the event number check of v2 PMUs
  drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
  drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
  drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
  drivers/perf: hisi: Simplify the probe process for each DDRC version
  perf/arm-ni: Support sharing IRQs within an NI instance
  perf/arm-ni: Consolidate CPU affinity handling
  perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation
  perf/cxlpmu: Remove unintended newline from IRQ name format string
  perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()
  perf: arm_spe: Relax period restriction
  perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  KVM: arm64: nvhe: Disable branch generation in nVHE guests
  arm64: Handle BRBE booting requirements
  arm64/sysreg: Add BRBE registers and fields
  perf/arm: Add missing .suppress_bind_attrs
  perf/arm-cmn: Reduce stack usage during discovery
  perf: imx9_perf: make the read-only array mask static const
  perf/arm-cmn: Broaden module description for wider interconnect support
  ...

* for-next/livepatch:
  : Support for HAVE_LIVEPATCH on arm64
  arm64: Kconfig: Keep selects somewhat alphabetically ordered
  arm64: Implement HAVE_LIVEPATCH
  arm64: stacktrace: Implement arch_stack_walk_reliable()
  arm64: stacktrace: Check kretprobe_find_ret_addr() return value
  arm64/module: Use text-poke API for late relocations.

* for-next/user-contig-bbml2:
  : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts
  arm64/mm: Elide tlbi in contpte_convert() under BBML2
  iommu/arm: Add BBM Level 2 smmu feature
  arm64: Add BBM Level 2 cpu feature
  arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type

* for-next/misc:
  : Miscellaneous arm64 patches
  arm64/gcs: task_gcs_el0_enable() should use passed task
  arm64: signal: Remove ISB when resetting POR_EL0
  arm64/mm: Drop redundant addr increment in set_huge_pte_at()
  arm64: Mark kernel as tainted on SAE and SError panic
  arm64/gcs: Don't call gcs_free() when releasing task_struct
  arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y
  arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get
  arm64: pi: use 'targets' instead of extra-y in Makefile

* for-next/acpi:
  : Various ACPI arm64 changes
  ACPI: Suppress misleading SPCR console message when SPCR table is absent
  ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled

* for-next/debug-entry:
  : Simplify the debug exception entry path
  arm64: debug: remove debug exception registration infrastructure
  arm64: debug: split bkpt32 exception entry
  arm64: debug: split brk64 exception entry
  arm64: debug: split hardware watchpoint exception entry
  arm64: debug: split single stepping exception entry
  arm64: debug: refactor reinstall_suspended_bps()
  arm64: debug: split hardware breakpoint exception entry
  arm64: entry: Add entry and exit functions for debug exceptions
  arm64: debug: remove break/step handler registration infrastructure
  arm64: debug: call step handlers statically
  arm64: debug: call software breakpoint handlers statically
  arm64: refactor aarch32_break_handler()
  arm64: debug: clean up single_step_handler logic

* for-next/feat_mte_tagged_far:
  : Support for reporting the non-address bits during a synchronous MTE tag check fault
  kselftest/arm64/mte: Add mtefar tests on check_mmap_options
  kselftest/arm64/mte: Refactor check_mmap_option test
  kselftest/arm64/mte: Add verification for address tag in signal handler
  kselftest/arm64/mte: Add address tag related macro and function
  kselftest/arm64/mte: Check MTE_FAR feature is supported
  kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS
  kselftest/arm64: Add MTE_FAR hwcap test
  KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest
  arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported
  arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature

* for-next/kselftest:
  : Kselftest updates for arm64
  kselftest/arm64: Handle attempts to disable SM on SME only systems
  kselftest/arm64: Fix SVE write data generation for SME only systems
  kselftest/arm64: Test SME on SME only systems in fp-ptrace
  kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
  kselftest/arm64: Allow sve-ptrace to run on SME only systems
  kselftest/arm4: Provide local defines for AT_HWCAP3
  kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
  kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
  kselftest/arm64: Fix check for setting new VLs in sve-ptrace
  kselftest/arm64: Convert tpidr2 test to use kselftest.h

* for-next/mdscr-cleanup:
  : Drop redundant DBG_MDSCR_* macros
  KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t
  arm64/debug: Drop redundant DBG_MDSCR_* macros

* for-next/vmap-stack:
  : Force VMAP_STACK on arm64
  arm64: remove CONFIG_VMAP_STACK checks from entry code
  arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling
  arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic
  arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack
  arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup
  arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN
  arm64: efi: Remove CONFIG_VMAP_STACK check
  arm64: Mandate VMAP_STACK
  arm64: efi: Fix KASAN false positive for EFI runtime stack
  arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()
  arm64/gcs: Don't call gcs_free() during flush_gcs()
  arm64: Restrict pagetable teardown to avoid false warning
  docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-24 16:01:22 +01:00
Marc Zyngier
a3ed7da911 arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx
FEAT_THE and FEAT_ASID2 add new controls to the TCR2_ELx registers.

Add them to the register descriptions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-2-maz@kernel.org
[ fix whitespace ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-15 20:39:24 -07:00
Sascha Bischoff
b62f4b5dec arm64/sysreg: Add ICH_VCTLR_EL2
This system register is required to enable/disable V3 legacy mode when
running on a GICv5 host.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-4-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 14:41:06 -07:00
Oliver Upton
bf49e73dde arm64: Detect FEAT_SCTLR2
KVM is about to pick up support for SCTLR2. Add cpucap for later use in
the guest/host context switch hot path.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 10:40:30 -07:00
Lorenzo Pieralisi
988699f9e6 arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability
Implement the GCIE capability as a strict boot cpu capability to
detect whether architectural GICv5 support is available in HW.

Plug it in with a naming consistent with the existing GICv3
CPU interface capability.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-17-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:51 +01:00
Lorenzo Pieralisi
0bb5b6faa0 arm64: cpucaps: Rename GICv3 CPU interface capability
In preparation for adding a GICv5 CPU interface capability,
rework the existing GICv3 CPUIF capability - change its name and
description so that the subsequent GICv5 CPUIF capability
can be added with a more consistent naming on top.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-16-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:51 +01:00