Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Loongarch:

   - Add more CPUCFG mask bits

   - Improve feature detection

   - Add lazy load support for FPU and binary translation (LBT) register
     state

   - Fix return value for memory reads from and writes to in-kernel
     devices

   - Add support for detecting preemption from within a guest

   - Add KVM steal time test case to tools/selftests

  ARM:

   - Add support for FEAT_IDST, allowing ID registers that are not
     implemented to be reported as a normal trap rather than as an UNDEF
     exception

   - Add sanitisation of the VTCR_EL2 register, fixing a number of
     UXN/PXN/XN bugs in the process

   - Full handling of RESx bits, instead of only RES0, and resulting in
     SCTLR_EL2 being added to the list of sanitised registers

   - More pKVM fixes for features that are not supposed to be exposed to
     guests

   - Make sure that MTE being disabled on the pKVM host doesn't give it
     the ability to attack the hypervisor

   - Allow pKVM's host stage-2 mappings to use the Force Write Back
     version of the memory attributes by using the "pass-through'
     encoding

   - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
     guest

   - Preliminary work for guest GICv5 support

   - A bunch of debugfs fixes, removing pointless custom iterators
     stored in guest data structures

   - A small set of FPSIMD cleanups

   - Selftest fixes addressing the incorrect alignment of page
     allocation

   - Other assorted low-impact fixes and spelling fixes

  RISC-V:

   - Fixes for issues discovered by KVM API fuzzing in
     kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
     kvm_riscv_vcpu_aia_imsic_update()

   - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM

   - Transparent huge page support for hypervisor page tables

   - Adjust the number of available guest irq files based on MMIO
     register sizes found in the device tree or the ACPI tables

   - Add RISC-V specific paging modes to KVM selftests

   - Detect paging mode at runtime for selftests

  s390:

   - Performance improvement for vSIE (aka nested virtualization)

   - Completely new memory management. s390 was a special snowflake that
     enlisted help from the architecture's page table management to
     build hypervisor page tables, in particular enabling sharing of the
     last level of page tables. However, this was a lot of code (~3K
     lines) just to support KVM, and it also blocked several features.
     The biggest advantage is that the page size of userspace is now
     completely independent of the page size used by the guest:
     userspace can mix normal pages, THPs and hugetlbfs as it sees fit;
     in fact, transparent hugepages were not possible before. It is
     also now possible to have nested guests and guests with huge pages
     running on the same host

   - Maintainership change for s390 vfio-pci

   - Small quality of life improvement for protected guests

  x86:

   - Add support for giving the guest full ownership of PMU hardware
     (context switched around the fastpath run loop) and allowing
     direct access to data MSRs and PMCs (restricted by the vPMU model).

     KVM still intercepts access to control registers, e.g. to enforce
     event filtering and to prevent the guest from profiling sensitive
     host state. This is more accurate, since it has no risk of
     contention and thus dropped events, and also has significantly less
     overhead.

     For more information, see the commit message for merge commit
     bf2c3138ae ("Merge tag 'kvm-x86-pmu-6.20' ...")

   - Disallow changing the virtual CPU model if L2 is active, for all
     the same reasons KVM disallows changing the model after the first
     KVM_RUN

   - Fix a bug where KVM would incorrectly reject host accesses to PV
     MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
     even if those MSRs were advertised as supported to userspace

   - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
     where KVM would attempt to read CR3 when configuring an async #PF
     entry

   - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
     (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
     Only a few exports are intended for external usage, and those are
     allowed explicitly

   - When checking nested events after a vCPU is unblocked, ignore
     -EBUSY instead of WARNing. Userspace can sometimes put the vCPU
     into what should be an impossible state, and spurious exit to
     userspace on -EBUSY does not really do anything to solve the issue

   - Also throw in the towel and drop the WARN on INIT/SIPI being
     blocked when vCPU is in Wait-For-SIPI, which also resulted in
     playing whack-a-mole with syzkaller stuffing architecturally
     impossible states into KVM

   - Add support for new Intel instructions that don't require anything
     beyond enumerating feature flags to userspace

   - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2

   - Add WARNs to guard against modifying KVM's CPU caps outside of the
     intended setup flow, as nested VMX in particular is sensitive to
     unexpected changes in KVM's golden configuration

   - Add a quirk to allow userspace to opt-in to actually suppress EOI
     broadcasts when the suppression feature is enabled by the guest
     (currently limited to split IRQCHIP, i.e. userspace I/O APIC).
     Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
     option as some userspaces have come to rely on KVM's buggy behavior
     (KVM advertises Suppress EOI Broadcast irrespective of whether or
     not userspace I/O APIC supports Directed EOIs)

   - Clean up KVM's handling of marking mapped vCPU pages dirty

   - Drop a pile of *ancient* sanity checks hidden behind KVM's unused
     ASSERT() macro, most of which could be trivially triggered by the
     guest and/or user, and all of which were useless

   - Fold "struct dest_map" into its sole user, "struct rtc_status", to
     make it more obvious what the weird parameter is used for, and to
     allow dropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n

   - Bury all of ioapic.h, i8254.h and related ioctls (including
     KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y

   - Add a regression test for recent APICv update fixes

   - Handle "hardware APIC ISR", a.k.a. SVI, updates in
     kvm_apic_update_apicv() to consolidate the updates, and to
     co-locate SVI updates with the updates for KVM's own cache of ISR
     information

   - Drop a dead function declaration

   - Minor cleanups

  x86 (Intel):

   - Rework KVM's handling of VMCS updates while L2 is active to
     temporarily switch to vmcs01 instead of deferring the update until
     the next nested VM-Exit.

     The deferred updates approach directly contributed to several bugs,
     was proving to be a maintenance burden due to the difficulty in
     auditing the correctness of deferred updates, and was polluting
     "struct nested_vmx" with a growing pile of booleans

   - Fix an SGX bug where KVM would incorrectly try to handle EPCM page
     faults, and instead always reflect them into the guest. Since KVM
     doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
     interference and can't be resolved by KVM

   - Fix a bug where KVM would register its posted interrupt wakeup
     handler even if loading kvm-intel.ko ultimately failed

   - Disallow access to vmcs12 fields that aren't fully supported,
     mostly to avoid weirdness and complexity for FRED and other
     features, where KVM wants to enable VMCS shadowing for fields that
     conditionally exist

   - Print out the "bad" offsets and values if kvm-intel.ko refuses to
     load (or refuses to online a CPU) due to a VMCS config mismatch

  x86 (AMD):

   - Drop a user-triggerable WARN on nested_svm_load_cr3() failure

   - Add support for virtualizing ERAPS. Note, correct virtualization of
     ERAPS relies on an upcoming, publicly announced change in the APM
     to reduce the set of conditions where hardware (i.e. KVM) *must*
     flush the RAP

   - Ignore nSVM intercepts for instructions that are not supported
     according to L1's virtual CPU model

   - Add support for expedited writes to the fast MMIO bus, a la VMX's
     fastpath for EPT Misconfig

   - Don't set GIF when clearing EFER.SVME, as GIF exists independently
     of SVM, and allow userspace to restore nested state with GIF=0

   - Treat exit_code as an unsigned 64-bit value throughout all of KVM

   - Add support for fetching SNP certificates from userspace

   - Fix a bug where KVM would use vmcb02 instead of vmcb01 when
     emulating VMLOAD or VMSAVE on behalf of L2

   - Misc fixes and cleanups

  x86 selftests:

   - Add a regression test for TPR<=>CR8 synchronization and IRQ masking

   - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
     support, and extend x86's infrastructure to support EPT and NPT
     (for L2 guests)

   - Extend several nested VMX tests to also cover nested SVM

   - Add a selftest for nested VMLOAD/VMSAVE

   - Rework the nested dirty log test, originally added as a regression
     test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
     improve test coverage and to hopefully make the test easier to
     understand and maintain

  guest_memfd:

   - Remove kvm_gmem_populate()'s preparation tracking and half-baked
     hugepage handling. SEV/SNP was the only user of the tracking and it
     can do it via the RMP

   - Retroactively document and enforce (for SNP) that
     KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
     source page to be 4KiB aligned, to avoid non-trivial complexity for
     something that no known VMM seems to be doing and to avoid an API
     special case for in-place conversion, which simply can't support
     unaligned sources

   - When populating guest_memfd memory, GUP the source page in common
     code and pass the refcounted page to the vendor callback, instead
     of letting vendor code do the heavy lifting. Doing so avoids a
     looming deadlock bug with in-place conversion, due to an AB-BA
     conflict between mmap_lock and guest_memfd's filemap invalidate
     lock

  Generic:

   - Fix a bug where KVM would ignore the vCPU's selected address space
     when creating a vCPU-specific mapping of guest memory. Actually
     this bug could not be hit even on x86, the only architecture with
     multiple address spaces, but it's a bug nevertheless"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
  KVM: s390: Increase permitted SE header size to 1 MiB
  MAINTAINERS: Replace backup for s390 vfio-pci
  KVM: s390: vsie: Fix race in acquire_gmap_shadow()
  KVM: s390: vsie: Fix race in walk_guest_tables()
  KVM: s390: Use guest address to mark guest page dirty
  irqchip/riscv-imsic: Adjust the number of available guest irq files
  RISC-V: KVM: Transparent huge page support
  RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
  RISC-V: KVM: Allow Zalasr extensions for Guest/VM
  KVM: riscv: selftests: Add riscv vm satp modes
  KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
  riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
  RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
  RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
  RISC-V: KVM: Remove unnecessary 'ret' assignment
  KVM: s390: Add explicit padding to struct kvm_s390_keyop
  KVM: LoongArch: selftests: Add steal time test case
  LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
  LoongArch: KVM: Add paravirt preempt feature in hypervisor side
  ...
This commit is contained in:
Linus Torvalds 2026-02-13 11:31:15 -08:00
commit cb5573868e
212 changed files with 11541 additions and 7834 deletions

@@ -3100,6 +3100,26 @@ Kernel parameters
Default is Y (on).
kvm.enable_pmu=[KVM,X86]
If enabled, KVM will virtualize PMU functionality based
on the virtual CPU model defined by userspace. This
can be overridden on a per-VM basis via
KVM_CAP_PMU_CAPABILITY.
If disabled, KVM will not virtualize PMU functionality,
e.g. MSRs, PMCs, PMIs, etc., even if userspace defines
a virtual CPU model that contains PMU assets.
Note, KVM's vPMU support implicitly requires running
with an in-kernel local APIC, e.g. to deliver PMIs to
the guest. Running without an in-kernel local APIC is
not supported, though KVM will allow such a combination
(with severely degraded functionality).
See also enable_mediated_pmu.
Default is Y (on).
kvm.enable_virt_at_load=[KVM,ARM64,LOONGARCH,MIPS,RISCV,X86]
If enabled, KVM will enable virtualization in hardware
when KVM is loaded, and disable virtualization when KVM
@@ -3146,6 +3166,35 @@ Kernel parameters
If the value is 0 (the default), KVM will pick a period based
on the ratio, such that a page is zapped after 1 hour on average.
kvm-{amd,intel}.enable_mediated_pmu=[KVM,AMD,INTEL]
If enabled, KVM will provide a mediated virtual PMU,
instead of the default perf-based virtual PMU (if
kvm.enable_pmu is true and PMU is enumerated via the
virtual CPU model).
With a perf-based vPMU, KVM operates as a user of perf,
i.e. emulates guest PMU counters using perf events.
KVM-created perf events are managed by perf as regular
(guest-only) events, e.g. are scheduled in/out, contend
for hardware resources, etc. Using a perf-based vPMU
allows guest and host usage of the PMU to co-exist, but
incurs non-trivial overhead and can result in silently
dropped guest events (due to resource contention).
With a mediated vPMU, hardware PMU state is context
switched around the world switch to/from the guest.
KVM mediates which events the guest can utilize, but
gives the guest direct access to all other PMU assets
when possible (KVM may intercept some accesses if the
virtual CPU model provides a subset of hardware PMU
functionality). Using a mediated vPMU significantly
reduces PMU virtualization overhead and eliminates lost
guest events, but is mutually exclusive with using perf
to profile KVM guests and adds latency to most VM-Exits
(to context switch PMU state).
Default is N (off).
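As a concrete illustration, the two knobs documented above could be set persistently through modprobe configuration. This is a hypothetical fragment (the file name is arbitrary); the option names follow the parameters described above:

```
# /etc/modprobe.d/kvm-pmu.conf (hypothetical example)
# Keep PMU virtualization enabled (this is the default).
options kvm enable_pmu=1
# Opt in to the mediated vPMU on Intel hosts. Note this is mutually
# exclusive with using perf to profile KVM guests.
options kvm_intel enable_mediated_pmu=1
```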
kvm-amd.nested= [KVM,AMD] Control nested virtualization feature in
KVM/SVM. Default is 1 (enabled).

@@ -6518,6 +6518,40 @@ the capability to be present.
`flags` must currently be zero.
4.144 KVM_S390_KEYOP
--------------------
:Capability: KVM_CAP_S390_KEYOP
:Architectures: s390
:Type: vm ioctl
:Parameters: struct kvm_s390_keyop (in/out)
:Returns: 0 in case of success, < 0 on error
The specified key operation is performed on the given guest address. The
previous storage key (or the relevant part thereof) will be returned in
`key`.
::
struct kvm_s390_keyop {
__u64 guest_addr;
__u8 key;
__u8 operation;
};
Currently supported values for ``operation``:
KVM_S390_KEYOP_ISKE
Returns the storage key for the guest address ``guest_addr`` in ``key``.
KVM_S390_KEYOP_RRBE
Resets the reference bit for the guest address ``guest_addr``, returning the
R and C bits of the old storage key in ``key``; the remaining fields of
the storage key will be set to 0.
KVM_S390_KEYOP_SSKE
Sets the storage key for the guest address ``guest_addr`` to the key
specified in ``key``, returning the previous value in ``key``.
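For illustration, the ioctl payload above can be sketched as a packed byte layout. This is a hypothetical sketch: the documented fields are guest_addr (u64), key (u8) and operation (u8), but the amount of explicit trailing padding (six bytes, giving a 16-byte struct) and the ISKE operation code value are assumptions, not taken from the excerpt:

```python
import struct

# Assumed operation code value, for illustration only.
KVM_S390_KEYOP_ISKE = 0

def pack_keyop(guest_addr: int, key: int, operation: int) -> bytes:
    # "=QBB6x": native byte order, u64 + two u8 fields + six assumed
    # pad bytes, giving a 16-byte buffer (a multiple of 8, as explicit
    # padding in a uapi struct would ensure).
    return struct.pack("=QBB6x", guest_addr, key, operation)

buf = pack_keyop(0x1000, 5, KVM_S390_KEYOP_ISKE)
assert len(buf) == 16   # padded size
assert buf[8] == 5      # the u8 'key' field sits right after guest_addr
```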
.. _kvm_run:
@@ -7382,6 +7416,50 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
::
/* KVM_EXIT_SNP_REQ_CERTS */
struct kvm_exit_snp_req_certs {
__u64 gpa;
__u64 npages;
__u64 ret;
};
KVM_EXIT_SNP_REQ_CERTS indicates an SEV-SNP guest with certificate-fetching
enabled (see KVM_SEV_SNP_ENABLE_REQ_CERTS) has generated an Extended Guest
Request NAE #VMGEXIT (SNP_GUEST_REQUEST) with message type MSG_REPORT_REQ,
i.e. has requested an attestation report from firmware, and would like the
certificate data corresponding to the attestation report signature to be
provided by the hypervisor as part of the request.
To allow for userspace to provide the certificate, the 'gpa' and 'npages'
are forwarded verbatim from the guest request (the RAX and RBX GHCB fields
respectively). 'ret' is not an "output" from KVM, and is always '0' on
exit. KVM verifies the 'gpa' is 4KiB aligned prior to exiting to userspace,
but otherwise the information from the guest isn't validated.
Upon the next KVM_RUN, e.g. after userspace has serviced the request (or not),
KVM will complete the #VMGEXIT, using the 'ret' field to determine whether to
signal success or failure to the guest, and on failure, what reason code will
be communicated via SW_EXITINFO2. If 'ret' is set to an unsupported value (see
the table below), KVM_RUN will fail with -EINVAL. For a 'ret' of 'ENOSPC', KVM
also consumes the 'npages' field, i.e. userspace can use the field to inform
the guest of the number of pages needed to hold all the certificate data.
The supported 'ret' values and their respective SW_EXITINFO2 encodings:
====== =============================================================
0 0x0, i.e. success. KVM will emit an SNP_GUEST_REQUEST command
to SNP firmware.
ENOSPC 0x0000000100000000, i.e. not enough guest pages to hold the
certificate table and certificate data. KVM will also set the
RBX field in the GHCB to 'npages'.
EAGAIN 0x0000000200000000, i.e. the host is busy and the guest should
retry the request.
EIO 0xffffffff00000000, for all other errors (this return code is
a KVM-defined hypervisor value, as allowed by the GHCB)
====== =============================================================
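The table above can be captured as a small lookup. A minimal sketch, with Python's errno module standing in for the userspace 'ret' values; the encodings are taken verbatim from the table, and the ValueError models KVM_RUN failing with -EINVAL:

```python
import errno

# Documented 'ret' -> SW_EXITINFO2 encodings from the table above.
SW_EXITINFO2 = {
    0:            0x0,                 # success
    errno.ENOSPC: 0x0000000100000000,  # guest buffer too small
    errno.EAGAIN: 0x0000000200000000,  # host busy, guest should retry
    errno.EIO:    0xFFFFFFFF00000000,  # KVM-defined generic error
}

def sw_exitinfo2(ret: int) -> int:
    # Unsupported 'ret' values make KVM_RUN fail with -EINVAL; model
    # that here as a ValueError.
    if ret not in SW_EXITINFO2:
        raise ValueError("unsupported ret: KVM_RUN would return -EINVAL")
    return SW_EXITINFO2[ret]

assert sw_exitinfo2(errno.ENOSPC) == 1 << 32
```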
.. _cap_enable:
@@ -7864,8 +7942,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7878,6 +7958,28 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI
Broadcast to the guest and suppress LAPIC EOI broadcasts when the guest
sets the Suppress EOI Broadcast bit in the SPIV register. This flag is
supported only when using a split IRQCHIP.
Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise
support to the guest.
Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
or KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky
behavior will be used by KVM: in split IRQCHIP mode, KVM will advertise
support for Suppress EOI Broadcasts but not actually suppress EOI
broadcasts; for in-kernel IRQCHIP mode, KVM will not advertise support for
Suppress EOI Broadcasts.
Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error,
as will setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST without a split
IRQCHIP.
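A minimal sketch of the flag semantics just described; the bit positions come from the excerpt above, while the validation helper is purely illustrative, not KVM's actual code:

```python
# Flag values as documented for KVM_CAP_X2APIC_API.
KVM_X2APIC_API_USE_32BIT_IDS              = 1 << 0
KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK    = 1 << 1
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST  = 1 << 2
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST = 1 << 3

def check_x2apic_flags(flags: int, split_irqchip: bool) -> bool:
    """Mirror the documented EINVAL cases: the enable/disable flags are
    mutually exclusive, and enabling Suppress EOI Broadcast requires a
    split IRQCHIP (userspace I/O APIC)."""
    enable  = bool(flags & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
    disable = bool(flags & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
    if enable and disable:
        return False  # both set -> EINVAL
    if enable and not split_irqchip:
        return False  # enable without split IRQCHIP -> EINVAL
    return True

assert check_x2apic_flags(KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST, True)
```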
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
@@ -9316,6 +9418,14 @@ The presence of this capability indicates that KVM_RUN will update the
KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the
vCPU was executing nested guest code when it exited.
8.46 KVM_CAP_S390_KEYOP
-----------------------
:Architectures: s390
The presence of this capability indicates that the KVM_S390_KEYOP ioctl is
available.
KVM exits with the register state of either the L1 or L2 guest
depending on which executed at the time of an exit. Userspace must
take care to differentiate between these cases.

@@ -523,7 +523,7 @@ Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry
struct kvm_sev_snp_launch_update {
__u64 gfn_start; /* Guest page number to load/encrypt data into. */
__u64 uaddr; /* 4k-aligned address of data to be loaded/encrypted. */
__u64 len; /* 4k-aligned length in bytes to copy into guest memory.*/
__u8 type; /* The type of the guest pages being initialized. */
__u8 pad0;
@@ -572,6 +572,52 @@ Returns: 0 on success, -negative on error
See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further
details on the input parameters in ``struct kvm_sev_snp_launch_finish``.
21. KVM_SEV_SNP_ENABLE_REQ_CERTS
--------------------------------
The KVM_SEV_SNP_ENABLE_REQ_CERTS command will configure KVM to exit to
userspace with a ``KVM_EXIT_SNP_REQ_CERTS`` exit type as part of handling
a guest attestation report, which allows userspace to provide a
certificate corresponding to the endorsement key used by firmware to sign
that attestation report.
Returns: 0 on success, -negative on error
NOTE: The endorsement key used by firmware may change as a result of
management activities like updating SEV-SNP firmware or loading new
endorsement keys, so some care should be taken to keep the returned
certificate data in sync with the actual endorsement key in use by
firmware at the time the attestation request is sent to SNP firmware. The
recommended scheme to do this is to use file locking (e.g. via fcntl()'s
F_OFD_SETLK) in the following manner:
- Prior to obtaining/providing certificate data as part of servicing an
exit type of ``KVM_EXIT_SNP_REQ_CERTS``, the VMM should obtain a
shared/read or exclusive/write lock on the certificate blob file before
reading it and returning it to KVM, and continue to hold the lock until
the attestation request is actually sent to firmware. To facilitate
this, the VMM can set the ``immediate_exit`` flag of kvm_run just after
supplying the certificate data, and just before resuming the vCPU.
This ensures the vCPU exits again to userspace with ``-EINTR``
after it finishes fetching the attestation request from firmware, at
which point the VMM can safely drop the file lock.
- Tools/libraries that perform updates to SNP firmware TCB values or
endorsement keys (e.g. via /dev/sev interfaces such as ``SNP_COMMIT``,
``SNP_SET_CONFIG``, or ``SNP_VLEK_LOAD``, see
Documentation/virt/coco/sev-guest.rst for more details) in such a way
that the certificate blob needs to be updated, should similarly take an
exclusive lock on the certificate blob for the duration of any updates
to endorsement keys or the certificate blob contents to ensure that
VMMs using the above scheme will not return certificate blob data that
is out of sync with the endorsement key used by firmware at the time
the attestation request is actually issued.
This scheme is recommended so that tools can use a fairly generic/natural
approach to synchronizing firmware/certificate updates via file-locking,
which should make it easier to maintain interoperability across
tools/VMMs/vendors.
Device attribute API
====================
@ -579,11 +625,15 @@ Attributes of the SEV implementation can be retrieved through the
``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
device node, using group ``KVM_X86_GRP_SEV``.
Currently only one attribute is implemented:
The following attributes are currently implemented:
* ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
* ``KVM_X86_SEV_SNP_REQ_CERTS``: return a value of 1 if the kernel supports the
``KVM_EXIT_SNP_REQ_CERTS`` exit, which allows for fetching endorsement key
certificates from userspace for each SNP attestation request the guest issues.
Firmware Management
===================


@ -156,7 +156,7 @@ KVM_TDX_INIT_MEM_REGION
:Returns: 0 on success, <0 on error
Initialize @nr_pages TDX guest private memory starting from @gpa with userspace
provided data from @source_addr.
provided data from @source_addr. @source_addr must be PAGE_SIZE-aligned.
Note, before calling this sub command, memory attribute of the range
[gpa, gpa + nr_pages] needs to be private. Userspace can use


@ -14012,14 +14012,12 @@ L: kvm@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
F: Documentation/virt/kvm/s390*
F: arch/s390/include/asm/gmap.h
F: arch/s390/include/asm/gmap_helpers.h
F: arch/s390/include/asm/kvm*
F: arch/s390/include/uapi/asm/kvm*
F: arch/s390/include/uapi/asm/uvdevice.h
F: arch/s390/kernel/uv.c
F: arch/s390/kvm/
F: arch/s390/mm/gmap.c
F: arch/s390/mm/gmap_helpers.c
F: drivers/s390/char/uvdevice.c
F: tools/testing/selftests/drivers/s390x/uvdevice/
@ -23300,7 +23298,8 @@ F: include/uapi/linux/vfio_ccw.h
S390 VFIO-PCI DRIVER
M: Matthew Rosato <mjrosato@linux.ibm.com>
M: Eric Farman <farman@linux.ibm.com>
M: Farhan Ali <alifm@linux.ibm.com>
R: Eric Farman <farman@linux.ibm.com>
L: linux-s390@vger.kernel.org
L: kvm@vger.kernel.org
S: Supported


@ -235,7 +235,6 @@
ICH_HFGRTR_EL2_ICC_ICSR_EL1 | \
ICH_HFGRTR_EL2_ICC_PCR_EL1 | \
ICH_HFGRTR_EL2_ICC_HPPIR_EL1 | \
ICH_HFGRTR_EL2_ICC_HAPR_EL1 | \
ICH_HFGRTR_EL2_ICC_CR0_EL1 | \
ICH_HFGRTR_EL2_ICC_IDRn_EL1 | \
ICH_HFGRTR_EL2_ICC_APR_EL1)


@ -101,7 +101,7 @@
HCR_BSU_IS | HCR_FB | HCR_TACR | \
HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 | HCR_TID1)
#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK)
#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H | HCR_AMO | HCR_IMO | HCR_FMO)
@ -124,37 +124,7 @@
#define TCR_EL2_MASK (TCR_EL2_TG0_MASK | TCR_EL2_SH0_MASK | \
TCR_EL2_ORGN0_MASK | TCR_EL2_IRGN0_MASK)
/* VTCR_EL2 Registers bits */
#define VTCR_EL2_DS TCR_EL2_DS
#define VTCR_EL2_RES1 (1U << 31)
#define VTCR_EL2_HD (1 << 22)
#define VTCR_EL2_HA (1 << 21)
#define VTCR_EL2_PS_SHIFT TCR_EL2_PS_SHIFT
#define VTCR_EL2_PS_MASK TCR_EL2_PS_MASK
#define VTCR_EL2_TG0_MASK TCR_TG0_MASK
#define VTCR_EL2_TG0_4K TCR_TG0_4K
#define VTCR_EL2_TG0_16K TCR_TG0_16K
#define VTCR_EL2_TG0_64K TCR_TG0_64K
#define VTCR_EL2_SH0_MASK TCR_SH0_MASK
#define VTCR_EL2_SH0_INNER TCR_SH0_INNER
#define VTCR_EL2_ORGN0_MASK TCR_ORGN0_MASK
#define VTCR_EL2_ORGN0_WBWA TCR_ORGN0_WBWA
#define VTCR_EL2_IRGN0_MASK TCR_IRGN0_MASK
#define VTCR_EL2_IRGN0_WBWA TCR_IRGN0_WBWA
#define VTCR_EL2_SL0_SHIFT 6
#define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT)
#define VTCR_EL2_T0SZ_MASK 0x3f
#define VTCR_EL2_VS_SHIFT 19
#define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT)
#define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT)
#define VTCR_EL2_T0SZ(x) TCR_T0SZ(x)
/*
* We configure the Stage-2 page tables to always restrict the IPA space to be
* 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
* not known to exist and will break with this configuration.
*
* The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu.
*
* Note that when using 4K pages, we concatenate two first level page tables
@ -162,9 +132,6 @@
*
*/
#define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
/*
* VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
* Interestingly, it depends on the page size.
@ -196,30 +163,35 @@
*/
#ifdef CONFIG_ARM64_64K_PAGES
#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K
#define VTCR_EL2_TGRAN 64K
#define VTCR_EL2_TGRAN_SL0_BASE 3UL
#elif defined(CONFIG_ARM64_16K_PAGES)
#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K
#define VTCR_EL2_TGRAN 16K
#define VTCR_EL2_TGRAN_SL0_BASE 3UL
#else /* 4K */
#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K
#define VTCR_EL2_TGRAN 4K
#define VTCR_EL2_TGRAN_SL0_BASE 2UL
#endif
#define VTCR_EL2_LVLS_TO_SL0(levels) \
((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT)
FIELD_PREP(VTCR_EL2_SL0, (VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))))
#define VTCR_EL2_SL0_TO_LVLS(sl0) \
((sl0) + 4 - VTCR_EL2_TGRAN_SL0_BASE)
#define VTCR_EL2_LVLS(vtcr) \
VTCR_EL2_SL0_TO_LVLS(((vtcr) & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT)
VTCR_EL2_SL0_TO_LVLS(FIELD_GET(VTCR_EL2_SL0, (vtcr)))
#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN)
#define VTCR_EL2_IPA(vtcr) (64 - ((vtcr) & VTCR_EL2_T0SZ_MASK))
#define VTCR_EL2_FLAGS (SYS_FIELD_PREP_ENUM(VTCR_EL2, SH0, INNER) | \
SYS_FIELD_PREP_ENUM(VTCR_EL2, ORGN0, WBWA) | \
SYS_FIELD_PREP_ENUM(VTCR_EL2, IRGN0, WBWA) | \
SYS_FIELD_PREP_ENUM(VTCR_EL2, TG0, VTCR_EL2_TGRAN) | \
VTCR_EL2_RES1)
#define VTCR_EL2_IPA(vtcr) (64 - FIELD_GET(VTCR_EL2_T0SZ, (vtcr)))
/*
* ARM VMSAv8-64 defines an algorithm for finding the translation table
@ -344,6 +316,8 @@
#define PAR_TO_HPFAR(par) \
(((par) & GENMASK_ULL(52 - 1, 12)) >> 8)
#define FAR_TO_FIPA_OFFSET(far) ((far) & GENMASK_ULL(11, 0))
#define ECN(x) { ESR_ELx_EC_##x, #x }
#define kvm_arm_exception_class \


@ -300,8 +300,6 @@ void kvm_get_kimage_voffset(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void kvm_compute_final_ctr_el0(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void kvm_pan_patch_el2_entry(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
u64 elr_phys, u64 par, uintptr_t vcpu, u64 far, u64 hpfar);


@ -45,6 +45,7 @@ bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
void kvm_skip_instr32(struct kvm_vcpu *vcpu);
void kvm_inject_undefined(struct kvm_vcpu *vcpu);
void kvm_inject_sync(struct kvm_vcpu *vcpu, u64 esr);
int kvm_inject_serror_esr(struct kvm_vcpu *vcpu, u64 esr);
int kvm_inject_sea(struct kvm_vcpu *vcpu, bool iabt, u64 addr);
int kvm_inject_dabt_excl_atomic(struct kvm_vcpu *vcpu, u64 addr);


@ -201,7 +201,7 @@ struct kvm_s2_mmu {
* host to parse the guest S2.
* This either contains:
* - the virtual VTTBR programmed by the guest hypervisor with
* CnP cleared
* CnP cleared
* - The value 1 (VMID=0, BADDR=0, CnP=1) if invalid
*
* We also cache the full VTCR which gets used for TLB invalidation,
@ -373,9 +373,6 @@ struct kvm_arch {
/* Maximum number of counters for the guest */
u8 nr_pmu_counters;
/* Iterator for idreg debugfs */
u8 idreg_debugfs_iter;
/* Hypercall features firmware registers' descriptor */
struct kvm_smccc_features smccc_feat;
struct maple_tree smccc_filter;
@ -495,7 +492,6 @@ enum vcpu_sysreg {
DBGVCR32_EL2, /* Debug Vector Catch Register */
/* EL2 registers */
SCTLR_EL2, /* System Control Register (EL2) */
ACTLR_EL2, /* Auxiliary Control Register (EL2) */
CPTR_EL2, /* Architectural Feature Trap Register (EL2) */
HACR_EL2, /* Hypervisor Auxiliary Control Register */
@ -526,6 +522,7 @@ enum vcpu_sysreg {
/* Anything from this can be RES0/RES1 sanitised */
MARKER(__SANITISED_REG_START__),
SCTLR_EL2, /* System Control Register (EL2) */
TCR2_EL2, /* Extended Translation Control Register (EL2) */
SCTLR2_EL2, /* System Control Register 2 (EL2) */
MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
@ -626,18 +623,45 @@ enum vcpu_sysreg {
NR_SYS_REGS /* Nothing after this line! */
};
struct kvm_sysreg_masks {
struct {
u64 res0;
u64 res1;
} mask[NR_SYS_REGS - __SANITISED_REG_START__];
struct resx {
u64 res0;
u64 res1;
};
struct kvm_sysreg_masks {
struct resx mask[NR_SYS_REGS - __SANITISED_REG_START__];
};
static inline struct resx __kvm_get_sysreg_resx(struct kvm_arch *arch,
enum vcpu_sysreg sr)
{
struct kvm_sysreg_masks *masks;
masks = arch->sysreg_masks;
if (likely(masks &&
sr >= __SANITISED_REG_START__ && sr < NR_SYS_REGS))
return masks->mask[sr - __SANITISED_REG_START__];
return (struct resx){};
}
#define kvm_get_sysreg_resx(k, sr) __kvm_get_sysreg_resx(&(k)->arch, (sr))
static inline void __kvm_set_sysreg_resx(struct kvm_arch *arch,
enum vcpu_sysreg sr, struct resx resx)
{
arch->sysreg_masks->mask[sr - __SANITISED_REG_START__] = resx;
}
#define kvm_set_sysreg_resx(k, sr, resx) \
__kvm_set_sysreg_resx(&(k)->arch, (sr), (resx))
struct fgt_masks {
const char *str;
u64 mask;
u64 nmask;
u64 res0;
u64 res1;
};
extern struct fgt_masks hfgrtr_masks;
@ -710,11 +734,11 @@ struct cpu_sve_state {
struct kvm_host_data {
#define KVM_HOST_DATA_FLAG_HAS_SPE 0
#define KVM_HOST_DATA_FLAG_HAS_TRBE 1
#define KVM_HOST_DATA_FLAG_TRBE_ENABLED 4
#define KVM_HOST_DATA_FLAG_EL1_TRACING_CONFIGURED 5
#define KVM_HOST_DATA_FLAG_VCPU_IN_HYP_CONTEXT 6
#define KVM_HOST_DATA_FLAG_L1_VNCR_MAPPED 7
#define KVM_HOST_DATA_FLAG_HAS_BRBE 8
#define KVM_HOST_DATA_FLAG_TRBE_ENABLED 2
#define KVM_HOST_DATA_FLAG_EL1_TRACING_CONFIGURED 3
#define KVM_HOST_DATA_FLAG_VCPU_IN_HYP_CONTEXT 4
#define KVM_HOST_DATA_FLAG_L1_VNCR_MAPPED 5
#define KVM_HOST_DATA_FLAG_HAS_BRBE 6
unsigned long flags;
struct kvm_cpu_context host_ctxt;
@ -1606,7 +1630,7 @@ static inline bool kvm_arch_has_irq_bypass(void)
}
void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt);
void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1);
struct resx get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg);
void check_feature_map(void);
void kvm_vcpu_load_fgt(struct kvm_vcpu *vcpu);
@ -1655,4 +1679,6 @@ static __always_inline enum fgt_group_id __fgt_reg_to_group_id(enum vcpu_sysreg
p; \
})
long kvm_get_cap_for_kvm_ioctl(unsigned int ioctl, long *ext);
#endif /* __ARM64_KVM_HOST_H__ */


@ -103,6 +103,7 @@ alternative_cb_end
void kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void kvm_compute_layout(void);
u32 kvm_hyp_va_bits(void);
void kvm_apply_hyp_relocations(void);
#define __hyp_pa(x) (((phys_addr_t)(x)) + hyp_physvirt_offset)
@ -185,7 +186,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
int __init kvm_mmu_init(u32 *hyp_va_bits);
int __init kvm_mmu_init(u32 hyp_va_bits);
static inline void *__kvm_vector_slot2addr(void *base,
enum arm64_hyp_spectre_vector slot)


@ -87,15 +87,9 @@ typedef u64 kvm_pte_t;
#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
#define __KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
#define __KVM_PTE_LEAF_ATTR_HI_S1_UXN BIT(54)
#define __KVM_PTE_LEAF_ATTR_HI_S1_PXN BIT(53)
#define KVM_PTE_LEAF_ATTR_HI_S1_XN \
({ cpus_have_final_cap(ARM64_KVM_HVHE) ? \
(__KVM_PTE_LEAF_ATTR_HI_S1_UXN | \
__KVM_PTE_LEAF_ATTR_HI_S1_PXN) : \
__KVM_PTE_LEAF_ATTR_HI_S1_XN; })
#define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
#define KVM_PTE_LEAF_ATTR_HI_S1_UXN BIT(54)
#define KVM_PTE_LEAF_ATTR_HI_S1_PXN BIT(53)
#define KVM_PTE_LEAF_ATTR_HI_S2_XN GENMASK(54, 53)
@ -237,13 +231,12 @@ struct kvm_pgtable_mm_ops {
/**
* enum kvm_pgtable_stage2_flags - Stage-2 page-table flags.
* @KVM_PGTABLE_S2_NOFWB: Don't enforce Normal-WB even if the CPUs have
* ARM64_HAS_STAGE2_FWB.
* @KVM_PGTABLE_S2_IDMAP: Only use identity mappings.
* @KVM_PGTABLE_S2_AS_S1: Final memory attributes are that of Stage-1.
*/
enum kvm_pgtable_stage2_flags {
KVM_PGTABLE_S2_NOFWB = BIT(0),
KVM_PGTABLE_S2_IDMAP = BIT(1),
KVM_PGTABLE_S2_IDMAP = BIT(0),
KVM_PGTABLE_S2_AS_S1 = BIT(1),
};
/**


@ -9,6 +9,7 @@
#include <linux/arm_ffa.h>
#include <linux/memblock.h>
#include <linux/scatterlist.h>
#include <asm/kvm_host.h>
#include <asm/kvm_pgtable.h>
/* Maximum number of VMs that can co-exist under pKVM. */
@ -23,10 +24,12 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm);
int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);
/*
* This functions as an allow-list of protected VM capabilities.
* Features not explicitly allowed by this function are denied.
* Check whether the specific capability is allowed in pKVM.
*
* Certain features are allowed only for non-protected VMs in pKVM, which is why
* this takes the VM (kvm) as a parameter.
*/
static inline bool kvm_pvm_ext_allowed(long ext)
static inline bool kvm_pkvm_ext_allowed(struct kvm *kvm, long ext)
{
switch (ext) {
case KVM_CAP_IRQCHIP:
@ -42,11 +45,32 @@ static inline bool kvm_pvm_ext_allowed(long ext)
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
return true;
default:
case KVM_CAP_ARM_MTE:
return false;
default:
return !kvm || !kvm_vm_is_protected(kvm);
}
}
/*
* Check whether the KVM VM IOCTL is allowed in pKVM.
*
* Certain features are allowed only for non-protected VMs in pKVM, which is why
* this takes the VM (kvm) as a parameter.
*/
static inline bool kvm_pkvm_ioctl_allowed(struct kvm *kvm, unsigned int ioctl)
{
long ext;
int r;
r = kvm_get_cap_for_kvm_ioctl(ioctl, &ext);
if (WARN_ON_ONCE(r < 0))
return false;
return kvm_pkvm_ext_allowed(kvm, ext);
}
extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);


@ -175,19 +175,24 @@
#define MT_DEVICE_nGnRE 4
/*
* Memory types for Stage-2 translation
* Memory types for Stage-2 translation when HCR_EL2.FWB=0. See R_HMNDG,
* R_TNHFM, R_GQFSF and I_MCQKW for the details on how these attributes get
* combined with Stage-1.
*/
#define MT_S2_NORMAL 0xf
#define MT_S2_NORMAL_NC 0x5
#define MT_S2_DEVICE_nGnRE 0x1
#define MT_S2_AS_S1 MT_S2_NORMAL
/*
* Memory types for Stage-2 translation when ID_AA64MMFR2_EL1.FWB is 0001
* Stage-2 enforces Normal-WB and Device-nGnRE
* Memory types for Stage-2 translation when HCR_EL2.FWB=1. Stage-2 enforces
* Normal-WB and Device-nGnRE, unless we actively say that S1 wins. See
* R_VRJSW and R_RHWZM for details.
*/
#define MT_S2_FWB_NORMAL 6
#define MT_S2_FWB_NORMAL_NC 5
#define MT_S2_FWB_DEVICE_nGnRE 1
#define MT_S2_FWB_AS_S1 7
#ifdef CONFIG_ARM64_4K_PAGES
#define IOREMAP_MAX_ORDER (PUD_SHIFT)


@ -109,10 +109,10 @@ static inline bool __pure lpa2_is_enabled(void)
#define PAGE_KERNEL_EXEC __pgprot(_PAGE_KERNEL_EXEC)
#define PAGE_KERNEL_EXEC_CONT __pgprot(_PAGE_KERNEL_EXEC_CONT)
#define PAGE_S2_MEMATTR(attr, has_fwb) \
#define PAGE_S2_MEMATTR(attr) \
({ \
u64 __val; \
if (has_fwb) \
if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB)) \
__val = PTE_S2_MEMATTR(MT_S2_FWB_ ## attr); \
else \
__val = PTE_S2_MEMATTR(MT_S2_ ## attr); \


@ -504,7 +504,6 @@
#define SYS_VPIDR_EL2 sys_reg(3, 4, 0, 0, 0)
#define SYS_VMPIDR_EL2 sys_reg(3, 4, 0, 0, 5)
#define SYS_SCTLR_EL2 sys_reg(3, 4, 1, 0, 0)
#define SYS_ACTLR_EL2 sys_reg(3, 4, 1, 0, 1)
#define SYS_SCTLR2_EL2 sys_reg(3, 4, 1, 0, 3)
#define SYS_HCR_EL2 sys_reg(3, 4, 1, 1, 0)
@ -517,7 +516,6 @@
#define SYS_TTBR1_EL2 sys_reg(3, 4, 2, 0, 1)
#define SYS_TCR_EL2 sys_reg(3, 4, 2, 0, 2)
#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)
#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
@ -561,7 +559,6 @@
#define SYS_ICC_SRE_EL2 sys_reg(3, 4, 12, 9, 5)
#define SYS_ICH_EISR_EL2 sys_reg(3, 4, 12, 11, 3)
#define SYS_ICH_ELRSR_EL2 sys_reg(3, 4, 12, 11, 5)
#define SYS_ICH_VMCR_EL2 sys_reg(3, 4, 12, 11, 7)
#define __SYS__LR0_EL2(x) sys_reg(3, 4, 12, 12, x)
#define SYS_ICH_LR0_EL2 __SYS__LR0_EL2(0)
@ -838,12 +835,6 @@
#define SCTLR_ELx_A (BIT(1))
#define SCTLR_ELx_M (BIT(0))
/* SCTLR_EL2 specific flags. */
#define SCTLR_EL2_RES1 ((BIT(4)) | (BIT(5)) | (BIT(11)) | (BIT(16)) | \
(BIT(18)) | (BIT(22)) | (BIT(23)) | (BIT(28)) | \
(BIT(29)))
#define SCTLR_EL2_BT (BIT(36))
#ifdef CONFIG_CPU_BIG_ENDIAN
#define ENDIAN_SET_EL2 SCTLR_ELx_EE
#else
@ -989,26 +980,6 @@
#define ICH_LR_PRIORITY_SHIFT 48
#define ICH_LR_PRIORITY_MASK (0xffULL << ICH_LR_PRIORITY_SHIFT)
/* ICH_VMCR_EL2 bit definitions */
#define ICH_VMCR_ACK_CTL_SHIFT 2
#define ICH_VMCR_ACK_CTL_MASK (1 << ICH_VMCR_ACK_CTL_SHIFT)
#define ICH_VMCR_FIQ_EN_SHIFT 3
#define ICH_VMCR_FIQ_EN_MASK (1 << ICH_VMCR_FIQ_EN_SHIFT)
#define ICH_VMCR_CBPR_SHIFT 4
#define ICH_VMCR_CBPR_MASK (1 << ICH_VMCR_CBPR_SHIFT)
#define ICH_VMCR_EOIM_SHIFT 9
#define ICH_VMCR_EOIM_MASK (1 << ICH_VMCR_EOIM_SHIFT)
#define ICH_VMCR_BPR1_SHIFT 18
#define ICH_VMCR_BPR1_MASK (7 << ICH_VMCR_BPR1_SHIFT)
#define ICH_VMCR_BPR0_SHIFT 21
#define ICH_VMCR_BPR0_MASK (7 << ICH_VMCR_BPR0_SHIFT)
#define ICH_VMCR_PMR_SHIFT 24
#define ICH_VMCR_PMR_MASK (0xffUL << ICH_VMCR_PMR_SHIFT)
#define ICH_VMCR_ENG0_SHIFT 0
#define ICH_VMCR_ENG0_MASK (1 << ICH_VMCR_ENG0_SHIFT)
#define ICH_VMCR_ENG1_SHIFT 1
#define ICH_VMCR_ENG1_MASK (1 << ICH_VMCR_ENG1_SHIFT)
/*
* Permission Indirection Extension (PIE) permission encodings.
* Encodings with the _O suffix, have overlays applied (Permission Overlay Extension).


@ -2335,16 +2335,16 @@ static bool can_trap_icv_dir_el1(const struct arm64_cpu_capabilities *entry,
BUILD_BUG_ON(ARM64_HAS_ICH_HCR_EL2_TDIR <= ARM64_HAS_GICV3_CPUIF);
BUILD_BUG_ON(ARM64_HAS_ICH_HCR_EL2_TDIR <= ARM64_HAS_GICV5_LEGACY);
if (!this_cpu_has_cap(ARM64_HAS_GICV3_CPUIF) &&
!is_midr_in_range_list(has_vgic_v3))
return false;
if (!is_hyp_mode_available())
return false;
if (this_cpu_has_cap(ARM64_HAS_GICV5_LEGACY))
return true;
if (!this_cpu_has_cap(ARM64_HAS_GICV3_CPUIF) &&
!is_midr_in_range_list(has_vgic_v3))
return false;
if (is_kernel_in_hyp_mode())
res.a1 = read_sysreg_s(SYS_ICH_VTR_EL2);
else


@ -299,7 +299,7 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
isb
0:
init_el2_hcr HCR_HOST_NVHE_FLAGS
init_el2_hcr HCR_HOST_NVHE_FLAGS | HCR_ATA
init_el2_state
/* Hypervisor stub */


@ -86,7 +86,6 @@ KVM_NVHE_ALIAS(kvm_patch_vector_branch);
KVM_NVHE_ALIAS(kvm_update_va_mask);
KVM_NVHE_ALIAS(kvm_get_kimage_voffset);
KVM_NVHE_ALIAS(kvm_compute_final_ctr_el0);
KVM_NVHE_ALIAS(kvm_pan_patch_el2_entry);
KVM_NVHE_ALIAS(spectre_bhb_patch_loop_iter);
KVM_NVHE_ALIAS(spectre_bhb_patch_loop_mitigation_enable);
KVM_NVHE_ALIAS(spectre_bhb_patch_wa3);


@ -1056,10 +1056,14 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
ctxt->timer_id = timerid;
if (timerid == TIMER_VTIMER)
ctxt->offset.vm_offset = &kvm->arch.timer_data.voffset;
else
ctxt->offset.vm_offset = &kvm->arch.timer_data.poffset;
if (!kvm_vm_is_protected(vcpu->kvm)) {
if (timerid == TIMER_VTIMER)
ctxt->offset.vm_offset = &kvm->arch.timer_data.voffset;
else
ctxt->offset.vm_offset = &kvm->arch.timer_data.poffset;
} else {
ctxt->offset.vm_offset = NULL;
}
hrtimer_setup(&ctxt->hrtimer, kvm_hrtimer_expire, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
@ -1083,7 +1087,8 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
timer_context_init(vcpu, i);
/* Synchronize offsets across timers of a VM if not already provided */
if (!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &vcpu->kvm->arch.flags)) {
if (!vcpu_is_protected(vcpu) &&
!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &vcpu->kvm->arch.flags)) {
timer_set_offset(vcpu_vtimer(vcpu), kvm_phys_timer_read());
timer_set_offset(vcpu_ptimer(vcpu), 0);
}
@ -1687,6 +1692,9 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
if (offset->reserved)
return -EINVAL;
if (kvm_vm_is_protected(kvm))
return -EINVAL;
mutex_lock(&kvm->lock);
if (!kvm_trylock_all_vcpus(kvm)) {


@ -40,6 +40,7 @@
#include <asm/kvm_pkvm.h>
#include <asm/kvm_ptrauth.h>
#include <asm/sections.h>
#include <asm/stacktrace/nvhe.h>
#include <kvm/arm_hypercalls.h>
#include <kvm/arm_pmu.h>
@ -58,6 +59,51 @@ enum kvm_wfx_trap_policy {
static enum kvm_wfx_trap_policy kvm_wfi_trap_policy __read_mostly = KVM_WFX_NOTRAP_SINGLE_TASK;
static enum kvm_wfx_trap_policy kvm_wfe_trap_policy __read_mostly = KVM_WFX_NOTRAP_SINGLE_TASK;
/*
* Tracks KVM IOCTLs and their associated KVM capabilities.
*/
struct kvm_ioctl_cap_map {
unsigned int ioctl;
long ext;
};
/* Make KVM_CAP_NR_VCPUS the reference for features we have always supported */
#define KVM_CAP_ARM_BASIC KVM_CAP_NR_VCPUS
/*
* Sorted by ioctl to allow for potential binary search,
* though linear scan is sufficient for this size.
*/
static const struct kvm_ioctl_cap_map vm_ioctl_caps[] = {
{ KVM_CREATE_IRQCHIP, KVM_CAP_IRQCHIP },
{ KVM_ARM_SET_DEVICE_ADDR, KVM_CAP_ARM_SET_DEVICE_ADDR },
{ KVM_ARM_MTE_COPY_TAGS, KVM_CAP_ARM_MTE },
{ KVM_SET_DEVICE_ATTR, KVM_CAP_DEVICE_CTRL },
{ KVM_GET_DEVICE_ATTR, KVM_CAP_DEVICE_CTRL },
{ KVM_HAS_DEVICE_ATTR, KVM_CAP_DEVICE_CTRL },
{ KVM_ARM_SET_COUNTER_OFFSET, KVM_CAP_COUNTER_OFFSET },
{ KVM_ARM_GET_REG_WRITABLE_MASKS, KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES },
{ KVM_ARM_PREFERRED_TARGET, KVM_CAP_ARM_BASIC },
};
/*
* Set *ext to the capability.
* Return 0 if found, or -EINVAL if no IOCTL matches.
*/
long kvm_get_cap_for_kvm_ioctl(unsigned int ioctl, long *ext)
{
int i;
for (i = 0; i < ARRAY_SIZE(vm_ioctl_caps); i++) {
if (vm_ioctl_caps[i].ioctl == ioctl) {
*ext = vm_ioctl_caps[i].ext;
return 0;
}
}
return -EINVAL;
}
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_base);
@ -87,7 +133,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (cap->flags)
return -EINVAL;
if (kvm_vm_is_protected(kvm) && !kvm_pvm_ext_allowed(cap->cap))
if (is_protected_kvm_enabled() && !kvm_pkvm_ext_allowed(kvm, cap->cap))
return -EINVAL;
switch (cap->cap) {
@ -303,7 +349,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
int r;
if (kvm && kvm_vm_is_protected(kvm) && !kvm_pvm_ext_allowed(ext))
if (is_protected_kvm_enabled() && !kvm_pkvm_ext_allowed(kvm, ext))
return 0;
switch (ext) {
@ -1894,6 +1940,9 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
void __user *argp = (void __user *)arg;
struct kvm_device_attr attr;
if (is_protected_kvm_enabled() && !kvm_pkvm_ioctl_allowed(kvm, ioctl))
return -EINVAL;
switch (ioctl) {
case KVM_CREATE_IRQCHIP: {
int ret;
@ -2045,6 +2094,12 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
else
params->hcr_el2 = HCR_HOST_NVHE_FLAGS;
if (system_supports_mte())
params->hcr_el2 |= HCR_ATA;
else
params->hcr_el2 |= HCR_TID5;
if (cpus_have_final_cap(ARM64_KVM_HVHE))
params->hcr_el2 |= HCR_E2H;
params->vttbr = params->vtcr = 0;
@ -2358,7 +2413,7 @@ static int __init init_subsystems(void)
if (err)
goto out;
kvm_register_perf_callbacks(NULL);
kvm_register_perf_callbacks();
out:
if (err)
@ -2569,7 +2624,7 @@ static void pkvm_hyp_init_ptrauth(void)
/* Inits Hyp-mode on all online CPUs */
static int __init init_hyp_mode(void)
{
u32 hyp_va_bits;
u32 hyp_va_bits = kvm_hyp_va_bits();
int cpu;
int err = -ENOMEM;
@ -2583,7 +2638,7 @@ static int __init init_hyp_mode(void)
/*
* Allocate Hyp PGD and setup Hyp identity mapping
*/
err = kvm_mmu_init(&hyp_va_bits);
err = kvm_mmu_init(hyp_va_bits);
if (err)
goto out_err;


@ -16,14 +16,18 @@
*/
struct reg_bits_to_feat_map {
union {
u64 bits;
u64 *res0p;
u64 bits;
struct fgt_masks *masks;
};
#define NEVER_FGU BIT(0) /* Can trap, but never UNDEF */
#define CALL_FUNC BIT(1) /* Needs to evaluate tons of crap */
#define FIXED_VALUE BIT(2) /* RAZ/WI or RAO/WI in KVM */
#define RES0_POINTER BIT(3) /* Pointer to RES0 value instead of bits */
#define FORCE_RESx BIT(2) /* Unconditional RESx */
#define MASKS_POINTER BIT(3) /* Pointer to fgt_masks struct instead of bits */
#define AS_RES1 BIT(4) /* RES1 when not supported */
#define REQUIRES_E2H1 BIT(5) /* Add HCR_EL2.E2H RES1 as a pre-condition */
#define RES1_WHEN_E2H0 BIT(6) /* RES1 when E2H=0 and not supported */
#define RES1_WHEN_E2H1 BIT(7) /* RES1 when E2H=1 and not supported */
unsigned long flags;
@ -36,7 +40,6 @@ struct reg_bits_to_feat_map {
s8 lo_lim;
};
bool (*match)(struct kvm *);
bool (*fval)(struct kvm *, u64 *);
};
};
@ -69,13 +72,6 @@ struct reg_feat_map_desc {
.lo_lim = id ##_## fld ##_## lim \
}
#define __NEEDS_FEAT_2(m, f, w, fun, dummy) \
{ \
.w = (m), \
.flags = (f) | CALL_FUNC, \
.fval = (fun), \
}
#define __NEEDS_FEAT_1(m, f, w, fun) \
{ \
.w = (m), \
@ -83,17 +79,20 @@ struct reg_feat_map_desc {
.match = (fun), \
}
#define __NEEDS_FEAT_0(m, f, w, ...) \
{ \
.w = (m), \
.flags = (f), \
}
#define __NEEDS_FEAT_FLAG(m, f, w, ...) \
CONCATENATE(__NEEDS_FEAT_, COUNT_ARGS(__VA_ARGS__))(m, f, w, __VA_ARGS__)
#define NEEDS_FEAT_FLAG(m, f, ...) \
__NEEDS_FEAT_FLAG(m, f, bits, __VA_ARGS__)
#define NEEDS_FEAT_FIXED(m, ...) \
__NEEDS_FEAT_FLAG(m, FIXED_VALUE, bits, __VA_ARGS__, 0)
#define NEEDS_FEAT_RES0(p, ...) \
__NEEDS_FEAT_FLAG(p, RES0_POINTER, res0p, __VA_ARGS__)
#define NEEDS_FEAT_MASKS(p, ...) \
__NEEDS_FEAT_FLAG(p, MASKS_POINTER, masks, __VA_ARGS__)
/*
* Declare the dependency between a set of bits and a set of features,
@ -101,27 +100,32 @@ struct reg_feat_map_desc {
*/
#define NEEDS_FEAT(m, ...) NEEDS_FEAT_FLAG(m, 0, __VA_ARGS__)
/* Declare fixed RESx bits */
#define FORCE_RES0(m) NEEDS_FEAT_FLAG(m, FORCE_RESx)
#define FORCE_RES1(m) NEEDS_FEAT_FLAG(m, FORCE_RESx | AS_RES1)
/*
* Declare the dependency between a non-FGT register, a set of
* feature, and the set of individual bits it contains. This generates
* a struct reg_feat_map_desc.
* Declare the dependency between a non-FGT register, a set of features,
* and the set of individual bits it contains. This generates a struct
* reg_feat_map_desc.
*/
#define DECLARE_FEAT_MAP(n, r, m, f) \
struct reg_feat_map_desc n = { \
.name = #r, \
.feat_map = NEEDS_FEAT(~r##_RES0, f), \
.feat_map = NEEDS_FEAT(~(r##_RES0 | \
r##_RES1), f), \
.bit_feat_map = m, \
.bit_feat_map_sz = ARRAY_SIZE(m), \
}
/*
* Specialised version of the above for FGT registers that have their
* RES0 masks described as struct fgt_masks.
* RESx masks described as struct fgt_masks.
*/
#define DECLARE_FEAT_MAP_FGT(n, msk, m, f) \
struct reg_feat_map_desc n = { \
.name = #msk, \
.feat_map = NEEDS_FEAT_RES0(&msk.res0, f),\
.feat_map = NEEDS_FEAT_MASKS(&msk, f), \
.bit_feat_map = m, \
.bit_feat_map_sz = ARRAY_SIZE(m), \
}
@ -140,6 +144,7 @@ struct reg_feat_map_desc {
#define FEAT_AA64EL1 ID_AA64PFR0_EL1, EL1, IMP
#define FEAT_AA64EL2 ID_AA64PFR0_EL1, EL2, IMP
#define FEAT_AA64EL3 ID_AA64PFR0_EL1, EL3, IMP
#define FEAT_SEL2 ID_AA64PFR0_EL1, SEL2, IMP
#define FEAT_AIE ID_AA64MMFR3_EL1, AIE, IMP
#define FEAT_S2POE ID_AA64MMFR3_EL1, S2POE, IMP
#define FEAT_S1POE ID_AA64MMFR3_EL1, S1POE, IMP
@ -182,7 +187,6 @@ struct reg_feat_map_desc {
#define FEAT_RME ID_AA64PFR0_EL1, RME, IMP
#define FEAT_MPAM ID_AA64PFR0_EL1, MPAM, 1
#define FEAT_S2FWB ID_AA64MMFR2_EL1, FWB, IMP
#define FEAT_TME ID_AA64ISAR0_EL1, TME, IMP
#define FEAT_TWED ID_AA64MMFR1_EL1, TWED, IMP
#define FEAT_E2H0 ID_AA64MMFR4_EL1, E2H0, IMP
#define FEAT_SRMASK ID_AA64MMFR4_EL1, SRMASK, IMP
@ -201,6 +205,8 @@ struct reg_feat_map_desc {
#define FEAT_ASID2 ID_AA64MMFR4_EL1, ASID2, IMP
#define FEAT_MEC ID_AA64MMFR3_EL1, MEC, IMP
#define FEAT_HAFT ID_AA64MMFR1_EL1, HAFDBS, HAFT
#define FEAT_HDBSS ID_AA64MMFR1_EL1, HAFDBS, HDBSS
#define FEAT_HPDS2 ID_AA64MMFR1_EL1, HPDS, HPDS2
#define FEAT_BTI ID_AA64PFR1_EL1, BT, IMP
#define FEAT_ExS ID_AA64MMFR0_EL1, EXS, IMP
#define FEAT_IESB ID_AA64MMFR2_EL1, IESB, IMP
@ -218,6 +224,7 @@ struct reg_feat_map_desc {
#define FEAT_FGT2 ID_AA64MMFR0_EL1, FGT, FGT2
#define FEAT_MTPMU ID_AA64DFR0_EL1, MTPMU, IMP
#define FEAT_HCX ID_AA64MMFR1_EL1, HCX, IMP
#define FEAT_S2PIE ID_AA64MMFR3_EL1, S2PIE, IMP
static bool not_feat_aa64el3(struct kvm *kvm)
{
@ -305,21 +312,6 @@ static bool feat_trbe_mpam(struct kvm *kvm)
(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_MPAM));
}
static bool feat_asid2_e2h1(struct kvm *kvm)
{
return kvm_has_feat(kvm, FEAT_ASID2) && !kvm_has_feat(kvm, FEAT_E2H0);
}
static bool feat_d128_e2h1(struct kvm *kvm)
{
return kvm_has_feat(kvm, FEAT_D128) && !kvm_has_feat(kvm, FEAT_E2H0);
}
static bool feat_mec_e2h1(struct kvm *kvm)
{
return kvm_has_feat(kvm, FEAT_MEC) && !kvm_has_feat(kvm, FEAT_E2H0);
}
static bool feat_ebep_pmuv3_ss(struct kvm *kvm)
{
return kvm_has_feat(kvm, FEAT_EBEP) || kvm_has_feat(kvm, FEAT_PMUv3_SS);
@ -361,29 +353,26 @@ static bool feat_pmuv3p9(struct kvm *kvm)
return check_pmu_revision(kvm, V3P9);
}
static bool compute_hcr_rw(struct kvm *kvm, u64 *bits)
{
/* This is purely academic: AArch32 and NV are mutually exclusive */
if (bits) {
if (kvm_has_feat(kvm, FEAT_AA32EL1))
*bits &= ~HCR_EL2_RW;
else
*bits |= HCR_EL2_RW;
}
#define has_feat_s2tgran(k, s) \
((kvm_has_feat_enum(kvm, ID_AA64MMFR0_EL1, TGRAN##s##_2, TGRAN##s) && \
kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN##s, IMP)) || \
kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN##s##_2, IMP))
return true;
static bool feat_lpa2(struct kvm *kvm)
{
return ((kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN4, 52_BIT) ||
!kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN4, IMP)) &&
(kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN16, 52_BIT) ||
!kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN16, IMP)) &&
(kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN4_2, 52_BIT) ||
!has_feat_s2tgran(kvm, 4)) &&
(kvm_has_feat(kvm, ID_AA64MMFR0_EL1, TGRAN16_2, 52_BIT) ||
!has_feat_s2tgran(kvm, 16)));
}
static bool compute_hcr_e2h(struct kvm *kvm, u64 *bits)
static bool feat_vmid16(struct kvm *kvm)
{
if (bits) {
if (kvm_has_feat(kvm, FEAT_E2H0))
*bits &= ~HCR_EL2_E2H;
else
*bits |= HCR_EL2_E2H;
}
return true;
return kvm_has_feat_enum(kvm, ID_AA64MMFR1_EL1, VMIDBits, 16);
}
static const struct reg_bits_to_feat_map hfgrtr_feat_map[] = {
@@ -939,7 +928,7 @@ static const DECLARE_FEAT_MAP(hcrx_desc, __HCRX_EL2,
static const struct reg_bits_to_feat_map hcr_feat_map[] = {
NEEDS_FEAT(HCR_EL2_TID0, FEAT_AA32EL0),
NEEDS_FEAT_FIXED(HCR_EL2_RW, compute_hcr_rw),
NEEDS_FEAT_FLAG(HCR_EL2_RW, AS_RES1, FEAT_AA32EL1),
NEEDS_FEAT(HCR_EL2_HCD, not_feat_aa64el3),
NEEDS_FEAT(HCR_EL2_AMO |
HCR_EL2_BSU |
@@ -949,7 +938,6 @@ static const struct reg_bits_to_feat_map hcr_feat_map[] = {
HCR_EL2_FMO |
HCR_EL2_ID |
HCR_EL2_IMO |
HCR_EL2_MIOCNCE |
HCR_EL2_PTW |
HCR_EL2_SWIO |
HCR_EL2_TACR |
@@ -1001,11 +989,12 @@ static const struct reg_bits_to_feat_map hcr_feat_map[] = {
NEEDS_FEAT(HCR_EL2_FIEN, feat_rasv1p1),
NEEDS_FEAT(HCR_EL2_GPF, FEAT_RME),
NEEDS_FEAT(HCR_EL2_FWB, FEAT_S2FWB),
NEEDS_FEAT(HCR_EL2_TME, FEAT_TME),
NEEDS_FEAT(HCR_EL2_TWEDEL |
HCR_EL2_TWEDEn,
FEAT_TWED),
NEEDS_FEAT_FIXED(HCR_EL2_E2H, compute_hcr_e2h),
NEEDS_FEAT_FLAG(HCR_EL2_E2H, RES1_WHEN_E2H1 | FORCE_RESx),
FORCE_RES0(HCR_EL2_RES0),
FORCE_RES1(HCR_EL2_RES1),
};
static const DECLARE_FEAT_MAP(hcr_desc, HCR_EL2,
@@ -1026,21 +1015,23 @@ static const struct reg_bits_to_feat_map sctlr2_feat_map[] = {
SCTLR2_EL1_CPTM |
SCTLR2_EL1_CPTM0,
FEAT_CPA2),
FORCE_RES0(SCTLR2_EL1_RES0),
FORCE_RES1(SCTLR2_EL1_RES1),
};
static const DECLARE_FEAT_MAP(sctlr2_desc, SCTLR2_EL1,
sctlr2_feat_map, FEAT_SCTLR2);
static const struct reg_bits_to_feat_map tcr2_el2_feat_map[] = {
NEEDS_FEAT(TCR2_EL2_FNG1 |
TCR2_EL2_FNG0 |
TCR2_EL2_A2,
feat_asid2_e2h1),
NEEDS_FEAT(TCR2_EL2_DisCH1 |
TCR2_EL2_DisCH0 |
TCR2_EL2_D128,
feat_d128_e2h1),
NEEDS_FEAT(TCR2_EL2_AMEC1, feat_mec_e2h1),
NEEDS_FEAT_FLAG(TCR2_EL2_FNG1 |
TCR2_EL2_FNG0 |
TCR2_EL2_A2,
REQUIRES_E2H1, FEAT_ASID2),
NEEDS_FEAT_FLAG(TCR2_EL2_DisCH1 |
TCR2_EL2_DisCH0 |
TCR2_EL2_D128,
REQUIRES_E2H1, FEAT_D128),
NEEDS_FEAT_FLAG(TCR2_EL2_AMEC1, REQUIRES_E2H1, FEAT_MEC),
NEEDS_FEAT(TCR2_EL2_AMEC0, FEAT_MEC),
NEEDS_FEAT(TCR2_EL2_HAFT, FEAT_HAFT),
NEEDS_FEAT(TCR2_EL2_PTTWI |
@@ -1051,33 +1042,36 @@ static const struct reg_bits_to_feat_map tcr2_el2_feat_map[] = {
TCR2_EL2_E0POE,
FEAT_S1POE),
NEEDS_FEAT(TCR2_EL2_PIE, FEAT_S1PIE),
FORCE_RES0(TCR2_EL2_RES0),
FORCE_RES1(TCR2_EL2_RES1),
};
static const DECLARE_FEAT_MAP(tcr2_el2_desc, TCR2_EL2,
tcr2_el2_feat_map, FEAT_TCR2);
static const struct reg_bits_to_feat_map sctlr_el1_feat_map[] = {
NEEDS_FEAT(SCTLR_EL1_CP15BEN |
SCTLR_EL1_ITD |
SCTLR_EL1_SED,
FEAT_AA32EL0),
NEEDS_FEAT(SCTLR_EL1_CP15BEN, FEAT_AA32EL0),
NEEDS_FEAT_FLAG(SCTLR_EL1_ITD |
SCTLR_EL1_SED,
AS_RES1, FEAT_AA32EL0),
NEEDS_FEAT(SCTLR_EL1_BT0 |
SCTLR_EL1_BT1,
FEAT_BTI),
NEEDS_FEAT(SCTLR_EL1_CMOW, FEAT_CMOW),
NEEDS_FEAT(SCTLR_EL1_TSCXT, feat_csv2_2_csv2_1p2),
NEEDS_FEAT(SCTLR_EL1_EIS |
SCTLR_EL1_EOS,
FEAT_ExS),
NEEDS_FEAT_FLAG(SCTLR_EL1_TSCXT,
AS_RES1, feat_csv2_2_csv2_1p2),
NEEDS_FEAT_FLAG(SCTLR_EL1_EIS |
SCTLR_EL1_EOS,
AS_RES1, FEAT_ExS),
NEEDS_FEAT(SCTLR_EL1_EnFPM, FEAT_FPMR),
NEEDS_FEAT(SCTLR_EL1_IESB, FEAT_IESB),
NEEDS_FEAT(SCTLR_EL1_EnALS, FEAT_LS64),
NEEDS_FEAT(SCTLR_EL1_EnAS0, FEAT_LS64_ACCDATA),
NEEDS_FEAT(SCTLR_EL1_EnASR, FEAT_LS64_V),
NEEDS_FEAT(SCTLR_EL1_nAA, FEAT_LSE2),
NEEDS_FEAT(SCTLR_EL1_LSMAOE |
SCTLR_EL1_nTLSMD,
FEAT_LSMAOC),
NEEDS_FEAT_FLAG(SCTLR_EL1_LSMAOE |
SCTLR_EL1_nTLSMD,
AS_RES1, FEAT_LSMAOC),
NEEDS_FEAT(SCTLR_EL1_EE, FEAT_MixedEnd),
NEEDS_FEAT(SCTLR_EL1_E0E, feat_mixedendel0),
NEEDS_FEAT(SCTLR_EL1_MSCEn, FEAT_MOPS),
@@ -1093,7 +1087,8 @@ static const struct reg_bits_to_feat_map sctlr_el1_feat_map[] = {
NEEDS_FEAT(SCTLR_EL1_NMI |
SCTLR_EL1_SPINTMASK,
FEAT_NMI),
NEEDS_FEAT(SCTLR_EL1_SPAN, FEAT_PAN),
NEEDS_FEAT_FLAG(SCTLR_EL1_SPAN,
AS_RES1, FEAT_PAN),
NEEDS_FEAT(SCTLR_EL1_EPAN, FEAT_PAN3),
NEEDS_FEAT(SCTLR_EL1_EnDA |
SCTLR_EL1_EnDB |
@@ -1104,17 +1099,10 @@ static const struct reg_bits_to_feat_map sctlr_el1_feat_map[] = {
NEEDS_FEAT(SCTLR_EL1_EnRCTX, FEAT_SPECRES),
NEEDS_FEAT(SCTLR_EL1_DSSBS, FEAT_SSBS),
NEEDS_FEAT(SCTLR_EL1_TIDCP, FEAT_TIDCP1),
NEEDS_FEAT(SCTLR_EL1_TME0 |
SCTLR_EL1_TME |
SCTLR_EL1_TMT0 |
SCTLR_EL1_TMT,
FEAT_TME),
NEEDS_FEAT(SCTLR_EL1_TWEDEL |
SCTLR_EL1_TWEDEn,
FEAT_TWED),
NEEDS_FEAT(SCTLR_EL1_UCI |
SCTLR_EL1_EE |
SCTLR_EL1_E0E |
SCTLR_EL1_WXN |
SCTLR_EL1_nTWE |
SCTLR_EL1_nTWI |
@@ -1128,11 +1116,91 @@ static const struct reg_bits_to_feat_map sctlr_el1_feat_map[] = {
SCTLR_EL1_A |
SCTLR_EL1_M,
FEAT_AA64EL1),
FORCE_RES0(SCTLR_EL1_RES0),
FORCE_RES1(SCTLR_EL1_RES1),
};
static const DECLARE_FEAT_MAP(sctlr_el1_desc, SCTLR_EL1,
sctlr_el1_feat_map, FEAT_AA64EL1);
static const struct reg_bits_to_feat_map sctlr_el2_feat_map[] = {
NEEDS_FEAT_FLAG(SCTLR_EL2_CP15BEN,
RES1_WHEN_E2H0 | REQUIRES_E2H1,
FEAT_AA32EL0),
NEEDS_FEAT_FLAG(SCTLR_EL2_ITD |
SCTLR_EL2_SED,
RES1_WHEN_E2H1 | REQUIRES_E2H1,
FEAT_AA32EL0),
NEEDS_FEAT_FLAG(SCTLR_EL2_BT0, REQUIRES_E2H1, FEAT_BTI),
NEEDS_FEAT(SCTLR_EL2_BT, FEAT_BTI),
NEEDS_FEAT_FLAG(SCTLR_EL2_CMOW, REQUIRES_E2H1, FEAT_CMOW),
NEEDS_FEAT_FLAG(SCTLR_EL2_TSCXT,
RES1_WHEN_E2H1 | REQUIRES_E2H1,
feat_csv2_2_csv2_1p2),
NEEDS_FEAT_FLAG(SCTLR_EL2_EIS |
SCTLR_EL2_EOS,
AS_RES1, FEAT_ExS),
NEEDS_FEAT(SCTLR_EL2_EnFPM, FEAT_FPMR),
NEEDS_FEAT(SCTLR_EL2_IESB, FEAT_IESB),
NEEDS_FEAT_FLAG(SCTLR_EL2_EnALS, REQUIRES_E2H1, FEAT_LS64),
NEEDS_FEAT_FLAG(SCTLR_EL2_EnAS0, REQUIRES_E2H1, FEAT_LS64_ACCDATA),
NEEDS_FEAT_FLAG(SCTLR_EL2_EnASR, REQUIRES_E2H1, FEAT_LS64_V),
NEEDS_FEAT(SCTLR_EL2_nAA, FEAT_LSE2),
NEEDS_FEAT_FLAG(SCTLR_EL2_LSMAOE |
SCTLR_EL2_nTLSMD,
AS_RES1 | REQUIRES_E2H1, FEAT_LSMAOC),
NEEDS_FEAT(SCTLR_EL2_EE, FEAT_MixedEnd),
NEEDS_FEAT_FLAG(SCTLR_EL2_E0E, REQUIRES_E2H1, feat_mixedendel0),
NEEDS_FEAT_FLAG(SCTLR_EL2_MSCEn, REQUIRES_E2H1, FEAT_MOPS),
NEEDS_FEAT_FLAG(SCTLR_EL2_ATA0 |
SCTLR_EL2_TCF0,
REQUIRES_E2H1, FEAT_MTE2),
NEEDS_FEAT(SCTLR_EL2_ATA |
SCTLR_EL2_TCF,
FEAT_MTE2),
NEEDS_FEAT(SCTLR_EL2_ITFSB, feat_mte_async),
NEEDS_FEAT_FLAG(SCTLR_EL2_TCSO0, REQUIRES_E2H1, FEAT_MTE_STORE_ONLY),
NEEDS_FEAT(SCTLR_EL2_TCSO,
FEAT_MTE_STORE_ONLY),
NEEDS_FEAT(SCTLR_EL2_NMI |
SCTLR_EL2_SPINTMASK,
FEAT_NMI),
NEEDS_FEAT_FLAG(SCTLR_EL2_SPAN, AS_RES1 | REQUIRES_E2H1, FEAT_PAN),
NEEDS_FEAT_FLAG(SCTLR_EL2_EPAN, REQUIRES_E2H1, FEAT_PAN3),
NEEDS_FEAT(SCTLR_EL2_EnDA |
SCTLR_EL2_EnDB |
SCTLR_EL2_EnIA |
SCTLR_EL2_EnIB,
feat_pauth),
NEEDS_FEAT_FLAG(SCTLR_EL2_EnTP2, REQUIRES_E2H1, FEAT_SME),
NEEDS_FEAT(SCTLR_EL2_EnRCTX, FEAT_SPECRES),
NEEDS_FEAT(SCTLR_EL2_DSSBS, FEAT_SSBS),
NEEDS_FEAT_FLAG(SCTLR_EL2_TIDCP, REQUIRES_E2H1, FEAT_TIDCP1),
NEEDS_FEAT_FLAG(SCTLR_EL2_TWEDEL |
SCTLR_EL2_TWEDEn,
REQUIRES_E2H1, FEAT_TWED),
NEEDS_FEAT_FLAG(SCTLR_EL2_nTWE |
SCTLR_EL2_nTWI,
AS_RES1 | REQUIRES_E2H1, FEAT_AA64EL2),
NEEDS_FEAT_FLAG(SCTLR_EL2_UCI |
SCTLR_EL2_UCT |
SCTLR_EL2_DZE |
SCTLR_EL2_SA0,
REQUIRES_E2H1, FEAT_AA64EL2),
NEEDS_FEAT(SCTLR_EL2_WXN |
SCTLR_EL2_I |
SCTLR_EL2_SA |
SCTLR_EL2_C |
SCTLR_EL2_A |
SCTLR_EL2_M,
FEAT_AA64EL2),
FORCE_RES0(SCTLR_EL2_RES0),
FORCE_RES1(SCTLR_EL2_RES1),
};
static const DECLARE_FEAT_MAP(sctlr_el2_desc, SCTLR_EL2,
sctlr_el2_feat_map, FEAT_AA64EL2);
static const struct reg_bits_to_feat_map mdcr_el2_feat_map[] = {
NEEDS_FEAT(MDCR_EL2_EBWE, FEAT_Debugv8p9),
NEEDS_FEAT(MDCR_EL2_TDOSA, FEAT_DoubleLock),
@@ -1162,27 +1230,75 @@ static const struct reg_bits_to_feat_map mdcr_el2_feat_map[] = {
MDCR_EL2_TDE |
MDCR_EL2_TDRA,
FEAT_AA64EL1),
FORCE_RES0(MDCR_EL2_RES0),
FORCE_RES1(MDCR_EL2_RES1),
};
static const DECLARE_FEAT_MAP(mdcr_el2_desc, MDCR_EL2,
mdcr_el2_feat_map, FEAT_AA64EL2);
static const struct reg_bits_to_feat_map vtcr_el2_feat_map[] = {
NEEDS_FEAT(VTCR_EL2_HDBSS, FEAT_HDBSS),
NEEDS_FEAT(VTCR_EL2_HAFT, FEAT_HAFT),
NEEDS_FEAT(VTCR_EL2_TL0 |
VTCR_EL2_TL1 |
VTCR_EL2_AssuredOnly |
VTCR_EL2_GCSH,
FEAT_THE),
NEEDS_FEAT(VTCR_EL2_D128, FEAT_D128),
NEEDS_FEAT(VTCR_EL2_S2POE, FEAT_S2POE),
NEEDS_FEAT(VTCR_EL2_S2PIE, FEAT_S2PIE),
NEEDS_FEAT(VTCR_EL2_SL2 |
VTCR_EL2_DS,
feat_lpa2),
NEEDS_FEAT(VTCR_EL2_NSA |
VTCR_EL2_NSW,
FEAT_SEL2),
NEEDS_FEAT(VTCR_EL2_HWU62 |
VTCR_EL2_HWU61 |
VTCR_EL2_HWU60 |
VTCR_EL2_HWU59,
FEAT_HPDS2),
NEEDS_FEAT(VTCR_EL2_HD, ID_AA64MMFR1_EL1, HAFDBS, DBM),
NEEDS_FEAT(VTCR_EL2_HA, ID_AA64MMFR1_EL1, HAFDBS, AF),
NEEDS_FEAT(VTCR_EL2_VS, feat_vmid16),
NEEDS_FEAT(VTCR_EL2_PS |
VTCR_EL2_TG0 |
VTCR_EL2_SH0 |
VTCR_EL2_ORGN0 |
VTCR_EL2_IRGN0 |
VTCR_EL2_SL0 |
VTCR_EL2_T0SZ,
FEAT_AA64EL1),
FORCE_RES0(VTCR_EL2_RES0),
FORCE_RES1(VTCR_EL2_RES1),
};
static const DECLARE_FEAT_MAP(vtcr_el2_desc, VTCR_EL2,
vtcr_el2_feat_map, FEAT_AA64EL2);
static void __init check_feat_map(const struct reg_bits_to_feat_map *map,
int map_size, u64 res0, const char *str)
int map_size, u64 resx, const char *str)
{
u64 mask = 0;
/*
* Don't account for FORCE_RESx that are architectural, and
* therefore part of the resx parameter. Other FORCE_RESx bits
* are implementation choices, and therefore accounted for.
*/
for (int i = 0; i < map_size; i++)
mask |= map[i].bits;
if (!((map[i].flags & FORCE_RESx) && (map[i].bits & resx)))
mask |= map[i].bits;
if (mask != ~res0)
if (mask != ~resx)
kvm_err("Undefined %s behaviour, bits %016llx\n",
str, mask ^ ~res0);
str, mask ^ ~resx);
}
static u64 reg_feat_map_bits(const struct reg_bits_to_feat_map *map)
{
return map->flags & RES0_POINTER ? ~(*map->res0p) : map->bits;
return map->flags & MASKS_POINTER ? (map->masks->mask | map->masks->nmask) : map->bits;
}
static void __init check_reg_desc(const struct reg_feat_map_desc *r)
@@ -1209,7 +1325,9 @@ void __init check_feature_map(void)
check_reg_desc(&sctlr2_desc);
check_reg_desc(&tcr2_el2_desc);
check_reg_desc(&sctlr_el1_desc);
check_reg_desc(&sctlr_el2_desc);
check_reg_desc(&mdcr_el2_desc);
check_reg_desc(&vtcr_el2_desc);
}
static bool idreg_feat_match(struct kvm *kvm, const struct reg_bits_to_feat_map *map)
@@ -1226,14 +1344,14 @@ static bool idreg_feat_match(struct kvm *kvm, const struct reg_bits_to_feat_map
}
}
static u64 __compute_fixed_bits(struct kvm *kvm,
const struct reg_bits_to_feat_map *map,
int map_size,
u64 *fixed_bits,
unsigned long require,
unsigned long exclude)
static struct resx compute_resx_bits(struct kvm *kvm,
const struct reg_bits_to_feat_map *map,
int map_size,
unsigned long require,
unsigned long exclude)
{
u64 val = 0;
bool e2h0 = kvm_has_feat(kvm, FEAT_E2H0);
struct resx resx = {};
for (int i = 0; i < map_size; i++) {
bool match;
@@ -1244,60 +1362,72 @@ static u64 __compute_fixed_bits(struct kvm *kvm,
if (map[i].flags & exclude)
continue;
if (map[i].flags & CALL_FUNC)
match = (map[i].flags & FIXED_VALUE) ?
map[i].fval(kvm, fixed_bits) :
map[i].match(kvm);
if (map[i].flags & FORCE_RESx)
match = false;
else if (map[i].flags & CALL_FUNC)
match = map[i].match(kvm);
else
match = idreg_feat_match(kvm, &map[i]);
if (!match || (map[i].flags & FIXED_VALUE))
val |= reg_feat_map_bits(&map[i]);
if (map[i].flags & REQUIRES_E2H1)
match &= !e2h0;
if (!match) {
u64 bits = reg_feat_map_bits(&map[i]);
if ((map[i].flags & AS_RES1) ||
(e2h0 && (map[i].flags & RES1_WHEN_E2H0)) ||
(!e2h0 && (map[i].flags & RES1_WHEN_E2H1)))
resx.res1 |= bits;
else
resx.res0 |= bits;
}
}
return val;
return resx;
}
static u64 compute_res0_bits(struct kvm *kvm,
const struct reg_bits_to_feat_map *map,
int map_size,
unsigned long require,
unsigned long exclude)
static struct resx compute_reg_resx_bits(struct kvm *kvm,
const struct reg_feat_map_desc *r,
unsigned long require,
unsigned long exclude)
{
return __compute_fixed_bits(kvm, map, map_size, NULL,
require, exclude | FIXED_VALUE);
}
struct resx resx;
static u64 compute_reg_res0_bits(struct kvm *kvm,
const struct reg_feat_map_desc *r,
unsigned long require, unsigned long exclude)
{
u64 res0;
res0 = compute_res0_bits(kvm, r->bit_feat_map, r->bit_feat_map_sz,
resx = compute_resx_bits(kvm, r->bit_feat_map, r->bit_feat_map_sz,
require, exclude);
/*
* If computing FGUs, don't take RES0 or register existence
* into account -- we're not computing bits for the register
* itself.
*/
if (!(exclude & NEVER_FGU)) {
res0 |= compute_res0_bits(kvm, &r->feat_map, 1, require, exclude);
res0 |= ~reg_feat_map_bits(&r->feat_map);
if (r->feat_map.flags & MASKS_POINTER) {
resx.res0 |= r->feat_map.masks->res0;
resx.res1 |= r->feat_map.masks->res1;
}
return res0;
/*
* If the register itself was not valid, all the non-RESx bits are
* now considered RES0 (this matches the behaviour of registers such
* as SCTLR2 and TCR2). Weed out any potential (though unlikely)
* overlap with RES1 bits coming from the previous computation.
*/
resx.res0 |= compute_resx_bits(kvm, &r->feat_map, 1, require, exclude).res0;
resx.res1 &= ~resx.res0;
return resx;
}
static u64 compute_reg_fixed_bits(struct kvm *kvm,
const struct reg_feat_map_desc *r,
u64 *fixed_bits, unsigned long require,
unsigned long exclude)
static u64 compute_fgu_bits(struct kvm *kvm, const struct reg_feat_map_desc *r)
{
return __compute_fixed_bits(kvm, r->bit_feat_map, r->bit_feat_map_sz,
fixed_bits, require | FIXED_VALUE, exclude);
struct resx resx;
/*
* If computing FGUs, we collect the unsupported feature bits as
* RESx bits, but don't take the actual RESx bits or register
* existence into account -- we're not computing bits for the
* register itself.
*/
resx = compute_resx_bits(kvm, r->bit_feat_map, r->bit_feat_map_sz,
0, NEVER_FGU);
return resx.res0 | resx.res1;
}
void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt)
@@ -1306,40 +1436,29 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt)
switch (fgt) {
case HFGRTR_GROUP:
val |= compute_reg_res0_bits(kvm, &hfgrtr_desc,
0, NEVER_FGU);
val |= compute_reg_res0_bits(kvm, &hfgwtr_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hfgrtr_desc);
val |= compute_fgu_bits(kvm, &hfgwtr_desc);
break;
case HFGITR_GROUP:
val |= compute_reg_res0_bits(kvm, &hfgitr_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hfgitr_desc);
break;
case HDFGRTR_GROUP:
val |= compute_reg_res0_bits(kvm, &hdfgrtr_desc,
0, NEVER_FGU);
val |= compute_reg_res0_bits(kvm, &hdfgwtr_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hdfgrtr_desc);
val |= compute_fgu_bits(kvm, &hdfgwtr_desc);
break;
case HAFGRTR_GROUP:
val |= compute_reg_res0_bits(kvm, &hafgrtr_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hafgrtr_desc);
break;
case HFGRTR2_GROUP:
val |= compute_reg_res0_bits(kvm, &hfgrtr2_desc,
0, NEVER_FGU);
val |= compute_reg_res0_bits(kvm, &hfgwtr2_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hfgrtr2_desc);
val |= compute_fgu_bits(kvm, &hfgwtr2_desc);
break;
case HFGITR2_GROUP:
val |= compute_reg_res0_bits(kvm, &hfgitr2_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hfgitr2_desc);
break;
case HDFGRTR2_GROUP:
val |= compute_reg_res0_bits(kvm, &hdfgrtr2_desc,
0, NEVER_FGU);
val |= compute_reg_res0_bits(kvm, &hdfgwtr2_desc,
0, NEVER_FGU);
val |= compute_fgu_bits(kvm, &hdfgrtr2_desc);
val |= compute_fgu_bits(kvm, &hdfgwtr2_desc);
break;
default:
BUG();
@@ -1348,87 +1467,77 @@ void compute_fgu(struct kvm *kvm, enum fgt_group_id fgt)
kvm->arch.fgu[fgt] = val;
}
void get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg, u64 *res0, u64 *res1)
struct resx get_reg_fixed_bits(struct kvm *kvm, enum vcpu_sysreg reg)
{
u64 fixed = 0, mask;
struct resx resx;
switch (reg) {
case HFGRTR_EL2:
*res0 = compute_reg_res0_bits(kvm, &hfgrtr_desc, 0, 0);
*res1 = HFGRTR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hfgrtr_desc, 0, 0);
break;
case HFGWTR_EL2:
*res0 = compute_reg_res0_bits(kvm, &hfgwtr_desc, 0, 0);
*res1 = HFGWTR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hfgwtr_desc, 0, 0);
break;
case HFGITR_EL2:
*res0 = compute_reg_res0_bits(kvm, &hfgitr_desc, 0, 0);
*res1 = HFGITR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hfgitr_desc, 0, 0);
break;
case HDFGRTR_EL2:
*res0 = compute_reg_res0_bits(kvm, &hdfgrtr_desc, 0, 0);
*res1 = HDFGRTR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hdfgrtr_desc, 0, 0);
break;
case HDFGWTR_EL2:
*res0 = compute_reg_res0_bits(kvm, &hdfgwtr_desc, 0, 0);
*res1 = HDFGWTR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hdfgwtr_desc, 0, 0);
break;
case HAFGRTR_EL2:
*res0 = compute_reg_res0_bits(kvm, &hafgrtr_desc, 0, 0);
*res1 = HAFGRTR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hafgrtr_desc, 0, 0);
break;
case HFGRTR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &hfgrtr2_desc, 0, 0);
*res1 = HFGRTR2_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hfgrtr2_desc, 0, 0);
break;
case HFGWTR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &hfgwtr2_desc, 0, 0);
*res1 = HFGWTR2_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hfgwtr2_desc, 0, 0);
break;
case HFGITR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &hfgitr2_desc, 0, 0);
*res1 = HFGITR2_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hfgitr2_desc, 0, 0);
break;
case HDFGRTR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &hdfgrtr2_desc, 0, 0);
*res1 = HDFGRTR2_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hdfgrtr2_desc, 0, 0);
break;
case HDFGWTR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &hdfgwtr2_desc, 0, 0);
*res1 = HDFGWTR2_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hdfgwtr2_desc, 0, 0);
break;
case HCRX_EL2:
*res0 = compute_reg_res0_bits(kvm, &hcrx_desc, 0, 0);
*res1 = __HCRX_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &hcrx_desc, 0, 0);
resx.res1 |= __HCRX_EL2_RES1;
break;
case HCR_EL2:
mask = compute_reg_fixed_bits(kvm, &hcr_desc, &fixed, 0, 0);
*res0 = compute_reg_res0_bits(kvm, &hcr_desc, 0, 0);
*res0 |= (mask & ~fixed);
*res1 = HCR_EL2_RES1 | (mask & fixed);
resx = compute_reg_resx_bits(kvm, &hcr_desc, 0, 0);
break;
case SCTLR2_EL1:
case SCTLR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &sctlr2_desc, 0, 0);
*res1 = SCTLR2_EL1_RES1;
resx = compute_reg_resx_bits(kvm, &sctlr2_desc, 0, 0);
break;
case TCR2_EL2:
*res0 = compute_reg_res0_bits(kvm, &tcr2_el2_desc, 0, 0);
*res1 = TCR2_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &tcr2_el2_desc, 0, 0);
break;
case SCTLR_EL1:
*res0 = compute_reg_res0_bits(kvm, &sctlr_el1_desc, 0, 0);
*res1 = SCTLR_EL1_RES1;
resx = compute_reg_resx_bits(kvm, &sctlr_el1_desc, 0, 0);
break;
case SCTLR_EL2:
resx = compute_reg_resx_bits(kvm, &sctlr_el2_desc, 0, 0);
break;
case MDCR_EL2:
*res0 = compute_reg_res0_bits(kvm, &mdcr_el2_desc, 0, 0);
*res1 = MDCR_EL2_RES1;
resx = compute_reg_resx_bits(kvm, &mdcr_el2_desc, 0, 0);
break;
case VTCR_EL2:
resx = compute_reg_resx_bits(kvm, &vtcr_el2_desc, 0, 0);
break;
default:
WARN_ON_ONCE(1);
*res0 = *res1 = 0;
resx = (typeof(resx)){};
break;
}
return resx;
}
static __always_inline struct fgt_masks *__fgt_reg_to_masks(enum vcpu_sysreg reg)


@@ -70,6 +70,7 @@ enum cgt_group_id {
CGT_HCR_ENSCXT,
CGT_HCR_TTLBIS,
CGT_HCR_TTLBOS,
CGT_HCR_TID5,
CGT_MDCR_TPMCR,
CGT_MDCR_TPM,
@@ -308,6 +309,12 @@ static const struct trap_bits coarse_trap_bits[] = {
.mask = HCR_TTLBOS,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_HCR_TID5] = {
.index = HCR_EL2,
.value = HCR_TID5,
.mask = HCR_TID5,
.behaviour = BEHAVE_FORWARD_RW,
},
[CGT_MDCR_TPMCR] = {
.index = MDCR_EL2,
.value = MDCR_EL2_TPMCR,
@@ -665,6 +672,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_CCSIDR2_EL1, CGT_HCR_TID2_TID4),
SR_TRAP(SYS_CLIDR_EL1, CGT_HCR_TID2_TID4),
SR_TRAP(SYS_CSSELR_EL1, CGT_HCR_TID2_TID4),
SR_TRAP(SYS_GMID_EL1, CGT_HCR_TID5),
SR_RANGE_TRAP(SYS_ID_PFR0_EL1,
sys_reg(3, 0, 0, 7, 7), CGT_HCR_TID3),
SR_TRAP(SYS_ICC_SGI0R_EL1, CGT_HCR_IMO_FMO_ICH_HCR_TC),
@@ -1166,6 +1174,7 @@ static const struct encoding_to_trap_config encoding_to_cgt[] __initconst = {
SR_TRAP(SYS_DBGWCRn_EL1(12), CGT_MDCR_TDE_TDA),
SR_TRAP(SYS_DBGWCRn_EL1(13), CGT_MDCR_TDE_TDA),
SR_TRAP(SYS_DBGWCRn_EL1(14), CGT_MDCR_TDE_TDA),
SR_TRAP(SYS_DBGWCRn_EL1(15), CGT_MDCR_TDE_TDA),
SR_TRAP(SYS_DBGCLAIMSET_EL1, CGT_MDCR_TDE_TDA),
SR_TRAP(SYS_DBGCLAIMCLR_EL1, CGT_MDCR_TDE_TDA),
SR_TRAP(SYS_DBGAUTHSTATUS_EL1, CGT_MDCR_TDE_TDA),
@@ -2105,23 +2114,24 @@ static u32 encoding_next(u32 encoding)
}
#define FGT_MASKS(__n, __m) \
struct fgt_masks __n = { .str = #__m, .res0 = __m, }
struct fgt_masks __n = { .str = #__m, .res0 = __m ## _RES0, .res1 = __m ## _RES1 }
FGT_MASKS(hfgrtr_masks, HFGRTR_EL2_RES0);
FGT_MASKS(hfgwtr_masks, HFGWTR_EL2_RES0);
FGT_MASKS(hfgitr_masks, HFGITR_EL2_RES0);
FGT_MASKS(hdfgrtr_masks, HDFGRTR_EL2_RES0);
FGT_MASKS(hdfgwtr_masks, HDFGWTR_EL2_RES0);
FGT_MASKS(hafgrtr_masks, HAFGRTR_EL2_RES0);
FGT_MASKS(hfgrtr2_masks, HFGRTR2_EL2_RES0);
FGT_MASKS(hfgwtr2_masks, HFGWTR2_EL2_RES0);
FGT_MASKS(hfgitr2_masks, HFGITR2_EL2_RES0);
FGT_MASKS(hdfgrtr2_masks, HDFGRTR2_EL2_RES0);
FGT_MASKS(hdfgwtr2_masks, HDFGWTR2_EL2_RES0);
FGT_MASKS(hfgrtr_masks, HFGRTR_EL2);
FGT_MASKS(hfgwtr_masks, HFGWTR_EL2);
FGT_MASKS(hfgitr_masks, HFGITR_EL2);
FGT_MASKS(hdfgrtr_masks, HDFGRTR_EL2);
FGT_MASKS(hdfgwtr_masks, HDFGWTR_EL2);
FGT_MASKS(hafgrtr_masks, HAFGRTR_EL2);
FGT_MASKS(hfgrtr2_masks, HFGRTR2_EL2);
FGT_MASKS(hfgwtr2_masks, HFGWTR2_EL2);
FGT_MASKS(hfgitr2_masks, HFGITR2_EL2);
FGT_MASKS(hdfgrtr2_masks, HDFGRTR2_EL2);
FGT_MASKS(hdfgwtr2_masks, HDFGWTR2_EL2);
static __init bool aggregate_fgt(union trap_config tc)
{
struct fgt_masks *rmasks, *wmasks;
u64 rresx, wresx;
switch (tc.fgt) {
case HFGRTR_GROUP:
@@ -2154,24 +2164,27 @@ static __init bool aggregate_fgt(union trap_config tc)
break;
}
rresx = rmasks->res0 | rmasks->res1;
if (wmasks)
wresx = wmasks->res0 | wmasks->res1;
/*
* A bit can be reserved in either the R or W register, but
* not both.
*/
if ((BIT(tc.bit) & rmasks->res0) &&
(!wmasks || (BIT(tc.bit) & wmasks->res0)))
if ((BIT(tc.bit) & rresx) && (!wmasks || (BIT(tc.bit) & wresx)))
return false;
if (tc.pol)
rmasks->mask |= BIT(tc.bit) & ~rmasks->res0;
rmasks->mask |= BIT(tc.bit) & ~rresx;
else
rmasks->nmask |= BIT(tc.bit) & ~rmasks->res0;
rmasks->nmask |= BIT(tc.bit) & ~rresx;
if (wmasks) {
if (tc.pol)
wmasks->mask |= BIT(tc.bit) & ~wmasks->res0;
wmasks->mask |= BIT(tc.bit) & ~wresx;
else
wmasks->nmask |= BIT(tc.bit) & ~wmasks->res0;
wmasks->nmask |= BIT(tc.bit) & ~wresx;
}
return true;
@@ -2180,7 +2193,6 @@ static __init bool aggregate_fgt(union trap_config tc)
static __init int check_fgt_masks(struct fgt_masks *masks)
{
unsigned long duplicate = masks->mask & masks->nmask;
u64 res0 = masks->res0;
int ret = 0;
if (duplicate) {
@@ -2194,10 +2206,14 @@ static __init int check_fgt_masks(struct fgt_masks *masks)
ret = -EINVAL;
}
masks->res0 = ~(masks->mask | masks->nmask);
if (masks->res0 != res0)
kvm_info("Implicit %s = %016llx, expecting %016llx\n",
masks->str, masks->res0, res0);
if ((masks->res0 | masks->res1 | masks->mask | masks->nmask) != GENMASK(63, 0) ||
(masks->res0 & masks->res1) || (masks->res0 & masks->mask) ||
(masks->res0 & masks->nmask) || (masks->res1 & masks->mask) ||
(masks->res1 & masks->nmask) || (masks->mask & masks->nmask)) {
kvm_info("Inconsistent masks for %s (%016llx, %016llx, %016llx, %016llx)\n",
masks->str, masks->res0, masks->res1, masks->mask, masks->nmask);
masks->res0 = ~(masks->res1 | masks->mask | masks->nmask);
}
return ret;
}
@@ -2269,9 +2285,6 @@ int __init populate_nv_trap_config(void)
kvm_info("nv: %ld coarse grained trap handlers\n",
ARRAY_SIZE(encoding_to_cgt));
if (!cpus_have_final_cap(ARM64_HAS_FGT))
goto check_mcb;
for (int i = 0; i < ARRAY_SIZE(encoding_to_fgt); i++) {
const struct encoding_to_trap_config *fgt = &encoding_to_fgt[i];
union trap_config tc;
@@ -2291,6 +2304,15 @@ int __init populate_nv_trap_config(void)
}
tc.val |= fgt->tc.val;
if (!aggregate_fgt(tc)) {
ret = -EINVAL;
print_nv_trap_error(fgt, "FGT bit is reserved", ret);
}
if (!cpus_have_final_cap(ARM64_HAS_FGT))
continue;
prev = xa_store(&sr_forward_xa, enc,
xa_mk_value(tc.val), GFP_KERNEL);
@@ -2298,11 +2320,6 @@ int __init populate_nv_trap_config(void)
ret = xa_err(prev);
print_nv_trap_error(fgt, "Failed FGT insertion", ret);
}
if (!aggregate_fgt(tc)) {
ret = -EINVAL;
print_nv_trap_error(fgt, "FGT bit is reserved", ret);
}
}
}
@@ -2318,7 +2335,6 @@ int __init populate_nv_trap_config(void)
kvm_info("nv: %ld fine grained trap handlers\n",
ARRAY_SIZE(encoding_to_fgt));
check_mcb:
for (int id = __MULTIPLE_CONTROL_BITS__; id < __COMPLEX_CONDITIONS__; id++) {
const enum cgt_group_id *cgids;
@@ -2420,15 +2436,7 @@ static enum trap_behaviour compute_trap_behaviour(struct kvm_vcpu *vcpu,
static u64 kvm_get_sysreg_res0(struct kvm *kvm, enum vcpu_sysreg sr)
{
struct kvm_sysreg_masks *masks;
/* Only handle the VNCR-backed regs for now */
if (sr < __VNCR_START__)
return 0;
masks = kvm->arch.sysreg_masks;
return masks->mask[sr - __VNCR_START__].res0;
return kvm_get_sysreg_resx(kvm, sr).res0;
}
static bool check_fgt_bit(struct kvm_vcpu *vcpu, enum vcpu_sysreg sr,
@@ -2580,6 +2588,19 @@ local:
params = esr_sys64_to_params(esr);
/*
* This implements the pseudocode UnimplementedIDRegister()
* helper for the purpose of dealing with FEAT_IDST.
*/
if (in_feat_id_space(&params)) {
if (kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, IDS, IMP))
kvm_inject_sync(vcpu, kvm_vcpu_get_esr(vcpu));
else
kvm_inject_undefined(vcpu);
return true;
}
/*
* Check for the IMPDEF range, as per DDI0487 J.a,
* D18.3.2 Reserved encodings for IMPLEMENTATION


@@ -59,10 +59,8 @@ static inline void __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
* If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
* it will cause an exception.
*/
if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd())
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
}
static inline void __activate_cptr_traps_nvhe(struct kvm_vcpu *vcpu)
@@ -495,7 +493,7 @@ static inline void fpsimd_lazy_switch_to_host(struct kvm_vcpu *vcpu)
/*
* When the guest owns the FP regs, we know that guest+hyp traps for
* any FPSIMD/SVE/SME features exposed to the guest have been disabled
* by either fpsimd_lazy_switch_to_guest() or kvm_hyp_handle_fpsimd()
* by either __activate_cptr_traps() or kvm_hyp_handle_fpsimd()
* prior to __guest_entry(). As __guest_entry() guarantees a context
* synchronization event, we don't need an ISB here to avoid taking
* traps for anything that was exposed to the guest.


@@ -792,7 +792,7 @@ static void do_ffa_version(struct arm_smccc_1_2_regs *res,
.a0 = FFA_VERSION,
.a1 = ffa_req_version,
}, res);
if (res->a0 == FFA_RET_NOT_SUPPORTED)
if ((s32)res->a0 == FFA_RET_NOT_SUPPORTED)
goto unlock;
hyp_ffa_version = ffa_req_version;
@@ -943,7 +943,7 @@ int hyp_ffa_init(void *pages)
.a0 = FFA_VERSION,
.a1 = FFA_VERSION_1_2,
}, &res);
if (res.a0 == FFA_RET_NOT_SUPPORTED)
if ((s32)res.a0 == FFA_RET_NOT_SUPPORTED)
return 0;
/*


@@ -260,11 +260,6 @@ reset:
msr sctlr_el2, x5
isb
alternative_if ARM64_KVM_PROTECTED_MODE
mov_q x5, HCR_HOST_NVHE_FLAGS
msr_hcr_el2 x5
alternative_else_nop_endif
/* Install stub vectors */
adr_l x5, __hyp_stub_vectors
msr vbar_el2, x5


@@ -690,6 +690,69 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
kvm_skip_host_instr();
}
/*
* Inject an Undefined Instruction exception into the host.
*
* This is open-coded to allow control over PSTATE construction without
* complicating the generic exception entry helpers.
*/
static void inject_undef64(void)
{
u64 spsr_mask, vbar, sctlr, old_spsr, new_spsr, esr, offset;
spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
vbar = read_sysreg_el1(SYS_VBAR);
sctlr = read_sysreg_el1(SYS_SCTLR);
old_spsr = read_sysreg_el2(SYS_SPSR);
new_spsr = old_spsr & spsr_mask;
new_spsr |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT;
new_spsr |= PSR_MODE_EL1h;
if (!(sctlr & SCTLR_EL1_SPAN))
new_spsr |= PSR_PAN_BIT;
if (sctlr & SCTLR_ELx_DSSBS)
new_spsr |= PSR_SSBS_BIT;
if (system_supports_mte())
new_spsr |= PSR_TCO_BIT;
esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) | ESR_ELx_IL;
offset = CURRENT_EL_SP_ELx_VECTOR + except_type_sync;
write_sysreg_el1(esr, SYS_ESR);
write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
write_sysreg_el1(old_spsr, SYS_SPSR);
write_sysreg_el2(vbar + offset, SYS_ELR);
write_sysreg_el2(new_spsr, SYS_SPSR);
}
static bool handle_host_mte(u64 esr)
{
switch (esr_sys64_to_sysreg(esr)) {
case SYS_RGSR_EL1:
case SYS_GCR_EL1:
case SYS_TFSR_EL1:
case SYS_TFSRE0_EL1:
/* If we're here for any reason other than MTE, it's a bug. */
if (read_sysreg(HCR_EL2) & HCR_ATA)
return false;
break;
case SYS_GMID_EL1:
/* If we're here for any reason other than MTE, it's a bug. */
if (!(read_sysreg(HCR_EL2) & HCR_TID5))
return false;
break;
default:
return false;
}
inject_undef64();
return true;
}
void handle_trap(struct kvm_cpu_context *host_ctxt)
{
u64 esr = read_sysreg_el2(SYS_ESR);
@@ -705,6 +768,10 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
case ESR_ELx_EC_DABT_LOW:
handle_host_mem_abort(host_ctxt);
break;
case ESR_ELx_EC_SYS64:
if (handle_host_mte(esr))
break;
fallthrough;
default:
BUG();
}


@@ -19,7 +19,7 @@
#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
#define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
#define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_AS_S1 | KVM_PGTABLE_S2_IDMAP)
struct host_mmu host_mmu;
@@ -324,6 +324,8 @@ int __pkvm_prot_finalize(void)
params->vttbr = kvm_get_vttbr(mmu);
params->vtcr = mmu->vtcr;
params->hcr_el2 |= HCR_VM;
if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
params->hcr_el2 |= HCR_FWB;
/*
* The CMO below not only cleans the updated params to the


@@ -82,7 +82,7 @@ static void pvm_init_traps_hcr(struct kvm_vcpu *vcpu)
if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, AMU, IMP))
val &= ~(HCR_AMVOFFEN);
if (!kvm_has_feat(kvm, ID_AA64PFR1_EL1, MTE, IMP)) {
if (!kvm_has_mte(kvm)) {
val |= HCR_TID5;
val &= ~(HCR_DCT | HCR_ATA);
}
@@ -117,8 +117,8 @@ static void pvm_init_traps_mdcr(struct kvm_vcpu *vcpu)
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceFilt, IMP))
val |= MDCR_EL2_TTRF;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, ExtTrcBuff, IMP))
val |= MDCR_EL2_E2TB_MASK;
if (!kvm_has_feat(kvm, ID_AA64DFR0_EL1, TraceBuffer, IMP))
val &= ~MDCR_EL2_E2TB_MASK;
/* Trap Debug Communications Channel registers */
if (!kvm_has_feat(kvm, ID_AA64MMFR0_EL1, FGT, IMP))
@@ -339,9 +339,6 @@ static void pkvm_init_features_from_host(struct pkvm_hyp_vm *hyp_vm, const struc
/* Preserve the vgic model so that GICv3 emulation works */
hyp_vm->kvm.arch.vgic.vgic_model = host_kvm->arch.vgic.vgic_model;
if (test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &host_kvm->arch.flags))
set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
/* No restrictions for non-protected VMs. */
if (!kvm_vm_is_protected(kvm)) {
hyp_vm->kvm.arch.flags = host_arch_flags;
@@ -356,20 +353,23 @@ static void pkvm_init_features_from_host(struct pkvm_hyp_vm *hyp_vm, const struc
return;
}
if (kvm_pkvm_ext_allowed(kvm, KVM_CAP_ARM_MTE))
kvm->arch.flags |= host_arch_flags & BIT(KVM_ARCH_FLAG_MTE_ENABLED);
bitmap_zero(allowed_features, KVM_VCPU_MAX_FEATURES);
set_bit(KVM_ARM_VCPU_PSCI_0_2, allowed_features);
if (kvm_pvm_ext_allowed(KVM_CAP_ARM_PMU_V3))
if (kvm_pkvm_ext_allowed(kvm, KVM_CAP_ARM_PMU_V3))
set_bit(KVM_ARM_VCPU_PMU_V3, allowed_features);
if (kvm_pvm_ext_allowed(KVM_CAP_ARM_PTRAUTH_ADDRESS))
if (kvm_pkvm_ext_allowed(kvm, KVM_CAP_ARM_PTRAUTH_ADDRESS))
set_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, allowed_features);
if (kvm_pvm_ext_allowed(KVM_CAP_ARM_PTRAUTH_GENERIC))
if (kvm_pkvm_ext_allowed(kvm, KVM_CAP_ARM_PTRAUTH_GENERIC))
set_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, allowed_features);
if (kvm_pvm_ext_allowed(KVM_CAP_ARM_SVE)) {
if (kvm_pkvm_ext_allowed(kvm, KVM_CAP_ARM_SVE)) {
set_bit(KVM_ARM_VCPU_SVE, allowed_features);
kvm->arch.flags |= host_arch_flags & BIT(KVM_ARCH_FLAG_GUEST_HAS_SVE);
}


@@ -134,7 +134,7 @@ static const struct pvm_ftr_bits pvmid_aa64mmfr2[] = {
MAX_FEAT(ID_AA64MMFR2_EL1, UAO, IMP),
MAX_FEAT(ID_AA64MMFR2_EL1, IESB, IMP),
MAX_FEAT(ID_AA64MMFR2_EL1, AT, IMP),
MAX_FEAT_ENUM(ID_AA64MMFR2_EL1, IDS, 0x18),
MAX_FEAT(ID_AA64MMFR2_EL1, IDS, IMP),
MAX_FEAT(ID_AA64MMFR2_EL1, TTL, IMP),
MAX_FEAT(ID_AA64MMFR2_EL1, BBM, 2),
MAX_FEAT(ID_AA64MMFR2_EL1, E0PD, IMP),
@@ -243,16 +243,15 @@ static u64 pvm_calc_id_reg(const struct kvm_vcpu *vcpu, u32 id)
}
}
/*
* Inject an unknown/undefined exception to an AArch64 guest while most of its
* sysregs are live.
*/
static void inject_undef64(struct kvm_vcpu *vcpu)
static void inject_sync64(struct kvm_vcpu *vcpu, u64 esr)
{
u64 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
*vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR);
/*
* Make sure we have the latest update to VBAR_EL1, as pKVM
* handles traps very early, before sysregs are resync'ed
*/
__vcpu_assign_sys_reg(vcpu, VBAR_EL1, read_sysreg_el1(SYS_VBAR));
kvm_pend_exception(vcpu, EXCEPT_AA64_EL1_SYNC);
@@ -265,6 +264,15 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
}
/*
* Inject an unknown/undefined exception to an AArch64 guest while most of its
* sysregs are live.
*/
static void inject_undef64(struct kvm_vcpu *vcpu)
{
inject_sync64(vcpu, (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT));
}
static u64 read_id_reg(const struct kvm_vcpu *vcpu,
struct sys_reg_desc const *r)
{
@@ -339,6 +347,18 @@ static bool pvm_gic_read_sre(struct kvm_vcpu *vcpu,
return true;
}
static bool pvm_idst_access(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
if (kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, IDS, IMP))
inject_sync64(vcpu, kvm_vcpu_get_esr(vcpu));
else
inject_undef64(vcpu);
return false;
}
/* Mark the specified system register as an AArch32 feature id register. */
#define AARCH32(REG) { SYS_DESC(REG), .access = pvm_access_id_aarch32 }
@@ -469,6 +489,9 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
HOST_HANDLED(SYS_CCSIDR_EL1),
HOST_HANDLED(SYS_CLIDR_EL1),
{ SYS_DESC(SYS_CCSIDR2_EL1), .access = pvm_idst_access },
{ SYS_DESC(SYS_GMID_EL1), .access = pvm_idst_access },
{ SYS_DESC(SYS_SMIDR_EL1), .access = pvm_idst_access },
HOST_HANDLED(SYS_AIDR_EL1),
HOST_HANDLED(SYS_CSSELR_EL1),
HOST_HANDLED(SYS_CTR_EL0),


@@ -342,6 +342,9 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
if (!(prot & KVM_PGTABLE_PROT_R))
return -EINVAL;
if (!cpus_have_final_cap(ARM64_KVM_HVHE))
prot &= ~KVM_PGTABLE_PROT_UX;
if (prot & KVM_PGTABLE_PROT_X) {
if (prot & KVM_PGTABLE_PROT_W)
return -EINVAL;
@@ -351,8 +354,16 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
if (system_supports_bti_kernel())
attr |= KVM_PTE_LEAF_ATTR_HI_S1_GP;
}
if (cpus_have_final_cap(ARM64_KVM_HVHE)) {
if (!(prot & KVM_PGTABLE_PROT_PX))
attr |= KVM_PTE_LEAF_ATTR_HI_S1_PXN;
if (!(prot & KVM_PGTABLE_PROT_UX))
attr |= KVM_PTE_LEAF_ATTR_HI_S1_UXN;
} else {
attr |= KVM_PTE_LEAF_ATTR_HI_S1_XN;
if (!(prot & KVM_PGTABLE_PROT_PX))
attr |= KVM_PTE_LEAF_ATTR_HI_S1_XN;
}
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
@@ -373,8 +384,15 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
if (!kvm_pte_valid(pte))
return prot;
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
prot |= KVM_PGTABLE_PROT_X;
if (cpus_have_final_cap(ARM64_KVM_HVHE)) {
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_PXN))
prot |= KVM_PGTABLE_PROT_PX;
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_UXN))
prot |= KVM_PGTABLE_PROT_UX;
} else {
if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
prot |= KVM_PGTABLE_PROT_PX;
}
ap = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_AP, pte);
if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RO)
@@ -583,8 +601,8 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
u64 vtcr = VTCR_EL2_FLAGS;
s8 lvls;
vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
vtcr |= VTCR_EL2_T0SZ(phys_shift);
vtcr |= FIELD_PREP(VTCR_EL2_PS, kvm_get_parange(mmfr0));
vtcr |= FIELD_PREP(VTCR_EL2_T0SZ, (UL(64) - phys_shift));
/*
* Use a minimum 2 level page table to prevent splitting
* host PMD huge pages at stage2.
@@ -624,21 +642,11 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
vtcr |= VTCR_EL2_DS;
/* Set the vmid bits */
vtcr |= (get_vmid_bits(mmfr1) == 16) ?
VTCR_EL2_VS_16BIT :
VTCR_EL2_VS_8BIT;
vtcr |= (get_vmid_bits(mmfr1) == 16) ? VTCR_EL2_VS : 0;
return vtcr;
}
static bool stage2_has_fwb(struct kvm_pgtable *pgt)
{
if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
return false;
return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
}
void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
phys_addr_t addr, size_t size)
{
@@ -659,7 +667,17 @@ void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
}
}
#define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
#define KVM_S2_MEMATTR(pgt, attr) \
({ \
kvm_pte_t __attr; \
\
if ((pgt)->flags & KVM_PGTABLE_S2_AS_S1) \
__attr = PAGE_S2_MEMATTR(AS_S1); \
else \
__attr = PAGE_S2_MEMATTR(attr); \
\
__attr; \
})
static int stage2_set_xn_attr(enum kvm_pgtable_prot prot, kvm_pte_t *attr)
{
@@ -868,7 +886,7 @@ static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
* system supporting FWB as the optimization is entirely
* pointless when the unmap walker needs to perform CMOs.
*/
return system_supports_tlb_range() && stage2_has_fwb(pgt);
return system_supports_tlb_range() && cpus_have_final_cap(ARM64_HAS_STAGE2_FWB);
}
static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
@@ -1148,7 +1166,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
if (mm_ops->page_count(childp) != 1)
return 0;
} else if (stage2_pte_cacheable(pgt, ctx->old)) {
need_flush = !stage2_has_fwb(pgt);
need_flush = !cpus_have_final_cap(ARM64_HAS_STAGE2_FWB);
}
/*
@@ -1379,7 +1397,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
.arg = pgt,
};
if (stage2_has_fwb(pgt))
if (cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
return 0;
return kvm_pgtable_walk(pgt, addr, size, &walker);


@@ -44,7 +44,7 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
/* Build the full address */
fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
fault_ipa |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(vcpu));
/* If not for GICV, move on */
if (fault_ipa < vgic->vgic_cpu_base ||


@@ -569,11 +569,11 @@ static int __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu, u32 vmcr,
continue;
/* Group-0 interrupt, but Group-0 disabled? */
if (!(val & ICH_LR_GROUP) && !(vmcr & ICH_VMCR_ENG0_MASK))
if (!(val & ICH_LR_GROUP) && !(vmcr & ICH_VMCR_EL2_VENG0_MASK))
continue;
/* Group-1 interrupt, but Group-1 disabled? */
if ((val & ICH_LR_GROUP) && !(vmcr & ICH_VMCR_ENG1_MASK))
if ((val & ICH_LR_GROUP) && !(vmcr & ICH_VMCR_EL2_VENG1_MASK))
continue;
/* Not the highest priority? */
@@ -646,19 +646,19 @@ static int __vgic_v3_get_highest_active_priority(void)
static unsigned int __vgic_v3_get_bpr0(u32 vmcr)
{
return (vmcr & ICH_VMCR_BPR0_MASK) >> ICH_VMCR_BPR0_SHIFT;
return FIELD_GET(ICH_VMCR_EL2_VBPR0, vmcr);
}
static unsigned int __vgic_v3_get_bpr1(u32 vmcr)
{
unsigned int bpr;
if (vmcr & ICH_VMCR_CBPR_MASK) {
if (vmcr & ICH_VMCR_EL2_VCBPR_MASK) {
bpr = __vgic_v3_get_bpr0(vmcr);
if (bpr < 7)
bpr++;
} else {
bpr = (vmcr & ICH_VMCR_BPR1_MASK) >> ICH_VMCR_BPR1_SHIFT;
bpr = FIELD_GET(ICH_VMCR_EL2_VBPR1, vmcr);
}
return bpr;
@@ -758,7 +758,7 @@ static void __vgic_v3_read_iar(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
if (grp != !!(lr_val & ICH_LR_GROUP))
goto spurious;
pmr = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
pmr = FIELD_GET(ICH_VMCR_EL2_VPMR, vmcr);
lr_prio = (lr_val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT;
if (pmr <= lr_prio)
goto spurious;
@@ -806,7 +806,7 @@ static int ___vgic_v3_write_dir(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
int lr;
/* EOImode == 0, nothing to be done here */
if (!(vmcr & ICH_VMCR_EOIM_MASK))
if (!(vmcr & ICH_VMCR_EL2_VEOIM_MASK))
return 1;
/* No deactivate to be performed on an LPI */
@@ -849,7 +849,7 @@ static void __vgic_v3_write_eoir(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
}
/* EOImode == 1 and not an LPI, nothing to be done here */
if ((vmcr & ICH_VMCR_EOIM_MASK) && !(vid >= VGIC_MIN_LPI))
if ((vmcr & ICH_VMCR_EL2_VEOIM_MASK) && !(vid >= VGIC_MIN_LPI))
return;
lr_prio = (lr_val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT;
@@ -865,22 +865,19 @@ static void __vgic_v3_write_eoir(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
static void __vgic_v3_read_igrpen0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
vcpu_set_reg(vcpu, rt, !!(vmcr & ICH_VMCR_ENG0_MASK));
vcpu_set_reg(vcpu, rt, FIELD_GET(ICH_VMCR_EL2_VENG0, vmcr));
}
static void __vgic_v3_read_igrpen1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
vcpu_set_reg(vcpu, rt, !!(vmcr & ICH_VMCR_ENG1_MASK));
vcpu_set_reg(vcpu, rt, FIELD_GET(ICH_VMCR_EL2_VENG1, vmcr));
}
static void __vgic_v3_write_igrpen0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
u64 val = vcpu_get_reg(vcpu, rt);
if (val & 1)
vmcr |= ICH_VMCR_ENG0_MASK;
else
vmcr &= ~ICH_VMCR_ENG0_MASK;
FIELD_MODIFY(ICH_VMCR_EL2_VENG0, &vmcr, val & 1);
__vgic_v3_write_vmcr(vmcr);
}
@@ -889,10 +886,7 @@ static void __vgic_v3_write_igrpen1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
u64 val = vcpu_get_reg(vcpu, rt);
if (val & 1)
vmcr |= ICH_VMCR_ENG1_MASK;
else
vmcr &= ~ICH_VMCR_ENG1_MASK;
FIELD_MODIFY(ICH_VMCR_EL2_VENG1, &vmcr, val & 1);
__vgic_v3_write_vmcr(vmcr);
}
@@ -916,10 +910,7 @@ static void __vgic_v3_write_bpr0(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
if (val < bpr_min)
val = bpr_min;
val <<= ICH_VMCR_BPR0_SHIFT;
val &= ICH_VMCR_BPR0_MASK;
vmcr &= ~ICH_VMCR_BPR0_MASK;
vmcr |= val;
FIELD_MODIFY(ICH_VMCR_EL2_VBPR0, &vmcr, val);
__vgic_v3_write_vmcr(vmcr);
}
@@ -929,17 +920,14 @@ static void __vgic_v3_write_bpr1(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
u64 val = vcpu_get_reg(vcpu, rt);
u8 bpr_min = __vgic_v3_bpr_min();
if (vmcr & ICH_VMCR_CBPR_MASK)
if (FIELD_GET(ICH_VMCR_EL2_VCBPR, vmcr))
return;
/* Enforce BPR limiting */
if (val < bpr_min)
val = bpr_min;
val <<= ICH_VMCR_BPR1_SHIFT;
val &= ICH_VMCR_BPR1_MASK;
vmcr &= ~ICH_VMCR_BPR1_MASK;
vmcr |= val;
FIELD_MODIFY(ICH_VMCR_EL2_VBPR1, &vmcr, val);
__vgic_v3_write_vmcr(vmcr);
}
@@ -1029,19 +1017,14 @@ spurious:
static void __vgic_v3_read_pmr(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
vmcr &= ICH_VMCR_PMR_MASK;
vmcr >>= ICH_VMCR_PMR_SHIFT;
vcpu_set_reg(vcpu, rt, vmcr);
vcpu_set_reg(vcpu, rt, FIELD_GET(ICH_VMCR_EL2_VPMR, vmcr));
}
static void __vgic_v3_write_pmr(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
u32 val = vcpu_get_reg(vcpu, rt);
val <<= ICH_VMCR_PMR_SHIFT;
val &= ICH_VMCR_PMR_MASK;
vmcr &= ~ICH_VMCR_PMR_MASK;
vmcr |= val;
FIELD_MODIFY(ICH_VMCR_EL2_VPMR, &vmcr, val);
write_gicreg(vmcr, ICH_VMCR_EL2);
}
@@ -1064,9 +1047,11 @@ static void __vgic_v3_read_ctlr(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
/* A3V */
val |= ((vtr >> 21) & 1) << ICC_CTLR_EL1_A3V_SHIFT;
/* EOImode */
val |= ((vmcr & ICH_VMCR_EOIM_MASK) >> ICH_VMCR_EOIM_SHIFT) << ICC_CTLR_EL1_EOImode_SHIFT;
val |= FIELD_PREP(ICC_CTLR_EL1_EOImode_MASK,
FIELD_GET(ICH_VMCR_EL2_VEOIM, vmcr));
/* CBPR */
val |= (vmcr & ICH_VMCR_CBPR_MASK) >> ICH_VMCR_CBPR_SHIFT;
val |= FIELD_PREP(ICC_CTLR_EL1_CBPR_MASK,
FIELD_GET(ICH_VMCR_EL2_VCBPR, vmcr));
vcpu_set_reg(vcpu, rt, val);
}
@@ -1075,15 +1060,11 @@ static void __vgic_v3_write_ctlr(struct kvm_vcpu *vcpu, u32 vmcr, int rt)
{
u32 val = vcpu_get_reg(vcpu, rt);
if (val & ICC_CTLR_EL1_CBPR_MASK)
vmcr |= ICH_VMCR_CBPR_MASK;
else
vmcr &= ~ICH_VMCR_CBPR_MASK;
FIELD_MODIFY(ICH_VMCR_EL2_VCBPR, &vmcr,
FIELD_GET(ICC_CTLR_EL1_CBPR_MASK, val));
if (val & ICC_CTLR_EL1_EOImode_MASK)
vmcr |= ICH_VMCR_EOIM_MASK;
else
vmcr &= ~ICH_VMCR_EOIM_MASK;
FIELD_MODIFY(ICH_VMCR_EL2_VEOIM, &vmcr,
FIELD_GET(ICC_CTLR_EL1_EOImode_MASK, val));
write_gicreg(vmcr, ICH_VMCR_EL2);
}


@@ -205,7 +205,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
/*
* When running a normal EL1 guest, we only load a new vcpu
* after a context switch, which imvolves a DSB, so all
* after a context switch, which involves a DSB, so all
* speculative EL1&0 walks will have already completed.
* If running NV, the vcpu may transition between vEL1 and
* vEL2 without a context switch, so make sure we complete


@@ -162,12 +162,16 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
vcpu_write_sys_reg(vcpu, esr, exception_esr_elx(vcpu));
}
void kvm_inject_sync(struct kvm_vcpu *vcpu, u64 esr)
{
pend_sync_exception(vcpu);
vcpu_write_sys_reg(vcpu, esr, exception_esr_elx(vcpu));
}
static void inject_undef64(struct kvm_vcpu *vcpu)
{
u64 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
pend_sync_exception(vcpu);
/*
* Build an unknown exception, depending on the instruction
* set.
@@ -175,7 +179,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
if (kvm_vcpu_trap_il_is32bit(vcpu))
esr |= ESR_ELx_IL;
vcpu_write_sys_reg(vcpu, esr, exception_esr_elx(vcpu));
kvm_inject_sync(vcpu, esr);
}
#define DFSR_FSC_EXTABT_LPAE 0x10
@@ -292,7 +296,7 @@ void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
unsigned long addr, esr;
addr = kvm_vcpu_get_fault_ipa(vcpu);
addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
addr |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(vcpu));
__kvm_inject_sea(vcpu, kvm_vcpu_trap_is_iabt(vcpu), addr);


@@ -2079,7 +2079,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
/* Falls between the IPA range and the PARange? */
if (fault_ipa >= BIT_ULL(VTCR_EL2_IPA(vcpu->arch.hw_mmu->vtcr))) {
fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
fault_ipa |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(vcpu));
return kvm_inject_sea(vcpu, is_iabt, fault_ipa);
}
@@ -2185,7 +2185,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
* faulting VA. This is always 12 bits, irrespective
* of the page size.
*/
ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
ipa |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(vcpu));
ret = io_mem_abort(vcpu, ipa);
goto out_unlock;
}
@@ -2294,11 +2294,9 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
.virt_to_phys = kvm_host_pa,
};
int __init kvm_mmu_init(u32 *hyp_va_bits)
int __init kvm_mmu_init(u32 hyp_va_bits)
{
int err;
u32 idmap_bits;
u32 kernel_bits;
hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
@@ -2312,25 +2310,7 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
*/
BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
/*
* The ID map is always configured for 48 bits of translation, which
* may be fewer than the number of VA bits used by the regular kernel
* stage 1, when VA_BITS=52.
*
* At EL2, there is only one TTBR register, and we can't switch between
* translation tables *and* update TCR_EL2.T0SZ at the same time. Bottom
* line: we need to use the extended range with *both* our translation
* tables.
*
* So use the maximum of the idmap VA bits and the regular kernel stage
* 1 VA bits to assure that the hypervisor can both ID map its code page
* and map any kernel memory.
*/
idmap_bits = IDMAP_VA_BITS;
kernel_bits = vabits_actual;
*hyp_va_bits = max(idmap_bits, kernel_bits);
kvm_debug("Using %u-bit virtual addresses at EL2\n", *hyp_va_bits);
kvm_debug("Using %u-bit virtual addresses at EL2\n", hyp_va_bits);
kvm_debug("IDMAP page: %lx\n", hyp_idmap_start);
kvm_debug("HYP VA range: %lx:%lx\n",
kern_hyp_va(PAGE_OFFSET),
@@ -2355,7 +2335,7 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
goto out;
}
err = kvm_pgtable_hyp_init(hyp_pgtable, *hyp_va_bits, &kvm_hyp_mm_ops);
err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits, &kvm_hyp_mm_ops);
if (err)
goto out_free_pgtable;
@@ -2364,7 +2344,7 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
goto out_destroy_pgtable;
io_map_base = hyp_idmap_start;
__hyp_va_bits = *hyp_va_bits;
__hyp_va_bits = hyp_va_bits;
return 0;
out_destroy_pgtable:


@@ -377,7 +377,7 @@ static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
{
wi->t0sz = vtcr & TCR_EL2_T0SZ_MASK;
switch (vtcr & VTCR_EL2_TG0_MASK) {
switch (FIELD_GET(VTCR_EL2_TG0_MASK, vtcr)) {
case VTCR_EL2_TG0_4K:
wi->pgshift = 12; break;
case VTCR_EL2_TG0_16K:
@@ -513,7 +513,7 @@ static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr)
lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(mmu)->mmu_lock);
switch (vtcr & VTCR_EL2_TG0_MASK) {
switch (FIELD_GET(VTCR_EL2_TG0_MASK, vtcr)) {
case VTCR_EL2_TG0_4K:
ttl = (TLBI_TTL_TG_4K << 2);
break;
@@ -530,7 +530,7 @@ static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr)
again:
/* Iteratively compute the block sizes for a particular granule size */
switch (vtcr & VTCR_EL2_TG0_MASK) {
switch (FIELD_GET(VTCR_EL2_TG0_MASK, vtcr)) {
case VTCR_EL2_TG0_4K:
if (sz < SZ_4K) sz = SZ_4K;
else if (sz < SZ_2M) sz = SZ_2M;
@@ -593,7 +593,7 @@ unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val)
if (!max_size) {
/* Compute the maximum extent of the invalidation */
switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
switch (FIELD_GET(VTCR_EL2_TG0_MASK, mmu->tlb_vtcr)) {
case VTCR_EL2_TG0_4K:
max_size = SZ_1G;
break;
@@ -1101,6 +1101,9 @@ void kvm_nested_s2_wp(struct kvm *kvm)
lockdep_assert_held_write(&kvm->mmu_lock);
if (!kvm->arch.nested_mmus_size)
return;
for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
@@ -1117,6 +1120,9 @@ void kvm_nested_s2_unmap(struct kvm *kvm, bool may_block)
lockdep_assert_held_write(&kvm->mmu_lock);
if (!kvm->arch.nested_mmus_size)
return;
for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
@@ -1133,6 +1139,9 @@ void kvm_nested_s2_flush(struct kvm *kvm)
lockdep_assert_held_write(&kvm->mmu_lock);
if (!kvm->arch.nested_mmus_size)
return;
for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
@@ -1145,6 +1154,9 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
int i;
if (!kvm->arch.nested_mmus_size)
return;
for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
@@ -1505,11 +1517,6 @@ u64 limit_nv_id_reg(struct kvm *kvm, u32 reg, u64 val)
u64 orig_val = val;
switch (reg) {
case SYS_ID_AA64ISAR0_EL1:
/* Support everything but TME */
val &= ~ID_AA64ISAR0_EL1_TME;
break;
case SYS_ID_AA64ISAR1_EL1:
/* Support everything but LS64 and Spec Invalidation */
val &= ~(ID_AA64ISAR1_EL1_LS64 |
@@ -1669,36 +1676,28 @@ u64 kvm_vcpu_apply_reg_masks(const struct kvm_vcpu *vcpu,
u64 kvm_vcpu_apply_reg_masks(const struct kvm_vcpu *vcpu,
enum vcpu_sysreg sr, u64 v)
{
struct kvm_sysreg_masks *masks;
struct resx resx;
masks = vcpu->kvm->arch.sysreg_masks;
if (masks) {
sr -= __SANITISED_REG_START__;
v &= ~masks->mask[sr].res0;
v |= masks->mask[sr].res1;
}
resx = kvm_get_sysreg_resx(vcpu->kvm, sr);
v &= ~resx.res0;
v |= resx.res1;
return v;
}
static __always_inline void set_sysreg_masks(struct kvm *kvm, int sr, u64 res0, u64 res1)
static __always_inline void set_sysreg_masks(struct kvm *kvm, int sr, struct resx resx)
{
int i = sr - __SANITISED_REG_START__;
BUILD_BUG_ON(!__builtin_constant_p(sr));
BUILD_BUG_ON(sr < __SANITISED_REG_START__);
BUILD_BUG_ON(sr >= NR_SYS_REGS);
kvm->arch.sysreg_masks->mask[i].res0 = res0;
kvm->arch.sysreg_masks->mask[i].res1 = res1;
kvm_set_sysreg_resx(kvm, sr, resx);
}
int kvm_init_nv_sysregs(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
u64 res0, res1;
struct resx resx;
lockdep_assert_held(&kvm->arch.config_lock);
@@ -1711,111 +1710,116 @@ int kvm_init_nv_sysregs(struct kvm_vcpu *vcpu)
return -ENOMEM;
/* VTTBR_EL2 */
res0 = res1 = 0;
resx = (typeof(resx)){};
if (!kvm_has_feat_enum(kvm, ID_AA64MMFR1_EL1, VMIDBits, 16))
res0 |= GENMASK(63, 56);
resx.res0 |= GENMASK(63, 56);
if (!kvm_has_feat(kvm, ID_AA64MMFR2_EL1, CnP, IMP))
res0 |= VTTBR_CNP_BIT;
set_sysreg_masks(kvm, VTTBR_EL2, res0, res1);
resx.res0 |= VTTBR_CNP_BIT;
set_sysreg_masks(kvm, VTTBR_EL2, resx);
/* VTCR_EL2 */
res0 = GENMASK(63, 32) | GENMASK(30, 20);
res1 = BIT(31);
set_sysreg_masks(kvm, VTCR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, VTCR_EL2);
set_sysreg_masks(kvm, VTCR_EL2, resx);
/* VMPIDR_EL2 */
res0 = GENMASK(63, 40) | GENMASK(30, 24);
res1 = BIT(31);
set_sysreg_masks(kvm, VMPIDR_EL2, res0, res1);
resx.res0 = GENMASK(63, 40) | GENMASK(30, 24);
resx.res1 = BIT(31);
set_sysreg_masks(kvm, VMPIDR_EL2, resx);
/* HCR_EL2 */
get_reg_fixed_bits(kvm, HCR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HCR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HCR_EL2);
set_sysreg_masks(kvm, HCR_EL2, resx);
/* HCRX_EL2 */
get_reg_fixed_bits(kvm, HCRX_EL2, &res0, &res1);
set_sysreg_masks(kvm, HCRX_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HCRX_EL2);
set_sysreg_masks(kvm, HCRX_EL2, resx);
/* HFG[RW]TR_EL2 */
get_reg_fixed_bits(kvm, HFGRTR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HFGRTR_EL2, res0, res1);
get_reg_fixed_bits(kvm, HFGWTR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HFGWTR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HFGRTR_EL2);
set_sysreg_masks(kvm, HFGRTR_EL2, resx);
resx = get_reg_fixed_bits(kvm, HFGWTR_EL2);
set_sysreg_masks(kvm, HFGWTR_EL2, resx);
/* HDFG[RW]TR_EL2 */
get_reg_fixed_bits(kvm, HDFGRTR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HDFGRTR_EL2, res0, res1);
get_reg_fixed_bits(kvm, HDFGWTR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HDFGWTR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HDFGRTR_EL2);
set_sysreg_masks(kvm, HDFGRTR_EL2, resx);
resx = get_reg_fixed_bits(kvm, HDFGWTR_EL2);
set_sysreg_masks(kvm, HDFGWTR_EL2, resx);
/* HFGITR_EL2 */
get_reg_fixed_bits(kvm, HFGITR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HFGITR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HFGITR_EL2);
set_sysreg_masks(kvm, HFGITR_EL2, resx);
/* HAFGRTR_EL2 - not a lot to see here */
get_reg_fixed_bits(kvm, HAFGRTR_EL2, &res0, &res1);
set_sysreg_masks(kvm, HAFGRTR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HAFGRTR_EL2);
set_sysreg_masks(kvm, HAFGRTR_EL2, resx);
/* HFG[RW]TR2_EL2 */
get_reg_fixed_bits(kvm, HFGRTR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, HFGRTR2_EL2, res0, res1);
get_reg_fixed_bits(kvm, HFGWTR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, HFGWTR2_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HFGRTR2_EL2);
set_sysreg_masks(kvm, HFGRTR2_EL2, resx);
resx = get_reg_fixed_bits(kvm, HFGWTR2_EL2);
set_sysreg_masks(kvm, HFGWTR2_EL2, resx);
/* HDFG[RW]TR2_EL2 */
get_reg_fixed_bits(kvm, HDFGRTR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, HDFGRTR2_EL2, res0, res1);
get_reg_fixed_bits(kvm, HDFGWTR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, HDFGWTR2_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HDFGRTR2_EL2);
set_sysreg_masks(kvm, HDFGRTR2_EL2, resx);
resx = get_reg_fixed_bits(kvm, HDFGWTR2_EL2);
set_sysreg_masks(kvm, HDFGWTR2_EL2, resx);
/* HFGITR2_EL2 */
get_reg_fixed_bits(kvm, HFGITR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, HFGITR2_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, HFGITR2_EL2);
set_sysreg_masks(kvm, HFGITR2_EL2, resx);
/* TCR2_EL2 */
get_reg_fixed_bits(kvm, TCR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, TCR2_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, TCR2_EL2);
set_sysreg_masks(kvm, TCR2_EL2, resx);
/* SCTLR_EL1 */
get_reg_fixed_bits(kvm, SCTLR_EL1, &res0, &res1);
set_sysreg_masks(kvm, SCTLR_EL1, res0, res1);
resx = get_reg_fixed_bits(kvm, SCTLR_EL1);
set_sysreg_masks(kvm, SCTLR_EL1, resx);
/* SCTLR_EL2 */
resx = get_reg_fixed_bits(kvm, SCTLR_EL2);
set_sysreg_masks(kvm, SCTLR_EL2, resx);
/* SCTLR2_ELx */
get_reg_fixed_bits(kvm, SCTLR2_EL1, &res0, &res1);
set_sysreg_masks(kvm, SCTLR2_EL1, res0, res1);
get_reg_fixed_bits(kvm, SCTLR2_EL2, &res0, &res1);
set_sysreg_masks(kvm, SCTLR2_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, SCTLR2_EL1);
set_sysreg_masks(kvm, SCTLR2_EL1, resx);
resx = get_reg_fixed_bits(kvm, SCTLR2_EL2);
set_sysreg_masks(kvm, SCTLR2_EL2, resx);
/* MDCR_EL2 */
get_reg_fixed_bits(kvm, MDCR_EL2, &res0, &res1);
set_sysreg_masks(kvm, MDCR_EL2, res0, res1);
resx = get_reg_fixed_bits(kvm, MDCR_EL2);
set_sysreg_masks(kvm, MDCR_EL2, resx);
/* CNTHCTL_EL2 */
res0 = GENMASK(63, 20);
res1 = 0;
resx.res0 = GENMASK(63, 20);
resx.res1 = 0;
if (!kvm_has_feat(kvm, ID_AA64PFR0_EL1, RME, IMP))
res0 |= CNTHCTL_CNTPMASK | CNTHCTL_CNTVMASK;
resx.res0 |= CNTHCTL_CNTPMASK | CNTHCTL_CNTVMASK;
if (!kvm_has_feat(kvm, ID_AA64MMFR0_EL1, ECV, CNTPOFF)) {
res0 |= CNTHCTL_ECV;
resx.res0 |= CNTHCTL_ECV;
if (!kvm_has_feat(kvm, ID_AA64MMFR0_EL1, ECV, IMP))
res0 |= (CNTHCTL_EL1TVT | CNTHCTL_EL1TVCT |
CNTHCTL_EL1NVPCT | CNTHCTL_EL1NVVCT);
resx.res0 |= (CNTHCTL_EL1TVT | CNTHCTL_EL1TVCT |
CNTHCTL_EL1NVPCT | CNTHCTL_EL1NVVCT);
}
if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, VH, IMP))
res0 |= GENMASK(11, 8);
set_sysreg_masks(kvm, CNTHCTL_EL2, res0, res1);
resx.res0 |= GENMASK(11, 8);
set_sysreg_masks(kvm, CNTHCTL_EL2, resx);
/* ICH_HCR_EL2 */
res0 = ICH_HCR_EL2_RES0;
res1 = ICH_HCR_EL2_RES1;
resx.res0 = ICH_HCR_EL2_RES0;
resx.res1 = ICH_HCR_EL2_RES1;
if (!(kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_EL2_TDS))
res0 |= ICH_HCR_EL2_TDIR;
resx.res0 |= ICH_HCR_EL2_TDIR;
/* No GICv4 is presented to the guest */
res0 |= ICH_HCR_EL2_DVIM | ICH_HCR_EL2_vSGIEOICount;
set_sysreg_masks(kvm, ICH_HCR_EL2, res0, res1);
resx.res0 |= ICH_HCR_EL2_DVIM | ICH_HCR_EL2_vSGIEOICount;
set_sysreg_masks(kvm, ICH_HCR_EL2, resx);
/* VNCR_EL2 */
set_sysreg_masks(kvm, VNCR_EL2, VNCR_EL2_RES0, VNCR_EL2_RES1);
resx.res0 = VNCR_EL2_RES0;
resx.res1 = VNCR_EL2_RES1;
set_sysreg_masks(kvm, VNCR_EL2, resx);
out:
for (enum vcpu_sysreg sr = __SANITISED_REG_START__; sr < NR_SYS_REGS; sr++)


@@ -3414,8 +3414,6 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
{ SYS_DESC(SYS_CLIDR_EL1), access_clidr, reset_clidr, CLIDR_EL1,
.set_user = set_clidr, .val = ~CLIDR_EL1_RES0 },
{ SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
IMPLEMENTATION_ID(AIDR_EL1, GENMASK_ULL(63, 0)),
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
ID_FILTERED(CTR_EL0, ctr_el0,
@@ -4995,7 +4993,7 @@ static bool emulate_sys_reg(struct kvm_vcpu *vcpu,
return false;
}
static const struct sys_reg_desc *idregs_debug_find(struct kvm *kvm, u8 pos)
static const struct sys_reg_desc *idregs_debug_find(struct kvm *kvm, loff_t pos)
{
unsigned long i, idreg_idx = 0;
@@ -5005,10 +5003,8 @@ static const struct sys_reg_desc *idregs_debug_find(struct kvm *kvm, u8 pos)
if (!is_vm_ftr_id_reg(reg_to_encoding(r)))
continue;
if (idreg_idx == pos)
if (idreg_idx++ == pos)
return r;
idreg_idx++;
}
return NULL;
@@ -5017,23 +5013,11 @@ static const struct sys_reg_desc *idregs_debug_find(struct kvm *kvm, u8 pos)
static void *idregs_debug_start(struct seq_file *s, loff_t *pos)
{
struct kvm *kvm = s->private;
u8 *iter;
mutex_lock(&kvm->arch.config_lock);
if (!test_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags))
return NULL;
iter = &kvm->arch.idreg_debugfs_iter;
if (test_bit(KVM_ARCH_FLAG_ID_REGS_INITIALIZED, &kvm->arch.flags) &&
*iter == (u8)~0) {
*iter = *pos;
if (!idregs_debug_find(kvm, *iter))
iter = NULL;
} else {
iter = ERR_PTR(-EBUSY);
}
mutex_unlock(&kvm->arch.config_lock);
return iter;
return (void *)idregs_debug_find(kvm, *pos);
}
static void *idregs_debug_next(struct seq_file *s, void *v, loff_t *pos)
@@ -5042,37 +5026,19 @@ static void *idregs_debug_next(struct seq_file *s, void *v, loff_t *pos)
(*pos)++;
if (idregs_debug_find(kvm, kvm->arch.idreg_debugfs_iter + 1)) {
kvm->arch.idreg_debugfs_iter++;
return &kvm->arch.idreg_debugfs_iter;
}
return NULL;
return (void *)idregs_debug_find(kvm, *pos);
}
static void idregs_debug_stop(struct seq_file *s, void *v)
{
struct kvm *kvm = s->private;
if (IS_ERR(v))
return;
mutex_lock(&kvm->arch.config_lock);
kvm->arch.idreg_debugfs_iter = ~0;
mutex_unlock(&kvm->arch.config_lock);
}
static int idregs_debug_show(struct seq_file *s, void *v)
{
const struct sys_reg_desc *desc;
const struct sys_reg_desc *desc = v;
struct kvm *kvm = s->private;
desc = idregs_debug_find(kvm, kvm->arch.idreg_debugfs_iter);
if (!desc->name)
if (!desc)
return 0;
seq_printf(s, "%20s:\t%016llx\n",
@@ -5090,12 +5056,78 @@ static const struct seq_operations idregs_debug_sops = {
DEFINE_SEQ_ATTRIBUTE(idregs_debug);
static const struct sys_reg_desc *sr_resx_find(struct kvm *kvm, loff_t pos)
{
unsigned long i, sr_idx = 0;
for (i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) {
const struct sys_reg_desc *r = &sys_reg_descs[i];
if (r->reg < __SANITISED_REG_START__)
continue;
if (sr_idx++ == pos)
return r;
}
return NULL;
}
static void *sr_resx_start(struct seq_file *s, loff_t *pos)
{
struct kvm *kvm = s->private;
if (!kvm->arch.sysreg_masks)
return NULL;
return (void *)sr_resx_find(kvm, *pos);
}
static void *sr_resx_next(struct seq_file *s, void *v, loff_t *pos)
{
struct kvm *kvm = s->private;
(*pos)++;
return (void *)sr_resx_find(kvm, *pos);
}
static void sr_resx_stop(struct seq_file *s, void *v)
{
}
static int sr_resx_show(struct seq_file *s, void *v)
{
const struct sys_reg_desc *desc = v;
struct kvm *kvm = s->private;
struct resx resx;
if (!desc)
return 0;
resx = kvm_get_sysreg_resx(kvm, desc->reg);
seq_printf(s, "%20s:\tRES0:%016llx\tRES1:%016llx\n",
desc->name, resx.res0, resx.res1);
return 0;
}
static const struct seq_operations sr_resx_sops = {
.start = sr_resx_start,
.next = sr_resx_next,
.stop = sr_resx_stop,
.show = sr_resx_show,
};
DEFINE_SEQ_ATTRIBUTE(sr_resx);
void kvm_sys_regs_create_debugfs(struct kvm *kvm)
{
kvm->arch.idreg_debugfs_iter = ~0;
debugfs_create_file("idregs", 0444, kvm->debugfs_dentry, kvm,
&idregs_debug_fops);
debugfs_create_file("resx", 0444, kvm->debugfs_dentry, kvm,
&sr_resx_fops);
}
static void reset_vm_ftr_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *reg)
@@ -5581,6 +5613,8 @@ static void vcpu_set_hcr(struct kvm_vcpu *vcpu)
if (kvm_has_mte(vcpu->kvm))
vcpu->arch.hcr_el2 |= HCR_ATA;
else
vcpu->arch.hcr_el2 |= HCR_TID5;
/*
* In the absence of FGT, we cannot independently trap TLBI


@@ -49,6 +49,16 @@ struct sys_reg_params {
.Op2 = ((esr) >> 17) & 0x7, \
.is_write = !((esr) & 1) })
/*
* The Feature ID space is defined as the System register space in AArch64
* with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
*/
static inline bool in_feat_id_space(struct sys_reg_params *p)
{
return (p->Op0 == 3 && !(p->Op1 & 0b100) && p->Op1 != 2 &&
p->CRn == 0 && !(p->CRm & 0b1000));
}
struct sys_reg_desc {
/* Sysreg string for debug */
const char *name;


@@ -46,9 +46,31 @@ static void init_hyp_physvirt_offset(void)
hyp_physvirt_offset = (s64)__pa(kern_va) - (s64)hyp_va;
}
/*
* Calculate the actual VA size used by the hypervisor
*/
__init u32 kvm_hyp_va_bits(void)
{
/*
* The ID map is always configured for 48 bits of translation, which may
* be different from the number of VA bits used by the regular kernel
* stage 1.
*
* At EL2, there is only one TTBR register, and we can't switch between
* translation tables *and* update TCR_EL2.T0SZ at the same time. Bottom
* line: we need to use the extended range with *both* our translation
* tables.
*
* So use the maximum of the idmap VA bits and the regular kernel stage
* 1 VA bits as the hypervisor VA size to assure that the hypervisor can
* both ID map its code page and map any kernel memory.
*/
return max(IDMAP_VA_BITS, vabits_actual);
}
/*
* We want to generate a hyp VA with the following format (with V ==
* vabits_actual):
* hypervisor VA bits):
*
* 63 ... V | V-1 | V-2 .. tag_lsb | tag_lsb - 1 .. 0
* ---------------------------------------------------------
@@ -61,10 +83,11 @@ __init void kvm_compute_layout(void)
{
phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
u64 hyp_va_msb;
u32 hyp_va_bits = kvm_hyp_va_bits();
/* Where is my RAM region? */
hyp_va_msb = idmap_addr & BIT(vabits_actual - 1);
hyp_va_msb ^= BIT(vabits_actual - 1);
hyp_va_msb = idmap_addr & BIT(hyp_va_bits - 1);
hyp_va_msb ^= BIT(hyp_va_bits - 1);
tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
(u64)(high_memory - 1));
@@ -72,9 +95,9 @@ __init void kvm_compute_layout(void)
va_mask = GENMASK_ULL(tag_lsb - 1, 0);
tag_val = hyp_va_msb;
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (vabits_actual - 1)) {
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (hyp_va_bits - 1)) {
/* We have some free bits to insert a random tag. */
tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
tag_val |= get_random_long() & GENMASK_ULL(hyp_va_bits - 2, tag_lsb);
}
tag_val >>= tag_lsb;
@@ -296,31 +319,3 @@ void kvm_compute_final_ctr_el0(struct alt_instr *alt,
generate_mov_q(read_sanitised_ftr_reg(SYS_CTR_EL0),
origptr, updptr, nr_inst);
}
void kvm_pan_patch_el2_entry(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
/*
* If we're running at EL1 without hVHE, then SCTLR_EL2.SPAN means
* nothing to us (it is RES1), and we don't need to set PSTATE.PAN
* to anything useful.
*/
if (!is_kernel_in_hyp_mode() && !cpus_have_cap(ARM64_KVM_HVHE))
return;
/*
* Leap of faith: at this point, we must be running VHE one way or
* another, and FEAT_PAN is required to be implemented. If KVM
* explodes at runtime because your system does not abide by this
* requirement, call your favourite HW vendor, they have screwed up.
*
* We don't expect hVHE to access any userspace mapping, so always
* set PSTATE.PAN on entry. Same thing if we have PAN enabled on an
* EL2 kernel. Only force it to 0 if we have not configured PAN in
* the kernel (and you know this is really silly).
*/
if (cpus_have_cap(ARM64_KVM_HVHE) || IS_ENABLED(CONFIG_ARM64_PAN))
*updptr = cpu_to_le32(ENCODE_PSTATE(1, PAN));
else
*updptr = cpu_to_le32(ENCODE_PSTATE(0, PAN));
}

@ -25,11 +25,9 @@
struct vgic_state_iter {
int nr_cpus;
int nr_spis;
int nr_lpis;
int dist_id;
int vcpu_id;
unsigned long intid;
int lpi_idx;
};
static void iter_next(struct kvm *kvm, struct vgic_state_iter *iter)
@ -45,13 +43,15 @@ static void iter_next(struct kvm *kvm, struct vgic_state_iter *iter)
* Let the xarray drive the iterator after the last SPI, as the iterator
* has exhausted the sequentially-allocated INTID space.
*/
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS - 1) &&
iter->nr_lpis) {
if (iter->lpi_idx < iter->nr_lpis)
xa_find_after(&dist->lpi_xa, &iter->intid,
VGIC_LPI_MAX_INTID,
LPI_XA_MARK_DEBUG_ITER);
iter->lpi_idx++;
if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS - 1)) {
if (iter->intid == VGIC_LPI_MAX_INTID + 1)
return;
rcu_read_lock();
if (!xa_find_after(&dist->lpi_xa, &iter->intid,
VGIC_LPI_MAX_INTID, XA_PRESENT))
iter->intid = VGIC_LPI_MAX_INTID + 1;
rcu_read_unlock();
return;
}
@ -61,44 +61,21 @@ static void iter_next(struct kvm *kvm, struct vgic_state_iter *iter)
iter->intid = 0;
}
static int iter_mark_lpis(struct kvm *kvm)
static int vgic_count_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned long intid, flags;
struct vgic_irq *irq;
unsigned long intid;
int nr_lpis = 0;
xa_lock_irqsave(&dist->lpi_xa, flags);
xa_for_each(&dist->lpi_xa, intid, irq) {
if (!vgic_try_get_irq_ref(irq))
continue;
__xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
rcu_read_lock();
xa_for_each(&dist->lpi_xa, intid, irq)
nr_lpis++;
}
xa_unlock_irqrestore(&dist->lpi_xa, flags);
rcu_read_unlock();
return nr_lpis;
}
static void iter_unmark_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned long intid, flags;
struct vgic_irq *irq;
xa_for_each_marked(&dist->lpi_xa, intid, irq, LPI_XA_MARK_DEBUG_ITER) {
xa_lock_irqsave(&dist->lpi_xa, flags);
__xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
/* vgic_put_irq() expects to be called outside of the xa_lock */
vgic_put_irq(kvm, irq);
}
}
static void iter_init(struct kvm *kvm, struct vgic_state_iter *iter,
loff_t pos)
{
@ -108,8 +85,6 @@ static void iter_init(struct kvm *kvm, struct vgic_state_iter *iter,
iter->nr_cpus = nr_cpus;
iter->nr_spis = kvm->arch.vgic.nr_spis;
if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
iter->nr_lpis = iter_mark_lpis(kvm);
/* Fast forward to the right position if needed */
while (pos--)
@ -121,7 +96,7 @@ static bool end_of_vgic(struct vgic_state_iter *iter)
return iter->dist_id > 0 &&
iter->vcpu_id == iter->nr_cpus &&
iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS) &&
(!iter->nr_lpis || iter->lpi_idx > iter->nr_lpis);
iter->intid > VGIC_LPI_MAX_INTID;
}
static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
@ -129,72 +104,56 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
struct kvm *kvm = s->private;
struct vgic_state_iter *iter;
mutex_lock(&kvm->arch.config_lock);
iter = kvm->arch.vgic.iter;
if (iter) {
iter = ERR_PTR(-EBUSY);
goto out;
}
iter = kmalloc(sizeof(*iter), GFP_KERNEL);
if (!iter) {
iter = ERR_PTR(-ENOMEM);
goto out;
}
if (!iter)
return ERR_PTR(-ENOMEM);
iter_init(kvm, iter, *pos);
kvm->arch.vgic.iter = iter;
if (end_of_vgic(iter))
if (end_of_vgic(iter)) {
kfree(iter);
iter = NULL;
out:
mutex_unlock(&kvm->arch.config_lock);
}
return iter;
}
static void *vgic_debug_next(struct seq_file *s, void *v, loff_t *pos)
{
struct kvm *kvm = s->private;
struct vgic_state_iter *iter = kvm->arch.vgic.iter;
struct vgic_state_iter *iter = v;
++*pos;
iter_next(kvm, iter);
if (end_of_vgic(iter))
if (end_of_vgic(iter)) {
kfree(iter);
iter = NULL;
}
return iter;
}
static void vgic_debug_stop(struct seq_file *s, void *v)
{
struct kvm *kvm = s->private;
struct vgic_state_iter *iter;
struct vgic_state_iter *iter = v;
/*
* If the seq file wasn't properly opened, there's nothing to clean
* up.
*/
if (IS_ERR(v))
if (IS_ERR_OR_NULL(v))
return;
mutex_lock(&kvm->arch.config_lock);
iter = kvm->arch.vgic.iter;
iter_unmark_lpis(kvm);
kfree(iter);
kvm->arch.vgic.iter = NULL;
mutex_unlock(&kvm->arch.config_lock);
}
static void print_dist_state(struct seq_file *s, struct vgic_dist *dist,
struct vgic_state_iter *iter)
{
bool v3 = dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3;
struct kvm *kvm = s->private;
seq_printf(s, "Distributor\n");
seq_printf(s, "===========\n");
seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
if (v3)
seq_printf(s, "nr_lpis:\t%d\n", iter->nr_lpis);
seq_printf(s, "nr_lpis:\t%d\n", vgic_count_lpis(kvm));
seq_printf(s, "enabled:\t%d\n", dist->enabled);
seq_printf(s, "\n");
@ -291,16 +250,13 @@ static int vgic_debug_show(struct seq_file *s, void *v)
if (iter->vcpu_id < iter->nr_cpus)
vcpu = kvm_get_vcpu(kvm, iter->vcpu_id);
/*
* Expect this to succeed, as iter_mark_lpis() takes a reference on
* every LPI to be visited.
*/
if (iter->intid < VGIC_NR_PRIVATE_IRQS)
irq = vgic_get_vcpu_irq(vcpu, iter->intid);
else
irq = vgic_get_irq(kvm, iter->intid);
if (WARN_ON_ONCE(!irq))
return -EINVAL;
if (!irq)
return 0;
raw_spin_lock_irqsave(&irq->irq_lock, flags);
print_irq_state(s, irq, vcpu);

@ -140,6 +140,10 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
goto out_unlock;
}
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.implementation_rev = KVM_VGIC_IMP_REV_LATEST;
kvm_for_each_vcpu(i, vcpu, kvm) {
ret = vgic_allocate_private_irqs_locked(vcpu, type);
if (ret)
@ -156,10 +160,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
goto out_unlock;
}
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.implementation_rev = KVM_VGIC_IMP_REV_LATEST;
kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
aa64pfr0 = kvm_read_vm_id_reg(kvm, SYS_ID_AA64PFR0_EL1) & ~ID_AA64PFR0_EL1_GIC;

@ -57,7 +57,7 @@ static int lr_map_idx_to_shadow_idx(struct shadow_if *shadow_if, int idx)
* as the L1 guest is in charge of provisioning the interrupts via its own
* view of the ICH_LR*_EL2 registers, which conveniently live in the VNCR
* page. This means that the flow described above does work (there is no
* state to rebuild in the L0 hypervisor), and that most things happed on L2
* state to rebuild in the L0 hypervisor), and that most things happen on L2
* load/put:
*
* - on L2 load: move the in-memory L1 vGIC configuration into a shadow,
@ -202,16 +202,16 @@ u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
if ((hcr & ICH_HCR_EL2_NPIE) && !mi_state.pend)
reg |= ICH_MISR_EL2_NP;
if ((hcr & ICH_HCR_EL2_VGrp0EIE) && (vmcr & ICH_VMCR_ENG0_MASK))
if ((hcr & ICH_HCR_EL2_VGrp0EIE) && (vmcr & ICH_VMCR_EL2_VENG0_MASK))
reg |= ICH_MISR_EL2_VGrp0E;
if ((hcr & ICH_HCR_EL2_VGrp0DIE) && !(vmcr & ICH_VMCR_ENG0_MASK))
if ((hcr & ICH_HCR_EL2_VGrp0DIE) && !(vmcr & ICH_VMCR_EL2_VENG0_MASK))
reg |= ICH_MISR_EL2_VGrp0D;
if ((hcr & ICH_HCR_EL2_VGrp1EIE) && (vmcr & ICH_VMCR_ENG1_MASK))
if ((hcr & ICH_HCR_EL2_VGrp1EIE) && (vmcr & ICH_VMCR_EL2_VENG1_MASK))
reg |= ICH_MISR_EL2_VGrp1E;
if ((hcr & ICH_HCR_EL2_VGrp1DIE) && !(vmcr & ICH_VMCR_ENG1_MASK))
if ((hcr & ICH_HCR_EL2_VGrp1DIE) && !(vmcr & ICH_VMCR_EL2_VENG1_MASK))
reg |= ICH_MISR_EL2_VGrp1D;
return reg;

@ -41,9 +41,9 @@ void vgic_v3_configure_hcr(struct kvm_vcpu *vcpu,
if (!als->nr_sgi)
cpuif->vgic_hcr |= ICH_HCR_EL2_vSGIEOICount;
cpuif->vgic_hcr |= (cpuif->vgic_vmcr & ICH_VMCR_ENG0_MASK) ?
cpuif->vgic_hcr |= (cpuif->vgic_vmcr & ICH_VMCR_EL2_VENG0_MASK) ?
ICH_HCR_EL2_VGrp0DIE : ICH_HCR_EL2_VGrp0EIE;
cpuif->vgic_hcr |= (cpuif->vgic_vmcr & ICH_VMCR_ENG1_MASK) ?
cpuif->vgic_hcr |= (cpuif->vgic_vmcr & ICH_VMCR_EL2_VENG1_MASK) ?
ICH_HCR_EL2_VGrp1DIE : ICH_HCR_EL2_VGrp1EIE;
/*
@ -215,7 +215,7 @@ void vgic_v3_deactivate(struct kvm_vcpu *vcpu, u64 val)
* We only deal with DIR when EOIMode==1, and only for SGI,
* PPI or SPI.
*/
if (!(cpuif->vgic_vmcr & ICH_VMCR_EOIM_MASK) ||
if (!(cpuif->vgic_vmcr & ICH_VMCR_EL2_VEOIM_MASK) ||
val >= vcpu->kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS)
return;
@ -408,25 +408,23 @@ void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
u32 vmcr;
if (model == KVM_DEV_TYPE_ARM_VGIC_V2) {
vmcr = (vmcrp->ackctl << ICH_VMCR_ACK_CTL_SHIFT) &
ICH_VMCR_ACK_CTL_MASK;
vmcr |= (vmcrp->fiqen << ICH_VMCR_FIQ_EN_SHIFT) &
ICH_VMCR_FIQ_EN_MASK;
vmcr = FIELD_PREP(ICH_VMCR_EL2_VAckCtl, vmcrp->ackctl);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VFIQEn, vmcrp->fiqen);
} else {
/*
* When emulating GICv3 on GICv3 with SRE=1, the
* VFIQEn bit is RES1 and the VAckCtl bit is RES0.
*/
vmcr = ICH_VMCR_FIQ_EN_MASK;
vmcr = ICH_VMCR_EL2_VFIQEn_MASK;
}
vmcr |= (vmcrp->cbpr << ICH_VMCR_CBPR_SHIFT) & ICH_VMCR_CBPR_MASK;
vmcr |= (vmcrp->eoim << ICH_VMCR_EOIM_SHIFT) & ICH_VMCR_EOIM_MASK;
vmcr |= (vmcrp->abpr << ICH_VMCR_BPR1_SHIFT) & ICH_VMCR_BPR1_MASK;
vmcr |= (vmcrp->bpr << ICH_VMCR_BPR0_SHIFT) & ICH_VMCR_BPR0_MASK;
vmcr |= (vmcrp->pmr << ICH_VMCR_PMR_SHIFT) & ICH_VMCR_PMR_MASK;
vmcr |= (vmcrp->grpen0 << ICH_VMCR_ENG0_SHIFT) & ICH_VMCR_ENG0_MASK;
vmcr |= (vmcrp->grpen1 << ICH_VMCR_ENG1_SHIFT) & ICH_VMCR_ENG1_MASK;
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VCBPR, vmcrp->cbpr);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VEOIM, vmcrp->eoim);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VBPR1, vmcrp->abpr);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VBPR0, vmcrp->bpr);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VPMR, vmcrp->pmr);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VENG0, vmcrp->grpen0);
vmcr |= FIELD_PREP(ICH_VMCR_EL2_VENG1, vmcrp->grpen1);
cpu_if->vgic_vmcr = vmcr;
}
@ -440,10 +438,8 @@ void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
vmcr = cpu_if->vgic_vmcr;
if (model == KVM_DEV_TYPE_ARM_VGIC_V2) {
vmcrp->ackctl = (vmcr & ICH_VMCR_ACK_CTL_MASK) >>
ICH_VMCR_ACK_CTL_SHIFT;
vmcrp->fiqen = (vmcr & ICH_VMCR_FIQ_EN_MASK) >>
ICH_VMCR_FIQ_EN_SHIFT;
vmcrp->ackctl = FIELD_GET(ICH_VMCR_EL2_VAckCtl, vmcr);
vmcrp->fiqen = FIELD_GET(ICH_VMCR_EL2_VFIQEn, vmcr);
} else {
/*
* When emulating GICv3 on GICv3 with SRE=1 on the
@ -453,13 +449,13 @@ void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
vmcrp->ackctl = 0;
}
vmcrp->cbpr = (vmcr & ICH_VMCR_CBPR_MASK) >> ICH_VMCR_CBPR_SHIFT;
vmcrp->eoim = (vmcr & ICH_VMCR_EOIM_MASK) >> ICH_VMCR_EOIM_SHIFT;
vmcrp->abpr = (vmcr & ICH_VMCR_BPR1_MASK) >> ICH_VMCR_BPR1_SHIFT;
vmcrp->bpr = (vmcr & ICH_VMCR_BPR0_MASK) >> ICH_VMCR_BPR0_SHIFT;
vmcrp->pmr = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
vmcrp->grpen0 = (vmcr & ICH_VMCR_ENG0_MASK) >> ICH_VMCR_ENG0_SHIFT;
vmcrp->grpen1 = (vmcr & ICH_VMCR_ENG1_MASK) >> ICH_VMCR_ENG1_SHIFT;
vmcrp->cbpr = FIELD_GET(ICH_VMCR_EL2_VCBPR, vmcr);
vmcrp->eoim = FIELD_GET(ICH_VMCR_EL2_VEOIM, vmcr);
vmcrp->abpr = FIELD_GET(ICH_VMCR_EL2_VBPR1, vmcr);
vmcrp->bpr = FIELD_GET(ICH_VMCR_EL2_VBPR0, vmcr);
vmcrp->pmr = FIELD_GET(ICH_VMCR_EL2_VPMR, vmcr);
vmcrp->grpen0 = FIELD_GET(ICH_VMCR_EL2_VENG0, vmcr);
vmcrp->grpen1 = FIELD_GET(ICH_VMCR_EL2_VENG1, vmcr);
}
#define INITIAL_PENDBASER_VALUE \
@ -880,6 +876,20 @@ void noinstr kvm_compute_ich_hcr_trap_bits(struct alt_instr *alt,
*updptr = cpu_to_le32(insn);
}
void vgic_v3_enable_cpuif_traps(void)
{
u64 traps = vgic_ich_hcr_trap_bits();
if (traps) {
kvm_info("GICv3 sysreg trapping enabled ([%s%s%s%s], reduced performance)\n",
(traps & ICH_HCR_EL2_TALL0) ? "G0" : "",
(traps & ICH_HCR_EL2_TALL1) ? "G1" : "",
(traps & ICH_HCR_EL2_TC) ? "C" : "",
(traps & ICH_HCR_EL2_TDIR) ? "D" : "");
static_branch_enable(&vgic_v3_cpuif_trap);
}
}
/**
* vgic_v3_probe - probe for a VGICv3 compatible interrupt controller
* @info: pointer to the GIC description
@ -891,7 +901,6 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
{
u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
bool has_v2;
u64 traps;
int ret;
has_v2 = ich_vtr_el2 >> 63;
@ -955,15 +964,7 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
kvm_vgic_global_state.ich_vtr_el2 &= ~ICH_VTR_EL2_SEIS;
}
traps = vgic_ich_hcr_trap_bits();
if (traps) {
kvm_info("GICv3 sysreg trapping enabled ([%s%s%s%s], reduced performance)\n",
(traps & ICH_HCR_EL2_TALL0) ? "G0" : "",
(traps & ICH_HCR_EL2_TALL1) ? "G1" : "",
(traps & ICH_HCR_EL2_TC) ? "C" : "",
(traps & ICH_HCR_EL2_TDIR) ? "D" : "");
static_branch_enable(&vgic_v3_cpuif_trap);
}
vgic_v3_enable_cpuif_traps();
kvm_vgic_global_state.vctrl_base = NULL;
kvm_vgic_global_state.type = VGIC_V3;

@ -48,5 +48,7 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
static_branch_enable(&kvm_vgic_global_state.gicv3_cpuif);
kvm_info("GCIE legacy system register CPU interface\n");
vgic_v3_enable_cpuif_traps();
return 0;
}

@ -324,6 +324,7 @@ void vgic_v3_configure_hcr(struct kvm_vcpu *vcpu, struct ap_list_summary *als);
void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v3_reset(struct kvm_vcpu *vcpu);
void vgic_v3_enable_cpuif_traps(void);
int vgic_v3_probe(const struct gic_kvm_info *info);
int vgic_v3_map_resources(struct kvm *kvm);
int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq);

@ -1856,10 +1856,7 @@ UnsignedEnum 31:28 RDM
0b0000 NI
0b0001 IMP
EndEnum
UnsignedEnum 27:24 TME
0b0000 NI
0b0001 IMP
EndEnum
Res0 27:24
UnsignedEnum 23:20 ATOMIC
0b0000 NI
0b0010 IMP
@ -2098,18 +2095,18 @@ UnsignedEnum 47:44 EXS
0b0000 NI
0b0001 IMP
EndEnum
Enum 43:40 TGRAN4_2
UnsignedEnum 43:40 TGRAN4_2
0b0000 TGRAN4
0b0001 NI
0b0010 IMP
0b0011 52_BIT
EndEnum
Enum 39:36 TGRAN64_2
UnsignedEnum 39:36 TGRAN64_2
0b0000 TGRAN64
0b0001 NI
0b0010 IMP
EndEnum
Enum 35:32 TGRAN16_2
UnsignedEnum 35:32 TGRAN16_2
0b0000 TGRAN16
0b0001 NI
0b0010 IMP
@ -2256,9 +2253,10 @@ UnsignedEnum 43:40 FWB
0b0000 NI
0b0001 IMP
EndEnum
Enum 39:36 IDS
0b0000 0x0
0b0001 0x18
UnsignedEnum 39:36 IDS
0b0000 NI
0b0001 IMP
0b0010 EL3
EndEnum
UnsignedEnum 35:32 AT
0b0000 NI
@ -2432,10 +2430,7 @@ Field 57 EPAN
Field 56 EnALS
Field 55 EnAS0
Field 54 EnASR
Field 53 TME
Field 52 TME0
Field 51 TMT
Field 50 TMT0
Res0 53:50
Field 49:46 TWEDEL
Field 45 TWEDEn
Field 44 DSSBS
@ -3749,6 +3744,75 @@ UnsignedEnum 2:0 F8S1
EndEnum
EndSysreg
Sysreg SCTLR_EL2 3 4 1 0 0
Field 63 TIDCP
Field 62 SPINTMASK
Field 61 NMI
Field 60 EnTP2
Field 59 TCSO
Field 58 TCSO0
Field 57 EPAN
Field 56 EnALS
Field 55 EnAS0
Field 54 EnASR
Res0 53:50
Field 49:46 TWEDEL
Field 45 TWEDEn
Field 44 DSSBS
Field 43 ATA
Field 42 ATA0
Enum 41:40 TCF
0b00 NONE
0b01 SYNC
0b10 ASYNC
0b11 ASYMM
EndEnum
Enum 39:38 TCF0
0b00 NONE
0b01 SYNC
0b10 ASYNC
0b11 ASYMM
EndEnum
Field 37 ITFSB
Field 36 BT
Field 35 BT0
Field 34 EnFPM
Field 33 MSCEn
Field 32 CMOW
Field 31 EnIA
Field 30 EnIB
Field 29 LSMAOE
Field 28 nTLSMD
Field 27 EnDA
Field 26 UCI
Field 25 EE
Field 24 E0E
Field 23 SPAN
Field 22 EIS
Field 21 IESB
Field 20 TSCXT
Field 19 WXN
Field 18 nTWE
Res0 17
Field 16 nTWI
Field 15 UCT
Field 14 DZE
Field 13 EnDB
Field 12 I
Field 11 EOS
Field 10 EnRCTX
Res0 9
Field 8 SED
Field 7 ITD
Field 6 nAA
Field 5 CP15BEN
Field 4 SA0
Field 3 SA
Field 2 C
Field 1 A
Field 0 M
EndSysreg
Sysreg HCR_EL2 3 4 1 1 0
Field 63:60 TWEDEL
Field 59 TWEDEn
@ -3771,8 +3835,7 @@ Field 43 NV1
Field 42 NV
Field 41 API
Field 40 APK
Field 39 TME
Field 38 MIOCNCE
Res0 39:38
Field 37 TEA
Field 36 TERR
Field 35 TLOR
@ -4400,6 +4463,63 @@ Field 56:12 BADDR
Res0 11:0
EndSysreg
Sysreg VTCR_EL2 3 4 2 1 2
Res0 63:46
Field 45 HDBSS
Field 44 HAFT
Res0 43:42
Field 41 TL0
Field 40 GCSH
Res0 39
Field 38 D128
Field 37 S2POE
Field 36 S2PIE
Field 35 TL1
Field 34 AssuredOnly
Field 33 SL2
Field 32 DS
Res1 31
Field 30 NSA
Field 29 NSW
Field 28 HWU62
Field 27 HWU61
Field 26 HWU60
Field 25 HWU59
Res0 24:23
Field 22 HD
Field 21 HA
Res0 20
Enum 19 VS
0b0 8BIT
0b1 16BIT
EndEnum
Field 18:16 PS
Enum 15:14 TG0
0b00 4K
0b01 64K
0b10 16K
EndEnum
Enum 13:12 SH0
0b00 NONE
0b01 OUTER
0b11 INNER
EndEnum
Enum 11:10 ORGN0
0b00 NC
0b01 WBWA
0b10 WT
0b11 WBnWA
EndEnum
Enum 9:8 IRGN0
0b00 NC
0b01 WBWA
0b10 WT
0b11 WBnWA
EndEnum
Field 7:6 SL0
Field 5:0 T0SZ
EndSysreg
Sysreg GCSCR_EL2 3 4 2 5 0
Fields GCSCR_ELx
EndSysreg
@ -4579,7 +4699,7 @@ Field 7 ICC_IAFFIDR_EL1
Field 6 ICC_ICSR_EL1
Field 5 ICC_PCR_EL1
Field 4 ICC_HPPIR_EL1
Field 3 ICC_HAPR_EL1
Res1 3
Field 2 ICC_CR0_EL1
Field 1 ICC_IDRn_EL1
Field 0 ICC_APR_EL1

@ -37,6 +37,7 @@
#define KVM_REQ_TLB_FLUSH_GPA KVM_ARCH_REQ(0)
#define KVM_REQ_STEAL_UPDATE KVM_ARCH_REQ(1)
#define KVM_REQ_PMU KVM_ARCH_REQ(2)
#define KVM_REQ_AUX_LOAD KVM_ARCH_REQ(3)
#define KVM_GUESTDBG_SW_BP_MASK \
(KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)
@ -164,6 +165,7 @@ enum emulation_result {
#define LOONGARCH_PV_FEAT_UPDATED BIT_ULL(63)
#define LOONGARCH_PV_FEAT_MASK (BIT(KVM_FEATURE_IPI) | \
BIT(KVM_FEATURE_PREEMPT) | \
BIT(KVM_FEATURE_STEAL_TIME) | \
BIT(KVM_FEATURE_USER_HCALL) | \
BIT(KVM_FEATURE_VIRT_EXTIOI))
@ -200,6 +202,7 @@ struct kvm_vcpu_arch {
/* Which auxiliary state is loaded (KVM_LARCH_*) */
unsigned int aux_inuse;
unsigned int aux_ldtype;
/* FPU state */
struct loongarch_fpu fpu FPU_ALIGN;
@ -252,6 +255,7 @@ struct kvm_vcpu_arch {
u64 guest_addr;
u64 last_steal;
struct gfn_to_hva_cache cache;
u8 preempted;
} st;
};
@ -265,6 +269,11 @@ static inline void writel_sw_gcsr(struct loongarch_csrs *csr, int reg, unsigned
csr->csrs[reg] = val;
}
static inline bool kvm_guest_has_msgint(struct kvm_vcpu_arch *arch)
{
return arch->cpucfg[1] & CPUCFG1_MSGINT;
}
static inline bool kvm_guest_has_fpu(struct kvm_vcpu_arch *arch)
{
return arch->cpucfg[2] & CPUCFG2_FP;

@ -37,8 +37,10 @@ struct kvm_steal_time {
__u64 steal;
__u32 version;
__u32 flags;
__u32 pad[12];
__u8 preempted;
__u8 pad[47];
};
#define KVM_VCPU_PREEMPTED (1 << 0)
/*
* Hypercall interface for KVM hypervisor

@ -690,6 +690,7 @@
#define LOONGARCH_CSR_ISR3 0xa3
#define LOONGARCH_CSR_IRR 0xa4
#define LOONGARCH_CSR_IPR 0xa5
#define LOONGARCH_CSR_PRID 0xc0

@ -34,6 +34,10 @@ __retry:
return true;
}
#define vcpu_is_preempted vcpu_is_preempted
bool vcpu_is_preempted(int cpu);
#endif /* CONFIG_PARAVIRT */
#include <asm-generic/qspinlock.h>

@ -105,6 +105,7 @@ struct kvm_fpu {
#define KVM_LOONGARCH_VM_FEAT_PV_STEALTIME 7
#define KVM_LOONGARCH_VM_FEAT_PTW 8
#define KVM_LOONGARCH_VM_FEAT_MSGINT 9
#define KVM_LOONGARCH_VM_FEAT_PV_PREEMPT 10
/* Device Control API on vcpu fd */
#define KVM_LOONGARCH_VCPU_CPUCFG 0

@ -15,6 +15,7 @@
#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
#define KVM_FEATURE_IPI 1
#define KVM_FEATURE_STEAL_TIME 2
#define KVM_FEATURE_PREEMPT 3
/* BIT 24 - 31 are features configurable by user space vmm */
#define KVM_FEATURE_VIRT_EXTIOI 24
#define KVM_FEATURE_USER_HCALL 25

@ -11,6 +11,7 @@
static int has_steal_clock;
static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
static DEFINE_STATIC_KEY_FALSE(virt_preempt_key);
DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
static bool steal_acc = true;
@ -259,6 +260,18 @@ static int pv_time_cpu_down_prepare(unsigned int cpu)
return 0;
}
bool vcpu_is_preempted(int cpu)
{
struct kvm_steal_time *src;
if (!static_branch_unlikely(&virt_preempt_key))
return false;
src = &per_cpu(steal_time, cpu);
return !!(src->preempted & KVM_VCPU_PREEMPTED);
}
EXPORT_SYMBOL(vcpu_is_preempted);
#endif
static void pv_cpu_reboot(void *unused)
@ -300,6 +313,9 @@ int __init pv_time_init(void)
pr_err("Failed to install cpu hotplug callbacks\n");
return r;
}
if (kvm_para_has_feature(KVM_FEATURE_PREEMPT))
static_branch_enable(&virt_preempt_key);
#endif
static_call_update(pv_steal_clock, paravt_steal_clock);
@ -310,7 +326,10 @@ int __init pv_time_init(void)
static_key_slow_inc(&paravirt_steal_rq_enabled);
#endif
pr_info("Using paravirt steal-time\n");
if (static_key_enabled(&virt_preempt_key))
pr_info("Using paravirt steal-time with preempt enabled\n");
else
pr_info("Using paravirt steal-time with preempt disabled\n");
return 0;
}

@ -754,7 +754,8 @@ static int kvm_handle_fpu_disabled(struct kvm_vcpu *vcpu, int ecode)
return RESUME_HOST;
}
kvm_own_fpu(vcpu);
vcpu->arch.aux_ldtype = KVM_LARCH_FPU;
kvm_make_request(KVM_REQ_AUX_LOAD, vcpu);
return RESUME_GUEST;
}
@ -792,8 +793,12 @@ static long kvm_save_notify(struct kvm_vcpu *vcpu)
*/
static int kvm_handle_lsx_disabled(struct kvm_vcpu *vcpu, int ecode)
{
if (kvm_own_lsx(vcpu))
if (!kvm_guest_has_lsx(&vcpu->arch))
kvm_queue_exception(vcpu, EXCCODE_INE, 0);
else {
vcpu->arch.aux_ldtype = KVM_LARCH_LSX;
kvm_make_request(KVM_REQ_AUX_LOAD, vcpu);
}
return RESUME_GUEST;
}
@ -808,16 +813,24 @@ static int kvm_handle_lsx_disabled(struct kvm_vcpu *vcpu, int ecode)
*/
static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu, int ecode)
{
if (kvm_own_lasx(vcpu))
if (!kvm_guest_has_lasx(&vcpu->arch))
kvm_queue_exception(vcpu, EXCCODE_INE, 0);
else {
vcpu->arch.aux_ldtype = KVM_LARCH_LASX;
kvm_make_request(KVM_REQ_AUX_LOAD, vcpu);
}
return RESUME_GUEST;
}
static int kvm_handle_lbt_disabled(struct kvm_vcpu *vcpu, int ecode)
{
if (kvm_own_lbt(vcpu))
if (!kvm_guest_has_lbt(&vcpu->arch))
kvm_queue_exception(vcpu, EXCCODE_INE, 0);
else {
vcpu->arch.aux_ldtype = KVM_LARCH_LBT;
kvm_make_request(KVM_REQ_AUX_LOAD, vcpu);
}
return RESUME_GUEST;
}

@ -119,7 +119,7 @@ void eiointc_set_irq(struct loongarch_eiointc *s, int irq, int level)
static int loongarch_eiointc_read(struct kvm_vcpu *vcpu, struct loongarch_eiointc *s,
gpa_t addr, unsigned long *val)
{
int index, ret = 0;
int index;
u64 data = 0;
gpa_t offset;
@ -150,40 +150,36 @@ static int loongarch_eiointc_read(struct kvm_vcpu *vcpu, struct loongarch_eioint
data = s->coremap[index];
break;
default:
ret = -EINVAL;
break;
}
*val = data;
return ret;
return 0;
}
static int kvm_eiointc_read(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, void *val)
{
int ret = -EINVAL;
unsigned long flags, data, offset;
struct loongarch_eiointc *eiointc = vcpu->kvm->arch.eiointc;
if (!eiointc) {
kvm_err("%s: eiointc irqchip not valid!\n", __func__);
return -EINVAL;
return 0;
}
if (addr & (len - 1)) {
kvm_err("%s: eiointc not aligned addr %llx len %d\n", __func__, addr, len);
return -EINVAL;
return 0;
}
offset = addr & 0x7;
addr -= offset;
vcpu->stat.eiointc_read_exits++;
spin_lock_irqsave(&eiointc->lock, flags);
ret = loongarch_eiointc_read(vcpu, eiointc, addr, &data);
loongarch_eiointc_read(vcpu, eiointc, addr, &data);
spin_unlock_irqrestore(&eiointc->lock, flags);
if (ret)
return ret;
data = data >> (offset * 8);
switch (len) {
@ -208,7 +204,7 @@ static int loongarch_eiointc_write(struct kvm_vcpu *vcpu,
struct loongarch_eiointc *s,
gpa_t addr, u64 value, u64 field_mask)
{
int index, irq, ret = 0;
int index, irq;
u8 cpu;
u64 data, old, mask;
gpa_t offset;
@ -287,29 +283,27 @@ static int loongarch_eiointc_write(struct kvm_vcpu *vcpu,
eiointc_update_sw_coremap(s, index * 8, data, sizeof(data), true);
break;
default:
ret = -EINVAL;
break;
}
return ret;
return 0;
}
static int kvm_eiointc_write(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, const void *val)
{
int ret = -EINVAL;
unsigned long flags, value;
struct loongarch_eiointc *eiointc = vcpu->kvm->arch.eiointc;
if (!eiointc) {
kvm_err("%s: eiointc irqchip not valid!\n", __func__);
return -EINVAL;
return 0;
}
if (addr & (len - 1)) {
kvm_err("%s: eiointc not aligned addr %llx len %d\n", __func__, addr, len);
return -EINVAL;
return 0;
}
vcpu->stat.eiointc_write_exits++;
@ -317,24 +311,24 @@ static int kvm_eiointc_write(struct kvm_vcpu *vcpu,
switch (len) {
case 1:
value = *(unsigned char *)val;
ret = loongarch_eiointc_write(vcpu, eiointc, addr, value, 0xFF);
loongarch_eiointc_write(vcpu, eiointc, addr, value, 0xFF);
break;
case 2:
value = *(unsigned short *)val;
ret = loongarch_eiointc_write(vcpu, eiointc, addr, value, USHRT_MAX);
loongarch_eiointc_write(vcpu, eiointc, addr, value, USHRT_MAX);
break;
case 4:
value = *(unsigned int *)val;
ret = loongarch_eiointc_write(vcpu, eiointc, addr, value, UINT_MAX);
loongarch_eiointc_write(vcpu, eiointc, addr, value, UINT_MAX);
break;
default:
value = *(unsigned long *)val;
ret = loongarch_eiointc_write(vcpu, eiointc, addr, value, ULONG_MAX);
loongarch_eiointc_write(vcpu, eiointc, addr, value, ULONG_MAX);
break;
}
spin_unlock_irqrestore(&eiointc->lock, flags);
return ret;
return 0;
}
static const struct kvm_io_device_ops kvm_eiointc_ops = {
@ -352,7 +346,7 @@ static int kvm_eiointc_virt_read(struct kvm_vcpu *vcpu,
if (!eiointc) {
kvm_err("%s: eiointc irqchip not valid!\n", __func__);
return -EINVAL;
return 0;
}
addr -= EIOINTC_VIRT_BASE;
@ -376,28 +370,25 @@ static int kvm_eiointc_virt_write(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, const void *val)
{
int ret = 0;
unsigned long flags;
u32 value = *(u32 *)val;
struct loongarch_eiointc *eiointc = vcpu->kvm->arch.eiointc;
if (!eiointc) {
kvm_err("%s: eiointc irqchip not valid!\n", __func__);
return -EINVAL;
return 0;
}
addr -= EIOINTC_VIRT_BASE;
spin_lock_irqsave(&eiointc->lock, flags);
switch (addr) {
case EIOINTC_VIRT_FEATURES:
ret = -EPERM;
break;
case EIOINTC_VIRT_CONFIG:
/*
* eiointc features can only be set at disabled status
*/
if ((eiointc->status & BIT(EIOINTC_ENABLE)) && value) {
ret = -EPERM;
break;
}
eiointc->status = value & eiointc->features;
@ -407,7 +398,7 @@ static int kvm_eiointc_virt_write(struct kvm_vcpu *vcpu,
}
spin_unlock_irqrestore(&eiointc->lock, flags);
return ret;
return 0;
}
static const struct kvm_io_device_ops kvm_eiointc_virt_ops = {

@ -111,7 +111,7 @@ static int mail_send(struct kvm *kvm, uint64_t data)
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
return 0;
}
mailbox = ((data & 0xffffffff) >> 2) & 0x7;
offset = IOCSR_IPI_BUF_20 + mailbox * 4;
@ -145,7 +145,7 @@ static int send_ipi_data(struct kvm_vcpu *vcpu, gpa_t addr, uint64_t data)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
if (unlikely(ret)) {
kvm_err("%s: : read data from addr %llx failed\n", __func__, addr);
return ret;
return 0;
}
/* Construct the mask by scanning the bit 27-30 */
for (i = 0; i < 4; i++) {
@ -162,7 +162,7 @@ static int send_ipi_data(struct kvm_vcpu *vcpu, gpa_t addr, uint64_t data)
if (unlikely(ret))
kvm_err("%s: : write data to addr %llx failed\n", __func__, addr);
return ret;
return 0;
}
static int any_send(struct kvm *kvm, uint64_t data)
@ -174,7 +174,7 @@ static int any_send(struct kvm *kvm, uint64_t data)
vcpu = kvm_get_vcpu_by_cpuid(kvm, cpu);
if (unlikely(vcpu == NULL)) {
kvm_err("%s: invalid target cpu: %d\n", __func__, cpu);
return -EINVAL;
return 0;
}
offset = data & 0xffff;
@ -183,7 +183,6 @@ static int any_send(struct kvm *kvm, uint64_t data)
static int loongarch_ipi_readl(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *val)
{
int ret = 0;
uint32_t offset;
uint64_t res = 0;
@ -202,33 +201,27 @@ static int loongarch_ipi_readl(struct kvm_vcpu *vcpu, gpa_t addr, int len, void
spin_unlock(&vcpu->arch.ipi_state.lock);
break;
case IOCSR_IPI_SET:
res = 0;
break;
case IOCSR_IPI_CLEAR:
res = 0;
break;
case IOCSR_IPI_BUF_20 ... IOCSR_IPI_BUF_38 + 7:
if (offset + len > IOCSR_IPI_BUF_38 + 8) {
kvm_err("%s: invalid offset or len: offset = %d, len = %d\n",
__func__, offset, len);
ret = -EINVAL;
break;
}
res = read_mailbox(vcpu, offset, len);
break;
default:
kvm_err("%s: unknown addr: %llx\n", __func__, addr);
ret = -EINVAL;
break;
}
*(uint64_t *)val = res;
return ret;
return 0;
}
static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, const void *val)
{
int ret = 0;
uint64_t data;
uint32_t offset;
@ -239,7 +232,6 @@ static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, cons
switch (offset) {
case IOCSR_IPI_STATUS:
ret = -EINVAL;
break;
case IOCSR_IPI_EN:
spin_lock(&vcpu->arch.ipi_state.lock);
@ -257,7 +249,6 @@ static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, cons
if (offset + len > IOCSR_IPI_BUF_38 + 8) {
kvm_err("%s: invalid offset or len: offset = %d, len = %d\n",
__func__, offset, len);
ret = -EINVAL;
break;
}
write_mailbox(vcpu, offset, data, len);
@ -266,18 +257,17 @@ static int loongarch_ipi_writel(struct kvm_vcpu *vcpu, gpa_t addr, int len, cons
ipi_send(vcpu->kvm, data);
break;
case IOCSR_MAIL_SEND:
ret = mail_send(vcpu->kvm, data);
mail_send(vcpu->kvm, data);
break;
case IOCSR_ANY_SEND:
ret = any_send(vcpu->kvm, data);
any_send(vcpu->kvm, data);
break;
default:
kvm_err("%s: unknown addr: %llx\n", __func__, addr);
ret = -EINVAL;
break;
}
return ret;
return 0;
}
static int kvm_ipi_read(struct kvm_vcpu *vcpu,

@ -74,7 +74,7 @@ void pch_msi_set_irq(struct kvm *kvm, int irq, int level)
static int loongarch_pch_pic_read(struct loongarch_pch_pic *s, gpa_t addr, int len, void *val)
{
int ret = 0, offset;
int offset;
u64 data = 0;
void *ptemp;
@ -121,34 +121,32 @@ static int loongarch_pch_pic_read(struct loongarch_pch_pic *s, gpa_t addr, int l
data = s->isr;
break;
default:
ret = -EINVAL;
break;
}
spin_unlock(&s->lock);
if (ret == 0) {
offset = (addr - s->pch_pic_base) & 7;
data = data >> (offset * 8);
memcpy(val, &data, len);
}
offset = (addr - s->pch_pic_base) & 7;
data = data >> (offset * 8);
memcpy(val, &data, len);
return ret;
return 0;
}
static int kvm_pch_pic_read(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, void *val)
{
int ret;
int ret = 0;
struct loongarch_pch_pic *s = vcpu->kvm->arch.pch_pic;
if (!s) {
kvm_err("%s: pch pic irqchip not valid!\n", __func__);
return -EINVAL;
return ret;
}
if (addr & (len - 1)) {
kvm_err("%s: pch pic not aligned addr %llx len %d\n", __func__, addr, len);
return -EINVAL;
return ret;
}
/* statistics of pch pic reading */
@ -161,7 +159,7 @@ static int kvm_pch_pic_read(struct kvm_vcpu *vcpu,
static int loongarch_pch_pic_write(struct loongarch_pch_pic *s, gpa_t addr,
int len, const void *val)
{
int ret = 0, offset;
int offset;
u64 old, data, mask;
void *ptemp;
@ -226,29 +224,28 @@ static int loongarch_pch_pic_write(struct loongarch_pch_pic *s, gpa_t addr,
case PCH_PIC_ROUTE_ENTRY_START ... PCH_PIC_ROUTE_ENTRY_END:
break;
default:
ret = -EINVAL;
break;
}
spin_unlock(&s->lock);
return ret;
return 0;
}
static int kvm_pch_pic_write(struct kvm_vcpu *vcpu,
struct kvm_io_device *dev,
gpa_t addr, int len, const void *val)
{
int ret;
int ret = 0;
struct loongarch_pch_pic *s = vcpu->kvm->arch.pch_pic;
if (!s) {
kvm_err("%s: pch pic irqchip not valid!\n", __func__);
return -EINVAL;
return ret;
}
if (addr & (len - 1)) {
kvm_err("%s: pch pic not aligned addr %llx len %d\n", __func__, addr, len);
return -EINVAL;
return ret;
}
/* statistics of pch pic writing */

@ -32,7 +32,7 @@ static int kvm_irq_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
if (priority < EXCCODE_INT_NUM)
irq = priority_to_irq[priority];
if (cpu_has_msgint && (priority == INT_AVEC)) {
if (kvm_guest_has_msgint(&vcpu->arch) && (priority == INT_AVEC)) {
set_gcsr_estat(irq);
return 1;
}
@ -64,7 +64,7 @@ static int kvm_irq_clear(struct kvm_vcpu *vcpu, unsigned int priority)
if (priority < EXCCODE_INT_NUM)
irq = priority_to_irq[priority];
if (cpu_has_msgint && (priority == INT_AVEC)) {
if (kvm_guest_has_msgint(&vcpu->arch) && (priority == INT_AVEC)) {
clear_gcsr_estat(irq);
return 1;
}

@ -192,6 +192,14 @@ static void kvm_init_gcsr_flag(void)
set_gcsr_sw_flag(LOONGARCH_CSR_PERFCNTR2);
set_gcsr_sw_flag(LOONGARCH_CSR_PERFCTRL3);
set_gcsr_sw_flag(LOONGARCH_CSR_PERFCNTR3);
if (cpu_has_msgint) {
set_gcsr_hw_flag(LOONGARCH_CSR_IPR);
set_gcsr_hw_flag(LOONGARCH_CSR_ISR0);
set_gcsr_hw_flag(LOONGARCH_CSR_ISR1);
set_gcsr_hw_flag(LOONGARCH_CSR_ISR2);
set_gcsr_hw_flag(LOONGARCH_CSR_ISR3);
}
}
static void kvm_update_vpid(struct kvm_vcpu *vcpu, int cpu)
@ -394,7 +402,7 @@ static int kvm_loongarch_env_init(void)
}
kvm_init_gcsr_flag();
kvm_register_perf_callbacks(NULL);
kvm_register_perf_callbacks();
/* Register LoongArch IPI interrupt controller interface. */
ret = kvm_loongarch_register_ipi_device();


@@ -181,6 +181,11 @@ static void kvm_update_stolen_time(struct kvm_vcpu *vcpu)
}
st = (struct kvm_steal_time __user *)ghc->hva;
if (kvm_guest_has_pv_feature(vcpu, KVM_FEATURE_PREEMPT)) {
unsafe_put_user(0, &st->preempted, out);
vcpu->arch.st.preempted = 0;
}
unsafe_get_user(version, &st->version, out);
if (version & 1)
version += 1; /* first time write, random junk */
@@ -232,6 +237,27 @@ static void kvm_late_check_requests(struct kvm_vcpu *vcpu)
kvm_flush_tlb_gpa(vcpu, vcpu->arch.flush_gpa);
vcpu->arch.flush_gpa = INVALID_GPA;
}
if (kvm_check_request(KVM_REQ_AUX_LOAD, vcpu)) {
switch (vcpu->arch.aux_ldtype) {
case KVM_LARCH_FPU:
kvm_own_fpu(vcpu);
break;
case KVM_LARCH_LSX:
kvm_own_lsx(vcpu);
break;
case KVM_LARCH_LASX:
kvm_own_lasx(vcpu);
break;
case KVM_LARCH_LBT:
kvm_own_lbt(vcpu);
break;
default:
break;
}
vcpu->arch.aux_ldtype = 0;
}
}
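The KVM_REQ_AUX_LOAD handling above defers the actual FPU/LSX/LASX/LBT context restore from the trap path to the next request-check pass before guest entry. A minimal sketch of that latch pattern; the names (`vcpu_sketch`, `AUX_FPU`, and friends) are purely illustrative stand-ins, not the kernel's types:

```c
#include <assert.h>

/* Illustrative unit IDs and a tiny lazy-load latch: the exit handler
 * records which unit the guest touched, and the request-check pass
 * performs the actual restore exactly once before re-entering. */
enum aux_unit { AUX_NONE, AUX_FPU, AUX_LSX };

struct vcpu_sketch {
	enum aux_unit pending;	/* set on trap, consumed on vcpu entry */
	int fpu_loads;		/* counts real context restores */
};

static void request_aux_load(struct vcpu_sketch *v, enum aux_unit u)
{
	v->pending = u;		/* cheap: just remember, don't restore yet */
}

static void check_requests(struct vcpu_sketch *v)
{
	if (v->pending == AUX_FPU)
		v->fpu_loads++;	/* the real code calls kvm_own_fpu() here */
	v->pending = AUX_NONE;
}
```

Recording the request is idempotent, so repeated traps before the next entry still cost only one restore.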
/*
@@ -652,6 +678,8 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
static int _kvm_get_cpucfg_mask(int id, u64 *v)
{
unsigned int config;
if (id < 0 || id >= KVM_MAX_CPUCFG_REGS)
return -EINVAL;
@@ -684,9 +712,17 @@ static int _kvm_get_cpucfg_mask(int id, u64 *v)
if (cpu_has_ptw)
*v |= CPUCFG2_PTW;
config = read_cpucfg(LOONGARCH_CPUCFG2);
*v |= config & (CPUCFG2_FRECIPE | CPUCFG2_DIV32 | CPUCFG2_LAM_BH);
*v |= config & (CPUCFG2_LAMCAS | CPUCFG2_LLACQ_SCREL | CPUCFG2_SCQ);
return 0;
case LOONGARCH_CPUCFG3:
*v = GENMASK(16, 0);
*v = GENMASK(23, 0);
/* VM does not support memory order and SFB setting */
config = read_cpucfg(LOONGARCH_CPUCFG3);
*v &= config & ~(CPUCFG3_SFB);
*v &= config & ~(CPUCFG3_ALDORDER_CAP | CPUCFG3_ASTORDER_CAP | CPUCFG3_SLDORDER_CAP);
return 0;
case LOONGARCH_CPUCFG4:
case LOONGARCH_CPUCFG5:
@@ -717,6 +753,7 @@ static int _kvm_get_cpucfg_mask(int id, u64 *v)
static int kvm_check_cpucfg(int id, u64 val)
{
int ret;
u32 host;
u64 mask = 0;
ret = _kvm_get_cpucfg_mask(id, &mask);
@@ -746,9 +783,16 @@ static int kvm_check_cpucfg(int id, u64 val)
/* LASX architecturally implies LSX and FP but val does not satisfy that */
return -EINVAL;
return 0;
case LOONGARCH_CPUCFG3:
host = read_cpucfg(LOONGARCH_CPUCFG3);
if ((val & CPUCFG3_RVAMAX) > (host & CPUCFG3_RVAMAX))
return -EINVAL;
if ((val & CPUCFG3_SPW_LVL) > (host & CPUCFG3_SPW_LVL))
return -EINVAL;
return 0;
case LOONGARCH_CPUCFG6:
if (val & CPUCFG6_PMP) {
u32 host = read_cpucfg(LOONGARCH_CPUCFG6);
host = read_cpucfg(LOONGARCH_CPUCFG6);
if ((val & CPUCFG6_PMBITS) != (host & CPUCFG6_PMBITS))
return -EINVAL;
if ((val & CPUCFG6_PMNUM) > (host & CPUCFG6_PMNUM))
@@ -1286,16 +1330,11 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
#ifdef CONFIG_CPU_HAS_LBT
int kvm_own_lbt(struct kvm_vcpu *vcpu)
{
if (!kvm_guest_has_lbt(&vcpu->arch))
return -EINVAL;
preempt_disable();
if (!(vcpu->arch.aux_inuse & KVM_LARCH_LBT)) {
set_csr_euen(CSR_EUEN_LBTEN);
_restore_lbt(&vcpu->arch.lbt);
vcpu->arch.aux_inuse |= KVM_LARCH_LBT;
}
preempt_enable();
return 0;
}
@@ -1338,8 +1377,6 @@ static inline void kvm_check_fcsr_alive(struct kvm_vcpu *vcpu) { }
/* Enable FPU and restore context */
void kvm_own_fpu(struct kvm_vcpu *vcpu)
{
preempt_disable();
/*
* Enable FPU for guest
* Set FR and FRE according to guest context
@@ -1350,19 +1387,12 @@ void kvm_own_fpu(struct kvm_vcpu *vcpu)
kvm_restore_fpu(&vcpu->arch.fpu);
vcpu->arch.aux_inuse |= KVM_LARCH_FPU;
trace_kvm_aux(vcpu, KVM_TRACE_AUX_RESTORE, KVM_TRACE_AUX_FPU);
preempt_enable();
}
#ifdef CONFIG_CPU_HAS_LSX
/* Enable LSX and restore context */
int kvm_own_lsx(struct kvm_vcpu *vcpu)
{
if (!kvm_guest_has_fpu(&vcpu->arch) || !kvm_guest_has_lsx(&vcpu->arch))
return -EINVAL;
preempt_disable();
/* Enable LSX for guest */
kvm_check_fcsr(vcpu, vcpu->arch.fpu.fcsr);
set_csr_euen(CSR_EUEN_LSXEN | CSR_EUEN_FPEN);
@@ -1384,7 +1414,6 @@ int kvm_own_lsx(struct kvm_vcpu *vcpu)
trace_kvm_aux(vcpu, KVM_TRACE_AUX_RESTORE, KVM_TRACE_AUX_LSX);
vcpu->arch.aux_inuse |= KVM_LARCH_LSX | KVM_LARCH_FPU;
preempt_enable();
return 0;
}
@@ -1394,11 +1423,6 @@ int kvm_own_lsx(struct kvm_vcpu *vcpu)
/* Enable LASX and restore context */
int kvm_own_lasx(struct kvm_vcpu *vcpu)
{
if (!kvm_guest_has_fpu(&vcpu->arch) || !kvm_guest_has_lsx(&vcpu->arch) || !kvm_guest_has_lasx(&vcpu->arch))
return -EINVAL;
preempt_disable();
kvm_check_fcsr(vcpu, vcpu->arch.fpu.fcsr);
set_csr_euen(CSR_EUEN_FPEN | CSR_EUEN_LSXEN | CSR_EUEN_LASXEN);
switch (vcpu->arch.aux_inuse & (KVM_LARCH_FPU | KVM_LARCH_LSX)) {
@@ -1420,7 +1444,6 @@ int kvm_own_lasx(struct kvm_vcpu *vcpu)
trace_kvm_aux(vcpu, KVM_TRACE_AUX_RESTORE, KVM_TRACE_AUX_LASX);
vcpu->arch.aux_inuse |= KVM_LARCH_LASX | KVM_LARCH_LSX | KVM_LARCH_FPU;
preempt_enable();
return 0;
}
@@ -1661,7 +1684,9 @@ static int _kvm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_DMWIN2);
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_DMWIN3);
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_LLBCTL);
if (cpu_has_msgint) {
if (kvm_guest_has_msgint(&vcpu->arch)) {
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_IPR);
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_ISR0);
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_ISR1);
kvm_restore_hw_gcsr(csr, LOONGARCH_CSR_ISR2);
@@ -1756,7 +1781,9 @@ static int _kvm_vcpu_put(struct kvm_vcpu *vcpu, int cpu)
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_DMWIN1);
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_DMWIN2);
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_DMWIN3);
if (cpu_has_msgint) {
if (kvm_guest_has_msgint(&vcpu->arch)) {
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_IPR);
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_ISR0);
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_ISR1);
kvm_save_hw_gcsr(csr, LOONGARCH_CSR_ISR2);
@@ -1773,11 +1800,57 @@ out:
return 0;
}
static void kvm_vcpu_set_pv_preempted(struct kvm_vcpu *vcpu)
{
gpa_t gpa;
struct gfn_to_hva_cache *ghc;
struct kvm_memslots *slots;
struct kvm_steal_time __user *st;
gpa = vcpu->arch.st.guest_addr;
if (!(gpa & KVM_STEAL_PHYS_VALID))
return;
/* vCPU may be preempted many times */
if (vcpu->arch.st.preempted)
return;
/* This happens on process exit */
if (unlikely(current->mm != vcpu->kvm->mm))
return;
gpa &= KVM_STEAL_PHYS_MASK;
ghc = &vcpu->arch.st.cache;
slots = kvm_memslots(vcpu->kvm);
if (slots->generation != ghc->generation || gpa != ghc->gpa) {
if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gpa, sizeof(*st))) {
ghc->gpa = INVALID_GPA;
return;
}
}
st = (struct kvm_steal_time __user *)ghc->hva;
unsafe_put_user(KVM_VCPU_PREEMPTED, &st->preempted, out);
vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
out:
mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
int cpu;
int cpu, idx;
unsigned long flags;
if (vcpu->preempted && kvm_guest_has_pv_feature(vcpu, KVM_FEATURE_PREEMPT)) {
/*
* Take the srcu lock as memslots will be accessed to check
* the gfn cache generation against the memslots generation.
*/
idx = srcu_read_lock(&vcpu->kvm->srcu);
kvm_vcpu_set_pv_preempted(vcpu);
srcu_read_unlock(&vcpu->kvm->srcu, idx);
}
local_irq_save(flags);
cpu = smp_processor_id();
vcpu->arch.last_sched_cpu = cpu;


@@ -29,6 +29,21 @@ static void kvm_vm_init_features(struct kvm *kvm)
{
unsigned long val;
if (cpu_has_lsx)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_LSX);
if (cpu_has_lasx)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_LASX);
if (cpu_has_lbt_x86)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_X86BT);
if (cpu_has_lbt_arm)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_ARMBT);
if (cpu_has_lbt_mips)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_MIPSBT);
if (cpu_has_ptw)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PTW);
if (cpu_has_msgint)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_MSGINT);
val = read_csr_gcfg();
if (val & CSR_GCFG_GPMP)
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PMU);
@@ -37,7 +52,9 @@ static void kvm_vm_init_features(struct kvm *kvm)
kvm->arch.pv_features = BIT(KVM_FEATURE_IPI);
kvm->arch.kvm_features = BIT(KVM_LOONGARCH_VM_FEAT_PV_IPI);
if (kvm_pvtime_supported()) {
kvm->arch.pv_features |= BIT(KVM_FEATURE_PREEMPT);
kvm->arch.pv_features |= BIT(KVM_FEATURE_STEAL_TIME);
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_PREEMPT);
kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_STEALTIME);
}
}
@@ -131,35 +148,15 @@ static int kvm_vm_feature_has_attr(struct kvm *kvm, struct kvm_device_attr *attr
{
switch (attr->attr) {
case KVM_LOONGARCH_VM_FEAT_LSX:
if (cpu_has_lsx)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_LASX:
if (cpu_has_lasx)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_X86BT:
if (cpu_has_lbt_x86)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_ARMBT:
if (cpu_has_lbt_arm)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_MIPSBT:
if (cpu_has_lbt_mips)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_PTW:
if (cpu_has_ptw)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_MSGINT:
if (cpu_has_msgint)
return 0;
return -ENXIO;
case KVM_LOONGARCH_VM_FEAT_PMU:
case KVM_LOONGARCH_VM_FEAT_PV_IPI:
case KVM_LOONGARCH_VM_FEAT_PV_PREEMPT:
case KVM_LOONGARCH_VM_FEAT_PV_STEALTIME:
if (kvm_vm_support(&kvm->arch, attr->attr))
return 0;


@@ -192,6 +192,9 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZFBFMIN,
KVM_RISCV_ISA_EXT_ZVFBFMIN,
KVM_RISCV_ISA_EXT_ZVFBFWMA,
KVM_RISCV_ISA_EXT_ZCLSD,
KVM_RISCV_ISA_EXT_ZILSD,
KVM_RISCV_ISA_EXT_ZALASR,
KVM_RISCV_ISA_EXT_MAX,
};


@@ -630,7 +630,7 @@ int kvm_riscv_aia_init(void)
*/
if (gc)
kvm_riscv_aia_nr_hgei = min((ulong)kvm_riscv_aia_nr_hgei,
BIT(gc->guest_index_bits) - 1);
gc->nr_guest_files);
else
kvm_riscv_aia_nr_hgei = 0;


@@ -797,6 +797,10 @@ int kvm_riscv_vcpu_aia_imsic_update(struct kvm_vcpu *vcpu)
if (kvm->arch.aia.mode == KVM_DEV_RISCV_AIA_MODE_EMUL)
return 1;
/* IMSIC vCPU state may not be initialized yet */
if (!imsic)
return 1;
/* Read old IMSIC VS-file details */
read_lock_irqsave(&imsic->vsfile_lock, flags);
old_vsfile_hgei = imsic->vsfile_hgei;
@@ -952,8 +956,10 @@ int kvm_riscv_aia_imsic_rw_attr(struct kvm *kvm, unsigned long type,
if (!vcpu)
return -ENODEV;
isel = KVM_DEV_RISCV_AIA_IMSIC_GET_ISEL(type);
imsic = vcpu->arch.aia_context.imsic_state;
if (!imsic)
return -ENODEV;
isel = KVM_DEV_RISCV_AIA_IMSIC_GET_ISEL(type);
read_lock_irqsave(&imsic->vsfile_lock, flags);
@@ -993,8 +999,11 @@ int kvm_riscv_aia_imsic_has_attr(struct kvm *kvm, unsigned long type)
if (!vcpu)
return -ENODEV;
isel = KVM_DEV_RISCV_AIA_IMSIC_GET_ISEL(type);
imsic = vcpu->arch.aia_context.imsic_state;
if (!imsic)
return -ENODEV;
isel = KVM_DEV_RISCV_AIA_IMSIC_GET_ISEL(type);
return imsic_mrif_isel_check(imsic->nr_eix, isel);
}


@@ -174,7 +174,7 @@ static int __init riscv_kvm_init(void)
kvm_riscv_setup_vendor_features();
kvm_register_perf_callbacks(NULL);
kvm_register_perf_callbacks();
rc = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
if (rc) {


@@ -305,6 +305,142 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
return pte_young(ptep_get(ptep));
}
static bool fault_supports_gstage_huge_mapping(struct kvm_memory_slot *memslot,
unsigned long hva)
{
hva_t uaddr_start, uaddr_end;
gpa_t gpa_start;
size_t size;
size = memslot->npages * PAGE_SIZE;
uaddr_start = memslot->userspace_addr;
uaddr_end = uaddr_start + size;
gpa_start = memslot->base_gfn << PAGE_SHIFT;
/*
* Pages belonging to memslots that don't have the same alignment
* within a PMD for userspace and GPA cannot be mapped with g-stage
* PMD entries, because we'll end up mapping the wrong pages.
*
* Consider a layout like the following:
*
* memslot->userspace_addr:
* +-----+--------------------+--------------------+---+
* |abcde|fgh vs-stage block | vs-stage block tv|xyz|
* +-----+--------------------+--------------------+---+
*
* memslot->base_gfn << PAGE_SHIFT:
* +---+--------------------+--------------------+-----+
* |abc|def g-stage block | g-stage block |tvxyz|
* +---+--------------------+--------------------+-----+
*
* If we create those g-stage blocks, we'll end up with this incorrect
* mapping:
* d -> f
* e -> g
* f -> h
*/
if ((gpa_start & (PMD_SIZE - 1)) != (uaddr_start & (PMD_SIZE - 1)))
return false;
/*
* Next, let's make sure we're not trying to map anything not covered
* by the memslot. This means we have to prohibit block size mappings
* for the beginning and end of a non-block aligned and non-block sized
* memory slot (illustrated by the head and tail parts of the
* userspace view above containing pages 'abcde' and 'xyz',
* respectively).
*
* Note that it doesn't matter if we do the check using the
* userspace_addr or the base_gfn, as both are equally aligned (per
* the check above) and equally sized.
*/
return (hva >= ALIGN(uaddr_start, PMD_SIZE)) && (hva < ALIGN_DOWN(uaddr_end, PMD_SIZE));
}
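A PMD-size g-stage block maps 512 consecutive pages at once, so the function above demands two things: the HVA and the GPA must share the same offset within a PMD (otherwise the block would map the wrong pages), and the candidate block must lie wholly inside the memslot. The same arithmetic in a self-contained form, assuming 2 MiB PMD mappings:

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE (1UL << 21)	/* assuming 2 MiB PMDs, as on Sv39/Sv48 */

#define ALIGN_UP(x, a)   (((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/*
 * A fault at hva can use a PMD-size block only if HVA and GPA are
 * congruent modulo PMD_SIZE and the surrounding PMD-size block fits
 * entirely inside the memslot [uaddr_start, uaddr_start + size).
 */
static int hva_supports_pmd_map(uint64_t uaddr_start, uint64_t size,
				uint64_t gpa_start, uint64_t hva)
{
	uint64_t uaddr_end = uaddr_start + size;

	if ((gpa_start & (PMD_SIZE - 1)) != (uaddr_start & (PMD_SIZE - 1)))
		return 0;
	return hva >= ALIGN_UP(uaddr_start, PMD_SIZE) &&
	       hva < ALIGN_DOWN(uaddr_end, PMD_SIZE);
}
```

Note that once the congruence check passes, it does not matter whether the bounds check uses the userspace address or the guest physical address; both are equally aligned and equally sized.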
static int get_hva_mapping_size(struct kvm *kvm,
unsigned long hva)
{
int size = PAGE_SIZE;
unsigned long flags;
pgd_t pgd;
p4d_t p4d;
pud_t pud;
pmd_t pmd;
/*
* Disable IRQs to prevent concurrent tear down of host page tables,
* e.g. if the primary MMU promotes a P*D to a huge page and then frees
* the original page table.
*/
local_irq_save(flags);
/*
* Read each entry once. As above, a non-leaf entry can be promoted to
* a huge page _during_ this walk. Re-reading the entry could send the
* walk into the weeds, e.g. p*d_leaf() returns false (sees the old
* value) and then p*d_offset() walks into the target huge page instead
* of the old page table (sees the new value).
*/
pgd = pgdp_get(pgd_offset(kvm->mm, hva));
if (pgd_none(pgd))
goto out;
p4d = p4dp_get(p4d_offset(&pgd, hva));
if (p4d_none(p4d) || !p4d_present(p4d))
goto out;
pud = pudp_get(pud_offset(&p4d, hva));
if (pud_none(pud) || !pud_present(pud))
goto out;
if (pud_leaf(pud)) {
size = PUD_SIZE;
goto out;
}
pmd = pmdp_get(pmd_offset(&pud, hva));
if (pmd_none(pmd) || !pmd_present(pmd))
goto out;
if (pmd_leaf(pmd))
size = PMD_SIZE;
out:
local_irq_restore(flags);
return size;
}
static unsigned long transparent_hugepage_adjust(struct kvm *kvm,
struct kvm_memory_slot *memslot,
unsigned long hva,
kvm_pfn_t *hfnp, gpa_t *gpa)
{
kvm_pfn_t hfn = *hfnp;
/*
* Make sure the adjustment is done only for THP pages. Also make
* sure that the HVA and GPA are sufficiently aligned and that the
* block map is contained within the memslot.
*/
if (fault_supports_gstage_huge_mapping(memslot, hva)) {
int sz;
sz = get_hva_mapping_size(kvm, hva);
if (sz < PMD_SIZE)
return sz;
*gpa &= PMD_MASK;
hfn &= ~(PTRS_PER_PMD - 1);
*hfnp = hfn;
return PMD_SIZE;
}
return PAGE_SIZE;
}
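When the THP check succeeds, the faulting GPA and the backing host frame number are rounded down together to the start of the 2 MiB block; the earlier congruence check guarantees both move down by the same number of pages, so every PTE within the block still maps the right frame. The rounding in isolation (2 MiB PMD and 4 KiB page assumed):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PMD_SHIFT  21			/* assuming 2 MiB PMDs */
#define PMD_MASK   (~((1UL << PMD_SHIFT) - 1))
#define PTRS_PER_PMD (1UL << (PMD_SHIFT - PAGE_SHIFT))	/* 512 */

/*
 * Round a faulting GPA (byte address) and its backing host frame number
 * (page index) down to the start of the PMD-size block, mirroring the
 * adjustment done before installing a block mapping.
 */
static void pmd_block_adjust(uint64_t *gpa, uint64_t *hfn)
{
	*gpa &= PMD_MASK;		/* byte address: clear low 21 bits */
	*hfn &= ~(PTRS_PER_PMD - 1);	/* frame number: clear low 9 bits */
}
```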
int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
gpa_t gpa, unsigned long hva, bool is_write,
struct kvm_gstage_mapping *out_map)
@@ -398,6 +534,10 @@ int kvm_riscv_mmu_map(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
if (mmu_invalidate_retry(kvm, mmu_seq))
goto out_unlock;
/* Check if we are backed by a THP and thus use block mapping if possible */
if (vma_pagesize == PAGE_SIZE)
vma_pagesize = transparent_hugepage_adjust(kvm, memslot, hva, &hfn, &gpa);
if (writable) {
mark_page_dirty_in_slot(kvm, memslot, gfn);
ret = kvm_riscv_gstage_map_page(&gstage, pcache, gpa, hfn << PAGE_SHIFT,


@@ -50,6 +50,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(ZAAMO),
KVM_ISA_EXT_ARR(ZABHA),
KVM_ISA_EXT_ARR(ZACAS),
KVM_ISA_EXT_ARR(ZALASR),
KVM_ISA_EXT_ARR(ZALRSC),
KVM_ISA_EXT_ARR(ZAWRS),
KVM_ISA_EXT_ARR(ZBA),
@@ -63,6 +64,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(ZCB),
KVM_ISA_EXT_ARR(ZCD),
KVM_ISA_EXT_ARR(ZCF),
KVM_ISA_EXT_ARR(ZCLSD),
KVM_ISA_EXT_ARR(ZCMOP),
KVM_ISA_EXT_ARR(ZFA),
KVM_ISA_EXT_ARR(ZFBFMIN),
@@ -79,6 +81,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(ZIHINTNTL),
KVM_ISA_EXT_ARR(ZIHINTPAUSE),
KVM_ISA_EXT_ARR(ZIHPM),
KVM_ISA_EXT_ARR(ZILSD),
KVM_ISA_EXT_ARR(ZIMOP),
KVM_ISA_EXT_ARR(ZKND),
KVM_ISA_EXT_ARR(ZKNE),
@@ -187,6 +190,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_ZAAMO:
case KVM_RISCV_ISA_EXT_ZABHA:
case KVM_RISCV_ISA_EXT_ZACAS:
case KVM_RISCV_ISA_EXT_ZALASR:
case KVM_RISCV_ISA_EXT_ZALRSC:
case KVM_RISCV_ISA_EXT_ZAWRS:
case KVM_RISCV_ISA_EXT_ZBA:


@@ -494,12 +494,9 @@ int kvm_riscv_vcpu_pmu_event_info(struct kvm_vcpu *vcpu, unsigned long saddr_low
}
ret = kvm_vcpu_write_guest(vcpu, shmem, einfo, shmem_size);
if (ret) {
if (ret)
ret = SBI_ERR_INVALID_ADDRESS;
goto free_mem;
}
ret = 0;
free_mem:
kfree(einfo);
out:


@@ -47,6 +47,7 @@ pud_t *pud_offset(p4d_t *p4d, unsigned long address)
return (pud_t *)p4d;
}
EXPORT_SYMBOL_GPL(pud_offset);
p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
{
@@ -55,6 +56,7 @@ p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)
return (p4d_t *)pgd;
}
EXPORT_SYMBOL_GPL(p4d_offset);
#endif
#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP


@@ -32,9 +32,6 @@ config GENERIC_BUG_RELATIVE_POINTERS
config GENERIC_LOCKBREAK
def_bool y if PREEMPTION
config PGSTE
def_bool y if KVM
config AUDIT_ARCH
def_bool y


@@ -9,6 +9,32 @@
#ifndef _S390_DAT_BITS_H
#define _S390_DAT_BITS_H
/*
* vaddress union in order to easily decode a virtual address into its
* region first index, region second index etc. parts.
*/
union vaddress {
unsigned long addr;
struct {
unsigned long rfx : 11;
unsigned long rsx : 11;
unsigned long rtx : 11;
unsigned long sx : 11;
unsigned long px : 8;
unsigned long bx : 12;
};
struct {
unsigned long rfx01 : 2;
unsigned long : 9;
unsigned long rsx01 : 2;
unsigned long : 9;
unsigned long rtx01 : 2;
unsigned long : 9;
unsigned long sx01 : 2;
unsigned long : 29;
};
};
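On big-endian s390, the first-declared bitfield occupies the most significant bits, so the union above splits a 64-bit virtual address top-down into the DAT indices: rfx (11 bits), rsx (11), rtx (11), sx (11), px (8), bx (12). The equivalent shift/mask arithmetic, shown for illustration since the bitfield form only works with this endianness:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift/mask equivalent of the big-endian bitfield layout:
 * rfx:11 rsx:11 rtx:11 sx:11 px:8 bx:12, most significant field first.
 */
static uint64_t vaddr_rfx(uint64_t a) { return a >> 53; }		/* region first index */
static uint64_t vaddr_rsx(uint64_t a) { return (a >> 42) & 0x7ff; }	/* region second index */
static uint64_t vaddr_rtx(uint64_t a) { return (a >> 31) & 0x7ff; }	/* region third index */
static uint64_t vaddr_sx(uint64_t a)  { return (a >> 20) & 0x7ff; }	/* segment index */
static uint64_t vaddr_px(uint64_t a)  { return (a >> 12) & 0xff; }	/* page index */
static uint64_t vaddr_bx(uint64_t a)  { return a & 0xfff; }		/* byte index */
```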
union asce {
unsigned long val;
struct {
@@ -98,7 +124,8 @@ union region3_table_entry {
struct {
unsigned long : 53;
unsigned long fc: 1; /* Format-Control */
unsigned long : 4;
unsigned long p : 1; /* DAT-Protection Bit */
unsigned long : 3;
unsigned long i : 1; /* Region-Invalid Bit */
unsigned long cr: 1; /* Common-Region Bit */
unsigned long tt: 2; /* Table-Type Bits */
@@ -140,7 +167,8 @@ union segment_table_entry {
struct {
unsigned long : 53;
unsigned long fc: 1; /* Format-Control */
unsigned long : 4;
unsigned long p : 1; /* DAT-Protection Bit */
unsigned long : 3;
unsigned long i : 1; /* Segment-Invalid Bit */
unsigned long cs: 1; /* Common-Segment Bit */
unsigned long tt: 2; /* Table-Type Bits */


@@ -1,174 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* KVM guest address space mapping code
*
* Copyright IBM Corp. 2007, 2016
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
*/
#ifndef _ASM_S390_GMAP_H
#define _ASM_S390_GMAP_H
#include <linux/radix-tree.h>
#include <linux/refcount.h>
/* Generic bits for GMAP notification on DAT table entry changes. */
#define GMAP_NOTIFY_SHADOW 0x2
#define GMAP_NOTIFY_MPROT 0x1
/* Status bits only for huge segment entries */
#define _SEGMENT_ENTRY_GMAP_IN 0x0800 /* invalidation notify bit */
#define _SEGMENT_ENTRY_GMAP_UC 0x0002 /* dirty (migration) */
/**
* struct gmap_struct - guest address space
* @list: list head for the mm->context gmap list
* @mm: pointer to the parent mm_struct
* @guest_to_host: radix tree with guest to host address translation
* @host_to_guest: radix tree with pointer to segment table entries
* @guest_table_lock: spinlock to protect all entries in the guest page table
* @ref_count: reference counter for the gmap structure
* @table: pointer to the page directory
* @asce: address space control element for gmap page table
* @pfault_enabled: defines if pfaults are applicable for the guest
* @guest_handle: protected virtual machine handle for the ultravisor
* @host_to_rmap: radix tree with gmap_rmap lists
* @children: list of shadow gmap structures
* @shadow_lock: spinlock to protect the shadow gmap list
* @parent: pointer to the parent gmap for shadow guest address spaces
* @orig_asce: ASCE for which the shadow page table has been created
* @edat_level: edat level to be used for the shadow translation
* @removed: flag to indicate if a shadow guest address space has been removed
* @initialized: flag to indicate if a shadow guest address space can be used
*/
struct gmap {
struct list_head list;
struct mm_struct *mm;
struct radix_tree_root guest_to_host;
struct radix_tree_root host_to_guest;
spinlock_t guest_table_lock;
refcount_t ref_count;
unsigned long *table;
unsigned long asce;
unsigned long asce_end;
void *private;
bool pfault_enabled;
/* only set for protected virtual machines */
unsigned long guest_handle;
/* Additional data for shadow guest address spaces */
struct radix_tree_root host_to_rmap;
struct list_head children;
spinlock_t shadow_lock;
struct gmap *parent;
unsigned long orig_asce;
int edat_level;
bool removed;
bool initialized;
};
/**
* struct gmap_rmap - reverse mapping for shadow page table entries
* @next: pointer to next rmap in the list
* @raddr: virtual rmap address in the shadow guest address space
*/
struct gmap_rmap {
struct gmap_rmap *next;
unsigned long raddr;
};
#define gmap_for_each_rmap(pos, head) \
for (pos = (head); pos; pos = pos->next)
#define gmap_for_each_rmap_safe(pos, n, head) \
for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
/**
* struct gmap_notifier - notify function block for page invalidation
* @notifier_call: address of callback function
*/
struct gmap_notifier {
struct list_head list;
struct rcu_head rcu;
void (*notifier_call)(struct gmap *gmap, unsigned long start,
unsigned long end);
};
static inline int gmap_is_shadow(struct gmap *gmap)
{
return !!gmap->parent;
}
struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit);
void gmap_remove(struct gmap *gmap);
struct gmap *gmap_get(struct gmap *gmap);
void gmap_put(struct gmap *gmap);
void gmap_free(struct gmap *gmap);
struct gmap *gmap_alloc(unsigned long limit);
int gmap_map_segment(struct gmap *gmap, unsigned long from,
unsigned long to, unsigned long len);
int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len);
unsigned long __gmap_translate(struct gmap *, unsigned long gaddr);
int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr);
void __gmap_zap(struct gmap *, unsigned long gaddr);
void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);
int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);
void gmap_unshadow(struct gmap *sg);
int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
int fake);
int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
int fake);
int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
int fake);
int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
int fake);
int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
void gmap_register_pte_notifier(struct gmap_notifier *);
void gmap_unregister_pte_notifier(struct gmap_notifier *);
int gmap_protect_one(struct gmap *gmap, unsigned long gaddr, int prot, unsigned long bits);
void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
unsigned long gaddr, unsigned long vmaddr);
int s390_replace_asce(struct gmap *gmap);
void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
unsigned long end, bool interruptible);
unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);
/**
* s390_uv_destroy_range - Destroy a range of pages in the given mm.
* @mm: the mm on which to operate on
* @start: the start of the range
* @end: the end of the range
*
* This function will call cond_sched, so it should not generate stalls, but
* it will otherwise only return when it completed.
*/
static inline void s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
unsigned long end)
{
(void)__s390_uv_destroy_range(mm, start, end, false);
}
/**
* s390_uv_destroy_range_interruptible - Destroy a range of pages in the
* given mm, but stop when a fatal signal is received.
* @mm: the mm on which to operate on
* @start: the start of the range
* @end: the end of the range
*
* This function will call cond_sched, so it should not generate stalls. If
* a fatal signal is received, it will return with -EINTR immediately,
* without finishing destroying the whole range. Upon successful
* completion, 0 is returned.
*/
static inline int s390_uv_destroy_range_interruptible(struct mm_struct *mm, unsigned long start,
unsigned long end)
{
return __s390_uv_destroy_range(mm, start, end, true);
}
#endif /* _ASM_S390_GMAP_H */


@@ -11,5 +11,6 @@
void gmap_helper_zap_one_page(struct mm_struct *mm, unsigned long vmaddr);
void gmap_helper_discard(struct mm_struct *mm, unsigned long vmaddr, unsigned long end);
int gmap_helper_disable_cow_sharing(void);
void gmap_helper_try_set_pte_unused(struct mm_struct *mm, unsigned long vmaddr);
#endif /* _ASM_S390_GMAP_HELPERS_H */


@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
return __huge_ptep_get_and_clear(mm, addr, ptep);
}
static inline void arch_clear_hugetlb_flags(struct folio *folio)
{
clear_bit(PG_arch_1, &folio->flags.f);
}
#define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
#define __HAVE_ARCH_HUGE_PTE_CLEAR
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz)


@@ -27,6 +27,7 @@
#include <asm/isc.h>
#include <asm/guarded_storage.h>
#define KVM_HAVE_MMU_RWLOCK
#define KVM_MAX_VCPUS 255
#define KVM_INTERNAL_MEM_SLOTS 1
@@ -441,6 +442,7 @@ struct kvm_vcpu_arch {
bool acrs_loaded;
struct kvm_s390_pv_vcpu pv;
union diag318_info diag318_info;
struct kvm_s390_mmu_cache *mc;
};
struct kvm_vm_stat {
@@ -630,8 +632,12 @@ struct kvm_s390_pv {
void *set_aside;
struct list_head need_cleanup;
struct mmu_notifier mmu_notifier;
/* Protects against concurrent import-like operations */
struct mutex import_lock;
};
struct kvm_s390_mmu_cache;
struct kvm_arch {
struct esca_block *sca;
debug_info_t *dbf;
@@ -671,6 +677,7 @@ struct kvm_arch {
struct kvm_s390_pv pv;
struct list_head kzdev_list;
spinlock_t kzdev_list_lock;
struct kvm_s390_mmu_cache *mc;
};
#define KVM_HVA_ERR_BAD (-1UL)


@@ -18,24 +18,11 @@ typedef struct {
unsigned long vdso_base;
/* The mmu context belongs to a secure guest. */
atomic_t protected_count;
/*
* The following bitfields need a down_write on the mm
* semaphore when they are written to. As they are only
* written once, they can be read without a lock.
*/
/* The mmu context uses extended page tables. */
unsigned int has_pgste:1;
/* The mmu context uses storage keys. */
unsigned int uses_skeys:1;
/* The mmu context uses CMM. */
unsigned int uses_cmm:1;
/*
* The mmu context allows COW-sharing of memory pages (KSM, zeropage).
* Note that COW-sharing during fork() is currently always allowed.
*/
unsigned int allow_cow_sharing:1;
/* The gmaps associated with this context are allowed to use huge pages. */
unsigned int allow_gmap_hpage_1m:1;
} mm_context_t;
#define INIT_MM_CONTEXT(name) \


@@ -29,12 +29,8 @@ static inline int init_new_context(struct task_struct *tsk,
atomic_set(&mm->context.protected_count, 0);
mm->context.gmap_asce = 0;
mm->context.flush_mm = 0;
#ifdef CONFIG_PGSTE
mm->context.has_pgste = 0;
mm->context.uses_skeys = 0;
mm->context.uses_cmm = 0;
#if IS_ENABLED(CONFIG_KVM)
mm->context.allow_cow_sharing = 1;
mm->context.allow_gmap_hpage_1m = 0;
#endif
switch (mm->context.asce_limit) {
default:


@@ -77,7 +77,6 @@ static inline void copy_page(void *to, void *from)
#ifdef STRICT_MM_TYPECHECKS
typedef struct { unsigned long pgprot; } pgprot_t;
typedef struct { unsigned long pgste; } pgste_t;
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long pmd; } pmd_t;
typedef struct { unsigned long pud; } pud_t;
@@ -93,7 +92,6 @@ static __always_inline unsigned long name ## _val(name ## _t name) \
#else /* STRICT_MM_TYPECHECKS */
typedef unsigned long pgprot_t;
typedef unsigned long pgste_t;
typedef unsigned long pte_t;
typedef unsigned long pmd_t;
typedef unsigned long pud_t;
@@ -109,7 +107,6 @@ static __always_inline unsigned long name ## _val(name ## _t name) \
#endif /* STRICT_MM_TYPECHECKS */
DEFINE_PGVAL_FUNC(pgprot)
DEFINE_PGVAL_FUNC(pgste)
DEFINE_PGVAL_FUNC(pte)
DEFINE_PGVAL_FUNC(pmd)
DEFINE_PGVAL_FUNC(pud)
@@ -119,7 +116,6 @@ DEFINE_PGVAL_FUNC(pgd)
typedef pte_t *pgtable_t;
#define __pgprot(x) ((pgprot_t) { (x) } )
#define __pgste(x) ((pgste_t) { (x) } )
#define __pte(x) ((pte_t) { (x) } )
#define __pmd(x) ((pmd_t) { (x) } )
#define __pud(x) ((pud_t) { (x) } )


@@ -27,10 +27,6 @@ unsigned long *page_table_alloc_noprof(struct mm_struct *);
#define page_table_alloc(...) alloc_hooks(page_table_alloc_noprof(__VA_ARGS__))
void page_table_free(struct mm_struct *, unsigned long *);
struct ptdesc *page_table_alloc_pgste_noprof(struct mm_struct *mm);
#define page_table_alloc_pgste(...) alloc_hooks(page_table_alloc_pgste_noprof(__VA_ARGS__))
void page_table_free_pgste(struct ptdesc *ptdesc);
static inline void crst_table_init(unsigned long *crst, unsigned long entry)
{
memset64((u64 *)crst, entry, _CRST_ENTRIES);


@@ -413,28 +413,6 @@ void setup_protection_map(void);
* SW-bits: y young, d dirty, r read, w write
*/
/* Page status table bits for virtualization */
#define PGSTE_ACC_BITS 0xf000000000000000UL
#define PGSTE_FP_BIT 0x0800000000000000UL
#define PGSTE_PCL_BIT 0x0080000000000000UL
#define PGSTE_HR_BIT 0x0040000000000000UL
#define PGSTE_HC_BIT 0x0020000000000000UL
#define PGSTE_GR_BIT 0x0004000000000000UL
#define PGSTE_GC_BIT 0x0002000000000000UL
#define PGSTE_ST2_MASK 0x0000ffff00000000UL
#define PGSTE_UC_BIT 0x0000000000008000UL /* user dirty (migration) */
#define PGSTE_IN_BIT 0x0000000000004000UL /* IPTE notify bit */
#define PGSTE_VSIE_BIT 0x0000000000002000UL /* ref'd in a shadow table */
/* Guest Page State used for virtualization */
#define _PGSTE_GPS_ZERO 0x0000000080000000UL
#define _PGSTE_GPS_NODAT 0x0000000040000000UL
#define _PGSTE_GPS_USAGE_MASK 0x0000000003000000UL
#define _PGSTE_GPS_USAGE_STABLE 0x0000000000000000UL
#define _PGSTE_GPS_USAGE_UNUSED 0x0000000001000000UL
#define _PGSTE_GPS_USAGE_POT_VOLATILE 0x0000000002000000UL
#define _PGSTE_GPS_USAGE_VOLATILE _PGSTE_GPS_USAGE_MASK
/*
* A user page table pointer has the space-switch-event bit, the
* private-space-control bit and the storage-alteration-event-control
@@ -566,34 +544,15 @@ static inline bool mm_pmd_folded(struct mm_struct *mm)
}
#define mm_pmd_folded(mm) mm_pmd_folded(mm)
static inline int mm_has_pgste(struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
if (unlikely(mm->context.has_pgste))
return 1;
#endif
return 0;
}
static inline int mm_is_protected(struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
#if IS_ENABLED(CONFIG_KVM)
if (unlikely(atomic_read(&mm->context.protected_count)))
return 1;
#endif
return 0;
}
static inline pgste_t clear_pgste_bit(pgste_t pgste, unsigned long mask)
{
return __pgste(pgste_val(pgste) & ~mask);
}
static inline pgste_t set_pgste_bit(pgste_t pgste, unsigned long mask)
{
return __pgste(pgste_val(pgste) | mask);
}
static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
{
return __pte(pte_val(pte) & ~pgprot_val(prot));
@@ -632,22 +591,13 @@ static inline pud_t set_pud_bit(pud_t pud, pgprot_t prot)
#define mm_forbids_zeropage mm_forbids_zeropage
static inline int mm_forbids_zeropage(struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
#if IS_ENABLED(CONFIG_KVM)
if (!mm->context.allow_cow_sharing)
return 1;
#endif
return 0;
}
static inline int mm_uses_skeys(struct mm_struct *mm)
{
#ifdef CONFIG_PGSTE
if (mm->context.uses_skeys)
return 1;
#endif
return 0;
}
/**
* cspg() - Compare and Swap and Purge (CSPG)
* @ptr: Pointer to the value to be exchanged
@@ -1136,6 +1086,13 @@ static inline pte_t pte_mkhuge(pte_t pte)
}
#endif
static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
{
asm volatile("sske %[skey],%[addr],1"
: [addr] "+a" (addr) : [skey] "d" (skey));
return addr;
}
#define IPTE_GLOBAL 0
#define IPTE_LOCAL 1
@@ -1232,7 +1189,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
/* At this point the reference through the mapping is still present */
if (mm_is_protected(mm) && pte_present(res))
-uv_convert_from_secure_pte(res);
+WARN_ON_ONCE(uv_convert_from_secure_pte(res));
return res;
}
@@ -1250,7 +1207,7 @@ static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
/* At this point the reference through the mapping is still present */
if (mm_is_protected(vma->vm_mm) && pte_present(res))
-uv_convert_from_secure_pte(res);
+WARN_ON_ONCE(uv_convert_from_secure_pte(res));
return res;
}
@@ -1287,9 +1244,10 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
/*
* If something went wrong and the page could not be destroyed, or
* if this is not a mm teardown, the slower export is used as
-* fallback instead.
+* fallback instead. If even that fails, print a warning and leak
+* the page, to avoid crashing the whole system.
*/
-uv_convert_from_secure_pte(res);
+WARN_ON_ONCE(uv_convert_from_secure_pte(res));
return res;
}
@@ -1348,50 +1306,13 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
{
if (pte_same(*ptep, entry))
return 0;
-if (cpu_has_rdp() && !mm_has_pgste(vma->vm_mm) && pte_allow_rdp(*ptep, entry))
+if (cpu_has_rdp() && pte_allow_rdp(*ptep, entry))
ptep_reset_dat_prot(vma->vm_mm, addr, ptep, entry);
else
ptep_xchg_direct(vma->vm_mm, addr, ptep, entry);
return 1;
}
-/*
-* Additional functions to handle KVM guest page tables
-*/
-void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
-pte_t *ptep, pte_t entry);
-void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-void ptep_notify(struct mm_struct *mm, unsigned long addr,
-pte_t *ptep, unsigned long bits);
-int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr,
-pte_t *ptep, int prot, unsigned long bit);
-void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
-pte_t *ptep, int reset);
-void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
-int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr,
-pte_t *sptep, pte_t *tptep, pte_t pte);
-void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep);
-bool ptep_test_and_clear_uc(struct mm_struct *mm, unsigned long address,
-pte_t *ptep);
-int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-unsigned char key, bool nq);
-int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-unsigned char key, unsigned char *oldkey,
-bool nq, bool mr, bool mc);
-int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr);
-int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
-unsigned char *key);
-int set_pgste_bits(struct mm_struct *mm, unsigned long addr,
-unsigned long bits, unsigned long value);
-int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep);
-int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
-unsigned long *oldpte, unsigned long *oldpgste);
-void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr);
-void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr);
-void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr);
#define pgprot_writecombine pgprot_writecombine
pgprot_t pgprot_writecombine(pgprot_t prot);
@@ -1406,23 +1327,12 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
{
if (pte_present(entry))
entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
-if (mm_has_pgste(mm)) {
-for (;;) {
-ptep_set_pte_at(mm, addr, ptep, entry);
-if (--nr == 0)
-break;
-ptep++;
-entry = __pte(pte_val(entry) + PAGE_SIZE);
-addr += PAGE_SIZE;
-}
-} else {
-for (;;) {
-set_pte(ptep, entry);
-if (--nr == 0)
-break;
-ptep++;
-entry = __pte(pte_val(entry) + PAGE_SIZE);
-}
-}
+for (;;) {
+set_pte(ptep, entry);
+if (--nr == 0)
+break;
+ptep++;
+entry = __pte(pte_val(entry) + PAGE_SIZE);
+}
}
#define set_ptes set_ptes
@@ -2015,9 +1925,6 @@ extern int __vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot_t p
extern int vmem_map_4k_page(unsigned long addr, unsigned long phys, pgprot_t prot);
extern void vmem_unmap_4k_page(unsigned long addr);
extern pte_t *vmem_get_alloc_pte(unsigned long addr, bool alloc);
-extern int s390_enable_sie(void);
-extern int s390_enable_skey(void);
-extern void s390_reset_cmma(struct mm_struct *mm);
/* s390 has a private copy of get unmapped area to deal with cache synonyms */
#define HAVE_ARCH_UNMAPPED_AREA
@@ -2026,40 +1933,4 @@ extern void s390_reset_cmma(struct mm_struct *mm);
#define pmd_pgtable(pmd) \
((pgtable_t)__va(pmd_val(pmd) & -sizeof(pte_t)*PTRS_PER_PTE))
-static inline unsigned long gmap_pgste_get_pgt_addr(unsigned long *pgt)
-{
-unsigned long *pgstes, res;
-pgstes = pgt + _PAGE_ENTRIES;
-res = (pgstes[0] & PGSTE_ST2_MASK) << 16;
-res |= pgstes[1] & PGSTE_ST2_MASK;
-res |= (pgstes[2] & PGSTE_ST2_MASK) >> 16;
-res |= (pgstes[3] & PGSTE_ST2_MASK) >> 32;
-return res;
-}
-static inline pgste_t pgste_get_lock(pte_t *ptep)
-{
-unsigned long value = 0;
-#ifdef CONFIG_PGSTE
-unsigned long *ptr = (unsigned long *)(ptep + PTRS_PER_PTE);
-do {
-value = __atomic64_or_barrier(PGSTE_PCL_BIT, ptr);
-} while (value & PGSTE_PCL_BIT);
-value |= PGSTE_PCL_BIT;
-#endif
-return __pgste(value);
-}
-static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
-{
-#ifdef CONFIG_PGSTE
-barrier();
-WRITE_ONCE(*(unsigned long *)(ptep + PTRS_PER_PTE), pgste_val(pgste) & ~PGSTE_PCL_BIT);
-#endif
-}
#endif /* _S390_PAGE_H */


@@ -36,7 +36,6 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
#include <asm/tlbflush.h>
#include <asm-generic/tlb.h>
-#include <asm/gmap.h>
/*
* Release the page cache reference for a pte removed by
@@ -83,8 +82,6 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_pmds = 1;
-if (mm_has_pgste(tlb->mm))
-gmap_unlink(tlb->mm, (unsigned long *)pte, address);
tlb_remove_ptdesc(tlb, virt_to_ptdesc(pte));
}


@@ -471,65 +471,15 @@ do { \
#define arch_get_kernel_nofault __mvc_kernel_nofault
#define arch_put_kernel_nofault __mvc_kernel_nofault
void __cmpxchg_user_key_called_with_bad_pointer(void);
-int __cmpxchg_user_key1(unsigned long address, unsigned char *uval,
-unsigned char old, unsigned char new, unsigned long key);
-int __cmpxchg_user_key2(unsigned long address, unsigned short *uval,
-unsigned short old, unsigned short new, unsigned long key);
-int __cmpxchg_user_key4(unsigned long address, unsigned int *uval,
-unsigned int old, unsigned int new, unsigned long key);
-int __cmpxchg_user_key8(unsigned long address, unsigned long *uval,
-unsigned long old, unsigned long new, unsigned long key);
-int __cmpxchg_user_key16(unsigned long address, __uint128_t *uval,
-__uint128_t old, __uint128_t new, unsigned long key);
-static __always_inline int _cmpxchg_user_key(unsigned long address, void *uval,
-__uint128_t old, __uint128_t new,
-unsigned long key, int size)
-{
-switch (size) {
-case 1: return __cmpxchg_user_key1(address, uval, old, new, key);
-case 2: return __cmpxchg_user_key2(address, uval, old, new, key);
-case 4: return __cmpxchg_user_key4(address, uval, old, new, key);
-case 8: return __cmpxchg_user_key8(address, uval, old, new, key);
-case 16: return __cmpxchg_user_key16(address, uval, old, new, key);
-default: __cmpxchg_user_key_called_with_bad_pointer();
-}
-return 0;
-}
-/**
-* cmpxchg_user_key() - cmpxchg with user space target, honoring storage keys
-* @ptr: User space address of value to compare to @old and exchange with
-* @new. Must be aligned to sizeof(*@ptr).
-* @uval: Address where the old value of *@ptr is written to.
-* @old: Old value. Compared to the content pointed to by @ptr in order to
-* determine if the exchange occurs. The old value read from *@ptr is
-* written to *@uval.
-* @new: New value to place at *@ptr.
-* @key: Access key to use for checking storage key protection.
-*
-* Perform a cmpxchg on a user space target, honoring storage key protection.
-* @key alone determines how key checking is performed, neither
-* storage-protection-override nor fetch-protection-override apply.
-* The caller must compare *@uval and @old to determine if values have been
-* exchanged. In case of an exception *@uval is set to zero.
-*
-* Return: 0: cmpxchg executed
-* -EFAULT: an exception happened when trying to access *@ptr
-* -EAGAIN: maxed out number of retries (byte and short only)
-*/
-#define cmpxchg_user_key(ptr, uval, old, new, key) \
-({ \
-__typeof__(ptr) __ptr = (ptr); \
-__typeof__(uval) __uval = (uval); \
-\
-BUILD_BUG_ON(sizeof(*(__ptr)) != sizeof(*(__uval))); \
-might_fault(); \
-__chk_user_ptr(__ptr); \
-_cmpxchg_user_key((unsigned long)(__ptr), (void *)(__uval), \
-(old), (new), (key), sizeof(*(__ptr))); \
-})
+int __cmpxchg_key1(void *address, unsigned char *uval, unsigned char old,
+unsigned char new, unsigned long key);
+int __cmpxchg_key2(void *address, unsigned short *uval, unsigned short old,
+unsigned short new, unsigned long key);
+int __cmpxchg_key4(void *address, unsigned int *uval, unsigned int old,
+unsigned int new, unsigned long key);
+int __cmpxchg_key8(void *address, unsigned long *uval, unsigned long old,
+unsigned long new, unsigned long key);
+int __cmpxchg_key16(void *address, __uint128_t *uval, __uint128_t old,
+__uint128_t new, unsigned long key);
#endif /* __S390_UACCESS_H */


@@ -631,7 +631,8 @@ int uv_pin_shared(unsigned long paddr);
int uv_destroy_folio(struct folio *folio);
int uv_destroy_pte(pte_t pte);
int uv_convert_from_secure_pte(pte_t pte);
int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb);
int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio);
int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb);
int uv_convert_from_secure(unsigned long paddr);
int uv_convert_from_secure_folio(struct folio *folio);


@@ -134,14 +134,15 @@ static int uv_destroy(unsigned long paddr)
*/
int uv_destroy_folio(struct folio *folio)
{
-unsigned long i;
int rc;
+/* Large folios cannot be secure */
+if (unlikely(folio_test_large(folio)))
+return 0;
folio_get(folio);
+rc = uv_destroy(folio_to_phys(folio));
-for (i = 0; i < (1 << folio_order(folio)); i++) {
-rc = uv_destroy(folio_to_phys(folio) + i * PAGE_SIZE);
-if (rc)
-break;
-}
if (!rc)
clear_bit(PG_arch_1, &folio->flags.f);
folio_put(folio);
@@ -183,14 +184,15 @@ EXPORT_SYMBOL_GPL(uv_convert_from_secure);
*/
int uv_convert_from_secure_folio(struct folio *folio)
{
-unsigned long i;
int rc;
+/* Large folios cannot be secure */
+if (unlikely(folio_test_large(folio)))
+return 0;
folio_get(folio);
+rc = uv_convert_from_secure(folio_to_phys(folio));
-for (i = 0; i < (1 << folio_order(folio)); i++) {
-rc = uv_convert_from_secure(folio_to_phys(folio) + i * PAGE_SIZE);
-if (rc)
-break;
-}
if (!rc)
clear_bit(PG_arch_1, &folio->flags.f);
folio_put(folio);
@@ -207,39 +209,6 @@ int uv_convert_from_secure_pte(pte_t pte)
return uv_convert_from_secure_folio(pfn_folio(pte_pfn(pte)));
}
-/**
-* should_export_before_import - Determine whether an export is needed
-* before an import-like operation
-* @uvcb: the Ultravisor control block of the UVC to be performed
-* @mm: the mm of the process
-*
-* Returns whether an export is needed before every import-like operation.
-* This is needed for shared pages, which don't trigger a secure storage
-* exception when accessed from a different guest.
-*
-* Although considered as one, the Unpin Page UVC is not an actual import,
-* so it is not affected.
-*
-* No export is needed also when there is only one protected VM, because the
-* page cannot belong to the wrong VM in that case (there is no "other VM"
-* it can belong to).
-*
-* Return: true if an export is needed before every import, otherwise false.
-*/
-static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
-{
-/*
-* The misc feature indicates, among other things, that importing a
-* shared page from a different protected VM will automatically also
-* transfer its ownership.
-*/
-if (uv_has_feature(BIT_UV_FEAT_MISC))
-return false;
-if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
-return false;
-return atomic_read(&mm->context.protected_count) > 1;
-}
/*
* Calculate the expected ref_count for a folio that would otherwise have no
* further pins. This was cribbed from similar functions in other places in
@@ -279,7 +248,7 @@ static int expected_folio_refs(struct folio *folio)
* (it's the same logic as split_folio()), and the folio must be
* locked.
*/
-static int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
+int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
{
int expected, cc = 0;
@@ -309,20 +278,7 @@ static int __make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
return -EAGAIN;
return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
}
-static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct uv_cb_header *uvcb)
-{
-int rc;
-if (!folio_trylock(folio))
-return -EAGAIN;
-if (should_export_before_import(uvcb, mm))
-uv_convert_from_secure(folio_to_phys(folio));
-rc = __make_folio_secure(folio, uvcb);
-folio_unlock(folio);
-return rc;
-}
+EXPORT_SYMBOL(__make_folio_secure);
/**
* s390_wiggle_split_folio() - try to drain extra references to a folio and
@@ -337,7 +293,7 @@ static int make_folio_secure(struct mm_struct *mm, struct folio *folio, struct u
* but another attempt can be made;
* -EINVAL in case of other folio splitting errors. See split_folio().
*/
-static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
+int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
{
int rc, tried_splits;
@@ -409,56 +365,7 @@ static int s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio)
}
return -EAGAIN;
}
-int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header *uvcb)
-{
-struct vm_area_struct *vma;
-struct folio_walk fw;
-struct folio *folio;
-int rc;
-mmap_read_lock(mm);
-vma = vma_lookup(mm, hva);
-if (!vma) {
-mmap_read_unlock(mm);
-return -EFAULT;
-}
-folio = folio_walk_start(&fw, vma, hva, 0);
-if (!folio) {
-mmap_read_unlock(mm);
-return -ENXIO;
-}
-folio_get(folio);
-/*
-* Secure pages cannot be huge and userspace should not combine both.
-* In case userspace does it anyway this will result in an -EFAULT for
-* the unpack. The guest is thus never reaching secure mode.
-* If userspace plays dirty tricks and decides to map huge pages at a
-* later point in time, it will receive a segmentation fault or
-* KVM_RUN will return -EFAULT.
-*/
-if (folio_test_hugetlb(folio))
-rc = -EFAULT;
-else if (folio_test_large(folio))
-rc = -E2BIG;
-else if (!pte_write(fw.pte) || (pte_val(fw.pte) & _PAGE_INVALID))
-rc = -ENXIO;
-else
-rc = make_folio_secure(mm, folio, uvcb);
-folio_walk_end(&fw, vma);
-mmap_read_unlock(mm);
-if (rc == -E2BIG || rc == -EBUSY) {
-rc = s390_wiggle_split_folio(mm, folio);
-if (!rc)
-rc = -EAGAIN;
-}
-folio_put(folio);
-return rc;
-}
-EXPORT_SYMBOL_GPL(make_hva_secure);
+EXPORT_SYMBOL_GPL(s390_wiggle_split_folio);
/*
* To be called with the folio locked or with an extra reference! This will
@@ -470,21 +377,18 @@ int arch_make_folio_accessible(struct folio *folio)
{
int rc = 0;
/* Large folios cannot be secure */
if (unlikely(folio_test_large(folio)))
return 0;
/*
-* PG_arch_1 is used in 2 places:
-* 1. for storage keys of hugetlb folios and KVM
-* 2. As an indication that this small folio might be secure. This can
-* overindicate, e.g. we set the bit before calling
-* convert_to_secure.
-* As secure pages are never large folios, both variants can co-exists.
+* PG_arch_1 is used as an indication that this small folio might be
+* secure. This can overindicate, e.g. we set the bit before calling
+* convert_to_secure.
*/
if (!test_bit(PG_arch_1, &folio->flags.f))
return 0;
/* Large folios cannot be secure. */
if (WARN_ON_ONCE(folio_test_large(folio)))
return -EFAULT;
rc = uv_pin_shared(folio_to_phys(folio));
if (!rc) {
clear_bit(PG_arch_1, &folio->flags.f);


@@ -30,6 +30,8 @@ config KVM
select KVM_VFIO
select MMU_NOTIFIER
select VIRT_XFER_TO_GUEST_WORK
+select KVM_GENERIC_MMU_NOTIFIER
+select KVM_MMU_LOCKLESS_AGING
help
Support hosting paravirtualized guest machines using the SIE
virtualization capability on the mainframe. This should work


@@ -8,7 +8,8 @@ include $(srctree)/virt/kvm/Makefile.kvm
ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
-kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o gmap-vsie.o
+kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o
+kvm-y += dat.o gmap.o faultin.o
kvm-$(CONFIG_VFIO_PCI_ZDEV_KVM) += pci.o
obj-$(CONFIG_KVM) += kvm.o

arch/s390/kvm/dat.c (new file, 1391 lines; diff suppressed because it is too large)

arch/s390/kvm/dat.h (new file, 970 lines)

@@ -0,0 +1,970 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* KVM guest address space mapping code
*
* Copyright IBM Corp. 2024, 2025
* Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>
*/
#ifndef __KVM_S390_DAT_H
#define __KVM_S390_DAT_H
#include <linux/radix-tree.h>
#include <linux/refcount.h>
#include <linux/io.h>
#include <linux/kvm_types.h>
#include <linux/pgalloc.h>
#include <asm/tlbflush.h>
#include <asm/dat-bits.h>
/*
* Base address and length must be sent at the start of each block, therefore
* it's cheaper to send some clean data, as long as it's less than the size of
* two longs.
*/
#define KVM_S390_MAX_BIT_DISTANCE (2 * sizeof(void *))
/* For consistency */
#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
#define _ASCE(x) ((union asce) { .val = (x), })
#define NULL_ASCE _ASCE(0)
enum {
_DAT_TOKEN_NONE = 0,
_DAT_TOKEN_PIC,
};
#define _CRSTE_TOK(l, t, p) ((union crste) { \
.tok.i = 1, \
.tok.tt = (l), \
.tok.type = (t), \
.tok.par = (p) \
})
#define _CRSTE_PIC(l, p) _CRSTE_TOK(l, _DAT_TOKEN_PIC, p)
#define _CRSTE_HOLE(l) _CRSTE_PIC(l, PGM_ADDRESSING)
#define _CRSTE_EMPTY(l) _CRSTE_TOK(l, _DAT_TOKEN_NONE, 0)
#define _PMD_EMPTY _CRSTE_EMPTY(TABLE_TYPE_SEGMENT)
#define _PTE_TOK(t, p) ((union pte) { .tok.i = 1, .tok.type = (t), .tok.par = (p) })
#define _PTE_EMPTY _PTE_TOK(_DAT_TOKEN_NONE, 0)
/* This fake table type is used for page table walks (both for normal page tables and vSIE) */
#define TABLE_TYPE_PAGE_TABLE -1
enum dat_walk_flags {
DAT_WALK_USES_SKEYS = 0x40,
DAT_WALK_CONTINUE = 0x20,
DAT_WALK_IGN_HOLES = 0x10,
DAT_WALK_SPLIT = 0x08,
DAT_WALK_ALLOC = 0x04,
DAT_WALK_ANY = 0x02,
DAT_WALK_LEAF = 0x01,
DAT_WALK_DEFAULT = 0
};
#define DAT_WALK_SPLIT_ALLOC (DAT_WALK_SPLIT | DAT_WALK_ALLOC)
#define DAT_WALK_ALLOC_CONTINUE (DAT_WALK_CONTINUE | DAT_WALK_ALLOC)
#define DAT_WALK_LEAF_ALLOC (DAT_WALK_LEAF | DAT_WALK_ALLOC)
union pte {
unsigned long val;
union page_table_entry h;
struct {
unsigned long :56; /* Hardware bits */
unsigned long u : 1; /* Page unused */
unsigned long s : 1; /* Special */
unsigned long w : 1; /* Writable */
unsigned long r : 1; /* Readable */
unsigned long d : 1; /* Dirty */
unsigned long y : 1; /* Young */
unsigned long sd: 1; /* Soft dirty */
unsigned long pr: 1; /* Present */
} s;
struct {
unsigned char hwbytes[7];
unsigned char swbyte;
};
union {
struct {
unsigned long type :16; /* Token type */
unsigned long par :16; /* Token parameter */
unsigned long :20;
unsigned long : 1; /* Must be 0 */
unsigned long i : 1; /* Must be 1 */
unsigned long : 2;
unsigned long : 7;
unsigned long pr : 1; /* Must be 0 */
};
struct {
unsigned long token:32; /* Token and parameter */
unsigned long :32;
};
} tok;
};
/* Soft dirty, needed as macro for atomic operations on ptes */
#define _PAGE_SD 0x002
/* Needed as macro to perform atomic operations */
#define PGSTE_PCL_BIT 0x0080000000000000UL /* PCL lock, HW bit */
#define PGSTE_CMMA_D_BIT 0x0000000000008000UL /* CMMA dirty soft-bit */
enum pgste_gps_usage {
PGSTE_GPS_USAGE_STABLE = 0,
PGSTE_GPS_USAGE_UNUSED,
PGSTE_GPS_USAGE_POT_VOLATILE,
PGSTE_GPS_USAGE_VOLATILE,
};
union pgste {
unsigned long val;
struct {
unsigned long acc : 4;
unsigned long fp : 1;
unsigned long : 3;
unsigned long pcl : 1;
unsigned long hr : 1;
unsigned long hc : 1;
unsigned long : 2;
unsigned long gr : 1;
unsigned long gc : 1;
unsigned long : 1;
unsigned long :16; /* val16 */
unsigned long zero : 1;
unsigned long nodat : 1;
unsigned long : 4;
unsigned long usage : 2;
unsigned long : 8;
unsigned long cmma_d : 1; /* Dirty flag for CMMA bits */
unsigned long prefix_notif : 1; /* Guest prefix invalidation notification */
unsigned long vsie_notif : 1; /* Referenced in a shadow table */
unsigned long : 5;
unsigned long : 8;
};
struct {
unsigned short hwbytes0;
unsigned short val16; /* Used to store chunked values, see dat_{s,g}et_ptval() */
unsigned short hwbytes4;
unsigned char flags; /* Maps to the software bits */
unsigned char hwbyte7;
} __packed;
};
union pmd {
unsigned long val;
union segment_table_entry h;
struct {
struct {
unsigned long :44; /* HW */
unsigned long : 3; /* Unused */
unsigned long : 1; /* HW */
unsigned long w : 1; /* Writable soft-bit */
unsigned long r : 1; /* Readable soft-bit */
unsigned long d : 1; /* Dirty */
unsigned long y : 1; /* Young */
unsigned long prefix_notif : 1; /* Guest prefix invalidation notification */
unsigned long : 3; /* HW */
unsigned long vsie_notif : 1; /* Referenced in a shadow table */
unsigned long : 1; /* Unused */
unsigned long : 4; /* HW */
unsigned long sd : 1; /* Soft-Dirty */
unsigned long pr : 1; /* Present */
} fc1;
} s;
};
union pud {
unsigned long val;
union region3_table_entry h;
struct {
struct {
unsigned long :33; /* HW */
unsigned long :14; /* Unused */
unsigned long : 1; /* HW */
unsigned long w : 1; /* Writable soft-bit */
unsigned long r : 1; /* Readable soft-bit */
unsigned long d : 1; /* Dirty */
unsigned long y : 1; /* Young */
unsigned long prefix_notif : 1; /* Guest prefix invalidation notification */
unsigned long : 3; /* HW */
unsigned long vsie_notif : 1; /* Referenced in a shadow table */
unsigned long : 1; /* Unused */
unsigned long : 4; /* HW */
unsigned long sd : 1; /* Soft-Dirty */
unsigned long pr : 1; /* Present */
} fc1;
} s;
};
union p4d {
unsigned long val;
union region2_table_entry h;
};
union pgd {
unsigned long val;
union region1_table_entry h;
};
union crste {
unsigned long val;
union {
struct {
unsigned long :52;
unsigned long : 1;
unsigned long fc: 1;
unsigned long p : 1;
unsigned long : 1;
unsigned long : 2;
unsigned long i : 1;
unsigned long : 1;
unsigned long tt: 2;
unsigned long : 2;
};
struct {
unsigned long to:52;
unsigned long : 1;
unsigned long fc: 1;
unsigned long p : 1;
unsigned long : 1;
unsigned long tf: 2;
unsigned long i : 1;
unsigned long : 1;
unsigned long tt: 2;
unsigned long tl: 2;
} fc0;
struct {
unsigned long :47;
unsigned long av : 1; /* ACCF-Validity Control */
unsigned long acc: 4; /* Access-Control Bits */
unsigned long f : 1; /* Fetch-Protection Bit */
unsigned long fc : 1; /* Format-Control */
unsigned long p : 1; /* DAT-Protection Bit */
unsigned long iep: 1; /* Instruction-Execution-Protection */
unsigned long : 2;
unsigned long i : 1; /* Segment-Invalid Bit */
unsigned long cs : 1; /* Common-Segment Bit */
unsigned long tt : 2; /* Table-Type Bits */
unsigned long : 2;
} fc1;
} h;
struct {
struct {
unsigned long :47;
unsigned long : 1; /* HW (should be 0) */
unsigned long w : 1; /* Writable */
unsigned long r : 1; /* Readable */
unsigned long d : 1; /* Dirty */
unsigned long y : 1; /* Young */
unsigned long prefix_notif : 1; /* Guest prefix invalidation notification */
unsigned long : 3; /* HW */
unsigned long vsie_notif : 1; /* Referenced in a shadow table */
unsigned long : 1;
unsigned long : 4; /* HW */
unsigned long sd : 1; /* Soft-Dirty */
unsigned long pr : 1; /* Present */
} fc1;
} s;
union {
struct {
unsigned long type :16; /* Token type */
unsigned long par :16; /* Token parameter */
unsigned long :26;
unsigned long i : 1; /* Must be 1 */
unsigned long : 1;
unsigned long tt : 2;
unsigned long : 1;
unsigned long pr : 1; /* Must be 0 */
};
struct {
unsigned long token:32; /* Token and parameter */
unsigned long :32;
};
} tok;
union pmd pmd;
union pud pud;
union p4d p4d;
union pgd pgd;
};
union skey {
unsigned char skey;
struct {
unsigned char acc :4;
unsigned char fp :1;
unsigned char r :1;
unsigned char c :1;
unsigned char zero:1;
};
};
static_assert(sizeof(union pgste) == sizeof(unsigned long));
static_assert(sizeof(union pte) == sizeof(unsigned long));
static_assert(sizeof(union pmd) == sizeof(unsigned long));
static_assert(sizeof(union pud) == sizeof(unsigned long));
static_assert(sizeof(union p4d) == sizeof(unsigned long));
static_assert(sizeof(union pgd) == sizeof(unsigned long));
static_assert(sizeof(union crste) == sizeof(unsigned long));
static_assert(sizeof(union skey) == sizeof(char));
struct segment_table {
union pmd pmds[_CRST_ENTRIES];
};
struct region3_table {
union pud puds[_CRST_ENTRIES];
};
struct region2_table {
union p4d p4ds[_CRST_ENTRIES];
};
struct region1_table {
union pgd pgds[_CRST_ENTRIES];
};
struct crst_table {
union {
union crste crstes[_CRST_ENTRIES];
struct segment_table segment;
struct region3_table region3;
struct region2_table region2;
struct region1_table region1;
};
};
struct page_table {
union pte ptes[_PAGE_ENTRIES];
union pgste pgstes[_PAGE_ENTRIES];
};
static_assert(sizeof(struct crst_table) == _CRST_TABLE_SIZE);
static_assert(sizeof(struct page_table) == PAGE_SIZE);
struct dat_walk;
typedef long (*dat_walk_op)(union crste *crste, gfn_t gfn, gfn_t next, struct dat_walk *w);
struct dat_walk_ops {
union {
dat_walk_op crste_ops[4];
struct {
dat_walk_op pmd_entry;
dat_walk_op pud_entry;
dat_walk_op p4d_entry;
dat_walk_op pgd_entry;
};
};
long (*pte_entry)(union pte *pte, gfn_t gfn, gfn_t next, struct dat_walk *w);
};
struct dat_walk {
const struct dat_walk_ops *ops;
union crste *last;
union pte *last_pte;
union asce asce;
gfn_t start;
gfn_t end;
int flags;
void *priv;
};
struct ptval_param {
unsigned char offset : 6;
unsigned char len : 2;
};
/**
* _pte() - Useful constructor for union pte
* @pfn: the pfn this pte should point to.
* @writable: whether the pte should be writable.
* @dirty: whether the pte should be dirty.
* @special: whether the pte should be marked as special
*
* The pte is also marked as young and present. If the pte is marked as dirty,
* it gets marked as soft-dirty too. If the pte is not dirty, the hardware
* protect bit is set (independently of the write softbit); this way proper
* dirty tracking can be performed.
*
* Return: a union pte value.
*/
static inline union pte _pte(kvm_pfn_t pfn, bool writable, bool dirty, bool special)
{
union pte res = { .val = PFN_PHYS(pfn) };
res.h.p = !dirty;
res.s.y = 1;
res.s.pr = 1;
res.s.w = writable;
res.s.d = dirty;
res.s.sd = dirty;
res.s.s = special;
return res;
}
static inline union crste _crste_fc0(kvm_pfn_t pfn, int tt)
{
union crste res = { .val = PFN_PHYS(pfn) };
res.h.tt = tt;
res.h.fc0.tl = _REGION_ENTRY_LENGTH;
res.h.fc0.tf = 0;
return res;
}
/**
* _crste() - Useful constructor for union crste with FC=1
* @pfn: the pfn this pte should point to.
* @tt: the table type
* @writable: whether the pte should be writable.
* @dirty: whether the pte should be dirty.
*
* The crste is also marked as young and present. If the crste is marked as
* dirty, it gets marked as soft-dirty too. If the crste is not dirty, the
* hardware protect bit is set (independently of the write softbit); this way
* proper dirty tracking can be performed.
*
* Return: a union crste value.
*/
static inline union crste _crste_fc1(kvm_pfn_t pfn, int tt, bool writable, bool dirty)
{
union crste res = { .val = PFN_PHYS(pfn) & _SEGMENT_MASK };
res.h.tt = tt;
res.h.p = !dirty;
res.h.fc = 1;
res.s.fc1.y = 1;
res.s.fc1.pr = 1;
res.s.fc1.w = writable;
res.s.fc1.d = dirty;
res.s.fc1.sd = dirty;
return res;
}
union essa_state {
unsigned char val;
struct {
unsigned char : 2;
unsigned char nodat : 1;
unsigned char exception : 1;
unsigned char usage : 2;
unsigned char content : 2;
};
};
/**
* struct vsie_rmap - reverse mapping for shadow page table entries
* @next: pointer to next rmap in the list
* @r_gfn: virtual rmap address in the shadow guest address space
*/
struct vsie_rmap {
struct vsie_rmap *next;
union {
unsigned long val;
struct {
long level: 8;
unsigned long : 4;
unsigned long r_gfn:52;
};
};
};
static_assert(sizeof(struct vsie_rmap) == 2 * sizeof(long));
#define KVM_S390_MMU_CACHE_N_CRSTS 6
#define KVM_S390_MMU_CACHE_N_PTS 2
#define KVM_S390_MMU_CACHE_N_RMAPS 16
struct kvm_s390_mmu_cache {
void *crsts[KVM_S390_MMU_CACHE_N_CRSTS];
void *pts[KVM_S390_MMU_CACHE_N_PTS];
void *rmaps[KVM_S390_MMU_CACHE_N_RMAPS];
short int n_crsts;
short int n_pts;
short int n_rmaps;
};
struct guest_fault {
gfn_t gfn; /* Guest frame */
kvm_pfn_t pfn; /* Host PFN */
struct page *page; /* Host page */
union pte *ptep; /* Used to resolve the fault, or NULL */
union crste *crstep; /* Used to resolve the fault, or NULL */
bool writable; /* Mapping is writable */
bool write_attempt; /* Write access attempted */
bool attempt_pfault; /* Attempt a pfault first */
bool valid; /* This entry contains valid data */
void (*callback)(struct guest_fault *f);
void *priv;
};
/*
* 0 1 2 3 4 5 6 7
* +-------+-------+-------+-------+-------+-------+-------+-------+
* 0 | | PGT_ADDR |
* 8 | VMADDR | |
* 16 | |
* 24 | |
*/
#define MKPTVAL(o, l) ((struct ptval_param) { .offset = (o), .len = ((l) + 1) / 2 - 1})
#define PTVAL_PGT_ADDR MKPTVAL(4, 8)
#define PTVAL_VMADDR MKPTVAL(8, 6)
union pgste __must_check __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new,
gfn_t gfn, union asce asce, bool uses_skeys);
bool dat_crstep_xchg_atomic(union crste *crstep, union crste old, union crste new, gfn_t gfn,
union asce asce);
void dat_crstep_xchg(union crste *crstep, union crste new, gfn_t gfn, union asce asce);
long _dat_walk_gfn_range(gfn_t start, gfn_t end, union asce asce,
const struct dat_walk_ops *ops, int flags, void *priv);
int dat_entry_walk(struct kvm_s390_mmu_cache *mc, gfn_t gfn, union asce asce, int flags,
int walk_level, union crste **last, union pte **ptepp);
void dat_free_level(struct crst_table *table, bool owns_ptes);
struct crst_table *dat_alloc_crst_sleepable(unsigned long init);
int dat_set_asce_limit(struct kvm_s390_mmu_cache *mc, union asce *asce, int newtype);
int dat_get_storage_key(union asce asce, gfn_t gfn, union skey *skey);
int dat_set_storage_key(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
union skey skey, bool nq);
int dat_cond_set_storage_key(struct kvm_s390_mmu_cache *mmc, union asce asce, gfn_t gfn,
union skey skey, union skey *oldkey, bool nq, bool mr, bool mc);
int dat_reset_reference_bit(union asce asce, gfn_t gfn);
long dat_reset_skeys(union asce asce, gfn_t start);
unsigned long dat_get_ptval(struct page_table *table, struct ptval_param param);
void dat_set_ptval(struct page_table *table, struct ptval_param param, unsigned long val);
int dat_set_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start, gfn_t end,
u16 type, u16 param);
int dat_set_prefix_notif_bit(union asce asce, gfn_t gfn);
bool dat_test_age_gfn(union asce asce, gfn_t start, gfn_t end);
int dat_link(struct kvm_s390_mmu_cache *mc, union asce asce, int level,
bool uses_skeys, struct guest_fault *f);
int dat_perform_essa(union asce asce, gfn_t gfn, int orc, union essa_state *state, bool *dirty);
long dat_reset_cmma(union asce asce, gfn_t start_gfn);
int dat_peek_cmma(gfn_t start, union asce asce, unsigned int *count, u8 *values);
int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values, atomic64_t *rem);
int dat_set_cmma_bits(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t gfn,
unsigned long count, unsigned long mask, const uint8_t *bits);
int kvm_s390_mmu_cache_topup(struct kvm_s390_mmu_cache *mc);
#define GFP_KVM_S390_MMU_CACHE (GFP_ATOMIC | __GFP_ACCOUNT | __GFP_NOWARN)
static inline struct page_table *kvm_s390_mmu_cache_alloc_pt(struct kvm_s390_mmu_cache *mc)
{
if (mc->n_pts)
return mc->pts[--mc->n_pts];
return (void *)__get_free_page(GFP_KVM_S390_MMU_CACHE);
}
static inline struct crst_table *kvm_s390_mmu_cache_alloc_crst(struct kvm_s390_mmu_cache *mc)
{
if (mc->n_crsts)
return mc->crsts[--mc->n_crsts];
return (void *)__get_free_pages(GFP_KVM_S390_MMU_CACHE | __GFP_COMP, CRST_ALLOC_ORDER);
}
static inline struct vsie_rmap *kvm_s390_mmu_cache_alloc_rmap(struct kvm_s390_mmu_cache *mc)
{
if (mc->n_rmaps)
return mc->rmaps[--mc->n_rmaps];
return kzalloc(sizeof(struct vsie_rmap), GFP_KVM_S390_MMU_CACHE);
}
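The three allocators above share one pattern: pop from a pre-filled per-context cache when possible, and fall back to the general allocator otherwise, with kvm_s390_mmu_cache_topup() refilling the cache outside the hot path. A standalone userspace sketch of the same pattern (hypothetical names; plain calloc() stands in for the kernel page allocator):

```c
#include <stdlib.h>
#include <stddef.h>

#define CACHE_CAP 8

/* Hypothetical analogue of struct kvm_s390_mmu_cache: a small
 * freelist that is refilled ("topped up") outside the hot path. */
struct obj_cache {
	int n;
	void *slots[CACHE_CAP];
};

/* Pop a cached object if one is available, else fall back to the
 * general-purpose allocator (the kernel uses __get_free_page()). */
static void *cache_alloc(struct obj_cache *c, size_t size)
{
	if (c->n)
		return c->slots[--c->n];
	return calloc(1, size);
}

/* Pre-fill the cache so later allocations cannot fail, mirroring
 * what kvm_s390_mmu_cache_topup() does. 0 on success, -1 on OOM. */
static int cache_topup(struct obj_cache *c, size_t size)
{
	while (c->n < CACHE_CAP) {
		void *p = calloc(1, size);
		if (!p)
			return -1;
		c->slots[c->n++] = p;
	}
	return 0;
}
```

This is a sketch of the allocation strategy only; the kernel version additionally distinguishes page tables, CRSTs, and rmaps, and allocates with GFP_ATOMIC so the fallback can run under locks.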
static inline struct crst_table *crste_table_start(union crste *crstep)
{
return (struct crst_table *)ALIGN_DOWN((unsigned long)crstep, _CRST_TABLE_SIZE);
}
static inline struct page_table *pte_table_start(union pte *ptep)
{
return (struct page_table *)ALIGN_DOWN((unsigned long)ptep, _PAGE_TABLE_SIZE);
}
static inline bool crdte_crste(union crste *crstep, union crste old, union crste new, gfn_t gfn,
union asce asce)
{
unsigned long dtt = 0x10 | new.h.tt << 2;
void *table = crste_table_start(crstep);
return crdte(old.val, new.val, table, dtt, gfn_to_gpa(gfn), asce.val);
}
/**
* idte_crste() - invalidate a crste entry using idte
* @crstep: pointer to the crste to be invalidated
* @gfn: a gfn mapped by the crste
* @opt: options for the idte instruction
* @asce: the asce
* @local: whether the operation is cpu-local
*/
static __always_inline void idte_crste(union crste *crstep, gfn_t gfn, unsigned long opt,
union asce asce, int local)
{
unsigned long table_origin = __pa(crste_table_start(crstep));
unsigned long gaddr = gfn_to_gpa(gfn) & HPAGE_MASK;
if (__builtin_constant_p(opt) && opt == 0) {
/* flush without guest asce */
asm volatile("idte %[table_origin],0,%[gaddr],%[local]"
: "+m" (*crstep)
: [table_origin] "a" (table_origin), [gaddr] "a" (gaddr),
[local] "i" (local)
: "cc");
} else {
/* flush with guest asce */
asm volatile("idte %[table_origin],%[asce],%[gaddr_opt],%[local]"
: "+m" (*crstep)
: [table_origin] "a" (table_origin), [gaddr_opt] "a" (gaddr | opt),
[asce] "a" (asce.val), [local] "i" (local)
: "cc");
}
}
static inline void dat_init_pgstes(struct page_table *pt, unsigned long val)
{
memset64((void *)pt->pgstes, val, PTRS_PER_PTE);
}
static inline void dat_init_page_table(struct page_table *pt, unsigned long ptes,
unsigned long pgstes)
{
memset64((void *)pt->ptes, ptes, PTRS_PER_PTE);
dat_init_pgstes(pt, pgstes);
}
static inline gfn_t asce_end(union asce asce)
{
return 1ULL << ((asce.dt + 1) * 11 + _SEGMENT_SHIFT - PAGE_SHIFT);
}
#define _CRSTE(x) ((union crste) { .val = _Generic((x), \
union pgd : (x).val, \
union p4d : (x).val, \
union pud : (x).val, \
union pmd : (x).val, \
union crste : (x).val)})
#define _CRSTEP(x) ((union crste *)_Generic((*(x)), \
union pgd : (x), \
union p4d : (x), \
union pud : (x), \
union pmd : (x), \
union crste : (x)))
#define _CRSTP(x) ((struct crst_table *)_Generic((*(x)), \
struct crst_table : (x), \
struct segment_table : (x), \
struct region3_table : (x), \
struct region2_table : (x), \
struct region1_table : (x)))
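The _CRSTE()/_CRSTEP()/_CRSTP() macros above use C11 _Generic to accept any of the table-entry union types while keeping type safety. A toy standalone illustration of that dispatch (hypothetical types, not the kernel unions):

```c
/* _Generic selects the result expression based on the static type of
 * the argument; only the selected branch is evaluated. */
union toy_a { unsigned long val; };
union toy_b { unsigned long val; };

#define TOY_VAL(x) (_Generic((x),	\
	union toy_a : (x).val,		\
	union toy_b : (x).val + 100))
```

Passing a type with no matching association is a compile-time error, which is exactly why the kernel macros can safely "upcast" pmd/pud/p4d/pgd entries to the generic crste view.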
static inline bool asce_contains_gfn(union asce asce, gfn_t gfn)
{
return gfn < asce_end(asce);
}
static inline bool is_pmd(union crste crste)
{
return crste.h.tt == TABLE_TYPE_SEGMENT;
}
static inline bool is_pud(union crste crste)
{
return crste.h.tt == TABLE_TYPE_REGION3;
}
static inline bool is_p4d(union crste crste)
{
return crste.h.tt == TABLE_TYPE_REGION2;
}
static inline bool is_pgd(union crste crste)
{
return crste.h.tt == TABLE_TYPE_REGION1;
}
static inline phys_addr_t pmd_origin_large(union pmd pmd)
{
return pmd.val & _SEGMENT_ENTRY_ORIGIN_LARGE;
}
static inline phys_addr_t pud_origin_large(union pud pud)
{
return pud.val & _REGION3_ENTRY_ORIGIN_LARGE;
}
/**
* crste_origin_large() - Return the large frame origin of a large crste
* @crste: The crste whose origin is to be returned. Should be either a
* region-3 table entry or a segment table entry, in both cases with
* FC set to 1 (large pages).
*
* Return: The origin of the large frame pointed to by @crste, or -1 if the
* crste was not large (wrong table type, or FC==0)
*/
static inline phys_addr_t crste_origin_large(union crste crste)
{
if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
return -1;
if (is_pmd(crste))
return pmd_origin_large(crste.pmd);
return pud_origin_large(crste.pud);
}
#define crste_origin(x) (_Generic((x), \
union pmd : (x).val & _SEGMENT_ENTRY_ORIGIN, \
union pud : (x).val & _REGION_ENTRY_ORIGIN, \
union p4d : (x).val & _REGION_ENTRY_ORIGIN, \
union pgd : (x).val & _REGION_ENTRY_ORIGIN))
static inline unsigned long pte_origin(union pte pte)
{
return pte.val & PAGE_MASK;
}
static inline bool pmd_prefix(union pmd pmd)
{
return pmd.h.fc && pmd.s.fc1.prefix_notif;
}
static inline bool pud_prefix(union pud pud)
{
return pud.h.fc && pud.s.fc1.prefix_notif;
}
static inline bool crste_leaf(union crste crste)
{
return (crste.h.tt <= TABLE_TYPE_REGION3) && crste.h.fc;
}
static inline bool crste_prefix(union crste crste)
{
return crste_leaf(crste) && crste.s.fc1.prefix_notif;
}
static inline bool crste_dirty(union crste crste)
{
return crste_leaf(crste) && crste.s.fc1.d;
}
static inline union pgste *pgste_of(union pte *pte)
{
return (union pgste *)(pte + _PAGE_ENTRIES);
}
static inline bool pte_hole(union pte pte)
{
return pte.h.i && !pte.tok.pr && pte.tok.type != _DAT_TOKEN_NONE;
}
static inline bool _crste_hole(union crste crste)
{
return crste.h.i && !crste.tok.pr && crste.tok.type != _DAT_TOKEN_NONE;
}
#define crste_hole(x) _crste_hole(_CRSTE(x))
static inline bool _crste_none(union crste crste)
{
return crste.h.i && !crste.tok.pr && crste.tok.type == _DAT_TOKEN_NONE;
}
#define crste_none(x) _crste_none(_CRSTE(x))
static inline phys_addr_t large_pud_to_phys(union pud pud, gfn_t gfn)
{
return pud_origin_large(pud) | (gfn_to_gpa(gfn) & ~_REGION3_MASK);
}
static inline phys_addr_t large_pmd_to_phys(union pmd pmd, gfn_t gfn)
{
return pmd_origin_large(pmd) | (gfn_to_gpa(gfn) & ~_SEGMENT_MASK);
}
static inline phys_addr_t large_crste_to_phys(union crste crste, gfn_t gfn)
{
if (unlikely(!crste.h.fc || crste.h.tt > TABLE_TYPE_REGION3))
return -1;
if (is_pmd(crste))
return large_pmd_to_phys(crste.pmd, gfn);
return large_pud_to_phys(crste.pud, gfn);
}
static inline bool cspg_crste(union crste *crstep, union crste old, union crste new)
{
return cspg(&crstep->val, old.val, new.val);
}
static inline struct page_table *dereference_pmd(union pmd pmd)
{
return phys_to_virt(crste_origin(pmd));
}
static inline struct segment_table *dereference_pud(union pud pud)
{
return phys_to_virt(crste_origin(pud));
}
static inline struct region3_table *dereference_p4d(union p4d p4d)
{
return phys_to_virt(crste_origin(p4d));
}
static inline struct region2_table *dereference_pgd(union pgd pgd)
{
return phys_to_virt(crste_origin(pgd));
}
static inline struct crst_table *_dereference_crste(union crste crste)
{
if (unlikely(is_pmd(crste)))
return NULL;
return phys_to_virt(crste_origin(crste.pud));
}
#define dereference_crste(x) (_Generic((x), \
union pud : _dereference_crste(_CRSTE(x)), \
union p4d : _dereference_crste(_CRSTE(x)), \
union pgd : _dereference_crste(_CRSTE(x)), \
union crste : _dereference_crste(_CRSTE(x))))
static inline struct crst_table *dereference_asce(union asce asce)
{
return phys_to_virt(asce.val & _ASCE_ORIGIN);
}
static inline void asce_flush_tlb(union asce asce)
{
__tlb_flush_idte(asce.val);
}
static inline bool pgste_get_trylock(union pte *ptep, union pgste *res)
{
union pgste *pgstep = pgste_of(ptep);
union pgste old_pgste;
if (READ_ONCE(pgstep->val) & PGSTE_PCL_BIT)
return false;
old_pgste.val = __atomic64_or_barrier(PGSTE_PCL_BIT, &pgstep->val);
if (old_pgste.pcl)
return false;
old_pgste.pcl = 1;
*res = old_pgste;
return true;
}
static inline union pgste pgste_get_lock(union pte *ptep)
{
union pgste res;
while (!pgste_get_trylock(ptep, &res))
cpu_relax();
return res;
}
static inline void pgste_set_unlock(union pte *ptep, union pgste pgste)
{
pgste.pcl = 0;
barrier();
WRITE_ONCE(*pgste_of(ptep), pgste);
}
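pgste_get_trylock()/pgste_set_unlock() implement a lock whose state is a single bit inside the word being protected, taken with an atomic fetch-or. A standalone sketch of the same protocol (hypothetical names, C11 atomics in place of the s390 __atomic64_or_barrier() primitive):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PCL_BIT (1ULL << 63)	/* stand-in for PGSTE_PCL_BIT */

static bool pcl_trylock(_Atomic uint64_t *word, uint64_t *locked_val)
{
	uint64_t old;

	/* Cheap early-out, like the READ_ONCE() check in the kernel code. */
	if (atomic_load(word) & PCL_BIT)
		return false;
	old = atomic_fetch_or(word, PCL_BIT);
	if (old & PCL_BIT)		/* another thread won the race */
		return false;
	*locked_val = old | PCL_BIT;	/* caller works on the locked value */
	return true;
}

static void pcl_unlock(_Atomic uint64_t *word, uint64_t val)
{
	/* Publish the possibly-updated value with the lock bit cleared. */
	atomic_store(word, val & ~PCL_BIT);
}
```

As in pgste_get_lock(), a blocking variant is just a cpu_relax()-style spin around the trylock.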
static inline void dat_ptep_xchg(union pte *ptep, union pte new, gfn_t gfn, union asce asce,
bool has_skeys)
{
union pgste pgste;
pgste = pgste_get_lock(ptep);
pgste = __dat_ptep_xchg(ptep, pgste, new, gfn, asce, has_skeys);
pgste_set_unlock(ptep, pgste);
}
static inline void dat_ptep_clear(union pte *ptep, gfn_t gfn, union asce asce, bool has_skeys)
{
dat_ptep_xchg(ptep, _PTE_EMPTY, gfn, asce, has_skeys);
}
static inline void dat_free_pt(struct page_table *pt)
{
free_page((unsigned long)pt);
}
static inline void _dat_free_crst(struct crst_table *table)
{
free_pages((unsigned long)table, CRST_ALLOC_ORDER);
}
#define dat_free_crst(x) _dat_free_crst(_CRSTP(x))
static inline void kvm_s390_free_mmu_cache(struct kvm_s390_mmu_cache *mc)
{
if (!mc)
return;
while (mc->n_pts)
dat_free_pt(mc->pts[--mc->n_pts]);
while (mc->n_crsts)
_dat_free_crst(mc->crsts[--mc->n_crsts]);
while (mc->n_rmaps)
kfree(mc->rmaps[--mc->n_rmaps]);
kfree(mc);
}
DEFINE_FREE(kvm_s390_mmu_cache, struct kvm_s390_mmu_cache *, if (_T) kvm_s390_free_mmu_cache(_T))
static inline struct kvm_s390_mmu_cache *kvm_s390_new_mmu_cache(void)
{
struct kvm_s390_mmu_cache *mc __free(kvm_s390_mmu_cache) = NULL;
mc = kzalloc(sizeof(*mc), GFP_KERNEL_ACCOUNT);
if (mc && !kvm_s390_mmu_cache_topup(mc))
return_ptr(mc);
return NULL;
}
static inline bool dat_pmdp_xchg_atomic(union pmd *pmdp, union pmd old, union pmd new,
gfn_t gfn, union asce asce)
{
return dat_crstep_xchg_atomic(_CRSTEP(pmdp), _CRSTE(old), _CRSTE(new), gfn, asce);
}
static inline bool dat_pudp_xchg_atomic(union pud *pudp, union pud old, union pud new,
gfn_t gfn, union asce asce)
{
return dat_crstep_xchg_atomic(_CRSTEP(pudp), _CRSTE(old), _CRSTE(new), gfn, asce);
}
static inline void dat_crstep_clear(union crste *crstep, gfn_t gfn, union asce asce)
{
union crste newcrste = _CRSTE_EMPTY(crstep->h.tt);
dat_crstep_xchg(crstep, newcrste, gfn, asce);
}
static inline int get_level(union crste *crstep, union pte *ptep)
{
return ptep ? TABLE_TYPE_PAGE_TABLE : crstep->h.tt;
}
static inline int dat_delete_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
unsigned long npages)
{
return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_PIC, PGM_ADDRESSING);
}
static inline int dat_create_slot(struct kvm_s390_mmu_cache *mc, union asce asce, gfn_t start,
unsigned long npages)
{
return dat_set_slot(mc, asce, start, start + npages, _DAT_TOKEN_NONE, 0);
}
static inline bool crste_is_ucas(union crste crste)
{
return is_pmd(crste) && crste.h.i && crste.h.fc0.tl == 1 && crste.h.fc == 0;
}
#endif /* __KVM_S390_DAT_H */



@@ -10,13 +10,13 @@
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <asm/gmap.h>
#include <asm/gmap_helpers.h>
#include <asm/virtio-ccw.h>
#include "kvm-s390.h"
#include "trace.h"
#include "trace-s390.h"
#include "gaccess.h"
#include "gmap.h"
static void do_discard_gfn_range(struct kvm_vcpu *vcpu, gfn_t gfn_start, gfn_t gfn_end)
{

New file: arch/s390/kvm/faultin.c (148 lines)

@@ -0,0 +1,148 @@
// SPDX-License-Identifier: GPL-2.0
/*
* KVM guest fault handling.
*
* Copyright IBM Corp. 2025
* Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>
*/
#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
#include "gmap.h"
#include "trace.h"
#include "faultin.h"
bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu);
/**
 * kvm_s390_faultin_gfn() - handle a DAT fault.
* @vcpu: The vCPU whose gmap is to be fixed up, or NULL if operating on the VM.
* @kvm: The VM whose gmap is to be fixed up, or NULL if operating on a vCPU.
* @f: The guest fault that needs to be resolved.
*
* Return:
* * 0 on success
* * < 0 in case of error
* * > 0 in case of guest exceptions
*
* Context:
* * The mm lock must not be held before calling
* * kvm->srcu must be held
* * may sleep
*/
int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f)
{
struct kvm_s390_mmu_cache *local_mc __free(kvm_s390_mmu_cache) = NULL;
struct kvm_s390_mmu_cache *mc = NULL;
struct kvm_memory_slot *slot;
unsigned long inv_seq;
int foll, rc = 0;
foll = f->write_attempt ? FOLL_WRITE : 0;
foll |= f->attempt_pfault ? FOLL_NOWAIT : 0;
if (vcpu) {
kvm = vcpu->kvm;
mc = vcpu->arch.mc;
}
lockdep_assert_held(&kvm->srcu);
scoped_guard(read_lock, &kvm->mmu_lock) {
if (gmap_try_fixup_minor(kvm->arch.gmap, f) == 0)
return 0;
}
while (1) {
f->valid = false;
inv_seq = kvm->mmu_invalidate_seq;
/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
smp_rmb();
if (vcpu)
slot = kvm_vcpu_gfn_to_memslot(vcpu, f->gfn);
else
slot = gfn_to_memslot(kvm, f->gfn);
f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
/* Needs I/O, try to set up async pfault (only possible with FOLL_NOWAIT). */
if (f->pfn == KVM_PFN_ERR_NEEDS_IO) {
if (unlikely(!f->attempt_pfault))
return -EAGAIN;
if (unlikely(!vcpu))
return -EINVAL;
trace_kvm_s390_major_guest_pfault(vcpu);
if (kvm_arch_setup_async_pf(vcpu))
return 0;
vcpu->stat.pfault_sync++;
/* Could not set up async pfault, try again synchronously. */
foll &= ~FOLL_NOWAIT;
f->pfn = __kvm_faultin_pfn(slot, f->gfn, foll, &f->writable, &f->page);
}
/* Access outside memory, addressing exception. */
if (is_noslot_pfn(f->pfn))
return PGM_ADDRESSING;
/* Signal pending: try again. */
if (f->pfn == KVM_PFN_ERR_SIGPENDING)
return -EAGAIN;
/* Check if it's read-only memory; don't try to actually handle that case. */
if (f->pfn == KVM_PFN_ERR_RO_FAULT)
return -EOPNOTSUPP;
/* Any other error. */
if (is_error_pfn(f->pfn))
return -EFAULT;
if (!mc) {
local_mc = kvm_s390_new_mmu_cache();
if (!local_mc)
return -ENOMEM;
mc = local_mc;
}
/* Loop, will automatically release the faulted page. */
if (mmu_invalidate_retry_gfn_unsafe(kvm, inv_seq, f->gfn)) {
kvm_release_faultin_page(kvm, f->page, true, false);
continue;
}
scoped_guard(read_lock, &kvm->mmu_lock) {
if (!mmu_invalidate_retry_gfn(kvm, inv_seq, f->gfn)) {
f->valid = true;
rc = gmap_link(mc, kvm->arch.gmap, f);
kvm_release_faultin_page(kvm, f->page, !!rc, f->write_attempt);
f->page = NULL;
}
}
kvm_release_faultin_page(kvm, f->page, true, false);
if (rc == -ENOMEM) {
rc = kvm_s390_mmu_cache_topup(mc);
if (rc)
return rc;
} else if (rc != -EAGAIN) {
return rc;
}
}
}
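The loop in kvm_s390_faultin_gfn() samples kvm->mmu_invalidate_seq before looking up the pfn and redoes the whole lookup if an invalidation bumped the sequence in the meantime. A minimal standalone analogue of that dance (hypothetical names; the kernel version additionally recheck under mmu_lock and restricts the check to the faulting gfn):

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t invalidate_seq;

/* Invalidation side: bump the counter whenever mappings change
 * (stands in for the kvm_mmu_invalidate_begin/end() pairing). */
static void mappings_invalidated(void)
{
	atomic_fetch_add(&invalidate_seq, 1);
}

/* Fault side: redo the lookup until no invalidation raced with it. */
static uint64_t lookup_with_retry(uint64_t (*lookup)(void))
{
	uint64_t seq, val;

	do {
		seq = atomic_load(&invalidate_seq);
		val = lookup();		/* e.g. translate gfn -> pfn */
	} while (atomic_load(&invalidate_seq) != seq);
	return val;
}

/* Tiny demo translation used in the test below. */
static uint64_t demo_lookup(void)
{
	return 0x1234;
}
```

The point of the pattern is that the result is only installed if the sequence did not move between the sample and the install, so a concurrent invalidation can never be overwritten by a stale translation.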
int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w)
{
struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
int foll = w ? FOLL_WRITE : 0;
f->write_attempt = w;
f->gfn = gfn;
f->pfn = __kvm_faultin_pfn(slot, gfn, foll, &f->writable, &f->page);
if (is_noslot_pfn(f->pfn))
return PGM_ADDRESSING;
if (is_sigpending_pfn(f->pfn))
return -EINTR;
if (f->pfn == KVM_PFN_ERR_NEEDS_IO)
return -EAGAIN;
if (is_error_pfn(f->pfn))
return -EFAULT;
f->valid = true;
return 0;
}

New file: arch/s390/kvm/faultin.h (92 lines)

@@ -0,0 +1,92 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* KVM guest fault handling.
*
* Copyright IBM Corp. 2025
* Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>
*/
#ifndef __KVM_S390_FAULTIN_H
#define __KVM_S390_FAULTIN_H
#include <linux/kvm_host.h>
#include "dat.h"
int kvm_s390_faultin_gfn(struct kvm_vcpu *vcpu, struct kvm *kvm, struct guest_fault *f);
int kvm_s390_get_guest_page(struct kvm *kvm, struct guest_fault *f, gfn_t gfn, bool w);
static inline int kvm_s390_faultin_gfn_simple(struct kvm_vcpu *vcpu, struct kvm *kvm,
gfn_t gfn, bool wr)
{
struct guest_fault f = { .gfn = gfn, .write_attempt = wr, };
return kvm_s390_faultin_gfn(vcpu, kvm, &f);
}
static inline int kvm_s390_get_guest_page_and_read_gpa(struct kvm *kvm, struct guest_fault *f,
gpa_t gaddr, unsigned long *val)
{
int rc;
rc = kvm_s390_get_guest_page(kvm, f, gpa_to_gfn(gaddr), false);
if (rc)
return rc;
*val = *(unsigned long *)phys_to_virt(pfn_to_phys(f->pfn) | offset_in_page(gaddr));
return 0;
}
static inline void kvm_s390_release_multiple(struct kvm *kvm, struct guest_fault *guest_faults,
int n, bool ignore)
{
int i;
for (i = 0; i < n; i++) {
kvm_release_faultin_page(kvm, guest_faults[i].page, ignore,
guest_faults[i].write_attempt);
guest_faults[i].page = NULL;
}
}
static inline bool kvm_s390_multiple_faults_need_retry(struct kvm *kvm, unsigned long seq,
struct guest_fault *guest_faults, int n,
bool unsafe)
{
int i;
for (i = 0; i < n; i++) {
if (!guest_faults[i].valid)
continue;
if (unsafe && mmu_invalidate_retry_gfn_unsafe(kvm, seq, guest_faults[i].gfn))
return true;
if (!unsafe && mmu_invalidate_retry_gfn(kvm, seq, guest_faults[i].gfn))
return true;
}
return false;
}
static inline int kvm_s390_get_guest_pages(struct kvm *kvm, struct guest_fault *guest_faults,
gfn_t start, int n_pages, bool write_attempt)
{
int i, rc = 0;
for (i = 0; i < n_pages; i++) {
rc = kvm_s390_get_guest_page(kvm, guest_faults + i, start + i, write_attempt);
if (rc)
break;
}
return rc;
}
#define kvm_s390_release_faultin_array(kvm, array, ignore) \
kvm_s390_release_multiple(kvm, array, ARRAY_SIZE(array), ignore)
#define kvm_s390_array_needs_retry_unsafe(kvm, seq, array) \
kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), true)
#define kvm_s390_array_needs_retry_safe(kvm, seq, array) \
kvm_s390_multiple_faults_need_retry(kvm, seq, array, ARRAY_SIZE(array), false)
#endif /* __KVM_S390_FAULTIN_H */

[File diff suppressed because it is too large]

@@ -206,8 +206,8 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
void *data, unsigned long len, enum gacc_mode mode);
int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, __uint128_t *old,
__uint128_t new, u8 access_key, bool *success);
int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, union kvm_s390_quad *old,
union kvm_s390_quad new, u8 access_key, bool *success);
/**
* write_guest_with_key - copy data from kernel space to guest space
@@ -450,11 +450,17 @@ void ipte_unlock(struct kvm *kvm);
int ipte_lock_held(struct kvm *kvm);
int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
/* MVPG PEI indication bits */
#define PEI_DAT_PROT 2
#define PEI_NOT_PTE 4
union mvpg_pei {
unsigned long val;
struct {
unsigned long addr : 61;
unsigned long not_pte : 1;
unsigned long dat_prot: 1;
unsigned long real : 1;
};
};
int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow,
unsigned long saddr, unsigned long *datptr);
int gaccess_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg, gpa_t saddr,
union mvpg_pei *datptr, bool wr);
#endif /* __KVM_S390_GACCESS_H */


@@ -1,141 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Guest memory management for KVM/s390 nested VMs.
*
* Copyright IBM Corp. 2008, 2020, 2024
*
* Author(s): Claudio Imbrenda <imbrenda@linux.ibm.com>
* Martin Schwidefsky <schwidefsky@de.ibm.com>
* David Hildenbrand <david@redhat.com>
* Janosch Frank <frankja@linux.vnet.ibm.com>
*/
#include <linux/compiler.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/pgtable.h>
#include <linux/pagemap.h>
#include <linux/mman.h>
#include <asm/lowcore.h>
#include <asm/gmap.h>
#include <asm/uv.h>
#include "kvm-s390.h"
/**
* gmap_find_shadow - find a specific asce in the list of shadow tables
* @parent: pointer to the parent gmap
* @asce: ASCE for which the shadow table is created
* @edat_level: edat level to be used for the shadow translation
*
* Returns the pointer to a gmap if a shadow table with the given asce is
* already available, ERR_PTR(-EAGAIN) if another one is just being created,
* otherwise NULL
*
* Context: Called with parent->shadow_lock held
*/
static struct gmap *gmap_find_shadow(struct gmap *parent, unsigned long asce, int edat_level)
{
struct gmap *sg;
lockdep_assert_held(&parent->shadow_lock);
list_for_each_entry(sg, &parent->children, list) {
if (!gmap_shadow_valid(sg, asce, edat_level))
continue;
if (!sg->initialized)
return ERR_PTR(-EAGAIN);
refcount_inc(&sg->ref_count);
return sg;
}
return NULL;
}
/**
* gmap_shadow - create/find a shadow guest address space
* @parent: pointer to the parent gmap
* @asce: ASCE for which the shadow table is created
* @edat_level: edat level to be used for the shadow translation
*
* The pages of the top level page table referred by the asce parameter
* will be set to read-only and marked in the PGSTEs of the kvm process.
* The shadow table will be removed automatically on any change to the
* PTE mapping for the source table.
*
* Returns a guest address space structure, ERR_PTR(-ENOMEM) if out of memory,
* ERR_PTR(-EAGAIN) if the caller has to retry and ERR_PTR(-EFAULT) if the
* parent gmap table could not be protected.
*/
struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce, int edat_level)
{
struct gmap *sg, *new;
unsigned long limit;
int rc;
if (KVM_BUG_ON(parent->mm->context.allow_gmap_hpage_1m, (struct kvm *)parent->private) ||
KVM_BUG_ON(gmap_is_shadow(parent), (struct kvm *)parent->private))
return ERR_PTR(-EFAULT);
spin_lock(&parent->shadow_lock);
sg = gmap_find_shadow(parent, asce, edat_level);
spin_unlock(&parent->shadow_lock);
if (sg)
return sg;
/* Create a new shadow gmap */
limit = -1UL >> (33 - (((asce & _ASCE_TYPE_MASK) >> 2) * 11));
if (asce & _ASCE_REAL_SPACE)
limit = -1UL;
new = gmap_alloc(limit);
if (!new)
return ERR_PTR(-ENOMEM);
new->mm = parent->mm;
new->parent = gmap_get(parent);
new->private = parent->private;
new->orig_asce = asce;
new->edat_level = edat_level;
new->initialized = false;
spin_lock(&parent->shadow_lock);
/* Recheck if another CPU created the same shadow */
sg = gmap_find_shadow(parent, asce, edat_level);
if (sg) {
spin_unlock(&parent->shadow_lock);
gmap_free(new);
return sg;
}
if (asce & _ASCE_REAL_SPACE) {
/* only allow one real-space gmap shadow */
list_for_each_entry(sg, &parent->children, list) {
if (sg->orig_asce & _ASCE_REAL_SPACE) {
spin_lock(&sg->guest_table_lock);
gmap_unshadow(sg);
spin_unlock(&sg->guest_table_lock);
list_del(&sg->list);
gmap_put(sg);
break;
}
}
}
refcount_set(&new->ref_count, 2);
list_add(&new->list, &parent->children);
if (asce & _ASCE_REAL_SPACE) {
/* nothing to protect, return right away */
new->initialized = true;
spin_unlock(&parent->shadow_lock);
return new;
}
spin_unlock(&parent->shadow_lock);
/* protect after insertion, so it will get properly invalidated */
mmap_read_lock(parent->mm);
rc = __kvm_s390_mprotect_many(parent, asce & _ASCE_ORIGIN,
((asce & _ASCE_TABLE_LENGTH) + 1),
PROT_READ, GMAP_NOTIFY_SHADOW);
mmap_read_unlock(parent->mm);
spin_lock(&parent->shadow_lock);
new->initialized = true;
if (rc) {
list_del(&new->list);
gmap_free(new);
new = ERR_PTR(rc);
}
spin_unlock(&parent->shadow_lock);
return new;
}

New file: arch/s390/kvm/gmap.c (1244 lines)

[File diff suppressed because it is too large]

New file: arch/s390/kvm/gmap.h (244 lines)

@@ -0,0 +1,244 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* KVM guest address space mapping code
*
* Copyright IBM Corp. 2007, 2016, 2025
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
* Claudio Imbrenda <imbrenda@linux.ibm.com>
*/
#ifndef ARCH_KVM_S390_GMAP_H
#define ARCH_KVM_S390_GMAP_H
#include "dat.h"
/**
* enum gmap_flags - Flags of a gmap.
*
* @GMAP_FLAG_SHADOW: The gmap is a vsie shadow gmap.
 * @GMAP_FLAG_OWNS_PAGETABLES: The gmap owns all DAT levels; normally set,
 * and clear only for ucontrol per-cpu gmaps, since they
 * share the page tables with the main gmap.
* @GMAP_FLAG_IS_UCONTROL: The gmap is ucontrol (main gmap or per-cpu gmap).
* @GMAP_FLAG_ALLOW_HPAGE_1M: 1M hugepages are allowed for this gmap,
* independently of the page size used by userspace.
* @GMAP_FLAG_ALLOW_HPAGE_2G: 2G hugepages are allowed for this gmap,
* independently of the page size used by userspace.
* @GMAP_FLAG_PFAULT_ENABLED: Pfault is enabled for the gmap.
* @GMAP_FLAG_USES_SKEYS: If the guest uses storage keys.
* @GMAP_FLAG_USES_CMM: Whether the guest uses CMMA.
* @GMAP_FLAG_EXPORT_ON_UNMAP: Whether to export guest pages when unmapping.
*/
enum gmap_flags {
GMAP_FLAG_SHADOW = 0,
GMAP_FLAG_OWNS_PAGETABLES,
GMAP_FLAG_IS_UCONTROL,
GMAP_FLAG_ALLOW_HPAGE_1M,
GMAP_FLAG_ALLOW_HPAGE_2G,
GMAP_FLAG_PFAULT_ENABLED,
GMAP_FLAG_USES_SKEYS,
GMAP_FLAG_USES_CMM,
GMAP_FLAG_EXPORT_ON_UNMAP,
};
/**
* struct gmap_struct - Guest address space.
*
* @flags: GMAP_FLAG_* flags.
* @edat_level: The edat level of this shadow gmap.
* @kvm: The vm.
* @asce: The ASCE used by this gmap.
* @list: List head used in children gmaps for the children gmap list.
* @children_lock: Protects children and scb_users.
* @children: List of child gmaps of this gmap.
* @scb_users: List of vsie_scb that use this shadow gmap.
* @parent: Parent gmap of a child gmap.
* @guest_asce: Original ASCE of this shadow gmap.
* @host_to_rmap_lock: Protects host_to_rmap.
* @host_to_rmap: Radix tree mapping host addresses to guest addresses.
*/
struct gmap {
unsigned long flags;
unsigned char edat_level;
struct kvm *kvm;
union asce asce;
struct list_head list;
spinlock_t children_lock; /* Protects: children, scb_users */
struct list_head children;
struct list_head scb_users;
struct gmap *parent;
union asce guest_asce;
spinlock_t host_to_rmap_lock; /* Protects host_to_rmap */
struct radix_tree_root host_to_rmap;
refcount_t refcount;
};
struct gmap_cache {
struct list_head list;
struct gmap *gmap;
};
#define gmap_for_each_rmap_safe(pos, n, head) \
for (pos = (head); n = pos ? pos->next : NULL, pos; pos = n)
int s390_replace_asce(struct gmap *gmap);
bool _gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end, bool hint);
bool gmap_age_gfn(struct gmap *gmap, gfn_t start, gfn_t end);
bool gmap_unmap_gfn_range(struct gmap *gmap, struct kvm_memory_slot *slot, gfn_t start, gfn_t end);
int gmap_try_fixup_minor(struct gmap *gmap, struct guest_fault *fault);
struct gmap *gmap_new(struct kvm *kvm, gfn_t limit);
struct gmap *gmap_new_child(struct gmap *parent, gfn_t limit);
void gmap_remove_child(struct gmap *child);
void gmap_dispose(struct gmap *gmap);
int gmap_link(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, struct guest_fault *fault);
void gmap_sync_dirty_log(struct gmap *gmap, gfn_t start, gfn_t end);
int gmap_set_limit(struct gmap *gmap, gfn_t limit);
int gmap_ucas_translate(struct kvm_s390_mmu_cache *mc, struct gmap *gmap, gpa_t *gaddr);
int gmap_ucas_map(struct gmap *gmap, gfn_t p_gfn, gfn_t c_gfn, unsigned long count);
void gmap_ucas_unmap(struct gmap *gmap, gfn_t c_gfn, unsigned long count);
int gmap_enable_skeys(struct gmap *gmap);
int gmap_pv_destroy_range(struct gmap *gmap, gfn_t start, gfn_t end, bool interruptible);
int gmap_insert_rmap(struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn, int level);
int gmap_protect_rmap(struct kvm_s390_mmu_cache *mc, struct gmap *sg, gfn_t p_gfn, gfn_t r_gfn,
kvm_pfn_t pfn, int level, bool wr);
void gmap_set_cmma_all_dirty(struct gmap *gmap);
void _gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn);
struct gmap *gmap_create_shadow(struct kvm_s390_mmu_cache *mc, struct gmap *gmap,
union asce asce, int edat_level);
void gmap_split_huge_pages(struct gmap *gmap);
static inline bool uses_skeys(struct gmap *gmap)
{
return test_bit(GMAP_FLAG_USES_SKEYS, &gmap->flags);
}
static inline bool uses_cmm(struct gmap *gmap)
{
return test_bit(GMAP_FLAG_USES_CMM, &gmap->flags);
}
static inline bool pfault_enabled(struct gmap *gmap)
{
return test_bit(GMAP_FLAG_PFAULT_ENABLED, &gmap->flags);
}
static inline bool is_ucontrol(struct gmap *gmap)
{
return test_bit(GMAP_FLAG_IS_UCONTROL, &gmap->flags);
}
static inline bool is_shadow(struct gmap *gmap)
{
return test_bit(GMAP_FLAG_SHADOW, &gmap->flags);
}
static inline bool owns_page_tables(struct gmap *gmap)
{
return test_bit(GMAP_FLAG_OWNS_PAGETABLES, &gmap->flags);
}
static inline struct gmap *gmap_put(struct gmap *gmap)
{
if (refcount_dec_and_test(&gmap->refcount))
gmap_dispose(gmap);
return NULL;
}
static inline void gmap_get(struct gmap *gmap)
{
WARN_ON_ONCE(unlikely(!refcount_inc_not_zero(&gmap->refcount)));
}
static inline void gmap_handle_vsie_unshadow_event(struct gmap *parent, gfn_t gfn)
{
scoped_guard(spinlock, &parent->children_lock)
_gmap_handle_vsie_unshadow_event(parent, gfn);
}
static inline bool gmap_mkold_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
{
return _gmap_unmap_prefix(gmap, gfn, end, true);
}
static inline bool gmap_unmap_prefix(struct gmap *gmap, gfn_t gfn, gfn_t end)
{
return _gmap_unmap_prefix(gmap, gfn, end, false);
}
static inline union pgste _gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
union pgste pgste, gfn_t gfn, bool needs_lock)
{
lockdep_assert_held(&gmap->kvm->mmu_lock);
if (!needs_lock)
lockdep_assert_held(&gmap->children_lock);
else
lockdep_assert_not_held(&gmap->children_lock);
if (pgste.prefix_notif && (newpte.h.p || newpte.h.i)) {
pgste.prefix_notif = 0;
gmap_unmap_prefix(gmap, gfn, gfn + 1);
}
if (pgste.vsie_notif && (ptep->h.p != newpte.h.p || newpte.h.i)) {
pgste.vsie_notif = 0;
if (needs_lock)
gmap_handle_vsie_unshadow_event(gmap, gfn);
else
_gmap_handle_vsie_unshadow_event(gmap, gfn);
}
return __dat_ptep_xchg(ptep, pgste, newpte, gfn, gmap->asce, uses_skeys(gmap));
}
static inline union pgste gmap_ptep_xchg(struct gmap *gmap, union pte *ptep, union pte newpte,
union pgste pgste, gfn_t gfn)
{
return _gmap_ptep_xchg(gmap, ptep, newpte, pgste, gfn, true);
}
static inline void _gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
gfn_t gfn, bool needs_lock)
{
unsigned long align = 8 + (is_pmd(*crstep) ? 0 : 11);
lockdep_assert_held(&gmap->kvm->mmu_lock);
if (!needs_lock)
lockdep_assert_held(&gmap->children_lock);
gfn = ALIGN_DOWN(gfn, align);
if (crste_prefix(*crstep) && (ne.h.p || ne.h.i || !crste_prefix(ne))) {
ne.s.fc1.prefix_notif = 0;
gmap_unmap_prefix(gmap, gfn, gfn + align);
}
if (crste_leaf(*crstep) && crstep->s.fc1.vsie_notif &&
(ne.h.p || ne.h.i || !ne.s.fc1.vsie_notif)) {
ne.s.fc1.vsie_notif = 0;
if (needs_lock)
gmap_handle_vsie_unshadow_event(gmap, gfn);
else
_gmap_handle_vsie_unshadow_event(gmap, gfn);
}
dat_crstep_xchg(crstep, ne, gfn, gmap->asce);
}
static inline void gmap_crstep_xchg(struct gmap *gmap, union crste *crstep, union crste ne,
gfn_t gfn)
{
return _gmap_crstep_xchg(gmap, crstep, ne, gfn, true);
}
/**
* gmap_is_shadow_valid() - check if a shadow guest address space matches the
* given properties and is still valid.
* @sg: Pointer to the shadow guest address space structure.
* @asce: ASCE for which the shadow table is requested.
* @edat_level: Edat level to be used for the shadow translation.
*
* Return: true if the gmap shadow is still valid and matches the given
* properties and the caller can continue using it; false otherwise, the
* caller has to request a new shadow gmap in this case.
*/
static inline bool gmap_is_shadow_valid(struct gmap *sg, union asce asce, int edat_level)
{
return sg->guest_asce.val == asce.val && sg->edat_level == edat_level;
}
#endif /* ARCH_KVM_S390_GMAP_H */


@@ -21,6 +21,7 @@
#include "gaccess.h"
#include "trace.h"
#include "trace-s390.h"
#include "faultin.h"
u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu)
{
@@ -367,8 +368,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu)
reg2, &srcaddr, GACC_FETCH, 0);
if (rc)
return kvm_s390_inject_prog_cond(vcpu, rc);
rc = kvm_s390_handle_dat_fault(vcpu, srcaddr, 0);
if (rc != 0)
do {
rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(srcaddr), false);
} while (rc == -EAGAIN);
if (rc)
return rc;
/* Ensure that the source is paged-in, no actual access -> no key checking */
@@ -376,8 +380,11 @@ static int handle_mvpg_pei(struct kvm_vcpu *vcpu)
reg1, &dstaddr, GACC_STORE, 0);
if (rc)
return kvm_s390_inject_prog_cond(vcpu, rc);
rc = kvm_s390_handle_dat_fault(vcpu, dstaddr, FOLL_WRITE);
if (rc != 0)
do {
rc = kvm_s390_faultin_gfn_simple(vcpu, NULL, gpa_to_gfn(dstaddr), true);
} while (rc == -EAGAIN);
if (rc)
return rc;
kvm_s390_retry_instr(vcpu);


@@ -26,7 +26,6 @@
#include <linux/uaccess.h>
#include <asm/sclp.h>
#include <asm/isc.h>
#include <asm/gmap.h>
#include <asm/nmi.h>
#include <asm/airq.h>
#include <asm/tpi.h>
@@ -34,6 +33,7 @@
#include "gaccess.h"
#include "trace-s390.h"
#include "pci.h"
#include "gmap.h"
#define PFAULT_INIT 0x0600
#define PFAULT_DONE 0x0680
@@ -2632,12 +2632,12 @@ static int flic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
case KVM_DEV_FLIC_APF_ENABLE:
if (kvm_is_ucontrol(dev->kvm))
return -EINVAL;
dev->kvm->arch.gmap->pfault_enabled = 1;
set_bit(GMAP_FLAG_PFAULT_ENABLED, &dev->kvm->arch.gmap->flags);
break;
case KVM_DEV_FLIC_APF_DISABLE_WAIT:
if (kvm_is_ucontrol(dev->kvm))
return -EINVAL;
dev->kvm->arch.gmap->pfault_enabled = 0;
clear_bit(GMAP_FLAG_PFAULT_ENABLED, &dev->kvm->arch.gmap->flags);
/*
* Make sure no async faults are in transition when
* clearing the queues. So we don't need to worry
@@ -2768,13 +2768,13 @@ static int adapter_indicators_set(struct kvm *kvm,
bit = get_ind_bit(adapter_int->ind_addr,
adapter_int->ind_offset, adapter->swap);
set_bit(bit, map);
mark_page_dirty(kvm, adapter_int->ind_addr >> PAGE_SHIFT);
mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
set_page_dirty_lock(ind_page);
map = page_address(summary_page);
bit = get_ind_bit(adapter_int->summary_addr,
adapter_int->summary_offset, adapter->swap);
summary_set = test_and_set_bit(bit, map);
mark_page_dirty(kvm, adapter_int->summary_addr >> PAGE_SHIFT);
mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
set_page_dirty_lock(summary_page);
srcu_read_unlock(&kvm->srcu, idx);
@@ -2870,7 +2870,9 @@ int kvm_set_routing_entry(struct kvm *kvm,
if (kvm_is_error_hva(uaddr_s) || kvm_is_error_hva(uaddr_i))
return -EFAULT;
e->adapter.summary_addr = uaddr_s;
e->adapter.summary_gaddr = ue->u.adapter.summary_addr;
e->adapter.ind_addr = uaddr_i;
e->adapter.ind_gaddr = ue->u.adapter.ind_addr;
e->adapter.summary_offset = ue->u.adapter.summary_offset;
e->adapter.ind_offset = ue->u.adapter.ind_offset;
e->adapter.adapter_id = ue->u.adapter.adapter_id;

[File diff suppressed because it is too large]

[Some files were not shown because too many files have changed in this diff]