mirror of
https://github.com/torvalds/linux.git
synced 2026-03-07 23:04:33 +01:00
The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several
errata where broadcast TLBI;DSB sequences don't provide all the
architecturally required synchronization. The workaround performs more
work than necessary, and can have significant overhead. This patch
optimizes the workaround, as explained below.

The workaround was originally added for Qualcomm Falkor erratum 1009 in
commit:

  d9ff80f83e ("arm64: Work around Falkor erratum 1009")

As noted in the message for that commit, the workaround is applied even
in cases where it is not strictly necessary.

The workaround was later reused without changes for:

* Arm Cortex-A76 erratum #1286807
  SDEN v33: https://developer.arm.com/documentation/SDEN-885749/33-0/

* Arm Cortex-A55 erratum #2441007
  SDEN v16: https://developer.arm.com/documentation/SDEN-859338/1600/

* Arm Cortex-A510 erratum #2441009
  SDEN v19: https://developer.arm.com/documentation/SDEN-1873351/1900/

The important details to note are as follows:

1. All relevant errata only affect the ordering and/or completion of
   memory accesses which have been translated by an invalidated TLB
   entry. The actual invalidation of TLB entries is unaffected.

2. The existing workaround is applied to both broadcast and local TLB
   invalidation, whereas for all relevant errata it is only necessary to
   apply a workaround for broadcast invalidation.

3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI
   sequence, whereas for all relevant errata it is only necessary to
   execute a single additional TLBI;DSB sequence after any number of
   TLBIs are completed by a DSB.

   For example, for a sequence of batched TLBIs:

     TLBI <op1>[, <arg1>]
     TLBI <op2>[, <arg2>]
     TLBI <op3>[, <arg3>]
     DSB ISH

   ... the existing workaround will expand this to:

     TLBI <op1>[, <arg1>]
     DSB ISH                  // additional
     TLBI <op1>[, <arg1>]     // additional
     TLBI <op2>[, <arg2>]
     DSB ISH                  // additional
     TLBI <op2>[, <arg2>]     // additional
     TLBI <op3>[, <arg3>]
     DSB ISH                  // additional
     TLBI <op3>[, <arg3>]     // additional
     DSB ISH

   ... whereas it is sufficient to have:

     TLBI <op1>[, <arg1>]
     TLBI <op2>[, <arg2>]
     TLBI <op3>[, <arg3>]
     DSB ISH
     TLBI <opX>[, <argX>]     // additional
     DSB ISH                  // additional

   Using a single additional TLBI and DSB at the end of the sequence can
   have significantly lower overhead as each DSB which completes a TLBI
   must synchronize with other PEs in the system, with potential
   performance effects both locally and system-wide.

4. The existing workaround repeats each specific TLBI operation, whereas
   for all relevant errata it is sufficient for the additional TLBI to
   use *any* operation which will be broadcast, regardless of which
   translation regime or stage of translation the operation applies to.

   For example, for a single TLBI:

     TLBI ALLE2IS
     DSB ISH

   ... the existing workaround will expand this to:

     TLBI ALLE2IS
     DSB ISH
     TLBI ALLE2IS             // additional
     DSB ISH                  // additional

   ... whereas it is sufficient to have:

     TLBI ALLE2IS
     DSB ISH
     TLBI VALE1IS, XZR        // additional
     DSB ISH                  // additional

   As the additional TLBI doesn't have to match a specific earlier TLBI,
   the additional TLBI can be implemented in separate code, with no
   memory of the earlier TLBIs. The additional TLBI can also use a
   cheaper TLBI operation.

5. The existing workaround is applied to both Stage-1 and Stage-2 TLB
   invalidation, whereas for all relevant errata it is only necessary to
   apply a workaround for Stage-1 invalidation.

   Architecturally, TLBI operations which invalidate only Stage-2
   information (e.g. IPAS2E1IS) are not required to invalidate TLB
   entries which combine information from Stage-1 and Stage-2
   translation table entries, and consequently may not complete memory
   accesses translated by those combined entries. In these cases,
   completion of memory accesses is only guaranteed after subsequent
   invalidation of Stage-1 information (e.g. VMALLE1IS).
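As a rough illustration of point 3, the instruction counts for the two expansions can be tallied with a small userspace model. This is only a sketch: `struct seq_cost`, `cost_repeat_each()`, `cost_repeat_once()` and `dsb_saved()` are hypothetical names invented for this example and are not part of the kernel; the counts are derived from the example sequences in the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of the instructions in one invalidation sequence. */
struct seq_cost {
	size_t tlbi;	/* number of TLBI instructions */
	size_t dsb;	/* number of DSB ISH instructions */
};

/*
 * Existing workaround: every TLBI becomes TLBI; DSB; TLBI, and the
 * sequence is still closed by its original DSB.
 */
static struct seq_cost cost_repeat_each(size_t nr_tlbi)
{
	struct seq_cost c = { .tlbi = 2 * nr_tlbi, .dsb = nr_tlbi + 1 };

	return c;
}

/*
 * Reworked workaround: the batch and its closing DSB are untouched, and
 * a single additional TLBI; DSB pair follows.
 */
static struct seq_cost cost_repeat_once(size_t nr_tlbi)
{
	struct seq_cost c = { .tlbi = nr_tlbi + 1, .dsb = 2 };

	return c;
}

/* Number of DSBs avoided for a batch of nr_tlbi TLBIs (nr_tlbi >= 1). */
static size_t dsb_saved(size_t nr_tlbi)
{
	/* An empty batch needs no workaround, and would underflow below. */
	assert(nr_tlbi > 0);

	return cost_repeat_each(nr_tlbi).dsb - cost_repeat_once(nr_tlbi).dsb;
}
```

For the three-TLBI batch used as the example, this model gives 6 TLBIs and 4 DSBs for the existing workaround against 4 TLBIs and 2 DSBs for the reworked one. The model says nothing about the actual latency of each DSB, which depends on the system.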
Taking the above points into account, this patch reworks the workaround
logic to reduce overhead:

* New __tlbi_sync_s1ish() and __tlbi_sync_s1ish_hyp() functions are
  added and used in place of any dsb(ish) which is used to complete
  broadcast Stage-1 TLB maintenance. When the
  ARM64_WORKAROUND_REPEAT_TLBI workaround is enabled, these helpers will
  execute an additional TLBI;DSB sequence.

  For consistency, it might make sense to add __tlbi_sync_*() helpers
  for local and stage 2 maintenance. For now I've left those with
  open-coded dsb() to keep the diff small.

* The duplication of TLBIs in __TLBI_0() and __TLBI_1() is removed. This
  is no longer needed as the necessary synchronization will happen in
  __tlbi_sync_s1ish() or __tlbi_sync_s1ish_hyp().

* The additional TLBI operation is chosen to have minimal impact:

  - __tlbi_sync_s1ish() uses "TLBI VALE1IS, XZR". This is only used at
    EL1 or at EL2 with {E2H,TGE}=={1,1}, where it will target an unused
    entry for the reserved ASID in the kernel's own translation regime,
    and have no adverse effect.

  - __tlbi_sync_s1ish_hyp() uses "TLBI VALE2IS, XZR". This is only used
    in hyp code, where it will target an unused entry in the hyp code's
    TTBR0 mapping, and should have no adverse effect.

* As __TLBI_0() and __TLBI_1() no longer replace each TLBI with a
  TLBI;DSB;TLBI sequence, batching TLBIs is worthwhile, and there's no
  need for arch_tlbbatch_should_defer() to consider
  ARM64_WORKAROUND_REPEAT_TLBI.
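The shape of the new hyp helper, as described in the list above, can be sketched as a userspace model that records the instructions it would issue. This is not the kernel implementation: `struct model_log`, `model_emit()`, `model_tlbi_sync_s1ish_hyp()` and `model_fixmap_clear()` are hypothetical names, the `repeat_tlbi` flag stands in for whatever capability check the kernel uses for ARM64_WORKAROUND_REPEAT_TLBI, and the strings are placeholders for real instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_LOG_MAX 16

/* Hypothetical record of the instructions a sequence would issue. */
struct model_log {
	const char *ops[MODEL_LOG_MAX];
	size_t nr;
};

static void model_emit(struct model_log *log, const char *op)
{
	/* Silently drop entries beyond the fixed capacity. */
	if (log->nr < MODEL_LOG_MAX)
		log->ops[log->nr++] = op;
}

/*
 * Model of a Stage-1 broadcast sync at hyp: complete earlier TLBIs with
 * a DSB, then, only when the workaround is enabled, issue one cheap
 * broadcast TLBI and a second DSB.
 */
static void model_tlbi_sync_s1ish_hyp(struct model_log *log, bool repeat_tlbi)
{
	model_emit(log, "DSB ISH");
	if (!repeat_tlbi)
		return;

	model_emit(log, "TLBI VALE2IS, XZR");	/* additional */
	model_emit(log, "DSB ISH");		/* additional */
}

/* Model of a caller that invalidates one hyp VA and then syncs. */
static void model_fixmap_clear(struct model_log *log, bool repeat_tlbi)
{
	model_emit(log, "DSB ISHST");
	model_emit(log, "TLBI VALE2IS, <addr>");
	model_tlbi_sync_s1ish_hyp(log, repeat_tlbi);
	model_emit(log, "ISB");
}
```

Because the additional TLBI does not depend on which TLBIs came before it, the caller in this model does not pass any address or operation to the sync helper; that is the property point 4 of the commit message relies on.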
When building defconfig with GCC 15.1.0, compared to v6.19-rc1, this
patch saves ~1KiB of text, makes the vmlinux ~42KiB smaller, and makes
the resulting Image 64KiB smaller:

| [mark@lakrids:~/src/linux]% size vmlinux-*
|     text     data     bss      dec     hex filename
| 21179831 19660919  708216 41548966 279fca6 vmlinux-after
| 21181075 19660903  708216 41550194 27a0172 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -l vmlinux-*
| -rwxr-xr-x 1 mark mark 157771472 Feb  4 12:05 vmlinux-after
| -rwxr-xr-x 1 mark mark 157815432 Feb  4 12:05 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -l Image-*
| -rw-r--r-- 1 mark mark 41007616 Feb  4 12:05 Image-after
| -rw-r--r-- 1 mark mark 41073152 Feb  4 12:05 Image-before

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
504 lines
12 KiB
C
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (C) 2020 Google LLC
 * Author: Quentin Perret <qperret@google.com>
 */

#include <linux/kvm_host.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_pgtable.h>
#include <asm/kvm_pkvm.h>
#include <asm/spectre.h>

#include <nvhe/early_alloc.h>
#include <nvhe/gfp.h>
#include <nvhe/memory.h>
#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
#include <nvhe/spinlock.h>

struct kvm_pgtable pkvm_pgtable;
hyp_spinlock_t pkvm_pgd_lock;

struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
unsigned int hyp_memblock_nr;

static u64 __io_map_base;

struct hyp_fixmap_slot {
	u64 addr;
	kvm_pte_t *ptep;
};
static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);

static int __pkvm_create_mappings(unsigned long start, unsigned long size,
				  unsigned long phys, enum kvm_pgtable_prot prot)
{
	int err;

	hyp_spin_lock(&pkvm_pgd_lock);
	err = kvm_pgtable_hyp_map(&pkvm_pgtable, start, size, phys, prot);
	hyp_spin_unlock(&pkvm_pgd_lock);

	return err;
}

static int __pkvm_alloc_private_va_range(unsigned long start, size_t size)
{
	unsigned long cur;

	hyp_assert_lock_held(&pkvm_pgd_lock);

	if (!start || start < __io_map_base)
		return -EINVAL;

	/* The allocated size is always a multiple of PAGE_SIZE */
	cur = start + PAGE_ALIGN(size);

	/* Are we overflowing on the vmemmap ? */
	if (cur > __hyp_vmemmap)
		return -ENOMEM;

	__io_map_base = cur;

	return 0;
}

/**
 * pkvm_alloc_private_va_range - Allocates a private VA range.
 * @size:	The size of the VA range to reserve.
 * @haddr:	The hypervisor virtual start address of the allocation.
 *
 * The private virtual address (VA) range is allocated above __io_map_base
 * and aligned based on the order of @size.
 *
 * Return: 0 on success or negative error code on failure.
 */
int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr)
{
	unsigned long addr;
	int ret;

	hyp_spin_lock(&pkvm_pgd_lock);
	addr = __io_map_base;
	ret = __pkvm_alloc_private_va_range(addr, size);
	hyp_spin_unlock(&pkvm_pgd_lock);

	*haddr = addr;

	return ret;
}

int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
				  enum kvm_pgtable_prot prot,
				  unsigned long *haddr)
{
	unsigned long addr;
	int err;

	size = PAGE_ALIGN(size + offset_in_page(phys));
	err = pkvm_alloc_private_va_range(size, &addr);
	if (err)
		return err;

	err = __pkvm_create_mappings(addr, size, phys, prot);
	if (err)
		return err;

	*haddr = addr + offset_in_page(phys);
	return err;
}

int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
{
	unsigned long start = (unsigned long)from;
	unsigned long end = (unsigned long)to;
	unsigned long virt_addr;
	phys_addr_t phys;

	hyp_assert_lock_held(&pkvm_pgd_lock);

	start = start & PAGE_MASK;
	end = PAGE_ALIGN(end);

	for (virt_addr = start; virt_addr < end; virt_addr += PAGE_SIZE) {
		int err;

		phys = hyp_virt_to_phys((void *)virt_addr);
		err = kvm_pgtable_hyp_map(&pkvm_pgtable, virt_addr, PAGE_SIZE,
					  phys, prot);
		if (err)
			return err;
	}

	return 0;
}

int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
{
	int ret;

	hyp_spin_lock(&pkvm_pgd_lock);
	ret = pkvm_create_mappings_locked(from, to, prot);
	hyp_spin_unlock(&pkvm_pgd_lock);

	return ret;
}

int hyp_back_vmemmap(phys_addr_t back)
{
	unsigned long i, start, size, end = 0;
	int ret;

	for (i = 0; i < hyp_memblock_nr; i++) {
		start = hyp_memory[i].base;
		start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
		/*
		 * The beginning of the hyp_vmemmap region for the current
		 * memblock may already be backed by the page backing the end
		 * of the previous region, so avoid mapping it twice.
		 */
		start = max(start, end);

		end = hyp_memory[i].base + hyp_memory[i].size;
		end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
		if (start >= end)
			continue;

		size = end - start;
		ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
		if (ret)
			return ret;

		memset(hyp_phys_to_virt(back), 0, size);
		back += size;
	}

	return 0;
}

static void *__hyp_bp_vect_base;
int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot)
{
	void *vector;

	switch (slot) {
	case HYP_VECTOR_DIRECT: {
		vector = __kvm_hyp_vector;
		break;
	}
	case HYP_VECTOR_SPECTRE_DIRECT: {
		vector = __bp_harden_hyp_vecs;
		break;
	}
	case HYP_VECTOR_INDIRECT:
	case HYP_VECTOR_SPECTRE_INDIRECT: {
		vector = (void *)__hyp_bp_vect_base;
		break;
	}
	default:
		return -EINVAL;
	}

	vector = __kvm_vector_slot2addr(vector, slot);
	*this_cpu_ptr(&kvm_hyp_vector) = (unsigned long)vector;

	return 0;
}

int hyp_map_vectors(void)
{
	phys_addr_t phys;
	unsigned long bp_base;
	int ret;

	if (!kvm_system_needs_idmapped_vectors()) {
		__hyp_bp_vect_base = __bp_harden_hyp_vecs;
		return 0;
	}

	phys = __hyp_pa(__bp_harden_hyp_vecs);
	ret = __pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
					    PAGE_HYP_EXEC, &bp_base);
	if (ret)
		return ret;

	__hyp_bp_vect_base = (void *)bp_base;

	return 0;
}

static void *fixmap_map_slot(struct hyp_fixmap_slot *slot, phys_addr_t phys)
{
	kvm_pte_t pte, *ptep = slot->ptep;

	pte = *ptep;
	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
	WRITE_ONCE(*ptep, pte);
	dsb(ishst);

	return (void *)slot->addr;
}

void *hyp_fixmap_map(phys_addr_t phys)
{
	return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys);
}

static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
{
	kvm_pte_t *ptep = slot->ptep;
	u64 addr = slot->addr;
	u32 level;

	if (FIELD_GET(KVM_PTE_TYPE, *ptep) == KVM_PTE_TYPE_PAGE)
		level = KVM_PGTABLE_LAST_LEVEL;
	else
		level = KVM_PGTABLE_LAST_LEVEL - 1; /* create_fixblock() guarantees PMD level */

	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);

	/*
	 * Irritatingly, the architecture requires that we use inner-shareable
	 * broadcast TLB invalidation here in case another CPU speculates
	 * through our fixmap and decides to create an "amalgamation of the
	 * values held in the TLB" due to the apparent lack of a
	 * break-before-make sequence.
	 *
	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
	 */
	dsb(ishst);
	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
	__tlbi_sync_s1ish_hyp();
	isb();
}

void hyp_fixmap_unmap(void)
{
	fixmap_clear_slot(this_cpu_ptr(&fixmap_slots));
}

static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
				   enum kvm_pgtable_walk_flags visit)
{
	struct hyp_fixmap_slot *slot = (struct hyp_fixmap_slot *)ctx->arg;

	if (!kvm_pte_valid(ctx->old) || (ctx->end - ctx->start) != kvm_granule_size(ctx->level))
		return -EINVAL;

	slot->addr = ctx->addr;
	slot->ptep = ctx->ptep;

	/*
	 * Clear the PTE, but keep the page-table page refcount elevated to
	 * prevent it from ever being freed. This lets us manipulate the PTEs
	 * by hand safely without ever needing to allocate memory.
	 */
	fixmap_clear_slot(slot);

	return 0;
}

static int create_fixmap_slot(u64 addr, u64 cpu)
{
	struct kvm_pgtable_walker walker = {
		.cb	= __create_fixmap_slot_cb,
		.flags	= KVM_PGTABLE_WALK_LEAF,
		.arg	= per_cpu_ptr(&fixmap_slots, cpu),
	};

	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
}

#if PAGE_SHIFT < 16
#define HAS_FIXBLOCK
static struct hyp_fixmap_slot hyp_fixblock_slot;
static DEFINE_HYP_SPINLOCK(hyp_fixblock_lock);
#endif

static int create_fixblock(void)
{
#ifdef HAS_FIXBLOCK
	struct kvm_pgtable_walker walker = {
		.cb	= __create_fixmap_slot_cb,
		.flags	= KVM_PGTABLE_WALK_LEAF,
		.arg	= &hyp_fixblock_slot,
	};
	unsigned long addr;
	phys_addr_t phys;
	int ret, i;

	/* Find a RAM phys address, PMD aligned */
	for (i = 0; i < hyp_memblock_nr; i++) {
		phys = ALIGN(hyp_memory[i].base, PMD_SIZE);
		if (phys + PMD_SIZE < (hyp_memory[i].base + hyp_memory[i].size))
			break;
	}

	if (i >= hyp_memblock_nr)
		return -EINVAL;

	hyp_spin_lock(&pkvm_pgd_lock);
	addr = ALIGN(__io_map_base, PMD_SIZE);
	ret = __pkvm_alloc_private_va_range(addr, PMD_SIZE);
	if (ret)
		goto unlock;

	ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PMD_SIZE, phys, PAGE_HYP);
	if (ret)
		goto unlock;

	ret = kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);

unlock:
	hyp_spin_unlock(&pkvm_pgd_lock);

	return ret;
#else
	return 0;
#endif
}

void *hyp_fixblock_map(phys_addr_t phys, size_t *size)
{
#ifdef HAS_FIXBLOCK
	*size = PMD_SIZE;
	hyp_spin_lock(&hyp_fixblock_lock);
	return fixmap_map_slot(&hyp_fixblock_slot, phys);
#else
	*size = PAGE_SIZE;
	return hyp_fixmap_map(phys);
#endif
}

void hyp_fixblock_unmap(void)
{
#ifdef HAS_FIXBLOCK
	fixmap_clear_slot(&hyp_fixblock_slot);
	hyp_spin_unlock(&hyp_fixblock_lock);
#else
	hyp_fixmap_unmap();
#endif
}

int hyp_create_fixmap(void)
{
	unsigned long addr, i;
	int ret;

	for (i = 0; i < hyp_nr_cpus; i++) {
		ret = pkvm_alloc_private_va_range(PAGE_SIZE, &addr);
		if (ret)
			return ret;

		ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE,
					  __hyp_pa(__hyp_bss_start), PAGE_HYP);
		if (ret)
			return ret;

		ret = create_fixmap_slot(addr, i);
		if (ret)
			return ret;
	}

	return create_fixblock();
}

int hyp_create_idmap(u32 hyp_va_bits)
{
	unsigned long start, end;

	start = hyp_virt_to_phys((void *)__hyp_idmap_text_start);
	start = ALIGN_DOWN(start, PAGE_SIZE);

	end = hyp_virt_to_phys((void *)__hyp_idmap_text_end);
	end = ALIGN(end, PAGE_SIZE);

	/*
	 * One half of the VA space is reserved to linearly map portions of
	 * memory -- see va_layout.c for more details. The other half of the VA
	 * space contains the trampoline page, and needs some care. Split that
	 * second half in two and find the quarter of VA space not conflicting
	 * with the idmap to place the IOs and the vmemmap. IOs use the lower
	 * half of the quarter and the vmemmap the upper half.
	 */
	__io_map_base = start & BIT(hyp_va_bits - 2);
	__io_map_base ^= BIT(hyp_va_bits - 2);
	__hyp_vmemmap = __io_map_base | BIT(hyp_va_bits - 3);

	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
}

int pkvm_create_stack(phys_addr_t phys, unsigned long *haddr)
{
	unsigned long addr, prev_base;
	size_t size;
	int ret;

	hyp_spin_lock(&pkvm_pgd_lock);

	prev_base = __io_map_base;
	/*
	 * Efficient stack verification using the NVHE_STACK_SHIFT bit implies
	 * an alignment of our allocation on the order of the size.
	 */
	size = NVHE_STACK_SIZE * 2;
	addr = ALIGN(__io_map_base, size);

	ret = __pkvm_alloc_private_va_range(addr, size);
	if (!ret) {
		/*
		 * Since the stack grows downwards, map the stack to the page
		 * at the higher address and leave the lower guard page
		 * unbacked.
		 *
		 * Any valid stack address now has the NVHE_STACK_SHIFT bit as 1
		 * and addresses corresponding to the guard page have the
		 * NVHE_STACK_SHIFT bit as 0 - this is used for overflow detection.
		 */
		ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr + NVHE_STACK_SIZE,
					  NVHE_STACK_SIZE, phys, PAGE_HYP);
		if (ret)
			__io_map_base = prev_base;
	}
	hyp_spin_unlock(&pkvm_pgd_lock);

	*haddr = addr + size;

	return ret;
}

static void *admit_host_page(void *arg)
{
	struct kvm_hyp_memcache *host_mc = arg;

	if (!host_mc->nr_pages)
		return NULL;

	/*
	 * The host still owns the pages in its memcache, so we need to go
	 * through a full host-to-hyp donation cycle to change it. Fortunately,
	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
	 * succeeds we're good to go.
	 */
	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
		return NULL;

	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
}

/* Refill our local memcache by popping pages from the one provided by the host. */
int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
		    struct kvm_hyp_memcache *host_mc)
{
	struct kvm_hyp_memcache tmp = *host_mc;
	int ret;

	ret = __topup_hyp_memcache(mc, min_pages, admit_host_page,
				   hyp_virt_to_phys, &tmp);
	*host_mc = tmp;

	return ret;
}