linux/virt/kvm
Shivank Garg ed1ffa810b KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
Previously, guest_memfd allocations followed the local NUMA node ID in the
absence of a process mempolicy, resulting in arbitrary memory placement.
Moreover, mbind() couldn't be used by the VMM, as guest memory wasn't
mapped into userspace when allocation occurred.

Enable NUMA policy support by implementing vm_ops for guest-memfd mmap
operation.  This allows the VMM to use mmap()+mbind() to set the desired
NUMA policy for a range of memory, and provides fine-grained control over
guest memory allocation across NUMA nodes.
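
A minimal userspace sketch of the mmap()+mbind() flow described above.  It
is an illustration under stated assumptions, not the VMM code this change
was tested with: the helper names (single_node_mask, bind_range_to_node,
map_and_bind) are hypothetical, the fd is assumed to be a guest_memfd that
was created elsewhere with mmap support, and mbind() is invoked through
syscall(2) so that libnuma is not required.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Value of MPOL_BIND in the kernel UAPI header <linux/mempolicy.h>.
 * Defined locally (hypothetical name) to avoid depending on <numaif.h>.
 */
#define SKETCH_MPOL_BIND 2

/* Build a one-word nodemask with only `node` set. */
static unsigned long single_node_mask(unsigned int node)
{
    assert(node < 8 * sizeof(unsigned long));
    return 1UL << node;
}

/* Bind [addr, addr + len) to a single NUMA node. Returns 0 or -errno. */
static int bind_range_to_node(void *addr, size_t len, unsigned int node)
{
    unsigned long mask = single_node_mask(node);
    /* Same convention as libnuma: pass the number of mask bits plus one. */
    unsigned long maxnode = 8 * sizeof(mask) + 1;

    /* flags == 0: no MPOL_MF_MOVE*, so only future allocations are affected. */
    if (syscall(SYS_mbind, addr, len, SKETCH_MPOL_BIND, &mask, maxnode, 0UL))
        return -errno;
    return 0;
}

/*
 * Map `len` bytes of `fd` (assumed to be a guest_memfd) and set the NUMA
 * policy for the whole range. No memory is touched, so nothing is faulted
 * in here. On success stores the mapping in *out and returns 0; otherwise
 * returns -errno and leaves *out untouched.
 */
static int map_and_bind(int fd, size_t len, unsigned int node, void **out)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int ret;

    if (addr == MAP_FAILED)
        return -errno;

    ret = bind_range_to_node(addr, len, node);
    if (ret) {
        munmap(addr, len);
        return ret;
    }

    *out = addr;
    return 0;
}
```

A caller would pass the guest_memfd file descriptor and the size of the
range to place, e.g. map_and_bind(gmem_fd, size, 1, &addr) to request node 1.
mbind() can fail with EPERM or EINVAL depending on privileges and on which
nodes exist, so the return value should always be checked.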

Note, using mmap()+mbind() works even for PRIVATE memory, as mbind()
doesn't require the memory to be faulted in.  However, get_mempolicy()
and other paths that require the userspace page tables to be populated
may return incorrect information for PRIVATE memory (though under the hood,
KVM+guest_memfd will still behave correctly).

Store the policy in the inode structure, gmem_inode, as a shared memory
policy, so that the policy is a property of the physical memory itself,
i.e. not bound to the VMA.  In guest_memfd, KVM is the primary MMU and any
VMAs are secondary, i.e. using mbind() on a VMA to set policy is a means
to an end, e.g. to avoid having to add a file-based equivalent to mbind().

Similarly, retrieve the policy via mpol_shared_policy_lookup(), not
get_vma_policy(), even when allocating to fault in memory for userspace
mappings, so that the policy stored in gmem_inode is always the source of
truth.
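
To make the idea of an inode-owned policy concrete, here is a toy
userspace model of a policy store that is indexed by page offset into the
file rather than by VMA.  It is not the kernel's shared-policy
implementation; every name prefixed with toy_/TOY_ is hypothetical, and a
flat array with "newest range wins" stands in for whatever structure the
kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RANGES 16
#define TOY_NO_POLICY  (-1)

/* Policy for page indices in [start, end) of the file. */
struct toy_range {
    unsigned long start;
    unsigned long end;
    int node;
};

/* One instance per file (inode), shared by every mapping of that file. */
struct toy_shared_policy {
    struct toy_range ranges[TOY_MAX_RANGES];
    size_t nr;
};

/* Record a policy for [start, end). Returns 0, or -1 if the table is full. */
static int toy_set_policy(struct toy_shared_policy *sp,
                          unsigned long start, unsigned long end, int node)
{
    assert(start < end);
    if (sp->nr == TOY_MAX_RANGES)
        return -1;
    sp->ranges[sp->nr++] = (struct toy_range){ start, end, node };
    return 0;
}

/*
 * Look up the policy for one page index. Scans newest to oldest so that a
 * later toy_set_policy() call overrides an earlier, overlapping one.
 * Returns TOY_NO_POLICY when no range covers the index.
 */
static int toy_policy_lookup(const struct toy_shared_policy *sp,
                             unsigned long index)
{
    for (size_t i = sp->nr; i-- > 0; ) {
        const struct toy_range *r = &sp->ranges[i];

        if (index >= r->start && index < r->end)
            return r->node;
    }
    return TOY_NO_POLICY;
}
```

Because the lookup key is the page index within the file, the answer is the
same whether the allocation is triggered by a guest fault or by a userspace
mapping, and it does not depend on any VMA still existing.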

Apply policy changes only to future allocations, i.e. do not migrate
existing memory in the guest_memfd instance.  This matches mbind(2)'s
default behavior, which affects only new allocations unless overridden
with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags (which are not supported by
guest_memfd as guest_memfd memory is unmovable).

Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/all/e9d43abc-bcdb-4f9f-9ad7-5644f714de19@amd.com
[sean: fold in fixup (see Link above), massage changelog]
Link: https://lore.kernel.org/r/20251016172853.52451-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20 06:30:41 -07:00
async_pf.c KVM: remove redundant __GFP_NOWARN 2025-08-19 11:51:13 -07:00
async_pf.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 2019-06-19 17:09:56 +02:00
binary_stats.c KVM: stats: remove dead stores 2021-08-13 03:35:15 -04:00
coalesced_mmio.c KVM: Clean up coalesced MMIO ring full check 2024-08-29 19:38:33 -07:00
coalesced_mmio.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dirty_ring.c KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings 2025-06-20 13:41:04 -07:00
eventfd.c KVM: Export KVM-internal symbols for sub-modules only 2025-09-30 13:40:02 -04:00
guest_memfd.c KVM: guest_memfd: Enforce NUMA mempolicy using shared policy 2025-10-20 06:30:41 -07:00
irqchip.c KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update() 2025-06-20 13:52:41 -07:00
Kconfig KVM x86 fixes for 6.18: 2025-10-18 10:25:43 +02:00
kvm_main.c KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes 2025-10-20 06:30:40 -07:00
kvm_mm.h KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes 2025-10-20 06:30:40 -07:00
Makefile.kvm KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GUEST_MEMFD 2025-08-27 04:34:59 -04:00
pfncache.c KVM: pfncache: Precisely track refcounted pages 2024-10-25 12:57:59 -04:00
vfio.c VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment() 2025-06-25 09:51:33 -07:00
vfio.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00