linux/mm
Thomas Hellström b570f37a2c
mm: Fix a hmm_range_fault() livelock / starvation problem
If hmm_range_fault() fails a folio_trylock() in do_swap_page()
while trying to acquire the lock of a device-private folio for
migration to RAM, the function will spin until it succeeds in
grabbing the lock.

However, if the process holding the lock depends on the completion
of a work item that is scheduled on the same CPU as the spinning
hmm_range_fault(), that work item may be starved, and we end up
in a livelock / starvation situation that is never resolved.

This can happen, for example, if the process holding the
device-private folio lock is stuck in
   migrate_device_unmap()->lru_add_drain_all()
since lru_add_drain_all() requires a short work item to run on
every online CPU before it can complete.

The prerequisites for this to happen are:
a) Both zone device and system memory folios are considered in
   migrate_device_unmap(), so that there is a reason to call
   lru_add_drain_all() for a system memory folio while a
   folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
   at least one migration PTE entry insertion to be deferred to
   try_to_migrate(), which can happen after the call to
   lru_add_drain_all().
c) No preemption, or voluntary preemption only.

This all seems pretty unlikely to happen, but it is indeed hit by
the "xe_exec_system_allocator" igt test.

Resolve this by waiting for the folio to be unlocked if the
folio_trylock() fails in do_swap_page().

Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update its documentation to
indicate the new use case.

Future code improvements might consider moving the
lru_add_drain_all() call in migrate_device_unmap() so that it is
called *after* migration entries have been inserted for all pages.
That would also eliminate b) above.

v2:
- Instead of a cond_resched() in hmm_range_fault(),
  eliminate the problem by waiting for the folio to be unlocked
  in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
  !CONFIG_MIGRATION case. (Kernel Test Robot)
v4:
- Rename migration_entry_wait_on_locked() to
  softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
v5:
- Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
  version of softleaf_entry_wait_on_locked().
- Modify wording around function names in the commit message
  (Andrew Morton)

Suggested-by: Alistair Popple <apopple@nvidia.com>
Fixes: 1afaeb8293 ("mm/migrate: Trylock device page in do_swap_page")
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://patch.msgid.link/20260210115653.92413-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit a69d1ab971a624c6f112cea61536569d579c3215)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-03-02 11:51:51 -05:00
damon mm/damon/core: disallow non-power of two min_region_sz 2026-02-24 11:13:27 -08:00
kasan Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kfence mm/kfence: fix KASAN hardware tag faults during late enablement 2026-02-24 11:13:27 -08:00
kmsan Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
tests sparc/mm: export symbols for lazy_mmu_mode KUnit tests 2026-01-31 14:22:40 -08:00
backing-dev.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
balloon.c mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATION 2026-01-31 14:22:36 -08:00
bootmem_info.c mm/sparse: allow for alternate vmemmap section init at boot 2025-03-16 22:06:27 -07:00
bpf_memcontrol.c bpf: Revert "bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()" 2026-01-21 09:38:16 -08:00
cma.c mm/cma: replace snprintf with strscpy in cma_new_area 2026-02-06 15:47:15 -08:00
cma.h mm: cma: set early_pfn and bitmap as a union in cma_memrange 2025-05-22 14:55:36 -07:00
cma_debug.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
cma_sysfs.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
compaction.c mm/compaction: fix low_pfn advance on isolating hugetlb 2025-09-28 11:51:29 -07:00
debug.c mm: constify __dump_folio() arguments 2025-11-20 13:43:57 -08:00
debug_page_alloc.c mm/debug_page_alloc: improve error message for invalid guardpage minorder 2025-05-12 23:50:38 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page() 2026-01-26 20:02:27 -08:00
dmapool.c docs: dma-api: replace consistent with coherent 2025-07-01 13:25:36 -06:00
dmapool_test.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
early_ioremap.c mm/early_ioremap: clean up the use of WARN() for debugging 2026-01-26 20:02:26 -08:00
execmem.c mm: remove PMD alignment constraint in execmem_vmalloc() 2025-09-28 11:51:31 -07:00
fadvise.c mm: rename filemap_fdatawrite_range_kick to filemap_flush_range 2025-10-29 15:50:42 +01:00
fail_page_alloc.c
failslab.c
filemap.c mm: Fix a hmm_range_fault() livelock / starvation problem 2026-03-02 11:51:51 -05:00
folio-compat.c mm: add SPDX id lines to some mm source files 2026-02-06 15:47:16 -08:00
gup.c mm/gup: remove no longer used gup_fast_undo_dev_pagemap 2026-01-20 19:24:49 -08:00
gup_test.c mm: add SPDX id lines to some mm source files 2026-02-06 15:47:16 -08:00
gup_test.h
highmem.c mm/highmem: fix __kmap_to_page() build error 2026-01-31 14:22:38 -08:00
hmm.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
huge_memory.c mm: thp: deny THP for files on anonymous inodes 2026-02-24 11:13:26 -08:00
hugetlb.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
hugetlb_cgroup.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
hugetlb_cma.c mm: hugetlb_cma: mark hugetlb_cma{_only} as __ro_after_init 2026-01-31 14:22:43 -08:00
hugetlb_cma.h mm: hugetlb: allocate frozen pages for gigantic allocation 2026-01-26 20:02:28 -08:00
hugetlb_internal.h mm/hugetlb: extract sysctl into hugetlb_sysctl.c 2025-11-20 13:43:57 -08:00
hugetlb_sysctl.c mm, hugetlb: implement movable_gigantic_pages sysctl 2026-01-20 19:24:50 -08:00
hugetlb_sysfs.c mm/hugetlb: extract sysfs into hugetlb_sysfs.c 2025-11-20 13:43:57 -08:00
hugetlb_vmemmap.c Revert "mm/hugetlb: deal with multiple calls to hugetlb_bootmem_alloc" 2026-01-26 20:02:20 -08:00
hugetlb_vmemmap.h mm/hugetlb: do pre-HVO for bootmem allocated pages 2025-03-16 22:06:29 -07:00
hwpoison-inject.c mm/hwpoison: decouple hwpoison_filter from mm/memory-failure.c 2025-09-21 14:22:21 -07:00
init-mm.c mm: rename cpu_bitmap field to flexible_array 2026-01-19 12:30:00 -08:00
internal.h mm.git review status for linus..mm-stable 2026-02-18 20:50:32 -08:00
interval_tree.c
ioremap.c mm/ioremap: pass pgprot_t to ioremap_prot() instead of unsigned long 2025-03-16 22:06:23 -07:00
Kconfig mm.git review status for linus..mm-stable 2026-02-12 11:32:37 -08:00
Kconfig.debug mm: fix DEBUG_RODATA_TEST indentation in Kconfig 2025-11-29 10:41:09 -08:00
khugepaged.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
kmemleak.c slab updates for 7.0 part2 2026-02-16 13:41:38 -08:00
ksm.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
list_lru.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
maccess.c mm: unexport globally copy_to_kernel_nofault 2025-07-09 22:42:22 -07:00
madvise.c Convert 'alloc_flex' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
Makefile mm.git review status for linus..mm-nonmm-stable 2026-02-12 12:13:01 -08:00
mapping_dirty_helpers.c mm/dirty: replace READ_ONCE() with pudp_get() 2025-11-16 17:27:58 -08:00
memblock.c memblock: updates for 7.0-rc1 2026-02-14 12:39:34 -08:00
memcontrol-v1.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
memcontrol-v1.h mm.git review status for linus..mm-stable 2026-02-12 11:32:37 -08:00
memcontrol.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
memfd.c mm: update shmem_[kernel]_file_*() functions to use vma_flags_t 2026-02-12 15:42:58 -08:00
memfd_luo.c liveupdate: luo_file: remember retrieve() status 2026-02-24 11:13:26 -08:00
memory-failure.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
memory-tiers.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
memory.c mm: Fix a hmm_range_fault() livelock / starvation problem 2026-03-02 11:51:51 -05:00
memory_hotplug.c mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATION 2026-01-31 14:22:36 -08:00
mempolicy.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
mempool.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
memremap.c mm/zone_device: reinitialize large zone device private folios 2026-01-26 19:03:48 -08:00
memtest.c mm/memtest: add underflow detection for size calculation 2026-01-09 11:53:51 +02:00
migrate.c mm: Fix a hmm_range_fault() livelock / starvation problem 2026-03-02 11:51:51 -05:00
migrate_device.c mm: Fix a hmm_range_fault() livelock / starvation problem 2026-03-02 11:51:51 -05:00
mincore.c mm: replace remaining pte_to_swp_entry() with softleaf_from_pte() 2025-11-24 15:08:52 -08:00
mlock.c mm: update vma_modify_flags() to handle residual flags, document 2025-11-20 13:43:58 -08:00
mm_init.c mm: fix NULL NODE_DATA dereference for memoryless nodes on boot 2026-02-24 11:13:28 -08:00
mm_slot.h
mmap.c mm: update secretmem to use VMA flags on mmap_prepare 2026-02-12 15:42:58 -08:00
mmap_lock.c mm/vma: improve and document __is_vma_write_locked() 2026-01-31 14:22:51 -08:00
mmu_gather.c mm: add SPDX id lines to some mm source files 2026-02-06 15:47:16 -08:00
mmu_notifier.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
mmzone.c mm: introduce memdesc_flags_t 2025-09-13 16:55:07 -07:00
mprotect.c mm: introduce generic lazy_mmu helpers 2026-01-20 19:24:33 -08:00
mremap.c mm: update secretmem to use VMA flags on mmap_prepare 2026-02-12 15:42:58 -08:00
mseal.c mm: fix minor spelling mistakes in comments 2026-01-20 19:24:48 -08:00
msync.c
nommu.c mm/nommu: convert kobjsize() to folios 2025-09-13 16:54:46 -07:00
numa.c mm/numa: remove unnecessary local variable in alloc_node_data() 2025-05-12 23:50:38 -07:00
numa_emulation.c mm: numa,memblock: Use SZ_1M macro to denote bytes to MB conversion 2025-08-20 16:31:23 +03:00
numa_memblks.c memblock: numa_memblks: fix detection of NUMA node for CXL windows 2026-02-21 09:58:22 -08:00
oom_kill.c mm: fix OOM killer inaccuracy on large many-core systems 2026-01-31 14:22:37 -08:00
page-writeback.c mm/block/fs: remove laptop_mode 2026-01-20 19:24:47 -08:00
page_alloc.c mm/kfence: fix KASAN hardware tag faults during late enablement 2026-02-24 11:13:27 -08:00
page_counter.c page_counter: track failcnt only for legacy cgroups 2025-03-17 00:05:35 -07:00
page_ext.c mm/page_ext: Add page_ext_get_from_phys() 2026-01-21 12:51:48 +01:00
page_frag_cache.c mm/page_alloc: export free_frozen_pages() instead of free_unref_page() 2025-01-13 22:40:31 -08:00
page_idle.c mm/rmap: extend rmap and migration support device-private entries 2025-11-24 15:08:48 -08:00
page_io.c mm: fix minor spelling mistakes in comments 2026-01-20 19:24:48 -08:00
page_isolation.c mm: page_isolation: introduce page_is_unmovable() 2026-01-31 14:22:42 -08:00
page_owner.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
page_poison.c
page_reporting.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
page_reporting.h
page_table_check.c mm: provide address parameter to p{te,md,ud}_user_accessible_page() 2026-01-26 20:02:35 -08:00
page_vma_mapped.c mm: eliminate further swapops predicates 2025-11-24 15:08:52 -08:00
pagewalk.c mm/pagewalk: use min() to simplify the code 2026-01-31 14:22:52 -08:00
percpu-internal.h
percpu-km.c mm/mm/percpu-km: drop nth_page() usage within single allocation 2025-09-21 14:22:04 -07:00
percpu-stats.c mm: remove outdated filename comment in percpu-stats.c 2025-07-13 16:38:23 -07:00
percpu-vm.c kmsan: remove hard-coded GFP_KERNEL flags 2025-11-16 17:27:54 -08:00
percpu.c percpu: add double free check to pcpu_free_area() 2026-01-31 14:22:52 -08:00
pgalloc-track.h
pgtable-generic.c compiler-context-analysis: Remove __cond_lock() function-like helper 2026-01-05 16:43:33 +01:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
ptdump.c mm/ptdump: replace READ_ONCE() with standard page table accessors 2025-11-16 17:27:52 -08:00
readahead.c mm.git review status for linus..mm-stable 2026-02-12 11:32:37 -08:00
rmap.c mm: rmap: support batched unmapping for file large folios 2026-02-12 15:43:01 -08:00
rodata_test.c mm/rodata_test: verify test data is unchanged, rather than non-zero 2025-01-13 22:40:38 -08:00
secretmem.c mm: update secretmem to use VMA flags on mmap_prepare 2026-02-12 15:42:58 -08:00
shmem.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
shmem_quota.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
show_mem.c mm/vmscan: add tracepoint and reason for kswapd_failures reset 2026-01-31 14:22:38 -08:00
shrinker.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
shrinker_debug.c memcg: rename mem_cgroup_ino() to mem_cgroup_id() 2026-01-26 20:02:25 -08:00
shuffle.c
shuffle.h
slab.h mm/slab: mark alloc tags empty for sheaves allocated with __GFP_NO_OBJ_EXT 2026-02-26 17:30:32 +01:00
slab_common.c Merge branch 'slab/for-7.0/sheaves' into slab/for-next 2026-02-10 09:10:00 +01:00
slub.c mm/slab: initialize slab->stride early to avoid memory ordering issues 2026-02-27 16:22:57 +01:00
sparse-vmemmap.c mm: replace READ_ONCE() with standard page table accessors 2025-11-16 17:27:56 -08:00
sparse.c mm/memory_hotplug: Remove MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers 2025-10-14 14:24:53 +02:00
swap.c mm: fix minor spelling mistakes in comments 2026-01-20 19:24:48 -08:00
swap.h mm, swap: drop the SWAP_HAS_CACHE flag 2026-01-31 14:22:57 -08:00
swap_cgroup.c mm: swap_cgroup: remove double initialization of locals 2025-03-17 22:06:58 -07:00
swap_state.c mm, swap: drop the SWAP_HAS_CACHE flag 2026-01-31 14:22:57 -08:00
swap_table.h mm, swap: use a single page for swap table when the size fits 2025-09-21 14:22:25 -07:00
swapfile.c Convert more 'alloc_obj' cases to default GFP_KERNEL arguments 2026-02-21 20:03:00 -08:00
truncate.c vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
usercopy.c usercopy: Remove folio references from check_heap_object() 2025-11-13 11:01:08 +01:00
userfaultfd.c mm, swap: check swap table directly for checking cache 2026-01-31 14:22:57 -08:00
util.c mm: make vm_area_desc utilise vma_flags_t only 2026-02-12 15:42:59 -08:00
vma.c mm: make vm_area_desc utilise vma_flags_t only 2026-02-12 15:42:59 -08:00
vma.h mm: make vm_area_desc utilise vma_flags_t only 2026-02-12 15:42:59 -08:00
vma_exec.c mm: softdirty: add pgtable_supports_soft_dirty() 2025-11-24 15:08:54 -08:00
vma_init.c Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
vma_internal.h mm: relocate the page table ceiling and floor definitions 2026-02-12 15:42:53 -08:00
vmalloc.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
vmpressure.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
vmscan.c treewide: Replace kmalloc with kmalloc_obj for non-scalar types 2026-02-21 01:02:28 -08:00
vmstat.c mm.git review status for linus..mm-stable 2026-02-12 11:32:37 -08:00
workingset.c memcg: introduce private id API for in-kernel users 2026-01-26 20:02:23 -08:00
zpdesc.h mm: zpdesc: minor naming and comment corrections 2025-09-21 14:21:59 -07:00
zsmalloc.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00
zswap.c Convert 'alloc_obj' family to use the new default GFP_KERNEL argument 2026-02-21 17:09:51 -08:00