linux/mm
Johannes Weiner c2f6ea38fc mm: page_alloc: don't steal single pages from biggest buddy
The fallback code searches for the biggest buddy first in an attempt to
steal the whole block and encourage type grouping down the line.

The approach used to be this:

- Non-movable requests will split the largest buddy and steal the
  remainder. This splits up contiguity, but it allows subsequent
  requests of this type to fall back into adjacent space.

- Movable requests go and look for the smallest buddy instead. The
  thinking is that movable requests can be compacted, so grouping is
  less important than retaining contiguity.

c0cd6f557b ("mm: page_alloc: fix freelist movement during block
conversion") enforces freelist type hygiene, which restricts stealing to
either claiming the whole block or just taking the requested chunk; no
additional pages or buddy remainders can be stolen any more.

The patch mishandled when to switch to finding the smallest buddy in that
new reality.  As a result, it may steal the exact request size, but from
the biggest buddy.  This causes fracturing for no good reason.

Fix this by committing to the new behavior: either steal the whole block,
or fall back to the smallest buddy.

Remove single-page stealing from steal_suitable_fallback().  Rename it to
try_to_steal_block() to make the intentions clear.  If this fails, always
fall back to the smallest buddy.

The following is from 4 runs of mmtest's thpchallenge.  "Pollute" is
single page fallback, "steal" is conversion of a partially used block. 
The numbers for free block conversions (omitted) are comparable.

				     vanilla	      patched

@pollute[unmovable from reclaimable]:	  27		  106
@pollute[unmovable from movable]:	  82		   46
@pollute[reclaimable from unmovable]:	 256		   83
@pollute[reclaimable from movable]:	  46		    8
@pollute[movable from unmovable]:	4841		  868
@pollute[movable from reclaimable]:	5278		12568

@steal[unmovable from reclaimable]:	  11		   12
@steal[unmovable from movable]:		 113		   49
@steal[reclaimable from unmovable]:	  19		   34
@steal[reclaimable from movable]:	  47		   21
@steal[movable from unmovable]:		 250		  183
@steal[movable from reclaimable]:	  81		   93

The allocator appears to do a better job at keeping stealing and polluting
to the first fallback preference.  As a result, the numbers for "from
movable" - the least preferred fallback option, and most detrimental to
compactability - are down across the board.

Link: https://lkml.kernel.org/r/20250225001023.1494422-2-hannes@cmpxchg.org
Fixes: c0cd6f557b ("mm: page_alloc: fix freelist movement during block conversion")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:41 -07:00
..
damon mm/damon: implement a new DAMOS filter type for unmapped pages 2025-03-16 22:06:32 -07:00
kasan kasan: don't call find_vm_area() in a PREEMPT_RT kernel 2025-02-17 22:40:04 -08:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-01 03:53:26 -08:00
kmsan dma: kmsan: export kmsan_handle_dma() for modules 2025-03-05 21:36:14 -08:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c mm/sparse: allow for alternate vmemmap section init at boot 2025-03-16 22:06:27 -07:00
cma.c mm/cma: introduce interface for early reservations 2025-03-16 22:06:30 -07:00
cma.h mm/cma: introduce interface for early reservations 2025-03-16 22:06:30 -07:00
cma_debug.c mm, cma: support multiple contiguous ranges, if requested 2025-03-16 22:06:25 -07:00
cma_sysfs.c mm/cma: export total and free number of pages for CMA areas 2025-03-16 22:06:24 -07:00
compaction.c NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback 2025-03-05 21:36:15 -08:00
debug.c mm/debug: print vm_refcnt state when dumping the vma 2025-03-16 22:06:20 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
dmapool.c
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
early_ioremap.c mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference 2025-01-13 22:40:59 -08:00
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c filemap: remove redundant folio_test_large check in filemap_free_folio 2025-03-16 22:06:16 -07:00
folio-compat.c mm/writeback: add folio_mark_dirty_lock() 2024-11-05 11:14:32 +01:00
gup.c mm: remove the access_ok() call from gup_fast_fallback() 2025-03-16 22:06:10 -07:00
gup_test.c
gup_test.h
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm: avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap 2025-03-16 22:06:17 -07:00
hugetlb.c mm/hugetlb: move hugetlb CMA code in to its own file 2025-03-16 22:06:31 -07:00
hugetlb_cgroup.c mm/hugetlb-cgroup: convert hugetlb_cgroup_css_offline() to work on folios 2025-01-25 20:22:42 -08:00
hugetlb_cma.c mm/hugetlb: move hugetlb CMA code in to its own file 2025-03-16 22:06:31 -07:00
hugetlb_cma.h mm/hugetlb: move hugetlb CMA code in to its own file 2025-03-16 22:06:31 -07:00
hugetlb_vmemmap.c mm/hugetlb: do pre-HVO for bootmem allocated pages 2025-03-16 22:06:29 -07:00
hugetlb_vmemmap.h mm/hugetlb: do pre-HVO for bootmem allocated pages 2025-03-16 22:06:29 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: replace vm_lock and detached flag with a reference count 2025-03-16 22:06:20 -07:00
internal.h mm/cma: introduce interface for early reservations 2025-03-16 22:06:30 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm/ioremap: pass pgprot_t to ioremap_prot() instead of unsigned long 2025-03-16 22:06:23 -07:00
Kconfig mm/sparse: allow for alternate vmemmap section init at boot 2025-03-16 22:06:27 -07:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
kmemleak.c mm: kmemleak: add support for dumping physical and __percpu object info 2025-03-16 22:06:08 -07:00
ksm.c mm/ksm: handle device-exclusive entries correctly in write_protect_page() 2025-03-16 22:05:58 -07:00
list_lru.c mm/list_lru: fix false warning of negative counter 2024-12-30 17:59:10 -08:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm: allow guard regions in file-backed and read-only mappings 2025-03-16 22:06:14 -07:00
Makefile mm/hugetlb: move hugetlb CMA code in to its own file 2025-03-16 22:06:31 -07:00
mapping_dirty_helpers.c
memblock.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
memcontrol-v1.c mm: memcontrol: move memsw charge callbacks to v1 2025-03-16 22:05:55 -07:00
memcontrol-v1.h mm: memcontrol: move memsw charge callbacks to v1 2025-03-16 22:05:55 -07:00
memcontrol.c mm: memcontrol: move memsw charge callbacks to v1 2025-03-16 22:05:55 -07:00
memfd.c mm/memfd: fix spelling and grammatical issues 2025-03-16 22:06:04 -07:00
memory-failure.c mm: memory-failure: update ttu flag inside unmap_poisoned_folio 2025-03-05 21:36:13 -08:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm/ioremap: pass pgprot_t to ioremap_prot() instead of unsigned long 2025-03-16 22:06:23 -07:00
memory_hotplug.c hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio 2025-03-05 21:36:13 -08:00
mempolicy.c mm/hugetlb: rename isolate_hugetlb() to folio_isolate_hugetlb() 2025-01-25 20:22:41 -08:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c
migrate.c mm: use READ/WRITE_ONCE() for vma->vm_flags on migrate, mprotect 2025-03-16 22:06:09 -07:00
migrate_device.c mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() 2025-02-17 22:40:02 -08:00
mincore.c mm/mincore: improve performance by adding an unlikely hint 2025-03-16 22:06:32 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm/cma: introduce interface for early reservations 2025-03-16 22:06:30 -07:00
mm_slot.h
mmap.c mm: make vma cache SLAB_TYPESAFE_BY_RCU 2025-03-16 22:06:21 -07:00
mmap_lock.c mm: mmap_lock: optimize mmap_lock tracepoints 2025-01-13 22:40:34 -08:00
mmu_gather.c mm/mmu_gather: update comment on RCU freeing 2025-03-16 22:06:12 -07:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: use READ/WRITE_ONCE() for vma->vm_flags on migrate, mprotect 2025-03-16 22:06:09 -07:00
mremap.c mm: clear uffd-wp PTE/PMD state on mremap() 2025-01-12 19:03:37 -08:00
mseal.c mseal: remove can_do_mseal() 2025-01-13 22:40:51 -08:00
msync.c
nommu.c mm: introduce vma_iter_store_attached() to use with attached vmas 2025-03-16 22:06:18 -07:00
numa.c mm/memblock: add memblock_alloc_or_panic interface 2025-01-25 20:22:38 -08:00
numa_emulation.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
numa_memblks.c mm/fake-numa: allow later numa node hotplug 2025-01-25 20:22:29 -08:00
oom_kill.c mm/oom_kill: fix trivial typo in comment 2025-03-16 22:05:55 -07:00
page-writeback.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
page_alloc.c mm: page_alloc: don't steal single pages from biggest buddy 2025-03-16 22:06:41 -07:00
page_counter.c kernel/cgroup: Add "dmem" memory accounting cgroup 2025-01-06 17:24:38 +01:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_frag_cache.c mm/page_alloc: export free_frozen_pages() instead of free_unref_page() 2025-01-13 22:40:31 -08:00
page_idle.c mm/page_idle: handle device-exclusive entries correctly in page_idle_clear_pte_refs_one() 2025-03-16 22:05:59 -07:00
page_io.c mm, swap: clean up device availability check 2025-01-25 20:22:36 -08:00
page_isolation.c mm/hugetlb: wait for hugetlb folios to be freed 2025-03-05 21:36:14 -08:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: use single SWP_DEVICE_EXCLUSIVE entry type 2025-03-16 22:05:58 -07:00
page_vma_mapped.c mm/page_vma_mapped: device-exclusive entries are not migration entries 2025-03-16 22:05:58 -07:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm, percpu: do not consider sleepable allocations atomic 2025-03-16 22:06:08 -07:00
pgalloc-track.h
pgtable-generic.c mm: add RCU annotation to pte_offset_map(_lock) 2024-12-18 19:04:43 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
pt_reclaim.c mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED) 2025-01-13 22:40:48 -08:00
ptdump.c
readahead.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
rmap.c mm: avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap 2025-03-16 22:06:17 -07:00
rodata_test.c mm/rodata_test: verify test data is unchanged, rather than non-zero 2025-01-13 22:40:38 -08:00
secretmem.c add a string-to-qstr constructor 2025-01-27 19:25:45 -05:00
shmem.c mm: memcontrol: move memsw charge callbacks to v1 2025-03-16 22:05:55 -07:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c saner replacement for debugfs_rename() 2025-01-15 13:14:37 +01:00
shuffle.c
shuffle.h
slab.h mm/slab: fix kernel-doc func param names 2025-01-13 10:22:04 +01:00
slab_common.c mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq 2025-03-04 08:51:53 +01:00
slub.c alloc_tag: uninline code gated by mem_alloc_profiling_key in slab allocator 2025-03-16 22:06:03 -07:00
sparse-vmemmap.c mm/hugetlb: do pre-HVO for bootmem allocated pages 2025-03-16 22:06:29 -07:00
sparse.c mm/sparse: allow for alternate vmemmap section init at boot 2025-03-16 22:06:27 -07:00
swap.c mm/filemap: add read support for RWF_DONTCACHE 2025-01-25 20:22:43 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm: memcontrol: fix swap counter leak from offline cgroup 2025-03-16 17:40:24 -07:00
swap_slots.c mm, swap_slots: remove slot cache for freeing path 2025-01-25 20:22:37 -08:00
swap_state.c mm/swap: rename swap_swapcount() to swap_entry_swapped() 2025-03-16 22:06:07 -07:00
swapfile.c mm: swap: remove stale comment of swap_reclaim_full_clusters() 2025-03-16 22:06:40 -07:00
truncate.c mm/truncate: don't skip dirty page in folio_unmap_invalidate() 2025-02-21 14:09:47 +01:00
usercopy.c
userfaultfd.c mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail 2025-03-16 22:06:18 -07:00
util.c mm: add comments to do_mmap(), mmap_region() and vm_mmap() 2025-01-13 22:40:59 -08:00
vma.c mm: make vma cache SLAB_TYPESAFE_BY_RCU 2025-03-16 22:06:21 -07:00
vma.h mm: make vma cache SLAB_TYPESAFE_BY_RCU 2025-03-16 22:06:21 -07:00
vma_internal.h mm/vma: move brk() internals to mm/vma.c 2025-01-13 22:40:42 -08:00
vmalloc.c mm: don't skip arch_sync_kernel_mappings() in error paths 2025-03-05 21:36:18 -08:00
vmpressure.c
vmscan.c vmscan, cleanup: add for_each_managed_zone_pgdat macro 2025-03-16 22:06:10 -07:00
vmstat.c vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-12 19:03:38 -08:00
workingset.c mm/mglru: rework workingset protection 2025-01-25 20:22:39 -08:00
zpdesc.h mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc() 2025-01-25 20:22:35 -08:00
zpool.c mm: zbud: remove zbud 2025-03-16 22:06:01 -07:00
zsmalloc.c zsmalloc: introduce new object mapping API 2025-03-16 22:06:36 -07:00
zswap.c mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages 2025-03-05 21:36:17 -08:00