mm.git review status for linus..mm-stable

Everything:
 
 Total patches:       325
 Reviews/patch:       1.39
 Reviewed rate:       72%
 
 Excluding DAMON:
 
 Total patches:       262
 Reviews/patch:       1.63
 Reviewed rate:       82%
 
 Excluding DAMON and zram:
 
 Total patches:       248
 Reviews/patch:       1.72
 Reviewed rate:       86%
 
 - The 14 patch series "powerpc/64s: do not re-activate batched TLB
   flush" from Alexander Gordeev makes arch_{enter|leave}_lazy_mmu_mode()
   nest properly.
 
   It adds a generic enter/leave layer and switches architectures to use
   it.  Various hacks were removed in the process.
 
 - The 7 patch series "zram: introduce compressed data writeback" from
   Richard Chang and Sergey Senozhatsky implements data compression for
   zram writeback.
 
 - The 8 patch series "mm: folio_zero_user: clear page ranges" from David
   Hildenbrand adds clearing of contiguous page ranges for hugepages.
   Large improvements during demand faulting are demonstrated.
 
 - The 2 patch series "memcg cleanups" from Chen Ridong tideis up some
   memcg code.
 
 - The 12 patch series "mm/damon: introduce {,max_}nr_snapshots and
   tracepoint for damos stats" from SeongJae Park improves DAMOS stat's
   provided information, deterministic control, and readability.
 
 - The 3 patch series "selftests/mm: hugetlb cgroup charging: robustness
   fixes" from Li Wang fixes a few issues in the hugetlb cgroup charging
   selftests.
 
 - The 5 patch series "Fix va_high_addr_switch.sh test failure - again"
   from Chunyu Hu addresses several issues in the va_high_addr_switch test.
 
 - The 5 patch series "mm/damon/tests/core-kunit: extend existing test
   scenarios" from Shu Anzai improves the KUnit test coverage for DAMON.
 
 - The 2 patch series "mm/khugepaged: fix dirty page handling for
   MADV_COLLAPSE" from Shivank Garg fixes a glitch in khugepaged which was
   causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN.
 
 - The 29 patch series "arch, mm: consolidate hugetlb early reservation"
   from Mike Rapoport reworks and consolidates a pile of straggly code
   related to reservation of hugetlb memory from bootmem and creation of
   CMA areas for hugetlb.
 
 - The 9 patch series "mm: clean up anon_vma implementation" from Lorenzo
   Stoakes cleans up the anon_vma implementation in various ways.
 
 - The 3 patch series "tweaks for __alloc_pages_slowpath()" from
   Vlastimil Babka does a little streamlining of the page allocator's
   slowpath code.
 
 - The 8 patch series "memcg: separate private and public ID namespaces"
   from Shakeel Butt cleans up the memcg ID code and prevents the
   internal-only private IDs from being exposed to userspace.
 
 - The 6 patch series "mm: hugetlb: allocate frozen gigantic folio" from
   Kefeng Wang cleans up the allocation of frozen folios and avoids some
   atomic refcount operations.
 
 - The 11 patch series "mm/damon: advance DAMOS-based LRU sorting" from
   SeongJae Park improves DAMOS's movement of memory betewwn the active and
   inactive LRUs and adds auto-tuning of the ratio-based quotas and of
   monitoring intervals.
 
 - The 18 patch series "Support page table check on PowerPC" from Andrew
   Donnellan makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc.
 
 - The 3 patch series "nodemask: align nodes_and{,not} with underlying
   bitmap ops" from Yury Norov makes nodes_and() and nodes_andnot()
   propagate the return values from the underlying bit operations, enabling
   some cleanup in calling code.
 
 - The 5 patch series "mm/damon: hide kdamond and kdamond_lock from API
   callers" from SeongJae Park cleans up some DAMON internal interfaces.
 
 - The 4 patch series "mm/khugepaged: cleanups and scan limit fix" from
   Shivank Garg does some cleanup work in khupaged and fixes a scan limit
   accounting issue.
 
 - The 24 patch series "mm: balloon infrastructure cleanups" from David
   Hildenbrand goes to town on the balloon infrastructure and its page
   migration function.  Mainly cleanups, also some locking simplification.
 
 - The 2 patch series "mm/vmscan: add tracepoint and reason for
   kswapd_failures reset" from Jiayuan Chen adds additional tracepoints to
   the page reclaim code.
 
 - The 3 patch series "Replace wq users and add WQ_PERCPU to
   alloc_workqueue() users" from Marco Crivellari is part of Marco's
   kernel-wide migration from the legacy workqueue APIs over to the
   preferred unbound workqueues.
 
 - The 9 patch series "Various mm kselftests improvements/fixes" from
   Kevin Brodsky provides various unrelated improvements/fixes for the mm
   kselftests.
 
 - The 5 patch series "mm: accelerate gigantic folio allocation" from
   Kefeng Wang greatly speeds up gigantic folio allocation, mainly by
   avoiding unnecessary work in pfn_range_valid_contig().
 
 - The 5 patch series "selftests/damon: improve leak detection and wss
   estimation reliability" from SeongJae Park improves the reliability of
   two of the DAMON selftests.
 
 - The 8 patch series "mm/damon: cleanup kdamond, damon_call(), damos
   filter and DAMON_MIN_REGION" from SeongJae Park does some cleanup work
   in the core DAMON code.
 
 - The 8 patch series "Docs/mm/damon: update intro, modules, maintainer
   profile, and misc" from SeongJae Park performs maintenance work on the
   DAMON documentation.
 
 - The 10 patch series "mm: add and use vma_assert_stabilised() helper"
   from Lorenzo Stoakes refactors and cleans up the core VMA code.  The
   main aim here is to be able to use the mmap write lock's lockdep state
   to perform various assertions regarding the locking which the VMA code
   requires.
 
 - The 19 patch series "mm, swap: swap table phase II: unify swapin use"
   from Kairui Song removes some old swap code (swap cache bypassing and
   swap synchronization) which wasn't working very well.  Various other
   cleanups and simplifications were made.  The end result is a 20% speedup
   in one benchmark.
 
 - The 8 patch series "enable PT_RECLAIM on more 64-bit architectures"
   from Qi Zheng makes PT_RECLAIM available on 64-bit alpha, loongarch,
   mips, parisc, um,  Various cleanups were performed along the way.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY1HfAAKCRDdBJ7gKXxA
 jqhZAP9H8ZlKKqCEgnr6U5XXmJ63Ep2FDQpl8p35yr9yVuU9+gEAgfyWiJ43l1fP
 rT0yjsUW3KQFBi/SEA3R6aYarmoIBgI=
 =+HLt
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "powerpc/64s: do not re-activate batched TLB flush" makes
   arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)

   It adds a generic enter/leave layer and switches architectures to use
   it. Various hacks were removed in the process.

 - "zram: introduce compressed data writeback" implements data
   compression for zram writeback (Richard Chang and Sergey Senozhatsky)

 - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
   page ranges for hugepages. Large improvements during demand faulting
   are demonstrated (David Hildenbrand)

 - "memcg cleanups" tidies up some memcg code (Chen Ridong)

 - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
   stats" improves DAMOS stat's provided information, deterministic
   control, and readability (SeongJae Park)

 - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
   issues in the hugetlb cgroup charging selftests (Li Wang)

 - "Fix va_high_addr_switch.sh test failure - again" addresses several
   issues in the va_high_addr_switch test (Chunyu Hu)

 - "mm/damon/tests/core-kunit: extend existing test scenarios" improves
   the KUnit test coverage for DAMON (Shu Anzai)

 - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
   glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
   transiently return -EAGAIN (Shivank Garg)

 - "arch, mm: consolidate hugetlb early reservation" reworks and
   consolidates a pile of straggly code related to reservation of
   hugetlb memory from bootmem and creation of CMA areas for hugetlb
   (Mike Rapoport)

 - "mm: clean up anon_vma implementation" cleans up the anon_vma
   implementation in various ways (Lorenzo Stoakes)

 - "tweaks for __alloc_pages_slowpath()" does a little streamlining of
   the page allocator's slowpath code (Vlastimil Babka)

 - "memcg: separate private and public ID namespaces" cleans up the
   memcg ID code and prevents the internal-only private IDs from being
   exposed to userspace (Shakeel Butt)

 - "mm: hugetlb: allocate frozen gigantic folio" cleans up the
   allocation of frozen folios and avoids some atomic refcount
   operations (Kefeng Wang)

 - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
   of memory betewwn the active and inactive LRUs and adds auto-tuning
   of the ratio-based quotas and of monitoring intervals (SeongJae Park)

 - "Support page table check on PowerPC" makes
   CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)

 - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
   nodes_and() and nodes_andnot() propagate the return values from the
   underlying bit operations, enabling some cleanup in calling code
   (Yury Norov)

 - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
   some DAMON internal interfaces (SeongJae Park)

 - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
   in khupaged and fixes a scan limit accounting issue (Shivank Garg)

 - "mm: balloon infrastructure cleanups" goes to town on the balloon
   infrastructure and its page migration function. Mainly cleanups, also
   some locking simplification (David Hildenbrand)

 - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
   additional tracepoints to the page reclaim code (Jiayuan Chen)

 - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
   part of Marco's kernel-wide migration from the legacy workqueue APIs
   over to the preferred unbound workqueues (Marco Crivellari)

 - "Various mm kselftests improvements/fixes" provides various unrelated
   improvements/fixes for the mm kselftests (Kevin Brodsky)

 - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
   folio allocation, mainly by avoiding unnecessary work in
   pfn_range_valid_contig() (Kefeng Wang)

 - "selftests/damon: improve leak detection and wss estimation
   reliability" improves the reliability of two of the DAMON selftests
   (SeongJae Park)

 - "mm/damon: cleanup kdamond, damon_call(), damos filter and
   DAMON_MIN_REGION" does some cleanup work in the core DAMON code
   (SeongJae Park)

 - "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
   performs maintenance work on the DAMON documentation (SeongJae Park)

 - "mm: add and use vma_assert_stabilised() helper" refactors and cleans
   up the core VMA code. The main aim here is to be able to use the mmap
   write lock's lockdep state to perform various assertions regarding
   the locking which the VMA code requires (Lorenzo Stoakes)

 - "mm, swap: swap table phase II: unify swapin use" removes some old
   swap code (swap cache bypassing and swap synchronization) which
   wasn't working very well. Various other cleanups and simplifications
   were made. The end result is a 20% speedup in one benchmark (Kairui
   Song)

 - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
   available on 64-bit alpha, loongarch, mips, parisc, and um. Various
   cleanups were performed along the way (Qi Zheng)

* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
  mm/memory: handle non-split locks correctly in zap_empty_pte_table()
  mm: move pte table reclaim code to memory.c
  mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
  um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
  zsmalloc: make common caches global
  mm: add SPDX id lines to some mm source files
  mm/zswap: use %pe to print error pointers
  mm/vmscan: use %pe to print error pointers
  mm/readahead: fix typo in comment
  mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
  mm: refactor vma_map_pages to use vm_insert_pages
  mm/damon: unify address range representation with damon_addr_range
  mm/cma: replace snprintf with strscpy in cma_new_area
  ...
This commit is contained in:
Linus Torvalds 2026-02-12 11:32:37 -08:00
commit 4cff5c05e0
332 changed files with 6257 additions and 5611 deletions

View file

@ -150,3 +150,17 @@ Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The algorithm_params file is write-only and is used to setup
compression algorithm parameters.
What: /sys/block/zram<id>/writeback_compressed
Date: Decemeber 2025
Contact: Richard Chang <richardycc@google.com>
Description:
The writeback_compressed device atrribute toggles compressed
writeback feature.
What: /sys/block/zram<id>/writeback_batch_size
Date: November 2025
Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The writeback_batch_size device atrribute sets the maximum
number of in-flight writeback operations.

View file

@ -516,6 +516,19 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the number of the exceed events of
the scheme's quotas.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_snapshots
Date: Dec 2025
Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the total number of DAMON snapshots
that the scheme has tried to be applied.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/max_nr_snapshots
Date: Dec 2025
Contact: SeongJae Park <sj@kernel.org>
Description: Writing a number to this file sets the upper limit of
nr_snapshots that deactivates the scheme when the limit is
reached or exceeded.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/total_bytes
Date: Jul 2023
Contact: SeongJae Park <sj@kernel.org>

View file

@ -214,6 +214,9 @@ mem_limit WO specifies the maximum amount of memory ZRAM can
writeback_limit WO specifies the maximum amount of write IO zram
can write out to backing device as 4KB unit
writeback_limit_enable RW show and set writeback_limit feature
writeback_batch_size RW show and set maximum number of in-flight
writeback operations
writeback_compressed RW show and set compressed writeback feature
comp_algorithm RW show and change the compression algorithm
algorithm_params WO setup compression algorithm parameters
compact WO trigger memory compaction
@ -222,7 +225,6 @@ backing_dev RW set up backend storage for zram to write out
idle WO mark allocated slot as idle
====================== ====== ===============================================
User space is advised to use the following files to read the device statistics.
File /sys/block/zram<id>/stat
@ -434,6 +436,26 @@ system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
writeback happened until you reset the zram to allocate extra writeback
budget in next setting is user's job.
By default zram stores written back pages in decompressed (raw) form, which
means that writeback operation involves decompression of the page before
writing it to the backing device. This behavior can be changed by enabling
`writeback_compressed` feature, which causes zram to write compressed pages
to the backing device, thus avoiding decompression overhead. To enable
this feature, execute::
$ echo yes > /sys/block/zramX/writeback_compressed
Note that this feature should be configured before the `zramX` device is
initialized.
Depending on backing device storage type, writeback operation may benefit
from a higher number of in-flight write requests (batched writes). The
number of maximum in-flight writeback operations can be configured via
`writeback_batch_size` attribute. To change the default value (which is 32),
execute::
$ echo 64 > /sys/block/zramX/writeback_batch_size
If admin wants to measure writeback count in a certain period, they could
know it via /sys/block/zram0/bd_stat's 3rd column.

View file

@ -311,9 +311,8 @@ Lock order is as follows::
folio_lock
mm->page_table_lock or split pte_lock
folio_memcg_lock (memcg->move_lock)
mapping->i_pages lock
lruvec->lru_lock.
mapping->i_pages lock
lruvec->lru_lock.
Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
lruvec->lru_lock; the folio LRU flag is cleared before

View file

@ -10,7 +10,6 @@ Laptop Drivers
alienware-wmi
asus-laptop
disk-shock-protection
laptop-mode
lg-laptop
samsung-galaxybook
sony-laptop

View file

@ -1,770 +0,0 @@
===============================================
How to conserve battery power using laptop-mode
===============================================
Document Author: Bart Samwel (bart@samwel.tk)
Date created: January 2, 2004
Last modified: December 06, 2004
Introduction
------------
Laptop mode is used to minimize the time that the hard disk needs to be spun up,
to conserve battery power on laptops. It has been reported to cause significant
power savings.
.. Contents
* Introduction
* Installation
* Caveats
* The Details
* Tips & Tricks
* Control script
* ACPI integration
* Monitoring tool
Installation
------------
To use laptop mode, you don't need to set any kernel configuration options
or anything. Simply install all the files included in this document, and
laptop mode will automatically be started when you're on battery. For
your convenience, a tarball containing an installer can be downloaded at:
http://www.samwel.tk/laptop_mode/laptop_mode/
To configure laptop mode, you need to edit the configuration file, which is
located in /etc/default/laptop-mode on Debian-based systems, or in
/etc/sysconfig/laptop-mode on other systems.
Unfortunately, automatic enabling of laptop mode does not work for
laptops that don't have ACPI. On those laptops, you need to start laptop
mode manually. To start laptop mode, run "laptop_mode start", and to
stop it, run "laptop_mode stop". (Note: The laptop mode tools package now
has experimental support for APM, you might want to try that first.)
Caveats
-------
* The downside of laptop mode is that you have a chance of losing up to 10
minutes of work. If you cannot afford this, don't use it! The supplied ACPI
scripts automatically turn off laptop mode when the battery almost runs out,
so that you won't lose any data at the end of your battery life.
* Most desktop hard drives have a very limited lifetime measured in spindown
cycles, typically about 50.000 times (it's usually listed on the spec sheet).
Check your drive's rating, and don't wear down your drive's lifetime if you
don't need to.
* If you mount some of your ext3 filesystems with the -n option, then
the control script will not be able to remount them correctly. You must set
DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
wrong options -- or it will fail because it cannot write to /etc/mtab.
* If you have your filesystems listed as type "auto" in fstab, like I did, then
the control script will not recognize them as filesystems that need remounting.
You must list the filesystems with their true type instead.
* It has been reported that some versions of the mutt mail client use file access
times to determine whether a folder contains new mail. If you use mutt and
experience this, you must disable the noatime remounting by setting the option
DO_REMOUNT_NOATIME to 0 in the configuration file.
The Details
-----------
Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is
present for all kernels that have the laptop mode patch, regardless of any
configuration options. When the knob is set, any physical disk I/O (that might
have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The
result of this is that after a disk has spun down, it will not be spun up
anymore to write dirty blocks, because those blocks had already been written
immediately after the most recent read operation. The value of the laptop_mode
knob determines the time between the occurrence of disk I/O and when the flush
is triggered. A sensible value for the knob is 5 seconds. Setting the knob to
0 disables laptop mode.
To increase the effectiveness of the laptop_mode strategy, the laptop_mode
control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
/proc/sys/vm to about 10 minutes (by default), which means that pages that are
dirtied are not forced to be written to disk as often. The control script also
changes the dirty background ratio, so that background writeback of dirty pages
is not done anymore. Combined with a higher commit value (also 10 minutes) for
ext3 filesystem (also done automatically by the control script),
this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.
Configuration
-------------
The laptop mode configuration file is located in /etc/default/laptop-mode on
Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It
contains the following options:
MAX_AGE:
Maximum time, in seconds, of hard drive spindown time that you are
comfortable with. Worst case, it's possible that you could lose this
amount of work if your battery fails while you're in laptop mode.
MINIMUM_BATTERY_MINUTES:
Automatically disable laptop mode if the remaining number of minutes of
battery power is less than this value. Default is 10 minutes.
AC_HD/BATT_HD:
The idle timeout that should be set on your hard drive when laptop mode
is active (BATT_HD) and when it is not active (AC_HD). The defaults are
20 seconds (value 4) for BATT_HD and 2 hours (value 244) for AC_HD. The
possible values are those listed in the manual page for "hdparm" for the
"-S" option.
HD:
The devices for which the spindown timeout should be adjusted by laptop mode.
Default is /dev/hda. If you specify multiple devices, separate them by a space.
READAHEAD:
Disk readahead, in 512-byte sectors, while laptop mode is active. A large
readahead can prevent disk accesses for things like executable pages (which are
loaded on demand while the application executes) and sequentially accessed data
(MP3s).
DO_REMOUNTS:
The control script automatically remounts any mounted journaled filesystems
with appropriate commit interval options. When this option is set to 0, this
feature is disabled.
DO_REMOUNT_NOATIME:
When remounting, should the filesystems be remounted with the noatime option?
Normally, this is set to "1" (enabled), but there may be programs that require
access time recording.
DIRTY_RATIO:
The percentage of memory that is allowed to contain "dirty" or unsaved data
before a writeback is forced, while laptop mode is active. Corresponds to
the /proc/sys/vm/dirty_ratio sysctl.
DIRTY_BACKGROUND_RATIO:
The percentage of memory that is allowed to contain "dirty" or unsaved data
after a forced writeback is done due to an exceeding of DIRTY_RATIO. Set
this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio
sysctl.
Note that the behaviour of dirty_background_ratio is quite different
when laptop mode is active and when it isn't. When laptop mode is inactive,
dirty_background_ratio is the threshold percentage at which background writeouts
start taking place. When laptop mode is active, however, background writeouts
are disabled, and the dirty_background_ratio only determines how much writeback
is done when dirty_ratio is reached.
DO_CPU:
Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be setup.
See Documentation/admin-guide/pm/cpufreq.rst for more info. Disabled by default.)
CPU_MAXFREQ:
When on battery, what is the maximum CPU speed that the system should use? Legal
values are "slowest" for the slowest speed that your CPU is able to operate at,
or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies.
Tips & Tricks
-------------
* Bartek Kania reports getting up to 50 minutes of extra battery life (on top
of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1).
* You can spin down the disk while playing MP3, by setting disk readahead
to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at
once, and will then spin down while the MP3 is playing. (Thanks to Bartek
Kania.)
* Drew Scott Daniels observed: "I don't know why, but when I decrease the number
of colours that my display uses it consumes less battery power. I've seen
this on powerbooks too. I hope that this is a piece of information that
might be useful to the Laptop Mode patch or its users."
* In syslog.conf, you can prefix entries with a dash `-` to omit syncing the
file after every logging. When you're using laptop-mode and your disk doesn't
spin down, this is a likely culprit.
* Richard Atterer observed that laptop mode does not work well with noflushd
(http://noflushd.sourceforge.net/), it seems that noflushd prevents laptop-mode
from doing its thing.
* If you're worried about your data, you might want to consider using a USB
memory stick or something like that as a "working area". (Be aware though
that flash memory can only handle a limited number of writes, and overuse
may wear out your memory stick pretty quickly. Do _not_ use journalling
filesystems on flash memory sticks.)
Configuration file for control and ACPI battery scripts
-------------------------------------------------------
This allows the tunables to be changed for the scripts via an external
configuration file
It should be installed as /etc/default/laptop-mode on Debian, and as
/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes.
Config file::
# Maximum time, in seconds, of hard drive spindown time that you are
# comfortable with. Worst case, it's possible that you could lose this
# amount of work if your battery fails you while in laptop mode.
#MAX_AGE=600
# Automatically disable laptop mode when the number of minutes of battery
# that you have left goes below this threshold.
MINIMUM_BATTERY_MINUTES=10
# Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG
# by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk
# will read a complete MP3 at once, and will then spin down while the MP3/OGG is
# playing.
#READAHEAD=4096
# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
#DO_REMOUNTS=1
# And shall we add the "noatime" option to that as well? (1=yes)
#DO_REMOUNT_NOATIME=1
# Dirty synchronous ratio. At this percentage of dirty pages the process
# which
# calls write() does its own writeback
#DIRTY_RATIO=40
#
# Allowed dirty background ratio, in percent. Once DIRTY_RATIO has been
# exceeded, the kernel will wake flusher threads which will then reduce the
# amount of dirty memory to dirty_background_ratio. Set this nice and low,
# so once some writeout has commenced, we do a lot of it.
#
#DIRTY_BACKGROUND_RATIO=5
# kernel default dirty buffer age
#DEF_AGE=30
#DEF_UPDATE=5
#DEF_DIRTY_BACKGROUND_RATIO=10
#DEF_DIRTY_RATIO=40
#DEF_XFS_AGE_BUFFER=15
#DEF_XFS_SYNC_INTERVAL=30
#DEF_XFS_BUFD_INTERVAL=1
# This must be adjusted manually to the value of HZ in the running kernel
# on 2.4, until the XFS people change their 2.4 external interfaces to work in
# centisecs. This can be automated, but it's a work in progress that still
# needs# some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for
# external interfaces, and that is currently always set to 100. So you don't
# need to change this on 2.6.
#XFS_HZ=100
# Should the maximum CPU frequency be adjusted down while on battery?
# Requires CPUFreq to be setup.
# See Documentation/admin-guide/pm/cpufreq.rst for more info
#DO_CPU=0
# When on battery what is the maximum CPU speed that the system should
# use? Legal values are "slowest" for the slowest speed that your
# CPU is able to operate at, or a value listed in:
# /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
# Only applicable if DO_CPU=1.
#CPU_MAXFREQ=slowest
# Idle timeout for your hard drive (man hdparm for valid values, -S option)
# Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4).
#AC_HD=244
#BATT_HD=4
# The drives for which to adjust the idle timeout. Separate them by a space,
# e.g. HD="/dev/hda /dev/hdb".
#HD="/dev/hda"
# Set the spindown timeout on a hard drive?
#DO_HD=1
Control script
--------------
Please note that this control script works for the Linux 2.4 and 2.6 series (thanks
to Kiko Piris).
Control script::
#!/bin/bash
# start or stop laptop_mode, best run by a power management daemon when
# ac gets connected/disconnected from a laptop
#
# install as /sbin/laptop_mode
#
# Contributors to this script: Kiko Piris
# Bart Samwel
# Micha Feigin
# Andrew Morton
# Herve Eychenne
# Dax Kelson
#
# Original Linux 2.4 version by: Jens Axboe
#############################################################################
# Source config
if [ -f /etc/default/laptop-mode ] ; then
# Debian
. /etc/default/laptop-mode
elif [ -f /etc/sysconfig/laptop-mode ] ; then
# Others
. /etc/sysconfig/laptop-mode
fi
# Don't raise an error if the config file is incomplete
# set defaults instead:
# Maximum time, in seconds, of hard drive spindown time that you are
# comfortable with. Worst case, it's possible that you could lose this
# amount of work if your battery fails you while in laptop mode.
MAX_AGE=${MAX_AGE:-'600'}
# Read-ahead, in kilobytes
READAHEAD=${READAHEAD:-'4096'}
# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
DO_REMOUNTS=${DO_REMOUNTS:-'1'}
# And shall we add the "noatime" option to that as well? (1=yes)
DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'}
# Shall we adjust the idle timeout on a hard drive?
DO_HD=${DO_HD:-'1'}
# Adjust idle timeout on which hard drive?
HD="${HD:-'/dev/hda'}"
# spindown time for HD (hdparm -S values)
AC_HD=${AC_HD:-'244'}
BATT_HD=${BATT_HD:-'4'}
# Dirty synchronous ratio. At this percentage of dirty pages the process which
# calls write() does its own writeback
DIRTY_RATIO=${DIRTY_RATIO:-'40'}
# cpu frequency scaling
# See Documentation/admin-guide/pm/cpufreq.rst for more info
DO_CPU=${CPU_MANAGE:-'0'}
CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'}
#
# Allowed dirty background ratio, in percent. Once DIRTY_RATIO has been
# exceeded, the kernel will wake flusher threads which will then reduce the
# amount of dirty memory to dirty_background_ratio. Set this nice and low,
# so once some writeout has commenced, we do a lot of it.
#
DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'}
# kernel default dirty buffer age
DEF_AGE=${DEF_AGE:-'30'}
DEF_UPDATE=${DEF_UPDATE:-'5'}
DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'}
DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'}
DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'}
DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'}
DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'}
# This must be adjusted manually to the value of HZ in the running kernel
# on 2.4, until the XFS people change their 2.4 external interfaces to work in
# centisecs. This can be automated, but it's a work in progress that still needs
# some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external
# interfaces, and that is currently always set to 100. So you don't need to
# change this on 2.6.
XFS_HZ=${XFS_HZ:-'100'}
#############################################################################
KLEVEL="$(uname -r |
{
IFS='.' read a b c
echo $a.$b
}
)"
case "$KLEVEL" in
"2.4"|"2.6")
;;
*)
echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2
exit 1
;;
esac
if [ ! -e /proc/sys/vm/laptop_mode ] ; then
echo "Kernel is not patched with laptop_mode patch." >&2
exit 1
fi
if [ ! -w /proc/sys/vm/laptop_mode ] ; then
echo "You do not have enough privileges to enable laptop_mode." >&2
exit 1
fi
# Remove an option (the first parameter) of the form option=<number> from
# a mount options string (the rest of the parameters).
parse_mount_opts () {
OPT="$1"
shift
echo ",$*," | sed \
-e 's/,'"$OPT"'=[0-9]*,/,/g' \
-e 's/,,*/,/g' \
-e 's/^,//' \
-e 's/,$//'
}
# Remove an option (the first parameter) without any arguments from
# a mount option string (the rest of the parameters).
parse_nonumber_mount_opts () {
OPT="$1"
shift
echo ",$*," | sed \
-e 's/,'"$OPT"',/,/g' \
-e 's/,,*/,/g' \
-e 's/^,//' \
-e 's/,$//'
}
# Find out the state of a yes/no option (e.g. "atime"/"noatime") in
# fstab for a given filesystem, and use this state to replace the
# value of the option in another mount options string. The device
# is the first argument, the option name the second, and the default
# value the third. The remainder is the mount options string.
#
# Example:
# parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime
#
# If fstab contains, say, "rw" for this filesystem, then the result
# will be "defaults,atime".
parse_yesno_opts_wfstab () {
L_DEV="$1"
OPT="$2"
DEF_OPT="$3"
shift 3
L_OPTS="$*"
PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)"
PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)"
# Watch for a default atime in fstab
FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then
# option specified in fstab: extract the value and use it
if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then
echo "$PARSEDOPTS1,no$OPT"
else
# no$OPT not found -- so we must have $OPT.
echo "$PARSEDOPTS1,$OPT"
fi
else
# option not specified in fstab -- choose the default.
echo "$PARSEDOPTS1,$DEF_OPT"
fi
}
# Find out the state of a numbered option (e.g. "commit=NNN") in
# fstab for a given filesystem, and use this state to replace the
# value of the option in another mount options string. The device
# is the first argument, and the option name the second. The
# remainder is the mount options string in which the replacement
# must be done.
#
# Example:
# parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7
#
# If fstab contains, say, "commit=3,rw" for this filesystem, then the
# result will be "rw,commit=3".
parse_mount_opts_wfstab () {
L_DEV="$1"
OPT="$2"
shift 2
L_OPTS="$*"
PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)"
# Watch for a default commit in fstab
FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then
# option specified in fstab: extract the value, and use it
echo -n "$PARSEDOPTS1,$OPT="
echo ",$FSTAB_OPTS," | sed \
-e 's/.*,'"$OPT"'=//' \
-e 's/,.*//'
else
# option not specified in fstab: set it to 0
echo "$PARSEDOPTS1,$OPT=0"
fi
}
deduce_fstype () {
MP="$1"
# My root filesystem unfortunately has
# type "unknown" in /etc/mtab. If we encounter
# "unknown", we try to get the type from fstab.
cat /etc/fstab |
grep -v '^#' |
while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_DUMP ; do
if [ "$FSTAB_MP" = "$MP" ]; then
echo $FSTAB_FST
exit 0
fi
done
}
if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then
NOATIME_OPT=",noatime"
fi
case "$1" in
start)
AGE=$((100*$MAX_AGE))
XFS_AGE=$(($XFS_HZ*$MAX_AGE))
echo -n "Starting laptop_mode"
if [ -d /proc/sys/vm/pagebuf ] ; then
# (For 2.4 and early 2.6.)
# This only needs to be set, not reset -- it is only used when
# laptop mode is enabled.
echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age
echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
# (A couple of early 2.6 laptop mode patches had these.)
# The same goes for these.
echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer
echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then
# (2.6.6)
# But not for these -- they are also used in normal
# operation.
echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer
echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval
elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
# (2.6.7 upwards)
# And not for these either. These are in centisecs,
# not USER_HZ, so we have to use $AGE, not $XFS_AGE.
echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs
echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs
echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs
fi
case "$KLEVEL" in
"2.4")
echo 1 > /proc/sys/vm/laptop_mode
echo "30 500 0 0 $AGE $AGE 60 20 0" > /proc/sys/vm/bdflush
;;
"2.6")
echo 5 > /proc/sys/vm/laptop_mode
echo "$AGE" > /proc/sys/vm/dirty_writeback_centisecs
echo "$AGE" > /proc/sys/vm/dirty_expire_centisecs
echo "$DIRTY_RATIO" > /proc/sys/vm/dirty_ratio
echo "$DIRTY_BACKGROUND_RATIO" > /proc/sys/vm/dirty_background_ratio
;;
esac
if [ $DO_REMOUNTS -eq 1 ]; then
cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
PARSEDOPTS="$(parse_mount_opts "$OPTS")"
if [ "$FST" = 'unknown' ]; then
FST=$(deduce_fstype $MP)
fi
case "$FST" in
"ext3")
PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
;;
"xfs")
mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT
;;
esac
if [ -b $DEV ] ; then
blockdev --setra $(($READAHEAD * 2)) $DEV
fi
done
fi
if [ $DO_HD -eq 1 ] ; then
for THISHD in $HD ; do
/sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1
/sbin/hdparm -B 1 $THISHD > /dev/null 2>&1
done
fi
if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
if [ $CPU_MAXFREQ = 'slowest' ]; then
CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq`
fi
echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
fi
echo "."
;;
stop)
U_AGE=$((100*$DEF_UPDATE))
B_AGE=$((100*$DEF_AGE))
echo -n "Stopping laptop_mode"
echo 0 > /proc/sys/vm/laptop_mode
if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
# These need to be restored, if there are no lm_*.
echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER)) > /proc/sys/fs/xfs/age_buffer
echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) > /proc/sys/fs/xfs/sync_interval
elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
# These need to be restored as well.
echo $((100*$DEF_XFS_AGE_BUFFER)) > /proc/sys/fs/xfs/age_buffer_centisecs
echo $((100*$DEF_XFS_SYNC_INTERVAL)) > /proc/sys/fs/xfs/xfssyncd_centisecs
echo $((100*$DEF_XFS_BUFD_INTERVAL)) > /proc/sys/fs/xfs/xfsbufd_centisecs
fi
case "$KLEVEL" in
"2.4")
echo "30 500 0 0 $U_AGE $B_AGE 60 20 0" > /proc/sys/vm/bdflush
;;
"2.6")
echo "$U_AGE" > /proc/sys/vm/dirty_writeback_centisecs
echo "$B_AGE" > /proc/sys/vm/dirty_expire_centisecs
echo "$DEF_DIRTY_RATIO" > /proc/sys/vm/dirty_ratio
echo "$DEF_DIRTY_BACKGROUND_RATIO" > /proc/sys/vm/dirty_background_ratio
;;
esac
if [ $DO_REMOUNTS -eq 1 ] ; then
cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
# Reset commit and atime options to defaults.
if [ "$FST" = 'unknown' ]; then
FST=$(deduce_fstype $MP)
fi
case "$FST" in
"ext3")
PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
;;
"xfs")
PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)"
mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
;;
esac
if [ -b $DEV ] ; then
blockdev --setra 256 $DEV
fi
done
fi
if [ $DO_HD -eq 1 ] ; then
for THISHD in $HD ; do
/sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1
/sbin/hdparm -B 255 $THISHD > /dev/null 2>&1
done
fi
if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
fi
echo "."
;;
*)
echo "Usage: $0 {start|stop}" 2>&1
exit 1
;;
esac
exit 0
ACPI integration
----------------
Dax Kelson submitted this so that the ACPI acpid daemon will
kick off the laptop_mode script and run hdparm. The part that
automatically disables laptop mode when the battery is low was
written by Jan Topinski.
/etc/acpi/events/ac_adapter::
event=ac_adapter
action=/etc/acpi/actions/ac.sh %e
/etc/acpi/events/battery::
event=battery.*
action=/etc/acpi/actions/battery.sh %e
/etc/acpi/actions/ac.sh::
#!/bin/bash
# ac on/offline event handler
status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state`
case $status in
"on-line")
/sbin/laptop_mode stop
exit 0
;;
"off-line")
/sbin/laptop_mode start
exit 0
;;
esac
/etc/acpi/actions/battery.sh::
#! /bin/bash
# Automatically disable laptop mode when the battery almost runs out.
BATT_INFO=/proc/acpi/battery/$2/state
if [[ -f /proc/sys/vm/laptop_mode ]]
then
LM=`cat /proc/sys/vm/laptop_mode`
if [[ $LM -gt 0 ]]
then
if [[ -f $BATT_INFO ]]
then
# Source the config file only now that we know we need
if [ -f /etc/default/laptop-mode ] ; then
# Debian
. /etc/default/laptop-mode
elif [ -f /etc/sysconfig/laptop-mode ] ; then
# Others
. /etc/sysconfig/laptop-mode
fi
MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'}
ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`"
if [[ ACTION -eq "discharging" ]]
then
PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed "s/.* \([0-9][0-9]* \).*/\1/" `
REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed "s/.* \([0-9][0-9]* \).*/\1/" `
fi
if (($REMAINING * 60 / $PRESENT_RATE < $MINIMUM_BATTERY_MINUTES))
then
/sbin/laptop_mode stop
fi
else
logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/battery.sh to set BATT_INFO to the correct path."
fi
fi
fi
Monitoring tool
---------------
Bartek Kania submitted this, it can be used to measure how much time your disk
spends spun up/down. See tools/laptop/dslm/dslm.c

View file

@ -79,6 +79,43 @@ of parametrs except ``enabled`` again. Once the re-reading is done, this
parameter is set as ``N``. If invalid parameters are found while the
re-reading, DAMON_LRU_SORT will be disabled.
active_mem_bp
-------------
Desired active to [in]active memory ratio in bp (1/10,000).
While keeping the caps that set by other quotas, DAMON_LRU_SORT automatically
increases and decreases the effective level of the quota aiming the LRU
[de]prioritizations of the hot and cold memory resulting in this active to
[in]active memory ratio. Value zero means disabling this auto-tuning feature.
Disabled by default.
Auto-tune monitoring intervals
------------------------------
If this parameter is set as ``Y``, DAMON_LRU_SORT automatically tunes DAMON's
sampling and aggregation intervals. The auto-tuning aims to capture meaningful
amount of access events in each DAMON-snapshot, while keeping the sampling
interval 5 milliseconds in minimum, and 10 seconds in maximum. Setting this as
``N`` disables the auto-tuning.
Disabled by default.
filter_young_pages
------------------
Filter [non-]young pages accordingly for LRU [de]prioritizations.
If this is set, check page level access (youngness) once again before each
LRU [de]prioritization operation. LRU prioritization operation is skipped
if the page has not accessed since the last check (not young). LRU
deprioritization operation is skipped if the page has accessed since the
last check (young). The feature is enabled or disabled if this parameter is
set as ``Y`` or ``N``, respectively.
Disabled by default.
hot_thres_access_freq
---------------------

View file

@ -6,6 +6,11 @@ Detailed Usages
DAMON provides below interfaces for different users.
- *Special-purpose DAMON modules.*
:ref:`This <damon_modules_special_purpose>` is for people who are building,
distributing, and/or administrating the kernel with special-purpose DAMON
usages. Using this, users can use DAMON's major features for the given
purposes in build, boot, or runtime in simple ways.
- *DAMON user space tool.*
`This <https://github.com/damonitor/damo>`_ is for privileged people such as
system administrators who want a just-working human-friendly interface.
@ -87,7 +92,7 @@ comma (",").
│ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max
│ │ │ │ │ │ │ :ref:`dests <damon_sysfs_dests>`/nr_dests
│ │ │ │ │ │ │ │ 0/id,weight
│ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds
│ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds,nr_snapshots,max_nr_snapshots
│ │ │ │ │ │ │ :ref:`tried_regions <sysfs_schemes_tried_regions>`/total_bytes
│ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age,sz_filter_passed
│ │ │ │ │ │ │ │ ...
@ -543,10 +548,14 @@ online analysis or tuning of the schemes. Refer to :ref:`design doc
The statistics can be retrieved by reading the files under ``stats`` directory
(``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``,
``sz_ops_filter_passed``, and ``qt_exceeds``), respectively. The files are not
updated in real time, so you should ask DAMON sysfs interface to update the
content of the files for the stats by writing a special keyword,
``update_schemes_stats`` to the relevant ``kdamonds/<N>/state`` file.
``sz_ops_filter_passed``, ``qt_exceeds``, ``nr_snapshots`` and
``max_nr_snapshots``), respectively.
The files are not updated in real time by default. Users should ask DAMON
sysfs interface to periodically update those using ``refresh_ms``, or do a one
time update by writing a special keyword, ``update_schemes_stats`` to the
relevant ``kdamonds/<N>/state`` file. Refer to :ref:`kdamond directory
<sysfs_kdamond>` for more details.
.. _sysfs_schemes_tried_regions:

View file

@ -603,17 +603,18 @@ ZONE_MOVABLE, especially when fine-tuning zone ratios:
memory for metadata and page tables in the direct map; having a lot of offline
memory blocks is not a typical case, though.
- Memory ballooning without balloon compaction is incompatible with
ZONE_MOVABLE. Only some implementations, such as virtio-balloon and
pseries CMM, fully support balloon compaction.
- Memory ballooning without support for balloon memory migration is incompatible
with ZONE_MOVABLE. Only some implementations, such as virtio-balloon and
pseries CMM, fully support balloon memory migration.
Further, the CONFIG_BALLOON_COMPACTION kernel configuration option might be
Further, the CONFIG_BALLOON_MIGRATION kernel configuration option might be
disabled. In that case, balloon inflation will only perform unmovable
allocations and silently create a zone imbalance, usually triggered by
inflation requests from the hypervisor.
- Gigantic pages are unmovable, resulting in user space consuming a
lot of unmovable memory.
- Gigantic pages are unmovable when an architecture does not support
huge page migration and/or the ``movable_gigantic_pages`` sysctl is false.
See Documentation/admin-guide/sysctl/vm.rst for more info on this sysctl.
- Huge pages are unmovable when an architectures does not support huge
page migration, resulting in a similar issue as with gigantic pages.
@ -672,6 +673,15 @@ block might fail:
- Concurrent activity that operates on the same physical memory area, such as
allocating gigantic pages, can result in temporary offlining failures.
- When an admin sets the ``movable_gigantic_pages`` sysctl to true, gigantic
pages are allowed in ZONE_MOVABLE. This only allows migratable gigantic
pages to be allocated; however, if there are no eligible destination gigantic
pages at offline, the offlining operation will fail.
Users leveraging ``movable_gigantic_pages`` should weigh the value of
ZONE_MOVABLE for increasing the reliability of gigantic page allocation
against the potential loss of hot-unplug reliability.
- Out of memory when dissolving huge pages, especially when HugeTLB Vmemmap
Optimization (HVO) is enabled.

View file

@ -41,7 +41,6 @@ Currently, these files are in /proc/sys/vm:
- extfrag_threshold
- highmem_is_dirtyable
- hugetlb_shm_group
- laptop_mode
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
@ -54,6 +53,7 @@ Currently, these files are in /proc/sys/vm:
- mmap_min_addr
- mmap_rnd_bits
- mmap_rnd_compat_bits
- movable_gigantic_pages
- nr_hugepages
- nr_hugepages_mempolicy
- nr_overcommit_hugepages
@ -365,13 +365,6 @@ hugetlb_shm_group contains group id that is allowed to create SysV
shared memory segment using hugetlb page.
laptop_mode
===========
laptop_mode is a knob that controls "laptop mode". All the things that are
controlled by this knob are discussed in Documentation/admin-guide/laptops/laptop-mode.rst.
legacy_va_layout
================
@ -630,6 +623,33 @@ This value can be changed after boot using the
/proc/sys/vm/mmap_rnd_compat_bits tunable
movable_gigantic_pages
======================
This parameter controls whether gigantic pages may be allocated from
ZONE_MOVABLE. If set to non-zero, gigantic pages can be allocated
from ZONE_MOVABLE. ZONE_MOVABLE memory may be created via the kernel
boot parameter `kernelcore` or via memory hotplug as discussed in
Documentation/admin-guide/mm/memory-hotplug.rst.
Support may depend on specific architecture.
Note that using ZONE_MOVABLE gigantic pages make memory hotremove unreliable.
Memory hot-remove operations will block indefinitely until the admin reserves
sufficient gigantic pages to service migration requests associated with the
memory offlining process. As HugeTLB gigantic page reservation is a manual
process (via `nodeN/hugepages/.../nr_hugepages` interfaces) this may not be
obvious when just attempting to offline a block of memory.
Additionally, as multiple gigantic pages may be reserved on a single block,
it may appear that gigantic pages are available for migration when in reality
they are in the process of being removed. For example if `memoryN` contains
two gigantic pages, one reserved and one allocated, and an admin attempts to
offline that block, this operations may hang indefinitely unless another
reserved gigantic page is available on another block `memoryM`.
nr_hugepages
============

View file

@ -130,5 +130,5 @@ More Memory Management Functions
.. kernel-doc:: mm/vmscan.c
.. kernel-doc:: mm/memory_hotplug.c
.. kernel-doc:: mm/mmu_notifier.c
.. kernel-doc:: mm/balloon_compaction.c
.. kernel-doc:: mm/balloon.c
.. kernel-doc:: mm/huge_memory.c

View file

@ -125,7 +125,7 @@ The contiguous memory allocator (CMA) enables reservation of contiguous memory
regions on NUMA nodes during early boot. However, CMA cannot reserve memory
on NUMA nodes that are not online during early boot. ::
void __init hugetlb_cma_reserve(int order) {
void __init hugetlb_cma_reserve(void) {
if (!node_online(nid))
/* do not allow reservations */
}

View file

@ -585,6 +585,10 @@ mechanism tries to make ``current_value`` of ``target_metric`` be same to
specific NUMA node, in bp (1/10,000).
- ``node_memcg_free_bp``: Specific cgroup's node unused memory ratio for a
specific NUMA node, in bp (1/10,000).
- ``active_mem_bp``: Active to active + inactive (LRU) memory size ratio in bp
(1/10,000).
- ``inactive_mem_bp``: Inactive to active + inactive (LRU) memory size ratio in
bp (1/10,000).
``nid`` is optionally required for only ``node_mem_used_bp``,
``node_mem_free_bp``, ``node_memcg_used_bp`` and ``node_memcg_free_bp`` to
@ -718,6 +722,9 @@ scheme's execution.
- ``nr_applied``: Total number of regions that the scheme is applied.
- ``sz_applied``: Total size of regions that the scheme is applied.
- ``qt_exceeds``: Total number of times the quota of the scheme has exceeded.
- ``nr_snapshots``: Total number of DAMON snapshots that the scheme is tried to
be applied.
- ``max_nr_snapshots``: Upper limit of ``nr_snapshots``.
"A scheme is tried to be applied to a region" means DAMOS core logic determined
the region is eligible to apply the scheme's :ref:`action
@ -739,6 +746,10 @@ to exclude anonymous pages and the region has only anonymous pages, or if the
action is ``pageout`` while all pages of the region are unreclaimable, applying
the action to the region will fail.
Unlike normal stats, ``max_nr_snapshots`` is set by users. If it is set as
non-zero and ``nr_snapshots`` be same to or greater than ``nr_snapshots``, the
scheme is deactivated.
To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:s`stats <sysfs_stats>` part of the
documentation.
@ -798,14 +809,16 @@ The ABIs are designed to be used for user space applications development,
rather than human beings' fingers. Human users are recommended to use such
user space tools. One such Python-written user space tool is available at
Github (https://github.com/damonitor/damo), Pypi
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).
(https://pypistats.org/packages/damo), and multiple distros
(https://repology.org/project/damo/versions).
Currently, one module for this type, namely 'DAMON sysfs interface' is
available. Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.
.. _damon_modules_special_purpose:
Special-Purpose Access-aware Kernel Modules
-------------------------------------------
@ -823,5 +836,18 @@ To support such cases, yet more DAMON API user kernel modules that provide more
simple and optimized user space interfaces are available. Currently, two
modules for proactive reclamation and LRU lists manipulation are provided. For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
(:doc:`/admin-guide/mm/damon/stat`, :doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).
Sample DAMON Modules
--------------------
DAMON modules that provides example DAMON kernel API usages.
kernel programmers can build their own special or general purpose DAMON modules
using DAMON kernel API. To help them easily understand how DAMON kernel API
can be used, a few sample modules are provided under ``samples/damon/`` of the
linux source tree. Please note that these modules are not developed for being
used on real products, but only for showing how DAMON kernel API can be used in
simple ways.

View file

@ -4,28 +4,15 @@
DAMON: Data Access MONitoring and Access-aware System Operations
================================================================
DAMON is a Linux kernel subsystem that provides a framework for data access
monitoring and the monitoring results based system operations. The core
monitoring :ref:`mechanisms <damon_design_monitoring>` of DAMON make it
DAMON is a Linux kernel subsystem for efficient :ref:`data access monitoring
<damon_design_monitoring>` and :ref:`access-aware system operations
<damon_design_damos>`. It is designed for being
- *accurate* (the monitoring output is useful enough for DRAM level memory
management; It might not appropriate for CPU Cache levels, though),
- *light-weight* (the monitoring overhead is low enough to be applied online),
and
- *scalable* (the upper-bound of the overhead is in constant range regardless
of the size of target workloads).
Using this framework, therefore, the kernel can operate system in an
access-aware fashion. Because the features are also exposed to the :doc:`user
space </admin-guide/mm/damon/index>`, users who have special information about
their workloads can write personalized applications for better understanding
and optimizations of their workloads and systems.
For easier development of such systems, DAMON provides a feature called
:ref:`DAMOS <damon_design_damos>` (DAMon-based Operation Schemes) in addition
to the monitoring. Using the feature, DAMON users in both kernel and :doc:`user
spaces </admin-guide/mm/damon/index>` can do access-aware system operations
with no code but simple configurations.
- *accurate* (for DRAM level memory management),
- *light-weight* (for production online usages),
- *scalable* (in terms of memory size),
- *tunable* (for flexible usages), and
- *autoamted* (for production operation without manual tunings).
.. toctree::
:maxdepth: 2

View file

@ -3,8 +3,8 @@
DAMON Maintainer Entry Profile
==============================
The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR'
section of 'MAINTAINERS' file.
The DAMON subsystem covers the files that are listed in 'DAMON' section of
'MAINTAINERS' file.
The mailing lists for the subsystem are damon@lists.linux.dev and
linux-mm@kvack.org. Patches should be made against the `mm-new tree
@ -48,8 +48,7 @@ Further doing below and putting the results will be helpful.
- Run `damon-tests/corr
<https://github.com/damonitor/damon-tests/tree/master/corr>`_ for normal
changes.
- Run `damon-tests/perf
<https://github.com/damonitor/damon-tests/tree/master/perf>`_ for performance
- Measure impacts on benchmarks or real world workloads for performance
changes.
Key cycle dates

View file

@ -97,9 +97,6 @@ sections:
`mem_section` objects and the number of rows is calculated to fit
all the memory sections.
The architecture setup code should call sparse_init() to
initialize the memory sections and the memory maps.
With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page` - a "classic sparse" and "sparse
vmemmap". The selection is made at build time and it is determined by

View file

@ -83,8 +83,6 @@ SPARSEMEM模型将物理内存显示为一个部分的集合。一个区段用me
每一行包含价值 `PAGE_SIZE``mem_section` 对象,行数的计算是为了适应所有的
内存区。
架构设置代码应该调用sparse_init()来初始化内存区和内存映射。
通过SPARSEMEM有两种可能的方式将PFN转换为相应的 `struct page` --"classic sparse"和
"sparse vmemmap"。选择是在构建时进行的,它由 `CONFIG_SPARSEMEM_VMEMMAP`
值决定。

View file

@ -16583,6 +16583,17 @@ T: quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
F: mm/
F: tools/mm/
MEMORY MANAGEMENT - BALLOON
M: Andrew Morton <akpm@linux-foundation.org>
M: David Hildenbrand <david@kernel.org>
L: linux-mm@kvack.org
L: virtualization@lists.linux.dev
S: Maintained
W: http://www.linux-mm.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: include/linux/balloon.h
F: mm/balloon.c
MEMORY MANAGEMENT - CORE
M: Andrew Morton <akpm@linux-foundation.org>
M: David Hildenbrand <david@kernel.org>
@ -16810,7 +16821,6 @@ R: Shakeel Butt <shakeel.butt@linux.dev>
R: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
L: linux-mm@kvack.org
S: Maintained
F: mm/pt_reclaim.c
F: mm/vmscan.c
F: mm/workingset.c
@ -27776,9 +27786,7 @@ M: David Hildenbrand <david@kernel.org>
L: virtualization@lists.linux.dev
S: Maintained
F: drivers/virtio/virtio_balloon.c
F: include/linux/balloon_compaction.h
F: include/uapi/linux/virtio_balloon.h
F: mm/balloon_compaction.c
VIRTIO BLOCK AND SCSI DRIVERS
M: "Michael S. Tsirkin" <mst@redhat.com>

View file

@ -38,6 +38,7 @@ config ALPHA
select OLD_SIGSUSPEND
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
select MMU_GATHER_NO_RANGE
select MMU_GATHER_RCU_TABLE_FREE
select SPARSEMEM_EXTREME if SPARSEMEM
select ZONE_DMA
help

View file

@ -11,7 +11,6 @@
#define STRICT_MM_TYPECHECKS
extern void clear_page(void *page);
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)

View file

@ -4,7 +4,7 @@
#include <asm-generic/tlb.h>
#define __pte_free_tlb(tlb, pte, address) pte_free((tlb)->mm, pte)
#define __pmd_free_tlb(tlb, pmd, address) pmd_free((tlb)->mm, pmd)
#define __pte_free_tlb(tlb, pte, address) tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#define __pmd_free_tlb(tlb, pmd, address) tlb_remove_ptdesc((tlb), virt_to_ptdesc(pmd))
#endif

View file

@ -607,7 +607,6 @@ setup_arch(char **cmdline_p)
/* Find our memory. */
setup_memory(kernel_end);
memblock_set_bottom_up(true);
sparse_init();
/* First guess at cpu cache sizes. Do this before init_arch. */
determine_cpu_caches(cpu->type);

View file

@ -208,12 +208,8 @@ callback_init(void * kernel_end)
return kernel_end;
}
/*
* paging_init() sets up the memory map.
*/
void __init paging_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfn)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = {0, };
unsigned long dma_pfn;
dma_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
@ -221,11 +217,13 @@ void __init paging_init(void)
max_zone_pfn[ZONE_DMA] = dma_pfn;
max_zone_pfn[ZONE_NORMAL] = max_pfn;
}
/* Initialize mem_map[]. */
free_area_init(max_zone_pfn);
/* Initialize the kernel's ZERO_PGE. */
/*
* paging_init() initializes the kernel's ZERO_PGE.
*/
void __init paging_init(void)
{
memset(absolute_pointer(ZERO_PGE), 0, PAGE_SIZE);
}

View file

@ -32,6 +32,8 @@ struct page;
void copy_user_highpage(struct page *to, struct page *from,
unsigned long u_vaddr, struct vm_area_struct *vma);
#define clear_user_page clear_user_page
void clear_user_page(void *to, unsigned long u_vaddr, struct page *page);
typedef struct {

View file

@ -75,6 +75,25 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
base, TO_MB(size), !in_use ? "Not used":"");
}
void __init arch_zone_limits_init(unsigned long *max_zone_pfn)
{
/*----------------- node/zones setup --------------------------*/
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
/*
* max_high_pfn should be ok here for both HIGHMEM and HIGHMEM+PAE.
* For HIGHMEM without PAE max_high_pfn should be less than
* min_low_pfn to guarantee that these two regions don't overlap.
* For PAE case highmem is greater than lowmem, so it is natural
* to use max_high_pfn.
*
* In both cases, holes should be handled by pfn_valid().
*/
max_zone_pfn[ZONE_HIGHMEM] = max_high_pfn;
#endif
}
/*
* First memory setup routine called from setup_arch()
* 1. setup swapper's mm @init_mm
@ -83,8 +102,6 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
*/
void __init setup_arch_memory(void)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
setup_initial_init_mm(_text, _etext, _edata, _end);
/* first page of system - kernel .vector starts here */
@ -122,9 +139,6 @@ void __init setup_arch_memory(void)
memblock_dump_all();
/*----------------- node/zones setup --------------------------*/
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
/*
* On ARC (w/o PAE) HIGHMEM addresses are actually smaller (0 based)
@ -139,22 +153,9 @@ void __init setup_arch_memory(void)
min_high_pfn = PFN_DOWN(high_mem_start);
max_high_pfn = PFN_DOWN(high_mem_start + high_mem_sz);
/*
* max_high_pfn should be ok here for both HIGHMEM and HIGHMEM+PAE.
* For HIGHMEM without PAE max_high_pfn should be less than
* min_low_pfn to guarantee that these two regions don't overlap.
* For PAE case highmem is greater than lowmem, so it is natural
* to use max_high_pfn.
*
* In both cases, holes should be handled by pfn_valid().
*/
max_zone_pfn[ZONE_HIGHMEM] = max_high_pfn;
arch_pfn_offset = min(min_low_pfn, min_high_pfn);
kmap_init();
#endif /* CONFIG_HIGHMEM */
free_area_init(max_zone_pfn);
}
void __init arch_mm_preinit(void)

View file

@ -11,7 +11,6 @@
#define clear_page(page) memset((page), 0, PAGE_SIZE)
#define copy_page(to,from) memcpy((to), (from), PAGE_SIZE)
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
/*

View file

@ -15,8 +15,8 @@
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
*/
extern struct page *empty_zero_page;
#define ZERO_PAGE(vaddr) (empty_zero_page)
extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
#endif
#include <asm-generic/pgtable-nopud.h>

View file

@ -107,19 +107,15 @@ void __init setup_dma_zone(const struct machine_desc *mdesc)
#endif
}
static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
unsigned long max_high)
void __init arch_zone_limits_init(unsigned long *max_zone_pfn)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
#ifdef CONFIG_ZONE_DMA
max_zone_pfn[ZONE_DMA] = min(arm_dma_pfn_limit, max_low);
max_zone_pfn[ZONE_DMA] = min(arm_dma_pfn_limit, max_low_pfn);
#endif
max_zone_pfn[ZONE_NORMAL] = max_low;
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
max_zone_pfn[ZONE_HIGHMEM] = max_high;
max_zone_pfn[ZONE_HIGHMEM] = max_pfn;
#endif
free_area_init(max_zone_pfn);
}
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
@ -211,19 +207,6 @@ void __init bootmem_init(void)
early_memtest((phys_addr_t)min_low_pfn << PAGE_SHIFT,
(phys_addr_t)max_low_pfn << PAGE_SHIFT);
/*
* sparse_init() tries to allocate memory from memblock, so must be
* done after the fixed reservations
*/
sparse_init();
/*
* Now free the memory - free_area_init needs
* the sparse mem_map arrays initialized by sparse_init()
* for memmap_init_zone(), otherwise all PFNs are invalid.
*/
zone_sizes_init(min_low_pfn, max_low_pfn, max_pfn);
}
/*

View file

@ -45,7 +45,7 @@ extern unsigned long __atags_pointer;
* empty_zero_page is a special page that is used for
* zero-initialized data and COW.
*/
struct page *empty_zero_page;
unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
EXPORT_SYMBOL(empty_zero_page);
/*
@ -1754,8 +1754,6 @@ static void __init early_fixmap_shutdown(void)
*/
void __init paging_init(const struct machine_desc *mdesc)
{
void *zero_page;
#ifdef CONFIG_XIP_KERNEL
/* Store the kernel RW RAM region start/end in these variables */
kernel_sec_start = CONFIG_PHYS_OFFSET & SECTION_MASK;
@ -1781,13 +1779,7 @@ void __init paging_init(const struct machine_desc *mdesc)
top_pmd = pmd_off_k(0xffff0000);
/* allocate the zero page. */
zero_page = early_alloc(PAGE_SIZE);
bootmem_init();
empty_zero_page = virt_to_page(zero_page);
__flush_dcache_folio(NULL, page_folio(empty_zero_page));
}
void __init early_mm_init(const struct machine_desc *mdesc)

View file

@ -31,7 +31,7 @@ unsigned long vectors_base;
* empty_zero_page is a special page that is used for
* zero-initialized data and COW.
*/
struct page *empty_zero_page;
unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
EXPORT_SYMBOL(empty_zero_page);
#ifdef CONFIG_ARM_MPU
@ -156,18 +156,10 @@ void __init adjust_lowmem_bounds(void)
*/
void __init paging_init(const struct machine_desc *mdesc)
{
void *zero_page;
early_trap_init((void *)vectors_base);
mpu_setup();
/* allocate the zero page. */
zero_page = (void *)memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
bootmem_init();
empty_zero_page = virt_to_page(zero_page);
flush_dcache_page(empty_zero_page);
}
/*

View file

@ -35,6 +35,7 @@ config ARM64
select ARCH_HAS_KCOV
select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_LAZY_MMU_MODE
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEM_ENCRYPT
select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS

View file

@ -56,8 +56,6 @@ extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
#define __HAVE_ARCH_HUGE_PTEP_GET
extern pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
void __init arm64_hugetlb_cma_reserve(void);
#define huge_ptep_modify_prot_start huge_ptep_modify_prot_start
extern pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep);

View file

@ -36,7 +36,6 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
bool tag_clear_highpages(struct page *to, int numpages);
#define __HAVE_ARCH_TAG_CLEAR_HIGHPAGES
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
typedef struct page *pgtable_t;

View file

@ -62,61 +62,26 @@ static inline void emit_pte_barriers(void)
static inline void queue_pte_barriers(void)
{
unsigned long flags;
if (in_interrupt()) {
emit_pte_barriers();
return;
}
flags = read_thread_flags();
if (flags & BIT(TIF_LAZY_MMU)) {
if (is_lazy_mmu_mode_active()) {
/* Avoid the atomic op if already set. */
if (!(flags & BIT(TIF_LAZY_MMU_PENDING)))
if (!test_thread_flag(TIF_LAZY_MMU_PENDING))
set_thread_flag(TIF_LAZY_MMU_PENDING);
} else {
emit_pte_barriers();
}
}
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
static inline void arch_enter_lazy_mmu_mode(void)
{
/*
* lazy_mmu_mode is not supposed to permit nesting. But in practice this
* does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation
* inside a lazy_mmu_mode section (such as zap_pte_range()) will change
* permissions on the linear map with apply_to_page_range(), which
* re-enters lazy_mmu_mode. So we tolerate nesting in our
* implementation. The first call to arch_leave_lazy_mmu_mode() will
* flush and clear the flag such that the remainder of the work in the
* outer nest behaves as if outside of lazy mmu mode. This is safe and
* keeps tracking simple.
*/
if (in_interrupt())
return;
set_thread_flag(TIF_LAZY_MMU);
}
static inline void arch_enter_lazy_mmu_mode(void) {}
static inline void arch_flush_lazy_mmu_mode(void)
{
if (in_interrupt())
return;
if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
emit_pte_barriers();
}
static inline void arch_leave_lazy_mmu_mode(void)
{
if (in_interrupt())
return;
arch_flush_lazy_mmu_mode();
clear_thread_flag(TIF_LAZY_MMU);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@ -708,22 +673,24 @@ static inline pgprot_t pud_pgprot(pud_t pud)
return __pgprot(pud_val(pfn_pud(pfn, __pgprot(0))) ^ pud_val(pud));
}
static inline void __set_ptes_anysz(struct mm_struct *mm, pte_t *ptep,
pte_t pte, unsigned int nr,
static inline void __set_ptes_anysz(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr,
unsigned long pgsize)
{
unsigned long stride = pgsize >> PAGE_SHIFT;
switch (pgsize) {
case PAGE_SIZE:
page_table_check_ptes_set(mm, ptep, pte, nr);
page_table_check_ptes_set(mm, addr, ptep, pte, nr);
break;
case PMD_SIZE:
page_table_check_pmds_set(mm, (pmd_t *)ptep, pte_pmd(pte), nr);
page_table_check_pmds_set(mm, addr, (pmd_t *)ptep,
pte_pmd(pte), nr);
break;
#ifndef __PAGETABLE_PMD_FOLDED
case PUD_SIZE:
page_table_check_puds_set(mm, (pud_t *)ptep, pte_pud(pte), nr);
page_table_check_puds_set(mm, addr, (pud_t *)ptep,
pte_pud(pte), nr);
break;
#endif
default:
@ -744,26 +711,23 @@ static inline void __set_ptes_anysz(struct mm_struct *mm, pte_t *ptep,
__set_pte_complete(pte);
}
static inline void __set_ptes(struct mm_struct *mm,
unsigned long __always_unused addr,
static inline void __set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
__set_ptes_anysz(mm, ptep, pte, nr, PAGE_SIZE);
__set_ptes_anysz(mm, addr, ptep, pte, nr, PAGE_SIZE);
}
static inline void __set_pmds(struct mm_struct *mm,
unsigned long __always_unused addr,
static inline void __set_pmds(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd, unsigned int nr)
{
__set_ptes_anysz(mm, (pte_t *)pmdp, pmd_pte(pmd), nr, PMD_SIZE);
__set_ptes_anysz(mm, addr, (pte_t *)pmdp, pmd_pte(pmd), nr, PMD_SIZE);
}
#define set_pmd_at(mm, addr, pmdp, pmd) __set_pmds(mm, addr, pmdp, pmd, 1)
static inline void __set_puds(struct mm_struct *mm,
unsigned long __always_unused addr,
static inline void __set_puds(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud, unsigned int nr)
{
__set_ptes_anysz(mm, (pte_t *)pudp, pud_pte(pud), nr, PUD_SIZE);
__set_ptes_anysz(mm, addr, (pte_t *)pudp, pud_pte(pud), nr, PUD_SIZE);
}
#define set_pud_at(mm, addr, pudp, pud) __set_puds(mm, addr, pudp, pud, 1)
@ -1301,17 +1265,17 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
#endif
#ifdef CONFIG_PAGE_TABLE_CHECK
static inline bool pte_user_accessible_page(pte_t pte)
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
return pte_valid(pte) && (pte_user(pte) || pte_user_exec(pte));
}
static inline bool pmd_user_accessible_page(pmd_t pmd)
static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
{
return pmd_valid(pmd) && !pmd_table(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
}
static inline bool pud_user_accessible_page(pud_t pud)
static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
{
return pud_valid(pud) && !pud_table(pud) && (pud_user(pud) || pud_user_exec(pud));
}
@ -1370,6 +1334,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */
static inline pte_t __ptep_get_and_clear_anysz(struct mm_struct *mm,
unsigned long address,
pte_t *ptep,
unsigned long pgsize)
{
@ -1377,14 +1342,14 @@ static inline pte_t __ptep_get_and_clear_anysz(struct mm_struct *mm,
switch (pgsize) {
case PAGE_SIZE:
page_table_check_pte_clear(mm, pte);
page_table_check_pte_clear(mm, address, pte);
break;
case PMD_SIZE:
page_table_check_pmd_clear(mm, pte_pmd(pte));
page_table_check_pmd_clear(mm, address, pte_pmd(pte));
break;
#ifndef __PAGETABLE_PMD_FOLDED
case PUD_SIZE:
page_table_check_pud_clear(mm, pte_pud(pte));
page_table_check_pud_clear(mm, address, pte_pud(pte));
break;
#endif
default:
@ -1397,7 +1362,7 @@ static inline pte_t __ptep_get_and_clear_anysz(struct mm_struct *mm,
static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
return __ptep_get_and_clear_anysz(mm, ptep, PAGE_SIZE);
return __ptep_get_and_clear_anysz(mm, address, ptep, PAGE_SIZE);
}
static inline void __clear_full_ptes(struct mm_struct *mm, unsigned long addr,
@ -1436,7 +1401,7 @@ static inline pte_t __get_and_clear_full_ptes(struct mm_struct *mm,
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long address, pmd_t *pmdp)
{
return pte_pmd(__ptep_get_and_clear_anysz(mm, (pte_t *)pmdp, PMD_SIZE));
return pte_pmd(__ptep_get_and_clear_anysz(mm, address, (pte_t *)pmdp, PMD_SIZE));
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
@ -1525,7 +1490,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
{
page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
}
#endif

View file

@ -84,8 +84,7 @@ void arch_setup_new_exec(void);
#define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */
#define TIF_KERNEL_FPSTATE 29 /* Task is in a kernel mode FPSIMD section */
#define TIF_TSC_SIGSEGV 30 /* SIGSEGV on counter-timer access */
#define TIF_LAZY_MMU 31 /* Task in lazy mmu mode */
#define TIF_LAZY_MMU_PENDING 32 /* Ops pending for lazy mmu mode exit */
#define TIF_LAZY_MMU_PENDING 31 /* Ops pending for lazy mmu mode exit */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)

View file

@ -36,16 +36,12 @@
* huge pages could still be served from those areas.
*/
#ifdef CONFIG_CMA
void __init arm64_hugetlb_cma_reserve(void)
unsigned int arch_hugetlb_cma_order(void)
{
int order;
if (pud_sect_supported())
order = PUD_SHIFT - PAGE_SHIFT;
else
order = CONT_PMD_SHIFT - PAGE_SHIFT;
return PUD_SHIFT - PAGE_SHIFT;
hugetlb_cma_reserve(order);
return CONT_PMD_SHIFT - PAGE_SHIFT;
}
#endif /* CONFIG_CMA */
@ -159,11 +155,12 @@ static pte_t get_clear_contig(struct mm_struct *mm,
pte_t pte, tmp_pte;
bool present;
pte = __ptep_get_and_clear_anysz(mm, ptep, pgsize);
pte = __ptep_get_and_clear_anysz(mm, addr, ptep, pgsize);
present = pte_present(pte);
while (--ncontig) {
ptep++;
tmp_pte = __ptep_get_and_clear_anysz(mm, ptep, pgsize);
addr += pgsize;
tmp_pte = __ptep_get_and_clear_anysz(mm, addr, ptep, pgsize);
if (present) {
if (pte_dirty(tmp_pte))
pte = pte_mkdirty(pte);
@ -207,7 +204,7 @@ static void clear_flush(struct mm_struct *mm,
unsigned long i, saddr = addr;
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
__ptep_get_and_clear_anysz(mm, ptep, pgsize);
__ptep_get_and_clear_anysz(mm, addr, ptep, pgsize);
if (mm == &init_mm)
flush_tlb_kernel_range(saddr, addr);
@ -225,8 +222,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
ncontig = num_contig_ptes(sz, &pgsize);
if (!pte_present(pte)) {
for (i = 0; i < ncontig; i++, ptep++)
__set_ptes_anysz(mm, ptep, pte, 1, pgsize);
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
__set_ptes_anysz(mm, addr, ptep, pte, 1, pgsize);
return;
}
@ -234,7 +231,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
if (pte_cont(pte) && pte_valid(__ptep_get(ptep)))
clear_flush(mm, addr, ptep, pgsize, ncontig);
__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
__set_ptes_anysz(mm, addr, ptep, pte, ncontig, pgsize);
}
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
@ -449,7 +446,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
if (pte_young(orig_pte))
pte = pte_mkyoung(pte);
__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
__set_ptes_anysz(mm, addr, ptep, pte, ncontig, pgsize);
return 1;
}
@ -473,7 +470,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
pte = pte_wrprotect(pte);
__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
__set_ptes_anysz(mm, addr, ptep, pte, ncontig, pgsize);
}
pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,

View file

@ -118,9 +118,22 @@ static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
}
static void __init zone_sizes_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
phys_addr_t __maybe_unused dma32_phys_limit =
max_zone_phys(DMA_BIT_MASK(32));
#ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_phys(zone_dma_limit));
#endif
#ifdef CONFIG_ZONE_DMA32
max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
#endif
max_zone_pfns[ZONE_NORMAL] = max_pfn;
}
static void __init dma_limits_init(void)
{
unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
phys_addr_t __maybe_unused acpi_zone_dma_limit;
phys_addr_t __maybe_unused dt_zone_dma_limit;
phys_addr_t __maybe_unused dma32_phys_limit =
@ -139,18 +152,13 @@ static void __init zone_sizes_init(void)
if (memblock_start_of_DRAM() < U32_MAX)
zone_dma_limit = min(zone_dma_limit, U32_MAX);
arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
#endif
#ifdef CONFIG_ZONE_DMA32
max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
if (!arm64_dma_phys_limit)
arm64_dma_phys_limit = dma32_phys_limit;
#endif
if (!arm64_dma_phys_limit)
arm64_dma_phys_limit = PHYS_MASK + 1;
max_zone_pfns[ZONE_NORMAL] = max_pfn;
free_area_init(max_zone_pfns);
}
int pfn_is_map_memory(unsigned long pfn)
@ -303,23 +311,8 @@ void __init bootmem_init(void)
arch_numa_init();
/*
* must be done after arch_numa_init() which calls numa_init() to
* initialize node_online_map that gets used in hugetlb_cma_reserve()
* while allocating required CMA size across online nodes.
*/
#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
arm64_hugetlb_cma_reserve();
#endif
kvm_hyp_reserve();
/*
* sparse_init() tries to allocate memory from memblock, so must be
* done after the fixed reservations
*/
sparse_init();
zone_sizes_init();
dma_limits_init();
/*
* Reserve the CMA area after arm64_dma_phys_limit was initialised.

View file

@ -800,7 +800,7 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
return -EINVAL;
mutex_lock(&pgtable_split_lock);
arch_enter_lazy_mmu_mode();
lazy_mmu_mode_enable();
/*
* The split_kernel_leaf_mapping_locked() may sleep, it is not a
@ -822,7 +822,7 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
ret = split_kernel_leaf_mapping_locked(end);
}
arch_leave_lazy_mmu_mode();
lazy_mmu_mode_disable();
mutex_unlock(&pgtable_split_lock);
return ret;
}
@ -883,10 +883,10 @@ static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp
{
int ret;
arch_enter_lazy_mmu_mode();
lazy_mmu_mode_enable();
ret = walk_kernel_page_table_range_lockless(start, end,
&split_to_ptes_ops, NULL, &gfp);
arch_leave_lazy_mmu_mode();
lazy_mmu_mode_disable();
return ret;
}

View file

@ -110,7 +110,7 @@ static int update_range_prot(unsigned long start, unsigned long size,
if (WARN_ON_ONCE(ret))
return ret;
arch_enter_lazy_mmu_mode();
lazy_mmu_mode_enable();
/*
* The caller must ensure that the range we are operating on does not
@ -119,7 +119,7 @@ static int update_range_prot(unsigned long start, unsigned long size,
*/
ret = walk_kernel_page_table_range_lockless(start, start + size,
&pageattr_ops, NULL, &data);
arch_leave_lazy_mmu_mode();
lazy_mmu_mode_disable();
return ret;
}

View file

@ -10,6 +10,7 @@ static inline unsigned long pages_do_alias(unsigned long addr1,
return (addr1 ^ addr2) & (SHMLBA-1);
}
#define clear_user_page clear_user_page
static inline void clear_user_page(void *addr, unsigned long vaddr,
struct page *page)
{

View file

@ -1,11 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
static inline void clear_user_page(void *addr, unsigned long vaddr,
struct page *page)
{
clear_page(addr);
}
static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
struct page *page)
{

View file

@ -51,11 +51,18 @@ disable:
}
#endif
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
#endif
}
static void __init csky_memblock_init(void)
{
unsigned long lowmem_size = PFN_DOWN(LOWMEM_LIMIT - PHYS_OFFSET_OFFSET);
unsigned long sseg_size = PFN_DOWN(SSEG_SIZE - PHYS_OFFSET_OFFSET);
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
signed long size;
memblock_reserve(__pa(_start), _end - _start);
@ -83,12 +90,9 @@ static void __init csky_memblock_init(void)
setup_initrd();
#endif
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
mmu_init(min_low_pfn, max_low_pfn);
#ifdef CONFIG_HIGHMEM
max_zone_pfn[ZONE_HIGHMEM] = max_pfn;
highstart_pfn = max_low_pfn;
highend_pfn = max_pfn;
@ -96,8 +100,6 @@ static void __init csky_memblock_init(void)
memblock_set_current_limit(PFN_PHYS(max_low_pfn));
dma_contiguous_reserve(0);
free_area_init(max_zone_pfn);
}
void __init setup_arch(char **cmdline_p)
@ -121,8 +123,6 @@ void __init setup_arch(char **cmdline_p)
setup_smp();
#endif
sparse_init();
fixaddr_init();
#ifdef CONFIG_HIGHMEM

View file

@ -113,7 +113,6 @@ static inline void clear_page(void *page)
/*
* Under assumption that kernel always "sees" user map...
*/
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
static inline unsigned long virt_to_pfn(const void *kaddr)

View file

@ -54,17 +54,8 @@ void sync_icache_dcache(pte_t pte)
__vmcache_idsync(addr, PAGE_SIZE);
}
/*
* In order to set up page allocator "nodes",
* somebody has to call free_area_init() for UMA.
*
* In this mode, we only have one pg_data_t
* structure: contig_mem_data.
*/
static void __init paging_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = {0, };
/*
* This is not particularly well documented anywhere, but
* give ZONE_NORMAL all the memory, including the big holes
@ -72,11 +63,11 @@ static void __init paging_init(void)
* in the bootmem_map; free_area_init should see those bits and
* adjust accordingly.
*/
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
}
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
free_area_init(max_zone_pfn); /* sets up the zonelists and mem_map */
static void __init paging_init(void)
{
/*
* Set the init_mm descriptors "context" value to point to the
* initial kernel segment table's physical address.

View file

@ -187,6 +187,7 @@ config LOONGARCH
select IRQ_LOONGARCH_CPU
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_MERGE_VMAS if MMU
select MMU_GATHER_RCU_TABLE_FREE
select MODULES_USE_ELF_RELA if MODULES
select NEED_PER_CPU_EMBED_FIRST_CHUNK
select NEED_PER_CPU_PAGE_FIRST_CHUNK

View file

@ -30,7 +30,6 @@
extern void clear_page(void *page);
extern void copy_page(void *to, void *from);
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
extern unsigned long shm_align_mask;

View file

@ -55,8 +55,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
return pte;
}
#define __pte_free_tlb(tlb, pte, address) \
tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#define __pte_free_tlb(tlb, pte, address) tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#ifndef __PAGETABLE_PMD_FOLDED
@ -79,7 +78,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
return pmd;
}
#define __pmd_free_tlb(tlb, x, addr) pmd_free((tlb)->mm, x)
#define __pmd_free_tlb(tlb, x, addr) tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
#endif
@ -99,7 +98,7 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
return pud;
}
#define __pud_free_tlb(tlb, x, addr) pud_free((tlb)->mm, x)
#define __pud_free_tlb(tlb, x, addr) tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
#endif /* __PAGETABLE_PUD_FOLDED */

View file

@ -353,8 +353,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return pte;
}
extern void paging_init(void);
#define pte_none(pte) (!(pte_val(pte) & ~_PAGE_GLOBAL))
#define pte_present(pte) (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROTNONE))
#define pte_no_exec(pte) (pte_val(pte) & _PAGE_NO_EXEC)

View file

@ -402,14 +402,6 @@ static void __init arch_mem_init(char **cmdline_p)
check_kernel_sections_mem();
/*
* In order to reduce the possibility of kernel panic when failed to
* get IO TLB memory under CONFIG_SWIOTLB, it is better to allocate
* low memory as small as possible before swiotlb_init(), so make
* sparse_init() using top-down allocation.
*/
memblock_set_bottom_up(false);
sparse_init();
memblock_set_bottom_up(true);
swiotlb_init(true, SWIOTLB_VERBOSE);
@ -621,8 +613,6 @@ void __init setup_arch(char **cmdline_p)
prefill_possible_map();
#endif
paging_init();
#ifdef CONFIG_KASAN
kasan_init();
#endif

View file

@ -60,16 +60,12 @@ int __ref page_is_ram(unsigned long pfn)
return memblock_is_memory(addr) && !memblock_is_reserved(addr);
}
void __init paging_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long max_zone_pfns[MAX_NR_ZONES];
#ifdef CONFIG_ZONE_DMA32
max_zone_pfns[ZONE_DMA32] = MAX_DMA32_PFN;
#endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
free_area_init(max_zone_pfns);
}
void __ref free_initmem(void)

View file

@ -10,7 +10,6 @@ extern unsigned long memory_end;
#define clear_page(page) memset((page), 0, PAGE_SIZE)
#define copy_page(to,from) memcpy((to), (from), PAGE_SIZE)
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
#define vma_alloc_zeroed_movable_folio(vma, vaddr) \

View file

@ -40,6 +40,11 @@
void *empty_zero_page;
EXPORT_SYMBOL(empty_zero_page);
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
max_zone_pfns[ZONE_DMA] = PFN_DOWN(memblock_end_of_DRAM());
}
#ifdef CONFIG_MMU
int m68k_virt_to_node_shift;
@ -64,13 +69,10 @@ void __init paging_init(void)
* page_alloc get different views of the world.
*/
unsigned long end_mem = memory_end & PAGE_MASK;
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0, };
high_memory = (void *) end_mem;
empty_zero_page = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
max_zone_pfn[ZONE_DMA] = end_mem >> PAGE_SHIFT;
free_area_init(max_zone_pfn);
}
#endif /* CONFIG_MMU */

View file

@ -39,7 +39,6 @@ void __init paging_init(void)
pte_t *pg_table;
unsigned long address, size;
unsigned long next_pgtable;
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
int i;
empty_zero_page = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
@ -73,8 +72,6 @@ void __init paging_init(void)
}
current->mm = NULL;
max_zone_pfn[ZONE_DMA] = PFN_DOWN(_ramend);
free_area_init(max_zone_pfn);
}
int cf_tlb_miss(struct pt_regs *regs, int write, int dtlb, int extension_word)

View file

@ -429,7 +429,6 @@ DECLARE_VM_GET_PAGE_PROT
*/
void __init paging_init(void)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0, };
unsigned long min_addr, max_addr;
unsigned long addr;
int i;
@ -511,12 +510,9 @@ void __init paging_init(void)
set_fc(USER_DATA);
#ifdef DEBUG
printk ("before free_area_init\n");
printk ("before node_set_state\n");
#endif
for (i = 0; i < m68k_num_memory; i++)
if (node_present_pages(i))
node_set_state(i, N_NORMAL_MEMORY);
max_zone_pfn[ZONE_DMA] = memblock_end_of_DRAM();
free_area_init(max_zone_pfn);
}

View file

@ -41,7 +41,6 @@ void __init paging_init(void)
unsigned long address;
unsigned long next_pgtable;
unsigned long bootmem_end;
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0, };
unsigned long size;
empty_zero_page = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
@ -80,14 +79,6 @@ void __init paging_init(void)
mmu_emu_init(bootmem_end);
current->mm = NULL;
/* memory sizing is a hack stolen from motorola.c.. hope it works for us */
max_zone_pfn[ZONE_DMA] = ((unsigned long)high_memory) >> PAGE_SHIFT;
/* I really wish I knew why the following change made things better... -- Sam */
free_area_init(max_zone_pfn);
}
static const pgprot_t protection_map[16] = {

View file

@ -45,7 +45,6 @@ typedef unsigned long pte_basic_t;
# define copy_page(to, from) memcpy((to), (from), PAGE_SIZE)
# define clear_page(pgaddr) memset((pgaddr), 0, PAGE_SIZE)
# define clear_user_page(pgaddr, vaddr, page) memset((pgaddr), 0, PAGE_SIZE)
# define copy_user_page(vto, vfrom, vaddr, topg) \
memcpy((vto), (vfrom), PAGE_SIZE)

View file

@ -54,32 +54,30 @@ static void __init highmem_init(void)
}
#endif /* CONFIG_HIGHMEM */
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
#ifdef CONFIG_HIGHMEM
max_zone_pfns[ZONE_DMA] = max_low_pfn;
max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
#else
max_zone_pfns[ZONE_DMA] = max_pfn;
#endif
}
/*
* paging_init() sets up the page tables - in fact we've already done this.
*/
static void __init paging_init(void)
{
unsigned long zones_size[MAX_NR_ZONES];
int idx;
/* Setup fixmaps */
for (idx = 0; idx < __end_of_fixed_addresses; idx++)
clear_fixmap(idx);
/* Clean every zones */
memset(zones_size, 0, sizeof(zones_size));
#ifdef CONFIG_HIGHMEM
highmem_init();
zones_size[ZONE_DMA] = max_low_pfn;
zones_size[ZONE_HIGHMEM] = max_pfn;
#else
zones_size[ZONE_DMA] = max_pfn;
#endif
/* We don't have holes in memory map */
free_area_init(zones_size);
}
void __init setup_memory(void)

View file

@ -99,6 +99,7 @@ config MIPS
select IRQ_FORCED_THREADING
select ISA if EISA
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_RCU_TABLE_FREE
select MODULES_USE_ELF_REL if MODULES
select MODULES_USE_ELF_RELA if MODULES && 64BIT
select PERF_USE_VMALLOC

View file

@ -90,6 +90,7 @@ static inline void clear_user_page(void *addr, unsigned long vaddr,
if (pages_do_alias((unsigned long) addr, vaddr & PAGE_MASK))
flush_data_cache_page((unsigned long)addr);
}
#define clear_user_page clear_user_page
struct vm_area_struct;
extern void copy_user_highpage(struct page *to, struct page *from,

View file

@ -48,8 +48,7 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
extern void pgd_init(void *addr);
extern pgd_t *pgd_alloc(struct mm_struct *mm);
#define __pte_free_tlb(tlb, pte, address) \
tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#define __pte_free_tlb(tlb, pte, address) tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#ifndef __PAGETABLE_PMD_FOLDED
@ -72,7 +71,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
return pmd;
}
#define __pmd_free_tlb(tlb, x, addr) pmd_free((tlb)->mm, x)
#define __pmd_free_tlb(tlb, x, addr) tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
#endif
@ -97,10 +96,8 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
set_p4d(p4d, __p4d((unsigned long)pud));
}
#define __pud_free_tlb(tlb, x, addr) pud_free((tlb)->mm, x)
#define __pud_free_tlb(tlb, x, addr) tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
#endif /* __PAGETABLE_PUD_FOLDED */
extern void pagetable_init(void);
#endif /* _ASM_PGALLOC_H */

View file

@ -56,7 +56,7 @@ extern unsigned long zero_page_mask;
(virt_to_page((void *)(empty_zero_page + (((unsigned long)(vaddr)) & zero_page_mask))))
#define __HAVE_COLOR_ZERO_PAGE
extern void paging_init(void);
extern void pagetable_init(void);
/*
* Conversion functions: convert a page and protection to a page entry,

View file

@ -614,8 +614,7 @@ static void __init bootcmdline_init(void)
* kernel but generic memory management system is still entirely uninitialized.
*
* o bootmem_init()
* o sparse_init()
* o paging_init()
* o pagetable_init()
* o dma_contiguous_reserve()
*
* At this stage the bootmem allocator is ready to use.
@ -665,16 +664,6 @@ static void __init arch_mem_init(char **cmdline_p)
mips_parse_crashkernel();
device_tree_init();
/*
* In order to reduce the possibility of kernel panic when failed to
* get IO TLB memory under CONFIG_SWIOTLB, it is better to allocate
* low memory as small as possible before plat_swiotlb_setup(), so
* make sparse_init() using top-down allocation.
*/
memblock_set_bottom_up(false);
sparse_init();
memblock_set_bottom_up(true);
plat_swiotlb_setup();
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
@ -789,7 +778,7 @@ void __init setup_arch(char **cmdline_p)
prefill_possible_map();
cpu_cache_init();
paging_init();
pagetable_init();
memblock_dump_all();

View file

@ -154,14 +154,10 @@ static __init void prom_meminit(void)
}
}
void __init paging_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long zones_size[MAX_NR_ZONES] = {0, };
pagetable_init();
zones_size[ZONE_DMA32] = MAX_DMA32_PFN;
zones_size[ZONE_NORMAL] = max_low_pfn;
free_area_init(zones_size);
max_zone_pfns[ZONE_DMA32] = MAX_DMA32_PFN;
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
}
/* All PCI device belongs to logical Node-0 */

View file

@ -394,12 +394,8 @@ void maar_init(void)
}
#ifndef CONFIG_NUMA
void __init paging_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long max_zone_pfns[MAX_NR_ZONES];
pagetable_init();
#ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN;
#endif
@ -417,8 +413,6 @@ void __init paging_init(void)
max_zone_pfns[ZONE_HIGHMEM] = max_low_pfn;
}
#endif
free_area_init(max_zone_pfns);
}
#ifdef CONFIG_64BIT

View file

@ -406,11 +406,7 @@ void __init prom_meminit(void)
}
}
void __init paging_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long zones_size[MAX_NR_ZONES] = {0, };
pagetable_init();
zones_size[ZONE_NORMAL] = max_low_pfn;
free_area_init(zones_size);
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
}

View file

@ -45,6 +45,7 @@
struct page;
#define clear_user_page clear_user_page
extern void clear_user_page(void *addr, unsigned long vaddr, struct page *page);
extern void copy_user_page(void *vto, void *vfrom, unsigned long vaddr,
struct page *to);

View file

@ -38,6 +38,11 @@
pgd_t *pgd_current;
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
}
/*
* paging_init() continues the virtual memory environment setup which
* was begun by the code in arch/head.S.
@ -46,16 +51,9 @@ pgd_t *pgd_current;
*/
void __init paging_init(void)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
pagetable_init();
pgd_current = swapper_pg_dir;
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
/* pass the memory from the bootmem allocator to the main allocator */
free_area_init(max_zone_pfn);
flush_dcache_range((unsigned long)empty_zero_page,
(unsigned long)empty_zero_page + PAGE_SIZE);
}

View file

@ -30,7 +30,6 @@
#define clear_page(page) memset((page), 0, PAGE_SIZE)
#define copy_page(to, from) memcpy((to), (from), PAGE_SIZE)
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
/*

View file

@ -39,16 +39,12 @@
int mem_init_done;
static void __init zone_sizes_init(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
/*
* We use only ZONE_NORMAL
*/
max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
free_area_init(max_zone_pfn);
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
}
extern const char _s_kernel_ro[], _e_kernel_ro[];
@ -141,8 +137,6 @@ void __init paging_init(void)
map_ram();
zone_sizes_init();
/* self modifying code ;) */
/* Since the old TLB miss handler has been running up until now,
* the kernel pages are still all RW, so we can still modify the

View file

@ -79,6 +79,7 @@ config PARISC
select GENERIC_CLOCKEVENTS
select CPU_NO_EFFICIENT_FFS
select THREAD_INFO_IN_TASK
select MMU_GATHER_RCU_TABLE_FREE
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
select HAVE_ARCH_KGDB

View file

@ -21,7 +21,6 @@ struct vm_area_struct;
void clear_page_asm(void *page);
void copy_page_asm(void *to, void *from);
#define clear_user_page(vto, vaddr, page) clear_page_asm(vto)
void copy_user_highpage(struct page *to, struct page *from, unsigned long vaddr,
struct vm_area_struct *vma);
#define __HAVE_ARCH_COPY_USER_HIGHPAGE

View file

@ -5,8 +5,8 @@
#include <asm-generic/tlb.h>
#if CONFIG_PGTABLE_LEVELS == 3
#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
#define __pmd_free_tlb(tlb, pmd, addr) tlb_remove_ptdesc((tlb), virt_to_ptdesc(pmd))
#endif
#define __pte_free_tlb(tlb, pte, addr) pte_free((tlb)->mm, pte)
#define __pte_free_tlb(tlb, pte, addr) tlb_remove_ptdesc((tlb), page_ptdesc(pte))
#endif

View file

@ -693,13 +693,9 @@ static void __init fixmap_init(void)
} while (addr < end);
}
static void __init parisc_bootmem_free(void)
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0, };
max_zone_pfn[0] = memblock_end_of_DRAM();
free_area_init(max_zone_pfn);
max_zone_pfns[ZONE_NORMAL] = PFN_DOWN(memblock_end_of_DRAM());
}
void __init paging_init(void)
@ -710,9 +706,6 @@ void __init paging_init(void)
fixmap_init();
flush_cache_all_local(); /* start with known state */
flush_tlb_all_local(NULL);
sparse_init();
parisc_bootmem_free();
}
static void alloc_btlb(unsigned long start, unsigned long end, int *slot,

View file

@ -172,6 +172,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOC if PPC_BOOK3S || PPC_8xx
select ARCH_SUPPORTS_PAGE_TABLE_CHECK if !HUGETLB_PAGE
select ARCH_SUPPORTS_SCHED_MC if SMP
select ARCH_SUPPORTS_SCHED_SMT if PPC64 && SMP
select SCHED_MC if ARCH_SUPPORTS_SCHED_MC
@ -304,6 +305,7 @@ config PPC
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_PAGE_SIZE
select MMU_GATHER_RCU_TABLE_FREE
select HAVE_ARCH_TLB_REMOVE_TABLE
select MMU_GATHER_MERGE_VMAS
select MMU_LAZY_TLB_SHOOTDOWN if PPC_BOOK3S_64
select MODULES_USE_ELF_RELA

View file

@ -198,6 +198,7 @@ void unmap_kernel_page(unsigned long va);
#ifndef __ASSEMBLER__
#include <linux/sched.h>
#include <linux/threads.h>
#include <linux/page_table_check.h>
/* Bits to mask out from a PGD to get to the PUD page */
#define PGD_MASKED_BITS 0
@ -311,7 +312,11 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
{
return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
page_table_check_pte_clear(mm, addr, old_pte);
return old_pte;
}
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
@ -433,6 +438,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
return true;
}
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
return pte_present(pte) && !is_kernel_addr(addr);
}
/* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*

View file

@ -144,6 +144,8 @@
#define PAGE_KERNEL_ROX __pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
#ifndef __ASSEMBLER__
#include <linux/page_table_check.h>
/*
* page table defines
*/
@ -416,8 +418,11 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
return __pte(old);
pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
page_table_check_pte_clear(mm, addr, old_pte);
return old_pte;
}
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@ -426,11 +431,16 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
pte_t *ptep, int full)
{
if (full && radix_enabled()) {
pte_t old_pte;
/*
* We know that this is a full mm pte clear and
* hence can be sure there is no parallel set_pte.
*/
return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
page_table_check_pte_clear(mm, addr, old_pte);
return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
}
@ -539,6 +549,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
}
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
return pte_present(pte) && pte_user(pte);
}
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
@ -909,6 +924,12 @@ static inline bool pud_access_permitted(pud_t pud, bool write)
return pte_access_permitted(pud_pte(pud), write);
}
#define pud_user_accessible_page pud_user_accessible_page
static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
{
return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
}
#define __p4d_raw(x) ((p4d_t) { __pgd_raw(x) })
static inline __be64 p4d_raw(p4d_t x)
{
@ -1074,6 +1095,12 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
return pte_access_permitted(pmd_pte(pmd), write);
}
#define pmd_user_accessible_page pmd_user_accessible_page
static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
{
return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
@ -1284,19 +1311,34 @@ extern int pudp_test_and_clear_young(struct vm_area_struct *vma,
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
{
if (radix_enabled())
return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
pmd_t old_pmd;
if (radix_enabled()) {
old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
} else {
old_pmd = hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
}
page_table_check_pmd_clear(mm, addr, old_pmd);
return old_pmd;
}
#define __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR
static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pud_t *pudp)
{
if (radix_enabled())
return radix__pudp_huge_get_and_clear(mm, addr, pudp);
BUG();
return *pudp;
pud_t old_pud;
if (radix_enabled()) {
old_pud = radix__pudp_huge_get_and_clear(mm, addr, pudp);
} else {
BUG();
}
page_table_check_pud_clear(mm, addr, old_pud);
return old_pud;
}
static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,

View file

@ -12,7 +12,6 @@
#define PPC64_TLB_BATCH_NR 192
struct ppc64_tlb_batch {
int active;
unsigned long index;
struct mm_struct *mm;
real_pte_t pte[PPC64_TLB_BATCH_NR];
@ -24,12 +23,8 @@ DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
static inline void arch_enter_lazy_mmu_mode(void)
{
struct ppc64_tlb_batch *batch;
if (radix_enabled())
return;
/*
@ -37,11 +32,9 @@ static inline void arch_enter_lazy_mmu_mode(void)
* operating on kernel page tables.
*/
preempt_disable();
batch = this_cpu_ptr(&ppc64_tlb_batch);
batch->active = 1;
}
static inline void arch_leave_lazy_mmu_mode(void)
static inline void arch_flush_lazy_mmu_mode(void)
{
struct ppc64_tlb_batch *batch;
@ -51,11 +44,16 @@ static inline void arch_leave_lazy_mmu_mode(void)
if (batch->index)
__flush_tlb_pending(batch);
batch->active = 0;
preempt_enable();
}
#define arch_flush_lazy_mmu_mode() do {} while (0)
static inline void arch_leave_lazy_mmu_mode(void)
{
if (radix_enabled())
return;
arch_flush_lazy_mmu_mode();
preempt_enable();
}
extern void hash__tlbiel_all(unsigned int action);

View file

@ -68,7 +68,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t pte, int dirty);
void gigantic_hugetlb_cma_reserve(void) __init;
#include <asm-generic/hugetlb.h>
#else /* ! CONFIG_HUGETLB_PAGE */
@ -77,10 +76,6 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
{
}
static inline void __init gigantic_hugetlb_cma_reserve(void)
{
}
static inline void __init hugetlbpage_init_defaultsize(void)
{
}

View file

@ -29,6 +29,8 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
#ifndef __ASSEMBLER__
#include <linux/page_table_check.h>
extern int icache_44x_need_flush;
#ifndef pte_huge_size
@ -122,7 +124,11 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
{
return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
page_table_check_pte_clear(mm, addr, old_pte);
return old_pte;
}
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
@ -243,6 +249,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
return true;
}
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
return pte_present(pte) && !is_kernel_addr(addr);
}
/* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*

View file

@ -271,6 +271,7 @@ static inline const void *pfn_to_kaddr(unsigned long pfn)
struct page;
extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg);
#define clear_user_page clear_user_page
extern void copy_user_page(void *to, void *from, unsigned long vaddr,
struct page *p);
extern int devmem_is_allowed(unsigned long pfn);

View file

@ -34,6 +34,8 @@ struct mm_struct;
void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
#define set_ptes set_ptes
void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
#define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
@ -202,6 +204,14 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
#endif /* CONFIG_PPC64 */
#ifndef pmd_user_accessible_page
#define pmd_user_accessible_page(pmd, addr) false
#endif
#ifndef pud_user_accessible_page
#define pud_user_accessible_page(pud, addr) false
#endif
#endif /* __ASSEMBLER__ */
#endif /* _ASM_POWERPC_PGTABLE_H */

View file

@ -20,7 +20,11 @@ extern void reloc_got2(unsigned long);
void check_for_initrd(void);
void mem_topology_setup(void);
#ifdef CONFIG_NUMA
void initmem_init(void);
#else
static inline void initmem_init(void) {}
#endif
void setup_panic(void);
#define ARCH_PANIC_TIMEOUT 180

View file

@ -154,12 +154,10 @@ void arch_setup_new_exec(void);
/* Don't move TLF_NAPPING without adjusting the code in entry_32.S */
#define TLF_NAPPING 0 /* idle thread enabled NAP mode */
#define TLF_SLEEPING 1 /* suspend code enabled SLEEP mode */
#define TLF_LAZY_MMU 3 /* tlb_batch is active */
#define TLF_RUNLATCH 4 /* Is the runlatch enabled? */
#define _TLF_NAPPING (1 << TLF_NAPPING)
#define _TLF_SLEEPING (1 << TLF_SLEEPING)
#define _TLF_LAZY_MMU (1 << TLF_LAZY_MMU)
#define _TLF_RUNLATCH (1 << TLF_RUNLATCH)
#ifndef __ASSEMBLER__

View file

@ -37,7 +37,6 @@ extern void tlb_flush(struct mmu_gather *tlb);
*/
#define tlb_needs_table_invalidate() radix_enabled()
#define __HAVE_ARCH_TLB_REMOVE_TABLE
/* Get the generic bits... */
#include <asm-generic/tlb.h>

View file

@ -1281,9 +1281,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
{
struct thread_struct *new_thread, *old_thread;
struct task_struct *last;
#ifdef CONFIG_PPC_64S_HASH_MMU
struct ppc64_tlb_batch *batch;
#endif
new_thread = &new->thread;
old_thread = &current->thread;
@ -1291,14 +1288,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
WARN_ON(!irqs_disabled());
#ifdef CONFIG_PPC_64S_HASH_MMU
batch = this_cpu_ptr(&ppc64_tlb_batch);
if (batch->active) {
current_thread_info()->local_flags |= _TLF_LAZY_MMU;
if (batch->index)
__flush_tlb_pending(batch);
batch->active = 0;
}
/*
* On POWER9 the copy-paste buffer can only paste into
* foreign real addresses, so unprivileged processes can not
@ -1369,20 +1358,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
*/
#ifdef CONFIG_PPC_BOOK3S_64
#ifdef CONFIG_PPC_64S_HASH_MMU
/*
* This applies to a process that was context switched while inside
* arch_enter_lazy_mmu_mode(), to re-activate the batch that was
* deactivated above, before _switch(). This will never be the case
* for new tasks.
*/
if (current_thread_info()->local_flags & _TLF_LAZY_MMU) {
current_thread_info()->local_flags &= ~_TLF_LAZY_MMU;
batch = this_cpu_ptr(&ppc64_tlb_batch);
batch->active = 1;
}
#endif
/*
* Math facilities are masked out of the child MSR in copy_thread.
* A new task does not need to restore_math because it will

View file

@ -1003,7 +1003,6 @@ void __init setup_arch(char **cmdline_p)
fadump_cma_init();
kdump_cma_reserve();
kvm_cma_reserve();
gigantic_hugetlb_cma_reserve();
early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);

View file

@ -8,6 +8,7 @@
#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/mm.h>
#include <linux/page_table_check.h>
#include <linux/stop_machine.h>
#include <asm/sections.h>
@ -230,6 +231,9 @@ pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long addres
pmd = *pmdp;
pmd_clear(pmdp);
page_table_check_pmd_clear(vma->vm_mm, address, pmd);
/*
* Wait for all pending hash_page to finish. This is needed
* in case of subpage collapse. When we collapse normal pages

View file

@ -25,11 +25,12 @@
#include <asm/tlb.h>
#include <asm/bug.h>
#include <asm/pte-walk.h>
#include <kunit/visibility.h>
#include <trace/events/thp.h>
DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
EXPORT_SYMBOL_IF_KUNIT(ppc64_tlb_batch);
/*
* A linux PTE was changed and the corresponding hash table entry
@ -100,7 +101,7 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
* Check if we have an active batch on this CPU. If not, just
* flush now and return.
*/
if (!batch->active) {
if (!is_lazy_mmu_mode_active()) {
flush_hash_page(vpn, rpte, psize, ssize, mm_is_thread_local(mm));
put_cpu_var(ppc64_tlb_batch);
return;
@ -154,6 +155,7 @@ void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
flush_hash_range(i, local);
batch->index = 0;
}
EXPORT_SYMBOL_IF_KUNIT(__flush_tlb_pending);
void hash__tlb_flush(struct mmu_gather *tlb)
{
@ -205,7 +207,7 @@ void __flush_hash_table_range(unsigned long start, unsigned long end)
* way to do things but is fine for our needs here.
*/
local_irq_save(flags);
arch_enter_lazy_mmu_mode();
lazy_mmu_mode_enable();
for (; start < end; start += PAGE_SIZE) {
pte_t *ptep = find_init_mm_pte(start, &hugepage_shift);
unsigned long pte;
@ -217,7 +219,7 @@ void __flush_hash_table_range(unsigned long start, unsigned long end)
continue;
hpte_need_flush(&init_mm, start, ptep, pte, hugepage_shift);
}
arch_leave_lazy_mmu_mode();
lazy_mmu_mode_disable();
local_irq_restore(flags);
}
@ -237,7 +239,7 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
* way to do things but is fine for our needs here.
*/
local_irq_save(flags);
arch_enter_lazy_mmu_mode();
lazy_mmu_mode_enable();
start_pte = pte_offset_map(pmd, addr);
if (!start_pte)
goto out;
@ -249,6 +251,6 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
}
pte_unmap(start_pte);
out:
arch_leave_lazy_mmu_mode();
lazy_mmu_mode_disable();
local_irq_restore(flags);
}

View file

@ -10,6 +10,7 @@
#include <linux/pkeys.h>
#include <linux/debugfs.h>
#include <linux/proc_fs.h>
#include <linux/page_table_check.h>
#include <asm/pgalloc.h>
#include <asm/tlb.h>
@ -127,7 +128,8 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_leaf(pmd)));
#endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
page_table_check_pmd_set(mm, addr, pmdp, pmd);
return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
}
void set_pud_at(struct mm_struct *mm, unsigned long addr,
@ -144,7 +146,8 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pud_leaf(pud)));
#endif
trace_hugepage_set_pud(addr, pud_val(pud));
return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
page_table_check_pud_set(mm, addr, pudp, pud);
return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
}
static void do_serialize(void *arg)
@ -179,23 +182,27 @@ void serialize_against_pte_lookup(struct mm_struct *mm)
pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
unsigned long old_pmd;
pmd_t old_pmd;
VM_WARN_ON_ONCE(!pmd_present(*pmdp));
old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
old_pmd = __pmd(pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID));
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return __pmd(old_pmd);
page_table_check_pmd_clear(vma->vm_mm, address, old_pmd);
return old_pmd;
}
pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
pud_t *pudp)
{
unsigned long old_pud;
pud_t old_pud;
VM_WARN_ON_ONCE(!pud_present(*pudp));
old_pud = pud_hugepage_update(vma->vm_mm, address, pudp, _PAGE_PRESENT, _PAGE_INVALID);
old_pud = __pud(pud_hugepage_update(vma->vm_mm, address, pudp, _PAGE_PRESENT, _PAGE_INVALID));
flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
return __pud(old_pud);
page_table_check_pud_clear(vma->vm_mm, address, old_pud);
return old_pud;
}
pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
@ -550,7 +557,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
ptep, old_pte, pte);
set_pte_at(vma->vm_mm, addr, ptep, pte);
set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE

View file

@ -14,6 +14,7 @@
#include <linux/of.h>
#include <linux/of_fdt.h>
#include <linux/mm.h>
#include <linux/page_table_check.h>
#include <linux/hugetlb.h>
#include <linux/string_helpers.h>
#include <linux/memory.h>
@ -1474,6 +1475,8 @@ pmd_t radix__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long addre
pmd = *pmdp;
pmd_clear(pmdp);
page_table_check_pmd_clear(vma->vm_mm, address, pmd);
radix__flush_tlb_collapsed_pmd(vma->vm_mm, address);
return pmd;
@ -1606,7 +1609,7 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
(atomic_read(&mm->context.copros) > 0))
radix__flush_tlb_page(vma, addr);
set_pte_at(mm, addr, ptep, pte);
set_pte_at_unchecked(mm, addr, ptep, pte);
}
int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
@ -1617,7 +1620,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
if (!radix_enabled())
return 0;
set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud);
set_pte_at_unchecked(&init_mm, 0 /* radix unused */, ptep, new_pud);
return 1;
}
@ -1664,7 +1667,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
if (!radix_enabled())
return 0;
set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pmd);
set_pte_at_unchecked(&init_mm, 0 /* radix unused */, ptep, new_pmd);
return 1;
}

View file

@ -73,13 +73,13 @@ static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
if (!pte)
return;
arch_enter_lazy_mmu_mode();
lazy_mmu_mode_enable();
for (; npages > 0; --npages) {
pte_update(mm, addr, pte, 0, 0, 0);
addr += PAGE_SIZE;
++pte;
}
arch_leave_lazy_mmu_mode();
lazy_mmu_mode_disable();
pte_unmap_unlock(pte - 1, ptl);
}

View file

@ -200,18 +200,15 @@ static int __init hugetlbpage_init(void)
arch_initcall(hugetlbpage_init);
void __init gigantic_hugetlb_cma_reserve(void)
unsigned int __init arch_hugetlb_cma_order(void)
{
unsigned long order = 0;
if (radix_enabled())
order = PUD_SHIFT - PAGE_SHIFT;
return PUD_SHIFT - PAGE_SHIFT;
else if (!firmware_has_feature(FW_FEATURE_LPAR) && mmu_psize_defs[MMU_PAGE_16G].shift)
/*
* For pseries we do use ibm,expected#pages for reserving 16G pages.
*/
order = mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;
return mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;
if (order)
hugetlb_cma_reserve(order);
return 0;
}

View file

@ -182,11 +182,6 @@ void __init mem_topology_setup(void)
memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0);
}
void __init initmem_init(void)
{
sparse_init();
}
/* mark pages that don't exist as nosave */
static int __init mark_nonram_nosave(void)
{
@ -221,7 +216,16 @@ static int __init mark_nonram_nosave(void)
* anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
* ZONE_DMA.
*/
static unsigned long max_zone_pfns[MAX_NR_ZONES];
void __init arch_zone_limits_init(unsigned long *max_zone_pfns)
{
#ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min((zone_dma_limit >> PAGE_SHIFT) + 1, max_low_pfn);
#endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
#endif
}
/*
* paging_init() sets up the page tables - in fact we've already done this.
@ -259,17 +263,6 @@ void __init paging_init(void)
zone_dma_limit = DMA_BIT_MASK(zone_dma_bits);
#ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
1UL << (zone_dma_bits - PAGE_SHIFT));
#endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
#ifdef CONFIG_HIGHMEM
max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
#endif
free_area_init(max_zone_pfns);
mark_nonram_nosave();
}

View file

@ -1213,8 +1213,6 @@ void __init initmem_init(void)
setup_node_data(nid, start_pfn, end_pfn);
}
sparse_init();
/*
* We need the numa_cpu_lookup_table to be accurate for all CPUs,
* even before we online them, so that we can use cpu_to_{node,mem}

View file

@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/hardirq.h>
#include <linux/page_table_check.h>
#include <linux/hugetlb.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
@ -206,6 +207,9 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
* and not hw_valid ptes. Hence there is no translation cache flush
* involved that need to be batched.
*/
page_table_check_ptes_set(mm, addr, ptep, pte, nr);
for (;;) {
/*
@ -224,6 +228,14 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
}
}
void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
pte = set_pte_filter(pte, addr);
__set_pte_at(mm, addr, ptep, pte, 0);
}
void unmap_kernel_page(unsigned long va)
{
pmd_t *pmdp = pmd_off_k(va);

View file

@ -93,6 +93,7 @@ config PPC_BOOK3S_64
select IRQ_WORK
select PPC_64S_HASH_MMU if !PPC_RADIX_MMU
select KASAN_VMALLOC if KASAN
select ARCH_HAS_LAZY_MMU_MODE
config PPC_BOOK3E_64
bool "Embedded processors"

View file

@ -120,7 +120,7 @@ config PPC_SMLPAR
config CMM
tristate "Collaborative memory management"
depends on PPC_SMLPAR
select MEMORY_BALLOON
select BALLOON
default y
help
Select this option, if you want to enable the kernel interface

View file

@ -19,7 +19,7 @@
#include <linux/stringify.h>
#include <linux/swap.h>
#include <linux/device.h>
#include <linux/balloon_compaction.h>
#include <linux/balloon.h>
#include <asm/firmware.h>
#include <asm/hvcall.h>
#include <asm/mmu.h>
@ -165,7 +165,6 @@ static long cmm_alloc_pages(long nr)
balloon_page_enqueue(&b_dev_info, page);
atomic_long_inc(&loaned_pages);
adjust_managed_page_count(page, -1);
nr--;
}
@ -190,7 +189,6 @@ static long cmm_free_pages(long nr)
if (!page)
break;
plpar_page_set_active(page);
adjust_managed_page_count(page, 1);
__free_page(page);
atomic_long_dec(&loaned_pages);
nr--;
@ -496,13 +494,11 @@ static struct notifier_block cmm_mem_nb = {
.priority = CMM_MEM_HOTPLUG_PRI
};
#ifdef CONFIG_BALLOON_COMPACTION
#ifdef CONFIG_BALLOON_MIGRATION
static int cmm_migratepage(struct balloon_dev_info *b_dev_info,
struct page *newpage, struct page *page,
enum migrate_mode mode)
{
unsigned long flags;
/*
* loan/"inflate" the newpage first.
*
@ -517,47 +513,17 @@ static int cmm_migratepage(struct balloon_dev_info *b_dev_info,
return -EBUSY;
}
/* balloon page list reference */
get_page(newpage);
/*
* When we migrate a page to a different zone, we have to fixup the
* count of both involved zones as we adjusted the managed page count
* when inflating.
*/
if (page_zone(page) != page_zone(newpage)) {
adjust_managed_page_count(page, 1);
adjust_managed_page_count(newpage, -1);
}
spin_lock_irqsave(&b_dev_info->pages_lock, flags);
balloon_page_insert(b_dev_info, newpage);
__count_vm_event(BALLOON_MIGRATE);
b_dev_info->isolated_pages--;
spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
/*
* activate/"deflate" the old page. We ignore any errors just like the
* other callers.
*/
plpar_page_set_active(page);
balloon_page_finalize(page);
/* balloon page list reference */
put_page(page);
return 0;
}
static void cmm_balloon_compaction_init(void)
{
b_dev_info.migratepage = cmm_migratepage;
}
#else /* CONFIG_BALLOON_COMPACTION */
static void cmm_balloon_compaction_init(void)
{
}
#endif /* CONFIG_BALLOON_COMPACTION */
#else /* CONFIG_BALLOON_MIGRATION */
int cmm_migratepage(struct balloon_dev_info *b_dev_info, struct page *newpage,
struct page *page, enum migrate_mode mode);
#endif /* CONFIG_BALLOON_MIGRATION */
/**
* cmm_init - Module initialization
@ -573,11 +539,13 @@ static int cmm_init(void)
return -EOPNOTSUPP;
balloon_devinfo_init(&b_dev_info);
cmm_balloon_compaction_init();
b_dev_info.adjust_managed_page_count = true;
if (IS_ENABLED(CONFIG_BALLOON_MIGRATION))
b_dev_info.migratepage = cmm_migratepage;
rc = register_oom_notifier(&cmm_oom_nb);
if (rc < 0)
goto out_balloon_compaction;
return rc;
if ((rc = register_reboot_notifier(&cmm_reboot_nb)))
goto out_oom_notifier;
@ -606,7 +574,6 @@ out_reboot_notifier:
unregister_reboot_notifier(&cmm_reboot_nb);
out_oom_notifier:
unregister_oom_notifier(&cmm_oom_nb);
out_balloon_compaction:
return rc;
}

View file

@ -50,7 +50,6 @@ void clear_page(void *page);
#endif
#define copy_page(to, from) memcpy((to), (from), PAGE_SIZE)
#define clear_user_page(pgaddr, vaddr, page) clear_page(pgaddr)
#define copy_user_page(vto, vfrom, vaddr, topg) \
memcpy((vto), (vfrom), PAGE_SIZE)

View file

@ -627,7 +627,7 @@ static inline void __set_pte_at(struct mm_struct *mm, pte_t *ptep, pte_t pteval)
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
{
page_table_check_ptes_set(mm, ptep, pteval, nr);
page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
for (;;) {
__set_pte_at(mm, ptep, pteval);
@ -664,7 +664,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
set_pte(ptep, __pte(0));
#endif
page_table_check_pte_clear(mm, pte);
page_table_check_pte_clear(mm, address, pte);
return pte;
}
@ -946,29 +946,29 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
{
page_table_check_pmd_set(mm, pmdp, pmd);
page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at(mm, (pte_t *)pmdp, pmd_pte(pmd));
}
static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
{
page_table_check_pud_set(mm, pudp, pud);
page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at(mm, (pte_t *)pudp, pud_pte(pud));
}
#ifdef CONFIG_PAGE_TABLE_CHECK
static inline bool pte_user_accessible_page(pte_t pte)
static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
return pte_present(pte) && pte_user(pte);
}
static inline bool pmd_user_accessible_page(pmd_t pmd)
static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
{
return pmd_leaf(pmd) && pmd_user(pmd);
}
static inline bool pud_user_accessible_page(pud_t pud)
static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
{
return pud_leaf(pud) && pud_user(pud);
}
@ -1007,7 +1007,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
pmd_clear(pmdp);
#endif
page_table_check_pmd_clear(mm, pmd);
page_table_check_pmd_clear(mm, address, pmd);
return pmd;
}
@ -1023,7 +1023,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
{
page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
}
@ -1101,7 +1101,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
pud_clear(pudp);
#endif
page_table_check_pud_clear(mm, pud);
page_table_check_pud_clear(mm, address, pud);
return pud;
}
@ -1122,7 +1122,7 @@ static inline void update_mmu_cache_pud(struct vm_area_struct *vma,
static inline pud_t pudp_establish(struct vm_area_struct *vma,
unsigned long address, pud_t *pudp, pud_t pud)
{
page_table_check_pud_set(vma->vm_mm, pudp, pud);
page_table_check_pud_set(vma->vm_mm, address, pudp, pud);
return __pud(atomic_long_xchg((atomic_long_t *)pudp, pud_val(pud)));
}

Some files were not shown because too many files have changed in this diff Show more