linux/kernel
Baoquan He b3ba234171 kexec_file: load kernel at top of system RAM if required
Patch series "kexec_file: Load kernel at top of system RAM if required".

Justification:
==============

Kexec_load interface has been doing top down searching and loading
kernel/initrd/purgtory etc to prepare for kexec reboot.  In that way, the
benefits are that it avoids to consume and fragment limited low memory
which satisfy DMA buffer allocation and big chunk of continuous memory
during system init; and avoids to stir with BIOS/FW reserved or occupied
areas, or corner case handling/work around/quirk occupied areas when doing
system init.  By the way, the top-down searching and loading of kexec-ed
kernel is done in user space utility code.

For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
ignored.  It calls walk_system_ram_res() directly to go through all
resources of System RAM bottom up, to find an available memory region,
then call locate_mem_hole_callback() to allocate memory in that found
memory region from top to down.  This is not expected and inconsistent
with kexec_load.

Implementation
===============

In patch 1, introduce a new function walk_system_ram_res_rev() which is a
variant of walk_system_ram_res(), it walks through a list of all the
resources of System RAM in reversed order, i.e., from higher to lower.

In patch 2, check if kexec_buf.top_down is 'true' in
kexec_walk_resources(), if yes, call walk_system_ram_res_rev() to find
memory region of system RAM from top to down to load kernel/initrd etc.

Background information: ======================= And I ever tried this in
the past in a different way, please see below link.  In the post, I tried
to adjust struct sibling linking code, replace the the singly linked list
with list_head so that walk_system_ram_res_rev() can be implemented in a
much easier way.  Finally I failed. 
https://lore.kernel.org/all/20180718024944.577-4-bhe@redhat.com/

This time, I picked up the patch from AKASHI Takahiro's old post and made
some change to take as the current patch 1:
https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html


This patch (of 2):

Kexec_load interface has been doing top down searching and loading
kernel/initrd/purgtory etc to prepare for kexec reboot.  In that way, the
benefits are that it avoids to consume and fragment limited low memory
which satisfy DMA buffer allocation and big chunk of continuous memory
during system init; and avoids to stir with BIOS/FW reserved or occupied
areas, or corner case handling/work around/quirk occupied areas when doing
system init.  By the way, the top-down searching and loading of kexec-ed
kernel is done in user space utility code.

For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
ignored.  It calls walk_system_ram_res() directly to go through all
resources of System RAM bottom up, to find an available memory region,
then call locate_mem_hole_callback() to allocate memory in that found
memory region from top to down.  This is not expected and inconsistent
with kexec_load.

Here check if kexec_buf.top_down is 'true' in kexec_walk_resources(), if
yes, call the newly added walk_system_ram_res_rev() to find memory region
of system RAM from top to down to load kernel/initrd etc.

Link: https://lkml.kernel.org/r/20231114091658.228030-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20231114091658.228030-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-10 17:21:44 -08:00
..
bpf bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags() 2023-11-26 18:00:26 -08:00
cgroup sched: psi: fix unprivileged polling against cgroups 2023-11-14 22:27:00 +01:00
configs hardening: Provide Kconfig fragments for basic options 2023-09-22 09:50:55 -07:00
debug kdb: Corrects comment for kdballocenv 2023-11-06 17:13:55 +00:00
dma swiotlb: fix out-of-bounds TLB allocations with CONFIG_SWIOTLB_DYNAMIC 2023-11-08 16:27:05 +01:00
entry entry: Remove empty addr_limit_user_check() 2023-08-23 10:32:39 +02:00
events perf/core: Fix cpuctx refcounting 2023-11-15 04:18:31 +01:00
futex futex: Fix hardcoded flags 2023-11-15 04:02:25 +01:00
gcov gcov: annotate struct gcov_iterator with __counted_by 2023-10-18 14:43:22 -07:00
irq As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
kcsan mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-09-20 11:24:18 +02:00
locking lockdep: Fix block chain corruption 2023-11-24 11:04:54 +01:00
module This update includes the following changes: 2023-11-02 16:15:30 -10:00
power Power management updates for 6.7-rc1 2023-10-31 15:38:12 -10:00
printk TTY/Serial changes for 6.7-rc1 2023-11-03 15:44:25 -10:00
rcu RCU fixes for v6.7 2023-11-08 09:47:52 -08:00
sched sched: fair: move unused stub functions to header 2023-12-10 17:21:43 -08:00
time - Do the push of pending hrtimers away from a CPU which is being 2023-11-19 13:35:07 -08:00
trace rethook: Use __rcu pointer for rethook::handler 2023-12-01 14:53:56 +09:00
.gitignore
acct.c fs: rename __mnt_{want,drop}_write*() helpers 2023-09-11 15:05:50 +02:00
async.c
audit.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
audit_fsnotify.c
audit_tree.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-14 17:34:27 -05:00
auditfilter.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c
bounds.c
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
cpu.c - Do the push of pending hrtimers away from a CPU which is being 2023-11-19 13:35:07 -08:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
crash_core.c crash_core.c: remove unneeded functions 2023-10-04 10:41:58 -07:00
crash_dump.c
cred.c lsm/stable-6.7 PR 20231030 2023-10-30 20:13:17 -10:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
exit.h exit: add internal include file with helpers 2023-09-21 12:03:50 -06:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c arch: remove ARCH_TASK_STRUCT_ALLOCATOR 2023-12-10 17:21:31 -08:00
freezer.c freezer,sched: Use saved_state to reduce some spurious wakeups 2023-09-18 08:14:36 +02:00
gen_kheaders.sh Revert "kheaders: substituting --sort in archive creation" 2023-05-28 16:20:21 +09:00
groups.c groups: Convert group_info.usage to refcount_t 2023-09-29 11:28:39 -07:00
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kallsyms_internal.h
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h
kcmp.c file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP 2023-12-06 16:12:48 -08:00
Kconfig.locks
Kconfig.preempt
kcov.c kcov: add prototypes for helper functions 2023-06-09 17:44:17 -07:00
kexec.c kernel: kexec: copy user-array safely 2023-10-09 16:59:47 +10:00
kexec_core.c kexec: use atomic_try_cmpxchg in crash_kexec 2023-12-10 17:21:33 -08:00
kexec_elf.c
kexec_file.c kexec_file: load kernel at top of system RAM if required 2023-12-10 17:21:44 -08:00
kexec_internal.h
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c kprobes: consistent rcu api usage for kretprobe holder 2023-12-01 14:53:55 +09:00
ksyms_common.c kallsyms: make kallsyms_show_value() as generic function 2023-06-08 12:27:20 -07:00
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
latencytop.c
Makefile v6.5-rc1-modules-next 2023-06-28 15:51:08 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
padata.c padata: Fix refcnt handling in padata_free_shell() 2023-10-27 18:04:24 +08:00
panic.c panic: use atomic_try_cmpxchg in panic() and nmi_panic() 2023-10-04 10:41:56 -07:00
params.c kernel: params: Remove unnecessary ‘0’ values from err 2023-07-10 12:47:01 -07:00
pid.c pidfd: prevent a kernel-doc warning 2023-09-19 13:21:33 -07:00
pid_namespace.c pid: pid_ns_ctl_handler: remove useless comment 2023-10-04 10:41:57 -07:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
profile.c
ptrace.c __ptrace_unlink: kill the obsolete "FIXME" code 2023-12-10 17:21:32 -08:00
range.c
reboot.c kernel/reboot: explicitly notify if halt occurred instead of power off 2023-12-10 17:21:25 -08:00
regset.c
relay.c kernel: relay: remove unnecessary NULL values from relay_open_buf 2023-08-18 10:18:55 -07:00
resource.c resource: add walk_system_ram_res_rev() 2023-12-10 17:21:44 -08:00
resource_kunit.c
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c scftorture: Pause testing after memory-allocation failure 2023-07-14 15:02:57 -07:00
scs.c
seccomp.c seccomp: Add missing kerndoc notations 2023-08-17 12:32:15 -07:00
signal.c kernel/signal.c: simplify force_sig_info_to_task(), kill recalc_sigpending_and_wake() 2023-12-10 17:21:32 -08:00
smp.c CSD lock commits for v6.7 2023-10-30 17:56:53 -10:00
smpboot.c kthread: add kthread_stop_put 2023-10-04 10:41:57 -07:00
smpboot.h
softirq.c sched/core: introduce sched_core_idle_cpu() 2023-07-13 15:21:50 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c stacktrace: Export stack_trace_save_tsk 2023-09-11 23:59:47 -04:00
static_call.c
static_call_inline.c
stop_machine.c
sys.c prctl: Disable prctl(PR_SET_MDWE) on parisc 2023-11-18 19:35:31 +01:00
sys_ni.c asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
sysctl-test.c
sysctl.c asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
task_work.c task_work: add kerneldoc annotation for 'data' argument 2023-09-19 13:21:32 -07:00
taskstats.c taskstats: fill_stats_for_tgid: use for_each_thread() 2023-10-04 10:41:57 -07:00
torture.c torture: Print out torture module parameters 2023-09-24 17:24:01 +02:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c sysctl: Add size to register_sysctl 2023-08-15 15:26:17 -07:00
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c smp: Change function signatures to use call_single_data_t 2023-09-13 14:59:24 +02:00
user-return-notifier.c
user.c binfmt_misc: enable sandboxed mounts 2023-10-11 08:46:01 -07:00
user_namespace.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c vhost: Fix worker hangs due to missed wake up calls 2023-06-08 15:43:09 -04:00
watch_queue.c kernel: watch_queue: copy user-array safely 2023-10-09 16:59:48 +10:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-01 12:10:02 -07:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: add a weak function for an arch to detect if perf can use NMIs 2023-06-09 17:44:21 -07:00
workqueue.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00