Linux kernel source tree
Find a file
Breno Leitao 3fa805c37d vmcoreinfo: track and log recoverable hardware errors
Introduce a generic infrastructure for tracking recoverable hardware
errors (HW errors that are visible to the OS but does not cause a panic)
and record them for vmcore consumption.  This aids post-mortem crash
analysis tools by preserving a count and timestamp for the last occurrence
of such errors.  On the other side, correctable errors, which the OS
typically remains unaware of because the underlying hardware handles them
transparently, are less relevant for crash dump and therefore are NOT
tracked in this infrastructure.

Add centralized logging for sources of recoverable hardware errors based
on the subsystem it has been notified.

hwerror_data is write-only at kernel runtime, and it is meant to be read
from vmcore using tools like crash/drgn.  For example, this is how it
looks like when opening the crashdump from drgn.

	>>> prog['hwerror_data']
	(struct hwerror_info[1]){
		{
			.count = (int)844,
			.timestamp = (time64_t)1752852018,
		},
		...

This helps fleet operators quickly triage whether a crash may be
influenced by hardware recoverable errors (which executes a uncommon code
path in the kernel), especially when recoverable errors occurred shortly
before a panic, such as the bug fixed by commit ee62ce7a1d ("page_pool:
Track DMA-mapped pages and unmap them when destroying the pool")

This is not intended to replace full hardware diagnostics but provides a
fast way to correlate hardware events with kernel panics quickly.

Rare machine check exceptions—like those indicated by mce_flags.p5 or
mce_flags.winchip—are not accounted for in this method, as they fall
outside the intended usage scope for this feature's user base.

[leitao@debian.org: add hw-recoverable-errors to toctree]
  Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org
Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>	[APEI]
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:44 -08:00
arch vmcoreinfo: track and log recoverable hardware errors 2025-11-27 14:24:44 -08:00
block block-6.18-20251031 2025-10-31 12:57:19 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push contains the following changes: 2025-10-10 08:56:16 -07:00
Documentation vmcoreinfo: track and log recoverable hardware errors 2025-11-27 14:24:44 -08:00
drivers vmcoreinfo: track and log recoverable hardware errors 2025-11-27 14:24:44 -08:00
fs Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able 2025-11-27 14:17:02 -08:00
include vmcoreinfo: track and log recoverable hardware errors 2025-11-27 14:24:44 -08:00
init init: replace simple_strtoul with kstrtoul to improve lpj_setup 2025-11-27 14:24:43 -08:00
io_uring io_uring: fix regbuf vector size truncation 2025-11-07 17:17:13 -07:00
ipc ipc: create_ipc_ns: drop mqueue mount on sysctl setup failure 2025-11-12 10:00:15 -08:00
kernel vmcoreinfo: track and log recoverable hardware errors 2025-11-27 14:24:44 -08:00
lib test_kho: always print restore status 2025-11-27 14:24:42 -08:00
LICENSES LICENSES: Replace the obsolete address of the FSF in the GFDL-1.2 2025-07-24 11:15:39 +02:00
mm mm: memfd_luo: allow preserving memfd 2025-11-27 14:24:41 -08:00
net Including fixes from bluetooth and wireless. 2025-11-06 08:52:30 -08:00
rust rbtree: inline rb_last() 2025-11-27 14:24:30 -08:00
samples selftests: complete kselftest include centralization 2025-11-27 14:24:31 -08:00
scripts Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able 2025-11-27 14:17:02 -08:00
security integrity-v6.18 2025-10-05 10:48:33 -07:00
sound ASoC: Fixes for v6.18 2025-10-30 13:08:08 +01:00
tools selftests/liveupdate: add kexec test for multiple and empty sessions 2025-11-27 14:24:42 -08:00
usr gen_init_cpio: Ignore fsync() returning EINVAL on pipes 2025-10-07 09:53:05 -07:00
virt KVM x86 fixes for 6.18: 2025-10-18 10:25:43 +02:00
.clang-format memblock: drop for_each_free_mem_pfn_range_in_zone_from() 2025-09-14 08:49:03 +03:00
.clippy.toml rust: clean Rust 1.88.0's warning about clippy::disallowed_macros configuration 2025-05-07 00:11:47 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: remove Alyssa Rosenzweig 2025-09-18 21:17:31 +02:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: ignore compile_commands.json globally 2025-08-12 15:53:55 -07:00
.mailmap Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able 2025-11-27 14:17:02 -08:00
.pylintrc tools: docs: parse-headers.py: move it from sphinx dir 2025-08-29 15:54:42 -06:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: update Martin's information 2025-11-12 10:00:14 -08:00
Kbuild sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS MAINTAINERS: TPM DEVICE DRIVER: update the W-tag 2025-11-27 14:24:44 -08:00
Makefile Linux 6.18-rc5 2025-11-09 15:10:19 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.