mirror of
https://github.com/torvalds/linux.git
synced 2026-03-08 00:44:31 +01:00
18420 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c7decec2f2 |
perf tools changes for v7.0:
- Introduce 'perf sched stats' tool with record/report/diff workflows
using schedstat counters.
- Add a faster libdw based addr2line implementation and allow selecting
it or its alternatives via 'perf config addr2line.style='.
- Data-type profiling fixes and improvements including the ability
to select fields using 'perf report''s -F/-fields, e.g.:
'perf report --fields overhead,type'
- Add 'perf test' regression tests for Data-type profiling with
C and Rust workloads.
- Fix srcline printing with inlines in callchains, make sure this has
coverage in 'perf test'.
- Fix printing of leaf IP in LBR callchains.
- Fix display of metrics without sufficient permission in 'perf stat'.
- Print all machines in 'perf kvm report -vvv', not just the host.
- Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1
code.
- Fix 'perf report's histogram entry collapsing with '-F' option.
- Use system's cacheline size instead of a hardcoded value in 'perf
report'.
- Allow filtering conversion by time range in 'perf data'.
- Cover conversion to CTF using 'perf data' in 'perf test'.
- Address newer glibc const-correctness (-Werror=discarded-qualifiers)
issues.
- Fixes and improvements for ARM's CoreSight support, simplify ARM SPE
event config in 'perf mem', update docs for 'perf c2c' including the
ARM events it can be used with.
- Build support for generating metrics from arch specific python script,
add extra AMD, Intel, ARM64 metrics using it.
- Add AMD Zen 6 events and metrics.
- Add JSON file with OpenHW Risc-V CVA6 hardware counters.
- Add 'perf kvm' stats live testing.
- Add more 'perf stat' tests to 'perf test'.
- Fix segfault in `perf lock contention -b/--use-bpf`
- Fix various 'perf test' cases for s390.
- Build system cleanups, bump minimum shellcheck version to 0.7.2
- Support building the capstone based annotation routines as a plugin.
- Allow passing extra Clang flags via EXTRA_BPF_FLAGS.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCaZn25QAKCRCyPKLppCJ+
J9MbAQCTKChBwDsqVh5iPr0UAc+mez9LOPJFa580SYH9nmd1jgD+Oqip7xCf514G
ZtYPNh+Mz0se0Mcb++syLUEjxvbrQQY=
=y2VY
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Introduce 'perf sched stats' tool with record/report/diff workflows
using schedstat counters
- Add a faster libdw based addr2line implementation and allow selecting
it or its alternatives via 'perf config addr2line.style='
- Data-type profiling fixes and improvements including the ability to
select fields using 'perf report''s -F/-fields, e.g.:
'perf report --fields overhead,type'
- Add 'perf test' regression tests for Data-type profiling with C and
Rust workloads
- Fix srcline printing with inlines in callchains, make sure this has
coverage in 'perf test'
- Fix printing of leaf IP in LBR callchains
- Fix display of metrics without sufficient permission in 'perf stat'
- Print all machines in 'perf kvm report -vvv', not just the host
- Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1
code
- Fix 'perf report's histogram entry collapsing with '-F' option
- Use system's cacheline size instead of a hardcoded value in 'perf
report'
- Allow filtering conversion by time range in 'perf data'
- Cover conversion to CTF using 'perf data' in 'perf test'
- Address newer glibc const-correctness (-Werror=discarded-qualifiers)
issues
- Fixes and improvements for ARM's CoreSight support, simplify ARM SPE
event config in 'perf mem', update docs for 'perf c2c' including the
ARM events it can be used with
- Build support for generating metrics from arch specific python
script, add extra AMD, Intel, ARM64 metrics using it
- Add AMD Zen 6 events and metrics
- Add JSON file with OpenHW Risc-V CVA6 hardware counters
- Add 'perf kvm' stats live testing
- Add more 'perf stat' tests to 'perf test'
- Fix segfault in `perf lock contention -b/--use-bpf`
- Fix various 'perf test' cases for s390
- Build system cleanups, bump minimum shellcheck version to 0.7.2
- Support building the capstone based annotation routines as a plugin
- Allow passing extra Clang flags via EXTRA_BPF_FLAGS
* tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (255 commits)
perf test script: Add python script testing support
perf test script: Add perl script testing support
perf script: Allow the generated script to be a path
perf test: perf data --to-ctf testing
perf test: Test pipe mode with data conversion --to-json
perf json: Pipe mode --to-ctf support
perf json: Pipe mode --to-json support
perf check: Add libbabeltrace to the listed features
perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS
perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing
tools build: Fix feature test for rust compiler
perf libunwind: Fix calls to thread__e_machine()
perf stat: Add no-affinity flag
perf evlist: Reduce affinity use and move into iterator, fix no affinity
perf evlist: Missing TPEBS close in evlist__close()
perf evlist: Special map propagation for tool events that read on 1 CPU
perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel
Revert "perf tool_pmu: More accurately set the cpus for tool events"
tools build: Emit dependencies file for test-rust.bin
tools build: Make test-rust.bin be removed by the 'clean' target
...
|
||
|
|
cb5573868e |
Loongarch:
- Add more CPUCFG mask bits. - Improve feature detection. - Add lazy load support for FPU and binary translation (LBT) register state. - Fix return value for memory reads from and writes to in-kernel devices. - Add support for detecting preemption from within a guest. - Add KVM steal time test case to tools/selftests. ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception. - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process. - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers. - More pKVM fixes for features that are not supposed to be exposed to guests. - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor. - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding. - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest. - Preliminary work for guest GICv5 support. - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures. - A small set of FPSIMD cleanups. - Selftest fixes addressing the incorrect alignment of page allocation. - Other assorted low-impact fixes and spelling fixes. RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host. - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit |
||
|
|
dbf0108347 |
perf test script: Add python script testing support
Basic coverage of python script support from `perf script`. Committer testing: $ perf test 'perf script python' 107: perf script python tests : Ok $ perf test -vv 'perf script python' 107: perf script python tests: --- start --- test child forked, pid 595537 Testing event: sched:sched_switch perf script python test [Skipped: failed to record sched:sched_switch] Testing event: task-clock Generating python script... generated Python script: /tmp/__perf_test_script.J4rWj.py Executing python script... perf script python test [Success: task-clock triggered param_dict] ---- end(0) ---- 107: perf script python tests : Ok $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
2273697781 |
perf test script: Add perl script testing support
Basic coverage of perl script support from `perf script`. This is
disabled by default and so the test will most normally skip.
Committer testing:
$ perf test 'perf script perl'
106: perf script perl tests : Skip
$ perf test -vv 'perf script perl'
106: perf script perl tests:
--- start ---
test child forked, pid 578323
perf script perl test [Skipped: no libperl support]
---- end(-2) ----
106: perf script perl tests : Skip
$ perf check feature libperl
libperl: [ OFF ] # HAVE_LIBPERL_SUPPORT ( tip: Deprecated, use LIBPERL=1 and install perl-ExtUtils-Embed/libperl-dev to build with it )
$
Install perl-ExtUtils-Embed, build with LIBPERL=1, rebuild:
$ perf check feature libperl
libperl: [ on ] # HAVE_LIBPERL_SUPPORT
$ perf test 'perf script perl'
106: perf script perl tests : Ok
$ perf test -vv 'perf script perl'
106: perf script perl tests:
--- start ---
test child forked, pid 588206
Testing event: sched:sched_switch
perf script perl test [Skipped: failed to record sched:sched_switch]
Testing event: task-clock
Generating perl script...
generated Perl script: /tmp/__perf_test_script.RpMn5.pl
Executing perl script...
perf script perl test [Success: task-clock triggered $VAR1]
---- end(0) ----
106: perf script perl tests : Ok
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
22ca2f7f32 |
perf script: Allow the generated script to be a path
Allow the script generated by "perf script -g <language>" to be a file
path and the language determined by the file extension.
This is useful in testing so that the generated script file can be
written to a test directory.
Committer testing:
$ perf record ls a.a
ls: cannot access 'a.a': No such file or directory
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (7 samples) ]
$ perf script -g python
generated Python script: perf-script.py
$ perf script -g myscript.py
generated Python script: myscript.py
$ diff -u perf-script.py myscript.py
$ tail myscript.py
def trace_unhandled(event_name, context, event_fields_dict, perf_sample_dict):
print(get_dict_as_string(event_fields_dict))
print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}')
def print_header(event_name, cpu, secs, nsecs, pid, comm):
print("%-20s %5u %05u.%09u %8u %-20s " % \
(event_name, cpu, secs, nsecs, pid, comm), end="")
def get_dict_as_string(a_dict, delimiter=' '):
return delimiter.join(['%s=%s'%(k,str(v))for k,v in sorted(a_dict.items())])
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
9083ce531a |
perf test: perf data --to-ctf testing
If babeltrace is detected check that --to-ctf functions with a data
file and in pipe mode.
Committer testing:
$ perf test 'perf data convert --to-ctf'
124: 'perf data convert --to-ctf' command test : Ok
$ perf test -vv 'perf data convert --to-ctf'
124: 'perf data convert --to-ctf' command test:
--- start ---
test child forked, pid 556008
libbabeltrace: [ on ] # HAVE_LIBBABELTRACE_SUPPORT
Testing Perf Data Conversion Command to CTF (File input)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB /tmp/__perf_test.perf.data.9TxzZ (115 samples) ]
[ perf data convert: Converted '/tmp/__perf_test.perf.data.9TxzZ' into CTF data '/tmp/__perf_test.ctf.f5EkS' ]
[ perf data convert: Converted and wrote 0.012 MB (115 samples) ]
Perf Data Converter Command to CTF (File input) [SUCCESS]
Testing Perf Data Conversion Command to CTF (Pipe mode)
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.047 MB - ]
Failed to setup all events.
[ perf data convert: Converted '/tmp/__perf_test.perf.data.9TxzZ' into CTF data '/tmp/__perf_test.ctf.f5EkS' ]
[ perf data convert: Converted and wrote 0.000 MB (0 samples) ]
Perf Data Converter Command to CTF (Pipe mode) [SUCCESS]
Unexpected signal in main
---- end(0) ----
124: 'perf data convert --to-ctf' command test : Ok
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Derek Foreman <derek.foreman@collabora.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
fc4577b52a |
perf test: Test pipe mode with data conversion --to-json
Add pipe mode test for json data conversion. Tidy up exit and cleanup code. Committer testing: $ perf test 'perf data convert --to-json' 124: 'perf data convert --to-json' command test : Ok $ perf test -vv 'perf data convert --to-json' 124: 'perf data convert --to-json' command test: --- start --- test child forked, pid 548738 Testing Perf Data Conversion Command to JSON [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB /tmp/__perf_test.perf.data.krxvl (104 samples) ] [ perf data convert: Converted '/tmp/__perf_test.perf.data.krxvl' into JSON data '/tmp/__perf_test.output.json.0z60p' ] [ perf data convert: Converted and wrote 0.075 MB (104 samples) ] Perf Data Converter Command to JSON [SUCCESS] Validating Perf Data Converted JSON file The file contains valid JSON format [SUCCESS] Testing Perf Data Conversion Command to JSON (Pipe mode) [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.046 MB - ] [ perf data convert: Converted '-' into JSON data '/tmp/__perf_test.output.json.0z60p' ] [ perf data convert: Converted and wrote 0.081 MB (110 samples) ] Perf Data Converter Command to JSON (Pipe mode) [SUCCESS] Validating Perf Data Converted JSON file The file contains valid JSON format [SUCCESS] ---- end(0) ---- 124: 'perf data convert --to-json' command test : Ok $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Derek Foreman <derek.foreman@collabora.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
5b92fc082c |
perf json: Pipe mode --to-ctf support
In pipe mode the environment may not be fully initialized so be robust to fields being NULL. Add default handling of attr events, use the feature events to populate the ctf writer environment. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Derek Foreman <derek.foreman@collabora.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
6db2f7c67b |
perf json: Pipe mode --to-json support
In pipe mode the environment may not be fully initialized so be robust to fields being NULL. Add default handling of feature and attr events. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Derek Foreman <derek.foreman@collabora.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
8772598b78 |
perf check: Add libbabeltrace to the listed features
This enables scripts to more easily determine if `perf data --to-ctf`
is supported.
Committer testing:
$ perf check feature libbabeltrace
libbabeltrace: [ on ] # HAVE_LIBBABELTRACE_SUPPORT
$ perf check feature -q libbabeltrace && echo have libbabeltrace support
have libbabeltrace support
$
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Derek Foreman <derek.foreman@collabora.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
9eb1760f84 |
perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS
Add support for EXTRA_BPF_FLAGS in the eBPF skeleton build, allowing
users to pass additional clang options such as --sysroot or custom
include paths when cross-compiling perf.
This is primarily intended for cross-build scenarios where the default
host include paths do not match the target kernel version.
Example usage:
make perf ARCH="arm64" EXTRA_BPF_FLAGS="--sysroot=..."
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: hupu <hupu.gm@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
adc1284bae |
perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing
Namhyung suggested skipping only the rust tests when the code_with_type 'perf test' workload is not built into perf, do it so that we can continue to test the C based workloads: With rust: root@number:/# perf test -vv "data type" 83: perf data type profiling tests: --- start --- test child forked, pid 2645245 Basic Rust perf annotate test Basic annotate test [Success] Pipe Rust perf annotate test Pipe annotate test [Success] Basic C perf annotate test Basic annotate test [Success] Pipe C perf annotate test Pipe annotate test [Success] ---- end(0) ---- 83: perf data type profiling tests : Ok root@number:/# Without: root@number:/# perf test "data type" 83: perf data type profiling tests : Ok root@number:/# perf test -vv "data type" 83: perf data type profiling tests: --- start --- test child forked, pid 2634759 Basic Rust perf annotate test Skip: code_with_type workload not built in 'perf test' Pipe Rust perf annotate test Skip: code_with_type workload not built in 'perf test' Basic C perf annotate test Basic annotate test [Success] Pipe C perf annotate test Pipe annotate test [Success] ---- end(0) ---- 83: perf data type profiling tests : Ok root@number:/# Suggested-by: Namhyung Kim <namhyung@kernel.org> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
1a6c45969a |
perf libunwind: Fix calls to thread__e_machine()
Add the missing 'e_flags' option to fix the build.
Fixes:
|
||
|
|
41f1a08645 |
Kbuild/Kconfig updates for 7.0
Kbuild changes
==============
* Drop '*_probe' pattern from modpost section check allowlist, which hid
legitimate warnings (Johan Hovold)
* Disable -Wtype-limits altogether, instead of enabling at W=2 (Vincent
Mailhol)
* Improve UAPI testing to skip testing headers that require a libc when
CONFIG_CC_CAN_LINK is not set, opening up testing of headers with no
libc dependencies to more environments (Thomas Weißschuh)
* Update gendwarfksyms documentation with required dependencies (Jihan
LIN)
* Reject invalid LLVM= values to avoid unintentionally falling back to
system toolchain (Thomas Weißschuh)
* Add a script to help run the kernel build process in a container for
consistent environments and testing (Guillaume Tucker)
* Simplify kallsyms by getting rid of the relative base (Ard Biesheuvel)
* Performance and usability improvements to scripts/make_fit.py (Simon
Glass)
* Minor various clean ups and fixes
Kconfig changes
===============
* Move XPM icons to individual files, clearing up GTK deprecation
warnings (Rostislav Krasny)
* Support
depends on FOO if BAR
as syntactic sugar for
depends on FOO || !BAR' (Nicolas Pitre, Graham Roff)
* Refactor merge_config.sh to use awk over shell/sed/grep, dramatically
speeding up processing large number of config fragments (Anders
Roxell, Mikko Rapeli)
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaYpgQwAKCRAdayaRccAa
liOGAQCqMI42YMLqljFcPu3B/3f43xhDBCXAhquPBIMhbgt+aAEAmmo3uMLHKSRV
XZDKkq13HMMV3Zlmrn5Xk/tzk+hkwwk=
=WYl4
-----END PGP SIGNATURE-----
Merge tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild/Kconfig updates from Nathan Chancellor:
"Kbuild:
- Drop '*_probe' pattern from modpost section check allowlist, which
hid legitimate warnings (Johan Hovold)
- Disable -Wtype-limits altogether, instead of enabling at W=2
(Vincent Mailhol)
- Improve UAPI testing to skip testing headers that require a libc
when CONFIG_CC_CAN_LINK is not set, opening up testing of headers
with no libc dependencies to more environments (Thomas Weißschuh)
- Update gendwarfksyms documentation with required dependencies
(Jihan LIN)
- Reject invalid LLVM= values to avoid unintentionally falling back
to system toolchain (Thomas Weißschuh)
- Add a script to help run the kernel build process in a container
for consistent environments and testing (Guillaume Tucker)
- Simplify kallsyms by getting rid of the relative base (Ard
Biesheuvel)
- Performance and usability improvements to scripts/make_fit.py
(Simon Glass)
- Minor various clean ups and fixes
Kconfig:
- Move XPM icons to individual files, clearing up GTK deprecation
warnings (Rostislav Krasny)
- Support
depends on FOO if BAR
as syntactic sugar for
depends on FOO || !BAR
(Nicolas Pitre, Graham Roff)
- Refactor merge_config.sh to use awk over shell/sed/grep,
dramatically speeding up processing large number of config
fragments (Anders Roxell, Mikko Rapeli)"
* tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (39 commits)
kbuild: remove dependency of run-command on config
scripts/make_fit: Compress dtbs in parallel
scripts/make_fit: Support a few more parallel compressors
kbuild: Support a FIT_EXTRA_ARGS environment variable
scripts/make_fit: Move dtb processing into a function
scripts/make_fit: Support an initial ramdisk
scripts/make_fit: Speed up operation
rust: kconfig: Don't require RUST_IS_AVAILABLE for rustc-option
MAINTAINERS: Add scripts/install.sh into Kbuild entry
modpost: Amend ppc64 save/restfpr symnames for -Os build
MIPS: tools: relocs: Ship a definition of R_MIPS_PC32
streamline_config.pl: remove superfluous exclamation mark
kbuild: dummy-tools: Add python3
scripts: kconfig: merge_config.sh: warn on duplicate input files
scripts: kconfig: merge_config.sh: use awk in checks too
scripts: kconfig: merge_config.sh: refactor from shell/sed/grep to awk
kallsyms: Get rid of kallsyms relative base
mips: Add support for PC32 relocations in vmlinux
Documentation: dev-tools: add container.rst page
scripts: add tool to run containerized builds
...
|
||
|
|
bf2c3138ae |
Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM mediated PMU support for 6.20 Add support for mediated PMUs, where KVM gives the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allows direct access to data MSRs and PMCs (restricted by the vPMU model), but intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. To keep overall complexity reasonable, mediated PMU usage is all or nothing for a given instance of KVM (controlled via module param). The Mediated PMU is disabled default, partly to maintain backwards compatilibity for existing setup, partly because there are tradeoffs when running with a mediated PMU that may be non-starters for some use cases, e.g. the host loses the ability to profile guests with mediated PMUs, the fastpath run loop is also a blind spot, entry/exit transitions are more expensive, etc. Versus the emulated PMU, where KVM is "just another perf user", the mediated PMU delivers more accurate profiling and monitoring (no risk of contention and thus dropped events), with significantly less overhead (fewer exits and faster emulation/programming of event selectors) E.g. when running Specint-2017 on a single-socket Sapphire Rapids with 56 cores and no-SMT, and using perf from within the guest: Perf command: a. basic-sampling: perf record -F 1000 -e 6-instructions -a --overwrite b. multiplex-sampling: perf record -F 1000 -e 10-instructions -a --overwrite Guest performance overhead: --------------------------------------------------------------------------- | Test case | emulated vPMU | all passthrough | passthrough with | | | | | event filters | --------------------------------------------------------------------------- | basic-sampling | 33.62% | 4.24% | 6.21% | --------------------------------------------------------------------------- | multiplex-sampling | 79.32% | 7.34% | 10.45% | --------------------------------------------------------------------------- |
||
|
|
4d84667627 |
Performance events changes for v7.0:
x86 PMU driver updates:
- Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs.
Compared to previous iterations of the Intel PMU code, there's
been a lot of changes, which center around three main areas:
- Introduce the OFF-MODULE RESPONSE (OMR) facility to
replace the Off-Core Response (OCR) facility
- New PEBS data source encoding layout
- Support the new "RDPMC user disable" feature
(Dapeng Mi)
- Likewise, a large series adds uncore PMU support for
Intel Diamond Rapids (DMR) CPUs, which center around these
four main areas:
- DMR may have two Integrated I/O and Memory Hub (IMH) dies,
separate from the compute tile (CBB) dies. Each CBB and
each IMH die has its own discovery domain.
- Unlike prior CPUs that retrieve the global discovery table
portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
discovery and MSR for CBB PMON discovery.
- DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA,
UBR, PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.
- IIO free-running counters in DMR are MMIO-based, unlike SPR.
(Zide Chen)
- Also add support for Add missing PMON units for Intel Panther Lake,
and support Nova Lake (NVL), which largely maps to Panther Lake.
(Zide Chen)
- KVM integration: Add support for mediated vPMUs (by Kan Liang
and Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
Sandipan Das and Mingwei Zhang)
- Add Intel cstate driver to support for Wildcat Lake (WCL)
CPUs, which are a low-power variant of Panther Lake.
(Zide Chen)
- Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
(aka MaxLinear Lightning Mountain), which maps to the existing
Airmont code. (Martin Schiller)
Performance enhancements:
- core: Speed up kexec shutdown by avoiding unnecessary
cross CPU calls. (Jan H. Schönherr)
- core: Fix slow perf_event_task_exit() with LBR callstacks
(Namhyung Kim)
User-space stack unwinding support:
- Various cleanups and refactorings in preparation to generalize
the unwinding code for other architectures. (Jens Remus)
Uprobes updates:
- Transition from kmap_atomic to kmap_local_page (Keke Ming)
- Fix incorrect lockdep condition in filter_chain() (Breno Leitao)
- Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)
Misc fixes and cleanups:
- s390: Remove kvm_types.h from Kbuild (Randy Dunlap)
- x86/intel/uncore: Convert comma to semicolon (Chen Ni)
- x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)
- x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJhTURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i/qw/9F/sjMqbxH8d3kPXB8wk2eUuSkynQ2aNw
Zec9qtfCC5N1U9b2D7ywGJRscTmWYnX/3BKTyzFuyA6SDz6buAgDDIGPlHi+9Fww
+RUUS3lQ7N3pVWZ4Ifu3kbh3Vz4lkQuOXhfcjiyIMS6QIxfrcSLFoKHK+2V6PeU+
x0k+THHz/Ymg+DIpqSjqil1yrKaUmU9xRrbnyy6zJB1duREQrkYBhIWL1+bcd7SA
89RVAGXQ+sWzVQMPaKrMkZj6GavOCB7zseigiiwjBRLznukS2OulDDe8zR6pCJZp
wbdc7TR/nCm+QtNfkHlOmTQvsPAXiXNyXe5Vi8aFjGc0uMGhHaeiL9ah/bwsKA5m
Bm5Y7oVSmBlCJbcr/CTrGYkb+WwvLIgPCwVkn4FYPlsWv+U92qTOx9q7qKmIDFaj
1oUXCwoHbYrYnoZqZqPp2h689m0Lh/lsGhy0QRt8aGnKu0SDjMaqRGbYWH0UI0kA
aDnZstVHG76RnVi0143q8HcvvjNZb82NL0cS749tY/YcwH4kUEGj2XuSK2Ar887T
H0oDJHXijMlGXWqO5bK3WMoCQajR7nRyqBo7/rKYj20OjXmwoXS3vel77/s8WFo2
fUUC469MacDzyxdBNutnkJvvcvsUio3r4MWFsEEWQk2nUE58PtN8YM8j/FdNYpql
zAZ4Jx/A0RM=
=4SRB
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
"x86 PMU driver updates:
- Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
(Dapeng Mi)
Compared to previous iterations of the Intel PMU code, there's been
a lot of changes, which center around three main areas:
- Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the
Off-Core Response (OCR) facility
- New PEBS data source encoding layout
- Support the new "RDPMC user disable" feature
- Likewise, a large series adds uncore PMU support for Intel Diamond
Rapids (DMR) CPUs (Zide Chen)
This centers around these four main areas:
- DMR may have two Integrated I/O and Memory Hub (IMH) dies,
separate from the compute tile (CBB) dies. Each CBB and each IMH
die has its own discovery domain.
- Unlike prior CPUs that retrieve the global discovery table
portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
discovery and MSR for CBB PMON discovery.
- DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR,
PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.
- IIO free-running counters in DMR are MMIO-based, unlike SPR.
- Also add support for Add missing PMON units for Intel Panther Lake,
and support Nova Lake (NVL), which largely maps to Panther Lake.
(Zide Chen)
- KVM integration: Add support for mediated vPMUs (by Kan Liang and
Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
Sandipan Das and Mingwei Zhang)
- Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs,
which are a low-power variant of Panther Lake (Zide Chen)
- Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
(aka MaxLinear Lightning Mountain), which maps to the existing
Airmont code (Martin Schiller)
Performance enhancements:
- Speed up kexec shutdown by avoiding unnecessary cross CPU calls
(Jan H. Schönherr)
- Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)
User-space stack unwinding support:
- Various cleanups and refactorings in preparation to generalize the
unwinding code for other architectures (Jens Remus)
Uprobes updates:
- Transition from kmap_atomic to kmap_local_page (Keke Ming)
- Fix incorrect lockdep condition in filter_chain() (Breno Leitao)
- Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)
Misc fixes and cleanups:
- s390: Remove kvm_types.h from Kbuild (Randy Dunlap)
- x86/intel/uncore: Convert comma to semicolon (Chen Ni)
- x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)
- x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"
* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
s390: remove kvm_types.h from Kbuild
uprobes: Fix incorrect lockdep condition in filter_chain()
x86/ibs: Fix typo in dc_l2tlb_miss comment
x86/uprobes: Fix XOL allocation failure for 32-bit tasks
perf/x86/intel/uncore: Convert comma to semicolon
perf/x86/intel: Add support for rdpmc user disable feature
perf/x86: Use macros to replace magic numbers in attr_rdpmc
perf/x86/intel: Add core PMU support for Novalake
perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
perf/x86/intel: Add core PMU support for DMR
perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
perf/core: Fix slow perf_event_task_exit() with LBR callstacks
perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
uprobes: use kmap_local_page() for temporary page mappings
arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
perf/x86/intel/uncore: Add Nova Lake support
...
|
||
|
|
5d1ab659fb |
perf stat: Add no-affinity flag
Add flag that disables affinity behavior. Using sched_setaffinity() to place a perf thread on a CPU can avoid certain interprocessor interrupts but may introduce a delay due to the scheduling, particularly on loaded machines. Add a command line option to disable the behavior. This behavior is less present in other tools like `perf record`, as it uses a ring buffer and doesn't make repeated system calls. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
d484361550 |
perf evlist: Reduce affinity use and move into iterator, fix no affinity
The evlist__for_each_cpu iterator will call sched_setaffitinity when moving between CPUs to avoid IPIs. If only 1 IPI is saved then this may be unprofitable as the delay to get scheduled may be considerable. This may be particularly true if reading an event group in `perf stat` in interval mode. Move the affinity handling completely into the iterator so that a single evlist__use_affinity can determine whether CPU affinities will be used. For `perf record` the change is minimal as the dummy event and the real event will always make the use of affinities the thing to do. In `perf stat`, tool events are ignored and affinities only used if >1 event on the same CPU occur. Determining if affinities are useful is done by evlist__use_affinity which tests per-event whether the event's PMU benefits from affinity use - it is assumed only perf event using PMUs do. Fix a bug where when there are no affinities that the CPU map iterator may reference a CPU not present in the initial evsel. Fix by making the iterator and non-iterator code common. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
47172912c9 |
perf evlist: Missing TPEBS close in evlist__close()
The libperf evsel close won't close TPEBS events properly. Add a test to do this. The libperf close routine is used in evlist__close() for affinity reasons. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ff8548172f |
perf evlist: Special map propagation for tool events that read on 1 CPU
Tool events like duration_time don't need a perf_cpu_map that contains all online CPUs. Having such a perf_cpu_map causes overheads when iterating between events for CPU affinity. During parsing mark events that just read on a single CPU map index as such, then during map propagation set up the evsel's CPUs and thereby the evlists accordingly. The setting cannot be done early in parsing as user CPUs are only fully known when evlist__create_maps is called. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
63b320aaac |
perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel
The aggr value is setup to always be non-null creating a redundant
guard for reading from it. Switch to using the perf_stat_evsel (ps)
and narrow the scope of aggr so that it is known valid when used.
Fixes:
|
||
|
|
bc105a8918 |
Revert "perf tool_pmu: More accurately set the cpus for tool events"
This reverts commit
|
||
|
|
3d012b8614 |
perf test: Fix test case perftool-testsuite_report for s390
Test case perftool-testsuite_report fails on s390 for some time
now.
Root cause is a time out which is too tight for large s390 machines.
The time out value addr2line_timeout_ms is per default set to 1 second.
This is the maximum time the function read_addr2line_record() waits for
a reply from the forked off tool addr2line, which is started as a child
in interactive mode.
It reads stdin (an address in hexadecimal) and replies on stdout with
function name, file name and line number. This might take more than one
second.
However one second is not always enough and the reply from addr2line
tool is not received. Function read_addr2line_record() fails and emits
a warning, which is not expected by the test case. It fails.
Output before:
# perf test -F 133
-- [ PASS ] -- perf_report :: setup :: prepare the perf.data file
==================
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB \
/tmp/perftool-testsuite_report.FHz/perf_report/perf.data.1 \
(207 samples) ]
==================
-- [ PASS ] -- perf_report :: setup :: prepare the perf.data.1 file
## [ PASS ] ## perf_report :: setup SUMMARY
-- [ SKIP ] -- perf_report :: test_basic :: help message :: testcase skipped
Line did not match any pattern: "cmd__addr2line /usr/lib/debug/lib/modules/
6.19.0-20260205.rc8.git366.9845cf73f7db.300.fc43.s390x+next/
vmlinux: could not read first record"
Line did not match any pattern: "cmd__addr2line /usr/lib/debug/lib/modules/
6.19.0-20260205.rc8.git366.9845cf73f7db.300.fc43.s390x+next/
vmlinux: could not read first record"
-- [ FAIL ] -- perf_report :: test_basic :: basic execution
(output regexp parsing)
....
133: perftool-testsuite_report : FAILED!
Output after:
# ./perf test -F 133
-- [ PASS ] -- perf_report :: setup :: prepare the perf.data file
==================
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB \
/tmp/perftool-testsuite_report.Mlp/perf_report/perf.data.1
(188 samples) ]
==================
-- [ PASS ] -- perf_report :: setup :: prepare the perf.data.1 file
## [ PASS ] ## perf_report :: setup SUMMARY
-- [ SKIP ] -- perf_report :: test_basic :: help message :: testcase skipped
-- [ PASS ] -- perf_report :: test_basic :: basic execution
-- [ PASS ] -- perf_report :: test_basic :: number of samples
-- [ PASS ] -- perf_report :: test_basic :: header
-- [ PASS ] -- perf_report :: test_basic :: header timestamp
-- [ PASS ] -- perf_report :: test_basic :: show CPU utilization
-- [ PASS ] -- perf_report :: test_basic :: pid
-- [ PASS ] -- perf_report :: test_basic :: non-existing symbol
-- [ PASS ] -- perf_report :: test_basic :: symbol filter
-- [ PASS ] -- perf_report :: test_basic :: latency header
-- [ PASS ] -- perf_report :: test_basic :: default report for latency profile
-- [ PASS ] -- perf_report :: test_basic :: latency report for latency profile
-- [ PASS ] -- perf_report :: test_basic :: parallelism histogram
## [ PASS ] ## perf_report :: test_basic SUMMARY
133: perftool-testsuite_report : Ok
#
Fixes:
|
||
|
|
2a400eeba4 |
perf test code_with_type.sh: Skip test if rust wasn't available at build time
$ perf test 'perf data type profiling tests' 83: perf data type profiling tests : Skip $ perf test -vv 'perf data type profiling tests' 83: perf data type profiling tests: --- start --- test child forked, pid 977213 Skip: code_with_type workload not built in 'perf test' ---- end(-2) ---- 83: perf data type profiling tests : Skip $ Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
1d3ffe6233 |
perf tests workload: Formatting for code_with_type.rs
One part of the rust code for code_with_type workload wasn't properly formatted. Pass it through rustfmt to fix that. Closes: https://lore.kernel.org/oe-kbuild-all/202602091357.oyRv6hgQ-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
5490063269 |
KVM/arm64 updates for 7.0
- Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception. - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process. - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers. - More pKVM fixes for features that are not supposed to be exposed to guests. - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor. - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding. - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest. - Preliminary work for guest GICv5 support. - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures. - A small set of FPSIMD cleanups. - Selftest fixes addressing the incorrect alignment of page allocation. - Other assorted low-impact fixes and spelling fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmmGBEkACgkQI9DQutE9 ekPxQQ//VOzle+RVmgzVSJzpNcoW576QGI7+pZLEMIywXTx6rH+uz2FCaZvvgV7M LrJ+1Qps9ea5Yti9OplNJmQwy1yAHIurZnpnAoMR+EJ5PUeq8p1EAypySpHtmT/d KngZsbCvSMydNdfJFwGaz3NFSYj05FlTmWNN+Ndq0JFqyMJQMgY2qKDVmg3pWKcv TLKTNRo9fJFUVhhBIyIoMl2hE36M6Ac3Qd4dUb5J+Fn834QDXgOzVzUjBtkmbSHD kJ4gbSs2Ic6QsYWtt70RlyRdreBYegA4C3z1cZV6DDQYxp5Jz2oqXYYC31Ro520A swuI5y9HMct4mOxqPUqf1lhbvsmkjuZ5Iog6P7W+mOtYHXZIzY8F61sv9YAis9/5 XNOHkg9Cn/n8C2RRQ8vnq0FEI1g7se1UGbe/1NkD4xeR/bzhE/AZSoOrRE7G/XJx qbF9FkPzd4OXYB2Pdm37G1BWsfN4M1bY1rOmmCyMKym793+b/jM7xdoZY1QfbabP uKiavuK8RYgqxrEilhP0asvafKjpZaJbn2R3jwHZgQDWe7WH5FhXwX2UcUpQsTan XZd+/cWaYXjLsKJbiAzy3UArgnzSrHPSpwIOkYq8Lf8EvPgS2g3LLJYbw250Cf1G 74stwoK4PgZ3e6k0nkMk43x1swKb13Gp0vCZjVdnIec9EQgOHfI= =X8iC -----END PGP SIGNATURE----- Merge tag 'kvmarm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 7.0 - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception. - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process. - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers. - More pKVM fixes for features that are not supposed to be exposed to guests. - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor. - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding. - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest. - Preliminary work for guest GICv5 support. - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures. - A small set of FPSIMD cleanups. - Selftest fixes addressing the incorrect alignment of page allocation. - Other assorted low-impact fixes and spelling fixes. |
||
|
|
3f5dfa472e |
tools build: Fix rust feature detection
Features in FEATURE_TESTS_BASIC will be set as being available if
test-all.c builds, so since the rust test isn't included in test-all.c,
we can't have 'rust' in there, remove it from FEATURE_TESTS_BASIC and
use feature-check so that it tries to build test-rust.bin, doing the
actual feature detection.
On a system lacking a rust compiler:
Makefile.config:1158: Rust is not found. Test workloads with rust are disabled.
Auto-detecting system features:
... libdw: [ on ]
... glibc: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libpython: [ on ]
... libcapstone: [ on ]
... llvm-perf: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
... libopenssl: [ on ]
... rust: [ OFF ]
$ cat /tmp/build/perf-tools-next/feature/test-rust.make.output
/bin/sh: line 1: rustc: command not found
$ file /tmp/build/perf-tools-next/feature/test-rust.bin
/tmp/build/perf-tools-next/feature/test-rust.bin: cannot open `/tmp/build/perf-tools-next/feature/test-rust.bin' (No such file or directory)
$
$ perf -vv | grep RUST
rust: [ OFF ] # HAVE_RUST_SUPPORT
$
And after installing it:
... rust: [ on ]
$ cat /tmp/build/perf-tools-next/feature/test-rust.make.output
$ file /tmp/build/perf-tools-next/feature/test-rust.bin
/tmp/build/perf-tools-next/feature/test-rust.bin: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9c416edf673ee3705b97bae893a99a6fcf1ee258, for GNU/Linux 3.2.0, with debug_info, not stripped
$
$ perf -vv | grep RUST
rust: [ on ] # HAVE_RUST_SUPPORT
$
Fixes:
|
||
|
|
335047109d |
perf tests: Test annotate with data type profiling and C
Exercise the annotate command with data type profiling feature with C. For that extend the existing data type profiling shell test to profile the datasym workload, then annotate the result expecting to see some data structures from the C code. Committer testing: root@number:~# perf test 'perf data type profiling tests' 83: perf data type profiling tests : Ok root@number:~# perf test -vv 'perf data type profiling tests' 83: perf data type profiling tests: --- start --- test child forked, pid 125028 Basic Rust perf annotate test Basic annotate test [Success] Pipe Rust perf annotate test Pipe annotate test [Success] Basic C perf annotate test Basic annotate test [Success] Pipe C perf annotate test Pipe annotate test [Success] ---- end(0) ---- 83: perf data type profiling tests : Ok root@number:~# Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
f60a5c2296 |
perf tests: Test annotate with data type profiling and rust
Exercise the annotate command with data type profiling feature on the rust runtime. For that add a new shell test, which will profile the code_with_type workload, then annotate the result expecting to see some data structures from the rust code. Committer testing: root@number:~# perf test 'perf data type profiling tests' 83: perf data type profiling tests : Ok root@number:~# perf test -v 'perf data type profiling tests' 83: perf data type profiling tests : Ok root@number:~# perf test -vv 'perf data type profiling tests' 83: perf data type profiling tests: --- start --- test child forked, pid 111044 Basic perf annotate test Basic annotate test [Success] Pipe perf annotate test Pipe annotate test [Success] ---- end(0) ---- 83: perf data type profiling tests : Ok root@number:~# Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
2e05bb52a1 |
perf test workload: Add code_with_type test workload
The purpose of the workload is to gather samples of rust runtime. To
achieve that it has a dummy rust library linked with it.
Per recommendations for such scenarios [1], the rust library is
statically linked.
An example:
$ perf record perf test -w code_with_type
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.160 MB perf.data (4074 samples) ]
$ perf report --stdio --dso perf -s srcfile,srcline
45.16% ub_checks.rs ub_checks.rs:72
6.72% code_with_type.rs code_with_type.rs:15
6.64% range.rs range.rs:767
4.26% code_with_type.rs code_with_type.rs:21
4.23% range.rs range.rs:0
3.99% code_with_type.rs code_with_type.rs:16
[...]
[1]: https://doc.rust-lang.org/reference/linkage.html#mixed-rust-and-foreign-codebases
Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
6a32fa5ccd |
tools build: Add a feature test for rust compiler
Add a feature test to identify if the rust compiler is available, so that perf could build rust based worloads based on that. Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ff9aeb6bd1 |
perf test parse-metric: Ensure aggregate counts appear to have run
Commit |
||
|
|
c60ee958d6 |
perf test record.sh: Fix shellcheck warning
Add quotes to avoid the following warning:
```
In tests/shell/record.sh line 264:
[ $(uname -m) = "s390x" ] && {
^---------^ SC2046 (warning): Quote this to prevent word splitting.
For more information:
https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt...
```
Fixes:
|
||
|
|
36a1b0061a |
perf build: Reduce pmu-events related copying and mkdirs
When building to an output directory the previous code would remove files and then copy the source files over. Each source file copy would have a rule to make its directory. All JSON for every architecture was considered a source file. This led to unnecessary copying as a file would be deleted and then the same file copied again, unnecessary directory making, and copying of files not used in the build. A side-effect would be a lot of build messages. This change makes it so that all computed output files are created and then compared to all files in the OUTPUT directory. By filtering out the files that would be copied, unnecessary files can be determined and then deleted - note, this is a phony target which would remake the pmu-events.c if always depended upon, and so the dependency is conditional on there being files to remove. This has some overhead as the $(OUTPUT)/pmu-events is "find" over rather than just "rm -fr", but the savings from unnecessary copying, etc. should make up for this new make overhead. The copy target just does copying but has a dependency on the directory it needs being built, avoiding repetitive mkdirs. The source files for copying only consider the JEVENTS_ARCH unless the JEVENTS_ARCH is all. The metric JSON is only generated if appropriate, rather than always being generated and jevents.py deciding whether or not to use the files. The mypy and pylint targets are fixed as variable names had changed but the rules not updated. The line count of a build with "make -C tools/perf O=/tmp/perf clean all" prior to this change was 2181 lines, after this change it is 1596 lines. This is a reduction of 585 lines or about 27%. The generated pmu-events.c for JEVENTS_ARCH "x86" and "all" were validated as being identical after this change. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
4479884d1f |
perf lock contention: fix segfault in lock contention -b/--use-bpf
When run on a kernel without BTF info, perf crashes:
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
Program received signal SIGSEGV, Segmentation fault.
0x00005555556915b7 in btf.type_cnt ()
(gdb) bt
#0 0x00005555556915b7 in btf.type_cnt ()
#1 0x0000555555691fbc in btf_find_by_name_kind ()
#2 0x00005555556920d0 in btf.find_by_name_kind ()
#3 0x00005555558a1b7c in init_numa_data (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:125
#4 0x00005555558a264b in lock_contention_prepare (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:313
#5 0x0000555555620702 in __cmd_contention (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2084
#6 0x0000555555622c8d in cmd_lock (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2755
#7 0x0000555555651451 in run_builtin (p=0x555556104f00 <commands+576>, argc=3, argv=0x7fffffffea10)
at perf.c:349
#8 0x00005555556516ed in handle_internal_command (argc=3, argv=0x7fffffffea10) at perf.c:401
#9 0x000055555565184e in run_argv (argcp=0x7fffffffe7fc, argv=0x7fffffffe7f0) at perf.c:445
#10 0x0000555555651b9f in main (argc=3, argv=0x7fffffffea10) at perf.c:553
Check if btf loading failed, and don't do anything with it in
init_numa_data(). This leads to the following error message, instead of
just a crash:
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -ESRCH
libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH
Failed to load lock-contention BPF skeleton
lock contention BPF setup failed
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
920c5570a6 |
perf sort: Replace static cacheline size with sysconf cacheline size
Testing: - Built perf - Executed perf mem record and report Committer notes: This addresses a TODO and improves the situation where record and report/c2c are performed on the same machine or in machines with the same cacheline size, but the proper way is to store the cacheline size in the perf.data header at 'record' time and then use it at post processing time. Signed-off-by: Ricky Ringler <ricky.ringler@proton.me> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20260129004223.26799-1-ricky.ringler@proton.me Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
c73a56ed3c |
perf test: Fix test case Leader sampling on s390
The subtest 'Leader sampling' some time fails on s390.
- for z/VM guest: Disable the test for z/VM guest. There is no
CPU Measurement facility to run the test successfully.
- for LPAR: Use correct event names.
A detailed analysis follows here:
Now to the debugging and investigation:
1. With command
perf record -e '{cycles,cycles}:S' -- ....
the first cycles event starts sampling.
On s390 this sets up sampling with a frequency of 4000 Hz.
This translates to hardware sample rate of 1377000 instructions per
micro-second to meet a frequency of 4000 HZ.
2. With first event cycles now sampling into a hardware buffer, an
interrupt is triggered each time a sampling buffer gets full.
The interrupt handler is then invoked and debug output shows the
processing of samples. The size of one hardware sample is 32 bytes.
With an interrupt triggered when the hardware buffer page of 4KB
gets full, the interrupt handler processes 128 samples.
(This is taken from s390 specific fast debug data gathering)
2025-11-07 14:35:51.977248 000003ffe013cbfa \
perf_event_count_update event->count 0x0 count 0x1502e8
2025-11-07 14:35:51.977248 000003ffe013cbfa \
perf_event_count_update event->count 0x1502e8 count 0x1502e8
2025-11-07 14:35:51.977248 000003ffe013cbfa \
perf_event_count_update event->count 0x2a05d0 count 0x1502e8
2025-11-07 14:35:51.977252 000003ffe013cbfa \
perf_event_count_update event->count 0x3f08b8 count 0x1502e8
2025-11-07 14:35:51.977252 000003ffe013cbfa \
perf_event_count_update event->count 0x540ba0 count 0x1502e8
2025-11-07 14:35:51.977253 000003ffe013cbfa \
perf_event_count_update event->count 0x690e88 count 0x1502e8
2025-11-07 14:35:51.977254 000003ffe013cbfa \
perf_event_count_update event->count 0x7e1170 count 0x1502e8
2025-11-07 14:35:51.977254 000003ffe013cbfa \
perf_event_count_update event->count 0x931458 count 0x1502e8
2025-11-07 14:35:51.977254 000003ffe013cbfa \
perf_event_count_update event->count 0xa81740 count 0x1502e8
3. The value is constantly increasing by the number of instructions
executed to generate a sample entry. This is the first line of the
pairs of lines. count 0x1502e8 --> 1377000
# perf script | grep 1377000 | wc -l
214
# perf script | wc -l
428
#
That is 428 lines in total, and half of the lines contain value
1377000.
4. The second event cycles is opened against the counting PMU, which
is an independent PMU and is not interrupt driven. Once enabled it
runs in the background and keeps running, incrementing silently
about 400+ counters. The counter values are read via assembly
instructions.
This second counter PMU's read call back function is called when the
interrupt handler of the sampling facility processes each sample. The
function call sequence is:
perf_event_overflow()
+--> __perf_event_overflow()
+--> __perf_event_output()
+--> perf_output_sample()
+--> perf_output_read()
+--> perf_output_read_group()
for_each_sibling_event(sub, leader) {
values[n++] = perf_event_count(sub, self);
printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]);
}
The last function perf_event_count() is invoked on the second event
cylces *on* the counting PMU. An added printk statement shows the
following lines in the dmesg output:
# dmesg|grep perf_output_read_group |head -10
[ 332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1)
[ 332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2)
[ 332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3)
[ 332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4)
[ 332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5)
[ 332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b
[ 332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790
[ 332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b
[ 332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888
#
This correlates with the output of
# perf report -D | grep 'id 00000000000000'|head -10
..... id 0000000000000006, value 00000000001502e8, lost 0
..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above
..... id 0000000000000006, value 00000000002a05d0, lost 0
..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above
..... id 0000000000000006, value 00000000003f08b8, lost 0
..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above
..... id 0000000000000006, value 0000000000540ba0, lost 0
..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above
..... id 0000000000000006, value 0000000000690e88, lost 0
..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above
Summary:
- Above command starts the CPU sampling facility, with runs interrupt
driven when a 4KB page is full. An interrupt processes the 128 samples
and calls eventually perf_output_read_group() for each sample to save it
in the event's ring buffer.
- At that time the CPU counting facility is invoked to read the value of
the event cycles. This value is saved as the second value in the
sample_read structure.
- The first and odd lines in the perf script output displays the period
value between 2 samples being created by hardware. It is the number
of instructions executes before the hardware writes a sample.
- The second and even lines in the perf script output displays the number
of CPU cycles needed to process each sample and save it in the event's
ring buffer.
These 2 different values can never be identical on s390.
Since event leader sampling is not possible on s390 the perf tool will
return EOPNOTSUPP soon. Perpare the test case for that.
Suggested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Jan Polensky <japo@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
64ea7a4620 |
perf annotate: Fix register usage in data type profiling
On data type profiling, it tried to match register name with a partial string. For example, it allowed to match with "%rbp)" or "%rdi,8)". But with recent change in the area, it doesn't match anymore and break the data type profiling. Let's pass the correct register name by removing the unwanted part. Add arch__dwarf_regnum() to handle it in a single place. Closes: 7d3n23li6drroxrdlpxn7ixehdeszkjdftah3zyngjl2qs22ef@yelcjv53v42o Reported-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
bb5a920b90 |
perf stat: Ensure metrics are displayed even with failed events
Currently, `perf stat` skips or hides metrics when the underlying
hardware events cannot be counted (e.g., due to insufficient permissions
or unsupported events).
In `--metric-only` mode, this often results in missing columns or blank
spaces, making the output difficult to parse.
Modify the logic to ensure metrics are consistently displayed by
propagating NAN (Not a Number) through the expression evaluator.
Specifically:
1. Update `prepare_metric()` in stat-shadow.c to treat uncounted events
(where `run == 0`) as NAN. This leverages the existing math in expr.y
to propagate NAN through metric expressions.
2. Remove the early return in the display logic's `printout()` function
that was previously skipping metrics in `--metric-only` mode for
failed events.
l
3. Simplify `perf_stat__skip_metric_event()` to no longer depend on
event runtime.
Tested:
1. `perf all metrics test` did not crash while paranoid is 2.
2. Multiple combinations with `CPUs_utilized` while paranoid is 2.
$ ./perf stat -M CPUs_utilized -a -- sleep 1
Performance counter stats for 'system wide':
<not supported> msec cpu-clock:u # nan CPUs CPUs_utilized
1,006,356,120 duration_time
1.004375550 seconds time elapsed
$ ./perf stat -M CPUs_utilized -a -j -- sleep 1
{"counter-value" : "<not supported>", "unit" : "msec", "event" : "cpu-clock:u", "event-runtime" : 0, "pcnt-running" : 100.00, "metric-value" : "nan", "metric-unit" : "CPUs CPUs_utilized"}
{"counter-value" : "1006642462.000000", "unit" : "", "event" : "duration_time", "event-runtime" : 1, "pcnt-running" : 100.00}
$ ./perf stat -M CPUs_utilized -a --metric-only -- sleep 1
Performance counter stats for 'system wide':
CPUs CPUs_utilized
nan
1.004424652 seconds time elapsed
$ ./perf stat -M CPUs_utilized -a --metric-only -j -- sleep 1
{"CPUs CPUs_utilized" : "none"}
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
446c595dc0 |
perf test addr2line_inlines: Ensure inline information shows on LBR leaves
Expand the addr2line inline function testing to also run for an LBR callchain, skipping if LBR support isn't present. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
04f81f45b4 |
perf callchain lbr: Make the leaf IP that of the sample
The current IP of a leaf function when reported from a perf record with
"--call-graph lbr" is the "to" field of the LBR branch stack record.
The sample for the event being recorded may be further into the function
and there may be inlining information associated with it.
Rather than use the branch stack "to" field in this case switch to the
callchain appending the sample->ip and thereby allowing the inline
information to show.
Before this change:
```
$ perf record --call-graph lbr perf test -w inlineloop
...
$ perf script --fields +srcline
...
perf-inlineloop 467586 4649.344493: 950905 cpu_core/cycles/P:
55dfda2829c0 parent+0x0 (perf)
inlineloop.c:31
55dfda282a96 inlineloop+0x86 (perf)
inlineloop.c:47
55dfda236420 run_workload+0x59 (perf)
builtin-test.c:715
55dfda236b03 cmd_test+0x413 (perf)
builtin-test.c:825
...
```
After this change:
```
$ perf record --call-graph lbr perf test -w inlineloop
...
$ perf script --fields +srcline
...
perf-inlineloop 529703 11878.680815: 950905 cpu_core/cycles/P:
555ce86be9e6 leaf+0x26
inlineloop.c:20 (inlined)
555ce86be9e6 middle+0x26
inlineloop.c:27 (inlined)
555ce86be9e6 parent+0x26 (perf)
inlineloop.c:32
555ce86bea96 inlineloop+0x86 (perf)
inlineloop.c:47
555ce8672420 run_workload+0x59 (perf)
builtin-test.c:715
555ce8672b03 cmd_test+0x413 (perf)
builtin-test.c:825
...
```
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
a724a8fce5 |
perf kvm stat: Fix build error
Since commit |
||
|
|
e5e66adfe4 |
perf regs: Remove __weak attributive arch_sdt_arg_parse_op() function
In line with the previous patch, the __weak arch_sdt_arg_parse_op() function is removed. Architectural-specific implementations in the arch/ directory are now converted into sub-functions within the util/perf-regs-arch/ directory. The perf_sdt_arg_parse_op() function will call these sub-functions based on the EM_HOST. This change enables cross-architecture calls to arch_sdt_arg_parse_op(). No functional changes are intended. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <pjw@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xudong Hao <xudong.hao@intel.com> Cc: Zide Chen <zide.chen@intel.com> [ Fixed up somme fuzz with powerpc and x86 Build files wrt removing perf_regs.o ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
16dccbb842 |
perf regs: Remove __weak attributive arch__xxx_reg_mask() functions
Currently, some architecture-specific perf-regs functions, such as arch__intr_reg_mask() and arch__user_reg_mask(), are defined with the __weak attribute. This approach ensures that only functions matching the architecture of the build/run host are compiled and executed, reducing build time and binary size. However, this __weak attribute restricts these functions to be called only on the same architecture, preventing cross-architecture functionality. For example, a perf.data file captured on x86 cannot be parsed on an ARM platform. To address this limitation, this patch removes the __weak attribute from these perf-regs functions. The architecture-specific code is moved from the arch/ directory to the util/perf-regs-arch/ directory. The appropriate architectural functions are then called based on the EM_HOST. No functional changes are intended. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <pjw@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xudong Hao <xudong.hao@intel.com> Cc: Zide Chen <zide.chen@intel.com> [ Fixed up somme fuzz with s390 and riscv Build files wrt removing perf_regs.o ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
e716e69cf6 |
perf arch: Update arch headers to use relative UAPI paths
The architectural specific headers perf_regs.h currently rely on the host architecture's 'asm/perf_regs.h'. This can lead to compilation inconsistencies or failures when including and building perf for a target architecture that differs from the host's architecture. Explicitly point to the UAPI headers within the tools source tree using relative paths. This ensures that perf is always built against the intended architecture. No functional changes are intended. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <pjw@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xudong Hao <xudong.hao@intel.com> Cc: Zide Chen <zide.chen@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
c2e28ae294 |
perf regs: Fix abort for "-I" or "--user-regs" options
Fix an issue where the `perf` tool aborts unexpectedly when running the
following command:
```
perf record -e cycles -I -- true
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-I, --intr-regs[=<any register>]
sample selected machine registers on interrupt, use '-I?' to list register names
```
The usage of the `-I` or `--user-regs` options without specifying any
registers should default to sampling all general-purpose registers.
However, this currently causes an abnormal termination.
The issue was introduced by commit
|
||
|
|
cee275edcd |
perf metricgroup: Don't early exit if no CPUID table exists
The failure to find a table of metrics with a CPUID shouldn't early
exit as the metric code will now also consider the default table.
When searching for a metric or metric group,
pmu_metrics_table__for_each_metric() considers all tables and so the
caller doesn't need to switch the table to do this.
Fixes:
|
||
|
|
f637bb2eed |
perf tests: build-test coverage for NO_JEVENTS=1
Leo reported 'perf stat' being broken and this highlighted that the 'make NO_JEVENTS=1' variant is missing from 'make -C tools/perf build-test', add it. Closes: https://lore.kernel.org/linux-perf-users/20260205175250.GC3529712@e132581.arm.com/ Reported-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
1d9622c3c1 |
perf tests: Additional 'perf stat' tests
Recently 'perf stat' regressed in per CPU mode [1]. Let's expand test coverage to catch the same breakage again as well as to test the repeat, pid, detailed and no aggregation options. [1] https://lore.kernel.org/linux-perf-users/cgja46br2smmznxs7kbeabs6zgv3b4olfqgh2fdp5mxk2yom4v@w6jjgov6hdi6/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
a108a6a4b9 |
perf record: Make logs more readable for event open failures
Since commit
|