Commit graph

231 commits

Author SHA1 Message Date
Costa Shulyupin
6ea8a20610 rtla: Fix parse_cpu_set() bug introduced by strtoi()
The patch 'Replace atoi() with a robust strtoi()' introduced a bug
in parse_cpu_set(), which relies on partial parsing of the input string.

The function parses CPU specifications like '0-3,5' by incrementing
a pointer through the string. strtoi() rejects strings with trailing
characters, causing parse_cpu_set() to fail on any CPU list with
multiple entries.

Restore the original use of atoi() in parse_cpu_set().
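
As an illustration of the partial parsing described above, here is a minimal sketch of a CPU-list parser. It is not the rtla implementation: the function name, the SKETCH_MAX_CPUS limit, and the output array are assumptions made for this example. It only shows why a converter that stops at the first non-digit suits this kind of pointer-walking parser.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_CPUS 64	/* arbitrary limit chosen for this sketch */

/*
 * Hypothetical sketch: parse a CPU list such as "0-3,5" into monitored[].
 * Returns 0 on success and 1 on failure. atoi() converts the leading
 * digits and ignores whatever follows, so the pointer can then be advanced
 * to the next '-' or ',' by hand.
 */
static int parse_cpu_list_sketch(const char *list, char monitored[SKETCH_MAX_CPUS])
{
	const char *p = list;

	memset(monitored, 0, SKETCH_MAX_CPUS);

	while (*p) {
		int cpu, end_cpu;

		if (*p < '0' || *p > '9')
			return 1;
		cpu = atoi(p);		/* stops at the first non-digit */
		while (*p >= '0' && *p <= '9')
			p++;

		end_cpu = cpu;
		if (*p == '-') {
			p++;
			if (*p < '0' || *p > '9')
				return 1;
			end_cpu = atoi(p);
			while (*p >= '0' && *p <= '9')
				p++;
		}

		if (cpu > end_cpu || end_cpu >= SKETCH_MAX_CPUS)
			return 1;
		for (int i = cpu; i <= end_cpu; i++)
			monitored[i] = 1;

		if (*p == ',')
			p++;
		else if (*p)
			return 1;
	}

	return 0;
}
```

A converter that rejects trailing characters would fail on the very first atoi() call site here, because "0-3,5" continues after the leading "0".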

Fixes: 7e9dfccf8f ("rtla: Replace atoi() with a robust strtoi()")
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20260112192642.212848-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-13 08:32:52 +01:00
Wander Lairson Costa
fb8b818320 rtla: Fix parse_cpu_set() return value documentation
Correct the return value documentation for parse_cpu_set() function
in utils.c. The comment incorrectly stated that the function returns
1 on success and 0 on failure, but the actual implementation returns
0 on success and 1 on failure, following the common error-on-nonzero
convention used throughout the codebase.

This documentation fix ensures that developers reading the code
understand the correct return value semantics and prevents potential
misuse of the function's return value in conditional checks.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-18-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:56 +01:00
Wander Lairson Costa
33e3c807ab rtla: Ensure null termination after read operations in utils.c
Add explicit null termination and buffer initialization for read()
operations in procfs_is_workload_pid() and get_self_cgroup() functions.
The read() system call does not null-terminate the data it reads, and
when the buffer is filled to capacity, subsequent string operations
will read past the buffer boundary searching for a null terminator.

In procfs_is_workload_pid(), explicitly set buffer[MAX_PATH-1] to '\0'
to ensure the buffer is always null-terminated before passing it to
strncmp(). In get_self_cgroup(), use memset() to zero the path buffer
before reading, which ensures null termination when retval is less than
MAX_PATH. Additionally, set path[MAX_PATH-1] to '\0' after the read to
handle the case where the buffer is filled completely.
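
The pattern described above can be sketched as a small helper. This is an assumed illustration rather than code from utils.c; the helper name and its signature are hypothetical.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read at most size bytes from fd into buf.
 * read() never appends a '\0', so the buffer is zeroed first (which covers
 * short reads) and its last byte is forced to '\0' afterwards (which covers
 * a read that fills the buffer completely).
 * Returns the byte count reported by read(), or -1 on error.
 */
static ssize_t read_terminated_sketch(int fd, char *buf, size_t size)
{
	ssize_t retval;

	if (size == 0)
		return -1;

	memset(buf, 0, size);
	retval = read(fd, buf, size);
	if (retval < 0)
		return -1;

	buf[size - 1] = '\0';
	return retval;
}
```

Note that when the buffer is filled completely, the final byte of data is overwritten by the terminator, so the resulting string is at most size - 1 characters long.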

These defensive buffer handling practices prevent potential buffer
overruns and align with the ongoing buffer safety improvements across
the rtla codebase.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-17-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:56 +01:00
Wander Lairson Costa
af2962d68b rtla: Make stop_tracing variable volatile
The stop_tracing global variable is accessed from both the signal
handler context and the main program flow without synchronization.
This creates a potential race condition where compiler optimizations
could cache the variable value in registers, preventing the signal
handler's updates from being visible to other parts of the program.

Add the volatile qualifier to stop_tracing in both common.c and
common.h to ensure all accesses to this variable bypass compiler
optimizations and read directly from memory. This guarantees that
when the signal handler sets stop_tracing, the change is immediately
visible to the main program loop, preventing potential hangs or
delayed shutdown when termination signals are received.
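
A minimal sketch of a handler-written stop flag follows. The names are hypothetical, and the use of sig_atomic_t is an assumption of this example: the commit message does not state the type of stop_tracing.

```c
#include <signal.h>

/*
 * Sketch only. volatile makes the compiler re-read the flag from memory on
 * every access, so a loop polling it cannot keep a stale copy in a
 * register. sig_atomic_t is the type ISO C specifies for objects written
 * from a signal handler.
 */
static volatile sig_atomic_t stop_flag_sketch;

static void stop_handler_sketch(int sig)
{
	(void)sig;
	stop_flag_sketch = 1;
}

/* Install the handler for SIGINT; returns 0 on success, -1 on failure. */
static int install_stop_handler_sketch(void)
{
	if (signal(SIGINT, stop_handler_sketch) == SIG_ERR)
		return -1;
	return 0;
}
```

A main loop would then poll the flag, for example `while (!stop_flag_sketch) { ... }`.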

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-16-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
02689ae385 rtla: Add generated output files to gitignore
The rtla tool generates various output files during testing and
execution, including custom trace outputs and histogram data. These
files are artifacts of running the tool with different options and
should not be tracked in version control.

Add gitignore entries for custom_filename.txt, osnoise_irq_noise_hist.txt,
osnoise_trace.txt, and timerlat_trace.txt to prevent accidentally
committing these generated files. This aligns with the existing pattern
of ignoring build artifacts and generated headers like *.skel.h.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-15-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
a0890f9dbd rtla: Fix NULL pointer dereference in actions_parse
The actions_parse() function uses strtok() to tokenize the trigger
string, but does not check if the returned token is NULL before
passing it to strcmp(). If the trigger parameter is an empty string
or contains only delimiter characters, strtok() returns NULL, causing
strcmp() to dereference a NULL pointer and crash the program.

This issue can be triggered by malformed user input or edge cases in
trigger string parsing. Add a NULL check immediately after the strtok()
call to validate that a token was successfully extracted before using
it. If no token is found, the function now returns -1 to indicate a
parsing error.
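
The check can be sketched as follows. The function below is hypothetical and much simpler than actions_parse(); it only demonstrates testing the strtok() result before handing it to strcmp().

```c
#include <string.h>

/*
 * Hypothetical sketch: compare the first comma-separated token of trigger
 * with expected. Returns 1 on a match, 0 on a mismatch, and -1 when no
 * token could be extracted. Note that strtok() modifies trigger.
 */
static int first_token_is_sketch(char *trigger, const char *expected)
{
	char *token = strtok(trigger, ",");

	/* strtok() returns NULL for "" or a string made only of delimiters. */
	if (!token)
		return -1;

	return strcmp(token, expected) == 0;
}
```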

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-13-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
f3cc3e4b51 rtla: Remove unused headers
Remove unused includes for <errno.h> and <signal.h> to clean up the
code and reduce unnecessary dependencies.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-12-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
d849f3af1c rtla: Remove redundant memset after calloc
The actions struct is allocated using calloc, which already returns
zeroed memory. The subsequent memset call to zero the 'present' member
is therefore redundant.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-10-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
9bf942f3c3 rtla: Use standard exit codes for result enum
The result enum defines custom values for PASSED, ERROR, and FAILED.
These values correspond to standard exit codes EXIT_SUCCESS and
EXIT_FAILURE.

Update the enum to use the standard macros EXIT_SUCCESS and
EXIT_FAILURE to improve readability and adherence to standard C
practices.

The FAILED value is implicitly assigned EXIT_FAILURE + 1, so there
is no need to assign an explicit value.
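
For illustration, an enum of the shape described above might look like the sketch below. The enumerator names come from the commit message; the enum tag is an assumption.

```c
#include <stdlib.h>

/* Sketch of the described enum; the tag name is hypothetical. */
enum result_sketch {
	PASSED = EXIT_SUCCESS,
	ERROR = EXIT_FAILURE,
	FAILED,			/* implicitly EXIT_FAILURE + 1 */
};
```

This works because EXIT_SUCCESS and EXIT_FAILURE expand to integer constant expressions, and an enumerator without an explicit value takes the previous enumerator's value plus one.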

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-9-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
7e9dfccf8f rtla: Replace atoi() with a robust strtoi()
The atoi() function does not perform error checking, which can lead to
undefined behavior when parsing invalid or out-of-range strings. This
can cause issues when parsing user-provided numerical inputs, such as
signal numbers, PIDs, or CPU lists.

To address this, introduce a new strtoi() helper function that safely
converts a string to an integer. This function validates the input and
checks for overflows, returning a negative value on failure.

Replace all calls to atoi() with the new strtoi() function and add
proper error handling to make the parsing more robust and prevent
potential issues.
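
A strict conversion helper along these lines could look like the following sketch. The signature (string in, result through a pointer, 0 or -1 returned) is an assumption; the commit message only says that a negative value is returned on failure.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical strict string-to-int conversion built on strtol().
 * Returns 0 on success and -1 on empty input, trailing characters, or a
 * value that does not fit in an int.
 */
static int strtoi_sketch(const char *s, int *res)
{
	char *end;
	long val;

	if (!s || !*s)
		return -1;

	errno = 0;
	val = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -1;
	if (val < INT_MIN || val > INT_MAX)
		return -1;

	*res = (int)val;
	return 0;
}
```

Because trailing characters are rejected, such a helper is unsuitable for callers that deliberately parse only the numeric prefix of a longer string.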

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-5-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Wander Lairson Costa
648634d17c rtla: Introduce for_each_action() helper
The for loop to iterate over the list of actions is used in
more than one place. To avoid code duplication and improve
readability, introduce a for_each_action() helper macro.

Replace the open-coded for loops with the new helper.
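
An iteration macro of this kind can be sketched as below. The struct layout and field names are assumptions made for the example and are not taken from the rtla sources.

```c
/* Hypothetical types; the field names are assumptions for this sketch. */
struct action_sketch {
	int type;
};

struct actions_sketch {
	struct action_sketch *list;
	int len;
};

/* Visit every action in the set, hiding the loop bookkeeping. */
#define for_each_action_sketch(actions, action)			\
	for ((action) = (actions)->list;			\
	     (action) < (actions)->list + (actions)->len;	\
	     (action)++)

/* Example user of the macro: count the actions of a given type. */
static int count_actions_of_type_sketch(const struct actions_sketch *actions,
					int type)
{
	const struct action_sketch *action;
	int count = 0;

	for_each_action_sketch(actions, action)
		if (action->type == type)
			count++;

	return count;
}
```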

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20260106133655.249887-4-wander@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:55 +01:00
Costa Shulyupin
2a3a25336b tools/rtla: Deduplicate cgroup path opening code
Both set_pid_cgroup() and set_comm_cgroup() functions contain
identical code for opening the cgroup.procs file.

Extract this common code into a new helper function open_cgroup_procs()
to reduce code duplication and improve maintainability.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251224125058.1771519-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:38 +01:00
Costa Shulyupin
0576be469e tools/rtla: Consolidate -H/--house-keeping option parsing
Each rtla tool duplicates parsing of -H/--house-keeping.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-8-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:17 +01:00
Costa Shulyupin
5cc90b14ee tools/rtla: Consolidate -P/--priority option parsing
Each rtla tool duplicates parsing of -P/--priority.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-7-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:17 +01:00
Costa Shulyupin
c93c25fca5 tools/rtla: Consolidate -e/--event option parsing
Each rtla tool duplicates parsing of -e/--event.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-6-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:17 +01:00
Costa Shulyupin
76975581fb tools/rtla: Consolidate -d/--duration option parsing
Each rtla tool duplicates parsing of -d/--duration.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-5-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:17 +01:00
Costa Shulyupin
fd788c49a9 tools/rtla: Consolidate -D/--debug option parsing
Each rtla tool duplicates parsing of -D/--debug.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-4-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:17 +01:00
Costa Shulyupin
edb23c8372 tools/rtla: Consolidate -C/--cgroup option parsing
Each rtla tool duplicates parsing of -C/--cgroup.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-3-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:17 +01:00
Costa Shulyupin
28dc445919 tools/rtla: Consolidate -c/--cpus option parsing
Each rtla tool duplicates parsing of -c/--cpus.

Migrate the option parsing from individual tools to the
common_parse_options().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:16 +01:00
Costa Shulyupin
850cd24cb6 tools/rtla: Add common_parse_options()
Each rtla tool duplicates parsing of many common options. This creates
maintenance overhead and risks inconsistencies when updating these
options.

Add common_parse_options() to centralize parsing of options used across
all tools.

Common options to be migrated in future patches.

Changes since v1:
- restore opterr

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251209100047.2692515-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:16 +01:00
Tomas Glozar
fbb8ed6682 rtla/tests: Run Test::Harness in verbose mode
Add -v flag to prove command to also print the names of tests that
succeeded, not only those that failed, to allow easier debugging of the
test suite.

Also, drop printing the option and value to stdout in
check_with_osnoise_options, which was a debugging print that was
accidentally left in the final commit, and which would be otherwise now
visible in make check output, as stdout is no longer suppressed.

Suggested-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251126144205.331954-6-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:16 +01:00
Tomas Glozar
5525aebd4e rtla/tests: Test BPF action program
Add a test that implements a BPF program writing to a test map, which
is attached to RTLA via --bpf-action to be executed on threshold
overflow.

A combination of --on-threshold shell with bpftool (which is always
present if BPF support is enabled) is used to check whether the BPF
program has executed successfully.

Suggested-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251126144205.331954-5-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:16 +01:00
Tomas Glozar
0304a3b7ec rtla/timerlat: Add example for BPF action program
Add an example BPF action program that prints the measured latency to
the tracefs buffer via bpf_printk().

A new Makefile target, "examples", is added to build the example. In
addition, "sample/" subfolder is renamed to "example".

If BPF skeleton support is unavailable or disabled, a warning will be
displayed when building the BPF action program example.

Link: https://lore.kernel.org/r/20251126144205.331954-4-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:16 +01:00
Tomas Glozar
f967d1eca7 rtla/timerlat: Add --bpf-action option
Add option --bpf-action that allows the user to attach an external BPF
program that will be executed via BPF tail call on latency threshold
overflow.

Executing additional BPF code on latency threshold overflow allows doing
low-latency and in-kernel troubleshooting of the cause of the overflow.

The option takes an argument, which is a path to a BPF ELF file
expected to contain a function named "action_handler" in a section named
"tp/timerlat_action" (the section is necessary for libbpf to assign the
correct BPF program type to it).

Link: https://lore.kernel.org/r/20251126144205.331954-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:16 +01:00
Tomas Glozar
8cd0f08ac7 rtla/timerlat: Support tail call from BPF program
Add a map to the rtla-timerlat BPF program that holds a file descriptor
of another BPF program, to be executed on threshold overflow.

timerlat_bpf_set_action() is added as an interface to set the program.

Link: https://lore.kernel.org/r/20251126144205.331954-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:15 +01:00
Costa Shulyupin
a08e012e81 tools/rtla: Add common_usage()
The rtla tools have significant code quadruplication in their usage
functions. Each tool implements its own version of the same help text
formatting and option descriptions, leading to maintenance overhead and
inconsistencies.  Documentation/tools/rtla/common_options.rst lists 14
common options.

Add common_usage() infrastructure to consolidate help formatting.
Subsequent patches will extend this to handle other common options.

The refactored output is almost identical to the original, with the
following changes:
- add square brackets to specify optionality: `usage: [rtla] ...`
- remove `-q` from timerlat hist because hist tools don't support it
- minor spacing

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251124063204.845425-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:15 +01:00
Crystal Wood
c219d4ee1d rtla: Set stop threshold after all instances are enabled
This avoids startup races where one of the instances hit a threshold
before all instances were enabled, and thus tracing stops without
the relevant event.  In particular, this is not uncommon with the
tests that set a very tight threshold and then complain if there's
no analysis.

This also ensures that we don't stop tracing during a warmup.

The downside is a small chance of having an event over the threshold
early in the output, without stopping on it, which could cause user
confusion.  This should be less likely if the warmup feature is used, but
that doesn't eliminate the race window, just the odds of an unusual spike
right at that moment.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-6-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-07 15:57:15 +01:00
Costa Shulyupin
11aa4a1809 tools/rtla: Remove unused function declarations
Over time, four function declarations have been left orphaned or duplicated.

Remove them to keep the source clean.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/r/20251012071133.290225-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2026-01-06 10:11:59 +01:00
Linus Torvalds
5779de8d36 rtla updates for v6.19:

Merge tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rtla trace tooling updates from Steven Rostedt:

 - Officially add Tomas Glozar as a maintainer to RTLA tool

 - Add for_each_monitored_cpu() helper

   In multiple places, RTLA tools iterate over the list of CPUs running
   tracer threads.

   Use single helper instead of repeating the for/if combination.

 - Remove unused variable option_index in argument parsing

   RTLA tools use getopt_long() for argument parsing. For its last
   argument, an unused variable "option_index" is passed.

   Remove the variable and pass NULL to getopt_long() to shorten the
   naturally long parsing functions, and make them more readable.

 - Fix unassigned nr_cpus after code consolidation

   In recent code consolidation, timerlat tool cleanup, previously
   implemented separately for each tool, was moved to a common function
   timerlat_free().

   The cleanup relies on nr_cpus being set. This was not done in the new
   function, leaving the variable uninitialized.

   Initialize the variable properly, and remove silencing of compiler
   warning for uninitialized variables.

 - Stop tracing on user latency in BPF mode

   Despite the name, rtla-timerlat's -T/--thread option sets timerlat's
   stop_tracing_total_us option, which also stops tracing on
   return-from-user latency, not only on thread latency.

   Implement the same behavior also in BPF sample collection stop
   tracing handler to avoid a discrepancy and restore correspondence of
   behavior with the equivalent option of cyclictest.

 - Fix threshold actions always triggering

   A bug in threshold action logic caused the action to execute even if
   tracing did not stop because of threshold.

   Fix the logic to stop correctly.

 - Fix a few minor issues in tests

   Extend tests that were shown to need it to 5s, fix osnoise test
   calling timerlat by mistake, and use new, more reliable output
   checking in timerlat's "top stop at failed action" test.

 - Do not print usage on argument parsing error

   RTLA prints the entire usage message on encountering errors in
   argument parsing, like a malformed CPU list.

   The usage message has gotten too long. Instead of printing it, use
   newly added fatal() helper function to simply exit with the error
   message, excluding the usage.

 - Fix unintuitive -C/--cgroup interface

   "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that
   being a common way to specify an option with argument. Moreover,
   using them fails silently and no cgroup is set.

   Create new helper function to unify the handling of all such options
   and allow all of:

     -Xsomething
     -X=something
     -X something

   as well as the equivalent for the long option.

 - Fix -a overriding -t argument filename

   Fix a bug where -a following -t custom_file.txt overrides the custom
   filename with the default timerlat_trace.txt.

 - Stop tracing correctly on multiple events at once

   In some race scenarios, RTLA BPF sample collection might send
   multiple stop tracing events via the BPF ringbuffer at once.

   Compare the number of events for != 0 instead of == 1 to cover for
   this scenario and stop tracing properly.

* tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla/timerlat: Exit top main loop on any non-zero wait_retval
  rtla/tests: Don't rely on matching ^1ALL
  rtla: Fix -a overriding -t argument
  rtla: Fix -C/--cgroup interface
  tools/rtla: Replace osnoise_hist_usage("...") with fatal("...")
  tools/rtla: Replace osnoise_top_usage("...") with fatal("...")
  tools/rtla: Replace timerlat_hist_usage("...") with fatal("...")
  tools/rtla: Replace timerlat_top_usage("...") with fatal("...")
  tools/rtla: Add fatal() and replace error handling pattern
  rtla/tests: Fix osnoise test calling timerlat
  rtla/tests: Extend action tests to 5s
  tools/rtla: Fix --on-threshold always triggering
  rtla/timerlat_bpf: Stop tracing on user latency
  tools/rtla: Fix unassigned nr_cpus
  tools/rtla: Remove unused optional option_index
  tools/rtla: Add for_each_monitored_cpu() helper
  MAINTAINERS: Add Tomas Glozar as a maintainer to RTLA tool
2025-12-05 09:34:01 -08:00
Crystal Wood
3138df6f0c rtla/timerlat: Exit top main loop on any non-zero wait_retval
Comparing to exactly 1 will fail if more than one ring buffer
event was seen since the last call to timerlat_bpf_wait(), which
can happen in some race scenarios.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Crystal Wood
61f1fd5d69 rtla/tests: Don't rely on matching ^1ALL
The timerlat "top stop at failed action" test was relying on "ALL" being
printed immediately after the "1" from the threshold action.  Besides being
fragile, this depends on stdbuf behavior, which is easy to miss when
recreating the test outside of the framework for debugging purposes.

Instead, use the expected/unexpected text mechanism from the
corresponding osnoise test.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-2-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Ivan Pravdin
ddb6e42494 rtla: Fix -a overriding -t argument
When running rtla as

    `rtla <timerlat|osnoise> <top|hist> -t custom_file.txt -a 100`

the -a option overrides the trace output filename specified by the -t option.
Running the command above creates a <timerlat|osnoise>_trace.txt file
instead of custom_file.txt. Fix this by making sure that the -a option does
not override the trace output filename, even when it is passed after the
filename has been specified.

Fixes: 173a3b0148 ("rtla/timerlat: Add the automatic trace option")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Ivan Pravdin
7b71f3a698 rtla: Fix -C/--cgroup interface
Currently, the user can only specify a cgroup for the tracer's threads in the
following ways:

    `-C[cgroup]`
    `-C[=cgroup]`
    `--cgroup[=cgroup]`

If the user tries to specify a cgroup as `-C [cgroup]` or `--cgroup [cgroup]`,
the parser silently fails and rtla's cgroup is used for the tracer
threads.

To make the interface more user-friendly, allow the user to specify a cgroup in
the aforementioned way, i.e. `-C [cgroup]` and `--cgroup [cgroup]`.

Refactor identical logic between -t/--trace and -C/--cgroup into a
common function.
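
One way such a common function could resolve the three spellings is sketched below. The helper name and its parameters are hypothetical; in particular, it takes the attached text and the argv position explicitly instead of relying on getopt state, which is a simplification made for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper. attached is the text glued to the option
 * ("cgroup" from -Ccgroup, "=cgroup" from -C=cgroup), or NULL when nothing
 * was attached. In the NULL case the next argv entry is taken as the
 * value unless it looks like another option, and *next is advanced past it.
 * Returns the value, or NULL when the option was given without one.
 */
static const char *optional_arg_sketch(const char *attached, int argc,
				       char **argv, int *next)
{
	if (attached)
		return attached[0] == '=' ? attached + 1 : attached;

	if (*next < argc && argv[*next][0] != '-')
		return argv[(*next)++];

	return NULL;
}
```

A NULL result corresponds to the option being given without a value, which for -C means that rtla's own cgroup is used.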

Change documentation to reflect this user interface change.

Fixes: a957cbc025 ("rtla: Add -C cgroup support")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin
49c1579419 tools/rtla: Replace osnoise_hist_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace osnoise_hist_usage("...") with fatal("...") on errors.

Remove the already unused 'usage' argument from osnoise_hist_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-6-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin
92b5b55e5e tools/rtla: Replace osnoise_top_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace osnoise_top_usage("...") with fatal("...") on errors.

Remove the already unused 'usage' argument from osnoise_top_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-5-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin
8f4264e046 tools/rtla: Replace timerlat_hist_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace timerlat_hist_usage("...\n") with fatal("...") on errors.

Remove the already unused 'usage' argument from timerlat_hist_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-4-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin
4e5e7210f9 tools/rtla: Replace timerlat_top_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace timerlat_top_usage("...\n") with fatal("...") on errors.

Remove the already unused 'usage' argument from timerlat_top_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin
8cbb25db81 tools/rtla: Add fatal() and replace error handling pattern
The code contains some technical debt in error handling,
which complicates the consolidation of duplicated code.

Introduce a fatal() function to replace the common pattern of
err_msg() followed by exit(EXIT_FAILURE), reducing the length of an
already long function.

Further patches using fatal() follow.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Tomas Glozar
34c170ae5c rtla/tests: Fix osnoise test calling timerlat
osnoise test "top stop at failed action" is calling timerlat instead of
osnoise by mistake.

Fix it so that it calls the correct RTLA subcommand.

Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:00 +01:00
Tomas Glozar
d649e9f04c rtla/tests: Extend action tests to 5s
In non-BPF mode, it takes up to 1 second for RTLA to notice that tracing
has been stopped. That means that action tests cannot have a 1 second
duration, as the SIGALRM will be racing with the threshold overflow.

Previously, non-BPF mode actions were buggy and always executed
the action, even when stopping on duration or SIGINT, preventing
this issue from manifesting. Now that this has been fixed, the tests
have become flaky, and this has to be adjusted.

Fixes: 4e26f84abf ("rtla/tests: Add tests for actions")
Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:29:01 +01:00
Tomas Glozar
417bd0d502 tools/rtla: Fix --on-threshold always triggering
Commit 8d933d5c89 ("rtla/timerlat: Add continue action") moved the
code performing on-threshold actions (enabled through --on-threshold
option) to inside the RTLA main loop.

The condition in the loop does not check whether the threshold was
actually exceeded or if stop tracing was requested by the user through
SIGINT or duration. This leads to a bug where on-threshold actions are
always performed, even when the threshold was not hit.

(BPF mode is not affected, since it uses a different condition in the
while loop.)

Add a condition that checks for !stop_tracing before executing the
actions. Also, fix incorrect brackets in hist_main_loop to match the
semantics of top_main_loop.

Fixes: 8d933d5c89 ("rtla/timerlat: Add continue action")
Fixes: 2f3172f9dd ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Reviewed-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:55 +01:00
Tomas Glozar
e4240db933 rtla/timerlat_bpf: Stop tracing on user latency
rtla-timerlat allows a *thread* latency threshold to be set via the
-T/--thread option. However, the timerlat tracer calls this *total*
latency (stop_tracing_total_us), and also stops tracing when the
return-to-user latency is over the threshold.

Change the behavior of the timerlat BPF program to reflect what the
timerlat tracer is doing, to avoid discrepancy between stopping
collecting data in the BPF program and stopping tracing in the timerlat
tracer.

Cc: stable@vger.kernel.org
Fixes: e34293ddce ("rtla/timerlat: Add BPF skeleton to collect samples")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251006143100.137255-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:55 +01:00
Costa Shulyupin
b4275b2301 tools/rtla: Fix unassigned nr_cpus
In recently introduced timerlat_free(),
the variable 'nr_cpus' is not assigned.

Assign it with sysconf(_SC_NPROCESSORS_CONF) as done elsewhere.
Remove the culprit: -Wno-maybe-uninitialized. The rest of the
code is clean.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Fixes: 2f3172f9dd ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Link: https://lore.kernel.org/r/20251002170846.437888-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:54 +01:00
Costa Shulyupin
671314fce1 tools/rtla: Remove unused optional option_index
The longindex argument of getopt_long() is optional
and tied to the unused local variable option_index.

Remove it to shorten the four longest functions
and make the code neater.
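
A minimal sketch of the simplified call, assuming a hypothetical
parse_cpus_option() helper and a single --cpus option (neither is taken
from the rtla sources). getopt_long() accepts NULL as its last argument
when the caller does not need the index of the matched long option.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical parser: stores the argument of -c/--cpus in *cpus.
 * Returns 0 on success and -1 on an unrecognized option.
 */
static int parse_cpus_option(int argc, char **argv, const char **cpus)
{
	static const struct option long_options[] = {
		{"cpus", required_argument, NULL, 'c'},
		{NULL, 0, NULL, 0},
	};
	int c;

	optind = 1;	/* restart scanning so the helper can be called again */

	/* NULL longindex: no option_index variable is needed. */
	while ((c = getopt_long(argc, argv, "c:", long_options, NULL)) != -1) {
		switch (c) {
		case 'c':
			*cpus = optarg;
			break;
		default:
			return -1;
		}
	}

	return 0;
}
```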

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:53 +01:00
Costa Shulyupin
04fa6bf373 tools/rtla: Add for_each_monitored_cpu() helper
The rtla tools have many instances of iterating over CPUs while
checking if they are monitored.

Add a for_each_monitored_cpu() helper macro to make the code
more readable and reduce code duplication.
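
One way such a helper could look is sketched below. The struct, the
bitmask representation, and the macro's argument list are assumptions
made for illustration; the macro in the patch may take different
arguments and test a cpu_set_t instead. The sketch also assumes nr_cpus
does not exceed the number of bits in an unsigned long.

```c
#include <stdbool.h>

/* Hypothetical parameters: either all CPUs, or those set in a bitmask. */
struct monitor_params {
	bool all_cpus;
	unsigned long cpu_mask;	/* bit N set => CPU N is monitored */
};

static bool cpu_is_monitored(const struct monitor_params *p, int cpu)
{
	return p->all_cpus || ((p->cpu_mask >> cpu) & 1UL);
}

/*
 * Iterate over monitored CPUs only. The "if (...) {} else" form lets the
 * caller's statement act as the loop body without a dangling-else trap.
 */
#define for_each_monitored_cpu(cpu, nr_cpus, p)			\
	for ((cpu) = 0; (cpu) < (nr_cpus); (cpu)++)		\
		if (!cpu_is_monitored((p), (cpu))) {} else

/* Example user: count the CPUs the loop body is executed for. */
static int count_monitored(const struct monitor_params *p, int nr_cpus)
{
	int cpu, count = 0;

	for_each_monitored_cpu(cpu, nr_cpus, p)
		count++;

	return count;
}
```

Without the macro, every such loop repeats the for statement plus an
explicit "skip if not monitored" check.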

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:53 +01:00
Zhang Chujun
53afec2c8f tracing/tools: Fix incorrect short option in usage text for --threads
The help message incorrectly listed '-t' as the short option for
--threads, but the actual getopt_long configuration uses '-e'.
This mismatch can confuse users and lead to incorrect command-line
usage. This patch updates the usage string to correctly show:
	"-e, --threads NRTHR"
to match the implementation.

Note: checkpatch.pl reports a false-positive spelling warning on
'Run', which is intentional.

Link: https://patch.msgid.link/20251106031040.1869-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-07 07:59:37 -05:00
Linus Torvalds
d9f24f8e60 rtla: Updates for v6.18
- This update is mostly just consolidating code between osnoise/timerlat
   and top/hist for easier maintenance and less future divergence.

Merge tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tools updates from Steven Rostedt

 - This is mostly just consolidating code between osnoise/timerlat and
   top/hist for easier maintenance and less future divergence

* tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rtla: Add remaining support for osnoise actions
  tools/rtla: Add test engine support for unexpected output
  tools/rtla: Fix -A option name in test comment
  tools/rtla: Consolidate code between osnoise/timerlat and hist/top
  tools/rtla: Create common_apply_config()
  tools/rtla: Move top/hist params into common struct
  tools/rtla: Consolidate common parameters into shared structure
2025-10-05 09:38:26 -07:00
Wander Lairson Costa
2227f273b7 rtla/actions: Fix condition for buffer reallocation
The condition to check if the actions buffer needs to be resized was
incorrect. The check `self->size >= self->len` would evaluate to
true on almost every call to `actions_new()`, causing the buffer to
be reallocated unnecessarily each time an action was added.

Fix the condition to `self->len >= self->size`, ensuring
that the buffer is only resized when it is actually full.
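
The corrected check can be illustrated with a simplified growable array.
The struct layout, the int element type, the doubling policy, and the
reallocs counter are assumptions made for this sketch and are not taken
from the rtla sources; only the `self->len >= self->size` comparison
comes from the description above.

```c
#include <stdlib.h>

/* Simplified stand-in for the actions buffer. */
struct actions {
	int *list;
	int len;	/* number of slots in use */
	int size;	/* number of slots allocated */
	int reallocs;	/* instrumentation for this sketch only */
};

/* Return a pointer to a fresh slot, growing the buffer only when full. */
static int *actions_new(struct actions *self)
{
	if (self->len >= self->size) {
		int new_size = self->size ? self->size * 2 : 8;
		int *tmp = realloc(self->list, new_size * sizeof(*tmp));

		if (!tmp)
			return NULL;

		self->list = tmp;
		self->size = new_size;
		self->reallocs++;
	}

	return &self->list[self->len++];
}
```

With the operands swapped (`self->size >= self->len`), the branch is taken
whenever the buffer still has room, so nearly every call would reallocate.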

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com
Fixes: 6ea082b171 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 06:01:20 -04:00
Ivan Pravdin
b1e0ff7209 rtla: Fix buffer overflow in actions_parse
Currently, tests 3 and 13-22 in tests/timerlat.t fail with error:

    *** buffer overflow detected ***: terminated
    timeout: the monitored command dumped core

The result of running `sudo make check` is

    tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11)
      Failed tests:  3, 13-22
    Files=3, Tests=34, 140 wallclock secs ( 0.07 usr  0.01 sys + 27.63 cusr
    27.96 csys = 55.67 CPU)
    Result: FAIL

Fix buffer overflow in actions_parse to avoid this error. After this
change, the test results are

    tests/hwnoise.t ... ok
    tests/osnoise.t ... ok
    tests/timerlat.t .. ok
    All tests successful.
    Files=3, Tests=34, 186 wallclock secs ( 0.06 usr  0.01 sys + 41.10 cusr
    44.38 csys = 85.55 CPU)
    Result: PASS

Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com
Fixes: 6ea082b171 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 05:57:29 -04:00
Crystal Wood
05b7e10687 tools/rtla: Add remaining support for osnoise actions
The basic functionality came with the consolidation; now hook up the
command line options, and add documentation and tests.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-8-crwood@redhat.com
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:53:48 -04:00