linux/kernel/time
Frederic Weisbecker 61f7fdf8fd timers/migration: Fix ignored event due to missing CPU update
When a group event is updated with its expiry unchanged but a different
CPU, that target change may go unnoticed and the event may be propagated
up with a stale CPU value. The following depicts a scenario that has
been actually observed:

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
the same expiry value.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
from the top and T0 executed.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
none. But it updates the events from its groups, starting with GRP0:0
which now has T1 as its next event. So far so good.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

3) The migrator in GRP1:1 proceeds upward and updates the events in
GRP1:0. The child event TGRP0:0 is found queued with the same expiry
as before. And therefore it is left unchanged. However the target CPU
is not the same but that fact is ignored so TGRP0:0 still points to
CPU 0 when it should point to CPU 1.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
the top level group.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

5) The migrator in GRP1:1 dequeues the next event in top level pointing
to CPU 0. But since it actually doesn't see any real event in CPU 0, it
early returns.

6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.

Some other bad scenario may involve trees with just two levels.

Fix this with unconditionally updating the CPU of the child event before
considering to early return while updating a queued event with an
unchanged expiry value.

Fixes: 7ee9887703 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Link: https://lore.kernel.org/r/Zg2Ct6M2RJAYHgCB@localhost.localdomain
2024-04-05 11:05:16 +02:00
..
alarmtimer.c rtc: class: make rtc_class constant 2024-03-08 12:05:10 +01:00
clockevents.c clockevents: Make clockevents_subsys const 2024-02-07 15:11:24 +01:00
clocksource-wdtest.c clocksource: Scale the watchdog read retries automatically 2024-02-21 12:00:42 +01:00
clocksource.c clocksource: Scale the watchdog read retries automatically 2024-02-21 12:00:42 +01:00
hrtimer.c tick: Split nohz and highres features from nohz_mode 2024-02-26 11:37:32 +01:00
itimer.c time: Prevent undefined behaviour in timespec64_to_ns() 2020-10-26 11:48:11 +01:00
jiffies.c clocksource: Make clocksource watchdog test safe for slow-HZ systems 2021-08-28 17:01:32 +02:00
Kconfig sched/idle: Conditionally handle tick broadcast in default_idle_call() 2024-03-01 21:04:27 +01:00
Makefile timers: Implement the hierarchical pull model 2024-02-22 17:52:32 +01:00
namespace.c vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy 2022-12-01 11:35:40 +01:00
ntp.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00
ntp_internal.h ntp: Make the RTC synchronization more reliable 2020-12-11 10:40:52 +01:00
posix-clock.c Fix memory leak in posix_clock_open() 2024-03-27 09:03:22 -07:00
posix-cpu-timers.c posix-cpu-timers: Implement the missing timer_wait_running callback 2023-04-21 15:34:33 +02:00
posix-stubs.c posix-timers: Get rid of [COMPAT_]SYS_NI() uses 2023-12-20 21:30:27 -08:00
posix-timers.c posix-timers: Refer properly to CONFIG_HIGH_RES_TIMERS 2023-06-18 22:41:53 +02:00
posix-timers.h posix-clocks: Introduce clock_get_ktime() callback 2020-01-14 12:20:51 +01:00
sched_clock.c time/sched_clock: Provide sched_clock_noinstr() 2023-06-05 21:11:04 +02:00
test_udelay.c time/debug: Fix memory leak with using debugfs_lookup() 2023-02-09 20:12:27 +01:00
tick-broadcast-hrtimer.c time/tick-broadcast: Remove RCU_NONIDLE() usage 2023-01-13 11:48:16 +01:00
tick-broadcast.c tick/broadcast: Make broadcast device replacement work correctly 2023-05-08 23:18:16 +02:00
tick-common.c tick: Assume timekeeping is correctly handed over upon last offline idle call 2024-02-26 11:37:32 +01:00
tick-internal.h tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING 2024-02-26 11:37:32 +01:00
tick-legacy.c timekeeping: remove xtime_update 2020-10-30 21:57:07 +01:00
tick-oneshot.c time: Fix various kernel-doc problems 2023-01-03 11:07:58 +01:00
tick-sched.c tick/sched: Fix various kernel-doc warnings 2024-04-01 10:36:35 +02:00
tick-sched.h tick/sched: Fix struct tick_sched doc warnings 2024-04-01 10:36:35 +02:00
time.c time: add kernel-doc in time.c 2023-07-14 13:47:07 -06:00
time_test.c time: test: Fix incorrect format specifier 2024-02-27 15:26:08 -07:00
timeconst.bc time: Add SPDX license identifiers 2018-11-23 11:51:20 +01:00
timeconv.c time: Improve performance of time64_to_tm() 2021-06-24 11:51:59 +02:00
timecounter.c time/timecounter: Mark 1st argument of timecounter_cyc2time() as const 2021-04-16 21:03:50 +02:00
timekeeping.c A large set of updates and features for timers and timekeeping: 2024-03-11 14:38:26 -07:00
timekeeping.h asm-generic: cross-architecture timer cleanup 2020-12-16 00:07:17 -08:00
timekeeping_debug.c timekeeping/debug: No need to check return value of debugfs_create functions 2019-01-29 20:08:41 +01:00
timekeeping_internal.h timekeeping/vsyscall: Provide vdso_update_begin/end() 2020-08-06 10:57:30 +02:00
timer.c timers: Fix text inconsistencies and spelling 2024-04-01 10:36:35 +02:00
timer_list.c tick: Split nohz and highres features from nohz_mode 2024-02-26 11:37:32 +01:00
timer_migration.c timers/migration: Fix ignored event due to missing CPU update 2024-04-05 11:05:16 +02:00
timer_migration.h timers: Implement the hierarchical pull model 2024-02-22 17:52:32 +01:00
vsyscall.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00