mirror of
https://github.com/torvalds/linux.git
synced 2026-03-08 04:04:43 +01:00
Consider the following sequence on a CPU configured with nohz_full:
1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
bandwidth control. The gse of cgroup A, to which task P is attached, is
dequeued and the CPU switches to idle.
2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
another cgroup B (not throttled).
During sched_move_task(), the task P is observed as queued but not
running, and therefore no resched_curr() is triggered.
3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
explicit scheduling event, i.e., resched_curr().
4) For kernel <= 5.10: Later, cgroup A is unthrottled. However, task P
has already been migrated out of cgroup A, so unthrottle_cfs_rq() may
observe load_weight == 0 and return early without calling
resched_curr(). For kernel >= 6.6: The unthrottling path triggers
resched_curr() in almost all cases, even when no runnable tasks remain
in the unthrottled cgroup, which prevents the idle stall described
above. However, if cgroup A is removed before it gets unthrottled, the
unthrottling path for cgroup A is never executed. As a result,
resched_curr() is never called.
5) At this point, task P is runnable in cgroup B (not throttled), but
the CPU remains in do_idle() with no pending reschedule point. The
system stays in this state until an unrelated event that triggers
resched_curr() (e.g. a new task wakeup) breaks the nohz_full idle
state. Only then does task P finally get scheduled.
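The sequence above can be sketched as a small user-space toy model. This is not kernel code: every toy_* name is hypothetical, the structures only track the few flags the steps mention, and toy_unthrottle() imitates the early return described for kernels <= 5.10.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these types or functions exist in the kernel. */
struct toy_cgroup { bool throttled; int nr_queued; };
struct toy_cpu    { bool idle; bool need_resched; };
struct toy_task   { struct toy_cgroup *cg; bool queued; bool running; };

/* Stand-in for resched_curr(): the only way out of idle in this model. */
void toy_resched(struct toy_cpu *cpu) { cpu->need_resched = true; }

/* Step 1: the group is throttled, P stops running, the CPU goes idle. */
void toy_throttle(struct toy_cpu *cpu, struct toy_cgroup *cg, struct toy_task *p)
{
    cg->throttled = true;
    p->running = false;          /* P stays queued inside the throttled group */
    cpu->idle = true;
    cpu->need_resched = false;
}

/* Step 2: pre-fix migration, which reschedules only if P was running. */
void toy_move_task(struct toy_cpu *cpu, struct toy_task *p, struct toy_cgroup *to)
{
    if (p->queued) {
        assert(p->cg->nr_queued > 0);
        p->cg->nr_queued--;
        to->nr_queued++;
    }
    p->cg = to;
    if (p->running)
        toy_resched(cpu);
}

/* Step 4 (<= 5.10 flavour): an empty group returns before rescheduling. */
void toy_unthrottle(struct toy_cpu *cpu, struct toy_cgroup *cg)
{
    cg->throttled = false;
    if (cg->nr_queued == 0)
        return;
    toy_resched(cpu);
}

/* Steps 3 and 5: the idle loop exits only when a reschedule is pending. */
bool toy_idle_exit(struct toy_cpu *cpu)
{
    if (cpu->idle && cpu->need_resched) {
        cpu->idle = false;
        cpu->need_resched = false;
        return true;
    }
    return false;
}

/* Runs steps 1-5; true means P is runnable while the CPU is stuck idle. */
bool toy_stall_reproduced(void)
{
    struct toy_cgroup a = { false, 1 }, b = { false, 0 };
    struct toy_cpu cpu = { false, false };
    struct toy_task p = { &a, true, true };

    toy_throttle(&cpu, &a, &p);
    toy_move_task(&cpu, &p, &b);
    toy_unthrottle(&cpu, &a);
    bool left_idle = toy_idle_exit(&cpu);

    return p.queued && !p.cg->throttled && !left_idle && cpu.idle;
}
```

Calling toy_stall_reproduced() walks the five steps in order and reports whether the model ends with P queued in an unthrottled group while the CPU is still idle.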
The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore fails to trigger a resched_curr(),
while the later unthrottling path no longer has visibility of the
migrated task.
Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke check_preempt_curr() for tasks
that were queued at the time of migration. This ensures that runnable
tasks are reconsidered for scheduling even when nohz_full suppresses
periodic ticks.
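The decision described in this paragraph can be illustrated with a user-space sketch. It is a simplified model under stated assumptions, not the actual patch: the fx_* names are hypothetical, and fx_check_preempt_curr() reduces the preemption check to "a runnable task in an unthrottled group always beats the idle task".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these types or functions exist in the kernel. */
struct fx_cpu  { bool curr_is_idle; bool need_resched; };
struct fx_task { bool queued; bool running; bool group_throttled; };

/* Stand-in for resched_curr(). */
void fx_resched_curr(struct fx_cpu *cpu) { cpu->need_resched = true; }

/* Simplified stand-in for check_preempt_curr(): only the idle case. */
void fx_check_preempt_curr(struct fx_cpu *cpu, const struct fx_task *p)
{
    if (cpu->curr_is_idle && !p->group_throttled)
        fx_resched_curr(cpu);
}

/* Migration with the fix: running tasks reschedule, queued tasks are checked. */
void fx_move_task(struct fx_cpu *cpu, struct fx_task *p, bool dest_throttled)
{
    bool queued = p->queued;
    bool running = p->running;

    assert(!running || queued);  /* in this model a running task is also queued */
    p->group_throttled = dest_throttled;

    if (running)
        fx_resched_curr(cpu);            /* existing behavior, preserved */
    else if (queued)
        fx_check_preempt_curr(cpu, p);   /* new: reconsider the queued task */
}

/* Helper: does migrating a task in the given state leave a resched pending? */
bool fx_migration_wakes_cpu(bool running, bool queued, bool dest_throttled)
{
    struct fx_cpu cpu = { .curr_is_idle = !running, .need_resched = false };
    struct fx_task p = { .queued = queued, .running = running,
                         .group_throttled = !running };

    fx_move_task(&cpu, &p, dest_throttled);
    return cpu.need_resched;
}
```

In this model, a task that is queued but not running leaves a reschedule pending on the idle CPU when it moves to an unthrottled group, which is the case the stall scenario was missing.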
Fixes:
| File | | |
|---|---|---|
| autogroup.c | ||
| autogroup.h | ||
| build_policy.c | ||
| build_utility.c | ||
| clock.c | ||
| completion.c | ||
| core.c | ||
| core_sched.c | ||
| cpuacct.c | ||
| cpudeadline.c | ||
| cpudeadline.h | ||
| cpufreq.c | ||
| cpufreq_schedutil.c | ||
| cpupri.c | ||
| cpupri.h | ||
| cputime.c | ||
| deadline.c | ||
| debug.c | ||
| ext.c | ||
| ext.h | ||
| ext_idle.c | ||
| ext_idle.h | ||
| ext_internal.h | ||
| fair.c | ||
| features.h | ||
| idle.c | ||
| isolation.c | ||
| loadavg.c | ||
| Makefile | ||
| membarrier.c | ||
| pelt.c | ||
| pelt.h | ||
| psi.c | ||
| rq-offsets.c | ||
| rt.c | ||
| sched-pelt.h | ||
| sched.h | ||
| smp.h | ||
| stats.c | ||
| stats.h | ||
| stop_task.c | ||
| swait.c | ||
| syscalls.c | ||
| topology.c | ||
| wait.c | ||
| wait_bit.c | ||