From 4fb352df14de4b5277f38a9874f7c19cf641ae4d Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 5 Dec 2025 16:24:05 +0100 Subject: [PATCH 1/5] PM: sleep: Do not flag runtime PM workqueue as freezable Till now, the runtime PM workqueue has been flagged as freezable, so it does not process work items during system-wide PM transitions like system suspend and resume. The original reason to do that was to reduce the likelihood of runtime PM getting in the way of system-wide PM processing, but now it is mostly an optimization because (1) runtime suspend of devices is prevented by bumping up their runtime PM usage counters in device_prepare() and (2) device drivers are expected to disable runtime PM for the devices handled by them before they embark on system-wide PM activities that may change the state of the hardware or otherwise interfere with runtime PM. However, it prevents asynchronous runtime resume of devices from working during system-wide PM transitions, which is confusing because synchronous runtime resume is not prevented at the same time, and it also sometimes turns out to be problematic. For example, it has been reported that blk_queue_enter() may deadlock during a system suspend transition because of the pm_request_resume() usage in it [1]. It may also deadlock during a system resume transition in a similar way. That happens because the asynchronous runtime resume of the given device is not processed due to the freezing of the runtime PM workqueue. While it may be better to address this particular issue in the block layer, the very presence of it means that similar problems may be expected to occur elsewhere. For this reason, remove the WQ_FREEZABLE flag from the runtime PM workqueue and make device_suspend_late() use the generic variant of pm_runtime_disable() that will carry out runtime PM of the device synchronously if there is pending resume work for it. Also update the comment before the pm_runtime_disable() call in device_suspend_late(), to document the fact that the runtime PM should not be expected to work for the device until the end of device_resume_early(), and update the related documentation. This change may, even though it is not expected to, uncover some latent issues related to queuing up asynchronous runtime resume work items during system suspend or hibernation. However, they should be limited to the interference between runtime resume and system-wide PM callbacks in the cases when device drivers start to handle system-wide PM before disabling runtime PM as described above. Link: https://lore.kernel.org/linux-pm/20251126101636.205505-2-yang.yang@vivo.com/ Signed-off-by: Rafael J. Wysocki Reviewed-by: Ulf Hansson Link: https://patch.msgid.link/12794222.O9o76ZdvQC@rafael.j.wysocki --- Documentation/power/runtime_pm.rst | 7 +++---- drivers/base/power/main.c | 7 ++++--- kernel/power/main.c | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst index 455b9d135d85..a53ab09c37d5 100644 --- a/Documentation/power/runtime_pm.rst +++ b/Documentation/power/runtime_pm.rst @@ -712,10 +712,9 @@ out the following operations: * During system suspend pm_runtime_get_noresume() is called for every device right before executing the subsystem-level .prepare() callback for it and pm_runtime_barrier() is called for every device right before executing the - subsystem-level .suspend() callback for it. In addition to that the PM core - calls __pm_runtime_disable() with 'false' as the second argument for every - device right before executing the subsystem-level .suspend_late() callback - for it. + subsystem-level .suspend() callback for it. In addition to that, the PM + core disables runtime PM for every device right before executing the + subsystem-level .suspend_late() callback for it. * During system resume pm_runtime_enable() and pm_runtime_put() are called for every device right after executing the subsystem-level .resume_early() diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 97a8b4fcf471..189de5250f25 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1647,10 +1647,11 @@ static void device_suspend_late(struct device *dev, pm_message_t state, bool asy goto Complete; /* - * Disable runtime PM for the device without checking if there is a - * pending resume request for it. + * After this point, any runtime PM operations targeting the device + * will fail until the corresponding pm_runtime_enable() call in + * device_resume_early(). */ - __pm_runtime_disable(dev, false); + pm_runtime_disable(dev); if (dev->power.syscore) goto Skip; diff --git a/kernel/power/main.c b/kernel/power/main.c index 03b2c5495c77..5f8c9e12eaec 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -1125,7 +1125,7 @@ EXPORT_SYMBOL_GPL(pm_wq); static int __init pm_start_workqueues(void) { - pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0); + pm_wq = alloc_workqueue("pm", WQ_UNBOUND, 0); if (!pm_wq) return -ENOMEM; From 1081c1649da989ef9cbc01ffa99babc190df6077 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 26 Jan 2026 21:03:57 +0100 Subject: [PATCH 2/5] PM: hibernate: Drop NULL pointer checks before acomp_request_free() Since acomp_request_free() checks its argument against NULL, the NULL pointer checks before calling it added by commit ("7966cf0ebe32 PM: hibernate: Fix crash when freeing invalid crypto compressor") are redundant, so drop them. No intentional functional impact. Signed-off-by: Rafael J. Wysocki Link: https://patch.msgid.link/6233709.lOV4Wx5bFT@rafael.j.wysocki --- kernel/power/swap.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/power/swap.c b/kernel/power/swap.c index 8050e5182835..7e462957c9bf 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -902,8 +902,8 @@ out_clean: for (thr = 0; thr < nr_threads; thr++) { if (data[thr].thr) kthread_stop(data[thr].thr); - if (data[thr].cr) - acomp_request_free(data[thr].cr); + + acomp_request_free(data[thr].cr); if (!IS_ERR_OR_NULL(data[thr].cc)) crypto_free_acomp(data[thr].cc); @@ -1502,8 +1502,8 @@ out_clean: for (thr = 0; thr < nr_threads; thr++) { if (data[thr].thr) kthread_stop(data[thr].thr); - if (data[thr].cr) - acomp_request_free(data[thr].cr); + + acomp_request_free(data[thr].cr); if (!IS_ERR_OR_NULL(data[thr].cc)) crypto_free_acomp(data[thr].cc); From 75ce02f4bc9a8b8350b6b1b01872467b0cc960cc Mon Sep 17 00:00:00 2001 From: Samuel Wu Date: Fri, 23 Jan 2026 17:21:29 -0800 Subject: [PATCH 3/5] PM: wakeup: Handle empty list in wakeup_sources_walk_start() In the case of an empty wakeup_sources list, wakeup_sources_walk_start() will return an invalid but non-NULL address. This also affects wrappers of the aforementioned function, like for_each_wakeup_source(). Update wakeup_sources_walk_start() to return NULL in case of an empty list. Fixes: b4941adb24c0 ("PM: wakeup: Add routine to help fetch wakeup source object.") Signed-off-by: Samuel Wu [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260124012133.2451708-2-wusamuel@google.com Signed-off-by: Rafael J. Wysocki --- drivers/base/power/wakeup.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index 1e1a0e7eeac5..e69033d16fba 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -275,9 +275,7 @@ EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock); */ struct wakeup_source *wakeup_sources_walk_start(void) { - struct list_head *ws_head = &wakeup_sources; - - return list_entry_rcu(ws_head->next, struct wakeup_source, entry); + return list_first_or_null_rcu(&wakeup_sources, struct wakeup_source, entry); } EXPORT_SYMBOL_GPL(wakeup_sources_walk_start); From 5c9ecd8e6437cd55a38ea4f1e1d19cee8e226cb8 Mon Sep 17 00:00:00 2001 From: Gui-Dong Han Date: Tue, 3 Feb 2026 11:19:43 +0800 Subject: [PATCH 4/5] PM: sleep: wakeirq: harden dev_pm_clear_wake_irq() against races dev_pm_clear_wake_irq() currently uses a dangerous pattern where dev->power.wakeirq is read and checked for NULL outside the lock. If two callers invoke this function concurrently, both might see a valid pointer and proceed. This could result in a double-free when the second caller acquires the lock and tries to release the same object. Address this by removing the lockless check of dev->power.wakeirq. Instead, acquire dev->power.lock immediately to ensure the check and the subsequent operations are atomic. If dev->power.wakeirq is NULL under the lock, simply unlock and return. This guarantees that concurrent calls cannot race to free the same object. Based on a quick scan of current users, I did not find an actual bug as drivers seem to rely on their own synchronization. However, since asynchronous usage patterns exist (e.g., in drivers/net/wireless/ti/wlcore), I believe a race is theoretically possible if the API is used less carefully in the future. This change hardens the API to be robust against such cases. Fixes: 4990d4fe327b ("PM / Wakeirq: Add automated device wake IRQ handling") Signed-off-by: Gui-Dong Han Link: https://patch.msgid.link/20260203031943.1924-1-hanguidong02@gmail.com Signed-off-by: Rafael J. Wysocki --- drivers/base/power/wakeirq.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/base/power/wakeirq.c b/drivers/base/power/wakeirq.c index 8aa28c08b289..c0809d18fc54 100644 --- a/drivers/base/power/wakeirq.c +++ b/drivers/base/power/wakeirq.c @@ -83,13 +83,16 @@ EXPORT_SYMBOL_GPL(dev_pm_set_wake_irq); */ void dev_pm_clear_wake_irq(struct device *dev) { - struct wake_irq *wirq = dev->power.wakeirq; + struct wake_irq *wirq; unsigned long flags; - if (!wirq) - return; - spin_lock_irqsave(&dev->power.lock, flags); + wirq = dev->power.wakeirq; + if (!wirq) { + spin_unlock_irqrestore(&dev->power.lock, flags); + return; + } + device_wakeup_detach_irq(dev); dev->power.wakeirq = NULL; spin_unlock_irqrestore(&dev->power.lock, flags); From 0491f3f9f664e7e0131eb4d2a8b19c49562e5c64 Mon Sep 17 00:00:00 2001 From: Xuewen Yan Date: Wed, 4 Feb 2026 13:25:09 +0100 Subject: [PATCH 5/5] PM: sleep: core: Avoid bit field races related to work_in_progress In all of the system suspend transition phases, the async processing of a device may be carried out in parallel with power.work_in_progress updates for the device's parent or suppliers and if it touches bit fields from the same group (for example, power.must_resume or power.wakeup_path), bit field corruption is possible. To avoid that, turn work_in_progress in struct dev_pm_info into a proper bool field and relocate it to save space. Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children") Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous") Signed-off-by: Xuewen Yan Closes: https://lore.kernel.org/linux-pm/20260203063459.12808-1-xuewen.yan@unisoc.com/ Cc: All applicable [ rjw: Added subject and changelog ] Link: https://patch.msgid.link/CAB8ipk_VX2VPm706Jwa1=8NSA7_btWL2ieXmBgHr2JcULEP76g@mail.gmail.com Signed-off-by: Rafael J. Wysocki --- include/linux/pm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/pm.h b/include/linux/pm.h index 98a899858ece..afcaaa37a812 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -681,10 +681,10 @@ struct dev_pm_info { struct list_head entry; struct completion completion; struct wakeup_source *wakeup; + bool work_in_progress; /* Owned by the PM core */ bool wakeup_path:1; bool syscore:1; bool no_pm_callbacks:1; /* Owned by the PM core */ - bool work_in_progress:1; /* Owned by the PM core */ bool smart_suspend:1; /* Owned by the PM core */ bool must_resume:1; /* Owned by the PM core */ bool may_skip_resume:1; /* Set by subsystems */