linux

mirror of https://github.com/torvalds/linux.git synced 2026-03-09 20:36:39 +01:00

Author	SHA1	Message	Date
Linus Torvalds	dda5df9823	Miscellaneous MMCID fixes to address bugs and performance regressions in the recent rewrite of the SCHED_MM_CID management code: - Fix livelock triggered by BPF CI testing - Fix hard lockup on weakly ordered systems - Simplify the dropping of CIDs in the exit path by removing an unintended transition phase. - Fix performance/scalability regression on a thread-pool benchmark by optimizing transitional CIDs when scheduling out. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmHDvQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hdPBAAgnl/L09wF8WCQLSoLrhr71FmS6fZApDB Rvov2be8tGJR0BsrJF5uOKTNjulqUIr0mfO73fdHZftdFuhm/WLnWjBO62GhCKMg d8kXOVZ7PudFN+QwL17pOAub8voh9s9/mceE/hZ3M5eNjXlG4sAcpyGvnrTLLYru rfzO48NOpy5NMfbxU5/f9nojfr2t8fhnpX2QjquOhEPpl/BeYzexTZK7h2IJXqTK tkU6IY9X8fT7y8LkKbTCIMJvEuWawHj1DSW2EiWNPJZkX+Hk5ZHttg28JjROavEy orgairCSCT/cOETKugfToFd0Z4WlmemY6Nk5Kyx//WiFQ/u0HHlFVgMJoJfQEovV MtIxLVygVbEoQyTszZyFUlTQjrnH8uKxXYhh1mX5wSj9lyDfpfJZycFFA2RpE4Rw /+pvH08BfR4FgpqTfojfgOnuK/575VsomaVghritoNW3bAie1kpnWIeBaXS8lL4O 0pkK7XX8ng6hXuZTMxgXXfkfUB6oM1Yp1OZJAEzUvftsK0FQ5q3e0WxD+pdVza2s PfQPaA7bT/G7y8k4LIXm59/tPX2QWPwe0yci00NbyfWiOdxHSgS7crQO8E1+VAiq TcLGZNj/wFL6B5ghaiUIi22Mo+WnLX8fW+aiIjSiUQILmbNZXYmwtfEFsvsahh9W /RkE/WQ492E= =/PkF -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous MMCID fixes to address bugs and performance regressions in the recent rewrite of the SCHED_MM_CID management code: - Fix livelock triggered by BPF CI testing - Fix hard lockup on weakly ordered systems - Simplify the dropping of CIDs in the exit path by removing an unintended transition phase - Fix performance/scalability regression on a thread-pool benchmark by optimizing transitional CIDs when scheduling out" * tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Optimize transitional CIDs when scheduling out sched/mmcid: Drop per CPU CID immediately when switching to per task mode sched/mmcid: Protect transition on weakly ordered systems sched/mmcid: Prevent live lock on task to CPU mode transition	2026-02-07 09:10:42 -08:00
Linus Torvalds	7e0b172c80	Misc objtool fixes: - Bump up the Clang minimum version requirements for livepatch builds, due to Clang assembler section handling bugs causing silent miscompilations. - Strip livepatching symbol artifacts from non-livepatch modules. - Fix livepatch build warnings when certain Clang LTO options are enabled. - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmHCvwRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hkTQ/8DhLI5m9CqhMaouuR5Vm9POKgFOXAe6uz eHTuKhpJlw+anhNjeUA7PtYbnkrj0j+aNo5SmrfD4Yx8CW7dCt+oO3Y5ziVhkPHw 46Q/9KbmcT11uPbYywp/G4b15FF8YYu9slzhGau/Wa9H+oqU/WPoGapJsMPpUtBo s0qGxdr2G3WZyD9H/wgCyhMwCOkAYMJ0sHxpGgRajLefDutsRvtlau/ktYU79vI3 nUFteD3YDIAUblBtZPogsCP36QJlx7TWCUNK02vPeOYRh3xPjf3iG+vgf1+sjZHV P20psekpDwhh1KyeVziUyihUy8TmEVVozRvsrUKVlXmEqLqDtNrysqKTSp5/yRdT MqgNwDrvv2wW/DKYJhefbuttx3ppAErrnJ3zC9TYSmdn27feKPDJcD1OmdS3BIpH x9u/eVbOS8xbsOc/t3/Al7CRazvjLU0+OXsMJWbmAaO3tE7SwHq2aOnGbLGjvwWC Ts1AYfNp4H41CLLFnmKR8q2t/DOBhefW8p3cR5U+cVQ7PdqKRT+TwKWVnCrbrBcJ 71702IrqoghwUrmhtxdZR0jZLtwb80s8zqdbrHojXSbXUFfYwwHNNaW1NSEpdSr4 W8xYfSPrK71OGZ6oh/v1Wce6D+mxqb6kYD8DRLgOdbxrnvLwUhGXxxNDjfXA0f4B abjGHlAyoCs= =cdu2 -----END PGP SIGNATURE----- Merge tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar:: - Bump up the Clang minimum version requirements for livepatch builds, due to Clang assembler section handling bugs causing silent miscompilations - Strip livepatching symbol artifacts from non-livepatch modules - Fix livepatch build warnings when certain Clang LTO options are enabled - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y * tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool/klp: Fix unexported static call key access for manually built livepatch modules objtool/klp: Fix symbol correlation for orphaned local symbols livepatch: Free klp_{object,func}_ext data after initialization livepatch: Fix having __klp_objects relics in non-livepatch modules livepatch/klp-build: Require Clang assembler >= 20	2026-02-07 08:21:21 -08:00
Linus Torvalds	bab849a908	tracing fix for v6.19: - Fix event format field alignments for 32 bit architectures The fields in the event format files are used to parse the raw binary buffer data by applications. If they are incorrect, then the application produces garbage. On 32 bit architectures, the function graph 64bit calltime and rettime were off by 4bytes. That's because the actual fields are in a packed structure but the macros used by the ftrace events did not mark them as packed, and instead, gave them their natural alignment which made their offsets off by 4 bytes. There are macros to have a packed field within an embedded structure of an event, but there's no macro for normal fields within a packed structure of the event. The macro __field_packed() was used for the packed embedded structure field. Rename that to __field_desc_pcaked() (to match the non-packed embedded field macro __field_desc()), and make __field_packed() for fields that are in a packed event structure (which matches the unpacked __field() macro). Switch the calltime and rettime fields of the function graph event to use the new __field_packed() and this makes the offsets correct. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaYZKpRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qk1UAP41H5PL24xmYp+34GIP6lHuJr6iZzUm KbZi1Zx4zNmXSAD/e3Ra5SZopWszeMTf/tmxUXbl30oLdw4CJgS1WztBggk= =m36h -----END PGP SIGNATURE----- Merge tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: - Fix event format field alignments for 32 bit architectures The fields in the event format files are used to parse the raw binary buffer data by applications. If they are incorrect, then the application produces garbage. On 32 bit architectures, the function graph 64bit calltime and rettime were off by 4bytes. That's because the actual fields are in a packed structure but the macros used by the ftrace events did not mark them as packed, and instead, gave them their natural alignment which made their offsets off by 4 bytes. There are macros to have a packed field within an embedded structure of an event, but there's no macro for normal fields within a packed structure of the event. The macro __field_packed() was used for the packed embedded structure field. Rename that to __field_desc_packed() (to match the non-packed embedded field macro __field_desc()), and make __field_packed() for fields that are in a packed event structure (which matches the unpacked __field() macro). Switch the calltime and rettime fields of the function graph event to use the new __field_packed() and this makes the offsets correct. * tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix ftrace event field alignments	2026-02-06 12:37:28 -08:00
Linus Torvalds	23b0d2f7c2	dma-mapping fixes for Linux 6.19 Two minor fixes for DMA-mapping subsystem: - check for the rare case of the allocation failure of the global CMA pool (Shanker Donthineni) - avoid perf buffer overflow when tracing large scatter-gather lists (Deepanshu Kartikey) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaYXTzAAKCRCJp1EFxbsS REbUAP40hTgNNGjlbV/b6nES4P/SuZ5a8p05+YWF7bTOVf/pMwEA8EHFz5DLsKeS 1bX7/X2wzEYOyJ7v1S+PYxIswn9A5AY= =2IQo -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two minor fixes for the DMA-mapping subsystem: - check for the rare case of the allocation failure of the global CMA pool (Shanker Donthineni) - avoid perf buffer overflow when tracing large scatter-gather lists (Deepanshu Kartikey)" * tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma: contiguous: Check return value of dma_contiguous_reserve_area() tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow	2026-02-06 10:27:42 -08:00
Petr Pavlu	ab10815472	livepatch: Fix having __klp_objects relics in non-livepatch modules The linker script scripts/module.lds.S specifies that all input __klp_objects sections should be consolidated into an output section of the same name, and start/stop symbols should be created to enable scripts/livepatch/init.c to locate this data. This start/stop pattern is not ideal for modules because the symbols are created even if no __klp_objects input sections are present. Consequently, a dummy __klp_objects section also appears in the resulting module. This unnecessarily pollutes non-livepatch modules. Instead, since modules are relocatable files, the usual method for locating consolidated data in a module is to read its section table. This approach avoids the aforementioned problem. The klp_modinfo already stores a copy of the entire section table with the final addresses. Introduce a helper function that scripts/livepatch/init.c can call to obtain the location of the __klp_objects section from this data. Fixes: `dd590d4d57` ("objtool/klp: Introduce klp diff subcommand for diffing object files") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>	2026-02-05 08:00:44 -08:00
Steven Rostedt	033c55fe2e	tracing: Fix ftrace event field alignments The fields of ftrace specific events (events used to save ftrace internal events like function traces and trace_printk) are generated similarly to how normal trace event fields are generated. That is, the fields are added to a trace_events_fields array that saves the name, offset, size, alignment and signness of the field. It is used to produce the output in the format file in tracefs so that tooling knows how to parse the binary data of the trace events. The issue is that some of the ftrace event structures are packed. The function graph exit event structures are one of them. The 64 bit calltime and rettime fields end up 4 byte aligned, but the algorithm to show to userspace shows them as 8 byte aligned. The macros that create the ftrace events has one for embedded structure fields. There's two macros for theses fields: __field_desc() and __field_packed() The difference of the latter macro is that it treats the field as packed. Rename that field to __field_desc_packed() and create replace the __field_packed() to be a normal field that is packed and have the calltime and rettime use those. This showed up on 32bit architectures for function graph time fields. It had: ~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format [..] field:unsigned long func; offset:8; size:4; signed:0; field:unsigned int depth; offset:12; size:4; signed:0; field:unsigned int overrun; offset:16; size:4; signed:0; field:unsigned long long calltime; offset:24; size:8; signed:0; field:unsigned long long rettime; offset:32; size:8; signed:0; Notice that overrun is at offset 16 with size 4, where in the structure calltime is at offset 20 (16 + 4), but it shows the offset at 24. That's because it used the alignment of unsigned long long when used as a declaration and not as a member of a structure where it would be aligned by word size (in this case 4). By using the proper structure alignment, the format has it at the correct offset: ~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format [..] field:unsigned long func; offset:8; size:4; signed:0; field:unsigned int depth; offset:12; size:4; signed:0; field:unsigned int overrun; offset:16; size:4; signed:0; field:unsigned long long calltime; offset:20; size:8; signed:0; field:unsigned long long rettime; offset:28; size:8; signed:0; Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: "jempty.liang" <imntjempty@163.com> Link: https://patch.msgid.link/20260204113628.53faec78@gandalf.local.home Fixes: `04ae87a520` ("ftrace: Rework event_create_dir()") Closes: https://lore.kernel.org/all/20260130015740.212343-1-imntjempty@163.com/ Closes: https://lore.kernel.org/all/20260202123342.2544795-1-imntjempty@163.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-02-05 09:47:11 -05:00
Linus Torvalds	b20624608f	5 hotfixes. 2 are cc:stable, 2 are for MM. All are singletons - please see the changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaYPcbwAKCRDdBJ7gKXxA jpM9AQDRiBlZRBdYY8/nS2zMc8hE7s5O3koXu/UMf2O01aJjsgD6AssmcJzkbLir O1mlBSD0wlR3TZLEqSOUYIxgw7evLww= =ZHeq -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2026-02-04-15-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Five hotfixes. Two are cc:stable, two are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-02-04-15-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Documentation: document liveupdate cmdline parameter mm, shmem: prevent infinite loop on truncate race mailmap: update Alexander Mikhalitsyn's emails liveupdate: luo_file: do not clear serialized_data on unfreeze x86/kfence: fix booting on 32bit non-PAE systems	2026-02-04 16:04:00 -08:00
Linus Torvalds	3c7b4d1994	sched_ext: Fixes for v6.19-rc8 - Fix race where sched_class operations (sched_setscheduler() and friends) could be invoked on dead tasks after sched_ext_dead() already ran, causing invalid SCX task state transitions and NULL pointer dereferences. This was a regression from the cgroup exit ordering fix which moved sched_ext_free() to finish_task_switch(). -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaYPIhw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGS3HAQChF4sOgoD67cul36LJaeiQzjLCh9iTU9vi2lB4 slJb5QD/dJhrC0T2ZVRm5rHVxckIx7KeFwbzhvlrUD7l+zEaAwo= =ysQE -----END PGP SIGNATURE----- Merge tag 'sched_ext-for-6.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fix from Tejun Heo: - Fix race where sched_class operations (sched_setscheduler() and friends) could be invoked on dead tasks after sched_ext_dead() already ran, causing invalid SCX task state transitions and NULL pointer dereferences. This was a regression from the cgroup exit ordering fix which moved sched_ext_free() to finish_task_switch(). * tag 'sched_ext-for-6.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Short-circuit sched_class operations on dead tasks	2026-02-04 15:11:24 -08:00
Tejun Heo	0eca95cba2	sched_ext: Short-circuit sched_class operations on dead tasks `7900aa699c` ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch()") moved sched_ext_free() to finish_task_switch() and renamed it to sched_ext_dead() to fix cgroup exit ordering issues. However, this created a race window where certain sched_class ops may be invoked on dead tasks leading to failures - e.g. sched_setscheduler() may try to switch a task which finished sched_ext_dead() back into SCX triggering invalid SCX task state transitions. Add task_dead_and_done() which tests whether a task is TASK_DEAD and has completed its final context switch, and use it to short-circuit sched_class operations which may be called on dead tasks. Fixes: `7900aa699c` ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch()") Reported-by: Andrea Righi <arighi@nvidia.com> Link: http://lkml.kernel.org/r/20260202151341.796959-1-arighi@nvidia.com Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-04 12:22:11 -10:00
Thomas Gleixner	4463c7aa11	sched/mmcid: Optimize transitional CIDs when scheduling out During the investigation of the various transition mode issues instrumentation revealed that the amount of bitmap operations can be significantly reduced when a task with a transitional CID schedules out after the fixup function completed and disabled the transition mode. At that point the mode is stable and therefore it is not required to drop the transitional CID back into the pool. As the fixup is complete the potential exhaustion of the CID pool is not longer possible, so the CID can be transferred to the scheduling out task or to the CPU depending on the current ownership mode. The racy snapshot of mm_cid::mode which contains both the ownership state and the transition bit is valid because runqueue lock is held and the fixup function of a concurrent mode switch is serialized. Assigning the ownership right there not only spares the bitmap access for dropping the CID it also avoids it when the task is scheduled back in as it directly hits the fast path in both modes when the CID is within the optimal range. If it's outside the range the next schedule in will need to converge so dropping it right away is sensible. In the good case this also allows to go into the fast path on the next schedule in operation. With a thread pool benchmark which is configured to cross the mode switch boundaries frequently this reduces the number of bitmap operations by about 30% and increases the fastpath utilization in the low single digit percentage range. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260201192835.100194627@kernel.org	2026-02-04 12:21:12 +01:00
Thomas Gleixner	007d84287c	sched/mmcid: Drop per CPU CID immediately when switching to per task mode When a exiting task initiates the switch from per CPU back to per task mode, it has already dropped its CID and marked itself inactive. But a leftover from an earlier iteration of the rework then reassigns the per CPU CID to the exiting task with the transition bit set. That's wrong as the task is already marked CID inactive, which means it is inconsistent state. It's harmless because the CID is marked in transit and therefore dropped back into the pool when the exiting task schedules out either through preemption or the final schedule(). Simply drop the per CPU CID when the exiting task triggered the transition. Fixes: `fbd0e71dc3` ("sched/mmcid: Provide CID ownership mode fixup functions") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260201192835.032221009@kernel.org	2026-02-04 12:21:12 +01:00
Thomas Gleixner	47ee94efcc	sched/mmcid: Protect transition on weakly ordered systems Shrikanth reported a hard lockup which he observed once. The stack trace shows the following CID related participants: watchdog: CPU 23 self-detected hard LOCKUP @ mm_get_cid+0xe8/0x188 NIP: mm_get_cid+0xe8/0x188 LR: mm_get_cid+0x108/0x188 mm_cid_switch_to+0x3c4/0x52c __schedule+0x47c/0x700 schedule_idle+0x3c/0x64 do_idle+0x160/0x1b0 cpu_startup_entry+0x48/0x50 start_secondary+0x284/0x288 start_secondary_prolog+0x10/0x14 watchdog: CPU 11 self-detected hard LOCKUP @ plpar_hcall_norets_notrace+0x18/0x2c NIP: plpar_hcall_norets_notrace+0x18/0x2c LR: queued_spin_lock_slowpath+0xd88/0x15d0 _raw_spin_lock+0x80/0xa0 raw_spin_rq_lock_nested+0x3c/0xf8 mm_cid_fixup_cpus_to_tasks+0xc8/0x28c sched_mm_cid_exit+0x108/0x22c do_exit+0xf4/0x5d0 make_task_dead+0x0/0x178 system_call_exception+0x128/0x390 system_call_vectored_common+0x15c/0x2ec The task on CPU11 is running the CID ownership mode change fixup function and is stuck on a runqueue lock. The task on CPU23 is trying to get a CID from the pool with the same runqueue lock held, but the pool is empty. After decoding a similar issue in the opposite direction switching from per task to per CPU mode the tool which models the possible scenarios failed to come up with a similar loop hole. This showed up only once, was not reproducible and according to tooling not related to a overlooked scheduling scenario permutation. But the fact that it was observed on a PowerPC system gave the right hint: PowerPC is a weakly ordered architecture. The transition mechanism does: WRITE_ONCE(mm->mm_cid.transit, MM_CID_TRANSIT); WRITE_ONCE(mm->mm_cid.percpu, new_mode); fixup() WRITE_ONCE(mm->mm_cid.transit, 0); mm_cid_schedin() does: if (!READ_ONCE(mm->mm_cid.percpu)) ... cid \|= READ_ONCE(mm->mm_cid.transit); so weakly ordered systems can observe percpu == false and transit == 0 even if the fixup function has not yet completed. As a consequence the task will not drop the CID when scheduling out before the fixup is completed, which means the CID space can be exhausted and the next task scheduling in will loop in mm_get_cid() and the fixup thread can livelock on the held runqueue lock as above. This could obviously be solved by using: smp_store_release(&mm->mm_cid.percpu, true); and smp_load_acquire(&mm->mm_cid.percpu); but that brings a memory barrier back into the scheduler hotpath, which was just designed out by the CID rewrite. That can be completely avoided by combining the per CPU mode and the transit storage into a single mm_cid::mode member and ordering the stores against the fixup functions to prevent the CPU from reordering them. That makes the update of both states atomic and a concurrent read observes always consistent state. The price is an additional AND operation in mm_cid_schedin() to evaluate the per CPU or the per task path, but that's in the noise even on strongly ordered architectures as the actual load can be significantly more expensive and the conditional branch evaluation is there anyway. Fixes: `fbd0e71dc3` ("sched/mmcid: Provide CID ownership mode fixup functions") Closes: https://lore.kernel.org/bdfea828-4585-40e8-8835-247c6a8a76b0@linux.ibm.com Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260201192834.965217106@kernel.org	2026-02-04 12:21:12 +01:00
Thomas Gleixner	4327fb13fa	sched/mmcid: Prevent live lock on task to CPU mode transition Ihor reported a BPF CI failure which turned out to be a live lock in the MM_CID management. The scenario is: A test program creates the 5th thread, which means the MM_CID users become more than the number of CPUs (four in this example), so it switches to per CPU ownership mode. At this point each live task of the program has a CID associated. Assume thread creation order assignment for simplicity. T0 CID0 runs fork() and creates T4 T1 CID1 T2 CID2 T3 CID3 T4 --- not visible yet T0 sets mm_cid::percpu = true and transfers its own CID to CPU0 where it runs on and then starts the fixup which walks through the threads to transfer the per task CIDs either to the CPU the task is running on or drop it back into the pool if the task is not on a CPU. During that T1 - T3 are free to schedule in and out before the fixup caught up with them. Going through all possible permutations with a python script revealed a few problematic cases. The most trivial one is: T1 schedules in on CPU1 and observes percpu == true, so it transfers its CID to CPU1 T1 is migrated to CPU2 and schedule in observes percpu == true, but CPU2 does not have a CID associated and T1 transferred its own to CPU1 So it has to allocate one with CPU2 runqueue lock held, but the pool is empty, so it keeps looping in mm_get_cid(). Now T0 reaches T1 in the thread walk and tries to lock the corresponding runqueue lock, which is held causing a full live lock. There is a similar scenario in the reverse direction of switching from per CPU to task mode which is way more obvious and got therefore addressed by an intermediate mode. In this mode the CIDs are marked with MM_CID_TRANSIT, which means that they are neither owned by the CPU nor by the task. When a task schedules out with a transit CID it drops the CID back into the pool making it available for others to use temporarily. Once the task which initiated the mode switch finished the fixup it clears the transit mode and the process goes back into per task ownership mode. Unfortunately this insight was not mapped back to the task to CPU mode switch as the above described scenario was not considered in the analysis. Apply the same transit mechanism to the task to CPU mode switch to handle these problematic cases correctly. As with the CPU to task transition this results in a potential temporary contention on the CID bitmap, but that's only for the time it takes to complete the transition. After that it stays in steady mode which does not touch the bitmap at all. Fixes: `fbd0e71dc3` ("sched/mmcid: Provide CID ownership mode fixup functions") Closes: https://lore.kernel.org/2b7463d7-0f58-4e34-9775-6e2115cfb971@linux.dev Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260201192834.897115238@kernel.org	2026-02-04 12:21:11 +01:00
Pratyush Yadav (Google)	011d4e52a7	liveupdate: luo_file: do not clear serialized_data on unfreeze Patch series "liveupdate: fixes in error handling". This series contains some fixes in LUO's error handling paths. The first patch deals with failed freeze() attempts. The cleanup path calls unfreeze, and that clears some data needed by later unpreserve calls. The second patch is a bit more involved. It deals with failed retrieve() attempts. To do so properly, it reworks some of the error handling logic in luo_file core. Both these fixes are "theoretical" -- in the sense that I have not been able to reproduce either of them in normal operation. The only supported file type right now is memfd, and there is nothing userspace can do right now to make it fail its retrieve or freeze. I need to make the retrieve or freeze fail by artificially injecting errors. The injected errors trigger a use-after-free and a double-free. That said, once more complex file handlers are added or memfd preservation is used in ways not currently expected or covered by the tests, we will be able to see them on real systems. This patch (of 2): The unfreeze operation is supposed to undo the effects of the freeze operation. serialized_data is not set by freeze, but by preserve. Consequently, the unpreserve operation needs to access serialized_data to undo the effects of the preserve operation. This includes freeing the serialized data structures for example. If a freeze callback fails, unfreeze is called for all frozen files. This would clear serialized_data for them. Since live update has failed, it can be expected that userspace aborts, releasing all sessions. When the sessions are released, unpreserve will be called for all files. The unfrozen files will see 0 in their serialized_data. This is not expected by file handlers, and they might either fail, leaking data and state, or might even crash or cause invalid memory access. Do not clear serialized_data on unfreeze so it gets passed on to unpreserve. There is no need to clear it on unpreserve since luo_file will be freed immediately after. Link: https://lkml.kernel.org/r/20260126230302.2936817-1-pratyush@kernel.org Link: https://lkml.kernel.org/r/20260126230302.2936817-2-pratyush@kernel.org Fixes: `7c722a7f44` ("liveupdate: luo_file: implement file systems callbacks") Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-02-02 18:43:55 -08:00
Linus Torvalds	6bd9ed0287	cgroup: Fixes for v6.19-rc8 Three dmem fixes from Chen Ridong addressing use-after-free, RCU warning, and NULL pointer dereference issues introduced with the dmem controller. All changes are confined to kernel/cgroup/dmem.c and can only affect dmem controller users. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaYES4Q4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGQ3XAP92niQQAOj9ekLd9O9DnwA0KlkHKT30QO4oPJVR 0z0wuwEA4kY8/jUAWpKjxzmXse9m06MTvzfjuv/4k5IRnZ84cwY= =ze/P -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Three dmem fixes from Chen Ridong addressing use-after-free, RCU warning, and NULL pointer dereference issues introduced with the dmem controller. All changes are confined to kernel/cgroup/dmem.c and can only affect dmem controller users" * tag 'cgroup-for-6.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/dmem: avoid pool UAF cgroup/dmem: avoid rcu warning when unregister region cgroup/dmem: fix NULL pointer dereference when setting max	2026-02-02 15:14:45 -08:00
Chen Ridong	99a2ef5009	cgroup/dmem: avoid pool UAF An UAF issue was observed: BUG: KASAN: slab-use-after-free in page_counter_uncharge+0x65/0x150 Write of size 8 at addr ffff888106715440 by task insmod/527 CPU: 4 UID: 0 PID: 527 Comm: insmod 6.19.0-rc7-next-20260129+ #11 Tainted: [O]=OOT_MODULE Call Trace: <TASK> dump_stack_lvl+0x82/0xd0 kasan_report+0xca/0x100 kasan_check_range+0x39/0x1c0 page_counter_uncharge+0x65/0x150 dmem_cgroup_uncharge+0x1f/0x260 Allocated by task 527: Freed by task 0: The buggy address belongs to the object at ffff888106715400 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 64 bytes inside of freed 512-byte region [ffff888106715400, ffff888106715600) The buggy address belongs to the physical page: Memory state around the buggy address: ffff888106715300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888106715380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888106715400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888106715480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888106715500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb The issue occurs because a pool can still be held by a caller after its associated memory region is unregistered. The current implementation frees the pool even if users still hold references to it (e.g., before uncharge operations complete). This patch adds a reference counter to each pool, ensuring that a pool is only freed when its reference count drops to zero. Fixes: `b168ed458d` ("kernel/cgroup: Add "dmem" memory accounting cgroup") Cc: stable@vger.kernel.org # v6.14+ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-02 06:04:13 -10:00
Chen Ridong	592a68212c	cgroup/dmem: avoid rcu warning when unregister region A warnning was detected: WARNING: suspicious RCU usage 6.19.0-rc7-next-20260129+ #1101 Tainted: G O kernel/cgroup/dmem.c:456 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by insmod/532: #0: ffffffff85e78b38 (dmemcg_lock){+.+.}-dmem_cgroup_unregister_region+ stack backtrace: CPU: 2 UID: 0 PID: 532 Comm: insmod Tainted: 6.19.0-rc7-next- Tainted: [O]=OOT_MODULE Call Trace: <TASK> dump_stack_lvl+0xb0/0xd0 lockdep_rcu_suspicious+0x151/0x1c0 dmem_cgroup_unregister_region+0x1e2/0x380 ? __pfx_dmem_test_init+0x10/0x10 [dmem_uaf] dmem_test_init+0x65/0xff0 [dmem_uaf] do_one_initcall+0xbb/0x3a0 The macro list_for_each_rcu() must be used within an RCU read-side critical section (between rcu_read_lock() and rcu_read_unlock()). Using it outside that context, as seen in dmem_cgroup_unregister_region(), triggers the lockdep warning because the RCU protection is not guaranteed. Replace list_for_each_rcu() with list_for_each_entry_safe(), which is appropriate for traversal under spinlock protection where nodes may be deleted. Fixes: `b168ed458d` ("kernel/cgroup: Add "dmem" memory accounting cgroup") Cc: stable@vger.kernel.org # v6.14+ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-02 06:03:28 -10:00
Chen Ridong	43151f8128	cgroup/dmem: fix NULL pointer dereference when setting max An issue was triggered: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 15 UID: 0 PID: 658 Comm: bash Tainted: 6.19.0-rc6-next-2026012 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), RIP: 0010:strcmp+0x10/0x30 RSP: 0018:ffffc900017f7dc0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888107cd4358 RDX: 0000000019f73907 RSI: ffffffff82cc381a RDI: 0000000000000000 RBP: ffff8881016bef0d R08: 000000006c0e7145 R09: 0000000056c0e714 R10: 0000000000000001 R11: ffff888107cd4358 R12: 0007ffffffffffff R13: ffff888101399200 R14: ffff888100fcb360 R15: 0007ffffffffffff CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000105c79000 CR4: 00000000000006f0 Call Trace: <TASK> dmemcg_limit_write.constprop.0+0x16d/0x390 ? __pfx_set_resource_max+0x10/0x10 kernfs_fop_write_iter+0x14e/0x200 vfs_write+0x367/0x510 ksys_write+0x66/0xe0 do_syscall_64+0x6b/0x390 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f42697e1887 It was trriggered setting max without limitation, the command is like: "echo test/region0 > dmem.max". To fix this issue, add check whether options is valid after parsing the region_name. Fixes: `b168ed458d` ("kernel/cgroup: Add "dmem" memory accounting cgroup") Cc: stable@vger.kernel.org # v6.14+ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-02-02 06:02:42 -10:00
Shanker Donthineni	c33efdfcfa	dma: contiguous: Check return value of dma_contiguous_reserve_area() Commit `8f1fc1bf1a` ("dma: contiguous: Reserve default CMA heap") introduced a bug where dma_heap_cma_register_heap() is called with a NULL pointer when dma_contiguous_reserve_area() fails to reserve the CMA area. When dma_contiguous_reserve_area() fails, dma_contiguous_default_area remains NULL (initialized as a global variable), but the code doesn't check the return value and proceeds to call dma_heap_cma_register_heap() with this NULL pointer. Later during boot, add_cma_heaps() iterates through the dma_areas[] array and attempts to register heaps. When it encounters the NULL pointer stored by the earlier call, it crashes in __add_cma_heap() -> dma_heap_add() when trying to dereference the NULL CMA pointer. The crash manifests as: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038 ... Call trace: dma_heap_add+0x40/0x2b0 __add_cma_heap+0x80/0xe0 add_cma_heaps+0x64/0xb0 do_one_initcall+0x60/0x318 kernel_init_freeable+0x260/0x2f0 kernel_init+0x2c/0x168 ret_from_fork+0x10/0x20 Fix this by checking the return value of dma_contiguous_reserve_area() and only calling dma_heap_cma_register_heap() when the reservation succeeds. Fixes: `8f1fc1bf1a` ("dma: contiguous: Reserve default CMA heap") Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Reviewed-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260129181317.2429196-1-sdonthineni@nvidia.com	2026-02-02 09:20:32 +01:00
Linus Torvalds	c00a879164	Fix a race in the user-callchains code. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml/E38RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gESw/+OvzSQGvWAnL7LDHF5kNvS8eedFQMIXVk dkAgQ0ZK2URqU9tJ4RJY8gEAuKwwT+TtA7Ve1vXLYk6s8gpJ5jzYyi760NAbclFj JpyPJtS4zIqC2AQD4nMw+xzpLaGjavPN3ewYPdYgnRL2cK2ezZxlxWWxweAShWGH CqIXEYxG6Ezx0pUXtgqnUNRNM0ayIWwI8ZDHpvcaFt8FC86aKWvZ4urODGxZAT3Z W5JcsJu/cpPjUv4KAkxc9xdeofbPo1YK3yTLA7ih1MsH7p/Q7u5zkNSeXNnF25Wh 3NrrJCXV3K4MMVwyPQJzoMn8EWsCuP7yeZ5R+ds5aDaOBy4LYoL+kp5HzJ4GzdFr YUs3C2cRraR5xB6U9VjgidLNDXIriSUjwj9D9ucV9hlJ3Z7NJw04ZuQP5Oglt1TQ Pdndx8OqUyWjBctAOe0+bx6iC40jHrEj9HI3M/LkHM3DVIi5l3KWZi4qTHVTN/f9 mH5Z4Ot5QSy+A6jFiVXuIfo7CEYow5tOTrEibEX/J5fI5RK/5OgJAPUs+J3VeAMc jKb/Hp0e1SbMsoZmggWnNvIUz5Fk6nkkxrvC5iVMuGz6fsD9M7Ms6ckZd3Mw7tpa d4F95syN4odTseZX4hMFKyMP8Be62WNXSgXV4a7T8d1+FyYsaEdMRew6pyJD1nUc aW5b5eiBeIE= =ruON -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fix from Ingo Molnar: "Fix a race in the user-callchains code" * tag 'perf-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: sched: Fix perf crash with new is_user_task() helper	2026-02-01 10:47:21 -08:00
Linus Torvalds	e53ada651a	Fix a regression in the deferrable dl_server code that can cause the dl_server to be stuck. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml/FIkRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jB7w/9GazyDrZhdmJp7NeJCEW82FGKljBuEBvP RxN9lrhRuOIKEVuOfJI76Chd5iooSlKdDmaOcxFrkihwJ/AREHekTlYFLtncg5Zu 5JjeYLiuN04kkOClDz7mTgcSzJD7yM0m2mD5CS45lZ35shM64eLr5GtXaFTEs9SZ E9R4cBiNpz4bVMNW4y3TphSlEx84sIRWoj5Gmzw3CG/52GbIcj1rgwO4b/RI4FYV yio7V5+ol6oVijlul/UIMYkB6vWGbO8swa/YErhyCvXpHfR1pwsEDFcTjJvO9hCK 2QPwBo34m9mibybvQoyS+/Cybf/yWqeyc1KHqn2+wlO/Ectk97H3eBsbLO2ot/dX Sv+y+vq3Wm5BczaKtyCeGvk+N5NSp3jhKUO4jH0Iivi+jRJ2yynJ+Dgg0F0ldIMM 0qBfEvaiN8Fbg8LLDevfqtGnctvRiwzKNstaZyUmq1g5NXPOGlerVtHP+osuPb/L lIK7uSZMZNStJS7R7vcO+g2lAl15UwBmMPEepDdUsBilnKEnefl+j5wilhjLoNM7 G+XMSFLMoWX1/oY9f8TRs6ncZ+ays6tal8bePVyTrdp/RBAXCFlB9npNjqlTkZjT M6zRqHcIOPZxEJP3ZqsGb0rT41v7gJbuHg2uUSQwyqEWmTB45IqTewi2Omw/duJ/ OOp7g2matLY= =U5Zw -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a regression in the deferrable dl_server code that can cause the dl_server to be stuck" * tag 'sched-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix 'stuck' dl_server	2026-02-01 10:39:52 -08:00
Steven Rostedt	76ed27608f	perf: sched: Fix perf crash with new is_user_task() helper In order to do a user space stacktrace the current task needs to be a user task that has executed in user space. It use to be possible to test if a task is a user task or not by simply checking the task_struct mm field. If it was non NULL, it was a user task and if not it was a kernel task. But things have changed over time, and some kernel tasks now have their own mm field. An idea was made to instead test PF_KTHREAD and two functions were used to wrap this check in case it became more complex to test if a task was a user task or not[1]. But this was rejected and the C code simply checked the PF_KTHREAD directly. It was later found that not all kernel threads set PF_KTHREAD. The io-uring helpers instead set PF_USER_WORKER and this needed to be added as well. But checking the flags is still not enough. There's a very small window when a task exits that it frees its mm field and it is set back to NULL. If perf were to trigger at this moment, the flags test would say its a user space task but when perf would read the mm field it would crash with at NULL pointer dereference. Now there are flags that can be used to test if a task is exiting, but they are set in areas that perf may still want to profile the user space task (to see where it exited). The only real test is to check both the flags and the mm field. Instead of making this modification in every location, create a new is_user_task() helper function that does all the tests needed to know if it is safe to read the user space memory or not. [1] https://lore.kernel.org/all/20250425204120.639530125@goodmis.org/ Fixes: `90942f9fac` ("perf: Use current->flags & PF_KTHREAD\|PF_USER_WORKER instead of current->mm == NULL") Closes: https://lore.kernel.org/all/0d877e6f-41a7-4724-875d-0b0a27b8a545@roeck-us.net/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260129102821.46484722@gandalf.local.home	2026-01-30 23:06:07 +01:00
Peter Zijlstra	1151354225	sched/deadline: Fix 'stuck' dl_server Andrea reported the dl_server getting stuck for him. He tracked it down to a state where dl_server_start() saw dl_defer_running==1, but the dl_server's job is no longer valid at the time of dl_server_start(). In the state diagram this corresponds to [4] D->A (or dl_server_stop() due to no more runnable tasks) followed by [1], which in case of a lapsed deadline must then be A->B. Now our A has dl_defer_running==1, while B demands dl_defer_running==0, therefore it must get cleared when the CBS wakeup rules demand a replenish. Fixes: `a110a81c52` ("sched/deadline: Deferrable dl server") Reported-by: Andrea Righi arighi@nvidia.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Andrea Righi arighi@nvidia.com Link: https://lkml.kernel.org/r/20260123161645.2181752-1-arighi@nvidia.com Link: https://patch.msgid.link/20260130124100.GC1079264@noisy.programming.kicks-ass.net	2026-01-30 23:06:06 +01:00
Linus Torvalds	2b54ac9e0c	dma-mapping fixes for Linux 6.19 - important fix for ARM 32-bit based systems using cma= kernel parameter (Oreoluwa Babatunde) - a fix for the corner case of the DMA atomic pool based allocations (Sai Sree Kartheek Adivi) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaXyw5wAKCRCJp1EFxbsS RBN7AP9rEfEEB3JBOglcZG3TTrLoCKLnw+16uroyKuD95RLWrQD/bWeJnRYcEZB5 ox1peKYBA4SsDB3bCUWFDfW4I0OZFQQ= =9Nx1 -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - important fix for ARM 32-bit based systems using cma= kernel parameter (Oreoluwa Babatunde) - a fix for the corner case of the DMA atomic pool based allocations (Sai Sree Kartheek Adivi) * tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: distinguish between missing and exhausted atomic pools of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param	2026-01-30 13:15:04 -08:00
Linus Torvalds	bcb6058a4b	16 hotfixes. 9 are cc:stable, 12 are for MM. - There's a 3 patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code. - Plus the usual shower of singletons - please see the changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaXub9wAKCRDdBJ7gKXxA jjoHAP48ag3lmJnpej977MxrA4VUBhv6ATmbFE2+czCzbRJaigEAiIMWtUlSVYbH WuEYQvcFeSR0hYjs0ClKUiZYOUcj+wI= =lHEL -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 9 are cc:stable, 12 are for MM. There's a patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code. Plus the usual shower of singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: vmcoreinfo: make hwerr_data visible for debugging mm/zone_device: reinitialize large zone device private folios mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone() mm/kfence: randomize the freelist on initialization kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho: init alloc tags when restoring pages from reserved memory mm: memfd_luo: restore and free memfd_luo_ser on failure mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup() memfd: export alloc_file() flex_proportions: make fprop_new_period() hardirq safe mailmap: add entry for Viacheslav Bocharov mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn mm/memory-failure: fix missing ->mf_stats count in hugetlb poison mm, swap: restore swap_space attr aviod kernel panic mm/kasan: fix KASAN poisoning in vrealloc() mm/shmem, swap: fix race of truncate and swap entry split	2026-01-29 11:09:13 -08:00
Sai Sree Kartheek Adivi	56c430c7f0	dma/pool: distinguish between missing and exhausted atomic pools Currently, dma_alloc_from_pool() unconditionally warns and dumps a stack trace when an allocation fails, with the message "Failed to get suitable pool". This conflates two distinct failure modes: 1. Configuration error: No atomic pool is available for the requested DMA mask (a fundamental system setup issue) 2. Resource Exhaustion: A suitable pool exists but is currently full (a recoverable runtime state) This lack of distinction prevents drivers from using __GFP_NOWARN to suppress error messages during temporary pressure spikes, such as when awaiting synchronous reclaim of descriptors. Refactor the error handling to distinguish these cases: - If no suitable pool is found, keep the unconditional WARN regarding the missing pool. - If a pool was found but is exhausted, respect __GFP_NOWARN and update the warning message to explicitly state "DMA pool exhausted". Fixes: `9420139f51` ("dma-pool: fix coherent pool allocations for IOMMU mappings") Signed-off-by: Sai Sree Kartheek Adivi <s-adivi@ti.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260128133554.3056582-1-s-adivi@ti.com	2026-01-29 10:23:45 +01:00
Oreoluwa Babatunde	0fd17e5983	of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param When initializing the default cma region, the "cma=" kernel parameter takes priority over a DT defined linux,cma-default region. Hence, give the reserved_mem framework the ability to detect this so that the DT defined cma region can skip initialization accordingly. Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com> Tested-by: Joy Zou <joy.zou@nxp.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Fixes: `8a6e02d0c0` ("of: reserved_mem: Restructure how the reserved memory regions are processed") Fixes: `2c223f7239` ("of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()") Link: https://lore.kernel.org/r/20251210002027.1171519-1-oreoluwa.babatunde@oss.qualcomm.com [mszyprow: rebased onto v6.19-rc1, added fixes tags, added a stub for cma_skip_dt_default_reserved_mem() if no CONFIG_DMA_CMA is set] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2026-01-29 00:26:36 +01:00
Breno Leitao	bd58782995	vmcoreinfo: make hwerr_data visible for debugging If the kernel is compiled with LTO, hwerr_data symbol might be lost, and vmcoreinfo doesn't have it dumped. This is currently seen in some production kernels with LTO enabled. Remove the static qualifier from hwerr_data so that the information is still preserved when the kernel is built with LTO. Making hwerr_data a global symbol ensures its debug info survives the LTO link process and appears in kallsyms. Also document it, so it doesn't get removed in the future as suggested by akpm. Link: https://lkml.kernel.org/r/20260122-fix_vmcoreinfo-v2-1-2d6311f9e36c@debian.org Fixes: `3fa805c37d` ("vmcoreinfo: track and log recoverable hardware errors") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:49 -08:00
Andrew Morton	412a32f0e5	kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho_preserve_vmalloc() should return -ENOMEM when new_vmalloc_chunk() fails. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202601211636.IRaejjdw-lkp@intel.com/ Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:48 -08:00
Ran Xiaokai	e86436ad0a	kho: init alloc tags when restoring pages from reserved memory Memblock pages (including reserved memory) should have their allocation tags initialized to CODETAG_EMPTY via clear_page_tag_ref() before being released to the page allocator. When kho restores pages through kho_restore_page(), missing this call causes mismatched allocation/deallocation tracking and below warning message: alloc_tag was not set WARNING: include/linux/alloc_tag.h:164 at ___free_pages+0xb8/0x260, CPU#1: swapper/0/1 RIP: 0010:___free_pages+0xb8/0x260 kho_restore_vmalloc+0x187/0x2e0 kho_test_init+0x3c4/0xa30 do_one_initcall+0x62/0x2b0 kernel_init_freeable+0x25b/0x480 kernel_init+0x1a/0x1c0 ret_from_fork+0x2d1/0x360 Add missing clear_page_tag_ref() annotation in kho_restore_page() to fix this. Link: https://lkml.kernel.org/r/20260122132740.176468-1-ranxiaokai627@163.com Fixes: `fc33e4b44b` ("kexec: enable KHO support for memory preservation") Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-26 19:03:47 -08:00
Linus Torvalds	b83a8ff87a	tracing fixes for v6.19: - Fix a crash with passing a stacktrace between synthetic events A synthetic event is an event that combines two events into a single event that can display fields from both events as well as the time delta that took place between the events. It can also pass a stacktrace from the first event so that it can be displayed by the synthetic event (this is useful to get a stacktrace of a task scheduling out when blocked and recording the time it was blocked for). A synthetic event can also connect an existing synthetic event to another event. An issue was found that if the first synthetic event had a stacktrace as one of its fields, and that stacktrace field was passed to the new synthetic event to be displayed, it would crash the kernel. This was due to the stacktrace not being saved as a stacktrace but was still marked as one. When the stacktrace was read, it would try to read an array but instead read the integer metadata of the stacktrace and dereferenced a bad value. Fix this by saving the stacktrace field as a stracktrace. - Fix possible overflow in cmp_mod_entry() compare function A binary search is used to find a module address and if the addresses are greater than 2GB apart it could lead to truncation and cause a bad search result. Use normal compares instead of a subtraction between addresses to calculate the compare value. - Fix output of entry arguments in function graph tracer Depending on the configurations enabled, the entry can be two different types that hold the argument array. The macro FGRAPH_ENTRY_ARGS() is used to find the correct arguments from the given type. One location was missed and still referenced the arguments directly via entry->args and could produce the wrong value depending on how the kernel was configured. - Fix memory leak in scripts/tracepoint-update build tool If the array fails to allocate, the memory for the values needs to be freed and was not. Free the allocated values if the array failed to allocate. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaXUQLxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgsJAQDgtWH9DWUkJKgzXTkiOA0l8JArPOVf tCSMla2wWJA70QD/as2ptacYAFU9v1oxO5YIgsKOLFBF68ZUIhJtvXpqtAE= =JeC6 -----END PGP SIGNATURE----- Merge tag 'trace-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix a crash with passing a stacktrace between synthetic events A synthetic event is an event that combines two events into a single event that can display fields from both events as well as the time delta that took place between the events. It can also pass a stacktrace from the first event so that it can be displayed by the synthetic event (this is useful to get a stacktrace of a task scheduling out when blocked and recording the time it was blocked for). A synthetic event can also connect an existing synthetic event to another event. An issue was found that if the first synthetic event had a stacktrace as one of its fields, and that stacktrace field was passed to the new synthetic event to be displayed, it would crash the kernel. This was due to the stacktrace not being saved as a stacktrace but was still marked as one. When the stacktrace was read, it would try to read an array but instead read the integer metadata of the stacktrace and dereferenced a bad value. Fix this by saving the stacktrace field as a stacktrace. - Fix possible overflow in cmp_mod_entry() compare function A binary search is used to find a module address and if the addresses are greater than 2GB apart it could lead to truncation and cause a bad search result. Use normal compares instead of a subtraction between addresses to calculate the compare value. - Fix output of entry arguments in function graph tracer Depending on the configurations enabled, the entry can be two different types that hold the argument array. The macro FGRAPH_ENTRY_ARGS() is used to find the correct arguments from the given type. One location was missed and still referenced the arguments directly via entry->args and could produce the wrong value depending on how the kernel was configured. - Fix memory leak in scripts/tracepoint-update build tool If the array fails to allocate, the memory for the values needs to be freed and was not. Free the allocated values if the array failed to allocate. * tag 'trace-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: scripts/tracepoint-update: Fix memory leak in add_string() on failure function_graph: Fix args pointer mismatch in print_graph_retval() tracing: Avoid possible signed 64-bit truncation tracing: Fix crash on synthetic stacktrace field usage	2026-01-24 17:18:57 -08:00
Linus Torvalds	12a0094839	Misc fixes: - Fix auxiliary timekeeper update & locking bug - Reduce the sensitivity of the clocksource watchdog, to fix false positive measurements that marked the TSC clocksource unstable. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml0ksERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gvtRAAvLWNMb9YPW/65Gn8dkyMQUPzXBjEaq8A yP4L1b8EujoM6fSeQC0Y367hpn1GKHhEGZyj9ksRcU4dsU5XWlzPZr9QXCETmMuh ffTCvrUGI6d95685S+R1VplmhoCQAkerQAFcPQDAQd0QgfoEJO+hf2AWHrilnicu gCcZGDE/+gLAPYjR7LaRu7vb0W6VtwqhXvz8xTCGALmMlU84BDT3deLzCmujxtbF PvNaShBAtppm468Ln6HY2mk4mN5kWthPnonNF4n0zVYy8uAHLEEUERr/LndZ60Ua KlFgKukfoPXyJoU0M0umNcX6oXaRw7DeyNcPtJovZUwtfXyjkTPWrcfZ4sD3r37K QWjFqmbTCtj70vlUFP2RiHusOmNkuzcWKww5KdpA+HoeXEI4zcjhZq7zObyjDPIZ t0Cs5sZoWWpL7o53ikMjsO2Fe/zSDRaocYyImCWh2U+DdBn3/fh8a0pboXQakujx kjmuDrHaLXFNMI9h7NvlP143IW8g7AHUpu0piDGLVFFkZoNcII/8g7qawemQw8T9 ZCUmL3oq1Zu0z3aGq9GRFz31ysVLXwDZdtY8CCuHxgVTuZQQnRNrLiNiTjZn75E/ PY63jtSgKNJsAOTHJZ5hnyvcGb8w05anU0T7M38kTJFtiX4R6JaaDJVmj3eFG3g8 es9cQ4gJGmo= =O1ly -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: - Fix auxiliary timekeeper update & locking bug - Reduce the sensitivity of the clocksource watchdog, to fix false positive measurements that marked the TSC clocksource unstable * tag 'timers-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Reduce watchdog readout delay limit to prevent false positives timekeeping: Adjust the leap state for the correct auxiliary timekeeper	2026-01-24 09:36:03 -08:00
Linus Torvalds	af5a3fae86	Miscellaneous scheduler fixes: - Fix PELT clock synchronization bug when entering idle - Disable the NEXT_BUDDY feature, as during extensive testing Mel found that the negatives outweigh the positives. - Make wakeup preemption less aggressive, which resulted in an unreasonable increase in preemption frequency. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml0kYYRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j/Bw/+M5UK6EOPzLTs0Tj87X/kI7vXLN/uf9+K FpDtFNmoJnJKrQxPFfa8aT8OAz7zLfnjrGmRgxOG1fFkACMuB8s+TO4+i9lhzAsF gHOg6lDczi4K14mkTyiP/Bdaf3lZThghZitdRCBZFN4gjBpAh1rRI4ikQ0a3pwJH 6y1NF8fX/xRBm6Bgprgbv/em+fKsO89i6NwunPtYaxHsOGA5U0FVPFSa3yghz+X2 7Sl4U8LRdkU3Z62M+I8DKWuAMMb9nLwEXrSiNy+KJHIJ7FNFb1+U10gPrLq+z2Oi hvd25pzDzGUgOglZ7f1IRL5H48putMjCqv+A2wOZnzl7fHt4c02hpTBqMDle5hEP DYHyNNC1ZIRQ1zuy8j33LzB10ycP6pX9nawt+S1trEZcDVwaKwq9TbYsBjbJKtmZ V7C181fTUaNQ6M4wJY3gx+hD1ocrLv+Y0vhXJnK7sg+j4IhWJ7Yk6ne9hrpXbbUj cK8gzvo4OkShYtMlB9ut/RIiuyvFThyJd9rcNkeN4tTM9DVFURvOlIK2W6U70Yty OJDkhI6KBmkERPQoa43OsGvxZwF+g0KY8ISsLq3bJ2vLyATPdpJ2ns+zRg6I+B+q zHZmI86Fc8gr+O/y9oqi2VJhgio34RcoSId/QcAPC1Gl4TaIcmqFFTSzMhI+kLk9 Ng7vHy/3KQg= =pGnx -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix PELT clock synchronization bug when entering idle - Disable the NEXT_BUDDY feature, as during extensive testing Mel found that the negatives outweigh the positives - Make wakeup preemption less aggressive, which resulted in an unreasonable increase in preemption frequency * tag 'sched-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Revert force wakeup preemption sched/fair: Disable scheduler feature NEXT_BUDDY sched/fair: Fix pelt clock sync when entering idle	2026-01-24 09:29:41 -08:00
Linus Torvalds	ceaeaf66a2	Two perf events fixes: - Fix mmap_count warning & bug when creating a group member event with the PERF_FLAG_FD_OUTPUT flag. - Disable the sample period == 1 branch events BTS optimization on guests, because BTS is not virtualized. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAml0kF8RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1icZxAAhsYBY0uB3OzAgJYRIVlw/vzF7ubHh8+r PhQ+azap/uSk/dS8IXZ0FrP/vq9kgqEFpsYwWbB1EFRxB24nd8AAYarASyc22Dhb Qe6tKh4hzlDC9Cw0vJ111PgKJLiPDrmkxzS4C2HkPwYwriNql81hQjLW5nF6dHwg i8aP7+/uNB2h/xnPIRgkUFSUzBacRz1Snqgi/vSHmwEhk1GFI48rXoYywM9ItAnR CGN7tGxJ45YwDYeknZf9Ngsd3q+/eh38ihPfMEKoulf4oFvhIsiG4rJ1Vee0V9fD aD1NhukULtjw3lkKnMM75W+Jdb7fHDTOuxzUov9WpyIOIUTMqRkbvaKHgJzSD2SK 01TbP3kQixhGhiRXx78GDQwYGjX8JQxngbqyJvL708GNvxbcKYrLhqtr5Ho00pWH ERxx/ajoDXB7Neo7XPhgYRHk/lrlnZsK1LoidhMzN1UX6C12VnhPcD+zUSqRxz5w yFuJp0+7wF+G74FpO7kv8jv5KDB/lery5mdzWu0kqAKMfYWdw5eGoaI6T48DVoBy IwNYU8bxDBRPzu64XudDE4xBuTuy4HpJmbvOUkxMUkBGMTZ+nysqHu8D7cJJqYtL 2xYJOpR+P4Se9fRyR9xo5vtTZy27TQqTp+AjA5RlIoVOJhYK5T9mAFCJaEhdupbM 2IA4xdYhbb8= =lNSW -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fixes from Ingo Molnar: - Fix mmap_count warning & bug when creating a group member event with the PERF_FLAG_FD_OUTPUT flag - Disable the sample period == 1 branch events BTS optimization on guests, because BTS is not virtualized * tag 'perf-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Do not enable BTS for guests perf: Fix refcount warning on event->mmap_count increment	2026-01-24 09:24:17 -08:00
Donglin Peng	c9703d17d2	function_graph: Fix args pointer mismatch in print_graph_retval() When funcgraph-args and funcgraph-retaddr are both enabled, many kernel functions display invalid parameters in trace logs. The issue occurs because print_graph_retval() passes a mismatched args pointer to print_function_args(). Fix this by retrieving the correct args pointer using the FGRAPH_ENTRY_ARGS() macro. Link: https://patch.msgid.link/20260112021601.1300479-1-dolinux.peng@gmail.com Fixes: `f83ac7544f` ("function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-01-23 13:34:38 -05:00
Ian Rogers	00f13e28a9	tracing: Avoid possible signed 64-bit truncation 64-bit truncation to 32-bit can result in the sign of the truncated value changing. The cmp_mod_entry is used in bsearch and so the truncation could result in an invalid search order. This would only happen were the addresses more than 2GB apart and so unlikely, but let's fix the potentially broken compare anyway. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260108002625.333331-1-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-01-23 13:34:30 -05:00
Steven Rostedt	90f9f5d64c	tracing: Fix crash on synthetic stacktrace field usage When creating a synthetic event based on an existing synthetic event that had a stacktrace field and the new synthetic event used that field a kernel crash occurred: ~# cd /sys/kernel/tracing ~# echo 's:stack unsigned long stack[];' > dynamic_events ~# echo 'hist:keys=prev_pid:s0=common_stacktrace if prev_state & 3' >> events/sched/sched_switch/trigger ~# echo 'hist:keys=next_pid:s1=$s0:onmatch(sched.sched_switch).trace(stack,$s1)' >> events/sched/sched_switch/trigger The above creates a synthetic event that takes a stacktrace when a task schedules out in a non-running state and passes that stacktrace to the sched_switch event when that task schedules back in. It triggers the "stack" synthetic event that has a stacktrace as its field (called "stack"). ~# echo 's:syscall_stack s64 id; unsigned long stack[];' >> dynamic_events ~# echo 'hist:keys=common_pid:s2=stack' >> events/synthetic/stack/trigger ~# echo 'hist:keys=common_pid:s3=$s2,i0=id:onmatch(synthetic.stack).trace(syscall_stack,$i0,$s3)' >> events/raw_syscalls/sys_exit/trigger The above makes another synthetic event called "syscall_stack" that attaches the first synthetic event (stack) to the sys_exit trace event and records the stacktrace from the stack event with the id of the system call that is exiting. When enabling this event (or using it in a historgram): ~# echo 1 > events/synthetic/syscall_stack/enable Produces a kernel crash! BUG: unable to handle page fault for address: 0000000000400010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 6 UID: 0 PID: 1257 Comm: bash Not tainted 6.16.3+deb14-amd64 #1 PREEMPT(lazy) Debian 6.16.3-1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:trace_event_raw_event_synth+0x90/0x380 Code: c5 00 00 00 00 85 d2 0f 84 e1 00 00 00 31 db eb 34 0f 1f 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 <49> 8b 04 24 48 83 c3 01 8d 0c c5 08 00 00 00 01 cd 41 3b 5d 40 0f RSP: 0018:ffffd2670388f958 EFLAGS: 00010202 RAX: ffff8ba1065cc100 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: fffff266ffda7b90 RDI: ffffd2670388f9b0 RBP: 0000000000000010 R08: ffff8ba104e76000 R09: ffffd2670388fa50 R10: ffff8ba102dd42e0 R11: ffffffff9a908970 R12: 0000000000400010 R13: ffff8ba10a246400 R14: ffff8ba10a710220 R15: fffff266ffda7b90 FS: 00007fa3bc63f740(0000) GS:ffff8ba2e0f48000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000400010 CR3: 0000000107f9e003 CR4: 0000000000172ef0 Call Trace: <TASK> ? __tracing_map_insert+0x208/0x3a0 action_trace+0x67/0x70 event_hist_trigger+0x633/0x6d0 event_triggers_call+0x82/0x130 trace_event_buffer_commit+0x19d/0x250 trace_event_raw_event_sys_exit+0x62/0xb0 syscall_exit_work+0x9d/0x140 do_syscall_64+0x20a/0x2f0 ? trace_event_raw_event_sched_switch+0x12b/0x170 ? save_fpregs_to_fpstate+0x3e/0x90 ? _raw_spin_unlock+0xe/0x30 ? finish_task_switch.isra.0+0x97/0x2c0 ? __rseq_handle_notify_resume+0xad/0x4c0 ? __schedule+0x4b8/0xd00 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x1ef/0x2f0 ? do_fault+0x2e9/0x540 ? __handle_mm_fault+0x7d1/0xf70 ? count_memcg_events+0x167/0x1d0 ? handle_mm_fault+0x1d7/0x2e0 ? do_user_addr_fault+0x2c3/0x7f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The reason is that the stacktrace field is not labeled as such, and is treated as a normal field and not as a dynamic event that it is. In trace_event_raw_event_synth() the event is field is still treated as a dynamic array, but the retrieval of the data is considered a normal field, and the reference is just the meta data: // Meta data is retrieved instead of a dynamic array str_val = (char )(long)var_ref_vals[val_idx]; // Then when it tries to process it: len = ((unsigned long *)str_val) + 1; It triggers a kernel page fault. To fix this, first when defining the fields of the first synthetic event, set the filter type to FILTER_STACKTRACE. This is used later by the second synthetic event to know that this field is a stacktrace. When creating the field of the new synthetic event, have it use this FILTER_STACKTRACE to know to create a stacktrace field to copy the stacktrace into. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260122194824.6905a38e@gandalf.local.home Fixes: `00cf3d672a` ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2026-01-23 13:34:21 -05:00
Vincent Guittot	15257cc2f9	sched/fair: Revert force wakeup preemption This agressively bypasses run_to_parity and slice protection with the assumpiton that this is what waker wants but there is no garantee that the wakee will be the next to run. It is a better choice to use yield_to_task or WF_SYNC in such case. This increases the number of resched and preemption because a task becomes quickly "ineligible" when it runs; We update the task vruntime periodically and before the task exhausted its slice or at least quantum. Example: 2 tasks A and B wake up simultaneously with lag = 0. Both are eligible. Task A runs 1st and wakes up task C. Scheduler updates task A's vruntime which becomes greater than average runtime as all others have a lag == 0 and didn't run yet. Now task A is ineligible because it received more runtime than the other task but it has not yet exhausted its slice nor a min quantum. We force preemption, disable protection but Task B will run 1st not task C. Sidenote, DELAY_ZERO increases this effect by clearing positive lag at wake up. Fixes: `e837456fdc` ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260123102858.52428-1-vincent.guittot@linaro.org	2026-01-23 11:53:20 +01:00
Mel Gorman	4f70f106bc	sched/fair: Disable scheduler feature NEXT_BUDDY NEXT_BUDDY was disabled with the introduction of EEVDF and enabled again after NEXT_BUDDY was rewritten for EEVDF by commit `e837456fdc` ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals"). It was not expected that this would be a universal win without a crystal ball instruction but the reported regressions are a concern [1][2] even if gains were also reported. Specifically; o mysql with client/server running on different servers regresses o specjbb reports lower peak metrics o daytrader regresses The mysql is realistic and a concern. It needs to be confirmed if specjbb is simply shifting the point where peak performance is measured but still a concern. daytrader is considered to be representative of a real workload. Access to test machines is currently problematic for verifying any fix to this problem. Disable NEXT_BUDDY for now by default until the root causes are addressed. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Link: https://lore.kernel.org/lkml/4b96909a-f1ac-49eb-b814-97b8adda6229@arm.com [1] Link: https://lore.kernel.org/lkml/ec3ea66f-3a0d-4b5a-ab36-ce778f159b5b@linux.ibm.com [2] Link: https://patch.msgid.link/fyqsk63pkoxpeaclyqsm5nwtz3dyejplr7rg6p74xwemfzdzuu@7m7xhs5aqpqw	2026-01-23 11:53:19 +01:00
Vincent Guittot	98c88dc8a1	sched/fair: Fix pelt clock sync when entering idle Samuel and Alex reported regressions of the util_avg of RT rq with commit `17e3e88ed0` ("sched/fair: Fix pelt lost idle time detection"). It happens that fair is updating and syncing the pelt clock with task one when pick_next_task_fair() fails to pick a task but before the prev scheduling class got a chance to update its pelt signals. Move update_idle_rq_clock_pelt() in set_next_task_idle() which is called after prev class has been called. Fixes: `17e3e88ed0` ("sched/fair: Fix pelt lost idle time detection") Closes: https://lore.kernel.org/all/CAG2KctpO6VKS6GN4QWDji0t92_gNBJ7HjjXrE+6H+RwRXt=iLg@mail.gmail.com/ Closes: https://lore.kernel.org/all/8cf19bf0e0054dcfed70e9935029201694f1bb5a.camel@mediatek.com/ Reported-by: Samuel Wu <wusamuel@google.com> Reported-by: Alex Hoh <Alex.Hoh@mediatek.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Samuel Wu <wusamuel@google.com> Tested-by: Alex Hoh <Alex.Hoh@mediatek.com> Link: https://patch.msgid.link/20260121163317.505635-1-vincent.guittot@linaro.org	2026-01-21 17:46:08 +01:00
Will Rosenberg	d06bf78e55	perf: Fix refcount warning on event->mmap_count increment When calling refcount_inc(&event->mmap_count) inside perf_mmap_rb(), the following warning is triggered: refcount_t: addition on 0; use-after-free. WARNING: lib/refcount.c:25 PoC: struct perf_event_attr attr = {0}; int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0); mmap(NULL, 0x3000, PROT_READ \| PROT_WRITE, MAP_SHARED, fd, 0); int victim = syscall(__NR_perf_event_open, &attr, 0, -1, fd, PERF_FLAG_FD_OUTPUT); mmap(NULL, 0x3000, PROT_READ \| PROT_WRITE, MAP_SHARED, victim, 0); This occurs when creating a group member event with the flag PERF_FLAG_FD_OUTPUT. The group leader should be mmap-ed and then mmap-ing the event triggers the warning. Since the event has copied the output_event in perf_event_set_output(), event->rb is set. As a result, perf_mmap_rb() calls refcount_inc(&event->mmap_count) when event->mmap_count = 0. Disallow the case when event->mmap_count = 0. This also prevents two events from updating the same user_page. Fixes: `448f97fba9` ("perf: Convert mmap() refcounts to refcount_t") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Rosenberg <whrosenb@asu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260119184956.801238-1-whrosenb@asu.edu	2026-01-21 16:28:58 +01:00
Thomas Gleixner	c06343be0b	clocksource: Reduce watchdog readout delay limit to prevent false positives The "valid" readout delay between the two reads of the watchdog is larger than the valid delta between the resulting watchdog and clocksource intervals, which results in false positive watchdog results. Assume TSC is the clocksource and HPET is the watchdog and both have a uncertainty margin of 250us (default). The watchdog readout does: 1) wdnow = read(HPET); 2) csnow = read(TSC); 3) wdend = read(HPET); The valid window for the delta between #1 and #3 is calculated by the uncertainty margins of the watchdog and the clocksource: m = 2 * watchdog.uncertainty_margin + cs.uncertainty margin; which results in 750us for the TSC/HPET case. The actual interval comparison uses a smaller margin: m = watchdog.uncertainty_margin + cs.uncertainty margin; which results in 500us for the TSC/HPET case. That means the following scenario will trigger the watchdog: Watchdog cycle N: 1) wdnow[N] = read(HPET); 2) csnow[N] = read(TSC); 3) wdend[N] = read(HPET); Assume the delay between #1 and #2 is 100us and the delay between #1 and Watchdog cycle N + 1: 4) wdnow[N + 1] = read(HPET); 5) csnow[N + 1] = read(TSC); 6) wdend[N + 1] = read(HPET); If the delay between #4 and #6 is within the 750us margin then any delay between #4 and #5 which is larger than 600us will fail the interval check and mark the TSC unstable because the intervals are calculated against the previous value: wd_int = wdnow[N + 1] - wdnow[N]; cs_int = csnow[N + 1] - csnow[N]; Putting the above delays in place this results in: cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us); -> cs_int = wd_int + 510us; which is obviously larger than the allowed 500us margin and results in marking TSC unstable. Fix this by using the same margin as the interval comparison. If the delay between two watchdog reads is larger than that, then the readout was either disturbed by interconnect congestion, NMIs or SMIs. Fixes: `4ac1dd3245` ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin") Reported-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/ Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx	2026-01-21 11:33:11 +01:00
Linus Torvalds	c25f2fb1f4	17 hotfixes. 12 are cc:stable, 16 are for MM. - A 4 patch series from David Hildenbrand which fixes a few things realted to hugetlb PMD sharing - The remainder are singletons, please see their changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaW/vEQAKCRDdBJ7gKXxA jtMcAQCK1tKnINRmVjY3UJCqZMAaXvOdOoUIgHDaTXD/DWKm9AD9HRwWzYB4+TNr k/Te8F33d418LcMBTW9CLhrplQpaIAI= =+d1A -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: - A patch series from David Hildenbrand which fixes a few things related to hugetlb PMD sharing - The remainder are singletons, please see their changelogs for details * tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: restore per-memcg proactive reclaim with !CONFIG_NUMA mm/kfence: fix potential deadlock in reboot notifier Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode mm: do not copy page tables unnecessarily for VM_UFFD_WP mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather mm/rmap: fix two comments related to huge_pmd_unshare() mm/hugetlb: fix two comments related to huge_pmd_unshare() mm/hugetlb: fix hugetlb_pmd_shared() mm: remove unnecessary and incorrect mmap lock assert x86/kfence: avoid writing L1TF-vulnerable PTEs mm/vma: do not leak memory when .mmap_prepare swaps the file migrate: correct lock ordering for hugetlb file folios panic: only warn about deprecated panic_print on write access fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes() mm: take into account mm_cid size for mm_struct static definitions mm: rename cpu_bitmap field to flexible_array mm: add missing static initializer for init_mm::mm_cid.lock	2026-01-20 13:32:16 -08:00
Linus Torvalds	c03e9c42ae	dma-mapping fixes for Linux 6.19 - minor fixes for the corner cases of the SWIOTLB pool management (Robin Murphy) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaW+NbgAKCRCJp1EFxbsS RKU3AQCIpNwIYN6evmnTXO/Y+AFRjsrb1pE1cUyHn5QTY4/o4wEA6BiSgMCrZEbv HTEs1ZSHgLOwBwZre1Z11icGg3BkRgY= =6fBC -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - minor fixes for the corner cases of the SWIOTLB pool management (Robin Murphy) * tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: Avoid allocating redundant pools mm_zone: Generalise has_managed_dma() dma/pool: Improve pool lookup	2026-01-20 10:16:18 -08:00
Thomas Weißschuh	e806f7dde8	timekeeping: Adjust the leap state for the correct auxiliary timekeeper When __do_ajdtimex() was introduced to handle adjtimex for any timekeeper, this reference to tk_core was not updated. When called on an auxiliary timekeeper, the core timekeeper would be updated incorrectly. This gets caught by the lock debugging diagnostics because the timekeepers sequence lock gets written to without holding its associated spinlock: WARNING: include/linux/seqlock.h:226 at __do_adjtimex+0x394/0x3b0, CPU#2: test/125 aux_clock_adj (kernel/time/timekeeping.c:2979) __do_sys_clock_adjtime (kernel/time/posix-timers.c:1161 kernel/time/posix-timers.c:1173) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) Update the correct auxiliary timekeeper. Fixes: `775f71ebed` ("timekeeping: Make do_adjtimex() reusable") Fixes: `ecf3e70304` ("timekeeping: Provide adjtimex() for auxiliary clocks") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260120-timekeeper-auxclock-leapstate-v1-1-5b358c6b3cfd@linutronix.de	2026-01-20 10:18:53 +01:00
Gal Pressman	90f3c12324	panic: only warn about deprecated panic_print on write access The panic_print_deprecated() warning is being triggered on both read and write operations to the panic_print parameter. This causes spurious warnings when users run 'sysctl -a' to list all sysctl values, since that command reads /proc/sys/kernel/panic_print and triggers the deprecation notice. Modify the handlers to only emit the deprecation warning when the parameter is actually being set: - sysctl_panic_print_handler(): check 'write' flag before warning. - panic_print_get(): remove the deprecation call entirely. This way, users are only warned when they actively try to use the deprecated parameter, not when passively querying system state. Link: https://lkml.kernel.org/r/20260106163321.83586-1-gal@nvidia.com Fixes: `ee13240cd7` ("panic: add note that panic_print sysctl interface is deprecated") Fixes: `2683df6539` ("panic: add note that 'panic_print' parameter is deprecated") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Cc: Feng Tang <feng.tang@linux.alibaba.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-19 12:30:01 -08:00
Linus Torvalds	6f32aa9161	cgroup: Another fix for v6.19-rc5 - Add Chen Ridong as cpuset reviewer. - Add SPDX license identifiers to cgroup files that were missing them. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaW0ToA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGel7AQD3eTbiR/OU0i5yJj6YZsIdVYcteqqsWBkDFnZ3 9pZZugEA+KCwi5eQz+cryKZ7y5fU5g5O6f4OP/lfLEcSmFDpfQU= =QHwy -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Add Chen Ridong as cpuset reviewer - Add SPDX license identifiers to cgroup files that were missing them * tag 'cgroup-for-6.19-rc5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: kernel: cgroup: Add LGPL-2.1 SPDX license ID to legacy_freezer.c kernel: cgroup: Add SPDX-License-Identifier lines MAINTAINERS: Add Chen Ridong as cpuset reviewer	2026-01-18 14:30:27 -08:00
Linus Torvalds	b671c1dad2	Fix the update_needs_ipi() check in the hrtimer code that may result in incorrect skipping of hrtimer IPIs. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmlsr+oRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iLPw/+OuA8hPlFY8nxrpVKASoiP7u+mGQ82Qmf xdf1Kimpmc5ydpGsxpRLKY19l3fLuGSJuOa2Url0Ro52JHxU+yPPOkN2ONjsPxLI mFMdhyXiS1PIgYt307OmNmKFadxdg5cGJTKY0wIS9JCcPFxI5Y3NFhWZ4ybPlGHK /bxtsjUctekUTDVahqP65lRIYMvjhO1I2LQfoPQyg/C+2n6Owy7TAX8xZovDZLHz a9RkkX6FPc8aR8Kj1VYm1yABNlUhGvFvTE6wfcfOMjiHHeDv2qYdwNHF2tK2eoKC QX609zBTCT/n0ioZlikLrKDZitj6uRgS2blBc+hodHEs4cY5gE2umWSU4IirT7HH GDjvG/vns/BLRr97gPdMaJ+N/sjdCSQ5tsDRrvHwCGC27VXPu3O3DYU128eH4eXW +KNJGQHJvR7slBKElQ4wxJaLhbLBB5rw/AqqdXa4PU8LqvMBBNcLZjHnrYlbD5Ei ovdX5VtRYOjrVl7pCNrXXdzFfBpXK+sOPHQMoJxnLHKXO+cdsZz8zIB7AHZEVe20 CIOMRcfgw2nvpS+0MqaHzre0ixyq+U2XSIu7MRIlq8GZ+zdjiVMqx3pPa19QRdyr hA9JupeRC9eZdymlms2MiSzqIVmWSVvgcyhIFSYw9yLe55vXcGyArF27RKGtBHRA uF+K5aT8gqA= =CAdu -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix the update_needs_ipi() check in the hrtimer code that may result in incorrect skipping of hrtimer IPIs" * tag 'timers-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix softirq base check in update_needs_ipi()	2026-01-18 10:56:32 -08:00
Linus Torvalds	837c8180e3	Misc DL scheduler fixes, mainly for a new category of bugs that were discovered and fixed recently: - Fix a race condition in the DL server - Fix a DL server bug which can result in incorrectly going idle when there's work available - Fix DL server bug which triggers a WARN() due to broken get_prio_dl() logic and subsequent misbehavior - Fix double update_rq_clock() calls - Fix setscheduler() assumption about static priorities - Make sure balancing callbacks are always called - Plus a handful of preparatory commits for the fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmlsrfIRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hgUA/9HQi51sxF/TlhKeoj9Z0zSO/L5Fbd7d5O ucsr+exoU7B+X95AnYC64pV3YlTNibD5y6HR4zpPzhlmEdFw8K9hTmDeSOF2cB0T dPdEY2KtvNoQwVwytPYKMsXtBTcJtZZqOec9BgGBzSIpjPm6Zhxr1J2/lKFy/udA 00aqNuJwfQW94dxvyPf+jPXbm0JW9HHZGNuuOXkkmZpkH+HIzBDrTWcAjHUaZt9H doPyEfO08dg3Uu4OxNBtxx94X7bQsn5RU1plSEc+xF7zynqwEIwDbelFftpHiM+z UqfFgkzCV8A8o/72BfZVhQgaspMZ+M4XSsaGBbouOTg3Rgx8gRKC23s8/xhDJSpy ayMWhuR5bsI04GD1xZVOwLvUi8oDIgAFIAKRI6gcW5eQELlejii0bAvs9Jw9/OjI QhhrX54sJCPPKMZpKO2n+AzQvv9a6+/p7ikiA2U5dR4wBEERFUw8ffRguHwSm+BU U1M+t+2tw+UiC7yML0zvIC3HdyfBhj33ZeW7PWyvFU0vjGm78U6NjqliAxbqW21l GdtQUpi9aXCtijXqHhLFvBtWGmqowj9UQiHroC0+nq1BsTpBDe575vwJNYF2K9he ht0hZqreUIf7xTsJHXlLFbtFxPQw0xAKzWJWUswaaqO8eS2JcWJbmHUAsHQuxmJB MbkJAlU7j/w= =DgiW -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc deadline scheduler fixes, mainly for a new category of bugs that were discovered and fixed recently: - Fix a race condition in the DL server - Fix a DL server bug which can result in incorrectly going idle when there's work available - Fix DL server bug which triggers a WARN() due to broken get_prio_dl() logic and subsequent misbehavior - Fix double update_rq_clock() calls - Fix setscheduler() assumption about static priorities - Make sure balancing callbacks are always called - Plus a handful of preparatory commits for the fixes" * tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Use ENQUEUE_MOVE to allow priority change sched: Deadline has dynamic priority sched: Audit MOVE vs balance_callbacks sched: Fold rq-pin swizzle into __balance_callbacks() sched/deadline: Avoid double update_rq_clock() sched/deadline: Ensure get_prio_dl() is up-to-date sched/deadline: Fix server stopping with runnable tasks sched: Provide idle_rq() helper sched/deadline: Fix potential race in dl_add_task_root_domain() sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()	2026-01-18 10:17:40 -08:00
Linus Torvalds	b62ce2547f	Power management fixes for 6.19-rc6 - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout) - Fix stale description of the cost field in struct em_perf_state to reflect the current code (Yaxiong Tian) - Fix and revamp the energy model YNL specification added recently along with the energy model netlink interface (Changwoo Min) -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmlqWS8SHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO1BhAH/iSA/9YP9/xIW+rIZj2irsncGVc/hJ+V 8PcM2tar966AEQBlmDPc7ug2MXT0Mb/5WRtQo/IvZ1tdAALB+bC70FHjdu3y7SAX IzFGyHKJ9OVJHGxYq0TCBbRXgYubUZiqvAnaBTBZ/ZIFlt3B4KEyatfFfxJMb1Pe H9zQIyUVBN1tjPfgNVbc22XGfkwfmmla72le6GmHseKZRqpwutIwzxm1PoMNVeLb fQv625LsWrDD9NU6j5uGq3WEq3xPltubtHN545FTFi4aGwBYJaG1FuIi+kPiQuCR VZmxTWVEiVwjttiZf/xWbOkPfwQqdgeXlTk5j+iBlF1jkrrW75IuLYQ= =eJ2p -----END PGP SIGNATURE----- Merge tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an error path memory leak in the energy model management code, fix a kerneldoc comment in it, and fix and revamp the energy model YNL specification added recently along with the new energy model management netlink interface (that received feedback after being added): - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout) - Fix stale description of the cost field in struct em_perf_state to reflect the current code (Yaxiong Tian) - Fix and revamp the energy model YNL specification added recently along with the energy model netlink interface (Changwoo Min)" * tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: EM: Add dump to get-perf-domains in the EM YNL spec PM: EM: Change cpus' type from string to u64 array in the EM YNL spec PM: EM: Rename em.yaml to dev-energymodel.yaml PM: EM: Fix yamllint warnings in the EM YNL spec PM: EM: Fix memory leak in em_create_pd() error path PM: EM: Fix incorrect description of the cost field in struct em_perf_state	2026-01-16 12:08:19 -08:00

1 2 3 4 5 ...

50409 commits