Commit graph

50761 commits

Author SHA1 Message Date
zhidao su
9adfcef334 sched_ext: Use READ_ONCE() for the read side of dsq->nr update
scx_bpf_dsq_nr_queued() reads dsq->nr via READ_ONCE() without holding
any lock, making dsq->nr a lock-free concurrently accessed variable.
However, dsq_mod_nr(), the sole writer of dsq->nr, only uses
WRITE_ONCE() on the write side without the matching READ_ONCE() on the
read side:

    WRITE_ONCE(dsq->nr, dsq->nr + delta);
                        ^^^^^^^
                        plain read -- KCSAN data race

The KCSAN documentation requires that if one accessor uses READ_ONCE()
or WRITE_ONCE() on a variable to annotate lock-free access, all other
accesses must also use the appropriate accessor. A plain read on the
right-hand side of WRITE_ONCE() leaves the pair incomplete and will
trigger KCSAN warnings.

Fix by using READ_ONCE() for the read side of the update:

    WRITE_ONCE(dsq->nr, READ_ONCE(dsq->nr) + delta);

This is consistent with scx_bpf_dsq_nr_queued() and makes the
concurrent access annotation complete and KCSAN-clean.

Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-03-02 07:23:00 -10:00
David Carlier
749989b2d9 sched_ext: Fix SCX_EFLAG_INITIALIZED being a no-op flag
SCX_EFLAG_INITIALIZED is the sole member of enum scx_exit_flags with no
explicit value, so the compiler assigns it 0. This makes the bitwise OR
in scx_ops_init() a no-op:

    sch->exit_info->flags |= SCX_EFLAG_INITIALIZED; /* |= 0 */

As a result, BPF schedulers cannot distinguish whether ops.init()
completed successfully by inspecting exit_info->flags.

Assign the value 1LLU << 0 so the flag is actually set.

Fixes: f3aec2adce ("sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init()")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-26 12:03:24 -10:00
David Carlier
2a064262eb sched_ext: Fix out-of-bounds access in scx_idle_init_masks()
scx_idle_node_masks is allocated with num_possible_nodes() elements but
indexed by NUMA node IDs via for_each_node(). On systems with
non-contiguous NUMA node numbering (e.g. nodes 0 and 4), node IDs can
exceed the array size, causing out-of-bounds memory corruption.

Use nr_node_ids instead, which represents the maximum node ID range and
is the correct size for arrays indexed by node ID.

Fixes: 7c60329e3521 ("sched_ext: Add NUMA-awareness to the default idle selection policy")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-25 13:12:28 -10:00
Tejun Heo
83236b2e43 sched_ext: Disable preemption between scx_claim_exit() and kicking helper work
scx_claim_exit() atomically sets exit_kind, which prevents scx_error() from
triggering further error handling. After claiming exit, the caller must kick
the helper kthread work which initiates bypass mode and teardown.

If the calling task gets preempted between claiming exit and kicking the
helper work, and the BPF scheduler fails to schedule it back (since error
handling is now disabled), the helper work is never queued, bypass mode
never activates, tasks stop being dispatched, and the system wedges.

Disable preemption across scx_claim_exit() and the subsequent work kicking
in all callers - scx_disable() and scx_vexit(). Add
lockdep_assert_preemption_disabled() to scx_claim_exit() to enforce the
requirement.

Fixes: f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-24 21:39:58 -10:00
Linus Torvalds
37a93dd5c4 Networking changes for 7.0
Core & protocols
 ----------------
 
  - A significant effort all around the stack to guide the compiler to
    make the right choice when inlining code, to avoid unneeded calls for
    small helper and stack canary overhead in the fast-path. This
    generates better and faster code with very small or no text size
    increases, as in many cases the call generated more code than the
    actual inlined helper.
 
  - Extend AccECN implementation so that is now functionally complete,
    also allow the user-space enabling it on a per network namespace
    basis.
 
  - Add support for memory providers with large (above 4K) rx buffer.
    Paired with hw-gro, larger rx buffer sizes reduce the number of
    buffers traversing the stack, dincreasing single stream CPU usage by
    up to ~30%.
 
  - Do not add HBH header to Big TCP GSO packets. This simplifies the RX
    path, the TX path and the NIC drivers, and is possible because
    user-space taps can now interpret correctly such packets without the
    HBH hint.
 
  - Allow IPv6 routes to be configured with a gateway address that is
    resolved out of a different interface than the one specified, aligning
    IPv6 to IPv4 behavior.
 
  - Multi-queue aware sch_cake. This makes it possible to scale the rate
    shaper of sch_cake across multiple CPUs, while still enforcing a
    single global rate on the interface.
 
  - Add support for the nbcon (new buffer console) infrastructure to
    netconsole, enabling lock-free, priority-based console operations that
    are safer in crash scenarios.
 
  - Improve the TCP ipv6 output path to cache the flow information, saving
    cpu cycles, reducing cache line misses and stack use.
 
  - Improve netfilter packet tracker to resolve clashes for most protocols,
    avoiding unneeded drops on rare occasions.
 
  - Add IP6IP6 tunneling acceleration to the flowtable infrastructure.
 
  - Reduce tcp socket size by one cache line.
 
  - Notify neighbour changes atomically, avoiding inconsistencies between
    the notification sequence and the actual states sequence.
 
  - Add vsock namespace support, allowing complete isolation of vsocks
    across different network namespaces.
 
  - Improve xsk generic performances with cache-alignment-oriented
    optimizations.
 
  - Support netconsole automatic target recovery, allowing netconsole
    to reestablish targets when underlying low-level interface comes back
    online.
 
 Driver API
 ----------
 
  - Support for switching the working mode (automatic vs manual) of a DPLL
    device via netlink.
 
  - Introduce PHY ports representation to expose multiple front-facing
    media ports over a single MAC.
 
  - Introduce "rx-polarity" and "tx-polarity" device tree properties, to
    generalize polarity inversion requirements for differential signaling.
 
  - Add helper to create, prepare and enable managed clocks.
 
 Device drivers
 --------------
 
  - Add Huawei hinic3 PF etherner driver.
 
  - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet controller.
 
  - Add ethernet driver for MaxLinear MxL862xx switches
 
  - Remove parallel-port Ethernet driver.
 
  - Convert existing driver timestamp configuration reporting to
    hwtstamp_get and remove legacy ioctl().
 
  - Convert existing drivers to .get_rx_ring_count(), simplifing the RX
    ring count retrieval. Also remove the legacy fallback path.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt, bng):
      - bnxt: add FW interface update to support FEC stats histogram and
        NVRAM defragmentation
      - bng: add TSO and H/W GRO support
    - nVidia/Mellanox (mlx5):
      - improve latency of channel restart operations, reducing the used
        H/W resources
      - add TSO support for UDP over GRE over VLAN
      - add flow counters support for hardware steering (HWS) rules
      - use a static memory area to store headers for H/W GRO, leading to
        12% RX tput improvement
    - Intel (100G, ice, idpf):
      - ice: reorganizes layout of Tx and Rx rings for cacheline
        locality and utilizes __cacheline_group* macros on the new layouts
      - ice: introduces Synchronous Ethernet (SyncE) support
    - Meta (fbnic):
      - adds debugfs for firmware mailbox and tx/rx rings vectors
 
  - Ethernet virtual:
    - geneve: introduce GRO/GSO support for double UDP encapsulation
 
  - Ethernet NICs consumer, and embedded:
    - Synopsys (stmmac):
      - some code refactoring and cleanups
    - RealTek (r8169):
      - add support for RTL8127ATF (10G Fiber SFP)
      - add dash and LTR support
    - Airoha:
      - AN8811HB 2.5 Gbps phy support
    - Freescale (fec):
      - add XDP zero-copy support
    - Thunderbolt:
      - add get link setting support to allow bonding
    - Renesas:
      - add support for RZ/G3L GBETH SoC
 
  - Ethernet switches:
    - Maxlinear:
      - support R(G)MII slow rate configuration
      - add support for Intel GSW150
    - Motorcomm (yt921x):
      - add DCB/QoS support
    - TI:
      - icssm-prueth: support bridging (STP/RSTP) via the switchdev
        framework
 
  - Ethernet PHYs:
    - Realtek:
      - enable SGMII and 2500Base-X in-band auto-negotiation
      - simplify and reunify C22/C45 drivers
    - Micrel: convert bindings to DT schema
 
  - CAN:
    - move skb headroom content into skb extensions, making CAN metadata
      access more robust
 
  - CAN drivers:
    - rcar_canfd:
      - add support for FD-only mode
      - add support for the RZ/T2H SoC
    - sja1000: cleanup the CAN state handling
 
  - WiFi:
    - implement EPPKE/802.1X over auth frames support
    - split up drop reasons better, removing generic RX_DROP
    - additional FTM capabilities: 6 GHz support, supported number of
      spatial streams and supported number of LTF repetitions
    - better mac80211 iterators to enumerate resources
    - initial UHR (Wi-Fi 8) support for cfg80211/mac80211
 
  - WiFi drivers:
    - Qualcomm/Atheros:
      - ath11k: support for Channel Frequency Response measurement
      - ath12k: a significant driver refactor to support
        multi-wiphy devices and and pave the way for future device support
        in the same driver (rather than splitting to ath13k)
      - ath12k: support for the QCC2072 chipset
    - Intel:
      - iwlwifi: partial Neighbor Awareness Networking (NAN) support
      - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
    - RealTek (rtw89):
      - preparations for RTL8922DE support
 
  - Bluetooth:
    - implement setsockopt(BT_PHY) to set the connection packet type/PHY
    - set link_policy on incoming ACL connections
 
  - Bluetooth drivers:
    - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
    - btqca: add WCN6855 firmware priority selection feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmmMum4SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkDnMP/3bpHAGj+gylTid3Xsj0TjJ8AkPsQs+W
 uvSMiCB1TvGCTD9kK36Vr+qPoIgJY10UxYMMjt5Gs0A9TvGDDfYnUOVoUIkfkWCH
 grqSdp6dVkyaJVfyLEcuOVQQG2HwEnhC4c3ZOhOxaKNAnsLCP142lYsMR9ktGRuA
 4vDGtz1+y7t8qBk/lyfXDM71KRrtq0HWJZIhmhz8QXTBsgPDfSejbTPNxXQOJoeO
 sKeArsHr/Cmvf89ZtLZ63vbfr4BKDm4PeXqPYR3PrQs2Yu6I1EK4lehygTY2yE2O
 I3MEPlvpa/tiVLxqXNNwEFbYIkMPY6FXS9x05hTxNZM65A6aB3vvdkqPVnVmAlXE
 f+4PYg9paI13lbzZOeQbGfZ5HgPpzQvnginaaX6s9Fp12K3Ll1FkwWdUznFWhzVn
 5LSrGyecR00CdKJByTIw9JGg/1ptz5a57pa8OQmcKRx3WhQ1XeV5TIJQF4QcPgHw
 ApyjmeGDTQMQMzha1fsaVr+i6BK2zgZvKK9uGDTX90xn2JUw/M75tyOlsTtGlnuM
 sZgj0KVGQlG2wLwBB/+D4S9Oi9YlPG00rkCs0E4jk5C/G4NBmMgpEPQg6azkb57h
 Uiy0paohxfwcZ3qbGA9In091ClGqIwOiCBaq+uXRq1ro88Neo6PWkjz5ItNrsD8t
 Ttgd5AVAQyPT
 =O31Y
 -----END PGP SIGNATURE-----

Merge tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core & protocols:

   - A significant effort all around the stack to guide the compiler to
     make the right choice when inlining code, to avoid unneeded calls
     for small helper and stack canary overhead in the fast-path.

     This generates better and faster code with very small or no text
     size increases, as in many cases the call generated more code than
     the actual inlined helper.

   - Extend AccECN implementation so that is now functionally complete,
     also allow the user-space enabling it on a per network namespace
     basis.

   - Add support for memory providers with large (above 4K) rx buffer.
     Paired with hw-gro, larger rx buffer sizes reduce the number of
     buffers traversing the stack, dincreasing single stream CPU usage
     by up to ~30%.

   - Do not add HBH header to Big TCP GSO packets. This simplifies the
     RX path, the TX path and the NIC drivers, and is possible because
     user-space taps can now interpret correctly such packets without
     the HBH hint.

   - Allow IPv6 routes to be configured with a gateway address that is
     resolved out of a different interface than the one specified,
     aligning IPv6 to IPv4 behavior.

   - Multi-queue aware sch_cake. This makes it possible to scale the
     rate shaper of sch_cake across multiple CPUs, while still enforcing
     a single global rate on the interface.

   - Add support for the nbcon (new buffer console) infrastructure to
     netconsole, enabling lock-free, priority-based console operations
     that are safer in crash scenarios.

   - Improve the TCP ipv6 output path to cache the flow information,
     saving cpu cycles, reducing cache line misses and stack use.

   - Improve netfilter packet tracker to resolve clashes for most
     protocols, avoiding unneeded drops on rare occasions.

   - Add IP6IP6 tunneling acceleration to the flowtable infrastructure.

   - Reduce tcp socket size by one cache line.

   - Notify neighbour changes atomically, avoiding inconsistencies
     between the notification sequence and the actual states sequence.

   - Add vsock namespace support, allowing complete isolation of vsocks
     across different network namespaces.

   - Improve xsk generic performances with cache-alignment-oriented
     optimizations.

   - Support netconsole automatic target recovery, allowing netconsole
     to reestablish targets when underlying low-level interface comes
     back online.

  Driver API:

   - Support for switching the working mode (automatic vs manual) of a
     DPLL device via netlink.

   - Introduce PHY ports representation to expose multiple front-facing
     media ports over a single MAC.

   - Introduce "rx-polarity" and "tx-polarity" device tree properties,
     to generalize polarity inversion requirements for differential
     signaling.

   - Add helper to create, prepare and enable managed clocks.

  Device drivers:

   - Add Huawei hinic3 PF etherner driver.

   - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet
     controller.

   - Add ethernet driver for MaxLinear MxL862xx switches

   - Remove parallel-port Ethernet driver.

   - Convert existing driver timestamp configuration reporting to
     hwtstamp_get and remove legacy ioctl().

   - Convert existing drivers to .get_rx_ring_count(), simplifing the RX
     ring count retrieval. Also remove the legacy fallback path.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt, bng):
         - bnxt: add FW interface update to support FEC stats histogram
           and NVRAM defragmentation
         - bng: add TSO and H/W GRO support
      - nVidia/Mellanox (mlx5):
         - improve latency of channel restart operations, reducing the
           used H/W resources
         - add TSO support for UDP over GRE over VLAN
         - add flow counters support for hardware steering (HWS) rules
         - use a static memory area to store headers for H/W GRO,
           leading to 12% RX tput improvement
      - Intel (100G, ice, idpf):
         - ice: reorganizes layout of Tx and Rx rings for cacheline
           locality and utilizes __cacheline_group* macros on the new
           layouts
         - ice: introduces Synchronous Ethernet (SyncE) support
      - Meta (fbnic):
         - adds debugfs for firmware mailbox and tx/rx rings vectors

   - Ethernet virtual:
      - geneve: introduce GRO/GSO support for double UDP encapsulation

   - Ethernet NICs consumer, and embedded:
      - Synopsys (stmmac):
         - some code refactoring and cleanups
      - RealTek (r8169):
         - add support for RTL8127ATF (10G Fiber SFP)
         - add dash and LTR support
      - Airoha:
         - AN8811HB 2.5 Gbps phy support
      - Freescale (fec):
         - add XDP zero-copy support
      - Thunderbolt:
         - add get link setting support to allow bonding
      - Renesas:
         - add support for RZ/G3L GBETH SoC

   - Ethernet switches:
      - Maxlinear:
         - support R(G)MII slow rate configuration
         - add support for Intel GSW150
      - Motorcomm (yt921x):
         - add DCB/QoS support
      - TI:
         - icssm-prueth: support bridging (STP/RSTP) via the switchdev
           framework

   - Ethernet PHYs:
      - Realtek:
         - enable SGMII and 2500Base-X in-band auto-negotiation
         - simplify and reunify C22/C45 drivers
      - Micrel: convert bindings to DT schema

   - CAN:
      - move skb headroom content into skb extensions, making CAN
        metadata access more robust

   - CAN drivers:
      - rcar_canfd:
         - add support for FD-only mode
         - add support for the RZ/T2H SoC
      - sja1000: cleanup the CAN state handling

   - WiFi:
      - implement EPPKE/802.1X over auth frames support
      - split up drop reasons better, removing generic RX_DROP
      - additional FTM capabilities: 6 GHz support, supported number of
        spatial streams and supported number of LTF repetitions
      - better mac80211 iterators to enumerate resources
      - initial UHR (Wi-Fi 8) support for cfg80211/mac80211

   - WiFi drivers:
      - Qualcomm/Atheros:
         - ath11k: support for Channel Frequency Response measurement
         - ath12k: a significant driver refactor to support multi-wiphy
           devices and and pave the way for future device support in the
           same driver (rather than splitting to ath13k)
         - ath12k: support for the QCC2072 chipset
      - Intel:
         - iwlwifi: partial Neighbor Awareness Networking (NAN) support
         - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn
      - RealTek (rtw89):
         - preparations for RTL8922DE support

   - Bluetooth:
      - implement setsockopt(BT_PHY) to set the connection packet type/PHY
      - set link_policy on incoming ACL connections

   - Bluetooth drivers:
      - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE
      - btqca: add WCN6855 firmware priority selection feature"

* tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits)
  bnge/bng_re: Add a new HSI
  net: macb: Fix tx/rx malfunction after phy link down and up
  af_unix: Fix memleak of newsk in unix_stream_connect().
  net: ti: icssg-prueth: Add optional dependency on HSR
  net: dsa: add basic initial driver for MxL862xx switches
  net: mdio: add unlocked mdiodev C45 bus accessors
  net: dsa: add tag format for MxL862xx switches
  dt-bindings: net: dsa: add MaxLinear MxL862xx
  selftests: drivers: net: hw: Modify toeplitz.c to poll for packets
  octeontx2-pf: Unregister devlink on probe failure
  net: renesas: rswitch: fix forwarding offload statemachine
  ionic: Rate limit unknown xcvr type messages
  tcp: inet6_csk_xmit() optimization
  tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock()
  tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect()
  ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6
  ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update()
  ipv6: use np->final in inet6_sk_rebuild_header()
  ipv6: add daddr/final storage in struct ipv6_pinfo
  net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup()
  ...
2026-02-11 19:31:52 -08:00
Linus Torvalds
1c2b4a4c2b pci-v7.0-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmmKJO4UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzotA/+OGSOPOs9hWd+OwNF5Dm2WA81yG/3
 K3Jx5uMuPoSjduMbPhVcib02Mr6YDJTa6WlYNVa76ADs2G6HxcVMFHutlYudSVcl
 umSF48FnyeH1LTba88dRoVj4DB47Cue+BfhYY2L0ZtxmjQq/NRuDFAaGBh54uNeF
 Gcdgr52QlM01n1X6yKvl7vE9gPdcPH80L256ssHAm6oSOHI1SPc6gqEKUUD02f8G
 FtzfTUAq/cWYjlY3VoS5GKtdHxFYuXqC5WfbURhJ11o/nVJY9k1Zx8n4eI1tmAtN
 7q692xjWSQJZlzepOBBEyjFUpIiy80tZ43z2ptRRBeI/n/qMmGPAov/g4MzegBWG
 IAEHTAp/xx1Wra1ynr7RNvYVcPpXm2TEim8gIGah9DkHbNgbu7ing+OO7DnQuyfD
 2h4hGD2622o6uikqkwzVd4mYuIcFu7SA6yROZhFn83BRnz0QOQienDrDlvOB8XCV
 EodLAOMc2KClvOmmriFMy11PH7MFFoXexV6KS83VfDJHi4+XzBsy0w6TXTohcA9s
 JTPIkSWqf/u6SrdLjXlFGyyJ2/KCgRiXFIBhhtYBMhDuuO7nG+mcSVzMa1PT0s6C
 PF+QoT7sJof/5VMJ4o3BgPrPkD3CQICrlt8XIt5I8ngsy6RZRQ5rt+pUix7Shcn8
 DgcunuINYfQtkfw=
 =LIjp
 -----END PGP SIGNATURE-----

Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Don't try to enable Extended Tags on VFs since that bit is Reserved
     and causes misleading log messages (Håkon Bugge)

   - Initialize Endpoint Read Completion Boundary to match Root Port,
     regardless of ACPI _HPX (Håkon Bugge)

   - Apply _HPX PCIe Setting Record only to AER configuration, and only
     when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
     Tag and Relaxed Ordering settings (Håkon Bugge)

  Resource management:

   - Move CardBus code to setup-cardbus.c and only build it when
     CONFIG_CARDBUS is set (Ilpo Järvinen)

   - Fix bridge window alignment with optional resources, where
     additional alignment requirement was previously lost (Ilpo
     Järvinen)

   - Stop over-estimating bridge window size since they are now assigned
     without any gaps between them (Ilpo Järvinen)

   - Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
     for nested bridges and endpoints (Ilpo Järvinen)

   - Add pbus_mem_size_optional() to handle sizes of optional resources
     (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)

   - Don't claim disabled bridge windows to avoid spurious claim
     failures (Ilpo Järvinen)

  Driver binding:

   - Fix device reference leak in pcie_port_remove_service() (Uwe
     Kleine-König)

   - Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
     portdrv.c (Uwe Kleine-König)

   - Convert portdrv to use pcie_port_bus_type.probe() and .remove()
     callbacks so .probe() and .remove() can eventually be removed from
     struct device_driver (Uwe Kleine-König)

  Error handling:

   - Clear stale errors on reporting agents upon probe so they don't
     look like recent errors (Lukas Wunner)

   - Add generic RAS tracepoint for hotplug events (Shuai Xue)

   - Add RAS tracepoint for link speed changes (Shuai Xue)

  Power management:

   - Avoid redundant delay on transition from D3hot to D3cold if the
     device was already in D3hot (Brian Norris)

   - Prevent runtime suspend until devices are fully initialized to
     avoid saving incompletely configured device state (Brian Norris)

  Power control:

   - Add power_on/off callbacks with generic signature to pwrseq,
     tc9563, and slot drivers so they can be used by pwrctrl core
     (Manivannan Sadhasivam)

   - Add PCIe M.2 connector support to the slot pwrctrl driver
     (Manivannan Sadhasivam)

   - Switch to pwrctrl interfaces to create, destroy, and power on/off
     devices, calling them from host controller drivers instead of the
     PCI core (Manivannan Sadhasivam)

   - Drop qcom .assert_perst() callbacks since this is now done by the
     controller driver instead of the pwrctrl driver (Manivannan
     Sadhasivam)

  Virtualization:

   - Remove an incorrect unlock in pci_slot_trylock() error handling
     (Jinhui Guo)

   - Lock the bridge device for slot reset (Keith Busch)

   - Enable ACS after IOMMU configuration on OF platforms so ACS is
     enabled an all devices; previously the first device enumerated
     (typically a Root Port) didn't have ACS enabled (Manivannan
     Sadhasivam)

   - Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
     work around hardware erratum; previously ACS SV was only
     temporarily disabled, which worked for enumeration but not after
     reset (Manivannan Sadhasivam)

  Peer-to-peer DMA:

   - Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
     when removing the PCI device (Hou Tao)

   - Remove incorrect p2pmem_alloc_mmap() warning about page refcount
     (Hou Tao)

  Endpoint framework:

   - Add configfs sub-groups synchronously to avoid NULL pointer
     dereference when racing with removal (Liu Song)

   - Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
     functions (Manikanta Maddireddy)

  ASPEED PCIe controller driver:

   - Add ASPEED Root Complex DT binding and driver (Jacky Chou)

  Freescale i.MX6 PCIe controller driver:

   - Add DT binding and driver support for an optional external refclock
     in addition to the refclock from the internal PLL (Richard Zhu)

   - Fix CLKREQ# control so host asserts it during enumeration and
     Endpoints can use it afterwards to exit the L1.2 link state
     (Richard Zhu)

  NVIDIA Tegra PCIe controller driver:

   - Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
     down MSI domains to be built as modules (Aaron Kling)

   - Allow pci-tegra to be built as a module (Aaron Kling)

  NVIDIA Tegra194 PCIe controller driver:

   - Relax Kconfig so tegra194 can be built for platforms beyond
     Tegra194 (Vidya Sagar)

  Qualcomm PCIe controller driver:

   - Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)

   - Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
     IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
     schema (Krzysztof Kozlowski)

   - Add DT binding and driver support for SA8255p Endpoint being
     configured by firmware (Mrinmay Sarkar)

   - Parse PERST# from all PCIe bridge nodes for future platforms that
     will have PERST# in Switch Downstream Ports as well as in Root
     Ports (Manivannan Sadhasivam)

  Renesas RZ/G3S PCIe controller driver:

   - Use pci_generic_config_write() since the writability provided by
     the custom wrapper is unnecessary (Claudiu Beznea)

  SOPHGO PCIe controller driver:

   - Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
     Amaoto)

  Synopsys DesignWare PCIe controller driver:

   - Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
     pointer to the preceding Capability, to allow removal of
     Capabilities that are advertised but not fully implemented (Qiang
     Yu)

   - Remove MSI and MSI-X Capabilities in platforms that can't support
     them, so the PCI core automatically falls back to INTx (Qiang Yu)

   - Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
     for drivers that support this (Shawn Lin)

   - Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
     link is not up to avoid an unnecessary timeout (Manivannan
     Sadhasivam)

   - Revert dw-rockchip, qcom, and DWC core changes that used link-up
     IRQs to trigger enumeration instead of waiting for link to be up
     because the PCI core doesn't allocate bus number space for
     hierarchies that might be attached (Niklas Cassel)

   - Make endpoint iATU entry for MSI permanent instead of programming
     it dynamically, which is slow and racy with respect to other
     concurrent traffic, e.g., eDMA (Koichiro Den)

   - Use iMSI-RX MSI target address when possible to fix endpoints using
     32-bit MSI (Shawn Lin)

   - Allow DWC host controller driver probe to continue if device is not
     found or found but inactive; only fail when there's an error with
     the link (Manivannan Sadhasivam)

   - For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
     are not accessible after PME_Turn_Off, simply wait 10ms instead of
     polling for L2/L3 Ready (Richard Zhu)

   - Use multiple iATU entries to map large bridge windows and DMA
     ranges when necessary instead of failing (Samuel Holland)

   - Add EPC dynamic_inbound_mapping feature bit for Endpoint
     Controllers that can update BAR inbound address translation without
     requiring EPF driver to clear/reset the BAR first, and advertise it
     for DWC-based Endpoints (Koichiro Den)

   - Add EPC subrange_mapping feature bit for Endpoint Controllers that
     can map multiple independent inbound regions in a single BAR,
     implement subrange mapping, advertise it for DWC-based Endpoints,
     and add Endpoint selftests for it (Koichiro Den)

   - Make resizable BARs work for Endpoint multi-PF configurations;
     previously it only worked for PF 0 (Aksh Garg)

   - Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
     and Address Match Mode (Aksh Garg)

   - Set up iATU when ECAM is enabled; previously IO and MEM outbound
     windows weren't programmed, and ECAM-related iATU entries weren't
     restored after suspend/resume, so config accesses failed (Krishna
     Chaitanya Chundru)

  Miscellaneous:

   - Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
     work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"

* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
  PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
  PCI: Disable ACS SV for IDT 0x8090 switch
  PCI: Disable ACS SV for IDT 0x80b5 switch
  PCI: Cache ACS Capabilities register
  PCI: Enable ACS after configuring IOMMU for OF platforms
  PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
  PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
  PCI: Use device_lock_assert() to verify device lock is held
  PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
  PCI: Fix pci_slot_lock () device locking
  PCI: Fix pci_slot_trylock() error handling
  PCI: Mark Nvidia GB10 to avoid bus reset
  PCI: Mark ASM1164 SATA controller to avoid bus reset
  PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
  PCI/PME: Replace RMW of Root Status register with direct write
  PCI/AER: Clear stale errors on reporting agents upon probe
  PCI: Don't claim disabled bridge windows
  PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
  PCI: dwc: Fix missing iATU setup when ECAM is enabled
  PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
  ...
2026-02-11 17:20:38 -08:00
Linus Torvalds
db9571a661 printk changes for 7.0
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmmMSVQbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyvWwQAJA0YB6t/2pjVPCm7V8P
 8uPm8e6MErF+XoBpihPdI0540SJ5LQ9XGFdS9iP4udmBeZ0PZEfSzRbZe5jEMD3q
 qYa1zXOctGGVjGIVeNwZ6zMGyQs3ILsN1VcUc5LDot8W1CCKfrbpRlWsMQv9RgYD
 Vls3p9G92333bQLGRR3iOAYAJG6QVOeRyBFqrclSt8Am8DCfCm2DhqEfRn43tqRS
 QU1Ai6G3Lvb/mKXmoPGsAHnuCRjlEvYhPp8iar9kPQOJxxAbBrEOLvf4NVrn9hVc
 vtrgn0ohzMWOUhmnp8+uOaJDOwUsZeoKG5wmA4FxUS4c/d+gKIjMnKbn8SvrVaIj
 XTlHpewrTa9s+UDJtf67dzc+cmnTRSl1LItMAV7Q2iHCs+9rAewSkA+dEkG828jk
 4esNhlWocZt99JFJv4mFe1r/a8o/czGpmdieJ/hXJXL4k9oZqCgQu3tTcm8vkAvL
 /oQevZxTGQYEwb28kTum4fdj6FPdkOQlqXnxkomDfzfdj/m48RSroLiuTYWab4Gn
 AxM8eJuRbM6dQ2cdCV65yxnywiTwhjutdFMFkWkNWEw2kBhZPjyBFmskUTj04HP1
 1u39894IloU52fncP+dLJZc+iKqnDlk0VzXQiVoLQonfN2av9WS2aGiofoYJmOSJ
 dmz2GsQu1wv0OJ2L9sg7qKkf
 =2Iob
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Check all mandatory callbacks when registering nbcon consoles

 - Fix some compiler warnings

* tag 'printk-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  vsnprintf: drop __printf() attributes on binary printing functions
  printf: convert test_hashed into macro
  printk: nbcon: Check for device_{lock,unlock} callbacks
2026-02-11 14:36:47 -08:00
Linus Torvalds
41f1a08645 Kbuild/Kconfig updates for 7.0
Kbuild changes
 ==============
 
 * Drop '*_probe' pattern from modpost section check allowlist, which hid
   legitimate warnings (Johan Hovold)
 
 * Disable -Wtype-limits altogether, instead of enabling at W=2 (Vincent
   Mailhol)
 
 * Improve UAPI testing to skip testing headers that require a libc when
   CONFIG_CC_CAN_LINK is not set, opening up testing of headers with no
   libc dependencies to more environments (Thomas Weißschuh)
 
 * Update gendwarfksyms documentation with required dependencies (Jihan
   LIN)
 
 * Reject invalid LLVM= values to avoid unintentionally falling back to
   system toolchain (Thomas Weißschuh)
 
 * Add a script to help run the kernel build process in a container for
   consistent environments and testing (Guillaume Tucker)
 
 * Simplify kallsyms by getting rid of the relative base (Ard Biesheuvel)
 
 * Performance and usability improvements to scripts/make_fit.py (Simon
   Glass)
 
 * Minor various clean ups and fixes
 
 Kconfig changes
 ===============
 
 * Move XPM icons to individual files, clearing up GTK deprecation
   warnings (Rostislav Krasny)
 
 * Support
 
     depends on FOO if BAR
 
   as syntactic sugar for
 
     depends on FOO || !BAR' (Nicolas Pitre, Graham Roff)
 
 * Refactor merge_config.sh to use awk over shell/sed/grep, dramatically
   speeding up processing large number of config fragments (Anders
   Roxell, Mikko Rapeli)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaYpgQwAKCRAdayaRccAa
 liOGAQCqMI42YMLqljFcPu3B/3f43xhDBCXAhquPBIMhbgt+aAEAmmo3uMLHKSRV
 XZDKkq13HMMV3Zlmrn5Xk/tzk+hkwwk=
 =WYl4
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild/Kconfig updates from Nathan Chancellor:
 "Kbuild:

   - Drop '*_probe' pattern from modpost section check allowlist, which
     hid legitimate warnings (Johan Hovold)

   - Disable -Wtype-limits altogether, instead of enabling at W=2
     (Vincent Mailhol)

   - Improve UAPI testing to skip testing headers that require a libc
     when CONFIG_CC_CAN_LINK is not set, opening up testing of headers
     with no libc dependencies to more environments (Thomas Weißschuh)

   - Update gendwarfksyms documentation with required dependencies
     (Jihan LIN)

   - Reject invalid LLVM= values to avoid unintentionally falling back
     to system toolchain (Thomas Weißschuh)

   - Add a script to help run the kernel build process in a container
     for consistent environments and testing (Guillaume Tucker)

   - Simplify kallsyms by getting rid of the relative base (Ard
     Biesheuvel)

   - Performance and usability improvements to scripts/make_fit.py
     (Simon Glass)

   - Minor various clean ups and fixes

  Kconfig:

   - Move XPM icons to individual files, clearing up GTK deprecation
     warnings (Rostislav Krasny)

   - Support

        depends on FOO if BAR

     as syntactic sugar for

        depends on FOO || !BAR

     (Nicolas Pitre, Graham Roff)

   - Refactor merge_config.sh to use awk over shell/sed/grep,
     dramatically speeding up processing large number of config
     fragments (Anders Roxell, Mikko Rapeli)"

* tag 'kbuild-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (39 commits)
  kbuild: remove dependency of run-command on config
  scripts/make_fit: Compress dtbs in parallel
  scripts/make_fit: Support a few more parallel compressors
  kbuild: Support a FIT_EXTRA_ARGS environment variable
  scripts/make_fit: Move dtb processing into a function
  scripts/make_fit: Support an initial ramdisk
  scripts/make_fit: Speed up operation
  rust: kconfig: Don't require RUST_IS_AVAILABLE for rustc-option
  MAINTAINERS: Add scripts/install.sh into Kbuild entry
  modpost: Amend ppc64 save/restfpr symnames for -Os build
  MIPS: tools: relocs: Ship a definition of R_MIPS_PC32
  streamline_config.pl: remove superfluous exclamation mark
  kbuild: dummy-tools: Add python3
  scripts: kconfig: merge_config.sh: warn on duplicate input files
  scripts: kconfig: merge_config.sh: use awk in checks too
  scripts: kconfig: merge_config.sh: refactor from shell/sed/grep to awk
  kallsyms: Get rid of kallsyms relative base
  mips: Add support for PC32 relocations in vmlinux
  Documentation: dev-tools: add container.rst page
  scripts: add tool to run containerized builds
  ...
2026-02-11 13:40:35 -08:00
Linus Torvalds
38ef046544 sched_ext: Changes for v6.20
- Move C example schedulers back from the external scx repo to
   tools/sched_ext as the authoritative source. scx_userland and scx_pair
   are returning while scx_sdt (BPF arena-based task data management) is
   new. These schedulers will be dropped from the external repo.
 
 - Improve error reporting by adding scx_bpf_error() calls when DSQ
   creation fails across all in-tree schedulers.
 
 - Avoid redundant irq_work_queue() calls in destroy_dsq() by only
   queueing when llist_add() indicates an empty list.
 
 - Fix flaky init_enable_count selftest by properly synchronizing
   pre-forked children using a pipe instead of sleep().
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaYo1pQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGa5VAP9udiGksQ4bBFQrUD+0yhuvjsSuXzssfdfxHgT6
 Hj66wgEAjgbnSyxfcGrB+w7DWUxNLaZlXepibVMPcfvAaSieSgU=
 =UvLY
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Move C example schedulers back from the external scx repo to
   tools/sched_ext as the authoritative source. scx_userland and
   scx_pair are returning while scx_sdt (BPF arena-based task data
   management) is new. These schedulers will be dropped from the
   external repo.

 - Improve error reporting by adding scx_bpf_error() calls when DSQ
   creation fails across all in-tree schedulers

 - Avoid redundant irq_work_queue() calls in destroy_dsq() by only
   queueing when llist_add() indicates an empty list

 - Fix flaky init_enable_count selftest by properly synchronizing
   pre-forked children using a pipe instead of sleep()

* tag 'sched_ext-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  selftests/sched_ext: Fix init_enable_count flakiness
  tools/sched_ext: Fix data header access during free in scx_sdt
  tools/sched_ext: Add error logging for dsq creation failures in remaining schedulers
  tools/sched_ext: add arena based scheduler
  tools/sched_ext: add scx_pair scheduler
  tools/sched_ext: add scx_userland scheduler
  sched_ext: Add error logging for dsq creation failures
  sched_ext: Avoid multiple irq_work_queue() calls in destroy_dsq()
2026-02-11 13:35:24 -08:00
Linus Torvalds
ff661eeee2 cgroup: Changes for v6.20
- cpuset changes:
 
   - Continue separating v1 and v2 implementations by moving more
     v1-specific logic into cpuset-v1.c.
 
   - Improve partition handling. Sibling partitions are no longer
     invalidated on cpuset.cpus conflict, cpuset.cpus changes no longer
     fail in v2, and effective_xcpus computation is made consistent.
 
   - Fix partition effective CPUs overlap that caused a warning on cpuset
     removal when sibling partitions shared CPUs.
 
 - Increase the maximum cgroup subsystem count from 16 to 32 to
   accommodate future subsystem additions.
 
 - Misc cleanups and selftest improvements including switching to
   css_is_online() helper, removing dead code and stale documentation
   references, using lockdep_assert_cpuset_lock_held() consistently,
   and adding polling helpers for asynchronously updated cgroup
   statistics.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaYozIw4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGZQKAQD51KJQz4M79wf2yBhIBLOnM4aakMalhSwZNL4O
 JiGutwD+Ir33VzNX8aXBuDin9p4wI15O54PhqSenJbelKRQ3Dws=
 =gR7L
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cpuset changes:

    - Continue separating v1 and v2 implementations by moving more
      v1-specific logic into cpuset-v1.c

    - Improve partition handling. Sibling partitions are no longer
      invalidated on cpuset.cpus conflict, cpuset.cpus changes no longer
      fail in v2, and effective_xcpus computation is made consistent

    - Fix partition effective CPUs overlap that caused a warning on
      cpuset removal when sibling partitions shared CPUs

 - Increase the maximum cgroup subsystem count from 16 to 32 to
   accommodate future subsystem additions

 - Misc cleanups and selftest improvements including switching to
   css_is_online() helper, removing dead code and stale documentation
   references, using lockdep_assert_cpuset_lock_held() consistently, and
   adding polling helpers for asynchronously updated cgroup statistics

* tag 'cgroup-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: fix overlap of partition effective CPUs
  cgroup: increase maximum subsystem count from 16 to 32
  cgroup: Remove stale cpu.rt.max reference from documentation
  cpuset: replace direct lockdep_assert_held() with lockdep_assert_cpuset_lock_held()
  cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change()
  cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict
  cgroup/cpuset: Don't fail cpuset.cpus change in v2
  cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier()
  cgroup/cpuset: Streamline rm_siblings_excl_cpus()
  cpuset: remove dead code in cpuset-v1.c
  cpuset: remove v1-specific code from generate_sched_domains
  cpuset: separate generate_sched_domains for v1 and v2
  cpuset: move update_domain_attr_tree to cpuset_v1.c
  cpuset: add cpuset1_init helper for v1 initialization
  cpuset: add cpuset1_online_css helper for v1-specific operations
  cpuset: add lockdep_assert_cpuset_lock_held helper
  cpuset: Remove unnecessary checks in rebuild_sched_domains_locked
  cgroup: switch to css_is_online() helper
  selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants
  selftests: cgroup: make test_memcg_sock robust against delayed sock stats
  ...
2026-02-11 13:20:50 -08:00
Linus Torvalds
9bdc64892d workqueue: Changes for v6.20
- Rework the rescuer to process work items one-by-one instead of
   slurping all pending work items in a single pass. As there is only
   one rescuer per workqueue, a single long-blocking work item could
   cause high latency for all tasks queued behind it, even after memory
   pressure is relieved and regular kworkers become available to service
   them.
 
 - Add CONFIG_BOOTPARAM_WQ_STALL_PANIC build-time option and
   workqueue.panic_on_stall_time parameter for time-based stall panic,
   giving systems more control over workqueue stall handling.
 
 - Replace BUG_ON() with panic() in the stall panic path for clearer
   intent and more informative output.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaYov/A4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGWXnAQCfyELl+evz3RdFhyTiVCM1TiOnC1TsBjgkm3SJ
 orMhwgEAkgg40jino34wgeZRfdIAThxQ1O6bsvTpooWKjlCYcQY=
 =zZCF
 -----END PGP SIGNATURE-----

Merge tag 'wq-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue updates from Tejun Heo:

 - Rework the rescuer to process work items one-by-one instead of
   slurping all pending work items in a single pass.

   As there is only one rescuer per workqueue, a single long-blocking
   work item could cause high latency for all tasks queued behind it,
   even after memory pressure is relieved and regular kworkers become
   available to service them.

 - Add CONFIG_BOOTPARAM_WQ_STALL_PANIC build-time option and
   workqueue.panic_on_stall_time parameter for time-based stall panic,
   giving systems more control over workqueue stall handling.

 - Replace BUG_ON() with panic() in the stall panic path for clearer
   intent and more informative output.

* tag 'wq-for-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: replace BUG_ON with panic in panic_on_wq_watchdog
  workqueue: add time-based panic for stalls
  workqueue: add CONFIG_BOOTPARAM_WQ_STALL_PANIC option
  workqueue: Process extra works in rescuer on memory pressure
  workqueue: Process rescuer work items one-by-one using a cursor
  workqueue: Make send_mayday() take a PWQ argument directly
2026-02-11 13:13:32 -08:00
Thomas Gleixner
1e83ccd592 sched/mmcid: Don't assume CID is CPU owned on mode switch
Shinichiro reported a KASAN UAF, which is actually an out of bounds access
in the MMCID management code.

   CPU0						CPU1
   						T1 runs in userspace
   T0: fork(T4) -> Switch to per CPU CID mode
         fixup() set MM_CID_TRANSIT on T1/CPU1
   T4 exit()
   T3 exit()
   T2 exit()
						T1 exit() switch to per task mode
						 ---> Out of bounds access.

As T1 has not scheduled after T0 set the TRANSIT bit, it exits with the
TRANSIT bit set. sched_mm_cid_remove_user() clears the TRANSIT bit in
the task and drops the CID, but it does not touch the per CPU storage.
That's functionally correct because a CID is only owned by the CPU when
the ONCPU bit is set, which is mutually exclusive with the TRANSIT flag.

Now sched_mm_cid_exit() assumes that the CID is CPU owned because the
prior mode was per CPU. It invokes mm_drop_cid_on_cpu() which clears the
not set ONCPU bit and then invokes clear_bit() with an insanely large
bit number because TRANSIT is set (bit 29).

Prevent that by actually validating that the CID is CPU owned in
mm_drop_cid_on_cpu().

Fixes: 007d84287c ("sched/mmcid: Drop per CPU CID immediately when switching to per task mode")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/aYsZrixn9b6s_2zL@shinmob
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-11 12:59:56 -08:00
Petr Mladek
9abbecf408 Merge branch 'for-6.20' into for-linus 2026-02-11 10:14:35 +01:00
Linus Torvalds
192c015940 powerpc updates for 7.0
- Implement masked user access
  - Add support for internal only per-CPU instructions and inline the bpf_get_smp_processor_id() and bpf_get_current_task()
  - Fix pSeries MSI-X allocation failure when quota is exceeded
  - Fix recursive pci_lock_rescan_remove locking in EEH event handling
  - Support tailcalls with subprogs & BPF exceptions on 64bit
  - Extend "trusted" keys to support the PowerVM Key Wrapping Module (PKWM)
 
 Thanks to: Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li, Jarkko
 Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam Cao, Narayana
 Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket Kumar Bhaskar, Sourabh
 Jain, Srish Srinivasan, Venkat Rao Bagalkote,
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmmL7S0ACgkQpnEsdPSH
 ZJR2xA/9G+tZp9+2TidbLSyPT5o063uC5H5+j5QcvvHWi/ImGUNtixlDm5HcZLCR
 yKywE1piKYBU8HoISjzAt0+JCVd3ZjJ8chTpKgCHLXPRSBTgdR1MG+SXQysDJSWb
 yA4pwDikmwoLlfi+pf500F0nX2DRCRdU2Yi28ZFeaF/khJ7ebwj41QJ7LjN22+Q1
 G8Kq543obZluzSoVvfG4xUK4ByWER+Zdd2F6699iMP68yw5PJ8PPc0SUGt8nuD4i
 FUs0Lw7XV7i/K3+zm/ZgH5+Cvn7wOIcMNkXgFlxJXkFit97KXUDijifYPoXQyKLL
 ksD7SPFdV0++Sc+3mWcgW4j+hQZC0Pn864unmh8C6ug3SagQ+3pE1JYWKwCmoyXd
 49ROH0y+npArJ4NAc79eweunhafGcRYTSG+zV7swQvpRocMujEqa4CDz4uk1ll5W
 1yAac08AN6PnfcXl2VMrcDboziTlQVFcnNQbK/ieYMC7KpgA+udw1hd2rOWNZCPd
 u0byXxR1ak5YaAEuyMztd/39hrExx8306Jtkh5FIRZKWGAO+3np5bi3vxk11rDni
 c9BGh2JIMtuPUGys3wcFPGMRTKwF2bDFW/pB+5hMHeLUdlkni9WGCX8eLe2klYsy
 T7fBVb4d99IVrJGYv3J1lwELgjrgXvv35XOaUiyJyZhcbng15cc=
 =zJoL
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates for 7.0

 - Implement masked user access

 - Add bpf support for internal only per-CPU instructions and inline the
   bpf_get_smp_processor_id() and bpf_get_current_task() functions

 - Fix pSeries MSI-X allocation failure when quota is exceeded

 - Fix recursive pci_lock_rescan_remove locking in EEH event handling

 - Support tailcalls with subprogs & BPF exceptions on 64bit

 - Extend "trusted" keys to support the PowerVM Key Wrapping Module
   (PKWM)

Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li,
Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam
Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket
Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote.

* tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits)
  powerpc/pseries: plpks: export plpks_wrapping_is_supported
  docs: trusted-encryped: add PKWM as a new trust source
  keys/trusted_keys: establish PKWM as a trusted source
  pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
  pseries/plpks: expose PowerVM wrapping features via the sysfs
  powerpc/pseries: move the PLPKS config inside its own sysfs directory
  pseries/plpks: fix kernel-doc comment inconsistencies
  powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
  powerpc: kgdb: Remove OUTBUFMAX constant
  powerpc64/bpf: Additional NVR handling for bpf_throw
  powerpc64/bpf: Support exceptions
  powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT
  powerpc64/bpf: Avoid tailcall restore from trampoline
  powerpc64/bpf: Support tailcalls with subprogs
  powerpc64/bpf: Moving tail_call_cnt to bottom of frame
  powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
  powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded
  powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
  powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
  powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
  ...
2026-02-10 21:46:12 -08:00
Breno Leitao
60325c27d3 printk: Add execution context (task name/CPU) to printk_info
Extend struct printk_info to include the task name, pid, and CPU
number where printk messages originate. This information is captured
at vprintk_store() time and propagated through printk_message to
nbcon_write_context, making it available to nbcon console drivers.

This is useful for consoles like netconsole that want to include
execution context in their output, allowing correlation of messages
with specific tasks and CPUs regardless of where the console driver
actually runs.

The feature is controlled by CONFIG_PRINTK_EXECUTION_CTX, which is
automatically selected by CONFIG_NETCONSOLE_DYNAMIC. When disabled,
the helper functions compile to no-ops with no overhead.

Suggested-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20260206-nbcon-v7-1-62bda69b1b41@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10 19:51:56 -08:00
Linus Torvalds
57cb845067 - A nice cleanup to the paravirt code containing a unification of the paravirt
clock interface, taming the include hell by splitting the pv_ops structure
   and removing of a bunch of obsolete code. Work by Juergen Gross.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmLKHAACgkQEsHwGGHe
 VUrURg//Ucf+3EAIkLCmFkH0WwYmQl2JjRYww8bPAw3iJMIVxy4dMnaBbsUiAtUp
 kYza+pgEtvyAwwd8RIEs85c9VhZn0DKoaWV8goBH3zFH6YvIRiLwb0w2QvjkF+70
 FNU+4zlvt/I3FD+tWNElAgVtkFL3Gmzm44qyLLsPtlYaJ71xFl2XB7V+TlqXMHzE
 m8BMenP9/CrbTlBBdNJGzAkAbWi1uAP+IydvuFNolH/F2lqVM2z5Ta3gUWWCIk/q
 jWrPLDZCHr2WlBZNUGamKVVH9NEh+7YNwBAGUrSNYGZFoaFjqeX6lN3djzS+wXIj
 0nDoW35jN0QNKz239MdXZDf1mfpb6ZQd/iOhFjo4dAvbm+J8WPAMr98ac8wR3Dyb
 2LF/BxkoKWRabxQApXSCrLPXEuqT6Qc1+lDA0bNHg51zBoqP5vRNVZRwArnzGB+O
 LxDKx+o4VYOf+UCaB6oQHjylbSgFvIedZ9p822hBe3QG9act8indRE8LWip7Utld
 peoJGgvlQ0xtClh6FjVHpvmVfAvk7Zki5ywj2GwmB/TZ0yywuGStAjE3UqY168/M
 gb7MSajh+HHZNj1/2+b/se4CUYlAgIPDQ+SwHJPm5TqyopvnOVi/2XWmjbx8I5jT
 jS0nxaxD+SbESSZ6IMAsppnAAxAYbvRHGIS+6mtNCXVkaV1pMbA=
 =AeFt
 -----END PGP SIGNATURE-----

Merge tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:

 - A nice cleanup to the paravirt code containing a unification of the
   paravirt clock interface, taming the include hell by splitting the
   pv_ops structure and removing of a bunch of obsolete code (Juergen
   Gross)

* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
  x86/paravirt: Remove trailing semicolons from alternative asm templates
  x86/pvlocks: Move paravirt spinlock functions into own header
  x86/paravirt: Specify pv_ops array in paravirt macros
  x86/paravirt: Allow pv-calls outside paravirt.h
  objtool: Allow multiple pv_ops arrays
  x86/xen: Drop xen_mmu_ops
  x86/xen: Drop xen_cpu_ops
  x86/xen: Drop xen_irq_ops
  x86/paravirt: Move pv_native_*() prototypes to paravirt.c
  x86/paravirt: Introduce new paravirt-base.h header
  x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
  x86/paravirt: Use common code for paravirt_steal_clock()
  riscv/paravirt: Use common code for paravirt_steal_clock()
  loongarch/paravirt: Use common code for paravirt_steal_clock()
  arm64/paravirt: Use common code for paravirt_steal_clock()
  arm/paravirt: Use common code for paravirt_steal_clock()
  sched: Move clock related paravirt code to kernel/sched
  paravirt: Remove asm/paravirt_api_clock.h
  x86/paravirt: Move thunk macros to paravirt_types.h
  ...
2026-02-10 19:01:45 -08:00
Linus Torvalds
f1c538ca81 Updates for the VDSO subsystem:
- Provide the missing 64-bit variant of clock_getres()
 
     This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and
     finally the removal of 32-bit time types from the kernel and UAPI.
 
   - Remove the useless and broken getcpu_cache from the VDSO
 
     The intention was to provide a trivial way to retrieve the CPU number from
     the VDSO, but as the VDSO data is per process there is no way to make it
     work.
 
   - Switch get/put_unaligned() from packed struct to memcpy()
 
     The packed struct violates strict aliasing rules which requires to pass
     -fno-strict-aliasing to the compiler. As this are scalar values
     __builtin_memcpy() turns them into simple loads and stores
 
   - Use __typeof_unqual__() for __unqual_scalar_typeof()
 
     The get/put_unaligned() changes triggered a new sparse warning when __beNN
     types are used with get/put_unaligned() as sparse builds add a special
     'bitwise' attribute to them which prevents sparse to evaluate the Generic
     in __unqual_scalar_typeof().
 
     Newer sparse versions support __typeof_unqual__() which avoids the problem,
     but requires a recent sparse install. So this adds a sanity check to sparse
     builds, which validates that sparse is available and capable of handling it.
 
   - Force inline __cvdso_clock_getres_common()
 
     Compilers sometimes un-inline agressively, which results in function call
     overhead and problems with automatic stack variable initialization.
 
     Interestingly enough the force inlining results in smaller code than the
     un-inlined variant produced by GCC when optimizing for size.
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJ2eAQHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYoXhQD/4mjneVlRBbQ6Nt9LxxhzPlYXvED6u8b4SX
 R//DQ4qqqagh2fSpE57tr3f56HqhUmXptGJSgEvr6tVKoyugsLqP6sN9J9J3o181
 jnPQR0VBP9dihQZ5X93pKo9NZWxLIn0uLD0RsdCbE9NTx4ciw4VIPFYQqkC4Rw6b
 jiRrDR2l8EhV8cmxB6puW5WaQ932M6Awabw9RumzwH3MzIIlbc5Ero51S9eS64LL
 byU5XeWUe295W1Gxze5RHHJWyNQEyx1eUCFfe3LWvfpz7FMzc2AQsKnIJDzW3GiO
 UGu5MGptbLpG+ccvhVEs6/Ls5pWXcoCw4WuDNAunCCOmqda98oDniKf2LwRRbLT0
 nAfLNatMnhXdTPk2zbS45z9uipUQAGKmVAE3/LVqB+ekcutmIGMyqHgR75QX0b4l
 CQPkC9rBsV6gGsScWTnhRydhqioNO/uhhrQv0vEXnKZa0ysTbgZKt3JDZbgUEL2B
 uDxXKyrqjpnqDZKlMMaoLtwd+l+T80ya4/NhHd4ZNGUpTUrHVw2H47lgE7ahCxEk
 /SvXTZSU4Jp8sVQIQ+J6y5z2AQ/xGy++zvNKiZMyP9fQuPqhiDqYkLpmOp61bARx
 wqyVsfGXZYSB1l16AiSC/CyUDcMqqsFohGXQ/Yf0SOiVtXu2WUyfofY1N0IXIGu4
 WOVV9mH1yg==
 =0ogX
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull VDSO updates from Thomas Gleixner:

 - Provide the missing 64-bit variant of clock_getres()

   This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and
   finally the removal of 32-bit time types from the kernel and UAPI.

 - Remove the useless and broken getcpu_cache from the VDSO

   The intention was to provide a trivial way to retrieve the CPU number
   from the VDSO, but as the VDSO data is per process there is no way to
   make it work.

 - Switch get/put_unaligned() from packed struct to memcpy()

   The packed struct violates strict aliasing rules which requires to
   pass -fno-strict-aliasing to the compiler. As this are scalar values
   __builtin_memcpy() turns them into simple loads and stores

 - Use __typeof_unqual__() for __unqual_scalar_typeof()

   The get/put_unaligned() changes triggered a new sparse warning when
   __beNN types are used with get/put_unaligned() as sparse builds add a
   special 'bitwise' attribute to them which prevents sparse to evaluate
   the Generic in __unqual_scalar_typeof().

   Newer sparse versions support __typeof_unqual__() which avoids the
   problem, but requires a recent sparse install. So this adds a sanity
   check to sparse builds, which validates that sparse is available and
   capable of handling it.

 - Force inline __cvdso_clock_getres_common()

   Compilers sometimes un-inline agressively, which results in function
   call overhead and problems with automatic stack variable
   initialization.

   Interestingly enough the force inlining results in smaller code than
   the un-inlined variant produced by GCC when optimizing for size.

* tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vdso/gettimeofday: Force inlining of __cvdso_clock_getres_common()
  x86/percpu: Make CONFIG_USE_X86_SEG_SUPPORT work with sparse
  compiler: Use __typeof_unqual__() for __unqual_scalar_typeof()
  powerpc/vdso: Provide clock_getres_time64()
  tools headers: Remove unneeded ignoring of warnings in unaligned.h
  tools headers: Update the linux/unaligned.h copy with the kernel sources
  vdso: Switch get/put_unaligned() from packed struct to memcpy()
  parisc: Inline a type punning version of get_unaligned_le32()
  vdso: Remove struct getcpu_cache
  MIPS: vdso: Provide getres_time64() for 32-bit ABIs
  arm64: vdso32: Provide clock_getres_time64()
  ARM: VDSO: Provide clock_getres_time64()
  ARM: VDSO: Patch out __vdso_clock_getres() if unavailable
  x86/vdso: Provide clock_getres_time64() for x86-32
  selftests: vDSO: vdso_test_abi: Add test for clock_getres_time64()
  selftests: vDSO: vdso_test_abi: Use UAPI system call numbers
  selftests: vDSO: vdso_config: Add configurations for clock_getres_time64()
  vdso: Add prototype for __vdso_clock_getres_time64()
2026-02-10 17:02:23 -08:00
Linus Torvalds
353a7e8a69 Updates for the core time subsystem:
- Inline timecounter_cyc2time() as that is now used in the networking
     hotpath. Inlining it significantly improves performance.
 
   - Optimize the tick dependency check in case that the tracepoint is disabled,
     which improves the hotpath performance in the tick management code, which
     is a hotpath on transitions in and out of idle.
 
   - The usual cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJ1WUQHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYoabjD/9cc06W408fLGfqEGlriV8BnMJSGiWO/efe
 SJPA4IziIu6DS7aQugdmk1vTqV9PxQWZ+L0tgSe1BNzUY+9x+z+I7X+asY6pjPmp
 4zQSGSuotvFpRsWa4MsEgOjXk0QV/1Lm1Ba0eaBBqfY1wacoxJgjxHd5qjZBycUp
 H1nLAshXbDBsKYdTLqI4KQq9ZbHKTPz47UhJiKwPHEIRSGNzuAdNkD7KOv34GEDU
 ZHlTfWv8dMoH0lp/N/2+NmojYOzVXjH9UeeG3MYyfrcrzxU8Ifg/21zz9KgcyrF/
 rrDnUCdnJUk2z8AltgSyQfqw3awUrTDTnHQDFe9oS4t5lexXd7fkmbjqsZBpP8OD
 QQJd6yyI2j5GsKKI/fcYz/3NEHZR5HxjgT/Dv3qdHh/7L49FIXFI68kcOqdHFGSf
 21CY25cBRr7uPnyAs/G3LtSb/WyBVgTzMaPo798gIqpoDkSIuSlR0x252yaI6kW9
 fMXul0ZYwU5j8xGPt39IW9/Xm0HAMBUdvMusKn8CKBPfYCAx2EtH75NXilCuRgjG
 UJvzEv20W3f3IXgvrVqTznKUh0Airsfo8blNtX/LqhCZraeJIRV/UGUrhL8QpOp+
 Ri8FJ1Dr57oRJDDqM7XVckJuyOGkRvYtvJyPhbyEXiWZpDUqQnrivm+3j7DecHN2
 6+ThtJ6P4A==
 =Zh3T
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:

 - Inline timecounter_cyc2time() as that is now used in the networking
   hotpath. Inlining it significantly improves performance.

 - Optimize the tick dependency check in case that the tracepoint is
   disabled, which improves the hotpath performance in the tick
   management code, which is a hotpath on transitions in and out of
   idle.

 - The usual cleanups and improvements

* tag 'timers-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time/kunit: Document handling of negative years of is_leap()
  tick/nohz: Optimize check_tick_dependency() with early return
  time/sched_clock: Use ACCESS_PRIVATE() to evaluate hrtimer::function
  hrtimer: Drop _tv64() helpers
  hrtimer: Remove public definition of HIGH_RES_NSEC
  hrtimer: Remove unused resolution constants
  time/timecounter: Inline timecounter_cyc2time()
2026-02-10 16:41:59 -08:00
Linus Torvalds
3381d7b2b3 Updates for the [PCI] MSI subsystem:
- Add interrupt redirection infrastructure
 
     Some PCI controllers use a single demultiplexing interrupt for the MSI
     interrupts of subordinate devices.
 
     This prevents setting the interrupt affinity of device interrupts, which
     causes device interrupts to be delivered to a single CPU. That obviously is
     counterproductive for multi-queue devices and interrupt balancing.
 
     To work around this limitation the new infrastructure installs a dummy
     irq_set_affinity() callback which captures the affinity mask and picks a
     redirection target CPU out of the mask.
 
     When the PCI controller demultiplexes the interrupts it invokes a new
     handling function in the core, which either runs the interrupt handler in
     the context of the target CPU or delegates it to irq_work on the target CPU.
 
   - Utilize the interrupt redirection mechanism in the PCI DWC host controller
     driver.
 
     This allows affinity control for the subordinate device MSI interrupts
     instead of being randomly executed on the CPU which runs the demultiplex
     handler.
 
   - Replace the binary 64-bit MSI flag with a DMA mask
 
     Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability, but
     implement less than 64 address bits. This breaks on platforms where such a
     device is assigned an MSI address higher than what's supported.
 
     With the binary 64-bit flag there is no other choice than disabling 64-bit
     MSI support which leaves the device disfunctional.
 
     By using a DMA mask the address limit of a device can be described
     correctly which provides support for the above scenario.
 
   - Make use of the DMA mask based address limit in the hda/intel and radeon
     drivers to enable them on affected platforms.
 
   - The usual small cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJyPsQHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYocekEADAsS5FlUkFuBy6kODhl5J7b9/oqlL3IEnR
 3CdOrFO716dce+Gej+Wp3T93dJ3XsfD7nCZuy99+LwUkTubmaBJXfjY9S+Ket0ID
 Wc3ltiD6f3GEFB14rXN+fFG/u+OOLkaXdpbQpiTnqL4JAti9qF80D4uon28+FC/o
 wc1MhqVBPbOHU9iM196ngkZuXCNVPLcnZN6PNBgIn0sxx06LcK+daY0bNGxfn5Ua
 LY9SD8hN7tYlkDi42nB/ZXMrexqT9cxSqHObmPX+G/QLfXCRBtD+gyVbs+KVzpRL
 hmFERTlUh9tUdcQFrjgiZP/r4N5ilzsu6w5ZpSOEsGuahFUPZWJWFFC1D8rmq/Ay
 X9HKge1jqXJtbCf0pJM/kdbJKSH5S6aLP3iF37y+PqITIEIX8jIT3oVcvL9hI0BW
 HFxpuJfhAVg63kMegZCO/iROTusLHUZr8iwYOM7pEiCE6fP46jPijsPffVIWvrlJ
 2LVOv/A5wy9q8FW8sF9/M6CW7cdeYQF06Ce3qAyMxjZjEyR3KFBJCVWjhqyMxZJP
 3zFl1XXKXgRO+CDrYKVTPIaXR5D76k/l6MnECQpq81CQyQKm2h6A9PyY+n70FfbZ
 BimakUlBGCd92ZbSxzC9pAOiHo0ZoKtc5BhnsRhKVyBCmEKDazEplDuf49/OSZUE
 p2kaf/PuOw==
 =SCSQ
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI updates from Thomas Gleixner:
 "Updates for the [PCI] MSI subsystem:

   - Add interrupt redirection infrastructure

     Some PCI controllers use a single demultiplexing interrupt for the
     MSI interrupts of subordinate devices.

     This prevents setting the interrupt affinity of device interrupts,
     which causes device interrupts to be delivered to a single CPU.
     That obviously is counterproductive for multi-queue devices and
     interrupt balancing.

     To work around this limitation the new infrastructure installs a
     dummy irq_set_affinity() callback which captures the affinity mask
     and picks a redirection target CPU out of the mask.

     When the PCI controller demultiplexes the interrupts it invokes a
     new handling function in the core, which either runs the interrupt
     handler in the context of the target CPU or delegates it to
     irq_work on the target CPU.

   - Utilize the interrupt redirection mechanism in the PCI DWC host
     controller driver.

     This allows affinity control for the subordinate device MSI
     interrupts instead of being randomly executed on the CPU which runs
     the demultiplex handler.

   - Replace the binary 64-bit MSI flag with a DMA mask

     Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability,
     but implement less than 64 address bits. This breaks on platforms
     where such a device is assigned an MSI address higher than what's
     supported.

     With the binary 64-bit flag there is no other choice than disabling
     64-bit MSI support which leaves the device disfunctional.

     By using a DMA mask the address limit of a device can be described
     correctly which provides support for the above scenario.

   - Make use of the DMA mask based address limit in the hda/intel and
     radeon drivers to enable them on affected platforms

   - The usual small cleanups and improvements"

* tag 'irq-msi-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ALSA: hda/intel: Make MSI address limit based on the device DMA limit
  drm/radeon: Make MSI address limit based on the device DMA limit
  PCI/MSI: Check the device specific address mask in msi_verify_entries()
  PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask
  genirq/redirect: Prevent writing MSI message on affinity change
  PCI/MSI: Unmap MSI-X region on error
  genirq: Update effective affinity for redirected interrupts
  PCI: dwc: Enable MSI affinity support
  PCI: dwc: Code cleanup
  genirq: Add interrupt redirection infrastructure
  genirq/msi: Correct kernel-doc in <linux/msi.h>
2026-02-10 16:30:29 -08:00
Linus Torvalds
66bbe4a8ed Updates for the interrupt core subsystem:
- Remove the interrupt timing infrastructure
 
    This was added seven years ago to be used for power management purposes, but
    that integration never happened.
 
  - Clean up the remaining setup_percpu_irq() users
 
    The memory allocator is available when interrupts can be requested so there
    is not need for static irq_action. Move the remaining users to
    request_percpu_irq() and delete the historical cruft.
 
  - Warn when interrupt flag inconsistencies are detected in request*_irq().
 
    Inconsistent flags can lead to hard to diagnose malfunction. The fallout of
    this new warning has been addressed in next and the fixes are coming in via
    the maintainer trees and the tip irq/cleanup pull requests.
 
  - Invoke affinity notifier when CPU hotplug breaks affinity
 
    Otherwise the code using the notifier misses the affinity change and
    operates on stale information.
 
  - The usual cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJwSAQHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYoV1rD/9lxepiLVhIvaaauyr69HkHnLDJyyyIuNQk
 OZ5Rx8JEnmtJIN49DJ/Ps/VjDZA6GYV2RvtKfqCNtpA3AdpHcLCDw2/mZKetHH3v
 0RlTNeu0TKt76Mo+d7vadJZqdcKbBPYEBucaQed4XLGFq9Vb7nuVMFLzkDZe+C5U
 zG8zhtIaPQyY+MUxFGpWJdeKcddnIC/lKRFBmhglmuflQ8ZdrydRlJZzFBxfcnPy
 dcZVnOVZYD4aaW+uryd9eVWg0de5q97C03OobVdhwt077zcmTZpGanRCBM/+7Kop
 UDB/p04Z5FFFKFbd+j0f76gl1l6yQ06K/2Pby8qrLnodsqOEgjEgLs0qbHhkNsVw
 g5W6ry9boLWAA1XhGFOnqZ4Dc/Qb1G7DpnRUzpmAlyL04MTManjocIHNPrGhjlNG
 +jT3iPuraiRUoNZ9K2PBYX9SikGVnKJXFmOwlffkPpOtaJk/oYVUAdMq2W71C4Du
 TZeAbd2RyrSPMYaSh7pnBhM7KBmsYIg+hHU1+RaAi/rFSOT0Se5sUWwmUUUzSfOn
 KvCAzMGX6gIg8rkSmTp7OvLQhXEfJ2kEVrVZACS6MBxS7+PNVBkxYuOqxdhdoVj0
 sZrfB8Qnm4mAcbq7YJtrD6az3+LH5CkrdNyWIAL62leMmqWDoTOgLRMFh2/yEMTl
 BeAOgPO2pw==
 =3hJ2
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq core updates from Thomas Gleixner:
 "Updates for the interrupt core subsystem:

   - Remove the interrupt timing infrastructure

     This was added seven years ago to be used for power management
     purposes, but that integration never happened.

   - Clean up the remaining setup_percpu_irq() users

     The memory allocator is available when interrupts can be requested
     so there is not need for static irq_action. Move the remaining
     users to request_percpu_irq() and delete the historical cruft.

   - Warn when interrupt flag inconsistencies are detected in
     request*_irq().

     Inconsistent flags can lead to hard to diagnose malfunction. The
     fallout of this new warning has been addressed in next and the
     fixes are coming in via the maintainer trees and the tip
     irq/cleanup pull requests.

   - Invoke affinity notifier when CPU hotplug breaks affinity

     Otherwise the code using the notifier misses the affinity change
     and operates on stale information.

   - The usual cleanups and improvements"

* tag 'irq-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/proc: Replace snprintf with strscpy in register_handler_proc
  genirq/cpuhotplug: Notify about affinity changes breaking the affinity mask
  genirq: Move clear of kstat_irqs to free_desc()
  genirq: Warn about using IRQF_ONESHOT without a threaded handler
  irqdomain: Fix up const problem in irq_domain_set_name()
  genirq: Remove setup_percpu_irq()
  clocksource/drivers/mips-gic-timer: Move GIC timer to request_percpu_irq()
  MIPS: Move IP27 timer to request_percpu_irq()
  MIPS: Move IP30 timer to request_percpu_irq()
  genirq: Remove __request_percpu_irq() helper
  genirq: Remove IRQ timing tracking infrastructure
2026-02-10 13:39:37 -08:00
Linus Torvalds
36ae1c45b2 Scheduler changes for v7.0:
Scheduler Kconfig space updates:
 
  - Further consolidate configurable preemption modes: reduce
    the number of architectures that are allowed to offer
    PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number
    of preemption models from four to just two: 'full' and 'lazy'
    on up-to-date architectures (arm64, loongarch, powerpc,
    riscv, s390, x86).
 
    None and voluntary are only available as legacy features
    on platforms that don't implement lazy preemption yet,
    or which don't even support preemption.
 
    The goal is to eventually remove cond_resched() and
    voluntary preemption altogether.
 
    (Peter Zijlstra)
 
 RSEQ based 'scheduler time slice extension' support:
 
 This allows a thread to request a time slice extension when it
 enters a critical section to avoid contention on a resource when
 the thread is scheduled out inside of the critical section.
 
  - Add fields and constants for time slice extension
  - Provide static branch for time slice extensions
  - Add statistics for time slice extensions
  - Add prctl() to enable time slice extensions
  - Implement sys_rseq_slice_yield()
  - Implement syscall entry work for time slice extensions
  - Implement time slice extension enforcement timer
  - Reset slice extension when scheduled
  - Implement rseq_grant_slice_extension()
  - entry: Hook up rseq time slice extension
  - selftests: Implement time slice extension test
 
    (Thomas Gleixner)
 
  - Allow registering RSEQ with slice extension
  - Move slice_ext_nsec to debugfs
  - Lower default slice extension
  - selftests/rseq: Add rseq slice histogram script
 
    (Peter Zijlstra)
 
 Scheduler performance/scalability improvements:
 
  - Update rq->avg_idle when a task is moved to an idle CPU,
    which improves the scalability of various workloads.
    (Shubhang Kaushik)
 
  - Reorder fields in 'struct rq' for better caching
    (Blake Jones)
 
  - Fair scheduler SMP NOHZ balancing code speedups:
 
    - Move checking for nohz cpus after time check
    - Change likelyhood of nohz.nr_cpus
    - Remove nohz.nr_cpus and use weight of cpumask instead
 
      (Shrikanth Hegde)
 
  - Avoid false sharing for sched_clock_irqtime (Wangyang Guo)
 
  - Drop useless cpumask_empty() in find_energy_efficient_cpu()
  - Simplify task_numa_find_cpu()
  - Use cpumask_weight_and() in sched_balance_find_dst_group()
 
    (Yury Norov)
 
 DL scheduler updates:
 
  - Add a deadline server for sched_ext tasks (by Andrea Righi and
    Joel Fernandes, with fixes by Peter Zijlstra)
 
 RT scheduler updates:
 
  - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang)
 
 Entry code updates and performance improvements, which is part of the
 scheduler tree in this cycle due to interdependencies with the RSEQ
 based time slice extension work:
 
   - Remove unused syscall argument from syscall_trace_enter()
   - Rework syscall_exit_to_user_mode_work() for architecture reuse
   - Add arch_ptrace_report_syscall_entry/exit()
   - Inline syscall_exit_work() and syscall_trace_enter()
 
     (Jinjie Ruan)
 
 Scheduler core updates:
 
  - Rework sched_class::wakeup_preempt() and rq_modified_*()
  - Avoid rq->lock bouncing in sched_balance_newidle()
  - Rename rcu_dereference_check_sched_domain() =>
           rcu_dereference_sched_domain()
  - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper
 
    (Peter Zijlstra)
 
 Fair scheduler updates/refactoring:
 
  - Fold the sched_avg update
  - Change rcu_dereference_check_sched_domain() to rcu-sched
  - Switch to rcu_dereference_all()
  - Remove superfluous rcu_read_lock()
  - Limit hrtick work
 
    (Peter Zijlstra)
 
  - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks
  - Clean up comments in 'struct cfs_rq'
  - Separate se->vlag from se->vprot
  - Rename cfs_rq::avg_load to cfs_rq::sum_weight
  - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions
  - Introduce and use the vruntime_cmp() and vruntime_op() wrappers
    for wrapped-signed aritmetics
  - Sort out 'blocked_load*' namespace noise
 
    (Ingo Molnar)
 
 Scheduler debugging code updates:
 
  - Export hidden tracepoints to modules (Gabriele Monaco)
 
  - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()
    (Fushuai Wang)
 
  - Add assertions to QUEUE_CLASS (Peter Zijlstra)
 
  - hrtimer: Fix tracing oddity (Thomas Gleixner)
 
 Misc fixes and cleanups:
 
  - Re-evaluate scheduling when migrating queued tasks out of
    throttled cgroups (Zicheng Qu)
 
  - Remove task_struct->faults_disabled_mapping (Christoph Hellwig)
 
  - Fix math notation errors in avg_vruntime comment (Zhan Xusheng)
 
  - sched/cpufreq: Use %pe format for PTR_ERR() printing (zenghongling)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJj+IRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1grtQ//WyXYGVE/WicdqslfaCY2Mr0uJnL0tLSM
 CJp+0LROdkmy+ChJmftO8RgjCUSsjhC4/xcBhUQXApf/ffQi3b2jH6nkTp/Z64Ms
 p2IXLkBiZjwdcO6fGbB0JE2G1J4hGRC5BlqfgkZzWidMf3kIbmrHg99mVWGzODLY
 N/cPW4d0WGf9TScl1FgEiOqgF3czMLlqvTDJqaFMpsTzSUcRBnrG4xushb4W/bBx
 573eqxgZJ6urNSGu8niY9PAl9F7gskXW3YxI3k8SH7VmJKSevWlwI9vMEhcRDzud
 E0XxD7J8iPOKtr7ypXm7anMBv4jWVUdAnPbYi4TDsyDDU/HguqMqT1McTGn8wQ+F
 jmdhmMC9/TEIzq93SNLbCYieibqDsmJoNVFFi0FWfPLMtYbcZd5a884SIz532vx4
 DegdlDXdazUwhxzDiQR3sq1CsHXpxNS2YdrpadAtF/r2gU86DQjsEew8yBvXi7bb
 Wrkzpax70sU1AFI23wJQkEb/OnnXyehAHAhhQN6GVvuiGr9P7C02WLEGLlmSmJrx
 zl2F750P76yhTfGcvTfJ/5LTfSB+yRozGvcdXnIkyzWotY6a2D1MKNusAfVax+IR
 kyfAWqVdxBhlKnqYbu92lTogvnPh3Lymd6G4TZZRkSH2jixyGd2oS7nZaDBAeBEM
 NHQtr9R+KyU=
 =Xj2f
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Scheduler Kconfig space updates:

   - Further consolidate configurable preemption modes (Peter Zijlstra)

     Reduce the number of architectures that are allowed to offer
     PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of
     preemption models from four to just two: 'full' and 'lazy' on
     up-to-date architectures (arm64, loongarch, powerpc, riscv, s390,
     x86).

     None and voluntary are only available as legacy features on
     platforms that don't implement lazy preemption yet, or which don't
     even support preemption.

     The goal is to eventually remove cond_resched() and voluntary
     preemption altogether.

  RSEQ based 'scheduler time slice extension' support (Thomas Gleixner
  and Peter Zijlstra):

  This allows a thread to request a time slice extension when it enters
  a critical section to avoid contention on a resource when the thread
  is scheduled out inside of the critical section.

   - Add fields and constants for time slice extension
   - Provide static branch for time slice extensions
   - Add statistics for time slice extensions
   - Add prctl() to enable time slice extensions
   - Implement sys_rseq_slice_yield()
   - Implement syscall entry work for time slice extensions
   - Implement time slice extension enforcement timer
   - Reset slice extension when scheduled
   - Implement rseq_grant_slice_extension()
   - entry: Hook up rseq time slice extension
   - selftests: Implement time slice extension test
   - Allow registering RSEQ with slice extension
   - Move slice_ext_nsec to debugfs
   - Lower default slice extension
   - selftests/rseq: Add rseq slice histogram script

  Scheduler performance/scalability improvements:

   - Update rq->avg_idle when a task is moved to an idle CPU, which
     improves the scalability of various workloads (Shubhang Kaushik)

   - Reorder fields in 'struct rq' for better caching (Blake Jones)

   - Fair scheduler SMP NOHZ balancing code speedups (Shrikanth Hegde):
      - Move checking for nohz cpus after time check
      - Change likelyhood of nohz.nr_cpus
      - Remove nohz.nr_cpus and use weight of cpumask instead

   - Avoid false sharing for sched_clock_irqtime (Wangyang Guo)

   - Cleanups (Yury Norov):
      - Drop useless cpumask_empty() in find_energy_efficient_cpu()
      - Simplify task_numa_find_cpu()
      - Use cpumask_weight_and() in sched_balance_find_dst_group()

  DL scheduler updates:

   - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel
     Fernandes, with fixes by Peter Zijlstra)

  RT scheduler updates:

   - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang)

  Entry code updates and performance improvements (Jinjie Ruan)

  This is part of the scheduler tree in this cycle due to inter-
  dependencies with the RSEQ based time slice extension work:

    - Remove unused syscall argument from syscall_trace_enter()
    - Rework syscall_exit_to_user_mode_work() for architecture reuse
    - Add arch_ptrace_report_syscall_entry/exit()
    - Inline syscall_exit_work() and syscall_trace_enter()

  Scheduler core updates (Peter Zijlstra):

   - Rework sched_class::wakeup_preempt() and rq_modified_*()
   - Avoid rq->lock bouncing in sched_balance_newidle()
   - Rename rcu_dereference_check_sched_domain() =>
            rcu_dereference_sched_domain()
   - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper

  Fair scheduler updates/refactoring (Peter Zijlstra and Ingo Molnar):

   - Fold the sched_avg update
   - Change rcu_dereference_check_sched_domain() to rcu-sched
   - Switch to rcu_dereference_all()
   - Remove superfluous rcu_read_lock()
   - Limit hrtick work
   - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks
   - Clean up comments in 'struct cfs_rq'
   - Separate se->vlag from se->vprot
   - Rename cfs_rq::avg_load to cfs_rq::sum_weight
   - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions
   - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for
     wrapped-signed aritmetics
   - Sort out 'blocked_load*' namespace noise

  Scheduler debugging code updates:

   - Export hidden tracepoints to modules (Gabriele Monaco)

   - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()
     (Fushuai Wang)

   - Add assertions to QUEUE_CLASS (Peter Zijlstra)

   - hrtimer: Fix tracing oddity (Thomas Gleixner)

  Misc fixes and cleanups:

   - Re-evaluate scheduling when migrating queued tasks out of throttled
     cgroups (Zicheng Qu)

   - Remove task_struct->faults_disabled_mapping (Christoph Hellwig)

   - Fix math notation errors in avg_vruntime comment (Zhan Xusheng)

   - sched/cpufreq: Use %pe format for PTR_ERR() printing
     (zenghongling)"

* tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
  sched/cpufreq: Use %pe format for PTR_ERR() printing
  sched/rt: Skip currently executing CPU in rto_next_cpu()
  sched/clock: Avoid false sharing for sched_clock_irqtime
  selftests/sched_ext: Add test for DL server total_bw consistency
  selftests/sched_ext: Add test for sched_ext dl_server
  sched/debug: Fix dl_server (re)start conditions
  sched/debug: Add support to change sched_ext server params
  sched_ext: Add a DL server for sched_ext tasks
  sched/debug: Stop and start server based on if it was active
  sched/debug: Fix updating of ppos on server write ops
  sched/deadline: Clear the defer params
  entry: Inline syscall_exit_work() and syscall_trace_enter()
  entry: Add arch_ptrace_report_syscall_entry/exit()
  entry: Rework syscall_exit_to_user_mode_work() for architecture reuse
  entry: Remove unused syscall argument from syscall_trace_enter()
  sched: remove task_struct->faults_disabled_mapping
  sched: Update rq->avg_idle when a task is moved to an idle CPU
  selftests/rseq: Add rseq slice histogram script
  hrtimer: Fix trace oddity
  ...
2026-02-10 12:50:10 -08:00
Linus Torvalds
0923fd0419 Locking updates for v6.20:
Lock debugging:
 
  - Implement compiler-driven static analysis locking context
    checking, using the upcoming Clang 22 compiler's context
    analysis features. (Marco Elver)
 
    We removed Sparse context analysis support, because prior to
    removal even a defconfig kernel produced 1,700+ context
    tracking Sparse warnings, the overwhelming majority of which
    are false positives. On an allmodconfig kernel the number of
    false positive context tracking Sparse warnings grows to
    over 5,200... On the plus side of the balance actual locking
    bugs found by Sparse context analysis is also rather ... sparse:
    I found only 3 such commits in the last 3 years. So the
    rate of false positives and the maintenance overhead is
    rather high and there appears to be no active policy in
    place to achieve a zero-warnings baseline to move the
    annotations & fixers to developers who introduce new code.
 
    Clang context analysis is more complete and more aggressive
    in trying to find bugs, at least in principle. Plus it has
    a different model to enabling it: it's enabled subsystem by
    subsystem, which results in zero warnings on all relevant
    kernel builds (as far as our testing managed to cover it).
    Which allowed us to enable it by default, similar to other
    compiler warnings, with the expectation that there are no
    warnings going forward. This enforces a zero-warnings baseline
    on clang-22+ builds. (Which are still limited in distribution,
    admittedly.)
 
    Hopefully the Clang approach can lead to a more maintainable
    zero-warnings status quo and policy, with more and more
    subsystems and drivers enabling the feature. Context tracking
    can be enabled for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y
    (default disabled), but this will generate a lot of false positives.
 
    ( Having said that, Sparse support could still be added back,
      if anyone is interested - the removal patch is still
      relatively straightforward to revert at this stage. )
 
 Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng)
 
   - Add support for Atomic<i8/i16/bool> and replace most Rust native
     AtomicBool usages with Atomic<bool>
 
   - Clean up LockClassKey and improve its documentation
 
   - Add missing Send and Sync trait implementation for SetOnce
 
   - Make ARef Unpin as it is supposed to be
 
   - Add __rust_helper to a few Rust helpers as a preparation for
     helper LTO
 
   - Inline various lock related functions to avoid additional
     function calls.
 
 WW mutexes:
 
   - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz)
 
 Misc fixes and cleanups:
 
   - rcu: Mark lockdep_assert_rcu_helper() __always_inline
     (Arnd Bergmann)
 
   - locking/local_lock: Include more missing headers (Peter Zijlstra)
 
   - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap)
 
   - rust: sync: Replace `kernel::c_str!` with C-Strings
     (Tamir Duberstein)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmIXiURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gH+A/9GX5UmU6+HuDfDrCtXm9GDve6wkwahvcW
 jLDxOYjs764I2BhyjZnjKjyF5zw60hbykem7Wcf5EV2YH30nM4XRgEWVJfkr1UAI
 Pra415X4DdOzZ6qYQIpO8Udt1LtR7BMSaXITVLJaLicxEoOVtq3SKxjqyhCFs7UW
 MfJdqleB+RMLqq3LlzgB4l43eKk1xyeHh+oQwI0RSxuIpVZme3p4TObnCKjIWnK7
 Ihd+dkgC852WBjANgNL7F/sd5UsF5QX3wjtOrLhMKvkIgTPdXln0g398pivjN/G/
 Kpnw18SFeb159JfJu8eMotsYvVnQ0D5aOcTBfL4qvOHCImhpcu2s6ik9BcXqt2yT
 8IiuWk9xEM3Ok+I/I4ClT5cf5GYpyigV2QsXxn+IjDX5Na8v4zlHh0r8SElP8fOt
 7dpQx7iw8UghAib3AzA3suN78Oh39m8l5BNobj7LAjnqOQcVvoPo4o7/48ntuH7A
 38EucFrXfxQBMfGbMwvxEmgYuX7MyVfQLaPE06MHy1BkZkffT8Um38TB0iNtZmtf
 WUx01yLKWYspehlwFi319uVI4/Zp7FnTfqa5uKv1oSXVdL9vZojSXUzrgDV7FVqT
 Z4xAAw/kwNHpUG7y0zNOqd6PukovG1t+CjbLvK+eHPwc5c0vEGG2oTRAfEvvP1z/
 kesYDmCyJnk=
 =N1gA
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Lock debugging:

   - Implement compiler-driven static analysis locking context checking,
     using the upcoming Clang 22 compiler's context analysis features
     (Marco Elver)

     We removed Sparse context analysis support, because prior to
     removal even a defconfig kernel produced 1,700+ context tracking
     Sparse warnings, the overwhelming majority of which are false
     positives. On an allmodconfig kernel the number of false positive
     context tracking Sparse warnings grows to over 5,200... On the plus
     side of the balance actual locking bugs found by Sparse context
     analysis is also rather ... sparse: I found only 3 such commits in
     the last 3 years. So the rate of false positives and the
     maintenance overhead is rather high and there appears to be no
     active policy in place to achieve a zero-warnings baseline to move
     the annotations & fixers to developers who introduce new code.

     Clang context analysis is more complete and more aggressive in
     trying to find bugs, at least in principle. Plus it has a different
     model to enabling it: it's enabled subsystem by subsystem, which
     results in zero warnings on all relevant kernel builds (as far as
     our testing managed to cover it). Which allowed us to enable it by
     default, similar to other compiler warnings, with the expectation
     that there are no warnings going forward. This enforces a
     zero-warnings baseline on clang-22+ builds (Which are still limited
     in distribution, admittedly)

     Hopefully the Clang approach can lead to a more maintainable
     zero-warnings status quo and policy, with more and more subsystems
     and drivers enabling the feature. Context tracking can be enabled
     for all kernel code via WARN_CONTEXT_ANALYSIS_ALL=y (default
     disabled), but this will generate a lot of false positives.

     ( Having said that, Sparse support could still be added back,
       if anyone is interested - the removal patch is still
       relatively straightforward to revert at this stage. )

  Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng)

    - Add support for Atomic<i8/i16/bool> and replace most Rust native
      AtomicBool usages with Atomic<bool>

    - Clean up LockClassKey and improve its documentation

    - Add missing Send and Sync trait implementation for SetOnce

    - Make ARef Unpin as it is supposed to be

    - Add __rust_helper to a few Rust helpers as a preparation for
      helper LTO

    - Inline various lock related functions to avoid additional function
      calls

  WW mutexes:

    - Extend ww_mutex tests and other test-ww_mutex updates (John
      Stultz)

  Misc fixes and cleanups:

    - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd
      Bergmann)

    - locking/local_lock: Include more missing headers (Peter Zijlstra)

    - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap)

    - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir
      Duberstein)"

* tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits)
  locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK
  rcu: Mark lockdep_assert_rcu_helper() __always_inline
  compiler-context-analysis: Remove __assume_ctx_lock from initializers
  tomoyo: Use scoped init guard
  crypto: Use scoped init guard
  kcov: Use scoped init guard
  compiler-context-analysis: Introduce scoped init guards
  cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers
  seqlock: fix scoped_seqlock_read kernel-doc
  tools: Update context analysis macros in compiler_types.h
  rust: sync: Replace `kernel::c_str!` with C-Strings
  rust: sync: Inline various lock related methods
  rust: helpers: Move #define __rust_helper out of atomic.c
  rust: wait: Add __rust_helper to helpers
  rust: time: Add __rust_helper to helpers
  rust: task: Add __rust_helper to helpers
  rust: sync: Add __rust_helper to helpers
  rust: refcount: Add __rust_helper to helpers
  rust: rcu: Add __rust_helper to helpers
  rust: processor: Add __rust_helper to helpers
  ...
2026-02-10 12:28:44 -08:00
Linus Torvalds
4d84667627 Performance events changes for v7.0:
x86 PMU driver updates:
 
  - Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs.
    Compared to previous iterations of the Intel PMU code, there's
    been a lot of changes, which center around three main areas:
 
     - Introduce the OFF-MODULE RESPONSE (OMR) facility to
       replace the Off-Core Response (OCR) facility
 
     - New PEBS data source encoding layout
 
     - Support the new "RDPMC user disable" feature
 
    (Dapeng Mi)
 
  - Likewise, a large series adds uncore PMU support for
    Intel Diamond Rapids (DMR) CPUs, which center around these
    four main areas:
 
     - DMR may have two Integrated I/O and Memory Hub (IMH) dies,
       separate from the compute tile (CBB) dies.  Each CBB and
       each IMH die has its own discovery domain.
 
     - Unlike prior CPUs that retrieve the global discovery table
       portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
       discovery and MSR for CBB PMON discovery.
 
     - DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA,
       UBR, PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.
 
     - IIO free-running counters in DMR are MMIO-based, unlike SPR.
 
    (Zide Chen)
 
  - Also add support for Add missing PMON units for Intel Panther Lake,
    and support Nova Lake (NVL), which largely maps to Panther Lake.
    (Zide Chen)
 
  - KVM integration: Add support for mediated vPMUs (by Kan Liang
    and Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
    Sandipan Das and Mingwei Zhang)
 
  - Add Intel cstate driver to support for Wildcat Lake (WCL)
    CPUs, which are a low-power variant of Panther Lake.
    (Zide Chen)
 
  - Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
    (aka MaxLinear Lightning Mountain), which maps to the existing
    Airmont code. (Martin Schiller)
 
 Performance enhancements:
 
  - core: Speed up kexec shutdown by avoiding unnecessary
    cross CPU calls. (Jan H. Schönherr)
 
  - core: Fix slow perf_event_task_exit() with LBR callstacks
    (Namhyung Kim)
 
 User-space stack unwinding support:
 
  - Various cleanups and refactorings in preparation to generalize
    the unwinding code for other architectures. (Jens Remus)
 
 Uprobes updates:
 
  - Transition from kmap_atomic to kmap_local_page (Keke Ming)
 
  - Fix incorrect lockdep condition in filter_chain() (Breno Leitao)
 
  - Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)
 
 Misc fixes and cleanups:
 
  - s390: Remove kvm_types.h from Kbuild (Randy Dunlap)
 
  - x86/intel/uncore: Convert comma to semicolon (Chen Ni)
 
  - x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)
 
  - x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmJhTURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i/qw/9F/sjMqbxH8d3kPXB8wk2eUuSkynQ2aNw
 Zec9qtfCC5N1U9b2D7ywGJRscTmWYnX/3BKTyzFuyA6SDz6buAgDDIGPlHi+9Fww
 +RUUS3lQ7N3pVWZ4Ifu3kbh3Vz4lkQuOXhfcjiyIMS6QIxfrcSLFoKHK+2V6PeU+
 x0k+THHz/Ymg+DIpqSjqil1yrKaUmU9xRrbnyy6zJB1duREQrkYBhIWL1+bcd7SA
 89RVAGXQ+sWzVQMPaKrMkZj6GavOCB7zseigiiwjBRLznukS2OulDDe8zR6pCJZp
 wbdc7TR/nCm+QtNfkHlOmTQvsPAXiXNyXe5Vi8aFjGc0uMGhHaeiL9ah/bwsKA5m
 Bm5Y7oVSmBlCJbcr/CTrGYkb+WwvLIgPCwVkn4FYPlsWv+U92qTOx9q7qKmIDFaj
 1oUXCwoHbYrYnoZqZqPp2h689m0Lh/lsGhy0QRt8aGnKu0SDjMaqRGbYWH0UI0kA
 aDnZstVHG76RnVi0143q8HcvvjNZb82NL0cS749tY/YcwH4kUEGj2XuSK2Ar887T
 H0oDJHXijMlGXWqO5bK3WMoCQajR7nRyqBo7/rKYj20OjXmwoXS3vel77/s8WFo2
 fUUC469MacDzyxdBNutnkJvvcvsUio3r4MWFsEEWQk2nUE58PtN8YM8j/FdNYpql
 zAZ4Jx/A0RM=
 =4SRB
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance event updates from Ingo Molnar:
 "x86 PMU driver updates:

   - Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
     (Dapeng Mi)

     Compared to previous iterations of the Intel PMU code, there's been
     a lot of changes, which center around three main areas:

      - Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the
        Off-Core Response (OCR) facility

      - New PEBS data source encoding layout

      - Support the new "RDPMC user disable" feature

   - Likewise, a large series adds uncore PMU support for Intel Diamond
     Rapids (DMR) CPUs (Zide Chen)

     This centers around these four main areas:

      - DMR may have two Integrated I/O and Memory Hub (IMH) dies,
        separate from the compute tile (CBB) dies. Each CBB and each IMH
        die has its own discovery domain.

      - Unlike prior CPUs that retrieve the global discovery table
        portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
        discovery and MSR for CBB PMON discovery.

      - DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR,
        PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.

      - IIO free-running counters in DMR are MMIO-based, unlike SPR.

   - Also add support for Add missing PMON units for Intel Panther Lake,
     and support Nova Lake (NVL), which largely maps to Panther Lake.
     (Zide Chen)

   - KVM integration: Add support for mediated vPMUs (by Kan Liang and
     Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
     Sandipan Das and Mingwei Zhang)

   - Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs,
     which are a low-power variant of Panther Lake (Zide Chen)

   - Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
     (aka MaxLinear Lightning Mountain), which maps to the existing
     Airmont code (Martin Schiller)

  Performance enhancements:

   - Speed up kexec shutdown by avoiding unnecessary cross CPU calls
     (Jan H. Schönherr)

   - Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)

  User-space stack unwinding support:

   - Various cleanups and refactorings in preparation to generalize the
     unwinding code for other architectures (Jens Remus)

  Uprobes updates:

   - Transition from kmap_atomic to kmap_local_page (Keke Ming)

   - Fix incorrect lockdep condition in filter_chain() (Breno Leitao)

   - Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)

  Misc fixes and cleanups:

   - s390: Remove kvm_types.h from Kbuild (Randy Dunlap)

   - x86/intel/uncore: Convert comma to semicolon (Chen Ni)

   - x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)

   - x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"

* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  s390: remove kvm_types.h from Kbuild
  uprobes: Fix incorrect lockdep condition in filter_chain()
  x86/ibs: Fix typo in dc_l2tlb_miss comment
  x86/uprobes: Fix XOL allocation failure for 32-bit tasks
  perf/x86/intel/uncore: Convert comma to semicolon
  perf/x86/intel: Add support for rdpmc user disable feature
  perf/x86: Use macros to replace magic numbers in attr_rdpmc
  perf/x86/intel: Add core PMU support for Novalake
  perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
  perf/x86/intel: Add core PMU support for DMR
  perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
  perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
  perf/core: Fix slow perf_event_task_exit() with LBR callstacks
  perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
  uprobes: use kmap_local_page() for temporary page mappings
  arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  perf/x86/intel/uncore: Add Nova Lake support
  ...
2026-02-10 12:00:46 -08:00
Linus Torvalds
f17b474e36 bpf-next-7.0
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmmGmrgACgkQ6rmadz2v
 bTq6NxAAkCHosxzGn9GYYBV8xhrBJoJJDCyEbQ4nR0XNY+zaWnuykmiPP9w1aOAM
 zm/po3mQB2pZjetvlrPrgG5RLgBCAUHzqVGy0r+phUvD3vbohKlmSlMm2kiXOb9N
 T01BgLWsyqN2ZcNFvORdSsftqIJUHcXxU6RdupGD60sO5XM9ty5cwyewLX8GBOas
 UN2bOhbK2DpqYWUvtv+3Q3ykxoStMSkXZvDRurwLKl4RHeLjXZXPo8NjnfBlk/F2
 vdFo/F4NO4TmhOave6UPXvKb4yo9IlBRmiPAl0RmNKBxenY8j9XuV/xZxU6YgzDn
 +SQfDK+CKQ4IYIygE+fqd4e5CaQrnjmPPcIw12AB2CF0LimY9Xxyyk6FSAhMN7wm
 GTVh5K2C3Dk3OiRQk4G58EvQ5QcxzX98IeeCpcckMUkPsFWHRvF402WMUcv9SWpD
 DsxxPkfENY/6N67EvH0qcSe/ikdUorQKFl4QjXKwsMCd5WhToeP4Z7Ck1gVSNkAh
 9CX++mLzg333Lpsc4SSIuk9bEPpFa5cUIKUY7GCsCiuOXciPeMDP3cGSd5LioqxN
 qWljs4Z88QDM2LJpAh8g4m3sA7bMhES3nPmdlI5CfgBcVyLW8D8CqQq4GEZ1McwL
 Ky084+lEosugoVjRejrdMMEOsqAfcbkTr2b8jpuAZdwJKm6p/bw=
 =cBdK
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support associating BPF program with struct_ops (Amery Hung)

 - Switch BPF local storage to rqspinlock and remove recursion detection
   counters which were causing false positives (Amery Hung)

 - Fix live registers marking for indirect jumps (Anton Protopopov)

 - Introduce execution context detection BPF helpers (Changwoo Min)

 - Improve verifier precision for 32bit sign extension pattern
   (Cupertino Miranda)

 - Optimize BTF type lookup by sorting vmlinux BTF and doing binary
   search (Donglin Peng)

 - Allow states pruning for misc/invalid slots in iterator loops (Eduard
   Zingerman)

 - In preparation for ASAN support in BPF arenas teach libbpf to move
   global BPF variables to the end of the region and enable arena kfuncs
   while holding locks (Emil Tsalapatis)

 - Introduce support for implicit arguments in kfuncs and migrate a
   number of them to new API. This is a prerequisite for cgroup
   sub-schedulers in sched-ext (Ihor Solodrai)

 - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen)

 - Fix ORC stack unwind from kprobe_multi (Jiri Olsa)

 - Speed up fentry attach by using single ftrace direct ops in BPF
   trampolines (Jiri Olsa)

 - Require frozen map for calculating map hash (KP Singh)

 - Fix lock entry creation in TAS fallback in rqspinlock (Kumar
   Kartikeya Dwivedi)

 - Allow user space to select cpu in lookup/update operations on per-cpu
   array and hash maps (Leon Hwang)

 - Make kfuncs return trusted pointers by default (Matt Bobrowski)

 - Introduce "fsession" support where single BPF program is executed
   upon entry and exit from traced kernel function (Menglong Dong)

 - Allow bpf_timer and bpf_wq use in all programs types (Mykyta
   Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei
   Starovoitov)

 - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their
   definition across the tree (Puranjay Mohan)

 - Allow BPF arena calls from non-sleepable context (Puranjay Mohan)

 - Improve register id comparison logic in the verifier and extend
   linked registers with negative offsets (Puranjay Mohan)

 - In preparation for BPF-OOM introduce kfuncs to access memcg events
   (Roman Gushchin)

 - Use CFI compatible destructor kfunc type (Sami Tolvanen)

 - Add bitwise tracking for BPF_END in the verifier (Tianci Cao)

 - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou
   Tang)

 - Make BPF selftests work with 64k page size (Yonghong Song)

* tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits)
  selftests/bpf: Fix outdated test on storage->smap
  selftests/bpf: Choose another percpu variable in bpf for btf_dump test
  selftests/bpf: Remove test_task_storage_map_stress_lookup
  selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
  selftests/bpf: Update task_local_storage/recursion test
  selftests/bpf: Update sk_storage_omem_uncharge test
  bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
  bpf: Support lockless unlink when freeing map or local storage
  bpf: Prepare for bpf_selem_unlink_nofail()
  bpf: Remove unused percpu counter from bpf_local_storage_map_free
  bpf: Remove cgroup local storage percpu counter
  bpf: Remove task local storage percpu counter
  bpf: Change local_storage->lock and b->lock to rqspinlock
  bpf: Convert bpf_selem_unlink to failable
  bpf: Convert bpf_selem_link_map to failable
  bpf: Convert bpf_selem_unlink_map to failable
  bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage
  selftests/xsk: fix number of Tx frags in invalid packet
  selftests/xsk: properly handle batch ending in the middle of a packet
  bpf: Prevent reentrance into call_rcu_tasks_trace()
  ...
2026-02-10 11:26:21 -08:00
Linus Torvalds
a7423e6ea2 Modules changes for v7.0-rc1
Module signing:
 
   - Remove SHA-1 support for signing modules. SHA-1 is no longer
     considered secure for signatures due to vulnerabilities that can
     lead to hash collisions. None of the major distributions use
     SHA-1 anymore, and the kernel has defaulted to SHA-512 since
     v6.11. Note that loading SHA-1 signed modules is still supported.
 
   - Update scripts/sign-file to use only the OpenSSL CMS API for
     signing. As SHA-1 support is gone, we can drop the legacy PKCS#7
     API which was limited to SHA-1. This also cleans up support for
     legacy OpenSSL versions.
 
 Cleanups and fixes:
 
   - Use system_dfl_wq instead of the per-cpu system_wq following the
     ongoing workqueue API refactoring.
 
   - Avoid open-coded kvrealloc() in module decompression logic by
     using the standard helper.
 
   - Improve section annotations by replacing the custom __modinit
     with __init_or_module and removing several unused __INIT*_OR_MODULE
     macros.
 
   - Fix kernel-doc warnings in include/linux/moduleparam.h.
 
   - Ensure set_module_sig_enforced is only declared when module
     signing is enabled.
 
   - Fix gendwarfksyms build failures on 32-bit hosts.
 
 MAINTAINERS:
 
   - Update the module subsystem entry to reflect the maintainer
     rotation and update the git repository link.
 
 The changes have been soaking in linux-next since -rc2.
 
 Note that like Daniel mentioned in the previous pull request [1], we
 rotate maintainership every 6 months, and I will be handling the module
 subsystem pull requests for the first half of this year.
 
 Link: https://lore.kernel.org/r/20251203234840.3720-1-da.gomez@kernel.org [1]
 Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSE9au1u/dCZerzchhaByWrOaGnegUCaYYeeAAKCRBaByWrOaGn
 epIxAQDU/VSAC491S9/5dAUeGbOis9/p6QJKQlNgEqU4oTlOsgEA0p8BZ9Spkwzd
 v9BfIl3j9qVt7wUdlLdbHfdvPgtUVgc=
 =at6w
 -----END PGP SIGNATURE-----

Merge tag 'modules-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull module updates from Sami Tolvanen:
 "Module signing:

   - Remove SHA-1 support for signing modules.

     SHA-1 is no longer considered secure for signatures due to
     vulnerabilities that can lead to hash collisions. None of the major
     distributions use SHA-1 anymore, and the kernel has defaulted to
     SHA-512 since v6.11.

     Note that loading SHA-1 signed modules is still supported.

   - Update scripts/sign-file to use only the OpenSSL CMS API for
     signing.

     As SHA-1 support is gone, we can drop the legacy PKCS#7 API which
     was limited to SHA-1. This also cleans up support for legacy
     OpenSSL versions.

  Cleanups and fixes:

   - Use system_dfl_wq instead of the per-cpu system_wq following the
     ongoing workqueue API refactoring.

   - Avoid open-coded kvrealloc() in module decompression logic by using
     the standard helper.

   - Improve section annotations by replacing the custom __modinit with
     __init_or_module and removing several unused __INIT*_OR_MODULE
     macros.

   - Fix kernel-doc warnings in include/linux/moduleparam.h.

   - Ensure set_module_sig_enforced is only declared when module signing
     is enabled.

   - Fix gendwarfksyms build failures on 32-bit hosts.

  MAINTAINERS:

   - Update the module subsystem entry to reflect the maintainer
     rotation and update the git repository link"

* tag 'modules-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
  modules: moduleparam.h: fix kernel-doc comments
  module: Only declare set_module_sig_enforced when CONFIG_MODULE_SIG=y
  module/decompress: Avoid open-coded kvrealloc()
  gendwarfksyms: Fix build on 32-bit hosts
  sign-file: Use only the OpenSSL CMS API for signing
  module: Remove SHA-1 support for module signing
  module: replace use of system_wq with system_dfl_wq
  params: Replace __modinit with __init_or_module
  module: Remove unused __INIT*_OR_MODULE macros
  MAINTAINERS: Update module subsystem maintainers and repository
2026-02-10 09:49:18 -08:00
Linus Torvalds
08df88fa14 This update includes the following changes:
API:
 
 - Fix race condition in hwrng core by using RCU.
 
 Algorithms:
 
 - Allow authenc(sha224,rfc3686) in fips mode.
 - Add test vectors for authenc(hmac(sha384),cbc(aes)).
 - Add test vectors for authenc(hmac(sha224),cbc(aes)).
 - Add test vectors for authenc(hmac(md5),cbc(des3_ede)).
 - Add lz4 support in hisi_zip.
 - Only allow clear key use during self-test in s390/{phmac,paes}.
 
 Drivers:
 
 - Set rng quality to 900 in airoha.
 - Add gcm(aes) support for AMD/Xilinx Versal device.
 - Allow tfms to share device in hisilicon/trng.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmmJlNEACgkQxycdCkmx
 i6dfYw//fLKHita7B7k6Rnfv7aTX7ZaF7bwMb1w2OtNu7061ZK1+Ou127ZjFKFxC
 qJtI71qmTnhTOXnqeLHDio81QLZ3D9cUwSITv4YS4SCIZlbpKmKNFNfmNd5qweNG
 xHRQnD4jiM2Qk8GFx6CmXKWEooev9Z9vvjWtPSbuHSXVUd5WPGkJfLv6s9Oy3W6u
 7/Z+KcPtMNx3mAhNy7ZwzttKLCPfLp8YhEP99sOFmrUhehjC2e5z59xcQmef5gfJ
 cCTBUJkySLChF2bd8eHWilr8y7jow/pyldu2Ksxv2/o0l01xMqrQoIOXwCeEuEq0
 uxpKMCR0wM9jBlA1C59zBfiL5Dacb+Dbc7jcRRAa49MuYclVMRoPmnAutUMiz38G
 mk/gpc1BQJIez1rAoTyXiNsXiSeZnu/fR9tOq28pTfNXOt2CXsR6kM1AuuP2QyuP
 QC0+UM5UsTE+QIibYklop3HfSCFIaV5LkDI/RIvPzrUjcYkJYgMnG3AoIlqkOl1s
 mzcs20cH9PoZG3v5W4SkKJMib6qSx1qfa1YZ7GucYT1nUk04Plcb8tuYabPP4x6y
 ow/vfikRjnzuMesJShifJUwplaZqP64RBXMvIfgdoOCXfeQ1tKCKz0yssPfgmSs6
 K5mmnmtMvgB6k14luCD3E2zFHO6W+PHZQbSanEvhnlikPo86Dbk=
 =n4fL
 -----END PGP SIGNATURE-----

Merge tag 'v7.0-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Fix race condition in hwrng core by using RCU

  Algorithms:
   - Allow authenc(sha224,rfc3686) in fips mode
   - Add test vectors for authenc(hmac(sha384),cbc(aes))
   - Add test vectors for authenc(hmac(sha224),cbc(aes))
   - Add test vectors for authenc(hmac(md5),cbc(des3_ede))
   - Add lz4 support in hisi_zip
   - Only allow clear key use during self-test in s390/{phmac,paes}

  Drivers:
   - Set rng quality to 900 in airoha
   - Add gcm(aes) support for AMD/Xilinx Versal device
   - Allow tfms to share device in hisilicon/trng"

* tag 'v7.0-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (100 commits)
  crypto: img-hash - Use unregister_ahashes in img_{un}register_algs
  crypto: testmgr - Add test vectors for authenc(hmac(md5),cbc(des3_ede))
  crypto: cesa - Simplify return statement in mv_cesa_dequeue_req_locked
  crypto: testmgr - Add test vectors for authenc(hmac(sha224),cbc(aes))
  crypto: testmgr - Add test vectors for authenc(hmac(sha384),cbc(aes))
  hwrng: core - use RCU and work_struct to fix race condition
  crypto: starfive - Fix memory leak in starfive_aes_aead_do_one_req()
  crypto: xilinx - Fix inconsistant indentation
  crypto: rng - Use unregister_rngs in register_rngs
  crypto: atmel - Use unregister_{aeads,ahashes,skciphers}
  hwrng: optee - simplify OP-TEE context match
  crypto: ccp - Add sysfs attribute for boot integrity
  dt-bindings: crypto: atmel,at91sam9g46-sha: add microchip,lan9691-sha
  dt-bindings: crypto: atmel,at91sam9g46-aes: add microchip,lan9691-aes
  dt-bindings: crypto: qcom,inline-crypto-engine: document the Milos ICE
  crypto: caam - fix netdev memory leak in dpaa2_caam_probe
  crypto: hisilicon/qm - increase wait time for mailbox
  crypto: hisilicon/qm - obtain the mailbox configuration at one time
  crypto: hisilicon/qm - remove unnecessary code in qm_mb_write()
  crypto: hisilicon/qm - move the barrier before writing to the mailbox register
  ...
2026-02-10 08:36:42 -08:00
Linus Torvalds
d16738a4e7 The kthread code provides an infrastructure which manages the preferred
affinity of unbound kthreads (node or custom cpumask) against
 housekeeping (CPU isolation) constraints and CPU hotplug events.
 
 One crucial missing piece is the handling of cpuset: when an isolated
 partition is created, deleted, or its CPUs updated, all the unbound
 kthreads in the top cpuset become indifferently affine to _all_ the
 non-isolated CPUs, possibly breaking their preferred affinity along
 the way.
 
 Solve this with performing the kthreads affinity update from cpuset to
 the kthreads consolidated relevant code instead so that preferred
 affinities are honoured and applied against the updated cpuset isolated
 partitions.
 
 The dispatch of the new isolated cpumasks to timers, workqueues and
 kthreads is performed by housekeeping, as per the nice Tejun's
 suggestion.
 
 As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set
 from boot defined domain isolation (through isolcpus=) and cpuset
 isolated partitions. Housekeeping cpumasks are now modifiable with a
 specific RCU based synchronization. A big step toward making nohz_full=
 also mutable through cpuset in the future.
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmmE0mYbFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEIUkVEdQjox36eMP/0Ls/ArfYVi/MNAXWlpy
 rAt6m9Y/X9GBcDM/VI9BXq1ZX4qEr2XjJ8UUb8cM08uHEAt0ErlmpRxREwJFrKbI
 H4jzg5EwO0D0c6MnvgQJEAwkHxQVIjsxG9DovRIjxyW4ycx3aSsRg/f2VKyWoLvY
 7ZT7CbLFE+I/MQh2ZgUu/9pnCDQVR2anss2WYIej5mmgFL5pyEv3YvYgKYVyK08z
 sXyNxpP976g2d9ECJ9OtFJV9we6mlqxlG0MVCiv/Uxh7DBjxWWPsLvlmLAXggQ03
 +0GW+nnutDaKz83pgS7Z4zum/+Oa+I1dTLIN27pARUNcMCYip7njM2KNpJwPdov3
 +fAIODH2JVX1xewT+U1cCq6gdI55ejbwdQYGFV075dKBUxKQeIyrghvfC3Ga6aKQ
 Gw3y68jdrXOw6iyfHR5k/0Mnu2/FDKUW2fZxLKm55PvNZP5jQFmSlz9wyiwwyb3m
 UUSgThj6Ozodxks8hDX41rGVezCcm1ni+qNSiNIs8HPaaZQrwbnvKHQFBBJHQzJP
 rJ39VWBx3Hq/ly71BOR6pCzoZsfS1f85YKhJ4vsfjLO6BfhI16nBat89eROSRKcz
 XptyWqW0PgAD0teDuMCTPNuUym/viBHALXHKuSO12CIizacvftiGcmaQNPlLiiFZ
 /Dr2+aOhwYw3UD6djn3u94M9
 =nWGh
 -----END PGP SIGNATURE-----

Merge tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks

Pull kthread updates from Frederic Weisbecker:
 "The kthread code provides an infrastructure which manages the
  preferred affinity of unbound kthreads (node or custom cpumask)
  against housekeeping (CPU isolation) constraints and CPU hotplug
  events.

  One crucial missing piece is the handling of cpuset: when an isolated
  partition is created, deleted, or its CPUs updated, all the unbound
  kthreads in the top cpuset become indifferently affine to _all_ the
  non-isolated CPUs, possibly breaking their preferred affinity along
  the way.

  Solve this with performing the kthreads affinity update from cpuset to
  the kthreads consolidated relevant code instead so that preferred
  affinities are honoured and applied against the updated cpuset
  isolated partitions.

  The dispatch of the new isolated cpumasks to timers, workqueues and
  kthreads is performed by housekeeping, as per the nice Tejun's
  suggestion.

  As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set
  from boot defined domain isolation (through isolcpus=) and cpuset
  isolated partitions. Housekeeping cpumasks are now modifiable with a
  specific RCU based synchronization. A big step toward making
  nohz_full= also mutable through cpuset in the future"

* tag 'kthread-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (33 commits)
  doc: Add housekeeping documentation
  kthread: Document kthread_affine_preferred()
  kthread: Comment on the purpose and placement of kthread_affine_node() call
  kthread: Honour kthreads preferred affinity after cpuset changes
  sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN
  sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN
  kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management
  kthread: Include kthreadd to the managed affinity list
  kthread: Include unbound kthreads in the managed affinity list
  kthread: Refine naming of affinity related fields
  PCI: Remove superfluous HK_TYPE_WQ check
  sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()
  cpuset: Remove cpuset_cpu_is_isolated()
  timers/migration: Remove superfluous cpuset isolation test
  cpuset: Propagate cpuset isolation update to timers through housekeeping
  cpuset: Propagate cpuset isolation update to workqueue through housekeeping
  PCI: Flush PCI probe workqueue on cpuset isolated partition change
  sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
  sched/isolation: Flush memcg workqueues on cpuset isolated partition change
  cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
  ...
2026-02-09 19:57:30 -08:00
Linus Torvalds
9b1b3dcd28 Power management updates for 6.20-rc1/7.0-rc1
- Remove the unused omap-cpufreq driver (Andreas Kemnade)
 
  - Optimize error handling code in cpufreq_boost_trigger_state() and
    make cpufreq_boost_trigger_state() return -EOPNOTSUPP if no policy
    supports boost (Lifeng Zheng)
 
  - Update cpufreq-dt-platdev list for tegra, qcom, TI (Aaron Kling,
    Dhruva Gole, and Konrad Dybcio)
 
  - Minor improvements to the cpufreq and cpumask rust implementation
    (Alexandre Courbot, Alice Ryhl, Tamir Duberstein, and Yilin Chen)
 
  - Add support for AM62L3 SoC to the ti-cpufreq driver (Dhruva Gole)
 
  - Update arch_freq_scale in the CPPC cpufreq driver's frequency
    invariance engine (FIE) in scheduler ticks if the related CPPC
    registers are not in PCC (Jie Zhan)
 
  - Assorted minor cleanups and improvements in ARM cpufreq drivers (Juan
    Martinez, Felix Gu, Luca Weiss, and Sergey Shtylyov)
 
  - Add generic helpers for sysfs show/store to cppc_cpufreq (Sumit
    Gupta)
 
  - Make the scaling_setspeed cpufreq sysfs attribute return the actual
    requested frequency to avoid confusion (Pengjie Zhang)
 
  - Simplify the idle CPU time granularity test in the ondemand cpufreq
    governor (Frederic Weisbecker)
 
  - Enable asym capacity in intel_pstate only when CPU SMT is not
    possible (Yaxiong Tian)
 
  - Update the description of rate_limit_us default value in cpufreq
    documentation (Yaxiong Tian)
 
  - Add a command line option to adjust the C-states table in the
    intel_idle driver, remove the 'preferred_cstates' module parameter
    from it, add C-states validation to it and clean it up (Artem
    Bityutskiy)
 
  - Make the menu cpuidle governor always check the time till the closest
    timer event when the scheduler tick has been stopped to prevent it
    from mistakenly selecting the deepest available idle state (Rafael
    Wysocki)
 
  - Update the teo cpuidle governor to avoid making suboptimal decisions
    in certain corner cases and generally improve idle state selection
    accuracy (Rafael Wysocki)
 
  - Remove an unlikely() annotation on the early-return condition in
    menu_select() that leads to branch misprediction 100% of the time
    on systems with only 1 idle state enabled, like ARM64 servers (Breno
    Leitao)
 
  - Add Christian Loehle to MAINTAINERS as a cpuidle reviewer (Christian
    Loehle)
 
  - Stop flagging the PM runtime workqueue as freezable to avoid system
    suspend and resume deadlocks in subsystems that assume asynchronous
    runtime PM to work during system-wide PM transitions (Rafael Wysocki)
 
  - Drop redundant NULL pointer checks before acomp_request_free() from
    the hibernation code handling image saving (Rafael Wysocki)
 
  - Update wakeup_sources_walk_start() to handle empty lists of wakeup
    sources as appropriate (Samuel Wu)
 
  - Make dev_pm_clear_wake_irq() check the power.wakeirq value under
    power.lock to avoid race conditions (Gui-Dong Han)
 
  - Avoid bit field races related to power.work_in_progress in the core
    device suspend code (Xuewen Yan)
 
  - Make several drivers discard pm_runtime_put() return value in
    preparation for converting that function to a void one (Rafael
    Wysocki)
 
  - Add PL4 support for Ice Lake to the Intel RAPL power capping
    driver (Daniel Tang)
 
  - Replace sprintf() with sysfs_emit() in power capping sysfs show
    functions (Sumeet Pawnikar)
 
  - Make dev_pm_opp_get_level() return value match the documentation
    after a previous update of the latter (Aleks Todorov)
 
  - Use scoped for each OF child loop in the OPP code (Krzysztof
    Kozlowski)
 
  - Fix a bug in an example code snippet and correct typos in the energy
    model management documentation (Patrick Little)
 
  - Fix miscellaneous problems in cpupower (Kaushlendra Kumar):
 
    * idle_monitor: Fix incorrect value logged after stop
    * Fix inverted APERF capability check
    * Use strcspn() to strip trailing newline
    * Reset errno before strtoull()
    * Show C0 in idle-info dump
 
  - Improve cpupower installation procedure by making the systemd step
    optional and allowing users to disable the installation of systemd's
    unit file (João Marcos Costa)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmmDr5ASHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1Q8oH/0KRqdidHzesIQl6gd5WSS/sWdxODRUt
 R9dEGQQ6LXCY0z05RAq29HZQf618fYuRFX4PSrtyCvrcRJK7MJKuzK55MRq0MC3c
 c/2pL1PdpHexjLXUP9pcoxrYjetsr7SnD6Y0M3JfOPg1E/bG8sp1DlnE8cdqrL0W
 lrdB2cEGewT2SVkNhCIQ2n6bwfQwmLlfQl1vXTM8BA7xCjoslePUJlRphAFVAt/J
 5fQxSOH0eSxK5PYQFUDM2D2J3uMAN0pFb6eIjwVYYqjABqV//BPl99Rv2W3ElJq7
 K/SICRWlvzyINCgF15QAUtQHWdINxSb0GzovECVxODHOv0N4mKHdpNU=
 =QlVe
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "By the number of commits, cpufreq is the leading party (again) and the
  most visible change there is the removal of the omap-cpufreq driver
  that has not been used for a long time (good riddance). There are also
  quite a few changes in the cppc_cpufreq driver, mostly related to
  fixing its frequency invariance engine in the case when the CPPC
  registers used by it are not in PCC. In addition to that, support for
  AM62L3 is added to the ti-cpufreq driver and the cpufreq-dt-platdev
  list is updated for some platforms. The remaining cpufreq changes are
  assorted fixes and cleanups.

  Next up is cpuidle and the changes there are dominated by intel_idle
  driver updates, mostly related to the new command line facility
  allowing users to adjust the list of C-states used by the driver.
  There are also a few updates of cpuidle governors, including two menu
  governor fixes and some refinements of the teo governor, and a
  MAINTAINERS update adding Christian Loehle as a cpuidle reviewer.
  [Thanks for stepping up Christian!]

  The most significant update related to system suspend and hibernation
  is the one to stop freezing the PM runtime workqueue during system PM
  transitions which allows some deadlocks to be avoided. There is also a
  fix for possible concurrent bit field updates in the core device
  suspend code and a few other minor fixes.

  Apart from the above, several drivers are updated to discard the
  return value of pm_runtime_put() which is going to be converted to a
  void function as soon as everybody stops using its return value, PL4
  support for Ice Lake is added to the Intel RAPL power capping driver,
  and there are assorted cleanups, documentation fixes, and some
  cpupower utility improvements.

  Specifics:

   - Remove the unused omap-cpufreq driver (Andreas Kemnade)

   - Optimize error handling code in cpufreq_boost_trigger_state() and
     make cpufreq_boost_trigger_state() return -EOPNOTSUPP if no policy
     supports boost (Lifeng Zheng)

   - Update cpufreq-dt-platdev list for tegra, qcom, TI (Aaron Kling,
     Dhruva Gole, and Konrad Dybcio)

   - Minor improvements to the cpufreq and cpumask rust implementation
     (Alexandre Courbot, Alice Ryhl, Tamir Duberstein, and Yilin Chen)

   - Add support for AM62L3 SoC to the ti-cpufreq driver (Dhruva Gole)

   - Update arch_freq_scale in the CPPC cpufreq driver's frequency
     invariance engine (FIE) in scheduler ticks if the related CPPC
     registers are not in PCC (Jie Zhan)

   - Assorted minor cleanups and improvements in ARM cpufreq drivers
     (Juan Martinez, Felix Gu, Luca Weiss, and Sergey Shtylyov)

   - Add generic helpers for sysfs show/store to cppc_cpufreq (Sumit
     Gupta)

   - Make the scaling_setspeed cpufreq sysfs attribute return the actual
     requested frequency to avoid confusion (Pengjie Zhang)

   - Simplify the idle CPU time granularity test in the ondemand cpufreq
     governor (Frederic Weisbecker)

   - Enable asym capacity in intel_pstate only when CPU SMT is not
     possible (Yaxiong Tian)

   - Update the description of rate_limit_us default value in cpufreq
     documentation (Yaxiong Tian)

   - Add a command line option to adjust the C-states table in the
     intel_idle driver, remove the 'preferred_cstates' module parameter
     from it, add C-states validation to it and clean it up (Artem
     Bityutskiy)

   - Make the menu cpuidle governor always check the time till the
     closest timer event when the scheduler tick has been stopped to
     prevent it from mistakenly selecting the deepest available idle
     state (Rafael Wysocki)

   - Update the teo cpuidle governor to avoid making suboptimal
     decisions in certain corner cases and generally improve idle state
     selection accuracy (Rafael Wysocki)

   - Remove an unlikely() annotation on the early-return condition in
     menu_select() that leads to branch misprediction 100% of the time
     on systems with only 1 idle state enabled, like ARM64 servers
     (Breno Leitao)

   - Add Christian Loehle to MAINTAINERS as a cpuidle reviewer
     (Christian Loehle)

   - Stop flagging the PM runtime workqueue as freezable to avoid system
     suspend and resume deadlocks in subsystems that assume asynchronous
     runtime PM to work during system-wide PM transitions (Rafael
     Wysocki)

   - Drop redundant NULL pointer checks before acomp_request_free() from
     the hibernation code handling image saving (Rafael Wysocki)

   - Update wakeup_sources_walk_start() to handle empty lists of wakeup
     sources as appropriate (Samuel Wu)

   - Make dev_pm_clear_wake_irq() check the power.wakeirq value under
     power.lock to avoid race conditions (Gui-Dong Han)

   - Avoid bit field races related to power.work_in_progress in the core
     device suspend code (Xuewen Yan)

   - Make several drivers discard pm_runtime_put() return value in
     preparation for converting that function to a void one (Rafael
     Wysocki)

   - Add PL4 support for Ice Lake to the Intel RAPL power capping driver
     (Daniel Tang)

   - Replace sprintf() with sysfs_emit() in power capping sysfs show
     functions (Sumeet Pawnikar)

   - Make dev_pm_opp_get_level() return value match the documentation
     after a previous update of the latter (Aleks Todorov)

   - Use scoped for each OF child loop in the OPP code (Krzysztof
     Kozlowski)

   - Fix a bug in an example code snippet and correct typos in the
     energy model management documentation (Patrick Little)

   - Fix miscellaneous problems in cpupower (Kaushlendra Kumar):
      * idle_monitor: Fix incorrect value logged after stop
      * Fix inverted APERF capability check
      * Use strcspn() to strip trailing newline
      * Reset errno before strtoull()
      * Show C0 in idle-info dump

   - Improve cpupower installation procedure by making the systemd step
     optional and allowing users to disable the installation of
     systemd's unit file (João Marcos Costa)"

* tag 'pm-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
  PM: sleep: core: Avoid bit field races related to work_in_progress
  PM: sleep: wakeirq: harden dev_pm_clear_wake_irq() against races
  cpufreq: Documentation: Update description of rate_limit_us default value
  cpufreq: intel_pstate: Enable asym capacity only when CPU SMT is not possible
  PM: wakeup: Handle empty list in wakeup_sources_walk_start()
  PM: EM: Documentation: Fix bug in example code snippet
  Documentation: Fix typos in energy model documentation
  cpuidle: governors: teo: Refine intercepts-based idle state lookup
  cpuidle: governors: teo: Adjust the classification of wakeup events
  cpufreq: ondemand: Simplify idle cputime granularity test
  cpufreq: userspace: make scaling_setspeed return the actual requested frequency
  PM: hibernate: Drop NULL pointer checks before acomp_request_free()
  cpufreq: CPPC: Add generic helpers for sysfs show/store
  cpufreq: scmi: Fix device_node reference leak in scmi_cpu_domain_id()
  cpufreq: ti-cpufreq: add support for AM62L3 SoC
  cpufreq: dt-platdev: Add ti,am62l3 to blocklist
  cpufreq/amd-pstate: Add comment explaining nominal_perf usage for performance policy
  cpufreq: scmi: correct SCMI explanation
  cpufreq: dt-platdev: Block the driver from probing on more QC platforms
  rust: cpumask: rename methods of Cpumask for clarity and consistency
  ...
2026-02-09 19:00:42 -08:00
Linus Torvalds
d84e173311 ACPI support updates for 6.20-rc1/7.0-rc1
- Update the ACPICA code in the kernel to upstream version 20251212
    which includes the following changes:
 
    * Add support for new ACPI table DTPR (Michal Camacho Romero)
    * Release objects with acpi_ut_delete_object_desc() (Zilin Guan)
    * Add UUIDs for Microsoft fan extensions and UUIDs associated with
      TPM 2.0 devices (Armin Wolf)
    * Fix NULL pointer dereference in acpi_ev_address_space_dispatch()
      (Alexey Simakov)
    * Add KEYP ACPI table definition (Dave Jiang)
    * Add support for the Microsoft display mux _OSI string (Armin Wolf)
    * Add definitions for the IOVT ACPI table (Xianglai Li)
    * Abort AML bytecode execution on AML_FATAL_OP (Armin Wolf)
    * Include all fields in subtable type1 for PPTT (Ben Horgan)
    * Add GICv5 MADT structures and Arm IORT IWB node definitions (Jose
      Marinho)
    * Update Parameter Block structure for RAS2 and add a new flag in
      Memory Affinity Structure for SRAT (Pawel Chmielewski)
    * Add _VDM (Voltage Domain) object (Pawel Chmielewski)
 
  - Add support for GICv5 ACPI probing on ARM which is based on the
    GICv5 MADT structures and ARM IORT IWB node definitions recently
    added to ACPICA (Lorenzo Pieralisi)
 
  - Rework ACPI PM notification setup for PCI root buses and modify the
    ACPI PM setup for devices to register wakeup source objects under
    physical (that is, PCI, platform, etc.) devices instead of doing that
    under their ACPI companions (Rafael Wysocki)
 
  - Adjust debug messages regarding postponed ACPI PM printed during
    system resume to be more accurate (Rafael Wysocki)
 
  - Remove dead code from lps0_device_attach() (Gergo Koteles)
 
  - Start to invoke Microsoft Function 9 (Turn On Display) of the Low-
    Power S0 Idle (LPS0) _DSM in the suspend-to-idle resume flow on
    systems with ACPI LPS0 support to address a functional issue on
    Lenovo Yoga Slim 7i Aura (15ILL9), where system fans and keyboard
    backlights fail to resume after suspend (Jakob Riemenschneider)
 
  - Add sysfs attribute cid for exposing _CID lists under ACPI device
    objects (Rafael Wysocki)
 
  - Replace sprintf() with sysfs_emit() in all of the core ACPI sysfs
    interface code (Sumeet Pawnikar)
 
  - Use acpi_get_local_u64_address() in the code implementing ACPI
    support for PCI to evaluate _ADR instead of evaluating that object
    directly (Andy Shevchenko)
 
  - Add JWIPC JVC9100 to irq1_level_low_skip_override[] to unbreak
    serial IRQs on that system (Ai Chao)
 
  - Fix handling of _OSC errors in acpi_run_osc() to avoid failures on
    systems where _OSC error bits are set even though the _OSC return
    buffer contains acknowledged feature bits (Rafael Wysocki)
 
  - Clean up and rearrange \_SB._OSC handling for general platform
    features and USB4 features to avoid code duplication and unnecessary
    memory management overhead (Rafael Wysocki)
 
  - Make the ACPI core device enumeration code handle PNP0C01 and PNP0C02
    ("system resource") device objects directly instead of letting the
    legacy PNP system driver handle them to avoid device enumeration
    issues on systems where PNP0C02 is present in the _CID list under
    ACPI device objects with a _HID matching a proper device driver in
    Linux (Rafael Wysocki)
 
  - Drop workarounds for the known device enumeration issues related to
    _CID lists containing PNP0C02 (Rafael Wysocki)
 
  - Drop outdated comment regarding removed function in the ACPI-based
    device enumeration code (Julia Lawall)
 
  - Make PRP0001 device matching work as expected for ACPI device objects
    using it as a _HID for board development and similar purposes (Kartik
    Rajput)
 
  - Use async schedule function in acpi_scan_clear_dep_fn() to avoid
    races with user space initialization on some systems (Yicong Yang)
 
  - Add a piece of documentation explaining why binding drivers directly
    to ACPI device objects is not a good idea in general and why it is
    desirable to convert drivers doing so into proper platform drivers
    that use struct platform_driver for device binding (Rafael Wysocki)
 
  - Convert multiple "core ACPI" drivers, including the NFIT ACPI device
    driver, the generic ACPI button drivers, the generic ACPI thermal
    zone driver, the ACPI hardware event device (HED) driver, the ACPI EC
    driver, the ACPI SMBUS HC driver, the ACPI Smart Battery Subsystem
    (SBS) driver, and the ACPI backlight (video) driver to proper platform
    drivers that use struct platform_driver for device binding (Rafael
    Wysocki)
 
  - Use acpi_get_local_u64_address() in the ACPI backlight (video) driver
    to evaluate _ADR instead of evaluating that object directly (Andy
    Shevchenko)
 
  - Convert the generic ACPI battery driver to a proper platform driver
    using struct platform_driver for device binding (Rafael Wysocki)
 
  - Fix incorrect charging status when current is zero in the generic
    ACPI battery driver (Ata İlhan Köktürk)
 
  - Use LIST_HEAD() for initializing a stack-allocated list in the
    generic ACPI watchdog device driver (Can Peng)
 
  - Rework the ACPI idle driver initialization to register it directly
    from the common initialization code instead of doing that from a
    CPU hotplug "online" callback and clean it up (Huisong Li, Rafael
    Wysocki)
 
  - Fix a possible NULL pointer dereference in
    acpi_processor_errata_piix4() (Tuo Li)
 
  - Make read-only array non_mmio_desc[] static const (Colin Ian King)
 
  - Prevent the APEI GHES support code on ARM from accessing memory out
    of bounds or going past the ARM processor CPER record buffer (Mauro
    Carvalho Chehab)
 
  - Prevent cper_print_fw_err() from dumping the entire memory on systems
    with defective firmware (Mauro Carvalho Chehab)
 
  - Improve ghes_notify_nmi() status check to avoid unnecessary overhead
    in the NMI handler by carrying out all of the requisite preparations
    and the NMI registration time (Tony Luck)
 
  - Refactor the GHES driver by extracting common functionality into
    reusable helper functions to reduce code duplication and improve
    the ghes_notify_sea() status check in analogy with the previous
    ghes_notify_nmi() status check improvement (Shuai Xue)
 
  - Make ELOG and GHES log and trace consistently and support the CPER
    CXL protocol analogously (Fabio De Francesco)
 
  - Disable KASAN instrumentation in the APEI GHES driver when compile
    testing with clang < 18 (Nathan Chancellor)
 
  - Let ghes_edac be the preferred driver to load on  __ZX__ and _BYO_
    systems by extending the platform detection list in the APEI GHES
    driver (Tony W Wang-oc)
 
  - Clean up cppc_perf_caps and cppc_perf_ctrls structs and rename EPP
    constants for clarity in the ACPI CPPC library (Sumit Gupta)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmmErOoSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1wCoH/iWkTc5/kTTmzi5Urh9lsUsOL8Oc9hrx
 KlH4XdLf9TUUSGPJtPRrrtXiOotNUSaZpg2ULktJAZHxDWXnA83DO0FjwT3+2WbH
 /5YGVEK24Sae+DI38ukEvOI6bXrxxqNPLnFRtvV7tclUJD2OEAdKcUd8Y98s9qNf
 64bf2gKoQtbZj0eM33KHiotULYZUNoZ2SWKFAWXzKQr7EOkcQGKcuupeStgQ04Id
 uumWr+lIgW5vYPOcPFCufE3+LM0MzDvruxBlMJQEJJFVkY4IEFwwJNGW/lo8hgE3
 74pFKqOPhYsIFPM8rtKXS+JmjP22iCI49QjnHIsQdDG03GEKvKFFeCw=
 =CG0j
 -----END PGP SIGNATURE-----

Merge tag 'acpi-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "This one is significantly larger than previous ACPI support pull
  requests because several significant updates have coincided in it.

  First, there is a routine ACPICA code update, to upstream version
  20251212, but this time it covers new ACPI 6.6 material that has not
  been covered yet. Among other things, it includes definitions of a few
  new ACPI tables and updates of some others, like the GICv5 MADT
  structures and ARM IORT IWB node definitions that are used for adding
  GICv5 ACPI probing on ARM (that technically is IRQ subsystem material,
  but it depends on the ACPICA changes, so it is included here). The
  latter alone adds a few hundred lines of new code.

  Second, there is an update of ACPI _OSC handling including a fix that
  prevents failures from occurring in some corner cases due to careless
  handling of _OSC error bits.

  On top of that, the "system resource" ACPI device objects with the
  PNP0C01 and PNP0C02 are now going to be handled by the ACPI core
  device enumeration code instead of handing them over to the legacy PNP
  system driver which causes device enumeration issues to occur. Some of
  those issues have been worked around in device drivers and elsewhere
  and those workarounds should not be necessary any more, so they are
  going away.

  Moreover, the time has come to convert all "core ACPI" device drivers
  that were still using struct acpi_driver objects for device binding
  into proper platform drivers that use struct platform_driver for this
  purpose. These updates are accompanied by some requisite core ACPI
  device enumeration code changes.

  Next, there are ACPI APEI updates, including changes to avoid excess
  overhead in the NMI handler and in SEA on the ARM side, changes to
  unify ACPI-based HW error tracing and logging, and changes to prevent
  APEI code from reaching out of its allocated memory.

  There are also some ACPI power management updates, mostly related to
  the ACPI cpuidle support in the processor driver, suspend-to-idle
  handling on systems with ACPI support and to ACPI PM of devices.

  In addition to the above, bugs are fixed and the code is cleaned up in
  assorted places all over.

  Specifics:

   - Update the ACPICA code in the kernel to upstream version 20251212
     which includes the following changes:
      * Add support for new ACPI table DTPR (Michal Camacho Romero)
      * Release objects with acpi_ut_delete_object_desc() (Zilin Guan)
      * Add UUIDs for Microsoft fan extensions and UUIDs associated with
        TPM 2.0 devices (Armin Wolf)
      * Fix NULL pointer dereference in acpi_ev_address_space_dispatch()
        (Alexey Simakov)
      * Add KEYP ACPI table definition (Dave Jiang)
      * Add support for the Microsoft display mux _OSI string (Armin
        Wolf)
      * Add definitions for the IOVT ACPI table (Xianglai Li)
      * Abort AML bytecode execution on AML_FATAL_OP (Armin Wolf)
      * Include all fields in subtable type1 for PPTT (Ben Horgan)
      * Add GICv5 MADT structures and Arm IORT IWB node definitions
        (Jose Marinho)
      * Update Parameter Block structure for RAS2 and add a new flag in
        Memory Affinity Structure for SRAT (Pawel Chmielewski)
      * Add _VDM (Voltage Domain) object (Pawel Chmielewski)

   - Add support for GICv5 ACPI probing on ARM which is based on the
     GICv5 MADT structures and ARM IORT IWB node definitions recently
     added to ACPICA (Lorenzo Pieralisi)

   - Rework ACPI PM notification setup for PCI root buses and modify the
     ACPI PM setup for devices to register wakeup source objects under
     physical (that is, PCI, platform, etc.) devices instead of doing
     that under their ACPI companions (Rafael Wysocki)

   - Adjust debug messages regarding postponed ACPI PM printed during
     system resume to be more accurate (Rafael Wysocki)

   - Remove dead code from lps0_device_attach() (Gergo Koteles)

   - Start to invoke Microsoft Function 9 (Turn On Display) of the Low-
     Power S0 Idle (LPS0) _DSM in the suspend-to-idle resume flow on
     systems with ACPI LPS0 support to address a functional issue on
     Lenovo Yoga Slim 7i Aura (15ILL9), where system fans and keyboard
     backlights fail to resume after suspend (Jakob Riemenschneider)

   - Add sysfs attribute cid for exposing _CID lists under ACPI device
     objects (Rafael Wysocki)

   - Replace sprintf() with sysfs_emit() in all of the core ACPI sysfs
     interface code (Sumeet Pawnikar)

   - Use acpi_get_local_u64_address() in the code implementing ACPI
     support for PCI to evaluate _ADR instead of evaluating that object
     directly (Andy Shevchenko)

   - Add JWIPC JVC9100 to irq1_level_low_skip_override[] to unbreak
     serial IRQs on that system (Ai Chao)

   - Fix handling of _OSC errors in acpi_run_osc() to avoid failures on
     systems where _OSC error bits are set even though the _OSC return
     buffer contains acknowledged feature bits (Rafael Wysocki)

   - Clean up and rearrange \_SB._OSC handling for general platform
     features and USB4 features to avoid code duplication and
     unnecessary memory management overhead (Rafael Wysocki)

   - Make the ACPI core device enumeration code handle PNP0C01 and
     PNP0C02 ("system resource") device objects directly instead of
     letting the legacy PNP system driver handle them to avoid device
     enumeration issues on systems where PNP0C02 is present in the _CID
     list under ACPI device objects with a _HID matching a proper device
     driver in Linux (Rafael Wysocki)

   - Drop workarounds for the known device enumeration issues related to
     _CID lists containing PNP0C02 (Rafael Wysocki)

   - Drop outdated comment regarding removed function in the ACPI-based
     device enumeration code (Julia Lawall)

   - Make PRP0001 device matching work as expected for ACPI device
     objects using it as a _HID for board development and similar
     purposes (Kartik Rajput)

   - Use async schedule function in acpi_scan_clear_dep_fn() to avoid
     races with user space initialization on some systems (Yicong Yang)

   - Add a piece of documentation explaining why binding drivers
     directly to ACPI device objects is not a good idea in general and
     why it is desirable to convert drivers doing so into proper
     platform drivers that use struct platform_driver for device binding
     (Rafael Wysocki)

   - Convert multiple "core ACPI" drivers, including the NFIT ACPI
     device driver, the generic ACPI button drivers, the generic ACPI
     thermal zone driver, the ACPI hardware event device (HED) driver,
     the ACPI EC driver, the ACPI SMBUS HC driver, the ACPI Smart
     Battery Subsystem (SBS) driver, and the ACPI backlight (video)
     driver to proper platform drivers that use struct platform_driver
     for device binding (Rafael Wysocki)

   - Use acpi_get_local_u64_address() in the ACPI backlight (video)
     driver to evaluate _ADR instead of evaluating that object directly
     (Andy Shevchenko)

   - Convert the generic ACPI battery driver to a proper platform driver
     using struct platform_driver for device binding (Rafael Wysocki)

   - Fix incorrect charging status when current is zero in the generic
     ACPI battery driver (Ata İlhan Köktürk)

   - Use LIST_HEAD() for initializing a stack-allocated list in the
     generic ACPI watchdog device driver (Can Peng)

   - Rework the ACPI idle driver initialization to register it directly
     from the common initialization code instead of doing that from a
     CPU hotplug "online" callback and clean it up (Huisong Li, Rafael
     Wysocki)

   - Fix a possible NULL pointer dereference in
     acpi_processor_errata_piix4() (Tuo Li)

   - Make read-only array non_mmio_desc[] static const (Colin Ian King)

   - Prevent the APEI GHES support code on ARM from accessing memory out
     of bounds or going past the ARM processor CPER record buffer (Mauro
     Carvalho Chehab)

   - Prevent cper_print_fw_err() from dumping the entire memory on
     systems with defective firmware (Mauro Carvalho Chehab)

   - Improve ghes_notify_nmi() status check to avoid unnecessary
     overhead in the NMI handler by carrying out all of the requisite
     preparations and the NMI registration time (Tony Luck)

   - Refactor the GHES driver by extracting common functionality into
     reusable helper functions to reduce code duplication and improve
     the ghes_notify_sea() status check in analogy with the previous
     ghes_notify_nmi() status check improvement (Shuai Xue)

   - Make ELOG and GHES log and trace consistently and support the CPER
     CXL protocol analogously (Fabio De Francesco)

   - Disable KASAN instrumentation in the APEI GHES driver when compile
     testing with clang < 18 (Nathan Chancellor)

   - Let ghes_edac be the preferred driver to load on __ZX__ and _BYO_
     systems by extending the platform detection list in the APEI GHES
     driver (Tony W Wang-oc)

   - Clean up cppc_perf_caps and cppc_perf_ctrls structs and rename EPP
     constants for clarity in the ACPI CPPC library (Sumit Gupta)"

* tag 'acpi-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (117 commits)
  ACPI: battery: fix incorrect charging status when current is zero
  ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn()
  ACPI: x86: s2idle: Invoke Microsoft _DSM Function 9 (Turn On Display)
  ACPI: APEI: GHES: Add ghes_edac support for __ZX__ and _BYO_ systems
  ACPI: APEI: GHES: Disable KASAN instrumentation when compile testing with clang < 18
  ACPI: sysfs: Replace sprintf() with sysfs_emit()
  ACPI: CPPC: Rename EPP constants for clarity
  ACPI: CPPC: Clean up cppc_perf_caps and cppc_perf_ctrls structs
  ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()
  ACPI: processor: idle: Convert acpi_processor_setup_cpuidle_dev() to void
  ACPI: processor: idle: Convert acpi_processor_setup_cpuidle_states() to void
  irqchip/gic-v5: Add ACPI IWB probing
  irqchip/gic-v5: Add ACPI ITS probing
  irqchip/gic-v5: Add ACPI IRS probing
  irqchip/gic-v5: Split IRS probing into OF and generic portions
  PCI/MSI: Make the pci_msi_map_rid_ctlr_node() interface firmware agnostic
  irqdomain: Add parent field to struct irqchip_fwid
  ACPI: PCI: simplify code with acpi_get_local_u64_address()
  ACPI: video: simplify code with acpi_get_local_u64_address()
  ACPI: PM: Adjust messages regarding postponed ACPI PM
  ...
2026-02-09 18:42:47 -08:00
Linus Torvalds
0c00ed308d for-7.0/block-20260206
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGLwcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv+TD/48S2HTnMhmW6AtFYWErQ+sEKXpHrxbYe7S
 +qR8/g/T+QSfhfqPwZEuagndFKtIP3LJfaXGSP1Lk1RfP9NLQy91v33Ibe4DjHkp
 etWSfnMHA9MUAoWKmg8EvncB2G+ZQFiYCpjazj5tKHD9S2+psGMuL8kq6qzMJE83
 uhpb8WutUl4aSIXbMSfyGlwBhI1MjjRbbWlIBmg4yC8BWt1sH8Qn2L2GNVylEIcX
 U8At3KLgPGn0axSg4yGMAwTqtGhL/jwdDyeczbmRlXuAr4iVL9UX/yADCYkazt6U
 ttQ2/H+cxCwfES84COx9EteAatlbZxo6wjGvZ3xOMiMJVTjYe1x6Gkcckq+LrZX6
 tjofi2KK78qkrMXk1mZMkZjpyUWgRtCswhDllbQyqFs0SwzQtno2//Rk8HU9dhbt
 pkpryDbGFki9X3upcNyEYp5TYflpW6YhAzShYgmE6KXim2fV8SeFLviy0erKOAl+
 fwjTE6KQ5QoQv0s3WxkWa4lREm34O6IHrCUmbiPm5CruJnQDhqAN2QZIDgYC4WAf
 0gu9cR/O4Vxu7TQXrumPs5q+gCyDU0u0B8C3mG2s+rIo+PI5cVZKs2OIZ8HiPo0F
 x73kR/pX3DMe35ZQkQX22ymMuowV+aQouDLY9DTwakP5acdcg7h7GZKABk6VLB06
 gUIsnxURiQ==
 =jNzW
 -----END PGP SIGNATURE-----

Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

 - Support for batch request processing for ublk, improving the
   efficiency of the kernel/ublk server communication. This can yield
   nice 7-12% performance improvements

 - Support for integrity data for ublk

 - Various other ublk improvements and additions, including a ton of
   selftests additions and updated

 - Move the handling of blk-crypto software fallback from below the
   block layer to above it. This reduces the complexity of dealing with
   bio splitting

 - Series fixing a number of potential deadlocks in blk-mq related to
   the queue usage counter and writeback throttling and rq-qos debugfs
   handling

 - Add an async_depth queue attribute, to resolve a performance
   regression that's been around for a qhilw related to the scheduler
   depth handling

 - Only use task_work for IOPOLL completions on NVMe, if it is necessary
   to do so. An earlier fix for an issue resulted in all these
   completions being punted to task_work, to guarantee that completions
   were only run for a given io_uring ring when it was local to that
   ring. With the new changes, we can detect if it's necessary to use
   task_work or not, and avoid it if possible.

 - rnbd fixes:
      - Fix refcount underflow in device unmap path
      - Handle PREFLUSH and NOUNMAP flags properly in protocol
      - Fix server-side bi_size for special IOs
      - Zero response buffer before use
      - Fix trace format for flags
      - Add .release to rnbd_dev_ktype

 - MD pull requests via Yu Kuai
      - Fix raid5_run() to return error when log_init() fails
      - Fix IO hang with degraded array with llbitmap
      - Fix percpu_ref not resurrected on suspend timeout in llbitmap
      - Fix GPF in write_page caused by resize race
      - Fix NULL pointer dereference in process_metadata_update
      - Fix hang when stopping arrays with metadata through dm-raid
      - Fix any_working flag handling in raid10_sync_request
      - Refactor sync/recovery code path, improve error handling for
        badblocks, and remove unused recovery_disabled field
      - Consolidate mddev boolean fields into mddev_flags
      - Use mempool to allocate stripe_request_ctx and make sure
        max_sectors is not less than io_opt in raid5
      - Fix return value of mddev_trylock
      - Fix memory leak in raid1_run()
      - Add Li Nan as mdraid reviewer

 - Move phys_vec definitions to the kernel types, mostly in preparation
   for some VFIO and RDMA changes

 - Improve the speed for secure erase for some devices

 - Various little rust updates

 - Various other minor fixes, improvements, and cleanups

* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
  blk-mq: ABI/sysfs-block: fix docs build warnings
  selftests: ublk: organize test directories by test ID
  block: decouple secure erase size limit from discard size limit
  block: remove redundant kill_bdev() call in set_blocksize()
  blk-mq: add documentation for new queue attribute async_dpeth
  block, bfq: convert to use request_queue->async_depth
  mq-deadline: covert to use request_queue->async_depth
  kyber: covert to use request_queue->async_depth
  blk-mq: add a new queue sysfs attribute async_depth
  blk-mq: factor out a helper blk_mq_limit_depth()
  blk-mq-sched: unify elevators checking for async requests
  block: convert nr_requests to unsigned int
  block: don't use strcpy to copy blockdev name
  blk-mq-debugfs: warn about possible deadlock
  blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
  blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
  blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
  blk-rq-qos: fix possible debugfs_mutex deadlock
  blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
  blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
  ...
2026-02-09 17:57:21 -08:00
Linus Torvalds
591beb0e3a io_uring-bpf-restrictions.4-20260206
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGJ1kQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpky8EAChIL3uJ5Vmv+oQTxT4EVb1wpc8U/XzXWU5
 Q5F9IpZZCGO7+i015Y7iTTqDRixjblRaWpWzZZP8vflWDUS8LESNZLQdcoEnxaiv
 P367KNPUGwxejcKsu8PvZvfnX6JWSQoNstcDmrwkCF0ND2UUfvvMZyn3uKhkbBRY
 h5Ehcqkvqc1OJDAWC7+yPzYAmB01uRPQ6sc9/GeujznHPlfbvie4u6gBvvfXeirT
 592zbVftINMrm6Twd6zl4n+HNAn+CUoyVMppeeddv5IcyFPm9uz/dLOZBXTz6552
 jFYNmB0U4g+SxGXMyqp37YISTALnuY+57y5eXmEAtgkEeE3HrF+F/ZdxQHwXSpo3
 T2Lb9IOqFyHtSvq678HZ37JB6aIYbBE/mZdNf8FFFpnPJGb5Ey7d50qPp/ywVq0H
 p9CahbpkzGUBMsZ+koew0YHiFdWV9tww+/Bnk5dTtn2197uyaHsLdmbf4C36GWke
 Bk5cwNgU+3DMFAfTiL9m+AIXYsJkBayRJn+hViTrF5AL7gcGiBryGF43FOSKoYuq
 f0mniDnGSwvn86VZPuZQ6wBRHZPEMR3OlaUXn6XrUU6cYyvMg0pBZV+QHF7zlsSP
 2sdfUbPL5TxexF3G8dsxlDIypz9Z6TCoUCfU0WiiUETnCrVNkXfIY846A+w08p0b
 ejBjzrwRtQ==
 =CqJq
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring bpf filters from Jens Axboe:
 "This adds support for both cBPF filters for io_uring, as well as task
  inherited restrictions and filters.

  seccomp and io_uring don't play along nicely, as most of the
  interesting data to filter on resides somewhat out-of-band, in the
  submission queue ring.

  As a result, things like containers and systemd that apply seccomp
  filters, can't filter io_uring operations.

  That leaves them with just one choice if filtering is critical -
  filter the actual io_uring_setup(2) system call to simply disallow
  io_uring. That's rather unfortunate, and has limited us because of it.

  io_uring already has some filtering support. It requires the ring to
  be setup in a disabled state, and then a filter set can be applied.
  This filter set is completely bi-modal - an opcode is either enabled
  or it's not. Once a filter set is registered, the ring can be enabled.
  This is very restrictive, and it's not useful at all to systemd or
  containers which really want both broader and more specific control.

  This first adds support for cBPF filters for opcodes, which enables
  tighter control over what exactly a specific opcode may do. As
  examples, specific support is added for IORING_OP_OPENAT/OPENAT2,
  allowing filtering on resolve flags. And another example is added for
  IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These
  are both common use cases. cBPF was chosen rather than eBPF, because
  the latter is often restricted in containers as well.

  These filters are run post the init phase of the request, which allows
  filters to even dip into data that is being passed in struct in user
  memory, as the init side of requests make that data stable by bringing
  it into the kernel. This allows filtering without needing to copy this
  data twice, or have filters etc know about the exact layout of the
  user data. The filters get the already copied and sanitized data
  passed.

  On top of that support is added for per-task filters, meaning that any
  ring created with a task that has a per-task filter will get those
  filters applied when it's created. These filters are inherited across
  fork as well. Once a filter has been registered, any further added
  filters may only further restrict what operations are permitted.

  Filters cannot change the return value of an operation, they can only
  permit or deny it based on the contents"

* tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: allow registration of per-task restrictions
  io_uring: add task fork hook
  io_uring/bpf_filter: add ref counts to struct io_bpf_filter
  io_uring/bpf_filter: cache lookup table in ctx->bpf_filters
  io_uring/bpf_filter: allow filtering on contents of struct open_how
  io_uring/net: allow filtering on IORING_OP_SOCKET data
  io_uring: add support for BPF filtering for opcode restrictions
2026-02-09 17:31:17 -08:00
Linus Torvalds
26c9342bb7 struct filename series
[mostly] sanitize struct filename hanling
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaYlcJgAKCRBZ7Krx/gZQ
 6xlKAP9c9J13sJ/mcobsj1Ov7nSHISNbnYqvRRCu09Wq3UQvJgEApNQYOEdLtpff
 zUnWOAQ0nOKY7w9VMLkRRustXpuGjAc=
 =Fld4
 -----END PGP SIGNATURE-----

Merge tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs 'struct filename' updates from Al Viro:
 "[Mostly] sanitize struct filename handling"

* tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (68 commits)
  sysfs(2): fs_index() argument is _not_ a pathname
  alpha: switch osf_mount() to strndup_user()
  ksmbd: use CLASS(filename_kernel)
  mqueue: switch to CLASS(filename)
  user_statfs(): switch to CLASS(filename)
  statx: switch to CLASS(filename_maybe_null)
  quotactl_block(): switch to CLASS(filename)
  chroot(2): switch to CLASS(filename)
  move_mount(2): switch to CLASS(filename_maybe_null)
  namei.c: switch user pathname imports to CLASS(filename{,_flags})
  namei.c: convert getname_kernel() callers to CLASS(filename_kernel)
  do_f{chmod,chown,access}at(): use CLASS(filename_uflags)
  do_readlinkat(): switch to CLASS(filename_flags)
  do_sys_truncate(): switch to CLASS(filename)
  do_utimes_path(): switch to CLASS(filename_uflags)
  chdir(2): unspaghettify a bit...
  do_fchownat(): unspaghettify a bit...
  fspick(2): use CLASS(filename_flags)
  name_to_handle_at(): use CLASS(filename_uflags)
  vfs_open_tree(): use CLASS(filename_uflags)
  ...
2026-02-09 16:58:28 -08:00
Linus Torvalds
9e355113f0 vfs-7.0-rc1.misc
Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49QAKCRCRxhvAZXjc
 ojrZAQD1VJzY46r5FnAVf4jlEHyjIbDnZCP/n+c4x6XnqpU6EQEAgB0yAtAGP6+u
 SBuytElqHoTT5VtmEXTAabCNQ9Ks8wo=
 =JwZz
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains a mix of VFS cleanups, performance improvements, API
  fixes, documentation, and a deprecation notice.

  Scalability and performance:

   - Rework pid allocation to only take pidmap_lock once instead of
     twice during alloc_pid(), improving thread creation/teardown
     throughput by 10-16% depending on false-sharing luck. Pad the
     namespace refcount to reduce false-sharing

   - Track file lock presence via a flag in ->i_opflags instead of
     reading ->i_flctx, avoiding false-sharing with ->i_readcount on
     open/close hot paths. Measured 4-16% improvement on 24-core
     open-in-a-loop benchmarks

   - Use a consume fence in locks_inode_context() to match the
     store-release/load-consume idiom, eliminating a hardware fence on
     some architectures

   - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent
     false-sharing

   - Remove a redundant DCACHE_MANAGED_DENTRY check in
     __follow_mount_rcu() that never fires since the caller already
     verifies it, eliminating a 100% mispredicted branch

   - Fix a 100% mispredicted likely() in devcgroup_inode_permission()
     that became wrong after a prior code reorder

  Bug fixes and correctness:

   - Make insert_inode_locked() wait for inode destruction instead of
     skipping, fixing a corner case where two matching inodes could
     exist in the hash

   - Move f_mode initialization before file_ref_init() in alloc_file()
     to respect the SLAB_TYPESAFE_BY_RCU ordering contract

   - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with
     no buffers attached, preventing a null pointer dereference when
     AS_RELEASE_ALWAYS is set but no release_folio op exists

   - Fix select restart_block to store end_time as timespec64, avoiding
     truncation of tv_sec on 32-bit architectures

   - Make dump_inode() use get_kernel_nofault() to safely access inode
     and superblock fields, matching the dump_mapping() pattern

  API modernization:

   - Make posix_acl_to_xattr() allocate the buffer internally since
     every single caller was doing it anyway. Reduces boilerplate and
     unnecessary error checking across ~15 filesystems

   - Replace deprecated simple_strtoul() with kstrtoul() for the
     ihash_entries, dhash_entries, mhash_entries, and mphash_entries
     boot parameters, adding proper error handling

   - Convert chardev code to use guard(mutex) and __free(kfree) cleanup
     patterns

   - Replace min_t() with min() or umin() in VFS code to avoid silently
     truncating unsigned long to unsigned int

   - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers
     already check the flag

  Deprecation:

   - Begin deprecating legacy BSD process accounting (acct(2)). The
     interface has numerous footguns and better alternatives exist
     (eBPF)

  Documentation:

   - Fix and complete kernel-doc for struct export_operations, removing
     duplicated documentation between ReST and source

   - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait()

  Testing:

   - Add a kunit test for initramfs cpio handling of entries with
     filesize > PATH_MAX

  Misc:

   - Add missing <linux/init_task.h> include in fs_struct.c"

* tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  posix_acl: make posix_acl_to_xattr() alloc the buffer
  fs: make insert_inode_locked() wait for inode destruction
  initramfs_test: kunit test for cpio.filesize > PATH_MAX
  fs: improve dump_inode() to safely access inode fields
  fs: add <linux/init_task.h> for 'init_fs'
  docs: exportfs: Use source code struct documentation
  fs: move initializing f_mode before file_ref_init()
  exportfs: Complete kernel-doc for struct export_operations
  exportfs: Mark struct export_operations functions at kernel-doc
  exportfs: Fix kernel-doc output for get_name()
  acct(2): begin the deprecation of legacy BSD process accounting
  device_cgroup: remove branch hint after code refactor
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()
  fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
  fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
  select: store end_time as timespec64 in restart block
  chardev: Switch to guard(mutex) and __free(kfree)
  namespace: Replace simple_strtoul with kstrtoul to parse boot params
  dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
  ...
2026-02-09 15:13:05 -08:00
Linus Torvalds
bcc8fd3e15 lsm/stable-7.0 PR 20260203
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmmCurkUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNDWA//RZxjjyY1I0GRDepJXJ8UFEVt4Fdr
 VsnSKL3o7sf0SAsQj2HCJsJPiwD5fHm2C2gdxh9rFC0bPpMbTVAkwUL7WhP+nkAt
 LA+UZKYurrk1XF6OctILoY3JcXmynb1Oe3lg6uVcWX5b1uEriqRgGKNcMYLb5fmr
 D1vZ9LMuZe8WwGTScprQID9FMrZ0TDbdI/vqG7si1W/PCFH7630MPJkmzmjPWvnV
 xJISKLOG+qbyWoNGLr+VaNjkmA+jPfsXAKWbfNXUGfikP8g/OHpFd70nIzJs8p7J
 dxZD7w6/kqSGhauQjcX8ov0zKxn83Z2Xt0+4Ldl5vOCWI3r4T3Y8WdarmULbq65n
 jIN8djDgmCJPqa5zuPmik+womaPk2GmSy1viEJdT4W0iHggTC1snOz1J+BbD+nkh
 uEZkmcCZbaeEQmfefxIyHDirrFsJvrunWupGrkfxvfFr+QU8H1xNLfMd6CQzvtI4
 P5p/KrnP2e58tJqvPxSY315ewUMy73kZU5DUl+Rq6Y4ai415R7vtwwEEkSKWnyja
 LMdEumc9IrsiBMcLmsj8QwobCr7XJtdCQV5ohR8CPxxcsI/G0pR99e1pckD7l7Qm
 OG461BKHntU3SFWSiZw+rNWlJuyPcSy5nmUxQvxQHP9pShZPu8rTfYX+CBzrHJk2
 OFjAwNJn1N/NfYI=
 =cCyp
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Unify the security_inode_listsecurity() calls in NFSv4

   While looking at security_inode_listsecurity() with an eye towards
   improving the interface, we realized that the NFSv4 code was making
   multiple calls to the LSM hook that could be consolidated into one.

 - Mark the LSM static branch keys as static - this helps resolve some
   sparse warnings

 - Add __rust_helper annotations to the LSM and cred wrapper functions

 - Remove the unsused set_security_override_from_ctx() function

 - Minor fixes to some of the LSM kdoc comment blocks

* tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: make keys for static branch static
  cred: remove unused set_security_override_from_ctx()
  rust: security: add __rust_helper to helpers
  rust: cred: add __rust_helper to helpers
  nfs: unify security_inode_listsecurity() calls
  lsm: fix kernel-doc struct member names
2026-02-09 10:16:48 -08:00
Linus Torvalds
698749164a audit/stable-7.0 PR 20260203
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmmCuoQUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMRFhAAntv4vmqRciFI4oEqxi5X8wmYmzc9
 BUQV2XXcfO63IOHdGrXmYHByx3+mZZddAPpYMTqrzA0p2NCqi4svCVwspUHwUcTY
 btl+xlppBJpBtUL5pmLiP6Q4u+zURYCwuA/OKfxuKa5Frm8D3kbkd5MpxJS15Mev
 qqEhLT0aj6/rjQpYVwOFGMwehKfE7iuyc8XTBaetvUKHW38sj18ANSpLnN5bmiuE
 3lz252kCjyDoOsu+vO0Saa8Rv8lVDjlSMn6mYr4L2fVygYwFDg2Gj7+bmB6LGYy9
 YyIm6P+b23E8GOltEObpvrz8ItPR7nvKNiDMEeP1eqGzQ/Mc5OqqljVaNMNPmP+s
 XN/jZt02XePKXlje+C08620mDVeIYp35TK1bY2/HrYMqySE0wwO1iSyBI4ftPFtu
 CteM8XA8oH49pspFWbEKCHmtFFGxDVjfVM7YrHeDc+qw2tJjZ7R1GRk5hadP1Ou7
 emxGLb6jfejT6NMNU8rM2RVmQNs1jcFh+8lHvDgqqQmaCXJd3AEgr+Om5w9kZ6fJ
 FyEkh0f9HuZdcEn8tqWaIwCAZXTzECOThj6hhxZGiG9xXFXza1eIXxq7VtIE6fdO
 ATAJ6cpcj0LIQmE7QWntS1NloPkWD1OSiWLNis0AgCiN97oRTk0oFJ5q7fD0bLUa
 HGAQqd4OJKmNciw=
 =pFeD
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:

 - Improve the NETFILTER_PKT audit records

   Add source and destination ports to the NETFILTER_PKT audit records
   while also consolidating a lot of the code into a new, singular
   audit_log_nf_skb() function. This new approach to structuring the
   NETFILTER_PKT record generation should eliminate some unnecessary
   overhead when audit is not built into the kernel.

 - Update the audit syscall classifier code

   Add the listxattrat(), getxattrat(), and fchmodat2() syscall to the
   audit code which classifies syscalls into categories of operations,
   e.g. "read" or "change attributes".

 - Move the syscall classifier declarations into audit_arch.h

   Shuffle around some header file declarations to resolve some sparse
   warnings.

* tag 'audit-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: move the compat_xxx_class[] extern declarations to audit_arch.h
  audit: add missing syscalls to read class
  audit: include source and destination ports to NETFILTER_PKT
  audit: add audit_log_nf_skb helper function
  audit: add fchmodat2() to change attributes class
2026-02-09 10:13:03 -08:00
Linus Torvalds
ef852baaf6 RCU changes for v7.0
RCU Tasks Trace:
 
 Re-implement RCU tasks trace in term of SRCU-fast, not only more than 500 lines
 of code are saved because of the reimplementation, a new set of API,
 rcu_read_{,un}lock_tasks_trace(), becomes possible as well. Compared to the
 previous rcu_read_{,un}lock_trace(), the new API avoid the task_struct accesses
 thanks to the SRCU-fast semantics. As a result, the old
 rcu_read{,un}lock_trace() API is now deprecated.
 
 RCU Torture Test:
 
 - Multiple improvements on kvm-series.sh (parallel run and progress showing
   metrics)
 - Add context checks to rcu_torture_timer().
 - Make config2csv.sh properly handle comments in .boot files.
 - Include commit discription in testid.txt.
 
 Miscellaneous RCU changes:
 
 - Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early.
 - Use suitable gfp_flags for the init_srcu_struct_nodes().
 - Fix rcu_read_unlock() deadloop due to softirq.
 - Correctly compute probability to invoke ->exp_current() in rcutorture.
 - Make expedited RCU CPU stall warnings detect stall-end races.
 
 RCU nocb:
 
 - Remove unnecessary WakeOvfIsDeferred wake path and callback overload
   handling.
 - Extract nocb_defer_wakeup_cancel() helper.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCAAvFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmmARZERHGJvcXVuQGtl
 cm5lbC5vcmcACgkQSXnow7UH+rh8SAf+PDIBWAkdbGgs32EfgpFY42RB4CWygH47
 YRup/M3+nU0JcBzNnona1srpHBXRySBJQbvRbsOdlM45VoNQ2wPjig/3vFVRUKYx
 uqj9Tze00DS74IIGESoTGp0amZde9SS9JakNRoEfTr+Zpj8N6LFERQw0ywUwjR5b
 RR6bz7q05TAl3u2BYUAgNdnf3VWWTmj4WYwArlQ+qRFAyGN+TVj8Ezra6+K5TJ7u
 SQYrf7WmRGOhHbVVolvVEOVdACccI8dFl3ebJVE2Ky0gp1o3BLPkcDLJ6gBdTCoE
 rRrbnkeqs5V7tOkPFDBeUhLPrm1QxrdEDxUQFWjSApbv161sx7AOZA==
 =O9QQ
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Boqun Feng:

 - RCU Tasks Trace:

   Re-implement RCU tasks trace in term of SRCU-fast, not only more than
   500 lines of code are saved because of the reimplementation, a new
   set of API, rcu_read_{,un}lock_tasks_trace(), becomes possible as
   well. Compared to the previous rcu_read_{,un}lock_trace(), the new
   API avoid the task_struct accesses thanks to the SRCU-fast semantics.

   As a result, the old rcu_read{,un}lock_trace() API is now deprecated.

 - RCU Torture Test:
    - Multiple improvements on kvm-series.sh (parallel run and
      progress showing metrics)
    - Add context checks to rcu_torture_timer()
    - Make config2csv.sh properly handle comments in .boot files
    - Include commit discription in testid.txt

 - Miscellaneous RCU changes:
    - Reduce synchronize_rcu() latency by reporting GP kthread's
      CPU QS early
    - Use suitable gfp_flags for the init_srcu_struct_nodes()
    - Fix rcu_read_unlock() deadloop due to softirq
    - Correctly compute probability to invoke ->exp_current()
      in rcutorture
    - Make expedited RCU CPU stall warnings detect stall-end races

 - RCU nocb:
    - Remove unnecessary WakeOvfIsDeferred wake path and callback
      overload handling
    - Extract nocb_defer_wakeup_cancel() helper

* tag 'rcu.release.v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (25 commits)
  rcu/nocb: Extract nocb_defer_wakeup_cancel() helper
  rcu/nocb: Remove dead callback overload handling
  rcu/nocb: Remove unnecessary WakeOvfIsDeferred wake path
  rcu: Reduce synchronize_rcu() latency by reporting GP kthread's CPU QS early
  srcu: Use suitable gfp_flags for the init_srcu_struct_nodes()
  rcu: Fix rcu_read_unlock() deadloop due to softirq
  rcutorture: Correctly compute probability to invoke ->exp_current()
  rcu: Make expedited RCU CPU stall warnings detect stall-end races
  rcutorture: Add --kill-previous option to terminate previous kvm.sh runs
  rcutorture: Prevent concurrent kvm.sh runs on same source tree
  torture: Include commit discription in testid.txt
  torture: Make config2csv.sh properly handle comments in .boot files
  torture: Make kvm-series.sh give run numbers and totals
  torture: Make kvm-series.sh give build numbers and totals
  torture: Parallelize kvm-series.sh guest-OS execution
  rcutorture: Add context checks to rcu_torture_timer()
  rcutorture: Test rcu_tasks_trace_expedite_current()
  srcu: Create an rcu_tasks_trace_expedite_current() function
  checkpatch: Deprecate rcu_read_{,un}lock_trace()
  rcu: Update Requirements.rst for RCU Tasks Trace
  ...
2026-02-09 09:46:26 -08:00
Linus Torvalds
dda5df9823 Miscellaneous MMCID fixes to address bugs and
performance regressions in the recent rewrite
 of the SCHED_MM_CID management code:
 
  - Fix livelock triggered by BPF CI testing
 
  - Fix hard lockup on weakly ordered systems
 
  - Simplify the dropping of CIDs in the exit path
    by removing an unintended transition phase.
 
  - Fix performance/scalability regression on a
    thread-pool benchmark by optimizing transitional
    CIDs when scheduling out.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmHDvQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hdPBAAgnl/L09wF8WCQLSoLrhr71FmS6fZApDB
 Rvov2be8tGJR0BsrJF5uOKTNjulqUIr0mfO73fdHZftdFuhm/WLnWjBO62GhCKMg
 d8kXOVZ7PudFN+QwL17pOAub8voh9s9/mceE/hZ3M5eNjXlG4sAcpyGvnrTLLYru
 rfzO48NOpy5NMfbxU5/f9nojfr2t8fhnpX2QjquOhEPpl/BeYzexTZK7h2IJXqTK
 tkU6IY9X8fT7y8LkKbTCIMJvEuWawHj1DSW2EiWNPJZkX+Hk5ZHttg28JjROavEy
 orgairCSCT/cOETKugfToFd0Z4WlmemY6Nk5Kyx//WiFQ/u0HHlFVgMJoJfQEovV
 MtIxLVygVbEoQyTszZyFUlTQjrnH8uKxXYhh1mX5wSj9lyDfpfJZycFFA2RpE4Rw
 /+pvH08BfR4FgpqTfojfgOnuK/575VsomaVghritoNW3bAie1kpnWIeBaXS8lL4O
 0pkK7XX8ng6hXuZTMxgXXfkfUB6oM1Yp1OZJAEzUvftsK0FQ5q3e0WxD+pdVza2s
 PfQPaA7bT/G7y8k4LIXm59/tPX2QWPwe0yci00NbyfWiOdxHSgS7crQO8E1+VAiq
 TcLGZNj/wFL6B5ghaiUIi22Mo+WnLX8fW+aiIjSiUQILmbNZXYmwtfEFsvsahh9W
 /RkE/WQ492E=
 =/PkF
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Miscellaneous MMCID fixes to address bugs and performance regressions
  in the recent rewrite of the SCHED_MM_CID management code:

   - Fix livelock triggered by BPF CI testing

   - Fix hard lockup on weakly ordered systems

   - Simplify the dropping of CIDs in the exit path by removing an
     unintended transition phase

   - Fix performance/scalability regression on a thread-pool benchmark
     by optimizing transitional CIDs when scheduling out"

* tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mmcid: Optimize transitional CIDs when scheduling out
  sched/mmcid: Drop per CPU CID immediately when switching to per task mode
  sched/mmcid: Protect transition on weakly ordered systems
  sched/mmcid: Prevent live lock on task to CPU mode transition
2026-02-07 09:10:42 -08:00
Breno Leitao
9cb8b0f289 workqueue: replace BUG_ON with panic in panic_on_wq_watchdog
Replace BUG_ON() with panic() in panic_on_wq_watchdog(). This is not
a bug condition but a deliberate forced panic requested by the user
via module parameters to crash the system for debugging purposes.

Using panic() instead of BUG_ON() makes this intent clearer and provides
more informative output about which threshold was exceeded and the actual
values, making it easier to diagnose the stall condition from crash dumps.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-07 06:54:42 -10:00
Breno Leitao
f84c9dd34e workqueue: add time-based panic for stalls
Add a new module parameter 'panic_on_stall_time' that triggers a panic
when a workqueue stall persists for longer than the specified duration
in seconds.

Unlike 'panic_on_stall' which counts accumulated stall events, this
parameter triggers based on the duration of a single continuous stall.
This is useful for catching truly stuck workqueues rather than
accumulating transient stalls.

Usage:
  workqueue.panic_on_stall_time=120

This would panic if any workqueue pool has been stalled for 120 seconds
or more.

The stall duration is measured from the workqueue last progress
(poll_ts) which accounts for legitimate system stalls.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-07 06:54:38 -10:00
Linus Torvalds
7e0b172c80 Misc objtool fixes:
- Bump up the Clang minimum version requirements
    for livepatch builds, due to Clang assembler
    section handling bugs causing silent
    miscompilations.
 
  - Strip livepatching symbol artifacts from
    non-livepatch modules.
 
  - Fix livepatch build warnings when certain
    Clang LTO options are enabled.
 
  - Fix livepatch build error when
    CONFIG_MEM_ALLOC_PROFILING_DEBUG=y.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmHCvwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hkTQ/8DhLI5m9CqhMaouuR5Vm9POKgFOXAe6uz
 eHTuKhpJlw+anhNjeUA7PtYbnkrj0j+aNo5SmrfD4Yx8CW7dCt+oO3Y5ziVhkPHw
 46Q/9KbmcT11uPbYywp/G4b15FF8YYu9slzhGau/Wa9H+oqU/WPoGapJsMPpUtBo
 s0qGxdr2G3WZyD9H/wgCyhMwCOkAYMJ0sHxpGgRajLefDutsRvtlau/ktYU79vI3
 nUFteD3YDIAUblBtZPogsCP36QJlx7TWCUNK02vPeOYRh3xPjf3iG+vgf1+sjZHV
 P20psekpDwhh1KyeVziUyihUy8TmEVVozRvsrUKVlXmEqLqDtNrysqKTSp5/yRdT
 MqgNwDrvv2wW/DKYJhefbuttx3ppAErrnJ3zC9TYSmdn27feKPDJcD1OmdS3BIpH
 x9u/eVbOS8xbsOc/t3/Al7CRazvjLU0+OXsMJWbmAaO3tE7SwHq2aOnGbLGjvwWC
 Ts1AYfNp4H41CLLFnmKR8q2t/DOBhefW8p3cR5U+cVQ7PdqKRT+TwKWVnCrbrBcJ
 71702IrqoghwUrmhtxdZR0jZLtwb80s8zqdbrHojXSbXUFfYwwHNNaW1NSEpdSr4
 W8xYfSPrK71OGZ6oh/v1Wce6D+mxqb6kYD8DRLgOdbxrnvLwUhGXxxNDjfXA0f4B
 abjGHlAyoCs=
 =cdu2
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar::

 - Bump up the Clang minimum version requirements for livepatch
   builds, due to Clang assembler section handling bugs causing
   silent miscompilations

 - Strip livepatching symbol artifacts from non-livepatch modules

 - Fix livepatch build warnings when certain Clang LTO options
   are enabled

 - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y

* tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool/klp: Fix unexported static call key access for manually built livepatch modules
  objtool/klp: Fix symbol correlation for orphaned local symbols
  livepatch: Free klp_{object,func}_ext data after initialization
  livepatch: Fix having __klp_objects relics in non-livepatch modules
  livepatch/klp-build: Require Clang assembler >= 20
2026-02-07 08:21:21 -08:00
Bjorn Helgaas
5b4e5be1cc Merge branch 'pci/controller/tegra'
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear down
  MSI domains to be built as modules (Aaron Kling)

- Export tegra_cpuidle_pcie_irqs_in_use(), which disables Tegra CC6 while
  PCI IRQs are in use, so pci-tegra can be built as a module (Aaron Kling)

- Allow pci-tegra to be built as a module (Aaron Kling)

* pci/controller/tegra:
  PCI: tegra: Allow building as a module
  cpuidle: tegra: Export tegra_cpuidle_pcie_irqs_in_use()
  irqdomain: Export irq_domain_free_irqs()
2026-02-06 17:09:50 -06:00
Amery Hung
0be08389c7 bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
Take care of rqspinlock error in bpf_local_storage_{map_free, destroy}()
properly by switching to bpf_selem_unlink_nofail().

Both functions iterate their own RCU-protected list of selems and call
bpf_selem_unlink_nofail(). In map_free(), to prevent infinite loop when
both map_free() and destroy() fail to remove a selem from b->list
(extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(),
also switch to hlist_for_each_entry_rcu() since we no longer iterate
local_storage->list under local_storage->lock.

bpf_selem_unlink() now becomes dedicated to helpers and syscalls paths
so reuse_now should always be false. Remove it from the argument and
hardcode it.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-12-ameryhung@gmail.com
2026-02-06 14:47:59 -08:00
Amery Hung
5d800f87d0 bpf: Support lockless unlink when freeing map or local storage
Introduce bpf_selem_unlink_nofail() to properly handle errors returned
from rqspinlock in bpf_local_storage_map_free() and
bpf_local_storage_destroy() where the operation must succeeds.

The idea of bpf_selem_unlink_nofail() is to allow an selem to be
partially linked and use atomic operation on a bit field, selem->state,
to determine when and who can free the selem if any unlink under lock
fails. An selem initially is fully linked to a map and a local storage.
Under normal circumstances, bpf_selem_unlink_nofail() will be able to
grab locks and unlink a selem from map and local storage in sequeunce,
just like bpf_selem_unlink(), and then free it after an RCU grace period.
However, if any of the lock attempts fails, it will only clear
SDATA(selem)->smap or selem->local_storage depending on the caller and
set SELEM_MAP_UNLINKED or SELEM_STORAGE_UNLINKED according to the
caller. Then, after both map_free() and destroy() see the selem and the
state becomes SELEM_UNLINKED, one of two racing caller can succeed in
cmpxchg the state from SELEM_UNLINKED to SELEM_TOFREE, ensuring no
double free or memory leak.

To make sure bpf_obj_free_fields() is done only once and when map is
still present, it is called when unlinking an selem from b->list under
b->lock.

To make sure uncharging memory is done only when the owner is still
present in map_free(), block destroy() from returning until there is no
pending map_free().

Since smap may not be valid in destroy(), bpf_selem_unlink_nofail()
skips bpf_selem_unlink_storage_nolock_misc() when called from destroy().
This is okay as bpf_local_storage_destroy() will return the remaining
amount of memory charge tracked by mem_charge to the owner to uncharge.
It is also safe to skip clearing local_storage->owner and owner_storage
as the owner is being freed and no users or bpf programs should be able
to reference the owner and using local_storage.

Finally, access of selem, SDATA(selem)->smap and selem->local_storage
are racy. Callers will protect these fields with RCU.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-11-ameryhung@gmail.com
2026-02-06 14:47:47 -08:00
Amery Hung
c8be3da147 bpf: Prepare for bpf_selem_unlink_nofail()
The next patch will introduce bpf_selem_unlink_nofail() to handle
rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be
partially unlinked from map or local storage. Save memory allocation
method in selem so that later an selem can be correctly freed even when
SDATA(selem)->smap is init to NULL.

In addition, keep track of memory charge to the owner in local storage
so that later bpf_selem_unlink_nofail() can return the correct memory
charge to the owner. Updating local_storage->mem_charge is protected by
local_storage->lock.

Finally, extract miscellaneous tasks performed when unlinking an selem
from local_storage into bpf_selem_unlink_storage_nolock_misc(). It will
be reused by bpf_selem_unlink_nofail().

This patch also takes the chance to remove local_storage->smap, which
is no longer used since commit f484f4a3e0 ("bpf: Replace bpf memory
allocator with kmalloc_nolock() in local storage").

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-10-ameryhung@gmail.com
2026-02-06 14:29:22 -08:00
Amery Hung
3417dffb58 bpf: Remove unused percpu counter from bpf_local_storage_map_free
Percpu locks have been removed from cgroup and task local storage. Now
that all local storage no longer use percpu variables as locks preventing
recursion, there is no need to pass them to bpf_local_storage_map_free().
Remove the argument from the function.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
2026-02-06 14:29:18 -08:00
Amery Hung
5254de7b96 bpf: Remove cgroup local storage percpu counter
The percpu counter in cgroup local storage is no longer needed as the
underlying bpf_local_storage can now handle deadlock with the help of
rqspinlock. Remove the percpu counter and related migrate_{disable,
enable}.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-8-ameryhung@gmail.com
2026-02-06 14:29:14 -08:00
Amery Hung
4a98c2efa6 bpf: Remove task local storage percpu counter
The percpu counter in task local storage is no longer needed as the
underlying bpf_local_storage can now handle deadlock with the help of
rqspinlock. Remove the percpu counter and related migrate_{disable,
enable}.

Since the percpu counter is removed, merge back bpf_task_storage_get()
and bpf_task_storage_get_recur(). This will allow the bpf syscalls and
helpers to run concurrently on the same CPU, removing the spurious
-EBUSY error. bpf_task_storage_get(..., F_CREATE) will now always
succeed with enough free memory unless being called recursively.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-7-ameryhung@gmail.com
2026-02-06 14:29:09 -08:00
Amery Hung
8dabe34b9d bpf: Change local_storage->lock and b->lock to rqspinlock
Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock
from raw_spin_lock to rqspinlock.

Finally, propagate errors from raw_res_spin_lock_irqsave() to syscall
return or BPF helper return.

In bpf_local_storage_destroy(), ignore return from
raw_res_spin_lock_irqsave() for now. A later patch will correctly
handle errors correctly in bpf_local_storage_destroy() so that it can
unlink selems even when failing to acquire locks.

For __bpf_local_storage_map_cache(), instead of handling the error,
skip updating the cache.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-6-ameryhung@gmail.com
2026-02-06 14:29:04 -08:00
Amery Hung
403e935f91 bpf: Convert bpf_selem_unlink to failable
To prepare changing both bpf_local_storage_map_bucket::lock and
bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to
failable. It still always succeeds and returns 0 until the change
happens. No functional change.

Open code bpf_selem_unlink_storage() in the only caller,
bpf_selem_unlink(), since unlink_map and unlink_storage must be done
together after all the necessary locks are acquired.

For bpf_local_storage_map_free(), ignore the return from
bpf_selem_unlink() for now. A later patch will allow it to unlink selems
even when failing to acquire locks.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
2026-02-06 14:28:59 -08:00
Amery Hung
fd103ffc57 bpf: Convert bpf_selem_link_map to failable
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_link_map() to failable. It still always succeeds and
returns 0 until the change happens. No functional change.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
2026-02-06 14:28:55 -08:00