Endpoint drivers use dw_pcie_ep_raise_msix_irq() to raise an MSI-X
interrupt to the host using a writel(), which generates a PCI posted write
transaction. There's no completion for posted writes, so the writel() may
return before the PCI write completes. dw_pcie_ep_raise_msix_irq() also
unmaps the outbound ATU entry used for the PCI write, so the write races
with the unmap.
If the PCI write loses the race with the ATU unmap, the write may corrupt
host memory or cause IOMMU errors, e.g., these when running fio with a
larger queue depth against nvmet-pci-epf:
arm-smmu-v3 fc900000.iommu: 0x0000010000000010
arm-smmu-v3 fc900000.iommu: 0x0000020000000000
arm-smmu-v3 fc900000.iommu: 0x000000090000f040
arm-smmu-v3 fc900000.iommu: 0x0000000000000000
arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x90000f040 ipa: 0x0
arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault" stag: 0x0
Flush the write by performing a readl() of the same address to ensure that
the write has reached the destination before the ATU entry is unmapped.
The same problem was solved for dw_pcie_ep_raise_msi_irq() in commit
8719c64e76 ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), but there
it was solved by dedicating an outbound iATU only for MSI. We can't do the
same for MSI-X because each vector can have a different msg_addr and the
msg_addr may be changed while the vector is masked.
Fixes: beb4641a78 ("PCI: dwc: Add MSI-X callbacks handler")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260211175540.105677-2-cassel@kernel.org
Endpoint drivers use dw_pcie_ep_raise_msi_irq() to raise MSI interrupts to
the host. After 8719c64e76 ("PCI: dwc: ep: Cache MSI outbound iATU
mapping"), dw_pcie_ep_raise_msi_irq() caches the Message Address from the
MSI Capability in ep->msi_msg_addr. But that Message Address is controlled
by the host, and it may change. For example, if:
- firmware on the host configures the Message Address and triggers an
MSI,
- a driver on the Endpoint raises the MSI via dw_pcie_ep_raise_msi_irq(),
which caches the Message Address,
- a kernel on the host reconfigures the Message Address and the host
kernel driver triggers another MSI,
dw_pcie_ep_raise_msi_irq() notices that the Message Address no longer
matches the cached ep->msi_msg_addr, warns about it, and returns error
instead of raising the MSI. The host kernel may hang because it never
receives the MSI.
This was seen with the nvmet_pci_epf_driver: the host UEFI performs NVMe
commands, e.g. Identify Controller to get the name of the controller,
nvmet-pci-epf posts the completion queue entry and raises an IRQ using
dw_pcie_ep_raise_msi_irq(). When the host boots Linux, we see a
WARN_ON_ONCE() from dw_pcie_ep_raise_msi_irq(), and the host kernel hangs
because the nvme driver never gets an IRQ.
Remove the warning when dw_pcie_ep_raise_msi_irq() notices that Message
Address has changed, remap using the new address, and update the
ep->msi_msg_addr cache.
Fixes: 8719c64e76 ("PCI: dwc: ep: Cache MSI outbound iATU mapping")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Koichiro Den <den@valinux.co.jp>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260210181225.3926165-2-cassel@kernel.org
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmmWuQwTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXnnHB/41Jji+y8FHe2SqpQhUOqHb6NDEr3GX
YpAybhz2IsBHVhbCQn789UiIcSr0UDR7wnVLAmXe+5eY/jRwNggIO3tFqLYn92pK
KSTNafgNbLxh3iKBxRsUy0b3JutjD2LytkpFj2KVbBsZfmRxCZmKIV/4V18rV+fA
uemvoqLwU7emEWkhZ24suHMHPVpv6xKs9O6gOrQ4+zXR0g//eMLDqb17uj8h+8sM
ZsPsMYeuOihXlvGeBRjbnWYjA1ODWGDvwR9VT+VU4+HWht/KSr15EGeXZdV2eZUt
e/8swbqOS94a2ZjOgStzVkcPqAF88t9zZ+gvYElTDzLlHjqbrZdpeDDt
=A7tT
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Wei Liu:
- Debugfs support for MSHV statistics (Nuno Das Neves)
- Support for the integrated scheduler (Stanislav Kinsburskii)
- Various fixes for MSHV memory management and hypervisor status
handling (Stanislav Kinsburskii)
- Expose more capabilities and flags for MSHV partition management
(Anatol Belski, Muminul Islam, Magnus Kulke)
- Miscellaneous fixes to improve code quality and stability (Carlos
López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh
Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros
Bizjak)
- PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka)
* tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits)
mshv: Handle insufficient root memory hypervisor statuses
mshv: Handle insufficient contiguous memory hypervisor status
mshv: Introduce hv_deposit_memory helper functions
mshv: Introduce hv_result_needs_memory() helper function
mshv: Add SMT_ENABLED_GUEST partition creation flag
mshv: Add nested virtualization creation flag
Drivers: hv: vmbus: Simplify allocation of vmbus_evt
mshv: expose the scrub partition hypercall
mshv: Add support for integrated scheduler
mshv: Use try_cmpxchg() instead of cmpxchg()
x86/hyperv: Fix error pointer dereference
x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV
Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
mshv: fix SRCU protection in irqfd resampler ack handler
mshv: make field names descriptive in a header struct
x86/hyperv: Update comment in hyperv_cleanup()
mshv: clear eventfd counter on irqfd shutdown
x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
...
dw_pcie_ep_set_bar() currently tears down existing inbound mappings only
when either the previous or the new struct pci_epf_bar uses submaps
(num_submap != 0). If both the old and new mappings are BAR Match Mode,
reprogramming the same ATU index is sufficient, so no explicit teardown
was needed.
However, some callers may reuse the same struct pci_epf_bar instance and
update it in place before calling set_bar() again. In that case
ep_func->epf_bar[bar] and the passed-in epf_bar can point to the same
object, so we cannot reliably distinguish BAR Match Mode -> BAR Match Mode
from Address Match Mode -> BAR Match Mode. As a result, the conditional
teardown based on num_submap becomes unreliable and existing inbound maps
may be left active.
Call dw_pcie_ep_clear_ib_maps() unconditionally before reprogramming the
BAR so that in-place updates are handled correctly.
This introduces a behavioral change in a corner case: if a BAR
reprogramming attempt fails (especially for the long-standing BAR Match
Mode -> BAR Match Mode update case), the previously programmed inbound
mapping will already have been torn down. This should be acceptable, since
the caller observes the error and should not use the BAR for any real
transactions in that case.
While at it, document that the existing update parameter check is
best-effort for in-place updates.
Fixes: cc839bef77 ("PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260202145407.503348-3-den@valinux.co.jp
dw_pcie_ep_clear_ib_maps() first checks whether the inbound mapping for a
BAR is in BAR Match Mode (tracked via ep_func->bar_to_atu[bar]). Once
found, the iATU region is disabled and the bookkeeping is cleared.
BAR Match Mode and Address Match Mode mappings are mutually exclusive for a
given BAR, so there is nothing left for the Address Match Mode teardown
path to do after the BAR Match Mode mapping has been removed.
Return early after clearing the BAR Match Mode mapping to avoid running the
Address Match Mode teardown path. This makes the helper's intention
explicit and helps detect incorrect use of pci_epc_set_bar().
Suggested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260202145407.503348-2-den@valinux.co.jp
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmmKJO4UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzotA/+OGSOPOs9hWd+OwNF5Dm2WA81yG/3
K3Jx5uMuPoSjduMbPhVcib02Mr6YDJTa6WlYNVa76ADs2G6HxcVMFHutlYudSVcl
umSF48FnyeH1LTba88dRoVj4DB47Cue+BfhYY2L0ZtxmjQq/NRuDFAaGBh54uNeF
Gcdgr52QlM01n1X6yKvl7vE9gPdcPH80L256ssHAm6oSOHI1SPc6gqEKUUD02f8G
FtzfTUAq/cWYjlY3VoS5GKtdHxFYuXqC5WfbURhJ11o/nVJY9k1Zx8n4eI1tmAtN
7q692xjWSQJZlzepOBBEyjFUpIiy80tZ43z2ptRRBeI/n/qMmGPAov/g4MzegBWG
IAEHTAp/xx1Wra1ynr7RNvYVcPpXm2TEim8gIGah9DkHbNgbu7ing+OO7DnQuyfD
2h4hGD2622o6uikqkwzVd4mYuIcFu7SA6yROZhFn83BRnz0QOQienDrDlvOB8XCV
EodLAOMc2KClvOmmriFMy11PH7MFFoXexV6KS83VfDJHi4+XzBsy0w6TXTohcA9s
JTPIkSWqf/u6SrdLjXlFGyyJ2/KCgRiXFIBhhtYBMhDuuO7nG+mcSVzMa1PT0s6C
PF+QoT7sJof/5VMJ4o3BgPrPkD3CQICrlt8XIt5I8ngsy6RZRQ5rt+pUix7Shcn8
DgcunuINYfQtkfw=
=LIjp
-----END PGP SIGNATURE-----
Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Don't try to enable Extended Tags on VFs since that bit is Reserved
and causes misleading log messages (Håkon Bugge)
- Initialize Endpoint Read Completion Boundary to match Root Port,
regardless of ACPI _HPX (Håkon Bugge)
- Apply _HPX PCIe Setting Record only to AER configuration, and only
when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
Tag and Relaxed Ordering settings (Håkon Bugge)
Resource management:
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Fix bridge window alignment with optional resources, where
additional alignment requirement was previously lost (Ilpo
Järvinen)
- Stop over-estimating bridge window size since they are now assigned
without any gaps between them (Ilpo Järvinen)
- Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
for nested bridges and endpoints (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim
failures (Ilpo Järvinen)
Driver binding:
- Fix device reference leak in pcie_port_remove_service() (Uwe
Kleine-König)
- Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
portdrv.c (Uwe Kleine-König)
- Convert portdrv to use pcie_port_bus_type.probe() and .remove()
callbacks so .probe() and .remove() can eventually be removed from
struct device_driver (Uwe Kleine-König)
Error handling:
- Clear stale errors on reporting agents upon probe so they don't
look like recent errors (Lukas Wunner)
- Add generic RAS tracepoint for hotplug events (Shuai Xue)
- Add RAS tracepoint for link speed changes (Shuai Xue)
Power management:
- Avoid redundant delay on transition from D3hot to D3cold if the
device was already in D3hot (Brian Norris)
- Prevent runtime suspend until devices are fully initialized to
avoid saving incompletely configured device state (Brian Norris)
Power control:
- Add power_on/off callbacks with generic signature to pwrseq,
tc9563, and slot drivers so they can be used by pwrctrl core
(Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver
(Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the
PCI core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan
Sadhasivam)
Virtualization:
- Remove an incorrect unlock in pci_slot_trylock() error handling
(Jinhui Guo)
- Lock the bridge device for slot reset (Keith Busch)
- Enable ACS after IOMMU configuration on OF platforms so ACS is
enabled an all devices; previously the first device enumerated
(typically a Root Port) didn't have ACS enabled (Manivannan
Sadhasivam)
- Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
work around hardware erratum; previously ACS SV was only
temporarily disabled, which worked for enumeration but not after
reset (Manivannan Sadhasivam)
Peer-to-peer DMA:
- Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
when removing the PCI device (Hou Tao)
- Remove incorrect p2pmem_alloc_mmap() warning about page refcount
(Hou Tao)
Endpoint framework:
- Add configfs sub-groups synchronously to avoid NULL pointer
dereference when racing with removal (Liu Song)
- Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
functions (Manikanta Maddireddy)
ASPEED PCIe controller driver:
- Add ASPEED Root Complex DT binding and driver (Jacky Chou)
Freescale i.MX6 PCIe controller driver:
- Add DT binding and driver support for an optional external refclock
in addition to the refclock from the internal PLL (Richard Zhu)
- Fix CLKREQ# control so host asserts it during enumeration and
Endpoints can use it afterwards to exit the L1.2 link state
(Richard Zhu)
NVIDIA Tegra PCIe controller driver:
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
down MSI domains to be built as modules (Aaron Kling)
- Allow pci-tegra to be built as a module (Aaron Kling)
NVIDIA Tegra194 PCIe controller driver:
- Relax Kconfig so tegra194 can be built for platforms beyond
Tegra194 (Vidya Sagar)
Qualcomm PCIe controller driver:
- Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)
- Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
schema (Krzysztof Kozlowski)
- Add DT binding and driver support for SA8255p Endpoint being
configured by firmware (Mrinmay Sarkar)
- Parse PERST# from all PCIe bridge nodes for future platforms that
will have PERST# in Switch Downstream Ports as well as in Root
Ports (Manivannan Sadhasivam)
Renesas RZ/G3S PCIe controller driver:
- Use pci_generic_config_write() since the writability provided by
the custom wrapper is unnecessary (Claudiu Beznea)
SOPHGO PCIe controller driver:
- Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
Amaoto)
Synopsys DesignWare PCIe controller driver:
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability, to allow removal of
Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities in platforms that can't support
them, so the PCI core automatically falls back to INTx (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
for drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
link is not up to avoid an unnecessary timeout (Manivannan
Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up
IRQs to trigger enumeration instead of waiting for link to be up
because the PCI core doesn't allocate bus number space for
hierarchies that might be attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming
it dynamically, which is slow and racy with respect to other
concurrent traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Allow DWC host controller driver probe to continue if device is not
found or found but inactive; only fail when there's an error with
the link (Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
are not accessible after PME_Turn_Off, simply wait 10ms instead of
polling for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA
ranges when necessary instead of failing (Samuel Holland)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint
Controllers that can update BAR inbound address translation without
requiring EPF driver to clear/reset the BAR first, and advertise it
for DWC-based Endpoints (Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that
can map multiple independent inbound regions in a single BAR,
implement subrange mapping, advertise it for DWC-based Endpoints,
and add Endpoint selftests for it (Koichiro Den)
- Make resizable BARs work for Endpoint multi-PF configurations;
previously it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
and Address Match Mode (Aksh Garg)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound
windows weren't programmed, and ECAM-related iATU entries weren't
restored after suspend/resume, so config accesses failed (Krishna
Chaitanya Chundru)
Miscellaneous:
- Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"
* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
PCI: Disable ACS SV for IDT 0x8090 switch
PCI: Disable ACS SV for IDT 0x80b5 switch
PCI: Cache ACS Capabilities register
PCI: Enable ACS after configuring IOMMU for OF platforms
PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
PCI: Use device_lock_assert() to verify device lock is held
PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
PCI: Fix pci_slot_lock () device locking
PCI: Fix pci_slot_trylock() error handling
PCI: Mark Nvidia GB10 to avoid bus reset
PCI: Mark ASM1164 SATA controller to avoid bus reset
PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
PCI/PME: Replace RMW of Root Status register with direct write
PCI/AER: Clear stale errors on reporting agents upon probe
PCI: Don't claim disabled bridge windows
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
...
- Add interrupt redirection infrastructure
Some PCI controllers use a single demultiplexing interrupt for the MSI
interrupts of subordinate devices.
This prevents setting the interrupt affinity of device interrupts, which
causes device interrupts to be delivered to a single CPU. That obviously is
counterproductive for multi-queue devices and interrupt balancing.
To work around this limitation the new infrastructure installs a dummy
irq_set_affinity() callback which captures the affinity mask and picks a
redirection target CPU out of the mask.
When the PCI controller demultiplexes the interrupts it invokes a new
handling function in the core, which either runs the interrupt handler in
the context of the target CPU or delegates it to irq_work on the target CPU.
- Utilize the interrupt redirection mechanism in the PCI DWC host controller
driver.
This allows affinity control for the subordinate device MSI interrupts
instead of being randomly executed on the CPU which runs the demultiplex
handler.
- Replace the binary 64-bit MSI flag with a DMA mask
Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability, but
implement less than 64 address bits. This breaks on platforms where such a
device is assigned an MSI address higher than what's supported.
With the binary 64-bit flag there is no other choice than disabling 64-bit
MSI support which leaves the device disfunctional.
By using a DMA mask the address limit of a device can be described
correctly which provides support for the above scenario.
- Make use of the DMA mask based address limit in the hda/intel and radeon
drivers to enable them on affected platforms.
- The usual small cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJyPsQHHRnbHhAa2Vy
bmVsLm9yZwAKCRCmGPVMDXSYocekEADAsS5FlUkFuBy6kODhl5J7b9/oqlL3IEnR
3CdOrFO716dce+Gej+Wp3T93dJ3XsfD7nCZuy99+LwUkTubmaBJXfjY9S+Ket0ID
Wc3ltiD6f3GEFB14rXN+fFG/u+OOLkaXdpbQpiTnqL4JAti9qF80D4uon28+FC/o
wc1MhqVBPbOHU9iM196ngkZuXCNVPLcnZN6PNBgIn0sxx06LcK+daY0bNGxfn5Ua
LY9SD8hN7tYlkDi42nB/ZXMrexqT9cxSqHObmPX+G/QLfXCRBtD+gyVbs+KVzpRL
hmFERTlUh9tUdcQFrjgiZP/r4N5ilzsu6w5ZpSOEsGuahFUPZWJWFFC1D8rmq/Ay
X9HKge1jqXJtbCf0pJM/kdbJKSH5S6aLP3iF37y+PqITIEIX8jIT3oVcvL9hI0BW
HFxpuJfhAVg63kMegZCO/iROTusLHUZr8iwYOM7pEiCE6fP46jPijsPffVIWvrlJ
2LVOv/A5wy9q8FW8sF9/M6CW7cdeYQF06Ce3qAyMxjZjEyR3KFBJCVWjhqyMxZJP
3zFl1XXKXgRO+CDrYKVTPIaXR5D76k/l6MnECQpq81CQyQKm2h6A9PyY+n70FfbZ
BimakUlBGCd92ZbSxzC9pAOiHo0ZoKtc5BhnsRhKVyBCmEKDazEplDuf49/OSZUE
p2kaf/PuOw==
=SCSQ
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for the [PCI] MSI subsystem:
- Add interrupt redirection infrastructure
Some PCI controllers use a single demultiplexing interrupt for the
MSI interrupts of subordinate devices.
This prevents setting the interrupt affinity of device interrupts,
which causes device interrupts to be delivered to a single CPU.
That obviously is counterproductive for multi-queue devices and
interrupt balancing.
To work around this limitation the new infrastructure installs a
dummy irq_set_affinity() callback which captures the affinity mask
and picks a redirection target CPU out of the mask.
When the PCI controller demultiplexes the interrupts it invokes a
new handling function in the core, which either runs the interrupt
handler in the context of the target CPU or delegates it to
irq_work on the target CPU.
- Utilize the interrupt redirection mechanism in the PCI DWC host
controller driver.
This allows affinity control for the subordinate device MSI
interrupts instead of being randomly executed on the CPU which runs
the demultiplex handler.
- Replace the binary 64-bit MSI flag with a DMA mask
Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability,
but implement less than 64 address bits. This breaks on platforms
where such a device is assigned an MSI address higher than what's
supported.
With the binary 64-bit flag there is no other choice than disabling
64-bit MSI support which leaves the device disfunctional.
By using a DMA mask the address limit of a device can be described
correctly which provides support for the above scenario.
- Make use of the DMA mask based address limit in the hda/intel and
radeon drivers to enable them on affected platforms
- The usual small cleanups and improvements"
* tag 'irq-msi-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ALSA: hda/intel: Make MSI address limit based on the device DMA limit
drm/radeon: Make MSI address limit based on the device DMA limit
PCI/MSI: Check the device specific address mask in msi_verify_entries()
PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask
genirq/redirect: Prevent writing MSI message on affinity change
PCI/MSI: Unmap MSI-X region on error
genirq: Update effective affinity for redirected interrupts
PCI: dwc: Enable MSI affinity support
PCI: dwc: Code cleanup
genirq: Add interrupt redirection infrastructure
genirq/msi: Correct kernel-doc in <linux/msi.h>
- Relax Kconfig so tegra194 can be built for platforms beyond Tegra194
(Vidya Sagar)
* pci/controller/tegra194:
PCI: dwc: tegra194: Broaden architecture dependency
- Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear down
MSI domains to be built as modules (Aaron Kling)
- Export tegra_cpuidle_pcie_irqs_in_use(), which disables Tegra CC6 while
PCI IRQs are in use, so pci-tegra can be built as a module (Aaron Kling)
- Allow pci-tegra to be built as a module (Aaron Kling)
* pci/controller/tegra:
PCI: tegra: Allow building as a module
cpuidle: tegra: Export tegra_cpuidle_pcie_irqs_in_use()
irqdomain: Export irq_domain_free_irqs()
- Use pci_generic_config_write(), not custom wrapper, since we don't need
the writability provided by the wrapper (Claudiu Beznea)
- Drop lock around RZG3S_PCI_MSIRS and RZG3S_PCI_PINTRCVIS updates since
they are RW1C registers (Claudiu Beznea)
- Fix a device node reference leak in rzg3s_pcie_host_parse_port() (Felix
Gu)
* pci/controller/rzg3s-host:
PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
PCI: rzg3s-host: Drop the lock on RZG3S_PCI_MSIRS and RZG3S_PCI_PINTRCVIS
PCI: rzg3s-host: Use pci_generic_config_write() for the root bus
- Use regulator APIs to control the 3v3 power supply of PCIe slots (Hal
Feng)
* pci/controller/plda-starfive:
PCI: starfive: Use regulator APIs to control the 3v3 power supply of PCIe slots
- Add DT binding and driver support for SA8255p Endpoint being managed by
firmware (Mrinmay Sarkar)
* pci/controller/dwc-qcom-ep:
PCI: qcom-ep: Add support for firmware-managed PCIe Endpoint
dt-bindings: PCI: qcom,sa8255p-pcie-ep: Document firmware managed PCIe endpoint
- Parse PERST# from all PCIe bridge nodes for future platforms that will
have PERST# in Switch Downstream Ports as well as in Root Ports
(Manivannan Sadhasivam)
- Rename qcom PERST# assert/deassert helpers, e.g., qcom_ep_reset_assert(),
to avoid confusion with Endpoint interfaces (Manivannan Sadhasivam)
* pci/controller/dwc-qcom:
PCI: qcom: Rename PERST# assert/deassert helpers for uniformity
PCI: qcom: Parse PERST# from all PCIe bridge nodes
# Conflicts:
# drivers/pci/controller/dwc/pcie-qcom.c
- Add DT binding and driver support for an optional external refclock in
addition to the refclock from the internal PLL (Richard Zhu)
- Apply i.MX95 ERR051586 erratum workaround (release CLKREQ# so endpoint
can assert it when required) during resume (Richard Zhu)
- Enable i.MX95 REFCLK by overriding CLKREQ# so it's driven by default
(Richard Zhu)
- Clear CLKREQ# override if link is up and DT says 'supports-clkreq' so
endpoints can use CLKREQ# to exit the L1.2 state (Richard Zhu)
* pci/controller/dwc-imx6:
PCI: imx6: Clear CLKREQ# override if 'supports-clkreq' DT property is available
PCI: imx6: Add CLKREQ# override to enable REFCLK for i.MX95 PCIe
PCI: dwc: Invoke post_init in dw_pcie_resume_noirq()
PCI: imx6: Add external reference clock input mode support
dt-bindings: PCI: pci-imx6: Add external reference clock input
dt-bindings: PCI: dwc: Add external reference clock input
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability (Qiang Yu)
- Add dw_pcie_remove_capability() and dw_pcie_remove_ext_capability() to
remove Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities for DWC controllers in platforms that
can't support them, so we automatically fall back to INTx (Qiang Yu)
- Remove MSI-X and DPC Capabilities for Qualcomm platforms that advertise
but don't support them (Qiang Yu)
- Remove duplicate dw_pcie_ep_hide_ext_capability() function and replace
with dw_pcie_remove_ext_capability() (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status for
drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link
is not up to avoid an unnecessary timeout (Manivannan Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up IRQs to
trigger enumeration instead of waiting for link to be up because the PCI
core doesn't allocate bus number space for hierarchies that might be
attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming it
dynamically, which is slow and racy with respect to other concurrent
traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Make dw_pcie_ltssm_status_string() available and use it for logging
errors in dw_pcie_wait_for_link() (Manivannan Sadhasivam)
- Return -ENODEV when dw_pcie_wait_for_link() finds no devices, -EIO for
device present but inactive, -ETIMEDOUT for other failures, so callers
can handle these cases differently (Manivannan Sadhasivam)
- Allow DWC host controller driver probe to continue if device is not found
or found but inactive; only fail when there's an error with the link
(Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers are
not accessible after PME_Turn_Off, simply wait 10ms instead of polling
for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA ranges when
necessary instead of failing (Samuel Holland)
- Rename struct dw_pcie_rp.has_msi_ctrl to .use_imsi_rx for clarity (Qiang
Yu)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint Controllers that
can update BAR inbound address translation without requiring EPF driver
to clear/reset the BAR first, and advertise it for DWC-based Endpoints
(Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that can
map multiple independent inbound regions in a single BAR, implement
subrange mapping, advertise it for DWC-based Endpoints, and add Endpoint
selftests for it (Koichiro Den)
- Allow overriding default BAR sizes for pci-epf-test (Niklas Cassel)
- Make resizable BARs work for Endpoint multi-PF configurations; previously
it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings, and
Address Match Mode (Aksh Garg)
- Fix issues with outbound iATU index assignment that caused iATU index to
be out of bounds (Niklas Cassel)
- Clean up iATU index tracking to be consistent (Niklas Cassel)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound windows
weren't programmed, and ECAM-related iATU entries weren't restored after
suspend/resume, so config accesses failed (Krishna Chaitanya Chundru)
* pci/controller/dwc:
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
PCI: dwc: Fix msg_atu_index assignment
PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup
PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support
PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations
PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes
selftests: pci_endpoint: Add BAR subrange mapping test case
misc: pci_endpoint_test: Add BAR subrange mapping test case
PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support
Documentation: PCI: endpoint: Clarify pci_epc_set_bar() usage
PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU
PCI: dwc: Advertise dynamic inbound mapping support
PCI: endpoint: Add BAR subrange mapping support
PCI: endpoint: Add dynamic_inbound_mapping EPC feature
PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity
PCI: dwc: Fix grammar and formatting for comment in dw_pcie_remove_ext_capability()
PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ranges
PCI: dwc: Remove duplicate dw_pcie_ep_hide_ext_capability() function
PCI: dwc: Skip waiting for L2/L3 Ready if dw_pcie_rp::skip_l23_wait is true
PCI: dwc: Fail dw_pcie_host_init() if dw_pcie_wait_for_link() returns -ETIMEDOUT
PCI: dwc: Rework the error print of dw_pcie_wait_for_link()
PCI: dwc: Rename and move ltssm_status_string() to pcie-designware.c
PCI: dwc: Return -EIO from dw_pcie_wait_for_link() if device is not active
PCI: dwc: Return -ENODEV from dw_pcie_wait_for_link() if device is not found
PCI: dwc: Use cfg0_base as iMSI-RX target address to support 32-bit MSI devices
PCI: dwc: ep: Cache MSI outbound iATU mapping
Revert "PCI: dwc: Don't wait for link up if driver can detect Link Up event"
Revert "PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt"
Revert "PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported"
Revert "PCI: qcom: Don't wait for link if we can detect Link Up"
Revert "PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ"
Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"
PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up
PCI: dw-rockchip: Change get_ltssm() to provide L1 Substates info
PCI: dwc: Add L1 Substates context to ltssm_status of debugfs
PCI: qcom: Remove DPC Extended Capability
PCI: qcom: Remove MSI-X Capability for Root Ports
PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller
PCI: dwc: Add new APIs to remove standard and extended Capability
PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
- Add config guards to fix build error when sg2042 is a module but j721e is
built-in (Siddharth Vadapalli)
* pci/controller/cadence-j721e:
PCI: j721e: Add config guards for Cadence Host and Endpoint library APIs
- Fix cdns_pcie_host_dma_ranges_cmp() to prevent possible invalid sort
order (Ian Rogers)
* pci/controller/cadence:
PCI: cadence: Avoid signed 64-bit truncation and invalid sort
When pci_host_common_ecam_create() calls of_address_to_resource(), it
assumes all errors are due to a missing "reg" property in the device tree
node, when they may be due to a malformed "reg" property.
This can manifest when running the qemu "virt" board with a 32-bit kernel
and `highmem=on` and leads to the very confusing error message:
pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
pci-host-generic 4010000000.pcie: IO 0x003eff0000..0x003effffff -> 0x0000000000
pci-host-generic 4010000000.pcie: MEM 0x0010000000..0x003efeffff -> 0x0010000000
pci-host-generic 4010000000.pcie: MEM 0x8000000000..0xffffffffff -> 0x8000000000
pci-host-generic 4010000000.pcie: missing "reg" property
pci-host-generic 4010000000.pcie: probe with driver pci-host-generic failed with error -75
Make the error message more generic.
Link: https://www.qemu.org/docs/master/system/arm/virt.html
Signed-off-by: Jess <jess@jessie.cafe>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20260120004444.191093-1-jess@jessie.cafe
In rzg3s_pcie_host_parse_port(), of_get_next_child() returns a device node
with an incremented reference count that must be released with
of_node_put(). The current code fails to call of_node_put() which causes a
reference leak.
Use the __free(device_node) attribute to ensure automatic cleanup when the
variable goes out of scope.
Fixes: 7ef502fb35 ("PCI: Add Renesas RZ/G3S host controller driver")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260204-rzg3s-v1-1-142bc81c3312@gmail.com
When ECAM is enabled, the driver skipped calling dw_pcie_iatu_setup()
before configuring ECAM iATU entries. This left IO and MEM outbound
windows unprogrammed, resulting in broken IO transactions. Additionally,
dw_pcie_config_ecam_iatu() was only called during host initialization,
so ECAM-related iATU entries were not restored after suspend/resume,
leading to failures in configuration space access
To resolve these issues, move the ECAM iATU configuration to
dw_pcie_iatu_setup(), and invoke dw_pcie_iatu_setup() when ECAM is
enabled.
Furthermore, add error checks in dw_pcie_prog_outbound_atu() and
dw_pcie_prog_inbound_atu() such that an error is returned if the caller is
trying to program an iATU that is outside the number of iATUs supported by
the controller.
Fixes: f6fd357f7a ("PCI: dwc: Prepare the driver for enabling ECAM mechanism using iATU 'CFG Shift Feature'")
Reported-by: Maciej W. Rozycki <macro@orcam.me.uk>
Closes: https://lore.kernel.org/all/alpine.DEB.2.21.2511280256260.36486@angie.orcam.me.uk/
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Co-developed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[mani: used imperative tone]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable+noautosel@kernel.org # depends on Clean up iATU index usage in dw_pcie_iatu_setup()
Link: https://patch.msgid.link/20260127151038.1484881-8-cassel@kernel.org
The current iATU index usage in dw_pcie_iatu_setup() is a mess.
For outbound address translation the index is incremented before usage.
For inbound address translation the index is incremented after usage.
Incrementing the index after usage make much more sense, and make the
index usage consistent for both outbound and inbound address translation.
Most likely, the overly complicated logic for the outbound address
translation is because the iATU at index 0 is reserved for CFG IOs
(dw_pcie_other_conf_map_bus()), however, we should be able to use the
exact same logic for the indexing of the outbound and inbound iATUs.
(Only the starting index should be different.)
Create two new variables ob_iatu_index and ib_iatu_index, which makes
it more clear from the name itself that it is a zeroes based index,
and only increment the index if the iATU configuration call succeeded.
Since we always check if there is an index available immediately before
programming the iATU, we can remove the useless "ranges exceed outbound
iATU size" warnings, as the code is already unreachable. For the same
reason, we can also remove the useless breaks outside of the while loops.
No functional changes intended.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260127151038.1484881-7-cassel@kernel.org
When dw_pcie_iatu_setup() configures outbound address translation for both
type PCIE_ATU_TYPE_MEM and PCIE_ATU_TYPE_IO, the iATU index to use is
incremented before calling dw_pcie_prog_outbound_atu().
However for msg_atu_index, the index is not incremented before use,
causing the iATU index to be the same as the last configured iATU index,
which means that it will incorrectly use the same iATU index that is
already in use, breaking outbound address translation.
In total there are three problems with this code:
-It assigns msg_atu_index the same index that was used for the last
outbound address translation window, rather than incrementing the index
before assignment.
-The index should only be incremented (and msg_atu_index assigned) if the
use_atu_msg feature is actually requested/in use (pp->use_atu_msg is set).
-If the use_atu_msg feature is requested/in use, and there are no outbound
iATUs available, the code should return an error, as otherwise when this
this feature is used, it will use an iATU index that is out of bounds.
Fixes: e1a4ec1a95 ("PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system suspend")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hans Zhang <zhanghuabing@ecosda.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260127151038.1484881-6-cassel@kernel.org
The pci-hyperv-intf driver has unnecessary empty module_init and
module_exit functions. Remove them. Note that if a module_init function
exists, a module_exit function must also exist; otherwise, the module
cannot be unloaded.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Field pci_bus in struct hv_pcibus_device is unused since
commit 418cb6c8e0 ("PCI: hv: Generify PCI probing"). Remove it.
No functional change.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Reviewed-by: Srivatsa S. Bhat (Microsoft) <srivatsa@csail.mit.edu>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
PCIe r6.0, section 7.9.15 requires PTM capability in exactly one
function to control all PTM-capable functions. This makes PTM registers
controller level rather than per-function.
Add a comment explaining why PTM capability registers are accessed
using the standard DBI accessors instead of func_no indexed
per-function accessors.
Suggested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260130115516.515082-4-a-garg7@ti.com
The commit 24ede430fa ("PCI: designware-ep: Add multiple PFs support
for DWC") added support for multiple PFs in the DWC driver, but the
implementation was incomplete. It did not properly support MSI/MSI-X,
as well as BAR and inbound ATU mapping for multiple PFs. The MSI/MSI-X
issue was later fixed by commit 47a062609a ("PCI: designware-ep:
Modify MSI and MSIX CAP way of finding") by introducing a per-PF
struct dw_pcie_ep_func.
However, even with both commits, the multiple PF support in the driver
remains broken because BAR configuration and ATU mappings are managed
globally in struct dw_pcie_ep, meaning all PFs share the same BAR-to-ATU
mapping table. This causes one PF's EPF to overwrite the address
translation of another PF's EPF in the internal ATU region,
creating conflicts when multiple physical functions attempt to
configure their BARs independently.
The commit cfbc98dbf44d ("PCI: dwc: ep: Support BAR subrange inbound
mapping via Address Match Mode iATU") later introduced Address Match
Mode support, which suffers from the same multi-PF conflict issue.
Fix this by moving the required members from struct dw_pcie_ep to
struct dw_pcie_ep_func, similar to what commit 47a062609a
("PCI: designware-ep: Modify MSI and MSIX CAP way of finding") did for
MSI/MSI-X capability support, to allow proper multi-function endpoint
operation, where each PF can configure its BARs and corresponding
internal ATU region without interfering with other PFs.
Fixes: 24ede430fa ("PCI: designware-ep: Add multiple PFs support for DWC")
Fixes: cc839bef77 ("PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU")
Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260130115516.515082-3-a-garg7@ti.com
Currently, s32g_pcie_parse_ports() exercises the 'err_port' path even
during the success case. This results in ports getting deleted after
successful parsing of Root Ports.
Hence, skip the removal of Root Ports during success.
Fixes: 5cbc7d3e31 ("PCI: s32g: Add NXP S32G PCIe controller driver (RC)")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[mani: reworded subject and description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260202151050.1446165-1-vincent.guittot@linaro.org
The resizable BAR support added by the commit 3a3d4cabe6 ("PCI: dwc: ep:
Allow EPF drivers to configure the size of Resizable BARs") incorrectly
configures the resizable BARs only for the first Physical Function (PF0)
in EP mode.
The resizable BAR configuration functions use generic dw_pcie_*_dbi()
operations instead of physical function specific dw_pcie_ep_*_dbi()
operations. This causes resizable BAR configuration to always target
PF0 regardless of the requested function number.
Additionally, dw_pcie_ep_init_non_sticky_registers() only initializes
resizable BAR registers for PF0, leaving other PFs unconfigured during
the execution of this function.
Fix this by using physical function specific configuration space access
operations throughout the resizable BAR code path and initializing
registers for all the physical functions that support resizable BARs.
Fixes: 3a3d4cabe6 ("PCI: dwc: ep: Allow EPF drivers to configure the size of Resizable BARs")
Signed-off-by: Aksh Garg <a-garg7@ti.com>
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260130115516.515082-2-a-garg7@ti.com
Extend dw_pcie_ep_set_bar() to support inbound mappings for BAR
subranges using Address Match Mode IB iATU when pci_epf_bar.num_submap
is non-zero.
Rename the existing BAR-match helper into dw_pcie_ep_ib_atu_bar() and
introduce dw_pcie_ep_ib_atu_addr() for Address Match Mode. When
num_submap is non-zero, read the assigned BAR base address and program
one inbound iATU window per subrange. Validate the submap array before
programming:
- each subrange is aligned to pci->region_align
- subranges cover the whole BAR (no gaps and no overlaps)
Track Address Match Mode mappings and tear them down on clear_bar() and
on set_bar() error paths to avoid leaving half-programmed state or
untranslated BAR holes.
Advertise this capability by extending the common feature bit
initializer macro (DWC_EPC_COMMON_FEATURES).
This enables multiple inbound windows within a single BAR, which is
useful on platforms where usable BARs are scarce but EPFs need multiple
inbound regions.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260124145012.2794108-5-den@valinux.co.jp
The DesignWare EP core has supported updating the inbound iATU mapping
for an already configured BAR (i.e. allowing pci_epc_set_bar() to be
called again without a prior pci_epc_clear_bar()) since
commit 4284c88fff ("PCI: designware-ep: Allow pci_epc_set_bar() update
inbound map address").
Now that this capability is exposed via the dynamic_inbound_mapping EPC
feature bit, set it for DWC-based EP glue drivers using a common
initializer macro to avoid duplicating the same flag in each driver.
Note that pci-layerscape-ep.c is untouched. It currently constructs the
feature struct dynamically in ls_pcie_ep_init(). Once converted to a
static feature definition, it will use DWC_EPC_COMMON_FEATURES as well.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260124145012.2794108-4-den@valinux.co.jp
The current "has_msi_ctrl" flag name is misleading because it suggests the
presence of any MSI controller, while it is specifically set for platforms
that lack .msi_init() callback and don't have "msi-parent" or "msi-map"
device tree properties, indicating they rely on the iMSI-RX module for MSI
functionality.
Rename it to "use_imsi_rx" to make the intent clear:
- When true: Platform uses the iMSI-RX module for MSI handling
- When false: Platform has other MSI controller support (ITS/MBI, external
MSI controller)
No functional changes, only improves code readability and eliminates
naming confusion.
Signed-off-by: Qiang Yu <qiang.yu@oss.qualcomm.com>
[mani: renamed 'uses_imsi_rx' to 'use_imsi_rx' per https://lore.kernel.org/linux-pci/09f9acc1-d1ad-4971-8488-f0268cf08799@rock-chips.com]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/20260121-remove_cap_clean_up-v1-2-e78115e5d467@oss.qualcomm.com
Fix a grammatical error in the comment by changing "it's" to "its". Also
add a blank line after the variable declaration for better code
formatting.
Signed-off-by: Qiang Yu <qiang.yu@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/20260121-remove_cap_clean_up-v1-1-e78115e5d467@oss.qualcomm.com
The DWC driver tries to use a single iATU region for mapping the individual
entries of the bridge window and DMA range. If a bridge window/DMA range is
larger than the iATU inbound/outbound window size, then the mapping will
fail.
Hence, avoid this failure by using multiple iATU windows to map the whole
region. If the region runs out of iATU windows, then return failure.
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Co-developed-by: Randolph Lin <randolph@andestech.com>
Signed-off-by: Randolph Lin <randolph@andestech.com>
[mani: reworded description, minor code cleanup]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Charles Mirabile <cmirabil@redhat.com>
Link: https://patch.msgid.link/20260109113430.2767264-1-randolph@andestech.com
Remove dw_pcie_ep_hide_ext_capability() and replace its usage with
dw_pcie_remove_ext_capability(). Both functions serve the same purpose
of hiding PCIe extended capabilities, but dw_pcie_remove_ext_capability()
provides a cleaner API that doesn't require the caller to specify the
previous capability ID.
Suggested-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Qiang Yu <qiang.yu@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20251224-remove_dw_pcie_ep_hide_ext_capability-v1-1-4302c9cdc316@oss.qualcomm.com
In NXP i.MX6QP and i.MX7D SoCs, LTSSM registers are not accessible once
PME_Turn_Off message is broadcasted to the link. So there is no way to
verify whether the link has entered L2/L3 Ready state or not.
Hence, add a new flag 'dw_pcie_rp::skip_l23_ready' and set it to 'true' for
the above mentioned SoCs. This flag when set, will allow the DWC core to
skip polling for L2/L3 Ready state and just wait for 10ms as recommended in
the PCIe spec r6.0, sec 5.3.3.2.1.
Fixes: a528d1a725 ("PCI: imx6: Use DWC common suspend resume method")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: renamed flag to skip_l23_ready and reworded description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260114083300.3689672-2-hongxing.zhu@nxp.com
The dw_pcie_wait_for_link() API now distinguishes link failures more
precisely:
-ENODEV: Device not found on the bus.
-EIO: Device found but inactive.
-ETIMEDOUT: Link failed to come up.
Out of these three errors, only -ETIMEDOUT represents a definitive link
failure since it signals that something is wrong with the link. For the
other two errors, there is a possibility that the link might come up later.
So fail dw_pcie_host_init() if -ETIMEDOUT is returned and skip the failure
otherwise.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-5-2f32d5082549@oss.qualcomm.com
For the cases where the link cannot come up later i.e., when LTSSM is not
in Detect.{Quiet/Active} or Poll.{Active/Compliance} states,
dw_pcie_wait_for_link() should log an error.
So promote dev_info() to dev_err(), reword the error log to make it clear
and also print the LTSSM state to aid debugging.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-4-2f32d5082549@oss.qualcomm.com
Rename ltssm_status_string() to dw_pcie_ltssm_status_string() and move it
to the common file pcie-designware.c so that this function could be used
outside of pcie-designware-debugfs.c file.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-3-2f32d5082549@oss.qualcomm.com
There are cases where the PCIe device would be physically connected to the
bus, but the device firmware might not be active. So the LTSSM will
get stuck in POLL.{Active/Compliance} states.
This behavior is common with endpoint devices controlled by the PCI
Endpoint framework, where the device will wait for the user to start its
operation through configfs.
For those cases, print the relevant log and return -EIO to indicate that
the device is present, but not active. This will allow the callers to skip
the failure as the device might become active in the future.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-2-2f32d5082549@oss.qualcomm.com