linux

mirror of https://github.com/torvalds/linux.git synced 2026-03-14 02:06:15 +01:00

Author	SHA1	Message	Date
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	cee73b1e84	RISC-V updates for v7.0 - Add support for control flow integrity for userspace processes. This is based on the standard RISC-V ISA extensions Zicfiss and Zicfilp - Improve ptrace behavior regarding vector registers, and add some selftests - Optimize our strlen() assembly - Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI volume mounting - Clean up some code slightly, including defining copy_user_page() as copy_page() rather than memcpy(), aligning us with other architectures; and using max3() to slightly simplify an expression in riscv_iommu_init_check() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAmmOYpYACgkQx4+xDQu9 KkvzOQ/9Fq8ZxWgYofhTPtw9/vps3avheOHlEoRrBWYfn1VkTRPAcbUULL4PGXwg dnVFEl3AcrpOFikIthbukklLeLoOnUshZJBU25zY5h0My1jb63V1//gEwJR6I0dg +V+GJmfzc4+YVaHK6UFdn7j3GgKUbTC7xXRMuGEriAzKPnm3AXAjh94wMNx6depv Li3IXRoZT/HvqIAyfeAoM9STwOzJtE3Sc6fXABkzsIbNTjjdgIqoRSsQsKY10178 z6ox/sVStnLmVaMbOd/ZVN0J70JRDsvK0TC0/13K1ESUbnVia9a3bPIxLRmSapKC wXnwAuSeevtFshGGyd5LZO0QQGxzG1H63Gky2GRoh8bTQbd2tQcfQzANdnPkBAQS j2aOiSsiUQeNZqfZAfEBwRd27GXRYlKb/MpgCZKUH+ZO9VG6QaD3VGvg17/Caghy nVdbBQ81ZV9tkz9EMN0vt2VJHmEqARh88w619laHjg+ioPTG4/UIDPzskt1I+Fgm Y6NQLeFyfaO3RKKDYWGPcY7fmWQI9V8MECHOvyVI4xJcgqAbqnfsgytjuiFbrfRo fTvpuB7kvltBZ180QSB79xj0sWGFTWR02MeWy3uOaLZz2eIm2ZTZbMUSgNYR0ldG L3y7CEkTkoVF1ijYgAfuMgptk3Yf0dpa66D9HUo947wWkNrW5ds= =4fTk -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: - Add support for control flow integrity for userspace processes. This is based on the standard RISC-V ISA extensions Zicfiss and Zicfilp - Improve ptrace behavior regarding vector registers, and add some selftests - Optimize our strlen() assembly - Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI volume mounting - Clean up some code slightly, including defining copy_user_page() as copy_page() rather than memcpy(), aligning us with other architectures; and using max3() to slightly simplify an expression in riscv_iommu_init_check() * tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) riscv: lib: optimize strlen loop efficiency selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function selftests: riscv: verify ptrace accepts valid vector csr values selftests: riscv: verify ptrace rejects invalid vector csr inputs selftests: riscv: verify syscalls discard vector context selftests: riscv: verify initial vector state with ptrace selftests: riscv: test ptrace vector interface riscv: ptrace: validate input vector csr registers riscv: csr: define vtype register elements riscv: vector: init vector context with proper vlenb riscv: ptrace: return ENODATA for inactive vector extension kselftest/riscv: add kselftest for user mode CFI riscv: add documentation for shadow stack riscv: add documentation for landing pad / indirect branch tracking riscv: create a Kconfig fragment for shadow stack and landing pad support arch/riscv: add dual vdso creation logic and select vdso based on hw arch/riscv: compile vdso with landing pad and shadow stack note riscv: enable kernel access to shadow stack memory via the FWFT SBI call riscv: add kernel command line option to opt out of user CFI riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe ...	2026-02-12 19:17:44 -08:00
Linus Torvalds	cebcffe666	VFIO updates for v7.0-rc1 - Update outdated mdev comment referencing the renamed mdev_type_add() function. (Julia Lawall) - Introduce selftest support for IOMMU mapping of PCI MMIO BARs. (Alex Mastro) - Relax selftest assertion relative to differences in huge page handling between legacy (v1) TYPE1 IOMMU mapping behavior and the compatibility mode supported by IOMMUFD. (David Matlack) - Reintroduce memory poison handling support for non-struct-page- backed memory in the nvgrace-gpu variant driver. (Ankit Agrawal) - Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure and semantics. (Leon Romanovsky) - Add missing upstream bridge locking across PCI function reset, resolving an assertion failure when secondary bus reset is used to provide that reset. (Anthony Pighin) - Fixes to hisi_acc vfio-pci variant driver to resolve corner case issues related to resets, repeated migration, and error injection scenarios. (Longfang Liu, Weili Qian) - Restrict vfio selftest builds to arm64 and x86_64, resolving compiler warnings on 32-bit archs. (Ted Logan) - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has stepped up. (Ioana Ciornei) -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmmNCcMRHGFsZXhAc2hh emJvdC5vcmcACgkQI5ubbjuwiyLlvw/9FLOcpjKCcxyWFPGUMHV9L0N8dWMR5t75 Pu6cBuYdpqGgrUaa1NWHYEzFbMSkEJMb5jLj26lokn2l4VZ9BKwdehaE/7t978z2 J0FgnGUg3B4lYm5qoBStaJ26123XafTMnsBn+wKdXt/lN6ng6GXVBxnmGP+Fuuwd HA3MSFB6HUFw4et8qDG3ziyboN/pSWyXaupy60zvVy9x39i4/ZzMm3PSrYPdUX4x aPM+lWKRi5yFMwiksZyYb67XA717Js8xhmgNMeJ8Yz3ZUF0n3Z7ZpOzbU+hl8LNn sAea6+lXXsvNjEXfet1mjg7A+RYmuQdcjk58J//ijRXn7zRijRM671Bzc40T2JcP bfrajHhprMsE+u7VwiBuERACTtbemuaKSbi5iNLHAIqTFwPpb400PvbptkyQhkxh IRXIxqgKb5G6/sd73m9dKR9HU7d5SL3mNCARrymgqT6kRxz8fqtaVsXbbsa1Tgah iV8in7wjKJ/80rYQd7gNyj/RRpYTAJJemfnJtKGQ9LxGnej8AV6kUZ3np7hpspz7 TVtmn9RxlwbA5lWYXJ4VUzt9u2Riwd2W6jg6ZnUknSZN6B5j2Jd2bDtF/FKLauKG DW/bN8UU7nzgC40ro92qJEFF2PC7GkfZUVRlgW0oq54QZjyCoAIpfYOXjLTSteYP umnjcrWkgag= =F+FV -----END PGP SIGNATURE----- Merge tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: "A small cycle with the bulk in selftests and reintroducing poison handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and some dmabuf structure consolidation. - Update outdated mdev comment referencing the renamed mdev_type_add() function (Julia Lawall) - Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex Mastro) - Relax selftest assertion relative to differences in huge page handling between legacy (v1) TYPE1 IOMMU mapping behavior and the compatibility mode supported by IOMMUFD (David Matlack) - Reintroduce memory poison handling support for non-struct-page- backed memory in the nvgrace-gpu variant driver (Ankit Agrawal) - Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure and semantics (Leon Romanovsky) - Add missing upstream bridge locking across PCI function reset, resolving an assertion failure when secondary bus reset is used to provide that reset (Anthony Pighin) - Fixes to hisi_acc vfio-pci variant driver to resolve corner case issues related to resets, repeated migration, and error injection scenarios (Longfang Liu, Weili Qian) - Restrict vfio selftest builds to arm64 and x86_64, resolving compiler warnings on 32-bit archs (Ted Logan) - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has stepped up (Ioana Ciornei)" * tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio: vfio/fsl-mc: add myself as maintainer vfio: selftests: only build tests on arm64 and x86_64 hisi_acc_vfio_pci: fix the queue parameter anomaly issue hisi_acc_vfio_pci: resolve duplicate migration states hisi_acc_vfio_pci: update status after RAS error hisi_acc_vfio_pci: fix VF reset timeout issue vfio/pci: Lock upstream bridge for vfio_pci_core_disable() types: reuse common phys_vec type instead of DMABUF open‑coded variant vfio/nvgrace-gpu: register device memory for poison handling mm: add stubs for PFNMAP memory failure registration functions vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU vfio: selftests: Add vfio_dma_mapping_mmio_test vfio: selftests: Align BAR mmaps for efficient IOMMU mapping vfio: selftests: Centralize IOMMU mode name definitions vfio/mdev: update outdated comment	2026-02-12 15:52:39 -08:00
Linus Torvalds	c6e62d002b	Driver core changes for 7.0-rc1 - Bus: - Ensure bus->match() is consistently called with the device lock held - Improve type safety of bus_find_device_by_acpi_dev() - Devtmpfs: - Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of simple_strtoul() - Avoid sparse warning by making devtmpfs_context_ops static - IOMMU: - Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe() - MAINTAINERS: - Add the new driver-core mailing list (driver-core@lists.linux.dev) to all relevant entries - Add missing tree location for "FIRMWARE LOADER (request_firmware)" - Add driver-model documentation to the "DRIVER CORE" entry - Add missing driver-core maintainers to the "AUXILIARY BUS" entry - Misc: - Change return type of attribute_container_register() to void; it has always been infallible - Do not export sysfs_change_owner(), sysfs_file_change_owner() and device_change_owner() - Move devres_for_each_res() from the public devres header to drivers/base/base.h - Do not use a static struct device for the faux bus; allocate it dynamically - Revocable: - Patches for the revocable synchronization primitive have been scheduled for v7.0-rc1, but have been reverted as they need some more refinement - Rust: - Device: - Support dev_printk on all device types, not just the core Device struct; remove now-redundant .as_ref() calls in dev_* print calls - Devres: - Introduce an internal reference count in Devres<T> to avoid a deadlock condition in case of (indirect) nesting - DMA: - Allow drivers to tune the maximum DMA segment size via dma_set_max_seg_size() - I/O: - Introduce the concept of generic I/O backends to handle different kinds of device shared memory through a common interface. This enables higher-level concepts such as register abstractions, I/O slices, and field projections to be built generically on top. In a first step, introduce the Io, IoCapable<T>, and IoKnownSize trait hierarchy for sharing a common interface supporting offset validation and bound-checking logic between I/O backends. - Refactor MMIO to use the common I/O backend infrastructure - Misc: - Add __rust_helper annotations to C helpers for inlining into Rust code - Use "kernel vertical" style for imports - Replace kernel::c_str! with C string literals - Update ARef imports to use sync::aref - Use pin_init::zeroed() for struct auxiliary_device_id and debugfs file_operations initialization - Use LKMM atomic types in debugfs doc-tests - Various minor comment and documentation fixes - PCI: - Implement PCI configuration space accessors using the common I/O backend infrastructure - Document pci::Bar device endianness assumptions - SoC: - Abstractions for struct soc_device and struct soc_device_attribute - Sample driver for soc::Device -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaY0JegAKCRBFlHeO1qrK LtCjAQDeSqGuzQM6hkMjsUWbjdWyw0yrrXcOxhwIINTc7uCzogEA7JL00+eiKHYu SV2Ckn6UnSQ14rpEaDIzgZdurZHGUAM= =TL00 -----END PGP SIGNATURE----- Merge tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "Bus: - Ensure bus->match() is consistently called with the device lock held - Improve type safety of bus_find_device_by_acpi_dev() Devtmpfs: - Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of simple_strtoul() - Avoid sparse warning by making devtmpfs_context_ops static IOMMU: - Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe() MAINTAINERS: - Add the new driver-core mailing list (driver-core@lists.linux.dev) to all relevant entries - Add missing tree location for "FIRMWARE LOADER (request_firmware)" - Add driver-model documentation to the "DRIVER CORE" entry - Add missing driver-core maintainers to the "AUXILIARY BUS" entry Misc: - Change return type of attribute_container_register() to void; it has always been infallible - Do not export sysfs_change_owner(), sysfs_file_change_owner() and device_change_owner() - Move devres_for_each_res() from the public devres header to drivers/base/base.h - Do not use a static struct device for the faux bus; allocate it dynamically Revocable: - Patches for the revocable synchronization primitive have been scheduled for v7.0-rc1, but have been reverted as they need some more refinement Rust: - Device: - Support dev_printk on all device types, not just the core Device struct; remove now-redundant .as_ref() calls in dev_* print calls - Devres: - Introduce an internal reference count in Devres<T> to avoid a deadlock condition in case of (indirect) nesting - DMA: - Allow drivers to tune the maximum DMA segment size via dma_set_max_seg_size() - I/O: - Introduce the concept of generic I/O backends to handle different kinds of device shared memory through a common interface. This enables higher-level concepts such as register abstractions, I/O slices, and field projections to be built generically on top. In a first step, introduce the Io, IoCapable<T>, and IoKnownSize trait hierarchy for sharing a common interface supporting offset validation and bound-checking logic between I/O backends. - Refactor MMIO to use the common I/O backend infrastructure - Misc: - Add __rust_helper annotations to C helpers for inlining into Rust code - Use "kernel vertical" style for imports - Replace kernel::c_str! with C string literals - Update ARef imports to use sync::aref - Use pin_init::zeroed() for struct auxiliary_device_id and debugfs file_operations initialization - Use LKMM atomic types in debugfs doc-tests - Various minor comment and documentation fixes - PCI: - Implement PCI configuration space accessors using the common I/O backend infrastructure - Document pci::Bar device endianness assumptions - SoC: - Abstractions for struct soc_device and struct soc_device_attribute - Sample driver for soc::Device" * tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits) rust: devres: fix race condition due to nesting rust: dma: add missing __rust_helper annotations samples: rust: pci: Remove some additional `.as_ref()` for `dev_` print Revert "revocable: Revocable resource management" Revert "revocable: Add Kunit test cases" Revert "selftests: revocable: Add kselftest cases" driver core: remove device_change_owner() export sysfs: remove exports of sysfs_change_owner() driver core: disable revocable code from build revocable: Add KUnit test for concurrent access revocable: fix SRCU index corruption by requiring caller-provided storage revocable: Add KUnit test for provider lifetime races revocable: Fix races in revocable_alloc() using RCU driver core: fix inverted "locked" suffix of driver_match_device() rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize rust: pci: re-export ConfigSpace rust: dma: allow drivers to tune max segment size gpu: tyr: remove redundant `.as_ref()` for `dev_*` print rust: auxiliary: use `pin_init::zeroed()` for device ID rust: debugfs: use pin_init::zeroed() for file_operations ...	2026-02-11 17:43:59 -08:00
Linus Torvalds	1e0ea4dff0	IOMMU Updates for Linux v7.0 Including: - Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets - Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates - AMD-Vi changes: - Support for nested translations - Other minor improvements - ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS". - ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes. - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors. - Add device-tree support for NVIDIA's CMDQV extension. - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields. - Don't disable ATS for nested S1-bypass nested domains. - Additions to the kunit selftests. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmmLDZwACgkQK/BELZcB GuNHgg//Yf9K/+T6+IOemA5Z8k3x2p39Q/Dv5x+SEGkh+CUh2C5dX97WD9LHntus 1mgIHlSgbM3bgMB+XTS1Q5ghy1QH71XOMnGCPhthwg843iCP2CcrB84ZZKKnNmw9 2YJdxYlNcbAMpvSd0F1XKaXoiNl9qzWx+QFtnVaTXMptNEhYOxMOlaZPtlEuwfJa T7h4cwtsiMDLWA4pw85y4hfvc5jKRv4dMoohin0lNEBpWkCfYE6b2Cjpff+9TtU2 Jyvvcvyns0US3amEwPHlIyfTUPKdaq6Vv3NX8TkAJUhGyEzdfwEtzqAvWMvOEYFh HfnE/LjZZLB1CUkF5MTib9dBgJACf/jtvOtuh4wZkx+7O2WIR6Ebo41dtWBM6dxh cHGeeQGqxdDZ5UJbIonF8Am0lxsaZx2zs09tlHEMGl2pNDi6vUppk1iTOkv3Wog0 zy4GhDBl0n/IcyCaIinnWck8C+BsAMcRGpDP2AB0I9/C2qpsaFY/NdNkbIGidhaJ 3khdAcjWsNPiJPNbUx66n6t8RSXdYKUuhJq2a/GgYmtAjhRR9cJlupB8/QYCBS5j fxXpHp4xMtw+Cgj58xC+gYXDivQOEThPs/BhL/qrxOzWE03HWI15MFydqRFWicnI gJCZSevMncBfNUTIJUSUmuT7ukP40cnh58QBeRkTmKGcW6HjuyY= =W/nW -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates AMD-Vi changes: - Support for nested translations - Other minor improvements ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS" ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors - Add device-tree support for NVIDIA's CMDQV extension - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields - Don't disable ATS for nested S1-bypass nested domains - Additions to the kunit selftests" * tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommupt: Always add IOVA range to iotlb_gather in gather_range_pages() iommu/amd: serialize sequence allocation under concurrent TLB invalidations iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence iommu/arm-smmu-v3: Add device-tree support for CMDQV driver iommu/tegra241-cmdqv: Decouple driver from ACPI iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p iommu/vt-d: Fix race condition during PASID entry replacement iommu/vt-d: Clear Present bit before tearing down context entry iommu/vt-d: Clear Present bit before tearing down PASID entry iommu/vt-d: Flush piotlb for SVM and Nested domain iommu/vt-d: Flush cache for PASID table before using it iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode rust: iommu: fix `srctree` link warning rust: iommu: fix Rust formatting ...	2026-02-11 16:36:08 -08:00
Linus Torvalds	2619c62b7e	Trivial cleanups for the posted MSI interrupt handling -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJ2msQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoa3TD/9vZsCSf/SxpgHRLh/GwrIgCOOg1IHBmwmZ xbfa1k2E29uIiez+qEE2oBBsCdL8NPx2JQLO4qZu4d9Cv9F9vk/QedjTFhjfiz5n geaGuka3tXsdEYO4cZhKgH3MkZGo4u3vKauj7zTaVLbknq5NfdMlZipWtQ3P88B/ bN7t0814vqhg+8JNUraMYqG15o6CVAvLj3IDiSpcpPj6kCVmFfRdtJFvJvRuCY/I trnbSwV4wEqsX629BdEcjX2izqDCUO9tqSB709KmjeUFuCyPdr+mxfUScE0gmTNq L/gWvbNT2xQzk65Z3toJZsqsDGuUm1dq5DfEedzaZ8F1tuoSY6ePfz2242yXmzGo IhmLRuKuGJ5PQH1X2NfwC/QHIZJikE71O6+ojMo4PEM98/EQ7iBMFvXWanbRTQqu d/ZIJ7LNHjQKQHXm11oMbwiz4nuLLH9gb8Rv/nnxHT1UFI2QlozmWLpYhAE2sWPU T7MiDf9Dha1dRuG7U0LyuvT9/wvdsYIHPzhLEpiFuFAm1uSUqITr6UnW9ADIXlcI 5ZWqe7YZo60Nnj1BPj9tw480dGpUpQDFzuTnGMHaMTUNoInQlJU6S4ZYNiHJY4vu 3z+EfP4qHkMFAjhOsK9T+UL7UDiZygf/CjaJxriJZXvIMH9MBdYFBjuC8HbYjdKv q48DcsUU8g== =Innj -----END PGP SIGNATURE----- Merge tag 'x86-irq-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq updates from Thomas Gleixner: "Trivial cleanups for the posted MSI interrupt handling" * tag 'x86-irq-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq_remapping: Sanitize posted_msi_supported() x86/irq: Cleanup posted MSI code	2026-02-10 17:39:08 -08:00
Linus Torvalds	4e21e585b6	A series of treewide cleanups to ensure interrupt request consistency. - Add the missing IRQF_COND_ONESHOT flag to devm_request_irq() This is inconsistent vs. request_irq() and causes the same issues which where addressed with the introduction of this flag - Cleanup IRQF_ONESHOT and IRQF_NO_THREAD usage Quite some drivers have inconsistent interrupt request flags related to interrupt threading namely IRQF_ONESHOT and IRQF_NO_THREAD. This leads to warnings and/or malfunction when forced interrupt threading is enabled. - Remove stub primary (hard interrupt) handlers A bunch of drivers implement a stub primary (hard interrupt) handler which just returns IRQ_WAKE_THREAD. The same functionality is provided by the core code when the primary handler argument of request_thread_irq() is set to NULL. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJs8MQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoTbvEACH4OegGofKri7aecUPNcpRdQDHBoueikni Rio/vydFJ/H2hto4xlSPC4C84onxuFqY9lJgo/tCQTCrO0t+ZQ4ZGqnlQKzLJzmv vcVzNgGsxDZ0p1wJO0rBpTRxJN8yTXi8VVv5e6OPuihjLhdXGesyYtk1zosR3nOS CF/w8r9jVMzsSMPvtEMr5AwXD9ZTziUqyhQv94fYlpsbyD4TPXnUxhVkdUFFHHo3 ROzWPFw1Ykh6wpdRPEpupcCf1d2Pq0TIAU86y3Sbf2msuXiTouHf+lH1uTd3EsLN 6qUIqRYjwWE8HTieh+3YcH415wrIsUsWJb8YDi0DpqhPbja3IXP5ACHqEWaaNHRA MaBE2Gc02se4ChXMWncYR3cdzyAAwAeKLUahpLNc+7U4cHOm1w2g60yy4I0v2krh V0vfEN88WQ8DgrM0VvDLST6ZinSz4ia+R0qYWywl6eIW4RVNtuBi6wrN5PtzSEtz jZ3LqnRLGmNfKwS/taHBCAme7NIJSNa1L0ao/icnW5XVQz/d2EHVcUsLHecHZSMx l9tr/g3t85tsFW1eIKfF8T1a5DrbCEP4afceQk9KexAfAkP7el53M1E1yQDk/kW8 so0CwZtbDJ136RQdBIQqx49QrUEOvtrgNDRQxPFBUrWEHcvjqbUuFclp9hpLheOj 8YnzkVe0Rg== =vrmm -----END PGP SIGNATURE----- Merge tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A series of treewide cleanups to ensure interrupt request consistency. - Add the missing IRQF_COND_ONESHOT flag to devm_request_irq() This is inconsistent vs request_irq() and causes the same issues which where addressed with the introduction of this flag - Cleanup IRQF_ONESHOT and IRQF_NO_THREAD usage Quite some drivers have inconsistent interrupt request flags related to interrupt threading namely IRQF_ONESHOT and IRQF_NO_THREAD. This leads to warnings and/or malfunction when forced interrupt threading is enabled. - Remove stub primary (hard interrupt) handlers A bunch of drivers implement a stub primary (hard interrupt) handler which just returns IRQ_WAKE_THREAD. The same functionality is provided by the core code when the primary handler argument of request_thread_irq() is set to NULL" * tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: media: pci: mg4b: Use IRQF_NO_THREAD mfd: wm8350-core: Use IRQF_ONESHOT thermal/qcom/lmh: Replace IRQF_ONESHOT with IRQF_NO_THREAD rtc: amlogic-a4: Remove IRQF_ONESHOT usb: typec: fusb302: Remove IRQF_ONESHOT EDAC/altera: Remove IRQF_ONESHOT char: tpm: cr50: Remove IRQF_ONESHOT ARM: versatile: Remove IRQF_ONESHOT scsi: efct: Use IRQF_ONESHOT and default primary handler Bluetooth: btintel_pcie: Use IRQF_ONESHOT and default primary handler bus: fsl-mc: Use default primary handler mailbox: bcm-ferxrm-mailbox: Use default primary handler iommu/amd: Use core's primary handler and set IRQF_ONESHOT platform/x86: int0002: Remove IRQF_ONESHOT from request_irq() genirq: Set IRQF_COND_ONESHOT in devm_request_irq().	2026-02-10 13:22:50 -08:00
Joerg Roedel	ad09563660	Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2026-02-06 11:10:40 +01:00
Viktor Kleen	02f9d76a76	iommu/vt-d: Treat PAGE_SNOOP and PWSNP separately The PASID_FLAG_PAGE_SNOOP and PASID_FLAG_PWSNP constants are identical. This will cause the pasid code to always set both or neither of the PGSNP and PWSNP bits in PASID table entries. However, PWSNP is a reserved bit if SMPWC is not set in the IOMMU's extended capability register, even if SC is supported. This has resulted in DMAR errors when testing the iommufd code on an Arrow Lake platform. With this patch, those errors disappear and the PASID table entries look correct. Fixes: `101a285411` ("iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry") Cc: stable@vger.kernel.org Signed-off-by: Viktor Kleen <viktor@kleen.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20260202192109.1665799-1-viktor@kleen.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-02-06 11:01:00 +01:00
Yu Zhang	b48ca92061	iommupt: Always add IOVA range to iotlb_gather in gather_range_pages() Add current (iova, len) to the iotlb gather, regardless of the setting of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS. In gather_range_pages(), the current IOVA range is only added to iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case, iotlb_gather will stay empty (start=ULONG_MAX, end=0) after initialization, and the current (iova, len) will not be added to the iotlb_gather, causing subsequent iommu_iotlb_sync() to perform IOTLB invalidation with wrong parameters (e.g., amd_iommu_iotlb_sync() computes size from gather->end - gather->start + 1, leading to an invalid range). The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain unchanged: when the new range is disjoint from the existing gather, we still sync first and then add the new range, so semantics for NO_GAPS are preserved. Fixes: `7c53f4238a` ("iommupt: Add unmap_pages op") Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-02-03 14:36:21 +01:00
Ankit Soni	9e249c4841	iommu/amd: serialize sequence allocation under concurrent TLB invalidations With concurrent TLB invalidations, completion wait randomly gets timed out because cmd_sem_val was incremented outside the IOMMU spinlock, allowing CMD_COMPL_WAIT commands to be queued out of sequence and breaking the ordering assumption in wait_on_sem(). Move the cmd_sem_val increment under iommu->lock so completion sequence allocation is serialized with command queuing. And remove the unnecessary return. Fixes: `d2a0cac105` ("iommu/amd: move wait_on_sem() out of spinlock") Tested-by: Srikanth Aithal <sraithal@amd.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-02-03 14:27:05 +01:00
Sebastian Andrzej Siewior	5bfcdccb4d	iommu/amd: Use core's primary handler and set IRQF_ONESHOT request_threaded_irq() is invoked with a primary and a secondary handler and no flags are passed. The primary handler is the same as irq_default_primary_handler() so there is no need to have an identical copy. The lack of the IRQF_ONESHOT can be dangerous because the interrupt source is not masked while the threaded handler is active. This means, especially on LEVEL typed interrupt lines, the interrupt can fire again before the threaded handler had a chance to run. Use the default primary interrupt handler by specifying NULL and set IRQF_ONESHOT so the interrupt source is masked until the secondary handler is done. Fixes: `72fe00f01f` ("x86/amd-iommu: Use threaded interupt handler") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128095540.863589-4-bigeasy@linutronix.de	2026-02-01 17:37:13 +01:00
Linus Torvalds	162b42445b	IOMMU Fixes for Linux v6.19-rc7 Including: - Fix a performance regression cause by new Generic IO-Page-Table code detected in Intel VT-d driver. - Command queue flushing fix for NVidia version of the ARM-SMMU-v3 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAml+LeMACgkQK/BELZcB GuPi4xAA5T7+RG/KG244Wtb+jjiYg8FabKkPtubWw4PD3fVo2KW3Zkt3xHUtznDL Lpm5Ywb7yJ2pgwy4E4H4mE3wkVAWy9K4WNzv+jr++SxXktBBi52xrqAzJspbQUGd Twrdh7CGeoHQnuEVAm0OnUDA0JKx6/tLPI3XCHNH0zdas5eKnODDpE2w0etDpuTE +fRXc+n2Z2P7UzjDBBrjb5xM+SyN0ImuBqk11D3psye2zZu/KLXZskLpwtNRm+fr Jzl5LkTcOwCHk/Do/ZW2SpJLMV4p70QympEJRCfWu/wca+0R9APNdhUFlTPazKSj eo+SUSUBYo7LTmrR4asyEr1UCEI0M+OCZAhYKQzVKNyPCFnnnKCufH6oTyxX17I4 nDg5piCtedwU9rUjX0VqRmx3ZA+0dILfcOGsgWV4mOvbO0nuQk6PHXfgTd3Fzdnh 2yjaUFWwUN/edCZHQAtk1RqmPu3LLXeUsx1eMngJlivHljMtGKGD18vmZzXZUNZL aG/H5FRVDX63mNZNJyC90SWx3LMmhmOAWWBbvRTXtL3xvLgd6j+ZYYSwnghnYmpn c9BE+POvLexRrRNXAFEChTHt9K2IWuPxBqGHOjLmM0Nds+lP0Ixh5EjuwpJW0gSY k+z9MLtFFI3TJn6S3h7SAQpc7ngDiHfvxKSfcRZ1YwZIDXLwM3Q= =/mf8 -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Fix a performance regression cause by the new Generic IO-Page-Table code detected in Intel VT-d driver - Command queue flushing fix for NVidia version of the ARM-SMMU-v3 * tag 'iommu-fixes-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user() iommupt: Only cache flush memory changed by unmap	2026-01-31 09:40:13 -08:00
Nicolin Chen	80f1a2c233	iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user() The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset the HW registers. So, the driver explicitly clears all the registers when a VINTF or VCMDQ is being initialized calling its hw_deinit() function. However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ getting reset in tegra241_vcmdq_hw_init(). Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF at that stage. Then, this may result in dirty VCMDQ registers, which can fail the VM. Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init() to fix this bug. This is required by a host kernel. Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support") Cc: stable@vger.kernel.org Reported-by: Bao Nguyen <ncqb@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-31 10:22:08 +01:00
Deepanshu Kartikey	2724138b2f	iommufd: Initialize batch->kind in batch_clear() KMSAN reported an uninitialized value when batch_add_pfn_num() reads batch->kind. This occurs because batch_clear() does not initialize the kind field. When batch_add_pfn_num() checks "if (batch->kind != kind)", it reads this uninitialized value, triggering KMSAN warnings. However the algorithm is fine with any value in kind at this point as the batch is always empty and it always corrects kind if wrong. Initialize batch->kind to zero in batch_clear() to silence the KMSAN warning. Link: https://patch.msgid.link/r/20260124132214.624041-1-kartikey406@gmail.com Reported-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=df28076a30d726933015 Fixes: `f394576eb1` ("iommufd: PFN handling for iopt_pages") Tested-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com Reported-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2026-01-28 12:49:17 -04:00
Jason Gunthorpe	5815d9303c	iommupt: Only cache flush memory changed by unmap The cache flush was happening on every level across the whole range of iteration, even if no leafs or tables were cleared. Instead flush only the sub range that was actually written. Overflushing isn't a correctness problem but it does impact the performance of unmap. After this series the performance compared to the original VT-d implementation with cache flushing turned on is: map_pages pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 253,266 , 213,227 , 6.06 2^21, 246,244 , 221,219 , 0.00 2^30, 231,240 , 209,217 , 3.03 2562^12, 2604,2668 , 2415,2540 , 4.04 2562^21, 2495,2824 , 2390,2734 , 12.12 2562^30, 2542,2845 , 2380,2718 , 12.12 unmap_pages pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 259,292 , 222,251 , 11.11 2^21, 255,259 , 227,236 , 3.03 2^30, 238,254 , 217,230 , 5.05 2562^12, 2751,2620 , 2417,2437 , 0.00 2562^21, 2461,2526 , 2377,2423 , 1.01 2562^30, 2498,2543 , 2370,2404 , 1.01 Fixes: `efa03dab7c` ("iommupt: Flush the CPU cache after any writes to the page table") Reported-by: Francois Dugast <francois.dugast@intel.com> Closes: https://lore.kernel.org/all/20260121130233.257428-1-francois.dugast@intel.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-28 15:14:17 +01:00
Nathan Chancellor	5b0530bb16	iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() When building with -Wincompatible-function-pointer-types-strict, a warning designed to catch kernel control flow integrity (kCFI) issues at build time, there is an instance around amd_iommufd_hw_info(): drivers/iommu/amd/iommu.c:3141:13: error: incompatible function pointer types initializing 'void ()(struct device , u32 , enum iommu_hw_info_type )' (aka 'void ()(struct device , unsigned int , enum iommu_hw_info_type )') with an expression of type 'void (struct device , u32 , u32 )' (aka 'void (struct device , unsigned int , unsigned int )') [-Werror,-Wincompatible-function-pointer-types-strict] 3141 \| .hw_info = amd_iommufd_hw_info, \| ^~~~~~~~~~~~~~~~~~~ While 'u32 ' and 'enum iommu_hw_info_type ' are ABI compatible, hence no regular warning from -Wincompatible-function-pointer-types, the mismatch will trigger a kCFI violation when amd_iommufd_hw_info() is called indirectly. Update the type parameter of amd_iommufd_hw_info() to be 'enum iommu_hw_info_type *' to match the prototype in 'struct iommu_ops', clearing up the warning and kCFI violation. Fixes: `7d8b06ecc4` ("iommu/amd: Add support for hw_info for iommu capability query") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-28 15:13:01 +01:00
Danilo Krummrich	559ac49154	Driver core fixes deferred from 6.19-rc7 [1, 2] were originally intended for -rc7. Patch [1] uncovered potential deadlocks that require a few driver fixes; [2] is one such fix. [1] https://patch.msgid.link/20260113162843.12712-1-hanguidong02@gmail.com [2] https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaXdn0AAKCRBFlHeO1qrK Lu0BAQDfaNgkqh55vA7C+meIUTKKEnARTsRHowRnex2zty0VVQEA/tExyJKy8QY4 pNe8ghYW6mmad/eY+pixYzaFacv7+ws= =CzPS -----END PGP SIGNATURE----- Merge tag 'driver-core-6.19-rc7-deferred' into driver-core-next Driver core fixes deferred from 6.19-rc7 [1, 2] were originally intended for -rc7. Patch [1] uncovered potential deadlocks that require a few driver fixes; [2] is one such fix. [1] https://patch.msgid.link/20260113162843.12712-1-hanguidong02@gmail.com [2] https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2026-01-26 14:12:02 +01:00
Markus Elfring	3127718ad9	iommu/riscv: Simplify maximum determination in riscv_iommu_init_check() Reduce nested max() calls by a single max3() call in this function implementation. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patch.msgid.link/d1a384c9-f154-4537-94d6-c3613f4167bc@web.de Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-01-25 21:09:52 -07:00
Linus Torvalds	b33d706259	IOMMU Fixes for Linux v6.19-rc6 Including: - AMD IOMMU: Fix potential NULL-ptr dereference in error path of amd_iommu_probe_device(). - Generic IOMMUPT: Fix another compiler issue seen with older compiler versions. - Fix signedness issue in ARM IO-PageTable code. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmlzOg0ACgkQK/BELZcB GuMQRhAAhgPw7RD6NPEb67NG4g28tuDBLEWBc10IS0d0XjP5QZCcFT3wLY4CYDsK J5QRDgYv6sACaOWgv5oB4N2WffqC6nC2YVniOmmq3s2Emfk/5eAo0MpRchvdiVDD F6tuUU/RwmeuUbK/6pa6SDrRsXy/SgQI858FkvVQSk8Ngj0ECFoUyJCbaqODzpP6 Cxp2KyY0FqcfssGtf6uFMSvXhC0CrFOwWHBXp5UzMOPnEHABxMUdQEHTfsR631bt IrYphkspnhfMpeAntZxpqAeejWnWcMf3nYlwt5j1UxjUuvfNtloc6hHFb6ln+/ad BG0wX5kqOK4LC4oiStz3qo9eMWrmbvG8L32Wpo8iPxd1CJ+Lu2+EaaHAYDrgoMo2 ApRAEEApzDkuGK61J3a1Ff2eg2WInBXmY55VXH2rBA2wnUrZKwYriivojt6ySXWz g6RpOWHxo6ztQN/C7VynqnvaQ5WKMHG9AhL2M6jQtP16obCz9hJut0Ps3v6AbOl2 9bQDTAEnFAi+ribCqg4PFKlcVg8wLqjDQJQLRO/hbATfV/mY0loyRVIHdyPuQUya IP1iKZn2LNwsYgl0AtrcDvXojl6CD/T1W5XjR0Q5hrNbWk1CSi2GmXMJFCiKV0pu fzS6Sv/8DS2tE1QgG4IAgbEqqb0TKeOdWcYFdZP/WRgaC8tPTeM= =k63u -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - AMD IOMMU: Fix potential NULL-ptr dereference in error path of amd_iommu_probe_device() - Generic IOMMUPT: Fix another compiler issue seen with older compiler versions - Fix signedness issue in ARM IO-PageTable code * tag 'iommu-fixes-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/io-pgtable-arm: fix size_t signedness bug in unmap path iommupt: Make it clearer to the compiler that pts.level == 0 for single page iommu/amd: Fix error path in amd_iommu_probe_device()	2026-01-23 12:46:12 -08:00
Nicolin Chen	a45dd34663	iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate A vSTE may have three configuration types: Abort, Bypass, and Translate. An Abort vSTE wouldn't enable ATS, but the other two might. It makes sense for a Transalte vSTE to rely on the guest vSTE.EATS field. For a Bypass vSTE, it would end up with an S2-only physical STE, similar to an attachment to a regular S2 domain. However, the nested case always disables ATS following the Bypass vSTE, while the regular S2 case always enables ATS so long as arm_smmu_ats_supported(master) == true. Note that ATS is needed for certain VM centric workloads and historically non-vSMMU cases have relied on this automatic enablement. So, having the nested case behave differently causes problems. To fix that, add a condition to disable_ats, so that it might enable ATS for a Bypass vSTE, aligning with the regular S2 case. Fixes: `f27298a82b` ("iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-23 13:58:38 +00:00
Nicolin Chen	a4f976edcb	iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage STE in a nested case requires both S1 and S2 fields. And this makes the use case different from the existing one. Add coverage for previously failed cases shifting between S2-only and S1+S2 STEs. Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-23 13:47:49 +00:00
Jason Gunthorpe	7cad800485	iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence If VM wants to toggle EATS_TRANS off at the same time as changing the CFG, hypervisor will see EATS change to 0 and insert a V=0 breaking update into the STE even though the VM did not ask for that. In bare metal, EATS_TRANS is ignored by CFG=ABORT/BYPASS, which is why this does not cause a problem until we have the nested case where CFG is always a variation of S2 trans that does use EATS_TRANS. Relax the rules for EATS_TRANS sequencing, we don't need it to be exact as the enclosing code will always disable ATS at the PCI device when changing EATS_TRANS. This ensures there are no ATS transactions that can race with an EATS_TRANS change so we don't need to carefully sequence these bits. Fixes: `1e8be08d1c` ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-23 13:47:49 +00:00
Jason Gunthorpe	f3c1d372db	iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence Nested CD tables set the MEV bit to try to reduce multi-fault spamming on the hypervisor. Since MEV is in STE word 1 this causes a breaking update sequence that is not required and impacts real workloads. For the purposes of STE updates the value of MEV doesn't matter, if it is set/cleared early or late it just results in a change to the fault reports that must be supported by the kernel anyhow. The spec says: Note: Software must expect, and be able to deal with, coalesced fault records even when MEV == 0. So mark STE MEV safe when computing the update sequence, to avoid creating a breaking update. Fixes: `da0c56520e` ("iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-23 13:47:49 +00:00
Jason Gunthorpe	2781f2a930	iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence C_BAD_STE was observed when updating nested STE from an S1-bypass mode to an S1DSS-bypass mode. As both modes enabled S2, the used bit is slightly different than the normal S1-bypass and S1DSS-bypass modes. As a result, fields like MEV and EATS in S2's used list marked the word1 as a critical word that requested a STE.V=0. This breaks a hitless update. However, both MEV and EATS aren't critical in terms of STE update. One controls the merge of the events and the other controls the ATS that is managed by the driver at the same time via pci_enable_ats(). Add an arm_smmu_get_ste_update_safe() to allow STE update algorithm to relax those fields, avoiding the STE update breakages. After this change, entry_set has no caller checking its return value, so change it to void. Note that this change is required by both MEV and EATS fields, which were introduced in different kernel versions. So add get_update_safe() first. MEV and EATS will be added to arm_smmu_get_ste_update_safe() separately. Fixes: `1e8be08d1c` ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-23 13:47:49 +00:00
Ashish Mhetre	ea69dc4e20	iommu/arm-smmu-v3: Add device-tree support for CMDQV driver Add device tree support to the CMDQV driver to enable usage on Tegra264 SoCs. The implementation parses the nvidia,cmdqv phandle from the SMMU device tree node to associate each SMMU with its corresponding CMDQV instance based on compatible string. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Ashish Mhetre <amhetre@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 15:12:08 +00:00
Nicolin Chen	eb20758f86	iommu/tegra241-cmdqv: Decouple driver from ACPI A platform device is created by acpi_create_platform_device() per CMDQV's adev. That means there is no point in going through _CRS of ACPI. Replace all the ACPI functions with standard platform functions. And drop all ACPI dependencies. This will make the driver compatible with DT also. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 15:12:08 +00:00
Danilo Krummrich	ed1ac3c977	iommu/arm-smmu-qcom: do not register driver in probe() Commit `0b4eeee287` ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init") intended to also probe the TBU driver when CONFIG_ARM_SMMU_QCOM_DEBUG is disabled, but also moved the corresponding platform_driver_register() call into qcom_smmu_impl_init() which is called from arm_smmu_device_probe(). However, it neither makes sense to register drivers from probe() callbacks of other drivers, nor does the driver core allow registering drivers with a device lock already being held. The latter was revealed by commit `dc23806a7c` ("driver core: enforce device_lock for driver_match_device()") leading to a deadlock condition described in [1]. Additionally, it was noted by Robin that the current approach is potentially racy with async probe [2]. Hence, fix this by registering the qcom_smmu_tbu_driver from module_init(). Unfortunately, due to the vendoring of the driver, this requires an indirection through arm-smmu-impl.c. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/7ae38e31-ef31-43ad-9106-7c76ea0e8596@sirena.org.uk/ Link: https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/ [1] Link: https://lore.kernel.org/lkml/0c0d3707-9ea5-44f9-88a1-a65c62e3df8d@arm.com/ [2] Fixes: `dc23806a7c` ("driver core: enforce device_lock for driver_match_device()") Fixes: `0b4eeee287` ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init") Acked-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Acked-by: Konrad Dybcio <konradybcio@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> #LX2160ARDB Tested-by: Wang Jiayue <akaieurus@gmail.com> Reviewed-by: Wang Jiayue <akaieurus@gmail.com> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Link: https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2026-01-22 15:39:55 +01:00
Bibek Kumar Patro	14e9a138dd	iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p The ACTLR configuration for the sa8775p MDSS client was inadvertently dropped while reworking the commit `f91879fdf7` ("iommu/arm-smmu-qcom: Add actlr settings for mdss on Qualcomm platforms"). Without this entry, the sa8775p MDSS block does not receive the intended default ACTLR configuration. Restore the missing compatible entry so that the platform receives the expected behavior. Fixes: `f91879fdf7` ("iommu/arm-smmu-qcom: Add actlr settings for mdss on Qualcomm platforms") Signed-off-by: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 14:24:36 +00:00
Lu Baolu	c3b1edea37	iommu/vt-d: Fix race condition during PASID entry replacement The Intel VT-d PASID table entry is 512 bits (64 bytes). When replacing an active PASID entry (e.g., during domain replacement), the current implementation calculates a new entry on the stack and copies it to the table using a single structure assignment. struct pasid_entry pte, new_pte; pte = intel_pasid_get_entry(dev, pasid); pasid_pte_config_first_level(iommu, &new_pte, ...); pte = new_pte; Because the hardware may fetch the 512-bit PASID entry in multiple 128-bit chunks, updating the entire entry while it is active (Present bit set) risks a "torn" read. In this scenario, the IOMMU hardware could observe an inconsistent state — partially new data and partially old data — leading to unpredictable behavior or spurious faults. Fix this by removing the unsafe "replace" helpers and following the "clear-then-update" flow, which ensures the Present bit is cleared and the required invalidation handshake is completed before the new configuration is applied. Fixes: `7543ee63e8` ("iommu/vt-d: Add pasid replace helpers") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260120061816.2132558-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:30 +01:00
Lu Baolu	c1e4f1dccb	iommu/vt-d: Clear Present bit before tearing down context entry When tearing down a context entry, the current implementation zeros the entire 128-bit entry using multiple 64-bit writes. This creates a window where the hardware can fetch a "torn" entry — where some fields are already zeroed while the 'Present' bit is still set — leading to unpredictable behavior or spurious faults. While x86 provides strong write ordering, the compiler may reorder writes to the two 64-bit halves of the context entry. Even without compiler reordering, the hardware fetch is not guaranteed to be atomic with respect to multiple CPU writes. Align with the "Guidance to Software for Invalidations" in the VT-d spec (Section 6.5.3.3) by implementing the recommended ownership handshake: 1. Clear only the 'Present' (P) bit of the context entry first to signal the transition of ownership from hardware to software. 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU. 3. Perform the required cache and context-cache invalidation to ensure hardware no longer has cached references to the entry. 4. Fully zero out the entry only after the invalidation is complete. Also, add a dma_wmb() to context_set_present() to ensure the entry is fully initialized before the 'Present' bit becomes visible. Fixes: `ba39592764` ("Intel IOMMU: Intel IOMMU driver") Reported-by: Dmytro Maluka <dmaluka@chromium.org> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dmytro Maluka <dmaluka@chromium.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:29 +01:00
Lu Baolu	75ed00055c	iommu/vt-d: Clear Present bit before tearing down PASID entry The Intel VT-d Scalable Mode PASID table entry consists of 512 bits (64 bytes). When tearing down an entry, the current implementation zeros the entire 64-byte structure immediately using multiple 64-bit writes. Since the IOMMU hardware may fetch these 64 bytes using multiple internal transactions (e.g., four 128-bit bursts), updating or zeroing the entire entry while it is active (P=1) risks a "torn" read. If a hardware fetch occurs simultaneously with the CPU zeroing the entry, the hardware could observe an inconsistent state, leading to unpredictable behavior or spurious faults. Follow the "Guidance to Software for Invalidations" in the VT-d spec (Section 6.5.3.3) by implementing the recommended ownership handshake: 1. Clear only the 'Present' (P) bit of the PASID entry. 2. Use a dma_wmb() to ensure the cleared bit is visible to hardware before proceeding. 3. Execute the required invalidation sequence (PASID cache, IOTLB, and Device-TLB flush) to ensure the hardware has released all cached references. 4. Only after the flushes are complete, zero out the remaining fields of the PASID entry. Also, add a dma_wmb() in pasid_set_present() to ensure that all other fields of the PASID entry are visible to the hardware before the Present bit is set. Fixes: `0bbeb01a4f` ("iommu/vt-d: Manage scalalble mode PASID tables") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dmytro Maluka <dmaluka@chromium.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260120061816.2132558-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:29 +01:00
Yi Liu	04b1b069f1	iommu/vt-d: Flush piotlb for SVM and Nested domain Besides the paging domains that use FS, SVM and Nested domains need to use piotlb invalidation descriptor as well. Fixes: `b33125296b` ("iommu/vt-d: Create unique domain ops for each stage") Cc: stable@vger.kernel.org Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20251223065824.6164-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:29 +01:00
Dmytro Maluka	22d169bdd2	iommu/vt-d: Flush cache for PASID table before using it When writing the address of a freshly allocated zero-initialized PASID table to a PASID directory entry, do that after the CPU cache flush for this PASID table, not before it, to avoid the time window when this PASID table may be already used by non-coherent IOMMU hardware while its contents in RAM is still some random old data, not zero-initialized. Fixes: `194b3348bd` ("iommu/vt-d: Fix PASID directory pointer coherency") Signed-off-by: Dmytro Maluka <dmaluka@chromium.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20251221123508.37495-1-dmaluka@chromium.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:29 +01:00
Jinhui Guo	10e60d8781	iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode Commit `4fc82cd907` ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") relies on pci_dev_is_disconnected() to skip ATS invalidation for safely-removed devices, but it does not cover link-down caused by faults, which can still hard-lock the system. For example, if a VM fails to connect to the PCIe device, "virsh destroy" is executed to release resources and isolate the fault, but a hard-lockup occurs while releasing the group fd. Call Trace: qi_submit_sync qi_flush_dev_iotlb intel_pasid_tear_down_entry device_block_translation blocking_domain_attach_dev __iommu_attach_device __iommu_device_set_domain __iommu_group_set_domain_internal iommu_detach_group vfio_iommu_type1_detach_group vfio_group_detach_container vfio_group_fops_release __fput Although pci_device_is_present() is slower than pci_dev_is_disconnected(), it still takes only ~70 µs on a ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed and width increase. Besides, devtlb_invalidation_with_pasid() is called only in the paths below, which are far less frequent than memory map/unmap. 1. mm-struct release 2. {attach,release}_dev 3. set/remove PASID 4. dirty-tracking setup The gain in system stability far outweighs the negligible cost of using pci_device_is_present() instead of pci_dev_is_disconnected() to decide when to skip ATS invalidation, especially under GDR high-load conditions. Fixes: `4fc82cd907` ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Link: https://lore.kernel.org/r/20251211035946.2071-3-guojinhui.liam@bytedance.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:28 +01:00
Jinhui Guo	42662d1983	iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode PCIe endpoints with ATS enabled and passed through to userspace (e.g., QEMU, DPDK) can hard-lock the host when their link drops, either by surprise removal or by a link fault. Commit `4fc82cd907` ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected") adds pci_dev_is_disconnected() to devtlb_invalidation_with_pasid() so ATS invalidation is skipped only when the device is being safely removed, but it applies only when Intel IOMMU scalable mode is enabled. With scalable mode disabled or unsupported, a system hard-lock occurs when a PCIe endpoint's link drops because the Intel IOMMU waits indefinitely for an ATS invalidation that cannot complete. Call Trace: qi_submit_sync qi_flush_dev_iotlb __context_flush_dev_iotlb.part.0 domain_context_clear_one_cb pci_for_each_dma_alias device_block_translation blocking_domain_attach_dev iommu_deinit_device __iommu_group_remove_device iommu_release_device iommu_bus_notifier blocking_notifier_call_chain bus_notify device_del pci_remove_bus_device pci_stop_and_remove_bus_device pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist Commit `81e921fd32` ("iommu/vt-d: Fix NULL domain on device release") adds intel_pasid_teardown_sm_context() to intel_iommu_release_device(), which calls qi_flush_dev_iotlb() and can also hard-lock the system when a PCIe endpoint's link drops. Call Trace: qi_submit_sync qi_flush_dev_iotlb __context_flush_dev_iotlb.part.0 intel_context_flush_no_pasid device_pasid_table_teardown pci_pasid_table_teardown pci_for_each_dma_alias intel_pasid_teardown_sm_context intel_iommu_release_device iommu_deinit_device __iommu_group_remove_device iommu_release_device iommu_bus_notifier blocking_notifier_call_chain bus_notify device_del pci_remove_bus_device pci_stop_and_remove_bus_device pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist Sometimes the endpoint loses connection without a link-down event (e.g., due to a link fault); killing the process (virsh destroy) then hard-locks the host. Call Trace: qi_submit_sync qi_flush_dev_iotlb __context_flush_dev_iotlb.part.0 domain_context_clear_one_cb pci_for_each_dma_alias device_block_translation blocking_domain_attach_dev __iommu_attach_device __iommu_device_set_domain __iommu_group_set_domain_internal iommu_detach_group vfio_iommu_type1_detach_group vfio_group_detach_container vfio_group_fops_release __fput pci_dev_is_disconnected() only covers safe-removal paths; pci_device_is_present() tests accessibility by reading vendor/device IDs and internally calls pci_dev_is_disconnected(). On a ConnectX-5 (8 GT/s, x2) this costs ~70 µs. Since __context_flush_dev_iotlb() is only called on {attach,release}_dev paths (not hot), add pci_device_is_present() there to skip inaccessible devices and avoid the hard-lock. Fixes: `37764b952e` ("iommu/vt-d: Global devTLB flush when present context entry changed") Fixes: `81e921fd32` ("iommu/vt-d: Fix NULL domain on device release") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Link: https://lore.kernel.org/r/20251211035946.2071-2-guojinhui.liam@bytedance.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-22 09:20:28 +01:00
Mostafa Saleh	a7f1bc231b	iommu: debug-pagealloc: Use page_ext_get_from_phys() Instead of calling pfn_valid() and then getting the page, call the newly added function page_ext_get_from_phys(), which would also check for MMIO and offline memory and return NULL in that case. Signed-off-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-21 12:51:49 +01:00
Chaitanya Kulkarni	374e7af67d	iommu/io-pgtable-arm: fix size_t signedness bug in unmap path __arm_lpae_unmap() returns size_t but was returning -ENOENT (negative error code) when encountering an unmapped PTE. Since size_t is unsigned, -ENOENT (typically -2) becomes a huge positive value (0xFFFFFFFFFFFFFFFE on 64-bit systems). This corrupted value propagates through the call chain: __arm_lpae_unmap() returns -ENOENT as size_t -> arm_lpae_unmap_pages() returns it -> __iommu_unmap() adds it to iova address -> iommu_pgsize() triggers BUG_ON due to corrupted iova This can cause IOVA address overflow in __iommu_unmap() loop and trigger BUG_ON in iommu_pgsize() from invalid address alignment. Fix by returning 0 instead of -ENOENT. The WARN_ON already signals the error condition, and returning 0 (meaning "nothing unmapped") is the correct semantic for size_t return type. This matches the behavior of other io-pgtable implementations (io-pgtable-arm-v7s, io-pgtable-dart) which return 0 on error conditions. Fixes: `3318f7b5ce` ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()") Cc: stable@vger.kernel.org Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-20 18:07:35 +01:00
Jason Gunthorpe	98d5110f90	iommupt: Make it clearer to the compiler that pts.level == 0 for single page Older versions of gcc and clang sometimes get tripped up by the build time assertion in FIELD_PREP because they can see that the argument to FIELD_PREP is constant but can't see that the if condition protecting it is also a constant false. In file included from <command-line>: In function 'amdv1pt_install_leaf_entry', inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:651:3, inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1, inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9, inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:658:10, inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1: ././include/linux/compiler_types.h:631:45: error: call to '__compiletime_assert_251' declared with attribute error: FIELD_PREP: value too large for the field 631 \| _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) \| ^ ././include/linux/compiler_types.h:612:25: note: in definition of macro '__compiletime_assert' 612 \| prefix ## suffix(); \ \| ^~~~~~ ././include/linux/compiler_types.h:631:9: note: in expansion of macro '_compiletime_assert' 631 \| _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) \| ^~~~~~~~~~~~~~~~~~~ ./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' 39 \| #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) \| ^~~~~~~~~~~~~~~~~~ ./include/linux/bitfield.h:69:17: note: in expansion of macro 'BUILD_BUG_ON_MSG' 69 \| BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ? \ \| ^~~~~~~~~~~~~~~~ ./include/linux/bitfield.h:90:17: note: in expansion of macro '__BF_FIELD_CHECK_MASK' 90 \| __BF_FIELD_CHECK_MASK(mask, val, pfx); \ \| ^~~~~~~~~~~~~~~~~~~~~ ./include/linux/bitfield.h:137:17: note: in expansion of macro '__FIELD_PREP' 137 \| __FIELD_PREP(_mask, _val, "FIELD_PREP: "); \ \| ^~~~~~~~~~~~ drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP' 220 \| FIELD_PREP(AMDV1PT_FMT_OA, \| ^~~~~~~~~~ Changing the caller to check pts.level == 0 avoids demanding a bit of complex reasoning from the compiler that pts.level == level == 0. Instead the compiler sees that pt_install_leaf_entry() is called with a constant pts.level == 0 which makes it more reliable to see the constant false in the if. Fixes: `dcd6a011a8` ("iommupt: Add map_pages op") Reported-by: Chunyu Hu <chuhu@redhat.com> Closes: https://lore.kernel.org/all/aUn9uGPCooqB-RIF@gmail.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-20 10:18:04 +01:00
Suravee Suthikulpanit	c0a652a3d1	iommu/amd: Remove unused variable in amd_iommufd_viommu_destroy() This fixes warning reported by 0-DAY CI Kernel Test Service. Fixes: `757d2b1fdf` ("iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601190634.bl7Mjx5Q-lkp@intel.com/ Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-20 10:16:16 +01:00
Alex Williamson	fab06e956f	* Reuse common phys_vec, phase out dma_buf_phys_vec Signed-off-by: Alex Williamson <alex@shazbot.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmluaLsRHGFsZXhAc2hh emJvdC5vcmcACgkQI5ubbjuwiyIZow/+N/BbZYkRub0lFbufNQ/+M/N80kRzfjjQ dE2/AN+YDfLdFxNvmq/P1fYI2B/Oc8msuGwT+C+eJMSVMzZIdCDvnhxH+aPH9roo 5zmE7fxLjWp/k1lOwb8TupOIR3Jl334AiC/t9PvqraIUMGvt3Uv+/7KoCx/xDxb7 ID6NbDKiXCZpvlQN1dukf8TCzDVFJnWOOMKLiDnez14okfd5rIqiFpOWMjtAhwlY VrwIy/eVqX6YktniwunqU3f4/BeurlHdCS29LdDnHdZzL6HER5onvMwIO8qMeuFZ yS8Bf8KJTYtqEedMtFUi5a/ipYu4vuK0KEqIE6USKZLuhQCqLVjFDKs5zUGRQ48X qLs59BBmP1WgOnM63OGXzBAAvelLNoh/D5KVzzXmQyNkBn6mFy1MWeR38Ozgl0FA +GjK+iwV/GRo+CgDa6Vz+eVwvCV2RhcYlT4cK6BodIQbwd9SWAMEcRxI/IEvOxfC YY/1U2JRhOSaQb9j65xgwylEbwoi8BMVbFWE3DydYMr+9PVaOyTKLcJLKrYmhwLn cuPetgLaK3UtxdcfhnZyrwzpmtvA56SAReQYg9s+TXFGFurQjNlGVlcKk4dB45nX JcOtWHm/6+3D8qoN6FY8Vj5QPePn48urSw1R1/D0LP7951gxknILiQpI7aqEPHyU rjAZH6nH6bI= =c1Gn -----END PGP SIGNATURE----- Merge tag 'common_phys_vec_via_vfio' into v6.20/vfio/next * Reuse common phys_vec, phase out dma_buf_phys_vec Signed-off-by: Alex Williamson <alex@shazbot.org>	2026-01-19 10:25:24 -07:00
Leon Romanovsky	b703b31ea8	types: reuse common phys_vec type instead of DMABUF open‑coded variant After commit `fcf463b92a` ("types: move phys_vec definition to common header"), we can use the shared phys_vec type instead of the DMABUF‑specific dma_buf_phys_vec, which duplicated the same structure and semantics. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20260107-convert-to-pvec-v1-1-6e3ab8079708@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>	2026-01-19 10:13:29 -07:00
Wei Wang	e2692c4eea	iommupt: Do not set C-bit on MMIO backed PTEs AMD Secure Memory Encryption (SME) marks individual memory pages as encrypted by setting the C-bit in page table entries. According to the AMD APM,any pages corresponding to MMIO addresses must be configured with the C-bit clear. The current *_iommu_set_prot() implementation sets the C-bit on all PTEs in the IOMMU page tables. This is incorrect for PTEs backed by MMIO, and can break PCIe peer-to-peer communication when IOVA is used. Fix this by avoiding the C-bit for MMIO-backed mappings. For amdv2 IOMMU page tables, there is a usage scenario for GVA->GPA mappings, and for the trusted MMIO in the TEE-IO case, the C-bit will need to be added to GPA. However, SNP guests do not yet support vIOMMU, and the trusted MMIO support is not ready in upstream. Adding the C-bit for trusted MMIO can be considered once those features land. Fixes: `879ced2bab` ("iommupt: Add the AMD IOMMU v1 page table format") Fixes: `aef5de756e` ("iommupt: Add the x86 64 bit page table format") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Wei Wang <wei.w.wang@hotmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-19 10:19:54 +01:00
Vasant Hegde	3222b6de51	iommu/amd: Fix error path in amd_iommu_probe_device() Currently, the error path of amd_iommu_probe_device() unconditionally references dev_data, which may not be initialized if an early failure occurs (like iommu_init_device() fails). Move the out_err label to ensure the function exits immediately on failure without accessing potentially uninitialized dev_data. Fixes: `19e5cc156c` ("iommu/amd: Enable support for up to 2K interrupts per function") Cc: Rakuram Eswaran <rakuram.e96@gmail.com> Cc: Jörg Rödel <joro@8bytes.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/ Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 11:03:12 +01:00
Suravee Suthikulpanit	103f4e7c85	iommu/amd: Add support for nested domain attach/detach Introduce set_dte_nested() to program guest translation settings in the host DTE when attaches the nested domain to a device. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:15 +01:00
Suravee Suthikulpanit	93eee2a49c	iommu/amd: Refactor logic to program the host page table in DTE Introduce the amd_iommu_set_dte_v1() helper function to configure IOMMU host (v1) page table into DTE. This will be used later when attaching nested doamin. Also, remove obsolete warning when SNP is enabled and domain id is zero since this check is no longer applicable. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:15 +01:00

1 2 3 4 5 ...

6218 commits