Commit graph

6218 commits

Author SHA1 Message Date
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones where the resulting diff is
easy to verify, because just that final GFP_KERNEL argument sits on the
next line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that for a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are far fewer users of the *alloc_flex()
interface.

As with the alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
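Below is a minimal userspace sketch of how a C macro can let an optional trailing flags argument fall back to GFP_KERNEL. Everything in it is hypothetical: `mock_alloc_obj()`, `mock_kmalloc()` and the flag values are stand-ins. The mock accepts only a type name (not `*VAR`), and the kernel's real macros may use a different technique.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real gfp_t flags have different values. */
typedef unsigned int gfp_t;
#define GFP_KERNEL ((gfp_t)0x1u)
#define GFP_ATOMIC ((gfp_t)0x2u)

/* Remembers the flags passed to the most recent allocation. */
static gfp_t last_gfp;

static void *mock_kmalloc(size_t size, gfp_t gfp)
{
	assert(size > 0);
	last_gfp = gfp;
	return malloc(size);
}

/*
 * GFP_KERNEL is appended after the caller's arguments, so GFP binds to the
 * caller's explicit flags when present and to GFP_KERNEL otherwise. Any
 * leftover arguments fall into "..." and are discarded.
 */
#define mock_alloc_obj_(TYPE, GFP, ...) \
	((TYPE *)mock_kmalloc(sizeof(TYPE), GFP))
#define mock_alloc_obj(...) mock_alloc_obj_(__VA_ARGS__, GFP_KERNEL, )
```

With this arrangement, deleting a trailing `, GFP_KERNEL` from a call leaves its behaviour unchanged. A purely mechanical conversion depends on exactly that property.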

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
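The same basic regular expression, run through the POSIX `<regex.h>` API, reproduces this line-oriented limitation. The sketch assumes a POSIX system. `drop_gfp_kernel()` is a hypothetical helper, and like the `sed` command it rewrites at most the first match on a line.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Applies s/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/ to one line of text.
 * Returns 1 and writes the rewritten line to out on a match, else 0.
 */
static int drop_gfp_kernel(const char *line, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m[2];
	int matched;

	/* cflags == 0 selects POSIX basic regular expressions, as sed uses */
	if (regcomp(&re, "\\(alloc_objs*(.*\\), GFP_KERNEL)", 0) != 0)
		return 0;
	matched = regexec(&re, line, 2, m, 0) == 0;
	regfree(&re);
	if (!matched)
		return 0;

	/* text before the match, then group 1, then ")", then the rest */
	snprintf(out, outlen, "%.*s%.*s)%s",
		 (int)m[0].rm_so, line,
		 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so,
		 line + m[0].rm_eo);
	return 1;
}
```

A call whose `GFP_KERNEL` sits on a continuation line never matches: neither physical line contains both `alloc_obj(` and `, GFP_KERNEL)`.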

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".
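Here are hypothetical userspace analogues of the three mappings above. They show how such macros can return a typed pointer and how a flexible-array size can be computed. The `mock_*` names and the saturating helpers are illustrative assumptions. The mocks omit the GFP argument and accept only type names. Saturating to `SIZE_MAX` mirrors the documented behaviour of the kernel's `struct_size()`, so an overflowing request fails instead of returning a short buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Saturating helpers: return SIZE_MAX on overflow so malloc() fails. */
static size_t mul_sat(size_t a, size_t b)
{
	return (b != 0 && a > SIZE_MAX / b) ? SIZE_MAX : a * b;
}

static size_t add_sat(size_t a, size_t b)
{
	return (a > SIZE_MAX - b) ? SIZE_MAX : a + b;
}

/* kmalloc(sizeof(TYPE))                   ->  mock_alloc_obj(TYPE) */
#define mock_alloc_obj(TYPE) ((TYPE *)malloc(sizeof(TYPE)))

/* kmalloc_array(COUNT, sizeof(TYPE))      ->  mock_alloc_objs(TYPE, COUNT) */
#define mock_alloc_objs(TYPE, COUNT) \
	((TYPE *)malloc(mul_sat(sizeof(TYPE), (COUNT))))

/* kmalloc(struct_size(p, FAM, COUNT))     ->  mock_alloc_flex(TYPE, FAM, COUNT) */
#define mock_alloc_flex(TYPE, FAM, COUNT)				\
	((TYPE *)malloc(add_sat(sizeof(TYPE),				\
				mul_sat(sizeof(((TYPE *)0)->FAM[0]), (COUNT)))))
```

Each macro yields `TYPE *` rather than `void *`. Assigning the result to a pointer of an unrelated type therefore draws an incompatible-pointer diagnostic from the compiler.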

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
cee73b1e84 RISC-V updates for v7.0

Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:

 - Add support for control flow integrity for userspace processes.

   This is based on the standard RISC-V ISA extensions Zicfiss and
   Zicfilp

 - Improve ptrace behavior regarding vector registers, and add some
   selftests

 - Optimize our strlen() assembly

 - Enable the ISO-8859-1 code page as built-in, similar to ARM64, for
   EFI volume mounting

 - Clean up some code slightly, including defining copy_user_page() as
   copy_page() rather than memcpy(), aligning us with other
   architectures; and using max3() to slightly simplify an expression
   in riscv_iommu_init_check()

* tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
  riscv: lib: optimize strlen loop efficiency
  selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function
  selftests: riscv: verify ptrace accepts valid vector csr values
  selftests: riscv: verify ptrace rejects invalid vector csr inputs
  selftests: riscv: verify syscalls discard vector context
  selftests: riscv: verify initial vector state with ptrace
  selftests: riscv: test ptrace vector interface
  riscv: ptrace: validate input vector csr registers
  riscv: csr: define vtype register elements
  riscv: vector: init vector context with proper vlenb
  riscv: ptrace: return ENODATA for inactive vector extension
  kselftest/riscv: add kselftest for user mode CFI
  riscv: add documentation for shadow stack
  riscv: add documentation for landing pad / indirect branch tracking
  riscv: create a Kconfig fragment for shadow stack and landing pad support
  arch/riscv: add dual vdso creation logic and select vdso based on hw
  arch/riscv: compile vdso with landing pad and shadow stack note
  riscv: enable kernel access to shadow stack memory via the FWFT SBI call
  riscv: add kernel command line option to opt out of user CFI
  riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe
  ...
2026-02-12 19:17:44 -08:00
Linus Torvalds
cebcffe666 VFIO updates for v7.0-rc1

Merge tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:
 "A small cycle with the bulk in selftests and reintroducing poison
  handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and
  some dmabuf structure consolidation.

   - Update outdated mdev comment referencing the renamed
     mdev_type_add() function (Julia Lawall)

   - Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex
     Mastro)

   - Relax selftest assertion relative to differences in huge page
     handling between legacy (v1) TYPE1 IOMMU mapping behavior and the
     compatibility mode supported by IOMMUFD (David Matlack)

   - Reintroduce memory poison handling support for non-struct-page-
     backed memory in the nvgrace-gpu variant driver (Ankit Agrawal)

   - Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure
     and semantics (Leon Romanovsky)

   - Add missing upstream bridge locking across PCI function reset,
     resolving an assertion failure when secondary bus reset is used to
     provide that reset (Anthony Pighin)

   - Fixes to hisi_acc vfio-pci variant driver to resolve corner case
     issues related to resets, repeated migration, and error injection
     scenarios (Longfang Liu, Weili Qian)

   - Restrict vfio selftest builds to arm64 and x86_64, resolving
     compiler warnings on 32-bit archs (Ted Logan)

   - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has
     stepped up (Ioana Ciornei)"

* tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/fsl-mc: add myself as maintainer
  vfio: selftests: only build tests on arm64 and x86_64
  hisi_acc_vfio_pci: fix the queue parameter anomaly issue
  hisi_acc_vfio_pci: resolve duplicate migration states
  hisi_acc_vfio_pci: update status after RAS error
  hisi_acc_vfio_pci: fix VF reset timeout issue
  vfio/pci: Lock upstream bridge for vfio_pci_core_disable()
  types: reuse common phys_vec type instead of DMABUF open‑coded variant
  vfio/nvgrace-gpu: register device memory for poison handling
  mm: add stubs for PFNMAP memory failure registration functions
  vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU
  vfio: selftests: Add vfio_dma_mapping_mmio_test
  vfio: selftests: Align BAR mmaps for efficient IOMMU mapping
  vfio: selftests: Centralize IOMMU mode name definitions
  vfio/mdev: update outdated comment
2026-02-12 15:52:39 -08:00
Linus Torvalds
c6e62d002b Driver core changes for 7.0-rc1

Merge tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core updates from Danilo Krummrich:
 "Bus:

   - Ensure bus->match() is consistently called with the device lock
     held

   - Improve type safety of bus_find_device_by_acpi_dev()

  Devtmpfs:

   - Parse 'devtmpfs.mount=' boot parameter with kstrtoint() instead of
     simple_strtoul()

   - Avoid sparse warning by making devtmpfs_context_ops static

  IOMMU:

   - Do not register the qcom_smmu_tbu_driver in arm_smmu_device_probe()

  MAINTAINERS:

   - Add the new driver-core mailing list (driver-core@lists.linux.dev)
     to all relevant entries

   - Add missing tree location for "FIRMWARE LOADER (request_firmware)"

   - Add driver-model documentation to the "DRIVER CORE" entry

   - Add missing driver-core maintainers to the "AUXILIARY BUS" entry

  Misc:

   - Change return type of attribute_container_register() to void; it
     has always been infallible

   - Do not export sysfs_change_owner(), sysfs_file_change_owner() and
     device_change_owner()

   - Move devres_for_each_res() from the public devres header to
     drivers/base/base.h

   - Do not use a static struct device for the faux bus; allocate it
     dynamically

  Revocable:

   - Patches for the revocable synchronization primitive had been
     scheduled for v7.0-rc1, but were reverted as they need some more
     refinement

  Rust:

   - Device:
      - Support dev_printk on all device types, not just the core Device
        struct; remove now-redundant .as_ref() calls in dev_* print
        calls

   - Devres:
      - Introduce an internal reference count in Devres<T> to avoid a
        deadlock condition in case of (indirect) nesting

   - DMA:
      - Allow drivers to tune the maximum DMA segment size via
        dma_set_max_seg_size()

   - I/O:
      - Introduce the concept of generic I/O backends to handle
        different kinds of device shared memory through a common
        interface.

        This enables higher-level concepts such as register
        abstractions, I/O slices, and field projections to be built
        generically on top.

        In a first step, introduce the Io, IoCapable<T>, and IoKnownSize
        trait hierarchy for sharing a common interface supporting offset
        validation and bound-checking logic between I/O backends.

      - Refactor MMIO to use the common I/O backend infrastructure

   - Misc:
      - Add __rust_helper annotations to C helpers for inlining into
        Rust code

      - Use "kernel vertical" style for imports

      - Replace kernel::c_str! with C string literals

      - Update ARef imports to use sync::aref

      - Use pin_init::zeroed() for struct auxiliary_device_id and
        debugfs file_operations initialization

      - Use LKMM atomic types in debugfs doc-tests

      - Various minor comment and documentation fixes

   - PCI:
      - Implement PCI configuration space accessors using the common I/O
        backend infrastructure

      - Document pci::Bar device endianness assumptions

   - SoC:
      - Abstractions for struct soc_device and struct soc_device_attribute

      - Sample driver for soc::Device"

* tag 'driver-core-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (79 commits)
  rust: devres: fix race condition due to nesting
  rust: dma: add missing __rust_helper annotations
  samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` print
  Revert "revocable: Revocable resource management"
  Revert "revocable: Add Kunit test cases"
  Revert "selftests: revocable: Add kselftest cases"
  driver core: remove device_change_owner() export
  sysfs: remove exports of sysfs_*change_owner()
  driver core: disable revocable code from build
  revocable: Add KUnit test for concurrent access
  revocable: fix SRCU index corruption by requiring caller-provided storage
  revocable: Add KUnit test for provider lifetime races
  revocable: Fix races in revocable_alloc() using RCU
  driver core: fix inverted "locked" suffix of driver_match_device()
  rust: io: move MIN_SIZE and io_addr_assert to IoKnownSize
  rust: pci: re-export ConfigSpace
  rust: dma: allow drivers to tune max segment size
  gpu: tyr: remove redundant `.as_ref()` for `dev_*` print
  rust: auxiliary: use `pin_init::zeroed()` for device ID
  rust: debugfs: use pin_init::zeroed() for file_operations
  ...
2026-02-11 17:43:59 -08:00
Linus Torvalds
1e0ea4dff0 IOMMU Updates for Linux v7.0

Merge tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core changes:
   - Rust bindings for IO-pgtable code
   - IOMMU page allocation debugging support
   - Disable ATS during PCI resets

  Intel VT-d changes:
   - Skip dev-iotlb flush for inaccessible PCIe device
   - Flush cache for PASID table before using it
   - Use right invalidation method for SVA and NESTED domains
   - Ensure atomicity in context and PASID entry updates

  AMD-Vi changes:
   - Support for nested translations
   - Other minor improvements

  ARM-SMMU-v2 changes:
   - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS"

  ARM-SMMU-v3 changes:
   - Improve CMDQ locking fairness for pathetically small queue sizes
   - Remove tracking of the IAS as this is only relevant for AArch32 and
     was causing C_BAD_STE errors
   - Add device-tree support for NVIDIA's CMDQV extension
   - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields
   - Don't disable ATS for nested S1-bypass nested domains
   - Additions to the kunit selftests"

* tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits)
  iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
  iommu/amd: serialize sequence allocation under concurrent TLB invalidations
  iommu/amd: Fix type of type parameter to amd_iommufd_hw_info()
  iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate
  iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
  iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence
  iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence
  iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence
  iommu/arm-smmu-v3: Add device-tree support for CMDQV driver
  iommu/tegra241-cmdqv: Decouple driver from ACPI
  iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p
  iommu/vt-d: Fix race condition during PASID entry replacement
  iommu/vt-d: Clear Present bit before tearing down context entry
  iommu/vt-d: Clear Present bit before tearing down PASID entry
  iommu/vt-d: Flush piotlb for SVM and Nested domain
  iommu/vt-d: Flush cache for PASID table before using it
  iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
  iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
  rust: iommu: fix `srctree` link warning
  rust: iommu: fix Rust formatting
  ...
2026-02-11 16:36:08 -08:00
Linus Torvalds
2619c62b7e Trivial cleanups for the posted MSI interrupt handling

Merge tag 'x86-irq-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 irq updates from Thomas Gleixner:
 "Trivial cleanups for the posted MSI interrupt handling"

* tag 'x86-irq-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq_remapping: Sanitize posted_msi_supported()
  x86/irq: Cleanup posted MSI code
2026-02-10 17:39:08 -08:00
Linus Torvalds
4e21e585b6 A series of treewide cleanups to ensure interrupt request consistency.

Merge tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq cleanups from Thomas Gleixner:
 "A series of treewide cleanups to ensure interrupt request consistency.

   - Add the missing IRQF_COND_ONESHOT flag to devm_request_irq()

     This is inconsistent with request_irq() and causes the same issues
     which were addressed by the introduction of this flag

   - Clean up IRQF_ONESHOT and IRQF_NO_THREAD usage

     Quite a few drivers have inconsistent interrupt request flags
     related to interrupt threading, namely IRQF_ONESHOT and
     IRQF_NO_THREAD. This leads to warnings and/or malfunction when
     forced interrupt threading is enabled.

   - Remove stub primary (hard interrupt) handlers

     A bunch of drivers implement a stub primary (hard interrupt)
     handler which just returns IRQ_WAKE_THREAD. The same functionality
     is provided by the core code when the primary handler argument of
     request_threaded_irq() is set to NULL"
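The interaction between a NULL primary handler and IRQF_ONESHOT can be modelled in userspace. This is a hypothetical sketch: the types, the placeholder flag value and `mock_request_threaded_irq()` are stand-ins. It loosely follows the genirq rule that a threaded request with a NULL primary handler receives the core's default handler, which only returns IRQ_WAKE_THREAD. Such a request is rejected without IRQF_ONESHOT unless the interrupt chip is marked oneshot-safe; that exception is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the genirq types; the flag value is a placeholder. */
typedef enum { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);
#define IRQF_ONESHOT 0x1ul

struct mock_irqaction {
	irq_handler_t handler;
	irq_handler_t thread_fn;
	unsigned long flags;
};

/* What every stub primary handler did, and what the core's default does. */
static irqreturn_t default_primary_handler(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return IRQ_WAKE_THREAD;
}

/* Example threaded handler that services the device. */
static irqreturn_t demo_thread_fn(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return IRQ_HANDLED;
}

/*
 * A NULL primary handler is replaced by the default one. Since that default
 * cannot quiet the device, the line must stay masked (IRQF_ONESHOT) until
 * thread_fn has run, so requests without the flag are rejected.
 */
static int mock_request_threaded_irq(struct mock_irqaction *act,
				     irq_handler_t handler,
				     irq_handler_t thread_fn,
				     unsigned long flags)
{
	if (!handler) {
		if (!thread_fn || !(flags & IRQF_ONESHOT))
			return -EINVAL;
		handler = default_primary_handler;
	}
	act->handler = handler;
	act->thread_fn = thread_fn;
	act->flags = flags;
	return 0;
}
```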

* tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  media: pci: mg4b: Use IRQF_NO_THREAD
  mfd: wm8350-core: Use IRQF_ONESHOT
  thermal/qcom/lmh: Replace IRQF_ONESHOT with IRQF_NO_THREAD
  rtc: amlogic-a4: Remove IRQF_ONESHOT
  usb: typec: fusb302: Remove IRQF_ONESHOT
  EDAC/altera: Remove IRQF_ONESHOT
  char: tpm: cr50: Remove IRQF_ONESHOT
  ARM: versatile: Remove IRQF_ONESHOT
  scsi: efct: Use IRQF_ONESHOT and default primary handler
  Bluetooth: btintel_pcie: Use IRQF_ONESHOT and default primary handler
  bus: fsl-mc: Use default primary handler
  mailbox: bcm-ferxrm-mailbox: Use default primary handler
  iommu/amd: Use core's primary handler and set IRQF_ONESHOT
  platform/x86: int0002: Remove IRQF_ONESHOT from request_irq()
  genirq: Set IRQF_COND_ONESHOT in devm_request_irq().
2026-02-10 13:22:50 -08:00
Joerg Roedel
ad09563660 Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and 'core' into next 2026-02-06 11:10:40 +01:00
Viktor Kleen
02f9d76a76 iommu/vt-d: Treat PAGE_SNOOP and PWSNP separately
The PASID_FLAG_PAGE_SNOOP and PASID_FLAG_PWSNP constants are identical.
This will cause the pasid code to always set both or neither of the
PGSNP and PWSNP bits in PASID table entries. However, PWSNP is a
reserved bit if SMPWC is not set in the IOMMU's extended capability
register, even if SC is supported.

This has resulted in DMAR errors when testing the iommufd code on an
Arrow Lake platform. With this patch, those errors disappear and the
PASID table entries look correct.
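A small sketch of this bug class: when two flag constants share a value, one behaviour cannot be requested without the other. The bit positions and names are hypothetical and do not reflect the VT-d PASID entry layout. In this model, the caller would pass the PWSNP flag only when the IOMMU reports SMPWC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions in a PASID table entry. */
#define ENTRY_PGSNP (UINT64_C(1) << 0)
#define ENTRY_PWSNP (UINT64_C(1) << 1)

struct snoop_flags {
	unsigned int page_snoop;
	unsigned int pwsnp;
};

/* Buggy definitions: both flags share one value. */
static const struct snoop_flags buggy_flags = { 1u << 0, 1u << 0 };
/* Fixed definitions: each flag has its own bit. */
static const struct snoop_flags fixed_flags = { 1u << 0, 1u << 1 };

/* Translate requested flags into entry bits, one flag per bit. */
static uint64_t build_entry(const struct snoop_flags *f, unsigned int flags)
{
	uint64_t entry = 0;

	if (flags & f->page_snoop)
		entry |= ENTRY_PGSNP;
	if (flags & f->pwsnp)
		entry |= ENTRY_PWSNP;
	return entry;
}
```

With the buggy definitions, asking only for page snoop also sets PWSNP. On hardware without SMPWC, that bit is reserved.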

Fixes: 101a285411 ("iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry")
Cc: stable@vger.kernel.org
Signed-off-by: Viktor Kleen <viktor@kleen.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260202192109.1665799-1-viktor@kleen.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-02-06 11:01:00 +01:00
Yu Zhang
b48ca92061 iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
Add the current (iova, len) range to the iotlb gather, regardless of
whether PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS is set.

In gather_range_pages(), the current IOVA range is only added to
iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with
NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case the current
(iova, len) is never added, so iotlb_gather stays in its initial empty
state (start=ULONG_MAX, end=0). The subsequent iommu_iotlb_sync() then
performs an IOTLB invalidation with wrong parameters (e.g.,
amd_iommu_iotlb_sync() computes the size as
gather->end - gather->start + 1, which yields an invalid range).

The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain
unchanged: when the new range is disjoint from the existing gather,
we still sync first and then add the new range, so semantics for
NO_GAPS are preserved.
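Below is a simplified model of the gather logic. It is loosely modelled on kernel helpers such as `iommu_iotlb_gather_add_range()` and `iommu_iotlb_gather_is_disjoint()`, but it is not their code; the struct, the helpers and the flush accounting are all hypothetical. It shows why an untouched gather yields a bogus size. It also shows the fixed flow: under NO_GAPS, a disjoint range triggers a sync, and every range is then added.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical gather: an inclusive [start, end] range plus flush stats. */
struct gather {
	unsigned long start;
	unsigned long end;
	unsigned int syncs;
	unsigned long last_flush_size;
};

static void gather_init(struct gather *g)
{
	g->start = ULONG_MAX;	/* "empty" marker, as in the text above */
	g->end = 0;
	g->syncs = 0;
	g->last_flush_size = 0;
}

/* Size computed the way the text describes: end - start + 1. */
static unsigned long gather_size(const struct gather *g)
{
	return g->end - g->start + 1;
}

/* Record a flush of the current range, then reset to empty. */
static void gather_sync(struct gather *g)
{
	g->last_flush_size = gather_size(g);
	g->syncs++;
	g->start = ULONG_MAX;
	g->end = 0;
}

/* An empty gather (end == 0) is never disjoint from a new range. */
static bool gather_is_disjoint(const struct gather *g,
			       unsigned long iova, unsigned long len)
{
	unsigned long last = iova + len - 1;

	return g->end != 0 && (last + 1 < g->start || iova > g->end + 1);
}

static void gather_add(struct gather *g, unsigned long iova, unsigned long len)
{
	unsigned long last = iova + len - 1;

	if (iova < g->start)
		g->start = iova;
	if (last > g->end)
		g->end = last;
}

/* Fixed flow: sync first if NO_GAPS and disjoint; always add the range. */
static void unmap_range(struct gather *g, unsigned long iova,
			unsigned long len, bool no_gaps)
{
	if (no_gaps && gather_is_disjoint(g, iova, len))
		gather_sync(g);
	gather_add(g, iova, len);
}
```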

Fixes: 7c53f4238a ("iommupt: Add unmap_pages op")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-02-03 14:36:21 +01:00
Ankit Soni
9e249c4841 iommu/amd: serialize sequence allocation under concurrent TLB invalidations
With concurrent TLB invalidations, completion waits randomly time out
because cmd_sem_val was incremented outside the IOMMU spinlock, allowing
CMD_COMPL_WAIT commands to be queued out of sequence and breaking the
ordering assumption in wait_on_sem().
Move the cmd_sem_val increment under iommu->lock so completion sequence
allocation is serialized with command queuing.
Also remove an unnecessary return.
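
A userspace sketch of the locking pattern. It assumes a pthread mutex in place of iommu->lock and a hypothetical struct cmd_queue in place of the real command buffer. Because the sequence value is allocated and the command is queued under one lock, queue order always matches sequence order.

```c
#include <assert.h>
#include <pthread.h>

#define QLEN 1024

/* Hypothetical command queue: records the sequence value carried by each
 * completion-wait command, in queue order. */
struct cmd_queue {
	pthread_mutex_t lock;		/* models iommu->lock */
	unsigned long sem_val;		/* models cmd_sem_val */
	unsigned long queued[QLEN];
	int tail;
};

/* Allocate the sequence number and queue the command in one critical
 * section, so no other thread can queue a later value first. */
static unsigned long queue_completion_wait(struct cmd_queue *q)
{
	unsigned long seq;

	pthread_mutex_lock(&q->lock);
	seq = ++q->sem_val;
	q->queued[q->tail++] = seq;
	pthread_mutex_unlock(&q->lock);
	return seq;
}

static void *worker(void *arg)
{
	for (int i = 0; i < 256; i++)
		queue_completion_wait(arg);
	return NULL;
}

/* Returns 1 if every queued value is exactly one more than the previous. */
static int queue_in_order(const struct cmd_queue *q)
{
	for (int i = 1; i < q->tail; i++)
		if (q->queued[i] != q->queued[i - 1] + 1)
			return 0;
	return 1;
}
```

Incrementing sem_val before taking the lock would let two threads queue their commands in the opposite order of their sequence values.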

Fixes: d2a0cac105 ("iommu/amd: move wait_on_sem() out of spinlock")

Tested-by: Srikanth Aithal <sraithal@amd.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-02-03 14:27:05 +01:00
Sebastian Andrzej Siewior
5bfcdccb4d iommu/amd: Use core's primary handler and set IRQF_ONESHOT
request_threaded_irq() is invoked with a primary and a secondary handler
and no flags are passed. The primary handler is the same as
irq_default_primary_handler() so there is no need to have an identical
copy.

The lack of IRQF_ONESHOT can be dangerous because the interrupt
source is not masked while the threaded handler is active. This means,
especially on level-triggered interrupt lines, the interrupt can fire again
before the threaded handler has had a chance to run.

Use the default primary interrupt handler by specifying NULL and set
IRQF_ONESHOT so the interrupt source is masked until the secondary
handler is done.

Fixes: 72fe00f01f ("x86/amd-iommu: Use threaded interupt handler")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260128095540.863589-4-bigeasy@linutronix.de
2026-02-01 17:37:13 +01:00
Linus Torvalds
162b42445b IOMMU Fixes for Linux v6.19-rc7
Including:
 
 	- Fix a performance regression caused by new Generic
 	  IO-Page-Table code detected in Intel VT-d driver.
 
 	- Command queue flushing fix for NVidia version of the
 	  ARM-SMMU-v3

Merge tag 'iommu-fixes-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

 - Fix a performance regression caused by the new Generic IO-Page-Table
   code detected in Intel VT-d driver

 - Command queue flushing fix for NVidia version of the ARM-SMMU-v3

* tag 'iommu-fixes-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user()
  iommupt: Only cache flush memory changed by unmap
2026-01-31 09:40:13 -08:00
Nicolin Chen
80f1a2c233 iommu/tegra241-cmdqv: Reset VCMDQ in tegra241_vcmdq_hw_init_user()
The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset
the HW registers. So, the driver explicitly clears all the registers when a
VINTF or VCMDQ is being initialized, by calling its hw_deinit() function.

However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ
getting reset in tegra241_vcmdq_hw_init().

Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will
not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF
at that stage.

As a result, the VCMDQ registers may be left dirty, which can cause the VM to fail.

Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init_user()
to fix this bug. This is required by a host kernel.

Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org
Reported-by: Bao Nguyen <ncqb@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-31 10:22:08 +01:00
Deepanshu Kartikey
2724138b2f iommufd: Initialize batch->kind in batch_clear()
KMSAN reported an uninitialized value when batch_add_pfn_num() reads
batch->kind. This occurs because batch_clear() does not initialize the
kind field.

When batch_add_pfn_num() checks "if (batch->kind != kind)", it reads this
uninitialized value, triggering KMSAN warnings. However, the algorithm is
fine with any value in kind at this point, as the batch is always empty and
it always corrects kind if wrong.

Initialize batch->kind to zero in batch_clear() to silence the KMSAN
warning.
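
A trimmed-down model of the fix, with hypothetical names: struct model_batch, model_batch_add() and the kind values are illustrations, not the iommufd definitions. Clearing sets kind to a defined value, and the add path simply corrects kind while the batch is still empty.

```c
#include <assert.h>

/* Hypothetical batch kinds; the first one deliberately has value 0. */
enum model_kind { MODEL_KIND_CPU = 0, MODEL_KIND_MMIO };

#define MODEL_BATCH_SLOTS 16

/* Hypothetical, trimmed-down pfn batch. */
struct model_batch {
	unsigned long pfns[MODEL_BATCH_SLOTS];
	unsigned int npfns[MODEL_BATCH_SLOTS];
	unsigned int end;		/* number of used slots */
	unsigned int total_pfns;	/* 0 means the batch is empty */
	enum model_kind kind;
};

static void model_batch_clear(struct model_batch *batch)
{
	batch->total_pfns = 0;
	batch->end = 0;
	batch->kind = 0;	/* the fix: kind is never left uninitialized */
}

/* Returns 1 if the pfns were added, 0 if the caller must flush first. */
static int model_batch_add(struct model_batch *batch, unsigned long pfn,
			   unsigned int nr, enum model_kind kind)
{
	if (batch->kind != kind) {
		/* A non-empty batch of another kind must be flushed first. */
		if (batch->total_pfns)
			return 0;
		batch->kind = kind;	/* empty batch: just correct kind */
	}
	if (batch->end == MODEL_BATCH_SLOTS)
		return 0;
	batch->pfns[batch->end] = pfn;
	batch->npfns[batch->end] = nr;
	batch->end++;
	batch->total_pfns += nr;
	return 1;
}
```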

Link: https://patch.msgid.link/r/20260124132214.624041-1-kartikey406@gmail.com
Reported-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=df28076a30d726933015
Fixes: f394576eb1 ("iommufd: PFN handling for iopt_pages")
Tested-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com
Reported-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-01-28 12:49:17 -04:00
Jason Gunthorpe
5815d9303c iommupt: Only cache flush memory changed by unmap
The cache flush was happening on every level across the whole range of
iteration, even if no leaves or tables were cleared. Instead, flush only the
sub-range that was actually written.

Overflushing isn't a correctness problem but it does impact the
performance of unmap.
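
A sketch of the "flush only what was written" idea for a single table level. It assumes a flat array of entries and a hypothetical model_flush() that stands in for the CPU cache flush. The real iommupt walker is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Records the last range "flushed"; stands in for a CPU cache flush. */
struct flush_log {
	size_t first, last;	/* inclusive index range of the last flush */
	int calls;
};

static void model_flush(struct flush_log *log, size_t first, size_t last)
{
	log->first = first;
	log->last = last;
	log->calls++;
}

/* Clear the non-zero entries in [start, end) of one table level and flush
 * only the sub-range that was actually written. */
static void model_unmap_level(unsigned long *table, size_t start, size_t end,
			      struct flush_log *log)
{
	size_t first = 0, last = 0;
	int written = 0;

	for (size_t i = start; i < end; i++) {
		if (!table[i])
			continue;	/* already empty: no write, no flush */
		table[i] = 0;
		if (!written)
			first = i;
		last = i;
		written = 1;
	}
	if (written)
		model_flush(log, first, last);
}
```

A level where nothing changes issues no flush at all, which is where the overflushing cost came from.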

After this series the performance compared to the original VT-d
implementation with cache flushing turned on is:

map_pages
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,    253,266   ,     213,227     ,   6.06
     2^21,    246,244   ,     221,219     ,   0.00
     2^30,    231,240   ,     209,217     ,   3.03
 256*2^12,   2604,2668  ,    2415,2540    ,   4.04
 256*2^21,   2495,2824  ,    2390,2734    ,  12.12
 256*2^30,   2542,2845  ,    2380,2718    ,  12.12

unmap_pages
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,    259,292   ,     222,251     ,  11.11
     2^21,    255,259   ,     227,236     ,   3.03
     2^30,    238,254   ,     217,230     ,   5.05
 256*2^12,   2751,2620  ,    2417,2437    ,   0.00
 256*2^21,   2461,2526  ,    2377,2423    ,   1.01
 256*2^30,   2498,2543  ,    2370,2404    ,   1.01

Fixes: efa03dab7c ("iommupt: Flush the CPU cache after any writes to the page table")
Reported-by: Francois Dugast <francois.dugast@intel.com>
Closes: https://lore.kernel.org/all/20260121130233.257428-1-francois.dugast@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-28 15:14:17 +01:00
Nathan Chancellor
5b0530bb16 iommu/amd: Fix type of type parameter to amd_iommufd_hw_info()
When building with -Wincompatible-function-pointer-types-strict, a
warning designed to catch kernel control flow integrity (kCFI) issues at
build time, there is an instance around amd_iommufd_hw_info():

  drivers/iommu/amd/iommu.c:3141:13: error: incompatible function pointer types initializing 'void *(*)(struct device *, u32 *, enum iommu_hw_info_type *)' (aka 'void *(*)(struct device *, unsigned int *, enum iommu_hw_info_type *)') with an expression of type 'void *(struct device *, u32 *, u32 *)' (aka 'void *(struct device *, unsigned int *, unsigned int *)') [-Werror,-Wincompatible-function-pointer-types-strict]
   3141 |         .hw_info = amd_iommufd_hw_info,
        |                    ^~~~~~~~~~~~~~~~~~~

While 'u32 *' and 'enum iommu_hw_info_type *' are ABI compatible, hence
no regular warning from -Wincompatible-function-pointer-types, the
mismatch will trigger a kCFI violation when amd_iommufd_hw_info() is
called indirectly.

Update the type parameter of amd_iommufd_hw_info() to be
'enum iommu_hw_info_type *' to match the prototype in
'struct iommu_ops', clearing up the warning and kCFI violation.
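
A userspace sketch of the type fix, using hypothetical stand-ins for struct device, the hw_info type enum, and the ops structure. With the enum pointer in the prototype, the function's type matches the ops slot exactly. Declaring the third parameter as u32 * instead is accepted by the regular warning but, as in the error quoted above, rejected by -Wincompatible-function-pointer-types-strict.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int u32;

/* Minimal stand-ins for the kernel types (hypothetical values). */
struct device { int id; };
enum hw_info_type { HW_INFO_TYPE_DEFAULT = 0, HW_INFO_TYPE_EXAMPLE = 1 };

/* Ops slot with the same shape as the hw_info prototype quoted above. */
struct ops_model {
	void *(*hw_info)(struct device *dev, u32 *length,
			 enum hw_info_type *type);
};

static u32 example_info = 42;

/* The third parameter is the enum pointer, not u32 *, so this function's
 * type is identical to the slot's and a kCFI type check on the indirect
 * call would be satisfied. */
static void *example_hw_info(struct device *dev, u32 *length,
			     enum hw_info_type *type)
{
	(void)dev;
	*length = sizeof(example_info);
	*type = HW_INFO_TYPE_EXAMPLE;
	return &example_info;
}

static const struct ops_model example_ops = {
	.hw_info = example_hw_info,
};
```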

Fixes: 7d8b06ecc4 ("iommu/amd: Add support for hw_info for iommu capability query")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-28 15:13:01 +01:00
Danilo Krummrich
559ac49154 Driver core fixes deferred from 6.19-rc7
[1, 2] were originally intended for -rc7. Patch [1] uncovered potential
 deadlocks that require a few driver fixes; [2] is one such fix.
 
 [1] https://patch.msgid.link/20260113162843.12712-1-hanguidong02@gmail.com
 [2] https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org

Merge tag 'driver-core-6.19-rc7-deferred' into driver-core-next

Driver core fixes deferred from 6.19-rc7

[1, 2] were originally intended for -rc7. Patch [1] uncovered potential
deadlocks that require a few driver fixes; [2] is one such fix.

[1] https://patch.msgid.link/20260113162843.12712-1-hanguidong02@gmail.com
[2] https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org

Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-01-26 14:12:02 +01:00
Markus Elfring
3127718ad9 iommu/riscv: Simplify maximum determination in riscv_iommu_init_check()
Replace nested max() calls with a single max3() call in this
function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://patch.msgid.link/d1a384c9-f154-4537-94d6-c3613f4167bc@web.de
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-01-25 21:09:52 -07:00
Linus Torvalds
b33d706259 IOMMU Fixes for Linux v6.19-rc6
Including:
 
 	- AMD IOMMU: Fix potential NULL-ptr dereference in error path of
 	  amd_iommu_probe_device().
 
 	- Generic IOMMUPT: Fix another compiler issue seen with older compiler
 	  versions.
 
 	- Fix signedness issue in ARM IO-PageTable code.

Merge tag 'iommu-fixes-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

 - AMD IOMMU: Fix potential NULL-ptr dereference in error path
   of amd_iommu_probe_device()

 - Generic IOMMUPT: Fix another compiler issue seen with older
   compiler versions

 - Fix signedness issue in ARM IO-PageTable code

* tag 'iommu-fixes-v6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/io-pgtable-arm: fix size_t signedness bug in unmap path
  iommupt: Make it clearer to the compiler that pts.level == 0 for single page
  iommu/amd: Fix error path in amd_iommu_probe_device()
2026-01-23 12:46:12 -08:00
Nicolin Chen
a45dd34663 iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate
A vSTE may have three configuration types: Abort, Bypass, and Translate.

An Abort vSTE wouldn't enable ATS, but the other two might.

It makes sense for a Translate vSTE to rely on the guest vSTE.EATS field.

For a Bypass vSTE, it would end up with an S2-only physical STE, similar
to an attachment to a regular S2 domain. However, the nested case always
disables ATS following the Bypass vSTE, while the regular S2 case always
enables ATS so long as arm_smmu_ats_supported(master) == true.

Note that ATS is needed for certain VM-centric workloads and historically
non-vSMMU cases have relied on this automatic enablement. So, having the
nested case behave differently causes problems.

To fix that, add a condition to disable_ats, so that it might enable ATS
for a Bypass vSTE, aligning with the regular S2 case.
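
The behavior described above can be summarized as a decision table. This is a hypothetical model: enum vste_cfg and nested_ats_enabled() are illustrative, not the driver's disable_ats logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three vSTE configurations. */
enum vste_cfg { VSTE_ABORT, VSTE_BYPASS, VSTE_TRANSLATE };

/* Whether ATS ends up enabled for a nested attachment:
 *  - Abort:     never.
 *  - Bypass:    like a plain S2 attachment, i.e. whenever supported.
 *  - Translate: follow the guest's vSTE.EATS. */
static bool nested_ats_enabled(enum vste_cfg cfg, bool guest_eats,
			       bool ats_supported)
{
	if (!ats_supported)
		return false;
	switch (cfg) {
	case VSTE_ABORT:
		return false;
	case VSTE_BYPASS:
		return true;
	case VSTE_TRANSLATE:
		return guest_eats;
	}
	return false;
}
```

The Bypass row is the one this fix changes: it previously always disabled ATS.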

Fixes: f27298a82b ("iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23 13:58:38 +00:00
Nicolin Chen
a4f976edcb iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage
An STE in the nested case requires both S1 and S2 fields, which makes this
use case different from the existing ones.

Add coverage for previously failed cases shifting between S2-only and S1+S2
STEs.

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23 13:47:49 +00:00
Jason Gunthorpe
7cad800485 iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence
If the VM wants to toggle EATS_TRANS off at the same time as changing the CFG,
the hypervisor will see EATS change to 0 and insert a V=0 breaking update into
the STE even though the VM did not ask for that.

In bare metal, EATS_TRANS is ignored by CFG=ABORT/BYPASS, which is why this
does not cause a problem until we have the nested case where CFG is always
a variation of S2 trans that does use EATS_TRANS.

Relax the rules for EATS_TRANS sequencing; we don't need it to be exact, as
the enclosing code will always disable ATS at the PCI device when changing
EATS_TRANS. This ensures there are no ATS transactions that can race with
an EATS_TRANS change, so we don't need to carefully sequence these bits.

Fixes: 1e8be08d1c ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23 13:47:49 +00:00
Jason Gunthorpe
f3c1d372db iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence
Nested CD tables set the MEV bit to try to reduce multi-fault spamming on
the hypervisor. Since MEV is in STE word 1 this causes a breaking update
sequence that is not required and impacts real workloads.

For the purposes of STE updates, the value of MEV doesn't matter: if it is
set or cleared early or late, it just results in a change to the fault
reports that the kernel must support anyway. The spec says:

 Note: Software must expect, and be able to deal with, coalesced fault
 records even when MEV == 0.

So mark STE MEV safe when computing the update sequence, to avoid creating
a breaking update.

Fixes: da0c56520e ("iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23 13:47:49 +00:00
Jason Gunthorpe
2781f2a930 iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence
C_BAD_STE was observed when updating a nested STE from an S1-bypass mode to
an S1DSS-bypass mode. As both modes enable S2, their used bits are slightly
different from those of the normal S1-bypass and S1DSS-bypass modes. As a
result, fields like MEV and EATS in S2's used list marked word 1 as a
critical word that required STE.V=0. This breaks a hitless update.

However, neither MEV nor EATS is critical in terms of STE updates. One
controls the merging of events, and the other controls ATS, which the
driver manages at the same time via pci_enable_ats().

Add an arm_smmu_get_ste_update_safe() to allow the STE update algorithm to
relax those fields, avoiding the STE update breakages.

After this change, entry_set has no caller checking its return value, so
change it to void.

Note that this change is required by both MEV and EATS fields, which were
introduced in different kernel versions. So add get_update_safe() first.
MEV and EATS will be added to arm_smmu_get_ste_update_safe() separately.
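
A simplified model of the update_safe idea. It assumes a four-word entry, a single combined used mask, and a hypothetical MEV bit position; none of these match the real STE layout or the driver's helpers. A word counts as critical only if it changes a used bit that is not marked safe. More than one critical word forces a breaking update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_WORDS 4			/* simplified; not the real STE size */
#define MODEL_MEV   (1ULL << 3)		/* hypothetical bit position */

/* Count words holding a bit that differs between the current and target
 * entries, is used by either configuration, and is not update-safe. */
static int model_critical_words(const uint64_t *cur, const uint64_t *target,
				const uint64_t *used, const uint64_t *safe)
{
	int n = 0;

	for (int i = 0; i < MODEL_WORDS; i++)
		if ((cur[i] ^ target[i]) & used[i] & ~safe[i])
			n++;
	return n;
}

/* One 64-bit store can change a single critical word hitlessly; more than
 * one critical word requires the V=0 (breaking) sequence. */
static bool model_needs_breaking_update(const uint64_t *cur,
					const uint64_t *target,
					const uint64_t *used,
					const uint64_t *safe)
{
	return model_critical_words(cur, target, used, safe) > 1;
}
```

With MEV marked safe, a change to word 0 plus a MEV flip in word 1 no longer needs V=0.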

Fixes: 1e8be08d1c ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23 13:47:49 +00:00
Ashish Mhetre
ea69dc4e20 iommu/arm-smmu-v3: Add device-tree support for CMDQV driver
Add device tree support to the CMDQV driver to enable usage on Tegra264
SoCs. The implementation parses the nvidia,cmdqv phandle from the SMMU
device tree node to associate each SMMU with its corresponding CMDQV
instance based on the compatible string.

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 15:12:08 +00:00
Nicolin Chen
eb20758f86 iommu/tegra241-cmdqv: Decouple driver from ACPI
A platform device is created by acpi_create_platform_device() per CMDQV's
adev. That means there is no point in going through _CRS of ACPI.

Replace all the ACPI functions with standard platform functions and drop
all ACPI dependencies. This also makes the driver compatible with DT.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 15:12:08 +00:00
Danilo Krummrich
ed1ac3c977 iommu/arm-smmu-qcom: do not register driver in probe()
Commit 0b4eeee287 ("iommu/arm-smmu-qcom: Register the TBU driver in
qcom_smmu_impl_init") intended to also probe the TBU driver when
CONFIG_ARM_SMMU_QCOM_DEBUG is disabled, but also moved the corresponding
platform_driver_register() call into qcom_smmu_impl_init() which is
called from arm_smmu_device_probe().

However, it neither makes sense to register drivers from probe()
callbacks of other drivers, nor does the driver core allow registering
drivers with a device lock already being held.

The latter was revealed by commit dc23806a7c ("driver core: enforce
device_lock for driver_match_device()") leading to a deadlock condition
described in [1].

Additionally, it was noted by Robin that the current approach is
potentially racy with async probe [2].

Hence, fix this by registering the qcom_smmu_tbu_driver from
module_init(). Unfortunately, due to the vendoring of the driver, this
requires an indirection through arm-smmu-impl.c.

Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/lkml/7ae38e31-ef31-43ad-9106-7c76ea0e8596@sirena.org.uk/
Link: https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/ [1]
Link: https://lore.kernel.org/lkml/0c0d3707-9ea5-44f9-88a1-a65c62e3df8d@arm.com/ [2]
Fixes: dc23806a7c ("driver core: enforce device_lock for driver_match_device()")
Fixes: 0b4eeee287 ("iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: Konrad Dybcio <konradybcio@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> #LX2160ARDB
Tested-by: Wang Jiayue <akaieurus@gmail.com>
Reviewed-by: Wang Jiayue <akaieurus@gmail.com>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Link: https://patch.msgid.link/20260121141215.29658-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-01-22 15:39:55 +01:00
Bibek Kumar Patro
14e9a138dd iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p
The ACTLR configuration for the sa8775p MDSS client was inadvertently
dropped while reworking the commit f91879fdf7 ("iommu/arm-smmu-qcom:
Add actlr settings for mdss on Qualcomm platforms"). Without this
entry, the sa8775p MDSS block does not receive the intended default
ACTLR configuration.

Restore the missing compatible entry so that the platform receives the
expected behavior.

Fixes: f91879fdf7 ("iommu/arm-smmu-qcom: Add actlr settings for mdss on Qualcomm platforms")
Signed-off-by: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 14:24:36 +00:00
Lu Baolu
c3b1edea37 iommu/vt-d: Fix race condition during PASID entry replacement
The Intel VT-d PASID table entry is 512 bits (64 bytes). When replacing
an active PASID entry (e.g., during domain replacement), the current
implementation calculates a new entry on the stack and copies it to the
table using a single structure assignment.

        struct pasid_entry *pte, new_pte;

        pte = intel_pasid_get_entry(dev, pasid);
        pasid_pte_config_first_level(iommu, &new_pte, ...);
        *pte = new_pte;

Because the hardware may fetch the 512-bit PASID entry in multiple
128-bit chunks, updating the entire entry while it is active (Present
bit set) risks a "torn" read. In this scenario, the IOMMU hardware
could observe an inconsistent state — partially new data and partially
old data — leading to unpredictable behavior or spurious faults.

Fix this by removing the unsafe "replace" helpers and following the
"clear-then-update" flow, which ensures the Present bit is cleared and
the required invalidation handshake is completed before the new
configuration is applied.
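
A userspace sketch of the clear-then-update flow. It assumes the following:
- The 512-bit entry is modelled as eight 64-bit words, with a hypothetical P bit in word 0.
- A counting hook stands in for the PASID-cache/IOTLB/Device-TLB invalidations.
- __atomic_thread_fence() (a GCC/Clang builtin) is used as a rough stand-in for dma_wmb().

The new contents are written only while P is clear, and P is set last.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_WORDS	8		/* 512-bit entry as 8 x 64-bit words */
#define ENTRY_PRESENT	(1ULL << 0)	/* hypothetical P bit in word 0 */

struct model_entry { uint64_t val[ENTRY_WORDS]; };

static int flushes;	/* counts modelled cache/TLB invalidations */

static void model_flush_caches(void) { flushes++; }

/* Rough userspace stand-in for dma_wmb(). */
static void model_wmb(void) { __atomic_thread_fence(__ATOMIC_RELEASE); }

static void model_replace_entry(volatile struct model_entry *pte,
				const struct model_entry *new)
{
	/* Clear only P so hardware stops treating the entry as live. */
	pte->val[0] &= ~ENTRY_PRESENT;
	model_wmb();
	/* Drop cached copies before touching any other field. */
	model_flush_caches();
	/* Write the new fields while P is still clear. */
	for (int i = 1; i < ENTRY_WORDS; i++)
		pte->val[i] = new->val[i];
	pte->val[0] = new->val[0] & ~ENTRY_PRESENT;
	/* Make the fields visible before publishing the entry via P. */
	model_wmb();
	pte->val[0] |= new->val[0] & ENTRY_PRESENT;
}
```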

Fixes: 7543ee63e8 ("iommu/vt-d: Add pasid replace helpers")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:30 +01:00
Lu Baolu
c1e4f1dccb iommu/vt-d: Clear Present bit before tearing down context entry
When tearing down a context entry, the current implementation zeros the
entire 128-bit entry using multiple 64-bit writes. This creates a window
where the hardware can fetch a "torn" entry — where some fields are
already zeroed while the 'Present' bit is still set — leading to
unpredictable behavior or spurious faults.

While x86 provides strong write ordering, the compiler may reorder writes
to the two 64-bit halves of the context entry. Even without compiler
reordering, the hardware fetch is not guaranteed to be atomic with
respect to multiple CPU writes.

Align with the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:

1. Clear only the 'Present' (P) bit of the context entry first to
   signal the transition of ownership from hardware to software.
2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
3. Perform the required cache and context-cache invalidation to ensure
   hardware no longer has cached references to the entry.
4. Fully zero out the entry only after the invalidation is complete.

Also, add a dma_wmb() to context_set_present() to ensure the entry
is fully initialized before the 'Present' bit becomes visible.
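
The four steps, plus the ordering in context_set_present(), can be sketched in userspace as follows. The layout (a 128-bit entry as two 64-bit words with P in bit 0 of the low word) and the model_* helpers are assumptions. __atomic_thread_fence() only approximates dma_wmb().

```c
#include <assert.h>
#include <stdint.h>

#define CTX_PRESENT (1ULL << 0)	/* assumed P bit position in lo */

struct model_context_entry { uint64_t lo, hi; };

static int invalidations;	/* counts modelled context-cache flushes */

static void model_dma_wmb(void) { __atomic_thread_fence(__ATOMIC_RELEASE); }
static void model_invalidate(void) { invalidations++; }

/* Teardown following the four steps listed above. */
static void model_context_teardown(volatile struct model_context_entry *ce)
{
	ce->lo &= ~CTX_PRESENT;	/* 1. hand ownership back to software */
	model_dma_wmb();	/* 2. make the cleared P bit visible */
	model_invalidate();	/* 3. drop cached references to the entry */
	ce->lo = 0;		/* 4. only now zero the whole entry */
	ce->hi = 0;
}

/* Barrier first, so the other fields are visible before P is set. */
static void model_context_set_present(volatile struct model_context_entry *ce)
{
	model_dma_wmb();
	ce->lo |= CTX_PRESENT;
}
```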

Fixes: ba39592764 ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka@chromium.org>
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:29 +01:00
Lu Baolu
75ed00055c iommu/vt-d: Clear Present bit before tearing down PASID entry
The Intel VT-d Scalable Mode PASID table entry consists of 512 bits (64
bytes). When tearing down an entry, the current implementation zeros the
entire 64-byte structure immediately using multiple 64-bit writes.

Since the IOMMU hardware may fetch these 64 bytes using multiple
internal transactions (e.g., four 128-bit bursts), updating or zeroing
the entire entry while it is active (P=1) risks a "torn" read. If a
hardware fetch occurs simultaneously with the CPU zeroing the entry, the
hardware could observe an inconsistent state, leading to unpredictable
behavior or spurious faults.

Follow the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:

1. Clear only the 'Present' (P) bit of the PASID entry.
2. Use a dma_wmb() to ensure the cleared bit is visible to hardware
   before proceeding.
3. Execute the required invalidation sequence (PASID cache, IOTLB, and
   Device-TLB flush) to ensure the hardware has released all cached
   references.
4. Only after the flushes are complete, zero out the remaining fields
   of the PASID entry.

Also, add a dma_wmb() in pasid_set_present() to ensure that all other
fields of the PASID entry are visible to the hardware before the Present
bit is set.

Fixes: 0bbeb01a4f ("iommu/vt-d: Manage scalalble mode PASID tables")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:29 +01:00
Yi Liu
04b1b069f1 iommu/vt-d: Flush piotlb for SVM and Nested domain
Besides the paging domains that use FS, SVM and Nested domains need to
use the piotlb invalidation descriptor as well.

Fixes: b33125296b ("iommu/vt-d: Create unique domain ops for each stage")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20251223065824.6164-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:29 +01:00
Dmytro Maluka
22d169bdd2 iommu/vt-d: Flush cache for PASID table before using it
When writing the address of a freshly allocated, zero-initialized PASID
table to a PASID directory entry, do so after the CPU cache flush for
this PASID table, not before it. This avoids a time window in which the
PASID table may already be used by non-coherent IOMMU hardware while its
contents in RAM are still stale data rather than zeros.

Fixes: 194b3348bd ("iommu/vt-d: Fix PASID directory pointer coherency")
Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20251221123508.37495-1-dmaluka@chromium.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:29 +01:00
Jinhui Guo
10e60d8781 iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode
Commit 4fc82cd907 ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") relies on
pci_dev_is_disconnected() to skip ATS invalidation for
safely-removed devices, but it does not cover link-down caused
by faults, which can still hard-lock the system.

For example, if a VM fails to connect to the PCIe device,
"virsh destroy" is executed to release resources and isolate
the fault, but a hard-lockup occurs while releasing the group fd.

Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 intel_pasid_tear_down_entry
 device_block_translation
 blocking_domain_attach_dev
 __iommu_attach_device
 __iommu_device_set_domain
 __iommu_group_set_domain_internal
 iommu_detach_group
 vfio_iommu_type1_detach_group
 vfio_group_detach_container
 vfio_group_fops_release
 __fput

Although pci_device_is_present() is slower than
pci_dev_is_disconnected(), it still takes only ~70 µs on a
ConnectX-5 (8 GT/s, x2) and becomes even faster as PCIe speed
and width increase.

Besides, devtlb_invalidation_with_pasid() is called only in the
paths below, which are far less frequent than memory map/unmap.

1. mm-struct release
2. {attach,release}_dev
3. set/remove PASID
4. dirty-tracking setup

The gain in system stability far outweighs the negligible cost
of using pci_device_is_present() instead of pci_dev_is_disconnected()
to decide when to skip ATS invalidation, especially under GDR
high-load conditions.

Fixes: 4fc82cd907 ("iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected")
Cc: stable@vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-3-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:28 +01:00
Jinhui Guo
42662d1983 iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode
PCIe endpoints with ATS enabled and passed through to userspace
(e.g., QEMU, DPDK) can hard-lock the host when their link drops,
either by surprise removal or by a link fault.

Commit 4fc82cd907 ("iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected") adds pci_dev_is_disconnected()
to devtlb_invalidation_with_pasid() so ATS invalidation is skipped
only when the device is being safely removed, but it applies only
when Intel IOMMU scalable mode is enabled.

With scalable mode disabled or unsupported, a system hard-lock
occurs when a PCIe endpoint's link drops because the Intel IOMMU
waits indefinitely for an ATS invalidation that cannot complete.

Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 __context_flush_dev_iotlb.part.0
 domain_context_clear_one_cb
 pci_for_each_dma_alias
 device_block_translation
 blocking_domain_attach_dev
 iommu_deinit_device
 __iommu_group_remove_device
 iommu_release_device
 iommu_bus_notifier
 blocking_notifier_call_chain
 bus_notify
 device_del
 pci_remove_bus_device
 pci_stop_and_remove_bus_device
 pciehp_unconfigure_device
 pciehp_disable_slot
 pciehp_handle_presence_or_link_change
 pciehp_ist

Commit 81e921fd32 ("iommu/vt-d: Fix NULL domain on device release")
adds intel_pasid_teardown_sm_context() to intel_iommu_release_device(),
which calls qi_flush_dev_iotlb() and can also hard-lock the system
when a PCIe endpoint's link drops.

Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 __context_flush_dev_iotlb.part.0
 intel_context_flush_no_pasid
 device_pasid_table_teardown
 pci_pasid_table_teardown
 pci_for_each_dma_alias
 intel_pasid_teardown_sm_context
 intel_iommu_release_device
 iommu_deinit_device
 __iommu_group_remove_device
 iommu_release_device
 iommu_bus_notifier
 blocking_notifier_call_chain
 bus_notify
 device_del
 pci_remove_bus_device
 pci_stop_and_remove_bus_device
 pciehp_unconfigure_device
 pciehp_disable_slot
 pciehp_handle_presence_or_link_change
 pciehp_ist

Sometimes the endpoint loses connection without a link-down event
(e.g., due to a link fault); killing the process (virsh destroy)
then hard-locks the host.

Call Trace:
 qi_submit_sync
 qi_flush_dev_iotlb
 __context_flush_dev_iotlb.part.0
 domain_context_clear_one_cb
 pci_for_each_dma_alias
 device_block_translation
 blocking_domain_attach_dev
 __iommu_attach_device
 __iommu_device_set_domain
 __iommu_group_set_domain_internal
 iommu_detach_group
 vfio_iommu_type1_detach_group
 vfio_group_detach_container
 vfio_group_fops_release
 __fput

pci_dev_is_disconnected() only covers safe-removal paths;
pci_device_is_present() tests accessibility by reading
vendor/device IDs and internally calls pci_dev_is_disconnected().
On a ConnectX-5 (8 GT/s, x2) this costs ~70 µs.

Since __context_flush_dev_iotlb() is only called on
{attach,release}_dev paths (not hot), add pci_device_is_present()
there to skip inaccessible devices and avoid the hard-lock.

Fixes: 37764b952e ("iommu/vt-d: Global devTLB flush when present context entry changed")
Fixes: 81e921fd32 ("iommu/vt-d: Fix NULL domain on device release")
Cc: stable@vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Link: https://lore.kernel.org/r/20251211035946.2071-2-guojinhui.liam@bytedance.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22 09:20:28 +01:00
Mostafa Saleh
a7f1bc231b iommu: debug-pagealloc: Use page_ext_get_from_phys()
Instead of calling pfn_valid() and then getting the page, call
the newly added function page_ext_get_from_phys(), which also
checks for MMIO and offline memory and returns NULL in those
cases.
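As a rough illustration, the user-space model below folds the checks the message mentions into one lookup that returns NULL, so a caller needs only a NULL check. The PFN table, the state enum and mock_page_ext_get_from_phys() are hypothetical; this is not the kernel's page_ext_get_from_phys() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT	12
#define MOCK_NR_PFNS	8

/* Hypothetical per-PFN state; the kernel derives this from its memory model. */
enum mock_pfn_state { MOCK_ONLINE_RAM, MOCK_OFFLINE_RAM, MOCK_MMIO };

struct mock_page_ext {
	unsigned long flags;
};

/* PFN 2 is offline memory, PFN 3 is MMIO, everything else is online RAM. */
enum mock_pfn_state mock_pfn_map[MOCK_NR_PFNS] = {
	MOCK_ONLINE_RAM, MOCK_ONLINE_RAM, MOCK_OFFLINE_RAM, MOCK_MMIO,
	MOCK_ONLINE_RAM, MOCK_ONLINE_RAM, MOCK_ONLINE_RAM, MOCK_ONLINE_RAM,
};
struct mock_page_ext mock_ext_table[MOCK_NR_PFNS];

/* Single lookup: NULL for a PFN with no page, for MMIO and for offline memory. */
struct mock_page_ext *mock_page_ext_get_from_phys(uint64_t phys)
{
	uint64_t pfn = phys >> MOCK_PAGE_SHIFT;

	if (pfn >= MOCK_NR_PFNS)			/* the pfn_valid() part */
		return NULL;
	if (mock_pfn_map[pfn] != MOCK_ONLINE_RAM)	/* MMIO or offline memory */
		return NULL;
	return &mock_ext_table[pfn];
}
```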

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-21 12:51:49 +01:00
Chaitanya Kulkarni
374e7af67d iommu/io-pgtable-arm: fix size_t signedness bug in unmap path
__arm_lpae_unmap() returns size_t but was returning -ENOENT (negative
error code) when encountering an unmapped PTE. Since size_t is unsigned,
-ENOENT (typically -2) becomes a huge positive value (0xFFFFFFFFFFFFFFFE
on 64-bit systems).

This corrupted value propagates through the call chain:
  __arm_lpae_unmap() returns -ENOENT as size_t
  -> arm_lpae_unmap_pages() returns it
  -> __iommu_unmap() adds it to iova address
  -> iommu_pgsize() triggers BUG_ON due to corrupted iova

This can cause IOVA address overflow in __iommu_unmap() loop and
trigger BUG_ON in iommu_pgsize() from invalid address alignment.

Fix by returning 0 instead of -ENOENT. The WARN_ON already signals
the error condition, and returning 0 (meaning "nothing unmapped")
is the correct semantics for a size_t return type. This matches the
behavior of other io-pgtable implementations (io-pgtable-arm-v7s,
io-pgtable-dart) which return 0 on error conditions.
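The user-space sketch below uses hypothetical function names and is not the io-pgtable code. It shows how a negative errno returned as size_t becomes a huge byte count that misaligns the caller's iova, and how returning 0 stops the caller's loop cleanly. The caller loop is only loosely modelled on __iommu_unmap(), which stops when a call reports nothing unmapped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy variant: the return type is size_t, but it returns a negative errno. */
size_t unmap_buggy(int pte_present, size_t pgsize)
{
	if (!pte_present)
		return -ENOENT;		/* converted to SIZE_MAX - 1 */
	return pgsize;
}

/* Fixed variant: 0 means "nothing unmapped". */
size_t unmap_fixed(int pte_present, size_t pgsize)
{
	if (!pte_present)
		return 0;
	return pgsize;
}

/*
 * Simplified caller: advance iova by whatever each call reports and stop
 * when a call unmaps nothing. Returns the total the caller believes it unmapped.
 */
size_t unmap_range(size_t (*unmap)(int, size_t), uint64_t iova, size_t size,
		   int pte_present, uint64_t *iova_out)
{
	size_t unmapped = 0;

	assert(unmap != NULL && iova_out != NULL);
	while (unmapped < size) {
		size_t ret = unmap(pte_present, 4096);

		if (!ret)
			break;
		iova += ret;		/* with the bug, this wraps and misaligns iova */
		unmapped += ret;
	}
	*iova_out = iova;
	return unmapped;
}
```

With the buggy variant and a missing PTE, unmap_range() reports (size_t)-ENOENT bytes unmapped and leaves iova no longer page-aligned; with the fixed variant it reports 0 and leaves iova unchanged.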

Fixes: 3318f7b5ce ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()")
Cc: stable@vger.kernel.org
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-20 18:07:35 +01:00
Jason Gunthorpe
98d5110f90 iommupt: Make it clearer to the compiler that pts.level == 0 for single page
Older versions of gcc and clang sometimes get tripped up by the build time
assertion in FIELD_PREP because they can see that the argument to
FIELD_PREP is constant but can't see that the if condition protecting it
is also a constant false.

   In file included from <command-line>:
   In function 'amdv1pt_install_leaf_entry',
       inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:651:3,
       inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1,
       inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9,
       inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:658:10,
       inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1:
   ././include/linux/compiler_types.h:631:45: error: call to '__compiletime_assert_251' declared with attribute error: FIELD_PREP: value too large for the field
     631 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	 |                                             ^
   ././include/linux/compiler_types.h:612:25: note: in definition of macro '__compiletime_assert'
     612 |                         prefix ## suffix();                             \
	 |                         ^~~~~~
   ././include/linux/compiler_types.h:631:9: note: in expansion of macro '_compiletime_assert'
     631 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	 |         ^~~~~~~~~~~~~~~~~~~
   ./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
	 |                                     ^~~~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:69:17: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      69 |                 BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ?           \
	 |                 ^~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:90:17: note: in expansion of macro '__BF_FIELD_CHECK_MASK'
      90 |                 __BF_FIELD_CHECK_MASK(mask, val, pfx);                  \
	 |                 ^~~~~~~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:137:17: note: in expansion of macro '__FIELD_PREP'
     137 |                 __FIELD_PREP(_mask, _val, "FIELD_PREP: ");              \
	 |                 ^~~~~~~~~~~~
   drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP'
     220 |                          FIELD_PREP(AMDV1PT_FMT_OA,
	 |                          ^~~~~~~~~~

Changing the caller to check pts.level == 0 avoids demanding a bit of
complex reasoning from the compiler that pts.level == level == 0. Instead
the compiler sees that pt_install_leaf_entry() is called with a constant
pts.level == 0 which makes it more reliable to see the constant false in
the if.

Fixes: dcd6a011a8 ("iommupt: Add map_pages op")
Reported-by: Chunyu Hu <chuhu@redhat.com>
Closes: https://lore.kernel.org/all/aUn9uGPCooqB-RIF@gmail.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-20 10:18:04 +01:00
Suravee Suthikulpanit
c0a652a3d1 iommu/amd: Remove unused variable in amd_iommufd_viommu_destroy()
This fixes a warning reported by the 0-DAY CI Kernel Test Service.

Fixes: 757d2b1fdf ("iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601190634.bl7Mjx5Q-lkp@intel.com/
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-20 10:16:16 +01:00
Alex Williamson
fab06e956f * Reuse common phys_vec, phase out dma_buf_phys_vec
Signed-off-by: Alex Williamson <alex@shazbot.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmluaLsRHGFsZXhAc2hh
 emJvdC5vcmcACgkQI5ubbjuwiyIZow/+N/BbZYkRub0lFbufNQ/+M/N80kRzfjjQ
 dE2/AN+YDfLdFxNvmq/P1fYI2B/Oc8msuGwT+C+eJMSVMzZIdCDvnhxH+aPH9roo
 5zmE7fxLjWp/k1lOwb8TupOIR3Jl334AiC/t9PvqraIUMGvt3Uv+/7KoCx/xDxb7
 ID6NbDKiXCZpvlQN1dukf8TCzDVFJnWOOMKLiDnez14okfd5rIqiFpOWMjtAhwlY
 VrwIy/eVqX6YktniwunqU3f4/BeurlHdCS29LdDnHdZzL6HER5onvMwIO8qMeuFZ
 yS8Bf8KJTYtqEedMtFUi5a/ipYu4vuK0KEqIE6USKZLuhQCqLVjFDKs5zUGRQ48X
 qLs59BBmP1WgOnM63OGXzBAAvelLNoh/D5KVzzXmQyNkBn6mFy1MWeR38Ozgl0FA
 +GjK+iwV/GRo+CgDa6Vz+eVwvCV2RhcYlT4cK6BodIQbwd9SWAMEcRxI/IEvOxfC
 YY/1U2JRhOSaQb9j65xgwylEbwoi8BMVbFWE3DydYMr+9PVaOyTKLcJLKrYmhwLn
 cuPetgLaK3UtxdcfhnZyrwzpmtvA56SAReQYg9s+TXFGFurQjNlGVlcKk4dB45nX
 JcOtWHm/6+3D8qoN6FY8Vj5QPePn48urSw1R1/D0LP7951gxknILiQpI7aqEPHyU
 rjAZH6nH6bI=
 =c1Gn
 -----END PGP SIGNATURE-----

Merge tag 'common_phys_vec_via_vfio' into v6.20/vfio/next

 * Reuse common phys_vec, phase out dma_buf_phys_vec

Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:25:24 -07:00
Leon Romanovsky
b703b31ea8 types: reuse common phys_vec type instead of DMABUF open‑coded variant
After commit fcf463b92a ("types: move phys_vec definition to common header"),
we can use the shared phys_vec type instead of the DMABUF‑specific
dma_buf_phys_vec, which duplicated the same structure and semantics.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20260107-convert-to-pvec-v1-1-6e3ab8079708@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19 10:13:29 -07:00
Wei Wang
e2692c4eea iommupt: Do not set C-bit on MMIO backed PTEs
AMD Secure Memory Encryption (SME) marks individual memory pages as
encrypted by setting the C-bit in page table entries. According to the
AMD APM, any pages corresponding to MMIO addresses must be configured
with the C-bit clear.

The current *_iommu_set_prot() implementation sets the C-bit on all PTEs
in the IOMMU page tables. This is incorrect for PTEs backed by MMIO, and
can break PCIe peer-to-peer communication when IOVA is used. Fix this by
avoiding the C-bit for MMIO-backed mappings.
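The following sketch shows the rule only and is not the *_iommu_set_prot() code. The PTE bit positions and the MOCK_PROT_* flags are hypothetical; on real hardware the C-bit position is reported by CPUID. The logic sets the C-bit only for memory-backed mappings and leaves it clear for MMIO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE bits for illustration only. */
#define MOCK_PTE_PRESENT	(1ULL << 0)
#define MOCK_PTE_WRITE		(1ULL << 1)
#define MOCK_PTE_CBIT		(1ULL << 47)

/* Hypothetical prot flags, loosely modelled on read/write/MMIO mapping flags. */
#define MOCK_PROT_READ		(1u << 0)
#define MOCK_PROT_WRITE		(1u << 1)
#define MOCK_PROT_MMIO		(1u << 2)

uint64_t mock_set_prot(unsigned int prot, int sme_active)
{
	uint64_t pte = MOCK_PTE_PRESENT;

	if (prot & MOCK_PROT_WRITE)
		pte |= MOCK_PTE_WRITE;
	/* Only memory-backed mappings are marked encrypted; MMIO keeps the C-bit clear. */
	if (sme_active && !(prot & MOCK_PROT_MMIO))
		pte |= MOCK_PTE_CBIT;
	return pte;
}
```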

For amdv2 IOMMU page tables, there is a usage scenario for GVA->GPA
mappings, and for the trusted MMIO in the TEE-IO case, the C-bit will need
to be added to GPA. However, SNP guests do not yet support vIOMMU, and the
trusted MMIO support is not ready in upstream. Adding the C-bit for trusted
MMIO can be considered once those features land.

Fixes: 879ced2bab ("iommupt: Add the AMD IOMMU v1 page table format")
Fixes: aef5de756e ("iommupt: Add the x86 64 bit page table format")
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Wei Wang <wei.w.wang@hotmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-19 10:19:54 +01:00
Vasant Hegde
3222b6de51 iommu/amd: Fix error path in amd_iommu_probe_device()
Currently, the error path of amd_iommu_probe_device() unconditionally
references dev_data, which may not be initialized if an early failure
occurs (e.g., if iommu_init_device() fails).

Move the out_err label to ensure the function exits immediately on
failure without accessing potentially uninitialized dev_data.
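The label placement can be sketched with a hypothetical user-space model. mock_probe_device(), mock_init_device() and struct mock_dev_data are illustrative and do not reflect the AMD driver's code. Because out_err sits after every dev_data access, an early failure returns without dereferencing the unset pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-device data standing in for the driver's dev_data. */
struct mock_dev_data {
	int max_irqs;
};

/* Stand-in for an early init step that can fail before dev_data is set up. */
int mock_init_device(int fail, struct mock_dev_data **out)
{
	assert(out != NULL);
	if (fail)
		return -ENOMEM;
	*out = calloc(1, sizeof(**out));
	return *out ? 0 : -ENOMEM;
}

int mock_probe_device(int fail_init, int *max_irqs_out)
{
	struct mock_dev_data *dev_data;	/* left unset if init fails */
	int ret;

	ret = mock_init_device(fail_init, &dev_data);
	if (ret)
		goto out_err;

	/* Post-init setup that dereferences dev_data. */
	dev_data->max_irqs = 2048;
	*max_irqs_out = dev_data->max_irqs;
	free(dev_data);	/* the model frees at once; a real driver would keep it */

	/*
	 * The label sits after every dev_data access, so the early-failure
	 * path exits without touching the uninitialized pointer.
	 */
out_err:
	return ret;
}
```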

Fixes: 19e5cc156c ("iommu/amd: Enable support for up to 2K interrupts per function")
Cc: Rakuram Eswaran <rakuram.e96@gmail.com>
Cc: Jörg Rödel <joro@8bytes.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-18 11:03:12 +01:00
Suravee Suthikulpanit
103f4e7c85 iommu/amd: Add support for nested domain attach/detach
Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaching the nested domain to a device.

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-18 10:56:15 +01:00
Suravee Suthikulpanit
93eee2a49c iommu/amd: Refactor logic to program the host page table in DTE
Introduce the amd_iommu_set_dte_v1() helper function to configure
the IOMMU host (v1) page table in the DTE. This will be used later
when attaching a nested domain.

Also, remove the obsolete warning when SNP is enabled and the domain ID
is zero, since this check is no longer applicable.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-18 10:56:15 +01:00