linux

mirror of https://github.com/torvalds/linux.git synced 2026-03-08 03:44:45 +01:00

Author	SHA1	Message	Date
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	1e0ea4dff0	IOMMU Updates for Linux v7.0 Including: - Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets - Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates - AMD-Vi changes: - Support for nested translations - Other minor improvements - ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS". - ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes. - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors. - Add device-tree support for NVIDIA's CMDQV extension. - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields. - Don't disable ATS for nested S1-bypass nested domains. - Additions to the kunit selftests. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmmLDZwACgkQK/BELZcB GuNHgg//Yf9K/+T6+IOemA5Z8k3x2p39Q/Dv5x+SEGkh+CUh2C5dX97WD9LHntus 1mgIHlSgbM3bgMB+XTS1Q5ghy1QH71XOMnGCPhthwg843iCP2CcrB84ZZKKnNmw9 2YJdxYlNcbAMpvSd0F1XKaXoiNl9qzWx+QFtnVaTXMptNEhYOxMOlaZPtlEuwfJa T7h4cwtsiMDLWA4pw85y4hfvc5jKRv4dMoohin0lNEBpWkCfYE6b2Cjpff+9TtU2 Jyvvcvyns0US3amEwPHlIyfTUPKdaq6Vv3NX8TkAJUhGyEzdfwEtzqAvWMvOEYFh HfnE/LjZZLB1CUkF5MTib9dBgJACf/jtvOtuh4wZkx+7O2WIR6Ebo41dtWBM6dxh cHGeeQGqxdDZ5UJbIonF8Am0lxsaZx2zs09tlHEMGl2pNDi6vUppk1iTOkv3Wog0 zy4GhDBl0n/IcyCaIinnWck8C+BsAMcRGpDP2AB0I9/C2qpsaFY/NdNkbIGidhaJ 3khdAcjWsNPiJPNbUx66n6t8RSXdYKUuhJq2a/GgYmtAjhRR9cJlupB8/QYCBS5j fxXpHp4xMtw+Cgj58xC+gYXDivQOEThPs/BhL/qrxOzWE03HWI15MFydqRFWicnI gJCZSevMncBfNUTIJUSUmuT7ukP40cnh58QBeRkTmKGcW6HjuyY= =W/nW -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - Rust bindings for IO-pgtable code - IOMMU page allocation debugging support - Disable ATS during PCI resets Intel VT-d changes: - Skip dev-iotlb flush for inaccessible PCIe device - Flush cache for PASID table before using it - Use right invalidation method for SVA and NESTED domains - Ensure atomicity in context and PASID entry updates AMD-Vi changes: - Support for nested translations - Other minor improvements ARM-SMMU-v2 changes: - Configure SoC-specific prefetcher settings for Qualcomm's "MDSS" ARM-SMMU-v3 changes: - Improve CMDQ locking fairness for pathetically small queue sizes - Remove tracking of the IAS as this is only relevant for AArch32 and was causing C_BAD_STE errors - Add device-tree support for NVIDIA's CMDQV extension - Allow some hitless transitions for the 'MEV' and 'EATS' STE fields - Don't disable ATS for nested S1-bypass nested domains - Additions to the kunit selftests" * tag 'iommu-updates-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommupt: Always add IOVA range to iotlb_gather in gather_range_pages() iommu/amd: serialize sequence allocation under concurrent TLB invalidations iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() iommu/arm-smmu-v3: Do not set disable_ats unless vSTE is Translate iommu/arm-smmu-v3-test: Add nested s1bypass/s1dssbypass coverage iommu/arm-smmu-v3: Mark EATS_TRANS safe when computing the update sequence iommu/arm-smmu-v3: Mark STE MEV safe when computing the update sequence iommu/arm-smmu-v3: Add update_safe bits to fix STE update sequence iommu/arm-smmu-v3: Add device-tree support for CMDQV driver iommu/tegra241-cmdqv: Decouple driver from ACPI iommu/arm-smmu-qcom: Restore ACTLR settings for MDSS on sa8775p iommu/vt-d: Fix race condition during PASID entry replacement iommu/vt-d: Clear Present bit before tearing down context entry iommu/vt-d: Clear Present bit before tearing down PASID entry iommu/vt-d: Flush piotlb for SVM and Nested domain iommu/vt-d: Flush cache for PASID table before using it iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode iommu/vt-d: Skip dev-iotlb flush for inaccessible PCIe device without scalable mode rust: iommu: fix `srctree` link warning rust: iommu: fix Rust formatting ...	2026-02-11 16:36:08 -08:00
Linus Torvalds	4e21e585b6	A series of treewide cleanups to ensure interrupt request consistency. - Add the missing IRQF_COND_ONESHOT flag to devm_request_irq() This is inconsistent vs. request_irq() and causes the same issues which where addressed with the introduction of this flag - Cleanup IRQF_ONESHOT and IRQF_NO_THREAD usage Quite some drivers have inconsistent interrupt request flags related to interrupt threading namely IRQF_ONESHOT and IRQF_NO_THREAD. This leads to warnings and/or malfunction when forced interrupt threading is enabled. - Remove stub primary (hard interrupt) handlers A bunch of drivers implement a stub primary (hard interrupt) handler which just returns IRQ_WAKE_THREAD. The same functionality is provided by the core code when the primary handler argument of request_thread_irq() is set to NULL. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmmJs8MQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoTbvEACH4OegGofKri7aecUPNcpRdQDHBoueikni Rio/vydFJ/H2hto4xlSPC4C84onxuFqY9lJgo/tCQTCrO0t+ZQ4ZGqnlQKzLJzmv vcVzNgGsxDZ0p1wJO0rBpTRxJN8yTXi8VVv5e6OPuihjLhdXGesyYtk1zosR3nOS CF/w8r9jVMzsSMPvtEMr5AwXD9ZTziUqyhQv94fYlpsbyD4TPXnUxhVkdUFFHHo3 ROzWPFw1Ykh6wpdRPEpupcCf1d2Pq0TIAU86y3Sbf2msuXiTouHf+lH1uTd3EsLN 6qUIqRYjwWE8HTieh+3YcH415wrIsUsWJb8YDi0DpqhPbja3IXP5ACHqEWaaNHRA MaBE2Gc02se4ChXMWncYR3cdzyAAwAeKLUahpLNc+7U4cHOm1w2g60yy4I0v2krh V0vfEN88WQ8DgrM0VvDLST6ZinSz4ia+R0qYWywl6eIW4RVNtuBi6wrN5PtzSEtz jZ3LqnRLGmNfKwS/taHBCAme7NIJSNa1L0ao/icnW5XVQz/d2EHVcUsLHecHZSMx l9tr/g3t85tsFW1eIKfF8T1a5DrbCEP4afceQk9KexAfAkP7el53M1E1yQDk/kW8 so0CwZtbDJ136RQdBIQqx49QrUEOvtrgNDRQxPFBUrWEHcvjqbUuFclp9hpLheOj 8YnzkVe0Rg== =vrmm -----END PGP SIGNATURE----- Merge tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A series of treewide cleanups to ensure interrupt request consistency. - Add the missing IRQF_COND_ONESHOT flag to devm_request_irq() This is inconsistent vs request_irq() and causes the same issues which where addressed with the introduction of this flag - Cleanup IRQF_ONESHOT and IRQF_NO_THREAD usage Quite some drivers have inconsistent interrupt request flags related to interrupt threading namely IRQF_ONESHOT and IRQF_NO_THREAD. This leads to warnings and/or malfunction when forced interrupt threading is enabled. - Remove stub primary (hard interrupt) handlers A bunch of drivers implement a stub primary (hard interrupt) handler which just returns IRQ_WAKE_THREAD. The same functionality is provided by the core code when the primary handler argument of request_thread_irq() is set to NULL" * tag 'irq-cleanups-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: media: pci: mg4b: Use IRQF_NO_THREAD mfd: wm8350-core: Use IRQF_ONESHOT thermal/qcom/lmh: Replace IRQF_ONESHOT with IRQF_NO_THREAD rtc: amlogic-a4: Remove IRQF_ONESHOT usb: typec: fusb302: Remove IRQF_ONESHOT EDAC/altera: Remove IRQF_ONESHOT char: tpm: cr50: Remove IRQF_ONESHOT ARM: versatile: Remove IRQF_ONESHOT scsi: efct: Use IRQF_ONESHOT and default primary handler Bluetooth: btintel_pcie: Use IRQF_ONESHOT and default primary handler bus: fsl-mc: Use default primary handler mailbox: bcm-ferxrm-mailbox: Use default primary handler iommu/amd: Use core's primary handler and set IRQF_ONESHOT platform/x86: int0002: Remove IRQF_ONESHOT from request_irq() genirq: Set IRQF_COND_ONESHOT in devm_request_irq().	2026-02-10 13:22:50 -08:00
Joerg Roedel	ad09563660	Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2026-02-06 11:10:40 +01:00
Ankit Soni	9e249c4841	iommu/amd: serialize sequence allocation under concurrent TLB invalidations With concurrent TLB invalidations, completion wait randomly gets timed out because cmd_sem_val was incremented outside the IOMMU spinlock, allowing CMD_COMPL_WAIT commands to be queued out of sequence and breaking the ordering assumption in wait_on_sem(). Move the cmd_sem_val increment under iommu->lock so completion sequence allocation is serialized with command queuing. And remove the unnecessary return. Fixes: `d2a0cac105` ("iommu/amd: move wait_on_sem() out of spinlock") Tested-by: Srikanth Aithal <sraithal@amd.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-02-03 14:27:05 +01:00
Sebastian Andrzej Siewior	5bfcdccb4d	iommu/amd: Use core's primary handler and set IRQF_ONESHOT request_threaded_irq() is invoked with a primary and a secondary handler and no flags are passed. The primary handler is the same as irq_default_primary_handler() so there is no need to have an identical copy. The lack of the IRQF_ONESHOT can be dangerous because the interrupt source is not masked while the threaded handler is active. This means, especially on LEVEL typed interrupt lines, the interrupt can fire again before the threaded handler had a chance to run. Use the default primary interrupt handler by specifying NULL and set IRQF_ONESHOT so the interrupt source is masked until the secondary handler is done. Fixes: `72fe00f01f` ("x86/amd-iommu: Use threaded interupt handler") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128095540.863589-4-bigeasy@linutronix.de	2026-02-01 17:37:13 +01:00
Nathan Chancellor	5b0530bb16	iommu/amd: Fix type of type parameter to amd_iommufd_hw_info() When building with -Wincompatible-function-pointer-types-strict, a warning designed to catch kernel control flow integrity (kCFI) issues at build time, there is an instance around amd_iommufd_hw_info(): drivers/iommu/amd/iommu.c:3141:13: error: incompatible function pointer types initializing 'void ()(struct device , u32 , enum iommu_hw_info_type )' (aka 'void ()(struct device , unsigned int , enum iommu_hw_info_type )') with an expression of type 'void (struct device , u32 , u32 )' (aka 'void (struct device , unsigned int , unsigned int )') [-Werror,-Wincompatible-function-pointer-types-strict] 3141 \| .hw_info = amd_iommufd_hw_info, \| ^~~~~~~~~~~~~~~~~~~ While 'u32 ' and 'enum iommu_hw_info_type ' are ABI compatible, hence no regular warning from -Wincompatible-function-pointer-types, the mismatch will trigger a kCFI violation when amd_iommufd_hw_info() is called indirectly. Update the type parameter of amd_iommufd_hw_info() to be 'enum iommu_hw_info_type *' to match the prototype in 'struct iommu_ops', clearing up the warning and kCFI violation. Fixes: `7d8b06ecc4` ("iommu/amd: Add support for hw_info for iommu capability query") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-28 15:13:01 +01:00
Suravee Suthikulpanit	c0a652a3d1	iommu/amd: Remove unused variable in amd_iommufd_viommu_destroy() This fixes warning reported by 0-DAY CI Kernel Test Service. Fixes: `757d2b1fdf` ("iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601190634.bl7Mjx5Q-lkp@intel.com/ Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-20 10:16:16 +01:00
Vasant Hegde	3222b6de51	iommu/amd: Fix error path in amd_iommu_probe_device() Currently, the error path of amd_iommu_probe_device() unconditionally references dev_data, which may not be initialized if an early failure occurs (like iommu_init_device() fails). Move the out_err label to ensure the function exits immediately on failure without accessing potentially uninitialized dev_data. Fixes: `19e5cc156c` ("iommu/amd: Enable support for up to 2K interrupts per function") Cc: Rakuram Eswaran <rakuram.e96@gmail.com> Cc: Jörg Rödel <joro@8bytes.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/ Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 11:03:12 +01:00
Suravee Suthikulpanit	103f4e7c85	iommu/amd: Add support for nested domain attach/detach Introduce set_dte_nested() to program guest translation settings in the host DTE when attaches the nested domain to a device. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:15 +01:00
Suravee Suthikulpanit	93eee2a49c	iommu/amd: Refactor logic to program the host page table in DTE Introduce the amd_iommu_set_dte_v1() helper function to configure IOMMU host (v1) page table into DTE. This will be used later when attaching nested doamin. Also, remove obsolete warning when SNP is enabled and domain id is zero since this check is no longer applicable. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:15 +01:00
Suravee Suthikulpanit	4e1b09d90b	iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() To help avoid duplicate logic when programing DTE for nested translation. Note that this commit changes behavior of when the IOMMU driver is switching domain during attach and the blocking domain, where DTE bit fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt) and System management message could be affected. These DTE bits are specified in the IVRS table for specific devices, and should be persistent. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:14 +01:00
Suravee Suthikulpanit	757d2b1fdf	iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Each nested domain is assigned guest domain ID (gDomID), which guest OS programs into guest Device Table Entry (gDTE). For each gDomID, the driver assigns a corresponding host domain ID (hDomID), which will be programmed into the host Device Table Entry (hDTE). The hDomID is allocated during amd_iommu_alloc_domain_nested(), and free during nested_domain_free(). The gDomID-to-hDomID mapping info (struct guest_domain_mapping_info) is stored in a per-viommu xarray (struct amd_iommu_viommu.gdomid_array), which is indexed by gDomID. Note also that parent domain can be shared among struct iommufd_viommu. Therefore, when hypervisor invalidates the nest parent domain, the AMD IOMMU command INVALIDATE_IOMMU_PAGES must be issued for each hDomID in the gdomid_array. This is handled by the iommu_flush_pages_v1_hdom_ids(), where it iterates through struct protection_domain.viommu_list. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:14 +01:00
Suravee Suthikulpanit	774180a74a	iommu/amd: Add support for nested domain allocation The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer table along with guest page tables. The struct iommu_hwpt_amd_guest contains this information, and is passed from user-space as a parameter of the struct iommu_ops.domain_alloc_nested(). Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:13 +01:00
Suravee Suthikulpanit	e113a72576	iommu/amd: Introduce struct amd_iommu_viommu Which stores reference to nested parent domain assigned during the call to struct iommu_ops.viommu_init(). Information in the nest parent is needed when setting up the nested translation. Note that the viommu initialization will be introduced in subsequent commit. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:12 +01:00
Suravee Suthikulpanit	b43a29def2	iommu/amd: Add support for nest parent domain allocation To support nested translation, the nest parent domain is allocated with IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page table for stage 2 (i.e. GPA->SPA). Also, only support nest parent domain on AMD system, which can support the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in order to program DTE[GCR3 Table Root Pointer] with the GPA. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:12 +01:00
Suravee Suthikulpanit	b2bb0573dd	iommu/amd: Always enable GCR3TRPMode when supported. The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured with GPA (instead of SPA). This simplifies the implementation, and is a pre-requisite for nested translation support. Therefore, always enable this feature if available. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:12 +01:00
Suravee Suthikulpanit	9b467a5af8	iommu/amd: Introduce helper function amd_iommu_update_dte() Which includes DTE update, clone_aliases, DTE flush and completion-wait commands to avoid code duplication when reuse to setup DTE for nested translation. Also, make amd_iommu_update_dte() non-static to reuse in in a new nested.c file for nested translation. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:11 +01:00
Suravee Suthikulpanit	11cfa782f0	iommu/amd: Make amd_iommu_make_clear_dte() non-static inline This will be reused in a new nested.c file for nested translation. Also, remove unused function parameter ptr. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:11 +01:00
Suravee Suthikulpanit	5335fc1657	iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Also change the define to use GENMASK_ULL instead. There is no functional change. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:10 +01:00
Suravee Suthikulpanit	7d8b06ecc4	iommu/amd: Add support for hw_info for iommu capability query AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2) registers specify features supported by each IOMMU hardware instance. The IOMMU driver checks each feature-specific bits before enabling each feature at run time. For IOMMUFD, the hypervisor passes the raw value of amd_iommu_efr and amd_iommu_efr2 to VMM via iommufd IOMMU_DEVICE_GET_HW_INFO ioctl. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-18 10:56:09 +01:00
Rakuram Eswaran	2e66659565	iommu/amd: Drop incorrect NULL check for iommu in alloc_irq_table() alloc_irq_table() contains a conditional check for a NULL iommu pointer when computing the NUMA node, but the function dereferences iommu in multiple places afterwards. All callers ensure that a valid iommu pointer is passed in, and a NULL iommu is not expected by the current callers. Remove the incorrect NULL check to make the assumptions consistent and address the Smatch warning. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202512191724.meqJENXe-lkp@intel.com/ Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 11:17:43 +01:00
Ankit Soni	d2a0cac105	iommu/amd: move wait_on_sem() out of spinlock With iommu.strict=1, the existing completion wait path can cause soft lockups under stressed environment, as wait_on_sem() busy-waits under the spinlock with interrupts disabled. Move the completion wait in iommu_completion_wait() out of the spinlock. wait_on_sem() only polls the hardware-updated cmd_sem and does not require iommu->lock, so holding the lock during the busy wait unnecessarily increases contention and extends the time with interrupts disabled. Signed-off-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:54:38 +01:00
Sairaj Kodilkar	c7fe9384c8	amd/iommu: Make protection domain ID functions non-static So that both iommu.c and init.c can utilize them. Also define a new function 'pdom_id_destroy()' to destroy 'pdom_ids' instead of directly calling ida functions. Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-12-19 11:23:49 +01:00
Sairaj Kodilkar	c2e8dc1222	amd/iommu: Preserve domain ids inside the kdump kernel Currently AMD IOMMU driver does not reserve domain ids programmed in the DTE while reusing the device table inside kdump kernel. This can cause reallocation of these domain ids for newer domains that are created by the kdump kernel, which can lead to potential IO_PAGE_FAULTs Hence reserve these ids inside pdom_ids. Fixes: `38e5f33ee3` ("iommu/amd: Reuse device table for kdump") Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com> Reported-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-12-19 11:23:48 +01:00
Linus Torvalds	249872f53d	tsm for 6.19 - Introduce the PCI/TSM core for the coordination of device authentication, link encryption and establishment (IDE), and later management of the device security operational states (TDISP). Notify the new TSM core layer of PCI device arrival and departure. - Add a low level TSM driver for the link encryption establishment capabilities of the AMD SEV-TIO architecture. - Add a library of helpers TSM drivers to use for IDE establishment and the DOE transport. - Add skeleton support for 'bind' and 'guest_request' operations in support of TDISP. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCaTOdAwAKCRDfioYZHlFs Z/fWAQDS5mwS/8rn0UdH/SijTm/oKVxdiyIQbTstrjk8AySITgEA5ki9w2iKa0WG x1ACZKlo9gS9emyx4wuJpCBIMtR50Qc= =B4oG -----END PGP SIGNATURE----- Merge tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm Pull PCIe Link Encryption and Device Authentication from Dan Williams: "New PCI infrastructure and one architecture implementation for PCIe link encryption establishment via platform firmware services. This work is the result of multiple vendors coming to consensus on some core infrastructure (thanks Alexey, Yilun, and Aneesh!), and three vendor implementations, although only one is included in this pull. The PCI core changes have an ack from Bjorn, the crypto/ccp/ changes have an ack from Tom, and the iommu/amd/ changes have an ack from Joerg. PCIe link encryption is made possible by the soup of acronyms mentioned in the shortlog below. Link Integrity and Data Encryption (IDE) is a protocol for installing keys in the transmitter and receiver at each end of a link. That protocol is transported over Data Object Exchange (DOE) mailboxes using PCI configuration requests. The aspect that makes this a "platform firmware service" is that the key provisioning and protocol is coordinated through a Trusted Execution Envrionment (TEE) Security Manager (TSM). That is either firmware running in a coprocessor (AMD SEV-TIO), or quasi-hypervisor software (Intel TDX Connect / ARM CCA) running in a protected CPU mode. Now, the only reason to ask a TSM to run this protocol and install the keys rather than have a Linux driver do the same is so that later, a confidential VM can ask the TSM directly "can you certify this device?". That precludes host Linux from provisioning its own keys, because host Linux is outside the trust domain for the VM. It also turns out that all architectures, save for one, do not publish a mechanism for an OS to establish keys in the root port. So "TSM-established link encryption" is the only cross-architecture path for this capability for the foreseeable future. This unblocks the other arch implementations to follow in v6.20/v7.0, once they clear some other dependencies, and it unblocks the next phase of work to implement the end-to-end flow of confidential device assignment. The PCIe specification calls this end-to-end flow Trusted Execution Environment (TEE) Device Interface Security Protocol (TDISP). In the meantime, Linux gets a link encryption facility which has practical benefits along the same lines as memory encryption. It authenticates devices via certificates and may protect against interposer attacks trying to capture clear-text PCIe traffic. Summary: - Introduce the PCI/TSM core for the coordination of device authentication, link encryption and establishment (IDE), and later management of the device security operational states (TDISP). Notify the new TSM core layer of PCI device arrival and departure - Add a low level TSM driver for the link encryption establishment capabilities of the AMD SEV-TIO architecture - Add a library of helpers TSM drivers to use for IDE establishment and the DOE transport - Add skeleton support for 'bind' and 'guest_request' operations in support of TDISP" * tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: (23 commits) crypto/ccp: Fix CONFIG_PCI=n build virt: Fix Kconfig warning when selecting TSM without VIRT_DRIVERS crypto/ccp: Implement SEV-TIO PCIe IDE (phase1) iommu/amd: Report SEV-TIO support psp-sev: Assign numbers to all status codes and add new ccp: Make snp_reclaim_pages and __sev_do_cmd_locked public PCI/TSM: Add 'dsm' and 'bound' attributes for dependent functions PCI/TSM: Add pci_tsm_guest_req() for managing TDIs PCI/TSM: Add pci_tsm_bind() helper for instantiating TDIs PCI/IDE: Initialize an ID for all IDE streams PCI/IDE: Add Address Association Register setup for downstream MMIO resource: Introduce resource_assigned() for discerning active resources PCI/TSM: Drop stub for pci_tsm_doe_transfer() drivers/virt: Drop VIRT_DRIVERS build dependency PCI/TSM: Report active IDE streams PCI/IDE: Report available IDE streams PCI/IDE: Add IDE establishment helpers PCI: Establish document for PCI host bridge sysfs attributes PCI: Add PCIe Device 3 Extended Capability enumeration PCI/TSM: Establish Secure Sessions and Link Encryption ...	2025-12-06 10:15:41 -08:00
Linus Torvalds	208eed95fc	soc: driver updates for 6.19 This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes. - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs. - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access. - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmky/8gACgkQmmx57+YA GNlqohAApPTLM6Q4gf1cIcsTVaP0uxx9CBgupCGuT5ORrOMKBghVWjTOTSxeEAab UQF465QwYUUu602GH34UmRaY9CKW2bMIsfmkgmxNB4Y4Qd7yCgQNJ/h/TnN0rBH+ qTeEsRH/hax4miSNsh0oOZfVkZkg+23VF02d1VL0CcaX7y4oT45RPBQugrNx/gNS fHfVwgIq8vJ8WyrmM1h2nv1i1vgSzEy50B3kY674BBw83FcJTafNLvD7N5DSgD1H /I/2xeyEpb+oL1VfeHcXZaX/jf04O+cmvSzBi+MOH1tI3MpdxJib1vEYBdggoOWN K/FFGgsOY+DNmJPpSnPTTu8UpzksS8SxGBP7M9Q8roKZwA2c9wLotxySvjki5yv8 2zvabRdzbrSaoYwsH9QnZdQ2hVkJ9W8MESu8PevD3yMNuFUzledPDWW0N1SbGm78 0ZdB6NPdaBZYHMNMRdFhN8P275/Mx5e0XWN9oYMQqjPooH7YkyT7hJWz6ao2PCJP 8mDmnW1RzL+LWf7mJ25ZEtS+YjmKA/PVmogRrGurKCadvdxXqCF09KNljICHhmmu t0KB4dqw02OXLPvBk21qCi0zL56w1JDgqtS8suFvDYo9sCceeAbAcmpyoUOFj2N+ Upn976tb4iqFrr9mFswpmCJWPpqJkU+A+KnKsIRPU7N4kSrP35I= =HvlN -----END PGP SIGNATURE----- Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...	2025-12-05 17:29:04 -08:00
Alexey Kardashevskiy	eeb934137d	iommu/amd: Report SEV-TIO support The SEV-TIO switch in the AMD BIOS is reported to the OS via the IOMMU Extended Feature 2 register (EFR2), bit 1. Add helper to parse the bit and report the feature presence. Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Link: https://patch.msgid.link/20251202024449.542361-4-aik@amd.com Acked-by: Joerg Roedel <joerg.roedel@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2025-12-02 12:06:45 -08:00
Joerg Roedel	0d081b1694	Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'mediatek', 'nvidia/tegra', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2025-11-28 08:44:21 +01:00
Jason Gunthorpe	1eb0ae6fbd	iommupt/vtd: Support mgaw's less than a 4 level walk for first stage If the IOVA is limited to less than 48 the page table will be constructed with a 3 level configuration which is unsupported by hardware. Like the second stage the caller needs to pass in both the top_level an the vasz to specify a table that has more levels than required to hold the IOVA range. Fixes: `6cbc09b771` ("iommu/vt-d: Restore previous domain::aperture_end calculation") Reported-by: Calvin Owens <calvin@wbinvd.org> Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-28 08:43:55 +01:00
Jinhui Guo	2381a1b40b	iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga() The return type of __modify_irte_ga() is int, but modify_irte_ga() treats it as a bool. Casting the int to bool discards the error code. To fix the issue, change the type of ret to int in modify_irte_ga(). Fixes: `57cdb720ea` ("iommu/amd: Do not flush IRTE when only updating isRun and destination fields") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-25 15:10:23 +01:00
Thierry Reding	a97fbc3ee3	syscore: Pass context data to callbacks Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>	2025-11-14 10:01:52 +01:00
Jinhui Guo	75ba146c26	iommu/amd: Fix pci_segment memleak in alloc_pci_segment() Fix a memory leak of struct amd_iommu_pci_segment in alloc_pci_segment() when system memory (or contiguous memory) is insufficient. Fixes: `04230c1199` ("iommu/amd: Introduce per PCI segment device table") Fixes: `eda797a277` ("iommu/amd: Introduce per PCI segment rlookup table") Fixes: `99fc4ac3d2` ("iommu/amd: Introduce per PCI segment alias_table") Cc: stable@vger.kernel.org Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-13 16:15:56 +01:00
Dheeraj Kumar Srivastava	d1e281f832	iommu/amd: Enhance "Completion-wait Time-out" error message Current IOMMU driver prints "Completion-wait Time-out" error message with insufficient information to further debug the issue. Enhancing the error message as following: 1. Log IOMMU PCI device ID in the error message. 2. With "amd_iommu_dump=1" kernel command line option, dump entire command buffer entries including Head and Tail offset. Dump the entire command buffer only on the first 'Completion-wait Time-out' to avoid dmesg spam. Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-13 16:13:26 +01:00
Jason Gunthorpe	2fdf6db436	iommu/amd: Remove AMD io_pgtable support None of this is used anymore, delete it. Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-05 09:08:57 +01:00
Alejandro Jimenez	789a5913b2	iommu/amd: Use the generic iommu page table Replace the io_pgtable versions with pt_iommu versions. The v2 page table uses the x86 implementation that will be eventually shared with VT-d. This supports the same special features as the original code: - increase_top for the v1 format to allow scaling from 3 to 6 levels - non-present flushing - Dirty tracking for v1 only - __sme_set() to adjust the PTEs for CC - Optimization for flushing with virtualization to minimize the range - amd_iommu_pgsize_bitmap override of the native page sizes - page tables allocate from the device's NUMA node Rework the domain ops so that v1/v2 get their own ops. Make dedicated allocation functions for v1 and v2. Hook up invalidation for a top change to struct pt_iommu_flush_ops. Delete some of the iopgtable related code that becomes unused in this patch. The next patch will delete the rest of it. This fixes a race bug in AMD's increase_address_space() implementation. It stores the top level and top pointer in different memory, which prevents other threads from reading a coherent version: increase_address_space() alloc_pte() level = pgtable->mode - 1; pgtable->root = pte; pgtable->mode += 1; pte = &pgtable->root[PM_LEVEL_INDEX(level, address)]; The iommupt version is careful to put mode and root under a single READ_ONCE and then is careful to only READ_ONCE a single time per walk. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-05 09:08:56 +01:00
Songtang Liu	a0c7005333	iommu/amd: Fix potential out-of-bounds read in iommu_mmio_show In iommu_mmio_write(), it validates the user-provided offset with the check: `iommu->dbg_mmio_offset > iommu->mmio_phys_end - 4`. This assumes a 4-byte access. However, the corresponding show handler, iommu_mmio_show(), uses readq() to perform an 8-byte (64-bit) read. If a user provides an offset equal to `mmio_phys_end - 4`, the check passes, and will lead to a 4-byte out-of-bounds read. Fix this by adjusting the boundary check to use sizeof(u64), which corresponds to the size of the readq() operation. Fixes: `7a4ee419e8` ("iommu/amd: Add debugfs support to dump IOMMU MMIO registers") Signed-off-by: Songtang Liu <liusongtang@bytedance.com> Reviewed-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-11-04 11:06:00 +01:00
Nicolin Chen	fd714986e4	iommu: Pass in old domain to attach_dev callback functions The IOMMU core attaches each device to a default domain on probe(). Then, every new "attach" operation has a fundamental meaning of two-fold: - detach from its currently attached (old) domain - attach to a given new domain Modern IOMMU drivers following this pattern usually want to clean up the things related to the old domain, so they call iommu_get_domain_for_dev() to fetch the old domain. Pass in the old domain pointer from the core to drivers, aligning with the set_dev_pasid op that does so already. Ensure all low-level attach fcuntions in the core can forward the correct old domain pointer. Thus, rework those functions as well. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-10-27 13:55:35 +01:00
Nicolin Chen	c21b34762e	iommu/amd: Set release_domain to blocked_domain The set_dev_pasid for a release domain never gets called anyhow. So, there is no point in defining a separate release_domain from the blocked_domain. Simply reuse the blocked_domain. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-10-27 13:55:35 +01:00
Joerg Roedel	5f4b8c03f4	Merge branches 'apple/dart', 'ti/omap', 'riscv', 'intel/vt-d' and 'amd/amd-vi' into next	2025-09-26 10:03:33 +02:00
Vasant Hegde	1e56310b40	iommu/amd/pgtbl: Fix possible race while increase page table level The AMD IOMMU host page table implementation supports dynamic page table levels (up to 6 levels), starting with a 3-level configuration that expands based on IOVA address. The kernel maintains a root pointer and current page table level to enable proper page table walks in alloc_pte()/fetch_pte() operations. The IOMMU IOVA allocator initially starts with 32-bit address and onces its exhuasted it switches to 64-bit address (max address is determined based on IOMMU and device DMA capability). To support larger IOVA, AMD IOMMU driver increases page table level. But in unmap path (iommu_v1_unmap_pages()), fetch_pte() reads pgtable->[root/mode] without lock. So its possible that in exteme corner case, when increase_address_space() is updating pgtable->[root/mode], fetch_pte() reads wrong page table level (pgtable->mode). It does compare the value with level encoded in page table and returns NULL. This will result is iommu_unmap ops to fail and upper layer may retry/log WARN_ON. CPU 0 CPU 1 ------ ------ map pages unmap pages alloc_pte() -> increase_address_space() iommu_v1_unmap_pages() -> fetch_pte() pgtable->root = pte (new root value) READ pgtable->[mode/root] Reads new root, old mode Updates mode (pgtable->mode += 1) Since Page table level updates are infrequent and already synchronized with a spinlock, implement seqcount to enable lock-free read operations on the read path. Fixes: `754265bcab` ("iommu/amd: Fix race in increase_address_space()") Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: stable@vger.kernel.org Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-19 09:39:40 +02:00
Vasant Hegde	a0c17ed907	iommu/amd: Fix alias device DTE setting Commit `7bea695ada` restructured DTE flag handling but inadvertently changed the alias device configuration logic. This may cause incorrect DTE settings for certain devices. Add alias flag check before calling set_dev_entry_from_acpi(). Also move the device iteration loop inside the alias check to restrict execution to cases where alias devices are present. Fixes: `7bea695ada` ("iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-13 08:11:30 +02:00
Zhen Ni	923b70581c	iommu/amd: Fix ivrs_base memleak in early_amd_iommu_init() Fix a permanent ACPI table memory leak in early_amd_iommu_init() when CMPXCHG16B feature is not supported Fixes: `82582f85ed` ("iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported") Cc: stable@vger.kernel.org Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20250822024915.673427-1-zhen.ni@easystack.cn Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 15:11:07 +02:00
Ashish Kalra	9be15fbfc6	iommu/amd: Skip enabling command/event buffers for kdump After a panic if SNP is enabled in the previous kernel then the kdump kernel boots with IOMMU SNP enforcement still enabled. IOMMU command buffers and event buffer registers remain locked and exclusive to the previous kernel. Attempts to enable command and event buffers in the kdump kernel will fail, as hardware ignores writes to the locked MMIO registers as per AMD IOMMU spec Section 2.12.2.1. Skip enabling command buffers and event buffers for kdump boot as they are already enabled in the previous kernel. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/576445eb4f168b467b0fc789079b650ca7c5b037.1756157913.git.ashish.kalra@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:45:17 +02:00
Ashish Kalra	38e5f33ee3	iommu/amd: Reuse device table for kdump After a panic if SNP is enabled in the previous kernel then the kdump kernel boots with IOMMU SNP enforcement still enabled. IOMMU device table register is locked and exclusive to the previous kernel. Attempts to copy old device table from the previous kernel fails in kdump kernel as hardware ignores writes to the locked device table base address register as per AMD IOMMU spec Section 2.12.2.1. This causes the IOMMU driver (OS) and the hardware to reference different memory locations. As a result, the IOMMU hardware cannot process the command which results in repeated "Completion-Wait loop timed out" errors and a second kernel panic: "Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC". Reuse device table instead of copying device table in case of kdump boot and remove all copying device table code. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/3a31036fb2f7323e6b1a1a1921ac777e9f7bdddc.1756157913.git.ashish.kalra@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:44:32 +02:00
Ashish Kalra	f32fe7cb01	iommu/amd: Add support to remap/unmap IOMMU buffers for kdump After a panic if SNP is enabled in the previous kernel then the kdump kernel boots with IOMMU SNP enforcement still enabled. IOMMU completion wait buffers (CWBs), command buffers and event buffer registers remain locked and exclusive to the previous kernel. Attempts to allocate and use new buffers in the kdump kernel fail, as hardware ignores writes to the locked MMIO registers as per AMD IOMMU spec Section 2.12.2.1. This results in repeated "Completion-Wait loop timed out" errors and a second kernel panic: "Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC" The list of MMIO registers locked and which ignore writes after failed SNP shutdown are mentioned in the AMD IOMMU specifications below: Section 2.12.2.1. https://docs.amd.com/v/u/en-US/48882_3.10_PUB Reuse the pages of the previous kernel for completion wait buffers, command buffers, event buffers and memremap them during kdump boot and essentially work with an already enabled IOMMU configuration and re-using the previous kernel’s data structures. Reusing of command buffers and event buffers is now done for kdump boot irrespective of SNP being enabled during kdump. Re-use of completion wait buffers is only done when SNP is enabled as the exclusion base register is used for the completion wait buffer (CWB) address only when SNP is enabled. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/ff04b381a8fe774b175c23c1a336b28bc1396511.1756157913.git.ashish.kalra@amd.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:44:30 +02:00
Xichao Zhao	d3d3b60427	iommu/amd: use str_plural() to simplify the code Use the string choice helper function str_plural() to simplify the code. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Link: https://lore.kernel.org/r/20250818070556.458271-1-zhao.xichao@vivo.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-09-05 14:25:32 +02:00
Kees Cook	8503d0fcb1	iommu/amd: Avoid stack buffer overflow from kernel cmdline While the kernel command line is considered trusted in most environments, avoid writing 1 byte past the end of "acpiid" if the "str" argument is maximum length. Reported-by: Simcha Kosman <simcha.kosman@cyberark.com> Closes: https://lore.kernel.org/all/AS8P193MB2271C4B24BCEDA31830F37AE84A52@AS8P193MB2271.EURP193.PROD.OUTLOOK.COM Fixes: `b6b26d86c6` ("iommu/amd: Add a length limitation for the ivrs_acpihid command-line parameter") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Ankit Soni <Ankit.Soni@amd.com> Link: https://lore.kernel.org/r/20250804154023.work.970-kees@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-08-15 11:50:47 +02:00

1 2 3 4 5 ...

607 commits