From 72971184a1eed005b48babe226673d5496bcd959 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Tue, 3 Feb 2026 18:35:58 +0100 Subject: [PATCH 01/16] cxl, doc: Remove isonum.txt inclusion MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch removes the line to include:: . From Jon: "This include has been cargo-culted around the docs...the only real use of it is to write |copy| rather than ©, but these docs don't even do that. It can be taken out." Cc: Jonathan Corbet Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260203173604.1440334-1-rrichter@amd.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/conventions.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/Documentation/driver-api/cxl/conventions.rst b/Documentation/driver-api/cxl/conventions.rst index e37336d7b116..ed4237583d36 100644 --- a/Documentation/driver-api/cxl/conventions.rst +++ b/Documentation/driver-api/cxl/conventions.rst @@ -1,5 +1,4 @@ .. SPDX-License-Identifier: GPL-2.0 -.. include:: ======================================= Compute Express Link: Linux Conventions From e6efbd2995c1c14fbf53e2b63056eeeb30b034b1 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Tue, 3 Feb 2026 18:35:59 +0100 Subject: [PATCH 02/16] cxl, doc: Moving conventions in separate files Moving conventions in separate files. 
Cc: Jonathan Corbet Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260203173604.1440334-2-rrichter@amd.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/conventions.rst | 176 +----------------- .../driver-api/cxl/conventions/cxl-lmh.rst | 135 ++++++++++++++ .../driver-api/cxl/conventions/template.rst | 37 ++++ 3 files changed, 178 insertions(+), 170 deletions(-) create mode 100644 Documentation/driver-api/cxl/conventions/cxl-lmh.rst create mode 100644 Documentation/driver-api/cxl/conventions/template.rst diff --git a/Documentation/driver-api/cxl/conventions.rst b/Documentation/driver-api/cxl/conventions.rst index ed4237583d36..9267a697b2fe 100644 --- a/Documentation/driver-api/cxl/conventions.rst +++ b/Documentation/driver-api/cxl/conventions.rst @@ -1,8 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0 -======================================= Compute Express Link: Linux Conventions -======================================= +####################################### There exists shipping platforms that bend or break CXL specification expectations. Record the details and the rationale for those deviations. @@ -10,172 +9,9 @@ Borrow the ACPI Code First template format to capture the assumptions and tradeoffs such that multiple platform implementations can follow the same convention. -<(template) Title> -================== +.. 
toctree:: + :maxdepth: 1 + :caption: Contents -Document --------- -CXL Revision , Version - -License -------- -SPDX-License Identifier: CC-BY-4.0 - -Creator/Contributors --------------------- - -Summary of the Change ---------------------- - - - - -Benefits of the Change ----------------------- - - - -References ----------- - -Detailed Description of the Change ----------------------------------- - - - - -Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders -============================================================================ - -Document --------- - -CXL Revision 3.2, Version 1.0 - -License -------- - -SPDX-License Identifier: CC-BY-4.0 - -Creator/Contributors --------------------- - -- Fabio M. De Francesco, Intel -- Dan J. Williams, Intel -- Mahesh Natu, Intel - -Summary of the Change ---------------------- - -According to the current Compute Express Link (CXL) Specifications (Revision -3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero -or more Host Physical Address (HPA) windows associated with each CXL Host -Bridge. Each window represents a contiguous HPA range that may be interleaved -across one or more targets, including CXL Host Bridges. Each window has a set -of restrictions that govern its usage. It is the Operating System-directed -configuration and Power Management (OSPM) responsibility to utilize each window -for the specified use. - -Table 9-22 of the current CXL Specifications states that the Window Size field -contains the total number of consecutive bytes of HPA this window describes. -This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB. - -Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a -memory gap such as the Low Memory Hole for PCIe MMIO may exist. In such cases, -the CFMWS Range Size may not adhere to the NIW * 256 MB rule. 
- -The HPA represents the actual physical memory address space that the CXL devices -can decode and respond to, while the System Physical Address (SPA), a related -but distinct concept, represents the system-visible address space that users can -direct transaction to and so it excludes reserved regions. - -BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms -with LMH's, map to a strict subset of the HPA. The SPA range trims out the hole, -resulting in lost capacity in the Endpoints with no SPA to map to that part of -the HPA range that intersects the hole. - -E.g, an x86 platform with two CFMWS and an LMH starting at 2 GB: - - +--------+------------+-------------------+------------------+-------------------+------+ - | Window | CFMWS Base | CFMWS Size | HDM Decoder Base | HDM Decoder Size | Ways | - +========+============+===================+==================+===================+======+ - |  0 | 0 GB | 2 GB | 0 GB | 3 GB | 12 | - +--------+------------+-------------------+------------------+-------------------+------+ - |  1 | 4 GB | NIW*256MB Aligned | 4 GB | NIW*256MB Aligned | 12 | - +--------+------------+-------------------+------------------+-------------------+------+ - -HDM decoder base and HDM decoder size represent all the 12 Endpoint Decoders of -a 12 ways region and all the intermediate Switch Decoders. They are configured -by the BIOS according to the NIW * 256MB rule, resulting in a HPA range size of -3GB. Instead, the CFMWS Base and CFMWS Size are used to configure the Root -Decoder HPA range that results smaller (2GB) than that of the Switch and -Endpoint Decoders in the hierarchy (3GB). - -This creates 2 issues which lead to a failure to construct a region: - -1) A mismatch in region size between root and any HDM decoder. The root decoders - will always be smaller due to the trim. - -2) The trim causes the root decoder to violate the (NIW * 256MB) rule. 
- -This change allows a region with a base address of 0GB to bypass these checks to -allow for region creation with the trimmed root decoder address range. - -This change does not allow for any other arbitrary region to violate these -checks - it is intended exclusively to enable x86 platforms which map CXL memory -under 4GB. - -Despite the HDM decoders covering the PCIE hole HPA region, it is expected that -the platform will never route address accesses to the CXL complex because the -root decoder only covers the trimmed region (which excludes this). This is -outside the ability of Linux to enforce. - -On the example platform, only the first 2GB will be potentially usable, but -Linux, aiming to adhere to the current specifications, fails to construct -Regions and attach Endpoint and intermediate Switch Decoders to them. - -There are several points of failure that due to the expectation that the Root -Decoder HPA size, that is equal to the CFMWS from which it is configured, has -to be greater or equal to the matching Switch and Endpoint HDM Decoders. - -In order to succeed with construction and attachment, Linux must construct a -Region with Root Decoder HPA range size, and then attach to that all the -intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy -regardless of their range sizes. - -Benefits of the Change ----------------------- - -Without the change, the OSPM wouldn't match intermediate Switch and Endpoint -Decoders with Root Decoders configured with CFMWS HPA sizes that don't align -with the NIW * 256MB constraint, and so it leads to lost memdev capacity. - -This change allows the OSPM to construct Regions and attach intermediate Switch -and Endpoint Decoders to them, so that the addressable part of the memory -devices total capacity is made available to the users. 
- -References ----------- - -Compute Express Link Specification Revision 3.2, Version 1.0 - - -Detailed Description of the Change ----------------------------------- - -The description of the Window Size field in table 9-22 needs to account for -platforms with Low Memory Holes, where SPA ranges might be subsets of the -endpoints HPA. Therefore, it has to be changed to the following: - -"The total number of consecutive bytes of HPA this window represents. This value -shall be a multiple of NIW * 256 MB. - -On platforms that reserve physical addresses below 4 GB, such as the Low Memory -Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might -have a size that doesn't align with the NIW * 256 MB constraint. - -Note that the matching intermediate Switch Decoders and the Endpoint Decoders -HPA range sizes must still align to the above-mentioned rule, but the memory -capacity that exceeds the CFMWS window size won't be accessible.". + conventions/cxl-lmh.rst + conventions/template.rst diff --git a/Documentation/driver-api/cxl/conventions/cxl-lmh.rst b/Documentation/driver-api/cxl/conventions/cxl-lmh.rst new file mode 100644 index 000000000000..baece5c35345 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions/cxl-lmh.rst @@ -0,0 +1,135 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders +============================================================================ + +Document +-------- + +CXL Revision 3.2, Version 1.0 + +License +------- + +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +- Fabio M. De Francesco, Intel +- Dan J. 
Williams, Intel
+- Mahesh Natu, Intel
+
+Summary of the Change
+---------------------
+
+According to the current Compute Express Link (CXL) Specifications (Revision
+3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
+or more Host Physical Address (HPA) windows associated with each CXL Host
+Bridge. Each window represents a contiguous HPA range that may be interleaved
+across one or more targets, including CXL Host Bridges. Each window has a set
+of restrictions that govern its usage. It is the responsibility of the
+Operating System-directed configuration and Power Management (OSPM) to utilize
+each window for the specified use.
+
+Table 9-22 of the current CXL Specifications states that the Window Size field
+contains the total number of consecutive bytes of HPA this window describes.
+This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.
+
+Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
+memory gap such as the Low Memory Hole (LMH) for PCIe MMIO may exist. In such
+cases, the CFMWS Range Size may not adhere to the NIW * 256 MB rule.
+
+The HPA represents the actual physical memory address space that the CXL
+devices can decode and respond to, while the System Physical Address (SPA), a
+related but distinct concept, represents the system-visible address space that
+users can direct transactions to, and so it excludes reserved regions.
+
+BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
+with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
+resulting in lost capacity in the Endpoints with no SPA to map to that part of
+the HPA range that intersects the hole.
+
+E.g., an x86 platform with two CFMWS and an LMH starting at 2 GB:
+
+  +--------+------------+-------------------+------------------+-------------------+------+
+  | Window | CFMWS Base | CFMWS Size        | HDM Decoder Base | HDM Decoder Size  | Ways |
+  +========+============+===================+==================+===================+======+
+  | 0      | 0 GB       | 2 GB              | 0 GB             | 3 GB              | 12   |
+  +--------+------------+-------------------+------------------+-------------------+------+
+  | 1      | 4 GB       | NIW*256MB Aligned | 4 GB             | NIW*256MB Aligned | 12   |
+  +--------+------------+-------------------+------------------+-------------------+------+
+
+HDM Decoder Base and HDM Decoder Size represent all 12 Endpoint Decoders of a
+12-way region and all the intermediate Switch Decoders. They are configured by
+the BIOS according to the NIW * 256 MB rule, resulting in an HPA range size of
+3 GB. Instead, the CFMWS Base and CFMWS Size are used to configure the Root
+Decoder HPA range, which ends up smaller (2 GB) than that of the Switch and
+Endpoint Decoders in the hierarchy (3 GB).
+
+This creates two issues which lead to a failure to construct a region:
+
+1) A mismatch in region size between the root and any HDM decoder. The root
+   decoders will always be smaller due to the trim.
+
+2) The trim causes the root decoder to violate the (NIW * 256 MB) rule.
+
+This change allows a region with a base address of 0 GB to bypass these checks
+to allow for region creation with the trimmed root decoder address range.
+
+This change does not allow any other arbitrary region to violate these checks;
+it is intended exclusively to enable x86 platforms which map CXL memory under
+4 GB.
+
+Despite the HDM decoders covering the PCIe hole HPA region, it is expected that
+the platform will never route address accesses to the CXL complex because the
+root decoder only covers the trimmed region (which excludes this). This is
+outside the ability of Linux to enforce.
+
+On the example platform, only the first 2 GB will be potentially usable, but
+Linux, aiming to adhere to the current specifications, fails to construct
+Regions and attach Endpoint and intermediate Switch Decoders to them.
+
+There are several points of failure due to the expectation that the Root
+Decoder HPA size, which is equal to that of the CFMWS from which it is
+configured, has to be greater than or equal to that of the matching Switch and
+Endpoint HDM Decoders.
+
+In order to succeed with construction and attachment, Linux must construct a
+Region with the Root Decoder HPA range size, and then attach to it all the
+intermediate Switch Decoders and Endpoint Decoders that belong to the
+hierarchy, regardless of their range sizes.
+
+Benefits of the Change
+----------------------
+
+Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
+Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
+with the NIW * 256 MB constraint, leading to lost memdev capacity.
+
+This change allows the OSPM to construct Regions and attach intermediate
+Switch and Endpoint Decoders to them, so that the addressable part of the
+memory devices' total capacity is made available to the users.
+
+References
+----------
+
+Compute Express Link Specification Revision 3.2, Version 1.0
+
+
+Detailed Description of the Change
+----------------------------------
+
+The description of the Window Size field in Table 9-22 needs to account for
+platforms with Low Memory Holes, where SPA ranges might be subsets of the
+endpoints' HPA. Therefore, it has to be changed to the following:
+
+"The total number of consecutive bytes of HPA this window represents. This
+value shall be a multiple of NIW * 256 MB.
+
+On platforms that reserve physical addresses below 4 GB, such as the Low Memory
+Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
+have a size that doesn't align with the NIW * 256 MB constraint.
+ +Note that the matching intermediate Switch Decoders and the Endpoint Decoders +HPA range sizes must still align to the above-mentioned rule, but the memory +capacity that exceeds the CFMWS window size won't be accessible.". diff --git a/Documentation/driver-api/cxl/conventions/template.rst b/Documentation/driver-api/cxl/conventions/template.rst new file mode 100644 index 000000000000..ff2fcf1b5e24 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions/template.rst @@ -0,0 +1,37 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. :: Template Title here: + +Template File +============= + +Document +-------- +CXL Revision , Version + +License +------- +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +Summary of the Change +--------------------- + + + +Benefits of the Change +---------------------- + + + +References +---------- + +Detailed Description of the Change +---------------------------------- + + From 0692afe940e0959dd2fa74539622f16cf3709433 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Tue, 3 Feb 2026 18:36:00 +0100 Subject: [PATCH 03/16] Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement This adds a convention document for the following patch series: cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement Version 7 and later: https://lore.kernel.org/linux-cxl/20251114213931.30754-1-rrichter@amd.com/ Link: https://lore.kernel.org/linux-cxl/20251114213931.30754-1-rrichter@amd.com/ Reviewed-by: Gregory Price Reviewed-by: Dave Jiang Reviewed-by: Alison Schofield Reviewed-by: Jonathan Cameron Reviewed-by: Dan Williams Acked-by: Dan Williams Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260203173604.1440334-3-rrichter@amd.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/conventions.rst | 1 + .../driver-api/cxl/conventions/cxl-atl.rst | 304 ++++++++++++++++++ 2 files changed, 305 insertions(+) create mode 100644 
Documentation/driver-api/cxl/conventions/cxl-atl.rst diff --git a/Documentation/driver-api/cxl/conventions.rst b/Documentation/driver-api/cxl/conventions.rst index 9267a697b2fe..0d2e07279ad9 100644 --- a/Documentation/driver-api/cxl/conventions.rst +++ b/Documentation/driver-api/cxl/conventions.rst @@ -14,4 +14,5 @@ same convention. :caption: Contents conventions/cxl-lmh.rst + conventions/cxl-atl.rst conventions/template.rst diff --git a/Documentation/driver-api/cxl/conventions/cxl-atl.rst b/Documentation/driver-api/cxl/conventions/cxl-atl.rst new file mode 100644 index 000000000000..3a36a84743d0 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions/cxl-atl.rst @@ -0,0 +1,304 @@ +.. SPDX-License-Identifier: GPL-2.0 + +ACPI PRM CXL Address Translation +================================ + +Document +-------- + +CXL Revision 3.2, Version 1.0 + +License +------- + +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +- Robert Richter, AMD et al. + +Summary of the Change +--------------------- + +The CXL Fixed Memory Window Structures (CFMWS) describe zero or more Host +Physical Address (HPA) windows associated with one or more CXL Host Bridges. +Each HPA range of a CXL Host Bridge is represented by a CFMWS entry. An HPA +range may include addresses currently assigned to CXL.mem devices, or an OS may +assign ranges from an address window to a device. + +Host-managed Device Memory is Device-attached memory that is mapped to system +coherent address space and accessible to the Host using standard write-back +semantics. The managed address range is configured in the CXL HDM Decoder +registers of the device. An HDM Decoder in a device is responsible for +converting HPA into DPA by stripping off specific address bits. + +CXL devices and CXL bridges use the same HPA space. It is common across all +components that belong to the same host domain. 
The view of the address region +must be consistent on the CXL.mem path between the Host and the Device. + +This is described in the *CXL 3.2 specification* (Table 1-1, 3.3.1, +8.2.4.20, 9.13.1, 9.18.1.3). [#cxl-spec-3.2]_ + +Depending on the interconnect architecture of the platform, components attached +to a host may not share the same host physical address space. Those platforms +need address translation to convert an HPA between the host and the attached +component, such as a CXL device. The translation mechanism is host-specific and +implementation dependent. + +For example, x86 AMD platforms use a Data Fabric that manages access to physical +memory. Devices have their own memory space and can be configured to use +'Normalized addresses' different from System Physical Addresses (SPA). Address +translation is then needed. For details, see +:doc:`x86 AMD Address Translation `. + +Those AMD platforms provide PRM [#prm-spec]_ handlers in firmware to perform +various types of address translation, including for CXL endpoints. AMD Zen5 +systems implement the ACPI PRM CXL Address Translation firmware call. The ACPI +PRM handler has a specific GUID to uniquely identify platforms with support for +Normalized addressing. This is documented in the *ACPI v6.5 Porting Guide* +(Address Translation - CXL DPA to System Physical Address). [#amd-ppr-58088]_ + +When in Normalized address mode, HDM decoder address ranges must be configured +and handled differently. Hardware addresses used in the HDM decoder +configurations of an endpoint are not SPA and need to be translated from the +address range of the endpoint to that of the CXL host bridge. This is especially +important for finding an endpoint's associated CXL Host Bridge and HPA window +described in the CFMWS. Additionally, the interleave decoding is done by the +Data Fabric and the endpoint does not perform decoding when converting HPA to +DPA. Instead, interleaving is switched off for the endpoint (1-way). 
Finally, +address translation might also be needed to inspect the endpoint's hardware +addresses, such as during profiling, tracing, or error handling. + +For example, with Normalized addressing the HDM decoders could look as follows:: + + ------------------------------- + | Root Decoder (CFMWS) | + | SPA Range: 0x850000000 | + | Size: 0x8000000000 (512 GB) | + | Interleave Ways: 1 | + ------------------------------- + | + v + ------------------------------- + | Host Bridge Decoder (HDM) | + | SPA Range: 0x850000000 | + | Size: 0x8000000000 (512 GB) | + | Interleave Ways: 4 | + | Targets: endpoint5,8,11,13 | + | Granularity: 256 | + ------------------------------- + | + -----------------------------+------------------------------ + | | | | + v v v v + ------------------- ------------------- ------------------- ------------------- + | endpoint5 | | endpoint8 | | endpoint11 | | endpoint13 | + | decoder5.0 | | decoder8.0 | | decoder11.0 | | decoder13.0 | + | PCIe: | | PCIe: | | PCIe: | | PCIe: | + | 0000:e2:00.0 | | 0000:e3:00.0 | | 0000:e4:00.0 | | 0000:e1:00.0 | + | DPA: | | DPA: | | DPA: | | DPA: | + | Start: 0x0 | | Start: 0x0 | | Start: 0x0 | | Start: 0x0 | + | Size: | | Size: | | Size: | | Size: | + | 0x2000000000 | | 0x2000000000 | | 0x2000000000 | | 0x2000000000 | + | (128 GB) | | (128 GB) | | (128 GB) | | (128 GB) | + | Interleaving: | | Interleaving: | | Interleaving: | | Interleaving: | + | Ways: 1 | | Ways: 1 | | Ways: 1 | | Ways: 1 | + | Gran: 256 | | Gran: 256 | | Gran: 256 | | Gran: 256 | + ------------------- ------------------- ------------------- ------------------- + | | | | + v v v v + DPA DPA DPA DPA + +This shows the representation in sysfs: + +.. 
code-block:: none + + /sys/bus/cxl/devices/endpoint5/decoder5.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint5/decoder5.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint5/decoder5.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint5/decoder5.0/start:0x0 + /sys/bus/cxl/devices/endpoint8/decoder8.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint8/decoder8.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint8/decoder8.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint8/decoder8.0/start:0x0 + /sys/bus/cxl/devices/endpoint11/decoder11.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint11/decoder11.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint11/decoder11.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint11/decoder11.0/start:0x0 + /sys/bus/cxl/devices/endpoint13/decoder13.0/interleave_granularity:256 + /sys/bus/cxl/devices/endpoint13/decoder13.0/interleave_ways:1 + /sys/bus/cxl/devices/endpoint13/decoder13.0/size:0x2000000000 + /sys/bus/cxl/devices/endpoint13/decoder13.0/start:0x0 + +Note that the endpoint interleaving configurations use direct mapping (1-way). + +With PRM calls, the kernel can determine the following mappings: + +.. code-block:: none + + cxl decoder5.0: address mapping found for 0000:e2:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + cxl decoder8.0: address mapping found for 0000:e3:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + cxl decoder11.0: address mapping found for 0000:e4:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + cxl decoder13.0: address mapping found for 0000:e1:00.0 (hpa -> spa): + 0x0+0x2000000000 -> 0x850000000+0x8000000000 ways:4 granularity:256 + +The corresponding CXL host bridge (HDM) decoders and root decoder (CFMWS) match +the calculated endpoint mappings shown: + +.. 
code-block:: none + + /sys/bus/cxl/devices/port1/decoder1.0/interleave_granularity:256 + /sys/bus/cxl/devices/port1/decoder1.0/interleave_ways:4 + /sys/bus/cxl/devices/port1/decoder1.0/size:0x8000000000 + /sys/bus/cxl/devices/port1/decoder1.0/start:0x850000000 + /sys/bus/cxl/devices/port1/decoder1.0/target_list:0,1,2,3 + /sys/bus/cxl/devices/port1/decoder1.0/target_type:expander + /sys/bus/cxl/devices/root0/decoder0.0/interleave_granularity:256 + /sys/bus/cxl/devices/root0/decoder0.0/interleave_ways:1 + /sys/bus/cxl/devices/root0/decoder0.0/size:0x8000000000 + /sys/bus/cxl/devices/root0/decoder0.0/start:0x850000000 + /sys/bus/cxl/devices/root0/decoder0.0/target_list:7 + +The following changes to the specification are needed: + +* Allow a CXL device to be in an HPA space other than the host's address space. + +* Allow the platform to use implementation-specific address translation when + crossing memory domains on the CXL.mem path between the host and the device. + +* Define a PRM handler method for converting device addresses to SPAs. + +* Specify that the platform shall provide the PRM handler method to the + Operating System to detect Normalized addressing and for determining Endpoint + SPA ranges and interleaving configurations. + +* Add reference to: + + | Platform Runtime Mechanism Specification, Version 1.1 – November 2020 + | https://uefi.org/sites/default/files/resources/PRM_Platform_Runtime_Mechanism_1_1_release_candidate.pdf + +Benefits of the Change +---------------------- + +Without the change, the Operating System may be unable to determine the memory +region and Root Decoder for an Endpoint and its corresponding HDM decoder. +Region creation would fail. Platforms with a different interconnect architecture +would fail to set up and use CXL. + +References +---------- + +.. [#cxl-spec-3.2] Compute Express Link Specification, Revision 3.2, Version 1.0, + https://www.computeexpresslink.org/ + +.. 
[#amd-ppr-58088] AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh, + ACPI v6.5 Porting Guide, Publication # 58088, + https://www.amd.com/en/search/documentation/hub.html + +.. [#prm-spec] Platform Runtime Mechanism, Version: 1.1, + https://uefi.org/sites/default/files/resources/PRM_Platform_Runtime_Mechanism_1_1_release_candidate.pdf + +Detailed Description of the Change +---------------------------------- + +The following describes the necessary changes to the *CXL 3.2 specification* +[#cxl-spec-3.2]_: + +Add the following reference to the table: + +Table 1-2. Reference Documents + ++----------------------------+-------------------+---------------------------+ +| Document | Chapter Reference | Document No./Location | ++============================+===================+===========================+ +| Platform Runtime Mechanism | Chapter 8, 9 | https://www.uefi.org/acpi | +| Version: 1.1 | | | ++----------------------------+-------------------+---------------------------+ + +Add the following paragraphs to the end of the section: + +**8.2.4.20 CXL HDM Decoder Capability Structure** + +"A device may use an HPA space that is not common to other components of the +host domain. The platform is responsible for address translation when crossing +HPA spaces. The Operating System must determine the interleaving configuration +and perform address translation to the HPA ranges of the HDM decoders as needed. +The translation mechanism is host-specific and implementation dependent. + +The platform indicates support of independent HPA spaces and the need for +address translation by providing a Platform Runtime Mechanism (PRM) handler. The +OS shall use that handler to perform the necessary translations from the DPA +space to the HPA space. The handler is defined in Section 9.18.4 *PRM Handler +for CXL DPA to System Physical Address Translation*." 
+ +Add the following section and sub-section including tables: + +**9.18.4 PRM Handler for CXL DPA to System Physical Address Translation** + +"A platform may be configured to use 'Normalized addresses'. Host physical +address (HPA) spaces are component-specific and differ from system physical +addresses (SPAs). The endpoint has its own physical address space. All requests +presented to the device already use Device Physical Addresses (DPAs). The CXL +endpoint decoders have interleaving disabled (1-way interleaving) and the device +does not perform HPA decoding to determine a DPA. + +The platform provides a PRM handler for CXL DPA to System Physical Address +Translation. The PRM handler translates a Device Physical Address (DPA) to a +System Physical Address (SPA) for a specified CXL endpoint. In the address space +of the host, SPA and HPA are equivalent, and the OS shall use this handler to +determine the HPA that corresponds to a device address, for example when +configuring HDM decoders on platforms with Normalized addressing. The GUID and +the parameter buffer format of the handler are specified in section 9.18.4.1. If +the OS identifies the PRM handler, the platform supports Normalized addressing +and the OS must perform DPA address translation as needed." + +**9.18.4.1 PRM Handler Invocation** + +"The OS calls the PRM handler for CXL DPA to System Physical Address Translation +using the direct invocation mechanism. Details of calling a PRM handler are +described in the Platform Runtime Mechanism (PRM) specification. + +The PRM handler is identified by the following GUID: + + EE41B397-25D4-452C-AD54-48C6E3480B94 + +The caller allocates and prepares a Parameter Buffer, then passes the PRM +handler GUID and a pointer to the Parameter Buffer to invoke the handler. The +Parameter Buffer is described in Table 9-32." + +**Table 9-32. 
PRM Parameter Buffer used for CXL DPA to System Physical Address Translation** + ++-------------+-----------+------------------------------------------------------------------------+ +| Byte Offset | Length in | Description | +| | Bytes | | ++=============+===========+========================================================================+ +| 00h | 8 | **CXL Device Physical Address (DPA)**: CXL DPA (e.g., from | +| | | CXL Component Event Log) | ++-------------+-----------+------------------------------------------------------------------------+ +| 08h | 4 | **CXL Endpoint SBDF**: | +| | | | +| | | - Byte 3 - PCIe Segment | +| | | - Byte 2 - Bus Number | +| | | - Byte 1: | +| | | - Device Number Bits[7:3] | +| | | - Function Number Bits[2:0] | +| | | - Byte 0 - RESERVED (MBZ) | +| | | | ++-------------+-----------+------------------------------------------------------------------------+ +| 0Ch | 8 | **Output Buffer**: Virtual Address Pointer to the buffer, | +| | | as defined in Table 9-33. | ++-------------+-----------+------------------------------------------------------------------------+ + +**Table 9-33. PRM Output Buffer used for CXL DPA to System Physical Address Translation** + ++-------------+-----------+------------------------------------------------------------------------+ +| Byte Offset | Length in | Description | +| | Bytes | | ++=============+===========+========================================================================+ +| 00h | 8 | **System Physical Address (SPA)**: The SPA converted | +| | | from the CXL DPA. | ++-------------+-----------+------------------------------------------------------------------------+ From df8b57c34b47e0acbe1133ca58ac75ec3c56771f Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:17 +0100 Subject: [PATCH 04/16] cxl/region: Rename misleading variable name @hpa to @hpa_range @hpa is actually a @hpa_range, rename variables accordingly. 
Reviewed-by: Gregory Price Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-2-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index ae899f68551f..51f1a5545324 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3474,9 +3474,9 @@ static int match_decoder_by_range(struct device *dev, const void *data) } static struct cxl_decoder * -cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa) +cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa_range) { - struct device *cxld_dev = device_find_child(&port->dev, hpa, + struct device *cxld_dev = device_find_child(&port->dev, hpa_range, match_decoder_by_range); return cxld_dev ? to_cxl_decoder(cxld_dev) : NULL; @@ -3489,14 +3489,14 @@ cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled) struct cxl_port *port = cxled_to_port(cxled); struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port); struct cxl_decoder *root, *cxld = &cxled->cxld; - struct range *hpa = &cxld->hpa_range; + struct range *hpa_range = &cxld->hpa_range; - root = cxl_port_find_switch_decoder(&cxl_root->port, hpa); + root = cxl_port_find_switch_decoder(&cxl_root->port, hpa_range); if (!root) { dev_err(cxlmd->dev.parent, "%s:%s no CXL window for range %#llx:%#llx\n", dev_name(&cxlmd->dev), dev_name(&cxld->dev), - cxld->hpa_range.start, cxld->hpa_range.end); + hpa_range->start, hpa_range->end); return NULL; } @@ -3562,7 +3562,7 @@ static int __construct_region(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct range *hpa = &cxled->cxld.hpa_range; + struct range *hpa_range = &cxled->cxld.hpa_range; struct cxl_region_params *p; struct 
resource *res; int rc; @@ -3583,7 +3583,7 @@ static int __construct_region(struct cxl_region *cxlr, if (!res) return -ENOMEM; - *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa), + *res = DEFINE_RES_MEM_NAMED(hpa_range->start, range_len(hpa_range), dev_name(&cxlr->dev)); rc = cxl_extended_linear_cache_resize(cxlr, res); @@ -3666,11 +3666,12 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, } static struct cxl_region * -cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa) +cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, + struct range *hpa_range) { struct device *region_dev; - region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa, + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa_range, match_region_by_range); if (!region_dev) return NULL; @@ -3680,7 +3681,7 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa) int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) { - struct range *hpa = &cxled->cxld.hpa_range; + struct range *hpa_range = &cxled->cxld.hpa_range; struct cxl_region_params *p; bool attach = false; int rc; @@ -3691,12 +3692,13 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) return -ENXIO; /* - * Ensure that if multiple threads race to construct_region() for @hpa - * one does the construction and the others add to that. + * Ensure that, if multiple threads race to construct_region() + * for the HPA range, one does the construction and the others + * add to that. 
*/ mutex_lock(&cxlrd->range_lock); struct cxl_region *cxlr __free(put_cxl_region) = - cxl_find_region_by_range(cxlrd, hpa); + cxl_find_region_by_range(cxlrd, hpa_range); if (!cxlr) cxlr = construct_region(cxlrd, cxled); mutex_unlock(&cxlrd->range_lock); From 4fe82279580d10ba63c1461ff404f2c6c82ff1d5 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:18 +0100 Subject: [PATCH 05/16] cxl/region: Store root decoder in struct cxl_region A region is always bound to a root decoder. The region's associated root decoder is often needed. Add it to struct cxl_region. This simplifies the code by removing dynamic lookups and the root decoder argument from the function argument list where possible. Patch is a prerequisite to implement address translation which uses struct cxl_region to store all relevant region and interleaving parameters. It changes the argument list of __construct_region() in preparation for adding a context argument. Additionally, the argument list of cxl_region_attach_position() is simplified and the use of to_cxl_root_decoder() removed, which always reconstructs and checks the pointer. The pointer never changes and is frequently used. Code becomes more readable as this emphasizes the binding between both objects.
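The refactoring above — cache an immutable back-pointer when the object is created instead of re-deriving it from the device hierarchy on every access — can be sketched in plain userspace C. The types below are simplified stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for struct cxl_root_decoder / struct cxl_region. */
struct root_decoder {
	int region_id;
};

struct region {
	int id;
	struct root_decoder *cxlrd; /* set once at allocation, never changes */
};

/*
 * Counterpart of cxl_region_alloc(): record the root decoder binding up
 * front so later code can read r->cxlrd directly instead of repeating a
 * to_cxl_root_decoder(dev->parent)-style lookup on every call.
 */
static struct region *region_alloc(struct root_decoder *cxlrd, int id)
{
	struct region *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->cxlrd = cxlrd;
	r->id = id;
	return r;
}
```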
Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Gregory Price Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-3-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 37 +++++++++++++++++++------------------ drivers/cxl/cxl.h | 2 ++ 2 files changed, 21 insertions(+), 18 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 51f1a5545324..22bd8ff37cef 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -489,9 +489,9 @@ static ssize_t interleave_ways_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); - struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region_params *p = &cxlr->params; unsigned int val, save; int rc; @@ -552,9 +552,9 @@ static ssize_t interleave_granularity_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); - struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; struct cxl_region_params *p = &cxlr->params; int rc, val; u16 ig; @@ -628,7 +628,7 @@ static DEVICE_ATTR_RO(mode); static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_region_params *p = &cxlr->params; struct resource *res; u64 remainder = 0; @@ -1373,7 +1373,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled) { - struct 
cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; int parent_iw, parent_ig, ig, iw, rc, pos = cxled->pos; struct cxl_port *parent_port = to_cxl_port(port->dev.parent); struct cxl_region_ref *cxl_rr = cxl_rr_load(port, cxlr); @@ -1731,10 +1731,10 @@ static int cxl_region_validate_position(struct cxl_region *cxlr, } static int cxl_region_attach_position(struct cxl_region *cxlr, - struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled, const struct cxl_dport *dport, int pos) { + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd; struct cxl_decoder *cxld = &cxlsd->cxld; @@ -1971,7 +1971,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr) static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_region_params *p = &cxlr->params; @@ -2076,8 +2076,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, ep_port = cxled_to_port(cxled); dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge); - rc = cxl_region_attach_position(cxlr, cxlrd, cxled, - dport, i); + rc = cxl_region_attach_position(cxlr, cxled, dport, i); if (rc) return rc; } @@ -2100,7 +2099,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, if (rc) return rc; - rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos); + rc = cxl_region_attach_position(cxlr, cxled, dport, pos); if (rc) return rc; @@ -2396,8 +2395,8 @@ static const struct attribute_group *region_groups[] = { static void cxl_region_release(struct device *dev) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent); struct cxl_region *cxlr = to_cxl_region(dev); + 
struct cxl_root_decoder *cxlrd = cxlr->cxlrd; int id = atomic_read(&cxlrd->region_id); /* @@ -2480,10 +2479,12 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i * region id allocations */ get_device(dev->parent); + cxlr->cxlrd = cxlrd; + cxlr->id = id; + device_set_pm_not_required(dev); dev->bus = &cxl_bus_type; dev->type = &cxl_region_type; - cxlr->id = id; cxl_region_set_lock(cxlr, &cxlrd->cxlsd.cxld); return cxlr; @@ -3115,7 +3116,7 @@ EXPORT_SYMBOL_FOR_MODULES(cxl_calculate_hpa_offset, "cxl_translate"); u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u64 dpa) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_region_params *p = &cxlr->params; struct cxl_endpoint_decoder *cxled = NULL; u64 dpa_offset, hpa_offset, hpa; @@ -3168,7 +3169,7 @@ static int region_offset_to_dpa_result(struct cxl_region *cxlr, u64 offset, struct dpa_result *result) { struct cxl_region_params *p = &cxlr->params; - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_endpoint_decoder *cxled; u64 hpa, hpa_offset, dpa_offset; u16 eig = 0; @@ -3522,7 +3523,7 @@ static int match_region_by_range(struct device *dev, const void *data) static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr, struct resource *res) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_region_params *p = &cxlr->params; resource_size_t size = resource_size(res); resource_size_t cache_size, start; @@ -3558,9 +3559,9 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr, } static int __construct_region(struct cxl_region *cxlr, - struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled) { + struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_memdev *cxlmd = 
cxled_to_memdev(cxled); struct range *hpa_range = &cxled->cxld.hpa_range; struct cxl_region_params *p; @@ -3656,7 +3657,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, return cxlr; } - rc = __construct_region(cxlr, cxlrd, cxled); + rc = __construct_region(cxlr, cxled); if (rc) { devm_release_action(port->uport_dev, unregister_region, cxlr); return ERR_PTR(rc); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index ba17fa86d249..10ce9c3a8a55 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -529,6 +529,7 @@ enum cxl_partition_mode { * struct cxl_region - CXL region * @dev: This region's device * @id: This region's id. Id is globally unique across all regions + * @cxlrd: Region's root decoder * @mode: Operational mode of the mapped capacity * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown @@ -542,6 +543,7 @@ enum cxl_partition_mode { struct cxl_region { struct device dev; int id; + struct cxl_root_decoder *cxlrd; enum cxl_partition_mode mode; enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; From 98ceb1a42dab91c6dcf95d1d424cba61b0f9bc5c Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:19 +0100 Subject: [PATCH 06/16] cxl/region: Store HPA range in struct cxl_region Each region has a known host physical address (HPA) range it is assigned to. Endpoint decoders assigned to a region share the same HPA range. The region's address range is the system's physical address (SPA) range. Endpoint decoders in systems that need address translation use HPAs which are not SPAs. To make the SPA range accessible to the endpoint decoders, store and track the region's SPA range in struct cxl_region. Introduce the @hpa_range member to the struct. Now, the SPA range of an endpoint decoder can be determined based on its assigned region. 
Patch is a prerequisite to implement address translation which uses struct cxl_region to store all relevant region and interleaving parameters. Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Gregory Price Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-4-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 7 +++++++ drivers/cxl/cxl.h | 2 ++ 2 files changed, 9 insertions(+) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 22bd8ff37cef..04c3ff66ec81 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -664,6 +664,8 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) return PTR_ERR(res); } + cxlr->hpa_range = DEFINE_RANGE(res->start, res->end); + p->res = res; p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; @@ -700,6 +702,8 @@ static int free_hpa(struct cxl_region *cxlr) if (p->state >= CXL_CONFIG_ACTIVE) return -EBUSY; + cxlr->hpa_range = DEFINE_RANGE(0, -1); + cxl_region_iomem_release(cxlr); p->state = CXL_CONFIG_IDLE; return 0; @@ -2453,6 +2457,8 @@ static void unregister_region(void *_cxlr) for (i = 0; i < p->interleave_ways; i++) detach_target(cxlr, i); + cxlr->hpa_range = DEFINE_RANGE(0, -1); + cxl_region_iomem_release(cxlr); put_device(&cxlr->dev); } @@ -3579,6 +3585,7 @@ static int __construct_region(struct cxl_region *cxlr, } set_bit(CXL_REGION_F_AUTO, &cxlr->flags); + cxlr->hpa_range = *hpa_range; res = kmalloc(sizeof(*res), GFP_KERNEL); if (!res) diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 10ce9c3a8a55..3a5ca1936ed1 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -530,6 +530,7 @@ enum cxl_partition_mode { * @dev: This region's device * @id: This region's id. 
Id is globally unique across all regions * @cxlrd: Region's root decoder + * @hpa_range: Address range occupied by the region * @mode: Operational mode of the mapped capacity * @type: Endpoint decoder target type * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown @@ -544,6 +545,7 @@ struct cxl_region { struct device dev; int id; struct cxl_root_decoder *cxlrd; + struct range hpa_range; enum cxl_partition_mode mode; enum cxl_decoder_type type; struct cxl_nvdimm_bridge *cxl_nvb; From 3e422caa40d0d4bf25ece6e82418ce642d56524a Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:20 +0100 Subject: [PATCH 07/16] cxl: Simplify cxl_root_ops allocation and handling A root port's callback handlers are collected in struct cxl_root_ops. The structure is dynamically allocated, though it contains only a single pointer. This also requires checking two pointers to verify the existence of a callback. Simplify the allocation, release and handler check by embedding the ops statically in struct cxl_root.
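The simplification above — embedding the ops structure rather than pointing at a heap allocation — cuts the callback-presence check from two pointer tests to one. A minimal sketch with simplified stand-in types (not the kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

struct qos_ops {
	int (*qos_class)(void);
};

/* Before: dynamically allocated ops, two pointers to validate. */
struct root_before {
	const struct qos_ops *ops;
};

static int has_qos_before(const struct root_before *root)
{
	return root->ops && root->ops->qos_class;
}

/* After: ops embedded statically, a single pointer to validate. */
struct root_after {
	struct qos_ops ops;
};

static int has_qos_after(const struct root_after *root)
{
	return root->ops.qos_class != NULL;
}

/* Stand-in callback used to populate the ops. */
static int sample_qos_class(void)
{
	return 0;
}
```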
Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-5-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/acpi.c | 7 ++----- drivers/cxl/core/cdat.c | 8 ++++---- drivers/cxl/core/port.c | 8 ++------ drivers/cxl/cxl.h | 19 ++++++++++--------- 4 files changed, 18 insertions(+), 24 deletions(-) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 77ac940e3013..b4bed40ef7c0 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -318,10 +318,6 @@ static int cxl_acpi_qos_class(struct cxl_root *cxl_root, return cxl_acpi_evaluate_qtg_dsm(handle, coord, entries, qos_class); } -static const struct cxl_root_ops acpi_root_ops = { - .qos_class = cxl_acpi_qos_class, -}; - static void del_cxl_resource(struct resource *res) { if (!res) @@ -923,9 +919,10 @@ static int cxl_acpi_probe(struct platform_device *pdev) cxl_res->end = -1; cxl_res->flags = IORESOURCE_MEM; - cxl_root = devm_cxl_add_root(host, &acpi_root_ops); + cxl_root = devm_cxl_add_root(host); if (IS_ERR(cxl_root)) return PTR_ERR(cxl_root); + cxl_root->ops.qos_class = cxl_acpi_qos_class; root_port = &cxl_root->port; rc = bus_for_each_dev(adev->dev.bus, NULL, root_port, diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 7120b5f2e31f..18f0f2a25113 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -213,7 +213,7 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port, if (!cxl_root) return -ENODEV; - if (!cxl_root->ops || !cxl_root->ops->qos_class) + if (!cxl_root->ops.qos_class) return -EOPNOTSUPP; xa_for_each(dsmas_xa, index, dent) { @@ -221,9 +221,9 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port, cxl_coordinates_combine(dent->coord, dent->cdat_coord, ep_c); dent->entries = 1; - rc = cxl_root->ops->qos_class(cxl_root, - &dent->coord[ACCESS_COORDINATE_CPU], - 1, &qos_class); + rc = cxl_root->ops.qos_class(cxl_root, + 
&dent->coord[ACCESS_COORDINATE_CPU], + 1, &qos_class); if (rc != 1) continue; diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index fef3aa0c6680..2338d146577c 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -954,19 +954,15 @@ struct cxl_port *devm_cxl_add_port(struct device *host, } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_port, "CXL"); -struct cxl_root *devm_cxl_add_root(struct device *host, - const struct cxl_root_ops *ops) +struct cxl_root *devm_cxl_add_root(struct device *host) { - struct cxl_root *cxl_root; struct cxl_port *port; port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL); if (IS_ERR(port)) return ERR_CAST(port); - cxl_root = to_cxl_root(port); - cxl_root->ops = ops; - return cxl_root; + return to_cxl_root(port); } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_root, "CXL"); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 3a5ca1936ed1..0e15dc6e169f 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -646,6 +646,14 @@ struct cxl_port { resource_size_t component_reg_phys; }; +struct cxl_root; + +struct cxl_root_ops { + int (*qos_class)(struct cxl_root *cxl_root, + struct access_coordinate *coord, int entries, + int *qos_class); +}; + /** * struct cxl_root - logical collection of root cxl_port items * @@ -654,7 +662,7 @@ struct cxl_port { */ struct cxl_root { struct cxl_port port; - const struct cxl_root_ops *ops; + struct cxl_root_ops ops; }; static inline struct cxl_root * @@ -663,12 +671,6 @@ to_cxl_root(const struct cxl_port *port) return container_of(port, struct cxl_root, port); } -struct cxl_root_ops { - int (*qos_class)(struct cxl_root *cxl_root, - struct access_coordinate *coord, int entries, - int *qos_class); -}; - static inline struct cxl_dport * cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev) { @@ -782,8 +784,7 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport_dev, resource_size_t component_reg_phys, struct cxl_dport *parent_dport); -struct 
cxl_root *devm_cxl_add_root(struct device *host, - const struct cxl_root_ops *ops); +struct cxl_root *devm_cxl_add_root(struct device *host); struct cxl_root *find_cxl_root(struct cxl_port *port); DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev)) From bc01fd5019faa14f4253de6f6abcae6d957c3a12 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:21 +0100 Subject: [PATCH 08/16] cxl/region: Separate region parameter setup and region construction To construct a region, the region parameters such as address range and interleaving configuration need to be determined. This is done while constructing the region by inspecting the endpoint decoder configuration. The endpoint decoder is passed as a function argument. With address translation the endpoint decoder data is no longer sufficient to extract the region parameters because some of the information is obtained using other methods, such as firmware calls. As a first step, separate the code that determines the region parameters from the region construction. Temporarily store all the data to create the region in the new struct cxl_region_context. Once the region data is determined and struct cxl_region_context is filled, construct the region. Patch is a prerequisite to implement address translation. The code separation helps to later extend it to determine region parameters using other methods as needed, especially to support address translation.
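The two-phase flow described above — first snapshot the decoder-derived parameters into a context, then construct the region from that context — can be sketched with simplified stand-in types (the real structures live in drivers/cxl/core/core.h):

```c
#include <assert.h>

struct range {
	unsigned long long start, end;
};

/* Simplified stand-in for a CXL endpoint decoder. */
struct endpoint_decoder {
	struct range hpa_range;
	int interleave_ways;
	int interleave_granularity;
};

/* Plays the role of struct cxl_region_context. */
struct region_context {
	const struct endpoint_decoder *cxled;
	struct range hpa_range;
	int interleave_ways;
	int interleave_granularity;
};

/*
 * Phase 1: gather the parameters. With address translation, a platform
 * hook could later rewrite ctx.hpa_range to the translated SPA range
 * before phase 2 (region construction) consumes the context.
 */
static struct region_context context_init(const struct endpoint_decoder *cxled)
{
	return (struct region_context) {
		.cxled = cxled,
		.hpa_range = cxled->hpa_range,
		.interleave_ways = cxled->interleave_ways,
		.interleave_granularity = cxled->interleave_granularity,
	};
}
```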
Reviewed-by: Gregory Price Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-6-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/core.h | 8 ++++++++ drivers/cxl/core/region.c | 27 ++++++++++++++++++--------- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 1fb66132b777..ae9e1bb51562 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -19,6 +19,14 @@ enum cxl_detach_mode { }; #ifdef CONFIG_CXL_REGION + +struct cxl_region_context { + struct cxl_endpoint_decoder *cxled; + struct range hpa_range; + int interleave_ways; + int interleave_granularity; +}; + extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_create_ram_region; extern struct device_attribute dev_attr_delete_region; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 04c3ff66ec81..5ae77e9feb4d 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3565,11 +3565,12 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr, } static int __construct_region(struct cxl_region *cxlr, - struct cxl_endpoint_decoder *cxled) + struct cxl_region_context *ctx) { + struct cxl_endpoint_decoder *cxled = ctx->cxled; struct cxl_root_decoder *cxlrd = cxlr->cxlrd; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct range *hpa_range = &cxled->cxld.hpa_range; + struct range *hpa_range = &ctx->hpa_range; struct cxl_region_params *p; struct resource *res; int rc; @@ -3622,8 +3623,8 @@ static int __construct_region(struct cxl_region *cxlr, } p->res = res; - p->interleave_ways = cxled->cxld.interleave_ways; - p->interleave_granularity = cxled->cxld.interleave_granularity; + p->interleave_ways = ctx->interleave_ways; + p->interleave_granularity = ctx->interleave_granularity; p->state 
= CXL_CONFIG_INTERLEAVE_ACTIVE; rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group()); @@ -3643,8 +3644,9 @@ static int __construct_region(struct cxl_region *cxlr, /* Establish an empty region covering the given HPA range */ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, - struct cxl_endpoint_decoder *cxled) + struct cxl_region_context *ctx) { + struct cxl_endpoint_decoder *cxled = ctx->cxled; struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_port *port = cxlrd_to_port(cxlrd); struct cxl_dev_state *cxlds = cxlmd->cxlds; @@ -3664,7 +3666,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, return cxlr; } - rc = __construct_region(cxlr, cxled); + rc = __construct_region(cxlr, ctx); if (rc) { devm_release_action(port->uport_dev, unregister_region, cxlr); return ERR_PTR(rc); @@ -3689,11 +3691,18 @@ cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) { - struct range *hpa_range = &cxled->cxld.hpa_range; + struct cxl_region_context ctx; struct cxl_region_params *p; bool attach = false; int rc; + ctx = (struct cxl_region_context) { + .cxled = cxled, + .hpa_range = cxled->cxld.hpa_range, + .interleave_ways = cxled->cxld.interleave_ways, + .interleave_granularity = cxled->cxld.interleave_granularity, + }; + struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) = cxl_find_root_decoder(cxled); if (!cxlrd) @@ -3706,9 +3715,9 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) */ mutex_lock(&cxlrd->range_lock); struct cxl_region *cxlr __free(put_cxl_region) = - cxl_find_region_by_range(cxlrd, hpa_range); + cxl_find_region_by_range(cxlrd, &ctx.hpa_range); if (!cxlr) - cxlr = construct_region(cxlrd, cxled); + cxlr = construct_region(cxlrd, &ctx); mutex_unlock(&cxlrd->range_lock); rc = PTR_ERR_OR_ZERO(cxlr); From 1fd6c38fc5e18a9904bc1bd447bb4c2708f0292d Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 
17:48:22 +0100 Subject: [PATCH 09/16] cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos() cxl_calc_interleave_pos() uses the endpoint decoder's HPA range to determine its interleaving position. This requires the endpoint decoder's range to be an SPA range, which is not the case for systems that need address translation. Add a separate @hpa_range argument to function cxl_calc_interleave_pos() to specify the address range. Now it is possible to pass the SPA-translated address range of an endpoint decoder to function cxl_calc_interleave_pos(). Refactor only, no functional changes. Patch is a prerequisite to implement address translation. Reviewed-by: Gregory Price Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-7-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 5ae77e9feb4d..60d2d1dae2aa 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1878,6 +1878,7 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range, /** * cxl_calc_interleave_pos() - calculate an endpoint position in a region * @cxled: endpoint decoder member of given region + * @hpa_range: translated HPA range of the endpoint * * The endpoint position is calculated by traversing the topology from * the endpoint to the root decoder and iteratively applying this @@ -1890,11 +1891,11 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range, * Return: position >= 0 on success * -ENXIO on failure */ -static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled) +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled, + struct range *hpa_range) { struct cxl_port *iter, *port = cxled_to_port(cxled); struct cxl_memdev
*cxlmd = cxled_to_memdev(cxled); - struct range *range = &cxled->cxld.hpa_range; int parent_ways = 0, parent_pos = 0, pos = 0; int rc; @@ -1932,7 +1933,8 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled) if (is_cxl_root(iter)) break; - rc = find_pos_and_ways(iter, range, &parent_pos, &parent_ways); + rc = find_pos_and_ways(iter, hpa_range, &parent_pos, + &parent_ways); if (rc) return rc; @@ -1942,7 +1944,7 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled) dev_dbg(&cxlmd->dev, "decoder:%s parent:%s port:%s range:%#llx-%#llx pos:%d\n", dev_name(&cxled->cxld.dev), dev_name(cxlmd->dev.parent), - dev_name(&port->dev), range->start, range->end, pos); + dev_name(&port->dev), hpa_range->start, hpa_range->end, pos); return pos; } @@ -1955,7 +1957,7 @@ static int cxl_region_sort_targets(struct cxl_region *cxlr) for (i = 0; i < p->nr_targets; i++) { struct cxl_endpoint_decoder *cxled = p->targets[i]; - cxled->pos = cxl_calc_interleave_pos(cxled); + cxled->pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range); /* * Record that sorting failed, but still continue to calc * cxled->pos so that follow-on code paths can reliably @@ -2139,7 +2141,7 @@ static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled = p->targets[i]; int test_pos; - test_pos = cxl_calc_interleave_pos(cxled); + test_pos = cxl_calc_interleave_pos(cxled, &cxlr->hpa_range); dev_dbg(&cxled->cxld.dev, "Test cxl_calc_interleave_pos(): %s test_pos:%d cxled->pos:%d\n", (test_pos == cxled->pos) ? "success" : "fail", From d01149bbe76d81d360ed24853d5247fcaad873e4 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:23 +0100 Subject: [PATCH 10/16] cxl/region: Use region data to get the root decoder To find a region's root decoder, the endpoint's HPA range is used to search the matching decoder by its range. 
With address translation the endpoint decoder's range is in a different address space and thus cannot be used to determine the root decoder. The region parameters are encapsulated within struct cxl_region_context and may include the translated Host Physical Address (HPA) range. Use this context to identify the root decoder rather than relying on the endpoint. Modify cxl_find_root_decoder() and add the region context as a parameter. Rename this function to get_cxl_root_decoder() as a counterpart to put_cxl_root_decoder(). Simplify the implementation by removing the function cxl_port_find_switch_decoder(). The function is unnecessary because it is not referenced or utilized elsewhere in the code. Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-8-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 50 +++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 60d2d1dae2aa..912796fd708e 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3469,47 +3469,44 @@ err: return rc; } -static int match_decoder_by_range(struct device *dev, const void *data) +static int match_root_decoder(struct device *dev, const void *data) { const struct range *r1, *r2 = data; - struct cxl_decoder *cxld; + struct cxl_root_decoder *cxlrd; - if (!is_switch_decoder(dev)) + if (!is_root_decoder(dev)) return 0; - cxld = to_cxl_decoder(dev); - r1 = &cxld->hpa_range; + cxlrd = to_cxl_root_decoder(dev); + r1 = &cxlrd->cxlsd.cxld.hpa_range; + return range_contains(r1, r2); } -static struct cxl_decoder * -cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa_range) -{ - struct device *cxld_dev = device_find_child(&port->dev, hpa_range, - match_decoder_by_range); - - return cxld_dev ?
to_cxl_decoder(cxld_dev) : NULL; -} - +/* + * Note, when finished with the device, drop the reference with + * put_device() or use the put_cxl_root_decoder helper. + */ static struct cxl_root_decoder * -cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled) +get_cxl_root_decoder(struct cxl_endpoint_decoder *cxled, + struct cxl_region_context *ctx) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_port *port = cxled_to_port(cxled); struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port); - struct cxl_decoder *root, *cxld = &cxled->cxld; - struct range *hpa_range = &cxld->hpa_range; + struct device *cxlrd_dev; - root = cxl_port_find_switch_decoder(&cxl_root->port, hpa_range); - if (!root) { + cxlrd_dev = device_find_child(&cxl_root->port.dev, &ctx->hpa_range, + match_root_decoder); + if (!cxlrd_dev) { dev_err(cxlmd->dev.parent, "%s:%s no CXL window for range %#llx:%#llx\n", - dev_name(&cxlmd->dev), dev_name(&cxld->dev), - hpa_range->start, hpa_range->end); - return NULL; + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + ctx->hpa_range.start, ctx->hpa_range.end); + return ERR_PTR(-ENXIO); } - return to_cxl_root_decoder(&root->dev); + return to_cxl_root_decoder(cxlrd_dev); } static int match_region_by_range(struct device *dev, const void *data) @@ -3706,9 +3703,10 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) }; struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) = - cxl_find_root_decoder(cxled); - if (!cxlrd) - return -ENXIO; + get_cxl_root_decoder(cxled, &ctx); + + if (IS_ERR(cxlrd)) + return PTR_ERR(cxlrd); /* * Ensure that, if multiple threads race to construct_region() From a31af41115b0f7021a86f5439cb8720b93314f91 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:24 +0100 Subject: [PATCH 11/16] cxl: Introduce callback for HPA address ranges translation Introduce a callback to translate an endpoint's HPA range to the address range of the root port which is the System Physical Address 
(SPA) range used by a region. The callback can be set if a platform needs to handle address translation. The callback is attached to the root port. An endpoint's root port can easily be determined in the PCI hierarchy without any CXL-specific knowledge. This allows the early use of address translation for CXL enumeration. Address translation is especially needed for the detection of the root decoders. Thus, the callback is embedded in struct cxl_root_ops instead of struct cxl_rd_ops. Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-9-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 24 ++++++++++++++++++++++++ drivers/cxl/cxl.h | 1 + 2 files changed, 25 insertions(+) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 912796fd708e..ed8469fa55a9 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3483,6 +3483,15 @@ static int match_root_decoder(struct device *dev, const void *data) return range_contains(r1, r2); } +static int cxl_root_setup_translation(struct cxl_root *cxl_root, + struct cxl_region_context *ctx) +{ + if (!cxl_root->ops.translation_setup_root) + return 0; + + return cxl_root->ops.translation_setup_root(cxl_root, ctx); +} + /* * Note, when finished with the device, drop the reference with * put_device() or use the put_cxl_root_decoder helper. */ @@ -3495,6 +3504,21 @@ get_cxl_root_decoder(struct cxl_endpoint_decoder *cxled, struct cxl_port *port = cxled_to_port(cxled); struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port); struct device *cxlrd_dev; + int rc; + + /* + * Adjust the endpoint's HPA range and interleaving + * configuration to the root decoder's memory space before + * setting up the root decoder.
+ */ + rc = cxl_root_setup_translation(cxl_root, ctx); + if (rc) { + dev_err(cxlmd->dev.parent, + "%s:%s Failed to setup translation for address range %#llx:%#llx\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + ctx->hpa_range.start, ctx->hpa_range.end); + return ERR_PTR(rc); + } cxlrd_dev = device_find_child(&cxl_root->port.dev, &ctx->hpa_range, match_root_decoder); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0e15dc6e169f..8ea334d81edf 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -652,6 +652,7 @@ struct cxl_root_ops { int (*qos_class)(struct cxl_root *cxl_root, struct access_coordinate *coord, int entries, int *qos_class); + int (*translation_setup_root)(struct cxl_root *cxl_root, void *data); }; /** From 7be03eae1fdb690dff8f102a7306ca61b55a810c Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:25 +0100 Subject: [PATCH 12/16] cxl/acpi: Prepare use of EFI runtime services In order to use EFI runtime services, especially ACPI PRM, which uses the efi_rts_wq workqueue, initialize EFI before CXL ACPI. There is a subsys_initcall order dependency if the driver is builtin: subsys_initcall(cxl_acpi_init); subsys_initcall(efisubsys_init); Prevent the efi_rts_wq workqueue from being used by cxl_acpi_init() before its allocation. Use subsys_initcall_sync(cxl_acpi_init) to always run efisubsys_init() first.
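The ordering constraint can be modeled in plain C (an illustrative toy model only; the levels and entry names mirror the kernel's initcall scheme, but none of this is kernel API):

```c
#include <assert.h>

/*
 * Toy model of initcall ordering: initcalls run by ascending level;
 * within a level, "_sync" entries run after the plain entries, while
 * two plain entries at the same level run in link order (unreliable).
 */
struct initcall {
	const char *name;
	int level;	/* 4 == subsys_initcall */
	int sync;	/* 1 == subsys_initcall_sync */
};

static int runs_before(const struct initcall *a, const struct initcall *b)
{
	if (a->level != b->level)
		return a->level < b->level;
	return a->sync < b->sync;
}

/* efisubsys_init stays at subsys_initcall ... */
static const struct initcall efi = { "efisubsys_init", 4, 0 };
/* ... while cxl_acpi_init moves to subsys_initcall_sync */
static const struct initcall cxl = { "cxl_acpi_init", 4, 1 };
```

Moving cxl_acpi_init to the "_sync" sublevel is what breaks the link-order dependence: EFI setup is then guaranteed to run first.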
Reported-by: Gregory Price Tested-by: Joshua Hahn Reviewed-by: Joshua Hahn Reviewed-by: Gregory Price Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-10-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/acpi.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index b4bed40ef7c0..a31d0f97f916 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -1005,8 +1005,12 @@ static void __exit cxl_acpi_exit(void) cxl_bus_drain(); } -/* load before dax_hmem sees 'Soft Reserved' CXL ranges */ -subsys_initcall(cxl_acpi_init); +/* + * Load before dax_hmem sees 'Soft Reserved' CXL ranges. Use + * subsys_initcall_sync() since there is an order dependency with + * subsys_initcall(efisubsys_init), which must run first. + */ +subsys_initcall_sync(cxl_acpi_init); /* * Arrange for host-bridge ports to be active synchronous with From af74daf91652f15b82560bb93850d2ec8bbfa976 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Tue, 27 Jan 2026 11:12:31 -0700 Subject: [PATCH 13/16] cxl: Enable AMD Zen5 address translation using ACPI PRMT MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add AMD Zen5 support for address translation. Zen5 systems may be configured to use 'Normalized addresses'. Then, host physical addresses (HPA) are different from their system physical addresses (SPA). The endpoint has its own physical address space and an incoming HPA is already converted to the device's physical address (DPA). Thus, interleaving is disabled and CXL endpoints are programmed passthrough (DPA == HPA). Host Physical Addresses (HPAs) need to be translated from the endpoint to its CXL host bridge, especially to identify the endpoint's root decoder and the region's address range.
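The PRM handler takes a packed parameter buffer (struct prm_cxl_dpa_spa_data, introduced in core/atl.c by this patch). The expected layout can be sanity-checked with a userspace mirror of that structure; the offsets asserted here follow from the field order and __packed attribute in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the __packed parameter buffer from core/atl.c */
struct prm_cxl_dpa_spa_data {
	uint64_t dpa;		/* in: device physical address */
	uint8_t reserved;
	uint8_t devfn;		/* in: endpoint PCI device/function */
	uint8_t bus;
	uint8_t segment;
	uint64_t *spa;		/* out: translated system physical address */
} __attribute__((packed));
```

With packing, the identification bytes follow the 8-byte DPA directly and the result pointer sits at offset 12, so the buffer has no padding the firmware would not expect.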
ACPI Platform Runtime Mechanism (PRM) provides a handler to translate the DPA to its SPA. This is documented in: AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh ACPI v6.5 Porting Guide, Publication # 58088 https://www.amd.com/en/search/documentation/hub.html With Normalized Addressing, this PRM handler must be used to translate an HPA of an endpoint to its SPA. Do the following to implement AMD Zen5 address translation: Introduce a new file core/atl.c to handle ACPI PRM-specific address translation code. Naming is loosely related to the kernel's AMD Address Translation Library (CONFIG_AMD_ATL), but the implementation does not depend on it, nor is it vendor specific. Use Kbuild and Kconfig options, respectively, to enable the code depending on architecture and platform options. AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to System Physical Address). Firmware enables the PRM handler if the platform has address translation implemented. Check firmware and kernel support of ACPI PRM using the specific GUID. On success, enable address translation by setting up the root port callback introduced earlier; see function cxl_prm_setup_root(). Setup is done in cxl_setup_prm_address_translation(); it is the only function that needs to be exported. For low-level PRM firmware calls, use the ACPI framework. Identify the region's interleaving ways by inspecting the address ranges. Also determine the interleaving granularity using the address translation callback. Note that the position of the chunk from one interleaving block to the next may vary and thus cannot be considered constant. Address offsets larger than the interleaving block size cannot be used to calculate the granularity. Thus, probe the granularity using address translation for various HPAs in the same interleaving block.
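The probing scheme described above can be sketched in userspace with a toy translation function standing in for the PRM call (all names here are hypothetical illustrations; the real probe loop lives in cxl_prm_setup_root() below):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256	0x100ULL
#define SZ_16M	0x1000000ULL

/*
 * Toy 2-way interleave standing in for prm_cxl_dpa_spa(): the SPA
 * advances by ways * gran whenever the HPA crosses a gran boundary.
 */
static uint64_t toy_translate(uint64_t hpa, uint64_t real_gran)
{
	return (hpa / real_gran) * 2 * real_gran + (hpa % real_gran);
}

/*
 * Double the probe offset until SPA contiguity breaks; the offsets
 * stay within one interleaving block, as the description requires.
 */
static uint64_t probe_granularity(uint64_t base, uint64_t real_gran)
{
	uint64_t base_spa = toy_translate(base, real_gran);
	uint64_t gran = SZ_256;

	while (gran <= SZ_16M) {
		if (toy_translate(base + gran, real_gran) != base_spa + gran)
			break;
		gran <<= 1;
	}
	return gran;
}
```

The first offset at which the translated address stops being contiguous with the base SPA is exactly the interleave granularity, which is why probing small in-block offsets works while large offsets would not.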
[ dj: Add atl.o build to cxl_test ] Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-11-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/Kconfig | 5 + drivers/cxl/acpi.c | 2 + drivers/cxl/core/Makefile | 1 + drivers/cxl/core/atl.c | 190 ++++++++++++++++++++++++++++++++++++++ drivers/cxl/cxl.h | 7 ++ tools/testing/cxl/Kbuild | 1 + 6 files changed, 206 insertions(+) create mode 100644 drivers/cxl/core/atl.c diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 48b7314afdb8..103950a9b73e 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -233,4 +233,9 @@ config CXL_MCE def_bool y depends on X86_MCE && MEMORY_FAILURE +config CXL_ATL + def_bool y + depends on CXL_REGION + depends on ACPI_PRMT && AMD_NB + endif diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index a31d0f97f916..50c2987e0459 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -925,6 +925,8 @@ static int cxl_acpi_probe(struct platform_device *pdev) cxl_root->ops.qos_class = cxl_acpi_qos_class; root_port = &cxl_root->port; + cxl_setup_prm_address_translation(cxl_root); + rc = bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport); if (rc < 0) diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 5ad8fef210b5..11fe272a6e29 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -20,3 +20,4 @@ cxl_core-$(CONFIG_CXL_REGION) += region.o cxl_core-$(CONFIG_CXL_MCE) += mce.o cxl_core-$(CONFIG_CXL_FEATURES) += features.o cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o +cxl_core-$(CONFIG_CXL_ATL) += atl.o diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c new file mode 100644 index 000000000000..c36984686fb0 --- /dev/null +++ b/drivers/cxl/core/atl.c @@ -0,0 +1,190 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2025 Advanced Micro Devices, Inc. 
+ */ + +#include +#include +#include + +#include +#include "core.h" + +/* + * PRM Address Translation - CXL DPA to System Physical Address + * + * Reference: + * + * AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh + * ACPI v6.5 Porting Guide, Publication # 58088 + */ + +static const guid_t prm_cxl_dpa_spa_guid = + GUID_INIT(0xee41b397, 0x25d4, 0x452c, 0xad, 0x54, 0x48, 0xc6, 0xe3, + 0x48, 0x0b, 0x94); + +struct prm_cxl_dpa_spa_data { + u64 dpa; + u8 reserved; + u8 devfn; + u8 bus; + u8 segment; + u64 *spa; +} __packed; + +static u64 prm_cxl_dpa_spa(struct pci_dev *pci_dev, u64 dpa) +{ + struct prm_cxl_dpa_spa_data data; + u64 spa; + int rc; + + data = (struct prm_cxl_dpa_spa_data) { + .dpa = dpa, + .devfn = pci_dev->devfn, + .bus = pci_dev->bus->number, + .segment = pci_domain_nr(pci_dev->bus), + .spa = &spa, + }; + + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data); + if (rc) { + pci_dbg(pci_dev, "failed to get SPA for %#llx: %d\n", dpa, rc); + return ULLONG_MAX; + } + + pci_dbg(pci_dev, "PRM address translation: DPA -> SPA: %#llx -> %#llx\n", dpa, spa); + + return spa; +} + +static int cxl_prm_setup_root(struct cxl_root *cxl_root, void *data) +{ + struct cxl_region_context *ctx = data; + struct cxl_endpoint_decoder *cxled = ctx->cxled; + struct cxl_decoder *cxld = &cxled->cxld; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct range hpa_range = ctx->hpa_range; + struct pci_dev *pci_dev; + u64 spa_len, len; + u64 addr, base_spa, base; + int ways, gran; + + /* + * When Normalized Addressing is enabled, the endpoint maintains a 1:1 + * mapping between HPA and DPA. If disabled, skip address translation + * and perform only a range check. + */ + if (hpa_range.start != cxled->dpa_res->start) + return 0; + + /* + * Endpoints are programmed passthrough in Normalized Addressing mode. 
+ */ + if (ctx->interleave_ways != 1) { + dev_dbg(&cxld->dev, "unexpected interleaving config: ways: %d granularity: %d\n", + ctx->interleave_ways, ctx->interleave_granularity); + return -ENXIO; + } + + if (!cxlmd || !dev_is_pci(cxlmd->dev.parent)) { + dev_dbg(&cxld->dev, "No endpoint found: %s, range %#llx-%#llx\n", + dev_name(cxld->dev.parent), hpa_range.start, + hpa_range.end); + return -ENXIO; + } + + pci_dev = to_pci_dev(cxlmd->dev.parent); + + /* Translate HPA range to SPA. */ + base = hpa_range.start; + hpa_range.start = prm_cxl_dpa_spa(pci_dev, hpa_range.start); + hpa_range.end = prm_cxl_dpa_spa(pci_dev, hpa_range.end); + base_spa = hpa_range.start; + + if (hpa_range.start == ULLONG_MAX || hpa_range.end == ULLONG_MAX) { + dev_dbg(cxld->dev.parent, + "CXL address translation: Failed to translate HPA range: %#llx-%#llx:%#llx-%#llx(%s)\n", + hpa_range.start, hpa_range.end, ctx->hpa_range.start, + ctx->hpa_range.end, dev_name(&cxld->dev)); + return -ENXIO; + } + + /* + * Since translated addresses include the interleaving offsets, align + * the range to 256 MB. + */ + hpa_range.start = ALIGN_DOWN(hpa_range.start, SZ_256M); + hpa_range.end = ALIGN(hpa_range.end, SZ_256M) - 1; + + len = range_len(&ctx->hpa_range); + spa_len = range_len(&hpa_range); + if (!len || !spa_len || spa_len % len) { + dev_dbg(cxld->dev.parent, + "CXL address translation: HPA range not contiguous: %#llx-%#llx:%#llx-%#llx(%s)\n", + hpa_range.start, hpa_range.end, ctx->hpa_range.start, + ctx->hpa_range.end, dev_name(&cxld->dev)); + return -ENXIO; + } + + ways = spa_len / len; + gran = SZ_256; + + /* + * Determine interleave granularity + * + * Note: The position of the chunk from one interleaving block to the + * next may vary and thus cannot be considered constant. Address offsets + * larger than the interleaving block size cannot be used to calculate + * the granularity. 
+ */ + if (ways > 1) { + while (gran <= SZ_16M) { + addr = prm_cxl_dpa_spa(pci_dev, base + gran); + if (addr != base_spa + gran) + break; + gran <<= 1; + } + } + + if (gran > SZ_16M) { + dev_dbg(cxld->dev.parent, + "CXL address translation: Cannot determine granularity: %#llx-%#llx:%#llx-%#llx(%s)\n", + hpa_range.start, hpa_range.end, ctx->hpa_range.start, + ctx->hpa_range.end, dev_name(&cxld->dev)); + return -ENXIO; + } + + ctx->hpa_range = hpa_range; + ctx->interleave_ways = ways; + ctx->interleave_granularity = gran; + + dev_dbg(&cxld->dev, + "address mapping found for %s (hpa -> spa): %#llx+%#llx -> %#llx+%#llx ways:%d granularity:%d\n", + dev_name(cxlmd->dev.parent), base, len, hpa_range.start, + spa_len, ways, gran); + + return 0; +} + +void cxl_setup_prm_address_translation(struct cxl_root *cxl_root) +{ + struct device *host = cxl_root->port.uport_dev; + u64 spa; + struct prm_cxl_dpa_spa_data data = { .spa = &spa }; + int rc; + + /* + * Applies only to PCIe Host Bridges which are children of the CXL Root + * Device (HID="ACPI0017"). Check this and drop cxl_test instances.
+ */ + if (!acpi_match_device(host->driver->acpi_match_table, host)) + return; + + /* Check kernel (-EOPNOTSUPP) and firmware support (-ENODEV) */ + rc = acpi_call_prm_handler(prm_cxl_dpa_spa_guid, &data); + if (rc == -EOPNOTSUPP || rc == -ENODEV) + return; + + cxl_root->ops.translation_setup_root = cxl_prm_setup_root; +} +EXPORT_SYMBOL_NS_GPL(cxl_setup_prm_address_translation, "CXL"); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 8ea334d81edf..20b0fd43fa7b 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -817,6 +817,13 @@ static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host) { } #endif +#ifdef CONFIG_CXL_ATL +void cxl_setup_prm_address_translation(struct cxl_root *cxl_root); +#else +static inline +void cxl_setup_prm_address_translation(struct cxl_root *cxl_root) {} +#endif + struct cxl_decoder *to_cxl_decoder(struct device *dev); struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev); struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev); diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild index 0e151d0572d1..612d8edbfc6f 100644 --- a/tools/testing/cxl/Kbuild +++ b/tools/testing/cxl/Kbuild @@ -63,6 +63,7 @@ cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o +cxl_core-$(CONFIG_CXL_ATL) += $(CXL_CORE_SRC)/atl.o cxl_core-y += config_check.o cxl_core-y += cxl_core_test.o cxl_core-y += cxl_core_exports.o From a2e794895089c1356b7687e8df1fa7d224d40bb6 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:27 +0100 Subject: [PATCH 14/16] cxl/atl: Lock decoders that need address translation The current kernel implementation does not support endpoint setup with Normalized Addressing. It only translates an endpoint's DPA to the SPA range of the host bridge. 
Therefore, the endpoint address range cannot be determined, making a non-auto setup impossible. If a decoder requires address translation, reprogramming should be disabled and the decoder locked. The BIOS, however, provides all the necessary address translation data, which the kernel can use to reconfigure endpoint decoders with normalized addresses. Locking the decoders in the BIOS would prevent a capable kernel (or other operating systems) from shutting down auto-generated regions and managing resources dynamically. Reviewed-by: Gregory Price Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Tested-by: Gregory Price Signed-off-by: Robert Richter Link: https://patch.msgid.link/20260114164837.1076338-12-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/atl.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c index c36984686fb0..09d0ea1792d9 100644 --- a/drivers/cxl/core/atl.c +++ b/drivers/cxl/core/atl.c @@ -154,6 +154,24 @@ static int cxl_prm_setup_root(struct cxl_root *cxl_root, void *data) return -ENXIO; } + /* + * The current kernel implementation does not support endpoint + * setup with Normalized Addressing. It only translates an + * endpoint's DPA to the SPA range of the host bridge. + * Therefore, the endpoint address range cannot be determined, + * making a non-auto setup impossible. If a decoder requires + * address translation, reprogramming should be disabled and + * the decoder locked. + * + * The BIOS, however, provides all the necessary address + * translation data, which the kernel can use to reconfigure + * endpoint decoders with normalized addresses. Locking the + * decoders in the BIOS would prevent a capable kernel (or + * other operating systems) from shutting down auto-generated + * regions and managing resources dynamically.
+ */ + cxld->flags |= CXL_DECODER_F_LOCK; + ctx->hpa_range = hpa_range; ctx->interleave_ways = ways; ctx->interleave_granularity = gran; From d1c9ba46d6c36ff8d5b5f83ae28eae4132e46988 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:28 +0100 Subject: [PATCH 15/16] cxl/region: Factor out code into cxl_region_setup_poison() Poison injection setup code is embedded in cxl_region_probe(). For improved encapsulation, readability, and maintainability, factor out code into function cxl_region_setup_poison(). This patch is a prerequisite to disable poison by region offset for Normalized Addressing. No functional changes. Reviewed-by: Dave Jiang Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20260114164837.1076338-13-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 53 +++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 25 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index ed8469fa55a9..80cd77f0842e 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3916,6 +3916,31 @@ static int cxl_region_debugfs_poison_clear(void *data, u64 offset) DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL, cxl_region_debugfs_poison_clear, "%llx\n"); +static int cxl_region_setup_poison(struct cxl_region *cxlr) +{ + struct device *dev = &cxlr->dev; + struct cxl_region_params *p = &cxlr->params; + struct dentry *dentry; + + /* Create poison attributes if all memdevs support the capabilities */ + for (int i = 0; i < p->nr_targets; i++) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + + if (!cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_INJECT) || + !cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_CLEAR)) + return 0; + } + + dentry = cxl_debugfs_create_dir(dev_name(dev)); + debugfs_create_file("inject_poison", 0200, dentry, 
cxlr, + &cxl_poison_inject_fops); + debugfs_create_file("clear_poison", 0200, dentry, cxlr, + &cxl_poison_clear_fops); + + return devm_add_action_or_reset(dev, remove_debugfs, dentry); +} + static int cxl_region_can_probe(struct cxl_region *cxlr) { struct cxl_region_params *p = &cxlr->params; @@ -3945,7 +3970,6 @@ static int cxl_region_probe(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; - bool poison_supported = true; int rc; rc = cxl_region_can_probe(cxlr); @@ -3969,30 +3993,9 @@ static int cxl_region_probe(struct device *dev) if (rc) return rc; - /* Create poison attributes if all memdevs support the capabilities */ - for (int i = 0; i < p->nr_targets; i++) { - struct cxl_endpoint_decoder *cxled = p->targets[i]; - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - - if (!cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_INJECT) || - !cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_CLEAR)) { - poison_supported = false; - break; - } - } - - if (poison_supported) { - struct dentry *dentry; - - dentry = cxl_debugfs_create_dir(dev_name(dev)); - debugfs_create_file("inject_poison", 0200, dentry, cxlr, - &cxl_poison_inject_fops); - debugfs_create_file("clear_poison", 0200, dentry, cxlr, - &cxl_poison_clear_fops); - rc = devm_add_action_or_reset(dev, remove_debugfs, dentry); - if (rc) - return rc; - } + rc = cxl_region_setup_poison(cxlr); + if (rc) + return rc; switch (cxlr->mode) { case CXL_PARTMODE_PMEM: From 208f432406b7ed446c061d68cc73efd85b575d3f Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Wed, 14 Jan 2026 17:48:29 +0100 Subject: [PATCH 16/16] cxl: Disable HPA/SPA translation handlers for Normalized Addressing The root decoder provides the callbacks hpa_to_spa and spa_to_hpa to perform Host Physical Address (HPA) and System Physical Address translations, respectively. The callbacks are required to convert addresses when HPA != SPA. 
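The effect of the new region flag on DPA-to-HPA conversion can be modeled in a small userspace sketch. The flag bit number and the all-ones failure value come from this patch's diff; the helper name and the trivial mapping are hypothetical stand-ins for cxl_dpa_to_hpa():

```c
#include <assert.h>
#include <stdint.h>

/* Region flag bit number from cxl.h (used with test_bit() in the kernel) */
#define CXL_REGION_F_NORMALIZED_ADDRESSING	3

/*
 * Hypothetical stand-in for cxl_dpa_to_hpa(): with Normalized
 * Addressing and no translation callback, a DPA cannot be mapped to
 * an HPA, so fail with all-ones as the kernel code does (ULLONG_MAX).
 */
static uint64_t toy_dpa_to_hpa(unsigned long region_flags, uint64_t hpa_base,
			       uint64_t dpa)
{
	if (region_flags & (1UL << CXL_REGION_F_NORMALIZED_ADDRESSING))
		return UINT64_MAX;
	return hpa_base + dpa;	/* trivial 1-way mapping for the sketch */
}
```

Callers that see the all-ones value skip reporting an HPA, which is how poison tracing and event records degrade gracefully on these platforms.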
XOR interleaving depends on this mechanism, and the necessary handlers are implemented. The translation handlers are used for poison injection (trace_cxl_poison, cxl_poison_inject_fops) and error handling (cxl_event_trace_record). In AMD Zen5 systems with Normalized Addressing, endpoint addresses are not SPAs, and translation handlers are required for these features to function correctly. Since ACPI PRM translation could be expensive in tracing or error-handling code paths, do not yet enable translations there, to avoid intensive use. Instead, disable those features, which are used only for debugging and enhanced logging. Introduce the flag CXL_REGION_F_NORMALIZED_ADDRESSING that indicates Normalized Addressing for a region and use it to disable poison injection and DPA to HPA conversion. Note: Dropped unused CXL_DECODER_F_MASK macro. [dj: Fix commit log CXL_REGION_F_NORM_ADDR to CXL_REGION_F_NORMALIZED_ADDRESSING ] Reviewed-by: Alison Schofield Signed-off-by: Robert Richter Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Link: https://patch.msgid.link/20260114164837.1076338-14-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/core/atl.c | 3 +++ drivers/cxl/core/region.c | 33 +++++++++++++++++++++++-------- drivers/cxl/cxl.h | 9 ++++++++- 3 files changed, 36 insertions(+), 9 deletions(-) diff --git a/drivers/cxl/core/atl.c b/drivers/cxl/core/atl.c index 09d0ea1792d9..310668786189 100644 --- a/drivers/cxl/core/atl.c +++ b/drivers/cxl/core/atl.c @@ -169,8 +169,11 @@ static int cxl_prm_setup_root(struct cxl_root *cxl_root, void *data) * decoders in the BIOS would prevent a capable kernel (or * other operating systems) from shutting down auto-generated * regions and managing resources dynamically. + * + * Indicate that Normalized Addressing is enabled.
*/ cxld->flags |= CXL_DECODER_F_LOCK; + cxld->flags |= CXL_DECODER_F_NORMALIZED_ADDRESSING; ctx->hpa_range = hpa_range; ctx->interleave_ways = ways; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 80cd77f0842e..8e92b491d686 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1097,14 +1097,16 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr, return 0; } -static void cxl_region_set_lock(struct cxl_region *cxlr, - struct cxl_decoder *cxld) +static void cxl_region_setup_flags(struct cxl_region *cxlr, + struct cxl_decoder *cxld) { - if (!test_bit(CXL_DECODER_F_LOCK, &cxld->flags)) - return; + if (test_bit(CXL_DECODER_F_LOCK, &cxld->flags)) { + set_bit(CXL_REGION_F_LOCK, &cxlr->flags); + clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); + } - set_bit(CXL_REGION_F_LOCK, &cxlr->flags); - clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); + if (test_bit(CXL_DECODER_F_NORMALIZED_ADDRESSING, &cxld->flags)) + set_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags); } /** @@ -1218,7 +1220,7 @@ static int cxl_port_attach_region(struct cxl_port *port, } } - cxl_region_set_lock(cxlr, cxld); + cxl_region_setup_flags(cxlr, cxld); rc = cxl_rr_ep_add(cxl_rr, cxled); if (rc) { @@ -2493,7 +2495,7 @@ static struct cxl_region *cxl_region_alloc(struct cxl_root_decoder *cxlrd, int i device_set_pm_not_required(dev); dev->bus = &cxl_bus_type; dev->type = &cxl_region_type; - cxl_region_set_lock(cxlr, &cxlrd->cxlsd.cxld); + cxl_region_setup_flags(cxlr, &cxlrd->cxlsd.cxld); return cxlr; } @@ -3132,6 +3134,13 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, u8 eiw = 0; int pos; + /* + * Conversion between SPA and DPA is not supported in + * Normalized Address mode. 
+ */ + if (test_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags)) + return ULLONG_MAX; + for (int i = 0; i < p->nr_targets; i++) { if (cxlmd == cxled_to_memdev(p->targets[i])) { cxled = p->targets[i]; @@ -3922,6 +3931,14 @@ static int cxl_region_setup_poison(struct cxl_region *cxlr) struct cxl_region_params *p = &cxlr->params; struct dentry *dentry; + /* + * Do not enable poison injection in Normalized Address mode. + * Conversion between SPA and DPA is required for this, but it is + * not supported in this mode. + */ + if (test_bit(CXL_REGION_F_NORMALIZED_ADDRESSING, &cxlr->flags)) + return 0; + /* Create poison attributes if all memdevs support the capabilities */ for (int i = 0; i < p->nr_targets; i++) { struct cxl_endpoint_decoder *cxled = p->targets[i]; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 20b0fd43fa7b..de30a87600be 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -332,7 +332,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport); #define CXL_DECODER_F_TYPE3 BIT(3) #define CXL_DECODER_F_LOCK BIT(4) #define CXL_DECODER_F_ENABLE BIT(5) -#define CXL_DECODER_F_MASK GENMASK(5, 0) +#define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6) enum cxl_decoder_type { CXL_DECODER_DEVMEM = 2, @@ -525,6 +525,13 @@ enum cxl_partition_mode { */ #define CXL_REGION_F_LOCK 2 +/* + * Indicate Normalized Addressing. Use it to disable SPA conversion if + * HPA != SPA and an address translation callback handler does not + * exist. Flag is needed by AMD Zen5 platforms. + */ +#define CXL_REGION_F_NORMALIZED_ADDRESSING 3 + /** * struct cxl_region - CXL region * @dev: This region's device