Commit graph

1011 commits

Author SHA1 Message Date
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
505d195b0f Char/Misc/IIO driver changes for 7.0-rc1
Here is the big set of char/misc/iio and other smaller driver subsystem
 changes for 7.0-rc1.  Lots of little things in here, including:
   - Loads of iio driver changes and updates and additions
   - gpib driver updates
   - interconnect driver updates
   - i3c driver updates
   - hwtracing (coresight and intel) driver updates
   - deletion of the obsolete mwave driver
   - binder driver updates (rust and c versions)
   - mhi driver updates (causing a merge conflict, see below)
   - mei driver updates
   - fsi driver updates
   - eeprom driver updates
   - lots of other small char and misc driver updates and cleanups
 
 All of these have been in linux-next for a while, with no reported
 issues except for a merge conflict with your tree due to the mhi driver
 changes in the drivers/net/wireless/ath/ath12k/mhi.c file.  To fix that
 up, just delete the "auto_queue" structure fields being set, see this
 message for the full change needed:
 	https://lore.kernel.org/r/aXD6X23btw8s-RZP@sirena.org.uk
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaZRxOg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykIrACgs9S+A/GG9X0Kvc+ND/J1XYZpj3QAoKl0yXGj
 SV1SR/giEBc7iKV6Dn6O
 =jbok
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc/IIO driver updates from Greg KH:
 "Here is the big set of char/misc/iio and other smaller driver
  subsystem changes for 7.0-rc1. Lots of little things in here,
  including:

   - Loads of iio driver changes and updates and additions

   - gpib driver updates

   - interconnect driver updates

   - i3c driver updates

   - hwtracing (coresight and intel) driver updates

   - deletion of the obsolete mwave driver

   - binder driver updates (rust and c versions)

   - mhi driver updates (causing a merge conflict, see below)

   - mei driver updates

   - fsi driver updates

   - eeprom driver updates

   - lots of other small char and misc driver updates and cleanups

  All of these have been in linux-next for a while, with no reported
  issues"

* tag 'char-misc-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (297 commits)
  mux: mmio: fix regmap leak on probe failure
  rust_binder: return p from rust_binder_transaction_target_node()
  drivers: android: binder: Update ARef imports from sync::aref
  rust_binder: fix needless borrow in context.rs
  iio: magn: mmc5633: Fix Kconfig for combination of I3C as module and driver builtin
  iio: sca3000: Fix a resource leak in sca3000_probe()
  iio: proximity: rfd77402: Add interrupt handling support
  iio: proximity: rfd77402: Document device private data structure
  iio: proximity: rfd77402: Use devm-managed mutex initialization
  iio: proximity: rfd77402: Use kernel helper for result polling
  iio: proximity: rfd77402: Align polling timeout with datasheet
  iio: cros_ec: Allow enabling/disabling calibration mode
  iio: frequency: ad9523: correct kernel-doc bad line warning
  iio: buffer: buffer_impl.h: fix kernel-doc warnings
  iio: gyro: itg3200: Fix unchecked return value in read_raw
  MAINTAINERS: add entry for ADE9000 driver
  iio: accel: sca3000: remove unused last_timestamp field
  iio: accel: adxl372: remove unused int2_bitmask field
  iio: adc: ad7766: Use iio_trigger_generic_data_rdy_poll()
  iio: magnetometer: Remove IRQF_ONESHOT
  ...
2026-02-17 09:11:04 -08:00
Lizhi Hou
69674c1c70 accel/amdxdna: Move RPM resume into job run function
Currently, amdxdna_pm_resume_get() is called during job creation, and
amdxdna_pm_suspend_put() is called when the hardware notifies job
completion. If a job is canceled before it is run, no hardware
completion notification is generated, resulting in an unbalanced
runtime PM resume/suspend pair.

Fix this by moving amdxdna_pm_resume_get() to the job run path, ensuring
runtime PM is only resumed for jobs that are actually executed.

Fixes: 063db45183 ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260204171118.3165607-1-lizhi.hou@amd.com
2026-02-04 13:08:48 -08:00
Lizhi Hou
d19d963d2a accel/amdxdna: Fix incorrect DPM level after suspend/resume
The suspend routine sets the DPM level to 0, which unintentionally
overwrites the previously saved DPM level. As a result, the device always
resumes with DPM level 0 instead of restoring the original value.

Fix this by ensuring the suspend path does not overwrite the saved DPM
level, allowing the correct DPM level to be restored during resume.

Fixes: f4d7b8a6bc ("accel/amdxdna: Enhance power management settings")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260204171048.3165580-1-lizhi.hou@amd.com
2026-02-04 13:08:35 -08:00
Lizhi Hou
750817a7c4 accel/amdxdna: Fix incorrect error code returned for failed chain command
The driver currently returns an incorrect error code when a chain command
fails. In this case, ERT_CMD_STATE_ERROR is expected to be reported for
failed chain commands.

Fixes: aac243092b ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260203184037.2751889-1-lizhi.hou@amd.com
2026-02-03 11:35:53 -08:00
Lizhi Hou
b853007fdc accel/amdxdna: Remove hardware context status
One newly supported command does not require hardware context configuration
to be performed upfront. As a result, checking hardware context status
causes this command to fail incorrectly.

Remove hardware context status handling entirely. For other commands,
if userspace submits a request without configuring the hardware context
first, the firmware will report an error or time out as appropriate.

Fixes: aac243092b ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260202212450.2681273-1-lizhi.hou@amd.com
2026-02-03 09:20:24 -08:00
Zishun Yi
84dd57fb03 accel/amdxdna: Fix memory leak in amdxdna_ubuf_map
The amdxdna_ubuf_map() function allocates memory for sg and
internal sg table structures, but it fails to free them if subsequent
operations (sg_alloc_table_from_pages or dma_map_sgtable) fail.

Fixes: bd72d4acda ("accel/amdxdna: Support user space allocated buffer")
Signed-off-by: Zishun Yi <zishun.yi.dev@gmail.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Reviewed-by: Min Ma <mamin506@gmail.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260129171022.68578-1-zishun.yi.dev@gmail.com
2026-01-30 11:52:59 -08:00
Lizhi Hou
f1370241fe accel/amdxdna: Stop job scheduling across aie2_release_resource()
Running jobs on a hardware context while it is in the process of
releasing resources can lead to use-after-free and crashes.

Fix this by stopping job scheduling before calling
aie2_release_resource() and restarting it after the release completes.
Additionally, aie2_sched_job_run() now checks whether the hardware
context is still active.

Fixes: 4fd6ca90fc ("accel/amdxdna: Refactor hardware context destroy routine")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260130003255.2083255-1-lizhi.hou@amd.com
2026-01-30 11:52:53 -08:00
Lizhi Hou
a9162439ad accel/amdxdna: Hold mm structure across iommu_sva_unbind_device()
Some tests trigger a crash in iommu_sva_unbind_device() due to
accessing iommu_mm after the associated mm structure has been
freed.

Fix this by taking an explicit reference to the mm structure
after successfully binding the device, and releasing it only
after the device is unbound. This ensures the mm remains valid
for the entire SVA bind/unbind lifetime.

Fixes: be462c97b7 ("accel/amdxdna: Add hardware context")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260128002356.1858122-1-lizhi.hou@amd.com
2026-01-30 11:52:45 -08:00
Greg Kroah-Hartman
740aeaf40f MHI Host
--------
 
 - Add support for loading dual ELF image format firmware to Qcom Trust
   Management Engine Lit (TME-L) supported devices like QCC2072, which require
   separate ELF header for SBL and WLAN firmware segments in a single firmware.
 
 - Remove the MHI auto_queue feature support. This feature was added to offload
   the queuing of buffers from the client drivers to the MHI stack, but it caused
   a lot of race over the time. So remove this feature from the QRTR client
   driver and also from the MHI stack/controller drivers.
 
 - Move the .probe() and .remove() callbacks from driver level to bus level.
 
 MHI Endpoint
 ------------
 
 - Move the .probe() and .remove() callbacks from driver level to bus level.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEZ6VDKoFIy9ikWCeXVZ8R5v6RzvUFAmlvHJAACgkQVZ8R5v6R
 zvUpogf+Leqm2ma1cORmVV0YgSQkC/DNpNw2wD8oYKRp0hlLdiQTVyJ1l0LZhK1v
 ud/0dDQ25HFlnICltm/eyzDlRp/kJz2OVh4becoDmnUqgqnfR4G1WP3DHhPhvODo
 5wFiKU463RW1Ba0GsJxhPjSNwbXQeMhyBFOIr2XuWNI9wLCT6c5BRXbM8X7fVTcY
 6pNONiEb8YMpclJAIoZwgMLlvHut+RmpKuVdYzOUfzL71os6AmRaDbKFVaBjTqDK
 aXUrh5WZZyVndKcxIPTHzrRiZSbqG8laQp8ES2Lfk5ogX5MUEVJVZ14m2Tx9inyH
 YYaOnYAyeW0/ZP33/p1odfqrJ6RX4w==
 =8oyK
 -----END PGP SIGNATURE-----

Merge tag 'mhi-for-v6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next

Manivannan writes:

MHI Host
--------

- Add support for loading dual ELF image format firmware to Qcom Trust
  Management Engine Lit (TME-L) supported devices like QCC2072, which require
  separate ELF header for SBL and WLAN firmware segments in a single firmware.

- Remove the MHI auto_queue feature support. This feature was added to offload
  the queuing of buffers from the client drivers to the MHI stack, but it caused
  a lot of race over the time. So remove this feature from the QRTR client
  driver and also from the MHI stack/controller drivers.

- Move the .probe() and .remove() callbacks from driver level to bus level.

MHI Endpoint
------------

- Move the .probe() and .remove() callbacks from driver level to bus level.

* tag 'mhi-for-v6.20' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
  bus: mhi: ep: Use bus callbacks for .probe() and .remove()
  bus: mhi: host: Use bus callbacks for .probe() and .remove()
  bus: mhi: host: Drop the auto_queue support
  net: qrtr: Drop the MHI auto_queue feature for IPCR DL channels
  mhi: host: Add support for loading dual ELF image format
2026-01-20 16:40:49 +01:00
Dave Airlie
37b812b7fd drm-misc-next for 6.20:
Core Changes:
 
 - atomic: Introduce Gamma/Degamma LUT size check
 - gem: Fix a leak in drm_gem_get_unmapped_area
 - gpuvm: API sanitation for Rust bindings
 - panic: Few corner-cases fixes
 
 Driver Changes:
 
 - Replace system workqueue with percpu equivalent
 
 - amdxdna: Update message buffer allocation requirements, Update
   firmware version check
 - imagination: Add AM62P support
 - ivpu: Implement warm boot flow
 - rockchip: Get rid of atomic_check fixups, Add Rockchip RK3506 Support
 - rocket: Cleanups
 
 - bridge:
   - dw-hdmi-qp: Add support for HPD-less setups
 - panel:
   - mantix: Various power management related improvements
   - new panels: Innolux G150XGE-L05,
 
 - dma-buf:
   - cma: Call clear_page instead of memset
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaWjdfQAKCRAnX84Zoj2+
 dnRFAX9F/dDeYlwD4lelMAqoj/LMWQKoZcI/69A0HE6WwxsAsJJMk5eY2zwz++p2
 7IMcHb0BgLzYKv4dPoV58g+f25kXdxg1xuCPEBwQlHYTxN+owIMnvTbWUgWG4tvP
 yr2Zbt0rzg==
 =zL+Z
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2026-01-15' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.20:

Core Changes:

- atomic: Introduce Gamma/Degamma LUT size check
- gem: Fix a leak in drm_gem_get_unmapped_area
- gpuvm: API sanitation for Rust bindings
- panic: Few corner-cases fixes

Driver Changes:

- Replace system workqueue with percpu equivalent

- amdxdna: Update message buffer allocation requirements, Update
  firmware version check
- imagination: Add AM62P support
- ivpu: Implement warm boot flow
- rockchip: Get rid of atomic_check fixups, Add Rockchip RK3506 Support
- rocket: Cleanups

- bridge:
  - dw-hdmi-qp: Add support for HPD-less setups
- panel:
  - mantix: Various power management related improvements
  - new panels: Innolux G150XGE-L05,

- dma-buf:
  - cma: Call clear_page instead of memset

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260115-lilac-dragon-of-opposition-ac0a30@houat
2026-01-16 11:04:03 +10:00
Lizhi Hou
b36178488d accel/amdxdna: Fix notifier_wq flushing warning
Create notifier_wq with WQ_MEM_RECLAIM flag to fix the possible warning.

  workqueue: WQ_MEM_RECLAIM amdxdna_js:drm_sched_free_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM notifier_wq:0x0

Fixes: e486147c91 ("accel/amdxdna: Add BO import and export")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260113173624.256053-1-lizhi.hou@amd.com
2026-01-14 09:07:33 -08:00
Quentin Schulz
57d8ae1569 accel/rocket: factor out code with find_core_for_dev in rocket_remove
There already is a function to return the offset of the core for a given
struct device, so let's reuse that function instead of reimplementing
the same logic.

There's one change in behavior when a struct device is passed which
doesn't match any core's. Before, we would continue through
rocket_remove() but now we exit early, to match what other callers of
find_core_for_dev() (rocket_device_runtime_resume/suspend()) are doing.
This however should never happen. Aside from that, no intended change in
behavior.

Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://patch.msgid.link/20251215-rocket-reuse-find-core-v1-1-be86a1d2734c@cherry.de
2026-01-10 17:49:14 +01:00
Quentin Schulz
34f4495a7f accel/rocket: fix unwinding in error path in rocket_probe
When rocket_core_init() fails (as could be the case with EPROBE_DEFER),
we need to properly unwind by decrementing the counter we just
incremented and if this is the first core we failed to probe, remove the
rocket DRM device with rocket_device_fini() as well. This matches the
logic in rocket_remove(). Failing to properly unwind results in
out-of-bounds accesses.

Fixes: 0810d5ad88 ("accel/rocket: Add job submission IOCTL")
Cc: stable@vger.kernel.org
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://patch.msgid.link/20251215-rocket-error-path-v1-2-eec3bf29dc3b@cherry.de
2026-01-10 17:49:14 +01:00
Quentin Schulz
f509a081f6 accel/rocket: fix unwinding in error path in rocket_core_init
When rocket_job_init() is called, iommu_group_get() has already been
called, therefore we should call iommu_group_put() and make the
iommu_group pointer NULL. This aligns with what's done in
rocket_core_fini().

If pm_runtime_resume_and_get() somehow fails, not only should
rocket_job_fini() be called but we should also unwind everything done
before that, that is, disable PM, put the iommu_group, NULLify it and
then call rocket_job_fini(). This is exactly what's done in
rocket_core_fini() so let's call that function instead of duplicating
the code.

Fixes: 0810d5ad88 ("accel/rocket: Add job submission IOCTL")
Cc: stable@vger.kernel.org
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://patch.msgid.link/20251215-rocket-error-path-v1-1-eec3bf29dc3b@cherry.de
2026-01-10 17:49:14 +01:00
Lizhi Hou
f1eac46fe5 accel/amdxdna: Update firmware version check for latest firmware
The latest firmware increases the major version number. Update
aie2_check_protocol() to accept and support the new firmware version.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251219014356.2234241-2-lizhi.hou@amd.com
2026-01-08 09:48:43 -08:00
Lizhi Hou
1222e06934 accel/amdxdna: Update message DMA buffer allocation
The latest firmware requires the message DMA buffer to
  - have a minimum size of 8K
  - use a power-of-two size
  - be aligned to the buffer size
  - not cross 64M boundary

Update the buffer allocation logic to meet these requirements and support
the latest firmware.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251219014356.2234241-1-lizhi.hou@amd.com
2026-01-08 09:48:34 -08:00
Karol Wachowski
44e4c88951 accel/ivpu: Implement warm boot flow for NPU6 and unify boot handling
Starting from NPU6, the driver can pass boot parameters address through
the AON retention register and toggle between cold/warm boot types using
the boot_type parameter, while setting the cold boot entry point in both
cases.

Refactor the existing cold/warm boot handling to be consistent with the
new NPU6 boot flow requirements and still maintain compatibility with
older boot flows.

This will allow firmware to remove support for legacy warm boot starting
from NPU6.

Fixes: 550f4dd2ce ("accel/ivpu: Add support for Nova Lake's NPU")
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Link: https://patch.msgid.link/20251230142116.540026-1-maciej.falkowski@linux.intel.com
2026-01-08 11:49:37 +01:00
Manivannan Sadhasivam
51731792a2 net: qrtr: Drop the MHI auto_queue feature for IPCR DL channels
MHI stack offers the 'auto_queue' feature, which allows the MHI stack to
auto queue the buffers for the RX path (DL channel). Though this feature
simplifies the client driver design, it introduces race between the client
drivers and the MHI stack. For instance, with auto_queue, the 'dl_callback'
for the DL channel may get called before the client driver is fully probed.
This means, by the time the dl_callback gets called, the client driver's
structures might not be initialized, leading to NULL ptr dereference.

Currently, the drivers have to workaround this issue by initializing the
internal structures before calling mhi_prepare_for_transfer_autoqueue().
But even so, there is a chance that the client driver's internal code path
may call the MHI queue APIs before mhi_prepare_for_transfer_autoqueue() is
called, leading to similar NULL ptr dereference. This issue has been
reported on the Qcom X1E80100 CRD machines affecting boot.

So to properly fix all these races, drop the MHI 'auto_queue' feature
altogether and let the client driver (QRTR) manage the RX buffers manually.
In the QRTR driver, queue the RX buffers based on the ring length during
probe and recycle the buffers in 'dl_callback' once they are consumed. This
also warrants removing the setting of 'auto_queue' flag from controller
drivers.

Currently, this 'auto_queue' feature is only enabled for IPCR DL channel.
So only the QRTR client driver requires the modification.

Fixes: 227fee5fc9 ("bus: mhi: core: Add an API for auto queueing buffers for DL channel")
Fixes: 68a838b84e ("net: qrtr: start MHI channel after endpoit creation")
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/linux-arm-msm/ZyTtVdkCCES0lkl4@hovoldconsulting.com
Suggested-by: Chris Lew <quic_clew@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Acked-by: Jeff Johnson <jjohnson@kernel.org> # drivers/net/wireless/ath/...
Acked-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251218-qrtr-fix-v2-1-c7499bfcfbe0@oss.qualcomm.com
2025-12-31 16:24:04 +05:30
Dave Airlie
7bc0f871f9 drm-misc-next for 6.20:
Core Changes:
 
   - dma-buf: Add tracepoints
   - sched: Introduce new helpers
 
 Driver Changes:
 
   - amdxdna: Enable hardware context priority, Remove (obsolete and
     never public) NPU2 Support, Race condition fix
   - rockchip: Add RK3368 HDMI Support
   - rz-du: Add RZ/V2H(P) MIPI-DSI Support
 
   - panels:
     - st7571: Introduce SPI support
     - New panels: Sitronix ST7920, Samsung LTL106HL02, LG LH546WF1-ED01, HannStar HSD156JUW2
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaUUQPQAKCRAnX84Zoj2+
 djCkAYC0wujWi4gdMY50vISgrz8sNnYlsMpWKlEgU9oBlPTC226FQz9VNzjruW2D
 ftxAsRcBfjTEY3eadqq51+B/JzKwDnWpQskjOq6/MZbrlXvMhnokVBkWzYKEF/s/
 H+znO5LbfQ==
 =7lY/
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2025-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.20:

Core Changes:

  - dma-buf: Add tracepoints
  - sched: Introduce new helpers

Driver Changes:

  - amdxdna: Enable hardware context priority, Remove (obsolete and
    never public) NPU2 Support, Race condition fix
  - rockchip: Add RK3368 HDMI Support
  - rz-du: Add RZ/V2H(P) MIPI-DSI Support

  - panels:
    - st7571: Introduce SPI support
    - New panels: Sitronix ST7920, Samsung LTL106HL02, LG LH546WF1-ED01, HannStar HSD156JUW2

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20251219-arcane-quaint-skunk-e383b0@houat
2025-12-26 19:00:41 +10:00
Dave Airlie
6c8e404891 drm-misc-next for 6.19:
UAPI Changes:
 
   - panfrost: Add PANFROST_BO_SYNC ioctl
   - panthor: Add PANTHOR_BO_SYNC ioctl
 
 Core Changes:
 
   - atomic: Add drm_device pointer to drm_private_obj
   - bridge: Introduce drm_bridge_unplug, drm_bridge_enter, and
     drm_bridge_exit
   - dma-buf: Improve sg_table debugging
   - dma-fence: Add new helpers, and use them when needed
   - dp_mst: Avoid out-of-bounds access with VCPI==0
   - gem: Reduce page table overhead with transparent huge pages
   - panic: Report invalid panic modes
   - sched: Add TODO entries
   - ttm: Various cleanups
   - vblank: Various refactoring and cleanups
 
   - Kconfig cleanups
   - Removed support for kdb
 
 Driver Changes:
 
   - amdxdna: Fix race conditions at suspend, Improve handling of zero
     tail pointers, Fix cu_idx being overwritten during command setup
   - ast: Support imported cursor buffers
   -
   - panthor: Enable timestamp propagation, Multiple improvements and
     fixes to improve the overall robustness, notably of the scheduler.
 
   - panels:
     - panel-edp: Support for CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaTvY/gAKCRAnX84Zoj2+
 dvwQAX9/8TuCixEI9o5A9HRsKM3yD+PuwHKNmqIGB8NqTMsvuV1WyMvbtrFiiDCy
 ga92dJEBf2QNx07e9UnTn5ktvZMGWvUlmi5O7iimxnMj+zNoPItBJ+xvS0WgW0gI
 MmXXjg8ICg==
 =cT/H
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2025-12-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.19:

UAPI Changes:

  - panfrost: Add PANFROST_BO_SYNC ioctl
  - panthor: Add PANTHOR_BO_SYNC ioctl

Core Changes:

  - atomic: Add drm_device pointer to drm_private_obj
  - bridge: Introduce drm_bridge_unplug, drm_bridge_enter, and
    drm_bridge_exit
  - dma-buf: Improve sg_table debugging
  - dma-fence: Add new helpers, and use them when needed
  - dp_mst: Avoid out-of-bounds access with VCPI==0
  - gem: Reduce page table overhead with transparent huge pages
  - panic: Report invalid panic modes
  - sched: Add TODO entries
  - ttm: Various cleanups
  - vblank: Various refactoring and cleanups

  - Kconfig cleanups
  - Removed support for kdb

Driver Changes:

  - amdxdna: Fix race conditions at suspend, Improve handling of zero
    tail pointers, Fix cu_idx being overwritten during command setup
  - ast: Support imported cursor buffers
  -
  - panthor: Enable timestamp propagation, Multiple improvements and
    fixes to improve the overall robustness, notably of the scheduler.

  - panels:
    - panel-edp: Support for CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H

Signed-off-by: Dave Airlie <airlied@redhat.com>

[airlied: fix mm conflict]
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20251212-spectacular-agama-of-abracadabra-aaef32@penduick
2025-12-26 18:15:33 +10:00
Lizhi Hou
332070795b accel/amdxdna: Enable hardware context priority
Newer firmware supports hardware context priority. Set the priority based
on application input.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251217171719.2139025-1-lizhi.hou@amd.com
2025-12-18 10:36:44 -08:00
Lizhi Hou
7818618a09 accel/amdxdna: Enable temporal sharing only mode
Newer firmware versions prefer temporal sharing only mode. In this mode,
the driver no longer needs to manage AIE array column allocation. Instead,
a new field, num_unused_col, is added to the hardware context creation
request to specify how many columns will not be used by this hardware
context.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251217191150.2145937-1-lizhi.hou@amd.com
2025-12-18 10:36:33 -08:00
Lizhi Hou
3ef9384103 accel/amdxdna: Remove NPU2 support
NPU2 hardware was never publicly released and is now obsolete.
Remove all remaining NPU2 support from the driver.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251217190818.2145781-1-lizhi.hou@amd.com
2025-12-18 10:36:22 -08:00
Lizhi Hou
e8c28e16c3 accel/amdxdna: Remove amdxdna_flush()
amdxdna_flush() was introduced to ensure that the device does not access
a process address space after it has been freed. However, this is no
longer necessary because the driver now increments the mm reference count
when a command is submitted and decrements it only after the command has
completed. This guarantees that the process address space remains valid
for the entire duration of command execution. Remove amdxdna_flush to
simplify the teardown path.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251216031311.2033399-1-lizhi.hou@amd.com
2025-12-17 08:28:53 -08:00
Karol Wachowski
2359fe9313 accel/ivpu: Validate scatter-gather size against buffer size
Validate scatter-gather table size matches buffer object size before
mapping. Break mapping early if the table exceeds buffer size to
prevent overwriting existing mappings. Also validate the table is
not smaller than buffer size to avoid unmapped regions that trigger
MMU translation faults.

Log error and fail mapping operation on size mismatch to prevent
data corruption from mismatched host memory locations and NPU
addresses. Unmap any partially mapped buffer on failure.

Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251215070933.520377-1-karol.wachowski@linux.intel.com
2025-12-16 08:10:19 +01:00
Mario Limonciello (AMD)
7bbf6d15e9 accel/amdxdna: Block running under a hypervisor
SVA support is required, which isn't configured by hypervisor
solutions.

Closes: https://github.com/QubesOS/qubes-issues/issues/10275
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4656
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251213054513.87925-1-superm1@kernel.org
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-15 13:00:03 -06:00
Lizhi Hou
97f2757383 accel/amdxdna: Fix potential NULL pointer dereference in context cleanup
aie_destroy_context() is invoked during error handling in
aie2_create_context(). However, aie_destroy_context() assumes that the
context's mailbox channel pointer is non-NULL. If mailbox channel
creation fails, the pointer remains NULL and calling aie_destroy_context()
can lead to a NULL pointer dereference.

In aie2_create_context(), replace aie_destroy_context() with a function
which request firmware to remove the context created previously.

Fixes: be462c97b7 ("accel/amdxdna: Add hardware context")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251212183244.1826318-1-lizhi.hou@amd.com
2025-12-15 09:52:59 -08:00
Lizhi Hou
343f5683cf accel/amdxdna: Fix race where send ring appears full due to delayed head update
The firmware sends a response and interrupts the driver before advancing
the mailbox send ring head pointer. As a result, the driver may observe
the response and attempt to send a new request before the firmware has
updated the head pointer. In this window, the send ring still appears
full, causing the driver to incorrectly fail the send operation.

This race can be triggered more easily in a multithreaded environment,
leading to unexpected and spurious "send ring full" failures.

To address this, poll the send ring head pointer for up to 100us before
returning a full-ring condition. This allows the firmware time to update
the head pointer.

Fixes: b87f920b93 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251211045125.1724604-1-lizhi.hou@amd.com
2025-12-12 09:49:10 -08:00
Lizhi Hou
3d32eb7a5e accel/amdxdna: Fix cu_idx being cleared by memset() during command setup
For one command type, cu_idx is assigned before calling memset() on the
command structure. This results in cu_idx being overwritten, causing the
firmware to receive an incomplete or invalid command and leading to
unexpected command failures.

Fix this by moving the memset() call before initializing cu_idx so that
all fields are populated in the correct order.

Fixes: 71829d7f2f ("accel/amdxdna: Use MSG_OP_CHAIN_EXEC_NPU when supported")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251209211639.1636888-1-lizhi.hou@amd.com
2025-12-10 09:07:18 -08:00
Lizhi Hou
00ffe45ece accel/amdxdna: Fix race condition when checking rpm_on
When autosuspend is triggered, driver rpm_on flag is set to indicate that
a suspend/resume is already in progress. However, when a userspace
application submits a command during this narrow window,
amdxdna_pm_resume_get() may incorrectly skip the resume operation because
the rpm_on flag is still set. This results in commands being submitted
while the device has not actually resumed, causing unexpected behavior.

The set_dpm() is called by suspend/resume, it relied on rpm_on flag to
avoid calling into rpm suspend/resume recursivly. So to fix this, remove
the use of the rpm_on flag entirely. Instead, introduce aie2_pm_set_dpm()
which explicitly resumes the device before invoking set_dpm(). With this
change, set_dpm() is called directly inside the suspend or resume execution
path. Otherwise, aie2_pm_set_dpm() is called.

Fixes: 063db45183 ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251208165356.1549237-1-lizhi.hou@amd.com
2025-12-09 11:51:03 -08:00
Lizhi Hou
cd77d5a4aa accel/amdxdna: Fix tail-pointer polling in mailbox_get_msg()
In mailbox_get_msg(), mailbox_reg_read_non_zero() is called to poll for a
non-zero tail pointer. This assumed that a zero value indicates an error.
However, certain corner cases legitimately produce a zero tail pointer.
To handle these cases, remove mailbox_reg_read_non_zero(). The zero tail
pointer will be treated as a valid rewind event.

Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251204181603.793824-1-lizhi.hou@amd.com
2025-12-08 08:29:34 -08:00
Lizhi Hou
3d3ac202c7 accel/amdxdna: Poll MPNPU_PWAITMODE after requesting firmware suspend
After issuing a firmware suspend request, the driver must ensure that the
suspend operation has completed before proceeding. Add polling of the
MPNPU_PWAITMODE register to confirm that the firmware has fully entered
the suspended state. This prevents race conditions where subsequent
operations assume the firmware is idle before it has actually completed
its suspend sequence.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251202165427.507414-1-lizhi.hou@amd.com
2025-12-02 16:31:14 -08:00
Lizhi Hou
ca25834123 accel/amdxdna: Fix deadlock between context destroy and job timeout
Hardware context destroy function holds dev_lock while waiting for all jobs
to complete. The timeout job also needs to acquire dev_lock, this leads to
a deadlock.

Fix the issue by temporarily releasing dev_lock before waiting for all
jobs to finish, and reacquiring it afterward.

Fixes: 4fd6ca90fc ("accel/amdxdna: Refactor hardware context destroy routine")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181050.1293125-1-lizhi.hou@amd.com
2025-11-13 09:10:43 -08:00
Lizhi Hou
6ff9385c07 accel/amdxdna: Clear mailbox interrupt register during channel creation
The mailbox interrupt register is not always cleared when a mailbox channel
is created. This can leave stale interrupt states from previous operations.

Fix this by explicitly clearing the interrupt register in the mailbox
channel creation function.

Fixes: b87f920b93 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181115.1293158-1-lizhi.hou@amd.com
2025-11-13 08:36:08 -08:00
Karol Wachowski
ccb7725df5 accel/ivpu: Fix warning due to undefined CONFIG_PROC_FS
Change #if to #ifdef CONFIG_PROC_FS to fix warning reported by test robot:
drivers/accel/ivpu/ivpu_drv.c:458:5: warning: "CONFIG_PROC_FS" is not defined, evaluates to 0 [-Wundef]

Fixes: 63cc028484 ("accel/ivpu: Add fdinfo support for memory statistics")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Andrzej.Kacprowski@linux.intel.com
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251112071911.1136934-1-karol.wachowski@linux.intel.com
2025-11-12 09:34:18 +01:00
Karol Wachowski
5ce6778a31 accel/ivpu: Count only resident buffers in memory utilization
Do not count buffer objects that have no backing pages, including imported
buffers where pages are set by VM faults triggered by userspace or pinned
by other drivers. Instead, return information about actual memory used by
the NPU.

Counting imported buffers results in incorrect calculations when
the same pages are counted multiple times, giving overly high
results.

Fixes: 7bfc9fa995 ("accel/ivpu: Expose NPU memory utilization info in sysfs")
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251106101052.1050348-3-karol.wachowski@linux.intel.com
2025-11-12 07:58:44 +01:00
Karol Wachowski
63cc028484 accel/ivpu: Add fdinfo support for memory statistics
Implement DRM fdinfo interface to expose memory usage statistics
for NPU device file descriptors. Exclude unpinned and imported
buffers from resident memory calculations to provide accurate
memory usage reporting.

Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251106101052.1050348-2-karol.wachowski@linux.intel.com
2025-11-12 07:58:44 +01:00
Zack McKevitt
a2b0c33e94 accel/qaic: Add qaic_ prefix to irq_polling_work
Rename irq_polling_work to qaic_irq_polling_work to reduce ambiguity
and avoid potential naming conflicts in the future.

Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://patch.msgid.link/20251031192511.3179130-1-zachary.mckevitt@oss.qualcomm.com
2025-11-07 11:27:46 -07:00
Pranjal Ramajor Asha Kanojiya
6bc1fe6c74 accel/qaic: Collect crashdump from SSR channel
After subsystem of the device has crashed it sends a message with
command DEBUG_TRANSFER_INFO to kernel(host). Send ACK for that message
and then prepare to collect the ramdump of the subsystem

Steps of crashdump collection is as follows,
1)  Device sends DEBUG_TRANSFER_INFO message indicating that device wants
    to send crashdump.
2)  Send an acknowledgment to that message either ACK or NACK.
    a) NACK will inform the device that host will not download the
       crashdump
    b) ACK will inform the device that host will download the crashdump
3)  Along with the DEBUG_TRANSFER_INFO we receive a table base address and
    its length, use that to download that table from device.
    a) This table is meta data of the crashdump and not the actual
       crashdump.
4)  After we respond as ACK for message received on step 1) we start
    downloading the table. Use series of MEMORY_READ/MEMORY_READ_RSP SSR
    commands to download the entire table.
5)  Each entry in the table represents a segment of crashdump. Once the
    table downloading is complete, iterate through each entry of table
    and download each crashdump segment(same as table itself). Table entry
    contains the memory base address and length along with other info.
6)  After the entire crashdump is downloaded send DEBUG_TRANSFER_DONE
    which marks that host is terminating the crashdump transfer. This
    message can be send in both success or error case.
7)  After receiving DEBUG_TRANSFER_DONE_RSP hand over the crashdump to
    dev_coredumpv() and free all the necessary memory.

Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Co-developed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <pkanojiy@codeaurora.org>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://patch.msgid.link/20251031174059.2814445-4-zachary.mckevitt@oss.qualcomm.com
2025-11-07 11:16:51 -07:00
Jeffrey Hugo
9675093ace accel/qaic: Implement basic SSR handling
Subsystem restart (SSR) for a qaic device means that a NSP has crashed,
and will be restarted.  However the restart process will lose any state
associated with activation, so the user will need to do some recovery.

While SSR has the provision to collect a crash dump, this patch does not
implement support for it.

Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Co-developed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Co-developed-by: Troy Hanson <quic_thanson@quicinc.com>
Signed-off-by: Troy Hanson <quic_thanson@quicinc.com>
Co-developed-by: Aswin Venkatesan <aswivenk@qti.qualcomm.com>
Signed-off-by: Aswin Venkatesan <aswivenk@qti.qualcomm.com>
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
[jhugo: Fix minor checkpatch whitespace issues]
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://patch.msgid.link/20251031174059.2814445-3-zachary.mckevitt@oss.qualcomm.com
2025-11-07 11:01:18 -07:00
Pranjal Ramajor Asha Kanojiya
f286066ed9 accel/qaic: Add DMA Bridge Channel(DBC) sysfs and uevents
Expose sysfs files for each DBC representing the current state of that DBC.
For example, sysfs for DBC ID 0 and accel minor number 0 looks like this,

/sys/class/accel/accel0/dbc0_state

Following are the states and their corresponding values,
DBC_STATE_IDLE (0)
DBC_STATE_ASSIGNED (1)
DBC_STATE_BEFORE_SHUTDOWN (2)
DBC_STATE_AFTER_SHUTDOWN (3)
DBC_STATE_BEFORE_POWER_UP (4)
DBC_STATE_AFTER_POWER_UP (5)

Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://patch.msgid.link/20251031174059.2814445-2-zachary.mckevitt@oss.qualcomm.com
2025-11-07 10:55:39 -07:00
Lizhi Hou
1750e652a9 accel/amdxdna: Treat power-off failure as unrecoverable error
Failing to set power off indicates an unrecoverable hardware or firmware
error. Update the driver to treat such a failure as a fatal condition
and stop further operations that depend on successful power state
transition.

This prevents undefined behavior when the hardware remains in an
unexpected state after a failed power-off attempt.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251106180521.1095218-1-lizhi.hou@amd.com
2025-11-07 08:52:29 -08:00
Lizhi Hou
dea9f84776 accel/amdxdna: Fix dma_fence leak when job is canceled
Currently, dma_fence_put(job->fence) is called in job notification
callback. However, if a job is canceled, the notification callback is never
invoked, leading to a memory leak. Move dma_fence_put(job->fence)
to the job cleanup function to ensure the fence is always released.

Fixes: aac243092b ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251105194140.1004314-1-lizhi.hou@amd.com
2025-11-06 09:23:42 -08:00
Youssef Samir
3301ef0a72 accel/qaic: Add support for PM callbacks
Add initial support for suspend and hibernation PM callbacks to QAIC.
The device can be suspended any time in which the data path is not
busy as queued I/O operations are lost on suspension and cannot be
resumed after suspend.

Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com>
Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://patch.msgid.link/20251029181808.1216466-1-zachary.mckevitt@oss.qualcomm.com
2025-11-05 15:41:43 -07:00
Lizhi Hou
3a0ff7b98a accel/amdxdna: Support preemption requests
The driver checks the firmware version during initialization.If preemption
is supported, the driver configures preemption accordingly and handles
userspace preemption requests. Otherwise, the driver returns an error for
userspace preemption requests.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104185340.897560-1-lizhi.hou@amd.com
2025-11-05 08:56:28 -08:00