mirror of
https://github.com/torvalds/linux.git
synced 2026-03-08 03:44:45 +01:00
Some platforms (e.g. R-Car S4) do not gain from using a DMAC on the TX path in ntb_transport and end up CPU-bound on memcpy_toio(). Add a module parameter 'tx_memcpy_offload' that moves the TX memcpy_toio() and descriptor writes to a per-QP kernel thread. It is disabled by default.

This change also fixes a rare ordering hazard in ntb_tx_copy_callback(), which was observed on R-Car S4 once throughput improved with the new module parameter: the DONE flag write to the peer MW, which is WC mapped, could be observed after the DB/MSI trigger. Both operations are posted PCIe MWr (often via different OB iATUs), so WC buffering and bridges may reorder visibility. Insert dma_mb() to enforce store->load ordering and then read back hdr->flags to flush the posted write before ringing the doorbell / issuing MSI.

While at it, update tx_index with WRITE_ONCE() at the earliest possible location to make ntb_transport_tx_free_entry() robust.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

| | | |
|---|---|---|
| hw | | |
| test | | |
| core.c | | |
| Kconfig | | |
| Makefile | | |
| msi.c | | |
| ntb_transport.c | | |
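
The ordering fix described in the commit message (write the DONE flag, issue a barrier, read the flag back, then notify the peer) can be illustrated with a small user-space sketch. This is not the kernel code: `struct payload_header`, `struct sim_peer`, `DESC_DONE_FLAG`, `ring_doorbell()` and `publish_done_then_notify()` are hypothetical names invented for illustration, and `atomic_thread_fence()` only stands in for the kernel's `dma_mb()`. On real hardware the read-back is what forces the posted PCIe write to complete; in this simulation it only shows the sequence of steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_DONE_FLAG 0x1u            /* hypothetical flag value */

/* Hypothetical stand-in for the header living in the peer memory window. */
struct payload_header {
    volatile uint32_t flags;
};

/* Simulated peer: records what it saw at the moment the doorbell rang. */
struct sim_peer {
    struct payload_header hdr;
    uint32_t flags_seen_at_doorbell;
    unsigned int doorbells;
};

/* Stand-in for the DB/MSI trigger: the peer samples the flags on notify. */
static void ring_doorbell(struct sim_peer *peer)
{
    peer->flags_seen_at_doorbell = peer->hdr.flags;
    peer->doorbells++;
}

/*
 * Publish the DONE flag, then notify, in the order the commit describes:
 *   1. store the flag into the (simulated) peer window,
 *   2. full barrier so the later load cannot be ordered before the store
 *      (stand-in for dma_mb(), an assumption of this sketch),
 *   3. read the flag back (on real hardware this flushes the posted write),
 *   4. only then ring the doorbell.
 */
static uint32_t publish_done_then_notify(struct sim_peer *peer)
{
    uint32_t readback;

    peer->hdr.flags = DESC_DONE_FLAG;
    atomic_thread_fence(memory_order_seq_cst);
    readback = peer->hdr.flags;
    ring_doorbell(peer);
    return readback;
}

/* Usage example: the peer must see DONE by the time it is notified. */
static void self_check(void)
{
    struct sim_peer peer = { .hdr = { 0 }, .flags_seen_at_doorbell = 0, .doorbells = 0 };
    uint32_t readback = publish_done_then_notify(&peer);

    assert(readback == DESC_DONE_FLAG);
    assert(peer.flags_seen_at_doorbell == DESC_DONE_FLAG);
    assert(peer.doorbells == 1);
}
```

The point of the sketch is the placement of the read-back between the flag store and the notification. A single-threaded simulation cannot reproduce the reordering that WC buffering and PCIe bridges can cause on real hardware.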