linux/arch
Linus Torvalds 47ee3f1dd9 x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd5 ("x86: inline the 'rep movs' in
user copies for the FSRM case").

We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.

However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.

And it turns out that with the simplified the calling convention for the
non-FSRM case in commit 427fda2c8a ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.

Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code.  This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".

Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ff: "x86: don't use REP_GOOD or ERMS for
small memory copies").  But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 pages allocations are possible.

In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".

Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26 12:34:20 -07:00
..
alpha Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
arc Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
arm arm64 fixes for -rc3 2023-05-19 11:05:42 -07:00
arm64 ARM: 2023-05-21 13:58:37 -07:00
csky arch/csky patches for 6.4 2023-05-04 12:25:05 -07:00
hexagon Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
ia64 Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
loongarch Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
m68k m68k: Move signal frame following exception on 68020/030 2023-05-22 13:51:20 +02:00
microblaze Kconfig: introduce HAS_IOPORT option and select it as necessary 2023-04-05 22:15:19 +02:00
mips Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
nios2 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
openrisc Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
parisc parisc: Fix flush_dcache_page() for usage from irq context 2023-05-24 19:03:49 +02:00
powerpc powerpc/iommu: Incorrect DDW Table is referenced for SR-IOV device 2023-05-17 00:54:55 +10:00
riscv Probes fixes for 6.4-rc1: 2023-05-18 09:04:45 -07:00
s390 s390 updates for 6.4-rc3 2023-05-19 11:11:04 -07:00
sh Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
sparc Locking changes in v6.4: 2023-05-05 12:56:55 -07:00
um um: harddog: fix modular build 2023-05-10 00:21:30 +02:00
x86 x86: re-introduce support for ERMS copies for user space accesses 2023-05-26 12:34:20 -07:00
xtensa Xtensa fixes for v6.4: 2023-05-23 15:21:34 -07:00
.gitignore
Kconfig lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme 2023-03-28 16:20:08 -07:00