Mirror of https://github.com/torvalds/linux.git, synced 2026-03-14 02:06:15 +01:00

On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:

	select_insn:
	      jmp *jumptable(, %rax, 8)
	      ...
	ALU64_ADD_X:
	      ...
	      jmp *jumptable(, %rax, 8)
	ALU_ADD_X:
	      ...
	      jmp *jumptable(, %rax, 8)

to this:

	select_insn:
	      mov jumptable, %r12
	      jmp *(%r12, %rax, 8)
	      ...
	ALU64_ADD_X:
	      ...
	      jmp *(%r12, %rax, 8)
	ALU_ADD_X:
	      ...
	      jmp *(%r12, %rax, 8)

The jumptable address is placed in a register once, at the beginning of
the function. The function execution can then go through multiple
indirect jumps which rely on that same register value. This has a few
issues:

1) Objtool isn't smart enough to be able to track such a register value
   across multiple recursive indirect jumps through the jump table.

2) With CONFIG_RETPOLINE enabled, this optimization actually results in
   a small slowdown. I measured a ~4.7% slowdown in the test_bpf
   "tcpdump port 22" selftest.

This slowdown is actually predicted by the GCC manual:

    Note: When compiling a program using computed gotos, a GCC
    extension, you may get better run-time performance if you
    disable the global common subexpression elimination pass by
    adding -fno-gcse to the command line.

So just disable the optimization for this function.
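
As a rough illustration of what disabling the pass for one function can look like, here is a minimal sketch of a computed-goto interpreter with GCSE turned off for that function only. It relies on GCC's `optimize` function attribute. The macro name `NO_GCSE`, the toy opcodes, and `run_prog()` are hypothetical names made up for this example and are not taken from the patch. Whether the table address is actually hoisted into a register without the attribute depends on the GCC version and flags.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper macro for this sketch. GCC accepts per-function
 * optimization options through the "optimize" attribute; other compilers
 * get an empty definition.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_GCSE __attribute__((optimize("-fno-gcse")))
#else
#define NO_GCSE
#endif

/* Toy opcodes for illustration only, not real BPF instructions. */
enum { OP_ADD, OP_SUB, OP_EXIT };

struct insn {
	int op;
	long imm;
};

/* Toy interpreter dispatching through a table of label addresses. */
NO_GCSE long run_prog(const struct insn *insn)
{
	/* Computed gotos ("labels as values") are a GNU C extension. */
	static const void * const jumptable[] = {
		[OP_ADD]  = &&do_add,
		[OP_SUB]  = &&do_sub,
		[OP_EXIT] = &&do_exit,
	};
	long acc = 0;

	assert(insn != NULL);

/* Every handler ends with its own indirect jump through the table. */
#define DISPATCH() goto *jumptable[insn->op]
	DISPATCH();

do_add:
	acc += insn->imm;
	insn++;
	DISPATCH();
do_sub:
	acc -= insn->imm;
	insn++;
	DISPATCH();
do_exit:
	return acc;
#undef DISPATCH
}
```

Running `run_prog()` on the three-instruction program `{OP_ADD, 7}, {OP_SUB, 2}, {OP_EXIT, 0}` returns 5. Scoping the option to one function leaves the pass enabled for the rest of the translation unit, which is the point of "just disable the optimization for this function".
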
Fixes:

| File | | |
|---|---|---|
| arraymap.c | ||
| bpf_lru_list.c | ||
| bpf_lru_list.h | ||
| btf.c | ||
| cgroup.c | ||
| core.c | ||
| cpumap.c | ||
| devmap.c | ||
| disasm.c | ||
| disasm.h | ||
| hashtab.c | ||
| helpers.c | ||
| inode.c | ||
| local_storage.c | ||
| lpm_trie.c | ||
| Makefile | ||
| map_in_map.c | ||
| map_in_map.h | ||
| offload.c | ||
| percpu_freelist.c | ||
| percpu_freelist.h | ||
| queue_stack_maps.c | ||
| reuseport_array.c | ||
| stackmap.c | ||
| syscall.c | ||
| tnum.c | ||
| verifier.c | ||
| xskmap.c | ||