linux

mirror of https://github.com/torvalds/linux.git synced 2026-03-08 03:24:45 +01:00

Author	SHA1	Message	Date
Linus Torvalds	3ad66a34cc	io_uring-7.0-20260305 Merge tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Fix a typo in the mock_file help text - Fix a comment regarding IORING_SETUP_TASKRUN_FLAG in the io_uring.h UAPI header - Use READ_ONCE() for reading refill queue entries - Reject SEND_VECTORIZED for fixed buffer sends, as it isn't implemented. Currently this flag is silently ignored. This is in preparation for making these work, but first we need a fixup so that older kernels will correctly reject them - Ensure "0" means default for the rx page size * tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/zcrx: use READ_ONCE with user shared RQEs io_uring/mock: Fix typo in help text io_uring/net: reject SEND_VECTORIZED when unsupported io_uring: correct comment for IORING_SETUP_TASKRUN_FLAG io_uring/zcrx: don't set rx_page_size when not requested	2026-03-06 08:31:36 -08:00
Pavel Begunkov	531bb98a03	io_uring/zcrx: use READ_ONCE with user shared RQEs Refill queue entries are shared with the user space, use READ_ONCE when reading them. Fixes: `34a3e60821` ("io_uring/zcrx: implement zerocopy receive pp memory provider") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-04 06:30:39 -07:00
Pavel Begunkov	c36e28becd	io_uring/net: reject SEND_VECTORIZED when unsupported IORING_SEND_VECTORIZED with registered buffers is not implemented but could be. Don't silently ignore the flag in this case but reject it with an error. It only affects sendzc as normal sends don't support registered buffers. Fixes: `6f02527729` ("io_uring/net: Allow to do vectorized send") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-03-02 09:17:04 -07:00
Jakub Kicinski	3d17d76d1f	io_uring/zcrx: don't set rx_page_size when not requested The rx_buf_len parameter was recently added to the Rx zero-copy implementation. The expectation is that when not set system will maintain previous behavior and use the default buffer size (PAGE_SIZE). This works correctly at the iouring level, but we don't preserve the same "zero means default" semantics when registering the memory provider on the netdev. mp_param.rx_page_size is unconditionally set to PAGE_SIZE. This causes __net_mp_open_rxq() to check for QCFG_RX_PAGE_SIZE support in the driver, and return -EOPNOTSUPP for drivers that don't advertise it -- even though the user never asked for large buffers. Only set mp_param.rx_page_size when rx_buf_len was explicitly provided, so that the default page size path works on all zcrx-capable drivers. mlx5 and fbnic only support 4kB pages in the current release. Fixes: `795663b4d1` ("io_uring/zcrx: implement large rx buffer support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-27 12:32:49 -07:00
Linus Torvalds	530b0b61df	io_uring-7.0-20260227 Merge tag 'io_uring-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: "Just two minor patches in here, ensuring the use of READ_ONCE() for sqe field reading is consistent across the codebase. There were two missing cases, now they are covered too" * tag 'io_uring-7.0-20260227' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/timeout: READ_ONCE sqe->addr io_uring/cmd_net: use READ_ONCE() for ->addr3 read	2026-02-27 10:39:11 -08:00
Pavel Begunkov	85f6c439a6	io_uring/timeout: READ_ONCE sqe->addr We should use READ_ONCE when reading from a SQE, make sure timeout gets a stable timespec address. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-25 08:36:05 -07:00
Jens Axboe	a46435537a	io_uring/cmd_net: use READ_ONCE() for ->addr3 read Any SQE read should use READ_ONCE(), to ensure the result is read once and only once. Doesn't really matter for this case, but it's better to keep these 100% consistent and always use READ_ONCE() for the prep side of SQE handling. Fixes: `5d24321e4c` ("io_uring: Introduce getsockname io_uring cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-24 11:38:34 -07:00
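The "read once and only once" rule from the two READ_ONCE commits above can be illustrated in plain C. This is a minimal userspace sketch, not kernel code: `DEMO_READ_ONCE`, `struct demo_sqe`, and `demo_prep_addr` are hypothetical names, and the macro relies on the GCC/Clang `__typeof__` extension rather than the kernel's own definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's READ_ONCE(). The volatile
 * access asks the compiler to perform exactly one load of the location. */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical entry living in memory that another party may rewrite. */
struct demo_sqe {
    uint64_t addr;
};

/* Snapshot the shared field once, then validate and use only the snapshot,
 * so the checked value and the used value cannot differ. */
uint64_t demo_prep_addr(const struct demo_sqe *sqe, uint64_t limit)
{
    assert(sqe != NULL);
    uint64_t addr = DEMO_READ_ONCE(sqe->addr);
    return addr < limit ? addr : 0;
}
```

The point of the pattern is that every later decision works on the local copy, never on the shared memory again.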
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	8934827db5	kmalloc_obj treewide refactoring for v7.0-rc1 Merge tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kmalloc_obj conversion from Kees Cook: "This does the tree-wide conversion to kmalloc_obj() and friends using coccinelle, with a subsequent small manual cleanup of whitespace alignment that coccinelle does not handle. This uncovered a clang bug in __builtin_counted_by_ref(), so the conversion is preceded by disabling that for current versions of clang. The imminent clang 22.1 release has the fix. I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv, s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc" * tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kmalloc_obj: Clean up after treewide replacements treewide: Replace kmalloc with kmalloc_obj for non-scalar types compiler_types: Disable __builtin_counted_by_ref for Clang	2026-02-21 11:02:58 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
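The idea of a typed allocation returning "TYPE *" rather than "void *" can be sketched in userspace C. The macros below are hypothetical stand-ins built on `malloc()`/`calloc()`; they are not the kernel's kmalloc_obj() family and take no GFP argument. `struct demo_point` and `demo_make_points` are likewise invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace analogues: the cast makes the result a TYPE *,
 * so assigning it to a pointer of the wrong type draws a compiler diagnostic. */
#define demo_alloc_obj(TYPE)         ((TYPE *)malloc(sizeof(TYPE)))
#define demo_alloc_objs(TYPE, COUNT) ((TYPE *)calloc((COUNT), sizeof(TYPE)))

struct demo_point {
    int x;
    int y;
};

/* Allocate an array of points and number them; returns NULL on failure. */
struct demo_point *demo_make_points(size_t count)
{
    assert(count > 0);
    struct demo_point *pts = demo_alloc_objs(struct demo_point, count);
    if (!pts)
        return NULL;
    for (size_t i = 0; i < count; i++)
        pts[i].x = (int)i;
    return pts;
}
```

Compared with `malloc(sizeof(struct demo_point))`, the type is written once and the size and the result type cannot drift apart.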
Caleb Sander Mateos	42a6bd57ee	io_uring: add IORING_OP_URING_CMD128 to opcode checks io_should_commit(), io_uring_classic_poll(), and io_do_iopoll() compare struct io_kiocb's opcode against IORING_OP_URING_CMD to implement special treatment for uring_cmds. The recently added opcode IORING_OP_URING_CMD128 is meant to be equivalent to IORING_OP_URING_CMD, so treat it the same way in these functions. Fixes: `1cba30bf9f` ("io_uring: add support for IORING_SETUP_SQE_MIXED") Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-19 07:25:39 -07:00
Kai Aizen	003049b1c4	io_uring/zcrx: fix user_ref race between scrub and refill paths The io_zcrx_put_niov_uref() function uses a non-atomic check-then-decrement pattern (atomic_read followed by separate atomic_dec) to manipulate user_refs. This is serialized against other callers by rq_lock, but io_zcrx_scrub() modifies the same counter with atomic_xchg() WITHOUT holding rq_lock. On SMP systems, the following race exists: CPU0 (refill, holds rq_lock) CPU1 (scrub, no rq_lock) put_niov_uref: atomic_read(uref) - 1 // window opens atomic_xchg(uref, 0) - 1 return_niov_freelist(niov) [PUSH #1] // window closes atomic_dec(uref) - wraps to -1 returns true return_niov(niov) return_niov_freelist(niov) [PUSH #2: DOUBLE-FREE] The same niov is pushed to the freelist twice, causing free_count to exceed nr_iovs. Subsequent freelist pushes then perform an out-of-bounds write (a u32 value) past the kvmalloc'd freelist array into the adjacent slab object. Fix this by replacing the non-atomic read-then-dec in io_zcrx_put_niov_uref() with an atomic_try_cmpxchg loop that atomically tests and decrements user_refs. This makes the operation safe against concurrent atomic_xchg from scrub without requiring scrub to acquire rq_lock. Fixes: `34a3e60821` ("io_uring/zcrx: implement zerocopy receive pp memory provider") Cc: stable@vger.kernel.org Signed-off-by: Kai Aizen <kai@snailsploit.com> [pavel: removed a warning and a comment] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-18 10:39:48 -07:00
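The fix described above, replacing a read-then-decrement with a compare-exchange loop, can be sketched with C11 atomics. This is a userspace illustration under assumptions, not the kernel patch: `demo_put_ref` and `demo_scrub_refs` are hypothetical names, and `atomic_compare_exchange_weak` stands in for the kernel's atomic_try_cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Atomically decrement *refs only if it is currently positive.
 * Returns true if a reference was dropped, false if none was left. */
bool demo_put_ref(atomic_int *refs)
{
    assert(refs != NULL);
    int old = atomic_load(refs);
    do {
        if (old <= 0)
            return false;
        /* If the exchange fails, 'old' is refreshed with the current value
         * and the test above is repeated before trying again. */
    } while (!atomic_compare_exchange_weak(refs, &old, old - 1));
    return true;
}

/* Scrub-side analogue: claim every outstanding reference in one step. */
int demo_scrub_refs(atomic_int *refs)
{
    assert(refs != NULL);
    return atomic_exchange(refs, 0);
}
```

Because the test and the decrement happen in a single atomic step, a concurrent exchange-to-zero can no longer slip in between them and drive the counter negative.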
Linus Torvalds	7b751b01ad	io_uring-7.0-20260216 Merge tag 'io_uring-7.0-20260216' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull more io_uring updates from Jens Axboe: "This is a mix of cleanups and fixes. No major fixes in here, just a bunch of little fixes. Some of them are marked for stable as they fix behavioral issues - Fix an issue with SOCKET_URING_OP_SETSOCKOPT for netlink sockets, due to a too restrictive check on it having an ioctl handler - Remove a redundant SQPOLL check in ring creation - Kill dead accounting for zero-copy send, which doesn't use ->buf or ->len post the initial setup - Fix missing clamp of the allocation hint, which could cause allocations to fall outside of the range the application asked for. Still within the allowed limits. 
- Fix for IORING_OP_PIPE's handling of direct descriptors - Tweak to the API for the newly added BPF filters, making them more future proof in terms of how applications deal with them - A few fixes for zcrx, fixing a few error handling conditions - Fix for zcrx request flag checking - Add support for querying the zcrx page size - Improve the NO_SQARRAY static branch inc/dec, avoiding busy conditions causing too much traffic - Various little cleanups" * tag 'io_uring-7.0-20260216' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/bpf_filter: pass in expected filter payload size io_uring/bpf_filter: move filter size and populate helper into struct io_uring/cancel: de-unionize file and user_data in struct io_cancel_data io_uring/rsrc: improve regbuf iov validation io_uring: remove unneeded io_send_zc accounting io_uring/cmd_net: fix too strict requirement on ioctl io_uring: delay sqarray static branch disablement io_uring/query: add query.h copyright notice io_uring/query: return support for custom rx page size io_uring/zcrx: check unsupported flags on import io_uring/zcrx: fix post open error handling io_uring/zcrx: fix sgtable leak on mapping failures io_uring: use the right type for creds iteration io_uring/openclose: fix io_pipe_fixed() slot tracking for specific slots io_uring/filetable: clamp alloc_hint to the configured alloc range io_uring/rsrc: replace reg buffer bit field with flags io_uring/zcrx: improve types for size calculation io_uring/tctx: avoid modifying loop variable in io_ring_add_registered_file io_uring: simplify IORING_SETUP_DEFER_TASKRUN && !SQPOLL check	2026-02-17 08:33:49 -08:00
Jens Axboe	be3573124e	io_uring/bpf_filter: pass in expected filter payload size For opcodes that have payloads attached to them, like IORING_OP_OPENAT/OPENAT2 or IORING_OP_SOCKET, it's quite possible that these payloads can change over time. For example, on the openat/openat2 side, the struct open_how argument is extensible, and could be extended in the future to allow further arguments to be passed in. Allow registration of a cBPF filter to give the size of the filter as seen by userspace. If that filter is for an opcode that takes extra payload data, allow it if the application payload expectation is the same size as the kernel's. If that is the case, the kernel supports filtering on the payload that the application expects. If the size differs, the behavior depends on the IO_URING_BPF_FILTER_SZ_STRICT flag: 1) If IO_URING_BPF_FILTER_SZ_STRICT is set and the size expectation differs, fail the attempt to load the filter. 2) If IO_URING_BPF_FILTER_SZ_STRICT isn't set, allow the filter if the userspace pdu size is smaller than what the kernel offers. 3) Regardless of IO_URING_BPF_FILTER_SZ_STRICT, fail loading the filter if the userspace pdu size is bigger than what the kernel supports. An attempt to load a filter that fails due to sizing will error with -EMSGSIZE. For that error, the registration struct will have filter->pdu_size populated with the pdu size that the kernel uses. Reported-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-16 15:56:31 -07:00
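The three sizing rules in the commit above reduce to a small decision function. The sketch below is an assumption-laden illustration, not the kernel implementation: `demo_check_pdu_size` and its parameters are hypothetical, with `strict` standing in for IO_URING_BPF_FILTER_SZ_STRICT and `reported` for the filter->pdu_size field that is filled in on error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a filter built for 'user_size' bytes of payload may load on
 * a kernel whose payload is 'kernel_size' bytes. Returns 0 to accept, or
 * -EMSGSIZE to reject, in which case *reported receives the kernel's size. */
int demo_check_pdu_size(size_t user_size, size_t kernel_size, bool strict,
                        size_t *reported)
{
    assert(reported != NULL);

    /* Equal sizes: the kernel filters on exactly what userspace expects. */
    if (user_size == kernel_size)
        return 0;

    /* Rule 3: a larger userspace size always fails.
     * Rule 1: any mismatch fails when strict sizing was requested. */
    if (user_size > kernel_size || strict) {
        *reported = kernel_size;
        return -EMSGSIZE;
    }

    /* Rule 2: smaller userspace size without strict sizing is allowed. */
    return 0;
}
```

A caller that gets -EMSGSIZE could compare the reported size with its own to decide whether to rebuild the filter or give up.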
Jens Axboe	d21c362182	io_uring/bpf_filter: move filter size and populate helper into struct Rather than open-code this logic in io_uring_populate_bpf_ctx() with a switch, move it to the issue side definitions. Outside of making this easier to extend in the future, it's also a prep patch for using the pdu size for a given opcode filter elsewhere. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-16 15:56:25 -07:00
Jens Axboe	22dbb0987b	io_uring/cancel: de-unionize file and user_data in struct io_cancel_data By having them share the same space in struct io_cancel_data, it ends up disallowing IORING_ASYNC_CANCEL_FD\|IORING_ASYNC_CANCEL_USERDATA from working. Eg you cannot match on both a file and user_data for cancelation purposes. This obviously isn't a common use case as nobody has reported this, but it does result in -ENOENT potentially being returned when trying to match on both, rather than actually doing what the API says it would. Fixes: `4bf94615b8` ("io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-16 14:16:27 -07:00
Pavel Begunkov	2e02f9efdb	io_uring/rsrc: improve regbuf iov validation Deduplicate io_buffer_validate() calls by moving the checks into io_sqe_buffer_register(). Now we also don't need special handling in io_buffer_validate() passing through buffer removal requests. I also was using it as a cleanup before some other changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-16 08:15:38 -07:00
Dylan Yudaken	046fcc83ac	io_uring: remove unneeded io_send_zc accounting zc->len and zc->buf are not actually used once you get to the retry stage. The buffer remains in kmsg->msg.msg_iter, which is setup in io_send_setup. Note: it still seems needed in io_send due to io_send_select_buffer needing it (for the len parameter). Signed-off-by: Dylan Yudaken <dyudaken@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-16 08:10:46 -07:00
Asbjørn Sloth Tønnesen	600b665b90	io_uring/cmd_net: fix too strict requirement on ioctl Attempting SOCKET_URING_OP_SETSOCKOPT on an AF_NETLINK socket resulted in an -EOPNOTSUPP, as AF_NETLINK doesn't have an ioctl in its struct proto, but only in struct proto_ops. Prior to the blamed commit, io_uring_cmd_sock() only had two cmd_op operations, both requiring ioctl, thus the check was warranted. Since then, 4 new cmd_op operations have been added, none of which depend on ioctl. This patch moves the ioctl check, so it only applies to the original operations. AFAICT, the ioctl requirement was unintentional, and it wasn't visible in the blamed patch within 3 lines of context. Cc: stable@vger.kernel.org Fixes: `a5d2f99aff` ("io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT") Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-16 08:08:01 -07:00
Pavel Begunkov	56112578c7	io_uring: delay sqarray static branch disablement io_key_has_sqarray static branch can be easily switched on/off by the user every time patching the kernel. That can be very disruptive as it might require heavy synchronisation across all CPUs. Use deferred static keys, which can rate-limit it by deferring, batching and potentially effectively eliminating dec+inc pairs. Fixes: `9b296c625a` ("io_uring: static_key for !IORING_SETUP_NO_SQARRAY") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-15 15:12:54 -07:00
Pavel Begunkov	c29214677a	io_uring/query: return support for custom rx page size Add an ability to query if the zcrx rx page size setting is available. Note, even when the API is supported by io_uring, the registration can still get rejected for various reasons, e.g. when the NIC or the driver doesn't support it, when the particular specified size is unsupported, when the memory area doesn't satisfy all requirements, etc. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-15 14:55:37 -07:00
Pavel Begunkov	7496e658a7	io_uring/zcrx: check unsupported flags on import The imported zcrx registration path checks for ZCRX_REG_IMPORT, as it should, but doesn't reject any unsupported flags. Fix that. Cc: stable@vger.kernel.org Fixes: `00d9148127` ("io_uring/zcrx: share an ifq between rings") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-15 14:55:29 -07:00
Pavel Begunkov	5d540e4508	io_uring/zcrx: fix post open error handling Closing a queue doesn't guarantee that all associated page pools are terminated right away, let the refcounting do the work instead of releasing the zcrx ctx directly. Cc: stable@vger.kernel.org Fixes: `e0793de24a` ("io_uring/zcrx: set pp memory provider for an rx queue") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-14 18:05:08 -07:00
Pavel Begunkov	a983aae397	io_uring/zcrx: fix sgtable leak on mapping failures In an unlikely case when io_populate_area_dma() fails, which could only happen on a PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA machine, io_zcrx_map_area() will have an initialised and not freed table. It was supposed to be cleaned up in the error path, but !is_mapped prevents that. Fixes: `439a98b972` ("io_uring/zcrx: deduplicate area mapping") Cc: stable@vger.kernel.org Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-14 18:05:00 -07:00
Linus Torvalds	041c16acba	for-7.0/io_uring-zcrx-large-buffers-20260206 Merge tag 'for-7.0/io_uring-zcrx-large-buffers-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring large rx buffer support from Jens Axboe: "Now that the networking updates are upstream, here's the support for large buffers for zcrx. Using larger (bigger than 4K) rx buffers can increase the efficiency of zcrx. For example, it's been shown that using 32K buffers can decrease CPU usage by ~30% compared to 4K buffers" * tag 'for-7.0/io_uring-zcrx-large-buffers-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/zcrx: implement large rx buffer support	2026-02-12 15:07:50 -08:00
Jens Axboe	d7d95207ca	io_uring: use the right type for creds iteration In io_ring_ctx_wait_and_kill(), struct creds *creds is used to iterate and prune credentials. But the correct type is struct cred. This doesn't matter as the variable isn't used at all, only the index is used. But it's confusing using a type that isn't valid, so fix it up. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-11 20:31:58 -07:00
Jens Axboe	f4d0668b38	io_uring/openclose: fix io_pipe_fixed() slot tracking for specific slots __io_fixed_fd_install() returns 0 on success for non-alloc mode (specific slot), not the slot index. io_pipe_fixed() used this return value directly as the slot index in fds[], which can cause the reported values returned via copy_to_user() to be incorrect, or the error path operating on the incorrect direct descriptor. Fix by computing the actual 0-based slot index (slot - 1) for specific slot mode, while preserving the existing behavior for auto-alloc mode where __io_fixed_fd_install() already returns the allocated index. Cc: stable@vger.kernel.org Fixes: `53db8a71ec` ("io_uring: add support for IORING_OP_PIPE") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-11 20:31:21 -07:00
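The slot bookkeeping described above can be sketched as a small helper. This is an illustration under assumptions, not the kernel code: `demo_resolve_slot` is a hypothetical name, `install_ret` models the return value of the install step (0 on success for a specific slot, the allocated index for auto-alloc, negative on error), and `requested_slot` is assumed to be 1-based in specific-slot mode, as the "slot - 1" in the commit message suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Translate the install result into the 0-based slot index that should be
 * recorded and reported back. Errors are passed through unchanged. */
int demo_resolve_slot(int install_ret, unsigned int requested_slot,
                      bool auto_alloc)
{
    if (install_ret < 0)
        return install_ret;

    /* Auto-alloc mode: the install step already returned the index. */
    if (auto_alloc)
        return install_ret;

    /* Specific-slot mode: success is reported as 0, so the index must be
     * derived from the 1-based slot that was requested. */
    assert(requested_slot > 0);
    return (int)requested_slot - 1;
}
```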
Jens Axboe	a6bded921e	io_uring/filetable: clamp alloc_hint to the configured alloc range Explicit fixed file install/remove operations on slots outside the configured alloc range can corrupt alloc_hint via io_file_bitmap_set() and io_file_bitmap_clear(), which unconditionally update alloc_hint to the bit position. This causes subsequent auto-allocations to fall outside the configured range. For example, if the alloc range is [10, 20) and a file is removed at slot 2, alloc_hint gets set to 2. The next auto-alloc then starts searching from slot 2, potentially returning a slot below the range. Fix this by clamping alloc_hint to [file_alloc_start, file_alloc_end) at the top of io_file_bitmap_get() before starting the search. Cc: stable@vger.kernel.org Fixes: `6e73dffbb9` ("io_uring: let to set a range for file slot allocation") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-11 15:20:44 -07:00
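Clamping a hint into a half-open range [start, end) before searching can be sketched as follows. `demo_clamp_hint` is a hypothetical helper; resetting an out-of-range hint to the start of the range is an assumption made for this sketch, since the commit message only states that the hint is clamped.

```c
#include <assert.h>

/* Return a search starting point that lies inside [start, end).
 * A hint outside the range is replaced by the start of the range. */
unsigned int demo_clamp_hint(unsigned int hint, unsigned int start,
                             unsigned int end)
{
    assert(start < end);
    if (hint < start || hint >= end)
        return start;
    return hint;
}
```

With the example from the commit message, a range of [10, 20) and a stale hint of 2, the search would begin at slot 10 instead of slot 2.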
Pavel Begunkov	0efc331d78	io_uring/rsrc: replace reg buffer bit field with flags I'll need a flag in the registered buffer struct for dmabuf work, and it'll be more convenient to have a flags field rather than bit fields, especially for io_mapped_ubuf initialisation. We might want to add more flags in the future as well. For example, it might be useful for debugging and potentially optimisations to split out a flag indicating the shape of the buffer to gate iov_iter_advance() walks vs bit/mask arithmetics. It can also be combined with the direction mask field. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-10 05:26:15 -07:00
Pavel Begunkov	417d029dc4	io_uring/zcrx: improve types for size calculation Make sure io_import_umem() promotes the type to long before calculating the area size. While the area size is capped at 1GB by io_validate_user_buf_range() and fits into an "int", it's still too error prone. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-10 05:26:12 -07:00
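Why promoting before the size calculation matters can be shown with a short sketch. `demo_area_size` is a hypothetical function, and `uint64_t` is used here in place of the kernel's "long" so the example behaves the same on every platform.

```c
#include <assert.h>
#include <stdint.h>

/* Compute nr_pages << page_shift in 64 bits. Widening happens BEFORE the
 * shift; shifting the 32-bit value first and widening afterwards would
 * silently drop the high bits for large inputs. */
uint64_t demo_area_size(uint32_t nr_pages, unsigned int page_shift)
{
    assert(page_shift < 64);
    return (uint64_t)nr_pages << page_shift;
}
```

For example, 2^20 pages of 4 KiB give 2^32 bytes, which fits in the widened result but would wrap to 0 in a 32-bit calculation.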
Yang Xiuwei	daa0b901f8	io_uring/tctx: avoid modifying loop variable in io_ring_add_registered_file Use a separate 'idx' variable to store the result of array_index_nospec() instead of modifying the loop variable 'offset' directly. This improves code clarity by separating the logical index from the sanitized index used for array access. No functional change intended. Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-09 20:12:46 -07:00
Caleb Sander Mateos	7cb3a68376	io_uring: simplify IORING_SETUP_DEFER_TASKRUN && !SQPOLL check io_uring_sanitise_params() already rejects flags that include both IORING_SETUP_SQPOLL and IORING_SETUP_DEFER_TASKRUN. So it's unnecessary to check IORING_SETUP_SQPOLL in io_uring_create() when IORING_SETUP_DEFER_TASKRUN has already been checked. Drop the !(ctx->flags & IORING_SETUP_SQPOLL) check for the task_complete case. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-09 20:12:36 -07:00
Linus Torvalds	0c00ed308d	for-7.0/block-20260206 Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Support for batch request processing for ublk, improving the efficiency of the kernel/ublk server communication. This can yield nice 7-12% performance improvements - Support for integrity data for ublk - Various other ublk improvements and additions, including a ton of selftests additions and updates - Move the handling of blk-crypto software fallback from below the block layer to above it. This reduces the complexity of dealing with bio splitting - Series fixing a number of potential deadlocks in blk-mq related to the queue usage counter and writeback throttling and rq-qos debugfs handling - Add an async_depth queue attribute, to resolve a performance regression that's been around for a while related to the scheduler depth handling - Only use task_work for IOPOLL completions on NVMe, if it is necessary to do so. 
An earlier fix for an issue resulted in all these completions being punted to task_work, to guarantee that completions were only run for a given io_uring ring when it was local to that ring. With the new changes, we can detect if it's necessary to use task_work or not, and avoid it if possible. - rnbd fixes: - Fix refcount underflow in device unmap path - Handle PREFLUSH and NOUNMAP flags properly in protocol - Fix server-side bi_size for special IOs - Zero response buffer before use - Fix trace format for flags - Add .release to rnbd_dev_ktype - MD pull requests via Yu Kuai - Fix raid5_run() to return error when log_init() fails - Fix IO hang with degraded array with llbitmap - Fix percpu_ref not resurrected on suspend timeout in llbitmap - Fix GPF in write_page caused by resize race - Fix NULL pointer dereference in process_metadata_update - Fix hang when stopping arrays with metadata through dm-raid - Fix any_working flag handling in raid10_sync_request - Refactor sync/recovery code path, improve error handling for badblocks, and remove unused recovery_disabled field - Consolidate mddev boolean fields into mddev_flags - Use mempool to allocate stripe_request_ctx and make sure max_sectors is not less than io_opt in raid5 - Fix return value of mddev_trylock - Fix memory leak in raid1_run() - Add Li Nan as mdraid reviewer - Move phys_vec definitions to the kernel types, mostly in preparation for some VFIO and RDMA changes - Improve the speed for secure erase for some devices - Various little rust updates - Various other minor fixes, improvements, and cleanups * tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) blk-mq: ABI/sysfs-block: fix docs build warnings selftests: ublk: organize test directories by test ID block: decouple secure erase size limit from discard size limit block: remove redundant kill_bdev() call in set_blocksize() blk-mq: add documentation for new queue attribute async_dpeth block, bfq: 
convert to use request_queue->async_depth mq-deadline: covert to use request_queue->async_depth kyber: covert to use request_queue->async_depth blk-mq: add a new queue sysfs attribute async_depth blk-mq: factor out a helper blk_mq_limit_depth() blk-mq-sched: unify elevators checking for async requests block: convert nr_requests to unsigned int block: don't use strcpy to copy blockdev name blk-mq-debugfs: warn about possible deadlock blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos() blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static blk-rq-qos: fix possible debugfs_mutex deadlock blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter ...	2026-02-09 17:57:21 -08:00
Linus Torvalds	591beb0e3a	io_uring-bpf-restrictions.4-20260206 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGJ1kQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpky8EAChIL3uJ5Vmv+oQTxT4EVb1wpc8U/XzXWU5 Q5F9IpZZCGO7+i015Y7iTTqDRixjblRaWpWzZZP8vflWDUS8LESNZLQdcoEnxaiv P367KNPUGwxejcKsu8PvZvfnX6JWSQoNstcDmrwkCF0ND2UUfvvMZyn3uKhkbBRY h5Ehcqkvqc1OJDAWC7+yPzYAmB01uRPQ6sc9/GeujznHPlfbvie4u6gBvvfXeirT 592zbVftINMrm6Twd6zl4n+HNAn+CUoyVMppeeddv5IcyFPm9uz/dLOZBXTz6552 jFYNmB0U4g+SxGXMyqp37YISTALnuY+57y5eXmEAtgkEeE3HrF+F/ZdxQHwXSpo3 T2Lb9IOqFyHtSvq678HZ37JB6aIYbBE/mZdNf8FFFpnPJGb5Ey7d50qPp/ywVq0H p9CahbpkzGUBMsZ+koew0YHiFdWV9tww+/Bnk5dTtn2197uyaHsLdmbf4C36GWke Bk5cwNgU+3DMFAfTiL9m+AIXYsJkBayRJn+hViTrF5AL7gcGiBryGF43FOSKoYuq f0mniDnGSwvn86VZPuZQ6wBRHZPEMR3OlaUXn6XrUU6cYyvMg0pBZV+QHF7zlsSP 2sdfUbPL5TxexF3G8dsxlDIypz9Z6TCoUCfU0WiiUETnCrVNkXfIY846A+w08p0b ejBjzrwRtQ== =CqJq -----END PGP SIGNATURE----- Merge tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring bpf filters from Jens Axboe: "This adds support for both cBPF filters for io_uring, as well as task inherited restrictions and filters. seccomp and io_uring don't play along nicely, as most of the interesting data to filter on resides somewhat out-of-band, in the submission queue ring. As a result, things like containers and systemd that apply seccomp filters, can't filter io_uring operations. That leaves them with just one choice if filtering is critical - filter the actual io_uring_setup(2) system call to simply disallow io_uring. That's rather unfortunate, and has limited us because of it. io_uring already has some filtering support. It requires the ring to be setup in a disabled state, and then a filter set can be applied. This filter set is completely bi-modal - an opcode is either enabled or it's not. Once a filter set is registered, the ring can be enabled. 
This is very restrictive, and it's not useful at all to systemd or containers which really want both broader and more specific control. This first adds support for cBPF filters for opcodes, which enables tighter control over what exactly a specific opcode may do. As examples, specific support is added for IORING_OP_OPENAT/OPENAT2, allowing filtering on resolve flags. And another example is added for IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These are both common use cases. cBPF was chosen rather than eBPF, because the latter is often restricted in containers as well. These filters are run post the init phase of the request, which allows filters to even dip into data that is being passed in struct in user memory, as the init side of requests make that data stable by bringing it into the kernel. This allows filtering without needing to copy this data twice, or have filters etc know about the exact layout of the user data. The filters get the already copied and sanitized data passed. On top of that support is added for per-task filters, meaning that any ring created with a task that has a per-task filter will get those filters applied when it's created. These filters are inherited across fork as well. Once a filter has been registered, any further added filters may only further restrict what operations are permitted. Filters cannot change the return value of an operation, they can only permit or deny it based on the contents" * tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring: allow registration of per-task restrictions io_uring: add task fork hook io_uring/bpf_filter: add ref counts to struct io_bpf_filter io_uring/bpf_filter: cache lookup table in ctx->bpf_filters io_uring/bpf_filter: allow filtering on contents of struct open_how io_uring/net: allow filtering on IORING_OP_SOCKET data io_uring: add support for BPF filtering for opcode restrictions	2026-02-09 17:31:17 -08:00
Linus Torvalds	f5d4feed17	for-7.0/io_uring-20260206 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGJxsQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpk6+EACamMdw6WU4VVNjUtjT93FuXxor4ioyhowJ myRtKG3ZvYrE63Z8F1dCQE28RXi9n6MhGxabCq8WZVGkhTv27DuaBkDjU4T8oCnP EYhs5a3sdRXfKuIlqVbxuiFdmiPHEP0vh3/MviKx9Ju3/Po3OEWKBalNMevfGkS4 bRNp9IQkAYNSRhGma2ni9Rnc5welWmhpsxUKFdGtPRX53ZlYegiZxKlfKMB4/SQ+ 7XAWKhy9dOGVo4DpLof7mCX6hMeX+FoNkJzF6cTMO/IF//lCLjI9BN4SMiI6mmEN RY6PLJiFraoQx8wdr3J1LtBCNXzzj6cPk6PNHKtsodoafe2oYFNLNgfAa9pHDzfM 12kvy58au0cQG6TnS2eNlqM2GN116mJi+k00E+UW4iaXXtpqcdcBrLlS+Q5hJ78C 9MBLQofv7D06C6kbpxV2pVS1u4oxefjl19wWLqLKx/VytCHrsaTm50n1r0k7YLCc plvPkQRQobqpp2GtcaXcfmsi1Vfu4jzMBAN+rTN4/te0kudNqL9+hPvrejIMEURc 2AcktMAHC8wjpr93dFASXiWh/fdyhV4e2a/D/ML4PXxhnCfnGx5s5Tp/pGjePHEU dLZm9vadmr/Yrdgycf9gQ8mz9IxI9FNJCKbI7lf7+/KJXe7DwngOa6VHNblWBRHv YoX6bG1yQQ== =Q248 -----END PGP SIGNATURE----- Merge tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Clean up the IORING_SETUP_R_DISABLED and submitter task checking, mostly just in preparation for relaxing the locking for SINGLE_ISSUER in the future. - Improve IOPOLL by using a doubly linked list to manage completions. Previously it was singly linked, which meant that to complete request N in the chain, requests 0..N-1 had to have completed first. With a doubly linked list we can complete whatever request completes in that order, rather than needing to wait for a consecutive range to be available. This reduces latencies. - Improve the restriction setup and checking. Mostly in preparation for adding further features on top of that. Coming in a separate pull request. - Split out task_work and wait handling into separate files. These are mostly nicely abstracted already, but still remained in the io_uring.c file which is on the larger side. - Use GFP_KERNEL_ACCOUNT in a few more spots, where appropriate. 
- Ensure even the idle io-wq worker exits if a task no longer has any rings open. - Add support for a non-circular submission queue. By default, the SQ ring keeps moving around, even if only a few entries are used for each submission. This can be wasteful in terms of cachelines. If IORING_SETUP_SQ_REWIND is set for the ring when created, each submission will start at offset 0 instead of where we last left off doing submissions. - Various little cleanups * tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (30 commits) io_uring/kbuf: fix memory leak if io_buffer_add_list fails io_uring: Add SPDX id lines to remaining source files io_uring: allow io-wq workers to exit when unused io_uring/io-wq: add exit-on-idle state io_uring/net: don't continue send bundle if poll was required for retry io_uring/rsrc: use GFP_KERNEL_ACCOUNT consistently io_uring/futex: use GFP_KERNEL_ACCOUNT for futex data allocation io_uring/io-wq: handle !sysctl_hung_task_timeout_secs io_uring: fix bad indentation for setup flags if statement io_uring/rsrc: take unsigned index in io_rsrc_node_lookup() io_uring: introduce non-circular SQ io_uring: split out CQ waiting code into wait.c io_uring: split out task work code into tw.c io_uring/io-wq: don't trigger hung task for syzbot craziness io_uring: add IO_URING_EXIT_WAIT_MAX definition io_uring/sync: validate passed in offset io_uring/eventfd: remove unused ctx->evfd_last_cq_tail member io_uring/timeout: annotate data race in io_flush_timeouts() io_uring/uring_cmd: explicitly disallow cancelations for IOPOLL io_uring: fix IOPOLL with passthrough I/O ...	2026-02-09 17:22:00 -08:00
Linus Torvalds	26c9342bb7	struct filename series [mostly] sanitize struct filename handling Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaYlcJgAKCRBZ7Krx/gZQ 6xlKAP9c9J13sJ/mcobsj1Ov7nSHISNbnYqvRRCu09Wq3UQvJgEApNQYOEdLtpff zUnWOAQ0nOKY7w9VMLkRRustXpuGjAc= =Fld4 -----END PGP SIGNATURE----- Merge tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'struct filename' updates from Al Viro: "[Mostly] sanitize struct filename handling" * tag 'pull-filename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (68 commits) sysfs(2): fs_index() argument is _not_ a pathname alpha: switch osf_mount() to strndup_user() ksmbd: use CLASS(filename_kernel) mqueue: switch to CLASS(filename) user_statfs(): switch to CLASS(filename) statx: switch to CLASS(filename_maybe_null) quotactl_block(): switch to CLASS(filename) chroot(2): switch to CLASS(filename) move_mount(2): switch to CLASS(filename_maybe_null) namei.c: switch user pathname imports to CLASS(filename{,_flags}) namei.c: convert getname_kernel() callers to CLASS(filename_kernel) do_f{chmod,chown,access}at(): use CLASS(filename_uflags) do_readlinkat(): switch to CLASS(filename_flags) do_sys_truncate(): switch to CLASS(filename) do_utimes_path(): switch to CLASS(filename_uflags) chdir(2): unspaghettify a bit... do_fchownat(): unspaghettify a bit... fspick(2): use CLASS(filename_flags) name_to_handle_at(): use CLASS(filename_uflags) vfs_open_tree(): use CLASS(filename_uflags) ...	2026-02-09 16:58:28 -08:00
Jens Axboe	ed82f35b92	io_uring: allow registration of per-task restrictions Currently io_uring supports restricting operations on a per-ring basis. To use those, the ring must be set up in a disabled state by setting IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and the ring can then be enabled. This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd == -1, like the other "blind" register opcodes which work on the task rather than a specific ring. This allows registration of the same kind of restrictions as can be done on a specific ring, but with the task itself. Once done, any ring created will inherit these restrictions. If a restriction filter is registered with a task, then it's inherited on fork for its children. Children may only further restrict operations, not extend them. Inherited restrictions include both the classic IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF filters that have been registered with the task via IORING_REGISTER_BPF_FILTER. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-06 07:29:19 -07:00
Jens Axboe	9fd99788f3	io_uring: add task fork hook Called when copy_process() is called to copy state to a new child. Right now this is just a stub, but will be used shortly to properly handle fork'ing of task based io_uring restrictions. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-06 07:29:14 -07:00
Linus Torvalds	92f778a0b1	io_uring-6.19-20260205 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmFEccQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjXxD/9Tkn0DgevRtQEciopjKmLdgC5kki8UOWjA lGfq0ZidcOAO0JLlr4gbH7lRsklN2wqGV0sNWyX++72U4Lw9iZli5Zv/ykNQ8odG DcYAOSWyxPYP2glrNGwHwFS/aT66vcCfwjNeE8eLkl3L8qhSzx5O50NHMGLb45Ob 7fUGaH3SVy4CLctFms/3EZ6rV+El7Xu37AzLCUAnE4cvZsyLozuGM8b9ED/+ZpJx 3VrIx9Md5VM1fiQ8yiY45liAGxA76IO6nZwp+Uq7pOVMMTRyX7Z46PMWhVi2xwwI fz/oiJTR8a5CRbSLZU6JKukIuAEVhc60vTEWQHeUAEndCapgprBX+12IQ2dJdGAJ soaQsLJzNrBvt5CydIzjsbRwbV6rJRi8Te26iBHRFwHP4ind+BqfpdE4X72YQN5j Hgr/XsVLWluCSVb1WbmoTM+ptbcw0GgzhK7k9oG2iqaYISBBK+Deuo4Wg1xsFWLQ 4sTeVF7V84lYpNBf7DMIdyjhqqN7+In6oGA+4NEhDmxlDdLYsdPdVkcxOUVwPeL5 v3vaY1CR/KX+hmio+e/pIQSi7NhKfmHBdteQHl4CuCONa16obPeOSFczpZP7cwRt yINF7+FWxOrDHVgJ23JckwZflD/xfgU7Ch6scdiIpAURU2im15dLxqh6bg8nwViz BafKe66Bgw== =sRYt -----END PGP SIGNATURE----- Merge tag 'io_uring-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Two small fixes for zcrx - Two small fixes for fdinfo - one is just killing a superfluous newline * tag 'io_uring-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs io_uring/fdinfo: kill unnecessary newline feed in CQE32 printing io_uring/zcrx: fix rq flush locking io_uring/zcrx: fix page array leak	2026-02-05 14:40:06 -08:00
Jens Axboe	442ae40660	io_uring/kbuf: fix memory leak if io_buffer_add_list fails io_register_pbuf_ring() ignores the return value of io_buffer_add_list(), which can fail if xa_store() returns an error (e.g., -ENOMEM). When this happens, the function returns 0 (success) to the caller, but the io_buffer_list structure is neither added to the xarray nor freed. In practice this requires failure injection to hit, hence not a real issue. But it should get fixed up nonetheless. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-05 11:13:16 -07:00
Tim Bird	ccd18ce290	io_uring: Add SPDX id lines to remaining source files Some io_uring files are missing SPDX-License-Identifier lines. Add lines with GPL-2.0 license IDs to these files. Signed-off-by: Tim Bird <tim.bird@sony.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-04 07:23:45 -07:00
Jens Axboe	38cfdd9dd2	io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs Add cond_resched() in those dump loops, just in case a lot of entries are being dumped. And detect invalid CQ ring head/tail entries, to avoid iterating more than what is necessary. Generally not an issue, but can be if things like KASAN or other debugging metrics are enabled. Reported-by: 是参差 <shicenci@gmail.com> Link: https://lore.kernel.org/all/PS1PPF7E1D7501FE5631002D242DD89403FAB9BA@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/ Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 10:58:32 -07:00
Jens Axboe	b1dfe4e0fc	io_uring/fdinfo: kill unnecessary newline feed in CQE32 printing There's an unconditional newline feed anyway after dumping both normal and big CQE contents, remove the \n from the CQE32 extra1/extra2 printing. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-03 10:00:39 -07:00
Pavel Begunkov	af07330e28	io_uring/zcrx: fix rq flush locking zcrx needs to keep the rq lock for uref manipulations, for now move all zcrx_return_buffers() under the lock. Fixes: `475eb39b00` ("io_uring/zcrx: add sync refill queue flushing") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 08:19:43 -07:00
Pavel Begunkov	0ae91d8ab7	io_uring/zcrx: fix page array leak `d9f595b9a6` ("io_uring/zcrx: fix leaking pages on sg init fail") fixed a page leakage but didn't free the page array, release it as well. Fixes: `b84621d96e` ("io_uring/zcrx: allocate sgtable for umem areas") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 08:19:35 -07:00
Li Chen	9121466148	io_uring: allow io-wq workers to exit when unused io_uring keeps a per-task io-wq around, even when the task no longer has any io_uring instances. If the task previously used io_uring for file I/O, this can leave an unrelated iou-wrk-* worker thread behind after the last io_uring instance is gone. When the last io_uring ctx is removed from the task context, mark the io-wq exit-on-idle so workers can go away. Clear the flag on subsequent io_uring usage. Signed-off-by: Li Chen <me@linux.beauty> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 08:11:42 -07:00
Li Chen	38aa434ab9	io_uring/io-wq: add exit-on-idle state io-wq uses an idle timeout to shrink the pool, but keeps the last worker around indefinitely to avoid churn. For tasks that used io_uring for file I/O and then stop using io_uring, this can leave an iou-wrk-* thread behind even after all io_uring instances are gone. This is unnecessary overhead and also gets in the way of process checkpoint/restore. Add an exit-on-idle state that makes all io-wq workers exit as soon as they become idle, and provide io_wq_set_exit_on_idle() to toggle it. Signed-off-by: Li Chen <me@linux.beauty> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-02-02 08:10:23 -07:00
Jens Axboe	806ae939c4	io_uring/net: don't continue send bundle if poll was required for retry If a send bundle has picked a bunch of buffers, then it needs to send all of those to be complete. This may require poll arming, if the send buffer ends up being full. Once a send bundle has been poll armed, no further bundles should be attempted. This allows a current bundle to complete even though it needs to go through polling to do so, but it will not allow another bundle to be started once that has happened. Ideally we would abort a bundle if it was only partially sent, but as some parts of it already went out on the wire, this obviously isn't feasible. Not continuing more bundle attempts post encountering a full socket buffer is the second best thing. Cc: stable@vger.kernel.org Fixes: `a05d1f625c` ("io_uring/net: support bundles for send") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-27 21:06:28 -07:00
Jens Axboe	e7f67c2be7	io_uring/bpf_filter: add ref counts to struct io_bpf_filter In preparation for allowing inheritance of BPF filters and filter tables, add a reference count to the filter. This allows multiple tables to safely include the same filter. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-27 11:10:46 -07:00
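
The reference-counting idea in the last entry above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: every name here (`sketch_filter`, `sketch_table`, `SKETCH_NR_OPS`, and the helper functions) is hypothetical, and C11 atomics stand in for whatever primitive the kernel actually uses. It shows one filter being shared by two tables, with the filter freed only when the last table drops it.

```c
/* Userspace sketch only: hypothetical names, not the kernel's io_bpf_filter code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_OPS 4                 /* hypothetical number of opcode slots */

struct sketch_filter {
	atomic_int refs;                /* how many tables point at this filter */
	int id;                         /* placeholder payload */
};

struct sketch_table {
	struct sketch_filter *slot[SKETCH_NR_OPS];
};

static int sketch_live_filters;         /* instrumentation for the example only */

/* Allocate a filter holding one reference, owned by the caller. */
static struct sketch_filter *sketch_filter_alloc(int id)
{
	struct sketch_filter *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	atomic_init(&f->refs, 1);
	f->id = id;
	sketch_live_filters++;
	return f;
}

/* Take an extra reference so another table can point at the same filter. */
static struct sketch_filter *sketch_filter_get(struct sketch_filter *f)
{
	atomic_fetch_add(&f->refs, 1);
	return f;
}

/* Drop a reference; the filter is freed when the last one goes away. */
static void sketch_filter_put(struct sketch_filter *f)
{
	int old = atomic_fetch_sub(&f->refs, 1);

	assert(old > 0);
	if (old == 1) {
		sketch_live_filters--;
		free(f);
	}
}

/* Inherit a parent's table: share each filter instead of copying it. */
static void sketch_table_inherit(struct sketch_table *child,
				 const struct sketch_table *parent)
{
	for (size_t i = 0; i < SKETCH_NR_OPS; i++) {
		struct sketch_filter *f = parent->slot[i];

		child->slot[i] = f ? sketch_filter_get(f) : NULL;
	}
}

/* Release every reference a table holds. */
static void sketch_table_release(struct sketch_table *t)
{
	for (size_t i = 0; i < SKETCH_NR_OPS; i++) {
		if (t->slot[i]) {
			sketch_filter_put(t->slot[i]);
			t->slot[i] = NULL;
		}
	}
}
```

With this shape, a parent table can be released before or after the child without either one freeing a filter the other still uses, which is the property the commit message describes as letting multiple tables safely include the same filter.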


2020 commits