-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGLwcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv+TD/48S2HTnMhmW6AtFYWErQ+sEKXpHrxbYe7S
+qR8/g/T+QSfhfqPwZEuagndFKtIP3LJfaXGSP1Lk1RfP9NLQy91v33Ibe4DjHkp
etWSfnMHA9MUAoWKmg8EvncB2G+ZQFiYCpjazj5tKHD9S2+psGMuL8kq6qzMJE83
uhpb8WutUl4aSIXbMSfyGlwBhI1MjjRbbWlIBmg4yC8BWt1sH8Qn2L2GNVylEIcX
U8At3KLgPGn0axSg4yGMAwTqtGhL/jwdDyeczbmRlXuAr4iVL9UX/yADCYkazt6U
ttQ2/H+cxCwfES84COx9EteAatlbZxo6wjGvZ3xOMiMJVTjYe1x6Gkcckq+LrZX6
tjofi2KK78qkrMXk1mZMkZjpyUWgRtCswhDllbQyqFs0SwzQtno2//Rk8HU9dhbt
pkpryDbGFki9X3upcNyEYp5TYflpW6YhAzShYgmE6KXim2fV8SeFLviy0erKOAl+
fwjTE6KQ5QoQv0s3WxkWa4lREm34O6IHrCUmbiPm5CruJnQDhqAN2QZIDgYC4WAf
0gu9cR/O4Vxu7TQXrumPs5q+gCyDU0u0B8C3mG2s+rIo+PI5cVZKs2OIZ8HiPo0F
x73kR/pX3DMe35ZQkQX22ymMuowV+aQouDLY9DTwakP5acdcg7h7GZKABk6VLB06
gUIsnxURiQ==
=jNzW
-----END PGP SIGNATURE-----
Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Support for batch request processing for ublk, improving the
efficiency of the kernel/ublk server communication. This can yield
nice 7-12% performance improvements
- Support for integrity data for ublk
- Various other ublk improvements and additions, including a ton of
selftests additions and updated
- Move the handling of blk-crypto software fallback from below the
block layer to above it. This reduces the complexity of dealing with
bio splitting
- Series fixing a number of potential deadlocks in blk-mq related to
the queue usage counter and writeback throttling and rq-qos debugfs
handling
- Add an async_depth queue attribute, to resolve a performance
regression that's been around for a qhilw related to the scheduler
depth handling
- Only use task_work for IOPOLL completions on NVMe, if it is necessary
to do so. An earlier fix for an issue resulted in all these
completions being punted to task_work, to guarantee that completions
were only run for a given io_uring ring when it was local to that
ring. With the new changes, we can detect if it's necessary to use
task_work or not, and avoid it if possible.
- rnbd fixes:
- Fix refcount underflow in device unmap path
- Handle PREFLUSH and NOUNMAP flags properly in protocol
- Fix server-side bi_size for special IOs
- Zero response buffer before use
- Fix trace format for flags
- Add .release to rnbd_dev_ktype
- MD pull requests via Yu Kuai
- Fix raid5_run() to return error when log_init() fails
- Fix IO hang with degraded array with llbitmap
- Fix percpu_ref not resurrected on suspend timeout in llbitmap
- Fix GPF in write_page caused by resize race
- Fix NULL pointer dereference in process_metadata_update
- Fix hang when stopping arrays with metadata through dm-raid
- Fix any_working flag handling in raid10_sync_request
- Refactor sync/recovery code path, improve error handling for
badblocks, and remove unused recovery_disabled field
- Consolidate mddev boolean fields into mddev_flags
- Use mempool to allocate stripe_request_ctx and make sure
max_sectors is not less than io_opt in raid5
- Fix return value of mddev_trylock
- Fix memory leak in raid1_run()
- Add Li Nan as mdraid reviewer
- Move phys_vec definitions to the kernel types, mostly in preparation
for some VFIO and RDMA changes
- Improve the speed for secure erase for some devices
- Various little rust updates
- Various other minor fixes, improvements, and cleanups
* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
blk-mq: ABI/sysfs-block: fix docs build warnings
selftests: ublk: organize test directories by test ID
block: decouple secure erase size limit from discard size limit
block: remove redundant kill_bdev() call in set_blocksize()
blk-mq: add documentation for new queue attribute async_dpeth
block, bfq: convert to use request_queue->async_depth
mq-deadline: covert to use request_queue->async_depth
kyber: covert to use request_queue->async_depth
blk-mq: add a new queue sysfs attribute async_depth
blk-mq: factor out a helper blk_mq_limit_depth()
blk-mq-sched: unify elevators checking for async requests
block: convert nr_requests to unsigned int
block: don't use strcpy to copy blockdev name
blk-mq-debugfs: warn about possible deadlock
blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
blk-rq-qos: fix possible debugfs_mutex deadlock
blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGJ1kQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpky8EAChIL3uJ5Vmv+oQTxT4EVb1wpc8U/XzXWU5
Q5F9IpZZCGO7+i015Y7iTTqDRixjblRaWpWzZZP8vflWDUS8LESNZLQdcoEnxaiv
P367KNPUGwxejcKsu8PvZvfnX6JWSQoNstcDmrwkCF0ND2UUfvvMZyn3uKhkbBRY
h5Ehcqkvqc1OJDAWC7+yPzYAmB01uRPQ6sc9/GeujznHPlfbvie4u6gBvvfXeirT
592zbVftINMrm6Twd6zl4n+HNAn+CUoyVMppeeddv5IcyFPm9uz/dLOZBXTz6552
jFYNmB0U4g+SxGXMyqp37YISTALnuY+57y5eXmEAtgkEeE3HrF+F/ZdxQHwXSpo3
T2Lb9IOqFyHtSvq678HZ37JB6aIYbBE/mZdNf8FFFpnPJGb5Ey7d50qPp/ywVq0H
p9CahbpkzGUBMsZ+koew0YHiFdWV9tww+/Bnk5dTtn2197uyaHsLdmbf4C36GWke
Bk5cwNgU+3DMFAfTiL9m+AIXYsJkBayRJn+hViTrF5AL7gcGiBryGF43FOSKoYuq
f0mniDnGSwvn86VZPuZQ6wBRHZPEMR3OlaUXn6XrUU6cYyvMg0pBZV+QHF7zlsSP
2sdfUbPL5TxexF3G8dsxlDIypz9Z6TCoUCfU0WiiUETnCrVNkXfIY846A+w08p0b
ejBjzrwRtQ==
=CqJq
-----END PGP SIGNATURE-----
Merge tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring bpf filters from Jens Axboe:
"This adds support for both cBPF filters for io_uring, as well as task
inherited restrictions and filters.
seccomp and io_uring don't play along nicely, as most of the
interesting data to filter on resides somewhat out-of-band, in the
submission queue ring.
As a result, things like containers and systemd that apply seccomp
filters, can't filter io_uring operations.
That leaves them with just one choice if filtering is critical -
filter the actual io_uring_setup(2) system call to simply disallow
io_uring. That's rather unfortunate, and has limited us because of it.
io_uring already has some filtering support. It requires the ring to
be setup in a disabled state, and then a filter set can be applied.
This filter set is completely bi-modal - an opcode is either enabled
or it's not. Once a filter set is registered, the ring can be enabled.
This is very restrictive, and it's not useful at all to systemd or
containers which really want both broader and more specific control.
This first adds support for cBPF filters for opcodes, which enables
tighter control over what exactly a specific opcode may do. As
examples, specific support is added for IORING_OP_OPENAT/OPENAT2,
allowing filtering on resolve flags. And another example is added for
IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These
are both common use cases. cBPF was chosen rather than eBPF, because
the latter is often restricted in containers as well.
These filters are run post the init phase of the request, which allows
filters to even dip into data that is being passed in struct in user
memory, as the init side of requests make that data stable by bringing
it into the kernel. This allows filtering without needing to copy this
data twice, or have filters etc know about the exact layout of the
user data. The filters get the already copied and sanitized data
passed.
On top of that support is added for per-task filters, meaning that any
ring created with a task that has a per-task filter will get those
filters applied when it's created. These filters are inherited across
fork as well. Once a filter has been registered, any further added
filters may only further restrict what operations are permitted.
Filters cannot change the return value of an operation, they can only
permit or deny it based on the contents"
* tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: allow registration of per-task restrictions
io_uring: add task fork hook
io_uring/bpf_filter: add ref counts to struct io_bpf_filter
io_uring/bpf_filter: cache lookup table in ctx->bpf_filters
io_uring/bpf_filter: allow filtering on contents of struct open_how
io_uring/net: allow filtering on IORING_OP_SOCKET data
io_uring: add support for BPF filtering for opcode restrictions
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmmGJxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpk6+EACamMdw6WU4VVNjUtjT93FuXxor4ioyhowJ
myRtKG3ZvYrE63Z8F1dCQE28RXi9n6MhGxabCq8WZVGkhTv27DuaBkDjU4T8oCnP
EYhs5a3sdRXfKuIlqVbxuiFdmiPHEP0vh3/MviKx9Ju3/Po3OEWKBalNMevfGkS4
bRNp9IQkAYNSRhGma2ni9Rnc5welWmhpsxUKFdGtPRX53ZlYegiZxKlfKMB4/SQ+
7XAWKhy9dOGVo4DpLof7mCX6hMeX+FoNkJzF6cTMO/IF//lCLjI9BN4SMiI6mmEN
RY6PLJiFraoQx8wdr3J1LtBCNXzzj6cPk6PNHKtsodoafe2oYFNLNgfAa9pHDzfM
12kvy58au0cQG6TnS2eNlqM2GN116mJi+k00E+UW4iaXXtpqcdcBrLlS+Q5hJ78C
9MBLQofv7D06C6kbpxV2pVS1u4oxefjl19wWLqLKx/VytCHrsaTm50n1r0k7YLCc
plvPkQRQobqpp2GtcaXcfmsi1Vfu4jzMBAN+rTN4/te0kudNqL9+hPvrejIMEURc
2AcktMAHC8wjpr93dFASXiWh/fdyhV4e2a/D/ML4PXxhnCfnGx5s5Tp/pGjePHEU
dLZm9vadmr/Yrdgycf9gQ8mz9IxI9FNJCKbI7lf7+/KJXe7DwngOa6VHNblWBRHv
YoX6bG1yQQ==
=Q248
-----END PGP SIGNATURE-----
Merge tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Clean up the IORING_SETUP_R_DISABLED and submitter task checking,
mostly just in preparation for relaxing the locking for SINGLE_ISSUER
in the future.
- Improve IOPOLL by using a doubly linked list to manage completions.
Previously it was singly listed, which meant that to complete request
N in the chain 0..N-1 had to have completed first. With a doubly
linked list we can complete whatever request completes in that order,
rather than need to wait for a consecutive range to be available.
This reduces latencies.
- Improve the restriction setup and checking. Mostly in preparation for
adding further features on top of that. Coming in a separate pull
request.
- Split out task_work and wait handling into separate files. These are
mostly nicely abstracted already, but still remained in the
io_uring.c file which is on the larger side.
- Use GFP_KERNEL_ACCOUNT in a few more spots, where appropriate.
- Ensure even the idle io-wq worker exits if a task no longer has any
rings open.
- Add support for a non-circular submission queue.
By default, the SQ ring keeps moving around, even if only a few
entries are used for each submission. This can be wasteful in terms
of cachelines.
If IORING_SETUP_SQ_REWIND is set for the ring when created, each
submission will start at offset 0 instead of where we last left off
doing submissions.
- Various little cleanups
* tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (30 commits)
io_uring/kbuf: fix memory leak if io_buffer_add_list fails
io_uring: Add SPDX id lines to remaining source files
io_uring: allow io-wq workers to exit when unused
io_uring/io-wq: add exit-on-idle state
io_uring/net: don't continue send bundle if poll was required for retry
io_uring/rsrc: use GFP_KERNEL_ACCOUNT consistently
io_uring/futex: use GFP_KERNEL_ACCOUNT for futex data allocation
io_uring/io-wq: handle !sysctl_hung_task_timeout_secs
io_uring: fix bad indentation for setup flags if statement
io_uring/rsrc: take unsigned index in io_rsrc_node_lookup()
io_uring: introduce non-circular SQ
io_uring: split out CQ waiting code into wait.c
io_uring: split out task work code into tw.c
io_uring/io-wq: don't trigger hung task for syzbot craziness
io_uring: add IO_URING_EXIT_WAIT_MAX definition
io_uring/sync: validate passed in offset
io_uring/eventfd: remove unused ctx->evfd_last_cq_tail member
io_uring/timeout: annotate data race in io_flush_timeouts()
io_uring/uring_cmd: explicitly disallow cancelations for IOPOLL
io_uring: fix IOPOLL with passthrough I/O
...
- nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
- nilfs2: convert nilfs_super_block to kernel-doc
- nilfs2: Fix potential block overflow that cause system hang
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT4wVoLCG92poNnMFAhI4xTh21NnQUCaYZw2gAKCRAhI4xTh21N
nZuFAQD8vJk3OJKbxxroCyTGxDT3C/WL7PMC9Z1QNQ/yF0zBYwD+JSPN/dfQY6uF
LracWCoFQcdaWRRWn1SA2k7L1OchFgQ=
=WXsp
-----END PGP SIGNATURE-----
Merge tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2
Pull nilfs2 updates from Viacheslav Dubeyko:
- Fix potential block overflow that cause system hang
When executing the FITRIM command, an underflow can occur in the
calculation of nblocks. This ultimately leads to the block layer
function __blkdev_issue_discard() taking an excessively long time
to process the bio chain, and the ns_segctor_sem lock remains held
for a long period.
This prevents other tasks from acquiring the ns_segctor_sem lock,
resulting in a hang reported by syzbot (Edward Adam Davis)
- Fix missing struct keywords in nilfs2_api.h kernel-doc (Ryusuke
Konishi)
- Convert nilfs_super_block to kernel-doc
Eliminate 40+ kernel-doc warnings in nilfs2_ondisk.h by converting
all of the struct member comments to kernel-doc comments (Randy
Dunlap)
* tag 'nilfs2-v7.0-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2:
nilfs2: fix missing struct keywords in nilfs2_api.h kernel-doc
nilfs2: convert nilfs_super_block to kernel-doc
nilfs2: Fix potential block overflow that cause system hang
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmmDT6sACgkQxWXV+ddt
WDteIBAAnQBKtHZOrefnA/SjbT4N+IV20x8sxVc3XI2MXw6RpjEN6k+0oGMLdvMy
5NBryJ43q5CwCV6iNkWQE4mT86gcPa6Bqv1nFOC5Q2BDkvbVBpOfOq7kC2+fQ7ay
HF2Mr0PUHc0Y0MhkRSljO+T2QD4tDpWaxbEeVY+TxiAsepD1paK4fHV6Lwu2sk25
17RJQvm/2XRY32g9Sa6NZIc7mGuyIasMCBcTpDKDJW10hP61NNtK4wHgPLtMRtzx
qzCAPSMS6QkeJZHcDa/Atg+iqpR5U8pdKAUSYJii3Kgcmjr5n1U1ZTp5WRLlXSS2
tHiR62a983ya022wKR1ApsdjN7ncE8iIeT/GrezZVcPtm9jTxaSzgd7dDNfSmr29
my4crJWvlEuD9Qt+/oz//eLAjkgEe2Q5RtaAworCAG00MzaGOEwNiXXP7DDMQApI
VTxx9dvY0s/W3UF/IuJWTTN9q95KjvlmZ9ELAPxwwtyq+sAD41CvlYhJqCaLLec5
6xMotP5cy3Ur+yp+J7RCDprQ7x6YcU98PYIXQxf1/77f3Lz/7QA2TWafPzJ5V2Bk
UtprVCrlqwCmSFrSISN6HzNf0UYY/ZI36WRoUj/ZJkGNfkQwvs9aBjb+lVYRb8T8
OcMlJrJvoUwIY//ef5K97ma8HOecodxszdEIafOgmnJtE9H3foI=
=Ie8n
-----END PGP SIGNATURE-----
Merge tag 'for-6.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes, feature updates:
- when using block size > page size, enable direct IO
- fallback to buffered IO if the data profile has duplication,
workaround to avoid checksum mismatches on block group profiles
with redundancy, real direct IO is possible on single or RAID0
- redo export of zoned statistics, moved from sysfs to
/proc/pid/mountstats due to size limitations of the former
Experimental features:
- remove offload checksum tunable, intended to find best way to do it
but since we've switched to offload to thread for everything we
don't need it anymore
- initial support for remap-tree feature, a translation layer of
logical block addresses that allow changes without moving/rewriting
blocks to do eg. relocation, or other changes that require COW
Notable fixes:
- automatic removal of accidentally leftover chunks when
free-space-tree is enabled since mkfs.btrfs v6.16.1
- zoned mode:
- do not try to append to conventional zones when RAID is mixing
zoned and conventional drives
- fixup write pointers when mixing zoned and conventional on
DUP/RAID* profiles
- when using squota, relax deletion rules for qgroups with 0 members
to allow easier recovery from accounting bugs, also add more checks
to detect bad accounting
- fix periodic reclaim scanning, properly check boundary conditions
not to trigger it unexpectedly or miss the time to run it
- trim:
- continue after first error
- change reporting to the first detected error
- add more cancellation points
- reduce contention of big device lock that can block other
operations when there's lots of trimmed space
- when chunk allocation is forced (needs experimental build) fix
transaction abort when unexpected space layout is detected
Core:
- switch to crypto library API for checksumming, removed module
dependencies, pointer indirections, etc.
- error handling improvements
- adjust how and where transaction commit or abort are done and are
maybe not necessary
- minor compression optimization to skip single block ranges
- improve how compression folios are handled
- new and updated selftests
- cleanups, refactoring:
- auto-freeing and other automatic variable cleanup conversion
- structure size optimizations
- condition annotations"
* tag 'for-6.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits)
btrfs: get rid of compressed_bio::compressed_folios[]
btrfs: get rid of compressed_folios[] usage for encoded writes
btrfs: get rid of compressed_folios[] usage for compressed read
btrfs: remove the old btrfs_compress_folios() infrastructure
btrfs: switch to btrfs_compress_bio() interface for compressed writes
btrfs: introduce btrfs_compress_bio() helper
btrfs: zlib: introduce zlib_compress_bio() helper
btrfs: zstd: introduce zstd_compress_bio() helper
btrfs: lzo: introduce lzo_compress_bio() helper
btrfs: zoned: factor out the zone loading part into a testable function
btrfs: add cleanup function for btrfs_free_chunk_map
btrfs: tests: add cleanup functions for test specific functions
btrfs: raid56: fix memory leak of btrfs_raid_bio::stripe_uptodate_bitmap
btrfs: tests: add unit tests for pending extent walking functions
btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation
btrfs: fix transaction commit blocking during trim of unallocated space
btrfs: handle user interrupt properly in btrfs_trim_fs()
btrfs: preserve first error in btrfs_trim_fs()
btrfs: continue trimming remaining devices on failure
btrfs: do not BUG_ON() in btrfs_remove_block_group()
...
Please consider pulling these changes from the signed vfs-7.0-rc1.namespace tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
ovzgAP9BpqMQhMy2VCurru8/T5VAd6eJdgXzEfXqMksL5BNm8gEAsLx666KJNKgm
Sh/yVA2KBjf51gvcLZ4gHOISaMU8bAI=
=RGLf
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- statmount: accept fd as a parameter
Extend struct mnt_id_req with a file descriptor field and a new
STATMOUNT_BY_FD flag. When set, statmount() returns mount information
for the mount the fd resides on — including detached mounts
(unmounted via umount2(MNT_DETACH)).
For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID
mask bits are cleared since neither is meaningful. The capability
check is skipped for STATMOUNT_BY_FD since holding an fd already
implies prior access to the mount and equivalent information is
available through fstatfs() and /proc/pid/mountinfo without
privilege. Includes comprehensive selftests covering both attached
and detached mount cases.
- fs: Remove internal old mount API code (1 patch)
Now that every in-tree filesystem has been converted to the new
mount API, remove all the legacy shim code in fs_context.c that
handled unconverted filesystems. This deletes ~280 lines including
legacy_init_fs_context(), the legacy_fs_context struct, and
associated wrappers. The mount(2) syscall path for userspace remains
untouched. Documentation references to the legacy callbacks are
cleaned up.
- mount: add OPEN_TREE_NAMESPACE to open_tree()
Container runtimes currently use CLONE_NEWNS to copy the caller's
entire mount namespace — only to then pivot_root() and recursively
unmount everything they just copied. With large mount tables and
thousands of parallel container launches this creates significant
contention on the namespace semaphore.
OPEN_TREE_NAMESPACE copies only the specified mount tree (like
OPEN_TREE_CLONE) but returns a mount namespace fd instead of a
detached mount fd. The new namespace contains the copied tree mounted
on top of a clone of the real rootfs.
This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a
single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER)
followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by
the new user namespace. Mount namespace file mounts are excluded from
the copy to prevent cycles. Includes ~1000 lines of selftests"
* tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/open_tree: add OPEN_TREE_NAMESPACE tests
mount: add OPEN_TREE_NAMESPACE
fs: Remove internal old mount API code
selftests: statmount: tests for STATMOUNT_BY_FD
statmount: accept fd as a parameter
statmount: permission check should return EPERM
Please consider pulling these changes from the signed vfs-7.0-rc1.nullfs tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
olG7AQD9TywOR0HC9PMT8jrhC1TKODnZ4H1aLNlYVltzfJ09xwEAwFSGO4rQmGAF
aZdD0RQw4bkf7IC1PIZHEGUqmVXJCQ8=
=NvyI
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nullfs update from Christian Brauner:
"Add a completely catatonic minimal pseudo filesystem called "nullfs"
and make pivot_root() work in the initramfs.
Currently pivot_root() does not work on the real rootfs because it
cannot be unmounted. Userspace has to recursively delete initramfs
contents manually before continuing boot, using the fragile
switch_root sequence (overmount + chroot).
Add nullfs, a minimal immutable filesystem that serves as the true
root of the mount hierarchy. The mutable rootfs (tmpfs/ramfs) is
mounted on top of it. This allows userspace to simply:
chdir(new_root);
pivot_root(".", ".");
umount2(".", MNT_DETACH);
without the traditional switch_root workarounds. systemd already
handles this correctly. It tries pivot_root() first and falls back
to MS_MOVE only when that fails.
This also means rootfs mounts in unprivileged namespaces no longer
need MNT_LOCKED, since the immutable nullfs guarantees nothing can be
revealed by unmounting the covering mount.
nullfs is a single-instance filesystem (get_tree_single()) marked
SB_NOUSER | SB_I_NOEXEC | SB_I_NODEV with an immutable empty root
directory. This means sooner or later it can be used to overmount
other directories to hide their contents without any additional
protection needed.
We enable it unconditionally. If we see any real regression we'll
hide it behind a boot option.
nullfs has extensions beyond this in the future. It will serve as a
concept to support the creation of completely empty mount namespaces -
which is work coming up in the next cycle"
* tag 'vfs-7.0-rc1.nullfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: use nullfs unconditionally as the real rootfs
docs: mention nullfs
fs: add immutable rootfs
fs: add init_pivot_root()
fs: ensure that internal tmpfs mount gets mount id zero
Please consider pulling these changes from the signed vfs-7.0-rc1.fserror tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
orUJAP9taSsjaB9zD9gU/rs8RfaPjhDXbVuPkBiDFARvGPSegwD/ZxTygHYsYarv
7JtAuKI/njOcfhl+fvHSHT1BgcO+nQ8=
=nUTi
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.fserror' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs error reporting updates from Christian Brauner:
"This contains the changes to support generic I/O error reporting.
Filesystems currently have no standard mechanism for reporting
metadata corruption and file I/O errors to userspace via fsnotify.
Each filesystem (xfs, ext4, erofs, f2fs, etc.) privately defines
EFSCORRUPTED, and error reporting to fanotify is inconsistent or
absent entirely.
This introduces a generic fserror infrastructure built around struct
super_block that gives filesystems a standard way to queue metadata
and file I/O error reports for delivery to fsnotify.
Errors are queued via mempools and queue_work to avoid holding
filesystem locks in the notification path; unmount waits for pending
events to drain. A new super_operations::report_error callback lets
filesystem drivers respond to file I/O errors themselves (to be used
by an upcoming XFS self-healing patchset).
On the uapi side, EFSCORRUPTED and EUCLEAN are promoted from private
per-filesystem definitions to canonical errno.h values across all
architectures"
* tag 'vfs-7.0-rc1.fserror' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ext4: convert to new fserror helpers
xfs: translate fsdax media errors into file "data lost" errors when convenient
xfs: report fs metadata errors via fsnotify
iomap: report file I/O errors to the VFS
fs: report filesystem and file I/O errors to fsnotify
uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
Please consider pulling these changes from the signed vfs-7.0-rc1.initrd tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
ordBAQD4d6Y5Zvr852s9deMTDv+ng3bFk1YHqybGe3wEuATstwD/QcvAoNFW9Nn5
n7/268Nk6jTEygT7Fm3tn42SnwOxhgU=
=T203
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initrd removal from Christian Brauner:
"Remove the deprecated linuxrc-based initrd code path and related dead
code. The linuxrc initrd path was deprecated in 2020 and this series
completes its removal. If we see real-life regressions we'll revert.
The core change removes handle_initrd() and init_linuxrc() — the
entire flow that ran /linuxrc from an initrd, pivoted roots, and
handed off to the real root filesystem. With that gone, initrd_load()
becomes void (no longer short-circuits prepare_namespace()),
rd_load_image() is simplified to always load /initrd.image instead of
taking a path, and rd_load_disk() is deleted.
The /proc/sys/kernel/real-root-dev sysctl and its backing variable are
removed since they only existed for linuxrc to communicate the real
root device back to the kernel.
The no-op load_ramdisk= and prompt_ramdisk= parameters are dropped,
and noinitrd and ramdisk_start= gain deprecation warnings.
Initramfs is entirely unaffected. The non-linuxrc initrd path
(root=/dev/ram0) is preserved but now carries a deprecation warning
targeting January 2027 removal"
* tag 'vfs-7.0-rc1.initrd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
init: remove /proc/sys/kernel/real-root-dev
initrd: remove deprecated code path (linuxrc)
init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
- Misc fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmmGsbEACgkQOlYIJqCj
N/18Iw//U9ZiNSW8k9CGRnXN/hmc8h21cNlTdGliqY3lkf0y7feCb1sEdkCFv6/U
KXlOhGUD8PiVlcJWm3ZWWMq/bJ5Ahcvyvre8RelRMQ5SRw07IojYSI1IkNHpSUBX
brEd8DBG24oaw2El+rkl6mN9fneNUAq4pZtU9QDA/ehKDxpdsym2OAUStAVjXy0R
YtIhsz0k1qX+EN/UIrvBTS6bCG3Ihd6btHgCehqGAOnY2rk5gNR0zChdKV3mdk2t
hsbpKp8rtZppZ9Ltru/ly4TYzaKT/dl9gWt7h1y78fN7XD5orenAe8MOkav3WoPI
zdDkDMzvwjv0p+bGPJKszxJrb4SBagtadvFMmKR+WZ0aYhysdAhxlpt64krqFrSV
wjfNfPQ1Z2qHb9PV4TfuBr4g+OyYZfnBcEvyJswrVHOBTfCoMn4hx4tF0bbSZdLd
nmOVqcXiPPpnOza2EXtYc97PSiHwl/CVlhXguYRPg/FQFnJKHHYoL9aRH4YpyZiK
o/7Bsqe20ouuMoRdVIt+zp8FvhOsuiHV122e6d55+bvNhUGBC4sXNDEKQlmQps4K
yvBUIGWLSx3Por/Iey7Rp+7hCXACf9KXaD1ogG2ZxL7xDE0smj9Jzu2NIzFJWUQ6
uubKwsZBJJDhYAZuDLUFmzoGydntb/Wi/FxetPp7Fzi7D4dnSUI=
=RH/c
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.20
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
- Misc fixes and cleanups.
The vduse_iova_range_v2 and vduse_iotlb_entry_v2 structures are both
defined in a way that adds implicit padding and is incompatible between
i386 and x86_64 userspace because of the different structure alignment
requirements. Building the header with -Wpadded shows these new warnings:
vduse.h:305:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded]
vduse.h:374:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded]
Change the amount of padding in these two structures to align them to
64 bit words and avoid those problems. Since the v1 vduse_iotlb_entry
already has an inconsistent size, do not attempt to reuse the structure
but rather list the members indiviudally, with a fixed amount of
padding.
Fixes: 079212f687 ("vduse: add vq group asid support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260202224835.559538-1-arnd@kernel.org>
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception.
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process.
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers.
- More pKVM fixes for features that are not supposed to be exposed to
guests.
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor.
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding.
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest.
- Preliminary work for guest GICv5 support.
- A bunch of debugfs fixes, removing pointless custom iterators stored
in guest data structures.
- A small set of FPSIMD cleanups.
- Selftest fixes addressing the incorrect alignment of page
allocation.
- Other assorted low-impact fixes and spelling fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmmGBEkACgkQI9DQutE9
ekPxQQ//VOzle+RVmgzVSJzpNcoW576QGI7+pZLEMIywXTx6rH+uz2FCaZvvgV7M
LrJ+1Qps9ea5Yti9OplNJmQwy1yAHIurZnpnAoMR+EJ5PUeq8p1EAypySpHtmT/d
KngZsbCvSMydNdfJFwGaz3NFSYj05FlTmWNN+Ndq0JFqyMJQMgY2qKDVmg3pWKcv
TLKTNRo9fJFUVhhBIyIoMl2hE36M6Ac3Qd4dUb5J+Fn834QDXgOzVzUjBtkmbSHD
kJ4gbSs2Ic6QsYWtt70RlyRdreBYegA4C3z1cZV6DDQYxp5Jz2oqXYYC31Ro520A
swuI5y9HMct4mOxqPUqf1lhbvsmkjuZ5Iog6P7W+mOtYHXZIzY8F61sv9YAis9/5
XNOHkg9Cn/n8C2RRQ8vnq0FEI1g7se1UGbe/1NkD4xeR/bzhE/AZSoOrRE7G/XJx
qbF9FkPzd4OXYB2Pdm37G1BWsfN4M1bY1rOmmCyMKym793+b/jM7xdoZY1QfbabP
uKiavuK8RYgqxrEilhP0asvafKjpZaJbn2R3jwHZgQDWe7WH5FhXwX2UcUpQsTan
XZd+/cWaYXjLsKJbiAzy3UArgnzSrHPSpwIOkYq8Lf8EvPgS2g3LLJYbw250Cf1G
74stwoK4PgZ3e6k0nkMk43x1swKb13Gp0vCZjVdnIec9EQgOHfI=
=X8iC
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 7.0
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception.
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process.
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers.
- More pKVM fixes for features that are not supposed to be exposed to
guests.
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor.
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding.
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest.
- Preliminary work for guest GICv5 support.
- A bunch of debugfs fixes, removing pointless custom iterators stored
in guest data structures.
- A small set of FPSIMD cleanups.
- Selftest fixes addressing the incorrect alignment of page
allocation.
- Other assorted low-impact fixes and spelling fixes.
Expose DMABUF functionality to userspace through the uverbs interface,
enabling InfiniBand/RDMA devices to export PCI based memory regions
(e.g. device memory) as DMABUF file descriptors. This allows
zero-copy sharing of RDMA memory with other subsystems that support the
dma-buf framework.
A new UVERBS_OBJECT_DMABUF object type and allocation method were
introduced.
During allocation, uverbs invokes the driver to supply the
rdma_user_mmap_entry associated with the given page offset (pgoff).
Based on the returned rdma_user_mmap_entry, uverbs requests the driver
to provide the corresponding physical-memory details as well as the
driver’s PCI provider information.
Using this information, dma_buf_export() is called; if it succeeds,
uobj->object is set to the underlying file pointer returned by the
dma-buf framework.
The file descriptor number follows the standard uverbs allocation flow,
but the file pointer comes from the dma-buf subsystem, including its own
fops and private data.
When an mmap entry is removed, uverbs iterates over its associated
DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that
their importers are notified.
The same procedure applies during the disassociate flow; final cleanup
occurs when the application closes the file.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260201-dmabuf-export-v3-2-da238b614fe3@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The custom definition of 'struct timespec64' is incompatible with both the
kernel's internal definition and the glibc type, at least on big-endian
targets that have the tv_nsec field in a different place, and the
definition clashes with any userspace that also defines a timespec64
structure.
Running the header check with -Wpadding enabled produces this output that
warns about the incorrect padding:
usr/include/linux/taskstats.h:25:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded]
Remove the hack and instead use the regular __kernel_timespec type that is
meant to be used in uapi definitions.
Link: https://lkml.kernel.org/r/20260202095906.1344100-1-arnd@kernel.org
Fixes: 29b63f6eff0e ("delayacct: add timestamp of delay max")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Jiang Kun <jiang.kun2@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The following warnings were visible:
$ ./scripts/kernel-doc -Wall -none \
net/mptcp/ include/net/mptcp.h include/uapi/linux/mptcp*.h \
include/trace/events/mptcp.h
Warning: net/mptcp/token.c:108 No description found for return value of 'mptcp_token_new_request'
Warning: net/mptcp/token.c:151 No description found for return value of 'mptcp_token_new_connect'
Warning: net/mptcp/token.c:246 No description found for return value of 'mptcp_token_get_sock'
Warning: net/mptcp/token.c:298 No description found for return value of 'mptcp_token_iter_next'
Warning: net/mptcp/protocol.c:4431 No description found for return value of 'mptcp_splice_read'
Warning: include/uapi/linux/mptcp_pm.h:13 missing initial short description on line:
* enum mptcp_event_type
Address all of them: either by using the 'Return:' keyword, or by adding
a missing initial short description.
The MPTCP CI will soon report issues with kdoc to avoid introducing new
issues and being flagged by the Netdev CI.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-3-c2720ce75c34@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
pointer to the preceding Capability (Qiang Yu)
- Add dw_pcie_remove_capability() and dw_pcie_remove_ext_capability() to
remove Capabilities that are advertised but not fully implemented (Qiang
Yu)
- Remove MSI and MSI-X Capabilities for DWC controllers in platforms that
can't support them, so we automatically fall back to INTx (Qiang Yu)
- Remove MSI-X and DPC Capabilities for Qualcomm platforms that advertise
but don't support them (Qiang Yu)
- Remove duplicate dw_pcie_ep_hide_ext_capability() function and replace
with dw_pcie_remove_ext_capability() (Qiang Yu)
- Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status for
drivers that support this (Shawn Lin)
- Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link
is not up to avoid an unnecessary timeout (Manivannan Sadhasivam)
- Revert dw-rockchip, qcom, and DWC core changes that used link-up IRQs to
trigger enumeration instead of waiting for link to be up because the PCI
core doesn't allocate bus number space for hierarchies that might be
attached (Niklas Cassel)
- Make endpoint iATU entry for MSI permanent instead of programming it
dynamically, which is slow and racy with respect to other concurrent
traffic, e.g., eDMA (Koichiro Den)
- Use iMSI-RX MSI target address when possible to fix endpoints using
32-bit MSI (Shawn Lin)
- Make dw_pcie_ltssm_status_string() available and use it for logging
errors in dw_pcie_wait_for_link() (Manivannan Sadhasivam)
- Return -ENODEV when dw_pcie_wait_for_link() finds no devices, -EIO for
device present but inactive, -ETIMEDOUT for other failures, so callers
can handle these cases differently (Manivannan Sadhasivam)
- Allow DWC host controller driver probe to continue if device is not found
or found but inactive; only fail when there's an error with the link
(Manivannan Sadhasivam)
- For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers are
not accessible after PME_Turn_Off, simply wait 10ms instead of polling
for L2/L3 Ready (Richard Zhu)
- Use multiple iATU entries to map large bridge windows and DMA ranges when
necessary instead of failing (Samuel Holland)
- Rename struct dw_pcie_rp.has_msi_ctrl to .use_imsi_rx for clarity (Qiang
Yu)
- Add EPC dynamic_inbound_mapping feature bit for Endpoint Controllers that
can update BAR inbound address translation without requiring EPF driver
to clear/reset the BAR first, and advertise it for DWC-based Endpoints
(Koichiro Den)
- Add EPC subrange_mapping feature bit for Endpoint Controllers that can
map multiple independent inbound regions in a single BAR, implement
subrange mapping, advertise it for DWC-based Endpoints, and add Endpoint
selftests for it (Koichiro Den)
- Allow overriding default BAR sizes for pci-epf-test (Niklas Cassel)
- Make resizable BARs work for Endpoint multi-PF configurations; previously
it only worked for PF 0 (Aksh Garg)
- Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings, and
Address Match Mode (Aksh Garg)
- Fix issues with outbound iATU index assignment that caused iATU index to
be out of bounds (Niklas Cassel)
- Clean up iATU index tracking to be consistent (Niklas Cassel)
- Set up iATU when ECAM is enabled; previously IO and MEM outbound windows
weren't programmed, and ECAM-related iATU entries weren't restored after
suspend/resume, so config accesses failed (Krishna Chaitanya Chundru)
* pci/controller/dwc:
PCI: dwc: Fix missing iATU setup when ECAM is enabled
PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
PCI: dwc: Fix msg_atu_index assignment
PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup
PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support
PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations
PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes
selftests: pci_endpoint: Add BAR subrange mapping test case
misc: pci_endpoint_test: Add BAR subrange mapping test case
PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support
Documentation: PCI: endpoint: Clarify pci_epc_set_bar() usage
PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU
PCI: dwc: Advertise dynamic inbound mapping support
PCI: endpoint: Add BAR subrange mapping support
PCI: endpoint: Add dynamic_inbound_mapping EPC feature
PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity
PCI: dwc: Fix grammar and formatting for comment in dw_pcie_remove_ext_capability()
PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ranges
PCI: dwc: Remove duplicate dw_pcie_ep_hide_ext_capability() function
PCI: dwc: Skip waiting for L2/L3 Ready if dw_pcie_rp::skip_l23_wait is true
PCI: dwc: Fail dw_pcie_host_init() if dw_pcie_wait_for_link() returns -ETIMEDOUT
PCI: dwc: Rework the error print of dw_pcie_wait_for_link()
PCI: dwc: Rename and move ltssm_status_string() to pcie-designware.c
PCI: dwc: Return -EIO from dw_pcie_wait_for_link() if device is not active
PCI: dwc: Return -ENODEV from dw_pcie_wait_for_link() if device is not found
PCI: dwc: Use cfg0_base as iMSI-RX target address to support 32-bit MSI devices
PCI: dwc: ep: Cache MSI outbound iATU mapping
Revert "PCI: dwc: Don't wait for link up if driver can detect Link Up event"
Revert "PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt"
Revert "PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported"
Revert "PCI: qcom: Don't wait for link if we can detect Link Up"
Revert "PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ"
Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up"
PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up
PCI: dw-rockchip: Change get_ltssm() to provide L1 Substates info
PCI: dwc: Add L1 Substates context to ltssm_status of debugfs
PCI: qcom: Remove DPC Extended Capability
PCI: qcom: Remove MSI-X Capability for Root Ports
PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller
PCI: dwc: Add new APIs to remove standard and extended Capability
PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
- Move ABI requirement next to each access right to prepare adding more
access rights;
- Mention the possibility to remove the random component of a socket's
ephemeral port choice within the netns-wide ephemeral port range,
since it allows choosing the "random" ephemeral port.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251212163704.142301-2-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a
given Landlock ruleset is applied to all threads of the calling
process, instead of only the current one.
Without this flag, multithreaded userspace programs currently resort
to using the nptl(7)/libpsx hack for multithreaded policy enforcement,
which is also used by libcap and for setuid(2). Using this
userspace-based scheme, the threads of a process enforce the same
Landlock policy, but the resulting Landlock domains are still
separate. The domains being separate causes multiple problems:
* When using Landlock's "scoped" access rights, the domain identity is
used to determine whether an operation is permitted. As a result,
when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads
stops working. This is a problem for programming languages and
frameworks which are inherently multithreaded (e.g. Go).
* In audit logging, the domains of separate threads in a process will
get logged with different domain IDs, even when they are based on
the same ruleset FD, which might confuse users.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com
[mic: Fix restrict_self_flags test, clean up Makefile, allign comments,
reduce local variable scope, add missing includes]
Closes: https://github.com/landlock-lsm/linux/issues/2
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Currently io_uring supports restricting operations on a per-ring basis.
To use those, the ring must be setup in a disabled state by setting
IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and
the ring can then be enabled.
This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd
== -1, like the other "blind" register opcodes which work on the task
rather than a specific ring. This allows registration of the same kind
of restrictions as can been done on a specific ring, but with the task
itself. Once done, any ring created will inherit these restrictions.
If a restriction filter is registered with a task, then it's inherited
on fork for its children. Children may only further restrict operations,
not extend them.
Inheriting restrictions include both the classic
IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF
filters that have been registered with the task via
IORING_REGISTER_BPF_FILTER.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The newly added structure causes a warning about implied padding:
./usr/include/linux/kvm.h:1247:1: error: padding struct size to alignment boundary with 6 bytes [-Werror=padded]
The padding can lead to leaking kernel data and ABI incompatibilies
when used on x86. Neither of these is a problem in this specific
patch, but it's best to avoid it and use explicit padding fields
in general.
Fixes: 0ee4ddc164 ("KVM: s390: Storage key manipulation IOCTL")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
- ath drivers: small features/cleanups
- rtw drivers: mostly refactoring for rtw89 RTL8922DE support
- mac80211: use hrtimers for CAC to avoid too long delays
- cfg80211/mac80211: some initial UHR (Wi-Fi 8) support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmmDNuEACgkQ10qiO8sP
aADhvA/+J35p2CDkffi1KfZxxx1YdHAAlj1zjhjLzshCMCG3oWzLpOL7se5bgN/C
axPLPbeCAXtsRXln083lbwtrSRexPHSVhelPDNtybLPEocQYrksV8a6V3eWXCNTR
ymN4iDaO/K0gLkDRKH5T8lwZvJttA6iHi+Fm4ir+dsr0O5vwwe4CuAEPA1SuZ2rh
0lQMz6pEzsxq+sZX3p8SoBwXx147l0n6gwMNIgBTKo1tjZha4oaavdvcqq4zaZWV
WCcg4YVA/dWHL0UuwtIF8uQADM43quegBBUFx63QgzfgcnHAnBk2Ckeein/bfvnv
XOKlI4UJi1cxTkTJkDOrSn5IwBzVSlBXE3qEUKKnu5G3+ZgfdsnWmSPeTtOndvAE
rgbwwZb2SKH1kCvL0FDZTwq/iR9KF60ZfhWIq9Sz7m6VZxJoR8QACHglYCysj2JB
B1+oT53EIqP7Ob4s/GN2Yg9M0l4Lv3E6J9g6h3b8yeq9qEXVF8MaVN683rtNpec9
mUqLRlcoToB2W/qvEVESKj8jMvajYZ6TDoO7mSP3paTW3HgMC3wlPJlDc4Q/6h7e
LAKEljXlv6ofNGCcCL37l6KATqSZpIZn+tpSqbELIirWlc/rnTIDU2qZRb7MA1e1
3lKdrS6pOXGS1GJr7HWuLb4cX1SukyXNeyIcZJlSFoxG4oDPvwI=
=/NUu
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Some more changes, including pulls from drivers:
- ath drivers: small features/cleanups
- rtw drivers: mostly refactoring for rtw89 RTL8922DE support
- mac80211: use hrtimers for CAC to avoid too long delays
- cfg80211/mac80211: some initial UHR (Wi-Fi 8) support
* tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (59 commits)
wifi: brcmsmac: phy: Remove unreachable error handling code
wifi: mac80211: Add eMLSR/eMLMR action frame parsing support
wifi: mac80211: add initial UHR support
wifi: cfg80211: add initial UHR support
wifi: ieee80211: add some initial UHR definitions
wifi: mac80211: use wiphy_hrtimer_work for CAC timeout
wifi: mac80211: correct ieee80211-{s1g/eht}.h include guard comments
wifi: ath12k: clear stale link mapping of ahvif->links_map
wifi: ath12k: Add support TX hardware queue stats
wifi: ath12k: Add support RX PDEV stats
wifi: ath12k: Fix index decrement when array_len is zero
wifi: ath12k: support OBSS PD configuration for AP mode
wifi: ath12k: add WMI support for spatial reuse parameter configuration
dt-bindings: net: wireless: ath11k-pci: deprecate 'firmware-name' property
wifi: ath11k: add usecase firmware handling based on device compatible
wifi: ath10k: sdio: add missing lock protection in ath10k_sdio_fw_crashed_dump()
wifi: ath10k: fix lock protection in ath10k_wmi_event_peer_sta_ps_state_chg()
wifi: ath10k: snoc: support powering on the device via pwrseq
wifi: rtw89: pci: warn if SPS OCP happens for RTL8922DE
wifi: rtw89: pci: restore LDO setting after device resume
...
====================
Link: https://patch.msgid.link/20260204121143.181112-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new IOCTL to allow userspace to manipulate storage keys directly.
This will make it easier to write selftests related to storage keys.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Parse the pipeline direction from topology. The direction_valid token is
required for backward-compatibility with older topologies that may not
have the direction set for pipelines. This will be used when
setting up pipelines to check if a pipeline is in the same direction as
the requested params and skip those in the opposite direction like in
the case of echo reference capture pipelines during playback.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20260204081833.16630-6-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Align with the firmware and add the missing token for pipeline kcps.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20260204081833.16630-4-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Add 2-bit tcpi_ecn_mode feild within tcp_info to indicate which ECN
mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN,
or ECN_MODE_PENDING. This is done by utilizing available bits from
tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and
tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits).
Also, an extra 24-bit tcpi_options2 field is identified to represent
newer options and connection features, as all 8 bits of tcpi_options
field have been used.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If we encounter a filesystem with the remap-tree incompat flag set,
validate its compatibility with the other flags, and load the remap tree
using the values that have been added to the superblock.
The remap-tree feature depends on the free-space-tree, but no-holes and
block-group-tree have been made dependencies to reduce the testing
matrix. Similarly I'm not aware of any reason why mixed-bg and zoned would be
incompatible with remap-tree, but this is blocked for the time being
until it can be fully tested.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a struct btrfs_block_group_item_v2, which is used in the block group
tree if the remap-tree incompat flag is set.
This adds two new fields to the block group item: `remap_bytes` and
`identity_remap_count`.
`remap_bytes` records the amount of data that's physically within this
block group, but nominally in another, remapped block group. This is
necessary because this data will need to be moved first if this block
group is itself relocated. If `remap_bytes` > 0, this is an indicator to
the relocation thread that it will need to search the remap-tree for
backrefs. A block group must also have `remap_bytes` == 0 before it can
be dropped.
`identity_remap_count` records how many identity remap items are located
in the remap tree for this block group. When relocation is begun for
this block group, this is set to the number of holes in the free-space
tree for this range. As identity remaps are converted into actual remaps
by the relocation process, this number is decreased. Once it reaches 0,
either because of relocation or because extents have been deleted, the
block group has been fully remapped and its chunk's device extents are
removed.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new METADATA_REMAP chunk type, which is a metadata chunk that holds the
remap tree.
This is needed for bootstrapping purposes: the remap tree can't itself
be remapped, and must be relocated the existing way, by COWing every
leaf. The remap tree can't go in the SYSTEM chunk as space there is
limited, because a copy of the chunk item gets placed in the superblock.
The changes in fs/btrfs/volumes.h are because we're adding a new block
group type bit after the profile bits, and so can no longer rely on the
const_ilog2 trick.
The sizing to 32MB per chunk, matching the SYSTEM chunk, is an estimate
here, we can adjust it later if it proves to be too big or too small.
This works out to be ~500,000 remap items, which for a 4KB block size
covers ~2GB of remapped data in the worst case and ~500TB in the best case.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an incompat flag for the new remap-tree feature, and the constants
and definitions needed to support it.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add optional support for device notifications in VMClock. When
supported, the hypervisor will send a device notification every time it
updates the seq_count to a new even value.
Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will return a POLLIN event to
listeners every time seq_count changes to a value different than the one
last seen (since open() or last read()/pread()). This means that when
poll() returns a POLLIN event, listeners need to use read() to observe
what has changed and update the reader's view of seq_count. In other
words, after a poll() returned, all subsequent calls to poll() will
immediately return with a POLLIN event until the listener calls read().
The device advertises support for the notification mechanism by setting
flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If
the flag is not present the driver won't setup the ACPI notification
handler and poll() will always immediately return POLLHUP.
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions as a response to
such events, e.g. as discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.
Hypervisor advertises support for vm_generation_counter through the
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the
presence of this bit in vmclock_abi flags field before using this flag.
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazur <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enable the support to report packet pacing capabilities
from kernel to user space. Packet pacing allows to limit
the rate to any number between the maximum and minimum.
The capabilities are exposed to user space through query_device.
The following capabilities are reported:
1. The maximum and minimum rate limit in kbps.
2. Bitmap showing which QP types support rate limit.
Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20260202133413.3182578-3-kalesh-anakkur.purayil@broadcom.com
Reviewed-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Problem
=======
Commit 658eb5ab91 ("delayacct: add delay max to record delay peak")
introduced the delay max for getdelays, which records abnormal latency
peaks and helps us understand the magnitude of such delays. However, the
peak latency value alone is insufficient for effective root cause
analysis. Without the precise timestamp of when the peak occurred, we
still lack the critical context needed to correlate it with other system
events.
Solution
========
To address this, we need to additionally record a precise timestamp when
the maximum latency occurs. By correlating this timestamp with system
logs and monitoring metrics, we can identify processes with abnormal
resource usage at the same moment, which can help us to pinpoint root
causes.
Use Case
========
bash-4.4# ./getdelays -d -t 227
print delayacct stats ON
TGID 227
CPU count real total virtual total delay total delay average delay max delay min delay max timestamp
46 188000000 192348334 4098012 0.089ms 0.429260ms 0.051205ms 2026-01-15T15:06:58
IO count delay total delay average delay max delay min delay max timestamp
0 0 0.000ms 0.000000ms 0.000000ms N/A
SWAP count delay total delay average delay max delay min delay max timestamp
0 0 0.000ms 0.000000ms 0.000000ms N/A
RECLAIM count delay total delay average delay max delay min delay max timestamp
0 0 0.000ms 0.000000ms 0.000000ms N/A
THRAS HING count delay total delay average delay max delay min delay max timestamp
0 0 0.000ms 0.000000ms 0.000000ms N/A
COMPACT count delay total delay average delay max delay min delay max timestamp
0 0 0.000ms 0.000000ms 0.000000ms N/A
WPCOPY count delay total delay average delay max delay min delay max timestamp
182 19413338 0.107ms 0.547353ms 0.022462ms 2026-01-15T15:05:24
IRQ count delay total delay average delay max delay min delay max timestamp
0 0 0.000ms 0.000000ms 0.000000ms N/A
Link: https://lkml.kernel.org/r/20260119100241520gWubW8-5QfhSf9gjqcc_E@zte.com.cn
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a new feature flag UBLK_F_NO_AUTO_PART_SCAN to allow users to suppress
automatic partition scanning when starting a ublk device.
This is useful for some cases in which user don't want to scan
partitions.
Users still can manually trigger partition scanning later when appropriate
using standard tools (e.g., partprobe, blockdev --rereadpt).
Reported-by: Yoav Cohen <yoav@nvidia.com>
Link: https://lore.kernel.org/linux-block/DM4PR12MB63280C5637917C071C2F0D65A9A8A@DM4PR12MB6328.namprd12.prod.outlook.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- cfg80211/mac80211
- most of EPPKE/802.1X over auth frames support
- additional FTM capabilities
- split up drop reasons better, removing generic RX_DROP
- NAN cleanups/fixes
- ath11k:
- support for Channel Frequency Response measurement
- ath12k:
- support for the QCC2072 chipset
- iwlwifi:
- partial NAN support
- UNII-9 support
- some UHR/802.11bn FW APIs
- remove most of MLO/EHT from iwlmvm
(such devices use iwlmld)
- rtw89:
- preparations for RTL8922DE support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAml7PQcACgkQ10qiO8sP
aAAKiA/6AnyNxa0bX2VFsWYW6KJYnJBVNLlP2ghkV3uIWtJoZdXuQO+W8/cy9Cng
yrhfPzNfT+2hqmxasxI0tND3H3tW9CqcwX80J84eP9JCpYuPept9uGpSxPQoQl5J
Q2k9gX1NlO/SEa8/mOFDT4EmH0bQobxiN84kxSg6Riaazkj6ZjHVVm/3PgzNhxlA
v77m5thlhopzYxKn38qA19E9uHSLcY7XwkeYOZDf00Zhgot29lmDeHOf39IH+HvI
+a20q6tW59D7iX2IUyvLnWzFV1iEcJ6ONF/hYJ0r3TlfmX/NDWfOQxx87K8M1Tqh
sMa+FGrFdqloE1aYi1l+9m6Wu30pHmh7vhlgskPffPmvG+RkCEQCg1Me7eoFOzTB
81K2CMJ34Cp9se+QdiBtY5GpRPZIOlFmY6ZVyZIoEXHkn6r0R94e6dsMZuFcqjv1
y1dzv7BnraVMAQcqwkE9pQtq6LeJoHl2OUT2JzjbKhQhivMf9YubPBZ2QC1LZdMg
NYEX4XSeJ/etpUk1MZFnm5wOw545tMi3U2sAhpYWbE6UBPDrQBvYADqd3lq3DmWe
BdCDHTbqMnAJ3C0xFEKTYTmVF8IoFt6eOclFUPw4Uhq+YmU9x8wx1yBQbF9TjyKU
a/rDCahmryj5gwD0QFJKhdQjfKaQFVNZWZqaKaokM84+8kIdA2U=
=70Rs
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Another fairly large set of changes, notably:
- cfg80211/mac80211
- most of EPPKE/802.1X over auth frames support
- additional FTM capabilities
- split up drop reasons better, removing generic RX_DROP
- NAN cleanups/fixes
- ath11k:
- support for Channel Frequency Response measurement
- ath12k:
- support for the QCC2072 chipset
- iwlwifi:
- partial NAN support
- UNII-9 support
- some UHR/802.11bn FW APIs
- remove most of MLO/EHT from iwlmvm
(such devices use iwlmld)
- rtw89:
- preparations for RTL8922DE support
* tag 'wireless-next-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (184 commits)
wifi: iwlegacy: add missing mutex protection in il4965_store_tx_power()
wifi: iwlegacy: add missing mutex protection in il3945_store_measurement()
wifi: mac80211: use u64_stats_t with u64_stats_sync properly
wifi: p54: Fix memory leak in p54_beacon_update()
wifi: cfg80211: treat deprecated INDOOR_SP_AP_OLD control value as LPI mode
wifi: rtw88: sdio: Migrate to use sdio specific shutdown function
wifi: rsi: sdio: Migrate to use sdio specific shutdown function
sdio: Provide a bustype shutdown function
wifi: nl80211/cfg80211: support operating as RSTA in PMSR FTM request
wifi: nl80211/cfg80211: add negotiated burst period to FTM result
wifi: nl80211/cfg80211: clarify periodic FTM parameters for non-EDCA based ranging
wifi: nl80211/cfg80211: add new FTM capabilities
wifi: iwlwifi: rename struct iwl_mcc_allowed_ap_type_cmd::offset_map
wifi: iwlwifi: mvm: Remove link_id from time_events
wifi: iwlwifi: mld: change cluster_id type to u8 array
wifi: iwlwifi: support V13 of iwl_lari_config_change_cmd
wifi: iwlwifi: split bios_value_u32 to separate the header
wifi: iwlwifi: uefi: cache the DSM functions
wifi: iwlwifi: acpi: cache the DSM functions
wifi: iwlwifi: mvm: Cleanup MLO code
...
====================
Link: https://patch.msgid.link/20260129110136.176980-39-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the dpll subsystem exports the fractional frequency offset
(FFO) in parts per million (ppm). This granularity is insufficient for
high-precision synchronization scenarios which often require parts per
trillion (ppt) resolution.
Add a new netlink attribute DPLL_A_PIN_FRACTIONAL_FREQUENCY_OFFSET_PPT
to expose the FFO in ppt.
Update the dpll netlink core to expect the driver-provided FFO value
to be in ppt. To maintain backward compatibility with existing userspace
tools, populate the legacy DPLL_A_PIN_FRACTIONAL_FREQUENCY_OFFSET
attribute by dividing the new ppt value by 1,000,000.
Update the zl3073x and mlx5 drivers to provide the FFO value in ppt:
- zl3073x: adjust the fixed-point calculation to produce ppt (10^12)
instead of ppm (10^6).
- mlx5: scale the existing ppm value by 1,000,000.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260126162253.27890-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new PCITEST_BAR_SUBRANGE ioctl to exercise EPC BAR subrange
mapping end-to-end.
The test programs a simple 2-subrange layout on the endpoint (via
pci-epf-test) and verifies that:
- the endpoint-provided per-subrange signature bytes are observed at
the expected PCIe BAR offsets, and
- writes to each subrange are routed to the correct backing region
(i.e. the submap order is applied rather than accidentally working
due to an identity mapping).
Return -EOPNOTSUPP when the endpoint does not advertise subrange
mapping, -ENODATA when the BAR is disabled, and -EBUSY when the BAR is
reserved for the test register space.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260124145012.2794108-8-den@valinux.co.jp
fs-verity introduced inode flag for inodes with enabled fs-verity on
them. This patch adds FS_XFLAG_VERITY file attribute which can be
retrieved with FS_IOC_FSGETXATTR ioctl() and file_getattr() syscall.
This flag is read-only and can not be set with corresponding set ioctl()
and file_setattr(). The FS_IOC_SETFLAGS requires file to be opened for
writing which is not allowed for verity files. The FS_IOC_FSSETXATTR and
file_setattr() clears this flag from the user input.
As this is now common flag for both flag interfaces (flags/xflags) add
it to overlapping flags list to exclude it from overwrite.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260126115658.27656-2-aalbersh@kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Expose a new register type NT_RISCV_USER_CFI for risc-v CFI status and
state. Intentionally, both landing pad and shadow stack status and
state are rolled into the CFI state. Creating two different
NT_RISCV_USER_XXX would not be useful and would waste a note
type. Enabling, disabling and locking the CFI feature is not allowed
via ptrace set interface. However, setting 'elp' state or setting
shadow stack pointer are allowed via the ptrace set interface. It is
expected that 'gdb' might need to fixup 'elp' state or 'shadow stack'
pointer.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-19-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; cleaned patch description and comments; addressed checkpatch issues]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Three architectures (x86, aarch64, riscv) have support for indirect
branch tracking feature in a very similar fashion. On a very high
level, indirect branch tracking is a CPU feature where CPU tracks
branches which use a memory operand to transfer control. As part of
this tracking, during an indirect branch, the CPU expects a landing
pad instruction on the target PC, and if not found, the CPU raises
some fault (architecture-dependent).
x86 landing pad instr - 'ENDBRANCH'
arch64 landing pad instr - 'BTI'
riscv landing instr - 'lpad'
Given that three major architectures have support for indirect branch
tracking, this patch creates architecture-agnostic 'prctls' to allow
userspace to control this feature. They are:
- PR_GET_INDIR_BR_LP_STATUS: Get the current configured status for indirect
branch tracking.
- PR_SET_INDIR_BR_LP_STATUS: Set the configuration for indirect branch
tracking.
The following status options are allowed:
- PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on user
thread.
- PR_INDIR_BR_LP_DISABLE: Disables indirect branch tracking on user
thread.
- PR_LOCK_INDIR_BR_LP_STATUS: Locks configured status for indirect branch
tracking for user thread.
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-13-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description, code comments]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Add support for assigning Address Space Identifiers (ASIDs) to each VQ
group. This enables mapping each group into a distinct memory space.
The vq group to ASID association is protected by a rwlock now. But the
mutex domain_lock keeps protecting the domains of all ASIDs, as some
operations like the one related with the bounce buffer size still
requires to lock all the ASIDs.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-12-eperezma@redhat.com>
This allows separate the different virtqueues in groups that shares the
same address space. Asking the VDUSE device for the groups of the vq at
the beginning as they're needed for the DMA API.
Allocating 3 vq groups as net is the device that need the most groups:
* Dataplane (guest passthrough)
* CVQ
* Shadowed vrings.
Future versions of the series can include dynamic allocation of the
groups array so VDUSE can declare more groups.
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-4-eperezma@redhat.com>
This allows the kernel to detect whether the userspace VDUSE device
supports the VQ group and ASID features. VDUSE devices that don't set
the V1 API will not receive the new messages, and vdpa device will be
created with only one vq group and asid.
The next patches implement the new feature incrementally, only enabling
the VDUSE device to set the V1 API version by the end of the series.
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20260119143306.1818855-3-eperezma@redhat.com>
Add a new "min_threads" variable to the nfsd_net, along with the
corresponding netlink interface, to set that value from userland.
Pass that value to svc_set_pool_threads() and svc_set_num_threads().
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>