Linux kernel source tree
Brian Foster 52aecaee1c mm: zero range of eof folio exposed by inode size extension
On some filesystems, it is currently possible to create a transient
data inconsistency between pagecache and on-disk state. For example,
on a 1k block size ext4 filesystem:

$ xfs_io -fc "pwrite 0 2k" -c "mmap 0 4k" -c "mwrite 2k 2k" \
	  -c "truncate 8k" -c "fiemap -v" -c "pread -v 2k 16" <file>
...
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..3]:          17410..17413         4   0x1
   1: [4..15]:         hole                12
00000800:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 2k 16" <file>
00000800:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

This allocates and writes two 1k blocks, writes through a mapping to
the post-eof portion of the (4k) eof folio, extends the file, and then
shows that the post-eof data is not cleared before the file size is
extended. The result is pagecache with a clean and uptodate folio over
a hole that returns non-zero data. Once the folio is reclaimed, the
pagecache returns valid (zeroed) data again.

Some filesystems avoid this problem by flushing the EOF folio before
inode size extension. This triggers writeback time partial post-eof
zeroing. XFS explicitly zeroes newly exposed file ranges via
iomap_zero_range(), but this includes a hack to flush dirty but
hole-backed folios, which means writeback actually does the zeroing
in this particular case as well. bcachefs explicitly flushes the eof
folio on truncate extension to the same effect, but doesn't handle
the analogous write extension case (i.e., replace "truncate 8k" with
"pwrite 4k 4k" in the above example command to reproduce the same
problem on bcachefs). btrfs doesn't seem to support subpage block
sizes.

The two main options to avoid this behavior are either to flush or to
do the appropriate zeroing during size extending operations. Zeroing
is only required when the size change exposes ranges of the file
that haven't been directly written, such as a write or truncate that
starts beyond the current eof. The pagecache_isize_extended() helper
is already used for this particular scenario. It currently cleans
any PTEs for the eof folio to ensure preexisting mappings fault and
allow the filesystem to take action based on the updated inode size.
This is required to ensure the folio is fully backed by allocated
blocks, for example, but this also happens to be the same scenario
in which zeroing is required.

Update pagecache_isize_extended() to zero the post-eof range of the
eof folio if it is dirty at the time of the size change, since
writeback now won't have the chance. If non-dirty, the folio has
either not been written or the post-eof portion was zeroed by
writeback.
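
The rule described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel
implementation: struct model_folio and model_isize_extended() are
hypothetical names invented here, the folio is a plain byte buffer, and
the PTE cleaning and other checks done by the real
pagecache_isize_extended() are left out. It only shows which byte range
is zeroed when the folio holding the old eof is dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a pagecache folio; not a kernel type. */
struct model_folio {
	uint64_t pos;		/* file offset of the folio's first byte */
	size_t size;		/* folio size in bytes */
	bool dirty;		/* true if the folio has not been written back */
	unsigned char *data;	/* folio contents, 'size' bytes long */
};

/*
 * Model of the zeroing step for a size change from 'from' (old eof) to
 * 'to' (new eof). If the folio contains the old eof and is dirty, zero
 * from the old eof up to the new eof or the end of the folio, whichever
 * comes first. Returns the number of bytes zeroed.
 */
static size_t model_isize_extended(struct model_folio *f,
				   uint64_t from, uint64_t to)
{
	uint64_t folio_end, end;

	assert(f != NULL && f->data != NULL);
	folio_end = f->pos + f->size;

	/* Only size extensions expose new ranges. */
	if (to <= from)
		return 0;
	/* The old eof must fall inside this folio. */
	if (from < f->pos || from >= folio_end)
		return 0;
	/*
	 * A non-dirty folio was either never written or already had its
	 * post-eof portion zeroed by writeback.
	 */
	if (!f->dirty)
		return 0;

	end = to < folio_end ? to : folio_end;
	memset(f->data + (from - f->pos), 0, (size_t)(end - from));
	return (size_t)(end - from);
}
```

For the example at the top of this message (old size 2k, new size 8k,
one dirty 4k folio at offset 0), the model zeroes bytes 2048 through
4095 of the folio and leaves the first 2048 bytes untouched.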

Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20240919160741.208162-3-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 23:54:14 -05:00
arch - Explicitly disable the TSC deadline timer when going idle to address 2024-10-20 12:04:32 -07:00
block block-6.12-20241018 2024-10-18 15:53:00 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push fixes the following issues: 2024-10-16 08:42:54 -07:00
Documentation Char/Misc/IIO fixes for 6.12-rc4 2024-10-20 13:10:44 -07:00
drivers bluetooth pull request for net: 2024-10-20 14:08:17 -07:00
fs ext4: partial zero eof block on unaligned inode size extension 2024-11-12 23:54:14 -05:00
include TTY/Serial driver fixes for 6.12-rc4 2024-10-20 13:03:30 -07:00
init cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS 2024-10-13 22:23:13 +02:00
io_uring io_uring/rw: fix wrong NOWAIT check in io_rw_init_file() 2024-10-19 09:25:45 -06:00
ipc struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
kernel - Add PREEMPT_RT maintainers 2024-10-20 11:30:56 -07:00
lib Rust fixes for v6.12 (2nd) 2024-10-19 08:32:47 -07:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm: zero range of eof folio exposed by inode size extension 2024-11-12 23:54:14 -05:00
net bluetooth pull request for net: 2024-10-20 14:08:17 -07:00
rust Driver core fix for 6.12-rc3 2024-10-13 09:10:52 -07:00
samples [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
scripts kbuild: rust: add CONFIG_RUSTC_LLVM_VERSION 2024-10-13 22:22:28 +02:00
security ipe: fallback to platform keyring also if key in trusted keyring is rejected 2024-10-18 12:14:53 -07:00
sound ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2 2024-10-16 10:29:57 +02:00
tools BPF fixes: 2024-10-18 16:27:14 -07:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Kbuild updates for v6.12 2024-09-24 13:02:06 -07:00
.mailmap mailmap: add an entry for Andy Chiu 2024-10-17 00:28:08 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS CREDITS: sort alphabetically by name 2024-10-09 12:47:19 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Char/Misc/IIO fixes for 6.12-rc4 2024-10-20 13:10:44 -07:00
Makefile Linux 6.12-rc4 2024-10-20 15:19:38 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.