Matches now use memcpy and memset when possible.
Block loops have been rewritten to be more optimizer friendly.
Reworks Symbol and HuffmanDecoder
* Symbol now only includes the value and number of code bits.
decodeSymbol returns only the value.
* HuffmanDecoder now takes the regular bits instead of the reversed.
* Code table construction now uses buckets instead of sorting.
* For linked codes, the value field of Symbol is now used as the next
index; the actual value is the element index (see the sketch after this
list).
* InvalidCode is now detected only once with a special linked index.
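A rough sketch of the reworked layout; the field names and widths below
are assumptions for illustration, not the actual std.compress.flate
code:

```zig
const Symbol = struct {
    // For ordinary codes this is the decoded value; for linked (long)
    // codes it is the index of the next table entry, and the decoded
    // value is the element's own index instead. A dedicated linked
    // index marks InvalidCode so it is detected only once.
    value: u16,
    // Number of bits this code consumes from the input stream.
    code_bits: u4,
};
```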
Performance is 39.7% faster than before and 1.1% faster than gzip using
a sample created from compressing a tar of the src directory.
-- On the standard library side:
The `input: []const u8` parameter of functions passed to `testing.fuzz`
has changed to `smith: *testing.Smith`. `Smith` is used to generate
values either directly from libfuzzer or from input bytes generated by
libfuzzer.
`Smith` contains the following base methods (illustrated in the example
after this list):
* `value` as a generic method for generating any type
* `eos` for generating end-of-stream markers. Provides the additional
guarantee that `true` will eventually be returned.
* `bytes` for filling a byte array.
* `slice` for filling part of a buffer and providing the length.
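For illustration, a fuzz test using the new interface might look
roughly like this; the method names follow the list above, but the
exact signatures and error handling are assumptions, not the finalized
API:

```zig
const std = @import("std");

test "fuzz with the smith interface" {
    try std.testing.fuzz({}, struct {
        fn run(_: void, smith: *std.testing.Smith) !void {
            var sum: u64 = 0;
            // `eos` is guaranteed to eventually return true.
            while (!smith.eos()) {
                // Generic value generation replaces slicing raw input bytes.
                sum +%= smith.value(u8);
                var buf: [8]u8 = undefined;
                // Fill a byte array directly from the fuzzer.
                smith.bytes(&buf);
            }
            if (sum == 0xDEAD) return error.FoundBug;
        }
    }.run, .{});
}
```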
`Smith.Weight` is used for giving value ranges a higher probability of
being selected. By default, every value has a weight of zero (i.e. it
will not be selected). Weights can only apply to values that fit within
a u64. The above functions have corresponding variants that accept
weights.
Additionally, the following functions are provided:
* `baselineWeights` which provides a set of weights containing every
possible value of a type.
* `eosSimpleWeighted` for giving `true` and `false` their own weights.
* `valueRangeAtMost` and `valueRangeLessThan` for weighting only a range
of values.
-- On the libfuzzer and abi side:
--- Uids
These are u32s which are used to classify requested values. This solves
the problem of a mutation causing a new value to be requested and
shifting all future values; for example:
1. An initial input contains the values 1, 2, 3 which are interpreted
as a, b, and c respectively by the test.
2. The 1 is mutated to a 4, which causes the test to request an extra
value interpreted as d. The input is now 4, 2, 3, 5 (new value), which
the test interprets as a, d, b, and c; however, b and c no longer
correspond to their original values.
Uids contain a hash component and type component. The hash component
is currently determined in `Smith` by taking a hash of the calling
`@returnAddress()` or via an argument in the corresponding `WithHash`
functions. The type component is used extensively in libfuzzer with its
hashmaps.
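As a sketch of the idea (the layout, widths, and hash below are
assumptions; the actual abi is not specified here):

```zig
const std = @import("std");

// A uid pairs a hash of the requesting call site with a tag for the
// kind of value requested, so inserting a new request does not shift
// the identity of later requests.
const Uid = packed struct(u32) {
    site_hash: u26,
    type_tag: u6,
};

fn makeUid(return_address: usize, type_tag: u6) Uid {
    // Hash the call site so the same request site keeps the same uid
    // even when other requests are added or removed by a mutation.
    const h = std.hash.Wyhash.hash(0, std.mem.asBytes(&return_address));
    return .{ .site_hash = @truncate(h), .type_tag = type_tag };
}
```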
--- Mutations
At the start of a cycle (a run), a random number of values to mutate is
selected, with smaller counts being exponentially more likely. The
indexes of the values to mutate are chosen within a selected uid, with a
logarithmic bias toward uids that have more values.
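Purely to illustrate the bias (this is not the libfuzzer code), picking
a count where smaller values are exponentially more likely can be as
simple as:

```zig
const std = @import("std");

fn pickMutationCount(rng: std.Random, max: u32) u32 {
    var n: u32 = 1;
    // Each additional mutation only happens with probability 1/2, so a
    // count of k is roughly 2^-k likely (capped at max).
    while (n < max and rng.boolean()) n += 1;
    return n;
}
```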
Mutations may change a single value, several consecutive values in a
uid, or several consecutive values in the uid-independent order they
were requested. They may generate random values, mutate from previous
ones, or copy from other values in the same uid, either from the same
input or spliced from another.
For integers, mutating from a previous value currently just generates a
random value. For bytes, mutating from previous values mixes new random
data with the previous bytes using a set number of mutations.
--- Passive Minimization
A different approach has been taken for minimizing inputs: instead of
trying a fixed set of mutations when a fresh input is found, the input
is simply added to the corpus and removed when it is no longer
valuable.
The quality of an input is measured based on how many unique pcs it hit
and how many values it needed from the fuzzer. For each pc, the inputs
with the best qualities are tracked: those hitting the minimum and
maximum number of unique pcs while needing the fewest values.
Once all of an input's qualities have been superseded for the pcs it
hit, it is removed from the corpus.
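A sketch of the bookkeeping this describes; the type and field names
are illustrative, not the actual fuzzer data structures:

```zig
const Quality = struct {
    unique_pcs: u32, // how many unique pcs the input hit
    values_needed: u32, // how many values it requested from the fuzzer
};

const PcBest = struct {
    // Corpus indexes of the inputs that hit this pc with the fewest and
    // the most unique pcs while needing the fewest values. An input no
    // longer referenced by any pc has been superseded and is removed.
    fewest_pcs_input: usize,
    most_pcs_input: usize,
};
```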
-- Comparison to byte-based smith
A byte-based smith would be far less efficient and more complex than
this solution. It would be unable to solve the shifting problem that
Uids do, and it would be unable to provide values from the fuzzer past
end-of-stream. Even with feedback, it would be unable to act on dynamic
weights, which have proven essential with the updated tests (e.g. to
constrain values to a range).
-- Test updates
All the standard library tests have been updated to use the new smith
interface. For `Deque`, an ad hoc allocator was written to improve
performance and remove reliance on heap allocation. `TokenSmith` has
been added to aid in testing Ast and help inform decisions on the smith
interface.
After fetching a package and applying the filter by deleting files that
are not part of the hash, a recompressed $GLOBAL_CACHE/p/$PKG_HASH.tar.gz
is now created.
Checking this cache before fetching network URLs is not yet implemented.
The call to `rebase` in `discardIndirect` and `discardDirect` was inappropriate. As `rebase` expects the `capacity` parameter to exclude the sliding window, this call was asking for ANOTHER `d.window_len` bytes. This was impossible to fulfill with a buffer smaller than 2*`d.window_len`, and caused [#25764](https://github.com/ziglang/zig/issues/25764).
This PR adds a basic test to do a discard (which does trigger [#25764](https://github.com/ziglang/zig/issues/25764)), and rebases only as much as is required to make the discard succeed ([or no rebase at all](https://github.com/ziglang/zig/issues/25764#issuecomment-3484716253)). That means: ideally rebase to fit `limit`, or if the buffer is too small, as much as possible.
I must say, `discardDirect` does not make much sense to me, but I replaced it anyway. `rebaseForDiscard` works fine with `d.reader.buffer.len == 0`. Let me know if anything should be changed.
Reviewed-on: https://codeberg.org/ziglang/zig/pulls/30891
Reviewed-by: Andrew Kelley <andrew@ziglang.org>
Co-authored-by: mercenary <mercenary@noreply.codeberg.org>
Co-committed-by: mercenary <mercenary@noreply.codeberg.org>
Implements deflate compression from scratch. A history window is kept in
the writer's buffer for matching and a chained hash table is used to
find matches. Tokens are accumulated until a threshold is reached and
then output as a block. Flush is used to indicate the end of the stream.
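A minimal sketch of the chained hash table idea, assuming a 32 KiB
window; the constants, hash, and structure below are illustrative
rather than the actual std.compress.flate implementation:

```zig
const std = @import("std");

const window_len = 32 * 1024;
const hash_bits = 15;

const Matcher = struct {
    // Newest window position recorded for each hash bucket.
    head: [1 << hash_bits]u32 = [_]u32{0} ** (1 << hash_bits),
    // Previous position that hashed to the same bucket, forming a chain.
    prev: [window_len]u32 = [_]u32{0} ** window_len,

    fn hash(bytes: *const [4]u8) u32 {
        const v = std.mem.readInt(u32, bytes, .little);
        return (v *% 0x9E3779B1) >> (32 - hash_bits);
    }

    // Record `pos` so later lookups can walk the chain of earlier
    // positions whose next four bytes hashed to the same bucket.
    fn insert(m: *Matcher, pos: u32, bytes: *const [4]u8) void {
        const h = hash(bytes);
        m.prev[pos % window_len] = m.head[h];
        m.head[h] = pos;
    }

    // Returns the most recent candidate position; follow `prev` from
    // there to enumerate older candidates within the window.
    fn newestCandidate(m: *const Matcher, bytes: *const [4]u8) u32 {
        return m.head[hash(bytes)];
    }
};
```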
Additionally, two other deflate writers are provided:
* `Raw` writes only in store blocks (the uncompressed bytes). It
utilizes data vectors to efficiently send block headers and data.
* `Huffman` only performs Huffman compression on data and no matching.
The above are also able to take advantage of writer semantics since they
do not need to keep a history.
Literal and distance code parameters in `token` have also been reworked.
Their parameters are now derived mathematically; however, the more
expensive ones are still obtained through a lookup table (except on
ReleaseSmall).
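For example, the deflate distance code can be derived from a base-2
logarithm instead of a table lookup. This is a sketch of the general
technique, not the actual `token` code:

```zig
const std = @import("std");

/// Distance code (0..29) for a deflate match distance in 1..32768,
/// computed from floor(log2(d - 1)) rather than a lookup table.
fn distanceCode(d: u16) u5 {
    std.debug.assert(d >= 1 and d <= 32768);
    if (d <= 4) return @intCast(d - 1);
    const x: u16 = d - 1;
    const log: u4 = @intCast(15 - @clz(x)); // floor(log2(d - 1))
    return @intCast(2 * @as(u5, log) + ((x >> (log - 1)) & 1));
}

test distanceCode {
    try std.testing.expectEqual(@as(u5, 0), distanceCode(1));
    try std.testing.expectEqual(@as(u5, 7), distanceCode(13));
    try std.testing.expectEqual(@as(u5, 29), distanceCode(32768));
}
```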
Decompression bit reading has been greatly simplified, taking advantage
of the ability to peek on the underlying reader. Additionally, a few
bugs with limit handling have been fixed.
The lzma2 Decoder already checks whether decoding is finished inside the
process function. `range_decoder` being finished does not mean the
decoder has finished; `ld.rep[0] == 0xFFFF_FFFF` also needs to be
checked, which was already done inside the process function. This fix
deletes the redundant `isFinish()` check for `range_decoder`.
missing these things:
- implementation of finish()
- detect packed bytes read for check and block padding
- implementation of discard()
- implementation of block stream checksum
Previously, an index out of bounds could occur when copying match_length bytes while decoding whichever sequence happened to overflow `dest`. Now, each sequence checks that there is enough room for the full sequence_length (literal_length + match_length) before doing any copying.
Fixes the failing inputs found here: https://github.com/ziglang/zig/issues/24817#issuecomment-3192927715
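In spirit, the added guard amounts to the following; the function and
error names here are illustrative, not the actual std.compress.zstd
identifiers:

```zig
// Verify the whole sequence fits before any bytes are copied, rather
// than overflowing `dest` partway through the match copy.
fn ensureSequenceFits(
    dest_remaining: usize,
    literal_length: usize,
    match_length: usize,
) error{MalformedBlock}!void {
    if (literal_length + match_length > dest_remaining)
        return error.MalformedBlock;
}
```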
* std.Io.Reader: appendRemaining no longer supports alignment and has
different rules about exceeding the limit. Fixed a bug where it would
return success instead of error.StreamTooLong as it was supposed to.
* std.Io.Reader: simplify appendRemaining and appendRemainingUnlimited
to be implemented based on std.Io.Writer.Allocating
* std.Io.Writer: introduce unreachableRebase
* std.Io.Writer: remove minimum_unused_capacity from Allocating. maybe
that flexibility could have been handy, but let's see if anyone
actually needs it. The field is redundant with the superlinear growth
of ArrayList capacity.
* std.Io.Writer: growingRebase also ensures total capacity on the
preserve parameter, making it no longer necessary to do
ensureTotalCapacity at the usage site of decompression streams.
* std.compress.flate.Decompress: fix rebase not taking into account seek
* std.compress.zstd.Decompress: split into "direct" and "indirect" usage
patterns depending on whether a buffer is provided to init, matching
how flate works. Remove some overzealous asserts that prevented buffer
expansion from within rebase implementation.
* std.zig: fix readSourceFileToAlloc returning an overaligned slice
which was difficult to free correctly.
fixes #24608
The previous code assumed that `initFrame` during the `new_frame` state would always result in the `in_frame` state, but that's not always the case. `initFrame` can also result in the `skippable_frame` state, which would lead to access of union field 'in_frame' while field 'skipping_frame' is active.
Now, the switch is re-entered with the updated state so either case is handled appropriately.
Fixes the crashes from https://github.com/ziglang/zig/issues/24817
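A minimal sketch of that control flow: the skippable-frame magic check
is standard zstd framing, but the types and functions below are
simplified stand-ins for the actual decompressor:

```zig
const std = @import("std");

const State = enum { new_frame, in_frame, skippable_frame };

fn initFrame(magic: u32) State {
    // Skippable-frame magic numbers begin a skippable frame, not a
    // regular data frame.
    return if (magic & 0xFFFF_FFF0 == 0x184D_2A50) .skippable_frame else .in_frame;
}

fn step(magic: u32) []const u8 {
    var state: State = .new_frame;
    // Re-enter the switch with whatever state initFrame produced
    // instead of assuming it is always .in_frame.
    while (true) switch (state) {
        .new_frame => state = initFrame(magic),
        .in_frame => return "decode frame",
        .skippable_frame => return "skip frame",
    };
}

test "skippable frames do not take the in_frame path" {
    try std.testing.expectEqualStrings("skip frame", step(0x184D2A50));
    try std.testing.expectEqualStrings("decode frame", step(0xFD2FB528));
}
```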
Previously, the "allow EndOfStream" part of this logic was too permissive. If there are a few dangling bytes at the end of the stream, that should be treated as a bad magic number. The only case where EndOfStream is allowed is when the stream is truly at the end, with exactly zero bytes available.
This "get" is useless noise and was copied from FixedBufferWriter.
Since this API has not yet landed in a release, now is a good time
to make the breaking change to fix this.
I don't see why the byte returned from specialPeek needs to be shifted
by remaining_needed_bits.
I believe the decision in specialPeek should be based on the number of
remaining bits, not on the content of those bits.
Some test results changed, but they are now consistent with the original
state as found in:
https://github.com/ziglang/zig/blame/5f790464b0d5da3c4c1a7252643e7cdd4c4b605e/lib/std/compress/flate/Decompress.zig
Changing Bits from usize to u32 or u64 now produces the same results.
* flate: simplify peekBitsEnding
`peekBits` returns at most the asked-for number of bits. It fails with
EndOfStream when there are no bits available. If fewer bits are
available than asked for, it still returns the available bits.
Hopefully this change better reflects the intention. On the first input
stream peek error, we break out of the loop.
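As a toy illustration of that contract (not the actual bit reader):
peeking returns fewer bits than requested near the end of input and
errors only when nothing at all is available.

```zig
const std = @import("std");

// Return up to `n` leading bits from `available`, failing only when no
// bits remain at all.
fn peekUpTo(available: []const u1, n: usize) error{EndOfStream}![]const u1 {
    if (available.len == 0) return error.EndOfStream;
    return available[0..@min(n, available.len)];
}

test "fewer bits than asked for near end of stream" {
    const bits = [_]u1{ 1, 0, 1 };
    try std.testing.expectEqual(@as(usize, 3), (try peekUpTo(&bits, 8)).len);
    try std.testing.expectError(error.EndOfStream, peekUpTo(bits[3..], 8));
}
```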