-- On the standard library side:
The `input: []const u8` parameter of functions passed to `testing.fuzz`
has changed to `smith: *testing.Smith`. `Smith` is used to generate
values from libfuzzer or input bytes generated by libfuzzer.
`Smith` contains the following base methods:
* `value` as a generic method for generating any type
* `eos` for generating end-of-stream markers. Provides the additional
guarantee `true` will eventually by provided.
* `bytes` for filling a byte array.
* `slice` for filling part of a buffer and providing the length.
`Smith.Weight` is used for giving value ranges a higher probability of
being selected. By default, every value has a weight of zero (i.e. they
will not be selected). Weights can only apply to values that fit within
a u64. The above functions have corresponding ones that accept weights.
Additionally, the following functions are provided:
* `baselineWeights` which provides a set of weights containing every
possible value of a type.
* `eosSimpleWeighted` for unique weights for `true` and `false`
* `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range
of values.
-- On the libfuzzer and abi side:
--- Uids
These are u32s which are used to classify requested values. This solves
the problem of a mutation causing a new value to be requested and
shifting all future values; for example:
1. An initial input contains the values 1, 2, 3 which are interpreted
as a, b, and c respectively by the test.
2. The 1 is mutated to a 4 which causes the test to request an extra
value interpreted as d. The input is now 4, 2, 3, 5 (new value) which
the test corresponds to a, d, b, c; however, b and c no longer
correspond to their original values.
Uids contain a hash component and type component. The hash component
is currently determined in `Smith` by taking a hash of the calling
`@returnAddress()` or via an argument in the corresponding `WithHash`
functions. The type component is used extensively in libfuzzer with its
hashmaps.
--- Mutations
At the start of a cycle (a run), a random number of values to mutate is
selected with less being exponentially more likely. The indexes of the
values are selected from a selected uid with a logarithmic bias to uids
with more values.
Mutations may change a single values, several consecutive values in a
uid, or several consecutive values in the uid-independent order they
were requested. They may generate random values, mutate from previous
ones, or copy from other values in the same uid from the same input or
spliced from another.
For integers, mutations from previous ones currently only generates
random values. For bytes, mutations from previous mix new random data
and previous bytes with a set number of mutations.
--- Passive Minimization
A different approach has been taken for minimizing inputs: instead of
trying a fixed set of mutations when a fresh input is found, the input
is instead simply added to the corpus and removed when it is no longer
valuable.
The quality of an input is measured based off how many unique pcs it
hit and how many values it needed from the fuzzer. It is tracked which
inputs hold the best qualities for each pc for hitting the minimum and
maximum unique pcs while needing the least values.
Once all an input's qualities have been superseded for the pcs it hit,
it is removed from the corpus.
-- Comparison to byte-based smith
A byte-based smith would be much more inefficient and complex than this
solution. It would be unable to solve the shifting problem that Uids
do. It is unable to provide values from the fuzzer past end-of-stream.
Even with feedback, it would be unable to act on dynamic weights which
have proven essential with the updated tests (e.g. to constrain values
to a range).
-- Test updates
All the standard library tests have been updated to use the new smith
interface. For `Deque`, an ad hoc allocator was written to improve
performance and remove reliance on heap allocation. `TokenSmith` has
been added to aid in testing Ast and help inform decisions on the smith
interface.
Changes an assert back into a conditional to match the behavior of `getPosix`, see https://codeberg.org/ziglang/zig/pulls/31113#issuecomment-10371698 and https://github.com/ziglang/zig/issues/23331.
Note: the conditional has been updated to also return null early on 0-length key lookups, since there's no need to iterate the block in that case.
For `Environ.Map`, validation of keys has been split into two categories: 'put' and 'fetch', each of which are tailored to the constraints that the implementation actually relies upon. Specifically:
- Hashing (fetching) requires the keys to be valid WTF-8 on Windows, but does not rely on any other properties of the keys (attempting to fetch `F\x00=` is not a problem, it just won't be found)
- `create{Posix,Windows}Block` relies on the Map to always have fully valid keys (no NUL, no `=` in an invalid location, no zero-length keys), which means that the 'put' APIs need to validate that incoming keys adhere to those properties.
The relevant assertions are now documented on each of the Map functions.
Also reinstates some test cases in the `env_vars` standalone test. Some of the reinstated tests are effectively just testing the Environ.Map implementation due to how `Environ.contains`, `Environ.getAlloc`, etc are implemented, but that is not inherent to those functions so the tests are still potentially relevant if e.g. `contains` is implemented in terms of `getPosix`/`getWindows` in the future (which is totally possible and maybe a good idea since constructing the whole map is not necessary for looking up one key).
In Zig standard library, Dir means an open directory handle. path
represents a file system identifier string. This function is better
named after "current path" than "current dir". "get" and "working" are
superfluous.
We use long calls for these just like thumb*-linux-* to prevent range issues as
the binaries grow larger over time.
We also need function and data sections due to the many __stack_chk_guard
references within the std test binary; without these options, the linker is not
able to insert range thunks in between functions because the std binary just has
one giant .text section that's opaque to the linker.
closes https://codeberg.org/ziglang/zig/issues/30923
e2338edb47 didn't *quite* do it, the call sites of all switch prong related
functions now have to do their part too and be a little more precise about
what kind of prong they're currently analyzing.
Also removes some unused/unnecessary stuff.
This fixes a regression from a couple of commits ago; breaking from the
`else` block of a loop to the loop's tag should be allowed when explicitly
targeting the label by name.
This commit aims to simplify and de-duplicate the logic required for
semantically analyzing `switch` expressions.
The core logic has been moved to `analyzeSwitchBlock`, `resolveSwitchBlock`
and `finishSwitchBlock` and has been rewritten around the new iterator-based
API exposed by `Zir.UnwrappedSwitchBlock`.
All validation logic and switch prong item resolution have been moved to
`validateSwitchBlock`, which produces a `ValidatedSwitchBlock` containing
all the necessary information for further analysis.
`Zir.UnwrappedSwitchBlock`, `ValidatedSwitchBlock` and `SwitchOperand`
replace `SwitchProngAnalysis` while adding more flexibility, mainly for
better integration with `switch_block_err_union`.
`analyzeSwitchBlock` has an explicit code path for OPV types which lowers
them to either a `block`-`br` or a `loop`-`repeat` construct instead of a
switch. Backends expect `switch` to actually have an operand that exists
at runtime, so this is a bug fix and avoids further special cases in the
rest of the switch logic.
`resolveSwitchBlock` and `finishSwitchBr` exclusively deal with operands
which can have more than one value, at comptime and at runtime respectively.
This commit also reworks `RangeSet` to be an unmanaged container and adds
`Air.SwitchBr.BranchHints` to offload some complexity from Sema to there
and save a few bytes of memory in the process.
Additionally, some new features have been implemented:
- decl literals and everything else requiring a result type (`@enumFromInt`!)
may now be used as switch prong items
- union tag captures are now allowed for all prongs, not just `inline` ones
- switch prongs may contain errors which are not in the error set being
switched on, if these prongs contain `=> comptime unreachable`
and some bugs have been fixed:
- lots of issues with switching on OPV types are now fixed
- the rules around unreachable `else` prongs when switching on errors now
apply to *any* switch on an error, not just to `switch_block_err_union`,
and are applied properly based on the AST
- switching on `void` no longer requires an `else` prong unconditionally
- lazy values are properly resolved before any comparisons with prong items
- evaluation order between all kinds of switch statements is now the same,
with or without label
Enhances `GenZir` to allow labels to provide separate `break` and `continue`
target blocks and adds some more information on continue targets to
communicate whether the target is a switch block or cannot be targeted by
`continue` at all.
The main motivation is enabling this:
```
const result: u32 = operand catch |err| label: switch (err) {
else => continue :label error.MyError,
error.MyError => break :label 1,
};
```
to be lowered to something like this:
```
%1 = block({
%2 = is_non_err(%operand)
%3 = condbr(%2, {
%4 = err_union_payload_unsafe(%operand)
%5 = break(%1, result) // targets enclosing `block`
}, {
%6 = err_union_code(%operand)
%7 = switch_block(%6,
else => {
%8 = switch_continue(%7, "error.MyError") // targets `switch_block`
},
"error.MyError" => {
%9 = break(%1, @one) // targets enclosing `block`
},
)
%10 = break(%1, @void_value)
})
})
```
which makes the non-error case and all breaks from switch prongs, but not
continues from switch prongs, peers.
This is required to avoid the problems described in gh#11957 for labeled
switches without having to introduce a fairly complex special case to the
`switch_block_err_union` logic. Since this construct is very rare in practice,
introducing this additional complexity just to save a few ZIR bytes is
likely not worth it, so the simplified lowering described above will be
used instead.
As a nice bonus, AstGen can now also detect a `continue` trying to target
a labeled block and emit an appropriate error message.
Previously Zig allowed you to write something like,
```zig
switch (x) {
.y => |_| {
```
This seems a bit strange because in other cases, such as when
capturing the tag in a switch case,
```zig
switch (x) {
.y => |_, _| {
```
this produces an error.
The only usecase I can think of for the previous behaviour is
if you wanted to assert that all union payloads are able
to coerce,
```zig
const X = union(enum) { y: u8, z: f32 };
switch (x) {
.y, .z => |_| {
```
This will compile-error with the `|_|` and pass without it.
I don't believe this usecase is strong enough to keep the current
behaviour; it was never used in the Zig codebase and I cannot
find a single usage of this behaviour in the real world, searching
through Sourcegraph.
If the float can store all possible values of the integer without
rounding, coercion is allowed. The integer's precision must be less than
or equal to the float's significand precision.
Closes#18614