The goal of these changes is to allow the C backend to support the new
lazier type resolution system implemented by the frontend. This required
a full rewrite of the `CType` abstraction, and major changes to the C
backend "linker".
The `DebugConstPool` abstraction introduced in a previous commit turns
out to be useful for the C backend to codegen types. Because this use
case is not debug information but rather general linking (albeit when
targeting an unusual object format), I have renamed the abstraction to
`ConstPool`. With it, the C linker is told when a type's layout becomes
known, and can at that point generate the corresponding C definitions,
rather than deferring this work until `flush`.
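The notification flow described above can be modelled with a toy sketch. Everything here is hypothetical (the `TypeDefinitionSink` class, its method names, and the field format are assumptions for illustration, not the real `ConstPool` or linker API); it only shows the idea of emitting a definition when a layout becomes known, so that `flush` merely collects finished buffers.

```python
class TypeDefinitionSink:
    """Hypothetical stand-in for the linker side of the notification."""

    def __init__(self):
        # Maps a type name to its already-rendered C definition.
        self.definitions = {}

    def layout_resolved(self, name, fields):
        # Called once the layout is known: render the definition right away.
        # `fields` is a list of (field_name, c_type) pairs (assumed format).
        members = " ".join(f"{ctype} {field};" for field, ctype in fields)
        self.definitions[name] = f"struct {name} {{ {members} }};"

    def flush(self):
        # Nothing is generated here; the buffers are only gathered.
        return list(self.definitions.values())


sink = TypeDefinitionSink()
sink.layout_resolved("Point", [("x", "int32_t"), ("y", "int32_t")])
buffers = sink.flush()
```

In this model the expensive step happens inside `layout_resolved`, which mirrors the claim that definitions are produced during compilation rather than at `flush` time.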
The work done in `flush` is now more-or-less *solely* focused on
collecting all of the buffers into a big array for a vectored write.
This does unfortunately involve a non-trivial graph traversal to emit
type definitions in an appropriate order, but it's still quite fast in
practice, and it operates on fairly compact dependency data. We don't
generate the actual type *definitions* in `flush`; that happens during
compilation using `ConstPool` as discussed above. (We do generate the
typedefs for underaligned types in `flush`, but that's a trivial amount
of work in most cases.)
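As a rough illustration of the ordering problem, the following sketch performs a depth-first post-order traversal over by-value dependencies. It is a minimal model, not the traversal the backend actually uses: the function name and the dictionary input format are assumptions, and the real dependency data is more compact than this.

```python
def order_type_definitions(by_value_deps):
    """Order type names so each type follows the types it embeds by value.

    `by_value_deps` maps a type name to the names of types whose complete
    definition must already be visible (hypothetical input format).
    """
    ordered = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            # A type cannot contain itself by value, so this is an input error.
            raise ValueError(f"by-value cycle through {name}")
        state[name] = "visiting"
        for dep in by_value_deps.get(name, ()):
            visit(dep)
        state[name] = "done"
        ordered.append(name)  # emitted only after all of its dependencies

    for name in by_value_deps:
        visit(name)
    return ordered


deps = {
    "Outer": ["Inner", "Header"],
    "Inner": ["Header"],
    "Header": [],
}
order = order_type_definitions(deps)
```

Here `Header` is emitted first, then `Inner`, then `Outer`. Pointer-only references are left out of the sketch because in C they only need a forward declaration.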
`CType` is now an ephemeral type: it is created only when we render a
type (the logic for which has been pushed into just 2 or 3 functions in
`codegen.c`---most of the backend now operates on unmolested Zig `Type`s
instead). C types are no longer stored in a "pool", although the type
"dependencies" of generated C code (that is, the structs, unions, and
typedefs which the generated code references) are tracked in some
simple hash sets and given to the linker so it can codegen the types.
Most importantly, adds support for `DW_TAG_typedef` to `llvm.Builder`,
and uses it to define error sets and optional pointers/errors.
Also deletes some random dead code I found.
The LLVM backend can now run the behavior tests and standard library
tests, like the x86_64 backend can. This commit required me to make a
lot of changes to how the LLVM backend lowers debug information, and
while I was doing that, I improved a few things:
* `anyerror` is now an enum type (and other error sets just wrap it), so
  error values appear by name in debuggers
* Fix broken lowering for tagged unions with zero-width payloads
* Associate container types with source locations in all cases
* Avoid depending on the order of type resolution (using the new
  `DebugConstPool` abstraction), so debug information will contain all
  available type information rather than just the subset which happens
  to be resolved when the backend lowers that debug type
Introduces a small abstraction, `link.DebugConstPool`, to deal with
lowering type/value information into debug info when it may not be known
until type resolution (which in some cases will *never* happen). It is
currently only used by the self-hosted DWARF logic, but it will also be of
use to the LLVM backend (which is my next focus).
Because of packed structs, checking whether a type is extern-compatible
requires that its layout be resolved. Performing this validation as soon
as a function type is created would lead to dependency loops in cases
like `*const fn (*@This()) void callconv(.c)`. Therefore, when creating
a function *type*, we no longer perform this check immediately, instead
waiting until the function is called.
This is separate from the previous commit so that these changes can be
easily reverted in the event that we decide to allow more granularity in
default value resolution in exchange for increased language complexity.
Pointers to comptime-only types (e.g. `*type`) are no longer themselves
comptime-only types. This means explicit `comptime` annotations are
required in a few more places. However, it also introduces the ability
to access pointers to (including slices of) comptime-only types at
runtime, provided only runtime fields are being accessed.
`@sizeOf` and `@bitSizeOf` are now more restricted: they are not allowed
on comptime-only or NPV (uninstantiable) types. This is because there is
no correct way to actually use the returned ABI size (e.g. you cannot
copy a comptime-only type by copying all of its runtime bits), so having
a non-zero return value had no benefit and was simply confusing.
`packed struct`s and `packed union`s can no longer contain pointer
fields. There are a few reasons for this, but in particular, binary
formats do not typically support the relocation types we would need to
lower such values into static memory. See the proposal at
https://github.com/ziglang/zig/issues/24657 for details.
Unions with no fields are now "uninstantiable" types, which work like
`noreturn` in that values of this type cannot exist. Enums with no
fields are different because they are currently considered `extern`
types, though https://github.com/ziglang/zig/issues/19855 will change
this in the future.
`comptime_int` is no longer considered a valid backing type for an enum.
In other words, `enum(comptime_int)` is a compile error. This change was
accepted to simplify the language.
This actually doesn't cause any dependency loops in std, which is pretty
much my benchmark for it being acceptable. This can be reverted if it
turns out to be problematic, but for now, let's err on the side of
language simplicity.
To be clear, this *does* regress some cases which previously worked: I
will have to remove some behavior tests as a result of this commit. To
be honest, the tests which look to be failing as a result of this are
things which I think are generally unadvisable; I actually reckon a bit
more friction to use default field values in non-trivial ways might be a
good thing to stop people from misusing them as much. Struct fields
should very rarely have default values; about the only common situation
where they make sense is "options" structs.
Now that https://github.com/ziglang/zig/issues/24657 has been
implemented, the compiler can simplify its internal representation of
comptime-known `packed struct` and `packed union` values. Instead of
storing them field-wise, we can simply store their backing integer
value. This simplifies many operations and improves efficiency in some
cases.
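To make the backing-integer representation concrete, here is a small sketch of packing field values into one integer, with the first field in the least significant bits as in Zig's `packed struct` layout. The helper names are hypothetical and only unsigned fields are handled; this models the idea and is not the compiler's internal code.

```python
def pack_fields(fields):
    """Pack (bit_width, value) pairs into one integer, first field lowest."""
    backing = 0
    offset = 0
    for width, value in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        backing |= value << offset
        offset += width
    return backing, offset  # the integer and its total bit width


def read_field(backing, widths, index):
    """Extract one field from the backing integer by shifting and masking."""
    offset = sum(widths[:index])
    return (backing >> offset) & ((1 << widths[index]) - 1)


# Models `packed struct { a: u3, b: u1, c: u4 }` with a = 5, b = 1, c = 9.
backing, bits = pack_fields([(3, 5), (1, 1), (4, 9)])
c_value = read_field(backing, [3, 1, 4], 2)
```

With this representation, comparing two values is a single integer comparison, and reading a field is a shift and a mask.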
...and rework some of the incremental reference tracking. Almost all
kinds of `AnalUnit` have one property in common: they might never be
referenced in any update despite conceptually "existing", in which case
we don't want to waste time semantically analyzing them. As of the lazy
type resolution introduced in this commit, the only units to which this
does not apply are `memoized_state` and `@"comptime"`. Previously, I had
a somewhat hacky system in `Zcu` for dealing with this, but I now have a
better understanding of the design incremental compilation is converging
on, so I can implement a better solution. By finding a few unused bits
lying around (...or making them), we can represent a single bit of state
indicating whether something's corresponding units have ever been
referenced. This is akin to the units being in `Zcu.outdated`, with the
key difference being that the compiler will *not* attempt to analyze
units which are in this state. Once they are first referenced or
depended on, the flag is set to true and the unit is added to `outdated`
so that it can participate in the normal dependency resolution logic.
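The state machine described above can be sketched as follows. This is a simplified model under stated assumptions: `UnitTracker` and its methods are hypothetical names, a dictionary stands in for the single bit of state, and a plain set stands in for `Zcu.outdated`.

```python
class UnitTracker:
    """Toy model of the "ever referenced" bit and the outdated set."""

    def __init__(self):
        self.referenced = {}   # unit -> bool, models the single bit of state
        self.outdated = set()  # models the set of units awaiting analysis

    def declare(self, unit):
        # A unit conceptually exists, but nothing has referenced it yet,
        # so it is deliberately NOT queued for analysis.
        self.referenced.setdefault(unit, False)

    def reference(self, unit):
        # The first reference flips the bit and queues the unit exactly once.
        if not self.referenced.get(unit, False):
            self.referenced[unit] = True
            self.outdated.add(unit)

    def take_work(self):
        # Return the units to analyze in this update and clear the queue.
        work = sorted(self.outdated)
        self.outdated.clear()
        return work


tracker = UnitTracker()
tracker.declare("a")
tracker.declare("b")
tracker.reference("a")
first_update = tracker.take_work()
tracker.reference("a")  # already referenced: not queued again by this path
second_update = tracker.take_work()
```

In the model, `b` is never analyzed because it is never referenced, which is the time saving the paragraph describes. Later invalidation of `a` would go through the normal dependency logic, which this sketch omits.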
It is always a bug in Sema to check whether an IES is resolved. This is
because whether the IES is resolved depends on whether the function
which owns it has been analyzed yet, which depends on the order the
compiler analyzes declarations in, which it is incorrect to have any
dependency on. Instead, we must always either not look at the resolved
set, or resolve it first (with `Sema.ensureFuncIesResolved`) and then
look at the definitely-resolved concrete error set.
Luckily, removing a bunch of the buggy logic which tried to
opportunistically use already-resolved inferred error sets actually
didn't regress anything! It seems this logic was mostly left over from
before Andrew reworked inferred error sets, and had become essentially
dead code. This is because inferred error sets are stricter than they
used to be, and in particular, we make no attempt to support mutual
recursion.
I suspect that most of the logic touching IESes can be simplified even
further than I have done here without regressing any existing code; my
goal in this commit was just to remove any *buggy* code I could find.