Modifies the `Allocator` implementation provided by `ArenaAllocator` to be
threadsafe using only atomics and no synchronization primitives locked
behind an `Io` implementation.
At its core this is a lock-free singly linked list which uses CAS loops to
exchange the head node. A nice property of `ArenaAllocator` is that the
only functions that can ever remove nodes from its linked list are `reset`
and `deinit`, both of which are not part of the `Allocator` interface and
thus aren't threadsafe, so node-related ABA problems are impossible.
There *are* some trade-offs: end index tracking is now per node instead of
per allocator instance. It's not possible to publish a head node and its
end index at the same time if the latter isn't part of the former.
Another compromise had to be made in regards to resizing existing nodes.
Annoyingly, `rawResize` of an arbitrary thread-safe child allocator can
of course never be guaranteed to be an atomic operation, so only one
`alloc` call can ever resize at the same time, other threads have to
consider any resizes they attempt during that time failed. This causes
slightly less optimal behavior than what could be achieved with a mutex.
The LSB of `Node.size` is used to signal that a node is being resized.
This means that all nodes have to have an even size.
Calls to `alloc` have to allocate new nodes optimistically as they can
only know whether any CAS on a head node will succeed after attempting it,
and to attempt the CAS they of course already need to know the address of
the freshly allocated node they are trying to make the new head.
The simplest solution to this would be to just free the new node again if
a CAS fails, however this can be expensive and would mean that in practice
arenas could only really be used with a GPA as their child allocator. To
work around this, this implementation keeps its own free list of nodes
which didn't make their CAS to be reused by a later `alloc` invocation.
To keep things simple and avoid ABA problems the free list is only ever
be accessed beyond its head by 'stealing' the head node (and thus the
entire list) with an atomic swap. This makes iteration and removal trivial
since there's only ever one thread doing it at a time which also owns all
nodes it's holding. When the thread is done it can just push its list onto
the free list again.
This implementation offers comparable performance to the previous one when
only being accessed by a single thread and a slight speedup compared to
the previous implementation wrapped into a `ThreadSafeAllocator` up to ~7
threads performing operations on it concurrently.
(measured on a base model MacBook Pro M1)
- delete std.Thread.Futex
- delete std.Thread.Mutex
- delete std.Thread.Semaphore
- delete std.Thread.Condition
- delete std.Thread.RwLock
- delete std.once
std.Thread.Mutex.Recursive remains... for now. it will be replaced with
a special purpose mechanism used only by panic logic.
std.Io.Threaded exposes mutexLock and mutexUnlock for the advanced case
when you need to call them directly.
Our implementation did the classic add-sub rounding trick `(y = x +/- C =+ C - x)`
with `C = 1 / eps(T) = 2^(mantissa - 1)`. This approach only works for values whose
magnitude is below the rounding capacity of the constant. For a 64-bit mantissa
(like f80 has), `C = 2^63` only rounds for `|x| < 2^63`. Before we allowed this to
be ran on `e < bias + 64` aka `|x| < 2^64`. And because it isn't large enough,
we lose a bit to rounding.
For reference, the musl implementation does the same thing, using `mantissa - 1`:
https://git.musl-libc.org/cgit/musl/tree/src/math/ceill.c#n18
where `LDBL_MANT_DIG` is 64 for `long double` on x86.
This commit also combines the floor and ceil implementations into one generic one.
It has been observed in practice on powerpc64(le)-linux that zig.o (in various
stages and build modes) is large enough that the +/- 32MB branch range of
PowerPC is insufficient to reach from one end of the code section to the other.
With LLVM 21, this leads to silent miscompiles that then crash at runtime. With
LLVM 22, it will at least lead to branch range errors that fail the compilation,
but that gets us no closer to a working compiler.
By using these options, we give the linker much greater flexibility to move code
and data around to satisfy these range constraints; without them, the linker is
not allowed to split up the huge code and data sections of zig.o to do so.
Similar issues have also been observed on powerpc-linux (32-bit), hexagon-linux,
and some variations of mips(64)(el)-linux. But let's be conservative for now;
those other targets can be added to the condition later.
As a data point to support this change, it's worth noting that LLD started
enabling these options for LTO precisely because the resulting large compilation
units ran into these range issues. In some abstract sense, Zig can be seen as
doing a limited form of "LTO" in the frontend, so it's not surprising that we
would hit the same issues.
`__addosi4`, `__addodi4`, `__addoti4`, `__subosi4`, `__subodi4`, and
`__suboti4` were all functions which we invented for no apparent reason.
Neither LLVM, nor GCC, nor the Zig compiler use these functions. It
appears the functions were created in a kind of misunderstanding of an
old language proposal; see https://github.com/ziglang/zig/pull/10824.
There is no benefit to these functions existing; if a Zig compiler
backend needs this operation, it is trivial to implement, and *far*
simpler than calling a compiler-rt routine. Therefore, this commit
deletes them. A small amount of that code was used by other parts of
compiler-rt; the logic is trivial so has just been inlined where needed.
I also chose to quickly implement `__addvdi3` (a standard function)
because it is trivial and we already implement the `sub` parallel.
As with Solaris (dba1bf9353), we have no way to
actually audit contributions for these OSs. IBM also makes it even harder than
Oracle to actually obtain these OSs.
closes#23695closes#23694closes#3655closes#23693
There is no straightforward way for the Zig team to access the Solaris system
headers; to do this, one has to create an Oracle account, accept their EULA to
download the installer ISO, and finally install it on a machine or VM. We do not
have to jump through hoops like this for any other OS that we support, and no
one on the team has expressed willingness to do it.
As a result, we cannot audit any Solaris contributions to std.c or other
similarly sensitive parts of the standard library. The best we would be able to
do is assume that Solaris and illumos are 100% compatible with no way to verify
that assumption. But at that point, the solaris and illumos OS tags would be
functionally identical anyway.
For Solaris especially, any contributions that involve APIs introduced after the
OS was made closed-source would also be inherently more risky than equivalent
contributions for other proprietary OSs due to the case of Google LLC v. Oracle
America, Inc., wherein Oracle clearly demonstrated its willingness to pursue
legal action against entities that merely copy API declarations.
Finally, Oracle laid off most of the Solaris team in 2017; the OS has been in
maintenance mode since, presumably to be retired completely sometime in the 2030s.
For these reasons, this commit removes all Oracle Solaris support.
Anyone who still wishes to use Zig on Solaris can try their luck by simply using
illumos instead of solaris in target triples - chances are it'll work. But there
will be no effort from the Zig team to support this use case; we recommend that
people move to illumos instead.
This experimental target was never fully completed. The operating system
is not that interesting or popular anyway, and the maintainer is no
longer around.
Not worth the maintenance burden. This code can be resurrected later if
it is worth it. In such case it will be subject to greater scrutiny.
added adapter to AnyWriter and GenericWriter to help bridge the gap
between old and new API
make std.testing.expectFmt work at compile-time
std.fmt no longer has a dependency on std.unicode. Formatted printing
was never properly unicode-aware. Now it no longer pretends to be.
Breakage/deprecations:
* std.fs.File.reader -> std.fs.File.deprecatedReader
* std.fs.File.writer -> std.fs.File.deprecatedWriter
* std.io.GenericReader -> std.io.Reader
* std.io.GenericWriter -> std.io.Writer
* std.io.AnyReader -> std.io.Reader
* std.io.AnyWriter -> std.io.Writer
* std.fmt.format -> std.fmt.deprecatedFormat
* std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
* std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
* std.fmt.fmtSliceHexLower -> {x}
* std.fmt.fmtSliceHexUpper -> {X}
* std.fmt.fmtIntSizeDec -> {B}
* std.fmt.fmtIntSizeBin -> {Bi}
* std.fmt.fmtDuration -> {D}
* std.fmt.fmtDurationSigned -> {D}
* {} -> {f} when there is a format method
* format method signature
- anytype -> *std.io.Writer
- inferred error set -> error{WriteFailed}
- options -> (deleted)
* std.fmt.Formatted
- now takes context type explicitly
- no fmt string
Cmake by default adds the `/RTC1` compiler flag for debug builds.
However, this causes C code that conforms to the C standard and has
well-defined behavior to trap. Here I've updated CMAKE to use the more
lenient `/RTCs` by default which removes the uninitialized variable checks
but keeps the stack error checks.
I'm not convinced that some of the possibilities that these regexes allowed are real. I've literally never seen or heard of "armhfel", nor of "thumb" ever showing up in `uname -m`, etc.
Each target can opt into different sets of legalize features.
By performing these transformations before liveness, instructions
that become unreferenced will have up-to-date liveness information.
Nothing interesting here; literally just the bare minimum so I can work on this
on and off in a branch without worrying about merge conflicts in the non-backend
code.