On NetBSD and Illumos, we were using the `std.Thread.Futex`-based
implementation of `std.Thread.Mutex`. But since futex is not a primitive
on these targets, the implementation of `std.Thread.Futex` was based on
pthread primitives, including `pthread_mutex_t`. This had the amusing
consequence that locking a contended mutex on NetBSD would actually
perform 2 mutex locks, 2 mutex unlocks, and 1 condition wait; likewise,
unlocking a contended mutex would perform 2 mutex locks, 2 mutex
unlocks, and 1 condition signal. Having read some cutting-edge studies,
I have concluded that this is a slightly suboptimal approach. Instead,
let's just use pthread mutexes directly in this case; that's an
obviously better idea.
In the future, I think we can probably entirely remove our usages of
pthread sync primitives---no platform actually treats them as the base
primitives. Of the platforms which std has any meaningful support for
today, most support futexes, and the exceptions (NetBSD and Illumos)
support a thread parking API. We can implement futex and/or mutex on top
of thread parking and drop the pthread dependency entirely.
On Windows, changing the value at the target address is not sufficient, there needs to be a wake call.
From [MSDN](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress):
> If the value at Address differs from the value at CompareAddress, the function returns immediately. If the values are the same, the function does not return until another thread in the same process signals that the value at Address has changed by calling WakeByAddressSingle or WakeByAddressAll or the timeout elapses, whichever comes first.
We could note that this behavior is only observed on Windows. However, if we want the function to have consistent behavior across all platforms, we should require users to call wake. On platforms where changing the value is sufficient to wake the waiting thread, an unblocking of the thread without a matching wake would just be considered a spurious wake.
while still preserving the guarantee about async() being assigned a unit
of concurrency (or immediately running the task), this change:
* retains the error from calling getCpuCount()
* spawns all threads in detached mode, using WaitGroup to join them
* treats all workers the same regardless of whether they are processing
concurrent or async tasks. one thread pool does all the work, while
respecting async and concurrent limits.
This test called `yield` 80,000 times, which is nothing on a system with
little load, but murder on a CI system. macOS' scheduler in particular
doesn't seem to deal with this very well. The `yield` calls also weren't
even necessarily doing what they were meant to: if the optimizer could
figure out that it doesn't clobber some memory, then it could happily
reorder around the `yield`s anyway!
The test has been simplified and made to work better, and the number of
yields have been reduced. The number of overall iterations has also been
reduced, because with the `yield` calls making races very likely, we
don't really need to run too many iterations to be confident that the
implementation is race-free.
Basically everything that has a direct replacement or no uses left.
Notable omissions:
- std.ArrayHashMap: Too much fallout, needs a separate cleanup.
- std.debug.runtime_safety: Too much fallout.
- std.heap.GeneralPurposeAllocator: Lots of references to it remain, not
a simple find and replace as "debug allocator" is not equivalent to
"general purpose allocator".
- std.io.Reader: Is being reworked at the moment.
- std.unicode.utf8Decode(): No replacement, needs a new API first.
- Manifest backwards compat options: Removal would break test data used
by TestFetchBuilder.
- panic handler needs to be a namespace: Many tests still rely on it
being a function, needs a separate cleanup.
* Use `packed struct` for flags arguments. So, instead of
`linux.FUTEX.WAIT` use `.{ .cmd = .WAIT, .private = true }`
* rename `futex_wait` and `futex_wake` which didn't actually specify
wait/wake, as `futex_3arg` and `futex_4arg` (as its the number
of parameters that is different, the `op` is whatever is specified.
* expose the full six-arg flavor of the syscall (for some of the advanced
ops), and add packed structs for their arguments.
* Use a `packed union` to support the 4th parameter which is sometimes a
`timespec` pointer, and sometimes a `u32`.
* Add tests that make sure the structure layout is correct and that the
basic argument passing is working (no actual futexes are contended).
Functions like isMinGW() and isGnuLibC() have a good reason to exist: They look
at multiple components of the target. But functions like isWasm(), isDarwin(),
isGnu(), etc only exist to save 4-8 characters. I don't think this is a good
enough reason to keep them, especially given that:
* It's not immediately obvious to a reader whether target.isDarwin() means the
same thing as target.os.tag.isDarwin() precisely because isMinGW() and similar
functions *do* look at multiple components.
* It's not clear where we would draw the line. The logical conclusion before
this commit would be to also wrap Arch.isX86(), Os.Tag.isSolarish(),
Abi.isOpenHarmony(), etc... this obviously quickly gets out of hand.
* It's nice to just have a single correct way of doing something.
these tasks have some shared data dependencies so they cannot be done
simultaneously. Future work should untangle these data dependencies so
that more can be done in parallel.
for now this commit ensures correctness by making linker input parsing
and codegen tasks part of the same queue.
It is now composed of these main sections:
* Declarations that are shared among all operating systems.
* Declarations that have the same name, but different type signatures
depending on the operating system. Often multiple operating systems
share the same type signatures however.
* Declarations that are specific to a single operating system.
- These are imported one per line so you can see where they come from,
protected by a comptime block to prevent accessing the wrong one.
Closes#19352 by changing the convention to making types `void` and
functions `{}`, so that it becomes possible to update `@hasDecl` sites
to use `@TypeOf(f) != void` or `T != void`. Happily, this ended up
removing some duplicate logic and update some bitrotted feature
detection checks.
A handful of types have been modified to gain namespacing and type
safety. This is a breaking change.
Oh, and the last usage of `usingnamespace` site is eliminated.
Instead of calling the dynamically loaded kernel32.GetLastError, we can extract it from the TEB.
As shown by [Wine](34b1606019/include/winternl.h (L439)), the last error lives at offset 0x34 of the TEB in 32-bit Windows and at offset 0x68 in 64-bit Windows.