I do the same in my toy JVM (to implement the reentrant mutex + condition variable that every Java object has), except I've got a rare deadlock somewhere because, as it turns out, writing complicated low-level concurrency primitives is kinda hard :p
They're great, compared to cars. But while they're relatively fast and cheap to set up, over the long term light rail and trams are a lot cheaper to run, and they coexist more easily with foot and bike traffic since the rails make them very predictable.
I'd have serious concerns about the "coexist with bike traffic" part, though. Tram rails are a massive danger if you're riding anything with tires narrower than these "fatbike" wheels and have to cross them for whatever reason.
The author obviously knows that too, otherwise they wouldn't have written about it. All of these issues are just how the language works, and that's the problem.
Fun post! An alternative to using futexes, which keep thread queues in kernel space, is to manage the queues yourself. E.g. the parking_lot[0] Rust crate, inspired by WebKit[1], uses only one byte to store the unlocked/locked/locked_contended state, and under contention uses the address of that byte to index into a global open-addressing hash table of thread queues. To park, you look up the object's entry, lock that entry, add the thread to its queue, unlock the entry, and go to sleep. Because there is at most one entry per thread, you can keep the load factor very low, which keeps the mutex fast, and you can build each thread queue as a linked list of thread-locals. Leaking the old hash table on resizing helps make resizing safe.
As a result, uncontended locks work the same as described in the blog post above, and under contention performance is similar to a futex too. But now your locks are only one byte in size, regardless of platform. Windows allows 1-byte futexes, but they're always 4 bytes on Linux, and iirc Darwin doesn't quite have an equivalent API (but I might be wrong there). You also have more control over parked threads if you want to implement different fairness criteria, reliable timeouts or parking callbacks.
One drawback is that this only works easily within a single process, whereas, at least on Linux, futexes can be shared between processes.
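To make the parking scheme above concrete, here is a minimal std-only Rust sketch. It is not parking_lot's code: the names (ByteMutex, park, unpark_one, TABLE) are hypothetical, the global table is a fixed array of 64 buckets guarded by std Mutexes rather than a resizable table, and each bucket holds a VecDeque of thread handles instead of an intrusive linked list of thread-locals. The lock byte follows the unlocked/locked/contended protocol described above.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Mutex;
use std::thread::{self, Thread};

const UNLOCKED: u8 = 0;
const LOCKED: u8 = 1;
const CONTENDED: u8 = 2;

// Simplification (assumption): a fixed 64-bucket table guarded by std Mutexes.
const BUCKETS: usize = 64;
type Queue = VecDeque<(usize, Thread)>;
const EMPTY_BUCKET: Mutex<Queue> = Mutex::new(VecDeque::new());
static TABLE: [Mutex<Queue>; BUCKETS] = [EMPTY_BUCKET; BUCKETS];

/// Hash the lock's address to a bucket (Fibonacci hashing, top 6 bits = 64 buckets).
fn bucket(key: usize) -> &'static Mutex<Queue> {
    let hash = (key as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 58;
    &TABLE[hash as usize]
}

/// Queue the current thread under `key` and sleep, unless `still_contended`
/// (checked while holding the bucket lock) says the lock changed meanwhile.
fn park(key: usize, still_contended: impl FnOnce() -> bool) {
    let me = thread::current();
    {
        let mut queue = bucket(key).lock().unwrap();
        if !still_contended() {
            return;
        }
        queue.push_back((key, me.clone()));
    }
    // thread::park may wake spuriously, so sleep until an unparker has dequeued us.
    loop {
        thread::park();
        let queue = bucket(key).lock().unwrap();
        if !queue.iter().any(|(_, t)| t.id() == me.id()) {
            return;
        }
    }
}

/// Dequeue one thread parked under `key` (if any), then wake it.
fn unpark_one(key: usize) {
    let mut queue = bucket(key).lock().unwrap();
    let pos = queue.iter().position(|(k, _)| *k == key);
    let woken = pos.and_then(|i| queue.remove(i));
    drop(queue);
    if let Some((_, t)) = woken {
        t.unpark();
    }
}

/// One-byte mutex: unlocked / locked / locked with (possible) waiters.
pub struct ByteMutex {
    state: AtomicU8,
}

impl ByteMutex {
    pub const fn new() -> Self {
        ByteMutex { state: AtomicU8::new(UNLOCKED) }
    }

    fn key(&self) -> usize {
        &self.state as *const AtomicU8 as usize
    }

    pub fn lock(&self) {
        // Fast path: a single CAS when nobody holds the lock.
        if self.state.compare_exchange(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed).is_ok() {
            return;
        }
        // Slow path: advertise waiters, then sleep until we observe the byte unlocked.
        while self.state.swap(CONTENDED, Ordering::Acquire) != UNLOCKED {
            park(self.key(), || self.state.load(Ordering::Relaxed) == CONTENDED);
        }
    }

    pub fn unlock(&self) {
        if self.state.swap(UNLOCKED, Ordering::Release) == CONTENDED {
            unpark_one(self.key());
        }
    }
}
```

The contended check and the enqueue happen under the same bucket lock that unpark_one takes before waking anyone, so an unlock can't slip in between a waiter deciding to sleep and actually being queued.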
I've written a blog post[2] about using futexes to implement monitors (reëntrant mutexes with an associated condvar) in a compact way for my toy Java Virtual Machine, though I've since switched to a parking-lot-like approach.
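For comparison, here is the monitor interface itself (reentrant enter/exit plus Java-style wait/notify) sketched on top of std's Mutex and Condvar rather than raw futexes or a parking lot. It's an illustration, not the approach from the blog post: Monitor and its method names are hypothetical, ownership is tracked as a ThreadId plus a recursion count, and it costs far more than a few bits per object.

```rust
use std::sync::{Condvar, Mutex};
use std::thread::{self, ThreadId};

struct State {
    owner: Option<ThreadId>,
    count: u32, // recursion depth of the owning thread
}

/// Reentrant mutex + condition variable, roughly the shape of a Java object monitor.
pub struct Monitor {
    state: Mutex<State>,
    entry: Condvar, // threads waiting to acquire the monitor
    cond: Condvar,  // threads inside wait(), waiting for a notify
}

impl Monitor {
    pub fn new() -> Self {
        Monitor {
            state: Mutex::new(State { owner: None, count: 0 }),
            entry: Condvar::new(),
            cond: Condvar::new(),
        }
    }

    pub fn enter(&self) {
        let me = thread::current().id();
        let mut s = self.state.lock().unwrap();
        if s.owner == Some(me) {
            s.count += 1; // reentrant acquire
            return;
        }
        while s.owner.is_some() {
            s = self.entry.wait(s).unwrap();
        }
        s.owner = Some(me);
        s.count = 1;
    }

    pub fn exit(&self) {
        let mut s = self.state.lock().unwrap();
        assert_eq!(s.owner, Some(thread::current().id()), "exit without owning the monitor");
        s.count -= 1;
        if s.count == 0 {
            s.owner = None;
            self.entry.notify_one();
        }
    }

    /// Like Object.wait(): fully release the monitor, sleep, then reacquire it at the
    /// same depth. Spurious wakeups are possible (as in Java), so call it in a loop.
    pub fn wait(&self) {
        let me = thread::current().id();
        let mut s = self.state.lock().unwrap();
        assert_eq!(s.owner, Some(me), "wait without owning the monitor");
        let depth = s.count;
        s.owner = None;
        s.count = 0;
        self.entry.notify_one();
        s = self.cond.wait(s).unwrap();
        while s.owner.is_some() {
            s = self.entry.wait(s).unwrap();
        }
        s.owner = Some(me);
        s.count = depth;
    }

    pub fn notify_all(&self) {
        let s = self.state.lock().unwrap();
        assert_eq!(s.owner, Some(thread::current().id()), "notify without owning the monitor");
        self.cond.notify_all();
    }
}
```

Since a notifier must own the monitor, and owning it requires the internal mutex that wait() only releases inside Condvar::wait, a notify can't be lost between a waiter releasing the monitor and going to sleep.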
That's not a very useful property, though. Because cache coherence between cores works at cache-line granularity, packing more than one lock into a cache line is a Bad Idea™. At best it lets you pack the lock together with the data it protects... but alignment rules mean you're invariably going to end up spending 4 or 8 bytes (via a regular integer or a pointer) on that lock anyway.
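As a small illustration of the cache-line point, this sketch gives each one-byte lock a line of its own with #[repr(align(64))]. The 64-byte line size is an assumption (common on x86-64, but not universal), and CachePadded/TwoLocks are hypothetical names.

```rust
use std::sync::atomic::AtomicU8;

/// Aligns (and therefore pads) T to an assumed 64-byte cache line.
#[repr(align(64))]
pub struct CachePadded<T>(pub T);

/// Two one-byte locks, each on its own line, so hammering `a` doesn't
/// bounce the line holding `b` between cores.
pub struct TwoLocks {
    pub a: CachePadded<AtomicU8>,
    pub b: CachePadded<AtomicU8>,
}
```

Two locks that would fit in two bytes now take 128, which is why the one-byte size matters less than it first appears for heavily contended locks.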
That's typically not true due to the `Mutex<T>` design: the `T` gets padded to its alignment, then placed into the `struct Mutex` along with the signaling byte, and that struct is padded again before being put into the outer struct.
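The padding effect can be checked with std::mem::size_of. ByteMutex below is a hypothetical stand-in with the same shape as a mutex that keeps a one-byte state next to its T; it isn't parking_lot's actual type, but the layout arithmetic is the same.

```rust
use std::sync::atomic::AtomicU8;

/// Hypothetical mutex shape: state byte plus the guarded value.
pub struct ByteMutex<T> {
    pub state: AtomicU8,
    pub data: T,
}

/// The lock wraps its data, so its internal padding can't be shared with `flag`.
pub struct Nested {
    pub counter: ByteMutex<u64>,
    pub flag: u8,
}

/// A separate lock byte can share an 8-byte slot with the other small field.
pub struct Flat {
    pub lock: AtomicU8,
    pub flag: u8,
    pub counter: u64,
}
```

With the lock nested inside its own struct, the 7 padding bytes after the state byte belong to ByteMutex<u64> (16 bytes), so flag needs a fresh 8-byte slot and Nested ends up at 24 bytes; Flat fits the lock and flag into one slot for 16 bytes.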
You can avoid this with a `parking_lot::Mutex<()>` or `parking_lot::RawMutex` guarding other contents, but then you need to use `unsafe` because the borrow checker doesn't understand what you're doing.
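A sketch of that workaround, with a hypothetical spin lock (ByteLock) standing in for parking_lot::RawMutex so the example only needs std. The guarded value lives in an UnsafeCell next to the lock, and the unsafe impl Sync is exactly the promise the borrow checker can no longer verify: every access to value happens while lock is held.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical one-byte spin lock (a real raw mutex would park instead of spinning).
pub struct ByteLock(AtomicU8);

impl ByteLock {
    pub const fn new() -> Self {
        ByteLock(AtomicU8::new(0))
    }
    pub fn lock(&self) {
        while self.0.compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed).is_err() {
            std::hint::spin_loop();
        }
    }
    pub fn unlock(&self) {
        self.0.store(0, Ordering::Release);
    }
}

/// The lock sits next to the data instead of wrapping it.
pub struct Counter {
    lock: ByteLock,
    value: UnsafeCell<u64>,
}

// SAFETY: `value` is only accessed while `lock` is held (see the methods below).
unsafe impl Sync for Counter {}

impl Counter {
    pub const fn new() -> Self {
        Counter { lock: ByteLock::new(), value: UnsafeCell::new(0) }
    }

    pub fn increment(&self) {
        self.lock.lock();
        // SAFETY: holding `lock` gives this thread exclusive access to `value`.
        unsafe { *self.value.get() += 1 };
        self.lock.unlock();
    }

    pub fn get(&self) -> u64 {
        self.lock.lock();
        // SAFETY: as above.
        let v = unsafe { *self.value.get() };
        self.lock.unlock();
        v
    }
}
```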
You could use CAS loops throughout to make your locks "less than one byte" in size: the lock still occupies one byte, or perhaps one machine word, but the bits it doesn't need in that byte/word are free to store arbitrary data. (This works because a CAS loop can implement any read-modify-write operation on atomically sized data. But CAS will be somewhat slower than special-cased hardware atomics, so this is a bad idea for performance-sensitive locks.)
Yup, that's what I'm doing - storing the two bits needed for an object's monitor in the same word as its compressed class pointer. The pointer doesn't change over the lock's lifetime.
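Here's a hypothetical layout in that spirit: a 32-bit header whose low two bits hold the monitor state and whose upper 30 bits hold a compressed class pointer. The bit layout, names and 32-bit width are assumptions for illustration. AtomicU32::fetch_update is std's compare-exchange loop, so it can apply an arbitrary transformation to the state bits while leaving the class bits intact; unlock only needs to clear bits, which maps onto a single fetch_and.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical header: low 2 bits = monitor state, upper 30 bits = class pointer.
pub const STATE_MASK: u32 = 0b11;
pub const UNLOCKED: u32 = 0;
pub const LOCKED: u32 = 1;
pub const CONTENDED: u32 = 2;

pub struct Header(AtomicU32);

impl Header {
    pub fn new(class_bits: u32) -> Self {
        assert!(class_bits < (1 << 30), "class pointer must fit in 30 bits");
        Header(AtomicU32::new((class_bits << 2) | UNLOCKED))
    }

    pub fn class_bits(&self) -> u32 {
        self.0.load(Ordering::Relaxed) >> 2
    }

    pub fn state(&self) -> u32 {
        self.0.load(Ordering::Relaxed) & STATE_MASK
    }

    /// UNLOCKED -> LOCKED via a CAS loop, leaving the class bits untouched.
    pub fn try_lock(&self) -> bool {
        self.0
            .fetch_update(Ordering::Acquire, Ordering::Relaxed, |word| {
                if (word & STATE_MASK) == UNLOCKED {
                    Some((word & !STATE_MASK) | LOCKED)
                } else {
                    None
                }
            })
            .is_ok()
    }

    /// Any other read-modify-write works the same way: force CONTENDED, return the old state.
    pub fn mark_contended(&self) -> u32 {
        let old = self
            .0
            .fetch_update(Ordering::Acquire, Ordering::Relaxed, |word| {
                Some((word & !STATE_MASK) | CONTENDED)
            })
            .unwrap(); // cannot fail: the closure always returns Some
        old & STATE_MASK
    }

    /// Clearing the state bits needs no loop: one hardware fetch_and. Returns the old state.
    pub fn unlock(&self) -> u32 {
        self.0.fetch_and(!STATE_MASK, Ordering::Release) & STATE_MASK
    }
}
```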
If you're interested in how the mountains and rivers are generated, it's mostly based on the paper "Large Scale Terrain Generation from Tectonic Uplift and Fluvial Erosion": each chunk rises (at a noise-based, constant rate) while erosion is applied based on the chunk's slope and the size of its catchment area.
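A heavily simplified sketch of that balance, using the stream power law this kind of model builds on: erosion rate = K·A^m·S^n, with A the drainage (catchment) area and S the slope, so each cell changes by uplift minus erosion. Everything here is an assumption for illustration: a single 1D river profile instead of a 2D network of chunks, a constant uplift rate instead of a noise-based one, n = 1, and plain explicit Euler steps, which is not how the paper or the game solve it. Params, drainage_area and step are hypothetical names.

```rust
/// Parameters for a simplified 1D uplift/erosion model (hypothetical).
pub struct Params {
    pub uplift: f64, // U: constant uplift rate (the original uses a noise-based rate per chunk)
    pub k: f64,      // K: erodibility
    pub m: f64,      // exponent on drainage area
    pub dx: f64,     // distance between neighbouring cells
    pub dt: f64,     // time step
}

/// Cell 0 is the outlet; water flows from cell i to cell i - 1 and every cell adds one
/// unit of area, so cell i drains itself plus everything upstream of it.
pub fn drainage_area(len: usize) -> Vec<f64> {
    (0..len).map(|i| (len - i) as f64).collect()
}

/// One explicit Euler step of dh/dt = U - K * A^m * S (stream power law with n = 1).
pub fn step(h: &[f64], area: &[f64], p: &Params) -> Vec<f64> {
    let mut next = h.to_vec(); // next[0] stays fixed: the outlet is held at base level
    for i in 1..h.len() {
        let slope = ((h[i] - h[i - 1]) / p.dx).max(0.0); // no erosion in depressions
        let erosion = p.k * area[i].powf(p.m) * slope;
        next[i] = h[i] + p.dt * (p.uplift - erosion);
    }
    next
}
```

Repeatedly stepping from flat terrain, the profile settles where uplift balances erosion, i.e. S = U / (K·A^m): steep near the ridge, where little water collects, and flatter toward the outlet. This explicit update stays stable as long as dt·K·A^m/dx stays below 1 for every cell.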
The result is a river network as well as the central height of each chunk; roads, caves and structures are then laid out based on this. The actual voxels are only determined when a player loads the area and are (usually) not persisted.
Also, for some technologies not related to worldgen: Rendering is done via wgpu, models are built in MagicaVoxel, and both client and server use an ECS (specs).
They're rather different: in Rust, types exist only at compile time (beyond a runtime TypeId); dyn Any is a normal trait object, so you can only call the trait's own methods, i.e. ask for the type ID or downcast to a concrete type you name. With C#'s dynamic, you can call arbitrary methods and access any field, with type checking of those accesses delayed until runtime, which works because types exist at runtime too.
Rust's dyn Any corresponds better to C#'s Object; dynamic exists to interface with dynamic languages and is rarely used.
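To make the contrast concrete, a minimal Rust sketch: through &dyn Any only Any's own API is available, so reaching the value means naming a concrete type and downcasting. describe is a hypothetical helper.

```rust
use std::any::Any;

/// Inspect a value of unknown type. Only Any's API (type_id, is, downcast_ref) is
/// available here; something like `value.len()` would not compile.
pub fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {n}")
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String of length {}", s.len())
    } else {
        String::from("some other type")
    }
}
```

In C#, by contrast, `dynamic d = "abc"; var n = d.Length;` compiles, and the member lookup happens at runtime.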
I guess this is due to the company's tagline. I'm not familiar with the compiler/LLVM space, so I'm unsure how the different branches (compiler maintenance and AI tool infrastructure, for example) are covered by the PhD internships, etc.