At FOSDEM 2023 I will be presenting glidesort in the Rust room.
It is a new stable sorting algorithm that is a hybrid of quicksort and mergesort which is much faster for random data, while still taking full advantage of pre-existing order or inputs with many duplicates. On an Apple M1 in Rust the latest version is ~4.5x faster than the stdlib slice::sort for sorting 2^25 random integers while using 4 times less extra memory. It is 11.5x faster if you happen to sort only 4096 distinct integers with many duplicates. Note that the algorithm is purely comparison based and does not contain special type-dependent SIMD stuff.
Hey, the open source release will coincide with the FOSDEM presentation, so in a little under three weeks it will be under my Github account github.com/orlp.
Those numbers look impressive. For stuff like this, can it be said that it should just be implemented as is in the std libs of mainstream languages like Java, Python, C#?
For C#, it is very similar to Rust, so I would say yes. I'm too unfamiliar with the JVM and its performance characteristics to say for sure for Java, but probably also yes. However, the performance characteristics of Python are so different to Rust that I think it's safe to say a different approach will likely end up better (one that focuses on minimizing indirections and comparison function invocations).
Radix sorts actually can be used with many custom comparisons by using a sort by api which is enough to let you express many of the uses of custom sorting (e.g. sorting objects by fields, sorting by absolute value, etc). Given the overhead of radix sorts, it wouldn't shock me if this was faster for small sizes.
It is a new stable sorting algorithm that is a hybrid of quicksort and mergesort which is much faster for random data, while still taking full advantage of pre-existing order or inputs with many duplicates. On an Apple M1 in Rust the latest version is ~4.5x faster than the stdlib slice::sort for sorting 2^25 random integers while using 4 times less extra memory. It is 11.5x faster if you happen to sort only 4096 distinct integers with many duplicates. Note that the algorithm is purely comparison based and does not contain special type-dependent SIMD stuff.
I have a short preliminary talk on glidesort here: https://www.youtube.com/watch?v=2y3IK1l6PI4.
Perhaps the V8 people will be interested in it as well.