"What they don't want you to know about sorting."
The title is an homage to Eugene Wigner's 1960 paper "The Unreasonable Effectiveness of Mathematics in the Natural Sciences".
Cool article. It's clear that all my theoretical algorithm knowledge falls short when faced with real CPUs.
[1] https://github.com/abseil/abseil-cpp/blob/23b9b75217721040f5...
Sometimes that is how useful jumps are made. Maybe someone will come along with a problem and the data they have just happens to have similar properties.
Rather than premature optimisation, this sort of thing is pre-emptive research - better to do it now than when you hit a performance problem and need the solution PDQ. Many useful things have come out of what started as “I wonder what if …?” playing.
Then again, only fixing things when a customer complains is a way to end up with a leaning tower of hacks which eventually ossifies. And the customer (or rather the users, who may not be the customer, especially in business software) may be putting up with dozens of niggles and annoyances before they bother to actually report one bug because they can't work around it.
This is research or experimentation, designed to improve our understanding of the behavior of algorithms. Calling it premature optimization makes no sense.
This is why we have things like tournament selection. Randomly sampling from the population and running tournaments is far more scalable than scanning and ordering a global list each iteration. You can maintain something like an Elo score with very narrow views into memory. Nothing needs a global view, yet you get global effects.
Sometimes when you think you need to maintain a sorted array under item insertion, it turns out that you only ever need to continually read the next-smallest (or next-largest) item -- and in that case, it suffices to maintain a heap, which is much cheaper.
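A minimal sketch of that pattern in C++ (my example, not from the article): a min-heap gives O(log n) insert and pop-min without ever keeping the whole collection sorted.

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <vector>

    int main() {
        // Min-heap: we only ever need the next-smallest item, so there is no
        // reason to maintain a fully sorted order; each push/pop is O(log n).
        std::priority_queue<int, std::vector<int>, std::greater<int>> heap;

        for (int x : {42, 7, 19, 25}) heap.push(x);
        heap.push(3);  // insertions can be freely interleaved with reads

        while (!heap.empty()) {
            std::printf("%d\n", heap.top());  // prints 3 7 19 25 42
            heap.pop();
        }
    }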
With tournament selection, you can randomly pick indexes from the population to build tournaments, which is effectively instant. There is no longer any requirement for a serialized processing phase. All processors can build their own random tournaments and perform score updates. There will be occasional conflicts on score updates, but the idea is that with enough samples/iterations the result becomes very accurate.
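A rough sketch of that scheme (my own code; the Elo-style update constants are illustrative assumptions, not taken from the comment above):

    #include <cmath>
    #include <cstdio>
    #include <vector>
    #include <random>

    int main() {
        std::mt19937 rng(12345);
        // Hidden "true" quality of each individual; the goal is to rank them
        // without ever sorting or even scanning the whole population at once.
        std::vector<double> fitness = {0.2, 0.9, 0.4, 0.7, 0.1, 0.6};
        std::vector<double> score(fitness.size(), 1000.0);  // Elo-like scores
        std::uniform_int_distribution<int> pick(0, (int)fitness.size() - 1);

        for (int iter = 0; iter < 20000; ++iter) {
            // Each tournament touches only two random entries: a very narrow
            // view into memory, trivially parallelizable across processors.
            int a = pick(rng), b = pick(rng);
            if (a == b) continue;
            int winner = fitness[a] > fitness[b] ? a : b;
            int loser  = winner == a ? b : a;
            double expected =
                1.0 / (1.0 + std::pow(10.0, (score[loser] - score[winner]) / 400.0));
            score[winner] += 16.0 * (1.0 - expected);
            score[loser]  -= 16.0 * (1.0 - expected);
        }
        // After enough random tournaments the scores order the same way the
        // fitness values do, even though nothing ever had a global view.
        for (size_t i = 0; i < score.size(); ++i)
            std::printf("item %zu: fitness %.1f  score %.0f\n", i, fitness[i], score[i]);
    }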
Another example: https://danluu.com/2choices-eviction/
> If you have trouble solving some problem, see if sorting the data first helps.
I feel that sorting data is the ultimate computer science hack. Many, many classes of problems turn into O(log n) problems once you sort your input in some way. It might not be the most efficient way of solving the problem, but it's often a fairly good one.
So I'm really enjoying how good sorting algorithms are getting, and how, even though the big-O complexity remains mostly the same, the real computing efficiency is improving significantly.
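A small illustration of that point (my own example): one sort up front, then membership queries become O(log n) and pair-finding becomes a linear two-pointer scan.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v = {41, 7, 23, 5, 19, 30, 2};
        std::sort(v.begin(), v.end());  // one O(n log n) sort up front

        // Each membership query is now O(log n) instead of O(n).
        for (int q : {19, 20})
            std::printf("%d %s\n", q,
                        std::binary_search(v.begin(), v.end(), q) ? "found" : "missing");

        // "Does any pair sum to 42?" becomes a single two-pointer pass.
        for (size_t lo = 0, hi = v.size() - 1; lo < hi; ) {
            int s = v[lo] + v[hi];
            if (s == 42) { std::printf("%d + %d == 42\n", v[lo], v[hi]); break; }
            if (s < 42) ++lo; else --hi;
        }
    }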
To save the densest among you the oh-so-tedious task of extrapolation, here are some anecdotal examples.
Dishwasher: Sort your dishes when you pull them out. Cut your walking time to the minimum possible number of steps; think of this like accessing cached data. Enjoy the now painless benefit of constantly clean dishes and a clean kitchen.
Short story: Some friends had a geodesic dome company. I'd get brought out to do setup when they were really busy. These domes had a lot of heavy metal pipes of different and specific lengths. It's hot, it's outside, it's heavy and tedious… pipes would invariably be in a giant pile on a pallet… I caught another friend doing bubble sort… the 56' dome went up in record time with the two of us.
More meta-hierarchical: End of life, parents on the cusp of hoarding. Trauma combined with sentimentality manifests as projecting value onto items. Items pile up, and life becomes unmanageable; think of this like running out of RAM. Solution: be present, calm, and patient. Help sort the memories and slowly help sort the items. Word of warning: this is NP-hard-af… but eventually they will get out of it, and life for them and for you will dramatically improve.
Further reading: https://knolling.org/what-is-knolling
I think the key insight you make is recognizing that attempting to do two things at the same time is asking for disaster. Concurrency in my experience is all about rhythm and timing… something humans in their default state are notoriously bad at, i.e. without training and practice. The best approach in my experience is doing one thing at a time with focus and conviction until completion, then moving on to the next thing. If concurrency is desired, then understanding durations and prioritizing items via their dependencies is of the utmost importance. But paraphrasing Michel de Montaigne… something something about something
And even this is a challenge!
We just weren't built for doing more than one thing at a time out of the gate, and context switching is our Achilles heel. (Not to say there aren't extraordinary examples to the contrary; it's just not the default and requires a lot of practice. I'm thinking specifically of musicians, but most people aren't ready for the kind of commitment necessary to reinforce the neural pathways required to develop the skill.)
You hit on something really deep about variety. A conversation near and dear to my heart; sadly this forum is not really conducive to the kind of long-form dialogue the topic deserves.
As a corollary, you can use binary search + linear interpolation as a fast way to approximate a strictly monotonic function, knowing its inverse, over a limited set of outputs (e.g. n -> floor(255 * n^(1/2.2) + 0.5) for n in 0..255). You also get to work on bit_cast<s32>(...) values as an optimization when the targeted FPU doesn't have conditional move instructions.
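A rough sketch of that trick (my own code; I assume the normalized gamma encode n -> round(255 * (n/255)^(1/2.2)) as the target mapping): binary-search the output value whose bucket, defined via the inverse, contains the input.

    #include <cmath>
    #include <cstdio>

    // Inverse (decode) direction of the assumed target mapping
    // n -> round(255 * (n/255)^(1/2.2)).
    static double decode(double y) { return 255.0 * std::pow(y / 255.0, 2.2); }

    // Evaluate the forward mapping using only the inverse: binary-search the
    // smallest integer y whose bucket [decode(y-0.5), decode(y+0.5)) holds n.
    static int encode_via_inverse(int n) {
        int lo = 0, hi = 255;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (decode(mid + 0.5) <= n) lo = mid + 1;  // n maps above mid
            else                        hi = mid;      // n maps to mid or below
        }
        return lo;
    }

    int main() {
        int mismatches = 0;
        for (int n = 0; n <= 255; ++n) {
            int direct = (int)std::floor(255.0 * std::pow(n / 255.0, 1.0 / 2.2) + 0.5);
            if (direct != encode_via_inverse(n)) ++mismatches;
        }
        std::printf("mismatches vs. the direct formula: %d\n", mismatches);  // expect 0
    }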
VPBROADCASTQ rax,ymm1
VPCMPEQQ ymm1,ymm2,ymm1
VPADDQ ymm1,ymm3,ymm3
VPBROADCASTQ copies rax into each of the 4 lanes in ymm1. The VPCMPEQQ sets each qword there to all-0 ( = 0) or all-1 ( = -1) depending on the comparison result, so the VPADDQ will accumulate 4 running negative totals into ymm3, which can be negated afterwards. I would still expect the perfect hash function approach to be faster, though -- a similar number of operations, but 25% of the memory movement.
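In intrinsics form, the same idea looks roughly like this (my own AVX2 sketch, compiled with -mavx2; it is not the article's code):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <immintrin.h>

    // Count occurrences of `target` in `data` (length a multiple of 4) by
    // accumulating the 0 / -1 comparison masks and negating the total at the end.
    int64_t count_equal_avx2(const int64_t* data, size_t n, int64_t target) {
        __m256i needle = _mm256_set1_epi64x(target);  // broadcast, as VPBROADCASTQ does
        __m256i acc    = _mm256_setzero_si256();      // 4 running (negative) totals
        for (size_t i = 0; i < n; i += 4) {
            __m256i v  = _mm256_loadu_si256((const __m256i*)(data + i));
            __m256i eq = _mm256_cmpeq_epi64(v, needle);  // each lane becomes 0 or -1
            acc        = _mm256_add_epi64(acc, eq);      // accumulate the -1s
        }
        int64_t lanes[4];
        _mm256_storeu_si256((__m256i*)lanes, acc);
        return -(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
    }

    int main() {
        int64_t data[8] = {3, 1, 3, 2, 3, 0, 1, 3};
        std::printf("%lld\n", (long long)count_equal_avx2(data, 8, 3));  // prints 4
    }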
I find the perfect hash implementation a bit weird; it seems to obfuscate that you simply look at the lowest two bits (since they differ between the four values). You can do the x + 3 and 3 - expr at the very end, once, instead of for every element.
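As a hedged sketch of what that observation amounts to (my code, assuming the four candidate values really do have distinct low two bits, as the comment above says):

    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Count with v & 3 as a "perfect hash", then rebuild in sorted order.
    void sort_four_known_values(std::vector<int>& data, const std::array<int, 4>& ascending) {
        std::array<size_t, 4> counts{};               // indexed by the low two bits
        for (int v : data) ++counts[v & 3];

        size_t out = 0;
        for (int v : ascending)                       // emit values in ascending order
            for (size_t c = counts[v & 3]; c-- > 0; )
                data[out++] = v;
    }

    int main() {
        std::vector<int> data = {12, 15, 13, 12, 14, 15, 12};
        sort_four_known_values(data, {12, 13, 14, 15});  // low two bits: 0, 1, 2, 3
        for (int v : data) std::printf("%d ", v);        // 12 12 12 13 14 15 15
        std::printf("\n");
    }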
Also, the statement in the text is just false ("Radsort is a radix sort and as such unable to adapt to patterns in the data").
An MSB radix sort absolutely can adapt quite easily. Even an LSB radix sort can pay some attention to the data. And a hybrid like I suggested (use MSB passes to bucket on the high bits, then LSB within the buckets) absolutely can...
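For instance, a bare-bones version of that hybrid (my sketch: one MSD pass on the top byte, then finishing each bucket separately; an adaptive variant would pick the split points from a sample of the keys instead of fixed byte boundaries):

    #include <algorithm>
    #include <array>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // One MSD pass: bucket 64-bit keys by their top byte, then finish each
    // bucket with any other sort (an LSB radix pass, recursion, or std::sort).
    void msd_then_finish(std::vector<uint64_t>& keys) {
        std::array<std::vector<uint64_t>, 256> buckets;
        for (uint64_t k : keys) buckets[k >> 56].push_back(k);

        size_t out = 0;
        for (auto& b : buckets) {
            std::sort(b.begin(), b.end());  // placeholder for the per-bucket finish
            for (uint64_t k : b) keys[out++] = k;
        }
    }

    int main() {
        std::vector<uint64_t> keys = {42, 7ull << 60, 19, 3ull << 56, 42, 1};
        msd_then_finish(keys);
        for (uint64_t k : keys) std::printf("%llu ", (unsigned long long)k);
        std::printf("\n");
    }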
Adaptive radix sorts exist, where the keyspace is divided into roughly equal sized buckets based on the distribution of the data. The setup is slow enough that this is usually used only for very large sorts that have to go out to disk, or, originally, tape.
It's the first patented algorithm, SyncSort.