"What they don't want you to know about sorting."
The title is an homage to Eugene Wigner's 1960 paper "The Unreasonable Effectiveness of Mathematics in the Natural Sciences".
Cool article. It's clear that all my theoretical algorithm knowledge falls short when faced with real CPUs.
[1] https://github.com/abseil/abseil-cpp/blob/23b9b75217721040f5...
Sometimes that is how useful jumps are made. Maybe someone will come along with a problem and the data they have just happens to have similar properties.
Rather than premature optimisation, this sort of thing is pre-emptive research - better to do it now than when you hit a performance problem and need the solution PDQ. Many useful things have come out of what started as “I wonder what if …?” playing.
Then again, only fixing things when a customer complains is a way to end up with a leaning tower of hacks which eventually ossifies. And the customer (or rather the users, who may not be the customer, especially in business software) may be putting up with dozens of niggles and annoyances before they bother to actually report one bug because they can't work around it.
This is research or experimentation, designed to improve our understanding of the behavior of algorithms. Calling it premature optimization makes no sense.
This is why we have things like tournament selection. Randomly sampling from the population and running tournaments is far more scalable than scanning and ordering a global list each iteration. You can maintain something like an Elo score with very narrow views into memory. Nothing needs a global view, yet you get global effects.
Sometimes when you think you need to maintain a sorted array under item insertion, it turns out that you only ever need to continually read the next-smallest (or next-largest) item -- and in that case, it suffices to maintain a heap, which is much cheaper.
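A minimal sketch of that pattern in C++ (my example, not from the article): a min-heap gives O(log n) insert and pop-min without ever keeping the whole collection sorted.

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <vector>

    int main() {
        // Min-heap: we only ever need the next-smallest item, so there is no
        // reason to maintain a fully sorted order; each push/pop is O(log n).
        std::priority_queue<int, std::vector<int>, std::greater<int>> heap;

        for (int x : {42, 7, 19, 25}) heap.push(x);
        heap.push(3);  // insertions can be freely interleaved with reads

        while (!heap.empty()) {
            std::printf("%d\n", heap.top());  // prints 3 7 19 25 42
            heap.pop();
        }
    }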
With tournament selection, you can randomly pick indexes from the population to build tournaments, which is effectively instant. There is no longer any requirement for a serialized processing phase. All processors can build their own random tournaments and perform score updates. There will be occasional conflicts on score updates, but the idea is that with enough samples/iterations the result becomes very accurate.
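A rough sketch of that scheme (my own code; the Elo-style update constants are illustrative assumptions, not taken from the comment above):

    #include <cmath>
    #include <cstdio>
    #include <vector>
    #include <random>

    int main() {
        std::mt19937 rng(12345);
        // Hidden "true" quality of each individual; the goal is to rank them
        // without ever sorting or even scanning the whole population at once.
        std::vector<double> fitness = {0.2, 0.9, 0.4, 0.7, 0.1, 0.6};
        std::vector<double> score(fitness.size(), 1000.0);  // Elo-like scores
        std::uniform_int_distribution<int> pick(0, (int)fitness.size() - 1);

        for (int iter = 0; iter < 20000; ++iter) {
            // Each tournament touches only two random entries: a very narrow
            // view into memory, trivially parallelizable across processors.
            int a = pick(rng), b = pick(rng);
            if (a == b) continue;
            int winner = fitness[a] > fitness[b] ? a : b;
            int loser  = winner == a ? b : a;
            double expected =
                1.0 / (1.0 + std::pow(10.0, (score[loser] - score[winner]) / 400.0));
            score[winner] += 16.0 * (1.0 - expected);
            score[loser]  -= 16.0 * (1.0 - expected);
        }
        // After enough random tournaments the scores order the same way the
        // fitness values do, even though nothing ever had a global view.
        for (size_t i = 0; i < score.size(); ++i)
            std::printf("item %zu: fitness %.1f  score %.0f\n", i, fitness[i], score[i]);
    }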
Another example: https://danluu.com/2choices-eviction/
> If you have trouble solving some problem, see if sorting the data first helps.
I feel that sorting data is the ultimate computer science hack. Many, many classes of problems turn into O(log n) problems once you sort your input in some way. It might not be the most efficient way of solving the problem, but it's often a fairly good one.
So I'm really enjoying how good sorting algorithms are getting, and how, even though the big-O complexity remains mostly the same, the real computing efficiency is improving significantly.
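A small illustration of that point (my own example): one sort up front, then membership queries become O(log n) and pair-finding becomes a linear two-pointer scan.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v = {41, 7, 23, 5, 19, 30, 2};
        std::sort(v.begin(), v.end());  // one O(n log n) sort up front

        // Each membership query is now O(log n) instead of O(n).
        for (int q : {19, 20})
            std::printf("%d %s\n", q,
                        std::binary_search(v.begin(), v.end(), q) ? "found" : "missing");

        // "Does any pair sum to 42?" becomes a single two-pointer pass.
        for (size_t lo = 0, hi = v.size() - 1; lo < hi; ) {
            int s = v[lo] + v[hi];
            if (s == 42) { std::printf("%d + %d == 42\n", v[lo], v[hi]); break; }
            if (s < 42) ++lo; else --hi;
        }
    }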
To save the densest among you the oh-so-tedious task of extrapolation, here are some anecdotal examples.
Dishwasher: Sort your dishes when you pull them out. Cut your walking time to the minimum possible number of steps; think of this like accessing cached data. Enjoy the now painless benefit of constantly clean dishes and a clean kitchen.
Short story: Some friends had a geodesic dome company. I'd get brought out to do setup when they were really busy. These domes had a lot of heavy metal pipes of different and specific lengths. It's hot, it's outside, it's heavy and tedious… pipes would invariably be in a giant pile on a pallet… I caught another friend doing bubble sort… the 56' dome went up in record time with the two of us.
More meta-hierarchical: End of life, parents on the cusp of hoarding. Trauma combined with sentimentality manifests as projecting value onto items. Items pile up, and life becomes unmanageable; think of this like running out of RAM. Solution: be present, calm, and patient. Help sort the memories and slowly help sort the items. Word of warning: this is NP-hard-af… but eventually they will get out of it, and life for them and for you will dramatically improve.
Further reading: https://knolling.org/what-is-knolling
I think the key insight you make is recognizing that attempting to do two things at the same time is asking for disaster. Concurrency in my experience is all about rhythm and timing… something humans in their default state are notoriously bad at, i.e. without training and practice. The best approach in my experience is doing one thing at a time with focus and conviction until completion, then moving on to the next thing. If concurrency is desired, then understanding durations and prioritizing items via their dependencies is of the utmost importance. But paraphrasing Michel de Montaigne… something something about something
And even this is a challenge!
We just weren't built for doing more than one thing at a time out of the gate, and context switching is our Achilles heel. (Not to say there aren't extraordinary examples to the contrary; it's just not the default and requires a lot of practice. I'm thinking specifically of musicians, but most people aren't ready for the kind of commitment necessary to reinforce the neural pathways required to develop the skill.)
You hit on something really deep about variety. A conversation near and dear to my heart; sadly this forum is not really conducive to the kind of long-form dialogue the topic deserves.
As a corollary, you can use binary search + linear interpolation as a fast way to approximate a strictly monotonic function, knowing its inverse, over a limited set of outputs (e.g. n -> floor(255 * n^(1/2.2) + 0.5) for n in 0..255). You also get to work on bit_cast<s32>(...) values as an optimization when the targeted FPU doesn't have conditional move instructions.
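A rough sketch of that trick (my own code; I assume the normalized gamma encode n -> round(255 * (n/255)^(1/2.2)) as the target mapping): binary-search the output value whose bucket, defined via the inverse, contains the input.

    #include <cmath>
    #include <cstdio>

    // Inverse (decode) direction of the assumed target mapping
    // n -> round(255 * (n/255)^(1/2.2)).
    static double decode(double y) { return 255.0 * std::pow(y / 255.0, 2.2); }

    // Evaluate the forward mapping using only the inverse: binary-search the
    // smallest integer y whose bucket [decode(y-0.5), decode(y+0.5)) holds n.
    static int encode_via_inverse(int n) {
        int lo = 0, hi = 255;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (decode(mid + 0.5) <= n) lo = mid + 1;  // n maps above mid
            else                        hi = mid;      // n maps to mid or below
        }
        return lo;
    }

    int main() {
        int mismatches = 0;
        for (int n = 0; n <= 255; ++n) {
            int direct = (int)std::floor(255.0 * std::pow(n / 255.0, 1.0 / 2.2) + 0.5);
            if (direct != encode_via_inverse(n)) ++mismatches;
        }
        std::printf("mismatches vs. the direct formula: %d\n", mismatches);  // expect 0
    }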
VPBROADCASTQ rax,ymm1
VPCMPEQQ ymm1,ymm2,ymm1
VPADDQ ymm1,ymm3,ymm3
VPBROADCASTQ copies rax into each of the 4 lanes in ymm1. The VPCMPEQQ sets each qword there to all-0 ( = 0) or all-1 ( = -1) depending on the comparison result, so the VPADDQ will accumulate 4 running negative totals into ymm3, which can be negated afterwards. I would still expect the perfect hash function approach to be faster, though -- a similar number of operations, but 25% of the memory movement.
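In intrinsics form, the same idea looks roughly like this (my own AVX2 sketch, compiled with -mavx2; it is not the article's code):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <immintrin.h>

    // Count occurrences of `target` in `data` (length a multiple of 4) by
    // accumulating the 0 / -1 comparison masks and negating the total at the end.
    int64_t count_equal_avx2(const int64_t* data, size_t n, int64_t target) {
        __m256i needle = _mm256_set1_epi64x(target);  // broadcast, as VPBROADCASTQ does
        __m256i acc    = _mm256_setzero_si256();      // 4 running (negative) totals
        for (size_t i = 0; i < n; i += 4) {
            __m256i v  = _mm256_loadu_si256((const __m256i*)(data + i));
            __m256i eq = _mm256_cmpeq_epi64(v, needle);  // each lane becomes 0 or -1
            acc        = _mm256_add_epi64(acc, eq);      // accumulate the -1s
        }
        int64_t lanes[4];
        _mm256_storeu_si256((__m256i*)lanes, acc);
        return -(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
    }

    int main() {
        int64_t data[8] = {3, 1, 3, 2, 3, 0, 1, 3};
        std::printf("%lld\n", (long long)count_equal_avx2(data, 8, 3));  // prints 4
    }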
I find the perfect hash implementation a bit weird; it seems to obfuscate that you simply look at the lowest two bits (since they differ between the four values). You can do the x + 3 and 3 - expr at the very end, once, instead of for every element.
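As a hedged sketch of what that observation amounts to (my code, assuming the four candidate values really do have distinct low two bits, as the comment above says):

    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Count with v & 3 as a "perfect hash", then rebuild in sorted order.
    void sort_four_known_values(std::vector<int>& data, const std::array<int, 4>& ascending) {
        std::array<size_t, 4> counts{};               // indexed by the low two bits
        for (int v : data) ++counts[v & 3];

        size_t out = 0;
        for (int v : ascending)                       // emit values in ascending order
            for (size_t c = counts[v & 3]; c-- > 0; )
                data[out++] = v;
    }

    int main() {
        std::vector<int> data = {12, 15, 13, 12, 14, 15, 12};
        sort_four_known_values(data, {12, 13, 14, 15});  // low two bits: 0, 1, 2, 3
        for (int v : data) std::printf("%d ", v);        // 12 12 12 13 14 15 15
        std::printf("\n");
    }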
Also, the statement in the text is just false ("Radsort is a radix sort and as such unable to adapt to patterns in the data").
An MSB radix sort absolutely can adapt quite easily. Even an LSB radix sort can pay some attention to the data. And a hybrid like I suggested (use MSB passes to bucket on the high bits, then LSB within the buckets) absolutely can...
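For instance, a bare-bones version of that hybrid (my sketch: one MSD pass on the top byte, then finishing each bucket separately; an adaptive variant would pick the split points from a sample of the keys instead of fixed byte boundaries):

    #include <algorithm>
    #include <array>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // One MSD pass: bucket 64-bit keys by their top byte, then finish each
    // bucket with any other sort (an LSB radix pass, recursion, or std::sort).
    void msd_then_finish(std::vector<uint64_t>& keys) {
        std::array<std::vector<uint64_t>, 256> buckets;
        for (uint64_t k : keys) buckets[k >> 56].push_back(k);

        size_t out = 0;
        for (auto& b : buckets) {
            std::sort(b.begin(), b.end());  // placeholder for the per-bucket finish
            for (uint64_t k : b) keys[out++] = k;
        }
    }

    int main() {
        std::vector<uint64_t> keys = {42, 7ull << 60, 19, 3ull << 56, 42, 1};
        msd_then_finish(keys);
        for (uint64_t k : keys) std::printf("%llu ", (unsigned long long)k);
        std::printf("\n");
    }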
Adaptive radix sorts exist, where the keyspace is divided into roughly equal sized buckets based on the distribution of the data. The setup is slow enough that this is usually used only for very large sorts that have to go out to disk, or, originally, tape.
It's the first patented algorithm, SyncSort.