I'm not convinced this methodology is better than linear probing (which can then be easily optimized into Robin Hood hashing).
The only line I see about linear probing is:
> Linear jumps (h, h+16, h+32...) caused 42% insert failure rate due to probe sequence overlap. Quadratic jumps spread groups across the table, ensuring all slots are reachable.
Which just seems entirely erroneous to me. How can linear probing fail? Just keep jumping until you find an open spot. As long as there is at least one open spot, you'll find it in O(n) time because you're just scanning the whole table.
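To make the point concrete, here's a minimal sketch of what I mean (toy table, hypothetical `insert` name, Fibonacci-style hash constant chosen arbitrarily): with a stride-1 probe sequence h, h+1, h+2, ... (mod n), an insert visits every slot in the worst case, so it can only fail when the table is literally full.

```rust
// Toy open-addressing table with stride-1 linear probing.
// Illustrative sketch only: as long as one slot is empty, the probe
// sequence scans every slot, so insert succeeds in at most O(n) steps.

fn hash(key: u64, n: usize) -> usize {
    // Any reasonable mixer works for the demo; constant is arbitrary.
    ((key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize) % n
}

fn insert(table: &mut [Option<u64>], key: u64) -> Option<usize> {
    let n = table.len();
    let h = hash(key, n);
    for i in 0..n {
        let slot = (h + i) % n; // linear probe: h, h+1, h+2, ...
        if table[slot].is_none() {
            table[slot] = Some(key);
            return Some(slot);
        }
    }
    None // only reachable when the table is completely full
}

fn main() {
    let mut table: Vec<Option<u64>> = vec![None; 8];
    // Fill 7 of 8 slots; every insert lands somewhere despite collisions.
    for k in 0..7 {
        assert!(insert(&mut table, k).is_some());
    }
    assert!(insert(&mut table, 99).is_some()); // last open slot
    assert!(insert(&mut table, 100).is_none()); // table now full
    println!("linear probing only fails on a full table");
}
```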
Linear probing has a clustering problem, sure. But modern CPUs have these things called L1 caches, and those clusters have great spatial locality, so scanning them is stupidly fast in practice.
If you have ten threads all probing at the same time, you can get a kind of priority inversion where the first writer takes the longest to insert: once it hits more than a couple of collisions, later writers that collide with it claim the open slots before it can scan them.
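A rough sketch of that race (hypothetical names; lock-free inserts via compare-and-swap, which is one common way such tables are made concurrent): a thread that loses a CAS just moves on to the next slot, so a later writer can claim a slot before an earlier writer reaches it.

```rust
// Illustrative lock-free linear-probing insert. Each thread claims a
// slot with compare_exchange; losing the CAS means another writer got
// there first and we keep probing.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

const EMPTY: u64 = 0; // keys must be nonzero so EMPTY is unambiguous

fn insert(table: &[AtomicU64], key: u64) -> Option<usize> {
    let n = table.len();
    let h = (key as usize).wrapping_mul(0x9E37) % n;
    for i in 0..n {
        let slot = (h + i) % n;
        if table[slot]
            .compare_exchange(EMPTY, key, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return Some(slot);
        }
    }
    None
}

fn main() {
    let table: Arc<Vec<AtomicU64>> =
        Arc::new((0..64).map(|_| AtomicU64::new(EMPTY)).collect());
    let handles: Vec<_> = (1..=10u64)
        .map(|t| {
            let table = Arc::clone(&table);
            // 10 threads racing, 5 distinct nonzero keys each.
            thread::spawn(move || {
                (0..5u64)
                    .filter(|&k| insert(&table, t * 100 + k + 1).is_some())
                    .count()
            })
        })
        .collect();
    let inserted: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(inserted, 50); // all 50 inserts land (table holds 64)
    println!("all {} concurrent inserts succeeded", inserted);
}
```

The contention itself is nondeterministic, but the structure shows why: every failed CAS lengthens the loser's probe walk.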
Better to use a few distributions of keys from production-like datasets, e.g., from ClickBench. Most of them will be Zipfian and also have different temporal locality.
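If you want Zipfian keys without pulling real production data, a sketch like this works (toy LCG instead of a proper RNG; all names and constants are illustrative, not from ClickBench or any benchmark suite): build the Zipf CDF over ranks and sample by inverse CDF.

```rust
// Generate Zipf-distributed key ranks for hash-table benchmarks.
// Inverse-CDF sampling over precomputed cumulative weights.

fn zipf_cdf(n: usize, s: f64) -> Vec<f64> {
    // Cumulative mass over ranks 1..=n with exponent s.
    let mut cdf = Vec::with_capacity(n);
    let mut total = 0.0;
    for k in 1..=n {
        total += 1.0 / (k as f64).powf(s);
        cdf.push(total);
    }
    for c in cdf.iter_mut() {
        *c /= total; // normalize so the last entry is exactly 1.0
    }
    cdf
}

fn sample(cdf: &[f64], u: f64) -> usize {
    // Smallest rank whose cumulative mass reaches u.
    cdf.partition_point(|&c| c < u)
}

fn main() {
    let cdf = zipf_cdf(1000, 1.1);
    let mut state: u64 = 42; // toy LCG, not a real RNG
    let mut counts = vec![0usize; 1000];
    for _ in 0..100_000 {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = (state >> 11) as f64 / (1u64 << 53) as f64;
        counts[sample(&cdf, u)] += 1;
    }
    // Heavy head: the top rank should dwarf the 100th rank.
    assert!(counts[0] > counts[99] * 10);
    println!("top key: {} hits, 100th key: {} hits", counts[0], counts[99]);
}
```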
almostgotcaught•1h ago
This would've been 39,000X better if written in mojo.
conradludgate•1h ago
https://github.com/rust-lang/hashbrown/blob/master/src/contr...
https://github.com/rust-lang/hashbrown/blob/6efda58a30fe712a...