Sorting 1M u64 KV-pairs in 20ms on i9-13980HX using a branchless Rust impl
2•EfurDec•2h ago
I’ve developed a zero-jitter sorting DLL in Rust that consistently sorts 1M u64 key-value pairs (a u32 key plus a u32 value in each u64) in 20 ms, regardless of data distribution. Unlike SkaSort and other adaptive algorithms, my implementation is entirely data-independent, so it is immune to 'poisoned' datasets that typically push their running time to 40-50 ms.
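The post doesn't name its algorithm beyond "data-independent" and "branchless". One well-known way to get both properties is a sorting network such as bitonic sort, where the sequence of compare-exchange operations depends only on the input length, never on the values. Below is a minimal scalar sketch of that idea, not the author's AVX2 code. It assumes the key sits in the upper 32 bits of each u64, so comparing whole u64s orders pairs by key (ties broken by value), and that n is a power of two. The names `pack`, `unpack`, `compare_exchange` and `bitonic_sort` are hypothetical.

```rust
/// Pack a u32 key and a u32 value into one u64, key in the high half,
/// so plain u64 comparison orders pairs by key (ties broken by value).
fn pack(key: u32, value: u32) -> u64 {
    ((key as u64) << 32) | value as u64
}

/// Split a packed pair back into (key, value).
fn unpack(kv: u64) -> (u32, u32) {
    ((kv >> 32) as u32, kv as u32)
}

/// Branchless compare-exchange: afterwards a[i] <= a[j] if `ascending`,
/// otherwise a[i] >= a[j]. min/max plus a mask select replace the usual
/// `if a[i] > a[j] { swap }`; compilers typically lower min/max to
/// conditional moves rather than branches.
#[inline(always)]
fn compare_exchange(a: &mut [u64], i: usize, j: usize, ascending: bool) {
    let (x, y) = (a[i], a[j]);
    let lo = x.min(y);
    let hi = x.max(y);
    let mask = (ascending as u64).wrapping_neg(); // all ones if ascending, else zero
    a[i] = (lo & mask) | (hi & !mask);
    a[j] = (hi & mask) | (lo & !mask);
}

/// In-place bitonic sort. Every loop bound and index depends only on
/// a.len(), so the same operations run for any input values.
fn bitonic_sort(a: &mut [u64]) {
    let n = a.len();
    assert!(n == 0 || n.is_power_of_two(), "sketch assumes n is a power of two");
    let mut k = 2;
    while k <= n {
        let mut j = k / 2;
        while j > 0 {
            for i in 0..n {
                let l = i ^ j;
                if l > i {
                    // Direction depends on the index, not on the data.
                    compare_exchange(a, i, l, (i & k) == 0);
                }
            }
            j /= 2;
        }
        k *= 2;
    }
}

fn main() {
    let mut data: Vec<u64> = [(5, 50), (1, 11), (3, 30), (1, 10), (9, 90), (0, 0), (7, 70), (2, 20)]
        .iter()
        .map(|&(k, v)| pack(k, v))
        .collect();
    bitonic_sort(&mut data);
    let pairs: Vec<(u32, u32)> = data.iter().map(|&kv| unpack(kv)).collect();
    println!("{:?}", pairs);
}
```

A scalar loop like this does O(n log² n) work, more than a radix sort's O(n); the appeal of a network is that its cost is fixed and it maps naturally onto SIMD lanes or FPGA pipelines.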
The approach is FPGA-inspired and targets x86-64-v3 (AVX2/BMI2). On my i9-13980HX it runs at the physical L3 bandwidth limit (~50 GB/s); the 8 MB working set (1M × 8 bytes) fits in the chip's 36 MB L3. Dedicated FPGA hardware might reach 15 ms, but 20 ms is likely the theoretical limit for a single-threaded CPU implementation. It runs on any modern desktop CPU that supports x86-64-v3.
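As a rough sanity check on the bandwidth figure: if the sort kept L3 saturated at ~50 GB/s for the full 20 ms, it would move about 1 GB. That is roughly 125 times the 8 MB dataset, or about 62 passes if each pass both reads and writes every element. The helper below (`sweeps_over_dataset` is a hypothetical name; decimal units assumed, 1M = 10^6 pairs and 1 GB = 10^9 bytes) only does that arithmetic. It says nothing about how the author's implementation actually schedules its passes.

```rust
/// How many times a run could stream the whole dataset if it kept the
/// memory system saturated for its entire duration (decimal units).
fn sweeps_over_dataset(pairs: f64, bytes_per_pair: f64, bandwidth_bytes_per_s: f64, seconds: f64) -> f64 {
    let dataset_bytes = pairs * bytes_per_pair; // 1M pairs x 8 bytes = 8 MB
    let traffic_bytes = bandwidth_bytes_per_s * seconds; // bytes moved at full bandwidth
    traffic_bytes / dataset_bytes
}

fn main() {
    // Figures from the post: 1M pairs of 8 bytes, ~50 GB/s, 20 ms.
    let sweeps = sweeps_over_dataset(1_000_000.0, 8.0, 50e9, 0.020);
    println!(
        "~{:.0} sweeps over the dataset (~{:.1} read+write passes)",
        sweeps,
        sweeps / 2.0
    );
}
```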