Now I'm curious about what fastutil's implementation is doing.
EmberTwin•59m ago
The SWAR (SIMD-within-a-register) numbers are strictly better than both the SIMD versions and the standard library baseline. Why is that? SIMD should be faster if the machine supports it, since SWAR tops out at 64 bits per operation, while SIMD registers start at 128 bits.
Presumably the Java SIMD API used here isn't being compiled down to actual SIMD machine instructions.
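For anyone unfamiliar with SWAR, here is a minimal sketch of the idea in plain Java: treat a 64-bit `long` as eight byte lanes and test all of them for a target byte at once. This is not taken from the benchmarked code; the class and method names (`SwarSketch`, `broadcast`, `zeroLanes`, `firstMatchIndex`) are made up for illustration.

```java
// Hypothetical illustration of SWAR byte matching; not the benchmarked code.
final class SwarSketch {
    private static final long LOW  = 0x0101010101010101L; // 0x01 in every byte lane
    private static final long HIGH = 0x8080808080808080L; // 0x80 in every byte lane

    // Copy one byte into all eight lanes of a long.
    static long broadcast(byte b) {
        return (b & 0xFFL) * LOW;
    }

    // Classic "has zero byte" trick: the result is nonzero iff some lane of x is 0x00.
    // The lowest flagged lane is always a true zero; lanes above it may be
    // false positives caused by borrow propagation.
    static long zeroLanes(long x) {
        return (x - LOW) & ~x & HIGH;
    }

    // True if any of the eight bytes in word equals target.
    static boolean containsByte(long word, byte target) {
        return zeroLanes(word ^ broadcast(target)) != 0;
    }

    // Index (0 = least significant byte) of the first matching lane, or -1 if none.
    static int firstMatchIndex(long word, byte target) {
        long m = zeroLanes(word ^ broadcast(target));
        return m == 0 ? -1 : Long.numberOfTrailingZeros(m) >>> 3;
    }
}
```

Everything here is ordinary 64-bit integer arithmetic, so it does not depend on the JIT recognizing and lowering a vector API call, which might be part of why it holds up so well in the numbers.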
jbellis•1h ago