In line 423 of the optimised code there's a typo: "sort2(e,i)" should be "sort2(i,e)".
chkas•45m ago
That should give the same result.
anticleiades•52m ago
Branchless programming is a fascinating area.
You used -O3, so the compiler may also be vectorizing parts of the code. I'm curious how much AVX/SIMD contributes to the speed-up (i.e., how much of it comes from avoiding branches alone).
chkas•46m ago
You can take a look at this - it's fast even without vector operations, as long as you avoid the branches that are often predicted incorrectly.
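To make the idea concrete, here is a rough sketch of a two-element compare-exchange without a conditional jump. The name `sort2_branchless` and the pointer-based signature are made up for this illustration and are not necessarily how `sort2` in the article's code works; it only shows the general mask-and-XOR trick in scalar code.

```c
#include <stdint.h>

/* Hypothetical helper (not taken from the article's code):
 * puts the smaller value in *a and the larger in *b
 * using a mask instead of an if/else. */
static inline void sort2_branchless(int32_t *a, int32_t *b)
{
    int32_t x = *a;
    int32_t y = *b;

    /* The comparison yields 0 or 1; negating it gives 0 or all one bits. */
    int32_t mask = -(int32_t)(x > y);

    /* diff is x ^ y when a swap is needed, otherwise 0. */
    int32_t diff = (x ^ y) & mask;

    *a = x ^ diff;  /* becomes y if swapped, stays x otherwise */
    *b = y ^ diff;  /* becomes x if swapped, stays y otherwise */
}
```

Whether the compiler really emits jump-free machine code for this depends on the target and the flags, so it is worth checking the generated assembly.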
jjgreen•1h ago
chkas•45m ago