This trick is very useful on Nvidia GPUs for calculating mins and maxes in some cases, e.g. atomic mins (better u32 support than f32) or warp-wide mins with `redux.sync` (only supports u32, not f32).
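For anyone wondering why an integer min can stand in for a float min: for IEEE 754 floats with the sign bit clear and no NaNs, the raw bit patterns sort in the same order as the values. A minimal CPU-side sketch in Rust, assuming exactly that input restriction (the name `min_via_bits` is made up for illustration):

```rust
/// Minimum of a slice of f32 values, computed by comparing the raw bit
/// patterns as u32 instead of comparing floats.
/// Assumption: every input has its sign bit clear and is not NaN;
/// otherwise the integer order no longer matches the float order.
/// Returns None for an empty slice.
fn min_via_bits(values: &[f32]) -> Option<f32> {
    values
        .iter()
        .map(|v| v.to_bits()) // reinterpret each f32 as a u32
        .min() // plain integer min
        .map(f32::from_bits) // convert the winning bit pattern back to f32
}
```

Negative inputs would need an extra bit manipulation step before the integer comparison, which this sketch leaves out.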
TheDudeMan•1h ago
How fast is it if you write a plain for loop that keeps track of the index and value of the smallest element (possibly treating the values as ints)?
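Something like the following is what I have in mind; a rough sketch rather than a benchmarked implementation. The name `argmin_loop` is hypothetical, and comparing by bit pattern is only valid under the assumption that every value has its sign bit clear and is not NaN:

```rust
/// Index of the smallest element, found with a plain indexed loop.
/// Values are compared by their u32 bit patterns, which matches float
/// ordering only when every input has its sign bit clear and is not NaN.
/// On ties the first occurrence wins. Returns None for an empty slice.
fn argmin_loop(values: &[f32]) -> Option<usize> {
    if values.is_empty() {
        return None;
    }
    let mut best_idx = 0;
    let mut best_bits = values[0].to_bits();
    for i in 1..values.len() {
        let bits = values[i].to_bits();
        // Strict comparison keeps the earliest index when values are equal.
        if bits < best_bits {
            best_bits = bits;
            best_idx = i;
        }
    }
    Some(best_idx)
}
```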
nine_k•1h ago
I'd hazard a guess that it would be the same, because the compiler would produce a loop out of .iter(), would expose the loop index via .enumerate(), and would keep track of that index in .min_by(). I suppose the lambda would be inlined, maybe even along with the comparisons.
I wonder whether that could be made faster with AVX instructions; they can find the minimum among several u32 values, but not directly its index.
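For reference, the iterator chain I am describing looks roughly like this. It is a sketch, not necessarily the article's exact code: the function name `argmin_iter` is made up, and using `total_cmp` as the comparator is my assumption.

```rust
/// Index of the smallest element using .iter().enumerate().min_by().
/// `total_cmp` imposes a total order on f32, so NaN inputs do not panic.
/// On ties the first occurrence wins. Returns None for an empty slice.
fn argmin_iter(values: &[f32]) -> Option<usize> {
    values
        .iter()
        .enumerate() // pairs each value with its index
        .min_by(|(_, a), (_, b)| a.total_cmp(b)) // compare values only
        .map(|(idx, _)| idx) // keep the index, drop the value
}
```

Whether this compiles down to the same machine code as a hand-written loop is something to check in the generated assembly rather than take on faith.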
teo_zero•39m ago
I had expected something about algorithms, not Rust-specific implementations.
why_only_15•2h ago