https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/...
https://learn.microsoft.com/en-us/dotnet/api/system.numerics...
A language that lets you apply an operation to a whole array can be compiled to SIMD instructions far more easily than one where the compiler must first recognize that a loop merely touches each element of an array.
Because of this syntactic limitation, the compiler must guess when it may execute parts of the program concurrently. Very often it cannot prove that changing the order of execution is valid (for example, when two pointers might refer to overlapping memory), so it gives up and does not vectorize the loop.
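A minimal C++ sketch of that ambiguity (the function names are just for illustration): with plain pointers the compiler has to assume the destination and source may overlap, so it must either keep the scalar ordering or emit a runtime overlap check; a restrict-style annotation removes the doubt.

    #include <cstddef>

    // With plain pointers the compiler must assume dst may alias src,
    // so it either stays scalar or guards the vector path with a
    // runtime overlap check.
    void scale_maybe_aliased(float* dst, const float* src, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * 2.0f;
    }

    // __restrict (non-standard, but supported by GCC, Clang, and MSVC)
    // promises "no overlap", making the loop a straightforward
    // vectorization target.
    void scale_restrict(float* __restrict dst, const float* __restrict src,
                        std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * 2.0f;
    }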
Many programming languages, and extensions to traditional ones such as OpenMP or CUDA, remove this limitation: parallelization becomes deterministic, instead of being unpredictable and easily broken by any minor edit to the source, as it is in mainstream languages.
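A rough sketch of what that explicitness looks like with OpenMP (the saxpy signature is illustrative, not taken from the thread): the pragma is the programmer asserting that iterations are independent, so vectorization no longer depends on the compiler's guesswork.

    #include <cstddef>

    // "#pragma omp simd" tells the compiler the iterations may be
    // executed in SIMD lanes; CUDA expresses the same idea by writing
    // the per-element body as a kernel launched over a grid of threads.
    void saxpy(float* __restrict y, const float* __restrict x,
               float a, std::size_t n) {
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

Built with -fopenmp-simd (or -fopenmp) on GCC or Clang, the loop is vectorized regardless of how the surrounding code is later edited.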
rbanffy•6mo ago
gary_0•6mo ago
Otherwise, if you want to smack proper vectors and matrices together at high speed, libraries like Eigen or DXMath already abstract away the SIMD details and work great. For nitty-gritty stuff like codecs, that's always going to be handwritten with intrinsics (or ASM), and that's fine. And libc functions like memcpy already use the fastest, fanciest instructions. It's mostly a solved problem.
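A tiny Eigen sketch of that point (sizes and names are arbitrary): the element-wise expression is written once, and the library's templates typically emit packed SSE/AVX/NEON code under the hood, with no intrinsics in sight.

    #include <Eigen/Dense>

    int main() {
        // Element-wise arrays; Eigen fuses and vectorizes the expression.
        Eigen::ArrayXf a = Eigen::ArrayXf::Random(1 << 16);
        Eigen::ArrayXf b = Eigen::ArrayXf::Random(1 << 16);
        Eigen::ArrayXf c = a * b + 0.5f;
        return c.sum() > 0 ? 0 : 1;   // use the result so it isn't optimized away
    }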
Lastly, for a lot of tasks, regular math instructions are plenty fast. On modern CPUs you need to be doing a lot of math before SIMD is worth worrying about. And once your program becomes particularly math-heavy, you'll probably want to move it to the GPU anyway.
rbanffy•6mo ago