https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/...
https://learn.microsoft.com/en-us/dotnet/api/system.numerics...
A language that lets you apply an operation to a whole array can be compiled to SIMD instructions far more easily than one where the compiler must first recognize that a loop merely touches each element of an array.
Because of this syntactic limitation, the compiler must guess when it may execute parts of the program concurrently. Very often it cannot prove that changing the order of execution is valid (for example, when two pointers might refer to overlapping memory), so it gives up and does not vectorize the loop.
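A minimal C++ sketch of that ambiguity (the function names are just for illustration): with plain pointers the compiler has to assume the destination and source may overlap, so it must either keep the scalar ordering or emit a runtime overlap check; a restrict-style annotation removes the doubt.

    #include <cstddef>

    // With plain pointers the compiler must assume dst may alias src,
    // so it either stays scalar or guards the vector path with a
    // runtime overlap check.
    void scale_maybe_aliased(float* dst, const float* src, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * 2.0f;
    }

    // __restrict (non-standard, but supported by GCC, Clang, and MSVC)
    // promises "no overlap", making the loop a straightforward
    // vectorization target.
    void scale_restrict(float* __restrict dst, const float* __restrict src,
                        std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * 2.0f;
    }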
Many programming languages, and extensions to traditional ones such as OpenMP or CUDA, remove this limitation: parallelization becomes deterministic, instead of being unpredictable and easily broken by any minor edit to the source, as it is in mainstream languages.
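A rough sketch of what that explicitness looks like with OpenMP (the saxpy signature is illustrative, not taken from the thread): the pragma is the programmer asserting that iterations are independent, so vectorization no longer depends on the compiler's guesswork.

    #include <cstddef>

    // "#pragma omp simd" tells the compiler the iterations may be
    // executed in SIMD lanes; CUDA expresses the same idea by writing
    // the per-element body as a kernel launched over a grid of threads.
    void saxpy(float* __restrict y, const float* __restrict x,
               float a, std::size_t n) {
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

Built with -fopenmp-simd (or -fopenmp) on GCC or Clang, the loop is vectorized regardless of how the surrounding code is later edited.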
rbanffy•6mo ago
gary_0•6mo ago
Otherwise, if you want to smack proper vectors and matrices together at high speed, libraries like Eigen or DXMath already abstract away the SIMD details and work great. For nitty-gritty stuff like codecs, that's always going to be handwritten with intrinsics (or ASM), and that's fine. And libc functions like memcpy already use the fastest, fanciest instructions. It's mostly a solved problem.
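A tiny Eigen sketch of that point (sizes and names are arbitrary): the element-wise expression is written once, and the library's templates typically emit packed SSE/AVX/NEON code under the hood, with no intrinsics in sight.

    #include <Eigen/Dense>

    int main() {
        // Element-wise arrays; Eigen fuses and vectorizes the expression.
        Eigen::ArrayXf a = Eigen::ArrayXf::Random(1 << 16);
        Eigen::ArrayXf b = Eigen::ArrayXf::Random(1 << 16);
        Eigen::ArrayXf c = a * b + 0.5f;
        return c.sum() > 0 ? 0 : 1;   // use the result so it isn't optimized away
    }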
Lastly, for a lot of tasks, regular math instructions are plenty fast. On modern CPUs you need to be doing a lot of math before SIMD is worth worrying about. And once your program becomes particularly math-heavy, you'll probably want to move it to the GPU anyway.
rbanffy•6mo ago