The messy reality of SIMD (vector) functions

https://johnnysswlab.com/the-messy-reality-of-simd-vector-functions/
45•mfiguiere•5h ago

Comments

exDM69•2h ago
This right here illustrates why I think there should be better first class SIMD in languages and why intrinsics are limited.

When using GCC/clang SIMD extensions in C (or Rust nightly), the implementation of sin4f and sin8f are line by line equal, with the exception of types. You can work around this with templates/generics.

The sin function is entirely basic arithmetic operations, no fancy instructions are needed (at least for the "computer graphics quality" 32 bit sine function I am using).

Contrast this with intrinsics where the programmer needs to explicitly choose the mm128 or mm256 instruction even for trivial stuff like addition and other arithmetic.
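As a rough illustration of the point about sin4f and sin8f, here is a minimal sketch using GCC/clang vector extensions in C. The names `f32x4`, `f32x8`, `DEFINE_SIN_APPROX`, `sin_approx4` and `sin_approx8` are made up for this example, and the polynomial is just a short Taylor series, not the commenter's actual sine implementation. Since C has no templates, a macro stands in for the generics.

```c
/* GCC/clang vector extension types: 4 and 8 lanes of 32-bit float. */
typedef float f32x4 __attribute__((vector_size(16)));
typedef float f32x8 __attribute__((vector_size(32)));

/* One body, stamped out per vector type. The polynomial is a plain
   Taylor series for sin(x) near zero, chosen only for illustration. */
#define DEFINE_SIN_APPROX(NAME, VEC)                                    \
    static inline VEC NAME(VEC x) {                                     \
        VEC x2 = x * x;                                                 \
        return x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f))); \
    }

DEFINE_SIN_APPROX(sin_approx4, f32x4)
DEFINE_SIN_APPROX(sin_approx8, f32x8)
```

Only the type differs between the two instantiations; the arithmetic is written once with ordinary operators, and the compiler picks the instructions for the target.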

Similarly, a 4x4 matrix multiplication function is the exact same code for 64 bit double and 32 bit float if you're using built in SIMD. A bit of generics and no duplication is needed. Whereas intrinsics again need two separate implementations.

I understand that there are cases where intrinsics are required or can deliver better performance, but both C/C++ and Rust have a zero-cost fallback to intrinsics. You can "convert" between f32x4 and mm128 at zero cost (no instructions emitted, just compiler type information).
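A minimal sketch of what such a fallback might look like in C with GCC/clang vector extensions. The function name `rcp_approx` and the type name `f32x4` are assumptions for this example. On x86 it casts to `__m128` and calls `_mm_rcp_ps`, on AArch64 it uses `vrecpeq_f32`, and elsewhere it falls back to a plain division. The casts are expected to be pure type reinterpretations.

```c
typedef float f32x4 __attribute__((vector_size(16)));

#if defined(__SSE__)
#include <xmmintrin.h>
/* __m128 and f32x4 are both 16-byte float vectors, so the casts only change
   the type the compiler sees; they should not emit any instructions. */
static inline f32x4 rcp_approx(f32x4 x) {
    return (f32x4)_mm_rcp_ps((__m128)x);
}
#elif defined(__aarch64__)
#include <arm_neon.h>
/* Same idea with the NEON reciprocal estimate. */
static inline f32x4 rcp_approx(f32x4 x) {
    return (f32x4)vrecpeq_f32((float32x4_t)x);
}
#else
/* Portable fallback: an exact division instead of an estimate. */
static inline f32x4 rcp_approx(f32x4 x) {
    return 1.0f / x;
}
#endif
```

Both hardware paths return an estimate rather than an exact reciprocal, so callers that need full precision would still have to refine the result.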

I do use some intrinsics in my SIMD code this way (rsqrt, rcp, ...). The CPU specific code is just a few percent of the overall lines of code, and that's for Arm and x86 combined.

The killer feature is that my code will compile into x86_64/SSE and Aarch64/neon. And I can use wider vectors than the CPU actually supports, the compiler knows how to break it down to what the target CPU supports.

I'm hoping that Rust std::simd gets stabilized soon. I've used it for many years and it works great. And when it doesn't, I have a zero-cost fallback to intrinsics.

Some very respected people have the opinion that std::simd or its C equivalent suffer from a "least common denominator problem". I don't disagree with the issue but I don't think it really matters when we have a zero cost fallback available.

camel-cdr•1h ago
My personal gripe with Rust's std::simd in its current form is that it makes writing portable SIMD hard while making non-portable SIMD easy. [0]

> the implementation of sin4f and sin8f are line by line equal, with the exception of types. You can work around this with templates/generics

This is true, I think most SIMD algorithms can be written in such a vector length-agnostic way, however almost all code using std::simd specifies a specific lane count instead of using the native vector length. This is because the API favors the use of fixed-size types (e.g. f32x4), which are exclusively used in all documentation and example code.

If I search github for `f32x4 language:Rust` I get 6.4k results, with `"Simd<f32," language:Rust NOT "Simd<f32, 2" NOT "Simd<f32, 4" NOT "Simd<f32, 8"` I get 209.

I'm not even aware of a way to detect the native vector length using std::simd. You have to use the target-feature or multiversion crate, as shown in the last part of the rust-simd-book [1]. Well, kind of like that, because their suggestion uses "suggested_vector_width", which doesn't exist. I could only find a suggested_simd_width.

Searching for "suggested_simd_width language:Rust", we are now down to 8 results, 3 of which are from the target-feature/multiversion crates.

---

What I'm trying to say is that, while being able to specify a fixed SIMD width can be useful, the encouraged default should be "give me a SIMD vector of the specified type corresponding to the SIMD register size". If your problem can only be solved with a specific vector length, great, then hard-code the lane count, but otherwise don't.
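For comparison, the "give me the native width" default can be approximated in C with GCC/clang vector extensions and the compiler's predefined target macros. This is only a sketch: `NATIVE_VEC_BYTES`, `f32xn`, `F32_LANES` and `scale` are names invented for this example, and the width is chosen at compile time for the build target, not detected at runtime.

```c
#include <stddef.h>
#include <string.h>

/* Compile-time guess at the widest float vector of the build target.
   The macro and type names here are made up for this sketch. */
#if defined(__AVX512F__)
#define NATIVE_VEC_BYTES 64
#elif defined(__AVX__)
#define NATIVE_VEC_BYTES 32
#else
#define NATIVE_VEC_BYTES 16 /* SSE, NEON, or a generic fallback */
#endif

typedef float f32xn __attribute__((vector_size(NATIVE_VEC_BYTES)));
enum { F32_LANES = NATIVE_VEC_BYTES / sizeof(float) };

/* Scale an array in place; nothing below hard-codes a lane count. */
static void scale(float *data, size_t n, float factor) {
    size_t i = 0;
    for (; i + F32_LANES <= n; i += F32_LANES) {
        f32xn v;
        memcpy(&v, data + i, sizeof v);   /* unaligned load */
        v = v * factor;
        memcpy(data + i, &v, sizeof v);   /* unaligned store */
    }
    for (; i < n; ++i) data[i] *= factor; /* scalar tail */
}
```

The loop body never mentions 4, 8 or 16 lanes, so rebuilding with different target flags changes the vector width without touching the algorithm.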

See [0] for more examples of this.

[0] https://github.com/rust-lang/portable-simd/issues/364#issuec...

[1] https://calebzulawski.github.io/rust-simd-book/4.2-native-ve...

exDM69•31m ago
I have written both type generic (f32 vs f64) and width generic (f32x4 vs f32x8) SIMD code with Rust std::simd.

And I agree it's not very pretty. I had to resort to having a giant where clause for the generic functions, explicitly enumerating the required std::ops traits. C++ templates don't have this particular issue, and I've used those for the same purpose too.

But even though the implementation of the generic functions is quite ugly indeed, using the functions once implemented is not ugly at all. It's just the "primitive" code that is hairy.

I think this was a huge missed opportunity in the core language, there should've been a core SIMD type with special type checking rules (when unifying) for this.

However, I still think std::simd is miles better than intrinsics for 98% of the SIMD code I write.

The other 1% (times two for two instruction sets) is just as bad as it is in any other language with intrinsics.

The native vector width and target-feature multiversioning dispatch are quite hairy. Adding some dynamic dispatch in the middle of your hot loops can also have disastrous performance implications because they tend to kill other optimizations and make the cpu do indirect jumps.

Have you tried just using the widest possible vector size? e.g. f64x64 or something like it. The compiler can split these to the native vector width of the compiler target. This happens at compile time so it is not suitable if you want to run your code on CPUs with different native SIMD widths. I don't have this problem with the hardware I am targeting.
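A minimal sketch of the "just use a wide vector" approach in C with GCC/clang vector extensions, assuming a 16-lane float type (`f32x16`) and a hypothetical `saxpy_wide` helper. The 64-byte vector is wider than SSE or NEON registers, and the compiler is expected to split each operation into however many native-width instructions the build target needs.

```c
#include <stddef.h>
#include <string.h>

/* 16 lanes of float = 64 bytes, wider than SSE/NEON (16) or AVX (32). */
typedef float f32x16 __attribute__((vector_size(64)));

/* y[i] += a * x[i], processed 16 elements at a time. */
static void saxpy_wide(float a, const float *x, float *y, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        f32x16 vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* unaligned loads */
        memcpy(&vy, y + i, sizeof vy);
        vy = vy + a * vx;               /* lowered to native-width ops */
        memcpy(y + i, &vy, sizeof vy);  /* unaligned store */
    }
    for (; i < n; ++i) y[i] += a * x[i]; /* scalar tail */
}
```

As the comment notes, the split is decided at compile time, so a single binary built this way does not adapt to CPUs with different native widths.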

Rust std::simd docs aren't great and there have been some breaking changes in the few years I've used it. There is certainly more work on that front. But it would be great if at least the basic stuff would get stabilized soon.

MangoToupe•11m ago
> portable SIMD

this seems like an oxymoron

MangoToupe•12m ago
> first class SIMD in languages

People have said this for longer than I've been alive. I don't think it's a meaningful concept.

kookamamie•2h ago
I don't think the native C++, even when bundled with OMP, goes far enough.

In my experience, ISPC and Google's Highway project lead to better results in practice, mostly due to their dynamic dispatching features.

William_BB•1h ago
Could you elaborate on the dynamic dispatching features a bit more? Is that for portability?

camel-cdr•22m ago
Here is an example using google highway: https://godbolt.org/z/Y8vsonTb8

See how the code has only been written once, but multiple versions of the same functions were generated targeting different hardware features (e.g. SSE, AVX, AVX512). Then `HWY_DYNAMIC_DISPATCH` can be used to dynamically call the fastest one matching your CPU at runtime.
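The general idea can be sketched by hand in plain C. This is not Highway's API, and Highway generates the per-target copies from a single source, whereas this sketch writes the loop twice. It uses GCC/clang's `target` function attribute and the x86-only `__builtin_cpu_supports` builtin; the names `add_fn`, `add_baseline`, `add_avx2` and `select_add` are invented for this example.

```c
#include <stddef.h>

typedef void (*add_fn)(const float *a, const float *b, float *out, size_t n);

/* Baseline version, compiled for the default target. */
static void add_baseline(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but this copy may be auto-vectorized with AVX2 instructions. */
__attribute__((target("avx2")))
static void add_avx2(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

/* Pick an implementation once, based on what the running CPU reports. */
static add_fn select_add(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_baseline;
}
```

A caller would typically store the result of `select_add()` once at startup and call through that pointer for whole arrays, so the indirect call stays outside the hot inner loop.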

dwattttt•2h ago
> Function calls also have that negative property that the compiler doesn’t know what happens after calling them so it needs to assume the worst happens. And by the worst, it has to assume that the function can change any memory location and optimize for such a case. So it omits many useful compiler optimizations.

This is not the case in C. It might be technically possible for a function to modify any memory, but it wouldn't be legal, and compilers don't need to optimise for the illegal cases.

RossBencina•1h ago
Sounds like the author hasn't heard of full program optimisation. EDIT: except they explicitly mention LTO near the end.

mattmaynes•37m ago
This is where the power and expressiveness of kdb+ shines. It has SIMD primitives out of the box and can optimize your code based on data types to take advantage of it. https://kx.com/blog/what-makes-time-series-database-kdb-so-f...

MangoToupe•10m ago
Time series is vector processing on easy mode, though. The hard part is applying SIMD to problems that aren't shaped to be easily processed in parallel.
