SIMD programming in pure Rust

https://kerkour.com/introduction-rust-simd
33•randomint64•2d ago

Comments

crote•2d ago
What is the "nasty surprise" of Zen 4 AVX512? Sure, it's not quite twice as fast as you might initially assume, but (unlike Intel's downclocking) it's still a strict upgrade over AVX2, is it not?
cogman10•1h ago
It's splitting a 512-bit instruction into two 256-bit instructions internally. That's the main nasty surprise.

I suppose it saves a little on the decoding portion, but it's ultimately no more effective than just issuing the two 256-bit instructions yourself.

MobiusHorizons•1h ago
The benefit seems to be that we are one step closer to not needing to have the fallback path. This was probably a lot more relevant before Intel shit the bed on consumer AVX-512, with E-cores not having the feature.
convolvatron•1h ago
AVX-512 on Zen 4 also includes a bunch of instructions that weren't in the 256-bit set, including enhanced masking, 16-bit floats, bit instructions, and a double-sized, double-width register file.
rwaksmunski•2d ago
Every Rust SIMD article should mention the .chunks_exact() auto vectorization trick by law.
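For readers who have not seen the trick, here is a minimal sketch of what it usually looks like. The function name `add_assign_u32` and the lane count of 8 are illustrative assumptions, not something taken from the article. Whether LLVM actually emits vector instructions depends on the compiler version, optimization level, and target features, so check the generated assembly before relying on it.

```rust
/// Adds `b` into `a` element-wise (wrapping on overflow).
/// Hypothetical helper, written to show the shape of the `chunks_exact` pattern.
fn add_assign_u32(a: &mut [u32], b: &[u32]) {
    assert_eq!(a.len(), b.len());
    // Assumed lane count: 8 x u32 = 256 bits, one AVX2 register.
    const LANES: usize = 8;

    let mut a_chunks = a.chunks_exact_mut(LANES);
    let mut b_chunks = b.chunks_exact(LANES);

    // Every chunk has exactly LANES elements, so the inner loop has a fixed
    // trip count and the bounds checks are easy for the optimizer to remove.
    for (ca, cb) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        for i in 0..LANES {
            ca[i] = ca[i].wrapping_add(cb[i]);
        }
    }

    // The leftover elements (fewer than LANES) are handled with a scalar loop.
    let a_tail = a_chunks.into_remainder();
    let b_tail = b_chunks.remainder();
    for (x, y) in a_tail.iter_mut().zip(b_tail) {
        *x = x.wrapping_add(*y);
    }
}
```

The point of the pattern is that the compiler sees fixed-size chunks instead of a loop with an unknown trip count, and the remainder is handed back separately.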
ChadNauseam•57m ago
Didn't know about this. Thanks!

Not related, but I often want to see the next or previous element when I'm iterating. When that happens, I always have to switch to an index-based loop. Is there a function that returns Iter<Item=(T, Option<T>)> where the second element is a lookahead?

tyilo•37m ago
You probably just want to use `.peekable()`: https://doc.rust-lang.org/stable/std/iter/trait.Iterator.htm...
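A small sketch of how `.peekable()` can produce the `(T, Option<T>)` pairs asked about above. The helper name `with_lookahead` is hypothetical, and the `Clone` bound is an assumption made so the lookahead can be returned by value.

```rust
/// Pairs every item with a clone of the item that follows it, if any.
/// Hypothetical helper built on `Iterator::peekable`.
fn with_lookahead<T: Clone>(items: impl IntoIterator<Item = T>) -> Vec<(T, Option<T>)> {
    let mut iter = items.into_iter().peekable();
    let mut out = Vec::new();
    while let Some(current) = iter.next() {
        // `peek` returns Option<&T> without advancing the iterator.
        let next = iter.peek().cloned();
        out.push((current, next));
    }
    out
}
```

If the data is already in a slice, `slice.windows(2)` gives adjacent pairs without cloning, though it does not yield the final element on its own.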
formerly_proven•1h ago
Lazy man's "kinda good enough for some cases SIMD in pure Rust" is to simply target x86-64-v3 (RUSTFLAGS=-Ctarget-cpu=x86-64-v3), which is supported by all AMD Zen and Intel CPUs since Haswell; and for floating point code, which cannot be auto-vectorized due to the accuracy implications, "simply" write it with explicit four or eight-way lanes, and LLVM will do the rest. Usually. Loops may need explicit handling of head or tail to auto-vectorize (chunks_exact helps with this, it hands you the tail).
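A sketch of the "explicit lanes" idea for floating point, assuming an eight-lane `f32` sum. The function name `sum_f32` and the lane count are illustrative choices. The compiler will not reorder a plain sequential float sum on its own, so the reordering is written out by hand with one accumulator per lane. Note that this changes the order of additions, so results can differ slightly from a sequential sum.

```rust
/// Sums a slice of f32 using eight independent accumulators.
/// Hypothetical example of hand-written lanes that LLVM can map to SIMD.
fn sum_f32(data: &[f32]) -> f32 {
    // Assumed lane count: 8 x f32 = 256 bits (AVX2 width on x86-64-v3).
    const LANES: usize = 8;

    let chunks = data.chunks_exact(LANES);
    // `remainder` is the tail that did not fill a whole chunk.
    let tail = chunks.remainder();

    let mut acc = [0.0f32; LANES];
    for chunk in chunks {
        // Each lane only ever adds to itself, so there is no cross-lane
        // dependency and the loop body can become a single vector add.
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Horizontal reduction of the lanes, then the scalar tail.
    let mut total: f32 = acc.iter().sum();
    for &x in tail {
        total += x;
    }
    total
}
```

Built with `RUSTFLAGS=-Ctarget-cpu=x86-64-v3` as the comment suggests, the inner loop is a candidate for a 256-bit vector add, but that is up to the optimizer and worth confirming in the disassembly.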
dfajgljsldkjag•47m ago
The benchmarks on Zen 5 are absolutely insane for just a bit of extra work. I really hope the portable SIMD module stabilizes soon, so we do not have to keep rewriting the same logic for NEON and AVX every time we want to optimize something. That example about implementing ChaCha20 twice really hit home for me.
jeffbee•38m ago
"Intel CPUs were downclocking their frequency when using AVX-512 instructions due to excessive energy usage (and thus heat generation) which led to performance worse than when not using AVX-512 acceleration."

This is an overstatement so gross that it can be considered false. On Skylake-X, for mixed workloads that only had a few AVX-512 instructions, a net performance loss could have happened. On Ice Lake and later this statement was not true in any way. For code like ChaCha20 it was not true even on Skylake-X.

shihab•8m ago
> For example, NEON ... can hold up to 32 128-bit vectors to perform your operations without having to touch the "slow" memory.

Something I recently learnt: the actual number of physical registers in modern x86 CPUs is significantly larger, even for 512-bit SIMD. Zen 5 CPUs actually have 384 vector registers, 384*512b = 24KB!
