
Optimizing a lock-free ring buffer

https://david.alvarezrosa.com/posts/optimizing-a-lock-free-ring-buffer/
49•dalvrosa•2d ago

Comments

dalvrosa•21h ago
From 12M ops/s to 305M ops/s on a lock-free ring buffer.

In this post, I walk you step by step through implementing a single-producer single-consumer queue from scratch.

This pattern is widely used to share data between threads in the lowest-latency environments.

loeg•1h ago
Your blog footer mentions that code samples are GPL unless otherwise noted. You don't seem to note otherwise in the article, so -- do you consider these snippets GPL licensed?
dalvrosa•1h ago
Actually, I'm not sure. The GPL was meant for the source code of the website itself.

I guess the code samples inside the post are under https://david.alvarezrosa.com/LICENSE

But feel free to ping me if you need a different license, I'm quite open about it.

kristianp•20h ago
This is in C++, other languages have different atomic primitives.
dalvrosa•17h ago
Yeah, this is quite specific to C++ (at a syntax level)
jitl•1h ago
Really? Pretty much all atomics I’ve used have load and store for various integer sizes. I wrote a ring buffer in Go that’s very similar to the final design here using similar atomics.

https://pkg.go.dev/sync/atomic#Int64

dalvrosa•1h ago
Nice one, thanks for sharing. Do you wanna share the ring buffer code itself?
wat10000•1h ago
They generally map directly to concepts in the CPU architecture. On many architectures, load/store instructions are already guaranteed to be atomic as long as the address is properly aligned, so atomic load/store is just a load/store. Non-relaxed ordering may emit a variant load/store instruction or a separate barrier instruction. Compare-exchange will usually emit a compare and swap, or load-linked/store-conditional sequence. Things like atomic add/subtract often map to single instructions, or might be implemented as a compare-exchange in a loop.

The exact syntax and naming will of course differ, but any language that exposes low-level atomics at all is going to provide a pretty similar set of operations.

dalvrosa•48m ago
100% agree +1
amluto•1h ago
Huh? Other languages that compile to machine code and offer control over struct layout and access to the machine’s atomic will work the same way.

Sure, C++ has a particular way of describing atomics in a cross-platform way, but the actual hardware operations are not specific to the language.

dalvrosa•1h ago
Yeah, different languages will have different syntaxes and ways of using atomics

But at the hardware level they are all kind of the same

smj-edison•1h ago
Don't most people use C++11 atomics now? You have SeqCst, Release, Acquire, and Relaxed (with Consume deprecated due to the difficulty of implementing it). You can do loads, stores, and exchanges with each ordering type. Zig, Rust, and C all use the same orderings. I guess Java has its own memory model since it's been around a lot longer, but most people have standardized around C++'s design.

Which is a slight shame since Load-Linked/Store-Conditional is pretty cool, but I guess that's limited to ARM anyways, and now they've added extensions for CAS due to speed.

loeg•1h ago
LL/SC is still hinted at in the C++11 model with std::atomic<T>::compare_exchange_weak:

https://en.cppreference.com/w/cpp/atomic/atomic/compare_exch...
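
As a rough illustration of the compare-exchange loop mentioned in this subthread, here is a minimal sketch. The helper name `fetch_multiply` is hypothetical (the standard library has no such operation), which is exactly why it has to be built from a retry loop around `compare_exchange_weak`.

```cpp
#include <atomic>

// Hypothetical helper, not part of the standard library or the article:
// atomically multiply `target` by `factor` and return the previous value.
inline int fetch_multiply(std::atomic<int>& target, int factor) {
    int expected = target.load(std::memory_order_relaxed);
    // compare_exchange_weak is allowed to fail spuriously (for example on
    // LL/SC hardware), so it belongs in a loop. On failure it overwrites
    // `expected` with the value currently stored, and we retry with that.
    while (!target.compare_exchange_weak(expected, expected * factor,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}
```

On architectures with a native compare-and-swap the weak and strong forms typically compile to the same thing; the weak form only gives the compiler room to skip an inner retry loop on LL/SC machines.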

superxpro12•1h ago
I've taken an interest in lock-free queues for ultra-low power embedded... think Cortex-m0, or even avr/pic.

Things get interesting when you're working with a CPU that lacks the ldrex/strex assembly instructions that make this all work. I think your only option at that point is to disable/enable interrupts. If anyone has any insights into this constraint I'd love to hear them.

loeg•38m ago
For ultra low-power embedded, wouldn't a mutex approach work just fine? You're running on a single core anyway.
blacklion•41m ago
JVM has almost the same (C++ memory model was modeled after JVM one, with some subtle fixes).
sanufar•1h ago
Super fun, def gonna try this on my own time later
dalvrosa•1h ago
Feel free to share your findings
JonChesterfield•1h ago
It's obviously, trivially broken. Stores the index before storing the value, so the other thread reads nonsense whenever the race goes against it.

Also doesn't have fences on the store, has extra branches that shouldn't be there, and is written in really stylistically weird c++.

Maybe an llm that likes a different language more, copying a broken implementation off github? Mostly commenting because the initial replies are "best" and "lol", though I sympathise with one of those.

dalvrosa•1h ago
Sorry, but that's not actually true. There are no data races, the atomics prevent that (note that there are only one consumer and one producer)

Regarding the style, it follows the "almost always auto" idea from Herb Sutter

secondcoming•1h ago
If you enforce that the buffer size is a power of 2 you just use a mask to do the

    if (next_head == buffer.size())
        next_head = 0;
part
JonChesterfield•58m ago
If it's a power of two, you don't need the branch at all. Let the unsigned index wrap.
dalvrosa•49m ago
Interesting, I've never heard about anybody using this. Maybe a bit unreadable? But yeah, should work :)
loeg•35m ago
You ultimately need a mask to access the correct slot in the ring. But it's true that you can leave unmasked values in your reader/writer indices.
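
A small sketch of the free-running index idea from this subthread, assuming 32-bit unsigned indices and a capacity of 8. The names `kCapacity`, `slot_of` and `size_of` are illustrative and do not come from the article.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed capacity for illustration; it must be a power of two so that
// masking is equivalent to taking the index modulo the capacity.
constexpr std::size_t kCapacity = 8;
static_assert((kCapacity & (kCapacity - 1)) == 0,
              "capacity must be a power of two");

// Indices are never reset to zero; they just wrap around at 2^32.
// The mask is applied only at the moment a slot is accessed.
constexpr std::size_t slot_of(std::uint32_t index) {
    return index & (kCapacity - 1);
}

// Unsigned subtraction is modular, so the element count stays correct
// even after `head` has wrapped past zero and `tail` has not.
constexpr std::uint32_t size_of(std::uint32_t head, std::uint32_t tail) {
    return head - tail;
}
```

A side effect of this scheme is that `head == tail` means empty and `head - tail == kCapacity` means full, so all slots are usable rather than one being sacrificed to tell the two states apart.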
dalvrosa•52m ago
Indeed that's true. That extra constraint enables further optimization

It's mentioned in the post, but worth reiterating!

loeg•37m ago
This was, in fact, mentioned in the article.
loeg•1h ago
> It's obviously, trivially broken. Stores the index before storing the value, so the other thread reads nonsense whenever the race goes against it.

Are we reading the same code? The stores are clearly after value accesses.

> Also doesn't have fences on the store

?? It uses acquire/release semantics seemingly correctly. Explicit fences are not required.

JonChesterfield•59m ago
Push:

    buffer_[head] = value;
    head_.store(next_head, std::memory_order_release);
    return true;

There's no relationship between the two written variables. Stores to the two are independent and can be reordered. The aq/rel applies to the index, not to the unrelated non-atomic buffer located near the index.

loeg•45m ago
> There's no relationship between the two written variables. Stores to the two are independent and can be reordered. The aq/rel applies to the index, not to the unrelated non-atomic buffer located near the index.

No, this is incorrect. If you think there's no relationship, you don't understand "release" semantics.

https://en.cppreference.com/w/cpp/atomic/memory_order.html

> A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store.

judofyr•43m ago
This is just wrong. See https://en.cppreference.com/w/cpp/atomic/memory_order.html. Emphasis mine:

> A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below).

blacklion•38m ago
write with release semantic cannot be reordered with any other writes, dependent or not.

Relaxed atomic writes can be reordered in any way.

loeg•32m ago
> write with release semantic cannot be reordered with any other writes, dependent or not.

To quibble a little bit: later program-order writes CAN be reordered before release writes. But earlier program-order writes may not be reordered after release writes.

> Relaxed atomic writes can be reordered in any way.

To quibble a little bit: they can't be reordered with other operations on the same variable.
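
To make the release/acquire argument in this subthread concrete, here is a minimal single-producer single-consumer sketch. It is not the article's code: the class name `SpscRing` and the `bool pop(T&)` signature are assumptions, and one slot is deliberately left unused to distinguish full from empty.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Illustrative SPSC ring; holds at most N - 1 elements.
template <typename T, std::size_t N>
class SpscRing {
public:
    // Call from the producer thread only.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buffer_[head] = value;  // plain, non-atomic write happens first
        // The release store cannot be reordered before the write above, so a
        // consumer that acquires this index also sees the slot contents.
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buffer_[tail];  // safe: the acquire load synchronized with push
        // Releasing the new tail tells the producer the slot may be reused.
        tail_.store((tail + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> buffer_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

Each index is written by exactly one thread, which is why plain loads and stores suffice and no compare-exchange is needed.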

Blackthorn•1h ago
I had what I thought was a pretty good implementation, but I wasn't aware of the cache line bouncing. Looks like I've got some updates to make.
dalvrosa•58m ago
Glad that it helps :)
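
For readers wondering what avoiding cache line bouncing can look like, here is a hedged sketch of one way to do it: keep the producer's and the consumer's state on separate cache lines, and let each side work from a private, possibly stale copy of the other side's index. The 64-byte line size and all names are assumptions for illustration, not the article's exact code.

```cpp
#include <atomic>
#include <cstddef>

// Illustrative SPSC ring with cached indices; holds at most N - 1 elements.
template <typename T, std::size_t N>
class CachedSpscRing {
public:
    // Producer thread only.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == cached_tail_) {
            // Only when the ring looks full do we touch the consumer's line.
            cached_tail_ = tail_.load(std::memory_order_acquire);
            if (next == cached_tail_) return false;  // really full
        }
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == cached_head_) {
            // Only when the ring looks empty do we touch the producer's line.
            cached_head_ = head_.load(std::memory_order_acquire);
            if (tail == cached_head_) return false;  // really empty
        }
        out = buffer_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return true;
    }

private:
    static constexpr std::size_t kCacheLine = 64;  // assumed line size

    T buffer_[N]{};
    // Producer-owned cache line.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};
    std::size_t cached_tail_{0};
    // Consumer-owned cache line.
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};
    std::size_t cached_head_{0};
};
```

A stale cached index is always conservative: the other side's index only moves forward, so the cached copy can under-report free space or available items but never over-report them.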
kevincox•58m ago
Random idea: If you have a known sentinel value for empty could you avoid the reader needing to read the writer's index? Just try to read, if it is empty the queue is empty, otherwise take the item and put an empty value there. Similarly for writing you can check the value, if it isn't empty the queue is full.

It seems that in this case as you get contention the faster end will slow down (as it is consuming what the other end just read) and this will naturally create a small buffer and run at good speeds.

The hard part is probably that sentinel and ensuring that it can be set/cleared atomically. On Rust you can do `Option<T>` to get a sentinel for any type (and it very often doesn't take any space) but I don't think there is an API to atomically set/clear that flag. (Technically I think this is always possible because the sentinel that Option picks will always be small even if the T is very large, but I don't think there is an API for this.)

loeg•41m ago
Yeah, or you could put a generation number in each slot adjacent to T and a read will only be valid if the slot's generation number == the last one observed + 1, for example. But ultimately the reader and writer still need to coordinate here, so we're just shifting the coordination cache line from the writer's index to the slot.
kevincox•37m ago
I think the key difference is that they only need to coordinate when the reader and writer are close together. If that slows one end down they naturally spread apart. So you don't lose throughput, only a little latency in the contested case.
loeg•34m ago
> I think the key difference is that they only need to coordinate when the reader and writer are close together.

This was already the case with the cached index design at the end of the article, though. (Which doesn't require extra space or extra atomic stores.)
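
The sentinel idea from this subthread could be sketched roughly as follows for `int` payloads. Everything here is an assumption made for illustration: the class name `SentinelRing`, the choice of `-1` as the empty marker, and the restriction to a type that fits in a single atomic.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Illustrative only: each slot is its own atomic, and kEmpty marks a free
// slot, so neither side ever reads the other side's index.
template <std::size_t N>
class SentinelRing {
public:
    static constexpr int kEmpty = -1;  // assumed value that real data never takes

    SentinelRing() {
        for (auto& slot : slots_) slot.store(kEmpty, std::memory_order_relaxed);
    }

    // Producer thread only.
    bool push(int value) {
        assert(value != kEmpty);  // the sentinel itself cannot be enqueued
        auto& slot = slots_[write_pos_];
        // A non-empty slot means the consumer has not drained it yet.
        if (slot.load(std::memory_order_acquire) != kEmpty) return false;  // full
        slot.store(value, std::memory_order_release);
        write_pos_ = (write_pos_ + 1) % N;
        return true;
    }

    // Consumer thread only.
    bool pop(int& out) {
        auto& slot = slots_[read_pos_];
        const int value = slot.load(std::memory_order_acquire);
        if (value == kEmpty) return false;  // empty
        slot.store(kEmpty, std::memory_order_release);  // hand the slot back
        out = value;
        read_pos_ = (read_pos_ + 1) % N;
        return true;
    }

private:
    std::array<std::atomic<int>, N> slots_;
    std::size_t write_pos_ = 0;  // touched by the producer only
    std::size_t read_pos_ = 0;   // touched by the consumer only
};
```

This version costs an extra atomic store per pop, and the two thread-private positions sit next to each other here, so a serious implementation would pad them onto separate cache lines.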

erickpintor•50m ago
Great post!

Would you mind expanding on the correctness guarantees enforced by the atomic semantics used? Are they ensuring two threads can't push to the same slot nor pop the same value from the ring? This type of atomic coordination usually comes from CAS or atomic increment calls, which I'm not seeing, thus I'm interested in hearing your take on it.

erickpintor•44m ago
I see you replied on comment below with:

> note that there are only one consumer and one producer

That clarifies things, as you don't need multi-thread coordination on reads or writes when assuming a single producer and a single consumer.

dalvrosa•42m ago
Exactly, that's right
dalvrosa•43m ago
Thanks! That's not ensured, optimizations are only valid due to the constraints

- One single producer thread

- One single consumer thread

- Fixed buffer capacity

So to answer

> Are they ensuring two threads can't push to the same slot nor pop the same value from the ring?

No need for this use case :)

loeg•39m ago
This is a SPSC queue -- there aren't multiple writers to coordinate, nor readers. It simplifies the design.
pixelpoet•48m ago
Great article, thanks for sharing. And such a lovely website too :)
dalvrosa•48m ago
Thanks for the feedback <3
ramon156•41m ago
Something to add to this; if you're focussing on these low-level optimizations, make sure the device this code runs on is actually tuned.

A lot of people focus on the code and then assume the device in question is only there to run it. There's so much you can tweak. I don't always measure it, but last time I saw at least a 20% improvement in network throughput just by tweaking a few things on the machine.

dalvrosa•38m ago
Agreed. For benchmarking I used this <https://github.com/david-alvarez-rosa/CppPlayground/blob/mai...> which relies on GoogleBenchmark and pins producer/consumer threads to dedicated CPU cores

What else could be improved? Would like to learn :)

Maybe using huge pages?