
CRDT: Text Buffer

https://madebyevan.com/algos/crdt-text-buffer/
160•skadamat•5mo ago

Comments

j0e1•5mo ago
(2024)
josephg•5mo ago
This is a pretty good description of RGA (Replicated Growable Array), which is a list and text CRDT that works pretty well in practice. Automerge used to use this algorithm for text editing, before moving across to FugueMax.

This algorithm has another obscure downside: it has interleaving problems if you insert items backwards. If two users each do a series of inserts in reverse order, their inserts will get interleaved in a weird, unpredictable way. E.g., if I type "aaaaa" (as a series of prepended inserts) and you type "bbbb" in the same way, we can end up with "ababababa", "aabbabbaa" or some combination like that. We generally want CRDTs to be non-interleaving, so "aaaaabbbb" or "bbbbaaaaa" should be the only possible results.
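To make the interleaving problem concrete, here is a minimal Python sketch of RGA's insertion rule applied directly to a flat list, followed by the "prepend" scenario with three characters per user. Assumptions (not taken from the linked code): each item stores its left parent (the item to its left when it was typed, or None for the document start) and a Lamport timestamp `seq`, and concurrent siblings are ordered with the larger `(seq, agent)` pair first. `Item`, `find_index` and `integrate_rga` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ID = Tuple[str, int]  # (agent name, per-agent counter)


@dataclass
class Item:
    id: ID
    content: str
    parent: Optional[ID]  # item to the left when typed; None = document start
    seq: int  # Lamport timestamp


def find_index(doc: List[Item], item_id: Optional[ID]) -> int:
    """Position of an item in the flat list; -1 stands for the document start."""
    if item_id is None:
        return -1
    for i, item in enumerate(doc):
        if item.id == item_id:
            return i
    raise KeyError(item_id)


def integrate_rga(doc: List[Item], new: Item) -> None:
    """Insert `new` where a pre-order walk of the RGA tree would put it."""
    parent_idx = find_index(doc, new.parent)
    dest = parent_idx + 1
    while dest < len(doc):
        other = doc[dest]
        other_parent_idx = find_index(doc, other.parent)
        if other_parent_idx < parent_idx:
            break  # walked past the end of our parent's subtree
        if other_parent_idx == parent_idx and (new.seq, new.id[0]) > (other.seq, other.id[0]):
            break  # a lower-priority sibling: the new item goes in front of it
        dest += 1  # skip higher-priority siblings and everything nested under them
    doc.insert(dest, new)


def text(doc: List[Item]) -> str:
    return "".join(item.content for item in doc)


# Two users each type three characters by repeatedly inserting at the very
# start of an empty document, so every insert has parent=None.
alice = [Item(("A", i), "a", None, i + 1) for i in range(3)]
bob = [Item(("B", i), "b", None, i + 1) for i in range(3)]

replica1: List[Item] = []
for op in alice + bob:
    integrate_rga(replica1, op)

replica2: List[Item] = []
for op in bob + alice:
    integrate_rga(replica2, op)

print(text(replica1), text(replica2))  # bababa bababa
```

Both replicas converge, but the result is interleaved: every prepend is a child of the document start, so all six characters are siblings ordered by timestamp alone.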

This problem is fixed by FugueMax, described in "The Art of the Fugue" paper[1]. If you're thinking of implementing a text CRDT, I recommend starting there. FugueMax is a tiny change from RGA: we swap out the sequence numbers for a "right parent" pointer and the problem goes away. Coincidentally, the algorithm is also a one-line change away from Yjs's CRDT algorithm.

And it's really not that complicated. Most of the complexity in the FugueMax paper comes about because, like with RGA, they describe the algorithm in terms of inserts into a tree. If you ask me, this is a mistake. The algorithm is simpler if you primarily think of it as inserts into a list. (Thanks Kevin Jahns for this insight!) I programmed FugueMax live on camera a few months ago like this. You can fit a simple reference implementation of FugueMax in ~200 lines of code[2]. (The video is linked from the readme in that repository. In the video I explain the algorithm and all the code along the way.)
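For comparison, here is a sketch of FugueMax-style insertion done directly on a list, with hypothetical names. Each item stores a right origin (the item to its right when it was typed, or None for the document end) instead of a sequence number, and the scan uses both origins to decide where a concurrent insert belongs, with true conflicts broken by the lower agent name. This is an illustrative reconstruction of the list-based approach, not a port of the linked reference implementation, and it has only been checked against the prepend scenario below.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ID = Tuple[str, int]  # (agent name, per-agent counter)


@dataclass
class Item:
    id: ID
    content: str
    origin_left: Optional[ID]  # item to the left when typed; None = document start
    origin_right: Optional[ID]  # item to the right when typed; None = document end


def find_index(doc: List[Item], item_id: Optional[ID], missing: int) -> int:
    """Position of an item in the flat list, or `missing` when item_id is None."""
    if item_id is None:
        return missing
    for i, item in enumerate(doc):
        if item.id == item_id:
            return i
    raise KeyError(item_id)


def integrate_fugue_max(doc: List[Item], new: Item) -> None:
    left = find_index(doc, new.origin_left, -1)
    right = find_index(doc, new.origin_right, len(doc))
    dest = left + 1
    scanning = False  # True while passing items we might still need to go before
    i = dest
    while True:
        if not scanning:
            dest = i
        if i == len(doc) or i == right:
            break
        other = doc[i]
        o_left = find_index(doc, other.origin_left, -1)
        o_right = find_index(doc, other.origin_right, len(doc))
        if o_left < left:
            break  # other's left origin lies before ours: stop here
        if o_left == left:
            if o_right < right:
                scanning = True  # can't decide yet; keep looking
            elif o_right == right:
                if new.id[0] < other.id[0]:
                    break  # true conflict: the lower agent name goes first
                scanning = False
            else:
                scanning = False
        # o_left > left: other is nested after an earlier item; skip it
        i += 1
    doc.insert(dest, new)


def text(doc: List[Item]) -> str:
    return "".join(item.content for item in doc)


def prepends(agent: str, ch: str, n: int) -> List[Item]:
    """n inserts at the start of a document holding only this agent's text."""
    ops: List[Item] = []
    for i in range(n):
        right = ops[-1].id if ops else None  # the previous insert is to our right
        ops.append(Item((agent, i), ch, None, right))
    return ops


alice, bob = prepends("A", "a", 3), prepends("B", "b", 3)
replica1: List[Item] = []
for op in alice + bob:
    integrate_fugue_max(replica1, op)
replica2: List[Item] = []
for op in bob + alice:
    integrate_fugue_max(replica2, op)

print(text(replica1), text(replica2))  # aaabbb aaabbb
```

Replaying the same prepend scenario now keeps each user's characters together, whichever order the operations arrive in.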

[1] https://arxiv.org/abs/2305.00583

[2] https://github.com/josephg/crdt-from-scratch/blob/master/crd...

satvikpendem•5mo ago
Big fan of Automerge and other CRDTs. What are your thoughts on Eg-Walker?

https://loro.dev/docs/concepts/event_graph_walker

josephg•5mo ago
I invented it, so personally I like it very much.

The big benefit of eg-walker is that you don't need to load any history from disk to be able to do collaborative editing. There's no need to keep around and load the whole history of a document to be able to merge changes and send edits to other peers. It's also much faster in most editing situations, though modern optimizations mean text-based CRDTs are crazy fast now anyway.

The downside is that eg-walker is more complex to implement. Compare this "from scratch" traditional CRDT implementation of FugueMax:

https://github.com/josephg/crdt-from-scratch/blob/master/crd...

With the same ordering algorithm implemented on top of egwalker:

https://github.com/josephg/egwalker-from-scratch/blob/master...

Eg-walker takes about twice as much code: in this case, ~600 lines instead of 300. It's more complex, but it's not crazy. It also embeds a traditional CRDT inside the algorithm. If you want to understand eg-walker, you should start with FugueMax anyway.

apt-apt-apt-apt•5mo ago
Haha, if the author's note had been at the bottom, I think this would have been even funnier!
tekkk•5mo ago
Wow. Didn't know there were these CRDT examples for mere mortals. I suppose once you put Rust in the mix, heads start to explode, mine included. Cool!
josephg•5mo ago
The thing that can make real-world text CRDT implementations complex is that the optimisations kinda bleed into all the rest of your code. The 2 big optimisations you want for most text CRDTs, including egwalker, are:

- Use a b-tree instead of an array to store data.

- Use internal run-length encoding. Humans usually type in runs of characters, so store runs of operations instead of individual operations. (E.g., {insert "abc", pos 0} instead of [{insert "a", pos 0}, {insert "b", pos 1}, {insert "c", pos 2}].)
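A minimal sketch of the run-length encoding in the second bullet, assuming operations are bare (position, text) inserts with no IDs, origins or deletes; `InsertRun` and `compress` are hypothetical names. An insert is merged into the previous run only when it starts exactly where that run ends.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InsertRun:
    pos: int  # document position of the run's first character
    content: str  # characters inserted contiguously, in typing order


def compress(ops: List[InsertRun]) -> List[InsertRun]:
    """Merge each insert into the previous run if it starts where that run ends."""
    runs: List[InsertRun] = []
    for op in ops:
        last = runs[-1] if runs else None
        if last is not None and op.pos == last.pos + len(last.content):
            last.content += op.content  # extend our own copy of the current run
        else:
            runs.append(InsertRun(op.pos, op.content))  # copy, so inputs stay untouched
    return runs


typed = [InsertRun(0, "a"), InsertRun(1, "b"), InsertRun(2, "c"), InsertRun(0, "X")]
print(compress(typed))
# [InsertRun(pos=0, content='abc'), InsertRun(pos=0, content='X')]
```

Real implementations also have to merge the CRDT metadata (IDs, origins) alongside the text, which this sketch ignores.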

But these two ideas also affect one another. It's not enough to just use a b-tree. You need a b-tree which also stores runs. And you also need to be able to insert in the middle of a run. And so on. You need some custom collections.
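The bookkeeping for inserting into the middle of a run might look like the following sketch. It assumes a plain Python list of runs rather than a b-tree and ignores CRDT ordering rules entirely. Each run (a hypothetical `Run` type) covers a contiguous range of IDs from one agent, so splitting it only means offsetting the start ID of the right half; `insert_at` is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    agent: str
    seq: int  # ID of the first character; character k has ID (agent, seq + k)
    text: str


def insert_at(runs: List[Run], pos: int, new: Run) -> None:
    """Insert `new` at document position `pos` (0 <= pos <= length), splitting a run if needed."""
    offset = 0
    for i, run in enumerate(runs):
        if pos <= offset + len(run.text):
            cut = pos - offset
            if cut == 0:
                runs.insert(i, new)
            elif cut == len(run.text):
                runs.insert(i + 1, new)
            else:
                # Split: the left half stays in place; the right half's IDs start at seq + cut.
                right = Run(run.agent, run.seq + cut, run.text[cut:])
                run.text = run.text[:cut]
                runs[i + 1:i + 1] = [new, right]
            return
        offset += len(run.text)
    runs.append(new)  # inserting at the very end (or into an empty document)


doc = [Run("A", 0, "hello")]
insert_at(doc, 2, Run("B", 0, "XY"))
print([(r.agent, r.seq, r.text) for r in doc])
# [('A', 0, 'he'), ('B', 0, 'XY'), ('A', 2, 'llo')]
```

Because the right half keeps contiguous IDs, both halves are still valid runs and can be merged back together later if nothing ends up between them.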

If you do run-length encoding properly, all iteration throughout your code needs to make use of the compressed runs. If any part of the code works character-by-character, it'll become a bottleneck. Oh, and did I mention that it works even better if you use columnar encoding, and break the data up into a bunch of small arrays? Yeahhhh.

So that's why diamond types, my optimized egwalker implementation, is tens of thousands of lines of code instead of a few hundred. (Though in my defence, it also includes custom binary serialization, testing, wasm bindings, and so on.)

Rust makes this way easier to implement thanks to traits. I have simple traits for data that can be losslessly compressed into runs[1]. A whole bunch of code takes advantage of that by providing tooling that works with a wide variety of actual data. For example, I have a custom vec wrapper that automatically compresses items when you call push(). I have a "zip" iterator which glues together other iterators over run-length encoded data. And so on. It's great.
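A rough Python analogue of the trait-based approach, using `typing.Protocol` where Rust would use a trait. `Mergeable`, `IdSpan` and `RleVec` are hypothetical names for this sketch and are not claimed to match the actual `MergableSpan` trait in diamond-types. The vec wrapper tries to merge each pushed item into the last stored one before falling back to appending it.

```python
from dataclasses import dataclass
from typing import Generic, List, Protocol, TypeVar

T = TypeVar("T", bound="Mergeable")


class Mergeable(Protocol):
    """Hypothetical stand-in for a 'can be losslessly merged into runs' trait."""

    def can_append(self, other) -> bool: ...

    def append(self, other) -> None: ...


@dataclass
class IdSpan:
    agent: str
    start: int  # first sequence number in the span
    length: int

    def can_append(self, other: "IdSpan") -> bool:
        # Mergeable only if `other` continues this span's ID range exactly.
        return self.agent == other.agent and self.start + self.length == other.start

    def append(self, other: "IdSpan") -> None:
        self.length += other.length


class RleVec(Generic[T]):
    """A list wrapper that compresses on push by merging into the last item."""

    def __init__(self) -> None:
        self.items: List[T] = []

    def push(self, item: T) -> None:
        if self.items and self.items[-1].can_append(item):
            self.items[-1].append(item)  # merge into the last run in place
        else:
            self.items.append(item)


ids: RleVec[IdSpan] = RleVec()
for seq in range(5):
    ids.push(IdSpan("A", seq, 1))
ids.push(IdSpan("B", 0, 1))
print(ids.items)
# [IdSpan(agent='A', start=0, length=5), IdSpan(agent='B', start=0, length=1)]
```

Because the merge logic lives on the item type, the same `RleVec` works unchanged for any span type that provides the two methods.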

Though now that I think about it, maybe all that trait-fu is what makes it headache-inducing. I swear it's worth it.

[1] Eg MergableSpan: https://github.com/josephg/diamond-types/blob/00f722d6ebdc9f...

tekkk•5mo ago
Jeez. Thanks for the breakdown. I suppose the layering of different, complicated patterns makes it too thick to parse. And some of the CRDT APIs leak quite a bit of complexity once you want to do something a little more complicated, e.g. wrapping rich text editor content.
josephg•5mo ago
I put at least some of the blame on text editors themselves. Many text editors, particularly on the web, don't expose a clean event-based API telling you what changed. It's very annoying.
yladiz•5mo ago
I'm starting to learn more about RGAs and CRDTs in general, so I'm not sure if this makes sense, but you mention replacing the sequence number with the "right parent" (`originRight` in your code?). So you mean replacing the Lamport timestamp for the node/operation with a pointer to the element adjacent to the right, correct? One alternative way to approach it that comes to mind is to introduce transaction semantics so that you can consider a node to be identified by a [Lamport timestamp, site ID, transaction sequence] and the parent, and use a sequence number within the transaction to sort, but it seems like it would add additional data and complexity compared to the "right parent" approach, so it might not be ideal, and it may fall victim to the same downside as the original RGA.
josephg•5mo ago
> so you mean replacing the Lamport timestamp for the node/operation with a pointer to the element adjacent to the right, correct?

Yeah, that's right. It's a GUID, because they need to be sent over the wire. For text editing, we usually use {site ID, transaction sequence} because they compress better than random IDs.

> One alternative way to approach it that comes to mind is to introduce transaction semantics so that you can consider a node to be identified by a [Lamport timestamp, site ID, transaction sequence] and the parent, and use a sequence number within the transaction to sort, ...

Maybe? I don't fully understand what you mean. And even if I did, I'm not clever enough to infer all the implications of that construction. But yes, I suspect you're right that in the best case, it would be equivalent to fuguemax. And in the worst case, it would introduce new bugs.

archagon•5mo ago
> If you ask me, this is a mistake. The algorithm is simpler if you primarily think of it as inserts into a list. (Thanks Kevin Jahns for this insight!)

Can you elaborate? This sounds a little tautological, so I must be missing something.

josephg•5mo ago
The way RGA and Fugue are usually described, all the inserted items form a tree. In RGA, each item has {parent: ID, seq: number}. The item is inserted as a child of its specified parent. Fugue is a little more complex because items specify 2 parent IDs. You can store this as a tree where every item has left children and right children.

But if you actually implement your CRDT like this, you'll find the tree is incredibly unbalanced. You'll end up with runs of thousands of items where you have (x)->(y)->(z)->(q) and so on. It resembles a linked list more than anything. Performance is abysmal as a result. This is one of the causes of the terrible performance of early versions of Automerge.
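A sketch of the tree form described above, with hypothetical names: every item is a child of its parent, siblings are ordered with the larger `(seq, agent)` pair first, and the text is read off with a pre-order walk. When one user types left to right, each character's parent is the previous character, so the "tree" is a single chain whose depth equals the number of characters typed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ID = Tuple[str, int]  # (agent name, per-agent counter)


@dataclass
class Node:
    id: Optional[ID]
    content: str
    seq: int  # Lamport timestamp
    children: List["Node"] = field(default_factory=list)


class RgaTree:
    def __init__(self) -> None:
        self.root = Node(None, "", 0)  # stands for the document start
        self.nodes: Dict[Optional[ID], Node] = {None: self.root}

    def insert(self, item_id: ID, content: str, parent: Optional[ID], seq: int) -> None:
        node = Node(item_id, content, seq)
        siblings = self.nodes[parent].children
        siblings.append(node)
        # Siblings with the larger (seq, agent) pair come first.
        siblings.sort(key=lambda n: (n.seq, n.id[0]), reverse=True)
        self.nodes[item_id] = node

    def text(self) -> str:
        # Iterative pre-order walk: recursion would hit Python's limit on long chains.
        out: List[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            out.append(node.content)
            stack.extend(reversed(node.children))
        return "".join(out)

    def depth(self) -> int:
        best, stack = 0, [(self.root, 0)]
        while stack:
            node, d = stack.pop()
            best = max(best, d)
            stack.extend((child, d + 1) for child in node.children)
        return best


tree = RgaTree()
prev: Optional[ID] = None
for i, ch in enumerate("hello world"):
    tree.insert(("A", i), ch, prev, i + 1)  # each character's parent is the one before it
    prev = ("A", i)
print(tree.text(), tree.depth())  # hello world 11
```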

Here's the trick: flatten the tree. Store all items in a list instead, in the order all the items show up in the document. But this presents a new problem: how do you correctly handle inserts? We need to insert new items in the list in the correct location, as if we inserted into a tree then flattened it afterwards. But it turns out that this translation is quite simple in practice. It's about 10-20 lines of code.
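The flattening claim can be checked with a small randomized test: build the RGA tree explicitly and walk it, then separately insert the same operations straight into a flat list, and compare the two orders. Everything here is a hypothetical sketch: operations are `(id, parent, seq)` tuples generated in causal order (a parent always before its children), `seq` is a Lamport-style clock that is unique per agent, and the list rule is a straightforward RGA scan rather than the reference code.

```python
import random
from typing import Dict, List, Optional, Tuple

ID = Tuple[str, int]
Op = Tuple[ID, Optional[ID], int]  # (id, parent id or None, Lamport seq)


def priority(op: Op) -> Tuple[int, str]:
    """Among siblings, the larger (seq, agent) pair comes first."""
    return (op[2], op[0][0])


def flatten_tree(ops: List[Op]) -> List[ID]:
    """Build the RGA tree explicitly, then read it off with a pre-order walk."""
    children: Dict[Optional[ID], List[Op]] = {None: []}
    for op in ops:
        children[op[0]] = []
        children[op[1]].append(op)
    out: List[ID] = []
    stack = sorted(children[None], key=priority)  # highest priority ends up on top
    while stack:
        op = stack.pop()
        out.append(op[0])
        stack.extend(sorted(children[op[0]], key=priority))
    return out


def integrate_list(doc: List[Op], new: Op) -> None:
    """Insert straight into the flattened list, never building a tree."""
    index = {op[0]: i for i, op in enumerate(doc)}
    parent_idx = -1 if new[1] is None else index[new[1]]
    dest = parent_idx + 1
    while dest < len(doc):
        other = doc[dest]
        other_parent_idx = -1 if other[1] is None else index[other[1]]
        if other_parent_idx < parent_idx:
            break  # left the parent's subtree
        if other_parent_idx == parent_idx and priority(new) > priority(other):
            break  # lower-priority sibling: go in front of it
        dest += 1
    doc.insert(dest, new)


def random_ops(n: int, rng: random.Random) -> List[Op]:
    """Random ops in causal order: each parent is an earlier op or the document start."""
    ops: List[Op] = []
    last_seq = {"A": 0, "B": 0, "C": 0}
    for _ in range(n):
        agent = rng.choice("ABC")
        parent = rng.choice(ops) if ops and rng.random() < 0.9 else None
        # Lamport-style clock: above the parent's seq and this agent's previous seq.
        seq = max(parent[2] if parent else 0, last_seq[agent]) + 1
        last_seq[agent] = seq
        ops.append(((agent, seq), parent[0] if parent else None, seq))
    return ops


rng = random.Random(1)
for trial in range(200):
    ops = random_ops(30, rng)
    doc: List[Op] = []
    for op in ops:
        integrate_list(doc, op)
    assert [op[0] for op in doc] == flatten_tree(ops), f"mismatch in trial {trial}"
print("list insertion matched the tree walk in every trial")
```

The list version never materializes the tree: "is this item inside my parent's subtree?" is answered by comparing list indices of parents.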

Interestingly, the fugue paper first describes fugue (as a tree). Then it identifies & fixes a problem in the algorithm to produce fuguemax. If you do the list insertion order translation on both fugue and fuguemax, fugue ends up with an extra if() statement that causes this problem. If you remove that if statement, you get the (better) fuguemax algorithm.

This transformation results in much better performance and much lower memory usage. Counter-intuitively, you get another order-of-magnitude performance improvement if you then store this flattened list once more in a b-tree.
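A full b-tree is too long to show here, but this two-level chunked list shows the basic shape of storing the flattened list in small leaves. It is a hypothetical `ChunkedList`, far simpler than a real b-tree (no internal nodes, no cached subtree sizes): items live in small leaf arrays, so an insert only shifts items within one leaf, and an overfull leaf is split in half.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ChunkedList(Generic[T]):
    """Toy two-level stand-in for a b-tree: a list of small leaf arrays."""

    def __init__(self, max_chunk: int = 64) -> None:
        self.max_chunk = max_chunk
        self.chunks: List[List[T]] = [[]]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def insert(self, pos: int, item: T) -> None:
        # Walk the leaves to find the one containing `pos`. A b-tree used for
        # position lookup would instead descend through nodes caching subtree sizes.
        for ci, chunk in enumerate(self.chunks):
            if pos <= len(chunk):
                chunk.insert(pos, item)  # only shifts items within this leaf
                if len(chunk) > self.max_chunk:
                    half = len(chunk) // 2
                    self.chunks[ci:ci + 1] = [chunk[:half], chunk[half:]]
                return
            pos -= len(chunk)
        raise IndexError("insert position out of range")

    def __iter__(self) -> Iterator[T]:
        for chunk in self.chunks:
            yield from chunk


cl: ChunkedList[str] = ChunkedList(max_chunk=4)
for i, ch in enumerate("abcdefghij"):
    cl.insert(i, ch)
cl.insert(0, "X")
print("".join(cl))  # Xabcdefghij
```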

If you're curious, here's the equivalent insertion code for fuguemax, rga and yjs. These are all fuzz tested against their upstream reference implementations to verify equivalence. Fugue is also somewhere in this file, if you want to compare.

Here's FugueMax[1]: https://github.com/josephg/reference-crdts/blob/c53947408770...

RGA: https://github.com/josephg/reference-crdts/blob/c53947408770...

And Yjs: https://github.com/josephg/reference-crdts/blob/c53947408770...

As I said, I didn't come up with this idea. Kevin Jahns figured out this trick for Yjs. I adapted it to the other algorithms.

[1] Fuguemax is called "yjsmod" in this repository because this code predates the fugue paper. It turns out our algorithms are equivalent.