1. I can see there's an example of using it with React and ProseMirror. What's the gap to using it with Tiptap (for those who don't know, it's an abstraction on top of ProseMirror that aims to streamline the task of building editors)?
2. Is there any prior art, or room in the design, for supporting permissioned blocks of content _within_ a document? i.e. things which some users aren't allowed to view (or edit)
I've been using ElectricSQL, but Automerge 3.0 seems like the holy grail, combining a local-first approach with CRDTs?
Wondering if I should ditch ElectricSQL and switch to this instead. I'm just not sure what kind of hardware I'd need to run a sync server for Automerge, or how many users and reads/writes it can support.
ElectricSQL is pretty good too, but it's still not quite there, and implementing local-first means some features related to rollback are harder to apply.
I'm still very new to this overall, but that 10x memory improvement is welcome, as I find the lag used to be very noticeable with very large documents.
> In Automerge 3.0, we've rearchitected the library so that it also uses the compressed representation at runtime. This has achieved huge memory savings. For example, pasting Moby Dick into an Automerge 2 document consumes 700Mb of memory, in Automerge 3 it only consumes 1.3Mb!
> Finally, for documents with large histories load times can be much much faster (we recently had an example of a document which hadn't loaded after 17 hours loading in 9 seconds!).
I'm well above 1.3MB, and although I could get it down there, performance would suffer. I'm curious how fast they can sync this data with such tiny memory usage. And back when the resources were available, despite using 700MB of memory, was it at least faster?
These people are definitely smarter than I am so maybe their solution is a lot more clever than what I'm doing
edit: Oh, they did this part in Rust; I thought it was written in JS. I still wonder: how'd they get memory usage this low, and did it impact speed much? I'll have to dig into it.
Is it OSS? I'd like to benchmark it against my CSV parser :)
Your parser is almost certainly better and faster :) Mine is tailored to a certain schema with specific expectations about foreign keys (well, the concept and artificial enforcement of them) across the documents. This is actually why I've been thinking about using DuckDB for this project; it'll allow me to pack the data into the DB under multiple schemas with real keys and some primitive type-level constraints. Analysis after that would be sooo much cleaner and faster.
The parsing itself is done with the streams API and orchestrated by a state chart (XState), and while the memory management and concurrency of the whole system is really nice and I'm happy with it, I'm probably making tons of mistakes and trading program efficiency for developer comforts here and there.
The state chart essentially does some grouping operations to pull event data from multiple CSVs; once it has those events, it stitches them together into smaller portions and makes sure the tables map to one another by the event's ID. It's nice because the grouping happens over one enormous file, and it carves out these groups for the state chart to then organize, validate, and store in parallel. You can configure how much it'll do in parallel, but only because we've got some funny practices here and it's a safety precaution to prevent tying up too many resources on a massive kitchen-sink server on AWS. Haha. So, lots of non-parsing-specific design considerations are baked in.
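The grouping step is roughly the sketch below. It's a simplification with made-up names (event_id, groupByEventId); the real thing is driven by the XState machine and a proper CSV parser that handles quoting.

    type Row = Record<string, string>;

    // Simplified sketch: stream CSV text, split it into lines, and bucket rows
    // by an assumed "event_id" column. The real pipeline hands each bucket to
    // the state chart to organize, validate, and store in parallel.
    async function groupByEventId(csv: ReadableStream<string>): Promise<Map<string, Row[]>> {
      const groups = new Map<string, Row[]>();
      let header: string[] | null = null;

      const handleLine = (line: string) => {
        if (!line.trim()) return;
        const cells = line.split(","); // naive split; the real parser handles quoting/escapes
        if (!header) { header = cells; return; }
        const row: Row = Object.fromEntries(header.map((h, i) => [h, cells[i] ?? ""]));
        const id = row["event_id"];
        if (!groups.has(id)) groups.set(id, []);
        groups.get(id)!.push(row);
      };

      let buffer = "";
      const reader = csv.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += value;
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
        lines.forEach(handleLine);
      }
      handleLine(buffer); // flush the last line if the file doesn't end with a newline

      return groups;
    }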
One day I'll shift this off the giga-server and let it run in isolation with whatever resources it needs, but for now it's baby steps and compromises.
If you want local-first application data where a server is the authority, ElectricSQL is probably going to serve you best.
That said, there are so many approaches out there right now; they're all promising, but tricky.
> there are so many approaches out there right now
I'm almost to the point where I'll need one of these solutions; I'm fleshing out the corner cases now. I'd appreciate it if you could mention some of the solutions I should be looking at and their trade-offs, as well as any non-obvious pitfalls. The use case is a voice note aggregation system: the notes are stored on S3 and cached locally to desktop and mobile applications. There are transcriptions, AI summaries, user annotations, and structured metadata associated with each voice note. The application will be used by a single human, but he might not always remember to sync, or even have an internet connection when he wants to.
Thank you!
I don't know much about Automerge or other local-first solutions, but since there's only a single user (so genuinely concurrent edits are rare), a local-first solution that doesn't deal with CRDTs is likely a much better fit for you.
Show HN: Pg_CRDT – CRDTs in Postgres Using Automerge - https://news.ycombinator.com/item?id=43655920 - April 2025 (4 comments)
Automerge: A library of data structures for building collaborative applications - https://news.ycombinator.com/item?id=40976731 - July 2024 (58 comments)
Automerge-Repo: A "batteries-included" toolkit for local-first applications - https://news.ycombinator.com/item?id=38193640 - Nov 2023 (43 comments)
Automerge 2.0 - https://news.ycombinator.com/item?id=34586433 - Jan 2023 (89 comments)
Automerge CRDT – Build local-first software - https://news.ycombinator.com/item?id=30881016 - April 2022 (8 comments)
Automerge: A JSON-like data structure (a CRDT) that can be modified concurrently - https://news.ycombinator.com/item?id=30412550 - Feb 2022 (69 comments)
Automerge: a new foundation for collaboration software [video] - https://news.ycombinator.com/item?id=29501465 - Dec 2021 (29 comments)
Automerge: A library [..] for building collaborative applications in JavaScript - https://news.ycombinator.com/item?id=24791713 - Oct 2020 (1 comment)
Automerge: JSON-like data structure for building collaborative apps - https://news.ycombinator.com/item?id=16309533 - Feb 2018 (98 comments)
Is the map based on a multi-value register or a last-writer-wins register?
From the docs:
> Automerge uses a combination of LWW (last writer wins) and multi-value register. By default, if you read from doc.foo you will get the LWW semantics, but you can also see the conflicts by calling Automerge.getConflicts(doc, 'foo') which has multi-value semantics.
> Note that "last writer wins" here is based on the internal ID of the opeartion [sic], not a wall clock time. The internal ID is a unique operation ID that is the combination of a counter and the actorId that generated it. Conflicts are ordered based on the counter first (using the actorId only to break ties when operations have the same counter value).
Seems like they use LWW with Lamport clocks to order operations and a unique ID for each client as a tie-breaker.
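For concreteness, here's a minimal sketch of those two read paths using the @automerge/automerge JS package (the document shape and values are made up): two actors set the same key concurrently, plain reads on the merged doc return the LWW winner, and getConflicts exposes every concurrent write.

    import * as Automerge from "@automerge/automerge";

    type Doc = { foo?: string };

    // Two documents with different actor IDs set the same key concurrently.
    const a = Automerge.change(Automerge.init<Doc>(), d => { d.foo = "from actor A"; });
    const b = Automerge.change(Automerge.init<Doc>(), d => { d.foo = "from actor B"; });

    const merged = Automerge.merge(a, b);

    console.log(merged.foo);                            // LWW winner, chosen by op ID, not wall-clock time
    console.log(Automerge.getConflicts(merged, "foo")); // multi-value view: both writes, keyed by op ID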
I have implemented a POC sync mechanism via a central server, and I believe it's simpler because it takes advantage of certain assumptions about the app. I've yet to productionize it, so I'm interested in knowing whether my understanding is correct, or whether there are other existing solutions for this use case.
I'd love some performance benchmarks.