The goal should be to build a full spec and then build a code forge and ecosystem around this. If it’s truly great, adoption will come. Microsoft doing a terrible job with GitHub is great for new solutions.
> ... CRDTs for version control, which is long overdue but hasn’t happened yet
Pijul happened and it has hundreds - perhaps thousands - of hours of real expert developers' toil put into it.
Not that Bram is not one of those, but the post reads like you all know what.
... and of course it is, because Pijul uses Pijul for development, not Git and GitHub!
I'm surprised! Pijul has been discussed here on HN many, many times. My impression is that many people here were hoping that Pijul might eventually become a serious Git contender but these days people seem to be more excited about Jujutsu, likely because migration is much easier.
I assume the proposed system addresses it somehow but I don't see it in my quick read of this.
the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails.
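As a rough illustration of the "flag when they touch" rule, here is a minimal Python sketch. It assumes each side's changes have already been reduced to half-open line ranges against a common base. The `Edit` type and the `touches` / `touching_edits` helpers are hypothetical names made up for this example, not part of the proposed system or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """A change replacing base lines [start, end) (hypothetical model)."""
    start: int
    end: int
    author: str


def touches(a: Edit, b: Edit) -> bool:
    # Ranges "touch" when they overlap or are directly adjacent, so
    # [2, 4) and [4, 6) touch, while [2, 4) and [5, 6) do not.
    return a.start <= b.end and b.start <= a.end


def touching_edits(ours: list[Edit], theirs: list[Edit]) -> list[tuple[Edit, Edit]]:
    """Return every pair of edits from the two sides that touch.

    The merge itself is assumed to always succeed; this only decides
    which regions get presented to the user as conflicts.
    """
    return [(a, b) for a in ours for b in theirs if touches(a, b)]


ours = [Edit(2, 4, "alice"), Edit(10, 11, "alice")]
theirs = [Edit(4, 6, "bob"), Edit(20, 22, "bob")]
flagged = touching_edits(ours, theirs)
print(flagged)  # only the alice [2, 4) / bob [4, 6) pair is flagged
```

The point of the sketch is the separation of concerns: merging and conflict detection are independent steps, so the flagging rule can be as strict or as lenient as you like without ever making the merge fail.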
In the general case, such commits cannot be considered the same — consider a commit which flips a boolean that one branch had flipped in another file. But there are common cases where the commits should be considered equivalent, such as many rebased branches. Can the CRDT approach help with e.g. deciding that `git branch -d BRANCH` should succeed when a rebased version of BRANCH has been merged?
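For the common rebased-branch case, Git already has a textual notion of equivalence in `git patch-id` (used by `git cherry`). The Python sketch below is a simplified imitation of that idea, not Git's actual algorithm: it assumes git-style unified diffs with `diff --git` headers, and `patch_fingerprint` is a hypothetical helper name. It hashes only the target file name and the added/removed lines, ignoring line numbers, context, and whitespace.

```python
import hashlib


def patch_fingerprint(diff_text: str) -> str:
    """Fingerprint a git-style unified diff by its content changes only."""
    digest = hashlib.sha256()
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False  # a new file section begins
        elif line.startswith("@@"):
            in_hunk = True  # hunk header: line numbers are ignored
        elif not in_hunk and line.startswith("+++ "):
            digest.update(line.encode("utf-8") + b"\n")  # target file name
        elif in_hunk and line[:1] in ("+", "-"):
            # Strip all whitespace so re-indentation does not matter.
            digest.update("".join(line.split()).encode("utf-8") + b"\n")
    return digest.hexdigest()


original = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -10,3 +10,3 @@
 def main():
-    debug = False
+    debug = True
     run()
"""

# Same change after a rebase: the hunk has moved to a different line.
rebased = original.replace("@@ -10,3 +10,3 @@", "@@ -42,3 +42,3 @@")
# A genuinely different change to the same line.
other = original.replace("True", "None")

print(patch_fingerprint(original) == patch_fingerprint(rebased))
print(patch_fingerprint(original) == patch_fingerprint(other))
```

This only answers "is it the same textual change?", which is exactly the limitation raised above: the boolean-flip example would still fingerprint as equivalent even though the result differs semantically.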
It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.
Post-merge syntax checks are better for that purpose.
And imminently: agent-based sanity-checks of preserved intent – operating on a logically-whole result file, without merge-tool cruft. Perhaps at higher intensity when line-overlaps – or even more-meaningful hints of cross-purposes – are present.
That has not been my experience at all. The changes you introduce are your responsibility. If you synchronize your working tree with the source of truth, you need to evaluate your patch again to see whether it introduces a conflict. In this case a conflict is a nice signal showing where someone has interacted with files you've touched and possibly changed their semantics. The pros are substantial, and it's quite easy to resolve conflicts that are only due to syntactic changes (whitespace, formatting, equivalent statements, ...).
FWIW I've struggled to get AI tools to handle merge conflicts well (especially rebase) for the same underlying reason.
Technically you could include conflict markers in your commits but I don't think people like that very much
What I do think is the critical challenge (particularly with Git) is scalability.
Size of repository & rate of change of repositories are starting to push the limits of git, and I think this needs to be revisited across the server, client & wire protocols.
What exactly, I don't know. :) But I do know that in my current role (mid-size well-known tech company) we are hitting these limits today.
When I was screwing around with the Git file format, tricks I would use to save space like hard-linking or memory-mapping couldn't work, because data is always stored compressed after a header.
A general copy-on-write approach to save checkout space is presumably impossible, but I wonder what other people who have traveled down similar paths have concluded.
- What kind of problems do 1 person, 10 person, 100 person, 1k (etc) teams really run into with managing merge conflicts?
- What do teams of 1, 10, 100, 1k, etc care the most about?
- How does the modern "agent explosion" potentially affect this?
For example, my experience working in the 1-100 regime tells me that, for the most part, the kind of merge conflict being presented here is resolved by assigning subtrees of code to specific teams. For the large part, merge conflicts don't happen, because teams coordinate (in sprints) to make orthogonal changes, and long-running stale branches are discouraged.
However, if we start to mix in agents, a 100 person team could quickly jump into a 1000 person team, esp if each person is using subagents making micro commits.
It's an interesting idea definitely, but without real-world data, it kind of feels like this is just delivering a solution without a clear problem to assign it to. Like, yes merge-conflicts are a bummer, but they happen infrequently enough that it doesn't break your heart.
It's not the same as capturing it, but I would also note that there are a wide wide variety of ways to get 3-way merges / 3 way diffs from git too. One semi-recent submission (2022 discussing a 2017) discussed diff3 and has some excellent comments (https://news.ycombinator.com/item?id=31075608), including a fantastic incredibly wide ranging round up of merge tools (https://www.eseth.org/2020/mergetools.html).
However/alas git 2.35's (2022) fabulous zdiff3 doesn't seem to have had any big discussions. Other links welcome but perhaps https://neg4n.dev/blog/understanding-zealous-diff3-style-git...? It works excellently for me; enthusiastically recommended!
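For anyone who hasn't seen the diff3 presentation, here is a small Python sketch that renders one conflict region with the same marker layout Git uses for that style (`<<<<<<<`, `|||||||`, `=======`, `>>>>>>>`). The function name, labels, and sample lines are made up for illustration; this only formats a region and does not perform a merge.

```python
def render_diff3_conflict(
    ours: list[str],
    base: list[str],
    theirs: list[str],
    ours_label: str = "ours",
    base_label: str = "base",
    theirs_label: str = "theirs",
) -> str:
    """Format one conflict region with the common ancestor in the middle."""
    lines = [
        f"<<<<<<< {ours_label}",
        *ours,
        f"||||||| {base_label}",  # the section plain two-way markers leave out
        *base,
        "=======",
        *theirs,
        f">>>>>>> {theirs_label}",
    ]
    return "\n".join(lines)


conflict = render_diff3_conflict(
    ours=["x = compute(a, b, cache=True)"],
    base=["x = compute(a, b)"],
    theirs=["x = compute(a, b, timeout=5)"],
)
print(conflict)
```

With the base section visible it is obvious that each side added one keyword argument, which is usually enough to resolve the conflict by hand.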
No matter the tool, merges should always be presented like that. It's the only presentation that makes sense.
If you haven’t resolved conflicts then it probably doesn’t compile and of course tests won’t pass, so I don’t see any point in publishing that change? Maybe the commit is useful as a temporary state locally, but that seems of limited use?
Nowadays I’d ask a coding agent to figure out how to rebase a local branch to the latest published version before sending a pull request.
git config --global merge.conflictstyle diff3
to get something like what is shown in the article.
Well, isn't that what the CRDT does in its own data structure?
Also keep in mind that syntactic correctness doesn't mean functional correctness.
It's been amazing watching it grow over the last few years.
conflict free merging sounds cool, but doesn't that just mean that a human review step is replaced by "changes become intervals rather than collections of lines" and "last set of intervals always wins"? seems like it makes sense when the conflicts are resolved instantaneously during live editing but does it still make sense with one shot code merges over long intervals of time? today's systems are "get the patch right" and then "get the merge right"... can automatic intervalization be trusted?
Codeville also used a weave for storage and merge, a concept that originated with SCCS (and thence into Teamware and BitKeeper).
Codeville predates the introduction of CRDTs by almost a decade, and at least on the face of it the two concepts seem like a natural fit.
It was always kind of difficult to argue that weaves produced unambiguously better merge results (and more limited conflicts) than the more heuristically driven approaches of git, Mercurial, et al, because the edit histories required to produce test cases were difficult (at least for me) to reason about.
I like that Bram hasn’t let go of the problem, and is still trying out new ideas in the space.
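For readers unfamiliar with the weave idea mentioned above, here is a deliberately simplified Python model. It is not the SCCS file format or Codeville's implementation; `WeaveLine` and `extract` are hypothetical names. The assumption is that every line ever present is stored once, in order, tagged with the revision that added it and the revisions that deleted it.

```python
from dataclasses import dataclass, field


@dataclass
class WeaveLine:
    text: str
    added_in: int  # revision id that introduced this line
    deleted_in: set[int] = field(default_factory=set)  # revisions that removed it


def extract(weave: list[WeaveLine], included: set[int]) -> list[str]:
    """Return the lines visible when exactly these revisions are applied."""
    return [
        line.text
        for line in weave
        if line.added_in in included and not (line.deleted_in & included)
    ]


# Revision 1 creates a, b, c. Revision 2 replaces b with B.
# Revision 3, made independently of revision 2, appends d.
weave = [
    WeaveLine("a", 1),
    WeaveLine("b", 1, {2}),
    WeaveLine("B", 2),
    WeaveLine("c", 1),
    WeaveLine("d", 3),
]

print(extract(weave, {1}))
print(extract(weave, {1, 2}))
print(extract(weave, {1, 3}))
print(extract(weave, {1, 2, 3}))  # the merge of both branches
```

In this toy model a merge is just "include both sets of revisions", which hints at why weaves and CRDT-style merging look like a natural fit; the hard part the sketch skips is building the weave and ordering concurrent insertions.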