"What's inside .git ?" - https://prakharpratyush.com/blog/7/
I think that, theoretically, Git's delta compression is still a lot more optimized for smaller repos. But for bigger repos where sharded storage is required, path-based delta dictionary compression does much better. Git recently (within the last year) got something called "path-walk", which is fairly similar, though.
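To make the "path-based" idea concrete, here is a minimal sketch, assuming file versions arrive as (path, bytes) pairs in history order: each new version of a file is compressed using the previous version of the same path as a preset dictionary. To stay within Python's standard library it uses zlib's `zdict` parameter as a stand-in for zstd dictionaries; the function names are hypothetical, and this is not how Git (or any particular tool) actually packs objects.

```python
import zlib

def compress_by_path(versions):
    """Compress (path, data) pairs given in history order.

    Each blob is deflated with the previous version of the *same path*
    as a preset dictionary, so unchanged regions become cheap back-references.
    """
    previous = {}  # path -> last uncompressed version seen for that path
    out = []
    for path, data in versions:
        base = previous.get(path)
        comp = zlib.compressobj(level=9, zdict=base) if base else zlib.compressobj(level=9)
        out.append((path, comp.compress(data) + comp.flush()))
        previous[path] = data
    return out

def decompress_by_path(compressed):
    """Inverse of compress_by_path; replays the same order to rebuild each dictionary."""
    previous = {}
    out = []
    for path, blob in compressed:
        base = previous.get(path)
        decomp = zlib.decompressobj(zdict=base) if base else zlib.decompressobj()
        data = decomp.decompress(blob) + decomp.flush()
        out.append((path, data))
        previous[path] = data
    return out
```

One limit of this stand-in: deflate's window is 32 KiB, so only the tail of a larger base version is usable as a dictionary, whereas zstd supports much larger windows and dictionaries.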
For others, I highly recommend Git from the Bottom Up[1]. It is a very well-written piece on internal data structures and does a great job of demystifying the opaque git commands that most beginners blindly follow. Best thing you'll learn in 20ish minutes.
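As a small taste of those data structures, here is a minimal sketch of how a loose object is addressed and stored, assuming a repository using the default SHA-1 object format: the object ID is the SHA-1 of a `<type> <size>\0` header followed by the content, and the zlib-compressed bytes live under `.git/objects/`, in a directory named after the first two hex digits. The helper names are hypothetical.

```python
import hashlib
import zlib

def git_object(kind: str, content: bytes) -> tuple[str, bytes]:
    """Return (object_id, compressed_bytes) for a loose object of the given kind."""
    # Git hashes a header "<kind> <size in bytes>\0" followed by the raw content.
    store = f"{kind} {len(content)}\0".encode() + content
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects are stored zlib-compressed, header included.
    return oid, zlib.compress(store)

def loose_path(oid: str) -> str:
    """Path of a loose object: the first two hex digits name the directory."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"
```

For example, `git_object("blob", b"test content\n")` yields the ID d670460b4b4aece5915caf5c68d12f560a9fe3e4, the same ID `echo 'test content' | git hash-object --stdin` prints.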
It ends up being circular if the author used LLM help for this write-up, though there are no obvious signs of that.
Great argument for not using AI-assisted tools to write blog posts (especially if you DO use these tools). I wonder how much we're taking for granted in these early phases before it starts to eat itself.
Maybe that's obvious to most people, but it was a bit surprising to see it myself. It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.
The article doesn't contain any LLM output. I use LLMs to ask for advice on coding conventions (especially in Rust, since I'm bad at it), and sometimes as part of research (zstd was suggested by ChatGPT, along with comparisons to similar algorithms).
One of the funniest things I've started to notice from Gemini in particular is that, in random situations, it speaks English with an agreeable affect that I can only describe as... Indian? I've never noticed anything like that leak through before. There must be a ton of people in India generating new datasets for training.
Why not tvc-hub :P
Jokes aside, great write up!
P.S. I didn't know that a plain '@' can be used instead of HEAD, but I guess it makes sense, since you can omit both the left and right parts of expressions separated by '@'.
And this way of versioning can be reused in other fields: as soon as you have some kind of graph of data that can be modified independently but read all together, it makes sense.
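A minimal sketch of that idea applied to something other than source code, assuming JSON-serializable snapshots: content-addressed objects, plus commits that point at a snapshot and at their parents. `VersionStore` and its methods are hypothetical names, SHA-256 over sorted-key JSON is an arbitrary choice, and unlike Git a commit here carries no author or timestamp.

```python
import hashlib
import json

class VersionStore:
    """Toy content-addressed version graph for arbitrary JSON data."""

    def __init__(self):
        self.objects = {}  # object id -> serialized bytes

    def _put(self, obj) -> str:
        # Sorted keys make the serialization (and thus the id) deterministic.
        data = json.dumps(obj, sort_keys=True).encode()
        oid = hashlib.sha256(data).hexdigest()
        self.objects[oid] = data  # identical content maps to the same id
        return oid

    def _get(self, oid):
        return json.loads(self.objects[oid])

    def commit(self, snapshot, parents, message) -> str:
        tree = self._put(snapshot)
        return self._put({"tree": tree, "parents": list(parents), "message": message})

    def checkout(self, commit_id):
        return self._get(self._get(commit_id)["tree"])

    def ancestors(self, commit_id) -> set:
        """All commits reachable from commit_id, including itself."""
        seen, stack = set(), [commit_id]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self._get(c)["parents"])
        return seen
```

Branches are just commits sharing a parent, a merge is a commit with two parents, and a snapshot identical to an earlier one is stored only once.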
Bookmarked for later
That's a weird thing to put so close to the start. Compression is about the least interesting aspect of Git's design.
kgeist•1h ago
How about using SQLite for this? Then you wouldn't need to parse anything, just read/update tables. Fast indexing out of the box, too.
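One way that could look, as a sketch only: one SQLite table holding Git-style content-addressed objects and another holding refs, using Python's built-in `sqlite3` module. The schema, table names, and helper functions are hypothetical; hashing follows Git's SHA-1 `<type> <size>\0` header convention so blob IDs match what `git hash-object` would print. (Fossil does keep its repository in an SQLite database.)

```python
import hashlib
import sqlite3
import zlib

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Primary keys give indexed lookups by object id and ref name.
    db.execute("CREATE TABLE IF NOT EXISTS objects "
               "(oid TEXT PRIMARY KEY, kind TEXT NOT NULL, data BLOB NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS refs "
               "(name TEXT PRIMARY KEY, oid TEXT NOT NULL)")
    return db

def put_object(db: sqlite3.Connection, kind: str, content: bytes) -> str:
    oid = hashlib.sha1(f"{kind} {len(content)}\0".encode() + content).hexdigest()
    with db:  # commit the transaction on success
        # Content addressing: storing an existing object again is a no-op.
        db.execute("INSERT OR IGNORE INTO objects VALUES (?, ?, ?)",
                   (oid, kind, zlib.compress(content)))
    return oid

def get_object(db: sqlite3.Connection, oid: str) -> tuple[str, bytes]:
    row = db.execute("SELECT kind, data FROM objects WHERE oid = ?", (oid,)).fetchone()
    if row is None:
        raise KeyError(oid)
    return row[0], zlib.decompress(row[1])

def set_ref(db: sqlite3.Connection, name: str, oid: str) -> None:
    with db:
        db.execute("INSERT OR REPLACE INTO refs (name, oid) VALUES (?, ?)", (name, oid))
```

Moving a branch is then a single-row update inside a transaction rather than a file rewrite, and lookups by `oid` go through the primary-key index.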
grenran•1h ago
TonyStr•1h ago
[0] https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...
embedding-shape•1h ago
I really enjoy how local-first it is, as someone who sometimes works without an internet connection. That the data around the "work" is part of the SCM as well, not just the code, makes a lot of sense to me at a high level, and many times I wish git worked the same way...
usrbinbash•1h ago
But yeah, fossil is interesting, and it's a crying shame it's not better known, for exactly the reasons you point out.
embedding-shape•31m ago
It isn't, though. Fossil integrates all the data around the code into the "repository" too, so issues, wiki, documentation, notes and so on are all together, unlike git, where you most commonly have those things on another platform, or use something like `git notes`, which has maybe 10% of the features of the corresponding Fossil feature.
It might be useful to scan through the list of features of Fossil and dig into it, because it does a lot more than you seem to think :) https://fossil-scm.org/home/doc/trunk/www/index.wiki
smartmic•58m ago
[0]: https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md
graemep•24m ago
It is very easy to self host.
Not having staging is awkward at first but works well once you get used to it.
I prefer it for personal projects. I think it's better for small teams if people are willing to adjust, but I have not had enough opportunities to try it.