The Future of Large Files in Git Is Git

https://tylercipriani.com/blog/2025/08/15/git-lfs/
75•thcipriani•11h ago

Comments

matheusmoreira•9h ago
As it should be! If it's not native to git, it's not worth using. I'm glad these issues are finally being solved.

These new features are pretty awesome too. Especially separate large object remotes. They will probably enable git to be used for even more things than it's already being used for. They will enable new ways to work with git.

als0•9h ago
10 years late is better than never.
tombert•9h ago
Is Git ever going to get proper support for binary files?

I’ve never used it for anything serious but my understanding is that Mercurial handles binary files better? Like it supports binary diffs if I understand correctly.

Any reason Git couldn’t get that?

ks2048•9h ago
I'm not sure binary diffs are the problem - e.g. for storing images or MP3s, binary diffs are usually worse than nothing.
digikata•9h ago
I would think that git would need a parallel storage scheme for binaries. Something that does binary chunking and deduplication between revisions, but keeps the same merkle referencing scheme as everything else.
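To make the chunking-and-deduplication idea concrete, here is a minimal Python sketch. It is not how Git stores objects: the gear-style rolling hash, the mask, the `split_chunks` and `store` names, and the in-memory `objects` dict are all assumptions for illustration. It splits a file at content-defined boundaries and keys each chunk by its SHA-256, so a second revision only adds the chunks that actually changed.

```python
import hashlib
import random

# Hypothetical parameters: a fixed random table and a mask chosen so that
# a boundary occurs roughly once every 1024 bytes on random input.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]
MASK = 0xFFC00000  # top 10 bits of the 32-bit rolling hash


def split_chunks(data: bytes) -> list[bytes]:
    """Split data wherever the rolling hash of the recent bytes hits the mask."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left drops old bytes, so h depends only on the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(data: bytes, objects: dict[str, bytes]) -> list[str]:
    """Store each chunk under its SHA-256 and return the ordered list of chunk ids."""
    recipe = []
    for chunk in split_chunks(data):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        objects.setdefault(chunk_id, chunk)  # already-known chunks cost nothing
        recipe.append(chunk_id)
    return recipe


# Two revisions of a made-up binary file: the second has a few bytes inserted.
objects: dict[str, bytes] = {}
v1 = random.Random(1).randbytes(200_000)
v2 = v1[:100_000] + b"a small edit" + v1[100_000:]

recipe_v1 = store(v1, objects)
count_after_v1 = len(objects)
recipe_v2 = store(v2, objects)
new_chunks = len(objects) - count_after_v1
print(f"v2 has {len(recipe_v2)} chunks, {new_chunks} of them new")
```

Because each boundary depends only on nearby bytes, an insertion in the middle of the file disturbs the chunks around the edit and the rest are reused. The list of chunk ids plays the role of the Merkle-style reference mentioned above.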
tempay•8h ago
> binary chunking and deduplication

Are there many binaries that people would store in git where this would actually help? I assume most files end up with compression or some other form of randomization between revisions making deduplication futile.
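One way to check that assumption is a small experiment, sketched here in Python under stated assumptions: the synthetic `original` data, the 256-byte block size, and the helper names are made up for illustration, and zlib stands in for whatever compression a real file format uses. It compares how many fixed-size blocks survive a one-byte edit, before and after compression.

```python
import zlib


def blocks(data: bytes, size: int = 256) -> set[bytes]:
    """Cut data into fixed-size blocks and return the distinct ones."""
    return {data[i:i + size] for i in range(0, len(data), size)}


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new version's blocks that already exist in the old one."""
    new_blocks = blocks(new)
    return len(new_blocks & blocks(old)) / len(new_blocks)


# Made-up, compressible test data: one short text record per line.
original = "".join(f"record {i}: value {i * i}\n" for i in range(20000)).encode()
edited = b"X" + original[1:]  # change a single byte in place at the very start

raw = shared_fraction(original, edited)
packed = shared_fraction(zlib.compress(original), zlib.compress(edited))
print(f"raw blocks reused: {raw:.0%}, compressed blocks reused: {packed:.0%}")
```

On data like this, almost every raw block is reused, while the compressed stream tends to diverge right after the edit, which is the effect described above. Formats that compress in independent segments may behave better.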

hinkley•8h ago
It would likely require more tooling.
jauer•9h ago
TFA asserts that Git LFS is bad for several reasons, including that it is proprietary with vendor lock-in, which I don't think is a fair claim. GitHub provided an open client and server, which negates that.

LFS does break disconnected/offline/sneakernet operations which wasn't mentioned and is not awesome, but those are niche workflows. It sounds like that would also be broken with promisors.

The partial clone examples are cool!

The description of Large Object Promisors makes it sound like they take the client-side complexity in LFS, move it server-side, and then increase the complexity? Instead of the client uploading to a git server and to an LFS server, it uploads to a git server which in turn uploads to an object store, but the client will download directly from the object store? Obviously different tradeoffs there. I'm curious how often people will get bit by uploading to public git servers which upload to hidden promisor remotes.

IshKebab•9h ago
LFS is bad. The server implementations suck. It conflates object contents with the storage method. It's opt-in, in a terrible way - if you do the obvious thing you get tiny text files instead of the files you actually want.

I dunno if their solution is any better but it's fairly unarguable that LFS is bad.

cma•8h ago
Git LFS didn't work over SSH; you had to get an SSL cert, which GitHub knew was a barrier for people self-hosting at home. I think GitLab finally got it patched for SSH, though.
glitchc•9h ago
No. This is not a solution.

While git LFS is just a kludge for now, writing a filter argument during the clone operation is not the long-term solution either.

Git clone is the very first command most people will run when learning how to use git. Emphasized for effect: the very first command.

Will they remember to write the filter? Maybe, if the tutorial to the cool codebase they're trying to access mentions it. Maybe not. What happens if they don't? It may take a long time without any obvious indication. And if they do? The cloned repo might not be compilable/usable since the blobs are missing.

Say they do get it right. Will they understand it? Most likely not. We are exposing the inner workings of git on the very first command they learn. What's a blob? Why do I need to filter on it? Where are blobs stored? It's classic abstraction leakage.

This is a solved problem: Rsync does it. Just port the bloody implementation over. It does mean supporting alternative representations or moving away from blobs altogether, which git maintainers seem unwilling to do.

IshKebab•9h ago
I totally agree. This follows a long tradition of Git "fixing" things by adding a flag that 99% of users won't ever discover. They never fix the defaults.

And yes, you can fix defaults without breaking backwards compatibility.

Jenk•9h ago
> They never fix the defaults

Not strictly true. They did change the default push behaviour from "matching" to "simple" in Git 2.0.

hinkley•8h ago
So what was the second time the stopped watch was right?

I agree with GP. The git community is very fond of doing checkbox fixes for team problems that aren’t or can’t be set as defaults and so require constant user intervention to work. See also some of the sparse checkout systems and adding notes to commits after the fact. They only work if you turn every pull and push into a flurry of activity, which means they will never work from your IDE. Those are non-fixes that pollute the space for actual fixes.

ks2048•9h ago
> This is a solved problem: Rsync does it.

Can you explain what the solution is? I don't mean the details of the rsync algorithm, but rather what it would look like from the users' perspective. What files are on your local filesystem when you do a "git clone"?

hinkley•8h ago
When you do a shallow clone, no files would be present. However, when doing a full clone you’ll get a full copy of each version of each blob, and what is being suggested is to treat each revision as an rsync operation upon the last. And the more times you muck with a file, which can happen a lot both with assets and if you check in your deps to get exact snapshotting of code, the more big-file churn you get.
matheusmoreira•9h ago
It is a solution. The fact beginners might not understand it doesn't really matter, solutions need not perish on that alone. Clone is a command people usually run once while setting up a repository. Maybe the case could be made that this behavior should be the default and that full clones should be opt-in but that's a separate issue.
TGower•9h ago
> The cloned repo might not be compilable/usable since the blobs are missing.

Only the histories of the blobs are filtered out.

hinkley•8h ago
Rsync also has a mode to try to make compressed files be chunkable. In for a penny, in for a pound.
spyrja•8h ago
Would it be incorrect to say that most of the bloat relates to historical revisions? If so, maybe an rsync-like behavior starting with the most current version of the files would be the best starting point. (Which is all most people will need anyhow.)
goneri•9h ago
git-annex is a good alternative to GitHub's solution, and it supports different storage backends. I'm actually surprised it's not more popular.
forrestthewoods•9h ago
Git is fundamentally broken and bad. Almost all projects are de facto centralized. Your project is not Linux.

A good version control system would support petabyte scale history and terabyte scale clones via sparse virtual filesystem.

Git’s design is just bad for almost all projects that aren’t Linux.

matheusmoreira•9h ago
Completely disagree. Git is fundamentally functional and good. All projects are local and decentralized, and any "centralization" is in fact just git hosting services, of which there are many options which are not even mutually exclusive.
codethief•9h ago
> A good version control system would support petabyte scale history and terabyte scale clones via sparse virtual filesystem.

I like this idea in principle but I always wonder what that would look like in practice, outside a FAANG company: How do you ensure the virtual file system works equally well on all platforms, without root access, possibly even inside containers? How do you ensure it's fast? What do you do in case of network errors?

sublinear•8h ago
May I humbly suggest that those files probably belong in an LFS submodule called "assets" or "vendor"?

Then you can clone without checking out all the unnecessary large files to get a working build. This also helps on the legal side to correctly license your repos.

I'm struggling to see how this is a problem with git and not just antipatterns that arise from badly organized projects.

charcircuit•8h ago
The user shouldn't have to think about such a thing. Version control should handle everything automatically and not force the user into doing extra work to work around issues.
jameshart•8h ago
Nit:

> if I git clone a repo with many revisions of a noisome 25 MB PNG file

FYI ‘noisome’ is not a synonym for ‘noisy’ - it’s more of a synonym for ‘noxious’; it means something smells bad.

jiggawatts•8h ago
What I would love to see in an SCM that properly supports large binary blobs is storing the contents using Prolly trees instead of a simple SHA hash.

Prolly trees are very similar to Merkle trees or the rsync algorithm, but they support mutation and version history retention with some nice properties. For example: you always obtain exactly the same tree (with the same root hash) irrespective of the order of incremental edit operations used to get to the same state.

In other words, two users could edit a subset of a 1 TB file, both could merge their edits, and both will then agree on the root hash without having to re-hash or even download the entire file!

Another major advantage on modern many-core CPUs is that Prolly trees can be constructed in parallel instead of having to be streamed sequentially on one thread.

Then the really big brained move is to store the entire SCM repo as a single Prolly tree for efficient incremental downloads, merges, or whatever. I.e.: a repo fork could share storage with the original not just up to the point-in-time of the fork, but all future changes too.
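The order-independence property can be illustrated with a toy Python sketch. This is not a real Prolly tree library: the `build` function, the one-in-four boundary rule, and the `file0000`-style keys are assumptions for illustration, and the tree is rebuilt from scratch rather than updated incrementally as a real implementation would do. The point it shows is that node boundaries are chosen from the content itself, so the tree shape and root hash depend only on the final set of entries.

```python
import hashlib


def _digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _closes_node(item: bytes) -> bool:
    # Content-defined boundary (assumed rule): about one item in four ends a node.
    return item[0] % 4 == 0


def build(entries: dict[str, str]) -> tuple[str, set[bytes]]:
    """Return (root hash, set of all interior node hashes) for the given entries."""
    # Leaves are hashes of the key/value pairs in sorted key order.
    level = [_digest(f"{k}\0{v}".encode()) for k, v in sorted(entries.items())]
    nodes: set[bytes] = set()
    if not level:
        return _digest(b"").hex(), nodes
    while True:
        parents, current = [], []
        for item in level:
            current.append(item)
            if _closes_node(item):
                parents.append(_digest(b"".join(current)))
                current = []
        if current:
            parents.append(_digest(b"".join(current)))
        nodes.update(parents)
        if len(parents) == 1:
            return parents[0].hex(), nodes
        level = parents


# The same 1000 entries inserted in two different orders.
forward = {f"file{i:04d}": f"v{i}" for i in range(1000)}
backward = {f"file{i:04d}": f"v{i}" for i in reversed(range(1000))}
root_forward, nodes_forward = build(forward)
root_backward, _ = build(backward)

# One entry edited: only the nodes on the path to that entry should change.
edited = dict(forward, file0500="changed")
root_edited, nodes_edited = build(edited)
shared = len(nodes_forward & nodes_edited)
print(root_forward == root_backward, root_forward != root_edited)
print(f"{shared} of {len(nodes_edited)} nodes shared after one edit")
```

The shared nodes are what would let a fork reuse storage with the original: after a single edit, most interior nodes keep the same hash and need not be stored or transferred again.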

bahmboo•8h ago
I'm just dipping my toe into Data Version Control - DVC. It is aimed towards data science and large digital asset management using configurable storage sources under a git meta layer. The goal is separation of concerns: git is used for versioning and the storage layers are dumb storage.

Does anyone have feedback about personally using DVC vs LFS?
