
Package managers keep using Git as a database, it never works out

https://nesbitt.io/2025/12/24/package-managers-keep-using-git-as-a-database.html
115•birdculture•2h ago

Comments

eviks•1h ago
Indeed, the seductive nature of bad tools lying close to your hand - no need to lift your butt to get them!
twoodfin•1h ago
What made git special & powerful from the start was its data model: Like the network databases of old, but embedded in a Merkle tree for independent evolution and verifiability.

Scaling that data model beyond projects the size of the Linux kernel was not critical for the original implementation. I do wonder if there are fundamental limits to scaling the model for use cases beyond “source code management for modest-sized, long-lived projects”.

amluto•50m ago
Most of the problems mentioned in the article are not problems with using a content-addressed tree like git or even with using precisely git’s schema. The problems are with git’s protocol and GitHub’s implementation thereof.

Consider vcpkg. It’s entirely reasonable to download a tree named by its hash to represent a locked package. Git knows how to store exactly this, but git does not know how to transfer it efficiently.
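The distinction amluto draws (storage model vs. transfer) is easy to demonstrate: git's names are hashes over a typed header plus the raw bytes, so any client holding the bytes can verify the name independently. A minimal sketch in Python; the scheme is git's documented blob format, and nothing here touches the protocol:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the content address git assigns to a blob.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    raw bytes, so whoever holds the bytes can verify the name no matter
    where (or how) they were transferred.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Matches `echo hello | git hash-object --stdin`
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Trees and commits are named the same way over their serialized contents, which is what makes "download a tree named by its hash" a sound locking primitive even over an untrusted channel.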

quaintdev•1h ago
I host my own code repository using Forgejo. It's not public; in fact, it's behind mutual TLS, like all the services I host. The reason? I don't want to deal with bots and the other security risks that come with opening a port to the world.

Turns out Go modules will not accept a package hosted on my Forgejo instance, because it asks for a client certificate. There are ways to make go get use SSH, but even with that approach the repository needs to be accessible over HTTPS. In the end, I cloned the repository and used it in my project via a replace directive. It's really annoying.

xyzzy_plugh•48m ago
> There are ways to make go get use ssh but even with that approach the repository needs to be accessible over https.

No, that's false. You don't need anything to be accessible over HTTPS.

But even if it did, and you had to use mTLS, there's a whole bunch of ways to solve this. How do you solve this for any other software that doesn't present client certs? You use a local proxy.

agwa•13m ago
If you add .git to the end of your module path and set $GOPRIVATE to the hostname of your Forgejo instance, then Go will not make any HTTPS requests itself and instead delegate to the git command, which can be configured to authenticate with client certificates. See https://go.dev/ref/mod#vcs-find
Zambyte•1h ago
The issues with using Git for Nix seem to entirely be issues with using GitHub for Nix, no?
femiagbabiaka•1h ago
Yeah, its inclusion here is baffling, because none of the listed issues have anything to do with the particular problem nixpkgs is having.
Rucadi•1h ago
I got the same feeling from that. In fact, I would go as far as to say that nixpkgs and the nix commands' integration with git works quite well and is not an issue.

So when the article says "Package managers keep falling for this. And it keeps not working out", I feel that's untrue.

The biggest issue I have here is really the "flakes" integration, where the whole recipe folder is copied into the store (which doesn't happen with non-flakes commands), but that's a tooling problem, not an intrinsic problem of using git.

ekjhgkejhgk•1h ago
Do the easy thing while it works, and when it stops working, fix the problem.

Julia does the same thing, and going by the Rust numbers in the article, Julia has about 1/7th the number of packages that Rust does[1] (95k/13k ≈ 7.3).

It works fine, Julia has some heuristics to not re-download it too often.

But more importantly, there's a simple path to improvement. The top-level Registry.toml [1] has a path to each package, and once downloading everything proves unsustainable you can just download that one file and use it to fetch the rest as needed. I don't think this is a difficult problem.

[1] https://github.com/JuliaRegistries/General/blob/master/Regis...

zahlman•47m ago
> 00000000-1111-2222-3333-444444444444 = { name = "REPLTreeViews", path = "R/REPLTreeViews" }

... Should it be concerning that someone was apparently able to engineer an ID like that?

adestefan•39m ago
It’s as random as any other UUID.
Severian•25m ago
Incorrect; only some UUID versions are random, specifically v4 (and v7, which also encodes a timestamp).

https://en.wikipedia.org/wiki/Universally_unique_identifier

> 00000000-1111-2222-3333-444444444444

This would technically be version 2, the DCE Security version, which is built from the date-time and MAC address.

But overall, if you allow any yahoo to pick a UUID, it's not really a UUID, it's just some random string that looks like one.
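For what it's worth, Python's stdlib uuid module is stricter than the "technically version 2" reading: with 3333 in the fourth group the high variant bit is 0, which is the reserved pre-RFC NCS layout, so the version nibble isn't interpreted at all. A quick check (this reflects the stdlib's behavior, not an authoritative reading of the RFC):

```python
import uuid

u = uuid.UUID("00000000-1111-2222-3333-444444444444")

# 0x3 in the first nibble of the fourth group is binary 0011; the high
# bit being 0 selects the reserved NCS variant, so the stdlib reports
# no version for this value even though the third group starts with 2.
print(u.variant)   # equals uuid.RESERVED_NCS
print(u.version)   # None
```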

skycrafter0•39m ago
If you read the repo README, it just says "generate a uuid". You can use whatever you want as long as it fits the format, it seems.
ekjhgkejhgk•29m ago
Could you please articulate specifically why that should be concerning?

Right now I don't see the problem because the only criterion for IDs is that they are unique.

galenlynch•26m ago
I believe Julia only uses the Git registry as an authoritative ledger where new packages are registered [1]. My understanding is that as you mention, most clients don't access it, and instead use the "Pkg Protocol" [2] which does not use Git.

[1] https://github.com/JuliaRegistries/General

[2] https://pkgdocs.julialang.org/dev/protocol/

bencornia•1h ago
> Grab’s engineering team went from 18 minutes for go get to 12 seconds after deploying a module proxy. That’s not a typo. Eighteen minutes down to twelve seconds.

> The problem was that go get needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies. Cloning entire repositories to get a single file.

I have also had inconsistent performance with go get, though never enough to look closely at it. I wonder if I was running into the same issue?

zahlman•45m ago
> needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies.

Python used to have this problem as well (technically still does, but a large majority of things are available as a wheel and PyPI generally publishes a separate .metadata file for those wheels), but at least it was only a question of downloading and unpacking an archive file, not cloning an entire repo. Sheesh.

Why would Go need to do that, though? Isn't the go.mod file in a specific place relative to the package root in the repo?

fireflash38•27m ago
How long ago were you having issues? That behavior changed in Go 1.13.
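Concretely, Go 1.13 turned on GOPROXY=https://proxy.golang.org by default, and the proxy protocol serves each module's go.mod at its own endpoint, so resolution no longer clones anything. A sketch of the URL scheme; the case-escaping rule and the /@v/ endpoint come from the Go module reference, the rest is a toy:

```python
def escape_path(module: str) -> str:
    """Case-encode a module path for proxy URLs: each uppercase letter
    becomes '!' plus its lowercase form, per the Go module reference."""
    return "".join("!" + c.lower() if c.isupper() else c for c in module)

def mod_file_url(proxy: str, module: str, version: str) -> str:
    """URL of just the go.mod file for module@version under a module proxy,
    which is all that is needed to resolve transitive dependencies."""
    return f"{proxy}/{escape_path(module)}/@v/{version}.mod"

print(mod_file_url("https://proxy.golang.org", "github.com/Azure/azure-sdk-for-go", "v1.2.3"))
# https://proxy.golang.org/github.com/!azure/azure-sdk-for-go/@v/v1.2.3.mod
```

Fetching one small file per dependency instead of one repository per dependency is the whole 18-minutes-to-12-seconds story.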
c-linkage•1h ago
This seems like a tragedy of the commons -- GitHub is free after all, and it has all of these great properties, so why not? -- but this kind of decision making occurs whenever externalities are present.

My favorite hill to die on (externality) is user time. Most software houses spend so much time focusing on how expensive engineering time is that they neglect user time. Software houses optimize for feature delivery, not user interaction time. Yet if I spent one hour making my app one second faster for my million users, I could save 277 user hours per year. But since user hours are an externality, such optimization never gets done.

Externalities lead to users downloading extra gigabytes of data (wasted time) and waiting for software, all of which is waste that the developer isn't responsible for and doesn't care about.

ekjhgkejhgk•1h ago
I wouldn't call it a tragedy of the commons, because it's not a commons. It's owned by Microsoft. They're calculating that it's worth it for them, so I say take as much as you can.

A commons would be something owned by nobody, whose existence everyone benefits from.

TeMPOraL•50m ago
Still, because reality doesn't respect boundaries of human-made categories, and because people never define their categories exhaustively, we can safely assume that something almost-but-not-quite like a commons, is subject to an almost-but-not-quite tragedy of the commons.
reactordev•34m ago
"An A- is still an A" kind of thinking. I like this approach, since not everything fits the mold perfectly.
ttiurani•24m ago
The whole notion of the "tragedy of the commons" needs to be put to rest. It's an armchair thought experiment that was disproven, by the 1990s at the latest, by Elinor Ostrom's actual empirical evidence from real commons.

The "tragedy", if you absolutely need to find one, is only for unrestricted, free-for-all commons, which is obviously a bad idea.

jasonkester•15m ago
It has the same effect though. A few bad actors using this “free” thing can end up driving the cost up enough that Microsoft will have to start charging for it.

The jerks get their free things for a while, then it goes away for everyone.

loloquwowndueo•58m ago
Just a reminder that GitHub is not git.

The article mentions that most of these projects did use GitHub as a central repo out of convenience so there’s that but they could also have used self-hosted repos.

justincormack•56m ago
They probably would have experienced issues way sooner, as the self-hosted tools don't scale nearly as well.
machinationu•40m ago
Explain to me how you self-host a git repo that is accessed millions of times a day by CI jobs pulling packages.
ozim•36m ago
FTFY:

Explain to me how you self-host a git repo, without spending any money and with no budget, that is accessed millions of times a day by CI jobs pulling packages.

zahlman•51m ago
> Most software houses spend so much time focusing on how expensive engineering time is that they neglect user time. Software houses optimize for feature delivery and not user interaction time. Yet if I spent one hour making my app one second faster for my million users, I could save 277 user hours per year. But since user hours are an externality, such optimization never gets done.

This is what people mean about speed being a feature. But "user time" depends on more than the program's performance. UI design is also very important.

machinationu•41m ago
With AI, engineering costs are plummeting.

You can implement entire features with 10 cents' worth of tokens.

Companies that don't adapt will be left behind this year.

camgunz•39m ago
I've never been more convinced that LLMs are the vanguard of the grift economy, now that green accounts are low-effort astroturfing on HN.
machinationu•3m ago
hey, I'm just a lowly LLM, gotta earn my tokens :|
solatic•35m ago
If you think too hard about this, you come back around to Alan Kay's quote about how people who are really serious about software should build their own hardware. Web applications, and in general loading pretty much anything over the network, make for a horrible, no-good, really bad user experience, and they always will. The only way to really respect the user is with native applications that are local-first, and if you take that really far, you build (at the very least) peripherals to make it even better.

The number of companies that have this much respect for the user is vanishingly small.

ghosty141•33m ago
Yes, because users don't appreciate this enough to pay for the time it takes.
hombre_fatal•28m ago
Software I don’t have to install at all “respects me” the most.

Native software being an optimum is mostly an engineer fantasy that comes from imagining what you can build.

In reality that means having to install software like Meta’s WhatsApp, Zoom, and other crap I’d rather run in a browser tab.

I want very little software running natively on my machine.

inapis•33m ago
>Yet if I spent one hour making my app one second faster for my million users, I could save 277 user hours per year. But since user hours are an externality, such optimization never gets done.

I have never been convinced by this argument. The aggregate number sounds fantastic, but I don't believe any meaningful work gets done with each user's saved second. That one second (and more) can simply be absorbed by me stretching my body out.

OTOH, if the argument is to make software smaller, I can get behind that since it will simply lead to more efficient usage of existing resources and thus reduce the environmental impact.

But we live in a capitalist world, and external pressure is needed for change to occur. The current RAM shortage, if it lasts, might be one such pressure. Otherwise we're only daydreaming about a utopia.

adrianN•25m ago
The mapping from time saved to increased productivity (or happiness, or whatever) is not linear but a step function. Saving one second doesn't help much, but there is a threshold (depending on the individual) where faster workflows lead to a better experience. It does make a difference whether a task takes a minute or half a second, at least for me.
Aerroon•21m ago
One second is long enough that it can put a user off from using your app though. Take notifications on phones for example. I know several people who would benefit from a habitual use of phone notifications, but they never stick to using them because the process of opening (or switching over to) the notification app and navigating its UI to leave a notification takes too long. Instead they write a physical sticky note, because it has a faster "startup time".
tehbeard•11m ago
All depends on the type of interaction.

A high usage one, absolutely improve the time of it.

Loading the profile page? That isn't done often, so it's not really worth it unless it's a known and vocal issue.

https://xkcd.com/1205/ gives a good estimate.

ozim•16m ago
About apps made by software houses: even though we should strive to do a good job, and I agree with the sentiment...

First argument: take at least two zeros off your estimate. Most applications will have maybe thousands of users; successful ones will maybe run into the tens of thousands. If you're lucky enough to work on an application with hundreds of thousands, or millions, of users, you work at a FAANG, not a typical "software house".

Second argument: most users use 10-20 apps in a typical workday, so your application is most likely irrelevant.

Third argument: most users would save much more time by properly learning the applications they use daily (or how to use a computer at all) than from someone optimizing some function from 2s to 1s. Of course that's hard, because they have 10-20 daily apps plus who knows how many occasional ones. Still, I see people doing super silly stuff in tools like Excel, or not even knowing copy-paste; this isn't even about command-line magic.

miyuru•1h ago
Funnily enough, I clicked the Homebrew GitHub link in the post, only to get a rate-limit error page from GitHub.
mikkupikku•58m ago
People who put off learning SQL for later end up using anything other than a database as their database.
redog•19m ago
SQL killed the set theory star
steeleduncan•57m ago
The other conclusion to draw is "Git is a fantastic choice of database for starting your package manager, almost all popular package managers began that way."
saidinesh5•39m ago
I think the conclusion is more that package definitions can still be maintained on git/GitHub, but package manager clients should probably rely on a cache, a database, or some more efficient intermediate layer.

Mostly to avoid downloading the whole repo and resolving deltas from the history for the few packages most applications tend to depend on, especially in today's CI/CD world.

reactordev•27m ago
This is exactly the right approach. I did this for my package manager.

It relies on a git repo branch for stable. There are YAML definitions of the packages, including URLs to their repos, dependencies, etc. Preflight scripts. Post-install checks. And the big one: the signatures for verification. No binaries, rpms, debs, ar, or zip files.

What's actually installed lives in a small SQLite database, and searching for software does a vector search on each package's YAML description.

Semver included.

This was inspired by brew/portage/dpkg for my hobby os.

bluGill•36m ago
Git isn't a fantastic choice unless you know nothing about databases. A search would turn up plenty of research on databases and what works when, and why.
kibwen•11m ago
For the purposes of the article, git isn't just being used as a database, it's being used as a protocol to replicate the database to the client to allow for offline operation and then keep those distributed copies in sync. And even for that purpose you can do better than git if you know what you're doing, but knowledge of databases alone isn't going to help you (let alone make your engineering more economical than relying on free git hosting).
adastra22•27m ago
Git is an absolute shit database for a package manager even in the beginning. It’s just that GitHub subsidizes hosting and that is hard to pass up.
ori_b•51m ago
Alternatively: Downloading the entire state of all packages when you care about just one, it never works out.

O(1) beats O(n) as n gets large.

gruez•30m ago
Seems to still work out for apt?
born-jre•47m ago
lol I see this as I plan on using Git for my thing store. https://github.com/blue-monads/potatoverse
gjvc•47m ago
sqlite seems to be ideal for a package manager
hk1337•45m ago
I like Go, but its dependency management is weird and seems heavily centered on GitHub.
hogrug•45m ago
The facts are interesting, but the conclusion is a bit strange. These package managers succeeded because git is better suited to the low-trust model, and because GitHub has been hosting, for free, infrastructure that no one in their right mind would provide for the average database.

If it didn't work, we would not have these massive ecosystems upsetting GitHub's freemium model; anything at scale is naturally going to have consequences and features that aren't so compatible with the use case.

ifh-hn•37m ago
So what's the answer, then? That's the question I wanted answered after reading this article. With no experience in git internals or package management: would a local SQLite database on the client, and something similar on the server, do?
encom•25m ago
I quite like Gentoo's rsync-based package manager. I believe they've used that since the beginning. It works well.
aniou•37m ago
As a side note: maybe someone knows why the Rust devs chose an already-used name for their language-change proposals? "RFC" was already taken and well established, and I simply refuse to accept that someone wasn't aware of Request for Comments; if they were, and the clash was created deliberately, then it was rude and arrogant.

Every ...king time I read something like "RFC 2789 introduced a sparse HTTP protocol", my brain short-circuits. BTW: RFC 2789 is a "Mail Monitoring MIB".

adastra22•25m ago
There are many, many RFC collections. Including many that predate the IETF. Some even predate computers.
aniou•12m ago
But those were in different domains. Here we have a strong clash, because Rust positions itself as a secure systems and internet language, and computer and internet standards are already defined by RFCs. So it would not be uncommon for someone to talk about Rust mechanisms, defined by one particular RFC, in the context of handling a particular protocol, defined by... well... an RFC too. Just not a Rust one.

Not so smart, once we realize that one aspect of a secure and reliable system is the elimination of ambiguities.

frumplestlatz•37m ago
Since ~2002, MacPorts has used svn or git, but by default users rsync the complete port definitions plus a server-generated index and a signature.

The index is used for all lookups; it can also be generated or incrementally updated client-side to accommodate local changes.

This has worked fine for literally decades, starting back when bandwidth and CPU power were far more limited.

The problem isn’t using SCM, and the solutions have been known for a very long time.

gethly•26m ago
If we stopped using VCSes to fetch source files, we would lose the ability to get the exact commit (understand it as a version; it has nothing to do with the underlying VCS) of those files. Git, Mercurial, SVN... GitHub, Bitbucket... it does not matter. Absolutely nobody will build downloadable versions of their source files, hosted on who knows how "prestigious" domains, by copying them to another location just to serve the --->exact same content<--- that GitHub and the like already provide.

This entire blog is just a waste of time for anyone reading it.

encom•22m ago
>[Homebrew] Auto-updates now run every 24 hours instead of every 5 minutes[...]

That is such an insane default, I'm at a loss for words.

dboon•20m ago
I’m building Cargo/UV for C. Good article. I thought about this problem very deeply.

Unfortunately, when you're starting out, the idea of running a registry is a really tough sell. On top of the very hard engineering problem of writing the code and making a world-class tool, plus the social one of getting it adopted, I need to worry about funding and maintaining something that serves potentially a world of traffic? The git solution is intoxicating through this lens.

Fundamentally, the issue is the sparse checkouts mentioned by the author. You’d really like to use git to version package manifests, so that anyone with any package version can get the EXACT package they built with.

But this doesn’t work, because you need arbitrary commits. You either need a full checkout, or you need to somehow track the commit a package version is in without knowing what hash git will generate before you do it. You have to push the package update and then push a second commit recording that. Obviously infeasible, obviously a nightmare.

Conan's solution is, I think, just about the only way. It trades perfect reproduction for conditional logic in the manifest. Instead of 3.12 pointing to a commit, every 3.x points to the same manifest, and there's just a little logic to set that specific config field added in 3.12. If the logic gets to be too much, they let you map version ranges to manifests for a package. So if 3.13 rewrites the entire manifest, just remap it.

I have not found another package manager that uses git as a backend that isn’t a terrible and slow tool. Conan may not be as rigorous as Nix because of this decision but it is quite pragmatic and useful. The real solution is to use a database, of course, but unless someone wants to wire me ten thousand dollars plus server costs in perpetuity, what’s a guy supposed to do?
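The range-remapping idea described above can be sketched as a small table: many versions resolve to one shared manifest, and a rewrite gets its own entry. The data layout below is invented to illustrate the concept; it is not Conan's actual format:

```python
def parse(v: str) -> tuple[int, ...]:
    """Split a dotted version into comparable integer components."""
    return tuple(int(p) for p in v.split("."))

# (lowest version covered, manifest file), sorted ascending; a version
# uses the manifest of the highest entry at or below it, so a full
# manifest rewrite (here at 3.13) only adds one row.
RANGES = [("1.0", "manifest-1.yml"), ("3.13", "manifest-3.13.yml")]

def manifest_for(version: str) -> str:
    chosen = None
    for low, manifest in RANGES:
        if parse(version) >= parse(low):
            chosen = manifest
    if chosen is None:
        raise KeyError(version)
    return chosen

print(manifest_for("3.12"))  # manifest-1.yml
print(manifest_for("3.13"))  # manifest-3.13.yml
```

The trade-off is exactly the one described: no commit-level reproduction of old manifests, but lookups stay O(table size) with no git history involved.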

kibwen•18m ago
I think there's a form of survivorship bias at work here. To use the example of Cargo, if Rust had never caught on, and thereby gotten popular enough to inflate the git-based index beyond reason, then it would never have been a problem to use git as the backing protocol for the index. Likewise, we can imagine innumerable smaller projects that successfully use git as a distributed delta-updating data distribution protocol, and never happen to outgrow it.

The point being, if you're not sure whether your project will ever need to scale, then it may not make sense to reinvent the wheel when git is right there (and then invent the solution for hosting that git repo, when Github is right there), letting you spend time instead on other, more immediate problems.

nacozarina•6m ago
successful things often have humble origins, it’s a feature not a bug

for every project that managed to out-grow ext4/git there were a hundred that were well-served and never needed to over-invest in something else
