Package managers keep using Git as a database, it never works out

https://nesbitt.io/2025/12/24/package-managers-keep-using-git-as-a-database.html

784•birdculture•1mo ago

Comments

eviks•1mo ago

Indeed, the seductive nature of bad tools lying close to your hand - no need to lift your butt to get them!

twoodfin•1mo ago

What made git special & powerful from the start was its data model: Like the network databases of old, but embedded in a Merkle tree for independent evolution and verifiability.

Scaling that data model beyond projects the size of the Linux kernel was not critical for the original implementation. I do wonder if there are fundamental limits to scaling the model for use cases beyond “source code management for modest-sized, long-lived projects”.

amluto•1mo ago

Most of the problems mentioned in the article are not problems with using a content-addressed tree like git or even with using precisely git’s schema. The problems are with git’s protocol and GitHub’s implementation thereof.

Consider vcpkg. It’s entirely reasonable to download a tree named by its hash to represent a locked package. Git knows how to store exactly this, but git does not know how to transfer it efficiently.
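To make the content-addressing point concrete: Git names a file's contents (a blob) and a directory listing (a tree) by the SHA-1 of a short header plus the bytes, so a tree hash pins an exact set of files. The following is a minimal Python sketch of that hashing. It assumes SHA-1 repositories and a flat directory of regular files only (mode 100644, no subdirectories, executables, or symlinks), and it is an illustration, not vcpkg's code.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file's contents (SHA-1 object format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def git_flat_tree_id(files: dict[str, bytes]) -> str:
    """Object ID of a tree holding only regular files (mode 100644).

    Simplification: no subdirectories, executables, or symlinks.
    """
    body = b""
    # Git orders tree entries by the raw bytes of their names.
    for name in sorted(files, key=lambda n: n.encode()):
        raw_blob_id = bytes.fromhex(git_blob_id(files[name]))
        body += b"100644 " + name.encode() + b"\0" + raw_blob_id
    header = b"tree %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(git_flat_tree_id({}))     # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

Byte-identical file sets always produce the same tree ID, which is what makes a tree hash usable as a lock. The sketch says nothing about transfer, which is the part the comment says Git handles poorly.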

mananaysiempre•1mo ago

> Git knows how to store [a hash-addressed tree], but git does not know how to transfer it efficiently.

Naïvely, I’d expect shallow clones to be this, so I was quite surprised by a mention of GitHub asking people not to use them. Perhaps Git tries too hard to make a good packfile?..

Meanwhile, what Nixpkgs does (and why “release tarballs” were mentioned as a potential culprit in the discussion linked from TFA) is request a gzipped tarball of a particular commit’s files from a GitHub-specific endpoint over HTTP rather than use the Git protocol. So that’s already more or less what you want, except even the tarball is 46 MB at this point :( Either way, I don’t think the current problems with Nixpkgs actually support TFA’s thesis.
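For comparison, the HTTP route described above looks roughly like this in Python: a single GET for a gzipped snapshot of one commit, with no Git protocol negotiation and no history. The archive/<rev>.tar.gz path is GitHub's source-archive URL; the helper names and the revision are hypothetical placeholders, and a real fetcher would also verify the download against a pinned hash.

```python
import io
import tarfile
import urllib.request


def github_archive_url(owner: str, repo: str, rev: str) -> str:
    # GitHub serves a gzipped tarball of a commit's files at this path.
    return f"https://github.com/{owner}/{repo}/archive/{rev}.tar.gz"


def fetch_snapshot_names(owner: str, repo: str, rev: str) -> list[str]:
    # One plain HTTPS GET: no git protocol, no packfile negotiation, no history.
    with urllib.request.urlopen(github_archive_url(owner, repo, rev)) as resp:
        data = resp.read()
    # Decompress in memory and list the files the snapshot contains.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return tar.getnames()


# A real call would pass a full commit SHA; "deadbeef" is a made-up placeholder.
print(github_archive_url("NixOS", "nixpkgs", "deadbeef"))
```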

quaintdev•1mo ago

I host my own code repository using Forgejo. It's not public. In fact, it's behind mutual TLS, like all the services I host. Reason? I don't want to deal with bots and other security risks that come with opening ports to the world.

It turns out Go modules won't fetch a package hosted on my Forgejo instance, because the instance asks for a client certificate. There are ways to make go get use SSH, but even with that approach the repository needs to be accessible over HTTPS. In the end, I cloned the repository and used it in my project with a replace directive. It's really annoying.

xyzzy_plugh•1mo ago

> There are ways to make go get use ssh but even with that approach the repository needs to be accessible over https.

No, that's false. You don't need anything to be accessible over HTTP.

But even if it did, and you had to use mTLS, there's a whole bunch of ways to solve this. How do you solve this for any other software that doesn't present client certs? You use a local proxy.

agwa•1mo ago

If you add .git to the end of your module path and set $GOPRIVATE to the hostname of your Forgejo instance, then Go will not make any HTTPS requests itself and instead delegate to the git command, which can be configured to authenticate with client certificates. See https://go.dev/ref/mod#vcs-find
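A rough Python model of the two rules this relies on may help. GOPRIVATE is a comma-separated list of glob patterns matched against the leading path elements of a module path (matching modules skip the public proxy and checksum database), and a .git suffix on a path element marks the repository root, which Go then fetches with the git command. This is an approximation written for illustration, not Go's implementation; git.home.lan and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Path-element suffixes Go treats as VCS qualifiers in a module path.
VCS_QUALIFIERS = (".bzr", ".fossil", ".git", ".hg", ".svn")


def matches_goprivate(module_path: str, goprivate: str) -> bool:
    """Approximation: does any comma-separated glob in GOPRIVATE match
    a leading run of path elements of module_path?"""
    elems = module_path.split("/")
    for pattern in (p.strip() for p in goprivate.split(",")):
        if not pattern:
            continue
        pat_elems = pattern.split("/")
        if len(pat_elems) > len(elems):
            continue
        # Compare element by element so "*" never crosses a "/".
        if all(fnmatchcase(e, p) for e, p in zip(elems, pat_elems)):
            return True
    return False


def vcs_repo_root(module_path: str) -> Optional[tuple[str, str]]:
    """Split host/owner/repo.git/sub into (repo root, module subdirectory)."""
    elems = module_path.split("/")
    # The first element is the host, so the qualifier must come after it.
    for i in range(1, len(elems)):
        if elems[i].endswith(VCS_QUALIFIERS):
            return "/".join(elems[: i + 1]), "/".join(elems[i + 1 :])
    return None


print(matches_goprivate("git.home.lan/me/lib.git", "git.home.lan"))  # True
print(vcs_repo_root("git.home.lan/me/lib.git"))  # ('git.home.lan/me/lib.git', '')
```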

irusensei•1mo ago

Have a look at Tailscale DNS and certs. It gives you a valid cert through Let's Encrypt without exposing your services to the internet.

baobun•1mo ago

If you add the instance's TLS cert (CA) to your trust store, then Go will happily download over HTTPS. It can be finicky depending on how you run Go, but I can confirm it works.

Zambyte•1mo ago

The issues with using Git for Nix seem to entirely be issues with using GitHub for Nix, no?

femiagbabiaka•1mo ago

Yeah, its inclusion in here is baffling, because none of the listed issues have anything to do with the particular issue nixpkgs is having.

Rucadi•1mo ago

I also got the same feeling from that, in fact, I would go as far as to say that nixpkgs and nix-commands integration with git works quite well and is not an issue.

So when the article says "Package managers keep falling for this. And it keeps not working out", I feel that's untrue.

The biggest issue I have with this really is the "flakes" integration, where the whole recipe folder is copied into the store (which doesn't happen with non-flakes commands), but that's a tooling problem, not an intrinsic problem of using git.

ekjhgkejhgk•1mo ago

Do the easy thing while it works, and when it stops working, fix the problem.

Julia does the same thing, and from the Rust numbers on the article, Julia has about 1/7th the number of packages that Rust does[1] (95k/13k = 7.3).

It works fine, Julia has some heuristics to not re-download it too often.

But more importantly, there's a simple path to improve. The top-level Registry.toml [1] has a path to each package, and once downloading everything proves unsustainable, you can just download that one file and use it to download the rest as needed. I don't think this is a difficult problem.

[1] https://github.com/JuliaRegistries/General/blob/master/Regis...

zahlman•1mo ago

> 00000000-1111-2222-3333-444444444444 = { name = "REPLTreeViews", path = "R/REPLTreeViews" }

... Should it be concerning that someone was apparently able to engineer an ID like that?

adestefan•1mo ago

It’s as random as any other UUID.

Severian•1mo ago

Incorrect, only some UUIDs are random, specifically v4 and v7 (v7 uses time as well).

https://en.wikipedia.org/wiki/Universally_unique_identifier

> 00000000-1111-2222-3333-444444444444

This would technically be version 2, the DCE Security version, which is built from the date-time and MAC address.

But overall, if you allow any yahoo to pick a UUID, it's not really a UUID; it's just some random string that looks like one.
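The version is just a nibble in the third group, so it can be read directly. A short check with Python's standard uuid module shows that nibble is indeed 2, but also that the variant bits (the first hex digit of the fourth group, 3) mark the reserved NCS variant rather than the RFC 4122 layout, so Python reports no version at all for this ID. That fits the point that it is a UUID-shaped string rather than one produced by a standard algorithm.

```python
import uuid


def version_nibble(u: uuid.UUID) -> int:
    # Top 4 bits of the time_hi_and_version field (bits 76-79 of the 128-bit value).
    return (u.int >> 76) & 0xF


vanity = uuid.UUID("00000000-1111-2222-3333-444444444444")
print(version_nibble(vanity))               # 2
print(vanity.variant == uuid.RESERVED_NCS)  # True: the top bit of 0x3 (0b0011) is 0
print(vanity.version)                       # None: version is only defined for the RFC 4122 variant

fresh = uuid.uuid4()
print(fresh.version, fresh.variant == uuid.RFC_4122)  # 4 True
```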

ekjhgkejhgk•1mo ago

> if you allow any yahoo to pick a UUID, it's not really a UUID

universally unique identifier (UUID)

> 00000000-1111-2222-3333-444444444444

It's unique.

Anyway we're talking about a package that doesn't matter. It's abandoned. Furthermore it's also broken, because it uses REPL without importing it. You can't even precompile it.

https://github.com/pfitzseb/REPLTreeViews.jl/blob/969f04ce64...

anonymars•1mo ago

Which is to say, not guaranteed at all. GUIDs are designed to be unique, not random/unpredictable

https://devblogs.microsoft.com/oldnewthing/20120523-00/?p=75...

skycrafter0•1mo ago

If you read the repo README, it just says "generate a uuid". You can use whatever you want as long as it fits the format, it seems.

ekjhgkejhgk•1mo ago

Could you please articulate specifically why that should be concerning?

Right now I don't see the problem because the only criterion for IDs is that they are unique.

zahlman•1mo ago

I didn't know whether they were supposed to be within the developer's control (in which case the only real concern is whether someone else has already used the id), or generated by the system (in which case a developer demonstrated manipulation of that system).

Apparently it is the former, and most developers independently generate random IDs because it's easy and is extremely unlikely to result in collisions. But it seems the dev at the top of the list had a sense of vanity instead.

KenoFischer•1mo ago

You're supposed to generate a random one, but the only consequence of not doing so is that you won't be able to register your package if someone else already took the UUID (which is a pain if you have registered versions in a private registry). That said, "vanity" UUIDs are a bad look, so we'd probably reject them if someone tried that today, but there isn't any actual issue with them.

galenlynch•1mo ago

I believe Julia only uses the Git registry as an authoritative ledger where new packages are registered [1]. My understanding is that as you mention, most clients don't access it, and instead use the "Pkg Protocol" [2] which does not use Git.

[1] https://github.com/JuliaRegistries/General

[2] https://pkgdocs.julialang.org/dev/protocol/

0xbadcafebee•1mo ago

This is basically unethical. Imagine anything important in the world that worked this way. "Do nuclear engineering the easy way while it works, and when it stops working, fix the problem."

Software engineers always make the excuse that what they're making now is unimportant, so who cares? But then everything gets built on top of that unimportant thing, and one day the world crashes down. Worse, "fixing the problem" becomes near impossible, because now everything depends on it.

But really, the reason not to do it is that there's no need to. There are plenty of solutions other than Git that work as well or better without all the pitfalls. The lazy engineer picks bad solutions not because they're necessarily easier than the alternatives, but because they're the path of least resistance for themselves.

Not only is this not better, it's often actively worse. But this is excused by the same culture that gave us "move fast and break things". All you have to do is use any modern software to see how that worked out. Slow bug-riddled garbage that we're all now addicted to.

hombre_fatal•1mo ago

On the other hand, GitHub wants to be the place you choose to build your registry for a new project, and they are clearly on board with the idea given that they help massive projects like Nix packages instead of kicking them off.

As opposed to something like using a flock of free blogger.com blogs to host media for an offsite project.

baobun•1mo ago

...For now. The writing is on the wall.

ModernMech•1mo ago

Hold up... "lazy engineers" are the problem here? What about a society that insists on shoving the work product of unfunded, volunteer engineers into critical infrastructure because they don't want to pay what it costs to do things the right way? Imagine building a nuclear power plant with an army of volunteer nuclear engineers.

It cannot be the case that software engineers are labelled lazy for not building the at-scale solution to start with, but at the same time everyone wants to use their work, and there are next to no resources for said engineer to actually build the at scale solution.

> the path of least resistance for themselves.

Yeah because they're investing their own personal time and money, so of course they're going to take the path that is of least resistance for them. If society feels that's "unethical", maybe pony up the cash because you all still want to rely on their work product they are giving out for free.

rovr138•1mo ago

> If society feels that's "unethical", maybe pony up the cash because you all still want to rely on their work product they are giving out for free.

I like OSS and everything.

Having said that, ethically, should society be paying for these? Maybe that is what should happen. In some places, we have programs to help artists. Should we have the same for software?

ekjhgkejhgk•1mo ago

Fixing problems as they appear is unethical? Ok then.

You realize, there are people who think differently? Some people would argue that if you keep working on problems you don't have but might have, you end up never finishing anything.

It's a matter of striking a balance, and I think you're way on one end of the spectrum. The vast majority of people using Julia aren't building nuclear plants.

BenjiWiebe•1mo ago

Fixing problems when they appear is ethical.

Refusing to fix a problem that hasn't appeared yet, but has been/can be foreseen - that's different. I personally wouldn't call it unethical, but I'd consider it a negative.

zephen•1mo ago

The problem is that popularity is governed by power laws.

Literally anybody could foresee that, _if_ something scales to millions of users, there will be issues. Some of the people who foresee that could even fix it. But they might spend their time optimizing for something that will never hit 1000 users.

Also, the problems discussed here are not that things don't work, it's that they get slow and consume too many resources.

So there is certainly an optimal time to fix such problems, which is, yes, OK, _before_ things get _too_ slow and consume _too_ many resources, but is most assuredly _after_ you have a couple of thousand users.

xboxnolifes•1mo ago

Most of the world does work this way. Problems are solved within certain conditions and for use over a certain time frame. Once those change, the problem gets revisited.

Most software gets to take it to more of an extreme than many engineering fields, since there isn't physical danger. It's telling that the counterexamples always use potentially dangerous problems like medicine or nuclear engineering. The software in those fields is more stringent.

0xbadcafebee•1mo ago

The "certain conditions" are wildly different for software engineers, since there are virtually no laws or professional guidelines restricting them.

> Most software gets to take it to more of an extreme than many engineering fields since there isn't physical danger

But there is physical danger. It's just abstracted away from the engineer. The engineer writing a video card driver doesn't see any physical danger, but the video may be used to display a warning that the person is about to be shot by an assailant. That's one example of a billion possible ones, because you do not control what your software will eventually be used for. Thus it's unethical to make decisions based on one's personal interests, as what's at stake is much larger.

> It's telling that the counterexamples always use the potentially dangerous problems like medicine or nuclear engineering. The software in those fields is more stringent.

As someone who's worked in those fields: Not really. Submit a form that said you did some black box testing, and whatever software you want (even when you have no idea how it works) gets approved for a medical device. Nuclear is also scarily vulnerable. The software that controls other critical systems is even less robust. Just look at the decades of failures in SCADA, and realize IoT is even worse.

soraminazuki•1mo ago

What is wrong with you? You berated and name-called open source volunteers because a blog post taught you that package managers using Git are "bad." Let me be clear: a 3 minute read of a blog post offers neither moral superiority nor technical insights that surpass those of actual maintainers.

Contrary to the snap conclusion you drew from the article, there are design trade-offs involved when it comes to package managers using Git. The article's favored solution advocates for databases, which in practice, makes the package repository a centralized black box that compromises package reproducibility. It may solve some problems, but still sucks harder in some ways.

The article is also flat-out wrong regarding Nixpkgs. The primary distribution method for Nixpkgs has always been tarballs, not Git. Although the article has attempted to backpedal [1], it hasn't entirely done so. It's now effectively criticizing collaboration over Git while vaguely suggesting that maybe it’s a GitHub problem. And you think what, that collaboration over Git is "unethical"???

On one side, there are open-source maintainers contributing their time and effort as volunteers. On the other, there are people like you attacking them, labeling them "lazy" and bemoaning that you're "forced" to rely on the results of their free labor, which you deride as "slow, bug-riddled garbage" without any real understanding. I know whose side I'm on.

[1]: https://github.com/andrew/nesbitt.io/commit/8e1c21d96f4e7b3c...

saagarjha•1mo ago

I feel like you could have dropped your first sentence to better represent your point

IshKebab•1mo ago

> when it stops working, fix the problem

This is too naive. Fixing the problem costs a different amount depending on when you do it. The later you leave it the more expensive it becomes. Very often to the point where it is prohibitively expensive and you just put up with it being a bit broken.

This article even has an example of that - see the vcpkg entry.

mi_lk•1mo ago

> Do the easy thing while it works, and when it stops working, fix the problem

Another way to phrase this mindset is "fuck around and find out" in gen-Z speak. It's usually practical to an extent but I'm personally not a fan

sagarm•1mo ago

I've mostly heard FAFO used to describe something obviously stupid.

Building on the same thing people use for code doesn't seem stupid to me, at least initially. You might have to migrate later if you're successful enough, but that's not a sign of bad engineering. It's just building for where you are, not where you expect to be in some distant future

zephen•1mo ago

Not at all.

When you fuck around optimizing prematurely, you find out that you're too late and nobody cares.

Oh, well, optimization is always fun, so there's that.

syockit•1mo ago

That's one thing, the other is you find out you were optimizing for the wrong thing, and now it takes more effort and time to reoptimize for the right thing.

bencornia•1mo ago

> Grab’s engineering team went from 18 minutes for go get to 12 seconds after deploying a module proxy. That’s not a typo. Eighteen minutes down to twelve seconds.

> The problem was that go get needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies. Cloning entire repositories to get a single file.

I have also had inconsistent performance with go get. Never enough to look closely at it. I wonder if I was running into the same issue?
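For context on the fix quoted above: the module proxy protocol lets go fetch just a module's go.mod with GET $GOPROXY/<module>/@v/<version>.mod, instead of cloning the repository. A minimal Python sketch of building that URL, including the protocol's case-encoding (each uppercase letter becomes "!" plus its lowercase form); the module path and version in the example are hypothetical.

```python
def escape_case(s: str) -> str:
    # Proxy paths are case-encoded ("A" -> "!a") to avoid ambiguity on
    # case-insensitive filesystems.
    return "".join("!" + c.lower() if "A" <= c <= "Z" else c for c in s)


def go_mod_url(proxy: str, module: str, version: str) -> str:
    # One small text file per dependency instead of a full clone.
    return f"{proxy.rstrip('/')}/{escape_case(module)}/@v/{escape_case(version)}.mod"


print(go_mod_url("https://proxy.golang.org", "example.com/Acme/Widget", "v1.2.3"))
# https://proxy.golang.org/example.com/!acme/!widget/@v/v1.2.3.mod
```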

zahlman•1mo ago

> needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies.

Python used to have this problem as well (technically still does, but a large majority of things are available as a wheel and PyPI generally publishes a separate .metadata file for those wheels), but at least it was only a question of downloading and unpacking an archive file, not cloning an entire repo. Sheesh.

Why would Go need to do that, though? Isn't the go.mod file in a specific place relative to the package root in the repo?
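The Python side of this is easy to illustrate. Under PEP 658, an index can serve a wheel's core metadata as a separate file at the wheel's URL with .metadata appended, and that metadata is plain RFC 822-style headers, so dependencies can be read with the standard library alone. The sample metadata and URL below are made up for illustration.

```python
from email.parser import Parser


def metadata_url(wheel_url: str) -> str:
    # PEP 658: core metadata is served next to the wheel with ".metadata" appended.
    return wheel_url + ".metadata"


def requires_dist(metadata_text: str) -> list[str]:
    # Core metadata is header-formatted, so the stdlib email parser can read it.
    msg = Parser().parsestr(metadata_text, headersonly=True)
    return msg.get_all("Requires-Dist") or []


SAMPLE = (
    "Metadata-Version: 2.1\n"
    "Name: example-pkg\n"
    "Version: 1.0\n"
    "Requires-Dist: requests>=2\n"
    "Requires-Dist: idna\n"
    "\n"
)

print(metadata_url("https://files.example/example_pkg-1.0-py3-none-any.whl"))
print(requires_dist(SAMPLE))  # ['requests>=2', 'idna']
```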

klooney•1mo ago

Go's lock files arrived at around the same time as the proxy; before then, you didn't have transitive dependencies pre-baked.

fireflash38•1mo ago

How long ago were you having issues? That was changed in go 1.13.

c-linkage•1mo ago

This seems like a tragedy of the commons -- GitHub is free after all, and it has all of these great properties, so why not? -- but this kind of decision making occurs whenever externalities are present.

My favorite hill to die on (externality) is user time. Most software houses focus so much on how expensive engineering time is that they neglect user time. Software houses optimize for feature delivery, not user interaction time. Yet if I spend one hour making my app one second faster for my million users, I can save 277 user-hours per year. But since user hours are an externality, such optimization never gets done.
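The arithmetic behind the 277 figure, as a quick Python check: one second saved for each of a million users is a million seconds, about 277.8 hours. That matches the comment if each user hits the slow path once a year; the uses_per_year parameter is an added assumption, and at daily use the same one-second saving would come to roughly 101,000 user-hours a year.

```python
def user_hours_saved(seconds_saved: float, users: int, uses_per_year: int = 1) -> float:
    # Total seconds saved across all users in a year, converted to hours.
    return seconds_saved * users * uses_per_year / 3600


print(round(user_hours_saved(1, 1_000_000), 1))    # 277.8
print(round(user_hours_saved(1, 1_000_000, 365)))  # 101389
```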

Externalities lead to users downloading extra gigabytes of data (wasted time) and waiting for software, all of which is waste that the developer isn't responsible for and doesn't care about.

ekjhgkejhgk•1mo ago

I wouldn't call it a tragedy of the commons, because it's not a commons. It's owned by Microsoft. They're calculating that it's worth it for them, so I say take as much as you can.

Commons would be if it's owned by nobody and everyone benefits from its existence.

TeMPOraL•1mo ago

Still, because reality doesn't respect boundaries of human-made categories, and because people never define their categories exhaustively, we can safely assume that something almost-but-not-quite like a commons, is subject to an almost-but-not-quite tragedy of the commons.

reactordev•1mo ago

An A- is still an A kind of thinking. I like this approach as not everything perfectly fits the mold.

ttiurani•1mo ago

The whole notion of the "tragedy of the commons" needs to be put to rest. It's an armchair thought experiment that was disproven at the latest in the 90s by Elinor Ostrom with actual empirical evidence of commons.

The "tragedy", if you absolutely need to find one, is only for unrestricted, free-for-all commons, which is obviously a bad idea.

b00ty4breakfast•1mo ago

yeah, it's a post-hoc rationalization for the enclosure and privatization of said commons.

TeMPOraL•1mo ago

And here I thought the standard, obvious solution to tragedy of the commons is centralized governance.

dpark•1mo ago

People invoke the tragedy of the commons in bad faith to argue for privatization because “the alternative is communism”. i.e. Either an individual or the government has to own the resource.

This is of course a false dichotomy because governance can be done at any level.

AnthonyMouse•1mo ago

It also seems to omit the possibility that the thing could be privately operated but not for profit.

Let's Encrypt is a solid example of something you could reasonably model as "tragedy of the commons" (who is going to maintain all this certificate verification and issuance infrastructure?) but then it turns out the value of having it is a million times more than the cost of operating it, so it's quite sustainable given a modicum of donations.

Free software licenses are another example in this category. Software frequently has a much higher value than development cost and incremental improvements decentralize well, so a license that lets you use it for free but requires you to contribute back improvements tends to work well because then people see something that would work for them except for this one thing, and it's cheaper to add that themselves or pay someone to than to pay someone who has to develop the whole thing from scratch.

b00ty4breakfast•1mo ago

That is, in fact, how medieval commons were able to exist successfully for hundreds of years.

Saline9515•1mo ago

Ostrom showed that it wasn't necessarily a tragedy if the tight groups involved decided to cooperate. This is common in what we call "trust-based societies", which aren't universal.

Nonetheless, the concept is still alive, and anthropogenic global warming is here to remind you of it.

wongarsu•1mo ago

A high-trust community like a village can prevent a tragedy of the commons scenario. Participants feel obligations to the community, and misusing the commons actually does have real downsides for the individual because there are social feedback mechanisms. The classic examples like people grazing sheep or cutting wood are bad examples that don't really work.

But that doesn't mean the tragedy of the commons can't happen in other scenarios. If we define commons a bit more generously it does happen very frequently on the internet. It's also not difficult to find cases of it happening in larger cities, or in environments where cutthroat behavior has been normalized

TeMPOraL•1mo ago

> A high-trust community like a village can prevent a tragedy of the commons scenario. Participants feel obligations to the community, and misusing the commons actually does have real downsides for the individual because there are social feedback mechanisms.

That works while the size of the community is ~100-200 people, when everyone knows everyone else personally. It breaks down rapidly after that. We compensate for that with hierarchies of governance, which give rise to written laws and bureaucracy.

New tribes break off from old tribes and form alliances, which form larger alliances, and eventually you end up with countries and counties and voivodeships and cities and districts and villages, in hierarchies that gain a level per ~100x population increase.

This is sociopolitical history of the world in a nutshell.

lukan•1mo ago

"and eventually you end up with countries and counties and voivodeships and cities and districts and villages, in hierarchies that gain a level per ~100x population increase."

You say it as if this were a law set in stone, because this is what happened in history, but I would argue it happened under different conditions.

The main advantage of an empire over small villages/tribes is not at all that it has more power than the villages combined, but that it can concentrate its power where it is needed. One village did not stand a chance against the empire, and the villages were not coordinated enough.

But today we have the internet for better communication and coordination, enabling small entities to coordinate a defense.

Well, in theory, of course. Because we do not really have autonomous small states; we are dominated by the big players. The small states mostly have the choice of which bloc to align with, or get crushed. But the trend might go towards small again.

(See also cheap drones destroying expensive tanks, battleships etc.)

ajuc•1mo ago

The internet is working exactly the opposite way to what you're describing: it's making everything more centralized. Once we had several big media companies in each country and in each big city. Now we have Google and Facebook and TikTok and Twitter, and then the "whatevers".

NETWORK effect is a real thing

lukan•1mo ago

Yes, but there is a difference between having the choice of joining FB or not having a choice at all when the empire comes to claim you (like in Ukraine).

8note•1mo ago

FB is part of the empire though, and it is coming for us.

Canadians need an anti-imperial, Radio-Canada-run alternative. We aren't gonna be able to coordinate against the empire when the empire has the main control over the internet.

When the Americans come a-knocking, we're gonna wish we had Chinese radios.

vlovich123•1mo ago

I’ve heard stories from communist villages where everyone knew everyone. Communal parks and property were not respected and were frequently vandalized or otherwise neglected, because they didn’t have an owner and were treated as something for someone else to solve.

It’s easier to explain in those terms than assumptions about how things work in a tribe.

xorcist•1mo ago

> That works while the size of the community is ~100-200 people,

Yet we regularly observe that working with millions of people; we take care of our young, we organize, when we see that some action hurt our environment we tend to limit its use.

It's not obvious why some societies break down early and some go on working.

AnthonyMouse•1mo ago

I get the feeling it's the combination of Schelling points and surplus. If everyone else is being pro-social, i.e. there is a culture of it, and people aren't so hard up that they can't afford to do the same, then that's what happens, either by itself (Hofstadter's theory of superrationality) or via nothing more than light social pressure.

But if a significant fraction of the population is barely scraping by then they're not willing to be "good" if it means not making ends meet, and when other people see widespread defection, they start to feel like they're the only one holding up their end of the deal and then the whole thing collapses.

This is why the tendency for people to propose rent-seeking middlemen as a "solution" to the tragedy of the commons is such a diabolical scourge. It extracts the surplus that would allow things to work more efficiently in their absence.

TeMPOraL•1mo ago

> Yet we regularly observe that working with millions of people; we take care of our young, we organize, when we see that some action hurt our environment we tend to limit its use.

That's more like human universals. These behaviors generally manifest to smaller or larger degree, depending on how secure people feel. But those are extremely local behaviors. And in fact, one of them is exactly the thing I'm talking about:

> we organize

We organize. We organize for many reasons; "general living" is the main one, but we're mostly born into it today (few got the chance to be among the founding people of a new village, city or country). But the same patterns show up in every other organization people create, from companies to charities, from political interest groups to rural housewives' circles -- groups that grow past ~100 people split up. Sometimes into independent groups, sometimes into levels of hierarchies. Observe how companies have regional HQs and departments and areas and teams; religious groups have circuits and congregations, etc. Independent organizations end up creating joint ventures and partnerships, or merge together (and immediately split into a more complex internal structure).

The key factor here is, IMO, for everyone in a given group to be in regular contact with everyone else. Humans are well evolved for living in such small groups - we come with built-in hardware and software to navigate complex interpersonal situations. Alignment around shared goals and implicit rules is natural at this scale. There's no space for cheaters and free-loaders to thrive, because everyone knows everyone else - including the cheater and their victims. However, once the group crosses this "we're all a big family, in it together" size, coordinating everyone becomes hard, and free-loaders proliferate. That's where explicit laws come into play.

This pattern repeats daily, in organizations people create even today.

lo_zamoyski•1mo ago

Even here, the state is the steward of the common good. It is a mistaken notion that the state only exists because people are bad. Even if people were perfectly conscientious and concerned about the common good, you still need a steward. It simply wouldn’t be a steward who would need to use aggressive means to protect the common good from malice or abuse.

ttiurani•1mo ago

> But that doesn't mean the tragedy of the commons can't happen in other scenarios.

Commons can fail, but the whole point of Hardin calling commons a "tragedy" is to suggest it necessarily fails.

Compare it to, say, driving. It can fail too, but you wouldn't call it "the tragedy of driving".

We'd be much better off if people didn't throw around this zombie term decades after it's been shown to be unfounded.

jandrewrogers•1mo ago

> A high-trust community like a village can prevent a tragedy of the commons scenario.

No, it does not. This sentiment, which many people have, is based on a fictional and idealistic notion of what small communities are like, held by people who have never lived in such communities.

Empirically, even in high-trust small villages and hamlets where everyone knows everyone, the same incentives exist and the same outcomes happen. Every single time. I lived in several and I can't think of a counter-example. People are highly adaptive to these situations and their basic nature doesn't change because of them.

Humans are humans everywhere and at every scale.

yellow_postit•1mo ago

While an earlier poster is overstating Ostrom’s Nobel Prize-winning work, it is regularly shown that averting the tragedy of the commons is not as insurmountable as the original coining of the phrase implied.

gmfawcett•1mo ago

Ostrom's results didn't disprove ToC. She showed that common resources can be communally maintained, not that tragic outcomes could never happen.

8note•1mo ago

I don't think anything can disprove that ToC issues can happen under any situation.

That seems like an unreasonable bar, and less useful than "does this system make ToC less frequent than that system?"

dpark•1mo ago

She did not “disprove” the existence of the tragedy of the commons. What she established was that controlling the commons can be done communally rather than through privatization or through government ownership.

Communal management of a resource is still government, though. It just isn’t central government.

The thesis of the tragedy of the commons is that an uncontrolled resource will be abused. The answer is governance at some level, whether individual, collective, or government ownership.

> The "tragedy", if you absolutely need to find one, is only for unrestricted, free-for-all commons, which is obviously a bad idea.

Right. And that’s what people are usually talking about when they say “tragedy of the commons”.

lo_zamoyski•1mo ago

There is an analogy in the sense that for the users a resource is, for certain practical intents and purposes, functionally common. Social media is like this as well.

But I would make the following clarifications:

1. A private entity is still the steward of the resource and therefore the resource figures into the aims, goals, and constraints of the private entity.

2. The common good is itself under the stewardship of the state, as its function is guardian of the common good.

3. The common good is the default (by natural law) and prior to the private good. The latter is instituted in positive law for the sake of the former by, e.g., reducing conflict over goods.

TeMPOraL•1mo ago

> There is an analogy in the sense that for the users a resource is, for certain practical intents and purposes, functionally common. Social media is like this as well.

I think it's both simpler and deeper than that.

Governments and corporations don't exist in nature. Those are just human constructs, mutually-recursive shared beliefs that emulate agents following some rules, as long as you don't think too hard about this.

"Tragedy of the commons" is a general coordination problem. The name itself might've been coined with some specific scenarios in mind, but for the phenomenon itself, it doesn't matter what kind of entities exploit the "commons"; the "private" vs. "public" distinction itself is neither a sharp divide, nor does it exist in nature. All that matters is that there's some resource used by several independent parties, and each of them finds it more beneficial to defect than to cooperate.

In a way, it's basically a 3+-player prisoner's dilemma. The solution is the same, too: introducing a party that forces all other parties to cooperate. That can be a private or public or any other kind of org taking ownership of the commons and enforcing quotas, or in the case of prisoners, a mob boss ready to shoot anyone who defects.

lo_zamoyski•1mo ago

It was not my intent to be exhaustive, but to make a few points that left it up to the reader to relate them appropriately to your post in order to enrich thinking about the subject.

But it appears we cannot avoid getting into the weeds a bit…

> Governments and corporations don't exist in nature.

This is not as simple as you seem to think.

The claim “don’t exist in nature” is vague, because the word “nature” in common speech is vague. What is “natural”? Is a beehive “natural”? Is a house “natural”? Is synthetic water “natural”? (I claim that the concept of “nature” concerns what it means to be some kind of thing. Perhaps polystyrene has never existed before human beings synthesized it, but it has a nature, that is, it means something to be polystyrene. And it is in the nature of human beings to make materials and artifacts, i.e., to produce technology ordered toward the human good.)

So, what is government? Well, it is an authority whose central purpose is to function as the guardian and steward of the common good. I claim that parenthood is the primordial form of human government and the family is the primordial form of the state. We are intrinsically social and political animals; legitimate societies exist only when joined by a common good. This is real and part of human nature. The capacity to deviate from human nature does not disprove the norm inherent to it.

Now, procedurally we could institute various particular and concrete arrangements through which government is actualized. We could institute a republican form of government or a monarchy, for example. These are historically conditioned. But in all cases, there is a government. Government qua government is not some arbitrary “construct”, but something proper to all forms and levels of human society.

> "Tragedy of the commons" is a general coordination problem.

We can talk about coordination once we establish the ends for which such coordination is needed, but there is something more fundamental that must be said about the framing of the problem of the “tragedy”. The framing does not presume a notion of human beings as moral agents and political and social creatures. In other words, it begins with a highly individualist, homo economicus view of human nature as rationally egoist and oriented toward maximizing utility, full stop. But I claim that is not in accord with human nature and thus the human good, even if people can fall into such pathological patterns of behavior (especially in a culture that routinely reinforces that norm).

As I wrote, human beings are inherently social animals. We cannot flourish outside of societies. A commons that suffers this sort of unhinged extraction is an example of a moral and a political failure. Why? Because it is unjust, intemperate, and a lack of solidarity to maximize resource extraction in that manner. So the tragedy is a matter of a) the moral failure of the users of that resource, and b) the failure of an authority to regulate its use. The typical solution that’s proposed is either privatization or centralization, but both solutions presuppose the false anthropology of homo economicus. (I am not claiming that privatization does not have a place, only that the dichotomy is false.)

Now, I did say that the case with something like github is analogical, because functionally, it is like a common resource, just like how social media functions like a public square in some respects. But analogy is not univocity. Github is not strictly speaking a common good, nor is social media strictly a public square, because in both cases, a private company manages them. And typically, private goods are managed for private benefit, even if they are morally bound not to harm the common good.

That intent, that purpose, is central to determining whether something is public or private, because something public has the common benefit as its aim, while something private has private benefit as its aim.

bee_rider•1mo ago

That seems to assume some sort of… maybe unfounded linearity or something? I mean, I’m not sure I agree that GitHub is nearly a commons in any sense, but let’s put that aside as a distraction…

The idea of the tragedy of the commons relies on this feedback loop of having these unsustainably growing herds (growing because they can exploit the zero-cost-to-them resources of the commons). Feedback loops are notoriously sensitive to small parameter changes. MS could presumably impose some damping if they wanted.

TeMPOraL•1mo ago

> That seems to assume some sort of… maybe unfounded linearity or something

Not linearity but continuity, which I think is a well-founded assumption, given that it's our categorization that simplifies the world by drawing sharp boundaries where no such bounds exist in nature.

> The idea of the tragedy of the commons relies on this feedback loop of having these unsustainably growing herds (growing because they can exploit the zero-cost-to-them resources of the commons)

AIUI, zero-cost is not a necessary condition, a positive return is enough. Fishermen still need to buy fuel and nets and pay off loans for the boats, but as long as their expected profit is greater than that, they'll still overfish and deplete the pond, unless stronger external feedback is introduced.

Given that the solution to tragedy of the commons is having the commons owned by someone who can boss the users around, GitHub being owned by MS makes it more of a commons in practice, not less.

kortilla•1mo ago

No, it’s not a well-founded assumption. Many categories like these were created in the first place because there is a very obvious discontinuous step change in behavior.

You’re fundamentally misunderstanding what tragedy of the commons is. It’s not that it’s “zero-cost” for the participants. All it requires is a positive return that has a negative externality that eventually leads to the collapse of the system.

Overfishing and CO2 emissions are very clearly a tragedy of the commons.

GitHub right now is not. People putting all sorts of crap on there is not hurting github. GitHub is not going to collapse if people keep using it unbounded.

Not surprisingly, this is because it’s not a commons and Microsoft oversees it, placing appropriate rate limits and whatnot to make sure it keeps making sense as a business.

thayne•1mo ago

And indeed MS/GitHub does impose some "damping" in the form of things like API request throttling, CPU limits on CI, asking Homebrew not to use shallow cloning, etc. And those limits are one of the reasons given why using git as a database isn't good.

jasonkester•1mo ago

It has the same effect though. A few bad actors using this “free” thing can end up driving the cost up enough that Microsoft will have to start charging for it.

The jerks get their free things for a while, then it goes away for everyone.

Y_Y•1mo ago

I think the jerks are the ones who bought and enshittified GitHub after it had earned significant trust and become an important part of FOSS infrastructure.

irishcoffee•1mo ago

Scoping it to a local maximum, the only thing worse than git is GitHub. In an alternate universe hg won the clone wars and we are all better off for it.

MarsIronPI•1mo ago

Excuse me if this is obvious, but how is Mercurial better than Git from a repo format perspective?

dahart•1mo ago

Why do you blame MS for predictably doing what MS does, and not the people who sold that trust & FOSS infra to MS for a profit? Your blame seems misplaced.

And out of curiosity, aside from costing more for some people, what’s worse exactly? I’m not a heavy GitHub user, but I haven’t really noticed anything in the core functionality that would justify calling it enshittified.

mastax•1mo ago

Plenty of blame to go around.

Probably the worst thing MS did was kill GitHub’s nascent CI project and replace it with Azure DevOps. Though to be fair the fundamental flaws with that approach didn’t really become apparent for a few years. And GitHub’s feature development pace was far too slow compared to its competitors at the time. Of course GitHub used to be a lot more reliable…

Now they’re cramming in half-baked AI stuff everywhere, but that’s hardly an MS-specific sin.

MS GitHub has been worse about DMCA and sanctioned-country takedowns than I remember pre-acquisition GitHub being.

Did I miss anything?

Y_Y•1mo ago

I don't blame them uniquely. I think it's a travesty the original GitHub sold out, but it's just as predictable. Giant corps will evilly make the line go up, individual regular people will have a finite amount of money for which they'll give up anything and everything.

As for how the site has become worse, plenty of others have already done a better job than I could there. Other people haven't noticed or don't care and that's ok too I guess.

PunchyHamster•1mo ago

Well, till you choose to host something yourself and it becomes popular

rvba•1mo ago

I doubt anyone is calculating

Remember how GTA5 took 10 minutes to start and nobody cared? Lots of software is like this.

Some Blizzard games download a 137 MB file every time you run them and take a few minutes to start (and no, this is not due to my computer).

ericyd•1mo ago

Tragedy of the Microsoft just doesn't sound as nice though

groundzeros2015•1mo ago

A public park suffers from tragedy of the commons even though it’s managed by the city.

dahart•1mo ago

> so I say take as much as you can. Commons would be if it’s owned by nobody

This isn’t what “commons” means in the term ‘tragedy of the commons’, and the obvious end result of your suggestion to take as much as you can is to cause the loss of access.

Anything that is free to use is a commons, regardless of ownership, and when some people use too much, everyone loses access.

Finite digital resources like bandwidth and database sizes within companies are even listed as examples in the Wikipedia article on Tragedy of the Commons. https://en.wikipedia.org/wiki/Tragedy_of_the_commons

nkmnz•1mo ago

No, the word and its meaning both point to the fact that there’s no exclusive ownership of a commons. This is important, since ownership is associated with bearing the cost of usage (i.e., depreciation), which would lead an owner to avoid the tragedy of the commons. Ownership is regularly the solution to the tragedy (socialism didn’t work).

The behavior that you warn against is that of a free rider that makes use of a positive externality of GitHub’s offering.

dahart•1mo ago

That is one meaning of “commons”, but not all of them, and you might be mistaking which one the phrase ‘tragedy of the commons’ is using.

“Commons can also be defined as a social practice of governing a resource not by state or market but by a community of users that self-governs the resource through institutions that it creates.”

https://en.wikipedia.org/wiki/Commons

The actual mechanism by which ownership resolves tragedy of the commons scenarios is by making the resource non-free, by either charging, regulating, or limiting access. The effect still occurs when something is owned but free, and its name is still ‘tragedy of the commons’, even when the resource in question is owned by private interests.

bawolff•1mo ago

How does that differ from what the person you are arguing against is saying?

dahart•1mo ago

Ownership, I guess. The 2 parent comments are claiming that “tragedy of the commons” doesn’t apply to privately owned things. I’m suggesting that it does.

Edit: oh, I do see what you mean, and yes I misunderstood the quote I pulled from WP - it’s talking about non-ownership. I could pick a better example, but I think that’s distracting from the fact that ‘tragedy of the commons’ is a term that today doesn’t depend on the definition of the word ‘commons’. It’s my mistake to have gotten into any debate about what “commons” means, I’m only saying today’s usage and meaning of the phrase doesn’t depend on that definition, it’s a broader economic concept.

nkmnz•1mo ago

No, it’s not.

dahart•1mo ago

What’s not what? Care to back up your argument with any links? I already pointed out that examples in the WP article for ‘Tragedy of the Commons’ use private property. https://en.wikipedia.org/wiki/Tragedy_of_the_commons#Digital... Are you contradicting the Wikipedia article? Why, and on what basis?

bawolff•1mo ago

I'm not sure I agree that the Wikipedia article supports your position.

Certainly private property is involved in tragedy of the commons. In the classic shared cattle ranching example, the individual cattle are private property, only the field is held in common.

I generally think that tragedy of the commons requires the commons to, well, be held in common. If someone owns the thing that is the commons, it's not a commons but just a bad product. (With of course some nitpicking about how things can be de jure private property while being de facto common property.)

In the Microsoft example, Windows becoming shitty software is not a tragedy of the commons; it's just MS making a business decision, because Windows is not a commons. On the other hand, computing in general becoming shitty because each individual app uses attention-grabbing dark patterns, since it helps the individual app's bottom line while hurting the ecosystem as a whole, would be a tragedy of the commons, as user attention is something all apps hold in common and none of them own.

dahart•1mo ago

One of the examples of digital commons in the article is Wikipedia itself, which is privately owned, so now you can be sure the Wikipedia article does back up my claim at least a little.

The Microsoft example in this subthread is GitHub, not Windows. Windows is not a digital commons, because it’s neither free nor finite. GitHub is (or was) both. Those are the criteria that Wikipedia is using to apply the descriptor ‘commons’: something that is both freely available to the public and comes in limited supply, e.g. bandwidth, storage, databases, compute, etc.

Wikipedia’s article seems to be careful to not discuss ownership nor define the tragedy of the commons in terms of ownership, presumably because the phrase describes something that can still happen when privately owned things are made freely available. I skimmed Investopedia’s article on Tragedy as well, and it seems similarly to not explicitly discuss ownership, and even brings up the complicated issue of lack of international commons. That’s an interesting point: whatever we call commons locally may not be a commons globally. That suggests that even the original classic notion of tragedy of the commons often involves a type of private ownership, i.e. overfishing a “public” lake is a lake owned by a specific country, cattle overusing a “public” pasture is land owned by a specific country, and these resources might not be truly common when considered globally.

nkmnz•1mo ago

The use of Github is not "something that is both freely available to the public". If you're not the customer, you're the product.

dahart•1mo ago

What use of GitHub are you talking about? The use of GitHub by @c-linkage at the top of the thread was, in fact, based on GitHub being free to use. And GitHub’s basic services are free to use. I really don’t know what you mean.

Your oft-repeated customer vs product platitude doesn’t seem to apply to GitHub, at least not to its founding and core product offering. You are the customer, and GitHub doesn’t advertise. It’s a freemium model; the free access is just a sort of loss leader to entice paid upgrades by you, the customer.

nkmnz•1mo ago

> The use of GitHub by @c-linkage at the top of the thread was, in fact, based on GitHub being free to use.

The mere fact that we're discussing this here is advertisement for Github's services. Q.e.d.

dahart•1mo ago

How so? And what’s the relevance to this thread?

nkmnz•1mo ago

I'm contradicting your interpretation of the Wikipedia article. It does not support your initial statement that a) Github's (or any other company's) free tier constitutes a commons and/or b) the "overuse" of said free tiers by free riders could be the basis of a tragedy of the commons (ToC). The idea is absurd, since there is no commons and also no tragedy. To the contrary. Commons have an external or natural limit to how much they can provide in a given time without incurring cost in the form of depreciation. But there is no external or natural limit to the free tier. The free tier is the result of the incentives under which the Github management operates and it is fully at their discretion, so the limits are purely internal. Unlike in the case of commons, more usage can actually increase the amount of resources provided by the company for the users of the free tier, because of a) network effects and b) economies of scale (more users bring more other users; more users cost less per user).

If Github realizes that the free tier is too generous, they can cut it anytime without it being in any way a "tragedy" for anybody involved - having to pay for stuff or service you want to consume is not the "T" in ToC! The T is that there are no incentives to pay (or use less) without increasing the incentives for everyone else to just increase their relative use! You not using the github free tier doesn't increase the usage of Github for anybody else - if it has any effect at all, it might actually decrease the usage of Github because you might not publish something that might in turn attract other users to interact.

dahart•1mo ago

Wikipedia does use Wikipedia, a privately owned organization, as an example of a digital commons.

The ‘tragedy’ that the top comment referred to is losing unlimited access to some of GitHub’s features, as described in the article (shallow clones, CPU limits, API rate limits, etc.). The finiteness, or natural limit, does exist in the form of bandwidth, storage capacity, server CPU capacity, etc. The Wikipedia article goes through that, so I’m left with the impression you didn’t understand it.

bawolff•1mo ago

> Wikipedia does use Wikipedia, a privately owned organization

The Wikimedia organization does not actually own wikipedia. They do not control editorial policy nor own the copyright of any of the contents. They do not pay any of the editors.

dahart•1mo ago

They do own the servers. The rest of your comment is what demonstrates why Wikipedia counts as “commons”. Much of the same can be said for GitHub too.

nkmnz•1mo ago

It is really annoying that you're shifting the goal post by bringing up Wikipedia (as an example, not the article), which is very much different from Github in many ways. Still, Wikipedia is not a common good in my book, but at least in the case of Wikipedia I can understand the reasoning and it's a much more interesting case.

But let's stick with Github. On which of the following statements can we agree?

Z1) A "Commons" is a system of interacting market participants, governed by shared interests and incentives (and sometimes shared ownership). Github, a multi-billion-dollar subsidiary of the multi-trillion-dollar company Microsoft, and I, their customer, are not members of the same commons; we don't share many interests, we have vastly different incentives, and we certainly do not share any ownership. We have a legally binding contract that each side can cancel within the boundaries of said contract under the applicable law.

Z2) A tragedy in the sense of the Tragedy of the Commons is that something bad happens even though everyone can have the best intentions, because the system lacks a mechanism that would allow it to a) coordinate interests and incentives across time, and b) reward sustainable behavior instead of punishing it.

A) Github giving away stuff for free while covering the cost does not constitute a common good from... 1. a legal perspective 2. an ethical perspective 3. an economic perspective

B) If a free tier is successful, a profit-maximizing company with a market penetration far from saturation will increase the resources provided in total, while there is no such mechanism or incentive for any participant in a market involving a common good, e.g. no one will provide additional pasture for free if a commons (Allmende) is already being destroyed through overgrazing.

C) If a free tier is unsuccessful because it costs more than it enables in new revenue, a company can simply shut it down – no tragedy involved. No server has been depreciated, no software destroyed, no user lost their share of a commonly owned good.

D) More users of a free tier reduce net loss / increase net earnings per free user for the provider, while more cattle grazing on a pasture decrease net earnings / increase net loss per cow.

E) If I use less of Github, you don't have any incentive to use more of it. This is the opposite of a commons, where one participant taking less of it puts out an incentive to everybody else to take their place and take more of it.

F) A service that you pay for with your data, your attention, your personal or company brand and reach (e.g. with public repositories), is not really free.

G) The tiny product samples that you can get for free in perfume shops do not constitute a common good, even though they are limited, "free" for the user, and presumably beneficial even for people not involved in the transaction. If you think they were a common good, what about Nestlé offering Cheerios with +25% more for free? Are those 20% a common good just because they are free? Where do you draw the line? Paying with data, attention, and brand + reach is fine, but paying for only 80% of the product is not fine?

H) The concepts of "moral hazard" and "free riders" apply to all your examples, both Github and Wikipedia. The concept of a Commons (capital C) is neither necessary nor helpful in describing the problems that you want to describe wrt free services provided by either Github or Wikipedia.

dahart•1mo ago

Nope, no goal posts were moved, Wikipedia and GitHub are both private entities that offer privately funded free services to everyone, and due to the widespread free access, both have been considered to be examples of digital commons by others. I didn’t make up the Wikipedia example, it’s in Wikipedia being offered as one of the canonical examples of digital commons, and unfortunately for you it pokes a hole in your argument. If your ‘book’ disagrees with the WP article, you’re free to fix it (since WP is a digital commons), and you’re also free to use it to re-evaluate whether your book needs updating.

You seem to be stuck on definitions of ‘commons’, and unfortunately that’s not a compelling argument for reasons I’ve already stated. Also unfortunate that there are fundamental terminology flaws, or made up definitions, or straw men arguments, or incorrect statements, or opinions in every single item you listed.

“Tragedy of the Commons” is a phrase that became an economic term of art a long time ago. It’s now an abstract concept, and gets used to mean (as well as defined by) any situation in which a community of people overusing shared resources causes any loss of access to those shared resources for anyone else in the community. “The tragedy of the commons is an economic theory claiming that individuals tend to exploit shared resources so that demand outweighs supply, and it becomes unavailable for the whole.” (Investopedia) I’ve already cited multiple sources that define it that way, and so far you’ve shared no evidence to the contrary.

There are also tons of examples online where the phrase has been used to refer to small, local, or privatized resources, I found a dozen in like one minute, so I already know it’s incorrect to claim that people don’t use the phrase in the way I’m suggesting.

Even though the phrase does not depend on any strict definition of commons (or of tragedy), none of your argument addresses the fact that what’s common in, say, Germany is not freely available to Iranians, for example. Land is often used in ‘tragedy of the commons’ examples. Hardin’s original example was sheep grazing on “public” land, and yet there is really no such thing as common land anywhere on this planet; all of it is claimed by subgroups, e.g., countries, and is private in some sense. The idea of commons, and even some of the alternate dictionary definitions, make explicit note that the word is relative to a specific community of people. Nothing you’ve said addresses that fact, and it means that ‘Tragedy of the Commons’ has always referred to resources that are not common in a global context. GitHub and Wikipedia are more common than “public” land in America in that global sense, because they’re used by and available to more people than US land is.

What I can agree with is that it’s common for people to mean things like land, air, and water, when using or referring to the phrase, and I agree those things count as commons.

nkmnz•1mo ago

You're confusing public goods with common goods. That's your personal tragedy of the commons.

> “The tragedy of the commons is an economic theory claiming that individuals tend to exploit shared resources so that demand outweighs supply, and it becomes unavailable for the whole.” (Investopedia)

EXACTLY. This is NOT what is happening in the case of Github. As explained plenty of times, Github has the incentive to INCREASE their supply, making MORE available for the whole, if the whole demands MORE. Also, they are a centralized, coordinated entity that can change the rules for the whole flock, which solves one of the famous coordination problems associated with common goods. They can also discriminate between their contractual partners and optimize for multi-period results to reduce moral hazard and free-riding. It must be stupidity to not see these fundamental differences on the systems level.

> I didn’t make up the Wikipedia example, it’s in Wikipedia being offered as one of the canonical examples of digital commons

Yeah, the example in the article is Wikipedia, not Github. That's your example. All my statements refer 100% to Github and probably only 90% to Wikipedia. That said, there are true digital commons, e.g. the copper cables connecting the houses in your street, or the insufficient number of bands in old Wi-Fi standards.

Since Dunning-Kruger has entered the chat, I'm going to leave. Have a good day; you will have a hard time having serious conversations if you do not accept that it helps everyone to favor precise language over watering down the meaning of concepts, like some social scientists and journalists seem to prefer for self-marketing purposes.

dahart•1mo ago

> You’re confusing public goods with common goods.

Am I? Where did I do that? The distinction between common and public is defined as whether or not the thing can succumb to tragedy of the commons. If public goods are “non-rivalrous”, then land is not a public good, it’s a common good, right? And “common” land is owned by nation states, or by smaller geographic communities, is it not? Therefore, ownership is always involved and the land is not available for use by people from other nation states, right?

Above, you said “there’s no exclusive ownership of a commons”. But sheep grazing on “commons” land is generally land owned exclusively by a country, nation, state, province, city, etc. I assume what you meant was that no one person or sub-group within the geographical community owns the commons.

> This is NOT what is happening in the case of GitHub.

That’s not true, the article we’re commenting on gave examples of at least three different specific things that GitHub has limited in response to overuse, and the comment that started this thread was reacting to that fact. If they have incentive to increase their supply, why didn’t they actually do it? Logic can’t override history.

> there are true digital commons, e.g. the copper cables connecting the houses in your street

That’s not true, that’s not a commons at all, and not what the phrase “digital commons” means. In the US, the cables are owned by the telecom providers that installed them; they are private property. Maybe there are public cables where you live, but in that case, it seems like maybe you are the one confusing public and common goods. The phrase ‘digital commons’ generally speaking refers to digital goods, not physical goods. (But there is some leakage into the physical world, which is why some digital commons are susceptible to the tragedy of the commons.) https://en.wikipedia.org/wiki/Digital_commons (Do note that GitHub is listed there as an example of a digital commons.)

> It must be stupidity to not see these fundamental difference on the systems level

FWIW, you’ve flatly broken HN guidelines here, and this reflects extremely poorly on you and your argument. From my point of view, I can only interpret this lack of civility to mean you’re frustrated about not being able to answer my questions or form a convincing argument.

Please review, and strive for better: https://news.ycombinator.com/newsguidelines.html

xp84•1mo ago

GP shouldn't have said something insulting, but I do think it's you who are being obtuse here in not acknowledging that this is at least very different than the field everyone can graze on that gets overgrazed, that is the most simple and widely-accepted type of commons. It's probably not worth arguing semantics at all ("is this a commons?") because there isn't a "Tragedy of the Commons" central authority that could ever adjudicate that. Any definition of commons could be used; the only thing that matters is if the definitions are useful to define what's going on and to compare it to other situations.

In this case, GitHub can very cheaply add enforceable rules and force heavy users to consume only what they consider a tolerable amount of resources. The majority who don't need an outsized amount of resources will never be affected by this. That is why there is no 'tragedy' here.

It would be as if the grazing field were outfitted with sheep-facial-recognition and could automatically and at trivial cost, gently drone-airlift any sheep outside the field after they consume 3x what a normal sheep eats each day. In what most of us think of as a ToC situation, there is little that can be done besides closing the field or subdividing it into tiny, private plots which are policed.

dahart•1mo ago

The singular point of debate here from my side has been whether the phrase ‘tragedy of the commons’ applies to cases where the ‘commons’ are owned to the exclusion of some people, and nothing else. I don’t believe I have failed to acknowledge the differences between physical and digital commons, but let me correct that impression now: GitHub certainly is very different from a sheep-grazing field in almost every way. GitHub is even different from Wikipedia in many ways, just like GP said. I am arguing those differences, no matter how large, do not matter purely in terms of whether you can call these a ‘commons’, and I’ve supported that opinion by showing evidence that other people call both GitHub and Wikipedia a ‘digital commons’. If any definition of commons can be used, including privately owned land that is made available to the public, then I think you and I agree completely. The Wikipedia article about this phrase actually points out what I’ve been saying here, that common land does not exist.

There is a central authority on this topic: the paper by Hardin that coined the phrase. It’s worth a read. He defined ‘tragedy’ to be in the dramatic sense, e.g., a Greek or Shakespearean tragedy: “We may well call it ‘the tragedy of the commons,’ using the word ‘tragedy’ as the philosopher Whitehead used it: ‘The essence of dramatic tragedy is not unhappiness. It resides in the solemnity of the remorseless working of things.’”

Hardin did not define ‘commons’, but he used multiple examples of things that are owned to the exclusion of others, and he even pointed out that a bank robber thinks of a bank as a commons. He himself blurred the line of what a commons means, and his actual argument depends only on the idea that commons means something shared and nothing more. In fact, he was making a point about human behavior, and his argument is stronger when ‘commons’ refers to any shared resources that can be exhausted by overuse at all. Hardin would have had a good chuckle over this extremely silly debate.

The actual points Hardin was making behind his phrase ‘Tragedy of the Commons’ were that Adam Smith’s ‘Invisible Hand’ economics, and Libertarian thinking, are provably wrong, and that we should abolish the UN’s Universal Declaration of Human Rights, specifically the right to breed freely, because he believes these things will certainly lead to overpopulation of the earth and thus increased human suffering. The only actual ‘commons’ he truly cared about in this paper is the earth’s space and food supply. The question of ownership is wholly and utterly irrelevant to his phrase.

GitHub adding rules that curtail people does limit some people’s access; that’s the point. How many people it affects I don’t know, and I don’t think it’s especially relevant, but note that in this case one single GitHub user being limited might affect many, many people: Homebrew was one of the examples.

“Tragedy” never referred to the magnitude of the problem, as you and GP are assuming. Hardin’s “tragedy” refers to the human character flaw of thinking that shared things are preferable to limitations, because he argues that we end up with uncontrolled (worse) limitations anyway. His “tragedy” is the inevitability of loss, the irony of misguided belief in the very idea of a commons.

drob518•1mo ago

Right. Microsoft could easily impose a transfer fee if over a certain amount that would allow “normal” OSS development of even popular software to happen without charge while imposing a cost to projects that try to use GitHub like a database.
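A fee like that is easy to express: charge only for transfer above a free allowance. A minimal sketch of such a pricing rule; the allowance and per-GB price are made-up placeholders, not GitHub's or Microsoft's pricing:

```python
def monthly_transfer_fee(gb_transferred: float, free_gb: float, price_per_gb: float) -> float:
    """Hypothetical pricing: only the transfer above a free allowance is billed."""
    billable = max(0.0, gb_transferred - free_gb)
    return billable * price_per_gb


# Placeholder numbers: 1 TB free per month, $0.01 per GB beyond that.
print(monthly_transfer_fee(200, free_gb=1_000, price_per_gb=0.01))    # 0.0
print(monthly_transfer_fee(5_000, free_gb=1_000, price_per_gb=0.01))  # roughly 40 dollars
```

Typical projects stay under the allowance and never see a bill; only database-style heavy consumers pay.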

TUSF•1mo ago

I wouldn't call it "tragedy of the commons" because the very idea was coined as a strawman. As far as I'm concerned, the entire concept is a fallacy, and people should stop perpetuating it.

loloquwowndueo•1mo ago

Just a reminder that GitHub is not git.

The article mentions that most of these projects did use GitHub as a central repo out of convenience, so there’s that, but they could also have used self-hosted repos.

justincormack•1mo ago

They probably would have experienced issues way sooner, as the self hosted tools don't scale nearly as well.

machinationu•1mo ago

Explain to me how you self-host a git repo which is accessed millions of times a day from CI jobs pulling packages.

ozim•1mo ago

FTFY:

Explain to me how you self-host a git repo without spending any money and having no budget which is accessed millions of times a day from CI jobs pulling packages.

freedomben•1mo ago

I'm not sure whether this question was asked in good faith, but it is actually a damn good one.

I've looked into self-hosting a git repo that has horizontal scalability, and it is indeed very difficult. I don't have the time to detail it in a comment here, but for anyone who is curious, it's very informative to look at how GitLab handled this with Gitaly. I've also seen some clever attempts to use object storage, though I haven't seen any of those solutions put heavily to the test.

I'd love to hear from others about ideas and approaches they've heard about or tried.

https://gitlab.com/gitlab-org/gitaly

adrianN•1mo ago

You `git init --bare` on a host with sufficient resources. But I would recommend thinking about your CI flow too.

machinationu•1mo ago

No, hundreds of thousands of individual projects' CI jobs. OP was talking about package managers for the whole world, not for one company.

adrianN•1mo ago

If people depend on remote downloads from different companies for their CI pipelines they’re doing it wrong. Every sensible company sets up a mirror or at least a cache on infra that they control. Rate limiting downloads is the natural course of action for the provider of a package registry. Once you have so many unique users that even civilized use of your infrastructure becomes too much you can probably hire a few people to build something more scalable.
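Per-client rate limiting of the kind described above is commonly implemented as a token bucket. A minimal sketch; the class and its parameters are illustrative, not any registry's actual implementation, and the injectable `clock` is only there so the behavior can be tested deterministically:

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a short burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A registry would keep one bucket per client (for example per token or per IP) and answer HTTP 429 when `allow()` returns False; `capacity` sets the tolerated burst and `rate` the sustained throughput.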

machinationu•1mo ago

numpy had 16M downloads yesterday; at 10 MB each, that's 160 TB of traffic. It's one package. And there are no rate limits on PyPI.
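Checking that figure, assuming decimal units and the ~10 MB per download used above (real wheel sizes vary by platform), and converting the daily total into the sustained bandwidth it implies:

```python
def daily_traffic_tb(downloads_per_day: int, size_mb: float) -> float:
    """Total egress per day in decimal terabytes (1 TB = 1,000,000 MB)."""
    return downloads_per_day * size_mb / 1_000_000


def average_gbit_per_s(tb_per_day: float) -> float:
    """Average rate needed to serve `tb_per_day`, in gigabits per second."""
    return tb_per_day * 1_000 * 8 / 86_400  # TB -> GB -> Gbit, spread over one day


tb = daily_traffic_tb(16_000_000, 10)
print(tb)                                  # 160.0
print(round(average_gbit_per_s(tb), 1))    # 14.8
```

So one package alone averages roughly 15 Gbit/s around the clock, before accounting for peaks.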

https://clickpy.clickhouse.com/dashboard/numpy

fweimer•1mo ago

These days, people solve similar problems by wrapping their data in an OCI container image and distribute it through one of the container registries that do not have a practically meaningful pull rate limit. Not really a joke, unfortunately.
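"Wrapping data in an OCI image" boils down to a content-addressed manifest that points at blobs by sha256 digest, not unlike git's own object model. A minimal sketch that only builds the manifest JSON for one data layer and pushes nothing (a real push would go through a registry client or a tool such as ORAS); using the OCI 1.1 "empty" config media type for non-runnable data is the assumption here:

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """An OCI content descriptor: media type, sha256 digest, and size of a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def data_manifest(layer_tar_gz: bytes) -> dict:
    """Minimal OCI image manifest wrapping one gzipped tar layer of arbitrary data."""
    empty_config = b"{}"  # OCI 1.1 "empty" config, for artifacts never run as containers
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": descriptor("application/vnd.oci.empty.v1+json", empty_config),
        "layers": [descriptor("application/vnd.oci.image.layer.v1.tar+gzip", layer_tar_gz)],
    }


print(json.dumps(data_manifest(b"payload"), indent=2))
```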

mystifyingpoi•1mo ago

Even Amazon encourages this, probably unintentionally, more as a band-aid for bad EKS configs that people create by mistake, but still: you can pull 5 terabytes from ECR for free under their free tier each month.

XorNot•1mo ago

I'd say it's just that Kubernetes in general should've shipped with a storage engine and an installation mechanism.

It feels like a very hacky add-on that RKE2 has a distributed internal registry, available only if you enable it and use it in a very specific way.

Given how much people love just shipping a Helm chart, it's actually absurdly hard to ship a self-contained installation that doesn't try to hit internet resources.

favflam•1mo ago

Is running the git binary as a read-only nginx backend not good enough? Probably not. Hosting tarballs is far more efficient.

fulafel•1mo ago

Let's assume 3 million. That's about 30 per second.

From compute POV you can serve that with one server or virtual machine.

Bandwidth-wise, given a 100 MB repo size, that would make it 3.4 GB/s - also easy terrain for a single server.
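A quick check of those figures, assuming 3 million full clones per day spread evenly over 24 hours (real CI traffic is bursty, so peaks would be higher):

```python
SECONDS_PER_DAY = 86_400


def per_second(requests_per_day: float) -> float:
    return requests_per_day / SECONDS_PER_DAY


def bandwidth_gb_per_s(requests_per_day: float, repo_mb: float) -> float:
    """Average bandwidth if every request transfers the full repo (decimal GB/s)."""
    return per_second(requests_per_day) * repo_mb / 1_000


print(round(per_second(3_000_000), 1))               # 34.7
print(round(bandwidth_gb_per_s(3_000_000, 100), 2))  # 3.47
```

That is about 35 requests/s and roughly 28 Gbit/s of sustained egress, so the network interface matters as much as the CPU.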

heavenlyhash•1mo ago

That is roughly the number of new requests per second, but these are not just light web requests.

The git transport protocol is "smart" in a way that is, in some ways, arguably rather dumb. It's certainly expensive on the server side. All of the smartness of it is aimed at reducing the amount of transfer and number of connections. But to do that, it shifts a considerable amount of work onto the server in choosing which objects to provide you.

If you benchmark the resource loads of this, you probably won't be saying a single server is such an easy win :)
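To see where that server-side work comes from, here is a heavily simplified model of the negotiation: the client says which commits it wants and which it has, and the server walks the history to work out what to send. This sketch only handles commits (real git also walks trees and blobs, then searches for deltas while building the packfile), so treat it as an illustration of the cost, not git's actual algorithm:

```python
def reachable(graph: dict[str, list[str]], starts: set[str]) -> set[str]:
    """All commits reachable from `starts` by following parent links."""
    seen, stack = set(), list(starts)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def objects_to_send(graph: dict[str, list[str]], wants: set[str], haves: set[str]) -> set[str]:
    """Commits the client wants that it cannot already reach from what it has."""
    return reachable(graph, wants) - reachable(graph, haves)


# Linear history a <- b <- c <- d; the client already has b.
history = {"d": ["c"], "c": ["b"], "b": ["a"], "a": []}
print(sorted(objects_to_send(history, wants={"d"}, haves={"b"})))  # ['c', 'd']
```

Every fetch with a different set of haves needs its own walk, which is why serving many distinct CI clients is far heavier than serving a static tarball.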

fulafel•1mo ago

Here's a web source about how much cpu time it took from 5 years ago: https://github.blog/open-source/git/git-clone-a-data-driven-...

Using the slowest clone method, they measured 8s for a 750 MB repo and 0.45s for a 40MB repo. It appears to be linear, so 1.1s for 100MB should be a valid interpolation.

So doing 30 of those per second only takes 33 cores. Servers have hundreds of cores now (eg 384 cores: https://www.phoronix.com/review/amd-epyc-9965-linux-619).

And remember we're using worst case assumptions in places (using the slowest clone method, and numbers from old hardware). In practice I'd bet a fastish laptop would suffice.

edit: actually, on closer look at the GitHub-reported numbers, the interpolation isn't straightforward: on the bigger 750MB repo the partial clone is actually said to be slower than the base full clone. However, this doesn't change the big picture that it'll easily fit on one server.
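The core count above is an application of Little's law (work in flight = arrival rate × time per request). A sketch under the comment's own assumptions: linear scaling from the 40 MB / 0.45 s data point, 30 clones per second, and each clone keeping one core busy for its whole duration. Given the caveat in the edit, this is a ballpark only:

```python
def cpu_seconds_per_clone(repo_mb: float, ref_mb: float = 40, ref_seconds: float = 0.45) -> float:
    """Scale a measured clone cost linearly by repo size (a rough assumption)."""
    return ref_seconds * repo_mb / ref_mb


def cores_needed(clones_per_second: float, seconds_per_clone: float) -> float:
    # Little's law: average busy cores = arrival rate x service time.
    return clones_per_second * seconds_per_clone


per_clone = cpu_seconds_per_clone(100)   # 1.125 s
print(cores_needed(30, per_clone))       # 33.75
```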

saagarjha•1mo ago

One, expensive, server.

fulafel•1mo ago

.. or a cheaper one as we would be using only tens of cores in the above scenario. Or you could use a slice of an existing server using virtualization.

zahlman•1mo ago

> Most software houses spend so much time focusing on how expensive engineering time is that they neglect user time. Software houses optimize for feature delivery and not user interaction time. Yet if I spent one hour making my app one second faster for my million users, I can save 277 user hours per year. But since user hours are an externality, such optimization never gets done.

This is what people mean about speed being a feature. But "user time" depends on more than the program's performance. UI design is also very important.
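The 277-hour figure holds only if each user hits the faster path once a year; the total scales directly with how often the operation is used. A sketch (the usage frequencies are illustrative assumptions):

```python
def user_hours_saved(users: int, seconds_saved_per_use: float, uses_per_user_per_year: float) -> float:
    """Aggregate user time saved per year, in hours."""
    return users * seconds_saved_per_use * uses_per_user_per_year / 3600


print(round(user_hours_saved(1_000_000, 1, 1), 1))  # 277.8
print(round(user_hours_saved(1_000_000, 1, 250)))   # 69444
```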

solatic•1mo ago

If you think too hard about this, you come back around to Alan Kay's quote about how people who are really serious about software should build their own hardware. Web applications, and in general loading pretty much anything over the network, are a horrible, no-good, really bad user experience, and always will be. The only way to really respect the user is with native applications that are local-first, and if you take that really far, you build (at the very least) peripherals to make it even better.

The number of companies that have this much respect for the user is vanishingly small.

ghosty141•1mo ago

Yes because users don't appreciate this enough to pay for the time this takes.

hombre_fatal•1mo ago

Software I don’t have to install at all “respects me” the most.

Native software being an optimum is mostly an engineer fantasy that comes from imagining what you can build.

In reality that means having to install software like Meta’s WhatsApp, Zoom, and other crap I’d rather run in a browser tab.

I want very little software running natively on my machine.

freedomben•1mo ago

Yes, amen. The more invasive and abusive software gets, the less I want it running on my machine natively. Native installed applications for me now are limited only to apps I trust, and even those need to have a reason to be native apps rather than web apps to get a place in my app drawer.

solatic•1mo ago

Your browser is acting like a condom, in that respect (pun not intended).

Yes, there are many cases when condoms are indicative of respect between parties. But a great many people would disagree that the best, most respectful relationships involve condoms.

> Meta

Does not sell or operate respectful software. I will agree with you that it's best to run it in a browser (or similar sandbox).

tormeh•1mo ago

Desktop operating systems really dropped the ball on protecting us from the software we run. Even mobile OSs are so-so. So the browser is the only protection we reasonably have.

I think this is sad.

shash•1mo ago

You mean you’d rather run unverified scripts using a good order of magnitude more resources with a slower experience and have an entire sandboxing contraption to keep said unverified scripts from doing anything to your machine…

I know the browser is convenient, but frankly, it's been a horror show of resource usage, vulnerabilities, and pathetic performance.

whstl•1mo ago

The #1 reason the web experience universally sucks today is because companies add an absurd amount of third-party code on their pages for tracking, advertisement, spying on you or whatever non-essential purpose. That, plus an excessive/unnecessary amount of visual decoration.

The idea that somehow those companies would respect your privacy were they running a native app is extremely naive.

We can already see this problem on video games, where copy protection became resource-heavy enough to cause performance issues.

cosmic_cheese•1mo ago

Web apps are great until you want to revert to an older version from before they became actively user-hostile or continue to use them past EoL or company demise.

In contrast as long as you have a native binary, one way or another you can make the thing run and nobody can stop you.

phkahler•1mo ago

>> The number of companies that have this much respect for the user is vanishingly small.

I think companies shifted to online apps because, #1, it solved the copy protection problem. FOSS apps are not in any hurry to become centralized because they don't care about that issue.

Local apps and data are a huge benefit of FOSS and I think every app website should at least mention that.

"Local app. No ads. You own your data."

xorcist•1mo ago

Another important reason to move to online applications is that you can change the terms of the deal at any time. This may sound more nefarious than it needs to be, it just means you do not have to commit fully to your licensing terms before the first deal is made, which is tempting for just about anyone.

inapis•1mo ago

>Yet if I spent one hour making my app one second faster for my million users, I can save 277 user hours per year. But since user hours are an externality, such optimization never gets done.

I have never been convinced by this argument. The aggregate number sounds fantastic but I don't believe that any meaningful work can be done by each user saving 1 second. That 1 second (and more) can simply be taken by me trying to stretch my body out.

OTOH, if the argument is to make software smaller, I can get behind that since it will simply lead to more efficient usage of existing resources and thus reduce the environmental impact.

But we live in a capitalist world, and there needs to be external pressure for change to occur. The current RAM shortage, if it lasts, might be one such pressure. Otherwise, we're only daydreaming of a utopia.

adrianN•1mo ago

The relationship between time saved and increased productivity (or happiness, or whatever) is not linear but a step function. Saving one second doesn’t help much, but there is a threshold (depending on the individual) where faster workflows lead to a better experience. It does make a difference whether a task takes a minute or half a second, at least for me.

Aerroon•1mo ago

One second is long enough that it can put a user off from using your app though. Take notifications on phones for example. I know several people who would benefit from a habitual use of phone notifications, but they never stick to using them because the process of opening (or switching over to) the notification app and navigating its UI to leave a notification takes too long. Instead they write a physical sticky note, because it has a faster "startup time".

tehbeard•1mo ago

All depends on the type of interaction.

A high usage one, absolutely improve the time of it.

Loading the profile page? Isn't done often so not really worth it unless it's a known and vocal issue.

https://xkcd.com/1205/ gives a good estimate.
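The xkcd chart is essentially one multiplication: time saved per use × uses per day × the five-year horizon the comic uses. A sketch:

```python
def break_even_hours(seconds_saved_per_use: float, uses_per_day: float, years: float = 5) -> float:
    """Most time worth spending on an optimization before it costs more than it saves, in hours."""
    return seconds_saved_per_use * uses_per_day * 365 * years / 3600


print(round(break_even_hours(1, 50), 1))  # 25.3 (about a day)
print(round(break_even_hours(5, 5), 1))   # 12.7
```

A rarely visited profile page sits at the low end of that table, while a high-usage interaction climbs quickly.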

Aerroon•1mo ago

This is very true, but I think some of it has to do with expectations too. Editing a profile page is a complex thing, therefore people are more willing to put up with loading times on it, whereas checking out someone's profile is a simple task and the brain has already moved on, so any delay feels bad.

jorvi•1mo ago

But there isn't just one company deciding that externalizing costs onto the rest of us is a great way to boost profit, since it costs them very little. Especially a monopoly like YouTube, which can decide that eating up your battery is fine if it saves them a few cents in bandwidth costs.

Not all of those externalizing companies abuse your time, but whatever they abuse can be expressed as a $ amount, and $ can be converted to the median person's time via the median wage. Hell, free time is more valuable than whatever you produce during work.

Say all that boils down to companies collectively stealing 20 minutes of your time each day. 140 minutes each week. 7280 (!) minutes each year, which is about 5.06 days, which makes it almost a year over the course of 70 years.

So yeah, don't do what you do and sweet-talk the fact that companies externalize costs (privatize the profits, socialize the losses). They're sucking your blood.
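Checking that arithmetic, assuming 52 weeks per year as the comment does:

```python
minutes_per_day = 20
per_week = minutes_per_day * 7            # 140 minutes
per_year = per_week * 52                  # 7280 minutes
days_per_year = per_year / (60 * 24)      # about 5.06 days
days_over_70_years = days_per_year * 70   # about 354 days, i.e. almost a year
```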

schubidubiduba•1mo ago

Even if one individual second is small, it still adds up.

Even if all you do with it is just stretching, there's a chance it will prevent you pulling a muscle. Or lower your stress and prevent a stroke. Or any number of other beneficial outcomes.

WhyNotHugo•1mo ago

> I have never been convinced by this argument. The aggregate number sounds fantastic but I don't believe that any meaningful work can be done by each user saving 1 second. That 1 second (and more) can simply be taken by me trying to stretch my body out.

I’d see this differently from a user perspective. If the average operation took one second less, I’d spend a lot less time waiting for my computer. I’d also have fewer idle moments where my mind wanders while waiting for some operation to complete.

ozim•1mo ago

About apps done by software houses, even though we should strive for doing good job and I agree with sentiment...

First argument would be - remove at least two zeros from your estimate: most applications will have maybe thousands of users, and successful ones maybe tens of thousands. You might be lucky enough to work on an application with hundreds of thousands or millions of users, but then you work at FAANG, not a typical "software house".

Second argument is - most users use 10-20 apps in a typical workday, so your application is most likely irrelevant.

Third argument is - most users would save much more time by learning to properly use the applications (or the computer itself) they use on a daily basis than by someone optimizing some function from 2s to 1s. But of course that's hard, because they have 10-20 apps daily plus God knows how many others they use less often. Still, I see people doing super silly stuff in tools like Excel, or not even knowing copy-paste, so we're not even talking about command-line magic.

Y-bar•1mo ago

You’ll enjoy “Saving Lives” by Andy Hertzfeld: https://www.folklore.org/Saving_Lives.html

> "The Macintosh boots too slowly. You've got to make it faster!"

kkjjjjw•1mo ago

https://news.ycombinator.com/item?id=44843223#44879509

pastor_williams•1mo ago

This was something that I heavily focused on for my feature area a year ago - new user sign up flow. But the decreased latency was really in pursuit of increased activation and conversion. At least the incentives aligned briefly.

robmccoll•1mo ago

I don't think most software houses spend enough time even focusing on engineering time. CI pipelines that take tens of minutes to over an hour, compile times that exceed ten seconds when nothing has changed, startup times that are much more than a few seconds. Focus and fast iteration are super important to writing software and it seems like a lot of orgs just kinda shrug when these long waits creep into the development process.

JohnHaugeland•1mo ago

> This seems like a tragedy of the commons -- GitHub is free after all, and it has all of these great properties, so why not?

because it's bad at this job, and sqlite is also free

this isn't about "externalities"
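For comparison, a package index as a single SQLite file takes very little code. A minimal sketch using Python's standard-library sqlite3 module; the table layout and the placeholder hash strings are hypothetical, not any package manager's real schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # a real index would be a file clients download once
db.execute("""
    CREATE TABLE releases (
        name    TEXT NOT NULL,
        version TEXT NOT NULL,
        sha256  TEXT NOT NULL,
        PRIMARY KEY (name, version)
    )
""")
db.executemany(
    "INSERT INTO releases VALUES (?, ?, ?)",
    [
        ("foo", "1.0.0", "hash-foo-100"),  # placeholder hashes
        ("foo", "1.1.0", "hash-foo-110"),
        ("bar", "0.3.1", "hash-bar-031"),
    ],
)
# Lookups are indexed queries instead of walking a git tree.
# Note: ORDER BY on TEXT is lexical, not semver-aware.
rows = db.execute(
    "SELECT version, sha256 FROM releases WHERE name = ? ORDER BY version", ("foo",)
).fetchall()
print(rows)
```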

vlovich123•1mo ago

I think it’s naive to think engineers or managers don’t realize this or don’t think in these ways.

https://www.folklore.org/Saving_Lives.html

pdimitar•1mo ago

Is it truly naive if most engineers' careers pass and they never meet even one such manager?

In my 24-year career I've met a grand total of _two_ such managers. Both got fired less than 6 months after I joined the company, too.

Who's naive here?

vlovich123•1mo ago

I’ve met one who asked me a question like this and he’s still at Apple having been promoted several times to a fairly senior position. But the question was only half hearted because the question was “how much CO2 would we save if we made something 10% more CPU efficient” and the answer even at Apple’s current scale of billions of iPhones was insignificant.

So now you and I have both come across such a manager. Why would you claim that most engineers don’t come across such people?

saagarjha•1mo ago

It turns out iPhones are not actually a huge contributor to worldwide carbon emissions. Data centers, on the other hand…

pdimitar•1mo ago

Anecdotal evidence and all such, but in my environment actually good managers were rarer than UFO sightings.

Environments and local markets matter a huge amount.

I believe the better question here would be: why would the reverse be claimed at all? Many people in the USA, and a lot of them are over-represented here on HN, are privileged and this is not obvious to them, leading to cringe-worthy reactions like "just find a better company".

I guess I am barking up the wrong tree. I do dislike how over-represented certain perspectives are on HN. It's a very classic filter bubble, and the fact that it's about privileged people makes this even worse.

brightball•1mo ago

User time is typically a mix of performance tuning and UX design isn’t it?

Aurornis•1mo ago

> Most software houses spend so much time focusing on how expensive engineering time is that they neglect user time. Software houses optimize for feature delivery and not user interaction time.

I don’t know what you mean by software houses, but every consumer-facing software product I’ve worked on has tracked things like startup time and latency for common operations as a key metric.

This has been common wisdom for decades. I don’t know how many times I’ve heard the repeated quote about how Amazon loses $X million for every Y milliseconds of page loading time, as an example.

dijit•1mo ago

I worked in e-commerce SaaS around 2011, and this was true then, but I find it less true these days.

Are you sure that you’re not the driving force behind those metrics; or that you’re not self-selecting for like-minded individuals?

I find it really difficult to convince myself that even large players (Discord) are measuring startup time. Every time I start the thing I’m greeted by a 25s wait and a `RAND()%9` number of updates that each take about 5-10s.

jama211•1mo ago

Discord’s user base is 99% people who leave it running 100% of the time, it’s not a typical situation

dijit•1mo ago

I think that they make the startup so horrible that people are more likely to leave it running.

hexer292•1mo ago

As a discord user, it's the kind of platform that I would want to have running to receive notifications, sort of like the SMS of gaming.

A large part of my friend group uses Discord as the primary method of communication, even in an in-person context (I was at a festival a few months ago with a friend, and we would send texts over Discord if we got split up), so maybe it's not a common use case.

jama211•1mo ago

I strongly doubt that!

jama211•1mo ago

Hijacking my own comment to mention that the normal thing on forums, when a reasonable person reads an unreasonable comment, is to move on, which can leave the comment standing unopposed and give it credence it doesn’t deserve. I believe if more of us actually voiced our disagreement as I have here, it could change things sometimes.

solarkraft•1mo ago

It leads to me dreading having to start it (or accidentally starting it - remember IE?) and opting for the browser instead.

drob518•1mo ago

Yep, indeed. Which is the main reason I don’t run Discord.

jama211•1mo ago

I strongly doubt that. The main reason you don’t run it is likely because you don’t have strong motivation to do so, or you’d push through the odd start up time.

oceanplexian•1mo ago

Just going to throw out an anecdote that I don’t use it for the same reason.

It’s closed unless I get a DM on my phone and then I suffer the 2-3 minute startup/failed update process and quit it again. Not a fan of leaving their broken, resource hogging app running at all times.

esseph•1mo ago

It would fail to auto-update as a system-installed package, because updating requires a system-level package install.

It would not fail to update if installed as a user-installed Flatpak.

Many apps are this way now.

jama211•1mo ago

Why not just respond to the dm on your phone?

Flere-Imsaho•1mo ago

For me, I really dislike the fact Discord is completely closed off to the wider internet, and Discord, the company, has absolute control: from a privacy and freedom of speech point of view. This goes against the core ideas of a free and open internet.

I'll admit that the Discord service is really good from a UX point of view.

spockz•1mo ago

I have the same experience on Windows. On the other hand, starting up Discord on my CachyOS install is virtually instant. So maybe there is also a difference between the platform the developers use and the one their users use.

godelski•1mo ago

I have plenty of responses to an angry comment I made several months ago that supports your point.

I made a slight at Word taking like 10 seconds to start and some people came back saying it only takes 2, as if that still isn't 2s too long.

Then again, look at how Microsoft is handling slow File Explorer speeds...

https://news.ycombinator.com/item?id=44944352

saagarjha•1mo ago

I never said that 2s wasn’t too long. I just said your environment was broken if it took 10.

yndoendo•1mo ago

There is a high chance the extra nuts and bolts added to Windows, which slow it down, are IT-required software, settings, and security enhancements.

It took me almost a year to get a separate laptop for office work and development. Their Enhanced Security prevented me from testing administrative code features and broke Visual Studio's bug submission system, which Microsoft requires you to use for posting software bugs.

By the way, I can break Windows simply by running their PowerShell utilities to configure NICs. Windows is not the stable product people think it is.

saagarjha•1mo ago

This was on macOS

godelski•1mo ago

Wild. How'd you even find me

saagarjha•1mo ago

I read the comments on this post

rovr138•1mo ago

There was a thread here earlier this month,

> Helldivers 2 devs slash install size from 154GB to 23GB

https://news.ycombinator.com/item?id=46134178

Section of the top comment says,

> It seems bizarre to me that they'd have accepted such a high cost (150GB+ installation size!) without entirely verifying that it was necessary!

and the reply to it has,

> They’re not the ones bearing the cost. Customers are.

ux266478•1mo ago

That's not how it works. The demand for engineering hours is an order of magnitude higher than the supply for any given game, so you have to pick and choose your battles because there's always much, much more to do. It's not bizarre that nobody verified at launch that texture storage was being done optimally without sacrificing load times or visual fidelity, particularly given the state the rest of the game was in. Who the hell has time to do that when crashes abound and the network stack has to be rewritten at a moment's notice?

Gamedev is very different from other domains, being in the 90th percentile for complexity and codebase size, and the 99th percentile for structural instability. It's a foregone conclusion that you will rewrite huge chunks of your massive codebase many, many times within a single year to accommodate changing design choices, or, if you're lucky, to improve an abstraction. Not every team gets so lucky on every project. Launch deadlines are hit when there's a huge backlog of additional stuff to do, sitting atop a mountain of cut features.

swiftcoder•1mo ago

> It's not bizarre that nobody verified texture storage was being done in an optimal way at launch

The inverse, however, is bizarre: that they spent potentially quite a bit of engineering effort implementing the (extremely non-optimal) system that duplicates all the assets half a dozen times to potentially save precious seconds on spinning rust, all without validating it was worth implementing in the first place.

rovr138•1mo ago

Yes.

They talk about it being an optimization. They also talk about the bottleneck being level generation, which happens at the same time as loading from disk.

MBCook•1mo ago

Was Helldivers II built from the ground up? Or grown from the v1 codebase?

The first was on PS3 and PS4 where they had to deal with spinning disks and that system would absolutely be necessary.

Also if the game ever targeted the PS4 during development, even though it wasn’t released there, again that system would be NEEDED.

esseph•1mo ago

It's a completely different game, engine, etc.

darubedarob•1mo ago

Gamedev engineering hours are also in endless oversupply thanks to myDreamCream brain.

viraptor•1mo ago

There was also GTA Online wasting minutes loading/parsing JSON files at startup. https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...

And Cities: Skylines rendering teeth on models miles away https://www.reddit.com/r/CitiesSkylines/comments/17gfq13/the...

Sometimes the performance is really ignored.
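The GTA Online write-up linked above attributes much of the delay to accidentally quadratic work during loading; one part it describes is checking each parsed item for duplicates by scanning everything stored so far. A toy sketch of that pattern next to the usual fix, a hash set (an illustration of the complexity, not the game's code):

```python
def dedupe_linear(items):
    """O(n^2): each insert scans the whole list of items kept so far."""
    kept = []
    for item in items:
        if item not in kept:  # list membership is a linear scan
            kept.append(item)
    return kept


def dedupe_hashed(items):
    """O(n) expected: set membership is a hash lookup."""
    seen, kept = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return kept
```

Both return the same result in the same order, but with tens of thousands of items the linear version performs on the order of n²/2 comparisons, which is billions.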

darubedarob•1mo ago

Wasn't there a website with a formula for how much time things like the GTA bug cost humanity as a whole? Like 5 minutes × users × sessions per day, accumulated?

It cost several human lifetimes, if I remember correctly. Still not as bad as Windows Update, which, taking the time times the wage, sets the GDP of a small nation on fire every year.

mattmanser•1mo ago

I met a seasoned game dev who complained to me that he was only ever hired at the end of projects, to speed up code that a bunch of mid/junior-level game devs had written to actually make the game. He said there was only so much time he'd be given, so he'd have to go for low-hanging fruit and might miss stuff.

We've only got a couple of game dev shops in my city, so not sure how common that is.

whstl•1mo ago

Sweatshops love junior devs, as they never complain, never make suggestions and always take the blame for bugs.

A senior joining when time is tight makes sense, they don’t want anyone to rock the boat, just to plug the holes.

AuthAuth•1mo ago

I'm pretty sure that CS rendering teeth a mile away turned out to be false. But it was repeated so much, and the game's release state was (and still is) so bad, that people assumed it to be true.

kibwen•1mo ago

> They’re not the ones bearing the cost. Customers are.

I think this is uncharitably erasing the context here.

AFAICT, the reason that Helldivers 2 was larger on disk is because they were following the standard industry practice of deliberately duplicating data in such a way as to improve locality and thereby reduce load times. In other words, this seems to have been a deliberate attempt to improve player experience, not something done out of sheer developer laziness. The fact that this attempt at optimization is obsolete these days just didn't filter down to whatever particular decision-maker was at the reins on the day this decision was made.
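A toy cost model of why duplicating data for locality helps on spinning disks: if a level's assets are scattered across shared archives, each fragment costs a seek; if they are duplicated into one contiguous bundle per level, a load is roughly one seek plus a sequential read. The figures (about 10 ms per seek, 150 MB/s sequential, a 1.5 GB level in 5,000 fragments) are rough assumed numbers, and real engines and OS caching complicate this considerably:

```python
def load_seconds(total_mb: float, fragments: int, seek_ms: float = 10.0, mb_per_s: float = 150.0) -> float:
    """Estimated read time: one seek per contiguous fragment plus sequential transfer."""
    return fragments * seek_ms / 1000 + total_mb / mb_per_s


scattered = load_seconds(1500, fragments=5000)  # assets spread across shared archives
bundled = load_seconds(1500, fragments=1)       # assets duplicated into one per-level bundle
print(scattered, bundled)
```

With SSD-like numbers (say `seek_ms=0.1`, `mb_per_s=2000`) the seek term nearly vanishes, which is why the duplication buys little on modern hardware while still costing disk space.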

saghm•1mo ago

I don't think it's quite that simple. The reason they had such a large install size in the first place was due to concern about the load times for players using HDDs instead of SSDs; duplicating the data was intended to be a way to avoid making some players load into levels much more slowly than others (which in an online multiplayer game would potentially have repercussions for other players as well). The link you give mentions that this was based on flawed data (although it's somewhat light on those details), but that means the actual cause was a combination of a technical mistake and the presence of care for user experience, just not the experience of the majority at the expense of the smaller but not insignificant minority. There's certainly room for argument about whether this was the correct judgement call to make, or whether they should have been better at recognizing their data was flawed, but it doesn't really seem like it fits the trend of devs not giving a shit about user experience. If making perfect judgement calls and never having flawed data is the bar for proving you care about users, we might as well just give up on the idea that any companies will ever reach it.

godelski•1mo ago

How about the GitHub Actions safe sleep bug? It took over a year to accept a trivial PR fixing actions that hung forever because someone forgot that you need <= instead of == in a counter...

Though in this case GitHub wasn't bearing the cost, it was gaining a profit...

https://github.com/actions/runner/pull/3157

https://github.com/actions/runner/issues/3792
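To see why the comparison operator matters, here is a minimal Python simulation (not the runner's actual script), assuming a busy-wait loop that polls an elapsed-seconds counter which can skip values when the machine is loaded. `wait_until` and the tick list are hypothetical.

```python
from typing import Iterable, Optional

def wait_until(target: int, ticks: Iterable[int], use_equality: bool) -> Optional[int]:
    """Poll an elapsed-seconds counter until it reaches `target`.

    `ticks` stands in for successive reads of the counter. On a busy machine
    a poll can land late and skip a value (e.g. jump from 1 straight to 3).
    Returns the reading at which the wait ended, or None if it never ended.
    """
    for elapsed in ticks:
        if use_equality:
            done = elapsed == target   # only fires on an exact match
        else:
            done = target <= elapsed   # fires on the first reading at or past the target
        if done:
            return elapsed
    return None  # a real busy-wait would keep spinning here, i.e. hang forever

# The counter jumps from 1 to 3, skipping the target value 2.
observed = [0, 1, 3, 4, 5]
print(wait_until(2, observed, use_equality=True))   # None
print(wait_until(2, observed, use_equality=False))  # 3
```

With `==`, a single skipped reading means the exit condition is never true again; with `<=`, the loop ends on the first reading at or past the target.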

pjmlp•1mo ago

An exception that confirms the rule.

mindslight•1mo ago

> every consumer facing software product I’ve worked on has tracked things like startup time and latency for common operations as a key metric

Are they evaluating the shape of that line with the same goal as the stonk score? Time spent by users is an "engagement" metric, right?

eviks•1mo ago

The issue here is not tracking, but developing. Like, how do you explain the fact that whole classes of software have gotten worse on those "key metrics"? (and that includes web-selling webpages)

ponector•1mo ago

On the contrary, none of the consumer-facing products I've worked on tracked performance metrics. And for enterprise software it was even worse, as the end user is not the one who decides to buy and use the software.

>>what you mean by software houses

How about Microsoft? Start menu is a slow electron app.

philipallstar•1mo ago

> How about Microsoft? Start menu is a slow electron app.

If your users are trapped due to a lack of competition then this can definitely happen.

pjmlp•1mo ago

If only the community actually gathered around the one true Linux distribution instead of endless forks.

philipallstar•1mo ago

Exactly. Let's start by listing all the true Linux distributions and we can go from there!

julianz•1mo ago

The Start menu is not an Electron app. Don't believe everything you read on the internet.

Spooky23•1mo ago

That makes the usability and performance of the Windows Start menu even more embarrassing.

The decline of Windows as a user-facing product is amazing, especially as they are really good at developing things they care about. The “back of house” guts of Windows have improved a lot, for example. They should just have a cartoon Bill Gates pop up like Clippy and flip you the bird at this point.

jiggawatts•1mo ago

Much worse is that the search function built into the start menu has been broken in different ways in every major release of Windows since XP, including Server builds.

It has both indexing failures and multi-day performance issues for mere kilobytes of text!

kortilla•1mo ago

People believing it says something about the start menu

TehShrike•1mo ago

hey, haven't seen that one in the wild for a little bit :-D https://www.smbc-comics.com/comic/aaaah

kortilla•1mo ago

The comic artist seems pretty ignorant to think that it’s not meaningful.

The falsehoods people believe and spread about a particular topic are an excellent way to tell what public opinion on it is.

Consider spreading a falsehood about Boeing QA getting bonuses based on the number of passed planes versus the same falsehood about Airbus. If the Boeing one spreads like wildfire, it tells you that Boeing has a terrible safety track record and that the claim is completely believable.

Back to the Start menu. It should be a complete embarrassment to MSFT SWEs that people even think Start menu performance is so bad that it could be implemented in Electron.

In summary: what lies spread easily is an amazing signal on public perception. The SMBC comic is dumb.

saagarjha•1mo ago

AAAAAAAAAAAAAA

Flimm•1mo ago

It's less meaningful than you think. Widespread prejudice does give you signal on public sentiment, but it doesn't give you much signal on whether the prejudice happens to coincide with reality or not, compared to other methods. People should be open to having their prejudices corrected by more relevant information.

kortilla•1mo ago

We’re talking about new prejudices, not old.

odo1242•1mo ago

React Native, not Electron. Though it is slower than it used to be.

kevin_thibedeau•1mo ago

That's even more damning that they can't dogfood their own GUI toolkits for the primary UI of their own OS.

Conan_Kudo•1mo ago

The Start menu is React Native, but Outlook is now an Electron app.

moregrist•1mo ago

> I don’t know how many times I’ve heard the repeated quote about how Amazon loses $X million for every Y milliseconds of page loading time, as an example.

This is true for sites that are trying to make sales. You can quantify how much a delay affects closing a sale.

For other apps, it’s less clear. During its high-growth years, MS Office had an abysmally long startup time.

Maybe this was due to MS having a locked-in base of enterprise users. But given that OpenOffice and LibreOffice effectively duplicated long startup times, I don’t think it’s just that.

You also see the Adobe suite (and also tools like GIMP) with some excruciatingly long startup times.

I think it’s very likely that startup times of office apps have very little impact on whether users will buy the software.

delaminator•1mo ago

They even made it render the screen but still be unusable to make it look like it was running.

epmatsw•1mo ago

Every SSRed app these days…

croes•1mo ago

Then why do many software houses favor cloud software over on-premise?

It often has a noticeable delay on user input compared to local software.

kevin_thibedeau•1mo ago

The MBAs hate capital expenditures and love operating expenditures. They'll make strategic blunders like over-dependence on external services just to satisfy their warped mindset.

venturecruelty•1mo ago

>I don’t know what you mean by software houses, but every consumer facing software product I’ve worked on has tracked things like startup time and latency for common operations as a key metric.

Then respectfully, uh, why is basically all proprietary software slow as ass?

j_w•1mo ago

Clearly Amazon doesn't care about that sentiment across the board. Plenty of their products are absurdly slow because of their poor engineering.

Yoric•1mo ago

Can confirm, at least for Firefox. When I worked on it, I spent literal years shaving seconds off startup or shutdown, or milliseconds off tab switching.

Everybody likes to hate Telemetry, and yes, it can be abused, but that's how Mozilla (and its competitors) manage to make users' lives more comfortable.

xp84•1mo ago

> every consumer facing software product I’ve worked on has tracked things like startup time and latency for common operations as a key metric

Must be nice. In my career, all working on webapps, I've seen a few leaders popping in to ask us to fix a particularly egregious performance issue if the right customers complain, but aside from those finely-targeted and limited-attention-span drives to "improve performance" it seems the answer for the past decade or so is just to assume everyone is on at least a gigabit connection, stick fingers in ears, and just keep adding more node modules. If the developers' disks get full because node_modules got too big, buy a bigger SSD and keep going. (ok that last part is slight hyperbole but I also don't think frontend devs would be deterred from their ravenous appetite for libraries by a full disk).

ponector•1mo ago

>> I don’t know what you mean by software houses, but every consumer facing software product I’ve worked on has tracked things like startup time and latency for common operations as a key metric

Maybe Google? Gmail app is 700+ MB

threatofrain•1mo ago

> Most software houses spend so much time focusing on how expensive engineering time is that they neglect user time. Software houses optimize for feature delivery and not user interaction time.

Oh no no no. Consumer-facing companies will burn 30% of your internal team complexity budget on shipping the first "frame" of your app/website. Many people treat Next as synonymous with React, and Next's big deal was helping you do just this.

massysett•1mo ago

> Externalities lead to users downloading extra gigabytes of data (wasted time) and waiting for software, all of which is waste that the developer isn't responsible for and doesn't care about.

This is perfectly sensible behavior when the developers are working for free, or when the developers are working on a project that earns their employer no revenue. This is the case for several of the projects at issue here: Nix, Homebrew, Cargo. It makes perfect sense to waste the user's time, as the user pays with nothing else, or to waste Github's bandwidth, since it's willing to give bandwidth away for free.

Where users pay for software with money, they may be more picky and not purchase software that indiscriminately wastes their time.

BobbyTables2•1mo ago

Microsoft would have long gone out of business if users cared about their time being wasted.

Windows 11 should not be more sluggish than Windows 7.

imiric•1mo ago

> GitHub is free after all, and it has all of these great properties, so why not?

The answer is in TFA:

> The underlying issue is that git inherits filesystem limitations, and filesystems make terrible databases.

gritzko•1mo ago

Let’s do a thought experiment. Suppose that I have a data format and a store that resolve the issues in the post. It is like git meets JSON meets key-value. https://github.com/gritzko/go-rdx

What is the probability of it being used? About 0%, right? Because git is proven and GitHub is free. Engineering aspects are less important.

stkdump•1mo ago

Sorry, I am turned off by the CRDT in there. It immediately smells of overengineering to me. Not that I believe git is a better database. But why not just SQL?

gritzko•1mo ago

Merges require revisioning. JSON or SQL do not have that in the model. This variant of CRDT is actually quite minimalistic.

stkdump•1mo ago

I would argue LWW (last-writer-wins) is the opposite of a merge. It is better to know immediately, at the time of writing, that there is a conflict. CRDTs either solve or (in this case) don't solve a problem that doesn't really exist, especially for package managers.
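The two positions in this exchange can be contrasted with a minimal Python sketch for a single key holding a package version. `Versioned`, `lww_merge`, and `compare_and_set` are hypothetical names for illustration, not go-rdx's API: LWW resolves concurrent writes by keeping the later one, while compare-and-set surfaces the conflict to the writer at write time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # logical time of the write (assumed comparable across writers)
    writer: str     # tiebreaker when two writes share a timestamp

def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep the later write and silently drop the other."""
    return max(a, b, key=lambda v: (v.timestamp, v.writer))

class ConflictError(Exception):
    pass

def compare_and_set(store: dict, key: str, expected: Optional[str], new: str) -> None:
    """Refuse the write, at write time, if the key changed since the writer last read it."""
    current = store.get(key)
    if current != expected:
        raise ConflictError(f"{key}: expected {expected!r}, found {current!r}")
    store[key] = new

# LWW: alice's update vanishes without anyone being told.
alice = Versioned("1.2.0", timestamp=5, writer="alice")
bob = Versioned("1.3.0", timestamp=7, writer="bob")
print(lww_merge(alice, bob).value)  # 1.3.0

# Compare-and-set: bob read "1.1.0" before alice wrote, so his write is rejected.
store = {"libfoo": "1.1.0"}
compare_and_set(store, "libfoo", "1.1.0", "1.2.0")  # alice
try:
    compare_and_set(store, "libfoo", "1.1.0", "1.3.0")  # bob
except ConflictError as err:
    print("conflict:", err)
```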

gritzko•1mo ago

Git solves that problem, and it definitely exists. Speaking of package managers, it really depends. Like, can we use one SQLite file for that? So easy, so why is no one doing that?

stkdump•1mo ago

idk, Debian for example uses plain text files. I have to imagine moving that over to SQLite would bring some performance advantages, but package management designers seem to fall into two categories: under-engineering or over-engineering the solution. There is little glory in evolving something incrementally; everyone wants to do greenfield stuff.

pdimitar•1mo ago

I am very interested by something like this but your README is not making it easy to like. Demonstrating with 2-3 sample apps using RDX might have gone a long way.

So how do I start using it if I, for example, want to use it like a decentralized `syncthing`? Can I? If not, what can I use it for?

I am not a mathematician. Most people landing on your repo are not mathematicians either.

We the techies _hate_ marketing with a passion but I as another programmer find myself intrigued by your idea... with zero idea how to even use it and apply it.

3371•1mo ago

The user-hour analogy sounds weird though; 1s feels like 1s regardless of how many users you have. It's like the classic Asian teachers' logic of "if you come in 1 min late you are wasting N minutes for all of us in this class." It just does not stack like that.

BenjiWiebe•1mo ago

If the class takes N minutes and one person arrives 1 minute late, and the rest of the class is waiting for them, it does stack. Every one of those students lost a minute. Far worse than one student losing one minute.

3371•1mo ago

Do "we" lose 2 mins because we both spent 1 min commenting? That sounds like Mythical Man-Month thinking... for me, time is parallel and does not combine.

bawolff•1mo ago

> Software houses optimize for feature delivery and not user interaction time. Yet if I spent one hour making my app one second faster for my million users, I can save 277 user hour per year. But since user hours are an externality, such optimization never gets done.

Google and Amazon are famous for optimizing this. It's not an externality to them, though; even tens of ms can equal an extra sale.

That said, I don't think it's fair to add time up like that. Saving 1 second for 600 people is not the same as saving 10 minutes for 1 person. Time in small increments does not have the same value as time in large increments.
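For reference, the arithmetic behind the quoted figure, assuming each of the million users hits the one-second delay once per year (the quoted comment doesn't say how often). A short Python check of both sums; the disagreement in this subthread is about whether summing is the right model, not about the totals.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_MINUTE = 60

def total_seconds_saved(seconds_per_use: float, uses: int) -> float:
    """Naive aggregate: every saved second counts the same, no matter who saves it."""
    return seconds_per_use * uses

# 1 second saved once a year by each of 1,000,000 users
hours = total_seconds_saved(1, 1_000_000) / SECONDS_PER_HOUR
print(f"{hours:.1f} user-hours per year")  # 277.8 user-hours per year

# 1 second for 600 people sums to the same as 10 minutes for one person
minutes = total_seconds_saved(1, 600) / SECONDS_PER_MINUTE
print(minutes)  # 10.0
```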

esafak•1mo ago

1. If you can price the cost of the externality, you can justify optimizing it.

2. Monopolies and situations with the principal/agent dilemma are less sensitive to such concerns.

bawolff•1mo ago

> 1. If you can price the cost of the externality, you can justify optimizing it.

An externality is usually a cost you don't pay (or pay only a negligible amount of). I don't see how pricing it helps justify optimizing it.

DrewADesign•1mo ago

> Yet if I spent one hour making my app one second faster for my million users, I can save 277 user hour per year. But since user hours are an externality, such optimization never gets done.

Wait times don’t accumulate. Depending on the software, that one second will probably make very little difference to each individual user. Developers often overestimate the effect of performance optimization on user experience because it’s the aspect of user experience optimization their expertise most readily addresses. The company will generally get a much better ROI from implementing well-designed features and having you squash bugs.

drbojingle•1mo ago

A well-designed feature IS considerate of time and attention. Why would I want a game at 20 fps when I could have it at 120? The smoothness of the experience increases my ability to use it optimally because I don't have to pay as much attention to it. I'd prefer my interactions with machines to be as smooth as driving a car down an empty, dry highway at midday.

Perhaps not everyone cares, but I've played enough Age of Empires 2 to know that plenty of people have felt the value of shaving seconds off this and that to get compounding gains over time. It's a concept plenty of folks will be familiar with.

DrewADesign•1mo ago

Sure, but without unlimited resources, you need to have priorities, and everything has a ‘good enough’ state. All of this stuff lies on an Eisenhower chart and we tend to think our concerns fall into the important/urgent quadrant, but in the grand scheme of things, they almost never do.

8note•1mo ago

I still prefer 15fps for games. If they're putting the fps any higher, it's not considerate of my time and attention.

I have to pay less attention to a thing that updates less frequently. Idle games are the best in that respect because you can check into the game on your own time rather than the game forcing you to pay attention on its time.

jfengel•1mo ago

Isn't there a limit to human perception, well below 120 fps?

Perhaps 120fps might result in a better approximation of motion blur.

gverrilla•1mo ago

Nothing surprising. Capital hates people, even though we sustain his kingdom.

miyuru•1mo ago

Funnily enough, I clicked the homebrew GitHub link in the post, only to get a rate limited error page from GitHub.

mikkupikku•1mo ago

People who put off learning SQL for later end up using anything other than a database as their database.

redog•1mo ago

SQL killed the set theory star

groundzeros2015•1mo ago

Is sql over ssh a thing?

yawaramin•1mo ago

https://litestream.io/

groundzeros2015•1mo ago

A proprietary cloud subscription doesn’t seem like the right fit for this

yawaramin•1mo ago

As opposed to a proprietary cloud git hosting platform?

steeleduncan•1mo ago

The other conclusion to draw is "Git is a fantastic choice of database for starting your package manager, almost all popular package managers began that way."

saidinesh5•1mo ago

I think the conclusion is more that package definitions can still be maintained on git/GitHub but the package manager clients should probably rely on a cache/db/a more efficient intermediate layer.

Mostly to avoid downloading the whole repo and resolving deltas from the history for the few packages most applications tend to depend on. Especially in today's CI/CD world.

reactordev•1mo ago

This is exactly the right approach. I did this for my package manager.

It relies on a git repo branch for stable. There are yaml definitions of the packages including urls to their repo, dependencies, etc. Preflight scripts. Post install checks. And the big one, the signatures for verification. No binaries, rpms, debs, ar, or zip files.

What’s actually installed lives in a small SQLite database and searching for software does a vector search on each packages yaml description.

Semver included.

This was inspired by brew/portage/dpkg for my hobby os.

pl4nty•1mo ago

yaml definitions compiled to sqlite sounds pretty similar to winget, which is scaling very well. vector search is a cool idea - how much storage does it use per package?

pseufaux•1mo ago

This is how WinGet works. It has a small SQLite db it downloads from a hosted url. The DB contains some minimal metadata and a url path to access the full metadata. This way WinGet only has to make API calls for packages it's actually interacting with. As a package manager, it has plenty of problems still, but it's a simple, elegant solution for the git as a DB issue.
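That pattern in miniature, using Python's built-in sqlite3 module. The table layout, package IDs, and manifest paths are hypothetical (this is not WinGet's real schema), and `fake_fetch` stands in for an HTTP request to the hosted source. Search is answered locally; only the package you actually act on costs a remote request.

```python
import sqlite3
from typing import Callable, Optional

def build_index(rows: list) -> sqlite3.Connection:
    """Small local index: just enough metadata to search, plus where each full manifest lives."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (id TEXT PRIMARY KEY, version TEXT, manifest_path TEXT)")
    db.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
    return db

def search(db: sqlite3.Connection, term: str) -> list:
    """Answered entirely from the local index, with no network traffic."""
    cur = db.execute("SELECT id FROM packages WHERE id LIKE ? ORDER BY id", (f"%{term}%",))
    return [row[0] for row in cur]

def full_manifest(db: sqlite3.Connection, package_id: str, fetch: Callable[[str], str]) -> Optional[str]:
    """Only the package the user actually touches triggers a remote request."""
    row = db.execute("SELECT manifest_path FROM packages WHERE id = ?", (package_id,)).fetchone()
    return fetch(row[0]) if row else None

requests_made = []

def fake_fetch(path: str) -> str:
    # Stand-in for an HTTP GET against the hosted manifest store.
    requests_made.append(path)
    return f"<full manifest from {path}>"

db = build_index([
    ("Example.Tool", "1.0.0", "manifests/example/tool.yaml"),
    ("Example.Editor", "2.1.0", "manifests/example/editor.yaml"),
])
print(search(db, "Tool"))  # ['Example.Tool']
full_manifest(db, "Example.Tool", fake_fetch)
print(requests_made)  # ['manifests/example/tool.yaml']
```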

bluGill•1mo ago

Git isn't a fantastic choice unless you know nothing about databases. A search would show plenty of research on databases and what works when/why.

kibwen•1mo ago

For the purposes of the article, git isn't just being used as a database, it's being used as a protocol to replicate the database to the client to allow for offline operation and then keep those distributed copies in sync. And even for that purpose you can do better than git if you know what you're doing, but knowledge of databases alone isn't going to help you (let alone make your engineering more economical than relying on free git hosting).

freedomben•1mo ago

Exactly. It's not just about the best solution to the problem; it's also heavily about the economics around it. If I wanted to create a new package manager today, I could get started by using Git and existing git hosting solutions with very little effort, and effort translates to time, and time is a scarce resource. If you don't know whether your package manager will take off, it may not be the best use of your scarce resources to invest in a robust and optimized solution out of the gate. I wish that weren't the case; I would love to have an infinite amount of time, but wishing is not going to make it happen.

adastra22•1mo ago

Git is an absolute shit database for a package manager even in the beginning. It’s just that GitHub subsidizes hosting and that is hard to pass up.

fn-mote•1mo ago

Sure, but can you back up the expletive with some reason why you think that?

As it is, this comment is just letting out your emotion, not engaging in dialogue.

venturecruelty•1mo ago

Can we stop the tone-policing, please? It's not helpful. Not everything needs a Wikipedia-style citation, and this particular rhetorical trick is extremely passive-aggressive.

adastra22•1mo ago

The article enumerates many such reasons.

IshKebab•1mo ago

What's a better option? One that keeps track of history and has a nice review interface?

edolstra•1mo ago

Indeed. Nixpkgs wouldn't have been as successful if it hadn't been using Git (or GitHub).

Sure, eventually you run into scaling issues, but that's a first world problem.

l9o•1mo ago

I actually find that nixpkgs being a monorepo makes it even better. The code is surprisingly easy to navigate and learn if you've worked in large codebases before. The scaling issues are good problems to have, and git has gotten significantly better at handling large repos than it was a decade ago, when Facebook opted for Mercurial because git couldn't scale to their needs. If anything, it's GitHub issues and PRs that are probably showing its cracks.

venturecruelty•1mo ago

No. No, no, no. Git is a fantastic choice if you want a supply chain nightmare and then a left-pad incident every week forever.

ori_b•1mo ago

Alternatively: Downloading the entire state of all packages when you care about just one, it never works out.

O(1) beats O(n) as n gets large.

gruez•1mo ago

Seems to still work out for apt?

ajb•1mo ago

Not in the same sense. An analogy might be: apt is like fetching a git repo in which all the packages are submodules, so they are fetched lazily. Some of the package managers in the article seem to be using a monorepo for all packages, including the content. Others seem to have different issues: Go wasn't including enough information in the top level, so all the submodules had to be fetched anyway, and vcpkg was doing something with tree hashes which meant they weren't really addressable.

collinmanderson•1mo ago

I consider apt kind of slow. I wish it were much faster.

born-jre•1mo ago

lol I see this as I plan on using Git for my thing store. https://github.com/blue-monads/potatoverse

gjvc•1mo ago

sqlite seems to be ideal for a package manager

sigwinch•1mo ago

I feel like the rqlite people would have a lot to say about how to coordinate your installations, especially for the high-bandwidth non-desktop installs.

https://news.ycombinator.com/item?id=45257349

mirekrusin•1mo ago

...or scm [0]

[0] https://fossil-scm.org

hk1337•1mo ago

I like Go, but its dependency management is weird and seems to be centered around GitHub a lot.

Hendrikto•1mo ago

There is nothing tying Go to GitHub.

rewgs•1mo ago

Not at all. It can grab git repos (as well as work with other VCSs). There's just a lot of stuff on GitHub, hence your impression.

andreashaerter•1mo ago

It's mostly tradition rather than a hard requirement. Go has long supported vanity import paths: https://pkg.go.dev/cmd/go#hdr-Remote_import_paths

For example, we use Hugo to provide independent Go package URLs even though the code is hosted on GitHub. That makes migrating away from GitHub trivial if we ever choose to do so (Repo: https://github.com/foundata/hugo-theme-govanity; Example: https://golang.foundata.com/hugo-theme-dev/). Usage works as expected:

  go get golang.foundata.com/hugo-theme-dev

Edit: Formatting
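For reference, the mechanism behind vanity paths is a `go-import` meta tag: when the go command resolves a custom import path, it fetches the page with `?go-get=1` appended and reads `<meta name="go-import" content="import-prefix vcs repo-root">`. A minimal Python sketch that renders such a page; the `example.com/mylib` prefix and repository URL are hypothetical placeholders, and this is not how hugo-theme-govanity itself generates its pages.

```python
from html import escape

def go_import_page(import_prefix: str, vcs: str, repo_root: str) -> str:
    """Render the minimal HTML a vanity domain serves to `go get`.

    Only the go-import meta tag matters to the go command; the rest is
    boilerplate so browsers get a valid page.
    """
    content = escape(f"{import_prefix} {vcs} {repo_root}", quote=True)
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f'<meta name="go-import" content="{content}">\n'
        "</head><body></body></html>\n"
    )

page = go_import_page("example.com/mylib", "git", "https://github.com/example/mylib")
print(page)
```

Moving the code off GitHub then only means changing the `repo-root` field; the import path that users write stays the same.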

hogrug•1mo ago

The facts are interesting but the conclusion a bit strange. These package managers have succeeded because git is better for the low trust model and GitHub has been hosting infra for free that no one in their right mind would provide for the average DB.

If it didn't work we would not have these massive ecosystems upsetting GitHub's freemium model, but anything at scale is naturally going to have consequences and features that aren't so compatible with the use case.

ifh-hn•1mo ago

So what's the answer then? That's the question I wanted answered after reading this article. With no experience with git or package management, would using a local client sqlite database and something similar on the server do?

encom•1mo ago

I quite like Gentoo's rsync based package manager. I believe they've used that since the beginning. It works well.

MarsIronPI•1mo ago

To be clear though, the rsync trees come from a central Git repo (though it's not hosted on GitHub). And syncing from Git actually makes syncing faster.

AaronFriel•1mo ago

OCI artifacts, using the same protocol as container registries. It's a protocol designed for versioning (tagging) content addressable blobs, associating metadata with them, and it's CDN friendly.

Homebrew uses OCI as its backend now, and I think every package manager should. It has the right primitives you expect from a registry to scale.
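The primitives listed above can be sketched in a few lines of Python. This is a toy model of the concepts (immutable content-addressed blobs, tags that point at digests), not the OCI Distribution HTTP API; `ToyRegistry` and the sample names are hypothetical.

```python
import hashlib

class ToyRegistry:
    """In-memory stand-in for a registry: immutable blobs keyed by digest, plus movable tags."""

    def __init__(self) -> None:
        self.blobs: dict = {}
        self.tags: dict = {}

    def push(self, data: bytes) -> str:
        # OCI-style digest string: "<algorithm>:<hex>"
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data  # identical bytes always land under the same key
        return digest

    def tag(self, name: str, digest: str) -> None:
        if digest not in self.blobs:
            raise KeyError(f"unknown blob {digest}")
        self.tags[name] = digest  # a tag is just a pointer that can be moved later

    def pull(self, ref: str) -> bytes:
        digest = self.tags.get(ref, ref)  # accept either a tag or a raw digest
        data = self.blobs[digest]
        algorithm, _, expected = digest.partition(":")
        if algorithm != "sha256" or hashlib.sha256(data).hexdigest() != expected:
            raise ValueError(f"content does not match {digest}")
        return data

reg = ToyRegistry()
d = reg.push(b"mypkg 1.0.0 bottle bytes")
reg.tag("mypkg:1.0.0", d)
print(reg.pull("mypkg:1.0.0") == reg.pull(d))  # True
```

Because a digest always names the same bytes, anything fetched by digest can be cached indefinitely, which is what makes this shape CDN friendly; only tag lookups need to be fresh.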

aniou•1mo ago

As a side note: maybe someone knows why the Rust devs chose an already-used name for their language change proposals? "RFC" was already taken and well established, and I simply refuse to believe that nobody was aware of Request for Comments; and if they were aware and created the clash deliberately, then it was rude and arrogant.

Every, ...king time, when I read something like "RFC 2789 introduced a sparse HTTP protocol." my brain suffers from a short-circuit. BTW: RFC 2789 is a "Mail Monitoring MIB".

adastra22•1mo ago

There are many, many RFC collections. Including many that predate the IETF. Some even predate computers.

aniou•1mo ago

But those were in different domains. Here we have a strong clash, because Rust positions itself as a secure systems and internet language, and computer and internet standards are already defined by RFCs. So it may not be uncommon for someone to talk about Rust mechanisms defined by a particular RFC in the context of handling a particular protocol, defined by... well... an RFC too. But not a Rust one.

Not so smart, when we realize that one aspect of a secure and reliable system is the elimination of ambiguities.

Conan_Kudo•1mo ago

Ask them, don't ask us. They have a public interface, you can ask them to change the name to something unique.

frumplestlatz•1mo ago

Since ~2002, MacPorts has used svn or git, but by default users rsync the complete port definitions + a server-generated index + a signature.

The index is used for all lookups; it can also be generated or incrementally updated client-side to accommodate local changes.

This has worked fine for literally decades, starting back when bandwidth and CPU power were far more limited.

The problem isn’t using SCM, and the solutions have been known for a very long time.

gethly•1mo ago

If we stopped using VCS to fetch source files, we would lose the ability to get the exact commit (meaning a version, independent of the underlying VCS) of these files. Git, Mercurial, SVN..., GitHub, Bitbucket... it does not matter. Absolutely nobody will build downloadable versions of their source files, hosted on domains of who knows how much "prestige", by copying them to another location just to serve the --->exact same content<--- that GitHub and the like already provide.

This entire blog is just a waste of time for anyone reading it.

throwway120385•1mo ago

Or you could just ship a tarball and an sha checksum.
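A minimal Python sketch of the consumer side, assuming the checksum is published in `sha256sum`'s text-mode format (`<hex digest>  <file name>`, two spaces); the file name and contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large tarballs never have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, checksum_line: str) -> bool:
    """checksum_line is assumed to be '<hex digest>  <file name>' (sha256sum text mode)."""
    expected, _, name = checksum_line.strip().partition("  ")
    return name == path.name and sha256_file(path) == expected.lower()

with tempfile.TemporaryDirectory() as tmp:
    tarball = Path(tmp) / "mypkg-1.0.0.tar.gz"
    tarball.write_bytes(b"pretend this is a tarball")
    line = f"{sha256_file(tarball)}  {tarball.name}"
    print(verify(tarball, line))  # True
    tarball.write_bytes(b"tampered")
    print(verify(tarball, line))  # False
```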

gethly•1mo ago

you could, in case you want to make only certain releases publicly available. but then, who wants to do that manual labour? we're talking mainstream here, not specific use cases.

layer8•1mo ago

And yet, that's pretty much how the Java world works (Maven repositories).

forrestthewoods•1mo ago

> This entire blog is just a waste of time for anyone reading it.

Well that’s an extremely rude thing to say.

Personally I thought it was really interesting to read about a bunch of different projects all running into the same wall with Git.

I also didn’t realize that Git had issues with sparse checkouts. Or maybe author meant shallow? I forget.

encom•1mo ago

>[Homebrew] Auto-updates now run every 24 hours instead of every 5 minutes[...]

That is such an insane default, I'm at a loss for words.

croemer•1mo ago

You mean the 5 minutes is insane, right?

justsomehnguy•1mo ago

Every time I see things like this, I really want to punch people over TCP/IP. UDP wouldn't suffice.

dboon•1mo ago

I’m building Cargo/UV for C. Good article. I thought about this problem very deeply.

Unfortunately, when you’re starting out, the idea of running a registry is a really tough sell. Now, on top of the very hard engineering problem of writing the code and making a world-class tool, plus the social one of getting it adopted, I need to worry about funding and maintaining something that potentially serves a world of traffic? The git solution is intoxicating through this lens.

Fundamentally, the issue is the sparse checkouts mentioned by the author. You’d really like to use git to version package manifests, so that anyone with any package version can get the EXACT package they built with.

But this doesn’t work, because you need arbitrary commits. You either need a full checkout, or you need to somehow track the commit a package version is in without knowing what hash git will generate before you do it. You have to push the package update and then push a second commit recording that. Obviously infeasible, obviously a nightmare.
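The chicken-and-egg problem comes from content addressing: an object's ID is a hash of its contents, so no object can contain its own ID. A minimal Python sketch using git's blob hashing for a default SHA-1 repository (the same principle applies to commits, whose hash covers the tree, parents, and message); the manifest text and the `git_id` field are hypothetical.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object ID git computes for file contents in a SHA-1 repository:
    SHA-1 over the header "blob <size>" plus a NUL byte, followed by the raw bytes."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

manifest = b"name = mypkg\nversion = 3.12\n"
oid = git_blob_id(manifest)

# Try to record the ID inside the file it identifies...
manifest_with_id = manifest + b"git_id = " + oid.encode() + b"\n"
# ...and the file now has a different ID, so the recorded one is already stale.
print(git_blob_id(manifest_with_id) == oid)  # False
```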

Conan’s solution is I think just about the only way. It trades the perfect reproduction for conditional logic in the manifest. Instead of 3.12 pointing to a commit, every 3.x points to the same manifest, and there’s just a little logic to set that specific config field added in 3.12. If the logic gets too much, they let you map version ranges to manifests for a package. So if 3.13 rewrites the entire manifest, just remap it.
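A rough Python sketch of that shape: version ranges map to a manifest folder, and the shared manifest gates a field on the version. The range table, folder names, and `new_field` are hypothetical and are not Conan's actual file format.

```python
from typing import NamedTuple

def parse(v: str) -> tuple:
    """'3.12.1' -> (3, 12, 1); tuples compare element by element."""
    return tuple(int(part) for part in v.split("."))

class Range(NamedTuple):
    low: tuple    # inclusive
    high: tuple   # exclusive
    folder: str   # which manifest directory covers this range

# Hypothetical layout: one manifest for 3.x up to 3.13, a rewritten one from 3.13 on.
RANGES = [
    Range(parse("3.0"), parse("3.13"), "manifests/3.x"),
    Range(parse("3.13"), parse("4.0"), "manifests/3.13+"),
]

def manifest_for(version: str, ranges: list = RANGES) -> str:
    v = parse(version)
    for r in ranges:
        if r.low <= v < r.high:
            return r.folder
    raise LookupError(f"no manifest covers {version}")

def recipe_options(version: str) -> dict:
    """The 'little logic' inside a shared manifest: gate fields on the version."""
    opts = {"shared": False}
    if parse(version) >= parse("3.12"):
        opts["new_field"] = True  # hypothetical config field introduced in 3.12
    return opts

print(manifest_for("3.12.1"), recipe_options("3.12.1"))
print(manifest_for("3.13.0"))
```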

I have not found another package manager that uses git as a backend that isn’t a terrible and slow tool. Conan may not be as rigorous as Nix because of this decision but it is quite pragmatic and useful. The real solution is to use a database, of course, but unless someone wants to wire me ten thousand dollars plus server costs in perpetuity, what’s a guy supposed to do?

adrianN•1mo ago

Before you've managed to build a popular tool, it is unlikely that you need to serve many users. Going straight for something that can serve the world is probably premature.

dboon•1mo ago

For most software, yes. But the value of a package manager is in its adoption. A package manager that doesn’t run up against these problems is probably a failure anyway.

EPWN3D•1mo ago

The point is not "design to serve the world". The point is "use the right technology for your problem space".

ambicapter•1mo ago

> Unfortunately, when you’re starting out, the idea of running a registry is a really tough sell. Now, on top of the very hard engineering problem of writing the code and making a world class tool, plus the social one of getting it adopted, I need to worry about funding and maintaining something that serves potentially a world of traffic? The git solution is intoxicating through this lense.

So you need a decentralized database? Those exist (or you can make your own, if you're feeling ambitious), probably ones that scale in different ways than git does.

dboon•1mo ago

Please share. I’m interested in anything that’s roughly as simple as implementing a centralized registry, is easily inspected by users (preferably with no external tooling), and is very fast.

It’s really important that someone is able to search for the manifest one of their dependencies uses for when stuff doesn’t work out of the box. That should be as simple as possible.

I’m all ears, though! Would love to find something as simple and good as a git registry but decentralized

strbean•1mo ago

Distributed ledger! /s... ?

jopsen•1mo ago

You don't need fully distributed database, do you?

You could just make a registry hosted as plain HTTP, with everything signed. And a special file that contains a list of mirrors.

Clients request the mirror list and the signed hash of the last entry in the Merkle tree. Then they go talk to a random mirror.

Maybe your central service requires user sign-in for publishing and reading, while mirrors can't publish but don't require sign-in.

Obviously, you'd have to validate that mirrors are up and populated. But that's it.

You can start by self hosting a mirror.

One could go with signing schemes inspired by: https://theupdateframework.io/

Or one could omit signing altogether, so long as you have a Merkle tree with hashes for all publishing events, and the latest hash entry is always fetched from your server along with the mirror list.

Having all publishing go through a single service is probably desirable. You'll eventually need to do moderation, etc. And hosting your service or a mirror becomes a legal nightmare if there is no moderation.

Disclaimer: opinions are my own.
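The unsigned variant above can be sketched in Python. The tree hashing follows the Merkle Tree Hash definition from RFC 6962 (Certificate Transparency); the event strings and `mirror_is_consistent` are hypothetical, and signing the root is left out.

```python
import hashlib

def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """RFC 6962 Merkle Tree Hash: leaves prefixed 0x00, inner nodes 0x01,
    split at the largest power of two smaller than the number of leaves."""
    n = len(leaves)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        return _sha256(b"\x00" + leaves[0])
    k = 1 << ((n - 1).bit_length() - 1)  # largest power of two < n
    return _sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))

def mirror_is_consistent(mirror_log: list, trusted_root: bytes) -> bool:
    """Recompute the root from whatever a mirror served and compare it with the
    root fetched from the central server (alongside the mirror list)."""
    return merkle_root(mirror_log) == trusted_root

log = [
    b"publish libfoo 1.0.0 sha256:aaaa",
    b"publish libbar 0.3.1 sha256:bbbb",
    b"publish libfoo 1.0.1 sha256:cccc",
]
trusted = merkle_root(log)  # in the proposal, this comes from the central server
tampered = log[:1] + [b"publish libbar 0.3.1 sha256:evil"] + log[2:]
print(mirror_is_consistent(log, trusted))       # True
print(mirror_is_consistent(tampered, trusted))  # False
```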

k8ssskhltl•1mo ago

Blockchain.

yawaramin•1mo ago

Package registry in an SQLite database, snapshotted daily and stored in a cloud bucket. New clients download the latest snapshot; existing clients stream in the updates using e.g. Litestream. Resolving dependencies should then be ultra fast thanks to indexes.
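The registry half of this proposal, sketched with Python's built-in sqlite3 module. The `versions` table, the caret-style compatibility rule, and the package name are hypothetical; Litestream's replication isn't shown, only the snapshot step via `Connection.backup`.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional

def open_registry() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE versions (
            name TEXT NOT NULL, major INTEGER NOT NULL, minor INTEGER NOT NULL,
            patch INTEGER NOT NULL, yanked INTEGER NOT NULL DEFAULT 0
        );
        -- lets lookups by (name, major) avoid scanning the whole table
        CREATE INDEX versions_by_name ON versions (name, major, minor, patch);
    """)
    return db

def newest_compatible(db: sqlite3.Connection, name: str, major: int,
                      minor: int = 0, patch: int = 0) -> Optional[tuple]:
    """Newest non-yanked version with the same major and >= major.minor.patch."""
    return db.execute(
        """SELECT major, minor, patch FROM versions
           WHERE name = ? AND major = ? AND yanked = 0
             AND (minor > ? OR (minor = ? AND patch >= ?))
           ORDER BY minor DESC, patch DESC LIMIT 1""",
        (name, major, minor, minor, patch),
    ).fetchone()

db = open_registry()
db.executemany(
    "INSERT INTO versions (name, major, minor, patch) VALUES (?, ?, ?, ?)",
    [("libfoo", 1, 0, 0), ("libfoo", 1, 2, 3), ("libfoo", 1, 4, 0), ("libfoo", 2, 0, 0)],
)
print(newest_compatible(db, "libfoo", 1, 2, 0))  # (1, 4, 0)

# The daily snapshot: copy the live database into a standalone file for the bucket.
with tempfile.TemporaryDirectory() as tmp:
    snapshot = sqlite3.connect(str(Path(tmp) / "registry-snapshot.db"))
    db.backup(snapshot)
    print(snapshot.execute("SELECT COUNT(*) FROM versions").fetchone())  # (4,)
    snapshot.close()
```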

dboon•1mo ago

I'm just a stupid systems programmer who just discovered Cloudflare. How much do you think that'd cost? Serving a pretty heavily cached SQLite database (i.e. everyone grabs the same bytes). I realize the answer depends on scale, so let's say what if Cargo or Homebrew or some such did this?

yawaramin•1mo ago

It's free: https://developers.cloudflare.com/d1/

krautsauer•1mo ago

I wonder how Meson's wraps fit into this. They used not to, but now they're throwing everything into a single repository [0]. I wonder about the motivation and how it compares to your project.

0: https://github.com/mesonbuild/wrapdb/tree/master/subprojects

mook•1mo ago

Is there a reason the users must see all of the historic data too? Why not just have a post-commit hook render the current HEAD to static files, into something like GitHub Pages?
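A sketch of the "render the current HEAD to static files" step, which a post-commit hook (e.g. a script at .git/hooks/post-commit) could run from the repository root. The source layout packages/<name>/<version>.json and the one-JSON-file-per-package output are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def render_static(src: Path, out: Path) -> list[str]:
    """Render src/<name>/<version>.json into out/<name>.json, one file per package."""
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pkg_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        # "1.2.0.json" -> key "1.2.0"; value is that version's manifest.
        versions = {f.stem: json.loads(f.read_text()) for f in sorted(pkg_dir.glob("*.json"))}
        target = out / f"{pkg_dir.name}.json"
        target.write_text(json.dumps({"name": pkg_dir.name, "versions": versions},
                                     indent=2, sort_keys=True))
        written.append(target.name)
    return written
```

The output directory is plain files, so it can be pushed to GitHub Pages or any static host, while history stays in git.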

That can be moved elsewhere / mirrored later if needed, of course. And the underlying data is still in git, just not actively used for the API calls.

It might also be interesting to look at what Linux distros do, like Debian (salsa), Fedora (Pagure), and openSUSE (OBS). They're good for this because their historic model is free mirrors hosted by unpaid people, so they don't have the compute resources.

jarofgreen•1mo ago

I'm not OP but I'll guess .... lock files with old versions of libs in them. The latest version of a library may be v2 but if most users are locked to v1.267.34 you need all the old versions too.

However a lot of the "data in git repositories" projects I see don't have any such need, and then ...

> Why not just have a post-commit hook render the current HEAD to static files, into something like GitHub Pages?

... is a good plan. Usually they make a nice static website with the data that's easy for humans to read though.

dkarl•1mo ago

Think about the article from a different perspective: several of the most successful and widely used package managers of all time started out using Git, and they successfully transitioned to a more efficient solution when they needed to.

zephen•1mo ago

Not only this, but (if I understand the article correctly) at least some of them still use git on the backend.

baobun•1mo ago

How about the Arch Linux AUR approach?

Every package has its own git repository which for binary packages contains mostly only the manifest. Sources and assets, if in git, are usually in separate repos.

This seems to not have the issues in the examples given so far, which come from using "monorepos" or colocating. It also avoids the "nightmare" you mention since any references would be in separate repos.

The problematic examples either have their assets and manifests colocated, or use a monorepo approach (colocating manifests and the global index).

dboon•1mo ago

The problem is that Arch doesn't need to quickly resolve (version -> manifest) for arbitrary versions. With Arch, /var/lib/pacman/sync/core.db has one release of a set of packages. When you install, you just grab whatever's there. Rolling release. pacman -Syu pulls the newest version of that set of packages. If you install sqlite 3.0 and then come back a few years later and "reinstall" all the Arch packages you used to have on a new machine, you'll either (a) use that exact database and pull the same version or (b) run pacman -Syu, pull the latest package database, and get the newest sqlite (say, 3.5).

There's no concept of installing sqlite 3.0 on a system where sqlite 3.5 is available.

For a language package manager, it's exactly the opposite. I could make a project with every version of sqlite the package manager has ever known about. They all must be resolvable.

If you want to do that resolution quickly (which manifest do I use for sqlite 3.0?), repo-per-package doesn't work without a bunch of machinery that makes it, IMO, not worth it.
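The difference described above is roughly this: a rolling-release database only needs the current version of each package, while a language package manager needs every historical (name, version) pair to stay addressable. A minimal sketch with a hypothetical in-memory index, placeholder URLs, and a simplified "same major version" compatibility rule.

```python
# Hypothetical index: every version ever published stays resolvable.
INDEX: dict[str, dict[tuple[int, int, int], dict]] = {
    "sqlite": {
        (3, 0, 0): {"url": "https://example.invalid/sqlite-3.0.0.tar.gz"},
        (3, 5, 0): {"url": "https://example.invalid/sqlite-3.5.0.tar.gz"},
    },
}


def parse(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(x) for x in v.split("."))
    return (major, minor, patch)


def manifest_for(name: str, version: str) -> dict:
    # Exact lookup, as a lock file needs: 3.0.0 must resolve even though 3.5.0 exists.
    return INDEX[name][parse(version)]


def newest_compatible(name: str, minimum: str) -> str:
    # Highest version with the same major that is >= minimum (simplified caret rule).
    lo = parse(minimum)
    candidates = [v for v in INDEX[name] if v[0] == lo[0] and v >= lo]
    return ".".join(map(str, max(candidates)))
```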

Pacman is the best, you'd have to pry Arch from my cold, dead hands. Just different constraints.

jopsen•1mo ago

The alluring thing is storing the repository on S3 (or similar). Recall early Docker registries making requests so complicated that backing image storage with S3 was infeasible without a proxy service.

The thing that scales is dumb HTTP that can be backed by something like S3.

You don't have to use a cloud, just go with a big single server. And if you become popular, find a sponsor and move to cloud.

If money and sponsor independence is a huge concern the alternative would be: peer-to-peer.

I haven't seen many package managers do it, but it feels like a huge missed opportunity. You don't need that many volunteers to peer in order to have a lot of bandwidth available.

Granted, the real problem that'll drive up hosting cost is CI. Or rather careless CI without caching. Unless you require a user login, or limit downloads for IPs without a login, caching is hard to enforce.

For popular package repositories you'll likely see extremely degenerate CI systems eating bandwidth as if it was free.

Disclaimer: opinions are my own.

dpedu•1mo ago

> I’m building Cargo/UV for C.

Interesting! Do you mind sharing a link to the project at this point?

dboon•1mo ago

Sure! It's very raw, though. There's a lot of functionality, and I use it to build all sorts of projects already. But a common thing I do is to write the stupidest possible version of a thing and only do the hard engineering when it becomes untenable. Hence it's not raw as in being new or bare, but it's very raw in that you'll see some really rough stuff in the code.

But, that being said, here's the repo! I added a very basic README for you. It's one command to bootstrap to a self-hosting build, so give it a shot if you're interested. My contact is in my profile.

https://github.com/tspader/spn

kibwen•1mo ago

I think there's a form of survivorship bias at work here. To use the example of Cargo, if Rust had never caught on, and thereby gotten popular enough to inflate the git-based index beyond reason, then it would never have been a problem to use git as the backing protocol for the index. Likewise, we can imagine innumerable smaller projects that successfully use git as a distributed delta-updating data distribution protocol, and never happen to outgrow it.

The point being, if you're not sure whether your project will ever need to scale, then it may not make sense to reinvent the wheel when git is right there (and then invent the solution for hosting that git repo, when Github is right there), letting you spend time instead on other, more immediate problems.

stickfigure•1mo ago

Right, this post may encourage premature optimization. Cargo, Homebrew, et al. chose an easy, good-enough solution which allowed them to grow until they hit scaling limits. This is a good problem to have.

I am sure there's value having a vision for what your scaling path might be in the future, so this discussion is a good one. But it doesn't automatically mean that git is a bad place to start.

8note•1mo ago

I'm surprised nobody has made a common DB for package managers, so Cargo could use it without having to think about it.

kibwen•1mo ago

I mean, it's sort of the other way around. Cargo was built to be able to natively understand git-based dependencies, in the sense that you can bypass a crate registry and instead just point it directly at a git repo. That means that Cargo already had to have the ability to clone git repos, and so when it came to decide how to implement the index (which looks pretty similar to a git repo if you squint), choosing to use git required them to add literally no new dependencies and almost no new code.

Let's also keep in mind that the use case mentioned in the OP is specifically about the index, which is just the data structure that informs the version resolver how to resolve versions. When it came time to replace the git-based index, Cargo didn't replace it with a specialized database, it replaced it with HTTP endpoints (which are probably just backed by an off-the-shelf database). It's not clear what sort of specialized database would be useful to abstract this for other package managers.
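For reference, the HTTP-based ("sparse") index Cargo switched to is still static files at predictable paths, which is why it can sit on dumb storage behind a CDN. The path rule below follows the layout documented for Cargo's index format; the function names are illustrative and the default base URL assumes crates.io's sparse index.

```python
def index_path(crate: str) -> str:
    """Relative path of a crate's file in Cargo's index layout (names are lowercased)."""
    name = crate.lower()
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # Everything else: first two chars / next two chars / full name.
    return f"{name[:2]}/{name[2:4]}/{name}"


def sparse_url(crate: str, base: str = "https://index.crates.io") -> str:
    # Each such file holds one JSON line per published version of the crate.
    return f"{base}/{index_path(crate)}"
```

A resolver fetches only the files for crates it actually touches, instead of cloning the whole index.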

inferiorhuman•1mo ago

Keep in mind that crates.io, the main crate registry, uses GitHub as its only authentication method. They may have moved away from git but they're still locked into a rather piss poor vendor.

kibwen•1mo ago

No, crates.io isn't locked to GitHub. crates.io uses GitHub as an identity provider, but there's nothing stopping them from adding more. Furthermore, they've avoided tying themselves to GitHub in other ways, for example by resisting all the people telling them to allow using GitHub usernames as package namespaces, specifically to avoid being locked to GitHub.

nacozarina•1mo ago

successful things often have humble origins, it’s a feature not a bug

for every project that managed to out-grow ext4/git there were a hundred that were well-served and never needed to over-invest in something else

PunchyHamster•1mo ago

The article's conclusion is just... not good. There are many benefits to using Git as a backend: you can point your project at any single commit as a version, which makes testing fixes or changes in libs super easy; it has built-in integrity control; and technically (sadly not in practice) you could just sign commits and use that to verify whether a package is authentic.
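The built-in integrity control comes from content addressing: every object is named by a hash of its contents, and a commit id transitively covers its tree and blobs. A sketch of how a blob id is computed, assuming the default SHA-1 object format (repositories using the newer SHA-256 object format hash differently).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git names a blob by SHA-1 over the header "blob <size>\0" followed by the bytes.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

This matches what `echo hello | git hash-object --stdin` reports, so anyone holding a commit id can recheck every file they received.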

Its being suboptimal bandwidth-wise is frankly just a technical hurdle to get over, with benefits well worth the drawback.

0xbadcafebee•1mo ago

YOLO software engineering, the hallmark of the 21st century

cesarb•1mo ago

One of these is not like the others...

> The problem was that go get needed to fetch each dependency’s source code just to read its go.mod file and resolve transitive dependencies.

This article is mixing two separate issues. One is using git as the master database storing the index of packages and their versions. The other is fetching the code of each package through git. They are orthogonal; you can have a package index using git but the packages being zip/tar/etc archives, you can have a package index not using git but each package is cloned from a git repository, you can have both the index and the packages being git repositories, you can have neither using git, you can even not have a package index at all (AFAIK that's the case for Go).

bobpaw•1mo ago

I think the article takes issue not with fetching the code, but with fetching the go.mod file that contains index and dependency information. That’s why part of the solution was to host go.mod files separately.
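Hosting go.mod files separately corresponds to the module proxy protocol: a client can fetch just `<module>/@v/<version>.mod` instead of the whole source tree. A sketch of building that URL, including the protocol's case-encoding of uppercase letters; the helper names are hypothetical and the default proxy is the public proxy.golang.org.

```python
def escape_module(path: str) -> str:
    # Case-encoding: each uppercase letter becomes "!" followed by its lowercase form,
    # so paths stay unambiguous on case-insensitive storage.
    return "".join("!" + c.lower() if c.isupper() else c for c in path)


def mod_url(module: str, version: str, proxy: str = "https://proxy.golang.org") -> str:
    # $GOPROXY/<module>/@v/<version>.mod serves only the go.mod for that version.
    return f"{proxy}/{escape_module(module)}/@v/{escape_module(version)}.mod"
```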

jayd16•1mo ago

Even with git, it should be possible to grab the single file needed without the rest of the repo, but it's still trying to fit a square peg into a round hole.

skywhopper•1mo ago

Honestly I think the article is a bit ahistorical on this one. ‘go get’ pulls the source code into a local cache so it can build it, not just to fetch the go.mod file. If they were having slow CI builds because they didn’t or couldn’t maintain a filesystem cache, that’s annoying, but not really a fault in the design. Anyway, Go improved the design and added an easy way to do faster, local proxies. Not sure what the critique is here. The Go community hit a pain point and the Go team created an elegant solution for it.

everforward•1mo ago

I was thinking this too. I think it might be talking about operations like “go mod tidy” or update operations where it updates your go.mod/sum but doesn’t actually build the code. I would guess enterprise tools do a lot of checking whether there are updates without actually doing any building.

kpcyrd•1mo ago

The author seems a little lost tbh, it's starting with "your users should not all clone your database" which I definitely agree with, but that doesn't mean you can't encode your data in a git graph.

It then digresses into implementation details of GitHub's backend (how are 20k forks relevant?), then complains about default settings of the "standard" git implementation. You don't need to check out a git working tree to have efficient key-value lookups. Without a git working tree you don't need to worry about filesystem directory limits, case sensitivity and path length limits.

I was surprised the author believes the git-equivalent of a database migration is a git history rewrite.

What do you want me to do, invent my own database? Run postgres on a $5 VPS and have everybody accept it as single-point-of-failure?

Spivak•1mo ago

> Run postgres on a $5 VPS and have everybody accept it as single-point-of-failure

Oh how times have changed. Yes, maybe run two $5 VPSs behind a load balancer for HA so you can patch and then put a CDN in front of it to serve the repository content globally to everyone. Sign the packages cryptographically so you can invite people in your community to become mirrors.

How do people think PyPI, RubyGems, CPAN, Maven Central, or distro Packages work?

kpcyrd•1mo ago

Sure let me put all that on my credit card because some guy doesn't like git.

The situation that PyPI is in is clearly worse: https://stackoverflow.com/questions/39537938/how-do-i-downlo...

Spivak•3w ago

You wouldn't be the one paying for it; like with PyPI, you would upload your package to them.

When you bootstrap your package ecosystem using git forges for hosting there's no index at all so I'm not really sure what the argument is.

kpcyrd•3w ago

The target audience for the article are people building these systems, so the people who would have to pay for the centralized infrastructure.

With git there's a sync protocol built in that allows anybody who's interested to pull a copy of the index (this shouldn't be the default distribution model for the package clients, but anybody who truly wants it can pull it). PyPI is keeping their index private and you'd have to scrape all data through a heavily rate-limited API.

sghiassy•1mo ago

Use a shallow clone (git clone --depth) and you’ll only download the most recent commits. Yeesh

ZenoArrow•1mo ago

Did you read the article? It references the server-side overhead for shallow clones.

sghiassy•1mo ago

The “server” cost you’re referencing is the CI system running a shallow git clone and then brew update on GitHub's CI servers.

cben•1mo ago

That's not how I understood it. Full clones are big but simple — the server just sends all the packfiles. A first shallow clone needs some server work, but that's cachable, OK.

But then on subsequent interactions between the git server and a git client that made a shallow clone some time ago, it's AFAIU actually expensive for the git server to compute the portion this particular client doesn't yet have.

Intuitively, and very hand-wavingly, I suspect things could be improved by:

(1) clients relaxing "exact depth" requests to "give me approximately N days of stuff, over-sending being OK", and servers relaxing "minimal traffic" to roughly map time ranges to whole packfiles: a CPU/traffic tradeoff.

(2) allowing servers to under-send too (makes (1) tradeoffs easier), with the client asking for missing parts right away and/or later. That needs on-demand fetching to be transparent to the user. With the "promisor" mechanism in "partial clones" this sounds more realistic?

(3) storing history/trees/blobs in entirely separate packfiles(?) I suspect recent years' work on bitmaps & MIDX moves in this direction, only less naively?

I'm not saying Git can scale as well as a DB, but I do feel we sat on an effectively frozen Git format & protocol for a ~decade, and are now exploring more of the design space so hope future will be less clear-cut...

And specifically, partial clones remove the hard "fully offline vs. centralized" dichotomy we long clung to. Assuming you stay online (necessary anyway if you consider HTTP/DB), things that used to be up-front UX decisions can now be matters of perf tuning!

* The most dramatic win is if you had to fetch info from every package's separate repo, like Go did. Then, a central DB/caching proxy can build global indexes, unlocking huge wins, no question. It's like "1+N" issues. However, most examples other than Go in the article talk of a single Git repo already storing a global view (still leaving opportunity for custom indexing and querying).

dleslie•1mo ago

GitHub is intoxicatingly free hosting, but Git itself is a terrible database. Why not maintain an _actual_ database on GitHub, with tagged releases?

SQLite data is paged, so you can get away with only fetching the pages you need to resolve your query.
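The linked approach works because a client can ask a static host for byte ranges (HTTP Range requests), and an SQLite file is a sequence of fixed-size pages. A sketch of the arithmetic only, with hypothetical helper names: it reads the page size from the database header and computes the Range header for a given page, without doing any HTTP.

```python
import struct


def page_size_from_header(header: bytes) -> int:
    # Bytes 16-17 of the 100-byte SQLite header hold the page size (big-endian);
    # the special value 1 means 65536.
    (raw,) = struct.unpack(">H", header[16:18])
    return 65536 if raw == 1 else raw


def page_size_of(path: str) -> int:
    with open(path, "rb") as f:
        return page_size_from_header(f.read(100))


def range_header(page_number: int, page_size: int) -> str:
    # Pages are numbered from 1; page N occupies bytes [(N-1)*size, N*size).
    start = (page_number - 1) * page_size
    return f"bytes={start}-{start + page_size - 1}"
```

A query that touches a few B-tree pages then costs a few small range requests rather than a download of the whole file.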

https://phiresky.github.io/blog/2021/hosting-sqlite-database...

jarofgreen•1mo ago

This seems to be about hosting an SQLite database on a static website like GitHub Pages. This can be a great plan; there is also Datasette in a browser now: https://github.com/simonw/datasette-lite

But that's different from how you collect the data in a git repository in the first place - or are you suggesting just putting a Sqlite file in a git repository? If so I can think of one big reason against that.

dleslie•1mo ago

Yes, I'm suggesting hosting it on GitHub, leveraging their Git LFS support. Just treat it like a binary blob and periodically update it with a tagged release.

jarofgreen•1mo ago

It's not clear if you are suggesting accepting contributions to the SQLite file via PR from people (but accepting contributions is generally the point of why people put these on projects on GitHub).

But if you are I wouldn't recommend it.

PRs won't be able to show diffs. Worse, as soon as multiple people send a PR at once you'll have a really painful merge to resolve, and GitHub's tools won't help you at all. And you can't edit the files in GitHub's web UI.

I recommend one file per record, JSON, YAML, whatever non-binary format you want. But then you get:

* PRs with diffs that show you what's being changed

* Files that technical people can edit directly in GitHub's web editor

* If 2 people make PRs on different records at once it's an easy merge with no conflicts

* If 2 people make PRs on the same record at once ... ok, you might now have a merge conflict to resolve but it's in an easy text file and GitHub UI will let you see what it is.

You can of course then compile these data files into a SQLite file that can be served in a static website nicely - in fact if you see my other comments on this post I have a tool that does this. And on that note, sorry, I've done a few projects in this space so I have views :-)
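The "one file per record, compiled into SQLite" pipeline can be sketched in a few lines. The layout (records/*.json, one object per file, file name used as the id) and the single id/JSON-blob table are assumptions; a real schema would likely break fields out into columns.

```python
import json
import sqlite3
from pathlib import Path


def compile_records(records_dir: Path, db_path: str) -> int:
    """Load every records_dir/*.json file into one SQLite table; return the row count."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS records")
    conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
    for f in sorted(records_dir.glob("*.json")):
        record = json.loads(f.read_text())  # raises on malformed JSON, failing the build
        conn.execute("INSERT INTO records VALUES (?, ?)",
                     (f.stem, json.dumps(record, sort_keys=True)))
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM records").fetchone()
    conn.close()
    return count
```

Contributors keep editing small text files with readable diffs, and the binary database is only ever a build artifact.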

dleslie•1mo ago

Nah, git is terrible with binaries. But the SQL database can be rebuilt periodically; the problem being solved is replacing the git querying with SQL.

Could even follow your record model, and use that as data to populate the db.

xpressvideoz•1mo ago

The article lists Git-based wiki engines as a bad usage of Git. Can anybody recommend alternatives? I want something that can be self-hosted, is easily modified by text editors, and has individual page history, preferably with Markdown.

cbondurant•1mo ago

Admittedly, I try to stay away from database design whenever possible at work. (Everything database is legacy for us.) But the way the term is being used here kinda makes me wonder: do modern SQL databases have enough security features and permissions management systems in place that you could just directly expose your database to the world with a "guest" user that can only make incredibly specific queries?

Cut out the middle man, directly serve the query response to the package manager client.

(I do immediately see issues stemming from the fact that you can't leverage features like edge caching this way, but I'm not really asking if it's a good solution; I'm more asking if it's possible at all.)

brendoncarroll•1mo ago

I personally think that this is the future, especially since such an architecture allows for E2E encryption of the entire database. The protocol should just be a transaction layer for coordinating changes of opaque blobs.

All of the complexity lives on the client. That makes a lot of sense for a package manager because it's something lots of people want to run, but no one really wants to host.

bob1029•1mo ago

There are still no realistic ways to expose a hosted SQL solution to the public without really unhappy things occurring. It doesn't matter which vendor you pick.

Anything where you are opening a TCP connection to a hosted SQL server is a non-starter. You could hypothetically have so many read replicas that no one could blow anyone else up, but this would get to be very expensive at scale.

Something involving SQLite is probably the most viable option.

IshKebab•1mo ago

Feels like there's an opening in the market there. Why can't you expose an SQL server to the public?

Also Stack Overflow exposes a SQL interface, so it isn't totally impossible.

mirekrusin•1mo ago

You can use fossil [0]

[0] https://fossil-scm.org

zX41ZdbW•1mo ago

ClickHouse can do it. Examples:

    https://play.clickhouse.com/

    clickhouse-client --host play.clickhouse.com --user play --secure

    ssh play.clickhouse.com

baobun•1mo ago

Yes but CH is not SQL.

Hasnep•1mo ago

Yes, SQL is a query language and clickhouse is a database that uses SQL as a query language, but I don't see why that's relevant.

yawaramin•1mo ago

There's no need to have a publicly accessible database server; just put all the data in a single SQLite database and distribute that to clients. It's possible to do streaming updates by just zipping up a text file containing all the SQL commands and letting clients download that. A more sophisticated option is e.g. Litestream.
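A sketch of the "zipped text file of SQL commands" update path, with hypothetical helper names: the server gzips a batch of statements and the client applies it in a single transaction. Running SQL downloaded from the network means fully trusting its source, so a real client would check a signature or hash before applying it (not shown).

```python
import gzip
import sqlite3


def make_delta(statements: list[str]) -> bytes:
    # Server side: one batch of SQL statements, gzipped as plain text.
    return gzip.compress(("\n".join(statements) + "\n").encode())


def apply_delta(conn: sqlite3.Connection, delta: bytes) -> None:
    # Client side: decompress and run the batch all-or-nothing.
    script = gzip.decompress(delta).decode()
    try:
        conn.executescript("BEGIN;\n" + script + "COMMIT;\n")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.rollback()  # leave the local copy at the previous consistent state
        raise
```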

dromologist•1mo ago

We wanted to pull updated code in our undockerized instances when they were instantiated, so we decided to pull the code from GitHub. Worked out pretty well though after a thousand trials we got a 502 and now we're one step closer to being forced into a CD pipeline.

keithgroves•1mo ago

When building https://enact.tools we considered this. I'm glad we didn't go this route.

jarofgreen•1mo ago

It's not just package managers that do this; a lot of smaller projects crowdsource data in git repositories. Most of these don't reach the scale where the technical limitations become a problem.

Personally my view is that the main problem when they do this is that it gets much harder for non-technical people to contribute. At least that doesn't apply to package managers, where it's all technical people contributing.

There are a few other small problems - but it's interesting to see that so many other projects do this.

I ended up working on an open source software library to help in these cases: https://www.datatig.com/

Here's a write up of an introduction talk about it: https://www.datatig.com/2024/12/24/talk.html I'll add the scale point to future versions of this talk with a link to this post.

Hasnep•1mo ago

Oh, this would have been great for a project I was working on a while ago! I'll have to keep it in mind for the future. Thanks for sharing

pizlonator•1mo ago

What is the alternative?

"Use a database" isn't actionable advice because it's not specific enough

yawaramin•1mo ago

Use an SQLite database file, stream out delta updates to clients using zipped plaintext or Litestream or something.

holyknight•1mo ago

It’s basically the same thing that always happens when you choose a technology because it’s convenient rather than a great fit for your problem. Sooner or later, you’ll hit a wall. Just because you can cook a salmon in your dishwasher doesn’t mean you should.

BlueTemplar•1mo ago

Wait, isn't Fossil based on SQLite?

Or does Fossil itself still have the same issues?

dwardu•1mo ago

Worst thing is when you’re in an office and your PC, along with other PCs, pulls from git unauthenticated, and then you get hit with API limits.

Ericson2314•1mo ago

The Nixpkgs example is not like the others, because it is source code.

I don't get what is so bad about shallow clones either. Why should they be so performance sensitive?

ajb•1mo ago

In a compressed format, later commits would be added as a delta of some kind, to avoid increasing the size by the whole tree size each time. To make shallow clones efficient you'd need to rewrite the compressed form such that earlier commits are instead deltas on later ones, or something equivalent.

__MatrixMan__•1mo ago

It also seems like it's not git that's emitting scary creaks and groans, but rather GitHub. As much as it would be a bummer to forgo some of GitHub's nice-to-have features, I expect we could survive without some of it.

mindslight•1mo ago

Furthermore, the issues given for nixpkgs are actually demonstrating the success of using git as the database! Those 20k forks are all people maintaining their own version of nixpkgs on GitHub, right? Each is its own independent tree that users can just go ahead and modify for their own whims and purposes, without having to overcome the activation energy of creating their own package repository.

If 83GB (4MB/fork) is "too big" then responsibility for that rests solely on the elective centralization encouraged by GitHub. I suspect if you could total up the cumulative storage used by the nixpkgs source tree on computers throughout the world, it would be many orders of magnitude larger.

__MatrixMan__•1mo ago

Agreed, nix really makes it easy to go from solving the problem for yourself to solving it for everybody. Not much else is easy, but when it comes to building an open source community, that criterion is a pretty powerful one.

MarsIronPI•1mo ago

Exactly. Gentoo's main package repo is hosted in Git (but not GitHub, except as a mirror). Now, most users fetch it via rsync, but actually using the Git repo IME makes syncing faster, not slower. Though it does make the initial fetch slower.

kccqzy•1mo ago

Shallow clones themselves aren’t the issue. It’s that updating shallow clones requires the server to spend a bunch of CPU time and GitHub simply isn’t willing to provide that for free.

The solution is simple: using a shallow clone means that the use case doesn’t care about the history at all, so download a tarball of the repo for the initial download and then later rsync the repo. Git can remain the source of truth for all history, but that history doesn’t have to be exposed.

collinmanderson•1mo ago

Can you rsync a repo from GitHub?

teiferer•1mo ago

And this, my friends, is the reason why (only) focusing on CPU cycles and memory hierarchies is insufficient when thinking about the performance of a system. Yes, they are important. But no level of low-level optimization will get you out of the hole that a wrong choice of algorithm and/or data structure may have dug you into.

iamwil•1mo ago

This sounds like a missing piece of software in the OSS world. If you have the inclination, you should write it.

yawaramin•1mo ago

People have: https://0install.net/

weiwenhao•1mo ago

For package management software that is rarely used, free is the biggest motivation.

mukundesh•1mo ago

Though not GitHub, it's worth mentioning Hugging Face, which also uses git but manages large files with their(?) Xet protocol. https://huggingface.co/docs/hub/en/xet/index

drzaiusx11•1mo ago

One of the first things I did at my current place of employment was to detangle the mess of Gemfile git dependencies and get them to adopt semver and an actual package repo. There were so many footguns with git dependencies in Ruby that we were getting taken down by friendly fire daily...

drzaiusx11•1mo ago

I'd add git Gemfile dependencies to the list of languages called out here as well. It supports git repos, but in general it's a bad idea unless you are diligent with git tag use and disallow git tag mutability, which also assumes you have complete control of your git dependencies...

newswangerd•1mo ago

It’s always humbling when you go on the front page of HN and see an article titled “the thing you’re doing right now is a bad idea and here’s why”

This has happened to me a few times now. The last one was a fantastic article about how PG Notify locks the whole database.

In this particular case it just doesn’t make a ton of sense to change course. I'm a solo dev building a thing that may never take off, so using git for plug-in distribution is just a no-brainer right now. That said, I’ll hold on to this article in case I’m lucky enough to be in a position where scale becomes an issue for me.

baobun•1mo ago

The good news is you can more easily avoid some of the pitfalls now, even as you stick with it. Some good points in the comments.

I don't know if you rely on github.com but IMO vendor lock-in there might be a bigger issue which you can avoid.

newswangerd•1mo ago

Yeah, I'm implementing a couple of things to make my life easier in the future. I don't use any GitHub APIs and I'm setting up my clients to load the plugin repo URLs from my server so I can change them later if I need to. I want all of the resources my clients need to come from my domain name so I can move things around if I need to.

ekjhgkejhgk•1mo ago

Uncertain if this is OT, but given that the CCC is a politically inspired organization, I hope not:

One thing that still seems absent is awareness of the complete takeover of "gadgets" in schools. Schools these days, as early as primary school, shove screens in front of children. They're expected to look at them, and "use" them for various activities, including practicing handwriting. I wish I was joking [1].

I see two problems with this.

First is that these devices are engineered to be addictive by way of constant notifications/distractions, and learning is something that requires long sustained focus. There's a lot of data showing that under certain common circumstances, you do worse learning from a screen than from paper.

Second is implicitly it trains children to expect that anything has to be done through a screen connected to a closed point-and-click platform. (Uninformed) people will say "people who work with computers make money, so I want my child to have an ipad". But interacting with a closed platform like an ipad is removing the possibilities and putting the interaction "on rails". You don't learn to think, explore and learn from mistakes, instead you learn to use the app that's put in front of you. This in turn reinforces the "computer says no" [2] approach to understanding the world.

I think this is a matter of civil rights and freedom, but sadly I don't often see "civil rights" organizations talk about this. I think I heard Stallman say something along these lines once, but other than that I don't see campaigns anywhere.

[1] https://www.letterjoin.co.uk/

[2] https://youtu.be/eE9vO-DTNZc

AceJohnny2•1mo ago

It looks like you commented on the wrong post, although I don't immediately see a front-page post about the ongoing Chaos Computer Congress.

kzrdude•1mo ago

it's here https://news.ycombinator.com/item?id=46386211 (and it was last on the front page at the moment)

ekjhgkejhgk•1mo ago

LOL sorry. You're right. I'll copy paste over there.

grumbel•1mo ago

Do we have distributed databases that regular users can clone, modify and merge?

stephenlf•1mo ago

Omarchy

jama211•1mo ago

“It never works out” - hmm, seems like it worked out just fine: it worked great to get the operation off the ground, and when scale became an issue it was solvable by moving to something else. It served its purpose; sounds like it worked out to me.

swiftcoder•1mo ago

You appear to have glossed over the two projects in the list that are stuck due to architectural decisions, and don't have any route to migrate off of git-as-database?

hombre_fatal•1mo ago

Be more specific because I just see a list of workarounds deployed once they had the scale to warrant them, supporting the OP’s claim.

swiftcoder•1mo ago

Read the vcpkg section: it explicitly states that they have no solution on the horizon. The nix section also doesn’t explain any potential solution.

baobun•1mo ago

The issues with nixpkgs stem from the fact that it is a monorepo for all packages, doubling as an index.

The issues are only fundamental with that architecture. Using a separate repo for each package, like the Arch User Repos, does not have the same problems.

Nixpkgs certainly could be architected like that, and submodules would be a graceful migration path. I'm not aware of any discussion of this, but I'd guess that what's preventing it might be that github.com tooling makes it very painful to manage thousands of repos for a single project.

So I think the lesson is not that using git as a database is bad, but that using github.com as a database is. PRs as database transactions are clunky, and GitHub Actions isn't really ACID.

yawaramin•1mo ago

It's not a monorepo though? It's a package index, it has the package metadata. It doesn't have the actual source code of the projects themselves.

baobun•1mo ago

Point being it carries both the index (versions/pointers) and full metadata + build instructions for all packages in single repo.

The index could be split from the build and the package build defs could live in independent repos (like go or aur).

It would probably take some change to nix itself to make that work and some nontrivial work on tooling to make the devex decent.

But I don't think the friction with nixpkgs should be seen as damning for backing a package registry with git in general.

jama211•1mo ago

It’s a fair criticism, and this article does serve well as a warning for people to try and avoid this issue from the start.

lijok•1mo ago

Nooo you don’t get it - it didn’t scale from 0 to a trillion users so it’s a garbage worthless system that “doesn’t scale”.

zephen•1mo ago

^^^ Poe's Law may or may not apply to the above comment.

efitz•1mo ago

When you start out with a store like git, with file system semantics and a client that has to be smart to handle all the compare and merge operations, it’s practically impossible to migrate a large client base to a new protocol. It takes years, lots of user complaints, and random breakage.

Much better to start with an API. Then you can have the server abstract the store and the operations - use git or whatever - but you can change the store later without disrupting your clients.
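A minimal sketch of that idea in Python, assuming a hypothetical `IndexStore` interface: clients only ever talk to the API handler, so the backing store (a git checkout, a SQLite file, anything else) can be swapped without changing what clients see. All names here are illustrative, not any real registry's API.

```python
import json
from typing import Protocol


class IndexStore(Protocol):
    """Hypothetical storage backend; clients never see which one is used."""

    def versions(self, name: str) -> list[str]: ...


class DictStore:
    """Toy in-memory backend standing in for git, SQLite, etc."""

    def __init__(self, data: dict[str, list[str]]) -> None:
        self._data = data

    def versions(self, name: str) -> list[str]:
        return list(self._data.get(name, []))


def handle_get_versions(store: IndexStore, name: str) -> tuple[int, str]:
    """The stable client-facing contract: (HTTP status, JSON body)."""
    versions = store.versions(name)
    if not versions:
        return 404, json.dumps({"error": f"unknown package {name!r}"})
    return 200, json.dumps({"name": name, "versions": versions})
```

Swapping `DictStore` for a git-backed or database-backed class changes nothing for clients, which is the migration freedom described above.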

jama211•1mo ago

That costs hosting money, no? That might be a bigger problem for someone starting out than scalability.

leoh•1mo ago

I couldn't agree more strongly. There is a huge opportunity to make git more effective for this kind of use-case, not to abandon it. The essay in question provides no compelling alternative; it therefore reaches an entirely half-baked conclusion.

jama211•1mo ago

A good point!

mikepurvis•1mo ago

The nix CLI almost exclusively pulls GitHub sources as zipballs. Not perfect, but certainly far faster than a real git clone.

pxc•1mo ago

That it supports fetching via Git as well as via various forge-specific tarballs, even for flakes, is pretty nice. It means that if your org uses Nix, you can fall back to distribution via Git as a solution that doesn't require you to stand up any new infra or tie you to any particular vendor, but once you get rolling it's an easy optimization to switch to downloading snapshots.
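A rough illustration of that fallback, assuming GitHub's `https://github.com/<owner>/<repo>/archive/<rev>.tar.gz` archive URLs and a plain Git clone everywhere else. The function name and return shape are hypothetical, and Nix's own fetchers are considerably more involved:

```python
from urllib.parse import urlparse


def fetch_plan(repo_url: str, rev: str) -> tuple[str, str]:
    """Hypothetical helper: ("tarball", url) for GitHub, else ("git", url) to clone at rev."""
    parsed = urlparse(repo_url)
    if parsed.hostname == "github.com":
        # Snapshot download: one HTTP request, no Git protocol negotiation.
        owner, repo = parsed.path.strip("/").removesuffix(".git").split("/")[:2]
        return "tarball", f"https://github.com/{owner}/{repo}/archive/{rev}.tar.gz"
    # Vendor-neutral fallback: any Git server works; the caller checks out `rev`.
    return "git", repo_url
```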

The most pain probably just comes from the hugeness of Nixpkgs, but I remain an advocate for the huge monorepo of build recipes.

mikepurvis•1mo ago

Yes agreed. It’s possible to imagine some kind of cached-deltas scheme to get faster/smaller updates, but I suspect the folks who would have to build and maintain that are all on gigabit internet connections and don’t feel the complexity is worth it.

pxc•1mo ago

> It’s possible to imagine some kind of cached-deltas scheme to get faster/smaller updates

I think the snix¹ folks are working on something like this for the binary caches— the greater granularity of the content-addressing offers morally the same kind of optimization as delta RPMs: you can download less of what you don't need to re-download.
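To illustrate the "download less" idea, here is a minimal sketch of chunk-level content addressing, assuming fixed-size chunks and SHA-256 (real systems, snix included, may chunk and hash quite differently). The client hashes what it already has and only fetches chunks whose hashes it lacks:

```python
import hashlib


def chunk_hashes(data: bytes, size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and return their SHA-256 digests."""
    return [
        hashlib.sha256(data[i : i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def missing_chunks(wanted: list[str], have: set[str]) -> list[str]:
    """Chunk hashes still needed, in order, each listed once."""
    seen: set[str] = set()
    out: list[str] = []
    for h in wanted:
        if h not in have and h not in seen:
            seen.add(h)
            out.append(h)
    return out
```

Fixed-size chunking is the simplest choice but shifts every boundary after an insertion; content-defined chunking avoids that at the cost of more complexity.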

But I'm not aware of any current efforts to let people download the Nixpkgs tree itself more efficiently. Somehow caching Git deltas would be cool. But I'd expect that kind of optimization to come from a company that runs a Git forge, if it's generally viable, and to benefit many projects other than Nix and Nixpkgs.

1: https://snix.dev/

mikepurvis•1mo ago

Yes indeed. That said nix typically throws away the .git dir so it would require some work to adapt a solution to nix that operates at the git repo level.

The ideal for nix would be “I have all content at commit X and need the deltas for content at commit Y”, and I suspect nix would be fairly unique in being able to benefit from that. To the point that it might actually make sense to just implement fast git repo syncs and have a local client serving those tarballs to the nix daemon.
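The "I have commit X, give me the deltas for commit Y" request maps naturally onto content-addressed trees. A minimal sketch, assuming each snapshot is flattened into a hypothetical `{path: content_hash}` manifest: the delta is just the added, changed, and removed paths, and only the added or changed content needs to travel.

```python
from typing import NamedTuple


class TreeDelta(NamedTuple):
    added: dict[str, str]    # path -> content hash, new in Y
    changed: dict[str, str]  # path -> content hash, different in Y
    removed: list[str]       # paths present in X but not in Y


def diff_manifests(x: dict[str, str], y: dict[str, str]) -> TreeDelta:
    added = {p: h for p, h in y.items() if p not in x}
    changed = {p: h for p, h in y.items() if p in x and x[p] != h}
    removed = sorted(p for p in x if p not in y)
    return TreeDelta(added, changed, removed)


def apply_delta(x: dict[str, str], d: TreeDelta) -> dict[str, str]:
    """Rebuild manifest Y from manifest X plus the delta."""
    gone = set(d.removed)
    out = {p: h for p, h in x.items() if p not in gone}
    out.update(d.added)
    out.update(d.changed)
    return out
```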

didip•1mo ago

So… what we need is globally distributed git seeders for all open source GitHub content, then?

Seems possible if every git client is also a torrent client.

the__alchemist•1mo ago

The Cargo example at the top is striking. Whenever I publish a crate, and it blocks me until I write `--allow-dirty`, I am reminded that there is a conflation between Cargo/crates.io and Git that should not exist. I will write `--allow-dirty` because I think these are two separate functionalities that should not be coupled. Crates.io should not know about or care about my project's Git usage or lack thereof.

cesarb•1mo ago

> The Cargo example at the top is striking. Whenever I publish a crate, and it blocks me until I write `--allow-dirty`, I am reminded that there is a conflation between Cargo/crates.io and Git that should not exist. I will write `--allow-dirty` because I think these are two separate functionalities that should not be coupled.

That's completely unrelated.

The --allow-dirty flag is to bypass a local safety check which prevents you from accidentally publishing a crate with changes which haven't been committed to your local git repository. It has no relation at all to the use of git for the index of packages.
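For readers unfamiliar with that safety check, here is a rough sketch of what a "refuse to publish a dirty tree" guard can look like, assuming it is driven by `git status --porcelain`. This is not Cargo's actual implementation; `check_publishable` and its `allow_dirty` parameter are hypothetical stand-ins for the behaviour behind `--allow-dirty`.

```python
import subprocess


def dirty_paths(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` (v1) output into the affected paths."""
    # Each line is "XY <path>": two status characters, a space, then the path.
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def check_publishable(repo: str, allow_dirty: bool = False) -> None:
    """Hypothetical guard: raise if the working tree has uncommitted changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    dirty = dirty_paths(out)
    if dirty and not allow_dirty:
        raise RuntimeError(f"{len(dirty)} uncommitted file(s): {', '.join(dirty)}")
```

The check only reads local repository state; it says nothing about how the package index itself is stored.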

> Crates.io should not know about or care about my project's Git usage or lack thereof.

There are good reasons to know or care. The first is to provide a link from the crates.io page to your canonical version control repository. The second is to add a file containing the original commit identifier (commit hash in the case of git) which was used to generate the package, to simplify auditing that the contents of the package match what's in the version control repository (to help defend against supply chain attacks). Both are optional.

the__alchemist•1mo ago

Those are great points, and reinforce the concept that there is conflation between Cargo and Git/commits. Commits and Cargo IMO should be separate concepts. Cargo should not be checking my Git history prior to publishing.

aidenn0•1mo ago

As far as I know, Nixpkgs doesn't use git as a package database. The packages definitions are stored and developed in git, but the channels certainly are not.

mcny•1mo ago

I want to take a quick detour here if anyone is knowledgeable about this topic.

> The hosting problems are symptoms. The underlying issue is that git inherits filesystem limitations, and filesystems make terrible databases.

Does this mean mbox is inherently superior to maildir? I really like the idea of maildir because there is nothing to compact but if we assume we never delete emails (on the local machine anyways), does that mean mbox or similar is preferable over maildir?

juped•1mo ago

No, of course not.

notorandit•1mo ago

Repsy

pxc•1mo ago

Loved this article. Just enough detail to make the broad scope compatible with a reasonable length, and well-argued.

I feel sometimes like package management is a relatively second-class topic in computer science (or at least among many working programmers). But a package manager's behavior can be the difference between a grotesque, repulsive experience and a delightful, beautiful one. And there aren't quite yet any package managers that do well everything that we collectively have learned how to do well, which makes it an interesting space imo.

Re: Nixpkgs, interestingly, pre-flakes Nix distributes all of the needed Nix expressions as tarballs, which does play nice with CDNs. It also distributes an index of the tree as a SQLite database to obviate some of the "too many files/directories" problem with enumerating files. (In the meantime, Nixpkgs has also started bucketing package directories by name prefix, too.) So maybe there was a lesson learned here that would be useful to re-learn.
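The name-prefix bucketing mentioned above keeps any single directory from growing huge. A minimal sketch, assuming the Nixpkgs `pkgs/by-name` layout where the shard is the lowercased first two characters of the attribute name (e.g. `pkgs/by-name/he/hello/package.nix`):

```python
from pathlib import PurePosixPath


def by_name_path(attr: str) -> PurePosixPath:
    """Sharded location of a package definition, keyed by a two-char prefix."""
    shard = attr[:2].lower()
    return PurePosixPath("pkgs", "by-name", shard, attr, "package.nix")
```

With this layout the top-level directory holds one entry per prefix rather than one per package, which is what eases the "too many files/directories" problem.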

On the other hand, IIRC if you use the GitHub fetcher rather than the Git one, including for fetching flakes, Nix will download tarballs from GitHub instead of doing clones. Regardless, downloading and unpacking Nixpkgs has become kinda slow. :-\

themk•1mo ago

I think git is overkill, and probably a database is as well.

I quite like the hackage index, which is an append-only tar file. Incremental updates are trivial using HTTP range requests making hosting it trivial as well.
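The append-only trick can be sketched in a few lines of Python, assuming the server honors HTTP `Range` requests and the index file only ever grows. The helper names are hypothetical. The sketch treats the index as an opaque byte stream (it ignores tar end-of-archive padding), and a real client should also verify the result against trusted metadata:

```python
import urllib.error
import urllib.request


def range_headers(local_size: int) -> dict[str, str]:
    """Ask only for bytes past what we already have (open-ended range)."""
    return {"Range": f"bytes={local_size}-"}


def merge_response(local: bytes, status: int, body: bytes) -> bytes:
    """Combine the cached copy with the server's answer."""
    if status == 206:  # Partial Content: body is exactly the new suffix
        return local + body
    if status == 200:  # server ignored Range and sent the whole file
        return body
    if status == 416:  # Range Not Satisfiable: nothing past our offset
        return local
    raise ValueError(f"unexpected HTTP status {status}")


def update_index(url: str, local: bytes) -> bytes:
    """Fetch only the appended tail of an append-only index file."""
    req = urllib.request.Request(url, headers=range_headers(len(local)))
    try:
        with urllib.request.urlopen(req) as resp:
            return merge_response(local, resp.status, resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 416:
            return merge_response(local, 416, b"")
        raise
```

Because the file is only appended to, a plain static file server or CDN can serve these updates with no custom server logic.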

leoh•1mo ago

The conclusion reached in this essay is 100% wrong. See "The reftable backend: What it is, where it's headed, and why should you care?"

>With release 2.45, Git has gained support for the “reftable” backend to read and write references in a Git repository. While this was a significant milestone for Git, it wasn’t the end of GitLab’s journey to improve scalability in repositories with many references. In this talk you will learn what the reftable backend is, what work we did to improve it even further and why you should care.

https://www.youtube.com/watch?v=0UkonBcLeAo

Also see Scalar, which Microsoft used to scale their 300GiB Windows repository, https://github.com/microsoft/scalar.

skywhopper•1mo ago

Not sure I can agree with the takeaway. It works well at first, but doesn’t scale, so folks found workarounds. That’s how literally every working system grows. There are always bottlenecks eventually. And you address them when they become an issue, not five years earlier.

juped•1mo ago

These are actually all problems with using Github as an ersatz CDN.

bandrami•1mo ago

Maybe I'm misreading the article but isn't every example about the downside of using github as a database host, not the downside of using git as a database?

Like, yes, you should host your own database. This doesn't seem like an argument against that database being git.

khc•1mo ago

seems like the issue isn't with using git as a database, but using github as a distribution mechanism?

zzo38computer•1mo ago

Git commits will have a hash and each file will have a hash, which means that locking is unnecessary for read access. (This is also true of fossil, although fossil does have locking since it uses SQLite.)
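The per-file hash is easy to reproduce. In the default SHA-1 object format, Git names a blob by the SHA-1 of a short `blob <size>\0` header followed by the file's bytes, so an object ID can never refer to different contents, which is why readers need no lock. A minimal sketch:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a file's contents (default SHA-1 format)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_blob(object_id: str, content: bytes) -> bool:
    """A reader can check fetched bytes against the ID it asked for."""
    return git_blob_id(content) == object_id
```

Running `git hash-object` on the same bytes should give the same ID; for example, the empty file hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.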

The other stuff mentioned in the article seems to be valid criticisms.

wg0•1mo ago

Why not use SQLite then as database for package managers? A local copy could be replicated easily with delta fetch.
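One way to get that kind of delta fetch, sketched with Python's built-in `sqlite3` and a hypothetical schema: every published release gets a monotonically increasing `seq`, and a client that remembers the highest `seq` it has seen only asks for newer rows. This illustrates the idea under those assumptions; it is not how any existing package manager does it, and it ignores deletions and yanks.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS releases (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,  -- grows with every publish
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (name, version)
)
"""


def publish(db: sqlite3.Connection, name: str, version: str) -> None:
    db.execute("INSERT INTO releases (name, version) VALUES (?, ?)", (name, version))
    db.commit()


def changes_since(db: sqlite3.Connection, last_seq: int) -> list[tuple[int, str, str]]:
    """Rows a client at `last_seq` is missing: the delta to replicate."""
    return db.execute(
        "SELECT seq, name, version FROM releases WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


def replicate(server: sqlite3.Connection, client: sqlite3.Connection) -> int:
    """Copy new rows into the local copy; return the new high-water mark."""
    (last,) = client.execute("SELECT COALESCE(MAX(seq), 0) FROM releases").fetchone()
    rows = changes_since(server, last)
    client.executemany("INSERT INTO releases (seq, name, version) VALUES (?, ?, ?)", rows)
    client.commit()
    return rows[-1][0] if rows else last
```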

kuahyeow•1mo ago

GitLab employee here. We have completed the move away from Gollum years ago (see https://gitlab.com/groups/gitlab-org/-/epics/2381).

It looks like that doc https://docs.gitlab.com/development/wikis/ was outdated - since fixed to no longer mention Gollum.

venturecruelty•1mo ago

This is why I don't use programming languages that do that.

skylurk•1mo ago

Sounds like it worked pretty well several times? But yeah it does not scale forever.

krbaccord941x•1mo ago

I understand the article is concerning RFC2789, in cloning whole indexes for language indexes, but /cargo/src shallow clones need another layer, where tertiary compilation or decompression takes place in mutex libraries, whether its SSL certificate is dependent on HTTP fetch.

shellkr•1mo ago

I am not sure this is necessarily a git issue so much as a GitHub issue. Just look at the AUR of Arch Linux, which works perfectly.

nottorp•1mo ago

> Auto-updates now run every 24 hours instead of every 5 minutes

What the... why would you run an autoupdate every 5 minutes?

tarun_anand•1mo ago

why don't we have a P2P transfer platform for this (modulo security)?

ferfumarma•1mo ago

I love this write-up. As a non-expert user of package managers I can quickly understand a set of patterns that have been deeply considered and carefully articulated. Thanks for taking the time to write up your observations!

whytevuhuni•1mo ago

No mention of Guix. Has its situation improved? I remember waiting almost an hour on “guix pull” to catch up with its git repo on a fresh install.

rudolph9•1mo ago

It’s worth considering if these package managers would have taken off if they didn’t use git. You get a bunch for free, why not use it while you’re small?

rldjbpin•1mo ago

the title and core argument do not seem to align much. subject is git, but most discourse is around github. the role discussed is for serving packages, while the title refers to it as "database".

regardless of the semantics, git is not ideal for serving files. this has become more apparent in the ai world, where extensions such as git lfs have allowed larger file sizes.

but as seen elsewhere, network effects trump any design issues. we can always introduce an "lfs" for better shallow fetching (cached? compressed?) and this would resolve a majority of the op's grievances.

rurban•1mo ago

You can add Debian to the list of pain. They'll find out soon enough

schnatterer•1mo ago

> Even GitOps tools that embrace git as a source of truth have to work around its limitations

I'd say this only applies to huge scale (or monorepos, as mentioned in the article). Another workaround might be gitless gitops via OCI.

cben•1mo ago

Obligatory link to the gold intro to So. Many. Aspects. of pkg manager design: https://medium.com/@sdboyer/so-you-want-to-write-a-package-m... Even if its section on "Central Package Registry" isn't very deep.

account42•1mo ago

> Windows restricts paths to 260 characters, a constraint dating back to DOS.

It doesn't if you do it properly.
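For context on "doing it properly": Win32 file APIs accept extended-length paths with the `\\?\` prefix, which lifts the 260-character `MAX_PATH` limit, and recent Windows 10+ builds can also opt in through a registry setting plus an application manifest. A minimal string-only sketch of the conversion, assuming the input is already an absolute, normalized path, since the prefix turns off normalization such as resolving `..`:

```python
def to_extended_length(path: str) -> str:
    """Prefix an absolute Windows path so Win32 APIs skip the MAX_PATH limit."""
    path = path.replace("/", "\\")  # \\?\ paths are not normalized, so fix slashes here
    if path.startswith("\\\\?\\"):
        return path  # already extended-length
    if path.startswith("\\\\"):
        # UNC path \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path  # drive path C:\... becomes \\?\C:\...
```

Git for Windows exposes a comparable opt-in through its `core.longpaths` setting.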

