One thing that this article didn't mention is that most development was done either on your development server running in a datacenter (think ~50-100 cores) or on an "on demand" machine - essentially a short-lived container that generally stayed up to date with known-good commits every few hours. IDEs were integrated with the devservers/machines, and language servers and other services were generally prewarmed or automatically set up via Chef/Ansible, etc. Rarely would you want to run the larger monorepos on your laptop client (exceptions would generally be mobile apps, macOS apps, etc.).
I think for a lot of users it's more important that the monorepo devenv be reproducible than be specifically local or specifically remote. It's certainly easier to pull this off when it's a remote devserver that gets regularly imaged.
I did not work at that place but the story sounds very familiar – I believe there might have been a blog post about that remote development environment here on HN some time ago?
I have done this for many small teams as well.
It remains pretty hard to get engineers to stop "thinking locally" when doing development. And with what modern hardware looks like (in terms of cost and density), it makes a lot of sense to find a rack somewhere for your dev team... It's easy enough to build a few boxes that can run dev, staging, test and whatever other on-demand tooling you need, with room to grow.
When you're close to your infrastructure and it looks that much like production, when you have to share the same playground, the code inside a monorepo starts to look very different.
> managing large scale build systems ends up taking a team that works on the build system itself
This is what stops a lot of small teams from moving to monorepo. The thing is, your 10-20 person shop is never going to be google or fb or ms. They will never have large build system problems. Maintaining all of it MIGHT be someone's part time job IF you have a 20 person team and a very complex product. Even that would be pushing it.
I think this is a business opportunity: someone could sell the polished monorepo experience and tools to companies that have real engineering organizations but can't pull off a successful "we need to fork git" project to support their developers.
Previous mono repo experiences were nothing short of a nightmare so it’s refreshing to see tooling come so far.
Source: my job’s monorepo is running Nx, but I’m not in developer productivity; I used to work with a large codebase whose accompanying test suite took thousands of hours on a single box, and it’s kinda like watching people rediscover the roundness required to make the wheel.
One is the kind described in the article here: "THE" monorepo of the (mostly) entire codebase, requiring a custom VCS, custom CI, and a team of 200 engineers supporting this whole thing. Uber and Meta and I guess Google do it this way now. It takes years of pain to reach this point. It usually starts with the other kind of "monorepo":
The other kind is the "multirepo monorepo" where individual teams decide to start clustering their projects in monorepos loosely organized around orgs. The frontend folks want to use Turborepo and they hate Bazel. The Java people want to use Bazel and don't know that anything else really exists. The Python people do whatever the python people do these days after giving up on Poetry, etc... Eventually these might coalesce into larger monorepos.
Either approach costs millions of dollars and millions of hours of developers' time and effort. The effort is largely defensible to the business leaders by skillful technology VPs, and the resulting state is mostly supported by the developers who chose to forget the horror that they had to endure to actually reach it.
If a company has up to 100 services, there won't be VCS scale problems, LSP will be able to fit the tags of the entire codebase in a laptop's memory, and it is probably _almost_ fine to run all tests on CI.
TL;DR not every company will/should/plan to be the size of Google.
That is mitigated a lot by a really good caching system (and even more by full remote build execution) but most times you basically end up needing a 'big iron' build system to get that, at which point it should be able to run the changed subset of tests accurately for you anyway.
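For anyone unfamiliar, the client side of this is just a couple of flags once the backend exists; a minimal sketch with Bazel, where the endpoints are placeholders:

```
# Point builds/tests at a shared remote cache and remote execution service (URLs are made up).
bazel test //... \
  --remote_cache=grpcs://cache.internal.example.com \
  --remote_executor=grpcs://rbe.internal.example.com \
  --remote_download_minimal   # avoid pulling every intermediate artifact back to the laptop
```

The hard part is operating the cache/executor fleet and keeping actions hermetic, not the client-side configuration.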
Although, time is money, so often scaling build agents may be cheaper than paying for the engineering time to redo your build system...
Which is to say that trying to avoid running tests isn't the right answer. Make them as fast as you can, but be prepared to pay the price - either a lot of parallel build capacity, or lower quality.
Part of it might be that Playwright makes it much easier to write and organize complex tests. But for that specific project, it was as close to a 1 to 1 conversion as you get, the speedup came without significant architectural changes.
The original reason for switching was flaky tests in CI that were taking way too much effort to fix over time, likely due to oddities in Cypress' command queue. After the switch, and in new projects using Playwright, I haven't had to deal with any intermittent flakiness.
First, there is “rapidly” as pertains to the speed of running tests during development of a change. This is “did I screw up in an obvious way” error checking, and also often “are the tests that I wrote as part of this change passing” error checking. “Rapid” in this area should target low single digits of minutes as the maximum allowed time, preferably much less. This type of validation doesn’t need to run all tests—or even run a full determinator pass to determine what tests to run; a cache, approximation, or sampling can be used instead. In some environments, tests can be run in the development environment rather than in CI for added speed.
Then there is “rapidly” as pertains to the speed of running tests before deployment. This is after the developer of a change thinks their code is pretty much done, unless they missed something—this pass checks for “something”. Full determinator runs or full builds are necessary here. Speed should usually be achieved through parallelism and, depending on the urgency of release needs, by spending money scaling out CI jobs across many cores.
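For concreteness, a "determinator" just maps changed files to the tests that depend on them. A crude, Bazel-flavoured sketch follows; real determinators (bazel-diff, target-determinator, etc.) also handle deleted files, BUILD/toolchain edits, and flag changes, all of which this ignores:

```
# Files changed on this branch relative to main.
changed=$(git diff --name-only origin/main...HEAD)

# Map each changed path to its Bazel label; files Bazel doesn't know about are skipped.
labels=$(for f in $changed; do bazel query "$f" 2>/dev/null; done)

# Run only the tests that transitively depend on those files.
bazel test $(bazel query "tests(rdeps(//..., set($labels)))")
```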
Now the hot take: in nearly every professional software development context it is fine if “rapidly” for the pre-deployment category of tests is denominated in multiple hours.
Yes, really.
Obviously, make it faster than that if you can, but if you have to trade away “did I miss something” coverage, don’t. Hours are fine, I promise. You can work on something else or pick up the next story while you wait—and skip the “but context switching!” line; stop feverishly checking whether your build is green and work on the next thing for 90min regardless.
“But what if the slow build fails and I have to keep coming back and fixing stuff with a 2+ hour wait time each fix cycle? My precious sprint velocity predictability!”—you never had predictability; you paid that cost in fixing broken releases that made it out because you didn’t run all the tests. Really, just go work on something else while the big build runs, and tell your PM to chill out (a common organizational failure uncovered here is that PMs are held accountable for late releases but not for severe breakage caused by them pushing devs to release too early and spend less time on testing).
“But flakes!”—fix the flakes. If your organization draws a hard “all tests run on every build and spurious failures are p0 bugs for the responsible team” line, then this problem goes away very quickly—weeks, and not many of them. Shame and PagerDuty are powerful motivators.
“But what if production is down?” Have an artifact-based revert system to turn back the clock on everything, so you don’t need to wait hours to validate a forward fix or cherry-picked partial revert. Yes, even data migrations.
Hours is fine, really. I promise.
There are also categories of work that are so miserable with long deployment times that they just don’t get done at all in those environments. Things like improving telemetry, tracing, observability. Things like performance debugging, where lower envs aren’t representative.
I would personally never go back, for a system of moderate or greater distributed complexity (i.e. >10 services, 10 total data stores).
It was the "THE" monorepo, and it made understanding the company's service graph, call graph, ownership graph, etc etc. incredibly clear. Crystal clear. Vividly so.
Polyrepos are tribal knowledge. You don't know where anything lives and you can't easily search for it or discover it. Every team does their own thing. Inheriting new code is a curse. Code archeology feels like an adventure in root cause analysis in a library of hidden and cryptic tomes.
Polyrepos are like messages and knowledge locked away inside Discord or Slack channels with bad retention policies. Everything atrophies in the dark corners.
If monorepos cost millions, I'd say polyrepos do just the same in a different way.
Monorepos are a continent of giant megafauna. Large resources, monotrophic.
Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.
Why can't we add millions of dollars of tool engineering on top of polyrepos to get some of the benefits of monorepos without a lot of the pain? E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure
And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository
> E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure
Bitbucket does this out-of-the box :)
Then you lose the capability to atomically make a commit that crosses repos. I'm not sure if there is any forge that allows that, except Gerrit might with its topics feature (I've not gotten the opportunity to try it).
Let's take for example a service 'foobar' that depends on an in-house library 'libfoo'. Now you need to add a feature to foobar that needs some changes to libfoo at the same time (and for extra fun, let's say those changes will break some other users of libfoo). Of course, during development you want to run pipelines for both libfoo and foobar.
In such 'super module' system it gets pretty annoying to push changes for testing in CI when every change to either libfoo or foobar needs to be followed by a commit to the super repo.
In a monorepo that's just another Tuesday.
Again, tooling issue. CI can easily pull required changeset across multiple repos. We are in a subthread under "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"
The costs of the infra/build/CI work are of course more visible when there is a dedicated team doing it. If there is no such central team, the cost is just invisibly split between all the teams. In my experience this is more costly overall, due to every team rolling their own thing and requiring them to be jack-of-all-trades in rolling their own infra/build/CI.
> And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository
a) If repository permissions aren't set centrally but every team gets to micromanage them, then they usually end up too restrictive and you don't even get read-only access.
b) When you are browsing through repositories you see a description, tags, technologies used, contributors, number of commits, releases etc. Massive difference in discovery versus a directory.
The short answer, start with a package management system like conan or npm (we rolled our own - releasing 1.0 the same month I first heard of conan which was then around version 0.6 - don't follow our example). Then you just need processes to ensure that everyone constantly has the latest version of all the repos they depend on - which ends up being a full time job for someone to manage.
Don't write your own package manager - if you use a common one that means your IDE will know how to work with it - our custom package manager has some nice features but we have to maintain our own IDE plugin so it can figure out the builds.
One full time job equivalent can buy a lot of tooling. Tooling that not only replaces this role but also shifts the feedback a lot closer to dev introducing the breaking change.
I've worked at big, successful, boring F500 companies with a polyrepo setup and it's boring as well. For this company, the flow was: Jenkins checked out the repo, ran the Jenkinsfile, an artifact was created and pushed to JFrog Artifactory. We would update the Puppet file in our repo and, during an approved deploy window in ServiceNow, Puppet would do the deploy. Because of this, repos had a certain fixed structure, which was annoying at times.
Pain points that were not solved: four different teams (Jenkins, Puppet, InfoSec, and the dev team) involved in touching everything, and the breakdowns that would happen as a result.
If you have a large project there is no getting around the issues you will have. Just a set of pros and cons.
There are better tools for polyrepo you can start with, but there are a lot of things we have that I wish I could get upstreamed (there is good reason the open source world would not accept our patches even if I cleaned them up).
Not quite - it's "vs stock polyrepo with millions of dollars of engineering effort in manually doing what the monorepo tooling does".
I don't think the "stock polyrepo" characterization is apt. Organizations using polyrepos already do invest that kind of money. Unfortunately, this effort is not visible because it's spread out across repos and every team does their own thing. So then people erroneously conclude that monorepos are much more expensive. Like the GP said:
> Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.
A monolith that's broken up into 20 libraries in their own repos also prevents experimentation with new runtimes just as much as the monorepo version does.
Monorepo also means a team 'vetting' new third-party libs, a team telling you your CI takes too long, and a team telling you to upgrade your lib within 23 minutes because there's a security issue in the Korean language support...
It sounds like you worked in a dysfunctional organization that happened to use a monorepo. Their dysfunctions are not inherent in the monorepo model and they would have found other ways to be dysfunctional if not those.
I might have missed something.
This means that developers have a monorepo for day to day work, but the CI/CD issues are isolated in their own separate repos, and can be handled separately.
Dunno if that's 100% of what they mean but it seems to be a solution to what you describe in another message ("our CI/CD pipeline doesn't allow us to do so and it is not handled by our team anyway")
I can't answer that question, and there are reasons to go monorepo anyway. However, if your problem is a bad polyrepo split, going to a monorepo is the obvious answer - but it isn't the only answer. Monorepo and polyrepo each have very significant problems (see the article for the monorepo ones) that are unique to that setup. You have to choose which set of problems to live with and mitigate them as best you can.
Because it sounds like you just need flag based feature releases.
To be totally honest, yes this is an unbelievable pain in the ass, but I much prefer the strict isolation. Having worked with (much, much) smaller monorepos, I find the temptation to put code anywhere it fits too much, and things quickly get sloppy. With isolated repos, my brain much more clearly understands the boundaries and separation of concerns.
Then again, this results in a lot of code duplication that is not trivial to resolve. Submodules help to a degree, but when the codebase is this diverse, you're gonna have to copy some code somewhere.
I view it sort of like the split between inheritance and composition. You can either inherit code from the entire monorepo, or build projects from component submodules plugged together. I much prefer the latter solution, but clearly the former works for some people.
[0] https://wstomv.win.tue.nl/edu/2ip30/references/criteria_for_...
I wanna link other repos I depend on, but those repos can be read-only. And then all the tools work without extra friction.
P.S.: This could be another wish for jj!
Bazel is complex and isn't the easiest to pick up for many (though to Google's credit the documentation is getting better). Buck isn't any better in this regard. Pants seems easiest out of all the big ones I've seen but its also a bit quirky, though much easier to get started with in my experience. NX is all over the place in my experience.
Until recently, too, most of these monorepo systems didn't have good web-ecosystem support, and even those that do don't handle every build case you want them to, which means you have to extend them in some way and maintain that.
It also doesn't help that most CI systems don't have good defaults for these tools and can be hard to set up properly to take advantage of their strengths (like shared cross-machine caching).
As an aside, the best monorepo tooling I have ever used was Rush[0] from Microsoft. If you are working in a frontend / node monorepo or considering it, do take a look. It works great and really makes working in a monorepo extremely uniform and consistent. It does mean doing things 'the rush way' but the trade off is worth it.
[0]: https://rushjs.io
I think most of the time the philosophical decision (more shared parts or better separation) is made before deciding how you'll deal with the repos.
Now, if an org changes direction mid-way, the handling of the code can still be adapted without fundamentally switching the repo structure. Many orgs are multi-repo but their engineers have access to almost all of the code, and monorepo teams can still have strong isolation of what they're doing, up to having different CI rules and deployment management.
This makes sense if you consider that:
1) Changes to system structure, especially changes to fundamentals when the system is already being built, are difficult, expensive and time consuming. This gives system designs inertia that grows over time.
2) Growing the teams working on a system means creating new organizational units; the more inertia the system has, the more sense it makes for growth to happen along the lines suggested by the system architecture, rather than forcing the system to change to accommodate some team-organization ideals.
Monorepo/multirepo is a choice that's very difficult to change once work on building the system starts, and it's a choice you commit to at the very beginning (and way before the choice starts to matter) - a perfect recipe for not a mere nudge, but a scaffolding the organization itself will grow around, without even realizing it.
Typically someone has read a few blog posts like the ones linked to, and has some vague ideas of the positives, but doesn't have a full understanding of how the disadvantages will shape their workflow.
I've seen people with experience at hobby or small scale successfully campaigning for a switch at work and then hitting a wall - in both directions. Updating every call site for a breaking change doesn't sound that onerous, and at a small scale it isn't. Having each team update versioned dependencies doesn't sound that hard, and at a small scale it isn't.
Just like with languages, don't listen to anyone who tells you this will solve all your problems. One of the options is merely the least bad for your situation.
As someone who works on core parts that are likely to break everything, I spend half of my time just integrating things and another quarter trying to figure out how to make my things either less core or not need changes so often.
There's not really a way around that when you need some behavioral change for the code using the library.
If a specific change in a monorepo is so centrally embedded it requires incredible effort to do atomically (the benefit of having the monorepo in the first place), you are still able to split it into multiple gradual changes (and "require coordinating several PRs over a couple of weeks and some internal politics, but that might also be split among different developers who aren't even on a dedicated build team.").
So in a monorepo you can still enjoy the same advantage you describe for multi repo, and you'll even have much better visibility into the rollout of your gradual change thanks to the monorepo.
What are the advantages vs having a mono repo per team?
State of the "code universe" being atomic seems like a single point of failure.
Team B has no idea this is happening, as they only review code in repo B.
Soon enough team A stops updating their dependency, and now you have two completely different libraries doing the "same" thing.
Alternatively, team A simply pins their dependency on team B's repo at hash 12345, then just... never updates. How is team B going to catch bugs that their HEAD introduces in team A's repo?
You can also configure update of dependencies https://docs.github.com/en/code-security/dependabot/dependab...
These work with vendored dependencies too.
(In our org, we have our own custom Go tool that handles more sophisticated cases like analyzing our divergent forks and upstream commits and raising PR's not just for dependencies, but for features. Only works when upstream refactoring is moderate though)
If you have two internal services, you can change them simultaneously. This is really useful for debugging with git bisect, as you always have code that passes CI.
I might write a detailed blog about this at some point.
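A hedged example of what that buys you; the tag and script name are made up:

```
# Bisect with confidence because every commit on the monorepo's main branch passed CI.
git bisect start
git bisect bad HEAD
git bisect good v1.42.0                # placeholder: last known-good release tag
git bisect run ./ci/test_payments.sh   # hypothetical script reproducing the regression
```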
* Much easier refactors. If everything is an API and you need to maintain five previous versions because teams X, Y, Z are on versions 12, 17, and 21 it is utter hell. With a unified monorepo you can just do the refactor on all callers.
* It builds a culture of sharing code and reuse. If you can search everyone's code and read everyone's code you can not only borrow ideas but easily consume shared helpers. This is much more difficult in polyrepo because of aforementioned versioning hell.
* A single source of truth. Server X is running at CL #123, Server Y at CL #145, but you can quickly understand what that means because it's all one source control and you don't have to compare different commit numbers - higher is newer, end of story.
Multiple large monorepos in an organization are highly valuable imo, and should become more of a thing over time.
Not just a sparse clone.
Optional, per-directory OWNERS files are common, and most VCS frontends (Github, Bitbucket, etc.) can be configured to prevent merges without approval from the owning team(s) or DRI(s).
PRs that intersect multiple teams' ownership would require sign-off from everyone impacted. So a team updating the company-wide "requests library" (or an equivalent change) with a wide blast radius would be notifying everyone impacted and getting their buy-in.
But there are many ways to manage write permissions - limit the directories to which engineers are allowed to push code. E.g. if you use Git, this can be done with Gitolite, which is a popular hosting server.
Gitolite has very flexible hooks support, especially with so-called "Virtual Refs" (or VREFs) [1]. Out of the box, it supports managing write permissions per path [2]. You can go even further and use your own custom binary for a VREF to "decide" if a user is allowed to push certain changes. One possible option: read the incoming changed files, read meta-information from the repository itself (e.g., a CODEOWNERS file at the root of the repo), and decide if the push should be accepted. GitHub has CODEOWNERS [3], which behaves similarly.
[1]: https://gitolite.com/gitolite/cookbook.html#vrefs [2]: https://gitolite.com/gitolite/vref.html#quick-introexample [3]: https://docs.github.com/en/repositories/managing-your-reposi...
No. And that's one reason small startups should separate frontend code into a separate monorepo.
If you would like to hire a contractor for SEO/web developer then give them access to frontend code. Keep the backend code segmented out.
I use git mainly because everybody knows it, tooling is there, etc.
Then you configure ACLs for every repo or branch.
- https://mill-build.org/blog/2-monorepo-build-tool.html
People like to rave about monorepos, and they are great if set up correctly, but there are a lot of intricacies that often go on behind the scenes to make a monorepo successful, and it's easy to overlook them since usually some "other" team (devops team, devtools team, etc.) is shouldering all that burden. Still worth it, but it must be approached with caution.
Imagine you have an internal library and also two consumers of that library in the repo. But then you make breaking changes to the library but you only have time to update one of the consumers. Now how can the other consumer still use the old version of that library?
Seems like a big restriction to me.
I don't see how being forced to upgrade all consumers is a good thing.
It forces implementers of broad or disruptive API or technical changes to be responsible for the full consequences of those decisions, rather than the consumers of those changes who likely don't have context. People make better choices when they have to deal with the consequences themselves.
It also forces the consequences to be incurred now as opposed to 6 months later when a consumer of the old library tries to upgrade, realizes they can't easily, but they need a capability only in the new version, and the guy who made the API change has left the company for a pay raise elsewhere.
As a plus, these properties enable gigantic changes and migrations with confidence knowing there aren't any code or infrastructure bits you missed, and you're not leaving timebombs for a different project's secret repo that wasn't included in the gigantic migration.
Bluntly, if you can't see why many people like it (even if you disagree), you probably haven't worked in an environment or technical context where mono vs poly truly matters.
The two monorepo ways to do this:
1. Use automated refactoring tools that now work because it's one repo (rough sketch below)
2. Add the new behavior, migrate incrementally, then remove the old behavior
...and the article points out correctly that it's a lie anyway, but at least you can find all the consumers easily.
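A crude illustration of option 1, assuming a Go codebase and a made-up rename from OldClient.Fetch to NewClient.Fetch; real migrations usually reach for syntax-aware tools (gofmt -r, comby, and the like) rather than sed:

```
# Find every consumer, rewrite it, and let CI run all affected tests in one change.
git grep -l 'OldClient\.Fetch(' -- '*.go' \
  | xargs sed -i 's/OldClient\.Fetch(/NewClient.Fetch(/g'
```

One review, one commit, and nobody is left on the old API by accident.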
I'm sure others do similarly, because there is no way you would allow arbitrary changes to creep in the middle of a multi-service rollout.
Seems like a big restriction to me.
It doesn't matter if you go monorepo or polyrepo: you will have issues as your project grows. You will need to mitigate those issues somehow.
Genuine question: if you can't have one commit in production at any given time, what advantages for the monorepo remain?
That might be possible in a simple library + 1 consumer scenario if you follow the other commentors' recommendation to always update library + consumer at once. But in many cases you can't, anyway, because you're deploying several artifacts or services from your monorepo, not just one. So while "1 commit in production at any given time" is certainly neat, it wouldn't strike me as the primary goal of a monorepo. See also this discussion about atomicity of changes further up: https://news.ycombinator.com/item?id=44119585
> what advantages for the monorepo remain?
Many, in my opinion. Discoverability, being able to track & version all inter-project dependencies in git, homogeneity of tooling, …
See also my other comment further up on common pains associated with polyrepos, pains that you typically don't experience in monorepos: https://news.ycombinator.com/item?id=44121696
Of course, nothing's free. Monorepos have their costs, too. But as I mention in the above comment and also in https://news.ycombinator.com/item?id=44121851, a polyrepo just distributes those costs and makes them less visible.
So if your code is testing fine, and someone makes a major refactor across the main codebase, and then your code fails, you have narrowed the window to only 15 minutes' worth of changes to sort through. As a result, people who commit changes whose blast radius is too large for pre-commit testing to cover can still validate their commits after the fact.
There's always some amount of uncertainty with any change, but the test it all methodology helps raise confidence in a timely fashion. Also decent coding practices include: Don't submit your code at the end of the day right before becoming unavailable for your commute...
We have something like ~40 repos in our private GitLab instance, and each one has its own CI pipeline, which compiles, runs tests, builds packages for distribution, etc. Then there's a CI task which assembles a file system image from those ~40 repos' packages, runs integration tests, etc.
Many of those components communicate with each other with a flatbuffers-defined message, which of course itself is a submodule. Luckily, flatbuffers allows for progressive enhancement, but I digress--essentially, these components have some sort of inter-dependency on them which at the absolute latest surfaces at the integration phase.
Is this actually a multi-repo, or is it just a mono-repo with lots of sub-modules? Would we have benefits if we moved to a mono-repo (the current round-trip CI time for full integration is ~35 minutes, many of the components compile and test in under 10s)? Maybe.
Everything is a tradeoff. Anything can work, it's about what kinds of frustrations you're willing to put up with.
This is a good thought! It actually needs to be O(1/commit rate) though, so that having the monorepo doesn't create long queues of commits.
Or have some process batch the passing, ready-to-merge PRs into a combined PR and try to merge that - and best-guess which PR is at fault if it fails.
Then if postsubmit fails you just have to rerun the intersection of failing tests and affected tests on each change since the last green commit.
Wireit is the smallest change from plain npm that gets you a real dependency graph of scripts, caching (with GitHub Actions support), incremental script running, and services.
The issue is that users of a library can put almost infinite friction on the library. If the library team wants to make a change, they have to update all the use sites, but Hyrum's Law will get you because users will do the damndest things.
So for the top organization, it's good if many other teams can utilize a great team's battle-tested library, but for the library team it's just liability (unless making common code is their job). In a place like Google you either end up with internal copies and forks, strict access control lists, or libraries that are slow as molasses to change.
I don't think there's anything wrong with copy pasting some useful piece of code too, not everything has to be a library you depend on, for small enough things.
In my organization we have around 70k internal git repos (and an order of magnitude fewer public ones), but of course not everything is related to everything else; we produce many distinct software products. I can understand "collect everything of a product to a single repo"; I can even understand going to "if there is a function call, that code has to be in the same repo". But putting everything into a single place... What are the benefits?
The benefit is the tooling, as the article mentioned. Everything in the repo is organised consistently, so I can make ad-hoc Python tools relying on relative paths knowing that my teammates have identical folder structure.
It just gets very difficult to manage, especially if people frequently need to work across many repos. Plus, onboarding is a pain in the ass.
Monorepo example: if I want to add a new TypeScript package/library for internal NodeJS use, we have a bootstrapping script that sets it up. And it basically:
1. Inherits a tsconfig that just works in the context of the repo
2. Jest is configured with our default config for node projects and works with TS out of the box.
3. Linting / formatting etc. are all working out of the box.
4. Can essentially use existing dependencies the monorepo uses
5. Imports in existing code work immediately since it’s not an external dependency
6. CI picks up on the new typescript & jest configs and adds jobs for them automatically
7. Code review & collaboration is happening in the same spot
8. This also makes it easier to have devs managing the repo — for example, routine work like updating NodeJS is a lot easier when you know everything is using a nearly identical setup & is automatically verified in CI.
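A hypothetical sketch of that kind of bootstrap flow; the script name and layout are invented, but this is the general shape:

```
./tools/create-package.sh my-new-lib
# Resulting layout, roughly:
#   packages/my-new-lib/tsconfig.json   -> just extends ../../tsconfig.base.json
#   packages/my-new-lib/jest.config.js  -> re-exports the shared Node preset
#   packages/my-new-lib/package.json    -> deps come from the workspace, nothing to version locally
# CI discovers the new package from the workspace globs, so no pipeline edits are needed.
```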
One challenge I had to help solve in a previous job was that onboarding was difficult because we had a small number of large repos everyone worked in. The standards were slightly different across them. Npm, pnpm, and yarn were all in use. Deployment worked pretty differently among them. CI setups were unique, and each of the large projects had, if not a team, some number of people spending a lot of time just managing the project’s workflows.
So many coordination things just get easier when there isn't an opportunity to get out of sync. If you do separate repos, you can totally share config… but now it costs a dependency-update PR in every repo to pull in even a tiny update to the shared unit test config. It's just guaranteed to get out of sync, and it's hard to catch issues when you can't validate a config change against all projects using it at the same time.
So because it becomes trickier (and takes work) just to do the action of syncing multiple repo’s setups… inevitably, you end up with some “standards” that are loosely followed and a lot of slightly different setups that get hard to untangle the longer they grow. If you can accept the cost of context switching between repos, or if people don’t need to switch, maybe it’s ok… until something like a foundational dependency update (NodeJS, Typescript, React, something like that) needed for security becomes extremely difficult because you have a million different ways of configuring things and the JS ecosystem sucks
The above breaks down when we have third-party code - since it doesn't follow our common patterns for building, it has to do something different. Bringing that into a monorepo would be just as different from everything else.
No, you do not, unless you mean N=1. Build scripts/tooling/linters etc are put into a different repo, are released and consumed by each individual repo.
This is absolutely ludicrous unless you fail to grasp the power of a hierarchical file system. I don't see how a big mess like CI/CD is made easier by spreading it out to more points of configuration.
To me the whole point of a monorepo is atomic commits for the whole org. The power of this is really hard to overstate when you are trying to orchestrate the efforts of lots of developers - contrary to many claims. Rebasing in one repo and having one big meeting is a hell of a lot easier than doing it N times.
Even if the people on the team hate each other and refuse to directly collaborate. I still don't see the reason to not monorepo. In this scenario, the monorepo becomes a useful management and HR tool.
What they're doing is creating a mass of complexity that is turning org-chart problems into future technical ones and at the same time not recognizing the intrinsic internal dependencies of the software systems they're building.
Luckily my current job is not like this, but the last one was, and I couldn't believe the wasted hours spent doing things as simple as updating the fields in a protobuf schema file.
The answer is IMO somewhere in between. Microservices can get too tiny and thus the system becomes impossible to understand. However a monolith is impossible to understand as well.
The real problem is you need good upfront architecture to figure out how the whole system fits together. However that is really hard to get right (and Agile discourages it - which is right for small projects where those architects add complex things to mitigate problems you will never have)
Relevant xkcd: https://xkcd.com/2044/
The question is this. Do the costs of monorepo justify the benefits for your situation? The answer is not always yes.
I spent some years at small to mid-sized companies (~30-100 devs) that would have profited from a monorepo. However, people were in the habit of splitting repositories every other month. Sometimes a single team would own a given repository, but more often than not several teams would contribute to each repository.
I have serious PTSD from that time. Every single pipeline in every repo worked differently, tools were different, setup, scripts and commands were different. In some repositories you could trust the CI pipeline, in others you absolutely couldn't. CI performance gains in one repo wouldn't translate to another. And of course you often still had some tech debt lying around from when repositories got split and people forgot to clean up thoroughly. Uggh.
Now, to be fair, people did put quite a bit of effort into their setups and pipelines and all that, and it wasn't that the outcome of that effort in and by itself was totally bad. Not at all. But everyone did things differently. And often teams just re-did the work other people had already done & solved – there was a ton of overhead.
Finally, the worst part were inter-repository dependencies. People argued coupling would be loose and all that, so we could easily split repos, but in reality there were so many implicit dependencies because ultimately all repositories made up one big application. Merge requests had to be coordinated. One repo imported files from another. Later a CI pipeline in one repo triggered a pipeline in another repo…
This brings me to another problem with polyrepos: They harm discoverability and cause certain dependencies not to be encoded in code. In a monorepo, in contrast, when people wonder where they can find X, or whether anyone uses or depends on Y, the answer is usually only a `ripgrep` or `git grep` away. In a polyrepo, however, they need to know where to look first.
Born from all these experiences, my mantra has been: Split repos only if you really have to and have a very good reason! (Entirely different applications or ecosystems; different parts of the code owned by different & independent teams; …)
The belief that a monorepo makes a change somehow more atomic is one of the traps.
From the article:
> The greatest power and biggest lie of the monorepo is that it is possible to make atomic commits across your entire codebase. [...]
> Your monorepo now contains many different deployable artifacts that deploy at different times. It is also technically possible to make, for example, a breaking change to a service’s interface, a service’s implementation, and the service’s clients all in one PR. However, this PR will break when you deploy it because you do not deploy your service and all of its clients atomically. While this is also possible in a world with many repositories, the requirement to do this change in multiple pull requests is often enough to remind engineers that breaking changes to a service contract are not safe to make.
> Your users must understand that your deployment system operates asynchronously with respect to what happens in the monorepo. Its primary interaction with the monorepo is to go and pick up the “latest” build artifacts for a particular service; everything else happens on timetables that are potentially not under your control and can happen arbitrarily far in the future.
> A common CI job in a monorepo is to validate service contracts and make sure that they are not broken unless the author deliberately intended to do so, and they are required to provide a justification as to why such a change is OK.
A monorepo does make changes atomic in the code. There's no trap there.
You're talking about deployment, and yes when deployment is staggered, then obviously all atomic changes need to be backward-compatible, or else be very carefully orchestrated. But that doesn't have anything to do with monorepo vs polyrepo. That's just staggered deployment.
You have to deal with backwards compatibility in both cases. But at least with the monorepo you can see and track and merge all the changes related to a feature in one place, and be able to roll them back in one place.
(Or you turn services off for the duration of the deploy. Most companies do not want that these days.)
Also, you're missing this part of the article:
> While this is also possible in a world with many repositories, the requirement to do this change in multiple pull requests is often enough to remind engineers that breaking changes to a service contract are not safe to make.
And I'm not missing any part of the article. I was talking about your comment, and the fact that you are conflating two different things. A monorepo allows you to make atomic commits that a polyrepo does not, full stop. There's no trap there. Deployment is separate. You have to worry about breaking changes regardless of it being a polyrepo or monorepo. But a monorepo can make each atomic change far easier to track and manage and reason about.
A really useful heuristics when you are designing programming environments is: the more power you give to a team of developers, the more problems you will have.
Technically, atomic commits are not more power, they are less. But it does empower the team to work with bad interfaces. And that's a power that creates problems.
Your deploys don't become atomic. Your builds don't become atomic. At best you get to tell yourself you can version a bit more loosely.
Mono-repos themselves do not scale easily. Inherently it's a harder technical problem to solve. You need to toss git and find something better, which is not easy. It's work. It's so much work that it is incredibly clear that you've never experienced it yourself.
I never quite got the arguments/grumblings about this stuff, but perhaps I do not understand what friction people are hitting up against. It feels to me the problem of multiple repositories is the same problem as having to manage versions of dependencies, like in pip or something, that you do not even own.
Perhaps people are using tools that mean they are not able to search/click through their multi repo setup even though the code connects?
One top level "superrepo" is the best of both worlds. You don't hit the limits of git, you can include third party repos, you can seal off access to some secret repos. All while still getting the benefit of everything checked out in one place. Tooling for flattening the git log exists and is relatively cheap to build yourself, and it's easy to design CI that runs both big and small jobs in the superrepo and the subcomponents.
Nested submodules = game over. Don't ever do it. This is the worst of both worlds. Common components included twice or more need to be aligned to the exact same revision, without the help of semantic versioning. Components far down the dependency tree need to be uplifted recursively through a chain of submodules, resulting in 10 commits with no common CI for one tiny change.
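For reference, a single-level superrepo bump with plain git submodules looks something like this (paths and branch setup are illustrative):

```
cd superrepo
# Fetch and move the two components to the branch they track.
git submodule update --remote components/libfoo components/foobar
git add components/libfoo components/foobar
git commit -m "Bump libfoo and foobar to latest green"
```

One commit in the superrepo, one CI run over the combination.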
There's also a vicious feedback loop of separating projects across repos creating a huge impediment for touching anything outside of the project you're directly working in.
You can have a monorepo and still fail if every team works on their own branch and then attempts to integrate into trunk the week before your quarterly release process begins.
You can fail if a core team builds a brand new version of the product on master with all new tests such that everything is green on every commit but your code is unreleasable because customers aren’t ready for v2 and you need to keep that v1 compatability around.
If you work in medical or aviation, every release legally needs extensive testing - months of it - before you can release. If there are issues found in that testing, you start over. Not all tests can be automated.
I work in agriculture. The entire month of July there will be nobody in the world using a planter or any of the software on it, so there is no point in a release then. The lack of users means we cannot rely on automated rollback if the change somehow fails for customers - we could, but it would be months of changes rolled back when Brazil starts planting season.
[1] https://www.servicenow.com/docs/bundle/yokohama-it-business-...
On the other hand even the big tech companies will only expose code paths very slowly and very conservatively. Meta’s Threads.app for example combined both a constant churn of innovation on master with a very measured gating of new features shipping to the public.
The best teams do indeed, as you say, ship and test finished builds on a weekly or daily basis even if the stuff that gets under the customers’ / users’ / clients’ noses appears on a far less regular basis. After all, any kind of severe bug could necessitate a release at any moment.
It is: https://github.com/facebookincubator/buck2-change-detector
> Some tools such as bazel and buck2 discourage you from checking in generated code and instead run the code generator as part of the build. A downside of this approach is that IDE tools will be unable to resolve any code references to these generated files, since you have to perform a build for them to be generated at all in the first place
Not an issue I have experienced. It's pretty difficult to get into a situation where your IDE is looking in buck-out/v2/gen/781c3091ee3/... for something but not finding it, because the only way it knows about those paths is by the build system building them. Seeing this issue would have to involve stale caches in still-running IDE after cleaning the output folder, which is a problem any size repo can have. In general, if an IDE can index generated code with the language's own build system, then it's not a stretch to have it index generated code from another one.
The problem is more hooking up IDEs to use your build system in the first place. It's a real slog to support many IDEs.
Buck recently introduced an MSBuild project generator where all build commands shell out to buck2. I have seen references to an Xcode one as well, I think there's something there for Android as well. The rust-analyzer support works pretty well but I do run a fork of it. This is just a few. There is a need (somewhat like LSP, but not quite) for a degree of standardization. There is a cambrian explosion of different build systems and each company that maintains one of them only uses one or two IDEs and integrates with those. If you want to use a build system for an IDE they don't support, you are going to have a tough time. Last I checked the best effort by a Language Server implementation at being build-system agnostic is gopls with its "gopackagesdriver" protocol, but even then I don't think anyone but Bazel has integrated with it: https://github.com/bazel-contrib/rules_go/wiki/Editor-and-to...
Since our monorepos were used exclusively for frontend applications, we could rely entirely on the JavaScript/TypeScript ecosystem, which kept things manageable.
What I learned is that a good monorepo often behaves like a “polyrepo in disguise.” Each project within it can be developed, hosted, and even deployed independently, yet they all coexist in the same codebase. The key benefit: all projects can share code (like UI components) to ensure a consistent look and feel across the entire product suite.
If you're looking for a more practical guide, check out [0].
We can start the entire platform, Kubernetes operators and all, locally on our laptops using Tilt + Bazel + Kind. This works on both Mac and Linux. This means we can validate essentially all functionality, even our Bottlerocket-based OS with Firecracker, locally without requiring a personal development cluster or such.
We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.
It's been a HUGE blessing. It has taken some effort, will take continuous effort and to be fair it has been crucial to have an ex Google SRE on the team. I would never want to work in another way in the future.
EDIT: To clarify, our repo is essentially only Golang, Bash and Rust.
And does it really matter what you go with when you've got 1.5 engineers?
It's a non-problem at that scale as both engineers are intimately aware of how the entire build process works and can keep it in their head.
At that scale I've done no repo at all, repo stored on Dropbox, repo in VCS, SVN, whatever, and it all still worked fine.
It really hasn't added anything at all to your success.
BTW, it's still common for developers to start entire repos on their own laptops with zero hassles in tons of dev shops that haven't been silly and used k8s with 2 developers.
In fact, at the start of my career I worked with 10 or so developers using the shitty old MS one where you had to lock files so no one else could use them. You'd check out files to allow you to change them (very different from git checkout); otherwise they'd be read-only on your drive.
And the build was a massive VB script we had to run manually with params.
And it still worked.
We got some moaning when we moved to SVN too at how much better the old system was. Which was ridiculous as you used to have to run around and ask people to unlock key files to finish a ticket, which was made worse as we had developer consultants who'd be out of office for days on end.
So then you'd have to go hassle the greybeard who had admin rights to unlock the file for you (although he wasn't actually that old and didn't have a beard).
What you do is store the git repo in Dropbox, and developers just use it as a remote. With backups, this could actually go a reasonably long time, although I personally wouldn’t suggest it.
We are not using a microservice pattern at all. I am not sure where you get that from. If anything we have several "macro services".
Our final setup is quite complex as we are building a literal cloud provider, but in practice we have a Go API, a Docker registry, a Temporal Worker and a Kubernetes controller. What's complicated is everything else around it. We run our platform on bare-metal and thus have auxiliary services like a full-blown Kubernetes cluster, Ory Hydra + Kratos, SpiceDB, Cilium, Temporal Cluster + Workers and some other small things. We need to be able to test this locally to feel safe to release to production. And in turn our production environment is almost identical to our local environments.
None of that would be possible unless we've done something similar to what we have built today. Most companies cannot run their entire stack on their laptop, more unlikely that they could run a full cloud provider.
Yes, with one and a half FTEs you should only have a single repo.
My experience with Bazel has been extremely bad, but I don’t think that it should necessarily be avoided completely. It may actually have some value on extremely large multi-team projects. But for less than two FTEs it seems like massive overkill.
I believe that you could do what you need with Kind (and maybe Tilt?), without Bazel.
> We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.
Go kind of does that for you already, with go.mod. Since kubectl is a Go program, you could achieve that goal the same way.
> it has been crucial to have an ex Google SRE on the team
I wonder how many additional team members y’all could afford in return for an ex-Googler’s salary expectations.
I sincerely hope that y’all find the maintenance expense of Bazel to be worth it going forward. Hopefully you will!
I had massive issues at my previous employer with Bazel. They did not try to make Bazel work for non-SREs, which as you can imagine didn't work very well. So it's definitely not a silver bullet!
We should probably write a blog post about our setup!
Would you mind elaborating and providing some examples of what was bad?
We have a monorepo built using bazel, and at first when new to bazel, I was pretty frustrated. But now I can't think of any issue I've had with it recently.
But we do have a relatively simple setup.
> We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.
Currently I have to run `bazel run <tool>`. Your solution sounds way better. How does yours work?
The way I'd naively set up something like OP described would be to have direnv + nix flake deliver you a copy of bazelisk, and then have some custom shell scripts added to $PATH that alias `go = bazel run go`, `kubectl = bazel run kubectl` or whatever custom wrappers you want.
(Handwaving and I know the above isn't quite correct)
The first step is roughly a small Starlark rule that writes an executable wrapper script for each tool:

```
# Emit a launcher script for the tool; `out` stands in for the rule's declared executable output.
ctx.actions.write(
    output = out,
    is_executable = True,
    content = """#!/usr/bin/env bash
tool_path=$(realpath {tool_short_path})
cd ${{BUILD_WORKING_DIRECTORY}}
exec "$tool_path" "$@"
""".format(tool_short_path = tool.short_path),
)
```
The purpose of this rule is to ensure that the tool's CWD is actually where you are inside the repository and not within the runfiles folder that Bazel prepared for you.
The second step is to set up a symlink target, similar to this:
```
#! /usr/bin/env bash
tool_name=$(basename $0)
exec -a "$tool_name" bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress //tools/bin:$tool_name -- "$@"
```
We need to filter out all UI events since some of the tools we intercept (such as jq) expect stdout to be clean of other output when used programmatically.
We then create a symlink for each tool name (say kubectl) to this script from another folder, and then we use `direnv` to inject the folder of symlinks into the user's paths with an `.envrc` file in the repository root like this:
```
PATH=$PWD/tools/path:$PATH
```
We have had this in place for quite a while now - it does seem like this pattern has caught some more wind, and buildbuddy.io has released a ruleset: https://github.com/buildbuddy-io/bazel_env.bzl which, paired with https://github.com/theoremlp/rules_multitool, achieves the same thing we have built internally. The main difference is that with the bazel run wrapper we have made, you always run the latest version, whereas with the bazel_env pattern you need to manually rerun their target to get the latest binaries. :)
It really makes development in dev mode super simple and easy, and running all of the services in local dev environment is as simple as running one command, ‘tmuxinator’ at root of our monorepo and boom everything is up.
Monorepos truly outcompete individual repos for almost all projects; it's far more pleasurable ever since I changed to this method of development.
Come back when you have millions of lines of code, written over decades by hundreds (or thousands) of full time developers.
One thing your tool appears to be missing (IMO) is execution sandboxing. This is useful, as you likely know, for avoiding undeclared dependencies and for avoiding dirty builds due to actions polluting the source directory, among other things. I was playing around with allowing configurable sandboxing, with symlink forest and docker as two initial options.
> A monorepo is a single repository containing multiple distinct projects, with well-defined relationships.
It would be better if there were terms that delineated "one repo for the company" from "one repo per project" from "many repos for a single project".
By default a monorepo will give you $current and nothing else.
A monorepo is not a bad idea, but you should think about either preventing a breaking change in some dependency from killing the build globally, or having some sort of artefact store that allows versioned libraries (both have problems; you'll need to work out which is better for you).
When the folks working on the monorepo really need to slam through a change in the now-independent monorepo, we can use git submodules.
You need to look at your development model as a whole and decide whether the happy path incentivises good or bad development practices.
Do you want to incentivise the creation of technical debt with a myriad of versioned dependencies, or do you want to incentivise designing code to be evolvable and reusable?
Worked at a FAANG with a monorepo, and everything was partially broken most of the time. It's trivial to bring in dependencies, which is great - super fast re-use.
The problem is, it's trivial to add dependencies. That means that bringing in a library to manage messages somehow also requires a large amount of CUDA code.
A basic Python programme would end up having something like >10k build items to go through each build.
I think dependency management should be manual - make it intentional - and yes slightly harder.
If you have a statically typed language (reflection-like mechanisms aside), you can make the compiler do the work of determining whether the right dependencies are there, and you can massively cut down the dependency trees.
i.e. there is a mismatch between the semantics of automatic dependency trees and what you actually need when you import.
So if you want to use library B from module A, I don't need the dependencies of B such that the whole of B compiles; I just need the dependencies of B that enable my very specific use of B.
So if you add module B to your project and run your compiler, it should tell you what further dependencies you need - rather than assuming you need to bring in C, and because you brought in C you also need D and E, etc.
Given you only add dependencies once, I don't think it's a big deal to force developers to spend 5 mins determining exactly what they need rather than importing the world.
If you adopt this discipline, you basically don't need a monorepo. Every team can have its own repo and depend on other stuff as third party. This adds some friction, but removes some other kinds of friction, and overall I think it's a better compromise.
I find your comment really interesting because having the capability to point to the HEAD (or realistically a commit SHA) is a feature I sometimes really enjoy about not using monorepos.
I think the one version rule is the most important part for a healthy monorepo.
Otherwise you are doomed because there are so many different versions of everything in use. Some day a zero-day issue will hit all your projects at the same time and you will need months to get each in-use version fixed.
This added friction means you will do something unsavory to rush out a fix for yourself instead of fixing it in the dependency, waiting for a release, bumping the version of your dependency, then finally making your own release.
All the downsides of svn aside, the partial checkout was great for a repo containing practically the entire K source tree.
I think what I'm getting at is that maybe the real missing feature isn't whatever it is that allows you to make stupidly large monorepos, but that maybe we should add Perforce's client workspace model as a git extension?
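git's closest existing answer is partial clone plus sparse checkout, which is not a full client-spec mapping but does keep checkout and status costs proportional to what you opt into (URL and paths are placeholders):

```
git clone --filter=blob:none --sparse https://git.example.com/big-monorepo.git
cd big-monorepo
git sparse-checkout set services/payments libs/common
```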
Perforce had a classic file-locking model where a central server was in charge of file locks: a file was read-only until it was unlocked, and the number of users that could unlock a file at the same time was often as low as 1.
So even if most Perforce operations were O(n^2) or worse, n was often only the number of unlocked files, not the number of files in the repo. git status checks the full worktree, so n = files in the (visible part of the) repo.
The "file is locked by another user" problem led to doing a lot of work outside Perforce itself. Often diff and patch tools and patch queues/changeset queue tools would proliferate around Perforce repos not provided by Perforce itself, but mini-VCSes built on top of Perforce. (Which is part of why Microsoft entirely forked Perforce early on. If you are already building a VCS toolkit on top of the VCS, might as well control that, too.)
A big point about git and its support for offline work, is that it works nothing like Perforce and you mostly don't want it to. A big benefit to git's model is that we mostly aren't using git as a low-level VCS toolkit and using a diaspora of other tools on top of git. (Ironically so, given git's original intent was to be the low-level VCS toolkit and early devs expected more "porcelain" tools to be built on top of it as third-party projects.)
If this sort of stuff happens to be something you might want to work on, our team has multiple openings... if you search for "bazel" on our careers page, you'll find them.
Hum... Does that phrase mean you don't use anything remotely similar to a monorepo?
[0]: https://ark-vcs.com