One thing that this article didn't mention is that most development was done either on your development server running in a datacenter (think ~50-100 cores) or on an "on demand" machine - essentially a short-lived container that generally stayed up to date with known-good commits every few hours. IDEs were integrated with the devservers/machines, and language servers and other services were generally prewarmed or automatically set up via Chef/Ansible, etc. Rarely would you want to run the larger monorepos on your laptop client (exceptions would generally be mobile apps, macOS apps, etc.).
I think for a lot of users it's more important that the monorepo devenv be reproducible than be specifically local or specifically remote. It's certainly easier to pull this off when it's a remote devserver that gets regularly imaged.
I did not work at that place but the story sounds very familiar – I believe there might have been a blog post about that remote development environment here on HN some time ago?
I have done this for many small teams as well.
It remains pretty hard to get engineers to stop "thinking locally" when doing development. And with what modern hardware looks like (in terms of cost and density), it makes a lot of sense to find a rack somewhere for your dev team... It's easy enough to build a few boxes that can run dev, staging, test and whatever other on-demand tooling you need, with room to grow.
When you're close to your infrastructure and it looks that much like production, when you have to share the same playground, the code inside a monorepo starts to look very different.
> managing large scale build systems ends up taking a team that works on the build system itself
This is what stops a lot of small teams from moving to monorepo. The thing is, your 10-20 person shop is never going to be google or fb or ms. They will never have large build system problems. Maintaining all of it MIGHT be someone's part time job IF you have a 20 person team and a very complex product. Even that would be pushing it.
I think this is a business opportunity: someone could sell the polished monorepo experience and tools to companies that have real engineering organizations but can't pull off a successful "we need to fork git" project to support their developers.
Previous mono repo experiences were nothing short of a nightmare so it’s refreshing to see tooling come so far.
Source: my job’s monorepo is running Nx, but I’m not in developer productivity; I used to work with a large codebase whose accompanying test suite took thousands of hours on a single box, and it’s kinda like watching people rediscover the roundness required to make the wheel.
One is the kind described in the article here: "THE" monorepo of the (mostly) entire codebase, requiring a custom VCS, custom CI, and a team of 200 engineers supporting this whole thing. Uber and Meta and I guess Google do it this way now. It takes years of pain to reach this point. It usually starts with the other kind of "monorepo":
The other kind is the "multirepo monorepo" where individual teams decide to start clustering their projects in monorepos loosely organized around orgs. The frontend folks want to use Turborepo and they hate Bazel. The Java people want to use Bazel and don't know that anything else really exists. The Python people do whatever the python people do these days after giving up on Poetry, etc... Eventually these might coalesce into larger monorepos.
Either approach costs millions of dollars and millions of hours of developers' time and effort. The effort is largely defensible to the business leaders by skillful technology VPs, and the resulting state is mostly supported by the developers who chose to forget the horror that they had to endure to actually reach it.
If a company has up to 100 services, there won't be VCS scale problems, LSP will be able to fit the tags of the entire codebase in a laptop's memory, and it is probably _almost_ fine to run all tests on CI.
TL;DR not every company will/should/plan to be the size of Google.
That is mitigated a lot by a really good caching system (and even more by full remote build execution) but most times you basically end up needing a 'big iron' build system to get that, at which point it should be able to run the changed subset of tests accurately for you anyway.
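For anyone unfamiliar, the client side of this is just a couple of flags once the backend exists; a minimal sketch with Bazel, where the endpoints are placeholders:

```
# Point builds/tests at a shared remote cache and remote execution service (URLs are made up).
bazel test //... \
  --remote_cache=grpcs://cache.internal.example.com \
  --remote_executor=grpcs://rbe.internal.example.com \
  --remote_download_minimal   # avoid pulling every intermediate artifact back to the laptop
```

The hard part is operating the cache/executor fleet and keeping actions hermetic, not the client-side configuration.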
Although, time is money, so often scaling build agents may be cheaper than paying for the engineering time to redo your build system...
Which is to say that trying to avoid running tests isn't the right answer. Make them as fast as you can, but be prepared to pay the price - either a lot of parallel build capacity, or lower quality.
Part of it might be that Playwright makes it much easier to write and organize complex tests. But for that specific project, it was as close to a 1 to 1 conversion as you get, the speedup came without significant architectural changes.
The original reason for switching was flaky tests in CI that were taking way too much effort to fix over time, likely due to oddities in Cypress' command queue. After the switch, and in new projects using Playwright, I haven't had to deal with any intermittent flakiness.
First, there is “rapidly” as pertains to the speed of running tests during development of a change. This is “did I screw up in an obvious way” error checking, and also often “are the tests that I wrote as part of this change passing” error checking. “Rapid” in this area should target low single digits of minutes as the maximum allowed time, preferably much less. This type of validation doesn’t need to run all tests—or even run a full determinator pass to determine what tests to run; a cache, approximation, or sampling can be used instead. In some environments, tests can be run in the development environment rather than in CI for added speed.
Then there is “rapidly” as pertains to the speed of running tests before deployment. This is after the developer of a change thinks their code is pretty much done, unless they missed something—this pass checks for “something”. Full determinator runs or full builds are necessary here. Speed should usually be achieved through parallelism and, depending on the urgency of release needs, by spending money scaling out CI jobs across many cores.
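For concreteness, a "determinator" just maps changed files to the tests that depend on them. A crude, Bazel-flavoured sketch follows; real determinators (bazel-diff, target-determinator, etc.) also handle deleted files, BUILD/toolchain edits, and flag changes, all of which this ignores:

```
# Files changed on this branch relative to main.
changed=$(git diff --name-only origin/main...HEAD)

# Map each changed path to its Bazel label; files Bazel doesn't know about are skipped.
labels=$(for f in $changed; do bazel query "$f" 2>/dev/null; done)

# Run only the tests that transitively depend on those files.
bazel test $(bazel query "tests(rdeps(//..., set($labels)))")
```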
Now the hot take: in nearly every professional software development context it is fine if “rapidly” for the pre-deployment category of tests is denominated in multiple hours.
Yes, really.
Obviously, make it faster than that if you can, but if you have to trade away “did I miss something” coverage, don’t. Hours are fine, I promise. You can work on something else or pick up the next story while you wait—and skip the “but context switching!” line; stop feverishly checking whether your build is green and work on the next thing for 90min regardless.
“But what if the slow build fails and I have to keep coming back and fixing stuff with a 2+ hour wait time each fix cycle? My precious sprint velocity predictability!”—you never had predictability; you paid that cost in fixing broken releases that made it out because you didn’t run all the tests. Really, just go work on something else while the big build runs, and tell your PM to chill out (a common organizational failure uncovered here is that PMs are held accountable for late releases but not for severe breakage caused by them pushing devs to release too early and spend less time on testing).
“But flakes!”—fix the flakes. If your organization draws a hard “all tests run on every build and spurious failures are p0 bugs for the responsible team” line, then this problem goes away very quickly—weeks, and not many of them. Shame and PagerDuty are powerful motivators.
“But what if production is down?” Have an artifact-based revert system to turn back the clock on everything, so you don’t need to wait hours to validate a forward fix or cherry-picked partial revert. Yes, even data migrations.
Hours is fine, really. I promise.
There are also categories of work that are so miserable with long deployment times that they just don’t get done at all in those environments. Things like improving telemetry, tracing, observability. Things like performance debugging, where lower envs aren’t representative.
I would personally never go back, for a system of moderate or greater distributed complexity (i.e. >10 services, 10 total data stores).
It was the "THE" monorepo, and it made understanding the company's service graph, call graph, ownership graph, etc etc. incredibly clear. Crystal clear. Vividly so.
Polyrepos are tribal knowledge. You don't know where anything lives and you can't easily search for it or discover it. Every team does their own thing. Inheriting new code is a curse. Code archeology feels like an adventure in root cause analysis in a library of hidden and cryptic tomes.
Polyrepos are like messages and knowledge locked away inside Discord or Slack channels with bad retention policies. Everything atrophies in the dark corners.
If monorepos cost millions, I'd say polyrepos do just the same in a different way.
Monorepos are a continent of giant megafauna. Large resources, monotrophic.
Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.
Why can't we add millions of dollars of tool engineering on top of polyrepos to get some of the benefits of monorepos without a lot of the pain? E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure
And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository
> E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure
Bitbucket does this out-of-the box :)
Then you lose the capability to atomically make a commit that crosses repos. I'm not sure if there is any forge that allows that, except Gerrit might with its topics feature (I've not gotten the opportunity to try it).
Let's take for example a service 'foobar' that depends on an in-house library 'libfoo'. Now you need to add a feature to foobar that needs some changes to libfoo at the same time (and for extra fun, let's say those changes will break some other users of libfoo). Of course, during development you want to run pipelines for both libfoo and foobar.
In such 'super module' system it gets pretty annoying to push changes for testing in CI when every change to either libfoo or foobar needs to be followed by a commit to the super repo.
In a monorepo that's just another Tuesday.
Again, tooling issue. CI can easily pull required changeset across multiple repos. We are in a subthread under "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"
The costs of the infra/build/CI work are of course more visible when there is a dedicated team doing it. If there is no such central team, the cost is just invisibly split between all the teams. In my experience this is more costly overall, due to every team rolling their own thing and requiring them to be jack-of-all-trades in rolling their own infra/build/CI.
> And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository
a) If repository permissions aren't set centrally but every team gets to micromanage them, then they usually end up too restrictive and you don't even get read-only access.
b) When you are browsing through repositories you see a description, tags, technologies used, contributors, number of commits, releases etc. Massive difference in discovery versus a directory.
The short answer, start with a package management system like conan or npm (we rolled our own - releasing 1.0 the same month I first heard of conan which was then around version 0.6 - don't follow our example). Then you just need processes to ensure that everyone constantly has the latest version of all the repos they depend on - which ends up being a full time job for someone to manage.
Don't write your own package manager - if you use a common one that means your IDE will know how to work with it - our custom package manager has some nice features but we have to maintain our own IDE plugin so it can figure out the builds.
One full time job equivalent can buy a lot of tooling. Tooling that not only replaces this role but also shifts the feedback a lot closer to dev introducing the breaking change.
I've worked at big, successful, boring F500 companies with a polyrepo setup and it's boring as well. For this company, the flow was: Jenkins checked out the repo, ran the Jenkinsfile, an artifact was created and pushed to JFrog Artifactory. We would update the Puppet file in our repo and, during an approved deploy window in ServiceNow, Puppet would do the deploy. Because of this, repos had a certain fixed structure, which was annoying at times.
Pain points that were not solved: four different teams (Jenkins, Puppet, InfoSec, and the dev team) involved in touching everything, and the breakdowns that would happen as a result.
If you have a large project there is no getting around the issues you will have. Just a set of pros and cons.
There are better tools for polyrepo you can start with, but there are a lot of things we have that I wish I could get upstreamed (there is good reason the open source world would not accept our patches even if I cleaned them up).
Not quite - it's "vs stock polyrepo with millions of dollars of engineering effort in manually doing what the monorepo tooling does".
I don't think the "stock polyrepo" characterization is apt. Organizations using polyrepos already do invest that kind of money. Unfortunately, this effort is not visible because it's spread out across repos and every team does their own thing. So then people erroneously conclude that monorepos are much more expensive. Like the GP said:
> Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.
A monolith that's broken up into 20 libraries in their own repos also prevents experimentation with new runtimes just as much as the monorepo version does.
Monorepo also means a team 'vetting' new third-party libs, a team telling you your CI takes too long, and a team telling you to upgrade your lib within 23 minutes because there's a security issue in the Korean language support...
It sounds like you worked in a dysfunctional organization that happened to use a monorepo. Their dysfunctions are not inherent in the monorepo model and they would have found other ways to be dysfunctional if not those.
I might have missed something.
This means that developers have a monorepo for day to day work, but the CI/CD issues are isolated in their own separate repos, and can be handled separately.
Dunno if that's 100% of what they mean but it seems to be a solution to what you describe in another message ("our CI/CD pipeline doesn't allow us to do so and it is not handled by our team anyway")
I can't answer that question, and there are reasons to go monorepo anyway. However, if your problem is a bad polyrepo split, going to a monorepo is the obvious answer - but it isn't the only answer. Monorepo and polyrepo each have very significant problems (see the article for the monorepo ones) that are unique to that setup. You have to choose which set of problems to live with and mitigate them as best you can.
Because it sounds like you just need flag based feature releases.
To be totally honest, yes this is an unbelievable pain in the ass, but I much prefer the strict isolation. Having worked with (much, much) smaller monorepos, I find the temptation to put code anywhere it fits too much, and things quickly get sloppy. With isolated repos, my brain much more clearly understands the boundaries and separation of concerns.
Then again, this results in a lot of code duplication that is not trivial to resolve. Submodules help to a degree, but when the codebase is this diverse, you're gonna have to copy some code somewhere.
I view it sort of like the split between inheritance and composition. You can either inherit code from the entire monorepo, or build projects from component submodules plugged together. I much prefer the latter solution, but clearly the former works for some people.
[0] https://wstomv.win.tue.nl/edu/2ip30/references/criteria_for_...
I wanna link other repos I depend on, but those repos can be read-only. And then all the tools work without extra friction.
P.S.: This could be another wish for jj!
Bazel is complex and isn't the easiest to pick up for many (though to Google's credit the documentation is getting better). Buck isn't any better in this regard. Pants seems easiest out of all the big ones I've seen but its also a bit quirky, though much easier to get started with in my experience. NX is all over the place in my experience.
Until recently, too, most of these monorepo systems didn't have good web-ecosystem support, and even those that do don't handle every build case you want them to, which means you have to extend them in some way and maintain that.
It also doesn't help that most CI systems don't have good defaults for these tools and can be hard to set up properly to take advantage of their strengths (like shared cross-machine caching).
As an aside, the best monorepo tooling I have ever used was Rush[0] from Microsoft. If you are working in a frontend / node monorepo or considering it, do take a look. It works great and really makes working in a monorepo extremely uniform and consistent. It does mean doing things 'the rush way' but the trade off is worth it.
[0]: https://rushjs.io
I think most of the time the philosophical decision (more shared parts or better separation) is made before deciding how you'll deal with the repos.
Now, if an org changes direction mid-way, the handling of the code can still be adapted without fundamentally switching the repo structure. Many orgs are multi-repo but their engineers have access to almost all of the code, and monorepo teams can still have strong isolation of what they're doing, up to having different CI rules and deployment management.
This makes sense if you consider that:
1) Changes to system structure, especially changes to fundamentals when the system is already being built, are difficult, expensive and time consuming. This gives system designs inertia that grows over time.
2) Growing the teams working on a system means creating new organizational units; the more inertia the system has, the more sense it makes for growth to happen along the lines suggested by the system architecture, rather than forcing the system to change to accommodate some team-organization ideals.
Monorepo/multirepo is a choice that's very difficult to change once work on building the system starts, and it's a choice you commit to at the very beginning (and way before the choice starts to matter) - a perfect recipe for not a mere nudge, but a scaffolding the organization itself will grow around, without even realizing it.
Typically someone has read a few blog posts like the ones linked to, and has some vague ideas of the positives, but doesn't have a full understanding of how the disadvantages will shape their workflow.
I've seen people with experience at hobby or small scale successfully campaigning for a switch at work and then hitting a wall - in both directions. Updating every call site for a breaking change doesn't sound that onerous, and at a small scale it isn't. Having each team update versioned dependencies doesn't sound that hard, and at a small scale it isn't.
Just like with languages, don't listen to anyone who tells you this will solve all your problems. One of the options is merely the least bad for your situation.
As someone who works on core parts that are likely to break everything, I spend half of my time just integrating things and another quarter trying to figure out how to make my things either less core or not need changes so often.
There's not really a way around that when you need some behavioral change for the code using the library.
If a specific change in a monorepo is so centrally embedded it requires incredible effort to do atomically (the benefit of having the monorepo in the first place), you are still able to split it into multiple gradual changes (and "require coordinating several PRs over a couple of weeks and some internal politics, but that might also be split among different developers who aren't even on a dedicated build team.").
So in a monorepo you can still enjoy the same advantage you describe for multi repo, and you'll even have much better visibility into the rollout of your gradual change thanks to the monorepo.
What are the advantages vs having a mono repo per team?
State of the "code universe" being atomic seems like a single point of failure.
Team B has no idea this is happening, as they only review code in repo B.
Soon enough team A stops updating their dependency, and now you have two completely different libraries doing the "same" thing.
Alternatively, team A simply pins their dependency on team B's repo at hash 12345, then just... never updates. How is team B going to catch bugs that their HEAD introduces in team A's repo?
You can also configure update of dependencies https://docs.github.com/en/code-security/dependabot/dependab...
These work with vendored dependencies too.
(In our org, we have our own custom Go tool that handles more sophisticated cases like analyzing our divergent forks and upstream commits and raising PR's not just for dependencies, but for features. Only works when upstream refactoring is moderate though)
If you have two internal services, you can change them simultaneously. This is really useful for debugging with git bisect, as you always have code that passes CI.
I might write a detailed blog about this at some point.
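A hedged example of what that buys you; the tag and script name are made up:

```
# Bisect with confidence because every commit on the monorepo's main branch passed CI.
git bisect start
git bisect bad HEAD
git bisect good v1.42.0                # placeholder: last known-good release tag
git bisect run ./ci/test_payments.sh   # hypothetical script reproducing the regression
```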
* Much easier refactors. If everything is an API and you need to maintain five previous versions because teams X, Y, Z are on versions 12, 17, and 21 it is utter hell. With a unified monorepo you can just do the refactor on all callers.
* It builds a culture of sharing code and reuse. If you can search everyone's code and read everyone's code you can not only borrow ideas but easily consume shared helpers. This is much more difficult in polyrepo because of aforementioned versioning hell.
* A single source of truth. Server X is running at CL #123, Server Y at CL #145, but you can quickly understand what that means because it's all one source control and you don't have to compare different commit numbers - higher is newer, end of story.
Multiple large monorepos in an organization are highly valuable imo, and should become more of a thing over time.
Not just a sparse clone.
Optional, per-directory OWNERS files are common, and most VCS frontends (Github, Bitbucket, etc.) can be configured to prevent merges without approval from the owning team(s) or DRI(s).
PRs that intersect multiple teams' ownership would require sign-off from everyone impacted. So a team updating the company-wide "requests library" (or an equivalent change) with a wide blast radius would be notifying everyone impacted and getting their buy-in.
But there are many ways to manage write permissions - limit the directories to which engineers are allowed to push code. E.g. if you use Git, this can be done with Gitolite, which is a popular hosting server.
Gitolite has very flexible hooks support, especially with so-called "Virtual Refs" (or VREFs) [1]. Out of the box, it supports managing write permissions per path [2]. You can go even further and use your own custom binary for a VREF to "decide" if a user is allowed to push certain changes. One possible option: read the incoming changed files, read meta-information from the repository itself (e.g., a CODEOWNERS file at the root of the repo), and decide if the push should be accepted. GitHub has CODEOWNERS [3], which behaves similarly.
[1]: https://gitolite.com/gitolite/cookbook.html#vrefs [2]: https://gitolite.com/gitolite/vref.html#quick-introexample [3]: https://docs.github.com/en/repositories/managing-your-reposi...
No. And that's one reason small startups should separate frontend code into a separate monorepo.
If you would like to hire a contractor for SEO/web developer then give them access to frontend code. Keep the backend code segmented out.
I use git mainly because everybody knows it, tooling is there, etc.
Then you configure ACLs for every repo or branch.
- https://mill-build.org/blog/2-monorepo-build-tool.html
People like to rave about monorepos, and they are great if set up correctly, but there are a lot of intricacies that often go on behind the scenes to make a monorepo successful, and it's easy to overlook them since usually some "other" team (devops team, devtools team, etc.) is shouldering all that burden. Still worth it, but it must be approached with caution.
Imagine you have an internal library and also two consumers of that library in the repo. But then you make breaking changes to the library but you only have time to update one of the consumers. Now how can the other consumer still use the old version of that library?
Seems like a big restriction to me.
I don't see how being forced to upgrade all consumers is a good thing.
It forces implementers of broad or disruptive API or technical changes to be responsible for the full consequences of those decisions, rather than the consumers of those changes who likely don't have context. People make better choices when they have to deal with the consequences themselves.
It also forces the consequences to be incurred now as opposed to 6 months later when a consumer of the old library tries to upgrade, realizes they can't easily, but they need a capability only in the new version, and the guy who made the API change has left the company for a pay raise elsewhere.
As a plus, these properties enable gigantic changes and migrations with confidence knowing there aren't any code or infrastructure bits you missed, and you're not leaving timebombs for a different project's secret repo that wasn't included in the gigantic migration.
Bluntly, if you can't see why many people like it (even if you disagree), you probably haven't worked in an environment or technical context where mono vs poly truly matters.
The two monorepo ways to do this:
1. Use automated refactoring tools that now work because it's one repo (rough sketch below)
2. Add the new behavior, migrate incrementally, then remove the old behavior
...and the article points out correctly that it's a lie anyway, but at least you can find all the consumers easily.
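A crude illustration of option 1, assuming a Go codebase and a made-up rename from OldClient.Fetch to NewClient.Fetch; real migrations usually reach for syntax-aware tools (gofmt -r, comby, and the like) rather than sed:

```
# Find every consumer, rewrite it, and let CI run all affected tests in one change.
git grep -l 'OldClient\.Fetch(' -- '*.go' \
  | xargs sed -i 's/OldClient\.Fetch(/NewClient.Fetch(/g'
```

One review, one commit, and nobody is left on the old API by accident.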
I'm sure others do similarly, because there is no way you would allow arbitrary changes to creep in the middle of a multi-service rollout.
Seems like a big restriction to me.
It doesn't matter if you go monorepo or polyrepo: you will have issues as your project grows. You will need to mitigate those issues somehow.
Genuine question: if you can't have one commit in production at any given time, what advantages for the monorepo remain?
That might be possible in a simple library + 1 consumer scenario if you follow the other commentors' recommendation to always update library + consumer at once. But in many cases you can't, anyway, because you're deploying several artifacts or services from your monorepo, not just one. So while "1 commit in production at any given time" is certainly neat, it wouldn't strike me as the primary goal of a monorepo. See also this discussion about atomicity of changes further up: https://news.ycombinator.com/item?id=44119585
> what advantages for the monorepo remain?
Many, in my opinion. Discoverability, being able to track & version all inter-project dependencies in git, homogeneity of tooling, …
See also my other comment further up on common pains associated with polyrepos, pains that you typically don't experience in monorepos: https://news.ycombinator.com/item?id=44121696
Of course, nothing's free. Monorepos have their costs, too. But as I mention in the above comment and also in https://news.ycombinator.com/item?id=44121851, a polyrepo just distributes those costs and makes them less visible.
So if your code is testing fine, and someone makes a major refactor across the main codebase, and then your code fails, you have narrowed the window to only 15 minutes' worth of changes to sort through. As a result, people who commit changes whose blast radius is too large for pre-commit testing to cover can still validate their commits after the fact.
There's always some amount of uncertainty with any change, but the test it all methodology helps raise confidence in a timely fashion. Also decent coding practices include: Don't submit your code at the end of the day right before becoming unavailable for your commute...
We have something like ~40 repos in our private GitLab instance, and each one has its own CI pipeline, which compiles, runs tests, builds packages for distribution, etc. Then there's a CI task which assembles a file system image from those ~40 repos' packages, runs integration tests, etc.
Many of those components communicate with each other with a flatbuffers-defined message, which of course itself is a submodule. Luckily, flatbuffers allows for progressive enhancement, but I digress--essentially, these components have some sort of inter-dependency on them which at the absolute latest surfaces at the integration phase.
Is this actually a multi-repo, or is it just a mono-repo with lots of sub-modules? Would we have benefits if we moved to a mono-repo (the current round-trip CI time for full integration is ~35 minutes, many of the components compile and test in under 10s)? Maybe.
Everything is a tradeoff. Anything can work, it's about what kinds of frustrations you're willing to put up with.
This is a good thought! It actually needs to be O(1/commit rate) though, so that having the monorepo doesn't create long queues of commits.
Or have some process batch the passing, ready-to-merge PRs into a combined PR and try to merge that - and best-guess which PR is at fault if it fails.
Then if postsubmit fails you just have to rerun the intersection of failing tests and affected tests on each change since the last green commit.
Wireit is the smallest change from plain npm that gets you a real dependency graph of scripts, caching (with GitHub Actions support), incremental script running, and services.
The issue is that users of a library can put almost infinite friction on the library. If the library team wants to make a change, they have to update all the use sites, but Hyrum's Law will get you because users will do the damndest things.
So for the top organization, it's good if many other teams can utilize a great team's battle-tested library, but for the library team it's just liability (unless making common code is their job). In a place like Google you either end up with internal copies and forks, strict access control lists, or libraries that are slow as molasses to change.
I don't think there's anything wrong with copy pasting some useful piece of code too, not everything has to be a library you depend on, for small enough things.
In my organization we have around 70k internal git repos (and an order of magnitude fewer public ones), but of course not everything is related to everything else; we produce many distinct software products. I can understand "collect everything of a product to a single repo"; I can even understand going to "if there is a function call, that code has to be in the same repo". But putting everything into a single place... What are the benefits?
The benefit is the tooling, as the article mentioned. Everything in the repo is organised consistently, so I can make ad-hoc Python tools relying on relative paths knowing that my teammates have identical folder structure.
It just gets very difficult to manage, especially if people frequently need to work across many repos. Plus, onboarding is a pain in the ass.
Monorepo example: if I want to add a new TypeScript package/library for internal NodeJS use, we have a bootstrapping script that sets it up. And it basically:
1. Inherits a tsconfig that just works in the context of the repo
2. Jest is configured with our default config for node projects and works with TS out of the box.
3. Linting / formatting etc. are all working out of the box.
4. Can essentially use existing dependencies the monorepo uses
5. Imports in existing code work immediately since it’s not an external dependency
6. CI picks up on the new typescript & jest configs and adds jobs for them automatically
7. Code review & collaboration is happening in the same spot
8. This also makes it easier to have devs managing the repo — for example, routine work like updating NodeJS is a lot easier when you know everything is using a nearly identical setup & is automatically verified in CI.
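A hypothetical sketch of that kind of bootstrap flow; the script name and layout are invented, but this is the general shape:

```
./tools/create-package.sh my-new-lib
# Resulting layout, roughly:
#   packages/my-new-lib/tsconfig.json   -> just extends ../../tsconfig.base.json
#   packages/my-new-lib/jest.config.js  -> re-exports the shared Node preset
#   packages/my-new-lib/package.json    -> deps come from the workspace, nothing to version locally
# CI discovers the new package from the workspace globs, so no pipeline edits are needed.
```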
One challenge I had to help solve in a previous job was that onboarding was difficult because we had a small number of large repos everyone worked in. The standards were slightly different across them. Npm, pnpm, and yarn were all in use. Deployment worked pretty differently among them. CI setups were unique, and each of the large projects had, if not a team, some number of people spending a lot of time just managing the project’s workflows.
So many coordination things just get easier when there isn't an opportunity to get out of sync. If you do separate repos, you can totally share config… but now it costs a dependency-update PR in every repo to pull in even a tiny update to the shared unit test config. It's just guaranteed to get out of sync, and it's hard to catch issues when you can't validate a config change against all projects using it at the same time.
So because it becomes trickier (and takes work) just to do the action of syncing multiple repo’s setups… inevitably, you end up with some “standards” that are loosely followed and a lot of slightly different setups that get hard to untangle the longer they grow. If you can accept the cost of context switching between repos, or if people don’t need to switch, maybe it’s ok… until something like a foundational dependency update (NodeJS, Typescript, React, something like that) needed for security becomes extremely difficult because you have a million different ways of configuring things and the JS ecosystem sucks
The above breaks down when we have third-party code - since it doesn't follow our common patterns for building, it has to do something different. Bringing that into a monorepo would be just as different from everything else.
No, you do not, unless you mean N=1. Build scripts/tooling/linters etc are put into a different repo, are released and consumed by each individual repo.
This is absolutely ludicrous unless you fail to grasp the power of a hierarchical file system. I don't see how a big mess like CI/CD is made easier by spreading it out to more points of configuration.
To me the whole point of a monorepo is atomic commits for the whole org. The power of this is really hard to overstate when you are trying to orchestrate the efforts of lots of developers - contrary to many claims. Rebasing in one repo and having one big meeting is a hell of a lot easier than doing it N times.
Even if the people on the team hate each other and refuse to directly collaborate. I still don't see the reason to not monorepo. In this scenario, the monorepo becomes a useful management and HR tool.
What they're doing is creating a mass of complexity that is turning org-chart problems into future technical ones and at the same time not recognizing the intrinsic internal dependencies of the software systems they're building.
Luckily my current job is not like this, but the last one was, and I couldn't believe the wasted hours spent doing things as simple as updating the fields in a protobuf schema file.
The answer is IMO somewhere in between. Microservices can get too tiny and thus the system becomes impossible to understand. However a monolith is impossible to understand as well.
The real problem is you need good upfront architecture to figure out how the whole system fits together. However that is really hard to get right (and Agile discourages it - which is right for small projects where those architects add complex things to mitigate problems you will never have)
Relevant xkcd: https://xkcd.com/2044/
The question is this. Do the costs of monorepo justify the benefits for your situation? The answer is not always yes.
I spent some years at small to mid-sized companies (~30-100 devs) that would have profited from a monorepo. However, people were in the habit of splitting repositories every other month. Sometimes a single team would own a given repository, but more often than not several teams would contribute to each repository.
I have serious PTSD from that time. Every single pipeline in every repo worked differently, tools were different, setup, scripts and commands were different. In some repositories you could trust the CI pipeline, in others you absolutely couldn't. CI performance gains in one repo wouldn't translate to another. And of course you often still had some tech debt lying around from when repositories got split and people forgot to clean up thoroughly. Uggh.
Now, to be fair, people did put quite a bit of effort into their setups and pipelines and all that, and it wasn't that the outcome of that effort in and by itself was totally bad. Not at all. But everyone did things differently. And often teams just re-did the work other people had already done & solved – there was a ton of overhead.
Finally, the worst part were inter-repository dependencies. People argued coupling would be loose and all that, so we could easily split repos, but in reality there were so many implicit dependencies because ultimately all repositories made up one big application. Merge requests had to be coordinated. One repo imported files from another. Later a CI pipeline in one repo triggered a pipeline in another repo…
This brings me to another problem with polyrepos: They harm discoverability and cause certain dependencies not to be encoded in code. In a monorepo, in contrast, when people wonder where they can find X, or whether anyone uses or depends on Y, the answer is usually only a `ripgrep` or `git grep` away. In a polyrepo, however, they need to know where to look first.
Born from all these experiences, my mantra has been: Split repos only if you really have to and have a very good reason! (Entirely different applications or ecosystems; different parts of the code owned by different & independent teams; …)
The belief that a monorepo makes a change somehow more atomic is one of the traps.
From the article:
> The greatest power and biggest lie of the monorepo is that it is possible to make atomic commits across your entire codebase. [...]
> Your monorepo now contains many different deployable artifacts that deploy at different times. It is also technically possible to make, for example, a breaking change to a service’s interface, a service’s implementation, and the service’s clients all in one PR. However, this PR will break when you deploy it because you do not deploy your service and all of its clients atomically. While this is also possible in a world with many repositories, the requirement to do this change in multiple pull requests is often enough to remind engineers that breaking changes to a service contract are not safe to make.
> Your users must understand that your deployment system operates asynchronously with respect to what happens in the monorepo. Its primary interaction with the monorepo is to go and pick up the “latest” build artifacts for a particular service; everything else happens on timetables that are potentially not under your control and can happen arbitrarily far in the future.
> A common CI job in a monorepo is to validate service contracts and make sure that they are not broken unless the author deliberately intended to do so, and they are required to provide a justification as to why such a change is OK.
A monorepo does make changes atomic in the code. There's no trap there.
You're talking about deployment, and yes when deployment is staggered, then obviously all atomic changes need to be backward-compatible, or else be very carefully orchestrated. But that doesn't have anything to do with monorepo vs polyrepo. That's just staggered deployment.
You have to deal with backwards compatibility in both cases. But at least with the monorepo you can see and track and merge all the changes related to a feature in one place, and be able to roll them back in one place.
(Or you turn services off for the duration of the deploy. Most companies do not want that these days.)
Also, you're missing this part of the article:
> While this is also possible in a world with many repositories, the requirement to do this change in multiple pull requests is often enough to remind engineers that breaking changes to a service contract are not safe to make.
And I'm not missing any part of the article. I was talking about your comment, and the fact that you are conflating two different things. A monorepo allows you to make atomic commits that a polyrepo does not, full stop. There's no trap there. Deployment is separate. You have to worry about breaking changes regardless of it being a polyrepo or monorepo. But a monorepo can make each atomic change far easier to track and manage and reason about.
A really useful heuristics when you are designing programming environments is: the more power you give to a team of developers, the more problems you will have.
Technically, atomic commits are not more power, they are less. But it does empower the team to work with bad interfaces. And that's a power that creates problems.
Your deploys don't become atomic. Your builds don't become atomic. At best you get to tell yourself you can version a bit more loosely.
Mono-repos themselves do not scale easily. Inherently it's a harder technical problem to solve. You need to toss git and find something better, which is not easy. It's work. It's so much work that it is incredibly clear that you've never experienced it yourself.
I never quite got the arguments/grumblings about this stuff, but perhaps I do not understand what friction people are hitting up against. It feels to me the problem of multiple repositories is the same problem as having to manage versions of dependencies, like in pip or something, that you do not even own.
Perhaps people are using tools that mean they are not able to search/click through their multi repo setup even though the code connects?
One top level "superrepo" is the best of both worlds. You don't hit the limits of git, you can include third party repos, you can seal off access to some secret repos. All while still getting the benefit of everything checked out in one place. Tooling for flattening the git log exists and is relatively cheap to build yourself, and it's easy to design CI that runs both big and small jobs in the superrepo and the subcomponents.
Nested submodules = game over. Don't ever do it. This is the worst of both worlds. Common components included twice or more need to be aligned to the exact same revision, without the help of semantic versioning. Components far down the dependency tree need to be uplifted recursively through a chain of submodules, resulting in 10 commits with no common CI for one tiny change.
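For reference, a single-level superrepo bump with plain git submodules looks something like this (paths and branch setup are illustrative):

```
cd superrepo
# Fetch and move the two components to the branch they track.
git submodule update --remote components/libfoo components/foobar
git add components/libfoo components/foobar
git commit -m "Bump libfoo and foobar to latest green"
```

One commit in the superrepo, one CI run over the combination.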
There's also a vicious feedback loop of separating projects across repos creating a huge impediment for touching anything outside of the project you're directly working in.
You can have a monorepo and still fail if every team works on their own branch and then attempts to integrate into trunk the week before your quarterly release process begins.
You can fail if a core team builds a brand new version of the product on master with all new tests such that everything is green on every commit but your code is unreleasable because customers aren’t ready for v2 and you need to keep that v1 compatability around.
If you work in medical or aviation, every release legally needs extensive testing - months of it - before you can release. If there are issues found in that testing, you start over. Not all tests can be automated.
I work in agriculture. The entire month of July there will be nobody in the world using a planter or any of the software on it, so there is no point in a release then. The lack of users means we cannot rely on automated rollback if the change somehow fails for customers - we could, but it would be months of changes rolled back when Brazil starts planting season.
[1] https://www.servicenow.com/docs/bundle/yokohama-it-business-...
On the other hand even the big tech companies will only expose code paths very slowly and very conservatively. Meta’s Threads.app for example combined both a constant churn of innovation on master with a very measured gating of new features shipping to the public.
The best teams do indeed, as you say, ship and test finished builds on a weekly or daily basis even if the stuff that gets under the customers’ / users’ / clients’ noses appears on a far less regular basis. After all, any kind of severe bug could necessitate a release at any moment.
It is: https://github.com/facebookincubator/buck2-change-detector
> Some tools such as bazel and buck2 discourage you from checking in generated code and instead run the code generator as part of the build. A downside of this approach is that IDE tools will be unable to resolve any code references to these generated files, since you have to perform a build for them to be generated at all in the first place
Not an issue I have experienced. It's pretty difficult to get into a situation where your IDE is looking in buck-out/v2/gen/781c3091ee3/... for something but not finding it, because the only way it knows about those paths is by the build system building them. Seeing this issue would have to involve stale caches in still-running IDE after cleaning the output folder, which is a problem any size repo can have. In general, if an IDE can index generated code with the language's own build system, then it's not a stretch to have it index generated code from another one.
The problem is more hooking up IDEs to use your build system in the first place. It's a real slog to support many IDEs.
Buck recently introduced an MSBuild project generator where all build commands shell out to buck2. I have seen references to an Xcode one as well, I think there's something there for Android as well. The rust-analyzer support works pretty well but I do run a fork of it. This is just a few. There is a need (somewhat like LSP, but not quite) for a degree of standardization. There is a cambrian explosion of different build systems and each company that maintains one of them only uses one or two IDEs and integrates with those. If you want to use a build system for an IDE they don't support, you are going to have a tough time. Last I checked the best effort by a Language Server implementation at being build-system agnostic is gopls with its "gopackagesdriver" protocol, but even then I don't think anyone but Bazel has integrated with it: https://github.com/bazel-contrib/rules_go/wiki/Editor-and-to...
Since our monorepos were used exclusively for frontend applications, we could rely entirely on the JavaScript/TypeScript ecosystem, which kept things manageable.
What I learned is that a good monorepo often behaves like a “polyrepo in disguise.” Each project within it can be developed, hosted, and even deployed independently, yet they all coexist in the same codebase. The key benefit: all projects can share code (like UI components) to ensure a consistent look and feel across the entire product suite.
If you're looking for a more practical guide, check out [0].
We can start the entire platform, Kubernetes operators and all, locally on our laptops using Tilt + Bazel + Kind. This works on both Mac and Linux. This means we can validate essentially all functionality, even our Bottlerocket-based OS with Firecracker, locally without requiring a personal development cluster or such.
We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.
It's been a HUGE blessing. It has taken some effort, will take continuous effort and to be fair it has been crucial to have an ex Google SRE on the team. I would never want to work in another way in the future.
EDIT: To clarify, our repo is essentially only Golang, Bash and Rust.
And does it really matter what you go with when you've got 1.5 engineers?
It's a non-problem at that scale as both engineers are intimately aware of how the entire build process works and can keep it in their head.
At that scale I've done no repo at all, repo stored on Dropbox, repo in VCS, SVN, whatever, and it all still worked fine.
It really hasn't added anything at all to your success.
BTW, it's still common for developers to start entire repos on their own laptops with zero hassles in tons of dev shops that haven't been silly and used k8s with 2 developers.
In fact, at the start of my career I worked with 10 or so developers using the shitty old MS one where you had to lock files so no one else could use them. You'd check out files to allow you to change them (very different from git checkout); otherwise they'd be read-only on your drive.
And the build was a massive VB script we had to run manually with params.
And it still worked.
We got some moaning when we moved to SVN too at how much better the old system was. Which was ridiculous as you used to have to run around and ask people to unlock key files to finish a ticket, which was made worse as we had developer consultants who'd be out of office for days on end.
So then you'd have to go hassle the greybeard who had admin rights to unlock the file for you (although he wasn't actually that old and didn't have a beard).
What you do is store the git repo in Dropbox, and developers just use it as a remote. With backups, this could actually go a reasonably long time, although I personally wouldn’t suggest it.
We are not using a microservice pattern at all. I am not sure where you get that from. If anything we have several "macro services".
Our final setup is quite complex as we are building a literal cloud provider, but in practice we have a Go API, a Docker registry, a Temporal Worker and a Kubernetes controller. What's complicated is everything else around it. We run our platform on bare-metal and thus have auxiliary services like a full-blown Kubernetes cluster, Ory Hydra + Kratos, SpiceDB, Cilium, Temporal Cluster + Workers and some other small things. We need to be able to test this locally to feel safe to release to production. And in turn our production environment is almost identical to our local environments.
None of that would be possible unless we've done something similar to what we have built today. Most companies cannot run their entire stack on their laptop, more unlikely that they could run a full cloud provider.
Yes, with one and a half FTEs you should only have a single repo.
My experience with Bazel has been extremely bad, but I don’t think that it should necessarily be avoided completely. It may actually have some value on extremely large multi-team projects. But for less than two FTEs it seems like massive overkill.
I believe that you could do what you need with Kind (and maybe Tilt?), without Bazel.
> We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.
Go kind of does that for you already, with go.mod. Since kubectl is a Go program, you could achieve that goal the same way.
> it has been crucial to have an ex Google SRE on the team
I wonder how many additional team members y’all could afford in return for an ex-Googler’s salary expectations.
I sincerely hope that y’all find the maintenance expense of Bazel to be worth it going forward. Hopefully you will!
I had massive issues at my previous employer with Bazel. They did not try to make Bazel work for non-SREs, which as you can imagine didn't work very well. So it's definitely not a silver bullet!
We should probably write a blog post about our setup!
Would you mind elaborating and providing some examples of what was bad?
We have a monorepo built using bazel, and at first when new to bazel, I was pretty frustrated. But now I can't think of any issue I've had with it recently.
But we do have a relatively simple setup.
> We have made this tool layer which means if I run `go` or `kubectl` while in our repo, it's built and provided by Bazel itself. This means that all of us are always on the same version of tools, and we never have to maintain local installations.
Currently I have to run `bazel run <tool>`. Your solution sounds way better. How does yours work?
The way I'd naively set up something like OP described would be to have direnv + nix flake deliver you a copy of bazelisk, and then have some custom shell scripts added to $PATH that alias `go = bazel run go`, `kubectl = bazel run kubectl` or whatever custom wrappers you want.
(Handwaving and I know the above isn't quite correct)
The first step is roughly a small Starlark rule that writes an executable wrapper script for each tool:

```
# Emit a launcher script for the tool; `out` stands in for the rule's declared executable output.
ctx.actions.write(
    output = out,
    is_executable = True,
    content = """#!/usr/bin/env bash
tool_path=$(realpath {tool_short_path})
cd ${{BUILD_WORKING_DIRECTORY}}
exec "$tool_path" "$@"
""".format(tool_short_path = tool.short_path),
)
```
The purpose of this rule is to ensure that the tool's CWD is actually where you are inside the repository and not within the runfiles folder that Bazel prepared for you.
The second step is to set up a symlink target, similar to this:
```
#! /usr/bin/env bash
tool_name=$(basename $0)
exec -a "$tool_name" bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress //tools/bin:$tool_name -- "$@"
```
We need to filter out all UI events since some of the tools we intercept (such as jq) expect stdout to be clean of other output when used programmatically.
We then create a symlink for each tool name (say kubectl) to this script from another folder, and then we use `direnv` to inject the folder of symlinks into the user's paths with an `.envrc` file in the repository root like this:
```
PATH=$PWD/tools/path:$PATH
```
We have had this in place for quite a while now - it does seem like this pattern has caught some more wind, and buildbuddy.io has released a ruleset: https://github.com/buildbuddy-io/bazel_env.bzl which, paired with https://github.com/theoremlp/rules_multitool, achieves the same thing we have built internally. The main difference is that with the bazel run wrapper we have made, you always run the latest version, whereas with the bazel_env pattern you need to manually rerun their target to get the latest binaries. :)
It really makes development in dev mode super simple and easy, and running all of the services in local dev environment is as simple as running one command, ‘tmuxinator’ at root of our monorepo and boom everything is up.
Monorepos truly outcompete individual repos for almost all projects; it's far more pleasurable ever since I changed to this method of development.
Come back when you have millions of lines of code, written over decades by hundreds (or thousands) of full time developers.
One thing your tool appears to be missing (IMO) is execution sandboxing. This is useful, as you likely know, for avoiding undeclared dependencies and for avoiding dirty builds due to actions polluting the source directory, among other things. I was playing around with allowing configurable sandboxing, with symlink forest and docker as two initial options.
> A monorepo is a single repository containing multiple distinct projects, with well-defined relationships.
It would be better if there were terms that delineated "one repo for the company" from "one repo per project" from "many repos for a single project".
By default a monorepo will give you $current and nothing else.
A monorepo is not a bad idea, but you should think about either preventing a breaking change in some dependency from killing the build globally, or having some sort of artefact store that allows versioned libraries (both have problems; you'll need to work out which is better for you).
When the folks working on the monorepo really need to slam through a change in the now-independent monorepo, we can use git submodules.
You need to look at your development model as a whole and decide whether the happy path incentivises good or bad development practices.
Do you want to incentivise the creation of technical debt with a myriad of versioned dependencies, or do you want to incentivise designing code to be evolvable and reusable?
Worked at a FAANG with a monorepo, and everything was partially broken most of the time. It's trivial to bring in dependencies, which is great - super fast re-use.
The problem is, it's trivial to add dependencies. That means that bringing in a library to manage messages somehow also requires a large amount of CUDA code.
A basic Python programme would end up having something like >10k build items to go through each build.
I think dependency management should be manual - make it intentional - and yes slightly harder.
If you have a statically typed language (reflection-like mechanisms aside), you can make the compiler do the work of determining whether the right dependencies are there, and you can massively cut down the dependency trees.
i.e. there is a mismatch between the semantics of automatic dependency trees and what you actually need when you import.
So if you want to use library B from module A, I don't need the dependencies of B such that the whole of B compiles; I just need the dependencies of B that enable my very specific use of B.
So if you add module B to your project and run your compiler, it should tell you what further dependencies you need - rather than assuming you need to bring in C, and because you brought in C you also need D and E, etc.
Given you only add dependencies once, I don't think it's a big deal to force developers to spend 5 mins determining exactly what they need rather than importing the world.
If you adopt this discipline, you basically don't need a monorepo. Every team can have its own repo and depend on other stuff as third party. This adds some friction, but removes some other kinds of friction, and overall I think it's a better compromise.
I find your comment really interesting because having the capability to point to the HEAD (or realistically a commit SHA) is a feature I sometimes really enjoy about not using monorepos.
I think the one version rule is the most important part for a healthy monorepo.
Otherwise you are doomed because there are so many different versions of everything in use. Some day a zero-day issue will hit all your projects at the same time and you will need months to get each in-use version fixed.
This added friction means you will do something unsavory to rush out a fix for yourself instead of fixing it in the dependency, waiting for a release, bumping the version of your dependency, then finally making your own release.
All the downsides of svn aside, the partial checkout was great for a repo containing practically the entire K source tree.
I think what I'm getting at is that maybe the real missing feature isn't whatever it is that allows you to make stupidly large monorepos, but that maybe we should add Perforce's client workspace model as a git extension?
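git's closest existing answer is partial clone plus sparse checkout, which is not a full client-spec mapping but does keep checkout and status costs proportional to what you opt into (URL and paths are placeholders):

```
git clone --filter=blob:none --sparse https://git.example.com/big-monorepo.git
cd big-monorepo
git sparse-checkout set services/payments libs/common
```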
Perforce had a classic file-locking model where a central server was in charge of file locks: a file was read-only until it was unlocked, and the number of users that could unlock a file at the same time was often as low as 1.
So even if most Perforce operations were O(n^2) or worse, n was often only the number of unlocked files, not the number of files in the repo. git status checks the full worktree, so n = files in the (visible part of the) repo.
The "file is locked by another user" problem led to doing a lot of work outside Perforce itself. Often diff and patch tools and patch queues/changeset queue tools would proliferate around Perforce repos not provided by Perforce itself, but mini-VCSes built on top of Perforce. (Which is part of why Microsoft entirely forked Perforce early on. If you are already building a VCS toolkit on top of the VCS, might as well control that, too.)
A big point about git and its support for offline work, is that it works nothing like Perforce and you mostly don't want it to. A big benefit to git's model is that we mostly aren't using git as a low-level VCS toolkit and using a diaspora of other tools on top of git. (Ironically so, given git's original intent was to be the low-level VCS toolkit and early devs expected more "porcelain" tools to be built on top of it as third-party projects.)
If this sort of stuff happens to be something you might want to work on, our team has multiple openings... if you search for "bazel" on our careers page, you'll find them.
Hum... Does that phrase mean you don't use anything remotely similar to a monorepo?
[0]: https://ark-vcs.com