When you propose a change to something that other things depend on, it makes sense to test those dependents for regressions; this is not earth-shattering.
If the change you want would break them, you have to do it differently. First, provide a new way of doing the thing. Then get all the dependents that use the old way to migrate to the new way. Once the dependents no longer rely on the old way, you can push out a change that removes it.
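A minimal Python sketch of the first two steps: add the new way, and keep the old way alive as a shim that warns callers to migrate. It assumes a library function being replaced; `parse_config` and `load_config` are hypothetical names, and the dict they return is a placeholder for real behaviour.

```python
import warnings


def load_config(path: str, *, strict: bool = True) -> dict:
    """New API (hypothetical): dependents should migrate to this."""
    # Placeholder body standing in for whatever the new behaviour is.
    return {"path": path, "strict": strict}


def parse_config(path: str) -> dict:
    """Old API (hypothetical), kept as a thin shim until dependents migrate."""
    warnings.warn(
        "parse_config() is deprecated; use load_config() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    # Delegate to the new code path so both entry points share one implementation.
    return load_config(path, strict=False)
```

Once reverse-dependency checks show that no dependent still calls `parse_config()`, a later release can delete the shim.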
I like the perspective presented in this article, and I think CRAN is taking an interesting approach. But this is nuts and bolts. Explicitly saying you're compatible with any future breaking changes!? You can't possibly know that!
I get that a lot of R programmers might be data scientists first and programmers second, so many of them probably don't know semver, but I feel like the language should guide them to a safe choice here. If CRAN is going to email you about reverse dependencies, maybe publishing a package with a crazy semver expression should also trigger an email.
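As a rough illustration of the kind of check a registry could run at publish time, here is a Python sketch that flags a dependency constraint with no upper bound. The constraint grammar is a simplified, hypothetical one loosely modelled on npm/Cargo-style ranges; it is not any real tool's parser.

```python
import re

# Hypothetical, simplified constraint grammar: comma-separated clauses such as
# ">=1.2.0", "<2.0.0", "==1.4.2", "^1.2", "~1.2", a bare "1.4.2", or "*".
# Real tools (npm, Cargo, pip, R's DESCRIPTION) each have their own syntax.
_CLAUSE = re.compile(r"^\s*(\*|\^|~|==|>=|<=|>|<)?\s*([0-9][0-9A-Za-z.+-]*)?\s*$")

# Operators that cap how far a future release can move.
_UPPER_BOUND_OPS = {"<", "<=", "==", "^", "~"}


def allows_any_future_major(constraint: str) -> bool:
    """Return True if nothing in `constraint` caps the allowed version."""
    capped = False
    for clause in constraint.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unrecognised clause: {clause!r}")
        op, version = match.groups()
        if op in _UPPER_BOUND_OPS:
            capped = True
        elif op is None and version is not None:
            # Assumption: a bare version means an exact pin.
            capped = True
    return not capped
```

A publish-time hook could warn when this returns True, e.g. for `">= 1.0.0"` or `"*"`, while letting `">=1.0.0, <2.0.0"` through.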
In a true monorepo (the one for the FreeBSD base system, say), if you make a PR that updates some low-level code, the expectation is that you 1. compile the tree and run all the tests (so far so good), 2. update the high-level code so the tests pass (hmm), and 3. include those updates in your PR. In a true centralized monorepo, a single atomic commit can make a vertical-slice change through a dependency and all of its transitive dependents.
I don’t know what the equivalent would be in distributed “meta-monorepo” development à la CRAN, but it’s not what they’re currently doing.
(One hypothetical approach I can imagine is that a major-version release of a dependency could ship with AST-rewriting code migrations. These would automatically push “dependency-computed” PRs to the dependents’ repos, while also applying those same patches as temporary forced overlays onto releases of the dependent packages until the related PRs get merged. So your dependents’ tests still have to pass before you can release your package, but you can iterate on your end until those tests do pass, and then trigger a simultaneous release of your package and your dependent packages. It’s then in your dependents’ court to modify and merge your PR to undo the forced overlay, asynchronously, as they wish.)
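The AST-rewriting part of this idea can be sketched in Python with the standard `ast` module. This is a minimal sketch, assuming the breaking change is a simple function rename (`fetch_rows` to `read_rows`, both hypothetical). `ast.unparse` (Python 3.9+) discards comments and original formatting, so a real migration tool would want a formatting-preserving library instead.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls old(...) -> new(...) and obj.old(...) -> obj.new(...)."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls in the arguments first
        func = node.func
        if isinstance(func, ast.Name) and func.id == self.old:
            node.func = ast.Name(id=self.new, ctx=ast.Load())
        elif isinstance(func, ast.Attribute) and func.attr == self.old:
            func.attr = self.new
        # Bare references that are not calls (e.g. passing the function as a
        # value) are deliberately left alone in this sketch.
        return node


def migrate_source(source: str, old: str, new: str) -> str:
    """Parse `source`, apply the rename, and emit the rewritten code."""
    tree = RenameCall(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For example, `migrate_source("db.fetch_rows(fetch_rows(1))", "fetch_rows", "read_rows")` produces `db.read_rows(read_rows(1))`, which is the kind of patch that could be pushed as a PR or applied as an overlay.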
Jane Street has something similar called a "tree smash" [1]. When someone makes a breaking change to their internal dialect of OCaml, they also push a commit updating the entire company monorepo.
It's not explicitly stated whether such migrations happen via AST rewrites, but one can imagine leveraging the existing compiler infrastructure to do that.
[1]: https://signalsandthreads.com/future-of-programming/#3535
Ideally, yes. However, such a monorepo can become increasingly complex as the software being maintained grows larger (and/or more people work on it).
You end up with massive changes, which might eventually become more than a single person can realistically hold in their head. Not to mention clashes: people will make contradictory or conflicting changes, and there will have to be some resolution mechanism outside the repo itself (or the "default" one, which is first come, first served).
Of course, you could "manage" this complexity by defining API boundaries/layers and deeming those APIs important enough not to change often. But that simply means you're a monorepo in name only, not much different from having separate repos with versioned artefacts and a defined API boundary.
Automated tests, compilation by the package publisher, and enforcement of portability flags and SemVer semantics.
> In the years since, my discomfort has given way to fascination. I’ve come to respect R’s bold choices, its clarity of focus, and the R community’s continued confidence to ‘do their own thing’.
I would love to see a follow-up article about the key insights that the author took away from diving more deeply into R.
> CRAN had also rerun the tests for all packages that depend on mine, even if they don’t belong to me!
Another way to frame this is that these are the customers of your package's API. If you break them, you are required to ship a fix.
I see why this isn't the default (e.g. on GitHub you have no idea how many people depend on you). But the developer experience is much nicer like this. Google, for example, makes this promise with some of their public tools.
Outside the world of professional software developers, R is used by many academics in statistics, economics, the social sciences, etc. This rule makes it less likely that their research breaks because of some obscure dependency they don't understand.
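To make the reverse-dependency idea concrete, here is a Python sketch of working out which packages would need their tests rerun when one package changes. It assumes the dependency graph is available as a plain dict and the package names are made up; whether a registry rechecks only direct dependents or the whole transitive set is a policy choice, so that is a flag here.

```python
from collections import deque


def reverse_dependencies(
    deps: dict[str, set[str]], target: str, transitive: bool = False
) -> set[str]:
    """Return the packages that depend on `target`.

    `deps` maps each package to the set of packages it depends on.
    """
    # Invert the graph: package -> packages that depend on it.
    rdeps: dict[str, set[str]] = {}
    for pkg, requires in deps.items():
        for req in requires:
            rdeps.setdefault(req, set()).add(pkg)

    if not transitive:
        return set(rdeps.get(target, set()))

    # Breadth-first walk over the inverted graph; `seen` guards against cycles.
    seen: set[str] = set()
    queue = deque(rdeps.get(target, set()))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(rdeps.get(pkg, set()) - seen)
    return seen
```

With `deps = {"mypkg": set(), "a": {"mypkg"}, "b": {"a"}, "c": {"mypkg", "b"}}`, the direct dependents of `"mypkg"` are `{"a", "c"}`, and the transitive set adds `"b"`.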
> But the migration had a steep cost: over 6 years later, there are thousands of projects still stuck on an older version.
This is a feature, not a bug. Pinning versions lets systems independently maintain their own dependency trees. This is how your Linux distribution actually remains stable (or used to, before the onslaught of "rolling release" distributions and the "automatically updating application" infecting product development culture, which constantly leaves me with non-functional mobile apps that I am forced to update once a week). You set the versions and nothing changes, so you can keep using the same software and it doesn't break. Until you choose to upgrade it and deal with all the breaking shit.
Every decision in life is a tradeoff. Do you go with no version numbers at all, always updating, always fixing things? Or do you always require version numbers, keeping things stable, but having difficulty updating because of a lack of compatible versions? Or do you find some middle ground? There are pros and cons to all these decisions. There is no one best way, only different ways.
I mean, just look at how many projects use “curl and bash” as their distribution method even though the project repositories they could use instead don’t require anything nearly as onerous as the reverse dependency checks described in this article. If the minimal requirements the current repos have are enough to push projects toward alternate distribution methods, I can’t imagine what would happen if these checks were added.
esafak•3h ago
That's the objective function of Hastie et al.'s GLM. I had a good chuckle when I realized the author's last name is Tibshirani. If you know, you know.