That may be. What's not accounted for there is the immense cost of running a dev org on those terms. It radically limits the pool of engineers you can hire (to those who understand this and are willing to work this way), and it radically slows deployment.
Cloudflare may well need to transition to this sort of engineering culture, but there is no doubt that they would not be in the position they are in today if they had started with it -- they would have been too slow to capture the market.
I think critiques that have actionable plans for real dev teams are likely to be more useful than what, to me, reads as a sort of complaint from an ivory tower. Culture matters, shipping speed matters, quality matters, team DNA matters. That's what makes this stuff hard (and interesting!)
In a database, you wouldn't solve this with a DISTINCT or a LIMIT, would you? You would make the schema guarantee uniqueness.
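To make that concrete, here's a generic SQL sketch (the table and column names are made up for illustration): uniqueness enforced in the schema versus a DISTINCT that only hides duplicates after they've already gotten in.

    -- Schema-level guarantee: duplicate (db, table, column) rows cannot exist.
    CREATE TABLE feature_columns (
        db_name     TEXT NOT NULL,
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        column_type TEXT NOT NULL,
        PRIMARY KEY (db_name, table_name, column_name)
    );

    -- Query-level band-aid: DISTINCT masks duplicates instead of preventing them.
    SELECT DISTINCT column_name, column_type
    FROM feature_columns
    WHERE table_name = 'my_features_table';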
And yes, that wouldn't deal with cross-database queries. But the solution here is just to filter by database name; the rest is table design.
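For anyone following along, a rough sketch of what that filter looks like in ClickHouse-style SQL against the system.columns metadata table (the feature-table and database names here are placeholders, not the actual query): without a database filter, the query returns matching rows from every database the account can read, so granting access to another database silently duplicates the results.

    -- Unscoped: pulls columns from every database the account can see,
    -- so a new permission grant can double the rows returned.
    SELECT name, type
    FROM system.columns
    WHERE table = 'my_features_table'
    ORDER BY name;

    -- Scoped: filtering by database name keeps the result stable under new grants.
    SELECT name, type
    FROM system.columns
    WHERE table = 'my_features_table'
      AND database = 'default'
    ORDER BY name;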
* The deployment should have followed the blue/green pattern, limiting the blast radius of a bad change to a subset of nodes.
* In general, a company this foundational to internet connectivity should not follow the "move fast, break things" approach. They did not have an overwhelming reason to hurry and take risks. This has burned a lot of trust, regardless of the nature of the actual bug.
mikece•16m ago
Have any of the post-mortems addressed whether any of the code that led to Cloudflare's outage was generated by AI?