I've seen very simple services get bogged down in needing to be "scalable" so they're built so they can be spun up or torn down easily. Then a load balancer is needed. Then an orchestration layer is needed so let's add Kubernetes. Then a shared state cache is needed so let's deploy Redis. Then we need some sort of networking layer so let's add a VPC. That's hard to configure though so let's infra-as-code it with terraform. Then wow that's a lot of infrastructure so let's hire an SRE team.
Now nobody is incentivized to remove said infrastructure because now jobs rely on it existing so it's ossified in the organization.
And that's how you end up with a simple web server that suddenly exploded into costing millions a year.
When I looked into having this static page hosted on internal infra, it would have also needed minimum two dedicated oncalls, terraform, LB, containerization, security reviews, SLAs, etc.
I gave up after the second planning meeting and put it on my $5 VPS with a letsencrypt cert. That static page is still running today, having outlived not only the production line, but also the entire company.
In my experience there are two kinds of infrastructure or platform teams:
1) The friendly team trying to help everyone get things done with reasonable tradeoffs appropriate for the situation
2) The team who thinks their job is to make it as hard as possible for anyone to launch anything unless it satisfies their 50-item checklist of requirements and survives months of planning meetings where they try to flex their knowledge on your team by picking the project apart.
In my career it’s been either one or the other. I know it’s a spectrum and there must be a lot of room in the middle, yet it’s always been one extreme or the other for me.
Basically, is the ops team there to support the developer/development team, or the product?
In the case of the development team, the ops team will tend to be willing to provide advice and suggestions but be flexible on tooling and implementation. In the case of the product, the ops team will tend to be a lot more rigid and inflexible.
This plays out in things like:
When the PWA becomes critical for the production line and then "is not working" at 3AM, who is getting paged? If it's the developer, then ops is "supporting the developer". If it's the ops team getting called to debug and fix some project they've never laid eyes on before at 3AM, then it's the product. They are, naturally, going to start caring a lot more about how it is set up, deployed, and supported because nobody likes getting woken for work at 3AM.
When some project's dependencies start running past EOL, who is going to update it? If it's the developer, then ops is "supporting the developer". If the ops team isn't empowered to give a deadline and have _someone else_ responsible for keeping the project functioning, then they're supporting the product, and by letting it be deployed they have effectively committed to maintaining it in perpetuity. At that point they're going to start caring a lot more about what sort of languages, frameworks, etc. are used and specifically how projects are set up, because context switching to one of dozens of different projects at 3AM is hard enough as-is without also having to learn some new framework du jour.
(And before anyone says "well the updates probably aren't necessary, this is just ops being a pain"--think of the case of a project relying on a GCP product that's being shut down or some Kubernetes resource that's been changed. In one case inaction will cause the project to fail, in the other ops' action will cause it to fail. See the first point as to who is going to get called about that. Even in the happy case, consistency brings automation and allows the team to support a _class_ of deployments instead of individual products.)
I don't think places exist stably in the middle ground because it's a painful place to be for very long. The responsibility and the control land on separate people, and the person with the responsibility but without the control is generally going to work to wrestle control to reduce misery. In the case where ops acts as if they're supporting the developers but is in practice supporting the product, it's not going to take too many 3AM calls before they start pushing back on how the product's deployed and supported.
I've been both of those ops guys you describe. When I was the "checklists, meetings, and picking the project apart" guy it had nothing to do with me wanting to make anyone's life difficult or flexing my knowledge. It had to do with the 3AM calls waking myself, my wife, and my newborn up. If I was taking on responsibility for keeping your _product_ functional through its useful life, yeah, I wasn't going to let people dump stuff on my plate unless I had some reasonable basis to believe it wasn't going to substantially increase my workload and result in more middle of the night calls. The checklists were my way of trying to provide consistency and visibility into the process of reducing my own pain, not my way of trying to create pain for others.
I currently have VPSes running on both lowend and big cloud providers that have been running for years with no downtime except when they restart for updates.
This sounds a little like saying "all of North America except the U.S."
I don't think people are worried about random breakdowns on a single VPS, but scheduled updates are still downtime, and downtime causes revenue loss regardless of why it happened.
Any time a service is important enough I ask for two servers and a load balancer specifically to handle deployments and upgrade windows transparently. But! I agree services are usually less important than people think.
Ok, that explains this and the above comment. The last time I had to restart anything to apply an OS update was when I moved to a new RHEL LTS version, the lifespan of which is about 10 years. And there are many ways to do similar GNU/Linux upgrades without a restart at all.
Does Windows Server really need to restart for updates like normal Windows? If so, that's hilariously crap and I'm glad I've never had to touch it.
Edit: not saying a single VPS is fine if it's GNU/Linux, just remarking on the "restart to update" thing they mentioned
GP might have meant "upgrade: Windows(tm)", or he might have meant "windows of time which we have allocated to upgrading the server", and on my first reading I interpreted the second without a single shred of thought towards the possibility of the first.
Yes, you can restart all the services instead, but that is probably only slightly less downtime than a full reboot on most VPSes these days.
The article seems to be saying, instead of using CGI which spawns a process per request, to have a single Web server binary in Go/whatever. Which is totally reasonable and per my understanding what everyone already does nowadays (are any greenfield projects still using CGI?)
CGI is a "clever 'Unixy' hack" to add dynamism to early web servers. It stopped being "relevant" a long time ago IMO.
In fact, I think your diatribe actually contradicts the article.
Basically, the article is saying that they went with the "simple" CGI approach which ended up creating more complexity than using the slightly more complex dedicated binary. The author essentially followed your advice which ended up causing more complexity and hacks.
The moral of the story is, you need to use the right tool for the job, and know when to switch. Sometimes that is the simple path, sometimes that is not.
- Nginx https://blog.nginx.org/blog/rate-limiting-nginx
- Caddy https://github.com/mholt/caddy-ratelimit
- Traefik https://doc.traefik.io/traefik/middlewares/http/ratelimit/
IMO Lambda is kind of an unfair example because the author doesn't mention having multiple instances. Plus a hot take I have is you should not be building an entire web-app as a Lambda or series of Lambda functions... AWS does not have solutions for load balancing in things like APIG so you would have to architect that via DynamoDB or ElastiCache which is the "extra layer or two of overhead" the author mentioned.
If a web browser is in a glorified chromebook like a 2025 Macbook Air, indeed there's a lot of breathing room. A lot of ram. Processing power. Cores. It's nice. I get that.
And then you can do off-line first: meaning use the cached local storage available to WASM apps.
Then whatever needs to go to the mother ship, then call web apis in the cloud.
That would, in theory, basically be giving power back from the "net pc theory of things" to the "fat client"--if you ask the grey-haired nerds among you. And you would gain something.
But outside of a glorified chromebook like a 2025 Macbook Air--we have to remember that we are working with all kinds of web devices--everything from crap phones to satellite servers with terabytes of ram--so the scalability story as we have it isn't entirely wrong.
I have been to U of Toronto, very smart people. But honestly this is a troll piece. Doesn't go into any depth and one-sided. Unhelpful. I think U of Toronto's reputation would be better served by something more sophisticated than this asinine blog entry.
There is a cost to the network synchronisation, so you definitely want to scale vertically until you really must scale horizontally.
Also, why are people submitting every single post from this blog recently? Does this person actually do any work at UToronto, or is he just paid to write? There are -8000- links to various pages under this domain. I hope it's just a collective pseudonym like Nicolas Bourbaki and one person didn't write 8000 pages.
I'm desperate to use some of the insights from a navel-gazing university computing center in my infrastructure: IPv6 NAT (huh? what? What?!), custom config management driven by pathological NIH (I know precisely zilch about anything at utcc but I can already say with 100% confidence that your environment isn't special enough to do that), 'run more fibers', 'keep a list of important infrastructure contacts in case of outages', 'i just can't switch away from rc shell', and that's just in the last six months. On second thought, I'll just avoid all links to here in the future to save my sanity.
And your surface-level scans indicate a lot of specialized deep-thinking about some specific tools. Sure, but you'll also find some good generalizations that arose from the depth and breadth of experience. He knows Linux like the back of his hand, and he's been using Debian and Ubuntu, and Fedora, so perhaps we can derive some takeaways from those? And thoughts on ZFS and anti-spam email hosting, those are good too.
cks is the guy who influenced me to run Byron's rc shell, and also to install and run MH as my mail reader in 1993, and he also singlehandedly convinced me to install Ubuntu in 2006, which I maintained through multiple computers and upgrades through 2020. I cannot say that many, if any, of his blog posts were directly helpful to me, except for his opinions on PC hardware such as PCIe, parity RAM, and the like. But his wisdom is truly inspiring, as is his ability to stay with one employer for 100% of his career, doing more or less the same sysadmin things as he did in 1995.
The GIL is also on the way out: Python 3.13 already shipped the first builds of "free threading" Python and 3.14 and onwards will continue to make progress on that front: https://docs.python.org/3/howto/free-threading-python.html
And honestly, a Python web app running on a single core is still likely good for hundreds or even thousands of requests a second. The vast majority of web apps get a fraction of that.
On the other hand, some of them do have performance problems and are a nightmare to migrate to a different solution; it's hard to reason about which parts of the application turn out to be stateful.
Today I would not recommend single threading since it won't be able to use multiple cores.
5 minutes sounds reasonable
Even more so, I could introspect the entire application state, including giving myself a shell and modifying state from within the application. Keeping a blocklist inside a simple array- no problem! And being able to run a shell inside the same process meant I could inspect and even modify the array while the application ran.
That made it incredibly pleasant to use and run.
On the flip side, upgrades can be very challenging.
In a modern web application it's standard practice to run (at least) two instances of the application at once and use the load balancer to test both, or to drain jobs from one to the other. This is relatively easy if the applications are stateless.
Once the application holds all the state in memory, there's a real challenge. That array that seemed so clever- you'll need to serialize it so it can be reloaded at initialization time. Keeping all the session identifiers in memory- be ready to dump that.
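To make the serialization step concrete, here is a minimal sketch of dumping in-memory state on shutdown and reloading it at initialization. The `AppState` class, its `blocklist` and `sessions` fields, and the JSON snapshot file are all assumptions for illustration; a real application would have its own state and format.

```python
import json
import os
import tempfile
from pathlib import Path


class AppState:
    """Hypothetical in-memory state that would be lost on a restart."""

    def __init__(self):
        self.blocklist = []  # e.g. blocked client addresses
        self.sessions = {}   # session id -> user name

    def dump(self, path):
        """Write a snapshot via a temp file and rename, so a crash
        mid-write cannot leave a half-written snapshot behind."""
        path = Path(path)
        payload = {"blocklist": self.blocklist, "sessions": self.sessions}
        fd, tmp = tempfile.mkstemp(dir=path.parent)
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)

    @classmethod
    def load(cls, path):
        """Rebuild state at startup; start empty if no snapshot exists."""
        state = cls()
        path = Path(path)
        if path.exists():
            data = json.loads(path.read_text())
            state.blocklist = data["blocklist"]
            state.sessions = data["sessions"]
        return state
```

This only covers stop-then-start upgrades. It does nothing for the harder case of two instances running at once, since each would hold its own copy.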
Worse, if the application is not designed to share this state with another application, you're now in some trouble. This is fine if you're running a small site with a few users who can accept some downtime, but if you're running a serious service, you'd like to have some kind of upgrade path other than shutting the application down and starting it back up again.
"I'll just share state with other application servers," you might think, "and then use something like ZeroMQ to transmit state." But once you think about sharing state between application servers, you realize you'd probably be better off using a tool like Redis, and you're right back where we started.
Rust. Axum. Single compiled binary. Even html/js/css is baked into it via RustEmbed. Sqlite + litestream to S3. Cloudflare in front.
Works extremely well.
Once the compute and robustness demands mandated we should scale the thing across multiple machines, both horizontally and vertically, I would replace most DI services with proxies that wrapped the original functionality and called out to a remote host.
So for this scaled up version, I would only need to change how the DI container got started, which could either be in 'Manager' mode (meaning a high level functionality, with the submodules injected as proxies calling out to services on client machines), 'Worker' mode (meaning it served the proxy requests on said worker machines) or 'Standalone' mode (meaning the DI container actually injected the actual fat versions of the services, that allowed the whole thing to run in a single process, very useful for local testing and debugging).
There was only a single executable which could be run in multiple 'modes' selected via a command line switch, which made versioning and deployment trivial.
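A rough sketch of that shape in Python, not the original implementation: one interface, a "fat" local service, a proxy with the same interface, and a mode switch deciding which gets injected. The `Thumbnailer` service, the JSON message format, and the `send` transport callable are all hypothetical; a real system would put a network call behind `send`.

```python
import argparse
import json
from typing import Callable, Optional, Protocol, Sequence


class Thumbnailer(Protocol):
    def render(self, name: str) -> str: ...


class LocalThumbnailer:
    """The 'fat' service that does the real work in-process."""

    def render(self, name: str) -> str:
        return f"thumb:{name}"


class ThumbnailerProxy:
    """Same interface, but forwards each call through a transport."""

    def __init__(self, send: Callable[[str], str]):
        self._send = send

    def render(self, name: str) -> str:
        request = json.dumps({"method": "render", "name": name})
        return json.loads(self._send(request))["result"]


def handle_request(service: Thumbnailer, raw: str) -> str:
    """Worker side: decode a proxied call and run it on the real service."""
    msg = json.loads(raw)
    return json.dumps({"result": service.render(msg["name"])})


def build_service(mode: str,
                  send: Optional[Callable[[str], str]] = None) -> Thumbnailer:
    """Pick what gets injected; callers only ever see a Thumbnailer."""
    if mode in ("standalone", "worker"):
        return LocalThumbnailer()
    if mode == "manager":
        if send is None:
            raise ValueError("manager mode needs a transport")
        return ThumbnailerProxy(send)
    raise ValueError(f"unknown mode: {mode}")


def parse_mode(argv: Sequence[str]) -> str:
    """Read the mode from a command line switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", default="standalone",
                        choices=["standalone", "manager", "worker"])
    return parser.parse_args(argv).mode
```

For local testing, the manager's transport can be a plain function that calls `handle_request` on a worker-mode service in the same process, which exercises the proxy path without any network.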
If you like the sound of this, check out Elixir/Phoenix
Deebster•9mo ago
Or did I miss the sarcasm?
tryauuum•9mo ago
I actually recently planned to write a single threaded API to free myself from thinking about race conditions
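As a small illustration of why a single-threaded design sidesteps many races, here is a sketch using Python's `asyncio`. The `hits` counter and `record_hit` handler are made up for the example. The guarantee holds only as long as an update does not span an `await`.

```python
import asyncio

hits = {}  # shared state, only ever touched from the event loop thread


async def record_hit(path):
    # There is no await between the read and the write, so no other
    # handler can run in between and no lock is needed.
    hits[path] = hits.get(path, 0) + 1
    await asyncio.sleep(0)  # yield to other handlers, as real I/O would


async def main():
    hits.clear()
    # 1000 concurrent "requests" against the same counter.
    await asyncio.gather(*(record_hit("/index") for _ in range(1000)))
    return hits["/index"]
```

If the read and the write were separated by an `await`, other handlers could interleave and updates could be lost, so the single thread removes data races but not every logical race.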