Why is everything so scalable?

https://www.stavros.io/posts/why-is-everything-so-scalable/

148•kunley•5d ago

Comments

radarsat1•3h ago

> scalability needs a whole bunch of complexity

I am not sure this is true. Complexity is a function of architecture. Scalability can be achieved by abstraction, it doesn't necessarily imply highly coupled architecture, in fact scalability benefits from decoupling as much as possible, which effectively reduces complexity.

If you have a simple job to do that fits in an AWS Lambda, why not deploy it that way, scalability is essentially free. But the real advantage is that by writing it as a Lambda you are forced to think of it in stateless terms. On the other hand if suddenly it needs to coordinate with 50 other Lambdas or services, then you have complexity -- usually scalability will suffer in this case, as things become more and more synchronous and interdependent.

> The monolith is composed of separate modules (modules which all run together in the same process).

It's of course great to have a modular architecture, but whether or not the modules run in the same process should be an implementation detail. Barriers should be explicit. By writing it all to depend on local, synchronous, same-process logic, you are likely building in all sorts of implicit barriers that will become hidden dangers when suddenly you do need to scale. And by the way, one of the reasons we think about scaling in advance is that when the need comes, it comes quickly.

It's not that you should scale early. But if you're designing a system architecture, I think it's better to think about scaling, not because you need it, but because doing so forces you to modularize, decouple, and make synchronization barriers explicit. If done correctly, this will lead to a better, more robust system even when it's small.

Just like premature optimization -- it's better not to get caught up doing it too early, but you still want to design your system so that you'll be able to do it later when needed, because that time will come, and the opportunity to start over is not going to come as easily as you might imagine.

CaptainOfCoit•3h ago

> It's of course great to have a modular architecture, but whether or not they run in the same process should be an implementation detail

It should be, but I think "microservices" somehow screwed up that. Many developers think "modular architecture == separate services communicating via HTTP/network that can be swapped", failing to realize you can do exactly what you're talking about. It doesn't really matter what the barrier is, as long as it's clear, and more often than not, network seems to be the default barrier when it doesn't have to be.

dapperdrake•3h ago

The complexity that makes money is all the essential complexity of the problem domain. The "complexity in the architecture" can only add to that (and often does).

This is the part that is about math as a language for patterns as well as research for finding counter-examples. It’s not an engineering problem yet.

Once you have product-market fit, then it becomes an engineering problem.

saidinesh5•3h ago

> If you have a simple job to do that fits in an AWS Lambda, why not deploy it that way, scalability is essentially free. But the real advantage is that by writing it as a Lambda you are forced to think of it in stateless terms.

What you are describing is already the example of premature optimization. The moment you are thinking of a job in terms of "fits in an AWS Lambda" you are automatically stuck with "Use S3 to store the results" and "use a queue to manage the jobs" decisions.

You don't even know if that job is the bottleneck that needs to scale. For all you know, writing a simple monolithic script to deploy onto a VM/server would be a lot simpler deployment. Just use the ram/filesystem as the cache. Write the results to the filesystem/database. When the time comes to scale you know exactly which parts of your monolith are the bottleneck that need to be split. For all you know - you can simply replicate your monolith, shard the inputs and the scaling is already done. Or just use the DB's replication functionality.
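A minimal sketch of the "replicate your monolith, shard the inputs" idea, assuming inputs carry a string key. `shard_for` and `partition` are hypothetical helper names, and SHA-256 is just one stable choice of hash (Python's built-in `hash()` is salted per process, so it would not give the same shard across replicas).

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards: int):
    """Split keys into num_shards lists, one per monolith replica."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

Each replica then processes only its own list. Changing `num_shards` moves most keys to a different shard, so this simple modulo scheme suits stateless jobs better than sharded storage.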

To put things into perspective, even a cheap raspberry pi/entry level cloud VM gives you thousands of postgres queries per second. Most startups I worked at NEVER hit that number. Yet their deployment stories started off with "let's use lambdas, s3, etc..". That's just added complexity. And a lot of bills - if it weren't for the "free cloud credits".

bpicolo•1h ago

> The moment you are thinking of a job in terms of "fits in an AWS Lambda" you are automatically stuck with "Use S3 to store the results" and "use a queue to manage the jobs" decisions.

I think the most important one you get is that inputs/outputs must always be < 6mb in size. It makes sense as a limitation for Lambda's scalability, but you will definitely dread it the moment a 6.1mb use case makes sense for your application.

hedora•31m ago

The counterargument to this point is also incredibly weak: It forces you to have clean interfaces to your functions, and to think about where the application state lives, and how it's passed around inside your application.

That's equivalent to paying attention in software engineering 101. If you can't get those things right on one machine, you're going to be in a world of hurt dealing with something like lambda.

CaptainOfCoit•3h ago

I've seen startups killed because of one or two "influential" programmers deciding they need to start architecturing the project for 1000TPS and 10K daily users, as "that's the proper way to build scalable software", while the project itself hasn't even found product-market fit yet and barely has users. Inevitably, the project needs to make a drastic change which now is so painful to do because it no longer fits the perfect vision the lead(s) had.

Cue programmers blaming the product team for "always changing their mind" as they discover what users actually need, and the product team blaming developers for being hesitant to do changes, and when programmers agree, it takes a long time to undo the perfect architecture they've spent weeks fine-tuning against some imaginary future user-base.

the8472•3h ago

1000TPS isn't that much? Engineer for low latency and with a 10ms budget that'd be 10 cores if it were CPU-bound, less in practice since usually part of the time is spent in IO wait.
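The arithmetic behind that estimate, as a small sketch. `cores_needed` is a hypothetical helper; it assumes the work is purely CPU-bound and spreads evenly across cores, which is the worst case described above.

```python
def cores_needed(tps: int, cpu_ms_per_request: int) -> int:
    """Cores required to sustain `tps` requests/s at `cpu_ms_per_request` of CPU each."""
    busy_ms_per_second = tps * cpu_ms_per_request
    # One core supplies 1000 ms of CPU time per second; round up.
    return (busy_ms_per_second + 999) // 1000
```

With 1000 requests per second and a 10 ms CPU budget, `cores_needed(1000, 10)` gives 10. Any time a request spends waiting on IO is not CPU time, so the real figure would be lower.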

drob518•2h ago

And with CPUs now being shipped with 100+ cores, you can brute force that sucker a long way.

CaptainOfCoit•2h ago

> 1000TPS isn't that much?

Why does that matter? My argument is: Engineer for what you know, leave the rest for when you know better, which isn't before you have lots of users.

the8472•2h ago

What I'm saying is that "building for 1000TPS" is not what gets you an overengineered 5-layer microservice architecture. If you build for a good user experience (which includes low latency) you get that not-that-big scale without sharding.

hedora•1h ago

I doubt much time would be in I/O wait if this was really a scale up architecture. Ignoring the 100's of GB of page cache, it should be sitting on NVMe drives, where a write is just a PCIe round trip, and a read is < 1ms.

otabdeveloper4•2h ago

> 1000TPS and 10K daily users

That is not a lot. You can host that on a Raspberry Pi.

pja•1h ago

Not if you’re going to be “web scale” (tm) you can’t.

hedora•1h ago

You can host it on 8 raspberry pi's: Three for etcd, three for minio/ceph, and two for Kubernetes workers.

(16 if you need geo replication.)

strken•2h ago

I've seen senior engineers get fired and the business suffer a setback because they didn't have any way to scale beyond a single low spec VPS from a budget provider, and their system crashed when a hall full of students tried to sign up together during a demo and each triggered 200ms of bcrypt CPU activity.

CaptainOfCoit•2h ago

Wonder which one happens more often? Personally I haven't worked in that kind of "find the person to blame" culture which would lead to something like that, so I haven't witnessed what you're talking about, but I believe you that it does happen in some places.

sgarland•2h ago

That’s a skill issue, not an indictment on the limitations of the architecture. You can spin up N servers and load-balance them, as TFA points out. If the server is a snowflake and has nothing in IaC, again, not an architectural issue, but a personnel / knowledge issue.

strken•48m ago

The architecture in TFA is fine, and sounds preferable to microservices for most use cases.

I am worried by the talk of 10k daily users and a peak of 1000TPS being too much premature optimisation. Those numbers are quite low. You should know your expected traffic patterns, add a margin of error, and stress test your system to make sure it can handle the traffic.

I disagree that self-inflicted architectural issues and personnel issues are different.

kunley•2h ago

I frankly don't believe that in a workplace where a userbase can be characterized as a "hall full of students" anyone was fired overnight. Doesn't happen at these places. Reprimanded, maybe.

hedora•1h ago

More frequently, anyone that sounded the alarm about this was let go months ago, so the one that'd be fired is the one in charge of the firing.

Instead, they celebrate "learning from running at scale" or some nonsense.

ipsento606•2h ago

> they didn't have any way to scale beyond a single low spec VPS from a budget provider

they couldn't redeploy to a high-spec VPS instead?

nasmorn•2h ago

This seems weird. I have a lot of experience with Rails, which is considered super slow. But the scenario you describe is trivial. Just get a bigger VPS and change a single env var. Even if you fucked up everything else, like file storage etc., you can still do that. If you build your whole application in a way where you can’t scale anything, you should be fired. That is not even that easy.

hedora•1h ago

People screw up the bcrypt thing all the time. Pick a single threaded server stack (and run on one core, because Kubernetes), then configure bcrypt so brute forcing 8 character passwords is slow on an A100. Configure kubernetes to run on a medium range CPU because you have no load. Finally, leave your cloud provider's HTTP proxy's timeout set to default.

The result is 100% of auth requests timeout once the login queue depth gets above a hundred or so. At that point, the users retry their login attempts, so you need to scale out fast. If you haven't tested scale out, then it's time to implement a bcrypt thread pool, or reimplement your application.
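Rough numbers for that failure mode, as a sketch. It assumes everyone arrives at once into a FIFO queue, uses the 200 ms per hash mentioned upthread, and picks a hypothetical 30-second proxy timeout (the real default depends on the provider). Both function names are made up for illustration.

```python
def completion_ms(queue_position: int, hash_ms: int, workers: int = 1) -> int:
    """Time until the request at 1-based `queue_position` finishes hashing."""
    rounds = -(-queue_position // workers)  # ceiling division
    return rounds * hash_ms


def max_queue_depth(timeout_ms: int, hash_ms: int, workers: int = 1) -> int:
    """Deepest queue position that still completes within the timeout."""
    return (timeout_ms // hash_ms) * workers
```

On one core, `max_queue_depth(30_000, 200)` is 150: the 151st login in line times out, retries, and joins the back of the queue. With eight workers the same settings allow 1200, which is the case for a bcrypt thread pool or simply more cores.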

But at least the architecture I described "scales".

strken•1h ago

Of course you should be fired for doing that! I meant the example as an illustration of how "you don't need to scale" thinking turns into A-grade bullshit.

You do, in fact, need to scale to trivial numbers of users. You may even need to scale to a small number of users in the near future.

g8oz•53m ago

I'm not seeing how your example proves that a beefy server/cloud free architecture cannot handle the workload that most companies will encounter. The example you give of an under specified VPS is not what is being discussed in the article.

strken•45m ago

I was responding to CaptainOfCoit, who was writing about premature optimisation killing companies. The article's proposed architecture seems fine and is similar to things I've done, but it's not an excuse to completely avoid thinking about future traffic patterns.

stavros•1h ago

> 1000TPS and 10K daily users

I absolutely agree with your point, but I want to point out, like other commenters here, that the numbers should be much larger. We think that, because 10k daily users is a big deal for a product, they're also a big deal for a small server, but they really aren't.

It's fantastic that our servers nowadays can easily handle multiple tens of thousands of daily users on $100/mo.

systems•1h ago

Clearly this project failed for either

  1. scaling for a very specific use case, or because
  2. it hasn't even found product-market fit

Blaming the failure on designing for scale seems misplaced; you can scale while remaining agile and open to change.

smoe•1h ago

In my opinion, if those influential programmers actually architected around some concrete metrics like 1,000 TPS and 10K daily users, they would end up with much simpler systems.

The problem I see is much more about extremely vague notions of scalability, trends, best practices, clean code, and so on. For example we need Kafka, because Kafka is for the big boys like us. Not because the alternatives couldn’t handle the actual numbers.

CV-driven development is a much bigger issue than people picking overly ambitious target numbers.

acron0•3h ago

Ugh, there is just something so satisfying about developer cynicism. It gives me that warm, fuzzy feeling.

I basically agree with most of what the author is saying here, and I think that my feeling is that most developers are at least aware that they should resist technical self-pleasure in pursuit of making sure the business/product they're attached to is actually performing. Are there really people out there who still reach for Meta-scale by default? Who start with microservices?

lpapez•3h ago

> Are there really people out there who still reach for Meta-scale by default? Who start with microservices?

Anecdotally, the last three greenfield projects I was a part of, the Architects (distinct people in every case) began the project along the lines of "let us define the microservices to handle our domains".

Every one of those projects failed, in my opinion not primarily owing to bad technical decisions - but they surely didn't help either by making things harder to pivot, extend and change.

Clean Code ruined a generation of engineers IMO.

robertlagrant•2h ago

I think this sounds more like Domain Driven Design than Clean Code.

ahoka•1h ago

It kinda started with Clean Code. I remember some old colleagues walking around with the book in their hand and deleting ten year old comments in every commit they made: "You see, we don't need that anymore, because the code describes itself". It made a generation (generations?) of software developers think that all the architectural patterns were found now, we can finally do real engineering and just have to find the one that fits for the problem at hand! Everyone asked the SOLID principles during interviews, because that's how real engineers design! I think "cargo cult" was getting used at that time too to describe this phenomenon.

sarchertech•1h ago

It was (is) bad. The worst part is that the majority of people pushing it haven’t even read Clean Code. They’ve read a blog post by a guy who read a blog post by a guy who skimmed the book.

jwr•3h ago

I don't get this scalability craze either. Computers are stupid fast these days and unless you are doing something silly, it's difficult to run into CPU speed limitations.

I've been running a SaaS for 10 years now. Initially on a single server, after a couple of years moved to a distributed database (RethinkDB) and a 3-server setup, not for "scalability" but to get redundancy and prevent data loss. Haven't felt a need for more servers yet. No microservices, no Kubernetes, no AWS, just plain bare-metal servers managed through ansible.

I guess things look different if you're using somebody else's money.

floating-io•3h ago

For how many users, and at what transaction rate?

Not disagreeing that you can do a lot on a lot less than in the old days, but your story would be much more impactful with that information. :)

drob518•2h ago

One of the silliest things you can do to cripple your performance is build something that is artificially over distributed, injecting lots of network delays between components, all of which have to be transited to fulfill a single user request. Monoliths are fast. Yes, sometimes you absolutely have to break something into a standalone service, but that’s rare.

hedora•42m ago

I've noticed a strong correlation between artificially over-distributing, and not understanding things like the CAP theorem. So, you end up with a slow system that's added a bunch of unsolvable distributed systems problems on its fast path.

(Most distributed systems problems are solvable, but only if the person that architected the system knows what they're doing. If they know what they're doing, they won't over-distribute stuff.)

crazygringo•2h ago

Scalability isn't just about CPU.

It's just as much about storage and IO and memory and bandwidth.

Different types of sites have completely different resource profiles.

sreekanth850•2h ago

Microservices are not a solution for scalability. There are multiple options for building scalable software; even a monolith or a modular monolith with a proper load-balanced setup avoids much of the complexity of microservices and still reaches massive scale. The only bottleneck will be the DB.

hedora•36m ago

Microservices take an organizational problem:

The teams don't talk, and always blame each other

and add distributed systems and additional organizational problems:

Each team implements one half of dozens of bespoke network protocols, but they still don't talk, and still always blame each other. Also, now they have access to weaponizable uptime and latency metrics, because each team "owns" the server half of one network endpoint, but not the client half.

ben_w•2h ago

> unless you are doing something silly, it's difficult to run into CPU speed limitations.

Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.

That said, anyone know what's up with the slow deletion of Safari history? Clearly O(n), but as shown in this blog post still only deleted at a rate of 22 items in 10 seconds: https://benwheatley.github.io/blog/2025/06/19-15.56.44.html

phkahler•2h ago

>> Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.

On a non-scalable system you're going to notice that big-O problem and correct it quickly. On a scalable system you're not going to notice it until you get your AWS bill.

hedora•51m ago

Also, instead of having a small team of people to fight scalable infrastructure configuration, you could put 1-2 full time engineers on performance engineering. They'd find big-O and constant factor problems way before they mattered in production.

Of course, those people's weekly status reports would always be "we spent all week tracking down a dumb mistake, wrote one line of code and solved a scaling problem we'd hit at 100x our current scale".

That's equivalent to waving a "fire me" flag at the bean counters and any borderline engineering managers.

_ZeD_•3h ago

Honestly, in my experience, the only good reason to have microservices in a "software solution" is to be able to match 1 service -> 1 maintainer/team and have a big (read "nested", with multiple levels of middle managers) group of teams, each of which may have different goals. In this way it's very easy to "map" a manager/team to a "place" in the solution map, with very explicit and documented interactions between them.

nycdotnet•3h ago

This is Conway’s Law: You ship your org chart.

hsn915•3h ago

Nice story, but in the places I've seen that make use of services, there's never a "1 service -> 1 team" mapping. It's more like 20 services distributed among 3 teams, and some services are "shared" by all teams.

smokel•3h ago

ThoughtWorks gathers this phenomenon under the term "envy": Web Scale envy [1] or Big Data envy [2] are two relevant blips on their technology radar. It is typically better to keep things simple.

[1] https://www.thoughtworks.com/radar/techniques/high-performan...

[2] https://www.thoughtworks.com/radar/techniques/big-data-envy

abujazar•3h ago

I've seen my share of insanely over-engineered Azure locked-in applications that could easily have been run on an open source stack on a $20 VM.

hedora•29m ago

But what if payroll grows to 100M internal users?

yobbo•3h ago

Many startup business models have no chance of becoming profitable unless they reach a certain scale, but they might have less than 1% probability of reaching that scale. Making it scalable is easy work since it is deterministic, but growing customers is not.

Another perspective is that the defacto purpose of startups (and projects at random companies) may actually be work experience and rehearsal for the day the founders and friends get to interview at an actual FAANG.

I think the author's “dress for the job you want, not the job you have” nails it.

nicoburns•3h ago

I guess the work is deterministic, but it often (unintentionally) makes the systems being developed non-deterministic!

potatolicious•2h ago

Ah yes. I once worked at a startup that insisted on Mongo despite not having anywhere near the data volume for it to make any sense at all. Like, we're talking 5 orders of magnitude off of what one would reasonably expect to need a Mongo deployment.

I was but a baby engineer then, and the leads would not countenance anything as pedestrian as MySQL/Postgres.

Anyway, fast forward a bit and we were tasked with building an in-house messaging service. And at that point Mongo's eventual consistency became a roaring problem. Users would get notifications that they had a new message, and then when they tried to read it it was... well... not yet consistent.

We ended up implementing all kinds of ugly UX hacks to work around this, but really we could've run the entire thing off of sqlite on a single box and users would've been able to read messages instantaneously, so...

nicoburns•1h ago

I've seen similar with Firebase. Luckily I took over as tech lead at this company, so I was able to migrate us to Postgres. Amusingly, as well as being more reliable, the Postgres version (on a single small database instance) was also much faster than the previous Firebase-based version (due to it enabling JOINs in the database rather than in application code).
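A small sketch of the difference, using SQLite from the standard library as a stand-in (the original system was Firebase and then Postgres; the `authors`/`posts` schema and both function names are invented for illustration). The application-side version issues one query per author, while the database-side version issues a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")


def join_in_app(conn):
    """Join in application code: one extra query per author (the N+1 pattern)."""
    rows = []
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        posts = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        rows.extend((name, title) for (title,) in posts)
    return rows


def join_in_db(conn):
    """Let the database do the join in a single query."""
    return conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
```

Both return the same rows. Against a remote database, every query in the first version also pays a network round trip, which is where most of the slowdown tends to come from.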

potatolicious•1h ago

Funnily enough prior to this startup I had worked at a rainforest-themed big tech co where we ran all kinds of stuff on MySQL without issue, at scales that dwarfed what this startup was up to by 3-4 orders of magnitude.

I feel like that's kind of the other arm of this whole argument: on the one hand, you ain't gonna need that "scalable" thing. On the other hand, the "unscalable" thing scales waaaaaay higher than you are led to believe.

A single primary instance with a few read-only mirrors gets you a reaaaaaaally long way before you have to seriously think about doing something else.

stavros•1h ago

Unfortunately, you can't really get experience from solving hypothetical problems. The actual problems you'll encounter are different, and while you can get experience in a particular "scalable" stack, it won't be worth its maintenance cost for a company that doesn't need it.

ahartmetz•1h ago

>“dress for the job you want, not the job you have”

I don't think I should dress down any further :>

hsn915•3h ago

I think it was around 2015 when everything was basically AWS and Kubernetes

The turning point might have been Heroku? Prior to Heroku, I think people just assumed you deploy to a VPS. Heroku taught people to stop thinking about the production environment so much.

I think people were so inspired by it and wanted to mimic it for other languages. It got more people curious about AWS.

Ironically, while the point of Heroku was to make deployment easy and done with a single command, the modern deployment story on cloud infrastructure is so complicated most teams need to hold a one hour meeting with several developers "hands on deck" and going through a very manual process.

So it might seem counter intuitive to suggest that the trend was started by Heroku, because the result is the exact opposite of the inspiration.

esher•3h ago

I can relate - running a small hosting business. People come up with too complex solutions. They solve problems that they'd wish to have. For instance: HA setups are complex. If not done correctly, like in most cases, people don't gain the additional '9' from the SLA.

llm_nerd•3h ago

This piece is written with a pretty cliche dismissive tone that assumes that everything everyone else does is driven by cargo-culting if not outright ignorance. That people make these choices because they're just rushing to chase the latest trend.

They're just trying to be cool, you see.

Here's the thing, though: Almost every choice that leads to scalability also leads to reliability. These two patterns are effectively interchangeable. Having your infra costs be "$100 per month" (a claim that usually comes with a massive disclaimer, as an aside) but then falling over for a day because your DB server crashed is a really, really bad place to be.

blueflow•3h ago

> Here's the thing, though: Almost every choice that leads to scalability also leads to reliability.

How is that supposed to happen. Without k8 involved somehow?

97nomad•2h ago

There are a lot of tools that don't need k8s to be scalable and reliable, starting from stateless services and simple load balancers and ending with actor systems like in Erlang or Akka.

crazygringo•2h ago

> Almost every choice that leads to scalability also leads to reliability.

Empirically, that does not seem to be the case. Large scalable systems also go offline for hours at a time. There are so many more potential points of failure due to the complexity.

And even with a single regular server, it's very easy to keep a live replica backup of the database and point to that if the main one goes down. Which is a common practice. That's not scaling, just redundancy.

llm_nerd•2h ago

>Empirically, that does not seem to be the case.

Failures are astonishingly, vanishingly rare. Like it's amazing at this point how reliable almost every system is. There are a tiny number of failures at enormous scale operations (almost always due to network misconfigurations, FWIW), but in the grand scheme of things we've architected an outrageously reliable set of platforms.

>That's not scaling, just redundancy.

In practice it almost always is scaling. No one wants to pay for a whole n server just to apply shipped logs to. I mean, the whole premise of this article is that you should get the most out of your spend, so in that case much better is two hot servers. And once you have two hot...why not four, distributed across data centers. And so on.

crazygringo•2h ago

> Failures are astonishingly, vanishingly rare

You and I must be using different sites and different clouds.

There's a reason isitdownrightnow.com exists. And why HN'ers are always complaining about service status pages being hosted on the same services.

By your logic, AWS and Azure should fail once in a millennium, yet they regularly bring down large chunks of the internet.

Literally last week: https://cyberpress.org/microsoft-azure-faces-global-outage-i...

llm_nerd•2h ago

>By your logic

Ah, I didn't realize it was you. If HN had a block function, I would 100% just block your argumentative nonsense.

crazygringo•1h ago

> When disagreeing, please reply to the argument instead of calling names.

> Please don't sneer, including at the rest of the community.

https://news.ycombinator.com/newsguidelines.html

okaleniuk•2h ago

Yes, reliability comes from the same ground the scalability does, and yes people are mostly chasing the latest trend. One does not contradict the other.

llm_nerd•2h ago

>yes people are mostly chasing the latest trend

https://www.youtube.com/watch?v=b2F-DItXtZs

15 years ago people were making the same "chasing trends" complaints. In that case there absolutely were people cargo culting, but it's odd to still be whining about this a decade and a half later, when it's quite literally just absolutely basic best practices.

sgarland•2h ago

A distributed monolith - which is what nearly all places claiming to run microservices actually have - has N^m uptime.
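One way to read "N^m", sketched below under two assumptions: every request must pass through all services in the path, and the services fail independently. `chained_availability` is a hypothetical name, not a standard formula from any library.

```python
def chained_availability(per_service: float, services_in_path: int) -> float:
    """Availability of a request path that needs every service to be up."""
    return per_service ** services_in_path
```

For example, ten services at 99.9% each give roughly 99.0% for the whole path, so ten "three nines" services chained together behave like a single "two nines" system.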

Even if you do truly have a microservices architecture, you’ve also now introduced a great deal of complexity, and unless you have some extremely competent infra / SRE folk on staff, that’s going to bite you. I have seen this over and over and over again.

People make these choices because they don’t understand computing fundamentals, let alone distributed systems, but the Medium blogs and ChatGPT have assured them that they do.

TickleSteve•3h ago

scale vertically before horizontally...

- scaling vertically is cheaper to develop

- scaling horizontally gets you further.

What is correct for your situation depends on your human, financial and time resources.

lambdaone•3h ago

I had a client with a system just like this. EBS, S3, RDS, Cognito, the lot. It cost $00s per month under almost no load, and was a maintenance nightmare - which was the real problem, not the cost, as it stopped working altogether eventually. A bit of hacking later, it all fits on a single VM that costs ~$10/month to run and is far easier to build, deploy and maintain.

drob518•2h ago

> The first problem every startup solves is scalability. The first problem every startup should solve is “how do we have enough money to not go bust in two months”, but that’s a hard problem, whereas scalability is trivially solvable by reading a few engineering blogs, and anyway it’s not like anyone will ever call you out on it, since you’ll go bust in two months.

I laughed. I cried. Having a back full of microservices scars, I can attest that everything said here is true. Just build an effin monolith and get it done.

DrScientist•2h ago

Isn't it as simple as the following?

Break your code into modules/components that have a defined interface between them. That interface only passes data - not code with behaviour - and signals that the method calls may fail to complete (ie throw exceptions).

ie the interface could be a network call in the future.

Allow easy swapping of interface implementations by passing them into constructors/ using factories or dependency injection frameworks if you must.

That's it - you can then start with everything in-process and the rapid development that allows, but if you need to you can add splitting into networked microservices - any complexity that arises from the network aspect is hidden behind the proxy, with the ultimate escape hatch of the exception.

Have I missed something?

crazygringo•2h ago

You're not missing much, but I don't understand why you're just basically repeating what the article already says. Except the article also says to use a monorepo.

DrScientist•2h ago

I think I've added a couple of elements to make it possible to scale your auth service if you need to. Easily swappable implementations and making sure the interfaces advertise that calls may simply fail.

Even so it's still very simple.

To scale your auth service you just write a proxy to a remote implementation and pass that in - any load balancing etc is hidden behind that same interface and none of the rest of the code cares.
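A minimal sketch of that pattern. Every name here (`AuthService`, `InProcessAuth`, `RemoteAuthProxy`, `LoginHandler`, `AuthError`) is hypothetical, the transport is an injected callable standing in for a real network client, and the plaintext password comparison is only there to keep the example short.

```python
from typing import Callable, Protocol


class AuthError(Exception):
    """Raised when an auth call cannot complete, e.g. the network is down."""


class AuthService(Protocol):
    def check_password(self, username: str, password: str) -> bool: ...


class InProcessAuth:
    """Same-process implementation; a real one would compare password hashes."""

    def __init__(self, users: dict[str, str]) -> None:
        self._users = users

    def check_password(self, username: str, password: str) -> bool:
        return self._users.get(username) == password


class RemoteAuthProxy:
    """Same interface, but forwards plain data over an injected transport."""

    def __init__(self, send: Callable[[dict], dict]) -> None:
        self._send = send

    def check_password(self, username: str, password: str) -> bool:
        request = {"username": username, "password": password}
        try:
            reply = self._send(request)
        except OSError as exc:  # network failures surface as AuthError
            raise AuthError("auth service unreachable") from exc
        return bool(reply["ok"])


class LoginHandler:
    """Caller code: depends only on the interface passed to its constructor."""

    def __init__(self, auth: AuthService) -> None:
        self._auth = auth

    def login(self, username: str, password: str) -> str:
        try:
            ok = self._auth.check_password(username, password)
        except AuthError:
            return "try again later"
        return "welcome" if ok else "denied"
```

`LoginHandler(InProcessAuth(users))` and `LoginHandler(RemoteAuthProxy(send))` behave the same from the caller's side; load balancing or retries would live inside the proxy.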

crazygringo•2h ago

Good point! Sorry if I was being ungenerous.

I like the idea of the remote implementation being proxied -- not sure I've come across that pattern before.

stavros•1h ago

No, I'm saying you don't need to use a monorepo! The repo discussion is a bit orthogonal, and up to you to decide whether you want a single repo or multiple repos with modules/libraries that get deployed together.

williamdclt•2h ago

You're not missing something, but you're assuming that it's easy to know ahead of time where the module boundaries should be and what the interfaces should look like. This is very far from easy, if possible at all (eg google "abstraction boundaries are optimization boundaries").

Also, most of these interfaces you'll likely never need. It's a cost of initial development, and the indirection is a cost on maintainability of your code. It's probably (although not certainly) cheaper to refactor to introduce interfaces as needed, rather than always anticipate a need that might never come.

BirAdam•2h ago

Just to be honest for a bit here... we also should be asking what kind of scale?

Quite a while ago, before containers were a thing at all, I did systems for some very large porn companies. They were doing streaming video at scale before most, and the only other people working on video at that scale were YouTube.

The general setup for the largest players in that space was haproxy in front of nginx in front of several PHP servers in front of a MySQL database that had one primary r/w with one read only replica. Storage (at that time) was usually done with glusterfs. This was scalable enough at the time for hundreds of thousands of concurrent users, though the video quality was quite a bit lower than what people expect today.

Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.

aeyes•2h ago

The architecture you describe is ok because in the end it is a fairly simple website. Little user interaction, limited amount of content (at most a few million records), few content changes per day. The most complex part is probably to have some kind of search engine but even with 10 million videos an ElasticSearch index is probably no larger than 1GB.

The only problem is that there is a lot of video data.

ben_w•2h ago

This is probably also true for 98% of startups.

I think most people don't realise that "10 million" records is small, for a computer.

(That said, I have had to deal with code that included an O(n^2) de-duplication where the test data had n ~= 20,000, causing app startup to take 20 minutes; the other developer insisted there was no possible way to speed this up, later that day I found the problem, asked the CTO if there was a business reason for that de-duplication, removed the de-duplication, and the following morning's stand-up was "you know that 20 minute startup you said couldn't possibly be sped up? Yeah, well, I sped it up and now it takes 200ms")

phkahler•2h ago

I thought you were going to say you reduced O(n^2) to O(n*log(n)), but you just deleted the operation. Normally I'd say that's great, but just how much duplicate data is being left around now? Is that OK?

ben_w•2h ago

Each element was about, oh I can't remember exactly, perhaps 50 bytes? It wasn't a constant value, there could in theory be a string in there, but those needed to be added manually and when you have 20,000 of them, nobody would.

Also, it was overwhelmingly likely that none of the elements were duplicates in the first place, and the few exceptions were probably exactly one duplicate.

hedora•1h ago

I'm kind of surprised no one just searched for "deduplication algorithm". If it was absolutely necessary to get this 1MB dataset to be smaller (when was this? Did it need to fit in L2 on a pentium 3 something?), then it could probably have been deduped + loaded in 300-400ms.

Most engineers that I've worked with that die on a premature optimization molehill like you describe also make that molehill as complicated as possible. Replacing the inside of the nested loop with a hashtable probe certainly fits the stereotype.
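For reference, a minimal Python sketch of the two approaches being discussed. It is not the code from the anecdote, which is unknown; the function names are made up. It assumes the elements are hashable and that order of first appearance should be kept.

```python
def dedupe_quadratic(items):
    """O(n^2): each membership test scans the output list."""
    out = []
    for item in items:
        if item not in out:  # linear scan of a list
            out.append(item)
    return out


def dedupe_hashed(items):
    """Expected O(n): the inner scan is replaced by a hash-set probe."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:  # average constant-time lookup
            seen.add(item)
            out.append(item)
    return out
```

Both return the same result; only the cost of the membership test differs.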

ben_w•1h ago

> I'm kind of surprised no one just searched for "deduplication algorithm".

Fair.

To set the scene a bit: the other developer at this point was arrogant, not at all up to date with even the developments of his preferred language, did not listen to or take advice from anyone.

I think a full quarter of my time there was just fire-fighting yet another weird thing he'd done.

> If it was absolutely necessary to get this 1MB dataset to be smaller

It was not, which is why my conversation with the CTO to check on if it was still needed was approximately one or two sentences from each of us. It's possible this might have been important on a previous pivot of the thing, at least one platform shift before I got there, but not when I got to it.

gf000•2h ago

As opposed to what problem?

Like I can honestly have trouble listing too many business problems/areas that would fail to scale with their expected user count, given reasonable hardware and technical competence.

Like YouTube and Facebook are absolute outliers. Famously, stackoverflow used to run on a single beefy machine (and the reason they changed their architecture was not due to scaling issues), and "your" startup ain't needing more scale than SO.

bobdvb•1h ago

In streaming your website is typically totally divorced from your media serving. Media serving is just a question of cloud storage and pointing at an hls/dash manifest in that object store. Once it starts playing the website itself does almost nothing. Live streaming adds more complexity but it's still not much of a website problem.

Maintaining the media lifecycle, receiving, transcoding, making it available and removing it, is the big task but that's not real-time, it's batch/event processing at best efforts.

The biggest challenges with streaming are maintaining the content catalogue, which isn't just a few million records but rich metadata about the lifecycle and content relationships. Then user management and payments also tend to have a significant overhead, especially when you're talking about international payment processing.

BirAdam•30m ago

This was before HTML5 and before the browser magically handled a lot of this… so there was definitely a bit more to it. Every company also wanted to have statistics of where people scrub to and all of that. It wasn’t super simple, but yeah, it also wasn’t crazy complex. The point is, scale is achievable without complex inf.

sgarland•2h ago

THANK YOU. People look at me like I’m insane when I tell them that their overly-complicated pipeline could be easily handled by a couple of beefy servers. Or at best, they’ll argue that “this way, they don’t have to manage infrastructure.” Except you do - you absolutely do. It’s just been partially abstracted away, and some parts like OS maintenance are handled (not that that was ever the difficult part of managing servers), but you absolutely need to configure and monitor your specific XaaS you’re renting.

huflungdung•1h ago

What I say is that we massively underestimate just how fast computers are these days

ahartmetz•1h ago

Indeed - they are incredibly fast, it's just buried under layers upon layers of stuff

skydhash•1h ago

I look at what I can do with an old Mac mini (2011) and it’s quite good. I think the only issue with hardware is technical maintenance, but at the scale of a small company, that would probably be having a support contract with Dell and co.

hrimfaxi•1h ago

Depending on your regulatory environment, it can be cost-effective to not have to maintain your own data center with 24/7 security response, environmental monitoring, fire suppression systems, etc. (of course, the majority of businesses are probably not interested in things like SOC 2)

wongarsu•1h ago

This argument comes up a lot, but it feels a bit silly to me. If you want a beefy server you start out with renting one. $150/month will give you a server with a 24-core Xeon and 256GB of RAM, in a data center with everything you mentioned plus a 24/7 hands-on technician you can book. Preferably rent two servers, because reliability. Once you outgrow renting servers you start renting rack space in a certified data center with all the same amenities. Once you outgrow that you start renting entire racks, then rows of racks or small rooms inside the DC. Then you start renting portions of the DC. Once you have outgrown that you have to seriously worry about maintaining your own data center. But at that point you have so much scale that this will be the least of your worries.

BirAdam•37m ago

This is handled by colo.

BobbyTables2•1h ago

Have always felt the same.

I’ve seen an entire company proudly proclaim a modern multicore Xeon with 32GB RAM can do basic monitoring tasks that should have been possible with little more than an Arduino.

Except the 32GB Xeon was far too slow for their implementation...

gaoshan•1h ago

Anyone that says "they don’t have to manage infrastructure", I would invite them to deal with a multi-environment Terraform setup and tell me again what they don't have to manage.

macNchz•56m ago

Working on various teams operating on infrastructure that ranged from a rack in the back of the office, a few beefy servers in a colo, a fleet of Chef-managed VMs, GKE, ECS, and various PaaSes, what I've liked the most about the cloud and containerized workflows is that they wind up being a forcing function for reproducibility, at least to a degree.

While it's absolutely 100% possible to have a "big beefy server architecture" that's reasonably portable, reproducible, and documented, it takes discipline and policy to avoid the "there's a small issue preventing {something important}, I can fix it over SSH with this one-liner and totally document it/add it to the config management tooling later once we've finished with {something else important}" pattern, and once people have been doing that for a while it's a total nightmare to unwind down the line.

Sometimes I want to smash my face into my monitor the 37th time I push an update to some CI code and wait 5 minutes for it to error out, wishing I could just make that band-aid fix, but at the end of the day I can't forget to write down what I did, since it's in my Dockerfile or deploy.yaml or entrypoint.sh or Terraform or whatever.

CableNinja•2h ago

I thought I knew about scaled deployments before I started working where I do now. After starting here, I realized I had no idea what an environment of huuuuge scale actually was. I'd been part of multi-site deployments and scaled infra, but it was basically potatoes comparatively. We have a team whose platform we, on IT, call the DoS'er of the company. It's responsible for processing hundreds of thousands of test runs a day, and data is fed to a plethora of services after. The scale is so large that they are able to take down critical services, or deeply impact them, purely due to throughput, if a developer goes too far (like, say, uploading a million small logs to an S3 bucket every minute).

We have also been contacted by AWS, asking us what the hell we are doing for a specific set of operations. We do a huge prep for some operations, and the prep feeds massive amounts of data through some AWS services, so much so that they thought we were under attack or had been compromised. Nope, just doin data ingestion!

ahoka•2h ago

Are those over engineered systems even actually scalable? I know teams who designed a CQRS architecture using message queues and a distributed NoSQL database and fail to sustain 10 req/s for a read in something that is basically a CRUD application. Heck, once someone literally said "But we use Kafka, why aren't we fast?!".

arealaccount•2h ago

Exactly this, every time I see Kafka or similar it's a web of 10M microprocesses that take more time in invocation alone than if you just ran the program in one go.

_kb•1h ago

How very kafkaesque.

sgarland•1h ago

I watched in amusement as the architecture team at $JOB eagerly did a PoC of a distributed RDBMS, only to eventually conclude that the latency was too high. Gee… if only someone had told you that would happen when you mentioned the idea. Oh wait.

dig1•1h ago

> The general setup for the largest players in that space was haproxy in front of nginx in front of several PHP servers in front of a MySQL database that had one primary r/w with one read only replica.

You'd be surprised that the most stable setups today are run this way. The problem is that this way it's hard to attract investors; they'll assume you are running on old or outdated tech. Everything should be serverless, agentic and, at least on paper, hyperscalable, because that sells further.

> Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.

That is actually the goal of hyperscalers: they are charging you a premium for far inferior results. Also, the article stated a very cold truth: "every engineer wants a fashionable CV that will help her get the next job" and you definitely won't get a job if you say: "I moved everything from AWS and put it behind haproxy on one bare-metal box for a $100/mo infra bill".

bootsmann•15m ago

> The problem is that this way it's hard to attract investors; they'll assume you are running on old or outdated tech. Everything should be serverless, agentic and, at least on paper, hyperscalable, because that sells further.

Investors don't give a shit about your stack

donatj•1h ago

Exactly this! The educational product I work on is used by hundreds of thousands of students a day, and the secret to our success is how simple our architecture is. PHP monoliths + Cache (Redis/Memcached) scale super wide basically for free. We don't really think about scalability, it just happens.

I have a friend whose startup had a super complicated architecture that was falling apart at 20 requests per second. I used to be his boss a lifetime ago and he brought me in for a meeting with his team to talk about it. I was just there flabbergasted at "Why is any of this so complicated?!" It was hundreds of microservices, many of them black boxes they'd paid for but had no access to the source. Your app is essentially an async chat app, a fancy forum. It could have been a simple CRUD app.

I basically told my friend I couldn't help, if I can't get to the source of the problematic nodes. They'll need to talk to the vendor. I explained that I'd probably rewrite it from the ground up. They ran out of runway and shut down. He's an AI influencer now...

gampleman•2h ago

Hilariously written but also too true.

At one startup I worked at, we had 2 Kubernetes clusters and a rat's nest of microservices for an internal tool that, had we actually been successful at delivering sufficient value, would have been used by at most 100 employees (and those would be unlikely to be concurrent). And this was an extremely highly valued company at the time.

Another place I worked at we were paying for 2 dev ops engineers (and those guys don't come cheap) to maintain our deployment cluster for 3 apps which each had a single customer (with a handful of users). This whole operation had like 20 people and an engineering team of 8.

andoando•2h ago

We have the same shit and it's super annoying too, cause in addition I can't do shit without going through the dev ops team even though we're 5 engineers.

radiator•2h ago

This sounds just about right: I have read that Kubernetes is the Greek term for "more containers than customers".

Thiez•2h ago

What were these dev ops engineers doing all day? Surely you can only polish a cluster so much before it's done and there is nothing left to do?

gampleman•2h ago

You should have seen the architecture they came up with... it had ALL the bells and whistles you could possibly imagine and cost an absolute fortune.

Of course they eventually got bored and quit. And then it became really annoying since no one else understood anything about it.

tesdinger•2h ago

> you usually only have one database

What if I use the cloud? I don't even know how many servers my database runs on. Nor do I care. It's liberating not having to think about it at all.

sgarland•2h ago

And your cloud providers thank you for giving their executives a third yacht.

JohnMakin•2h ago

If you’ve ever been in a situation where you do suddenly face scale and have to rip apart a legacy monolith that was built without scale in mind, you’ll chuckle at this article. It’s extremely painful.

sgarland•2h ago

Legitimately asking, how? The only bottleneck should be the DB, and if you can saturate a 128-core DB, I want to see your queries and working set size. Not saying it can’t happen, but it’s rare that someone has actually maxed out MySQL or Postgres without there being some serious schema and query flaws, or just poor / absent tuning.

JohnMakin•1h ago

You’re thinking purely in terms of app performance. Have you ever seen a terrible DB schema? Having to suddenly iterate fast with a brittle codebase that doesn’t really allow it is something I’ve seen bring teams to their knees for a year+.

I’ve seen monoliths where, because of their sheer size and how much crap and debt is packed into them, build and deploy processes take several hours, if not an entire day, for some fix that could be CI/CD’d in seconds if it wasn’t such a ball of mud. Then, what tends to happen is that the infrastructure around it compensates heavily for it, which turns into its own ball of mud. Nothing wrong with properly scaled monoliths, but it’s a bit naive, in my personal experience, to just scoff at scale when your business succeeding relies on scale at some point. Don’t prematurely optimize, but don’t be oblivious to future scenarios, because they can happen quicker than you think.

ben_w•2h ago

Some related blog posts of mine with some thematic overlap:

• https://benwheatley.github.io/blog/2025/02/26-14.04.07.html

• https://benwheatley.github.io/blog/2024/04/07-21.31.19.html

pdhborges•2h ago

Scale articles are too focused on architecture. What about the business problems that come with scale? At a certain scale, rare events are common, many cases cease to be fixable by some random process that involves humans, and you have to handle a lot more business scenarios with your code.
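To put a number on "rare events are common", here is a small back-of-the-envelope sketch in Python. The one-in-a-million probability and the request volumes are made-up figures for illustration, and events are assumed to be independent.

```python
def expected_occurrences(probability: float, events: int) -> float:
    """Expected number of times a rare case shows up among `events` trials."""
    return probability * events


def chance_of_at_least_one(probability: float, events: int) -> float:
    """Probability that the rare case happens at least once (independent trials)."""
    return 1.0 - (1.0 - probability) ** events


# Hypothetical: a one-in-a-million edge case.
RARE = 1e-6
small_site = 1_000        # requests per day
large_site = 10_000_000   # requests per day

print(chance_of_at_least_one(RARE, small_site))  # tiny: basically never seen
print(expected_occurrences(RARE, large_site))    # about ten times a day
```

At the smaller volume a human can handle the odd case by hand when it eventually appears; at the larger one it is a daily workload that needs a code path.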

reactordev•2h ago

I read this, and have the opposite experience. Your monolith will fester as developers step on each other’s toes. You aren’t solving for scalability, you’re solving for sovereignty. Giving other teams the ability to develop their own service without needing to conform to your archaic grey beard architecture restrictions and your lack of understanding what a pod is or how to get your logs from your cloud.

No, this whole article reads like someone who is crying that they no longer have their AS/400. Bye. The reason people use AWS and all those 3rd parties is so they don’t have to reinvent the wheel, which this author seems hell-bent on doing.

Why are we using TCP when a Unix file is fine… why are we using databases when a directory and files is fine? Why are we scaling when we aren’t Google when my single machine can serve a webpage? Why am I getting paid to be an engineer while eschewing all the things that we have advanced over the last two decades?

Yeah, these are not the right questions. The real question should be: “Now that we have scale what are we gonna do with it?”

sgarland•2h ago

> Giving other teams the ability to develop their own service without needing to conform to your archaic grey beard architecture restrictions

IME at many different SaaS companies, the only one that had serious reliability was the one that had “archaic grey beard architecture restrictions.” Devs want to use New Shiny X? Put a formal request before the architectural review committee; they’ll read it, then explain how what the team wants already exists in a different form.

I don’t know why so many developers - notably, not system design experts, nor having any background in infrastructure - think that they know better than the gray beards. They’ve seen some shit.

> and your lack of understanding what a pod is or how to get your logs from your cloud.

No one said the gray beards don’t know this. At the aforementioned company, we ran hybrid on-prem and AWS, and our product was hybrid K8s and traditional Linux services.

Re: cloud logs, every time I’ve needed logs, it has consistently been faster for me to ssh onto the instance (assuming it wasn’t ephemeral) and use ripgrep. If I don’t know where the logs were emitted from, I’ll find that first, then ssh. The only LaaS I’ve used that was worth a damn was Sumologic, but I have no idea how they are now, as that was years ago.

fragmede•1h ago

Splunk was (and is) the gold standard for centralized logging. The problem with it now is mainly that it's crazy expensive, though the operational engineering burden in order to run it well is non-zero and has to be accounted for. But being able to basically grep across all logs on the whole fleet, and then easily being able to visualize those results, made me never want to go back to having to ssh somewhere and run grep manually. I could write a script to ssh to all the app servers, grab the past 15 minutes of requests, extract their IPs, and plot them on a map to see which countries are hot, but that would be annoying enough that I'd really have to want to do that.

Meanwhile if you have Splunk, you specify the logfile name and how to extract the IP and then append "| iplocation clientip | geostats count by Country" to see which countries requests are coming from, for example. Or append "| stats count by http_version" and then click pie chart and get a visualization that breaks down how much traffic is still on HTTP 1.0 or 1.1, who's on 2, and who's moved to QUIC/3.
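For comparison, the hand-rolled version of the "count by http_version" query might look something like the Python sketch below. It covers only the parsing and counting step, not the ssh fan-out or geolocation, and it assumes access logs in the common/combined log format; the regex, function name and sample lines are illustrative, not taken from any real system.

```python
import re
from collections import Counter

# Assumed line shape: <ip> ... "<METHOD> <path> HTTP/<version>" ...
REQUEST_RE = re.compile(
    r'^(?P<ip>\S+) .*?"[A-Z]+ \S+ HTTP/(?P<version>[\d.]+)"'
)


def summarize(lines):
    """Count requests per HTTP version and per client IP, skipping unparseable lines."""
    by_version = Counter()
    by_ip = Counter()
    for line in lines:
        match = REQUEST_RE.match(line)
        if match is None:
            continue
        by_version[match["version"]] += 1
        by_ip[match["ip"]] += 1
    return by_version, by_ip


# Made-up sample lines using documentation IP ranges.
SAMPLE = [
    '203.0.113.5 - - [14/Oct/2025:10:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [14/Oct/2025:10:00:02 +0000] "GET /a HTTP/2.0" 200 128',
    '198.51.100.7 - - [14/Oct/2025:10:00:03 +0000] "POST /b HTTP/1.1" 201 64',
    "not a log line",
]

versions, ips = summarize(SAMPLE)
```

This is the easy part; collecting the lines from every server and charting the result is where the annoyance described above comes in.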

sarchertech•2h ago

>step on each others toes

Which leads us to a huge problem I’ve seen over the past few decades.

Too many developers for the task at hand. It’s easier for large companies to hire 100 developers with a lower bar that may or may not be a great fit than it is to hire 5 experts.

Then you have 100 developers that you need to keep busy, and not all of them can be busy 100% of the time because most people aren’t good at making their own impactful work. Then instead of trying to actually find naturally separate projects for some of them to do, you attempt to artificially break up your existing project in a way that 100 developers can work on together (and enforce those boundaries through a network).

This artificial separation fixes some issues (merge conflicts, some deployment issues), but it causes others (everything is a distributed system now, multi stage and multi system deployments required for the smallest changes, massive infrastructure, added network latency everywhere).

That’s not to say that some problems aren’t really so big that you need a huge number of devs, but the vast majority aren’t.

> they don’t have to reinvent the wheel

Everything is a trade off, but we shouldn’t discount the cost of using generic solutions in place of bespoke ones.

Generic solutions are never going to be as good of a fit as something designed to do exactly what you need. Sometimes the tradeoff is worth it. Sometimes it isn’t. Like when you need to horizontally scale just to handle the overhead. Or when you have to maintain a fork of a complex system that does way more than you need.

It’s the same problem as hiring 100 generic devs instead of 5 experts. Sometimes worth it. Sometimes not.

There’s another issue here too. If not enough people are reinventing the wheel we get stuck in local optima.

The worst part is that not enough people spend enough time even thinking about these issues to make informed decisions regarding the tradeoffs they are making.

Havoc•2h ago

Comes down to knowing when to stop. You don’t really want to DIY your own orchestrator etc. So better off just using kubernetes. But then not going too far down that rabbit hole.

ie yes kubernetes but the simplest vanilla version of it you can manage

sgarland•2h ago

I wouldn’t start with K8s, and I’ve administered it at multiple companies. Unless every one of your initial hires is a SWE turned SRE, you’re in for a bad time (and you don’t need it).

I’d personally start with Linux services on some VMs, but Docker Compose is also valid. There are plenty of wrappers around Compose to add features if you’d like.

sreekanth850•2h ago

Frontend folks weren’t happy with how simple things were, so after seeing microservices, they invented microfrontends, and balance in complexity restored.

mjr00•2h ago

> The first problem every startup should solve is “how do we have enough money to not go bust in two months”, but that’s a hard problem, whereas scalability is trivially solvable by reading a few engineering blogs [...] Do you know what the difference between Google and your startup is? It’s definitely not scalability, you’ve solved that problem. It’s that Google has billions upon billions with which to pay for that scalability, which is really good because scalability is expensive.

Too true. Now that I've stepped into an "engineering leadership" role and spend as much time looking at finances as I do at code, I've formed the opinion that in 99.999% of cases, engineering problems are really business problems. If you could throw infinite time and money at the technical challenges, they'd no longer be challenging. But businesses, especially startups, don't have infinite (or even "some") money and time, so the challenge is doing the best engineering work you can, given time and budget constraints.

> The downsides [of the monolith approach]

I like the article's suggestion of using explicitly defined API boundaries between modules, and that's a good approach for a monolith. However one massive downside that cannot be ignored -- by having a single monolith you now have an implicit dependency on the same runtime working on all parts of your code. What I mean by this is, all your code is going to share the same Python version and same libraries (particularly true in Python, where it's not a common/well-supported use case to have multiple versions of library dependencies). This means that if you're working on Module A, and you realize you need a new feature from Pandas 2.x, but the rest of the code is on Pandas 1.x... well, you can't upgrade unless you go and fix Modules B, C, D ... Z to work with Pandas 2.

This won't be an issue at the start, but it's worth pointing out. Being forced to upgrade a core library or language runtime and finding out it's a multi-month disruptive project can be brutal.

okaleniuk•2h ago

Most of the startups actually plan to survive for more than 2 months. And it makes total sense to think about scalability, reliability, and performance while it's still possible to change your whole stack every other week. Not forgetting about other things such as securing your cash flow, growing your talent pool, protecting your IP, etc. Finding a good balance between multiple foci is exactly the job for a founder. Of course, it's a hard job, that's why we don't see many successful startups to begin with.

zokier•2h ago

In a lot of contexts scaling down is far more important than scaling up. In that sense scalability is cost-optimization; instead of provisioning fixed capacity that is enough for (predicted) peak loads, you can scale based on actual demand and save money or have higher utilization.
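A toy Python calculation of that trade-off, with entirely hypothetical numbers: an hourly demand curve for one day, and instances that each handle 100 requests per second. It ignores scale-up lag, minimum billing periods and pricing differences between instance types.

```python
import math


def instance_hours_fixed(demand, capacity_per_instance):
    """Provision for the peak hour and keep that fleet running all day."""
    peak_instances = math.ceil(max(demand) / capacity_per_instance)
    return peak_instances * len(demand)


def instance_hours_scaled(demand, capacity_per_instance, min_instances=1):
    """Each hour, run only as many instances as that hour's demand needs."""
    return sum(
        max(min_instances, math.ceil(d / capacity_per_instance)) for d in demand
    )


# Hypothetical requests per second for each of 24 hours: quiet night, busy midday.
DEMAND = [50] * 8 + [400] * 4 + [900] * 2 + [400] * 6 + [50] * 4
CAPACITY = 100  # assumed requests per second one instance can serve

fixed = instance_hours_fixed(DEMAND, CAPACITY)
scaled = instance_hours_scaled(DEMAND, CAPACITY)
```

With these made-up figures the fixed fleet uses 216 instance-hours and the demand-following one uses 70, so the gap comes entirely from the hours when the peak-sized fleet sits mostly idle.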

tobyhinloopen•2h ago

It isn't hard to make something a bit scalable, but it is very hard to make it scalable _later_.

pigcat•2h ago

My friend is the first dev hire at a startup where they prematurely overengineered for scalability. The technical founders had recently exited a previous startup and their rationale was that it makes a future acquisition easier, since a potential acquirer will weigh scalability in their evaluation of the code (and maybe even conflate it with quality). In fact it was a regret from their first startup that they hadn't baked in scalability earlier. I remain skeptical of the decision, but curious if there's any truth to the fact that acquirers weigh scalability in their scorecard?

tptacek•2h ago

Related: Crawshaw's "one process programming":

https://crawshaw.io/blog/one-process-programming-notes

alpine01•1h ago

There's a now famous Harvard lecture video on YouTube of Zuckerberg earlier in the Facebook days, where he walks through the issues they hit early on.

https://www.youtube.com/watch?v=xFFs9UgOAlE

I watched it ages ago, but I seem to remember one thing that I liked was that each time they changed the architecture, it was to solve a problem they had, or were beginning to have. They seemed to be staying away from pre-optimization and instead took the approach of tackling problems as they appeared, rather than imagining problems long before/if they occurred.

It's a bit like the "perfect is the enemy of done" concept - you could spend 2-3x the time making it much more scalable, but that might have an opportunity cost which weakens you somewhere else or makes it harder/more expensive to maintain and support.

Take it with a pinch of salt, but I thought it seemed like quite a good level-headed approach to choosing how to spend time/money early on, when there's a lot of financial/time constraints.

charlimangy•1h ago

Working in modern architectures that can scale is pretty important for developers that want to have attractive resumes. Given that your startup has a 9 out of 10 chance of failing you're going to need another job. If you want people to stay you have to give them the security of keeping up with at least some of the latest fashions.

gamerDude•1h ago

When I'm working with new developers I always have to convince them to simplify their setup. Why are we on autoscaled, pay-by-the-query infra when we are serving a few people? Then they complain how expensive it is. I had someone tell me that their costs were $1500/mon when they were in demo stages. I asked them why they aren't hosting on a single small server for $20. And they responded that it didn't matter because they were using free credits.

Except that those free credits will go away and you'll find yourself not wanting to do all the work to move it over when it would've been easier to do so when you just had that first monolith server up.

I think free credits and hyped up technology are to blame. So, basically a gamed onboarding process that gets people to over-engineer and spend more.

ReptileMan•1h ago

That is helluva verbose way to quote Knuth ...

treve•1h ago

A bit of an alternative take on this: I talk to a lot of folks at small start-ups (in Toronto, if that matters), and it seems like most people actually get this right and understand not to bring in complexity until later. Things like microservices seem to be mostly understood as a tool that's not really meant to solve a real scalability problem and is a massive liability early on.

The exceptions are usually just inexperienced people at the helm. My feeling is, hire someone with adequate experience and this is likely not an issue.

I do think architecture astronauts tend to talk a lot more about their houses of cards, which makes it seem like these set ups are more popular than they are.

jcarrano•57m ago

I think part of the problem is (some) programmers being unable to draw clear encapsulation boundaries when writing a monolith. I'm not even referring to imposing a discipline for a whole team, but the ability to design a clean internal API and stick to it oneself.

sagyam•30m ago

I have read and watched these articles and videos where people seem to have a problem with microservices, Kubernetes, cloud providers, or anything that's not a PHP server sitting behind an nginx running on a $5 VPS. I have also seen the front-end analogue of these types of posts, where anything that is not written using HTML, CSS, and jQuery is unnecessary bloat. I will soon write a blog post, which I think will cover more points and nuances of both sides. For now, here are some of my scattered thoughts.

- If deploying your MVP to EKS is overengineering, then signing a year-long lease for bare metal is hubris. Both think one day they will need it, but only one of them can undo that decision.

- Don't compare your JBOD to a multi-region replicated, CDN-enabled object store that can shrug off a DDoS attack. One protects you from those egress fees, and the other protects you from a disaster. They are not comparable.

- A year from now, the startup you work for may not exist. Being able to write that you have experience with that trendy technology on your resume sure sounds nice. Given the layoffs we are seeing right now, putting our interest above the company's may be a good idea.

- Yes, everyone knows modern CPUs are very fast, and paying $300/mo for an 8-core machine feels like a ripoff. But unless you are in the business of renting GPUs and selling tokens, compute was never your cost center; it was always humans. For some companies, not being able to meet your SLA due to talent attrition is scarier than the cloud bill.

I know these are one-sided arguments, and I said I would cover both sides with more nuance. I need some time to think through all the arguments, especially on the frontend side. I will soon write a blog post.

rvitorper•6m ago

I thought capitalism was about adding value, not conflict of interest

vbezhenar•24m ago

Everything is scalable, because it became very easy to write scalable software. I guess that's the reason.

