They didn't have this kind of compute back when the article was written, which is the point of the article.
Should have prefixed my comment with "nowadays"
We're slowly getting back to similarly-sized systems. IBM now has POWER systems with more than 1,500 threads (although I assume those are SMT8 configurations). This is a bit annoying because too many programs assume that the CPU mask fits into 128 bytes, which limits the CPU (hardware thread) count to 1,024. We fixed a few of these bugs twenty years ago, but as these systems fell out of use, similar problems are back.
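For anyone hitting that 1,024 ceiling: glibc has had a dynamic CPU-set API for a while that avoids the fixed-size cpu_set_t entirely. A minimal sketch, assuming Linux/glibc, with 2048 as an arbitrary headroom value:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int ncpus = 2048;                    /* arbitrary headroom past 1024 */
        cpu_set_t *set = CPU_ALLOC(ncpus);
        size_t size = CPU_ALLOC_SIZE(ncpus); /* mask size in bytes, not a fixed 128 */

        CPU_ZERO_S(size, set);
        if (sched_getaffinity(0, size, set) != 0) { /* 0 = calling thread */
            perror("sched_getaffinity");
            CPU_FREE(set);
            return 1;
        }
        printf("runnable on %d hardware threads\n", CPU_COUNT_S(size, set));
        CPU_FREE(set);
        return 0;
    }

The bugs come from code that uses plain cpu_set_t and CPU_SET() instead of the _S variants, which silently caps the mask at 1,024 hardware threads.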
This is equal to the combined single precision GPU and CPU horsepower of a modern MacBook [1]. Really makes you think about how resource-intensive even the simplest of modern software is...
2003 might seem like ancient history, but computers back then absolutely could handle 10,000 concurrent connections.
There's probably going to be some overhead, but it seems like you could do 1M, if you have the bandwidth.
[1]: https://www.amd.com/content/dam/amd/en/documents/products/et...
A lot of software engineering time is spent making things scalable, when in 2025 I could probably run any site in the bottom 99% of the most visited sites on the internet on a couple of machines and < 40k in capital.
What % is the AWS console, and what counts as "running" it?
0%
Prior to the recent RAM insanity (a big caveat, I know), a 1U Supermicro machine with 768GB of RAM, some NVMe storage, and twin 32-core Epyc 9004s was ~12K USD. You can get 3 of those and some redundant 10G network infra (people are literally throwing this out) for < 40k. Then you just have to find a rack/internet connection to put them in, which would be a few hundred a month.
The reality is most sites don't need multi-region setups; they have very predictable load, and 3 of those machines would be massive overkill for many. A lot of people like to think they will lose millions per second of downtime, and some sites certainly do, but most won't.
All of this of course would be using new stuff. If you wanted to go used, the most cost-effective options are the 5-year-old second-gen Xeon Scalables being dumped by cloud providers. Those are more than enough compute for most; they are just really thirsty, so you will pay with the power bill.
This of course is predicated on the assumption that you have the skill set to support these machines, which is increasingly uncommon. Though as successful companies that started in the last 10 years do more "hybrid cloud", it is starting to come back around.
Otherwise Viaweb would be the shining star of 2025. Instead it's a forgotten footnote on a path to programming with money (VC).
A lot of analytic data is like that. If you captured it for 1% of users you'd find out what you needed to know at 1% of the cost.
This article describes the 10k client connection problem; you should be handling 256K clients :)
When they say "most companies can run in a single server, but do backups" they usually mean the physical kind.
It was about physical servers.
However, most people used dedicated machines when this was written, so scaling 10K open connections on a daemon was essentially the same thing as 10K open connections on a single machine.
Those are not "by process" capabilities and daemons were never restricted to a single process.
The article focuses on threads because processes had more kernel level problems than threads. But it was never about processes limitations.
And by the way, process capabilities on Linux are exactly the same as machine capabilities. There's no limitation. You are insisting everybody uses a category that doesn't even exist.
Now perhaps my memory is a bit fuzzy after all these years, but I'm pretty sure that when I was asking about scaling above 15,000 simultaneous connections back in 1999 (I think the discussion on linux-kernel is referenced in this article), it was for a server listening on a single port that required communication between users, and the only feasible way at the time to do that was multiplexing the connections in a single process.
Without that restriction, hitting 10,000 connections on a single Linux machine was much easier: run multiple daemons, each listening on its own port, and just use select(). It still wasn't great, but it wasn't eating 40% of the time in poll() either.
Most of the things the article covers (multiplexing, non-blocking IO, and event handling) were strategies for handling more connections in a single process. The various multiplexing methods were discussed because syscalls like poll() scaled extremely poorly as the number of fds increased. None of that is particularly relevant for one-connection-per-process forking daemons, where in many cases you don't even need polling at all.
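For anyone who didn't live through it, the difference shows up in the API shape. A minimal sketch of the epoll(7) pattern the article's strategies eventually converged on; not a complete server, error handling is trimmed, and listen_fd is assumed to be a non-blocking listening socket set up elsewhere:

    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void event_loop(int listen_fd) {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        struct epoll_event ready[1024];
        for (;;) {
            /* cost scales with ready fds, not total fds like select()/poll() */
            int n = epoll_wait(epfd, ready, 1024, -1);
            for (int i = 0; i < n; i++) {
                if (ready[i].data.fd == listen_fd) {
                    int client = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
                } else {
                    char buf[4096];
                    ssize_t r = read(ready[i].data.fd, buf, sizeof buf);
                    if (r <= 0) close(ready[i].data.fd); /* close() deregisters the fd */
                    /* ...otherwise handle r bytes... */
                }
            }
        }
    }

The kernel is told about each fd exactly once via epoll_ctl(); with poll() you hand the kernel the entire fd array on every call, which is where the 40% went.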
Serving less than 1k qps per core is pretty underwhelming today; at such a high core count you'd likely hit OS limitations way before you're bound by hardware.
But you're right, OS resource limits (file handles, PIDs, etc.) would be the real pain for you. One problem after another.
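The fd limit is usually the first wall, and it's one setrlimit(2) call (or ulimit -n from the shell) away. A sketch, assuming Linux, where the hard limit is itself capped by the fs.nr_open sysctl:

    #include <stdio.h>
    #include <sys/resource.h>

    int raise_fd_limit(rlim_t want) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
        rl.rlim_cur = (want < rl.rlim_max) ? want : rl.rlim_max; /* soft <= hard */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
        printf("fd limit: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
    }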
Now, the real question is do you want to spend your engineering time on that? A small cluster running erlang is probably better than a tiny number of finely tuned race-car boxen.
Totally agree on many smaller boxes vs. one bigger box, especially for the proxying use case.
"libuv is a multi-platform C library that provides support for asynchronous I/O based on event loops. It supports epoll(4), kqueue(2)"
Picking the correct theoretical architecture can't save you if you bog down on every practical decision.
If you haven't had experience with actual performant code, JS can seem fast. But it is a Huffy bike compared to a Kawasaki H2. Sure, it is better than a kid's trike, but it is not a performance system by any stretch of the imagination. You use JS for convenience, not performance.
JavaScript engines are also JITted, which is better than a straight interpreter but, outside of microbenchmarks, worse than compiled code.
I use it for nearly all my projects. It is fine for most UI stuff and is OK for some server stuff (though Python is superior in every way). But I would never want to replace something like nginx with a JavaScript-based web server.
If your Node app spends very little RAM per client, it can indeed service a great many of them.
A PHP script that does little more than checking credentials and invoking sendfile() could be adequate for the case of serving small files described in the article.
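The heavy lifting there is the sendfile(2) syscall underneath, which moves file bytes to the socket entirely in-kernel, so the script only pays for the credential check. A rough sketch of that path in C; the file and connected-socket fds are assumed to come from elsewhere, and real code would retry on EINTR/EAGAIN:

    #include <sys/sendfile.h>
    #include <sys/stat.h>

    /* stream an already-opened file to an already-connected socket */
    long serve_file(int sock_fd, int file_fd) {
        struct stat st;
        if (fstat(file_fd, &st) != 0) return -1;

        off_t offset = 0;
        while (offset < st.st_size) {
            ssize_t sent = sendfile(sock_fd, file_fd, &offset,
                                    (size_t)(st.st_size - offset));
            if (sent <= 0) return -1; /* real code retries on EINTR/EAGAIN */
        }
        return (long)offset;
    }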
A lot of apps seem like they could literally all use the same exact backend, if there was a service that just gave you the 20 or so features these kinds of things need.
Pocketbase is pretty close, but not all the way there yet: you still have to handle billing and security yourself; you're not just creating a client that has access to whatever resources the end user paid for and assigned to the app via the host's UI.
https://youtu.be/hjjydz40rNI?si=F7aLOSkLqMzgh2-U
(From Wayne's World--how we knew the comedians had smart advisors)
The date (2003) is incorrect.
You're right, it's even older than that; it should be (1999). https://web.archive.org/web/*/https://www.kegel.com/c10k.htm...
It seems to me that there are far fewer problems nowadays with trying to figure out how to serve a tiny bit of data to many people with those kinds of resources, and more problems with understanding how to make a tiny bit of data relevant.
It still absolutely can be. We've just lost touch.
Yes, an RPi4 might be adequate to serve 20k client requests in parallel without crashing or breaking much of a sweat. You usually want to plan for 5%-10% of this load as the norm if you care about tail latency, but a 20K spike should not kill it.
What Erlang excels at is "no programming mistake should ever take the whole system permanently down": components will reboot to recover. It's not a magic fix for anything outside of that.
Title: The C10K problem
Popular in:
2014 (112 points, 55 comments) https://news.ycombinator.com/item?id=7250432
2007 (13 points, 3 comments) https://news.ycombinator.com/item?id=45603