This is all internal use though; I don’t need to scale to hundreds of concurrent users, let alone thousands. Apache and cgi-bin are fine.
https://github.com/Jacob2161/cgi-bin/blob/main/gohttpd/main....
See the sections "Benchmarking (writes|reads) using Go net/http".
It was faster but not by very much. Running CGI programs is just forking processes, so Apache's forking model works just about as well as anything else.
We invented PHP and FastCGI mainly to get away from the performance hit of starting a new process just to handle a web request!
It was only a few years ago that I realized that modern hardware means it really isn't prohibitively expensive to do that any more - this benchmark gets to 2,000 requests a second, and if you can even get to a few hundred requests a second it's easy enough to scale across multiple instances these days.
I have seen AWS Lambda described as the CGI model reborn and that's a pretty fair analogy.
Yes! Note that the author is using a technology that wasn't available when I too was writing cgi_bin programs in the 00's: Go. It produces AOT compiled executables but is also significantly easier to develop in and safer than trying to do the same with C/C++ in the 00's. Back then we tended to use Perl (now basically dead). Perl and Python would incur significant interpreter startup and compilation costs. Java was often worse in practice.
> I have seen AWS Lambda described as the CGI model reborn and that's a pretty fair analogy.
Yes, it's almost exactly identical to managed FastCGI. We're back to the challenges of deployment: can't we just upload and run an executable? But of course so many technologies make things much, much more complicated than that.
The "performance hit of starting a new process" is bigger if the process is a dynamically-linked php interpreter with gobs of shared libraries to load, and some source file, reading parsing compiling whatever, and not just by a little bit, always has been, so what the author is doing using go, I think, would still have been competitive 25 years ago if go had been around 25 years ago.
Opening an SQLite database is probably (surprisingly?) competitive with passing a few sockets through a context switch, across all server(ish) CPUs of this era and that one, but both are much faster than opening a socket to and authenticating with a remote mysql process. Programs that are not guestbook.cgi often have many more resource acquisitions, which is why I think FastCGI is still pretty good for new applications today.
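(Not from the article, just a sketch of that persistent-process shape: the smallest possible FastCGI worker using Go's net/http/fcgi. The listen address is made up, and the database comment is only there to mark where the once-at-startup resource acquisition would go.)

    package main

    import (
        "fmt"
        "net"
        "net/http"
        "net/http/fcgi"
    )

    func main() {
        // Acquire expensive resources once, at startup, rather than per request.
        // db, err := sql.Open("sqlite3", "guestbook.db") // hypothetical example

        ln, err := net.Listen("tcp", "127.0.0.1:9000") // address is an assumption; match your web server's FastCGI config
        if err != nil {
            panic(err)
        }

        handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "hello from a long-lived FastCGI worker: %s\n", r.URL.Path)
        })

        // fcgi.Serve keeps this one process alive and speaks the FastCGI
        // protocol to the front-end web server on every request.
        if err := fcgi.Serve(ln, handler); err != nil {
            panic(err)
        }
    }

Point nginx's fastcgi_pass (or Apache's mod_proxy_fcgi) at that address and you get the long-lived worker model being described, instead of one fork+exec per hit.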
One additional bit of context: the person you’re replying to, simonw, is also a co-creator of Django, which was for years the world’s de facto standard Python web framework and was created in 2005, long before either Go (2009) or Rust (2012).
You can use it with any language that can read stdin and write stdout. Yes, printing and reading.
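For anyone who never wrote one, a minimal sketch of how little that interface demands, here in Go (the header and the printed fields are just illustrative):

    package main

    import (
        "fmt"
        "io"
        "os"
    )

    func main() {
        // The web server hands the request body to the CGI process on stdin
        // and the request metadata in environment variables.
        body, _ := io.ReadAll(os.Stdin)

        // The response is just headers, a blank line, then the body, on stdout.
        fmt.Print("Content-Type: text/plain\r\n\r\n")
        fmt.Printf("method=%s path=%s body=%d bytes\n",
            os.Getenv("REQUEST_METHOD"), os.Getenv("PATH_INFO"), len(body))
    }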
I had no real debugging environment. I was probably writing all my code in vi and then just compiling and deploying. I guarantee there were buffer overflows and off-by-ones etc.
Web app code was so simple back then, though. The most complex one I wrote was a webmail app, which I was so pleased with, and then HoTMaiL was released three weeks later, with this awesome logo:
https://tenor.com/view/hotmail-outlook-microsoft-outlookcom-...
Look at qmail, which has the best track record of any piece of software I am aware of in wide distribution, and it was written in C.
Also: Memory leaks go away when you exit(), so they are actually more common in dynamic languages in my experience, although they manifest as fragmentation that the interpreter simply lacks the ability to do anything about.
Buffer overflows seem pretty common to people who do a lot of dynamic memory allocation: I would recommend not doing that in response to user input.
The result is that your C-based guestbook CGI is probably written very differently than a PHP-based guestbook. Mine basically just wrote to a logfile: since 2.6.35 we have been able to easily make a 1MB PIPE_BUF and get lock-free stores with no synchronisation and trivial recovery, and thus know exactly where each post began and ended. I'm not sure I wanted more than 1MB of user input back in those days, but the design made me very confident there were no memory leaks or buffer overflows in what was maybe 5 system calls. No libraries.
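(A loose sketch of that shape of guestbook, in Go rather than the original C. The log path, form fields, and 1MB cap are invented, and whether a single write lands without interleaving depends on the target: pipes give the PIPE_BUF guarantee described above.)

    package main

    import (
        "fmt"
        "io"
        "net/url"
        "os"
    )

    func main() {
        // Cap user input so a post always fits in one write.
        raw, _ := io.ReadAll(io.LimitReader(os.Stdin, 1<<20))
        form, _ := url.ParseQuery(string(raw))
        record := fmt.Sprintf("%q\t%q\n", form.Get("name"), form.Get("message"))

        // One open, one write, one close: each post lands as a single write,
        // so recovery is just reading the log back record by record.
        f, err := os.OpenFile("guestbook.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
        if err != nil {
            fmt.Print("Status: 500 Internal Server Error\r\n\r\n")
            return
        }
        defer f.Close()
        f.WriteString(record)

        fmt.Print("Content-Type: text/plain\r\n\r\nthanks!\n")
    }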
You could do this.
You can do this.
But you want more? That C-based guestbook also only ever needs to write to one file, so permissions could be (carefully) arranged to make that the only file it can write to. A PHP-based guestbook needs read (and possibly write) access to lots of files, some of which are shared objects. It is so much easier to secure a single static binary than a dynamic language with dynamic loading that, if you actually care about security, you could focus on making those static binaries easier to build.
People/orgs do tend to get kind of addicted to certain technologies that can interact poorly with the one-shot model, though. E.g., high-startup-cost Python interpreters with a lot of imports are still pretty slow, and people get addicted to that ecosystem and so need multi-shot/persistent alternatives.
The one-shot model in early HTTP was itself a pendulum swing from other concerns, e.g. ftp servers not having enough RAM for 100s of long-lived, often mostly idle logins.
No lingering state, very easy to dump a core and debug, nice mostly-linear request model (no callback chains, etc.) and trivially easy to scale. You're just reading from stdin and writing to stdout. Glorious. Websockets adds a bit of complexity but almost none.
The big change in how we build things was the rise of Java. Java was too big, too bloated, too slow to start per request, so people rapidly moved to multi-threaded application servers, all to avoid the cost of fork() and the dangers of C. We can Marie Kondo this shit and get back to things that are simple if we want to.
I don't even like Rust and this sounds like heaven to me. Maybe someone will come up with a way to make writing this kind of web-tier backend code in Rust easy, by hiding a lot of the tediousness and/or complexity in a way that makes it appealing to Node/JS, PHP, and Python programmers.
Part of Java's rise was C/C++ being error-prone plus Java's syntactic similarity to them, but this was surely intermingled with a full-scale marketing assault by Sun Microsystems, who at the time had big multi-socket SMP servers they wanted to sell with Solaris etc., and part of that pitch was Solaris/Java threading. Really, for a decade or two prior to that, the focus was on true MMU-based, hardware-enforced isolation with OS kernel clean-up (more like CHERI these days), not the compiler-enforced stuff like Rust does.
I think you could have something more ergonomic than Perl/Python ever were, and practically as fast as C/Rust, with Nim (https://nim-lang.org/). E.g., I just copied that guy's benchmark with Nim's stdlib std/cgi and got over 275M CGI requests/day to localhost on a 2016 CPU, using only 2 requester threads & 2 HTTP server threads. With some nice DSL, easily written if you don't like any current ones, you could get the "coding overhead" down to a tiny footprint. In fairness I did zero SQLite whatsoever, but he was also using a computer over 4x bigger and probably a GHz faster, with some IPC lift as well. So, IF you had the network bandwidth (hint: usually you don't!), you could probably support billions of hits/day off a single server.
To head off some lazy complaints, GC is just not an issue with a single threaded Nim program whose lifetime is hoped/expected to be short anyway. In many cases (just as with CLI utilities!) you could probably just let the OS reap memory, but, of course, it always "all depends" on a lot of context. Nim does reference counting anyway whereas most "fighting the GC" is actually fighting a "separate GC thread" (Java again, Go, D, etc.) trashing CPU caches or consuming DIMM bandwidth and so on. For this use, you probably would care more about a statically linked binary so you don't pay ld.so shared library set up overhead on every `exec`.
I mean, it's not like people ran Internet servers on such vastly different CPUs/OSes or that diversity was such a disadvantage. DEC Alpha was probably the most different for its 64-bitness, but I ran all those open source Linux/C things on that by 1996..97. But we may just have to agree to disagree that it made a lot of sense for that reason. I have disagreements with several high profile SiValley "choices" and I know I'm a little weird.
Anyway, I don't mean to be arbitrarily disputatious. Focusing on what we do agree on: I agree 100% that early Java's stdlib being bigger than C/C++'s, and early STL/template awfulness, were a huge effect. :-) C++-like keywords and lexical sensibilities mattered, too. PLang researchers joked upon Java's success that C's replacement sure had to "look like C". But I think programmers having all their library needs met with very little work matters even more, and network-first package managers were just getting going. A perceived-as-good stdlib absolutely helps Go even today. Human network effects are a very real driver even among very talented engineers.
Maybe since CTAN/CPAN(?), that has come more from popularity and the ecosystem than from "what's in the stdlib". Even before then there was netlib/Fortran algorithm distribution, though. The Node/Rust/Python worlds today show this.
How to drive popularity in a market of competing ideas is hard/finicky; otherwise the biggest marketing budget would always win and people could always just "buy reputation", which empirically does not reliably happen (though it sure happens sometimes, which I guess shows ad spend is not wasted). Even so, "free advertising", "ecosystem builds going exponential", etc. - these are just tricky to induce.
The indisputable elephant in the room is path dependence. Fortran, still somewhat reflective of people "stacking punch card decks" to "link" programs in the 1950s, is still used by much modern scientific research either directly or indirectly. Folks were literally just figuring out what a PLang should be and how interacting with these new things called "computers" might work.
But path dependence is everywhere all around us.. in institutions, traditions, and technology. It's all really a big Humanity Complete discussion that spirals into a cluster of Wicked Problems. Happens so fast on so many topics. :-) If you happen to make something that catches on, let us hope you didn't make too many mistakes that get frozen in! Cheers!
With this hardware, if you reach for Kestrel you can easily do a few trillion requests per day. The development experience would be nearly identical: you can leverage the string interpolation operator for a PHP-like experience. LINQ and String.Join() open the door to some very terse HTML template syntax for tables and other nested elements.
The hard part is knowing how to avoid certain landmines in the ecosystem (MVC/Blazor/EF/etc.). The whole thing can live in one top-level program file that is run from the CLI, but you need to know the magic keywords - "Minimal APIs" - or you will find yourself in the middle of the wrong documentation.
I've been asked about the architecture of a stock ticker that would serve millions of clients, showing them the current stock price on their phones. My first thought was streams, Kafka, pub/sub, etc., but then I came up with static files on a server.
I wonder how much it would cost, though.
If all you need is to return rarely-changing data, especially without wasting time on authorization, you can easily approach the limits of your NIC.
[0]: https://www.ibrahimdiallo.com/reqvis
Note: works best on desktop browsers for now.
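(A bare-bones sketch of that static-file idea in Go; the directory name, cache lifetime, and the assumption that some separate job rewrites the ticker files are all mine, not the commenter's.)

    package main

    import (
        "net/http"
        "time"
    )

    func main() {
        // Some other process (cron, a feed consumer, etc.) overwrites files like
        // ./public/AAPL.json every few seconds; this server only ever reads them.
        files := http.FileServer(http.Dir("./public"))

        handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            // A short client/CDN cache means millions of phones mostly never reach the origin.
            w.Header().Set("Cache-Control", "public, max-age=2")
            files.ServeHTTP(w, r)
        })

        srv := &http.Server{
            Addr:        ":8080",
            Handler:     handler,
            ReadTimeout: 5 * time.Second,
        }
        srv.ListenAndServe()
    }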
At the very least go for FastCGI, for christ’s sake…
If this refers to https://github.com/six-ddc/plow then -- oops! lots of issues in that repo, no tests, etc. etc.
The results in the README are also pretty clearly unsound! In both scenarios, writes were faster than reads?
_edit_: I guess because the writes all returned 3xx, oops again!
Probably don't take this article's claims at face value...
Traubenfuchs•5h ago
One shared JVM for maximum performance!
It can also share db connection pools, caches, etc. among those applications!
Wow!
mrweasel•5h ago
Because a lot of production software is half-baked. If you have to hand over an application to an operations team you need documentation, instrumentation, useful logging, error handling and a ton of other things. Instead software is now stuffed into containers that never receive security updates, because containers make things secure apparently. Then the developers can just dump whatever works into a container and hide the details.
To be fair most of that software is also way more complex today. There are a ton of dependencies and integrations and keeping track of them is a lot of work.
I did work with an old school C programmer that complained that a system we deployed was a ~2GB war file, running on Tomcat and requiring at least 8GB of memory, and it still crashed constantly. He had on multiple occasions offered to rewrite the whole thing in C, which he figured would be <1MB and require at most 50MB of RAM to run. Sadly the customer never agreed; I would have loved to see if it had worked out as he predicted.
nickjj•3h ago
I develop and deploy Flask + Rails + Django apps regularly and the deploy process is the same few Docker Compose commands. All of the images are stored the same with only tiny differences in the Dockerfile itself.
It has been a tried and proven model for ~10 years. The core fundamentals have held up; there are new features, but when I look at Dockerfiles I wrote in 2015 vs today you can still see a lot of common ideas.
stackskipton•1h ago
Not to mention endless frustration any upgrades would cause since we had to get all teams onboard with "Hey, we are upgrading PHP 5, you ready?" and there was always that abandoned app that couldn't be shut down because $BusinessReasons.
Containers have greatly helped with those frustration points and languages self-hosting HTTP have really made stuff vastly better for us Ops folks.
miroljub•5h ago
But imagine having to host 50 small applications each serving a couple of hundreds requests per day. In that case, the memory overhead of Tomcat with 50 war files is much bigger than a simple Apache/Nginx server with a CGI script.
whartung•4h ago
Not saying that can't happen with CGI, but since Tomcat is a shared environment, it's much more susceptible to it.
This is why shared, public Tomcat hosting never became popular compared to shared CGI hosting. A rogue CGI program can be managed by the host accounting subsystem (say, it runs too long, takes up too much memory, etc.), plus all of the other guards that can be put on processes.
The efficiency of CGI, specifically for compiled executables, is that the code segments are shared in virtual memory, so forking a new one can be quite cheap. While forking a new Perl or PHP process shares that, they still need to repeatedly go through the parsing phase.
The middle ground of "p-code" can work well, as those files are also shared in the buffer cache. The underlying runtime can map the p-code files into the process, and those are shared across instances also.
So, the fork startup time, while certainly not zero, can be quite efficient.
grandiego•8m ago