Well ackshually ... the technology here that was important was mod_php; PHP itself was no different to Perl in how it was run, but the design choice of mod_php as compared to mod_perl was why PHP scripts could just be dumped on the server and run fast, whereas you needed a small amount of thinking and magic to get mod_perl working.
mod_perl2[0] provides the ability to incorporate Perl logic within Apache httpd, if not other web servers. I believe this is functionally equivalent to the cited PHP Apache module documentation:
Running PHP/FI as an Apache module is the most efficient
way of using the package. Running it as a module means that
the PHP/FI functionality is combined with the Apache
server's functionality in a single program.
0 - https://perl.apache.org/docs/2.0/index.html
EDIT: I have managed to dig out slides from a talk I gave about this a million years ago with a good section that walks through the history of how all this worked, CGIs, mod_perl, PSGI etc, for anyone who wants a brief history lesson: https://www.slideshare.net/slideshow/psgi-and-plack-from-fir...
I got into web dev in the tail end of perl and cgi-bin. I remember my first couple scripts which were just copy/paste from tutorials and what not, everyone knows how it goes. It was very magical to me how this "cgi-bin" worked. There was a "script kiddy hacking tool" I think named subseven (or similar) written partially in perl that you would trick your friends into running or you'd upload on filesharing. The perl part gave you your web based C&C to mess with people or open chats or whatever. I really got into programming trying to figure out how this all worked. I soon switched over to PHP and in my inexperience never realized the deployment model was so similar.
I do think this model of running the script once per request and then exiting really messed with my internal mental model of how programs and scripts worked. Being exposed to long-running programs that could maintain state, keep their own internal data structures, handle individual requests in a loop, etc., was a real shock and took me a while to conceptualize.
It’s strange thinking back to the days when persisting information as simple as a view counter required writing data to a flatfile* or something involving a database.
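To make that concrete, here's roughly what that old model looked like (a made-up sketch, file path and all): every request pays a filesystem round trip for even a trivial counter, because the process forgets everything when it exits.

    #!/usr/bin/env python3
    # One process per request: state has to live in a flat file (path is illustrative).
    import sys

    COUNTER_FILE = "/tmp/views.txt"

    try:
        with open(COUNTER_FILE) as f:
            views = int(f.read() or 0)
    except FileNotFoundError:
        views = 0

    views += 1
    with open(COUNTER_FILE, "w") as f:
        f.write(str(views))

    sys.stdout.write("Content-Type: text/plain\r\n\r\n")
    sys.stdout.write(f"This page has been viewed {views} times\n")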
These days with node and our modern languages like go and rust it’s immediately obvious how it’s done.
I think it’s both a mix of me learning and growing and the industry evolving and growing, which I think all of us experience over time.
* for years using flat files was viewed as bad practice or amateurish. fun to learn years later that that's how many databases work.
What almost brought us to tears the day we learned about PHP was how everything we had been painstakingly programming ourselves from scratch reading RFCs or reverse engineering HTTP was just a simple function call in PHP. No more debugging our scuffed urlencode implementation or losing a day to a stray carriage return in an HTTP header...
Another place this can be useful is for allowing customers to extend local software with their own custom code. So instead of having to use, say, MCP to extend your AI tool, they can just implement a certain request structure via CGI.
This makes me wonder if an MCP service couldn't also be implemented as CGI: an MCP framework might expose its features as a program that supports both execution modes. I have to dig into the specs.
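As a very rough sketch of the idea (the request shape below is invented, not the MCP spec), a CGI program can read a JSON body from stdin and answer with JSON, so the "extension API" is just an executable dropped into a directory:

    #!/usr/bin/env python3
    # Hypothetical extension point: the host app POSTs a JSON payload to this
    # CGI script and reads back a JSON reply. Nothing here is MCP-specific.
    import json, os, sys

    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    request = json.loads(sys.stdin.read(length) or "{}")

    response = {"echo": request, "handled_by": "customer-extension"}

    sys.stdout.write("Content-Type: application/json\r\n\r\n")
    sys.stdout.write(json.dumps(response))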
The fork[0] system call has been a relatively quick operation for the entirety of its existence. Where latency is introduced is in the canonical use of the execve[1] equivalent in the newly created child process.
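A quick way to see the split on a Unix box (timings vary by machine; /bin/true is just a stand-in for any tiny program) is to time a bare fork against a fork+exec:

    import os, subprocess, time

    # Bare fork: child exits immediately, parent waits for it.
    t0 = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
    print("fork+wait:      ", time.perf_counter() - t0)

    # fork + execve of a trivial external program.
    t0 = time.perf_counter()
    subprocess.run(["/bin/true"])
    print("fork+exec+wait: ", time.perf_counter() - t0)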
> ... cgi bin works really well if you don’t have to pay for ssl or tcp connections to databases or other services, but you can maybe run something like istio if you need that.
Istio[2] is specific to Kubernetes and thus unrelated to CGI.
0 - https://man.freebsd.org/cgi/man.cgi?query=fork&apropos=0&sek...
1 - https://man.freebsd.org/cgi/man.cgi?query=execve&sektion=2&a...
That is, if your web service struggles to handle single-digit millions of requests per day, not counting static "assets", CGI process startup is not the bottleneck.
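For a back-of-the-envelope check (the numbers below are illustrative assumptions, not measurements):

    # 5 million requests/day with an assumed 5 ms fork+exec cost per request.
    requests_per_day = 5_000_000
    req_per_sec = requests_per_day / 86_400   # ~58 requests/second on average
    spawn_overhead = req_per_sec * 0.005      # ~0.3 CPU-seconds of spawning per second
    print(round(req_per_sec), round(spawn_overhead, 2))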
A few years ago I would have said, "and of course it's boring technology that's been supported in the Python standard library forever," but apparently the remaining Python maintainers are the ones who think that code stability and backwards compatibility with boring technology are actively harmful things, so they've been removing modules from the standard library if they are too boring and stable. I swear I am not making this up. The cgi module is removed in 3.13.
I'm still in the habit of using Python for prototyping, since I've been using it daily for most of the past 25 years, but now I regret that. I'm kind of torn between JS and Lua.
Amusingly that links to https://peps.python.org/pep-0206/ from 14th July 2000 (25 years ago!) which, even back then, described the cgi package as "designed poorly and are now near-impossible to fix".
Looks like the https://github.com/jackrosenthal/legacy-cgi package provides a drop-in replacement for the standard library module.
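If I'm reading that repo right, it installs under the old module name, so existing scripts shouldn't need changes beyond adding the dependency (I haven't audited it, so treat this as a sketch):

    # pip install legacy-cgi
    import cgi  # same module name as the removed stdlib module

    form = cgi.FieldStorage()              # parses the query string / form body
    name = form.getfirst("name", "world")
    print("Content-Type: text/plain")
    print()
    print(f"Hello, {name}")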
There are certainly some suboptimal design choices in the cgi module's calling interface, things you did a much better job of in Django, but what made them "near-impossible to fix" was that at the time everyone reading and writing PEPs considered backwards compatibility to be not a bad thing, or even a mildly good thing, but an essential thing that was worth putting up with pain for. Fixing a badly designed interface is easy if you know what it should look like and aren't constrained by backwards compatibility.
That policy, and the heinous character assassination the PSF carried out against Tim Peters, mean I can no longer recommend in good conscience that anyone adopt Python.
Also I used Python way before JS, and I still like JS's syntax better. Especially not using whitespace for scope, which makes even less sense in a scripting language since it's hard to type that into a REPL.
Jupyter fixes the REPL problem, and it's a major advance in REPLs in a number of other ways, but it has real problems of its own.
What Node.js had from the start was concurrency via asynchronous IO. And before Node.js was around to be JavaScript's async IO framework, there was a robust async IO framework for Python called Twisted. Node.js was influenced by Twisted[0], and this is particularly evident in the design of its Promise abstraction (which you had to use directly when it was first released, because JavaScript didn't add the async/await keywords until years later).
But I also understand that the world is not perfect. We all need to prioritize all the time. As they write in the rationale: "The team has limited resources, reduced maintenance cost frees development time for other improvements". And the cgi module is apparently even unmaintained.
I guess a "batteries included" philosophy sooner or later is caught up by reality.
What do you mean by "character assassination" carried out against Tim Peters? Not anything in the linked article I presume?
As for prioritizing, I think the right choice is to deprioritize Python.
https://www.theregister.com/2024/08/09/core_python_developer...
https://tim-one.github.io/psf/ban
https://chrismcdonough.substack.com/p/the-shameful-defenestr...
This is a bit like Apple firing Steve Jobs for wearing sneakers to work because it violates some dress code.
I have bash CGI scripts too, though Shellshock and bash's general bug-proneness make me doubt that this was wise.
There are some advantages of having the CGI protocol implemented in a library. There are common input-handling bugs the library can avoid, it means that simple CGI programs can be really simple, and it lets you switch away from CGI when desired.
That said, XSS was a huge problem with really simple CGI programs, and an HTML output library that avoids that by default is more important than the input parsing—another thing absent from Python's standard library but done right by Django.
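To make the failure mode concrete, here's the kind of hand-rolled CGI output where forgetting a single escape call is an XSS hole (a minimal stdlib-only sketch; an output layer that escapes by default removes the "forgetting" part):

    #!/usr/bin/env python3
    import html, os, sys
    from urllib.parse import parse_qs

    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]

    sys.stdout.write("Content-Type: text/html\r\n\r\n")
    # html.escape() is the load-bearing call; without it, ?name=<script>... executes.
    sys.stdout.write(f"<p>Hello, {html.escape(name)}!</p>\n")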
That said these days I'd rather use Go.
Admittedly Python is not great at this either (reload has interacted buggily with isinstance since the beginning), but it does attempt it.
I agree it's not a rapid prototyping kind of language. AI assistance can help, though.
Jython no longer works with basically any current Python libraries because it never made the leap to Python 3, and the Python community stigmatizes maintaining Python 2 compatibility in your libraries. This basically killed Jython, and from my point of view, Jython was one of the best things about Java.
Lua barely has any stdlib to speak of, most notably in terms of OS interfaces. I'm not even talking about chmod or sockets; there's no setenv or readdir.
You have to install C modules for any of that, which kinda kills it for having a simple language for CGI or scripting.
Don't get me wrong, I love Lua, but you won't get far without scaffolding.
But my concern is mostly not about needing to bring my own batteries; it's about instability of interfaces resulting from evaporating batteries.
LuaJIT, release-wise, has been stuck in a very weird spot for a long time, before officially announcing it's now a "rolling release" - which was making a lot of package maintainers anxious about shipping newer versions.
It also seems like it's going to be forever stuck on the 5.1 revision of the language, while continuing to pick a few cherries from 5.2 and 5.3. It's nice to have a "boring" language, but most distros (certainly Alpine, Debian, NixOS) just ship each release branch between 5.1 and 5.4 anyway. No "whatever was master 3 years ago" lottery.
Your app could add almost no latency beyond storage if you try.
At the time Perl was the thing I used in the way I use Python now. I spent a couple of years after that working on a mod_perl codebase using an in-house ORM. I still occasionally reach for Perl for shell one-liners. So, it's not that I haven't considered it.
Lua is in a sense absolutely stable unless your C compiler changes under it, because projects just bundle whatever version of Lua they use. That's because new versions of Lua don't attempt backwards compatibility at all. But there isn't the kind of public shaming problem that the Python community has where people criticize you for using an old version.
JS is mostly very good at backwards compatibility, retaining compatibility with even very bad ideas like dynamically-typed `with` statements. I don't know if that will continue; browser vendors also seem to think that backwards compatibility with boring technology like FTP is harmful.
- `perl -de 0` provides a REPL. With a readline wrapper, it gives you history and command editing. (I use comint-mode for this, but there are other alternatives.)
- syscalls can automatically raise exceptions if you `use autodie`.
Why is this not the default? Because Perl maintainers value backward compatibility. Improvements will always sit behind a line of config, preventing your scripts from breaking if you accidentally rely on functionality that later turns out to be a mistake.
Perl feels clumsy and bug-prone to me these days. I do miss things like autovivification from time to time, but it's definitely bug-prone, and there are a lot of DWIM features in Perl that usually do the wrong thing, and then I waste time debugging a bug that would have been automatically detected in Python. If the default Python traceback doesn't make the problem obvious, I use cgitb.enable(format='text') to get a verbose stack dump, which does. cgitb is being removed from the Python standard library, though, because the maintainers don't know it can do that.
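For reference, the incantation is just this (the stdlib cgitb module, pre-removal; despite the name it works fine outside CGI):

    import cgitb
    # Plain-text tracebacks with local variables at each frame, instead of HTML.
    cgitb.enable(format='text')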
Three years ago, a friend told me that a Perl CGI script I wrote last millennium was broken: http://canonical.org/~kragen/sw/rfc-index.cgi. I hadn't looked at the code in, I think, 20 years. I forget what the problem was, but in half an hour I fixed it and updated its parser to be able to use the updated format IETF uses for its source file. I was surprised that it was so easy, because I was worse at writing maintainable code then.
Maybe we could do a better job of designing a prototyping language today than Larry did in 01994, though? We have an additional 31 years of experience with Perl, Python, JS, Lua, Java, C#, R, Excel, Haskell, OCaml, TensorFlow, Tcl, Groovy, and HTML to draw lessons from.
One benefit Perl had that I think not many of the other languages do was being designed by a linguist. That makes it different -- hard to understand at first glance -- but also unusually suitable for prototyping.
Python has Werkzeug, Flask, or at the heavier end Django. With Werkzeug, you can translate your CGI business logic one small step at a time - it's pretty close to speaking raw HTTP, but has optional components like a router or debugger.
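A minimal sketch of that migration path (the route and message are made up): a Werkzeug app is just a WSGI callable, so during the transition it can still run one-shot per request via CGIHandler, then move behind a long-running server unchanged.

    from werkzeug.wrappers import Request, Response

    @Request.application
    def app(request):
        # request.args replaces hand-rolled QUERY_STRING parsing
        name = request.args.get("name", "world")
        return Response(f"Hello, {name}!", mimetype="text/plain")

    if __name__ == "__main__":
        import wsgiref.handlers
        wsgiref.handlers.CGIHandler().run(app)  # still deployable as plain CGI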
I agree that tens of milliseconds of latency is significant to the user experience, but it's not always the single most important consideration. My ping time to news.ycombinator.com is 162–164ms because I'm in Argentina, and I do unfortunately regularly have the experience of web page loads taking 10 seconds or more because of client-side JS.
All that was in the cgi module was a few functions for parsing HTML form data.
As a side note, though, CGIHTTPRequestHandler is for launching CGI programs (perhaps written in Rust) from a Python web server, not for writing CGI programs in Python, which is what the cgi module is for. And CGIHTTPRequestHandler is slated for removal in Python 3.15.
The problem is gratuitous changes that break existing code, so you have to debug your code base and fix the new problems introduced by each new Python release. It's usually fairly straightforward and quick, but it means you can't ship the code to someone who has Python installed but doesn't know it (they're dependent on you for continued fixes), and you can't count on being able to run code you wrote yourself on an earlier Python version without a half-hour interruption to fix it. Which may break it on the older Python version.
The support for writing CGI programs in Python is in wsgiref.handlers.CGIHandler.
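A minimal example, with the WSGI app written by hand rather than generated by a framework:

    #!/usr/bin/env python3
    import wsgiref.handlers

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello from a WSGI app served over CGI\n"]

    # Reads the CGI environment, runs the app once, writes the response, exits.
    wsgiref.handlers.CGIHandler().run(app)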
Since I learnt Python starting in version 1.6, it has mostly been for OS scripting stuff.
Too many hard learnt lessons with using Tcl in Apache and IIS modules, continuously rewriting modules in C, back in 1999 - 2003.
It's possible that your experience with people switching was later, when performance was no longer such a pressing concern.
These were real issues on multi-user hosts, but as most of the time we don’t use shared hosting like that anymore it’s not an issue.
There were also some problems with libraries parsing the environment variables with the request data wrong, but that’s no different from a badly implemented HTTP stack these days. I vaguely recall some issues with excessively long requests overflowing environment variables, but I can’t remember if that was a security problem or a DoS.
"A brief, incomplete and largely inaccurate history of dynamic webpages"
https://www.slideshare.net/slideshow/psgi-and-plack-from-fir...
And what are we trading performance for, exactly? The code certainly didn't become any simpler.
Which is only a small proportion of sites out there.
So you really do not have to be bothered by installation or anything of these lines. You install once and you are fine. You should check out the Wiki pages of Arch Linux, for example. It is pretty straightforward. As for upgrades, Arch Linux NEVER broke. Not on my servers, and not on my desktop.
That said, to each their own.
I think the worst drama ever was a partial disk failure. Things kinda hobbled along for a while before things actually started failing, and at that point things were getting corrupted. That poofed a weekend out of my life. Now I have better monitoring and alerting.
Or if you don't want to pay for an 8/16 for the sort of throughput you can get on a VPS with half a core.
In practice I'm not convinced -- but I would love to be. Reverse proxying a library-specific server or fiddling with FastCGI and alternatives always feels unnecessarily difficult to me.
This lets you drop .htaccess files anywhere and Apache will load them on each request for additional server config. https://httpd.apache.org/docs/2.4/howto/htaccess.html
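For anyone who hasn't touched this in a while, a minimal example along those lines (the extensions are illustrative, and the main config needs a matching AllowOverride to permit these directives):

    # .htaccess dropped into a directory, re-read by Apache on every request
    Options +ExecCGI
    AddHandler cgi-script .cgi .pl .py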
One big reason to avoid them was performance; it required extra disk access on every request and it was always better to put the configuration in the main config file if possible.
But now? When most servers have an SSD and probably spare RAM that Linux will use to cache the file system?
Ok, performance is still slightly worse as Apache has to parse the config on every request as opposed to once, but again, now that most servers have more powerful CPUs? In many use cases you can live with that.
[ Side project is very early version but I'm already using it: https://github.com/StaticPatch/StaticPatch/tree/main ]
> I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I’ll just restart Apache every 10 requests.
PHP got a very long way since then, but a huge part of that was correcting the early mistakes.
> PHP 8 is significantly better because it contains a lot less of my code.
I do have thoughts for later about modes which could take all the config from .htaccess files and build them into the main config so then you avoid any performance issues - however you have to do that carefully to make sure people don't include any bad config that crashes the whole server. One of the nice things about using .htaccess files as intended is Apache has the Nonfatal flag on AllowOverride so you can avoid that. https://httpd.apache.org/docs/2.4/mod/core.html#allowoverrid...
IMO you don't need to compensate for bad configs if you're using a proper staging environment and push-button deployments (which is good practice regardless of your development model). In prod, you can offset its main issue (atomic deployments) by swapping a symlink. In that scenario, having a separate .htaccess file actually helps - you don't want to restart Apache if you can avoid it, and again - hot reloading can hide state.
My main issue is that this is all a very different model from what most languages, frameworks, and runtimes have been doing for almost 20 years now. If you're a sysop dealing with heterogeneous environments, it's honestly just annoying to have separate tooling and processes.
Personally, ca 10 years ago, this was the tipping point at which I've demanded from our PHP devs that they start using Docker - I've been reluctant about it until that moment. And then, whether it was .htaccess or the main config, no longer mattered - Apache lived in a container. When I needed to make sure things performed well, I used Locust <https://locust.io/>. Just measure, then optimise.
So in practice, yes, spiritually I'm doing what PHP8 did to PHP3. Whether that's "approvingly" is up to your interpretation ;)
I can totally see how the cgi-bin process-per-request model is viable in a lot of places, but when it isn't, the difference can be vast. I don't think we'd have benefited from the easier concurrency either, but that's probably just because it was all golang to begin with.
There are workarounds, but usually it's a better idea to ditch PHP for a technology better suited to the modern web.
And for your information, you can have stateful whatnots in PHP. Hell, you can have it in CSS as I have demonstrated in my earlier comments.
    import wsgiref.handlers, flask

    app = flask.Flask(__name__)
    # Flask routes are defined as usual; CGIHandler serves one request, then the process exits.
    wsgiref.handlers.CGIHandler().run(app)
The way we run the scripts is with uwsgi and its cgi plugin[1]. I find it simpler and more flexible than running apache or lighttpd just for mod_cgi. Since uwsgi runs as a systemd unit, we also have all of systemd's hardening and sandboxing capabilities at our disposal. Something very convenient in uwsgi's cgi handling that's missing from mod_cgi is the ability to set the interpreter for a given file type:

    cgi = /cgi-bin=/webapps/cgi-bin/src
    cgi-allowed-ext = .py
    cgi-helper = .py=/webapps/cgi-bin/venv/bin/python3 # all dependencies go here
Time to first byte is 250-350ms, which is acceptable for our use case.
eyberg•10h ago
That same go program can easily go over 10k reqs/sec without having to spawn a process for each incoming request.
CGI is insanely slow and insanely insecure.
simonw•10h ago
EDIT: Looks like the way CGI works made it vulnerable to Shellshock in 2014: https://en.m.wikipedia.org/wiki/Shellshock_(software_bug)
I agree that there's probably not much of an argument to switch to it from the well established alternative mechanisms we are using already.
The one thing in its favor is that it makes it easier to have a polyglot web app, with different languages used for different paths. You can get the same thing using a proxy server though.
AdieuToLogic•7h ago
And a Go program reading from a network connection is immune from the same concerns how?
tonyedgecombe•6h ago
From your linked article: If the handler is a Bash script, or if it executes Bash...
But we are talking about Python not Bash.
kragen•5h ago
I hesitate to suggest that you might be misremembering things that happened 30 years ago, but possibly you were using a very nonstandard setup?
sitharus•5h ago
You could configure the server to be insecure by, eg, allowing cgi execution from a directory where uploaded files are stored.
__float•10h ago
And in a similar vein, Postgres (which is generally well liked!) uses a new backend process per connection. (Of course this has limitations, and sometimes necessitates pgbouncer, but not always.)
Tractor8626•7h ago
Uber famously switched from pg to mysql because their SWEs couldn't properly manage connections
reddec•6h ago
However, through the years I learned:
- yes, forks and processes in general are fast
- yes, it saves memory and CPU on low-load sites
- yes, it’s a simple protocol and can be used even from shell
However,
- splitting functions (to mimic serverless) into different binaries/scripts creates a mess of cross-script communication
- deployment is not that simple
- security-wise, you need to run the manager as root and use a unique user for each script, or use cgroups (or at least chroot). At that point the main question is why not just use containers as-is
Also, compute-wise, even a huge Go app with hundreds of endpoints can fit in just a few megabytes of RAM - there is not much sense in saving such a small amount of memory.
At worst, just create a single binary and run it on demand for different endpoints.