There’s a reason we stopped using CGI the way we did, though. You wouldn’t want to use escript for CGI scripts unless you’re happy to wait for the BEAM to start up on every single request. Memory is also allocated per request, so a few slow requests can kill your server. And you lose in-process caches and any connection pooling too. I find this a bit annoying every time someone says that Lambda is just CGI these days…
CGI has some basic uses, but these days, I can’t think of any reason I would use it instead of FastCGI or something similar.
One of the nicest things is that if you write a program in such a way that it works with basic CGI, it will also work in almost any other deployment system. You can transparently switch it to FastCGI or SCGI or an embedded HTTP server, since the handler, by definition, will not rely on any cross-request in-process state (though you can still provide that as a bonus if you want, of course). There’s a lot of benefit in this - you can also do horizontal scaling with the model quite easily.
Slow requests killing the server tends to be the case on a lot of setups; CGI is fairly resilient to it, since you can always spawn more processes or kill individual requests from the outside.
Exactly. The reason they listed CGI for Perl is presumably that Perl didn’t have anything better at the time. CGI is very much “the simplest thing that could possibly work”, but it has terrible performance since it launches a new process to handle every request. It’s OK for a toy project, but not much more.
We should be careful to distinguish between Unix processes and interpreters / VMs.
A process in C / C++ / Rust should start in 1 ms or so, which is plenty fast for many sites.
The real problem is that Perl’s “successors” like Python and Ruby start an order of magnitude slower – 30 ms is probably a minimum, and it’s easy to get up to 300 ms once you import libraries.
node.js and Java pay additional JIT startup time on top of that, and the JIT never really gets warmed up, since each process only lives for a single request.
A program that proves the point is cgit, a CGI program written in C, and you can see it’s plenty fast.
https://git.zx2c4.com/cgit/
(Not making any comment on the wisdom of CGIs in C here :) But they made it work.)
My blog is a CGI program in C and it can return a page in 0.01 seconds. I’ve never really had an issue with performance in the 23 years I’ve been using the program (and that’s even when I first started out using a 66MHz 32-bit x86 Linux server in 1999). I think establishing the TLS connection is more system intensive than my CGI program.
It pains me to think of launching a process as “fast”, but yeah, you’re right about that. A ms isn’t unreasonable latency.
Of course even a C/Rust CGI will slow down if it has to do something like open a connection to a database server or parse a big config file.
FastCGI isn’t much harder to implement and saves you the startup overhead. (Although TBH I can’t recall whether it was FastCGI I used or a different similar protocol.)
I think basically what happened is that everything else got so slow – Python/Ruby startup time, TLS, web pages do 100 network round trips now, etc. – that Unix processes are fast now :)
It’s basically like bash saying it’s “too big and too slow” in the man page written in the ’90s, but now it’s fast and small compared to modern programs.
Database connections are an issue, especially on shared hosting, but there seems to be this newfound love for sqlite which would almost eliminate that :)
I currently use FastCGI on Dreamhost, but my impression was that it was kind of a weird protocol without many implementations. It seems to be almost a PHP implementation technique – for some reason, it’s not widely used for Python / node.js / etc.
I actually had to download an ancient version of a FastCGI Python library to get it to work. It’s clear to me that very few people use it on shared hosting.
Which is a shame because, as mentioned in this thread, AWS Lambda is basically FastCGI – it has cold starts, and then it’s fast.
FastCGI was a big deal ~2000, less so now. There used to be a lot of stuff using it. It still has currency in PHP-land because php-fpm is the only reasonable way to run PHP besides Apache mod_php, and people have various reasons for not wanting to run Apache, or not wanting to have mod_php in their Apache.
For the uninitiated, could you break down the difference between CGI and FastCGI? I understand the flow of CGI: a request comes in, the program runs, and its stdout is sent back to the client.
But in FastCGI, are there persistent processes involved?
Yeah, with fastcgi the launcher can spawn and kill worker processes as needed, and each worker process can handle an arbitrary number of requests in sequence. They might live a long time or a short time depending on load and configuration.
You can abstract it so a single program works as cgi or fastcgi pretty easily.
There’s also SCGI as an alternative, which aims to be a simplified version of FastCGI but is generally the same thing; just the communication protocol is a bit different.
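To make that “single program, many deployment modes” idea concrete, here is a minimal sketch (an illustration only, assuming Go; the -mode flag, the addresses, and the handler are all made up for the example) of the same handler served over plain CGI, FastCGI, or an embedded HTTP server:

    // One handler, three ways to expose it. Under CGI the web server runs
    // this binary once per request; under FastCGI or plain HTTP the same
    // binary stays up and handles requests in sequence.
    package main

    import (
        "flag"
        "fmt"
        "log"
        "net"
        "net/http"
        "net/http/cgi"
        "net/http/fcgi"
    )

    func handler(w http.ResponseWriter, r *http.Request) {
        // No cross-request in-process state, so it doesn't matter how long
        // the worker process lives.
        fmt.Fprintf(w, "Hello from %s %s\n", r.Method, r.URL.Path)
    }

    func main() {
        mode := flag.String("mode", "http", "cgi | fcgi | http")
        flag.Parse()

        h := http.HandlerFunc(handler)
        var err error
        switch *mode {
        case "cgi":
            // One process per request; input arrives via env vars and stdin.
            err = cgi.Serve(h)
        case "fcgi":
            // Long-lived worker; the web server sends it requests over a socket.
            var l net.Listener
            l, err = net.Listen("tcp", "127.0.0.1:9000")
            if err == nil {
                err = fcgi.Serve(l, h)
            }
        default:
            // Embedded HTTP server, typically behind a reverse proxy.
            err = http.ListenAndServe("127.0.0.1:8080", h)
        }
        if err != nil {
            log.Fatal(err)
        }
    }

The only thing that changes between the modes is the last bit of plumbing; the request-handling code is identical.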
So, simply more control instead of directly spawning a process on every visit - awesome, gotcha!
To be fair, that’s what “lambda”/Function-as-a-Service things (AWS) do. Worse, they bring up an entire virtual machine when a request is made. They seem to be used in production, albeit in very narrow workflows where the latency requirements aren’t tight.
That’s exactly what I mentioned above… No, lambda is not even close to CGI. Lambda spawns an environment on your first request and retains it for a predefined time, so that other requests reuse it. Under enough load it will spawn more of those environments to keep the responses quick. This allows for in memory caching and preserving database connections. There’s also no “entire virtual machine”, but a thin implementation called firecracker, which is not that much slower than starting a standard process. It comes with lots of resource usage controls and scaling options which CGI itself has no concept of. If you’re fine with the first request in a batch taking over 100ms (which honestly most services are), you shouldn’t suffer from lambda latency. (Officially listed as 150ms https://firecracker-microvm.github.io/ )
It’s closer to a FastCGI container autoscaled in a cluster.
I guess there’s a line to be drawn on whether you consider “CGI” and “CGI with bolted-on process/env persistence” to be completely separate things.
I’m not sure I do.
If you don’t, then everything is CGI with bolted-on persistence. Literally anything that handles HTTP requests is CGI with bolted-on persistence if Lambda is. Why even stop there - qmail is CGI with a bolted-on email protocol. That would make “CGI” itself meaningless.
Or from the other side - since CGI is defined as starting a new process per request, which implies memory separation between requests, why would anything that doesn’t implement that core idea be called CGI?
There are a lot of “CGI-ish” protocols which have kept many of the conventions of CGI even if they haven’t kept exactly the same mechanics as CGI. So, as I pointed out in another comment, Python’s WSGI literally reuses a lot of the special names used by CGI; it just puts them as keys in a hash table instead of as names of env vars.
To me, it doesn’t matter if the particular mechanics allow for persistence; if the overall conventions of how I’m supposed to understand a “request” are the same as CGI, I don’t meaningfully distinguish it, as a programming model, from CGI.
And FastCGI, which is the de facto way of running PHP under nginx and others.
This has truly been one of the most entertaining comment threads I have read. The Lambda thing that AWS has is interesting. I don’t use any AWS services and only know that all Amazon employees talk about is their products, but Lambda does sound like CGI for 2022 (that is, CGI as it would be done in 2022 with modern tech).
The process that it launches can be very lightweight though. A lot of cgi scripts that I’ve seen were punting most of the logic to another process. For example, connecting to a dev and putting all of the logic there, or simply serving a static file and prodding an IPC channel that coalesced messages and regenerated the file if it had grown stale.
No, it’s because Perl was huge (much as Rails was once huge) back when no one had anything better. Perl has long since gotten a Rack/WSGI-like API and a nice little ecosystem of servers and middlewares and whatnot (and, of course, the ability to run anything that’s compliant with that API under CGI or FastCGI if that’s what floats your boat). But there was a time when “dynamic web content” meant “CGI”, and “CGI” meant “Perl”.
I discovered I can deploy Go apps to Dreamhost by running them as CGI or FastCGI scripts, and Go has built-in support for both of those, supporting any HTTP framework that exposes itself as a handler!
If your eyes lit up reading this and you have an interest in serving protocols other than HTTP, do check out inetd for the same stdin/out experience.
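For a feel of what that looks like, here is a tiny sketch (an assumption-laden example, not a real service: the echo protocol and the Go language choice are made up) of a program meant to be launched by inetd, which accepts the connection and hands it to the program as stdin/stdout:

    // A trivial line-echo service for inetd ("nowait" style): inetd accepts
    // the TCP connection and attaches it to this process as stdin/stdout,
    // so the program never touches sockets, much like a CGI script.
    package main

    import (
        "bufio"
        "fmt"
        "os"
    )

    func main() {
        scanner := bufio.NewScanner(os.Stdin)
        for scanner.Scan() {
            fmt.Printf("you said: %s\r\n", scanner.Text())
        }
    }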
You are lucky enough that you (better late than never) discovered a classic technology. Many others are reinventing the wheel.
I love CGI. So much so that I switched back to lighttpd from nginx, because lighttpd natively supports CGI and nginx doesn’t. Not gonna bash nginx, but lighttpd was made to be a web server, whereas nginx was made to be a reverse proxy, and nginx does a fabulous job at that.
Sometime in the last 10-15 years, everyone decided that the best deployment strategy was to embed an HTTP server into their application and then reverse proxy to it from nginx. The rise of server-side web applications natively speaking HTTP went right along with the rise of nginx. It works, and it has some distinct advantages. But it strikes me as kind of wrong. HTTP is a beast of a protocol. It seems like it would be simpler and possibly more secure to use something much, much smaller, like CGI (or SCGI, or FastCGI, for performance). By smaller, I mean the interface between the application and the thing it is using to talk to the Internet.
With CGI you don’t need much from your language. Can it access environment variables? Can it write to stdout and read from stdin? If it ticks those three boxes, it supports CGI out of the box.
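To make those boxes concrete, here is a minimal hand-rolled sketch (assuming Go; the printed fields are arbitrary) that speaks CGI with nothing but environment variables, stdin, and stdout:

    // Minimal CGI by hand: request metadata arrives in environment variables,
    // any request body arrives on stdin, and the response (headers, a blank
    // line, then the body) goes to stdout.
    package main

    import (
        "fmt"
        "io"
        "os"
        "strconv"
    )

    func main() {
        method := os.Getenv("REQUEST_METHOD") // e.g. GET or POST
        query := os.Getenv("QUERY_STRING")    // raw query string, may be empty

        // Per the CGI spec, read exactly CONTENT_LENGTH bytes of request body.
        n, _ := strconv.Atoi(os.Getenv("CONTENT_LENGTH"))
        body := make([]byte, n)
        io.ReadFull(os.Stdin, body)

        // Headers first, then a blank line, then the body.
        fmt.Print("Content-Type: text/plain\r\n\r\n")
        fmt.Printf("method: %s\nquery: %s\nbody: %d bytes\n", method, query, len(body))
    }

That is the whole protocol from the program’s point of view; FastCGI and SCGI mostly just change how those same values get delivered.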
You’re one of today’s lucky 10,000.
WSGI and ASGI both still feel really similar to CGI to me. There are some nice affordances there (which would generally also be in FastCGI) that I don’t want to give up for production-y things, but there’s a certain beauty to just reading a request off standard input and spitting your response to standard output.
FWIW, I also thought of them as being primarily a perl thing for a long time. I think that was mostly because our university server where we could publish our web sites only allowed perl in the shebangs for cgi scripts. It was some time before I realized that was just an administrative restriction on that specific server and not something inherent to CGI.
I think the last time I used anything through CGI or, really, FastCGI, was the software I rose to be in charge of in about 2015 and last used in 2017. The app was written in C in the mid-2000s and ran perfectly inside of Apache the entire time. It was a collection of executables. I never really got down into the C myself, but the CGI part was absolutely never a problem for us.
I remember some talk when serverless entered the vocabulary a few years ago and someone in the crowd asking if they could just ship a binary that talked fastcgi. I think the speaker was talking about lambda and openwhisk and at the time the answer was no. I was surprised.
AWS went with their own system for some reason and didn’t officially allow custom bootstraps for years, but now you can find almost every connector. For example lambda-wsgi: https://pypi.org/project/aws-lambda-wsgi/ or lambda-fastcgi: https://bref.sh/
(you could do it in a hacky way before though - deploy a Python script which only did exec(your_custom_binary))
WSGI basically is CGI, but instead of passing things in env vars with standard names, they’re passed in a hash table with standard keys.
ASGI is more of a break from the CGI model. If you use it only to do HTTP it can still be made to feel kinda CGI-like, but if you use it to do another protocol like WebSockets the differences really show up.
I wrote some CGI web apps in bash, for fun. Terrible stuff. But I love the simplicity of CGI.
I read this, and immediately followed it with “I feel for the NetBSD community”; there, on Rubenerd’s home page: https://rubenerd.com/the-beauty-of-cgi-and-simple-design-by-hales/
In which he links to Halestrom: https://halestrom.net/darksleep/blog/046_cgi/ (Originally written here! https://lobste.rs/s/pdynxz/long_death_cgi_pm#c_lw4zci)
For formal folks, there is an RFC: https://www.rfc-editor.org/rfc/rfc3875
For those of us in the lucky 10,000 who want to know more, I do believe the rabbit hole is this way; but remember that there’s no better way to learn than to create! See implementations of the spec and the resources above.
I’ve written a CGI-based web application and a FastCGI RPC API in Fortran 2018. Both of them access SQLite 3 databases and run in lighttpd. Once the details of the protocols are abstracted away (environment variables, routing), it feels almost like Flask or any other web framework. The response time is usually < 10 ms per request. In my benchmark, FastCGI is only 15% faster than CGI (content compression is far more important). Fortran may be an obscure choice, but it allows a lot of code to be reused (engineering project). There are other benefits, like Unix interoperability for free: calling Gnuplot through pipes and embedding the returned plot in SVG format as a base64-encoded data URI directly into the HTML response, and so on. Authentication is covered by the web server via HTTP Basic Auth and password files. Alternatively, the lighttpd auth module provides an LDAP backend for more complex solutions.
Does CGI support websockets?
Currently, I have a standalone websocket server with nginx as frontend proxy. But I don’t like this solution.
One of my prouder creations was writing a CGI “script” in INTERCAL. It plays mastermind.