When I first encountered FastCGI in the 1990s it seemed silly to me to invent a new protocol when HTTP could do the job. Contrary to what cks says, HTTP has fairly well-defined semantics for forwarding requests; it isn’t a non-standard extra thing. And error layering confusion is a problem even if you use FastCGI, especially for “restful” APIs that like to use HTTP status codes.
But cks is right that forwarding HTTP is full of traps and pitfalls. I just disagree that his remedy makes sense.
FastCGI was intended as a replacement for CGI. A CGI program didn’t have to speak all of HTTP - it just needed to know enough to produce a response and read the headers it was interested in from environment variables. Even a shell script could do that. FastCGI is a relatively straightforward extension of that, and could be used as a library.
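To make that concrete, here is a minimal sketch of a CGI program in Python (the environment variable names are the standard CGI ones; the server puts request metadata in the environment and reads the response from stdout):

    #!/usr/bin/env python3
    # Minimal CGI: request metadata arrives in environment variables,
    # and the response (headers, blank line, body) goes to stdout.
    import os

    method = os.environ.get("REQUEST_METHOD", "GET")
    path = os.environ.get("PATH_INFO", "/")
    ua = os.environ.get("HTTP_USER_AGENT", "unknown")

    # No status line or HTTP framing needed; the web server adds that.
    print("Content-Type: text/plain")
    print()
    print(f"{method} {path} from {ua}")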
But of course, it might’ve made more sense to just stand up multiple HTTP servers from your language and put those behind the proxy instead. But then you’d also need some way to know which workers are available and route each request to the right one. I think FastCGI handled all of that for you.
FastCGI uses a library on the back end to make its custom protocol look like CGI. It could equally well have used HTTP for its protocol, and from the application code’s point of view it would still look like CGI.
FastCGI also has a component in the web server that does two things: it converts the protocol from HTTP to FastCGI, and it makes configuration more like (in Apache terms) mod_cgi than mod_proxy. But it could deliver the configuration improvement without the protocol conversion.
The reason my counterfactual would be better is that you could mix and match front end proxies and back end applications: you could run a FastCGI-style application behind any old HTTP front-end proxy, or you could run an HTTP application server behind a web server that just had a basic FastCGI configuration.
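For a rough sense of what that looks like, Python’s WSGI interface works much this way: the application sees a CGI-style environ dict, and the same code can sit behind a plain HTTP server or a FastCGI front end. A sketch (the FastCGI half assumes the third-party flup package and its WSGIServer API):

    # The application only sees a CGI-style environ dict; the transport
    # underneath can be HTTP or FastCGI without changing this code.
    def app(environ, start_response):
        body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    if __name__ == "__main__":
        import sys
        if "--fastcgi" in sys.argv:
            # FastCGI transport: assumes flup's WSGIServer and its
            # bindAddress argument (third-party, not stdlib).
            from flup.server.fcgi import WSGIServer
            WSGIServer(app, bindAddress=("127.0.0.1", 9000)).run()
        else:
            # Plain HTTP transport, stdlib only.
            from wsgiref.simple_server import make_server
            make_server("127.0.0.1", 8000, app).serve_forever()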
Great post! I think this is at the heart of a lot of security issues regarding forwarding proxies.
For example, the X-Forwarded-For header cannot be trusted in general. You have to peel it like an onion: strip entries from the right until you reach the last known trustworthy proxy, take the first entry after that, and drop the rest, because those could all have been spoofed by the client. And that only works if you know with 100% certainty that all of your proxies add such a header.
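A sketch of that peeling logic in Python (the trusted-proxy addresses here are made up; a real deployment needs its own list):

    import ipaddress

    # Hypothetical addresses of the proxies we control; anything beyond
    # them is client-controlled and cannot be trusted.
    TRUSTED_PROXIES = {ipaddress.ip_address("10.0.0.1"),
                       ipaddress.ip_address("10.0.0.2")}

    def real_client_ip(peer_ip: str, xff_header: str) -> str:
        """Walk X-Forwarded-For right to left, peeling off trusted proxies.

        The first untrusted hop is the best guess at the real client;
        everything to its left could have been spoofed by that client.
        """
        hops = [h.strip() for h in xff_header.split(",") if h.strip()]
        hops.append(peer_ip)  # the directly connected peer is the last hop
        for hop in reversed(hops):
            try:
                if ipaddress.ip_address(hop) in TRUSTED_PROXIES:
                    continue  # one of ours; keep peeling
            except ValueError:
                pass  # not even a valid address; certainly not our proxy
            return hop
        return peer_ip  # every hop was one of ours; fall back to the peer

For example, with a peer of 10.0.0.2 and a header of “6.6.6.6, 203.0.113.9, 10.0.0.1”, this returns 203.0.113.9 and drops the spoofed first entry.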
This is an interesting take. I thought the obvious reason for using an HTTP reverse proxy over FastCGI was distribution and load-balancing, but upon investigation you can run FastCGI over TCP, which I didn’t know. Has anyone worked with a setup like that who can say something about its pros and cons?
Nice article. I agree it would be nice to have a better-specified way to encapsulate forwarded requests.
You can handle some of this with the HTTP CONNECT method; I’m surprised it’s not mentioned in the article. CONNECT essentially forms a TCP tunnel to a destination server, through which you can speak HTTP. That would fix their complaints about proxies altering HTTP requests in flight.
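For illustration, Python’s standard library can issue a CONNECT through a proxy with http.client’s set_tunnel; the proxy and backend addresses below are placeholders:

    import http.client

    # Connect to the proxy, then ask it to open a raw tunnel to the
    # destination via an HTTP CONNECT request. After that the bytes we
    # send are relayed as-is, so the proxy cannot rewrite the request.
    conn = http.client.HTTPConnection("proxy.example.com", 3128)
    conn.set_tunnel("backend.example.com", 8080)
    conn.request("GET", "/health")
    resp = conn.getresponse()
    print(resp.status, resp.read().decode())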
Unfortunately, reverse proxies do more than just blindly forward traffic: filtering, load-balancing, routing, security, and so on. The biggest problem with CONNECT is that it requires the outside peer to know its ultimate destination, which is a security and reliability risk.
I agree with the idea, but this touches on one of the core properties of Roy Fielding’s thesis: transparent layering https://ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf (5.3.1)
Can you improve encapsulation and yet still preserve transparency?
I always liked the idea of Mongrel2 using ZeroMQ as the backend transport. It solves that problem, and also nudges you towards something you can turn into a distributed system.
Yeah, the idea was great but it was a bit of a solution in search of a problem.
Can you connect all the things? Yes.
Do you need to connect all the things that don’t speak FastCGI? No.
In a world without FastCGI it could have become the standard, but in reality it was a fun proof of concept.
Also wow, it’s been 12 years… https://github.com/winks/m2php