-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hijack #481
Conversation
Sorry for the double mail. We still have a parent to this repo :'( |
Awesome! I have a patch to Reel that half-ass supports websockets using something similar (it passes 'rack.websocket' into the Rack env): Would sure be nice to have an official API for this sort of thing. |
I support this! Puma exposes the socket via |
+1, I see no obvious problems. |
Do you think we need a I'd generally discourage it from middleware, but, right now there's no way to tell if "someone else" is already in control of the socket. |
Can't they just check whether rack.hijack_io is set? |
This seems good. I think I can port Sinatra's |
@chneukirchen no, as it is allowed to be set AOT or after the call. this part of the spec is intentionally open to avoid introducing implementation complexities. I am concerned about env replacement middleware too, although that's rare, we don't specify that middleware must pass on /the/ env object, which is why the "should" is there. we also don't directly specify that env can be written to or deleted from, but that's generally accepted + implemented. In general maybe we shouldn't worry about the middleware case too much. It's confined and hard to specify well. |
It sounds like everyones in agreement. I'm tempted to place a "beta api" marker around this with a projected roadmap of removing that marker by 1.6 if all is going well. This potentially leaves us open to make a few changes if we've missed things between now and then. Necessary, unnecessary? other approaches? |
@macournoyer adding you, take a look - I know you're familiar with the older async api. @igrigorik as you've also touched on similar items, maybe you have some thoughts? |
I like that there is an IO in the env hash, I think. But it seems difficult to support an API like this: def index
response.status = 200
response['X-Greeting'] = 'Hi MOM!'
10.times do
sleep 10;
response.stream.write "hello world\n"
end
response.stream.close
### response is closed here
end If I'm reading the spec correctly, rack will call def index
response.status = 200
response['X-Greeting'] = 'Hi MOM!'
streaming do |io|
10.times do
sleep 10;
io.write "hello world\n"
end
io.close
### response is closed here
end
### response is not closed here
end I'm afraid people would be confused by when the socket is open or closed. It may seem counterintuitive that the socket is still open after the Am I missing anything? |
Surprisingly the API you're afraid of ending up with is what Sinatra gives you right now. On Jan 6, 2013, at 1:31 AM, Aaron Patterson [email protected] wrote:
|
From a high-level pass, this makes sense. It's also pretty much the pattern we implemented in Goliath: https://github.com/postrank-labs/goliath/blob/master/examples/chunked_streaming.rb Admittedly, since we're "kinda, sorta" rack compliant, we were able to bend the rules and create our own helper methods to tackle the use case. Which also gives us the API @tenderlove highlighted, +/- a few EM timers. :) However, since we brought up HTTP 2.0.. This change is not sufficient to enable what we need in that department. While the 2.0 spec is still a WIP, we have a pretty good idea as to how it's going to look, and it may be worth starting to think about it. (But that shouldn't stop this proposal from going forward) |
@rkh just not super stoked about more callbacks. The number of |
@tenderlove callcc? ;) On Jan 6, 2013, at 2:06 AM, Aaron Patterson [email protected] wrote:
|
@igrigorik the first api I listed is verbatim what is possible in Rails master today. In fact, the second API can be implemented in terms of the first: def streaming
yield response.stream
end
def index
response.status = 200
response['X-Greeting'] = 'Hi MOM!'
streaming do |io|
10.times do
sleep 10;
io.write "hello world\n"
end
io.close
### response is closed here
end
### response is not closed here
end Unfortunately, the opposite is not true. Also, the user having no visibility in to when the |
@tenderlove It would be worth taking a look at the example rackup files I linked. There may be a misunderstanding. There are two APIs, one where rack does not call My problem with the API that you're proposing, is that it is totally undefined when the user takes control of the socket. What would the following code mean? response.status = 200
response['X-Greeting'] = 'Hi MOM!'
# When do the following two lines actually get written to the socket? What if I put a sleeping loop in there?
response.stream.write 'ohai!'
response['X-Treme-Headers'] = 'woosh!'
response.stream.close If we define class MyResponse < Rack::Response
def stream
@stream ||=
begin
unless env['rack.hijack?']
raise NotImplementedError, "server does not support socket hijacking, cannot stream"
end
env['rack.hijack'].call
stream = env['rack.hijack_io']
write_status_to stream
write_headers_to stream
stream
end
end
end Additionally, a particularly "clever" application might be able to do some buffering to allow for late initialization of headers, without a change to the proposed spec. It's a hacky solution to the problem I describe above, and horrendously complex with additional user facing edge cases, but, I could envisage something in Rails (I'd rather not), that operates with the following rules:
You'd need a more sophisticated stream object than just the hijack_io, but it's entirely possible to write this. This would enable you to avoid threads/callcc's, still not have to write status and headers yourself, but still have socket like IO for the body stream. If you wanted to add user threads (say a thread in an action to free up the webserver later on - good for really long running actions or the like) then all you'd need to do is ensure that the stream buffer -> immediate mode switching is thread safe. That's also not too hard. It would take a bit longer, but if this is really what you want, I'd be happy to write up a basic implementation for you. |
@tenderlove I think you want to be able to avoid this: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L102-L125 but this is not a rack problem really. It's a problem with your API. Your API has no indicator to the user of when writes really happen, or when the response stops and the stream starts - it's just implied. Your API already forces people to make a distinction between "things that set headers" and "things that stream", that is here: https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal/live.rb#L69 . Ok, so they can cause this "by accident" instead of explicitly declaring it - meaning they're in some ways more likely to trip on an exception to learn how the API works than to just have to add one line (I think it's a learnability (and thus usability) mistake). It's not possible to run two pieces of code at the same time, without some kind of threading style abstraction. This isn't an API limitation, it's as you know, the way the world works. Is there some other solution you have in mind where you can overcome this limitation? - To be clear, you want server code (sending status and headers) and middleware code (altering status and headers in after actions) to be running at the same time as user code (streaming data), I think without threads, if I understand your concerns correctly. Now we can serialize these operations a bit, and in these cases we have two options:
Or
You seem to indicate you would like to do the prior, which is as I describe above, possible with the request hijack side of this SPEC change. It means that the response is going to have to write the response data itself, rather than the server doing it, which is kind of ugly, but, it will work. Unfortunately, this also results in needing to disregard middleware "after actions" on the response, which will have no effect. In order to preserve middleware after actions (such as headers set after the application is called), then you have no choice but to use the second approach. It's either that, or you drop middleware altogether, and go right to the metal all the time. We could try to sugar the first case a little, by adding an additional server callback that says "here's the status and the headers, please write them for me", so that we're not writing status and header IO code - but that's so trivial, it may not be worth it. The only advantage to this could be, that if it was a user code -> server code callback, then it could be wrapped by supporting middleware using a similar approach to the chunkyio example I give above. It's not pretty, but middleware is middleware - it needs to sit in the middle somehow. I can draft something along these lines too, if that seems "more correct" for your use case. |
It would raise an exception. Header hash modification is forbidden after the headers have been dumped to the socket (which happens when the buffer thread wakes, but the buffer thread won't wake until there is data queued). But since we can "liberate" the socket from the webserver, then we're free to define those semantics.
Perfect. Exactly what I want.
No, I'd like a thread that's writing to the socket, an app thread, and a buffer between them. If I can say to the webserver "the socket is mine", then this should be perfect. Your code example is great. I'll read the telnet example you posted, but so far I am very happy. :-) |
The example code posted assumes that in the stack we always hand down the original env hash, never a new/modified one. Otherwise the io can't be added later on. And adding it before hand could lead to undefined behaviour if someone just starts writing to it. Maybe calling the callback should instead return the IO? Or we need some wrapper object in place for the IO, or something. |
For websockets, we should also consider updating the spec to no longer require the request body to be rewindable. Though that would have more implications for existing code. |
@rkh re env, yeah, i've been considering that too - I'll switch the should to a must on the returning it. maybe some language that it must be preserved / set / passed down after initialization (after the call to hijack). |
@rkh re rewindable and websockets, i'm not sure how practical that's going to be. clearly multipart and websockets have different usage patterns, maybe this is something where middleware that's reading from either needs to be more aware of the context. web servers supporting websocket pass-through certainly shouldn't be buffering the incoming body before hitting the app. |
Yeah, I mean, I think if people really want websockets, they should be fine with running that through something else, as it probably doesn't fit a request/response model anyhow. For SSE and alike, this would be gold already. |
@raggi the scenario/api @tenderlove has in mind would only work with a full/pre-header hijack, right? otherwise you need to hand down a callback for starting to stream, no? This would mean that Rails would need to take care of writing headers to the stream, potentially messing with middleware, right? |
Yeah, that's what I was getting at: It's not part of the spec. |
The spec deliberately ignores scheduling concerns. Servers are encouraged to clearly document how concurrency safe the sockets are. For most servers (all so far?) that just hand off a real socket, that's easy - the app is in control of scheduling and the socket. The socket could be handed off to an app managed thread instead of a server managed thread with no critical concerns. In the case of having to hand back a socket however, one would have to consider scheduling much more closely, even for "trivial cases". If the server accepts more sockets than it has threads for, then the hand back could easily result in deadlocks / over subscription / stale resources, etc. |
Wait, what, how do I hand back a socket? |
You don't, that's the point. If I provided an API for that, it would absolutely need some specification regarding scheduling concerns, which would have been far too big and prescriptive for a 1.x series change. |
I think I still don't really see the advantage of the partial hijack vs each-streaming then. At least not from a rails/sinatra/library perspective. |
I don't really see the use case for "handing back a socket". I was talking about the original thread not blocking/being reused or something. If the spec would contain a sentence stating that the rack handler permits capturing a large (whatever that means) number of sockets, wouldn't that be enough? |
@rkh the advantage of partial hijack is just for scheduling and io handoff when people can't understand how to deal with each. It's a cop-out, but I had a lot of smart people simply not getting how to yield each regularly. Regarding handing back a socket, we really have a lot of semantics to cover, is the socket even in a valid http state when it's handed back? A lot more encapsulation is required to make that any kind of sane. |
I don't want to hand a socket back, just a thread/control flow. |
How does the "for scheduling" part look like? Can I just safely do a |
@rkh it is to be defined by the server, but in all implementations I have seen so far, yes, absolutely. |
So, which servers understand that? @FooBarWidget has there been an open source passenger release by now that includes support? I want to experiment with moving Sinatra streaming over to this mechanism. |
Phusion Passenger 4.0 RC 4 open source version has been out for 2 weeks now. It supports the hijacking API. |
Sweet! |
@raggi btw, when I run your full hijack example in dev mode, I get a |
@raggi is there a automated test to confirm whether a |
== Changes Please note that this release includes a few potentially breaking changes. Of particular note are: * SessionHash is no longer a Hash sublcass * Rack::File cache_control parameter is removed in place of headers options Additonally, SPEC has been updated in several areas and is now at 1,2. A new SPEC section was introduced that provides two server-optional IO hijacking APIs. Further information on these APIs will be made available by the community in good time. In the mean time, some information can be found in the original pull request: rack/rack#481 * January 21st, 2013: Thirty third public release 1.5.0 * Introduced hijack SPEC, for before-response and after-response hijacking * SessionHash is no longer a Hash subclass * Rack::File cache_control parameter is removed, in place of headers options * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols * Rack::Utils cookie functions now format expires in RFC 2822 format * Rack::File now has a default mime type * rackup -b 'run Rack::File.new(".")', option provides command line configs * Rack::Deflater will no longer double encode bodies * Rack::Mime#match? provides convenience for Accept header matching * Rack::Utils#q_values provides splitting for Accept headers * Rack::Utils#best_q_match provides a helper for Accept headers * Rack::Handler.pick provides convenience for finding available servers * Puma added to the list of default servers (preferred over Webrick) * Various middleware now correctly close body when replacing it * Rack::Request#params is no longer persistent with only GET params * Rack::Request#update_param and #delete_param provide persistent operations * Rack::Request#trusted_proxy? now returns true for local unix sockets * Rack::Response no longer forces Content-Types * Rack::Sendfile provides local mapping configuration options * Rack::Utils#rfc2109 provides old netscape style time output * Updated HTTP status codes * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported * January 28th, 2013: Thirty fourth public release 1.5.1 * Rack::Lint check_hijack now conforms to other parts of SPEC * Added hash-like methods to Abstract::ID::SessionHash for compatibility * Various documentation corrections * February 7th, Thirty fifth public release 1.5.2 * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie * Fix CVE-2013-0262, symlink path traversal in Rack::File * Add various methods to Session for enhanced Rails compatibility * Request#trusted_proxy? now only matches whole stirngs * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns * URLMap host matching in environments that don't set the Host header fixed * Fix a race condition that could result in overwritten pidfiles * Various documentation additions
== Changes Please note that this release includes a few potentially breaking changes. Of particular note are: * SessionHash is no longer a Hash sublcass * Rack::File cache_control parameter is removed in place of headers options Additonally, SPEC has been updated in several areas and is now at 1,2. A new SPEC section was introduced that provides two server-optional IO hijacking APIs. Further information on these APIs will be made available by the community in good time. In the mean time, some information can be found in the original pull request: rack/rack#481 * January 21st, 2013: Thirty third public release 1.5.0 * Introduced hijack SPEC, for before-response and after-response hijacking * SessionHash is no longer a Hash subclass * Rack::File cache_control parameter is removed, in place of headers options * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols * Rack::Utils cookie functions now format expires in RFC 2822 format * Rack::File now has a default mime type * rackup -b 'run Rack::File.new(".")', option provides command line configs * Rack::Deflater will no longer double encode bodies * Rack::Mime#match? provides convenience for Accept header matching * Rack::Utils#q_values provides splitting for Accept headers * Rack::Utils#best_q_match provides a helper for Accept headers * Rack::Handler.pick provides convenience for finding available servers * Puma added to the list of default servers (preferred over Webrick) * Various middleware now correctly close body when replacing it * Rack::Request#params is no longer persistent with only GET params * Rack::Request#update_param and #delete_param provide persistent operations * Rack::Request#trusted_proxy? now returns true for local unix sockets * Rack::Response no longer forces Content-Types * Rack::Sendfile provides local mapping configuration options * Rack::Utils#rfc2109 provides old netscape style time output * Updated HTTP status codes * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported * January 28th, 2013: Thirty fourth public release 1.5.1 * Rack::Lint check_hijack now conforms to other parts of SPEC * Added hash-like methods to Abstract::ID::SessionHash for compatibility * Various documentation corrections * February 7th, Thirty fifth public release 1.5.2 * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie * Fix CVE-2013-0262, symlink path traversal in Rack::File * Add various methods to Session for enhanced Rails compatibility * Request#trusted_proxy? now only matches whole stirngs * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns * URLMap host matching in environments that don't set the Host header fixed * Fix a race condition that could result in overwritten pidfiles * Various documentation additions
== Changes Please note that this release includes a few potentially breaking changes. Of particular note are: * SessionHash is no longer a Hash sublcass * Rack::File cache_control parameter is removed in place of headers options Additonally, SPEC has been updated in several areas and is now at 1,2. A new SPEC section was introduced that provides two server-optional IO hijacking APIs. Further information on these APIs will be made available by the community in good time. In the mean time, some information can be found in the original pull request: rack/rack#481 * January 21st, 2013: Thirty third public release 1.5.0 * Introduced hijack SPEC, for before-response and after-response hijacking * SessionHash is no longer a Hash subclass * Rack::File cache_control parameter is removed, in place of headers options * Rack::Auth::AbstractRequest#scheme now yields strings, not symbols * Rack::Utils cookie functions now format expires in RFC 2822 format * Rack::File now has a default mime type * rackup -b 'run Rack::File.new(".")', option provides command line configs * Rack::Deflater will no longer double encode bodies * Rack::Mime#match? provides convenience for Accept header matching * Rack::Utils#q_values provides splitting for Accept headers * Rack::Utils#best_q_match provides a helper for Accept headers * Rack::Handler.pick provides convenience for finding available servers * Puma added to the list of default servers (preferred over Webrick) * Various middleware now correctly close body when replacing it * Rack::Request#params is no longer persistent with only GET params * Rack::Request#update_param and #delete_param provide persistent operations * Rack::Request#trusted_proxy? now returns true for local unix sockets * Rack::Response no longer forces Content-Types * Rack::Sendfile provides local mapping configuration options * Rack::Utils#rfc2109 provides old netscape style time output * Updated HTTP status codes * Ruby 1.8.6 likely no longer passes tests, and is no longer fully supported * January 28th, 2013: Thirty fourth public release 1.5.1 * Rack::Lint check_hijack now conforms to other parts of SPEC * Added hash-like methods to Abstract::ID::SessionHash for compatibility * Various documentation corrections * February 7th, Thirty fifth public release 1.5.2 * Fix CVE-2013-0263, timing attack against Rack::Session::Cookie * Fix CVE-2013-0262, symlink path traversal in Rack::File * Add various methods to Session for enhanced Rails compatibility * Request#trusted_proxy? now only matches whole stirngs * Add JSON cookie coder, to be default in Rack 1.6+ due to security concerns * URLMap host matching in environments that don't set the Host header fixed * Fix a race condition that could result in overwritten pidfiles * Various documentation additions
Hi all!
I had some time on the plane with no WiFi and some holiday left over, so...
@tenderlove please tell me if this matches up with the stuff you were describing to me at rubyconf
@tarcieri please take a look as you're also familiar with these things
@evanphx as with Tony, if you wouldn't mind
@chneukirchen @rkh you know why :-)
Please also highlight anyone else who would be worthwhile, or at least send them a link. I'd appreciate feedback.
What?
We still don't have an API that allows for IO-like streaming, and this is a straw man that addresses this within the confines of the rack 1.x spec. It's not an attempt to build out what I hope a 2.0 spec should be, but I am hoping that something like this will be enough to aid Rails 4s ventures, enable websockets, and a few other strategies. With HTTP2 around the corner, we'll likely want to revisit the IO API for 2.0, but we'll see how this plays out. Maybe IO wrapped around channels will be ok.
How?
There are two modes, one for "full hijacking", which is what you'd want for websockets and various hacks, and another mode which I think is closer to what @tenderlove really wants, which is a post-headers hijack. In both cases, the application assumes control of the socket after it has been hijacked, and there is currently no API for "giving sockets back to the server", so it's recommended that folks use Connection:close with this mechanism.
Examples?
I've implemented this spec in Thin (oddly enough) because it was the only reasonable server I had checked out on my machine as I as writing it. I suspect the Puma implementation will be even simpler. It's all in this branch: https://github.com/raggi/thin/commits/hijack
Specifically the server piece lives here: https://github.com/raggi/thin/blob/e04855459cb42fd98a0a483075f8337cafe6d949/lib/thin/connection.rb
There are two examples, one for each of the hijack "modes" (pre headers and post headers):
The examples implement a trivial HTTP -> telnet chat interface. You make any HTTP request to "join" and the response dumps you into the single global line buffered chat room.
Limitations
I have not tried to spec out a full IO API, and I'm not sure that we should.
I have not tried to respec all of the HTTP / anti-HTTP semantics.
There is no spec for buffering or the like.
The intent is that this is an API to "get out the way".
Similar implementations
http://golang.org/pkg/net/http/#Hijacker
Additional example
(this is a middleware example, that ships outgoing IO in chunked encoding - good for streamed xhr or the like)