-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storage: No timeouts cause indefinite hanging #13
Comments
I believe we are experiencing a similar problems every since 1.25.0 was released last week. We are accessing blobs directly and getting random 504 errors. The code can be as simple as: Logs show errors like this:
|
@mcsimps2 Did you begin to see this behavior recently (with a particular release) or has it been ongoing? Edit: Assigning priority and type to keep the bots happy, please change to something more appropriate once you are able to look at this in more detail. |
I have had hanging threads on both google-cloud-storage == 1.24.1 and 1.25.0. However, this problem has only manifested recently (the first time I ran into it was this earlier this week). It first occurred on 1.24.1, so I upgraded to 1.25.0. The Google servers have usually been responsive enough that I've never run into a timeout issue at all. Thus, I believe these types of loopholes have existed in the code over several versions, yet perhaps something recent occurred that decreased the timeliness/responsiveness of the servers, thereby causing these In the meantime, if there are other users experiencing a similar problem (@akuttig-hs), especially with threads hanging indefinitely from the
The monkeypatch was the first thing I tried, and it definitely helped alleviate the problem since the proper Timeout exception is being thrown by the requests library, whereas before, it would never throw an exception and just wait forever for a response. That said, in the end, it's not a real solution to the problem. |
Does the issue have to do with the The snippet applies the default If the above is indeed the cause, it would affect versions Since we more or less always want to have some sort of a timeout, the above logic in resumable media could be modified to apply the default timeout on Edit: The code path from the issue description:
does not sound like going through FWIW, BigQuery had similar problems (example), because there used to be no way of specifying a timeout to use in the transport layer. Edit 2: Edit 3: |
Closes #434. This PR adds a non-None default timeout to `AuthorizedSession.request()` to prevent requests from hanging indefinitely in a default case. Should help with googleapis/google-cloud-python#10182
@mcsimps2 @akuttig-hs Would you mind trying an upgrade to |
Thanks for the quick response! @plamut I don't believe all requests still have a non-null Timeout with the recent PR you submitted that patched
So in the end, A quick code just for my sanity to make sure I understand kwargs correctly
So perhaps, the code should be like this:
|
That is correct. There was actually a PR commit in google-auth that would override any The ruling was that if a library passes a |
Ah, you’re perfectly right. I should’ve looked at the PR more closely, my apologies! |
Hey, |
@maykmayx I can speak only generally, because I am not too familiar with this particular library (storage). The following code drives the retries from inside the If the target callable blocks indefinitely, a retry will not have a chance to act, which is the issue that is being dealt with here (#10199 will add customizable timeouts to If the target callable does return, but with an error, from google.api_core.retry import Retry
retry = Retry()
retry = retry.with_predicate(lambda exc: "foobar" in str(exc)) From what I can see from the linked sample, |
Library: Google Cloud Storage
Environment: Win7 and OSX
Python Version: 3.7.3
Google Cloud Storage Version: 1.25.0
I don't believe all methods of the storage client are using timeouts. I've come across several situations where an upload or download has completely hung because of this. Unfortunately, there's no stack trace because the thread is just hanging waiting for a response. Just from a brief code inspection, I can identify an example area where a timeout is not being honored:
Bucket.get_blob
callsblob.reload()
, which then calls the following without specifying a timeout:This then calls
JSONConnection.api_request
(defaults timeout to None) ->JSONConnection._make_request
(defaults timeout to None) ->JSONConnection._do_request
(defaults timeout to None) ->AuthorizedSession.request
(defaults timeout to None), which makes the ultimate call to therequests.Session
object with aNone
timeout. The end result is that a request is made without a timeout, which can very easily cause a thread to hang indefinitely waiting for a response.I realize that it would be a huge pain to try and find all possible
None
timeout paths and patch them, but I at least wanted to bring this to attention. I'm currently wrapping every call to the Google Cloud Python library with a custom timeout function that forcefully stops execution after my own specified timeout, since I have no way to pass that in to the library. A fix that either allows developers to pass in their custom timeout, either to each function called (e.g.get_blob(...)
) or to theclient
object so that it's passed with every single request in the underlyinghttp
instance, would be amazing. (In this sense, I suppose this issue is a mix of a bug and a feature request, so my apologies if I chose the incorrect category.)Storage code for resumable uploads that makes the call to
resumable_media/requests/_helpers.py
, more specifically thehttp_request
function, seems to do much better since that function sets a default timeout of (60, 61) as opposed toNone
.The text was updated successfully, but these errors were encountered: