fix: http clients detect 4xx as errors by default#498
Conversation
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
9fbf479 to
600a05a
Compare
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
|
I haven't read the code yet, but does it retry on server error and fail immediately on client error? |
No, we do retry for all error codes. |
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
Well, I tried running JS crawlee against httpbin/status/{4xx, 5xx} with We should keep the parity here... |
|
we also have some special handling for blocked status codes like 403 when it comes to sessions (and this needs to be configurable) https://github.com/search?q=repo%3Aapify%2Fcrawlee%20blocked%20status%20codes&type=code |
6c3e0a4 to
38af191
Compare
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
e5d6893 to
898b436
Compare
There was a problem hiding this comment.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
|
Update:
AFAIK this is already implemented. |
src/crawlee/http_clients/_httpx.py
Outdated
| This client uses the `HTTPX` library to perform HTTP requests in crawlers (`BasicCrawler` subclasses) | ||
| and to manage sessions, proxies, and error handling. | ||
|
|
||
| The client raises an `HttpStatusCodeError` when it encounters an error response, defined by default as any |
There was a problem hiding this comment.
I think this should apply to any http client, right? Couldn't we just put this into the base client doc and link to it here?
There was a problem hiding this comment.
It should, but the implementation of such behavior is in the specific HTTP clients, not the base. If someone creates their own HTTP client, they are not required to implement it in the same way. As a result, the docstring may not be true.
There was a problem hiding this comment.
Valid point. In that case, we may want to pull this logic up into BaseHttpClient though. It seems unlikely you'd want to change this behavior, and at the same time, we don't want to make http-client-authors implement this over and over again.
Co-authored-by: Jan Buchar <[email protected]>
Description
Issues
Testing
Checklist