Add ignore_http_error_status_codes and additional_http_error_status_codes arguments to PlaywrightCrawler #953

@Pijukatel

Description

Currently, the arguments that control how HTTP status codes are handled are available only in the static HTTP-based crawlers. They can be passed to the crawler's __init__, but PlaywrightCrawler does not accept them. For example, to ignore a 403 error with ParselCrawler, a user can write:

crawler = ParselCrawler(..., ignore_http_error_status_codes = {403})

but in PlaywrightCrawler they have to do something like this:

crawler = PlaywrightCrawler(...)
crawler._http_client._ignore_http_error_status_codes = {403}

That is very confusing, since it relies on private attributes, and most users will never find out about it. PlaywrightCrawler's behavior should be aligned with the other crawlers, and both arguments should be accepted in __init__.
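
To make the requested behavior concrete, here is a minimal, self-contained sketch of how the two arguments could be expected to interact. It is not crawlee's implementation: `StatusCodePolicy` and `is_error` are hypothetical names, and the sketch assumes that every 4xx and 5xx code is an error by default and that `additional_http_error_status_codes` takes precedence over `ignore_http_error_status_codes`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StatusCodePolicy:
    """Hypothetical helper (not part of crawlee) modelling the two arguments."""

    ignore_http_error_status_codes: frozenset[int] = frozenset()
    additional_http_error_status_codes: frozenset[int] = frozenset()

    def is_error(self, status_code: int) -> bool:
        # Assumption: explicitly added codes are always treated as errors.
        if status_code in self.additional_http_error_status_codes:
            return True
        # Assumption: explicitly ignored codes are never treated as errors.
        if status_code in self.ignore_http_error_status_codes:
            return False
        # Assumption: by default, any 4xx or 5xx response is an error.
        return 400 <= status_code <= 599


# The call this issue asks for would then look like the ParselCrawler one:
#   crawler = PlaywrightCrawler(..., ignore_http_error_status_codes={403})
policy = StatusCodePolicy(ignore_http_error_status_codes=frozenset({403}))
```

With this policy, a 403 response would be passed on to the request handler, while a 404 or 500 would still be treated as an error.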

Labels

enhancement: New feature or request.
t-tooling: Issues with this label are in the ownership of the tooling team.
