When a request fails in PlaywrightCrawler, my @crawler.failed_request_handler runs (the context.log.info(...) lines appear in the output), but await context.push_data(...) does not create any dataset rows. Logging works; only the dataset push appears to be ignored.
Minimal repro attached below.
```python
import asyncio

from crawlee.crawlers import (
    PlaywrightCrawler,
    PlaywrightCrawlingContext,
    BasicCrawlingContext,
)


async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        max_request_retries=2,
        headless=True,
        browser_type='chromium',
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    @crawler.failed_request_handler
    async def failed_request_handler(context: BasicCrawlingContext, error: Exception) -> None:
        context.log.info(f'failed_request_handler: processing {context.request.url} ...')
        await context.push_data(
            dataset_name='failed_request_handler_errors',
            data={
                'failed_url': context.request.url,
                'label': context.request.label,
                'error_type': type(error).__name__,
                'error_message': str(error),
                'retry_count': context.request.retry_count,
                'status': 'failed',
            },
        )

    await crawler.run(['https://www.info.gouv.fr/totalnonsense'])


if __name__ == '__main__':
    asyncio.run(main())
```
I'm using crawlee[playwright]>=1.0.4.
The relevant part of the documentation is https://crawlee.dev/python/docs/guides/request-router#failed-request-handler