📄️ Architecture overview
An overview of the core components of the Crawlee library and its architecture.
📄️ Avoid getting blocked
How to avoid getting blocked when scraping.
📄️ Logging in with a crawler
How to log in to websites with Crawlee.
📄️ Creating a web archive
How to create a Web ARChive (WARC) with Crawlee.
📄️ Error handling
How to handle errors that occur during web crawling.
📄️ HTTP clients
Learn about Crawlee's HTTP client architecture, how to switch between different implementations, and create custom HTTP clients for specialized web scraping needs.
📄️ HTTP crawlers
Learn about Crawlee's HTTP crawlers including BeautifulSoup, Parsel, and raw HTTP crawlers for efficient server-rendered content extraction without JavaScript execution.
📄️ Playwright crawler
Learn how to use PlaywrightCrawler for browser-based web scraping.
📄️ Adaptive Playwright crawler
Learn how to use the Adaptive Playwright crawler to automatically switch between browser-based and HTTP-only crawling.
📄️ Playwright with Stagehand
How to integrate Stagehand AI-powered automation with PlaywrightCrawler.
📄️ Proxy management
How to use proxies to avoid IP blocking.
📄️ Request loaders
How to manage the requests your crawler will go through.
📄️ Request router
Learn how to use the Router class to organize request handlers, error handlers, and pre-navigation hooks in Crawlee.
📄️ Running in a web server
How to run a Crawlee crawler inside a web server.
📄️ Scaling crawlers
Learn how to scale your crawlers by controlling concurrency and limiting requests per minute.
📄️ Service locator
Crawlee's service locator is a central registry for global services, managing and providing access to them throughout the whole framework.
📄️ Session management
How to manage cookies, proxy IP rotation, and more.
📄️ Storage clients
How to work with storage clients in Crawlee, including the built-in clients and how to create your own.
📄️ Storages
How to work with storages in Crawlee, how to manage requests, and how to store and retrieve scraping results.
📄️ Trace and monitor crawlers
Learn how to instrument your crawlers with OpenTelemetry to trace request handling, identify bottlenecks, monitor performance, and visualize telemetry data in Jaeger.