
feat: integrate proxies into PlaywrightCrawler #325

Merged: vdusek merged 5 commits into master from integrate-proxies-into-pw on Jul 18, 2024

@vdusek (Collaborator) commented on Jul 18, 2024

Description

  • feat: integrate proxies into PlaywrightCrawler

Issues

  • N/A

Testing

  • Verify it with the following script, replacing the placeholder proxy URL with a real one:
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration


async def main() -> None:
    proxy_configuration = ProxyConfiguration(proxy_urls=['http://USERNAME:PASSWORD@HOSTNAME:PORT'])
    crawler = PlaywrightCrawler(proxy_configuration=proxy_configuration)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        data = {
            'url': context.request.url,
            'title': await context.page.title(),
            'content': await context.page.content(),
        }

        context.log.info(f'Extracted data: {data}')
        await context.push_data(data)

    await crawler.run(['http://httpbin.org/ip'])


if __name__ == '__main__':
    asyncio.run(main())
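
Since the script above targets http://httpbin.org/ip, one way to confirm that traffic actually went through the proxy is to pull the reported origin address out of the page content. The following is a minimal sketch, not part of this PR: `extract_origin_ip` is a hypothetical helper, and it assumes the browser returns the JSON body either raw or wrapped in generated HTML.

```python
import json
import re
from typing import Optional


def extract_origin_ip(page_content: str) -> Optional[str]:
    """Return the 'origin' field of an httpbin /ip response, or None if absent.

    Hypothetical helper for illustration; not part of crawlee.
    """
    # The content may be raw JSON or JSON wrapped in browser-generated HTML,
    # so look for the first flat JSON object containing an "origin" key.
    match = re.search(r'\{[^{}]*"origin"[^{}]*\}', page_content)
    if match is None:
        return None
    try:
        return json.loads(match.group(0)).get('origin')
    except json.JSONDecodeError:
        # The matched text looked like an object but was not valid JSON.
        return None
```

Inside the request handler, this could be called as `extract_origin_ip(await context.page.content())` and the result compared against the proxy's expected exit address, which is not necessarily the same as the HOSTNAME in the proxy URL.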

Checklist

  • Changes are described in the CHANGELOG.md
  • CI passed

@vdusek added the bug, t-tooling, and adhoc labels on Jul 18, 2024
@vdusek added this to the "94th sprint - Tooling team" milestone on Jul 18, 2024
@vdusek requested a review from janbuchar on Jul 18, 2024 12:00
@vdusek self-assigned this on Jul 18, 2024
@github-actions bot added the tested label on Jul 18, 2024
@vdusek force-pushed the integrate-proxies-into-pw branch from d75ade2 to 15179de on Jul 18, 2024 12:05
@vdusek requested a review from janbuchar on Jul 18, 2024 12:46
@vdusek merged commit 2e072b6 into master on Jul 18, 2024
@vdusek deleted the integrate-proxies-into-pw branch on Jul 18, 2024 13:00

Labels

  • adhoc: Ad-hoc unplanned task added during the sprint.
  • bug: Something isn't working.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
  • tested: Temporary label used only programmatically for some analytics.


3 participants