Is it possible to pass in a custom transport? #801

@tleyden

I'm trying to add response caching via hishel transports, but I don't see a way to customize the transport used by the Crawlee HTTP client, since it is created internally in _get_client():

    def _get_client(self, proxy_url: str | None) -> httpx.AsyncClient:
        """Helper to get a HTTP client for the given proxy URL.

        If the client for the given proxy URL doesn't exist, it will be created and stored.
        """
        if proxy_url not in self._client_by_proxy_url:
            # Prepare a default kwargs for the new client.
            kwargs: dict[str, Any] = {
                'transport': _HttpxTransport(
                    proxy=proxy_url,
                    http1=self._http1,
                    http2=self._http2,
                ),
                'proxy': proxy_url,
                'http1': self._http1,
                'http2': self._http2,
            }

            # Update the default kwargs with any additional user-provided kwargs.
            kwargs.update(self._async_client_kwargs)

            client = httpx.AsyncClient(**kwargs)
            self._client_by_proxy_url[proxy_url] = client

        return self._client_by_proxy_url[proxy_url]

Is there a way to customize the httpx client transport that I'm not seeing?
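
One detail in the snippet above is that `kwargs.update(self._async_client_kwargs)` runs after the defaults are set, so a user-supplied `transport` key would replace the default `_HttpxTransport`. This assumes the constructor forwards extra keyword arguments into `_async_client_kwargs`, which the snippet does not show. The merge order can be sketched with plain dictionaries; `build_client_kwargs` and the string placeholders are stand-ins for illustration, not Crawlee's or httpx's API:

```python
from __future__ import annotations

from typing import Any


def build_client_kwargs(
    proxy_url: str | None,
    user_kwargs: dict[str, Any],
) -> dict[str, Any]:
    """Mimic the merge order used in _get_client() with plain dicts."""
    # Defaults; the string stands in for the internally built _HttpxTransport.
    kwargs: dict[str, Any] = {
        'transport': f'default-transport(proxy={proxy_url})',
        'proxy': proxy_url,
    }
    # Same step as kwargs.update(self._async_client_kwargs): user keys win.
    kwargs.update(user_kwargs)
    return kwargs


default_kwargs = build_client_kwargs(None, {})
custom_kwargs = build_client_kwargs(None, {'transport': 'my-cache-transport'})

print(default_kwargs['transport'])
print(custom_kwargs['transport'])
```

Whether a transport passed this way is actually honored end to end, in particular when a proxy is also set, is not confirmed here.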

Or, instead of using a third-party library, does Crawlee have a native method for storing long-term persistent caches of responses?

A somewhat related question, in case it's not possible to customize the transport: is overriding HttpxHttpClient._get_client() the recommended way to use a custom httpx client, or is there a cleaner way?

    hishel_client = await _create_hishel_client(cache_path)


    class HishelCacheClient(HttpxHttpClient):
        def _get_client(self, proxy_url: str | None) -> httpx.AsyncClient:
            # Always return the caching client; proxy_url is ignored here.
            return hishel_client


    http_client = HishelCacheClient()

    crawler = BeautifulSoupCrawler(
        http_client=http_client,
    )
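
The override above returns the same client for every `proxy_url`, so the per-proxy lookup done by the original `_get_client()` is lost. A minimal sketch of keeping that lookup while swapping how each client is built follows. `PerProxyClientCache` and `make_fake_client` are hypothetical names for illustration, and the "client" is a plain dict rather than a real httpx or hishel object:

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

ClientT = TypeVar('ClientT')


class PerProxyClientCache(Generic[ClientT]):
    """Hypothetical helper: builds and stores one client per proxy URL."""

    def __init__(self, factory: Callable[[str | None], ClientT]) -> None:
        self._factory = factory
        self._client_by_proxy_url: dict[str | None, ClientT] = {}

    def get(self, proxy_url: str | None) -> ClientT:
        # Build the client on first use, then reuse it for the same proxy URL.
        if proxy_url not in self._client_by_proxy_url:
            self._client_by_proxy_url[proxy_url] = self._factory(proxy_url)
        return self._client_by_proxy_url[proxy_url]


def make_fake_client(proxy_url: str | None) -> dict[str, str | None]:
    # Stand-in for code that would build a caching client for this proxy.
    return {'proxy': proxy_url, 'cache': 'enabled'}


cache = PerProxyClientCache(make_fake_client)
direct = cache.get(None)
proxied = cache.get('http://proxy.example:8080')
```

Under these assumptions, a `_get_client()` override could return `cache.get(proxy_url)` instead of a single shared client, so each proxy URL still gets its own client.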

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request), t-tooling (Issues with this label are in the ownership of the tooling team)
