
[Bug] Python SDK Not Catching All scraping methods failed Error #851

Open
@brian-carnot

Description

Describe the Bug
When scraping a page fails with this error, the SDK does not surface the exception; scrape_url instead returns a result whose content is simply empty.

To Reproduce
Steps to reproduce the issue:

  1. Wait for a URL to raise the exception "All scraping methods failed for url:" on the dashboard
  2. Inspect the return value of the .scrape_url(url) method (a minimal call is sketched below)
  3. Example result: {'content': '', 'markdown': '', 'linksOnPage': [], 'metadata': {'sourceURL': 'https://ycombinator.com/people', 'pageStatusCode': 200}}
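
For reference, a minimal call that produces a result of that shape, assuming the standard firecrawl-py entry point FirecrawlApp (the API key is a placeholder):

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")  # placeholder key
result = app.scrape_url("https://ycombinator.com/people")
print(result)  # observed: {'content': '', 'markdown': '', 'linksOnPage': [], ...}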

Expected Behavior
An exception should be thrown by the scrape_url method instead of returning empty content.
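
As a stopgap until the SDK raises on its own, callers can detect the silent failure and raise manually. A minimal sketch; the empty-field check is an assumption based on the return shape shown above, and ScrapeFailedError is a hypothetical helper, not part of the SDK:

from firecrawl import FirecrawlApp

class ScrapeFailedError(RuntimeError):
    """Hypothetical error for the silent 'All scraping methods failed' case."""

def scrape_or_raise(app: FirecrawlApp, url: str) -> dict:
    result = app.scrape_url(url)
    # An empty 'content' and 'markdown' pair matches the failure
    # signature observed in this issue.
    if not result.get("content") and not result.get("markdown"):
        raise ScrapeFailedError(f"All scraping methods failed for URL: {url}")
    return result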


Environment:

  • OS: Linux (python:3.12.3-bookworm image)
  • Firecrawl Python SDK Version: ^0.0.20
  • Node.js Version: v23.1.0

Logs

{
    "url": "https://ycombinator.com/people",
    "type": "scrape",
    "method": "fetch",
    "result": {
        "error": null,
        "success": false,
        "time_taken": 591,
        "response_code": 200,
        "response_size": 55917
    },
    "createdAt": "2024-10-31T05:50:34.653524+00:00"
}
{
    "type": "error",
    "stack": "Error: All scraping methods failed for URL: https://ycombinator.com/people\n    at scrapSingleUrl (/app/dist/src/scraper/WebScraper/single_url.js:378:19)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /app/dist/src/scraper/WebScraper/index.js:66:32\n    at async Promise.all (index 0)\n    at async WebScraperDataProvider.convertUrlsToDocuments (/app/dist/src/scraper/WebScraper/index.js:64:13)\n    at async Promise.all (index 0)\n    at async WebScraperDataProvider.processLinks (/app/dist/src/scraper/WebScraper/index.js:208:40)\n    at async WebScraperDataProvider.handleSingleUrlsMode (/app/dist/src/scraper/WebScraper/index.js:174:25)\n    at async runWebScraper (/app/dist/src/main/runWebScraper.js:77:23)\n    at async startWebScraperPipeline (/app/dist/src/main/runWebScraper.js:13:13)\n    at async processJob (/app/dist/src/services/queue-worker.js:236:44)\n    at async processJobInternal (/app/dist/src/services/queue-worker.js:72:24)\n    at async /app/dist/src/services/queue-worker.js:174:39\n    at async /app/dist/src/services/queue-worker.js:161:25",
    "message": "All scraping methods failed for URL: https://ycombinator.com/people",
    "createdAt": "2024-10-31T05:50:35.242789+00:00"
}

Additional Context

The error is inconsistent to reproduce, and I am not hitting the rate limit for my API key.
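
Since the failure is intermittent, a retry wrapper around the same check can recover transient cases. A sketch reusing the hypothetical scrape_or_raise helper above; the attempt count and delay are arbitrary:

import time

def scrape_with_retries(app, url, attempts=3, delay=2.0):
    for _ in range(attempts):
        result = app.scrape_url(url)
        if result.get("content") or result.get("markdown"):
            return result
        time.sleep(delay)  # brief backoff before retrying
    raise ScrapeFailedError(f"All scraping methods failed for URL: {url}")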


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
