doc: extract URL as string in examples #349

Closed
vdusek wants to merge 1 commit into master from update-docs-str-url

Conversation

@vdusek
Collaborator

@vdusek vdusek commented Jul 23, 2024

No description provided.

@vdusek vdusek added documentation Improvements or additions to documentation. t-tooling Issues with this label are in the ownership of the tooling team. adhoc Ad-hoc unplanned task added during the sprint. labels Jul 23, 2024
@vdusek vdusek added this to the 94th sprint - Tooling team milestone Jul 23, 2024
@vdusek vdusek requested a review from janbuchar July 23, 2024 14:59
@vdusek vdusek self-assigned this Jul 23, 2024
  # Extract data from the page.
  data = {
-     'url': context.request.url,
+     'url': str(context.request.url),
Member

So was this a string before #343? If so, it feels quite breaking. Also, I am honestly not sure it's a good idea to have something other than a string URL in the context.
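
To illustrate the breakage concern, here is a minimal sketch. `FakeUrl` and `is_https` are hypothetical names introduced for illustration; `FakeUrl` is a stand-in for any non-string URL object and is not the class that crawlee or pydantic actually uses. It only shows how handler code written against a plain `str` can fail once the attribute changes type.

```python
class FakeUrl:
    """Hypothetical stand-in for a non-string URL object (an assumption, not a real API)."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        return self._raw

    def __repr__(self) -> str:
        return f"Url({self._raw!r})"


def is_https(url: str) -> bool:
    # Handler code written back when the URL was a plain string.
    return url.startswith("https://")


plain = "https://apify.com/"
wrapped = FakeUrl(plain)

print(is_https(plain))  # works: str has .startswith

try:
    is_https(wrapped)  # the stand-in object exposes no str methods
except AttributeError as exc:
    print(f"breaks: {exc}")

print(is_https(str(wrapped)))  # explicit str() restores the old behaviour
```

Under these assumptions, every call site that relies on string methods would need an explicit `str(...)`, which is the change this PR made in the docs examples.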

@janbuchar
Collaborator

Hm... what happens if you don't stringify the URL?

@vdusek
Collaborator Author

vdusek commented Jul 23, 2024

@janbuchar

> Hm... what happens if you don't stringify the URL?

Log:

[crawlee.playwright_crawler.playwright_crawler] INFO  Extracted data: {'url': Url('https://apify.com/'), 'title': 'Apify: Full-stack web scraping and data extraction platform'}

Dataset (it's converted to string somewhere):

{
  "url": "https://apify.com/",
  "title": "Apify: Full-stack web scraping and data extraction platform"
}
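
The thread does not say where the conversion to a string happens. One plausible mechanism, sketched below purely as an assumption, is a JSON serializer with a `str` fallback: logging a dict uses each value's `repr`, while `json.dumps(..., default=str)` calls `str()` on anything it cannot encode natively. `FakeUrl` is a hypothetical stand-in class, not crawlee's or pydantic's URL type.

```python
import json


class FakeUrl:
    """Hypothetical stand-in for a non-string URL object (an assumption for illustration)."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        return self._raw

    def __repr__(self) -> str:
        return f"Url({self._raw!r})"


data = {
    "url": FakeUrl("https://apify.com/"),
    "title": "Apify: Full-stack web scraping and data extraction platform",
}

# Formatting a dict uses repr() on its values, so the wrapper type is visible in the log line.
log_line = f"Extracted data: {data}"
print(log_line)

# A serializer with default=str falls back to str() for unknown types,
# so the stored record contains a plain string.
serialized = json.dumps(data, default=str, indent=2)
print(serialized)
```

Without the `default=str` fallback, `json.dumps(data)` would raise a `TypeError` for the same input, so a silent conversion like the one observed suggests that some fallback of this kind is in place.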

@janbuchar
Collaborator

Well, honestly that's better than what I expected.

However, @B4nan also raises a valid point. We may want to keep AnyHttpUrl for the url attribute, so that we still have validation, and add a url_string property 🤷.
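
A rough sketch of that idea follows: keep a validated URL object on the request and expose a string view next to it. Everything here is an assumption for illustration. The `Request` class and `parse_http_url` helper are hypothetical, and the standard library's `urlsplit` stands in for pydantic's `AnyHttpUrl` validation so that the example runs without third-party packages.

```python
from dataclasses import dataclass
from urllib.parse import SplitResult, urlsplit


def parse_http_url(raw: str) -> SplitResult:
    """Validate an HTTP(S) URL using the standard library (stand-in for AnyHttpUrl)."""
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP(S) URL: {raw!r}")
    return parts


@dataclass(frozen=True)
class Request:
    """Hypothetical request model; not crawlee's actual Request class."""

    url: SplitResult  # validated, structured URL

    @classmethod
    def from_url(cls, raw: str) -> "Request":
        return cls(url=parse_http_url(raw))

    @property
    def url_string(self) -> str:
        # Plain-string view for code that expects a str.
        return self.url.geturl()


request = Request.from_url("https://apify.com/")
data = {"url": request.url_string}
print(data)
```

The trade-off in this design is that callers must know which of the two attributes to use, which is part of why the thread ends with `context.request.url` staying a string.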

@vdusek
Collaborator Author

vdusek commented Jul 24, 2024

Closing; context.request.url should stay a string.

@vdusek vdusek closed this Jul 24, 2024
@vdusek vdusek deleted the update-docs-str-url branch July 24, 2024 08:20
3 participants