fix(scraper): improve charset detection regex to accurately parse met…#1265

Merged
mogery merged 1 commit into firecrawl:main from GrassH:fix-html4-charset on Feb 26, 2025
GrassH (Contributor) commented Feb 26, 2025

Improves charset detection to fully support legacy HTML4 declarations while maintaining HTML5 compatibility.

The detection now works with both the HTML4-style declaration:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
and the HTML5-style declaration:
<meta charset="ISO-8859-1">
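
For readers who want to see the idea in isolation, here is a minimal sketch of a regex that accepts both declaration styles. It is not the regex merged in this PR; the function name `detectCharset` and the pattern itself are assumptions made for illustration only.

```typescript
// Hypothetical helper for illustration; not the code changed in this PR.
// Returns the charset named in the first <meta> tag that declares one,
// or undefined if no such declaration is found.
function detectCharset(html: string): string | undefined {
  // <meta\b        : start of a meta tag
  // [^>]*?         : any attributes before the charset token, staying inside the tag
  // \bcharset\s*=  : either the HTML5 `charset` attribute or the `charset=` parameter
  //                  inside an HTML4 `content="text/html; charset=..."` value
  // ["']?\s*       : optional opening quote (present in the HTML5 form)
  // ([\w.:-]+)     : the encoding label itself
  const pattern = /<meta\b[^>]*?\bcharset\s*=\s*["']?\s*([\w.:-]+)/i;
  const match = html.match(pattern);
  return match ? match[1] : undefined;
}

// Usage with the two declarations from the description above.
const html4 = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">';
const html5 = '<meta charset="ISO-8859-1">';
console.log(detectCharset(html4), detectCharset(html5));
```

A single pattern works here because in both styles the encoding label directly follows `charset=` inside a `<meta>` tag; only the quoting differs. A production implementation would likely also need to handle cases this sketch ignores, such as a `charset` substring inside an unrelated attribute value.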

nickscamara requested a review from mogery February 26, 2025 16:08
mogery merged commit 7bf04d4 into firecrawl:main Feb 26, 2025
16 of 17 checks passed
mogery (Member) commented Feb 26, 2025

Thank you!
