Some notes/resources for bypassing anti-bot/scraping features on Cloudflare, Akamai, etc.
- https://www.zenrows.com/blog/bypass-cloudflare
- https://www.zenrows.com/blog/bypass-cloudflare#waiting-room-reverse-engineer
- Lots of neat content in this section about the high-level aspects of how the Cloudflare JavaScript check works, some of its obfuscation techniques, and the general flow of things.
- https://www.zenrows.com/blog/bypass-akamai
- https://www.zenrows.com/blog/bypass-akamai#akamais-javascript-challenge-explained
- Less detailed than the Cloudflare post, but still useful
- https://github.com/FlareSolverr/FlareSolverr
- > Proxy server to bypass Cloudflare protection
- > FlareSolverr starts a proxy server, and it waits for user requests in an idle state using few resources. When some request arrives, it uses Selenium with the undetected-chromedriver to create a web browser (Chrome). It opens the URL with user parameters and waits until the Cloudflare challenge is solved (or timeout). The HTML code and the cookies are sent back to the user, and those cookies can be used to bypass Cloudflare using other HTTP clients.
- https://github.com/ultrafunkamsterdam/undetected-chromedriver
- > Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. Automatically downloads the driver binary and patches it.
- https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth
- > A plugin for puppeteer-extra and playwright-extra to prevent detection.
- > puppeteer-extra with stealth passes all public bot tests.
  >
  > Please note: I consider this a friendly competition in a rather interesting cat and mouse game. If the other team (👋) wants to detect headless chromium there are still ways to do that (at least I noticed a few, which I'll tackle in future updates).
  >
  > It's probably impossible to prevent all ways to detect headless chromium, but it should be possible to make it so difficult that it becomes cost-prohibitive or triggers too many false-positives to be feasible.
  >
  > If something new comes up or you experience a problem, please do your homework and create a PR in a respectful way (this is Github, not reddit) or I might not be motivated to help. :)
- https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth/evasions
- https://github.com/0xdevalias/chatgpt-source-watch : Analyzing the evolution of ChatGPT's codebase through time with curated archives and scripts.
- Deobfuscating / Unminifying Obfuscated Web App Code (0xdevalias' gist)
- Reverse Engineering Webpack Apps (0xdevalias' gist)
- Reverse Engineered Webpack Tailwind-Styled-Component (0xdevalias' gist)
- React Server Components, Next.js v13+, and Webpack: Notes on Streaming Wire Format (`__next_f`, etc) (0xdevalias' gist)
- Fingerprinting Minified JavaScript Libraries / AST Fingerprinting / Source Code Similarity / Etc (0xdevalias' gist)
- Debugging Electron Apps (and related memory issues) (0xdevalias' gist)
- devalias' Beeper CSS Hacks (0xdevalias' gist)
- Reverse Engineering Golang (0xdevalias' gist)
- Reverse Engineering on macOS (0xdevalias' gist)
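
The FlareSolverr flow quoted earlier (send it a URL, let it solve the challenge in a real browser, then reuse the returned cookies in another HTTP client) can be sketched roughly as follows. This is a minimal sketch, not a definitive client: it assumes a FlareSolverr instance is listening on `http://localhost:8191/v1` and that the reply carries a `solution` object with `cookies` and `userAgent` fields. The helper names (`build_request`, `solve`, `reuse_headers`) are made up for illustration.

```python
import json
import urllib.request

# Assumption: FlareSolverr is running locally on its default port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_request(url, max_timeout_ms=60000):
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def solve(url, endpoint=FLARESOLVERR_URL):
    """POST the command to FlareSolverr and return the parsed JSON reply."""
    body = json.dumps(build_request(url)).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    # The client-side timeout is kept longer than maxTimeout so the browser
    # has time to finish (or give up on) the challenge before we do.
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.load(resp)


def reuse_headers(reply):
    """Turn a solved reply into headers for a plain HTTP client.

    Assumes reply["solution"] contains a list of cookie dicts with
    "name"/"value" keys, plus the browser's "userAgent" string.
    """
    solution = reply["solution"]
    cookie_header = "; ".join(
        f"{c['name']}={c['value']}" for c in solution["cookies"]
    )
    return {"Cookie": cookie_header, "User-Agent": solution["userAgent"]}
```

In use, `reuse_headers(solve("https://example.com/"))` would give headers to attach to later requests made with any HTTP client. The User-Agent is carried along with the cookies because, as far as I understand it, the clearance cookie is only accepted when the client presents the same User-Agent as the browser that solved the challenge.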