Google changed their page design, which broke this scraper. I have a lot going on right now and don't have time to fix it. I will try to return in June 2024 to create a patch, or an alternative solution using the official Google API. Sorry, and thanks.
Project Description
A multithreaded Google Images scraper that does not need Chromium. It only requires the Python standard library, requests, and a few helper libraries. It aims to be cross-platform, with a preference for Linux.
A Note:
Scraping too aggressively can put a heavy load on servers and lead to 503 errors. I'm sure this is less of an issue for Google, but please be considerate. Thanks :)
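One way to stay considerate is to back off whenever the server answers with a 503. The sketch below is only an illustration and is not code from this repo: `fetch_with_backoff`, its parameters, and the delay schedule are all hypothetical. It accepts any `get` callable that returns an object with a `status_code` attribute, such as `requests.get`.

```python
import time
from typing import Any, Callable


def fetch_with_backoff(
    get: Callable[[str], Any],
    url: str,
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get(url), waiting longer after each 503 before retrying.

    Hypothetical helper: `get` is assumed to return an object with a
    `status_code` attribute (for example, a requests.Response).
    """
    response = get(url)
    for attempt in range(max_retries):
        if response.status_code != 503:
            break  # success or a different error: stop retrying
        # Exponential backoff: base_delay, 2x, 4x, 8x, ...
        sleep(base_delay * 2 ** attempt)
        response = get(url)
    # May still be a 503 if every retry failed; the caller decides what to do.
    return response
```

With requests installed, `fetch_with_backoff(requests.get, url)` would wait 1, 2, 4, and then 8 seconds between attempts, and return the last response whether or not it succeeded.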
If you like the project, give it a star ⭐
As of now, there are no packaging or runner scripts. You have to clone the repo, install the dependencies manually, and run main.py with Python.
git clone https://github.com/talleyhoe/google-image-scraper.git
cd google-image-scraper
pip install -r requirements.txt
python src/main.py [keyword]
optional arguments:
  --count, -c       How many images to try to scrape (fewer than 500 usually works)
  --directory, -d   What directory to save the images in
                    (default is ~/Downloads/[keyword])
  --threads, -t     How many threads to scrape with
                    (default is single-threaded)
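For readers curious how these options could fit together, here is a minimal sketch using argparse and a thread pool. It is an assumption about the shape of such a script, not the actual contents of src/main.py: `build_parser`, `resolve_directory`, `download_all`, and the `fetch` callable are hypothetical names, and the default for --count is left unset because it is not documented above.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("keyword", help="search term to scrape images for")
    # The default count is not documented, so None is an assumption here.
    parser.add_argument("--count", "-c", type=int, default=None,
                        help="how many images to try to scrape")
    parser.add_argument("--directory", "-d", default=None,
                        help="directory to save images in "
                             "(default: ~/Downloads/[keyword])")
    parser.add_argument("--threads", "-t", type=int, default=1,
                        help="how many threads to scrape with (default: 1)")
    return parser


def resolve_directory(args: argparse.Namespace) -> Path:
    """Apply the documented default of ~/Downloads/[keyword]."""
    if args.directory is not None:
        return Path(args.directory).expanduser()
    return Path.home() / "Downloads" / args.keyword


def download_all(urls: Iterable[str], threads: int,
                 fetch: Callable[[str], bytes]) -> List[bytes]:
    """Run fetch(url) for every URL on a pool of `threads` workers.

    `fetch` is a hypothetical callable that downloads one image.
    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max(1, threads)) as pool:
        return list(pool.map(fetch, urls))
```

With `--threads 1` the pool has a single worker, which matches the single-threaded default. Passing `fetch` in as a parameter keeps the download logic swappable and easy to test without touching the network.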
- Proper packaging
- Extensibility to scrape other sites
  - May not be implemented for a while unless there is sufficient interest