CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. As the name suggests, it is a system designed solely to distinguish humans from machines. CAPTCHA challenges should therefore be easy for humans to solve but impossible for automated systems.
This repository explores the feasibility of breaking image-based reCAPTCHA challenges using machine learning techniques. The aim is to prove that reCAPTCHA is no longer safe and can be broken with relatively modest effort, even without transformer-based models.
To run the program, you will need Docker and Python 3.12 or later, plus either pip or poetry. Then follow the steps below:
- poetry

      poetry install --no-root
      poetry env activate

  then copy and paste the output into the terminal to activate the virtual environment.

- pip

      pip install .

Begin by editing the .env file. Fill in the required values: the browser type and the paths to the files containing the model weights.
Then start your Docker container with:

    cd docker
    sudo docker compose up

and run the program by typing:

    python main.py

The main flow of the project can be summarised as follows:
- The user initiates the CAPTCHA-solving process by executing the main.py script.
- The CaptchaProcessor class orchestrates the solution by:
  - Capturing a screenshot of the CAPTCHA using the GuiAgent.
  - Processing the image via the ImageProcessor generated by ImageProcessorFactory to cut it into pieces and extract relevant features.
  - Performing OCR on the extracted header image to identify the CAPTCHA type we are looking for.
  - Predicting the correct images or actions using pre-trained models (single or multi-image classifiers).
  - Handling mouse actions using the MouseEngine, which moves the mouse using the chosen movement strategy.
- Once the CAPTCHA is solved, the CaptchaProcessor handles the submission and proceeds to the next CAPTCHA if applicable.
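The flow above can be sketched in a few lines of Python. This is only an illustration of the orchestration order, not the repository's code: the class name CaptchaProcessor comes from the project, but the constructor arguments, the `solve_once` method, and the toy collaborators are hypothetical stand-ins for GuiAgent, ImageProcessor, the OCR step, the classifiers, and MouseEngine.

```python
from typing import Callable, List


class CaptchaProcessor:
    """Hypothetical orchestration sketch; collaborators are injected as callables."""

    def __init__(
        self,
        capture: Callable[[], list],            # stands in for GuiAgent
        split: Callable[[list], List[list]],    # stands in for ImageProcessor
        read_target: Callable[[list], str],     # stands in for the OCR step
        classify: Callable[[list, str], bool],  # stands in for the classifier
        click: Callable[[int], None],           # stands in for MouseEngine
    ) -> None:
        self.capture = capture
        self.split = split
        self.read_target = read_target
        self.classify = classify
        self.click = click

    def solve_once(self) -> List[int]:
        """Run one pass: screenshot, split, read the header, classify, click."""
        screenshot = self.capture()
        tiles = self.split(screenshot)
        target = self.read_target(screenshot)
        selected = [i for i, tile in enumerate(tiles) if self.classify(tile, target)]
        for index in selected:
            self.click(index)
        return selected


# Toy collaborators: each "tile" is just a one-element list holding a label.
clicked: List[int] = []
processor = CaptchaProcessor(
    capture=lambda: ["bus", "car", "bus", "tree"],
    split=lambda screenshot: [[label] for label in screenshot],
    read_target=lambda screenshot: "bus",
    classify=lambda tile, target: tile[0] == target,
    click=clicked.append,
)
selected = processor.solve_once()
```

Injecting the collaborators keeps the orchestration testable without a browser or model weights; the real classes presumably carry more state than these callables.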
The project was created as a mono-repository, and its structure is logically divided to ensure separation and maintainability.
- The app/ directory holds the core operational files, including GUI interaction via GuiAgent, image processing tools, and mouse movement strategies.
- The gym/ directory is dedicated to neural network training. This includes training models for CAPTCHA solving and mouse movement, dataset preparation, and testing. By isolating training-related tasks in this module, the project maintains a clear distinction between training workflows and real-world operations.
- The docker/ directory provides configurations for containerized deployment.
To ensure the CAPTCHA-solving program can process images regardless of the screen resolution of the system it runs on, a segmentation algorithm was implemented.
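As a rough illustration of resolution-independent splitting, the sketch below computes tile bounding boxes proportionally from the image size, so the same call works for any screenshot dimensions. This is an assumption about one possible approach, not the project's actual segmentation algorithm, which may instead detect the grid lines in the image; the function name `grid_boxes` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Return row-major tile boxes that cover the image exactly.

    Integer division on the cumulative edge positions means the boxes
    tile the image with no gaps or overlaps, even when the size is not
    divisible by the grid dimensions.
    """
    boxes: List[Box] = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# A 3x3 challenge captured at two different resolutions.
small = grid_boxes(100, 100, 3, 3)
large = grid_boxes(300, 300, 3, 3)
```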
Currently, breaking reCAPTCHA systems primarily involves solving two core challenges. Naturally, there are additional ones, but all can be addressed using models trained for these core tasks:
For the
For the
The model was also trained in stages, with the same training strategy as the previous model. The model was trained using the Adam optimizer with a varying learning rate.
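To make the idea of Adam with a varying learning rate concrete, here is a minimal pure-Python sketch on a one-dimensional toy problem. The step-decay schedule, its parameters, and the function names are assumptions chosen for illustration; the source does not state which schedule or hyperparameters were used, and real training would rely on a deep learning framework.

```python
import math
from typing import Callable


def step_decay(base_lr: float, epoch: int, drop: float = 0.5, every: int = 50) -> float:
    """Assumed schedule: multiply the learning rate by `drop` every `every` epochs."""
    return base_lr * drop ** (epoch // every)


def adam_minimize(
    grad: Callable[[float], float],
    x0: float,
    base_lr: float = 0.5,
    epochs: int = 300,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> float:
    """Minimise a 1-D function with Adam, using a per-epoch learning rate."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, epochs + 1):
        lr = step_decay(base_lr, t - 1)
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
result = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

A larger rate early on moves quickly toward the minimum, while the decayed rate damps the oscillation that momentum introduces near it.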
The mouse movement functionality was implemented using the Strategy design pattern, enabling seamless substitution of different movement algorithms.
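A minimal sketch of how the Strategy pattern allows this substitution is shown below. MouseEngine is named in the project, but the `MovementStrategy` interface, the two example strategies, and the `move` method are hypothetical and only illustrate the pattern.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Point = Tuple[float, float]


class MovementStrategy(ABC):
    """Hypothetical interface: turn a start and end point into a cursor path."""

    @abstractmethod
    def path(self, start: Point, end: Point, steps: int) -> List[Point]:
        ...


class LinearStrategy(MovementStrategy):
    """Evenly spaced points on a straight line."""

    def path(self, start: Point, end: Point, steps: int) -> List[Point]:
        return [
            (
                start[0] + (end[0] - start[0]) * i / steps,
                start[1] + (end[1] - start[1]) * i / steps,
            )
            for i in range(steps + 1)
        ]


class JumpStrategy(MovementStrategy):
    """Teleport straight to the target; useful only as a contrast."""

    def path(self, start: Point, end: Point, steps: int) -> List[Point]:
        return [start, end]


class MouseEngine:
    """Delegates path generation to whichever strategy it was given."""

    def __init__(self, strategy: MovementStrategy) -> None:
        self.strategy = strategy

    def move(self, start: Point, end: Point, steps: int = 10) -> List[Point]:
        return self.strategy.path(start, end, steps)


linear_path = MouseEngine(LinearStrategy()).move((0, 0), (10, 20), steps=2)
jump_path = MouseEngine(JumpStrategy()).move((0, 0), (10, 20))
```

Because the engine depends only on the interface, a learned generator can replace a hand-written algorithm without touching the calling code.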
One innovative strategy was a Generative Adversarial Network (GAN) for generating realistic mouse movements. To create the dataset for training, a simple game was developed where users clicked on a green square to start recording their mouse movements and a red square to stop. These recorded sequences were then used to train the GAN.
Training GANs, however, is notoriously challenging. Issues such as mode collapse and instability made it difficult to train the model robustly. Hence, a baseline deterministic algorithm was also implemented as a strategy.
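The source does not describe the baseline algorithm, so the following is only one plausible deterministic choice: a quadratic Bézier curve whose control point is offset perpendicular to the straight line between start and end. The function name `bezier_path` and the `bend` parameter are assumptions made for this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 20, bend: float = 0.25) -> List[Point]:
    """Return steps + 1 points on a quadratic Bezier curve from start to end.

    The control point sits at the midpoint, shifted perpendicular to the
    start-end segment by `bend` times the segment length, which gives the
    path a gentle, repeatable arc.
    """
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    cx = (x0 + x1) / 2 - bend * dy  # perpendicular of (dx, dy) is (-dy, dx)
    cy = (y0 + y1) / 2 + bend * dx
    points: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2  # Bernstein weights
        points.append((a * x0 + b * cx + c * x1, a * y0 + b * cy + c * y1))
    return points


arc = bezier_path((0.0, 0.0), (100.0, 0.0), steps=4)
```

The same inputs always produce the same path, which makes this kind of baseline easy to test but also easier to fingerprint than a learned generator.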










