reCAPTCHA Solver

About

CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. As the name suggests, it is a system designed solely to distinguish humans from machines. CAPTCHA challenges should therefore be easy for humans to solve but impossible for any automated system.

This repository explores the feasibility of breaking image-based reCAPTCHA challenges using machine learning techniques. The aim is to prove that reCAPTCHA is no longer safe and can be broken with relatively little effort, even without transformer-based models.

How to use?

Installation

To run the program, you will need Docker and Python 3.12 or later, plus either pip or poetry. Then follow the steps for your package manager:

poetry
```
poetry install --no-root
poetry env activate
```
Then copy and paste the output into the terminal to activate the virtual environment.

pip
```
pip install .
```

Running the program

Begin by editing the .env file. Fill in the required values: the browser type and the paths to the files containing the model weights. Then start the Docker container with:

cd docker
sudo docker compose up

and run the program by typing:

python main.py

Project Flow

The main flow of the project can be summarised as follows:

The user initiates the CAPTCHA-solving process by executing the main.py script.
The CaptchaProcessor class orchestrates the solution by:
- Capturing a screenshot of the CAPTCHA using the GuiAgent.
- Processing the image via the ImageProcessor created by ImageProcessorFactory to cut it into pieces and extract relevant features.
- Performing OCR on the extracted header image to identify which CAPTCHA type is being presented.
- Predicting the correct images or actions using pre-trained models (single or multi-image classifiers).
- Handling mouse actions using the MouseEngine, which moves the mouse using the chosen movement strategy.
Once the CAPTCHA is solved, the CaptchaProcessor handles the submission and proceeds to the next CAPTCHA if applicable.
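
The steps above can be sketched as a single function that receives each component as a callable. This is only an illustration of the order of operations: `solve_once` and every parameter name and signature below are hypothetical, and the real CaptchaProcessor, GuiAgent, ImageProcessor and MouseEngine interfaces may look quite different.

```python
from typing import Callable, List, Sequence, Tuple


def solve_once(
    capture: Callable[[], object],
    split: Callable[[object], Tuple[object, Sequence[object]]],
    read_target: Callable[[object], str],
    predict: Callable[[object, str], bool],
    click: Callable[[int], None],
) -> List[int]:
    """Run one capture -> split -> OCR -> predict -> click pass.

    Every callable is a hypothetical stand-in for a project component;
    none of these signatures is taken from the repository.
    """
    screenshot = capture()                 # role played by GuiAgent
    header, tiles = split(screenshot)      # role played by ImageProcessor
    target = read_target(header)           # OCR on the header image
    selected = [i for i, tile in enumerate(tiles) if predict(tile, target)]
    for index in selected:                 # role played by MouseEngine
        click(index)
    return selected


# Stub components, used only to show the data flow.
clicked: List[int] = []
result = solve_once(
    capture=lambda: "raw screenshot",
    split=lambda shot: ("header image", ["bus", "car", "bus"]),
    read_target=lambda header: "bus",
    predict=lambda tile, target: tile == target,
    click=clicked.append,
)
```

Passing the components in as arguments keeps the orchestration testable without a browser or trained models.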

Directory Structure

The project was created as a mono-repository, and its structure is logically divided to ensure separation and maintainability.

The app/ directory holds the core operational files, including GUI interaction via GuiAgent, image processing tools, and mouse movement strategies.
The gym/ directory is dedicated to neural network training. This includes training models for CAPTCHA solving and mouse movement, dataset preparation, and testing. Isolating training-related tasks in this module keeps a clear distinction between training workflows and real-world operation.
The docker/ directory provides configurations for containerized deployment.

How does it all work?

Screenshot and Segmentation

A segmentation algorithm was implemented so that the CAPTCHA-solving program can process screenshots regardless of the screen resolution of the host system.
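
The README does not describe the algorithm itself, so the following is only a minimal sketch of one resolution-independent approach. It assumes the bounding box of the image grid has already been located and splits it proportionally; `grid_boxes` and the `Box` layout are hypothetical names, not project code.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_boxes(region: Box, rows: int, cols: int) -> List[Box]:
    """Split a rectangular region into rows x cols tile boxes.

    Edges are computed as fractions of the region size, so the same call
    works at any resolution and the tiles cover the region exactly even
    when its size is not divisible by the grid dimensions.
    """
    left, top, right, bottom = region
    width, height = right - left, bottom - top
    xs = [left + round(width * c / cols) for c in range(cols + 1)]
    ys = [top + round(height * r / rows) for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]


# A 300 x 300 pixel grid whose top-left corner sits at (100, 200).
boxes = grid_boxes((100, 200, 400, 500), rows=3, cols=3)
```

The boxes are returned in row-major order, so tile index `i` maps to row `i // cols` and column `i % cols`.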

Machine Learning Models

Breaking reCAPTCHA currently comes down to two core challenge types. Other variants exist, but all of them can be addressed using models trained for these two tasks:

Multiple Image CAPTCHA

For the $3 \times 3$ Multiple Objects CAPTCHA, a transfer learning solution was used. The base architecture of the neural network was ResNet-18 with its final fully connected layer removed, which turns it into a feature extractor. The extracted features were then passed through a fully connected layer with 12 outputs, each representing the probability of the input belonging to a specific class.
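
To illustrate how 12 outputs per tile could be turned into a click decision, here is a minimal sketch in plain Python. It assumes the outputs are raw scores normalised with a softmax and that a tile is clicked when the target class probability exceeds a threshold; the helper names, the threshold and the example scores are all hypothetical and not taken from the project.

```python
import math
from typing import List, Sequence

NUM_CLASSES = 12  # number of outputs stated in the text


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tiles_to_click(tile_logits: Sequence[Sequence[float]], target: int,
                   threshold: float = 0.5) -> List[int]:
    """Return indices of tiles whose probability for `target` exceeds threshold."""
    picked = []
    for index, logits in enumerate(tile_logits):
        if len(logits) != NUM_CLASSES:
            raise ValueError("expected one score per class")
        if softmax(logits)[target] > threshold:
            picked.append(index)
    return picked


def confident_scores(cls: int) -> List[float]:
    """Fake model output that strongly favours one class (for the demo only)."""
    return [5.0 if i == cls else 0.0 for i in range(NUM_CLASSES)]


# Nine tiles of a 3 x 3 grid; tiles 0, 3 and 6 show the target class 3.
grid = [confident_scores(c) for c in (3, 0, 7, 3, 1, 2, 3, 9, 11)]
picked = tiles_to_click(grid, target=3)
```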

The model was trained in stages: first with the backbone fully frozen for a few epochs, then with the last layer unfrozen for a few more epochs, and finally with the second-to-last layer unfrozen as well. Training used the Adam optimizer with a varying learning rate.
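
The staged schedule can be expressed as a small lookup from epoch to trainable blocks. This is a sketch under assumptions: the block names follow torchvision's ResNet-18 naming (`layer3`, `layer4`), "last layer" is read as `layer4`, the new classification head is assumed to be trainable throughout, and the epoch counts and learning rates are illustrative rather than the project's actual values.

```python
from typing import Dict, List, Tuple

# (number of epochs, backbone blocks unfrozen in that stage, learning rate).
# All numbers are placeholders, not values from the repository.
STAGES: List[Tuple[int, Tuple[str, ...], float]] = [
    (3, (), 1e-3),                    # backbone fully frozen
    (3, ("layer4",), 1e-4),           # last block unfrozen
    (4, ("layer3", "layer4"), 1e-5),  # second-to-last block unfrozen too
]


def plan_for_epoch(epoch: int) -> Dict[str, object]:
    """Return the trainable blocks and learning rate for a 0-based epoch."""
    start = 0
    for length, unfrozen, lr in STAGES:
        if epoch < start + length:
            return {"trainable": ("head",) + unfrozen, "lr": lr}
        start += length
    raise ValueError(f"epoch {epoch} is beyond the {start} scheduled epochs")


first = plan_for_epoch(0)
last = plan_for_epoch(9)
```

In a PyTorch training loop, the returned block names would typically decide which parameters get `requires_grad = True` before the optimizer is rebuilt for the new stage.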

Single Image CAPTCHA

For the $4 \times 4$ Single Object CAPTCHA, a ResNet-18 with its final fully connected layer removed was used again. It was then adapted to the task at hand: contextual information about the object class was added to the ResNet output by concatenating a class embedding with the extracted image features.
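
The concatenation step can be shown with plain lists. In this sketch the 512-element feature vector matches the size of ResNet-18's pooled output, while the embedding size, the number of classes and the function name are hypothetical; in a real model the embedding table is learned, whereas here it is filled with random numbers.

```python
import random
from typing import List, Sequence

FEATURE_DIM = 512   # size of ResNet-18's pooled feature vector
EMBEDDING_DIM = 16  # hypothetical embedding size
NUM_CLASSES = 12    # hypothetical; borrowed from the multi-image model

rng = random.Random(0)
# One vector per object class. Learned during training in a real model.
embedding_table: List[List[float]] = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(NUM_CLASSES)
]


def build_head_input(image_features: Sequence[float], class_id: int) -> List[float]:
    """Append the embedding of the requested class to the image features."""
    if len(image_features) != FEATURE_DIM:
        raise ValueError("unexpected feature vector size")
    return list(image_features) + embedding_table[class_id]


features = [0.0] * FEATURE_DIM  # stand-in for the backbone output of one tile
combined = build_head_input(features, class_id=5)
```

The layers after the concatenation then see both what the tile looks like and which object the challenge asks for, so one network can answer for any class.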

This model was trained in stages with the same strategy as the previous one, again using the Adam optimizer with a varying learning rate.

Mouse Simulation

The mouse movement functionality was implemented using the Strategy design pattern, enabling seamless substitution of different movement algorithms.
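
A minimal sketch of the Strategy pattern in this setting is shown below. The class names, the `path` method and the `emit` callback are hypothetical and chosen for illustration; the project's MouseEngine may expose a different interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple

Point = Tuple[float, float]


class MovementStrategy(ABC):
    """Common interface: turn a start and end point into a list of waypoints."""

    @abstractmethod
    def path(self, start: Point, end: Point) -> List[Point]:
        ...


class JumpStrategy(MovementStrategy):
    """Teleport: only the start and end points."""

    def path(self, start: Point, end: Point) -> List[Point]:
        return [start, end]


class LinearStrategy(MovementStrategy):
    """Evenly spaced points on the straight line between start and end."""

    def __init__(self, steps: int = 4) -> None:
        self.steps = steps

    def path(self, start: Point, end: Point) -> List[Point]:
        (sx, sy), (ex, ey) = start, end
        return [
            (sx + (ex - sx) * i / self.steps, sy + (ey - sy) * i / self.steps)
            for i in range(self.steps + 1)
        ]


class MouseEngineSketch:
    """Hypothetical engine that delegates path planning to its strategy."""

    def __init__(self, strategy: MovementStrategy) -> None:
        self.strategy = strategy

    def move(self, start: Point, end: Point, emit: Callable[[Point], None]) -> None:
        for point in self.strategy.path(start, end):
            emit(point)  # a real engine would move the OS cursor here


visited: List[Point] = []
engine = MouseEngineSketch(LinearStrategy(steps=4))
engine.move((0.0, 0.0), (8.0, 4.0), visited.append)

engine.strategy = JumpStrategy()  # swap the algorithm without touching the engine
jumped: List[Point] = []
engine.move((0.0, 0.0), (8.0, 4.0), jumped.append)
```

Because the engine only depends on the `MovementStrategy` interface, a learned generator and a hand-written algorithm can be exchanged without changing the calling code.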

One innovative strategy was a Generative Adversarial Network (GAN) for generating realistic mouse movements. To create the dataset for training, a simple game was developed where users clicked on a green square to start recording their mouse movements and a red square to stop. These recorded sequences were then used to train the GAN.

Training GANs, however, is notoriously challenging. Issues such as mode collapse and instability made it difficult to train the model robustly. Hence, a baseline deterministic algorithm was also implemented as a strategy.
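
The README does not say which deterministic algorithm serves as the baseline. As an assumption-laden illustration, one common choice is a curved path with smooth acceleration: the sketch below follows a quadratic Bézier curve and applies smoothstep easing. The function name and the `bend` and `steps` parameters are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, bend: float = 0.2,
                steps: int = 20) -> List[Point]:
    """Deterministic curved path from start to end.

    The control point is the segment midpoint pushed sideways by
    `bend` times the segment length, which gives a gentle arc.
    """
    (x0, y0), (x2, y2) = start, end
    dx, dy = x2 - x0, y2 - y0
    # Offset the midpoint along the normal (-dy, dx) of the segment.
    x1 = (x0 + x2) / 2 - dy * bend
    y1 = (y0 + y2) / 2 + dx * bend
    points = []
    for i in range(steps + 1):
        t = i / steps
        t = t * t * (3 - 2 * t)  # smoothstep: slow start, slow finish
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * x0 + b * x1 + c * x2, a * y0 + b * y1 + c * y2))
    return points


path = bezier_path((0.0, 0.0), (100.0, 0.0))
```

The same inputs always produce the same path, which makes this kind of baseline easy to test but also easier to fingerprint than recorded or generated human movement.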
