Welcome to Text Recovery Project 👋

A Python library for decentralized training of a Transformer neural network across the Internet to solve the Running Key Cipher, a cipher widely known in the field of cryptography.

🚀 Objective

The main goal of the project is to study whether a Transformer neural network can “read” meaningful text in the columns that can be compiled for a Running Key Cipher. You can read more about the problem here.
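For readers unfamiliar with the cipher, here is a minimal sketch of the textbook Running Key Cipher: each plaintext letter is shifted by the matching letter of a long key text, modulo 26. The function names `running_key_encrypt` and `running_key_decrypt` are hypothetical and are not part of this library.

```python
import string

ALPHABET = string.ascii_lowercase


def running_key_encrypt(plaintext: str, key: str) -> str:
    """Shift each plaintext letter by the matching key letter (mod 26)."""
    if len(key) < len(plaintext):
        raise ValueError("the running key must be at least as long as the plaintext")
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % 26]
        for p, k in zip(plaintext, key)
    )


def running_key_decrypt(ciphertext: str, key: str) -> str:
    """Undo the shift applied by running_key_encrypt."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext, key)
    )


# Both inputs are assumed to contain lowercase English letters only.
plaintext = "hellomynameis"
key = "tobeornottobe"
ciphertext = running_key_encrypt(plaintext, key)
assert running_key_decrypt(ciphertext, key) == plaintext
```

Because the key is itself meaningful text, an attacker can narrow each ciphertext position down to a small set of plausible plaintext letters, which is where the columns discussed in this section come from.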

In addition, a second, rather fun 😅 goal is to train a model large enough to handle the case described below. Consider the original sentence:

Hello, my name is Zendaya Maree Stoermer Coleman but you can just call me Zendaya.

The columns for this sentence will be compiled in such a way that each of the last seven contains ten to thirteen letters of the English alphabet, and each of the others contains two to five. Thus, the last seven characters will be much harder to “read” than the rest. However, we can guess from the meaning of the sentence that they spell the name Zendaya. In other words, the goal is also to train a model that can understand the context and correctly “read” the last word.
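The column layout described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `make_columns` is hypothetical, the extra letters in each column are drawn uniformly at random, and the text is lowercased first. The library may build its columns differently.

```python
import random
import string


def make_columns(text, min_size, max_size, rng):
    """Build one column per letter: the true letter plus random distinct other letters."""
    columns = []
    for ch in text:
        size = rng.randint(min_size, max_size)  # total letters in this column
        others = [c for c in string.ascii_lowercase if c != ch]
        column = rng.sample(others, size - 1) + [ch]
        rng.shuffle(column)  # hide the position of the true letter
        columns.append(column)
    return columns


sentence = "Hello, my name is Zendaya Maree Stoermer Coleman but you can just call me Zendaya."
letters = "".join(ch for ch in sentence.lower() if ch in string.ascii_lowercase)

rng = random.Random(0)
# Two to five letters per column, except the last seven columns (ten to thirteen).
columns = make_columns(letters[:-7], 2, 5, rng) + make_columns(letters[-7:], 10, 13, rng)

for true_letter, column in zip(letters[-7:], columns[-7:]):
    print(true_letter, "".join(sorted(column)))
```

Each printed row shows a true letter of the last word next to its column. With ten or more candidates per position, the last word can only be recovered from the context of the sentence.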

⚙ Installation

TRecover requires Python 3.8 or higher and supports both Windows and Linux platforms.

Clone the repository:

git clone https://github.com/alex-snd/TRecover.git && cd TRecover

Create a virtual environment:
- Windows:
```
python -m venv venv
```
- Linux:
```
python3 -m venv venv
```

Activate the virtual environment:

Windows:

venv\Scripts\activate.bat

Linux:

source venv/bin/activate

Install the package inside this virtual environment:
- Just to run the demo:
```
pip install -e ".[demo]"
```
- To train the Transformer:
```
pip install -e ".[train]"
```
- For development and training:
```
pip install -e ".[dev]"
```
Initialize the project's environment:
```
trecover init
```
For more options use:
```
trecover init --help
```

👀 Demo

🤗 Hugging Face
You can play with a pre-trained model hosted here.

🐳 Docker Compose

Pull from Docker Hub:

docker-compose -f docker/compose/scalable-service.yml up

Build from source:

trecover download artifacts
docker-compose -f docker/compose/scalable-service-build.yml up

💻 Local (requires Docker)
- Download pretrained model:
```
trecover download artifacts
```
- Launch the service:
```
trecover up
```

🗃️ Data

The model was trained on the WikiText and WikiQA datasets, from which all characters except English letters were removed.
You can download the cleaned dataset:

trecover download data
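The cleaning step described above can be sketched in a few lines. This is a hedged illustration, not the project's actual preprocessing code: the function name `keep_english_letters` is hypothetical, and whether the real pipeline also changes letter case is not stated here, so this sketch leaves case untouched.

```python
import re

# Matches every character that is not an English letter.
_NON_LETTERS = re.compile(r"[^A-Za-z]")


def keep_english_letters(text: str) -> str:
    """Remove everything except the letters A-Z and a-z."""
    return _NON_LETTERS.sub("", text)


print(keep_english_letters("Hello, my name is Zendaya!"))  # HellomynameisZendaya
```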

💪 Train

To quickly start training the model, open the Jupyter Notebook.

🕸️ Collaborative
TODO

💻 Local
After the dataset is loaded, you can start training the model:

trecover train \
--project-name {project_name} \
--exp-mark {exp_mark} \
--train-dataset-size {train_dataset_size} \
--val-dataset-size {val_dataset_size} \
--vis-dataset-size {vis_dataset_size} \
--test-dataset-size {test_dataset_size} \
--batch-size {batch_size} \
--n-workers {n_workers} \
--min-noise {min_noise} \
--max-noise {max_noise} \
--lr {lr} \
--n-epochs {n_epochs} \
--epoch-seek {epoch_seek} \
--accumulation-step {accumulation_step} \
--penalty-coefficient {penalty_coefficient} \
--pe-max-len {pe_max_len} \
--n-layers {n_layers} \
--d-model {d_model} \
--n-heads {n_heads} \
--d-ff {d_ff} \
--dropout {dropout}

For more information use trecover train local --help

✔️ Related work

TODO: what was done, tech stack.

🤝 Contributing

Contributions, issues and feature requests are welcome.
Feel free to check the issues page if you want to contribute.

👏 Show your support

Please don't hesitate to ⭐️ this repository if you find it cool!
