
Automatic Speech Recognition (ASR)

About • Installation • How To Use • Final results • Credits • License

About

This repository contains an end-to-end pipeline for solving the ASR task with PyTorch. The implemented model is Deep Speech 2.

See the task assignment here.

See the wandb report with all experiments.
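
For orientation, here is a minimal PyTorch sketch of the Deep Speech 2 architecture: a 2D-convolutional frontend over the spectrogram, a stack of bidirectional GRUs, and a linear layer producing per-frame token logits for CTC. Layer sizes and kernel shapes below are illustrative assumptions, not the exact configuration used in this repository.

    import torch.nn as nn

    class DeepSpeech2Sketch(nn.Module):
        """Illustrative Deep Speech 2: conv frontend -> BiGRU stack -> CTC head."""

        def __init__(self, n_feats=128, n_tokens=28, hidden=512, n_gru=5):
            super().__init__()
            # 2D convolutions over (freq, time); strides reduce resolution
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
                nn.BatchNorm2d(32),
                nn.Hardtanh(0, 20, inplace=True),
                nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
                nn.BatchNorm2d(32),
                nn.Hardtanh(0, 20, inplace=True),
            )
            rnn_input = 32 * (n_feats // 4)  # channels * reduced frequency bins
            self.rnn = nn.GRU(rnn_input, hidden, num_layers=n_gru,
                              bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, n_tokens)  # per-frame token logits

        def forward(self, spectrogram):  # (batch, n_feats, time)
            x = self.conv(spectrogram.unsqueeze(1))  # (batch, 32, freq', time')
            b, c, f, t = x.shape
            x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time', features)
            x, _ = self.rnn(x)
            return self.fc(x)  # logits; apply log_softmax before nn.CTCLoss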

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n ASR python=3.10
    
    # activate env
    conda activate ASR
  2. Install all required packages.

    pip install -r requirements.txt
  3. Download the model checkpoint, vocabulary, and language model.

    python download_weights.py
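
To sanity-check the setup, you can optionally verify that PyTorch imports and sees the GPU:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"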

How To Use

Inference

  1. If you only want to decode audio to text, your directory with audio should have the following format:
    NameOfTheDirectoryWithUtterances
    └── audio
         ├── UtteranceID1.wav # may be flac or mp3
         ├── UtteranceID2.wav
         .
         .
         .
         └── UtteranceIDn.wav
    
    Run the following command:
    python inference.py datasets=inference_custom inferencer.save_path=SAVE_PATH datasets.test.audio_dir=TEST_DATA/audio
    where SAVE_PATH is the path where the predicted text will be saved and TEST_DATA is the directory with audio.
  2. If you have ground-truth text and want to evaluate the model, make sure that the directory with audio and ground-truth text has the following format:
    NameOfTheDirectoryWithUtterances
    ├── audio
    │   ├── UtteranceID1.wav # may be flac or mp3
    │   ├── UtteranceID2.wav
    │   .
    │   .
    │   .
    │   └── UtteranceIDn.wav
    └── transcriptions
        ├── UtteranceID1.txt
        ├── UtteranceID2.txt
        .
        .
        .
        └── UtteranceIDn.txt
    
    Then run the following command:
    python inference.py datasets=inference_custom inferencer.save_path=SAVE_PATH datasets.test.audio_dir=TEST_DATA/audio datasets.test.transcription_dir=TEST_DATA/transcriptions
  3. If you only have predicted and ground-truth texts and only want to evaluate the model, make sure that the directory containing them has the following format:
    NameOfTheDirectoryWithUtterances
     ├── ID1.json
     .
     .
     .
     └── IDn.json
    
    where each JSON file contains the predicted and ground-truth text, e.g.:
    ID1.json = {"pred_text": "ye are newcomers", "text": "YE ARE NEWCOMERS"}
    
    Then run the following command (a minimal sketch of this computation is shown after this list):
    python calculate_wer_cer.py --dir_path=DIR
  4. Finally, if you want to reproduce the results reported below, run the following command:
    python inference.py dataloader.batch_size=500 inferencer.save_path=SAVE_PATH datasets.test.part="test-other"
    Feel free to choose which metrics to evaluate (see this config).
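
For reference, the step-3 evaluation boils down to averaging edit distances over the JSON files. Below is a minimal, self-contained Python sketch of that computation, reporting percentages to match the tables below; the actual calculate_wer_cer.py may be implemented differently.

    import json
    from pathlib import Path

    def levenshtein(ref, hyp):
        """Edit distance between two sequences (of words or characters)."""
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            cur = [i]
            for j, h in enumerate(hyp, 1):
                cur.append(min(prev[j] + 1,              # deletion
                               cur[j - 1] + 1,           # insertion
                               prev[j - 1] + (r != h)))  # substitution
            prev = cur
        return prev[-1]

    def evaluate(dir_path):
        """Average WER/CER over {"pred_text", "text"} JSON files in a directory."""
        wers, cers = [], []
        for path in Path(dir_path).glob("*.json"):
            item = json.loads(path.read_text())
            # normalize case, since references may be upper-cased
            ref, hyp = item["text"].lower(), item["pred_text"].lower()
            wers.append(levenshtein(ref.split(), hyp.split()) / max(len(ref.split()), 1))
            cers.append(levenshtein(ref, hyp) / max(len(ref), 1))
        print(f"WER: {100 * sum(wers) / len(wers):.2f}, "
              f"CER: {100 * sum(cers) / len(cers):.2f}")

    evaluate("DIR")  # same directory as --dir_path above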

Training

Model training consists of 3 stages. To reproduce the results, train the model using the following commands:

  1. Train 47 epochs without augmentations

    python train.py writer.run_name="part1" dataloader.batch_size=230 transforms=example_only_instance trainer.early_stop=47
  2. Train 103 epochs with augmentations

    python train.py writer.run_name="part2" dataloader.batch_size=230 trainer.resume_from=part1/model_best.pth datasets.val.part=test-other
  3. Train 15 epochs starting from a new optimizer state

    python train.py -cn=part3 writer.run_name="part3" dataloader.batch_size=230 datasets.val.part=test-other
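
The three stages can also be chained in one shell script; this is just the commands above, run back to back:

    #!/usr/bin/env bash
    set -e  # stop if any stage fails

    python train.py writer.run_name="part1" dataloader.batch_size=230 transforms=example_only_instance trainer.early_stop=47
    python train.py writer.run_name="part2" dataloader.batch_size=230 trainer.resume_from=part1/model_best.pth datasets.val.part=test-other
    python train.py -cn=part3 writer.run_name="part3" dataloader.batch_size=230 datasets.val.part=test-other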

It takes around 57 hours to train the model from scratch on an A100 GPU.

Final results

These results were obtained using beam search and a language model:

                WER      CER
test-other     16.96     9.43
test-clean      6.34     2.58

You can see that using the language model yields a significant quality boost:

             WER (w/o LM)    CER (w/o LM)
test-other      25.35            9.73

Finally, beam search also contributes to quality improvement:

             WER (w/o beam search)    CER (w/o beam search)
test-other          25.80                    9.91
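
For reference, both metrics follow the standard edit-distance definition: the minimum number of substitutions S, deletions D, and insertions I needed to turn the prediction into the reference, normalized by the reference length N (in words for WER, in characters for CER):

    WER = (S + D + I) / N_words
    CER = (S + D + I) / N_chars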

Credits

This repository is based on a PyTorch Project Template.

License
