VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.
Demo of audio restorations: VoiceRestore
Credits: This repository is based on the E2-TTS implementation by Lucidrains.

Super easy usage via Transformers 🤗, by @jadechoghari (Hugging Face). You can also build it locally with Gradio in this repo.
Degraded audio (reverberation, distortion, noise, random cut):
Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.
degraded.mp4
Restored audio - 16 steps, strength 0.5:
restored.mp4
- Universal Restoration: A single model handles a wide range of degradation types and severities, from background noise and reverberation to distortion and signal loss. Pure magic.
- Easy to Use: Simple interface for processing degraded audio files.
- Pretrained Model: Includes a 301 million parameter transformer model with pre-trained weights. (The model is still training; further checkpoint updates will follow.)
- Clone the repository:

  ```bash
  git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
  cd VoiceRestore
  ```

  If you did not clone with `--recurse-submodules`, you can run:

  ```bash
  git submodule update --init --recursive
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the pre-trained model and place it in the `checkpoints` folder. (Updated 9/29/2024)

- Run a test restoration:

  ```bash
  python inference_short.py --checkpoint ./checkpoints/voice-restore-20d-16h-optim.pt --input test_input.wav --output test_output.wav --steps 32 --cfg_strength 0.5
  ```

  This will process `test_input.wav` and save the result as `test_output.wav`.

- Run a long-form restoration, which uses window chunking:

  ```bash
  python inference_long.py --checkpoint ./checkpoints/voice-restore-20d-16h-optim.pt --input test_input_long.wav --output test_output_long.wav --steps 32 --cfg_strength 0.5 --window_size_sec 10.0 --overlap 0.25
  ```

  This will process `test_input_long.wav` (you need to provide it) and save the result as `test_output_long.wav`.
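The `--window_size_sec` and `--overlap` flags suggest a sliding-window scheme: the recording is split into overlapping chunks, each chunk is restored independently, and the results are cross-faded back together. The repository's exact implementation is not shown here; the following is a hedged sketch of one common overlap-add approach, where `restore_chunk` is a hypothetical stand-in for the per-window model call:

```python
import numpy as np

def chunk_bounds(n_samples, window_size_sec=10.0, overlap=0.25, sr=16_000):
    """(start, end) sample indices of overlapping windows covering the signal."""
    win = min(int(window_size_sec * sr), n_samples)
    hop = max(int(win * (1.0 - overlap)), 1)
    starts = list(range(0, n_samples - win + 1, hop))
    if starts[-1] + win < n_samples:          # make sure the tail is covered
        starts.append(n_samples - win)
    return [(s, s + win) for s in starts]

def restore_long(audio, restore_chunk, window_size_sec=10.0, overlap=0.25, sr=16_000):
    """Run `restore_chunk` (hypothetical per-window model call) over
    overlapping windows and cross-fade the results back together."""
    n = len(audio)
    bounds = chunk_bounds(n, window_size_sec, overlap, sr)
    win = bounds[0][1] - bounds[0][0]
    fade = win - max(int(win * (1.0 - overlap)), 1)   # samples shared by neighbors
    out = np.zeros(n)
    norm = np.zeros(n)
    for s, e in bounds:
        restored = restore_chunk(audio[s:e])
        w = np.ones(e - s)
        if fade > 0:
            ramp = np.linspace(0.0, 1.0, fade + 2)[1:-1]  # strictly positive fade
            if s > 0:
                w[:fade] = ramp           # fade in, except at the signal start
            if e < n:
                w[-fade:] = ramp[::-1]    # fade out, except at the signal end
        out[s:e] += restored * w
        norm[s:e] += w
    return out / norm                     # weighted average in overlap regions
```

With an identity `restore_chunk`, the cross-fade reconstructs the input exactly, which is a useful sanity check before plugging in a real model.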
To restore your own audio files:

```python
from model import OptimizedAudioRestorationModel

model = OptimizedAudioRestorationModel()

# input_audio: a waveform loaded beforehand (see inference_short.py for loading)
restored_audio = model.forward(input_audio, steps=32, cfg_strength=0.5)
```
In a notebook (e.g., Google Colab):

```python
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt
```

```python
from transformers import AutoModel

# path to the cloned model folder (on Colab it is /content/VoiceRestore)
checkpoint_path = "/content/VoiceRestore"
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)
model("test_input.wav", "test_output.wav")
```
- Architecture: Flow-matching transformer
- Parameters: ~301M
- Input: Degraded speech audio (various formats supported)
- Output: Restored speech
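The `--steps` and `--cfg_strength` flags map onto the standard flow-matching recipe: a network predicts a velocity field, a sampler integrates it over a handful of Euler steps, and classifier-free guidance (CFG) blends conditional and unconditional predictions. Below is a toy sketch under those assumptions, with a stand-in velocity function and one common CFG formulation; the repository's actual sampler and conditioning may differ:

```python
import numpy as np

def euler_cfg_sample(velocity, x0, steps=32, cfg_strength=0.5):
    """Integrate a flow-matching ODE from t=0 to t=1 with `steps` Euler steps.

    `velocity(x, t, cond)` is the learned velocity field; cond=False gives the
    unconditional prediction used for classifier-free guidance.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = velocity(x, t, cond=True)
        v_uncond = velocity(x, t, cond=False)
        # one common CFG formulation: extrapolate past the conditional estimate
        v = v_cond + cfg_strength * (v_cond - v_uncond)
        x = x + dt * v
    return x

# Toy velocity field: the flow drifts toward a fixed "clean" target.
target = np.array([1.0, -2.0, 0.5])

def toy_velocity(x, t, cond=True):
    v = target - x
    return v if cond else 0.5 * v   # weaker drift without conditioning

x0 = np.zeros(3)
x1 = euler_cfg_sample(toy_velocity, x0, steps=32, cfg_strength=0.5)
```

More steps trade compute for integration accuracy, which is why the example commands above use 16 to 32 steps rather than hundreds.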
- Current model is optimized for speech; may not perform optimally on music or other audio types.
- Ongoing research to improve performance on extreme degradations.
- Future updates may include real-time processing capabilities.
If you use VoiceRestore in your research, please cite our paper:
```bibtex
@article{kirdey2024voicerestore,
  title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
  author={Kirdey, Stanislav},
  journal={arXiv},
  year={2024}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.
- Based on the E2-TTS implementation by Lucidrains
- Special thanks to the open-source community for their invaluable contributions.