Jun-Hak Yun, Seung-Bin Kim, Seong-Whan Lee
```bash
git clone https://github.com/jjunak-yun/FLowHigh_code.git
cd FLowHigh_code
pip install -r requirements.txt
```
- Download the VCTK dataset.
- Remove speakers `p280` and `p315` from the dataset.
- Create a `train` directory and a `test` directory, then split the dataset accordingly (a minimal split sketch follows this list).
- Update the `data_path` in the `configs/config.json` file with the path to your newly created `train` directory.
- To adjust the training conditions, modify the `configs/config.json` file according to your preferences.
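The exact train/test split is left to the user. Below is a minimal Python sketch of the preprocessing steps above; the paths, the held-out speaker count, and the use of `shutil.copytree` are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative preprocessing sketch: exclude p280/p315 and split VCTK
# into train/ and test/ by speaker. Paths and split size are assumptions.
import random
import shutil
from pathlib import Path

VCTK_ROOT = Path("/PATH/VCTK-Corpus/wav48")  # hypothetical download location
OUT_ROOT = Path("/PATH/dataset")             # hypothetical output location
EXCLUDED = {"p280", "p315"}                  # speakers removed per the steps above

speakers = sorted(d for d in VCTK_ROOT.iterdir()
                  if d.is_dir() and d.name not in EXCLUDED)
random.seed(0)
random.shuffle(speakers)

n_test = 8  # arbitrary held-out speaker count
for split, spk_dirs in [("test", speakers[:n_test]), ("train", speakers[n_test:])]:
    for spk in spk_dirs:
        shutil.copytree(spk, OUT_ROOT / split / spk.name)

# Afterwards, point "data_path" in configs/config.json to /PATH/dataset/train.
```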
```bash
CUDA_VISIBLE_DEVICES=0 python train.py
```
- `FLowHigh_indep_adaptive_400k`: The main model from our paper, an audio super-resolution model designed to reconstruct low-resolution audio into high-resolution audio at 48 kHz.
- `FLowHigh_basic_400k`: A model that adopts a conditional probability path based on basic flow matching; it likewise reconstructs low-resolution audio into high-resolution audio at 48 kHz.
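For reference, here is a minimal sketch of the two standard conditional probability path constructions these checkpoints are named after, assuming the formulations from the papers cited in the parameter table below (Lipman et al. for `basic_cfm`, Tong et al. for independent CFM). The adaptive variant used by the main model is defined in our paper; see the TorchCFM-based training code for the exact implementation.

```python
# Sketch of the standard conditional probability paths (not this
# repository's exact code). Given data x1 (and a source sample x0),
# each returns a time t, a point x_t on the path, and the velocity
# target that the vector-field network regresses.
import torch

def basic_cfm(x1, sigma):
    """Basic flow matching (Lipman et al., 2022)."""
    x0 = torch.randn_like(x1)                 # Gaussian source sample
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1))
    xt = (1 - (1 - sigma) * t) * x0 + t * x1  # conditional path
    target = x1 - (1 - sigma) * x0            # velocity target
    return t, xt, target

def independent_cfm_constant(x0, x1, sigma):
    """Independent coupling with a constant noise scale (Tong et al., 2023)."""
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1))
    mu = (1 - t) * x0 + t * x1                # straight-line interpolant
    xt = mu + sigma * torch.randn_like(mu)    # constant-sigma perturbation
    target = x1 - x0                          # velocity target
    return t, xt, target
```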
- Prepare the checkpoint of the trained model.
- Prepare a downsampled audio sample with a sampling rate lower than 48 kHz (e.g., 12 kHz, 16 kHz). Note: if you wish to match the experimental setup in our paper, use `scipy.signal.resample_poly()` to downsample the audio (a sketch is given after the command below).
- Run the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python inference.py \
--input_path {downsampled_audio_path} --output_path {save_output_audio_path} \
--target_sampling_rate 48000 --up_sampling_method scipy --architecture='transformer' \
--time_step 1 --ode_method={ode_solver} --cfm_method={cfm_path} --sigma 0.0001 \
--model_path {model_checkpoint_path} \
--n_layers 2 --n_heads 16 --dim_head 64 \
--n_mels 256 --f_max 24000 --n_fft 2048 --win_length 2048 --hop_length 480 \
--vocoder 'bigvgan' --vocoder_path='/PATH/vocoder/BIGVGAN/checkpoint/g_48_00850000' \
--vocoder_config_path='/PATH/vocoder/BIGVGAN/config/bigvgan_48khz_256band_config.json'
```
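As noted above, the paper's setup downsamples with `scipy.signal.resample_poly`. Here is a minimal sketch of preparing a low-resolution input; the file names and the `soundfile` I/O library are illustrative assumptions, not requirements of this repository.

```python
# Create a low-resolution test input via polyphase resampling, matching
# the paper's stated setup. File names and the soundfile library are
# illustrative choices.
import math
import soundfile as sf
from scipy.signal import resample_poly

wav, sr = sf.read("speech_48k.wav")  # source audio, e.g. 48 kHz
target_sr = 16000                    # e.g., 12 kHz or 16 kHz
g = math.gcd(target_sr, sr)
lowres = resample_poly(wav, up=target_sr // g, down=sr // g)
sf.write("speech_16k.wav", lowres, target_sr)
```

The resulting file is then passed as `--input_path`.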
| Parameter Name | Description |
|---|---|
| `--time_step` | The number of steps for solving the ODE (ordinary differential equation). In our paper, we utilized a single-step approach (`time_step=1`). While increasing `time_step` generally enhances quality, in our case the improvement was not significantly noticeable. |
| `--ode_method` | Choose between `euler` and `midpoint`. The `midpoint` method improves performance but doubles the NFE (number of function evaluations). Recommendation: despite the increase in NFE, we recommend the `midpoint` method for better performance. Note: the choice of `ode_method` is independent of the training settings. |
| `--cfm_method` | Sets the conditional probability path. In our paper, we used the path `independent_cfm_adaptive`. Other available options include `basic_cfm` (https://arxiv.org/abs/2210.02747) and `independent_cfm_constant` (https://arxiv.org/abs/2302.00482). |
| `--sigma` | Influences the path setting. Ensure you use the same value for `sigma` as was used during training. |
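To make the `time_step`/`ode_method` trade-off concrete, here is a minimal solver sketch; the vector-field network `vf(x, t)` is an illustrative signature, not the repository's actual interface.

```python
# Fixed-step ODE integration from t=0 to t=1 for a learned vector
# field vf(x, t). Euler costs 1 NFE per step; midpoint costs 2, which
# is why midpoint doubles the NFE for the same number of steps.
import torch

def solve_ode(vf, x0, n_steps=1, method="midpoint"):
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt)
        if method == "euler":
            x = x + dt * vf(x, t)                 # 1 evaluation
        else:
            x_mid = x + 0.5 * dt * vf(x, t)       # evaluation 1
            x = x + dt * vf(x_mid, t + 0.5 * dt)  # evaluation 2
    return x
```

With `time_step=1`, `euler` uses a single network evaluation while `midpoint` uses two.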
- add base training code
- add requirements.txt
- upload pre-trained checkpoint for independent_cfm_adaptive
- upload pre-trained checkpoint for basic_cfm
- optimize the training speed
This implementation was developed based on the following repositories:
- Voicebox (unofficial PyTorch implementation): https://github.com/lucidrains/voicebox-pytorch.git (for the architecture backbone)
- Fre-painter: https://github.com/FrePainter/code.git (for the audio super-resolution implementation)
- TorchCFM: https://github.com/atong01/conditional-flow-matching.git (for the CFM logic)
- BigVGAN: https://github.com/NVIDIA/BigVGAN.git (for the pre-trained vocoder)
- NU-Wave 2: https://github.com/maum-ai/nuwave2.git (for data processing)