# CL-MASR: A Continual Learning Benchmark for Multilingual ASR

This is the official benchmark platform accompanying the paper CL-MASR: A Continual Learning Benchmark for Multilingual ASR.

It includes scripts to train Whisper- and WavLM-based ASR systems on a subset of 20 languages selected from Common Voice 13 in a continual learning fashion, using a variety of methods including rehearsal-based, architecture-based, and regularization-based approaches.

The goal is to continually learn new languages while limiting forgetting of the previously learned ones. An ideal method should achieve both positive forward transfer (i.e. improve performance on new tasks by leveraging shared knowledge from previous tasks) and positive backward transfer (i.e. improve performance on previous tasks by leveraging shared knowledge from new tasks).
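As a rough illustration, backward and forward transfer can be computed from a matrix of per-language WERs. The sketch below adapts the standard accuracy-based definitions to WER (lower is better); the exact formulas used by the benchmark are implemented in `analyze_logs.py`, and the names `wer` and `refs` here are hypothetical.

```python
def average_bwt(wer):
    """Average backward transfer, adapted for WER (lower is better).

    wer[i][j] = WER (%) on language j after training up to language i.
    Negative values indicate forgetting of previously learned languages.
    """
    T = len(wer) - 1  # index of the last learned language
    return sum(wer[i][i] - wer[T][i] for i in range(T)) / T


def average_fwt(wer, refs):
    """Average forward transfer, where refs[i] is a reference WER for
    language i (e.g., the pretrained model evaluated on it zero-shot).

    Positive values mean learning earlier languages helped the new one.
    """
    T = len(wer) - 1
    return sum(refs[i] - wer[i - 1][i] for i in range(1, T + 1)) / T
```

For instance, with two languages, `average_bwt` reduces to the WER on the first language right after learning it minus the WER on it after also learning the second, so a model that forgets gets a negative score.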

The following methods have been implemented so far:

- Rehearsal-based: experience replay (ER), averaged gradient episodic memory (A-GEM), and dark experience replay (DER)
- Architecture-based: progressive neural networks (PNN), Piggyback (PB), and learning to prompt (L2P)
- Regularization-based: elastic weight consolidation (EWC), learning without forgetting (LwF), and memory aware synapses (MAS)

A naive fine-tuning (FT) baseline is also included.
## ⚡ Dataset [download]

The dataset used for the CL-MASR benchmark is extracted from Common Voice 13 (see the reference paper). Each of the 20 languages includes approximately 10 hours of training material, plus 1 hour for validation and 1 hour for testing.
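For budgeting disk space and compute, the per-language figures above imply the following overall amount of audio (a back-of-the-envelope estimate, not an official figure):

```python
# Approximate total audio in the benchmark, from the per-language figures.
n_languages = 20
hours_per_language = 10 + 1 + 1  # train + validation + test
total_hours = n_languages * hours_per_language
print(total_hours)  # 240 hours of audio overall
```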

Download the dataset from here and extract it to a data folder of your choice (CL-MASR by default).


## 🛠️ Installation

To set up the benchmark, clone the benchmark repository and install SpeechBrain:

```shell
git clone https://github.com/speechbrain/benchmarks.git
cd benchmarks
git submodule update --init --recursive
cd speechbrain
pip install -r requirements.txt
pip install -e .
```

## ▶️ Quickstart

### Running an experiment

Navigate to `<path-to-repository>/benchmarks/CL_MASR/<model>`, open a terminal, and run:

```shell
python train_<cl-method>.py hparams/train_<cl-method>.yaml --data_folder <path-to-data-folder>
```

NOTE: to reproduce the experiments with WavLM large, you need to download the checkpoint pretrained on the base languages from here.

NOTE: to profile the model (optional), install `ptflops` and `torchinfo` as additional dependencies.

NOTE: multi-GPU training is currently not supported.


### Analyzing the results

Navigate to `<path-to-repository>/benchmarks/CL_MASR`, open a terminal, and run:

```shell
python analyze_logs.py <path-to-folder-containing-model-logs>
```

This command recursively retrieves and analyzes all log files named according to the format `<cl-method>_base=<comma-separated-base-locales>_new=<comma-separated-new-locales>.txt` (the default naming convention followed in all the training scripts). The resulting performance metric summaries and plots are written to `<path-to-folder-containing-model-logs>`. See the help (`python analyze_logs.py -h`) for advanced configuration options.

NOTE: make sure to specify the `--im_refs` and `--fwt_refs` arguments corresponding to the given model (they default to Whisper large-v2).

NOTE: to plot the results (optional), install `matplotlib` and/or `plotly` as additional dependencies.
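To illustrate the log naming convention, here is a hypothetical helper (not part of the benchmark, which handles this internally in `analyze_logs.py`) that parses a log file name into its method and locale components:

```python
import re

# Pattern mirroring the documented convention:
# <cl-method>_base=<comma-separated-base-locales>_new=<comma-separated-new-locales>.txt
LOG_NAME = re.compile(r"^(?P<method>.+)_base=(?P<base>[^_]+)_new=(?P<new>[^_]+)\.txt$")


def parse_log_name(filename):
    """Return (method, base locales, new locales), or None if the file
    name does not follow the convention."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    return match["method"], match["base"].split(","), match["new"].split(",")
```

For example, `parse_log_name("er_base=en,de_new=fr,it.txt")` yields `("er", ["en", "de"], ["fr", "it"])`, while a file that does not follow the convention yields `None`.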


## 📈 Results

| Release | Hyperparameters | Average AWER | Average BWT | Average IM | Average FWT | Logs | GPUs |
|---|---|---|---|---|---|---|---|
| 07-06-23 | `whisper/hparams/train_ft.yaml` | 98.50 | -84.58 | -4.16 | -0.83 | Link | 1xV100 32GB |
| 07-06-23 | `whisper/hparams/train_er.yaml` | 50.83 | -13.20 | -0.81 | -4.17 | Link | 1xV100 32GB |
| 07-06-23 | `whisper/hparams/train_agem.yaml` | 81.08 | -55.85 | 0.20 | -5.19 | Link | 1xV100 32GB |
| 01-10-23 | `whisper/hparams/train_der.yaml` | 67.84 | -41.28 | -4.29 | - | Not available | 1xV100 32GB |
| 07-06-23 | `whisper/hparams/train_pnn.yaml` | 44.12 | 0.00 | 3.18 | -8.16 | Link | 1xV100 32GB |
| 07-06-23 | `whisper/hparams/train_pb.yaml` | 43.95 | 0.00 | 3.51 | -8.50 | Link | 1xV100 32GB |
| 01-10-23 | `whisper/hparams/train_l2p.yaml` | 114.65 | 0.00 | 110.50 | - | Not available | 1xV100 32GB |
| 07-06-23 | `whisper/hparams/train_ewc.yaml` | 98.04 | -68.32 | 2.87 | -7.85 | Link | 1xV100 32GB |
| 07-06-23 | `whisper/hparams/train_lwf.yaml` | 95.76 | -77.50 | 0.00 | -4.98 | Link | 1xV100 32GB |
| 01-10-23 | `whisper/hparams/train_mas.yaml` | 68.08 | -0.58 | 38.62 | - | Not available | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_ft.yaml` | 91.61 | -54.67 | -10.19 | -0.21 | Link | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_er.yaml` | 60.79 | -8.96 | -7.62 | -2.77 | Link | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_agem.yaml` | 72.54 | 13.59 | 35.29 | -45.69 | Link | 1xV100 32GB |
| 01-10-23 | `wavlm/hparams/train_der.yaml` | 71.22 | -16.64 | -3.21 | - | Not available | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_pnn.yaml` | 66.07 | 0.00 | 12.95 | -23.34 | Link | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_pb.yaml` | 61.87 | 0.00 | 2.75 | -13.15 | Link | 1xV100 32GB |
| 01-10-23 | `wavlm/hparams/train_l2p.yaml` | 92.72 | 0.00 | 52.11 | - | Not available | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_ewc.yaml` | 86.98 | -39.54 | -4.26 | -6.13 | Link | 1xV100 32GB |
| 07-06-23 | `wavlm/hparams/train_lwf.yaml` | 87.17 | -26.03 | 10.42 | -20.82 | Link | 1xV100 32GB |
| 01-10-23 | `wavlm/hparams/train_mas.yaml` | 83.06 | -1.37 | 33.22 | - | Not available | 1xV100 32GB |

Raw experiment logs are available here. We do not include the checkpoints due to storage limits (each experiment with Whisper large-v2 generates ~125 GB of checkpoint data).

Analyses generated via `analyze_logs.py` are available here.

All the experiments were run on 5 CentOS Linux machines, each with an Intel(R) Xeon(R) Silver 4216 Cascade Lake CPU (32 cores @ 2.10 GHz), 64 GB of RAM, and an NVIDIA Tesla V100 SXM2 (32 GB) with CUDA Toolkit 11.4. With this hardware configuration, approximately 10 days are needed to complete all the experiments.


## Citing

If you use the CL-MASR benchmark, please cite:

```bibtex
@article{dellalibera2024clmasr,
  author  = {{Della Libera}, Luca and Mousavi, Pooneh and Zaiem, Salah and Subakan, Cem and Ravanelli, Mirco},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title   = {{CL-MASR}: A Continual Learning Benchmark for Multilingual {ASR}},
  year    = {2024},
  volume  = {32},
  number  = {},
  pages   = {4931--4944},
  doi     = {10.1109/TASLP.2024.3487410}
}
```

If you use SpeechBrain, please cite the reference paper:

```bibtex
@article{ravanelli2024open,
  author  = {Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca {Della Libera} and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Ha Nguyen and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Ga{{\"e}}lle Laperri{{\`e}}re and Mickael Rouvier and Renato De Mori and Yannick Est{{\`e}}ve},
  title   = {Open-Source Conversational {AI} with {SpeechBrain} 1.0},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {333},
  pages   = {1--11},
  url     = {http://jmlr.org/papers/v25/24-0991.html}
}

@article{ravanelli2021speechbrain,
  author  = {Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  title   = {{SpeechBrain}: A General-Purpose Speech Toolkit},
  journal = {arXiv preprint arXiv:2106.04624},
  year    = {2021},
  url     = {https://arxiv.org/abs/2106.04624}
}
```

## 📧 Contact

[email protected]