
CTC Beam Search - Torchaudio wrapper - CTC Prefix Beam Search and CTC Beam Search from scratch + kenlm #2011

Merged: mravanelli merged 171 commits into speechbrain:unstable-v0.6 from Adel-Moumen:ctc-frame-sync-bs on Sep 20, 2023

Adel-Moumen (Collaborator) commented on May 31, 2023

The goal of this PR is to add CTC frame-synchronous beam search to SpeechBrain. It supports kenLM scorers and works out of the box with either a SentencePiece tokeniser or a CTCLabelEncoder.
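For context on the kenLM support: a common way to combine CTC and n-gram LM scores during beam search is shallow fusion, where a hypothesis y for input x is ranked by the score below. This is the general textbook form (PyCTCDecode exposes the two weights as `alpha` and `beta`); treat it as an assumption about the approach, not as the exact scoring code in this PR.

```latex
\operatorname{score}(y) = \log P_{\text{CTC}}(y \mid x) + \alpha \, \log P_{\text{LM}}(y) + \beta \, |y|_{\text{words}}
```

Here α weights the language model and β is a per-word bonus that offsets the LM's preference for short transcripts; both are usually tuned on a dev set.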

Some parts of the code are taken from PyCTCDecode and modified (see: https://github.com/kensho-technologies/pyctcdecode).
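To make the underlying algorithm concrete, here is a minimal, LM-free sketch of the textbook CTC prefix beam search in plain Python. It assumes per-frame log-probabilities given as a T x V nested list with the blank at index 0; the function names and arguments are illustrative, and this is not the implementation added in this PR (which adapts PyCTCDecode and adds kenLM scoring). Each prefix keeps two scores, one for alignments ending in blank and one for alignments ending in a non-blank label, which is what lets repeated labels separated by a blank survive the CTC collapse.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over scalar log-probabilities."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, blank=0, beam_size=4):
    """Decode a T x V nested list of per-frame log-probabilities.

    Each beam entry maps a prefix (tuple of token ids) to a pair
    (log P(prefix, last frame blank), log P(prefix, last frame non-blank)).
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            last = prefix[-1] if prefix else None
            for token, p in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix unchanged; it now ends in blank.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + p, p_nb + p), nb)
                    continue
                ext = prefix + (token,)
                b, nb = nxt[ext]
                if token == last:
                    # A repeated label only starts a new symbol after a blank...
                    nxt[ext] = (b, logsumexp(nb, p_b + p))
                    # ...otherwise it collapses into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + p))
                else:
                    nxt[ext] = (b, logsumexp(nb, p_b + p, p_nb + p))
        # Prune to the beam_size prefixes with the highest total probability.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best), logsumexp(*scores)
```

With three frames that favour label 1, blank, then label 2, the best prefix is `[1, 2]`, and its score sums every alignment that collapses to it (for example `1 _ 2`, `1 1 2` and `1 2 2`), which is the key difference from a plain best-path decode.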

How to test it?

Download our pretrained CTC wav2vec2 model from our Dropbox:

cd recipes/LibriSpeech/ASR/CTC/
wget -O wav2vec.zip https://www.dropbox.com/sh/qj2ps85g8oiicrj/AAAxlkQw5Pfo0M9EyHMi8iAra?dl=1
unzip wav2vec.zip -d wav2vec2_ctc

Run the recipe

# make sure that the `output_folder` name matches the pretrained folder.
python3 train_with_wav2vec.py hparams/train_hf_wav2vec.yaml --data_folder=path

Results

Please see the updated README.md in the PR.

To do:

  • make it equivalent to the official CTC Prefix Beam Search paper
  • documentation
  • clean
  • report results
  • add Torchaudio CPU decoding
  • add Torchaudio GPU decoding
  • add test examples for each one
  • modify all CTC recipes and use the new Beam Search?
  • refactor the kenLM language model
  • blank skip heuristic
  • kenLM support for BS
  • kenLM support for CTC Prefix BS
  • SentencePiece compatibility with CTC Prefix BS and BS
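The blank skip heuristic in the list targets frames where the blank is almost certain: expanding every token on such frames costs time but rarely changes the ranking. The sketch below shows one possible variant under stated assumptions: `blank_skip_threshold` and `frames_to_expand` are hypothetical names, and keeping the first frame of each blank-dominated run is a design choice made here so that a blank still separates genuine repeats (such as the double `l` in "hello"). SpeechBrain's actual heuristic may differ.

```python
import math


def frames_to_expand(log_probs, blank=0, blank_skip_threshold=0.999):
    """Return the indices of frames a beam search should expand.

    `blank_skip_threshold` is a hypothetical parameter: a frame counts as
    blank-dominated when P(blank) > threshold. Only the first frame of each
    blank-dominated run is kept, so a blank still separates repeated labels.
    A threshold of 1.0 disables skipping.
    """
    log_threshold = math.log(blank_skip_threshold)
    kept = []
    prev_blank = False
    for t, frame in enumerate(log_probs):
        is_blank = frame[blank] > log_threshold
        if is_blank and prev_blank:
            continue  # drop the rest of a blank-dominated run
        kept.append(t)
        prev_blank = is_blank
    return kept
```

Each skipped frame would have contributed roughly log P(blank), which is close to 0, to every path, so dropping it changes scores only slightly; the threshold controls the speed/accuracy trade-off.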

@Adel-Moumen Adel-Moumen changed the base branch from develop to ctc-prefix-beamsearch May 31, 2023 13:57
@Adel-Moumen Adel-Moumen changed the base branch from unstable-v0.6 to develop August 22, 2023 12:49
@Adel-Moumen Adel-Moumen changed the base branch from develop to unstable-v0.6 August 22, 2023 12:49
Adel-Moumen (Collaborator, Author) commented:

Some updates regarding this PR:

I think we can now extend the beam searcher to all the other recipes using CTC (e.g., Whisper could be the first one).

I changed the following recipes: CommonVoice, Aishell, LibriSpeech, and Switchboard. I can't change the other recipes because of the space_token argument. On Media, for instance, I don't know what this value should be, and I prefer not to change anything in case it harms the results.
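The `space_token` argument likely matters because a word-level n-gram LM such as kenLM scores whole words, so the searcher has to know where one word ends and the next begins. A minimal sketch of that grouping, assuming either a character vocabulary with an explicit space token or SentencePiece pieces that start a new word with the `▁` (U+2581) marker; `tokens_to_words` and `spm_token` are hypothetical names used for illustration, not SpeechBrain APIs.

```python
def tokens_to_words(tokens, space_token=" ", spm_token="\u2581"):
    """Group decoded tokens into words for word-level LM scoring.

    Character vocabularies mark boundaries with an explicit space token;
    SentencePiece pieces instead start a new word with the U+2581 marker.
    """
    words, current = [], ""
    for tok in tokens:
        if tok == space_token:
            # Explicit boundary token: close the current word.
            if current:
                words.append(current)
            current = ""
        elif tok.startswith(spm_token):
            # SentencePiece word-initial piece: close the previous word.
            if current:
                words.append(current)
            current = tok[len(spm_token):]
        else:
            current += tok
    if current:
        words.append(current)
    return words
```

If a recipe's vocabulary uses a different boundary symbol, passing the wrong `space_token` would merge or split words before LM scoring, which is consistent with the caution about changing recipes whose boundary token is unknown.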

One note: beam search does not lead to any improvement on Aishell. I tried TorchAudio/CTCBeamSearch/CTCPrefixBeamsearch with and without kenLM, and none of them yields any gains.

We can add some full-inference recipe tests for the CTC beam searchers. As mentioned privately, we can take this opportunity to make the inference tests faster by downloading a version of LibriSpeech test-clean with only a few sentences.

Done.

I would retrain wav2vec2 CTC because of the dimensionality issues we have with the previous checkpoints.

Done.

I am currently running the recipe tests. I will report shortly if everything went fine.

Adel-Moumen (Collaborator, Author) commented:

I am currently running the recipe tests. I will report shortly if everything went fine.

Everything went fine!

mravanelli (Collaborator) commented:

I ran the recipe tests again, and everything seems to work. The only tests that fail are the direct recipes in Timers and Such, SLURP, and Fluent Speech Commands. This is expected because they use the pattern:

asr_model: !apply:speechbrain.pretrained.EncoderDecoderASR.from_hparams
    source: speechbrain/asr-crdnn-rnnlm-librispeech
    run_opts: {"device":"cuda:0"}

which calls code that is not compatible with this version. When we release the new version, we will have to update the HF repo accordingly, and the issues will be fixed.

I think we can finally merge this PR. This is amazing work, @Adel-Moumen, and a fundamental step toward SpeechBrain 1.0!

@mravanelli mravanelli merged commit 2a51a4c into speechbrain:unstable-v0.6 Sep 20, 2023

3 participants