SpeechBrain 0.6.0 #751

Merged
mravanelli merged 348 commits into unstable-v0.6 from ctc-prefix-beamsearch
Jul 31, 2023
@30stomercury

@30stomercury 30stomercury commented May 11, 2021

Collaborator

The goal of this PR is to support pure ctc training and decoding (beam search). Users can set ctc_weight: 1 and ctc_weight_decode: 1 to perform pure ctc training and beamsearch.

Here are the results I got (CTC with transformerlm):

WER 5.22 [ 2742 / 52576, 440 ins, 343 del, 1959 sub ] on test-clean
WER 12.41 [ 6494 / 52343, 1041 ins, 762 del, 4691 sub ] on test-other

To-dos:

  • Integrate N-gram LM interface in arpa format.
  • Run ctc, joint ctc/att decoding (with and without LM) after modification.

log_probs[:, self.eos_index] = self.minus_inf

# Set the eos prob to minus_inf when it doesn't exceed threshold.
if self.using_eos_threshold:

@30stomercury 30stomercury May 11, 2021

Collaborator Author

I modified this part. We should check the eos threshold after all scorers (attn scorer, ctc scorer and lm scorer) have been applied; otherwise the emission of eos is based only on the output of the attn scorer.
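A minimal sketch of why the order matters, on plain Python lists rather than tensors. The function name, the toy scores, and the exact comparison rule (EOS must exceed `eos_threshold` times the best score) are assumptions made for illustration, not the code of this PR.

```python
MINUS_INF = -1e20  # stand-in for the decoder's minus_inf constant


def combine_then_gate_eos(scorer_outputs, weights, eos_index, eos_threshold):
    """Sum the weighted log-probs of all scorers, then gate EOS once."""
    vocab_size = len(next(iter(scorer_outputs.values())))
    combined = [0.0] * vocab_size
    for name, log_probs in scorer_outputs.items():
        for token, log_prob in enumerate(log_probs):
            combined[token] += weights[name] * log_prob

    # Assumed rule: EOS survives only if its combined score is greater than
    # eos_threshold times the best combined score (all scores are <= 0).
    if combined[eos_index] <= eos_threshold * max(combined):
        combined[eos_index] = MINUS_INF
    return combined


attn = [-2.0, -1.5, -0.5]  # attention alone favours EOS (index 2)
ctc = [-0.2, -3.0, -6.0]   # CTC considers EOS very unlikely

attn_only = combine_then_gate_eos({"attn": attn}, {"attn": 1.0}, 2, 1.5)
joint = combine_then_gate_eos(
    {"attn": attn, "ctc": ctc}, {"attn": 1.0, "ctc": 1.0}, 2, 1.5
)
```

With these toy numbers, EOS passes the check when only the attention scores are considered, but is suppressed once the CTC scores are added before the check.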

@30stomercury 30stomercury added the bug (Something isn't working) and work in progress (Not ready for merge) labels May 11, 2021
@TParcollet

Collaborator

Thanks @30stomercury, I guess that integrating ARPA will be harder, as it operates at the word level and not the BPE level?

@30stomercury

30stomercury commented May 11, 2021

Collaborator Author

Thanks @TParcollet, I'm investigating kenlm to see whether we can have BPE in ARPA format. I will add some to-dos later.

@TParcollet

TParcollet commented May 11, 2021

Collaborator

So, it would be great to be able to manage word-level ARPA. @Antoine-Caubriere did some work in that direction with the ctc_decode from DeepSpeech, but it's a standalone tool. However, it operates at the word level, which could interest us more than BPE-level ARPA (imo).

@30stomercury

Collaborator Author

I see, I will take a look.

@TParcollet TParcollet removed the bug (Something isn't working) label May 18, 2021
@mravanelli

Collaborator

@30stomercury, what's the status of this PR? I think the output of this work should be compared with #773, right?

@30stomercury 30stomercury changed the title from "Add pure ctc training/decoding" to "Refactor Decoders" Jun 1, 2021
@30stomercury 30stomercury added the refactor (Edit code without changing functionality) label Jun 1, 2021
@30stomercury

30stomercury commented Jun 1, 2021

Collaborator Author

Hi @mravanelli and @TParcollet, this is the current status of this PR.

  • Support NgramScorer with kenlm.
    • full scorer mode
    • partial scorer mode
  • Compare our ctcscorer + ngramscorer with ctcdecode.
  • Support pure ctc decoding using CTCPrefixScorer.
  • Run decoding experiments on: seq2seq + ngram lm, ctc + ngram lm.
    • AIShell
    • LibriSpeech
  • A Global Scorer for beam searcher, see speechbrain/decoders/scorer.py.
  • Implementation of the top-k hypothesis output.

@Gastron

Gastron commented Jun 2, 2021

Collaborator

Hey, should we implement the top-k hypothesis output (as discussed in PR #761) in this PR instead?

@30stomercury

Collaborator Author

@Gastron Sure, we can do it here. I will add it to the to-dos.

@mravanelli

Collaborator

Hi @30stomercury,
thank you very much for this PR! I like the idea of making beamsearch more modular by separating the "search" part from the "rescoring" one.
Let me share here some comments (already shared privately):

  1. kenlm shouldn't be mandatory. I suggest importing it only where needed and raising an error if it is not installed.
  2. add the docstrings in scorer.py (make sure to include runnable examples)
  3. It looks like recipes/AISHELL-1/ASR/seq2seq/train.py contains weird things like loss, loss_seq, loss_ctc = 0.0, 0.0, 0.0, # np.save('token_list.npy', token_list). I guess this is still a work in progress, right?
  4. ScorerBuilder => it is not clear to me how we manage the scoring with an RNNLM or Transformer LM. I would have expected a type of scorer similar to NGramScorer (maybe called NeuralLMScorer) in scorer.py. Is that part missing?
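For point 1, one possible way to keep kenlm optional is to import it lazily, as sketched below. `require_optional` and `LazyNGramScorer` are hypothetical names used only for illustration; the only kenlm call assumed is `kenlm.Model(path)`.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency lazily, with a readable error if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} needs the optional package '{module_name}'. "
            "Install it to use this feature."
        ) from err


class LazyNGramScorer:
    """Hypothetical scorer that only touches kenlm when it is constructed."""

    def __init__(self, lm_path):
        # The import happens here, so users who never build this scorer
        # do not need kenlm installed.
        kenlm = require_optional("kenlm", "LazyNGramScorer")
        self.lm = kenlm.Model(lm_path)
```

Importing the module that defines the scorer then stays cheap, and the error only appears when someone actually requests n-gram scoring.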

@mravanelli

Collaborator

Let me also put on the table one idea for the ScorerBuilder. At this moment it is a bit hardcoded and not that scalable to other scorers possibly defined by the users. My suggestion to make everything even more modular is the following:

  1. Define all the scorers in the yaml file (e.g, CTCPrefixScorer, NGramScorer, CoveragePenalty).
  2. Instantiate the builder in the yaml file as well, such that it takes a list of scorer objects:
scorer: !new:speechbrain.decoders.scorer.ScorerBuilder
[
CTCPrefixScorer,
NGramScorer,
CoveragePenalty
]

This is something quite similar to the way we manage the augmentation pipeline in some recipes (e.g, https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn.yaml#L122). The advantage is that everything looks more transparent and modular. Users, for instance, can define a new scorer without changing the ScorerBuilder.

What do you think?
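A rough sketch of the idea, assuming every scorer exposes a single `score(log_probs)` method. The class names and this simplified interface are hypothetical and much smaller than what a real beam searcher needs.

```python
class ConstantBonus:
    """Hypothetical user-defined scorer: the same bonus for every token."""

    def __init__(self, bonus):
        self.bonus = bonus

    def score(self, log_probs):
        return [self.bonus] * len(log_probs)


class TokenPenalty:
    """Hypothetical user-defined scorer: penalise one specific token."""

    def __init__(self, token, penalty):
        self.token = token
        self.penalty = penalty

    def score(self, log_probs):
        return [
            -self.penalty if i == self.token else 0.0
            for i in range(len(log_probs))
        ]


class ListScorerBuilder:
    """Hypothetical builder that relies only on each scorer's score()."""

    def __init__(self, scorers):
        self.scorers = list(scorers)

    def score(self, log_probs):
        total = list(log_probs)
        for scorer in self.scorers:
            extra = scorer.score(log_probs)
            total = [t + e for t, e in zip(total, extra)]
        return total


builder = ListScorerBuilder([ConstantBonus(0.5), TokenPenalty(1, 2.0)])
combined = builder.score([-1.0, -2.0, -3.0])
```

Adding a new scorer class here does not require touching `ListScorerBuilder`, which is the property the list-based design is after.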

@30stomercury

Collaborator Author

Hi @mravanelli, thank you for your suggestions.
1, 2. I don't currently have a better way to manage the import, so I will finish the docstrings first.
3. Yes, that part is still in progress. I would like to make LibriSpeech converge first.
4. I think we can move the neural LM part to the scorer. I didn't do it because moving the RNNLM or Transformer LM to the scorer involves a lot of changes, and I'm not sure other developers are okay with that.

@mravanelli

Collaborator

As for point 4, this PR already requires us to change all the ASR recipes. So we can really take this opportunity to do the change that adds the NeuralLMScorer as well.

@30stomercury

Collaborator Author

Hi @mravanelli ,
I propose two ways to manage scorers in yaml. The second approach is similar to the way you suggested.

# Method1

scorer: !new:speechbrain.decoders.scorer.ScorerBuilder
   eos_index: !ref <eos_index>
   blank_index: !ref <blank_index>
   vocab_size: !ref <output_neurons>
   ctc_weight: !ref <ctc_decode_weight>
   #ngram_weight: !ref <ngram_weight>
   transformerlm_weight: !ref <lm_weight>
   coverage_weight: !ref <coverage_penalty>
   ctc_score_mode: !ref <ctc_score_mode>
   ctc_linear: !ref <ctc_lin>
   #lm_path: !ref <ngram_model>
   #tokenizer: !ref <save_folder>/tokenizer.ckpt
   transformerlm: !ref <lm_model>


# Method2

ctc_scorer: !new:speechbrain.decoders.scorer.CTCScorer
   eos_index: !ref <eos_index>
   blank_index: !ref <blank_index>
   ctc_fc: !ref <ctc_lin>

coverage_scorer: !new:speechbrain.decoders.scorer.CoverageScorer
   vocab_size: !ref <output_neurons>

transformerlm_scorer: !new:speechbrain.decoders.scorer.TransformerLMScorer
   language_model: !ref <lm_model>

scorer2: !new:speechbrain.decoders.scorer.ScorerBuilder2
   ctc_weight: !ref <ctc_decode_weight>
   transformerlm_weight: !ref <lm_weight>
   coverage_weight: !ref <coverage_penalty>
   full_scorers: [
        !ref <transformerlm_scorer>,
        !ref <coverage_scorer> ]
   partial_scorers: [ !ref <ctc_scorer> ]

@30stomercury

30stomercury commented Jun 25, 2021

Collaborator Author

I will add a method to validate that users define scorers correctly. E.g., coverage_scorer should always be put into full_scorers, and ctc_weight should be 1.0 if ctc_scorer is put into the full_scorers list.
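Such a check could look roughly like the sketch below. The function name, the string keys, and the error messages are assumptions; only the two rules come from the comment above.

```python
def validate_scorers(full_scorers, partial_scorers, weights):
    """Raise ValueError if the scorer layout breaks either of the two rules."""
    # Rule 1: the coverage scorer must score the full vocabulary.
    if "coverage_scorer" in partial_scorers:
        raise ValueError("coverage_scorer must be listed in full_scorers.")

    # Rule 2: a CTC scorer used as a full scorer requires a CTC weight of 1.0.
    if "ctc_scorer" in full_scorers and weights.get("ctc_scorer") != 1.0:
        raise ValueError(
            "ctc_scorer in full_scorers requires a CTC weight of 1.0."
        )


# A layout that satisfies both rules passes silently.
validate_scorers(
    full_scorers=["transformerlm_scorer", "coverage_scorer"],
    partial_scorers=["ctc_scorer"],
    weights={"ctc_scorer": 0.4},
)
```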

@mravanelli

Collaborator

Hi @30stomercury. I like method2 more because it looks more modular and users can modify the scoring pipeline directly in the yaml file. To make it even more modular, I would suggest using a list of tuples where the first element is the scorer object and the second one is the weight of the scorer:

scorer2: !new:speechbrain.decoders.scorer.ScorerBuilder2
   full_scorers: [
        !ref (<transformerlm_scorer>,  <lm_weight>),
        !ref (<coverage_weight>,  <coverage_penalty>), ]
   partial_scorers: [ !ref (<ctc_scorer>, ctc_weight ) ]

As an alternative, we can link the scorer object with the corresponding weight in a way similar to what is done with the pretrainer:

scorer2: !new:speechbrain.decoders.scorer.ScorerBuilder2
   full_scorers:
      transformerlm_scorer: !ref <transformerlm_scorer>
      coverage_weight: !ref <coverage_weight>
   partial_scorers:
      ctc_scorer: !ref <ctc_scorer>
   weights:
       transformerlm_scorer: !ref <lm_weight>
       coverage_weight: !ref <coverage_penalty>
       ctc_scorer: !ref <ctc_weight>

The second one creates the link using a dictionary. It is less compact, but probably more elegant and more in line with what is done for the pretrainer. Maybe @Gastron and the others have some suggestions here. What do you think, @30stomercury?

Also, what is the difference between full scorers and partial ones?

@30stomercury

30stomercury commented Jun 25, 2021

Collaborator Author

Hi @mravanelli, the partial scorers score only the topk tokens, selected from the logprobs after the full scorers have been applied. Scoring all tokens in the vocabulary is too expensive for some scorers (e.g., the ngram scorer), so they only score the pruned tokens.

See candidates (pruned tokens) in the score() method. We score the logprobs with full scorers first, then partial scorers. I will add some descriptions for that.

    def score(self, inp_tokens, memory, attn, log_probs, beam_size):
        new_memory = dict()
        # score full candidates
        for k, impl in self.full_scorers.items():
            score, new_memory[k] = impl.score(inp_tokens, memory[k], None, attn)
            log_probs += score * self.weights[k]

        # select candidates for partial scorers
        _, candidates = log_probs.topk(int(beam_size * self.scorer_beam_scale), dim=-1)

        # score partial candidates
        for k, impl in self.partial_scorers.items():
            score, new_memory[k] = impl.score(
                inp_tokens, memory[k], candidates, attn
            )
            log_probs += score * self.weights[k]

        return log_probs, new_memory
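To make the full/partial distinction concrete, here is a toy version of the same two-stage flow for a single hypothesis, using plain lists and functions instead of tensors and scorer objects. All names and numbers are illustrative assumptions; `num_candidates` plays the role of `beam_size * scorer_beam_scale`.

```python
def score_step(log_probs, full_scorers, partial_scorers, weights, num_candidates):
    """One decoding step for a single hypothesis, on plain Python lists."""
    scores = list(log_probs)
    all_tokens = list(range(len(scores)))

    # Full scorers see every token in the vocabulary.
    for name, scorer in full_scorers.items():
        for token, value in scorer(all_tokens).items():
            scores[token] += weights[name] * value

    # Keep only the best-scoring tokens for the expensive scorers.
    ranked = sorted(all_tokens, key=lambda t: scores[t], reverse=True)
    candidates = ranked[:num_candidates]

    # Partial scorers only score the pruned candidates.
    for name, scorer in partial_scorers.items():
        for token, value in scorer(candidates).items():
            scores[token] += weights[name] * value

    return scores, candidates


seen_by_ngram = []  # records which tokens the expensive scorer was asked about


def cheap_lm(tokens):
    return {t: -0.1 * t for t in tokens}


def costly_ngram(tokens):
    seen_by_ngram.extend(tokens)
    return {t: -1.0 for t in tokens}


scores, candidates = score_step(
    log_probs=[-0.5, -1.0, -3.0, -4.0],
    full_scorers={"lm": cheap_lm},
    partial_scorers={"ngram": costly_ngram},
    weights={"lm": 1.0, "ngram": 0.5},
    num_candidates=2,
)
```

In this run the expensive scorer is queried for two tokens instead of four, and the scores of the pruned tokens are left as the full scorers produced them.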

I prefer using a list of tuples. But I think others have different ideas.

@mravanelli

mravanelli commented Jun 25, 2021 via email

Collaborator

@30stomercury

Collaborator Author

I prefer using a list of tuples.

@mravanelli

Collaborator

I'm fine with both solutions. Let's hear from the others (e.g., @Gastron, @TParcollet, ...).

@Gastron

Gastron commented Jul 5, 2021

Collaborator

I'm ok with using (scorer_func, weight) -tuples too, but maybe I vote for separate arguments (Mirco's second suggestion). Tuples would force specifying weights for everything, but I think it could be nice to have a default 1.0 weight for all scorers that don't get a weight specified. Having one weights argument but two scorer lists (full/partial) could look a little weird, but referring by key would allow very clear error messages. Specifying weights separately would be more in line with many PyTorch weighted things (like weighted sampling or weighted losses), I think.

Full / partial scorer makes me think of rescoring full utterance outputs vs. scoring a partial output. Perhaps the vocab could be something like fulldist_scorer / topk_scorer.

@30stomercury

30stomercury commented Jul 5, 2021

Collaborator Author

I have a default weight of 0.0 for scorers whose weight is not specified. I can go for solution 2, with an error message if the weights and the full/partial scorer lists do not match.
topk_scorer can be confusing: it suggests that the scorers rescore the topk hyps, whereas our partial scorers rescore the topk tokens of all hyps.
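A small sketch of how weights referenced by key could be resolved, with a configurable default and an error for keys that match no scorer. The function name and the error wording are assumptions; the default of 0.0 follows the comment above, and passing `default=1.0` would give the behaviour @Gastron suggested.

```python
def resolve_weights(full_scorers, partial_scorers, weights, default=0.0):
    """Map every scorer name to a weight, rejecting weights for unknown scorers."""
    known = list(full_scorers) + list(partial_scorers)
    unknown = sorted(set(weights) - set(known))
    if unknown:
        raise ValueError(f"Weights given for undefined scorers: {unknown}")
    # Scorers without an explicit weight fall back to the default.
    return {name: weights.get(name, default) for name in known}


resolved = resolve_weights(
    full_scorers=["transformerlm_scorer", "coverage_scorer"],
    partial_scorers=["ctc_scorer"],
    weights={"ctc_scorer": 0.4, "transformerlm_scorer": 0.6},
)
```

Because everything is referenced by key, a typo in the `weights` section is reported with the offending name instead of being silently ignored.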

@30stomercury

Collaborator Author

@mravanelli, @Gastron

I have applied those changes to the yaml files that involve the beam search part.

For the top-k hypothesis output:

I suggest we modify train.py to obtain the best hyps from the topk hyps.
Here is an example that takes the topk output and keeps the best hyp of each utterance:

topk_tokens, topk_lens, scores = self.hparams.test_search(x, wav_lens)
# The best hyps of each batch
p_tokens = [
        hyps[:length-1] for hyps, length
        in zip(topk_tokens[:, 0, :].tolist(),
               topk_lens[:, 0].tolist())
]

where the shape of topk_tokens is [batch_size, topk, max_hyp_lengths], and the shape of topk_lens is [batch_size, topk].
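The same selection can be tried on nested lists, which makes the shapes easy to see. This is a sketch under the assumption that hypotheses are sorted best first and that the `- 1` in the snippet above drops a trailing EOS token. The token ids are made up.

```python
def best_hyps(topk_tokens, topk_lens):
    """Keep the first hypothesis of each utterance, trimmed to length - 1."""
    return [
        hyps[0][: lens[0] - 1]
        for hyps, lens in zip(topk_tokens, topk_lens)
    ]


# batch_size = 2, topk = 2, max_hyp_lengths = 4 (0 is padding, 2 is EOS here)
topk_tokens = [
    [[5, 6, 2, 0], [5, 7, 2, 0]],
    [[8, 2, 0, 0], [9, 9, 2, 0]],
]
topk_lens = [
    [3, 3],
    [2, 3],
]

p_tokens = best_hyps(topk_tokens, topk_lens)
```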

@Adel-Moumen

Adel-Moumen commented Jul 20, 2023

Collaborator

To check:

  • AISHELL-1
  • Aishell1Mix
  • AMI
  • BinauralWSJ0Mix
  • CommonLanguage
  • CommonVoice
  • DVoice
  • ESC50
  • Fisher-Callhome-Spanish
  • fluent-speech-commands
    🔴 Failing, but this is because the HuggingFace Hub yamls are not in sync with the new beam search interface
  • Google-speech-commands
  • IEMOCAP
  • IWSLT22_lowresource
  • KsponSpeech
  • LibriMix
  • LibriParty
  • LibriSpeech
  • LibriTTS
  • LJSpeech
    🔴 Failing, but this is because the HuggingFace Hub yamls are not in sync with the new beam search interface
  • MEDIA
  • REAL-M
  • RescueSpeech
  • SLURP
    🔴 Failing, but this is because the HuggingFace Hub yamls are not in sync with the new beam search interface
  • Switchboard
  • timers-and-such
    🔴 Failing, but this is because the HuggingFace Hub yamls are not in sync with the new beam search interface
  • TIMIT
  • UrbanSound8k
  • Voicebank
  • VoxCeleb
  • VoxLingua107
  • WHAMandWHAMR
  • WSJ0Mix
  • ZaionEmotionDataset

@mravanelli

Collaborator

Hi @Adel-Moumen, I merged the latest development branch and reran the recipe tests (including the recently added full-inference tests).
Beyond the known issues due to HF (which will be fixed after updating the HF repos to the new version of speechbrain), I found the following issues:

  1. PIQ recipe:
(153/186) Running test for ESC50_row_7...
        ... 9.57s
        ERROR: Error in ESC50_row_7 (recipes/ESC50/interpret/hparams/piq.yaml). Check tests/tmp/ESC50_row_7/stderr.txt and tests/tmp/ESC50_row_7/stdout.txt for more info.
Traceback (most recent call last):
  File "/workspace/speechbrain_adelPR/recipes/ESC50/interpret/train_piq.py", line 751, in <module>
    Interpreter_brain.checkpointer.recover_if_possible(
TypeError: recover_if_possible() got an unexpected keyword argument 'device'

  2. Conformer transducer recipes:

(159/186)
        ERROR: Error in LibriSpeech_row_7 (recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml). Check tests/tmp/LibriSpeech_row_7/stderr.txt and tests/tmp/LibriSpeech_row_7/stdout.txt for more info.
        ...checking files & performance...
        ERROR: The recipe LibriSpeech_row_7 does not contain the expected file tests/tmp/LibriSpeech_row_7/wer_ASR_train.txt
(160/186) Running test for LibriSpeech_row_8...
        ... 30.45s
        ERROR: Error in LibriSpeech_row_8 (recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml). Check tests/tmp/LibriSpeech_row_8/stderr.txt and tests/tmp/LibriSpeech_row_8/stdout.txt for more info.
        ...checking files & performance...
        ERROR: The recipe LibriSpeech_row_8 does not contain the expected file tests/tmp/LibriSpeech_row_8/wer_ASR_train.txt
Traceback (most recent call last):
  File "/workspace/speechbrain_adelPR/recipes/LibriSpeech/ASR/transducer/train.py", line 520, in <module>
    asr_brain.evaluate(
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1430, in evaluate
    self.on_evaluate_start(max_key=max_key, min_key=min_key)
  File "/workspace/speechbrain_adelPR/recipes/LibriSpeech/ASR/transducer/train.py", line 300, in on_evaluate_start
    ckpt = sb.utils.checkpoints.average_checkpoints(
TypeError: average_checkpoints() got an unexpected keyword argument 'device'

  3. ASR Template:

(168/186) Running test for LibriSpeech_row_16...
        ... 12.81s
        ERROR: Error in LibriSpeech_row_16 (templates/speech_recognition/ASR/train.yaml). Check tests/tmp/LibriSpeech_row_16/stderr.txt and tests/tmp/LibriSpeech_row_16/stdout.txt for more info.
  File "/workspace/speechbrain_adelPR/templates/speech_recognition/ASR/train.py", line 454, in <module>
    asr_brain.fit(
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1317, in fit
    self._fit_valid(valid_set=valid_set, epoch=epoch, enable=enable)
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1218, in _fit_valid
    loss = self.evaluate_batch(batch, stage=Stage.VALID)
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1122, in evaluate_batch
    out = self.compute_forward(batch, stage=stage)
  File "/workspace/speechbrain_adelPR/templates/speech_recognition/ASR/train.py", line 103, in compute_forward
    if stage == sb.Stage.Valid:
  File "/opt/conda/envs/myenv/lib/python3.9/enum.py", line 429, in __getattr__
    raise AttributeError(name) from None
AttributeError: Valid

This is something related to this PR because the same tests on the dev branch run smoothly. Any idea?
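The last traceback (AttributeError: Valid) is consistent with enum member names being case-sensitive. The stand-in enum below is an assumption for illustration and not SpeechBrain's actual definition. It only shows that `Stage.VALID` resolves while `Stage.Valid` raises the same kind of error.

```python
from enum import Enum, auto


class Stage(Enum):
    """Stand-in for a training-stage enum; only the member names matter here."""

    TRAIN = auto()
    VALID = auto()
    TEST = auto()


def lookup(member_name):
    """Return the member, or the name of the exception raised by the lookup."""
    try:
        return getattr(Stage, member_name)
    except AttributeError as err:
        return type(err).__name__


upper = lookup("VALID")
mixed = lookup("Valid")
```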

@Adel-Moumen

Collaborator

Hey @mravanelli, I fixed everything. It was mainly related to the PR in unstable that is changing how we are loading ckpts.

@mravanelli

Collaborator

All tests are passing now! Thank you @Adel-Moumen and @30stomercury for this PR. It took a while, but this is an important step toward SpeechBrain 1.0!

@mravanelli mravanelli merged commit 6178838 into unstable-v0.6 Jul 31, 2023
@mravanelli mravanelli deleted the ctc-prefix-beamsearch branch July 31, 2023 18:25
@Adel-Moumen Adel-Moumen restored the ctc-prefix-beamsearch branch July 31, 2023 18:31
@Adel-Moumen Adel-Moumen deleted the ctc-prefix-beamsearch branch August 4, 2023 14:16
Labels

enhancement (New feature or request), refactor (Edit code without changing functionality), work in progress (Not ready for merge)

9 participants