SpeechBrain 0.6.0 by 30stomercury · Pull Request #751 · speechbrain/speechbrain

30stomercury · 2021-05-11T17:21:07Z

The goal of this PR is to support pure CTC training and decoding (beam search). Users can set ctc_weight: 1 and ctc_weight_decode: 1 to perform pure CTC training and beam search.
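A minimal sketch of how a single interpolation weight can switch between joint CTC/attention training and pure CTC. The function name `joint_loss` is hypothetical, and the interpolation formula is an assumption about how the two losses are combined, not code taken from this PR.

```python
def joint_loss(loss_ctc: float, loss_seq: float, ctc_weight: float) -> float:
    """Interpolate CTC and attention (seq2seq) losses with one weight."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError(f"ctc_weight must be in [0, 1], got {ctc_weight}")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq


# ctc_weight = 1 removes the attention term entirely (pure CTC).
pure_ctc = joint_loss(loss_ctc=2.0, loss_seq=5.0, ctc_weight=1.0)
# An intermediate weight mixes both terms.
mixed = joint_loss(loss_ctc=2.0, loss_seq=5.0, ctc_weight=0.5)
```

Under this assumption, setting the weight to 1 leaves only the CTC term, which is the pure CTC case described above.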

Here are the results I got (CTC with transformerlm):

WER 5.22 [ 2742 / 52576, 440 ins, 343 del, 1959 sub ] on test-clean
WER 12.41 [ 6494 / 52343, 1041 ins, 762 del, 4691 sub ] on test-other

To-dos:

Integrate N-gram LM interface in arpa format.
Run ctc, joint ctc/att decoding (with and without LM) after modification.

30stomercury · 2021-05-11T17:28:07Z

                log_probs[:, self.eos_index] = self.minus_inf

-            # Set the eos prob to minus_inf when it doesn't exceed threshold.
-            if self.using_eos_threshold:


I modified this part. We should check the eos threshold after all scorers (attn scorer, ctc scorer and lm scorer) have been applied; otherwise the emission of eos is based only on the output of the attn scorer.
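A toy illustration of why the order matters, written with plain Python lists instead of tensors. The helper names, the example scores, and the exact threshold rule (keep eos only if its score exceeds `eos_threshold` times the best score) are assumptions for illustration, not the code in this PR.

```python
import math

MINUS_INF = -math.inf


def combine_scores(scorer_outputs, weights):
    """Sum weighted per-token log-scores from several scorers."""
    vocab_size = len(next(iter(scorer_outputs.values())))
    total = [0.0] * vocab_size
    for name, scores in scorer_outputs.items():
        for i, s in enumerate(scores):
            total[i] += weights[name] * s
    return total


def apply_eos_threshold(log_probs, eos_index, eos_threshold):
    """Mask eos unless its score exceeds eos_threshold * best score."""
    best = max(log_probs)
    out = list(log_probs)
    if not log_probs[eos_index] > eos_threshold * best:
        out[eos_index] = MINUS_INF
    return out


EOS = 2
attn = [-2.0, -1.5, -0.5]  # the attention scorer alone favours eos
ctc = [-0.2, -3.0, -4.0]   # the CTC scorer strongly disagrees

# Threshold checked on the attention scores only: eos survives.
attn_only = apply_eos_threshold(attn, EOS, eos_threshold=1.5)

# Threshold checked after all scorers are combined: eos is masked.
combined = combine_scores({"attn": attn, "ctc": ctc}, {"attn": 1.0, "ctc": 1.0})
after_all = apply_eos_threshold(combined, EOS, eos_threshold=1.5)
```

In this toy case the two orders disagree about whether eos may be emitted, which is the behaviour the change is meant to address.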

TParcollet · 2021-05-11T17:30:40Z

Thanks @30stomercury, I guess that integrating arpa will be harder, as it operates at the word level and not at the BPE level?

30stomercury · 2021-05-11T17:33:37Z

Thanks @TParcollet, I'm investigating kenlm to see whether we can have BPE in ARPA format. I will add some to-dos later.

TParcollet · 2021-05-11T17:40:20Z

So, it would be great to be able to manage word-level arpa. @Antoine-Caubriere did some work in that direction with the ctc_decode from DeepSpeech, but it's a standalone tool. However, it operates at the word level, which could interest us more than bpe-level arpa (imo).

30stomercury · 2021-05-11T17:42:57Z

I see, I will take a look.

mravanelli · 2021-05-31T19:42:49Z

@30stomercury, what's the status of this PR? I think the output of this work should be compared with #773, right?

30stomercury · 2021-06-01T07:00:07Z

Gastron · 2021-06-02T10:15:24Z

Hey, should we implement the top-k hypothesis output (as discussed in PR #761) in this PR instead?

30stomercury · 2021-06-02T12:31:48Z

@Gastron Sure, we can do it here. I will add it to the to-dos.

mravanelli · 2021-06-19T00:48:08Z

Hi @30stomercury,
thank you very much for this PR! I like the idea of making beamsearch more modular by separating the "search" part from the "rescoring" one.
Let me share here some comments (already shared privately):

1. kenlm shouldn't be mandatory. I suggest importing it only where needed and raising an error if it is not installed.
2. Add the docstrings in scorer.py (make sure to include runnable examples).
3. It looks like recipes/AISHELL-1/ASR/seq2seq/train.py contains weird things like loss, loss_seq, loss_ctc = 0.0, 0.0, 0.0, # np.save('token_list.npy', token_list). I guess this is still a work in progress, right?
4. ScorerBuilder => it is not clear to me how we manage the scoring with RNNLM or Transformer LM. I would have expected a type of scorer similar to NGramScorer (maybe called NeuralLMScorer) in scorer.py. Is that part missing?
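One way to keep kenlm optional is a lazy import with an actionable error. The sketch below uses only the standard library; `require_module` and `NGramScorerSketch` are hypothetical names, and no kenlm API is called.

```python
import importlib


def require_module(name: str, purpose: str):
    """Import `name` lazily, with an actionable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"'{name}' is required for {purpose} but is not installed. "
            f"Install it or remove the component that needs it."
        ) from err


class NGramScorerSketch:
    """Hypothetical scorer that needs its backend only when constructed."""

    def __init__(self, lm_path: str, backend: str = "kenlm"):
        # The import happens here, not at module import time, so users
        # who never build this scorer never need the dependency.
        self.backend = require_module(backend, "n-gram LM scoring")
        self.lm_path = lm_path
```

With this pattern, importing the module that defines the scorer succeeds without kenlm; the error appears only when the n-gram scorer is actually instantiated.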

mravanelli · 2021-06-19T01:09:19Z

Let me also put on the table one idea for the ScorerBuilder. At this moment it is a bit hardcoded and not that scalable to other scorers possibly defined by the users. My suggestion to make everything even more modular is the following:

Define all the scorers in the yaml file (e.g, CTCPrefixScorer, NGramScorer, CoveragePenalty).
Instantiate the builder in the yaml file as well, such that it takes a list of scorer objects:

scorer: !new:speechbrain.decoders.scorer.ScorerBuilder
[
CTCPrefixScorer,
NGramScorer,
CoveragePenalty
]

This is something quite similar to the way we manage the augmentation pipeline in some recipes (e.g, https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn.yaml#L122). The advantage is that everything looks more transparent and modular. Users, for instance, can define a new scorer without changing the ScorerBuilder.

What do you think?
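A rough Python sketch of the idea, assuming each scorer exposes a `score` method that returns one value per token. All class names here are hypothetical and the interface is simplified compared with the real decoders.

```python
class ConstantScorer:
    """Toy scorer that adds a fixed score to every token."""

    def __init__(self, value):
        self.value = value

    def score(self, log_probs):
        return [self.value] * len(log_probs)


class ListScorerBuilder:
    """Hypothetical builder that applies any list of scorer objects in order."""

    def __init__(self, scorers):
        self.scorers = list(scorers)

    def score(self, log_probs):
        total = list(log_probs)
        for scorer in self.scorers:
            for i, s in enumerate(scorer.score(log_probs)):
                total[i] += s
        return total


class IndexBias:
    """A user-defined scorer; the builder needs no change to accept it."""

    def score(self, log_probs):
        return [0.1 * i for i in range(len(log_probs))]


builder = ListScorerBuilder([ConstantScorer(-1.0), IndexBias()])
result = builder.score([0.0, 0.0, 0.0])
```

Because the builder only relies on the shared `score` method, adding `IndexBias` required no edit to `ListScorerBuilder`.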

30stomercury · 2021-06-19T09:44:50Z

Hi @mravanelli , thank you for your suggestions.
1, 2. Currently I don't have a better way to manage the import part; I will finish the docstrings first.
3. Yes, that part is still in progress. I would like to make librispeech converge first.
4. I think we can move the neural LM part to the scorer. The reason I didn't do it is that moving RNNLM or Transformer LM to the scorer part will involve a lot of changes, and I'm not sure whether other developers are okay with that.

mravanelli · 2021-06-19T16:06:34Z

As for point 4, this PR already requires us to change all the ASR recipes. So we can really take this opportunity to do the change that adds the NeuralLMScorer as well.

30stomercury · 2021-06-25T07:25:01Z

Hi @mravanelli ,
I propose two ways to manage scorers in yaml. The second approach is similar to the way you suggested.

# Method1

scorer: !new:speechbrain.decoders.scorer.ScorerBuilder
   eos_index: !ref <eos_index>
   blank_index: !ref <blank_index>
   vocab_size: !ref <output_neurons>
   ctc_weight: !ref <ctc_decode_weight>
   #ngram_weight: !ref <ngram_weight>
   transformerlm_weight: !ref <lm_weight>
   coverage_weight: !ref <coverage_penalty>
   ctc_score_mode: !ref <ctc_score_mode>
   ctc_linear: !ref <ctc_lin>
   #lm_path: !ref <ngram_model>
   #tokenizer: !ref <save_folder>/tokenizer.ckpt
   transformerlm: !ref <lm_model>


# Method2

ctc_scorer: !new:speechbrain.decoders.scorer.CTCScorer
   eos_index: !ref <eos_index>
   blank_index: !ref <blank_index>
   ctc_fc: !ref <ctc_lin>

coverage_scorer: !new:speechbrain.decoders.scorer.CoverageScorer
   vocab_size: !ref <output_neurons>

transformerlm_scorer: !new:speechbrain.decoders.scorer.TransformerLMScorer
   language_model: !ref <lm_model>

scorer2: !new:speechbrain.decoders.scorer.ScorerBuilder2
   ctc_weight: !ref <ctc_decode_weight>
   transformerlm_weight: !ref <lm_weight>
   coverage_weight: !ref <coverage_penalty>
   full_scorers: [
        !ref <transformerlm_scorer>,
        !ref <coverage_scorer> ]
   partial_scorers: [ !ref <ctc_scorer> ]

30stomercury · 2021-06-25T07:30:16Z

I will add a method to validate if users define scorers correctly. E.g., coverage_scorer should always be put into full_scorers. Or ctc_weight should be 1.0 if ctc_scorer is put into full_scorers list.
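A sketch of what such a check could look like, covering only the two rules named above. Scorers are identified by plain string names here, and `validate_scorers` is a hypothetical helper, not the method added in this PR.

```python
def validate_scorers(full_scorers, partial_scorers, weights):
    """Raise ValueError if scorer placement breaks the two rules above.

    full_scorers / partial_scorers: lists of scorer names (strings here).
    weights: dict mapping scorer name to its weight.
    """
    if "coverage_scorer" in partial_scorers:
        raise ValueError("coverage_scorer must be listed in full_scorers")
    if "ctc_scorer" in full_scorers and weights.get("ctc_scorer") != 1.0:
        raise ValueError(
            "ctc_weight should be 1.0 when ctc_scorer is in full_scorers"
        )


# A layout like Method2: passes silently.
validate_scorers(
    full_scorers=["transformerlm_scorer", "coverage_scorer"],
    partial_scorers=["ctc_scorer"],
    weights={"ctc_scorer": 0.4, "transformerlm_scorer": 0.6},
)
```

Running the check once at construction time means a misplaced scorer fails early with a clear message rather than producing silently wrong scores during decoding.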

mravanelli · 2021-06-25T13:38:47Z

Hi @30stomercury. I like method2 more because it looks more modular and users can modify the scoring pipeline directly in the yaml file. To make it even more modular, I would suggest using a list of tuples where the first element is the scorer object and the second one is the weight of the scorer:

scorer2: !new:speechbrain.decoders.scorer.ScorerBuilder2
   full_scorers: [
        !ref (<transformerlm_scorer>,  <lm_weight>),
        !ref (<coverage_scorer>,  <coverage_penalty>), ]
   partial_scorers: [ !ref (<ctc_scorer>, <ctc_weight>) ]

As an alternative, we can link the scorer object with the corresponding weight in a way similar to what is done with the pretrainer:

scorer2: !new:speechbrain.decoders.scorer.ScorerBuilder2
   full_scorers:
      transformerlm_scorer: !ref <transformerlm_scorer>
      coverage_scorer: !ref <coverage_scorer>
   partial_scorers:
      ctc_scorer: !ref <ctc_scorer>
   weights:
      transformerlm_scorer: !ref <lm_weight>
      coverage_scorer: !ref <coverage_penalty>
      ctc_scorer: !ref <ctc_weight>

The second one creates the link using a dictionary. It is less compact, but probably more elegant and more in line with what is done for the pretrainer. Maybe @Gastron and the others have some suggestions here. What do you think @30stomercury?

Also, what is the difference between full scorers and partial ones?

30stomercury · 2021-06-25T14:01:03Z

Hi @mravanelli , the partial scorers score the topk tokens based on the logprobs after full scorers. Scoring all tokens in vocabulary is too expensive for some scorers, e.g. ngram scorer, therefore they only score on pruned tokens.

See candidates (pruned tokens) in the score() method. We score the logprobs with full scorers first, then partial scorers. I will add some descriptions for that.

    def score(self, inp_tokens, memory, attn, log_probs, beam_size):
        new_memory = dict()
        # score full candidates
        for k, impl in self.full_scorers.items():
            score, new_memory[k] = impl.score(inp_tokens, memory[k], None, attn)
            log_probs += score * self.weights[k]

        # select candidates for partial scorers
        _, candidates = log_probs.topk(int(beam_size * self.scorer_beam_scale), dim=-1)

        # score partial candidates
        for k, impl in self.partial_scorers.items():
            score, new_memory[k] = impl.score(
                inp_tokens, memory[k], candidates, attn
            )
            log_probs += score * self.weights[k]

        return log_probs, new_memory
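The same two-stage idea as a self-contained toy, using plain lists and per-token callables in place of tensors and scorer objects. This is a simplification under stated assumptions: there is a single hypothesis, no memory is carried between steps, and tokens outside the candidate set are simply left untouched by partial scorers. All names are hypothetical.

```python
def topk_indices(values, k):
    """Indices of the k largest values, best first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def score_step(log_probs, full_scorers, partial_scorers, weights, k):
    """Apply full scorers to every token, then partial scorers to the top k."""
    total = list(log_probs)
    for name, fn in full_scorers.items():
        for i in range(len(total)):
            total[i] += weights[name] * fn(i)
    candidates = topk_indices(total, k)
    for name, fn in partial_scorers.items():
        for i in candidates:
            total[i] += weights[name] * fn(i)
    return total, candidates


calls = []  # records which tokens the expensive scorer was asked about


def cheap_lm(token):
    return -0.5


def expensive_ngram(token):
    calls.append(token)
    return -1.0 if token == 0 else 0.0


total, candidates = score_step(
    log_probs=[-1.0, -2.0, -3.0, -4.0],
    full_scorers={"lm": cheap_lm},
    partial_scorers={"ngram": expensive_ngram},
    weights={"lm": 1.0, "ngram": 1.0},
    k=2,
)
```

Here the expensive scorer is queried for two tokens instead of four, which is the saving that motivates the full/partial split.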

I prefer using a list of tuples. But I think others have different ideas.

mravanelli · 2021-06-25T14:09:46Z

Makes sense. What is your preference for the yaml part?


30stomercury · 2021-06-25T14:12:02Z

I prefer using a list of tuples.

mravanelli · 2021-06-25T14:22:26Z

I'm fine with both solutions. Let's hear from the others (e.g., @Gastron, @TParcollet, ...).

Gastron · 2021-07-05T10:47:56Z

I'm ok with using (scorer_func, weight) tuples too, but maybe I vote for separate arguments (Mirco's second suggestion). Tuples would force specifying weights for everything, but I think it could be nice to have a default 1.0 weight for all scorers that don't get a weight specified. Having one weights argument but two scorer lists (full/partial) could look a little weird, but referring by key would allow very clear error messages. Specifying weights separately would be more in line with many PyTorch weighted things (like weighted sampling or weighted losses), I think.

Full / partial scorer makes me think of rescoring full utterance outputs vs. scoring a partial output. Perhaps the vocab could be something like fulldist_scorer / topk_scorer.

30stomercury · 2021-07-05T11:25:41Z

I have a default 0.0 weight for scorers whose weight is not specified. I can go for solution 2, with an error message that checks whether the weights and the full/partial scorer lists match.
topk_scorer can be confusing: it suggests that the scorers rescore the topk hyps, but our partial scorers rescore the topk tokens of all hyps.
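A sketch of weight resolution with a 0.0 default and a key check, assuming scorers are referred to by name. `resolve_weights` is a hypothetical helper, not the function in this PR.

```python
def resolve_weights(full_scorers, partial_scorers, weights, default=0.0):
    """Map every scorer name to a weight, rejecting unknown weight keys."""
    names = list(full_scorers) + list(partial_scorers)
    unknown = sorted(set(weights) - set(names))
    if unknown:
        raise ValueError(
            f"weights given for scorers that are not listed: {unknown}; "
            f"known scorers are {sorted(names)}"
        )
    # Scorers without an explicit weight fall back to the default.
    return {name: weights.get(name, default) for name in names}


resolved = resolve_weights(
    full_scorers=["transformerlm_scorer", "coverage_scorer"],
    partial_scorers=["ctc_scorer"],
    weights={"transformerlm_scorer": 0.6, "ctc_scorer": 0.4},
)
```

Referring to scorers by key is what allows the error message to name the exact entry that does not match, as suggested earlier in the thread.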

30stomercury · 2021-07-08T10:47:33Z

@mravanelli, @Gastron

I have applied those changes to the yaml files that involve the beam search part.

For the top-k hypothesis output:

I suggest we modify the train.py to obtain best hyps from topk hyps.
An example of outputting topk hyps,

topk_tokens, topk_lens, scores = self.hparams.test_search(x, wav_lens)
# The best hyps of each batch
p_tokens = [
        hyps[:length-1] for hyps, length
        in zip(topk_tokens[:, 0, :].tolist(),
               topk_lens[:, 0].tolist())
]

where the shape of topk_tokens is [batch_size, topk, max_hyp_lengths], and the shape of topk_lens is [batch_size, topk].
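The same selection can be tried without a model by using nested lists with the shapes described above. The token values are made up, and treating 2 as eos and 0 as padding is an assumption for this example only.

```python
def best_hypotheses(topk_tokens, topk_lens):
    """Take the first-ranked hypothesis per utterance and trim it.

    topk_tokens: nested list shaped [batch_size, topk, max_hyp_length].
    topk_lens:   nested list shaped [batch_size, topk].
    Drops the final position, as the slice `[:length-1]` does above.
    """
    return [
        hyps[0][: lens[0] - 1]
        for hyps, lens in zip(topk_tokens, topk_lens)
    ]


topk_tokens = [
    [[5, 7, 2, 0], [5, 2, 0, 0]],  # utterance 0: two ranked hypotheses
    [[9, 2, 0, 0], [9, 9, 2, 0]],  # utterance 1
]
topk_lens = [[3, 2], [2, 3]]
p_tokens = best_hypotheses(topk_tokens, topk_lens)
```

Index 0 along the topk axis is taken as the best hypothesis, matching the `[:, 0, :]` indexing in the snippet above.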

Adel-Moumen · 2023-07-20T09:50:58Z

…eamsearch

mravanelli · 2023-07-29T01:35:08Z

Hi @Adel-Moumen, I merged the latest development branch and ran again the recipe tests (including the recently added full-inference tests).
Beyond the known issues due to HF (which will be fixed after updating the HF repos to the new version of speechbrain), I found the following issues:

1. PIQ recipe:

(153/186) Running test for ESC50_row_7...
        ... 9.57s
        ERROR: Error in ESC50_row_7 (recipes/ESC50/interpret/hparams/piq.yaml). Check tests/tmp/ESC50_row_7/stderr.txt and tests/tmp/ESC50_row_7/stdout.txt for more info.

Traceback (most recent call last):
  File "/workspace/speechbrain_adelPR/recipes/ESC50/interpret/train_piq.py", line 751, in <module>
    Interpreter_brain.checkpointer.recover_if_possible(
TypeError: recover_if_possible() got an unexpected keyword argument 'device'

2. Conformer transducer recipes

(159/186)
        ERROR: Error in LibriSpeech_row_7 (recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml). Check tests/tmp/LibriSpeech_row_7/stderr.txt and tests/tmp/LibriSpeech_row_7/stdout.txt for more info.
        ...checking files & performance...
        ERROR: The recipe LibriSpeech_row_7 does not contain the expected file tests/tmp/LibriSpeech_row_7/wer_ASR_train.txt
(160/186) Running test for LibriSpeech_row_8...
        ... 30.45s
        ERROR: Error in LibriSpeech_row_8 (recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml). Check tests/tmp/LibriSpeech_row_8/stderr.txt and tests/tmp/LibriSpeech_row_8/stdout.txt for more info.
        ...checking files & performance...
        ERROR: The recipe LibriSpeech_row_8 does not contain the expected file tests/tmp/LibriSpeech_row_8/wer_ASR_train.txt

Traceback (most recent call last):
  File "/workspace/speechbrain_adelPR/recipes/LibriSpeech/ASR/transducer/train.py", line 520, in <module>
    asr_brain.evaluate(
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1430, in evaluate
    self.on_evaluate_start(max_key=max_key, min_key=min_key)
  File "/workspace/speechbrain_adelPR/recipes/LibriSpeech/ASR/transducer/train.py", line 300, in on_evaluate_start
    ckpt = sb.utils.checkpoints.average_checkpoints(
TypeError: average_checkpoints() got an unexpected keyword argument 'device'

3. ASR Template

(168/186) Running test for LibriSpeech_row_16...
        ... 12.81s
        ERROR: Error in LibriSpeech_row_16 (templates/speech_recognition/ASR/train.yaml). Check tests/tmp/LibriSpeech_row_16/stderr.txt and tests/tmp/LibriSpeech_row_16/stdout.txt for more info.

  File "/workspace/speechbrain_adelPR/templates/speech_recognition/ASR/train.py", line 454, in <module>
    asr_brain.fit(
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1317, in fit
    self._fit_valid(valid_set=valid_set, epoch=epoch, enable=enable)
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1218, in _fit_valid
    loss = self.evaluate_batch(batch, stage=Stage.VALID)
  File "/workspace/speechbrain_adelPR/speechbrain/core.py", line 1122, in evaluate_batch
    out = self.compute_forward(batch, stage=stage)
  File "/workspace/speechbrain_adelPR/templates/speech_recognition/ASR/train.py", line 103, in compute_forward
    if stage == sb.Stage.Valid:
  File "/opt/conda/envs/myenv/lib/python3.9/enum.py", line 429, in __getattr__
    raise AttributeError(name) from None
AttributeError: Valid

This is something related to this PR because the same tests on the dev branch run smoothly. Any idea?
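The third traceback is the standard behaviour of a Python enum when a member name is misspelled or has the wrong case. The stand-in `Stage` class below is hypothetical and only mirrors the pattern; it assumes the members are upper-case, which is what the failing `sb.Stage.Valid` lookup suggests.

```python
from enum import Enum, auto


class Stage(Enum):
    """Stand-in for an enum whose members are upper-case."""

    TRAIN = auto()
    VALID = auto()
    TEST = auto()


def is_valid_stage(stage):
    """Compare against the member using its exact, upper-case name."""
    return stage == Stage.VALID


try:
    Stage.Valid  # wrong case: no such member, raises AttributeError
except AttributeError as err:
    message = str(err)
```

Enum member lookup is case-sensitive, so a template written against a differently cased name fails at the first validation step rather than at import time.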

Adel-Moumen · 2023-07-31T09:05:17Z

Hey @mravanelli, I fixed everything. It was mainly related to the PR in unstable that is changing how we are loading ckpts.

mravanelli · 2023-07-31T18:25:11Z

All tests are passing now! Thank you @Adel-Moumen and @30stomercury for this PR. It took a while, but this is an important step toward SpeechBrain 1.0!

30stomercury added the bug and work in progress labels May 11, 2021

TParcollet removed the bug label May 18, 2021

30stomercury changed the title ~~Add pure ctc training/decoding~~ Refactor Decoders Jun 1, 2021

30stomercury added the refactor label Jun 1, 2021

Gastron mentioned this pull request Jun 8, 2021

Return topk hypotheses in beam search decoding #761

Closed

30stomercury mentioned this pull request Jun 17, 2021

CTC beamsearch decoding via ctcdecode #773

Closed

Adel-Moumen added 2 commits July 19, 2023 19:42

all commonvoice is passing the tests

f742c50

fix ksponspeech incorrects hyperparams

b307fcc

Adel-Moumen added 14 commits July 20, 2023 11:57

aishell fix passing test

3d49ad5

iwslt

027742e

dvoice

7d115e0

fix libri3mix pretrainer

7ee724b

timit / rescue

8b65f91

message fetching.py

c6359ea

fix loading relative paths

e2cf818

fisher-callhome-spanish

3e1a8bc

KsponSpeech working

a90f3f1

switchboard

29d9039

slurp

85408fc

fetch unstable branch

434b014

Merge remote-tracking branch 'origin/unstable-v0.6' into ctc-prefix-beamsearch

3b16372

Merge remote-tracking branch 'origin/unstable-v0.6' into ctc-prefix-beamsearch

5891fa7

mravanelli mentioned this pull request Jul 28, 2023

[WIP] Support for full-inference recipe tests #2091

Merged

mravanelli added 2 commits July 28, 2023 16:47

Merge branch 'unstable-v0.6' into ctc-prefix-beamsearch

445bb1f

fix flake8

d733355

Adel-Moumen added 3 commits July 31, 2023 10:47

fix device piq recipe

c9c74c1

fix librispeech asr transducer

52220c3

fix train.py/yaml for template

2b25f80

mravanelli merged commit 6178838 into unstable-v0.6 Jul 31, 2023

mravanelli deleted the ctc-prefix-beamsearch branch July 31, 2023 18:25

Adel-Moumen restored the ctc-prefix-beamsearch branch July 31, 2023 18:31

Adel-Moumen deleted the ctc-prefix-beamsearch branch August 4, 2023 14:16
