SpeechBrain 0.6.0 #751
    log_probs[:, self.eos_index] = self.minus_inf

    # Set the eos prob to minus_inf when it doesn't exceed threshold.
    if self.using_eos_threshold:
I modified this part. We should check the eos threshold after all scorers (attn scorer, ctc scorer and lm scorer), or the emission of eos will only be based on the output of attn scorer.
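To illustrate the ordering issue, here is a minimal pure-Python sketch, not the code in this PR. The helper names (`combine`, `mask_eos`), the toy scores, and the exact comparison (eos is kept only if its log-prob exceeds `eos_threshold` times the best log-prob) are assumptions made for the example.

```python
MINUS_INF = -1e20  # stand-in for the decoder's minus_inf


def combine(attn_log_probs, scorer_outputs, weights):
    """Add each scorer's weighted scores to the attention log-probs."""
    combined = list(attn_log_probs)
    for name, scores in scorer_outputs.items():
        combined = [c + weights[name] * s for c, s in zip(combined, scores)]
    return combined


def mask_eos(log_probs, eos_index, eos_threshold):
    """Block eos unless its log-prob exceeds eos_threshold * max log-prob."""
    masked = list(log_probs)
    if masked[eos_index] <= eos_threshold * max(masked):
        masked[eos_index] = MINUS_INF
    return masked


attn = [-0.5, -2.0, -0.75]  # toy attention log-probs, eos is token 2
lm = {"lm": [-0.25, -1.0, -3.0]}  # hypothetical LM scorer output
weights = {"lm": 0.5}

# Threshold checked on the attention scores alone: eos survives.
early = mask_eos(attn, eos_index=2, eos_threshold=2.0)
# Threshold checked after all scorers are combined: eos is blocked.
late = mask_eos(combine(attn, lm, weights), eos_index=2, eos_threshold=2.0)
```

In this toy case the LM strongly disfavors eos, so only the check that runs after combining the scorers suppresses it.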
|
Thanks @30stomercury, I guess that integrating arpa will be harder as it operates at word level and not bpe ? |
|
Thanks @TParcollet, I'm investigating kenlm to see if we can have BPE in ARPA format. I will add some to-dos later. |
|
So, so, so it would be great to be able to manage word-level arpa... @Antoine-Caubriere did some work in that direction with the ctc_decode from DeepSpeech, but it's a standalone tool. However, it operates at the word level, which could interest us more than bpe-level arpa (imo). |
|
I see, I will take a look. |
|
@30stomercury, what's the status of this PR? I think the output of this work should be compared with #773, right? |
|
Hi @mravanelli and @TParcollet, this is the current status of this pr.
|
|
Hey, should we implement the top-k hypothesis output (as discussed in PR #761) in this PR instead? |
|
@Gastron Sure, we can do it here. I will add it to the to-dos. |
|
Hi @30stomercury,
|
|
Let me also put on the table one idea for the
This is something quite similar to the way we manage the augmentation pipeline in some recipes (e.g, https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn.yaml#L122). The advantage is that everything looks more transparent and modular. Users, for instance, can define a new scorer without changing the ScorerBuilder. What do you think? |
|
Hi @mravanelli , thank you for your suggestions. |
|
As for point 4, this PR already requires us to change all the ASR recipes. So we can really take this opportunity to do the change that adds the NeuralLMScorer as well. |
|
Hi @mravanelli , |
|
I will add a method to validate if users define scorers correctly. E.g., |
|
Hi @30stomercury. I like method2 more because it looks more modular and users can modify the scoring pipeline directly in the yaml file. To make it even more modular, I would suggest using a list of tuples where the first element is the scorer object and the second one is the weight of the scorer: As an alternative, we can link the scorer object with the corresponding weight in a way similar to what is done with the pretrainer: The second one creates the link using a dictionary. It is less compact, but probably more elegant and more in line with what is done for the pretrainer. Maybe @Gastron and the others have some suggestions here. What do you think @30stomercury? Also, what is the difference between full scorers and partial ones? |
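For readers following the two options, here is a rough Python sketch of the shapes being compared. The `DummyScorer` class, the `"scorers"` / `"weights"` keys, and `normalize` are hypothetical names chosen for illustration; they are not taken from this PR or from the pretrainer.

```python
class DummyScorer:
    """Placeholder for a real scorer object (hypothetical)."""

    def __init__(self, name):
        self.name = name


ctc, lm = DummyScorer("ctc"), DummyScorer("lm")

# Option A: a list of (scorer, weight) tuples.
as_pairs = [(ctc, 0.4), (lm, 0.6)]

# Option B: scorers and weights linked through shared dictionary keys.
as_links = {
    "scorers": {"ctc": ctc, "lm": lm},
    "weights": {"ctc": 0.4, "lm": 0.6},
}


def normalize(spec):
    """Turn either layout into a list of (scorer, weight) pairs."""
    if isinstance(spec, dict):
        weights = spec.get("weights", {})
        return [
            (scorer, float(weights.get(name, 0.0)))
            for name, scorer in spec["scorers"].items()
        ]
    return [(scorer, float(weight)) for scorer, weight in spec]
```

Both layouts carry the same information; the dictionary form is longer but makes the scorer-to-weight link explicit by name.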
|
Hi @mravanelli , the partial scorers score the topk tokens based on the logprobs after full scorers. Scoring all tokens in vocabulary is too expensive for some scorers, e.g. ngram scorer, therefore they only score on pruned tokens. See candidates (pruned tokens) in the score() method. We score the logprobs with full scorers first, then partial scorers. I will add some descriptions for that. I prefer using a list of tuples. But I think others have different ideas. |
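The full-then-partial order described above can be tried out with a small self-contained sketch. This is a toy under stated assumptions, not the PR's implementation: it uses plain lists instead of tensors, the scorers are hypothetical callables (`toy_lm`, `toy_ngram`), and tokens outside the candidate set are simply left untouched by the partial scorer.

```python
import heapq


def score_step(log_probs, full_scorers, partial_scorers, weights,
               beam_size, scorer_beam_scale=1.0):
    scores = list(log_probs)
    # Full scorers return one score per vocabulary token.
    for name, scorer in full_scorers.items():
        scores = [s + weights[name] * x for s, x in zip(scores, scorer())]
    # Prune to the top-k tokens before the expensive scorers run.
    k = int(beam_size * scorer_beam_scale)
    candidates = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    # Partial scorers only see, and only update, the candidates.
    for name, scorer in partial_scorers.items():
        for token, x in scorer(candidates).items():
            scores[token] += weights[name] * x
    return scores, candidates


seen = []  # records which tokens the partial scorer was asked about


def toy_lm():
    return [-0.5, -0.25, -0.25, -3.0]


def toy_ngram(candidates):
    seen.append(list(candidates))
    return {token: -1.0 for token in candidates}


scores, candidates = score_step(
    [-1.0, -2.0, -3.0, -4.0],
    full_scorers={"lm": toy_lm},
    partial_scorers={"ngram": toy_ngram},
    weights={"lm": 1.0, "ngram": 0.5},
    beam_size=2,
)
```

With a vocabulary of four tokens and `beam_size=2`, the toy n-gram scorer is only ever asked about two tokens, which is the cost saving the partial scorers are meant to provide.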
|
Makes sense. What is your preference for the yaml part?
…On Fri, 25 Jun 2021 at 10:01, Sung-Lin Yeh ***@***.***> wrote:
Hi @mravanelli <https://github.com/mravanelli> , the partial scorers
score the topk tokens based on the logprobs after full scorers. Scoring all
tokens in vocabulary is too expensive for some scorers, e.g. ngram scorer,
therefore they only score on pruned tokens.
See candidates (pruned tokens) in the score() method. We score the logprob
with full scorers first, then partial scorers. I will add some descriptions
for that.
    def score(self, inp_tokens, memory, attn, log_probs, beam_size):
        new_memory = dict()
        # score full candidates
        for k, impl in self.full_scorers.items():
            score, new_memory[k] = impl.score(inp_tokens, memory[k], None, attn)
            log_probs += score * self.weights[k]
        # select candidates for partial scorers
        _, candidates = log_probs.topk(int(beam_size * self.scorer_beam_scale), dim=-1)
        # score partial candidates
        for k, impl in self.partial_scorers.items():
            score, new_memory[k] = impl.score(
                inp_tokens, memory[k], candidates, attn
            )
            log_probs += score * self.weights[k]
        return log_probs, new_memory
|
|
I prefer using a list of tuples. |
|
I'm fine with both solutions. Let's hear from the others (e.g., @Gastron, @TParcollet, ...) |
|
I'm ok with using Full / partial scorer makes me think of rescoring full utterance outputs vs. scoring a partial output. Perhaps the vocab could be something like fulldist_scorer / topk_scorer. |
|
I have a default 0.0 weight if scorers are not specified. I can go for solution 2, with an error message to check if weights and full/partial scorer lists are matched. |
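The check described here could look roughly like the sketch below. It assumes scorers are identified by name; `build_weights` and the labels `"ctc"`, `"transformerlm"`, and `"ngram"` are hypothetical and only illustrate the default weight of 0.0 and the mismatch error.

```python
def build_weights(full_scorers, partial_scorers, weights):
    """Return one weight per declared scorer, defaulting to 0.0."""
    both = set(full_scorers) & set(partial_scorers)
    if both:
        raise ValueError(
            f"Scorers listed as both full and partial: {sorted(both)}"
        )
    declared = list(full_scorers) + list(partial_scorers)
    unknown = set(weights) - set(declared)
    if unknown:
        raise ValueError(
            f"Weights given for undeclared scorers: {sorted(unknown)}"
        )
    # Scorers without an explicit weight fall back to 0.0.
    return {name: float(weights.get(name, 0.0)) for name in declared}


resolved = build_weights(
    full_scorers=["transformerlm"],
    partial_scorers=["ctc"],
    weights={"ctc": 0.4},
)
```

Failing early with a message that names the offending scorers keeps yaml typos from silently turning into a zero-weight scorer.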
|
I have adapted those changes to yaml files that involve beamsearch part. For the top-k hypothesis output: I suggest we modify the train.py to obtain best hyps from topk hyps. where the shape of |
|
To check:
|
|
Hi @Adel-Moumen, I merged the latest development branch and ran again the recipe tests (including the recently added full-inference tests).
2. Conformer transducer recipes
3. ASR Template
This is related to this PR, because the same tests run smoothly on the dev branch. Any idea? |
|
Hey @mravanelli, I fixed everything. It was mainly related to the PR in unstable that is changing how we are loading ckpts. |
|
All tests are passing now! Thank you @Adel-Moumen and @30stomercury for this PR. It took a while, but this is an important step toward SpeechBrain 1.0! |
The goal of this PR is to support pure CTC training and decoding (beam search). Users can set ctc_weight: 1 and ctc_weight_decode: 1 to perform pure CTC training and beam search.
Here are the results I got (CTC with transformerlm):
To-dos: