Add CTC recipe to AISHELL-1 #1576

Merged
anautsch merged 21 commits into
speechbrain:develop from
BenoitWang:aishell-ctc
Oct 7, 2022

@BenoitWang

Collaborator

Hi @mravanelli @TParcollet, this PR adds a typical CTC-wav2vec recipe to AISHELL-1.
Test CER: 5.06%
Dev CER: 4.52%

Some points:

  1. chinese-wav2vec2-large (from Tencent) is used; it is pretrained on 10k hours of Chinese data
  2. bert-base-chinese is used as the tokenizer, and CTC is trained on characters
  3. In prepare.py, pandas is not needed to generate the csv files, so it is removed together with some unused variables.
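Point 3 can be illustrated with a minimal sketch of writing a manifest using only Python's standard `csv` module. The column names (`ID`, `duration`, `wav`, `transcript`), the example row, and the helper name `write_manifest` are assumptions for illustration; they are not taken from the recipe's prepare.py.

```python
import csv
import io


def write_manifest(rows, fileobj):
    """Write manifest rows (a list of dicts) as CSV without pandas.

    The column names below are hypothetical, chosen only for this sketch.
    """
    fieldnames = ["ID", "duration", "wav", "transcript"]
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example row; a real script would build these from the corpus.
rows = [
    {
        "ID": "utt_0001",
        "duration": 4.2,
        "wav": "/path/to/utt_0001.wav",
        "transcript": "你好",
    },
]

# An in-memory buffer keeps the example self-contained; a real script would
# use open(path, "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_manifest(rows, buffer)
csv_text = buffer.getvalue()
```

For a flat table like this, `csv.DictWriter` covers what a `DataFrame.to_csv` call would do, which is why the extra dependency can be dropped.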

@TParcollet

Collaborator

Huge! Is this comparable to the current SOTA?

@BenoitWang

Collaborator Author

Hi @TParcollet, I think it's good for a pure-CTC system with greedy decoding and no LM.
Hybrid models from ESPnet get a better CER:

| Model | Test CER | Dev CER | LM |
| --- | --- | --- | --- |
| our ctc-wav2vec | 5.06% | 4.52% | No |
| espnet: branchformer-beam10-ctc0.4 | 4.4% | 4.1% | No |
| espnet: conformer-beam20-ctc0.3 | 4.9% | 4.5% | No |
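For readers unfamiliar with the "pure-CTC/greedy" setting mentioned above, here is a minimal sketch of greedy CTC decoding: take the best token per frame, collapse consecutive repeats, then drop blanks. The token ids and `blank_id=0` are assumptions for illustration, and this is not the recipe's own decoding code.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse a per-frame argmax sequence into an output token sequence.

    frame_ids: best token id for each frame (argmax of the CTC output).
    blank_id: id of the CTC blank symbol (assumed to be 0 in this sketch).
    """
    # Step 1: merge consecutive duplicates (e.g. 3, 3 -> 3).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only serve to separate repeated tokens.
    return [token for token in collapsed if token != blank_id]


# Hypothetical frame-level predictions: the blank between the two runs of 3
# keeps them as two separate output tokens.
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0]
decoded = ctc_greedy_decode(frames)
```

No beam search or language model is involved here, unlike the beam-search ESPnet systems in the table.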

@TParcollet

Collaborator

I see, not bad, but we use extra pre-training while they don't, correct?

@BenoitWang

Collaborator Author

Yes, exactly. Fair enough, but their branchformer is quite something according to the results.

@anautsch anautsch left a comment

Collaborator

Hi @BenoitWang, minor details only.

The yaml file combines the related hparam files well, and the same goes for the train script.

Is the AISHELL-1 prepare script now completely free of the extra pandas dependency for all its recipes?
(Reducing dependencies is not a bad thing, although pandas is neat. Just asking: pandas is not in the SB requirements, nor is it explicitly stated for AISHELL-1, so it's a good catch.)

Comment thread recipes/AISHELL-1/ASR/CTC/README.md Outdated
Comment thread recipes/AISHELL-1/ASR/CTC/train_with_wav2vec.py Outdated
@BenoitWang

Collaborator Author

Hi @anautsch, thanks for the review; the fix is done. And yes, that's why I want to remove pandas: it is only used to generate the csv files for all the recipes.

Comment thread recipes/AISHELL-1/ASR/CTC/README.md
Comment thread recipes/AISHELL-1/ASR/CTC/README.md Outdated
Comment thread recipes/AISHELL-1/ASR/CTC/prepare.py Outdated
Comment thread recipes/AISHELL-1/ASR/CTC/train_with_wav2vec.py Outdated
@BenoitWang

Collaborator Author

Hi @TParcollet @anautsch @Adel-Moumen,

Thank you all for the reviews and tests! The HF link is added, here's a brief summary of the PR:

  1. add a CTC recipe
  2. fix naming problems
  3. fix dynamic batching conflicts for seq2seq & transformer recipes

@anautsch

anautsch commented Oct 7, 2022

Collaborator

lgtm.

Tested recipes in --debug mode & the wav2vec2 with ddp.


Side note: we have an internal issue with --debug and eval checkpointing. This becomes clear when running this transformer wav2vec2 recipe; here's the relevant log:

   asr_brain.evaluate(
  File "speechbrain/core.py", line 1260, in evaluate
    self.on_evaluate_start(max_key=max_key, min_key=min_key)
  File "train_with_wav2vect.py", line 272, in on_evaluate_start
    ckpt = sb.utils.checkpoints.average_checkpoints(
  File "speechbrain/utils/checkpoints.py", line 1174, in average_checkpoints
    return averager(parameter_iterator)
  File "speechbrain/utils/checkpoints.py", line 1080, in average_state_dicts
    raise ValueError("No state dicts to average.")
ValueError: No state dicts to average.
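The failure mode in the log can be reproduced with a simplified stand-in that averages plain-float "state dicts". This is not SpeechBrain's implementation; the function names are hypothetical, and the assumption that no checkpoint is available to average under --debug is an inference from the comment above, not something the log confirms.

```python
def average_state_dicts(state_dicts):
    """Average parameter dicts key by key; raise if there is nothing to average.

    Simplified illustration with float values instead of tensors.
    """
    iterator = iter(state_dicts)
    try:
        running = dict(next(iterator))
    except StopIteration:
        # Same message as in the log above; reached when the input is empty.
        raise ValueError("No state dicts to average.")
    count = 1
    for state_dict in iterator:
        for key, value in state_dict.items():
            running[key] += value
        count += 1
    return {key: value / count for key, value in running.items()}


def average_or_none(state_dicts):
    """Hypothetical guard: skip averaging when no checkpoints are available."""
    state_dicts = list(state_dicts)
    if not state_dicts:
        return None
    return average_state_dicts(state_dicts)


averaged = average_state_dicts([{"w": 1.0}, {"w": 3.0}])
skipped = average_or_none([])
```

A guard of this kind in `on_evaluate_start` would be one possible way to avoid the crash when no checkpoint exists; whether that is the right fix for the internal issue is left open here.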

@anautsch anautsch merged commit 39f9f39 into speechbrain:develop Oct 7, 2022