Refactoring of SB to HuggingFace interface #1596

Closed
anautsch wants to merge 60 commits into speechbrain:unstable-v0.6 from anautsch:hf-integration

@anautsch anautsch commented Oct 5, 2022

Collaborator

@TParcollet requested a more flexible interface to use HuggingFace than only with wav2vec2.

In another conversation, @Moumeneb1 pointed me to AutoModel of the HuggingFace transformers library.


Upfront a worthwhile side note—recently released:

  • transformers 4.22.2 on Sep 27, 2022
  • datasets 2.5.2 on Oct 5, 2022
  • huggingface-hub 0.10.0 on Sep 28, 2022

Should we require minimum versions of the HF libraries?
(edit: the hub is now at 0.7.0 in requirements.txt; the other two are optional. Should they remain so?)


What has happened at the opening of this Draft PR:

  • enabled handling of the HuggingFace cache_dir throughout (so it can be put outside of $HOME/.cache)
    => minor impact to speechbrain/pretrained/fetching.py
  • removed unnecessary imports and dicts (handled via AutoModel)
  • placed static functions outside of a class
    • _check_model_source(path, save_path) // has changes to check if source is downloaded already
  • created helper static functions - some are used implicitly by the two pre-existing classes
    • config_return_hidden_states(config)
    • model_set_spectral_augmentation(model, apply_spec_augment)
    • modify_state_dict_wav2vec2(path)
    • default_forward(model, data)
    • wav2vec2_forward(model, data, output_all_hiddens)
    • wav2vec2_pretraining_forward(model, data, mask_prob, mask_length)

    These functions are intended to be used as partials - use them -or- plug-in your own :)

  • new class: HuggingFaceModel(nn.Module) to handle all interfaces with HuggingFace transformers - with init doing:
    • determine AutoConfig (adjust if wanted)
    • create/download model from AutoModel (adjust if wanted)
    • prepare forward function abstraction
      • set input layer norm flag
      • assign inner forward function from given/default partial Callable
        (e.g., default_forward; wav2vec2_forward; wav2vec2_pretraining_forward)
      • set output layer norm flag
      • output of a variable -or- tuple

      Wrapper: forward() -> _forward() -> self.forward_partial_fn(data=data)

    • handle Freezing
  • HuggingFaceWav2Vec2 now inherits from HuggingFaceModel and is reduced to a super().__init__() call
  • same for HuggingFaceWav2Vec2Pretrain; different init parameterization (serves here as proof-of-concept)
  • docstring examples for the three classes were working on my end
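To illustrate the "forward() -> _forward() -> self.forward_partial_fn(data=data)" idea described above, here is a minimal sketch in plain Python. It is not the code of this PR: `ForwardWrapper` and `scaled_forward` are hypothetical names, the wrapped "model" is just a callable, and torch is left out so the snippet runs on its own.

```python
from functools import partial
from typing import Any, Callable, Optional


def default_forward(model: Callable, data: Any) -> Any:
    """Plain pass-through: call the wrapped model on the data."""
    return model(data)


def scaled_forward(model: Callable, data: Any, scale: float) -> Any:
    """Hypothetical custom forward; `scale` is fixed beforehand via partial."""
    return model(data) * scale


class ForwardWrapper:
    """Stand-in for the wrapper class (an assumption, not HuggingFaceModel)."""

    def __init__(self, model: Callable, forward_partial_fn: Optional[Callable] = None):
        self.model = model
        if forward_partial_fn is None:
            forward_partial_fn = default_forward
        # Bind the model once, so later calls only need to pass the data.
        self.forward_partial_fn = partial(forward_partial_fn, model=self.model)

    def _forward(self, data: Any) -> Any:
        return self.forward_partial_fn(data=data)

    def forward(self, data: Any) -> Any:
        return self._forward(data)


# Default behaviour versus a plugged-in partial with a pre-set argument.
plain = ForwardWrapper(lambda x: x + 1)
custom = ForwardWrapper(lambda x: x + 1, partial(scaled_forward, scale=0.5))
```

The point of the design is that model-specific arguments are fixed when the partial is built, so the wrapper itself only ever passes `data`.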

Drafting status:

  • initial PR (docstring examples & linters)
  • create integration test folder with YAML examples
  • decide whether the pythonapp workflow integration tests should install transformers>=4.22.2 (or skip their integration examples)
  • resolve TODO comments
  • check on single GPU if nothing breaks & on DDP for wav2vec2 training
  • minimize online communication overheads (once downloaded, that's it)

Edit (2022-10-11).

  • dissolve current file & create a nested folder structure with main interface & helper functions
  • drop normalization functions (note: they have been migrated correctly BUT were ontologically superfluous in the starting code prior to this PR)
  • expedite further auto-general use features provided by HF
  • explore to provide further hub examples (beyond w2v2)
  • expand briefly the existing tutorial for how to make use of this PR

Edit (2022-12-13).

@anautsch anautsch requested a review from Adel-Moumen October 5, 2022 13:19
Comment thread speechbrain/lobes/models/huggingface_wav2vec.py Outdated
@Adel-Moumen Adel-Moumen self-assigned this Oct 5, 2022
@Adel-Moumen Adel-Moumen added the refactor Edit code without changing functionality label Oct 5, 2022
@anautsch

anautsch commented Oct 10, 2022

Collaborator Author

The above pre-review was resolved entirely in commit fb80a9d.

Since then, the following above TODOs have been completed:

  • create integration test folder with YAML examples
  • resolve TODO comments (note: those in py files)
  • check on single GPU if nothing breaks & on DDP for wav2vec2 training
    --> instead: on cluster cpu node (since the rest is covered through other tests)
  • minimize online communication overheads (once downloaded, that's it)
    --> cpu node ran offline

The four test yamls target the existing interfaces:

  • speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2
  • HuggingFaceWav2Vec2 immediately mapped from yaml through its new base HuggingFaceModel
  • speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2Pretrain
  • HuggingFaceWav2Vec2Pretrain immediately mapped from yaml through its new base HuggingFaceModel

Tested with

transformers       4.22.2
huggingface-hub    0.10.0
datasets           2.5.2

Note:
Each of the four provided integration tests is skipped on GitHub (no transformers dependency and, likely, out-of-memory). Common snippet:

def test_loss(device):
    skip = False
    try:
        import transformers

        _ = transformers.__version__
    except ImportError:
        skip = True
        print("\tSkipped")

    if not skip:
        main(device)
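As a possible variation on the snippet above, the availability check can be done without importing the package at all. This is only a sketch: `module_available` and `run_if_available` are hypothetical helper names, and only `importlib.util.find_spec` from the standard library is used.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def run_if_available(module_name: str, fn, *args):
    """Call fn(*args) only when module_name is installed; otherwise report a skip."""
    if not module_available(module_name):
        print(f"\tSkipped ({module_name} not installed)")
        return None
    return fn(*args)
```

With this, a test body could reduce to `run_if_available("transformers", main, device)`. When the tests run under pytest, `pytest.importorskip("transformers")` is another option, and it marks the test as skipped in the report instead of only printing.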

@anautsch

Collaborator Author

Upfront - the provided integration tests are limited in their scope;
Not: thorough tests of whether every possible wav2vec2 setting works.
Goal: does anything break immediately, crash, or not run to the end of a "minimal" example.

A list for potential legacy testing of HF integration:

pip install huggingface_hub==0.7.0 datasets==2.0.0 transformers==4.18.0
2022: May 24 - Mar 15 - Apr 6

pip install huggingface_hub==0.4.0 datasets==1.18.2 transformers==4.16.0
2022: Jan 11 - Jan 28 - Jan 27

pip install huggingface_hub==0.1.0 datasets==1.15.0 transformers==4.12.3
2021: Nov 2 - Nov 2 - Nov 3

pip install huggingface_hub==0.0.1 datasets==1.1.3 transformers==4.0.0
2020: Dec 23 - Nov 19 - Nov 30

As for the warning messages about changes in transformers v5, I'd recommend sticking with:
pip install "huggingface_hub>=0.10.0" "datasets>=2.5.2" "transformers>=4.22.2,<5.0"

@TParcollet what's your take ?


Note: when jumping between the HF versions, on legacy testing, lots of cache-related errors appeared—one of the logs showed:

The new cache file layout looks like this

Locking the latest versions would be another option:
pip install huggingface_hub==0.10.0 datasets==2.5.2 transformers==4.22.2
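If a minimum (and an upper bound) were adopted, a runtime guard could look roughly like the sketch below. The helper names are hypothetical, and the parser only handles plain numeric releases such as "4.22.2"; pre-release or dev suffixes would need a proper version parser.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.22.2' into (4, 22, 2); only plain numeric releases are handled."""
    return tuple(int(part) for part in text.split("."))


def in_supported_range(installed: str, minimum: str, below: str) -> bool:
    """True if minimum <= installed < below, compared component by component."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


# The range discussed above for transformers: >=4.22.2 and <5.0.
ok_latest = in_supported_range("4.22.2", "4.22.2", "5.0")
ok_legacy = in_supported_range("4.18.0", "4.22.2", "5.0")
```

The installed version string could come from `importlib.metadata.version("transformers")`. Comparing integer tuples rather than strings matters here: as strings, "4.9.0" would sort after "4.22.2".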

@anautsch anautsch marked this pull request as ready for review October 10, 2022 15:54
@anautsch

Collaborator Author

A question on the scope of this PR. Thus far, the goal was to make more transformers models available. In that sense, future PRs could explore more. Yet, what more is there to explore? 🤗

HF/datasets related pointers:
https://huggingface.co/docs/datasets/audio_load
https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0
https://huggingface.co/blog?tag=audio

I see these features as different in nature from this PR.

Then, there is the transformer Auto~ and Pipelines.

While we already do the former in SB (but in the SB way), the latter could have a broader impact: which of these AutoClasses should we choose from?
// perhaps an init check for alternatives to AutoModel would be a go-to here; structurally it appears of same use style.

@anautsch

Collaborator Author

@Adel-Moumen Just noticed that we will need to update w2v2 related YAMLs on HF when this PR gets merged. What is the best way forward in your view?

For now, I'll put back in the speechbrain.lobes.models.huggingface_wav2vec interfaces.
Their use could be resolved when we do the v0.6 migration (which is becoming some sort of mantra...).

@anautsch anautsch changed the base branch from unstable-v0.6 to unstable February 8, 2023 14:40
@anautsch anautsch changed the base branch from unstable to unstable-v0.6 February 8, 2023 14:40
@anautsch

anautsch commented Feb 8, 2023

Collaborator Author

Intermezzo notes:

Wrapping this one up over the next few weeks.

@anautsch

anautsch commented Mar 7, 2023

Collaborator Author

To follow-up on the merge procedure described in:
#1596 (comment)

For how to use the advanced test tools, when interface refactoring touches upon pretrained model interfaces (e.g. YAML files on HuggingFace), please read:
https://github.com/speechbrain/speechbrain/blob/develop/tests/utils/README.md

To-date, the following SpeechBrain branches & PRs are related:

| Branch | Purpose | PR |
|---|---|---|
| develop | v0.5.14 | to-come v0.6 |
| unstable-v0.6 | v0.6.0 | #1596 flexible transformer integration |
| hf-interface-testing | keeping track of HF interfaces (YAMLs & custom.py) | #1868 adds two recent HF repos (original interfaces) |
| hf-interface-testing | PR 1596 changes interfaces, so YAMLs on HF repos need to change | #1801 supplements 1596 (after 1868) |
| unstable-v0.6 | v0.6.0 | #751 CTC decoding & scoring refactorings |

  • PR 1868 should be a simple comparison with what's currently on HF
  • PR 1801 can be used for tutorials to show how YAMLs need to be updated; here, we keep track of our changes
  • PR 1596 is the transformers lib refactoring (this PR), to make use of their AutoConfig, AutoTokenizer, AutoModel, etc.
  • PR 751 the legendary v0.6 development - this PR will also need supplementing edits to YAMLs & pretrained interfaces on HF, in a similar tracking fashion as demonstrated with this PR for refactoring the transformers integration

The then "to-come v0.6" will be progressively enhanced on the unstable-v0.6 branch. When the moment of its merging comes, all YAMLs & interfaces that are on hf-interface-testing are ready to be put on HuggingFace. That in-parallel update procedure implies that each HF repo will have a PR which updates YAML & custom.py interface files with what we kept track of in the hf-interface-testing branch.

Before merging this PR on unstable-v0.6, please:

  1. merge the latest develop on the unstable-v0.6 branch
  2. rebase this PR away from & back to the unstable-v0.6 branch => git tree needs that for updating...

Here's how I went for testing this PR.

Preparation steps:

  1. Clone & checkout https://github.com/anautsch/speechbrain/tree/hf-integration
  2. Please create a fresh Python 3.9 environment (some testing tools require >= 3.9)
  3. Install SpeechBrain from local repo + requirements + find recipes | grep extra | xargs cat | sort -u | grep -v \# | xargs -I {} pip install {}

Next => Test if the refactored recipes (SpeechBrain repo only) still work (here, whisper & wav2vec2 only).

  4. Run recipe tests by Hparam_file
python -c 'from tests.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Hparam_file"], filters=[["recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_hf_whisper_encoder.yaml", "recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml", "recipes/TIMIT/ASR/seq2seq/hparams/train_with_wav2vec2.yaml", "recipes/SLURP/direct/hparams/train_with_wav2vec2.yaml", "recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_sb_wav2vec.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml", "recipes/LibriSpeech/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml", "recipes/Switchboard/ASR/CTC/hparams/train_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_fon_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_amh_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_sw_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_dar_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_wol_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_multi_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_en_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_fr_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_rw_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_it_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_en_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_fr_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_de_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_rw_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_it_with_wav2vec.yaml", "recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml", "recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer_with_wav2vect.yaml", 
"recipes/AISHELL-1/ASR/CTC/hparams/train_with_wav2vec.yaml", "recipes/timers-and-such/direct/hparams/train_with_wav2vec2.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_ar_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_fa_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_fr_hf_whisper.yaml"]], do_checks=False, run_opts="--device=cuda")) else print("TEST PASSED")'

Note: cat tests/recipes/*.csv | cut -d ',' -f2 | sort -u will list you the currently recorded recipe datasets available for testing.
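For those who prefer Python over the shell pipeline in the note above, a rough equivalent is sketched here. The function name is hypothetical; like `cut -f2`, it also returns the header cell of the second column, and it relies only on the standard library.

```python
import csv
import glob
import os
from typing import List


def second_column_values(csv_dir: str) -> List[str]:
    """Collect the sorted unique values of the second column across all CSV files."""
    values = set()
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                # Skip empty or single-field lines, which have no second column.
                if len(row) > 1:
                    values.add(row[1])
    return sorted(values)
```

Called as `second_column_values("tests/recipes")`, this should list the same dataset names. Two small differences from the shell version: quoted fields containing commas are parsed correctly, and the ordering follows Python's code-point sort instead of the shell locale.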

  5. Prepare testing of YAMLs + custom.py (the to-come HF repo updates/PRs).
python -c 'from tests.utils.refactoring_checks import init;init(new_interfaces_git="https://github.com/anautsch/speechbrain", new_interfaces_branch="hf-integration")'

This clones the specified branch (it's PR 1801) to the nested tests/tmp/hf_interfaces folder. There, the tests/utils/refactoring_checks.py tool can access the updated YAMLs & custom.py files. The same tool fetches the current versions of these files directly from HuggingFace. To compare before and after the refactoring, we take advantage of the local repo installation of SpeechBrain: we can switch between the develop & PR branches.

  6. Run single-file tests for pretrained interfaces
# Let's revisit the old way to integrate transformers into SpeechBrain
git checkout develop
python -c "from tests.utils.refactoring_checks import gather_expected_results;gather_expected_results()"

# so we can compare it with the proposed way to integrate all of the latest features from the transformers library
git checkout hf-integration
python -c "from tests.utils.refactoring_checks import gather_refactoring_results;gather_refactoring_results()"

Note: A yaml summary file will be created at tests/tmp/refactoring_results.yaml.

For the following, please ensure that the test partitions of the relevant recipe datasets are available. For this example, we assume that only LibriSpeech is accessible. Path specifications for the other recipe datasets are therefore empty, so the pretrained models that depend on them cannot be tested. The following steps aim to reproduce, on the test partitions, the performance metrics reported in the SpeechBrain recipe folders. Instead of training anything, they use the pretrained models.

  7. Prepare your datasets using the recipe, but point their output to the testing structure. Example: LibriSpeech (run on cpu-only; more CPUs, less waiting)
cd recipes/LibriSpeech/ASR/CTC && python train_with_wav2vec.py hparams/train_hf_wav2vec.yaml --data_folder=/path/to/dataset --output_folder=../../../../tests/tmp/LibriSpeech || cd -

Manually cancel after the data preparation finished (when recipe training starts). Repeat for other datasets (each one recipe); check on step (9) for using expected folder names.

  8. So that the recipes folder can be found as a module, we create a symbolic link. (try w/o, you'll see)
cd tests/utils && ln -s ../../recipes  && cd -
  9. Run tests with pretrained models on the test partitions of recipe datasets
git checkout develop
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="/path/to/dataset" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --after=False

git checkout hf-integration
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="/path/to/dataset" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --after=True

Note: Other refactorings might have expected changes in their testing performance; then, this tool can be used to measure those changes as well.


Logs from (4)

(1/35) Running test for TIMIT_row_4...
	... 582.89s
(2/35) Running test for TIMIT_row_18...
	=> skipped; took too long – i.e. restart w/o the two TIMIT yamls

(1/33) Running test for LibriSpeech_row_2...
	... 167.02s
(2/33) Running test for LibriSpeech_row_3...
	... 44.51s
(3/33) Running test for LibriSpeech_row_23...
	... 15.72s
(4/33) Running test for LibriSpeech_row_24...
	... 28.28s
(5/33) Running test for LibriSpeech_row_25...
	... 23.78s
(6/33) Running test for DVoice_row_2...
	... 122.35s
(7/33) Running test for DVoice_row_3...
	... 110.42s
(8/33) Running test for DVoice_row_4...
	... 92.57s
(9/33) Running test for DVoice_row_5...
	... 91.19s
(10/33) Running test for DVoice_row_6...
	... 91.76s
(11/33) Running test for DVoice_row_7...
	... 88.60s
(12/33) Running test for AISHELL-1_row_2...
	... 177.38s
(13/33) Running test for AISHELL-1_row_5...
	... 99.93s
(14/33) Running test for timers-and-such_row_6...
	... 67.50s
(15/33) Running test for CommonVoice_row_2...
	... 93.06s
(16/33) Running test for CommonVoice_row_3...
	... 70.77s
(17/33) Running test for CommonVoice_row_4...
	... 70.30s
(18/33) Running test for CommonVoice_row_5...
	... 73.74s
(19/33) Running test for CommonVoice_row_6...
	... 74.12s
(20/33) Running test for CommonVoice_row_12...
	... 104.15s
(21/33) Running test for CommonVoice_row_13...
	... 123.71s
(22/33) Running test for CommonVoice_row_14...
	... 183.87s
(23/33) Running test for CommonVoice_row_15...
	... 144.55s
(24/33) Running test for CommonVoice_row_18...
	... 54.74s
(25/33) Running test for CommonVoice_row_19...
	... 34.37s
(26/33) Running test for CommonVoice_row_20...
	... 33.29s
(27/33) Running test for CommonVoice_row_21...
	... 34.68s
(28/33) Running test for CommonVoice_row_22...
	... 31.71s
(29/33) Running test for CommonVoice_row_23...
	... 40.47s
(30/33) Running test for CommonVoice_row_24...
	... 26.99s
(31/33) Running test for Switchboard_row_2...
	... 33.13s
(32/33) Running test for SLURP_row_4...
	... 25.21s
(33/33) Running test for IEMOCAP_row_2...
	... 20.28s

=> Ok, so the refactored recipes are not crashing.

Logs from (6)

$ grep same tests/tmp/refactoring_results.yaml
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true

Note: emotion-recognition-wav2vec2-IEMOCAP entries need manual clean-up.

=> Ok, so for single audio files, the refactoring does not harm inference with pretrained models.
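The grep check above can also be scripted, for instance to fail a job when any entry differs. The sketch below is a line-based scan (no YAML parser), and `count_same_flags` is a hypothetical helper, not part of tests/utils.

```python
from collections import Counter
from typing import Iterable


def count_same_flags(lines: Iterable[str]) -> Counter:
    """Count 'same: true' / 'same: false' entries with a simple line-based scan."""
    counts = Counter()
    for line in lines:
        key, _, value = line.strip().partition(":")
        # Only lines whose key is exactly 'same' are counted.
        if key == "same":
            counts[value.strip().lower()] += 1
    return counts
```

Usage could be `with open("tests/tmp/refactoring_results.yaml") as f: counts = count_same_flags(f)`, followed by a check that `counts["false"] == 0`.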

Logs from (9)

Run tests on: asr-wav2vec2-librispeech
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: tests/tmp/LibriSpeech
	repo: asr-wav2vec2-librispeech
	speechbrain.pretrained.EncoderASR
	obj.from_hparams({'source': 'speechbrain/asr-wav2vec2-librispeech', 'savedir': 'pretrained_models/asr-wav2vec2-librispeech', 'run_opts': {'debug': False, 'debug_batches': 2, 'debug_epochs': 2, 'debug_persistently': False, 'device': 'cuda:0', 'data_parallel_backend': False, 'distributed_launch': False, 'distributed_backend': 'nccl', 'find_unused_parameters': False, 'tqdm_colored_bar': False}})
speechbrain.pretrained.fetching - Fetch hyperparams.yaml: Delegating to Huggingface hub, source speechbrain/asr-wav2vec2-librispeech.
speechbrain.pretrained.fetching - Fetch custom.py: Delegating to Huggingface hub, source speechbrain/asr-wav2vec2-librispeech.
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.huggingface_wav2vec - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
speechbrain.pretrained.fetching - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/wav2vec2.ckpt.
speechbrain.pretrained.fetching - Fetch asr.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/asr.ckpt.
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Loading pretrained files for: wav2vec2, asr, tokenizer
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.huggingface_wav2vec - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
speechbrain.dataio.encoder - Load called, but CTCTextEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [01:26<00:00,  3.77it/s]
speechbrain.utils.train_logger - [LibriSpeech] - BEFORE: asr-wav2vec2-librispeech, set: test-clean - test CER: 5.00e-01, test WER: 1.90
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 368/368 [01:22<00:00,  4.46it/s]
speechbrain.utils.train_logger - [LibriSpeech] - BEFORE: asr-wav2vec2-librispeech, set: test-other - test CER: 8.83e-01, test WER: 2.95

We can compare this with:
https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/CTC
where for train_hf_wav2vec.yaml a Test Clean WER of 1.90 is reported.

The log for the refactored interfaces:

Checking out files: 100% (122/122), done.
Switched to branch 'hf-integration'
Your branch is up to date with 'origin/hf-integration'.

Run tests on: asr-wav2vec2-librispeech
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: tests/tmp/LibriSpeech
	repo: asr-wav2vec2-librispeech
	speechbrain.pretrained.EncoderASR
	obj.from_hparams({'source': 'tests/tmp/hf_interfaces/updates_pretrained_models/asr-wav2vec2-librispeech', 'savedir': 'pretrained_models/asr-wav2vec2-librispeech', 'run_opts': {'debug': False, 'debug_batches': 2, 'debug_epochs': 2, 'debug_persistently': False, 'device': 'cuda:0', 'data_parallel_backend': False, 'distributed_launch': False, 'distributed_backend': 'nccl', 'find_unused_parameters': False, 'tqdm_colored_bar': False}})
speechbrain.pretrained.fetching - Fetch hyperparams.yaml: Linking to local file in tests/tmp/hf_interfaces/updates_pretrained_models/asr-wav2vec2-librispeech/hyperparams.yaml.
speechbrain.pretrained.fetching - Fetch custom.py: Linking to local file in tests/tmp/hf_interfaces/updates_pretrained_models/asr-wav2vec2-librispeech/custom.py.
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.transformer.HuggingFace - speechbrain.lobes.models.HuggingFaceTransformer is frozen.
speechbrain.pretrained.fetching - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/wav2vec2.ckpt.
speechbrain.pretrained.fetching - Fetch asr.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/asr.ckpt.
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Loading pretrained files for: wav2vec2, asr, tokenizer
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.transformer.HuggingFace - speechbrain.lobes.models.HuggingFaceTransformer is frozen.
speechbrain.dataio.encoder - Load called, but CTCTextEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [01:30<00:00,  3.61it/s]
speechbrain.utils.train_logger - [LibriSpeech] - AFTER: asr-wav2vec2-librispeech, set: test-clean - test CER: 5.27e-01, test WER: 2.04
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 368/368 [01:25<00:00,  4.30it/s]
speechbrain.utils.train_logger - [LibriSpeech] - AFTER: asr-wav2vec2-librispeech, set: test-other - test CER: 9.21e-01, test WER: 3.15
	before: {'test-clean': {'CER': 0.50, 'WER': 1.90}, 'test-other': {'CER': 0.88, 'WER': 2.95}}
	 after: {'test-clean': {'CER': 0.53, 'WER': 2.04}, 'test-other': {'CER': 0.92, 'WER': 3.15}}
	  same: False

=> Well, there's more going on ;-)
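To put a number on "in the range", the before/after values from the log can be compared as relative changes. This is a sketch only: the helper names are hypothetical, and any tolerance passed to it (the 10% in the usage note) is purely illustrative, not something the testing tool applies.

```python
from typing import Dict

Metrics = Dict[str, Dict[str, float]]

# Values copied from the before/after summary in the log above.
before: Metrics = {"test-clean": {"CER": 0.50, "WER": 1.90}, "test-other": {"CER": 0.88, "WER": 2.95}}
after: Metrics = {"test-clean": {"CER": 0.53, "WER": 2.04}, "test-other": {"CER": 0.92, "WER": 3.15}}


def relative_change(before: Metrics, after: Metrics) -> Metrics:
    """Relative change (after - before) / before for every set and metric."""
    return {
        split: {
            name: (after[split][name] - value) / value
            for name, value in metrics.items()
        }
        for split, metrics in before.items()
    }


def within_tolerance(before: Metrics, after: Metrics, tolerance: float) -> bool:
    """True if every metric moved by at most `tolerance`, relative to before."""
    changes = relative_change(before, after)
    return all(abs(c) <= tolerance for m in changes.values() for c in m.values())
```

For the numbers above, the WER on test-clean moves by roughly 7% relative (1.90 to 2.04), so `within_tolerance(before, after, 0.10)` holds while an exact comparison does not.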

It's in the range, and with the 751 PR, more refactorings are coming in. Also, we had a hotfix of wav2vec2 toward the end of 2022 and, in early 2023, some more edits to whisper (without much performance re-checking). Per internal discussions, the SpeechBrain models will need retraining at some point.

@anautsch anautsch changed the base branch from unstable-v0.6 to develop March 10, 2023 14:18
@anautsch anautsch changed the base branch from develop to unstable-v0.6 March 10, 2023 14:18
@Adel-Moumen Adel-Moumen mentioned this pull request May 24, 2023
6 tasks
@mhn226 mhn226 self-requested a review June 20, 2023 14:46
@pplantinga

Collaborator

Still getting a handle on this PR, but one quick question from the start: many of the changes seem to be related to removing device from the arguments to the checkpointer's .load() function. Is this a necessary part of this PR or would it make sense to put in a different PR? I would just like to hear the motivation for the change itself and for putting it in this PR, and I can't find a discussion of this point in the comments above.

@Adel-Moumen

Collaborator

Hey @pplantinga, the device argument has already been removed in unstable-v0.6 (see: https://github.com/speechbrain/speechbrain/blob/unstable-v0.6/recipes/LibriSpeech/ASR/seq2seq/train.py#L370) thanks to @lucadellalib's PR (see: #1743).

I think you can ignore this. The PR is no longer in sync with the latest commits in unstable-v0.6.

@TParcollet

Collaborator

@mravanelli @Adel-Moumen @pplantinga I have an idea to simplify this PR. What if we take only the 'HuggingFaceTransformer(nn.Module):' class and make it an abstract class? The idea is to force the user to override a few mandatory functions, like the forward and the freeze params functions. This would be MUCH simpler to grasp. There is no way we release a SB v1.0 with huggingface_wav2vec.py in the lobes while it's not even wav2vec2 anymore ...
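A rough sketch of what such an abstract class could look like, using only the standard library's abc module. All names here (`TransformerInterface`, `EchoModel`, the `source` and `freeze` arguments) are hypothetical, and nn.Module is left out so the snippet runs without torch.

```python
from abc import ABC, abstractmethod
from typing import Any


class TransformerInterface(ABC):
    """Hypothetical abstract base: subclasses must supply forward and freezing."""

    def __init__(self, source: str, freeze: bool = True):
        self.source = source
        self.frozen = False
        if freeze:
            self.freeze_params()

    @abstractmethod
    def forward(self, data: Any) -> Any:
        """Model-specific forward pass."""

    @abstractmethod
    def freeze_params(self) -> None:
        """Model-specific freezing logic."""


class EchoModel(TransformerInterface):
    """Toy subclass that overrides both mandatory functions."""

    def forward(self, data: Any) -> Any:
        return data

    def freeze_params(self) -> None:
        self.frozen = True
```

Instantiating `TransformerInterface` directly raises a TypeError, and so does any subclass that forgets one of the two methods, which is the "force the user to override" behaviour described above.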

@mravanelli

mravanelli commented Aug 7, 2023 via email

Collaborator

@mhn226 mhn226 mentioned this pull request Aug 10, 2023
@Adel-Moumen

Collaborator

I'm closing this PR as there's a new one: #2116.

Labels

ready to review Waiting on reviewer to provide feedback refactor Edit code without changing functionality

5 participants