Refactoring of SB to HuggingFace interface by anautsch · Pull Request #1596 · speechbrain/speechbrain

anautsch · 2022-10-05T13:14:33Z

@TParcollet requested a more flexible interface to use HuggingFace than only with wav2vec2.

In another conversation, @Moumeneb1 pointed me to AutoModel of the HuggingFace transformers library.

Upfront a worthwhile side note—recently released:

transformers 4.22.2 on Sep 27, 2022
datasets 2.5.2 on Oct 5, 2022
huggingface-hub 0.10.0 on Sep 28, 2022

Should we opt for a minimum supported version of the HF libraries?
(edit: the hub is now at 0.7.0 in requirements.txt; the other two are optional. Should they remain so?)

What has happened at the opening of this Draft PR:

made the HuggingFace cache_dir configurable throughout (so it can be placed outside of $HOME/.cache)
=> minor impact to speechbrain/pretrained/fetching.py
removed unnecessary imports and dicts (handled via AutoModel)
placed static functions outside of a class
- _check_model_source(path, save_path) // has changes to check if source is downloaded already
created helper static functions; some are used implicitly by the two pre-existing classes
- config_return_hidden_states(config)
- model_set_spectral_augmentation(model, apply_spec_augment)
- modify_state_dict_wav2vec2(path)
- default_forward(model, data)
- wav2vec2_forward(model, data, output_all_hiddens)
- wav2vec2_pretraining_forward(model, data, mask_prob, mask_length)
These functions are intended to be used as partials - use them -or- plug-in your own :)
new class: HuggingFaceModel(nn.Module) to handle all interfaces with HuggingFace transformers - with init doing:
- determine AutoConfig (adjust if wanted)
- create/download model from AutoModel (adjust if wanted)
- prepare forward function abstraction
  - set input layer norm flag
  - assign inner forward function from given/default partial Callable
    (e.g., default_forward; wav2vec2_forward; wav2vec2_pretraining_forward)
  - set output layer norm flag
  - output of a variable -or- tuple
  Wrapper: forward() -> _forward() -> self.forward_partial_fn(data=data)
- handle Freezing
HuggingFaceWav2Vec2 now inherits from HuggingFaceModel and is reduced to a super().__init__() call
same for HuggingFaceWav2Vec2Pretrain; different init parameterization (serves here as proof-of-concept)
docstring examples for the three classes were working on my end
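
The partial-based forward abstraction listed above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the code of this PR: `ToyModel` and `ForwardWrapper` are hypothetical stand-ins (the real class is an nn.Module wrapping a transformers model), and only the names `default_forward` and `wav2vec2_forward` are borrowed from the list above.

```python
from functools import partial


class ToyModel:
    """Hypothetical stand-in for a transformers model."""

    def __call__(self, data, output_hidden_states=False):
        last = [2 * x for x in data]
        if output_hidden_states:
            # Pretend the model has two layers of hidden states.
            return {"last": last, "hidden": [list(data), last]}
        return {"last": last}


def default_forward(model, data):
    """Plain forward pass: return the last hidden state."""
    return model(data)["last"]


def wav2vec2_forward(model, data, output_all_hiddens=False):
    """Optionally return every hidden state instead of only the last one."""
    out = model(data, output_hidden_states=output_all_hiddens)
    return out["hidden"] if output_all_hiddens else out["last"]


class ForwardWrapper:
    """Hypothetical wrapper: binds the model and its options once at init."""

    def __init__(self, model, forward_partial_fn=default_forward, **fn_kwargs):
        self.forward_partial_fn = partial(
            forward_partial_fn, model=model, **fn_kwargs
        )

    def forward(self, data):
        # Mirrors the chain forward() -> self.forward_partial_fn(data=data).
        return self.forward_partial_fn(data=data)


plain = ForwardWrapper(ToyModel())
all_layers = ForwardWrapper(
    ToyModel(), forward_partial_fn=wav2vec2_forward, output_all_hiddens=True
)
print(plain.forward([1, 2]))
print(all_layers.forward([1, 2]))
```

Binding the options at construction time keeps the wrapper's forward signature identical for every model, which is presumably why plugging in a custom callable is enough to support a new architecture.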

Drafting status:

initial PR (docstring examples & linters)
create integration test folder with YAML examples
whether/not pythonapp workflow integration tests should install transformers>=4.22.2 (or: skip their integration examples)
resolve TODO comments
check on single GPU if nothing breaks & on DDP for wav2vec2 training
minimize online communication overheads (once downloaded, that's it)

Edit (2022-10-11).

dissolve current file & create a nested folder structure with main interface & helper functions
drop normalization functions (note: they have been migrated correctly BUT were ontologically superfluous in the starting code prior to this PR)
expedite further auto-general use features provided by HF
explore to provide further hub examples (beyond w2v2)
expand briefly the existing tutorial for how to make use of this PR

Edit (2022-12-13).

merge testing from refactor: recipe testing CSVs #1600
re-test HF pretrained models & apply fixes
fix failing recipes (when transformers integration of this PR is the issue)

anautsch · 2022-10-10T14:35:10Z

The above pre-review was resolved entirely in commit fb80a9d.

Since then, the following above TODOs have been completed:

create integration test folder with YAML examples
resolve TODO comments (note: those in py files)
check on single GPU if nothing breaks & on DDP for wav2vec2 training
--> instead: on cluster cpu node (since the rest is covered through other tests)
minimize online communication overheads (once downloaded, that's it)
--> cpu node ran offline
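
One way to force such an offline run is sketched below. The comment above does not say how the node was kept offline, so this is an assumption: it relies on the offline environment switches documented by the HuggingFace libraries, and the helper name `enable_offline_mode` is hypothetical.

```python
import os

# Assumption: offline switches as documented by huggingface_hub, transformers
# and datasets. They need to be set before those libraries are imported.
OFFLINE_FLAGS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE")


def enable_offline_mode(environ=os.environ):
    """Set every offline flag to "1" in the given environment mapping."""
    for name in OFFLINE_FLAGS:
        environ[name] = "1"
    return environ
```

With these flags set, a model that is not yet in the local cache should fail to load instead of triggering a download, which makes a missing cache entry visible right away.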

The four test yamls target the existing interfaces:

speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2
HuggingFaceWav2Vec2 immediately mapped from yaml through its new base HuggingFaceModel
speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2Pretrain
HuggingFaceWav2Vec2Pretrain immediately mapped from yaml through its new base HuggingFaceModel

Tested with

transformers       4.22.2
huggingface-hub    0.10.0
datasets           2.5.2

Note:
Each of the four provided integration tests is skipped on GitHub (no transformers dependency and, likely, out-of-memory). Common snippet:

def test_loss(device):
    skip = False
    try:
        import transformers

        _ = transformers.__version__
    except ImportError:
        skip = True
        print("\tSkipped")

    if not skip:
        main(device)
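
A variant of the same skip logic, sketched with the standard library only: it checks whether the optional package can be found without importing it. The helper names `has_module` and `run_if_available` are hypothetical and not part of this PR.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found."""
    return importlib.util.find_spec(name) is not None


def run_if_available(module_name, fn, *args):
    """Call fn(*args) only when the optional dependency is installed."""
    if not has_module(module_name):
        print(f"\tSkipped ({module_name} not installed)")
        return None
    return fn(*args)
```

In a test this would read `run_if_available("transformers", main, device)`. When pytest is available, `pytest.importorskip("transformers")` gives a similar effect and reports the test as skipped rather than passed.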

anautsch · 2022-10-10T15:53:20Z

Upfront: the provided integration tests are limited in scope.
Not: thorough tests of whether every possible wav2vec2 setting works.
Goal: check whether anything breaks immediately, crashes, or fails to run to the end of a "minimal" example.

A list for potential legacy testing of HF integration:

pip install huggingface_hub==0.7.0 datasets==2.0.0 transformers==4.18.0
2022: May 24 - Mar 15 - Apr 6

pip install huggingface_hub==0.4.0 datasets==1.18.2 transformers==4.16.0
2022: Jan 11 - Jan 28 - Jan 27

pip install huggingface_hub==0.1.0 datasets==1.15.0 transformers==4.12.3
2021: Nov 2 - Nov 2 - Nov 3

pip install huggingface_hub==0.0.1 datasets==1.1.3 transformers==4.0.0
2020: Dec 23 - Nov 19 - Nov 30

As for the warning messages about changes in transformers v5, I'd recommend sticking with:
pip install "huggingface_hub>=0.10.0" "datasets>=2.5.2" "transformers>=4.22.2,<5.0"

@TParcollet what's your take ?

Note: when jumping between the HF versions, on legacy testing, lots of cache-related errors appeared—one of the logs showed:

The new cache file layout looks like this

Locking the latest versions would be another option:
pip install huggingface_hub==0.10.0 datasets==2.5.2 transformers==4.22.2
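
To illustrate what a lower bound with an upper cap means for the versions listed in this comment, here is a minimal sketch. It assumes plain numeric release versions such as 4.22.2, and the helper names `parse_version` and `in_range` are hypothetical. For real dependency checks, `packaging.version.Version` is the more robust choice.

```python
def parse_version(text):
    """Turn '4.22.2' into (4, 22, 2); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def in_range(installed, minimum, below=None):
    """Check minimum <= installed, and installed < below if a cap is given."""
    version = parse_version(installed)
    if version < parse_version(minimum):
        return False
    return below is None or version < parse_version(below)


# Versions mentioned in this thread, checked against ">=4.22.2,<5.0".
for candidate in ("4.22.2", "4.18.0", "4.0.0", "5.0.0"):
    print(candidate, in_range(candidate, "4.22.2", below="5.0"))
```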

anautsch · 2022-10-17T15:02:42Z

A question on the scope of this PR. Thus far, the goal was to make more transformers models available. In that sense, future PRs could explore more. Yet, what else is there to explore? 🤗

data preparation: datasets is considered feature-complete
streaming (HF-tested on w2v2 fine-tuning), see: https://huggingface.co/docs/datasets/stream

HF/datasets related pointers:
https://huggingface.co/docs/datasets/audio_load
https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0
https://huggingface.co/blog?tag=audio

I see these features as outside the nature of this PR.

Then, there are the transformers Auto classes and pipelines.

pipeline for inference (comparable to SB architectures); see: https://huggingface.co/docs/transformers/pipeline_tutorial
AutoClass classes beyond AutoModel (currently the only one provided in this PR): AutoTokenizer, AutoFeatureExtractor, AutoProcessor & AutoModel; see: https://huggingface.co/docs/transformers/autoclass_tutorial

While we already do the former in SB (but in the SB way), the latter could have a broader impact: which of these AutoClasses should we choose from?
// perhaps an init check for alternatives to AutoModel would be a go-to here; structurally it appears of same use style.
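
The init check floated above could look roughly like this. It is a sketch under assumptions: `resolve_auto_class` and `AUTO_CLASSES` are hypothetical names, and a stand-in namespace replaces the transformers module so the example runs without it installed.

```python
from types import SimpleNamespace

# The AutoClass names mentioned in this comment.
AUTO_CLASSES = (
    "AutoModel",
    "AutoTokenizer",
    "AutoFeatureExtractor",
    "AutoProcessor",
)


def resolve_auto_class(library, name="AutoModel"):
    """Look up an allowed AutoClass on the given module-like object."""
    if name not in AUTO_CLASSES:
        raise ValueError(f"Unsupported AutoClass: {name!r}")
    return getattr(library, name)


# Stand-in for `import transformers`, holding dummy classes of the same names.
fake_transformers = SimpleNamespace(
    **{name: type(name, (), {}) for name in AUTO_CLASSES}
)
print(resolve_auto_class(fake_transformers, "AutoTokenizer").__name__)
```

With the real library, the call would be `resolve_auto_class(transformers, "AutoTokenizer")`, and the returned class would then be used through its `from_pretrained` method.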

anautsch · 2022-10-18T10:57:52Z

@Adel-Moumen Just noticed that we will need to update w2v2 related YAMLs on HF when this PR gets merged. What is the best way forward in your view?

For now, I'll put back in the speechbrain.lobes.models.huggingface_wav2vec interfaces.
Their use could be resolved when we do the v0.6 migration (which is becoming some sort of mantra...).

anautsch · 2023-02-08T14:54:14Z

Intermezzo notes:

115 commits with +6,403 -1,761 changes is now 49 commits with +2,498 -972 changes
PR Collecting HF yaml updates progressively #1623 is now pr1596 depending refactorings #1801 (cleaned-up before 1600/testing refactoring was merged)
meanwhile, whisper came in & changed
meanwhile, w2v2 was hotfixed
meanwhile, the tutorial changed

Wrapping this one up over the next few weeks.

anautsch · 2023-03-07T10:14:12Z

To follow-up on the merge procedure described in:
#1596 (comment)

For how to use the advanced test tools, when interface refactoring touches upon pretrained model interfaces (e.g. YAML files on HuggingFace), please read:
https://github.com/speechbrain/speechbrain/blob/develop/tests/utils/README.md

To-date, the following SpeechBrain branches & PRs are related:

Branch	Purpose	PR
develop	v0.5.14	to-come v0.6
unstable-v0.6	v0.6.0	#1596 flexible transformer integration
hf-interface-testing	keeping track of HF interfaces (YAMLs & custom.py)	#1868 adds two recent HF repos (original interfaces)
hf-interface-testing	PR 1596 changes interfaces, so YAMLs on HF repos need to change	#1801 supplements 1596 (after 1868)
unstable-v0.6	v0.6.0	#751 CTC decoding & scoring refactorings

PR 1868 should be a simple comparison with what's currently on HF
PR 1801 can be used for tutorials to show how YAMLs need to be updated; here, we keep track of our changes
PR 1596 is the transformers lib refactoring (this PR), to make use of their AutoConfig, AutoTokenizer, AutoModel, etc.
PR 751 the legendary v0.6 development - this PR will also need supplementing edits to YAMLs & pretrained interfaces on HF, in a similar tracking fashion as demonstrated with this PR for refactoring the transformers integration

The then "to-come v0.6" will be progressively enhanced on the unstable-v0.6 branch. When the moment of its merging comes, all YAMLs & interfaces that are on hf-interface-testing are ready to be put on HuggingFace. That in-parallel update procedure implies that each HF repo will have a PR which updates YAML & custom.py interface files with what we kept track of in the hf-interface-testing branch.

Before merging this PR on unstable-v0.6, please:

merge the latest develop on the unstable-v0.6 branch
rebase this PR away from & back to the unstable-v0.6 branch => git tree needs that for updating...

Here's how I went about testing this PR.

Preparation steps:

(1) Clone & checkout https://github.com/anautsch/speechbrain/tree/hf-integration
(2) Please create a fresh Python 3.9 environment (some testing tools require >= 3.9)
(3) Install SpeechBrain from local repo + requirements + find recipes | grep extra | xargs cat | sort -u | grep -v \# | xargs -I {} pip install {}

Next => Test if the refactored recipes (SpeechBrain repo only) still work (here, whisper & wav2vec2 only).

(4) Run recipe tests by Hparam_file

python -c 'from tests.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Hparam_file"], filters=[["recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_hf_whisper_encoder.yaml", "recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml", "recipes/TIMIT/ASR/seq2seq/hparams/train_with_wav2vec2.yaml", "recipes/SLURP/direct/hparams/train_with_wav2vec2.yaml", "recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_sb_wav2vec.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml", "recipes/LibriSpeech/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml", "recipes/Switchboard/ASR/CTC/hparams/train_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_fon_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_amh_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_sw_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_dar_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_wol_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_multi_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_en_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_fr_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_rw_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_it_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_en_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_fr_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_de_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_rw_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_it_with_wav2vec.yaml", "recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml", "recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer_with_wav2vect.yaml", 
"recipes/AISHELL-1/ASR/CTC/hparams/train_with_wav2vec.yaml", "recipes/timers-and-such/direct/hparams/train_with_wav2vec2.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_ar_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_fa_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_fr_hf_whisper.yaml"]], do_checks=False, run_opts="--device=cuda")) else print("TEST PASSED")'

Note: cat tests/recipes/*.csv | cut -d ',' -f2 | sort -u will list the recipe datasets currently recorded for testing.
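
For reference, a rough Python equivalent of that shell pipeline. This is a sketch: the function name `list_recipe_datasets` is hypothetical, and like `cut` it splits on plain commas without handling quoted fields, so the header cell of each CSV shows up in the result as well.

```python
import glob


def list_recipe_datasets(pattern="tests/recipes/*.csv", column=1):
    """Collect the unique values of one comma-separated column, sorted."""
    values = set()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                fields = line.rstrip("\n").split(",")
                # Lines with too few fields are ignored here.
                if len(fields) > column:
                    values.add(fields[column])
    return sorted(values)


# Prints an empty list when run outside the repository root.
print(list_recipe_datasets())
```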

(5) Prepare testing of YAMLs + custom.py (the to-come HF repo updates/PRs).

python -c 'from tests.utils.refactoring_checks import init;init(new_interfaces_git="https://github.com/anautsch/speechbrain", new_interfaces_branch="hf-integration")'

This clones the specified branch (it's PR 1801) into the nested tests/tmp/hf_interfaces folder. There, the tests/utils/refactoring_checks.py tool can access the updated YAMLs & custom.py files. The same tool fetches the current versions of these files directly from HuggingFace. To compare before and after the refactoring, we take advantage of the local repo installation of SpeechBrain: we can switch between the develop & PR branches.

(6) Run single-file tests for pretrained interfaces

# Let's revisit the old way to integrate transformers into SpeechBrain
git checkout develop
python -c "from tests.utils.refactoring_checks import gather_expected_results;gather_expected_results()"

# so we can compare it with the proposed way to integrate all of the latest features from the transformers library
git checkout hf-integration
python -c "from tests.utils.refactoring_checks import gather_refactoring_results;gather_refactoring_results()"

Note: A yaml summary file will be created at tests/tmp/refactoring_results.yaml.

For the following, please ensure that the test partitions of the required recipe datasets are available. In this example, we assume that only LibriSpeech is accessible. Path specifications for the other recipe datasets are therefore left empty, so the pretrained models that depend on them cannot be tested. The following step aims to reproduce, on the test partitions, the performance metrics reported in the SpeechBrain recipe folders; it does so using the pretrained models.

(7) Prepare your datasets using the recipe, but point their output to the testing structure. Example: LibriSpeech (run on cpu-only; more CPUs, less waiting)

cd recipes/LibriSpeech/ASR/CTC && python train_with_wav2vec.py hparams/train_hf_wav2vec.yaml --data_folder=/path/to/dataset --output_folder=../../../../tests/tmp/LibriSpeech || cd -

Manually cancel after the data preparation finished (when recipe training starts). Repeat for other datasets (each one recipe); check on step (9) for using expected folder names.

(8) So that the recipes folder can be found as a module, we create a symbolic link (try without it, and you'll see).

cd tests/utils && ln -s ../../recipes  && cd -

(9) Run tests with pretrained models on the test partitions of recipe datasets

git checkout develop
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="/path/to/dataset" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --after=False

git checkout hf-integration
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="/path/to/dataset" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --after=True

Note: Other refactorings might have expected changes in their testing performance; then, this tool can be used to measure those changes as well.

Logs from (4)

(1/35) Running test for TIMIT_row_4...
	... 582.89s
(2/35) Running test for TIMIT_row_18...
	=> skipped; took too long – i.e. restart w/o the two TIMIT yamls

(1/33) Running test for LibriSpeech_row_2...
	... 167.02s
(2/33) Running test for LibriSpeech_row_3...
	... 44.51s
(3/33) Running test for LibriSpeech_row_23...
	... 15.72s
(4/33) Running test for LibriSpeech_row_24...
	... 28.28s
(5/33) Running test for LibriSpeech_row_25...
	... 23.78s
(6/33) Running test for DVoice_row_2...
	... 122.35s
(7/33) Running test for DVoice_row_3...
	... 110.42s
(8/33) Running test for DVoice_row_4...
	... 92.57s
(9/33) Running test for DVoice_row_5...
	... 91.19s
(10/33) Running test for DVoice_row_6...
	... 91.76s
(11/33) Running test for DVoice_row_7...
	... 88.60s
(12/33) Running test for AISHELL-1_row_2...
	... 177.38s
(13/33) Running test for AISHELL-1_row_5...
	... 99.93s
(14/33) Running test for timers-and-such_row_6...
	... 67.50s
(15/33) Running test for CommonVoice_row_2...
	... 93.06s
(16/33) Running test for CommonVoice_row_3...
	... 70.77s
(17/33) Running test for CommonVoice_row_4...
	... 70.30s
(18/33) Running test for CommonVoice_row_5...
	... 73.74s
(19/33) Running test for CommonVoice_row_6...
	... 74.12s
(20/33) Running test for CommonVoice_row_12...
	... 104.15s
(21/33) Running test for CommonVoice_row_13...
	... 123.71s
(22/33) Running test for CommonVoice_row_14...
	... 183.87s
(23/33) Running test for CommonVoice_row_15...
	... 144.55s
(24/33) Running test for CommonVoice_row_18...
	... 54.74s
(25/33) Running test for CommonVoice_row_19...
	... 34.37s
(26/33) Running test for CommonVoice_row_20...
	... 33.29s
(27/33) Running test for CommonVoice_row_21...
	... 34.68s
(28/33) Running test for CommonVoice_row_22...
	... 31.71s
(29/33) Running test for CommonVoice_row_23...
	... 40.47s
(30/33) Running test for CommonVoice_row_24...
	... 26.99s
(31/33) Running test for Switchboard_row_2...
	... 33.13s
(32/33) Running test for SLURP_row_4...
	... 25.21s
(33/33) Running test for IEMOCAP_row_2...
	... 20.28s

=> Ok, so the refactored recipes are not crashing.
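
To find the slow recipes in such a log, the per-test timings can be pulled out with a small parser. This is a sketch that assumes the two-line log format shown above; `parse_timings` is a hypothetical helper, not part of the test tools.

```python
import re

TEST_LINE = re.compile(r"\((\d+)/(\d+)\) Running test for (\S+?)\.\.\.")
TIME_LINE = re.compile(r"\.\.\.\s+([\d.]+)s")


def parse_timings(log_text):
    """Map each test name to its duration in seconds."""
    timings = {}
    current = None
    for line in log_text.splitlines():
        test = TEST_LINE.search(line)
        if test:
            current = test.group(3)
            continue
        duration = TIME_LINE.search(line)
        if duration and current is not None:
            timings[current] = float(duration.group(1))
            current = None
    return timings


sample = (
    "(1/33) Running test for LibriSpeech_row_2...\n"
    "\t... 167.02s\n"
    "(2/33) Running test for LibriSpeech_row_3...\n"
    "\t... 44.51s\n"
)
timings = parse_timings(sample)
print(max(timings, key=timings.get), sum(timings.values()))
```

Tests without a timing line, such as the skipped TIMIT entry, simply do not appear in the result.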

Logs from (6)

$ grep same tests/tmp/refactoring_results.yaml
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true
  same: true

Note: emotion-recognition-wav2vec2-IEMOCAP entries need manual clean-up.

=> Ok, so for single audios, the refactoring does no harm on inference when using pretrained models.
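
The grep above can also be turned into a quick tally. This is a sketch that scans the text line by line rather than parsing YAML, so it only assumes that each result carries a `same:` line as in the output above; `count_same_flags` is a hypothetical helper.

```python
def count_same_flags(yaml_text):
    """Count 'same: true' and 'same: false' lines, ignoring letter case."""
    counts = {"true": 0, "false": 0}
    for line in yaml_text.splitlines():
        stripped = line.strip().lower()
        if stripped.startswith("same:"):
            value = stripped.split(":", 1)[1].strip()
            if value in counts:
                counts[value] += 1
    return counts


sample = "repo_a:\n  same: true\nrepo_b:\n  same: true\nrepo_c:\n  same: False\n"
print(count_same_flags(sample))
```

In practice the input would be the content of tests/tmp/refactoring_results.yaml, and any non-zero `false` count points at an interface that needs a closer look.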

Logs from (9)

Run tests on: asr-wav2vec2-librispeech
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: tests/tmp/LibriSpeech
	repo: asr-wav2vec2-librispeech
	speechbrain.pretrained.EncoderASR
	obj.from_hparams({'source': 'speechbrain/asr-wav2vec2-librispeech', 'savedir': 'pretrained_models/asr-wav2vec2-librispeech', 'run_opts': {'debug': False, 'debug_batches': 2, 'debug_epochs': 2, 'debug_persistently': False, 'device': 'cuda:0', 'data_parallel_backend': False, 'distributed_launch': False, 'distributed_backend': 'nccl', 'find_unused_parameters': False, 'tqdm_colored_bar': False}})
speechbrain.pretrained.fetching - Fetch hyperparams.yaml: Delegating to Huggingface hub, source speechbrain/asr-wav2vec2-librispeech.
speechbrain.pretrained.fetching - Fetch custom.py: Delegating to Huggingface hub, source speechbrain/asr-wav2vec2-librispeech.
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.huggingface_wav2vec - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
speechbrain.pretrained.fetching - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/wav2vec2.ckpt.
speechbrain.pretrained.fetching - Fetch asr.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/asr.ckpt.
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Loading pretrained files for: wav2vec2, asr, tokenizer
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.huggingface_wav2vec - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
speechbrain.dataio.encoder - Load called, but CTCTextEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [01:26<00:00,  3.77it/s]
speechbrain.utils.train_logger - [LibriSpeech] - BEFORE: asr-wav2vec2-librispeech, set: test-clean - test CER: 5.00e-01, test WER: 1.90
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 368/368 [01:22<00:00,  4.46it/s]
speechbrain.utils.train_logger - [LibriSpeech] - BEFORE: asr-wav2vec2-librispeech, set: test-other - test CER: 8.83e-01, test WER: 2.95

We can compare this with:
https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/CTC
where for train_hf_wav2vec.yaml a Test Clean WER of 1.90 is reported.

The log for the refactored interfaces:

Checking out files: 100% (122/122), done.
Switched to branch 'hf-integration'
Your branch is up to date with 'origin/hf-integration'.

Run tests on: asr-wav2vec2-librispeech
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: tests/tmp/LibriSpeech
	repo: asr-wav2vec2-librispeech
	speechbrain.pretrained.EncoderASR
	obj.from_hparams({'source': 'tests/tmp/hf_interfaces/updates_pretrained_models/asr-wav2vec2-librispeech', 'savedir': 'pretrained_models/asr-wav2vec2-librispeech', 'run_opts': {'debug': False, 'debug_batches': 2, 'debug_epochs': 2, 'debug_persistently': False, 'device': 'cuda:0', 'data_parallel_backend': False, 'distributed_launch': False, 'distributed_backend': 'nccl', 'find_unused_parameters': False, 'tqdm_colored_bar': False}})
speechbrain.pretrained.fetching - Fetch hyperparams.yaml: Linking to local file in tests/tmp/hf_interfaces/updates_pretrained_models/asr-wav2vec2-librispeech/hyperparams.yaml.
speechbrain.pretrained.fetching - Fetch custom.py: Linking to local file in tests/tmp/hf_interfaces/updates_pretrained_models/asr-wav2vec2-librispeech/custom.py.
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.transformer.HuggingFace - speechbrain.lobes.models.HuggingFaceTransformer is frozen.
speechbrain.pretrained.fetching - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/wav2vec2.ckpt.
speechbrain.pretrained.fetching - Fetch asr.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/asr.ckpt.
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in pretrained_models/asr-wav2vec2-librispeech/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Loading pretrained files for: wav2vec2, asr, tokenizer
Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
speechbrain.lobes.models.transformer.HuggingFace - speechbrain.lobes.models.HuggingFaceTransformer is frozen.
speechbrain.dataio.encoder - Load called, but CTCTextEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [01:30<00:00,  3.61it/s]
speechbrain.utils.train_logger - [LibriSpeech] - AFTER: asr-wav2vec2-librispeech, set: test-clean - test CER: 5.27e-01, test WER: 2.04
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 368/368 [01:25<00:00,  4.30it/s]
speechbrain.utils.train_logger - [LibriSpeech] - AFTER: asr-wav2vec2-librispeech, set: test-other - test CER: 9.21e-01, test WER: 3.15
	before: {'test-clean': {'CER': 0.50, 'WER': 1.90}, 'test-other': {'CER': 0.88, 'WER': 2.95}}
	 after: {'test-clean': {'CER': 0.53, 'WER': 2.04}, 'test-other': {'CER': 0.92, 'WER': 3.15}}
	  same: False

=> Well, there's more going on ;-)

It's in the range, and with PR 751 more refactorings are coming in. Also, wav2vec2 was hotfixed toward the end of 2022, and in early 2023 whisper received some more edits (without much performance re-checking). As per internal discussions, the SpeechBrain models will need retraining at some point.
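
To put numbers on "in the range", the before/after summaries from the log above can be compared per metric. This is a sketch: the helper names are hypothetical, and the tolerance values are arbitrary assumptions for illustration, not thresholds used by the test tools.

```python
# Values copied from the before/after summary in the log above.
before = {
    "test-clean": {"CER": 0.50, "WER": 1.90},
    "test-other": {"CER": 0.88, "WER": 2.95},
}
after = {
    "test-clean": {"CER": 0.53, "WER": 2.04},
    "test-other": {"CER": 0.92, "WER": 3.15},
}


def metric_deltas(before, after):
    """Absolute change per test set and metric, rounded to two decimals."""
    return {
        test_set: {
            metric: round(after[test_set][metric] - value, 2)
            for metric, value in metrics.items()
        }
        for test_set, metrics in before.items()
    }


def within_tolerance(deltas, tolerance):
    """True if no metric moved by more than `tolerance` (absolute)."""
    return all(
        abs(delta) <= tolerance
        for per_set in deltas.values()
        for delta in per_set.values()
    )


deltas = metric_deltas(before, after)
print(deltas)
print(within_tolerance(deltas, 0.25))
```

A tolerance-based check of this kind would report the run as acceptable where the strict `same` flag reports False.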

pplantinga · 2023-08-02T01:34:37Z

Still getting a handle on this PR, but one quick question from the start: many of the changes seem to be related to removing device from the arguments to the checkpointer's .load() function. Is this a necessary part of this PR or would it make sense to put in a different PR? I would just like to hear the motivation for the change itself and for putting it in this PR, and I can't find a discussion of this point in the comments above.

Adel-Moumen · 2023-08-02T09:38:01Z

Hey @pplantinga, the device argument has already been removed in unstable-v0.6 (see: https://github.com/speechbrain/speechbrain/blob/unstable-v0.6/recipes/LibriSpeech/ASR/seq2seq/train.py#L370) thanks to @lucadellalib's PR (see: #1743).

I think you should not pay attention to this. The PR is no longer in sync with the latest commits in unstable-v0.6.

TParcollet · 2023-08-06T18:25:54Z

@mravanelli @Adel-Moumen @pplantinga I have an idea to simplify this PR. What if we take only the 'HuggingFaceTransformer(nn.Module):' class and make it an abstract class? The idea is to force the user to override a few mandatory functions, like the forward and the freeze-params functions. This would be MUCH simpler to grasp. There is no way we release SB v1.0 with huggingface_wav2vec.py in the lobes while it's not even wav2vec2 anymore ...

mravanelli · 2023-08-07T13:08:23Z

I agree on the need for a simplification. Could you show us a code snippet so we can see what the code could look like with this proposal?
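
A minimal sketch of what such an abstract base might look like, under stated assumptions: it is not code from this thread or from #2116, it leaves out nn.Module to stay self-contained, and `ToyWav2Vec2` and the method name `freeze_params` are hypothetical.

```python
from abc import ABC, abstractmethod


class HuggingFaceTransformer(ABC):
    """Hypothetical abstract base; a real one would also subclass nn.Module."""

    def __init__(self, source, freeze=True):
        self.source = source
        self.freeze = freeze
        if self.freeze:
            self.freeze_params()

    @abstractmethod
    def forward(self, data):
        """Each model family must define its own forward pass."""

    @abstractmethod
    def freeze_params(self):
        """Each model family must define how its parameters are frozen."""


class ToyWav2Vec2(HuggingFaceTransformer):
    """Toy subclass that overrides the two mandatory functions."""

    def __init__(self, source, freeze=True):
        self.frozen = False
        super().__init__(source, freeze=freeze)

    def forward(self, data):
        return [2 * x for x in data]

    def freeze_params(self):
        self.frozen = True


model = ToyWav2Vec2("facebook/wav2vec2-large-960h-lv60-self")
print(model.frozen, model.forward([1, 2]))
```

Instantiating the base class directly raises a TypeError, which is the mechanism that forces users to provide both overrides.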

Adel-Moumen · 2023-08-17T13:07:08Z

I'm closing this PR as there's a new one: #2116.

anautsch added 2 commits October 5, 2022 14:36

Refactoring of SB to HuggingFace interface

4d117a8

linters

6e7981d

anautsch requested a review from Adel-Moumen October 5, 2022 13:19

anautsch added 3 commits October 5, 2022 15:27

Merge branch 'speechbrain:develop' into hf-integration

cffbe3f

docstring edits

b5aa528

Value type checks (flags)

7836f26

Adel-Moumen requested changes Oct 5, 2022

Adel-Moumen self-assigned this Oct 5, 2022

Adel-Moumen added the refactor Edit code without changing functionality label Oct 5, 2022

anautsch added 6 commits October 6, 2022 10:45

documentation: partials for wrapping external forward functions

fb80a9d

integration tests w/ examples

c1dec26

immediate yaml wrappers & skip tests if no transformers

882ad91

immediate HF wrapper from yaml (pretrained models) & fixes

7cbfc54

integration tests revisited

b90dac2

minor fix

f8ad77f

anautsch marked this pull request as ready for review October 10, 2022 15:54

anautsch added 2 commits October 12, 2022 14:31

modularised

6062603

adjusted doctests

a9465d0

anautsch added 2 commits October 17, 2022 17:15

availed all HF AutoClass:es

da29b8b

linters

32421e9

anautsch added 6 commits October 18, 2022 12:59

legacy interface wrapping put back in place

dc436a2

put test/ignores back in for HF w2v2 intreface

6dd361f

updated paths to keep legacy in integration tests

2750f32

refactor checks

5ae61bb

fix

4f9acbf

deactivated ssl

feba26d

TParcollet requested a review from mravanelli January 3, 2023 16:50

anautsch mentioned this pull request Jan 13, 2023

pr1596 depending refactorings #1801

Merged

merge develop

40b687a

anautsch changed the base branch from unstable-v0.6 to unstable February 8, 2023 14:40

anautsch changed the base branch from unstable to unstable-v0.6 February 8, 2023 14:40

anautsch added 10 commits February 9, 2023 15:53

lints & enc/dec split in sb transformer w/ example

10c788a

reset SB transformer

c2179ff

Merge unstable

656a3b8

fixes to unstable branch

7745f53

merge develop

f4cc79c

lints & more

9ddab4c

merge pending PR to update refactoring test tools

e5edc6a

Merge & edits

e9655a3

CTC recipe fixes

8c6480b

adjust new whisper recipes

f9c6dd5

merge develop

b3144bf

anautsch changed the base branch from unstable-v0.6 to develop March 10, 2023 14:18

anautsch changed the base branch from develop to unstable-v0.6 March 10, 2023 14:18

Adel-Moumen mentioned this pull request May 24, 2023

Neural Rescoring #1986

Closed

6 tasks

Adel-Moumen assigned TParcollet Jun 20, 2023

mhn226 self-requested a review June 20, 2023 14:46

mhn226 mentioned this pull request Aug 10, 2023

HF interface #2116

Merged

Adel-Moumen closed this Aug 17, 2023

5 participants
