Refactoring of SB to HuggingFace interface #1596
|
The above pre-review was resolved entirely in commit fb80a9d. Since then, the following TODOs from above have been completed:
The four test yamls target the existing interfaces:
Tested with Note:
def test_loss(device):
    skip = False
    try:
        import transformers
        _ = transformers.__version__
    except ImportError:
        skip = True
        print("\tSkipped")
    if not skip:
        main(device) |
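The guard above could also be factored into a small reusable helper. This is only a minimal sketch: `module_available` and `run_if_available` are hypothetical names, not part of SpeechBrain. It relies on `importlib.util.find_spec` from the standard library, so the optional package is looked up without being imported.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found.

    Hypothetical helper for illustration; only top-level names are handled.
    """
    return importlib.util.find_spec(name) is not None


def run_if_available(name, fn, *args):
    """Call fn(*args) only when module `name` is installed; otherwise skip."""
    if not module_available(name):
        print("\tSkipped")
        return None
    return fn(*args)
```

With such a helper, the test body would reduce to something like `run_if_available("transformers", main, device)`.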
|
Upfront: the provided integration tests are limited in scope. A list for potential legacy testing of the HF integration:
As for the warning messages about changes in transformers v5, I'd recommend sticking with:
@TParcollet what's your take?
Note: when jumping between HF versions during legacy testing, lots of cache-related errors appeared; one of the logs showed:
Locking the latest versions would be another option: |
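If a minimum version were enforced at test time rather than pinned, the check might look roughly like the sketch below. All function names are hypothetical, and the version parsing is deliberately simplified (it keeps only the leading numeric components); for real use, `packaging.version.Version` would be the more robust choice. The `4.22.2` value is the `transformers` minimum mentioned in this PR's description.

```python
import re
from importlib import metadata


def numeric_version(version: str) -> tuple:
    """Keep the leading numeric components, e.g. '4.23.0.dev0' -> (4, 23, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric components only."""
    return numeric_version(installed) >= numeric_version(minimum)


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None or not meets_minimum(found, "4.22.2"):
        print("\tSkipped")
```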
|
A question about the scope of this PR. Thus far, it was to make more transformers models available. In that sense, future PRs could explore more. Yet, what more is there to explore? 🤗
HF/datasets related pointers: I'd see these features as different in nature from this PR. Then, there are the transformers Auto~ classes and Pipelines.
While we already do the former in SB (but in the SB way), the latter could have a broader impact: which of these AutoClasses should we choose from? |
|
@Adel-Moumen Just noticed that we will need to update w2v2 related YAMLs on HF when this PR gets merged. What is the best way forward in your view? For now, I'll put back in the |
|
Intermezzo notes:
Wrapping this one up over the next few weeks. |
|
To follow up on the merge procedure described in:
For how to use the advanced test tools when interface refactoring touches pretrained model interfaces (e.g. YAML files on HuggingFace), please read:
To date, the following SpeechBrain branches & PRs are related:
The then "to-come v0.6" will be progressively enhanced on the
Before merging this PR on
Here's how I went about testing this PR. Preparation steps:
Next => Test if the refactored recipes (SpeechBrain repo only) still work (here, whisper & wav2vec2 only).
python -c 'from tests.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Hparam_file"], filters=[["recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_hf_whisper_encoder.yaml", "recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml", "recipes/TIMIT/ASR/seq2seq/hparams/train_with_wav2vec2.yaml", "recipes/SLURP/direct/hparams/train_with_wav2vec2.yaml", "recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_sb_wav2vec.yaml", "recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml", "recipes/LibriSpeech/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml", "recipes/Switchboard/ASR/CTC/hparams/train_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_fon_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_amh_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_sw_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_dar_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_wol_with_wav2vec.yaml", "recipes/DVoice/ASR/CTC/hparams/train_multi_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_en_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_fr_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_rw_with_wav2vec.yaml", "recipes/CommonVoice/ASR/seq2seq/hparams/train_it_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_en_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_fr_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_de_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_rw_with_wav2vec.yaml", "recipes/CommonVoice/ASR/CTC/hparams/train_it_with_wav2vec.yaml", "recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml", "recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer_with_wav2vect.yaml", 
"recipes/AISHELL-1/ASR/CTC/hparams/train_with_wav2vec.yaml", "recipes/timers-and-such/direct/hparams/train_with_wav2vec2.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_ar_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_fa_hf_whisper.yaml", "recipes/CommonVoice/ASR/transformer/hparams/train_fr_hf_whisper.yaml"]], do_checks=False, run_opts="--device=cuda")) else print("TEST PASSED")'
Note:
python -c 'from tests.utils.refactoring_checks import init;init(new_interfaces_git="https://github.com/anautsch/speechbrain", new_interfaces_branch="hf-integration")'
This clones the specified branch (it's PR 1801) to the nested
# Let's revisit the old way to integrate transformers into SpeechBrain
git checkout develop
python -c "from tests.utils.refactoring_checks import gather_expected_results;gather_expected_results()"
# so we can compare it with the proposed way to integrate all of the latest features from the transformers library
git checkout hf-integration
python -c "from tests.utils.refactoring_checks import gather_refactoring_results;gather_refactoring_results()"
Note: A yaml summary file will be created at
For the following, please ensure that the test partitions of the relevant recipe datasets are available. In this example, we assume access to LibriSpeech only. As such, path specifications for other recipe datasets are empty, so pretrained models that depend on them cannot be tested. The following step aims to reproduce the performance metrics on test partitions as reported in the SpeechBrain recipe folders; for that, however, pretrained models are used.
cd recipes/LibriSpeech/ASR/CTC && python train_with_wav2vec.py hparams/train_hf_wav2vec.yaml --data_folder=/path/to/dataset --output_folder=../../../../tests/tmp/LibriSpeech || cd -
Manually cancel after the data preparation has finished (when recipe training starts). Repeat for other datasets (one recipe each); see step (9) for the expected folder names.
cd tests/utils && ln -s ../../recipes && cd -
git checkout develop
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="/path/to/dataset" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --after=False
git checkout hf-integration
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="/path/to/dataset" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --after=True
Note: Other refactorings might have expected changes in their testing performance; this tool can then be used to measure those changes as well.
Logs from (4)
=> Ok, so the refactored recipes are not crashing.
Logs from (6)
Note: emotion-recognition-wav2vec2-IEMOCAP entries need manual clean-up.
=> Ok, so for single audios, the refactoring does no harm to inference when using pretrained models.
Logs from (9)
We can compare this with:
The log for the refactored interfaces:
=> Well, there's more going on ;-) It's in the range, and with the 751 PR, more refactorings are coming in. Also, we had a hotfix of wav2vec2 towards the end of 2022 and, in early 2023, some more edits to whisper (without much performance re-checking). As per internal discussions, at some point the retraining of SpeechBrain models will be necessary. |
|
Still getting a handle on this PR, but one quick question from the start: many of the changes seem to be related to removing |
|
Hey @pplantinga, the I think you should not pay attention to this. The PR is no longer in sync with the latest commits in |
|
@mravanelli @Adel-Moumen @pplantinga I have an idea to simplify this PR. What if we take only the 'HuggingFaceTransformer(nn.Module):' class and make it an abstract class? The idea is to force the user to override a few mandatory functions, like the forward and the freeze-params functions. This would be MUCH simpler to grasp. There is no way we release SB v1.0 with huggingface_wav2vec.py in the lobes while it's not even wav2vec2 anymore ... |
|
I agree on the need for a simplification. Could you show us a code snippet
so we can see what the code could look like with this proposal?
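For illustration only, the abstract-class idea proposed above might look roughly like the following sketch. It is not the code that was eventually written. The class name follows the proposal, but the constructor arguments, the `freeze_params` method name and the toy subclass are assumptions. The sketch uses only the standard library; the real class would presumably also inherit from `torch.nn.Module`.

```python
from abc import ABC, abstractmethod


class HuggingFaceTransformer(ABC):
    """Stand-in for the proposed abstract base class.

    Assumption: the real class would also inherit from torch.nn.Module;
    torch is left out so that this sketch runs on its own.
    """

    def __init__(self, source: str, freeze: bool = False):
        self.source = source  # hypothetical: HF hub name or local path
        self.frozen = False
        if freeze:
            self.freeze_params()

    @abstractmethod
    def forward(self, data):
        """Mandatory: each model defines its own forward pass."""

    @abstractmethod
    def freeze_params(self):
        """Mandatory: each model decides which parameters to freeze."""


class ToyWav2Vec2(HuggingFaceTransformer):
    """Toy subclass; the bodies are placeholders, not real model code."""

    def forward(self, data):
        return [2 * x for x in data]

    def freeze_params(self):
        self.frozen = True
```

Instantiating `HuggingFaceTransformer` directly raises a `TypeError`, which is what forces users to override both methods in a subclass.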
|
|
I'm closing this PR as there's a new one: #2116. |
@TParcollet requested a more flexible interface to use HuggingFace than only with wav2vec2.
In another conversation, @Moumeneb1 pointed me to AutoModel of the HuggingFace transformers library.
Upfront a worthwhile side note, recently released:
Should we opt for a minimum HF library support?
(edit: the hub is now at 0.7.0 in requirements.txt - the other two are optional; should they remain so?)
What has happened at the opening of this Draft PR:
cache_dir throughout (so it can be put outside of $HOME/.cache)
=> minor impact to speechbrain/pretrained/fetching.py
_check_model_source(path, save_path) // has changes to check if source is downloaded already
config_return_hidden_states(config)
model_set_spectral_augmentation(model, apply_spec_augment)
modify_state_dict_wav2vec2(path)
default_forward(model, data)
wav2vec2_forward(model, data, output_all_hiddens)
wav2vec2_pretraining_forward(model, data, mask_prob, mask_length)
HuggingFaceModel(nn.Module) to handle all interfaces with HuggingFace transformers - with init doing:
  AutoConfig (adjust if wanted)
  AutoModel (adjust if wanted)
  (e.g., default_forward; wav2vec2_forward; wav2vec2_pretraining_forward)
HuggingFaceWav2Vec2 inherits now from HuggingFaceModel and is reduced to a super().init call
HuggingFaceWav2Vec2Pretrain; different init parameterization (serves here as proof-of-concept)
Drafting status:
transformers>=4.22.2 (or: skip their integration examples)
Edit (2022-10-11).
Edit (2022-12-13).
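The forward functions in the description all take `model` as their first argument, which suggests a wrapper whose forward behaviour is pluggable. The sketch below shows that pattern under stated assumptions. The function and class names are borrowed from the description, but the bodies, the constructor signature and `toy_model` are illustrative stand-ins, and no `transformers` calls are made. How the PR actually wires these functions together is not shown in the conversation.

```python
from functools import partial


def toy_model(data):
    """Stand-in for a loaded model: returns (last layer, all layers)."""
    layer1 = [x + 1 for x in data]
    layer2 = [x * 2 for x in layer1]
    return layer2, [layer1, layer2]


def default_forward(model, data):
    """Return only the final output of the model."""
    last, _ = model(data)
    return last


def wav2vec2_forward(model, data, output_all_hiddens):
    """Return every layer's output if requested, otherwise the last one."""
    last, hiddens = model(data)
    return hiddens if output_all_hiddens else last


class HuggingFaceModel:
    """Generic wrapper; the forward behaviour is chosen at construction."""

    def __init__(self, model, forward_fn=default_forward, **forward_kwargs):
        self.model = model
        # Bind the extra keyword arguments once, so forward() stays uniform.
        self._forward = partial(forward_fn, **forward_kwargs)

    def forward(self, data):
        return self._forward(self.model, data)
```

A wav2vec2-style subclass could then reduce to a single `super().__init__(...)` call that passes `wav2vec2_forward` and its keyword arguments, which matches the description of `HuggingFaceWav2Vec2` above.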