HF interface#2116
Conversation
TParcollet
left a comment
There was a problem hiding this comment.
Thanks for the work! I will request ideas from @Adel-Moumen and @pplantinga here. I think I like this easy to use abstraction. The main remaining question here being: do we put everything in a single file called HuggingfaceTransformers.py or do we create a directory with multiple files, with each file containing our implementation of a model? Also we have to think about one thing: dropping the wav2vec2 in the naming of our implementation, because it also works with Hubert and wavlm. So maybe something like DefaultHuggingfaceTransformer? wdyt @Adel-Moumen @pplantinga and @mravanelli
|
|
||
|
|
||
| class HuggingFaceTransformer(nn.Module): | ||
| """This lobe provides AutoClass architecture loading into SpeechBrain modules. |
There was a problem hiding this comment.
Maybe slightly more precise about what it means: being able to load any model from HuggingFace transformer.
There was a problem hiding this comment.
It would be good to give information about how to use this class as well. Like it gives an ensemble of functions that the user must override by inheriting from this class to support any HuggingFace model.
| tokenizer=False, | ||
| for_pretraining=False, | ||
| freeze=False, | ||
| sampling_rate=16000, |
There was a problem hiding this comment.
Do we need the sampling_rate here?
There was a problem hiding this comment.
I put it there because whisper puts it:
There was a problem hiding this comment.
This could be solved by passing a **kwarg instead. This is to avoid having 1 billion arguments once we have many model implemented
| err_msg = f"{path} does not contain a .bin or .ckpt checkpoint !" | ||
| raise FileNotFoundError(err_msg) | ||
|
|
||
| def _modify_state_dict_partial_fn(self): |
There was a problem hiding this comment.
What is this function for?
There was a problem hiding this comment.
it for modifying the state_dict when needed. For example with wav2vec, loading sb model would require some modifications to be compatible (
)This should be overwritten by users depending on the model in use.
There was a problem hiding this comment.
Can we remove the partial_fn and add a description to explain what it is for?
| return config | ||
|
|
||
|
|
||
| def make_padding_masks(src, wav_len=None, pad_idx=0): |
There was a problem hiding this comment.
Don't we have a function in SpeechBrain to do this already? I believe we do, maybe in util, or attention or somewhere.
| return src_key_padding_mask | ||
|
|
||
|
|
||
| def make_masks(src, wav_len=None, pad_idx=0): |
There was a problem hiding this comment.
Took these two from huggingface_wav2vec.py
what is default huggingface transformer exactly? I agree that "wav2vec" is not representative for hubert and wavlm, but DefaultHuggingfaceTransformer is a bit too much? :D |
|
Well, I have no idea actually, trying to build a name out of the fact that a lot of models share the same default encoder structure, and this is why they can use the same class. |
I like the idea of having a new folder (e.g. called Why should we drop the |
I also like this idea of making a huggingface_transformers directory which hosts separate files for each HF's model. One question, huggingface_transformers should be inside or outside the transformers directory? |
|
I added a huggingface_transformers directory inside speechbrain/lobes/models which hosts the interface huggingface.py, wav2vec class huggingface_wav2vec.py, and whisper class huggingface_whisper.py. |
| self.tokenizer = AutoTokenizer.from_pretrained(source, **kwarg) | ||
|
|
||
|
|
||
| def make_padding_masks(src, wav_len=None, pad_idx=0): |
There was a problem hiding this comment.
Did you double-check that we don't have this somewhere already? I really think we do.
There was a problem hiding this comment.
I couldn't find it anywhere else rather than in huggingface_wav2vec.py
It looks the same as the make_masks method though. So I removed make_masks.
TParcollet
left a comment
There was a problem hiding this comment.
I went again throughout the code. I like the current state of it. I think it is close to be ready. I will just ask @Adel-Moumen to run recipe testing to make sure that nothing breaks. Also @mhn226 could you please modify all the tutorials? You will need to manually check unfortunately ...
|
I think we should change the name of Furthermore, @TParcollet, you said you wanted to separate |
|
Note: when merging, please select merge with git stash! |
I somehow don't have problems with the file names because huggingface.py would cease to nothing.py following your logic, unless you have another suggestion for it? |
No, it could be a special case or could be renamed to something like |
Ok I see your point. I'll modify the model files as your suggestion. I'll keep huggingface.py as it is because "base" would create undesirable confusion with the model size in my opinion. |
Yes. Good. |
|
Hi, so this PR passed the recipe tests (thanks to @Adel-Moumen for doing the tests). The remaining problem is that it doesn't seem to give the same scores when loading some pre-trained (old) models we published on dropbox. For example, this model for Italian from this recipe gave super weird WER. And it's the same if I do the test with the develop branch. Retraining the model from scratch gave the same outcome for both this PR and the develop branch. With this recipe, Italian model from this recipe, we have relatively the same scores as what we published (tested with both this PR and the develop branch). So it's good here. But wav2vec+CTC doesn't work with the french model. same problem! So I don't know how we can conclude? I personally think that if we see the same scores between this PR and the develop branch, we can say it is OK? The underlying problem doesn't come from the PR? Can you give your thoughts? @Adel-Moumen @TParcollet @mravanelli |
mhn226
left a comment
There was a problem hiding this comment.
Adapt changes to comments from @TParcollet and @Adel-Moumen
|
|
||
|
|
||
| class HuggingFaceTransformer(nn.Module): | ||
| """This lobe provides AutoClass architecture loading into SpeechBrain modules. |
| err_msg = f"{path} does not contain a .bin or .ckpt checkpoint !" | ||
| raise FileNotFoundError(err_msg) | ||
|
|
||
| def _modify_state_dict_partial_fn(self): |
| self.model.train() | ||
|
|
||
| def _from_pretrained( | ||
| self, source, config, model, save_path, cache_dir, |
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class HuggingFaceHuBERT(HuggingFaceWav2Vec2): |
| def __init__(self, hub, num_layers, layernorm=False): | ||
| super().__init__() | ||
| self.encoder = AutoModel.from_pretrained(hub, output_hidden_states=True) | ||
| super().__init__(source=hub, save_path=".") |
| self.tokenizer = AutoTokenizer.from_pretrained(source, **kwarg) | ||
|
|
||
|
|
||
| def make_padding_masks(src, wav_len=None, pad_idx=0): |
There was a problem hiding this comment.
I couldn't find it anywhere else rather than in huggingface_wav2vec.py
It looks the same as the make_masks method though. So I removed make_masks.
Hi Mirco, I also plan to implement text-based pretrained models like mbart, labse, nllb, etc. too. |
|
Note that @poonehmousavi is working on an HF interface to integrate Llama2. Maybe you guys can talk to each other to better coordinate your efforts. Note: we need to merge in unstable. We just need to update the unstable branch and the code in your PR with the latest develop such that you have have the new GPT2 recipe |
OK I see. Thanks for the clarification! |
@mhn226 , The gpt recipe is already merged, it is not that much different from other HF models that we previously have. But it might be different slightly for LLAMA2. I will add you to the LLAMA2 PR when I feel more confident about it. Let me know if I could be of any help. |
|
I merged the latest version of dev, which includes the GPT2 fine-tuning recipe implemented by @poonehmousavi. @mhn226, could you please revise it and integrate into the HF refactor? |
Hi Marco, yes I'm working on that. Thank you for the merging! |
There was a problem hiding this comment.
Hi @poonehmousavi : I adapt your recipe to this HF interface PR. Could you please take a look at the changes, especially at my comments? Please leave some comments if needed.
P/s: I see that you're using directly the generate() method from HF. I don't have any problem with that but I wonder if we want to use our decoding method in the future as @Adel-Moumen has been making wonderful effort in the scoring and searching part?
|
|
||
| # gpt model | ||
| gpt_model: !new:speechbrain.lobes.models.huggingface_gpt.HuggingFaceGPT | ||
| gpt_model: !new:speechbrain.lobes.models.huggingface_transformers.gpt.GPT |
There was a problem hiding this comment.
@poonehmousavi : this got changed because we created a dir for all HF models (huggingface_transformers), in which gpt.py is located. Long story but we omitted "huggingface" everywhere in the py file names and also the name of the model classes because we believe that the huggingface_transformers dir implicitly indicates the HF nature of these models.
| ) | ||
|
|
||
| # Load tokenizer and add special tokens | ||
| tokenizer = GPT2Tokenizer.from_pretrained( |
There was a problem hiding this comment.
@poonehmousavi : I removed these lines because the tokenizer will be automatically loaded by the GPT class.
| tokenizer = GPT2Tokenizer.from_pretrained( | ||
| hparams["gpt_hub"], pad_token=None | ||
| ) | ||
| tokenizer = hparams["gpt_model"].tokenizer |
There was a problem hiding this comment.
@poonehmousavi : here I call the tokenizer of the GPT model
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class GPT(HFTransformersInterface): |
There was a problem hiding this comment.
@poonehmousavi : This class defines the GPT model. Could you please complete the docstring? I think it lacks quite a lot of the parameters used by the class. I could do it but I think you're more familiar with the class.
| self.top_k = top_k | ||
| self.top_p = top_p | ||
| self.num_beams = num_beams | ||
| self.early_stopping = early_stopping |
There was a problem hiding this comment.
@poonehmousavi : I made eos_token_id a param because different gpt models might have different eos_token_id. This is especially true when a multilingual model is used.
| self.num_beams = num_beams | ||
| self.early_stopping = early_stopping | ||
|
|
||
| self.load_tokenizer(source=source, pad_token=None, use_fast=False) |
There was a problem hiding this comment.
@poonehmousavi : here the tokenizer is loaded. "use_fast" is set False because I see that you specifically use GPT2Tokenizer instead of GPT2TokenizerFast which is loaded by default by AutoTokenizer.
No. This is a different beam search :) |
|
Thank you @mhn226 and all the other contributors for this PR. I did some tests and everything LGTM. It significantly improves the way we fetch models from HuggingFace. |
This PR simplifies the #1596 PR following @TParcollet 's ideas.
Briefly, we create an interface class called HuggingFaceTransformer which only implements several esensial methods, for example, modeling loading using HuggingFace's Auto Classes (https://huggingface.co/docs/transformers/model_doc/auto).
The users would need to customize their own HF's class which inherits HuggingFaceTransformer. The adapted huggingface_wav2vec.py and huggingface_whisper.py would serve as examples of how to make such classes.