Closed
Description
System Info
transformers 4.47.0, Python 3.11
Who can help?
@ArthurZucker (I think)
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
In tokenization_utils_base, SpecialTokensMixin now exposes bos_token_id and the other special token IDs with the type list[str | list[str] | Unknown] | Unknown, whereas previously they were correctly typed as int | None. This is a regression introduced by the following PR, which (likely unintentionally) deleted the type annotations for all special tokens: https://github.com/huggingface/transformers/pull/34461/files#diff-85b29486a884f445b1014a26fecfb189141f2e6b09f4ae701ee758a754fddcc1
Expected behavior
Special token IDs on tokenizers should keep their type of int | None.
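For illustration, here is a minimal sketch of the expected typing. It assumes the special token IDs are exposed as properties with explicit return annotations. SpecialTokensSketch and _token_to_id are hypothetical stand-ins, not the transformers implementation. Optional[int] is used because it is equivalent to int | None.

```python
import typing
from typing import Dict, Optional


class SpecialTokensSketch:
    """Hypothetical, simplified stand-in for SpecialTokensMixin.

    It only illustrates explicit return annotations on the special token ID
    properties; it is NOT the transformers implementation.
    """

    def __init__(
        self,
        vocab: Dict[str, int],
        bos_token: Optional[str] = None,
        eos_token: Optional[str] = None,
    ) -> None:
        self._vocab = vocab
        self._bos_token = bos_token
        self._eos_token = eos_token

    def _token_to_id(self, token: Optional[str]) -> Optional[int]:
        # Hypothetical helper: an unset special token (or one missing from
        # the vocab) maps to None, otherwise to its integer ID.
        if token is None:
            return None
        return self._vocab.get(token)

    @property
    def bos_token_id(self) -> Optional[int]:  # i.e. int | None
        return self._token_to_id(self._bos_token)

    @property
    def eos_token_id(self) -> Optional[int]:  # i.e. int | None
        return self._token_to_id(self._eos_token)


tok = SpecialTokensSketch({"<s>": 0, "</s>": 2}, bos_token="<s>")
print(tok.bos_token_id)  # 0
print(tok.eos_token_id)  # None (no eos_token was set)

# The return annotation is visible to static type checkers and at runtime.
print(typing.get_type_hints(SpecialTokensSketch.bos_token_id.fget))
```

With annotations like these, a static type checker should infer int | None for tok.bos_token_id rather than a broad list/Unknown union.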