
Special token ids are no longer typed properly in 4.47.0 #35126

@chanind

Description

System Info

transformers 4.47.0, Python 3.11

Who can help?

@ArthurZucker (I think)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In tokenization_utils_base, the bos_token_id property of SpecialTokensMixin, along with the other special token ID properties, is now inferred as list[str | list[str] | Unknown] | Unknown, whereas previously it was correctly typed as int | None. This is a regression introduced by the following pull request, in which the type annotations for all special tokens were (likely unintentionally) deleted: https://github.com/huggingface/transformers/pull/34461/files#diff-85b29486a884f445b1014a26fecfb189141f2e6b09f4ae701ee758a754fddcc1

Expected behavior

Special token ids on tokenizers should keep returning type int | None.

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests