
Deprecated shard_checkpoint's replacement save_torch_state_dict does not save tied embeddings #35080

@casper-hansen

Description

System Info

- `transformers` version: 4.46.3
- Platform: macOS-14.4-arm64-arm-64bit
- Python version: 3.10.13
- Huggingface_hub version: 0.26.3
- Safetensors version: 0.4.2
- Accelerate version: 0.26.1
- Accelerate config:    not found
- PyTorch version (GPU?): 2.2.2 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@SunMarc @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

`shard_checkpoint` has been deprecated in favor of `save_torch_state_dict`, which is why I updated the saving mechanism in AutoAWQ to use the new method in casper-hansen/AutoAWQ#644. However, the new method does not seem to save tied embeddings correctly. This causes failures at load time in vLLM, and potentially in other places not yet identified.

```python
from transformers.modeling_utils import shard_checkpoint  # deprecated
from huggingface_hub import save_torch_state_dict  # recommended replacement
```
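The results table in this issue suggests that `save_torch_state_dict` keeps only one name per group of tensors that share storage, and that the name it keeps for tied embeddings is `lm_head.weight`. The pure-Python sketch below models that deduplication step under one assumption: within each shared group, the alphabetically first name is kept and the others are dropped, which is consistent with the table. Storage is modeled as a plain integer id rather than a real `torch.Tensor`, and `dedupe_shared` is a hypothetical helper, not a library API.

```python
from collections import defaultdict


def dedupe_shared(state_dict):
    """Keep one name per group of entries that share storage.

    `state_dict` maps tensor names to a storage id (a stand-in for the
    memory behind a real tensor). Assumption: within each shared group,
    the alphabetically first name is kept and the rest are dropped.
    """
    groups = defaultdict(list)
    for name, storage_id in state_dict.items():
        groups[storage_id].append(name)

    kept, dropped = {}, []
    for storage_id, names in groups.items():
        keep, *rest = sorted(names)
        kept[keep] = storage_id
        dropped.extend(rest)
    return kept, sorted(dropped)


# Tied embeddings: both names point at the same storage (id 0).
state = {
    "model.embed_tokens.weight": 0,
    "lm_head.weight": 0,
    "model.norm.weight": 1,
}
kept, dropped = dedupe_shared(state)
print(sorted(kept))  # ['lm_head.weight', 'model.norm.weight']
print(dropped)       # ['model.embed_tokens.weight']
```

Because `lm_head.weight` sorts before `model.embed_tokens.weight`, the embedding name is the one that disappears under this rule, which matches the `autoawq==0.2.7.post2 (save_torch_state_dict)` row.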

Overview from casper-hansen/AutoAWQ#665, which also contains the full reproduction scripts and the resulting errors ("yes" means the tensor is present in the saved model files):

| Model files | `model.embed_tokens` | `lm_head` |
|---|---|---|
| Qwen/Qwen2.5-1.5B-Instruct | yes | no |
| transformers==4.46.3 load and save | yes | no |
| autoawq==0.2.6 (`shard_checkpoint`) | yes | yes |
| autoawq==0.2.7.post2 (`save_torch_state_dict`) | no | yes |
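The presence checks in the table can be reproduced without loading any weights: a `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header that lists every tensor name. The sketch below reads only those headers across all shards in a directory. The helper names are hypothetical, and it assumes the tensors are stored under the exact names `model.embed_tokens.weight` and `lm_head.weight`.

```python
import json
import struct
from pathlib import Path


def read_safetensors_names(path):
    """Return the tensor names stored in one .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {name for name in header if name != "__metadata__"}


def saved_tensor_names(save_dir):
    """Union of tensor names across every *.safetensors shard in save_dir."""
    names = set()
    for shard in sorted(Path(save_dir).glob("*.safetensors")):
        names |= read_safetensors_names(shard)
    return names


def report_tied_keys(save_dir):
    """Map each tied-embedding name to whether any shard contains it."""
    names = saved_tensor_names(save_dir)
    return {
        key: key in names
        for key in ("model.embed_tokens.weight", "lm_head.weight")
    }
```

For example, `report_tied_keys("path/to/saved-model")` (a placeholder path) would return `{"model.embed_tokens.weight": False, "lm_head.weight": True}` for a directory matching the last row of the table.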

Expected behavior

`shard_checkpoint` appears to save tied weights under both names, and many inference engines compatible with Hugging Face transformers depend on them. Since users are being migrated to `save_torch_state_dict`, the expected behavior is that it saves tied weights in the same way.
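Until this is resolved, one possible workaround is to choose explicitly which alias of a tied tensor is dropped, instead of relying on whichever name the deduplication keeps. The sketch below removes `lm_head.weight` when it shares storage with `model.embed_tokens.weight`, so the saved files would match the layout of the original Qwen/Qwen2.5-1.5B-Instruct checkpoint in the table (embeddings only). It assumes the model's config sets `tie_word_embeddings`, so loaders re-tie the head to the embeddings. `drop_tied_aliases` is a hypothetical helper, and `storage_key` is a caller-supplied function; with PyTorch that might be something like `lambda t: t.untyped_storage().data_ptr()`.

```python
def drop_tied_aliases(
    state_dict,
    storage_key,
    pairs=(("model.embed_tokens.weight", "lm_head.weight"),),
):
    """Return a copy of state_dict without redundant tied aliases.

    For each (keep, drop) pair, `drop` is removed only if both names are
    present and storage_key() reports that they share the same storage.
    Untied weights are left untouched, and the input dict is not mutated.
    """
    result = dict(state_dict)
    for keep, drop in pairs:
        if keep in result and drop in result:
            if storage_key(result[keep]) == storage_key(result[drop]):
                del result[drop]
    return result


# Demo with plain Python objects standing in for tensors: `id` plays the
# role of the storage key, so only the shared object counts as tied.
shared = [0.0]
state = {
    "model.embed_tokens.weight": shared,
    "lm_head.weight": shared,
    "model.norm.weight": [1.0],
}
print(sorted(drop_tied_aliases(state, storage_key=id)))
# ['model.embed_tokens.weight', 'model.norm.weight']
```

The cleaned dict would then be passed to `save_torch_state_dict` in place of the original state dict. Note that this keeps only the embedding, not both names as `shard_checkpoint` did.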
