Description
System Info
- `transformers` version: 4.46.3
- Platform: macOS-14.4-arm64-arm-64bit
- Python version: 3.10.13
- Huggingface_hub version: 0.26.3
- Safetensors version: 0.4.2
- Accelerate version: 0.26.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.2 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The following `shard_checkpoint` has been deprecated in favor of `save_torch_state_dict`, which is why I updated the saving mechanism in AutoAWQ to use the new method in casper-hansen/AutoAWQ#644. However, there seems to be a problem where tied embeddings are not saved correctly, which causes failures at load time in vLLM and potentially in other places not yet identified.
```python
from transformers.modeling_utils import shard_checkpoint
from huggingface_hub import save_torch_state_dict
```
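For context, here is a minimal sketch of the migration, assuming a model loaded from the Hub (the save directory name is a placeholder):

```python
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import shard_checkpoint
from huggingface_hub import save_torch_state_dict

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
state_dict = model.state_dict()

# Old path (deprecated): shard_checkpoint only splits the state dict and
# returns the shards plus an index; the caller writes them to disk itself.
shards, index = shard_checkpoint(state_dict, max_shard_size="5GB")

# New path: save_torch_state_dict shards and writes the files directly,
# producing safetensors files by default.
save_torch_state_dict(state_dict, save_directory="qwen2.5-1.5b-saved", max_shard_size="5GB")
```

Note that `save_torch_state_dict` writes safetensors by default, and safetensors does not allow tensors that share storage, so presumably one of the two tied names is dropped during deduplication; that would match the table below.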
Overview from casper-hansen/AutoAWQ#665, where you can also see the full reproduction scripts and the resulting issues.
| Model Files | `model.embed_tokens` | `lm_head` |
|---|---|---|
| Qwen/Qwen2.5-1.5B-Instruct | yes | no |
| `transformers==4.46.3` load and save | yes | no |
| `autoawq==0.2.6` (`shard_checkpoint`) | yes | yes |
| `autoawq==0.2.7.post2` (`save_torch_state_dict`) | no | yes |
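One way to verify the rows above is to list the tensor names actually present in the saved shards. A minimal sketch, assuming safetensors output and a local checkpoint directory (the path is a placeholder):

```python
import os
from safetensors import safe_open

def checkpoint_keys(save_dir: str) -> set:
    """Collect all tensor names across the .safetensors shards in a directory."""
    keys = set()
    for fname in os.listdir(save_dir):
        if fname.endswith(".safetensors"):
            with safe_open(os.path.join(save_dir, fname), framework="pt") as f:
                keys.update(f.keys())
    return keys

keys = checkpoint_keys("qwen2.5-1.5b-awq")  # placeholder path
print("model.embed_tokens.weight:", "model.embed_tokens.weight" in keys)
print("lm_head.weight:", "lm_head.weight" in keys)
```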
Expected behavior
As the table above shows, `shard_checkpoint` saved the tied weights, which are important to a lot of engines compatible with Hugging Face `transformers`. The expected behavior is therefore that `save_torch_state_dict` also saves them, since we are being migrated to this new method.
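Until this is fixed, a possible workaround on the AutoAWQ side is to break the tie in the state dict before saving, so that both names are materialized on disk. A minimal sketch, not the actual AutoAWQ code, assuming `model` is the quantized model about to be saved:

```python
from huggingface_hub import save_torch_state_dict

state_dict = model.state_dict()

# In tied models, model.embed_tokens.weight and lm_head.weight share storage,
# so the safetensors-based save keeps only one of the two names. Cloning one
# tensor breaks the tie and forces both keys to be written.
embed = state_dict.get("model.embed_tokens.weight")
head = state_dict.get("lm_head.weight")
if embed is not None and head is not None and embed.data_ptr() == head.data_ptr():
    state_dict["model.embed_tokens.weight"] = embed.clone()

save_torch_state_dict(state_dict, save_directory="qwen2.5-1.5b-awq")  # placeholder path
```

This doubles the disk footprint of the embedding matrix, so a real fix in the save path (or in the loaders) would still be preferable.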