Description
System Info
- `transformers` version: 4.46.3
- Platform: macOS-14.4-arm64-arm-64bit
- Python version: 3.10.13
- Huggingface_hub version: 0.26.3
- Safetensors version: 0.4.2
- Accelerate version: 0.26.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.2 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The following `shard_checkpoint` has been deprecated in favor of `save_torch_state_dict`, which is why I updated the saving mechanism in AutoAWQ to use the new method in casper-hansen/AutoAWQ#644. However, there seems to be a problem where tied embeddings are not saved correctly, which causes failures at load time in vLLM and potentially in other places not yet identified.
```python
from transformers.modeling_utils import shard_checkpoint
from huggingface_hub import save_torch_state_dict
```
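For context, here is a minimal sketch of the migration, assuming a model loaded from the Hub (the save directory name is a placeholder):

```python
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import shard_checkpoint
from huggingface_hub import save_torch_state_dict

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
state_dict = model.state_dict()

# Old path (deprecated): shard_checkpoint only splits the state dict and
# returns the shards plus an index; the caller writes them to disk itself.
shards, index = shard_checkpoint(state_dict, max_shard_size="5GB")

# New path: save_torch_state_dict shards and writes the files directly,
# producing safetensors files by default.
save_torch_state_dict(state_dict, save_directory="qwen2.5-1.5b-saved", max_shard_size="5GB")
```

Note that `save_torch_state_dict` writes safetensors by default, and safetensors does not allow tensors that share storage, so presumably one of the two tied names is dropped during deduplication; that would match the table below.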
Overview from casper-hansen/AutoAWQ#665, where you can also see the full reproduction scripts and the resulting issues.
| Model Files | `model.embed_tokens` | `lm_head` |
|---|---|---|
| Qwen/Qwen2.5-1.5B-Instruct | yes | no |
| `transformers==4.46.3` load and save | yes | no |
| `autoawq==0.2.6` (`shard_checkpoint`) | yes | yes |
| `autoawq==0.2.7.post2` (`save_torch_state_dict`) | no | yes |
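One way to verify the rows above is to list the tensor names actually present in the saved shards. A minimal sketch, assuming safetensors output and a local checkpoint directory (the path is a placeholder):

```python
import os
from safetensors import safe_open

def checkpoint_keys(save_dir: str) -> set:
    """Collect all tensor names across the .safetensors shards in a directory."""
    keys = set()
    for fname in os.listdir(save_dir):
        if fname.endswith(".safetensors"):
            with safe_open(os.path.join(save_dir, fname), framework="pt") as f:
                keys.update(f.keys())
    return keys

keys = checkpoint_keys("qwen2.5-1.5b-awq")  # placeholder path
print("model.embed_tokens.weight:", "model.embed_tokens.weight" in keys)
print("lm_head.weight:", "lm_head.weight" in keys)
```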
Expected behavior
As the table above shows, `shard_checkpoint` saved the tied weights, which are important to a lot of engines compatible with Hugging Face `transformers`. The expected behavior is therefore that `save_torch_state_dict` also saves them, since we are being migrated to this new method.
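Until this is fixed, a possible workaround on the AutoAWQ side is to break the tie in the state dict before saving, so that both names are materialized on disk. A minimal sketch, not the actual AutoAWQ code, assuming `model` is the quantized model about to be saved:

```python
from huggingface_hub import save_torch_state_dict

state_dict = model.state_dict()

# In tied models, model.embed_tokens.weight and lm_head.weight share storage,
# so the safetensors-based save keeps only one of the two names. Cloning one
# tensor breaks the tie and forces both keys to be written.
embed = state_dict.get("model.embed_tokens.weight")
head = state_dict.get("lm_head.weight")
if embed is not None and head is not None and embed.data_ptr() == head.data_ptr():
    state_dict["model.embed_tokens.weight"] = embed.clone()

save_torch_state_dict(state_dict, save_directory="qwen2.5-1.5b-awq")  # placeholder path
```

This doubles the disk footprint of the embedding matrix, so a real fix in the save path (or in the loaders) would still be preferable.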