System Info
python version: 3.11.10
transformers version: 4.46.0
torch version: 2.4.0+cu118
Who can help?
No response
Information
- The official example scripts
- My own modified scripts

Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "my_workspace/llama3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")

# Manually split the 32 decoder layers across two GPUs:
# layers 0-19 on GPU 0, layers 20-31 on GPU 1.
device_map = {}
device_map['model.embed_tokens'] = 0
for layer_idx in range(20):
    device_map[f'model.layers.{layer_idx}'] = 0
for layer_idx in range(20, 32):
    device_map[f'model.layers.{layer_idx}'] = 1
device_map['lm_head.weight'] = 1
device_map['model.norm.weight'] = 1

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map=device_map
)
print(model(**tokenizer("111", return_tensors="pt").to(0)).logits.shape)
```
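Not part of the original report, but as a possible workaround sketch (an assumption on my part, untested against this setup): since device maps produced by accelerate key entries by module name rather than by parameter name, the same 20/12 split can be written with module-name keys (`lm_head`, `model.norm`) instead of parameter-name keys (`lm_head.weight`, `model.norm.weight`):

```python
# Hypothetical variant of the device_map above, keyed by module names.
# Same placement: embeddings and layers 0-19 on GPU 0, the rest on GPU 1.
device_map = {
    "model.embed_tokens": 0,
    "model.norm": 1,   # module name, not "model.norm.weight"
    "lm_head": 1,      # module name, not "lm_head.weight"
}
for layer_idx in range(32):
    device_map[f"model.layers.{layer_idx}"] = 0 if layer_idx < 20 else 1
```

Whether this avoids the device mismatch in 4.46.0 would need to be verified; it only changes how the entries are keyed, not where the weights land.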
With `transformers==4.46.0`, the code above fails with:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, CPU and CUDA:0! (when checking argument for mat2 in method wrapper_CUDA_bmm)
```
Expected behavior

This issue does not occur with lower versions of transformers. I tried `transformers==4.40.0`, and the code successfully outputs:

```
torch.Size([1, 1, 128256])
```