RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408 #35146

@LuchenZhou

System Info

conda create -yn duo python=3.10
conda activate duo

conda install -y git
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit
conda install -y nvidia::cuda-cudart-dev
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

pip install transformers==4.45.2 accelerate sentencepiece datasets wandb zstandard matplotlib huggingface_hub==0.25.2
pip install tensor_parallel==2.0.0

pip install ninja packaging
pip install flash-attn==2.6.3 --no-build-isolation

LongBench evaluation

pip install seaborn rouge_score einops pandas

pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

We hit a shape mismatch error while trying to reproduce DuoAttention. We tested transformers versions 4.37 through 4.47. Across those versions the error changed from "RuntimeError: Boolean value of Tensor with more than one value is ambiguous" to "RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408". Changing the transformers version did not resolve it.
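As a sanity check on the numbers in the second error message, the arithmetic below compares the element count the reshape asks for with the element count the tensor actually has. It is only a sketch: it assumes the second dimension (3098) is the sequence length and the last dimension (128) is the head dimension, which the message itself does not confirm.

```python
from math import prod

# Numbers copied from the error message in this issue.
requested_shape = (1, 3098, 6, 5, 128)
input_numel = 12689408

requested_numel = prod(requested_shape)

# Assumption: dim 1 is the sequence length, the last dim is the head dimension.
seq_len = requested_shape[1]
head_dim = requested_shape[-1]

# Per-token width the reshape expects vs. what the tensor actually holds.
expected_per_token = requested_numel // seq_len
actual_per_token, remainder = divmod(input_numel, seq_len)

print("requested elements:", requested_numel)
print("expected per token:", expected_per_token, "=", expected_per_token // head_dim, "x", head_dim)
print("actual per token:  ", actual_per_token, "=", actual_per_token // head_dim, "x", head_dim)
print("remainder:", remainder)
```

Under those assumptions the tensor holds 4096 values per token (32 slices of 128), while the reshape expects 6 x 5 x 128 = 3840 (30 slices of 128). That points to a disagreement between the head counts used in the reshape and the width of the projected tensor, though the message alone does not show which side is wrong.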

We also tried different models with the following commands:

huggingface-cli download togethercomputer/Llama-2-7B-32K-Instruct --local-dir Llama-2-7B-32K-Instruct
huggingface-cli download gradientai/Llama-3-8B-Instruct-Gradient-1048k --local-dir Llama-3-8B-Instruct-Gradient-1048k
huggingface-cli download gradientai/Llama-3-8B-Instruct-Gradient-4194k --local-dir Llama-3-8B-Instruct-Gradient-4194k
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir Mistral-7B-Instruct-v0.2
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3

However, none of these models worked. A previous issue suggested that updating the transformers version could solve the problem, but we still get shape mismatch errors.

Could there be other packages that need to be updated as well?
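To make that question easier to answer, the versions actually installed in the environment can be listed with the standard-library importlib.metadata module. This is a minimal sketch; the distribution names in PACKAGES are assumptions taken from the install commands above and may differ in a given environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names, taken from the pip/conda commands in this issue.
PACKAGES = [
    "torch",
    "transformers",
    "accelerate",
    "flash-attn",
    "flashinfer",
    "tensor_parallel",
    "huggingface_hub",
]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:16s} {ver or 'not installed'}")
```

Pasting this output alongside the traceback shows which combination of packages produced each error.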

Expected behavior

A fix for the error "RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408".
