Description
System Info
conda create -yn duo python=3.10
conda activate duo
conda install -y git
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit
conda install -y nvidia::cuda-cudart-dev
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
pip install transformers==4.45.2 accelerate sentencepiece datasets wandb zstandard matplotlib huggingface_hub==0.25.2
pip install tensor_parallel==2.0.0
pip install ninja packaging
pip install flash-attn==2.6.3 --no-build-isolation
LongBench evaluation
pip install seaborn rouge_score einops pandas
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
We encountered a shape mismatch error while trying to reproduce Duo Attention. We tested transformers versions 4.37 to 4.47. Across those versions the error changed from "RuntimeError: Boolean value of Tensor with more than one value is ambiguous" to "RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408". Changing the transformers version alone did not resolve the issue.
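As a sanity check on the numbers in the second error, here is a minimal sketch of the arithmetic. It assumes the leading dimensions of the requested shape (batch 1, sequence length 3098) are correct and that the last dimension (128) is the head dimension. The variable names are ours and do not come from the Duo Attention or transformers code.

```python
import math

# Values copied from the error message.
target_shape = (1, 3098, 6, 5, 128)
input_size = 12689408

# Number of elements the requested view would need.
expected_size = math.prod(target_shape)

# Assumption: batch and sequence length are right, so divide them out.
tokens = target_shape[0] * target_shape[1]
values_per_token = input_size // tokens
remainder = input_size % tokens

# Assumption: the last dimension (128) is the per-head dimension.
head_dim = target_shape[-1]
heads_in_tensor = values_per_token // head_dim
heads_in_view = target_shape[2] * target_shape[3]

print("elements requested by view:", expected_size)
print("elements in tensor:        ", input_size)
print("values per token in tensor:", values_per_token, "(remainder", remainder, ")")
print("heads in tensor:           ", heads_in_tensor)
print("heads expected by view:    ", heads_in_view)
```

Under those assumptions the tensor holds 4096 values per token (32 heads of size 128), while the view asks for 6 x 5 = 30 heads. This suggests the head grouping passed to the reshape does not match the tensor being reshaped, rather than the data itself being malformed.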
We also tried different models with the following commands:
huggingface-cli download togethercomputer/Llama-2-7B-32K-Instruct --local-dir Llama-2-7B-32K-Instruct
huggingface-cli download gradientai/Llama-3-8B-Instruct-Gradient-1048k --local-dir Llama-3-8B-Instruct-Gradient-1048k
huggingface-cli download gradientai/Llama-3-8B-Instruct-Gradient-4194k --local-dir Llama-3-8B-Instruct-Gradient-4194k
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir Mistral-7B-Instruct-v0.2
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3
However, none of these models worked. A previous issue suggested that updating the transformers version could solve the problem, but we still get shape mismatch errors.
Could there be other packages that need to be updated as well?
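To make that question easier to answer, the following sketch prints the versions actually installed in the environment. The package list is taken from the install commands above. `installed_versions` is a hypothetical helper of ours, and the distribution names are assumed to match the names used with pip, so any package registered under a different name will show as not installed.

```python
from importlib import metadata

# Distribution names assumed to match the pip install commands above.
PACKAGES = [
    "torch",
    "transformers",
    "accelerate",
    "huggingface_hub",
    "tensor_parallel",
    "flash-attn",
    "flashinfer",
]


def installed_versions(names):
    """Return {name: version string, or None if the package is not found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```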
Expected behavior
A fix or workaround for the error "RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408".