LlavaForConditionalGeneration._merge_input_ids_with_image_features throws error #35169

Open
4 tasks done
NicolasDrapier opened this issue Dec 9, 2024 · 1 comment · May be fixed by #34502
Labels: bug, Multimodal, WIP

NicolasDrapier commented Dec 9, 2024

System Info

  • transformers version: 4.43.1
  • Platform: Linux-6.8.5-1-default-x86_64-with-glibc2.39
  • Python version: 3.11.9
  • Huggingface_hub version: 0.23.5
  • Safetensors version: 0.4.3
  • Accelerate version: 0.29.3
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: bf16
    - use_cpu: False
    - debug: False
    - num_processes: 8
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: all
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'INDUCTOR'}
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: True
  • Using GPU in script?: True
  • GPU type: NVIDIA L40S

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Description

I am trying to use the AutoAWQ library to quantize a Pixtral model (mistral-community/Pixtral-Large-Instruct-2411). However, I am encountering the following error:

File "/quantization/quant/lib64/python3.11/site-packages/transformers/models/llava/modeling_llava.py", line 303, in _merge_input_ids_with_image_features
    num_images, num_image_patches, embed_dim = image_features.shape
                                               ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'shape'

Code

Here is the code I am using:

import os
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = r'/data/models/mistral/pixtral-large-instruct-2411' # from https://huggingface.co/mistral-community/Pixtral-Large-Instruct-2411
quant_path = r'/data/models/mistral/pixtral-large-instruct-2411-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
os.makedirs(quant_path, exist_ok=True)

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
print(f'Model is quantized and saved at "{quant_path}"')

Analysis

The checkpoint I am using is Pixtral-Large-Instruct-2411, but its configuration declares the LlavaForConditionalGeneration architecture, so the Llava modeling code is what runs. The issue arises in the Transformers source code: image_features stays None whenever pixel_values is None. The legacy processing path then passes it to _merge_input_ids_with_image_features, whose first line, num_images, num_image_patches, embed_dim = image_features.shape, accesses the shape attribute of None and raises the AttributeError. The relevant excerpt from modeling_llava.py:

image_features = None
if pixel_values is not None:
    image_features = self.get_image_features(
        pixel_values=pixel_values,
        vision_feature_layer=vision_feature_layer,
        vision_feature_select_strategy=vision_feature_select_strategy,
    )

if legacy_processing:
    logger.warning_once(
        "Expanding inputs for image tokens in LLaVa should be done in processing. "
        "Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly "
        "with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. "
        "Using processors without these attributes in the config is deprecated and will throw an error in v4.50."
    )
    # prefill stage vs decoding stage (legacy behavior copied)
    if input_ids.shape[1] != 1:
        inputs_embeds, attention_mask, labels, position_ids = self._merge_input_ids_with_image_features(
            image_features, inputs_embeds, input_ids, attention_mask, labels # <-- image_features is still None here
        )
        cache_position = torch.arange(attention_mask.shape[1], device=attention_mask.device)

Steps to Reproduce

  1. Ensure the Pixtral-Large-Instruct-2411 model is available at the specified path.
  2. Run the provided code snippet.
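
Before running the steps above, it can help to confirm which architecture the checkpoint declares. The sketch below reads the `architectures` field from the checkpoint's `config.json` using only the standard library. The helper names `declared_architectures` and `uses_llava` are hypothetical, and the sketch assumes the checkpoint directory contains a standard Hugging Face `config.json`.

```python
import json
from pathlib import Path
from typing import List


def declared_architectures(model_dir: str) -> List[str]:
    """Return the `architectures` list from a checkpoint's config.json."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # A missing key is treated as "no architecture declared".
    return list(config.get("architectures", []))


def uses_llava(model_dir: str) -> bool:
    """True if the checkpoint declares LlavaForConditionalGeneration."""
    return "LlavaForConditionalGeneration" in declared_architectures(model_dir)
```

For example, `uses_llava(model_path)` with the `model_path` from the reproduction script would indicate whether the Llava code path from the traceback is the one that will be exercised.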

Actual Behavior

An AttributeError is raised due to image_features being None.

Expected behavior

The model should be loaded, quantized, and saved without any errors.

zucchini-nlp (Member) commented

@NicolasDrapier Indeed, Llava currently cannot work with text-only inputs and always expects an image alongside the text, which is why it breaks during quantization. The issue is known and should be fixed by #34502; we should no longer support _merge_input_ids_with_image_features.
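
Until that fix lands, a caller could check its own batches before they reach the model. The following is a rough sketch under stated assumptions: `require_pixel_values` is a hypothetical helper, not part of Transformers or AutoAWQ, and it assumes batches are dict-like with the `pixel_values` key named in the excerpt above. It only turns the late AttributeError into an earlier, clearer error; it does not make text-only quantization work.

```python
from typing import Any, Mapping


def require_pixel_values(batch: Mapping[str, Any]) -> None:
    """Raise early if a batch carries no image input (hypothetical helper)."""
    # A missing key and an explicit None are treated the same way,
    # since both leave image_features as None inside the model.
    if batch.get("pixel_values") is None:
        raise ValueError(
            "Text-only batch: this Llava code path expects `pixel_values` "
            "alongside `input_ids`."
        )
```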

zucchini-nlp added the Multimodal and WIP labels on Dec 9, 2024
zucchini-nlp self-assigned this on Dec 9, 2024
zucchini-nlp linked a pull request on Dec 10, 2024 that will close this issue