System Info
- transformers version: 4.47.0
- Platform: Linux-5.10.0-33-cloud-amd64-x86_64-with-glibc2.31
- Python version: 3.9.1
- Huggingface_hub version: 0.26.3
- Safetensors version: 0.4.3
- Accelerate version: 0.30.1
- Accelerate config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: NO
    - mixed_precision: fp16
    - use_cpu: False
    - debug: False
    - num_processes: 1
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: 0
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
- Using GPU in script?: yes
- GPU type: Tesla T4
Who can help?
@ArthurZucker @molbap we chatted about the last paligemma release :)
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Here is a script that shows the problem:
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor
from PIL import Image
import numpy as np

hf_token = "..."
processor = PaliGemmaProcessor.from_pretrained(
    "google/paligemma2-3b-pt-224", token=hf_token
)

suffix = ["4"]
image = [Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))]

# Case 1: prompt without an explicit <image> token (the processor warns about this).
text = ["How many shapes are green?"]
print(
    processor(
        images=image, text=text, suffix=suffix, return_tensors="pt", padding="longest"
    ).labels
)

# Case 2: prompt with an explicit <image> token; these labels come back without EOS.
text = ["<image>How many shapes are green?"]
print(
    processor(
        images=image, text=text, suffix=suffix, return_tensors="pt", padding="longest"
    ).labels
)
Expected behavior
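As you can see, the second call (the one with an explicit <image> token in the prompt) returns labels that are missing the EOS token, which leads to bad finetunes! I would expect both calls to return labels that end with EOS. Yet the processor class warns me when the <image> token isn't present, so adding it looks like the recommended usage.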
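To make the difference easier to spot than eyeballing printed tensors, here is a minimal sketch of a check for labels that do not end in EOS. The helper name `rows_missing_eos` is hypothetical (it is not part of transformers), the sketch assumes masked label positions use -100, and the token IDs in the demo are illustrative placeholders rather than real processor output.

```python
from typing import List, Sequence

# Assumption: positions excluded from the loss are marked with -100.
IGNORE_INDEX = -100


def rows_missing_eos(
    labels: Sequence[Sequence[int]],
    eos_token_id: int,
    ignore_index: int = IGNORE_INDEX,
) -> List[int]:
    """Return the indices of rows whose last supervised label is not EOS.

    Hypothetical helper for debugging; not a transformers API.
    """
    missing = []
    for row_idx, row in enumerate(labels):
        # Keep only the positions that actually contribute to the loss.
        supervised = [tok for tok in row if tok != ignore_index]
        if not supervised or supervised[-1] != eos_token_id:
            missing.append(row_idx)
    return missing


# Illustrative IDs only: 7 stands in for the suffix token, 1 for EOS.
with_eos = [[-100, -100, 7, 1]]
without_eos = [[-100, -100, 7]]

print(rows_missing_eos(with_eos, eos_token_id=1))     # []
print(rows_missing_eos(without_eos, eos_token_id=1))  # [0]
```

With the reproduction script, the same check could be applied as `rows_missing_eos(batch.labels.tolist(), processor.tokenizer.eos_token_id)`, where `batch` is the object returned by each `processor(...)` call. A non-empty result for the `<image>`-prefixed prompt would confirm the behavior described here.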