[Bug]: FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG #676

Open
@myoshimu

Description

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

What happened?

The following code failed with the error "FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG":

# Extract text and image metadata from the PDF document
text_metadata_df, image_metadata_df = get_document_metadata(
    multimodal_model,  # we are passing gemini 1.0 pro vision model
    pdf_folder_path,
    image_save_dir="images",
    image_description_prompt=image_description_prompt,
    embedding_size=1408,
)

print("\n\n --- Completed processing. ---")
:



Processing page: 1
Processing page: 2
Processing page: 3
Processing page: 4

:

FzErrorArgument                           Traceback (most recent call last)
<ipython-input-8-96bfa690e8cb> in <cell line: 14>()
     12 
     13 # Extract text and image metadata from the PDF document
---> 14 text_metadata_df, image_metadata_df = get_document_metadata(
     15     multimodal_model,  # we are passing gemini 1.0 pro vision model
     16     pdf_folder_path,

4 frames
~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47578         Write a pixmap as a JPEG.
  47579     """
> 47580     return _mupdf.fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47581 
  47582 def fz_write_pixmap_as_jpx(out, pix, quality):

FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG

Relevant log output

I think the get_image_for_gemini() function in
gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py should be modified as follows:

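import os
from typing import Tuple

import fitz
# Image.load_from_file() below is the Vertex AI SDK API, not PIL's
from vertexai.generative_models import Image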


def get_image_for_gemini(
    doc: fitz.Document,
    image: tuple,
    image_no: int,
    image_save_dir: str,
    file_name: str,
    page_num: int,
) -> Tuple[Image, str]:
    """
    Extracts an image from a PDF document, converts it to JPEG format, saves it to a specified directory,
    and loads it as a Gemini Image object.

    Parameters:
    - doc (fitz.Document): The PDF document from which the image is extracted.
    - image (tuple): A tuple containing image information.
    - image_no (int): The image number for naming purposes.
    - image_save_dir (str): The directory where the image will be saved.
    - file_name (str): The base name for the image file.
    - page_num (int): The page number from which the image is extracted.

    Returns:
    - Tuple[Image, str]: A tuple containing the Gemini Image object and the image filename.
    """

    # Extract the image from the document
    xref = image[0]
    pix = fitz.Pixmap(doc, xref)

    # Convert the image to JPEG format
    pix.tobytes("jpeg")

    # Create the image file name
    image_name = f"{image_save_dir}/{file_name}_image_{page_num}_{image_no}_{xref}.jpeg"

    # Create the image save directory if it doesn't exist
    os.makedirs(image_save_dir, exist_ok=True)

    # Save the image to the specified location
    pix.save(image_name)

    # Load the saved image as a Gemini Image Object
    image_for_gemini = Image.load_from_file(image_name)

    return image_for_gemini, image_name
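
The error message says JPEG output accepts only Grayscale, RGB, or CMYK pixmaps, so one possible workaround is to convert any other pixmap to RGB before calling pix.tobytes("jpeg") or pix.save(). The following is a minimal sketch under stated assumptions, not the notebook's actual code: the helper names needs_rgb_conversion and to_jpeg_compatible are hypothetical, and it assumes PyMuPDF's Colorspace.name reports "DeviceGray", "DeviceRGB", or "DeviceCMYK" for the supported cases. Any other name is converted to RGB, which may be unnecessary for some colorspaces but should be harmless.

```python
# Colorspace names assumed to be accepted by the JPEG writer, per the error message.
JPEG_SAFE_COLORSPACES = {"DeviceGray", "DeviceRGB", "DeviceCMYK"}


def needs_rgb_conversion(pix) -> bool:
    """Return True if the pixmap's colorspace is not Grayscale, RGB, or CMYK."""
    colorspace = pix.colorspace
    if colorspace is None:
        # No colorspace at all (for example an image mask).
        return True
    return colorspace.name not in JPEG_SAFE_COLORSPACES


def to_jpeg_compatible(pix):
    """Return pix unchanged if it is JPEG-safe, otherwise an RGB copy of it."""
    if not needs_rgb_conversion(pix):
        return pix

    import fitz  # PyMuPDF, imported lazily so the check above has no dependency

    # Pixmap(colorspace, source) copies the source into the target colorspace.
    # Assumption: PyMuPDF may still reject a source that has no colorspace,
    # so that case is not guaranteed to work in this sketch.
    return fitz.Pixmap(fitz.csRGB, pix)
```

In get_image_for_gemini(), this would be called right after the pixmap is created, for example pix = to_jpeg_compatible(fitz.Pixmap(doc, xref)), so that the later tobytes("jpeg") and save() calls receive a supported colorspace.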

Code of Conduct

  • I agree to follow this project's Code of Conduct
