What happened?
The following code failed with the error "FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG":
# Extract text and image metadata from the PDF document
text_metadata_df, image_metadata_df = get_document_metadata(
    multimodal_model,  # we are passing gemini 1.0 pro vision model
    pdf_folder_path,
    image_save_dir="images",
    image_description_prompt=image_description_prompt,
    embedding_size=1408,
)
print("\n\n --- Completed processing. ---")
:
Processing page: 1
Processing page: 2
Processing page: 3
Processing page: 4
:
FzErrorArgument                           Traceback (most recent call last)
<ipython-input-8-96bfa690e8cb> in <cell line: 14>()
     12
     13 # Extract text and image metadata from the PDF document
---> 14 text_metadata_df, image_metadata_df = get_document_metadata(
     15     multimodal_model,  # we are passing gemini 1.0 pro vision model
     16     pdf_folder_path,
4 frames
~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47578     Write a pixmap as a JPEG.
  47579     """
> 47580     return _mupdf.fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47581
  47582 def fz_write_pixmap_as_jpx(out, pix, quality):
FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG
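The message suggests that at least one embedded image on the failing page has a pixmap that the JPEG writer rejects. To narrow down which image that is, something like the following might help. It is only a sketch: `jpeg_compatible` and `find_incompatible_images` are hypothetical helper names (not part of the notebook or of PyMuPDF), and treating an alpha channel as incompatible is my assumption, based on JPEG having no transparency support.

```python
from typing import List, Optional, Tuple


def jpeg_compatible(color_components: Optional[int], has_alpha: bool) -> bool:
    """Return True if a pixmap with this layout looks safe to save as JPEG.

    Assumption: only Grayscale (1), RGB (3), or CMYK (4) color components
    are accepted, and an alpha channel is not.
    """
    return color_components in (1, 3, 4) and not has_alpha


def find_incompatible_images(
    pdf_path: str,
) -> List[Tuple[int, int, Optional[int], bool]]:
    """List (page number, xref, color components, has alpha) for suspect images."""
    import fitz  # PyMuPDF, imported lazily so the check above works without it

    problems = []
    with fitz.open(pdf_path) as doc:
        for page_index in range(len(doc)):
            for image in doc[page_index].get_images(full=True):
                xref = image[0]
                pix = fitz.Pixmap(doc, xref)
                # A pixmap without a colorspace (for example a stencil mask)
                # is reported with None components.
                components = pix.colorspace.n if pix.colorspace else None
                has_alpha = bool(pix.alpha)
                if not jpeg_compatible(components, has_alpha):
                    problems.append((page_index + 1, xref, components, has_alpha))
    return problems
```

Calling `find_incompatible_images()` with the path of the PDF being processed should list the images on the page where processing stopped.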
Relevant log output
I think the get_image_for_gemini() function in
gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py should be modified as follows:
import os
from typing import Tuple

import fitz

# Image.load_from_file() belongs to the Vertex AI SDK Image class, not to PIL
from vertexai.generative_models import Image


def get_image_for_gemini(
    doc: fitz.Document,
    image: tuple,
    image_no: int,
    image_save_dir: str,
    file_name: str,
    page_num: int,
) -> Tuple[Image, str]:
    """
    Extracts an image from a PDF document, converts it to JPEG format, saves it to a specified directory,
    and loads it as a Gemini Image object.

    Parameters:
    - doc (fitz.Document): The PDF document from which the image is extracted.
    - image (tuple): A tuple containing image information.
    - image_no (int): The image number for naming purposes.
    - image_save_dir (str): The directory where the image will be saved.
    - file_name (str): The base name for the image file.
    - page_num (int): The page number from which the image is extracted.

    Returns:
    - Tuple[Image, str]: A tuple containing the Gemini Image object and the image filename.
    """
    # Extract the image from the document
    xref = image[0]
    pix = fitz.Pixmap(doc, xref)

    # Convert the image to JPEG format (the returned bytes are discarded)
    pix.tobytes("jpeg")

    # Create the image file name
    image_name = f"{image_save_dir}/{file_name}_image_{page_num}_{image_no}_{xref}.jpeg"

    # Create the image save directory if it doesn't exist
    os.makedirs(image_save_dir, exist_ok=True)

    # Save the image to the specified location
    pix.save(image_name)

    # Load the saved image as a Gemini Image Object
    image_for_gemini = Image.load_from_file(image_name)
    return image_for_gemini, image_name
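Since the error complains about the pixmap's colorspace, one possible addition is to normalize the pixmap before it is encoded or saved. The sketch below is untested against the failing PDF. `to_jpeg_ready` is a hypothetical helper name, and returning None for pixmaps without a colorspace (so the caller can skip them) is my assumption about how such images should be handled.

```python
JPEG_COLOR_COMPONENTS = (1, 3, 4)  # Grayscale, RGB, CMYK


def to_jpeg_ready(pix):
    """Return a pixmap that should be savable as JPEG, or None if it cannot be converted.

    `pix` is expected to be a fitz.Pixmap (PyMuPDF).
    """
    colorspace = pix.colorspace
    if colorspace is None:
        # No colorspace (for example a stencil mask): nothing to convert from,
        # so signal the caller to skip this image.
        return None

    needs_rgb = colorspace.n not in JPEG_COLOR_COMPONENTS
    if not needs_rgb and not pix.alpha:
        # Already Grayscale, RGB, or CMYK without alpha: use as-is.
        return pix

    import fitz  # PyMuPDF, only needed when a conversion is required

    if needs_rgb:
        # Copy the pixmap while converting it to the RGB colorspace.
        pix = fitz.Pixmap(fitz.csRGB, pix)
    if pix.alpha:
        # Copy the pixmap while dropping its alpha channel.
        pix = fitz.Pixmap(pix, 0)
    return pix
```

In get_image_for_gemini(), this could be called right after `pix = fitz.Pixmap(doc, xref)`, with the image skipped when the helper returns None.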