
[Bug]: Missing files for example multimodal_rag_langchain.ipynb #742

@caetano-colin

Description

File Name

gemini/use-cases/retrieval-augmented-generation/multimodal_rag_langchain.ipynb

What happened?

I was running the example and downloaded the files from the bucket using:

```
!gsutil -m rsync -r gs://github-repo/rag/intro_multimodal_rag/ .
```

I then ran the code below:

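```python
import sys

from unstructured.partition.pdf import partition_pdf

pdf_folder_path = "/content/data/" if "google.colab" in sys.modules else "data/"
pdf_file_name = "google-10k-sample-14pages.pdf"

# Extract tables and chunk text from a PDF file (image extraction is disabled here).
# Note: filename is a bare relative path, so it is resolved against the
# current working directory, not against pdf_folder_path.
raw_pdf_elements = partition_pdf(
    filename=pdf_file_name,
    extract_images_in_pdf=False,
    infer_table_structure=True,
    chunking_strategy="by_title",
    max_characters=4000,
    new_after_n_chars=3800,
    combine_text_under_n_chars=2000,
    image_output_dir_path=pdf_folder_path,
)
```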

The code cannot find google-10k-sample-14pages.pdf among the files downloaded from the bucket. I also browsed the github-repo bucket and could not find this PDF there either.
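Until the file is restored, a pre-flight check makes this failure easier to diagnose. The sketch below assumes a hypothetical helper, resolve_pdf_path (not part of the notebook or of unstructured), that looks for the PDF in pdf_folder_path and then in the current working directory, and otherwise raises an error listing the PDFs that are actually present.

```python
from pathlib import Path


def resolve_pdf_path(file_name: str, folder: str) -> Path:
    """Return the first existing location of file_name, or raise a descriptive error.

    Hypothetical helper: checks the notebook's data folder first, then the
    current working directory (where a bare relative filename is resolved).
    """
    candidates = [Path(folder) / file_name, Path.cwd() / file_name]
    for candidate in candidates:
        if candidate.is_file():
            return candidate

    # Nothing matched: report which PDFs were actually downloaded, if any.
    search_dirs = {Path(folder), Path.cwd()}
    found = sorted(
        str(pdf) for d in search_dirs if d.is_dir() for pdf in d.glob("*.pdf")
    )
    raise FileNotFoundError(
        f"{file_name!r} not found in {[str(c) for c in candidates]}; "
        f"PDFs present: {found or 'none'}"
    )
```

Passing str(resolve_pdf_path(pdf_file_name, pdf_folder_path)) as the filename argument would then either point partition_pdf at the real file or stop immediately with the list of PDFs that the rsync actually copied.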

Relevant log output

```
[Errno 2] No such file or directory: 'google-10k-sample-14pages.pdf'
PDF text extraction failed, skip text extraction...
```
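The second log line suggests the library reported the missing file and then skipped text extraction rather than stopping, so later cells may run on empty or partial results. A minimal sketch, assuming that an empty element list signals this failure, is a hypothetical wrapper partition_or_fail that raises instead of continuing. The partitioning function is passed in (for example unstructured's partition_pdf), which also makes the wrapper easy to exercise with a stand-in.

```python
from typing import Any, Callable, List


def partition_or_fail(
    partition_fn: Callable[..., List[Any]], filename: str, **kwargs: Any
) -> List[Any]:
    """Call a partitioning function and raise if it returns no elements.

    Hypothetical wrapper: partition_fn would typically be partition_pdf,
    but any callable accepting filename=... and returning a list works.
    """
    elements = partition_fn(filename=filename, **kwargs)
    if not elements:
        # Treat an empty result as a failure instead of silently continuing.
        raise RuntimeError(
            f"No elements extracted from {filename!r}; "
            "check that the file exists and is a readable PDF."
        )
    return elements
```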

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
