Add support for multimodal RAG (with ColPali/ColQwen model embeddings) #10986

Closed
@k1endn

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

  • Dify currently supports only text embeddings.
  • When working with files that contain many images, charts, and tables, these elements must first be converted to text using OCR (Optical Character Recognition) and document layout detection tools.
  • ColPali, a multimodal embedding model, demonstrates impressive Retrieval-Augmented Generation (RAG) performance compared to traditional text-only RAG pipelines.
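
For context, ColPali-style models embed a page image as many patch vectors and score it against the query's token vectors with late interaction (MaxSim): for each query vector, take the best-matching patch, then sum those maxima. The sketch below illustrates only that scoring step. It assumes the embeddings have already been computed and are passed in as plain lists of floats; `maxsim_score` and `rank_pages` are illustrative names, not part of Dify or any ColPali library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query vectors of the max dot
    product against any page patch vector."""
    if not query_vecs or not page_vecs:
        return 0.0
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector],
               pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
```

Because each page carries many vectors instead of one, a vector store would need to support multi-vector records (or a reranking step) to use this kind of scoring.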

2. Additional context or comments

image
This is my demo. It retrieves images related to a query (after converting the pages of a file into images) and uses a multimodal LLM to generate answers. The results are highly impressive.
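
The flow described above could be wired together roughly as follows. This is a minimal sketch, not the demo's actual code: `PageRecord`, `retrieve_pages`, and `answer_query` are hypothetical names, and the embedding model, scoring function, and multimodal LLM are passed in as callables so that no particular library API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Embedding = List[List[float]]  # one vector per query token or image patch


@dataclass
class PageRecord:
    """One indexed page: where it came from and its precomputed embedding."""
    doc_id: str
    page_number: int
    image_path: str
    embedding: Embedding


def retrieve_pages(query_embedding: Embedding,
                   index: Sequence[PageRecord],
                   score_fn: Callable[[Embedding, Embedding], float],
                   top_k: int = 3) -> List[PageRecord]:
    """Score every indexed page against the query and keep the top_k best."""
    ranked = sorted(index,
                    key=lambda rec: score_fn(query_embedding, rec.embedding),
                    reverse=True)
    return ranked[:top_k]


def answer_query(query: str,
                 index: Sequence[PageRecord],
                 embed_query: Callable[[str], Embedding],
                 score_fn: Callable[[Embedding, Embedding], float],
                 generate: Callable[[str, List[str]], str],
                 top_k: int = 3) -> str:
    """Embed the query, retrieve the best page images, and pass them
    together with the query to a multimodal LLM (the `generate` callable)."""
    query_embedding = embed_query(query)
    pages = retrieve_pages(query_embedding, index, score_fn, top_k)
    return generate(query, [page.image_path for page in pages])
```

Injecting the three callables keeps the retrieval logic independent of which embedding model or LLM provider is configured.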

3. Can you help us with this feature?

  • I am interested in contributing to this feature.

Metadata

Labels: 👻 feat:rag (Embedding related issue, like qdrant, weaviate, milvus, vector database.)
