[Feat]: What is the best workaround for grounding with image input? #1194

@gustininho

Is your feature request related to a problem? Please describe.

I'm frustrated that Vertex AI grounding is not supported when the input contains non-text content, such as an image.

Describe the solution you'd like

Is there currently a convenient workaround? I'd like to be able to ask, for example:

input("What is this item made of?" + [image]) -> grounded search of the document for what items like this are usually made of -> output(text)

Describe alternatives you've considered

The obvious alternative is to make two separate calls:

  1. Ask a model what the item in the image is
  2. Pass that text output to a grounding-enabled model that accepts only text input

However, that increases costs massively.
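
For reference, the two-call alternative above could be sketched roughly as follows. This is only an illustration of the flow, not the library's API: `describe_image` and `grounded_answer` are hypothetical callables that you would implement yourself on top of whatever multimodal and grounded text-only model calls you have available, and `ask_about_image` and `TwoStepResult` are made-up names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures (assumptions, not part of any SDK):
#   describe_image(image_bytes, prompt) -> short text description of the image
#   grounded_answer(text_prompt)        -> answer from a text-only, grounding-enabled call
DescribeImage = Callable[[bytes, str], str]
GroundedAnswer = Callable[[str], str]


@dataclass
class TwoStepResult:
    item_description: str  # text produced by the first (multimodal) call
    answer: str            # text produced by the second (grounded) call


def ask_about_image(
    image: bytes,
    question: str,
    describe_image: DescribeImage,
    grounded_answer: GroundedAnswer,
) -> TwoStepResult:
    # Call 1: multimodal, no grounding. Asking only for a short phrase
    # keeps the output of this extra call small.
    description = describe_image(
        image, "Identify the item in this image in one short phrase."
    ).strip()

    # Call 2: text only, so grounding can be enabled on this request.
    prompt = f"Item: {description}\nQuestion: {question}"
    return TwoStepResult(item_description=description, answer=grounded_answer(prompt))
```

Restricting the first call to a one-phrase identification is one way to limit the added cost, though it does not remove the second call.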

Additional context

No response
