GPT4V-level open-source multi-modal model based on Llama3-8B
Tag manager and captioner for image datasets
Famous Vision Language Models and Their Architectures
Python scripts for captioning images with VLMs (see the captioning sketch after this list)
Tiny-scale experiment showing that CLIP models trained using detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained using the original alt-texts on a range of classification and retrieval tasks.
A comparative study of two of the best-performing open-source Vision Language Models, Google Gemini Vision and CogVLM
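Several of the repositories above provide scripts for captioning images with a VLM. As a rough illustration of what such a script typically does, here is a minimal sketch using the Hugging Face transformers library; the BLIP checkpoint and the "photo.jpg" path are illustrative assumptions, not the exact models or files used by the projects listed (those generally target heavier VLMs such as CogVLM or LLaVA 1.5).

    # Minimal captioning sketch: load a VLM, feed it one image, decode the caption.
    # BLIP is a small stand-in model; swap in whatever checkpoint a given project targets.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.open("photo.jpg").convert("RGB")  # "photo.jpg" is a placeholder path
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)
    print(caption)

A dataset-level captioner, such as the tag manager or the CLIP caption-quality experiment above, would typically loop this over every image in a folder and store the generated captions alongside the files.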