GPT4V-level open-source multi-modal model based on Llama3-8B
Tag manager and captioner for image datasets
Famous Vision Language Models and Their Architectures
Python scripts for captioning images with VLMs (see the captioning sketch after this list)
Tiny-scale experiment showing that CLIP models trained using detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained using the original alt-texts on a range of classification and retrieval tasks.
A comparative study of two of the best-performing open-source Vision Language Models, Google Gemini Vision and CogVLM
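Several of the repositories above provide scripts for captioning images with a VLM. As a rough illustration of what such a script typically does, here is a minimal sketch using the Hugging Face transformers library; the BLIP checkpoint and the "photo.jpg" path are illustrative assumptions, not the exact models or files used by the projects listed (those generally target heavier VLMs such as CogVLM or LLaVA 1.5).

    # Minimal captioning sketch: load a VLM, feed it one image, decode the caption.
    # BLIP is a small stand-in model; swap in whatever checkpoint a given project targets.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.open("photo.jpg").convert("RGB")  # "photo.jpg" is a placeholder path
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)
    print(caption)

A dataset-level captioner, such as the tag manager or the CLIP caption-quality experiment above, would typically loop this over every image in a folder and store the generated captions alongside the files.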