[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
-
Updated
Aug 12, 2024 - Python
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
✨✨Latest Advances on Multimodal Large Language Models
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Algorithms and Publications on 3D Object Tracking
Parsing-free RAG supported by VLMs
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
[CVPR 2023] Collaborative Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Effortless plugin and play Optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
An open-source implementation for training LLaVA-NeXT.
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
An official PyTorch implementation of the CRIS paper
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
Add a description, image, and links to the multi-modality topic page so that developers can more easily learn about it.
To associate your repository with the multi-modality topic, visit your repo's landing page and select "manage topics."