Stars
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
Official inference repo for FLUX.1 models
Official implementation of "SketchDeco: Decorating B&W Sketches with Colour"
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [TIFS 2024]
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Deep Q-Network (DQN) and Fitted Q-Iteration (FQI) tutorial for RL Summer School 2023
✨✨Latest Advances on Multimodal Large Language Models
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
[CVPR2024] CapHuman: Capture Your Moments in Parallel Universes
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
Official implementation of the paper "LivePhoto: Real Image Animation with Text-guided Motion Control"
[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD, TMLR 2024)
🔥 [CVPR2024] Official implementation of "Self-correcting LLM-controlled Diffusion Models (SLD)"
"代码随想录" LeetCode problem-solving guide: a recommended order for 200 classic problems, 600k words of detailed illustrated explanations, video breakdowns of difficult points, 50+ mind maps, and solutions in C++, Java, Python, Go, JavaScript, and more. No more confusion when learning algorithms! 🔥🔥 Take a look, and you'll wish you had found it sooner! 🚀
⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
Generative Models by Stability AI