Stars
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
[NeurIPS 2024] NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!
A curated list of papers, code, and resources on image composition/compositing and object insertion, which aim to generate realistic composite images.
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for flexible editing.
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
SigLIP-based Aesthetic Score Predictor
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
This is the official reproduction of FancyVideo.
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Code and dataset for the AAAI 2022 paper "CAISE: Conversational Agent for Image Search and Editing" by Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, and Mohit Bansal
Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"