Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Updated Dec 12, 2024 - Python
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"
[ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation
[CVPR '23] Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models