Kedreamix/Awesome-Talking-Head-Synthesis

Awesome-Talking-Head-Synthesis

This repository organizes papers, code, and resources related to generative adversarial networks (GANs) 🤗 and neural radiance fields (NeRF) 🎨, with a main focus on image-driven and audio-driven talking head synthesis papers and their released code. 👤

A collection of talking head synthesis papers and their released code. ✍️

Most papers are linked to PDFs on "arXiv" or journal/conference websites 📚. However, some papers require an academic license to view 🔐.

🔆 This project Awesome-Talking-Head-Synthesis is ongoing - pull requests are welcome! If you have any suggestions (missing papers, new papers, key researchers or typos), please feel free to edit and submit a PR. You can also open an issue or contact me directly via email. 📩

⭐ If you find this repo useful, please give it a star! 🤩

2023.12 Update 📆

Thanks to https://github.com/Curated-Awesome-Lists/awesome-ai-talking-heads, from which I have added some content, such as Tools & Software and Slides & Presentations. 🙏 I hope this will be helpful. 😊

If you have any feedback or ideas on extending this aggregated resource, please open an issue or PR - community contributions are vital to advancing this shared knowledge. 🤝

Let's keep pushing forward to recreate ever more realistic digital human faces! 💪 We've come so far but still have a long way to go. With continued research 🔬 and collaboration, I'm sure we'll get there! 🤗

Please feel free to star ⭐ and share this repo if you find it a valuable resource. Your support helps motivate me to keep maintaining and improving it. 🥰 Let me know if you have any other questions!

Datasets

Dataset descriptions

| Year | Dataset | Conference/Journal | Download Link | Description |
| --- | --- | --- | --- | --- |
| 2026 | SFQA | ArXiv | N/A | A dataset for singing face generation quality assessment with 5,184 videos generated from 100 photographs and 36 music clips using 12 generation methods. |
| 2025 | TalkCuts | ArXiv 2025 | N/A | A large-scale dataset with 164k clips totaling over 500 hours of human speech videos featuring diverse camera shots and detailed annotations including textual descriptions, 2D keypoints, and 3D SMPL-X motions for multi-shot speech video generation. |
| 2025 | EmojiBench++ | IJCV 2025 | Download | A comprehensive benchmark for portrait animation comprising diverse portraits, driving videos, and landmark sequences. |
| 2025 | Multi-human Interactive | ArXiv 2025 | Download | 12 hours of high-res footage with 2-4 speakers, fine-grained body pose and speech interaction annotations. |
| 2025 | THQA-10K | ArXiv 2025 | Download | Largest AGTH quality assessment dataset with 10,457 samples from 12 T2I models and 14 talkers. |
| 2025 | SpeakerVid-5M | ArXiv | N/A | Large-scale dataset with 5.2M video clips (8,743 hours) for audio-visual dyadic interactive virtual human generation, covering monadic talking, listening, and dyadic conversations, with pre-training and SFT subsets. |
| 2025 | TalkingHeadBench | WACV 2026 | Download | Comprehensive benchmark for talking-head deepfake detection with multi-model generators. |
| 2025 | Motion-X++ | ArXiv 2025 | N/A | 19.5M 3D whole-body pose annotations covering 120.5K motion sequences with 80.8K RGB videos. |
| 2024 | GLCF (MSTF) | ArXiv 2024 | N/A | First large-scale multi-scenario talking face dataset with 22 audio/video forgery techniques. |
| 2024 | SAVEE | ArXiv 2024 | Download | 480 British English utterances from 4 male actors expressing 7 emotions. |
| 2024 | DH-FaceVid-1K | ICCV 2025 | Download | 1,200 hours, 270K+ clips from 20K+ individuals with speech audio, keypoints, and text annotations. |
| 2024 | MMHead | ACMMM 2024 | N/A | Large-scale multi-modal 3D facial animation dataset with 49 hours of 3D facial motion sequences, speech audios, and hierarchical text annotations for text-induced 3D talking head animation and text-to-3D facial motion generation. |
| 2024 | Allo-AVA | ArXiv 2024 | N/A | ~1,250 hours of conversational content for allocentric avatar gesture animation. |
| 2024 | MultiTalk | CVPR 2024 | Download | 420+ hours across 20 languages, 293K clips (512x512, 25fps, avg 5.19s duration). |
| 2024 | THQA | ArXiv 2024 | Download | 800 talking head videos from 8 speech-driven methods with subjective quality assessments. |
| 2023 | ViCo | ArXiv 2023 | N/A | ViCo and ViCo-X are datasets for conversational head generation, with ViCo for sentence-level independent talking and listening tasks, and ViCo-X for multi-turn conversational scenarios. |
| 2023 | GRID | ArXiv 2023 | Download | 34 volunteers each speaking 1000 phrases (34K utterances) with 6-word sentence structures. |
| 2023 | TalkingHead-1KH | ArXiv 2023 | Download | 500K video clips with ~80K greater than 512x512 resolution. Only permissive license videos included. |
| 2023 | MMFace4D | ArXiv 2023 | Download | Large-scale multi-modal 4D dataset with 35,000+ sequences from 431 subjects (age 15-68). |
| 2023 | CelebV | CVPR 2023 | Download | Includes CelebV-Text with 70,000 in-the-wild face video clips for text-to-video generation. |
| 2022 | VFHQ | CVPRW 2022 | Download | 16,000+ high-fidelity clips for video face super-resolution research. |
| 2022 | Multiface | NeurIPS 2022 | Download | High-quality multi-view recordings of 13 people with 12K-23K frames per subject at 30fps. 65TB dataset. |
| 2022 | CelebV-HQ | ECCV 2022 | Download | 35,666 clips with 15,653 identities, each labeled with 83 facial attributes. |
| 2021 | HDTF | CVPR 2021 | Download | High-definition Talking-Face Dataset with ~362 videos (15.8 hours) in 720P/1080P resolution. |
| 2020 | MEAD | ECCV 2020 | Download | Large-scale audio-visual dataset with 60 actors expressing 8 emotions at 3 intensity levels. |
| 2019 | VOCA | SIGGRAPH 2019 | Download | 4D-face dataset with ~29 minutes of 4D face scans and synchronized audio from 12 speakers. |
| 2019 | FaceForensics++ | ICCV 2019 | Download | Large-scale dataset for detecting manipulated facial images with over 1.8M images. |
| 2019 | CN-CVS | ArXiv 2019 | Download | Large-scale continuous visual-speech dataset in Mandarin Chinese from TV news and speech shows. |
| 2019 | BIWI | ArXiv 2019 | Download | 3D Audiovisual Corpus of Affective Communication with 40 sentences spoken by 14 subjects. |
| 2018 | LRS2 | ArXiv 2018 | Download | Lip reading dataset with videos recorded in diverse settings from BBC television. |
| 2018 | LRW | ACCV 2018 | Download | Diverse English-speaking dataset from BBC with 1000+ speakers. Each video is 1.16s (29 frames). |
| 2018 | VoxCeleb2 | Interspeech 2018 | Download | Largest public audio-visual dataset with video URLs and timestamps. Requires 300GB+ storage. |
| 2017 | VoxCeleb1 | Interspeech 2017 | Download | Contains over 100,000 utterances for 1,251 celebrities, extracted from YouTube videos. |
| 2017 | ObamaSet | SIGGRAPH 2017 | Download | Specialized audio-visual dataset focused on analyzing visual speech of Barack Obama from weekly address footage. |
| 2014 | CREMA-D | ACM TOCC 2014 | Download | Diverse dataset with 7,442 clips featuring 91 actors (48 male, 43 female) aged 20-74, expressing six emotions at four intensity levels. |
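
Several of the figures quoted in the dataset descriptions can be cross-checked with simple arithmetic, which is handy when budgeting storage or training time. The snippet below is only an illustrative sketch: the numbers are copied from the table, the helper names are hypothetical, and the 25 fps frame rate used for LRW is an assumption that its table row does not state.

```python
# Rough consistency checks on figures quoted in the Datasets table.
# Helper names are hypothetical; values are copied from the table rows.

def total_hours(num_clips: int, avg_seconds: float) -> float:
    """Total duration in hours of num_clips clips averaging avg_seconds each."""
    return num_clips * avg_seconds / 3600.0


def clip_seconds(num_frames: int, fps: float) -> float:
    """Duration in seconds of a clip with num_frames frames at fps frames/second."""
    return num_frames / fps


# MultiTalk: 293K clips, avg 5.19 s each (table says "420+ hours").
multitalk_hours = total_hours(293_000, 5.19)

# GRID: 34 volunteers x 1000 phrases (table says "34K utterances").
grid_utterances = 34 * 1000

# LRW: 29 frames per video (table says 1.16 s); 25 fps is an assumption.
lrw_seconds = clip_seconds(29, 25.0)

# HDTF: ~362 videos totalling 15.8 hours, so average minutes per video.
hdtf_avg_minutes = 15.8 * 60 / 362

print(f"MultiTalk total duration: {multitalk_hours:.1f} h")
print(f"GRID utterances:          {grid_utterances}")
print(f"LRW clip length:          {lrw_seconds:.2f} s")
print(f"HDTF average video:       {hdtf_avg_minutes:.1f} min")
```

The quoted totals are rounded, so small mismatches are expected. Checks like these only flag gross inconsistencies; they do not validate a dataset.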

Survey

| Year | Title | Conference/Journal |
| --- | --- | --- |
| 2025 | Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions | ArXiv 2025 |
| 2024 | Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey | ArXiv 2024 |
| 2024 | A Survey on 3D Human Avatar Modeling — From Reconstruction to Generation | ArXiv 2024 |
| 2024 | Deepfake Generation and Detection: A Benchmark and Survey Github | ArXiv 2024 |
| 2024 | A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos Code | ArXiv 2024 |
| 2024 | How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey 3DGS+SLAM🔥🔥🔥 | ArXiv 2024 |
| 2024 | 3D Gaussian as a New Vision Era: A Survey 3DGS🔥🔥🔥 | ArXiv 2024 |
| 2024 | Advances in 3D Generation: A Survey | ArXiv 2024 |
| 2024 | A Survey on 3D Gaussian Splatting 3DGS🔥🔥🔥 ongoing | ArXiv 2024 |
| 2024 | Neural Radiance Fields: Past, Present, and Future NeRF🔥🔥🔥 Amazing 413 pages | ArXiv 2024 |
| 2023 | From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications | ArXiv 2023 |
| 2023 | Human-Computer Interaction System: A Survey of Talking-Head Generation | IEEE |
| 2023 | Talking human face generation: A survey | ACM |
| 2022 | Deep Learning for Visual Speech Analysis: A Survey | ArXiv 2022 |
| 2020 | What comprises a good talking-head video generation?: A Survey and Benchmark | ArXiv 2020 |

Funny Work

| Year | Title | Code | Project | Keywords |
| --- | --- | --- | --- | --- |
| 2024 | From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations | Code | Project | Photoreal |
| 2024 | Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | Code | Project | 🔥Animate (Alibaba; drives the "Subject Three" dance) |
| 2024 | What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs | | Project | 🔥Nvidia |
| 2024 | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Code | Project | 🔥Kuaishou |

Audio-driven

Year Title Conference/Journal Code Project Keywords
2026 EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation Preprint Gaussian Splatting, 3DGS, Audio-Driven, Talking Head
2026 TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation ArXiv 2026 Code Project Diffusion, Audio-Driven, Talking Head, VAE, Latent
2026 UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios ArXiv 2026 Lip Sync, Pose-Anchored, Generalizable
2026 FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation ArXiv 2026 Audio-Driven, Portrait Animation, Reinforcement Learning, GRPO
2026 UniTalking: A Unified Audio-Video Framework for Talking Portrait Generation CVPR 2026 Audio-Driven, Portrait Animation, Talking Head, CVPR, Transformer, Attention
2026 AUHead: Realistic Emotional Talking Head Generation via Action Units Control ArXiv 2026 Code Action Units, Audio-Driven Generation, Emotion Control, Diffusion Model
2026 Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models ArXiv 2026 3D Facial Animation, Speech-Driven, Omni-modal LLMs, Token-as-query Fusion
2026 MOVA: Towards Scalable and Synchronized Video-Audio Generation ArXiv 2026 Code Project Audio-Driven
2026 VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars ArXiv 2026 Code Project Avatar, Talking Head
2026 3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars ArXiv 2026 3D, Emotional, Lip Sync, Avatar, Talking Head, Transformer
2026 DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation ArXiv 2026 Project Audio-Driven, Transformer, Attention
2026 VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction ArXiv 2026 Audio-Driven, Talking Head
2026 Narrating For You: Prompt-guided Audio-visual Narrating Face Generation Employing Multi-entangled Latent Space WACV 2026 Audio-Driven, WACV, Latent
2026 SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads ArXiv 2026 Real-time, Streaming, Talking Head
2026 Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation ArXiv 2026 Audio-Driven
2026 JoyAvatar: Unlocking Highly Expressive Avatars via Harmonized Text-Audio Conditioning ArXiv 2026 Project Audio-Driven, Avatar
2026 LPIPS-AttnWav2Lip: Generic Audio-Driven lip synchronization for Talking Head Generation in the Wild ArXiv 2026 Code Project Lip Sync, Audio-Driven, Talking Head, Latent
2026 JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion ArXiv 2026 Audio-Visual Diffusion, LoRA, Lip Sync
2026 MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control ArXiv 2026 Personalized Avatars, Lip Sync, Style Disentanglement, Diffusion Model
2026 EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers ArXiv 2026 Project Diffusion, Audio-Driven, Talking Head
2026 SkyReels-V3 Technique Report ArXiv 2026 Code Video Generation, Audio-Guided, Talking Avatar, Diffusion Transformers
2026 FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes ArXiv 2026 Talking Head, Movie Dubbing
2026 Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation ICASSP 2026 Project 3D, Emotional, Talking Head, ICASSP, Attention
2026 Audio-Driven Talking Face Generation with Blink Embedding and Hash Grid Landmarks Encoding ArXiv 2026 Audio-Driven, Talking Head, Transformer
2026 Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos ArXiv 2026 Talking Head
2026 EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing ArXiv 2026 3D, Speech-Driven
2026 MoCha: End-to-End Video Character Replacement without Structural Guidance ArXiv 2026 Talking Head
2026 Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation ACM Trans. Multimedia Speech-Driven, Talking Head
2026 ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting ArXiv 2026 3D, Gaussian Splatting, 3DGS, Emotional, Audio-Driven
2026 SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation ArXiv 2026 Real-time, Streaming, Audio-Driven, Avatar, Attention, VAE
2026 DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model ArXiv 2026 Project Streaming, Talking Head, Flow Matching
2026 SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the wild ArXiv 2026 Project Transformer
2026 Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face ArXiv 2026 Code Project 3D, Talking Head, Attention
2026 REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation ArXiv 2026 Diffusion, Real-time, Streaming, Talking Head, Latent
2026 JoyAvatar-Flash: Real-time and Infinite Audio-Driven Avatar Generation with Autoregressive Diffusion ArXiv 2026 Diffusion, Real-time, Audio-Driven, Avatar
2026 Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation ArXiv 2026 Talking Head, Attention, Latent
2026 SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation ArXiv 2026 Audio-Driven, Talking Head
2025 From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing ArXiv 2025 Project Visual dubbing, Diffusion Transformer, Self-bootstrapping, Lip sync
2025 The Locally Deployable Virtual Doctor: LLM Based Human Interface for Automated Anamnesis and Database Conversion ArXiv 2025 Conditional Diffusion, Facial Animation, Audio-Visual Synchronization, LLM-Based Avatar
2025 Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation ICXR 2025 Blendshapes, FLAME, Disentanglement, 3D Animation
2025 Revising Second Order Terms in Deep Animation Video Coding ArXiv 2025 FOMM, Keypoints, Head Rotation, Motion Model
2025 A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages ArXiv 2025 Phoneme-Viseme Alignment, Multilingual TFS, Mixture-of-Experts
2025 Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation ArXiv 2025 Project Diffusion models, co-speech video, real-time, sparse attention
2025 Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis ArXiv 2025 Multimodal Instructions, Avatar Synthesis, Lip Synchronization
2025 A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis ArXiv 2025 Voice Cloning, Lip Sync Synthesis, Noisy Speech
2025 CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation ArXiv 2025 Cross-emotion memory, audio emotion enhancement, expression displacement, lip sync
2025 Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching Technical Report Project real-time, flow matching, lip-sync
2025 ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion ArXiv 2025 Code Diffusion, Landmarks-Guide, Real-time, Identity Preservation
2025 MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation ArXiv 2025 3DMM, Diffusion Transformer, Temporal Consistency, Blinking Dynamics
2025 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning ArXiv 2025 Joint uncertainty learning, Audio-driven talking face, Lip sync, Visual uncertainty
2025 Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation TMM 2025 Talking Head Animation, Temporal Correlation, One-Shot
2025 EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters ArXiv 2025 Neural Radiance Fields, Expression Parameters, Emotion Control, Audio-Driven
2025 STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing ICME 2025 Code Spatial-Temporal Alignment, Semantic Features, Visual Dubbing, Stability
2025 Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait ArXiv 2025 Code implicit keypoint, spatiotemporal diffusion, audio-driven, talking portrait
2025 EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models ArXiv 2025 Project 3D facial animation, latent diffusion, emotional expression, speech-driven
2025 Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation ArXiv 2025 Audio-driven, Diffusion model, Motion Diffusion Transformer, Lip sync
2025 MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation ArXiv 2025 Audio-driven, Emotion Synthesis, Mixture of Experts, Portrait Animation
2025 PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment ArXiv 2025 3D, Speech-Driven, Talking Head, Attention
2025 Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation ArXiv 2025 Project Diffusion, Real-time, Portrait Animation, Attention
2025 FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs ArXiv 2025 Diffusion, Transformer, GAN, Latent
2025 In-Context Audio Control of Video Diffusion Transformers ArXiv 2025 Diffusion, Audio-Driven, Transformer, Attention
2025 SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation ArXiv 2025 Portrait Animation, ICASSP, Attention
2025 FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction ArXiv 2025 Portrait Animation, Transformer, Latent
2025 VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image NeurIPS 2025 3D, Gaussian Splatting, Audio-Driven, Avatar
2025 TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation ArXiv 2025 Project Audio-Driven, VAE, Latent
2025 FacEDiT: Unified Talking Face Editing and Generation via Facial Motion Infilling ArXiv 2025 Project Talking Head, Transformer, Attention, Flow Matching
2025 STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits ArXiv 2025 Project Diffusion, Portrait Animation, Talking Head
2025 JoVA: Unified Multimodal Learning for Joint Video-Audio Generation ArXiv 2025 Project Audio-Driven, Transformer, Attention, GAN
2025 Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model Tech Report Audio-Driven, Transformer, Reinforcement Learning
2025 FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint ArXiv 2025 Project Portrait Animation, Transformer, Latent
2025 KeyframeFace: From Text to Expressive Facial Keyframes ArXiv 2025 Code Project Talking Head
2025 PersonaLive! Expressive Portrait Image Animation for Live Streaming ArXiv 2025 Streaming, Portrait Animation
2025 GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting WACV 2026 3D, Gaussian Splatting, 3DGS, Audio-Driven, Talking Head
2025 UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking ArXiv 2025 Audio-Driven, Avatar
2025 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length ArXiv 2025 Real-time, Streaming, Audio-Driven, Avatar
2025 EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans ArXiv 2025 Portrait Animation, Talking Head
2025 AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement ArXiv 2025 Project Talking Head, Transformer, Attention
2025 AI killed the video star. Audio-driven diffusion model for expressive talking head generation ArXiv 2025 Diffusion, Audio-Driven, Talking Head, Transformer
2025 IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer ArXiv 2025 Audio-Driven, Talking Head, Attention, Latent
2025 Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy ArXiv 2025 Audio-Driven, Attention, Latent
2025 StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model ArXiv 2025 Project 3D, Diffusion, Streaming, Audio-Driven
2025 ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search AAAI 2026 Diffusion, Talking Head, AAAI, Knowledge Distillation
2025 GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow ArXiv 2025 Talking Head
2025 Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis ArXiv 2025 Audio-Driven, Latent
2025 THEval. Evaluation Framework for Talking Head Video Generation ArXiv 2025 Talking Head
2025 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions ArXiv 2025 Audio-Driven, Transformer, Attention, Latent
2025 Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback ArXiv 2025 Diffusion, Audio-Driven, AAAI, Transformer
2025 MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control ArXiv 2025
2025 See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement TASLP 2025 High-Resolution, Talking Faces, Speech-to-Face, Diffusion
2025 LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation SIGGRAPH Asia 2025 Code Label-Free, Speech-Driven, Facial Animation, FLAME
2025 DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis ArXiv 2025 Disentangled Motion, Flow Matching, Talking Portrait, Controllable
2025 SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation ArXiv 2025 Contrastive Masked Pretraining, Audio-Visual, Talking-Face
2025 EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation IEEE SMC 2025 Real-Time, Audio-Driven, Gaussian Deformation, Talking Head
2025 Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing ArXiv 2025 Biometric Leakage, AI Videoconferencing, Security
2025 Audio Driven Real-Time Facial Animation for Social Telepresence SIGGRAPH Asia 2025 Project Real-time, Audio-Driven, SIGGRAPH, Transformer, Latent
2025 Warm Chat: Diffuse Emotion-aware Interactive Talking Head Avatar with Tree-Structured Guidance ArXiv 2025 (Withdrawn) Emotional, Avatar, Talking Head, Transformer, Latent
2025 EmoCAST: Emotional Talking Portrait via Emotive Text Description ArXiv 2025 Code Project Emotional, Portrait Animation, Talking Head, Attention
2025 READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation ArXiv 2025 Project Diffusion, Real-time, Audio-Driven, Talking Head, Transformer, VAE
2025 MOSPA: Human Motion Generation Driven by Spatial Audio NeurIPS 2025 Code Spatial Audio, Human Motion Generation, Virtual Human
2025 DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads ICCV 2025 Project Gaussian, Latent Space
2025 Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation ACM MM 2025 Depth-based
2025 PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles ACM MM 2025 FLAME
2025 GOES: 3D Gaussian-based One-shot Head Animation with Any Emotion and Any Style ACM MM 2025 One-Shot, 3DGS
2025 StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing ArXiv 2025 Project Visual Dubbing, Diffusion, Mamba-Transformer
2025 KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation ArXiv 2025 Keyframe, Diffusion, Dual-Path, Facial Animation
2025 SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding WACV 2026 Project Multi-Modal, Emotion-Aware, LLM
2025 Talking Head Generation via AU-Guided Landmark Prediction ArXiv 2025 Action Units, Landmark Prediction, Diffusion
2025 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation ArXiv 2025 Project 3D Facial Animation, Diffusion, Editing, Speech-Driven
2025 PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control ICONIP 2025 3DGS, Real-Time, Pixel-Aware, Audio-Driven
2025 Beat on Gaze: Learning Stylized Generation of Gaze and Head Dynamics ArXiv 2025 Gaze Control, Head Motion, Style-Aware, 3D
2025 Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation ArXiv 2025 Singing-Driven, 3D Head, Diffusion
2025 Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars ArXiv 2025 Audio-driven Realistic Facial Animation, Digital Avatars
2025 DisenEmo: Learning disentangled emotional representation from facial motion for 3D talking head generation ICIP 2025 Disentangled Emotional Representation, 3D Talking Head Generation
2025 ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation IJCAI 2025 Adaptive Disentanglement, Refined Alignment, 3D Facial Animation
2025 SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Feature IJCAI 2025 Stable 3D Gaussian-Based Talking Head Generation, Enhanced Lip Sync, Discriminative Speech Feature
2025 Wan-S2V: Audio-Driven Cinematic Video Generation ArXiv 2025 Cinematic, Audio-Driven, Video Generation
2025 InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing ArXiv 2025 Sparse-Frame Dubbing, Full-Body
2025 D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis ECAI 2025 Few-Shot, 3DGS, Deformation Fields
2025 RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis ICCV 2025 Workshop Emotion, NeRF, VAE
2025 FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation ArXiv 2025 Project Audio-Driven, Portrait Animation, Preference Optimization
2025 HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis ArXiv 2025 Hybrid Motion, High-Fidelity, Talking Head
2025 StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation ArXiv 2025 Code Project Stable Diffusion
2025 X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio ArXiv 2025 Project Emotional Portrait, Long-range, Audio-driven
2025 DICE-Talk: Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation ACM MM 2025 Emotional Portrait, Identity Preservation, Emotion Cooperation
2025 M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation ArXiv 2025 Project Multi-granular Motion, Decoupling, Optimization
2025 MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding ArXiv 2025 Code Multimodal, 3D Facial Animation, Dynamic Emotions
2025 KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features ACM MM 2025 Deepfake Detection, Audio-Visual, SSL
2025 DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation ArXiv 2025 Project DiT, Portrait Animation, Speaking Styles
2025 Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation Interspeech 2025 Project Phonetic Context, Viseme, 3D Facial Animation
2025 SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation ACM MM 2025 Spatial Audio, Video Generation, MLLM
2025 Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos IEEE IJCB 2025 Biometric Verification, Avatar Security, Facial Motion
2025 Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation ArXiv 2025 Mask-Free, Identity Preservation, Audio-driven
2025 MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization ICCV 2025 Project Personalized, 3D Facial Animation, Memory
2025 Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System ICMI 2025 Code Real-time, Nodding Generation, Avatar Interaction
2025 MoDA: Multi-modal Diffusion Architecture for Talking Head Generation ArXiv 2025 Project Multi-modal, Diffusion, Talking Head Generation
2025 GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation ICCV 2025 Code Project 3D Talking Head, Gaussian Priors, Identity Adaptation
2025 FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases ArXiv 2025 Identity Leakage, Extreme Cases
2025 Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field ArXiv 2025 Few-Shot, Global Gaussian Field, 3DGS
2025 JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching ArXiv 2025 Flow Matching, Audio-Motion
2025 ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model ArXiv 2025 Autoregressive, FLAME, 3D
2025 Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos ICMR 2025 Compression, Low-Bitrate
2025 SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting ArXiv 2025 3DGS, Synchronization
2025 Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space ICME 2025 3D, Diffusion, Multimodal
2025 EmoVOCA: Speech-Driven Emotional 3D Talking Heads WACV 2025 Emotional, 3D, VOCA
2025 Lipschitz-Driven Noise Robustness in VQ-AE for High-Frequency Texture Repair in ID-Specific Talking Heads ArXiv 2025 Noise Robustness, VQ-AE, High-Frequency
2025 LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models ArXiv 2025 Low-Latency, Real-Time, Interactive
2025 Sonic: Shifting Focus to Global Audio Perception in Portrait Animation CVPR 2025 Global Audio Perception, Portrait Animation
2025 High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning ArXiv 2025 LLM, Reliability
2025 Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation CVPR 2025 Adversarial Defense, Privacy
2025 Cocktail-Party Audio-Visual Speech Recognition Interspeech 2025 Audio-Visual Speech Recognition
2025 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models ArXiv 2025 Real-Time, Autoregressive Diffusion
2025 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation ArXiv 2025 Co-Speech Gesture, Two-Stage
2025 V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow ICASSP 2025 Video-to-Speech, Speech Decomposition
2025 IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos CVPR 2025 3D-aware, Video Diffusion
2025 Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements ArXiv 2025 Voice Conversion, Survey
2025 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing ArXiv 2025 Attribute Editing, Interactive
2025 Video Editing for Audio-Visual Dubbing ArXiv 2025 Video Editing, Dubbing
2025 Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation CVPR 2025 3D, Semantic Decoupling
2025 Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion ArXiv 2025 Diffusion, 3D
2025 VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback ArXiv 2025 SDK, LLM, Real-time
2025 OT-Talk: Animating 3D Talking Head with Optimal Transportation ArXiv 2025 FLAME, 3D
2025 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting CVPRW 2025 3DGS
2025 Model See Model Do: Speech-Driven Facial Animation with Style Control SIGGRAPH 2025
2025 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing ArXiv 2025 LLM, Qwen
2025 KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ArXiv 2025
2025 Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis ArXiv 2025 Diffusion
2025 FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis ICMR 2025
2025 MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices CVPR 2025 100+fps
2025 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation ArXiv 2025
2025 FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation CVPR 2025 Fast Diffusion 12.5X speedup
2025 Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency ICLR 2025
2025 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance ArXiv 2025
2025 Audio-driven Gesture Generation via Deviation Feature in the Latent Space ArXiv 2025 Gesture
2025 Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics CVPR 2025
2025 MGGTalk: Monocular and Generalizable Gaussian Talking Head Animation CVPR 2025 Project One Shot, 3DGS
2025 DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance ArXiv 2025 Dubbing
2025 Dual Audio-Centric Modality Coupling for Talking Head Generation ArXiv 2025 NeRF
2025 Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis ArXiv 2025 3DGS
2025 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers CVPR 2025 DiT
2025 DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model ICME 2025
2025 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation CVPR 2025 Autoregressive
2025 DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation ICME 2025 Diffusion, 3D
2025 Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation CVPR 2025
2025 HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation CVPR 2025 Hunyuan
2025 MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling ArXiv 2025
2025 StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation ArXiv 2025 3D
2025 LatentSync: Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision ArXiv 2025
2025 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice ArXiv 2025
2025 KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation CVPR 2025 Diffusion, Long Sequences
2025 Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture CVPR 2025 Texture
2025 AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation IEEE Transactions on Multimedia Project MLoRA, Personalized
2025 InsTaG: Learning Personalized 3D Talking Head from Few-Second Video CVPR 2025 Few Shot, 3DGS
2025 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model ArXiv 2025 Diffusion
2025 NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis ICASSP 2025
2025 Emotional Face-to-Speech ArXiv 2025 emotion, face2speech
2025 EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis ArXiv 2025 emotion, 3DGS
2025 EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation ArXiv 2025 emotion, 3D
2025 Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding ICASSP 2025 Viseme
2025 SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation ArXiv 2025 Human Pose
2025 JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing ArXiv 2025 Depth, JD work
2025 Identity-Preserving Video Dubbing Using Motion Warping ArXiv 2025 Video Dubbing
2025 LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition ICASSP 2025 VSR
2025 DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis ICASSP 2025 Hair-Preserving
2025 UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control ArXiv 2025 SD, Lighting control
2024 Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis ArXiv 2024 Audio Feature Extraction, Whisper, Real-time processing, Talking portrait synthesis
2024 PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation ArXiv 2024 Project Pose Latent Diffusion, Lip Synchronization, Text-Audio Control
2024 One-Shot Pose-Driving Face Animation Platform ArXiv 2024 One-Shot, Pose-Driving, Face Animation, Talking Head
2024 FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization ArXiv 2024 Normalizing Flow, Vector-Quantization, Lip Sync, Emotional Talking Faces
2024 VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization ArXiv 2024 visemes, code book
2024 PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis AAAI 2025 Point Cloud, Gaussian Splatting
2024 EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion CVPR 2025 Project Emotion, Expressive, Diffusion
2024 GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression AAAI 2025 Gaze-oriented
2024 EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing CVPR 2025 Emotion, Dubber
2024 PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation ArXiv 2024 Diffusion, Attention, One-Shot
2024 DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation AAAI 2025 3D face, FLAME, Emotion
2024 LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync ArXiv 2024 Diffusion, SyncNet
2024 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait ICCV 2025 Project Flow Matching
2024 SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model ArXiv 2024 Diffusion, Style
2024 Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Tech Report Omni!!!
2024 Controllable Talking Face Generation by Implicit Facial Keypoints Editing ArXiv 2024 Face Edit
2024 SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation ArXiv 2024
2024 LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis ArXiv 2024 NeRF
2024 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation ArXiv 2024 Memory
2024 IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation ArXiv 2024 Motion Diffusion Model
2024 Memories are One-to-Many Mapping Alleviators in Talking Face Generation IEEE 2024 Memory
2024 Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis ArXiv 2024 Diffusion
2024 GaussianSpeech: Audio-Driven Gaussian Avatars ArXiv 2024 3DGS, 3D
2024 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis ArXiv 2024
2024 S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis ECCV 2024
2024 LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space ArXiv 2024 Fine-Grained Emotion
2024 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation ArXiv 2024 Diffusion, VASA
2024 JoyHallo: Digital human model for Mandarin ArXiv 2024 Diffusion, Hallo
2024 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation ICLR 2025 Diffusion, Hallo
2024 Audio-Driven Emotional 3D Talking-Head Generation ArXiv 2024 Emotion
2024 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts ArXiv 2024
2024 Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization ArXiv 2024
2024 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation ArXiv 2024 Non-autoregressive Diffusion
2024 LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details ArXiv 2024
2024 Diverse Code Query Learning for Speech-Driven Facial Animation ArXiv 2024
2024 TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans ECCVW 2024 NeRF
2024 ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE SIGGRAPH MIG 2024 3D
2024 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation BMVC 2024 NeRF
2024 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy ArXiv 2024
2024 LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation ArXiv 2024
2024 StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads TPAMI 2024
2024 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures ArXiv 2024 diffusion
2024 EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion ArXiv 2024 Diffusion
2024 PersonaTalk: Bring Attention to Your Persona in Visual Dubbing SIGGRAPH Asia 2024
2024 KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation ArXiv 2024 KAN
2024 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation ArXiv 2024 LoRA
2024 Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control ArXiv 2024
2024 G3FA: Geometry-guided GAN for Face Animation BMVC 2024
2024 Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation ArXiv 2024
2024 High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model IEEE TIP
2024 Style-Preserving Lip Sync via Audio-Aware Style Reference IEEE TIP
2024 Talk to the Wall: The Role of Speech Interaction in Collaborative Visual Analytics IEEE TVCG 2024 Collaborative
2024 MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation ArXiv 2024 Co-Speech Gesture
2024 GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer ArXiv 2024
2024 UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model ArXiv 2024
2024 DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework ArXiv 2024
2024 What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models ACL Wordplay 2024
2024 LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement ArXiv 2024
2024 RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network ArXiv 2024
2024 Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation ArXiv 2024
2024 JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model ArXiv 2024 3D
2024 Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs COLM 2024 LLM
2024 Digital Avatars: Framework Development and Their Evaluation ArXiv 2024
2024 EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head ECCV 2024
2024 PAV: Personalized Head Avatar from Unstructured Video Collection ECCV 2024
2024 Text-based Talking Video Editing with Cascaded Conditional Diffusion ArXiv 2024
2024 EmoFace: Audio-driven Emotional 3D Face Animation IEEE VR 2024
2024 Learning Online Scale Transformation for Talking Head Video Generation ArXiv 2024
2024 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning AAAI 2025 🔥Alibaba
2024 Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN ArXiv 2024 StyleGAN
2024 Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert Interspeech 2024 3D
2024 MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset Interspeech 2024 3D, Dataset
2024 NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation ArXiv 2024 NeRF
2024 Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement ArXiv 2024
2024 V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Tech Report 🔥EMO, Diffusion, Open-source
2024 CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer WACV 2024
2024 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation ArXiv 2024 🔥EMO, Diffusion, Open-source
2024 Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation ArXiv 2024 Emotion
2024 InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation ArXiv 2024 Text-Guided
2024 Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation ArXiv 2024 A Benchmark and Survey
2024 NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior CVPRW 2024 SadTalker+NeRF
2024 SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space ICASSP 2025
2024 AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding ArXiv 2024
2024 EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars ArXiv 2024 EMO
2024 GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting ACM MM 2024 🔥Gaussian Splatting
2024 CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation ArXiv 2024 Emotion
2024 GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting ArXiv 2024 🔥Gaussian Splatting
2024 GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting ACM MM 2024 🔥Gaussian Splatting
2024 TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting ECCV 2024 🔥Gaussian Splatting
2024 Learn2Talk: 3D Talking Face Learns from 2D Talking Face ArXiv 2024 🔥Gaussian Splatting
2024 VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time NeurIPS 2024 🔥🔥🔥Awesome,Microsoft
2024 Pose-Aware 3D Talking Face Synthesis using Geometry-guided Audio-Vertices Attention IEEE 2024
2024 EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis ECCV 2024 Emotion
2024 FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio ArXiv 2024
2024 Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior ArXiv 2024
2024 AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ArXiv 2024 🔥🔥🔥Similar to EMO
2024 Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework CVPR 2024
2024 Adaptive Super Resolution For One-Shot Talking-Head Generation ICASSP 2024
2024 VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis ArXiv 2024 Embodied
2024 EmoVOCA: Speech-Driven Emotional 3D Talking Heads ArXiv 2024 3D, VOCA
2024 ScanTalk: 3D Talking Heads from Unregistered Scans ECCV 2024 3D
2024 Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style ArXiv 2024
2024 EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions ArXiv 2024 🔥🔥🔥Amazing, Diffusion
2024 G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment ArXiv 2024 A Generic Framework
2024 Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis CVPR 2024 High-Quality
2024 DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer ArXiv 2024 3D
2024 EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation ArXiv 2024 Emotion
2024 NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis ICASSP 2024 AU
2024 Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis ICLR 2024 3D, One-Shot,Realistic
2024 SyncTalk: The Devil😈 is in the Synchronization for Talking Head Synthesis CVPR 2024 😈Talking Head
2024 AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation ArXiv 2024 3D, Mesh
2024 DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation ArXiv 2024 Emotion
2024 AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis AAAI 2024
2024 R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning ArXiv 2024 based on RAD-NeRF
2024 DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis ICASSP 2024 - - ER-NeRF
2023 Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis ICCV 2023 Tri-plane
2023 LipNeRF: What is the right feature space to lip-sync a NeRF? FG 2023 Wav2lip
2024 VectorTalker: SVG Talking Face Generation with Progressive Vectorisation ArXiv 2024 SVG
2024 Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation AAAI 2024 3D
2024 DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models ArXiv 2024 Diffusion
2024 FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models ArXiv 2024
2024 GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance ArXiv 2024 3D
2024 GMTalker: Gaussian Mixture based Emotional talking video Portraits ArXiv 2024 Emotion
2024 VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior ArXiv 2024 Mesh
2024 GAIA: Zero-shot Talking Avatar Generation ArXiv 2024 Code(coming) 😲😲😲
2023 Towards Streaming Speech-to-Avatar Synthesis ArXiv 2023 Streaming Synthesis, Articulatory Inversion, Real-time, Speech-driven
2023 OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions ArXiv 2023 One-shot Talking Head, Head Motions, One-to-Many Mapping, Audio-driven
2023 Controllable One-Shot Face Video Synthesis With Semantic Aware Prior ArXiv 2023 One-shot Talking Head, Semantic Aware Prior, Controllable Generation, Pose Alignment
2023 FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions ICME 2023 Natural Head Motions, Flow-guided, Audio-driven Pose Prediction, One-shot Talking Head
2023 OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering ArXiv 2023 Tri-plane Rendering, One-shot Avatar, Controllable, 3D Consistency
2023 OPT: One-shot Pose-Controllable Talking Head Generation ICASSP 2023 pose control, identity preservation, audio feature disentanglement
2023 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation ICCV 2023 -
2023 ToonTalker: Cross-Domain Face Reenactment ICCV 2023 - - -
2023 Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation ICCV 2023 -
2023 EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation ICCV 2023 - - Emotion
2023 Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation ICCV 2023 - - Emotion,LHG
2023 MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions ICCV 2023 - - -
2023 Facediffuser: Speech-driven 3d facial animation synthesis using diffusion ACM SIGGRAPH MIG 2023 🔥Diffusion,3D
2023 Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis TCSVT 2023 - -
2023 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation CVPR 2023 3D,Single Image
2023 EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation ICCV 2023 3D,Emotion
2023 Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks Interspeech 2023 Emotion
2023 DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video AAAI 2023
2023 StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles AAAI 2023 Style
2023 High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning CVPR 2023 Emotion
2023 StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator CVPR 2023 -
2023 TalkLip: Seeing What You Said - Talking Face Generation Guided by a Lip Reading Expert CVPR 2023
2023 CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior CVPR 2023 3D,codebook
2023 Emotionally Enhanced Talking Face Generation ArXiv 2023 Emotion
2023 DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder ACM MM 2023 🔥Diffusion
2023 READ Avatars: Realistic Emotion-controllable Audio Driven Avatars ArXiv 2023 -
2023 DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis CVPR 2023 🔥Diffusion
2023 Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation ArXiv 2023 - 🔥Diffusion
2022 Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis ArXiv 2022 disentangled representation, contrastive learning, multi-motion control
2022 Emotion-Controllable Generalized Talking Face Generation IJCAI 2022 emotion control, graph convolutional network, geometry-aware
2022 StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN ArXiv 2022 StyleGAN, high-resolution, one-shot, lip sync
2022 VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild SIGGRAPH 2022
2022 Expressive Talking Head Generation with Granular Audio-Visual Control CVPR 2022 - - -
2022 Talking Face Generation with Multilingual TTS CVPR 2022 -
2022 EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model SIGGRAPH 2022 - - Emotion
2022 SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression ArXiv 2022 - Project -
2022 Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers SIGGRAPH Asia 2022 - - -
2022 Memories are One-to-Many Mapping Alleviators in Talking Face Generation ArXiv 2022 - - -
2021 One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning AAAI 2022 one-shot, audio-visual correlation, keypoint-based motion, lip sync
2021 Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion ArXiv 2021 Audio-driven, Talking-head, Head Motion, Keypoint-based Motion
2021 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head ArXiv 2021 3D Talking Head, Emotion, Geometry Map, Audio-driven
2021 MakeItTalk: Speaker-Aware Talking-Head Animation SIGGRAPH Asia 2020 Speaker-Aware, Audio-Driven, Facial Landmarks, Photorealistic
2021 PC-AVS: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation CVPR 2021 -
2021 Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation IJCAI 2021 - - -
2021 Talking Head Generation with Audio and Speech Related Facial Action Units BMVC 2021 - - AU
2021 Audio-Driven Emotional Video Portraits CVPR 2021 Emotion
2021 IATS: Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis ACM Multimedia 2021 - - -
2020 A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild ACM Multimedia 2020 -
2020 Talking-head Generation with Rhythmic Head Motion ECCV 2020 -
2020 Speaker-Aware Talking-Head Animation SIGGRAPH Asia 2020 -
2020 Neural Voice Puppetry: Audio-driven Facial Reenactment ECCV 2020 -
2020 A Large-scale Audio-visual Dataset for Emotional Talking-face Generation ECCV 2020 -
2020 Realistic Speech-Driven Facial Animation with GANs IJCV 2020 -
2020 Multi Modal Adaptive Normalization for Audio to Video Generation ArXiv 2020 Audio-to-Video, Multi-Modal Adaptive Normalization, Facial Video Generation, Keypoint Heatmap
2019 Talking Face Generation by Adversarially Disentangled Audio-Visual Representation AAAI 2019 -
2019 Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss CVPR 2019 -
2018 Lip Movements Generation at a Glance ECCV 2018 -
2018 Audio-Driven Animator-Centric Speech Animation SIGGRAPH 2018 -
2017 Synthesizing Obama: Learning Lip Sync From Audio SIGGRAPH 2017 -
2017 You Said That? Synthesising Talking Faces From Audio BMVC 2017 -
2017 Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion SIGGRAPH 2017 -
2017 A Deep Learning Approach for Generalized Speech Animation SIGGRAPH 2017 -
2016 Lip Reading in the Wild ACCV 2016 -

Text-driven

Year Title Conference/Journal Code/Proj
2026 Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars ArXiv 2026 Project
2026 ActAvatar: Temporally-Aware Precise Action Control for Talking Avatars ArXiv 2026
2026 Text-Driven Emotionally Continuous Talking Face Generation ArXiv 2026
2025 Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation ArXiv 2025
2025 When Words Smile: Generating Diverse Emotional Facial Expressions from Text EMNLP 2025 Code Project
2025 OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication ArXiv 2025
2025 Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering ArXiv 2025
2024 FT2TF: First-Person Statement Text-To-Talking Face Generation WACV 2025
2024 HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting ECCV 2024 Code Project
2024 Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models ICASSP 2024
2024 GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars ArXiv 2024
2023 Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism ArXiv 2023
2023 AgentAvatar: Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents ArXiv 2023
2023 Text-to-Video: A Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation ArXiv
2023 TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles ArXiv
2022 Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary ICASSP 2022 Project Code
2021 Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation AAAI Code
2021 Txt2vid: Ultra-low bitrate compression of talking-head videos via text ArXiv Code

NeRF & 3D & Gaussian Splatting

Year Title Conference/Journal Code Project Keywords
2026 Retrieval-Augmented Gaussian Avatars: Improving Expression Generalization ArXiv 2026 Gaussian Splatting, Expression Generalization, Retrieval Augmentation, 3D Avatars
2026 STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction ArXiv 2026 Gaussian Splatting, 3D Head Avatars, Soft Binding, Temporal Density Control
2026 OMG-Avatar: One-shot Multi-LOD Gaussian Head Avatar ArXiv 2026 3D Gaussian, One-Shot, Head Avatar
2026 LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation ArXiv 2026 Kinematic-Space Completion, Expression Control, 3D Gaussian Splatting, Video Diffusion
2026 GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction ArXiv 2026 Geometry-Aware Diffusion, 4D Avatar Reconstruction, 3D Gaussian Splatting, Surface Normals
2026 OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars ArXiv 2026 One-Shot Avatar, 360° Full-Head, 3D Gaussian Splatting, Multi-View Feature Splatting
2026 GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars ArXiv 2026 Gaussian Splatting, Texture Mapping, Relightable Avatars, 3D Reconstruction
2026 OFERA: Blendshape-driven 3D Gaussian Control for Occluded Facial Expression to Realistic Avatars in VR ArXiv 2026 Project Blendshape Control, Gaussian Avatars, VR Telepresence, Real-time Expression
2026 FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation ICLR 2026 3D Gaussian Splatting, Head Avatars, Real-Time Animation, Few-Shot Learning
2026 CAG-Avatar: Cross-Attention Guided Gaussian Avatars for High-Fidelity Head Reconstruction ArXiv 2026 3D Gaussian Splatting, cross-attention, head reconstruction, drivable avatars
2026 Generalizable and Animatable 3D Full-Head Gaussian Avatar from a Single Image ArXiv 2026 3D full-head avatar, Gaussian primitives, UV space, single-image reconstruction
2026 UIKA: Fast Universal Head Avatar from Pose-Free Images ArXiv 2026 Project Gaussian Splatting, UV Mapping, Head Avatar, Feed-forward
2026 ELITE: Efficient Gaussian Head Avatar from a Monocular Video via Learned Initialization and TEst-time Generative Adaptation ArXiv 2026 Gaussian Avatar, Test-time Adaptation, Diffusion, Monocular Video
2026 RelightAnyone: A Generalized Relightable 3D Gaussian Head Model ArXiv 2026 3D Gaussian Splatting, relightable avatars, single-image fitting, cross-subject generalization
2026 Toward Fine-Grained Facial Control in 3D Talking Head Generation ArXiv 2026 3D, Talking Head
2026 From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors 3DV 2026 Project 3D, Talking Head, 3DV, Latent
2026 Lightweight High-Fidelity Low-Bitrate Talking Face Compression for 3D Video Conference ArXiv 2026 3D, Talking Head
2026 Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting ArXiv 2026 Code Project Gaussian Splatting, 3DGS, Portrait Animation, Talking Head
2026 MANGO: Natural Multi-speaker 3D Talking Head Generation via 2D-Lifted Enhancement ArXiv 2026 3D, Talking Head, Transformer
2025 TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars ArXiv 2025 3D Gaussian Splatting, hybrid representation, analytic rigging, UV space
2025 FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation ArXiv 2025 Project 3D avatar, Gaussian Splatting, deformation, reconstruction
2025 FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision ArXiv 2025 Project 3D head avatar, partial supervision, transformer, monocular training
2025 Gaussian Pixel Codec Avatars: A Hybrid Representation for Efficient Rendering Tech Report 2025 Gaussian Splatting, head avatar, hybrid representation, efficient rendering
2025 AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars ArXiv 2025 Project 3D Gaussian Splatting, Animatable Avatars, FLAME, Real-time Rendering
2025 MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance ArXiv 2025 Code Latent Diffusion, FLAME, 3D Geometric Guidance, Face Reenactment
2025 AvatarBack: Back-Head Generation for Complete 3D Avatars from Front-View Images ArXiv 2025 3D Gaussian Splatting, Back-Head Generation, Avatar Reconstruction, Spatial Alignment
2025 EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors ArXiv 2025 3D Gaussian Splatting, expression-aware, deformation-aware, generative priors
2025 SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing ArXiv 2025 Gaussian Splatting, 3D Avatar, Texture Editing, FLAME
2025 MoGaFace: Momentum-Guided and Texture-Aware Gaussian Avatars for Consistent Facial Geometry ArXiv 2025 Gaussian Avatars, FLAME Meshes, Geometry Refinement
2025 HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars ArXiv 2025 3D Gaussian Avatars, Hair Compositionality, Disentangled Prior, Few-shot Fine-tuning
2025 GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar ArXiv 2025 Adaptive Gaussian Splatting, 3D Head Avatar, Mouth Structure, Deformation Strategy
2025 StreamME: Simplify 3D Gaussian Avatar within Live Stream ArXiv 2025 Project 3D Gaussian Splatting, avatar reconstruction, on-the-fly training
2025 Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting ArXiv 2025 Neural Radiance Fields, Intrinsic Decomposition, Portrait Editing, Motion Control
2025 Interactive Rendering of Relightable and Animatable Gaussian Avatars ArXiv 2025 Gaussian Splatting, Relightable Avatars, Interactive Rendering, Pose-driven Animation
2025 Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation ArXiv 2025 3D, Gaussian Splatting, Avatar, Attention
2025 EmoDiffTalk: Emotion-aware Diffusion for Editable 3D Gaussian Talking Head ArXiv 2025 3D, Diffusion, Emotional, Talking Head
2025 Is It Truly Necessary to Process and Fit Minutes-Long Reference Videos for Personalized Talking Face Generation? ArXiv 2025 Talking Head, Attention
2025 Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications SUI 2025 Project Real-Time, Cross-Platform, 3D Avatar, Gaussian Splatting
2025 STG-Avatar: Animatable Human Avatars via Spacetime Gaussian IROS 2025 Project Spacetime Gaussian, Animatable Avatar, 3DGS
2025 Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images ICCV 2025 Zero-Shot, 3D Gaussian Avatars, Phone Images
2025 HRM²Avatar: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans SIGGRAPH Asia 2025 Project Mobile, Real-Time, Monocular, Avatar
2025 MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians ArXiv 2025 Mixed 2D-3D Gaussians, Head Avatar, Geometric Accuracy
2025 MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars ArXiv 2025 Multi-View, Portrait Video, Diffusion, 4D Avatar
2025 Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework ArXiv 2025 3D Gaussian, Human Avatar, Compression
2025 PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image ArXiv 2025 Project Gaussian, Full-Head Synthesis, One-shot
2025 Generative Head-Mounted Camera Captures for Photorealistic Avatars SIGGRAPH Asia 2025 Project Head-Mounted Camera, Photorealistic, Avatar
2025 Capturing Head Avatar with Hand Contacts from a Monocular Video ICCV 2025 Head Avatar, Hand Contacts, Monocular Video, 3D Reconstruction
2025 ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars ArXiv 2025 3D Gaussian Head Avatars, Level of Detail Control, Continuous LOD
2025 Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks ArXiv 2025 Project Head Correspondence, Canonical Embedding, Tracking, Avatar
2025 FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction ArXiv 2025 Project Pose-free, Sparse-view, 3DGS
2025 Hyper Diffusion Avatars: Dynamic Human Avatar Generation using Network Weight Space Diffusion ArXiv 2025 Dynamic Avatar, Weight-Space Diffusion
2025 TeRA: Rethinking Text-guided Realistic 3D Avatar Generation ICCV 2025 Text-to-Avatar, Latent Diffusion
2025 GaussianGAN: Real-Time Photorealistic controllable Human Avatars FG 2025 3DGS, Real-Time, Photorealistic
2025 Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars ArXiv 2025 Project Hair Reconstruction, Gaussian Splatting
2025 DevilSight: Augmenting Monocular Human Avatar Reconstruction through a Virtual Perspective ArXiv 2025 Avatar Reconstruction, Video Generation
2025 DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting ICCV 2025 Project Relightable Avatar, 2DGS Distillation
2025 MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction ICCV 2025 Project 3D Generative Avatar, Monocular Reconstruction
2025 GUAVA: Generalizable Upper Body 3D Gaussian Avatar ICCV 2025 Code Project 3D Gaussian Avatar, Upper Body, SMPLX
2025 GAS: Generative Avatar Synthesis from a Single Image ICCV 2025 Project Single Image, 3D Avatar, NeRF, Diffusion
2025 EPSilon: Efficient Point Sampling for Lightening of Hybrid-based 3D Avatar Generation ArXiv 2025 Code Efficient Point Sampling, Hybrid 3D Avatar
2025 VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis ICCV 2025 Workshop Visually-Guided, 3D Avatar, Lip Synthesis
2025 ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions SIGGRAPH 2025 Project High-Fidelity, Gaussian Avatars, Patch Expressions
2025 AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars ArXiv 2025 3D Makeup Transfer, Avatar
2025 HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars ArXiv 2025 Project High-Dimensional, Gaussian Splatting
2025 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router ArXiv 2025 Multi-Character, 3D-mask
2025 Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos ArXiv 2025 Motion Blur, Animatable Avatars
2025 BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading ArXiv 2025 Project 3DGS, Relightable, Neural Shading
2025 SmartAvatar: Text- and Image-Guided Human Avatar Generation with VLM AI Agents ArXiv 2025 Text-Guided, VLM Agents
2025 UMA: Ultra-detailed Human Avatars via Multi-level Surface Alignment ArXiv 2025 Ultra-detailed, Surface Alignment
2025 AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models ArXiv 2025 Code Avatar, Human-Centric Animation
2025 Eye-See-You: Reverse Pass-Through VR and Head Avatars IJCAI 2025 VR, Head Avatars, Pass-Through
2025 Barbie: Text to Barbie-Style 3D Avatars ArXiv 2025 Code Project Text to Avatar, Barbie-Style
2025 EVA: Expressive Virtual Avatars from Multi-view Videos SIGGRAPH 2025 Project Avatar, 3D Gaussian
2025 Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis ArXiv 2025 Project 3D, Avatar, Audio-Synthesis
2025 SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation CVPRW 2025 Code Project Single Image, 3D Avatar, 3DGS, Video Diffusion
2025 TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling SIGGRAPH 2025 Project 3DGS, Avatar, High-Resolution
2025 MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics NeurIPS 2025 Project Physics-Based, 3DGS, Garments
2025 PERSE: Personalized 3D Generative Avatars from A Single Portrait CVPR 2025 Project Personalized, 3DGS, Single Image
2025 SIE3D: Single-image Expressive 3D Avatar generation via Semantic Embedding and Perceptual Expression Loss ArXiv 2025 Project Expressive, Text-Driven, Single Image
2025 Dream3DAvatar: Text-Controlled 3D Avatar Reconstruction from a Single Image ArXiv 2025 Text-Driven, Single Image, 3DGS
2025 MAGE: A Multi-stage Avatar Generator with Sparse Observations ArXiv 2025 Avatar, AR/VR
2025 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting CVPR 2025 (Highlight🚀) Project AR
2025 Better Together: Unified Motion Capture and 3D Avatar Reconstruction ArXiv 2025
2025 WildAvatar: Learning In-the-wild 3D Avatars from the Web CVPR 2025 Code Project WildAvatar, Dataset
2025 2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting ICVRV 2024 2DGS
2025 Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior CVPR 2025 Project
2025 LAM: Large Avatar Model for One-shot Animatable Gaussian Head ArXiv 2025 Code Project
2025 Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars ArXiv 2025 Project
2025 LUCAS: Layered Universal Codec Avatars ArXiv 2025
2025 Hybrid Explicit Representation for Ultra-Realistic Head Avatars ArXiv 2025
2025 MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning AAAI 2025 NeRF
2025 Relightable Full-Body Gaussian Codec Avatars ArXiv 2025 Project Full-Body, Avatars
2025 Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance ArXiv 2025 Project Avatars, Single Image
2025 Disentangled Clothed Avatar Generation with Layered Representation ICCV 2025 (Highlight) Code Project
2025 L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild ICASSP 2025 Project
2025 Generating Editable Head Avatars with 3D Gaussian GANs ArXiv 2025 Code Project 3DGS
2024 FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model ArXiv 2024 3D Facial Animation, Expression Transfer, Foundation Model, Video-driven
2024 Universal Facial Encoding of Codec Avatars from VR Headsets SIGGRAPH 2024 Facial Encoding, VR Headset, Real-time Animation, 3D Avatar
2024 PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting ArXiv 2024 3D Gaussian Splatting, Head Avatar Animation, Point-based Shape Model, Real-time Rendering
2024 3D Gaussian Blendshapes for Head Avatar Animation ACM SIGGRAPH 2024 Gaussian splatting, blendshapes, head avatar, real-time rendering
2024 Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters ArXiv 2024 Co-speech
2024 GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians AAAI 2025 Code GNN-Generated, 3DGS
2024 CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models ArXiv 2024 Code Project Multi-View Diffusion
2024 3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling AAAI 2025 Code Project 3DGS
2024 StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors ArXiv 2024 Project
2024 Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models NeurIPS 2024 Project Diffusion
2024 GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion ArXiv 2024 Code Demo Project Diffusion
2024 SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing ArXiv 2024 Project NVIDIA, Hair and Clothing
2024 GASP: Gaussian Avatars with Synthetic Priors ArXiv 2024 Project
2024 MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting ArXiv 2024 Project 3DGS, 2D-3D
2024 PBDyG: Position Based Dynamic Gaussians for Motion-Aware Clothed Human Avatars ArXiv 2024 Clothed Avatar
2024 Topology-aware Human Avatars with Semantically-guided Gaussian Splatting ArXiv 2024
2024 AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models ArXiv 2024
2024 HHAvatar: Gaussian Head Avatar with Dynamic Hairs ArXiv 2024 Project Hair
2024 InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video ACCV 2024 Code
2024 GAST: Sequential Gaussian Avatars with Hierarchical Spatio-temporal Context ArXiv 2024
2024 Bundle Adjusted Gaussian Avatars Deblurring ArXiv 2024 Code
2024 DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models ArXiv 2024
2024 FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video ArXiv 2024 Project
2024 ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance ArXiv 2024 Project
2024 DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh ArXiv 2024
2024 DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction ArXiv 2024 Project
2024 EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars ArXiv 2024
2024 Towards Native Generative Model for 3D Head Avatar ArXiv 2024
2024 Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality IEEE 2024
2024 Stable Video Portraits ECCV 2024 Project Diffusion
2024 LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field ECCV'24 CADL Code
2024 DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion ArXiv 2024 Project
2024 Gaussian Déjà-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities WACV 2025 Code Project
2024 GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations SIGGRAPH Asia 2024 Project 🔥Gaussian Splatting
2024 Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control ArXiv 2024
2024 GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars ArXiv 2024
2024 DEGAS: Detailed Expressions on Full-Body Gaussian Avatars ArXiv 2024 🔥Gaussian Splatting
2024 CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning ArXiv 2024
2024 Expressive Whole-Body 3D Gaussian Avatar ECCV 2024 Project
2024 AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos ECCV 2024 Code Project
2024 XHand: Real-time Expressive Hand Avatar ArXiv 2024 Code Hand
2024 Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture ECCV 2024 Project
2024 CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images ECCV 2024 Code
2024 WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation ArXiv 2024 Code Project Dataset
2024 Instant 3D Human Avatar Generation using Image Diffusion Models ArXiv 2024 Project
2024 Gaussian Eigen Models for Human Heads ArXiv 2024 Project
2024 MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices ArXiv 2024 Real-Time
2024 Expressive Gaussian Human Avatars from Monocular RGB Video ArXiv 2024 Project
2024 Representing Animatable Avatar via Factorized Neural Fields ArXiv 2024
2024 Stratified Avatar Generation from Sparse Observations CVPR 2024 (Oral)
2024 NPGA: Neural Parametric Gaussian Avatars ArXiv 2024 Project
2024 E3Gen: Efficient, Expressive and Editable Avatars Generation ArXiv 2024 Code Project
2024 GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting Ongoing work Try-ON
2024 X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation ICML 2024 Code Project
2024 MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing ArXiv 2024 Code Project 🔥Gaussian Splatting
2024 Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos ArXiv 2024 Code Project 🔥Gaussian Splatting
2024 GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image CVPR 2024 Code Project Editing
2024 Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes CVPR 2024 Project Blendshapes
2024 SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting CVPR 2024 Code Project 🔥Gaussian Splatting
2024 MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space ArXiv 2024 Project
2024 HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior ArXiv 2024 🔥Gaussian Splatting
2024 UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling ArXiv 2024 Project 🔥Gaussian Splatting
2024 NECA: Neural Customizable Human Avatar CVPR 2024 Code
2024 V3D: Video Diffusion Models are Effective 3D Generators ArXiv 2024 Code Project 🔥Gaussian Splatting, Video
2024 DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization CVPR 2024 Code Project 🔥Gaussian Splatting, Sparse-View
2024 GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video ArXiv 2024 Project 🔥Gaussian Splatting, Avatar
2024 Magic-Me: Identity-Specific Video Customized Diffusion ArXiv 2024 Code Project
2024 HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting ArXiv 2024 🔥Gaussian Splatting, Avatar
2024 GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians ArXiv 2024 🔥Gaussian Splatting
2024 ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting ArXiv 2024 🔥Gaussian Splatting, Deepfake
2024 Consolidating Attention Features for Multi-view Image Editing ArXiv 2024 🔥Gaussian Splatting, Edit
2024 Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos ArXiv 2024 Project Portraits
2024 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes ArXiv 2024 Dynamic Scenes
2024 ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields NeurIPS 2023 Code Project 3D Edit
2024 Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation ArXiv 2024 Text to 3D
2024 CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion ArXiv 2024 Project 🔥Gaussian Splatting, Segmentation
2024 UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures ArXiv 2024 Project Diffusion, Avatar
2024 GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting ArXiv 2024 🔥Gaussian Splatting
2024 FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF ArXiv 2024 Code 4D face video editor
2024 AGG: Amortized Generative 3D Gaussians for Single Image to 3D ArXiv 2024 Project 🔥Gaussian Splatting
2024 Gaussian Shadow Casting for Neural Characters ArXiv 2024 🔥Gaussian Splatting
2024 Human101: Training 100+FPS Human Gaussians in 100s from 1 View ArXiv 2024 Code Project 🔥Gaussian Splatting
2024 Deformable 3D Gaussian Splatting for Animatable Human Avatars ArXiv 2024 🔥Gaussian Splatting
2024 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency ArXiv 2024 Code Project 🔥Gaussian Splatting
2024 What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs ArXiv 2024 Project
2024 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting ArXiv 2024 Code Project 🔥Gaussian Splatting
2024 Learning Dense Correspondence for NeRF-Based Face Reenactment AAAI 2024 one-shot multi-view face reenactment
2024 GaussianHead: Impressive 3D Gaussian-based Head Avatars with Dynamic Hybrid Neural Field ArXiv 2024 Code 🔥Gaussian Splatting
2024 MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar ArXiv 2024 🔥Gaussian Splatting
2024 Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians ArXiv 2024 Code Project
2024 HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting ArXiv 2024 🔥Gaussian Splatting
2024 GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians CVPR 2024 Code Project 🔥Gaussian Splatting
2024 VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer ICCV 2023
2023 AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text ArXiv 2023 Project Text-to-3D, NeRF, SMPL, Diffusion Model
2023 HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field ArXiv 2023 Neural Radiance Field, Facial Model Conditioning, 3D Head Avatar, Expression Control
2023 SD-NeRF: Towards Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs IEEE 2023 - -
2023 Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions ArXiv 2023
2023 GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation ArXiv 2023 Code Project -
2023 GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis ICLR 2023 Code Project -
2022 RAD-NeRF: Real-time Neural Talking Portrait Synthesis ArXiv 2022 Code Project InstantNGP
2022 DFRF: Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis ECCV 2022 Code Project
2022 NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation ArXiv 2022 Code Project -
2022 Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars ArXiv 2022 Code Project -
2022 3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation ArXiv 2022 Code Project -
2022 FNeVR: Neural Volume Rendering for Face Animation ArXiv 2022 Code - -
2022 ROME: Realistic One-shot Mesh-based Head Avatars ECCV 2022 Code Project -
2022 IMavatar: Implicit Morphable Head Avatars from Videos CVPR 2022 Code Project -
2022 HeadNeRF: A Real-time NeRF-based Parametric Head Model CVPR 2022 Code Project -
2022 Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation ArXiv 2022 Code Project -
2021 AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis ICCV 2021 Code Project -
2021 NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction CVPR 2021 Oral Code Project -
2021 DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering ArXiv 2021 Code - -

Conversational & Dialogue

Year Title Conference/Journal Code Project Keywords
2026 HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction ArXiv 2026 lip-synced avatars, real-time conversational AI, multimodal pipeline
2026 Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation ArXiv 2026 Interactive avatar, Diffusion forcing, Real-time, Preference optimization
2026 Talking Together: Synthesizing Co-Located 3D Conversations from Audio CVPR 2026 3D, Conversations, Talking Head, CVPR
2026 A²-LLM: An End-to-end Conversational Audio Avatar Large Language Model ArXiv 2026 Conversational, Avatar, LLM
2026 RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation ArXiv 2026 Multi-Turn Conversation, Talking Head
2025 ALIVE: An Avatar-Lecture Interactive Video Engine with Content-Aware Retrieval for Real-Time Interaction ArXiv 2025 neural talking-head synthesis, content-aware retrieval, real-time interaction, LLM
2025 TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation ArXiv 2025 text-driven, audio-visual, interactive dialogue, cross-modal mappers
2025 ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body ArXiv 2025 Project conversational agent, 3D avatar, multimodal interaction, joint language-motion
2025 Towards Interactive Intelligence for Digital Humans ArXiv 2025 interactive intelligence, digital human, multimodal embodiment, real-time interaction
2025 VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction ArXiv 2025 Listener dynamics, 3D dyadic conversation, Expressive control, Multi-modal conditions
2025 UniTalker: Conversational Speech-Visual Synthesis ACM MM 2025 Conversational, Multimodal, Emotion
2025 Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance ArXiv 2025 Project Dialogue Generation, Speech Language Models
2025 MultiTalk: Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ArXiv 2025 Multi-Person, Conversational
2025 DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations CVPR 2025 3D, Interaction, Dual-Speaker, Conversations, FLAME
2024 INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations ArXiv 2024 Dyadic Conversations
2024 Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction ACL 2024 Empathetic Dialogue
2024 MultiDialog: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation ACL 2024 Dataset Dialogue, Face-to-Face Conversation
2022 DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation ArXiv 2022 - - Dialogue, Face-to-face Conversation

Talking Body & Avatar

Year Title Conference/Journal Code Project Keywords
2026 MIBURI: Towards Expressive Interactive Gesture Synthesis ArXiv 2026 Project Gesture Synthesis, Real-Time, LLM-Conditioned
2026 3DGesPolicy: Phoneme-Aware Holistic Co-Speech Gesture Generation Based on Action Control ArXiv 2026 co-speech gesture, diffusion policy, phoneme-aware, holistic motion
2026 Mitigating Error Accumulation in Co-Speech Motion Generation via Global Rotation Diffusion and Multi-Level Constraints AAAI 2026 co-speech motion, global rotation diffusion, multi-level constraints, error accumulation
2026 SmoothSync: Dual-Stream Diffusion Transformers for Jitter-Robust Beat-Synchronized Gesture Generation from Quantized Audio ArXiv 2026 Gesture generation, Diffusion Transformer, Beat synchronization, Jitter suppression
2025 OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation ArXiv 2025 Project Cognitive Simulation, Avatar, Multimodal
2025 OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models ICCV 2025 Human Animation, Scaling
2025 Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation ArXiv 2025 Project Character Identity, Audio-Driven, Human Animation
2025 EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation CVPR 2025 Semi-Body, Human Animation
2025 InfinityHuman: Towards Long-Term Audio-Driven Human Animation ArXiv 2025 Project Long-Term, Hand Motion, Pose-Guided
2025 Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos ICCV 2025 Workshop Project Whole-Body Avatar, Benchmark
2025 JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 ICCV 2025 Workshop Project Whole-Body Avatar, Benchmark
2025 EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation ArXiv 2025 Multi-Modal, Multi-Task, Human Animation
2025 MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation ArXiv 2025 Real-time, Half-body Animation
2025 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation ArXiv 2025 Audio-Driven, Body Animation
2025 AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation ArXiv 2025 Preference Optimization, Human Animation
2025 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation ArXiv 2025 Human-Object Interaction, Human Animation
2025 HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters ArXiv 2025 Multi-Character, Human Animation
2025 AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars ArXiv 2025 Whole-Body, Diffusion, Avatar
2025 M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis NeurIPS 2025 Gesture, Full-Body
2025 PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model ArXiv 2025 Parts-Aware, Diffusion, Human Animation
2025 Versatile Multimodal Controls for Whole-Body Talking Human Animation ArXiv 2025 Whole-Body, Multimodal
2025 PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning ICCV 2025 Avatar, Whole-Body, Motion Generation
2025 GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation ICCV 2025 Project Co-speech Gesture Synthesis, Hybrid Modality Diffusion Transformer, Retrieval-Augmented Generation
2025 StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars ArXiv 2025 Project Real-Time Avatar, Streaming Diffusion, Interactive Human Avatars, Autoregressive Distillation
2025 Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots ArXiv 2025 co-speech gestures, humanoid robot, real-time control, semantic synthesis
2025 Evaluation of Generative Models for Emotional 3D Animation Generation in VR ArXiv 2025 speech-driven animation, emotional expression, 3D avatar, VR evaluation
2025 Towards Unified Co-Speech Gesture Generation via Hierarchical Implicit Periodicity Learning IEEE Transactions on Image Processing co-speech gesture generation, 3D motion, hierarchical periodicity, audio-driven
2025 Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents ArXiv 2025 Dyadic Interaction, Gesture Generation, LLM-driven
2025 Intentional Gesture: Deliver Your Intentions with Gestures for Speech ArXiv 2025 Project co-speech gesture, communicative intention, motion tokenizer, BEAT-2
2025 Democratizing High-Fidelity Co-Speech Gesture Video Generation ICCV 2025 Project Diffusion Model, Skeleton-Audio Fusion, Co-Speech Gesture, Video Generation
2025 Co-Speech Gesture and Facial Expression Generation for Non-Photorealistic 3D Characters SIGGRAPH 2025 Poster co-speech generation, non-photorealistic characters, facial expressions, gestures
2025 TRiMM: Transformer-Based Rich Motion Matching for Real-Time multi-modal Interaction in Digital Humans ArXiv 2025 Code Co-speech Gesture, Real-time, Transformer, Digital Humans
2025 Co3Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion ICLR 2025 Spotlight Project Concurrent co-speech gesture, Interactive diffusion, Two-person interaction, 3D gesture dataset
2025 CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild ArXiv 2025 Project 3D Gesture Generation, Diffusion Model, Audio-Driven
2025 EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation ArXiv 2025 Co-Speech Motion, Masked Modeling, Speech-Queried Attention
2025 EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model ArXiv 2025 Audio-Driven Video, Diffusion Model, Gesture Synthesis
2025 ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer ArXiv 2025 Project Co-Speech Motion, Transformer, Gesture Generation, Speech Synchronization
2025 SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain ArXiv 2025 Gesture Generation, LLM, Intent Chain, Co-speech
2025 DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech AAAI 2025 Project Gesture Generation, Diffusion Models, Real-time, Speech-driven
2025 Large Language Models for Virtual Human Gesture Selection AAMAS 2025 Gesture Selection, Large Language Models, Virtual Agents, Co-speech
2025 Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers ArXiv 2025 Co-speech gesture, Diffusion Transformers, VQ-VAEs, Audio-visual synthesis
2025 HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation ArXiv 2025 Project Co-speech gesture, Multimodal entanglement, Spatiotemporal graph, Audio-text semantic
2025 EMO2: End-Effector Guided Audio-Driven Avatar Video Generation ArXiv 2025 Audio-driven, Gesture Generation, Diffusion Model, End-effector Guidance
2024 Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation CVPR 2024 Project 3D Co-speech Gesture, Emotion Transition, Weakly-Supervised Learning, Virtual Avatar Animation
2023 C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model ArXiv 2023 Project Co-speech Gesture, Latent Diffusion, Temporal Dependency
2022 Freeform Body Motion Generation from Speech ArXiv 2022 Code co-speech motion, pose modes, rhythmic dynamics, speech prosody

Metrics

Metrics Paper Link
PSNR (peak signal-to-noise ratio) -
SSIM (structural similarity index measure) Image quality assessment: from error visibility to structural similarity.
CPBD (cumulative probability of blur detection) A no-reference image blur metric based on the cumulative probability of blur detection
LPIPS (Learned Perceptual Image Patch Similarity) The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
NIQE (Natural Image Quality Evaluator) Making a ‘Completely Blind’ Image Quality Analyzer
FID (Fréchet inception distance) GANs trained by a two time-scale update rule converge to a local Nash equilibrium
LMD (landmark distance error) Lip Movements Generation at a Glance
LRA (lip-reading accuracy) Talking Face Generation by Conditional Recurrent Adversarial Network
WER (word error rate) LipNet: end-to-end sentence-level lipreading
LSE-D (Lip Sync Error - Distance) Out of time: automated lip sync in the wild
LSE-C (Lip Sync Error - Confidence) Out of time: automated lip sync in the wild
ACD (average content distance) FaceNet: a unified embedding for face recognition and clustering
CSIM (cosine similarity) ArcFace: additive angular margin loss for deep face recognition
EAR (eye aspect ratio) Real-time eye blink detection using facial landmarks (Computer Vision Winter Workshop)
ESD (emotion similarity distance) What comprises a good talking-head video generation?: A Survey and Benchmark
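
Several of the metrics listed above reduce to short closed-form computations. The sketch below is a rough illustration, not the reference implementation from any of the linked papers. The function names are hypothetical, images are assumed to be flattened sequences of pixel values, the CSIM embeddings are assumed to come from a face recognition model (such as ArcFace) that is not included here, and the six eye landmarks are assumed to be ordered p1..p6, with p1 and p4 as the horizontal eye corners. Model-based metrics such as LPIPS, FID, LSE-D and LSE-C need pretrained networks and are not covered.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def psnr(reference: Sequence[float], generated: Sequence[float],
         max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """CSIM-style score between two embedding vectors (computed elsewhere)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|) for six eye landmarks."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

In practice, PSNR and SSIM are usually computed with an image library on full frames, and WER and LRA depend on the lip-reading model that produces the hypothesis transcript, so scores from this sketch are only comparable when the inputs are prepared the same way.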

Related Papers on Metrics

Year Paper Conference Keywords
2025 NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results ArXiv 2025 quality assessment, talking head, challenge, THQA-NTIRE

Tools & Software

Tool/Resource Description
LUCIA Development of a MPEG-4 Talking Head Engine. 💻
Yepic Studio Create and dub talking head-style videos in minutes without expensive equipment. 🎥
Mel McGee's Talkbots A complete multi-browser, multi-platform talking head application in SVG suitable for web sites or as an avatar. 🗣️
face3D_chung Create 3D character avatar head objects with texture from a single photo for your games. 🎮
CrazyTalk Exciting features for 3D head creation and automation. 🤪
tts avatar free download - SourceForge Mel McGee's Talkbots is a complete multi-browser, multi-platform talking head. (🔧👄)
Verbatim AI - Product Information, Latest Updates, and Reviews 2023 A simple yet powerful API to generate AI "talking head" videos in near real-time with Verbatim AI. Add interest, intrigue, and dynamism to your chat bots! (🔧👄)
Best Open Source BASIC 3D Modeling Software Includes talk3D_chung, a small example using obj models created with face3D_chung, and speak3D_chung_dll, a dll to load and display face3D_chung talking avatars. (🛠️🎭)
DVDStyler / Discussion / Help: ffmpeg-vbr or internal Talking heads would get a bitrate which is unnecessarily high while using DVDStyler. (🛠️👄)
puffin web browser free download - SourceForge Mel McGee's Talkbots is a complete multi-browser, multi-platform talking head. (🔧👄)
12 best AI video generators to use in 2023 Free and paid |Product ... Whether you’re an entrepreneur, small business owner, or run a large company, AI video generators make it super easy to create high-quality videos from scratch. (🔧🎥)

Slides & Presentations

Presentation Title Description
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Presentation reviewing the few-shot adversarial learning of realistic neural talking head models.
Nethania Michelle's Character PPT: Presentation discussing the improvement of a 3D talking head for use in an avatar of a virtual meeting room.
Presenting you: Top tips on presenting with Prezi Video – Prezi Article providing top tips for presenting with Prezi Video.
Research Presentation PPT: Resident Research Presentation Slide Deck.
Adding narration to your presentation (using Prezi Video) – Prezi Learn how to add narration to your Prezi presentation with Prezi Video.

References

Website Description
arXiv Provides preprints in various academic fields, serving as an important platform for accessing the latest research findings.
CVF Open Access The Computer Vision Foundation's open-access platform, offering open-access papers from top conferences such as CVPR, ICCV, ECCV, and more.
Papers with Code A platform that aggregates research papers with accompanying code implementations, making it convenient to find the latest research findings and their corresponding implementations.
ICCV - International Conference on Computer Vision A leading computer vision conference presenting the latest research in the field.
ECCV - European Conference on Computer Vision A leading European computer vision conference, providing the latest research results and related information.
CVPR - Conference on Computer Vision and Pattern Recognition One of the top conferences in computer vision, showcasing numerous important research findings.

Star History

Star History Chart
