| 2026 |
EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation |
Preprint |
|
|
Gaussian Splatting, 3DGS, Audio-Driven, Talking Head |
| 2026 |
TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation |
ArXiv 2026 |
Code |
Project |
Diffusion, Audio-Driven, Talking Head, VAE, Latent |
| 2026 |
UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios |
ArXiv 2026 |
|
|
Lip Sync, Pose-Anchored, Generalizable |
| 2026 |
FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation |
ArXiv 2026 |
|
|
Audio-Driven, Portrait Animation, Reinforcement Learning, GRPO |
| 2026 |
UniTalking: A Unified Audio-Video Framework for Talking Portrait Generation |
CVPR 2026 |
|
|
Audio-Driven, Portrait Animation, Talking Head, CVPR, Transformer, Attention |
| 2026 |
AUHead: Realistic Emotional Talking Head Generation via Action Units Control |
ArXiv 2026 |
Code |
|
Action Units, Audio-Driven Generation, Emotion Control, Diffusion Model |
| 2026 |
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models |
ArXiv 2026 |
|
|
3D Facial Animation, Speech-Driven, Omni-modal LLMs, Token-as-query Fusion |
| 2026 |
MOVA: Towards Scalable and Synchronized Video-Audio Generation |
ArXiv 2026 |
Code |
Project |
Audio-Driven |
| 2026 |
VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars |
ArXiv 2026 |
Code |
Project |
Avatar, Talking Head |
| 2026 |
3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars |
ArXiv 2026 |
|
|
3D, Emotional, Lip Sync, Avatar, Talking Head, Transformer |
| 2026 |
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation |
ArXiv 2026 |
|
Project |
Audio-Driven, Transformer, Attention |
| 2026 |
VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction |
ArXiv 2026 |
|
|
Audio-Driven, Talking Head |
| 2026 |
Narrating For You: Prompt-guided Audio-visual Narrating Face Generation Employing Multi-entangled Latent Space |
WACV 2026 |
|
|
Audio-Driven, WACV, Latent |
| 2026 |
SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads |
ArXiv 2026 |
|
|
Real-time, Streaming, Talking Head |
| 2026 |
Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation |
ArXiv 2026 |
|
|
Audio-Driven |
| 2026 |
JoyAvatar: Unlocking Highly Expressive Avatars via Harmonized Text-Audio Conditioning |
ArXiv 2026 |
|
Project |
Audio-Driven, Avatar |
| 2026 |
LPIPS-AttnWav2Lip: Generic Audio-Driven Lip Synchronization for Talking Head Generation in the Wild |
ArXiv 2026 |
Code |
Project |
Lip Sync, Audio-Driven, Talking Head, Latent |
| 2026 |
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion |
ArXiv 2026 |
|
|
Audio-Visual Diffusion, LoRA, Lip Sync |
| 2026 |
MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control |
ArXiv 2026 |
|
|
Personalized Avatars, Lip Sync, Style Disentanglement, Diffusion Model |
| 2026 |
EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers |
ArXiv 2026 |
|
Project |
Diffusion, Audio-Driven, Talking Head |
| 2026 |
SkyReels-V3 Technique Report |
ArXiv 2026 |
Code |
|
Video Generation, Audio-Guided, Talking Avatar, Diffusion Transformers |
| 2026 |
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes |
ArXiv 2026 |
|
|
Talking Head, Movie Dubbing |
| 2026 |
Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation |
ICASSP 2026 |
|
Project |
3D, Emotional, Talking Head, ICASSP, Attention |
| 2026 |
Audio-Driven Talking Face Generation with Blink Embedding and Hash Grid Landmarks Encoding |
ArXiv 2026 |
|
|
Audio-Driven, Talking Head, Transformer |
| 2026 |
Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos |
ArXiv 2026 |
|
|
Talking Head |
| 2026 |
EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing |
ArXiv 2026 |
|
|
3D, Speech-Driven |
| 2026 |
MoCha: End-to-End Video Character Replacement without Structural Guidance |
ArXiv 2026 |
|
|
Talking Head |
| 2026 |
Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation |
ACM Trans. Multimedia |
|
|
Speech-Driven, Talking Head |
| 2026 |
ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting |
ArXiv 2026 |
|
|
3D, Gaussian Splatting, 3DGS, Emotional, Audio-Driven |
| 2026 |
SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation |
ArXiv 2026 |
|
|
Real-time, Streaming, Audio-Driven, Avatar, Attention, VAE |
| 2026 |
DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model |
ArXiv 2026 |
|
Project |
Streaming, Talking Head, Flow Matching |
| 2026 |
SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the wild |
ArXiv 2026 |
|
Project |
Transformer |
| 2026 |
Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face |
ArXiv 2026 |
Code |
Project |
3D, Talking Head, Attention |
| 2026 |
REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation |
ArXiv 2026 |
|
|
Diffusion, Real-time, Streaming, Talking Head, Latent |
| 2026 |
JoyAvatar-Flash: Real-time and Infinite Audio-Driven Avatar Generation with Autoregressive Diffusion |
ArXiv 2026 |
|
|
Diffusion, Real-time, Audio-Driven, Avatar |
| 2026 |
Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation |
ArXiv 2026 |
|
|
Talking Head, Attention, Latent |
| 2025 |
From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing |
ArXiv 2025 |
|
Project |
Visual dubbing, Diffusion Transformer, Self-bootstrapping, Lip sync |
| 2025 |
The Locally Deployable Virtual Doctor: LLM Based Human Interface for Automated Anamnesis and Database Conversion |
ArXiv 2025 |
|
|
Conditional Diffusion, Facial Animation, Audio-Visual Synchronization, LLM-Based Avatar |
| 2025 |
Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation |
ICXR 2025 |
|
|
Blendshapes, FLAME, Disentanglement, 3D Animation |
| 2025 |
Revising Second Order Terms in Deep Animation Video Coding |
ArXiv 2025 |
|
|
FOMM, Keypoints, Head Rotation, Motion Model |
| 2025 |
A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages |
ArXiv 2025 |
|
|
Phoneme-Viseme Alignment, Multilingual TFS, Mixture-of-Experts |
| 2025 |
Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation |
ArXiv 2025 |
|
Project |
Diffusion models, co-speech video, real-time, sparse attention |
| 2025 |
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis |
ArXiv 2025 |
|
|
Multimodal Instructions, Avatar Synthesis, Lip Synchronization |
| 2025 |
A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis |
ArXiv 2025 |
|
|
Voice Cloning, Lip Sync Synthesis, Noisy Speech |
| 2025 |
CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation |
ArXiv 2025 |
|
|
Cross-emotion memory, audio emotion enhancement, expression displacement, lip sync |
| 2025 |
Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching |
Technical Report |
|
Project |
real-time, flow matching, lip-sync |
| 2025 |
ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion |
ArXiv 2025 |
Code |
|
Diffusion, Landmarks-Guide, Real-time, Identity Preservation |
| 2025 |
MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation |
ArXiv 2025 |
|
|
3DMM, Diffusion Transformer, Temporal Consistency, Blinking Dynamics |
| 2025 |
Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning |
ArXiv 2025 |
|
|
Joint uncertainty learning, Audio-driven talking face, Lip sync, Visual uncertainty |
| 2025 |
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation |
TMM 2025 |
|
|
Talking Head Animation, Temporal Correlation, One-Shot |
| 2025 |
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters |
ArXiv 2025 |
|
|
Neural Radiance Fields, Expression Parameters, Emotion Control, Audio-Driven |
| 2025 |
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing |
ICME 2025 |
Code |
|
Spatial-Temporal Alignment, Semantic Features, Visual Dubbing, Stability |
| 2025 |
Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait |
ArXiv 2025 |
Code |
|
implicit keypoint, spatiotemporal diffusion, audio-driven, talking portrait |
| 2025 |
EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models |
ArXiv 2025 |
|
Project |
3D facial animation, latent diffusion, emotional expression, speech-driven |
| 2025 |
Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation |
ArXiv 2025 |
|
|
Audio-driven, Diffusion model, Motion Diffusion Transformer, Lip sync |
| 2025 |
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation |
ArXiv 2025 |
|
|
Audio-driven, Emotion Synthesis, Mixture of Experts, Portrait Animation |
| 2025 |
PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment |
ArXiv 2025 |
|
|
3D, Speech-Driven, Talking Head, Attention |
| 2025 |
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation |
ArXiv 2025 |
|
Project |
Diffusion, Real-time, Portrait Animation, Attention |
| 2025 |
FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs |
ArXiv 2025 |
|
|
Diffusion, Transformer, GAN, Latent |
| 2025 |
In-Context Audio Control of Video Diffusion Transformers |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, Transformer, Attention |
| 2025 |
SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation |
ArXiv 2025 |
|
|
Portrait Animation, ICASSP, Attention |
| 2025 |
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction |
ArXiv 2025 |
|
|
Portrait Animation, Transformer, Latent |
| 2025 |
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image |
NeurIPS 2025 |
|
|
3D, Gaussian Splatting, Audio-Driven, Avatar |
| 2025 |
TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation |
ArXiv 2025 |
|
Project |
Audio-Driven, VAE, Latent |
| 2025 |
FacEDiT: Unified Talking Face Editing and Generation via Facial Motion Infilling |
ArXiv 2025 |
|
Project |
Talking Head, Transformer, Attention, Flow Matching |
| 2025 |
STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits |
ArXiv 2025 |
|
Project |
Diffusion, Portrait Animation, Talking Head |
| 2025 |
JoVA: Unified Multimodal Learning for Joint Video-Audio Generation |
ArXiv 2025 |
|
Project |
Audio-Driven, Transformer, Attention, GAN |
| 2025 |
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model |
Tech Report |
|
|
Audio-Driven, Transformer, Reinforcement Learning |
| 2025 |
FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint |
ArXiv 2025 |
|
Project |
Portrait Animation, Transformer, Latent |
| 2025 |
KeyframeFace: From Text to Expressive Facial Keyframes |
ArXiv 2025 |
Code |
Project |
Talking Head |
| 2025 |
PersonaLive! Expressive Portrait Image Animation for Live Streaming |
ArXiv 2025 |
|
|
Streaming, Portrait Animation |
| 2025 |
GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting |
WACV 2026 |
|
|
3D, Gaussian Splatting, 3DGS, Audio-Driven, Talking Head |
| 2025 |
UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking |
ArXiv 2025 |
|
|
Audio-Driven, Avatar |
| 2025 |
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length |
ArXiv 2025 |
|
|
Real-time, Streaming, Audio-Driven, Avatar |
| 2025 |
EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans |
ArXiv 2025 |
|
|
Portrait Animation, Talking Head |
| 2025 |
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement |
ArXiv 2025 |
|
Project |
Talking Head, Transformer, Attention |
| 2025 |
AI killed the video star. Audio-driven diffusion model for expressive talking head generation |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, Talking Head, Transformer |
| 2025 |
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer |
ArXiv 2025 |
|
|
Audio-Driven, Talking Head, Attention, Latent |
| 2025 |
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy |
ArXiv 2025 |
|
|
Audio-Driven, Attention, Latent |
| 2025 |
StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model |
ArXiv 2025 |
|
Project |
3D, Diffusion, Streaming, Audio-Driven |
| 2025 |
ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search |
AAAI 2026 |
|
|
Diffusion, Talking Head, AAAI, Knowledge Distillation |
| 2025 |
GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow |
ArXiv 2025 |
|
|
Talking Head |
| 2025 |
Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis |
ArXiv 2025 |
|
|
Audio-Driven, Latent |
| 2025 |
THEval. Evaluation Framework for Talking Head Video Generation |
ArXiv 2025 |
|
|
Talking Head |
| 2025 |
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions |
ArXiv 2025 |
|
|
Audio-Driven, Transformer, Attention, Latent |
| 2025 |
Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, AAAI, Transformer |
| 2025 |
MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control |
ArXiv 2025 |
|
|
Audio-Driven, Talking Head, Identity Control |
| 2025 |
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement |
TASLP 2025 |
|
|
High-Resolution, Talking Faces, Speech-to-Face, Diffusion |
| 2025 |
LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation |
SIGGRAPH Asia 2025 |
Code |
|
Label-Free, Speech-Driven, Facial Animation, FLAME |
| 2025 |
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis |
ArXiv 2025 |
|
|
Disentangled Motion, Flow Matching, Talking Portrait, Controllable |
| 2025 |
SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation |
ArXiv 2025 |
|
|
Contrastive Masked Pretraining, Audio-Visual, Talking-Face |
| 2025 |
EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation |
IEEE SMC 2025 |
|
|
Real-Time, Audio-Driven, Gaussian Deformation, Talking Head |
| 2025 |
Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing |
ArXiv 2025 |
|
|
Biometric Leakage, AI Videoconferencing, Security |
| 2025 |
Audio Driven Real-Time Facial Animation for Social Telepresence |
SIGGRAPH Asia 2025 |
|
Project |
Real-time, Audio-Driven, SIGGRAPH, Transformer, Latent |
| 2025 |
Warm Chat: Diffuse Emotion-aware Interactive Talking Head Avatar with Tree-Structured Guidance |
ArXiv 2025 (Withdrawn) |
|
|
Emotional, Avatar, Talking Head, Transformer, Latent |
| 2025 |
EmoCAST: Emotional Talking Portrait via Emotive Text Description |
ArXiv 2025 |
Code |
Project |
Emotional, Portrait Animation, Talking Head, Attention |
| 2025 |
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation |
ArXiv 2025 |
|
Project |
Diffusion, Real-time, Audio-Driven, Talking Head, Transformer, VAE |
| 2025 |
MOSPA: Human Motion Generation Driven by Spatial Audio |
NeurIPS 2025 |
Code |
|
Spatial Audio, Human Motion Generation, Virtual Human |
| 2025 |
DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads |
ICCV 2025 |
|
Project |
Gaussian, Latent Space |
| 2025 |
Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation |
ACM MM 2025 |
|
|
Depth-based |
| 2025 |
PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles |
ACM MM 2025 |
|
|
FLAME |
| 2025 |
GOES: 3D Gaussian-based One-shot Head Animation with Any Emotion and Any Style |
ACM MM 2025 |
|
|
One-Shot, 3DGS |
| 2025 |
StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing |
ArXiv 2025 |
|
Project |
Visual Dubbing, Diffusion, Mamba-Transformer |
| 2025 |
KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation |
ArXiv 2025 |
|
|
Keyframe, Diffusion, Dual-Path, Facial Animation |
| 2025 |
SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding |
WACV 2026 |
|
Project |
Multi-Modal, Emotion-Aware, LLM |
| 2025 |
Talking Head Generation via AU-Guided Landmark Prediction |
ArXiv 2025 |
|
|
Action Units, Landmark Prediction, Diffusion |
| 2025 |
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation |
ArXiv 2025 |
|
Project |
3D Facial Animation, Diffusion, Editing, Speech-Driven |
| 2025 |
PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control |
ICONIP 2025 |
|
|
3DGS, Real-Time, Pixel-Aware, Audio-Driven |
| 2025 |
Beat on Gaze: Learning Stylized Generation of Gaze and Head Dynamics |
ArXiv 2025 |
|
|
Gaze Control, Head Motion, Style-Aware, 3D |
| 2025 |
Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation |
ArXiv 2025 |
|
|
Singing-Driven, 3D Head, Diffusion |
| 2025 |
Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars |
ArXiv 2025 |
|
|
Audio-driven Realistic Facial Animation, Digital Avatars |
| 2025 |
DisenEmo: Learning disentangled emotional representation from facial motion for 3D talking head generation |
ICIP 2025 |
|
|
Disentangled Emotional Representation, 3D Talking Head Generation |
| 2025 |
ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation |
IJCAI 2025 |
|
|
Adaptive Disentanglement, Refined Alignment, 3D Facial Animation |
| 2025 |
SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Feature |
IJCAI 2025 |
|
|
Stable 3D Gaussian-Based Talking Head Generation, Enhanced Lip Sync, Discriminative Speech Feature |
| 2025 |
Wan-S2V: Audio-Driven Cinematic Video Generation |
ArXiv 2025 |
|
|
Cinematic, Audio-Driven, Video Generation |
| 2025 |
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing |
ArXiv 2025 |
|
|
Sparse-Frame Dubbing, Full-Body |
| 2025 |
D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis |
ECAI 2025 |
|
|
Few-Shot, 3DGS, Deformation Fields |
| 2025 |
RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis |
ICCV 2025 Workshop |
|
|
Emotion, NeRF, VAE |
| 2025 |
FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation |
ArXiv 2025 |
|
Project |
Audio-Driven, Portrait Animation, Preference Optimization |
| 2025 |
HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis |
ArXiv 2025 |
|
|
Hybrid Motion, High-Fidelity, Talking Head |
| 2025 |
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation |
ArXiv 2025 |
Code |
Project |
Stable Diffusion |
| 2025 |
X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio |
ArXiv 2025 |
|
Project |
Emotional Portrait, Long-range, Audio-driven |
| 2025 |
DICE-Talk: Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation |
ACM MM 2025 |
|
|
Emotional Portrait, Identity Preservation, Emotion Cooperation |
| 2025 |
M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation |
ArXiv 2025 |
|
Project |
Multi-granular Motion, Decoupling, Optimization |
| 2025 |
MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding |
ArXiv 2025 |
Code |
|
Multimodal, 3D Facial Animation, Dynamic Emotions |
| 2025 |
KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features |
ACM MM 2025 |
|
|
Deepfake Detection, Audio-Visual, SSL |
| 2025 |
DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation |
ArXiv 2025 |
|
Project |
DiT, Portrait Animation, Speaking Styles |
| 2025 |
Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation |
Interspeech 2025 |
Project |
|
Phonetic Context, Viseme, 3D Facial Animation |
| 2025 |
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation |
ACM MM 2025 |
|
|
Spatial Audio, Video Generation, MLLM |
| 2025 |
Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos |
IEEE IJCB 2025 |
|
|
Biometric Verification, Avatar Security, Facial Motion |
| 2025 |
Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation |
ArXiv 2025 |
|
|
Mask-Free, Identity Preservation, Audio-driven |
| 2025 |
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization |
ICCV 2025 |
|
Project |
Personalized, 3D Facial Animation, Memory |
| 2025 |
Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System |
ICMI 2025 |
Code |
|
Real-time, Nodding Generation, Avatar Interaction |
| 2025 |
MoDA: Multi-modal Diffusion Architecture for Talking Head Generation |
ArXiv 2025 |
|
Project |
Multi-modal, Diffusion, Talking Head Generation |
| 2025 |
GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation |
ICCV 2025 |
Code |
Project |
3D Talking Head, Gaussian Priors, Identity Adaptation |
| 2025 |
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases |
ArXiv 2025 |
|
|
Identity Leakage, Extreme Cases |
| 2025 |
Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field |
ArXiv 2025 |
|
|
Few-Shot, Global Gaussian Field, 3DGS |
| 2025 |
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching |
ArXiv 2025 |
|
|
Flow Matching, Audio-Motion |
| 2025 |
ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model |
ArXiv 2025 |
|
|
Autoregressive, FLAME, 3D |
| 2025 |
Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos |
ICMR 2025 |
|
|
Compression, Low-Bitrate |
| 2025 |
SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting |
ArXiv 2025 |
|
|
3DGS, Synchronization |
| 2025 |
Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space |
ICME 2025 |
|
|
3D, Diffusion, Multimodal |
| 2025 |
EmoVOCA: Speech-Driven Emotional 3D Talking Heads |
WACV 2025 |
|
|
Emotional, 3D, VOCA |
| 2025 |
Lipschitz-Driven Noise Robustness in VQ-AE for High-Frequency Texture Repair in ID-Specific Talking Heads |
ArXiv 2025 |
|
|
Noise Robustness, VQ-AE, High-Frequency |
| 2025 |
LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models |
ArXiv 2025 |
|
|
Low-Latency, Real-Time, Interactive |
| 2025 |
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation |
CVPR 2025 |
|
|
Global Audio Perception, Portrait Animation |
| 2025 |
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning |
ArXiv 2025 |
|
|
LLM, Reliability |
| 2025 |
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation |
CVPR 2025 |
|
|
Adversarial Defense, Privacy |
| 2025 |
Cocktail-Party Audio-Visual Speech Recognition |
Interspeech 2025 |
|
|
Audio-Visual Speech Recognition |
| 2025 |
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models |
ArXiv 2025 |
|
|
Real-Time, Autoregressive Diffusion |
| 2025 |
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation |
ArXiv 2025 |
|
|
Co-Speech Gesture, Two-Stage |
| 2025 |
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow |
ICASSP 2025 |
|
|
Video-to-Speech, Speech Decomposition |
| 2025 |
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos |
CVPR 2025 |
|
|
3D-aware, Video Diffusion |
| 2025 |
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements |
ArXiv 2025 |
|
|
Voice Conversion, Survey |
| 2025 |
FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing |
ArXiv 2025 |
|
|
Attribute Editing, Interactive |
| 2025 |
Video Editing for Audio-Visual Dubbing |
ArXiv 2025 |
|
|
Video Editing, Dubbing |
| 2025 |
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation |
CVPR 2025 |
|
|
3D, Semantic Decoupling |
| 2025 |
Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion |
ArXiv 2025 |
|
|
Diffusion, 3D |
| 2025 |
VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback |
ArXiv 2025 |
|
|
SDK, LLM, Real-time |
| 2025 |
OT-Talk: Animating 3D Talking Head with Optimal Transportation |
ArXiv 2025 |
|
|
FLAME, 3D |
| 2025 |
GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting |
CVPRW 2025 |
|
|
3DGS |
| 2025 |
Model See Model Do: Speech-Driven Facial Animation with Style Control |
SIGGRAPH 2025 |
|
|
Speech-Driven, Facial Animation, Style Control |
| 2025 |
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing |
ArXiv 2025 |
|
|
LLM, Qwen |
| 2025 |
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution |
ArXiv 2025 |
|
|
Lip Sync, High-Resolution |
| 2025 |
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis |
ArXiv 2025 |
|
|
Diffusion |
| 2025 |
FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis |
ICMR 2025 |
|
|
Real-time, Audio-Driven, Talking Portrait |
| 2025 |
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices |
CVPR 2025 |
|
|
100+fps |
| 2025 |
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, State Space Model, Talking Head |
| 2025 |
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation |
CVPR 2025 |
|
|
Fast Diffusion 12.5X speedup |
| 2025 |
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency |
ICLR 2025 |
|
|
Audio-Driven, Portrait Animation, Long-Term Motion |
| 2025 |
Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance |
ArXiv 2025 |
|
|
Portrait Editing, Temporal Consistency, Trajectory Guidance |
| 2025 |
Audio-driven Gesture Generation via Deviation Feature in the Latent Space |
ArXiv 2025 |
|
|
Gesture |
| 2025 |
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics |
CVPR 2025 |
|
|
3D, Talking Head, Evaluation Metrics |
| 2025 |
MGGTalk: Monocular and Generalizable Gaussian Talking Head Animation |
CVPR 2025 |
Project |
|
One Shot, 3DGS |
| 2025 |
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance |
ArXiv 2025 |
|
|
Dubbing |
| 2025 |
Dual Audio-Centric Modality Coupling for Talking Head Generation |
ArXiv 2025 |
|
|
NeRF |
| 2025 |
Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis |
ArXiv 2025 |
|
|
3DGS |
| 2025 |
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers |
CVPR 2025 |
|
|
DiT |
| 2025 |
DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model |
ICME 2025 |
|
|
Cross-lingual, Diffusion, Talking Head |
| 2025 |
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation |
CVPR 2025 |
|
|
Autoregressive |
| 2025 |
DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation |
ICME 2025 |
|
|
Diffusion, 3D |
| 2025 |
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation |
CVPR 2025 |
|
|
|
| 2025 |
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation |
CVPR 2025 |
|
|
Hunyuan |
| 2025 |
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling |
ArXiv 2025 |
|
|
|
| 2025 |
StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation |
ArXiv 2025 |
|
|
3D |
| 2025 |
LatentSync: Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision |
ArXiv 2025 |
|
|
|
| 2025 |
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice |
ArXiv 2025 |
|
|
|
| 2025 |
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation |
CVPR 2025 |
|
|
Diffusion, Long Sequences |
| 2025 |
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture |
CVPR 2025 |
|
|
Texture |
| 2025 |
AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation |
IEEE Transactions on Multimedia |
|
Project |
MLoRA, Personalized |
| 2025 |
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video |
CVPR 2025 |
|
|
Few Shot, 3DGS |
| 2025 |
FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model |
ArXiv 2025 |
|
|
Diffusion |
| 2025 |
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis |
ICASSP 2025 |
|
|
|
| 2025 |
Emotional Face-to-Speech |
ArXiv 2025 |
|
|
emotion, face2speech |
| 2025 |
EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis |
ArXiv 2025 |
|
|
emotion, 3DGS |
| 2025 |
EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation |
ArXiv 2025 |
|
|
emotion, 3D |
| 2025 |
Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding |
ICASSP 2025 |
|
|
Viseme |
| 2025 |
SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation |
ArXiv 2025 |
|
|
Human Pose |
| 2025 |
JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing |
ArXiv 2025 |
|
|
Depth, JD work |
| 2025 |
Identity-Preserving Video Dubbing Using Motion Warping |
ArXiv 2025 |
|
|
Video Dubbing |
| 2025 |
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition |
ICASSP 2025 |
|
|
VSR |
| 2025 |
DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis |
ICASSP 2025 |
|
|
Hair-Preserving |
| 2025 |
UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control |
ArXiv 2025 |
|
|
SD, Lighting control |
| 2024 |
Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis |
ArXiv 2024 |
|
|
Audio Feature Extraction, Whisper, Real-time processing, Talking portrait synthesis |
| 2024 |
PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation |
ArXiv 2024 |
|
Project |
Pose Latent Diffusion, Lip Synchronization, Text-Audio Control |
| 2024 |
One-Shot Pose-Driving Face Animation Platform |
ArXiv 2024 |
|
|
One-Shot, Pose-Driving, Face Animation, Talking Head |
| 2024 |
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization |
ArXiv 2024 |
|
|
Normalizing Flow, Vector-Quantization, Lip Sync, Emotional Talking Faces |
| 2024 |
VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization |
ArXiv 2024 |
|
|
visemes, codebook |
| 2024 |
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis |
AAAI 2025 |
|
|
Point Cloud, Gaussian Splatting |
| 2024 |
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion |
CVPR 2025 |
|
Project |
Emotion, Expressive, Diffusion |
| 2024 |
GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression |
AAAI 2025 |
|
|
Gaze-oriented |
| 2024 |
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing |
CVPR 2025 |
|
|
Emotion, Dubber |
| 2024 |
PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation |
ArXiv 2024 |
|
|
Diffusion, Attention, One-Shot |
| 2024 |
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation |
AAAI 2025 |
|
|
3D face, FLAME, Emotion |
| 2024 |
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync |
ArXiv 2024 |
|
|
Diffusion, SyncNet |
| 2024 |
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait |
ICCV 2025 |
|
Project |
Flow Matching |
| 2024 |
SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model |
ArXiv 2024 |
|
|
Diffusion, Style |
| 2024 |
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming |
Tech Report |
|
|
Omni!!! |
| 2024 |
Controllable Talking Face Generation by Implicit Facial Keypoints Editing |
ArXiv 2024 |
|
|
Face Edit |
| 2024 |
SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation |
ArXiv 2024 |
|
|
|
| 2024 |
LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis |
ArXiv 2024 |
|
|
NeRF |
| 2024 |
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation |
ArXiv 2024 |
|
|
Memory |
| 2024 |
IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation |
ArXiv 2024 |
|
|
Motion Diffusion Model |
| 2024 |
Memories are One-to-Many Mapping Alleviators in Talking Face Generation |
IEEE 2024 |
|
|
Memory |
| 2024 |
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis |
ArXiv 2024 |
|
|
Diffusion |
| 2024 |
GaussianSpeech: Audio-Driven Gaussian Avatars |
ArXiv 2024 |
|
|
3DGS, 3D |
| 2024 |
LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis |
ArXiv 2024 |
|
|
|
| 2024 |
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion |
ArXiv 2024 |
|
|
|
| 2024 |
S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis |
ECCV 2024 |
|
|
|
| 2024 |
LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space |
ArXiv 2024 |
|
|
Fine-Grained Emotion |
| 2024 |
JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation |
ArXiv 2024 |
|
|
Diffusion, VASA |
| 2024 |
JoyHallo: Digital Human Model for Mandarin |
ArXiv 2024 |
|
|
Diffusion, Hallo |
| 2024 |
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation |
ICLR 2025 |
|
|
Diffusion, Hallo |
| 2024 |
Audio-Driven Emotional 3D Talking-Head Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts |
ArXiv 2024 |
|
|
|
| 2024 |
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization |
ArXiv 2024 |
|
|
|
| 2024 |
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation |
ArXiv 2024 |
|
|
Non-autoregressive Diffusion |
| 2024 |
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details |
ArXiv 2024 |
|
|
|
| 2024 |
Diverse Code Query Learning for Speech-Driven Facial Animation |
ArXiv 2024 |
|
|
|
| 2024 |
TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans |
ECCVW 2024 |
|
|
NeRF |
| 2024 |
ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE |
SIGGRAPH MIG 2024 |
|
|
3D |
| 2024 |
JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation |
BMVC 2024 |
|
|
NeRF |
| 2024 |
3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy |
ArXiv 2024 |
|
|
|
| 2024 |
LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation |
ArXiv 2024 |
|
|
|
| 2024 |
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads |
TPAMI 2024 |
|
|
|
| 2024 |
DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures |
ArXiv 2024 |
|
|
diffusion |
| 2024 |
EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion |
ArXiv 2024 |
|
|
Diffusion |
| 2024 |
PersonaTalk: Bring Attention to Your Persona in Visual Dubbing |
SIGGRAPH Asia 2024 |
|
|
|
| 2024 |
KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation |
ArXiv 2024 |
|
|
KAN |
| 2024 |
TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation |
ArXiv 2024 |
|
|
LoRA |
| 2024 |
Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control |
ArXiv 2024 |
|
|
|
| 2024 |
G3FA: Geometry-guided GAN for Face Animation |
BMVC 2024 |
|
|
|
| 2024 |
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation |
ArXiv 2024 |
|
|
|
| 2024 |
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation |
ArXiv 2024 |
|
|
|
| 2024 |
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model |
IEEE TIP |
|
|
|
| 2024 |
Style-Preserving Lip Sync via Audio-Aware Style Reference |
IEEE TIP |
|
|
|
| 2024 |
Talk to the Wall: The Role of Speech Interaction in Collaborative Visual Analytics |
IEEE TVCG 2024 |
|
|
Collaborative |
| 2024 |
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation |
ArXiv 2024 |
|
|
Co-Speech Gesture |
| 2024 |
GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer |
ArXiv 2024 |
|
|
|
| 2024 |
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model |
ArXiv 2024 |
|
|
|
| 2024 |
DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework |
ArXiv 2024 |
|
|
|
| 2024 |
What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models |
ACL Wordplay 2024 |
|
|
|
| 2024 |
LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement |
ArXiv 2024 |
|
|
|
| 2024 |
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network |
ArXiv 2024 |
|
|
|
| 2024 |
Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation |
ArXiv 2024 |
|
|
|
| 2024 |
JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model |
ArXiv 2024 |
|
|
3D |
| 2024 |
Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs |
COLM 2024 |
|
|
LLM |
| 2024 |
Digital Avatars: Framework Development and Their Evaluation |
ArXiv 2024 |
|
|
|
| 2024 |
EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head |
ECCV 2024 |
|
|
|
| 2024 |
PAV: Personalized Head Avatar from Unstructured Video Collection |
ECCV 2024 |
|
|
|
| 2024 |
Text-based Talking Video Editing with Cascaded Conditional Diffusion |
ArXiv 2024 |
|
|
|
| 2024 |
EmoFace: Audio-driven Emotional 3D Face Animation |
IEEE VR 2024 |
|
|
|
| 2024 |
Learning Online Scale Transformation for Talking Head Video Generation |
ArXiv 2024 |
|
|
|
| 2024 |
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning |
AAAI 2025 |
|
|
🔥Alibaba |
| 2024 |
Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN |
ArXiv 2024 |
|
|
StyleGAN |
| 2024 |
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert |
Interspeech 2024 |
|
|
3D |
| 2024 |
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset |
Interspeech 2024 |
|
|
3D, Dataset |
| 2024 |
NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation |
ArXiv 2024 |
|
|
NeRF |
| 2024 |
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement |
ArXiv 2024 |
|
|
|
| 2024 |
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation |
Tech Report |
|
|
🔥EMO, Diffusion, Open-source |
| 2024 |
CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer |
WACV 2024 |
|
|
|
| 2024 |
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation |
ArXiv 2024 |
|
|
🔥EMO, Diffusion, Open-source |
| 2024 |
Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
Controllable Talking Face Generation by Implicit Facial Keypoints Editing |
ArXiv 2024 |
|
|
Controller |
| 2024 |
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation |
ArXiv 2024 |
|
|
Text-Guided |
| 2024 |
Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation |
ArXiv 2024 |
|
|
A Benchmark and Survey |
| 2024 |
NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior |
CVPRW 2024 |
|
|
SadTalker+NeRF |
| 2024 |
SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space |
ICASSP 2025 |
|
|
|
| 2024 |
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding |
ArXiv 2024 |
|
|
|
| 2024 |
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars |
ArXiv 2024 |
|
|
EMO |
| 2024 |
GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting |
ACM MM 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting |
ArXiv 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting |
ACM MM 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting |
ECCV 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
Learn2Talk: 3D Talking Face Learns from 2D Talking Face |
ArXiv 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time |
NeurIPS 2024 |
|
|
🔥🔥🔥Awesome, Microsoft |
| 2024 |
Pose-Aware 3D Talking Face Synthesis using Geometry-guided Audio-Vertices Attention |
IEEE 2024 |
|
|
|
| 2024 |
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis |
ECCV 2024 |
|
|
Emotion |
| 2024 |
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio |
ArXiv 2024 |
|
|
|
| 2024 |
Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior |
ArXiv 2024 |
|
|
|
| 2024 |
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation |
ArXiv 2024 |
|
|
🔥🔥🔥Similar to EMO |
| 2024 |
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework |
CVPR 2024 |
|
|
|
| 2024 |
Adaptive Super Resolution For One-Shot Talking-Head Generation |
ICASSP 2024 |
|
|
|
| 2024 |
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis |
ArXiv 2024 |
|
|
Embodied |
| 2024 |
EmoVOCA: Speech-Driven Emotional 3D Talking Heads |
ArXiv 2024 |
|
|
3D, VOCA |
| 2024 |
ScanTalk: 3D Talking Heads from Unregistered Scans |
ECCV 2024 |
|
|
3D |
| 2024 |
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style |
ArXiv 2024 |
|
|
|
| 2024 |
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions |
ArXiv 2024 |
|
|
🔥🔥🔥Amazing, Diffusion |
| 2024 |
G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment |
ArXiv 2024 |
|
|
A Generic Framework |
| 2024 |
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis |
CVPR 2024 |
|
|
High-Quality |
| 2024 |
DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer |
ArXiv 2024 |
|
|
3D |
| 2024 |
EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis |
ICASSP 2024 |
|
|
AU |
| 2024 |
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis |
ICLR 2024 |
|
|
3D, One-Shot,Realistic |
| 2024 |
SyncTalk: The Devil😈 is in the Synchronization for Talking Head Synthesis |
CVPR 2024 |
|
|
😈Talking Head |
| 2024 |
AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation |
ArXiv 2024 |
|
|
3D, Mesh |
| 2024 |
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis |
AAAI 2024 |
|
|
|
| 2024 |
R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning |
ArXiv 2024 |
|
|
Based on RAD-NeRF |
| 2024 |
DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis |
ICASSP 2024 |
- |
- |
ER-NeRF |
| 2023 |
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis |
ICCV 2023 |
|
|
Tri-plane |
| 2023 |
LipNeRF: What is the right feature space to lip-sync a NeRF? |
FG 2023 |
|
|
Wav2lip |
| 2024 |
VectorTalker: SVG Talking Face Generation with Progressive Vectorisation |
ArXiv 2024 |
|
|
SVG |
| 2024 |
Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation |
AAAI 2024 |
|
|
3D |
| 2024 |
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models |
ArXiv 2024 |
|
|
Diffusion |
| 2024 |
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models |
ArXiv 2024 |
|
|
|
| 2024 |
GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance |
ArXiv 2024 |
|
|
3D |
| 2024 |
GMTalker: Gaussian Mixture based Emotional talking video Portraits |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior |
ArXiv 2024 |
|
|
Mesh |
| 2024 |
GAIA: Zero-shot Talking Avatar Generation |
ArXiv 2024 |
Code (coming) |
|
😲😲😲 |
| 2023 |
Towards Streaming Speech-to-Avatar Synthesis |
ArXiv 2023 |
|
|
Streaming Synthesis, Articulatory Inversion, Real-time, Speech-driven |
| 2023 |
OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions |
ArXiv 2023 |
|
|
One-shot Talking Head, Head Motions, One-to-Many Mapping, Audio-driven |
| 2023 |
Controllable One-Shot Face Video Synthesis With Semantic Aware Prior |
ArXiv 2023 |
|
|
One-shot Talking Head, Semantic Aware Prior, Controllable Generation, Pose Alignment |
| 2023 |
FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions |
ICME 2023 |
|
|
Natural Head Motions, Flow-guided, Audio-driven Pose Prediction, One-shot Talking Head |
| 2023 |
OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering |
ArXiv 2023 |
|
|
Tri-plane Rendering, One-shot Avatar, Controllable, 3D Consistency |
| 2023 |
OPT: One-shot Pose-Controllable Talking Head Generation |
ICASSP 2023 |
|
|
pose control, identity preservation, audio feature disentanglement |
| 2023 |
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation |
ICCV 2023 |
|
|
- |
| 2023 |
ToonTalker: Cross-Domain Face Reenactment |
ICCV 2023 |
- |
- |
- |
| 2023 |
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation |
ICCV 2023 |
|
|
- |
| 2023 |
EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation |
ICCV 2023 |
- |
- |
Emotion |
| 2023 |
Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation |
ICCV 2023 |
- |
- |
Emotion,LHG |
| 2023 |
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions |
ICCV 2023 |
- |
- |
- |
| 2023 |
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion |
ACM SIGGRAPH MIG 2023 |
|
|
🔥Diffusion,3D |
| 2023 |
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis |
TCSVT 2023 |
- |
- |
|
| 2023 |
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation |
CVPR 2023 |
|
|
3D,Single Image |
| 2023 |
EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation |
ICCV 2023 |
|
|
3D,Emotion |
| 2023 |
Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks |
InterSpeech 2023 |
|
|
Emotion |
| 2023 |
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video |
AAAI 2023 |
|
|
|
| 2023 |
StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles |
AAAI 2023 |
|
|
Style |
| 2023 |
High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning |
CVPR 2023 |
|
|
Emotion |
| 2023 |
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator |
CVPR 2023 |
|
|
- |
| 2023 |
TalkLip: Seeing What You Said - Talking Face Generation Guided by a Lip Reading Expert |
CVPR 2023 |
|
|
|
| 2023 |
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior |
CVPR 2023 |
|
|
3D,codebook |
| 2023 |
Emotionally Enhanced Talking Face Generation |
ArXiv 2023 |
|
|
Emotion |
| 2023 |
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder |
ACM MM 2023 |
|
|
🔥Diffusion |
| 2023 |
READ Avatars: Realistic Emotion-controllable Audio Driven Avatars |
ArXiv 2023 |
|
|
- |
| 2023 |
DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis |
CVPR 2023 |
|
|
🔥Diffusion |
| 2023 |
Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation |
ArXiv 2023 |
- |
|
🔥Diffusion |
| 2022 |
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis |
ArXiv 2022 |
|
|
disentangled representation, contrastive learning, multi-motion control |
| 2022 |
Emotion-Controllable Generalized Talking Face Generation |
IJCAI 2022 |
|
|
emotion control, graph convolutional network, geometry-aware |
| 2022 |
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN |
ArXiv 2022 |
|
|
StyleGAN, high-resolution, one-shot, lip sync |
| 2022 |
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild |
SIGGRAPH 2022 |
|
|
|
| 2022 |
Expressive Talking Head Generation with Granular Audio-Visual Control |
CVPR 2022 |
- |
- |
- |
| 2022 |
Talking Face Generation with Multilingual TTS |
CVPR 2022 |
|
|
- |
| 2022 |
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model |
SIGGRAPH 2022 |
- |
- |
Emotion |
| 2022 |
SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression |
ArXiv 2022 |
- |
Project |
- |
| 2022 |
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers |
SIGGRAPH Asia 2022 |
- |
- |
- |
| 2022 |
Memories are One-to-Many Mapping Alleviators in Talking Face Generation |
ArXiv 2022 |
- |
- |
- |
| 2021 |
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning |
AAAI 2022 |
|
|
one-shot, audio-visual correlation, keypoint-based motion, lip sync |
| 2021 |
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion |
ArXiv 2021 |
|
|
Audio-driven, Talking-head, Head Motion, Keypoint-based Motion |
| 2021 |
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head |
ArXiv 2021 |
|
|
3D Talking Head, Emotion, Geometry Map, Audio-driven |
| 2021 |
MakeItTalk: Speaker-Aware Talking-Head Animation |
SIGGRAPH Asia 2020 |
|
|
Speaker-Aware, Audio-Driven, Facial Landmarks, Photorealistic |
| 2021 |
PC-AVS: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation |
CVPR 2021 |
|
|
- |
| 2021 |
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis |
ACM MM 2021 |
- |
- |
- |
| 2021 |
Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation |
IJCAI 2021 |
- |
- |
- |
| 2021 |
Talking Head Generation with Audio and Speech Related Facial Action Units |
BMVC 2021 |
- |
- |
AU |
| 2021 |
Audio-Driven Emotional Video Portraits |
CVPR 2021 |
|
|
Emotion |
| 2020 |
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild |
ACM Multimedia 2020 |
|
|
- |
| 2020 |
Talking-head Generation with Rhythmic Head Motion |
ECCV 2020 |
|
|
- |
| 2020 |
Speaker-Aware Talking-Head Animation |
SIGGRAPH Asia 2020 |
|
|
- |
| 2020 |
Neural Voice Puppetry: Audio-driven Facial Reenactment |
ECCV 2020 |
|
|
- |
| 2020 |
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation |
ECCV 2020 |
|
|
- |
| 2020 |
Realistic Speech-Driven Facial Animation with GANs |
IJCV 2020 |
|
|
- |
| 2020 |
Multi Modal Adaptive Normalization for Audio to Video Generation |
ArXiv 2020 |
|
|
Audio-to-Video, Multi-Modal Adaptive Normalization, Facial Video Generation, Keypoint Heatmap |
| 2019 |
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation |
AAAI 2019 |
|
|
- |
| 2019 |
Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss |
CVPR 2019 |
|
|
- |
| 2018 |
Lip Movements Generation at a Glance |
ECCV 2018 |
|
|
- |
| 2018 |
VisemeNet: Audio-Driven Animator-Centric Speech Animation |
SIGGRAPH 2018 |
|
|
- |
| 2017 |
Synthesizing Obama: Learning Lip Sync From Audio |
SIGGRAPH 2017 |
|
|
- |
| 2017 |
You Said That? Synthesising Talking Faces From Audio |
BMVC 2017 |
|
|
- |
| 2017 |
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion |
SIGGRAPH 2017 |
|
|
- |
| 2017 |
A Deep Learning Approach for Generalized Speech Animation |
SIGGRAPH 2017 |
|
|
- |
| 2016 |
Lip Reading in the Wild |
ACCV 2016 |
|
|
- |