| 2026 |
EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation |
Preprint |
|
|
Gaussian Splatting, 3DGS, Audio-Driven, Talking Head |
| 2026 |
TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation |
ArXiv 2026 |
Code |
Project |
Diffusion, Audio-Driven, Talking Head, VAE, Latent |
| 2026 |
UniSync: Towards Generalizable and High-Fidelity Lip Synchronization for Challenging Scenarios |
ArXiv 2026 |
|
|
Lip Sync, Pose-Anchored, Generalizable |
| 2026 |
FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation |
ArXiv 2026 |
|
|
Audio-Driven, Portrait Animation, Reinforcement Learning, GRPO |
| 2026 |
UniTalking: A Unified Audio-Video Framework for Talking Portrait Generation |
CVPR 2026 |
|
|
Audio-Driven, Portrait Animation, Talking Head, CVPR, Transformer, Attention |
| 2026 |
AUHead: Realistic Emotional Talking Head Generation via Action Units Control |
ArXiv 2026 |
Code |
|
Action Units, Audio-Driven Generation, Emotion Control, Diffusion Model |
| 2026 |
Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models |
ArXiv 2026 |
|
|
3D Facial Animation, Speech-Driven, Omni-modal LLMs, Token-as-query Fusion |
| 2026 |
MOVA: Towards Scalable and Synchronized Video-Audio Generation |
ArXiv 2026 |
Code |
Project |
Audio-Driven |
| 2026 |
VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars |
ArXiv 2026 |
Code |
Project |
Avatar, Talking Head |
| 2026 |
3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars |
ArXiv 2026 |
|
|
3D, Emotional, Lip Sync, Avatar, Talking Head, Transformer |
| 2026 |
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation |
ArXiv 2026 |
|
Project |
Audio-Driven, Transformer, Attention |
| 2026 |
VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction |
ArXiv 2026 |
|
|
Audio-Driven, Talking Head |
| 2026 |
Narrating For You: Prompt-guided Audio-visual Narrating Face Generation Employing Multi-entangled Latent Space |
WACV 2026 |
|
|
Audio-Driven, WACV, Latent |
| 2026 |
SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads |
ArXiv 2026 |
|
|
Real-time, Streaming, Talking Head |
| 2026 |
Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation |
ArXiv 2026 |
|
|
Audio-Driven |
| 2026 |
JoyAvatar: Unlocking Highly Expressive Avatars via Harmonized Text-Audio Conditioning |
ArXiv 2026 |
|
Project |
Audio-Driven, Avatar |
| 2026 |
LPIPS-AttnWav2Lip: Generic Audio-Driven Lip Synchronization for Talking Head Generation in the Wild |
ArXiv 2026 |
Code |
Project |
Lip Sync, Audio-Driven, Talking Head, Latent |
| 2026 |
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion |
ArXiv 2026 |
|
|
Audio-Visual Diffusion, LoRA, Lip Sync |
| 2026 |
MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control |
ArXiv 2026 |
|
|
Personalized Avatars, Lip Sync, Style Disentanglement, Diffusion Model |
| 2026 |
EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers |
ArXiv 2026 |
|
Project |
Diffusion, Audio-Driven, Talking Head |
| 2026 |
SkyReels-V3 Technique Report |
ArXiv 2026 |
Code |
|
Video Generation, Audio-Guided, Talking Avatar, Diffusion Transformers |
| 2026 |
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes |
ArXiv 2026 |
|
|
Talking Head, Movie Dubbing |
| 2026 |
Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation |
ICASSP 2026 |
|
Project |
3D, Emotional, Talking Head, ICASSP, Attention |
| 2026 |
Audio-Driven Talking Face Generation with Blink Embedding and Hash Grid Landmarks Encoding |
ArXiv 2026 |
|
|
Audio-Driven, Talking Head, Transformer |
| 2026 |
Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos |
ArXiv 2026 |
|
|
Talking Head |
| 2026 |
EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing |
ArXiv 2026 |
|
|
3D, Speech-Driven |
| 2026 |
MoCha: End-to-End Video Character Replacement without Structural Guidance |
ArXiv 2026 |
|
|
Talking Head |
| 2026 |
Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation |
ACM Trans. Multimedia |
|
|
Speech-Driven, Talking Head |
| 2026 |
ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting |
ArXiv 2026 |
|
|
3D, Gaussian Splatting, 3DGS, Emotional, Audio-Driven |
| 2026 |
SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation |
ArXiv 2026 |
|
|
Real-time, Streaming, Audio-Driven, Avatar, Attention, VAE |
| 2026 |
DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model |
ArXiv 2026 |
|
Project |
Streaming, Talking Head, Flow Matching |
| 2026 |
SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the wild |
ArXiv 2026 |
|
Project |
Transformer |
| 2026 |
Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face |
ArXiv 2026 |
Code |
Project |
3D, Talking Head, Attention |
| 2026 |
REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation |
ArXiv 2026 |
|
|
Diffusion, Real-time, Streaming, Talking Head, Latent |
| 2026 |
JoyAvatar-Flash: Real-time and Infinite Audio-Driven Avatar Generation with Autoregressive Diffusion |
ArXiv 2026 |
|
|
Diffusion, Real-time, Audio-Driven, Avatar |
| 2026 |
Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation |
ArXiv 2026 |
|
|
Talking Head, Attention, Latent |
| 2025 |
From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing |
ArXiv 2025 |
|
Project |
Visual dubbing, Diffusion Transformer, Self-bootstrapping, Lip sync |
| 2025 |
The Locally Deployable Virtual Doctor: LLM Based Human Interface for Automated Anamnesis and Database Conversion |
ArXiv 2025 |
|
|
Conditional Diffusion, Facial Animation, Audio-Visual Synchronization, LLM-Based Avatar |
| 2025 |
Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation |
ICXR 2025 |
|
|
Blendshapes, FLAME, Disentanglement, 3D Animation |
| 2025 |
Revising Second Order Terms in Deep Animation Video Coding |
ArXiv 2025 |
|
|
FOMM, Keypoints, Head Rotation, Motion Model |
| 2025 |
A Bridge from Audio to Video: Phoneme-Viseme Alignment Allows Every Face to Speak Multiple Languages |
ArXiv 2025 |
|
|
Phoneme-Viseme Alignment, Multilingual TFS, Mixture-of-Experts |
| 2025 |
Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation |
ArXiv 2025 |
|
Project |
Diffusion models, co-speech video, real-time, sparse attention |
| 2025 |
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis |
ArXiv 2025 |
|
|
Multimodal Instructions, Avatar Synthesis, Lip Synchronization |
| 2025 |
A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis |
ArXiv 2025 |
|
|
Voice Cloning, Lip Sync Synthesis, Noisy Speech |
| 2025 |
CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation |
ArXiv 2025 |
|
|
Cross-emotion memory, audio emotion enhancement, expression displacement, lip sync |
| 2025 |
Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching |
Technical Report |
|
Project |
real-time, flow matching, lip-sync |
| 2025 |
ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion |
ArXiv 2025 |
Code |
|
Diffusion, Landmarks-Guide, Real-time, Identity Preservation |
| 2025 |
MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation |
ArXiv 2025 |
|
|
3DMM, Diffusion Transformer, Temporal Consistency, Blinking Dynamics |
| 2025 |
Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning |
ArXiv 2025 |
|
|
Joint uncertainty learning, Audio-driven talking face, Lip sync, Visual uncertainty |
| 2025 |
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation |
TMM 2025 |
|
|
Talking Head Animation, Temporal Correlation, One-Shot |
| 2025 |
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters |
ArXiv 2025 |
|
|
Neural Radiance Fields, Expression Parameters, Emotion Control, Audio-Driven |
| 2025 |
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing |
ICME 2025 |
Code |
|
Spatial-Temporal Alignment, Semantic Features, Visual Dubbing, Stability |
| 2025 |
Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait |
ArXiv 2025 |
Code |
|
implicit keypoint, spatiotemporal diffusion, audio-driven, talking portrait |
| 2025 |
EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models |
ArXiv 2025 |
|
Project |
3D facial animation, latent diffusion, emotional expression, speech-driven |
| 2025 |
Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation |
ArXiv 2025 |
|
|
Audio-driven, Diffusion model, Motion Diffusion Transformer, Lip sync |
| 2025 |
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation |
ArXiv 2025 |
|
|
Audio-driven, Emotion Synthesis, Mixture of Experts, Portrait Animation |
| 2025 |
PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment |
ArXiv 2025 |
|
|
3D, Speech-Driven, Talking Head, Attention |
| 2025 |
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation |
ArXiv 2025 |
|
Project |
Diffusion, Real-time, Portrait Animation, Attention |
| 2025 |
FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs |
ArXiv 2025 |
|
|
Diffusion, Transformer, GAN, Latent |
| 2025 |
In-Context Audio Control of Video Diffusion Transformers |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, Transformer, Attention |
| 2025 |
SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation |
ArXiv 2025 |
|
|
Portrait Animation, ICASSP, Attention |
| 2025 |
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction |
ArXiv 2025 |
|
|
Portrait Animation, Transformer, Latent |
| 2025 |
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image |
NeurIPS 2025 |
|
|
3D, Gaussian Splatting, Audio-Driven, Avatar |
| 2025 |
TalkVerse: Democratizing Minute-Long Audio-Driven Video Generation |
ArXiv 2025 |
|
Project |
Audio-Driven, VAE, Latent |
| 2025 |
FacEDiT: Unified Talking Face Editing and Generation via Facial Motion Infilling |
ArXiv 2025 |
|
Project |
Talking Head, Transformer, Attention, Flow Matching |
| 2025 |
STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits |
ArXiv 2025 |
|
Project |
Diffusion, Portrait Animation, Talking Head |
| 2025 |
JoVA: Unified Multimodal Learning for Joint Video-Audio Generation |
ArXiv 2025 |
|
Project |
Audio-Driven, Transformer, Attention, GAN |
| 2025 |
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model |
Tech Report |
|
|
Audio-Driven, Transformer, Reinforcement Learning |
| 2025 |
FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint |
ArXiv 2025 |
|
Project |
Portrait Animation, Transformer, Latent |
| 2025 |
KeyframeFace: From Text to Expressive Facial Keyframes |
ArXiv 2025 |
Code |
Project |
Talking Head |
| 2025 |
PersonaLive! Expressive Portrait Image Animation for Live Streaming |
ArXiv 2025 |
|
|
Streaming, Portrait Animation |
| 2025 |
GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting |
WACV 2026 |
|
|
3D, Gaussian Splatting, 3DGS, Audio-Driven, Talking Head |
| 2025 |
UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking |
ArXiv 2025 |
|
|
Audio-Driven, Avatar |
| 2025 |
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length |
ArXiv 2025 |
|
|
Real-time, Streaming, Audio-Driven, Avatar |
| 2025 |
EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans |
ArXiv 2025 |
|
|
Portrait Animation, Talking Head |
| 2025 |
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement |
ArXiv 2025 |
|
Project |
Talking Head, Transformer, Attention |
| 2025 |
AI killed the video star. Audio-driven diffusion model for expressive talking head generation |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, Talking Head, Transformer |
| 2025 |
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer |
ArXiv 2025 |
|
|
Audio-Driven, Talking Head, Attention, Latent |
| 2025 |
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy |
ArXiv 2025 |
|
|
Audio-Driven, Attention, Latent |
| 2025 |
StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model |
ArXiv 2025 |
|
Project |
3D, Diffusion, Streaming, Audio-Driven |
| 2025 |
ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search |
AAAI 2026 |
|
|
Diffusion, Talking Head, AAAI, Knowledge Distillation |
| 2025 |
GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow |
ArXiv 2025 |
|
|
Talking Head |
| 2025 |
Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis |
ArXiv 2025 |
|
|
Audio-Driven, Latent |
| 2025 |
THEval. Evaluation Framework for Talking Head Video Generation |
ArXiv 2025 |
|
|
Talking Head |
| 2025 |
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions |
ArXiv 2025 |
|
|
Audio-Driven, Transformer, Attention, Latent |
| 2025 |
Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, AAAI, Transformer |
| 2025 |
MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control |
ArXiv 2025 |
|
|
Audio-Driven, Talking Head, Identity Control |
| 2025 |
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement |
TASLP 2025 |
|
|
High-Resolution, Talking Faces, Speech-to-Face, Diffusion |
| 2025 |
LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation |
SIGGRAPH Asia 2025 |
Code |
|
Label-Free, Speech-Driven, Facial Animation, FLAME |
| 2025 |
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis |
ArXiv 2025 |
|
|
Disentangled Motion, Flow Matching, Talking Portrait, Controllable |
| 2025 |
SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation |
ArXiv 2025 |
|
|
Contrastive Masked Pretraining, Audio-Visual, Talking-Face |
| 2025 |
EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation |
IEEE SMC 2025 |
|
|
Real-Time, Audio-Driven, Gaussian Deformation, Talking Head |
| 2025 |
Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing |
ArXiv 2025 |
|
|
Biometric Leakage, AI Videoconferencing, Security |
| 2025 |
Audio Driven Real-Time Facial Animation for Social Telepresence |
SIGGRAPH Asia 2025 |
|
Project |
Real-time, Audio-Driven, SIGGRAPH, Transformer, Latent |
| 2025 |
Warm Chat: Diffuse Emotion-aware Interactive Talking Head Avatar with Tree-Structured Guidance |
ArXiv 2025 (Withdrawn) |
|
|
Emotional, Avatar, Talking Head, Transformer, Latent |
| 2025 |
EmoCAST: Emotional Talking Portrait via Emotive Text Description |
ArXiv 2025 |
Code |
Project |
Emotional, Portrait Animation, Talking Head, Attention |
| 2025 |
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation |
ArXiv 2025 |
|
Project |
Diffusion, Real-time, Audio-Driven, Talking Head, Transformer, VAE |
| 2025 |
MOSPA: Human Motion Generation Driven by Spatial Audio |
NeurIPS 2025 |
Code |
|
Spatial Audio, Human Motion Generation, Virtual Human |
| 2025 |
DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads |
ICCV 2025 |
|
Project |
Gaussian, Latent Space |
| 2025 |
Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation |
ACM MM 2025 |
|
|
Depth-based |
| 2025 |
PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles |
ACM MM 2025 |
|
|
FLAME |
| 2025 |
GOES: 3D Gaussian-based One-shot Head Animation with Any Emotion and Any Style |
ACM MM 2025 |
|
|
One-Shot, 3DGS |
| 2025 |
StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing |
ArXiv 2025 |
|
Project |
Visual Dubbing, Diffusion, Mamba-Transformer |
| 2025 |
KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation |
ArXiv 2025 |
|
|
Keyframe, Diffusion, Dual-Path, Facial Animation |
| 2025 |
SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding |
WACV 2026 |
|
Project |
Multi-Modal, Emotion-Aware, LLM |
| 2025 |
Talking Head Generation via AU-Guided Landmark Prediction |
ArXiv 2025 |
|
|
Action Units, Landmark Prediction, Diffusion |
| 2025 |
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation |
ArXiv 2025 |
|
Project |
3D Facial Animation, Diffusion, Editing, Speech-Driven |
| 2025 |
PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control |
ICONIP 2025 |
|
|
3DGS, Real-Time, Pixel-Aware, Audio-Driven |
| 2025 |
Beat on Gaze: Learning Stylized Generation of Gaze and Head Dynamics |
ArXiv 2025 |
|
|
Gaze Control, Head Motion, Style-Aware, 3D |
| 2025 |
Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation |
ArXiv 2025 |
|
|
Singing-Driven, 3D Head, Diffusion |
| 2025 |
Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars |
ArXiv 2025 |
|
|
Audio-driven Realistic Facial Animation, Digital Avatars |
| 2025 |
DisenEmo: Learning disentangled emotional representation from facial motion for 3D talking head generation |
ICIP 2025 |
|
|
Disentangled Emotional Representation, 3D Talking Head Generation |
| 2025 |
ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation |
IJCAI 2025 |
|
|
Adaptive Disentanglement, Refined Alignment, 3D Facial Animation |
| 2025 |
SyncGaussian: Stable 3D Gaussian-Based Talking Head Generation with Enhanced Lip Sync via Discriminative Speech Feature |
IJCAI 2025 |
|
|
Stable 3D Gaussian-Based Talking Head Generation, Enhanced Lip Sync, Discriminative Speech Feature |
| 2025 |
Wan-S2V: Audio-Driven Cinematic Video Generation |
ArXiv 2025 |
|
|
Cinematic, Audio-Driven, Video Generation |
| 2025 |
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing |
ArXiv 2025 |
|
|
Sparse-Frame Dubbing, Full-Body |
| 2025 |
D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis |
ECAI 2025 |
|
|
Few-Shot, 3DGS, Deformation Fields |
| 2025 |
RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis |
ICCV 2025 Workshop |
|
|
Emotion, NeRF, VAE |
| 2025 |
FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation |
ArXiv 2025 |
|
Project |
Audio-Driven, Portrait Animation, Preference Optimization |
| 2025 |
HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis |
ArXiv 2025 |
|
|
Hybrid Motion, High-Fidelity, Talking Head |
| 2025 |
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation |
ArXiv 2025 |
Code |
Project |
Stable Diffusion |
| 2025 |
X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio |
ArXiv 2025 |
|
Project |
Emotional Portrait, Long-range, Audio-driven |
| 2025 |
DICE-Talk: Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation |
ACM MM 2025 |
|
|
Emotional Portrait, Identity Preservation, Emotion Cooperation |
| 2025 |
M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation |
ArXiv 2025 |
|
Project |
Multi-granular Motion, Decoupling, Optimization |
| 2025 |
MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding |
ArXiv 2025 |
Code |
|
Multimodal, 3D Facial Animation, Dynamic Emotions |
| 2025 |
KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features |
ACM MM 2025 |
|
|
Deepfake Detection, Audio-Visual, SSL |
| 2025 |
DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation |
ArXiv 2025 |
|
Project |
DiT, Portrait Animation, Speaking Styles |
| 2025 |
Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation |
Interspeech 2025 |
Project |
|
Phonetic Context, Viseme, 3D Facial Animation |
| 2025 |
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation |
ACM MM 2025 |
|
|
Spatial Audio, Video Generation, MLLM |
| 2025 |
Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos |
IEEE IJCB 2025 |
|
|
Biometric Verification, Avatar Security, Facial Motion |
| 2025 |
Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation |
ArXiv 2025 |
|
|
Mask-Free, Identity Preservation, Audio-driven |
| 2025 |
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization |
ICCV 2025 |
|
Project |
Personalized, 3D Facial Animation, Memory |
| 2025 |
Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System |
ICMI 2025 |
Code |
|
Real-time, Nodding Generation, Avatar Interaction |
| 2025 |
MoDA: Multi-modal Diffusion Architecture for Talking Head Generation |
ArXiv 2025 |
|
Project |
Multi-modal, Diffusion, Talking Head Generation |
| 2025 |
GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation |
ICCV 2025 |
Code |
Project |
3D Talking Head, Gaussian Priors, Identity Adaptation |
| 2025 |
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases |
ArXiv 2025 |
|
|
Identity Leakage, Extreme Cases |
| 2025 |
Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field |
ArXiv 2025 |
|
|
Few-Shot, Global Gaussian Field, 3DGS |
| 2025 |
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching |
ArXiv 2025 |
|
|
Flow Matching, Audio-Motion |
| 2025 |
ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model |
ArXiv 2025 |
|
|
Autoregressive, FLAME, 3D |
| 2025 |
Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos |
ICMR 2025 |
|
|
Compression, Low-Bitrate |
| 2025 |
SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting |
ArXiv 2025 |
|
|
3DGS, Synchronization |
| 2025 |
Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space |
ICME 2025 |
|
|
3D, Diffusion, Multimodal |
| 2025 |
EmoVOCA: Speech-Driven Emotional 3D Talking Heads |
WACV 2025 |
|
|
Emotional, 3D, VOCA |
| 2025 |
Lipschitz-Driven Noise Robustness in VQ-AE for High-Frequency Texture Repair in ID-Specific Talking Heads |
ArXiv 2025 |
|
|
Noise Robustness, VQ-AE, High-Frequency |
| 2025 |
LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models |
ArXiv 2025 |
|
|
Low-Latency, Real-Time, Interactive |
| 2025 |
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation |
CVPR 2025 |
|
|
Global Audio Perception, Portrait Animation |
| 2025 |
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning |
ArXiv 2025 |
|
|
LLM, Reliability |
| 2025 |
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation |
CVPR 2025 |
|
|
Adversarial Defense, Privacy |
| 2025 |
Cocktail-Party Audio-Visual Speech Recognition |
Interspeech 2025 |
|
|
Audio-Visual Speech Recognition |
| 2025 |
TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models |
ArXiv 2025 |
|
|
Real-Time, Autoregressive Diffusion |
| 2025 |
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation |
ArXiv 2025 |
|
|
Co-Speech Gesture, Two-Stage |
| 2025 |
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow |
ICASSP 2025 |
|
|
Video-to-Speech, Speech Decomposition |
| 2025 |
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos |
CVPR 2025 |
|
|
3D-aware, Video Diffusion |
| 2025 |
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements |
ArXiv 2025 |
|
|
Voice Conversion, Survey |
| 2025 |
FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing |
ArXiv 2025 |
|
|
Attribute Editing, Interactive |
| 2025 |
Video Editing for Audio-Visual Dubbing |
ArXiv 2025 |
|
|
Video Editing, Dubbing |
| 2025 |
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation |
CVPR 2025 |
|
|
3D, Semantic Decoupling |
| 2025 |
Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion |
ArXiv 2025 |
|
|
Diffusion, 3D |
| 2025 |
VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback |
ArXiv 2025 |
|
|
SDK, LLM, Real-time |
| 2025 |
OT-Talk: Animating 3D Talking Head with Optimal Transportation |
ArXiv 2025 |
|
|
FLAME, 3D |
| 2025 |
GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting |
CVPRW 2025 |
|
|
3DGS |
| 2025 |
Model See Model Do: Speech-Driven Facial Animation with Style Control |
SIGGRAPH 2025 |
|
|
Speech-Driven, Facial Animation, Style Control |
| 2025 |
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing |
ArXiv 2025 |
|
|
LLM, Qwen |
| 2025 |
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution |
ArXiv 2025 |
|
|
Lip Sync, High-Resolution |
| 2025 |
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis |
ArXiv 2025 |
|
|
Diffusion |
| 2025 |
FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis |
ICMR 2025 |
|
|
Real-time, Audio-Driven, Talking Portrait |
| 2025 |
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices |
CVPR 2025 |
|
|
100+fps |
| 2025 |
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation |
ArXiv 2025 |
|
|
Diffusion, Audio-Driven, State Space Model, Talking Head |
| 2025 |
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation |
CVPR 2025 |
|
|
Fast Diffusion 12.5X speedup |
| 2025 |
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency |
ICLR 2025 |
|
|
Audio-Driven, Portrait Animation, Long-Term Motion |
| 2025 |
Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance |
ArXiv 2025 |
|
|
Portrait Editing, Temporal Consistency, Trajectory Guidance |
| 2025 |
Audio-driven Gesture Generation via Deviation Feature in the Latent Space |
ArXiv 2025 |
|
|
Gesture |
| 2025 |
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics |
CVPR 2025 |
|
|
3D, Talking Head, Evaluation Metrics |
| 2025 |
MGGTalk: Monocular and Generalizable Gaussian Talking Head Animation |
CVPR 2025 |
Project |
|
One Shot, 3DGS |
| 2025 |
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance |
ArXiv 2025 |
|
|
Dubbing |
| 2025 |
Dual Audio-Centric Modality Coupling for Talking Head Generation |
ArXiv 2025 |
|
|
NeRF |
| 2025 |
Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis |
ArXiv 2025 |
|
|
3DGS |
| 2025 |
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers |
CVPR 2025 |
|
|
DiT |
| 2025 |
DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model |
ICME 2025 |
|
|
Cross-lingual, Diffusion, Talking Head |
| 2025 |
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation |
CVPR 2025 |
|
|
Autoregressive |
| 2025 |
DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation |
ICME 2025 |
|
|
Diffusion, 3D |
| 2025 |
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation |
CVPR 2025 |
|
|
|
| 2025 |
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation |
CVPR 2025 |
|
|
Hunyuan |
| 2025 |
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling |
ArXiv 2025 |
|
|
|
| 2025 |
StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation |
ArXiv 2025 |
|
|
3D |
| 2025 |
LatentSync: Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision |
ArXiv 2025 |
|
|
|
| 2025 |
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice |
ArXiv 2025 |
|
|
|
| 2025 |
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation |
CVPR 2025 |
|
|
Diffusion, Long Sequences |
| 2025 |
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture |
CVPR 2025 |
|
|
Texture |
| 2025 |
AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation |
IEEE Transactions on Multimedia |
|
Project |
MLoRA, Personalized |
| 2025 |
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video |
CVPR 2025 |
|
|
Few Shot, 3DGS |
| 2025 |
FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model |
ArXiv 2025 |
|
|
Diffusion |
| 2025 |
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis |
ICASSP 2025 |
|
|
|
| 2025 |
Emotional Face-to-Speech |
ArXiv 2025 |
|
|
emotion, face2speech |
| 2025 |
EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis |
ArXiv 2025 |
|
|
emotion, 3DGS |
| 2025 |
EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation |
ArXiv 2025 |
|
|
emotion, 3D |
| 2025 |
Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding |
ICASSP 2025 |
|
|
Viseme |
| 2025 |
SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation |
ArXiv 2025 |
|
|
Human Pose |
| 2025 |
JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing |
ArXiv 2025 |
|
|
Depth, JD work |
| 2025 |
Identity-Preserving Video Dubbing Using Motion Warping |
ArXiv 2025 |
|
|
Video Dubbing |
| 2025 |
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition |
ICASSP 2025 |
|
|
VSR |
| 2025 |
DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis |
ICASSP 2025 |
|
|
Hair-Preserving |
| 2025 |
UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control |
ArXiv 2025 |
|
|
SD, Lighting control |
| 2024 |
Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis |
ArXiv 2024 |
|
|
Audio Feature Extraction, Whisper, Real-time processing, Talking portrait synthesis |
| 2024 |
PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation |
ArXiv 2024 |
|
Project |
Pose Latent Diffusion, Lip Synchronization, Text-Audio Control |
| 2024 |
One-Shot Pose-Driving Face Animation Platform |
ArXiv 2024 |
|
|
One-Shot, Pose-Driving, Face Animation, Talking Head |
| 2024 |
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization |
ArXiv 2024 |
|
|
Normalizing Flow, Vector-Quantization, Lip Sync, Emotional Talking Faces |
| 2024 |
VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization |
ArXiv 2024 |
|
|
visemes, codebook |
| 2024 |
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis |
AAAI 2025 |
|
|
Point Cloud, Gaussian Splatting |
| 2024 |
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion |
CVPR 2025 |
|
Project |
Emotion, Expressive, Diffusion |
| 2024 |
GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression |
AAAI 2025 |
|
|
Gaze-oriented |
| 2024 |
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing |
CVPR 2025 |
|
|
Emotion, Dubber |
| 2024 |
PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation |
ArXiv 2024 |
|
|
Diffusion, Attention, One-Shot |
| 2024 |
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation |
AAAI 2025 |
|
|
3D face, FLAME, Emotion |
| 2024 |
LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync |
ArXiv 2024 |
|
|
Diffusion, SyncNet |
| 2024 |
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait |
ICCV 2025 |
|
Project |
Flow Matching |
| 2024 |
SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model |
ArXiv 2024 |
|
|
Diffusion, Style |
| 2024 |
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming |
Tech Report |
|
|
Omni!!! |
| 2024 |
Controllable Talking Face Generation by Implicit Facial Keypoints Editing |
ArXiv 2024 |
|
|
Face Edit |
| 2024 |
SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation |
ArXiv 2024 |
|
|
|
| 2024 |
LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis |
ArXiv 2024 |
|
|
NeRF |
| 2024 |
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation |
ArXiv 2024 |
|
|
Memory |
| 2024 |
IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation |
ArXiv 2024 |
|
|
Motion Diffusion Model |
| 2024 |
Memories are One-to-Many Mapping Alleviators in Talking Face Generation |
IEEE 2024 |
|
|
Memory |
| 2024 |
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis |
ArXiv 2024 |
|
|
Diffusion |
| 2024 |
GaussianSpeech: Audio-Driven Gaussian Avatars |
ArXiv 2024 |
|
|
3DGS, 3D |
| 2024 |
LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis |
ArXiv 2024 |
|
|
|
| 2024 |
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion |
ArXiv 2024 |
|
|
|
| 2024 |
S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis |
ECCV 2024 |
|
|
|
| 2024 |
LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space |
ArXiv 2024 |
|
|
Fine-Grained Emotion |
| 2024 |
JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation |
ArXiv 2024 |
|
|
Diffusion, VASA |
| 2024 |
JoyHallo: Digital Human Model for Mandarin |
ArXiv 2024 |
|
|
Diffusion, Hallo |
| 2024 |
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation |
ICLR 2025 |
|
|
Diffusion, Hallo |
| 2024 |
Audio-Driven Emotional 3D Talking-Head Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts |
ArXiv 2024 |
|
|
|
| 2024 |
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization |
ArXiv 2024 |
|
|
|
| 2024 |
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation |
ArXiv 2024 |
|
|
Non-autoregressive Diffusion |
| 2024 |
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details |
ArXiv 2024 |
|
|
|
| 2024 |
Diverse Code Query Learning for Speech-Driven Facial Animation |
ArXiv 2024 |
|
|
|
| 2024 |
TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans |
ECCVW 2024 |
|
|
NeRF |
| 2024 |
ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE |
SIGGRAPH MIG 2024 |
|
|
3D |
| 2024 |
JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation |
BMVC 2024 |
|
|
NeRF |
| 2024 |
3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy |
ArXiv 2024 |
|
|
|
| 2024 |
LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation |
ArXiv 2024 |
|
|
|
| 2024 |
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads |
TPAMI 2024 |
|
|
|
| 2024 |
DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures |
ArXiv 2024 |
|
|
diffusion |
| 2024 |
EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion |
ArXiv 2024 |
|
|
Diffusion |
| 2024 |
PersonaTalk: Bring Attention to Your Persona in Visual Dubbing |
SIGGRAPH Asia 2024 |
|
|
|
| 2024 |
KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation |
ArXiv 2024 |
|
|
KAN |
| 2024 |
TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation |
ArXiv 2024 |
|
|
LoRA |
| 2024 |
Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control |
ArXiv 2024 |
|
|
|
| 2024 |
G3FA: Geometry-guided GAN for Face Animation |
BMVC 2024 |
|
|
|
| 2024 |
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation |
ArXiv 2024 |
|
|
|
| 2024 |
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation |
ArXiv 2024 |
|
|
|
| 2024 |
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model |
IEEE TIP |
|
|
|
| 2024 |
Style-Preserving Lip Sync via Audio-Aware Style Reference |
IEEE TIP |
|
|
|
| 2024 |
Talk to the Wall: The Role of Speech Interaction in Collaborative Visual Analytics |
IEEE TVCG 2024 |
|
|
Collaborative |
| 2024 |
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation |
ArXiv 2024 |
|
|
Co-Speech Gesture |
| 2024 |
GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer |
ArXiv 2024 |
|
|
|
| 2024 |
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model |
ArXiv 2024 |
|
|
|
| 2024 |
DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework |
ArXiv 2024 |
|
|
|
| 2024 |
What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models |
ACL Wordplay 2024 |
|
|
|
| 2024 |
LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement |
ArXiv 2024 |
|
|
|
| 2024 |
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network |
ArXiv 2024 |
|
|
|
| 2024 |
Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation |
ArXiv 2024 |
|
|
|
| 2024 |
JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model |
ArXiv 2024 |
|
|
3D |
| 2024 |
Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs |
COLM 2024 |
|
|
LLM |
| 2024 |
Digital Avatars: Framework Development and Their Evaluation |
ArXiv 2024 |
|
|
|
| 2024 |
EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head |
ECCV 2024 |
|
|
|
| 2024 |
PAV: Personalized Head Avatar from Unstructured Video Collection |
ECCV 2024 |
|
|
|
| 2024 |
Text-based Talking Video Editing with Cascaded Conditional Diffusion |
ArXiv 2024 |
|
|
|
| 2024 |
EmoFace: Audio-driven Emotional 3D Face Animation |
IEEE VR 2024 |
|
|
|
| 2024 |
Learning Online Scale Transformation for Talking Head Video Generation |
ArXiv 2024 |
|
|
|
| 2024 |
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning |
AAAI 2025 |
|
|
🔥Alibaba |
| 2024 |
Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN |
ArXiv 2024 |
|
|
StyleGAN |
| 2024 |
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert |
Interspeech 2024 |
|
|
3D |
| 2024 |
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset |
Interspeech 2024 |
|
|
3D, Dataset |
| 2024 |
NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation |
ArXiv 2024 |
|
|
NeRF |
| 2024 |
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement |
ArXiv 2024 |
|
|
|
| 2024 |
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation |
Tech Report |
|
|
🔥EMO, Diffusion, Open-source |
| 2024 |
CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer |
WACV 2024 |
|
|
|
| 2024 |
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation |
ArXiv 2024 |
|
|
🔥EMO, Diffusion, Open-source |
| 2024 |
Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
Controllable Talking Face Generation by Implicit Facial Keypoints Editing |
ArXiv 2024 |
|
|
Controller |
| 2024 |
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation |
ArXiv 2024 |
|
|
Text-Guided |
| 2024 |
Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation |
ArXiv 2024 |
|
|
A Benchmark and Survey |
| 2024 |
NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior |
CVPRW 2024 |
|
|
SadTalker+NeRF |
| 2024 |
SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space |
ICASSP 2025 |
|
|
|
| 2024 |
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding |
ArXiv 2024 |
|
|
|
| 2024 |
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars |
ArXiv 2024 |
|
|
EMO |
| 2024 |
GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting |
ACM MM 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting |
ArXiv 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting |
ACM MM 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting |
ECCV 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
Learn2Talk: 3D Talking Face Learns from 2D Talking Face |
ArXiv 2024 |
|
|
🔥Gaussian Splatting |
| 2024 |
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time |
NeurIPS 2024 |
|
|
🔥🔥🔥Awesome, Microsoft |
| 2024 |
Pose-Aware 3D Talking Face Synthesis using Geometry-guided Audio-Vertices Attention |
IEEE 2024 |
|
|
|
| 2024 |
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis |
ECCV 2024 |
|
|
Emotion |
| 2024 |
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio |
ArXiv 2024 |
|
|
|
| 2024 |
Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior |
ArXiv 2024 |
|
|
|
| 2024 |
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation |
ArXiv 2024 |
|
|
🔥🔥🔥Similar to EMO |
| 2024 |
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework |
CVPR 2024 |
|
|
|
| 2024 |
Adaptive Super Resolution For One-Shot Talking-Head Generation |
ICASSP 2024 |
|
|
|
| 2024 |
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis |
ArXiv 2024 |
|
|
Embodied |
| 2024 |
EmoVOCA: Speech-Driven Emotional 3D Talking Heads |
ArXiv 2024 |
|
|
3D, VOCA |
| 2024 |
ScanTalk: 3D Talking Heads from Unregistered Scans |
ECCV 2024 |
|
|
3D |
| 2024 |
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style |
ArXiv 2024 |
|
|
|
| 2024 |
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions |
ArXiv 2024 |
|
|
🔥🔥🔥Amazing, Diffusion |
| 2024 |
G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment |
ArXiv 2024 |
|
|
A Generic Framework |
| 2024 |
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis |
CVPR 2024 |
|
|
High-Quality |
| 2024 |
DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer |
ArXiv 2024 |
|
|
3D |
| 2024 |
EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis |
ICASSP 2024 |
|
|
AU |
| 2024 |
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis |
ICLR 2024 |
|
|
3D, One-Shot,Realistic |
| 2024 |
SyncTalk: The Devil😈 is in the Synchronization for Talking Head Synthesis |
CVPR 2024 |
|
|
😈Talking Head |
| 2024 |
AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation |
ArXiv 2024 |
|
|
3D, Mesh |
| 2024 |
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis |
AAAI 2024 |
|
|
|
| 2024 |
R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning |
ArXiv 2024 |
|
|
Based on RAD-NeRF |
| 2024 |
DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis |
ICASSP 2024 |
- |
- |
ER-NeRF |
| 2023 |
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis |
ICCV 2023 |
|
|
Tri-plane |
| 2023 |
LipNeRF: What is the right feature space to lip-sync a NeRF? |
FG 2023 |
|
|
Wav2lip |
| 2024 |
VectorTalker: SVG Talking Face Generation with Progressive Vectorisation |
ArXiv 2024 |
|
|
SVG |
| 2024 |
Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation |
AAAI 2024 |
|
|
3D |
| 2024 |
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models |
ArXiv 2024 |
|
|
Diffusion |
| 2024 |
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models |
ArXiv 2024 |
|
|
|
| 2024 |
GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance |
ArXiv 2024 |
|
|
3D |
| 2024 |
GMTalker: Gaussian Mixture based Emotional talking video Portraits |
ArXiv 2024 |
|
|
Emotion |
| 2024 |
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior |
ArXiv 2024 |
|
|
Mesh |
| 2024 |
GAIA: Zero-shot Talking Avatar Generation |
ArXiv 2024 |
Code (coming) |
|
😲😲😲 |
| 2023 |
Towards Streaming Speech-to-Avatar Synthesis |
ArXiv 2023 |
|
|
Streaming Synthesis, Articulatory Inversion, Real-time, Speech-driven |
| 2023 |
OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions |
ArXiv 2023 |
|
|
One-shot Talking Head, Head Motions, One-to-Many Mapping, Audio-driven |
| 2023 |
Controllable One-Shot Face Video Synthesis With Semantic Aware Prior |
ArXiv 2023 |
|
|
One-shot Talking Head, Semantic Aware Prior, Controllable Generation, Pose Alignment |
| 2023 |
FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions |
ICME 2023 |
|
|
Natural Head Motions, Flow-guided, Audio-driven Pose Prediction, One-shot Talking Head |
| 2023 |
OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering |
ArXiv 2023 |
|
|
Tri-plane Rendering, One-shot Avatar, Controllable, 3D Consistency |
| 2023 |
OPT: One-shot Pose-Controllable Talking Head Generation |
ICASSP 2023 |
|
|
pose control, identity preservation, audio feature disentanglement |
| 2023 |
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation |
ICCV 2023 |
|
|
- |
| 2023 |
ToonTalker: Cross-Domain Face Reenactment |
ICCV 2023 |
- |
- |
- |
| 2023 |
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation |
ICCV 2023 |
|
|
- |
| 2023 |
EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation |
ICCV 2023 |
- |
- |
Emotion |
| 2023 |
Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation |
ICCV 2023 |
- |
- |
Emotion,LHG |
| 2023 |
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions |
ICCV 2023 |
- |
- |
- |
| 2023 |
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion |
ACM SIGGRAPH MIG 2023 |
|
|
🔥Diffusion,3D |
| 2023 |
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis |
TCSVT 2023 |
- |
- |
|
| 2023 |
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation |
CVPR 2023 |
|
|
3D,Single Image |
| 2023 |
EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation |
ICCV 2023 |
|
|
3D,Emotion |
| 2023 |
Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks |
InterSpeech 2023 |
|
|
Emotion |
| 2023 |
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video |
AAAI 2023 |
|
|
|
| 2023 |
StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles |
AAAI 2023 |
|
|
Style |
| 2023 |
High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning |
CVPR 2023 |
|
|
Emotion |
| 2023 |
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator |
CVPR 2023 |
|
|
- |
| 2023 |
TalkLip: Seeing What You Said - Talking Face Generation Guided by a Lip Reading Expert |
CVPR 2023 |
|
|
|
| 2023 |
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior |
CVPR 2023 |
|
|
3D,codebook |
| 2023 |
Emotionally Enhanced Talking Face Generation |
ArXiv 2023 |
|
|
Emotion |
| 2023 |
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder |
ACM MM 2023 |
|
|
🔥Diffusion |
| 2023 |
READ Avatars: Realistic Emotion-controllable Audio Driven Avatars |
ArXiv 2023 |
|
|
- |
| 2023 |
DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis |
CVPR 2023 |
|
|
🔥Diffusion |
| 2023 |
Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation |
ArXiv 2023 |
- |
|
🔥Diffusion |
| 2022 |
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis |
ArXiv 2022 |
|
|
disentangled representation, contrastive learning, multi-motion control |
| 2022 |
Emotion-Controllable Generalized Talking Face Generation |
IJCAI 2022 |
|
|
emotion control, graph convolutional network, geometry-aware |
| 2022 |
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN |
ArXiv 2022 |
|
|
StyleGAN, high-resolution, one-shot, lip sync |
| 2022 |
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild |
SIGGRAPH 2022 |
|
|
|
| 2022 |
Expressive Talking Head Generation with Granular Audio-Visual Control |
CVPR 2022 |
- |
- |
- |
| 2022 |
Talking Face Generation with Multilingual TTS |
CVPR 2022 |
|
|
- |
| 2022 |
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model |
SIGGRAPH 2022 |
- |
- |
Emotion |
| 2022 |
SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression |
ArXiv 2022 |
- |
Project |
- |
| 2022 |
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers |
SIGGRAPH Asia 2022 |
- |
- |
- |
| 2022 |
Memories are One-to-Many Mapping Alleviators in Talking Face Generation |
ArXiv 2022 |
- |
- |
- |
| 2021 |
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning |
AAAI 2022 |
|
|
one-shot, audio-visual correlation, keypoint-based motion, lip sync |
| 2021 |
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion |
ArXiv 2021 |
|
|
Audio-driven, Talking-head, Head Motion, Keypoint-based Motion |
| 2021 |
3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head |
ArXiv 2021 |
|
|
3D Talking Head, Emotion, Geometry Map, Audio-driven |
| 2021 |
MakeItTalk: Speaker-Aware Talking-Head Animation |
SIGGRAPH Asia 2020 |
|
|
Speaker-Aware, Audio-Driven, Facial Landmarks, Photorealistic |
| 2021 |
PC-AVS: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation |
CVPR 2021 |
|
|
- |
| 2021 |
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis |
ACM MM 2021 |
- |
- |
- |
| 2021 |
Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation |
IJCAI 2021 |
- |
- |
- |
| 2021 |
Talking Head Generation with Audio and Speech Related Facial Action Units |
BMVC 2021 |
- |
- |
AU |
| 2021 |
Audio-Driven Emotional Video Portraits |
CVPR 2021 |
|
|
Emotion |
| 2020 |
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild |
ACM Multimedia 2020 |
|
|
- |
| 2020 |
Talking-head Generation with Rhythmic Head Motion |
ECCV 2020 |
|
|
- |
| 2020 |
Speaker-Aware Talking-Head Animation |
SIGGRAPH Asia 2020 |
|
|
- |
| 2020 |
Neural Voice Puppetry: Audio-driven Facial Reenactment |
ECCV 2020 |
|
|
- |
| 2020 |
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation |
ECCV 2020 |
|
|
- |
| 2020 |
Realistic Speech-Driven Facial Animation with GANs |
IJCV 2020 |
|
|
- |
| 2020 |
Multi Modal Adaptive Normalization for Audio to Video Generation |
ArXiv 2020 |
|
|
Audio-to-Video, Multi-Modal Adaptive Normalization, Facial Video Generation, Keypoint Heatmap |
| 2019 |
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation |
AAAI 2019 |
|
|
- |
| 2019 |
Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss |
CVPR 2019 |
|
|
- |
| 2018 |
Lip Movements Generation at a Glance |
ECCV 2018 |
|
|
- |
| 2018 |
VisemeNet: Audio-Driven Animator-Centric Speech Animation |
SIGGRAPH 2018 |
|
|
- |
| 2017 |
Synthesizing Obama: Learning Lip Sync From Audio |
SIGGRAPH 2017 |
|
|
- |
| 2017 |
You Said That? Synthesising Talking Faces From Audio |
BMVC 2017 |
|
|
- |
| 2017 |
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion |
SIGGRAPH 2017 |
|
|
- |
| 2017 |
A Deep Learning Approach for Generalized Speech Animation |
SIGGRAPH 2017 |
|
|
- |
| 2016 |
Lip Reading in the Wild |
ACCV 2016 |
|
|
- |