STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits
Foivos Paraperas Papantoniou1 Stathis Galanakis1 Rolandos Alexandros Potamias1
Bernhard Kainz1,2 Stefanos Zafeiriou1
1Imperial College London, UK 2FAU Erlangen-Nürnberg, Germany
STARCaster is a spatio-temporal autoregressive video diffusion framework for speech-driven portrait animation and continuous view synthesis. Leveraging strong identity guidance, it further supports subject-consistent yet reference-free talking portraits, allowing recontextualization beyond the constraints of the input image.
- [Dec 2025] Paper released on arXiv.
- Code & pretrained models will be released soon. Stay tuned!
If you find STARCaster useful for your research, please consider citing us:
@article{paraperas2025starcaster,
  title={STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits},
  author={Paraperas Papantoniou, Foivos and Galanakis, Stathis and Potamias, Rolandos Alexandros and Kainz, Bernhard and Zafeiriou, Stefanos},
  journal={arXiv preprint arXiv:2512.13247},
  year={2025}
}

⭐ If you are interested in this project, please consider starring the repository to receive updates!
