Amazing. “a text-to-speech network based on Char2Wav, a time-delayed LSTM to generate mouth-keypoints synced to the audio, and a network based on Pix2Pix to generate the video frames conditioned on the keypoints.”
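For readers unfamiliar with the "time-delayed LSTM" in the quote, here is a minimal sketch of one common reading of the term: at step t the network reads audio frame t but is trained to emit the mouth keypoints for frame t - delay, so it has already seen a few frames of later audio before committing to a mouth shape. The function name `align_with_delay`, the string placeholders, and the delay of 2 are hypothetical illustrations, not values or code from the paper.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")  # one frame of audio features (placeholder type)
K = TypeVar("K")  # one frame of mouth keypoints (placeholder type)


def align_with_delay(
    audio: Sequence[A], keypoints: Sequence[K], delay: int
) -> List[Tuple[A, Optional[K]]]:
    """Pair each input step with its delayed training target.

    Step t gets audio[t] as input and keypoints[t - delay] as target.
    The first `delay` steps have no target (None), so a training loop
    would skip them when computing the loss.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(audio) != len(keypoints):
        raise ValueError("audio and keypoints must have the same length")

    n = len(audio)
    # Leading steps where the network is still only listening.
    padding: List[Optional[K]] = [None] * min(delay, n)
    # Targets shifted right by `delay` frames.
    shifted: List[Optional[K]] = list(keypoints[: max(n - delay, 0)])
    return list(zip(audio, padding + shifted))


if __name__ == "__main__":
    audio_frames = ["a0", "a1", "a2", "a3", "a4"]
    keypoint_frames = ["k0", "k1", "k2", "k3", "k4"]
    for step in align_with_delay(audio_frames, keypoint_frames, delay=2):
        print(step)
```

With a delay of 2, the keypoints for frame 0 are only requested once audio frames 0 through 2 have been read, which is the lookahead that the delay is meant to buy.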
ObamaNet: Photo-realistic lip-sync from text
Rithesh Kumar, Jose Sotelo, Kundan Kumar, Alexandre de Brebisson, Yoshua Bengio
Abstract: We present ObamaNet, the first architecture that generate...
Bookmarked by braitom, 2017/12/10 19:20