Nvidia AI researchers have introduced an AI model that generates talking heads for video conferencing from a single 2D image and supports a wide range of manipulations, from rotating and moving a person’s head to motion transfer and video reconstruction. The model treats the first frame of a video as the source 2D photo and uses an unsupervised learning method to extract 3D keypoints from the video. In addition to outperforming other approaches in tests on benchmark datasets, the AI achieves H.264-quality video using one-tenth of the bandwidth previously required.
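The bandwidth savings come from transmitting the source frame once and then sending only compact keypoint data for each subsequent frame, which the receiver uses to resynthesize video. Below is a minimal Python sketch of that flow; the function names, keypoint count, and frame size are illustrative assumptions, not Nvidia's released code.

```python
# Sketch of the keypoint-based conferencing idea described above.
# `extract_3d_keypoints` and `synthesize_frame` are hypothetical stand-ins
# for the paper's learned networks, not a real API.
import numpy as np

NUM_KEYPOINTS = 20           # assumed keypoint budget, for illustration only
FRAME_SHAPE = (256, 256, 3)  # assumed frame resolution, for illustration only

def extract_3d_keypoints(frame: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the unsupervised 3D keypoint detector."""
    return np.zeros((NUM_KEYPOINTS, 3), dtype=np.float32)

def synthesize_frame(source: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the generator that warps the source image."""
    return source.copy()

# Sender side: ship the source frame once, then only keypoints per frame.
source_frame = np.zeros(FRAME_SHAPE, dtype=np.uint8)
keypoint_payload = extract_3d_keypoints(source_frame).nbytes  # a few hundred bytes
raw_frame_bytes = source_frame.nbytes                         # ~196 KB uncompressed

# Receiver side: reconstruct each frame from the stored source + incoming keypoints.
reconstructed = synthesize_frame(source_frame, extract_3d_keypoints(source_frame))

print(f"keypoint payload per frame: {keypoint_payload} bytes")
print(f"raw frame size:             {raw_frame_bytes} bytes")
```

In this setup the per-frame payload scales with the number of keypoints rather than the pixel count, which is the intuition behind the reported bandwidth reduction.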
Nvidia research scientists Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu published a paper about the model Monday on the preprint repository arXiv. Results show the new model outperforms few-shot vid2vid, a GAN detailed in a paper published at NeurIPS last year on which Wang was lead author and Liu a coauthor.
“By modifying the keypoint transformation only, we