Alibaba Introduces EMO Neural Network that Animates Portraits to Talk and Even Sing

EMO: An AI that Animates Static Images Developed by Alibaba Researchers

Researchers at Alibaba’s Institute for Intelligent Computing have developed EMO, an artificial intelligence system that animates a static image of a person. Given a single portrait and an audio track, the system can make the person in the image appear to speak or sing realistically.

Realistic Mimicry and Head Movements

EMO renders head movements and facial expressions that match the emotional nuances of the audio track used to generate the animation. Linrui Tian, the leader of the development team, explained, “Traditional methods often fail to capture the full range of human facial expressions and the uniqueness of individual styles. To address these issues, we propose EMO, a new framework that employs the direct synthesis approach from audio to video, bypassing the need for intermediate 3D models or facial landmarks.”

Create Animation Directly From Sound

At the core of the EMO system is a diffusion model, a type of AI model known for generating realistic images. The researchers trained it on a dataset of more than 250 hours of ‘talking head’ video footage, including speeches, movie clips, TV shows, and vocal performances. Unlike previous methods that rely on 3D models or other mechanisms to mimic human expressions, EMO converts sound directly into a video sequence. This allows the system to capture the subtle motions and individual mannerisms associated with natural speech.

Superior Performance of EMO

The creators of EMO assert that it surpasses existing methods in video quality, identity preservation, and expressiveness. In a focus group survey, participants rated the videos created by EMO as more natural and emotional than those produced by other systems. The system can generate animation from singing as well as from speech. It takes into account the shape of the person’s mouth in the original image, adds appropriate facial expressions, and synchronizes the movements with the vocal performance. The main concern raised about EMO is the potential for misuse of the technology. The researchers plan to investigate methods for identifying videos created by AI.
