Alibaba Introduces EMO Neural Network that Animates Portraits to Talk and Even Sing

EMO: An AI that Animates Static Images Developed by Alibaba Researchers

Researchers at Alibaba's Institute for Intelligent Computing have developed EMO, an artificial intelligence system capable of animating a static image of a person, making the subject speak or sing realistically.

Realistic Facial Expressions and Head Movements

EMO renders authentic head movements and facial expressions corresponding to the emotional nuances of the soundtrack used to generate the animation. Linrui Tian, the leader of the development team, explained, “Traditional methods often fail to capture the full range of human facial expressions and the uniqueness of individual styles. To address these issues, we propose EMO — a new framework that employs the direct synthesis approach from audio to video, bypassing the need for intermediate 3D models or facial landmarks.”

Creating Animation Directly From Audio

At the core of EMO is a diffusion model, a class of AI systems known for generating realistic images. The researchers trained the model on a dataset comprising over 250 hours of ‘talking head’ video footage, including speeches, movie clips, TV shows, and vocal performances. Unlike previous methods that rely on intermediate 3D models or landmark-based mechanisms to mimic human expressions, EMO converts audio directly into a video sequence. This allows the system to capture the subtle motions and personality traits associated with natural speech.
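EMO's code has not been released, but the control flow described above — encode the audio, then iteratively denoise each video frame conditioned on the corresponding audio features — can be illustrated with a deliberately simplified sketch. Everything below (the statistics-based "audio encoder", the hand-written "denoiser") is an invented placeholder, not the real model:

```python
import numpy as np

def audio_features(audio, n_frames, dim=8):
    """Toy stand-in for an audio encoder: split the waveform into
    n_frames chunks and summarize each with simple statistics."""
    chunks = np.array_split(audio, n_frames)
    return np.stack([
        np.resize([c.mean(), c.std(), np.abs(c).max()], dim)
        for c in chunks
    ])  # shape: (n_frames, dim)

def denoise_step(frame, cond, t, rng):
    """One toy reverse-diffusion step: nudge the noisy frame toward a
    conditioning-dependent target while injecting shrinking noise."""
    target = np.tanh(cond.mean())  # stand-in for a learned predictor
    noise = rng.standard_normal(frame.shape)
    return frame + 0.5 * (target - frame) + 0.1 * t * noise

def audio_to_video(audio, n_frames=4, frame_shape=(2, 2), steps=10, seed=0):
    """Generate one (tiny) frame per audio chunk by iterative denoising,
    each frame conditioned on that chunk's audio features."""
    rng = np.random.default_rng(seed)
    frames = []
    for cond in audio_features(audio, n_frames):
        frame = rng.standard_normal(frame_shape)  # start from pure noise
        for t in reversed(range(1, steps + 1)):
            frame = denoise_step(frame, cond, t / steps, rng)
        frames.append(frame)
    return np.stack(frames)  # shape: (n_frames, H, W)

video = audio_to_video(np.sin(np.linspace(0, 20, 1000)))
```

The key point the sketch mirrors is architectural: the audio features condition the denoiser directly, with no intermediate 3D face model or landmark stage in between.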

Superior Performance of EMO

The creators of EMO assert that it surpasses existing methods in video quality, identity preservation, and expressiveness. A focus group survey found the videos created by EMO more natural and emotional than those produced by other systems. The system can generate animation not only from speech but also from vocal soundtracks: it takes into account the shape of the person’s mouth in the original image, adds appropriate facial expressions, and synchronizes the movements with the vocal performance. The main concern with EMO is the potential for misuse of the technology; the researchers plan to investigate methods for identifying videos created by AI.

This post was last modified on 02/29/2024

Harry Males is a news writer at Dave's iPAQ, covering Software, AI, Cybersecurity, and Cryptocurrency.