Overview
NVIDIA Audio2Face-3D NIM (A2F-3D NIM) delivers generative AI avatar animation from audio and emotion inputs.
Audio2Face-3D NIM is a component of NVIDIA NIM™ and NVIDIA AI Enterprise. NVIDIA NIM™ offers containers for self-hosting GPU-accelerated inferencing microservices, enabling deployment of pretrained and customized AI models across clouds, data centers, and workstations.
The Audio2Face-3D NIM converts speech into facial animation in the form of ARKit Blendshapes, including emotional expression: the system automatically detects emotions in the input audio and captures key poses and shapes to reproduce the character's facial performance. Emotions can also be specified directly as part of the input to the A2F-3D NIM. A rendering engine can consume the Blendshape output to display a 3D avatar's performance, as sketched below.
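As a rough illustration of that last point, the sketch below shows how a renderer might map per-frame blendshape weights onto mesh morph targets. The `BlendshapeFrame` structure and `apply_frame` helper are hypothetical, not part of the NIM's published interface; only the ARKit blendshape names (such as `jawOpen` and `eyeBlinkLeft`) come from Apple's standard set.

```python
from dataclasses import dataclass

# Hypothetical per-frame output: each frame carries a weight in [0.0, 1.0]
# for ARKit blendshapes such as "jawOpen" or "eyeBlinkLeft".
@dataclass
class BlendshapeFrame:
    timestamp: float           # seconds from the start of the audio clip
    weights: dict[str, float]  # ARKit blendshape name -> activation weight

def apply_frame(morph_targets: dict[str, float], frame: BlendshapeFrame) -> None:
    """Drive a renderer's morph targets from one animation frame."""
    for name, weight in frame.weights.items():
        # Clamp defensively; renderers typically expect weights in [0, 1].
        morph_targets[name] = min(max(weight, 0.0), 1.0)

# Example: a frame captured mid-word, jaw half open, eyes nearly fully open.
targets: dict[str, float] = {}
apply_frame(targets, BlendshapeFrame(
    timestamp=0.033,
    weights={"jawOpen": 0.5, "eyeBlinkLeft": 0.02},
))
print(targets)  # {'jawOpen': 0.5, 'eyeBlinkLeft': 0.02}
```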
The Audio2Face-3D NIM supports multiple simultaneous input streams, enabling workflows in which many users connect and generate animation output at the same time; a sketch of this pattern follows.
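A minimal sketch of that multi-user pattern, assuming a placeholder `animate` client call rather than the NIM's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def animate(user_id: str, audio_path: str) -> str:
    # Placeholder for a real client call that streams this user's audio
    # to the service and collects the returned animation frames.
    return f"frames for {user_id} from {audio_path}"

# Each user maps to an independent stream; the service processes them in parallel.
jobs = {"user-1": "hello.wav", "user-2": "question.wav"}
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    results = list(pool.map(lambda item: animate(*item), jobs.items()))
print(results)
```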
The Audio2Face-3D NIM is also available as a microservice that can connect to other NVIDIA Unified Cloud Services Tools microservices supporting its endpoint protocols.
Try out the A2F-3D NIM experience on our demo website.
Note
If you are using Audio2Face-3D version 1.0, refer to the legacy documentation here.