Release Notes

v1.3.0

SDK Versions

  • Audio2Face & Audio2Emotion: 0.5.1

Audio2Face and Audio2Emotion were merged into a single SDK.

Features

  • Added optional tongue blendshape support in the gRPC animation output. For an example of how to use this, see the Flexible Configuration Management section in the Audio2Face-3D NIM Container Deployment and Configuration Guide.

  • Added TLS/mTLS support for secure gRPC communication (see the client-side sketch after this list).

  • Added one-click deployment for Azure and AWS.

  • Added auto-profile selection and a command to list available profiles. For more information, see the Optimized Configurations section under the Support Matrix.
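
A minimal client-side sketch of opening a secure channel with Python's grpc package. The endpoint address and certificate paths below are placeholders, not values shipped with the service; for server-only TLS, pass just the CA certificate and omit the client key/certificate pair.

```python
import grpc

# Placeholder paths: the CA certificate that signed the server's
# certificate, plus a client key/cert pair if the server requires mTLS.
with open("ca.crt", "rb") as f:
    ca_cert = f.read()
with open("client.key", "rb") as f:
    client_key = f.read()
with open("client.crt", "rb") as f:
    client_cert = f.read()

# TLS only: grpc.ssl_channel_credentials(root_certificates=ca_cert)
# mTLS: also supply the client's private key and certificate chain.
credentials = grpc.ssl_channel_credentials(
    root_certificates=ca_cert,
    private_key=client_key,
    certificate_chain=client_cert,
)

# Placeholder target; substitute the deployed service's address and port.
channel = grpc.secure_channel("a2f-3d.example.com:52000", credentials)
```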

Limitations

  • Deploying mark_v2.3 with the same number of streams as james_v2.3 or claire_v2.3 may lead to CUDA out-of-memory errors, because mark_v2.3 uses more memory than the other two models. This issue has been observed on GPUs with lower VRAM, such as the 5080.

Deprecation Warnings

The following items are deprecated and will be removed in the next release:

  • The unidirectional gRPC endpoints. The bidirectional gRPC endpoint will become the only supported endpoint for inference.

  • The 1.0 version of the Audio2Face-3D Microservice.

  • The Python wheel used in our sample application will move to the PyPI repository; we will stop maintaining it at NVIDIA/Audio2Face-3D-Samples.

v1.2.0

SDK Versions

  • Audio2Face: 0.22.4

  • Audio2Emotion: 0.7.9

Features

  • The new service is now available as a downloadable NIM, seamlessly integrating into the NVIDIA NIM ecosystem.

  • The new James 2.3 inference model provides better lip-sync quality, stronger upper-face expressions for different emotions, and fewer lip-stretch artifacts during silence.

  • The new Claire 2.3 inference model provides better lip-sync quality, including the F, V, M, B, P, U, and S sounds, and stronger upper-face expressions for different emotions.

  • The new Mark 2.3 inference model provides better lip-sync quality, including the F, V, M, B, P, U, and S sounds.

  • Introduced support for bidirectional streaming with gRPC, enabling real-time communication between clients and the service while eliminating the need for the previously required A2F Controller. A minimal client sketch follows this list.

  • Added runtime control for clamping blendshape values between 0 and 1.

  • Integrated OpenTelemetry for advanced observability, providing unified tracing and metrics.

  • Added functionality to download pre-built TensorRT (TRT) engines from NVCF, reducing service setup complexity.

  • Introduced an experimental gRPC endpoint for exporting the configuration of a running service instance.

  • Updated the logging system to output application logs in structured JSON format.
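
As a sketch of the bidirectional pattern described above, the snippet below streams audio chunks up and consumes animation responses back on the same call. The module, stub, RPC, and field names (a2f_3d_pb2, A2F3DServiceStub, ProcessAudioStream, audio_buffer) are assumptions for illustration; the actual names are defined by the protos shipped with the microservice.

```python
import grpc

# Assumed generated-binding names, for illustration only.
from a2f_3d_pb2 import AudioStreamRequest
from a2f_3d_pb2_grpc import A2F3DServiceStub

def request_stream(pcm_chunks):
    """Client -> service direction: yield one request per audio chunk."""
    for chunk in pcm_chunks:
        yield AudioStreamRequest(audio_buffer=chunk)

def run(target, pcm_chunks):
    # A secure channel (as sketched under v1.3.0) can be substituted here.
    with grpc.insecure_channel(target) as channel:
        stub = A2F3DServiceStub(channel)
        # A single bidirectional call: requests stream up while animation
        # responses stream back, with no A2F Controller in between.
        for response in stub.ProcessAudioStream(request_stream(pcm_chunks)):
            print(response)  # e.g. apply blendshapes / queue output audio
```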

v1.0.0

SDK Versions

  • Audio2Face: 0.17.0

  • Audio2Emotion: 0.2.2

Features

  • The new Claire 1.3 inference model provides enhanced lip movement and better accuracy for P and M sounds.

  • The new Mark 2.2 inference model provides better lip sync and facial performance quality when used with MetaHuman characters.

  • Users can now specify preferred emotions, enabling personalized outputs tailored to specific applications such as interactive avatars and virtual assistants.

  • Added emotional output to the microservice to help align other downstream animation components.

  • New output audio sampling rates are supported in addition to 16 kHz: 22.05 kHz, 44.1 kHz, and 48 kHz.

  • Added the ability to tune each stream at runtime with unique face parameters, emotion parameters, blendshape multipliers, and blendshape offsets (illustrated after this list).
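
A sketch of what per-stream runtime tuning could look like as a plain data structure. Every key and value below is hypothetical, chosen only to illustrate the four control groups named above; the authoritative parameter names live in the service's protos and configuration guide.

```python
# Hypothetical per-stream tuning payload; field names are illustrative,
# not the service's actual schema.
stream_tuning = {
    "face_parameters": {"lower_face_smoothing": 0.006},
    "emotion_parameters": {
        # Preferred emotions bias the emotional state for this stream.
        "preferred_emotion": {"joy": 0.8, "amazement": 0.2},
        "emotion_strength": 0.6,
    },
    # Presumably applied per blendshape as value * multiplier + offset.
    "blendshape_multipliers": {"JawOpen": 1.2},
    "blendshape_offsets": {"BrowInnerUp": 0.05},
}
```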

Key Improvements

  • Improved the gRPC protocol to use less data and provide a more efficient stream for scalability. The USD parser is no longer required.

  • Improved blendshape-solve threading for better scalability.