Release Notes

v1.2.0

SDK Versions

  • Audio2Face: 0.22.4

  • Audio2Emotion: 0.7.9

Features

  • The service is now available as a downloadable NIM, integrating into the NVIDIA NIM ecosystem.

  • New James 2.3 inference model provides better lip-sync quality, stronger upper-face expressions across emotions, and fewer lip-stretch artifacts during silence.

  • New Claire 2.3 inference model provides better lip-sync quality, including on F, V, M, B, P, U, and S sounds, and stronger upper-face expressions across emotions.

  • New Mark 2.3 inference model provides better lip-sync quality, including on F, V, M, B, P, U, and S sounds.

  • Introduced support for gRPC bidirectional streaming, enabling real-time communication between clients and the service and eliminating the need for the previously required A2F Controller.

  • Added runtime control for clamping blendshape values to the [0, 1] range.

  • Integrated OpenTelemetry for advanced observability, providing unified tracing and metrics.

  • Added functionality to download pre-built TensorRT (TRT) engines from NVCF, reducing service setup complexity.

  • Introduced an experimental gRPC endpoint for exporting the configuration of a running service instance.

  • Updated the logging system to output application logs in structured JSON format.
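To illustrate what structured JSON logging typically looks like, the sketch below wires a custom formatter into Python's standard `logging` module so that each record is emitted as one JSON object per line. The field names (`timestamp`, `level`, `logger`, `message`) are illustrative assumptions, not the service's actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),  # human-readable time of the event
            "level": record.levelname,             # e.g. "INFO", "ERROR"
            "logger": record.name,                 # logger that emitted the record
            "message": record.getMessage(),        # formatted log message
        }
        return json.dumps(payload)


# Attach the formatter to a handler so every log line becomes JSON.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("a2f-example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("stream opened")
```

Structured logs like this are straightforward for log aggregators to parse, which pairs naturally with the OpenTelemetry tracing and metrics mentioned above.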

v1.0.0

SDK Versions

  • Audio2Face: 0.17.0

  • Audio2Emotion: 0.2.2

Features

  • New Claire 1.3 inference model provides enhanced lip movement and better accuracy for P and M sounds.

  • New Mark 2.2 inference model provides better lip-sync and facial-performance quality when used with MetaHuman characters.

  • Users can now specify preferred emotions, enabling personalized outputs tailored to specific applications such as interactive avatars and virtual assistants.

  • Added emotional output to the microservice to help align other downstream animation components.

  • Added support for output audio sampling rates of 22.05 kHz, 44.1 kHz, and 48 kHz in addition to 16 kHz.

  • Added the ability to tune each stream at runtime with unique face parameters, emotion parameters, blendshape multipliers, and blendshape offsets.

Key improvements

  • Improved the gRPC protocol to use less data and stream more efficiently for scalability. A USD parser is no longer required.

  • Improved blendshape solver threading for better scalability.