Release Notes
v1.2.0
SDK Versions
- Audio2Face: 0.22.4
- Audio2Emotion: 0.7.9
Features
- The service is now available as a downloadable NIM that integrates into the NVIDIA NIM ecosystem.
- New James 2.3 inference model provides better lip sync quality, stronger upper-face expressions for different emotions, and fewer lip-stretch artifacts during silence.
- New Claire 2.3 inference model provides better lip sync quality, including for the F, V, M, B, P, U, and S sounds, and stronger upper-face expressions for different emotions.
- New Mark 2.3 inference model provides better lip sync quality, including for the F, V, M, B, P, U, and S sounds.
- Introduced support for bidirectional streaming with gRPC, enabling real-time communication between clients and the service while eliminating the need for the previously required A2F Controller. 
- Added runtime control for clamping blendshape values between 0 and 1. 
- Integrated OpenTelemetry for advanced observability, providing unified tracing and metrics. 
- Added functionality to download pre-built TensorRT (TRT) engines from NVCF, reducing service setup complexity. 
- Introduced an experimental gRPC endpoint for exporting the configuration of a running service instance.
- Updated the logging system to output application logs in structured JSON format. 
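As a rough illustration of what clamping blendshape values between 0 and 1 means, the sketch below applies the same operation on the client side. The function name `clamp_blendshapes` and the blendshape names are hypothetical and are not part of the service API; the service's own runtime control is not shown here.

```python
from typing import Dict


def clamp_blendshapes(
    weights: Dict[str, float], lo: float = 0.0, hi: float = 1.0
) -> Dict[str, float]:
    """Return a copy of `weights` with every value limited to [lo, hi]."""
    return {name: min(max(value, lo), hi) for name, value in weights.items()}


# Hypothetical frame of blendshape weights; two values fall outside [0, 1].
frame = {"jawOpen": 1.3, "mouthSmileLeft": -0.1, "browInnerUp": 0.42}
clamped = clamp_blendshapes(frame)
print(clamped)
```

Values already inside the range pass through unchanged, so enabling clamping only affects weights that overshoot in either direction.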
v1.0.0
SDK Versions
- Audio2Face: 0.17.0
- Audio2Emotion: 0.2.2
Features
- New Claire 1.3 inference model provides enhanced lip movement and better accuracy for P and M sounds. 
- New Mark 2.2 inference model provides better lip sync and facial performance quality when used with MetaHuman characters.
- Users can now specify preferred emotions, enabling personalized outputs tailored to specific applications such as interactive avatars and virtual assistants. 
- Added emotional output to the microservice to help align other downstream animation components. 
- Added support for output audio sampling rates of 22.05 kHz, 44.1 kHz, and 48 kHz, in addition to 16 kHz.
- Added the ability to tune each stream at runtime with unique face parameters, emotion parameters, blendshape multipliers, and blendshape offsets.
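To make the per-stream blendshape multipliers and offsets concrete, here is a minimal sketch of how such tuning could be applied to one frame of weights. It assumes each output is computed as `value * multiplier + offset`; that order of operations, the function name `tune_blendshapes`, the stream identifiers, and the blendshape names are all assumptions for illustration and are not taken from the service API.

```python
from typing import Dict


def tune_blendshapes(
    weights: Dict[str, float],
    multipliers: Dict[str, float],
    offsets: Dict[str, float],
) -> Dict[str, float]:
    """Scale each weight by its multiplier, then add its offset.

    Blendshapes missing from `multipliers` or `offsets` keep the neutral
    defaults (multiplier 1.0, offset 0.0). The multiply-then-add order is
    an assumption of this sketch.
    """
    return {
        name: value * multipliers.get(name, 1.0) + offsets.get(name, 0.0)
        for name, value in weights.items()
    }


# Hypothetical per-stream settings: each stream carries its own tuning.
stream_settings = {
    "stream-a": {"multipliers": {"jawOpen": 1.5}, "offsets": {"browInnerUp": 0.125}},
    "stream-b": {"multipliers": {"jawOpen": 0.5}, "offsets": {}},
}

frame = {"jawOpen": 0.5, "browInnerUp": 0.25}
tuned = {
    stream_id: tune_blendshapes(frame, cfg["multipliers"], cfg["offsets"])
    for stream_id, cfg in stream_settings.items()
}
print(tuned)
```

Keeping the settings keyed by stream means the same input frame can produce a more exaggerated face on one stream and a more subdued one on another without affecting either.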
Key improvements
- Improved the gRPC protocol to use less data and provide a more efficient stream for scalability. A USD parser is no longer required.
- Improved blendshape solve threading for better scalability.