ACE Release Notes

24.06 Release

ACE 24.06 introduces general availability for many of the components within our suite of digital human technologies. As we move our microservices to NIM, ACE microservices will be available through NVIDIA AI Enterprise, and workflow examples can be found on our new GitHub repository.

NIMs for Digital Human and ACE Microservices

NIMs are the core technologies supporting our suite of digital human microservices. These microservices can be incorporated into existing digital human platforms and frameworks or used directly in your application.

NVIDIA AI Enterprise Supported

Riva ASR 2.15.1
  • New Features

    • Added ASR Parakeet-ctc-1.1b English (Default)

    • Parakeet-ctc-0.6b-unified English (beta)

    • Parakeet-ctc-1.1b-unified-ml-cs EMEA models (beta)

  • Key Improvements

    • ASR Parakeet-ctc-1.1b English (Default): Higher accuracy (lower WER) and better robustness for accented English speech

    • Parakeet-ctc-0.6b-unified English (beta): Support for low-latency punctuated transcripts

    • Parakeet-ctc-1.1b-unified-ml-cs EMEA models (beta): Support for EMEA multilingual, code-switch, and low-latency punctuated transcripts
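
The accuracy gains above are quoted as word error rate (WER). As a reminder of what that metric measures, here is a minimal, illustrative WER computation (standard word-level edit distance; not NVIDIA or Riva code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words -> 1/6
```

A lower WER means fewer insertions, deletions, and substitutions per reference word; a production evaluation would also normalize casing and punctuation before scoring.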

Riva TTS 2.15.1
  • New Features

    • TTS scaling:

      • German (Male)

      • European Spanish (Male, Female)

      • Mandarin (Male, Female)

      • Italian (Male, Female)

      • Latin-American Spanish (Male, Female)

    • P-Flow (zero-shot) beta release

  • Key Improvements

    • Fixed an issue that could cause breaks in audio synthesized with the RADTTS++ (beta) emotion mixing model.

Riva NMT 2.15.1
  • New Features

    • Added the Megatron 1.5B any-to-any NMT translation model.

  • Key Improvements

    • Support for direct non-English translation with high accuracy for Spanish, Chinese, Japanese, French, German, and Russian.

Audio2Face 1.0.11
  • New Features

    • New Claire 1.3 inference model provides enhanced lip movement and better accuracy for P and M sounds.

    • New Mark 2.2 inference model provides better lip sync and facial performance quality when used with Metahuman characters.

    • Added emotional output to the microservice to help align other downstream animation components.

    • New output audio sampling rates supported in addition to 16kHz: 22.05kHz, 44.1kHz, 48kHz.

    • Added the ability to tune each stream at runtime with unique face parameters, blendshape multipliers, and blendshape offsets.

  • Key Improvements

    • Improved the gRPC protocol to use less data and provide a more efficient stream for scalability. USD parser is no longer required.

    • Improved blendshape solve threading to improve scalability.
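
The additional output sampling rates are rate conversions of the same synthesized stream. As a rough illustration of what such a conversion involves (simple linear interpolation; not the microservice's actual resampler), consider:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono PCM float stream by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate        # fractional position in source
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

tone = [0.0, 1.0, 0.0, -1.0] * 4             # toy 16 kHz buffer
out = resample_linear(tone, 16_000, 48_000)
print(len(out))                               # 3x the input length
```

A production pipeline would use a proper band-limited resampler; this sketch only shows why 22.05 kHz, 44.1 kHz, and 48 kHz outputs can all be derived from the same source audio.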

Omniverse Renderer Microservice 1.0.1
  • New Features

    • New animation data protocol and gRPC and HTTP endpoints

    • Cleaned up microservice parameters

  • Key Improvements

    • Various stability, logging, debugging, and error handling improvements

Animation Graph Microservice 1.0.1
  • New Features

    • Added support for avatar position and facial expression animations

    • New animation data protocol and gRPC and HTTP endpoints

    • Cleaned up microservice parameters

  • Key Improvements

    • Various stability, logging, debugging, and error handling improvements

ACE Agent 4.0.0
  • New Features

    • Bot response latency reduction with LLM output streaming support.

    • Support for Colang 2.0 and UMIM async event interface with enhanced control over avatar actions. Colang 1.1 support is also maintained.

    • Ability to add speech support to any custom-built RAG- or LLM-based pipeline without any Colang-based logic.

    • Prebuilt support for NVIDIA GenerativeAIExamples RAG workflows.

  • Key Improvements

    • Improved integration support for agents and RAG pipelines built with LangChain, LangGraph, or other frameworks

    • Support for LLM models hosted on https://build.nvidia.com/
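
LLM output streaming reduces perceived bot response latency because downstream components (TTS, for example) can start working on the first tokens instead of waiting for the full completion. A toy sketch of the idea (a hypothetical stand-in, not the ACE Agent API):

```python
import time

def fake_llm_stream(text, per_token_delay=0.0):
    """Stand-in for a streaming LLM endpoint: yields one token at a time."""
    for token in text.split():
        time.sleep(per_token_delay)   # simulate per-token generation time
        yield token + " "

# With streaming, the first chunk can be handed to TTS after a single
# token's delay instead of after the whole response is generated.
chunks = list(fake_llm_stream("Hello there, how can I help?"))
first_chunk = chunks[0]
print(first_chunk.strip())
```

The latency win scales with response length: time-to-first-audio stays roughly constant while non-streaming pipelines pay for every token up front.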

Early Access Microservices

SpeechLivePortrait 0.1.0
  • New Features

    • All-new Speech Live Portrait microservice that animates a person’s portrait photo from audio input, matching the lip motion to the speech.

    • Supports facial characteristics including lip sync, blinking, and head pose animation.

    • Supports two modes: quality mode for higher visual fidelity and performance mode for faster runtime in real-time streaming.

    • Algorithmic latency of 198 ms for model priming, with streaming performance for 30 FPS output as follows:

      • Performance mode

        • Latency: 22ms (L4), 9.62ms (L40)

        • Throughput: 1 concurrent stream (L4), 3 concurrent streams (L40)

      • Quality mode (intended for offline enhancements)

        • Latency: 57.80ms (L4), 20ms (L40)

        • Throughput: 0 concurrent streams (L4), 1 concurrent stream (L40)
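
These figures can be sanity-checked against the 30 FPS frame budget (about 33.3 ms per frame): a mode/GPU combination can only sustain real-time streaming when its per-frame latency fits inside that budget, and the throughput limits above still apply on top. A quick check using the numbers from the list (a rough reading aid, not an official sizing guide):

```python
FRAME_BUDGET_MS = 1000 / 30          # ~33.3 ms per frame at 30 FPS

# Latency figures from the list above (ms per frame)
latencies = {
    ("performance", "L4"): 22.0,
    ("performance", "L40"): 9.62,
    ("quality", "L4"): 57.80,
    ("quality", "L40"): 20.0,
}

for (mode, gpu), ms in latencies.items():
    verdict = "fits" if ms < FRAME_BUDGET_MS else "exceeds"
    print(f"{mode}/{gpu}: {ms} ms {verdict} the 30 FPS frame budget")
```

Quality mode on L4 exceeds the per-frame budget, which is consistent with quality mode being intended for offline enhancement rather than live streaming.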

Nemotron 4.5B SLM 0.1.0
  • New Features

    • New Small Language Model (SLM) designed for on-device conversational inference.

    • Includes INT4 quantization for minimal VRAM usage.

    • New NVIDIA AIM plugin available through our ACE Early Access program.

    • Support for role play and RAG use cases.

VoiceFont 1.1.1
  • New Features

    • New low-latency model with reduced algorithmic latency of 170 ms for real-time use cases

    • Supports 4 concurrent batches on all GPUs

    • Support added for Hopper GPUs (H100)

ACE Reference Workflows

Reference workflows showcase how the microservices can be used to build digital humans for particular use cases. These workflows are designed to be examples rather than full solutions.

Customer Service Workflow

In this release, the customer service workflow (Tokkio) includes workflows that leverage NVIDIA-LLM enterprise RAG integrated with Riva and avatar animation microservices. You can leverage and customize this workflow, accelerating your development by connecting your own customized RAG to your customer service digital human.

New Reference Components
  • Generative AI examples for QSR and LLM-based apps

  • Avatar configurator tool

  • ACE agent Quick Start Script

  • Helm chart for animation pipeline

  • Default screen for animation pipeline

  • Template scene for customer avatar

Game Character Workflow

This release delivers our first gaming reference workflow, focused on enabling the Audio2Face microservice within games. The included Unreal Engine plugin can be integrated into existing game character platforms or used directly within the game.

New Reference Components
  • Unreal Engine Plugin with Audio2Face

New Samples
  • Audio2Face configuration sample that uses an NVCF API Key for inference and showcases the usage of the Unreal Engine Plugin.

Tools

ACE Tools help build and create custom workflows and extend reference functionality.

UCS Tools 2.5

  • New Features

    • UCS Tools can now deploy apps to NVCF

    • Support for HorizontalPodAutoscaler in k8s deployment and statefulsets

    • Microservices in a UCS App can now interface with the NVIDIA k8s RAG Operator to implicitly connect to and deploy NeMo microservices

    • The same microservice can now be used multiple times in the same UCS App, using Helm chart aliasing

  • Key Improvements

    • Updated logging such that more detail is provided upon errors

    • SemVer Pre-release versions are now supported in Helm Charts and MS specifications. Examples include 1.0.0-alpha.1, 1.0.0-rc.1, etc.

    • In the ucf.k8s.service component, users can now set the ‘port’ value to a parameter defined in the ‘params’ block of manifest.yaml, e.g. port: $params.servicePort

    • StatefulSets must now use the new statefulSetServiceName parameter to set the service name, which in turn sets the StatefulSet.spec.serviceName field on the K8s resource.
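
The $params.servicePort reference above is resolved against the ‘params’ block of manifest.yaml at build time. A toy illustration of that kind of substitution (not UCS Tools' actual implementation):

```python
import re

def resolve_params(value, params):
    """Replace $params.<name> references with values from a params mapping."""
    def sub(match):
        key = match.group(1)
        if key not in params:
            raise KeyError(f"undefined param: {key}")
        return str(params[key])
    return re.sub(r"\$params\.(\w+)", sub, str(value))

# e.g. a manifest.yaml whose params block defines servicePort: 8080
manifest_params = {"servicePort": 8080}
print(resolve_params("$params.servicePort", manifest_params))  # "8080"
```

Raising on an undefined parameter mirrors the useful behavior of failing the build early rather than deploying a service with an unresolved port.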

Avatar Configurator 1.0.1

  • New Features

    • Added Ferret base avatar

    • Added alternative hairstyle

    • Added apron clothing option