Tokkio 5.0 Beta#

Tokkio 5.0 Beta introduces a refreshed reference workflow that enables seamless integration of digital humans with an LLM, RAG, or agentic application, supporting one to three concurrent users per deployment depending on available GPU resources.

Notable Updates#

  • New default Avatar “Aki” featuring the Unreal Renderer microservice and ElevenLabs voice integration.

  • Native support for any RAG built using the Enterprise RAG Blueprint as the knowledge source for digital humans.

  • Streamlined digital human development pipeline powered by the Python-native ACE Controller microservice as the new orchestrator.

  • Easy pipeline orchestration, low-latency data transport, and avatar pipeline construction through the ACE Controller microservice.

  • All new NVIDIA Pipecat Python extension library to create seamless interactive digital human experiences.

  • Configurable extensions with support for TTS, ASR, LLM and RAG.

  • Vision AI pipeline is temporarily not supported in this release.

  • All new developer workflow to allow a configurable interactive avatar experience with easy parameter configuration, tuning and tracing using new developer tools:

    > Local workflow: For easy local iteration and configuration of the ACE Controller.

    > ACE Configurator & corresponding VS Code plugin: Seamless iteration and configuration of the pipeline and avatar configurations straight from VS Code IDE.

    > Observability tools: OpenTelemetry support for end-to-end logging, tracing and visualization.

  • Composable Tokkio UI to accommodate multi-modal experiences, with elegant data presentation and support for iframe embedding and full-screen applications.

  • Deployment scripts for quick install on Bare-metal, AWS, GCP and Azure.

  • Verbose logging support to help with debugging.

  • Support for Terraform as an IaC option, with future alignment to use OpenTofu (tofu) as the default IaC binary.

  • Support for the --force-destroy option, which reduces uninstallation time on CSPs.

  • Cloud Native Stack 14.0 support with latest GPU Operator and Kubernetes versions.

  • Decomposition guide to support the creation of bespoke scripts for custom platforms and infrastructure.

  • Support for disaggregated deployment and API integration for Audio2Face-3D and RIVA Speech outside of the application to leverage existing deployments.
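The ACE Controller orchestrates the avatar pipeline as a chain of frame processors in the Pipecat style (ASR, LLM/RAG, TTS). The sketch below illustrates that frame-flow model with a minimal, self-contained stand-in; the class names (`Frame`, `FrameProcessor`, `StubASR`, and so on) are hypothetical illustrations, not the actual NVIDIA Pipecat API:

```python
# Illustrative sketch of a Pipecat-style frame-processor chain
# (ASR -> LLM -> TTS). All names here are hypothetical stand-ins,
# not the real NVIDIA Pipecat library API.
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str  # e.g. "audio" or "text"
    data: str


class FrameProcessor:
    """Base stage: transforms a frame, then hands it to the next stage."""

    def __init__(self):
        self.next = None

    def process(self, frame: Frame) -> Frame:
        out = self.transform(frame)
        return self.next.process(out) if self.next else out

    def transform(self, frame: Frame) -> Frame:
        return frame


class StubASR(FrameProcessor):
    def transform(self, frame):
        return Frame("text", f"transcript({frame.data})")


class StubLLM(FrameProcessor):
    def transform(self, frame):
        return Frame("text", f"reply({frame.data})")


class StubTTS(FrameProcessor):
    def transform(self, frame):
        return Frame("audio", f"speech({frame.data})")


def build_pipeline(stages):
    """Link the stages into a single chain and return its head."""
    for upstream, downstream in zip(stages, stages[1:]):
        upstream.next = downstream
    return stages[0]


pipeline = build_pipeline([StubASR(), StubLLM(), StubTTS()])
result = pipeline.process(Frame("audio", "hello"))
print(result.data)  # speech(reply(transcript(hello)))
```

In the real workflow, the ACE Controller plays the role of `build_pipeline` and the runner, while the configurable TTS, ASR, LLM, and RAG extensions fill the stage slots.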

Known Issues#

  • The avatar may appear frozen or inactive on the very first interaction after a fresh deployment. A page refresh resolves this issue.

  • The Unreal Renderer pod may restart for several reasons, such as the system losing access to GPUs, multiple application refreshes during an interaction, or the app remaining idle for some time on A40 and T4 GPU instances. To resolve this issue, restart the pod and, if necessary, follow with a restart of the ue-renderer SDR pod.

  • Updating bot.py to use a 48 kHz sample rate for TTS audio and A2F, and modifying the VMS and animation graph configuration maps to the same 48 kHz rate, causes the Unreal Renderer pod to crash when the UI is accessed, because the renderer only supports 16 kHz audio.

  • Reasoning models such as Gemma, DeepSeek, o1-mini, and o1-preview do not generate a response to any query.

  • The bot continues to respond to user questions even after the stop button is pressed. This issue arises because the frame processors (ASR, NVIDIAContextAggregator) do not reset their state when the user leaves or when a StartInterruptionFrame is pushed.

  • There is an intermittent mismatch between TTS audio and TTS transcript.

  • When the start page is enabled in the UI, the microphone icon may display an incorrect state after consecutive sessions. To resolve this, refresh the Tokkio UI and proceed as usual. This issue does not occur when the start page is disabled.

  • For some stream_ids, the Pipecat pipeline fails to initialize properly, preventing conversations on those pipelines. This issue occurs when the start frame is not received by the frame processor, resulting in ElevenLabs initialization errors.

  • The NVIDIA Pipecat library is compatible only with Python 3.12.

  • There is a known concurrency bug in the Unreal Renderer microservice start-up script that might corrupt the Unreal project on the persistent volume, causing the pods to crash loop. The issue may rarely occur when multiple Unreal Renderer pods are restarted at the same time. The pods can be brought back to life by deleting the file assembledProject/ngc_resource_path.txt in the persistent volume (claim name “ia-unreal-renderer-microservice-assets”). This forces the project to be reassembled. After about 5 minutes, all pods should be back in the running state.

  • When interacting with the Avatar, it has been observed that ElevenLabs TTS does not correctly pronounce certain numbers, such as those in the thousands and hundreds.

  • The latency measurements for the Tokkio NVIDIA LLM Service and ElevenLabs TTS are currently inaccurate: they account only for sending the payload and omit receiving the response.

  • A memory leak (system RAM) was observed on unreal-renderer and ace-controller pods during a 24-hour stability test.

  • The Avatar’s mouth occasionally remains open with the LLM RAG or default application after changes are uploaded using the ACE Configurator.

  • When the JWT token expires during infrastructure creation, GCP scripts fail to generate output.

  • NVIDIA GPU Operator pods fail to execute when the infrastructure installation is performed with CNS 14.0 and the master branch set as the git_ref using the AWS one-click scripts.

  • Inconsistent ASR: the initial words of user utterances are frequently omitted.

  • The unreal-renderer microservice crashes in applications deployed with nvidia_driver_version “550.90.07” and “550.127.08”.

  • The VMS pod restarts frequently during the stability test due to a segmentation fault.
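Because the Unreal Renderer only consumes 16 kHz audio, a mismatched 48 kHz setting anywhere in the pipeline (bot.py, VMS, or the animation graph configuration maps) will crash the renderer pod. A small pre-deployment check can catch that misconfiguration early. The function name and configuration layout below are illustrative, not part of the Tokkio API:

```python
# Hedged sketch: flag pipeline components whose configured audio sample
# rate the Unreal Renderer cannot consume (it supports only 16 kHz).
# The config dict layout and function name are hypothetical.
UNREAL_RENDERER_SUPPORTED_HZ = 16000


def check_sample_rates(config: dict) -> list:
    """Return the names of components configured with an unsupported rate."""
    return [
        name for name, rate in config.items()
        if rate != UNREAL_RENDERER_SUPPORTED_HZ
    ]


# Example: a bot.py edit bumped TTS audio to 48 kHz, which would crash
# the renderer pod when the UI is accessed.
config = {"tts_audio": 48000, "a2f": 16000, "animation_graph": 16000}
print(check_sample_rates(config))  # ['tts_audio']
```

Running such a check before rolling out configuration changes avoids discovering the mismatch only when the UI is opened and the renderer pod crashes.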