Tokkio 4.1#
App Variation#
Single stream LLM RAG app with OV Rendering
Three stream LLM RAG app with OV Rendering
Three stream LLM RAG app with UE Rendering
Three stream LLM RAG app with A2F-2D Rendering
Six stream LLM RAG app with OV Rendering
Three stream Retail app with OV Rendering
Six stream Retail app with OV Rendering
To let users switch between use cases and deployment scenarios without manually adjusting parameters and rebuilding the chart, the Tokkio application is now prebuilt in seven variations. Users may pick the one that best fits their situation or use it as a base for further customization.
The variations of the Tokkio application differ along three dimensions.
Number of Streams
The number of streams indicates how many concurrent sessions a single Tokkio deployment can handle. The Tokkio application supports from 1 to 6 concurrent streams, depending on the GPU resources available to the deployment. We provide common app configurations for 1, 3, and 6 streams. We recommend an A10, L4, or higher GPU for the best experience.
1 stream (compatible only on instances with 2x A10, L4, or T4 GPUs)
3 streams (compatible only on instances with 4x A10, L4, or T4 GPUs)
6 streams (compatible only on instances with 4x A10 or L4 GPUs)
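The stream and GPU pairings above can be captured in a small lookup table, which may help when checking an instance before choosing a variation. The following is a minimal sketch, not part of Tokkio: the names `GPU_REQUIREMENTS` and `supported_stream_counts` are hypothetical, and it assumes the listed GPU count is a minimum rather than an exact requirement, which the text does not confirm.

```python
# Hypothetical lookup table transcribed from the stream/GPU list above.
# Key: stream count. Value: (GPU count, GPU types named in the list).
GPU_REQUIREMENTS = {
    1: (2, {"A10", "L4", "T4"}),
    3: (4, {"A10", "L4", "T4"}),
    6: (4, {"A10", "L4"}),
}


def supported_stream_counts(gpu_type: str, gpu_count: int) -> list[int]:
    """Return the prebuilt stream counts an instance could host.

    Assumption: the GPU count in the table is treated as a minimum.
    Only GPU types explicitly named in the list are recognized.
    """
    return sorted(
        streams
        for streams, (min_gpus, gpu_types) in GPU_REQUIREMENTS.items()
        if gpu_type in gpu_types and gpu_count >= min_gpus
    )


if __name__ == "__main__":
    print(supported_stream_counts("T4", 4))
    print(supported_stream_counts("L4", 4))
```

Under these assumptions, a T4 instance never qualifies for the 6 stream configuration, which matches the list above.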
Agent
The Tokkio application currently supports two types of agent and interaction flow: retail and LLM/RAG. The user interaction differs considerably between the two, and each agent has clear service boundaries at runtime. For example, the retail agent cannot fulfill queries that are unrelated to the retail application (the default example provided is for food ordering). See below for more information on agents.
The UI should also match the type of agent deployed on the backend. It can show either a full-screen avatar layout (default) or a Retail layout with a touch menu. See the Tokkio Frontend section for more information on the UI options available.
Rendering
Users may choose one of three rendering options: NVIDIA Omniverse, Unreal Engine, or NVIDIA A2F-2D. Each option affects the avatar animation and the resource footprint differently.
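Only seven of the possible combinations of stream count, agent, and rendering are prebuilt, so it can be useful to check whether a desired combination exists before planning a deployment. The sketch below encodes the seven variations listed at the top of this section. The `Variation` type and `find_variation` helper are hypothetical names for illustration and do not correspond to any Tokkio API or chart name.

```python
from typing import NamedTuple, Optional


class Variation(NamedTuple):
    streams: int     # 1, 3, or 6
    agent: str       # "LLM RAG" or "Retail"
    rendering: str   # "OV", "UE", or "A2F-2D"


# The seven prebuilt variations, transcribed from the list in this section.
VARIATIONS = [
    Variation(1, "LLM RAG", "OV"),
    Variation(3, "LLM RAG", "OV"),
    Variation(3, "LLM RAG", "UE"),
    Variation(3, "LLM RAG", "A2F-2D"),
    Variation(6, "LLM RAG", "OV"),
    Variation(3, "Retail", "OV"),
    Variation(6, "Retail", "OV"),
]


def find_variation(streams: int, agent: str, rendering: str) -> Optional[Variation]:
    """Return the matching prebuilt variation, or None if none is prebuilt."""
    wanted = Variation(streams, agent, rendering)
    return wanted if wanted in VARIATIONS else None


if __name__ == "__main__":
    print(find_variation(3, "Retail", "OV"))
    print(find_variation(6, "LLM RAG", "UE"))
```

A combination that returns `None` here, such as six streams with UE rendering, would need one of the prebuilt variations as a base for further customization.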
Notable updates for reference workflows#
Colang 2.0 Beta support
Enhanced End-of-utterance and Barge-in support
Vision AI - User attention support
Catalog RAG
An iframe added to the UI lets users embed an avatar video into any website
User attention indicator added to the UI
Please refer to the ACE Agent Release Notes for notable updates to the ACE Agent used in the Tokkio 4.1.0 release.
Known Issues#
Reference workflows are provided as-is and for reference only. They have limitations in terms of functionality. Known issues for both reference workflows are listed in their respective documents.
Intermittent stutters observed with A2F-2D deployed on GCP L4
DS vision pod intermittently fails to work after a fresh installation or after the instance is stopped and started
Tokkio LLM-RAG - A2F-2D - To avoid stuttering in the avatar's initial greeting speech, the user needs to enter the FOV only after the avatar has loaded on the UI
Tokkio LLM-RAG - Unreal Engine - During the first run after deployment, TTS responses from the avatar might be missing for the first few seconds. The issue no longer occurs once the avatar starts speaking
Triton pod crashes on a fresh deployment on a T4 GPU with the Parakeet model. See Troubleshooting for more details.
Some of the microservices may need to be restarted after several hours of deployment.