Tokkio 4.1

App Variation

  1. Single stream LLM RAG app with OV Rendering

  2. Three stream LLM RAG app with OV Rendering

  3. Three stream LLM RAG app with UE Rendering

  4. Three stream LLM RAG app with A2F-2D Rendering

  5. Six stream LLM RAG app with OV Rendering

  6. Three stream Retail app with OV Rendering

  7. Six stream Retail app with OV Rendering

The Tokkio application is now prebuilt in seven variations, so users can switch between use cases and deployment scenarios without manually adjusting parameters and rebuilding the chart. Users can pick the variation that best fits their situation or use it as a base for further customization.

The variations of the Tokkio application differ along three dimensions.

  1. Number of Streams

The number of streams indicates how many concurrent sessions a single Tokkio deployment can handle. The Tokkio application supports 1 to 6 concurrent streams, depending on the GPU resources available to the deployment. Prebuilt app configurations are provided for the common cases of 1, 3, and 6 streams. We recommend using A10, L4, or higher GPUs for the best experience.

  • 1 stream: compatible only with instances that have 2x A10, L4, or T4 GPUs

  • 3 streams: compatible only with instances that have 4x A10, L4, or T4 GPUs

  • 6 streams: compatible only with instances that have 4x A10 or L4 GPUs
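As a rough illustration, the GPU requirements for each stream count can be expressed as a lookup table. The `STREAM_REQUIREMENTS` table and the `compatible_stream_configs` helper below are hypothetical (they are not part of Tokkio), and the sketch assumes the listed GPU counts are exact requirements, since the list says "compatible only".

```python
# Hypothetical helper, not a Tokkio API. The table mirrors the GPU
# requirements listed above; GPU counts are treated as exact matches
# (an assumption based on the "compatible only" wording).
STREAM_REQUIREMENTS = {
    1: {"gpu_count": 2, "gpu_types": {"A10", "L4", "T4"}},
    3: {"gpu_count": 4, "gpu_types": {"A10", "L4", "T4"}},
    6: {"gpu_count": 4, "gpu_types": {"A10", "L4"}},
}


def compatible_stream_configs(gpu_type: str, gpu_count: int) -> list[int]:
    """Return the prebuilt stream counts whose GPU requirements match an instance."""
    gpu_type = gpu_type.upper()  # accept "l4" as well as "L4"
    return sorted(
        streams
        for streams, req in STREAM_REQUIREMENTS.items()
        if gpu_count == req["gpu_count"] and gpu_type in req["gpu_types"]
    )


print(compatible_stream_configs("L4", 4))   # [3, 6]
print(compatible_stream_configs("T4", 4))   # [3]
print(compatible_stream_configs("A10", 2))  # [1]
```

Note that a 4x T4 instance matches only the 3-stream configuration, because T4 is not listed for 6 streams.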

  2. Agent

The Tokkio application currently supports two types of agents and interaction flows: Retail and LLM-RAG. The user interaction differs considerably between them, and each agent has clear service boundaries at runtime. For example, the Retail agent cannot fulfill queries that are unrelated to the retail application (the default example is food ordering). See Tokkio Retail and Tokkio LLM-RAG for more information on each agent.

The UI should also match the type of agent deployed on the backend: it can show either a full-screen avatar layout (default) or a Retail layout with a touch menu. See the Tokkio Frontend section for more information on the UI options available.

  3. Rendering

Users can choose one of three rendering options: NVIDIA Omniverse (OV), Unreal Engine (UE), and NVIDIA A2F-2D. Each option affects the avatar animation and the resource footprint differently.
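The seven prebuilt variations can be viewed as points along the three dimensions described above. The following Python sketch encodes the list at the top of this page and checks whether a given combination is prebuilt. The `AppVariation` type and `find_variation` helper are hypothetical, not a Tokkio API, and the `ui_layout` property assumes that Retail deployments pair with the Retail touch-menu layout while all others use the default full-screen avatar layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppVariation:
    streams: int
    agent: str     # "LLM-RAG" or "Retail"
    renderer: str  # "OV", "UE", or "A2F-2D"

    @property
    def ui_layout(self) -> str:
        # Assumed pairing: Retail agents use the Retail touch-menu layout,
        # other agents use the default full-screen avatar layout.
        return "retail" if self.agent == "Retail" else "full-screen avatar"


# The seven prebuilt variations listed at the top of this page.
VARIATIONS = [
    AppVariation(1, "LLM-RAG", "OV"),
    AppVariation(3, "LLM-RAG", "OV"),
    AppVariation(3, "LLM-RAG", "UE"),
    AppVariation(3, "LLM-RAG", "A2F-2D"),
    AppVariation(6, "LLM-RAG", "OV"),
    AppVariation(3, "Retail", "OV"),
    AppVariation(6, "Retail", "OV"),
]


def find_variation(streams: int, agent: str, renderer: str) -> Optional[AppVariation]:
    """Return the prebuilt variation matching all three dimensions, or None."""
    wanted = AppVariation(streams, agent, renderer)
    return wanted if wanted in VARIATIONS else None


print(find_variation(3, "Retail", "OV"))   # AppVariation(streams=3, agent='Retail', renderer='OV')
print(find_variation(6, "LLM-RAG", "UE"))  # None
```

A combination that returns `None`, such as six streams with Unreal Engine, is not prebuilt; per the overview above, one of the existing variations could serve as a base for customizing toward it.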

Notable updates for reference workflows

  • Colang 2.0 Beta support

  • Enhanced end-of-utterance and barge-in support

  • Vision AI: user attention support

  • Catalog RAG

  • An iframe added to the UI, allowing users to embed the avatar video in any website

  • User attention indicator added to the UI

Refer to the ACE Agent release notes for notable updates to the ACE Agent version used in the Tokkio 4.1.0 release.

Known Issues

  • Reference workflows are provided as-is and for reference only, and their functionality is limited. Known issues for each reference workflow are listed in its own document: Tokkio Retail and Tokkio LLM-RAG.

  • Intermittent stuttering is observed with A2F-2D deployed on GCP L4 instances.

  • The DS vision pod intermittently fails to work on a fresh installation or after the instance is stopped and started.

  • Tokkio LLM-RAG, A2F-2D: to avoid stuttering in the avatar's initial greeting, the user needs to enter the field of view (FOV) only AFTER the avatar has loaded in the UI.

  • Tokkio LLM-RAG, Unreal Engine: during the first run after deployment, the avatar's TTS responses might be missing for the first few seconds. The issue no longer occurs once the avatar starts speaking.

  • The Triton pod crashes on a fresh deployment on T4 GPUs with the Parakeet model. See Troubleshooting for more details.

  • Some microservices may need to be restarted after running for several hours.