Release Notes

Documentation updates

Tokkio 4.0

Notable updates

Tokkio LLM app
  • The Tokkio LLM reference app becomes the Tokkio LLM-RAG reference app. It allows users to connect their own RAG pipeline to Tokkio, with or without streaming.

  • Colang 2.0 support

  • Enterprise RAG support

  • GenerativeAIExamples RAG support

  • LLM-based gesture generation

  • LLM response streaming support

  • Barge-in support

Tokkio QSR app
  • The intent-slot NLP models are replaced with an LLM

  • Colang 2.0 support

  • Barge-in support

Animation and Rendering
  • New animation pipeline architecture and microservices

  • New A2F model and FP16 support

  • OV kit upgrade

  • New scene

  • Dynamic GPU allocation support

  • Support for multiple log levels

VMS
  • Updated VMS microservice

  • 4K streaming support and dual peer connection

UMIM
  • Introduction of UMIM action server

UI and UI server
  • UMIM Integration

  • Dual peer connection support

ACE agent
  • Updated architecture and microservices

Riva
  • Support for the ASR model parakeet_1-1b

  • IPA dict support

SDR
  • Error recovery support

Architecture
  • Migration from UCF 2.0 to UCS 2.5

Security
  • Security patches for all microservices

MLOps
  • The MLOps microservice is no longer supported

Known Issues

Tokkio Reference Application

  • Some menu-navigation speech queries such as “Go to the next page” or “Show the main menu” might give inaccurate results.

  • LLM prompt tuning will be required to tune some of the responses appropriately for different LLM models.

  • Item and topping replacement might give inaccurate results.

  • The item recommendations feature has been removed.

  • Adding multiple items via speech in the same sentence might lead to inaccurate results.

Tokkio 3.0.1 patch

The default LLM endpoint used in the Tokkio LLM app is no longer supported. The model can be updated by following these steps:

  1. Get the Tokkio UCF app source code by following the Quick Start Guide

  2. Open tokkio-llm-app.yaml and update the dialog manager microservice version to 3.0.1 (ucf.svc.botmaker.dialog-manager:3.0.1)

  3. Get the FM resource from NGC at “eevaigoeixww/tokkio-3-0-ea/tokkio-llm-fm-resource:0.1.4”

  4. Update the model to gpt-3.5-turbo-instruct in tokkio_llm_bot_open_ai_config.yml

  5. Upload the updated FM resource to NGC

  6. Replace “eevaigoeixww/tokkio-3-0-ea/tokkio-llm-fm-resource:0.1.4” in tokkio-llm-app-params.yaml with the new location

  7. Resume the Quick Start Guide at the Using UCF CLI step
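
Steps 3 through 6 roughly correspond to the shell commands below. This is a minimal sketch only: the target NGC org/team, the new resource version, and the download directory name are placeholders and assumptions, and the exact ngc CLI flags may differ with your CLI version.

  # Step 3: download the FM resource from NGC
  ngc registry resource download-version "eevaigoeixww/tokkio-3-0-ea/tokkio-llm-fm-resource:0.1.4"

  # Step 4: edit tokkio_llm_bot_open_ai_config.yml in the downloaded folder and set the model to gpt-3.5-turbo-instruct

  # Step 5: upload the updated FM resource to an NGC org/team you control (placeholders)
  ngc registry resource upload-version "<your-org>/<your-team>/tokkio-llm-fm-resource:<new-version>" \
    --source ./tokkio-llm-fm-resource_v0.1.4

  # Step 6: replace the FM resource path in tokkio-llm-app-params.yaml with "<your-org>/<your-team>/tokkio-llm-fm-resource:<new-version>"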

Tokkio RAG

Due to the growing popularity of RAG (Retrieval-Augmented Generation), we are excited to announce that you can now seamlessly integrate your custom RAG pipeline with Tokkio. Simply update the fulfillment module to call your custom RAG endpoint, and you’re all set to leverage the advanced capabilities of RAG within Tokkio’s environment. For a practical example of how to leverage and customize the Tokkio fulfillment module, please refer to the order_food.py script available in our latest NGC resource. To access this example, download version 1.2.7 of the Tokkio FM resource using the following command:

ngc registry resource download-version "eevaigoeixww/tokkio-3-0-ea/tokkio-fm-resource:1.2.7"
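
If you want to sanity-check a custom RAG endpoint before wiring it into the fulfillment module, a quick request such as the one below can help. The host, port, route, and JSON payload here are purely illustrative placeholders and are not part of any Tokkio API.

  # Hypothetical endpoint check; adjust to whatever interface your RAG pipeline exposes
  curl -s -X POST "http://<your-rag-host>:<port>/generate" \
    -H "Content-Type: application/json" \
    -d '{"question": "What items are on the menu?"}'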

Tokkio 3.0

Notable updates

Animation and Rendering
  • Complete refactoring of the animation and rendering pipeline into microservice components that can be easily swapped out on customer demand

  • Improved rendering latency and avatar facial expressions and gestures

BotMaker
  • NVIDIA NeMo Guardrails and Colang-based dialog management and guardrailing support

  • Support for plugging in custom Pythonic dialog management code

  • NvBotMaker CLI tool for building your bot in a native Python environment

  • Docker compose-based deployment support

  • Deployment support on Kubernetes environment

  • Integration with NVIDIA NeMo LLM and NeMo Inform EA services

  • Integration with NVIDIA Riva translation

  • Standalone NLP server with dedicated endpoint for common NLP tasks

  • Connectors for custom NLP model and retrieval pipelines

  • Standalone Fulfillment server with ability to support custom endpoints

  • Conversion script and guidance for porting previous release-based bots to Colang

  • Concurrent deployment of multiple bots with different configs

  • Support for plugging in 3rd party TTS pipeline

  • OpenAI LLM backend support in the QSR bot for previously difficult queries

  • Added support for the NemoLLM inference server engine in the dialog manager

  • Automatic Speech Recognition improvements

  • Moved to a new bot architecture based on Colang to provide better control

  • Various bug fixes

Vision AI & EMDX
  • Support for Deepstream 6.3

  • Support for L4 GPU

  • Support for user attention detection

MLOps
  • Customizable triggers

Infrastructure
  • Introduced a reverse proxy as a replacement for the coturn server to simplify the infrastructure.

  • Support for 1-click deployment scripts on OCI (Oracle Cloud Infrastructure)

  • Support for more GPU types: T4, A10, L4 (subject to cloud vendor availability)

  • Base OS is now Ubuntu 22.04.3

  • UCF 2.0

Miscellaneous changes
  • End users are now notified of service crashes and failures through the web UI

Known Issues

Tokkio Reference Application

  • Some menu-related speech queries such as “Can I have something without onions” or “What are the non-vegetarian options available” might give inaccurate results.

  • LLM prompt tuning might be required to tune some of the responses appropriately for menu, cart, IR, or open-domain queries. Non-optimal results are observed more frequently with the NemoLLM/gpt-43b-905 model compared to OpenAI text-davinci-003.

  • Some queried items do not show images because the image location points to “default.png”, which does not exist.

  • The exit message “Thanks for visiting, goodbye!” is played when the user exits the camera view, whether or not an order was placed.

  • Adding or removing some toppings on items via speech might not work correctly.

  • Item and topping replacement might give inaccurate results

  • Longer response times for queries that require LLM involvement

  • Intermittent FOV exits observed

BotMaker

  • Sub-intent classification is not enabled, leading to queries like “repeat order” and “repeat last response” not being clearly distinguished by the intent-classification model.

  • Memory leaks and pod restarts observed after prolonged use

  • It is recommended to return string responses instead of numeric values in the Fulfillment server when there is a possibility of 0 as the response. This is because BotMaker interprets 0 and empty strings as None when getting values from the Fulfillment server, which can lead to unexpected behavior.

MongoDB and redis-timeseries

  • Memory usage increases over time when the system is actively used for a long duration

OV-Renderer

  • Pod crashes observed at times during deployment

MLOps

  • MLOps packages intermittently have no video or incomplete video captured

  • MLOps packages are not uploaded to object storage hosted on OCI

General

  • Tokkio deployment must be restarted after restarting the cloud instance used for deployment

  • After several hours of deployment, users can observe that the bot state on the Tokkio UI is stuck in the initializing state, or that the avatar is not visible on the UI. When this happens, running the nvidia-smi command inside the vms pod or ds-vision pod returns the error “Failed to initialize NVML: Unknown Error”. To overcome this issue, bounce the VMS pod and wait for some time; if the issue is still observed, restart the ds-vision and ov-renderer pods as well (see the sketch below).
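
    A minimal sketch of this workaround, assuming kubectl access to the Tokkio cluster; the pod names are placeholders that must be replaced with the names reported by kubectl get pods:

      kubectl get pods                                  # note the vms, ds-vision and ov-renderer pod names
      kubectl delete pod <vms-pod-name>                 # bounce the VMS pod; Kubernetes recreates it
      kubectl delete pod <ds-vision-pod-name> <ov-renderer-pod-name>   # only if the issue is still observed after a while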

Tokkio 2.0

Tokkio 2.0 unlocks various features compared to release 1.5, including scalability, faster deployment times, Helm upgrade support, lower latency, and improved stability and error handling. The main features are listed below:

Notable updates

Reference applications and Customization
  • Tokkio IR + LLM reference application to showcase how Large Language Models can be easily used with the Tokkio pipeline

  • Customization guidance for using the Tokkio pipeline without the vision component

  • Addition of a recommendations feature to the Quick Service Restaurant reference application

Scaling
  • One deployment/pipeline supports up to three active connections on a single 4-T4 GPU instance

BotMaker
  • OpenAI LLM backend support

  • Bot Controller RTSP Audio input support

  • Barge-in support for the speech pipeline

  • Information Retrieval support to allow utilizing existing knowledge sources

  • Dedicated helm chart to support bot customization and development seamlessly

  • Non-English language support in BotMaker pipeline with OpenAI backend

  • Automatic Speech Recognition improvements

  • Tokkio Food Ordering Fulfillment scaling support

  • Upgraded to latest Riva Skills 2.11.0 models

MLOps
  • Customizable metadata collector

  • Tokkio Package Analyzer (TPA) tool to visualize and validate accuracy of various AI subsystems within Tokkio deployments

K8s Cluster Logging
  • An Elasticsearch cluster with Kibana can be enabled to view Tokkio logs on the Kibana dashboard

Stream Distribution and Routing
  • Workload distribution agents created to manage the routing of video streams for various microservices

Easy Deployment
  • Update to the 1-click deployment scripts to enable fast and easy Tokkio deployment on supported cloud setups

Miscellaneous changes
  • Removal of Maxine audio and ThinClient microservices as a part of Tokkio deployment

  • The UI Server implements an additional WebSocket endpoint with VST

Updating to Tokkio 2.0
  • For any new microservice created to work with Tokkio 1.5, note the endpoints it uses for connecting to other components within the Tokkio 1.5 deployment before migrating.

  • If any parameter values were updated or customized for microservices within the Tokkio 1.5 deployment, these also need to be manually carried forward into the Tokkio 2.0 deployment.

  • You can use UCF Studio to set the relevant parameters and ensure that the connection endpoints are maintained, or make minor adjustments, if necessary, to interface with the updated APIs.

  • Delete the previous (Tokkio 1.5) deployment before installing Tokkio 2.0 using the applicable 1-click scripts (see the cleanup sketch below).
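
    A minimal cleanup sketch, assuming the Tokkio 1.5 release was installed with Helm into its own namespace; the release and namespace names are placeholders, and the applicable 1-click teardown flow should be preferred where it exists:

      helm list -A                                      # identify the Tokkio 1.5 release and its namespace
      helm uninstall <tokkio-release-name> -n <tokkio-namespace>
      kubectl delete pvc --all -n <tokkio-namespace>    # wipe persistent volume claims left behind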

Documentation update: 1.5.3

  • Introduced Automated setup (One Click) guides for AWS, Azure and GCP.

  • Renamed previous CSP setup guides to include Interactive in their names.

  • Updated screenshots under “Setup TURN server instance” -> “Setup Security group” to reflect NAT IP in AWS interactive setup guide

  • Updated screenshot in “Create S3 bucket” section to reflect “Block all public Access” option

  • Added an overview of negative slot tagging in the Intent Slot model and how to use negated slots in Fulfillment

Documentation update: 1.5.2

The following topics were updated for this release:

  • Fixed missing links and references in the documentation

  • CSP guides: Added easy-to-follow instructions, updated diagrams

  • VMS/VST introduction: Updated to clarify connection between VST and VMS

  • Riva Speech Skills: Added missing links, updated description

  • Tokkio Bot and its customization: Updated description, added easy-to-follow instructions

  • Tokkio customization details: Added a table for all supported customizations

  • Quick Start Guide: Added instructions regarding loading Tokkio graph with UCF Studio

  • Avatar Configurator: Added instructions regarding usage outside of Tokkio

Known Issues

ASR

  • Sometimes ASR predicts spurious transcripts, mapping non-speech signals to characters/words. This can be reduced by fine-tuning the Neural VAD; refer to Neural VAD Fine-tuning in the documentation.

  • Use of headphones and less noisy acoustic scenarios is preferred.

TTS

  • In some cases, currency denominations (e.g., $10000) are not pronounced properly.

NLP

  • There is a known quality issue with the Answer Extender and QNA model used in the Tokkio Bot. Some queries to the prebuilt Tokkio QSR bot, mainly related to menu items such as “Do you have a coffee” or “Do you have a cobb salad”, do not work properly.

  • Some page-navigation queries like “Go to main page” cause the UI to get stuck. This is due to known quality issues of the BERT-based intent-slot model for such queries; misclassification of the intent causes the state to be stuck on the UI side.

LLM

  • LLM endpoints can produce intent misclassifications, which can modify the functional flow and cause the response to go haywire.

  • Only NemoLLM and OpenAI models are supported right now. Support for adding third-party LLMs is not yet available.

  • When using a guardrail policy, latency can be as high as ~10 seconds.

  • Form filling does not work for the LLM app when a guardrail policy is used to formulate the response.

  • When a guardrail policy is used, the bot sometimes classifies its own response as unethical even though it is not.

  • For IR, documents may not be retrieved if the retrieval confidence is lower than the threshold of 0.15 (the default).

  • Only a single document can be auto-ingested. To ingest more documents, deploy the app first and use the IR URL endpoint to ingest them.

  • For date-time and weather queries, the bot can only respond about the current date-time and weather. It may not work if you ask it to do calculations such as “What is the time 45 minutes past the current time?” or “How was the weather in Delhi 2 days ago?”

  • Profanity handling removes profane words from the bot response but does not prevent them from being displayed as part of the ASR transcript rendered on the UI.

  • Responses generated from LLM endpoints may not always be factually correct.

  • The bot contains support for spurious_transcript_handler; the user is expected to update it based on the spurious transcripts they are getting. NeMo Guardrails can be used for such cases.

  • The LLM bot does not have an exit intent like the QSR bot, where you can say bye to close the interaction session.

  • The bot is configured with virtual_assistant_pipeline_idle_threshold_secs=10; hence the bot is expected to close the session if it stays idle for 10 seconds or more.

Recommendations

  • The current sample provides a reference to use with the QSR app and requires users to build their own recommendations model with NVIDIA Merlin for their use cases.

  • The Merlin model used for recommendation is trained on synthetic data, so it can provide unreasonable recommendations. The model is provided for reference only, and the user is expected to build their own recommendation model properly trained on a sufficient amount of data.

  • We do not support recommending items to the user without them explicitly asking for it.

  • Recommendations are based only on the user’s cart, not on any other information such as season, weather, popular holidays, or demographics.

  • We do not support recommendations based on popular items, item descriptions, toppings, or calories.

UI

  • Some items do not show an image (e.g., when asking for a recommendation) because the image location points to “default.png”, which does not exist.

Avatar Configurator

  • The app can consume high amounts of CPU and memory

  • Currently there is a known issue with NGC client 3.22.0 (and potentially earlier versions) that does not upload folders with nested sub-folders correctly. Until this is fixed, you will need to use a Linux/Ubuntu system to upload Avatar scenes to NGC.

Tokkio Package Analyzer (TPA)

  • Timeouts may occur during TPA installation if the Helm installer is unable to locate the required secrets or encounters issues while pulling Docker images. Ensuring that all prerequisites are met and that image pull secrets have the appropriate permissions can prevent timeouts from occurring.

  • The TPA visualizer requires corresponding audio files for a turn’s Query and/or Response in order to play audio. There is a known issue where the audio files are sometimes not present in the MLOps package, in which case playback is not possible.

  • TPA operates under the assumption that turn IDs and MLOps session IDs are unique and non-repeating. There is a known issue whereby Riva session IDs overlap, due to which the package does not show up in the TPA UI. The workaround is to close the browser between different sessions.

  • In the TPA Visualizer, the user may sometimes have to click the audio play button in the Query and/or Response columns multiple times for it to work

  • It is recommended that the root partition has more than 100 GB of free space to cater to TPA storage requirements. Note that as TPA ingests packages, the storage consumed by the TPA backend, including Airflow and MongoDB, grows over time. Users should therefore be mindful of available space during TPA execution.

  • The failure bucket for other (E_other) in the TPA UI is not supported in the current release.

  • To restart TPA after modifying the global_config and/or environment variables of airflow, visualizer-backend, or visualizer-frontend via values.yaml, you will need to perform an upgrade installation (helm upgrade --install tpa-rel-name tpa-chart-name) and then delete the corresponding pod in which the change was required; this automatically restarts the pod with the updated config values (see the sketch below).
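
    A minimal sketch of this restart sequence; the release name, chart name, values file, and pod name below are placeholders carried over from the note above and must match your deployment:

      helm upgrade --install tpa-rel-name tpa-chart-name -f values.yaml   # re-apply the updated values
      kubectl get pods                                                     # find the pod whose config changed
      kubectl delete pod <affected-pod-name>                               # the pod restarts with the new config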

Other

  • The barge-in support available in Tokkio 2.0 only stops TTS playback; it does not stop the action triggered by the interrupted response. For example, an ordered item would still be added to the cart even if the user stops TTS playback by using any of the barge-in words.

  • The pipeline can sometimes hang after a long duration of operation. A hard reboot of the bot controller pod resolves the issue.

  • Profanity handling removes profane words from the bot response but does not prevent them from being displayed as part of the ASR transcript rendered on the UI.

  • The Animation Microservice in Tokkio supports only Omniverse avatars, not MetaHuman or other rendering engines such as Unreal Engine

  • A single 4-T4 GPU node is the only deployment architecture currently supported. It supports at most 3 concurrent streams; additional requests will be rejected.

  • The system will be temporarily inaccessible during software upgrades, including parameter updates (no zero-downtime guarantee).

  • Chrome only

  • Good lighting conditions required

  • Single person in view

  • Client system should have sufficient memory and CPU for smooth operation.

  • Kit limitation: Avatar Configurator consumes high memory and CPU due to the RTX rendering viewport. If many applications are open (e.g. Chrome), sporadic crashes might happen.

  • Cart Manager MLOps information currently provides information about all sessions in a single message. This is meant for reference only; customers should modify it or bring in their own Cart/Order Management microservices

  • Avatar scene: Image can sometimes display minor artifacts around character edges and backgrounds.

  • After the avatar speaks a sentence, there can be a small visible lag in the animation.

  • When starting the Animation Microservice, it can report a “ready” state before the A2F network is actually initialized. As a result, if the user starts the browser UI immediately after getting the “ready” signal from the Animation Microservice, the user can already see and interact with the avatar in the UI, but the lips won’t move for the first 30-60 seconds.

  • When starting the Animation Microservice with a scene that was uploaded to NGC only a few seconds earlier, the Animation Microservice can crash a few times during initialization (“Init”). Eventually, after a few automatic restarts, it will work properly. Even though the scene looks fine on NGC and the upload worked, the scene may not be downloadable from NGC by the microservice for some more seconds or minutes.

  • Tokkio client systems should not have applications or sites open that cause high CPU or memory utilization while accessing the Tokkio app through the browser. High CPU usage negatively impacts the quality of the video streams transported to the cloud, poor video quality leads to failures in detecting faces in the vision pipeline subsystem, and this ultimately results in abrupt end-user session terminations.

  • After the Tokkio backend system has been running for a long time, the Redis timeseries pod may go into a very high memory usage state, rendering the Tokkio system unusable. To recover from this situation, the Redis stream named ‘test’ needs to be capped/trimmed. Follow the steps below.

    • Identify the name of the redis-timeseries pod: kubectl get pods

    • Execute the redis-cli command-line tool inside the redis-timeseries pod, replacing <redis-timeseries-pod-name> with the pod name identified in the previous step:

      kubectl exec -it <redis-timeseries-pod-name> -- redis-cli

    • Trim the Redis stream test from the redis-cli shell: XTRIM test MAXLEN 10000

    • Exit the redis-cli command-line tool: exit
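
    A consolidated sketch of the same recovery, assuming <redis-timeseries-pod-name> is replaced with the actual pod name reported by kubectl get pods:

      kubectl exec <redis-timeseries-pod-name> -- redis-cli XTRIM test MAXLEN 10000   # trim the 'test' stream in one step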

  • All SDR containers need to be rebooted manually if the VST pod restarts for any reason:

    kubectl delete pod a2f-wdm-envoy-wdm-deployment-xxxxxxxx bot-controller-wdm-envoy-wdm-deployment-xxxxxxx ds-wdm-envoy-wdm-deployment-xxxxxxxx
    
  • If the system was previously installed with Tokkio 1.5, the chart needs to be uninstalled first and all PVCs wiped clean before attempting to install Tokkio 2.0.

  • There are sometimes race conditions upon rebooting the host, which cause some pods to fail to boot. It’s advisable to re-install the app after a host reboot.

  • RAM usage of the Redis service is unbounded and increases linearly with the length of operation.

  • The bot goes to an idle state after 10 minutes of inactivity.

  • MLOps data package video content might be cut short before the transaction ends.