Tokkio 1.5#
Documentation update: 1.5.3#
Introduced Automated setup (One Click) guides for AWS, Azure and GCP.
Renamed previous CSP setup guides to include Interactive in their names.
Updated screenshots under “Setup TURN server instance” -> “Setup Security group” in the AWS interactive setup guide to reflect the NAT IP.
Updated the screenshot in the “Create S3 bucket” section to reflect the “Block all public Access” option.
Added an overview of negative slot tagging in the Intent Slot model and how to use negated slots in Fulfillment.
Documentation update: 1.5.2#
The following topics were updated for this release:
Fixed missing links and references in the documentation
CSP guides: Added easy-to-follow instructions, updated diagrams
VMS/VST introduction: Updated to clarify connection between VST and VMS
Riva Speech Skills: Added missing links, updated description
Tokkio Bot and its customization: Updated description, added easy-to-follow instructions
Tokkio customization details: Added a table for all supported customizations
Quick Start Guide: Added instructions regarding loading Tokkio graph with UCF Studio
Avatar Configurator: Added instructions regarding usage outside of Tokkio
Known Issues#
ASR
Sometimes ASR predicts spurious transcripts, mapping non-speech signals to characters/words. This can be reduced by fine-tuning the Neural VAD; refer to Neural VAD Fine Tuning in the documentation.
Using headphones and a less noisy acoustic environment is preferred.
TTS
In some cases, currency denominations (e.g. $10000) are not pronounced properly.
NLP
There is a known quality issue with the Answer Extender and QNA model used in Tokkio Bot. Some queries to the prebuilt Tokkio QSR bot, mainly related to menu items such as “Do you have a coffee” or “Do you have a cobb salad”, do not work properly.
Some page navigation queries such as “Go to main page” cause the UI to get stuck. This is due to known quality issues of the BERT-based intent-slot model for such queries; misclassification of the intent causes the state to get stuck on the UI side.
LLM
LLM endpoints can produce intent misclassifications, which can modify the functional flow and cause the response to go haywire.
Only NemoLLM and OpenAI models are supported right now. Support for adding other third-party LLMs is not yet available.
When using a guardrail policy, latency can be as high as ~10 seconds.
Form filling doesn’t work for the LLM app when a guardrail policy is used to formulate the response.
When a guardrail policy is used, the bot may classify its own response as unethical even though it is not.
For IR, documents may not be retrieved if their confidence score falls below the threshold (0.15 by default).
Only a single document can be auto-ingested. To ingest more documents, deploy the app first and use the IR URL endpoint to ingest them.
For date-time and weather queries, the bot can only respond to the current date-time and weather scenarios. It may not work if you ask it to perform calculations such as “what is the time 45 minutes past the current time?” or “How was the weather in Delhi 2 days ago?”
Profanity handling removes profane words from the bot response but does not prevent them from being displayed as part of the ASR transcript rendered on the UI.
Responses from LLM endpoints may not always be factually correct.
The bot includes support for spurious_transcript_handler; users are expected to update it based on the spurious transcripts they observe. NeMo Guardrails can be used for such cases.
The LLM bot doesn’t have an exit intent like the QSR bot, where you can say “bye” to close the interaction session.
The bot has been configured with virtual_assistant_pipeline_idle_threshold_secs=10, so it is expected that the bot will close the session if it stays idle for 10 seconds or more.
Recommendations
The current sample provides a reference for use with the QSR App and requires users to build their own recommendation model with NVIDIA Merlin for their use cases.
The Merlin model used for recommendations is trained on synthetic data, so it can provide unreasonable recommendations. The model is provided for reference only; users are expected to build their own recommendation model properly trained on a sufficient amount of data.
We do not support recommending items to the user without them explicitly asking for it.
Recommendations are based only on the user’s cart, not on any other information such as season, weather, popular holidays, or demographics.
Recommendations based on popular items, item descriptions, toppings, or calories are not supported.
UI
Some items don’t show an image (e.g. when asking for a recommendation) because the image location points to “default.png”, which does not exist.
Avatar Configurator
The app can consume high amounts of CPU and memory.
Currently there is a known issue with NGC client 3.22.0 (and potentially earlier versions) that does not upload folders with nested sub-folders correctly. Until this is fixed, you will need to use a Linux/Ubuntu system to upload Avatar scenes to NGC.
Tokkio Package Analyzer (TPA)
Timeouts may occur during TPA installation if the Helm installer is unable to locate the required secrets or encounters issues while pulling Docker images. Ensuring that all prerequisites are met and that image pull secrets have the appropriate permissions can prevent timeouts from occurring.
The TPA visualizer requires corresponding audio files for a turn’s Query and/or Response in order to play audio. There is a known issue where the audio files are sometimes not present in the MLOps package, in which case playback will not be possible.
TPA operates under the assumption that turn IDs and MLOps session IDs are unique and non-repeating. There is a known issue whereby Riva session IDs overlap, due to which the package does not show up in the TPA UI. The workaround is to close the browser between sessions.
In the TPA Visualizer, users may sometimes have to click the audio play button in the Query and/or Response columns multiple times for it to work.
It is recommended that the root partition has more than 100GB of free space to cater to TPA storage requirements. Note that as TPA ingests packages, the storage consumed by TPA backend including Airflow and MongoDB grows over time. User should therefore be mindful of available space during TPA execution.
Failure bucket for other (E_other) in the TPA UI is not supported in the current release.
To restart TPA after modifying the global_config and/or environment variables of airflow, visualizer-backend and visualizer-frontend via values.yaml, you will need to perform an upgrade installation (helm upgrade --install tpa-rel-name tpa-chart-name) and then delete the corresponding pod in which the change was required. This will automatically restart the pod with the updated config values, as sketched below.
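A minimal sketch of that sequence, assuming the changed settings live in values.yaml and the change affected the visualizer-frontend pod; the release and chart names are the placeholders used above, and <visualizer-frontend-pod-name> is a hypothetical pod name:
helm upgrade --install tpa-rel-name tpa-chart-name -f values.yaml   # re-apply the chart with the updated values (values.yaml path is assumed)
kubectl get pods                                                    # locate the pod whose configuration changed
kubectl delete pod <visualizer-frontend-pod-name>                   # the deployment recreates the pod with the new config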
Other
The barge-in support available in Tokkio 2.0 only stops TTS playback; it does not stop the action triggered by the interrupted response. For example, an ordered item being added to the cart would not be prevented if a user stops TTS playback by using any of the barge-in words.
The pipeline can sometimes hang after a long duration of operation. A hard reboot of the bot controller pod resolves the issue.
The Animation Microservice in Tokkio supports only Omniverse avatars, not MetaHuman or other rendering engines such as Unreal Engine.
A single 4x T4 GPU node is the only deployment architecture supported currently. It can support at most 3 concurrent streams; additional requests will be rejected.
The system will be temporarily inaccessible during software upgrades, including parameter updates (no zero-downtime guarantee).
Chrome only
Good lighting conditions required
Single person in view
Client system should have sufficient memory and CPU for smooth operation.
Kit limitation: Avatar Configurator consumes high memory and CPU due to the RTX rendering viewport. If many applications are open (e.g. Chrome), sporadic crashes might happen.
Cart Manager MLOps output currently provides information about all the sessions in a single message. This is meant for reference only; customers are expected to modify it or bring in their own Cart/Order Management microservices.
Avatar scene: Image can sometimes display minor artifacts around character edges and backgrounds.
After the avatar speaks a sentence, there can be a small visible lag in the animation.
When starting the Animation Microservice, it could give a “ready” state before the A2F network is actually initialized. As a result, if the user starts the browser UI immediately after getting the “ready” signal from the Animation Microservice, the user can already see and interact with the avatar in the UI, but the lips won’t move for the first 30-60 seconds.
When starting the Animation Microservice with a scene which has been very recently uploaded to NGC (i.e. a few seconds before), the Animation Microservice can crash a few times during initialization (“Init”). Eventually, after a few automatic restarts, it will work properly. Even though the scene looks fine on NGC and the upload worked, it can indeed happen that the scene is not actually downloadable from NGC by the microservice for some more seconds/minutes.
Tokkio client systems should not have applications or sites opened which result in high CPU or memory utilization while accessing the Tokkio App through the browser. High CPU usage negatively impacts the quality of video streams transported to the cloud. Poor video quality leads to failure in detecting faces in the vision pipeline subsystem. This ultimately results in abrupt end user session terminations.
After the Tokkio backend system has been running for a long time, the Redis time-series pod may go into a very high memory usage state, rendering the Tokkio system unusable. To recover from this situation, the Redis stream named ‘test’ needs to be capped/trimmed. Follow the steps below; a consolidated sketch follows the steps.
a) Identify the name of the redis-timeseries pod: kubectl get pods
b) Launch the redis-cli command line tool inside the redis-timeseries pod: kubectl exec -it <redis-timeseries-pod-name> redis-cli (replace <redis-timeseries-pod-name> with the actual name of the pod identified in step a)
c) Trim the Redis stream test from the redis-cli command shell: XTRIM test MAXLEN 10000
d) Exit the redis-cli command line tool: exit
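The same recovery can also be scripted in a single pass. This is only a sketch; the grep pattern for the pod name is an assumption based on the default deployment and may need adjusting:
REDIS_POD=$(kubectl get pods --no-headers | grep redis-timeseries | awk '{print $1}')   # assumed pod-name pattern
kubectl exec "$REDIS_POD" -- redis-cli XTRIM test MAXLEN 10000                          # cap the ‘test’ stream at 10000 entries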
All SRD containers need to be restarted manually if the VST pod restarts for any reason (see the sketch after the command below):
kubectl delete pod a2f-wdm-envoy-wdm-deployment-xxxxxxxx bot-controller-wdm-envoy-wdm-deployment-xxxxxxx ds-wdm-envoy-wdm-deployment-xxxxxxxx
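A minimal sketch for doing this without copying the exact pod suffixes; the name patterns below assume the default Tokkio chart and may differ in your deployment:
kubectl delete $(kubectl get pods -o name | grep -E 'a2f-wdm-envoy-wdm-deployment|bot-controller-wdm-envoy-wdm-deployment|ds-wdm-envoy-wdm-deployment')   # the deployments recreate the deleted pods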
If the system was previously installed with Tokkio 1.5, the chart needs to be uninstalled first and all PVCs wiped clean before attempting to install Tokkio 2.0.
There are sometimes race conditions upon rebooting the host, which cause some pods to fail to boot. It’s advisable to re-install the app after a host reboot.
RAM usage on the REDIS service is unbounded and increases linearly with the length of operation.
The bot goes to an idle state after 10 minutes of inactivity.
MLOps data package video content might be cut short before the transaction ends.