Troubleshooting#

If any component crashes, or if you need troubleshooting help, report the following to NVIDIA:

  1. Complete logs for the affected component and any related components (ideally all logs). The following short script collects the logs of every pod in the deployment:

#!/bin/bash
# Write the logs of every pod in the current namespace to <pod name>.txt
for pod in $(kubectl get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}')
do
    # Depending on the log volume, this may take a while
    kubectl logs --all-containers "$pod" > "$pod.txt"
done

  2. Capture nvidia-smi output or equivalent GPU utilization data

  3. Capture CPU and memory utilization data

  4. Capture console logs from the UI

  5. Where applicable, share a video of the interaction.
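
The GPU, CPU, and memory captures in the list above could be gathered with a small helper like the one below. This is a minimal sketch, not an official Tokkio tool: the function name `collect_utilization` and the output file names are illustrative, and each tool is skipped if it is not installed on the host.

```shell
#!/bin/bash
# Sketch: snapshot GPU, CPU, and memory utilization into an output directory.
# The function name and file names are illustrative, not part of Tokkio.
collect_utilization() {
    outdir="${1:-tokkio-diagnostics}"
    mkdir -p "$outdir" || return 1

    # GPU utilization (a note is written when nvidia-smi is not installed)
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi > "$outdir/nvidia-smi.txt" 2>&1 || true
    else
        echo "nvidia-smi not found" > "$outdir/nvidia-smi.txt"
    fi

    # CPU and per-process load: one batch-mode iteration of top
    if command -v top >/dev/null 2>&1; then
        top -b -n 1 > "$outdir/top.txt" 2>&1 || true
    fi

    # Memory usage in MiB
    if command -v free >/dev/null 2>&1; then
        free -m > "$outdir/free.txt" 2>&1 || true
    fi

    return 0
}

# Usage: collect_utilization ./diagnostics
```

Run it on the machine hosting the deployment while the problem is occurring, so that the snapshot reflects the load at the time of failure.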

UI and Iframe Errors#

This section describes the errors that can appear in the Tokkio UI and the Tokkio Iframe.

Deployment Not Found#

This error appears when the location of the Tokkio deployment is misconfigured in the UI.

Appearance in Tokkio UI#

Deployment not found Tokkio UI

Appearance in Iframe#

Deployment not found Tokkio Iframe

Resolution#

If deploying the UI or Iframe without the one-click scripts, first ensure that the Tokkio deployment is healthy and reachable. To do so, query the /health endpoint with the following cURL command:

curl http(s)://<ingress endpoint>/health

If the deployment is healthy, you should see an empty list for unhealthy services in the response payload.

$ curl http://10.128.34.39:30888/health
{"unhealthy_services":[],"healthy_services":["....", "....."]}

If the endpoint does not respond as expected, ensure that the deployment is running and that the machine hosting it is reachable from the client.
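
The health check can also be scripted. The sketch below assumes the response has the shape shown in the example payload (an `unhealthy_services` array) and treats an empty array as healthy. The function names `is_healthy` and `check_deployment` are illustrative.

```shell
#!/bin/bash
# Sketch: succeed only when the /health payload lists no unhealthy services.
# Reads the JSON response from stdin; assumes the payload shape shown above.
is_healthy() {
    grep -Eq '"unhealthy_services"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'
}

# Fetch /health from a base URL such as http://10.128.34.39:30888 and check it.
# -f makes curl fail on HTTP errors; --max-time bounds an unreachable endpoint.
check_deployment() {
    if curl -fsS --max-time 10 "$1/health" | is_healthy; then
        echo "deployment healthy"
    else
        echo "deployment unhealthy or unreachable" >&2
        return 1
    fi
}

# Usage: check_deployment http://10.128.34.39:30888
```

A failed or timed-out request produces no payload, so it is reported the same way as an unhealthy deployment.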

Once you have verified that the deployment is healthy, confirm that the deployment endpoint is configured correctly in the UI. In the Tokkio Iframe, this is the INGRESS_ENDPOINT config option, and in the Tokkio UI, this is the UI_SERVER_ENDPOINT option. Read through the UI or Iframe Configuration documentation to ensure that the deployment is configured properly.

If deploying the UI with the one-click scripts, ensure that the above deployment location configs are not set. The one-click scripts set them automatically.

Avatar Stream Not Connecting#

Upon startup, the UI gets the user’s video and audio stream, connects to the VST microservice WebSocket, then establishes a WebRTC connection for the avatar video. If any part of this process fails, this error appears.

Appearance in Tokkio UI#

Avatar Stream Failed Tokkio UI

Appearance in Iframe#

Avatar Stream Failed Tokkio Iframe

Resolution#

This error has several possible causes.

  1. If Tokkio or the UI is deployed over HTTP and not HTTPS, browser flags need to be set in the user’s browser to treat the Tokkio and UI deployment endpoints as secure. On Google Chrome, navigate to the chrome://flags URL, open the Insecure origins treated as secure option, then add the insecure Tokkio endpoints to the text box. This is usually a string like this: http://<ingress IP:PORT>,ws://<ingress IP:PORT>.

  2. The browser window must have access to the user’s microphone. If deploying over remote desktop, there will not be a microphone on the system, and this error will appear. If the camera is enabled, the browser must also have access to a camera.

  3. If any user has multiple UIs open in their browser, this can cause the backend to lose track of the total number of streams, and fail to notify the UI that there is no capacity. In this case, the WebSocket connection will be rejected, and this error will appear.

  4. The coturn server must be configured correctly in the VST microservice. If coturn is not set up properly, the avatar wheel will spin for a long time before showing the error.

  5. If not deploying the UI with the one-click scripts, ensure that the VST endpoint is configured correctly in the UI. In the Tokkio Iframe, this is the VST_ENDPOINT config option, and in the Tokkio UI, this is the VST_WEBSOCKET_ENDPOINT option. Take the time to read through the documentation for the UI or Iframe to ensure that the deployment is configured properly.

  6. If deploying the UI with the one-click scripts, ensure that the above VST endpoint configs are not set. The one-click scripts set them automatically.

The browser console logs can also help narrow down the issue.
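
For the HTTP case in the first item, the value for Chrome's text box can be generated from the ingress address. This is a small sketch. The function name `insecure_origins` is illustrative, and it assumes the UI and the WebSocket share the same IP:PORT, as in the example string above.

```shell
#!/bin/bash
# Sketch: build the value for Chrome's insecure-origins text box from an
# ingress address given as IP:PORT (the function name is illustrative).
insecure_origins() {
    printf 'http://%s,ws://%s\n' "$1" "$1"
}

# Usage: insecure_origins 10.128.34.39:30888
```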

Capacity Full#

This error appears when the UI connects to Tokkio, but all of the streams are being used by others.

Appearance in Tokkio UI#

Capacity Full Tokkio UI

Appearance in Iframe#

Capacity Full Tokkio Iframe

Resolution#

Ensure that fewer people are using the system than there are streams available. After a UI is closed, it takes a minute or two for the stream used by that UI to free up.

Invalid Token#

This error appears when the user attempts to call an endpoint in Tokkio without a session token.

Appearance in Tokkio UI#

Invalid Token Tokkio UI

Appearance in Iframe#

The avatar appears for a moment before the capacity full error message is shown. Console logs may show an HTTP 400 error.

Invalid Token Tokkio Iframe

Resolution#

Ensure that cookies are enabled in your browser. Clear your cookies and history, and do not open Tokkio in incognito mode.

SDR Error Reporting#

This error appears when one of the backend services has crashed, then recovered.

Appearance in Tokkio UI#

SDR Error Tokkio UI

Appearance in Iframe#

This error does not appear in the iframe; the failure happens silently. If the avatar video or a Tokkio feature stops working abruptly, a crash may have occurred.

Resolution#

The backend recovers on its own by restarting the failing pod. Restart the UI to return to normal operation.
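
To identify which pod crashed and restarted, the RESTARTS column of `kubectl get pods` can be filtered. The sketch below is one way to do that. The function name `restarted_pods` is illustrative, and it assumes the default column order (NAME READY STATUS RESTARTS AGE).

```shell
#!/bin/bash
# Sketch: print pods whose RESTARTS column is above zero.
# Expects `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
restarted_pods() {
    awk '$4 + 0 > 0 { print $1, $4 }'
}

# Usage: kubectl get pods --no-headers | restarted_pods
```

For a pod reported here, `kubectl logs --previous <pod name>` shows the logs of the container instance that ran before the restart, which is usually where the crash is recorded.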

Tokkio vision tuning#

To disable VisionAI or the user attention, visit: Disable Tokkio vision

To change the sensitivity of the user attention, visit: Metropolis User Occupancy and Engagement Alerts

Manually triggering ADD / REMOVE calls from SDR#

  1. Send a “CAMERA_ADD” Redis message to vst_events to trigger the SDR ADD call. xadd is a Redis command, so run it from a Redis client session (for example, redis-cli)

$ xadd vst_events * sensor.id "{\"alert_type\":\"camera_status_change\",\"created_at\":\"2024-09-26T18:00:17Z\",\"event\":{\"camera_id\":\"3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"camera_name\":\"webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"camera_url\":\"rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"change\":\"camera_add\"},\"source\":\"vst\"}\n"
$ xadd vst_events * sensor.id "{\"alert_type\":\"camera_status_change\",\"created_at\":\"2024-09-26T18:00:19Z\",\"event\":{\"camera_id\":\"3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"camera_name\":\"webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"camera_url\":\"rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"change\":\"camera_streaming\"},\"source\":\"vst\"}\n"
  2. Check the SDR log to confirm that the ADD call was made

$ kubectl logs -f chat-controller-sdr-envoy-sdr-deployment-bb5dd6588-rmhp7 --tail 1

2024-09-26 19:44:33 lib.podprovisioner.provisionconfig - INFO - Starting add call
2024-09-26 19:44:33 lib.podprovisioner.provisionconfig - INFO - adding camera at http://192.168.33.193:9010/add
2024-09-26 19:44:33 lib.podprovisioner.provisionconfig - INFO - payload: {
"alert_type": "camera_status_change",
"created_at": "2024-09-26T18:00:17Z",
"event": {
    "camera_id": "3477c96b-5e3d-492f-8a5c-3682df2c3b37",
    "camera_name": "webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37",
    "camera_url": "rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37",
    "change": "camera_add"
},
"source": "vst"
}
2024-09-26 19:44:33 lib.podprovisioner.provisionconfig - INFO - add operation Response Code: 200
2024-09-26 19:44:33 lib.podprovisioner.provisionconfig - INFO - add operation text return: {
        "status" : "STREAM_ADD_SUCCESS"
}
2024-09-26 19:44:33 __main__ - INFO - add operation success updating
                                    the Route mapping
ace-agent-chat-controller-deployment-0: |-
    [
        {
            "alert_type": "camera_status_change",
            "created_at": "2024-09-26T18:00:17Z",
            "event": {
                "camera_id": "3477c96b-5e3d-492f-8a5c-3682df2c3b37",
                "camera_name": "webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37",
                "camera_url": "rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37",
                "change": "camera_add"
            },
            "source": "vst"
        }
    ]
  3. Send a “CAMERA_REMOVE” Redis message to vst_events to trigger the SDR REMOVE call

$ xadd vst_events * sensor.id "{\"alert_type\":\"camera_status_change\",\"created_at\":\"2024-09-26T18:01:15Z\",\"event\":{\"camera_id\":\"3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"camera_name\":\"webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"camera_url\":\"rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37\",\"change\":\"camera_remove\"},\"source\":\"vst\"}\n"
  4. Check the SDR log to confirm that the REMOVE call was made

    • Use the same kubectl logs command as for checking the ADD call

2024-09-26 19:46:01 werkzeug - INFO - 169.254.3.1 - - [26/Sep/2024 19:46:01] "GET /healthz HTTP/1.1" 200 -
2024-09-26 19:46:01 werkzeug - INFO - 169.254.3.1 - - [26/Sep/2024 19:46:01] "GET /healthz HTTP/1.1" 200 -
2024-09-26 19:46:02 __main__ - INFO - id: 1727379962895-0, content: {'sensor.id': '{"alert_type":"camera_status_change","created_at":"2024-09-26T18:01:15Z","event":{"camera_id":"3477c96b-5e3d-492f-8a5c-3682df2c3b37","camera_name":"webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37","camera_url":"rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37","change":"camera_remove"},"source":"vst"}\n'}
2024-09-26 19:46:02 __main__ - INFO - {'sensor.id': '{"alert_type":"camera_status_change","created_at":"2024-09-26T18:01:15Z","event":{"camera_id":"3477c96b-5e3d-492f-8a5c-3682df2c3b37","camera_name":"webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37","camera_url":"rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37","change":"camera_remove"},"source":"vst"}\n'}
2024-09-26 19:46:02 __main__ - INFO - {"alert_type":"camera_status_change","created_at":"2024-09-26T18:01:15Z","event":{"camera_id":"3477c96b-5e3d-492f-8a5c-3682df2c3b37","camera_name":"webcam_3477c96b-5e3d-492f-8a5c-3682df2c3b37","camera_url":"rtsp://192.168.33.235:30554/webrtc/3477c96b-5e3d-492f-8a5c-3682df2c3b37","change":"camera_remove"},"source":"vst"}

2024-09-26 19:46:02 __main__ - INFO - deprovision stream
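
The event payloads used in the steps above differ only in the timestamp, camera ID, camera URL, and `change` value, so they can be generated instead of hand-escaped. The sketch below follows the field layout of the example messages. The function name `vst_event_payload` is illustrative, and the `webcam_<camera ID>` naming is taken from the examples rather than from a documented rule.

```shell
#!/bin/bash
# Sketch: print a vst_events payload in the layout of the examples above.
#   $1 = change (camera_add | camera_streaming | camera_remove)
#   $2 = camera ID, $3 = camera RTSP URL, $4 = UTC timestamp
vst_event_payload() {
    printf '{"alert_type":"camera_status_change","created_at":"%s","event":{"camera_id":"%s","camera_name":"webcam_%s","camera_url":"%s","change":"%s"},"source":"vst"}\n' \
        "$4" "$2" "$2" "$3" "$1"
}

# Usage, from a shell where redis-cli can reach the deployment's Redis:
#   redis-cli XADD vst_events '*' sensor.id \
#     "$(vst_event_payload camera_add <camera ID> <RTSP URL> 2024-09-26T18:00:17Z)"
```

Command substitution strips the trailing newline, whereas the example messages carry a literal `\n` at the end of the value. Whether SDR depends on that newline is not stated here, so treat its omission as an assumption to verify.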

No response received for queries sent to the Tokkio reference application#

  1. Ensure that the microphone used for the speech input is functional

  2. The reference workflows from Tokkio also require a functional camera input to detect a person’s presence in the camera field of view

  3. Check the logs of the chat controller pod to ensure that the input speech is detected

  4. Check the logs of the chat engine pod and the plugin server for errors in retrieving a response

  5. Reach out to Tokkio support point of contact for more information, if needed.

Avatar stutters or stops speaking unexpectedly with multiple concurrent sessions#

The avatar speech may become less smooth when the load on compute resources becomes too high. When this happens, Audio2Face inference and blendshape solve can slow down, causing the animation to stutter. Evidence of this can be found in the Audio2Face microservice logs with entries like this:

Streaming <stream ID> at X FPS

Where X is below 30.

If this happens, please try reducing the number of concurrent sessions for a smoother experience.
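
The Audio2Face logs can be scanned for low frame rates automatically. The sketch below assumes the log entries contain the words `at <number> FPS` exactly as in the pattern above. The function name `low_fps_lines` is illustrative, and the real log format may include extra punctuation that would need adjusting.

```shell
#!/bin/bash
# Sketch: print "Streaming <stream ID> at X FPS" log lines where X is below
# a threshold (default 30). Reads log text on stdin and scans each line for
# the "at <number> FPS" triple, so prefixes such as timestamps are tolerated.
low_fps_lines() {
    awk -v min="${1:-30}" '{
        for (i = 1; i + 2 <= NF; i++) {
            if ($i == "at" && $(i + 2) == "FPS" &&
                $(i + 1) ~ /^[0-9.]+$/ && $(i + 1) + 0 < min + 0) {
                print
                break
            }
        }
    }'
}

# Usage: kubectl logs <Audio2Face pod name> | low_fps_lines
```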

User presence not detected by Tokkio reference application#

  1. Ensure that the microphone used for the speech input is functional

  2. The reference workflows from Tokkio also require a functional camera input to detect a person’s presence in the camera field of view

  3. Check the logs of the chat controller pod to check if the input speech is detected

  4. Restart the occupancy alerts pods and retry

  5. Reach out to Tokkio support point of contact for more information, if needed. Please include the system logs and a detailed description of the setup configuration in your support request.

Triton pod crashes on T4 GPU with Parakeet model#

There are two options to try.

  1. Change the model to the one used for Tokkio 4.1, asr_conformer_en_us_streaming_throughput_flashlight_vad:2.15.0-tokkio, by passing the following user_override_value during deployment.

riva-api:
  modelRepoGenerator:
    ngcModelConfigs:
      triton0:
        models:
        - nvidia/ace/asr_conformer_en_us_streaming_throughput_flashlight_vad:2.15.0-tokkio
        - nvidia/riva/rmir_tts_fastpitch_hifigan_en_us_ipa:2.17.0

For how to pass user_override_value to the OneClick script, refer to Integrating Customization Changes.

  2. On a fresh deployment, the pod restarts a few times and then eventually comes up. With a 1-click deployment, a manual restart might be required if the pod does not come up automatically after several restarts.

ASR and TTS not working after installing a new app with the one-click script#

  1. Check whether the GPU is unavailable to the Riva init container (NVML error). In that case, the model is deployed in ONNX format, which is not supported, and the Triton container subsequently fails.

$ kubectl exec -it triton0-bbd77d78f-22dr8 -c riva-model-init /bin/bash -n app
group ID 1000
I have no name!@triton0-bbd77d78f-22dr8:/opt/riva$ nvidia-smi
Failed to initialize NVML: Unknown Error

If nvidia-smi fails in this way, try the suggestions from the NVIDIA GitHub.

  2. Reach out to the Tokkio support point of contact for more information, if needed. Please include the system logs and a detailed description of the setup configuration in your support request.

UE App does not work on three streams on a fresh Deployment#

Delete the renderer-sdr, ue-renderer, and VMS pods before retrying.

User Presence not detected even when the user is in FOV#

  1. Uninstall the Tokkio app

  2. Delete the PVCs vms-local-storage and ds-sdr-envoy-agent-storage, and delete the PVs of mongodb, ds-sdr, vms, and redis-data

  3. Deploy the app again and check.
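
The PV names are generated per deployment, so they need to be looked up before deletion. The sketch below filters PVs by the name of their bound claim. The function name `tokkio_pvs` is illustrative, and it assumes that the claim names contain the component names listed in step 2, which should be checked against the actual cluster.

```shell
#!/bin/bash
# Sketch: list PersistentVolumes whose bound claim name mentions one of the
# components named above. Expects two columns on stdin: PV name, claim name.
tokkio_pvs() {
    awk '$2 ~ /mongodb|ds-sdr|vms|redis-data/ { print $1 }'
}

# Usage (review the printed list before deleting anything):
#   kubectl get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name | tokkio_pvs
#   kubectl delete pvc vms-local-storage ds-sdr-envoy-agent-storage
#   kubectl delete pv <PV names printed above>
```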

Miscellaneous troubleshooting tips when using CSP guides#

  1. On AWS, the scripts might show an IaC change during app installation because of AWS certificate changes. In that case, re-run the infra step and then deploy the app.

  2. Updating secrets does not show the modified data and does not restart the pods. Restart the pods manually.

  3. Upgrading the app does not restart all the UE pods. Restart all the UE pods to make streams available on all three clients.

  4. Updating the VMS ConfigMap (cm) does not restart the pod. Restart the pod manually.