Troubleshooting

Lag, animation stutters, and audio stutters can be caused by various factors. This section lists common issues and explains how to narrow down their root cause.

Audio2Face Microservice Too Slow

In some circumstances, the Audio2Face microservice streams animation data slower than real time (e.g. slower than 30.0 frames per second). In that case, the Animation Graph microservice does not play back all the animation data and simply drops the frames that arrive too late. When this happens, the following message is printed in the Animation Graph microservice logs:

Warning: buffer underrun. Discarding current data

When this happens a lot, and the root cause is that the Audio2Face microservice provides data too slowly or very irregularly, this can be mitigated somewhat by increasing the value of the animationSource.bufferSize UCS microservice parameter of the Animation Graph microservice. The higher this value, the more animation data is buffered before playback starts. This makes the connection more robust to jitter and delayed animation data, but at the cost of higher latency. We found a value of 0.1s to be acceptable. Note that the best value depends on your system’s configuration and performance characteristics.
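
As an illustration, if you deploy the microservice with Helm, the parameter could be overridden at deploy time. This is only a sketch: the release name, chart path, and values path below are assumptions and depend on how your UCS app exposes the parameter.

# Hypothetical Helm override; adjust the release name, chart, and values path to your deployment.
helm upgrade --install anim-graph-ms <path-to-animation-graph-chart> \
  --set animationSource.bufferSize=0.1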

Audio2Face Microservice Does Not Send Animation Data

In rare scenarios, we observed that the Audio2Face microservice might not send any animation data chunks, nor the animation data stream header. When this happens, no animation data is played back, and the following message is printed in the logs:

Received SUCCESS status before the header in the animation data stream!

We have only experienced this issue in experimental scenarios. If it happens, investigate whether the Audio2Face microservice is running correctly and consider upgrading it if possible.
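
For example, assuming the Audio2Face microservice runs as a Docker container, you can check that the container is still up and scan its recent log output for errors (the container name below is a placeholder):

# Check that the Audio2Face container is running and inspect its recent logs.
docker ps --filter "name=<audio2face-container-name>"
docker logs --tail 200 <audio2face-container-name>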

Animation Graph Microservice Too Slow

Sometimes, the animation data from the Animation Graph microservice arrives at the Omniverse Renderer microservice slower than real time. When this happens, the following message is printed in the Omniverse Renderer microservice logs:

Display time is in the past! Skipping frame!

This usually means that the Animation Graph microservice was slowed down for a short period, e.g. while the CPU was busy handling another process. Here again, if the issue occurs frequently, you may mitigate it by increasing the value of the animationSource.bufferSize UCS parameter of the Omniverse Renderer microservice. The higher this value, the bigger the animation data input buffer. This makes playback more robust to jitter in the animation data stream, but comes at the cost of higher latency. We found a value of 0.1s to be acceptable. Note that the best value depends on your system’s configuration and performance characteristics.
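
To check whether another process is keeping the CPU busy while the warning appears, you can sample CPU usage, for example as shown below (the -o sort option assumes a procps-ng top, as shipped with most recent Linux distributions):

# Sample CPU usage once in batch mode and list the busiest processes first.
top -b -n 1 -o %CPU | head -n 20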

Speech Audio and Lip Animation Synchronization

It may happen that there are no warnings in the animation data streams and the animation data playback is smooth, yet the speech audio and the lip animation are not in sync.

The root cause is that the Livestream extension sends audio and video through two different RTP streams. These streams don’t support synchronization protocols, which may lead to a perceptible temporal offset between the animation and the audio stream.

To solve this, the Omniverse Renderer microservice has a livestream.audioDelay UCS parameter that delays the audio by the specified number of seconds. We found that a value of about 0.1s resolved the issue on the systems we tested.
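
As with the buffer size above, the following is only a sketch of how such a parameter could be overridden in a Helm-based deployment; the release name, chart path, and values paths are assumptions that depend on how your UCS app exposes them.

# Hypothetical Helm override for Omniverse Renderer microservice parameters.
helm upgrade --install renderer-ms <path-to-omniverse-renderer-chart> \
  --set livestream.audioDelay=0.1 \
  --set animationSource.bufferSize=0.1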

Performance Monitoring

If you observe lags or interruptions that are not caused by any issue documented above, the next step is to inspect the frame rate of the Animation Graph and the Omniverse Renderer microservices.

The frame rate for the Animation Graph microservice is printed in the logs:

Output animation data | Stream ID: <stream_id> | Mean frame time: 0.0334

And for the Omniverse Renderer microservice:

Rendering animation data | Time since start [s]: 1314.159 | Port: 8000 | Mean frame time: 0.0334

These values are usually very similar.

If the Mean frame time of the Animation Graph microservice and/or the Omniverse Renderer microservice is frequently above ~0.034 seconds (i.e. below roughly 30 frames per second), this usually points to a general performance issue. This is often caused by the GPU reaching 100% usage and slowing down the whole system.
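
To check whether the GPU is saturated, you can poll its utilization and memory usage while the issue occurs, for example:

# Print GPU utilization and memory usage once per second.
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1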

Avatar Gesture Is Not Triggered

If you trigger an invalid animation gesture or posture state, the HTTP call succeeds without warning or error. However, the avatar goes into a neutral stance and extends one or more fingers of the left hand. This is the so-called test pose. It indicates that the Animation Graph has received invalid input or no input at all, e.g. because the string for a gesture was misspelled, or because the character was not found by the microservice. You can recover from this state by triggering a new gesture and a new posture with a valid value.
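
A recovery call could look like the sketch below. The host, port, endpoint path, variable name, and value are all placeholders and assumptions; check the HTTP API reference of your Animation Graph microservice version for the exact request format.

# Hypothetical request; replace the host, port, path, variable, and value per your API reference.
curl -X PUT "http://<animation-graph-host>:<port>/streams/<stream-id>/animation_graphs/avatar/variables/<variable-name>/<valid-value>"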

This image shows the test pose the avatar will go into when an invalid gesture or posture is triggered.

GLFW Warnings

When running the Animation Pipeline with docker, you may encounter GLFW warnings like the ones below. When this happens, you are unable to see the scene.

2024-05-24 16:41:09 [1,094ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2024-05-24 16:41:09 [1,094ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)

To address this issue, run the following:

sudo xhost +

Then re-run the Animation Graph container with the additional volume mount -v /tmp/.X11-unix:/tmp/.X11-unix:

docker run -it --rm --gpus all --network=host --name anim-graph-ms -v <path-to-avatar-scene-folder>:/home/ace/asset -v /tmp/.X11-unix:/tmp/.X11-unix nvcr.io/eevaigoeixww/animation/ia-animation-graph-microservice:1.0.1

Not Enough Memory

If docker containers or kubernetes deployments fail to start, you may be running out of memory. In kubernetes, this manifests as a pod stuck in the CrashLoopBackOff state. In the container’s logs, you may see lines resembling the following:

[A2F SDK] [ERROR] [TensorRT] 1: [defaultAllocator.cpp::allocate::20] Error Code 1: Cuda Runtime (out of memory)
[A2F SDK] [ERROR] [TensorRT] 2: [executionContext.cpp::ExecutionContext::410] Error Code 2: OutOfMemory (no further information)
[A2F SDK] [ERROR] Unable to create TensorRT Execution Context
[A2F SDK] [ERROR] Unable to initialize inference engine
[A2F SDK] [ERROR] SetNetwork Processor failed
[A2F SDK] [ERROR] Cannot Initialize from Json file: /opt/nvidia/a2f_pipeline/a2f_data/data/networks/claire_v1.3/a2f_ms_config.json

If this happens, ensure that your system meets the hardware requirements, in particular GPU memory, for the docker or kubernetes workflows.
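
In a kubernetes deployment, you can identify the failing pod and inspect the logs of its previously crashed container with standard kubectl commands, for example:

# Find pods stuck in CrashLoopBackOff and read the logs of the crashed container.
kubectl get pods -A | grep -i crashloop
kubectl logs <pod-name> -n <namespace> --previous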