Docker & Omniverse Animation Pipeline Workflow#

Overview of the local Docker workflow setup (one Animation Graph microservice, one Omniverse Renderer microservice, and one Audio2Face-3D microservice).

In this setup, you run the microservices locally in Docker containers to create a single avatar animation stream instance.

The workflow configuration consists of the following components:

  • Audio2Face-3D microservice: converts speech audio into facial animation, including lip syncing.

  • Animation Graph microservice: manages and blends animation states.

  • Omniverse Renderer microservice: a renderer that visualizes the animation data based on the loaded avatar scene.

  • Gstreamer client: receives and displays the image and audio streams, acting as the user-facing front end.

  • Avatar scene: a collection of 3D scene and avatar model data saved in a local folder.

Prerequisites#

Before you start, make sure you have installed and verified all prerequisites in the Development Setup.

Additionally, this section assumes that the following prerequisites are met:

  • You have installed Docker.

  • You have access to NVAIE, which is required to download the Animation Graph and Omniverse Renderer microservices.

  • You have access to the public NGC catalog, which is required to download the Avatar Configurator.

Hardware Requirements#

Each component has its own hardware requirements. The requirements of the workflow are the sum of its components.

The actual requirements depend on the specific use case and the complexity of the avatar scene, and the numbers also vary with the actual hardware used. The following table provides guidance.

| Component | Minimum CPU | Recommended CPU | Minimum RAM | Recommended RAM | Recommended GPU | Minimum GPU RAM | Recommended GPU RAM | Recommended Storage |
|---|---|---|---|---|---|---|---|---|
| Audio2Face-3D | 4 cores | 8 cores | 4 GB | 8 GB | Any NVIDIA GPU with compute capability | 10 GB | 12 GB | 10 GB |
| Animation Graph | < 1 core | 1 core | 1 GB | 4 GB | < 1 NVIDIA RTX-compatible GPU | 1 GB | 1 GB | 5 GB |
| Omniverse Renderer | 2 cores | 6 cores | 24 GB | 32 GB | 1 NVIDIA RTX-compatible GPU | 8 GB | 12 GB | 7 GB |
| Total | 4 cores | 15 cores | 29 GB | 44 GB | 1-2 NVIDIA RTX-compatible GPUs | 19 GB | 25 GB | 22 GB |

Pull Docker Containers#

docker pull nvcr.io/nvidia/ace/ia-animation-graph-microservice:1.1.0
docker pull nvcr.io/nvidia/ace/ia-omniverse-renderer-microservice:1.1.0
docker pull nvcr.io/nim/nvidia/audio2face-3d:1.2
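Before continuing, you can check that all three images are now available locally (a quick sanity check; adjust the grep pattern if you pulled different versions):

# List the pulled images
docker images | grep -E "ia-animation-graph-microservice|ia-omniverse-renderer-microservice|audio2face-3d"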

Download Avatar Scene from NGC#

Download the scene from NGC and save it to a default-avatar-scene_v1.1.4 directory in your main working directory.

Alternatively, you can download the scene using the ngc command.

ngc registry resource download-version "nvidia/ace/default-avatar-scene:1.1.4"
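Either way, confirm that the scene ended up where the docker run commands below expect it (this assumes the default-avatar-scene_v1.1.4 directory sits in your current working directory):

# The directory should contain the avatar scene assets
ls $(pwd)/default-avatar-scene_v1.1.4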

Run Microservices#

The following instructions guide you through running the microservices in Docker containers. Once started, they can be restarted independently; just make sure to reconnect them with a new stream ID (see Connect Microservices through Common Stream ID).

Run Animation Graph Microservice#

Run the following command to start the Animation Graph Microservice. This service manages and animates the avatar’s position.

docker run -it --rm --gpus all --network=host --name anim-graph-ms -v $(pwd)/default-avatar-scene_v1.1.4:/home/interactive-avatar/asset nvcr.io/nvidia/ace/ia-animation-graph-microservice:1.1.0

The service is up and running once you see the log line "app ready".

Note

If you encounter warnings about glfw failing to initialize, see the troubleshooting section.
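If you prefer not to keep this terminal attached, a possible alternative is to run the container detached and follow its logs until the ready message appears (a sketch; image, mount, and container name are identical to the command above):

# Start the Animation Graph microservice in the background
docker run -d --rm --gpus all --network=host --name anim-graph-ms -v $(pwd)/default-avatar-scene_v1.1.4:/home/interactive-avatar/asset nvcr.io/nvidia/ace/ia-animation-graph-microservice:1.1.0

# Follow the logs and wait for the "app ready" line
docker logs -f anim-graph-ms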

Run Omniverse Renderer Microservice#

docker run --env IAORMS_RTP_NEGOTIATION_HOST_MOCKING_ENABLED=true --rm --gpus all --network=host --name renderer-ms -v $(pwd)/default-avatar-scene_v1.1.4:/home/interactive-avatar/asset nvcr.io/nvidia/ace/ia-omniverse-renderer-microservice:1.1.0

You might need to adapt the path to the scene if it was not downloaded to $(pwd)/default-avatar-scene_v1.1.4 in the current folder.

Note

Starting this service can take up to 30 min. It is ready when you see the following line in the terminal:

[SceneLoader] Assets loaded
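Because the startup can take a long time, you may want to run the renderer detached and wait for the ready marker in its logs instead (a sketch; image, environment variable, and mount are identical to the command above):

# Run the renderer in the background
docker run -d --rm --gpus all --network=host --name renderer-ms --env IAORMS_RTP_NEGOTIATION_HOST_MOCKING_ENABLED=true -v $(pwd)/default-avatar-scene_v1.1.4:/home/interactive-avatar/asset nvcr.io/nvidia/ace/ia-omniverse-renderer-microservice:1.1.0

# Poll the logs until the scene assets are loaded
until docker logs renderer-ms 2>&1 | grep -q "Assets loaded"; do sleep 10; done
echo "Renderer ready"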

Run Audio2Face-3D Microservice#

Run the following commands to start Audio2Face-3D:

# Create config file to disable bidirectional grpc
cat > deployment_config.yaml <<EOT
endpoints:
  use_bidirectional: false
EOT

# Run the docker container and mount the deployment config
docker run --rm --network=host -it --gpus all --entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline -v ./deployment_config.yaml:/opt/configs/deployment_config.yaml nvcr.io/nim/nvidia/audio2face-3d:1.2

Generate the TRT models:

# Generate the TRT models
./service/generate_trt_models.py

You should see the following line when the model generation finishes successfully:

Model generation done.

Note

The models must be generated every time the service is started unless they are cached explicitly. See the Audio2Face-3D NIM Container Deployment and Configuration Guide for more information.

Launch the service:

# Start A2F
/usr/local/bin/a2f_pipeline.run --deployment-config /opt/configs/deployment_config.yaml

The following log line appears when the service is ready:

[<date> <time>] [  global  ] [info] Running...

Note

More details on Audio2Face-3D deployment can be found in Audio2Face-3D NIM Container Deployment and Configuration Guide.
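If you do not need the interactive shell, the model generation and the service launch can also be chained into a single non-interactive container run (a sketch that reuses the paths and commands shown above):

# One-shot variant: generate the TRT models, then start A2F in the same container
docker run --rm --network=host --gpus all --entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline -v ./deployment_config.yaml:/opt/configs/deployment_config.yaml nvcr.io/nim/nvidia/audio2face-3d:1.2 -c "./service/generate_trt_models.py && /usr/local/bin/a2f_pipeline.run --deployment-config /opt/configs/deployment_config.yaml"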

Setup and Start Gstreamer#

We use Gstreamer to receive the image and audio output streams and visualize the avatar scene.

Note

No image or sound is shown until streams are added in the Connect Microservices through Common Stream ID step!

Install the Gstreamer plugins:

sudo apt-get install gstreamer1.0-plugins-bad gstreamer1.0-libav

Run the video receiver in its own terminal:

gst-launch-1.0 -v udpsrc port=9020 caps="application/x-rtp" ! rtpjitterbuffer drop-on-latency=true latency=20 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! queue ! autovideosink sync=false

This only shows terminal output until the stream is started by adding a stream ID in the next step:

Terminal output of Gstreamer video receiver.

Run the audio receiver in its own terminal. The Linux and Windows commands differ slightly; if you changed IAORMS_LIVESTREAM_AUDIO_SAMPLE_RATE, make sure to set the clock-rate to the same value:

# On Linux
gst-launch-1.0 -v udpsrc port=9021 caps="application/x-rtp,clock-rate=16000" ! rtpjitterbuffer ! rtpL16depay ! audioconvert ! autoaudiosink sync=false

# On Windows
gst-launch-1.0 -v udpsrc port=9021 caps="application/x-rtp,clock-rate=16000" ! rtpjitterbuffer ! rtpL16depay ! audioconvert ! directsoundsink sync=false

This only shows terminal output until the stream is started by adding a stream ID in the next step:

Terminal output of Gstreamer audio receiver.
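If you prefer to manage both receivers from a single terminal, you could also start them from one small script (a sketch for Linux, using exactly the pipelines shown above):

#!/bin/bash
# Start the video and audio receivers in the background and wait for both
gst-launch-1.0 -v udpsrc port=9020 caps="application/x-rtp" ! rtpjitterbuffer drop-on-latency=true latency=20 ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! queue ! autovideosink sync=false &
gst-launch-1.0 -v udpsrc port=9021 caps="application/x-rtp,clock-rate=16000" ! rtpjitterbuffer ! rtpL16depay ! audioconvert ! autoaudiosink sync=false &
wait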

Connect Microservices through Common Stream ID#

Back in the main terminal, run the following commands:

stream_id=$(uuidgen)
curl -X POST -s http://localhost:8020/streams/$stream_id
curl -X POST -s http://localhost:8021/streams/$stream_id

Note

After executing these commands, a Gstreamer window should open and you should see a slightly moving avatar.
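If you set up and tear down the workflow repeatedly, you might wrap the stream creation in a small helper that also reports the HTTP status codes returned by both services (a sketch; ports 8020 and 8021 are the Animation Graph and Omniverse Renderer HTTP APIs used above):

# Create one shared stream ID and register it with both services
stream_id=$(uuidgen)
for port in 8020 8021; do
  code=$(curl -X POST -s -o /dev/null -w "%{http_code}" http://localhost:$port/streams/$stream_id)
  echo "port $port -> HTTP $code"
done
echo "stream_id=$stream_id"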

Test Animation Graph Interface#

Let’s set a new posture:

curl -X PUT -s http://localhost:8020/streams/$stream_id/animation_graphs/avatar/variables/posture_state/Talking

or change the avatar’s position:

curl -X PUT -s http://localhost:8020/streams/$stream_id/animation_graphs/avatar/variables/position_state/Left

or start a gesture:

curl -X PUT -s http://localhost:8020/streams/$stream_id/animation_graphs/avatar/variables/gesture_state/Pulling_Mime

or trigger a facial gesture:

curl -X PUT -s http://localhost:8020/streams/$stream_id/animation_graphs/avatar/variables/facial_gesture_state/Smile

Alternatively, you can use the OpenAPI interface to see and try out more: http://localhost:8020/docs.

You can find all valid variable values here: Default Animation Graph.
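As a quick smoke test, you can also cycle through the examples above in one go (a sketch using only the variable values shown in this section):

# Cycle through a few animation graph variables with short pauses in between
base=http://localhost:8020/streams/$stream_id/animation_graphs/avatar/variables
curl -X PUT -s $base/posture_state/Talking && sleep 3
curl -X PUT -s $base/position_state/Left && sleep 3
curl -X PUT -s $base/gesture_state/Pulling_Mime && sleep 3
curl -X PUT -s $base/facial_gesture_state/Smile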

Test Audio2Face-3D#

Let’s now feed a sample audio file into Audio2Face-3D to drive the facial speaking animation.

Normally, you would send audio to Audio2Face-3D through its gRPC API. For convenience, a Python script lets you do this from the command line. Clone the ACE repo and follow the steps to set up the script.

The script comes with a sample audio file that is compatible with Audio2Face-3D. From the microservices/audio_2_face_microservice/1.2/scripts/audio2face_in_animation_pipeline_validation_app directory of the ACE repo, run the following command to send the sample audio file to Audio2Face-3D:

python3 validate.py -u 127.0.0.1:50000 -i $stream_id ../../example_audio/Mark_joy.wav

You can safely ignore any WavFileWarning warnings when running the script.

Note

The audio formats in the Audio2Face-3D microservice, Animation Graph microservice, and Omniverse Renderer microservice need to be configured in the same way.

You can interrupt the request while the avatar is speaking using the HTTP interruption endpoint. For simplicity, you can replace the randomly generated request ID in validate.py on line 46 with a hardcoded one (e.g. request_id="123") and use it in the following request:

curl -X DELETE -s http://localhost:8020/streams/$stream_id/requests/<request-id>
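Putting the two pieces together, an interruption test could look like this (a sketch that assumes you hardcoded request_id="123" in validate.py as described above):

# Send the sample audio in the background, let the avatar start speaking, then interrupt
python3 validate.py -u 127.0.0.1:50000 -i $stream_id ../../example_audio/Mark_joy.wav &
sleep 2
curl -X DELETE -s http://localhost:8020/streams/$stream_id/requests/123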

Change Avatar Scene#

To modify or switch out the default avatar scene, you can use the Avatar Configurator.

Download and unpack it, then start it by running ./run_avatar_configurator.sh.


Once it has started successfully (the first start takes longer because shaders are compiled), you can create and save your custom scene.

Then copy all files from the avatar_configurator/exported folder and replace the files in the local scene folder from the step “Download Avatar Scene from NGC”.
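For example, assuming the scene was downloaded to the default-avatar-scene_v1.1.4 directory as in that step, the copy could look like this:

# Overwrite the downloaded scene with the scene exported from the Avatar Configurator
cp -r avatar_configurator/exported/* $(pwd)/default-avatar-scene_v1.1.4/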

After restarting the Animation Graph and Omniverse Renderer services, you will see the new scene.

Clean up: Remove Streams#

You can clean up the streams using the following commands:

curl -X DELETE -s http://localhost:8020/streams/$stream_id
curl -X DELETE -s http://localhost:8021/streams/$stream_id
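To tear down the rest of the workflow, you can also stop the containers started earlier (a sketch; the names match the --name arguments used above, and the Audio2Face-3D container stops when you exit its shell or terminate its docker run command):

# Stop the Animation Graph and Omniverse Renderer containers (started with --rm, so they are removed automatically)
docker stop anim-graph-ms renderer-ms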