Warning

Usage Restrictions

You may not use the Software or any of its components for the purpose of emotion recognition. Any technology included in the Software may only be used as fully integrated in the Software and consistent with all applicable documentation.

Getting Started#

The steps below will help you setup and run the Audio2Face-3D NIM and use our sample application to receive blendshapes, audio and emotions.

Prerequisites#

Check the Support Matrix to make sure you have the supported hardware and software stack.

Read the instructions corresponding to your Operating System.

NVAIE access#

To download Audio2Face-3D NIM Microservice you need an active subscription to an NVIDIA AI Enterprise product.

Contact a Sales representative on this form and request an access to NVIDIA AI Enterprise Essentials.

NGC Personal Key#

Set up your NGC Personal Key if you have not done so already.

Go to the NGC personal key setup page of the NGC website and Generate Personal Key.

Once prompted with a Generate Personal Key form, choose your key Name and Expiration, then select all services for Services Included.

Then you will get your Personal Key, make sure to save it somewhere safe.

Export the API key#

Export the API key generated at the previous step in NGC_API_KEY environment variable to run the A2F-3D NIM by running:

$ export NGC_API_KEY=<value>

To make the key available at startup, run the following command if you are using bash. Make sure you replace <value> with the actual API key.

$ echo "export NGC_API_KEY=<value>" >> ~/.bashrc

Docker Login to NGC#

To pull the NIM container image, you need to login to the nvcr.io docker registry. The username is $oauthtoken and the password is the API key generated earlier and stored in NGC_API_KEY. You can simply run the following command to login:

$ echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Login Succeeded

Launching the Audio2Face-3D NIM#

There are two quick ways to start the Audio2Face-3D NIM:

Use a pre-generated TRT engine for supported GPUs
Generate a TRT engine for your NVIDIA GPU

Pregenerated TRT engine#

The Audio2Face-3D NIM supports pre-generated engine for the following GPUs:

A10G
A100
A30
H100
L4
L40s
RTX6000
RTX4090
RTX 50 Series (using GB20x compatability mode profile)

To list available model profiles:

$ docker run -it --rm --network=host --gpus all \
--entrypoint nim_list_model_profiles nvcr.io/nim/nvidia/audio2face-3d:1.3.16

To launch the Audio2Face-3D NIM with auto profile selection:

$ docker run -it --rm --network=host --gpus all \
-e NGC_API_KEY=$NGC_CLI_API_KEY nvcr.io/nim/nvidia/audio2face-3d:1.3.16

Check the Manual profile selection section to set the profile manually and deploy.

Manual profile selection

Supported GPUs profiles can be found in the table below:

GPU Type	Profile ID
A10G	cb81f87bcd530fdec6bf29a96b83d6837e4a57ccd6f3622847c178988fb191ec
A30	5c9af5028db0c53e8c4f9db6db151ea839e18c2c270566229fc98f60d1ef993f
A100	e59c2f97d15763a368ae33b4c9419a83348d73193c5aa79f224d6022113afaaa
A100-SXM4-40GB	a2e2b4ff4edb677c0445275a5e3f7ea3b47233b8d517bd34aebb4df65f060e62
A100-PCIE	ee0960f5b9b2ed6321b1c0dcf58e3c76d2e3d8d9420bb64fe5e960cae724a4cd
A100-PCIE-40GB	2e01ff71a695f41bcfdff354fff1465983308d11a2716333bdf891d98cf465c2
H100	095153281fce60754c149b217e7775118c36450adfd5df2590b624cef810a767
H100-NVL	3d3a70ba2ae10496b827bf85468ae85b0c8ad6d65684c4f72e17fe57907eae88
H100-PCIE	c5fc10d30a2d1f1c514867dff26d8707ff1a9404d29312c4a3228e8288eca31a
L40S	c23fd2abf84952c6bdbe17378b865c562cab8784dac21d31aa36c30bdd6296c8
L4	2cec6eaafc5552880952775c50d95f02d4f6ef5b64ba6ea3f29bce5be0449bec
RTX6000	7296c3153bf4005ca20ebfd5e975b183b3e8a1ac189d2830dc09118eaedf5fd0
RTX4090	c761e52b62df2a2a46047aed74dd6e1da8826f3596bec3c197372c7592478f6b
GB20x (compatibility mode)	f4f4bc7183a661f81ab8f7a7bdbc1935d8397139593547a8d6513ee334a94375

Run this command and change the <manifest_profile_id> to the value from the table above corresponding to your GPU:

$  export NIM_MANIFEST_PROFILE=<manifest_profile_id>
$  docker run -it --rm --name audio2face-3d \
  --gpus all \
  --network=host \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
  nvcr.io/nim/nvidia/audio2face-3d:1.3.16

When the Audio2Face-3D NIM is deployed, it will use the james_v2.3 model with tongue animation disabled by default, as shown in the logs below. In order to enable tongue animation, please refer to the Flexible Configuration Management section in Audio2Face-3D NIM Container Deployment and Configuration Guide

[info] Tongue animation is disabled
[info] Will use A2F stylization ids: inference_model_id=james_v2.3, blendshape_id=james_topo2_v2.3, tongue_blendshape_id=

Choosing a Different Pre-configured Model#

While james_v2.3 is the default model when launching with a pregenerated TRT engine, you can select other pre-configured models like claire_v2.3 or mark_v2.3 by setting the PERF_A2F_MODEL environment variable during the docker run command. Using PERF_A2F_MODEL will also enable tongue animation for the selected model.

For example, to launch the Audio2Face-3D NIM with the mark_v2.3 model:

$ docker run -it --rm --network=host --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e PERF_A2F_MODEL='mark_v2.3' \
  nvcr.io/nim/nvidia/audio2face-3d:1.3.16

The available values for PERF_A2F_MODEL are:

james_v2.3 (default if the variable is not set)
claire_v2.3
mark_v2.3

Important

The PERF_A2F_MODEL environment variable offers a convenient way to switch between standard, pre-configured A2F models.

However, if you are implementing more detailed customizations or are using your own stylization configuration files (e.g., claire_stylization_config.yaml, james_stylization_config.yaml, mark_stylization_config.yaml), as detailed in the Stylization Configuration Files section of the Audio2Face-3D NIM Container Deployment and Configuration Guide document, you should not set the PERF_A2F_MODEL variable.

Setting PERF_A2F_MODEL loads a specific pre-set configuration for the chosen model, which can override or conflict with settings loaded from your custom stylization files, potentially leading to unexpected behavior.

For advanced model tuning and customization, always rely on the dedicated stylization configuration files.

Note

When you start the service, you might encounter warnings labeled as GStreamer-WARNING. These warnings occur because some libraries are missing from the container. However, they are safe to ignore, as these libraries are not used by Audio2Face-3D.

Expand this section for more details about the docker commands used above:

Docker flags explained

You can find the explanation of each flag in the above docker command in this table:

Flag	Description
`-it`	`--interactive` + `--tty` (see Docker docs)
`--rm`	Delete the container after it stops (see Docker docs)
`--name`	Give a name to the NIM container. Use any preferred value.
`--gpus all`	Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs.
`--network=host`	Connect container to host machine network. (see Docker docs)
`-e NGC_API_KEY=$NGC_API_KEY`	Add `NGC_API_KEY` environment variable in the container with the value from the `NGC_API_KEY` environment variable from the local machine.
`-e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE`	Add `NIM_MANIFEST_PROFILE` environment variable in the container.
`-e NIM_DISABLE_MODEL_DOWNLOAD=<value>`	Set `NIM_DISABLE_MODEL_DOWNLOAD` environment variable in the container. Default value is false. The variable controls if the A2F-3D NIM should download the model from NGC or not.

Running Inference#

Audio2Face-3D uses gRPC API. You can quickly try out the API by using the A2F-3D Python interaction application. Follow the instructions below to set it up:

$ git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
$ git checkout tags/v1.3
$ cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
$ pip3 install -r requirements.txt

Note

Audio2Face-3D NIM v1.3 continues to use v1.2.0 of the nvidia_ace gRPC python module.

To check if the service is ready to handle inference requests:

$ python3 a2f_3d.py health_check --url 0.0.0.0:52000

To run inference on one of the example audios:

$ python3 a2f_3d.py run_inference ../../example_audio/Mark_neutral.wav config/config_james.yml -u 0.0.0.0:52000

This command will print out where the results are saved, in a log similar with:

Input audio header info:
Sample rate: 16000 Hz
Bit depth: 16 bits
Channels: 1
Receiving data from server...
.............................
Status code: SUCCESS
Received status message with value: 'sent all data'
Saving data into output_000001 folder...

You can then explore the A2F-3D NIM output animations by running the command below and replacing <output_folder> with the name of the folder printed by the run inference command.

$ ls -l <output_folder>/
-rw-rw-r-- 1 user user    328 Nov 14 15:46 a2f_3d_input_emotions.csv
-rw-rw-r-- 1 user user  65185 Nov 14 15:46 a2f_3d_smoothed_emotion_output.csv
-rw-rw-r-- 1 user user 291257 Nov 14 15:46 animation_frames.csv
-rw-rw-r-- 1 user user 406444 Nov 14 15:46 out.wav

out.wav: contains the audio received
animation_frames.csv: contains the blendshapes
a2f_3d_input_emotions.csv: contains the emotions provided as input in the gRPC protocol
a2f_3d_smoothed_emotion_output.csv: contains emotions smoothed over time

Note

The maximum size of 1 audio buffer sent over the grpc is 10 seconds.
The maximum size of the audio clip to process is 300 seconds.
This information can be found in Audio2Face-3D NIM Container Deployment and Configuration Guide under the Advanced Configuration File section.

Model Caching#

When running the first time, the Audio2Face-3D NIM will download the model from NGC. You can cache this model locally by using a Docker volume mount. Follow the example below and set the LOCAL_NIM_CACHE environment variable to the desired local path. Make sure the local path has execute, read and write permissions (777 permissions).

$ mkdir -p ~/.cache/audio2face-3d
$ chmod 777 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d

Then simply run the Audio2Face-3D NIM and mount the folder inside the Docker container in /tmp/a2x. This will download and store the models in LOCAL_NIM_CACHE.

$  docker run -it --rm --name audio2face-3d \
     --gpus all \
     --network=host \
     -e NGC_API_KEY=$NGC_API_KEY \
     -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
     nvcr.io/nim/nvidia/audio2face-3d:1.3.16

Once the models have been stored locally, you can start running the Audio2Face-3D NIM as below using the NIM_DISABLE_MODEL_DOWNLOAD flag.

$  docker run -it --rm --name audio2face-3d \
     --gpus all \
     --network=host \
     -e NIM_DISABLE_MODEL_DOWNLOAD=true \
     -v $LOCAL_NIM_CACHE:/tmp/a2x \
     nvcr.io/nim/nvidia/audio2face-3d:1.3.16

Stopping the container#

You can easily stop and remove the running container by passing its name to docker stop and docker rm commands:

$ docker stop audio2face-3d
$ docker rm audio2face-3d