Warning

Usage Restrictions

You may not use the Software or any of its components for the purpose of emotion recognition. Any technology included in the Software may only be used as fully integrated in the Software and consistent with all applicable documentation.

Getting Started#

The steps below will help you set up and run the Audio2Face-3D NIM and use our sample application to receive blendshapes, audio, and emotions.

Prerequisites#

Check the Support Matrix to make sure you have the supported hardware and software stack.

Linux System - Using Ubuntu 24.04
  1. Set up Docker without Docker Desktop

Install Docker using the convenience script:

$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh ./get-docker.sh

Add your user account to the docker group:

$ sudo groupadd docker
$ sudo usermod -aG docker <username>

Log out of your system and log back in, then run a sanity check:

$ docker run hello-world

You should see “Hello from Docker!” printed out.

Install the Docker Compose plugin:

$ sudo apt-get update
$ sudo apt-get install docker-compose-plugin

Check that the installation was successful by running:

$ docker compose version

Set up iptables compatibility:

$ sudo update-alternatives --config iptables

When prompted, choose the selection whose path is /usr/sbin/iptables-legacy (typically selection 1).
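
If you prefer a non-interactive setup (for example in a provisioning script), the same choice can be made directly:

$ sudo update-alternatives --set iptables /usr/sbin/iptables-legacy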

Restart your system and check the Docker status:

$ service docker status

You should see “active (running)” in the messages. To exit, press q.

  2. Install CUDA Toolkit

To install cuda-toolkit-12-9 (recommended), run:

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda-toolkit-12-9

Alternatively, to install the latest CUDA Toolkit, visit NVIDIA Developer - CUDA downloads and follow the instructions there.
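
To confirm the toolkit installed correctly, you can query the compiler version; this assumes the default install location under /usr/local/cuda:

$ /usr/local/cuda/bin/nvcc --version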

Note

B200 multi-GPU systems (NVSwitch/NVL5+): If your system has multiple B200 GPUs (e.g., 8x B200), you must install and start NVIDIA FabricManager, NVLink Subnet Manager (nvlsm), and InfiniBand diagnostics to enable CUDA compute. Without these, nvidia-smi will work but all CUDA operations will fail with error 802. Install with:

$ sudo apt-get install -y nvidia-fabricmanager-<driver-branch> nvlsm infiniband-diags
$ sudo modprobe ib_umad
$ echo "ib_umad" | sudo tee /etc/modules-load.d/ib_umad.conf
$ sudo systemctl enable --now nvidia-fabricmanager

Replace <driver-branch> with your driver branch (e.g., 570, 580). The FabricManager package version must match your installed driver version. See CUDA Error 802 “system not yet initialized” on B200 Multi-GPU Systems in Troubleshooting for details.
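
After enabling the service, confirm that FabricManager is active before launching any CUDA workloads; the command below should print active:

$ systemctl is-active nvidia-fabricmanager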

  3. Install NVIDIA Container Toolkit

If any of the steps below fails, follow the official NVIDIA Container Toolkit docs instead.

Configure the production repository:

$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
   && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
   sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
   sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Optionally, configure the repository to use experimental packages:

$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the package list and install the toolkit:

$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit

  4. Configure Docker with the NVIDIA Container Toolkit

Run the following commands:

$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

If all goes well, you should be able to start a Docker container and run nvidia-smi inside it to see information about your GPU. Example output is shown below; the exact numbers will vary with your hardware. Ensure the reported CUDA version is >=12.8 and <13.0 (12.9 recommended).

$ sudo docker run --rm --gpus all ubuntu nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI <version>       Driver Version: <version>       CUDA Version: 12.9   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:01:00.0 Off |                    0 |
|  0%   33C    P8    18W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

NGC Personal Key#

Set up your NGC Personal Key if you have not done so already.

Go to the NGC personal key setup page of the NGC website and Generate Personal Key.

Once prompted with a Generate Personal Key form, choose your key Name and Expiration, then select all services for Services Included.

You will then receive your Personal Key; save it somewhere safe.

Export the API key#

Export the API key generated in the previous step as the NGC_API_KEY environment variable so that the A2F-3D NIM can use it:

$ export NGC_API_KEY=<value>

To make the key available in future sessions, run the following command if you are using bash, replacing <value> with the actual API key:

$ echo "export NGC_API_KEY=<value>" >> ~/.bashrc
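
To apply the change to your current shell without logging out, reload the file:

$ source ~/.bashrc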

Docker Login to NGC#

To pull the NIM container image, you need to log in to the nvcr.io docker registry. The username is $oauthtoken and the password is the API key generated earlier and stored in NGC_API_KEY. Run the following command to log in:

$ echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Login Succeeded

Launching the Audio2Face-3D NIM#

There are two quick ways to start the Audio2Face-3D NIM:

  • Use a pre-generated TRT engine for supported GPUs

  • Generate a TRT engine for your NVIDIA GPU

Pregenerated TRT engine#

The Audio2Face-3D NIM supports pre-generated engines for the following GPUs:

  • A10G

  • A30

  • L4

  • L40S

  • RTX 4090

  • RTX 5080

  • RTX 5090

  • RTX 6000 Ada

  • RTX PRO 6000 Blackwell

  • B200

GPUs not listed above may use fallback profiles where available: for example, RTX 30 series maps to the A10G profile. GPUs with no mapping (such as A100 or H100) have no pre-generated profiles; to run on these GPUs, set NIM_DISABLE_MODEL_DOWNLOAD=true to generate TRT engines locally.

Note

The set of available pre-generated profiles evolves over time. For Blackwell-class GPUs (for example RTX PRO 6000 Blackwell or B200 in newer releases), prefer auto profile selection (or run nim_list_model_profiles) to see what profiles are available for your exact hardware.

To list available model profiles:

$ docker run -it --rm --network=host --gpus all \
--entrypoint nim_list_model_profiles nvcr.io/nim/nvidia/audio2face-3d:2.0

The output lists each profile with a 64-character hash ID followed by tags. The key fields to look for in each profile line are:

  • character: the A2F model (james, claire, mark, or multi)

  • gpu: the target GPU (RTX4090, RTX5090, B200, etc.)

  • precision: fp16 or fp32

  • model_type: tensorrt

  • batch_size: max concurrent streams supported

For example, in a profile line like batch_size:100|character:james|gpu:RTX5090|model_type:tensorrt|precision:fp16, the relevant info is: james model, optimized for RTX 5090, fp16 precision, up to 100 streams.
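
If the list is long, you can narrow it down by piping the output through grep; this is a convenience sketch, assuming the tag format shown above (the -it flags are dropped because the output is piped):

$ docker run --rm --network=host --gpus all \
  --entrypoint nim_list_model_profiles nvcr.io/nim/nvidia/audio2face-3d:2.0 | grep 'gpu:RTX5090'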

Tip

The hash IDs are opaque identifiers – you rarely need to copy them manually. In most cases, auto profile selection (shown next) picks the right profile for your GPU automatically. Use the GPU Type and Character columns in the profile table below if you need to identify a specific profile.

To launch the Audio2Face-3D NIM with auto profile selection:

$ docker run -it --rm --network=host --gpus all \
-e NGC_API_KEY=$NGC_API_KEY nvcr.io/nim/nvidia/audio2face-3d:2.0

Check the Manual profile selection section to set the profile manually and deploy.

Manual profile selection

Supported GPU and character profile combinations can be found in the table below. Each profile is pre-generated for a specific GPU and character model (claire, james, mark, or multi for diffusion):

| GPU Type | Character | Profile ID |
|----------|-----------|------------|
| A10G | claire | 7ab78afe1d3f160d49e22617ce5edb96dfb1eb2ca93b16ca1d0f5148a922801a |
| A10G | james | 919638a5b89d3c1b1292be21c62bb8bbabbb93ac63b03d2e809a001041375700 |
| A10G | mark | 535fbb2bf408160cce3645b2b85eddc8c71378ebcfbda66da38ffb252d64a426 |
| A10G | multi | f4c212c297315b9ab8462dd5da103f676a9a735a788010439d8171a81e303559 |
| A30 | claire | ea52901e7ab58809db52363a8683c5ca05808cb249dd5c89863c1d576887fe91 |
| A30 | james | cf58fe7973f49aa546cdea7488a6dc1767b3776f2787569af99cd2bc136b6756 |
| A30 | mark | 9c83ef85849071b74f09bcd793659632cc6d4e2055ca33dc4d572f1b63d7e3dd |
| A30 | multi | b328cd1c26b09c1ba69b0e65e1d44e6d742f5c381049a8bc6be448923efbc677 |
| B200 | claire | 62f390868b91d5142c9ff67cabd55993f464aab58a6d267f3f62a639a6287fb3 |
| B200 | james | f3ed24948f366869149a15925d40e2a1da3cfa1ae26d570f361fb5d451aa90e3 |
| B200 | mark | c2f92c28e8211892bc799871b59e5d92672b2c93322c421451412309067179aa |
| B200 | multi | 2ef93055df06c23efe4780e7c225c383b07173e7a55ecfb8ddbf776bd6178f04 |
| L4 | claire | abeb4547dc5ddb096074adb70d8120aa0dc4e87dd99fe951fffb4e53336aa369 |
| L4 | james | e46d5580bc1b0160bf55e5f790e008cc31acbcb78da3a2efba414cb41ea53044 |
| L4 | mark | 6d139e6bf3b85347c5412996cccb86a60c55532c20793d95d9deac6f63402b7f |
| L4 | multi | b99e75b7475c597ebf6cdd6ff3383e60bf16b54c6aefa4647c675fd7edf492e2 |
| L40S | claire | fdec053afe6da669a5a82e2c896f7af1e9bca0f562e0576e89d2baaa69ce11fd |
| L40S | james | 9f69b475038f0ea376f63c2b6e460ab41431b2aca48e77b43a3c468b4b67302f |
| L40S | mark | 059865f2fd53f8b0e1d5803532e96089d168f58b5de9cfe7b6a0d88604216b7e |
| L40S | multi | add44b555e5328bdb108b2b28df78d346ede048d642796622a63e1d615e50b4f |
| RTX4090 | claire | c021f3ca049d620f84393cc2e8b1748439a849f4e4813e80343b46f819042f7d |
| RTX4090 | james | c39d6cc51c706dcf9cb07ceecc80bf8641c4b2c08fe8588e0d6d875b464c6295 |
| RTX4090 | mark | 4a13a78c606a49ddf4c0aeac66488897a0c392286bc98dde8c5097ce378b6835 |
| RTX4090 | multi | dee1ac011ab648c0e10c9e92b5fec24f6ca665d9a76f69daa4476260a0e4d453 |
| RTX5080 | claire | ed88e4a416109eb3ff36e01b34092149d989379ff3786b7df45e45806edba899 |
| RTX5080 | james | 6f2226bfcd979463e7081b7f4dfea0b97c4dba1c8fe9911e7911863a0156196a |
| RTX5080 | mark | c31d23e3fe98c7c947c10fdab96e0540ca2b6a44de039c6ce822f08bed2b797d |
| RTX5080 | multi | 3e538490d00d413977d090a78c012148009171eabd8bcf0d964a0aea29040a9a |
| RTX5090 | claire | 4e1be9a8b348517e3a914f53b33bb409b45315b01897014410903017b3fad9be |
| RTX5090 | james | 5c09e7a77d93637d0e69762ec6e9a574e478a47c0c7d72f6f6887aa5c952afeb |
| RTX5090 | mark | 829f14e8dfe36e55d7e5bf1a245325f14b18efbc6791cb11e1beccbd6cfdf3b9 |
| RTX5090 | multi | b7f47d1ba26445410947583427d9f717609e24241264015c9640371339b60e03 |
| RTX6000 Ada | claire | 325ea37a77d7986fdd6d25ea43223a3172f0267036c4a8492a6e4d6e9efc4952 |
| RTX6000 Ada | james | df18171ab6821b765ec1bd06be0d7a8027e3ac49813e77fc6f6e3e717c839ebe |
| RTX6000 Ada | mark | 2afb78efad674069bf4e88860def85685c41cfab0ea6eba3447798cd77df8a80 |
| RTX6000 Ada | multi | b027596e3621e34702c8244cc473809cf12cdf7d136775bbaeba023ad7e54a4f |
| RTX PRO 6000 Blackwell | claire | 2817c04ed84ce76e4793a4498b88e0e63a7cb8e5efcf1f465940e4908e75113c |
| RTX PRO 6000 Blackwell | james | a45d8afe9c7e624d103f691e0a30a138dd9bba4cbb47c168d2a649c695fa7357 |
| RTX PRO 6000 Blackwell | mark | 539bba5c5e2e7410d1369bbccf7335d54b814276fd9d8c16c95ab65490ca4def |
| RTX PRO 6000 Blackwell | multi | 69fc655ff6a1a52b73969e980bd2b72f059e4b144cdf3f19aa99ea97524e0874 |

Run the following commands, replacing <manifest_profile_id> with the value from the table above that corresponds to your GPU and desired character:

$ export NIM_MANIFEST_PROFILE=<manifest_profile_id>
$ docker run -it --rm --name audio2face-3d \
  --gpus all \
  --network=host \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
  nvcr.io/nim/nvidia/audio2face-3d:2.0

Warning

You must select a profile that matches your GPU. Each profile contains TensorRT engines compiled for a specific GPU architecture. Using a profile built for a different GPU (e.g., an L40S profile on an RTX 5090) will download TRT engines that are incompatible with your hardware and the NIM will fail at runtime. If unsure which GPU you have, run nvidia-smi --query-gpu=name --format=csv,noheader and match the result to the GPU Type column in the table above.

Exception: Some GPUs within the same architecture can share profiles (e.g., RTX 30 series uses the A10G profile). Auto profile selection handles these mappings correctly, but manually providing a NIM_MANIFEST_PROFILE for the wrong GPU will bypass this logic and may cause runtime failures. For the full list of GPU-to-profile mappings, see GPU Device ID Mismatch Warning (nim_list_model_profiles).

When the Audio2Face-3D NIM is deployed, it will use the james_v2.3.1 model by default, as shown in the logs below.

[info] Using A2F model: james_v2.3.1

Choosing a Different Pre-configured Model#

Auto profile selection always downloads the james TRT engine. To use a different model (claire, mark, or multi / diffusion), set NIM_MANIFEST_PROFILE to the profile hash that matches your GPU and character from the profile table above.

For example, to launch with the mark model on an L40S GPU:

$ docker run -it --rm --network=host --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=059865f2fd53f8b0e1d5803532e96089d168f58b5de9cfe7b6a0d88604216b7e \
  nvcr.io/nim/nvidia/audio2face-3d:2.0

The container will download the TRT engine for the specified profile and automatically configure the service to use the matching character model.

Note

You can also set PERF_A2F_MODEL alongside NIM_MANIFEST_PROFILE if you need tongue animation enabled or want to force a specific model variant. The available values are:

  • james_v2.3.1 (default)

  • claire_v2.3.1

  • mark_v2.3

  • multi_v3.2_james

  • multi_v3.2_claire

  • multi_v3.2_mark

However, PERF_A2F_MODEL alone (without NIM_MANIFEST_PROFILE) will not change which TRT engine is downloaded — auto selection always picks james. Always pair it with the matching profile, or use NIM_DISABLE_MODEL_DOWNLOAD=true (see below).
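
For example, a launch that pairs the two variables for the mark model could look as follows; the hash shown is the L40S mark profile from the table above, so substitute the entry for your own GPU:

$ docker run -it --rm --network=host --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=059865f2fd53f8b0e1d5803532e96089d168f58b5de9cfe7b6a0d88604216b7e \
  -e PERF_A2F_MODEL='mark_v2.3' \
  nvcr.io/nim/nvidia/audio2face-3d:2.0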

Important

If you are using your own stylization configuration files (e.g., claire_stylization_config.yaml, mark_stylization_config.yaml), as detailed in the Stylization Configuration Files section of the Audio2Face-3D NIM Container Deployment and Configuration Guide document, you should not set PERF_A2F_MODEL. Setting it loads a pre-set configuration that can override your custom stylization files.

Similarly, setting NIM_MANIFEST_PROFILE will update the stylization configuration to match the profile’s character. For full control over both model selection and stylization parameters, use the custom entrypoint workflow described in Audio2Face-3D NIM Container Deployment and Configuration Guide.

Generating TRT Engines Locally#

If your GPU does not have a pre-generated profile for the character you need, or if you want to generate optimized TRT engines on your exact hardware, you can let the container build them locally. This is also useful for GPUs without any pre-generated profiles (such as A100 or H100).

The simplest approach uses NIM_DISABLE_MODEL_DOWNLOAD=true with PERF_A2F_MODEL. The container will automatically generate the required TRT engines on first launch (this takes a few minutes):

$ mkdir -p ~/.cache/audio2face-3d
$ chmod 755 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d

$ docker run -it --rm --name audio2face-3d \
  --gpus all \
  --network=host \
  -e NIM_DISABLE_MODEL_DOWNLOAD=true \
  -e PERF_A2F_MODEL='mark_v2.3' \
  -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
  nvcr.io/nim/nvidia/audio2face-3d:2.0

On the first run, the container runs generate_trt_models.py internally to build mark_v2.3.trt and a2e.trt into /tmp/a2x. Because the directory is mounted as a volume, subsequent launches reuse the cached engines and start immediately.

Replace mark_v2.3 with any supported model: james_v2.3.1, claire_v2.3.1, mark_v2.3, multi_v3.2_james, multi_v3.2_claire, or multi_v3.2_mark.

Warning

Default model fallback behavior: If an invalid, misspelled, or unrecognized model name is provided (e.g., james_2.3 instead of james_v2.3.1), the container logs an error but does not stop. It silently falls back to the default model james_v2.3.1 and starts the inference server normally. This applies regardless of how the model is specified (PERF_A2F_MODEL, NIM_MANIFEST_PROFILE, or stylization config).

Always verify the active model in the startup logs:

[IProc] [info] Using A2F TRT engine for model 'mark_v2.3': /tmp/a2x/mark_v2.3.trt

If you see james_v2.3.1 here but expected a different model, check that your model name exactly matches one of the supported names listed above.

Note

The container runs as UID 1000. On most single-user Linux systems your user is also UID 1000 (verify with id -u), so chmod 755 is sufficient. If your UID differs, grant ownership with sudo chown 1000:1000 ~/.cache/audio2face-3d, or for quick prototyping use chmod 777.

TRT engines are GPU-specific and model-specific. Because NIM_DISABLE_MODEL_DOWNLOAD=true prevents downloading fresh engines, the container relies entirely on what is in the cache. If the cache contains engines for a different model (e.g., mark_v2.3.trt when you request multi_v3.2_claire), the container will fail with: Required TRT engine not found: multi_v3.2.trt. To switch models, either delete the cached engines first (rm $LOCAL_NIM_CACHE/*.trt) or omit the -v volume mount entirely to generate engines ephemerally without caching.
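
Before switching models, you can check which engines the cache currently holds:

$ ls -l $LOCAL_NIM_CACHE/*.trt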

Manual TRT generation (advanced)

For full control, you can enter the container, generate engines manually, and then start the service. This is useful if you want to combine TRT generation with custom stylization configs.

  1. Launch the container with a custom entrypoint:

$ docker run -it --rm --name audio2face-3d \
  --gpus all \
  --network=host \
  --entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
  -e NIM_SKIP_A2F_START=true \
  -e NIM_DISABLE_MODEL_DOWNLOAD=true \
  -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
  nvcr.io/nim/nvidia/audio2face-3d:2.0

  2. Inside the container, generate TRT engines for your desired model:

$ ./service/generate_trt_models.py \
    --stylization-config /opt/sample_configs/mark_stylization_config.yaml

  3. Start the service. You have two options:

Option A – Full NIM server (recommended): Starts the complete NIM stack including the HTTP API on port 8000 (health checks, metrics) and the gRPC server on port 52000:

$ /opt/nim/start_server.sh

Option B – A2F pipeline only: Starts just the gRPC inference server on port 52000 with the same stylization config. This is useful for quick testing or when you need to pass custom --deployment-config or --advanced-config flags:

$ a2f_pipeline.run \
    --stylization-config /opt/sample_configs/mark_stylization_config.yaml
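
Whichever option you choose, you can verify from another shell on the host (the container uses host networking) that the gRPC server is listening:

$ ss -ltn | grep 52000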

The generated engines are persisted in $LOCAL_NIM_CACHE via the volume mount, so subsequent launches (with the standard docker run command) will reuse them.

The built-in per-character stylization configs are located at /opt/sample_configs/ inside the container (claire_stylization_config.yaml, james_stylization_config.yaml, mark_stylization_config.yaml, and their diffusion variants). The active default config is at /apps/configs/stylization_config.yaml (james). You can also mount your own configs; see Audio2Face-3D NIM Container Deployment and Configuration Guide for details.

Docker flags explained

The table below explains each flag used in the docker commands above:

| Flag | Description |
|------|-------------|
| -it | --interactive + --tty (see Docker docs) |
| --rm | Delete the container after it stops (see Docker docs) |
| --name | Give a name to the NIM container. Use any preferred value. |
| --gpus all | Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs. |
| --network=host | Connect the container to the host machine network (see Docker docs). |
| -e NGC_API_KEY=$NGC_API_KEY | Set the NGC_API_KEY environment variable in the container from the value of NGC_API_KEY on the local machine. |
| -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE | Set the NIM_MANIFEST_PROFILE environment variable in the container. |
| -e NIM_DISABLE_MODEL_DOWNLOAD=<value> | Set the NIM_DISABLE_MODEL_DOWNLOAD environment variable in the container. Defaults to false. Controls whether the A2F-3D NIM downloads the model from NGC. |

Checking Service Health#

The NIM container exposes HTTP endpoints on port 8000 that you can query directly with curl — no additional software is required:

$ curl -s http://localhost:8000/v1/health/ready
{"object":"health.response","message":"ready","status":"ready"}

| Endpoint | Description |
|----------|-------------|
| GET /v1/health/ready | Returns HTTP 200 when the service is ready to handle inference requests. |
| GET /v1/health/live | Liveness probe. Returns HTTP 200 when the process is running (even if not yet ready). |
| GET /v1/metrics | Prometheus-format metrics for the NIM process (GPU utilization, memory, power, HTTP request count and latency). |
| GET /v1/version | Returns the NIM release and API version. |
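
In deployment scripts, you can block until the service reports ready by polling the readiness endpoint; a minimal sketch:

$ until curl -sf http://localhost:8000/v1/health/ready > /dev/null; do sleep 5; done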

Note

These HTTP endpoints are available alongside the primary gRPC API on port 52000. For gRPC-based health checking, see the Health Check gRPC section or use the Python sample application shown below.

The /v1/metrics endpoint on port 8000 exposes NIM-level metrics (GPU, process, HTTP requests). For A2F application-level metrics (streams_in_use, streams_available), enable telemetry in the deployment configuration and scrape port 9464 (the default prometheus_endpoint). See Observability for details.
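
For a quick look at the NIM-level metrics without setting up Prometheus:

$ curl -s http://localhost:8000/v1/metrics | head -n 20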

Running Inference#

Audio2Face-3D exposes a gRPC API. You can quickly try it out using the A2F-3D Python interaction application. Follow the instructions below to set it up:

$ git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
$ cd Audio2Face-3D-Samples
$ git checkout tags/v2.0
$ cd scripts/audio2face_3d_microservices_interaction_app
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
$ pip3 install -r requirements.txt

Note

Audio2Face-3D NIM v2.0 continues to use v1.2.0 of the nvidia_ace gRPC python module.

To check if the service is ready to handle inference requests:

$ python3 a2f_3d.py health_check --url 0.0.0.0:52000

To run inference on one of the example audios:

$ python3 a2f_3d.py run_inference ../../example_audio/Mark_neutral.wav config/config_james.yml -u 0.0.0.0:52000

This command prints where the results are saved, in a log similar to:

Input audio header info:
Sample rate: 16000 Hz
Bit depth: 16 bits
Channels: 1
Receiving data from server...
.............................
Status code: SUCCESS
Received status message with value: 'sent all data'
Saving data into output_000001 folder...

You can then explore the A2F-3D NIM output animations by running the command below, replacing <output_folder> with the folder name printed by the run_inference command.

$ ls -l <output_folder>/
-rw-rw-r-- 1 user user    328 Nov 14 15:46 a2f_3d_input_emotions.csv
-rw-rw-r-- 1 user user  65185 Nov 14 15:46 a2f_3d_smoothed_emotion_output.csv
-rw-rw-r-- 1 user user 291257 Nov 14 15:46 animation_frames.csv
-rw-rw-r-- 1 user user 406444 Nov 14 15:46 out.wav

  • out.wav: contains the audio received

  • animation_frames.csv: contains the blendshapes

  • a2f_3d_input_emotions.csv: contains the emotions provided as input in the gRPC protocol

  • a2f_3d_smoothed_emotion_output.csv: contains emotions smoothed over time
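
For a quick sanity check, you can peek at the first rows of the blendshape output; the exact column layout depends on the release:

$ head -n 3 <output_folder>/animation_frames.csv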

Model Caching#

The first time it runs, the Audio2Face-3D NIM downloads the model from NGC. You can cache this model locally using a Docker volume mount. Follow the example below and set the LOCAL_NIM_CACHE environment variable to the desired local path. The container runs as UID 1000 by default, so the cache directory must be readable and writable by that user.

$ mkdir -p ~/.cache/audio2face-3d
$ chmod 755 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d

Note

The container runs as UID 1000. On most single-user Linux systems your user is also UID 1000 (verify with id -u), so chmod 755 is sufficient. If your UID differs, grant ownership with sudo chown 1000:1000 ~/.cache/audio2face-3d, or for quick prototyping use chmod 777.

Then run the Audio2Face-3D NIM and mount the folder at /tmp/a2x inside the Docker container. This downloads and stores the models in LOCAL_NIM_CACHE.

$ docker run -it --rm --name audio2face-3d \
     --gpus all \
     --network=host \
     -e NGC_API_KEY=$NGC_API_KEY \
     -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
     nvcr.io/nim/nvidia/audio2face-3d:2.0

Once the models have been stored locally, you can run the Audio2Face-3D NIM without re-downloading them by setting the NIM_DISABLE_MODEL_DOWNLOAD flag:

$ docker run -it --rm --name audio2face-3d \
     --gpus all \
     --network=host \
     -e NIM_DISABLE_MODEL_DOWNLOAD=true \
     -v "$LOCAL_NIM_CACHE:/tmp/a2x" \
     nvcr.io/nim/nvidia/audio2face-3d:2.0

Stopping the container#

You can stop the running container by passing its name to docker stop:

$ docker stop audio2face-3d

Because the docker run examples above use the --rm flag, the container is removed automatically once it stops. If you started it without --rm, remove it explicitly:

$ docker rm audio2face-3d