Warning
Usage Restrictions
You may not use the Software or any of its components for the purpose of emotion recognition. Any technology included in the Software may only be used as fully integrated in the Software and consistent with all applicable documentation.
Getting Started#
The steps below will help you set up and run the Audio2Face-3D NIM and use our sample application to receive blendshapes, audio, and emotions.
Prerequisites#
Check the Support Matrix to make sure you have the supported hardware and software stack.
Linux System - Using Ubuntu 24.04
Setup Docker without Docker Desktop
Install docker using the convenience script:
$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh ./get-docker.sh
Add your user account to the docker group:
$ sudo groupadd docker
$ sudo usermod -aG docker <username>
Log out and log back in to your system, then do a sanity check:
$ docker run hello-world
You should see “Hello from Docker!” printed out.
Install the Docker Compose plugin:
$ sudo apt-get update
$ sudo apt-get install docker-compose-plugin
Check that the installation was successful by running:
$ docker compose version
Set up iptables compatibility:
$ sudo update-alternatives --config iptables
When prompted, choose Selection 1, with the path /usr/sbin/iptables-legacy.
Restart your system and check the Docker status:
$ service docker status
You should see “active (running)” in the messages. To exit, press q.
Install CUDA Toolkit
For cuda-toolkit-12-9 (recommended), run these commands:
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
$ sudo apt-get update
$ sudo apt-get -y install cuda-toolkit-12-9
Alternatively, to install the latest CUDA Toolkit, visit NVIDIA Developer - CUDA downloads and follow the instructions.
Note
B200 multi-GPU systems (NVSwitch/NVL5+): If your system has multiple B200 GPUs (e.g., 8x B200),
you must install and start NVIDIA FabricManager, NVLink Subnet Manager (nvlsm), and
InfiniBand diagnostics to enable CUDA compute. Without these, nvidia-smi will work but
all CUDA operations will fail with error 802. Install with:
$ sudo apt-get install -y nvidia-fabricmanager-<driver-branch> nvlsm infiniband-diags
$ sudo modprobe ib_umad
$ echo "ib_umad" | sudo tee /etc/modules-load.d/ib_umad.conf
$ sudo systemctl enable --now nvidia-fabricmanager
Replace <driver-branch> with your driver branch (e.g., 570, 580). The FabricManager
package version must match your installed driver version. See CUDA Error 802 “system not yet initialized” on B200 Multi-GPU Systems in
Troubleshooting for details.
Install NVIDIA Container Toolkit
If any of the steps below fails, follow the official NVIDIA Container Toolkit docs instead.
Configure the production repository:
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Optionally, configure the repository to use experimental packages:
$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
Update the package list from the repository and install the NVIDIA Container Toolkit:
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
Configure docker with NVIDIA Container Toolkit
Run the following commands:
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
If all goes well, you should be able to start a Docker container and run nvidia-smi inside it to see information
about your GPU from within the container. An example is shown below, but keep in mind that the numbers will vary for
your hardware (ensure the reported CUDA version is >=12.8 and <13.0; 12.9 is recommended).
$ sudo docker run --rm --gpus all ubuntu nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI <version> Driver Version: <version> CUDA Version: 12.9 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:01:00.0 Off | 0 |
| 0% 33C P8 18W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
NGC Personal Key#
Set up your NGC Personal Key if you have not done so already.
Go to the NGC personal key setup page of the NGC website and select Generate Personal Key.
When prompted with the Generate Personal Key form, choose your key Name and Expiration,
then select all services for Services Included.
You will then receive your Personal Key; make sure to save it somewhere safe.
Export the API key#
Export the API key generated in the previous step as the NGC_API_KEY environment variable to run the A2F-3D NIM:
$ export NGC_API_KEY=<value>
To make the key available at startup, run the following command if you are using bash. Make sure you replace <value>
with the actual API key.
$ echo "export NGC_API_KEY=<value>" >> ~/.bashrc
Docker Login to NGC#
To pull the NIM container image, you need to log in to the nvcr.io docker registry. The username is $oauthtoken
and the password is the API key generated earlier and stored in NGC_API_KEY. Run the following command
to log in:
$ echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Login Succeeded
Launching the Audio2Face-3D NIM#
There are two quick ways to start the Audio2Face-3D NIM:
Use a pre-generated TRT engine for supported GPUs
Generate a TRT engine for your NVIDIA GPU
Pre-generated TRT engine#
The Audio2Face-3D NIM supports pre-generated engines for the following GPUs:
A10G
A30
L4
L40S
RTX 4090
RTX 5080
RTX 5090
RTX 6000 Ada
RTX PRO 6000 Blackwell
B200
GPUs not listed above may use fallback profiles where available: for example, RTX 30 series maps to the A10G
profile. GPUs with no mapping (such as A100 or H100) have no pre-generated profiles; to run on these GPUs,
set NIM_DISABLE_MODEL_DOWNLOAD=true to generate TRT engines locally.
Note
The set of available pre-generated profiles evolves over time. For Blackwell-class GPUs (for example RTX PRO 6000
Blackwell or B200 in newer releases), prefer auto profile selection (or run nim_list_model_profiles) to see what
profiles are available for your exact hardware.
To list available model profiles:
$ docker run -it --rm --network=host --gpus all \
--entrypoint nim_list_model_profiles nvcr.io/nim/nvidia/audio2face-3d:2.0
The output lists each profile with a 64-character hash ID followed by tags. The key fields to look for in each profile line are:
- character: the A2F model (james, claire, mark, or multi)
- gpu: the target GPU (RTX4090, RTX5090, B200, etc.)
- precision: fp16 or fp32
- model_type: tensorrt
- batch_size: max concurrent streams supported
For example, in a profile line like batch_size:100|character:james|gpu:RTX5090|model_type:tensorrt|precision:fp16, the relevant
info is: the james model, optimized for RTX 5090, fp16 precision, up to 100 concurrent streams.
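The pipe-separated tag string is easy to pick apart in a script. A minimal sketch, using a hypothetical sample line rather than live container output:

```shell
# Hypothetical profile line in the format printed by nim_list_model_profiles:
profile='batch_size:100|character:james|gpu:RTX5090|model_type:tensorrt|precision:fp16'

# Split on '|' and extract individual fields by name:
character=$(echo "$profile" | tr '|' '\n' | awk -F: '$1 == "character" {print $2}')
gpu=$(echo "$profile" | tr '|' '\n' | awk -F: '$1 == "gpu" {print $2}')
echo "character=$character gpu=$gpu"
```

In practice you would pipe the output of the nim_list_model_profiles command through the same filters.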
Tip
The hash IDs are opaque identifiers – you rarely need to copy them manually. In most cases, auto profile selection (shown next) picks the right profile for your GPU automatically. Use the GPU Type and Character columns in the profile table below if you need to identify a specific profile.
To launch the Audio2Face-3D NIM with auto profile selection:
$ docker run -it --rm --network=host --gpus all \
-e NGC_API_KEY=$NGC_API_KEY nvcr.io/nim/nvidia/audio2face-3d:2.0
Check the Manual profile selection section to set the profile manually and deploy.
Manual profile selection
Supported GPU and character profile combinations can be found in the table below. Each profile is pre-generated for a specific GPU and character model (claire, james, mark, or multi for diffusion):
| GPU Type | Character | Profile ID |
|---|---|---|
| A10G | claire | 7ab78afe1d3f160d49e22617ce5edb96dfb1eb2ca93b16ca1d0f5148a922801a |
| A10G | james | 919638a5b89d3c1b1292be21c62bb8bbabbb93ac63b03d2e809a001041375700 |
| A10G | mark | 535fbb2bf408160cce3645b2b85eddc8c71378ebcfbda66da38ffb252d64a426 |
| A10G | multi | f4c212c297315b9ab8462dd5da103f676a9a735a788010439d8171a81e303559 |
| A30 | claire | ea52901e7ab58809db52363a8683c5ca05808cb249dd5c89863c1d576887fe91 |
| A30 | james | cf58fe7973f49aa546cdea7488a6dc1767b3776f2787569af99cd2bc136b6756 |
| A30 | mark | 9c83ef85849071b74f09bcd793659632cc6d4e2055ca33dc4d572f1b63d7e3dd |
| A30 | multi | b328cd1c26b09c1ba69b0e65e1d44e6d742f5c381049a8bc6be448923efbc677 |
| B200 | claire | 62f390868b91d5142c9ff67cabd55993f464aab58a6d267f3f62a639a6287fb3 |
| B200 | james | f3ed24948f366869149a15925d40e2a1da3cfa1ae26d570f361fb5d451aa90e3 |
| B200 | mark | c2f92c28e8211892bc799871b59e5d92672b2c93322c421451412309067179aa |
| B200 | multi | 2ef93055df06c23efe4780e7c225c383b07173e7a55ecfb8ddbf776bd6178f04 |
| L4 | claire | abeb4547dc5ddb096074adb70d8120aa0dc4e87dd99fe951fffb4e53336aa369 |
| L4 | james | e46d5580bc1b0160bf55e5f790e008cc31acbcb78da3a2efba414cb41ea53044 |
| L4 | mark | 6d139e6bf3b85347c5412996cccb86a60c55532c20793d95d9deac6f63402b7f |
| L4 | multi | b99e75b7475c597ebf6cdd6ff3383e60bf16b54c6aefa4647c675fd7edf492e2 |
| L40S | claire | fdec053afe6da669a5a82e2c896f7af1e9bca0f562e0576e89d2baaa69ce11fd |
| L40S | james | 9f69b475038f0ea376f63c2b6e460ab41431b2aca48e77b43a3c468b4b67302f |
| L40S | mark | 059865f2fd53f8b0e1d5803532e96089d168f58b5de9cfe7b6a0d88604216b7e |
| L40S | multi | add44b555e5328bdb108b2b28df78d346ede048d642796622a63e1d615e50b4f |
| RTX4090 | claire | c021f3ca049d620f84393cc2e8b1748439a849f4e4813e80343b46f819042f7d |
| RTX4090 | james | c39d6cc51c706dcf9cb07ceecc80bf8641c4b2c08fe8588e0d6d875b464c6295 |
| RTX4090 | mark | 4a13a78c606a49ddf4c0aeac66488897a0c392286bc98dde8c5097ce378b6835 |
| RTX4090 | multi | dee1ac011ab648c0e10c9e92b5fec24f6ca665d9a76f69daa4476260a0e4d453 |
| RTX5080 | claire | ed88e4a416109eb3ff36e01b34092149d989379ff3786b7df45e45806edba899 |
| RTX5080 | james | 6f2226bfcd979463e7081b7f4dfea0b97c4dba1c8fe9911e7911863a0156196a |
| RTX5080 | mark | c31d23e3fe98c7c947c10fdab96e0540ca2b6a44de039c6ce822f08bed2b797d |
| RTX5080 | multi | 3e538490d00d413977d090a78c012148009171eabd8bcf0d964a0aea29040a9a |
| RTX5090 | claire | 4e1be9a8b348517e3a914f53b33bb409b45315b01897014410903017b3fad9be |
| RTX5090 | james | 5c09e7a77d93637d0e69762ec6e9a574e478a47c0c7d72f6f6887aa5c952afeb |
| RTX5090 | mark | 829f14e8dfe36e55d7e5bf1a245325f14b18efbc6791cb11e1beccbd6cfdf3b9 |
| RTX5090 | multi | b7f47d1ba26445410947583427d9f717609e24241264015c9640371339b60e03 |
| RTX6000 Ada | claire | 325ea37a77d7986fdd6d25ea43223a3172f0267036c4a8492a6e4d6e9efc4952 |
| RTX6000 Ada | james | df18171ab6821b765ec1bd06be0d7a8027e3ac49813e77fc6f6e3e717c839ebe |
| RTX6000 Ada | mark | 2afb78efad674069bf4e88860def85685c41cfab0ea6eba3447798cd77df8a80 |
| RTX6000 Ada | multi | b027596e3621e34702c8244cc473809cf12cdf7d136775bbaeba023ad7e54a4f |
| RTX PRO 6000 Blackwell | claire | 2817c04ed84ce76e4793a4498b88e0e63a7cb8e5efcf1f465940e4908e75113c |
| RTX PRO 6000 Blackwell | james | a45d8afe9c7e624d103f691e0a30a138dd9bba4cbb47c168d2a649c695fa7357 |
| RTX PRO 6000 Blackwell | mark | 539bba5c5e2e7410d1369bbccf7335d54b814276fd9d8c16c95ab65490ca4def |
| RTX PRO 6000 Blackwell | multi | 69fc655ff6a1a52b73969e980bd2b72f059e4b144cdf3f19aa99ea97524e0874 |
Run the following commands, replacing <manifest_profile_id> with the value from the table above that corresponds to your GPU and desired character:
$ export NIM_MANIFEST_PROFILE=<manifest_profile_id>
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
-e NGC_API_KEY=$NGC_API_KEY \
-e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
nvcr.io/nim/nvidia/audio2face-3d:2.0
Warning
You must select a profile that matches your GPU. Each profile contains TensorRT engines
compiled for a specific GPU architecture. Using a profile built for a different GPU (e.g.,
an L40S profile on an RTX 5090) will download TRT engines that are incompatible with your
hardware and the NIM will fail at runtime. If unsure which GPU you have, run
nvidia-smi --query-gpu=name --format=csv,noheader and match the result to the GPU Type
column in the table above.
Exception: Some GPUs within the same architecture can share profiles (e.g., RTX 30
series uses the A10G profile). Auto profile selection handles these mappings correctly,
but manually providing a NIM_MANIFEST_PROFILE for the wrong GPU will bypass this
logic and may cause runtime failures. For the full list of GPU-to-profile mappings, see
GPU Device ID Mismatch Warning (nim_list_model_profiles).
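If you script this check, the nvidia-smi device name can be matched against the GPU Type column with a simple case statement. A sketch with a few illustrative mappings only (not the full mapping; the RTX 30 fallback is taken from the text above):

```shell
# Illustrative only: map the nvidia-smi device name to a GPU Type from the
# profile table. In practice use:
#   gpu_name=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n1)
gpu_name="NVIDIA L40S"

case "$gpu_name" in
  *A10G*)       gpu_type="A10G" ;;
  *L40S*)       gpu_type="L40S" ;;
  *"RTX 5090"*) gpu_type="RTX5090" ;;
  *"RTX 30"*)   gpu_type="A10G" ;;  # RTX 30 series falls back to the A10G profile
  *)            gpu_type="" ;;      # no pre-generated profile; generate engines locally
esac
echo "GPU type for profile lookup: ${gpu_type:-none}"
```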
When the Audio2Face-3D NIM is deployed, it will use the james_v2.3.1 model by default,
as shown in the logs below.
[info] Using A2F model: james_v2.3.1
Choosing a Different Pre-configured Model#
Auto profile selection always downloads the james TRT engine. To use a different model
(claire, mark, or multi / diffusion), set NIM_MANIFEST_PROFILE to the profile hash
that matches your GPU and character from the profile table above.
For example, to launch with the mark model on an L40S GPU:
$ docker run -it --rm --network=host --gpus all \
-e NGC_API_KEY=$NGC_API_KEY \
-e NIM_MANIFEST_PROFILE=059865f2fd53f8b0e1d5803532e96089d168f58b5de9cfe7b6a0d88604216b7e \
nvcr.io/nim/nvidia/audio2face-3d:2.0
The container will download the TRT engine for the specified profile and automatically configure the service to use the matching character model.
Note
You can also set PERF_A2F_MODEL alongside NIM_MANIFEST_PROFILE if you need tongue animation
enabled or want to force a specific model variant. The available values are:
- james_v2.3.1 (default)
- claire_v2.3.1
- mark_v2.3
- multi_v3.2_james
- multi_v3.2_claire
- multi_v3.2_mark
However, PERF_A2F_MODEL alone (without NIM_MANIFEST_PROFILE) will not change which
TRT engine is downloaded — auto selection always picks james. Always pair it with the matching
profile, or use NIM_DISABLE_MODEL_DOWNLOAD=true (see below).
Important
If you are using your own stylization configuration files
(e.g., claire_stylization_config.yaml, mark_stylization_config.yaml),
as detailed in the Stylization Configuration Files section of
the Audio2Face-3D NIM Container Deployment and Configuration Guide document,
you should not set PERF_A2F_MODEL. Setting it loads a pre-set configuration that can
override your custom stylization files.
Similarly, setting NIM_MANIFEST_PROFILE will update the stylization configuration to match
the profile’s character. For full control over both model selection and stylization parameters,
use the custom entrypoint workflow described in Audio2Face-3D NIM Container Deployment and Configuration Guide.
Generating TRT Engines Locally#
If your GPU does not have a pre-generated profile for the character you need, or if you want to generate optimized TRT engines on your exact hardware, you can let the container build them locally. This is also useful for GPUs without any pre-generated profiles (such as A100 or H100).
The simplest approach uses NIM_DISABLE_MODEL_DOWNLOAD=true with PERF_A2F_MODEL. The container
will automatically generate the required TRT engines on first launch (this takes a few minutes):
$ mkdir -p ~/.cache/audio2face-3d
$ chmod 755 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e PERF_A2F_MODEL='mark_v2.3' \
-v "$LOCAL_NIM_CACHE:/tmp/a2x" \
nvcr.io/nim/nvidia/audio2face-3d:2.0
On the first run, the container runs generate_trt_models.py internally to build
mark_v2.3.trt and a2e.trt into /tmp/a2x. Because the directory is mounted as a volume,
subsequent launches reuse the cached engines and start immediately.
Replace mark_v2.3 with any supported model: james_v2.3.1, claire_v2.3.1, mark_v2.3,
multi_v3.2_james, multi_v3.2_claire, or multi_v3.2_mark.
Warning
Default model fallback behavior: If an invalid, misspelled, or unrecognized model name
is provided (e.g., james_2.3 instead of james_v2.3.1), the container logs an error
but does not stop. It silently falls back to the default model james_v2.3.1 and
starts the inference server normally. This applies regardless of how the model is specified
(PERF_A2F_MODEL, NIM_MANIFEST_PROFILE, or stylization config).
Always verify the active model in the startup logs:
[IProc] [info] Using A2F TRT engine for model 'mark_v2.3': /tmp/a2x/mark_v2.3.trt
If you see james_v2.3.1 here but expected a different model, check that your model
name exactly matches one of the supported names listed above.
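If you automate this verification, the active model name can be pulled out of that log line with sed. A sketch against the sample line above (in practice, pipe in docker logs output):

```shell
# Sample startup log line, copied from the docs above:
log_line="[IProc] [info] Using A2F TRT engine for model 'mark_v2.3': /tmp/a2x/mark_v2.3.trt"

# Extract the single-quoted model name; in practice:
#   docker logs audio2face-3d 2>&1 | grep "Using A2F TRT engine" | ...
model=$(echo "$log_line" | sed -n "s/.*model '\([^']*\)'.*/\1/p")
echo "active model: $model"
```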
Note
The container runs as UID 1000. On most single-user Linux systems your user is also
UID 1000 (verify with id -u), so chmod 755 is sufficient. If your UID differs,
grant ownership with sudo chown 1000:1000 ~/.cache/audio2face-3d, or for quick
prototyping use chmod 777.
TRT engines are GPU-specific and model-specific. Because NIM_DISABLE_MODEL_DOWNLOAD=true
prevents downloading fresh engines, the container relies entirely on what is in the cache. If
the cache contains engines for a different model (e.g., mark_v2.3.trt when you request
multi_v3.2_claire), the container will fail with:
Required TRT engine not found: multi_v3.2.trt. To switch models, either delete the cached
engines first (rm $LOCAL_NIM_CACHE/*.trt) or omit the -v volume mount entirely to
generate engines ephemerally without caching.
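Before relaunching with NIM_DISABLE_MODEL_DOWNLOAD=true, you can check whether the cache already holds the engine your requested model needs. A sketch; the shared multi_v3.2.trt naming is inferred from the error message above, so treat the mapping as an assumption:

```shell
LOCAL_NIM_CACHE="${LOCAL_NIM_CACHE:-$HOME/.cache/audio2face-3d}"
MODEL="multi_v3.2_claire"

# Assumption from the error message above: multi_* variants share one
# diffusion engine file, multi_v3.2.trt; other models use <model_name>.trt.
case "$MODEL" in
  multi_*) ENGINE="multi_v3.2.trt" ;;
  *)       ENGINE="$MODEL.trt" ;;
esac

if [ -f "$LOCAL_NIM_CACHE/$ENGINE" ]; then
  echo "cached: $ENGINE"
else
  echo "missing: $ENGINE (it will be generated on first launch)"
fi
```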
Manual TRT generation (advanced)
For full control, you can enter the container, generate engines manually, and then start the service. This is useful if you want to combine TRT generation with custom stylization configs.
Launch the container with a custom entrypoint:
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
--entrypoint /bin/bash -w /opt/nvidia/a2f_pipeline \
-e NIM_SKIP_A2F_START=true \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-v "$LOCAL_NIM_CACHE:/tmp/a2x" \
nvcr.io/nim/nvidia/audio2face-3d:2.0
Inside the container, generate TRT engines for your desired model:
$ ./service/generate_trt_models.py \
--stylization-config /opt/sample_configs/mark_stylization_config.yaml
Start the service. You have two options:
Option A – Full NIM server (recommended): Starts the complete NIM stack including the HTTP API on port 8000 (health checks, metrics) and the gRPC server on port 52000:
$ /opt/nim/start_server.sh
Option B – A2F pipeline only: Starts just the gRPC inference server on port 52000
with the same stylization config. This is useful for quick testing or when you need to pass
custom --deployment-config or --advanced-config flags:
$ a2f_pipeline.run \
--stylization-config /opt/sample_configs/mark_stylization_config.yaml
The generated engines are persisted in $LOCAL_NIM_CACHE via the volume mount, so
subsequent launches (with the standard docker run command) will reuse them.
The built-in per-character stylization configs are located at /opt/sample_configs/ inside the
container (claire_stylization_config.yaml, james_stylization_config.yaml,
mark_stylization_config.yaml, and their diffusion variants). The active default config is at
/apps/configs/stylization_config.yaml (james). You can also mount your own configs; see
Audio2Face-3D NIM Container Deployment and Configuration Guide for details.
Docker flags explained
Each flag used in the docker commands above is explained in this table:
| Flag | Description |
|---|---|
| -it | Start the container in interactive mode with a terminal attached. |
| --rm | Delete the container after it stops (see Docker docs). |
| --name audio2face-3d | Give a name to the NIM container. Use any preferred value. |
| --gpus all | Expose all NVIDIA GPUs inside the container. See the configuration page for mounting specific GPUs. |
| --network=host | Connect container to host machine network. (see Docker docs) |
| -e NGC_API_KEY=$NGC_API_KEY | Add the NGC_API_KEY environment variable to the container, used to authenticate downloads from NGC. |
| -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE | Set the model profile to deploy when selecting a profile manually. |
Checking Service Health#
The NIM container exposes HTTP endpoints on port 8000 that you can query directly
with curl — no additional software is required:
$ curl -s http://localhost:8000/v1/health/ready
{"object":"health.response","message":"ready","status":"ready"}
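In a deployment script you may want to poll this endpoint until the service reports ready. A minimal sketch that extracts the status field from the JSON response, stubbed here with the sample response above (in practice, assign resp from the curl call):

```shell
# Stubbed response; in practice:
#   resp=$(curl -s http://localhost:8000/v1/health/ready)
resp='{"object":"health.response","message":"ready","status":"ready"}'

# Pull out the "status" value without requiring jq:
status=$(echo "$resp" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p')
if [ "$status" = "ready" ]; then
  echo "service is ready"
fi
```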
| Endpoint | Description |
|---|---|
| /v1/health/ready | Readiness probe. Returns HTTP 200 when the service is ready to handle inference requests. |
| /v1/health/live | Liveness probe. Returns HTTP 200 when the process is running (even if not yet ready). |
| /v1/metrics | Prometheus-format metrics for the NIM process (GPU utilization, memory, power, HTTP request count and latency). |
| /v1/metadata | Returns the NIM release and API version. |
Note
These HTTP endpoints are available alongside the primary gRPC API on port 52000. For gRPC-based health checking, see the Health Check gRPC section or use the Python sample application shown below.
The /v1/metrics endpoint on port 8000 exposes NIM-level metrics (GPU, process, HTTP
requests). For A2F application-level metrics (streams_in_use, streams_available),
enable telemetry in the deployment configuration and scrape port 9464 (the default
prometheus_endpoint). See Observability for details.
Running Inference#
Audio2Face-3D uses a gRPC API. You can quickly try out the API by using the A2F-3D Python interaction application. Follow the instructions below to set it up:
$ git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
$ cd Audio2Face-3D-Samples
$ git checkout tags/v2.0
$ cd scripts/audio2face_3d_microservices_interaction_app
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
$ pip3 install -r requirements.txt
Note
Audio2Face-3D NIM v2.0 continues to use v1.2.0 of the nvidia_ace gRPC python module.
To check if the service is ready to handle inference requests:
$ python3 a2f_3d.py health_check --url 0.0.0.0:52000
To run inference on one of the example audios:
$ python3 a2f_3d.py run_inference ../../example_audio/Mark_neutral.wav config/config_james.yml -u 0.0.0.0:52000
This command will print out where the results are saved, in a log similar to:
Input audio header info:
Sample rate: 16000 Hz
Bit depth: 16 bits
Channels: 1
Receiving data from server...
.............................
Status code: SUCCESS
Received status message with value: 'sent all data'
Saving data into output_000001 folder...
You can then explore the A2F-3D NIM output animations by running the command below and replacing <output_folder> with
the name of the folder printed by the run inference command.
$ ls -l <output_folder>/
-rw-rw-r-- 1 user user 328 Nov 14 15:46 a2f_3d_input_emotions.csv
-rw-rw-r-- 1 user user 65185 Nov 14 15:46 a2f_3d_smoothed_emotion_output.csv
-rw-rw-r-- 1 user user 291257 Nov 14 15:46 animation_frames.csv
-rw-rw-r-- 1 user user 406444 Nov 14 15:46 out.wav
out.wav: contains the audio received
animation_frames.csv: contains the blendshapes
a2f_3d_input_emotions.csv: contains the emotions provided as input in the gRPC protocol
a2f_3d_smoothed_emotion_output.csv: contains emotions smoothed over time
Note
The maximum length of a single audio buffer sent over gRPC is 10 seconds.
The maximum length of an audio clip to process is 300 seconds.
This information can be found in Audio2Face-3D NIM Container Deployment and Configuration Guide under the Advanced Configuration File section.
Model Caching#
When running the first time, the Audio2Face-3D NIM will download the model from NGC. You can cache this model locally
by using a Docker volume mount. Follow the example below and set the LOCAL_NIM_CACHE environment variable to the
desired local path. The container runs as UID 1000 by default, so the cache directory must be readable and writable
by that user.
$ mkdir -p ~/.cache/audio2face-3d
$ chmod 755 ~/.cache/audio2face-3d
$ export LOCAL_NIM_CACHE=~/.cache/audio2face-3d
Note
The container runs as UID 1000. On most single-user Linux systems your user is also
UID 1000 (verify with id -u), so chmod 755 is sufficient. If your UID differs,
grant ownership with sudo chown 1000:1000 ~/.cache/audio2face-3d, or for quick
prototyping use chmod 777.
Then simply run the Audio2Face-3D NIM and mount the folder into the Docker container at /tmp/a2x.
The models will be downloaded and stored in LOCAL_NIM_CACHE.
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
-e NGC_API_KEY=$NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/tmp/a2x" \
nvcr.io/nim/nvidia/audio2face-3d:2.0
Once the models have been stored locally, you can run the Audio2Face-3D NIM without re-downloading them
by setting the NIM_DISABLE_MODEL_DOWNLOAD flag:
$ docker run -it --rm --name audio2face-3d \
--gpus all \
--network=host \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-v $LOCAL_NIM_CACHE:/tmp/a2x \
nvcr.io/nim/nvidia/audio2face-3d:2.0
Stopping the container#
You can stop and remove the running container by passing its name to the docker stop and docker rm commands:
$ docker stop audio2face-3d
$ docker rm audio2face-3d