Performance#
Audio2Face-3D NIM offers real-time performance at 30 FPS for converting audio into facial animation when run on an optimal hardware configuration. The microservice supports concurrent input streams, which may affect this performance.
For more information about optimal hardware configuration, check the Support Matrix page.
Performance evaluation#
You can use the provided sample application to get performance information about your current deployment of Audio2Face-3D.
Warning
The NIM container must be running and healthy before running the performance test. If the
container is not running, the script fails with a long traceback ending in
grpc.aio._call.AioRpcError: Connection refused. Verify readiness first:
$ curl -s http://localhost:8000/v1/health/ready
# Clone the repository
git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
# Navigate to the scripts directory
cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app
# Checkout v2.0 tag
git checkout tags/v2.0
Follow the setup instructions in the README.md. The Python3 script for performance evaluation is nim_performance_test.py.
Run the script with `--help` to see its usage:
$ python3 nim_performance_test.py --help
usage: nim_performance_test.py [-h] --request-nb REQUEST_NB --max-stream-nb MAX_STREAM_NB --url URL
Call Audio2Face-3D NIM with different audiofiles. Outputs NIM performance in a csv file in output/ folder. Uses a2f_3d.py to create multiple clients to connect to A2F-3D NIM.
options:
-h, --help show this help message and exit
--request-nb REQUEST_NB
Number of requests to simulate for each audio file
--max-stream-nb MAX_STREAM_NB
Maximum number of A2F-3D streams
--url URL IP of the Audio2Face-3D NIM
For example:
$ python3 nim_performance_test.py --request-nb 100 --max-stream-nb 10 --url 127.0.0.1:52000
The script requires three parameters: the number of requests to simulate per audio file (`--request-nb`), the maximum number of concurrent streams connecting to Audio2Face-3D (`--max-stream-nb`), and the address of the A2F-3D NIM (`--url`). The number of connecting streams has to match the `common.stream_number` option used to start the A2F-3D NIM. For the quick start deployment, use `127.0.0.1:52000` as the URL. The script uses six provided audio files covering durations of 5, 10, and 20 seconds at 16 kHz and 44.1 kHz sample rates.
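As an illustrative sketch of the benchmark matrix the script sweeps (the exact file names and iteration order inside nim_performance_test.py may differ), the six audio configurations come from combining three durations with two sample rates:

```python
from itertools import product

# Illustrative sketch of the benchmark matrix: 3 durations x 2 sample
# rates = 6 audio configurations, each exercised with --request-nb
# simulated requests. The structure below is an assumption for
# illustration, not the script's actual internals.
durations_s = [5, 10, 20]
sample_rates_hz = [16_000, 44_100]
request_nb = 100  # value passed via --request-nb

test_matrix = [
    {"duration_s": d, "sample_rate_hz": sr, "requests": request_nb}
    for d, sr in product(durations_s, sample_rates_hz)
]

print(len(test_matrix))                          # 6 audio configurations
print(sum(c["requests"] for c in test_matrix))   # 600 requests in total
```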
Modifying deployment configurations for benchmarking#
Performance results are primarily influenced by two key factors: the model and the number of max streams set when deploying the Audio2Face-3D NIM. When measuring performance against your deployment, you can modify the model and the number of max streams in two ways.
Modify Configuration Files:
You can update the configuration files used for deployment by modifying the following:
- Update the `common.stream_number` value in the deployment_config.yaml file.
- Update the `a2f.regression_model.inference_model_id` (or the `a2f.diffusion_model` settings) in the stylization_config.yaml file.
This approach is documented in more detail at Audio2Face-3D NIM Container Deployment and Configuration Guide.
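As an illustrative sketch (the exact file layout and available keys are defined in the deployment guide linked above), the relevant fields look roughly like this:

```yaml
# deployment_config.yaml (fragment, illustrative)
common:
  stream_number: 10   # must match --max-stream-nb used by the benchmark script

# stylization_config.yaml (fragment, illustrative)
a2f:
  regression_model:
    inference_model_id: mark_v2.3
```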
Use Environment Variables:
Alternatively, you can modify the model and max streams by providing the `PERF_MAX_STREAM` and `PERF_A2F_MODEL` environment variables when deploying the Audio2Face-3D NIM. For example:
docker run -it --rm --gpus all --network=host \
    -e NGC_API_KEY=$NGC_API_KEY \
    -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
    -e PERF_MAX_STREAM=26 \
    -e PERF_A2F_MODEL='mark_v2.3' \
    nvcr.io/nim/nvidia/audio2face-3d:2.0
The `PERF_A2F_MODEL` environment variable can take the values `mark_v2.3`, `claire_v2.3.1`, or `james_v2.3.1`.
Note
The PERF_A2F_MODEL environment variable loads the configuration file for the selected model and enables tongue animation.
Results#
The script creates an output folder containing four files. Explore the results by running the following command, replacing <output_folder> with the folder name printed by the nim_performance_test.py script.
$ ls -l <output_folder>/
-rw-rw-r-- 1 user user 36 Nov 14 15:52 fps_stream_10_request_100.txt
-rw-rw-r-- 1 user user 203 Nov 14 15:52 latency_stream_10_request_100.txt
-rw-rw-r-- 1 user user 2908 Nov 14 15:52 stream_10_request_100.csv
-rw-rw-r-- 1 user user 140946 Nov 14 15:52 stream_10_request_100.png
fps_stream_10_request_100.txt: Lists the 1st percentile FPS for all measurements. Values below 30 FPS indicate potential performance bottlenecks for the current deployment.
Example output:
percentile1 FPS: 78.05 above 30: True
latency_stream_10_request_100.txt: Summarizes latency results across all measurements. Average latency above 100 ms might affect responsiveness in real-time applications.
Example output:
Worst case scenario: 67.56 ms
99% of requests below: 67.56 ms
Average scenario: 42.00 ms
Average below 100 ms: True
stream_10_request_100.csv: Contains detailed statistics about latency and FPS for each audio configuration:
Stream configuration (max streams, requests, audio length, sample rate)
First packet latency statistics (min, max, mean, percentiles)
FPS performance metrics (min, max, mean, percentiles)
Number of successful requests
stream_10_request_100.png: Provides visual representation of performance metrics:
Upper panel: Latency metrics across different audio configurations
Lower panel: FPS metrics across different test scenarios
Color-coded bars for easy metric comparison
Includes annotations for precise value readings
Performance benchmark#
The following table lists the most recently measured max streams each GPU supports on Ubuntu 24.04.
Note
The stream numbers below ensure that 99% of the FPS measurements are above 30. We use the 1st percentile FPS (P1) to measure performance, which represents worst-case behavior more accurately than mean FPS.
Ubuntu 24.04#
| GPU | Max Streams (Regression) | Max Streams (Diffusion) |
|---|---|---|
| A10G | 35 | 32 |
| A30 | 55 | 41 |
| L4 | 30 | 16 |
| L40S | 80 | 60 |
| RTX 4090 | 80 | 35 |
| RTX 5080 | 75 | 22 |
| RTX 5090 | 100 | 52 |
| RTX 6000 Ada | 70 | 50 |
| RTX PRO 6000 Blackwell | 120 | 130 |
| B200 | 255 | 144 |
The server configurations below were used to benchmark the above stream numbers:
| GPU | GPU SKU | CPU | RAM | NIC Bandwidth |
|---|---|---|---|---|
| A10G | PG133 SKU 210 | 1x AMD 9124 16C 3.0GHz (3.75GHz Boost) | 256GB DDR5 | 25 Gbps |
| A30 | N/A | 2x AMD EPYC 7742 64C 2.25GHz (3.4GHz Boost) | 512GB DDR4 | 25 Gbps |
| L4 | N/A | 1x Intel Xeon Gold 6330 28C 2.0GHz (3.1GHz Boost) | 256GB DDR4 | 25 Gbps |
| L40S | N/A | 2x Intel Sapphire Rapids 8480+ 56C 2.0GHz (3.8GHz Boost) | 1024GB DDR5 | 25 Gbps |
| RTX 4090 | PG136 SKU 530 | 1x AMD RYZEN 3700X 8C 3.6GHz | 128GB DDR4 | 25 Gbps |
| RTX 5080 | N/A | 1x AMD RYZEN 9 9900X 12C 4.4GHz (5.6GHz Boost) | 128GB DDR5 | 25 Gbps |
| RTX 5090 | N/A | 1x AMD RYZEN 9 9950X 16C 4.3GHz (5.7GHz Boost) | 192GB DDR5 | 25 Gbps |
| RTX 6000 Ada | PG133 SKU 510 | 1x AMD RYZEN 3700X 8C 3.6GHz | 128GB DDR4 | 25 Gbps |
| RTX PRO 6000 Blackwell | N/A | 1x AMD EPYC 9654 96C 2.4GHz (3.7GHz Boost) | 512GB DDR5 | 100 Gbps |
| B200 | N/A | 2x Intel Xeon Platinum 8480+ 56C 2.0GHz (3.8GHz Boost) | 2048GB DDR5 | 100 Gbps |
Performance troubleshooting#
If you see performance degradation below 30 FPS, try the following:
- Lower the number of concurrent connections: change the `common.stream_number` configuration option and restart the service. The stream number must be lower than the TRT engine max shape in the A2F and A2E advanced configs.
- Verify the GPU solver is enabled: ensure `a2f.use_gpu_solver: true` in advanced_config.yaml (this is the default). The GPU solver keeps blendshape solving on the GPU, improving throughput by avoiding CPU-GPU data transfers.
If you are sharing resources with another GPU-intensive application, see Sharing Audio2Face-3D Compute Resources.