Performance#
Audio2Face-3D NIM offers real-time performance at 30 FPS for converting audio into facial animation when run on an optimal hardware configuration. The microservice supports concurrent input streams, which can affect this performance.
For more information about optimal hardware configuration, check the Support Matrix page.
Performance evaluation#
You can use the provided sample application to get performance information about your current deployment of Audio2Face-3D.
# Clone the repository
git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
# Navigate to the scripts directory
cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app
Follow the setup instructions in the README.md. The Python3 script for performance evaluation is nim_performance_test.py.
The script is used as follows:
$ python3 nim_performance_test.py --help
usage: nim_performance_test.py [-h] --request-nb REQUEST_NB --max-stream-nb MAX_STREAM_NB --url URL
Call Audio2Face-3D NIM with different audiofiles. Outputs NIM performance in a csv file in output/ folder. Uses a2f_3d.py to create multiple clients to connect to A2F-3D NIM.
options:
  -h, --help            show this help message and exit
  --request-nb REQUEST_NB
                        Number of requests to simulate for each audio file
  --max-stream-nb MAX_STREAM_NB
                        Maximum number of A2F-3D streams
  --url URL             IP of the Audio2Face-3D NIM
For example,
$ python3 nim_performance_test.py --request-nb 100 --max-stream-nb 10 --url 127.0.0.1:52000
The script requires two parameters: the number of requests to simulate and the number of concurrent streams connecting to Audio2Face-3D. The number of connecting streams must match the common.stream_number option used to start the A2F-3D NIM. Additionally, the script accepts a --url parameter for the address of the A2F-3D NIM; for a quick start deployment, use 127.0.0.1:52000. The script uses 6 provided audio files with durations of 5, 10, and 20 seconds at 16 kHz and 44.1 kHz sample rates.
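As a rough sanity check of the example invocation above, you can estimate the total request count and the number of animation frames each test clip produces at the 30 FPS target. This is a back-of-envelope sketch, assuming the script sends --request-nb requests for each of the 6 bundled audio files:

```python
# Sizing estimate for the example run (illustrative only).
FPS = 30            # real-time target of Audio2Face-3D
request_nb = 100    # --request-nb from the example above
audio_files = 6     # 5/10/20 s clips at 16 kHz and 44.1 kHz

total_requests = request_nb * audio_files
print(f"total requests sent: {total_requests}")

for seconds in (5, 10, 20):
    # Each clip yields duration * FPS frames of facial animation.
    print(f"{seconds}s clip -> {seconds * FPS} animation frames at {FPS} FPS")
```

This helps gauge how long a benchmark run will take before launching it against a deployment.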
Modifying deployment configurations for benchmarking#
Performance results are primarily influenced by two key factors: the model and the maximum number of streams set when deploying the Audio2Face-3D NIM. When measuring the performance of your deployment, you can modify the model and the maximum number of streams in two ways.
Modify Configuration Files:
You can update the configuration files used for deployment by modifying the following:
- Update the common.stream_number value in the deployment_config.yaml file.
- Update the a2f.inference_model_id and a2f.blendshape_id values in the stylization_config.yaml file.
This approach is documented in more detail at A2F-3D NIM Manual Container Deployment and Configuration.
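The full schema of these files is covered in the deployment guide referenced above; a minimal, illustrative excerpt showing only the keys mentioned here (values are examples, not defaults) might look like:

```yaml
# deployment_config.yaml (excerpt) -- illustrative values only
common:
  stream_number: 10            # maximum number of concurrent streams

# stylization_config.yaml (excerpt) -- illustrative values only
a2f:
  inference_model_id: mark_v2.3
  blendshape_id: mark_v2.3     # hypothetical value; use your blendshape model ID
```

Remember that stream_number must match the --max-stream-nb value you pass to the performance test script.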
Use Environment Variables:
Alternatively, you can modify the model and the maximum number of streams by providing the PERF_MAX_STREAM and PERF_A2F_MODEL environment variables when deploying the Audio2Face-3D NIM. For example:
docker run -it --rm --gpus all --network=host \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
  -e PERF_MAX_STREAM=26 \
  -e PERF_A2F_MODEL='mark_v2.3' \
  nvcr.io/nim/nvidia/audio2face-3d:1.2
The PERF_A2F_MODEL environment variable can take the values mark_v2.3, claire_v2.3, or james_v2.3.
Results#
The script produces an output folder containing 4 files. You can explore the results by running the following command, replacing <output_folder> with the name of the folder printed by the nim_performance_test.py script.
$ ls -l <output_folder>/
-rw-rw-r-- 1 user user 36 Nov 14 15:52 fps_stream_10_request_100.txt
-rw-rw-r-- 1 user user 203 Nov 14 15:52 latency_stream_10_request_100.txt
-rw-rw-r-- 1 user user 2908 Nov 14 15:52 stream_10_request_100.csv
-rw-rw-r-- 1 user user 140946 Nov 14 15:52 stream_10_request_100.png
fps_stream_10_request_100.txt: Lists the 1st percentile FPS for all measurements. Values below 30 FPS indicate potential performance bottlenecks for the current deployment.
Example output:
percentile1 FPS: 78.05 above 30: True
latency_stream_10_request_100.txt: Summarizes latency results across all measurements. Average latency above 100ms might affect responsiveness in real-time applications.
Example output:
Worst case scenario: 67.56 ms
99% of requests below: 67.56 ms
Average scenario: 42.00 ms
Average below 100 ms: True
stream_10_request_100.csv: Contains detailed statistics about latency and FPS for each audio configuration:
- Stream configuration (max streams, requests, audio length, sample rate)
- First packet latency statistics (min, max, mean, percentiles)
- FPS performance metrics (min, max, mean, percentiles)
- Number of successful requests
stream_10_request_100.png: Provides a visual representation of performance metrics:
- Upper panel: Latency metrics across different audio configurations
- Lower panel: FPS metrics across different test scenarios
- Color-coded bars for easy metric comparison
- Annotations for precise value readings
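To clarify what the latency summary reports, the sketch below recomputes the same kind of statistics from a set of per-request first-packet latencies. The values and the nearest-rank percentile method here are illustrative; the actual script's computation may differ:

```python
import statistics

# Hypothetical per-request first-packet latencies in milliseconds;
# real values come from the CSV produced by nim_performance_test.py.
latencies_ms = [38.2, 41.7, 44.9, 39.5, 42.0, 67.6, 40.1, 43.3, 45.0, 41.2]

def percentile(values, pct):
    """Nearest-rank percentile over a sorted copy of the values."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

worst = max(latencies_ms)
p99 = percentile(latencies_ms, 99)
avg = statistics.mean(latencies_ms)

print(f"Worst case scenario: {worst:.2f} ms")
print(f"99% of requests below: {p99:.2f} ms")
print(f"Average scenario: {avg:.2f} ms")
print(f"Average below 100 ms: {avg < 100}")
```

With a small sample like this, the 99th percentile coincides with the worst case, which is why the two lines in the example output above can show the same value.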
Performance benchmark#
The following table lists the most recently measured max streams each GPU supports.
GPU | Max Streams
---|---
A10G | 33
RTX6000 | 51
RTX4090 | 53
L40S | 60
A100 | 95
H100 | 115
The server configurations below were used to benchmark the above stream numbers:
GPU | GPU SKU | CPU | RAM | NIC Bandwidth
---|---|---|---|---
A10G | PG133 SKU 210 | 1x AMD 9124 16C 3.0GHz (3.75GHz Boost) | 4x 64GB (4800 MHz) 256GB | 25 Gbps
RTX6000 | PG133 SKU 510 | 1x AMD RYZEN 3700X 8C 3.6GHz | 4x 32GB (3200MHz) 128GB | 25 Gbps
RTX4090 | PG136 SKU 530 | 1x AMD RYZEN 3700X 8C 3.6GHz | 4x 32GB (3200MHz) 128GB | 25 Gbps
L40S | N/A | 2x Intel Sapphire Rapids 8480+ 56C 2 GHz (3.8 GHz Boost) | 1024 GB DDR5 | 25 Gbps
A100 (SXM4 80GB) | N/A | 2x AMD EPYC 7742 64C 2.25 GHz (3.4 GHz Boost) | 2 TB | 25 Gbps
H100 (HBM3 80GB) | N/A | 2x Intel Xeon Platinum 8480+ 56C 2 GHz (3.8 GHz Boost) | 2 TB | 25 Gbps
Performance troubleshooting#
If you see performance degrade below 30 FPS, try lowering the number of accepted concurrent connections. You can do this by changing the common.stream_number configuration option and restarting the service. The stream number must be lower than the TRT engine max shape in the A2F and A2E advanced configurations.
If you are sharing resources with another GPU-intensive application, see Sharing Audio2Face-3D Compute Resources.