Performance#
Audio2Face-3D NIM offers real-time performance at 30 FPS for converting audio into facial animation when run on an optimal hardware configuration. The microservice supports concurrent input streams, which can affect this performance.
For more information about optimal hardware configuration, check the Support Matrix page.
Performance evaluation#
You can use the provided sample application to get performance information about your current deployment of Audio2Face-3D.
# Clone the repository
git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
# Navigate to the scripts directory
cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app
Follow the setup instructions in the README.md. The Python3 script for performance evaluation is nim_performance_test.py.
The script is used as follows:
$ python3 nim_performance_test.py --help
usage: nim_performance_test.py [-h] --request-nb REQUEST_NB --max-stream-nb MAX_STREAM_NB --url URL
Call Audio2Face-3D NIM with different audiofiles. Outputs NIM performance in a csv file in output/ folder. Uses a2f_3d.py to create multiple clients to connect to A2F-3D NIM.
options:
  -h, --help            show this help message and exit
  --request-nb REQUEST_NB
                        Number of requests to simulate for each audio file
  --max-stream-nb MAX_STREAM_NB
                        Maximum number of A2F-3D streams
  --url URL             IP of the Audio2Face-3D NIM
For example,
$ python3 nim_performance_test.py --request-nb 100 --max-stream-nb 10 --url 127.0.0.1:52000
The script requires two parameters: the number of requests to simulate and the number of concurrent streams connecting to Audio2Face-3D. The number of connecting streams must match the common.stream_number option used to start the A2F-3D NIM. Additionally, the script accepts a --url parameter for the address of the A2F-3D NIM; for a quick start deployment, use 127.0.0.1:52000. The script uses 6 provided audio files with durations of 5, 10, and 20 seconds at 16 kHz and 44.1 kHz sample rates.
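As a rough sanity check of the example invocation above, you can estimate the total request count and the number of animation frames each test clip produces at the 30 FPS target. This is a back-of-envelope sketch, assuming the script sends --request-nb requests for each of the 6 bundled audio files:

```python
# Sizing estimate for the example run (illustrative only).
FPS = 30            # real-time target of Audio2Face-3D
request_nb = 100    # --request-nb from the example above
audio_files = 6     # 5/10/20 s clips at 16 kHz and 44.1 kHz

total_requests = request_nb * audio_files
print(f"total requests sent: {total_requests}")

for seconds in (5, 10, 20):
    # Each clip yields duration * FPS frames of facial animation.
    print(f"{seconds}s clip -> {seconds * FPS} animation frames at {FPS} FPS")
```

This helps gauge how long a benchmark run will take before launching it against a deployment.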
Modifying deployment configurations for benchmarking#
Performance results are primarily influenced by two key factors: the model and the maximum number of streams set when deploying the Audio2Face-3D NIM. When measuring the performance of your deployment, you can modify the model and the maximum number of streams in two ways.
Modify Configuration Files:
You can update the configuration files used for deployment by modifying the following:
- Update the common.stream_number value in the deployment_config.yaml file.
- Update the a2f.inference_model_id and a2f.blendshape_id values in the stylization_config.yaml file.
This approach is documented in more detail at A2F-3D NIM Manual Container Deployment and Configuration.
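The full schema of these files is covered in the deployment guide referenced above; a minimal, illustrative excerpt showing only the keys mentioned here (values are examples, not defaults) might look like:

```yaml
# deployment_config.yaml (excerpt) -- illustrative values only
common:
  stream_number: 10            # maximum number of concurrent streams

# stylization_config.yaml (excerpt) -- illustrative values only
a2f:
  inference_model_id: mark_v2.3
  blendshape_id: mark_v2.3     # hypothetical value; use your blendshape model ID
```

Remember that stream_number must match the --max-stream-nb value you pass to the performance test script.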
Use Environment Variables:
Alternatively, you can modify the model and the maximum number of streams by providing the PERF_MAX_STREAM and PERF_A2F_MODEL environment variables when deploying the Audio2Face-3D NIM. For example:
docker run -it --rm --gpus all --network=host \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
  -e PERF_MAX_STREAM=26 \
  -e PERF_A2F_MODEL='mark_v2.3' \
  nvcr.io/nim/nvidia/audio2face-3d:1.2
The PERF_A2F_MODEL environment variable can take the values mark_v2.3, claire_v2.3, or james_v2.3.
Results#
The script produces an output folder containing 4 files. You can explore the results by running the following command, replacing <output_folder> with the name of the folder printed by the nim_performance_test.py script.
$ ls -l <output_folder>/
-rw-rw-r-- 1 user user 36 Nov 14 15:52 fps_stream_10_request_100.txt
-rw-rw-r-- 1 user user 203 Nov 14 15:52 latency_stream_10_request_100.txt
-rw-rw-r-- 1 user user 2908 Nov 14 15:52 stream_10_request_100.csv
-rw-rw-r-- 1 user user 140946 Nov 14 15:52 stream_10_request_100.png
fps_stream_10_request_100.txt: Lists the 1st percentile FPS for all measurements. Values below 30 FPS indicate potential performance bottlenecks for the current deployment.
Example output:
percentile1 FPS: 78.05 above 30: True
latency_stream_10_request_100.txt: Summarizes latency results across all measurements. Average latency above 100ms might affect responsiveness in real-time applications.
Example output:
Worst case scenario: 67.56 ms
99% of requests below: 67.56 ms
Average scenario: 42.00 ms
Average below 100 ms: True
stream_10_request_100.csv: Contains detailed statistics about latency and FPS for each audio configuration:
- Stream configuration (max streams, requests, audio length, sample rate)
- First packet latency statistics (min, max, mean, percentiles)
- FPS performance metrics (min, max, mean, percentiles)
- Number of successful requests
stream_10_request_100.png: Provides a visual representation of performance metrics:
- Upper panel: Latency metrics across different audio configurations
- Lower panel: FPS metrics across different test scenarios
- Color-coded bars for easy metric comparison
- Annotations for precise value readings
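To clarify what the latency summary reports, the sketch below recomputes the same kind of statistics from a set of per-request first-packet latencies. The values and the nearest-rank percentile method here are illustrative; the actual script's computation may differ:

```python
import statistics

# Hypothetical per-request first-packet latencies in milliseconds;
# real values come from the CSV produced by nim_performance_test.py.
latencies_ms = [38.2, 41.7, 44.9, 39.5, 42.0, 67.6, 40.1, 43.3, 45.0, 41.2]

def percentile(values, pct):
    """Nearest-rank percentile over a sorted copy of the values."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

worst = max(latencies_ms)
p99 = percentile(latencies_ms, 99)
avg = statistics.mean(latencies_ms)

print(f"Worst case scenario: {worst:.2f} ms")
print(f"99% of requests below: {p99:.2f} ms")
print(f"Average scenario: {avg:.2f} ms")
print(f"Average below 100 ms: {avg < 100}")
```

With a small sample like this, the 99th percentile coincides with the worst case, which is why the two lines in the example output above can show the same value.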
Performance benchmark#
The following table lists the most recently measured max streams each GPU supports.
GPU | Max Streams
---|---
A10G | 33
RTX6000 | 51
RTX4090 | 53
L40S | 60
A100 | 95
H100 | 115
The server configurations below were used to benchmark the above stream numbers:
GPU | GPU SKU | CPU | RAM | NIC Bandwidth
---|---|---|---|---
A10G | PG133 SKU 210 | 1x AMD 9124 16C 3.0GHz (3.75GHz Boost) | 4x 64GB (4800 MHz) 256GB | 25 Gbps
RTX6000 | PG133 SKU 510 | 1x AMD RYZEN 3700X 8C 3.6GHz | 4x 32GB (3200MHz) 128GB | 25 Gbps
RTX4090 | PG136 SKU 530 | 1x AMD RYZEN 3700X 8C 3.6GHz | 4x 32GB (3200MHz) 128GB | 25 Gbps
L40S | N/A | 2x Intel Sapphire Rapids 8480+ 56C 2 GHz (3.8 GHz Boost) | 1024 GB DDR5 | 25 Gbps
A100 (SXM4 80GB) | N/A | 2x AMD EPYC 7742 64C 2.25 GHz (3.4 GHz Boost) | 2 TB | 25 Gbps
H100 (HBM3 80GB) | N/A | 2x Intel Xeon Platinum 8480+ 56C 2 GHz (3.8 GHz Boost) | 2 TB | 25 Gbps
Performance troubleshooting#
If you see performance degrade below 30 FPS, try lowering the number of accepted concurrent connections. You can do this by changing the common.stream_number configuration option and restarting the service. The stream number must be lower than the TRT engine max shape in the A2F and A2E advanced configurations.
If you are sharing resources with another GPU-intensive application, see Sharing Audio2Face-3D Compute Resources.