Performance#

Audio2Face-3D NIM offers real-time performance, converting audio into facial animation at 30 FPS when run on an optimal hardware configuration. The microservice supports concurrent input streams, and the number of active streams affects this performance.

For more information about the optimal hardware configuration, see the Support Matrix page.

Performance evaluation#

You can use the provided sample application to get performance information about your current deployment of Audio2Face-3D.

Warning

The NIM container must be running and healthy before you run the performance test. If the container is not running, the script fails with a long traceback ending in grpc.aio._call.AioRpcError: Connection refused. Verify readiness first:

$ curl -s http://localhost:8000/v1/health/ready
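
If you prefer to gate the performance test on readiness from a script, a minimal sketch along these lines could work. It assumes the readiness endpoint shown above returns HTTP 200 once the NIM is ready; the file name and retry limits are arbitrary.

# wait_for_ready.py -- minimal sketch; polls the readiness endpoint shown above
# until the NIM reports ready or a timeout is reached.
import sys
import time

import requests

URL = "http://localhost:8000/v1/health/ready"

for _ in range(60):  # give up after roughly two minutes
    try:
        if requests.get(URL, timeout=2).status_code == 200:
            print("Audio2Face-3D NIM is ready")
            sys.exit(0)
    except requests.ConnectionError:
        pass  # container still starting up
    time.sleep(2)

print("Audio2Face-3D NIM did not become ready in time", file=sys.stderr)
sys.exit(1)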
Set up the sample application by cloning the repository and checking out the v2.0 tag:

# Clone the repository
git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git

# Navigate to the scripts directory
cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app

# Checkout v2.0 tag
git checkout tags/v2.0

Follow the setup instructions in the README.md. The Python 3 script for performance evaluation is nim_performance_test.py.

Use the script as follows:

$ python3 nim_performance_test.py --help
usage: nim_performance_test.py [-h] --request-nb REQUEST_NB --max-stream-nb MAX_STREAM_NB --url URL

Call Audio2Face-3D NIM with different audiofiles. Outputs NIM performance in a csv file in output/ folder. Uses a2f_3d.py to create multiple clients to connect to A2F-3D NIM.

options:
  -h, --help            show this help message and exit
  --request-nb REQUEST_NB
                      Number of requests to simulate for each audio file
  --max-stream-nb MAX_STREAM_NB
                      Maximum number of A2F-3D streams
  --url URL             IP of the Audio2Face-3D NIM

For example,

$ python3 nim_performance_test.py --request-nb 100 --max-stream-nb 10 --url 127.0.0.1:52000
  • The script requires two parameters: the number of requests to simulate and the number of concurrent streams connecting to Audio2Face-3D. The number of connecting streams must match the common.stream_number option used to start up the A2F-3D NIM.

  • Additionally, it accepts a --url parameter pointing at the A2F-3D NIM. For a quick start deployment, use 127.0.0.1:52000.

  • The script uses six provided audio files with durations of 5, 10, and 20 seconds and sample rates of 16 kHz and 44.1 kHz.

Modifying deployment configurations for benchmarking#

Performance results are primarily influenced by two factors: the model and the maximum number of streams set when deploying the Audio2Face-3D NIM. When benchmarking your deployment, you can modify the model and the maximum number of streams in two ways.

  1. Modify Configuration Files:

    You can update the configuration files used for deployment by modifying the following:
    • Update the common.stream_number value in the deployment_config.yaml file.

    • Update the a2f.regression_model.inference_model_id (or the a2f.diffusion_model settings) in the stylization_config.yaml file.

    This approach is documented in more detail in the Audio2Face-3D NIM Container Deployment and Configuration Guide. A scripted version of these edits is sketched after this list.

  2. Use Environment Variables:

    Alternatively, you can modify the model and max streams by providing the PERF_MAX_STREAM and PERF_A2F_MODEL environment variables when deploying the Audio2Face-3D NIM. For example:

    docker run -it --rm --gpus all --network=host \
        -e NGC_API_KEY=$NGC_API_KEY \
        -e NIM_MANIFEST_PROFILE=$NIM_MANIFEST_PROFILE \
        -e PERF_MAX_STREAM=26 \
        -e PERF_A2F_MODEL='mark_v2.3' \
        nvcr.io/nim/nvidia/audio2face-3d:2.0
    

    The PERF_A2F_MODEL environment variable can take values of mark_v2.3, claire_v2.3.1, or james_v2.3.1.

Note

The PERF_A2F_MODEL environment variable loads the configuration file for the selected model and enables tongue animation.
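
If you benchmark several stream counts or models, it can be convenient to script the configuration-file edits from option 1 rather than editing the files by hand. The following is a hypothetical sketch: the key names (common.stream_number, a2f.regression_model.inference_model_id) are the ones referenced on this page, the file paths and model id value are placeholders, and it assumes the files are plain YAML (note that PyYAML does not round-trip comments).

# sweep_config.py -- hypothetical helper for benchmark sweeps; only the two keys
# referenced on this page are modified, everything else in the files is preserved
# (except comments, which yaml.safe_dump does not keep).
import yaml  # pip install pyyaml


def set_stream_number(path: str, streams: int) -> None:
    """Set common.stream_number in deployment_config.yaml."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.setdefault("common", {})["stream_number"] = streams
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)


def set_regression_model(path: str, model_id: str) -> None:
    """Set a2f.regression_model.inference_model_id in stylization_config.yaml."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    cfg.setdefault("a2f", {}).setdefault("regression_model", {})["inference_model_id"] = model_id
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)


if __name__ == "__main__":
    # Paths and model id are illustrative; see the deployment guide for valid values.
    set_stream_number("deployment_config.yaml", 10)
    set_regression_model("stylization_config.yaml", "<model_id>")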

Results#

Running the performance test produces an output folder containing four files. You can explore the results with the following command, replacing <output_folder> with the folder name printed by the nim_performance_test.py script.

$ ls -l <output_folder>/
-rw-rw-r-- 1 user user     36 Nov 14 15:52 fps_stream_10_request_100.txt
-rw-rw-r-- 1 user user    203 Nov 14 15:52 latency_stream_10_request_100.txt
-rw-rw-r-- 1 user user   2908 Nov 14 15:52 stream_10_request_100.csv
-rw-rw-r-- 1 user user 140946 Nov 14 15:52 stream_10_request_100.png
  • fps_stream_10_request_100.txt: Lists the 1st percentile FPS across all measurements. Values below 30 FPS indicate potential performance bottlenecks for the current deployment.

    • Example output:

      percentile1 FPS: 78.05
      above 30: True
      
  • latency_stream_10_request_100.txt: Summarizes latency results across all measurements. An average latency above 100 ms might affect responsiveness in real-time applications.

    • Example output:

      Worst case scenario: 67.56 ms
      99% of requests below: 67.56 ms
      Average scenario: 42.00 ms
      Average below 100 ms: True
      
  • stream_10_request_100.csv: Contains detailed statistics about latency and FPS for each audio configuration:

    • Stream configuration (max streams, requests, audio length, sample rate)

    • First packet latency statistics (min, max, mean, percentiles)

    • FPS performance metrics (min, max, mean, percentiles)

    • Number of successful requests

  • stream_10_request_100.png: Provides a visual representation of the performance metrics:

    • Upper panel: Latency metrics across different audio configurations

    • Lower panel: FPS metrics across different test scenarios

    • Color-coded bars for easy metric comparison

    • Includes annotations for precise value readings

Performance benchmark#

The following table lists the most recently measured max streams each GPU supports on Ubuntu 24.04.

Note

The stream numbers below ensure that 99% of FPS measurements are above 30. We use the 1st percentile FPS (P1) to measure performance because it represents worst-case behavior more accurately than the mean FPS.
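
As an illustration of this criterion, the percentile1 FPS value reported in the fps_*.txt output can be reproduced from a set of per-request FPS measurements, for example with NumPy. The sample values below are made up.

# p1_fps.py -- illustrative only: computes the 1st percentile (P1) of per-request
# FPS measurements and checks it against the 30 FPS real-time target.
import numpy as np

fps_samples = np.array([78.0, 81.5, 74.2, 79.9, 35.6, 80.3])  # made-up values
p1 = np.percentile(fps_samples, 1)  # 99% of requests ran at or above this FPS
print(f"percentile1 FPS: {p1:.2f}")
print(f"above 30: {p1 > 30}")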

Ubuntu 24.04#

| GPU | Max Streams (Regression) | Max Streams (Diffusion) |
|---|---|---|
| A10G | 35 | 32 |
| A30 | 55 | 41 |
| L4 | 30 | 16 |
| L40S | 80 | 60 |
| RTX 4090 | 80 | 35 |
| RTX 5080 | 75 | 22 |
| RTX 5090 | 100 | 52 |
| RTX 6000 Ada | 70 | 50 |
| RTX PRO 6000 Blackwell | 120 | 130 |
| B200 | 255 | 144 |

The server configurations below were used to benchmark the above stream numbers:

| GPU | GPU SKU | CPU | RAM | NIC Bandwidth |
|---|---|---|---|---|
| A10G | PG133 SKU 210 | 1x AMD 9124 16C 3.0GHz (3.75GHz Boost) | 256GB DDR5 | 25 Gbps |
| A30 | N/A | 2x AMD EPYC 7742 64C 2.25GHz (3.4GHz Boost) | 512GB DDR4 | 25 Gbps |
| L4 | N/A | 1x Intel Xeon Gold 6330 28C 2.0GHz (3.1GHz Boost) | 256GB DDR4 | 25 Gbps |
| L40S | N/A | 2x Intel Sapphire Rapids 8480+ 56C 2.0GHz (3.8GHz Boost) | 1024GB DDR5 | 25 Gbps |
| RTX 4090 | PG136 SKU 530 | 1x AMD RYZEN 3700X 8C 3.6GHz | 128GB DDR4 | 25 Gbps |
| RTX 5080 | N/A | 1x AMD RYZEN 9 9900X 12C 4.4GHz (5.6GHz Boost) | 128GB DDR5 | 25 Gbps |
| RTX 5090 | N/A | 1x AMD RYZEN 9 9950X 16C 4.3GHz (5.7GHz Boost) | 192GB DDR5 | 25 Gbps |
| RTX 6000 Ada | PG133 SKU 510 | 1x AMD RYZEN 3700X 8C 3.6GHz | 128GB DDR4 | 25 Gbps |
| RTX PRO 6000 Blackwell | N/A | 1x AMD EPYC 9654 96C 2.4GHz (3.7GHz Boost) | 512GB DDR5 | 100 Gbps |
| B200 | N/A | 2x Intel Xeon Platinum 8480+ 56C 2.0GHz (3.8GHz Boost) | 2048GB DDR5 | 100 Gbps |

Performance troubleshooting#

If performance degrades below 30 FPS, try the following:

  1. Lower the number of concurrent connections: Change the common.stream_number configuration option and restart the service. The stream number must be lower than the TRT engine max shape defined in the A2F and A2E advanced configurations.

  2. Verify that the GPU solver is enabled: Ensure that a2f.use_gpu_solver is set to true in advanced_config.yaml (this is the default). The GPU solver keeps blendshape solving on the GPU, improving throughput by avoiding CPU-GPU data transfers.

If you are sharing resources with another GPU-intensive application, see Sharing Audio2Face-3D Compute Resources.