Sharing Audio2Face Compute Resources

In certain use cases, the Audio2Face Microservice will run alongside other services utilizing GPU compute. For most scenarios where these services perform intermittent compute tasks, this setup is generally adequate.

However, when Audio2Face shares the GPU with a constant, heavy compute load, such as a renderer, some adjustments are necessary to optimize Audio2Face compute utilization.

Note

For optimal performance, we recommend running Audio2Face and renderers on separate systems to avoid resource contention.

Sharing Audio2Face compute with a Renderer

This section outlines the best practices for sharing a GPU between the Audio2Face Microservice and a renderer.

Audio2Face is designed to serve numerous clients rapidly, leading to high compute usage when processing audio clips.

The following guidelines help you stabilize Audio2Face’s GPU usage by slowing down processing while keeping it at least real-time, resulting in smoother operation for your rendering application.

Note

The feasibility of sharing a GPU between Audio2Face and a rendering application depends on the GPU’s capabilities and the rendering application’s computational demands.

Stabilizing Audio2Face GPU usage

We recommend starting with a single stream deployment (stream_number=1) and sending audio data as follows:

  • First, send one chunk of audio lasting 500ms.

  • Then, send 35 ms of audio 30 times per second (35 ms × 30 = 1.05 s of audio per second, which corresponds to roughly 31.5 generated frames per second at the 30 FPS animation rate).

This method allows the Audio2Face Microservice to process data at approximately 31 FPS. Streaming slightly above 30 FPS reduces the chances of stuttering and helps to smooth GPU processing usage.
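A minimal Python sketch of this pacing strategy, assuming 16-bit PCM audio at a 16 kHz sample rate, is shown below. The send_audio callback is hypothetical and stands in for whatever client call pushes a chunk of audio to the Audio2Face Microservice.

    import time

    SAMPLE_RATE = 16_000          # assumed sample rate of the input audio
    BYTES_PER_SAMPLE = 2          # assumed 16-bit PCM
    FIRST_CHUNK_MS = 500          # initial 500 ms chunk sent up front
    CHUNK_MS = 35                 # duration of each subsequent chunk
    SEND_INTERVAL_S = 1.0 / 30    # 30 sends per second

    def ms_to_bytes(ms: int) -> int:
        return SAMPLE_RATE * BYTES_PER_SAMPLE * ms // 1000

    def pace_audio(pcm: bytes, send_audio) -> None:
        """Send one 500 ms chunk, then 35 ms chunks at 30 chunks per second."""
        offset = ms_to_bytes(FIRST_CHUNK_MS)
        send_audio(pcm[:offset])

        chunk_size = ms_to_bytes(CHUNK_MS)
        next_send = time.monotonic()
        while offset < len(pcm):
            send_audio(pcm[offset:offset + chunk_size])
            offset += chunk_size
            next_send += SEND_INTERVAL_S
            # Sleep until the next 1/30 s slot to keep the send rate steady.
            time.sleep(max(0.0, next_send - time.monotonic()))

Because each chunk carries 35 ms of audio but is sent every 33.3 ms, the service stays slightly ahead of real time without bursting through the clip.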

Additionally, when building the TRT engine, using the fp16 option will enable more efficient computation.
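As an illustration, a generic TensorRT Python sketch for building an FP16 engine from an ONNX model is shown below. This is not the Audio2Face deployment path; the file names are placeholders, and the exact build procedure for your deployment may differ.

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    # "model.onnx" is a placeholder path, not an Audio2Face asset name.
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels for lighter compute

    serialized_engine = builder.build_serialized_network(network, config)
    with open("model_fp16.plan", "wb") as f:
        f.write(serialized_engine)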

Optimizing rendering performance

The following rendering settings can be configured in the NVIDIA Control Panel:

(Screenshots of the recommended NVIDIA Control Panel settings.)