Use Multiple GPUs

Applications developed with the NVIDIA Audio Effects SDK can run on systems with multiple GPUs. By default, the SDK assumes that the application selects the GPU. Optionally, the SDK can select the best GPU on which to run the effects.

Select the GPU for Audio Effects Processing in a Multi-GPU Environment

In a multi-GPU environment, the application controls which GPU runs the audio effects by using the CUDA runtime functions cudaSetDevice() and cudaGetDevice().

The device should be set before NvAFX_Load() is called because NvAFX_Load() will succeed only when the currently selected GPU supports the SDK:

int chosenGPU = 0; // or whichever GPU you want to use
cudaSetDevice(chosenGPU);
NvAFX_Handle effect;
NvAFX_Status err = NvAFX_CreateEffect(code, &effect);
err = NvAFX_Set...; // set parameters
...
err = NvAFX_Load(effect); // succeeds only if the current GPU supports the SDK
...
err = NvAFX_Run(effect, ...);
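The sample hard-codes chosenGPU. When the index comes from user input instead, for example a command-line argument, it helps to validate it against the number of devices before passing it to cudaSetDevice(). The following C sketch shows one way to do that; parse_gpu_index is a hypothetical helper (not part of the SDK or the CUDA runtime), and deviceCount is assumed to come from cudaGetDeviceCount().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not an SDK or CUDA function): converts a user-supplied
 * GPU index into a device ID for cudaSetDevice(). deviceCount is assumed to
 * come from cudaGetDeviceCount(). Returns the device ID, or -1 if the text is
 * not a whole decimal number in [0, deviceCount). */
int parse_gpu_index(const char *text, int deviceCount)
{
    char *end = NULL;
    long value;

    assert(deviceCount >= 0); /* a count can never be negative */
    if (text == NULL || *text == '\0' || deviceCount == 0)
        return -1;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1; /* overflow, no digits, or trailing characters */
    if (value < 0 || value >= deviceCount)
        return -1; /* no such device */
    return (int)value;
}
```

A caller might write chosenGPU = parse_gpu_index(argv[1], deviceCount); and report an error instead of calling cudaSetDevice() when the result is -1.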

Select GPUs for Chained Audio Effects on Linux

When using chained effects in a multi-GPU environment on Linux, the SDK can optionally run the effects in the chain on separate GPUs. For example, in a Denoiser 16k + Superres 16k to 48k chain, the denoiser effect can be run entirely on one GPU and the Superres effect on another GPU.

To use this feature, create the chained effect and call NvAFX_SetU32List to set NVAFX_PARAM_CHAINED_EFFECT_GPU_LIST to an array of GPU IDs, one for each effect in the chain, in chain order. This parameter must be set before you call NvAFX_Load on the effect.

The following sample demonstrates use of this parameter:

NvAFX_Handle effect;
NvAFX_Status err = NvAFX_CreateChainedEffect(code, &effect);
...

// Run first effect on GPU ID 3, second on GPU ID 4
uint32_t gpus[] = { 3, 4 };
err = NvAFX_SetU32List(effect, NVAFX_PARAM_CHAINED_EFFECT_GPU_LIST, gpus, sizeof(gpus));
...
err = NvAFX_Load(effect); // the GPU list must be set before this call
...
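Because an invalid GPU ID in the list would only surface as an error from the SDK, an application might check the list itself before calling NvAFX_SetU32List. The following C sketch performs such a check. The helper chain_gpu_list_is_valid is hypothetical, and it assumes two things taken from the sample rather than from an SDK specification: the list holds exactly one ID per effect, and each ID is a CUDA device ID in [0, deviceCount), where deviceCount comes from cudaGetDeviceCount().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-load check for a chained-effect GPU list (not an SDK call).
 * Assumes one GPU ID per effect in the chain, and that each ID must be a CUDA
 * device ID in [0, deviceCount), with deviceCount from cudaGetDeviceCount(). */
bool chain_gpu_list_is_valid(const uint32_t *gpus, size_t numGpus,
                             size_t numEffects, int deviceCount)
{
    size_t i;

    assert(gpus != NULL || numGpus == 0);
    if (numGpus != numEffects || deviceCount <= 0)
        return false; /* wrong length, or no GPUs at all */
    for (i = 0; i < numGpus; ++i) {
        if (gpus[i] >= (uint32_t)deviceCount)
            return false; /* no such device */
    }
    return true;
}
```

With the gpus array from the sample, a two-effect chain could be checked with chain_gpu_list_is_valid(gpus, sizeof(gpus) / sizeof(gpus[0]), 2, deviceCount), and NvAFX_SetU32List called only when the check passes.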

Offload GPU Selection to the SDK for Audio Effects Processing in a Multi-GPU Environment

In a multi-GPU environment, the SDK can optionally determine the optimal GPU on which to run the audio effects. To use this feature, call NvAFX_SetU32(effect, NVAFX_PARAM_USE_DEFAULT_GPU, 1) before loading the effect with NvAFX_Load(). Setting this parameter after the effect is loaded has no effect.

Note

NVAFX_PARAM_USE_DEFAULT_GPU and NVAFX_PARAM_USER_CUDA_CONTEXT (Windows SDK only) cannot be used at the same time.

If NVAFX_PARAM_USE_DEFAULT_GPU is set to 0 or is not set, the SDK does not select a GPU itself and runs the effect on the current CUDA device. The application can choose that device by calling cudaSetDevice(); if it does not, the SDK uses the default device (device 0).

Note

This parameter is not supported for chained effects.

If the application sets NVAFX_PARAM_USE_DEFAULT_GPU to 1, it should not call cudaSetDevice(), and all other effects, as well as multiple instances of an effect, use the GPU that the SDK selects. If the application does call cudaSetDevice() before calling NvAFX_Load(), the SDK might override the application’s device choice. If the application calls cudaSetDevice() to switch to a different GPU just before calling NvAFX_Run(), the NvAFX_Run() call fails. The following sample lets the SDK select the GPU:

NvAFX_Handle effect;
NvAFX_Status err = NvAFX_CreateEffect(code, &effect);
err = NvAFX_SetU32(effect, NVAFX_PARAM_USE_DEFAULT_GPU, 1); // must precede NvAFX_Load()
...
err = NvAFX_Load(effect);
...
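The rules in this section can be collected into one check that runs before the effect is configured. The following C sketch encodes them: NVAFX_PARAM_USE_DEFAULT_GPU cannot be combined with NVAFX_PARAM_USER_CUDA_CONTEXT, is not supported for chained effects, and should not be combined with application calls to cudaSetDevice(). The AfxGpuPlan struct and the check_gpu_plan function are hypothetical application-side helpers; they do not call the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, application-side description of how an effect will be set up.
 * None of these names belong to the Audio Effects SDK. */
typedef struct {
    bool useDefaultGpu;   /* NVAFX_PARAM_USE_DEFAULT_GPU will be set to 1 */
    bool userCudaContext; /* NVAFX_PARAM_USER_CUDA_CONTEXT will be set (Windows) */
    bool chainedEffect;   /* effect is created with NvAFX_CreateChainedEffect() */
    bool callsSetDevice;  /* application calls cudaSetDevice() for this effect */
} AfxGpuPlan;

/* Returns NULL if the plan follows the rules described in this section,
 * otherwise a message describing the first rule it breaks. */
const char *check_gpu_plan(const AfxGpuPlan *plan)
{
    assert(plan != NULL);
    if (!plan->useDefaultGpu)
        return NULL; /* application selects the GPU; cudaSetDevice() is allowed */
    if (plan->userCudaContext)
        return "NVAFX_PARAM_USE_DEFAULT_GPU cannot be used with NVAFX_PARAM_USER_CUDA_CONTEXT";
    if (plan->chainedEffect)
        return "NVAFX_PARAM_USE_DEFAULT_GPU is not supported for chained effects";
    if (plan->callsSetDevice)
        return "do not call cudaSetDevice() when the SDK selects the GPU";
    return NULL;
}
```

An application might run this check before calling NvAFX_SetU32() and NvAFX_Load(), and log the returned message when it is not NULL.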

Select GPUs for Various Tasks

An application that uses the SDK might perform other CUDA tasks in addition to applying audio effects. In a multi-GPU environment, select the best GPU for each task, make the audio effects GPU current before calling NvAFX_Load(), and make it current again before each NvAFX_Run() call.

Switching to the appropriate GPU is the responsibility of the application. If the application does not switch to the appropriate GPU before calling NvAFX_Run(), the call will fail with an error.

The following steps show how to run CUDA tasks and SDK calls on different GPUs.

  1. Call cudaGetDeviceCount() to determine the number of available GPUs:

    int deviceCount = 0;
    cudaError_t cuErr = cudaGetDeviceCount(&deviceCount);
    
  2. Determine the best GPU for the task.

    For example, iterate over the available GPUs, call cudaGetDeviceProperties() for each one, and select the GPU with the most streaming multiprocessors (SMs), as reported in the multiProcessorCount field of cudaDeviceProp.

  3. In the application’s processing loop, call cudaSetDevice() to select the GPU for each task before performing that task:

    while (!done) {
       ...
       cudaSetDevice(gpuOtherTask);
       PerformOtherTask();
       cudaSetDevice(gpuAFX); // required before every NvAFX_Run()
       err = NvAFX_Run(effect, ...);
       ...
    }
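Steps 2 and 3 can be sketched in C as follows. The GpuInfo struct is a hypothetical stand-in for the cudaDeviceProp fields this example needs; in real code, each entry would be filled by calling cudaGetDeviceProperties() for every device ID below deviceCount. Giving the audio effects the GPU with the most SMs and the other task the next-best GPU is an assumed policy for illustration, not an SDK requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the cudaDeviceProp fields used here. */
typedef struct {
    int device;              /* CUDA device ID passed to cudaSetDevice() */
    int multiProcessorCount; /* number of SMs, as in cudaDeviceProp */
} GpuInfo;

/* Returns the device ID with the most SMs, skipping excludeDevice
 * (pass -1 to consider every GPU). Ties go to the earlier array entry.
 * Returns -1 if no GPU qualifies. */
int pick_gpu_most_sms(const GpuInfo *gpus, int count, int excludeDevice)
{
    int best = -1;
    int bestSms = -1;
    int i;

    assert(gpus != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        if (gpus[i].device == excludeDevice)
            continue;
        if (gpus[i].multiProcessorCount > bestSms) {
            bestSms = gpus[i].multiProcessorCount;
            best = gpus[i].device;
        }
    }
    return best;
}

/* Assumed policy: audio effects get the GPU with the most SMs, the other task
 * gets the next-best GPU, and both share one GPU on single-GPU systems.
 * Returns 0 on success, or -1 if no GPU is available. */
int assign_task_gpus(const GpuInfo *gpus, int count, int *gpuAFX, int *gpuOtherTask)
{
    int afx = pick_gpu_most_sms(gpus, count, -1);
    int other;

    if (afx < 0)
        return -1;
    other = pick_gpu_most_sms(gpus, count, afx);
    *gpuAFX = afx;
    *gpuOtherTask = (other >= 0) ? other : afx;
    return 0;
}
```

The resulting gpuAFX and gpuOtherTask values are the IDs passed to cudaSetDevice() in the step 3 loop; gpuAFX would also be made current before NvAFX_Load().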
    

CUDA Graph Support on Windows

The Windows SDK supports CUDA graphs, which can improve performance by reducing the CPU overhead of launching short-lived CUDA kernels.

By default, CUDA graphs are enabled in the Windows SDK, but this can cause issues if the SDK runs in parallel with other applications that also use CUDA graphs. To disable CUDA graphs, set NVAFX_PARAM_DISABLE_CUDA_GRAPH to 1 before calling NvAFX_Load():

// Call before loading the model (NvAFX_Load) on Windows
err = NvAFX_SetU32(effect, NVAFX_PARAM_DISABLE_CUDA_GRAPH, 1);
...