Using Multiple GPUs#

Applications that are developed with the AR SDK can be used with multiple GPUs. By default, the SDK determines which GPU to use based on the capability of the currently selected GPU. If the currently selected GPU supports the SDK, the SDK uses it. Otherwise, the SDK selects the best GPU.

You can control which GPU is used in a multi-GPU environment by using the NVIDIA CUDA Toolkit functions cudaSetDevice(int whichGPU) and cudaGetDevice(int *whichGPU) and the AR SDK set function NvAR_SetS32(NULL, NvAR_Parameter_Config(GPU), whichGPU). The set call needs to be made only once, before any AR features are created. Because images that are allocated on one GPU cannot be transparently passed to another GPU, you must ensure that the same GPU is used for all AR features.

NvCV_Status err;
int chosenGPU = 0; // or whatever GPU you want to use
err = NvAR_SetS32(NULL, NvAR_Parameter_Config(GPU), chosenGPU);
if (NVCV_SUCCESS != err) {
  printf("Error choosing GPU %d: %s\n", chosenGPU,
         NvCV_GetErrorStringFromCode(err));
}

cudaSetDevice(chosenGPU);
NvCVImage *dst = new NvCVImage(
   // your parameters
   );

NvAR_FeatureHandle eff;
err = NvAR_Create(featureID, &eff); // featureID selects the AR feature to instantiate

// your program logic here

err = NvAR_Load(eff);
err = NvAR_Run(eff);
// switch GPU for other task, then switch back for next frame

Buffers need to be allocated on the selected GPU, so before you allocate images on the GPU, call cudaSetDevice(). Neural networks need to be loaded on the selected GPU, so before NvAR_Load() is called, set this GPU as the current device.

To use the buffers and models, set the same GPU as the current device before you call NvAR_Run(). A prior call to NvAR_SetS32(NULL, NvAR_Parameter_Config(GPU), whichGPU) helps enforce this requirement.

For performance reasons, the application is responsible for switching to the appropriate GPU.
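The following sketch illustrates this ordering. It assumes the chosenGPU, err, and eff variables from the example above, example image dimensions, and typical NvCVImage_Alloc arguments (pixel format, component type, layout, memory space, and alignment) that you would adapt to your own pipeline.

// Make the chosen GPU current before allocating GPU buffers on it.
cudaSetDevice(chosenGPU);
unsigned width = 1280, height = 720; // example dimensions
NvCVImage srcGpu, dstGpu;
NvCVImage_Alloc(&srcGpu, width, height, NVCV_BGR, NVCV_U8, NVCV_CHUNKY, NVCV_GPU, 1);
NvCVImage_Alloc(&dstGpu, width, height, NVCV_BGR, NVCV_U8, NVCV_CHUNKY, NVCV_GPU, 1);

// Make the same GPU current before the models are loaded...
cudaSetDevice(chosenGPU);
err = NvAR_Load(eff);

// ...and again before the feature is run on each frame.
cudaSetDevice(chosenGPU);
err = NvAR_Run(eff);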

Default Behavior in Multi-GPU Environments#

The NvAR_Load() function internally calls cudaGetDevice() to identify the currently selected GPU.

The function checks the compute capability of the currently selected GPU (device 0 by default) to determine whether the GPU architecture supports the AR SDK and then completes one of the following tasks:

  • If the SDK is supported, NvAR_Load() uses the GPU.

  • If the SDK is not supported, NvAR_Load() searches for the most powerful GPU that supports the AR SDK and calls cudaSetDevice() to set that GPU as the current GPU.

If you do not require your application to use a specific GPU in a multi-GPU environment, the default behavior should suffice.

Selecting the GPU for AR SDK Processing in a Multi-GPU Environment#

Your application might be designed to apply an AR filter by using only a specific GPU in a multi-GPU environment. In this situation, ensure that the SDK does not override your choice of GPU when the filter is applied.

// Initialization
int beforeGPU = 0, arsdkGPU = 0;
cudaGetDevice(&beforeGPU);

err = NvAR_Load(eff);

if (NVCV_SUCCESS != err) {
  printf("Cannot load ARSDK: %s\n", NvCV_GetErrorStringFromCode(err));
  exit(-1);
}

cudaGetDevice(&arsdkGPU);

if (beforeGPU != arsdkGPU) {
  printf("GPU #%d cannot run AR SDK, so GPU #%d was chosen instead\n",
         beforeGPU, arsdkGPU);
}

Selecting Different GPUs for Different Tasks#

Your application might be designed to perform multiple tasks in a multi-GPU environment, such as rendering a game and applying an AR filter. In this situation, select the best GPU for each task before calling NvAR_Load().

  1. Call cudaGetDeviceCount() to determine the number of GPUs in your environment.

    // Get the number of GPUs
    int deviceCount = 0;
    cudaError_t cuErr = cudaGetDeviceCount(&deviceCount);
    
  2. Get the properties of each GPU and determine whether it is the best GPU for each task by performing the following operations for each GPU in a loop:

    1. Call cudaSetDevice() to set the current GPU.

    2. Call cudaGetDeviceProperties() to get the properties of the current GPU.

    3. To determine whether the GPU is the best GPU for each specific task, use custom code in your application to analyze the properties that were retrieved by cudaGetDeviceProperties().

    The following example uses the compute capability to determine whether a GPU’s properties should be analyzed and whether the current GPU is the best GPU on which to apply an AR filter. A GPU’s properties are analyzed only when the compute capability is 7.5, 8.6, or 8.9, which denotes a GPU that is based on the Turing, Ampere, or Ada architecture, respectively. A sketch of such a check appears after this procedure.

    // Loop through the GPUs to get the properties of each GPU and
    // determine whether it is the best GPU for each task based on the
    // properties obtained.
    cudaDeviceProp deviceProp;
    int gpuARSDK = 0, gpuGame = 0;
    for (int dev = 0; dev < deviceCount; ++dev) {
       cudaSetDevice(dev);
       cudaGetDeviceProperties(&deviceProp, dev);

       if (DeviceIsBestForARSDK(&deviceProp)) gpuARSDK = dev;
       if (DeviceIsBestForGame(&deviceProp)) gpuGame = dev;
       // your program logic here
    }
    
    cudaSetDevice(gpuARSDK);
    err = NvAR_Set...; // set parameters
    err = NvAR_Load(eff);
    
  3. In the loop to complete the application’s tasks, select the best GPU for each task before performing the task.

    1. Call cudaSetDevice() to select the GPU for the task.

    2. Make all the function calls required to perform the task.

    In this way, you select the best GPU for each task only once without setting the GPU for every function call.

    This example selects the best GPU for rendering a game and uses custom code to render the game. It then selects the best GPU for applying an AR filter before calling the NvCVImage_Transfer() and NvAR_Run() functions to apply the filter, avoiding the need to save and restore the GPU for every NVIDIA AR SDK API call.

    // Select the best GPU for each task and perform the task.
    while (!done) {
       // your program logic here
       cudaSetDevice(gpuGame);
       RenderGame();
    
       cudaSetDevice(gpuARSDK);
       // transfer the frame to the feature’s GPU buffers (for example,
       // with NvCVImage_Transfer()) before running the feature
       err = NvAR_Run(eff);
       // your program logic here
    }
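
A possible implementation of the DeviceIsBestForARSDK() helper that is called in step 2 is sketched below. The helper is hypothetical, not part of the SDK; as described earlier, it treats a GPU as a candidate only when the compute capability is 7.5, 8.6, or 8.9, and it prefers the candidate with the most multiprocessors.

// Hypothetical helper: returns true when this GPU is supported by the AR SDK
// and has more multiprocessors than any supported GPU seen so far.
bool DeviceIsBestForARSDK(const cudaDeviceProp *p) {
  static int bestSMCount = -1;
  bool supported = (p->major == 7 && p->minor == 5) ||                  // Turing
                   (p->major == 8 && (p->minor == 6 || p->minor == 9)); // Ampere or Ada
  if (!supported || p->multiProcessorCount <= bestSMCount)
    return false;
  bestSMCount = p->multiProcessorCount;
  return true;
}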
    

Using Multi-Instance GPU (Linux only)#

Applications that are developed with the AR SDK can be deployed on Multi-Instance GPU (MIG) partitions on supported devices, such as NVIDIA DGX™ A100. MIG lets you partition a device into as many as seven GPU instances, each with its own streaming multiprocessors, its own slice of the GPU memory, and its own pathways to that memory. This isolation ensures that heavy resource usage by an application on one partition does not affect the performance of applications that run on other partitions.

To run an application on a MIG partition, you do not need to call any additional SDK APIs in your application. Instead, you specify which MIG instance to use when you invoke the application.

You can select the MIG instance using one of the following options:

  • The bare-metal method, by setting the CUDA_VISIBLE_DEVICES environment variable.

  • The container method, by using the NVIDIA Container Toolkit.

Either way, MIG is supported only on Linux.
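
For example, with the bare-metal method, you might pin the application to a specific MIG instance as follows; the MIG device UUID comes from nvidia-smi -L, and the application name is a placeholder:

CUDA_VISIBLE_DEVICES=MIG-<MIG device UUID> ./your_ar_app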

Refer to the NVIDIA Multi-Instance GPU User Guide for more information about the MIG and its usage.