Multi-GPU Support
The feature enables parallelization techniques that involve multiple CUDA GPUs within a single process in the general case, and hybrid MPI techniques in particular (MPI + OpenACC/OpenMP/stdpar), allowing a single MPI rank to manage more than one CUDA GPU.
The implementation resides in the UCX library. Thus, the use of multiple CUDA GPUs is supported both in applications that use the UCX library directly and in MPI applications.
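The following sketch illustrates the pattern with a CUDA-aware MPI library layered on UCX. It is illustrative only: the two-GPUs-per-rank mapping, the one-rank-per-node assumption, and the buffer size are assumptions, and error checking is omitted.

    #include <mpi.h>
    #include <cuda_runtime.h>

    #define GPUS_PER_RANK 2           /* assumption: two GPUs per MPI rank */
    #define COUNT (1 << 20)

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* One rank manages several CUDA devices: allocate a buffer on each.
           The device numbering assumes one rank per node. */
        float *buf[GPUS_PER_RANK];
        for (int g = 0; g < GPUS_PER_RANK; ++g) {
            cudaSetDevice(g);
            cudaMalloc((void **)&buf[g], COUNT * sizeof(float));
            cudaMemset(buf[g], 0, COUNT * sizeof(float));
        }

        /* CUDA-aware MPI accepts device pointers from either GPU of this
           rank: send from the buffer on GPU 0, receive into the buffer
           on GPU 1. */
        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;
        MPI_Sendrecv(buf[0], COUNT, MPI_FLOAT, next, 0,
                     buf[1], COUNT, MPI_FLOAT, prev, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        for (int g = 0; g < GPUS_PER_RANK; ++g) {
            cudaSetDevice(g);
            cudaFree(buf[g]);
        }
        MPI_Finalize();
        return 0;
    }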
Along with this functionality, the requirements for setting the CUDA device (cudaSetDevice) in each CPU thread of an application process that uses the UCX library interfaces have also been relaxed. Previously, the user had to set the CUDA device in every thread, including the progress thread. Now, the user may set the CUDA device only once to utilize the CUDA features in UCX. Since CUDA devices in the CUDA Runtime API and CUcontexts in the CUDA Driver API are synonymous in this regard, the same relaxation applies to applications using the CUDA Driver API.
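The sketch below shows the relaxed pattern: the main thread selects the CUDA device once before initializing UCX, and the progress thread calls ucp_worker_progress without its own cudaSetDevice call. It is a minimal sketch; the feature and tag choices are arbitrary and error handling is omitted.

    #include <pthread.h>
    #include <cuda_runtime.h>
    #include <ucp/api/ucp.h>

    static ucp_worker_h worker;
    static volatile int running = 1;

    static void *progress_thread(void *arg)
    {
        (void)arg;
        /* Previously this thread also needed cudaSetDevice(); with the
           relaxed requirement it can progress UCX directly. */
        while (running)
            ucp_worker_progress(worker);
        return NULL;
    }

    int main(void)
    {
        cudaSetDevice(0);              /* set the CUDA device once, here only */

        ucp_params_t params = {0};
        params.field_mask = UCP_PARAM_FIELD_FEATURES;
        params.features   = UCP_FEATURE_TAG;

        ucp_context_h context;
        ucp_init(&params, NULL, &context);

        ucp_worker_params_t wparams = {0};
        wparams.field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE;
        wparams.thread_mode = UCS_THREAD_MODE_MULTI;
        ucp_worker_create(context, &wparams, &worker);

        pthread_t tid;
        pthread_create(&tid, NULL, progress_thread, NULL);

        /* ... create endpoints and exchange GPU memory via UCP here ... */

        running = 0;
        pthread_join(tid, NULL);
        ucp_worker_destroy(worker);
        ucp_cleanup(context);
        return 0;
    }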
By default, the UCX library uses the NICs closest to the CUDA GPU that is set before the library is initialized. This policy is optimal for applications that use one CUDA GPU per process. For applications that use multiple CUDA GPUs, the following combination of UCX parameters can be used to achieve the best performance:
UCX_SELECT_DISTANCE_MD=
UCX_CONNECT_ALL_TO_ALL=y
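These are environment parameters and are normally exported in the job launch environment. As a sketch, they can also be set programmatically before UCX is initialized (UCX reads them when its context is created, directly or inside MPI_Init). The empty value for UCX_SELECT_DISTANCE_MD simply mirrors the listing above.

    #include <stdlib.h>

    int main(void)
    {
        /* Set the UCX tuning parameters before the UCX context is created,
           e.g. before MPI_Init() in an MPI application. */
        setenv("UCX_SELECT_DISTANCE_MD", "", 1);  /* empty value, as listed above */
        setenv("UCX_CONNECT_ALL_TO_ALL", "y", 1);

        /* ... initialize MPI/UCX and run the multi-GPU workload ... */
        return 0;
    }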
The feature is not supported for memory allocated with the CUDA Virtual Memory Management (VMM) API if the device on which the memory is allocated and the device for which access rights are set do not match.
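For illustration, the sketch below constructs exactly that mismatch with the CUDA Driver API: physical memory is created on device 0, but access rights are granted to device 1. The device IDs and allocation size are arbitrary and error checking is omitted.

    #include <cuda.h>

    int main(void)
    {
        cuInit(0);

        CUdevice dev;
        CUcontext ctx;
        cuDeviceGet(&dev, 0);
        cuDevicePrimaryCtxRetain(&ctx, dev);
        cuCtxSetCurrent(ctx);

        /* Allocate physical memory on device 0 via the VMM API. */
        CUmemAllocationProp prop = {0};
        prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
        prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        prop.location.id   = 0;               /* memory lives on device 0 */

        size_t gran = 0;
        cuMemGetAllocationGranularity(&gran, &prop,
                                      CU_MEM_ALLOC_GRANULARITY_MINIMUM);

        CUmemGenericAllocationHandle handle;
        cuMemCreate(&handle, gran, &prop, 0);

        CUdeviceptr ptr;
        cuMemAddressReserve(&ptr, gran, 0, 0, 0);
        cuMemMap(ptr, gran, 0, handle, 0);

        /* Grant access to device 1 only: the allocating device (0) and the
           device given access rights (1) differ, which is the unsupported
           case described above. */
        CUmemAccessDesc access = {0};
        access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        access.location.id   = 1;
        access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
        cuMemSetAccess(ptr, gran, &access, 1);

        /* Passing `ptr` to UCX-based communication is not supported here. */
        cuMemUnmap(ptr, gran);
        cuMemAddressFree(ptr, gran);
        cuMemRelease(handle);
        return 0;
    }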