System Requirements and Recommendations

The platform and server requirements for GPUDirect RDMA are detailed in the following table:

Platform

Type and Version

HCAs

  • NVIDIA® ConnectX®-4 (VPI/EN)

  • NVIDIA® ConnectX®-4 Lx

  • NVIDIA® ConnectX®-5 (VPI/EN)

  • NVIDIA® ConnectX®-6 (VPI/EN)

  • NVIDIA® ConnectX®-6 Dx

  • NVIDIA® ConnectX®-6 Lx

GPUs

  • NVIDIA® Tesla™ / Quadro™ K-Series or Tesla™ / Quadro™ P-Series GPU

Software/Plugins

Once the NVIDIA software components are installed, verify that the GPUDirect kernel module is loaded on every compute system that will run jobs requiring GPUDirect RDMA. To check the module status, run:

service nv_peer_mem status                                         

On other Linux distributions, run:

lsmod | grep nv_peer_mem                                         
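The check above can also be scripted. The following is a minimal sketch, not an NVIDIA-provided tool: `module_loaded` is a hypothetical helper that reads `/proc/modules` (the file `lsmod` formats) and matches the module name exactly. Its optional second argument is an assumption added only so the logic can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel module is listed in a
# /proc/modules-style file (the module name is the first field of each line).
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"   # second argument exists only for testing
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

if module_loaded nv_peer_mem; then
    echo "nv_peer_mem is loaded"
else
    echo "nv_peer_mem is NOT loaded"
fi
```

Matching on the first field avoids a false positive when `nv_peer_mem` appears only in another module's dependency column, which a plain `grep` would also match.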

This kernel module is usually loaded by default by the system startup service. If it is not loaded, GPUDirect RDMA will not work, and message communication latency will be very high.

In this case, to start the module, run:

service nv_peer_mem start                                              

Or, on other Linux distributions, run:

modprobe nv_peer_mem   
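The check and the load step can be combined so that the module is loaded only when it is missing. This is a minimal sketch under stated assumptions: `ensure_module` is a hypothetical helper, and the `MODULES_FILE` and `MODPROBE` variables are illustrative overrides that let the logic be exercised without root. They are not part of any NVIDIA package.

```shell
#!/bin/sh
# Sketch: load a kernel module only if it is not already loaded.
# MODULES_FILE and MODPROBE are hypothetical overrides for dry runs;
# by default the function reads /proc/modules and calls modprobe.
ensure_module() {
    mod="$1"
    if grep -q "^${mod} " "${MODULES_FILE:-/proc/modules}" 2>/dev/null; then
        echo "${mod}: already loaded"
        return 0
    fi
    echo "${mod}: not loaded, attempting to load"
    "${MODPROBE:-modprobe}" "$mod"
}
```

Calling `ensure_module nv_peer_mem` as root on each compute node returns a non-zero status if `modprobe` fails, so a job launcher could refuse to start when the module cannot be loaded.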

For the best GPUDirect RDMA performance, both the HCA and the GPU must be physically located on the same PCIe I/O root complex.

For additional information on the system's architecture, review the system manual or run:

lspci -tv | grep NVIDIA
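As a rough complement to reading the `lspci` tree by eye, the sysfs paths of two PCI devices can be compared. The sketch below is an approximation, not an authoritative topology check. It assumes the usual Linux sysfs layout, in which a resolved device path contains a host-bridge component such as `pci0000:00`. Both function names are hypothetical. Two devices under the same host bridge share a root complex, but devices under different host bridges may also share one, so a negative result is not conclusive.

```shell
#!/bin/sh
# Extract the host-bridge component (e.g. "pci0000:00") from a resolved
# sysfs device path such as /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0.
# Prints nothing if the path has no such component.
host_bridge_of() {
    printf '%s\n' "$1" |
        sed -n 's|^.*/\(pci[0-9a-fA-F]*:[0-9a-fA-F]*\)/.*|\1|p'
}

# Hypothetical helper: succeed if two PCI addresses (domain:bus:device.function)
# resolve to sysfs paths under the same host bridge.
same_host_bridge() {
    a=$(host_bridge_of "$(readlink -f "/sys/bus/pci/devices/$1")")
    b=$(host_bridge_of "$(readlink -f "/sys/bus/pci/devices/$2")")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_host_bridge 0000:03:00.0 0000:04:00.0 && echo "same host bridge"` compares an HCA and a GPU. The two addresses here are placeholders, and the real ones can be read from the `lspci` output.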

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.