C. Installing Mellanox InfiniBand Drivers

Unlike the DGX OS shipped with the NVIDIA DGX server, the DGX software stack for Red Hat-derived operating systems does not include the Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED) for Linux. This avoids an installation in which the MLNX_OFED kernel modules fall out of sync with the Red Hat distribution kernel, which can cause system instability.

To use InfiniBand on the DGX server, do the following.

  1. Determine which MLNX_OFED package supports the latest version of the installed Red Hat Enterprise Linux release.
    1. Visit https://access.redhat.com/articles/3078 and determine the latest Red Hat Enterprise Linux 7 version.
    2. Visit https://docs.mellanox.com/category/mlnxofedib and click the latest MLNX_OFED software version. In the side menu, navigate to Release Notes > General Support in MLNX_OFED > Supported Operating Systems to see which operating systems that MLNX_OFED package supports.
  2. Download the appropriate MLNX_OFED driver package from the Mellanox site and install it.
  3. After installing the MLNX_OFED drivers, install the NVIDIA peer memory module.
    sudo yum install nvidia-peer-memory-dkms
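The steps above can be cross-checked with a small read-only script. This is a minimal sketch, assuming bash; it installs and changes nothing. It prints the Red Hat release and running kernel (to compare against the MLNX_OFED Supported Operating Systems list), the installed MLNX_OFED version via `ofed_info -s`, and whether the peer memory module is loaded. The function name `preflight_report` is hypothetical, and the module name `nv_peer_mem` is an assumption about what the nvidia-peer-memory-dkms package builds.

```shell
#!/usr/bin/env bash
# Read-only preflight check for the InfiniBand installation steps.
# It only reports state; it installs and changes nothing.

# Hypothetical helper name, chosen for illustration.
preflight_report() {
    # Distribution release and running kernel, to compare against the
    # Supported Operating Systems list in the MLNX_OFED release notes.
    if [ -r /etc/redhat-release ]; then
        echo "Release: $(cat /etc/redhat-release)"
    else
        echo "Release: unknown (/etc/redhat-release not found)"
    fi
    echo "Kernel: $(uname -r)"

    # ofed_info is installed with MLNX_OFED; -s prints a short version string.
    if command -v ofed_info >/dev/null 2>&1; then
        echo "MLNX_OFED: $(ofed_info -s)"
    else
        echo "MLNX_OFED: not installed"
    fi

    # Assumption: nvidia-peer-memory-dkms builds a module named nv_peer_mem.
    # /proc/modules lists currently loaded kernel modules, one per line.
    if grep -q '^nv_peer_mem ' /proc/modules 2>/dev/null; then
        echo "nv_peer_mem: loaded"
    else
        echo "nv_peer_mem: not loaded"
    fi
}

preflight_report
```

Running it before step 2 and again after step 3 gives a quick before/after view; on a system where both packages installed correctly, the MLNX_OFED line should show a version and the nv_peer_mem line should read "loaded".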
Note: Although in-box drivers may be available, they are not recommended: they provide lower performance than the official MLNX_OFED drivers and do not support the GPUDirect™ RDMA feature. For more information on configuring the in-box drivers, see the following Red Hat Enterprise Linux documentation: