B. Installing the NVIDIA Mellanox InfiniBand Drivers

This section describes how to install MLNX_OFED on systems that do not yet have it installed. You must use an MLNX_OFED version that has been validated for the RHEL version running on the DGX system. Note that the “dnf update” command, which is run before the NVIDIA driver is installed, updates the system to the latest Red Hat Enterprise Linux release.

  1. Determine which version of Red Hat Enterprise Linux is installed on the DGX system.
    cat /etc/redhat-release
  2. Determine the appropriate MLNX_OFED software bundle to install.

    Refer to the instructions for determining the MLNX_OFED version to install (index.html#determining-mofed-install-version).

  3. Download the MLNX_OFED software bundle.
    1. Visit the Linux InfiniBand Drivers page, scroll down to the Download wizard, and then click the Download tab.
    2. In the MLNX_OFED Download Center matrix, select:
      • The version to install (you may need to select Archive Versions),
      • RHEL/CentOS (under OS Distribution), and
      • The relevant OS Distribution Version and Architecture.
    3. Click the desired ISO/tgz package.

      To obtain the download link, accept the End User License Agreement.

  4. After downloading the correct MLNX_OFED software bundle, proceed with the installation steps.
    1. Revisit the MLNX_OFED Software Releases site and select the MLNX_OFED software version you intend to use.
    2. In the side menu, navigate to Installation > Installing MLNX_OFED and follow the instructions.
      Note: The system may report that additional software must be installed before the installation can proceed. If this message appears, install the required software and then retry the MLNX_OFED driver installation.
  5. If you intend to use NVIDIA GPUDirect Storage (GDS), enable the driver's GDS support according to the instructions at https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#mofed-req-install.
  6. Install nvidia-mlnx-config.
    sudo dnf install -y nvidia-mlnx-config
  7. Install kernel headers and development packages for your kernel.

    These packages are required for the DKMS build of the peer memory module in the next step.

    sudo dnf install -y kernel-headers-$(uname -r) kernel-devel-$(uname -r)
  8. After installing the MLNX_OFED drivers, install the NVIDIA peer memory module.
    sudo dnf install -y nvidia-peer-memory-dkms
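Steps 1 through 3 can be tied together with a quick check that the downloaded bundle was built for the installed release. The following is a minimal bash sketch, not an NVIDIA-provided tool: the function names rhel_tag and bundle_matches_release are hypothetical, and it assumes the bundle file name embeds a distribution tag such as "rhel8.6" (for example, MLNX_OFED_LINUX-<version>-rhel8.6-x86_64.tgz). It checks only the file name; it does not replace the validated-version lookup in step 2.

```shell
# Source this file in bash. Hypothetical helpers, not part of MLNX_OFED.
# Assumption: bundle names look like MLNX_OFED_LINUX-<version>-rhelX.Y-<arch>.tgz

# Print "rhelX.Y" from a release string such as
# "Red Hat Enterprise Linux release 8.6 (Ootpa)"; fail if no version is found.
rhel_tag() {
  local release="$1"
  local ver
  ver="$(printf '%s\n' "$release" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')"
  [ -n "$ver" ] || return 1
  printf 'rhel%s\n' "$ver"
}

# Succeed if the bundle file name contains "-rhelX.Y-" for the given release.
bundle_matches_release() {
  local bundle release tag
  bundle="$(basename "$1")"
  release="$2"
  tag="$(rhel_tag "$release")" || return 1
  case "$bundle" in
    *-"$tag"-*) return 0 ;;   # quoted tag is matched literally (the dot is not a wildcard)
    *) return 1 ;;
  esac
}

# Example:
#   bundle_matches_release ./<bundle>.tgz "$(cat /etc/redhat-release)" && echo "bundle matches"
```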
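After the installer from step 4 finishes, one way to confirm that the intended version is active is to compare the output of ofed_info -s with the version embedded in the bundle file name. This is a hedged sketch: it assumes ofed_info -s prints a single line of the form MLNX_OFED_LINUX-<version>: and that bundles are named MLNX_OFED_LINUX-<version>-<distro>-<arch>.tgz. The helper names installed_ofed_version and bundle_ofed_version are hypothetical.

```shell
# Source this file in bash. Hypothetical helpers, not part of MLNX_OFED.

# Read `ofed_info -s` output on stdin and print <version> from
# a line like "MLNX_OFED_LINUX-<version>:" (assumed format).
installed_ofed_version() {
  sed -n 's/^MLNX_OFED_LINUX-\(.*\):[[:space:]]*$/\1/p'
}

# Print <version> from a bundle name like
# MLNX_OFED_LINUX-<version>-<distro>-<arch>.tgz (assumed naming).
bundle_ofed_version() {
  local name
  name="$(basename "$1")"
  name="${name#MLNX_OFED_LINUX-}"   # drop the fixed prefix
  printf '%s\n' "${name%-*-*}"      # drop the trailing "-<distro>-<arch>.tgz"
}

# Example:
#   [ "$(ofed_info -s | installed_ofed_version)" = "$(bundle_ofed_version ./<bundle>.tgz)" ] && echo "versions match"
```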
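Steps 6 through 8 can also be scripted. The sketch below is a hypothetical bash wrapper (post_ofed_setup and run_or_echo are illustrative names, not NVIDIA tools) that runs the three dnf commands from this section in the same order, so that the kernel headers and development packages are present before the peer memory module is built with DKMS. Setting DRY_RUN=1 prints the commands instead of running them, which is useful for review before touching the system.

```shell
# Source this file in bash. Hypothetical wrapper around steps 6-8.

# Run a command, or only print it when DRY_RUN=1.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Optional argument: kernel release (defaults to the running kernel, as in step 7).
post_ofed_setup() {
  local kver="${1:-$(uname -r)}"
  run_or_echo sudo dnf install -y nvidia-mlnx-config                                  # step 6
  run_or_echo sudo dnf install -y "kernel-headers-${kver}" "kernel-devel-${kver}"     # step 7
  run_or_echo sudo dnf install -y nvidia-peer-memory-dkms                             # step 8
}

# Example:
#   DRY_RUN=1 post_ofed_setup   # review the commands
#   post_ofed_setup             # run them
```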
Note: Although in-box drivers may be available, they are not recommended: they provide lower performance than the official MLNX_OFED drivers and do not support GPUDirect™ RDMA. For more information on configuring the in-box drivers, see the following Red Hat Enterprise Linux documentation: