Installing NVIDIA DOCA-OFED#

The NVIDIA DGX™ Software Stack for Red Hat Enterprise Linux does not include the NVIDIA DOCA™ OFED (OpenFabrics Enterprise Distribution) software for Linux. The software is excluded so that DOCA-OFED, a subset of the full DOCA package, stays in sync with the Red Hat distribution kernel. This topic describes how to download, install, and upgrade the DOCA-OFED software on systems that are running Red Hat Enterprise Linux.

DOCA-Host Installation Profiles#

The DOCA software package contains several subsets called the DOCA-Host installation profiles, which are fully validated and tested installation packages. The following table lists the available DOCA-Host profiles:

| DOCA-Host Profile | Description |
|---|---|
| doca-ofed | Installs the same drivers and tools as MLNX_OFED by using the DOCA-Host package, but without other DOCA functionality. |
| doca-network | Intended for users who want to use only the networking functionality of the DOCA-Host package. |
| doca-all | Intended for users who want the full set of DOCA drivers and libraries, that is, the full DOCA-Host installation. |

For more information, refer to NVIDIA DOCA Profiles.

Prerequisites#

  1. Before installing a different version of the DOCA-OFED software, remove any DOCA-OFED or MLNX_OFED software that is already installed on your system.

    • RPM-based Linux

      # Remove the installed DOCA-OFED software from the host.
      for f in $(rpm -qa | grep -i doca); do sudo dnf -y remove "$f"; done
      
      # Remove the installed MLNX_OFED software.
      sudo /usr/sbin/ofed_uninstall.sh --force
      
      sudo dnf autoremove
      
      sudo dnf makecache
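
After the removal commands finish, it can be useful to confirm that nothing was left behind. The following is a minimal sketch, not part of the official procedure: the helper name `leftover_ofed_packages` and its match pattern (`doca` or `ofed`, case-insensitive) are assumptions chosen for illustration, so adjust the pattern to your system.

```shell
# Print any package names that still look like DOCA or OFED packages.
# Reads one package name per line on stdin (for example, from `rpm -qa`).
# The pattern below is an assumption; widen or narrow it as needed.
leftover_ofed_packages() {
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -Ei 'doca|ofed' || true
}

# Example usage on an RPM-based host:
#   rpm -qa | leftover_ofed_packages
```

Empty output suggests that no matching packages remain. Any names that are printed can be reviewed and removed before the new version is installed.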
      

Installing DOCA-OFED#

If your system is equipped with the NVIDIA® BlueField®-3 DPU, ensure that the DPU is set to NIC mode. (See Identifying Which Mode BlueField is Currently Operating In and Changing BlueField Mode for more information.)

Then proceed with the following instructions to install DOCA. (See DOCA Installation Guide for Linux for more information.)

  1. Install DOCA:

    sudo dnf install -y doca-ofed
    
  2. Optionally, install the mlnxofed-docs documentation package:

    sudo dnf install mlnxofed-docs
    
  3. Load the drivers:

    sudo /etc/init.d/openibd restart
    
  4. Initialize MST:

    sudo mst restart
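
As a quick sanity check after the steps above, the sketch below reports which expected tools are missing from `PATH`. The helper name `missing_cmds` is hypothetical, and the tool names in the usage comment (`ofed_info`, `ibv_devinfo`, `mst`) are an assumption about what the doca-ofed profile places on the system.

```shell
# Print each command name from the arguments that is not found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example usage (tool names are assumptions, adjust as needed):
#   missing_cmds ofed_info ibv_devinfo mst
#   ofed_info -s      # prints the installed OFED version string
#   sudo mst status   # lists the MST devices
```

If `missing_cmds` prints nothing, all of the listed tools were found.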
    

For more information about installing the doca-ofed profile, refer to Installing Software on Host.

Additional Steps For Installing DOCA-OFED on Systems with ConnectX-7 Cards#

  1. Update the firmware as follows:

    sudo dnf install mlnx-fw-updater
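
To see which firmware version each adapter currently reports, one option is to parse the output of `ibv_devinfo`. The sketch below assumes that `ibv_devinfo` is available on the host and that its output contains `hca_id:` and `fw_ver:` lines; the helper name `fw_versions` is hypothetical.

```shell
# Read `ibv_devinfo` output on stdin and print "<device> <firmware version>"
# for every adapter, pairing each hca_id line with the fw_ver line after it.
fw_versions() {
    awk '
        $1 == "hca_id:" { dev = $2 }
        $1 == "fw_ver:" { print dev, $2 }
    '
}

# Example usage:
#   ibv_devinfo | fw_versions
```

Comparing the output before and after the firmware update is a simple way to check whether the reported version changed.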
    

Additional Steps For Installing DOCA-OFED on Systems with BlueField-3 in NIC Mode#

  1. Determine the BlueField-3 Device ID.

    As described in the NVIDIA BlueField-3 Networking Platform User Guide, the device ID of all BlueField-3 DPUs is 41692 (0xA2DC). To list all BlueField-3 devices, run the following command:

    lspci -d :a2dc
    

    The output should look similar to the following:

    0006:03:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
    0006:03:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
    0016:03:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
    0016:03:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
    
  2. Install the kernel-modules-extra package if it has not already been installed:

    sudo dnf install -y kernel-modules-extra-$(uname -r)
    
  3. Determine if the kernel version on your host is supported as shown in Supported Host OS per DOCA-Host Installation Profile.

    If the kernel version is not supported, follow the instructions described in DOCA Extra Package and doca-kernel-support.

  4. RShim is used to update the BlueField-3 device firmware. (See Deploying BlueField Software from Host for more information.) RShim was installed when the sudo dnf install doca-ofed command was run.

  5. Start RShim:

    sudo systemctl daemon-reload
    
    sudo systemctl enable rshim
    
    sudo systemctl start rshim
    
    sudo systemctl status rshim
    

    Note

    The output should look similar to the following:

     rshim.service - rshim driver for BlueField SoC
         Loaded: loaded (/usr/lib/systemd/system/rshim.service; enabled; preset: disabled)
         Active: active (running) since Tue 2025-10-14 15:27:40 PDT; 7s ago
         ...
    
  6. To confirm that the NVIDIA BlueField-3 SoC Management Interface is on the system, run the following command to print the PCI BDF of each BlueField-3 SoC Management Interface device:

    sudo lspci | grep "BlueField-3 SoC Management Interface"
    

    The output should look similar to the following:

    29:00.2 DMA controller: Mellanox Technologies MT43244 BlueField-3 SoC Management Interface (rev 01)
    aa:00.2 DMA controller: Mellanox Technologies MT43244 BlueField-3 SoC Management Interface (rev 01)
    
  7. If the BlueField-3 SoC Management Interface is on the system, install the BF-bundle:

    sudo bfb-install --rshim rshim<N> --bfb <image_path.bfb>
    

    Where <N> is the RShim device identifier (/dev/rshimN).

  8. Re-create an initramfs image.

    sudo dracut -f
    
  9. Reboot the system.

    sudo systemctl reboot
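
The device-discovery parts of the procedure above (converting the device ID, finding the SoC Management Interface devices, and choosing the RShim device for `bfb-install`) can be sketched as small shell helpers. The function names are hypothetical, and the sketch assumes that RShim devices appear as `/dev/rshim<N>` entries, as the `bfb-install` step describes.

```shell
# Convert a decimal PCI device ID to the lowercase hex form used by `lspci -d`.
devid_to_hex() {
    printf '%x\n' "$1"
}

# Read lspci output on stdin and print the PCI address (first field) of every
# BlueField-3 SoC Management Interface line.
mgmt_interface_bdfs() {
    awk '/BlueField-3 SoC Management Interface/ { print $1 }'
}

# List RShim device names (rshim0, rshim1, ...) found under a directory.
# The directory defaults to /dev; it is a parameter only so that the function
# can be tried against a scratch directory.
list_rshim_devices() {
    dev_dir=${1:-/dev}
    for entry in "$dev_dir"/rshim*; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# Example usage:
#   lspci -d ":$(devid_to_hex 41692)"    # same as: lspci -d :a2dc
#   sudo lspci | mgmt_interface_bdfs
#   list_rshim_devices                   # names to pass to bfb-install --rshim
```

Keeping the parsing in functions that read from stdin makes them easy to try against saved `lspci` output before running them on a live system.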
    

Additional Information

The nvidia-peermem Kernel Module#

The nvidia-peermem kernel module registers the NVIDIA GPU with the InfiniBand subsystem by using peer-to-peer APIs provided by the NVIDIA GPU driver. For more information, refer to Using nvidia-peermem in the NVIDIA GPUDirect RDMA documentation.

The nvidia-peermem module is loaded automatically on reboot. When the NVIDIA System Core group was installed, it created a TuneD profile, /usr/lib/tuned/profiles/nvidia-base/tuned.conf, whose [modules] section contains an nvidia-peermem entry that loads the module at boot.
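
To check that the TuneD profile really contains the entry, the `[modules]` section can be extracted with a short script. This is a minimal sketch: the function names are hypothetical, and it assumes that the profile uses plain INI-style `[section]` headers with one entry per line.

```shell
# Print the entries in the [modules] section of a TuneD profile file.
tuned_modules_entries() {
    awk '
        # A section header: remember whether it is [modules].
        /^\[/ { in_modules = ($0 == "[modules]"); next }
        # Inside [modules]: print non-empty, non-comment lines.
        in_modules && NF && $0 !~ /^#/ { print }
    ' "$1"
}

# Succeed if the [modules] section of the profile has an nvidia-peermem entry.
has_peermem_entry() {
    tuned_modules_entries "$1" | grep -q '^nvidia-peermem'
}

# Example usage:
#   has_peermem_entry /usr/lib/tuned/profiles/nvidia-base/tuned.conf && echo "entry present"
#   lsmod | grep nvidia_peermem   # lsmod shows module names with underscores
```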

Install nvidia-mlnx-config Package For DOCA Performance Improvement#

The nvidia-mlnx-config package that is included in the nvidia-driver-local-repo can be installed to provide better performance on systems where DOCA is installed. This package does the following two things:

  • Runs the setpci command to set MaxReadReq (MRRS) to an optimum performance setting.

  • On Ampere platforms (DGX A100, DGX A800), sets the MAX_ACC_OUT_READ PCI parameter to the value that the firmware needs in order to configure the optimum performance setting. On other platform types, this is unnecessary because the firmware configures the optimum performance setting without MAX_ACC_OUT_READ being modified.

Install the nvidia-mlnx-config package as follows:

sudo dnf install nvidia-mlnx-config

Reboot the system for the new settings to take effect.
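
Once the system has rebooted, the MaxReadReq value of a device can be inspected with `lspci -vv`. The sketch below extracts the value from that output; the helper name `max_read_req_bytes` is hypothetical, and `<BDF>` in the usage comment is a placeholder for the PCI address of the adapter that you want to check.

```shell
# Read `lspci -vv` output on stdin and print each MaxReadReq value in bytes.
max_read_req_bytes() {
    sed -n 's/.*MaxReadReq \([0-9][0-9]*\) bytes.*/\1/p'
}

# Example usage (root privileges are typically needed to read PCIe capabilities):
#   sudo lspci -s <BDF> -vv | max_read_req_bytes
```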