Installing NVIDIA DOCA-OFED#
The NVIDIA DGX™ Software Stack for Red Hat Enterprise Linux does not include the NVIDIA DOCA™ OFED (OpenFabrics Enterprise Distribution) software for Linux. This is to ensure that the DOCA-OFED software, a subset of the full DOCA package, is in sync with the Red Hat distribution kernel. This topic describes how to download, install, and upgrade the DOCA-OFED software on systems that are running Red Hat Enterprise Linux.
DOCA-Host Installation Profiles#
The DOCA software package contains several subsets called the DOCA-Host installation profiles, which are fully validated and tested installation packages. The following table lists the available DOCA-Host profiles:
| DOCA-Host Profile | Description |
|---|---|
| doca-ofed | Allows you to install the same drivers and tools of MLNX_OFED using the DOCA-Host package, but without other DOCA functionality. |
| doca-network | Intended for users who want to use only the networking functionality of the DOCA-Host package. |
| doca-all | Intended for users who want to use the full extent of DOCA drivers and libraries, the full DOCA-Host installation. |
For more information, refer to NVIDIA DOCA Profiles.
Prerequisites#
Before installing a different version of the DOCA-OFED software, you must remove any DOCA-OFED or MLNX_OFED software that is already installed on your system.
RPM-based Linux
# Remove the installed DOCA-OFED software from the host.
for f in $(rpm -qa | grep -i doca ) ; do sudo dnf -y remove $f; done
# Remove the installed MLNX_OFED software.
sudo /usr/sbin/ofed_uninstall.sh --force
sudo dnf autoremove
sudo dnf makecache
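Before installing, it can help to confirm that no DOCA-OFED or MLNX_OFED packages remain. The following is a minimal sketch, not part of the official procedure: the function name `list_ofed_packages` and the name patterns (`doca`, `mlnx`, `ofed`) are assumptions chosen for illustration and may match unrelated packages on some systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter a package list (one name per line on stdin)
# down to names that look like DOCA or MLNX_OFED packages.
# The patterns are an assumption, not an official list.
list_ofed_packages() {
    grep -i -E 'doca|mlnx|ofed' || true
}

# On an RPM-based host, feed the helper the installed package list.
if command -v rpm >/dev/null 2>&1; then
    leftover=$(rpm -qa | list_ofed_packages)
    if [ -n "$leftover" ]; then
        echo "Packages still installed:"
        echo "$leftover"
    else
        echo "No DOCA or MLNX_OFED packages found."
    fi
fi
```

Review any reported packages by hand before removing them, since the patterns are deliberately broad.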
Installing DOCA-OFED#
If your system is equipped with the NVIDIA® BlueField®-3 DPU, ensure that the DPU is set in NIC mode. (See Identifying Which Mode BlueField is Currently Operating In and Changing BlueField Mode for more information.)
Then proceed with the following instructions to install DOCA. (See DOCA Installation Guide for Linux for more information.)
Install DOCA:
sudo dnf install -y doca-ofed
The mlnxofed-docs documentation can be installed as follows:
sudo dnf install mlnxofed-docs
Load the drivers:
sudo /etc/init.d/openibd restart
Initialize MST:
sudo mst restart
For more information about installing the doca-ofed profile, refer to Installing Software on Host.
Additional Steps For Installing DOCA-OFED on Systems with ConnectX-7 Cards#
Update the firmware as follows:
sudo dnf install mlnx-fw-updater
Additional Steps For Installing DOCA-OFED on Systems with BlueField-3 in NIC Mode#
Determine the BlueField-3 Device ID.
As described in the NVIDIA BlueField-3 Networking Platform User Guide, the device ID of all [BlueField] DPUs is 41692 [0xA2DC]. To see all BlueField devices, run the following command:
lspci -d :a2dc
The output should look similar to the following:
0006:03:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0006:03:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0016:03:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
0016:03:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
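The decimal and hexadecimal forms of the device ID quoted above can be cross-checked in the shell, and the `lspci` output can be reduced to a count of PCI functions. This is a minimal sketch: the helper name `count_bluefield_functions` is hypothetical, and it assumes the `lspci` line format shown above.

```shell
#!/usr/bin/env bash
# Convert the hexadecimal device ID to decimal.
printf '%d\n' 0xa2dc    # prints 41692

# Hypothetical helper: count lspci lines (on stdin) that mention BlueField-3.
# Each matching line corresponds to one PCI function.
count_bluefield_functions() {
    grep -c 'BlueField-3' || true
}

if command -v lspci >/dev/null 2>&1; then
    echo "BlueField-3 PCI functions found: $(lspci -d :a2dc | count_bluefield_functions)"
fi
```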
Install the kernel-modules-extra package if it has not already been installed:
sudo dnf install -y kernel-modules-extra-$(uname -r)
Determine if the kernel version on your host is supported as shown in Supported Host OS per DOCA-Host Installation Profile.
If the kernel version is not supported, follow the instructions described in DOCA Extra Package and doca-kernel-support.
RShim is used to update the BlueField-3 device firmware. (See Deploying BlueField Software from Host for more information.) RShim was installed when the sudo dnf install doca-ofed command was run.
Start RShim:
sudo systemctl daemon-reload
sudo systemctl enable rshim
sudo systemctl start rshim
sudo systemctl status rshim
Note
The output should look similar to the following:
● rshim.service - rshim driver for BlueField SoC
     Loaded: loaded (/usr/lib/systemd/system/rshim.service; enabled; preset: disabled)
     Active: active (running) since Tue 2025-10-14 15:27:40 PDT; 7s ago
...
To confirm that the NVIDIA BlueField-3 SoC Management Interface is on the system, run the following command to print the PCI BDF of each BlueField-3 SoC Management Interface device:
sudo lspci | grep "BlueField-3 SoC Management Interface"
The output should look similar to the following:
29:00.2 DMA controller: Mellanox Technologies MT43244 BlueField-3 SoC Management Interface (rev 01)
aa:00.2 DMA controller: Mellanox Technologies MT43244 BlueField-3 SoC Management Interface (rev 01)
If the BlueField-3 SoC Management Interface is on the system, install the BF-bundle:
sudo bfb-install --rshim rshim<N> --bfb <image_path.bfb>
Where <N> is the RShim device identifier (/dev/rshimN).
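The SoC Management Interface check and the install command above can be combined into a dry run that prints one `bfb-install` command per RShim device without executing it. This is a sketch under assumptions: the helper names `has_soc_mgmt` and `rshim_names` and the `BFB_IMAGE` variable are hypothetical, and the image path is a placeholder that you must supply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if lspci output on stdin lists a
# BlueField-3 SoC Management Interface.
has_soc_mgmt() {
    grep -q 'BlueField-3 SoC Management Interface'
}

# Hypothetical helper: print the RShim device names (rshim0, rshim1, ...)
# found in a directory, /dev by default.
rshim_names() {
    local dir=${1:-/dev} path
    for path in "$dir"/rshim*; do
        [ -e "$path" ] && basename "$path"
    done
    return 0
}

# Placeholder: set BFB_IMAGE to the path of your BF-bundle image.
BFB_IMAGE=${BFB_IMAGE:-"<image_path.bfb>"}

if command -v lspci >/dev/null 2>&1 && lspci | has_soc_mgmt; then
    for name in $(rshim_names); do
        # Dry run: print the command instead of running it.
        echo "sudo bfb-install --rshim $name --bfb $BFB_IMAGE"
    done
fi
```

Printing the commands first makes it possible to confirm the RShim device names before any image is written.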
Re-create an initramfs image.
sudo dracut -f
Reboot the system.
sudo systemctl reboot
Additional Information
MFT download instructions: Updating Firmware for a Single Network Interface Card (NIC)
Changing BlueField-3 BMC default password: Changing Default Password
The nvidia-peermem Kernel Module#
The nvidia-peermem kernel module registers the NVIDIA GPU with the InfiniBand subsystem by using
peer-to-peer APIs provided by the NVIDIA GPU driver. For more information, refer to Using nvidia-peermem
in the NVIDIA GPUDirect RDMA documentation.
The nvidia-peermem module will be automatically reloaded on reboot by the following means:
When the NVIDIA System Core group was installed,
a TuneD profile, /usr/lib/tuned/profiles/nvidia-base/tuned.conf, was created which contains an
nvidia-peermem entry in its [modules] section that will load nvidia-peermem automatically on reboot.
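After a reboot, the loaded-module list shows whether the module was actually loaded. This is a minimal sketch: the helper name `peermem_loaded` is hypothetical, and it relies on the kernel reporting module names with underscores, so nvidia-peermem appears as `nvidia_peermem`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the module list on stdin
# (in /proc/modules or lsmod format) contains nvidia_peermem.
peermem_loaded() {
    grep -q '^nvidia_peermem[[:space:]]'
}

if [ -r /proc/modules ]; then
    if peermem_loaded < /proc/modules; then
        echo "nvidia-peermem is loaded"
    else
        echo "nvidia-peermem is not loaded"
    fi
fi

# Show the nvidia-peermem entry in the TuneD profile, if the file exists.
profile=/usr/lib/tuned/profiles/nvidia-base/tuned.conf
if [ -r "$profile" ]; then
    grep -n 'nvidia-peermem' "$profile" || echo "no nvidia-peermem entry in $profile"
fi
```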
Install nvidia-mlnx-config Package For DOCA Performance Improvement#
The nvidia-mlnx-config package that is included in the nvidia-driver-local-repo
can be installed to provide better performance on systems where DOCA is installed.
This package does the following two things:
The setpci command is run to set MaxReadReq (MRRS) to an optimum performance setting.
On Ampere platforms (DGX A100, DGX A800), the MAX_ACC_OUT_READ PCI parameter is set to the correct value for the firmware to be able to configure the optimum performance setting. It isn’t necessary to set the MAX_ACC_OUT_READ PCI parameter on other platform types, since the firmware configures the optimum performance setting without MAX_ACC_OUT_READ being modified.
Install the nvidia-mlnx-config package as follows:
sudo dnf install nvidia-mlnx-config
A reboot is required to incorporate these new settings.
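After the reboot, the MaxReadReq value that is in effect can be read back from the verbose `lspci` output. This is a sketch under assumptions: the helper name `max_read_req` and the `BDF` variable are hypothetical, it assumes the usual `MaxReadReq <n> bytes` wording of `lspci -vv`, and it makes no claim about which value is optimal for your platform.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the MaxReadReq size in bytes from
# `lspci -vv` output on stdin.
max_read_req() {
    sed -n 's/.*MaxReadReq \([0-9][0-9]*\) bytes.*/\1/p'
}

# Placeholder: set BDF to the PCI address of the adapter to inspect,
# for example one of the addresses printed by `lspci -d 15b3:`.
BDF=${BDF:-}
if [ -n "$BDF" ] && command -v lspci >/dev/null 2>&1; then
    # Root privileges are typically needed to read the capability registers.
    sudo lspci -vv -s "$BDF" | max_read_req
fi
```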