Date: 2022-05-16
Installation Guide
NVIDIA DOCA Installation Guide
This document details the necessary steps to set up NVIDIA DOCA in your environment.
There are two ways to install the NVIDIA BlueField-2 DPU software:
- Using the SDK Manager, which provides a GUI and a CLI for full BlueField-2 installation
- Manual installation with a step-by-step procedure
1.1. Supported Platforms
|Model Number|Description|
|MBF2H322A-AEEOT|NVIDIA® BlueField®-2 P-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x8, Crypto Enabled, 8GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2H322A-AENOT|NVIDIA BlueField-2 P-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x8, Crypto Disabled, 8GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2H332A-AEEOT|NVIDIA BlueField-2 P-Series DPU 25GbE Dual-Port SFP56, PCIe Gen3/4 x8, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2H332A-AENOT|NVIDIA BlueField-2 P-Series DPU 25GbE Dual-Port SFP56, PCIe Gen3/4 x8, Crypto Disabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2H516A-CEEOT|NVIDIA BlueField-2 P-Series DPU 100GbE Dual-Port QSFP56, PCIe Gen4 x16, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL|
|MBF2H516A-CENOT|NVIDIA BlueField-2 P-Series DPU 100GbE Dual-Port QSFP56, PCIe Gen4 x16, Crypto Disabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL|
|MBF2H516A-EEEOT|NVIDIA BlueField-2 P-Series DPU 100GbE/EDR/HDR100 VPI Dual-Port QSFP56, PCIe Gen4 x16, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL|
|MBF2H516A-EENOT|NVIDIA BlueField-2 P-Series DPU 100GbE/EDR VPI Dual-Port QSFP56; PCIe Gen4 x16; Crypto Disabled; 16GB on-board DDR; 1GbE OOB management; FHHL|
|MBF2H516B-CENOT|NVIDIA BlueField-2 P-Series BF2500 DPU Controller, 100GbE Dual-Port QSFP56, PCIe Gen4 x16, Crypto Disabled, 16GB on-board DDR, 1GbE OOB Management, Tall Bracket, FHHL|
|MBF2H516B-EENOT|NVIDIA BlueField-2 P-Series BF2500 DPU Controller, 100GbE/EDR/HDR100 VPI Dual-Port QSFP56, PCIe Gen4 x16, Crypto Disabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL|
|MBF2M322A-AEEOT|NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen3/4 x8, Crypto, 8GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2M322A-AENOT|NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen3/4 x8, Crypto Disabled, 8GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2M332A-AEEOT|NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x8, Crypto, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2M332A-AENOT|NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x8, Crypto Disabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, HHHL|
|MBF2M516A-CEEOT|NVIDIA BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL|
|MBF2M516A-CENOT|NVIDIA BlueField-2 E-Series DPU 100GbE Dual-Port QSFP56, PCIe Gen4 x16, Crypto Disabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL|
|MBF2M516A-EEEOT|NVIDIA BlueField-2 E-Series DPU 100GbE/EDR/HDR100 VPI Dual-Port QSFP56, PCIe Gen4 x16, Crypto Enabled, 16GB on-board DDR, 1GbE OOB management, Tall Bracket, FHHL|
|MBF2M516A-EENOT|NVIDIA BlueField-2 E-Series DPU 100GbE/EDR/HDR100 VPI Dual-Port QSFP56; PCIe Gen4 x16; Crypto Disabled; 16GB on-board DDR; 1GbE OOB management; FHHL|
|900-21004-0030-000|NVIDIA BlueField-2 A30X, P1004 SKU 205, Generic, GA100, 24GB HBM2e, PCIe Passive Dual Slot 230W Gen 4.0, DPU Crypto ON W/ Bkt, 1 Dongle, Black, HF, VCPD|
|900-21004-0010-000|NVIDIA BlueField-2 A100X, P1004 SKU 230, Generic, GA100, 80GB HBM2e, PCIe Passive Dual Slot 300W Gen 4.0, DPU Crypto ON W/ Bkt, 1 Dongle, Black, HF, VCPD|
1.2. Hardware Prerequisites
This quick start guide assumes that an NVIDIA® BlueField® DPU has been installed in a server according to the instructions detailed in your DPU's hardware user guide.
1.3. DOCA Packages
|Device
|Component
|Version
|Description
|
Host
|DOCA SDK
|0.4.0
|Software development kit package for developing host software
|DOCA Runtime
|1.3.0
|Runtime libraries required to run DOCA-based software applications on host
|DOCA tools
|1.3.0
|DOCA tools for developers and administrators on host
|
Arm emulated (Qemu) development container
|3.9.0
|Linux-based BlueField Arm emulated container for developers
|
Target BlueField-2 DPU (Arm)
|BlueField OS
|3.9.0
|BlueField OS image and firmware
|DOCA SDK
|0.4.0
|Software development kit packages for developing Arm software
|DOCA runtime
|1.3.0
|Runtime libraries required to run DOCA-based software applications on Arm
|DOCA tools
|1.3.0
|DOCA tools for developers and administrators for Arm target
1.4. Supported Operating System
The operating system supported on the BlueField DPU is Ubuntu 20.04.
The following operating systems are supported on the host machine:
- Ubuntu 18.04/20.04
- CentOS/RHEL 7.6/8.0/8.2
- Debian 10.8
1.5. Supported Kernel Versions
Only the following generic kernel versions are supported for the DOCA local repo package for host installation, whether it is installed through SDK Manager (SDKM) or manually.
|Host Operating System|Kernel Support|
|CentOS 7.6|3.10.0-957.el7.x86_64|
|CentOS 8.0|4.18.0-80.el8.x86_64|
|CentOS 8.2|4.18.0-193.el8.x86_64|
|RHEL 7.6|3.10.0-957.el7.x86_64|
|RHEL 8.0|4.18.0-80.el8.x86_64|
|RHEL 8.2|4.18.0-193.el8.x86_64|
|Ubuntu 18.04|4.15.0-20-generic|
|Ubuntu 20.04|5.4.0-26-generic|
|Debian 10.8|4.19.0-14-amd64|
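Because only one kernel release is listed per host OS, a quick pre-check before installing can save a failed attempt. The following is a minimal sketch, not part of the DOCA tooling: the function names supported_kernel and is_supported_kernel are hypothetical, and the OS labels are simply the strings used in the table above.

```shell
#!/usr/bin/env bash
# Sketch only: supported_kernel and is_supported_kernel are hypothetical helpers.
# The kernel strings are copied from the Supported Kernel Versions table.

# Print the supported kernel for a host OS label, or return 1 if the OS is not listed.
supported_kernel() {
    case "$1" in
        "CentOS 7.6"|"RHEL 7.6") echo "3.10.0-957.el7.x86_64" ;;
        "CentOS 8.0"|"RHEL 8.0") echo "4.18.0-80.el8.x86_64" ;;
        "CentOS 8.2"|"RHEL 8.2") echo "4.18.0-193.el8.x86_64" ;;
        "Ubuntu 18.04")          echo "4.15.0-20-generic" ;;
        "Ubuntu 20.04")          echo "5.4.0-26-generic" ;;
        "Debian 10.8")           echo "4.19.0-14-amd64" ;;
        *) return 1 ;;
    esac
}

# Succeed only if the kernel release ($2, default: the running kernel)
# matches the table entry for the OS label in $1.
is_supported_kernel() {
    local os="$1" kernel="${2:-$(uname -r)}" expected
    expected="$(supported_kernel "$os")" || return 1
    [ "$kernel" = "$expected" ]
}
```

For example, `is_supported_kernel "Ubuntu 20.04" || echo "kernel not in the supported list"` compares the running kernel against the Ubuntu 20.04 entry.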
NVIDIA SDK Manager supports DOCA installation, including software packages on the host and the BlueField-2 target.
- To use the SDK Manager GUI, please refer to NVIDIA SDK Manager GUI installation guide for DOCA for detailed instructions.
- To use the SDK Manager CLI, please refer to NVIDIA SDK Manager CLI installation guide for DOCA for detailed instructions.
SDK Manager installation requires an internet connection through the out-of-band (OOB) port.
This guide provides the minimal first-step instructions for setting up DOCA on a standard system.
3.1. Installation Files
3.2. Software Prerequisites
- If you wish to continue without the DOCA local repo package for host, install the minimal tools needed on the host to allow managing and flashing new firmware on the BlueField.
For Ubuntu/Debian
- Download the DOCA Tools package from Installation Files section for the host.
- Unpack the deb repo. Run:
sudo dpkg -i doca-host-repo-ubuntu<version>_amd64.deb
- Perform apt update. Run:
sudo apt-get update
- Run apt install for DOCA tools:
sudo apt install doca-tools
For CentOS/RHEL
- Download the DOCA Tools package from Installation Files section for the x86 host.
- Unpack the RPM repo. Run:
sudo rpm -Uvh doca-host-repo-rhel<version>.x86_64.rpm
- Run yum install for DOCA runtime, tools, and SDK.
sudo yum install doca-runtime
sudo yum install doca-tools
sudo yum install doca-sdk
Note: Skip the following step to proceed without the DOCA local repo package for host.
- Alternatively, to continue with the DOCA local repo package for host installation:
Installing DOCA Local Repo Package on Ubuntu Host
- Download the DOCA SDK, DOCA Runtime, and DOCA Tools package from Installation Files section for the host.
- Unpack the deb repo. Run:
sudo dpkg -i doca-host-repo-ubuntu<version>_amd64.deb
- Perform apt update. Run:
sudo apt-get update
- Run apt install for DOCA runtime, tools, and SDK.
sudo apt install doca-runtime
sudo apt install doca-tools
sudo apt install doca-sdk
Installing DOCA Local Repo Package on CentOS Host
- Download the DOCA SDK, DOCA Runtime, and DOCA Tools package from Installation Files section for the x86 host.
- Install the following software dependencies. Run:
sudo yum install -y epel-release
sudo yum install -y uriparser-devel
sudo yum install -y 'dnf-command(config-manager)'
sudo dnf -y install dnf-plugins-core
sudo yum install -y epel-release
sudo dnf config-manager --set-enabled PowerTools
sudo yum install meson
- Unpack the RPM repo. Run:
sudo rpm -Uvh doca-host-repo-rhel<version>.x86_64.rpm
- Run yum install for DOCA runtime, tools, and SDK.
sudo yum install doca-runtime
sudo yum install doca-tools
sudo yum install doca-sdk
Installing DOCA Local Repo Package on RHEL Host
- Open a RedHat account.
- Log into RedHat website via the developers tab.
- Create a developer user.
- Run:
subscription-manager register --username=<username> --password=<password>
To extract the pool ID, run:
subscription-manager list --available --all
...
Subscription Name: Red Hat Developer Subscription for Individuals
Provides: Red Hat Developer Tools (for RHEL Server for ARM)
...
Red Hat CodeReady Linux Builder for x86_64
...
Pool ID: <pool-id>
...
Use the pool ID of the subscription whose Subscription Name and Provides fields include Red Hat CodeReady Linux Builder for x86_64.
- Run:
subscription-manager attach --pool=<pool-id>
subscription-manager repos --enable codeready-builder-for-rhel-8-x86_64-rpms
- Install the DOCA local repo package for host. Run:
rpm -Uvh doca-host-repo-rhel<version>.x86_64.rpm
yum makecache
sudo yum install doca-runtime
sudo yum install doca-tools
sudo yum install doca-sdk
- Sign out from your RHEL account. Run:
subscription-manager remove --all
subscription-manager unregister
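Picking the pool ID out of the subscription listing by eye is error-prone when several subscriptions are available. The sketch below shows one way to extract it. The function name find_pool_id is hypothetical, and the parser assumes the layout shown in the RHEL steps above: each subscription block starts with a "Subscription Name:" line and its "Pool ID:" line comes after the "Provides:" entries.

```shell
#!/usr/bin/env bash
# Sketch only: find_pool_id is a hypothetical helper, not a subscription-manager feature.
# Reads `subscription-manager list --available --all` output on stdin and prints
# the Pool ID of the first subscription block that mentions the product in $1.
find_pool_id() {
    awk -v product="$1" '
        /^Subscription Name:/ { found = 0 }        # a new subscription block starts
        index($0, product)    { found = 1 }        # the product appears in this block
        /^Pool ID:/ && found  { print $3; exit }   # first matching block wins
    '
}
```

A possible use is `subscription-manager list --available --all | find_pool_id "Red Hat CodeReady Linux Builder for x86_64"`, with the printed value passed to `subscription-manager attach --pool=`.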
The upgrade takes effect only after mlxfwreset, which is performed in later steps.
- Initialize MST. Run:
sudo mst start
- Reset the nvconfig parameters to their default values:
sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -y reset
Reset configuration for device /dev/mst/mt41686_pciconf0? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
- Skip this step if your BlueField DPU is Ethernet only. Refer to Supported Platforms to learn your DPU type.
If you have a VPI DPU, the default link type of the ports is IB. To verify your link type, run:
sudo mst start
sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q | grep -i link_type
Configurations: Default Current Next Boot
* LINK_TYPE_P1 IB(1) ETH(2) IB(1)
* LINK_TYPE_P2 IB(1) ETH(2) IB(1)
Note: If your DPU is Ethernet capable only, then the sudo mlxconfig -d <device> command will not provide an output.
If the current link type is set to IB, run the following command to change it to Ethernet:
sudo mlxconfig -d /dev/mst/mt41686_pciconf0 s LINK_TYPE_P1=2 LINK_TYPE_P2=2
- Assign a static IP to the tmfifo_net0 interface (RShim host interface).
Note: Skip this step if you are installing the DOCA image on multiple DPUs.
ifconfig tmfifo_net0 192.168.100.1 netmask 255.255.255.252 up
- Verify that RShim is active.
sudo systemctl status rshim
The expected status is active (running). If the RShim service does not launch automatically, run:
sudo systemctl enable rshim
sudo systemctl start rshim
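The link-type check in the steps above can be scripted so that the change to Ethernet is applied only when a port currently reports IB. This is a minimal sketch: current_link_type is a hypothetical helper name, and it assumes the three-column Default/Current/Next Boot layout printed by the query with the -e flag, as in the sample output in this section.

```shell
#!/usr/bin/env bash
# Sketch only: current_link_type is a hypothetical helper.
# Reads `mlxconfig -d <device> -e q` output on stdin and prints the value in
# the "Current" column for port $1 (1 or 2). Prints nothing if the key is absent,
# which is the expected case for an Ethernet-only DPU.
current_link_type() {
    awk -v key="LINK_TYPE_P$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == key) { print $(i + 2); exit }   # key, Default, Current
        }
    '
}
```

For instance, `sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q | current_link_type 1` would print `IB(1)` or `ETH(2)` for the first port.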
3.3. Image Installation
Users have two options for installing DOCA on the DPU:
- Upgrading the full DOCA image on the DPU (recommended) - this option overwrites the entire boot partition.
- Upgrading DOCA local repo package on the DPU – this option upgrades DOCA components without overwriting the boot partition. Use this option to preserve configurations or files on the DPU itself.
3.3.1. Installing Full DOCA Image on DPU
If you are installing DOCA on multiple DPUs, skip to section Installing Full DOCA Image on Multiple DPUs.
This step overwrites the entire boot partition.
Ubuntu users are required to provide a unique password that will be applied at the end of the BlueField OS image installation. This password needs to be defined in a bf.cfg configuration file.
To set the password for the "ubuntu" user:
- Create password hash. Run:
# openssl passwd -1
Password:
Verifying - Password:
$1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1
- Add the password hash in quotes to the bf.cfg file:
# sudo vim bf.cfg
ubuntu_PASSWORD='$1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1'
When running the installation command, use the --config flag to provide the file containing the password:
sudo bfb-install --rshim <rshimN> --bfb <image_path.bfb> --config bf.cfg
Note: If --config is not used, then upon first login to the BlueField device, users will be asked to update their password.
The following is an example of Ubuntu installation assuming the "pv" Linux tool has been installed (to view the installation progress).
sudo bfb-install --rshim rshim0 --bfb DOCA_<version>-aarch64.bfb --config bf.cfg
Pushing bfb
1.08GiB 0:00:57 [19.5MiB/s] [ <=> ]
Collecting BlueField booting status. Press Ctrl+C to stop…
INFO[BL2]: start
INFO[BL2]: DDR POST passed
INFO[BL2]: UEFI loaded
INFO[BL31]: start
INFO[BL31]: runtime
INFO[UEFI]: eMMC init
INFO[UEFI]: eMMC probed
INFO[UEFI]: PCIe enum start
INFO[UEFI]: PCIe enum end
INFO[MISC]: Ubuntu installation started
INFO[MISC]: Installation finished
INFO[MISC]: Rebooting...
Note: This installation sets up the OVS bridge.
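When the bf.cfg step in this section is scripted rather than done in an editor, the main pitfall is the shell expanding the `$` characters in the hash. The sketch below writes the file without that problem. The function name write_bf_cfg is hypothetical, and the only line it writes is the ubuntu_PASSWORD entry shown earlier in this section.

```shell
#!/usr/bin/env bash
# Sketch only: write_bf_cfg is a hypothetical helper.
# $1 = hash produced by `openssl passwd -1`, $2 = output path (default: ./bf.cfg).
write_bf_cfg() {
    local hash="$1" cfg="${2:-bf.cfg}"
    # Passing the hash as a printf argument keeps its "$" characters literal;
    # the single quotes written to the file match the quoting shown in this guide.
    printf "ubuntu_PASSWORD='%s'\n" "$hash" > "$cfg"
}
```

Call it with the hash in single quotes, for example `write_bf_cfg '$1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1'`, then pass the resulting file to bfb-install with --config.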
3.3.2. Installing Full DOCA Image on Multiple DPUs
On a host with multiple DPUs, the BFB image can be installed on all of them using the multi-bfb-install script.
./bfb-multi-install --bfb <bfb-file> --password <password>
This script detects the number of RShim devices and configures them statically.
- For Ubuntu – the script creates a configuration file /etc/netplan/20-tmfifo.yaml
- For CentOS/RHEL 7.6 – the script creates a configuration file /etc/sysconfig/network-scripts/ifcfg-br_tmfifo
- For CentOS/RHEL 8.0 and 8.2 – the script installs the bridge-utils package to use the brctl command, creates the tm-br bridge, and connects all RShim interfaces to it
After the installation is complete, the configuration of the bridge and each RShim interface can be observed using ifconfig. The expected result is to see the IP on the tm-br bridge configured to 192.168.100.1 with subnet mask 255.255.255.0.
To log into BlueField with rshim0, run:
ssh ubuntu@192.168.100.2
For each RShim after that, add 1 to the fourth octet of the IP address (e.g., ubuntu@192.168.100.3 for rshim1, ubuntu@192.168.100.4 for rshim2, etc).
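The addressing rule above (rshim0 maps to 192.168.100.2, and each later RShim adds 1 to the fourth octet) is easy to encode when looping over several DPUs. A minimal sketch follows. The function name rshim_ip is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: rshim_ip is a hypothetical helper.
# Prints the DPU address for an RShim device, given either "rshimN" or just N,
# following the 192.168.100.(N+2) pattern described in this section.
rshim_ip() {
    local index="${1#rshim}"   # strip an optional "rshim" prefix
    echo "192.168.100.$((index + 2))"
}
```

With this helper, `ssh ubuntu@"$(rshim_ip rshim1)"` connects to 192.168.100.3.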
3.3.3. Installing DOCA Local Repo Package on DPU
If you have already installed the BlueField OS image, be aware that the DOCA SDK, Runtime, and Tools packages are already contained in the BFB, and that this installation is not mandatory.
Before installing DOCA on the target DPU, make sure the out-of-band interface (mgmt) is connected to the internet.
- Download the DOCA SDK, DOCA Runtime, and DOCA Tools package from section Installation Files.
- Copy deb repo package into BlueField. Run:
sudo scp -r doca-repo-aarch64-ubuntu2004-local_<version>_arm64.deb ubuntu@192.168.100.2:/tmp/
- Unpack the deb repo. Run:
sudo dpkg -i doca-repo-aarch64-ubuntu2004-local_<version>_arm64.deb
- Run apt update.
sudo apt-get update
- Run apt install for DOCA runtime, tools, and SDK:
sudo apt install doca-runtime
sudo apt install doca-tools
sudo apt install doca-sdk
3.4. Firmware Upgrade
If you have multiple cards installed, the following steps must be performed on all of them after BFB installation.
To upgrade firmware:
- SSH to your BlueField device via 192.168.100.2 (preconfigured). The default credentials for Ubuntu are as follows:
- Username: ubuntu
- Password: unique password
For example:
host$ ssh ubuntu@192.168.100.2
Password: <configured-password>
- Upgrade firmware in BlueField DPU. Run:
dpu$ sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl
Example output:
Device #1:
----------
Device Type: BlueField-2
[...]
Versions: Current Available
FW <Old_FW> <New_FW>
- For the firmware upgrade to take effect:
- Run the following command on the BlueField DPU and host:
sudo mst start
- Run the command below on the BlueField DPU and immediately afterwards on the host. Do not wait for the command to complete on the BlueField DPU before issuing the command on the host.
sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 -l 3 -y reset
Note: If your BlueField device is a controller or if you are performing remote install, you must power cycle the BlueField.
Note: If your BlueField device is an NVIDIA Converged Accelerator card, you must power cycle the card and the host.
3.5. Post-installation Procedure
- Restart the driver. Run:
host$ sudo /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
- Configure the physical function (PF) interfaces.
host$ sudo ifconfig <interface-1> <network-1/mask> up
host$ sudo ifconfig <interface-2> <network-2/mask> up
For example:
host$ sudo ifconfig p2p1 192.168.200.32/24 up
host$ sudo ifconfig p2p2 192.168.201.32/24 up
For full instructions about setting up a development environment, refer to the NVIDIA DOCA Developer Guide.
NVIDIA® CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs.
This section details the necessary steps to set up CUDA in your environment. This section assumes that a BFB image has already been installed in your environment.
To install CUDA on your converged accelerator:
- Download and install the latest NVIDIA Data Center GPU driver.
- Download and install CUDA.
Downloading CUDA includes the latest NVIDIA Data Center GPU driver and CUDA toolkit. For more information about CUDA and driver compatibility, refer to NVIDIA CUDA Toolkit Release Notes.
5.1. Configuring Operation Mode
There are two modes that the NVIDIA Converged Accelerator may operate in:
- Standard mode (default) – the BlueField DPU and the GPU operate separately
- BlueField-X mode – the GPU is exposed to the DPU and is no longer visible on the host
To verify which mode the system is operating in, run:
$ sudo mst start
$ sudo mlxconfig -d /dev/mst/mt41686_pciconf0 q PCI_DOWNSTREAM_PORT_OWNER[4]
Standard mode output:
Device #1:
[…]
Configurations: Next Boot
PCI_DOWNSTREAM_PORT_OWNER[4] DEVICE_DEFAULT(0)
BlueField-X mode output:
Device #1:
[…]
Configurations: Next Boot
PCI_DOWNSTREAM_PORT_OWNER[4] EMBEDDED_CPU(15)
To configure BlueField-X mode, run:
$ mlxconfig -d /dev/mst/mt41686_pciconf0 s PCI_DOWNSTREAM_PORT_OWNER[4]=0xF
To configure standard mode, run:
$ mlxconfig -d /dev/mst/mt41686_pciconf0 s PCI_DOWNSTREAM_PORT_OWNER[4]=0x0
A power cycle is required for the configuration to take effect.
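The two query outputs shown in this section differ only in the value reported for PCI_DOWNSTREAM_PORT_OWNER[4], so a script can map that value to a mode name. This is a minimal sketch: converged_mode is a hypothetical helper, and it relies on the value strings DEVICE_DEFAULT(0) and EMBEDDED_CPU(15) exactly as they appear in the sample outputs. Note that the query reports the Next Boot column, so the result describes the mode after the next power cycle.

```shell
#!/usr/bin/env bash
# Sketch only: converged_mode is a hypothetical helper.
# Reads `mlxconfig -d <device> q PCI_DOWNSTREAM_PORT_OWNER[4]` output on stdin
# and prints "standard", "bluefield-x", or "unknown".
converged_mode() {
    local value
    # Take the second field of the line whose first field is the parameter name.
    value="$(awk '$1 == "PCI_DOWNSTREAM_PORT_OWNER[4]" { print $2; exit }')"
    case "$value" in
        "DEVICE_DEFAULT(0)") echo "standard" ;;
        "EMBEDDED_CPU(15)")  echo "bluefield-x" ;;
        *)                   echo "unknown" ;;
    esac
}
```

For example, `sudo mlxconfig -d /dev/mst/mt41686_pciconf0 q "PCI_DOWNSTREAM_PORT_OWNER[4]" | converged_mode` prints the mode name for the configured value.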
5.2. Downloading and Installing CUDA Toolkit and Driver
This section details the necessary steps to set up CUDA on your environment. It assumes that a BFB image has already been installed on your environment.
- Verify that the installation completed successfully.
- Download the CUDA samples repo. Run:
git clone https://github.com/NVIDIA/cuda-samples.git
- Build and run the vectorAdd CUDA sample. Run:
cd cuda-samples/Samples/0_Introduction/vectorAdd
make
./vectorAdd
If the vectorAdd sample works as expected, it should output "Test Passed".
Note: If it seems that the GPU is slow or stuck, stop execution and run:
sudo setpci -v -d ::0302 800.L=201 # CPL_VC0 = 32
5.3. GPUDirect RDMA
To enable GPUDirect RDMA with a network card on NVIDIA Converged Accelerator, you need an additional kernel module. Run:
sudo modprobe nvidia-peermem
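After loading the module, it can be useful to confirm that it is actually present before starting an RDMA workload. The sketch below checks /proc/modules. The function name module_loaded is hypothetical, and the optional second argument exists only so the check can be exercised against a sample file. Loaded modules are listed with underscores, so nvidia-peermem appears as nvidia_peermem.

```shell
#!/usr/bin/env bash
# Sketch only: module_loaded is a hypothetical helper.
# Succeeds if the kernel module named in $1 is listed in /proc/modules.
# $2 optionally overrides the file to read.
module_loaded() {
    local name file="${2:-/proc/modules}"
    # Normalize dashes to underscores to match how loaded modules are listed.
    name="$(printf '%s' "$1" | sed 's/-/_/g')"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
```

A typical use is `module_loaded nvidia-peermem || sudo modprobe nvidia-peermem`.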
5.4. DPDK GPUDEV
To enable the CPU map GPU memory feature in DPDK's gpudev library, you need the gdrcopy library and driver to be installed on your system.
- Install the gdrcopy library. Run:
git clone https://github.com/NVIDIA/gdrcopy.git
- Build the library and install the driver. Run:
cd gdrcopy
make
# Launch gdrdrv kernel module on the system
./insmod.sh
- Set up the path to gdrcopy. Run:
export GDRCOPY_PATH_L=/path/to/libgdrapi
Note: In general, the path to libgdrapi is /path/to/gdrcopy/src/.
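Since GDRCOPY_PATH_L must point at the directory that holds libgdrapi, a small check before exporting it can catch a missing or unbuilt checkout. This is a minimal sketch under the assumption stated in the note above, namely that the built library sits in the src/ directory of the gdrcopy checkout. The function name gdrcopy_lib_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: gdrcopy_lib_dir is a hypothetical helper.
# Given a gdrcopy checkout directory in $1, prints its src/ directory if a
# libgdrapi file is present there; otherwise prints nothing and returns 1.
gdrcopy_lib_dir() {
    local dir="${1%/}/src" lib
    for lib in "$dir"/libgdrapi*; do
        # An unmatched glob stays literal, so -e filters it out.
        [ -e "$lib" ] && { echo "$dir"; return 0; }
    done
    return 1
}
```

For example, `export GDRCOPY_PATH_L="$(gdrcopy_lib_dir /path/to/gdrcopy)"` sets the variable only to a directory that was verified to contain the library.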
