Container Deployment
Set BlueField to DPU Mode
BlueField must run in DPU mode to use the DPL Runtime Service. For details on how to change modes, see BlueField Modes of Operation.
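To confirm the current mode, you can query the INTERNAL_CPU_MODEL firmware parameter with mlxconfig (in DPU mode it is set to 1, i.e., EMBEDDED_CPU). The device path below is an example and may differ on your system:
sudo mst start
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 query INTERNAL_CPU_MODEL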
Determine Your BlueField Variant
Your BlueField may be installed in a host server, or it may operate as a standalone server.
If your BlueField is a standalone server, ignore the parts that mention the host server or SR-IOV.
You may still use Scalable Functions (SFs) if your BlueField is a standalone server.
Set Up DPU Management Access and Update the BlueField Bundle
These pages provide detailed information about DPU management access and software installation and updates:
Systems with a host server typically use RShim (i.e., the tmfifo_net0 interface).
Standalone systems must use the OOB interface for management access.
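As an example, on a system with a host server you can reach the DPU over the RShim virtual Ethernet interface. The addresses below are the common defaults (the BlueField side typically answers on 192.168.100.2), but they may differ in your setup:
sudo ip addr add 192.168.100.1/30 dev tmfifo_net0 # host-side address
sudo ip link set dev tmfifo_net0 up
ssh ubuntu@192.168.100.2 # default DPU-side address; the user name depends on your BlueField image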
Creating SR-IOV Virtual Functions (Host Server)
The first step to use SR-IOV is to create Virtual Functions (VFs) on the host server.
VFs can be created using the following sequence:
sudo -s # enter sudo shell
echo 4 > /sys/class/net/eth2/device/sriov_numvfs
exit # exit sudo shell
Entering a sudo shell rather than issuing a single sudo command is necessary because otherwise sudo applies only to the echo command and not to the shell that performs the redirection, so the redirection fails with "Permission denied".
This example creates 4 VFs under the Physical Function eth2. Adjust the interface name and count according to your needs.
If a PF already has VFs and you would like to change their number, set sriov_numvfs to 0 before applying the new value.
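As an alternative to entering a sudo shell, the value can be piped through tee, which performs the privileged write itself. The following sketch assumes the same eth2 PF and also verifies the result:
echo 4 | sudo tee /sys/class/net/eth2/device/sriov_numvfs
ip link show eth2 # the new VFs are listed as vf 0 through vf 3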
Scalable Functions (DPU)
For more information, see the BlueField Scalable Function User Guide.
If you create SFs, refer to their representors in the configuration file.
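For illustration only, SFs can be created through the upstream devlink interface; the PCI address, sfnum, and port index below are placeholders, and the BlueField Scalable Function User Guide remains the authoritative reference:
sudo devlink port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 4 # prints the new port index, e.g. pci/0000:03:00.0/32768
sudo devlink port function set pci/0000:03:00.0/32768 hw_addr 02:00:00:00:00:01 state active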
Pulling the Container Resources and Scripts from NGC
Start by downloading and installing the ngc-cli tools.
Fetch the configuration files from NGC; this creates a directory named dpl_rt_service_<version>, e.g., dpl_rt_service_v1.0.0-doca2.10.0.
Commands:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.58.0/files/ngccli_arm64.zip -O ngccli_arm64.zip
unzip ngccli_arm64.zip
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"
cd dpl_rt_service_v1.0.0-doca2.10.0
Running the Preparation Script
Inside the directory with the scripts and YAML files that you pulled with the ngc-cli tool, you'll find scripts/dpl_dpu_setup.sh.
Running this script on the DPU (requires sudo) enables the use of SR-IOV Virtual Function interfaces and creates the configuration file directory structure under /etc/dpl_rt_service. In addition, the script configures hugepages and runs the mlxconfig commands required by the DPL Runtime Service.
Run the following sequence of commands from the working directory you pulled with the ngc-cli tool:
chmod +x ./scripts/dpl_dpu_setup.sh
sudo ./scripts/dpl_dpu_setup.sh
sudo systemctl restart kubelet.service
sudo systemctl restart containerd.service
Restarting these services is necessary for the hugepages change to apply to them.
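To confirm that hugepages are in place after the restart, you can inspect /proc/meminfo (the expected count depends on what the setup script configured):
grep -i hugepages /proc/meminfo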
The following firmware settings are set by the setup script:
FLEX_PARSER_PROFILE_ENABLE=4
PROG_PARSE_GRAPH=true
SRIOV_EN=1
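If you wish to verify these settings, you can query them with mlxconfig after the script completes. The device path below is an example and may differ on your system; note that mlxconfig changes take effect only after a firmware reset or power cycle:
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 query FLEX_PARSER_PROFILE_ENABLE PROG_PARSE_GRAPH SRIOV_EN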
Edit the Configuration Files
Modify your configuration files as described in Service Configuration.
Important: you must create at least one device configuration under /etc/dpl_rt_service/devices.d/. It is advisable to start by making a copy of the file /etc/dpl_rt_service/devices.d/NAME.conf.template.
e.g.
cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf
Setting up the kubelet Pod
Now that everything is ready, copy the file configs/dpl_rt_service.yaml from the directory that you pulled with the ngc-cli tool into the directory /etc/kubelet.d.
Please allow a few minutes for the image to be pulled and the pod to be started. You may check the progress with the command sudo journalctl -u kubelet --since -5m; make sure to scroll down to see the latest log lines.
When the image has been pulled, it appears in the output of the command sudo crictl images.
When the pod is loaded, it appears in the output of the command sudo crictl pods.
When the DPL Runtime Service is successfully running inside the pod, you will find its log file at /var/log/doca/dpl_rt_service/dpl_rtd.log.
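Putting these checks together, the following sequence uses only the commands mentioned above to verify each stage of the deployment:
sudo journalctl -u kubelet --since -5m # watch the image pull and pod startup progress
sudo crictl images # the DPL image is listed once pulled
sudo crictl pods # the pod is listed once loaded
sudo tail -f /var/log/doca/dpl_rt_service/dpl_rtd.log # service log once the pod is running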
Recap: Full Command Sequence
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.58.0/files/ngccli_arm64.zip -O ngccli_arm64.zip
unzip ngccli_arm64.zip
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"
cd dpl_rt_service_v1.0.0-doca2.10.0
chmod +x ./scripts/dpl_dpu_setup.sh
sudo ./scripts/dpl_dpu_setup.sh
sudo systemctl restart kubelet.service
sudo systemctl restart containerd.service
sudo cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf
## Modify the configuration file /etc/dpl_rt_service/devices.d/1000.conf
sudo cp configs/dpl_rt_service.yaml /etc/kubelet.d/
The device ID and version numbers may differ in your case; please adapt as needed.