DPL Container Deployment
The DPL Runtime Service supports Bluefield devices (DPU mode) and ConnectX-9 (Host mode).
There are a few differences when preparing the system to run the DPL Runtime Service in DPU mode (BlueField) versus Host mode (ConnectX-9); these differences are outlined in the sections below where applicable.
This section is specific to DPU mode when running on BlueField devices.
Setting BlueField to DPU Mode
BlueField must run in DPU mode to use the DPL Runtime Service. For details on how to change modes, see: BlueField Modes of Operation.
Determining Your BlueField Variant
Your BlueField may be installed in a host server or it may be a standalone server.
If your BlueField is a standalone server, ignore the parts that mention the host server or SR-IOV. You may still use Scalable Functions (SFs) if your BlueField is a standalone server.
Setting Up DPU Management Access and Updating BlueField-Bundle
These pages provide detailed information about DPU management access, software installation, and updates:
Systems with a host server typically use RShim (i.e., the tmfifo_net0 interface). Standalone systems must use the OOB interface option for management access.
Changing the eSwitch to switchdev mode
Do this before creating SR-IOV Virtual Functions. If Virtual Functions already exist for the interface, remove them before changing the mode.
The DPL Runtime Service can only start if the eSwitch is in switchdev mode. If it's not, an error will be logged on startup and the process will exit.
If the platform is a BlueField in DPU mode, run this command in the DPU shell; otherwise (e.g., on ConnectX-9), use the host shell.
Your BlueField DPU may already be configured in switchdev mode after the BFB installation, in which case this step is unnecessary.
Find the PCI address of the interface you'd like to use with the DPL Runtime Service and run the following command (replace the pci/<addr> part with the correct value):
Example
sudo devlink dev eswitch set pci/0000:03:00.0 mode switchdev
Here are a few options for commands that may help you find your PCI address:
lspci -D
mst status -v
ip -d link
ethtool -i <interface name>
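Alternatively, an interface name can be resolved to its PCI address directly from sysfs. A minimal sketch (the function name and the SYSFS_NET override are illustrative, not part of the service):

```shell
# Print the PCI address of a network interface by resolving its sysfs
# "device" symlink. SYSFS_NET defaults to /sys/class/net and can be
# overridden, e.g. for testing.
pci_of_iface() {
    local iface="$1"
    local sysfs="${SYSFS_NET:-/sys/class/net}"
    basename "$(readlink -f "$sysfs/$iface/device")"
}

# Usage: pci_of_iface eth2   (prints something like 0000:03:00.0)
```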
devlink settings are not persistent across reboots.
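Because devlink settings do not survive a reboot, you may want to reapply the mode automatically at boot. One possible approach is a oneshot systemd unit; the unit below is a sketch (the unit name, PCI address, and devlink binary path are assumptions for your setup):

```
[Unit]
Description=Set eSwitch to switchdev mode for the DPL Runtime Service
After=network-pre.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/devlink dev eswitch set pci/0000:03:00.0 mode switchdev

[Install]
WantedBy=multi-user.target
```

Saved, for example, as /etc/systemd/system/dpl-switchdev.service, it would be enabled with sudo systemctl daemon-reload && sudo systemctl enable dpl-switchdev.service.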
Enabling Multiport eSwitch Mode (Optional)
This step is optional and depends on your DPL program and setup needs.
Multiport eSwitch mode allows for traffic forwarding between multiple physical ports and their VFs/SFs (e.g., between p0 and p1).
Before enabling this mode:
Ensure LAG_RESOURCE_ALLOCATION is enabled in firmware:
Example
sudo mlxconfig -d 0000:03:00.0 s LAG_RESOURCE_ALLOCATION=1
Refer to the Using mlxconfig guide for more information.
After a reboot or firmware reset, enable esw_multiport mode:
Example
sudo devlink dev param set pci/0000:03:00.0 name esw_multiport value 1 cmode runtime
devlink settings are not persistent across reboots.
Creating SR-IOV Virtual Functions
To use SR-IOV, first create Virtual Functions (VFs) on the host server:
Example
sudo -s # enter sudo shell
echo 4 > /sys/class/net/eth2/device/sriov_numvfs
exit # exit sudo shell
Entering a sudo shell is necessary because sudo would otherwise apply only to the echo command and not to the redirection (>), which would result in "Permission denied."
This example creates 4 VFs under Physical Function eth2. Adjust the number as needed.
If a PF already has VFs and you'd like to change the number, first set it to 0 before applying the new value.
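The reset-then-set sequence can be wrapped in a small helper; a sketch, assuming the sysfs layout shown above (the function name is illustrative):

```shell
# Set the number of VFs on a PF, resetting the count to 0 first if VFs
# already exist, since the kernel rejects changing a non-zero count
# directly. Takes the PF's sysfs device directory and the desired count.
set_numvfs() {
    local pf_dev="$1" count="$2"
    if [ "$(cat "$pf_dev/sriov_numvfs")" != "0" ]; then
        echo 0 > "$pf_dev/sriov_numvfs"
    fi
    echo "$count" > "$pf_dev/sriov_numvfs"
}

# Usage (as root): set_numvfs /sys/class/net/eth2/device 4
```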
Creating Scalable Functions (Optional)
This step is optional and depends on your DPL program and setup needs.
For more information, see the BlueField Scalable Function User Guide, TODO: CX9.
If you create SFs, refer to their representors in the configuration file.
Downloading Container Resources from NGC
Start by downloading and installing the ngc-cli tools.
For example:
For DPU mode, download the ARM ngc-cli tool:
Example
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.5.2/files/ngccli_arm64.zip -O ngccli_arm64.zip
unzip ngccli_arm64.zip
For Host mode, download the appropriate ngc-cli tool for your system architecture:
Example for x86_64
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.5.2/files/ngccli_linux.zip -O ngccli_linux.zip
unzip ngccli_linux.zip
Once the ngc-cli tool has been downloaded, use it to download the latest dpl_rt_service resources:
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"
This creates a directory in the format dpl_rt_service_va.b.c-docax.y.z, where a.b.c is the DPL Runtime Service version number and x.y.z is the DOCA version number.
For example: dpl_rt_service_v1.2.0-doca3.1.0.
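If a script needs to extract the two version numbers from the downloaded directory name, the naming scheme can be parsed with plain shell string operations; a sketch:

```shell
# Extract the DPL Runtime Service and DOCA versions from a resource
# directory name of the form dpl_rt_service_va.b.c-docax.y.z.
dir="dpl_rt_service_v1.2.0-doca3.1.0"
dpl_ver="${dir#dpl_rt_service_v}"   # strip the prefix -> 1.2.0-doca3.1.0
dpl_ver="${dpl_ver%%-*}"            # keep text before the first '-' -> 1.2.0
doca_ver="${dir##*doca}"            # keep text after the last 'doca' -> 3.1.0
echo "DPL $dpl_ver, DOCA $doca_ver"
```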
You can find available versions at NGC Catalog.
Each release includes a kubelet.d YAML file that is used by the dpl_rt_service_ctl.sh script for retrieving the correct container image for either DPU or Host mode.
Running the Preparation Script
Run the dpl_system_setup.sh script to configure the system:
For DPU mode:
cd dpl_rt_service_va.b.c-docax.y.z
chmod +x ./scripts/dpl_system_setup.sh
sudo ./scripts/dpl_system_setup.sh
sudo systemctl restart kubelet.service
sudo systemctl restart containerd.service
Warning: For DPU mode, restarting kubelet and containerd is required whenever the hugepages configuration changes for the changes to take effect.
For Host mode, specify the ConnectX device(s) that should be configured for DPL use via the --dev option (this option can be repeated):
cd dpl_rt_service_va.b.c-docax.y.z
chmod +x ./scripts/dpl_system_setup.sh
sudo ./scripts/dpl_system_setup.sh --dev 0000:08:00.0
The dpl_system_setup.sh script will perform the following:
Configures mlxconfig values:
FLEX_PARSER_PROFILE_ENABLE=4
PROG_PARSE_GRAPH=true
SRIOV_EN=1
Enables SR-IOV
Sets up the initial DPL Runtime Service configuration folder at /etc/dpl_rt_service/
Configures hugepages
Please note that the dpl_system_setup.sh script takes optional arguments to control the hugepages.
For DPU mode, if you configure more than 4GB of hugepages, you must also raise the spec->resources->limits->hugepages-2Mi limit in dpl_rt_service.yaml.
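For illustration, the fragment of dpl_rt_service.yaml you would edit follows the spec->resources->limits->hugepages-2Mi path; the surrounding structure and the value shown are assumptions, so check the YAML shipped with your release:

```
spec:
  resources:
    limits:
      hugepages-2Mi: 4Gi   # raise this to match your hugepages allocation
```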
Editing the Configuration Files
Create the device configuration file(s) based on the provided template config file.
See DPL Service Configuration for details.
For example:
Example
sudo cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf
# Then update /etc/dpl_rt_service/devices.d/1000.conf as needed.
sudo vim /etc/dpl_rt_service/devices.d/1000.conf
You must create at least one device configuration file.
Otherwise, the DPL Runtime Service Container will not be able to start.
Firewall configuration to open gRPC server ports
The DPL Runtime Service runs several gRPC servers, each listening on a dedicated TCP port and supporting a corresponding DPL Developer tool.
It is critical to make sure that these ports are accessible from the system(s) where you plan to run the DPL Developer tools.
Those tools will connect to the DPL Runtime Service using the corresponding tool's gRPC server TCP port.
The needed ports are configurable (see the server_tcp_port settings at DPL Service Configuration). By default they have the following values:
gRPC server | TCP Port |
P4 Runtime | 9559 |
DPL Admin | 9600 |
DPL Nspect / Debugger | 9560 |
Example for allowing the ports on RHEL-9
sudo firewall-cmd --permanent --add-port=9559/tcp
sudo firewall-cmd --permanent --add-port=9600/tcp
sudo firewall-cmd --permanent --add-port=9560/tcp
sudo firewall-cmd --reload
# List configurations to confirm the ports were allowed.
sudo firewall-cmd --list-all
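From the machine where the DPL Developer tools will run, you can verify reachability before connecting. A minimal sketch using bash's built-in /dev/tcp pseudo-device (nc -z <host> <port> is an equivalent alternative; the function name is illustrative):

```shell
# Return success if a TCP connection to host:port can be established.
# The connection is opened on fd 3 inside a subshell, which closes the
# socket automatically on exit.
check_port() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Usage: check_port <dpu-mgmt-ip> 9559 && echo "P4 Runtime port reachable"
```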
Starting the DPL Runtime Service Container
Once your configuration files are ready, use the dpl_rt_service_ctl.sh script to start the container:
Before running the script for the first time, you must grant it execution rights:
sudo chmod +x ./scripts/dpl_rt_service_ctl.sh
sudo ./scripts/dpl_rt_service_ctl.sh --start
For DPU mode, the script copies the YAML file into the /etc/kubelet.d/ directory, which triggers automatic creation and start of the DPL RT Service Pod and container.
For Host mode, the script will start a Docker container named dpl-rt-service.
Allow a few minutes for the container to start. To monitor status:
For DPU mode:
Check logs:
sudo journalctl -u kubelet --since -5m
List images:
sudo crictl images
List pods:
sudo crictl pods
For Host mode:
Check logs:
sudo docker logs dpl-rt-service
List images:
sudo docker images
View runtime logs:
/var/log/doca/dpl_rt_service/dpl_rtd.log
If the container fails to start due to configuration errors, the log file at /var/log/doca/dpl_rt_service/dpl_rtd.log might be empty or missing the relevant error messages.
In such a case, view the errors using the relevant tool:
For DPU:
sudo crictl logs $(sudo crictl ps -a | grep dpl-rt-service | awk '{print $1}')
For Host:
sudo docker logs dpl-rt-service
Stopping the DPL Runtime Service Container
Stop the container by using the dpl_rt_service_ctl.sh script:
sudo ./scripts/dpl_rt_service_ctl.sh --stop
For DPU mode, the script will remove the YAML file from the /etc/kubelet.d/ directory.
For Host mode, the script will stop the Docker container named dpl-rt-service.
To confirm the pod is gone (this might take a few seconds to complete):
# For DPU:
sudo crictl pods | grep dpl-rt-service
# For Host:
sudo docker ps | grep dpl-rt-service
Restarting the DPL Runtime Service After Configuration Changes
Once the DPL Runtime Service container is up and running, any change to any file under the /etc/dpl_rt_service/ configuration folder requires restarting the container in order for the new changes to take effect.
Perform the following steps to restart the container:
Stop the container:
sudo ./scripts/dpl_rt_service_ctl.sh --stop
Wait for the container to stop.
Start the container:
sudo ./scripts/dpl_rt_service_ctl.sh --start
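The "wait for the container to stop" step can be automated by polling; a sketch (the helper name is illustrative, and the grep target matches the pod name used above):

```shell
# Poll a command once per second until it succeeds or the timeout (in
# seconds) expires. Returns non-zero on timeout.
wait_for() {
    local timeout="$1"; shift
    local elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Usage (DPU mode): wait until the pod is gone before starting again:
# sudo ./scripts/dpl_rt_service_ctl.sh --stop
# wait_for 60 sh -c '! sudo crictl pods | grep -q dpl-rt-service'
# sudo ./scripts/dpl_rt_service_ctl.sh --start
```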
End-to-End Installation Steps
Replace device IDs and filenames as appropriate for your setup.
DPU Example
# Download NGC CLI tool:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.5.2/files/ngccli_arm64.zip -O ngccli_arm64.zip
unzip ngccli_arm64.zip
# Download the DPL Runtime Service Resources bundle:
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"
# Prepare DPU and restart services:
cd dpl_rt_service_va.b.c-docax.y.z
chmod +x ./scripts/dpl_system_setup.sh
sudo ./scripts/dpl_system_setup.sh
sudo systemctl restart kubelet.service
sudo systemctl restart containerd.service
# Create a device configuration file with relevant interfaces info:
sudo cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf
sudo vim /etc/dpl_rt_service/devices.d/1000.conf
# Launch the Pod and container:
sudo ./scripts/dpl_rt_service_ctl.sh --start
Host Example
# Download NGC CLI tool:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.5.2/files/ngccli_linux.zip -O ngccli_linux.zip
unzip ngccli_linux.zip
# Download the DPL Runtime Service Resources bundle:
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"
# Prepare the host system:
cd dpl_rt_service_va.b.c-docax.y.z
chmod +x ./scripts/dpl_system_setup.sh
sudo ./scripts/dpl_system_setup.sh --dev 0000:08:00.0
# Create a device configuration file with relevant interfaces info:
sudo cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf
sudo vim /etc/dpl_rt_service/devices.d/1000.conf
# Launch the Docker container:
sudo ./scripts/dpl_rt_service_ctl.sh --start
For additional troubleshooting steps and deeper explanations, refer to BlueField Container Deployment Guide.
Checkpoint | Command |
View recent kubelet logs (DPU only) | sudo journalctl -u kubelet --since -5m |
View container logs (helpful if the container fails to start) | For DPU: sudo crictl logs <container-id>; For Host: sudo docker logs dpl-rt-service |
List pulled container images | For DPU: sudo crictl images; For Host: sudo docker images |
List all created pods (DPU only) | sudo crictl pods |
List running containers | For DPU: sudo crictl ps; For Host: sudo docker ps |
View DPL service logs | /var/log/doca/dpl_rt_service/dpl_rtd.log |
Make sure the following conditions are met before or during deployment:
VFs were created before deploying the container (if using SR-IOV)
All required configuration files exist under /etc/dpl_rt_service/, are correctly named, and include valid device IDs
Network interface names and MTU settings match the physical and virtual network topology
Firmware is up to date and matches DOCA compatibility requirements
For DPU mode, BlueField is operating in the correct mode (DPU mode); verify with sudo mlxconfig -d <pci-device> q
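Parts of this checklist can be scripted; a minimal sketch that only checks for the presence of device configuration files (the function name is illustrative, and the default directory mirrors the configuration folder used earlier):

```shell
# Succeed only if at least one .conf file exists in the devices.d
# directory (defaults to the path used by the DPL Runtime Service).
preflight_config() {
    local dir="${1:-/etc/dpl_rt_service/devices.d}"
    ls "$dir"/*.conf >/dev/null 2>&1
}

# Usage: preflight_config || echo "no device configuration files found"
```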