DOCA Documentation v3.1.0

DPL Container Deployment

Setting BlueField to DPU Mode

BlueField must run in DPU mode to use the DPL Runtime Service. For details on how to change modes, see: BlueField Modes of Operation.
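
To check the current mode, you can query the INTERNAL_CPU_MODE parameter with mlxconfig (a quick sketch; the PCI address 0000:03:00.0 is an example and may differ on your system):

sudo mlxconfig -d 0000:03:00.0 q INTERNAL_CPU_MODE   # EMBEDDED_CPU(1) indicates DPU mode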

Determining Your BlueField Variant

Your BlueField may be installed in a host server or it may be a standalone server.

If your BlueField is a standalone server, ignore the parts that mention the host server or SR-IOV; Scalable Functions (SFs) remain available in this configuration.

Setting Up DPU Management Access and Updating BlueField-Bundle

The BlueField documentation provides detailed information about DPU management access, software installation, and updates.

Note

Systems with a host server typically use RShim (i.e., the tmfifo_net0 interface). Standalone systems must use the OOB interface option for management access.


Creating SR-IOV Virtual Functions (Host Server)

To use SR-IOV, first create Virtual Functions (VFs) on the host server:


sudo -s    # enter sudo shell
echo 4 > /sys/class/net/eth2/device/sriov_numvfs
exit       # exit sudo shell

Note

Entering a sudo shell is necessary because sudo applies only to the echo command, not to the redirection (>), which would otherwise fail with "Permission denied."

This example creates 4 VFs under Physical Function eth2. Adjust the number as needed.

Info

If a PF already has VFs and you'd like to change the number, first set it to 0 before applying the new value.
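
As an alternative to the sudo shell, writing through sudo tee also works. A small sketch that changes an existing VF count from 4 to 8 on eth2 (the interface name and counts are placeholders):

echo 0 | sudo tee /sys/class/net/eth2/device/sriov_numvfs   # reset the existing VFs first
echo 8 | sudo tee /sys/class/net/eth2/device/sriov_numvfs   # then apply the new count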


Creating Scalable Functions (Optional)

Info

This step is optional and depends on your DPL program and setup needs.

For more information, see the BlueField Scalable Function User Guide.

If you create SFs, refer to their representors in the configuration file.
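
For illustration, SF creation with upstream devlink typically looks like the sketch below; the PCI address, sfnum, MAC address, and the port index printed by the add command are example values, and the Scalable Function User Guide remains the authoritative procedure:

# Create an SF on physical function 0 of the device at 0000:03:00.0
sudo devlink port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 88

# Configure and activate it, using the port index (e.g., 32768) printed by the command above
sudo devlink port function set pci/0000:03:00.0/32768 hw_addr 00:00:00:00:88:88 state active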

Enabling Multiport eSwitch Mode (Optional)

Info

This step is optional and depends on your DPL program and setup needs.

Multiport eSwitch mode allows for traffic forwarding between multiple physical ports and their VFs/SFs (e.g., between p0 and p1).

Before enabling this mode:

  1. Ensure LAG_RESOURCE_ALLOCATION is enabled in firmware:


    sudo mlxconfig -d 0000:03:00.0 s LAG_RESOURCE_ALLOCATION=1

    Info

    Refer to the Using mlxconfig guide for more information.

  2. After reboot or firmware reset, enable esw_multiport mode:


    sudo devlink dev param set pci/0000:03:00.0 name esw_multiport value 1 cmode runtime

    Note

    devlink settings are not persistent across reboots, so re-apply this command after each reboot. A verification check follows this list.
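
To confirm the parameter took effect, you can read it back with devlink (same example PCI address as above):

sudo devlink dev param show pci/0000:03:00.0 name esw_multiport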

Downloading Container Resources from NGC

Start by downloading and installing the ngc-cli tools, then pull the DPL Runtime Service resource:


wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.58.0/files/ngccli_arm64.zip -O ngccli_arm64.zip
unzip ngccli_arm64.zip
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"

This creates a directory such as dpl_rt_service_v1.2.0-doca3.1.0.

Info

You can find available versions at NGC Catalog.
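
To pin a specific release rather than the latest, the version can be appended to the resource target. A sketch assuming the example version 1.2.0-doca3.1.0:

./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service:1.2.0-doca3.1.0"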

Info

Each release includes a kubelet.d YAML file pointing to the correct container image for automatic download.


Running the Preparation Script

Run the setup script to configure the DPU for DPL use, then restart kubelet and containerd:


cd dpl_rt_service_va.b.c-docax.y.z
chmod +x ./scripts/dpl_dpu_setup.sh
sudo ./scripts/dpl_dpu_setup.sh
sudo systemctl restart kubelet.service
sudo systemctl restart containerd.service

Warning

Restarting kubelet and containerd is required whenever hugepages configuration changes for the changes to take effect.

This script:

  • Configures mlxconfig values:

    • FLEX_PARSER_PROFILE_ENABLE=4

    • PROG_PARSE_GRAPH=true

    • SRIOV_EN=1

  • Enables SR-IOV

  • Sets up /etc/dpl_rt_service/

  • Configures hugepages
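
To verify the hugepages allocation after the service restarts, a quick check of /proc/meminfo is enough:

grep -i huge /proc/meminfo   # HugePages_Total should be non-zero after setup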

Editing the Configuration Files

You must create at least one device configuration file. For example:


sudo cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf

Then edit /etc/dpl_rt_service/devices.d/1000.conf as needed.

See DPL Service Configuration for details.

Starting the DPL Runtime Service Pod

Once your configuration files are ready, copy configs/dpl_rt_service.yaml from the directory you downloaded with ngc-cli into /etc/kubelet.d:


sudo cp ./configs/dpl_rt_service.yaml /etc/kubelet.d/


Allow a few minutes for the pod to start.

To monitor status:

  1. Check logs:


    sudo journalctl -u kubelet --since -5m

  2. List images:


    sudo crictl images

  3. List pods:


    sudo crictl pods

  4. View runtime logs:


    /var/log/doca/dpl_rt_service/dpl_rtd.log

    Note

    If the container fails to start due to configuration errors, view logs with:


    sudo crictl logs $(sudo crictl ps -a | grep dpl-rt-service | awk '{print $1}')
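
For live monitoring during startup, following the runtime log can help (assuming the log file at the path above has been created):

sudo tail -f /var/log/doca/dpl_rt_service/dpl_rtd.log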

Restarting the Pod After Configuration Changes

  1. Remove the YAML file:


    sudo rm -fv /etc/kubelet.d/dpl_rt_service.yaml

  2. Wait for the pod to stop (see the check after this list).

  3. Re-copy the YAML file to restart:


    sudo cp ./configs/dpl_rt_service.yaml /etc/kubelet.d/
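
For step 2, one way to poll until the pod disappears (watch re-runs the command every two seconds):

sudo watch -n 2 "crictl pods | grep dpl-rt-service"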

End-to-End Installation Steps


# Download NGC CLI and container bundle
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.58.0/files/ngccli_arm64.zip -O ngccli_arm64.zip
unzip ngccli_arm64.zip
./ngc-cli/ngc registry resource download-version "nvidia/doca/dpl_rt_service"

# Prepare DPU and restart services
cd dpl_rt_service_va.b.c-docax.y.z
chmod +x ./scripts/dpl_dpu_setup.sh
sudo ./scripts/dpl_dpu_setup.sh
sudo systemctl restart kubelet.service
sudo systemctl restart containerd.service

# Configure the service
sudo cp /etc/dpl_rt_service/devices.d/NAME.conf.template /etc/dpl_rt_service/devices.d/1000.conf
# Edit the file above

# Launch the pod
sudo cp ./configs/dpl_rt_service.yaml /etc/kubelet.d/

Note

Replace device IDs and filenames as appropriate for your setup.


Stopping the DPL Runtime Service kubelet Pod

Stop the pod by removing its kubelet YAML file:


sudo /bin/rm -fv /etc/kubelet.d/dpl_rt_service.yaml

Then confirm the pod is gone:


sudo crictl pods | grep dpl-rt-service


For additional troubleshooting steps and deeper explanations, refer to BlueField Container Deployment Guide.

Checkpoint

The following commands are useful for verifying the deployment:

  • View recent kubelet logs:

    sudo journalctl -u kubelet --since -5m

  • View logs of the dpl-rt-service container (helpful if /var/log/doca/dpl_rt_service/dpl_rtd.log is missing or incomplete):

    sudo crictl logs $(sudo crictl ps -a | grep dpl-rt-service | awk '{print $1}')

  • List pulled container images:

    sudo crictl images

  • List all created pods:

    sudo crictl pods

  • List running containers:

    sudo crictl ps

  • View DPL service logs:

    /var/log/doca/dpl_rt_service/dpl_rtd.log

Make sure the following conditions are met before or during deployment:

  • VFs were created before deploying the container (if using SR-IOV)

  • All required configuration files exist under /etc/dpl_rt_service/, are correctly named, and include valid device IDs

  • Network interface names and MTU settings match the physical and virtual network topology

  • Firmware is up to date and matches DOCA compatibility requirements

  • BlueField is operating in DPU mode (verify with sudo mlxconfig -d <pci-device> q)

© Copyright 2025, NVIDIA. Last updated on Sep 4, 2025.