DPU Passthrough in DPF Host Trusted
For better formatting of the code, follow this guide from the source GitHub repo at github.com/NVIDIA/doca-platform under docs/public/user-guides/host-trusted/use-cases/passthrough/README.md.
This configuration provides instructions for deploying the NVIDIA DOCA Platform Framework (DPF) on high-performance, bare-metal infrastructure in Host Trusted mode. It focuses on provisioning NVIDIA® BlueField®-3 DPUs using DPF and enabling them to act as passthrough devices.
This guide should be run by cloning the repo from github.com/NVIDIA/doca-platform and changing to the docs/public/user-guides/host-trusted/use-cases/passthrough directory.
The system is set up as described in the system prerequisites.
Software prerequisites
The following tools must be installed on the machine where the commands contained in this guide run:
kubectl
helm
envsubst
0. Required variables
The following variables are required by this guide. Sensible defaults are provided where possible, but many values are specific to the target infrastructure.
Commands in this guide are run in the same directory that contains this readme.
Environment variables file
## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not allocated by DHCP.
export DPUCLUSTER_VIP=

## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=

## IP address of the NFS server used as storage for the BFB.
export NFS_SERVER_IP=

## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca

## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v25.7.0
## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb"
Modify the variables in manifests/00-env-vars/envvars.env
to fit your environment, then source the file:
source manifests/00-env-vars/envvars.env
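As a quick sanity check before continuing, you can confirm that the required variables are actually set in your shell. This snippet is not part of the guide's manifests; it only loops over the variable names defined above:
## Warn about any required variable that is still empty.
for var in DPUCLUSTER_VIP DPUCLUSTER_INTERFACE NFS_SERVER_IP REGISTRY TAG BFB_URL; do
  [ -n "${!var}" ] || echo "WARNING: $var is not set"
done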
1. DPF Operator installation
Create storage required by the DPF Operator
A number of environment variables must be set before running this command.
kubectl create ns dpf-operator-system
cat manifests/01-dpf-operator-installation/*.yaml | envsubst | kubectl apply -f -
This deploys the following objects:
PersistentVolume and PersistentVolumeClaim for the provisioning controller
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bfb-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  nfs:
    path: /mnt/dpf_share/bfb
    server: $NFS_SERVER_IP
  persistentVolumeReclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bfb-pvc
  namespace: dpf-operator-system
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeMode: Filesystem
  storageClassName: ""
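Optionally, confirm that the claim has bound to the NFS-backed volume before moving on. These are standard kubectl commands, not part of the guide's manifests:
## Both the PV and the PVC should report STATUS Bound once they are matched.
kubectl get pv bfb-pv
kubectl -n dpf-operator-system get pvc bfb-pvc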
Additional Dependencies
Before deploying the DPF Operator, ensure that Helm is properly configured according to the Helm prerequisites. This step must be completed for the DPF Operator to function properly.
Deploy the DPF Operator
A number of environment variables must be set before running this command.
HTTP Registry (default)
If the $REGISTRY is an HTTP Registry (default value) use this command:
helm repo add --force-update dpf-repository ${REGISTRY}
helm repo update
helm upgrade --install -n dpf-operator-system dpf-operator dpf-repository/dpf-operator --version=$TAG
OCI Registry
For development purposes, if the $REGISTRY is an OCI Registry use this command:
helm upgrade --install -n dpf-operator-system dpf-operator $REGISTRY/dpf-operator --version=$TAG
Verification
These verification commands may need to be run multiple times to ensure the condition is met.
Verify the DPF Operator installation with:
## Ensure the DPF Operator deployment is available.
kubectl rollout status deployment --namespace dpf-operator-system dpf-operator-controller-manager
## Ensure all pods in the DPF Operator system are ready.
kubectl wait --for=condition=ready --namespace dpf-operator-system pods --all
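If the rollout does not complete, it can also help to confirm that the Helm release itself installed successfully. This is a standard Helm command, not part of the guide:
## The dpf-operator release should be listed with STATUS deployed.
helm list -n dpf-operator-system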
2. DPF system installation
This section involves creating the DPF system components and some basic infrastructure required for a functioning DPF-enabled cluster.
Deploy the DPF System components
A number of environment variables must be set before running this command.
kubectl create ns dpu-cplane-tenant1
cat manifests/02-dpf-system-installation/*.yaml | envsubst | kubectl apply -f -
This will create the following objects:
DPFOperatorConfig to install the DPF System components
---
apiVersion: operator.dpu.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpfoperatorconfig
  namespace: dpf-operator-system
spec:
  provisioningController:
    bfbPVCName: "bfb-pvc"
    dmsTimeout: 900
  kamajiClusterManager:
    disable: false
DPUCluster to serve as Kubernetes control plane for DPU nodes
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUCluster
metadata:
  name: dpu-cplane-tenant1
  namespace: dpu-cplane-tenant1
spec:
  type: kamaji
  maxNodes: 10
  clusterEndpoint:
    # deploy keepalived instances on the nodes that match the given nodeSelector.
    keepalived:
      # interface on which keepalived will listen. Should be the oob interface of the control plane node.
      interface: $DPUCLUSTER_INTERFACE
      # Virtual IP reserved for the DPU Cluster load balancer. Must not be allocatable by DHCP.
      vip: $DPUCLUSTER_VIP
      # virtualRouterID must be in range [1,255], make sure the given virtualRouterID does not duplicate with any existing keepalived process running on the host
      virtualRouterID: 126
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
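The keepalived VIP must not already be in use on the management subnet. If the DPUCluster fails to become ready, one simple way to check for an address conflict, run from a node on that subnet, is the following sketch:
## No replies are expected if the VIP is free; a reply means the address is already taken.
ping -c 3 -W 1 "$DPUCLUSTER_VIP"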
Verification
These verification commands may need to be run multiple times to ensure the condition is met.
Verify the DPF System with:
## Ensure the provisioning and DPUService controller manager deployments are available.
kubectl rollout status deployment --namespace dpf-operator-system dpf-provisioning-controller-manager dpuservice-controller-manager
## Ensure all other deployments in the DPF Operator system are Available.
kubectl rollout status deployment --namespace dpf-operator-system
## Ensure the DPUCluster is ready for nodes to join.
kubectl wait --for=condition=ready --namespace dpu-cplane-tenant1 dpucluster --all
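If the DPUCluster does not become ready, inspecting its status conditions can point to the cause (for example, a VIP conflict or keepalived issues). This is a plain kubectl query, not part of the guide:
## Shows the DPUCluster object including its status conditions.
kubectl -n dpu-cplane-tenant1 get dpucluster dpu-cplane-tenant1 -o yaml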
3. DPU Provisioning and Interface Plumbing
In this step, we provision the DPUs and do the necessary interface plumbing to enable the DPU to act as a passthrough device.
The user is expected to create a DPUSet object to provision the DPUs and a DPUServiceChain to enable the necessary connectivity between the host and DPU interfaces.
Check the DPUSet documentation and DPUServiceChain documentation for more information about these objects.
Create the BFB, DPUSet and DPUServiceChain
A number of environment variables must be set before running this command.
cat manifests/03-dpf-object-installation/*.yaml | envsubst | kubectl apply -f -
This will deploy the following objects:
BFB to download the BFB to a shared volume
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: BFB
metadata:
  name: bf-bundle
  namespace: dpf-operator-system
spec:
  url: $BFB_URL
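If the BFB stays in a downloading or error state, it can be worth confirming that the URL is reachable. A quick check with plain curl, run from a machine with network access similar to the cluster's, is sketched below; it is not part of the guide's manifests:
## Expect an HTTP 200 (or a redirect) status in the first response line.
curl -sI "$BFB_URL" | head -n 1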
DPUFlavor used for provisioning the DPUs
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: passthrough
  namespace: dpf-operator-system
spec:
  grub:
    kernelParameters:
      - console=hvc0
      - console=ttyAMA0
      - earlycon=pl011,0x13010000
      - fixrttc
      - net.ifnames=0
      - biosdevname=0
      - iommu.passthrough=1
      - cgroup_no_v1=net_prio,net_cls
      - hugepagesz=2048kB
      - hugepages=3072
  nvconfig:
    - device: "*"
      parameters:
        - PF_BAR2_ENABLE=0
        - PER_PF_NUM_SF=1
        - PF_TOTAL_SF=20
        - PF_SF_BAR_SIZE=10
        - NUM_PF_MSIX_VALID=0
        - PF_NUM_PF_MSIX_VALID=1
        - PF_NUM_PF_MSIX=228
        - INTERNAL_CPU_MODEL=1
        - INTERNAL_CPU_OFFLOAD_ENGINE=0
        - SRIOV_EN=1
        - NUM_OF_VFS=46
        - LAG_RESOURCE_ALLOCATION=1
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }

      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl set Open_vSwitch . other_config:ctl-pipe-size=1024
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
  bfcfgParameters:
    - UPDATE_ATF_UEFI=yes
    - UPDATE_DPU_OS=yes
    - WITH_NIC_FW_UPDATE=yes
  configFiles:
    - path: /etc/mellanox/mlnx-bf.conf
      operation: override
      raw: |
        ALLOW_SHARED_RQ="no"
        IPSEC_FULL_OFFLOAD="no"
        ENABLE_ESWITCH_MULTIPORT="yes"
      permissions: "0644"
    - path: /etc/mellanox/mlnx-ovs.conf
      operation: override
      raw: |
        CREATE_OVS_BRIDGES="no"
        OVS_DOCA="yes"
      permissions: "0644"
    - path: /etc/mellanox/mlnx-sf.conf
      operation: override
      raw: ""
      permissions: "0644"
DPUSet to provision DPUs on worker nodes
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUSet
metadata:
  name: passthrough
  namespace: dpf-operator-system
spec:
  dpuNodeSelector:
    matchLabels:
      feature.node.kubernetes.io/dpu-enabled: "true"
  dpuTemplate:
    spec:
      dpuFlavor: passthrough
      bfb:
        name: "bf-bundle"
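The DPUSet above targets nodes labeled feature.node.kubernetes.io/dpu-enabled: "true". To see which worker nodes currently match that selector, and therefore which DPUs will be provisioned, you can run:
## Lists only the nodes the DPUSet will select.
kubectl get nodes -l feature.node.kubernetes.io/dpu-enabled=true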
DPUServiceInterfaces used by the DPUServiceChain
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "p0"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p1
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "p1"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p1
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: pf0hpf
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "pf0hpf"
        spec:
          interfaceType: pf
          pf:
            pfID: 0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: pf1hpf
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "pf1hpf"
        spec:
          interfaceType: pf
          pf:
            pfID: 1
DPUServiceChain to make the device act as a passthrough device
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceChain
metadata:
  name: passthrough
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        spec:
          switches:
            - ports:
                - serviceInterface:
                    matchLabels:
                      interface: p0
                - serviceInterface:
                    matchLabels:
                      interface: pf0hpf
            - ports:
                - serviceInterface:
                    matchLabels:
                      interface: p1
                - serviceInterface:
                    matchLabels:
                      interface: pf1hpf
Verification
The following verification commands may need to be run multiple times to ensure the condition is met.
## Ensure the DPUServiceChain is ready
kubectl wait --for=condition=ready --namespace dpf-operator-system dpuservicechain passthrough
## Ensure the DPUServiceInterfaces are ready
kubectl wait --for=condition=ready --namespace dpf-operator-system dpuserviceinterface p0 p1 pf0hpf pf1hpf
## Ensure the BFB is ready
kubectl wait --for="jsonpath={.status.phase}=Ready" --namespace dpf-operator-system bfb bf-bundle
## Ensure the DPUs have the condition Initialized (this may take time)
kubectl wait --for=condition=Initialized --namespace dpf-operator-system dpu --all
Wait for the DPUs to be provisioned
The DPUs will take some time to be provisioned and for the OS to be installed. You can check the status of the DPUs with:
kubectl wait --for=condition=Ready --namespace dpf-operator-system dpu --all
or with dpfctl:
$ kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe dpusets
NAME NAMESPACE STATUS REASON SINCE MESSAGE
DPFOperatorConfig/dpfoperatorconfig dpf-operator-system Ready: True Success 24m
├─DPUServiceChains
│ └─DPUServiceChain/passthrough dpf-operator-system Ready: True Success 7s
├─DPUServiceInterfaces
│ └─4 DPUServiceInterfaces... dpf-operator-system Ready: True Success 78m See p0, p1, pf0hpf, pf1hpf
└─DPUSets
└─DPUSet/passthrough dpf-operator-system
├─BFB/bf-bundle dpf-operator-system Ready: True Ready 78m File: bf-bundle-3.1.0-53_25.07_ubuntu-22.04_prod.bfb, DOCA: 3.1.0
└─DPUs
├─DPU/dpu-node-mt2306xz0370-mt2306xz0370 dpf-operator-system
│ └─Ready False OS Installing 1s
├─DPU/dpu-node-mt2333xz0xq3-mt2333xz0xq3 dpf-operator-system
│ └─Ready False OS Installing 1s
├─DPU/dpu-node-mt2402xz0f6v-mt2402xz0f6v dpf-operator-system
│ └─Ready False OS Installing 1s
└─DPU/dpu-node-mt2404xz0c98-mt2404xz0c98 dpf-operator-system
└─Ready False OS Installing 1s
Test Traffic
After the DPUs are provisioned and the rest of the objects are Ready, we can test traffic by assigning an IP to one of the PFs on the host for each DPU and running a simple ping. This assumes that the high-speed ports of the DPUs are connected and that the DPUs can reach each other. Assuming the PF is named ens0f0np0 (a sketch for identifying the PF name follows the ping output), then:
Host 1:
ip addr add 192.168.1.1/24 dev ens0f0np0
Host 2:
ip addr add 192.168.1.2/24 dev ens0f0np0
From Host 1:
$ ping 192.168.1.2 -c3
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.387 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.344 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.396 ms

--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.344/0.375/0.396/0.022 ms
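If the host PF is not named ens0f0np0 on your systems, one way to identify candidate netdevs is to look for ports with the Mellanox/NVIDIA PCI vendor ID (0x15b3). This is a generic sketch that assumes the standard sysfs layout and is not part of the guide:
## Print every netdev whose PCI vendor ID indicates a ConnectX/BlueField port.
for dev in /sys/class/net/*; do
  vendor=$(cat "$dev/device/vendor" 2>/dev/null)
  [ "$vendor" = "0x15b3" ] && echo "$(basename "$dev") -> $vendor"
done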
4. Deletion and clean up
This section covers only the DPF-related components and not the prerequisites, as these must be managed by the admin.
Delete the DPF Operator system and DPF Operator
kubectl delete -n dpf-operator-system dpfoperatorconfig dpfoperatorconfig --wait
helm uninstall -n dpf-operator-system dpf-operator --wait
Delete DPF Operator PVC
kubectl -n dpf-operator-system delete pvc bfb-pvc
kubectl delete pv bfb-pv
Note: there can be a race condition when deleting the underlying Kamaji cluster that runs the DPU cluster control plane in this guide. If that happens, it may be necessary to manually remove finalizers from the DPUCluster and Datastore objects (see the sketch below).
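If you hit that race, a generic way to clear finalizers from a stuck object looks like the sketch below; substitute the actual resource kind, name, and namespace reported as stuck:
## Remove all finalizers so deletion of the stuck object can complete.
kubectl -n dpu-cplane-tenant1 patch dpucluster dpu-cplane-tenant1 \
  --type merge -p '{"metadata":{"finalizers":null}}'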