DPF Book Template - RDG for DPF with OVN-Kubernetes and HBN Services Demo

DPF System Installation

This section involves creating the DPF system components and some basic infrastructure required for a functioning DPF-enabled cluster.

  1. The following YAML files define the DPFOperatorConfig to install the DPF System components and the DPUCluster to serve as Kubernetes control plane for DPU nodes.

    Note

    Note that to achieve high performance results you need to adjust the operatorconfig.yaml to support MTU 9000.

    manifests/03-dpf-system-installation/operatorconfig.yaml

    Copy
    Copied!
                

    --- apiVersion: operator.dpu.nvidia.com/v1alpha1 kind: DPFOperatorConfig metadata: name: dpfoperatorconfig namespace: dpf-operator-system spec: overrides: kubernetesAPIServerVIP: $TARGETCLUSTER_API_SERVER_HOST kubernetesAPIServerPort: $TARGETCLUSTER_API_SERVER_PORT provisioningController: bfbPVCName: "bfb-pvc" dmsTimeout: 900 kamajiClusterManager: disable: false networking: controlPlaneMTU: 9000 highSpeedMTU: 9000

    manifests/03-dpf-system-installation/dpucluster.yaml

    Copy
    Copied!
                

    --- apiVersion: provisioning.dpu.nvidia.com/v1alpha1 kind: DPUCluster metadata: name: dpu-cplane-tenant1 namespace: dpu-cplane-tenant1 spec: type: kamaji maxNodes: 10 version: v1.30.2 clusterEndpoint: # deploy keepalived instances on the nodes that match the given nodeSelector. keepalived: # interface on which keepalived will listen. Should be the oob interface of the control plane node. interface: $DPUCLUSTER_INTERFACE # Virtual IP reserved for the DPU Cluster load balancer. Must not be allocatable by DHCP. vip: $DPUCLUSTER_VIP # virtualRouterID must be in range [1,255], make sure the given virtualRouterID does not duplicate with any existing keepalived process running on the host virtualRouterID: 126 nodeSelector: node-role.kubernetes.io/control-plane: ""

  2. Create NS for the Kubernetes control plane of the DPU nodes:

    Jump Node Console

    Copy
    Copied!
                

    $ kubectl create ns dpu-cplane-tenant1

  3. Apply the previous YAML files:

    Jump Node Console

    Copy
    Copied!
                

    $ cat manifests/03-dpf-system-installation/*.yaml | envsubst | kubectl apply -f -

  4. Verify the DPF system by ensuring that the provisioning and DPUService controller manager deployments are available, that all other deployments in the DPF Operator system are available, and that the DPUCluster is ready for nodes to join.

    Jump Node Console

    Copy
    Copied!
                

    $ kubectl rollout status deployment --namespace dpf-operator-system dpf-provisioning-controller-manager dpuservice-controller-manager deployment "dpf-provisioning-controller-manager" successfully rolled out deployment "dpuservice-controller-manager" successfully rolled out   $ kubectl rollout status deployment --namespace dpf-operator-system deployment "dpf-operator-argocd-applicationset-controller" successfully rolled out deployment "dpf-operator-argocd-redis" successfully rolled out deployment "dpf-operator-argocd-repo-server" successfully rolled out deployment "dpf-operator-argocd-server" successfully rolled out deployment "dpf-operator-controller-manager" successfully rolled out deployment "dpf-operator-kamaji" successfully rolled out deployment "dpf-operator-maintenance-operator" successfully rolled out deployment "dpf-operator-node-feature-discovery-gc" successfully rolled out deployment "dpf-operator-node-feature-discovery-master" successfully rolled out deployment "dpf-provisioning-controller-manager" successfully rolled out deployment "dpuservice-controller-manager" successfully rolled out deployment "kamaji-cm-controller-manager" successfully rolled out   $ kubectl wait --for=condition=ready --namespace dpu-cplane-tenant1 dpucluster --all dpucluster.provisioning.dpu.nvidia.com/dpu-cplane-tenant1 condition met

© Copyright 2025, NVIDIA. Last updated on Jul 10, 2025.