RDG for DPF with OVN-Kubernetes and HBN Services

K8s Cluster Scale-out

At this point, add the worker nodes to the cluster. As the workers join, their DPUs are provisioned and the DPUServices start to come up.

  1. Return to the shell where you previously ran Kubespray to deploy the cluster. Uncomment the kube_node group in the hosts.yaml file, then run the scale.yml playbook to add the worker nodes to the cluster:

    Note

    Ensure you are in the Python virtual environment (.venv) when running the command.

    Jump Node Console

    (.venv) depuser@jump:~/kubespray$ cat inventory/mycluster/hosts.yaml
    ...
      k8s_cluster:
        children:
          kube_control_plane:
          kube_node:
    ...

    (.venv) depuser@jump:~/kubespray$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root scale.yml

  2. The scale-out should complete fairly quickly. A successful run ends with output similar to the following:

    [Figure: Kubespray scale.yml output from a successful run]
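Before you follow DPU provisioning, you can confirm that the new workers have registered with the API server. The sketch below is an illustration only and is not part of Kubespray or DPF. The `check_workers_registered` helper and the `KUBECTL` override are hypothetical names, and the node names worker1 and worker2 are taken from the DPU output later in this guide. The helper only checks that the Node objects exist. In this deployment the workers may not report Ready until their DPUs and the host-side OVN-Kubernetes components are up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kubespray or DPF): report worker nodes that
# have not yet registered with the API server after scale.yml.
# KUBECTL may be overridden (for example with a stub) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: check_workers_registered NODE...
# Prints one line per missing node and returns 1 if any node is missing.
check_workers_registered() {
  local node missing=0
  for node in "$@"; do
    # $KUBECTL is intentionally unquoted so an override such as "echo kubectl"
    # splits into a command and its arguments.
    if ! $KUBECTL get node "$node" -o name >/dev/null 2>&1; then
      echo "not registered yet: $node"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_workers_registered worker1 worker2` prints nothing and returns 0 once both Node objects exist. Otherwise it lists the nodes that are still missing.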

  1. To follow DPU provisioning, run the following command to see which phase each DPU is currently in:

    Jump Node Console

    $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"

    Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'    jump: Tue May 20 14:54:41 2025

      Dpu Node Name:  worker1
        Last Transition Time:  2025-05-20T14:51:54Z
        Type:                  Initialized
        Last Transition Time:  2025-05-20T14:51:54Z
        Type:                  BFBReady
        Last Transition Time:  2025-05-20T14:52:09Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-05-20T14:52:10Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-05-20T14:52:11Z
        Type:                  FWConfigured
      Phase:                   OS Installing
      Dpu Node Name:  worker2
        Last Transition Time:  2025-05-20T14:50:34Z
        Type:                  Initialized
        Last Transition Time:  2025-05-20T14:50:34Z
        Type:                  BFBReady
        Last Transition Time:  2025-05-20T14:50:49Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-05-20T14:50:50Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-05-20T14:50:51Z
        Type:                  FWConfigured
      Phase:                   OS Installing

  2. Validate that the DPUs were provisioned successfully by confirming that they reach the Ready condition:

    Jump Node Console

    $ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
    dpu.provisioning.dpu.nvidia.com/worker1-0000-89-00 condition met
    dpu.provisioning.dpu.nvidia.com/worker2-0000-89-00 condition met

  3. Ensure that each of the following DaemonSets has 2 ready pods (one per worker node):

    Jump Node Console

    $ kubectl wait ds --for=jsonpath='{.status.numberReady}'=2 --namespace nvidia-network-operator kube-multus-ds sriov-network-config-daemon sriov-device-plugin
    daemonset.apps/kube-multus-ds condition met
    daemonset.apps/sriov-network-config-daemon condition met
    daemonset.apps/sriov-device-plugin condition met

    $ kubectl wait ds --for=jsonpath='{.status.numberReady}'=2 --namespace ovn-kubernetes ovn-kubernetes-node-dpu-host
    daemonset.apps/ovn-kubernetes-node-dpu-host condition met

  4. Validate that all DPUService, DPUServiceIPAM, DPUServiceInterface, and DPUServiceChain objects are now ready:

    Jump Node Console

    $ kubectl wait --for=condition=ApplicationsReady --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn
    dpuservice.svc.dpu.nvidia.com/blueman-kqm2q condition met
    dpuservice.svc.dpu.nvidia.com/dts-b8vfs condition met
    dpuservice.svc.dpu.nvidia.com/hbn-2rglk condition met
    dpuservice.svc.dpu.nvidia.com/ovn-5tr2j condition met

    $ kubectl wait --for=condition=DPUIPAMObjectReady --namespace dpf-operator-system dpuserviceipam --all
    dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
    dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met

    $ kubectl wait --for=condition=ServiceInterfaceSetReady --namespace dpf-operator-system dpuserviceinterface --all
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-p0-if-tnkf8 condition met
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-p1-if-ww8qv condition met
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-pf2dpu2-if-7l5mk condition met
    dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met

    $ kubectl wait --for=condition=ServiceChainSetReady --namespace dpf-operator-system dpuservicechain --all
    dpuservicechain.svc.dpu.nvidia.com/ovn-hbn-6lkvj condition met
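The verification commands in steps 1 to 4 above can be bundled into a script for repeated runs. The sketch below is an illustration only, with these assumptions:

- `summarize_dpu_phases`, `verify_dpf_ready`, and the `KUBECTL`, `NS`, and `TIMEOUT` variables are hypothetical names.
- The 30-minute timeout is an arbitrary choice.
- The expected DaemonSet count of 2 matches the two workers in this guide.

Each `kubectl wait` gets an explicit `--timeout` because the kubectl default of 30 seconds is usually too short while DPUs are still installing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DPF) wrapping the checks from the steps above.
# KUBECTL may be overridden, e.g. KUBECTL='echo kubectl' prints commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-dpf-operator-system}"
TIMEOUT="${TIMEOUT:-30m}"   # arbitrary default; adjust to your environment

# Condense the `kubectl describe dpu ... | grep` output from step 1 into one
# "<node><TAB><phase>" line per DPU. Reads that grep output on stdin.
summarize_dpu_phases() {
  awk -F': *' '
    /Dpu Node Name:/   { node = $2 }               # remember the current DPU node
    /^[ \t]*Phase:/    { print node "\t" $2 }      # emit node and its phase
  '
}

# Run the readiness checks from steps 2 to 4 in order; return 1 at the first failure.
# Usage: verify_dpf_ready [EXPECTED_READY_PODS]   (defaults to 2)
verify_dpf_ready() {
  local workers="${1:-2}"
  $KUBECTL wait --for=condition=ready --namespace "$NS" dpu --all \
    --timeout="$TIMEOUT" || return 1
  $KUBECTL wait ds --for=jsonpath='{.status.numberReady}'="$workers" \
    --namespace nvidia-network-operator kube-multus-ds sriov-network-config-daemon \
    sriov-device-plugin --timeout="$TIMEOUT" || return 1
  $KUBECTL wait ds --for=jsonpath='{.status.numberReady}'="$workers" \
    --namespace ovn-kubernetes ovn-kubernetes-node-dpu-host --timeout="$TIMEOUT" || return 1
  $KUBECTL wait --for=condition=ApplicationsReady --namespace "$NS" dpuservices \
    -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn \
    --timeout="$TIMEOUT" || return 1
  $KUBECTL wait --for=condition=DPUIPAMObjectReady --namespace "$NS" dpuserviceipam --all \
    --timeout="$TIMEOUT" || return 1
  $KUBECTL wait --for=condition=ServiceInterfaceSetReady --namespace "$NS" dpuserviceinterface --all \
    --timeout="$TIMEOUT" || return 1
  $KUBECTL wait --for=condition=ServiceChainSetReady --namespace "$NS" dpuservicechain --all \
    --timeout="$TIMEOUT" || return 1
}
```

Usage examples:

- `kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase' | summarize_dpu_phases` prints one line per DPU with its node name and current phase.
- `verify_dpf_ready 2` returns non-zero at the first check that does not pass within the timeout.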

Congratulations, the DPF system has been successfully installed!

© Copyright 2025, NVIDIA. Last updated on Jul 29, 2025.