RDG for DPF with OVN-Kubernetes and HBN Services

K8s Cluster Scale-out

Add Worker Nodes to the Cluster

At this point, add the worker nodes to the cluster. As workers join the cluster, their DPUs are provisioned and the DPUServices begin to spin up.

  1. Return to the shell where Kubespray was previously run to deploy the cluster, uncomment the kube_node group in the hosts.yaml file, and run the scale.yml playbook to add the worker nodes to the cluster:

    Note

    Ensure the Python virtual environment (.venv) is active when running the command.

    Jump Node Console

    (.venv) depuser@jump:~/kubespray$ cat inventory/mycluster/hosts.yaml
    ...
        k8s_cluster:
          children:
            kube_control_plane:
            kube_node:
    ...

    (.venv) depuser@jump:~/kubespray$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root scale.yml

  2. The scale-out should not take long. A successful run looks similar to the following output:

    [Figure: Kubespray scale-out run output]
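
Before running the playbook, it can help to confirm that the kube_node group is actually active in the inventory. The following is a minimal sketch, not part of Kubespray or DPF: the function name kube_node_enabled is hypothetical, and it assumes that "unmarking" the group means removing a leading # comment marker from the kube_node: lines in hosts.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (assumption: a marked group is a line
# commented out with a leading '#').
# Succeeds only if the inventory contains an active "kube_node:" line and
# no commented-out "# kube_node:" line.
kube_node_enabled() {
  local inventory="$1"

  # A leftover commented-out entry means the group is still marked.
  if grep -Eq '^[[:space:]]*#[[:space:]]*kube_node:' "$inventory"; then
    return 1
  fi

  # Require at least one active kube_node entry.
  grep -Eq '^[[:space:]]*kube_node:' "$inventory"
}
```

A possible way to use it is to gate the playbook run on the check, for example: kube_node_enabled inventory/mycluster/hosts.yaml && ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root scale.yml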

Verification

  1. To follow the progress of DPU provisioning, run the following command, which shows the current phase of each DPU:

    Jump Node Console

    $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
    Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'    jump: Tue May 20 14:54:41 2025

      Dpu Node Name:                                      worker1
        Last Transition Time:  2025-05-20T14:51:54Z
        Type:                  Initialized
        Last Transition Time:  2025-05-20T14:51:54Z
        Type:                  BFBReady
        Last Transition Time:  2025-05-20T14:52:09Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-05-20T14:52:10Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-05-20T14:52:11Z
        Type:                  FWConfigured
      Phase:  OS Installing
      Dpu Node Name:                                      worker2
        Last Transition Time:  2025-05-20T14:50:34Z
        Type:                  Initialized
        Last Transition Time:  2025-05-20T14:50:34Z
        Type:                  BFBReady
        Last Transition Time:  2025-05-20T14:50:49Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-05-20T14:50:50Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-05-20T14:50:51Z
        Type:                  FWConfigured
      Phase:  OS Installing

  2. Validate that the DPUs were provisioned successfully by confirming that they are in the ready state:

    Jump Node Console

    $ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
    dpu.provisioning.dpu.nvidia.com/worker1-0000-89-00 condition met
    dpu.provisioning.dpu.nvidia.com/worker2-0000-89-00 condition met

  3. Ensure that the following DaemonSets have 2 ready replicas:

    Jump Node Console

    $ kubectl wait ds --for=jsonpath='{.status.numberReady}'=2 --namespace nvidia-network-operator kube-multus-ds sriov-network-config-daemon sriov-device-plugin
    daemonset.apps/kube-multus-ds condition met
    daemonset.apps/sriov-network-config-daemon condition met
    daemonset.apps/sriov-device-plugin condition met

    $ kubectl wait ds --for=jsonpath='{.status.numberReady}'=2 --namespace ovn-kubernetes ovn-kubernetes-node-dpu-host
    daemonset.apps/ovn-kubernetes-node-dpu-host condition met

  4. Validate that all DPUService, DPUServiceIPAM, DPUServiceInterface, and DPUServiceChain objects have reached their ready conditions:

    Jump Node Console

    $ kubectl wait --for=condition=ApplicationsReady --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn
    dpuservice.svc.dpu.nvidia.com/blueman-kqm2q condition met
    dpuservice.svc.dpu.nvidia.com/dts-b8vfs condition met
    dpuservice.svc.dpu.nvidia.com/hbn-2rglk condition met
    dpuservice.svc.dpu.nvidia.com/ovn-5tr2j condition met

    $ kubectl wait --for=condition=DPUIPAMObjectReady --namespace dpf-operator-system dpuserviceipam --all
    dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
    dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met

    $ kubectl wait --for=condition=ServiceInterfaceSetReady --namespace dpf-operator-system dpuserviceinterface --all
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-p0-if-tnkf8 condition met
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-p1-if-ww8qv condition met
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-pf2dpu2-if-7l5mk condition met
    dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met

    $ kubectl wait --for=condition=ServiceChainSetReady --namespace dpf-operator-system dpuservicechain --all
    dpuservicechain.svc.dpu.nvidia.com/ovn-hbn-6lkvj condition met
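
The verification steps above can be wrapped in two small helpers. This is a sketch under stated assumptions, not an official DPF tool: the function names dpu_phases and wait_dpf_ready and the 20m default timeout are hypothetical. The first helper condenses the `kubectl describe dpu` output from step 1 into one line per DPU. The second re-runs the `kubectl wait` checks from steps 2 and 4 with an explicit `--timeout`, since `kubectl wait` gives up after 30 seconds by default and provisioning may still be in progress at that point.

```shell
#!/usr/bin/env bash
# Illustrative helpers; function names and the default timeout are assumptions.

# Read `kubectl describe dpu` output on stdin and print "node: phase" per DPU.
dpu_phases() {
  awk '
    /Dpu Node Name:/ { node = $NF }          # remember the current worker name
    /^[ \t]*Phase:/ {
      sub(/^[ \t]*Phase:[ \t]*/, "")         # keep only the phase text
      print node ": " $0
    }
  '
}

# Run the readiness checks from steps 2 and 4 with an explicit timeout.
# Stops at the first check that fails or times out.
wait_dpf_ready() {
  local ns="dpf-operator-system"
  local timeout="${1:-20m}"   # assumed default; adjust to your environment

  kubectl wait --for=condition=ready --namespace "$ns" --timeout="$timeout" dpu --all &&
  kubectl wait --for=condition=ApplicationsReady --namespace "$ns" --timeout="$timeout" dpuservices \
    -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn &&
  kubectl wait --for=condition=DPUIPAMObjectReady --namespace "$ns" --timeout="$timeout" dpuserviceipam --all &&
  kubectl wait --for=condition=ServiceInterfaceSetReady --namespace "$ns" --timeout="$timeout" dpuserviceinterface --all &&
  kubectl wait --for=condition=ServiceChainSetReady --namespace "$ns" --timeout="$timeout" dpuservicechain --all
}
```

For example, `kubectl describe dpu -n dpf-operator-system | dpu_phases` prints a short per-worker summary, and `wait_dpf_ready 30m` blocks until every check passes or one of them times out.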

Congratulations, the DPF system has been successfully installed!
© Copyright 2025, NVIDIA. Last updated on Jul 30, 2025.