DPF Book Template - RDG for DPF with OVN-Kubernetes and HBN Services Demo

DPU Provisioning and Service Installation

  1. Before deploying the objects under the manifests/05-dpudeployment-installation directory, make a few adjustments to achieve better performance results later:

    1. Create a new DPUFlavor using the following YAML:

      Note

      The NUM_VF_MSIX parameter is set to 48 in the provided example, which suits the HP servers used in this RDG.

      Set it to the number of physical cores in the NUMA node where the NIC is located.

      manifests/05-dpudeployment-installation/dpuflavor_perf.yaml

      ---
      apiVersion: provisioning.dpu.nvidia.com/v1alpha1
      kind: DPUFlavor
      metadata:
        name: dpf-provisioning-hbn-ovn-performance
        namespace: dpf-operator-system
      spec:
        bfcfgParameters:
        - UPDATE_ATF_UEFI=yes
        - UPDATE_DPU_OS=yes
        - WITH_NIC_FW_UPDATE=yes
        configFiles:
        - operation: override
          path: /etc/mellanox/mlnx-bf.conf
          permissions: "0644"
          raw: |
            ALLOW_SHARED_RQ="no"
            IPSEC_FULL_OFFLOAD="no"
            ENABLE_ESWITCH_MULTIPORT="yes"
        - operation: override
          path: /etc/mellanox/mlnx-ovs.conf
          permissions: "0644"
          raw: |
            CREATE_OVS_BRIDGES="no"
        - operation: override
          path: /etc/mellanox/mlnx-sf.conf
          permissions: "0644"
          raw: ""
        grub:
          kernelParameters:
          - console=hvc0
          - console=ttyAMA0
          - earlycon=pl011,0x13010000
          - fixrttc
          - net.ifnames=0
          - biosdevname=0
          - iommu.passthrough=1
          - cgroup_no_v1=net_prio,net_cls
          - hugepagesz=2048kB
          - hugepages=8072
        nvconfig:
        - device: '*'
          parameters:
          - PF_BAR2_ENABLE=0
          - PER_PF_NUM_SF=1
          - PF_TOTAL_SF=20
          - PF_SF_BAR_SIZE=10
          - NUM_PF_MSIX_VALID=0
          - PF_NUM_PF_MSIX_VALID=1
          - PF_NUM_PF_MSIX=228
          - INTERNAL_CPU_MODEL=1
          - INTERNAL_CPU_OFFLOAD_ENGINE=0
          - SRIOV_EN=1
          - NUM_OF_VFS=46
          - LAG_RESOURCE_ALLOCATION=1
          - NUM_VF_MSIX=48
        ovs:
          rawConfigScript: |
            _ovs-vsctl() {
              ovs-vsctl --no-wait --timeout 15 "$@"
            }

            _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
            _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
            _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
            _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
            _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
            _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
            _ovs-vsctl --if-exists del-br ovsbr1
            _ovs-vsctl --if-exists del-br ovsbr2
            _ovs-vsctl --may-exist add-br br-sfc
            _ovs-vsctl set bridge br-sfc datapath_type=netdev
            _ovs-vsctl set bridge br-sfc fail_mode=secure
            _ovs-vsctl --may-exist add-port br-sfc p0
            _ovs-vsctl set Interface p0 type=dpdk
            _ovs-vsctl set Interface p0 mtu_request=9216
            _ovs-vsctl set Port p0 external_ids:dpf-type=physical
            _ovs-vsctl --may-exist add-port br-sfc p1
            _ovs-vsctl set Interface p1 type=dpdk
            _ovs-vsctl set Interface p1 mtu_request=9216
            _ovs-vsctl set Port p1 external_ids:dpf-type=physical

            _ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-datapath-type=netdev
            _ovs-vsctl --may-exist add-br br-ovn
            _ovs-vsctl set bridge br-ovn datapath_type=netdev
            _ovs-vsctl set Interface br-ovn mtu_request=9216
            _ovs-vsctl --may-exist add-port br-ovn pf0hpf
            _ovs-vsctl set Interface pf0hpf type=dpdk
            _ovs-vsctl set Interface pf0hpf mtu_request=9216

            cat <<EOT > /etc/netplan/99-dpf-comm-ch.yaml
            network:
              renderer: networkd
              version: 2
              ethernets:
                pf0vf0:
                  mtu: 9000
                  dhcp4: no
              bridges:
                br-comm-ch:
                  dhcp4: yes
                  interfaces:
                    - pf0vf0
            EOT
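The note on NUM_VF_MSIX asks for the number of physical cores in the NIC's NUMA node. One way to obtain that number on a worker host is sketched below. The helper name count_numa_cores is hypothetical; it only assumes that lscpu is available and that its parseable output lists one CORE,NODE pair per logical CPU.

```shell
# Count the unique physical cores that belong to one NUMA node.
# stdin: CSV in the format printed by `lscpu -p=CORE,NODE`
# $1:    NUMA node id to count cores for
count_numa_cores() {
  awk -F, -v node="$1" '
    /^#/ { next }                          # skip the lscpu header comments
    $2 == node && !seen[$1]++ { n++ }      # count each core id only once
    END { print n + 0 }
  '
}

# Example usage on a worker host (replace <host-interface> with the
# network interface of the DPU as seen by the host):
#   node=$(cat /sys/class/net/<host-interface>/device/numa_node)
#   lscpu -p=CORE,NODE | count_numa_cores "$node"
```

Hyper-threaded siblings share a core id, so they are counted once. On single-node systems the numa_node file may report -1, in which case the total physical core count of the host is the relevant figure.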

    2. Adjust dpudeployment.yaml to reference the performance-oriented DPUFlavor. The DPUDeployment provisions DPUs on the worker nodes and describes the set of DPUServices and the DPUServiceChain that run on those DPUs:

      manifests/05-dpudeployment-installation/dpudeployment.yaml

      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUDeployment
      metadata:
        name: ovn-hbn
        namespace: dpf-operator-system
      spec:
        dpus:
          bfb: bf-bundle
          flavor: dpf-provisioning-hbn-ovn-performance
          dpuSets:
          - nameSuffix: "dpuset1"
            nodeSelector:
              matchLabels:
                feature.node.kubernetes.io/dpu-enabled: "true"
        services:
          ovn:
            serviceTemplate: ovn
            serviceConfiguration: ovn
          hbn:
            serviceTemplate: hbn
            serviceConfiguration: hbn
          dts:
            serviceTemplate: dts
            serviceConfiguration: dts
          blueman:
            serviceTemplate: blueman
            serviceConfiguration: blueman
        serviceChains:
          switches:
            - ports:
              - serviceInterface:
                  matchLabels:
                    uplink: p0
              - service:
                  name: hbn
                  interface: p0_if
            - ports:
              - serviceInterface:
                  matchLabels:
                    uplink: p1
              - service:
                  name: hbn
                  interface: p1_if
            - ports:
              - serviceInterface:
                  matchLabels:
                    port: ovn
              - service:
                  name: hbn
                  interface: pf2dpu2_if
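Because the DPUDeployment refers to the DPUFlavor by name, a typo in either file only surfaces after the objects are applied. The sketch below compares the two names beforehand. The helper first_value is hypothetical and deliberately simple: it is a line-based lookup, not a YAML parser, and it assumes that the first name: line in dpuflavor_perf.yaml is metadata.name.

```shell
# Print the value of the first "KEY: value" line on stdin, without quotes.
# $1: key to look for (for example "flavor" or "name")
first_value() {
  awk -v key="$1" '
    $1 == (key ":") { gsub(/"/, "", $2); print $2; exit }
  '
}

# Example usage from the repository root:
#   want=$(first_value name < manifests/05-dpudeployment-installation/dpuflavor_perf.yaml)
#   got=$(first_value flavor < manifests/05-dpudeployment-installation/dpudeployment.yaml)
#   [ "$want" = "$got" ] || echo "flavor mismatch: '$got' vs '$want'"
```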

    3. Set mtu to 8940 in the OVN DPUServiceConfiguration so that the OVN-Kubernetes workloads on the DPU are deployed with the same MTU as the host:

      manifests/05-dpudeployment-installation/dpuserviceconfig_ovn.yaml

      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceConfiguration
      metadata:
        name: ovn
        namespace: dpf-operator-system
      spec:
        deploymentServiceName: "ovn"
        serviceConfiguration:
          helmChart:
            values:
              k8sAPIServer: https://$TARGETCLUSTER_API_SERVER_HOST:$TARGETCLUSTER_API_SERVER_PORT
              podNetwork: $POD_CIDR/24
              serviceNetwork: $SERVICE_CIDR
              mtu: 8940
              dpuManifests:
                kubernetesSecretName: "ovn-dpu" # user needs to populate based on DPUServiceCredentialRequest
                vtepCIDR: "10.0.120.0/22" # user needs to populate based on DPUServiceIPAM
                hostCIDR: $TARGETCLUSTER_NODE_CIDR # user needs to populate
                ipamPool: "pool1" # user needs to populate based on DPUServiceIPAM
                ipamPoolType: "cidrpool" # user needs to populate based on DPUServiceIPAM
                ipamVTEPIPIndex: 0
                ipamPFIPIndex: 1
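The manifests in this section set MTUs in several places: mtu_request=9216 on the OVS ports, mtu: 9000 on the HBN and netplan interfaces, and mtu: 8940 for OVN. A quick way to see all of them side by side is sketched below. The helper name list_mtus is hypothetical, and the pattern only recognizes the two spellings used in these manifests.

```shell
# Print every distinct MTU value found on stdin, smallest first.
# Matches both YAML keys ("mtu: 9000") and ovs-vsctl arguments
# ("mtu_request=9216").
list_mtus() {
  grep -oE 'mtu(_request)?[:=] *[0-9]+' | grep -oE '[0-9]+$' | sort -n -u
}

# Example usage from the repository root:
#   cat manifests/05-dpudeployment-installation/*.yaml | list_mtus
```

With the files shown here, the expected values are 8940, 9000, and 9216. The OVN value sits 60 bytes below the 9000-byte interfaces; that gap is presumably headroom for encapsulation, although the source does not say so explicitly.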

    4. The rest of the configuration files remain the same, including:

      • BFB to download BlueField Bitstream to a shared volume.

        manifests/05-dpudeployment-installation/bfb.yaml

        ---
        apiVersion: provisioning.dpu.nvidia.com/v1alpha1
        kind: BFB
        metadata:
          name: bf-bundle
          namespace: dpf-operator-system
        spec:
          url: $BLUEFIELD_BITSTREAM

      • OVN DPUServiceTemplate to deploy OVN Kubernetes workloads to the DPUs.

        manifests/05-dpudeployment-installation/dpuservicetemplate_ovn.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceTemplate
        metadata:
          name: ovn
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "ovn"
          helmChart:
            source:
              repoURL: $OVN_KUBERNETES_REPO_URL
              chart: ovn-kubernetes-chart
              version: $TAG
            values:
              commonManifests:
                enabled: true
              dpuManifests:
                enabled: true
                leaseNamespace: "ovn-kubernetes"
                gatewayOpts: "--gateway-interface=br-ovn --gateway-uplink-port=puplinkbrovn"

      • HBN DPUServiceConfiguration and DPUServiceTemplate to deploy HBN workloads to the DPUs.

        manifests/05-dpudeployment-installation/dpuserviceconfig_hbn.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceConfiguration
        metadata:
          name: hbn
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "hbn"
          serviceConfiguration:
            serviceDaemonSet:
              annotations:
                k8s.v1.cni.cncf.io/networks: |-
                  [
                  {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
                  {"name": "iprequest", "interface": "ip_pf2dpu2", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
                  ]
            helmChart:
              values:
                configuration:
                  perDPUValuesYAML: |
                    - hostnamePattern: "*"
                      values:
                        bgp_peer_group: hbn
                    - hostnamePattern: "worker1*"
                      values:
                        bgp_autonomous_system: 65101
                    - hostnamePattern: "worker2*"
                      values:
                        bgp_autonomous_system: 65201
                  startupYAMLJ2: |
                    - header:
                        model: BLUEFIELD
                        nvue-api-version: nvue_v1
                        rev-id: 1.0
                        version: HBN 2.4.0
                    - set:
                        interface:
                          lo:
                            ip:
                              address:
                                {{ ipaddresses.ip_lo.ip }}/32: {}
                            type: loopback
                          p0_if,p1_if:
                            type: swp
                            link:
                              mtu: 9000
                          pf2dpu2_if:
                            ip:
                              address:
                                {{ ipaddresses.ip_pf2dpu2.cidr }}: {}
                            type: swp
                            link:
                              mtu: 9000
                        router:
                          bgp:
                            autonomous-system: {{ config.bgp_autonomous_system }}
                            enable: on
                            graceful-restart:
                              mode: full
                            router-id: {{ ipaddresses.ip_lo.ip }}
                        vrf:
                          default:
                            router:
                              bgp:
                                address-family:
                                  ipv4-unicast:
                                    enable: on
                                    redistribute:
                                      connected:
                                        enable: on
                                  ipv6-unicast:
                                    enable: on
                                    redistribute:
                                      connected:
                                        enable: on
                                enable: on
                                neighbor:
                                  p0_if:
                                    peer-group: {{ config.bgp_peer_group }}
                                    type: unnumbered
                                  p1_if:
                                    peer-group: {{ config.bgp_peer_group }}
                                    type: unnumbered
                                path-selection:
                                  multipath:
                                    aspath-ignore: on
                                peer-group:
                                  {{ config.bgp_peer_group }}:
                                    remote-as: external

            interfaces:
            ## NOTE: Interfaces inside the HBN pod must have the `_if` suffix due to a naming convention in HBN.
            - name: p0_if
              network: mybrhbn
            - name: p1_if
              network: mybrhbn
            - name: pf2dpu2_if
              network: mybrhbn

        manifests/05-dpudeployment-installation/dpuservicetemplate_hbn.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceTemplate
        metadata:
          name: hbn
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "hbn"
          helmChart:
            source:
              repoURL: $HELM_REGISTRY_REPO_URL
              version: 1.0.2
              chart: doca-hbn
            values:
              image:
                repository: $HBN_NGC_IMAGE_URL
                tag: 3.0.0-doca3.0.0
              resources:
                memory: 6Gi
                nvidia.com/bf_sf: 3

      • DOCA Telemetry Service (DTS) DPUServiceConfiguration and DPUServiceTemplate to deploy DTS to the DPUs.

        manifests/05-dpudeployment-installation/dpuserviceconfig_dts.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceConfiguration
        metadata:
          name: dts
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "dts"

        manifests/05-dpudeployment-installation/dpuservicetemplate_dts.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceTemplate
        metadata:
          name: dts
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "dts"
          helmChart:
            source:
              repoURL: $HELM_REGISTRY_REPO_URL
              version: 1.0.6
              chart: doca-telemetry

      • Blueman DPUServiceConfiguration and DPUServiceTemplate to deploy Blueman to the DPUs.

        manifests/05-dpudeployment-installation/dpuserviceconfig_blueman.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceConfiguration
        metadata:
          name: blueman
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "blueman"

        manifests/05-dpudeployment-installation/dpuservicetemplate_blueman.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceTemplate
        metadata:
          name: blueman
          namespace: dpf-operator-system
        spec:
          deploymentServiceName: "blueman"
          helmChart:
            source:
              repoURL: $HELM_REGISTRY_REPO_URL
              version: 1.0.8
              chart: doca-blueman

      • OVN DPUServiceCredentialRequest to allow cross-cluster communication.

        manifests/05-dpudeployment-installation/ovn-credentials.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceCredentialRequest
        metadata:
          name: ovn-dpu
          namespace: dpf-operator-system
        spec:
          serviceAccount:
            name: ovn-dpu
            namespace: dpf-operator-system
          duration: 24h
          type: tokenFile
          secret:
            name: ovn-dpu
            namespace: dpf-operator-system
          metadata:
            labels:
              dpu.nvidia.com/image-pull-secret: ""

      • DPUServiceInterfaces for physical ports on the DPU.

        manifests/05-dpudeployment-installation/physical-ifaces.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceInterface
        metadata:
          name: p0
          namespace: dpf-operator-system
        spec:
          template:
            spec:
              template:
                metadata:
                  labels:
                    uplink: "p0"
                spec:
                  interfaceType: physical
                  physical:
                    interfaceName: p0
        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceInterface
        metadata:
          name: p1
          namespace: dpf-operator-system
        spec:
          template:
            spec:
              template:
                metadata:
                  labels:
                    uplink: "p1"
                spec:
                  interfaceType: physical
                  physical:
                    interfaceName: p1

      • OVN DPUServiceInterface to define the ports attached to OVN workloads on the DPU.

        manifests/05-dpudeployment-installation/ovn-iface.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceInterface
        metadata:
          name: ovn
          namespace: dpf-operator-system
        spec:
          template:
            spec:
              template:
                metadata:
                  labels:
                    port: ovn
                spec:
                  interfaceType: ovn

      • DPUServiceIPAM to set up IP Address Management on the DPUCluster.

        manifests/05-dpudeployment-installation/hbn-ovn-ipam.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceIPAM
        metadata:
          name: pool1
          namespace: dpf-operator-system
        spec:
          ipv4Network:
            network: "10.0.120.0/22"
            gatewayIndex: 3
            prefixSize: 29

      • DPUServiceIPAM for the loopback interface in HBN.

        manifests/05-dpudeployment-installation/hbn-loopback-ipam.yaml

        ---
        apiVersion: svc.dpu.nvidia.com/v1alpha1
        kind: DPUServiceIPAM
        metadata:
          name: loopback
          namespace: dpf-operator-system
        spec:
          ipv4Network:
            network: "11.0.0.0/24"
            prefixSize: 32
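The two DPUServiceIPAM objects above determine how many allocations each pool can serve. The arithmetic is sketched below, assuming that prefixSize is the size of each block carved out of network and that gatewayIndex, ipamVTEPIPIndex, and ipamPFIPIndex are offsets inside such a block. The function names are hypothetical.

```shell
# Number of /$2 blocks that fit into a /$1 network.
subnet_count() { echo $(( 1 << ($2 - $1) )); }

# Number of IPv4 addresses in one /$1 block.
subnet_size() { echo $(( 1 << (32 - $1) )); }

subnet_count 22 29   # pool1: /29 blocks inside 10.0.120.0/22
subnet_size 29       # pool1: addresses per /29 block
subnet_count 24 32   # loopback: /32 allocations inside 11.0.0.0/24
```

Under these assumptions, pool1 provides 128 blocks of 8 addresses each, which leaves room for the gateway at index 3 and the OVN indexes 0 and 1, and the loopback pool provides 256 single addresses.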

  2. Apply all of the YAML files mentioned above using the following command:

    Jump Node Console

    $ cat manifests/05-dpudeployment-installation/*.yaml | envsubst | kubectl apply -f -
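envsubst replaces a variable that is not set with an empty string and does not report an error, so a forgotten export such as $BLUEFIELD_BITSTREAM only shows up later as a malformed object. A minimal pre-flight check is sketched below; the helper name missing_vars is hypothetical.

```shell
# Print every $VAR or ${VAR} referenced on stdin that is unset or empty
# in the current environment, one name per line.
missing_vars() {
  grep -oE '\$\{?[A-Za-z_][A-Za-z0-9_]*\}?' \
    | tr -d '${}' \
    | sort -u \
    | while read -r name; do
        # name contains only identifier characters, so eval is safe here
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
      done
}

# Example usage from the repository root, before running the apply command:
#   cat manifests/05-dpudeployment-installation/*.yaml | missing_vars
```

An empty result means every referenced variable has a value. Shell constructs such as "$@" inside the DPUFlavor script are ignored because they are not valid variable names.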

  3. Verify the installation by confirming that the DPUServices, DPUServiceIPAMs, DPUServiceInterfaces, and DPUServiceChains have all been created and reconciled:

    Note

    These verification commands may need to be run multiple times to ensure the conditions are met.

    Jump Node Console

    $ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn
    dpuservice.svc.dpu.nvidia.com/blueman-kqm2q condition met
    dpuservice.svc.dpu.nvidia.com/dts-b8vfs condition met
    dpuservice.svc.dpu.nvidia.com/hbn-2rglk condition met
    dpuservice.svc.dpu.nvidia.com/ovn-5tr2j condition met

    $ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
    dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
    dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met

    $ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-p0-if-tnkf8 condition met
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-p1-if-ww8qv condition met
    dpuserviceinterface.svc.dpu.nvidia.com/hbn-pf2dpu2-if-7l5mk condition met
    dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met

    $ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
    dpuservicechain.svc.dpu.nvidia.com/ovn-hbn-6lkvj condition met
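Since the verification commands may need several runs before the conditions are met, a small retry wrapper can save manual repetition. The sketch below is one possible shape; the function name retry and the attempt and delay values in the usage example are illustrative choices, not values from this guide.

```shell
# Re-run a command until it succeeds or the attempt budget is used up.
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: the command to run
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1   # budget exhausted, report failure
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

# Example usage on the jump node:
#   retry 10 30 kubectl wait --for=condition=DPUIPAMObjectReconciled \
#     --namespace dpf-operator-system dpuserviceipam --all
```

As an alternative, kubectl wait accepts a --timeout flag, which lets a single invocation wait longer before giving up.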

© Copyright 2025, NVIDIA. Last updated on Jul 10, 2025.