DPU Provisioning and Service Installation
Before deploying the objects under the manifests/05-dpudeployment-installation directory, a few adjustments need to be made to achieve better performance results later. Create a new DPUFlavor using the following YAML:
Note: The parameter NUM_VF_MSIX is configured to be 48 in the provided example, which is suited for the HP servers that were used in this RDG. Set it to the number of physical cores in the NUMA node the NIC is located in.
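To determine a suitable value, you can query which NUMA node the NIC sits on and count the CPUs in that node. A minimal sketch, run on the worker host (the interface name and output below are illustrative; on hyper-threaded systems, count physical cores rather than logical CPUs):

Worker Node Console

$ # Which NUMA node hosts the NIC? (-1 means no NUMA affinity)
$ cat /sys/class/net/ens1f0np0/device/numa_node
0
$ # List the CPUs belonging to that NUMA node
$ lscpu | grep 'NUMA node0 CPU(s)'
NUMA node0 CPU(s): 0-47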
manifests/05-dpudeployment-installation/dpuflavor_perf.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: dpf-provisioning-hbn-ovn-performance
  namespace: dpf-operator-system
spec:
  bfcfgParameters:
    - UPDATE_ATF_UEFI=yes
    - UPDATE_DPU_OS=yes
    - WITH_NIC_FW_UPDATE=yes
  configFiles:
    - operation: override
      path: /etc/mellanox/mlnx-bf.conf
      permissions: "0644"
      raw: |
        ALLOW_SHARED_RQ="no"
        IPSEC_FULL_OFFLOAD="no"
        ENABLE_ESWITCH_MULTIPORT="yes"
    - operation: override
      path: /etc/mellanox/mlnx-ovs.conf
      permissions: "0644"
      raw: |
        CREATE_OVS_BRIDGES="no"
    - operation: override
      path: /etc/mellanox/mlnx-sf.conf
      permissions: "0644"
      raw: ""
  grub:
    kernelParameters:
      - console=hvc0
      - console=ttyAMA0
      - earlycon=pl011,0x13010000
      - fixrttc
      - net.ifnames=0
      - biosdevname=0
      - iommu.passthrough=1
      - cgroup_no_v1=net_prio,net_cls
      - hugepagesz=2048kB
      - hugepages=8072
  nvconfig:
    - device: "*"
      parameters:
        - PF_BAR2_ENABLE=0
        - PER_PF_NUM_SF=1
        - PF_TOTAL_SF=20
        - PF_SF_BAR_SIZE=10
        - NUM_PF_MSIX_VALID=0
        - PF_NUM_PF_MSIX_VALID=1
        - PF_NUM_PF_MSIX=228
        - INTERNAL_CPU_MODEL=1
        - INTERNAL_CPU_OFFLOAD_ENGINE=0
        - SRIOV_EN=1
        - NUM_OF_VFS=46
        - LAG_RESOURCE_ALLOCATION=1
        - NUM_VF_MSIX=48
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }
      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
      _ovs-vsctl --may-exist add-port br-sfc p1
      _ovs-vsctl set Interface p1 type=dpdk
      _ovs-vsctl set Interface p1 mtu_request=9216
      _ovs-vsctl set Port p1 external_ids:dpf-type=physical
      _ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-datapath-type=netdev
      _ovs-vsctl --may-exist add-br br-ovn
      _ovs-vsctl set bridge br-ovn datapath_type=netdev
      _ovs-vsctl set Interface br-ovn mtu_request=9216
      _ovs-vsctl --may-exist add-port br-ovn pf0hpf
      _ovs-vsctl set Interface pf0hpf type=dpdk
      _ovs-vsctl set Interface pf0hpf mtu_request=9216
      cat <<EOT > /etc/netplan/99-dpf-comm-ch.yaml
      network:
        renderer: networkd
        version: 2
        ethernets:
          pf0vf0:
            mtu: 9000
            dhcp4: no
        bridges:
          br-comm-ch:
            dhcp4: yes
            interfaces:
              - pf0vf0
      EOT
Adjust dpudeployment.yaml to reference the DPUFlavor suited for performance (this component provisions DPUs on the worker nodes and describes a set of DPUServices and a DPUServiceChain that run on those DPUs):
manifests/05-dpudeployment-installation/dpudeployment.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUDeployment
metadata:
  name: ovn-hbn
  namespace: dpf-operator-system
spec:
  dpus:
    bfb: bf-bundle
    flavor: dpf-provisioning-hbn-ovn-performance
    dpuSets:
      - nameSuffix: "dpuset1"
        nodeSelector:
          matchLabels:
            feature.node.kubernetes.io/dpu-enabled: "true"
  services:
    ovn:
      serviceTemplate: ovn
      serviceConfiguration: ovn
    hbn:
      serviceTemplate: hbn
      serviceConfiguration: hbn
    dts:
      serviceTemplate: dts
      serviceConfiguration: dts
    blueman:
      serviceTemplate: blueman
      serviceConfiguration: blueman
  serviceChains:
    switches:
      - ports:
          - serviceInterface:
              matchLabels:
                uplink: p0
          - service:
              name: hbn
              interface: p0_if
      - ports:
          - serviceInterface:
              matchLabels:
                uplink: p1
          - service:
              name: hbn
              interface: p1_if
      - ports:
          - serviceInterface:
              matchLabels:
                port: ovn
          - service:
              name: hbn
              interface: pf2dpu2_if
Set the mtu to 8940 for the OVN DPUServiceConfig, so that the OVN Kubernetes workloads deployed on the DPU use the same MTU as in the host (8940 leaves 60 bytes of the 9000-byte host MTU as headroom for the Geneve encapsulation overhead):
manifests/05-dpudeployment-installation/dpuserviceconfig_ovn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: ovn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "ovn"
  serviceConfiguration:
    helmChart:
      values:
        k8sAPIServer: https://$TARGETCLUSTER_API_SERVER_HOST:$TARGETCLUSTER_API_SERVER_PORT
        podNetwork: $POD_CIDR/24
        serviceNetwork: $SERVICE_CIDR
        mtu: 8940
        dpuManifests:
          kubernetesSecretName: "ovn-dpu" # user needs to populate based on DPUServiceCredentialRequest
          vtepCIDR: "10.0.120.0/22" # user needs to populate based on DPUServiceIPAM
          hostCIDR: $TARGETCLUSTER_NODE_CIDR # user needs to populate
          ipamPool: "pool1" # user needs to populate based on DPUServiceIPAM
          ipamPoolType: "cidrpool" # user needs to populate based on DPUServiceIPAM
          ipamVTEPIPIndex: 0
          ipamPFIPIndex: 1
The rest of the configuration files remain the same, including:
BFB to download the BlueField Bitstream to a shared volume.
manifests/05-dpudeployment-installation/bfb.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: BFB
metadata:
  name: bf-bundle
  namespace: dpf-operator-system
spec:
  url: $BLUEFIELD_BITSTREAM
OVN DPUServiceTemplate to deploy OVN Kubernetes workloads to the DPUs.
manifests/05-dpudeployment-installation/dpuservicetemplate_ovn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: ovn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "ovn"
  helmChart:
    source:
      repoURL: $OVN_KUBERNETES_REPO_URL
      chart: ovn-kubernetes-chart
      version: $TAG
    values:
      commonManifests:
        enabled: true
      dpuManifests:
        enabled: true
        leaseNamespace: "ovn-kubernetes"
        gatewayOpts: "--gateway-interface=br-ovn --gateway-uplink-port=puplinkbrovn"
HBN DPUServiceConfig and DPUServiceTemplate to deploy HBN workloads to the DPUs.
manifests/05-dpudeployment-installation/dpuserviceconfig_hbn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "hbn"
  serviceConfiguration:
    serviceDaemonSet:
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
            {"name": "iprequest", "interface": "ip_pf2dpu2", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
          ]
    helmChart:
      values:
        configuration:
          perDPUValuesYAML: |
            - hostnamePattern: "*"
              values:
                bgp_peer_group: hbn
            - hostnamePattern: "worker1*"
              values:
                bgp_autonomous_system: 65101
            - hostnamePattern: "worker2*"
              values:
                bgp_autonomous_system: 65201
          startupYAMLJ2: |
            - header:
                model: BLUEFIELD
                nvue-api-version: nvue_v1
                rev-id: 1.0
                version: HBN 2.4.0
            - set:
                interface:
                  lo:
                    ip:
                      address:
                        {{ ipaddresses.ip_lo.ip }}/32: {}
                    type: loopback
                  p0_if,p1_if:
                    type: swp
                    link:
                      mtu: 9000
                  pf2dpu2_if:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf2dpu2.cidr }}: {}
                    type: swp
                    link:
                      mtu: 9000
                router:
                  bgp:
                    autonomous-system: {{ config.bgp_autonomous_system }}
                    enable: on
                    graceful-restart:
                      mode: full
                    router-id: {{ ipaddresses.ip_lo.ip }}
                vrf:
                  default:
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                          ipv6-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                        enable: on
                        neighbor:
                          p0_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                          p1_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                        path-selection:
                          multipath:
                            aspath-ignore: on
                        peer-group:
                          {{ config.bgp_peer_group }}:
                            remote-as: external
  interfaces:
    ## NOTE: Interfaces inside the HBN pod must have the `_if` suffix due to a naming convention in HBN.
    - name: p0_if
      network: mybrhbn
    - name: p1_if
      network: mybrhbn
    - name: pf2dpu2_if
      network: mybrhbn
manifests/05-dpudeployment-installation/dpuservicetemplate_hbn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "hbn"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.2
      chart: doca-hbn
    values:
      image:
        repository: $HBN_NGC_IMAGE_URL
        tag: 3.0.0-doca3.0.0
      resources:
        memory: 6Gi
        nvidia.com/bf_sf: 3
DOCA Telemetry Service (DTS) DPUServiceConfig and DPUServiceTemplate to deploy DTS to the DPUs.
manifests/05-dpudeployment-installation/dpuserviceconfig_dts.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: dts
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "dts"
manifests/05-dpudeployment-installation/dpuservicetemplate_dts.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: dts
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "dts"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.6
      chart: doca-telemetry
Blueman DPUServiceConfig and DPUServiceTemplate to deploy Blueman to the DPUs.
manifests/05-dpudeployment-installation/dpuserviceconfig_blueman.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: blueman
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "blueman"
manifests/05-dpudeployment-installation/dpuservicetemplate_blueman.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: blueman
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "blueman"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.8
      chart: doca-blueman
OVN DPUServiceCredentialRequest to allow cross-cluster communication.
manifests/05-dpudeployment-installation/ovn-credentials.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceCredentialRequest
metadata:
  name: ovn-dpu
  namespace: dpf-operator-system
spec:
  serviceAccount:
    name: ovn-dpu
    namespace: dpf-operator-system
  duration: 24h
  type: tokenFile
  secret:
    name: ovn-dpu
    namespace: dpf-operator-system
  metadata:
    labels:
      dpu.nvidia.com/image-pull-secret: ""
DPUServiceInterfaces for physical ports on the DPU.
manifests/05-dpudeployment-installation/physical-ifaces.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p0"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p1
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p1"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p1
OVN DPUServiceInterface to define the ports attached to OVN workloads on the DPU.
manifests/05-dpudeployment-installation/ovn-iface.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: ovn
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            port: ovn
        spec:
          interfaceType: ovn
DPUServiceIPAM to set up IP Address Management on the DPUCluster.
manifests/05-dpudeployment-installation/hbn-ovn-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool1
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.120.0/22"
    gatewayIndex: 3
    prefixSize: 29
DPUServiceIPAM for the loopback interface in HBN.
manifests/05-dpudeployment-installation/hbn-loopback-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: loopback
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "11.0.0.0/24"
    prefixSize: 32
Apply all of the YAML files mentioned above using the following command:
Jump Node Console
$ cat manifests/05-dpudeployment-installation/*.yaml | envsubst | kubectl apply -f -
Verify the DPUService installation by ensuring that the DPUServices have been created and reconciled, and that the DPUServiceIPAMs, DPUServiceInterfaces, and DPUServiceChains have all been reconciled:
Note: These verification commands may need to be run multiple times until all conditions are met.
Jump Node Console
$ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn
dpuservice.svc.dpu.nvidia.com/blueman-kqm2q condition met
dpuservice.svc.dpu.nvidia.com/dts-b8vfs condition met
dpuservice.svc.dpu.nvidia.com/hbn-2rglk condition met
dpuservice.svc.dpu.nvidia.com/ovn-5tr2j condition met

$ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met

$ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
dpuserviceinterface.svc.dpu.nvidia.com/hbn-p0-if-tnkf8 condition met
dpuserviceinterface.svc.dpu.nvidia.com/hbn-p1-if-ww8qv condition met
dpuserviceinterface.svc.dpu.nvidia.com/hbn-pf2dpu2-if-7l5mk condition met
dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met

$ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
dpuservicechain.svc.dpu.nvidia.com/ovn-hbn-6lkvj condition met
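As an optional additional check, the provisioning progress of the DPUs themselves can be followed; assuming the DPU provisioning CRDs from the DPF release used in this RDG, the DPU objects should eventually report a ready state:

Jump Node Console

$ kubectl get dpus --namespace dpf-operator-system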