DPU Provisioning and Service Installation
Before deploying the objects under the
manifests/05-dpudeployment-installation
directory, a few adjustments need to be made to achieve better performance results. Create a new DPUFlavor using the following YAML:
Note: The parameter
NUM_VF_MSIX
is set to 48 in the provided example, which suits the HP servers used in this RDG. Set it to the number of physical cores in the NUMA node where the NIC is located.
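To pick a suitable value, you can check on the worker node host which NUMA node the NIC is attached to and how many cores that node has. A minimal sketch, assuming the NIC's PCI address is 0000:ca:00.0 (a placeholder, replace with your device):
Worker Node Console
$ # NUMA node the NIC is attached to
$ cat /sys/bus/pci/devices/0000:ca:00.0/numa_node
$ # CPU layout per NUMA node; count the physical cores of the node reported above
$ lscpu | grep -i "numa node"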
manifests/05-dpudeployment-installation/dpuflavor_perf.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: dpf-provisioning-hbn-ovn-performance
  namespace: dpf-operator-system
spec:
  bfcfgParameters:
    - UPDATE_ATF_UEFI=yes
    - UPDATE_DPU_OS=yes
    - WITH_NIC_FW_UPDATE=yes
  configFiles:
    - operation: override
      path: /etc/mellanox/mlnx-bf.conf
      permissions: "0644"
      raw: |
        ALLOW_SHARED_RQ="no"
        IPSEC_FULL_OFFLOAD="no"
        ENABLE_ESWITCH_MULTIPORT="yes"
    - operation: override
      path: /etc/mellanox/mlnx-ovs.conf
      permissions: "0644"
      raw: |
        CREATE_OVS_BRIDGES="no"
    - operation: override
      path: /etc/mellanox/mlnx-sf.conf
      permissions: "0644"
      raw: ""
  grub:
    kernelParameters:
      - console=hvc0
      - console=ttyAMA0
      - earlycon=pl011,0x13010000
      - fixrttc
      - net.ifnames=0
      - biosdevname=0
      - iommu.passthrough=1
      - cgroup_no_v1=net_prio,net_cls
      - hugepagesz=2048kB
      - hugepages=8072
  nvconfig:
    - device: '*'
      parameters:
        - PF_BAR2_ENABLE=0
        - PER_PF_NUM_SF=1
        - PF_TOTAL_SF=20
        - PF_SF_BAR_SIZE=10
        - NUM_PF_MSIX_VALID=0
        - PF_NUM_PF_MSIX_VALID=1
        - PF_NUM_PF_MSIX=228
        - INTERNAL_CPU_MODEL=1
        - INTERNAL_CPU_OFFLOAD_ENGINE=0
        - SRIOV_EN=1
        - NUM_OF_VFS=46
        - LAG_RESOURCE_ALLOCATION=1
        - NUM_VF_MSIX=48
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }

      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
      _ovs-vsctl --may-exist add-port br-sfc p1
      _ovs-vsctl set Interface p1 type=dpdk
      _ovs-vsctl set Interface p1 mtu_request=9216
      _ovs-vsctl set Port p1 external_ids:dpf-type=physical
      _ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-datapath-type=netdev
      _ovs-vsctl --may-exist add-br br-ovn
      _ovs-vsctl set bridge br-ovn datapath_type=netdev
      _ovs-vsctl set Interface br-ovn mtu_request=9216
      _ovs-vsctl --may-exist add-port br-ovn pf0hpf
      _ovs-vsctl set Interface pf0hpf type=dpdk
      _ovs-vsctl set Interface pf0hpf mtu_request=9216

      cat <<EOT > /etc/netplan/99-dpf-comm-ch.yaml
      network:
        renderer: networkd
        version: 2
        ethernets:
          pf0vf0:
            mtu: 9000
            dhcp4: no
        bridges:
          br-comm-ch:
            dhcp4: yes
            interfaces:
              - pf0vf0
      EOT
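After a DPU is provisioned with this flavor, the applied NVConfig values can be spot-checked from the DPU with mlxconfig. A sketch, assuming the MST device name /dev/mst/mt41692_pciconf0 (a placeholder; list yours with mst status):
DPU Console
$ mst start
$ # Query a subset of the parameters set by the DPUFlavor above
$ mlxconfig -d /dev/mst/mt41692_pciconf0 q NUM_VF_MSIX NUM_OF_VFS SRIOV_EN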
Adjust
dpudeployment.yaml
to reference the DPUFlavor suited for performance. This component provisions DPUs on the worker nodes and describes a set of DPUServices and a DPUServiceChain that run on those DPUs:
manifests/05-dpudeployment-installation/dpudeployment.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUDeployment
metadata:
  name: ovn-hbn
  namespace: dpf-operator-system
spec:
  dpus:
    bfb: bf-bundle
    flavor: dpf-provisioning-hbn-ovn-performance
    dpuSets:
      - nameSuffix: "dpuset1"
        nodeSelector:
          matchLabels:
            feature.node.kubernetes.io/dpu-enabled: "true"
  services:
    ovn:
      serviceTemplate: ovn
      serviceConfiguration: ovn
    hbn:
      serviceTemplate: hbn
      serviceConfiguration: hbn
    dts:
      serviceTemplate: dts
      serviceConfiguration: dts
    blueman:
      serviceTemplate: blueman
      serviceConfiguration: blueman
  serviceChains:
    switches:
      - ports:
          - serviceInterface:
              matchLabels:
                uplink: p0
          - service:
              name: hbn
              interface: p0_if
      - ports:
          - serviceInterface:
              matchLabels:
                uplink: p1
          - service:
              name: hbn
              interface: p1_if
      - ports:
          - serviceInterface:
              matchLabels:
                port: ovn
          - service:
              name: hbn
              interface: pf2dpu2_if
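Once applied, the DPUDeployment and the DPUSets it renders can be sanity-checked from the jump node with plain kubectl get against the custom resources defined above:
Jump Node Console
$ kubectl get dpudeployments -n dpf-operator-system
$ kubectl get dpusets -n dpf-operator-system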
Set the
mtu
to 8940
for the OVN DPUServiceConfiguration (to deploy the OVN Kubernetes workloads on the DPU with the same MTU as on the host):
manifests/05-dpudeployment-installation/dpuserviceconfig_ovn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: ovn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "ovn"
  serviceConfiguration:
    helmChart:
      values:
        k8sAPIServer: https://$TARGETCLUSTER_API_SERVER_HOST:$TARGETCLUSTER_API_SERVER_PORT
        podNetwork: $POD_CIDR/24
        serviceNetwork: $SERVICE_CIDR
        mtu: 8940
        dpuManifests:
          kubernetesSecretName: "ovn-dpu" # user needs to populate based on DPUServiceCredentialRequest
          vtepCIDR: "10.0.120.0/22" # user needs to populate based on DPUServiceIPAM
          hostCIDR: $TARGETCLUSTER_NODE_CIDR # user needs to populate
          ipamPool: "pool1" # user needs to populate based on DPUServiceIPAM
          ipamPoolType: "cidrpool" # user needs to populate based on DPUServiceIPAM
          ipamVTEPIPIndex: 0
          ipamPFIPIndex: 1
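The 8940-byte value leaves 60 bytes of headroom below the 9000-byte host MTU, presumably to accommodate the Geneve encapsulation that OVN adds. Before deploying, it is worth confirming the host uplink really runs at MTU 9000 (the interface name below is a placeholder, replace with your uplink):
Worker Node Console
$ ip link show ens1f0np0 | grep mtu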
The rest of the configuration files remain the same, including:
BFB to download the BlueField Bitstream to a shared volume.
manifests/05-dpudeployment-installation/bfb.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: BFB
metadata:
  name: bf-bundle
  namespace: dpf-operator-system
spec:
  url: $BLUEFIELD_BITSTREAM
OVN DPUServiceTemplate to deploy OVN Kubernetes workloads to the DPUs.
manifests/05-dpudeployment-installation/dpuservicetemplate_ovn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: ovn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "ovn"
  helmChart:
    source:
      repoURL: $OVN_KUBERNETES_REPO_URL
      chart: ovn-kubernetes-chart
      version: $TAG
    values:
      commonManifests:
        enabled: true
      dpuManifests:
        enabled: true
        leaseNamespace: "ovn-kubernetes"
        gatewayOpts: "--gateway-interface=br-ovn --gateway-uplink-port=puplinkbrovn"
HBN DPUServiceConfig and DPUServiceTemplate to deploy HBN workloads to the DPUs.
manifests/05-dpudeployment-installation/dpuserviceconfig_hbn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "hbn"
  serviceConfiguration:
    serviceDaemonSet:
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {"name": "iprequest", "interface": "ip_lo", "cni-args":
              {"poolNames": ["loopback"], "poolType": "cidrpool"}},
            {"name": "iprequest", "interface": "ip_pf2dpu2", "cni-args":
              {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
          ]
    helmChart:
      values:
        configuration:
          perDPUValuesYAML: |
            - hostnamePattern: "*"
              values:
                bgp_peer_group: hbn
            - hostnamePattern: "worker1*"
              values:
                bgp_autonomous_system: 65101
            - hostnamePattern: "worker2*"
              values:
                bgp_autonomous_system: 65201
          startupYAMLJ2: |
            - header:
                model: BLUEFIELD
                nvue-api-version: nvue_v1
                rev-id: 1.0
                version: HBN 2.4.0
            - set:
                interface:
                  lo:
                    ip:
                      address:
                        {{ ipaddresses.ip_lo.ip }}/32: {}
                    type: loopback
                  p0_if,p1_if:
                    type: swp
                    link:
                      mtu: 9000
                  pf2dpu2_if:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf2dpu2.cidr }}: {}
                    type: swp
                    link:
                      mtu: 9000
                router:
                  bgp:
                    autonomous-system: {{ config.bgp_autonomous_system }}
                    enable: on
                    graceful-restart:
                      mode: full
                    router-id: {{ ipaddresses.ip_lo.ip }}
                vrf:
                  default:
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                          ipv6-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                        enable: on
                        neighbor:
                          p0_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                          p1_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                        path-selection:
                          multipath:
                            aspath-ignore: on
                        peer-group:
                          {{ config.bgp_peer_group }}:
                            remote-as: external
  interfaces:
    ## NOTE: Interfaces inside the HBN pod must have the `_if` suffix due to a naming convention in HBN.
    - name: p0_if
      network: mybrhbn
    - name: p1_if
      network: mybrhbn
    - name: pf2dpu2_if
      network: mybrhbn
manifests/05-dpudeployment-installation/dpuservicetemplate_hbn.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "hbn"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.2
      chart: doca-hbn
    values:
      image:
        repository: $HBN_NGC_IMAGE_URL
        tag: 3.0.0-doca3.0.0
      resources:
        memory: 6Gi
        nvidia.com/bf_sf: 3
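Once the HBN pods are running, the BGP unnumbered sessions configured above can be inspected from inside an HBN pod. A sketch, assuming access to the DPU cluster's kubeconfig and that the doca-hbn image exposes FRR's vtysh (the kubeconfig path, namespace, and pod name below are placeholders):
Jump Node Console
$ kubectl --kubeconfig <dpu-cluster-kubeconfig> -n <hbn-namespace> exec -it <doca-hbn-pod> -- vtysh -c "show bgp summary"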
DOCA Telemetry Service (DTS) DPUServiceConfig and DPUServiceTemplate to deploy DTS to the DPUs.
manifests/05-dpudeployment-installation/dpuserviceconfig_dts.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: dts
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "dts"
manifests/05-dpudeployment-installation/dpuservicetemplate_dts.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: dts
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "dts"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.6
      chart: doca-telemetry
Blueman DPUServiceConfig and DPUServiceTemplate to deploy Blueman to the DPUs.
manifests/05-dpudeployment-installation/dpuserviceconfig_blueman.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: blueman
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "blueman"
manifests/05-dpudeployment-installation/dpuservicetemplate_blueman.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: blueman
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "blueman"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.8
      chart: doca-blueman
OVN DPUServiceCredentialRequest to allow cross-cluster communication.
manifests/05-dpudeployment-installation/ovn-credentials.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceCredentialRequest
metadata:
  name: ovn-dpu
  namespace: dpf-operator-system
spec:
  serviceAccount:
    name: ovn-dpu
    namespace: dpf-operator-system
  duration: 24h
  type: tokenFile
  secret:
    name: ovn-dpu
    namespace: dpf-operator-system
  metadata:
    labels:
      dpu.nvidia.com/image-pull-secret: ""
DPUServiceInterfaces for physical ports on the DPU.
manifests/05-dpudeployment-installation/physical-ifaces.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p0"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p1
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p1"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p1
OVN DPUServiceInterface to define the ports attached to OVN workloads on the DPU.
manifests/05-dpudeployment-installation/ovn-iface.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: ovn
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            port: ovn
        spec:
          interfaceType: ovn
DPUServiceIPAM to set up IP Address Management on the DPUCluster.
manifests/05-dpudeployment-installation/hbn-ovn-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool1
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.120.0/22"
    gatewayIndex: 3
    prefixSize: 29
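With prefixSize: 29, the 10.0.120.0/22 range is carved into 2^(29-22) = 128 blocks of 8 addresses, so each DPU node can be assigned its own /29, and gatewayIndex: 3 reserves the fourth address of each block as the gateway. The resulting allocations can be inspected after applying:
Jump Node Console
$ kubectl describe dpuserviceipam pool1 -n dpf-operator-system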
DPUServiceIPAM for the loopback interface in HBN.
manifests/05-dpudeployment-installation/hbn-loopback-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: loopback
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "11.0.0.0/24"
    prefixSize: 32
Apply all of the YAML files mentioned above using the following command:
Jump Node Console
$ cat manifests/05-dpudeployment-installation/*.yaml | envsubst | kubectl apply -f -
Verify the DPUService installation by ensuring that the DPUServices have been created and reconciled, and that the DPUServiceIPAMs, DPUServiceInterfaces, and DPUServiceChains have all been reconciled:
Note: These verification commands may need to be run multiple times until the conditions are met.
Jump Node Console
$ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn-hbn
dpuservice.svc.dpu.nvidia.com/blueman-kqm2q condition met
dpuservice.svc.dpu.nvidia.com/dts-b8vfs condition met
dpuservice.svc.dpu.nvidia.com/hbn-2rglk condition met
dpuservice.svc.dpu.nvidia.com/ovn-5tr2j condition met

$ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met

$ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
dpuserviceinterface.svc.dpu.nvidia.com/hbn-p0-if-tnkf8 condition met
dpuserviceinterface.svc.dpu.nvidia.com/hbn-p1-if-ww8qv condition met
dpuserviceinterface.svc.dpu.nvidia.com/hbn-pf2dpu2-if-7l5mk condition met
dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met

$ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
dpuservicechain.svc.dpu.nvidia.com/ovn-hbn-6lkvj condition met