DPU Provisioning via Redfish API
DPF supports managing DPUs through Out-of-Band (Redfish) management.
The following requirements must be satisfied for a DPU to be managed via Redfish:
The DPU's BMC firmware version must be 24.10 or later.
The DPU's BMC must be reset to factory defaults before installing DPF.
The DPU OOB interface must be connected to the DPF control plane.
Note: DOCA Perftest Bootstrap provides Ansible tasks for batch-upgrading the BMC and resetting it to factory defaults.
Follow the installation steps to install the DPF system.
DPF Operator Configuration
To enable provisioning via the Redfish interface, apply the following DPFOperatorConfig:
---
apiVersion: operator.dpu.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpfoperatorconfig
  namespace: dpf-operator-system
  labels:
    app.kubernetes.io/name: dpf-operator
    app.kubernetes.io/instance: dpf-operator
spec:
  provisioningController:
    bfbPVCName: "bfb-pvc"
    installInterface:
      installViaRedfish:
        # Set this to the IP of one of your control plane nodes + port 8080
        bfbRegistryAddress: "192.168.49.2:8080"
  kamajiClusterManager:
    disable: false
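Once the manifest is saved to a file, it can be applied and verified with kubectl. A minimal sketch; the file name is illustrative:

```shell
# Apply the operator configuration (file name is an example).
kubectl apply -f dpfoperatorconfig.yaml

# Confirm the DPFOperatorConfig resource was accepted.
kubectl get dpfoperatorconfig -n dpf-operator-system
```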
Credentials
To authenticate with Redfish, provide a password for the BMC root user:
Note: Refer to the BlueField DPU Administrator Quick Start Guide for BMC password constraints.
Create the BMC password secret:
kubectl create secret generic -n dpf-operator-system bmc-shared-password --from-literal=password='ROOT_BMC_PASSWORD'
During the DPU provisioning process, DPF will update the passwords of all DPUs according to the provided credential. Note that the credential cannot be modified after creation.
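Since the credential cannot be changed later, it is worth confirming the secret holds the intended value before provisioning starts. This check prints the password in plain text, so use it only in a trusted session:

```shell
# Read back the stored BMC password (prints it in clear text).
kubectl get secret -n dpf-operator-system bmc-shared-password \
  -o jsonpath='{.data.password}' | base64 -d
```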
Create a DPUDevice resource for each DPU:
Note: The DPUDevice is immutable, and creating a DPUDevice will not trigger DPU provisioning.
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-1
  namespace: dpf-operator-system
spec:
  bmcIp: 10.0.110.122
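When many DPUs are enrolled at once, the per-device manifests can be generated from a list of BMC IPs. A minimal POSIX-shell sketch; the IP list and the dpu-device-&lt;n&gt; naming scheme are assumptions to adapt to your inventory:

```shell
# Generate one DPUDevice manifest per BMC IP (IPs and names are examples).
manifests=$(
  i=1
  for ip in 10.0.110.122 10.0.110.123; do
    cat <<EOF
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-${i}
  namespace: dpf-operator-system
spec:
  bmcIp: ${ip}
EOF
    i=$((i + 1))
  done
)
printf '%s\n' "$manifests"
# To create the resources: printf '%s\n' "$manifests" | kubectl apply -f -
```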
Create a DPUNode resource for each host that has a DPU:
Note: The .spec.dpus field contains the names of each DPUDevice attached to the node. Currently, DPF only supports setting a single DPU for each DPUNode.
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  labels:
    feature.node.kubernetes.io/dpu-enabled: "true"
    feature.node.kubernetes.io/dpu-oob-bridge-configured: ""
  name: worker1
  namespace: dpf-operator-system
spec:
  dpus:
  - name: dpu-device-1
  nodeRebootMethod:
    external: {}
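After the DPUNode resources are created, their association with DPUDevices can be inspected. A sketch assuming the worker1 example above; resource short names may vary by DPF version:

```shell
# List DPUNodes and show which DPUDevice worker1 references.
kubectl get dpunode -n dpf-operator-system
kubectl get dpunode worker1 -n dpf-operator-system \
  -o jsonpath='{.spec.dpus[*].name}'
```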
Use a DPUSet to deploy DPUs; refer to the DPUSet documentation for more detail. Example configuration:
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUSet
metadata:
  name: dpuset
  namespace: dpf-operator-system
spec:
  dpuNodeSelector:
    matchLabels:
      feature.node.kubernetes.io/dpu-enabled: "true"
  strategy:
    rollingUpdate:
      maxUnavailable: "10%"
    type: RollingUpdate
  dpuTemplate:
    spec:
      dpuFlavor: dpf-provisioning-hbn-ovn
      bfb:
        name: bf-bundle-new
      nodeEffect:
        noEffect: true
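Once the DPUSet is applied, the DPU CRs it creates can be watched as they move through the provisioning phases described below (the `dpu` resource name is an assumption; adjust if your DPF version uses a different short name):

```shell
# Watch the DPU CRs created by the DPUSet move through provisioning phases.
kubectl get dpu -n dpf-operator-system -w
```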
In the Redfish scenario, DPF cannot manage the DPU's host machine. During the DPU provisioning process, when the DPU CR reaches the rebooting phase, the user must power-cycle the host manually. The power cycle must be completed within two hours; otherwise, the secret the DPU uses to join the cluster expires, leaving the DPU CR pending in the DPU Cluster Config phase. After the worker node boots up, the provisioning.dpu.nvidia.com/dpunode-external-reboot-required annotation on the DPUNode must be removed manually.
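The annotation can be removed with kubectl's trailing-dash syntax; the node name worker1 matches the DPUNode example above:

```shell
# Remove the external-reboot-required annotation after the node is back up.
kubectl annotate dpunode worker1 -n dpf-operator-system \
  provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
```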
Follow the Deletion and clean up steps to uninstall the DPF system.