DPUDevice
The DPUDevice is a Kubernetes custom resource (CRD) that represents a discovered physical DPU (Data Processing Unit). It contains all the information the DPU Controller needs to identify and provision the DPU.
The DPUDevice resource serves as an inventory and management interface for physical DPU devices. It contains device-specific information such as serial numbers, product identifiers, BMC (Baseboard Management Controller) details, and PCI addresses. A DPUDevice can be created automatically through discovery processes or manually by administrators.
DPUDeviceSpec
The spec section defines the desired configuration for the DPU device:
| Field | Type | Required | Description |
| serialNumber | string | Yes | The serial number of the device for inventory management |
| psid | string | No | Product Serial ID (deprecated, use status.psid) |
| opn | string | No | Ordering Part Number (deprecated, use status.opn) |
| bmcIp | string | No | IP address of the BMC for remote management |
| | uint32 | No | Port number for BMC communication (default: 443) |
| numberOfPFs | int | No | Number of Physical Functions on the device (default: 1) |
| pf0Name | string | No | Name of the first Physical Function |
DPUDeviceStatus
The status section contains the observed state of the DPU device:
| Field | Type | Description |
| psid | string | Product Serial ID discovered from the device |
| | string | Serial number discovered from the device |
| opn | string | Ordering Part Number discovered from the device |
| | string | BMC IP address discovered from the device |
| | uint32 | BMC port discovered from the device |
| | string | PCI address of the device in the host system |
| | string | MAC address of the first Physical Function |
| conditions | array | Array of condition objects describing device state |
The DPUDevice resource uses several condition types to track its state:
DpuDeviceDiscovered: Indicates that the DPU has been discovered
DpuDeviceNodeAttached: Indicates that the DPU is attached to a node
DpuDeviceInitialized: Indicates that the DPU interface has been initialized
DpuDeviceError: Indicates that the DPUDevice has an error
DpuDeviceReady: Indicates that the DPUDevice is ready for use
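The condition types above can be folded into a single health label when scripting against the resource. The following is a minimal sketch, assuming each entry in status.conditions follows the usual Kubernetes shape with `type` and `status` keys; the labels `error`, `ready`, and `pending` and the precedence between them are illustrative choices, not part of the DPUDevice API.

```python
# Condition type names come from the list above. The classification rules
# in device_health() are an assumption made for illustration only.
READY = "DpuDeviceReady"
ERROR = "DpuDeviceError"


def condition_status(conditions, cond_type):
    """Return the status string of the given condition type, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def device_health(conditions):
    """Classify a device as 'error', 'ready', or 'pending' (hypothetical labels)."""
    # An active error condition takes precedence over readiness.
    if condition_status(conditions, ERROR) == "True":
        return "error"
    if condition_status(conditions, READY) == "True":
        return "ready"
    return "pending"


if __name__ == "__main__":
    sample = [
        {"type": "DpuDeviceDiscovered", "status": "True"},
        {"type": "DpuDeviceReady", "status": "False"},
    ]
    print(device_health(sample))
```

Checking the error condition first means a device that reports both conditions as true is still flagged for attention.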
Basic DPUDevice Creation
Determine the serial number of the DPUDevice. In zero-trust mode, the serial number is discovered from the BMC. In trusted mode, run the following on the host: lspci -vvs ${pci_address} | grep "SN".
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: MT25066004C7
  namespace: dpf-operator-system
spec:
  serialNumber: "MT25066004C7"
  bmcIp: "10.1.2.3"
  numberOfPFs: 1
  pf0Name: "eth0"
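When many devices need to be registered by hand, the manifest above can be generated rather than typed. The sketch below builds the same structure as a Python dictionary and prints it as JSON, which kubectl accepts in place of YAML. The helper name `build_dpudevice` and its parameters are hypothetical; only the field names shown in the example manifest are used, and optional fields are omitted when not supplied.

```python
import json

API_VERSION = "provisioning.dpu.nvidia.com/v1alpha1"


def build_dpudevice(name, serial_number, bmc_ip=None, pf0_name=None,
                    namespace="dpf-operator-system"):
    """Build a DPUDevice manifest as a plain dict (hypothetical helper)."""
    if not serial_number:
        raise ValueError("serial_number is required")
    spec = {"serialNumber": serial_number}
    # Optional fields are only emitted when the caller provides them.
    if bmc_ip is not None:
        spec["bmcIp"] = bmc_ip
    if pf0_name is not None:
        spec["pf0Name"] = pf0_name
    return {
        "apiVersion": API_VERSION,
        "kind": "DPUDevice",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = build_dpudevice("MT25066004C7", "MT25066004C7", bmc_ip="10.1.2.3")
    print(json.dumps(manifest, indent=2))
```

If the script is saved under a name of your choosing, its output can be piped to `kubectl apply -f -`.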
Creation
DPUDevice resources are typically created through:
* Automatic Discovery:
  * Zero-Trust: Via DPUDiscovery controller scanning IP ranges
  * Host-Trusted: Via dpudetector daemon on host nodes
* Manual Creation: By administrators with known device details
* DPU Detection: Via dpudetector daemon on host nodes
Firmware Update:
- In zero-trust mode, the BMC firmware is updated to the latest version.
Updates
Most fields in DPUDevice are immutable once set. Only the following can be updated:
- Labels and annotations
- Status fields (managed by controllers)
Deletion
DPUDevice resources are protected by a finalizer (provisioning.dpu.nvidia.com/dpudevice-protection) to prevent accidental deletion while the device is in use.
DPUNode
DPUDevice resources are referenced by DPUNode resources through the dpus field by their serial numbers:
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  name: dpu-node-001
spec:
  dpus:
    - name: MT25066004C7
    - name: MT25066004C8
DPU
DPU resources reference DPUDevice resources through the dpuDeviceName field:
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPU
metadata:
  name: dpu-001
spec:
  dpuDeviceName: MT25066004C7
  dpuNodeName: dpu-node-001
  # ... other fields
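Because DPUNode and DPU both refer to DPUDevice resources by name, a dangling reference is easy to introduce when writing manifests by hand. The following sketch checks that every name referenced through spec.dpus and spec.dpuDeviceName exists in a known set of DPUDevice names. The function `unresolved_references` is hypothetical, and it operates on manifests already loaded into dictionaries; it does not query the cluster.

```python
def unresolved_references(device_names, dpu_node, dpu):
    """Return referenced DPUDevice names that are not in device_names.

    dpu_node and dpu are manifests loaded as dicts, shaped like the
    DPUNode and DPU examples above.
    """
    known = set(device_names)
    # Names listed under DPUNode spec.dpus.
    referenced = [entry["name"] for entry in dpu_node.get("spec", {}).get("dpus", [])]
    # Name given in DPU spec.dpuDeviceName, if present.
    device = dpu.get("spec", {}).get("dpuDeviceName")
    if device is not None:
        referenced.append(device)
    missing = []
    for name in referenced:
        if name not in known and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    node = {"spec": {"dpus": [{"name": "MT25066004C7"}, {"name": "MT25066004C8"}]}}
    dpu = {"spec": {"dpuDeviceName": "MT25066004C7", "dpuNodeName": "dpu-node-001"}}
    print(unresolved_references(["MT25066004C7"], node, dpu))
```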
Checking Device Status
# Get all DPUDevice resources
kubectl get dpudevices -n dpf-operator-system
# Get detailed information about a specific device
kubectl describe dpudevice MT25066004C7 -n dpf-operator-system
# Check device conditions
kubectl get dpudevice MT25066004C7 -n dpf-operator-system -o jsonpath='{.status.conditions}'
Common Issues
Device Not Discovered in Zero-Trust Mode: Check that the device's BMC IP is reachable
Invalid Serial Number: Ensure the serial number matches the required pattern
BMC Connection Issues: Verify BMC IP and port configuration
PCI Address Not Found: Check if the device is properly installed in the host
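For the BMC-related issues above, a quick first check is whether a TCP connection to the BMC address and port can be opened at all. The sketch below uses only the Python standard library. The function name `bmc_reachable` is hypothetical, and port 443 is taken from the default listed in the spec table. A successful connection only shows that the port is open; it says nothing about credentials or the health of the BMC service.

```python
import socket


def bmc_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call, using the BMC IP from the manifest above:
#   bmc_reachable("10.1.2.3")
```

Run it from a host on the same network as the discovery controller, since reachability from a workstation may differ.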
Status Conditions
Monitor the following conditions for device health:
# Check if device is ready
kubectl get dpudevice MT25066004C7 -n dpf-operator-system -o jsonpath='{.status.conditions[?(@.type=="DpuDeviceReady")].status}'
# Check for errors
kubectl get dpudevice MT25066004C7 -n dpf-operator-system -o jsonpath='{.status.conditions[?(@.type=="DpuDeviceError")]}'
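The two checks above can be combined into a polling loop for automation. This is a minimal sketch, not an official client: `wait_until_ready` and `kubectl_conditions` are hypothetical helpers, the timeout and interval defaults are arbitrary, and treating DpuDeviceError as fatal is an assumption. The fetch function is injected so the loop can be exercised without a cluster.

```python
import json
import subprocess
import time


def kubectl_conditions(name, namespace="dpf-operator-system"):
    """Fetch status.conditions for one DPUDevice via kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "dpudevice", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("status", {}).get("conditions", [])


def wait_until_ready(fetch_conditions, timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until DpuDeviceReady is True; return False if the timeout expires.

    Raises RuntimeError as soon as DpuDeviceError is True.
    """
    deadline = clock() + timeout
    while True:
        status = {c.get("type"): c.get("status") for c in fetch_conditions()}
        if status.get("DpuDeviceError") == "True":
            raise RuntimeError("DPUDevice reports DpuDeviceError")
        if status.get("DpuDeviceReady") == "True":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example call against a live cluster:
#   wait_until_ready(lambda: kubectl_conditions("MT25066004C7"))
```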
DPUNode - Node-level DPU management
DPUDiscovery - Automatic DPU discovery
DPU - DPU provisioning and deployment
DPUSet - Bulk DPU management