DOCA Platform Framework (DPF) Documentation v25.10.0

DPUDevice

The DPUDevice is a Kubernetes CRD that represents a discovered physical DPU (Data Processing Unit) device. It contains all the information the DPU Controller requires to identify and provision the DPU.

The DPUDevice resource serves as an inventory and management interface for physical DPU devices. It contains device-specific information such as serial numbers, product identifiers, BMC (Baseboard Management Controller) details, and PCI addresses. A DPUDevice can be created automatically through discovery processes or manually by administrators.

DPUDeviceSpec

The spec section defines the desired configuration for the DPU device:

| Field | Type | Required | Description |
|---|---|---|---|
| serialNumber | string | Yes | The serial number of the device for inventory management |
| psid | string | No | Product Serial ID (deprecated, use status.psid) |
| opn | string | No | Ordering Part Number (deprecated, use status.opn) |
| bmcIp | string | No | IP address of the BMC for remote management |
| bmcPort | uint32 | No | Port number for BMC communication (default: 443) |
| numberOfPFs | int | No | Number of Physical Functions on the device (default: 1) |
| pf0Name | string | No | Name of the first Physical Function |


DPUDeviceStatus

The status section contains the observed state of the DPU device:

| Field | Type | Description |
|---|---|---|
| psid | string | Product Serial ID discovered from the device |
| serialNumber | string | Serial number discovered from the device |
| opn | string | Ordering Part Number discovered from the device |
| bmcIp | string | BMC IP address discovered from the device |
| bmcPort | uint32 | BMC port discovered from the device |
| pciAddress | string | PCI address of the device in the host system |
| pf0Mac | string | MAC address of the first Physical Function |
| conditions | array | Array of condition objects describing device state |


The DPUDevice resource uses several condition types to track its state:

  • DpuDeviceDiscovered: Indicates that the DPU has been discovered

  • DpuDeviceNodeAttached: Indicates that the DPU is attached to a node

  • DpuDeviceInitialized: Indicates that the DPU interface has been initialized

  • DpuDeviceError: Indicates that the DPUDevice has an error

  • DpuDeviceReady: Indicates that the DPUDevice is ready for use

Basic DPUDevice Creation

Determine the serial number of the DPUDevice. In zero-trust mode, the serial number is discovered from the BMC. In trusted mode, run: lspci -vvs ${pci_address} | grep "SN".
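
In trusted mode, the serial number value can be isolated from the lspci output rather than read by eye. The sketch below assumes the Vital Product Data section prints a line shaped like "[SN] Serial number: <value>", which is how lspci typically renders that field; extract_serial is a hypothetical helper name, not part of DPF.

```shell
# Hypothetical helper: read `lspci -vvs <pci_address>` output on stdin and
# print only the serial number value.
# Assumes a VPD line shaped like: "[SN] Serial number: MT25066004C7".
extract_serial() {
    sed -n 's/.*\[SN] Serial number:[[:space:]]*//p' | head -n 1
}
```

Usage, assuming pci_address is already set: lspci -vvs "${pci_address}" | extract_serial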


    ---
    apiVersion: provisioning.dpu.nvidia.com/v1alpha1
    kind: DPUDevice
    metadata:
      name: MT25066004C7
      namespace: dpf-operator-system
    spec:
      serialNumber: "MT25066004C7"
      bmcIp: "10.1.2.3"
      numberOfPFs: 1
      pf0Name: "eth0"
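
When several devices need manifests, the same structure can be generated from a small template. This is a minimal sketch: render_dpudevice is a hypothetical helper, and it fills in only the resource name, serialNumber, and bmcIp, leaving the optional fields at their defaults.

```shell
# Hypothetical helper: print a DPUDevice manifest for the given resource name,
# serial number, and BMC IP. Field names mirror the DPUDeviceSpec fields.
# The body runs in a subshell so the variables do not leak into the caller.
render_dpudevice() (
    name=$1
    serial=$2
    bmc_ip=$3
    cat <<EOF
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: ${name}
  namespace: dpf-operator-system
spec:
  serialNumber: "${serial}"
  bmcIp: "${bmc_ip}"
EOF
)
```

The output can be piped straight to the cluster, for example: render_dpudevice MT25066004C7 MT25066004C7 10.1.2.3 | kubectl apply -f -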


Creation

DPUDevice resources are typically created through:

  • Automatic Discovery:

      • Zero-Trust: Via DPUDiscovery controller scanning IP ranges

      • Host-Trusted: Via dpudetector daemon on host nodes

  • Manual Creation: By administrators with known device details

  • DPU Detection: Via dpudetector daemon on host nodes

Firmware Update:

  • In zero-trust mode, the BMC firmware is updated to the latest version.

Updates

Most fields in DPUDevice are immutable once set. Only the following can be updated:

  • Labels and annotations

  • Status fields (managed by controllers)
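
Since labels and annotations remain editable, they are a natural place for site-specific inventory data. The sketch below uses the standard kubectl label and kubectl annotate subcommands; tag_dpudevice and the example.com/rack and example.com/note keys are placeholders chosen for illustration.

```shell
# Hypothetical helper: attach an inventory label and a free-form note to a DPUDevice.
# The key names "example.com/rack" and "example.com/note" are placeholders.
tag_dpudevice() (
    name=$1
    rack=$2
    note=$3
    ns=${4:-dpf-operator-system}
    kubectl label dpudevice "$name" -n "$ns" "example.com/rack=$rack" --overwrite &&
    kubectl annotate dpudevice "$name" -n "$ns" "example.com/note=$note" --overwrite
)
```

Example: tag_dpudevice MT25066004C7 r12 "spare unit"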

Deletion

DPUDevice resources are protected by a finalizer (provisioning.dpu.nvidia.com/dpudevice-protection) to prevent accidental deletion while the device is in use.
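
To see whether a device is still protected, the finalizer list can be read from the object's metadata. This is a minimal sketch using kubectl's JSONPath output; has_protection_finalizer is a hypothetical helper, and the default namespace is taken from the creation example.

```shell
# Hypothetical helper: succeed if the DPUDevice still carries the protection finalizer.
has_protection_finalizer() (
    name=$1
    ns=${2:-dpf-operator-system}
    # JSONPath prints the finalizers separated by spaces; split them one per line
    # and look for an exact match.
    kubectl get dpudevice "$name" -n "$ns" -o jsonpath='{.metadata.finalizers[*]}' |
        tr ' ' '\n' |
        grep -qxF 'provisioning.dpu.nvidia.com/dpudevice-protection'
)
```

As with any Kubernetes finalizer, a deleted object remains visible until the finalizer is removed, so a delete request that appears to hang is usually waiting on it.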

DPUNode

DPUNode resources reference DPUDevice resources through the dpus field, where each entry names a device by its serial number:


    apiVersion: provisioning.dpu.nvidia.com/v1alpha1
    kind: DPUNode
    metadata:
      name: dpu-node-001
    spec:
      dpus:
        - name: MT25066004C7
        - name: MT25066004C8
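
The reverse lookup, finding which DPUNode lists a given device, can be sketched with a JSONPath range expression and awk. The helper name dpunodes_for_device is hypothetical, and the sketch assumes the resource is addressable as dpunode and lives in the namespace passed as the second argument (dpf-operator-system if omitted).

```shell
# Hypothetical helper: print the names of DPUNodes whose spec.dpus list includes the device.
dpunodes_for_device() (
    device=$1
    ns=${2:-dpf-operator-system}
    # Emit one line per node: "<node-name> <dpu-name> <dpu-name> ...".
    kubectl get dpunode -n "$ns" \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.dpus[*].name}{"\n"}{end}' |
        awk -v dev="$device" '{ for (i = 2; i <= NF; i++) if ($i == dev) { print $1; break } }'
)
```

Example: dpunodes_for_device MT25066004C8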


DPU

DPU resources reference DPUDevice resources through the dpuDeviceName field:


    apiVersion: provisioning.dpu.nvidia.com/v1alpha1
    kind: DPU
    metadata:
      name: dpu-001
    spec:
      dpuDeviceName: MT25066004C7
      dpuNodeName: dpu-node-001
      # ... other fields
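
Because dpuDeviceName is a plain string field, a JSONPath filter can select the DPU objects that point at a particular device. This is a sketch only: dpus_for_device is a hypothetical helper, and it assumes the resource is addressable as dpu in the given namespace.

```shell
# Hypothetical helper: print the names of DPU objects whose spec.dpuDeviceName
# equals the given DPUDevice name.
dpus_for_device() (
    device=$1
    ns=${2:-dpf-operator-system}
    kubectl get dpu -n "$ns" \
        -o jsonpath="{.items[?(@.spec.dpuDeviceName==\"$device\")].metadata.name}"
)
```

Example: dpus_for_device MT25066004C7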


Checking Device Status


    # Get all DPUDevice resources
    kubectl get dpudevices -n dpf-operator-system

    # Get detailed information about a specific device
    kubectl describe dpudevice MT25066004C7 -n dpf-operator-system

    # Check device conditions
    kubectl get dpudevice MT25066004C7 -n dpf-operator-system -o jsonpath='{.status.conditions}'
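
In scripts, polling these commands by hand can be replaced with kubectl wait, which blocks until a named condition reports True or a timeout expires. The sketch below assumes DpuDeviceReady is the condition to gate on; wait_for_ready and the 300s default timeout are illustrative choices.

```shell
# Hypothetical helper: block until the DPUDevice reports DpuDeviceReady=True,
# or fail once the timeout (default 300s) expires.
wait_for_ready() (
    name=$1
    timeout=${2:-300s}
    ns=${3:-dpf-operator-system}
    kubectl wait "dpudevice/$name" -n "$ns" \
        --for=condition=DpuDeviceReady --timeout="$timeout"
)
```

Example: wait_for_ready MT25066004C7 120s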


Common Issues

  • Device Not Discovered in Zero-Trust Mode: Check that the device's BMC IP is reachable

  • Invalid Serial Number: Ensure the serial number matches the required pattern

  • BMC Connection Issues: Verify BMC IP and port configuration

  • PCI Address Not Found: Check if the device is properly installed in the host
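
For the BMC-related issues above, a quick probe can rule out basic network problems before looking further. This is a minimal sketch that assumes the BMC answers HTTPS on the configured port (443 being the documented default) and that curl is installed; bmc_reachable is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if an HTTPS endpoint answers on the BMC address.
# -k skips certificate verification, since BMCs often present self-signed
# certificates. Any HTTP response, including an error status, counts as reachable.
bmc_reachable() (
    ip=$1
    port=${2:-443}
    curl -sk --connect-timeout 5 -o /dev/null "https://$ip:$port/"
)
```

Example: bmc_reachable 10.1.2.3 && echo "BMC answers on 443"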

Status Conditions

Monitor the following conditions for device health:


    # Check if device is ready
    kubectl get dpudevice MT25066004C7 -n dpf-operator-system -o jsonpath='{.status.conditions[?(@.type=="DpuDeviceReady")].status}'

    # Check for errors
    kubectl get dpudevice MT25066004C7 -n dpf-operator-system -o jsonpath='{.status.conditions[?(@.type=="DpuDeviceError")]}'
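
The two checks can be folded into a single health probe for use in monitoring scripts. The sketch below assumes that a DpuDeviceError condition with status True signals an active error; the function names and the exit-code convention are illustrative, not part of DPF.

```shell
# Print the status of one condition type on a DPUDevice (empty if the condition is absent).
# Arguments: <device-name> <condition-type> [namespace]
dpudevice_condition() {
    kubectl get dpudevice "$1" -n "${3:-dpf-operator-system}" \
        -o jsonpath="{.status.conditions[?(@.type==\"$2\")].status}"
}

# Map the two conditions to a single word and exit code:
# 0 = ready, 1 = not ready yet, 2 = error reported.
# Arguments: <device-name> [namespace]
dpudevice_health() {
    if [ "$(dpudevice_condition "$1" DpuDeviceError "$2")" = "True" ]; then
        echo "error"; return 2
    fi
    if [ "$(dpudevice_condition "$1" DpuDeviceReady "$2")" = "True" ]; then
        echo "ready"; return 0
    fi
    echo "not-ready"; return 1
}
```

Errors are checked first so that a device reporting both conditions is never treated as healthy.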


  • DPUNode - Node-level DPU management

  • DPUDiscovery - Automatic DPU discovery

  • DPU - DPU provisioning and deployment

  • DPUSet - Bulk DPU management

© Copyright 2025, NVIDIA. Last updated on Dec 23, 2025