DPU Provisioning Failures
Use this playbook when a DPU is stuck during discovery, initialization, reprovisioning, secure boot setup, or network configuration.
Where Failures Appear
DPU provisioning issues usually show up in two places:
- NICo state.
- DPF resources.
Start with NICo state. Move to DPF resources when NICo is waiting on DPF.
Install Path
Know which install path is active before debugging: BFB over Redfish or UEFI HTTP boot.
Common States
DpuDiscoveringState
NICo is discovering the DPU and preparing it for provisioning.
Check:
- DPU BMC reachability.
- Redfish credentials and Vault access.
- Site Explorer reports for the DPU BMC.
- DPF device status if DPF owns the next step.
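The first check above, DPU BMC reachability, can be approximated with a plain TCP connect before looking at credentials or Site Explorer. This is a minimal sketch, not NICo tooling: the function name `bmc_port_open` is hypothetical, and port 443 is an assumption based on Redfish normally being served over HTTPS. A successful connect only shows that the port answers; it says nothing about whether the Redfish credentials are valid.

```python
import socket


def bmc_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: port 443 is an assumed default for Redfish over HTTPS.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `bmc_port_open("192.0.2.10")` (a placeholder documentation address) returns False when the BMC is unreachable from the machine running the check, which points at network or BMC power problems rather than credentials.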
DPUInit
NICo is installing or bringing up the DPU OS and services.
Check:
- DPU BMC power and console.
- DPU install method: BFB over Redfish or UEFI HTTP boot.
- nico-pxe logs for HTTP boot requests.
- DPF operator status.
- nico-dpu-agent startup logs once the OS boots.
WaitingForNetworkConfig
NICo blocks state advancement until the DPU agent reports:
- It is alive.
- It applied the latest desired network config version.
- Its DPU network health is acceptable.
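The three conditions above amount to a simple gate. The sketch below illustrates that logic only; it is not NICo's implementation. The `AgentReport` fields, the `blocking_reasons` function, and the 120-second heartbeat timeout are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentReport:
    # Hypothetical fields; the real agent report may be shaped differently.
    last_seen: float             # Unix timestamp of the last heartbeat
    applied_config_version: str  # config version the agent says it applied
    network_healthy: bool        # agent's view of DPU network health


def blocking_reasons(
    report: AgentReport,
    desired_version: str,
    now: float,
    heartbeat_timeout: float = 120.0,  # assumed value, for illustration only
) -> List[str]:
    """Return the reasons state advancement is blocked; empty means it may advance."""
    reasons = []
    if now - report.last_seen > heartbeat_timeout:
        reasons.append("heartbeat stale")
    if report.applied_config_version != desired_version:
        reasons.append("config version not applied")
    if not report.network_healthy:
        reasons.append("network health not acceptable")
    return reasons
```

Returning a list of reasons rather than a single boolean mirrors how the state is debugged in practice: each entry maps to one of the three conditions, so an operator can tell which one to chase first.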
If Last seen is stale or HeartbeatTimeout is present, inspect the DPU directly (see DPU Console and Logs).
DPUReprovision
Reprovisioning may require approval when a host is assigned to an instance.
If the host is assigned, confirm the tenant or user approval path before forcing disruptive actions.
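The approval rule can be expressed as a one-line pre-check to run before any disruptive action. This is a sketch under the assumption that approval is required whenever the host is assigned; the function name and its two inputs are hypothetical, and the real approval path may carry more state.

```python
def reprovision_allowed(host_assigned: bool, approval_granted: bool) -> bool:
    """Allow reprovisioning if the host is unassigned, or if approval was given.

    Hypothetical helper: assumes approval is always required for assigned hosts.
    """
    return (not host_assigned) or approval_granted
```

With these assumptions, an assigned host without approval is the only combination that blocks: `reprovision_allowed(True, False)` evaluates to False, and every other combination evaluates to True.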
Health Probes
Common DPU probe alerts:
DPU Console and Logs
If SSH to the DPU works:
If SSH fails, use DPU BMC or rshim access and check whether the DPU OS booted.
Useful on-DPU checks:
Mitigations
Use the least disruptive mitigation that addresses the root cause.