DPU management is NICo’s primary value differentiator. NICo treats every BlueField DPU as a first-class managed component: installing its OS, configuring host networking, monitoring health, upgrading firmware, and reprovisioning the DPU automatically when it drifts from the desired state. The DPU is the enforcement boundary for host isolation and network security; NICo manages it end-to-end so operators do not have to.
This page covers the full DPU lifecycle: what NICo installs, how it installs it, how it keeps the DPU healthy, and how to intervene when something goes wrong. For the full host ingestion flow, which includes DPU provisioning, see Ingesting Hosts. For the exact state transitions and retry paths, see the Managed Host State Diagrams. For DPU network configuration details, see DPU Configuration.
dpu-agent starts, fetches desired configuration from NICo Core, applies HBN/NVUE configuration, and reports healthy.

dpu-agent continuously checks DPU health and reports back to NICo Core. NICo uses these reports to gate lifecycle transitions and allocation.

NICo treats each managed host as a host server paired with one or more BlueField DPUs. During ingestion, NICo installs the DPU OS, configures the DPU for host networking, and starts the services that let the site controller manage the host without trusting the host operating system.

dpu-agent

The DPU agent runs as a daemon on the DPU. In service names and logs it appears as nico-dpu-agent; in the documentation it is usually referred to as dpu-agent.
The agent periodically calls GetManagedHostNetworkConfig to fetch the desired configuration from NICo Core. It applies the configuration locally, runs health checks, and reports status back with RecordDpuNetworkStatus. The report includes applied configuration versions and DPU health.
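The fetch, apply, check, report cycle described above can be sketched as follows. This is a minimal illustration, not the agent's actual code: the `CoreClient` protocol, the dataclass fields, and the `reconcile_once` helper are hypothetical names, and only the two RPC names come from this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class DesiredConfig:
    """Desired network state (hypothetical shape; only versions are modeled)."""
    managedhost_network_config_version: str
    instance_network_config_version: str


@dataclass
class StatusReport:
    """What the agent sends back: applied versions plus health."""
    applied_managedhost_version: str
    applied_instance_version: str
    healthy: bool
    failed_checks: List[str] = field(default_factory=list)


class CoreClient(Protocol):
    """Hypothetical stand-in for the NICo Core RPC client."""

    def get_managed_host_network_config(self, dpu_id: str) -> DesiredConfig: ...

    def record_dpu_network_status(self, dpu_id: str, report: StatusReport) -> None: ...


def reconcile_once(
    client: CoreClient,
    dpu_id: str,
    apply_config: Callable[[DesiredConfig], None],
    run_health_checks: Callable[[], List[str]],
) -> StatusReport:
    """Run one agent cycle: fetch, apply, health-check, report."""
    desired = client.get_managed_host_network_config(dpu_id)
    apply_config(desired)
    failed = run_health_checks()  # names of failed checks; empty means healthy
    report = StatusReport(
        applied_managedhost_version=desired.managedhost_network_config_version,
        applied_instance_version=desired.instance_network_config_version,
        healthy=not failed,
        failed_checks=failed,
    )
    client.record_dpu_network_status(dpu_id, report)
    return report
```

A real agent would call something like `reconcile_once` on a timer; the polling interval is not stated on this page and is left out of the sketch.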
The agent is responsible for:
NICo runs a custom DHCP server on the DPU. The DPU-local DHCP server handles DHCP requests from the attached host, so DHCP traffic from the host primary networking interfaces does not leave the DPU and does not appear directly on the underlay network.
This is a security benefit: the DPU enforces host isolation before the host receives any network configuration. A compromised host cannot broadcast DHCP traffic onto the underlay to discover or interfere with other hosts. It also makes DHCP behavior part of the declarative DPU configuration that dpu-agent receives from NICo Core.
The NICo Metadata Service (MDS) exposes instance metadata to tenants from the DPU. Tenants can use MDS to retrieve information such as the Machine ID and boot or operating system metadata for their instance. MDS runs on the DPU rather than on the host, so its responses are trusted independently of the host OS.
NICo uses HBN (Host-Based Networking), backed by containerized Cumulus, to provide the host networking behavior that the site controller expects. The dpu-agent converts desired network state from NICo Core into NVUE configuration and applies it through the NVUE CLI. After applying configuration, the agent checks that HBN and related services are healthy before NICo advances lifecycle state.
For the detailed configuration model, versioning behavior, and isolation semantics, see DPU Configuration.
Note: DPF-managed DPU installation and reprovisioning follow a separate flow and will be documented in a follow-up page. This guide describes the non-DPF DPU lifecycle unless a section explicitly says otherwise.
DPU OS installation happens as part of the managed host state machine after Site Explorer has discovered and paired the host with its DPU or DPUs. NICo supports two installation methods and selects the method automatically based on DPU BMC firmware capabilities and site configuration.
NICo uses two different BFB images. They are not interchangeable:
- NICo BFB: contains dpu-agent, the DPU DHCP server, MDS, HBN installer and configuration, NICo root CA, and scout. This is the image that makes the DPU a fully managed component. For build instructions, see Building NICo Containers.
- Preingestion BFB (preingestion.bfb): The unmodified vanilla DOCA BFB, saved as-is during the build process before any NICo customization is applied. It does not contain dpu-agent, HBN, MDS, or any other NICo services. This image is used only for pre-ingestion recovery via rshim (copy-bfb-to-dpu-rshim) to return a DPU to a clean factory state so that NICo can discover and pair it. After the preingestion BFB is installed, the normal state machine installs the NICo BFB.

dpu_enable_secure_boot defaults to false. When disabled, all DPUs use the UEFI HTTP Boot path regardless of BMC firmware version. To use Redfish BFB install, operators must explicitly enable it in the site configuration.
NICo checks supports_bfb_install against every DPU on the host. All DPUs on a host must support Redfish BFB install for NICo to use that path; if any DPU does not, the host falls back to UEFI HTTP Boot.
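The selection rule can be sketched as a small function. This is an illustration under stated assumptions: `InstallMethod` and `select_install_method` are hypothetical names, and the inputs simply mirror the `supports_bfb_install` capability and the `dpu_enable_secure_boot` setting described on this page.

```python
from enum import Enum
from typing import Sequence


class InstallMethod(Enum):
    REDFISH_BFB = "redfish_bfb"
    UEFI_HTTP_BOOT = "uefi_http_boot"


def select_install_method(
    dpu_supports_bfb_install: Sequence[bool],
    dpu_enable_secure_boot: bool = False,
) -> InstallMethod:
    """Pick one install path for the whole host.

    dpu_supports_bfb_install holds one capability flag per DPU on the host.
    """
    # With the site setting off (the default), every DPU uses UEFI HTTP Boot.
    if not dpu_enable_secure_boot:
        return InstallMethod.UEFI_HTTP_BOOT
    # Redfish is used only if every DPU on the host supports it.
    if dpu_supports_bfb_install and all(dpu_supports_bfb_install):
        return InstallMethod.REDFISH_BFB
    return InstallMethod.UEFI_HTTP_BOOT
```

For example, a two-DPU host where only one DPU supports Redfish BFB install resolves to `UEFI_HTTP_BOOT` even when the site setting is enabled.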
This is the preferred method for DPUs with recent BMC firmware. NICo pushes the BFB image directly to the DPU BMC over Redfish, which gives the state machine explicit progress tracking and error reporting.
1. … (DpuDiscoveringState.EnableSecureBoot sub-flow).
2. … DPUInit/InstallDpuOs/InstallingBFB.
3. … UpdateService SimpleUpdate action, pointing it at the NICo BFB hosted by nico-pxe, with the target DPU_OS.
4. … DPUInit/InstallDpuOs/WaitForInstallComplete.
5. … dpu-agent report before moving to host initialization.

While the BFB task is running, the handler outcome includes messages like Waiting for BFB install to complete: <percent>%. If the Redfish task fails, the state moves to InstallationError and the task messages are stored in the state handler outcome and logs.
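The progress and failure handling for the Redfish task can be sketched as a function over a Redfish Task resource. `TaskState`, `PercentComplete`, and `Messages` are standard Redfish Task properties; the function name, the `InstallComplete` label, and the choice of which task states count as failures are assumptions made for this illustration.

```python
from typing import Tuple

# Redfish TaskState values treated as failures in this sketch (an assumption).
FAILED_TASK_STATES = {"Exception", "Killed", "Cancelled"}


def evaluate_bfb_task(task: dict) -> Tuple[str, str]:
    """Map a Redfish Task resource to (next state, handler outcome message)."""
    state = task.get("TaskState")
    if state == "Completed":
        # "InstallComplete" is a hypothetical label, not a NICo state name.
        return ("InstallComplete", "BFB install completed")
    if state in FAILED_TASK_STATES:
        # Preserve the BMC's task messages so they reach the outcome and logs.
        messages = "; ".join(m.get("Message", "") for m in task.get("Messages", []))
        return ("InstallationError", messages or f"task ended in state {state}")
    percent = task.get("PercentComplete") or 0
    return (
        "WaitForInstallComplete",
        f"Waiting for BFB install to complete: {percent}%",
    )
```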
For DPUs whose BMC firmware does not support Redfish-based BFB install, NICo falls back to a network install via UEFI HTTP Boot. In this path the DPU downloads and installs its OS from nico-pxe during boot rather than receiving a Redfish push.
1. … (DpuDiscoveringState.DisableSecureBoot sub-flow). Secure Boot must be off because the network boot image is not signed for the DPU Secure Boot chain.
2. … (SetUefiHttpBoot state) and reboots all DPUs.
3. … nico-pxe. NICo serves a DPU-specific boot payload: a nico.efi kernel, a nico.root initrd, and a BlueField Kickstart script (bfks) delivered via cloud-init user-data. The kickstart script drives the BFB installation on the DPU.
4. … DPUInit/Init, restarts all DPUs, power-cycles the host, and waits for the DPU to come up with the new image.
5. … WaitingForPlatformConfiguration and WaitingForNetworkConfig, waiting for the dpu-agent to apply configuration and report healthy, before moving to host initialization.

Because there is no Redfish task to poll, NICo monitors the network install indirectly: it watches for the DPU to become reachable and for dpu-agent to report in. If the DPU does not come up within the SLA, the state machine triggers a reboot to retry.
Note: During reprovisioning, this same distinction applies. If BFB install is supported, NICo enters ReprovisionState::InstallDpuOs. If not, it enters ReprovisionState::WaitingForNetworkInstall, which boots the DPU via UEFI HTTP and waits for it to complete the network install and become healthy.
During normal ingestion no manual action is required. Operators can monitor the state with:
For Redfish BFB installs, the handler outcome reports install percentage. For UEFI HTTP Boot installs, the handler outcome reports DPU discovery and reboot status.
Most DPU OS installation failures are diagnosed from the managed host state, nico-api logs, and (for Redfish installs) the Redfish task messages returned by the DPU BMC.
For the manual rshim recovery command (which installs the preingestion BFB, not the NICo BFB) and additional pairing troubleshooting, see DPU-Related Issues: Installing a Fresh DPU OS. For the full DPU troubleshooting workflow, see WaitingForNetworkConfig and DPU health.
NICo manages DPU firmware as part of the same managed host lifecycle. DPU firmware inventory comes from Redfish and hardware discovery. The configured firmware baseline is stored in the site configuration under dpu_config.
NICo tracks the following DPU firmware components:
Firmware upgrades can be triggered in two ways:
- … Ready managed host has DPU NIC firmware outside the configured dpu_nic_firmware_update_versions, it queues a DPU reprovisioning request. During reprovisioning, NICo verifies and updates all DPU firmware components (BMC, CEC/ERoT, NIC) against the configured baseline, but only NIC firmware version drift triggers the automatic reprovisioning.

Machine Update Manager stages upgrades so the site does not take too many hosts out of service at once. Before scheduling an additional update, it evaluates:
A DPU update is treated as a host-level maintenance event because the host and its DPU or DPUs are updated together. During an update, NICo applies a HostUpdateInProgress health alert with the PreventAllocations classification, which keeps tenants from acquiring the host while work is in progress.
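The automatic trigger described in this section (a Ready host whose DPU NIC firmware is outside the configured versions) can be sketched as a filter. The `ManagedHost` shape, the function name, and the version strings in the usage note are made up for illustration, and treating `dpu_nic_firmware_update_versions` as a set of acceptable versions is an assumption. The staging limits that Machine Update Manager applies are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ManagedHost:
    """Hypothetical view of a managed host for this sketch."""
    host_id: str
    state: str
    dpu_nic_versions: List[str]  # one NIC firmware version per DPU


def hosts_needing_reprovision(
    hosts: Iterable[ManagedHost],
    allowed_nic_versions: Set[str],
) -> List[str]:
    """Return IDs of Ready hosts with at least one DPU NIC version off baseline."""
    return [
        host.host_id
        for host in hosts
        # Only Ready hosts are candidates for automatic reprovisioning.
        if host.state == "Ready"
        and any(v not in allowed_nic_versions for v in host.dpu_nic_versions)
    ]
```

With an allowed set of `{"2.0"}`, a Ready host reporting `["2.0", "1.9"]` is selected, while a host in any other state is skipped regardless of its versions.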
Operators can inspect DPU firmware status with:
After the DPU OS is installed, the dpu-agent keeps HBN configured by applying NVUE configuration generated from NICo Core state. The configuration covers:
Configuration is versioned. NICo maintains separate version numbers for managedhost_network_config (site controller lifecycle changes) and instance_network_config (tenant-driven changes). NICo only considers the DPU synchronized when the DPU reports the expected versions for both and reports itself healthy.
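The synchronization rule reduces to a conjunction of three conditions, sketched below. `ConfigVersions` and `is_synchronized` are hypothetical names; the two field names match the version counters named above, and version values are treated as opaque strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigVersions:
    managedhost_network_config: str
    instance_network_config: str


def is_synchronized(
    expected: ConfigVersions,
    reported: ConfigVersions,
    dpu_healthy: bool,
) -> bool:
    """True only if both reported versions match and the DPU reports healthy."""
    return dpu_healthy and reported == expected
```

A DPU that has applied the latest tenant-driven change but still reports an older managedhost_network_config version is therefore not considered synchronized.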
After any configuration change, the dpu-agent raises a PostConfigCheckWait alert for approximately 30 seconds. This brief hold gives the DPU time to verify that the new configuration is stable (BGP sessions re-establish, services restart) before NICo treats it as applied.
If the dpu-agent calls GetManagedHostNetworkConfig and receives a NotFound error (the site controller does not recognize this DPU), the agent automatically configures the DPU into an isolated mode. This prevents unknown or removed DPUs from consuming network resources.
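The NotFound fallback can be sketched as a wrapper around the fetch call. Everything here is illustrative: `NotFoundError` stands in for whatever error type the RPC layer raises, and the dictionary used to represent isolated mode is a placeholder, not the agent's real configuration format.

```python
from typing import Callable


class NotFoundError(Exception):
    """Hypothetical error raised when NICo Core does not recognize the DPU."""


# Placeholder representation of the isolated configuration.
ISOLATED_CONFIG = {"mode": "isolated"}


def resolve_config(fetch: Callable[[str], dict], dpu_id: str) -> dict:
    """Return the desired config, or the isolated config for unknown DPUs."""
    try:
        return fetch(dpu_id)
    except NotFoundError:
        # Unknown or removed DPU: fall back to isolation instead of failing.
        return dict(ISOLATED_CONFIG)
```

Only the NotFound case is caught; other errors (for example a transient connection failure) propagate, so a temporary outage does not isolate a known DPU in this sketch.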
DPU health is part of aggregate host health. NICo combines reports from dpu-agent, BMC health monitoring, inventory monitoring, validation, and operator overrides. For the full health model, see Health Checks and Health Aggregation.
dpu-agent Checks

The dpu-agent runs periodic health checks and includes the results in every RecordDpuNetworkStatus report. The checks cover:
NICo uses DPU health to gate state transitions and allocation:
- … PreventAllocations classification, the host is not available for new tenant allocation.
- … dpu-agent stops sending reports entirely, NICo records a HeartbeatTimeout health alert against nico-dpu-agent.

When a DPU becomes unhealthy, inspect the managed host state and DPU health report:
Key fields to check in the output:
- … (HeartbeatTimeout, BgpStats, ServiceRunning).
- … In State > SLA: true with the breach reason.

For the full troubleshooting workflow, including how to check logs via Grafana/Loki, verify DPU liveness, restart the agent, and diagnose specific health probe alerts, see WaitingForNetworkConfig and DPU health.
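The two gating rules in this section (alerts classified PreventAllocations block allocation, and a silent agent produces a HeartbeatTimeout alert) can be sketched as follows. The `HealthAlert` shape and both function names are hypothetical, the timeout value is a caller-supplied assumption, and the classifications carried by a HeartbeatTimeout alert are not stated on this page, so the sketch leaves them empty.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class HealthAlert:
    """Hypothetical alert shape: an ID, a target, and classifications."""
    alert_id: str
    target: str = ""
    classifications: FrozenSet[str] = frozenset()


def is_allocatable(alerts: Iterable[HealthAlert]) -> bool:
    """A host is allocatable only if no alert carries PreventAllocations."""
    return not any("PreventAllocations" in a.classifications for a in alerts)


def heartbeat_alert(last_report_age_s: float, timeout_s: float) -> Optional[HealthAlert]:
    """Return a HeartbeatTimeout alert if the agent has been silent too long."""
    if last_report_age_s > timeout_s:
        return HealthAlert("HeartbeatTimeout", target="nico-dpu-agent")
    return None
```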
DPU reprovisioning reinstalls the DPU OS and then waits for discovery, network configuration, and DPU health to converge again. It is used for planned firmware updates, DPU recovery, and cases where a DPU must be returned to a known clean state.
The reprovisioning state machine runs through the following stages:
- … UpdateService (ReprovisionState::InstallDpuOs). If not, boot the DPU via UEFI HTTP for a network install (ReprovisionState::WaitingForNetworkInstall).

Automatic DPU reprovisioning is triggered when Machine Update Manager selects an eligible Ready host whose DPU NIC firmware is outside the configured baseline. It queues a DPU reprovisioning request for the host.
The API requires a HostUpdateInProgress health alert on the host before it accepts a reprovisioning request. Use --update-message to apply this alert:
Firmware is always verified and updated during reprovisioning regardless of whether --update-firmware is passed. The --update-firmware flag is accepted but deprecated.
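The precondition on manual reprovisioning requests (a HostUpdateInProgress alert must already be present on the host) can be sketched as a simple check. The function name, the return shape, and the rejection message are hypothetical and do not reflect the API's actual error text.

```python
from typing import Iterable, Tuple


def check_reprovision_request(host_alert_ids: Iterable[str]) -> Tuple[bool, str]:
    """Accept a reprovisioning request only if the host is marked as updating."""
    if "HostUpdateInProgress" in set(host_alert_ids):
        return (True, "accepted")
    # Illustrative message only; the real API error is not shown on this page.
    return (False, "rejected: host has no HostUpdateInProgress health alert")
```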
The managed-host show output displays the current reprovisioning substate, percent complete for BFB installation (when available), and any handler errors.
To restart a DPU reprovisioning flow for all DPUs on a host:
To clear a pending reprovisioning request that has not started:
For the complete reprovisioning state machine, see DPU Reprovision State Details.