Network Connectivity Issues
Use this playbook when a state-machine stall appears to come from BMC, DHCP, PXE/HTTP boot, DPU agent, BGP/HBN, or API reachability.
Connectivity Matrix
BMC or OOB Unreachable
NICo cannot discover, provision, or manage a machine when its BMC is unreachable.
Check:
Common causes:
- BMC is powered off or on the wrong network.
- OOB route or VLAN is missing.
- Vault credential lookup failed.
- BMC certificate or TLS settings changed.
- Redfish endpoint is slow or rate-limited.
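Several of these causes can be told apart from a single request to the Redfish service root. The following is a minimal sketch, not NICo tooling: the function name `classify_redfish`, the thresholds, and the returned labels are hypothetical, and certificate verification is disabled on the assumption that the BMC presents a self-signed certificate. It only uses the standard `/redfish/v1/` service root path.

```python
import ssl
import time
import urllib.error
import urllib.request


def classify_redfish(base_url, timeout=5.0, slow_after=2.0):
    """Return a coarse label for how the Redfish service root responds.

    base_url is an assumption, e.g. "https://<bmc-address>".
    """
    url = base_url.rstrip("/") + "/redfish/v1/"
    # Assumption: BMC certificates are often self-signed, so skip verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        status = exc.code
    except urllib.error.URLError as exc:
        # TLS negotiation problems point at certificate or TLS setting changes.
        if isinstance(exc.reason, ssl.SSLError):
            return "tls"
        return "unreachable"
    except OSError:
        # Timeouts and resets raised outside URLError.
        return "unreachable"
    elapsed = time.monotonic() - start

    if status in (429, 503):
        return "rate-limited"
    if status >= 400:
        return f"http-{status}"
    return "slow" if elapsed > slow_after else "ok"
```

A result of `unreachable` suggests checking power, VLAN, and OOB routing first, while `tls`, `slow`, or `rate-limited` point at the BMC service itself. Credential lookup failures are not covered, since the service root is normally readable without authentication.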
DHCP Failures
DHCP appears in two places:
- site DHCP through nico-dhcp
- DPU-local DHCP or relay for host-facing networking
Check site DHCP:
Check pool pressure:
- carbide_available_ips_count
- carbide_reserved_ips_count
- carbide_resourcepool_free_count
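One way to watch these metrics is to scan a Prometheus text-format scrape for them and flag series that are close to exhaustion. The sketch below assumes the metrics are exposed in the standard Prometheus exposition format; the label names in the example, the threshold, and the helper names `read_pool_metrics` and `low_pools` are hypothetical.

```python
WATCHED = (
    "carbide_available_ips_count",
    "carbide_reserved_ips_count",
    "carbide_resourcepool_free_count",
)


def read_pool_metrics(exposition_text):
    """Return (series, value) pairs for the watched metrics."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, _, tail = rest.rpartition("}")
            series = name + "{" + labels + "}"
        else:
            name, _, tail = line.partition(" ")
            series = name
        if name not in WATCHED or not tail.split():
            continue
        # The first token after the series is the value; a timestamp may follow.
        samples.append((series, float(tail.split()[0])))
    return samples


def low_pools(samples, threshold):
    """Return available/free series at or below the threshold."""
    return [
        (series, value)
        for series, value in samples
        if not series.startswith("carbide_reserved_ips_count")
        and value <= threshold
    ]
```

Reserved counts are collected but not flagged, because a high reserved count only matters relative to the available or free count of the same pool.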
Common causes:
PXE and HTTP Boot
nico-pxe serves discovery images, iPXE scripts, cloud-init, kickstart, BFB
URLs, and root CA content used by install paths.
If there are no PXE or HTTP requests, inspect the serial console and boot order. If requests exist but the host does not advance, inspect scout or DPU agent logs.
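The triage rule above can be written down as a small decision function. This is only an illustration: `next_boot_step` is a hypothetical helper, the log lines are treated as opaque strings, and matching a host by a substring such as its MAC or IP address is an assumption about what the nico-pxe logs contain.

```python
def next_boot_step(log_lines, host_id, host_advanced):
    """Suggest where to look next for a host stuck in PXE or HTTP boot.

    log_lines: iterable of nico-pxe log lines (format not assumed).
    host_id: substring identifying the host, e.g. its MAC or IP address.
    host_advanced: whether the state machine moved past the boot step.
    """
    requests = [line for line in log_lines if host_id in line]
    if not requests:
        # The host never asked for anything, so the problem is before PXE.
        return "inspect serial console and boot order"
    if not host_advanced:
        # Requests arrived, so the problem is in what ran after download.
        return "inspect scout or DPU agent logs"
    return "boot path looks healthy"
```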
DPU Agent Cannot Reach nico-api
NICo waits for the DPU agent to report health and applied network config.
Check:
On the DPU:
Common causes:
- DPU OS did not boot.
- nico-dpu-agent is not running.
- DPU cannot resolve or reach nico-api.
- TLS root CA is missing or stale.
- DPU network config version does not match desired state.
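The resolve, reach, and TLS causes can be separated by testing each stage in order and stopping at the first failure. The following is a rough sketch meant to be run from the DPU; `check_api_path` is a hypothetical helper, and the hostname, port, and CA bundle path are assumptions that depend on the site.

```python
import socket
import ssl


def check_api_path(host, port=443, ca_file=None, timeout=5.0):
    """Return (stage, detail) for the first failing stage, or ("ok", "")."""
    # Stage 1: name resolution.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))

    # Stage 2: TCP reachability.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))

    # Stage 3: TLS handshake, verified against the given root CA bundle
    # (or the system trust store when ca_file is None).
    ctx = ssl.create_default_context(cafile=ca_file)
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            pass
    except (ssl.SSLError, OSError) as exc:
        sock.close()
        return ("tls", str(exc))
    return ("ok", "")
```

A `tls` result with a certificate verification message is consistent with a missing or stale root CA. A `dns` or `tcp` result points at resolution or routing instead. The check does not cover the agent process itself or the network config version.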
BGP, HBN, and Edge Connectivity
When DPU services are up but network health is not, inspect HBN and FRR.
Common causes:
- TOR or route-server peering is down.
- DPU interface is down.
- HBN container is unhealthy.
- DPU config version is stale.
- Fabric configuration is not applied.
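To check the peering cause quickly, the JSON form of the FRR BGP summary (for example from `vtysh -c "show bgp summary json"`) can be scanned for sessions that are not established. The sketch below assumes the output is keyed by address family, with a `peers` map whose entries carry a `state` field; verify that shape against the FRR version running in HBN. `down_peers` is a hypothetical helper.

```python
import json


def down_peers(summary_json):
    """Return (family, peer, state) for every peer not in Established."""
    data = json.loads(summary_json)
    down = []
    for family, body in data.items():
        if not isinstance(body, dict):
            continue  # ignore scalar top-level fields
        for peer, info in body.get("peers", {}).items():
            state = info.get("state", "unknown")
            if state != "Established":
                down.append((family, peer, state))
    return down
```

An empty result with unhealthy networking suggests looking past peering, at interface state, the HBN container, or the applied config version.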