NICo uses DMTF Redfish to discover, provision, and monitor bare-metal hosts and their DPUs through BMC (Baseboard Management Controller) interfaces. This document traces the end-to-end workflow from initial DHCP discovery through ongoing monitoring.
For the overall NICo architecture and component responsibilities, see Overview and components. The Site Explorer component described there is the primary consumer of Redfish APIs.
When a BMC on the underlay network sends a DHCP request, the NICo DHCP server (a Kea hook plugin) captures it and forwards the discovery information to NICo Core.
The Kea hook is implemented as a Rust library with C FFI bindings. When a DHCP packet arrives, the hook:
- Builds a Discovery struct from the packet fields
- Sends a discover_dhcp() request to NICo Core with the MAC and vendor string
- Receives a Machine response containing the network configuration (IP address, gateway, etc.) to return to the BMC
The vendor class string is parsed to identify the BMC type and capabilities. DHCP entries are tracked in the database by MAC address and associated with machine interfaces.
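The hook flow above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the field names on Discovery and Machine, the normalize_mac helper, the handle_packet function, and the closure signature standing in for the discover_dhcp() gRPC call are all hypothetical, and an in-memory map stands in for the database.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of what the hook extracts from a DHCP packet.
#[derive(Debug, Clone, PartialEq)]
pub struct Discovery {
    pub mac: String,
    pub vendor_class: String,
}

/// Hypothetical subset of the network configuration returned to the BMC.
#[derive(Debug, Clone, PartialEq)]
pub struct Machine {
    pub ip_address: String,
    pub gateway: String,
}

/// Lowercase the MAC and use ':' separators so it is a stable lookup key.
pub fn normalize_mac(mac: &str) -> String {
    mac.trim().to_ascii_lowercase().replace('-', ":")
}

/// Forward one discovery to the core and record the answer by MAC address.
/// `discover_dhcp` stands in for the gRPC call; its signature is an assumption.
pub fn handle_packet(
    discover_dhcp: impl Fn(&str, &str) -> Option<Machine>,
    entries: &mut HashMap<String, Machine>,
    discovery: &Discovery,
) -> Option<Machine> {
    let mac = normalize_mac(&discovery.mac);
    let machine = discover_dhcp(&mac, &discovery.vendor_class)?;
    // An in-memory map stands in for the database table keyed by MAC.
    entries.insert(mac, machine.clone());
    Some(machine)
}
```

Passing the gRPC call as a closure keeps the sketch free of networking dependencies and makes the MAC-keyed bookkeeping easy to exercise in isolation.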
Key files:
- crates/dhcp/src/discovery.rs: Discovery struct and FFI entry points (discovery_fetch_machine)
- crates/dhcp/src/machine.rs: Machine::try_fetch() sends gRPC discovery request
- crates/dhcp/src/vendor_class.rs: Vendor class parsing and BMC type identification
- crates/api-model/src/dhcp_entry.rs: DhcpEntry database model
Once NICo knows about a BMC IP from DHCP, the Site Explorer component continuously probes and inventories it via Redfish.
Site Explorer first sends an anonymous (unauthenticated) GET to /redfish/v1 (the Redfish service root) to detect the BMC vendor. The RedfishVendor enum identifies the vendor from the service root response, which determines vendor-specific behavior for subsequent operations.
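As a rough illustration of vendor detection, the sketch below classifies the Vendor string that the Redfish ServiceRoot schema defines. The enum variants and the substring rules are assumptions made for this example; the real RedfishVendor enum may have different variants and may rely on other fields, since not every BMC populates Vendor.

```rust
/// Hypothetical vendor set; the real RedfishVendor enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vendor {
    Dell,
    Hpe,
    Lenovo,
    Supermicro,
    Nvidia,
    Unknown,
}

/// Classify the `Vendor` property read from GET /redfish/v1.
/// Matching is case-insensitive and substring-based (an assumption).
pub fn detect_vendor(service_root_vendor: Option<&str>) -> Vendor {
    let v = match service_root_vendor {
        Some(v) => v.trim().to_ascii_lowercase(),
        None => return Vendor::Unknown,
    };
    if v.contains("dell") {
        Vendor::Dell
    } else if v.contains("hpe") || v.contains("hewlett") {
        Vendor::Hpe
    } else if v.contains("lenovo") {
        Vendor::Lenovo
    } else if v.contains("supermicro") {
        Vendor::Supermicro
    } else if v.contains("nvidia") {
        Vendor::Nvidia
    } else {
        Vendor::Unknown
    }
}
```

Returning an explicit Unknown variant rather than an error lets the caller decide whether to retry, fall back to generic behavior, or skip the endpoint.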
After vendor detection, Site Explorer creates an authenticated Redfish session using one of three methods:
With an authenticated session, Site Explorer queries a comprehensive set of Redfish resources and produces an EndpointExplorationReport containing:
Serial numbers are trimmed of whitespace. If system.serial_number is missing, the chassis serial number is used as a fallback.
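The trimming and fallback rule can be expressed as a small helper. The function name effective_serial is hypothetical, and treating a whitespace-only serial as missing is an assumption of this sketch rather than documented behavior.

```rust
/// Trim a candidate serial and drop it if nothing is left.
fn clean(serial: Option<&str>) -> Option<String> {
    serial
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
}

/// Prefer the system serial number; fall back to the chassis serial number.
pub fn effective_serial(system: Option<&str>, chassis: Option<&str>) -> Option<String> {
    clean(system).or_else(|| clean(chassis))
}
```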
Key files:
- crates/site-explorer/src/redfish.rs: RedfishClient (get_redfish_vendor(), create_redfish_client(), inventory queries)
- crates/site-explorer/src/bmc_endpoint_explorer.rs: BmcEndpointExplorer orchestrates credential lookup and exploration
- crates/api-model/src/bmc_info.rs: BmcInfo model (IP, port, MAC, firmware version)
Once Site Explorer has explored both host BMCs and DPU BMCs, it matches them into host-DPU pairs using serial number correlation. This is the core logic that answers: “which DPU belongs to which host?”
After the DPU serial number map is built (Step 1), the algorithm tries three matching strategies in order (Steps 2 to 4):
Step 1 — Build DPU serial number map:
For each explored DPU endpoint, extract system.serial_number and create a map: DPU serial → explored endpoint.
Step 2 — Primary match via PCIe devices:
For each host, iterate through system.pcie_devices. For each device where is_bluefield() returns true (BF2, BF3, or BF3 Super NIC), look up pcie_device.serial_number in the DPU serial map. A match means this DPU is physically installed in this host.
Step 3 — Fallback match via chassis network adapters:
If no BlueField PCIe devices were found (Step 2 count = 0), iterate through chassis.network_adapters instead. For each adapter where is_bluefield_model(part_number) is true, look up network_adapter.serial_number in the DPU serial map.
Step 4 — Final fallback via expected machines manifest:
If the explored matches are incomplete, check expected_machine.fallback_dpu_serial_numbers for manually specified DPU-to-host associations.
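The four steps can be sketched as follows. The types here are hypothetical, flattened stand-ins for the explored reports: boolean fields replace is_bluefield() and is_bluefield_model(part_number), and an endpoint is just a string. How "incomplete" is decided in Step 4 is not stated above, so this sketch assumes that any manifest serial whose DPU is not already matched gets added.

```rust
use std::collections::HashMap;

/// Hypothetical, flattened view of one PCIe device from a host report.
#[derive(Debug, Clone)]
pub struct PcieDevice {
    pub serial_number: String,
    pub is_bluefield: bool, // stands in for is_bluefield()
}

/// Hypothetical, flattened view of one chassis network adapter.
#[derive(Debug, Clone)]
pub struct NetworkAdapter {
    pub serial_number: String,
    pub is_bluefield_model: bool, // stands in for is_bluefield_model(part_number)
}

#[derive(Debug, Clone, Default)]
pub struct Host {
    pub pcie_devices: Vec<PcieDevice>,
    pub network_adapters: Vec<NetworkAdapter>,
    pub fallback_dpu_serial_numbers: Vec<String>,
}

/// Step 1: map DPU serial -> explored endpoint (a BMC address string here).
pub fn build_dpu_map(dpus: &[(&str, &str)]) -> HashMap<String, String> {
    dpus.iter()
        .map(|(serial, endpoint)| (serial.trim().to_string(), endpoint.to_string()))
        .collect()
}

/// Steps 2 to 4: return the endpoints of the DPUs matched to this host.
pub fn match_dpus(host: &Host, dpu_map: &HashMap<String, String>) -> Vec<String> {
    let lookup = |serial: &str| dpu_map.get(serial.trim()).cloned();

    // Step 2: primary match via BlueField PCIe devices.
    let bluefield: Vec<&PcieDevice> =
        host.pcie_devices.iter().filter(|d| d.is_bluefield).collect();
    let mut matched: Vec<String> = bluefield
        .iter()
        .filter_map(|d| lookup(d.serial_number.as_str()))
        .collect();

    // Step 3: only when no BlueField PCIe devices were found at all.
    if bluefield.is_empty() {
        matched = host
            .network_adapters
            .iter()
            .filter(|a| a.is_bluefield_model)
            .filter_map(|a| lookup(a.serial_number.as_str()))
            .collect();
    }

    // Step 4: add manifest-specified DPUs that are not matched yet (assumption).
    for serial in &host.fallback_dpu_serial_numbers {
        if let Some(endpoint) = lookup(serial.as_str()) {
            if !matched.contains(&endpoint) {
                matched.push(endpoint);
            }
        }
    }
    matched
}
```

Note that Step 3 is gated on the count of BlueField PCIe devices, not on the count of successful matches, which mirrors the "Step 2 count = 0" condition in the text.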
Before accepting a pairing, NICo validates:
- check_and_configure_dpu_mode() verifies the DPU is correctly configured for its model. Hosts with misconfigured DPUs are not ingested.
Once all DPUs are matched and validated, the host enters an “ingestable” state and Site Explorer kickstarts the ingestion process via the ManagedHost state machine.
Key file:
- crates/site-explorer/src/lib.rs: identify_managed_hosts() with the complete pairing algorithm
After pairing, the DPU must be provisioned with NICo software. This is orchestrated via Temporal workflows (in nico-rest) with Redfish power control (in infra-controller-core).
The DPU is configured to boot from HTTP IPv4 UEFI, which directs it to the NICo PXE server. The PXE server serves different artifacts based on architecture:
- nico.efi with cloud-init user-data containing machine_id and server_uri
- scout.efi with machine discovery parameters (cli_cmd=auto-detect)
The DPU is power-cycled via Redfish to trigger the network boot:
The power control operation supports multiple reset types: On, ForceOff, GracefulShutdown, GracefulRestart, ForceRestart, ACPowercycle, PowerCycle.
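For orientation, the sketch below maps the reset types listed above onto the standard DMTF Redfish ComputerSystem.Reset action. The enum and function names are hypothetical and are not the libredfish API. The URI follows the conventional layout; a robust client would read the action target from the system resource instead. ACPowercycle is not a standard Redfish ResetType value, so this sketch assumes it is handled through a vendor-specific path and returns None for it.

```rust
/// Hypothetical mirror of the reset types listed in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerControl {
    On,
    ForceOff,
    GracefulShutdown,
    GracefulRestart,
    ForceRestart,
    AcPowercycle,
    PowerCycle,
}

impl PowerControl {
    /// Standard Redfish `ResetType` string, where one exists.
    pub fn reset_type(self) -> Option<&'static str> {
        match self {
            PowerControl::On => Some("On"),
            PowerControl::ForceOff => Some("ForceOff"),
            PowerControl::GracefulShutdown => Some("GracefulShutdown"),
            PowerControl::GracefulRestart => Some("GracefulRestart"),
            PowerControl::ForceRestart => Some("ForceRestart"),
            PowerControl::PowerCycle => Some("PowerCycle"),
            // Not a standard ResetType; assumed to be vendor-specific.
            PowerControl::AcPowercycle => None,
        }
    }
}

/// Conventional URI of the reset action for one system.
pub fn reset_action_path(system_id: &str) -> String {
    format!("/redfish/v1/Systems/{}/Actions/ComputerSystem.Reset", system_id)
}

/// Build the (path, JSON body) pair for a POST, if the type is standard.
pub fn reset_request(system_id: &str, op: PowerControl) -> Option<(String, String)> {
    let reset_type = op.reset_type()?;
    Some((
        reset_action_path(system_id),
        format!("{{\"ResetType\":\"{}\"}}", reset_type),
    ))
}
```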
After PXE boot, the DPU:
- Downloads nico.efi from the NICo PXE server over HTTP
- Obtains its machine_id and the NICo API endpoint
- Starts the DPU agent (dpu-agent), which connects back to NICo Core via gRPC
Key files:
- crates/api/src/ipxe.rs: iPXE instruction generation per architecture
- pxe/ipxe/local/embed.ipxe: iPXE boot script template
- nico-rest/workflow/pkg/workflow/instance/reboot.go: RebootInstance Temporal workflow
- nico-rest/site-workflow/pkg/grpc/client/instance_powercycle.go: Power cycle gRPC call to site agent
With the DPU provisioned, NICo configures the host BIOS and boot order via Redfish.
NICo sets BIOS attributes required for bare-metal infrastructure operation. This includes SR-IOV enablement and other platform-specific settings. BIOS operations use the libredfish Redfish trait:
- bios(): Read current BIOS attributes
- set_bios(): Set BIOS attribute values
- machine_setup(): Apply infrastructure-specific BIOS configuration
- is_bios_setup() / machine_setup_status(): Check configuration state
These translate to Redfish calls:
The host boot order is set so the DPU’s network interface is the primary boot device:
This configures the UEFI boot order to prioritize the DPU’s PF MAC address, ensuring the host boots through the DPU’s network path.
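One way to picture "DPU first" is as a stable reorder of the Redfish boot options. The sketch below is an assumption about the mechanics, not the body of set_boot_order_dpu_first(): it takes flattened BootOption records (BootOptionReference plus UefiDevicePath) and moves entries whose device path contains a MAC(...) node for the DPU's PF MAC to the front. The type and function names are hypothetical.

```rust
/// Hypothetical, flattened view of a Redfish BootOption resource.
#[derive(Debug, Clone, PartialEq)]
pub struct BootOption {
    pub reference: String,        // BootOptionReference, e.g. "Boot0002"
    pub uefi_device_path: String, // UefiDevicePath
}

/// Keep only the hex digits of a MAC, lowercased: "0C:42:A1" -> "0c42a1".
fn mac_hex(mac: &str) -> String {
    mac.chars()
        .filter(|c| c.is_ascii_hexdigit())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Return boot option references with the DPU PF MAC entries first.
/// The relative order inside each group is preserved.
pub fn dpu_first(options: &[BootOption], dpu_pf_mac: &str) -> Vec<String> {
    let hex = mac_hex(dpu_pf_mac);
    if hex.len() != 12 {
        // Not a usable MAC: leave the order untouched.
        return options.iter().map(|o| o.reference.clone()).collect();
    }
    // UEFI device path text renders a MAC node as MAC(<12 hex digits>,<type>).
    let needle = format!("mac({}", hex);
    let (dpu, rest): (Vec<&BootOption>, Vec<&BootOption>) = options
        .iter()
        .partition(|o| o.uefi_device_path.to_ascii_lowercase().contains(&needle));
    dpu.into_iter()
        .chain(rest)
        .map(|o| o.reference.clone())
        .collect()
}
```

The resulting list is what a client would send as the new boot order; matching on the device path avoids depending on vendor-specific display names.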
After BIOS and boot order changes, the host is power-cycled via Redfish to apply the configuration:
Power cycles are rate-limited to avoid excessive reboots (checked via time_since_redfish_powercycle against config.reset_rate_limit).
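The rate-limit check amounts to a single comparison. In this sketch the function name powercycle_allowed is hypothetical, and treating "never power-cycled" (None) as allowed is an assumption; the parameter names follow time_since_redfish_powercycle and config.reset_rate_limit from the text.

```rust
use std::time::Duration;

/// Allow a power cycle only if the last one is at least `reset_rate_limit` old.
pub fn powercycle_allowed(
    time_since_redfish_powercycle: Option<Duration>,
    reset_rate_limit: Duration,
) -> bool {
    match time_since_redfish_powercycle {
        // No recorded power cycle yet: nothing to rate-limit (assumption).
        None => true,
        Some(elapsed) => elapsed >= reset_rate_limit,
    }
}
```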
Key files:
- crates/site-explorer/src/redfish.rs: set_boot_order_dpu_first(), redfish_powercycle()
- crates/site-explorer/src/bmc_endpoint_explorer.rs: Orchestrates boot order with credential lookup
Once hosts are provisioned, the nico-hw-health service continuously monitors both host BMCs and DPU BMCs via Redfish. The endpoint discovery calls find_machine_ids with include_dpus: true, so every BMC known to NICo (host and DPU) gets its own set of collectors:
Each collector runs independently per BMC endpoint, meaning a host with two DPUs will have three sets of collectors (one for the host BMC, one for each DPU BMC).
The FirmwareCollector periodically queries each BMC’s firmware inventory using nv-redfish:
This translates to:
Each firmware item's name and version are exported as a Prometheus gauge metric with labels:
- serial_number: Machine chassis serial
- machine_id: NICo machine UUID
- bmc_mac: BMC MAC address
- firmware_name: Component name (e.g., “BMC_Firmware”, “DPU_NIC”)
- version: Firmware version string
Sensors (temperature, fan speed, power consumption, current draw) are collected at configurable intervals:
Sensor data is read from:
Sensor types include: Temperature (Cel), Rotational/Fan (RPM), Power (W), and Current (A).
All sensor data is exported as Prometheus metrics on the /metrics endpoint (port 9009) and fed into NICo Core via RecordHardwareHealthReport for health aggregation.
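To make the exported shape concrete, here is a minimal sketch that renders one gauge sample in the Prometheus text exposition format. The metric names used in the tests, the SensorKind enum, and the gauge_line helper are all hypothetical; the units come from the sensor types listed above. A real service would normally use a metrics library rather than formatting lines by hand.

```rust
/// Hypothetical sensor categories with the units named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SensorKind {
    Temperature,
    Fan,
    Power,
    Current,
}

impl SensorKind {
    pub fn unit(self) -> &'static str {
        match self {
            SensorKind::Temperature => "Cel",
            SensorKind::Fan => "RPM",
            SensorKind::Power => "W",
            SensorKind::Current => "A",
        }
    }
}

/// Escape a label value: backslash, double quote, and newline.
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Render `name{k1="v1",k2="v2"} value` for one gauge sample.
pub fn gauge_line(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}=\"{}\"", k, escape_label(v)))
        .collect();
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}
```

The same helper covers the firmware gauge described earlier: pass the firmware labels (serial_number, machine_id, bmc_mac, firmware_name, version) with a constant value.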
Key files:
- crates/health/src/firmware_collector.rs: FirmwareCollector using nv-redfish
- crates/health/src/discovery.rs: Creates and manages collectors per endpoint
- crates/health/src/config.rs: Polling intervals and concurrency configuration
NICo uses two Redfish client libraries concurrently. nv-redfish is replacing libredfish over time. Versions are pinned in the workspace dependencies in Cargo.toml.
libredfish provides a Redfish trait with vendor-specific implementations (Dell, HPE, Lenovo, Supermicro, NVIDIA DPU/GB200/GH200/Viking). It handles the full breadth of BMC operations.
nv-redfish uses a code-generation approach: CSDL (Redfish schema XML) is compiled into strongly-typed Rust at build time. It is feature-gated so only needed Redfish services are compiled in. Currently enabled features in NICo: std-redfish, update-service, resource-status.
Both libraries are declared in the workspace Cargo.toml.
For the complete list of Redfish endpoints and their required response fields, see Redfish Endpoints Reference.