Ingesting Hosts
Once you have NVIDIA Infra Controller (NICo) up and running, you can begin ingesting machines.
Prerequisites
Ensure you have the following prerequisites met before ingesting machines:
- You have the carbide-admin-cli command available. You can compile it from source, use the pre-compiled binary, or use a containerized version.
- You can access the NICo site using carbide-admin-cli.
- The NICo API service is running at IP address NICo_API_EXTERNAL. It is recommended that you add this IP address to your trusted list.
- DHCP requests from all managed host IPMI networks have been forwarded to the NICo service running at IP address NICo_DHCP_EXTERNAL.
- You have the following information for all hosts that need to be ingested:
  - The MAC address of the host BMC
  - The chassis serial number
  - The host BMC username (typically this is the factory default username)
  - The host BMC password (typically this is the factory default password)
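Before ingesting, it can help to sanity-check the per-host information you have collected. The following is a minimal Python sketch, not part of NICo: the HostRecord class, its attribute names, and the find_problems helper are hypothetical names chosen for illustration. It only checks that each BMC MAC address is well formed and unique and that no value is empty.

```python
import re
from dataclasses import dataclass

# Six hex octets joined by one consistent separator (":" or "-").
_MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}([:-])[0-9A-Fa-f]{2}(?:\1[0-9A-Fa-f]{2}){4}$")


@dataclass(frozen=True)
class HostRecord:
    """Hypothetical container for the four per-host values listed above."""
    bmc_mac: str
    chassis_serial: str
    bmc_username: str
    bmc_password: str


def find_problems(records):
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    seen_macs = set()
    for i, rec in enumerate(records):
        if not _MAC_RE.match(rec.bmc_mac):
            problems.append(f"record {i}: malformed BMC MAC {rec.bmc_mac!r}")
        # Normalize so "aa-bb-..." and "AA:BB:..." count as the same address.
        mac_key = rec.bmc_mac.replace("-", ":").upper()
        if mac_key in seen_macs:
            problems.append(f"record {i}: duplicate BMC MAC {rec.bmc_mac!r}")
        seen_macs.add(mac_key)
        for name in ("chassis_serial", "bmc_username", "bmc_password"):
            if not getattr(rec, name).strip():
                problems.append(f"record {i}: empty {name}")
    return problems
```

Running find_problems over your inventory before building the expected machines file catches typos early, when they are cheaper to fix than a host that never appears during ingestion.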
Update Site
NICo requires knowledge of the current and desired BMC and UEFI credentials for hosts and DPUs. When ingesting a host, NICo resets the current credentials on the BMC and UEFI to the desired credentials. You can use these credentials when accessing the host or DPU BMC yourself, and NICo uses them for its automated processes.
The required credentials include the following:
- Host BMC Credential
- DPU BMC Credential
- Host UEFI password
- DPU UEFI password
Note: The following commands use the <api-url> placeholder, which is typically the following:
Update Host and DPU BMC Password
Run this command to update the desired Host and DPU BMC password:
Update Host UEFI Password
Run this command to generate the desired host UEFI password:
Run this command to update the host UEFI password:
Run this command to update the DPU UEFI password:
Add Expected Machines Table
NICo needs to know the factory default credentials for each BMC, which are expressed as a JSON table of “Expected Machines”. The serial number is used to verify that the BMC MAC address matches the actual serial number of the chassis.
Prepare an expected_machines.json file as follows:
Only servers listed in this table will be ingested, so you must include all servers in this file.
Optional Per-Host Fields
Each entry supports additional optional fields:
- host_lifecycle_profile (object): Per-host profile for settings that affect state-machine progression. Future per-host knobs should be added here.
  - disable_lockdown (bool, default false): When true, the state machine does not lock down the host during lifecycle management. This is useful for automation workflows that need lockdown persistently disabled.
- dpf_enabled (bool): Enable or disable DPF for this host.
- dpu_mode ("dpu_mode" | "nic_mode" | "no_dpu"): Per-host DPU operating mode.
- bmc_retain_credentials (bool): Skip BMC password rotation.
- default_pause_ingestion_and_poweron (bool): Pause ingestion and power-on for this host.
- bmc_ip_address (string): Static BMC IP (pre-allocates a machine interface).
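The optional fields above can be checked locally before upload. The following Python sketch is illustrative only: check_optional_fields is a hypothetical helper, not a NICo API. It checks only the field names, types, and allowed values listed above. It also assumes that bmc_ip_address is a textual IPv4 or IPv6 address, which this page does not state explicitly.

```python
import ipaddress

_BOOL_FIELDS = (
    "dpf_enabled",
    "bmc_retain_credentials",
    "default_pause_ingestion_and_poweron",
)
_DPU_MODES = ("dpu_mode", "nic_mode", "no_dpu")


def check_optional_fields(entry):
    """Return a list of problems found in one entry's optional fields."""
    problems = []
    for name in _BOOL_FIELDS:
        if name in entry and not isinstance(entry[name], bool):
            problems.append(f"{name} must be a bool")
    if "dpu_mode" in entry and entry["dpu_mode"] not in _DPU_MODES:
        problems.append(f"dpu_mode must be one of {list(_DPU_MODES)}")
    if "bmc_ip_address" in entry:
        value = entry["bmc_ip_address"]
        try:
            # Assumption: the value is a standard textual IP address.
            if not isinstance(value, str):
                raise ValueError("not a string")
            ipaddress.ip_address(value)
        except ValueError:
            problems.append("bmc_ip_address is not a valid IP address string")
    profile = entry.get("host_lifecycle_profile")
    if profile is not None:
        if not isinstance(profile, dict):
            problems.append("host_lifecycle_profile must be an object")
        elif not isinstance(profile.get("disable_lockdown", False), bool):
            problems.append("disable_lockdown must be a bool")
    return problems
```

The helper ignores any keys it does not recognize, so it can be run against full entries after loading the file with json.load without needing to know the required field names.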
When the file is ready, upload it to the site with the following command:
Approve all Machines for Ingestion
NICo uses Measured Boot with the on-host Trusted Platform Module (TPM) v2.0 to enforce the cryptographic identity of the host hardware and firmware. The following command configures NICo to approve all pending machines based on Platform Configuration Registers (PCRs) 0, 3, 5, and 6.
What Happens After Approval: Ingestion to Ready
Once machines are approved, NICo’s Site Explorer begins automatically ingesting them. No further operator action is required under normal circumstances.
The high-level flow is:
- DHCP discovery: the host BMC sends a DHCP request; NICo assigns an IP and Site Explorer probes the BMC over Redfish to collect a full inventory. Site Explorer authenticates using the factory default credentials from the expected machines table, then rotates the BMC password to the site-wide credential. See Redfish Workflow for details.
- Preingestion: before pairing, NICo runs a preingestion state machine against each discovered BMC endpoint (both host and DPU). It checks that the BMC clock is within an acceptable drift of the site time, resetting the BMC if not. For host endpoints, firmware components are upgraded if they are below the minimum version required for ingestion.
- DPU-host pairing: Site Explorer correlates host and DPU serial numbers to form matched pairs. Once all DPUs are validated and matched, the ManagedHost object is created and the state machine starts.
- DpuDiscoveringState/DPUInit: NICo configures Secure Boot on the DPU, installs the DPU OS (BFB image), and power-cycles the host to apply the new DPU configuration.
- HostInit: NICo configures BIOS, sets the host boot order, optionally collects TPM attestation measurements, waits for hardware discovery via the scout agent, and applies UEFI lockdown. When the scout agent reports back, NICo replaces the temporary predicted host ID (prefix fm100p) with a stable host ID (prefix fm100h) derived from the host’s own DMI serial data or TPM certificate.
- BomValidating/Validation: NICo validates the discovered hardware against the expected SKU. If hardware validation is enabled, the host is rebooted and tested before proceeding.
- Ready: the host transitions through HostInit/Discovered and enters the available pool, ready for an instance to be assigned to it.
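Because the host ID prefix changes during HostInit, the prefix alone indicates whether the scout agent has reported back yet. The following Python sketch shows one way to use that in your own tooling. classify_host_id is a hypothetical helper, and the sample IDs in the usage note are made up; only the two prefixes come from the flow above.

```python
PREDICTED_PREFIX = "fm100p"  # temporary predicted host ID
STABLE_PREFIX = "fm100h"     # stable host ID, assigned after scout reports back


def classify_host_id(host_id):
    """Classify a host ID as 'predicted', 'stable', or 'other' by its prefix."""
    if host_id.startswith(PREDICTED_PREFIX):
        return "predicted"
    if host_id.startswith(STABLE_PREFIX):
        return "stable"
    # Anything else (for example, an ID of another object type) is not classified.
    return "other"
```

For example, classify_host_id("fm100pEXAMPLE") returns "predicted", which suggests the host has not yet completed hardware discovery, while an ID starting with fm100h returns "stable".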
For the complete state transitions, including substates, retry logic, and reprovision paths, see the Managed Host State Diagrams.
Troubleshooting: Host and DPU Ingestion Issues
When a machine is not being created or is stuck in a pre-Ready state, carbide-api logs are the primary investigation tool. Filtering logs by the host BMC IP or DPU BMC IP is often the fastest way to understand where ingestion or pairing is failing.
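A plain substring search for a BMC IP can return false hits, because 10.0.0.1 also appears inside 10.0.0.10. The following Python sketch filters log lines for an exact IPv4 address. lines_for_ip is a hypothetical helper, and it makes no assumption about the carbide-api log format beyond the IP appearing as text somewhere in the line.

```python
import re


def lines_for_ip(lines, ip):
    """Yield the lines that mention `ip` as a complete IPv4 address."""
    # The lookbehind rejects a preceding digit or dot ("110.0.0.1").
    # The lookaheads reject a following digit or ".digit" ("10.0.0.10"),
    # while still allowing sentence punctuation such as "10.0.0.1." or "10.0.0.1,".
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)(?!\.\d)")
    for line in lines:
        if pattern.search(line):
            yield line


if __name__ == "__main__":
    import sys

    # Usage: <command that prints carbide-api logs> | python filter_logs.py <bmc-ip>
    for matched in lines_for_ip(sys.stdin, sys.argv[1]):
        sys.stdout.write(matched)
```

Running the filter once with the host BMC IP and once with the DPU BMC IP gives two short timelines that are easier to compare than the full log.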
You can check the current detailed state of any managed host using:
For a full guide on diagnosing stuck objects, including how to use the NICo Grafana dashboard and how to read state handler error logs, see Stuck Objects Runbook.
Endpoint Exploration Errors
Before pairing can occur, Site Explorer must successfully explore each BMC endpoint. Exploration failures are logged and surfaced in carbide-api logs and the NICo Grafana dashboard. Common error types:
For a complete reference of all Redfish endpoints and required response fields, see Redfish Endpoints Reference.
Common Blockers During Host + DPU Pairing
Under the following conditions, Site Explorer cannot complete pairing and reports a host_dpu_pairing_blockers_count metric. Each requires operator investigation.
DPU-Related Issues: Installing a Fresh DPU OS
For DPU pairing failures, including dpu_pf0_mac_missing and cases where the DPU is in an unknown or corrupt state, a common fix is to install a vanilla pre-ingestion BFB image via rshim to return the DPU to a clean state. This runs as part of the preingestion state machine:
This command copies the NICo BFB image directly to the DPU via rshim (SSH to the DPU BMC) and triggers a DPU reboot to complete the installation. After the BFB is installed, NICo power-cycles the host automatically to apply the new DPU image.
Note: The --host-bmc-ip flag is required. NICo uses it to power-cycle the host after the BFB copy completes. Use --pre-copy-powercycle if the host needs to release rshim control to the DPU BMC before the copy can start.
For additional DPU-specific troubleshooting, including Secure Boot configuration, BMC password resets, and firmware version checks, see Adding New Machines to an Existing Site.
Managing the Expected Machines Table
The expected machines table in the carbide-api database holds the following fields per host:
- Chassis Serial Number
- BMC MAC Address
- BMC login set by the manufacturer (factory default username)
- BMC password set by the manufacturer (factory default password)
- DPU chassis serial number (only needed for DGX-H100 or other machines where the NetworkAdapter serial number is not available in the host Redfish)
Individual operations
Use carbide-admin-cli to operate on individual entries:
Bulk operations
Replace all entries from a JSON file:
Erase all entries:
Export
Export the current table as JSON: