Quick Start Guide
This guide walks through deploying NICo end-to-end: from building containers to discovering your first managed host. The core deployment is orchestrated by setup.sh in the helm-prereqs/ directory, which installs all prerequisites and NICo components in the correct order.
Before starting, review the Prerequisites for hardware, networking, software, and BMC/OOB requirements.
Step 1 — Build NICo Containers
Build all NICo container images from source on Ubuntu 24.04. This produces images for Infra Controller Core, DPU BFB artifacts, and the admin CLI.
Refer to the Building NICo Containers manual for full build instructions, including x86_64 and aarch64 cross-compilation steps.
Push the built images to your container registry before proceeding.
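As an illustrative sketch only (the registry prefix and image names below are placeholders, not the actual build outputs), a retag-and-push loop could look like the following. It prints the commands rather than executing them, so you can review them first:

```shell
#!/bin/sh
# Illustrative only: print retag/push commands for review before running them.
# The registry prefix and image names are placeholders; substitute the names
# your build actually produced.
REGISTRY="registry.example.com/nico"
for img in nvmetal-carbide carbide-rest-api; do
  echo "docker tag ${img}:latest ${REGISTRY}/${img}:latest"
  echo "docker push ${REGISTRY}/${img}:latest"
done
```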
Step 2 — Prepare the Kubernetes Cluster
NICo requires a Kubernetes cluster with at least three schedulable nodes (Ready, not tainted NoSchedule/NoExecute) for HA Vault and PostgreSQL. NICo does not provision the cluster itself—operators are expected to provision their own Kubernetes cluster that meets the requirements below using their preferred tooling (kubeadm, Kubespray, managed K8s, etc.).
Validated baseline:
The cluster must have:
- net.bridge.bridge-nf-call-iptables=1 and net.ipv4.ip_forward=1 on every node.
- Working DNS resolution (kubernetes.default.svc.cluster.local resolves on every node).
- Network connectivity to your container registry.
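A quick per-node sanity check for the kernel settings above might look like this (run it on each node; a missing bridge entry usually means the br_netfilter module is not loaded):

```shell
#!/bin/sh
# Check the kernel settings NICo's cluster nodes need.
# Reads values from /proc/sys rather than calling sysctl.
for key in net/bridge/bridge-nf-call-iptables net/ipv4/ip_forward; do
  name=$(echo "$key" | tr '/' '.')
  if [ -r "/proc/sys/$key" ]; then
    echo "$name=$(cat "/proc/sys/$key")"
  else
    echo "$name=missing (module not loaded?)"
  fi
done
```

Both values should print as 1 on a correctly prepared node.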
Required tools (local machine)
The following tools must be installed on the machine that you will use to run setup.sh—not on the Kubernetes cluster itself.
The helmfile tool requires the helm-diff plugin. Install it as follows:
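The helm-diff plugin is published at github.com/databus23/helm-diff. A guarded install sketch:

```shell
#!/bin/sh
# Install the helm-diff plugin that helmfile requires.
# Skips cleanly when helm is missing or the plugin is already present.
if ! command -v helm >/dev/null 2>&1; then
  echo "helm not found on PATH; install helm first"
elif helm plugin list 2>/dev/null | grep -q '^diff'; then
  echo "helm-diff already installed"
else
  helm plugin install https://github.com/databus23/helm-diff \
    || echo "plugin install failed; install it manually"
fi
```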
Step 3 — Configure the Site
Everything in this step must be done before running setup.sh. Skipping any item will either cause setup to fail or result in a deployment with incorrect site configuration that is hard to fix after the fact.
3a. Set Required Environment Variables
NCX_IMAGE_REGISTRY is used for both NCX Core (<registry>/nvmetal-carbide) and NCX REST (<registry>/carbide-rest-*). Push all images to this registry before running setup.
Obtain an NGC API key at ngc.nvidia.com → API Keys → Generate Personal Key.
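A sketch of the documented variable (the value is an example; NCX_REPO and NCX_SITE_UUID appear in later steps). Supply your NGC API key in whatever form setup.sh itself documents; its variable name is not shown in this guide:

```shell
#!/bin/sh
# Set in the same shell that will run setup.sh. The value is an example;
# use the registry you pushed the Step 1 images to.
export NCX_IMAGE_REGISTRY="registry.example.com/nico"
echo "NCX_IMAGE_REGISTRY=$NCX_IMAGE_REGISTRY"
```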
3b. Set your Site Name
Open helm-prereqs/values.yaml and change siteName from the placeholder to your actual site identifier:
This value is injected into every postgres pod as the TMP_SITE environment variable. It must match the sitename in the NCX Core siteConfig block below.
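A minimal sketch of the edit (the siteName value shown is an example):

```yaml
# helm-prereqs/values.yaml (excerpt; the value is an example)
siteName: sjc01-lab    # becomes TMP_SITE in every postgres pod; must match
                       # sitename in the NCX Core siteConfig block (Step 3c)
```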
To tune PostgreSQL resources for your node capacity (the defaults are conservative for dev), edit the following values:
3c. Configure NCX Core Site Deployment
Open helm-prereqs/values/ncx-core.yaml and update the following values:
- API hostname: The external DNS name for the Infra Controller Core API.
- siteConfigTOML block: The site identity, network topology, and resource pools. These fields are most likely to differ per site. All fields are documented with inline comments in the file.
- Required fields—do not leave empty: [networks.admin], prefix, and gateway must be set to real values. carbide-api crashes at startup with a parse error if these are empty strings. Similarly, [pools.lo-ip], [pools.vlan-id], and [pools.vni] ranges must be non-empty.
- Fields safe to leave as empty arrays: dhcp_servers, site_fabric_prefixes, deny_prefixes. Do not delete any field from the TOML block; missing keys cause a different crash than empty ones.
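A hedged sketch of the empty-string versus empty-array distinction. The exact key names and placement must follow the inline comments in helm-prereqs/values/ncx-core.yaml; the values here are placeholders:

```toml
# Illustrative only; follow the inline comments in ncx-core.yaml for the
# authoritative schema. Values are placeholders.
dhcp_servers         = []  # empty array is safe
site_fabric_prefixes = []  # empty array is safe
deny_prefixes        = []  # empty array is safe

[networks.admin]
prefix  = "10.0.10.0/24"   # must be a real prefix; "" makes carbide-api crash at startup
gateway = "10.0.10.1"      # must be a real gateway; "" makes carbide-api crash at startup
```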
3d. Get the NCX REST Repository
NCX REST (infra-controller-rest) is a separate repository that contains the Helm chart, kustomize bases, and helper scripts that setup.sh uses for Phase 7. It is not bundled inside this repo—you need a local clone before running setup.
Option 1: Let setup.sh handle it automatically (recommended)
setup.sh looks for the repo in these locations in order:
- NCX_REPO env var (explicit path—use this if you cloned it somewhere non-standard)
- Sibling directories next to this repo: ../carbide-rest, ../ncx-infra-controller-rest, ../ncx
- If not found anywhere, preflight.sh offers to clone it for you before setup proceeds
If you place the clone next to this repo (the recommended layout), no env var is needed:
Option 2: Clone it manually
Use the following commands to clone the repository:
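A sketch of the manual clone, placing the repo in the recommended sibling location. The remote URL is a placeholder, so the clone command is printed for review rather than executed:

```shell
#!/bin/sh
# Clone NCX REST as a sibling of this repo so setup.sh finds it without NCX_REPO.
# REPO_URL is a placeholder; substitute your organization's actual remote.
REPO_URL="git@example.com:carbide-rest.git"
DEST="../carbide-rest"
if [ -d "$DEST/.git" ]; then
  echo "NCX REST already cloned at $DEST"
else
  echo "git clone $REPO_URL $DEST"   # printed for review; run it yourself
fi
```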
3e. Configure NCX REST Authentication
The default configuration uses the dev Keycloak instance that setup.sh deploys automatically. No changes are needed if you’re running a dev/test environment.
For production, or if you are using your own IdP, edit the helm-prereqs/values/ncx-rest.yaml file as follows:
Option 1: Use your own Keycloak or OIDC-compatible IdP
Option 2: Disable Keycloak and use a generic OIDC issuer
When keycloak.enabled: false, the Keycloak deployment is still created by setup.sh, but carbide-rest-api will not use it for token validation.
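An illustrative excerpt for Option 2. Only keycloak.enabled is confirmed by this guide; the other key names are assumptions, so check the comments in ncx-rest.yaml for the real ones:

```yaml
# helm-prereqs/values/ncx-rest.yaml (illustrative excerpt; only
# keycloak.enabled is documented, the other keys are assumptions)
keycloak:
  enabled: false          # carbide-rest-api stops using the bundled Keycloak
oidc:
  issuerUrl: https://idp.example.com/realms/carbide   # your IdP's issuer URL
```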
3f. Review site-agent Config
The defaults in helm-prereqs/values/ncx-site-agent.yaml should match the dev postgres instance deployed by setup.sh.
DB_USER and DB_PASSWORD are injected at runtime from the db-creds Kubernetes Secret (created by the carbide-rest-common sub-chart during Phase 7g). The Secret is referenced via secrets.dbCreds in the site-agent values.
For production or a different database, override the Secret name and connection config:
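An illustrative excerpt. secrets.dbCreds is the documented key; the rest of the layout is an assumption:

```yaml
# helm-prereqs/values/ncx-site-agent.yaml (illustrative excerpt)
secrets:
  dbCreds: prod-db-creds  # your pre-created Secret containing DB_USER / DB_PASSWORD
```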
3g. Configure MetalLB
MetalLB provides LoadBalancer IPs for NCX Core services (carbide-api, DHCP, DNS, PXE, SSH console). Without it, those services stay in <pending> state and the site is unreachable.
NTP note: NICo does not run a standalone NTP service. Instead, NTP server addresses are provided to managed hosts via DHCP option 42, configured in the carbide-dhcp chart Kea hook parameters (carbide-ntpserver). Point this to your enterprise NTP servers.
Edit helm-prereqs/values/metallb-config.yaml—this file ships pre-populated with example values. Replace all values labeled # EXAMPLE with your site-specific configuration before running setup.sh.
Add or remove BGPPeer blocks to match your node count, with one block per worker node.
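For orientation, the standard MetalLB CRD shapes look like the following; the names, addresses, and ASNs here are examples, and the shipped metallb-config.yaml uses its own values:

```yaml
# Illustrative MetalLB BGP configuration; all values are examples.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: nico-vips
  namespace: metallb-system
spec:
  addresses:
    - 10.0.20.10-10.0.20.50
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: tor-a            # one BGPPeer block per worker node
  namespace: metallb-system
spec:
  myASN: 64512
  peerASN: 64513
  peerAddress: 10.0.0.1
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: nico-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - nico-vips
```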
For L2 mode, comment out the BGPPeer and BGPAdvertisement sections and uncomment the L2Advertisement section at the bottom of the file.
3h. Assign Service VIPs
Each NCX Core service that exposes a LoadBalancer needs a specific, stable IP from your MetalLB pool. Without explicit assignments, MetalLB picks IPs randomly on each install, which means your DHCP relay, DNS records, PXE config, and API hostname cannot be pre-configured and will break on redeploy.
Open helm-prereqs/values/ncx-core.yaml and update the VIP for each service:
All IPs must be within the IPAddressPool ranges you defined in values/metallb-config.yaml and must be unique across services.
- carbide-dhcp note: externalService.enabled: true must be set explicitly; it defaults to false in the chart.
- carbide-dns note: Use perPodAnnotations (a list) rather than annotations, because each replica gets its own VIP.
- carbide-api IP and DNS note: The carbide-api VIP must resolve in external DNS to the hostname you set in Step 3c.
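The chart key layout below is an assumption inferred from the notes above; only the MetalLB annotation name (metallb.universe.tf/loadBalancerIPs) is standard MetalLB. IPs are examples:

```yaml
# Illustrative excerpt; chart key layout is an assumption, IPs are examples.
carbide-dhcp:
  externalService:
    enabled: true                                     # must be set explicitly
    annotations:
      metallb.universe.tf/loadBalancerIPs: 10.0.20.11
carbide-dns:
  perPodAnnotations:                                  # one entry per replica VIP
    - metallb.universe.tf/loadBalancerIPs: 10.0.20.12
    - metallb.universe.tf/loadBalancerIPs: 10.0.20.13
```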
3i. (Optional) Set a Stable Site UUID
If you want a specific site UUID instead of the default placeholder, set the NCX_SITE_UUID environment variable:
This UUID is used as the Temporal namespace for the site and as the CLUSTER_ID passed to the site-agent. Once set and deployed, changing it requires redeploying the site-agent and re-registering the site.
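One way to generate a UUID for this purpose on Linux (record the value and reuse it on every redeploy):

```shell
#!/bin/sh
# Generate a random UUID from the kernel and export it for setup.sh.
NCX_SITE_UUID=$(cat /proc/sys/kernel/random/uuid)
export NCX_SITE_UUID
echo "NCX_SITE_UUID=$NCX_SITE_UUID"
```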
3j. Validate Configuration
Run the pre-flight check to catch issues before deployment:
The preflight.sh script is also run automatically at the start of every setup.sh invocation.
The preflight.sh script checks the following:
For air-gapped clusters, the per-node checks pull busybox:1.36 by default. If your cluster cannot reach Docker Hub, set PREFLIGHT_CHECK_IMAGE to a local mirror:
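For example, after mirroring busybox:1.36 into your internal registry (the mirror path below is an example):

```shell
#!/bin/sh
# Point the per-node preflight checks at a mirrored image instead of Docker Hub.
export PREFLIGHT_CHECK_IMAGE="registry.example.com/mirror/busybox:1.36"
echo "PREFLIGHT_CHECK_IMAGE=$PREFLIGHT_CHECK_IMAGE"
```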
Step 4 — Run the Setup Script
Run the setup.sh script as follows:
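Per the introduction, setup.sh lives in the helm-prereqs/ directory. A guarded invocation from the repository root:

```shell
#!/bin/sh
# Run setup.sh from the repository root; it lives in helm-prereqs/.
if [ -x helm-prereqs/setup.sh ]; then
  (cd helm-prereqs && ./setup.sh)
else
  echo "helm-prereqs/setup.sh not found; run this from the repository root"
fi
```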
The setup.sh script installs all prerequisites and NICo components in sequential phases:
The following components are deployed:
For manual phase-by-phase installation, re-running individual phases, or debugging failures, refer to the Reference Installation guide.
Step 5 — Verify the Site Controller
Before ingesting hosts, verify that all site controller components are healthy.
Check That All Pods Are Running
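One way to surface unhealthy pods in a single pass (prints nothing when everything is Running or Completed):

```shell
#!/bin/sh
# List pods that are neither Running nor Completed, across all namespaces.
# Requires kubectl access to the site cluster.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -A --no-headers | awk '$4 != "Running" && $4 != "Completed"'
else
  echo "kubectl not found; run this from your admin machine"
fi
```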
Verify That the Site-agent Is Connected
Look for the “successfully connected to server” message in the logs.
Verify That the LoadBalancer IPs Are Assigned
All LoadBalancer services should have an external IP from your IPAddressPool ranges. If any show <pending>, MetalLB has not assigned an IP. Check BGP session status on your TOR switches and verify values/metallb-config.yaml has correct peer addresses.
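A quick filter for services still waiting on an address (prints nothing when every LoadBalancer has an external IP):

```shell
#!/bin/sh
# Show LoadBalancer services whose EXTERNAL-IP is still <pending>.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get svc -A --no-headers | awk '$3 == "LoadBalancer" && $5 == "<pending>"'
else
  echo "kubectl not found; run this from your admin machine"
fi
```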
Verify That DHCP and PXE Are Serving
Both external IPs should be within your internal VIP pool range.
Acquire a Keycloak Access Token
This section only applies if keycloak.enabled: true in values/ncx-rest.yaml (the default). If you disabled the bundled Keycloak and pointed carbide-rest-api at your own IdP, obtain tokens from that IdP instead.
The setup.sh script deploys a dev Keycloak instance with a carbide realm pre-loaded with the ncx-service client (M2M / client_credentials).
Do not request the token through a port-forward to localhost. The resulting JWT iss claim will not match what carbide-rest-api expects, and the token will be rejected. Use the helper script, which runs curl from a throw-away in-cluster pod:
Verify the token against carbide-rest-api:
Set up carbidecli and Create your First Site
NICo has two CLIs that serve different purposes:
carbidecli is built from the NCX REST repo. carbide-admin-cli is built from the NCX Core repo (crates/admin-cli).
1. Build and Install the CLI
2. Generate the Default Config File
3. Port-forward carbide-rest-api to localhost
4. Edit ~/.carbide/config.yaml
5. Bootstrap the Org (Required One-Time Call)
This GET endpoint lazily initializes the org on first call as follows:
- Checks if service account is enabled in the auth config
- Creates an InfrastructureProvider for the org if one doesn’t exist
- Creates a Tenant with targeted instance creation enabled if one doesn’t exist
- Creates a TenantAccount linking the provider and tenant if one doesn’t exist
- Returns the service account status with the provider and tenant IDs
Without this call, site create returns 404. Subsequent calls are read-only.
6. Create your First Site
Overall Health Check
Run the following commands to verify that all components are healthy:
For troubleshooting common issues, refer to the Reference Installation — Troubleshooting guide.
Step 6 — Connect the OOB Network
Configure the out-of-band network to relay BMC DHCP requests to the NICo DHCP service.
- Configure the DHCP relay on your OOB switches to forward DHCP requests to the carbide-dhcp LoadBalancer VIP (assigned in Step 3h).
- Verify DHCP requests are reaching NICo by checking the DHCP service logs:
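A sketch of the log check; the namespace and label selector below are assumptions, so adjust them to your deployment. Look for DISCOVER/OFFER exchanges from your BMC MAC addresses:

```shell
#!/bin/sh
# Tail the DHCP service logs. Namespace and label are assumptions.
NS="${NICO_NAMESPACE:-carbide}"
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" logs -l app=carbide-dhcp --tail=100 \
    || echo "no cluster access; run this from your admin machine"
else
  echo "kubectl not found; run this from your admin machine"
fi
```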
For detailed OOB network requirements, refer to the BMC and Out-of-Band Setup guide.
Step 7 — Discover Your First Host
This step uses carbide-admin-cli, the gRPC CLI for NICo Core. Build it from the NCX Core repo:
Alternatively, use the containerized version bundled in the carbide-api pod (available at /opt/carbide/forge-admin-cli inside the container).
The <api-url> in the commands below is the NICo Core gRPC API endpoint. This is the carbide-api hostname configured in Step 3c, not the REST API used in Step 5. The format is typically https://api-<ENVIRONMENT_NAME>.<SITE_DOMAIN_NAME>. You can also retrieve it from the LoadBalancer VIP:
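A sketch of reading the VIP from the Service object; the service name and namespace here are assumptions, so adjust them to your deployment:

```shell
#!/bin/sh
# Print the carbide-api LoadBalancer external IP.
# Service name and namespace are assumptions.
NS="${NICO_NAMESPACE:-carbide}"
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" get svc carbide-api \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' \
    || echo "no cluster access; run this from your admin machine"
else
  echo "kubectl not found; run this from your admin machine"
fi
```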
Set Site-wide Credentials
Configure the credentials NICo will apply to BMCs and UEFI after ingestion:
Upload the Expected Machines Manifest
Prepare an expected_machines.json with the BMC MAC address, factory default credentials, and chassis serial number for each host:
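A hedged sketch of the manifest. The three data points match the list above (BMC MAC, factory credentials, chassis serial), but the exact field names are assumptions; match the schema carbide-admin-cli actually expects:

```json
[
  {
    "bmc_mac_address": "aa:bb:cc:dd:ee:01",
    "bmc_username": "root",
    "bmc_password": "factory-default-password",
    "chassis_serial_number": "MT2324XZ0042"
  }
]
```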
Upload the manifest:
Approve the Host for Ingestion
NICo uses Measured Boot with TPM v2.0 to enforce cryptographic identity:
NICo will now discover the host via Redfish, pair it with its DPU(s), provision the DPU, and bring the host to a ready state. For more details, refer to the Ingesting Hosts guide.
Monitor Host Discovery
Teardown
To perform teardown, run the following command:
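Assuming clean.sh sits alongside setup.sh in helm-prereqs/ (the script name is documented; its location is an assumption), a guarded invocation from the repository root:

```shell
#!/bin/sh
# Tear down the deployment; clean.sh is assumed to live in helm-prereqs/.
if [ -x helm-prereqs/clean.sh ]; then
  (cd helm-prereqs && ./clean.sh)
else
  echo "helm-prereqs/clean.sh not found; run this from the repository root"
fi
```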
This removes NCX REST, NCX Core, all helmfile releases, cluster-scoped resources, namespaces, and released PersistentVolumes. For details on what clean.sh does and the removal order, refer to the Reference Installation guide.