Single-cluster Local Development with the CLI
Install a complete NVCF self-hosted control plane and compute plane on a
single local k3d cluster using nvcf-cli. This is useful for validating the
install and registration workflow before targeting real infrastructure.
This setup is for local development only. It uses fake GPUs, a single Cassandra replica, and ephemeral storage. Do not use this for production workloads.
Install the following tools:
Docker (running)
k3d v5.x or later
kubectl
helm >= 3.12
An NGC API key from ngc.nvidia.com with access to the NVCF chart and image registry.
The NGC organization and team slugs that hold the chart and image
repository you have access to. make build-and-deploy-cluster reads
these from SAMPLE_NGC_ORG / SAMPLE_NGC_TEAM during its credential
provider validation step; without them, the build target fails and
skips its final gateway-API setup.
nvcf-cli built from this repo:
Export the env vars used by the cluster bootstrap and the install steps:
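A minimal sketch of those exports, with a small guard that fails early when a value is missing. SAMPLE_NGC_ORG and SAMPLE_NGC_TEAM are the names mentioned above. NGC_API_KEY is an assumed variable name, the values are placeholders, and require_env is a hypothetical helper.

```shell
# SAMPLE_NGC_ORG and SAMPLE_NGC_TEAM are the names the bootstrap reads.
# NGC_API_KEY is an assumed variable name; all values below are placeholders.
export SAMPLE_NGC_ORG="my-org"          # replace with your NGC organization slug
export SAMPLE_NGC_TEAM="my-team"        # replace with your NGC team slug
export NGC_API_KEY="${NGC_API_KEY:-}"   # supply the real key from your shell, not from a tracked file

# Hypothetical helper: return non-zero and name every variable in "$@"
# that is unset or empty.
require_env() {
  _missing=0
  for _name in "$@"; do
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing required env var: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Usage: require_env SAMPLE_NGC_ORG SAMPLE_NGC_TEAM NGC_API_KEY
```

Running the guard before the bootstrap surfaces a missing slug immediately instead of partway through the build target.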
The canonical single-cluster topology lives in tools/ncp-local-cluster/.
The build-and-deploy-cluster target creates a k3d cluster named ncp-local,
installs the fake GPU operator, the CSI SMB driver, and Envoy Gateway, and
validates the bootstrap end-to-end.
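The bootstrap can be sketched as a single wrapper, assuming the Makefile that defines build-and-deploy-cluster sits in tools/ncp-local-cluster/ and that the command is run from the repository root. The function name bootstrap_single_cluster is hypothetical, and the follow-up check uses plain `k3d cluster list`.

```shell
# Assumption: the Makefile with the build-and-deploy-cluster target lives in
# tools/ncp-local-cluster/ and this is run from the repository root.
bootstrap_single_cluster() {
  make -C tools/ncp-local-cluster build-and-deploy-cluster || return 1

  # Confirm that a k3d cluster named ncp-local now exists.
  # The first line of `k3d cluster list` is a header, so skip it.
  if k3d cluster list | awk 'NR > 1 { print $1 }' | grep -qx 'ncp-local'; then
    echo "k3d cluster ncp-local is present"
  else
    echo "k3d cluster ncp-local not found" >&2
    return 1
  fi
}

# Usage: bootstrap_single_cluster
```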
The single-cluster (ncp-local) and multi-cluster
(ncp-local-cp + ncp-local-compute-N) topologies both claim host
ports 8080/8443/4222 and cannot coexist. If you already have the
multi-cluster topology running, destroy it first:
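If the repository's own teardown command is not at hand, one way to find and remove the conflicting clusters is plain k3d. This is a swapped-in approach, not necessarily the project's teardown target. It assumes the multi-cluster names follow the ncp-local-cp and ncp-local-compute-N pattern described above, and both function names are hypothetical.

```shell
# Print the names of k3d clusters that belong to the multi-cluster topology.
# The single-cluster name (ncp-local) is deliberately not matched.
list_multi_cluster() {
  k3d cluster list \
    | awk 'NR > 1 && ($1 == "ncp-local-cp" || $1 ~ /^ncp-local-compute-/) { print $1 }'
}

# Delete every cluster reported by list_multi_cluster, stopping on the first error.
destroy_multi_cluster() {
  for _cluster in $(list_multi_cluster); do
    echo "deleting k3d cluster: $_cluster"
    k3d cluster delete "$_cluster" || return 1
  done
}

# Usage: list_multi_cluster   # inspect first
#        destroy_multi_cluster
```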
build-and-deploy-cluster runs setup-gateway-api, check-gateway-api,
and validate-gateway as its final steps. If any earlier step fails (for
example, credential provider validation when SAMPLE_NGC_ORG /
SAMPLE_NGC_TEAM are not set), gateway setup is skipped. After fixing
the underlying issue, re-run just the gateway-API setup:
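A sketch of that re-run, invoking the three targets named above one at a time so that a failure stops the sequence. It assumes the targets are defined in the Makefile under tools/ncp-local-cluster/ and can be run individually. The wrapper name rerun_gateway_setup is hypothetical.

```shell
# Assumption: the three gateway targets live in tools/ncp-local-cluster/Makefile
# and can be invoked on their own after the earlier failure has been fixed.
rerun_gateway_setup() {
  for _target in setup-gateway-api check-gateway-api validate-gateway; do
    echo "running make target: $_target"
    make -C tools/ncp-local-cluster "$_target" || return 1
  done
}

# Usage: rerun_gateway_setup
```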
nvcf-cli self-hosted install --env local reads NGC credentials from
deploy/stacks/self-managed/secrets/local-secrets.yaml. Author it from the
canonical template:
Generate the base64 NGC dockerconfig credential and substitute it into the file:
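One way to produce that value is sketched below. It builds a Docker config JSON for nvcr.io, where the username is the literal string `$oauthtoken` and the password is the NGC API key, then base64-encodes the result. The function name ngc_dockerconfig_b64 is hypothetical, and the field in local-secrets.yaml that receives the value is defined by the template, so it is not shown here.

```shell
# Print a base64-encoded Docker config JSON for nvcr.io.
# $1: NGC API key (assumed to contain no characters that need JSON escaping).
ngc_dockerconfig_b64() {
  _key="$1"
  # Docker's "auth" field is base64("username:password").
  _auth=$(printf '%s:%s' '$oauthtoken' "$_key" | base64 | tr -d '\n')
  printf '{"auths":{"nvcr.io":{"username":"$oauthtoken","password":"%s","auth":"%s"}}}' \
    "$_key" "$_auth" | base64 | tr -d '\n'
}

# Usage (NGC_API_KEY is an assumed variable name):
#   ngc_dockerconfig_b64 "$NGC_API_KEY"
```

The `tr -d '\n'` calls strip the line wrapping that some base64 implementations add, so the output is a single line that can be pasted into a YAML value.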
local-secrets.yaml is gitignored, so the NGC key it contains stays out of version control. Do not copy the key into any tracked file.
nvcf-cli self-hosted install renders helmfile manifests that reference
imagePullSecrets: [{name: nvcr-pull-secret}]. Create the secret in each
NVCF namespace before running install so pods can pull images from nvcr.io.
The loop is idempotent (uses kubectl apply):
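A sketch of such a loop follows. The namespace list is passed in as arguments because the actual NVCF namespace names are not listed here. The function name create_pull_secrets, the namespace names in the usage line, and the NGC_API_KEY variable are hypothetical, and creating the namespace first is an assumption for clusters where it does not exist yet.

```shell
# $1: NGC API key; remaining arguments: namespaces that need the pull secret.
create_pull_secrets() {
  _ngc_key="$1"
  shift
  for _ns in "$@"; do
    # Assumption: the namespace may not exist before install, so create it idempotently.
    kubectl create namespace "$_ns" --dry-run=client -o yaml \
      | kubectl apply -f - || return 1

    # Render the secret client-side, then apply it so re-runs update in place.
    kubectl create secret docker-registry nvcr-pull-secret \
      --namespace "$_ns" \
      --docker-server=nvcr.io \
      --docker-username='$oauthtoken' \
      --docker-password="$_ngc_key" \
      --dry-run=client -o yaml \
      | kubectl apply -f - || return 1
  done
}

# Usage (namespace names are placeholders):
#   create_pull_secrets "$NGC_API_KEY" example-ns-a example-ns-b
```

Piping `--dry-run=client -o yaml` into `kubectl apply` is what makes the loop safe to re-run: a plain `kubectl create` would fail on the second invocation because the secret already exists.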
--token DUMMY is a gate-bypass, not a real credential. The install command’s
check-cp phase normally requires a JWT, but the api-keys service that mints
that JWT does not exist yet on the first invocation. Pass --token DUMMY to
skip the gate; the install path itself never reads the token.
When this completes, a control-plane profile is written to
deploy/stacks/self-managed/out/control-plane-profile.yaml.
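The install step and a follow-up check can be sketched together. The command line uses only the flags mentioned above (`--env local` and `--token DUMMY`); the wrapper name install_control_plane is hypothetical, and the relative path assumes the command is run from the repository root.

```shell
# Run the first-time install, then confirm the control-plane profile was written.
install_control_plane() {
  _profile="deploy/stacks/self-managed/out/control-plane-profile.yaml"

  # DUMMY only bypasses the check-cp gate; it is not a real credential.
  nvcf-cli self-hosted install --env local --token DUMMY || return 1

  if [ -s "$_profile" ]; then
    echo "control-plane profile written: $_profile"
  else
    echo "expected profile not found: $_profile" >&2
    return 1
  fi
}

# Usage: install_control_plane
```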
Now that the api-keys service is reachable, nvcf-cli init can mint a real
admin JWT:
The token is written to ~/.nvcf-cli.nvcf-cli-local.state. Subsequent commands
read it from there; the token never appears in argv or per-command logs.
In single-cluster topology, compute and control plane share the same k3d
cluster (ncp-local).
This emits out/ncp-local-register-values.yaml. Because compute and control
plane share a cluster, the in-cluster service URLs (for example
http://api.sis.svc.cluster.local:8080) are directly reachable and are
selected automatically.
Wait for the NVCA backend to become healthy:
Confirm the control-plane API is reachable:
The control-plane profile can be re-validated against the live cluster:
Remove the helm releases but keep the cluster (stack-only):
Or destroy the whole cluster:
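If the repository's own teardown command is unavailable, the cluster can also be removed with plain k3d. This is a swapped-in sketch rather than the project's teardown target, and the function name destroy_single_cluster is hypothetical.

```shell
# Delete the ncp-local k3d cluster if it exists; do nothing otherwise.
destroy_single_cluster() {
  if k3d cluster list | awk 'NR > 1 { print $1 }' | grep -qx 'ncp-local'; then
    k3d cluster delete ncp-local
  else
    echo "k3d cluster ncp-local not found, nothing to delete"
  fi
}

# Usage: destroy_single_cluster
```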