Single-cluster Local Development with the CLI
Install a complete NVCF self-hosted control plane and compute plane on a
single local k3d cluster using nvcf-cli. Useful for validating the install
and registration workflow before targeting real infrastructure.
This setup is for local development only. It uses fake GPUs, a single Cassandra replica, and ephemeral storage. Do not use this for production workloads.
Prerequisites
Install the following tools:
- Docker (running)
- k3d v5.x or later
- kubectl
- helm >= 3.12
- An NGC API key from ngc.nvidia.com with access to the NVCF chart and image registry.
- The NGC organization and team slugs that hold the chart and image repository you have access to. make build-and-deploy-cluster reads these from SAMPLE_NGC_ORG / SAMPLE_NGC_TEAM during its credential provider validation step; without them, the build target fails and skips its final gateway-API setup.
- nvcf-cli built from this repo:
Export the env vars used by the cluster bootstrap and the install steps:
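A minimal sketch of that export step follows. SAMPLE_NGC_ORG and SAMPLE_NGC_TEAM are the names the bootstrap reads; NGC_API_KEY is an assumed name for the API key variable, the slug values are placeholders, and check_env is a hypothetical pre-flight helper rather than part of nvcf-cli.

```shell
# Placeholder slugs: replace with your own NGC organization and team.
export SAMPLE_NGC_ORG="my-org"
export SAMPLE_NGC_TEAM="my-team"
# NGC_API_KEY is an assumed variable name; set it in your shell, not in a file.
export NGC_API_KEY="${NGC_API_KEY:-}"

# Hypothetical helper: report every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
check_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_env SAMPLE_NGC_ORG SAMPLE_NGC_TEAM NGC_API_KEY before the bootstrap surfaces a missing value early, instead of partway through the build target.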
Step 1: Bring up the local k3d cluster
The canonical single-cluster topology lives in tools/ncp-local-cluster/.
This creates a k3d cluster named ncp-local, installs the fake GPU operator,
the CSI SMB driver, Envoy Gateway, and validates the bootstrap end-to-end.
The single-cluster (ncp-local) and multi-cluster
(ncp-local-cp + ncp-local-compute-N) topologies both claim host
ports 8080/8443/4222 and cannot coexist. The multi-cluster control plane also
claims host ports 9090 and 10081 for worker-facing API gRPC and the stack-owned
grpc-proxy TCP listener. If you already have the
multi-cluster topology running, destroy it first:
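As a hedged sketch, the generic k3d commands below detect and delete clusters that match the multi-cluster names. The repository may ship its own teardown target, which would be preferable; the helper names are hypothetical, and the pattern assumes N in ncp-local-compute-N is numeric.

```shell
# List k3d clusters that belong to the multi-cluster topology
# (ncp-local-cp, ncp-local-compute-N), one name per line.
multi_cluster_names() {
  k3d cluster list --no-headers 2>/dev/null \
    | awk '{print $1}' \
    | grep -E '^ncp-local-(cp|compute-[0-9]+)$' || true
}

# Delete each conflicting cluster with the generic k3d delete command.
# The single-cluster ncp-local cluster does not match the pattern.
destroy_multi_cluster() {
  for name in $(multi_cluster_names); do
    echo "deleting k3d cluster: $name"
    k3d cluster delete "$name"
  done
}
```

Calling multi_cluster_names first shows what would be removed; destroy_multi_cluster then frees host ports 8080/8443/4222 for the single-cluster bootstrap.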
build-and-deploy-cluster runs setup-gateway-api, check-gateway-api,
and validate-gateway as its final steps. If any earlier step fails (for
example, credential provider validation when SAMPLE_NGC_ORG /
SAMPLE_NGC_TEAM are not set), gateway setup is skipped. After fixing
the underlying issue, re-run just the gateway-API setup:
Step 2: Author the local secrets file
nvcf-cli self-hosted install --env local reads NGC credentials from the
control-plane stack:
deploy/stacks/self-managed/secrets/local-secrets.yaml (control plane)
Author the file from its canonical template:
Generate the base64 NGC dockerconfig credential and substitute it into the file:
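One way to produce such a credential is sketched below. It relies on the standard Docker config JSON layout and on the NGC convention of using the literal username $oauthtoken with the API key as the password. The function name is hypothetical, and whether the template expects the whole encoded JSON or only its auth field is an assumption to check against the template itself.

```shell
# Build a Docker config JSON for nvcr.io and base64-encode it on one line.
# Assumes the key contains no characters that need JSON escaping.
ngc_dockerconfig_b64() {
  key="$1"
  # auth is base64("$oauthtoken:<key>"); the username is a literal string.
  auth="$(printf '%s' "\$oauthtoken:${key}" | base64 | tr -d '\n')"
  printf '{"auths":{"nvcr.io":{"username":"$oauthtoken","password":"%s","auth":"%s"}}}' \
    "$key" "$auth" | base64 | tr -d '\n'
}
```

For example, ngc_dockerconfig_b64 "$NGC_API_KEY" prints the encoded value (NGC_API_KEY being an assumed variable name). Piping through tr -d '\n' removes the line wrapping that some base64 implementations add.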
local-secrets.yaml is gitignored, so it is never committed. Keep your NGC key out of any tracked file.
Step 3: Create the image pull secrets
nvcf-cli self-hosted install renders helmfile manifests that reference
imagePullSecrets: [{name: nvcr-pull-secret}]. Create the secret in each
NVCF namespace before running install so pods can pull images from nvcr.io.
The loop is idempotent (uses kubectl apply):
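A sketch of such a loop, using only standard kubectl commands, is shown below. The secret name nvcr-pull-secret and registry nvcr.io come from the text above; the helper name, the NGC_API_KEY variable, and the namespace names in the commented loop are hypothetical placeholders.

```shell
# Create or update nvcr-pull-secret in one namespace. Rendering with
# --dry-run=client and piping to kubectl apply keeps re-runs idempotent.
apply_pull_secret() {
  ns="$1"
  key="$2"
  # Ensure the namespace exists without failing if it already does.
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  kubectl create secret docker-registry nvcr-pull-secret \
    --namespace "$ns" \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password="$key" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Hypothetical namespace list: substitute the namespaces your install uses.
# for ns in nvcf-example-a nvcf-example-b; do
#   apply_pull_secret "$ns" "$NGC_API_KEY"
# done
```

Because both objects go through kubectl apply, running the loop a second time updates the existing secret rather than failing with an AlreadyExists error.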
Step 4: Install the control plane
--token DUMMY is a gate-bypass, not a real credential. The install command’s
check-cp phase normally requires a JWT, but the api-keys service that mints
that JWT does not exist yet on the first invocation. Pass --token DUMMY to
skip the gate; the install path itself never reads the token.
When this completes, a control-plane profile is written to
deploy/stacks/self-managed/out/control-plane-profile.yaml.
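A quick way to confirm the install reached that point is to check for the profile file. This is a sketch meant to be run from the repository root; profile_written is a hypothetical helper, not an nvcf-cli command.

```shell
# Succeeds only if the control-plane profile exists and is non-empty.
# An optional first argument overrides the default path.
profile_written() {
  profile="${1:-deploy/stacks/self-managed/out/control-plane-profile.yaml}"
  [ -s "$profile" ]
}
```

For example, profile_written || echo "control-plane profile missing" reports an install that did not complete.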
Step 5: Mint the admin JWT
Now that the api-keys service is reachable, nvcf-cli init can mint a real
admin JWT:
The token is written to ~/.nvcf-cli.nvcf-cli-local.state. Subsequent commands
read it from there; the token never appears in argv or per-command logs.
Step 6: Register the compute plane
In single-cluster topology, compute and control plane share the same k3d
cluster (ncp-local).
This emits out/ncp-local-register-values.yaml. Because compute and control
plane share a cluster, the in-cluster service URLs (for example
http://api.sis.svc.cluster.local:8080) are directly reachable and are
selected automatically.
Step 7: Install the compute plane
Step 8: Verify
Wait for the NVCA backend to become healthy:
Confirm the control-plane API is reachable:
Optional: Validate the profile
The control-plane profile can be re-validated against the live cluster:
Teardown
Remove the helm releases but keep the cluster (stack-only):
Or destroy the whole cluster: