Single-cluster (CLI) | NVIDIA Cloud Functions

Install a complete NVCF self-hosted control plane and compute plane on a single local k3d cluster using nvcf-cli. Useful for validating the install and registration workflow before targeting real infrastructure.

This setup is for local development only. It uses fake GPUs, a single Cassandra replica, and ephemeral storage. Do not use this for production workloads.

Clone the public repository and run the remaining commands from its root:

$ git clone https://github.com/nvidia/nvcf.git
$ cd nvcf

Prerequisites

Install the following tools:

Docker (running)
k3d v5.x or later
kubectl
helm >= 3.12
An NGC API key from ngc.nvidia.com with access to the NVCF chart and image registry.
The NGC organization and team slugs that hold the chart and image repository you have access to. make build-and-deploy-cluster reads these from SAMPLE_NGC_ORG / SAMPLE_NGC_TEAM during its credential provider validation step; without them, the build target fails and skips its final gateway-API setup.

nvcf-cli built from this repo:

$ go build -C src/clis/nvcf-cli -o ../../../nvcf-cli .

Export the env vars used by the cluster bootstrap and the install steps:

$ export NGC_API_KEY="<your-ngc-api-key>"
$ export SAMPLE_NGC_ORG="<your-ngc-org>"
$ export SAMPLE_NGC_TEAM="<your-ngc-team>"

Create the Docker credential file that k3d mounts into the local nodes:

$ mkdir -p tools/ncp-local-cluster/secrets
$ printf '{"auths":{"nvcr.io":{"auth":"%s"}}}\n' "$(printf '%s' "\$oauthtoken:${NGC_API_KEY}" | base64 | tr -d '\n')" > tools/ncp-local-cluster/secrets/docker-config.json

docker-config.json contains your NGC credential and is gitignored. Do not commit or share this file.

Step 1: Bring up the local k3d cluster

The canonical single-cluster topology lives in tools/ncp-local-cluster/.

$ make -C tools/ncp-local-cluster build-and-deploy-cluster

This creates a k3d cluster named ncp-local, installs the fake GPU operator, the CSI SMB driver, Envoy Gateway, and validates the bootstrap end-to-end.

The single-cluster (ncp-local) and multi-cluster (ncp-local-cp + ncp-local-compute-N) topologies both claim host ports 8080/8443/4222 and cannot coexist. The multi-cluster control plane also claims host ports 9090 and 10081 for worker-facing API gRPC and the stack-owned grpc-proxy TCP listener. If you already have the multi-cluster topology running, destroy it first:

$ make -C tools/ncp-local-cluster destroy-multicluster

build-and-deploy-cluster runs setup-gateway-api, check-gateway-api, and validate-gateway as its final steps. If any earlier step fails (for example, credential provider validation when SAMPLE_NGC_ORG / SAMPLE_NGC_TEAM are not set), gateway setup is skipped. After fixing the underlying issue, re-run just the gateway-API setup:

$ make -C tools/ncp-local-cluster setup-gateway-api
$ make -C tools/ncp-local-cluster check-gateway-api

Step 2: Author the Helmfile environment files

The CLI passes --env local-bdd to Helmfile when it renders each stack. Create the environment files from the single-cluster fixtures so Helmfile uses the NGC organization and team that you can access:

$ cp tests/bdd/fixtures/self-managed-local-bdd.yaml \
>    deploy/stacks/self-managed/environments/local-bdd.yaml
$ cp tests/bdd/fixtures/nvcf-compute-plane-local-bdd.yaml \
>    deploy/stacks/nvcf-compute-plane/environments/local-bdd.yaml
$ 
$ for file in \
>   deploy/stacks/self-managed/environments/local-bdd.yaml \
>   deploy/stacks/nvcf-compute-plane/environments/local-bdd.yaml; do
$   sed -i.bak \
>     -e "s|REPLACE_WITH_SAMPLE_NGC_ORG|${SAMPLE_NGC_ORG}|g" \
>     -e "s|REPLACE_WITH_SAMPLE_NGC_TEAM|${SAMPLE_NGC_TEAM}|g" \
>     "$file"
$   rm "${file}.bak"
$ done

Step 3: Author the local secrets file

nvcf-cli self-hosted install --env local-bdd reads NGC credentials from the control-plane stack:

deploy/stacks/self-managed/secrets/local-bdd-secrets.yaml (control plane)

Author the file from its canonical template:

$ cp deploy/stacks/self-managed/secrets/secrets.yaml.template \
>    deploy/stacks/self-managed/secrets/local-bdd-secrets.yaml

Generate the base64 NGC dockerconfig credential and substitute it into the file:

$ BASE64_CRED=$(printf '%s' "\$oauthtoken:${NGC_API_KEY}" | base64 | tr -d '\n')
$ sed -i.bak "s|REPLACE_WITH_BASE64_DOCKER_CREDENTIAL|${BASE64_CRED}|g" \
>   deploy/stacks/self-managed/secrets/local-bdd-secrets.yaml
$ rm deploy/stacks/self-managed/secrets/local-bdd-secrets.yaml.bak

local-bdd-secrets.yaml is gitignored. Keep your NGC key out of the working tree.

Step 4: Create the image pull secrets

nvcf-cli self-hosted install renders helmfile manifests that reference imagePullSecrets: [{name: nvcr-pull-secret}]. Create the secret in each NVCF namespace before running install so pods can pull images from nvcr.io. The loop is idempotent (uses kubectl apply):

$ for ns in cassandra-system nats-system nvcf api-keys ess sis \
>           vault-system nvca-operator nvca-system nvcf-backend cert-manager; do
$   kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
$   kubectl create secret docker-registry nvcr-pull-secret \
>     --docker-server=nvcr.io \
>     --docker-username='$oauthtoken' \
>     --docker-password="${NGC_API_KEY}" \
>     --namespace="$ns" \
>     --dry-run=client -o yaml | kubectl apply -f -
$ done

Step 5: Install the control plane

$ ./nvcf-cli \
>   --config tests/bdd/fixtures/nvcf-cli-local.yaml \
>   self-hosted \
>     --control-plane-stack deploy/stacks/self-managed \
>     --compute-plane-stack deploy/stacks/nvcf-compute-plane \
>     --env local-bdd \
>     --plain \
>     --token DUMMY \
>   install --control-plane \
>     --cluster-name ncp-local \
>     --region us-west-1 \
>     --nca-id nvcf-default

--token DUMMY is a gate-bypass, not a real credential. The install command’s check-cp phase normally requires a JWT, but the api-keys service that mints that JWT does not exist yet on the first invocation. Pass --token DUMMY to skip the gate; the install path itself never reads the token.

When this completes, a control-plane profile is written to deploy/stacks/self-managed/out/control-plane-profile.yaml.

Step 6: Mint the admin JWT

Now that the api-keys service is reachable, nvcf-cli init can mint a real admin JWT:

$ ./nvcf-cli \
>   --config tests/bdd/fixtures/nvcf-cli-local.yaml \
>   init

The token is written to ~/.nvcf-cli.nvcf-cli-local.state. Subsequent commands read it from there; the token never appears in argv or per-command logs.

Step 7: Register the compute plane

In single-cluster topology, compute and control plane share the same k3d cluster (ncp-local).

$ ./nvcf-cli \
>   --config tests/bdd/fixtures/nvcf-cli-local.yaml \
>   self-hosted \
>     --control-plane-stack deploy/stacks/self-managed \
>     --compute-plane-stack deploy/stacks/nvcf-compute-plane \
>     --env local-bdd \
>     --plain \
>   compute-plane register \
>     --control-plane-profile deploy/stacks/self-managed/out/control-plane-profile.yaml \
>     --cluster-name ncp-local \
>     --kube-context k3d-ncp-local \
>     --region us-west-1 \
>     --output deploy/stacks/nvcf-compute-plane/out/ncp-local-register-values.yaml

This emits out/ncp-local-register-values.yaml. Because compute and control plane share a cluster, the in-cluster service URLs (for example http://api.sis.svc.cluster.local:8080) are directly reachable and are selected automatically.

Step 8: Install the compute plane

$ ./nvcf-cli \
>   --config tests/bdd/fixtures/nvcf-cli-local.yaml \
>   self-hosted \
>     --control-plane-stack deploy/stacks/self-managed \
>     --compute-plane-stack deploy/stacks/nvcf-compute-plane \
>     --env local-bdd \
>     --plain \
>   compute-plane install \
>     --values deploy/stacks/nvcf-compute-plane/out/ncp-local-register-values.yaml \
>     --kube-context k3d-ncp-local \
>     --cluster-name ncp-local

Step 9: Verify

Wait for the NVCA backend to become healthy:

$ kubectl wait nvcfbackend ncp-local \
>   -n nvca-operator \
>   --for=jsonpath='{.status.agentStatus}'=healthy \
>   --timeout=10m

Confirm the control-plane API is reachable:

$ export NVCF_TOKEN=$(curl -s -X POST "http://api-keys.localhost:8080/v1/admin/keys" \
>   | python3 -c "import sys,json; print(json.load(sys.stdin)['value'])")
$ 
$ curl -s "http://api.localhost:8080/v2/nvcf/functions" \
>   -H "Authorization: Bearer ${NVCF_TOKEN}" | python3 -m json.tool
$ # Expected: {"functions": []}

Optional: Validate the profile

The control-plane profile can be re-validated against the live cluster:

$ ./nvcf-cli \
>   --config tests/bdd/fixtures/nvcf-cli-local.yaml \
>   self-hosted \
>     --control-plane-stack deploy/stacks/self-managed \
>     --compute-plane-stack deploy/stacks/nvcf-compute-plane \
>     --env local-bdd \
>     --plain \
>   control-plane profile validate \
>     --file deploy/stacks/self-managed/out/control-plane-profile.yaml \
>     --require in-cluster

If the generated profile is deleted, recreate it from the control-plane stack and OpenBao root CA without re-running the full install:

$ nvcf-cli \
>   --config tests/bdd/fixtures/nvcf-cli-local.yaml \
>   self-hosted \
>     --control-plane-stack deploy/stacks/self-managed \
>     --env local \
>     --plain \
>   control-plane profile export \
>     --cluster-name ncp-local \
>     --nca-id nvcf-default \
>     --region us-west-1

Teardown

Remove the helm releases but keep the cluster (stack-only):

$ tests/bdd/scripts/destroy-stack.sh single

Or destroy the whole cluster:

$ make -C tools/ncp-local-cluster destroy