Phase 2: Core Services#
This phase installs the NVCF control plane services. These services depend on the infrastructure components installed in Phase 1: Infrastructure Dependencies.
Important
All three infrastructure dependencies (NATS, OpenBao, Cassandra) must be running and healthy before proceeding. Verify with:
kubectl get pods -n nats-system
kubectl get pods -n vault-system
kubectl get pods -n cassandra-system
Install the services in the order shown below. Services with dependencies are noted — wait for the dependency to be healthy before installing the dependent service.
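The health gate above can be scripted. A minimal sketch (the `all_running` helper is an illustration, not part of any chart) that checks a `kubectl get pods` listing for pods in any state other than Running:

```shell
# all_running: read `kubectl get pods` output on stdin and fail if any
# pod reports a STATUS other than Running. Assumes the default kubectl
# table layout, where STATUS is the third column.
all_running() {
  awk 'NR > 1 && $3 != "Running" { bad = 1 } END { exit bad }'
}

# Example against a live cluster:
# kubectl get pods -n nats-system | all_running && echo "nats-system healthy"
```

Note that pods in Completed state (finished Jobs) would also fail this check, so scope it to namespaces that run only long-lived services.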
API Keys#
API Keys provides authentication token management for all NVCF API interactions.
Chart: helm-nvcf-api-keys
Version: 1.0.4
Namespace: api-keys
Depends on: Infrastructure only
Configuration#
Create api-keys-values.yaml (download template):
api-keys-values.yaml
# API Keys values for standalone installation
# Replace <REGISTRY> and <REPOSITORY> with your container registry settings.
apikeys:
  fullnameOverride: api-keys
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/nv-api-keys"
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Replace <REGISTRY> and <REPOSITORY> with your registry settings.
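Hand-editing each values file is error-prone; one way to render the placeholders with `sed` (the registry and repository values below are examples, and the inline sample input stands in for the values template):

```shell
# Substitute the <REGISTRY>/<REPOSITORY> placeholders. The REGISTRY and
# REPOSITORY values here are illustrative, not real endpoints.
REGISTRY="registry.example.com"
REPOSITORY="nvcf/images"
printf 'registry: "<REGISTRY>"\nrepository: "<REPOSITORY>/nv-api-keys"\n' |
  sed -e "s|<REGISTRY>|${REGISTRY}|g" \
      -e "s|<REPOSITORY>|${REPOSITORY}|g"
# Output:
# registry: "registry.example.com"
# repository: "nvcf/images/nv-api-keys"
```

In practice, point `sed` at the downloaded values template and redirect the result to the final values file.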
Install#
helm upgrade --install api-keys \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-api-keys \
--version 1.0.4 \
--namespace api-keys \
--wait --timeout 10m \
-f api-keys-values.yaml
Verify#
kubectl get pods -n api-keys
# Expected: api-keys pod Running
SIS#
The Spot Instance Service (SIS) handles cluster registration and GPU resource management.
Chart: helm-nvcf-sis
Version: 1.3.0
Namespace: sis
Depends on: Infrastructure only
Configuration#
Create sis-values.yaml (download template):
sis-values.yaml
# SIS (Spot Instance Service) values for standalone installation
# Replace <REGISTRY> and <REPOSITORY> with your container registry settings.
sis:
  fullnameOverride: spot-instance-service
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/spot"
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Install#
helm upgrade --install sis \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-sis \
--version 1.3.0 \
--namespace sis \
--wait --timeout 10m \
-f sis-values.yaml
Verify#
kubectl get pods -n sis
# Expected: spot-instance-service pod Running
ESS API#
The ESS (Enterprise Secrets Service) API distributes secrets to NVCF services via OpenBao.
Chart: helm-nvcf-ess-api
Version: 1.2.1
Namespace: ess
Depends on: Infrastructure only
Configuration#
Create ess-api-values.yaml (download template):
ess-api-values.yaml
# ESS API values for standalone installation
# Replace <REGISTRY> and <REPOSITORY> with your container registry settings.
ess:
  fullnameOverride: ess-api
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/ess-api"
Install#
helm upgrade --install ess-api \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-ess-api \
--version 1.2.1 \
--namespace ess \
--wait --timeout 10m \
-f ess-api-values.yaml
Verify#
kubectl get pods -n ess
# Expected: ess-api pod Running
NVCF API#
The NVCF API is the primary control plane service. It manages functions, deployments, and account configuration. The API chart includes an account bootstrap job that runs on first install to initialize the NVCF account with registry credentials.
Chart: helm-nvcf-api
Version: 1.8.0
Namespace: nvcf
Depends on: ESS API (must be running)
Important
The ESS API must be running before installing the NVCF API. The account bootstrap job communicates with ESS during initialization.
Configuration#
Create nvcf-api-values.yaml (download template):
nvcf-api-values.yaml
# NVCF API values for standalone installation
# Replace <REGISTRY>, <REPOSITORY>, and credential placeholders with your settings.
api:
  fullnameOverride: nvcf-api
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/strap"
  accountBootstrap:
    image:
      registry: "<REGISTRY>"
      repository: "<REPOSITORY>/alpine-k8s"
      tag: 1.30.12
      pullPolicy: IfNotPresent
    # Registry credentials for function container/chart deployments.
    # NGC credentials are used by default. Additional registries (ECR, etc.)
    # can be added post-install via the NVCF CLI or API.
    registryCredentials:
      - registryHostname: nvcr.io
        secret:
          name: nvcr-containers
          value: "<REGISTRY_CREDENTIAL_B64>"  # base64 of $oauthtoken:<NGC_API_KEY>
        artifactTypes: ["CONTAINER"]
        tags: []
        description: "NGC Container registry"
      - registryHostname: helm.ngc.nvidia.com
        secret:
          name: nvcr-helmcharts
          value: "<REGISTRY_CREDENTIAL_B64>"  # base64 of $oauthtoken:<NGC_API_KEY>
        artifactTypes: ["HELM"]
        tags: []
        description: "NGC Helm chart registry"
    limits:
      maxFunctions: 10
      maxTasks: 10
      maxTelemetries: 10
      maxRegistryCreds: 10
  env:
    NVCF_NATS_REGION_PLACEMENT_TAG: "dc"
    NVCF_SIDECARS_HOSTNAME: "<REGISTRY>"
    NVCF_SIDECARS_REPOSITORY: "<REPOSITORY>"
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Replace the following placeholders:
<REGISTRY>: Your container image registry
<REPOSITORY>: Your image repository path
<REGISTRY_CREDENTIAL_B64>: Base64-encoded registry credential (see Prerequisites and Configuration)
registryHostname: Hostname for your Helm chart registry (e.g., helm.ngc.nvidia.com)
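The credential placeholder can be generated directly from an NGC API key. A sketch (the key value below is a placeholder):

```shell
# Build the base64 registry credential expected by registryCredentials.
# The literal username is $oauthtoken (single-quoted so the shell does
# not expand it); replace the key with your actual NGC API key.
NGC_API_KEY="my-ngc-api-key"   # placeholder value
REGISTRY_CREDENTIAL_B64=$(printf '%s' '$oauthtoken:'"${NGC_API_KEY}" | base64 | tr -d '\n')

# Sanity check: the credential must decode back to username:password.
printf '%s' "${REGISTRY_CREDENTIAL_B64}" | base64 -d
# → $oauthtoken:my-ngc-api-key
```

Using `printf` rather than `echo` avoids a trailing newline being encoded into the credential.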
Install#
helm upgrade --install api \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-api \
--version 1.8.0 \
--namespace nvcf \
--wait --wait-for-jobs --timeout 15m \
-f nvcf-api-values.yaml
Important
Monitor for account bootstrap failures. Open a separate terminal and watch events:
kubectl get events -n nvcf -w
The account bootstrap job is the most common failure point (usually due to misconfigured registry credentials in the values file).
Verify#
kubectl get pods -n nvcf
# Expected: nvcf-api pod Running
Check the bootstrap job completed:
kubectl get jobs -n nvcf
# The nvcf-api-account-bootstrap job should show COMPLETIONS 1/1
Note
The bootstrap job auto-deletes after approximately 5 minutes. Monitor events in real-time to catch failures.
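Because the job is short-lived, polling is more reliable than a one-shot check. A generic retry helper (a sketch, not part of any chart):

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, up to ATTEMPTS
# times, sleeping DELAY seconds between tries. Returns success as soon
# as CMD succeeds, failure once the attempts are exhausted.
retry() {
  attempts=$1; delay=$2; shift 2
  n=0
  while [ "$n" -lt "$attempts" ]; do
    "$@" && return 0
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Example: poll for the bootstrap job's logs before it is cleaned up.
# retry 30 10 kubectl logs job/nvcf-api-account-bootstrap -n nvcf
```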
Troubleshooting#
Bootstrap job fails: Check the job logs:
kubectl logs job/nvcf-api-account-bootstrap -n nvcf
Registry credential errors: Verify your <REGISTRY_CREDENTIAL_B64> value is correct. The base64-encoded credential should decode to username:password format.
Recovering from bootstrap failure: Uninstall the API chart, fix the values, and reinstall:
helm uninstall api -n nvcf
# Fix nvcf-api-values.yaml
helm upgrade --install api ...
Invocation Service#
The Invocation Service handles function invocation requests and routes them to worker nodes.
Chart: helm-nvcf-invocation-service
Version: 1.2.0
Namespace: nvcf
Depends on: NVCF API (must be running)
Configuration#
Create invocation-service-values.yaml (download template):
invocation-service-values.yaml
# Invocation Service values for standalone installation
# Replace <REGISTRY> and <REPOSITORY> with your container registry settings.
invocation:
  fullnameOverride: invocation-service
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/nvcf-invocation-service"
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Install#
helm upgrade --install invocation-service \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-invocation-service \
--version 1.2.0 \
--namespace nvcf \
--wait --timeout 10m \
-f invocation-service-values.yaml
Verify#
kubectl get pods -n nvcf -l app.kubernetes.io/name=invocation-service
# Expected: invocation-service pod Running
gRPC Proxy#
The gRPC Proxy enables streaming workloads over gRPC connections.
Chart: helm-nvcf-grpc-proxy
Version: 1.3.1
Namespace: nvcf
Depends on: NVCF API (must be running)
Configuration#
Create grpc-proxy-values.yaml (download template):
grpc-proxy-values.yaml
# gRPC Proxy values for standalone installation
# Replace <REGISTRY> and <REPOSITORY> with your container registry settings.
grpcproxy:
  fullnameOverride: grpc-proxy
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/nvcf-grpc-proxy"
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Install#
helm upgrade --install grpc-proxy \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-grpc-proxy \
--version 1.3.1 \
--namespace nvcf \
--wait --timeout 10m \
-f grpc-proxy-values.yaml
Verify#
kubectl get pods -n nvcf -l app.kubernetes.io/name=grpc-proxy
# Expected: grpc-proxy pod Running
Notary Service#
The Notary Service handles request signing and validation for secure inter-service communication.
Chart: helm-nvcf-notary-service
Version: 1.1.0
Namespace: nvcf
Depends on: Infrastructure only
Configuration#
Create notary-service-values.yaml (download template):
notary-service-values.yaml
# Notary Service values for standalone installation
# Replace <REGISTRY> and <REPOSITORY> with your container registry settings.
notary:
  fullnameOverride: notary-service
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/notary-service"
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Install#
helm upgrade --install notary-service \
oci://${REGISTRY}/${REPOSITORY}/helm-nvcf-notary-service \
--version 1.1.0 \
--namespace nvcf \
--wait --timeout 10m \
-f notary-service-values.yaml
Verify#
kubectl get pods -n nvcf -l app.kubernetes.io/name=notary-service
# Expected: notary-service pod Running
Admin Token Issuer Proxy#
The Admin Token Issuer Proxy provides an admin endpoint for generating API keys without requiring pre-existing credentials. It is used for initial setup and emergency access.
Chart: helm-admin-token-issuer-proxy
Version: 1.2.1
Namespace: api-keys
Depends on: API Keys (must be running)
Configuration#
Create admin-issuer-proxy-values.yaml (download template):
admin-issuer-proxy-values.yaml
# Admin Token Issuer Proxy values for standalone installation
# Replace <REGISTRY>, <REPOSITORY>, and <DOMAIN> with your settings.
adminIssuerProxy:
  fullnameOverride: admin-token-issuer-proxy
  image:
    registry: "<REGISTRY>"
    repository: "<REPOSITORY>/admin-token-issuer-proxy"
  # Gateway is disabled during Phase 2 (core services) because the Gateway
  # resource and CRDs are not yet installed. The gateway route for the admin
  # endpoint is created in Phase 3 when the Gateway Routes chart is installed.
  gateway:
    enabled: false
  # Uncomment for node selectors
  # nodeSelector:
  #   nvcf.nvidia.com/workload: control-plane
Note
The gateway setting is false during this phase because the Gateway API CRDs and
Gateway resource are not yet installed. The admin endpoint HTTPRoute will be created in
Phase 3: Gateway and Ingress when the Gateway Routes chart is deployed.
Install#
helm upgrade --install admin-issuer-proxy \
oci://${REGISTRY}/${REPOSITORY}/helm-admin-token-issuer-proxy \
--version 1.2.1 \
--namespace api-keys \
--wait --timeout 10m \
-f admin-issuer-proxy-values.yaml
Verify#
kubectl get pods -n api-keys
# Expected: api-keys and admin-token-issuer-proxy pods both Running
Verify All Core Services#
Before proceeding to gateway configuration, confirm all core services are healthy:
echo "=== NVCF namespace ==="
kubectl get pods -n nvcf
echo "=== API Keys namespace ==="
kubectl get pods -n api-keys
echo "=== ESS namespace ==="
kubectl get pods -n ess
echo "=== SIS namespace ==="
kubectl get pods -n sis
All pods should be in Running state. Verify helm releases:
helm list -A
# All releases should show STATUS: deployed
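The release check can be scripted as well. A sketch that assumes helm's default table output, where the UPDATED timestamp spans four whitespace-separated fields and STATUS is therefore the eighth:

```shell
# all_deployed: read `helm list -A` output on stdin and fail if any
# release's STATUS is not "deployed". Assumes helm's default table
# format (STATUS in the eighth whitespace-separated field).
all_deployed() {
  awk 'NR > 1 && $8 != "deployed" { bad = 1 } END { exit bad }'
}

# Usage: helm list -A | all_deployed && echo "all releases deployed"
```

If `jq` is available, `helm list -A -o json` avoids table parsing entirely and is more robust against formatting changes.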
Tip
If any pod is stuck in CrashLoopBackOff, check its logs with
kubectl logs <pod-name> -n <namespace> --tail=100. Common causes include
misconfigured secrets or unreachable infrastructure services.
Next Steps#
Once all core services are running, proceed to Phase 3: Gateway and Ingress to configure ingress and verify end-to-end API connectivity.