Phase 1: Infrastructure Dependencies
This phase installs the three infrastructure services that all NVCF core services depend on: NATS (messaging), OpenBao (secrets management), and Cassandra (persistence).
Complete all steps in standalone-prerequisites before proceeding. You should have
your shell variables (REGISTRY, REPOSITORY, STORAGE_CLASS, STORAGE_SIZE,
CASSANDRA_PASSWORD, REGISTRY_CREDENTIAL_B64) exported and namespaces created.
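Before continuing, it can help to confirm that each of these variables is actually exported in the current shell. The following is a minimal sketch; the function name check_required_vars is hypothetical and only the variable names come from this guide.

```shell
# Report every required variable that is not exported or is empty.
# Returns 0 when all are present, 1 otherwise.
check_required_vars() {
  missing=0
  for name in REGISTRY REPOSITORY STORAGE_CLASS STORAGE_SIZE \
              CASSANDRA_PASSWORD REGISTRY_CREDENTIAL_B64; do
    # printenv only sees exported variables, which is what helm and
    # kubectl child processes will inherit.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_required_vars || echo "export the variables listed above first"
```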
NATS
NATS provides the messaging backbone for inter-service communication across all NVCF components.
Configuration
Create nats-values.yaml with your registry settings (download template):
Replace all <REGISTRY> and <REPOSITORY> placeholders with your actual registry values.
If you are using a custom storage class, uncomment the config.jetstream.fileStore.pvc.storageClassName
section and set it to your storage class.
If you are using node selectors (e.g., with nvcf-base EKS clusters), uncomment the
podTemplate section.
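The placeholder replacement can be scripted instead of edited by hand. The sketch below assumes REGISTRY and REPOSITORY are set as described in the prerequisites; the function names and the template file name in the example are hypothetical. It does not escape the characters |, & or backslash, so values containing them would need extra handling.

```shell
# Fill the <REGISTRY> and <REPOSITORY> placeholders in a values file.
# Usage: fill_placeholders <template-file> <output-file>
fill_placeholders() {
  # '|' is the sed delimiter so slashes in registry paths need no escaping.
  sed -e "s|<REGISTRY>|${REGISTRY}|g" \
      -e "s|<REPOSITORY>|${REPOSITORY}|g" \
      "$1" > "$2"
}

# Print any lines that still contain an upper-case <PLACEHOLDER>.
# Returns 0 if placeholders remain, 1 if the file is clean.
unfilled_placeholders() {
  grep -n '<[A-Z_][A-Z_]*>' "$1"
}

# Example (template file name is hypothetical):
#   fill_placeholders nats-values.template.yaml nats-values.yaml
#   unfilled_placeholders nats-values.yaml && echo "placeholders remain"
```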
Install
Verify
Verify the NATS cluster has formed:
If pods remain in Pending state, check that your storage class is available and that
nodes satisfy any configured node selectors.
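A minimal sketch of that Pending check follows. The function name is hypothetical and the namespace is passed in as an argument, because the NATS namespace depends on what you created in the prerequisites. STORAGE_CLASS is the variable exported there.

```shell
# List pods stuck in Pending, then confirm the storage class exists.
# Usage: diagnose_pending <namespace>
diagnose_pending() {
  # Server-side filter: only pods whose phase is Pending are returned.
  kubectl get pods -n "$1" --field-selector=status.phase=Pending
  # Fails with NotFound if the storage class is not defined in the cluster.
  kubectl get storageclass "$STORAGE_CLASS"
}
```

For any pod this lists, kubectl describe pod shows scheduling events such as unbound volume claims or node selector mismatches.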
OpenBao
OpenBao provides Vault-compatible secrets management. It handles secret injection into NVCF service pods and stores sensitive configuration such as Cassandra credentials and registry pull secrets.
NATS must be running and healthy before installing OpenBao. The OpenBao migration job communicates with NATS during initialization.
Configuration
Create openbao-values.yaml with your registry and secret settings (download template):
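Replace the registry and secret placeholders in the template. The values you supply include the Cassandra password and the registry pull secret, which the migrations job later writes into the vault.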
If you are using a custom storage class, uncomment dataStorage.storageClass and set it
appropriately.
If you are using node selectors, uncomment the nodeSelector sections under both
injector and server.
Install
The release name must be openbao-server. Other NVCF charts reference this name
for service discovery.
Post-Install Hooks
The OpenBao chart runs two post-install jobs automatically. The --wait-for-jobs flag
makes Helm wait for both jobs to complete before the install command returns.
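As an illustration of how the fixed release name and the --wait-for-jobs flag fit together, here is a hedged sketch of a Helm invocation. The wrapper function, the chart reference, the namespace, and the 15 minute timeout are all assumptions supplied by the caller; use the chart location and namespace from your own environment.

```shell
# Install or upgrade the OpenBao release and block until its post-install
# jobs finish. Usage: install_openbao <chart-ref> <namespace>
install_openbao() {
  # The release name is fixed: other NVCF charts look up "openbao-server".
  # --wait-for-jobs only takes effect together with --wait.
  helm upgrade --install openbao-server "$1" \
    --namespace "$2" \
    --values openbao-values.yaml \
    --wait --wait-for-jobs --timeout 15m
}
```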
1. Initialize Cluster (openbao-server-initialize-cluster)
This job initializes the OpenBao (Vault) cluster on first install:
- Initializes the vault and generates unseal keys
- Unseals all server replicas
- Saves the unseal key to a Kubernetes secret (openbao-server-unseal) for the auto-unseal sidecar
- Enables the Raft storage backend for HA
- Registers and enables the JWT secrets plugin
- Saves the JWT signing key to a Kubernetes secret (cluster-jwt)
2. Migrations (openbao-server-migrations)
This job runs after the cluster is initialized and configures OpenBao for NVCF services:
- Creates KV secret stores for each NVCF service (api, sis, ess, invocation-service, etc.)
- Writes the Cassandra password and registry pull secret (from your values file) into the vault
- Configures Kubernetes JWT authentication backends so each service can authenticate using its service account
- Creates service-specific policies that control which secrets each service can access
- Sets up JWT signing roles used by SIS for cluster agent authentication
Both jobs must complete successfully before core services can start. If either job fails, the core services will not be able to authenticate with OpenBao. Check job logs for troubleshooting (see below).
Verify
Verify both post-install jobs completed:
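One way to script this check is sketched below, using the two job names given above. The function name, the namespace argument, and the 300 second timeout are assumptions. If the chart is configured to delete hook jobs after success, the jobs may no longer exist and kubectl wait would report NotFound.

```shell
# Wait for both OpenBao post-install jobs to report the Complete condition.
# Usage: wait_openbao_jobs <namespace>
wait_openbao_jobs() {
  for job in openbao-server-initialize-cluster openbao-server-migrations; do
    # Stop at the first job that fails or times out.
    kubectl wait --for=condition=complete "job/$job" \
      -n "$1" --timeout=300s || return 1
  done
}
```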
Check that OpenBao is initialized and unsealed:
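A hedged sketch of this check: the helper below reads the JSON printed by the OpenBao CLI status command and succeeds only if the server reports initialized: true and sealed: false. The helper name is hypothetical, and the pod name in the example assumes the server pods are named after the openbao-server release, which may not match your deployment.

```shell
# Succeed only when status JSON on stdin describes an initialized,
# unsealed server. Uses grep so that no JSON tooling is required.
bao_is_ready() {
  status=$(cat)
  printf '%s' "$status" | grep -Eq '"initialized"[[:space:]]*:[[:space:]]*true' &&
  printf '%s' "$status" | grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'
}

# Example (namespace and pod name are assumptions):
#   kubectl exec -n <namespace> openbao-server-0 -- bao status -format=json | bao_is_ready
```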
Troubleshooting
- Initialize cluster job fails: Check the init job logs.
- Migration job fails: Check the migration job logs for details.
- Server remains sealed: The auto-unseal sidecar reads from a Kubernetes secret. Verify that the unseal key secret exists.
- Stale resources from previous install: If reinstalling OpenBao after a failed attempt, delete all resources in the namespace first to avoid conflicts with leftover secrets, configmaps, and jobs.
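For the first three items above (job logs and the unseal secret), the information can be gathered in one pass. This is a sketch using the job and secret names from the post-install hook description. The function name and the namespace argument are assumptions, and it deliberately does not delete anything.

```shell
# Collect logs from both post-install jobs and check the unseal secret.
# Usage: openbao_debug <namespace>
openbao_debug() {
  kubectl logs -n "$1" job/openbao-server-initialize-cluster
  kubectl logs -n "$1" job/openbao-server-migrations
  # The auto-unseal sidecar depends on this secret being present.
  kubectl get secret -n "$1" openbao-server-unseal
}
```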
Cassandra
Apache Cassandra provides the persistence layer for NVCF. It stores function metadata, deployment state, and other operational data.
Configuration
Create cassandra-values.yaml with your registry and storage settings (download template):
Replace all <REGISTRY> and <REPOSITORY> placeholders with your actual registry values.
Adjust persistence.size based on your expected data volume (50-100Gi recommended for
production).
If you are using node selectors, uncomment the nodeSelector section.
For local development with a single node, set replicaCount: 1. Production deployments
should use a minimum of 3 replicas.
Install
Verify
Cassandra initialization pods showing “Error” is expected. The cassandra-initialize-cluster
job runs multiple pods in parallel and retries on failure. It is normal to see one or more pods
with Error status. The deployment is healthy as long as at least one initialization pod
reaches Completed and the cassandra-migrations job completes successfully.
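The "at least one Completed pod" rule can be expressed as a small filter over pod listings. The sketch below assumes the default kubectl get pods column order (NAME, READY, STATUS, ...) and that initialization pod names start with the job name. The function name is hypothetical.

```shell
# Read `kubectl get pods --no-headers` output on stdin and succeed when at
# least one cassandra-initialize-cluster pod has STATUS Completed.
# Pods of that job in Error status are ignored, as the note above describes.
init_job_succeeded() {
  awk '$1 ~ /^cassandra-initialize-cluster/ && $3 == "Completed" { ok = 1 }
       END { exit !ok }'
}

# Example (namespace is an assumption):
#   kubectl get pods -n <namespace> --no-headers | init_job_succeeded
```

This covers only the initialization job. The cassandra-migrations job still needs to complete separately.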
Check the initialization and migration jobs:
Verify Cassandra is accepting connections:
Troubleshooting
- Pods stuck in Pending: Verify your storage class can provision PVCs of the requested size. Some cloud providers (e.g., AWS EBS gp3) have minimum PVC size requirements.
- Initialization job retries: This is normal. The initialization job may fail several times while Cassandra nodes are still starting. As long as one pod eventually reaches Completed, the cluster is healthy.
- Migration job fails: Check migration logs.
Verify All Infrastructure
Before proceeding to the core services, confirm all three infrastructure components are healthy:
All pods should be in Running or Completed state. If any pods are unhealthy, resolve
the issues before continuing.
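To spot unhealthy pods quickly, the pod listing for each infrastructure namespace can be filtered by status. This is a minimal sketch. The function name is hypothetical and it assumes the default single-namespace column order (NAME, READY, STATUS, ...), so it would not work unmodified with the all-namespaces listing, which adds a leading NAMESPACE column.

```shell
# Print "<pod> <status>" for every pod that is neither Running nor
# Completed, reading `kubectl get pods --no-headers` output from stdin.
# Returns 1 if any such pod was found, 0 otherwise.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 " " $3; bad = 1 }
       END { exit bad + 0 }'
}

# Example (run once per infrastructure namespace):
#   kubectl get pods -n <namespace> --no-headers | unhealthy_pods
```

Failed cassandra-initialize-cluster pods will also appear in this output. As described in the Cassandra section, those are expected as long as one initialization pod reached Completed.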
Next Steps
Once all infrastructure dependencies are running, proceed to standalone-core-services to install the NVCF control plane services.