Control Plane Operations
This section provides runbooks for operating the self-hosted NVCF control plane, including encryption key rotation, service management, and upgrades.
Encryption Key Management
Self-hosted NVCF uses a two-tier encryption hierarchy to protect secrets stored in the Encrypted Secret Store (ESS):
- MEK (master encryption key): held in OpenBao; encrypts the NEKs.
- NEK: the per-namespace key, stored in Cassandra in encrypted form; encrypts the secrets themselves.
When a user stores a secret through the NVCF API, ESS encrypts it with the active NEK for that namespace. The NEK itself is stored in Cassandra, encrypted by the MEK. To decrypt a secret, ESS retrieves the NEK from Cassandra, decrypts it using the MEK from OpenBao, then decrypts the secret with the NEK.
Key Rotation Runbooks
- control-plane-runbook-mek-rotation: Rotate the master encryption key (MEK) stored in OpenBao.
Basic Operations
Service Reference
The following table lists all NVCF control plane services with their namespace, resource name, and resource type. Use these values in the commands throughout this section.
Restarting a Service
Restarting a Deployment:
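A minimal sketch of a Deployment restart using standard kubectl commands. The helper name restart_deployment and the 5-minute timeout are illustrative assumptions; take the namespace and resource name from the Service Reference table.

```shell
# Hypothetical helper: restart a Deployment and wait for the rollout to finish.
# Arguments: <namespace> <deployment-name>
restart_deployment() {
  local namespace="$1" name="$2"
  # Trigger a rolling restart, then block until the new pods are ready.
  kubectl rollout restart "deployment/${name}" -n "${namespace}" &&
    kubectl rollout status "deployment/${name}" -n "${namespace}" --timeout=5m
}

# Usage (placeholder values):
#   restart_deployment <namespace> <deployment-name>
```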
Restarting a StatefulSet:
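The same pattern applies to a StatefulSet; only the resource type changes. This is a sketch with a hypothetical helper name and an assumed 10-minute timeout, which may need to be longer for large clusters.

```shell
# Hypothetical helper: restart a StatefulSet and wait for every pod to return.
# Arguments: <namespace> <statefulset-name>
restart_statefulset() {
  local namespace="$1" name="$2"
  # Pods are replaced one at a time; rollout status waits for the last one.
  kubectl rollout restart "statefulset/${name}" -n "${namespace}" &&
    kubectl rollout status "statefulset/${name}" -n "${namespace}" --timeout=10m
}

# Usage (placeholder values):
#   restart_statefulset <namespace> <statefulset-name>
```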
StatefulSets perform a rolling restart, terminating and recreating one pod at a time in reverse ordinal order (highest first). For clustered services like NATS, OpenBao, and Cassandra, this preserves quorum as long as a majority of replicas remain available.
For OpenBao, verify the seal status after the rollout completes. Each pod must unseal before it can serve requests:
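One way to check the seal status is to run the OpenBao CLI inside each pod. This sketch assumes the bao binary is available in the container and that you pass the actual pod names for your deployment; the helper name is hypothetical.

```shell
# Hypothetical helper: print the seal status of each listed OpenBao pod.
# Arguments: <namespace> <pod-name> [<pod-name> ...]
check_openbao_seal() {
  local namespace="$1"
  shift
  local pod
  for pod in "$@"; do
    echo "== ${pod} =="
    # The status command can exit non-zero for a sealed pod; keep going
    # so that every pod is reported.
    kubectl exec -n "${namespace}" "${pod}" -- bao status || true
  done
}

# Usage (placeholder values):
#   check_openbao_seal <namespace> <pod-0> <pod-1> <pod-2>
```

In the output, check the Sealed field for each pod; a pod that still reports itself as sealed cannot serve requests yet.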
Restarting all Deployments in a namespace:
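When no resource name is given, kubectl rollout restart applies to every Deployment in the namespace. A short sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: restart every Deployment in a namespace,
# then list the Deployments so you can watch readiness recover.
# Arguments: <namespace>
restart_all_deployments() {
  local namespace="$1"
  kubectl rollout restart deployment -n "${namespace}" &&
    kubectl get deployments -n "${namespace}"
}

# Usage (placeholder value):
#   restart_all_deployments <namespace>
```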
Checking Service Health
List pods and their status:
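A minimal sketch; the helper name is hypothetical, and the wide output format is chosen here only because it adds node and pod IP columns.

```shell
# Hypothetical helper: list pods in a namespace with status, restarts, and node.
# Arguments: <namespace>
list_pods() {
  local namespace="$1"
  kubectl get pods -n "${namespace}" -o wide
}

# Usage (placeholder value):
#   list_pods <namespace>
```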
Check logs for a service:
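A sketch for tailing recent logs. The helper name and the default of 100 lines are assumptions. When given a Deployment or StatefulSet reference, kubectl logs selects a single pod from it, so pass a pod name instead if you need a specific replica.

```shell
# Hypothetical helper: show the most recent log lines for a resource.
# Arguments: <namespace> <type/name> [<line-count>, default 100]
service_logs() {
  local namespace="$1" resource="$2" lines="${3:-100}"
  kubectl logs "${resource}" -n "${namespace}" --tail="${lines}"
}

# Usage (placeholder values):
#   service_logs <namespace> deployment/<name>
#   service_logs <namespace> statefulset/<name> 500
```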
Describe a pod for events and conditions:
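A sketch that describes a pod and then lists the events recorded for it, oldest first. The helper name is hypothetical; the separate events query is an optional addition, since events also appear at the end of the describe output.

```shell
# Hypothetical helper: show conditions and events for a single pod.
# Arguments: <namespace> <pod-name>
describe_pod() {
  local namespace="$1" pod="$2"
  kubectl describe pod "${pod}" -n "${namespace}"
  # Events that reference this pod, sorted by time.
  kubectl get events -n "${namespace}" \
    --field-selector "involvedObject.name=${pod}" \
    --sort-by=.lastTimestamp
}

# Usage (placeholder values):
#   describe_pod <namespace> <pod-name>
```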
Scaling a Service
To temporarily take a service offline (for example, during maintenance), scale it to zero, perform the work, then scale it back:
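The scale-down and scale-up steps can be sketched as follows. Both helper names are hypothetical. Recording the replica count first lets you restore the same value afterwards.

```shell
# Hypothetical helper: print the configured replica count of a resource.
# Arguments: <namespace> <type/name>
current_replicas() {
  local namespace="$1" resource="$2"
  kubectl get "${resource}" -n "${namespace}" -o jsonpath='{.spec.replicas}'
}

# Hypothetical helper: set the replica count of a resource.
# Arguments: <namespace> <type/name> <replicas>
scale_service() {
  local namespace="$1" resource="$2" replicas="$3"
  kubectl scale "${resource}" -n "${namespace}" --replicas="${replicas}"
}

# Usage (placeholder values):
#   current_replicas <namespace> deployment/<name>    # note the value
#   scale_service <namespace> deployment/<name> 0     # take offline
#   ... perform the maintenance work ...
#   scale_service <namespace> deployment/<name> <original-replicas>
```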
Scaling infrastructure StatefulSets (Cassandra, NATS, OpenBao) to zero will cause a full outage. Only do this if you understand the implications for data availability and quorum.
Upgrading Services
Upgrades are not officially supported during the Early Access period. The self-hosted NVCF stack does not yet have a validated upgrade path. Even a full helmfile sync may introduce breaking changes between releases: there is no guarantee of backward compatibility for configuration, database schemas, or inter-service APIs at this stage.
The guidance below is provided for advanced users who need to apply targeted fixes or hotfixes to individual services. It is not a substitute for a validated upgrade procedure.
Spot upgrades carry additional risk. Beyond the general Early Access limitations above, spot-upgrading an individual Helm chart bypasses the Helmfile’s version coordination and automatic database migrations. Proceed only when you understand the compatibility implications for the specific version you are upgrading to.
When to Spot Upgrade
Pre-Upgrade Checklist
Before upgrading any chart:
- Note the current chart version and app version.
- Back up the current Helm values.
- Review release notes for the target version. Check for breaking changes, required value changes, or new dependencies.
- Verify the cluster is healthy before starting: all pods running, no pending operations.
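The first two checklist items can be sketched with standard Helm commands. The helper name and the default backup filename are assumptions. helm list shows the chart version and app version of the release, and helm get values exports the user-supplied values.

```shell
# Hypothetical helper: record the current release version and back up its values.
# Arguments: <namespace> <release> [<backup-file>]
pre_upgrade_snapshot() {
  local namespace="$1" release="$2"
  local backup="${3:-${release}-values-backup.yaml}"
  # CHART and APP VERSION columns show what is currently deployed.
  helm list -n "${namespace}" --filter "^${release}\$"
  # Export only the user-supplied values (not chart defaults).
  helm get values "${release}" -n "${namespace}" -o yaml > "${backup}"
  echo "Saved user-supplied values to ${backup}"
}

# Usage (placeholder values):
#   pre_upgrade_snapshot <namespace> <release-name>
```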
Spot Upgrading a Helm Chart
The following commands work for any Deployment-based service. Replace the placeholders with values from the [Service Reference] table above.
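A generic sketch of a spot upgrade, assuming you already have the chart reference, target version, and values file for the service. The helper name and the 10-minute timeout are illustrative; every argument is a placeholder to fill in from your own deployment.

```shell
# Hypothetical helper: upgrade a single Helm release to a specific chart version.
# Arguments: <namespace> <release> <chart-reference> <target-version> <values-file>
spot_upgrade() {
  local namespace="$1" release="$2" chart="$3" version="$4" values="$5"
  # --wait blocks until the upgraded resources report ready or the timeout expires.
  helm upgrade "${release}" "${chart}" \
    -n "${namespace}" \
    --version "${version}" \
    -f "${values}" \
    --wait --timeout 10m
}

# Usage (placeholder values):
#   spot_upgrade <namespace> <release-name> <chart-reference> <target-version> values.yaml
```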
Example — upgrading the NVCF API chart:
Always pass your values file (-f values.yaml) during upgrade. If you omit it, Helm can fall back to chart defaults for values you previously overrode, which can break your deployment. If you no longer have the original values file, back up the current values first with helm get values.
Upgrading StatefulSet-Based Services
Cassandra, NATS, and OpenBao are deployed as StatefulSets. The helm upgrade command is the same, but the rollout behavior differs:
- Rolling update: StatefulSets restart pods one at a time in reverse ordinal order, waiting for each pod to become ready before proceeding to the next.
- Quorum preserved: For 3-replica clusters, at most one pod is unavailable at a time, maintaining quorum throughout the upgrade.
Service-specific notes:
Rolling Back
If an upgrade causes issues, roll back to the previous Helm revision:
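A sketch of the rollback using standard Helm commands; the helper name is hypothetical. With no revision number, helm rollback targets the previous revision. To target a specific one, pass its number from the helm history output after the release name.

```shell
# Hypothetical helper: show recent revisions, then roll back to the previous one.
# Arguments: <namespace> <release>
rollback_release() {
  local namespace="$1" release="$2"
  # List the most recent revisions so the rollback target is visible.
  helm history "${release}" -n "${namespace}" --max 5 &&
    helm rollback "${release}" -n "${namespace}" --wait
}

# Usage (placeholder values):
#   rollback_release <namespace> <release-name>
```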
helm rollback reverts both the chart version and the values. If you made intentional value changes alongside the version upgrade, you will need to re-apply them after the rollback.
Observability
For observability configuration and reference architecture, see self-hosted-observability.