NVCF Architecture
NVCF is built around three independently scalable planes connected through NATS JetStream as the shared messaging layer. A single control plane manages multiple GPU worker clusters, each registered via the NVIDIA Cluster Agent (NVCA).
Three-Plane Overview
The diagram below shows the primary components and how requests flow across planes. The control plane contains additional services beyond those shown; see the manifest page for the full component list.
Component Responsibilities
For a full breakdown of component responsibilities by plane, see the manifest page.
Single Request Lifecycle
A synchronous HTTP function invocation flows through all three planes:
Scale-to-Zero
NVCF uses NATS JetStream as a durable request buffer, enabling true scale-to-zero without dropping requests:
- Autoscaler detects zero utilization and drives desired instance count to 0.
- No function pods are running.
- A new request arrives and is published to the NATS JetStream stream. The stream persists it durably.
- Autoscaler detects queue depth > 0 and sets desired instances to 1 or more.
- NVCA receives a creation message and launches the pod.
- Pod connects via WorkerService gRPC and pulls the buffered message.
- Response is returned to the caller through the still-open Invocation Service connection.
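The sequence above can be sketched as a small in-memory simulation. This is only an illustration of the buffering idea, not NVCF code: the `Simulation` class and its methods are hypothetical names, a `deque` stands in for the durable JetStream stream, and the scaling rule (one pod whenever anything is queued) is an assumed simplification of whatever policy the real autoscaler applies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Simulation:
    """Toy model of scale-to-zero; the deque stands in for the durable stream."""

    stream: deque = field(default_factory=deque)
    running_pods: int = 0
    responses: list = field(default_factory=list)

    def desired_instances(self) -> int:
        """Assumed policy: 0 when nothing is buffered, at least 1 otherwise."""
        if not self.stream:
            return 0
        return max(1, self.running_pods)

    def reconcile(self) -> None:
        """Stand-in for the cluster agent: match running pods to the desired count."""
        self.running_pods = self.desired_instances()

    def publish(self, request: str) -> None:
        """A request is buffered even when no pod is running."""
        self.stream.append(request)

    def drain(self) -> None:
        """A running pod pulls buffered messages until the stream is empty."""
        while self.running_pods > 0 and self.stream:
            request = self.stream.popleft()
            self.responses.append(f"handled:{request}")


sim = Simulation()
sim.reconcile()            # idle: scaled to zero
sim.publish("req-1")       # request arrives with no pods running
sim.reconcile()            # queue depth > 0, so one pod is started
sim.drain()                # the pod pulls the buffered request
sim.reconcile()            # queue is empty again, back to zero
print(sim.responses, sim.running_pods)
```

The point of the sketch is that the request published while `running_pods` is 0 is not lost: it waits in the buffer until a pod exists to pull it.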
Multi-Cluster Routing
Each GPU cluster runs its own NVCA instance. NATS JetStream subjects are scoped per cluster:
Creation messages are only delivered to the consumer of the addressed cluster. The invocation plane selects the target cluster based on the function deployment specification (GPU type, region, cluster group).
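A minimal sketch of this selection step follows, assuming the deployment specification carries a required GPU type and optional region and cluster group. The `Cluster` and `DeploymentSpec` types, the `select_cluster` and `deliver` functions, and all cluster names and values are hypothetical. Per-cluster delivery is modeled with a plain dictionary keyed by cluster ID rather than real JetStream subjects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    gpu_type: str
    region: str
    cluster_group: str


@dataclass(frozen=True)
class DeploymentSpec:
    gpu_type: str
    region: Optional[str] = None         # None means "any region"
    cluster_group: Optional[str] = None  # None means "any group"


def select_cluster(spec: DeploymentSpec, clusters: list) -> Optional[Cluster]:
    """Return the first cluster that satisfies every constraint in the spec."""
    for cluster in clusters:
        if cluster.gpu_type != spec.gpu_type:
            continue
        if spec.region is not None and cluster.region != spec.region:
            continue
        if spec.cluster_group is not None and cluster.cluster_group != spec.cluster_group:
            continue
        return cluster
    return None


def deliver(inboxes: dict, cluster: Cluster, message: str) -> None:
    """Only the addressed cluster's inbox receives the creation message."""
    inboxes[cluster.cluster_id].append(message)


clusters = [
    Cluster("cluster-a", "H100", "us-west", "default"),
    Cluster("cluster-b", "A100", "us-east", "default"),
    Cluster("cluster-c", "H100", "eu-central", "batch"),
]
inboxes = {c.cluster_id: [] for c in clusters}

target = select_cluster(DeploymentSpec(gpu_type="H100", region="eu-central"), clusters)
if target is not None:
    deliver(inboxes, target, "create fn-1")
print(target.cluster_id if target else None, inboxes)
```

Taking the first match keeps the example short; a real selector would likely also weigh capacity or load, which the text above does not describe.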
Function Workload Types
NVCF supports four workload types on the compute plane:
Key Custom Resource Definitions
Per-Component Documentation
Each component ships an AGENTS.md with detailed internals, data flows, and API contracts:
- src/compute-plane-services/nvca/AGENTS.md
- src/compute-plane-services/ess-agent/AGENTS.md
- src/compute-plane-services/byoo-otel-collector/AGENTS.md
- src/invocation-plane-services/http-invocation/AGENTS.md
- src/invocation-plane-services/grpc-proxy/AGENTS.md
- src/invocation-plane-services/llm-gateway/AGENTS.md
- src/invocation-plane-services/ratelimiter/AGENTS.md
- src/invocation-plane-services/nats-auth-callout/AGENTS.md
- src/control-plane-services/function-autoscaler/AGENTS.md
- examples/AGENTS.md