NVCF is built around three independently scalable planes (control, invocation, and compute) connected through NATS JetStream as the shared messaging layer. A single control plane manages multiple GPU worker clusters, each registered via the NVIDIA Cluster Agent (NVCA).
The diagram below shows the primary components and how requests flow across planes. The control plane contains additional services beyond those shown. See the manifest page for the full component list and a breakdown of component responsibilities by plane.
A synchronous HTTP function invocation flows through all three planes:
NVCF uses NATS JetStream as a durable request buffer, enabling true scale-to-zero without dropping requests:
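The buffering idea can be illustrated with a small in-memory simulation. This is a minimal sketch, not NVCF code and not a NATS client: `DurableBuffer`, `publish`, `fetch_and_ack`, and the scale-up rule are hypothetical names and logic chosen for illustration. It assumes only what the text states, namely that requests are held durably while no worker is running and are consumed once capacity appears.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DurableBuffer:
    """Toy stand-in for a durable stream: messages stay queued until fetched."""
    pending: deque = field(default_factory=deque)

    def publish(self, request_id: str) -> None:
        # Publishing succeeds even when no worker is running.
        self.pending.append(request_id)

    def fetch_and_ack(self, batch: int) -> list:
        # A worker pulls up to `batch` messages in arrival order;
        # removing a message from the queue models the acknowledgement.
        out = []
        while self.pending and len(out) < batch:
            out.append(self.pending.popleft())
        return out


buffer = DurableBuffer()
workers = 0  # the function is scaled to zero

# Requests arrive while no worker exists; they wait in the buffer.
for rid in ("req-1", "req-2", "req-3"):
    buffer.publish(rid)

backlog = len(buffer.pending)

# Hypothetical scale-up rule: start one worker when a backlog exists.
if backlog > 0 and workers == 0:
    workers = 1

# The new worker drains the backlog; nothing was dropped.
processed = buffer.fetch_and_ack(batch=10)
print(processed)
```

In a real deployment the buffer would be a JetStream stream and the worker a JetStream consumer; the sketch only shows why a durable queue lets the worker count reach zero without losing requests.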
Each GPU cluster runs its own NVCA instance. NATS JetStream subjects are scoped per cluster:
Creation messages are delivered only to the consumer of the addressed cluster. The invocation plane selects the target cluster based on the function deployment specification (GPU type, region, cluster group).
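The per-cluster scoping and selection described above can be sketched as follows. Everything here is an assumption made for illustration: the subject format in `creation_subject`, the `Cluster` and `DeploymentSpec` types, and the exact-match rule in `select_cluster` are hypothetical, since the actual subject names and selection logic are not given in this section.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    gpu_type: str
    region: str
    cluster_group: str


@dataclass(frozen=True)
class DeploymentSpec:
    gpu_type: str
    region: str
    cluster_group: str


def creation_subject(cluster_id: str) -> str:
    # Hypothetical naming scheme: one subject per cluster, so only that
    # cluster's consumer receives the message.
    return f"example.cluster.{cluster_id}.create"


def select_cluster(spec: DeploymentSpec,
                   clusters: Iterable[Cluster]) -> Optional[Cluster]:
    # Assumed rule: the first cluster matching all three spec fields wins.
    for c in clusters:
        if (c.gpu_type, c.region, c.cluster_group) == (
            spec.gpu_type, spec.region, spec.cluster_group
        ):
            return c
    return None


clusters = [
    Cluster("cluster-a", "H100", "us-west", "group-1"),
    Cluster("cluster-b", "L40S", "eu-central", "group-2"),
]
spec = DeploymentSpec(gpu_type="L40S", region="eu-central", cluster_group="group-2")

target = select_cluster(spec, clusters)
subject = creation_subject(target.cluster_id) if target else None
print(subject)
```

Because the cluster identifier is part of the subject, a consumer that subscribes only to its own cluster's subject never sees creation messages addressed elsewhere.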
NVCF supports four workload types on the compute plane:
Each component ships an AGENTS.md with detailed internals, data flows, and API contracts:
src/compute-plane-services/nvca/AGENTS.md
src/compute-plane-services/ess-agent/AGENTS.md
src/compute-plane-services/byoo-otel-collector/AGENTS.md
src/invocation-plane-services/http-invocation/AGENTS.md
src/invocation-plane-services/grpc-proxy/AGENTS.md
src/invocation-plane-services/llm-api-gateway/AGENTS.md
src/invocation-plane-services/ratelimiter/AGENTS.md
src/control-plane-services/nats-auth-callout/AGENTS.md
src/control-plane-services/function-autoscaler/AGENTS.md
examples/AGENTS.md