Key Component Interactions

Disaggregated LLM Serving

The architecture disaggregates the prefill and decode phases of LLM inference so that each phase can be scaled and scheduled independently:

Figure: Disaggregated LLM Serving

Key interactions:

Planner + Grove: Planner determines optimal prefill/decode ratios based on SLA constraints; Grove handles gang scheduling
Router + KV Block Manager: Router uses KV cache hit rates and worker load information to decide where to send each request
KV Block Manager + NIXL: Lets the KV cache span memory tiers (G1-G4), with NIXL providing high-speed transfers between them
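
The Router interaction above can be illustrated with a minimal sketch. It assumes a simple linear score that rewards cache hits and penalizes load; the names WorkerState, score, and pick_worker, the weights, and the sample values are all hypothetical and are not the actual Router or KV Block Manager API.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """Hypothetical per-worker view that a router might keep."""
    name: str
    cache_hit_rate: float  # fraction of the request's KV blocks already cached (0.0-1.0)
    load: float            # fraction of the worker's capacity in use (0.0-1.0)


def score(worker, hit_weight=1.0, load_weight=1.0):
    # Assumed scoring rule: reward cache reuse, penalize busy workers.
    return hit_weight * worker.cache_hit_rate - load_weight * worker.load


def pick_worker(workers, hit_weight=1.0, load_weight=1.0):
    """Return the worker with the highest score."""
    if not workers:
        raise ValueError("no workers available")
    return max(workers, key=lambda w: score(w, hit_weight, load_weight))


# Illustrative values only.
workers = [
    WorkerState("decode-0", cache_hit_rate=0.8, load=0.9),
    WorkerState("decode-1", cache_hit_rate=0.5, load=0.2),
    WorkerState("decode-2", cache_hit_rate=0.1, load=0.1),
]
best = pick_worker(workers)
print(best.name)
```

With equal weights, the worker with a moderate hit rate and low load wins over the worker with the best hit rate but near-full load. Setting load_weight to 0 turns the sketch into pure cache-affinity routing.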

Kubernetes Infrastructure Stack

Figure: Kubernetes Infrastructure Stack

Key interactions:

GPU Operator + Network Operator: Together provide full infrastructure management for GPU clusters
KAI Scheduler + Grove: KAI handles general GPU scheduling; Grove adds gang scheduling for multinode workloads
nvcr.io + Operators: Pre-built containers simplify deployment of the entire stack
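
Gang scheduling, as mentioned for Grove above, means that a group of pods is placed all together or not at all. The sketch below shows that all-or-nothing property with a greedy first-fit placement; the function gang_schedule, the node names, and the pod names are hypothetical, and this is not how Grove or KAI Scheduler is actually implemented.

```python
def gang_schedule(free_gpus, pod_requests):
    """Place every pod in the gang, or place none of them.

    free_gpus:    mapping of node name -> number of free GPUs
    pod_requests: mapping of pod name  -> number of GPUs required
    Returns a mapping of pod name -> node name, or None if the gang does not fit.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt has no side effects
    placement = {}
    # Greedy first-fit, largest request first. This is a simplification and
    # can miss placements that an exhaustive search would find.
    for pod, need in sorted(pod_requests.items(), key=lambda kv: -kv[1]):
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # one pod cannot be placed, so the whole gang is rejected
        remaining[node] -= need
        placement[pod] = node
    return placement


# Illustrative cluster state.
cluster = {"node-a": 8, "node-b": 4}
print(gang_schedule(cluster, {"prefill-0": 8, "decode-0": 4}))
print(gang_schedule(cluster, {"prefill-0": 8, "decode-0": 8}))
```

The second call returns None because only one 8-GPU node exists: the gang is rejected as a whole instead of starting one pod and leaving the other pending.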