Key Component Interactions
Disaggregated LLM Serving
The architecture runs the prefill and decode phases on separate workers (disaggregated serving), so each phase can be scaled independently.

Key interactions:
- Planner + Grove: Planner determines optimal prefill/decode ratios based on SLA constraints; Grove handles gang scheduling
- Router + KV Block Manager: Router uses KV cache hit rates and worker load information to choose a worker for each request
- KV Block Manager + NIXL: Lets the KV cache span memory tiers (G1-G4), with NIXL providing high-speed transfers between them
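
To make the Router + KV Block Manager interaction concrete, the following is a minimal sketch of cache-aware routing. It is an illustration under stated assumptions, not the router's actual algorithm: the `WorkerStats` fields, the `score` formula, and the `load_weight` parameter are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerStats:
    """Hypothetical per-worker snapshot; field names are illustrative only."""
    name: str
    cached_blocks: int    # blocks of the request prefix already in this worker's KV cache
    active_requests: int  # requests currently running on this worker
    max_requests: int     # capacity used to normalize load


def score(worker: WorkerStats, request_blocks: int, load_weight: float = 0.5) -> float:
    """Higher is better: reward expected cache hits, penalize current load."""
    hit_rate = worker.cached_blocks / request_blocks if request_blocks else 0.0
    load = worker.active_requests / worker.max_requests
    return hit_rate - load_weight * load


def pick_worker(workers: list[WorkerStats], request_blocks: int) -> WorkerStats:
    """Return the worker with the best combined cache-hit and load score."""
    return max(workers, key=lambda w: score(w, request_blocks))


workers = [
    WorkerStats("decode-0", cached_blocks=6, active_requests=7, max_requests=8),
    WorkerStats("decode-1", cached_blocks=2, active_requests=1, max_requests=8),
]
print(pick_worker(workers, request_blocks=8).name)
```

In this sketch a busy worker can still win if it already holds most of the request's KV blocks; raising `load_weight` shifts the balance toward idle workers.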

Kubernetes Infrastructure Stack

Key interactions:
- GPU Operator + Network Operator: Together manage the GPU and network software on cluster nodes, such as drivers and device plugins
- KAI Scheduler + Grove: KAI handles general GPU scheduling; Grove adds gang scheduling for multinode workloads
- nvcr.io + Operators: Pre-built containers from the nvcr.io registry simplify deployment of the entire stack
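
Gang scheduling, as mentioned for Grove above, means that all pods of a multinode workload are placed together or not at all. The sketch below illustrates that all-or-nothing property with a simple first-fit heuristic. It is not the algorithm used by Grove or KAI Scheduler; the function name `gang_schedule` and its inputs are hypothetical.

```python
from __future__ import annotations


def gang_schedule(
    free_gpus: dict[str, int], pod_gpu_requests: list[int]
) -> dict[int, str] | None:
    """Place every pod in the gang, or none of them.

    free_gpus maps node name -> free GPU count.
    pod_gpu_requests lists the GPUs each pod in the gang needs.
    Returns {pod index: node name}, or None if this heuristic cannot fit the gang.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt leaves no partial state
    placement: dict[int, str] = {}
    # Try the largest pods first (first-fit decreasing; a heuristic, not an exact solver).
    order = sorted(range(len(pod_gpu_requests)), key=lambda i: -pod_gpu_requests[i])
    for pod in order:
        need = pod_gpu_requests[pod]
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # one pod does not fit, so the whole gang is rejected
        remaining[node] -= need
        placement[pod] = node
    return placement


cluster = {"node-a": 8, "node-b": 4}
print(gang_schedule(cluster, [8, 4]))  # both pods fit, so the gang is placed
print(gang_schedule(cluster, [8, 8]))  # second pod does not fit, so nothing is placed
```

Because the function only mutates a copy of the free-GPU map, a rejected gang leaves `cluster` unchanged, which mirrors the point of gang scheduling: no resources are held by a partially started workload.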