
# Key Component Interactions

## Disaggregated LLM Serving

The architecture disaggregates the prefill and decode phases onto separate workers, letting each phase scale and be scheduled independently for optimal LLM serving:

![Disaggregated LLM Serving](https://files.buildwithfern.com/nvidia-dsx.docs.buildwithfern.com/dsx/03f942a55e0beba432bdc5dfce8a434ec832a937f3de80286b0c9cda9cead302/_dot_dot_/docs/guides/inference-ra/assets/images/nira-disaggregated-llm-serving.png)

**Key interactions:**

* **Planner + Grove:** The Planner determines optimal prefill/decode worker ratios from SLA constraints; Grove gang-schedules the resulting worker groups
* **Router + KV Block Manager:** The Router combines per-worker KV cache hit rates with load information to route each request to the worker best positioned to serve it
* **KV Block Manager + NIXL:** NIXL lets the KV cache span memory tiers (G1-G4) with high-speed transfers
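The cache-aware routing interaction above can be sketched as a scoring function: prefer the worker that already holds the most KV blocks for the request, discounted by its current load. This is an illustrative sketch only; the class, function names, and weighting are assumptions, not the actual Router API.

```python
# Hypothetical sketch of KV-cache-aware routing (illustrative, not the real API).
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    cached_prefix_blocks: int  # KV blocks of this request already resident
    active_requests: int       # current load on the worker


def route(request_blocks: int, workers: list[Worker],
          load_weight: float = 0.5) -> Worker:
    """Pick the worker with the best (cache hit rate - load penalty) score."""
    def score(w: Worker) -> float:
        hit_rate = min(w.cached_prefix_blocks, request_blocks) / request_blocks
        return hit_rate - load_weight * w.active_requests
    return max(workers, key=score)
```

For example, with a 10-block request, a worker holding 8 cached blocks but serving 3 active requests loses to an idle worker holding only 6 cached blocks: the lower cache hit rate is outweighed by the load penalty.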

## Kubernetes Infrastructure Stack

![Kubernetes Infrastructure Stack](https://files.buildwithfern.com/nvidia-dsx.docs.buildwithfern.com/dsx/7b2edc0ec5fa2d6fdbd6e85fe5e62b5edff3bb83c3d1727359902996e7d3b3ae/_dot_dot_/docs/guides/inference-ra/assets/images/nira-k8s-stack.png)

**Key interactions:**

* **GPU Operator + Network Operator:** Together they automate driver, device-plugin, and high-speed networking setup across GPU clusters
* **KAI Scheduler + Grove:** KAI handles general GPU scheduling; Grove adds gang scheduling for multi-node workloads
* **nvcr.io + Operators:** Pre-built containers from the nvcr.io registry simplify deployment of the entire stack