
# Getting Started

Choose the components that fit your needs. Here are common adoption paths:

## Full Stack Deployment

1. **Infrastructure:** Deploy GPU Operator and Network Operator to your Kubernetes cluster
2. **Containers:** Pull optimized containers from NVIDIA's NGC registry (nvcr.io)
3. **Optimize:** Use Model Optimizer with TensorRT or TensorRT-LLM (see the quantization sketch after this list)
4. **Plan:** Use AIConfigurator to estimate performance and plan deployment topology
5. **Deploy:** Use KAI Scheduler (add Grove for multi-node deployments) to deploy Triton or Dynamo
6. **Tune:** Use AIPerf for benchmarking and Planner for runtime optimization
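
Step 3's Model Optimizer pass might look like the following minimal sketch, assuming a PyTorch model and the `nvidia-modelopt` package. The toy model, calibration batches, and INT8 config are placeholders, not a recommended recipe:

```python
# Hypothetical post-training quantization pass with Model Optimizer (nvidia-modelopt).
# The toy model and calibration batches below stand in for your own.
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
calib_batches = [torch.randn(8, 1024) for _ in range(4)]

def forward_loop(m):
    # Feed a few calibration batches so the inserted quantizers collect ranges.
    for batch in calib_batches:
        m(batch)

# Insert fake-quant ops and calibrate; the quantized model can then be exported
# to TensorRT or TensorRT-LLM for deployment.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```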

## Traditional ML Inference Only

1. **Optimize:** Use TensorRT to optimize your models (see the engine-build sketch after this list)
2. **Serve:** Deploy with Triton Inference Server
3. **Optional:** Add DALI for GPU-accelerated preprocessing
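
For step 1, building a TensorRT engine from an ONNX file can be sketched as below, assuming TensorRT 10's Python API (networks are explicit-batch by default there); `model.onnx`, the FP16 flag, and the `model.plan` output name are illustrative:

```python
# Sketch: compile an ONNX model into a serialized TensorRT engine.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()      # explicit batch is the default in TRT 10
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # allow FP16 kernels where accurate

engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine)                     # IHostMemory supports the buffer protocol
```

Triton's TensorRT backend then serves the engine from a model repository, typically `models/<name>/1/model.plan` alongside a `config.pbtxt`.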

## GenAI/LLM Inference Only

1. **Optimize:** Use TensorRT-LLM to optimize your LLM (a short sketch follows this list)
2. **Serve:** Deploy with Dynamo
3. **Scale:** Add KV Block Manager, NIXL, and Router for distributed inference
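
As a local smoke test before putting Dynamo in front, TensorRT-LLM's high-level `LLM` API can be sketched as below; the checkpoint name and sampling settings are illustrative, and any supported Hugging Face model works:

```python
# Sketch: optimize and run an LLM locally with the TensorRT-LLM LLM API.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds or loads an engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for out in llm.generate(["Explain disaggregated prefill in one sentence."], params):
    print(out.outputs[0].text)
```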

## Kubernetes Integration Only

1. **Deploy:** GPU Operator + Network Operator for infrastructure management
2. **Schedule:** KAI Scheduler for GPU-aware scheduling (a pod-submission sketch follows this list)
3. **Scale:** Add Grove for gang scheduling if needed
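
As a smoke test after steps 1 and 2, the sketch below submits a one-GPU pod through KAI Scheduler using the official `kubernetes` Python client. The `kai-scheduler` scheduler name, the queue label, and the CUDA image tag are assumptions to verify against your installation:

```python
# Sketch: run nvidia-smi in a one-GPU pod placed by KAI Scheduler.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="gpu-smoke-test",
        labels={"kai.scheduler/queue": "default"},  # assumed queue label
    ),
    spec=client.V1PodSpec(
        scheduler_name="kai-scheduler",             # assumed scheduler name
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # illustrative tag
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # GPU exposed by GPU Operator
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```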