
Copyright © 2026, NVIDIA Corporation.

Quickstart

Get a model running on Kubernetes in minutes.

Prerequisites

  • Kubernetes cluster (v1.24+) with GPU nodes
  • kubectl (v1.24+)
  • Helm (v3.0+; v3.8+ is needed to install charts from an oci:// registry without enabling experimental OCI support)
  • NVIDIA GPU Operator installed on the cluster
  • HuggingFace token secret on cluster

HuggingFace token secret

Create a HuggingFace token secret for model downloads. If you don’t have a token, see the HuggingFace token guide.

$ export HF_TOKEN=<your-hf-token>

$ kubectl create secret generic hf-token-secret \
    --from-literal=HF_TOKEN="$HF_TOKEN"
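
Before moving on, it can help to confirm that the secret exists and holds a non-empty HF_TOKEN key. The following is a minimal sketch: secret_has_key is a hypothetical helper name, not part of kubectl or Dynamo, and it checks whichever namespace your current kubectl context points at.

```shell
# secret_has_key is a hypothetical helper (not part of kubectl or Dynamo).
# It succeeds when the named secret holds a non-empty value under the given key.
# The value itself is never printed; only its presence is checked.
secret_has_key() {
  local secret="$1" key="$2" value
  # Secret data is stored base64-encoded; an empty result means the key is absent.
  value=$(kubectl get secret "$secret" -o "jsonpath={.data.$key}" 2>/dev/null) || return 1
  [ -n "$value" ]
}

# Usage, against the secret created above:
#   secret_has_key hf-token-secret HF_TOKEN && echo "secret looks good"
```

The helper only reports presence, so a token that is present but invalid will still pass this check.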

GPU Operator quick install

If you don’t have the GPU Operator yet:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --force-update
$ helm repo update nvidia
$ helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --wait --timeout=600s

If your cluster already provides GPU drivers (e.g., GKE with gpu-driver-version=latest, or AKS), append these flags to the helm install command above:

    --set driver.enabled=false --set toolkit.enabled=false
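
To make the two variants concrete, here is a small sketch that assembles the full helm install command for either case. gpu_operator_install_cmd is a hypothetical helper introduced only for illustration; it prints the command instead of running it, so the result can be reviewed first.

```shell
# gpu_operator_install_cmd is a hypothetical helper that only prints the helm
# command; it does not run it. Pass "preinstalled" when the cluster already
# provides GPU drivers (for example GKE or AKS, as described above).
gpu_operator_install_cmd() {
  local drivers="${1:-operator}"
  local cmd="helm install gpu-operator nvidia/gpu-operator"
  cmd="$cmd --namespace gpu-operator --create-namespace --wait --timeout=600s"
  if [ "$drivers" = "preinstalled" ]; then
    # Disable the driver and container toolkit components the platform already ships.
    cmd="$cmd --set driver.enabled=false --set toolkit.enabled=false"
  fi
  printf '%s\n' "$cmd"
}

gpu_operator_install_cmd preinstalled
```

Calling the helper without an argument prints the default command from the quick install section.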

Detailed installation

The GPU Operator is the only prerequisite for a basic deployment. For additional features like RDMA, Prometheus, or multinode scheduling with Grove/KAI Scheduler, see the Installation Guide.

If your GPU SKU and cloud provider are supported, you can use AICR for rapid installation of prerequisites and the Dynamo Helm chart.

Verify cluster is ready

Optionally, verify that your cluster is ready. The script path is relative, so run it from the root of a checkout of the Dynamo repository:

$ ./deploy/pre-deployment/pre-deployment-check.sh

Install Dynamo

$ export NAMESPACE=dynamo-system
$ helm install dynamo-platform \
    oci://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform \
    --version "1.0.2" \
    --namespace "$NAMESPACE" \
    --create-namespace

Check the platform pods, and repeat until all of them are Running:

$ kubectl get pods -n $NAMESPACE
# Expected: dynamo-operator-*, etcd-*, nats-* pods all Running
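
If you would rather block until the pods are ready than re-run the listing by hand, kubectl wait can do so. The sketch below wraps it in a hypothetical helper, wait_for_pods; the 300s default timeout is an arbitrary choice, not a value taken from the Dynamo documentation.

```shell
# wait_for_pods is a hypothetical wrapper around `kubectl wait`.
# It blocks until every pod in the namespace reports Ready, or the timeout expires.
wait_for_pods() {
  local namespace="$1" timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
}

# Usage:
#   wait_for_pods "$NAMESPACE" 600s
```

Note that pods which have run to completion never report Ready, so this approach fits best on a fresh namespace that contains only the long-running platform pods.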

Deploy Your First Model

Deploy Qwen/Qwen3-0.6B using a DynamoGraphDeploymentRequest (DGDR).

The DGDR is the entry point for deploying models. It runs automatic profiling for your model and hardware, then creates an auto-configured DynamoGraphDeployment (DGD). Once the DGD exists, the DGDR reaches a terminal state, much like a Kubernetes Job, and can be cleaned up. The DGD is the resource that persists and serves your model.

# qwen3-quickstart.yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-quickstart
spec:
  model: Qwen/Qwen3-0.6B
  backend: auto
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.0.2"

$ kubectl apply -f qwen3-quickstart.yaml -n $NAMESPACE

Watch the DGDR progress from Pending → Profiling → Deploying → Deployed:

$ kubectl get dgdr qwen3-quickstart -n $NAMESPACE -w
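
The -w watch suits interactive use. For scripts, a polling loop that exits once the DGDR reaches Deployed may be more convenient. The sketch below is illustrative only: wait_for_dgdr is a hypothetical helper, and it assumes the phase is exposed at .status.phase, which this page does not confirm. Check the DGDR Reference for the actual field name before relying on it.

```shell
# wait_for_dgdr is a hypothetical helper that polls a DGDR until it reports the
# wanted phase. ASSUMPTION: the phase is exposed at .status.phase; this field
# name is not confirmed by this page.
wait_for_dgdr() {
  local name="$1" namespace="$2" want="${3:-Deployed}"
  local tries="${4:-120}" interval="${5:-10}" phase i=0
  while [ "$i" -lt "$tries" ]; do
    # A failed lookup is treated as "phase not known yet" and retried.
    phase=$(kubectl get dgdr "$name" -n "$namespace" \
      -o 'jsonpath={.status.phase}' 2>/dev/null) || true
    echo "phase: ${phase:-unknown}"
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for $name to reach $want" >&2
  return 1
}

# Usage:
#   wait_for_dgdr qwen3-quickstart "$NAMESPACE"
```

The defaults (120 tries, 10 seconds apart) are arbitrary; profiling time depends on the model and hardware, so adjust them to your environment.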

Dynamo supports vLLM, TensorRT-LLM, and SGLang backends. Setting backend: auto lets the profiler choose the best one for your model and hardware. See the backends guide for details.

Send a Request

Once the DGDR shows Deployed:

# Find and port-forward the frontend
$ FRONTEND_SVC=$(kubectl get svc -n $NAMESPACE -o name | grep frontend | head -1)
$ kubectl port-forward "$FRONTEND_SVC" 8000:8000 -n $NAMESPACE &

# Send a request
$ curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Qwen/Qwen3-0.6B",
      "messages": [{"role": "user", "content": "What is NVIDIA Dynamo?"}],
      "max_tokens": 200
    }' | python3 -m json.tool
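
The command above pretty-prints the whole JSON response. If only the generated text is of interest, a small filter can pull it out. This sketch assumes the response follows the OpenAI-style chat completion shape, with the text at choices[0].message.content; extract_reply is a hypothetical helper name.

```shell
# extract_reply is a hypothetical helper. ASSUMPTION: the response follows the
# OpenAI-style chat completion shape, with the text at choices[0].message.content.
# It reads the JSON response on stdin and prints only the assistant's text.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Usage: replace the `| python3 -m json.tool` stage of the curl command above
# with `| extract_reply`.
```

If the server returns an error body instead of a completion, the filter fails with a Python traceback, which is a reasonable signal to fall back to json.tool and inspect the full response.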

Cleanup

$ kubectl delete dgdr qwen3-quickstart -n $NAMESPACE
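
The port-forward started in the previous section keeps running in the background until it is stopped. One way to handle that, sketched here with a hypothetical helper named stop_background, is to record the process ID when the port-forward starts and stop it during cleanup.

```shell
# stop_background is a hypothetical helper: it signals a background process
# started from this shell and reaps it. A PID that has already exited is ignored.
stop_background() {
  local pid="$1"
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
}

# Usage: capture the PID right after starting the port-forward, then stop it.
#   kubectl port-forward "$FRONTEND_SVC" 8000:8000 -n "$NAMESPACE" &
#   PF_PID=$!
#   stop_background "$PF_PID"
```

This only works from the same shell session that launched the port-forward, since wait can reap only that shell's own child processes.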

Next Steps

  • Installation Guide — Cloud provider setup, GPU Operator details, optional components (Grove, RDMA, model caching, Prometheus)
  • Model Deployment Guide — Strategy selection, model caching, planner, multinode, common pitfalls
  • DGDR Reference — Spec reference, lifecycle phases, monitoring commands, DGDR vs DGD
  • Creating Deployments — Hand-craft a DGD spec for full control