Kubernetes User Guide | NVIDIA Dynamo Documentation

This page covers the Kubernetes path. To run Dynamo directly on a workstation or virtual machine without Kubernetes, see the Local CLI User Guide.

Jump Right In

Quickstart

Install Dynamo and serve a model on a Kubernetes cluster.

Model Deployment Overview

Choose a deployment workflow and learn the core Kubernetes resources.

Deploy with DGD

Define and apply a DynamoGraphDeployment for direct control.

Auto Deploy with DGDR

Generate a deployment from model and workload requirements.

Browse the Kubernetes Guide

Getting Started

Quickstart

Install Dynamo and serve a model on a Kubernetes cluster.

Compatibility

Check supported platforms, backends, features, and dependencies.

Installation

Install the Dynamo platform and prepare networking, storage, and observability for your cluster.

Install Dynamo

Install the operator, custom resource definitions, and platform services.

Multinode Orchestration

Configure Grove or LeaderWorkerSet for workloads that span nodes.

Observability

Set up the monitoring stack for a Kubernetes deployment.

RDMA setup

RDMA Overview

Prepare high-speed networking for distributed serving.

InfiniBand on Azure

Configure InfiniBand for Azure Kubernetes Service.

EFA on AWS

Configure Elastic Fabric Adapter for Amazon EKS.

Model storage

Model Storage Overview

Choose a storage approach for model weights.

AKS Storage

Compare storage options for Azure Kubernetes Service.

Azure Lustre CSI Driver

Mount Azure Managed Lustre with the Container Storage Interface driver.

EFS

Use Amazon Elastic File System for shared model storage.

Managed Kubernetes

EKS Setup

Prepare an Amazon EKS cluster for Dynamo.

ECS

Deploy Dynamo with Amazon Elastic Container Service.

AKS Setup

Prepare an Azure Kubernetes Service cluster for Dynamo.

Spot VMs

Use interruptible capacity for workloads that can tolerate eviction.

GKE Setup

Prepare a Google Kubernetes Engine cluster for Dynamo.

Model Deployment

Create and tune Kubernetes deployments for different models and serving patterns.

Model Deployment Introduction

Choose a deployment workflow and learn the core Kubernetes resources.

Deploy with DGD

Define and apply a DynamoGraphDeployment.

KV-Aware Routing

Route requests according to cached prompt prefixes.

Disaggregated Serving

Scale prefill and decode workers independently.

Sizing with AIConfigurator

Select parallelism and replica settings for a workload.

Multinode Deployments

Deploy a model across multiple Kubernetes nodes.

Model Caching

Reduce model-loading time across pods.

KV Cache Offloading

Extend KV cache capacity beyond GPU memory.

Auto deploy with DGDR

Auto Deploy with DGDR

Generate a deployment from model and workload requirements.

Dynamo Profiler

Profile candidate configurations on your hardware.

Dynamo Planner

Select and adjust deployment configurations.

Operations

Observe, benchmark, and simulate Kubernetes deployments.

Observability

Collect metrics from the operator and serving components.

Benchmarking with AIPerf

Measure throughput and latency under load.

Inference simulation with DynoSim

DynoSim Overview

Learn the simulation workflow and components.

Simulate a Kubernetes Deployment

Exercise the Kubernetes frontend and router with simulated workers.