Copyright © 2026, NVIDIA Corporation.


Quickstart

This guide covers running Dynamo using the CLI on your local machine or VM.

Looking to deploy on Kubernetes instead? See the Kubernetes Installation Guide and Kubernetes Quickstart for cluster deployments.

Choose Your Install Path

Path | Best For | Guide
Local Install | Running Dynamo on a single machine or VM | Local Installation
Kubernetes | Production multi-node cluster deployments | Kubernetes Deployment Guide
Building from Source | Contributors and local development | Building from Source

Install Dynamo

Option A: Containers (Recommended)

Containers have all dependencies pre-installed. No setup required.

# SGLang
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0

# TensorRT-LLM
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0

# vLLM
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0

See Release Artifacts for available versions, and the backend guides for run instructions: SGLang | TensorRT-LLM | vLLM

Option B: Install from PyPI

# Install uv (recommended Python package manager)
$ curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
$ uv venv venv
$ source venv/bin/activate
$ uv pip install pip

Install system dependencies and the Dynamo wheel for your chosen backend:

SGLang

$ sudo apt install python3-dev
$ uv pip install --prerelease=allow "ai-dynamo[sglang]"

TensorRT-LLM

$ sudo apt install python3-dev
$ pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
$ pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"

vLLM

$ sudo apt install python3-dev libxcb1
$ uv pip install --prerelease=allow "ai-dynamo[vllm]"
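
One way to confirm that the wheel landed in the active virtual environment is to check that the modules invoked later in this guide can be located. The sketch below is a minimal check, assuming the module names dynamo.frontend, dynamo.sglang, dynamo.trtllm, and dynamo.vllm taken from the run commands in this guide. The names module_available and BACKEND_MODULES are hypothetical helpers, and a positive result only shows that the module is present, not that the backend engine itself works.

```python
import importlib.util

# Module names taken from the `python3 -m dynamo.<name>` commands in this guide.
BACKEND_MODULES = {
    "sglang": "dynamo.sglang",
    "trtllm": "dynamo.trtllm",
    "vllm": "dynamo.vllm",
}


def module_available(name: str) -> bool:
    """Return True if `name` can be located in the current environment.

    For dotted names, find_spec imports the parent package first, so a
    missing parent (for example `dynamo`) raises ModuleNotFoundError.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    print("frontend:", module_available("dynamo.frontend"))
    for backend, module in BACKEND_MODULES.items():
        print(f"{backend}:", module_available(module))
```

Run it with the virtual environment activated; only the backend you installed is expected to report True.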

Run Dynamo

Start the frontend, then start a worker for your chosen backend.

To run everything in a single terminal (useful in containers), append > logfile.log 2>&1 & to each command so that it runs in the background with its output redirected to a log file. Example: python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &

# Start the OpenAI compatible frontend (default port is 8000)
# --discovery-backend file avoids needing etcd (frontend and workers must share a disk)
$ python3 -m dynamo.frontend --discovery-backend file
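
When the frontend runs in the background, it can help to wait until its port accepts connections before moving on. The following is a minimal sketch, assuming the default port 8000 mentioned above. The function wait_for_port is a hypothetical helper, and a successful TCP connection only shows that something is listening on the port, not that a worker has registered with the frontend.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() with no arguments polls localhost:8000 for up to 30 seconds and returns False if the port never opens.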

In another terminal (or the same terminal if using background mode), start a worker:

SGLang

$ python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file

TensorRT-LLM

$ python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file

vLLM

$ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
> --kv-events-config '{"enable_kv_cache_events": false}'
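
The three worker commands differ only in module name and model flag (--model-path for SGLang and TensorRT-LLM, --model plus --kv-events-config for vLLM). The sketch below collects them in one place and mirrors the background mode described earlier. It uses only the flags shown in this guide; worker_command and start_in_background are hypothetical helper names, not part of Dynamo.

```python
import json
import subprocess


def worker_command(backend: str, model: str = "Qwen/Qwen3-0.6B") -> list[str]:
    """Build the worker command shown in this guide for the given backend."""
    if backend in ("sglang", "trtllm"):
        return ["python3", "-m", f"dynamo.{backend}",
                "--model-path", model, "--discovery-backend", "file"]
    if backend == "vllm":
        # Same JSON string as the shell example: KV cache events disabled.
        kv_config = json.dumps({"enable_kv_cache_events": False})
        return ["python3", "-m", "dynamo.vllm",
                "--model", model, "--discovery-backend", "file",
                "--kv-events-config", kv_config]
    raise ValueError(f"unknown backend: {backend!r}")


def start_in_background(cmd: list[str], log_path: str) -> subprocess.Popen:
    """Rough equivalent of `cmd > log_path 2>&1 &`."""
    with open(log_path, "w") as log:
        # The child keeps its own copy of the file descriptor after we close ours.
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

For example, start_in_background(worker_command("vllm"), "dynamo.vllm.log") would launch the vLLM worker and return a Popen handle that can later be stopped with terminate().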

Test Your Deployment

$ curl localhost:8000/v1/chat/completions \
> -H "Content-Type: application/json" \
> -d '{"model": "Qwen/Qwen3-0.6B",
> "messages": [{"role": "user", "content": "Hello!"}],
> "max_tokens": 50}'
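
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the frontend listens on localhost:8000 as above and that the response follows the OpenAI chat completions shape (choices[0].message.content), which this guide does not show explicitly. The helper names build_chat_request and chat are hypothetical.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "Qwen/Qwen3-0.6B",
                       max_tokens: int = 50,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs), timeout=60) as resp:
        body = json.load(resp)
    # Assumed response shape: OpenAI-style chat completion.
    return body["choices"][0]["message"]["content"]
```

With the frontend and a worker running, print(chat("Hello!")) should print the model's reply; a connection error usually means the frontend is not yet listening.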