Quickstart

This guide covers running Dynamo using the CLI on your local machine or VM.

Looking to deploy on Kubernetes instead? See the Kubernetes Installation Guide and Kubernetes Quickstart for cluster deployments.

Choose Your Install Path

Path                 | Best For                                  | Guide
Local Install        | Running Dynamo on a single machine or VM  | Local Installation
Kubernetes           | Production multi-node cluster deployments | Kubernetes Deployment Guide
Building from Source | Contributors and local development        | Building from Source

Install Dynamo

Option A: Containers (Recommended)

Containers have all dependencies pre-installed. No setup required.

$# SGLang
$docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
$
$# TensorRT-LLM
$docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
$
$# vLLM
$docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0

See Release Artifacts for available versions and backend guides for run instructions: SGLang | TensorRT-LLM | vLLM

Option B: Install from PyPI

$# Install uv (recommended Python package manager)
$curl -LsSf https://astral.sh/uv/install.sh | sh
$
$# Create virtual environment
$uv venv venv
$source venv/bin/activate
$uv pip install pip

Install system dependencies and the Dynamo wheel for your chosen backend:

SGLang

$sudo apt install python3-dev
$uv pip install --prerelease=allow "ai-dynamo[sglang]"

TensorRT-LLM

$sudo apt install python3-dev
$pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
$pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"

vLLM

$sudo apt install python3-dev libxcb1
$uv pip install --prerelease=allow "ai-dynamo[vllm]"
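After installing, you can sanity-check that the Dynamo modules resolve before launching anything. A minimal sketch using Python's importlib; the module names below mirror the python3 -m dynamo.* entry points used in the next section (check only the backend you actually installed):

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. "dynamo") also means not installed.
        return False

# Check the frontend plus whichever backend worker you installed.
for mod in ("dynamo.frontend", "dynamo.vllm"):
    status = "ok" if is_installed(mod) else "missing"
    print(f"{mod}: {status}")
```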

Run Dynamo

Start the frontend, then start a worker for your chosen backend.

To run everything in a single terminal (useful in containers), append > logfile.log 2>&1 & to each command so it runs in the background. Example: python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &

$# Start the OpenAI compatible frontend (default port is 8000)
$# --discovery-backend file avoids needing etcd (frontend and workers must share a disk)
$python3 -m dynamo.frontend --discovery-backend file
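Before starting a worker, you can wait for the frontend to come up. A minimal polling sketch using only the Python standard library; it assumes the default port 8000 and the OpenAI-standard /v1/models endpoint:

```python
import time
import urllib.error
import urllib.request

def wait_for_frontend(url: str = "http://localhost:8000/v1/models",
                      timeout_s: float = 60.0) -> bool:
    """Poll the frontend until it responds, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # frontend not up yet; retry
        time.sleep(1.0)
    return False

if wait_for_frontend(timeout_s=5.0):
    print("frontend is ready")
else:
    print("frontend did not come up in time")
```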

In another terminal (or same terminal if using background mode), start a worker:

SGLang

$python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file

TensorRT-LLM

$python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file

vLLM

$python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
> --kv-events-config '{"enable_kv_cache_events": false}'

Test Your Deployment

$curl localhost:8000/v1/chat/completions \
> -H "Content-Type: application/json" \
> -d '{"model": "Qwen/Qwen3-0.6B",
> "messages": [{"role": "user", "content": "Hello!"}],
> "max_tokens": 50}'