Quickstart

This guide covers running Dynamo using the CLI on your local machine or VM.

Looking to deploy on Kubernetes instead? See the Kubernetes Installation Guide and Kubernetes Quickstart for cluster deployments.

Choose Your Install Path

Path	Best For	Guide
Local Install	Running Dynamo on a single machine or VM	Local Installation
Kubernetes	Production multi-node cluster deployments	Kubernetes Deployment Guide
Building from Source	Contributors and local development	Building from Source

Install Dynamo

Option A: Containers (Recommended)

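Containers have all Dynamo dependencies pre-installed, so nothing else needs to be installed inside them. The host still needs Docker with NVIDIA GPU support (for example, the NVIDIA Container Toolkit) for the --gpus all flag to work.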

$ # SGLang
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.1

$ # TensorRT-LLM
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.1

$ # vLLM
$ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1

See Release Artifacts for available versions, and the backend guides for run instructions: SGLang | TensorRT-LLM | vLLM

Option B: Install from PyPI

$ # Install uv (recommended Python package manager)
$ curl -LsSf https://astral.sh/uv/install.sh | sh

$ # Create and activate a virtual environment
$ uv venv venv
$ source venv/bin/activate
$ uv pip install pip

Install system dependencies and the Dynamo wheel for your chosen backend:

SGLang

$ sudo apt install python3-dev
$ uv pip install --prerelease=allow "ai-dynamo[sglang]"

TensorRT-LLM

$ sudo apt install python3-dev
$ pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
$ pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"

vLLM

$ sudo apt install python3-dev libxcb1
$ uv pip install --prerelease=allow "ai-dynamo[vllm]"

Run Dynamo

Start the frontend, then start a worker for your chosen backend.

To run everything in a single terminal (useful inside containers), append > logfile.log 2>&1 & to each command so that it runs in the background with its output redirected to a log file. Example: python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &

$ # Start the OpenAI-compatible frontend (default port is 8000)
$ # --discovery-backend file uses file-based discovery instead of etcd
$ # (the frontend and workers must share a disk)
$ python3 -m dynamo.frontend --discovery-backend file

In another terminal (or the same terminal if using background mode), start a worker:

SGLang

$ python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file

TensorRT-LLM

$ python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file

vLLM

$ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
>   --kv-events-config '{"enable_kv_cache_events": false}'
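
For the single-terminal background mode described above, the redirect-and-background pattern can be wrapped in a small helper. This is only a sketch: start_bg and LAST_PID are hypothetical names introduced here for illustration and are not part of Dynamo, and the log file names are arbitrary.

```shell
# start_bg NAME CMD [ARGS...]
# Hypothetical helper (not part of Dynamo): runs CMD in the background,
# redirects stdout and stderr to NAME.log, and saves the PID in LAST_PID.
start_bg() {
  name=$1
  shift
  "$@" > "${name}.log" 2>&1 &
  LAST_PID=$!
}

# Example usage with the commands from this guide (left commented out so the
# snippet can be sourced without starting anything):
# start_bg dynamo.frontend python3 -m dynamo.frontend --discovery-backend file
# FRONTEND_PID=$LAST_PID
# start_bg dynamo.sglang python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
# WORKER_PID=$LAST_PID
# tail -f dynamo.frontend.log dynamo.sglang.log   # follow both logs
# kill "$WORKER_PID" "$FRONTEND_PID"              # stop both when done
```

Keeping the PIDs makes it possible to stop the frontend and worker explicitly, which a bare & at the end of a command line does not make convenient.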

Test Your Deployment

$ curl localhost:8000/v1/chat/completions \
>   -H "Content-Type: application/json" \
>   -d '{"model": "Qwen/Qwen3-0.6B",
>        "messages": [{"role": "user", "content": "Hello!"}],
>        "max_tokens": 50}'
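
Since the frontend is described as OpenAI compatible, the response is expected to follow the OpenAI chat completion shape, with the generated text at choices[0].message.content. The sketch below builds the request body with correct JSON quoting and prints only the reply text. chat_payload and extract_reply are hypothetical helper names introduced here, and the response shape is assumed from the OpenAI API rather than confirmed in this guide.

```shell
# chat_payload MODEL PROMPT [MAX_TOKENS]
# Hypothetical helper: prints a chat completion request body as JSON.
# json.dumps handles quotes and other special characters in the prompt.
chat_payload() {
  python3 -c 'import json, sys
model, prompt, max_tokens = sys.argv[1], sys.argv[2], int(sys.argv[3])
print(json.dumps({"model": model,
                  "messages": [{"role": "user", "content": prompt}],
                  "max_tokens": max_tokens}))' "$1" "$2" "${3:-50}"
}

# extract_reply
# Hypothetical helper: reads a chat completion response (JSON) on stdin and
# prints the assistant text. Assumes the OpenAI-style response shape
# {"choices": [{"message": {"content": "..."}}]}.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Example usage against a running frontend and worker (commented out so the
# snippet can be sourced without a server):
# curl -s localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$(chat_payload Qwen/Qwen3-0.6B 'Hello!')" | extract_reply
```

If the request fails, the response will not contain a choices field and extract_reply will exit with a Python error, so inspect the raw curl output first when debugging.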