This guide walks through installing and running Dynamo on a local machine or VM with one or more GPUs. By the end, you’ll have a working OpenAI-compatible endpoint serving a model.
For production multi-node clusters, see the Kubernetes Deployment Guide. To build from source for development, see Building from Source.
TensorRT-LLM does not support Python 3.11.
For the full compatibility matrix including backend framework versions, see the Support Matrix.
Containers have all dependencies pre-installed. No setup required.
To run frontend and worker in the same container, either:
& (see Run Dynamo section below), ordocker exec -it <container_id> bashSee Release Artifacts for available versions and backend guides for run instructions: SGLang | TensorRT-LLM | vLLM
Install system dependencies and the Dynamo wheel for your chosen backend:
SGLang
For CUDA 13 (B300/GB300), the container is recommended. See SGLang install docs for details.
TensorRT-LLM
TensorRT-LLM requires pip due to a transitive Git URL dependency that
uv doesn’t resolve. We recommend using the TensorRT-LLM container for
broader compatibility. See the TRT-LLM backend guide
for details.
vLLM
Dynamo components discover each other through a shared backend. Two options are available:
This guide uses --discovery-backend file. For etcd setup, see Service Discovery.
Verify the CLI is installed and callable:
If you cloned the repository, you can run additional system checks:
To run in a single terminal (useful in containers), append > logfile.log 2>&1 &
to run processes in background:
In another terminal (or same terminal if using background mode), start a worker for your chosen backend:
SGLang
TensorRT-LLM
The warning Cannot connect to ModelExpress server/transport error. Using direct download.
is expected in local deployments and can be safely ignored.
vLLM
For dependency-free local development, disable KV event publishing (avoids NATS):
--kv-events-config '{"enable_kv_cache_events": false}'vLLM automatically enables KV event publishing when prefix caching is active. In a future release, KV events will be disabled by default for all backends. Start using --kv-events-config explicitly to prepare.
CUDA/driver version mismatch
Run nvidia-smi to check your driver version. Dynamo requires driver 575.51.03+ for CUDA 12 or 580.00.03+ for CUDA 13. B300/GB300 GPUs require CUDA 13. See the Support Matrix for full requirements.
Model doesn’t fit on GPU (OOM)
The default model Qwen/Qwen3-0.6B requires ~2GB of GPU memory. Larger models need more VRAM:
Start with a small model and scale up based on your hardware.
Python 3.11 with TensorRT-LLM
TensorRT-LLM does not support Python 3.11. If you see installation failures with TensorRT-LLM, check your Python version with python3 --version. Use Python 3.10 or 3.12 instead.
Container runs but GPU not detected
Ensure you passed --gpus all to docker run. Without this flag, the container won’t have access to GPUs: