Quickstart
Get a Dynamo OpenAI-compatible endpoint running in a container in about 5 minutes.
Get a Dynamo OpenAI-compatible endpoint running in a container in about 5 minutes.
You’re here. Container fast path.
Full walkthrough — PyPI, configuration.
Kubernetes-native production path.
For contributors against main.
Dynamo is backend-agnostic and Kubernetes-native without being Kubernetes-only. Use this container path to try the same frontend/router/worker stack locally; use the Kubernetes path when you want the operator, CRDs, Gateway API integration, autoscaling, scheduling, and cluster lifecycle management.
Containers have all dependencies pre-installed. Pick your backend:
Hugging Face token required for gated models. Llama, Kimi, Qwen-VL, and other gated models require HF_TOKEN in your environment and accepting the model card’s license on huggingface.co. Set export HF_TOKEN=hf_… before launching.
For container versions and tags, see Release Artifacts.
In your container, start the OpenAI-compatible frontend on port 8000:
--discovery-backend file avoids needing etcd. To run frontend and worker in the same terminal, background each command with > logfile.log 2>&1 &.
In another terminal, launch a worker for your backend:
Check the endpoint is up:
If you see OK, send a chat completion:
Connection refused? The frontend takes a few seconds to start — retry. For production liveness and readiness probes, see Health Checks.
How Dynamo optimizes for agentic workloads at three layers: the frontend API, the router, and KV cache management.
How Dynamo’s concurrent global index evolved through six iterations to sustain over 100M ops/sec.
Pick a full install path from the four options above, or explore how Dynamo works under the hood: