Welcome to NVIDIA Dynamo#
The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models—across any framework, architecture, or deployment scale.
💎 Discover the latest developments!
This guide is a snapshot of the Dynamo GitHub Repository at a specific point in time. For the latest information and examples, see the repository on GitHub.
Quick Start#
Local Deployment#
Get started with Dynamo locally in just a few commands:
1. Install Dynamo
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install Dynamo
uv venv venv
source venv/bin/activate
uv pip install "ai-dynamo[sglang]" # or [vllm], [trtllm]
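If the install completes without errors, you can optionally verify that the package is present (a quick sanity check; the extras you chose above determine which backend workers are available):
# Confirm the ai-dynamo package is installed in the active environment
uv pip show ai-dynamo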
2. Start etcd/NATS
# Start etcd and NATS using Docker Compose
docker compose -f deploy/docker-compose.yml up -d
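To confirm both services came up, list the containers and, if you like, probe etcd's health endpoint. This sketch assumes the compose file publishes etcd on its default client port 2379; adjust if your setup differs:
# List the containers started by the compose file
docker compose -f deploy/docker-compose.yml ps
# Probe etcd's standard health endpoint (assumes port 2379 is published)
curl http://localhost:2379/health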
3. Run Dynamo
# Start the OpenAI compatible frontend
python -m dynamo.frontend
# In another terminal, start an SGLang worker
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
4. Test your deployment
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50}'
Kubernetes Deployment#
For deployments on Kubernetes, follow the Dynamo Platform Quickstart Guide.
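Once the platform is installed, a common check is to port-forward the frontend service and run the same curl test as in step 4 above. The namespace and service names below are placeholders, not fixed names; substitute the ones from your deployment:
# Port-forward the frontend service (names are illustrative)
kubectl -n <your-namespace> port-forward svc/<frontend-service> 8080:8080
# Then test with the curl command from step 4 of the local deployment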
Dive in: Examples#
The examples below assume you build the latest image yourself from source. If you are using a prebuilt image, follow the examples from the corresponding branch.
Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph.
Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with vLLM.
Demonstrates disaggregated serving across multiple nodes.
Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.