Welcome to NVIDIA Dynamo#

The NVIDIA Dynamo Platform is a high-performance, low-latency inference framework designed to serve all AI models—across any framework, architecture, or deployment scale.

💎 Discover the latest developments!

This guide is a snapshot of the Dynamo GitHub repository at a specific point in time. For the latest information and examples, see the Dynamo GitHub repository.

Quick Start#

Local Deployment#

Get started with Dynamo locally in just a few commands:

1. Install Dynamo

# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install Dynamo
uv venv venv
source venv/bin/activate
uv pip install "ai-dynamo[sglang]"  # or [vllm], [trtllm]

2. Start etcd/NATS

# Start etcd and NATS using Docker Compose
docker compose -f deploy/docker-compose.yml up -d
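
Before moving on, you can sanity-check that both services are reachable. The sketch below is a hypothetical helper, not part of Dynamo; it assumes the compose file exposes the default ports (2379 for etcd, 4222 for NATS) on localhost.

```python
# Optional sanity check: verify the etcd and NATS ports accept TCP
# connections. Port numbers are the upstream defaults and assume the
# docker-compose file maps them to localhost.
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (requires the services to be running):
# for name, port in [("etcd", 2379), ("NATS", 4222)]:
#     print(name, "up" if port_open("localhost", port) else "down")
```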

3. Run Dynamo

# Start the OpenAI compatible frontend
python -m dynamo.frontend

# In another terminal, start an SGLang worker
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B

4. Test your deployment

curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 50}'
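
Because the frontend is OpenAI compatible, the same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names are illustrative, and it assumes the frontend from step 3 is listening on localhost:8080.

```python
# Hypothetical client sketch mirroring the curl command above.
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 50) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(
    payload: dict,
    url: str = "http://localhost:8080/v1/chat/completions",
) -> dict:
    """POST the payload to the frontend and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a running deployment):
# reply = send_chat_request(
#     build_chat_request("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "Hello!")
# )
# print(reply["choices"][0]["message"]["content"])
```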

Kubernetes Deployment#

For deployments on Kubernetes, follow the Dynamo Platform Quickstart Guide.

Dive in: Examples#

The examples below assume you build the latest image yourself from source. If you are using a prebuilt image, follow the examples from the corresponding branch.

Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph

Hello World Example

Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with vLLM.

LLM Deployment using vLLM

Demonstrates disaggregated serving on several nodes.

Multinode Examples

Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.

LLM Deployment using TensorRT-LLM