# Quickstart

## Choose Your Path

<CardGroup cols={4}>
  <Card title="Quickstart" icon="regular bolt" href="/dynamo/dev/getting-started/quickstart">
    You're here. Container fast path.
  </Card>

  <Card title="Local Installation" icon="regular laptop-code" href="/dynamo/dev/getting-started/local-installation">
    Full walkthrough — PyPI, configuration.
  </Card>

  <Card title="Kubernetes" icon="regular cubes" href="/dynamo/dev/kubernetes">
    Production multi-node clusters.
  </Card>

  <Card title="Build from Source" icon="regular code-branch" href="/dynamo/dev/getting-started/building-from-source">
    For contributors against `main`.
  </Card>
</CardGroup>

<Note>
  Dynamo is backend-agnostic — every install path works with **SGLang**, **TensorRT-LLM**, and **vLLM**. Pick the install path that fits your environment, then choose your backend.
</Note>

## Pull a Container

Containers have all dependencies pre-installed. Pick your backend:

<Tabs>
  <Tab title="SGLang" language="sglang">
    ```bash
    docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.2
    ```
  </Tab>

  <Tab title="TensorRT-LLM" language="trtllm">
    ```bash
    docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.2
    ```
  </Tab>

  <Tab title="vLLM" language="vllm">
    ```bash
    docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.2
    ```
  </Tab>
</Tabs>

<Warning>
  **Hugging Face token required for gated models.** Llama, Kimi, Qwen-VL, and other gated models require `HF_TOKEN` in your environment, and you must accept the model card's license on huggingface.co. Run `export HF_TOKEN=hf_…` before launching.
</Warning>
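
For example, you can export the token on the host and forward it into the container with Docker's `-e` flag (shown here with the SGLang image from above; the token value is a placeholder):

```bash
# Export your Hugging Face token on the host (placeholder value).
export HF_TOKEN=hf_...

# Forward the variable into the container so gated model downloads can authenticate.
docker run --gpus all --network host --rm -it \
  -e HF_TOKEN \
  nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.2
```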

For container versions and tags, see [Release Artifacts](/dynamo/v1.1.0/resources/release-artifacts#container-images).

## Start the Frontend

In your container, start the OpenAI-compatible frontend on port 8000:

```bash
python3 -m dynamo.frontend --discovery-backend file
```

<Tip>
  `--discovery-backend file` avoids needing etcd. To run the frontend and a worker in the same terminal, background each command with `> logfile.log 2>&1 &`, as shown below.
</Tip>
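
As a concrete sketch of that tip, the frontend and a worker (SGLang shown; these are the same commands used in the steps on this page) can share one terminal:

```bash
# Start the frontend in the background; logs go to frontend.log.
python3 -m dynamo.frontend --discovery-backend file > frontend.log 2>&1 &

# Start a worker the same way once the frontend is up; logs go to worker.log.
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file > worker.log 2>&1 &
```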

## Start a Worker

In another terminal, launch a worker for your backend:

<Tabs>
  <Tab title="SGLang" language="sglang">
    ```bash
    python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
    ```
  </Tab>

  <Tab title="TensorRT-LLM" language="trtllm">
    ```bash
    python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file
    ```
  </Tab>

  <Tab title="vLLM" language="vllm">
    ```bash
    python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
      --kv-events-config '{"enable_kv_cache_events": false}'
    ```
  </Tab>
</Tabs>

## Verify and Test

Check that the endpoint is up:

```bash
curl -sf http://localhost:8000/health && echo OK
```

If you see `OK`, send a chat completion:

<CodeBlocks>
  ```bash title="Request"
  curl localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen3-0.6B",
         "messages": [{"role": "user", "content": "Hello!"}],
         "max_tokens": 50}'
  ```

  ```json title="Response"
  {
    "id": "chatcmpl-...",
    "model": "Qwen/Qwen3-0.6B",
    "choices": [{
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 10, "total_tokens": 19}
  }
  ```
</CodeBlocks>
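
Because the frontend is OpenAI-compatible, standard request parameters should carry over; for instance, a streaming variant of the request above (a sketch assuming the usual OpenAI `stream` semantics):

```bash
# -N disables curl's output buffering so streamed chunks print as they arrive.
curl -N localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 50,
       "stream": true}'
```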

<Info>
  Connection refused? The frontend takes a few seconds to start — retry. For production liveness and readiness probes, see [Health Checks](/dynamo/v1.1.0/user-guides/observability-local/health-checks).
</Info>
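
If you are scripting the check, a simple polling loop handles that startup delay (a minimal sketch; adjust the timeout as needed):

```bash
# Poll the health endpoint for up to ~30 seconds, then proceed.
for _ in $(seq 1 30); do
  curl -sf http://localhost:8000/health > /dev/null && break
  sleep 1
done
```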

## From the Blog

<CardGroup cols={2}>
  <Card title="Full-Stack Optimizations for Agentic Inference" icon="regular microchip" href="/dynamo/dev/blog/agentic-inference">
    How Dynamo optimizes for agentic workloads at three layers: the frontend API, the router, and KV cache management.
  </Card>

  <Card title="Flash Indexer: Inter-Galactic KV Routing" icon="regular bolt" href="/dynamo/dev/blog/flash-indexer">
    How Dynamo's concurrent global index evolved through six iterations to sustain over 100M ops/sec.
  </Card>
</CardGroup>

## Dive Deeper

Pick a full install path from the [four options above](#choose-your-path), or explore how Dynamo works under the hood:

<CardGroup cols={2}>
  <Card title="Architecture" icon="regular sitemap" href="/dynamo/dev/design-docs/architecture">
    How the frontend, router, and workers fit together.
  </Card>

  <Card title="Frontend Guide" icon="regular bolt" href="/dynamo/dev/components/frontend">
    Worker discovery, multi-model routing, OpenAI compat.
  </Card>

  <Card title="KV Cache Aware Routing" icon="regular route" href="/dynamo/dev/components/router/router-concepts#kv-cache-routing">
    How the router places requests for prefix reuse.
  </Card>

  <Card title="Health Checks" icon="regular heart-pulse" href="/dynamo/dev/observability/health-checks">
    Liveness and readiness probes for production deployments.
  </Card>
</CardGroup>