> For clean Markdown content of this page, append .md to this URL. For the complete documentation index, see https://docs.nvidia.com/dynamo/llms.txt. For full content including API reference and SDK examples, see https://docs.nvidia.com/dynamo/llms-full.txt.

# Frontend Configuration Reference

This page documents all configuration options for the Dynamo Frontend (`python -m dynamo.frontend`).

Every CLI argument has a corresponding environment variable. CLI arguments take precedence over environment variables.

## HTTP & Networking

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--http-host` | `DYN_HTTP_HOST` | `0.0.0.0` | HTTP listen address |
| `--http-port` | `DYN_HTTP_PORT` | `8000` | HTTP listen port |
| `--tls-cert-path` | `DYN_TLS_CERT_PATH` | — | TLS certificate path (PEM). Must be paired with `--tls-key-path` |
| `--tls-key-path` | `DYN_TLS_KEY_PATH` | — | TLS private key path (PEM). Must be paired with `--tls-cert-path` |

The Rust HTTP server also reads these environment variables (not exposed as CLI args):

| Env Var | Default | Description |
|---------|---------|-------------|
| `DYN_HTTP_BODY_LIMIT_MB` | `192` | Maximum request body size in MB |
| `DYN_HTTP_GRACEFUL_SHUTDOWN_TIMEOUT_SECS` | `5` | Graceful shutdown timeout in seconds |

## Router

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--router-mode` | `DYN_ROUTER_MODE` | `round-robin` | Routing strategy: `round-robin`, `random`, `kv`, `direct` |
| `--router-kv-overlap-score-weight` | `DYN_ROUTER_KV_OVERLAP_SCORE_WEIGHT` | `1.0` | Weight for KV cache overlap in worker scoring. Higher = prefer cache reuse |
| `--router-temperature` | `DYN_ROUTER_TEMPERATURE` | `0.0` | Softmax temperature for worker sampling. 0 = deterministic |
| `--router-kv-events` / `--no-router-kv-events` | `DYN_ROUTER_USE_KV_EVENTS` | `true` | Enable KV cache state events from workers. Disable for prediction-based routing |
| `--router-ttl-secs` | `DYN_ROUTER_TTL_SECS` | `120.0` | Block TTL when KV events are disabled |
| `--router-max-tree-size` | `DYN_ROUTER_MAX_TREE_SIZE` | `1048576` | Max radix tree size before pruning (no-events mode) |
| `--router-prune-target-ratio` | `DYN_ROUTER_PRUNE_TARGET_RATIO` | `0.8` | Target size ratio after pruning (no-events mode) |
| `--router-replica-sync` / `--no-router-replica-sync` | `DYN_ROUTER_REPLICA_SYNC` | `false` | Sync state across multiple router instances |
| `--router-snapshot-threshold` | `DYN_ROUTER_SNAPSHOT_THRESHOLD` | `1000000` | Messages before triggering a snapshot |
| `--router-reset-states` / `--no-router-reset-states` | `DYN_ROUTER_RESET_STATES` | `false` | Reset router state on startup. **Warning:** affects existing replicas |
| `--router-track-active-blocks` / `--no-router-track-active-blocks` | `DYN_ROUTER_TRACK_ACTIVE_BLOCKS` | `true` | Track blocks used by in-progress requests for load balancing |
| `--router-assume-kv-reuse` / `--no-router-assume-kv-reuse` | `DYN_ROUTER_ASSUME_KV_REUSE` | `true` | Assume KV cache reuse when tracking active blocks |
| `--router-track-output-blocks` / `--no-router-track-output-blocks` | `DYN_ROUTER_TRACK_OUTPUT_BLOCKS` | `false` | Track output blocks with fractional decay during generation |
| `--router-event-threads` | `DYN_ROUTER_EVENT_THREADS` | `4` | Event processing threads. >1 enables concurrent radix tree |
| `--router-queue-threshold` | `DYN_ROUTER_QUEUE_THRESHOLD` | `2.0` | Queue threshold fraction of prefill capacity. Enables priority scheduling |
| `--router-queue-policy` | `DYN_ROUTER_QUEUE_POLICY` | `fcfs` | Queue scheduling policy: `fcfs` (tail TTFT) or `wspt` (avg TTFT) |
| `--enable-cache-control` / `--no-enable-cache-control` | `DYN_ENABLE_CACHE_CONTROL` | `false` | Enable TTL-based cache pinning (requires `--router-mode=kv`) |
| `--decode-fallback` / `--no-decode-fallback` | `DYN_DECODE_FALLBACK` | `false` | Fall back to aggregated mode when prefill workers unavailable |

## Fault Tolerance

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--migration-limit` | `DYN_MIGRATION_LIMIT` | `0` | Max request migrations per worker disconnect. 0 = disabled |
| `--active-decode-blocks-threshold` | `DYN_ACTIVE_DECODE_BLOCKS_THRESHOLD` | — | KV cache utilization fraction (0.0–1.0) for busy detection |
| `--active-prefill-tokens-threshold` | `DYN_ACTIVE_PREFILL_TOKENS_THRESHOLD` | — | Absolute token count for prefill busy detection |
| `--active-prefill-tokens-threshold-frac` | `DYN_ACTIVE_PREFILL_TOKENS_THRESHOLD_FRAC` | — | Fraction of `max_num_batched_tokens` for prefill busy detection. OR logic with absolute threshold |

## Model Discovery

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--namespace` | `DYN_NAMESPACE` | — | Exact namespace for model discovery scoping |
| `--namespace-prefix` | `DYN_NAMESPACE_PREFIX` | — | Namespace prefix for discovery (e.g., `ns` matches `ns`, `ns-abc123`). Takes precedence over `--namespace` |
| `--model-name` | `DYN_MODEL_NAME` | — | Override model name string |
| `--model-path` | `DYN_MODEL_PATH` | — | Path to local model directory (for private/custom models) |
| `--kv-cache-block-size` | `DYN_KV_CACHE_BLOCK_SIZE` | — | KV cache block size override |

## Infrastructure

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--discovery-backend` | `DYN_DISCOVERY_BACKEND` | `etcd` | Service discovery: `kubernetes`, `etcd`, `file`, `mem` |
| `--request-plane` | `DYN_REQUEST_PLANE` | `tcp` | Request distribution: `tcp` (fastest), `nats`, `http` |
| `--event-plane` | `DYN_EVENT_PLANE` | `nats` | Event publishing: `nats`, `zmq` |

## KServe gRPC

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--kserve-grpc-server` / `--no-kserve-grpc-server` | `DYN_KSERVE_GRPC_SERVER` | `false` | Start KServe gRPC v2 server |
| `--grpc-metrics-port` | `DYN_GRPC_METRICS_PORT` | `8788` | HTTP metrics port for gRPC service |

See the [Frontend Guide](/dynamo/v1.0.0/components/frontend/frontend-guide) for KServe message formats and integration details.

## Monitoring

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--metrics-prefix` | `DYN_METRICS_PREFIX` | `dynamo_frontend` | Prefix for frontend Prometheus metrics |
| `--dump-config-to` | `DYN_DUMP_CONFIG_TO` | — | Dump resolved config to file path |

## Experimental

| CLI Argument | Env Var | Default | Description |
|-------------|---------|---------|-------------|
| `--enable-anthropic-api` | `DYN_ENABLE_ANTHROPIC_API` | `false` | Enable `/v1/messages` (Anthropic Messages API) |
| `--dyn-chat-processor` | `DYN_CHAT_PROCESSOR` | `dynamo` | Chat processor: `dynamo` or `vllm` |
| `--dyn-debug-perf` | `DYN_DEBUG_PERF` | `false` | Log per-function timing for preprocessing (vllm processor only) |
| `--dyn-preprocess-workers` | `DYN_PREPROCESS_WORKERS` | `0` | Worker processes for CPU-bound preprocessing. 0 = main event loop (vllm processor only) |
| `-i` / `--interactive` | `DYN_INTERACTIVE` | `false` | Interactive text chat mode |

## HTTP Endpoints

The frontend exposes the following HTTP endpoints:

### OpenAI-Compatible

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
| `POST` | `/v1/completions` | Text completions |
| `POST` | `/v1/embeddings` | Text embeddings |
| `POST` | `/v1/responses` | Responses API |
| `POST` | `/v1/images/generations` | Image generation |
| `POST` | `/v1/videos/generations` | Video generation |
| `POST` | `/v1/videos/generations/stream` | Video generation (streaming) |
| `GET` | `/v1/models` | List available models |

### Anthropic (Experimental)

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/v1/messages` | Anthropic Messages API (requires `--enable-anthropic-api`) |
| `POST` | `/v1/messages/count_tokens` | Token counting for Anthropic API |

### Infrastructure

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/live` | Liveness check |
| `GET` | `/metrics` | Prometheus metrics |
| `GET` | `/openapi.json` | OpenAPI specification |
| `GET` | `/docs` | Swagger UI |
| `POST` | `/busy_threshold` | Set busy thresholds |
| `GET` | `/busy_threshold` | Get current busy thresholds |

### Endpoint Path Customization

All endpoint paths can be overridden via environment variables:

| Env Var | Default Path |
|---------|-------------|
| `DYN_HTTP_SVC_CHAT_PATH_ENV` | `/v1/chat/completions` |
| `DYN_HTTP_SVC_CMP_PATH_ENV` | `/v1/completions` |
| `DYN_HTTP_SVC_EMB_PATH_ENV` | `/v1/embeddings` |
| `DYN_HTTP_SVC_RESPONSES_PATH_ENV` | `/v1/responses` |
| `DYN_HTTP_SVC_MODELS_PATH_ENV` | `/v1/models` |
| `DYN_HTTP_SVC_ANTHROPIC_PATH_ENV` | `/v1/messages` |
| `DYN_HTTP_SVC_HEALTH_PATH_ENV` | `/health` |
| `DYN_HTTP_SVC_LIVE_PATH_ENV` | `/live` |
| `DYN_HTTP_SVC_METRICS_PATH_ENV` | `/metrics` |

## Deprecated

| CLI Argument | Env Var | Description |
|-------------|---------|-------------|
| `--router-durable-kv-events` | `DYN_ROUTER_DURABLE_KV_EVENTS` | Use event-plane local indexer instead |

## See Also

- [Frontend Overview](/dynamo/v1.0.0/components/frontend) — quick start and feature matrix
- [Frontend Guide](/dynamo/v1.0.0/components/frontend/frontend-guide) — KServe gRPC configuration
- [NVIDIA Request Extensions (nvext)](/dynamo/v1.0.0/additional-resources/nvidia-request-extensions-nvext) — custom request fields
- [Router Guide](/dynamo/v1.0.0/user-guides/kv-cache-aware-routing) — detailed routing configuration
- [Metrics](/dynamo/v1.0.0/user-guides/observability-local/metrics) — available Prometheus metrics
- [Fault Tolerance](/dynamo/v1.0.0/user-guides/fault-tolerance) — request migration and rejection