For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Digest
  • Getting Started
    • Quickstart
    • Introduction
    • Local Installation
    • Building from Source
    • Contribution Guide
  • Resources
    • Support Matrix
    • Feature Matrix
    • Release Artifacts
    • Examples
  • Kubernetes Deployment
    • Deployment Guide
  • User Guides
    • KV Cache Aware Routing
    • Disaggregated Serving
    • KV Cache Offloading
    • Dynamo Benchmarking
    • Multimodal
    • Diffusion (Preview)
    • Tool Calling
    • LoRA Adapters
    • Agents
    • Observability (Local)
    • Fault Tolerance
    • Writing Python Workers
  • Backends
    • SGLang
    • TensorRT-LLM
    • vLLM
  • Components
    • Frontend
    • Router
    • Planner
    • Profiler
    • KVBM
  • Integrations
    • LMCache
    • SGLang HiCache
    • FlexKV
    • KV Events for Custom Engines
  • Design Docs
    • Overall Architecture
    • Architecture Flow
    • Disaggregated Serving
    • Distributed Runtime
  • Documentation
    • Dynamo Docs Guide
  • Additional Resources
    • Frontend Configuration Reference
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
Digest
On this page
  • HTTP & Networking
  • Router
  • Fault Tolerance
  • Model Discovery
  • Infrastructure
  • KServe gRPC
  • Monitoring
  • Experimental
  • HTTP Endpoints
  • OpenAI-Compatible
  • Anthropic (Experimental)
  • Infrastructure
  • Endpoint Path Customization
  • Deprecated
  • See Also
Additional Resources

Frontend Configuration Reference

Complete reference for all frontend CLI arguments, environment variables, and HTTP endpoints
||View as Markdown|
Previous

Dynamo Docs Guide

This page documents all configuration options for the Dynamo Frontend (python -m dynamo.frontend).

Every CLI argument has a corresponding environment variable. CLI arguments take precedence over environment variables.

HTTP & Networking

CLI ArgumentEnv VarDefaultDescription
--http-hostDYN_HTTP_HOST0.0.0.0HTTP listen address
--http-portDYN_HTTP_PORT8000HTTP listen port
--tls-cert-pathDYN_TLS_CERT_PATH—TLS certificate path (PEM). Must be paired with --tls-key-path
--tls-key-pathDYN_TLS_KEY_PATH—TLS private key path (PEM). Must be paired with --tls-cert-path

The Rust HTTP server also reads these environment variables (not exposed as CLI args):

Env VarDefaultDescription
DYN_HTTP_BODY_LIMIT_MB192Maximum request body size in MB
DYN_HTTP_GRACEFUL_SHUTDOWN_TIMEOUT_SECS5Graceful shutdown timeout in seconds

Router

CLI ArgumentEnv VarDefaultDescription
--router-modeDYN_ROUTER_MODEround-robinRouting strategy: round-robin, random, kv, direct
--router-kv-overlap-score-weightDYN_ROUTER_KV_OVERLAP_SCORE_WEIGHT1.0Weight for KV cache overlap in worker scoring. Higher = prefer cache reuse
--router-temperatureDYN_ROUTER_TEMPERATURE0.0Softmax temperature for worker sampling. 0 = deterministic
--router-kv-events / --no-router-kv-eventsDYN_ROUTER_USE_KV_EVENTStrueEnable KV cache state events from workers. Disable for prediction-based routing
--router-ttl-secsDYN_ROUTER_TTL_SECS120.0Block TTL when KV events are disabled
--router-max-tree-sizeDYN_ROUTER_MAX_TREE_SIZE1048576Max radix tree size before pruning (no-events mode)
--router-prune-target-ratioDYN_ROUTER_PRUNE_TARGET_RATIO0.8Target size ratio after pruning (no-events mode)
--router-replica-sync / --no-router-replica-syncDYN_ROUTER_REPLICA_SYNCfalseSync state across multiple router instances
--router-snapshot-thresholdDYN_ROUTER_SNAPSHOT_THRESHOLD1000000Messages before triggering a snapshot
--router-reset-states / --no-router-reset-statesDYN_ROUTER_RESET_STATESfalseReset router state on startup. Warning: affects existing replicas
--router-track-active-blocks / --no-router-track-active-blocksDYN_ROUTER_TRACK_ACTIVE_BLOCKStrueTrack blocks used by in-progress requests for load balancing
--router-assume-kv-reuse / --no-router-assume-kv-reuseDYN_ROUTER_ASSUME_KV_REUSEtrueAssume KV cache reuse when tracking active blocks
--router-track-output-blocks / --no-router-track-output-blocksDYN_ROUTER_TRACK_OUTPUT_BLOCKSfalseTrack output blocks with fractional decay during generation
--router-event-threadsDYN_ROUTER_EVENT_THREADS4Event processing threads. >1 enables concurrent radix tree
--router-queue-thresholdDYN_ROUTER_QUEUE_THRESHOLD2.0Queue threshold fraction of prefill capacity. Enables priority scheduling
--router-queue-policyDYN_ROUTER_QUEUE_POLICYfcfsQueue scheduling policy: fcfs (tail TTFT) or wspt (avg TTFT)
--enable-cache-control / --no-enable-cache-controlDYN_ENABLE_CACHE_CONTROLfalseEnable TTL-based cache pinning (requires --router-mode=kv)
--decode-fallback / --no-decode-fallbackDYN_DECODE_FALLBACKfalseFall back to aggregated mode when prefill workers unavailable

Fault Tolerance

CLI ArgumentEnv VarDefaultDescription
--migration-limitDYN_MIGRATION_LIMIT0Max request migrations per worker disconnect. 0 = disabled
--active-decode-blocks-thresholdDYN_ACTIVE_DECODE_BLOCKS_THRESHOLD—KV cache utilization fraction (0.0–1.0) for busy detection
--active-prefill-tokens-thresholdDYN_ACTIVE_PREFILL_TOKENS_THRESHOLD—Absolute token count for prefill busy detection
--active-prefill-tokens-threshold-fracDYN_ACTIVE_PREFILL_TOKENS_THRESHOLD_FRAC—Fraction of max_num_batched_tokens for prefill busy detection. OR logic with absolute threshold

Model Discovery

CLI ArgumentEnv VarDefaultDescription
--namespaceDYN_NAMESPACE—Exact namespace for model discovery scoping
--namespace-prefixDYN_NAMESPACE_PREFIX—Namespace prefix for discovery (e.g., ns matches ns, ns-abc123). Takes precedence over --namespace
--model-nameDYN_MODEL_NAME—Override model name string
--model-pathDYN_MODEL_PATH—Path to local model directory (for private/custom models)
--kv-cache-block-sizeDYN_KV_CACHE_BLOCK_SIZE—KV cache block size override

Infrastructure

CLI ArgumentEnv VarDefaultDescription
--discovery-backendDYN_DISCOVERY_BACKENDetcdService discovery: kubernetes, etcd, file, mem
--request-planeDYN_REQUEST_PLANEtcpRequest distribution: tcp (fastest), nats, http
--event-planeDYN_EVENT_PLANEnatsEvent publishing: nats, zmq

KServe gRPC

CLI ArgumentEnv VarDefaultDescription
--kserve-grpc-server / --no-kserve-grpc-serverDYN_KSERVE_GRPC_SERVERfalseStart KServe gRPC v2 server
--grpc-metrics-portDYN_GRPC_METRICS_PORT8788HTTP metrics port for gRPC service

See the Frontend Guide for KServe message formats and integration details.

Monitoring

CLI ArgumentEnv VarDefaultDescription
--metrics-prefixDYN_METRICS_PREFIXdynamo_frontendPrefix for frontend Prometheus metrics
--dump-config-toDYN_DUMP_CONFIG_TO—Dump resolved config to file path

Experimental

CLI ArgumentEnv VarDefaultDescription
--enable-anthropic-apiDYN_ENABLE_ANTHROPIC_APIfalseEnable /v1/messages (Anthropic Messages API)
--dyn-chat-processorDYN_CHAT_PROCESSORdynamoChat processor: dynamo or vllm
--dyn-debug-perfDYN_DEBUG_PERFfalseLog per-function timing for preprocessing (vllm processor only)
--dyn-preprocess-workersDYN_PREPROCESS_WORKERS0Worker processes for CPU-bound preprocessing. 0 = main event loop (vllm processor only)
-i / --interactiveDYN_INTERACTIVEfalseInteractive text chat mode

HTTP Endpoints

The frontend exposes the following HTTP endpoints:

OpenAI-Compatible

MethodPathDescription
POST/v1/chat/completionsChat completions (streaming and non-streaming)
POST/v1/completionsText completions
POST/v1/embeddingsText embeddings
POST/v1/responsesResponses API
POST/v1/images/generationsImage generation
POST/v1/videos/generationsVideo generation
POST/v1/videos/generations/streamVideo generation (streaming)
GET/v1/modelsList available models

Anthropic (Experimental)

MethodPathDescription
POST/v1/messagesAnthropic Messages API (requires --enable-anthropic-api)
POST/v1/messages/count_tokensToken counting for Anthropic API

Infrastructure

MethodPathDescription
GET/healthHealth check
GET/liveLiveness check
GET/metricsPrometheus metrics
GET/openapi.jsonOpenAPI specification
GET/docsSwagger UI
POST/busy_thresholdSet busy thresholds
GET/busy_thresholdGet current busy thresholds

Endpoint Path Customization

All endpoint paths can be overridden via environment variables:

Env VarDefault Path
DYN_HTTP_SVC_CHAT_PATH_ENV/v1/chat/completions
DYN_HTTP_SVC_CMP_PATH_ENV/v1/completions
DYN_HTTP_SVC_EMB_PATH_ENV/v1/embeddings
DYN_HTTP_SVC_RESPONSES_PATH_ENV/v1/responses
DYN_HTTP_SVC_MODELS_PATH_ENV/v1/models
DYN_HTTP_SVC_ANTHROPIC_PATH_ENV/v1/messages
DYN_HTTP_SVC_HEALTH_PATH_ENV/health
DYN_HTTP_SVC_LIVE_PATH_ENV/live
DYN_HTTP_SVC_METRICS_PATH_ENV/metrics

Deprecated

CLI ArgumentEnv VarDescription
--router-durable-kv-eventsDYN_ROUTER_DURABLE_KV_EVENTSUse event-plane local indexer instead

See Also

  • Frontend Overview — quick start and feature matrix
  • Frontend Guide — KServe gRPC configuration
  • NVIDIA Request Extensions (nvext) — custom request fields
  • Router Guide — detailed routing configuration
  • Metrics — available Prometheus metrics
  • Fault Tolerance — request migration and rejection