Additional Resources

Frontend Configuration Reference

Complete reference for all frontend CLI arguments, environment variables, and HTTP endpoints
View as Markdown

This page documents all configuration options for the Dynamo Frontend (python -m dynamo.frontend).

Every CLI argument has a corresponding environment variable. CLI arguments take precedence over environment variables.

HTTP & Networking

CLI ArgumentEnv VarDefaultDescription
--http-hostDYN_HTTP_HOST0.0.0.0HTTP listen address
--http-portDYN_HTTP_PORT8000HTTP listen port
--tls-cert-pathDYN_TLS_CERT_PATHTLS certificate path (PEM). Must be paired with --tls-key-path
--tls-key-pathDYN_TLS_KEY_PATHTLS private key path (PEM). Must be paired with --tls-cert-path

The Rust HTTP server also reads these environment variables (not exposed as CLI args):

Env VarDefaultDescription
DYN_HTTP_BODY_LIMIT_MB192Maximum request body size in MB
DYN_HTTP_GRACEFUL_SHUTDOWN_TIMEOUT_SECS5Graceful shutdown timeout in seconds

Router

CLI ArgumentEnv VarDefaultDescription
--router-modeDYN_ROUTER_MODEround-robinRouting strategy: round-robin, random, kv, direct
--router-kv-overlap-score-weightDYN_ROUTER_KV_OVERLAP_SCORE_WEIGHT1.0Weight for KV cache overlap in worker scoring. Higher = prefer cache reuse
--router-temperatureDYN_ROUTER_TEMPERATURE0.0Softmax temperature for worker sampling. 0 = deterministic
--router-kv-events / --no-router-kv-eventsDYN_ROUTER_USE_KV_EVENTStrueEnable KV cache state events from workers. Disable for prediction-based routing
--router-ttl-secsDYN_ROUTER_TTL_SECS120.0Block TTL when KV events are disabled
--router-max-tree-sizeDYN_ROUTER_MAX_TREE_SIZE1048576Max radix tree size before pruning (no-events mode)
--router-prune-target-ratioDYN_ROUTER_PRUNE_TARGET_RATIO0.8Target size ratio after pruning (no-events mode)
--router-replica-sync / --no-router-replica-syncDYN_ROUTER_REPLICA_SYNCfalseSync state across multiple router instances
--router-snapshot-thresholdDYN_ROUTER_SNAPSHOT_THRESHOLD1000000Messages before triggering a snapshot
--router-reset-states / --no-router-reset-statesDYN_ROUTER_RESET_STATESfalseReset router state on startup. Warning: affects existing replicas
--router-track-active-blocks / --no-router-track-active-blocksDYN_ROUTER_TRACK_ACTIVE_BLOCKStrueTrack blocks used by in-progress requests for load balancing
--router-assume-kv-reuse / --no-router-assume-kv-reuseDYN_ROUTER_ASSUME_KV_REUSEtrueAssume KV cache reuse when tracking active blocks
--router-track-output-blocks / --no-router-track-output-blocksDYN_ROUTER_TRACK_OUTPUT_BLOCKSfalseTrack output blocks with fractional decay during generation
--router-event-threadsDYN_ROUTER_EVENT_THREADS4Event processing threads. >1 enables concurrent radix tree
--router-queue-thresholdDYN_ROUTER_QUEUE_THRESHOLD2.0Queue threshold fraction of prefill capacity. Enables priority scheduling
--router-queue-policyDYN_ROUTER_QUEUE_POLICYfcfsQueue scheduling policy: fcfs (tail TTFT) or wspt (avg TTFT)
--enable-cache-control / --no-enable-cache-controlDYN_ENABLE_CACHE_CONTROLfalseEnable TTL-based cache pinning (requires --router-mode=kv)
--decode-fallback / --no-decode-fallbackDYN_DECODE_FALLBACKfalseFall back to aggregated mode when prefill workers unavailable

Fault Tolerance

CLI ArgumentEnv VarDefaultDescription
--migration-limitDYN_MIGRATION_LIMIT0Max request migrations per worker disconnect. 0 = disabled
--active-decode-blocks-thresholdDYN_ACTIVE_DECODE_BLOCKS_THRESHOLDKV cache utilization fraction (0.0–1.0) for busy detection
--active-prefill-tokens-thresholdDYN_ACTIVE_PREFILL_TOKENS_THRESHOLDAbsolute token count for prefill busy detection
--active-prefill-tokens-threshold-fracDYN_ACTIVE_PREFILL_TOKENS_THRESHOLD_FRACFraction of max_num_batched_tokens for prefill busy detection. OR logic with absolute threshold

Model Discovery

CLI ArgumentEnv VarDefaultDescription
--namespaceDYN_NAMESPACEExact namespace for model discovery scoping
--namespace-prefixDYN_NAMESPACE_PREFIXNamespace prefix for discovery (e.g., ns matches ns, ns-abc123). Takes precedence over --namespace
--model-nameDYN_MODEL_NAMEOverride model name string
--model-pathDYN_MODEL_PATHPath to local model directory (for private/custom models)
--kv-cache-block-sizeDYN_KV_CACHE_BLOCK_SIZEKV cache block size override

Infrastructure

CLI ArgumentEnv VarDefaultDescription
--discovery-backendDYN_DISCOVERY_BACKENDetcdService discovery: kubernetes, etcd, file, mem
--request-planeDYN_REQUEST_PLANEtcpRequest distribution: tcp (fastest), nats, http
--event-planeDYN_EVENT_PLANEnatsEvent publishing: nats, zmq

KServe gRPC

CLI ArgumentEnv VarDefaultDescription
--kserve-grpc-server / --no-kserve-grpc-serverDYN_KSERVE_GRPC_SERVERfalseStart KServe gRPC v2 server
--grpc-metrics-portDYN_GRPC_METRICS_PORT8788HTTP metrics port for gRPC service

See the Frontend Guide for KServe message formats and integration details.

Monitoring

CLI ArgumentEnv VarDefaultDescription
--metrics-prefixDYN_METRICS_PREFIXdynamo_frontendPrefix for frontend Prometheus metrics
--dump-config-toDYN_DUMP_CONFIG_TODump resolved config to file path

Experimental

CLI ArgumentEnv VarDefaultDescription
--enable-anthropic-apiDYN_ENABLE_ANTHROPIC_APIfalseEnable /v1/messages (Anthropic Messages API)
--dyn-chat-processorDYN_CHAT_PROCESSORdynamoChat processor: dynamo or vllm
--dyn-debug-perfDYN_DEBUG_PERFfalseLog per-function timing for preprocessing (vllm processor only)
--dyn-preprocess-workersDYN_PREPROCESS_WORKERS0Worker processes for CPU-bound preprocessing. 0 = main event loop (vllm processor only)
-i / --interactiveDYN_INTERACTIVEfalseInteractive text chat mode

HTTP Endpoints

The frontend exposes the following HTTP endpoints:

OpenAI-Compatible

MethodPathDescription
POST/v1/chat/completionsChat completions (streaming and non-streaming)
POST/v1/completionsText completions
POST/v1/embeddingsText embeddings
POST/v1/responsesResponses API
POST/v1/images/generationsImage generation
POST/v1/videos/generationsVideo generation
POST/v1/videos/generations/streamVideo generation (streaming)
GET/v1/modelsList available models

Anthropic (Experimental)

MethodPathDescription
POST/v1/messagesAnthropic Messages API (requires --enable-anthropic-api)
POST/v1/messages/count_tokensToken counting for Anthropic API

Infrastructure

MethodPathDescription
GET/healthHealth check
GET/liveLiveness check
GET/metricsPrometheus metrics
GET/openapi.jsonOpenAPI specification
GET/docsSwagger UI
POST/busy_thresholdSet busy thresholds
GET/busy_thresholdGet current busy thresholds

Endpoint Path Customization

All endpoint paths can be overridden via environment variables:

Env VarDefault Path
DYN_HTTP_SVC_CHAT_PATH_ENV/v1/chat/completions
DYN_HTTP_SVC_CMP_PATH_ENV/v1/completions
DYN_HTTP_SVC_EMB_PATH_ENV/v1/embeddings
DYN_HTTP_SVC_RESPONSES_PATH_ENV/v1/responses
DYN_HTTP_SVC_MODELS_PATH_ENV/v1/models
DYN_HTTP_SVC_ANTHROPIC_PATH_ENV/v1/messages
DYN_HTTP_SVC_HEALTH_PATH_ENV/health
DYN_HTTP_SVC_LIVE_PATH_ENV/live
DYN_HTTP_SVC_METRICS_PATH_ENV/metrics

Deprecated

CLI ArgumentEnv VarDescription
--router-durable-kv-eventsDYN_ROUTER_DURABLE_KV_EVENTSUse event-plane local indexer instead

See Also