Frontend Configuration Reference

Complete reference for all frontend CLI arguments, environment variables, and HTTP endpoints
View as Markdown

This page documents all configuration options for the Dynamo Frontend (python -m dynamo.frontend).

Every CLI argument has a corresponding environment variable. CLI arguments take precedence over environment variables.

HTTP & Networking

CLI ArgumentEnv VarDefaultDescription
--http-hostDYN_HTTP_HOST0.0.0.0HTTP listen address
--http-portDYN_HTTP_PORT8000HTTP listen port
--tls-cert-pathDYN_TLS_CERT_PATHTLS certificate path (PEM). Must be paired with --tls-key-path
--tls-key-pathDYN_TLS_KEY_PATHTLS private key path (PEM). Must be paired with --tls-cert-path

The Rust HTTP server also reads these environment variables (not exposed as CLI args):

Env VarDefaultDescription
DYN_HTTP_BODY_LIMIT_MB192Maximum request body size in MB
DYN_HTTP_GRACEFUL_SHUTDOWN_TIMEOUT_SECS5Graceful shutdown timeout in seconds

Router

CLI ArgumentEnv VarDefaultDescription
--router-modeDYN_ROUTER_MODEround-robinRouting strategy: round-robin, random, kv, direct, least-loaded, device-aware-weighted
--load-aware / --no-load-awareDYN_ROUTER_LOAD_AWAREfalsePreset for KV load-aware routing without cache-reuse signals; implies --router-mode kv
--router-kv-overlap-score-creditDYN_ROUTER_KV_OVERLAP_SCORE_CREDIT1.0Credit multiplier for device-local prefix overlap, from 0.0 to 1.0
--router-prefill-load-scaleDYN_ROUTER_PREFILL_LOAD_SCALE1.0Scale adjusted prompt-side prefill load before adding decode blocks
--router-temperatureDYN_ROUTER_TEMPERATURE0.0Softmax temperature for normalized worker sampling. 0 = deterministic
--router-kv-events / --no-router-kv-eventsDYN_ROUTER_USE_KV_EVENTStrueEnable KV cache state events from workers. Disable for prediction-based routing
--router-ttl-secsDYN_ROUTER_TTL_SECS120.0Block TTL when KV events are disabled
--router-replica-sync / --no-router-replica-syncDYN_ROUTER_REPLICA_SYNCfalseSync state across multiple router instances
--router-snapshot-thresholdDYN_ROUTER_SNAPSHOT_THRESHOLD1000000Messages before triggering a snapshot
--router-reset-states / --no-router-reset-statesDYN_ROUTER_RESET_STATESfalseReset router state on startup. Warning: affects existing replicas
--router-track-active-blocks / --no-router-track-active-blocksDYN_ROUTER_TRACK_ACTIVE_BLOCKStrueTrack blocks used by in-progress requests for load balancing
--router-assume-kv-reuse / --no-router-assume-kv-reuseDYN_ROUTER_ASSUME_KV_REUSEtrueAssume KV cache reuse when tracking active blocks
--router-track-output-blocks / --no-router-track-output-blocksDYN_ROUTER_TRACK_OUTPUT_BLOCKSfalseTrack output blocks with fractional decay during generation
--router-track-prefill-tokens / --no-router-track-prefill-tokensDYN_ROUTER_TRACK_PREFILL_TOKENStrueTrack prompt-side prefill load in worker load accounting
--router-prefill-load-modelDYN_ROUTER_PREFILL_LOAD_MODELnonePrompt-side load model: none for static load, aic for oldest-prefill decay using an AIC prediction
--router-event-threadsDYN_ROUTER_EVENT_THREADS4KV indexer worker threads. >1 enables the concurrent radix tree, including with --no-router-kv-events
--router-queue-thresholdDYN_ROUTER_QUEUE_THRESHOLD4.0Queue threshold fraction of prefill capacity. Enables priority scheduling
--router-queue-policyDYN_ROUTER_QUEUE_POLICYfcfsQueue scheduling policy: fcfs (tail TTFT), wspt (avg TTFT), or lcfs (comparison-only reverse ordering)
--decode-fallback / --no-decode-fallbackDYN_DECODE_FALLBACKfalseFall back to aggregated mode when prefill workers unavailable

AIC Prefill Load Model

These options are used only when --router-mode kv is combined with --router-prefill-load-model aic.

CLI ArgumentEnv VarDefaultDescription
--aic-backendDYN_AIC_BACKENDBackend family to model in AIC, for example vllm or sglang
--aic-systemDYN_AIC_SYSTEMAIC hardware/system identifier, for example h200_sxm
--aic-model-pathDYN_AIC_MODEL_PATHModel path or model identifier used for AIC perf lookup
--aic-backend-versionDYN_AIC_BACKEND_VERSIONbackend-specificPinned AIC database version. If omitted, Dynamo uses the backend default
--aic-tp-sizeDYN_AIC_TP_SIZE1Tensor-parallel size to model in AIC

When enabled, the frontend’s embedded KV router predicts one expected prefill duration per admitted request, using the selected worker’s overlap-derived cached prefix. The router then decays only the oldest active prefill request on each worker for prompt-side load accounting.

Fault Tolerance

CLI ArgumentEnv VarDefaultDescription
--migration-limitDYN_MIGRATION_LIMIT0Max request migrations per worker disconnect. 0 = disabled
--active-decode-blocks-thresholdDYN_ACTIVE_DECODE_BLOCKS_THRESHOLD1.0KV cache utilization fraction (0.0–1.0) for busy detection. Pass None to disable
--active-prefill-tokens-thresholdDYN_ACTIVE_PREFILL_TOKENS_THRESHOLD10000000Absolute token count for prefill busy detection. Pass None to disable
--active-prefill-tokens-threshold-fracDYN_ACTIVE_PREFILL_TOKENS_THRESHOLD_FRAC10.0Fraction of max_num_batched_tokens for prefill busy detection. OR logic with absolute threshold. Pass None to disable

Model Discovery

CLI ArgumentEnv VarDefaultDescription
--namespaceDYN_NAMESPACEExact namespace for model discovery scoping
--namespace-prefixDYN_NAMESPACE_PREFIXNamespace prefix for discovery (e.g., ns matches ns, ns-abc123). Takes precedence over --namespace
--model-nameDYN_MODEL_NAMEOverride model name string
--model-pathDYN_MODEL_PATHPath to local model directory (for private/custom models)
--kv-cache-block-sizeDYN_KV_CACHE_BLOCK_SIZEKV cache block size override

Infrastructure

CLI ArgumentEnv VarDefaultDescription
--discovery-backendDYN_DISCOVERY_BACKENDetcdService discovery: kubernetes, etcd, file, mem
--request-planeDYN_REQUEST_PLANEtcpRequest distribution: tcp (fastest), nats, http
--event-planeDYN_EVENT_PLANEnatsEvent publishing: nats, zmq

KServe gRPC

CLI ArgumentEnv VarDefaultDescription
--kserve-grpc-server / --no-kserve-grpc-serverDYN_KSERVE_GRPC_SERVERfalseStart KServe gRPC v2 server
--grpc-metrics-portDYN_GRPC_METRICS_PORT8788HTTP metrics port for gRPC service

See the Frontend Guide for KServe message formats and integration details.

Monitoring

CLI ArgumentEnv VarDefaultDescription
--metrics-prefixDYN_METRICS_PREFIXdynamo_frontendPrefix for frontend Prometheus metrics
--dump-config-toDYN_DUMP_CONFIG_TODump resolved config to file path

Tokenizer

CLI ArgumentEnv VarDefaultDescription
--tokenizerDYN_TOKENIZERdefaultTokenizer: default (HuggingFace) or fastokens (high-performance Rust tokenizer). See Tokenizer

Experimental

CLI ArgumentEnv VarDefaultDescription
--enable-anthropic-apiDYN_ENABLE_ANTHROPIC_APIfalseEnable /v1/messages (Anthropic Messages API)
--dyn-chat-processorDYN_CHAT_PROCESSORdynamoChat processor: dynamo or vllm
--dyn-debug-perfDYN_DEBUG_PERFfalseLog per-function timing for preprocessing (vllm processor only)
--dyn-preprocess-workersDYN_PREPROCESS_WORKERS0Worker processes for CPU-bound preprocessing. 0 = main event loop (vllm processor only)
-i / --interactiveDYN_INTERACTIVEfalseInteractive text chat mode

HTTP Endpoints

The frontend exposes the following HTTP endpoints:

OpenAI-Compatible

MethodPathDescription
POST/v1/chat/completionsChat completions (streaming and non-streaming)
POST/v1/completionsText completions
POST/v1/embeddingsText embeddings
POST/v1/responsesResponses API
POST/v1/images/generationsImage generation
POST/v1/videos/generationsVideo generation
POST/v1/videos/generations/streamVideo generation (streaming)
GET/v1/modelsList available models

Anthropic (Experimental)

MethodPathDescription
POST/v1/messagesAnthropic Messages API (requires --enable-anthropic-api)
POST/v1/messages/count_tokensToken counting for Anthropic API

Infrastructure

MethodPathDescription
GET/healthHealth check
GET/liveLiveness check
GET/metricsPrometheus metrics
GET/openapi.jsonOpenAPI specification
GET/docsSwagger UI
POST/busy_thresholdSet busy thresholds
GET/busy_thresholdGet current busy thresholds

Endpoint Path Customization

All endpoint paths can be overridden via environment variables:

Env VarDefault Path
DYN_HTTP_SVC_CHAT_PATH_ENV/v1/chat/completions
DYN_HTTP_SVC_CMP_PATH_ENV/v1/completions
DYN_HTTP_SVC_EMB_PATH_ENV/v1/embeddings
DYN_HTTP_SVC_RESPONSES_PATH_ENV/v1/responses
DYN_HTTP_SVC_MODELS_PATH_ENV/v1/models
DYN_HTTP_SVC_ANTHROPIC_PATH_ENV/v1/messages
DYN_HTTP_SVC_HEALTH_PATH_ENV/health
DYN_HTTP_SVC_LIVE_PATH_ENV/live
DYN_HTTP_SVC_METRICS_PATH_ENV/metrics

Deprecated

CLI ArgumentEnv VarDescription
--router-durable-kv-eventsDYN_ROUTER_DURABLE_KV_EVENTSUse event-plane local indexer instead

See Also