--router-mode | DYN_ROUTER_MODE | round-robin | Routing strategy: round-robin, random, kv, direct |
--router-kv-overlap-score-weight | DYN_ROUTER_KV_OVERLAP_SCORE_WEIGHT | 1.0 | Weight for KV cache overlap in worker scoring. Higher = prefer cache reuse |
--router-temperature | DYN_ROUTER_TEMPERATURE | 0.0 | Softmax temperature for worker sampling. 0 = deterministic |
--router-kv-events / --no-router-kv-events | DYN_ROUTER_USE_KV_EVENTS | true | Enable KV cache state events from workers. Disable for prediction-based routing |
--router-ttl-secs | DYN_ROUTER_TTL_SECS | 120.0 | Block TTL in seconds when KV events are disabled |
--router-max-tree-size | DYN_ROUTER_MAX_TREE_SIZE | 1048576 | Max radix tree size before pruning (no-events mode) |
--router-prune-target-ratio | DYN_ROUTER_PRUNE_TARGET_RATIO | 0.8 | Target size ratio after pruning (no-events mode) |
--router-replica-sync / --no-router-replica-sync | DYN_ROUTER_REPLICA_SYNC | false | Sync state across multiple router instances |
--router-snapshot-threshold | DYN_ROUTER_SNAPSHOT_THRESHOLD | 1000000 | Number of messages before a snapshot is triggered |
--router-reset-states / --no-router-reset-states | DYN_ROUTER_RESET_STATES | false | Reset router state on startup. Warning: affects existing replicas |
--router-track-active-blocks / --no-router-track-active-blocks | DYN_ROUTER_TRACK_ACTIVE_BLOCKS | true | Track blocks used by in-progress requests for load balancing |
--router-assume-kv-reuse / --no-router-assume-kv-reuse | DYN_ROUTER_ASSUME_KV_REUSE | true | Assume KV cache reuse when tracking active blocks |
--router-track-output-blocks / --no-router-track-output-blocks | DYN_ROUTER_TRACK_OUTPUT_BLOCKS | false | Track output blocks with fractional decay during generation |
--router-event-threads | DYN_ROUTER_EVENT_THREADS | 4 | Event processing threads. >1 enables concurrent radix tree |
--router-queue-threshold | DYN_ROUTER_QUEUE_THRESHOLD | 2.0 | Queue threshold, expressed as a fraction of prefill capacity. Enables priority scheduling |
--router-queue-policy | DYN_ROUTER_QUEUE_POLICY | fcfs | Queue scheduling policy: fcfs (favors tail TTFT) or wspt (favors average TTFT) |
--enable-cache-control / --no-enable-cache-control | DYN_ENABLE_CACHE_CONTROL | false | Enable TTL-based cache pinning (requires --router-mode=kv) |
--decode-fallback / --no-decode-fallback | DYN_DECODE_FALLBACK | false | Fall back to aggregated mode when prefill workers are unavailable |
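
To make the interaction between `--router-kv-overlap-score-weight` and `--router-temperature` concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the router's actual implementation: the cost formula (weighted prefill blocks plus active blocks, lower is better) and the helper names `worker_costs` and `select_worker` are hypothetical. Only the two environment variable names and their defaults come from the table above.

```python
import math
import os
import random


def worker_costs(overlap_blocks, active_blocks, request_blocks, overlap_weight=1.0):
    """Return a cost per worker; lower is better.

    ASSUMPTION: this formula is illustrative only. Blocks not already cached
    on a worker must be prefilled, and that work is scaled by overlap_weight.
    """
    costs = {}
    for worker, overlap in overlap_blocks.items():
        prefill_blocks = max(request_blocks - overlap, 0)
        costs[worker] = overlap_weight * prefill_blocks + active_blocks.get(worker, 0)
    return costs


def select_worker(costs, temperature=0.0, rng=random):
    """Pick a worker from a cost map.

    temperature == 0 picks the lowest cost deterministically (ties broken by
    worker name). Otherwise sample from a softmax over negated costs.
    """
    workers = sorted(costs)
    if temperature == 0.0:
        return min(workers, key=costs.get)
    lowest = min(costs.values())
    # Subtracting the lowest cost keeps every exponent <= 0, so exp() cannot overflow.
    weights = [math.exp(-(costs[w] - lowest) / temperature) for w in workers]
    return rng.choices(workers, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Defaults mirror the table: weight 1.0, temperature 0.0.
    weight = float(os.environ.get("DYN_ROUTER_KV_OVERLAP_SCORE_WEIGHT", "1.0"))
    temperature = float(os.environ.get("DYN_ROUTER_TEMPERATURE", "0.0"))
    overlap = {"w0": 8, "w1": 2}   # cached blocks matching the request
    active = {"w0": 3, "w1": 0}    # blocks held by in-progress requests
    costs = worker_costs(overlap, active, request_blocks=10, overlap_weight=weight)
    print(select_worker(costs, temperature=temperature))
```

In this sketch, raising the weight makes the worker with more cached blocks cheaper relative to an idle worker, while setting the weight to 0 reduces selection to pure load balancing. A nonzero temperature spreads requests across workers with similar costs instead of always choosing the single best one.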