The Dynamo Planner is an autoscaling controller that adjusts prefill and decode engine replica counts at runtime to meet latency SLAs. It reads traffic signals (Prometheus metrics or load predictor output) and engine performance profiles to decide when to scale up or down.
For a quick overview, see the Planner overview. For architecture internals, see Planner Design.
The planner supports two scaling modes that can be used independently or together:
Throughput-based scaling (enable_throughput_scaling: true): Uses pre-deployment engine interpolation data and traffic prediction to plan capacity. Best for stable, predictable workloads. Requires profiling data generated by the Profiler.
Load-based scaling (enable_load_scaling: true): Uses real-time per-worker engine metrics and online regression. Best for bursty or unpredictable traffic. Does not require profiling data. Requires the KV Router (see Current Limitations).
When to use which:
throughput_adjustment_interval.
The planner is configured via a PlannerConfig JSON/YAML object. When using the profiler, this object is placed under the features.planner section of the DGDR spec.
At least one scaling mode must be enabled.
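As a rough illustration of the rule above, the following Python sketch builds a PlannerConfig-like dictionary, rejects a config with both modes disabled, and nests the result under features.planner. The helper names (build_planner_config, wrap_for_dgdr) are hypothetical and not part of the planner; only the enable_throughput_scaling, enable_load_scaling, and throughput_adjustment_interval keys and the features.planner location come from this guide, and the interval value shown is a placeholder.

```python
import json


def build_planner_config(enable_throughput_scaling, enable_load_scaling, **extra):
    """Return a PlannerConfig-like dict (hypothetical helper, not a planner API)."""
    # Mirrors the documented rule: at least one scaling mode must be enabled.
    if not (enable_throughput_scaling or enable_load_scaling):
        raise ValueError("At least one scaling mode must be enabled.")
    config = {
        "enable_throughput_scaling": enable_throughput_scaling,
        "enable_load_scaling": enable_load_scaling,
    }
    # Any further fields are passed through unchanged and unvalidated.
    config.update(extra)
    return config


def wrap_for_dgdr(planner_config):
    """Nest the planner config under features.planner; the rest of the DGDR spec is omitted."""
    return {"features": {"planner": planner_config}}


if __name__ == "__main__":
    # 60 is a placeholder; this guide does not state the unit or default here.
    cfg = build_planner_config(True, True, throughput_adjustment_interval=60)
    print(json.dumps(wrap_for_dgdr(cfg), indent=2))
```

Validating before the config is embedded in the spec surfaces a misconfiguration early, rather than at planner startup.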
When throughput-based scaling is enabled, the planner needs interpolation curves that map ISL to TTFT (prefill) and KV-cache utilization to ITL (decode). The profiler generates this data based on the pre_deployment_sweeping_mode setting. See the Profiler Guide for details on how this data is produced.
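To make the idea of interpolation curves concrete, here is a minimal Python sketch that looks up TTFT from ISL and ITL from KV-cache utilization using piecewise-linear interpolation, then checks both against latency targets. Linear interpolation, the function names, and the sample points are all assumptions made for illustration; the planner's actual interpolation method and data format are described in the Profiler Guide and may differ.

```python
from bisect import bisect_right


def interpolate(points, x):
    """Piecewise-linear interpolation over sorted (x, y) points, clamped at both ends."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # index of the first sample strictly greater than x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up sample points for illustration only; real curves come from the profiler.
PREFILL_ISL_TO_TTFT_MS = [(1000, 100.0), (2000, 200.0), (4000, 500.0)]
DECODE_KV_UTIL_TO_ITL_MS = [(0.2, 10.0), (0.5, 15.0), (0.9, 30.0)]


def meets_sla(isl, kv_util, ttft_target_ms, itl_target_ms):
    """Return True if the interpolated TTFT and ITL are both within their targets."""
    ttft = interpolate(PREFILL_ISL_TO_TTFT_MS, isl)
    itl = interpolate(DECODE_KV_UTIL_TO_ITL_MS, kv_util)
    return ttft <= ttft_target_ms and itl <= itl_target_ms
```

With these sample curves, an ISL of 1500 tokens falls halfway between the first two prefill points, so the interpolated TTFT is 150 ms.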
When the profiler runs with planner enabled, it:
Writes the PlannerConfig and profiling data into separate Kubernetes ConfigMaps
The planner receives its config via --config /path/to/planner_config.json, which is mounted from the planner-config-XXXX ConfigMap. Profiling data is mounted from the planner-profile-data-XXXX ConfigMap.
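For readers who want to see what consuming the mounted file could look like, the sketch below parses a --config flag and loads the JSON it points to. It is not the planner's actual entry point: the function names are hypothetical, and only the --config flag and the JSON file format are taken from the text above.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse a --config flag like the one described above (hypothetical entry point)."""
    parser = argparse.ArgumentParser(description="Sketch of a planner-style CLI.")
    parser.add_argument("--config", required=True,
                        help="Path to the planner config JSON file.")
    return parser.parse_args(argv)


def load_planner_config(path):
    """Read the JSON config from a file, such as one mounted from a ConfigMap."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    # A planner config is expected to be a JSON object, not a list or scalar.
    if not isinstance(config, dict):
        raise ValueError("planner config must be a JSON object")
    return config
```

Calling parse_args(["--config", "/path/to/planner_config.json"]) and passing the resulting config attribute to load_planner_config returns the parsed dictionary, provided the file exists at that path.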
See the Profiler Guide for the full profiling workflow and how to configure pre-deployment sweeping.
If you want one public endpoint for a model but multiple private DGDs optimized for different request classes, use a hierarchical deployment:
Frontend, GlobalRouter, and GlobalPlanner
In the current workflow, run profiling independently for each intended pool, then compose the final control DGD plus pool DGDs manually. See the Global Planner Guide.