Practical examples for deploying the Planner with throughput-based scaling. All examples below use the DGDR workflow with pre-deployment profiling. For deployment concepts, see the Planner Guide. For a quick overview, see the Planner README.
This is the simplest way to deploy with the Planner. It uses AI Configurator for offline profiling, which takes 20-30 seconds instead of hours:
Deploy:
Standard online profiling runs real GPU measurements for more accurate results and takes 2-4 hours:
Deploy:
Available sample DGDRs in components/src/dynamo/profiler/deploy/:
- profile_sla_dgdr.yaml: Standard online profiling for dense models
- profile_sla_aic_dgdr.yaml: Fast offline profiling using AI Configurator
- profile_sla_moe_dgdr.yaml: Online profiling for MoE models (SGLang)

Note: Starting with Dynamo 1.0.0 (DGDR API version v1beta1), DGDR fields use structured spec fields (e.g., spec.workload, spec.sla, spec.hardware) instead of the nested profilingConfig.config blob used in v1alpha1.
For Mixture-of-Experts models like DeepSeek-R1, use the SGLang backend:
Deploy:
Reference an existing DynamoGraphDeployment config via ConfigMap:
Step 1: Create ConfigMap from your DGD config:
Step 2: Reference it in your DGDR:
The profiler uses the DGD config from the ConfigMap as a base template, then optimizes it based on your SLA targets. The controller automatically injects spec.model and spec.backend into the final configuration.
For simple use cases without a custom DGD config, provide the configuration directly in the v1beta1 DGDR spec fields. The profiler auto-generates a basic DGD configuration:
Deploy a mocker backend that simulates GPU timing behavior without real GPUs. Useful for:
Profiling runs against the real backend (on GPUs or via AIC). The mocker deployment then uses the resulting profiling data to simulate realistic timing.
For large models, use a pre-populated PVC instead of downloading from HuggingFace:
See SLA-Driven Profiling for configuration details.
Pre-load predictors with historical request patterns before live traffic:
The trace file should be in mooncake-style JSONL format with request-count, ISL, and OSL samples.
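To illustrate the kind of data such a trace carries, the sketch below aggregates JSONL records into per-interval request count, mean ISL, and mean OSL. The field names (timestamp in milliseconds, input_length, output_length) are assumed from the publicly released Mooncake traces. The helper name summarize_trace and the 60-second interval are hypothetical, and the Planner's own loader may expect a different layout.

```python
import json
from collections import defaultdict


def summarize_trace(lines, interval_s=60):
    """Bucket JSONL trace records into fixed-length intervals.

    Assumes each record has `timestamp` (ms), `input_length`, and
    `output_length`; these field names are an assumption, not a
    confirmed Planner schema.
    Returns {interval_index: (request_count, mean_isl, mean_osl)}.
    """
    buckets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        rec = json.loads(line)
        idx = int(rec["timestamp"] // (interval_s * 1000))
        buckets[idx].append((rec["input_length"], rec["output_length"]))

    summary = {}
    for idx, samples in sorted(buckets.items()):
        n = len(samples)
        mean_isl = sum(s[0] for s in samples) / n
        mean_osl = sum(s[1] for s in samples) / n
        summary[idx] = (n, mean_isl, mean_osl)
    return summary


# Hypothetical sample records, one JSON object per line.
sample_trace = [
    json.dumps({"timestamp": 0, "input_length": 3000, "output_length": 100}),
    json.dumps({"timestamp": 30_000, "input_length": 1000, "output_length": 300}),
    json.dumps({"timestamp": 61_000, "input_length": 2000, "output_length": 150}),
]
trace_summary = summarize_trace(sample_trace)
```

With a real file, the same function can be called on an open file handle, since iterating a text file yields one line at a time.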
For workloads with rapid changes, tune the Kalman filter:
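As background on what tuning changes, here is a minimal scalar Kalman filter sketch, not the Planner's implementation. It assumes a constant-level (random walk) load model. The parameter names process_var and measurement_var are hypothetical and do not correspond to actual Planner flags. Raising the process variance relative to the measurement variance makes the estimate follow sudden load changes more quickly, at the cost of more noise.

```python
def kalman_track(observations, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter over a sequence of load observations.

    process_var: how much the true load is assumed to drift per step.
    measurement_var: how noisy each observation is assumed to be.
    Both names are illustrative, not Planner configuration keys.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        p = p + process_var              # predict: uncertainty grows
        k = p / (p + measurement_var)    # Kalman gain in [0, 1]
        x = x + k * (z - x)              # update: move toward the observation
        p = (1.0 - k) * p                # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Load jumps from 10 to 50 requests per interval.
load = [10.0] * 5 + [50.0] * 5
slow = kalman_track(load, process_var=0.01, measurement_var=10.0, x0=10.0)
fast = kalman_track(load, process_var=10.0, measurement_var=1.0, x0=10.0)
```

In this sketch the fast configuration ends close to the new level of 50 after five steps, while the slow one still lags well behind it.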
For workloads with daily/weekly patterns:
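The idea behind seasonal prediction can be sketched with a very simple approach: forecast each future interval as the average of past values at the same phase of the cycle. This is only an illustration under that assumption and is not the predictor the Planner ships. The function name seasonal_forecast and its parameters are hypothetical.

```python
def seasonal_forecast(history, period, horizon):
    """Forecast by averaging past values at the same phase of the cycle.

    history: past load values, with index 0 at the start of a cycle.
    period: number of intervals per cycle (e.g. intervals per day).
    horizon: number of future intervals to forecast.
    """
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    n = len(history)
    forecasts = []
    for step in range(horizon):
        phase = (n + step) % period
        same_phase = history[phase::period]  # every value at this phase
        forecasts.append(sum(same_phase) / len(same_phase))
    return forecasts


# Two hypothetical "days" of four intervals each.
history = [10, 40, 80, 20, 12, 44, 76, 24]
next_day = seasonal_forecast(history, period=4, horizon=4)
```

A production predictor would typically also model trend and changing seasonality, which this averaging scheme ignores.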
For non-Kubernetes environments, use the VirtualConnector to communicate scaling decisions:
See components/planner/test/test_virtual_connector.py for a full working example.
Pass planner-specific settings through the DGDR:
Disable auto-deployment to inspect the generated DGD:
After profiling completes:
Save detailed profiling artifacts (plots, logs, raw data) to a PVC:
Setup:
Access results: