DGDR Examples


Practical examples for deploying with DynamoGraphDeploymentRequest (DGDR). The DGDR workflow can use native AIC estimates, optional bootstrap profiling data, or live FPM warmup depending on the model/backend combination. For DGDR concepts, see the DGDR Reference. For profiling concepts, see the Profiler Guide.


Minimal DGDR with AIC (Fastest)

The simplest way to generate a deployment is from native AIC estimates. This path uses AI Configurator (AIC) for offline profiling, which takes 20-30 seconds instead of hours:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-aic
spec:
  model: Qwen/Qwen3-32B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1" # dynamo-frontend for Dynamo < 1.1.0

Deploy:

export NAMESPACE=your-namespace
# Save the manifest above as sla-aic.yaml first.
kubectl apply -f sla-aic.yaml -n $NAMESPACE

Online Profiling (Real Measurements)

Standard online profiling runs measurements on real GPUs for more accurate results. It takes 2-4 hours:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-online
spec:
  model: meta-llama/Llama-3.3-70B-Instruct
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1" # dynamo-frontend for Dynamo < 1.1.0
  searchStrategy: thorough

Deploy:

# Save the manifest above as sla-online.yaml first.
kubectl apply -f sla-online.yaml -n $NAMESPACE

Note: Starting with Dynamo 1.0.0 (DGDR API version v1beta1), DGDR uses structured spec fields (e.g., spec.workload, spec.sla, spec.hardware) instead of the nested profilingConfig.config blob used in v1alpha1.

Planner-Enabled DGDR

Set spec.features.planner to enable Planner generation in the final DGD. DGDR passes this object as PlannerConfig to the Planner service; see the Planner Guide for available fields.

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-planner
spec:
  model: Qwen/Qwen3-0.6B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1" # dynamo-frontend for Dynamo < 1.1.0
  features:
    planner:
      mode: disagg
      backend: vllm

spec.overrides.dgd is not required to enable Planner; use it only when the generated DGD needs additional customization.

Additional DGDR Patterns

MoE Models (SGLang)

For Mixture-of-Experts models like DeepSeek-R1, use the SGLang backend:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-moe
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1" # dynamo-frontend for Dynamo < 1.1.0

Deploy:

# Save the manifest above as sla-moe.yaml first.
kubectl apply -f sla-moe.yaml -n $NAMESPACE

Customizing the Generated DGD

Use spec.overrides.dgd to provide a partial DynamoGraphDeployment that is merged into the profiler-generated deployment:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: deepseek-r1
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1" # dynamo-frontend for Dynamo < 1.1.0
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        envs:
          - name: CUSTOM_WORKER_ENV
            value: "enabled"

DGDR merges the override into the generated DGD after profiling selects a configuration. The controller automatically injects spec.model and spec.backend into the final configuration.

Inline Configuration (Simple Use Cases)

For simple use cases without a custom DGD config, provide the configuration directly in the v1beta1 DGDR spec fields. The profiler auto-generates a basic DGD configuration:

spec:
  workload:
    isl: 8000
    osl: 200

  sla:
    ttft: 200.0
    itl: 10.0

  hardware:
    gpuSku: h200_sxm

  searchStrategy: rapid
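The fragment above shows only the spec fields that carry the inline configuration. As a rough sketch, a complete request might combine it with the top-level fields used in the earlier examples. The name sla-inline is a hypothetical placeholder, and the model, backend, and image are simply reused from the first example rather than recommended for this workload:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-inline # hypothetical name, chosen for illustration
spec:
  # Reused from the minimal AIC example; substitute your own values.
  model: Qwen/Qwen3-32B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"

  # Inline configuration, copied unchanged from the fragment above.
  workload:
    isl: 8000 # input sequence length
    osl: 200 # output sequence length

  sla:
    ttft: 200.0 # time-to-first-token target
    itl: 10.0 # inter-token latency target

  hardware:
    gpuSku: h200_sxm

  searchStrategy: rapid
```

Saved to a file, this would be applied with kubectl apply -f in the same way as the other manifests on this page.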

Simulation with Mocker

Deploy a mocker backend that simulates GPU timing behavior without real GPUs. Useful for:

  • Large-scale experiments without GPU resources
  • Testing profiling behavior and infrastructure
  • Validating deployment configurations

spec:
  model: <model-name>
  backend: trtllm # Real backend for profiling
  features:
    mocker:
      enabled: true # Deploy mocker instead of real backend

  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1" # dynamo-frontend for Dynamo < 1.1.0

Profiling runs against the real backend (via GPUs or AIC). The mocker deployment then uses profiling data to simulate realistic timing.

Model Cache PVC (0.8.1+)

For large models, use a pre-populated PVC instead of downloading from HuggingFace.

See SLA-Driven Profiling for configuration details.

Advanced DGDR Patterns

Review Before Deploy (autoApply: false)

Disable auto-deployment to inspect the generated DGD:

spec:
  autoApply: false
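For context, a complete request with auto-deployment disabled might look like the sketch below. It assumes autoApply can simply be added to the spec of the minimal AIC example. The name sla-aic is kept so that it matches the name the extraction command in this section queries, and every other field is copied from that first manifest:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-aic
spec:
  model: Qwen/Qwen3-32B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"
  autoApply: false # generate the DGD but do not deploy it automatically
```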

After profiling completes:

# Extract and review generated DGD
kubectl get dgdr sla-aic -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.selectedConfig}' > my-dgd.yaml

# Review and modify as needed
vi my-dgd.yaml

# Deploy manually
kubectl apply -f my-dgd.yaml -n $NAMESPACE

Profiling Artifacts with PVC

Save detailed profiling artifacts (plots, logs, raw data) to a PVC:

spec:
  workload:
    isl: 3000
    osl: 150

  sla:
    ttft: 200
    itl: 20

Setup:

export NAMESPACE=your-namespace
deploy/utils/setup_benchmarking_resources.sh

Access results:

kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
kubectl cp $NAMESPACE/pvc-access-pod:/data ./profiling-results
kubectl delete pod pvc-access-pod -n $NAMESPACE