DGDR Examples

Practical examples for deploying with DynamoGraphDeploymentRequest (DGDR). The DGDR workflow can use native AIC estimates, optional bootstrap profiling data, or live FPM warmup depending on the model/backend combination. For DGDR concepts, see the DGDR Reference. For profiling concepts, see the Profiler Guide.

Minimal DGDR with AIC (Fastest)

The simplest way to generate a deployment is from native AIC estimates. This path uses AI Configurator (AIC) for offline profiling, which takes 20-30 seconds instead of hours:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-aic
spec:
  model: Qwen/Qwen3-32B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"  # dynamo-frontend for Dynamo < 1.1.0

Deploy:

$ export NAMESPACE=your-namespace
$ # Save the manifest above as sla-aic.yaml first.
$ kubectl apply -f sla-aic.yaml -n $NAMESPACE
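
After applying, it can help to confirm that the request object exists and to look at whatever status the controller has reported. The sketch below uses only standard kubectl get options and the dgdr resource name that appears in the Review Before Deploy section of this page. The helper name dgdr_status is hypothetical, and the contents of the status block are not described here, so treat the output as something to inspect rather than parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): show a DGDR and its current status.
# Usage: dgdr_status <name> [namespace]
dgdr_status() {
  local name="$1"
  # Fall back to $NAMESPACE, then to "default", when no namespace is given.
  local ns="${2:-${NAMESPACE:-default}}"
  # One-line summary of the request object.
  kubectl get dgdr "$name" -n "$ns"
  # Full object, including any status reported so far.
  kubectl get dgdr "$name" -n "$ns" -o yaml
}
```

For the manifest above, this would be called as dgdr_status sla-aic "$NAMESPACE".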

Online Profiling (Real Measurements)

Standard online profiling runs real GPU measurements for more accurate results and takes 2-4 hours:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-online
spec:
  model: meta-llama/Llama-3.3-70B-Instruct
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"  # dynamo-frontend for Dynamo < 1.1.0
  searchStrategy: thorough

Deploy:

$ # Save the manifest above as sla-online.yaml first.
$ kubectl apply -f sla-online.yaml -n $NAMESPACE

Note: Starting with Dynamo 1.0.0 (DGDR API version v1beta1), DGDR uses structured spec fields (e.g., spec.workload, spec.sla, spec.hardware) instead of the nested profilingConfig.config blob used in v1alpha1.

Planner-Enabled DGDR

Set spec.features.planner to enable Planner generation in the final DGD. DGDR passes this object as PlannerConfig to the Planner service; see the Planner Guide for available fields.

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-planner
spec:
  model: Qwen/Qwen3-0.6B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"  # dynamo-frontend for Dynamo < 1.1.0
  features:
    planner:
      mode: disagg
      backend: vllm

spec.overrides.dgd is not required to enable Planner; use it only when the generated DGD needs additional customization.

Additional DGDR Patterns

MoE Models (SGLang)

For Mixture-of-Experts models like DeepSeek-R1, use the SGLang backend:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-moe
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"  # dynamo-frontend for Dynamo < 1.1.0

Deploy:

$ # Save the manifest above as sla-moe.yaml first.
$ kubectl apply -f sla-moe.yaml -n $NAMESPACE

Customizing the Generated DGD

Use spec.overrides.dgd to provide a partial DynamoGraphDeployment that is merged into the profiler-generated deployment:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: deepseek-r1
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"  # dynamo-frontend for Dynamo < 1.1.0
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        envs:
          - name: CUSTOM_WORKER_ENV
            value: "enabled"

DGDR merges the override into the generated DGD after profiling selects a configuration. The controller automatically injects spec.model and spec.backend into the final configuration.

Inline Configuration (Simple Use Cases)

For simple use cases without a custom DGD config, provide the configuration directly in the v1beta1 DGDR spec fields. The profiler auto-generates a basic DGD configuration:

spec:
  workload:
    isl: 8000
    osl: 200

  sla:
    ttft: 200.0
    itl: 10.0

  hardware:
    gpuSku: h200_sxm

  searchStrategy: rapid
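
The fragment above shows only the inline fields. As a sketch of how they might sit in a complete request, the here-document below writes a full manifest that combines them with the apiVersion, kind, model, backend, and image values from the minimal example earlier on this page. The metadata name sla-inline and the file name are illustrative assumptions, and the model is simply reused from the earlier example; pick values that match your own deployment.

```shell
#!/usr/bin/env bash
# Write a complete DGDR manifest that embeds the inline spec fields.
# metadata.name and the output file name are illustrative; model, backend,
# and image are copied from the minimal AIC example on this page.
cat > sla-inline.yaml <<'EOF'
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sla-inline
spec:
  model: Qwen/Qwen3-32B
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"
  workload:
    isl: 8000
    osl: 200
  sla:
    ttft: 200.0
    itl: 10.0
  hardware:
    gpuSku: h200_sxm
  searchStrategy: rapid
EOF
```

The resulting file can then be applied the same way as the other examples, with kubectl apply -f sla-inline.yaml -n $NAMESPACE.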

Simulation with Mocker

Deploy a mocker backend that simulates GPU timing behavior without real GPUs. This is useful for:

Large-scale experiments without GPU resources
Testing profiling behavior and infrastructure
Validating deployment configurations

spec:
  model: <model-name>
  backend: trtllm  # Real backend for profiling
  features:
    mocker:
      enabled: true  # Deploy mocker instead of real backend

  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.2.1"  # dynamo-frontend for Dynamo < 1.1.0

Profiling runs against the real backend (via GPUs or AIC). The mocker deployment then uses profiling data to simulate realistic timing.

Model Cache PVC (0.8.1+)

For large models, use a pre-populated PVC instead of downloading from HuggingFace. See SLA-Driven Profiling for configuration details.

Advanced DGDR Patterns

Review Before Deploy (autoApply: false)

Disable auto-deployment to inspect the generated DGD:

spec:
  autoApply: false

After profiling completes:

$ # Extract and review generated DGD
$ kubectl get dgdr sla-aic -n $NAMESPACE \
>   -o jsonpath='{.status.profilingResults.selectedConfig}' > my-dgd.yaml

$ # Review and modify as needed
$ vi my-dgd.yaml

$ # Deploy manually
$ kubectl apply -f my-dgd.yaml -n $NAMESPACE
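
If profiling has not finished, the jsonpath query above can produce an empty file, and applying that file would fail or do nothing useful. The helper below is a minimal sketch of a sanity check to run between the extract and deploy steps. The function name check_dgd_file is hypothetical, and the check is deliberately coarse: it only confirms the file is non-empty and mentions the DynamoGraphDeployment kind, not that the manifest is valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: coarse sanity check on an extracted DGD file.
# Returns non-zero if the file is missing, empty, or never mentions the
# DynamoGraphDeployment kind. It does not validate the manifest itself.
check_dgd_file() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty" >&2
    return 1
  fi
  if ! grep -q 'DynamoGraphDeployment' "$file"; then
    echo "error: $file does not look like a DynamoGraphDeployment" >&2
    return 1
  fi
  echo "ok: $file"
}
```

One way to use it is check_dgd_file my-dgd.yaml && kubectl apply --dry-run=client -f my-dgd.yaml -n $NAMESPACE, which previews the apply using kubectl's standard client-side dry run before the real deployment.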

Profiling Artifacts with PVC

Save detailed profiling artifacts (plots, logs, raw data) to a PVC:

spec:
  workload:
    isl: 3000
    osl: 150

  sla:
    ttft: 200
    itl: 20

Setup:

$ export NAMESPACE=your-namespace
$ deploy/utils/setup_benchmarking_resources.sh

Access results:

$ kubectl apply -f deploy/utils/manifests/pvc-access-pod.yaml -n $NAMESPACE
$ kubectl wait --for=condition=Ready pod/pvc-access-pod -n $NAMESPACE --timeout=60s
$ kubectl cp $NAMESPACE/pvc-access-pod:/data ./profiling-results
$ kubectl delete pod pvc-access-pod -n $NAMESPACE

Related Documentation

DGDR Reference: DGDR field reference and lifecycle
Profiler Guide: Profiling workflow
