DGDR Reference


A DynamoGraphDeploymentRequest (DGDR) is Dynamo’s deploy-by-intent API. You describe what you want to run and your performance targets; the profiler determines the optimal configuration and creates the live deployment.

For a step-by-step walkthrough of deploying your model — including strategy selection, model caching, planner setup, and common pitfalls — see the Model Deployment Guide.

DGDR vs DGD

Dynamo provides two Custom Resources for deploying inference graphs:

|  | DGDR (recommended) | DGD (manual) |
|---|---|---|
| You provide | Model + optional SLA targets | Full deployment spec (parallelism, replicas, resource limits, etc.) |
| Profiling | Automated — sweeps configurations to find optimal setup | None — you bring your own config |
| Hardware portability | Adapts to whatever GPUs are in your cluster | Tied to the hardware you configured for |
| Best for | Most deployments, SLA-driven optimization | Known-good configs, pinned recipes |

When to use DGD instead: Use DGD when you have a hand-crafted configuration for a specific model/hardware combination (e.g., from recipes/). Such configs can outperform the profiler's output on the exact setup they were tuned for, but they require you to understand which parallelism parameters (TP, PP, EP) are appropriate, and they don't generalize across different hardware.

For DGD deployment details, see Creating Deployments.

Spec Reference

Minimal Example

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model
spec:
  model: Qwen/Qwen3-0.6B
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.0.0"
```

Field Reference

| Field | Required | Default | Purpose |
|---|---|---|---|
| `model` | Yes | | HuggingFace model ID (e.g. `Qwen/Qwen3-0.6B`) |
| `image` | No | | Container image for the profiling job. Dynamo >= 1.1.0: use `dynamo-planner`; earlier versions: use `dynamo-frontend`. |
| `backend` | No | `auto` | Inference engine: `auto`, `vllm`, `sglang`, `trtllm` |
| `searchStrategy` | No | `rapid` | Profiling depth: `rapid` (AIC simulation, ~30s) or `thorough` (real GPU, 2–4h) |
| `autoApply` | No | `true` | Automatically deploy the profiler's recommended config |
| `sla.ttft` | No | | Target time to first token (ms) |
| `sla.itl` | No | | Target inter-token latency (ms) |
| `sla.e2eLatency` | No | | Target end-to-end latency (ms). Cannot be combined with explicit `ttft`/`itl`. |
| `workload.isl` | No | 4000 | Expected average input sequence length |
| `workload.osl` | No | 1000 | Expected average output sequence length |
| `workload.requestRate` | No | | Target requests per second |
| `workload.concurrency` | No | | Target concurrent requests |
| `hardware.gpuSku` | No | auto-detected | GPU SKU (see SKU Format) |
| `hardware.vramMb` | No | auto-detected | GPU VRAM in MB |
| `hardware.totalGpus` | No | auto-detected (capped at 32) | Total GPUs available to the deployment |
| `hardware.numGpusPerNode` | No | auto-detected | GPUs per node |
| `hardware.interconnect` | No | auto-detected | Interconnect type |
| `hardware.rdma` | No | auto-detected | Whether RDMA is available |
| `modelCache.pvcName` | No | | Name of a ReadWriteMany PVC containing cached model weights |
| `modelCache.pvcModelPath` | No | | Path to the model directory inside the PVC |
| `modelCache.pvcMountPath` | No | `/opt/model-cache` | Mount path inside containers |
| `features.planner` | No | disabled | Enable the SLA-aware Planner (raw JSON config) |
| `features.mocker` | No | disabled | Enable mocker mode for testing |
| `overrides.profilingJob` | No | | `batchv1.JobSpec` overrides for the profiling job (e.g., tolerations) |
| `overrides.dgd` | No | | Raw DGD override base applied to the generated deployment |

For the complete CRD spec, see the API Reference.
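
Putting several of these fields together, a fuller request might look like the sketch below. The field names follow the table above; the SLA numbers and PVC name are illustrative placeholders, not recommendations:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model
spec:
  model: Qwen/Qwen3-0.6B
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.0.0"
  backend: vllm              # or omit to keep the default "auto"
  searchStrategy: thorough   # real-GPU sweep, 2-4h
  autoApply: true            # deploy the recommended config automatically
  sla:
    ttft: 200                # target time to first token (ms) -- placeholder value
    itl: 10                  # target inter-token latency (ms) -- placeholder value
  workload:
    isl: 4000                # expected average input sequence length
    osl: 1000                # expected average output sequence length
  modelCache:
    pvcName: model-cache     # hypothetical ReadWriteMany PVC with cached weights
    pvcModelPath: Qwen/Qwen3-0.6B
```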

SKU Format

When providing hardware configuration manually, use the lowercase, underscore-separated SKU format:

| Correct | Incorrect |
|---|---|
| `h100_sxm` | `H100-SXM5-80GB` |
| `h200_sxm` | `H200-SXM-141GB` |
| `a100_sxm` | `A100-SXM4-80GB` |
| `l40s` | `L40S` |

All supported values: gb200_sxm, b200_sxm, h200_sxm, h100_sxm, h100_pcie, a100_sxm, a100_pcie, l40s, l40, l4, v100_sxm, v100_pcie, t4, mi200, mi300.

Not all SKUs are supported by the AIC profiler for rapid mode. See AIC Support Matrix for details.
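
When auto-detection is not desired, the hardware block can be pinned explicitly. A minimal sketch with illustrative values:

```yaml
spec:
  model: Qwen/Qwen3-0.6B
  hardware:
    gpuSku: h100_sxm     # lowercase underscore format
    numGpusPerNode: 8
    totalGpus: 16
    rdma: true
```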

Lifecycle

When you create a DGDR, it progresses through these phases:

| Phase | What is happening |
|---|---|
| `Pending` | Spec validated; operator is discovering GPU hardware and preparing the profiling job |
| `Profiling` | Profiling job running — sub-phases: `Initializing`, `SweepingPrefill`, `SweepingDecode`, `SelectingConfig`, `BuildingCurves`, `GeneratingDGD`, `Done` |
| `Ready` | Profiling complete; optimal config stored in `.status.profilingResults.selectedConfig`. Terminal state when `autoApply: false`. |
| `Deploying` | Creating the DynamoGraphDeployment (only when `autoApply: true`) |
| `Deployed` | DGD is running and healthy |
| `Failed` | Unrecoverable error — check events and conditions for details |
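
For scripting, one way to block until profiling completes is `kubectl wait` with a JSONPath condition. This is a sketch, assuming kubectl v1.23+ and that the phase above is surfaced at `.status.phase`:

```bash
# Block until the DGDR reaches Ready (terminal phase when autoApply: false)
kubectl wait dgdr/my-model -n $NAMESPACE \
  --for=jsonpath='{.status.phase}'=Ready --timeout=4h
```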

Conditions

The operator maintains these conditions on the DGDR status:

| Condition | Meaning |
|---|---|
| `Validation` | Spec validation passed or failed |
| `Profiling` | Profiling job is running, succeeded, or failed |
| `SpecGenerated` | Generated DGD spec is available |
| `DeploymentReady` | DGD is deployed and healthy |
| `Succeeded` | Aggregate condition — true when the DGDR has reached its target state |
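
These conditions can be read directly with a JSONPath filter; a sketch using the condition types from the table:

```bash
# Aggregate success signal: "True" once the DGDR reaches its target state
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].status}'

# Human-readable detail on the Profiling condition
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.conditions[?(@.type=="Profiling")].message}'
```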

Monitoring

```bash
# Watch phase transitions
kubectl get dgdr my-model -n $NAMESPACE -w

# Detailed status, conditions, and events
kubectl describe dgdr my-model -n $NAMESPACE

# Profiling sub-phase
kubectl get dgdr my-model -n $NAMESPACE -o jsonpath='{.status.profilingPhase}'

# Profiling job logs
kubectl get pods -n $NAMESPACE -l nvidia.com/dgdr-name=my-model
kubectl logs -f <profiling-pod-name> -n $NAMESPACE

# View generated DGD spec (when autoApply: false)
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.selectedConfig}' | python3 -m json.tool

# View Pareto-optimal configs from profiling
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.pareto}'
```

Resource Ownership

  • The DGDR does not set an owner reference on the DGD it creates. Deleting a DGDR does not delete the DGD — it persists independently so it can continue serving traffic.
  • The relationship is tracked via labels: dgdr.nvidia.com/name and dgdr.nvidia.com/namespace (see the lookup example below).
  • Additional resources (planner ConfigMaps) are created in the same namespace and labeled with dgdr.nvidia.com/name.
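
Because the link is label-based rather than ownerReference-based, you can look up what a DGDR produced by selecting on those labels. A sketch, assuming the `dgd` resource name resolves to DynamoGraphDeployment the way `dgdr` does above:

```bash
# Find the DGD generated by this DGDR
kubectl get dgd -n $NAMESPACE -l dgdr.nvidia.com/name=my-model

# Find auxiliary resources it created, e.g., planner ConfigMaps
kubectl get configmaps -n $NAMESPACE -l dgdr.nvidia.com/name=my-model
```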
