
Copyright © 2026, NVIDIA Corporation.


DGDR Reference


A DynamoGraphDeploymentRequest (DGDR) is Dynamo’s deploy-by-intent generator for DynamoGraphDeployment (DGD) resources. You describe what you want to run and your performance targets; the profiler determines a configuration and produces the DGD that serves traffic.

For the full deployment mental model — including DGD, DCD, DGDR, recipes, strategy selection, model caching, planner setup, and common pitfalls — see the Deployment Overview.

DGDR, DGD, and Recipes

Dynamo provides two Custom Resources for deploying inference graphs:

| | DGD (canonical live deployment) | DGDR (generator/profiler) |
|---|---|---|
| You provide | Full deployment spec (services, parallelism, replicas, resource limits, etc.) | Model, backend, workload, hardware, and optional SLA targets |
| What happens | The operator reconciles the DGD into DynamoComponentDeployment resources and pods | The profiler generates a DGD; with autoApply: true, the operator creates it |
| Best for | Known-good configs, tuned recipes, or full manual control | New model/hardware combinations, SLA-driven sizing, or generated DGD YAML |
| Persistence | Persists and serves traffic | Reaches a terminal state after generation/deploy |

Use DGD directly when you have a hand-crafted configuration for a specific model/hardware combination. Most recipes are tuned DGD manifests. Use DGDR when you want Dynamo to generate the DGD for you.

For DGD deployment details, see Creating Deployments.

Spec Reference

Minimal Example

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model
spec:
  model: Qwen/Qwen3-0.6B
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1" # dynamo-frontend for Dynamo < 1.1.0

Field Reference

| Field | Required | Default | Purpose |
|---|---|---|---|
| model | Yes | - | HuggingFace model ID (e.g. Qwen/Qwen3-0.6B) |
| image | No | - | Container image for the profiling job. Dynamo >= 1.1.0: use dynamo-planner; earlier versions: use dynamo-frontend. |
| backend | No | auto | Inference engine: auto, vllm, sglang, trtllm |
| searchStrategy | No | rapid | Profiling depth: rapid (AIC-backed DynoSim-style modeling, ~30s) or thorough (real GPU, 2–4h) |
| autoApply | No | true | Automatically deploy the profiler's recommended config |
| sla.ttft | No | - | Target time to first token (ms) |
| sla.itl | No | - | Target inter-token latency (ms) |
| sla.e2eLatency | No | - | Target end-to-end latency (ms). Cannot be combined with explicit ttft/itl. |
| workload.isl | No | 4000 | Expected average input sequence length |
| workload.osl | No | 1000 | Expected average output sequence length |
| workload.requestRate | No | - | Target requests per second |
| workload.concurrency | No | - | Target concurrent requests |
| hardware.gpuSku | No | auto-detected | GPU SKU (see SKU Format) |
| hardware.vramMb | No | auto-detected | GPU VRAM in MB |
| hardware.totalGpus | No | auto-detected (capped at 32) | Total GPUs available to the deployment |
| hardware.numGpusPerNode | No | auto-detected | GPUs per node |
| hardware.interconnect | No | auto-detected | Interconnect type |
| hardware.rdma | No | auto-detected | Whether RDMA is available |
| modelCache.pvcName | No | - | Name of a ReadWriteMany PVC containing cached model weights |
| modelCache.pvcModelPath | No | - | Path to the model directory inside the PVC |
| modelCache.pvcMountPath | No | /opt/model-cache | Mount path inside containers |
| features.planner | No | disabled | Enable the SLA-aware Planner; the generated DGD includes Planner service/configuration |
| features.mocker | No | disabled | Enable mocker mode for testing |
| overrides.profilingJob | No | - | batchv1.JobSpec overrides for the profiling job (e.g., tolerations) |
| overrides.dgd | No | - | Raw DGD override base applied to the generated deployment |

For the complete CRD spec, see the API Reference.
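As an illustration of how several of these fields combine, the sketch below sizes a deployment against latency targets and reuses cached weights. It assumes that the dotted names in the table (sla.ttft, workload.isl, modelCache.pvcName) map to nested keys under spec. The numeric targets, the PVC name, and the model path are hypothetical placeholders, not recommended values.

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model
spec:
  model: Qwen/Qwen3-0.6B
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1" # dynamo-frontend for Dynamo < 1.1.0
  backend: vllm
  searchStrategy: rapid
  sla:
    ttft: 500          # hypothetical target, in ms
    itl: 20            # hypothetical target, in ms
    # e2eLatency is left out: it cannot be combined with explicit ttft/itl
  workload:
    isl: 4000          # table default, written out for clarity
    osl: 1000          # table default, written out for clarity
    requestRate: 10    # hypothetical target, in requests per second
  modelCache:
    pvcName: model-cache            # hypothetical ReadWriteMany PVC
    pvcModelPath: Qwen/Qwen3-0.6B   # hypothetical path inside the PVC
    pvcMountPath: /opt/model-cache  # table default
```

Fields that are omitted fall back to the defaults or auto-detected values listed in the table.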

DGDR does not currently expose a features.kvRouter field. To configure router mode or KV-aware routing details, use a direct DGD, a tuned recipe, or overrides.dgd when you still want DGDR to generate the base deployment.

Generated DGD Overrides

Use spec.overrides.dgd when the generated DynamoGraphDeployment needs a field that DGDR does not expose directly. The value is a partial nvidia.com/v1alpha1 DGD object that is merged into the profiler-generated deployment after Dynamo selects a configuration.

For example, to inject an environment variable into every generated service:

apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-sglang
spec:
  model: Qwen/Qwen3-30B-A3B
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1" # dynamo-frontend for Dynamo < 1.1.0
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        envs:
          - name: TRITON_PTXAS_PATH
            value: /usr/local/cuda/bin/ptxas

Use spec.envs for variables that should apply to all generated services. To target a single service, override that service’s envs entry instead:

spec:
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        services:
          decode: # replace with the generated service name
            envs:
              - name: CUSTOM_WORKER_ENV
                value: "enabled"

overrides.profilingJob only customizes the profiling Job. Use overrides.dgd for settings that must appear on the deployed worker pods.
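For contrast, a profiling-job override might look like the sketch below. It assumes the override follows the standard Kubernetes batchv1.JobSpec layout, where pod-level settings such as tolerations live under template.spec. The taint key shown is a hypothetical example, and whether a partial JobSpec without a containers list is accepted is an assumption to verify against the API Reference.

```yaml
spec:
  overrides:
    profilingJob:
      template:
        spec:
          tolerations:
            - key: nvidia.com/gpu   # hypothetical taint key on the GPU nodes
              operator: Exists
              effect: NoSchedule
```

This toleration affects only where the profiling Job's pod can be scheduled; the deployed worker pods would need the equivalent setting through overrides.dgd.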

SKU Format

When providing hardware configuration manually, use lowercase underscore format:

| Correct | Incorrect |
|---|---|
| h100_sxm | H100-SXM5-80GB |
| h200_sxm | H200-SXM-141GB |
| a100_sxm | A100-SXM4-80GB |
| a30 | A30 |
| l40s | L40S |

All supported values: gb200_sxm, b200_sxm, h200_sxm, h100_sxm, h100_pcie, a100_sxm, a100_pcie, a30, l40s, l40, l4, v100_sxm, v100_pcie, t4, mi200, mi300.
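A manual hardware block using this format could look like the following sketch. It assumes the hardware.* fields from the Field Reference nest under spec.hardware; the GPU counts and the VRAM figure (80 GB written as 80 × 1024 MB) are illustrative values for a single 8-GPU node, not detected ones.

```yaml
spec:
  hardware:
    gpuSku: h100_sxm     # lowercase underscore form, not H100-SXM5-80GB
    vramMb: 81920        # illustrative: 80 * 1024
    totalGpus: 8         # illustrative
    numGpusPerNode: 8    # illustrative
```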

Not all SKUs are supported by the AIC profiler for rapid mode. See AIC Support Matrix for details.

Note: PCIe variants are not yet supported by the profiler. The CRD admits PCIe SKUs (h100_pcie, a100_pcie, v100_pcie), but the profiler does not currently ship training data for them. You can submit a DGDR with a PCIe value and the operator will accept it, but profiler-assisted sizing will fall back to defaults. Profiler support for PCIe SKUs is tracked as an engineering follow-up.

Lifecycle

When you create a DGDR, it progresses through these phases:

| Phase | What is happening |
|---|---|
| Pending | Spec validated; operator is discovering GPU hardware and preparing the profiling job |
| Profiling | Profiling job running; sub-phases: Initializing, SweepingPrefill, SweepingDecode, SelectingConfig, BuildingCurves, GeneratingDGD, Done |
| Ready | Profiling complete; optimal config stored in .status.profilingResults.selectedConfig. Terminal state when autoApply: false. |
| Deploying | Creating the DynamoGraphDeployment (only when autoApply: true) |
| Deployed | DGD is running and healthy |
| Failed | Unrecoverable error; profiling failures are not retried (backoffLimit: 0). Check events and conditions for details. |
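To stop at Ready and review the generated configuration before anything is deployed, a request can turn off autoApply. The sketch below reuses the minimal example and changes only that field.

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model
spec:
  model: Qwen/Qwen3-0.6B
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1" # dynamo-frontend for Dynamo < 1.1.0
  autoApply: false   # terminal phase becomes Ready; no DGD is created
```

With this setting the selected configuration remains available in .status.profilingResults.selectedConfig for inspection.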

Conditions

The operator maintains these conditions on the DGDR status:

| Condition | Meaning |
|---|---|
| Validation | Spec validation passed or failed |
| Profiling | Profiling job is running, succeeded, or failed |
| SpecGenerated | Generated DGD spec is available |
| DeploymentReady | DGD is deployed and healthy |
| Succeeded | Aggregate condition; true when the DGDR has reached its target state |

Monitoring

# Watch phase transitions
kubectl get dgdr my-model -n $NAMESPACE -w

# Detailed status, conditions, and events
kubectl describe dgdr my-model -n $NAMESPACE

# Profiling sub-phase
kubectl get dgdr my-model -n $NAMESPACE -o jsonpath='{.status.profilingPhase}'

# Profiling job logs
kubectl get pods -n $NAMESPACE -l nvidia.com/dgdr-name=my-model
kubectl logs -f <profiling-pod-name> -n $NAMESPACE

# View generated DGD spec (when autoApply: false)
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.selectedConfig}' | python3 -m json.tool

# View Pareto-optimal configs from profiling
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.pareto}'

Resource Ownership

  • The DGDR does not set an owner reference on the DGD it creates. Deleting a DGDR does not delete the DGD — it persists independently so it can continue serving traffic.
  • The relationship is tracked via labels: dgdr.nvidia.com/name and dgdr.nvidia.com/namespace.
  • Additional resources (planner ConfigMaps) are created in the same namespace and labeled with dgdr.nvidia.com/name.
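The ownership model above implies metadata roughly like the following on a generated DGD. This is a sketch only: the DGD name and namespace are hypothetical, and the real object carries additional fields.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-model            # hypothetical; the generated name may differ
  namespace: my-namespace   # hypothetical
  labels:
    dgdr.nvidia.com/name: my-model
    dgdr.nvidia.com/namespace: my-namespace
  # No ownerReferences entry points back to the DGDR, so deleting
  # the DGDR leaves this DGD in place.
```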

Known Issues

  • pareto_analysis.py produces NaN for some configurations. Tracked as an engineering follow-up. Workaround: re-run with a narrower sweep; narrow sweeps bypass the NaN path in practice.
  • PCIe profiler data not yet available. See the PCIe callout under SKU Format.

Further Reading

  • Deployment Overview — DGD, DCD, DGDR, recipes, strategy selection, and common pitfalls
  • Profiler Guide — Profiling algorithms, picking modes, gate checks
  • Profiler Examples — Ready-to-use YAML for SLA targets, private models, MoE, overrides
  • Planner Guide — Scaling modes, PlannerConfig reference
  • API Reference — Complete CRD field specifications
  • Creating Deployments — DGD spec for full manual control