> For clean Markdown content of this page, append .md to this URL. For the complete documentation index, see https://docs.nvidia.com/dynamo/llms.txt. For full content including API reference and SDK examples, see https://docs.nvidia.com/dynamo/llms-full.txt.

# DGDR Reference

A `DynamoGraphDeploymentRequest` (DGDR) is Dynamo's deploy-by-intent generator
for [`DynamoGraphDeployment`](/dynamo/dev/additional-resources/api-reference-k-8-s#dynamographdeployment) (DGD)
resources. You describe what you want to run and your performance targets; the
profiler determines a configuration and produces the DGD that serves traffic.

For the full deployment mental model — including DGD, DCD, DGDR, recipes,
strategy selection, model caching, planner setup, and common pitfalls — see the
[Deployment Overview](/dynamo/dev/kubernetes-deployment/deploy-models/model-deployment-guide).

## DGDR, DGD, and Recipes

Dynamo provides two Custom Resources for deploying inference graphs:

| | DGD (canonical live deployment) | DGDR (generator/profiler) |
|---|---|---|
| **You provide** | Full deployment spec (services, parallelism, replicas, resource limits, etc.) | Model, backend, workload, hardware, and optional SLA targets |
| **What happens** | The operator reconciles the DGD into `DynamoComponentDeployment` resources and pods | The profiler generates a DGD; with `autoApply: true`, the operator creates it |
| **Best for** | Known-good configs, tuned recipes, or full manual control | New model/hardware combinations, SLA-driven sizing, or generated DGD YAML |
| **Persistence** | Persists and serves traffic | Reaches a terminal state after generation/deploy |

Use DGD directly when you have a hand-crafted configuration for a specific
model/hardware combination. Most
[recipes](https://github.com/ai-dynamo/dynamo/tree/main/recipes) are tuned DGD
manifests. Use DGDR when you want Dynamo to generate the DGD for you.

For DGD deployment details, see [Creating Deployments](/dynamo/dev/additional-resources/creating-deployments).

## Spec Reference

### Minimal Example

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: my-model
spec:
  model: Qwen/Qwen3-0.6B
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1"  # dynamo-frontend for Dynamo < 1.1.0
```

### Field Reference

| Field | Required | Default | Purpose |
|---|---|---|---|
| `model` | Yes | — | HuggingFace model ID (e.g. `Qwen/Qwen3-0.6B`) |
| `image` | No | — | Container image for the profiling job. Dynamo >= 1.1.0: use `dynamo-planner`; earlier versions: use `dynamo-frontend`. |
| `backend` | No | `auto` | Inference engine: `auto`, `vllm`, `sglang`, `trtllm` |
| `searchStrategy` | No | `rapid` | Profiling depth: `rapid` (AIC-backed DynoSim-style modeling, ~30s) or `thorough` (real GPU, 2–4h) |
| `autoApply` | No | `true` | Automatically deploy the profiler's recommended config |
| `sla.ttft` | No | — | Target time to first token (ms) |
| `sla.itl` | No | — | Target inter-token latency (ms) |
| `sla.e2eLatency` | No | — | Target end-to-end latency (ms). Cannot be combined with explicit `ttft`/`itl`. |
| `workload.isl` | No | `4000` | Expected average input sequence length |
| `workload.osl` | No | `1000` | Expected average output sequence length |
| `workload.requestRate` | No | — | Target requests per second |
| `workload.concurrency` | No | — | Target concurrent requests |
| `hardware.gpuSku` | No | auto-detected | GPU SKU (see [SKU Format](#sku-format)) |
| `hardware.vramMb` | No | auto-detected | GPU VRAM in MB |
| `hardware.totalGpus` | No | auto-detected (capped at 32) | Total GPUs available to the deployment |
| `hardware.numGpusPerNode` | No | auto-detected | GPUs per node |
| `hardware.interconnect` | No | auto-detected | Interconnect type |
| `hardware.rdma` | No | auto-detected | Whether RDMA is available |
| `modelCache.pvcName` | No | — | Name of a `ReadWriteMany` PVC containing cached model weights |
| `modelCache.pvcModelPath` | No | — | Path to the model directory inside the PVC |
| `modelCache.pvcMountPath` | No | `/opt/model-cache` | Mount path inside containers |
| `features.planner` | No | disabled | Enable the SLA-aware Planner; the generated DGD includes Planner service/configuration |
| `features.mocker` | No | disabled | Enable mocker mode for testing |
| `overrides.profilingJob` | No | — | `batchv1.JobSpec` overrides for the profiling job (e.g., tolerations) |
| `overrides.dgd` | No | — | Raw DGD override base applied to the generated deployment |

For the complete CRD spec, see the [API Reference](/dynamo/dev/additional-resources/api-reference-k-8-s).

<Note>
DGDR does not currently expose a `features.kvRouter` field. To configure
router mode or KV-aware routing details, use a direct DGD, a tuned recipe, or
`overrides.dgd` when you still want DGDR to generate the base deployment.
</Note>

### Generated DGD Overrides

Use `spec.overrides.dgd` when the generated `DynamoGraphDeployment` needs a
field that DGDR does not expose directly. The value is a partial
`nvidia.com/v1alpha1` DGD object that is merged into the profiler-generated
deployment after Dynamo selects a configuration.

For example, to inject an environment variable into every generated service:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-sglang
spec:
  model: Qwen/Qwen3-30B-A3B
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1"  # dynamo-frontend for Dynamo < 1.1.0
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        envs:
          - name: TRITON_PTXAS_PATH
            value: /usr/local/cuda/bin/ptxas
```

Use `spec.envs` for variables that should apply to all generated services. To
target a single service, override that service's `envs` entry instead:

```yaml
spec:
  overrides:
    dgd:
      apiVersion: nvidia.com/v1alpha1
      kind: DynamoGraphDeployment
      spec:
        services:
          decode:  # replace with the generated service name
            envs:
              - name: CUSTOM_WORKER_ENV
                value: "enabled"
```

<Note>
`overrides.profilingJob` only customizes the profiling Job. Use
`overrides.dgd` for settings that must appear on the deployed worker pods.
</Note>

### SKU Format

When providing hardware configuration manually, use lowercase underscore format:

| Correct | Incorrect |
|---|---|
| `h100_sxm` | `H100-SXM5-80GB` |
| `h200_sxm` | `H200-SXM-141GB` |
| `a100_sxm` | `A100-SXM4-80GB` |
| `a30` | `A30` |
| `l40s` | `L40S` |

All supported values: `gb200_sxm`, `b200_sxm`, `h200_sxm`, `h100_sxm`,
`h100_pcie`, `a100_sxm`, `a100_pcie`, `a30`, `l40s`, `l40`, `l4`,
`v100_sxm`, `v100_pcie`, `t4`, `mi200`, `mi300`.

<Note>
Not all SKUs are supported by the AIC profiler for `rapid` mode. See
[AIC Support Matrix](/dynamo/dev/kubernetes-deployment/deploy-models/model-deployment-guide#aic-support-matrix) for details.
</Note>

<Info>
**PCIe variants not yet supported by profiler.** The CRD admits PCIe SKUs
(`h100_pcie`, `a100_pcie`, `v100_pcie`), but the profiler does not currently
ship training data for them. You can submit a DGDR with a PCIe value; the
operator will accept it but profiler-assisted sizing will fall back to
defaults. Profiler support for PCIe SKUs is tracked as an engineering
follow-up.
</Info>

## Lifecycle

When you create a DGDR, it progresses through these phases:

| Phase | What is happening |
|---|---|
| `Pending` | Spec validated; operator is discovering GPU hardware and preparing the profiling job |
| `Profiling` | Profiling job running — sub-phases: `Initializing`, `SweepingPrefill`, `SweepingDecode`, `SelectingConfig`, `BuildingCurves`, `GeneratingDGD`, `Done` |
| `Ready` | Profiling complete; optimal config stored in `.status.profilingResults.selectedConfig`. Terminal state when `autoApply: false`. |
| `Deploying` | Creating the `DynamoGraphDeployment` (only when `autoApply: true`) |
| `Deployed` | DGD is running and healthy |
| `Failed` | Unrecoverable error — profiling failures are not retried (`backoffLimit: 0`); check events and conditions for details |

### Conditions

The operator maintains these conditions on the DGDR status:

| Condition | Meaning |
|---|---|
| `Validation` | Spec validation passed or failed |
| `Profiling` | Profiling job is running, succeeded, or failed |
| `SpecGenerated` | Generated DGD spec is available |
| `DeploymentReady` | DGD is deployed and healthy |
| `Succeeded` | Aggregate condition — true when the DGDR has reached its target state |

### Monitoring

```bash
# Watch phase transitions
kubectl get dgdr my-model -n $NAMESPACE -w

# Detailed status, conditions, and events
kubectl describe dgdr my-model -n $NAMESPACE

# Profiling sub-phase
kubectl get dgdr my-model -n $NAMESPACE -o jsonpath='{.status.profilingPhase}'

# Profiling job logs
kubectl get pods -n $NAMESPACE -l nvidia.com/dgdr-name=my-model
kubectl logs -f <profiling-pod-name> -n $NAMESPACE

# View generated DGD spec (when autoApply: false)
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.selectedConfig}' | python3 -m json.tool

# View Pareto-optimal configs from profiling
kubectl get dgdr my-model -n $NAMESPACE \
  -o jsonpath='{.status.profilingResults.pareto}'
```

### Resource Ownership

- The DGDR does **not** set an owner reference on the DGD it creates. Deleting
  a DGDR does not delete the DGD — it persists independently so it can continue
  serving traffic.
- The relationship is tracked via labels: `dgdr.nvidia.com/name` and
  `dgdr.nvidia.com/namespace`.
- Additional resources (planner ConfigMaps) are created in the same namespace
  and labeled with `dgdr.nvidia.com/name`.

## Known Issues

- **`pareto_analysis.py` produces NaN for some configurations.** Tracked as an
  engineering follow-up. Workaround: re-run with a narrower sweep; narrow
  sweeps bypass the NaN path in practice.
- **PCIe profiler data not yet available.** See the PCIe callout under
  [SKU Format](#sku-format).

## Further Reading

- [Deployment Overview](/dynamo/dev/kubernetes-deployment/deploy-models/model-deployment-guide) — DGD, DCD, DGDR, recipes, strategy selection, and common pitfalls
- [Profiler Guide](/dynamo/dev/components/profiler/profiler-guide) — Profiling algorithms, picking modes, gate checks
- [Profiler Examples](/dynamo/dev/components/profiler/profiler-examples) — Ready-to-use YAML for SLA targets, private models, MoE, overrides
- [Planner Guide](/dynamo/dev/components/planner/planner-guide) — Scaling modes, PlannerConfig reference
- [API Reference](/dynamo/dev/additional-resources/api-reference-k-8-s) — Complete CRD field specifications
- [Creating Deployments](/dynamo/dev/additional-resources/creating-deployments) — DGD spec for full manual control