End-to-end tutorial for deploying Qwen/Qwen3-0.6B on Kubernetes using Dynamo’s recommended
DynamoGraphDeploymentRequest (DGDR) workflow — from zero to your first inference response.
This guide assumes you have already completed the platform installation and that the Dynamo operator and CRDs are running in your cluster.
A DynamoGraphDeploymentRequest (DGDR) is Dynamo’s deploy-by-intent API. You describe what
you want to run and your performance targets; Dynamo’s profiler determines the optimal
configuration automatically, then creates the live deployment for you.
For a deeper comparison, see Understanding Dynamo’s Custom Resources.
Before starting, confirm:
kubectl get pods -n ${NAMESPACE} shows operator pods Runningkubectl get crd | grep dynamo shows dynamographdeploymentrequests.nvidia.comkubectl and helm available in your shellSet these variables once — they are referenced throughout the guide:
Qwen/Qwen3-0.6B is a public model. A HuggingFace token is not strictly required to download
it, but is recommended to avoid rate limiting.
Verify the secret was created:
Save the following as qwen3-first-model.yaml:
Apply it (uses envsubst to substitute the RELEASE_VERSION shell variable into the YAML):
For the full spec reference, see the DGDR API Reference and Profiler Guide.
If you are using a namespace-scoped operator with GPU discovery disabled, you must also provide explicit hardware info or the DGDR will be rejected at admission:
See the installation guide for details.
Profiling is the automated step where Dynamo sweeps across candidate configurations (parallelism, batching, scheduling strategies) to find the one that best meets your SLA and hardware — so you don’t have to tune it manually.
Watch the DGDR status in real time:
The PHASE column progresses through:
Deployed is the success terminal state when autoApply: true (the default).
If you set autoApply: false, the phase stops at Ready — profiling is complete and the
generated DGD spec is stored in .status, but no deployment is created automatically.
To inspect and deploy it manually:
For a full status summary and events:
To follow the profiling job logs:
searchStrategy: rapid, profiling typically completes in under 15 minutes on a single GPU.Once the DGDR reaches Deployed, the DynamoGraphDeployment has been created automatically.
Check that everything is running:
Wait until pods are ready:
Find the frontend service name:
Port-forward to the frontend and send an inference request:
A successful response looks like:
Your first model is now live.
To remove the deployment and profiling artifacts:
Deleting a DGDR does not delete the DynamoGraphDeployment it created. The DGD persists
independently so it can continue serving traffic.
DGDR stuck in Pending
Common causes: no available GPU nodes, image pull failure (check image tag; NGC credentials are
optional but may be needed if you hit rate limits pulling from public NGC), missing hardware
config for a namespace-scoped operator.
GPU node taints are a frequent cause of pods staying Pending. Many clusters (including
GKE by default and most shared/HPC environments) taint GPU nodes with
nvidia.com/gpu:NoSchedule so that only GPU-aware workloads land on them. If the profiling
job pod is stuck with a 0/N nodes are available: … node(s) had untolerated taint event,
add a toleration to your DGDR via overrides.profilingJob. The operator and profiler
automatically forward it to every candidate and deployed pod:
Profiling job fails
Pods not starting after profiling
Model not responding after port-forward
sla (TTFT, ITL) and workload (ISL, OSL) targets to
your DGDR so the profiler optimizes for your specific traffic. See the
Profiler Guide for the full configuration
reference and picking modes. For ready-to-use YAML — including SLA targets, private models,
MoE, and overrides — see DGDR Examples.features.planner in the DGDR —
see the Planner Guide.autoApply: false and extract the DGD spec with
kubectl get dgdr <name> -o jsonpath='{.status.profilingResults.selectedConfig}'
before deploying.DynamoGraphDeployment spec for full customization.