Get a model running on Kubernetes in minutes.
Create a HuggingFace token secret for model downloads. If you don’t have a token, see the HuggingFace token guide.
If you don’t have the GPU Operator yet:
If your cluster already provides GPU drivers (e.g., GKE with gpu-driver-version=latest, or AKS), add:
The GPU Operator is the only prerequisite for a basic deployment. For additional features like RDMA, Prometheus, or multinode scheduling with Grove/KAI Scheduler, see the Installation Guide.
If your GPU SKU and cloud provider are supported, you can use AICR for rapid installation of prerequisites and the Dynamo Helm chart.
Optionally, verify your cluster is ready:
Wait for the platform pods:
Deploy Qwen/Qwen3-0.6B using a DynamoGraphDeploymentRequest (DGDR).
The DGDR is the entrypoint for deploying models. It runs automatic profiling for your model/hardware and creates an auto-configured DynamoGraphDeployment (DGD). After that, the DGDR is completed and reaches a terminal state, similar to a K8s Job and can be cleaned up. The DGD is the resource that persists and serves your model.
Watch the DGDR progress from Pending → Profiling → Deploying → Deployed:
Dynamo supports vLLM, TensorRT-LLM, and SGLang backends. Setting backend: auto lets the profiler choose the best one for your model and hardware. See the backends guide for details.
Once the DGDR shows Deployed: