Google Kubernetes Engine (GKE)
Dynamo Deployment on GKE
Pre-requisites
Install gcloud CLI
https://cloud.google.com/sdk/docs/install
Create GKE cluster
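The command for this step is not shown above. A minimal sketch follows, assuming a zonal cluster with one small CPU node. The project ID, zone, cluster name, and machine type are placeholder assumptions, so replace them with your own values and pick a zone that offers the GPU type you plan to use. The command only runs if the gcloud CLI is installed.

```shell
# Placeholder values (assumptions); replace with your own project and zone.
: "${PROJECT_ID:=my-gcp-project}"
: "${ZONE:=us-central1-a}"
: "${CLUSTER_NAME:=dynamo-gke}"
: "${CLUSTER_MACHINE_TYPE:=n2-standard-4}"

# Create a zonal cluster with a single CPU node; GPU nodes go in a separate pool.
if command -v gcloud >/dev/null 2>&1; then
  gcloud container clusters create "${CLUSTER_NAME}" \
    --project="${PROJECT_ID}" \
    --zone="${ZONE}" \
    --machine-type="${CLUSTER_MACHINE_TYPE}" \
    --num-nodes=1
else
  echo "gcloud CLI not found; install it before creating the cluster" >&2
fi
```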
Create GPU pool
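A GPU node pool could be added along the following lines. This is a sketch under assumptions: the pool name gpu-pool, the NVIDIA L4 accelerator, the GPU count, and the g2-standard-24 machine type are illustrative choices, not values taken from this guide. The gpu-driver-version=latest setting asks GKE to install the NVIDIA driver on the nodes automatically.

```shell
# Placeholder values (assumptions); keep them consistent with the cluster created earlier.
: "${PROJECT_ID:=my-gcp-project}"
: "${ZONE:=us-central1-a}"
: "${CLUSTER_NAME:=dynamo-gke}"
: "${GPU_TYPE:=nvidia-l4}"
: "${GPU_COUNT:=2}"
: "${GPU_MACHINE_TYPE:=g2-standard-24}"

# Accelerator spec: GPU type, GPUs per node, and automatic driver installation.
ACCELERATOR="type=${GPU_TYPE},count=${GPU_COUNT},gpu-driver-version=latest"

if command -v gcloud >/dev/null 2>&1; then
  gcloud container node-pools create gpu-pool \
    --cluster="${CLUSTER_NAME}" \
    --project="${PROJECT_ID}" \
    --zone="${ZONE}" \
    --machine-type="${GPU_MACHINE_TYPE}" \
    --accelerator="${ACCELERATOR}" \
    --num-nodes=1
else
  echo "gcloud CLI not found; install it before creating the node pool" >&2
fi
```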
Clone Dynamo GitHub repository
Note: Make sure the GitHub branch or commit you check out matches the versions of the Dynamo platform and the vLLM container you deploy.
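As a sketch, the repository can be cloned at a specific branch or tag. The DYNAMO_REF variable is a hypothetical name introduced here, and main is only a default; set it to the branch or tag that matches your platform and container versions. Note that git clone --branch accepts branch and tag names, not commit hashes.

```shell
# DYNAMO_REF is a hypothetical variable; set it to the branch or tag you need.
: "${DYNAMO_REF:=main}"

# Clone only if the repository is not already present in the current directory.
if command -v git >/dev/null 2>&1 && [ ! -d dynamo ]; then
  git clone --branch "${DYNAMO_REF}" https://github.com/ai-dynamo/dynamo.git \
    || echo "git clone failed; check network access and the ref name" >&2
fi
```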
Set environment variables for GKE
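The exact variables are not listed above. A plausible minimal set, assuming the later steps need a Kubernetes namespace and a Dynamo release version, might look like this. Both variable names and the dynamo-system value are assumptions, and x.y.z is a placeholder for the release you actually install.

```shell
# Hypothetical variable names; adjust to whatever the install step expects.
export NAMESPACE="dynamo-system"    # namespace for the Dynamo platform (assumed value)
export RELEASE_VERSION="x.y.z"      # placeholder; use the release matching your checkout

echo "Installing Dynamo ${RELEASE_VERSION} into namespace ${NAMESPACE}"
```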
Install Dynamo Kubernetes Platform
After installation, verify that it succeeded:
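One way to check, assuming the platform was installed into the namespace held in a NAMESPACE variable (a hypothetical name, defaulting here to dynamo-system), is to list the pods in that namespace and confirm they reach the Running state:

```shell
# NAMESPACE is an assumed variable; it should match the namespace used at install time.
: "${NAMESPACE:=dynamo-system}"

if command -v kubectl >/dev/null 2>&1; then
  # List the platform pods and their status.
  kubectl get pods --namespace "${NAMESPACE}"
else
  echo "kubectl not found; install it to inspect the cluster" >&2
fi
```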
Expected output
Deploy Inference Graph
We will deploy an LLM to the Dynamo platform. This example uses the Qwen/Qwen3-0.6B model with vLLM in a disaggregated deployment.
In the deployment YAML file, some adjustments are required and others are optional:
- (Required) Add args to change LD_LIBRARY_PATH and PATH of the decoder container, so that GKE can find the correct GPU driver
- Change the vLLM image to the desired one on NGC
- Add namespace to metadata
- Adjust GPU/CPU requests and limits
- Change model to deploy
For more configurations, refer to https://github.com/ai-dynamo/dynamo/tree/main/examples/deployments/GKE/vllm
Highlighted configurations in the YAML file
Note that LD_LIBRARY_PATH must be set correctly on GKE, as described in "Run GPUs in GKE".
The following snippet needs to be present in the args field of the deployment yaml file:
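A minimal sketch of such a snippet, assuming the standard GKE driver locations (/usr/local/nvidia/lib64 for libraries and /usr/local/nvidia/bin for tools such as nvidia-smi). Treat it as an illustration and check it against the disagg.yaml example in the repository. In the deployment, these lines would run at the start of the container's shell command, before the worker process starts.

```shell
# On GKE GPU nodes the NVIDIA driver libraries and binaries live under /usr/local/nvidia.
# Prepend the library directory, keeping any existing LD_LIBRARY_PATH entries.
export LD_LIBRARY_PATH="/usr/local/nvidia/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
# Make driver tools such as nvidia-smi available on PATH.
export PATH="${PATH}:/usr/local/nvidia/bin"
```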
For example, refer to the following from examples/deployments/GKE/vllm/disagg.yaml
Deploy the model
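The deployment can be applied with kubectl. The sketch below assumes it is run from the root of the cloned dynamo repository and that the namespace is already set in the file's metadata, as listed in the adjustments above. NAMESPACE is a hypothetical variable used only to list the resulting pods.

```shell
# NAMESPACE is an assumed variable; it should match the namespace in the YAML metadata.
: "${NAMESPACE:=dynamo-system}"
# Path of the example file, relative to the repository root.
DEPLOYMENT_FILE="examples/deployments/GKE/vllm/disagg.yaml"

if command -v kubectl >/dev/null 2>&1 && [ -f "${DEPLOYMENT_FILE}" ]; then
  # Create or update the resources defined in the deployment file.
  kubectl apply -f "${DEPLOYMENT_FILE}"
  # Check that the pods for the deployment are starting.
  kubectl get pods --namespace "${NAMESPACE}"
else
  echo "kubectl or ${DEPLOYMENT_FILE} not found; run this from the cloned dynamo repository" >&2
fi
```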
Expected output after successful deployment