Deploying Inference Graphs to Kubernetes

Users can deploy their inference graphs using either Custom Resource Definitions (CRDs) or Helm charts.

1. Install Dynamo Cloud.

Before deploying an inference graph, deploy the Dynamo Cloud Platform. See the Quickstart Guide for steps to install Dynamo Cloud with Helm.

Dynamo Cloud acts as an orchestration layer between you and Kubernetes, handling the complexity of deploying your graphs. Installing it is a one-time step, needed only before you deploy your first DynamoGraph.
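Before moving on, you can confirm that the platform is in place by checking that the cluster knows about the DynamoGraphDeployment resource type used by the example CRs. This is a minimal sketch: the function name is hypothetical, and it assumes the resource's plural name is dynamographdeployments (the lowercase plural of the kind).

```shell
# Hypothetical helper: succeeds if the cluster has a DynamoGraphDeployment
# resource type registered, i.e. the Dynamo Cloud CRDs appear to be installed.
# Assumption: the CRD's plural resource name is "dynamographdeployments".
dynamo_crd_installed() {
  # `kubectl api-resources -o name` prints one resource per line,
  # e.g. "pods" or "dynamographdeployments.<api-group>".
  kubectl api-resources -o name 2>/dev/null \
    | grep -qiE '^dynamographdeployments(\.|$)'
}

# Usage:
# dynamo_crd_installed || echo "Install Dynamo Cloud before deploying a graph" >&2
```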

2. Deploy your inference graph.

Custom Resource (CR) YAML files for many examples are provided under the components/backends/{engine}/deploy folders. See the links below for the CRs for a specific inference backend.

View SGLang K8s

View vLLM K8s

View TRT-LLM K8s
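To see which example CRs ship for a given backend, you can list the YAML files in its deploy folder. A minimal sketch; the function name is hypothetical, and the engine directory name (for example vllm or sglang) must match a folder under components/backends.

```shell
# Hypothetical helper: list the example CR files shipped for one backend.
# $1 = repository root, $2 = engine directory name (e.g. vllm, sglang).
list_example_crs() {
  deploy_dir="$1/components/backends/$2/deploy"
  if [ ! -d "$deploy_dir" ]; then
    echo "no deploy folder for engine '$2' under $1" >&2
    return 1
  fi
  # Print one YAML path per line, sorted for stable output.
  find "$deploy_dir" -maxdepth 1 -type f \( -name '*.yaml' -o -name '*.yml' \) | sort
}

# Usage (from the Dynamo repository root):
# list_example_crs "$(pwd)" vllm
```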

Deploying a particular example

# Set your dynamo root directory
cd <root-dynamo-folder>
export PROJECT_ROOT=$(pwd)
export NAMESPACE=<your-namespace> # the namespace Dynamo Cloud was deployed to
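Because every later command depends on these two variables, a quick guard can catch a wrong directory or namespace before anything is applied. A minimal sketch with a hypothetical function name; it assumes PROJECT_ROOT should contain the components/backends folder referenced throughout this page.

```shell
# Hypothetical guard: fail early if PROJECT_ROOT or NAMESPACE look wrong.
check_deploy_env() {
  # PROJECT_ROOT must be set and look like the Dynamo repository root.
  if [ -z "${PROJECT_ROOT:-}" ] || [ ! -d "${PROJECT_ROOT}/components/backends" ]; then
    echo "PROJECT_ROOT must point at the Dynamo repository root" >&2
    return 1
  fi
  if [ -z "${NAMESPACE:-}" ]; then
    echo "NAMESPACE is empty; set it to the namespace Dynamo Cloud is deployed in" >&2
    return 1
  fi
  # Confirm the namespace exists in the current kubectl context.
  if ! kubectl get namespace "${NAMESPACE}" >/dev/null 2>&1; then
    echo "namespace '${NAMESPACE}' was not found in the current kubectl context" >&2
    return 1
  fi
  return 0
}
```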

Deploying an example takes a single kubectl apply -f ... -n ${NAMESPACE} command. For example:

kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
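The kubectl apply command above can be wrapped so that a typo in the engine or file name fails with a clear message instead of a kubectl error. A minimal sketch; the function name is hypothetical, and it relies on the PROJECT_ROOT and NAMESPACE variables set earlier.

```shell
# Hypothetical wrapper around `kubectl apply` for the bundled example CRs.
# $1 = engine (vllm, sglang, ...), $2 = CR file name in that engine's deploy folder.
deploy_example() {
  cr_file="${PROJECT_ROOT}/components/backends/$1/deploy/$2"
  if [ ! -f "$cr_file" ]; then
    echo "CR file not found: $cr_file" >&2
    return 1
  fi
  kubectl apply -f "$cr_file" -n "${NAMESPACE}"
}

# Usage: deploy_example vllm agg.yaml
```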

To view your deployment, run kubectl get dynamoGraphDeployment -n ${NAMESPACE}. To delete it, run kubectl delete dynamoGraphDeployment <your-dep-name> -n ${NAMESPACE}.
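When several examples are deployed side by side, it can help to list deployments by name and clean them up in one step. A minimal sketch with hypothetical function names; it uses only kubectl's standard jsonpath output over metadata.name, and the delete helper removes every DynamoGraphDeployment in the namespace, so use it with care.

```shell
# Hypothetical helper: print each DynamoGraphDeployment name in $NAMESPACE, one per line.
list_graph_deployments() {
  kubectl get dynamoGraphDeployment -n "${NAMESPACE}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Hypothetical helper: delete every DynamoGraphDeployment in $NAMESPACE.
delete_all_graph_deployments() {
  list_graph_deployments | while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    kubectl delete dynamoGraphDeployment "$name" -n "${NAMESPACE}"
  done
}
```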

Note 1: Example Image

The examples use a prebuilt image from the nvcr.io registry. You can use a public image from Dynamo NGC or build your own. Either way, you need to overwrite the image in the example YAML before applying it.

To build your own image:

./container/build.sh --framework <your-inference-framework>

For example, for SGLang, run:

./container/build.sh --framework sglang

To overwrite the image in the example, set the main container's image in the CR:

extraPodSpec:
  mainContainer:
    image: <image-in-your-$DYNAMO_IMAGE>
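Rather than editing the example in place, you can write a copy of the CR with the image swapped in. A minimal sketch with a hypothetical function name; it assumes each image is set on a single `image: <value>` line, as in the extraPodSpec snippet above, and that DYNAMO_IMAGE holds the image you built or pulled. List-style `- image:` lines are not rewritten.

```shell
# Hypothetical helper: copy a CR file, replacing every `image:` value with $DYNAMO_IMAGE.
# $1 = source CR, $2 = destination file (the source is left untouched).
override_image() {
  src="$1"; dst="$2"
  if [ -z "${DYNAMO_IMAGE:-}" ]; then
    echo "set DYNAMO_IMAGE to the image you built or pulled" >&2
    return 1
  fi
  # Keep each line's indentation and the `image:` key; replace only the value.
  sed -E "s|^([[:space:]]*image:).*|\1 ${DYNAMO_IMAGE}|" "$src" > "$dst"
}

# Usage:
# override_image components/backends/vllm/deploy/agg.yaml /tmp/agg.yaml
# kubectl apply -f /tmp/agg.yaml -n ${NAMESPACE}
```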

Note 2: Set up port forwarding if needed when deploying to Kubernetes.

List the services in your namespace:

kubectl get svc -n ${NAMESPACE}

Look for the service whose name ends in -frontend and forward a local port to it. For example, to find that service and forward local port 8080 to its port 8080:

SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | sed 's|.*/||' | grep -- '-frontend$' | head -n1)
kubectl port-forward svc/${SERVICE_NAME} 8080:8080 -n ${NAMESPACE}
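kubectl port-forward runs in the foreground, so from a second terminal you can poll the forwarded port until the frontend answers. A minimal sketch with a hypothetical function name; it assumes the frontend serves an OpenAI-compatible /v1/models route on the forwarded port 8080.

```shell
# Hypothetical helper: poll the port-forwarded frontend until it responds.
# $1 = number of attempts (default 30), one second apart.
# Assumption: the frontend exposes an OpenAI-compatible /v1/models route.
wait_for_frontend() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    # -s: no progress output; -f: treat HTTP error codes as failures.
    if curl -sf -o /dev/null "http://localhost:8080/v1/models"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  echo "frontend did not respond on localhost:8080" >&2
  return 1
}
```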

Additional Resources:

Manual Deployment with Helm Charts

Users who need more control over their deployments can use the manual deployment path (deploy/helm/):

  • Used for manually deploying inference graphs to Kubernetes

  • Contains Helm charts and configurations for deploying individual inference pipelines

  • Provides full control over deployment parameters

  • Requires manual management of infrastructure components

  • Documentation:
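To see which charts the manual path offers, you can locate every Chart.yaml under deploy/helm/, since each Helm chart has one at its root. A minimal sketch with a hypothetical function name; chart dependencies vendored under a chart's charts/ folder would be listed too.

```shell
# Hypothetical helper: list Helm chart directories under <repo-root>/deploy/helm.
list_helm_charts() {
  find "$1/deploy/helm" -type f -name Chart.yaml -exec dirname {} \; | sort
}

# Usage: list_helm_charts "${PROJECT_ROOT}"
# Then install one, for example:
# helm install <release-name> <chart-dir> -n "${NAMESPACE}"
```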