Helm and Kubernetes#
This page describes how to deploy NVIDIA NIM for LLMs on Kubernetes using the NIM Helm chart.
Prerequisites#
Before deploying NIM with Helm, make sure you have the following:
- A running Kubernetes cluster with GPU-capable nodes
- Configured `kubectl` access to the target cluster
- Helm 3.0.0 or later
- An NGC API key for accessing the NIM Helm chart, pulling NIM container images, and downloading model artifacts
- A storage class that supports persistent volumes for model caching
Note
To provision Kubernetes with NVIDIA Cloud Native Stack (CNS), refer to Using the Ansible Playbooks.
Fetch and extract the Helm chart before deployment. Go to the NGC Catalog and select the nim-llm Helm chart to pick a version. In most cases, you should select the latest version.
```shell
export HELM_CHART_VERSION="<version_number>"
helm fetch "https://helm.ngc.nvidia.com/nim/charts/nim-llm-${HELM_CHART_VERSION}.tgz" \
  --username='$oauthtoken' \
  --password="${NGC_API_KEY}"
tar -xzf "nim-llm-${HELM_CHART_VERSION}.tgz"
```
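As a quick sanity check before downloading, you can preview the URL that the fetch command will request. This sketch is illustrative: the fallback version `1.0.0` is a placeholder, not a recommendation.

```shell
# Hypothetical pre-flight check: print the chart URL that helm fetch will request.
# The fallback version "1.0.0" is a placeholder used only when the variable is unset.
HELM_CHART_VERSION="${HELM_CHART_VERSION:-1.0.0}"
echo "Will fetch: https://helm.ngc.nvidia.com/nim/charts/nim-llm-${HELM_CHART_VERSION}.tgz"
```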
Configure Helm#
The following Helm values are the most important settings for a NIM deployment:
- `image.repository`: NIM container image to deploy.
- `image.tag`: NIM container image tag.
- `model.ngcAPISecret` and `imagePullSecrets`: credentials required to pull images and model artifacts.
- `persistence`: storage settings for the model cache.
- `resources`: GPU limits based on model requirements.
- `env`: optional advanced runtime configuration.
Use the following commands to inspect chart documentation and defaults:
```shell
helm show readme nim-llm/
helm show values nim-llm/
```
Minimal Example#
Complete the following steps to deploy the minimal Helm example:
Create the secrets before installing the chart:
```shell
export NGC_API_KEY="<your_ngc_api_key>"
```
Create the image pull secret:
```shell
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}"
```
Create the NGC API key secret:
```shell
kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY="${NGC_API_KEY}"
```
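The literal username `$oauthtoken` is NGC's convention for token-based registry authentication; the single quotes keep the shell from expanding it as a variable. As an aside, a docker-registry secret stores the credentials as a base64-encoded `username:password` string. The following sketch shows that encoding with a placeholder key, never a real credential:

```shell
# Illustrative only: a docker-registry secret encodes "username:password" as base64.
# EXAMPLE_KEY is a placeholder; never embed a real NGC API key in scripts.
EXAMPLE_KEY="example-api-key"
printf '%s:%s' '$oauthtoken' "${EXAMPLE_KEY}" | base64
```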
Create `values.yaml` with a minimal configuration:

```shell
cat <<'EOF' > values.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.1"
model:
  ngcAPISecret: "nvidia-nim-secrets"
persistence:
  enabled: true
  storageClass: "nfs-client"
  accessMode: ReadWriteMany
  size: 50Gi
resources:
  limits:
    nvidia.com/gpu: 1
imagePullSecrets:
  - name: "ngc-secret"
EOF
```
Note

Set `persistence.storageClass` to a StorageClass that is available in your Kubernetes cluster.

Tip

Adjust `persistence.size` based on your model size and expected cache usage.

Install the release:
```shell
helm install my-nim nim-llm/ -f values.yaml
```
These values are intentionally minimal and work as a starting point in most clusters.
Optional: Enable LoRA With Helm#
Complete the following steps to enable LoRA adapters with Helm:
Create a dedicated PVC for the LoRA adapters:
```shell
cat <<'EOF' > nvidia-nim-lora-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-lora-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
EOF
kubectl apply -f nvidia-nim-lora-pvc.yaml
```
Add LoRA adapters to the PVC under `/loras`. For each adapter, create one directory that contains the adapter artifacts:

```
/loras/
    adapter_name/
        adapter_config.json
        adapter_model.safetensors   # or adapter_model.bin
```

Note

NIM loads adapters from `NIM_PEFT_SOURCE` (`/loras` in this example). If the PVC is empty, no LoRA adapters are available at runtime.

Update `values.yaml`:

```yaml
env:
  - name: NIM_PEFT_SOURCE
    value: /loras
extraVolumes:
  lora-adapter:
    persistentVolumeClaim:
      claimName: nvidia-nim-lora-pvc
extraVolumeMounts:
  lora-adapter:
    mountPath: /loras
```
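If you have shell access to a host where the PVC is mounted, the expected directory layout can be staged as in this sketch. The scratch path and the adapter name `my-adapter` are placeholders, and the empty files stand in for real adapter artifacts:

```shell
# Sketch: create one adapter directory in the layout NIM expects under NIM_PEFT_SOURCE.
# mktemp gives a scratch location; on a real PVC this would be the mounted /loras path.
LORA_DIR="$(mktemp -d)/loras"
mkdir -p "${LORA_DIR}/my-adapter"                        # placeholder adapter name
touch "${LORA_DIR}/my-adapter/adapter_config.json"       # stand-ins for real artifacts
touch "${LORA_DIR}/my-adapter/adapter_model.safetensors"
find "${LORA_DIR}" -mindepth 1 | sort
```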
Apply the updated values:
```shell
helm upgrade my-nim nim-llm/ -f values.yaml
```
For detailed LoRA configuration and runtime behavior, refer to Fine-Tuning with LoRA.
Verify Deployment#
Verify that the pods and service are ready, and then test inference with port forwarding.
Check that the pods are running:
```shell
kubectl get pods -l app.kubernetes.io/instance=my-nim
```
Check that the service is available:
```shell
kubectl get svc -l app.kubernetes.io/instance=my-nim
```
Port-forward the service for local testing:
```shell
kubectl port-forward svc/my-nim-nim-llm 8000:8000
```
Call the readiness endpoint to confirm that the service is ready:
```shell
curl -sS http://127.0.0.1:8000/v1/health/ready
```
A healthy deployment returns an HTTP 200 response from the readiness endpoint.
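For scripted checks, it can help to test the status code rather than the response body. A minimal sketch, assuming the port-forward from the previous step is still active; when the service is unreachable, curl reports status `000`:

```shell
# Sketch: read only the HTTP status code from the readiness endpoint.
# "|| true" keeps the script alive when the service is unreachable (curl exits nonzero).
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8000/v1/health/ready || true)
if [ "${STATUS}" = "200" ]; then
  echo "service is ready"
else
  echo "service is not ready yet (HTTP ${STATUS})"
fi
```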