Helm and Kubernetes#
This page describes how to deploy NVIDIA NIM for LLMs on Kubernetes by using the NIM Helm chart.
Prerequisites#
A running Kubernetes cluster with GPU nodes.
kubectl and helm configured for the target cluster.
Access to the NIM Helm chart registry and NIM container image registry.
An NGC API key with access to the target model artifacts.
A storage class that supports persistent volumes.
Note
If you want to provision Kubernetes with NVIDIA Cloud Native Stack (CNS), refer to Using the Ansible Playbooks.
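Before fetching the chart, it can be worth confirming that the cluster actually exposes GPU resources and has a usable StorageClass. This is a quick sanity-check sketch; it assumes nodes advertise the standard nvidia.com/gpu resource (for example, via the NVIDIA device plugin):

```shell
# List allocatable GPUs per node; a nonzero count means GPU scheduling can work.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# List available StorageClasses; persistence requires one that can bind PVCs.
kubectl get storageclass
```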
Fetch and extract the Helm chart before deployment:
export HELM_VERSION="<version_number>"

helm fetch "https://helm.ngc.nvidia.com/nim/charts/nim-llm-${HELM_VERSION}.tgz" \
  --username='$oauthtoken' \
  --password="${NGC_API_KEY}"

tar -xzf "nim-llm-${HELM_VERSION}.tgz"
Configure Helm#
The following Helm values are the most important settings for a NIM deployment:
image.repository: NIM container image to deploy.
image.tag: NIM container image tag.
model.ngcAPISecret and imagePullSecrets: credentials required to pull images and model artifacts.
persistence: storage settings for the model cache.
resources: GPU limits based on model requirements.
env: optional advanced runtime configuration.
Use the following commands to inspect chart documentation and defaults:
helm show readme nim-llm/
helm show values nim-llm/
Minimal Example#
Create secrets before installing the chart:
export NGC_API_KEY=<your_ngc_api_key>

# Secret for pulling images from nvcr.io
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}"

# Secret consumed by the NIM container runtime
kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY="${NGC_API_KEY}"
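If you want to confirm both secrets exist and hold a non-empty key before installing the chart, a check like the following can help (the secret names match the ones created above; the key is piped to a byte count rather than printed):

```shell
# Both secrets should be listed without errors.
kubectl get secret ngc-secret nvidia-nim-secrets

# Decode the stored API key and report its length; a value of 0 means the secret is empty.
kubectl get secret nvidia-nim-secrets -o jsonpath='{.data.NGC_API_KEY}' | base64 -d | wc -c
```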
Create values.yaml with a minimal configuration:
cat <<'EOF' > values.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.0"
model:
  ngcAPISecret: "nvidia-nim-secrets"
persistence:
  enabled: true
  storageClass: "nfs-client"
  accessMode: ReadWriteMany
  size: 50Gi
resources:
  limits:
    nvidia.com/gpu: 1
imagePullSecrets:
  - name: "ngc-secret"
EOF
Note
Set persistence.storageClass to a StorageClass that is available in your Kubernetes cluster.
Tip
Adjust persistence.size based on your model size and expected cache usage.
Install the release:
helm install my-nim nim-llm/ -f values.yaml
These values are intentionally minimal and work as a starting point in most clusters.
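First startup includes downloading the model into the cache, which can take several minutes, so it is often more convenient to wait on the rollout than to poll pods by hand. A sketch, assuming the release name my-nim used above and that this chart version creates a StatefulSet named <release>-nim-llm (adjust the workload kind and name if yours differ):

```shell
# Block until the workload behind the release reports ready, with a generous timeout.
kubectl rollout status statefulset/my-nim-nim-llm --timeout=15m

# Inspect release metadata and chart notes.
helm status my-nim

# If the pod stays Pending or in Init, recent events usually show image-pull or
# PVC-binding problems.
kubectl describe pod -l app.kubernetes.io/instance=my-nim | tail -n 20
```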
Optional: Enable LoRA With Helm#
To enable LoRA adapters, create a dedicated PVC, copy LoRA adapter files into that PVC, and mount it into the NIM pod:
cat <<'EOF' > nvidia-nim-lora-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-lora-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
EOF
kubectl apply -f nvidia-nim-lora-pvc.yaml
Add LoRA adapters to the PVC under /loras before you run the Helm upgrade. For each adapter, create one directory that contains adapter artifacts:
/loras/
  adapter_name/
    adapter_config.json
    adapter_model.safetensors   # or adapter_model.bin
Note
NIM loads adapters from NIM_PEFT_SOURCE (/loras in this example). If the PVC is empty, no LoRA adapters are available at runtime.
For detailed LoRA configuration and runtime behavior, refer to Fine-Tuning with LoRA.
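One way to populate the PVC is to mount it in a short-lived helper pod and copy adapter directories in with kubectl cp. This is a sketch under stated assumptions: the helper pod name, the busybox image, and the local ./loras/adapter_name path are illustrative, not part of the chart:

```shell
# Start a minimal pod that mounts the LoRA PVC at /loras and sleeps.
kubectl run lora-copy --image=busybox --restart=Never --overrides='{
  "apiVersion": "v1",
  "spec": {
    "containers": [{
      "name": "lora-copy",
      "image": "busybox",
      "command": ["sleep", "3600"],
      "volumeMounts": [{"name": "loras", "mountPath": "/loras"}]
    }],
    "volumes": [{
      "name": "loras",
      "persistentVolumeClaim": {"claimName": "nvidia-nim-lora-pvc"}
    }]
  }
}'
kubectl wait --for=condition=Ready pod/lora-copy

# Copy each local adapter directory into the PVC, then remove the helper pod.
kubectl cp ./loras/adapter_name lora-copy:/loras/adapter_name
kubectl delete pod lora-copy
```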
Update values.yaml:
env:
  - name: NIM_PEFT_SOURCE
    value: /loras

extraVolumes:
  lora-adapter:
    persistentVolumeClaim:
      claimName: nvidia-nim-lora-pvc

extraVolumeMounts:
  lora-adapter:
    mountPath: /loras
Then apply the updated values:
helm upgrade my-nim nim-llm/ -f values.yaml
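After the upgrade, loaded LoRA adapters are typically listed alongside the base model by the OpenAI-compatible models endpoint, which gives a quick way to confirm they were picked up. With a port-forward to the service active:

```shell
# Loaded LoRA adapters appear as additional model ids alongside the base model.
curl -sS http://127.0.0.1:8000/v1/models
```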
Verify Deployment#
kubectl get pods -l app.kubernetes.io/instance=my-nim
kubectl get svc -l app.kubernetes.io/instance=my-nim
kubectl port-forward svc/my-nim-nim-llm 8000:8000
curl -sS http://127.0.0.1:8000/v1/health/ready
A healthy deployment returns an HTTP 200 response from the readiness endpoint.
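Once the readiness endpoint returns 200, you can send a test inference request through the same port-forward. This sketch uses the OpenAI-compatible API that NIM exposes; the <model_id> placeholder is an assumption, so list the served models first and substitute the reported id:

```shell
# Discover the model id served by this NIM.
curl -sS http://127.0.0.1:8000/v1/models

# Send a minimal chat completion (replace <model_id> with the id reported above).
curl -sS http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "<model_id>",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```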