Helm and Kubernetes

This page describes how to deploy NVIDIA NIM for LLMs on Kubernetes by using the NIM Helm chart.

Prerequisites

  • A running Kubernetes cluster with GPU nodes.

  • kubectl and helm configured for the target cluster.

  • Access to the NIM Helm chart registry and NIM container image registry.

  • An NGC API key with access to the target model artifacts.

  • A storage class that supports persistent volumes.
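You can confirm these prerequisites before proceeding. The following commands are a quick sanity check; GPU capacity appears under node allocatable resources only when the NVIDIA device plugin is installed:

```shell
# Confirm cluster access and list nodes.
kubectl get nodes -o wide

# List the storage classes available for the model cache persistent volume.
kubectl get storageclass

# Confirm the Helm client is installed and on a supported version.
helm version --short
```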

Note

If you want to provision Kubernetes with NVIDIA Cloud Native Stack (CNS), refer to Using the Ansible Playbooks.

Set NGC_API_KEY to your NGC API key, then fetch and extract the Helm chart before deployment:

export NGC_API_KEY="<your_ngc_api_key>"
export HELM_VERSION="<version_number>"
helm fetch "https://helm.ngc.nvidia.com/nim/charts/nim-llm-${HELM_VERSION}.tgz" \
  --username='$oauthtoken' \
  --password="${NGC_API_KEY}"
tar -xzf "nim-llm-${HELM_VERSION}.tgz"

Configure Helm

The following Helm values are the most important settings for a NIM deployment:

  • image.repository: NIM container image to deploy.

  • image.tag: NIM container image tag.

  • model.ngcAPISecret and imagePullSecrets: credentials required to pull images and model artifacts.

  • persistence: storage settings for model cache.

  • resources: GPU limits based on model requirements.

  • env: optional advanced runtime configuration.

Use the following commands to inspect chart documentation and defaults:

helm show readme nim-llm/
helm show values nim-llm/

Minimal Example

Create secrets before installing the chart:

export NGC_API_KEY=<your_ngc_api_key>

# Secret for pulling images from nvcr.io
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}"

# Secret consumed by the NIM container runtime
kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY="${NGC_API_KEY}"
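Before installing, you can confirm that both secrets exist in the target namespace (kubectl get accepts multiple resource names):

```shell
# Both secrets should be listed: ngc-secret has type
# kubernetes.io/dockerconfigjson and nvidia-nim-secrets has type Opaque.
kubectl get secret ngc-secret nvidia-nim-secrets
```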

Create values.yaml with a minimal configuration:

cat <<'EOF' > values.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.0"
model:
  ngcAPISecret: "nvidia-nim-secrets"
persistence:
  enabled: true
  storageClass: "nfs-client"
  accessMode: ReadWriteMany
  size: 50Gi
resources:
  limits:
    nvidia.com/gpu: 1
imagePullSecrets:
  - name: "ngc-secret"
EOF

Note

Set persistence.storageClass to a StorageClass that is available in your Kubernetes cluster.

Tip

Adjust persistence.size based on your model size and expected cache usage.

Install the release:

helm install my-nim nim-llm/ -f values.yaml

These values are intentionally minimal and work as a starting point in most clusters.
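On first start, the pod downloads the model into the cache, which can take several minutes depending on model size and network bandwidth. You can watch startup progress with:

```shell
# Watch pod status until the NIM pod reports Running and Ready.
kubectl get pods -l app.kubernetes.io/instance=my-nim -w

# Follow container logs to monitor the model download and engine startup.
kubectl logs -l app.kubernetes.io/instance=my-nim -f
```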

Optional: Enable LoRA With Helm

To enable LoRA adapters, create a dedicated PVC, copy LoRA adapter files into that PVC, and mount it into the NIM pod:

cat <<'EOF' > nvidia-nim-lora-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-lora-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi
EOF

kubectl apply -f nvidia-nim-lora-pvc.yaml

Add LoRA adapters to the PVC under /loras before you run the Helm upgrade. Create one directory per adapter, containing that adapter's artifacts:

/loras/
  adapter_name/
    adapter_config.json
    adapter_model.safetensors   # or adapter_model.bin
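One way to populate the PVC is to mount it in a short-lived helper pod and copy the adapter files in with kubectl cp. This is a sketch, not the only approach; the pod name (lora-copy), the busybox image, and the local ./adapter_name path are illustrative:

```shell
# Launch a helper pod that mounts the LoRA PVC at /loras.
kubectl run lora-copy --image=busybox --restart=Never --overrides='
{
  "apiVersion": "v1",
  "spec": {
    "containers": [{
      "name": "lora-copy",
      "image": "busybox",
      "command": ["sleep", "3600"],
      "volumeMounts": [{"name": "lora", "mountPath": "/loras"}]
    }],
    "volumes": [{
      "name": "lora",
      "persistentVolumeClaim": {"claimName": "nvidia-nim-lora-pvc"}
    }]
  }
}'
kubectl wait --for=condition=Ready pod/lora-copy --timeout=120s

# Copy a local adapter directory into the PVC, then remove the helper pod.
kubectl cp ./adapter_name lora-copy:/loras/adapter_name
kubectl delete pod lora-copy
```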

Note

NIM loads adapters from NIM_PEFT_SOURCE (/loras in this example). If the PVC is empty, no LoRA adapters are available at runtime.

For detailed LoRA configuration and runtime behavior, refer to Fine-Tuning with LoRA.

Add the following settings to values.yaml:

env:
  - name: NIM_PEFT_SOURCE
    value: /loras
extraVolumes:
  lora-adapter:
    persistentVolumeClaim:
      claimName: nvidia-nim-lora-pvc
extraVolumeMounts:
  lora-adapter:
    mountPath: /loras

Then apply the updated values:

helm upgrade my-nim nim-llm/ -f values.yaml

Verify Deployment

kubectl get pods -l app.kubernetes.io/instance=my-nim
kubectl get svc -l app.kubernetes.io/instance=my-nim
kubectl port-forward svc/my-nim-nim-llm 8000:8000

# In a separate terminal, with the port-forward still running:
curl -sS http://127.0.0.1:8000/v1/health/ready

A healthy deployment returns an HTTP 200 response from the readiness endpoint.
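If you script the verification, polling the readiness endpoint avoids racing the model download and engine startup. The following sketch defines a hypothetical wait_ready helper (the function name and default timeout are illustrative) that polls a URL until it returns HTTP 200:

```shell
# Poll a URL until it returns HTTP 200, or fail after a timeout (in seconds).
wait_ready() {
  url="$1"
  timeout="${2:-600}"
  start=$(date +%s)
  while [ "$(curl -s -o /dev/null -w '%{http_code}' "$url")" != "200" ]; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 5
  done
  echo "ready: $url"
}
```

With the port-forward from the verification steps still running, `wait_ready http://127.0.0.1:8000/v1/health/ready` blocks until NIM reports ready or the timeout elapses.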