OpenShift#
Red Hat OpenShift is an enterprise Kubernetes platform for hybrid and multi-cloud environments. NIM supports deployment on both self-managed OpenShift Container Platform (OCP) and Azure Red Hat OpenShift (ARO). This guide covers GPU Operator setup, NIM deployment, and verification; the commands have been tested on both editions.
Prerequisites#
Before deploying NIM on OpenShift, ensure you have:
An OpenShift cluster (self-managed OCP or ARO) with GPU-capable nodes
oc CLI installed and authenticated (oc login)
Helm 3 installed
NGC API key for pulling container images
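These prerequisites can be sanity-checked up front. The snippet below is a hypothetical preflight helper (the tool names and the nvapi- key prefix are the only assumptions); it only prints findings and never aborts the shell:

```shell
# Hypothetical preflight check -- prints findings, never aborts.
for tool in oc helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING -- install before continuing"
  fi
done

# NGC API keys conventionally start with "nvapi-".
case "${NGC_API_KEY:-}" in
  nvapi-*) echo "NGC_API_KEY: set" ;;
  *)       echo "NGC_API_KEY: not set (or missing the nvapi- prefix)" ;;
esac
```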
Install GPU Operators#
NIM requires NVIDIA GPUs to be available as schedulable resources. This requires the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator.
Install NFD Operator#
oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nfd-operator-group
  namespace: openshift-nfd
spec:
  targetNamespaces:
  - openshift-nfd
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: stable
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Wait for the NFD Operator to install:
timeout 300 bash -c 'until oc get csv -n openshift-nfd 2>/dev/null | grep -q Succeeded; do sleep 10; done'
Create a NodeFeatureDiscovery instance to label GPU nodes:
oc apply -f - <<'EOF'
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    servicePort: 12000
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "0300"
            - "0302"
          deviceLabelFields:
            - vendor
EOF
Install NVIDIA GPU Operator#
oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nvidia-gpu-operator-group
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
  - nvidia-gpu-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: v25.10
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
Wait for the GPU Operator to install:
timeout 300 bash -c 'until oc get csv -n nvidia-gpu-operator 2>/dev/null | grep -q Succeeded; do sleep 10; done'
Create a ClusterPolicy to configure GPU drivers and device plugins:
oc apply -f - <<'EOF'
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  operator:
    defaultRuntime: crio
    use_ocp_driver_toolkit: true
  daemonsets:
    priorityClassName: system-node-critical
  dcgm:
    enabled: true
  dcgmExporter:
    enabled: true
  devicePlugin:
    enabled: true
  driver:
    enabled: true
  gfd:
    enabled: true
  nodeStatusExporter:
    enabled: true
  toolkit:
    enabled: true
EOF
Verify GPU Availability#
Wait for the GPU driver to build and the device plugin to register GPUs (this can take up to 20 minutes), then check the allocatable GPU count per node:
oc get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
At least one node should show a GPU count of 1 or more.
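If you want to script this check rather than eyeball the column, a small helper can parse that same output. gpus_ready is a hypothetical name and the sample node listing below is illustrative; a real run would pipe the oc get nodes command above into it:

```shell
# Hypothetical helper: succeeds if any node reports one or more
# allocatable GPUs in the NAME/GPU listing above.
gpus_ready() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 >= 1 { found = 1 } END { exit !found }'
}

# Offline demonstration with sample output:
printf 'NAME       GPU\nworker-0   <none>\nworker-1   1\n' | gpus_ready \
  && echo "GPU nodes ready"
```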
Common Setup#
The following resources are shared across all deployment methods. These commands have been tested on OpenShift 4.21.0 with NVIDIA GPU Operator v25.10.
Create OpenShift Project#
oc new-project nim-llm
Create NGC Image Pull Secret#
export NGC_API_KEY=nvapi-xxx
oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}" \
  -n nim-llm
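Note that the username is the literal string $oauthtoken, not a placeholder for your account name. For reference, here is an offline sketch of the .dockerconfigjson payload this secret stores (the key value below is illustrative):

```shell
# Offline illustration only -- shows the auth structure the secret holds.
NGC_API_KEY=nvapi-example python3 - <<'PY'
import base64, json, os

# Docker config auth is base64("<username>:<password>").
auth = base64.b64encode(
    f"$oauthtoken:{os.environ['NGC_API_KEY']}".encode()
).decode()
cfg = {"auths": {"nvcr.io": {"auth": auth}}}
print(json.dumps(cfg))
PY
```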
Create NGC API Secret#
oc create secret generic ngc-api \
  --namespace nim-llm \
  --from-literal=NGC_API_KEY="$NGC_API_KEY"
Grant Security Context Constraint#
OpenShift enforces SecurityContextConstraints (SCCs) that restrict which user IDs a pod can run as. The NIM container runs as UID 1000 (non-root). Grant the nonroot-v2 SCC to allow this:
oc adm policy add-scc-to-user nonroot-v2 -z default -n nim-llm
The Helm chart defaults (runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000) are compatible with the nonroot-v2 SCC. No additional security context configuration is required.
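If you prefer to pin these values explicitly rather than rely on chart defaults, you can append them to your values file. The key name podSecurityContext follows common Helm chart conventions and is an assumption here; confirm it against the chart's values.yaml before relying on it:

```shell
# Assumed key name (podSecurityContext) -- verify against the chart's
# values.yaml. The numeric values match the chart defaults noted above.
cat <<'EOF' >> custom-values-openshift.yaml
podSecurityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
  runAsNonRoot: true
EOF
```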
Optional: Create Model Cache PVC#
For persistent model caching across pod restarts, create a PVC. This is optional because NIM downloads models into ephemeral storage at startup when no PVC is configured.
Adjust storageClassName based on your OpenShift edition:
| OpenShift Edition | StorageClass Name |
|---|---|
| Self-managed OCP with ODF | ocs-storagecluster-cephfs |
| Self-managed OCP with NFS | (depends on your NFS provisioner) |
| ARO (Azure) | (an RWX-capable class, e.g. azurefile-csi) |
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc
  namespace: nim-llm
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ocs-storagecluster-cephfs  # adjust for your edition, or omit to use the default
  resources:
    requests:
      storage: 200Gi
EOF
Deploy NIM with Helm#
The NIM LLM Helm chart is compatible with OpenShift. The chart defaults handle security context, GPU resources (nvidia.com/gpu: 1), image pull secrets (ngc-secret), NGC API secret (ngc-api), and service configuration. You only need to override values specific to your deployment.
Single GPU Deployment (Llama 3.1 8B Instruct)#
Option A: Pre-built NIM image
cat <<EOF | tee custom-values-openshift.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.0"
model:
  name: "meta/llama3.1-8b-instruct"
EOF
helm upgrade --install my-nim ./helm \
-f custom-values-openshift.yaml \
-n nim-llm
Option B: vLLM-OSS stack (multi-model image)
For model-free NIM using the vLLM-OSS stack, set the model through environment variables:
cat <<EOF | tee custom-values-openshift-vllm-oss.yaml
image:
  repository: "<NIM_LLM_MODEL_FREE_IMAGE>"
  tag: "2.0.0"
model:
  name: "my-model"
env:
  - name: NIM_MODEL_PATH
    value: "hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  - name: NIM_SERVED_MODEL_NAME
    value: "my-model"
  - name: NIM_MAX_MODEL_LEN
    value: "2048"
EOF
helm upgrade --install my-nim ./helm \
-f custom-values-openshift-vllm-oss.yaml \
-n nim-llm
For gated Hugging Face models, create an HF token secret:
oc create secret generic hf-token --namespace nim-llm --from-literal=HF_TOKEN="${HF_TOKEN}"
Then add to your values file: model.hfTokenSecret: hf-token
Optional: Enable Persistent Cache#
To use the PVC created earlier, add these lines to your values file:
persistence:
  enabled: true
  existingClaim: "nvidia-nim-cache-pvc"
Monitor Deployment#
Wait for the pod to be ready:
oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -w
Wait until the pod status is Running and ready, then press Ctrl+C.
Expose NIM Service#
OpenShift Routes provide external DNS for the service. Note that oc expose creates a plain-HTTP Route; if you need TLS termination, create an edge-terminated Route (oc create route edge) instead. Create a Route for external access:
oc expose svc/my-nim-nim-llm --port=8000 -n nim-llm
export NIM_URL=$(oc get route my-nim-nim-llm -n nim-llm -o jsonpath='{.spec.host}')
echo "NIM endpoint: http://${NIM_URL}"
Alternatively, use port-forward for local testing:
oc -n nim-llm port-forward svc/my-nim-nim-llm 8000:8000
Verify Deployment#
curl -s http://${NIM_URL}/v1/health/ready
curl -s http://${NIM_URL}/v1/models | python3 -m json.tool
curl -X POST http://${NIM_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }' | python3 -m json.tool
Use the same model value as NIM_SERVED_MODEL_NAME (for vLLM-OSS) or model.name (for pre-built NIM).
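To pull just the assistant's reply out of the chat-completions response, a small helper (extract_reply is a hypothetical name) can be piped onto the curl command; the canned JSON below demonstrates it offline:

```shell
# Hypothetical helper: prints the first choice's message content from a
# chat-completions response read on stdin.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Offline demonstration with a canned response body:
echo '{"choices":[{"message":{"role":"assistant","content":"Hello there!"}}]}' \
  | extract_reply
```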
Run Helm Tests#
helm test my-nim -n nim-llm
View Logs#
oc -n nim-llm logs -l "app.kubernetes.io/name=nim-llm" -f
Cleanup#
Uninstall NIM#
helm uninstall my-nim -n nim-llm
oc delete pvc -n nim-llm -l app.kubernetes.io/name=nim-llm
oc delete pvc nvidia-nim-cache-pvc -n nim-llm --ignore-not-found
oc delete project nim-llm
Remove GPU Operators#
oc delete clusterpolicy gpu-cluster-policy
oc delete csv -n nvidia-gpu-operator -l operators.coreos.com/gpu-operator-certified.nvidia-gpu-operator
oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
oc delete project nvidia-gpu-operator
oc delete nfd nfd-instance -n openshift-nfd
oc delete csv -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd
oc delete subscription nfd -n openshift-nfd
oc delete project openshift-nfd
ARO: Remove GPU MachineSet#
If you created a GPU MachineSet for ARO, scale it down and delete it:
oc scale machineset <gpu-machineset-name> -n openshift-machine-api --replicas=0
oc delete machineset <gpu-machineset-name> -n openshift-machine-api