OpenShift#
Red Hat OpenShift is an enterprise Kubernetes platform for hybrid and multicloud environments. This page shows how to deploy NIM on self-managed OpenShift Container Platform (OCP) and on Azure Red Hat OpenShift (ARO), covering GPU Operator setup, deployment, and verification. The commands on this page have been tested on both self-managed OCP and ARO.
Prerequisites#
Before deploying NIM on OpenShift, make sure you have the following:
An OpenShift cluster (self-managed OCP or ARO) with GPU-capable nodes
The oc CLI, installed and authenticated with oc login
Helm 3
An NGC API key for pulling NIM container images and downloading model artifacts
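Before starting, you can optionally sanity-check these prerequisites from a shell. The helper below is a sketch (the function name `require` and the tool list are assumptions based on the prerequisites above, not part of the official instructions):

```shell
#!/usr/bin/env bash
# Report any prerequisite CLI tools that are missing from PATH.
require() {
  command -v "$1" >/dev/null 2>&1 || echo "missing: $1"
}

# Tools this page assumes; adjust the list as needed.
for tool in oc helm curl python3; do
  require "$tool"
done

# The NGC API key must be set before creating the secrets below.
[ -n "${NGC_API_KEY:-}" ] || echo "missing: NGC_API_KEY environment variable"
```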
Install GPU Operators#
NIM requires NVIDIA GPUs to be available as schedulable resources. This requires the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator.
Install NFD Operator#
To install the NFD Operator and label GPU nodes, complete the following steps:
Create the openshift-nfd namespace:

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
EOF
Create the OperatorGroup:

oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nfd-operator-group
  namespace: openshift-nfd
spec:
  targetNamespaces:
  - openshift-nfd
EOF
Create the Subscription:

oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: stable
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Wait for the NFD Operator to install:
timeout 300 bash -c 'until oc get csv -n openshift-nfd 2>/dev/null | grep -q Succeeded; do sleep 10; done'
Create a NodeFeatureDiscovery instance to label GPU nodes:

oc apply -f - <<'EOF'
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    servicePort: 12000
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
          - "0300"
          - "0302"
          deviceLabelFields:
          - vendor
EOF
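The timeout/until polling pattern used above recurs for each operator on this page. If you prefer, you can factor it into a small helper; this is a sketch (the `wait_for` name is an assumption, not part of the official instructions):

```shell
# Poll a command until its output matches a pattern, or give up after a timeout.
# Usage: wait_for <timeout_seconds> <pattern> <command...>
wait_for() {
  local timeout=$1 pattern=$2; shift 2
  local waited=0
  until "$@" 2>/dev/null | grep -q "$pattern"; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
  done
}

# Example against a cluster (requires oc access):
#   wait_for 300 Succeeded oc get csv -n openshift-nfd
```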
Install NVIDIA GPU Operator#
To install the NVIDIA GPU Operator and configure GPU drivers and device plugins, complete the following steps:
Create the nvidia-gpu-operator namespace:

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator
EOF
Create the OperatorGroup:

oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nvidia-gpu-operator-group
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
  - nvidia-gpu-operator
EOF
Create the Subscription:

oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: v25.10
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
Wait for the GPU Operator to install:
timeout 300 bash -c 'until oc get csv -n nvidia-gpu-operator 2>/dev/null | grep -q Succeeded; do sleep 10; done'
Create a ClusterPolicy to configure GPU drivers and device plugins:

oc apply -f - <<'EOF'
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  operator:
    defaultRuntime: crio
    use_ocp_driver_toolkit: true
  daemonsets:
    priorityClassName: system-node-critical
  dcgm:
    enabled: true
  dcgmExporter:
    enabled: true
  devicePlugin:
    enabled: true
  driver:
    enabled: true
  gfd:
    enabled: true
  nodeStatusExporter:
    enabled: true
  toolkit:
    enabled: true
EOF
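The ClusterPolicy reports its progress in .status.state, which reads ready once all GPU operands are deployed. The extraction helper below is a sketch (the `clusterpolicy_state` name is an assumption); in practice you would feed it the JSON from `oc get clusterpolicy gpu-cluster-policy -o json`:

```shell
# Print the ClusterPolicy state ("ready" once all GPU operands are up),
# reading the resource JSON from stdin.
clusterpolicy_state() {
  python3 -c '
import json, sys
print(json.load(sys.stdin).get("status", {}).get("state", "<unknown>"))
'
}

# Real usage (requires cluster access):
#   oc get clusterpolicy gpu-cluster-policy -o json | clusterpolicy_state
```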
Verify GPU Availability#
Wait for the GPU driver to build and for the device plugin to register GPUs. This process can take up to 20 minutes:
oc get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
At least one node should show a GPU count of 1 or more.
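If you want a single cluster-wide total rather than the per-node column above, you can sum the allocatable GPUs from the node list JSON. This is a sketch (the `total_gpus` name is an assumption); in practice you would feed it `oc get nodes -o json`:

```shell
# Sum nvidia.com/gpu allocatable across all nodes, reading
# `oc get nodes -o json` output from stdin.
total_gpus() {
  python3 -c '
import json, sys
nodes = json.load(sys.stdin)["items"]
total = sum(int(n["status"]["allocatable"].get("nvidia.com/gpu", 0)) for n in nodes)
print(total)
'
}

# Real usage (requires cluster access):
#   oc get nodes -o json | total_gpus
```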
Common Setup#
The following resources are shared across all deployment methods. These commands have been tested on OpenShift 4.21.0 with NVIDIA GPU Operator v25.10.
Create OpenShift Project#
Create an OpenShift project for the NIM deployment:
oc new-project nim-llm
Create NGC Image Pull Secret#
Create an image pull secret so OpenShift can pull NIM container images from nvcr.io:
export NGC_API_KEY=nvapi-xxx
oc create secret docker-registry ngc-secret \
--docker-server=nvcr.io \
--docker-username='$oauthtoken' \
--docker-password=${NGC_API_KEY} \
-n nim-llm
Create NGC API Secret#
Create a secret that stores your NGC API key for use by the NIM container:
oc create secret generic ngc-api \
--namespace nim-llm \
--from-literal=NGC_API_KEY="$NGC_API_KEY"
Grant the Security Context Constraint#
OpenShift enforces SecurityContextConstraints (SCCs), which restrict the user IDs that a pod can run as. The NIM container runs as UID 1000 (non-root). Grant the nonroot-v2 SCC to allow this:
oc adm policy add-scc-to-user nonroot-v2 -z default -n nim-llm
The Helm chart defaults (runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000) are compatible with the nonroot-v2 SCC. No additional security context configuration is required.
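To confirm which SCC admission actually applied to a running pod, OpenShift records it in the pod's openshift.io/scc annotation. The extraction helper below is a sketch (the `scc_of` name is an assumption); feed it the pod JSON:

```shell
# Print the SCC that admission applied to a pod, reading the pod JSON from stdin.
scc_of() {
  python3 -c '
import json, sys
pod = json.load(sys.stdin)
print(pod["metadata"].get("annotations", {}).get("openshift.io/scc", "<none>"))
'
}

# Real usage (requires cluster access); expect nonroot-v2 after the grant above:
#   oc get pod <pod-name> -n nim-llm -o json | scc_of
```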
Optional: Create Model Cache PVC#
For persistent model caching across pod restarts, create a PVC. This step is optional because NIM downloads models into ephemeral storage at startup when no PVC is configured.
Adjust storageClassName based on your OpenShift edition:
| OpenShift Edition | StorageClass Name |
|---|---|
| Self-managed OCP with ODF | ocs-storagecluster-cephfs |
| Self-managed OCP with NFS | the class created by your NFS provisioner (for example, nfs-client) |
| ARO (Azure) | azurefile-csi |
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nvidia-nim-cache-pvc
namespace: nim-llm
spec:
accessModes:
- ReadWriteMany
storageClassName: ocs-storagecluster-cephfs # adjust for your edition or skip to use default
resources:
requests:
storage: 200Gi
EOF
Deploy NIM with Helm#
The NIM LLM Helm chart is compatible with OpenShift. The chart defaults handle security context, GPU resources (nvidia.com/gpu: 1), image pull secrets (ngc-secret), the NGC API secret (ngc-api), and service configuration. Override only the values that are specific to your deployment.
Single GPU Deployment (Llama 3.1 8B Instruct)#
Model-Specific NIM#
To deploy a model-specific NIM container, complete the following steps:
Create a values file:
cat <<EOF | tee custom-values-openshift.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.1"
model:
  name: "meta/llama3.1-8b-instruct"
EOF
Deploy the chart:
helm upgrade --install my-nim ./helm \
  -f custom-values-openshift.yaml \
  -n nim-llm
Model-Free NIM#
To deploy a model-free NIM container using the vLLM-OSS stack, complete the following steps:
Create a values file and set the model through environment variables:
cat <<EOF | tee custom-values-openshift-vllm-oss.yaml
image:
  repository: "<NIM_LLM_MODEL_FREE_IMAGE>"
  tag: "2.0.1"
model:
  name: "my-model"
env:
  - name: NIM_MODEL_PATH
    value: "hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  - name: NIM_SERVED_MODEL_NAME
    value: "my-model"
  - name: NIM_MAX_MODEL_LEN
    value: "2048"
EOF
Optional: For gated Hugging Face models, create an HF token secret:
oc create secret generic hf-token --namespace nim-llm --from-literal=HF_TOKEN="${HF_TOKEN}"
Optional: Add model.hfTokenSecret: hf-token to your values file.

Deploy the chart:
helm upgrade --install my-nim ./helm \
  -f custom-values-openshift-vllm-oss.yaml \
  -n nim-llm
Optional: Enable Persistent Cache#
To use the PVC created earlier, add these lines to your values file:
persistence:
enabled: true
existingClaim: "nvidia-nim-cache-pvc"
Monitor Deployment#
Wait for the pod to be ready:
oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -w
Wait until the pod status is Running and ready, then press Ctrl+C.
Expose NIM Service#
OpenShift Routes provide external DNS and optional TLS termination. To create a Route for external access, complete the following steps:
Create the Route:
oc expose svc/my-nim-nim-llm --port=8000 -n nim-llm
Get the route host:
export NIM_URL=$(oc get route my-nim-nim-llm -n nim-llm -o jsonpath='{.spec.host}')
Print the endpoint:
echo "NIM endpoint: http://${NIM_URL}"
Alternatively, use port-forward for local testing:
oc -n nim-llm port-forward svc/my-nim-nim-llm 8000:8000
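Right after deployment, the readiness endpoint may return non-200 responses while the model loads. A small polling loop saves checking it by hand; this is a sketch (the `wait_until_ready` name is an assumption, and the endpoint path comes from the verification steps below):

```shell
# Poll a readiness URL until it returns HTTP 200 or the timeout elapses.
# Usage: wait_until_ready <url> [timeout_seconds]
wait_until_ready() {
  local url=$1 timeout=${2:-600} waited=0
  while [ "$(curl -s -o /dev/null -w '%{http_code}' "$url")" != "200" ]; do
    sleep 5
    waited=$((waited + 5))
    if [ "$waited" -ge "$timeout" ]; then
      echo "service not ready after ${timeout}s" >&2
      return 1
    fi
  done
  echo "ready"
}

# Real usage: wait_until_ready "http://${NIM_URL}/v1/health/ready"
```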
Verify Deployment#
Verify that the deployment is ready and can serve inference requests through the OpenShift Route. If you are using port-forward for local testing, replace http://${NIM_URL} with http://127.0.0.1:8000 in the following commands.
Call the readiness endpoint to confirm that the service is ready:
curl -s http://${NIM_URL}/v1/health/ready
Query the /v1/models endpoint to discover the model name to use in inference requests:

curl -s http://${NIM_URL}/v1/models | python3 -m json.tool
Send a chat completion request using the model name returned by /v1/models:

curl -X POST http://${NIM_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }' | python3 -m json.tool
Use the same model value as NIM_SERVED_MODEL_NAME for a model-free NIM or model.name for a model-specific NIM.
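The chat completion response is OpenAI-compatible JSON, with the assistant's text at choices[0].message.content. A small extraction helper (a sketch; the `reply_text` name is an assumption) prints only the reply instead of the full response:

```shell
# Print only the assistant message from an OpenAI-style chat completion
# response read from stdin.
reply_text() {
  python3 -c '
import json, sys
resp = json.load(sys.stdin)
print(resp["choices"][0]["message"]["content"])
'
}

# Real usage: pipe the chat completion curl command above into reply_text
# instead of python3 -m json.tool.
```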
Run Helm Tests#
Verify your deployment with built-in tests:
helm test my-nim -n nim-llm
View Logs#
Stream real-time logs from your NIM pods:
oc -n nim-llm logs -l "app.kubernetes.io/name=nim-llm" -f
Cleanup#
Uninstall NIM#
Remove the NIM deployment resources in the following order.
Uninstall the Helm release:
helm uninstall my-nim -n nim-llm
Delete the persistent volume claims:
oc delete pvc -n nim-llm -l app.kubernetes.io/name=nim-llm
Delete the project:
oc delete project nim-llm
Remove GPU Operators#
Remove the GPU Operator resources first, and then remove the NFD resources.
Delete the GPU Operator ClusterPolicy:

oc delete clusterpolicy gpu-cluster-policy
Delete the GPU Operator CSV and subscription:
oc delete csv -n nvidia-gpu-operator -l operators.coreos.com/gpu-operator-certified.nvidia-gpu-operator
oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
Delete the GPU Operator project:
oc delete project nvidia-gpu-operator
Delete the NodeFeatureDiscovery instance:

oc delete nfd nfd-instance -n openshift-nfd
Delete the NFD CSV and subscription:
oc delete csv -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd
oc delete subscription nfd -n openshift-nfd
Delete the NFD project:
oc delete project openshift-nfd
ARO: Remove GPU MachineSet#
If you created a GPU MachineSet for ARO, scale it down and delete it:
oc scale machineset <gpu-machineset-name> -n openshift-machine-api --replicas=0
oc delete machineset <gpu-machineset-name> -n openshift-machine-api