OpenShift#
Red Hat OpenShift is an enterprise Kubernetes platform for hybrid and multicloud environments. This page covers deploying NIM on self-managed OpenShift Container Platform (OCP), Red Hat OpenShift Service on AWS (ROSA), Azure Red Hat OpenShift (ARO), and Red Hat OpenShift Dedicated on Google Cloud, including GPU Operator setup, deployment, and verification. The commands on this page have been tested on self-managed OCP, ROSA, ARO, and OpenShift Dedicated (GCP).
Note: The instructions for self-managed OCP apply universally to any OpenShift cluster provisioned by the user, whether running on bare metal, VMware vSphere, or cloud VMs such as GCP, AWS, or Azure. No cloud-specific deployment steps are required once the cluster is running.
Prerequisites#
Before deploying NIM on OpenShift, make sure you have the following:
An OpenShift cluster (self-managed OCP, ROSA, ARO, or OpenShift Dedicated on GCP) with GPU-capable nodes
The oc CLI, installed and authenticated with oc login
Helm 3
An NGC API key for pulling NIM container images and downloading model artifacts
Install GPU Operators#
NIM requires NVIDIA GPUs to be available as schedulable resources. This requires the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator.
Install NFD Operator#
To install the NFD Operator and label GPU nodes, complete the following steps:
Create the openshift-nfd namespace:
oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
EOF
Create the OperatorGroup:
oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nfd-operator-group
  namespace: openshift-nfd
spec:
  targetNamespaces:
    - openshift-nfd
EOF
Create the Subscription:
oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: stable
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Wait for the NFD Operator to install:
timeout 300 bash -c 'until oc get csv -n openshift-nfd 2>/dev/null | grep -q Succeeded; do sleep 10; done'
Create a NodeFeatureDiscovery instance to label GPU nodes:
oc apply -f - <<'EOF'
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    servicePort: 12000
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "0300"
            - "0302"
          deviceLabelFields:
            - vendor
EOF
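Once the NodeFeatureDiscovery instance is running, NFD labels nodes by PCI vendor. As a quick check, the following sketch lists the nodes that carry the NVIDIA vendor label (PCI vendor ID 10de). The helper name nvidia_labeled_nodes is hypothetical, and the label key assumes the default NFD label prefix together with the deviceLabelFields: vendor setting in the manifest above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nodes that NFD has labeled as having an NVIDIA PCI device.
# Assumes the default NFD label prefix and PCI vendor ID 10de (NVIDIA).
nvidia_labeled_nodes() {
  oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true -o name
}
```

Source the snippet and run nvidia_labeled_nodes. Empty output suggests that NFD has not finished labeling yet or that the cluster has no GPU nodes.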
Install NVIDIA GPU Operator#
To install the NVIDIA GPU Operator and configure GPU drivers and device plugins, complete the following steps:
Create the nvidia-gpu-operator namespace:
oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator
EOF
Create the OperatorGroup:
oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nvidia-gpu-operator-group
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
    - nvidia-gpu-operator
EOF
Create the Subscription:
oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: v25.10
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
Wait for the GPU Operator to install:
timeout 300 bash -c 'until oc get csv -n nvidia-gpu-operator 2>/dev/null | grep -q Succeeded; do sleep 10; done'
Create a ClusterPolicy to configure GPU drivers and device plugins:
oc apply -f - <<'EOF'
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  operator:
    defaultRuntime: crio
    use_ocp_driver_toolkit: true
  daemonsets:
    priorityClassName: system-node-critical
  dcgm:
    enabled: true
  dcgmExporter:
    enabled: true
  devicePlugin:
    enabled: true
  driver:
    enabled: true
  gfd:
    enabled: true
  nodeStatusExporter:
    enabled: true
  toolkit:
    enabled: true
EOF
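After the ClusterPolicy is created, the GPU Operator reports rollout progress in the policy status. The following sketch polls that status until it reads ready. The function name wait_for_clusterpolicy and its default timeout and interval are assumptions chosen for illustration, not values from the GPU Operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the ClusterPolicy until its status reports "ready".
# Arguments: policy name (default gpu-cluster-policy), timeout in seconds (default 1200),
# poll interval in seconds (default 15, must be greater than 0).
wait_for_clusterpolicy() {
  local name="${1:-gpu-cluster-policy}" timeout="${2:-1200}" interval="${3:-15}"
  local elapsed=0 state=""
  while [ "$elapsed" -le "$timeout" ]; do
    state="$(oc get clusterpolicy "$name" -o jsonpath='{.status.state}' 2>/dev/null)"
    if [ "$state" = "ready" ]; then
      echo "ClusterPolicy $name is ready"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for ClusterPolicy $name (last state: ${state:-unknown})" >&2
  return 1
}
```

Run wait_for_clusterpolicy with no arguments to use the defaults, or pass a longer timeout if the driver build is slow on your nodes.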
Verify GPU Availability#
Wait for the GPU driver to build and for the device plugin to register GPUs. This process can take up to 20 minutes:
oc get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
At least one node should show a GPU count of 1 or more.
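For scripted checks, the per-node column can be reduced to one cluster-wide number. The following sketch sums the allocatable nvidia.com/gpu values and ignores nodes that report no GPUs. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum allocatable GPU counts read from stdin, one value per line.
# Non-numeric entries such as "<none>" are ignored.
sum_allocatable_gpus() {
  awk '$1 ~ /^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Hypothetical helper: print the cluster-wide allocatable GPU count.
cluster_gpu_count() {
  oc get nodes --no-headers \
    -o custom-columns='GPU:.status.allocatable.nvidia\.com/gpu' | sum_allocatable_gpus
}
```

A result of 0 from cluster_gpu_count means the device plugin has not registered any GPUs yet.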
Common Setup#
The following resources are shared across all deployment methods. These commands have been tested on OpenShift 4.18.36 (ROSA), 4.19.20 (ARO), 4.21.0 (self-managed OCP), and 4.21.8 (OpenShift Dedicated on GCP) with NVIDIA GPU Operator v24.6, v25.3, and v25.10.
Create OpenShift Project#
Create an OpenShift project for the NIM deployment:
oc new-project nim-llm
Create NGC Image Pull Secret#
Create an image pull secret so OpenShift can pull NIM container images from nvcr.io:
export NGC_API_KEY=nvapi-xxx
oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}" \
  -n nim-llm
Create NGC API Secret#
Create a secret that stores your NGC API key for use by the NIM container:
oc create secret generic ngc-api \
--namespace nim-llm \
--from-literal=NGC_API_KEY="$NGC_API_KEY"
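Both secrets are created from the NGC_API_KEY shell variable, so an empty or placeholder value produces secrets that fail only later, at image pull or model download time. The following guard is a minimal sketch that can run before the oc create secret commands. The function name check_ngc_api_key is hypothetical, and the only placeholder it recognizes is the nvapi-xxx value used on this page.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail early when NGC_API_KEY is empty or still the placeholder.
check_ngc_api_key() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  if [ "$NGC_API_KEY" = "nvapi-xxx" ]; then
    echo "NGC_API_KEY still holds the placeholder value" >&2
    return 1
  fi
  return 0
}
```

The guard checks only that a value is present. It does not validate the key against NGC.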
Configure the Security Context Constraint#
OpenShift enforces SecurityContextConstraints (SCCs) that restrict which user IDs a pod can run as. The NIM container defaults to UID 1000 (non-root) and supports arbitrary UIDs that OpenShift assigns.
The Helm chart detects OpenShift and selects an SCC profile. The supported profiles are:
restricted-v2
nonroot-v2
The chart renders pod, container, and init container security contexts that satisfy the selected SCC. This includes chart-managed init containers, such as ngc-model-puller, user-provided containers in initContainers.extraInit, and Helm test hook pods that run when you use helm test. The chart also annotates pods with openshift.io/required-scc so admission uses the selected SCC consistently. The openshift.scc value is the source of truth for this annotation. If you also set podAnnotations.openshift.io/required-scc, the chart replaces that annotation with the selected openshift.scc value and prints a note during Helm rendering.
For restricted-v2, user-provided podSecurityContext.runAsUser, podSecurityContext.runAsGroup, and podSecurityContext.fsGroup values from values.yaml are not rendered, and OpenShift injects namespace-valid UID and GID values during admission. For nonroot-v2, the chart keeps the default explicit pod security context values.
When an SCC profile is active, initContainers.extraInit[*].securityContext cannot override SCC-required fields. The chart rejects extra init containers that set privileged: true, allowPrivilegeEscalation: true, runAsUser: 0, or capabilities.add values other than NET_BIND_SERVICE.
Selection works in one of two modes:
When openshift.lookupScc: true, the chart probes the cluster and selects the first profile listed in openshift.sccPriority that exists on the cluster. The Helm identity must have permission to get SCC resources at the cluster scope.
When openshift.lookupScc: false (the default), the chart cannot verify which profiles the cluster supports, so automatic selection returns the first entry in openshift.sccPriority. With the default priority list this resolves to restricted-v2, which is broadly available: it is included with every supported OpenShift release (4.11 and later) and is granted to all authenticated users by default. Override openshift.sccPriority to change which profile is selected in this mode.
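The two selection modes can be illustrated with a small shell function. This is a sketch of the rule as described above, not the chart's template code. The function name select_scc, its argument layout, and the non-zero return when no profile matches are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the selection rule; this is not the chart's implementation.
# Arguments:
#   $1  lookup flag, "true" or "false" (mirrors openshift.lookupScc)
#   $2  space-separated priority list (mirrors openshift.sccPriority)
#   $3  space-separated SCC names present on the cluster (used only when lookup is true)
select_scc() {
  local lookup="$1" priority="$2" existing="$3" candidate name
  for candidate in $priority; do
    if [ "$lookup" != "true" ]; then
      # Without a cluster lookup, the first priority entry wins.
      echo "$candidate"
      return 0
    fi
    for name in $existing; do
      if [ "$name" = "$candidate" ]; then
        # With a lookup, the first priority entry that exists on the cluster wins.
        echo "$candidate"
        return 0
      fi
    done
  done
  # Assumption: report failure when nothing in the priority list can be selected.
  return 1
}
```

For example, select_scc false "restricted-v2 nonroot-v2" "" prints restricted-v2, while select_scc true "restricted-v2 nonroot-v2" "anyuid nonroot-v2" prints nonroot-v2 because restricted-v2 is absent from the example cluster list.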
On OpenShift 4.20 and later, restricted-v3 is the cluster’s default selected SCC for authenticated users. The chart’s openshift.io/required-scc: restricted-v2 annotation (rendered when openshift.annotateRequiredSCC is true, the default) overrides that cluster default and keeps admission on restricted-v2. If you see pod admission errors mentioning restricted-v3 or restricted-v2 access, refer to Troubleshooting OpenShift SCC Admission Failures.
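When a pod is running, OpenShift records the SCC that admitted it in the openshift.io/scc pod annotation. The following sketch prints that annotation for each pod in a namespace, which can help confirm that admission stayed on the expected profile. The function name pod_admitted_scc is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each pod in a namespace with the SCC that admitted it.
# Argument: namespace (default nim-llm).
pod_admitted_scc() {
  local namespace="${1:-nim-llm}"
  oc get pods -n "$namespace" \
    -o custom-columns='NAME:.metadata.name,SCC:.metadata.annotations.openshift\.io/scc'
}
```

Run pod_admitted_scc nim-llm after the chart is deployed and compare the SCC column with the profile that you selected.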
If your service account does not have access to the selected SCC, grant the SCC before you deploy. For example, to grant restricted-v2 to the default service account:
oc adm policy add-scc-to-user restricted-v2 -z default -n nim-llm
To pin a specific SCC instead of using automatic selection, add the following values to your Helm values file:
openshift:
  scc: restricted-v2
Note: Support for the restricted-v3 SCC (introduced in OpenShift 4.20, with pod-level user namespaces) is deferred pending NVIDIA GPU Operator device plugin compatibility with hostUsers: false. When the upstream blocker is resolved, the chart will reintroduce restricted-v3 as an available profile.
Create Model Cache PVC#
Optional: For persistent model caching across pod restarts, create a PVC. NIM downloads models into ephemeral storage at startup when no PVC is configured.
Adjust storageClassName based on your OpenShift edition:
| OpenShift Edition | StorageClass Name |
|---|---|
| Self-managed OCP with ODF | ocs-storagecluster-cephfs |
| Self-managed OCP with NFS | |
| ROSA (AWS) | gp3-csi |
| ARO (Azure) | |
| OpenShift Dedicated (GCP) | standard-csi |
Note: gp3-csi (ROSA) and standard-csi (OpenShift Dedicated on GCP) only support ReadWriteOnce. Change the accessModes in the PVC manifest from ReadWriteMany to ReadWriteOnce when using these StorageClasses.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc
  namespace: nim-llm
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ocs-storagecluster-cephfs # adjust for your edition or skip to use default
  resources:
    requests:
      storage: 200Gi
EOF
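If you generate the PVC manifest from a script, the access mode rule from the note above can be encoded in a small lookup. The following sketch returns ReadWriteOnce for the two StorageClasses that the note names and falls back to ReadWriteMany, the manifest default, for anything else. The function name access_mode_for is hypothetical, and the fallback is an assumption: confirm that your StorageClass actually supports ReadWriteMany.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a PVC access mode from the StorageClass name.
# gp3-csi and standard-csi support only ReadWriteOnce (per the note above).
# Every other name falls through to ReadWriteMany, matching the manifest default.
access_mode_for() {
  case "$1" in
    gp3-csi|standard-csi) echo "ReadWriteOnce" ;;
    *) echo "ReadWriteMany" ;;
  esac
}
```

For example, access_mode_for gp3-csi prints ReadWriteOnce.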
Deploy NIM with Helm#
The NIM LLM Helm chart is compatible with OpenShift. The chart defaults handle security context, GPU resources (nvidia.com/gpu: 1), image pull secrets (ngc-secret), the NGC API secret (ngc-api), and service configuration. Override only the values that are specific to your deployment.
Single GPU Deployment#
The following examples show single-GPU deployment options using either a model-specific NIM image or a model-free NIM image.
Model-Specific NIM#
To deploy a model-specific NIM container, complete the following steps:
Create a values file:
cat <<EOF | tee custom-values-openshift.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.4-pb6.0"
model:
  name: "meta/llama3.1-8b-instruct"
EOF
Deploy the chart:
helm upgrade --install my-nim ./helm \
  -f custom-values-openshift.yaml \
  -n nim-llm
Model-Free NIM#
To deploy a model-free NIM container that uses the vLLM-OSS stack, complete the following steps:
Create a values file and set the model through environment variables:
cat <<EOF | tee custom-values-openshift-vllm-oss.yaml
image:
  repository: "<NIM_LLM_MODEL_FREE_IMAGE>"
  tag: "2.0.4-pb6.0"
model:
  name: "my-model"
env:
  - name: NIM_MODEL_PATH
    value: "hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  - name: NIM_SERVED_MODEL_NAME
    value: "my-model"
  - name: NIM_MAX_MODEL_LEN
    value: "2048"
EOF
Optional: For gated Hugging Face models, create an HF token secret:
oc create secret generic hf-token --namespace nim-llm --from-literal=HF_TOKEN="${HF_TOKEN}"
Optional: Add model.hfTokenSecret: hf-token to your values file.
Deploy the chart:
helm upgrade --install my-nim ./helm \
  -f custom-values-openshift-vllm-oss.yaml \
  -n nim-llm
Enable Persistent Cache#
Optional: To use the PVC created earlier, add these lines to your values file:
persistence:
  enabled: true
  existingClaim: "nvidia-nim-cache-pvc"
Monitor Deployment#
Wait for the pod to be ready:
oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -w
Wait until the pod status is Running and ready, then press Ctrl+C.
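For automation, where an interactive watch is inconvenient, oc wait can block until the pods report the Ready condition. The following is a minimal sketch. The function name wait_for_nim_ready and the 30-minute default timeout are assumptions, so size the timeout for your model download and load time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until NIM pods report Ready, or fail after a timeout.
# Arguments: namespace (default nim-llm), timeout (default 1800s).
wait_for_nim_ready() {
  local namespace="${1:-nim-llm}" timeout="${2:-1800s}"
  oc wait pod -n "$namespace" \
    -l "app.kubernetes.io/name=nim-llm" \
    --for=condition=Ready \
    --timeout="$timeout"
}
```

Depending on the client version, oc wait can return an error immediately when no pods match the selector yet, so run it after the pods have been created.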
Expose NIM Service#
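OpenShift Routes provide an externally resolvable hostname and can terminate TLS. The oc expose command in the following steps creates an unsecured (HTTP) Route, which is why the examples on this page use http://. Create a Route for external access by using the following steps: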
Create the Route:
oc expose svc/my-nim-nim-llm --port=8000 -n nim-llm
Get the route host:
export NIM_URL=$(oc get route my-nim-nim-llm -n nim-llm -o jsonpath='{.spec.host}')
Print the endpoint:
echo "NIM endpoint: http://${NIM_URL}"
Alternatively, use port-forward for local testing:
oc -n nim-llm port-forward svc/my-nim-nim-llm 8000:8000
Verify Deployment#
Verify that the deployment is ready and can serve inference requests through the OpenShift Route. If you are using port-forward for local testing, replace http://${NIM_URL} with http://127.0.0.1:8000 in the following commands.
Call the readiness endpoint to confirm that the service is ready:
curl -s http://${NIM_URL}/v1/health/ready
Query the /v1/models endpoint to discover the model name to use in inference requests:
curl -s http://${NIM_URL}/v1/models | python3 -m json.tool
Send a chat completion request using the model name returned by /v1/models:
curl -X POST http://${NIM_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }' | python3 -m json.tool
Use the same model value as NIM_SERVED_MODEL_NAME for a model-free NIM or model.name for a model-specific NIM.
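To avoid copying the model name by hand, you can extract it from the /v1/models response. The following sketch assumes the OpenAI-style response shape, with a top-level data array whose entries carry an id field, and prints the first id. The function name first_model_id is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a /v1/models JSON response on stdin
# and print the id of the first listed model.
first_model_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["data"][0]["id"])'
}

# Example usage (requires NIM_URL to be set and the service to be reachable):
#   MODEL="$(curl -s "http://${NIM_URL}/v1/models" | first_model_id)"
```

You can then substitute the MODEL value for the model field in the chat completion request.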
Run Helm Tests#
Verify your deployment with built-in tests:
helm test my-nim -n nim-llm
View Logs#
Stream real-time logs from your NIM pods:
oc -n nim-llm logs -l "app.kubernetes.io/name=nim-llm" -f
Cleanup#
Use this section to remove the OpenShift deployment and related resources.
Uninstall NIM#
Remove the NIM deployment resources in the following order.
Uninstall the Helm release:
helm uninstall my-nim -n nim-llm
Delete the persistent volume claims:
oc delete pvc -n nim-llm -l app.kubernetes.io/name=nim-llm
Delete the project:
oc delete project nim-llm
Remove GPU Operators#
Remove the GPU Operator resources first, and then remove the NFD resources.
Delete the GPU Operator ClusterPolicy:
oc delete clusterpolicy gpu-cluster-policy
Delete the GPU Operator CSV and subscription:
oc delete csv -n nvidia-gpu-operator -l operators.coreos.com/gpu-operator-certified.nvidia-gpu-operator
oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
Delete the GPU Operator project:
oc delete project nvidia-gpu-operator
Delete the NodeFeatureDiscovery instance:
oc delete nfd nfd-instance -n openshift-nfd
Delete the NFD CSV and subscription:
oc delete csv -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd
oc delete subscription nfd -n openshift-nfd
Delete the NFD project:
oc delete project openshift-nfd
OpenShift Dedicated (GCP): Remove GPU Machine Pool#
If you created a GPU machine pool for OpenShift Dedicated on GCP, delete it:
export CLUSTER_NAME="<your-cluster-name>"
ocm delete machinepool --cluster=${CLUSTER_NAME} gpu-pool
ROSA: Remove GPU Machine Pool#
If you created a GPU machine pool for ROSA, delete it:
export CLUSTER_NAME="<your-cluster-name>"
export GPU_POOL_NAME="<your-gpu-pool-name>"
rosa delete machinepool ${GPU_POOL_NAME} -c ${CLUSTER_NAME} --yes
ARO: Remove GPU MachineSet#
If you created a GPU MachineSet for ARO, scale it down and delete it:
export GPU_MACHINESET_NAME="<your-gpu-machineset-name>"
oc scale machineset ${GPU_MACHINESET_NAME} -n openshift-machine-api --replicas=0
oc delete machineset ${GPU_MACHINESET_NAME} -n openshift-machine-api