OpenShift#

Red Hat OpenShift is an enterprise Kubernetes platform for hybrid and multi-cloud environments. NIM can be deployed on both self-managed OpenShift Container Platform (OCP) and Azure Red Hat OpenShift (ARO); the commands in this guide have been tested on both. The guide covers GPU operator setup, NIM deployment with Helm, and verification.

Prerequisites#

Before deploying NIM on OpenShift, ensure you have:

  • An OpenShift cluster (self-managed OCP or ARO) with GPU-capable nodes

  • oc CLI installed and authenticated (oc login)

  • Helm 3 installed

  • NGC API key for pulling container images
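
A quick preflight sketch to confirm the tooling is in place before you start (assumes nothing beyond a POSIX shell; each check is skipped gracefully if a CLI is missing):

```shell
# Check that the required CLIs are installed before starting.
MISSING=""
for cmd in oc helm; do
  command -v "$cmd" >/dev/null 2>&1 || MISSING="$MISSING $cmd"
done

if [ -z "$MISSING" ]; then
  oc version --client
  helm version --short
  # 'oc whoami' fails if you have not run 'oc login' yet.
  oc whoami || echo "Not logged in: run 'oc login' first"
else
  echo "Missing CLIs:$MISSING"
fi
```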

Install GPU Operators#

NIM requires NVIDIA GPUs to be available as schedulable resources. This requires the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator.

Install NFD Operator#

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nfd-operator-group
  namespace: openshift-nfd
spec:
  targetNamespaces:
    - openshift-nfd
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: stable
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

Wait for the NFD Operator to install:

timeout 300 bash -c 'until oc get csv -n openshift-nfd 2>/dev/null | grep -q Succeeded; do sleep 10; done'

Create a NodeFeatureDiscovery instance to label GPU nodes:

oc apply -f - <<'EOF'
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    servicePort: 12000
  workerConfig:
    configData: |
      sources:
        pci:
          deviceClassWhitelist:
            - "0300"
            - "0302"
          deviceLabelFields:
            - vendor
EOF

Install NVIDIA GPU Operator#

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nvidia-gpu-operator-group
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
    - nvidia-gpu-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: v25.10
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF

Wait for the GPU Operator to install:

timeout 300 bash -c 'until oc get csv -n nvidia-gpu-operator 2>/dev/null | grep -q Succeeded; do sleep 10; done'

Create a ClusterPolicy to configure GPU drivers and device plugins:

oc apply -f - <<'EOF'
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  operator:
    defaultRuntime: crio
    use_ocp_driver_toolkit: true
  daemonsets:
    priorityClassName: system-node-critical
  dcgm:
    enabled: true
  dcgmExporter:
    enabled: true
  devicePlugin:
    enabled: true
  driver:
    enabled: true
  gfd:
    enabled: true
  nodeStatusExporter:
    enabled: true
  toolkit:
    enabled: true
EOF
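
Like the operator installs above, the ClusterPolicy takes time to roll out all components. A sketch of an equivalent wait, assuming the operator publishes .status.state on the ClusterPolicy (recent GPU Operator releases report ready once all components are up):

```shell
# Poll (up to 20 minutes) until the ClusterPolicy reports state=ready.
wait_for_clusterpolicy() {
  timeout 1200 bash -c \
    'until oc get clusterpolicy gpu-cluster-policy \
       -o jsonpath="{.status.state}" 2>/dev/null | grep -q ready; do
       sleep 15
     done'
}

# Only attempt the wait when the oc CLI is available.
if command -v oc >/dev/null 2>&1; then
  wait_for_clusterpolicy
fi
```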

Verify GPU Availability#

Wait for the GPU driver to build and the device plugin to register GPUs (this can take up to 20 minutes):

oc get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

At least one node should show a GPU count of 1 or more.
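
To confirm the driver stack end to end, you can also run nvidia-smi inside one of the operator's driver pods (a sketch; the app=nvidia-driver-daemonset label is the operator's default and may differ if you customized the ClusterPolicy):

```shell
# Run nvidia-smi inside the first driver daemonset pod found.
DRIVER_SELECTOR="app=nvidia-driver-daemonset"

if command -v oc >/dev/null 2>&1; then
  POD=$(oc get pods -n nvidia-gpu-operator -l "$DRIVER_SELECTOR" \
    -o jsonpath='{.items[0].metadata.name}')
  oc exec -n nvidia-gpu-operator "$POD" -- nvidia-smi
fi
```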

Common Setup#

The following resources are shared across all deployment methods. These commands have been tested on OpenShift 4.21.0 with NVIDIA GPU Operator v25.10.

Create OpenShift Project#

oc new-project nim-llm

Create NGC Image Pull Secret#

export NGC_API_KEY=nvapi-xxx

oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=${NGC_API_KEY} \
  -n nim-llm

Create NGC API Secret#

oc create secret generic ngc-api \
  --namespace nim-llm \
  --from-literal=NGC_API_KEY="$NGC_API_KEY"

Grant Security Context Constraint#

OpenShift enforces SecurityContextConstraints (SCCs) that restrict which user IDs a pod can run as. The NIM container runs as UID 1000 (non-root). Grant the nonroot-v2 SCC to allow this:

oc adm policy add-scc-to-user nonroot-v2 -z default -n nim-llm

The Helm chart defaults (runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000) are compatible with the nonroot-v2 SCC. No additional security context configuration is required.

Optional: Create Model Cache PVC#

For persistent model caching across pod restarts, create a PVC. This is optional because NIM downloads models into ephemeral storage at startup when no PVC is configured.

Adjust storageClassName based on your OpenShift edition. Note that the PVC below requests ReadWriteMany; managed-premium on ARO provisions Azure Disk volumes, which support only ReadWriteOnce, so adjust accessModes (or choose an Azure Files-backed class) if you use it:

OpenShift Edition            StorageClass Name
Self-managed OCP with ODF    ocs-storagecluster-cephfs
Self-managed OCP with NFS    nfs-client
ARO (Azure)                  managed-premium

cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc
  namespace: nim-llm
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ocs-storagecluster-cephfs   # adjust for your edition or skip to use default
  resources:
    requests:
      storage: 200Gi
EOF

Deploy NIM with Helm#

The NIM LLM Helm chart is compatible with OpenShift. The chart defaults handle security context, GPU resources (nvidia.com/gpu: 1), image pull secrets (ngc-secret), NGC API secret (ngc-api), and service configuration. You only need to override values specific to your deployment.

Single GPU Deployment (Llama 3.1 8B Instruct)#

Option A: Pre-built NIM image

cat <<EOF | tee custom-values-openshift.yaml
image:
  repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
  tag: "2.0.0"

model:
  name: "meta/llama3.1-8b-instruct"
EOF

helm upgrade --install my-nim ./helm \
  -f custom-values-openshift.yaml \
  -n nim-llm

Option B: vLLM-OSS stack (multi-model image)

For model-free NIM using the vLLM-OSS stack, set the model through environment variables:

cat <<EOF | tee custom-values-openshift-vllm-oss.yaml
image:
  repository: "<NIM_LLM_MODEL_FREE_IMAGE>"
  tag: "2.0.0"

model:
  name: "my-model"

env:
  - name: NIM_MODEL_PATH
    value: "hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  - name: NIM_SERVED_MODEL_NAME
    value: "my-model"
  - name: NIM_MAX_MODEL_LEN
    value: "2048"
EOF

helm upgrade --install my-nim ./helm \
  -f custom-values-openshift-vllm-oss.yaml \
  -n nim-llm

For gated Hugging Face models, create an HF token secret:

oc create secret generic hf-token --namespace nim-llm --from-literal=HF_TOKEN="${HF_TOKEN}"

Then add the secret reference to your values file:

model:
  hfTokenSecret: hf-token

Optional: Enable Persistent Cache#

To use the PVC created earlier, add these lines to your values file:

persistence:
  enabled: true
  existingClaim: "nvidia-nim-cache-pvc"

Monitor Deployment#

Wait for the pod to be ready:

oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -w

Wait until the pod status shows Running with all containers ready (for example 1/1), then press Ctrl+C.
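
The interactive watch can be replaced with a blocking wait; a sketch using oc wait (the chart label matches the Helm commands above, and the timeout is generous because the first start downloads the model):

```shell
# Block until every NIM pod reports the Ready condition (up to 30 minutes).
wait_for_nim() {
  oc wait --for=condition=Ready pod \
    -l "app.kubernetes.io/name=nim-llm" \
    -n nim-llm --timeout=1800s
}

if command -v oc >/dev/null 2>&1; then
  wait_for_nim
fi
```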

Expose NIM Service#

OpenShift Routes provide externally resolvable DNS names and can optionally terminate TLS. Create an unsecured (plain HTTP) Route for external access:

oc expose svc/my-nim-nim-llm --port=8000 -n nim-llm

export NIM_URL=$(oc get route my-nim-nim-llm -n nim-llm -o jsonpath='{.spec.host}')
echo "NIM endpoint: http://${NIM_URL}"

Alternatively, use port-forward for local testing:

oc -n nim-llm port-forward svc/my-nim-nim-llm 8000:8000

Verify Deployment#

curl -s http://${NIM_URL}/v1/health/ready

curl -s http://${NIM_URL}/v1/models | python3 -m json.tool

curl -X POST http://${NIM_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }' | python3 -m json.tool

Use the same model value as NIM_SERVED_MODEL_NAME (for vLLM-OSS) or model.name (for pre-built NIM).
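
The same request can be issued programmatically; a minimal Python sketch using only the standard library (the NIM_URL hostname and model name come from the steps above, and the network call only runs when NIM_URL is set in the environment):

```python
import json
import os
import urllib.request


def build_chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


if "NIM_URL" in os.environ:
    print(chat(f"http://{os.environ['NIM_URL']}",
               "meta/llama3.1-8b-instruct", "Hello!"))
```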

Run Helm Tests#

helm test my-nim -n nim-llm

View Logs#

oc -n nim-llm logs -l "app.kubernetes.io/name=nim-llm" -f

Cleanup#

Uninstall NIM#

helm uninstall my-nim -n nim-llm
oc delete pvc -n nim-llm -l app.kubernetes.io/name=nim-llm
oc delete pvc nvidia-nim-cache-pvc -n nim-llm --ignore-not-found
oc delete project nim-llm

Remove GPU Operators#

oc delete clusterpolicy gpu-cluster-policy
oc delete csv -n nvidia-gpu-operator -l operators.coreos.com/gpu-operator-certified.nvidia-gpu-operator
oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
oc delete project nvidia-gpu-operator

oc delete nfd nfd-instance -n openshift-nfd
oc delete csv -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd
oc delete subscription nfd -n openshift-nfd
oc delete project openshift-nfd

ARO: Remove GPU MachineSet#

If you created a GPU MachineSet for ARO, scale it down and delete it:

oc scale machineset <gpu-machineset-name> -n openshift-machine-api --replicas=0
oc delete machineset <gpu-machineset-name> -n openshift-machine-api