OpenShift#

Red Hat OpenShift is an enterprise Kubernetes platform for hybrid and multicloud environments. This page describes how to deploy NIM on self-managed OpenShift Container Platform (OCP), Red Hat OpenShift Service on AWS (ROSA), Azure Red Hat OpenShift (ARO), and Red Hat OpenShift Dedicated on Google Cloud, covering GPU Operator setup, deployment, and verification. The commands on this page have been tested on self-managed OCP, ROSA, ARO, and OpenShift Dedicated (GCP).

Note: The instructions for self-managed OCP apply to any user-provisioned OpenShift cluster, whether it runs on bare metal, VMware vSphere, or cloud VMs on GCP, AWS, or Azure. No cloud-specific deployment steps are required once the cluster is running.

Prerequisites#

Before deploying NIM on OpenShift, make sure you have the following:

  • An OpenShift cluster (self-managed OCP, ROSA, ARO, or OpenShift Dedicated on GCP) with GPU-capable nodes

  • The oc CLI, installed and authenticated with oc login

  • Helm 3

  • An NGC API key for pulling NIM container images and downloading model artifacts
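
As a quick pre-flight check, you can confirm the tooling and cluster login before proceeding (a sketch, assuming the tools are on your PATH):

```shell
# Verify CLI tooling and cluster authentication before starting.
oc version --client          # oc CLI is installed
helm version --short         # Helm 3.x is installed
oc whoami                    # confirms you are logged in to a cluster
oc whoami --show-server      # prints the API endpoint you are logged in to
```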

Install GPU Operators#

NIM requires NVIDIA GPUs to be available as schedulable resources. This requires the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator.

Install NFD Operator#

To install the NFD Operator and label GPU nodes, complete the following steps:

  1. Create the openshift-nfd namespace:

    oc apply -f - <<'EOF'
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-nfd
    EOF
    
  2. Create the OperatorGroup:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: nfd-operator-group
      namespace: openshift-nfd
    spec:
      targetNamespaces:
        - openshift-nfd
    EOF
    
  3. Create the Subscription:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: nfd
      namespace: openshift-nfd
    spec:
      channel: stable
      name: nfd
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
    
  4. Wait for the NFD Operator to install:

    timeout 300 bash -c 'until oc get csv -n openshift-nfd 2>/dev/null | grep -q Succeeded; do sleep 10; done'
    
  5. Create a NodeFeatureDiscovery instance to label GPU nodes:

    oc apply -f - <<'EOF'
    apiVersion: nfd.openshift.io/v1
    kind: NodeFeatureDiscovery
    metadata:
      name: nfd-instance
      namespace: openshift-nfd
    spec:
      operand:
        servicePort: 12000
      workerConfig:
        configData: |
          sources:
            pci:
              deviceClassWhitelist:
                - "0300"
                - "0302"
              deviceLabelFields:
                - vendor
    EOF
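
Once the NodeFeatureDiscovery instance is running, NFD labels nodes that carry an NVIDIA PCI device (vendor ID 10de). As a sketch, you can confirm that your GPU nodes were labeled; the label key below is the one NFD conventionally emits for vendor labels, so verify it against your NFD version:

```shell
# List nodes that NFD labeled as having an NVIDIA (vendor 10de) PCI device.
oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
```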
    

Install NVIDIA GPU Operator#

To install the NVIDIA GPU Operator and configure GPU drivers and device plugins, complete the following steps:

  1. Create the nvidia-gpu-operator namespace:

    oc apply -f - <<'EOF'
    apiVersion: v1
    kind: Namespace
    metadata:
      name: nvidia-gpu-operator
    EOF
    
  2. Create the OperatorGroup:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: nvidia-gpu-operator-group
      namespace: nvidia-gpu-operator
    spec:
      targetNamespaces:
        - nvidia-gpu-operator
    EOF
    
  3. Create the Subscription:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: gpu-operator-certified
      namespace: nvidia-gpu-operator
    spec:
      channel: v25.10
      name: gpu-operator-certified
      source: certified-operators
      sourceNamespace: openshift-marketplace
    EOF
    
  4. Wait for the GPU Operator to install:

    timeout 300 bash -c 'until oc get csv -n nvidia-gpu-operator 2>/dev/null | grep -q Succeeded; do sleep 10; done'
    
  5. Create a ClusterPolicy to configure GPU drivers and device plugins:

    oc apply -f - <<'EOF'
    apiVersion: nvidia.com/v1
    kind: ClusterPolicy
    metadata:
      name: gpu-cluster-policy
    spec:
      operator:
        defaultRuntime: crio
        use_ocp_driver_toolkit: true
      daemonsets:
        priorityClassName: system-node-critical
      dcgm:
        enabled: true
      dcgmExporter:
        enabled: true
      devicePlugin:
        enabled: true
      driver:
        enabled: true
      gfd:
        enabled: true
      nodeStatusExporter:
        enabled: true
      toolkit:
        enabled: true
    EOF
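
Before moving on, you can block until the ClusterPolicy reports that all operands are deployed. A sketch using `oc wait` (requires an oc client recent enough to support `--for=jsonpath`):

```shell
# Wait up to 20 minutes for the GPU Operator to finish rolling out
# drivers, the toolkit, and the device plugin.
oc wait clusterpolicy/gpu-cluster-policy \
  --for=jsonpath='{.status.state}'=ready \
  --timeout=1200s
```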
    

Verify GPU Availability#

Wait for the GPU driver to build and for the device plugin to register GPUs. This process can take up to 20 minutes:

oc get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

At least one node should show a GPU count of 1 or more.
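
For an end-to-end check of the driver stack, you can run a throwaway pod that executes nvidia-smi. This is a sketch: the CUDA image and tag are assumptions, so substitute any CUDA base image your cluster can pull.

```shell
# One-shot GPU smoke test; the image tag below is an example, not a requirement.
oc apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
  namespace: nvidia-gpu-operator
spec:
  restartPolicy: Never
  containers:
    - name: nvidia-smi
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Wait for the pod to finish, then read the GPU inventory from its logs.
oc wait pod/gpu-smoke-test -n nvidia-gpu-operator \
  --for=jsonpath='{.status.phase}'=Succeeded --timeout=300s
oc logs -n nvidia-gpu-operator gpu-smoke-test

# Clean up the test pod.
oc delete pod gpu-smoke-test -n nvidia-gpu-operator
```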

Common Setup#

The following resources are shared across all deployment methods. These commands have been tested on OpenShift 4.18.36 (ROSA), 4.19.20 (ARO), 4.21.0 (self-managed OCP), and 4.21.8 (OpenShift Dedicated on GCP) with NVIDIA GPU Operator v24.6, v25.3, and v25.10.

Create OpenShift Project#

Create an OpenShift project for the NIM deployment:

oc new-project nim-llm

Create NGC Image Pull Secret#

Create an image pull secret so OpenShift can pull NIM container images from nvcr.io:

export NGC_API_KEY=nvapi-xxx

oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=${NGC_API_KEY} \
  -n nim-llm

Create NGC API Secret#

Create a secret that stores your NGC API key for use by the NIM container:

oc create secret generic ngc-api \
  --namespace nim-llm \
  --from-literal=NGC_API_KEY="$NGC_API_KEY"
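
As a sanity check, both secrets should now exist in the project before you deploy the chart:

```shell
# Both secrets must be present; a missing one fails the command.
oc get secret ngc-secret ngc-api -n nim-llm
```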

Grant the Security Context Constraint#

OpenShift enforces SecurityContextConstraints (SCCs) that restrict which user IDs a pod can run as. The NIM container defaults to UID 1000 (non-root) but supports arbitrary UIDs — including the random UIDs that OpenShift assigns — as long as the container runs with GID 0 (root group). Grant the nonroot-v2 SCC to allow this:

oc adm policy add-scc-to-user nonroot-v2 -z default -n nim-llm

The Helm chart defaults (runAsUser: 1000, runAsGroup: 0, fsGroup: 0) are compatible with the nonroot-v2 SCC. OpenShift’s restricted-v2 SCC assigns a random UID with GID 0, which the container handles automatically via an entrypoint that registers the UID in /etc/passwd at startup. No additional security context configuration is required.
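
Once a NIM pod is running later in this guide, you can confirm which SCC admission actually applied by reading the openshift.io/scc annotation on the pod. A sketch, assuming the chart's default labels:

```shell
# Print the SCC that was applied to the running NIM pod.
oc get pod -n nim-llm -l app.kubernetes.io/name=nim-llm \
  -o jsonpath='{.items[0].metadata.annotations.openshift\.io/scc}{"\n"}'
```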

Create Model Cache PVC#

Optional: For persistent model caching across pod restarts, create a PVC. NIM downloads models into ephemeral storage at startup when no PVC is configured.

Adjust storageClassName based on your OpenShift edition:

OpenShift Edition           | StorageClass Name
--------------------------- | -------------------------
Self-managed OCP with ODF   | ocs-storagecluster-cephfs
Self-managed OCP with NFS   | nfs-client
ROSA (AWS)                  | gp3-csi
ARO (Azure)                 | managed-premium
OpenShift Dedicated (GCP)   | standard-csi (default)

Note: gp3-csi (ROSA) and standard-csi (OpenShift Dedicated on GCP) only support ReadWriteOnce. Change the accessModes in the PVC manifest from ReadWriteMany to ReadWriteOnce when using these StorageClasses.

cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc
  namespace: nim-llm
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ocs-storagecluster-cephfs   # adjust for your edition, or omit this line to use the cluster default
  resources:
    requests:
      storage: 200Gi
EOF
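
You can check the claim's status with the command below. Note that StorageClasses that use WaitForFirstConsumer binding keep the PVC Pending until the first pod mounts it, so Pending at this stage is not necessarily an error:

```shell
# Inspect the claim; STATUS shows Bound once a volume is provisioned.
oc get pvc nvidia-nim-cache-pvc -n nim-llm
```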

Deploy NIM with Helm#

The NIM LLM Helm chart is compatible with OpenShift. The chart defaults handle security context, GPU resources (nvidia.com/gpu: 1), image pull secrets (ngc-secret), the NGC API secret (ngc-api), and service configuration. Override only the values that are specific to your deployment.

Single GPU Deployment#

The following examples show single-GPU deployment options using either a model-specific NIM image or a model-free NIM image.

Model-Specific NIM#

To deploy a model-specific NIM container, complete the following steps:

  1. Create a values file:

    cat <<EOF | tee custom-values-openshift.yaml
    image:
      repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
      tag: "2.0.3"
    
    model:
      name: "meta/llama3.1-8b-instruct"
    EOF
    
  2. Deploy the chart:

    helm upgrade --install my-nim ./helm \
      -f custom-values-openshift.yaml \
      -n nim-llm
    

Model-Free NIM#

To deploy a model-free NIM container using the vLLM-OSS stack, complete the following steps:

  1. Create a values file and set the model through environment variables:

    cat <<EOF | tee custom-values-openshift-vllm-oss.yaml
    image:
      repository: "<NIM_LLM_MODEL_FREE_IMAGE>"
      tag: "2.0.3"
    
    model:
      name: "my-model"
    
    env:
      - name: NIM_MODEL_PATH
        value: "hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0"
      - name: NIM_SERVED_MODEL_NAME
        value: "my-model"
      - name: NIM_MAX_MODEL_LEN
        value: "2048"
    EOF
    
  2. Optional: For gated Hugging Face models, create an HF token secret:

    oc create secret generic hf-token --namespace nim-llm --from-literal=HF_TOKEN="${HF_TOKEN}"
    
  3. Optional: Add model.hfTokenSecret: hf-token to your values file.

  4. Deploy the chart:

    helm upgrade --install my-nim ./helm \
      -f custom-values-openshift-vllm-oss.yaml \
      -n nim-llm
    

Enable Persistent Cache#

Optional: To use the PVC created earlier, add these lines to your values file:

persistence:
  enabled: true
  existingClaim: "nvidia-nim-cache-pvc"

Monitor Deployment#

Wait for the pod to be ready:

oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -w

Wait until the pod status is Running and ready, then press Ctrl+C.
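
For scripted or CI use, `oc wait` is a non-interactive alternative to watching the pod list. The first start can be slow because the model must download and the engine must build, so use a generous timeout:

```shell
# Block until the NIM pod reports Ready (up to 30 minutes).
oc wait pod -n nim-llm \
  -l app.kubernetes.io/name=nim-llm \
  --for=condition=Ready \
  --timeout=1800s
```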

Expose NIM Service#

OpenShift Routes provide externally resolvable DNS and optional TLS termination. The Route created below serves plain HTTP; use oc create route edge instead if you need TLS. To create a Route for external access, complete the following steps:

  1. Create the Route:

    oc expose svc/my-nim-nim-llm --port=8000 -n nim-llm
    
  2. Get the route host:

    export NIM_URL=$(oc get route my-nim-nim-llm -n nim-llm -o jsonpath='{.spec.host}')
    
  3. Print the endpoint:

    echo "NIM endpoint: http://${NIM_URL}"
    

Alternatively, use port-forward for local testing:

oc -n nim-llm port-forward svc/my-nim-nim-llm 8000:8000

Verify Deployment#

Verify that the deployment is ready and can serve inference requests through the OpenShift Route. If you are using port-forward for local testing, replace http://${NIM_URL} with http://127.0.0.1:8000 in the following commands.

  1. Call the readiness endpoint to confirm that the service is ready:

    curl -s http://${NIM_URL}/v1/health/ready
    
  2. Query the /v1/models endpoint to discover the model name to use in inference requests:

    curl -s http://${NIM_URL}/v1/models | python3 -m json.tool
    
  3. Send a chat completion request using the model name returned by /v1/models:

    curl -X POST http://${NIM_URL}/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta/llama3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 256
      }' | python3 -m json.tool
    

Set the model value to NIM_SERVED_MODEL_NAME for a model-free NIM, or to model.name for a model-specific NIM.
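
To extract just the generated text rather than the full JSON payload, you can pipe the completion through a short Python one-liner (a sketch, assuming the standard OpenAI-style response shape that NIM returns):

```shell
# Print only the assistant's reply from the chat completion response.
curl -s -X POST http://${NIM_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```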

Run Helm Tests#

Verify your deployment with built-in tests:

helm test my-nim -n nim-llm

View Logs#

Stream real-time logs from your NIM pods:

oc -n nim-llm logs -l "app.kubernetes.io/name=nim-llm" -f

Cleanup#

Use this section to remove the OpenShift deployment and related resources.

Uninstall NIM#

Remove the NIM deployment resources in the following order.

  1. Uninstall the Helm release:

    helm uninstall my-nim -n nim-llm
    
  2. Delete the persistent volume claims:

    oc delete pvc -n nim-llm -l app.kubernetes.io/name=nim-llm
    
  3. Delete the project:

    oc delete project nim-llm
    

Remove GPU Operators#

Remove the GPU Operator resources first, and then remove the NFD resources.

  1. Delete the GPU Operator ClusterPolicy:

    oc delete clusterpolicy gpu-cluster-policy
    
  2. Delete the GPU Operator CSV and subscription:

    oc delete csv -n nvidia-gpu-operator -l operators.coreos.com/gpu-operator-certified.nvidia-gpu-operator
    oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
    
  3. Delete the GPU Operator project:

    oc delete project nvidia-gpu-operator
    
  4. Delete the NodeFeatureDiscovery instance:

    oc delete nfd nfd-instance -n openshift-nfd
    
  5. Delete the NFD CSV and subscription:

    oc delete csv -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd
    oc delete subscription nfd -n openshift-nfd
    
  6. Delete the NFD project:

    oc delete project openshift-nfd
    

OpenShift Dedicated (GCP): Remove GPU Machine Pool#

If you created a GPU machine pool for OpenShift Dedicated on GCP, delete it:

export CLUSTER_NAME="<your-cluster-name>"

ocm delete machinepool --cluster=${CLUSTER_NAME} gpu-pool

ROSA: Remove GPU Machine Pool#

If you created a GPU machine pool for ROSA, delete it:

export CLUSTER_NAME="<your-cluster-name>"
export GPU_POOL_NAME="<your-gpu-pool-name>"

rosa delete machinepool ${GPU_POOL_NAME} -c ${CLUSTER_NAME} --yes

ARO: Remove GPU MachineSet#

If you created a GPU MachineSet for ARO, scale it down and delete it:

export GPU_MACHINESET_NAME="<your-gpu-machineset-name>"

oc scale machineset ${GPU_MACHINESET_NAME} -n openshift-machine-api --replicas=0
oc delete machineset ${GPU_MACHINESET_NAME} -n openshift-machine-api