OpenShift#

Red Hat OpenShift is an enterprise Kubernetes platform for hybrid and multicloud environments. This page describes how to deploy NIM on self-managed OpenShift Container Platform (OCP) and Azure Red Hat OpenShift (ARO), covering GPU Operator setup, deployment, and verification. The commands on this page have been tested on both self-managed OCP and ARO.

Prerequisites#

Before deploying NIM on OpenShift, make sure you have the following:

  • An OpenShift cluster (self-managed OCP or ARO) with GPU-capable nodes

  • The oc CLI, installed and authenticated with oc login

  • Helm 3

  • An NGC API key for pulling NIM container images and downloading model artifacts

Install GPU Operators#

NIM requires NVIDIA GPUs to be available as schedulable resources. This requires the Node Feature Discovery (NFD) Operator and the NVIDIA GPU Operator.

Install NFD Operator#

To install the NFD Operator and label GPU nodes, complete the following steps:

  1. Create the openshift-nfd namespace:

    oc apply -f - <<'EOF'
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-nfd
    EOF
    
  2. Create the OperatorGroup:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: nfd-operator-group
      namespace: openshift-nfd
    spec:
      targetNamespaces:
        - openshift-nfd
    EOF
    
  3. Create the Subscription:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: nfd
      namespace: openshift-nfd
    spec:
      channel: stable
      name: nfd
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
    
  4. Wait for the NFD Operator to install:

    timeout 300 bash -c 'until oc get csv -n openshift-nfd 2>/dev/null | grep -q Succeeded; do sleep 10; done'
    
  5. Create a NodeFeatureDiscovery instance to label GPU nodes:

    oc apply -f - <<'EOF'
    apiVersion: nfd.openshift.io/v1
    kind: NodeFeatureDiscovery
    metadata:
      name: nfd-instance
      namespace: openshift-nfd
    spec:
      operand:
        servicePort: 12000
      workerConfig:
        configData: |
          sources:
            pci:
              deviceClassWhitelist:
                - "0300"
                - "0302"
              deviceLabelFields:
                - vendor
    EOF
    
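The wait command in step 4 is an instance of a generic poll-until-ready pattern that recurs throughout this page. A minimal Python sketch of the same logic (the function name and the stub check are illustrative, not part of the NIM tooling):

```python
import time

def wait_until(check, timeout_s=300.0, interval_s=0.01):
    """Poll check() until it returns a truthy value or timeout_s elapses.

    Mirrors: timeout 300 bash -c 'until <check>; do sleep 10; done'
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

# Demonstration with a stub that reports ready on its third poll,
# standing in for `oc get csv ... | grep -q Succeeded`:
calls = {"n": 0}
def csv_succeeded():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(csv_succeeded, timeout_s=1.0))  # → True
```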

Install NVIDIA GPU Operator#

To install the NVIDIA GPU Operator and configure GPU drivers and device plugins, complete the following steps:

  1. Create the nvidia-gpu-operator namespace:

    oc apply -f - <<'EOF'
    apiVersion: v1
    kind: Namespace
    metadata:
      name: nvidia-gpu-operator
    EOF
    
  2. Create the OperatorGroup:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: nvidia-gpu-operator-group
      namespace: nvidia-gpu-operator
    spec:
      targetNamespaces:
        - nvidia-gpu-operator
    EOF
    
  3. Create the Subscription:

    oc apply -f - <<'EOF'
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: gpu-operator-certified
      namespace: nvidia-gpu-operator
    spec:
      channel: v25.10
      name: gpu-operator-certified
      source: certified-operators
      sourceNamespace: openshift-marketplace
    EOF
    
  4. Wait for the GPU Operator to install:

    timeout 300 bash -c 'until oc get csv -n nvidia-gpu-operator 2>/dev/null | grep -q Succeeded; do sleep 10; done'
    
  5. Create a ClusterPolicy to configure GPU drivers and device plugins:

    oc apply -f - <<'EOF'
    apiVersion: nvidia.com/v1
    kind: ClusterPolicy
    metadata:
      name: gpu-cluster-policy
    spec:
      operator:
        defaultRuntime: crio
        use_ocp_driver_toolkit: true
      daemonsets:
        priorityClassName: system-node-critical
      dcgm:
        enabled: true
      dcgmExporter:
        enabled: true
      devicePlugin:
        enabled: true
      driver:
        enabled: true
      gfd:
        enabled: true
      nodeStatusExporter:
        enabled: true
      toolkit:
        enabled: true
    EOF
    
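After the ClusterPolicy is created, its status.state field reports the rollout state. If you want to script the check against `oc get clusterpolicy gpu-cluster-policy -o json`, a sketch (the helper name is illustrative):

```python
import json

def cluster_policy_ready(cp_json: str) -> bool:
    """True when the ClusterPolicy reports status.state == 'ready'.

    cp_json is the output of:
      oc get clusterpolicy gpu-cluster-policy -o json
    """
    return json.loads(cp_json).get("status", {}).get("state") == "ready"

# Example against a minimal status payload (rollout still in progress):
sample = json.dumps({"status": {"state": "notReady"}})
print(cluster_policy_ready(sample))  # → False
```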

Verify GPU Availability#

Wait for the GPU driver to build and for the device plugin to register GPUs; this process can take up to 20 minutes. Then check that GPUs are allocatable:

oc get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

At least one node should show a GPU count of 1 or more.
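The custom-columns query above can also be evaluated programmatically from `oc get nodes -o json`; a minimal sketch (the function name is illustrative):

```python
import json

def total_allocatable_gpus(nodes_json: str) -> int:
    """Sum allocatable nvidia.com/gpu across all nodes.

    nodes_json is the output of: oc get nodes -o json
    """
    total = 0
    for node in json.loads(nodes_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        total += int(allocatable.get("nvidia.com/gpu", "0"))
    return total

# Example with a minimal two-node payload (one GPU node, one CPU-only node):
sample = json.dumps({"items": [
    {"status": {"allocatable": {"nvidia.com/gpu": "1"}}},
    {"status": {"allocatable": {"cpu": "4"}}},
]})
print(total_allocatable_gpus(sample))  # → 1
```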

Common Setup#

The following resources are shared across all deployment methods. These commands have been tested on OpenShift 4.21.0 with NVIDIA GPU Operator v25.10.

Create OpenShift Project#

Create an OpenShift project for the NIM deployment:

oc new-project nim-llm

Create NGC Image Pull Secret#

Create an image pull secret so OpenShift can pull NIM container images from nvcr.io:

export NGC_API_KEY=nvapi-xxx   # replace nvapi-xxx with your NGC API key

oc create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=${NGC_API_KEY} \
  -n nim-llm

Create NGC API Secret#

Create a secret that stores your NGC API key for use by the NIM container:

oc create secret generic ngc-api \
  --namespace nim-llm \
  --from-literal=NGC_API_KEY="$NGC_API_KEY"

Grant the Security Context Constraint#

OpenShift enforces SecurityContextConstraints (SCCs), which restrict the user IDs that a pod can run as. The NIM container runs as UID 1000 (non-root). Grant the nonroot-v2 SCC to allow this:

oc adm policy add-scc-to-user nonroot-v2 -z default -n nim-llm

The Helm chart defaults (runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000) are compatible with the nonroot-v2 SCC. No additional security context configuration is required.

Optional: Create Model Cache PVC#

For persistent model caching across pod restarts, create a PVC. This step is optional: without a PVC, NIM downloads models into ephemeral storage at startup and must download them again after every pod restart.

Adjust storageClassName based on your OpenShift edition:

| OpenShift Edition         | StorageClass Name         |
|---------------------------|---------------------------|
| Self-managed OCP with ODF | ocs-storagecluster-cephfs |
| Self-managed OCP with NFS | nfs-client                |
| ARO (Azure)               | managed-premium           |

cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvidia-nim-cache-pvc
  namespace: nim-llm
spec:
  accessModes:
    - ReadWriteMany   # use ReadWriteOnce for block-storage classes such as managed-premium
  storageClassName: ocs-storagecluster-cephfs   # adjust for your edition or omit to use the default
  resources:
    requests:
      storage: 200Gi
EOF

Deploy NIM with Helm#

The NIM LLM Helm chart is compatible with OpenShift. The chart defaults handle security context, GPU resources (nvidia.com/gpu: 1), image pull secrets (ngc-secret), the NGC API secret (ngc-api), and service configuration. Override only the values that are specific to your deployment.

Single GPU Deployment (Llama 3.1 8B Instruct)#

Model-Specific NIM#

To deploy a model-specific NIM container, complete the following steps:

  1. Create a values file:

    cat <<EOF | tee custom-values-openshift.yaml
    image:
      repository: <NIM_LLM_MODEL_SPECIFIC_IMAGE>
      tag: "2.0.1"
    
    model:
      name: "meta/llama3.1-8b-instruct"
    EOF
    
  2. Deploy the chart:

    helm upgrade --install my-nim ./helm \
      -f custom-values-openshift.yaml \
      -n nim-llm
    

Model-Free NIM#

To deploy a model-free NIM container using the vLLM-OSS stack, complete the following steps:

  1. Create a values file and set the model through environment variables:

    cat <<EOF | tee custom-values-openshift-vllm-oss.yaml
    image:
      repository: "<NIM_LLM_MODEL_FREE_IMAGE>"
      tag: "2.0.1"
    
    model:
      name: "my-model"
    
    env:
      - name: NIM_MODEL_PATH
        value: "hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0"
      - name: NIM_SERVED_MODEL_NAME
        value: "my-model"
      - name: NIM_MAX_MODEL_LEN
        value: "2048"
    EOF
    
  2. Optional: For gated Hugging Face models, create an HF token secret:

    oc create secret generic hf-token --namespace nim-llm --from-literal=HF_TOKEN="${HF_TOKEN}"
    
  3. Optional: Add model.hfTokenSecret: hf-token to your values file.

  4. Deploy the chart:

    helm upgrade --install my-nim ./helm \
      -f custom-values-openshift-vllm-oss.yaml \
      -n nim-llm
    

Optional: Enable Persistent Cache#

To use the PVC created earlier, add these lines to your values file:

persistence:
  enabled: true
  existingClaim: "nvidia-nim-cache-pvc"

Monitor Deployment#

Wait for the pod to be ready:

oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -w

Wait until the pod status is Running and ready, then press Ctrl+C.
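If you prefer a non-interactive check over the watch above, pod readiness can be computed from `oc get pods -o json`; a sketch (the helper name is illustrative):

```python
import json

def all_pods_ready(pods_json: str) -> bool:
    """True when at least one pod exists, every pod is Running,
    and every container in every pod reports ready.

    pods_json is the output of:
      oc -n nim-llm get pods -l "app.kubernetes.io/name=nim-llm" -o json
    """
    pods = json.loads(pods_json).get("items", [])
    if not pods:
        return False
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            return False
        if not all(c.get("ready", False)
                   for c in status.get("containerStatuses", [])):
            return False
    return True

# Example: one pod still starting up, so not ready yet:
sample = json.dumps({"items": [{"status": {
    "phase": "Pending",
    "containerStatuses": [{"ready": False}],
}}]})
print(all_pods_ready(sample))  # → False
```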

Expose NIM Service#

OpenShift Routes provide automatic external DNS, and can optionally terminate TLS. The following steps create a plain HTTP Route for external access:

  1. Create the Route:

    oc expose svc/my-nim-nim-llm --port=8000 -n nim-llm
    
  2. Get the route host:

    export NIM_URL=$(oc get route my-nim-nim-llm -n nim-llm -o jsonpath='{.spec.host}')
    
  3. Print the endpoint:

    echo "NIM endpoint: http://${NIM_URL}"
    

Alternatively, use port-forward for local testing:

oc -n nim-llm port-forward svc/my-nim-nim-llm 8000:8000

Verify Deployment#

Verify that the deployment is ready and can serve inference requests through the OpenShift Route. If you are using port-forward for local testing, replace http://${NIM_URL} with http://127.0.0.1:8000 in the following commands.

  1. Call the readiness endpoint to confirm that the service is ready:

    curl -s http://${NIM_URL}/v1/health/ready
    
  2. Query the /v1/models endpoint to discover the model name to use in inference requests:

    curl -s http://${NIM_URL}/v1/models | python3 -m json.tool
    
  3. Send a chat completion request using the model name returned by /v1/models:

    curl -X POST http://${NIM_URL}/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta/llama3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 256
      }' | python3 -m json.tool
    

Use the same model value as NIM_SERVED_MODEL_NAME for a model-free NIM or model.name for a model-specific NIM.
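The curl calls above map onto a small OpenAI-compatible client. A sketch using only Python's standard library (the base URL and model name come from the steps above; the helper names are illustrative):

```python
import json
import urllib.request

def build_chat_request(model, prompt, max_tokens=256):
    """Build the same chat completion payload as the curl example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(base_url, model, prompt):
    """POST /v1/chat/completions and return the first choice's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Payload construction is testable without a cluster:
payload = build_chat_request("meta/llama3.1-8b-instruct", "Hello!")
print(payload["model"])  # → meta/llama3.1-8b-instruct
```

Calling `chat(f"http://{NIM_URL}", ...)` requires the Route or port-forward set up in the previous section.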

Run Helm Tests#

Verify your deployment with built-in tests:

helm test my-nim -n nim-llm

View Logs#

Stream real-time logs from your NIM pods:

oc -n nim-llm logs -l "app.kubernetes.io/name=nim-llm" -f

Cleanup#

Uninstall NIM#

Remove the NIM deployment resources in the following order.

  1. Uninstall the Helm release:

    helm uninstall my-nim -n nim-llm
    
  2. Delete the persistent volume claims:

    oc delete pvc -n nim-llm -l app.kubernetes.io/name=nim-llm
    
  3. Delete the project:

    oc delete project nim-llm
    

Remove GPU Operators#

Remove the GPU Operator resources first, and then remove the NFD resources.

  1. Delete the GPU Operator ClusterPolicy:

    oc delete clusterpolicy gpu-cluster-policy
    
  2. Delete the GPU Operator CSV and subscription:

    oc delete csv -n nvidia-gpu-operator -l operators.coreos.com/gpu-operator-certified.nvidia-gpu-operator
    oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
    
  3. Delete the GPU Operator project:

    oc delete project nvidia-gpu-operator
    
  4. Delete the NodeFeatureDiscovery instance:

    oc delete nfd nfd-instance -n openshift-nfd
    
  5. Delete the NFD CSV and subscription:

    oc delete csv -n openshift-nfd -l operators.coreos.com/nfd.openshift-nfd
    oc delete subscription nfd -n openshift-nfd
    
  6. Delete the NFD project:

    oc delete project openshift-nfd
    

ARO: Remove GPU MachineSet#

If you created a GPU MachineSet for ARO, scale it down and delete it:

oc scale machineset <gpu-machineset-name> -n openshift-machine-api --replicas=0
oc delete machineset <gpu-machineset-name> -n openshift-machine-api