Deploy NVIDIA RAG Blueprint on OpenShift with Helm#

Use the following documentation to deploy the NVIDIA RAG Blueprint on a Red Hat OpenShift cluster by using Helm.

To deploy on standard Kubernetes (non-OpenShift), refer to Deploy on Kubernetes with Helm.
To deploy with MIG support, refer to RAG Deployment with MIG Support.
For other deployment options, refer to Deployment Options.

The chart includes built-in OpenShift support gated behind the openshift.enabled flag. When the flag is enabled, the chart automatically creates OpenShift Routes with edge TLS termination and an anyuid SCC RoleBinding for every ServiceAccount that requires it, so you do not need to run oc adm policy commands manually for the chart's workloads.

Prerequisites#

Important

Ensure you have at least 200 GB of available disk space on each node where NIMs will be deployed. This space is required for the following:

NIM model cache downloads (~100–150 GB)
Container images (~20–30 GB)
Persistent volumes for vector database and application data
Logs and temporary files
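
One way to check the free-space requirement is to run a small script on each GPU node, for example from an `oc debug node/<node-name>` session. The following is a minimal sketch: the helper name `has_free_gb` is hypothetical, and the path to check is an assumption, because where the NIM cache, images, and volumes are stored depends on your StorageClass and container runtime configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed when <path> has at least <min_gb> GiB available.
# Usage: has_free_gb <path> <min_gb>
has_free_gb() {
  path="$1"
  min_gb="$2"
  # df -P prints a stable POSIX layout; column 4 is available space in 1 KiB blocks.
  avail_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR==2 {print $4}')
  # Return 2 when the path cannot be inspected.
  [ -n "$avail_kb" ] || return 2
  [ $(( $avail_kb / 1024 / 1024 )) -ge "$min_gb" ]
}

# Example (the path is an assumption; adjust it to where your node stores data):
# has_free_gb /var/lib/containers 200 || echo "Less than 200 GB free" >&2
```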

Get an API Key.

Verify that you meet the hardware requirements. The minimum GPU requirements depend on deployment mode:

| Deployment Mode | GPUs Required | Notes |
| --- | --- | --- |
| Full (self-hosted NIMs) | 8–10 | All NIM models running in-cluster |
| Minimal (no VLM, no optional NIMs) | 6–7 | Core pipeline without VLM or audio |
| API-hosted LLM | 4–6 | LLM via build.nvidia.com; self-hosted embedding, reranking, and NV-Ingest NIMs |

Verify that you have OpenShift 4.14 or later with cluster-admin access, and the oc CLI configured.
Verify that you have Helm 3 installed. To install Helm 3, follow the official Helm installation instructions.
Verify that you have the NVIDIA GPU Operator installed and functional. For details, see GPU Operator documentation.
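
The version prerequisites can also be checked from a script. The sketch below is an illustration rather than part of the chart: `version_ge` is a hypothetical helper that relies on `sort -V` (available in GNU coreutils), and the commented `oc` query assumes the cluster reports its version in the `ClusterVersion` resource named `version`.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings component by component; if the required
  # version sorts first (or ties), the actual version is new enough.
  lowest=$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example usage against a live cluster (not executed here):
# ocp_version=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
# version_ge "$ocp_version" 4.14 || echo "OpenShift $ocp_version is older than 4.14" >&2
```

The same helper can compare the installed NIM Operator version against 3.0.2 once you have extracted the version string from your Helm release.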

Verify that you have the NVIDIA NIM Operator v3.0.2+ installed. If not, install it:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  --username='$oauthtoken' \
  --password="$NGC_API_KEY"
helm repo update
helm install nim-operator nvidia/k8s-nim-operator -n nim-operator --create-namespace

For details, see NIM Operator installation guide.

Install the Elastic Cloud on Kubernetes (ECK) operator. Elasticsearch is the default vector database for this chart, and the chart provisions an Elasticsearch CR that requires the ECK operator to reconcile it:
```
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
```
If you plan to replace Elasticsearch with Milvus or another backend and disable the chart-managed Elasticsearch, skip this step. See Vector database configuration.

Verify that a default StorageClass with dynamic provisioning is available (e.g., gp3-csi on AWS):

oc get storageclass

Note

If your cluster does not have a default dynamic StorageClass available (common on bare-metal OpenShift installations), install the OpenEBS Dynamic LocalPV Provisioner to satisfy the chart’s PVC requirements:

# Add the OpenEBS Helm repository
helm repo add openebs https://openebs.github.io/openebs
helm repo update

# Create the openebs namespace
kubectl create namespace openebs

# Install only the LocalPV provisioner; disable other storage engines
# On OpenShift, also disable the bundled minio/loki/alloy subcharts —
# their pods violate the restricted PodSecurity policy and the
# `openebs-minio-post-job` fails with BackoffLimitExceeded otherwise.
helm install openebs openebs/openebs \
  --namespace openebs \
  --set engines.replicated.mayastor.enabled=false \
  --set engines.local.lvm.enabled=false \
  --set engines.local.zfs.enabled=false \
  --set minio.enabled=false \
  --set loki.enabled=false \
  --set alloy.enabled=false

# OpenShift requires the privileged SCC for the provisioner service account
oc adm policy add-scc-to-user privileged -z openebs-localpv-provisioner -n openebs

# Mark openebs-hostpath as the default StorageClass
kubectl patch storageclass openebs-hostpath \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

Verify that the provisioner pods are running and the StorageClass is configured as default:

kubectl get pods -n openebs
kubectl get sc

Check GPU node taints. GPU nodes on OpenShift clusters typically have taints that prevent non-GPU workloads from scheduling on them. You need the taint keys for the tolerations configuration:
```
oc get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns="NODE:.metadata.name,TAINTS:.spec.taints[*].key"
```
Verify that the kubelet podPidsLimit is at least 16384. The rag-nv-ingest pod, the reranker, and the other NIMs collectively spawn several thousand threads at steady state. The OpenShift default of 4096 is insufficient, and the shortfall surfaces as pthread_create failed: Resource temporarily unavailable errors during ingestion and reranking.

Inspect the current value on any worker node:
```
oc get --raw /api/v1/nodes/<node-name>/proxy/configz \
  | jq '.kubeletconfig.podPidsLimit'
```
If the value is below 16384, apply the following KubeletConfig (cluster-admin access required). The Machine Config Operator will roll the affected nodes:
```
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: rag-pod-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    podPidsLimit: 16384
```
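
The compare step can be scripted once you have the value printed by the `jq` command. This is a minimal sketch: `pids_limit_ok` and `REQUIRED_PIDS` are hypothetical names, and the helper only classifies the value; it does not change the cluster.

```shell
#!/bin/sh
REQUIRED_PIDS=16384

# Hypothetical helper. Exit status: 0 = limit is sufficient, 1 = limit is too low,
# 2 = the value is not a non-negative integer (for example "null" or empty).
pids_limit_ok() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$1" -ge "$REQUIRED_PIDS" ]
}

# Example usage against a live cluster (not executed here):
# limit=$(oc get --raw "/api/v1/nodes/<node-name>/proxy/configz" | jq '.kubeletconfig.podPidsLimit')
# pids_limit_ok "$limit" || echo "podPidsLimit ($limit) is below $REQUIRED_PIDS or unreadable" >&2
```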
Accept NIM licenses. Each NIM container image on NGC requires individually accepting a license agreement before your API key can pull it. Accept licenses for each NIM at build.nvidia.com.

Deploy the RAG Helm Chart#

Important

With the NIM Operator-based Helm deployment, the entire pipeline takes approximately 60 to 70 minutes to reach a running state on the first deploy. Subsequent deployments are significantly faster (approximately 10 to 15 minutes) because the model caches are already populated.

To deploy the RAG Blueprint on OpenShift, use the following procedure.

Set your environment variables.

export NGC_API_KEY="nvapi-..."
export NAMESPACE="rag"
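
Before running the remaining steps, it can help to confirm that both variables are set and that the namespace is a valid Kubernetes name. This is a minimal sketch; `check_deploy_env` is a hypothetical helper, and the namespace pattern follows the Kubernetes DNS label rules (lowercase alphanumerics and hyphens, at most 63 characters).

```shell
#!/bin/sh
# Hypothetical helper: validate the variables used by the deployment commands.
check_deploy_env() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Namespace names must be DNS labels: lowercase alphanumerics or '-',
  # starting and ending with an alphanumeric, 63 characters at most.
  if ! printf '%s' "${NAMESPACE:-}" | grep -Eq '^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$'; then
    echo "NAMESPACE is empty or not a valid namespace name" >&2
    return 1
  fi
}

# Example: stop early if the environment is incomplete.
# check_deploy_env || exit 1
```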

Navigate to the chart directory and build dependencies.

cd deploy/helm/nvidia-blueprint-rag

helm repo add nvidia-nemo https://helm.ngc.nvidia.com/nvidia/nemo-microservices \
  --username '$oauthtoken' --password "$NGC_API_KEY"

helm dependency build

Note

The OpenShift overlay passes values through the nv-ingest subchart’s extraVolumes / extraVolumeMounts keys. With the currently pinned nv-ingest 26.3.0, those values need a small indent adjustment in the pulled chart before helm upgrade will render valid YAML. Re-apply this after every helm dependency build or helm dependency update:

mkdir -p /tmp/nvi && \
  tar xzf charts/nv-ingest-26.3.0.tgz -C /tmp/nvi && \
  sed -i '/toYaml $v | nindent 12/s/nindent 12/nindent 14/' \
    /tmp/nvi/nv-ingest/templates/deployment.yaml && \
  tar czf charts/nv-ingest-26.3.0.tgz -C /tmp/nvi nv-ingest && \
  rm -rf /tmp/nvi
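
Because the adjustment must be re-applied after every dependency rebuild, a quick check of the packaged subchart can catch a missed step before `helm upgrade` fails. The following sketch is a heuristic under the same assumptions as the commands above (archive name and template path); `chart_is_patched` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the nv-ingest archive already contains
# the adjusted indent in templates/deployment.yaml.
# Usage: chart_is_patched <path-to-nv-ingest-tgz>
chart_is_patched() {
  # -O streams the named archive member to stdout instead of extracting to disk;
  # grep -F matches the string literally, so "$v" and "|" need no escaping.
  tar -xzOf "$1" nv-ingest/templates/deployment.yaml 2>/dev/null \
    | grep -qF 'toYaml $v | nindent 14'
}

# Example usage from the chart directory (not executed here):
# chart_is_patched charts/nv-ingest-26.3.0.tgz || echo "Re-apply the indent adjustment" >&2
```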

Note

Alternative — installing from the NGC chart URL

If you prefer the install-from-NGC pattern shown in Deploy on Kubernetes with Helm instead of cloning this repo, pull the chart locally first. Helm cannot patch a chart it streams directly from a remote URL, so the indent adjustment must be applied to a local copy before install:

# Pull and untar the chart from NGC. The NGC package ships with the
# nv-ingest subchart already extracted under charts/nv-ingest/, so the
# sed below can edit the template file in place.
helm pull https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-rag-v2.6.0.tgz \
  --username '$oauthtoken' --password "$NGC_API_KEY" \
  --untar --untardir /tmp

# Apply the indent adjustment to the bundled nv-ingest subchart
sed -i '/toYaml $v | nindent 12/s/nindent 12/nindent 14/' \
  /tmp/nvidia-blueprint-rag/charts/nv-ingest/templates/deployment.yaml

# Install from the patched local directory
helm upgrade --install rag -n $NAMESPACE /tmp/nvidia-blueprint-rag \
  -f /tmp/nvidia-blueprint-rag/values-openshift.yaml \
  --set imagePullSecret.password="$NGC_API_KEY" \
  --set ngcApiSecret.password="$NGC_API_KEY" \
  --timeout 15m

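This replaces steps 2 and 4 of this procedure; steps 3 and 5 are unchanged. Create the namespace (step 3) before you run the install command, because the command does not create it.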

Create a namespace.
```
oc new-project $NAMESPACE
```

Install the Helm chart with the OpenShift overlay.

helm upgrade --install rag -n $NAMESPACE . \
  -f values-openshift.yaml \
  --set imagePullSecret.password="$NGC_API_KEY" \
  --set ngcApiSecret.password="$NGC_API_KEY" \
  --timeout 15m

The values-openshift.yaml overlay enables the following:

OpenShift Routes for the frontend and RAG server with edge TLS
anyuid SCC RoleBinding for all ServiceAccounts that need it
ClusterIP service type for the frontend (Routes handle external access)

Note

If your GPU nodes have taints, you must add tolerations. Pass them on the command line with --set-json or create a values overlay file. For example, if your GPU nodes have a gpu-taint taint:

helm upgrade --install rag -n $NAMESPACE . \
  -f values-openshift.yaml \
  --set imagePullSecret.password="$NGC_API_KEY" \
  --set ngcApiSecret.password="$NGC_API_KEY" \
  --set-json 'nimOperator.nim-llm.tolerations=[{"key":"gpu-taint","operator":"Exists","effect":"NoSchedule"}]' \
  --set-json 'nimOperator.nvidia-nim-llama-nemotron-embed-1b-v2.tolerations=[{"key":"gpu-taint","operator":"Exists","effect":"NoSchedule"}]' \
  --set-json 'nimOperator.nvidia-nim-llama-nemotron-rerank-1b-v2.tolerations=[{"key":"gpu-taint","operator":"Exists","effect":"NoSchedule"}]' \
  --set-json 'nv-ingest.nimOperator.ocr.tolerations=[{"key":"gpu-taint","operator":"Exists","effect":"NoSchedule"}]' \
  --set-json 'nv-ingest.nimOperator.page_elements.tolerations=[{"key":"gpu-taint","operator":"Exists","effect":"NoSchedule"}]' \
  --timeout 15m

The chart also includes a values-openshift-test.yaml reference overlay that demonstrates tolerations, resource tuning, disabled observability, and API-hosted LLM mode. Edit the toleration keys to match your cluster and layer it on with -f values-openshift-test.yaml.
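
As an alternative to repeating `--set-json` for each component, the same tolerations can be written to a values overlay. The sketch below generates one; `tolerations_overlay`, `toleration_block`, and the output file name `values-tolerations.yaml` are hypothetical, and the component keys are copied from the `--set-json` paths above, so verify them against your chart version.

```shell
#!/bin/sh
# Hypothetical helper: print a tolerations block at the given indent.
# $1 = indent (spaces), $2 = taint key
toleration_block() {
  printf '%stolerations:\n' "$1"
  printf '%s  - key: "%s"\n' "$1" "$2"
  printf '%s    operator: Exists\n' "$1"
  printf '%s    effect: NoSchedule\n' "$1"
}

# Hypothetical helper: print a values overlay with one toleration per NIM component.
# Usage: tolerations_overlay <taint-key>
tolerations_overlay() {
  key="$1"
  echo "nimOperator:"
  for c in nim-llm nvidia-nim-llama-nemotron-embed-1b-v2 nvidia-nim-llama-nemotron-rerank-1b-v2; do
    echo "  ${c}:"
    toleration_block "    " "$key"
  done
  echo "nv-ingest:"
  echo "  nimOperator:"
  for c in ocr page_elements; do
    echo "    ${c}:"
    toleration_block "      " "$key"
  done
}

# Example: tolerations_overlay gpu-taint > values-tolerations.yaml
```

You would then layer the generated file after the OpenShift overlay, for example with `-f values-openshift.yaml -f values-tolerations.yaml`, so that its values take precedence.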

Link the NGC pull secret to the NIM Operator ServiceAccount.

The NIM Operator creates a nim-cache-sa ServiceAccount for model cache jobs. Link the pull secret so it can pull NIM model images:
```
oc secrets link nim-cache-sa ngc-secret --for=pull -n $NAMESPACE
```
If NIMCache pods are stuck in ImagePullBackOff, delete them so the operator recreates them with the linked secret:
```
oc delete pod -l app.nvidia.com/nim-cache -n $NAMESPACE
```

Verify a Deployment#

List the pods by running the following command.

oc get pods -n $NAMESPACE

You should see output similar to the following.

NAME                                          READY   STATUS    AGE
ingestor-server-xxxxxxxxx-xxxxx               1/1     Running   5m
rag-eck-elasticsearch-es-default-0            1/1     Running   5m
nemotron-embedding-ms-xxxxxxxxx-xxxxx         1/1     Running   10m
nemotron-graphic-elements-v1-xxxxxxxxx-xxxxx  1/1     Running   10m
nemotron-ocr-v1-xxxxxxxxx-xxxxx               1/1     Running   10m
nemotron-page-elements-v3-xxxxxxxxx-xxxxx     1/1     Running   10m
nemotron-ranking-ms-xxxxxxxxx-xxxxx           1/1     Running   10m
nemotron-table-structure-v1-xxxxxxxxx-xxxxx   1/1     Running   10m
nim-llm-xxxxxxxxx-xxxxx                       1/1     Running   15m
rag-frontend-xxxxxxxxx-xxxxx                  1/1     Running   5m
rag-nv-ingest-xxxxxxxxx-xxxxx                 1/1     Running   5m
rag-redis-master-0                            1/1     Running   5m
rag-redis-replicas-0                          1/1     Running   5m
rag-seaweedfs-all-in-one-xxxxxxxxx-xxxxx      1/1     Running   5m
rag-server-xxxxxxxxx-xxxxx                    1/1     Running   5m

If you have enabled Milvus instead of the default Elasticsearch vector database (see Vector database configuration), the list also includes rag-etcd-0 and rag-minio-xxx pods.

Note

Model downloads do not show detailed progress indicators in pod status. Pods may appear in “ContainerCreating” or “Init” state for extended periods while models download in the background.

You can monitor the deployment progress by running the following commands.

# Check NIMCache download status (shows if cache is ready)
oc get nimcache -n $NAMESPACE

# Check NIMService status
oc get nimservice -n $NAMESPACE

# Check events for detailed information
oc get events -n $NAMESPACE --sort-by='.lastTimestamp'

# Watch logs of a specific pod to see detailed progress
oc logs -f <pod-name> -n $NAMESPACE
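
Because pods can sit in an init state for a long time, a one-line summary of how many pods are not yet ready can be easier to watch than the full list. This is a minimal sketch; `count_not_ready` is a hypothetical helper that assumes the default column order of `oc get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods --no-headers` output on stdin and
# print how many pods are not fully ready. Completed pods are ignored.
count_not_ready() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")   # READY column, for example "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) n++
    }
    END { print n + 0 }
  '
}

# Example usage against a live cluster (not executed here):
# oc get pods -n "$NAMESPACE" --no-headers | count_not_ready
```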

Verify OpenShift Routes are created.
```
oc get routes -n $NAMESPACE
```

Get the application URLs.

# Frontend URL
echo "https://$(oc get route rag-frontend -n $NAMESPACE -o jsonpath='{.spec.host}')"

# API URL
echo "https://$(oc get route rag-server -n $NAMESPACE -o jsonpath='{.spec.host}')"

# API health check
API_HOST=$(oc get route rag-server -n $NAMESPACE -o jsonpath='{.spec.host}')
curl -sk "https://${API_HOST}/health"
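
Since the first deployment can take over an hour, polling the health endpoint may be more convenient than re-running `curl` by hand. The following is a sketch, not part of the chart: `wait_for_health` is a hypothetical helper, and the default attempt count and delay are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it returns a successful HTTP status.
# Usage: wait_for_health <url> [attempts] [delay_seconds]
wait_for_health() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP error statuses; -k skips certificate
    # verification, matching the health check above.
    if curl -skf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example usage (not executed here):
# wait_for_health "https://${API_HOST}/health" 60 30 && echo "RAG server is healthy"
```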

Experiment with the Web User Interface#

Open a web browser and access the frontend URL from the previous step. You can start experimenting by uploading documents and asking questions. For details, see User Interface for NVIDIA RAG Blueprint.

Note

Unlike standard Kubernetes deployments, OpenShift Routes provide external access directly — no kubectl port-forward is needed.

Using NVIDIA-Hosted Models (Reduced GPU Requirements)#

For clusters with limited GPU capacity, you can use NVIDIA-hosted model endpoints at build.nvidia.com for the LLM while keeping embedding, reranking, and NV-Ingest NIMs self-hosted.

Set the LLM server URLs to empty strings and disable the self-hosted NIM LLM:

nimOperator:
  nim-llm:
    enabled: false

envVars:
  APP_LLM_SERVERURL: ""
  APP_QUERYREWRITER_SERVERURL: ""
  APP_FILTEREXPRESSIONGENERATOR_SERVERURL: ""
  REFLECTION_LLM_SERVERURL: ""

ingestor-server:
  envVars:
    SUMMARY_LLM_SERVERURL: ""

The included values-openshift-test.yaml overlay implements this pattern. Layer it on with -f values-openshift-test.yaml.

Change a Deployment#

To change an existing deployment, modify the values files and then run the following command.

helm upgrade rag -n $NAMESPACE . \
  -f values-openshift.yaml \
  --set imagePullSecret.password="$NGC_API_KEY" \
  --set ngcApiSecret.password="$NGC_API_KEY"

Uninstall a Deployment#

To uninstall a deployment, run the following command.

helm uninstall rag -n $NAMESPACE

Run the following commands to remove the remaining NIMCache and NIMService resources and the Persistent Volume Claims (PVCs) created by the chart, which are not removed by default.

oc delete nimcache --all -n $NAMESPACE
oc delete nimservice --all -n $NAMESPACE
oc delete pvc --all -n $NAMESPACE

To delete the namespace entirely:

oc delete namespace $NAMESPACE

OpenShift-Specific Troubleshooting#

Security Context Constraints (SCC)#

Symptom: Pods fail with CrashLoopBackOff and logs show permission errors such as mkdir: cannot create directory '/opt/nim/.cache': Permission denied.

Why: OpenShift’s default restricted SCC assigns random UIDs. NIM containers and infrastructure services expect to run as specific users.

Fix: The chart’s openshift.yaml template automatically grants the anyuid SCC to required ServiceAccounts when openshift.enabled is true. If you are not using values-openshift.yaml, grant anyuid manually:

oc adm policy add-scc-to-user anyuid -z default -n $NAMESPACE

nv-ingest Ray Worker Failures on Clusters with Low `podPidsLimit`#

Symptom: The rag-nv-ingest pod restarts repeatedly with pthread_create failed: Resource temporarily unavailable in its logs. Ingestion tasks remain in the pending state and the Redis queue (LLEN ingest_task_queue) does not drain.

Why: The pod’s cgroup cpuset.cpus reflects the full host CPU set (for example, 0-255), so Ray detects all host CPUs and prestarts an equally large Python worker pool. Each worker spawns several gRPC threads during initialization. On clusters where the kubelet enforces the default podPidsLimit of 4096, the cumulative thread count exceeds the cgroup’s PID ceiling, and worker processes are terminated before they can register with the raylet.

Recommended fix: Raise the kubelet podPidsLimit to 16384 via a KubeletConfig custom resource. See Prerequisites step 10 for the manifest. This is the cluster-level change that addresses the root cause.

Workaround when the cluster podPidsLimit cannot be raised: The values-openshift.yaml overlay enables a sitecustomize.py ConfigMap (nv-ingest.pyPatches.enabled: true) that overrides os.cpu_count and psutil.cpu_count to return the value of RAG_NV_INGEST_DETECTED_CPUS (default 4). The overlay also sets MAX_INGEST_PROCESS_WORKERS=4 to cap the number of Ray actor replicas per pipeline stage. Together, these settings keep the pod’s steady-state PID count well below the cgroup limit at the cost of slower per-document throughput.

Tuning the worker count: To change the worker count, update the values in values-openshift.yaml and re-run helm upgrade:

nv-ingest:
  envVars:
    RAG_NV_INGEST_DETECTED_CPUS: "8"   # increase to improve throughput
    MAX_INGEST_PROCESS_WORKERS: "8"    # keep aligned with the value above

Alternatively, override the values on the command line without editing the file:

helm upgrade --install rag -n $NAMESPACE . \
  -f values-openshift.yaml \
  --set 'nv-ingest.envVars.RAG_NV_INGEST_DETECTED_CPUS=8' \
  --set 'nv-ingest.envVars.MAX_INGEST_PROCESS_WORKERS=8' \
  --set imagePullSecret.password="$NGC_API_KEY" \
  --set ngcApiSecret.password="$NGC_API_KEY"

Higher values reduce per-document ingestion latency but increase the pod’s PID consumption. Values above 8 are not recommended unless the kubelet podPidsLimit has first been raised (typically to 16384) via the KubeletConfig manifest in Prerequisites step 10.
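
To judge how much headroom a given worker count leaves, you can compare the container's current PID count with the kubelet limit. The sketch below is an assumption-based example: `pid_usage_percent` is a hypothetical helper, and it expects a directory containing `pids.current`, which on cgroup v2 hosts is typically `/sys/fs/cgroup` inside the container (cgroup v1 typically uses `/sys/fs/cgroup/pids`). The count covers only one container, so treat the result as an estimate for multi-container pods.

```shell
#!/bin/sh
# Hypothetical helper: print PID usage as an integer percentage of a given limit.
# Usage: pid_usage_percent <limit> [cgroup_dir]
pid_usage_percent() {
  limit="$1"
  dir="${2:-/sys/fs/cgroup}"
  # Reject an empty, non-numeric, or zero limit to avoid a division error.
  case "$limit" in
    ''|*[!0-9]*|0) return 2 ;;
  esac
  current=$(cat "$dir/pids.current" 2>/dev/null) || return 2
  echo $(( $current * 100 / $limit ))
}

# Example, run inside the container (not executed here):
# pid_usage_percent 4096
```

If the helper is not available inside the image, reading the file directly with `oc exec <pod-name> -n $NAMESPACE -- cat /sys/fs/cgroup/pids.current` gives the same input, assuming the image ships `cat`.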

Reranker HTTP 500 Errors from Thread Pool Initialization Failure#

Symptom: The rag-server logs report [500] Unknown Error during query generation. The nemotron-ranking-ms pod logs contain ThreadPoolBuildError { kind: IOError(Os { code: 11, kind: WouldBlock }) } originating in the HuggingFace tokenizer path.

Why: The reranker NIM’s Rust/Rayon thread pool defaults to one thread per host CPU. On nodes that expose the full host cpuset, initialization exceeds the kubelet podPidsLimit and the NIM returns HTTP 500. The base chart sets thread caps on the OCR and YOLOX NIMs but not on the reranker.

Recommended fix: Raise the kubelet podPidsLimit to 16384 via a KubeletConfig custom resource. See Prerequisites step 10 for the manifest.

Workaround when the cluster podPidsLimit cannot be raised: The values-openshift.yaml overlay sets RAYON_NUM_THREADS=4 and TOKENIZERS_PARALLELISM=false on the reranker NIM. To adjust the cap, edit the value in values-openshift.yaml and re-run helm upgrade.

GPU Node Scheduling and Tolerations#

Symptom: NIM pods stay in Pending state.

Why: GPU nodes typically have taints. NIM workloads need matching tolerations.

Fix: Discover your taint keys and set tolerations in your values file:

oc get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns="NODE:.metadata.name,TAINTS:.spec.taints[*].key"

Set matching tolerations for each NIM component via --set-json or a values overlay. The values-openshift-test.yaml file demonstrates the pattern.

NIM LLM VRAM Requirements#

Symptom: NIM LLM pod crashes during model loading with torch.OutOfMemoryError.

Fix: For GPUs with limited VRAM, reduce NIM_MAX_MODEL_LEN or use NVIDIA-hosted models as described in Using NVIDIA-Hosted Models.

Route Timeouts#

Symptom: Document ingestion or complex queries return 504 Gateway Timeout.

Why: OpenShift’s default Route timeout is 30 seconds. The chart sets haproxy.router.openshift.io/timeout: 300s on the RAG server Route, but if you create Routes manually, set this annotation explicitly.
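
For manually created Routes, the annotation can be applied with `oc annotate`. The sketch below wraps that command with a format check; `valid_route_timeout` and `set_route_timeout` are hypothetical helpers, and the accepted units (us, ms, s, m, h, d) are assumed from the HAProxy time format used by the OpenShift router, so confirm them against your OpenShift version.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the value looks like <number><unit>.
valid_route_timeout() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(us|ms|s|m|h|d)$'
}

# Hypothetical helper: annotate a Route with a validated timeout.
# Usage: set_route_timeout <route-name> <timeout>
set_route_timeout() {
  if ! valid_route_timeout "$2"; then
    echo "Invalid timeout: $2" >&2
    return 1
  fi
  oc annotate route "$1" -n "$NAMESPACE" \
    "haproxy.router.openshift.io/timeout=$2" --overwrite
}

# Example usage (not executed here; the route name is a placeholder):
# set_route_timeout <route-name> 300s
```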

Common Issues#

| Issue | Cause | Solution |
| --- | --- | --- |
| Pods stuck in `Pending` | Missing tolerations or insufficient GPU resources | Check taints; set tolerations in values |
| `ImagePullBackOff` | Missing NGC secret or unaccepted NIM license | Verify `ngc-secret` exists; accept licenses at build.nvidia.com |
| `CrashLoopBackOff` | SCC restrictions or insufficient memory | Enable `openshift.enabled`; check resource limits |
| NIM LLM crashes with `torch.OutOfMemoryError` | Insufficient VRAM | Reduce `NIM_MAX_MODEL_LEN` or use NVIDIA-hosted LLM |
| PVC `Pending` | StorageClass not found | Set correct `storageClass` in values or use `""` for default |
| `504 Gateway Timeout` | Route timeout too low | Annotate route with `haproxy.router.openshift.io/timeout=300s` |
| NIMCache `ImagePullBackOff` | Pull secret not linked to `nim-cache-sa` | Run `oc secrets link nim-cache-sa ngc-secret --for=pull` |
| Ingest tasks stuck `pending` | nv-ingest Ray workers hit `podPidsLimit` | See nv-ingest Ray Worker Failures |
| Reranker returns HTTP 500 with `ThreadPoolBuildError` | Rust/Rayon thread pool exceeds pod PID limit | See Reranker HTTP 500 Errors |
| `helm upgrade` fails with `yaml: ... did not find expected '-' indicator` | Indent adjustment needed in pulled `nv-ingest` 26.3.0 chart | Re-apply the post-`dependency build` step in Deploy step 2 |

Troubleshooting Helm Issues#

For general troubleshooting issues with Helm deployment, refer to Troubleshooting.
