Deploy NVIDIA RAG Blueprint on Kubernetes with NIM Operator#

Use the following documentation to deploy the NVIDIA RAG Blueprint by using NIM Operator. For other deployment options, refer to Deployment Options.

Prerequisites#

  1. Get an API Key.
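
     The commands that follow read your key from the NGC_API_KEY environment variable; a minimal sketch, assuming you export it in your shell:

    export NGC_API_KEY="<your-api-key>"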

  2. Install NIM Operator 3.0.

  3. Ensure you meet the hardware requirements.

  4. Verify that you have the NGC CLI available on your client computer. You can download the CLI from https://ngc.nvidia.com/setup/installers/cli.
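
     After you download the CLI, you can configure it with your API key; ngc config set prompts for the key, org, and team (a sketch, assuming the CLI is on your PATH):

    ngc config set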

  5. Verify that you have Kubernetes v1.33 installed and running on Ubuntu 22.04. For more information, see Kubernetes documentation and NVIDIA Cloud Native Stack repository.

  6. Verify that you have a default storage class available in the cluster for PVC provisioning. One option is the local path provisioner by Rancher. Refer to the installation section of the README in the GitHub repository.

    kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml
    kubectl get pods -n local-path-storage
    kubectl get storageclass
    
  7. If the local path storage class is not set as default, you can make it default by running the following code.

    kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    
  8. Verify that you have installed the NVIDIA GPU Operator by using the instructions here.
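
     If the GPU Operator is not yet installed, a typical Helm installation looks like the following (a sketch; refer to the linked instructions for the current chart version and options):

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
    helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator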

Enable NIM Operator with Helm Chart#

  1. Change your directory to deploy/helm/ by running the following code.

    cd deploy/helm/
    
  2. Create a namespace for the deployment by running the following code.

    kubectl create namespace rag
    
  3. Create the image pull secrets by running the following commands.

    kubectl create secret -n rag docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=$NGC_API_KEY
    
    kubectl create secret -n rag generic ngc-api-secret \
    --from-literal=NGC_API_KEY=$NGC_API_KEY
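
     You can confirm that both secrets exist before proceeding (a quick check):

    kubectl get secrets -n rag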
    
  4. Create a NIM cache with an available storage class on the cluster.

    kubectl apply -f deploy/helm/nim-operator/rag-nimcache.yaml -n rag
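
     The NIM cache downloads model artifacts before the services can start, which can take a while. You can watch its progress; the NIMCache resource reports a Ready status once caching completes (a quick check):

    kubectl get nimcaches -n rag -w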
  5. Create the NIM services.

    • No GPU Sharing

      kubectl apply -f deploy/helm/nim-operator/rag-nimservice.yaml -n rag
      
    • GPU Sharing with Dynamic Resource Allocation (DRA)

      Tip

      With the DRA setup, all NIM services can run on three GPUs with at least 80 GB of memory each, such as A100, H100, or B200.

      • Prerequisite: NVIDIA DRA Driver

        • Kubernetes v1.33 or newer is required. Run the following commands to enable the DRA feature gates on an existing Kubernetes cluster.

        sudo sed -i 's/- kube-apiserver/- kube-apiserver\n    - --feature-gates=DynamicResourceAllocation=true\n    - --runtime-config=resource.k8s.io\/v1beta1=true\n    - --runtime-config=resource.k8s.io\/v1beta2=true/' /etc/kubernetes/manifests/kube-apiserver.yaml
        
        sudo sed -i 's/- kube-scheduler/- kube-scheduler\n    - --feature-gates=DynamicResourceAllocation=true/' /etc/kubernetes/manifests/kube-scheduler.yaml
        
        sudo sed -i 's/- kube-controller-manager/- kube-controller-manager\n    - --feature-gates=DynamicResourceAllocation=true/' /etc/kubernetes/manifests/kube-controller-manager.yaml
        
        sudo sed -i '$a\'$'\n''featureGates:\n  DynamicResourceAllocation: true' /var/lib/kubelet/config.yaml
        
        sudo systemctl daemon-reload; sudo systemctl restart kubelet
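
        After the control-plane pods restart, you can confirm that the DRA APIs are served; the resource.k8s.io group should list resources such as resourceclaims (a quick check):

        kubectl api-resources --api-group=resource.k8s.io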
        
        • Enable CDI in the GPU Operator and wait a few minutes.

        kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/cdi/enabled", "value":true}]'
        kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/cdi/default", "value":true}]'
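
        To confirm that CDI is enabled before moving on, inspect the cluster policy (a quick check):

        kubectl get clusterpolicies.nvidia.com/cluster-policy -o jsonpath='{.spec.cdi}'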
        
        • Verify that the NVIDIA GPU driver is version 565 or later.

        kubectl get pods -l app.kubernetes.io/component=nvidia-driver -n nvidia-gpu-operator -o name | xargs -I {} kubectl exec -n nvidia-gpu-operator {} -- nvidia-smi
        
        • Install the NVIDIA DRA Driver

        helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
          --version="25.3.2" \
          --create-namespace --namespace nvidia-dra-driver-gpu \
          --set gpuResourcesEnabledOverride=true \
          --set nvidiaDriverRoot=/run/nvidia/driver
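
        After the chart installs, verify that the DRA driver pods are running (a quick check):

        kubectl get pods -n nvidia-dra-driver-gpu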
        
      kubectl apply -f deploy/helm/nim-operator/rag-nimservice-dra.yaml -n rag
      
    • List the available profiles for your system for the NIM LLM container

      More details about profiles can be found here.
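
      For example, you can list the profiles that a NIM LLM container supports by running its list-model-profiles utility on a GPU-equipped host (a sketch; the image name and tag are illustrative, so match them to the NIM you deploy):

      docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY \
        nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1:latest list-model-profiles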

    • Configure NIM Model Profile

      For optimal performance, configure the NIM model profile. See the NIM Model Profile Configuration section for detailed instructions and hardware-specific examples.

      Configure the NIM_MODEL_PROFILE in deploy/helm/nim-operator/rag-nimservice.yaml:

      storage:
        nimCache:
          name: nemotron-llama3-49b-super
          profile: ''
        sharedMemorySizeLimit: 16Gi
      env:
      - name: NIM_MODEL_PROFILE
        value: "tensorrt_llm-h100_nvl-fp8-tp1-pp1-throughput-2321:10de-6343e21ba5cccf783d18951c6627c207b81803c3c45f1e8b59eee062ed350143-1"  # Example for H100 NVL
      

      After modifying the profile, reapply the NIM service:

      kubectl apply -f deploy/helm/nim-operator/rag-nimservice.yaml -n rag
      
  6. Wait a few minutes and ensure that the NIMService status is Ready before proceeding to the next steps.

    kubectl get nimservice -n rag
    NAME                                STATUS     AGE
    nemoretriever-embedding-ms          Ready      20m
    nemoretriever-reranking-ms          Ready      20m
    nemoretriever-graphic-elements-v1   Ready      20m
    nemoretriever-page-elements-v2      Ready      20m
    nemoretriever-table-structure-v1    Ready      20m
    nim-llm                             Ready      20m
    
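     If a NIMService stays in a not-ready state, describing it usually surfaces the cause; a troubleshooting sketch using nim-llm as an example:

    kubectl describe nimservice nim-llm -n rag
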
  7. Delete the existing secret because it conflicts with the RAG installation.

    kubectl delete secret ngc-secret -n rag
  8. Use the values-nim-operator.yaml file to deploy the RAG Blueprint with the NIM Operator NIM services.

    helm upgrade --install rag -n rag https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev.tgz \
    --username '$oauthtoken' \
    --password "${NGC_API_KEY}" \
    --set imagePullSecret.password=$NGC_API_KEY \
    --set ngcApiSecret.password=$NGC_API_KEY \
    -f deploy/helm/nim-operator/values-nim-operator.yaml
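
     After the release deploys, confirm that the pods in the rag namespace come up (a quick check):

    kubectl get pods -n rag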

Port-Forwarding to Access Web User Interface#

For Helm deployments, to port-forward the RAG UI service to your local computer, run the following code. Then access the RAG UI at http://localhost:3000.

kubectl port-forward -n rag service/rag-frontend 3000:3000 --address 0.0.0.0
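
With the port-forward running, you can confirm that the UI responds before opening a browser; a 200 status code indicates it is reachable (a quick check):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000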

Experiment with the Web User Interface#

  1. Open a web browser and access the RAG UI. You can start experimenting by uploading docs and asking questions. For details, see User Interface for NVIDIA RAG Blueprint.