Deploy NVIDIA RAG Blueprint on Kubernetes with NIM Operator#
Use the following documentation to deploy the NVIDIA RAG Blueprint by using NIM Operator. For other deployment options, refer to Deployment Options.
Prerequisites#
Ensure you meet the hardware requirements.
Verify that you have the NGC CLI available on your client computer. You can download the CLI from https://ngc.nvidia.com/setup/installers/cli.
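As a quick check, assuming the NGC CLI binary is already on your PATH, you can confirm the installation and configure your API key:
ngc --version
ngc config set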
Verify that you have Kubernetes v1.33 installed and running on Ubuntu 22.04. For more information, see Kubernetes documentation and NVIDIA Cloud Native Stack repository.
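As a quick sanity check, you can confirm the cluster version and that all nodes are in the Ready state:
kubectl version
kubectl get nodes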
Verify that you have a default storage class available in the cluster for PVC provisioning. One option is the local path provisioner by Rancher. Refer to the installation section of the README in the GitHub repository.
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml
kubectl get pods -n local-path-storage
kubectl get storageclass
If the local path storage class is not set as default, you can make it default by running the following code.
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Verify that you have installed the NVIDIA GPU Operator by using the instructions here.
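As a minimal sketch, assuming the NVIDIA Helm repository is added under the name nvidia and the same nvidia-gpu-operator namespace used later in this guide, a typical installation looks like the following; refer to the linked instructions for the authoritative steps.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait gpu-operator -n nvidia-gpu-operator --create-namespace nvidia/gpu-operator
kubectl get pods -n nvidia-gpu-operator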
Enable NIM Operator with Helm Chart#
Change your directory to deploy/helm/ by running the following code.
cd deploy/helm/
Create a namespace for the deployment by running the following code.
kubectl create namespace rag
Create the image pull and NGC API key secrets by running the following code.
kubectl create secret -n rag docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=$NGC_API_KEY

kubectl create secret -n rag generic ngc-api-secret \
  --from-literal=NGC_API_KEY=$NGC_API_KEY
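These commands assume that the NGC_API_KEY environment variable is already exported in your shell, for example:
export NGC_API_KEY="<your-ngc-api-key>"
You can verify that both secrets exist with kubectl get secrets -n rag.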
Create a NIM cache that uses an available storage class on the cluster by running the following code.
kubectl apply -f deploy/helm/nim-operator/rag-nimcache.yaml -n rag
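Model downloads can take a while. Assuming the NIM Operator CRDs are installed, you can watch the NIMCache resources until they report a Ready status:
kubectl get nimcache -n rag
kubectl describe nimcache -n rag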
Next, create the NIM services by choosing one of the following options.
No GPU Sharing
kubectl apply -f deploy/helm/nim-operator/rag-nimservice.yaml -n rag
GPU Sharing with Dynamic Resource Allocation (DRA)
Tip
With the DRA setup, all NIM services can run on 3 GPUs with at least 80 GB of memory each, such as A100, H100, or B200.
Prerequisite: NVIDIA DRA Driver
Kubernetes v1.33 or newer is required. Run the following commands to enable the DRA feature gates on an existing Kubernetes cluster.
sudo sed -i 's/- kube-apiserver/- kube-apiserver\n    - --feature-gates=DynamicResourceAllocation=true\n    - --runtime-config=resource.k8s.io\/v1beta1=true\n    - --runtime-config=resource.k8s.io\/v1beta2=true/' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/- kube-scheduler/- kube-scheduler\n    - --feature-gates=DynamicResourceAllocation=true/' /etc/kubernetes/manifests/kube-scheduler.yaml
sudo sed -i 's/- kube-controller-manager/- kube-controller-manager\n    - --feature-gates=DynamicResourceAllocation=true/' /etc/kubernetes/manifests/kube-controller-manager.yaml
sudo sed -i '$a\'$'\n''featureGates:\n  DynamicResourceAllocation: true' /var/lib/kubelet/config.yaml
sudo systemctl daemon-reload; sudo systemctl restart kubelet
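After the control-plane static pods restart, you can optionally confirm that the resource.k8s.io API group is being served:
kubectl api-resources --api-group=resource.k8s.io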
Enable CDI in the GPU Operator and wait a few minutes.
kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/cdi/enabled", "value":true}]'
kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' -p='[{"op": "replace", "path": "/spec/cdi/default", "value":true}]'
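You can optionally confirm that CDI is enabled in the cluster policy:
kubectl get clusterpolicies.nvidia.com/cluster-policy -o jsonpath='{.spec.cdi}'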
Verify that the NVIDIA GPU driver version is 565 or later.
kubectl get pods -l app.kubernetes.io/component=nvidia-driver -n nvidia-gpu-operator -o name | xargs -I {} kubectl exec -n nvidia-gpu-operator {} -- nvidia-smi
Install the NVIDIA DRA Driver
helm upgrade --install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --version="25.3.2" \
  --create-namespace --namespace nvidia-dra-driver-gpu \
  --set gpuResourcesEnabledOverride=true \
  --set nvidiaDriverRoot=/run/nvidia/driver
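Before applying the NIM services, you can optionally check that the DRA driver pods are running and that GPU ResourceSlices are published:
kubectl get pods -n nvidia-dra-driver-gpu
kubectl get resourceslices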
kubectl apply -f deploy/helm/nim-operator/rag-nimservice-dra.yaml -n rag
List the available profiles for the NIM LLM container on your system.
More details about profiles can be found here.
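As a minimal sketch, you can run the NIM LLM container's list-model-profiles utility directly with Docker; the image name and tag below are assumptions and should match the LLM NIM used in your deployment.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1:latest \
  list-model-profiles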
Configure NIM Model Profile
For optimal performance, configure the NIM model profile. See the NIM Model Profile Configuration section for detailed instructions and hardware-specific examples.
Configure the NIM_MODEL_PROFILE in deploy/helm/nim-operator/rag-nimservice.yaml:
storage:
  nimCache:
    name: nemotron-llama3-49b-super
    profile: ''
  sharedMemorySizeLimit: 16Gi
env:
  - name: NIM_MODEL_PROFILE
    value: "tensorrt_llm-h100_nvl-fp8-tp1-pp1-throughput-2321:10de-6343e21ba5cccf783d18951c6627c207b81803c3c45f1e8b59eee062ed350143-1" # Example for H100 NVL
After modifying the profile, reapply the NIM service:
kubectl apply -f deploy/helm/nim-operator/rag-nimservice.yaml -n rag
Wait a few minutes and ensure that the NIMService status is Ready before proceeding to the next steps.
kubectl get nimservice -n rag

NAME                                STATUS   AGE
nemoretriever-embedding-ms          Ready    20m
nemoretriever-reranking-ms          Ready    20m
nemoretriever-graphic-elements-v1   Ready    20m
nemoretriever-page-elements-v2      Ready    20m
nemoretriever-table-structure-v1    Ready    20m
nim-llm                             Ready    20m
Delete the existing secret because it conflicts with the RAG installation.
kubectl delete secret ngc-secret -n rag
Use the following values-nim-operator.yaml file to deploy the RAG Blueprint with the NIM Operator NIM services.
helm upgrade --install rag -n rag https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev.tgz \
--username '$oauthtoken' \
--password "${NGC_API_KEY}" \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
-f deploy/helm/nim-operator/values-nim-operator.yaml
Port-Forwarding to Access Web User Interface#
For Helm deployments, to port-forward the RAG UI service to your local computer, run the following code. Then access the RAG UI at http://localhost:3000.
kubectl port-forward -n rag service/rag-frontend 3000:3000 --address 0.0.0.0
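As an optional sanity check, confirm that the forwarded port responds before opening the browser:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000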
Experiment with the Web User Interface#
Open a web browser and access the RAG UI. You can start experimenting by uploading docs and asking questions. For details, see User Interface for NVIDIA RAG Blueprint.