Installing the NVIDIA Enterprise RAG LLM Operator

Prerequisites

A Kubernetes cluster and the cluster-admin role. Refer to Platform Support for information about supported operating systems and Kubernetes platforms.
NVIDIA A100 80 GB, H100, or L40S GPUs on one or more nodes. Refer to Platform Support for the supported models and the GPU model and GPU count that each one requires. Models that exceed the memory capacity of a single GPU need additional GPUs. When you deploy a Helm pipeline, you can specify more than one GPU for a workload.
An NGC CLI API key. Pods use the API key as an image pull secret to download container images that are available to early access customers only. Refer to Generating Your NGC API Key in the NVIDIA NGC User Guide for more information.
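
The cluster-admin prerequisite can be checked before you start. The following is a minimal sketch, assuming `kubectl` is already configured for the target cluster; `check_cluster_admin` is a hypothetical helper name and not part of any NVIDIA tooling.

```shell
# Sketch: confirm that the current kubeconfig user can perform any action
# on any resource in every namespace, which is what cluster-admin grants.
# check_cluster_admin is a hypothetical helper name.
check_cluster_admin() {
  # "kubectl auth can-i" prints "yes" or "no"; '*' '*' means any verb on any resource.
  answer="$(kubectl auth can-i '*' '*' --all-namespaces 2>/dev/null || true)"
  if [ "$answer" = "yes" ]; then
    echo "cluster-admin access confirmed"
  else
    echo "cluster-admin access not confirmed" >&2
    return 1
  fi
}
```

Run `check_cluster_admin` from the same shell that you use for the installation commands.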

Special Considerations for VMware vSphere with Tanzu

Tanzu Kubernetes Grid Service enables the PodSecurityPolicy Admission Controller in Tanzu Kubernetes clusters. The admission controller enforces the pod security policy for pods that are created with a service account. Because the Operator uses a service account, you must label the Operator namespace so that the policy is not enforced on its pods.

Enter the following commands before installing the Operator:

$ kubectl create namespace rag-operator
$ kubectl label --overwrite ns rag-operator pod-security.kubernetes.io/warn=privileged pod-security.kubernetes.io/enforce=privileged
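
To confirm that the label was applied, you can inspect the namespace labels. This is a minimal sketch; `check_privileged_label` is a hypothetical helper name, and it checks only the `enforce` label.

```shell
# Sketch: verify that the namespace carries the enforce=privileged label.
# check_privileged_label is a hypothetical helper name.
check_privileged_label() {
  ns="${1:-rag-operator}"
  # --show-labels appends a LABELS column; grep -F matches the label literally.
  if kubectl get namespace "$ns" --show-labels \
      | grep -qF 'pod-security.kubernetes.io/enforce=privileged'; then
    echo "$ns: enforce=privileged is set"
  else
    echo "$ns: enforce=privileged is missing" >&2
    return 1
  fi
}
```

Calling `check_privileged_label` with no argument checks the `rag-operator` namespace.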

Install the NVIDIA GPU Operator

Use the NVIDIA GPU Operator to install, configure, and manage the NVIDIA GPU driver and the NVIDIA container runtime on the nodes in your Kubernetes cluster.

Add the NVIDIA Helm repository:

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
   && helm repo update

Install the Operator:

$ helm install --wait --generate-name \
   -n gpu-operator --create-namespace \
   nvidia/gpu-operator

For more information or to adjust the configuration, refer to Install NVIDIA GPU Operator in the NVIDIA GPU Operator documentation.
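
After the GPU Operator installation completes, it can help to confirm that the nodes advertise GPUs before you continue. The sketch below assumes the `gpu-operator` namespace from the install command above; `gpu_operator_report` is a hypothetical helper name.

```shell
# Sketch: show GPU Operator pod status and the GPUs that each node advertises.
# gpu_operator_report is a hypothetical helper name.
gpu_operator_report() {
  # Operator pods run in the namespace that the install command created.
  kubectl get pods -n gpu-operator
  # nvidia.com/gpu is listed under allocatable once the device plugin is running;
  # nodes without GPUs show <none> in the GPUS column.
  kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

A node that shows `<none>` in the GPUS column either has no GPU or is not yet ready to schedule GPU workloads.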

Install the RAG LLM Operator

Add the Enterprise RAG LLM Operator repository:

$ helm repo add rag-operator https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants \
   --username '$oauthtoken' --password <ngc-cli-api-key>
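
Passing the API key with `--password` leaves it in your shell history. As an alternative, `helm repo add` can read the password from standard input with `--password-stdin`. The sketch below assumes that the key is stored in an environment variable named `NGC_API_KEY`; that variable name and the `add_rag_operator_repo` helper are illustrative choices, not part of the original instructions.

```shell
# Sketch: add the repository without placing the API key on the command line.
# NGC_API_KEY and add_rag_operator_repo are hypothetical names.
add_rag_operator_repo() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "set NGC_API_KEY to your NGC CLI API key first" >&2
    return 1
  fi
  # The username is the literal string $oauthtoken, so it stays in single quotes.
  printf '%s' "$NGC_API_KEY" | helm repo add rag-operator \
    https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants \
    --username '$oauthtoken' --password-stdin
}
```

One way to set the variable without echoing the key is `read -rs NGC_API_KEY` in Bash, followed by a call to `add_rag_operator_repo`.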

Create the RAG Operator namespace, unless you already created it for VMware vSphere with Tanzu:

$ kubectl create namespace rag-operator
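
The VMware vSphere with Tanzu preparation also creates this namespace, and running `kubectl create namespace` a second time fails because the namespace already exists. A minimal sketch of a create-if-missing step follows; `ensure_namespace` is a hypothetical helper name.

```shell
# Sketch: create the namespace only when it does not exist yet.
# ensure_namespace is a hypothetical helper name.
ensure_namespace() {
  ns="${1:-rag-operator}"
  # "kubectl get" exits non-zero when the namespace is absent,
  # so the create command runs only in that case.
  kubectl get namespace "$ns" >/dev/null 2>&1 \
    || kubectl create namespace "$ns"
}
```

This keeps the step safe to repeat, at the cost of one extra API call.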

Add a Docker registry secret that the Operator uses to pull container images from NGC:

$ kubectl create secret -n rag-operator docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<ngc-cli-api-key>
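
A quick way to confirm that the secret exists with the expected type is to read its `type` field. Secrets created with `kubectl create secret docker-registry` have the type `kubernetes.io/dockerconfigjson`. The helper name `check_pull_secret` below is hypothetical, and the check does not validate the credentials themselves.

```shell
# Sketch: confirm that ngc-secret exists and is a Docker registry secret.
# check_pull_secret is a hypothetical helper name.
check_pull_secret() {
  secret_type="$(kubectl get secret ngc-secret -n rag-operator \
    -o jsonpath='{.type}' 2>/dev/null || true)"
  if [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]; then
    echo "ngc-secret is a Docker registry secret"
  else
    echo "ngc-secret is missing or has an unexpected type: ${secret_type:-none}" >&2
    return 1
  fi
}
```

An invalid API key is detected only later, when a pod fails to pull its image.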

Install the Operator:

$ helm install rag-operator rag-operator/rag-operator -n rag-operator

Optional: Confirm the controller pod is running:

$ kubectl get pods -n rag-operator

Example Output

NAME                                                   READY   STATUS    RESTARTS   AGE
k8s-rag-operator-controller-manager-6b546f57d5-g4zgg   2/2     Running   0          35h
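
Instead of polling `kubectl get pods` by hand, you can block until the pods report Ready. The following is a minimal sketch; the 300-second default timeout is an arbitrary assumption and `wait_for_rag_operator` is a hypothetical helper name.

```shell
# Sketch: wait until every pod in the rag-operator namespace is Ready.
# wait_for_rag_operator is a hypothetical helper name; 300s is an assumed default.
wait_for_rag_operator() {
  timeout="${1:-300s}"
  # kubectl wait exits non-zero when the timeout expires
  # or when no pods exist in the namespace yet.
  kubectl wait --for=condition=Ready pods --all \
    -n rag-operator --timeout="$timeout"
}
```

For example, `wait_for_rag_operator 120s` waits for up to two minutes.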

Next Steps

Refer to Sample RAG Pipeline to install and configure the inference and embedding services.
