A Kubernetes cluster and access to it with the cluster-admin role. Refer to Platform Support for information about supported operating systems and Kubernetes platforms.
NVIDIA A100 80 GB, H100, or L40S GPUs on one or more nodes. Refer to Platform Support for the supported models and the GPU model and GPU count that each model requires. A model that exceeds the memory capacity of a single GPU needs additional GPUs; when you deploy a Helm pipeline, you can specify more than one GPU for a workload.
An NGC CLI API key. Pods use the API key as an image pull secret to download container images that are available to early access customers only. Refer to Generating Your NGC API Key in the NVIDIA NGC User Guide for more information.
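The install commands below use the NGC API key in several places. One way to avoid pasting it into each command, and into shell history, is to hold it in an environment variable. The following bash sketch assumes a variable named NGC_API_KEY and a helper named require_ngc_api_key; both are hypothetical names chosen for this example, and neither Helm nor the Operator reads them on their own.

```shell
#!/usr/bin/env bash
# Hypothetical helper. NGC_API_KEY is an assumed variable name for this sketch;
# nothing in Helm or the Operator reads it automatically.

# Returns 0 if NGC_API_KEY is set and non-empty; otherwise prints a hint to
# stderr and returns 1.
require_ngc_api_key() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set; run: read -rs NGC_API_KEY && export NGC_API_KEY" >&2
    return 1
  fi
  return 0
}
```

For example, enter the key without echoing it, check it, and then substitute "$NGC_API_KEY" wherever a command shows <ngc-cli-api-key>:
$ read -rs NGC_API_KEY && export NGC_API_KEY
$ require_ngc_api_key && echo ok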
Tanzu Kubernetes Grid Service enables the PodSecurityPolicy admission controller in Tanzu Kubernetes clusters. The admission controller enforces the pod security policy for pods that are created with a service account. Because the Operator uses a service account, you must label its namespace with the privileged pod security level so that the Operator's pods are admitted.
Enter the following commands before installing the Operator:
$ kubectl create namespace rag-operator
$ kubectl label --overwrite ns rag-operator pod-security.kubernetes.io/warn=privileged pod-security.kubernetes.io/enforce=privileged
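To confirm that the labels were applied before you install the Operator, you can read them back from the namespace. The following bash sketch uses kubectl jsonpath output; check_psa_labels is a hypothetical helper name, and it only verifies that both the warn and enforce labels are set to privileged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 only if the pod-security warn and enforce
# labels on the namespace (default: rag-operator) are both "privileged".
check_psa_labels() {
  local ns="${1:-rag-operator}" level value
  for level in warn enforce; do
    # Dots inside the label key are escaped so jsonpath treats them literally.
    value=$(kubectl get namespace "$ns" \
      -o jsonpath="{.metadata.labels.pod-security\\.kubernetes\\.io/${level}}")
    if [ "$value" != "privileged" ]; then
      echo "namespace ${ns}: pod-security.kubernetes.io/${level}='${value}', expected 'privileged'" >&2
      return 1
    fi
  done
  return 0
}
```

Usage:
$ check_psa_labels rag-operator && echo labels-ok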
Use the NVIDIA GPU Operator to install, configure, and manage the NVIDIA GPU driver and the NVIDIA container runtime on the Kubernetes nodes.
Add the NVIDIA Helm repository:
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
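helm search repo searches the locally cached repository indexes, so it can confirm that the repository update picked up the GPU Operator chart. The following bash sketch wraps that check in a hypothetical chart_available function; it assumes the default table output, whose first column is the chart name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the exact chart name (for example,
# nvidia/gpu-operator) appears in the NAME column of `helm search repo`.
chart_available() {
  local chart="$1"
  # Skip the header row, keep the NAME column, and match it literally.
  helm search repo "$chart" | awk 'NR > 1 { print $1 }' | grep -qxF "$chart"
}
```

Usage:
$ chart_available nvidia/gpu-operator && echo found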
Install the GPU Operator:
$ helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator
For more information or to adjust the configuration, refer to Install NVIDIA GPU Operator in the NVIDIA GPU Operator documentation.
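After the GPU Operator's device plugin is running, each GPU node advertises its GPUs as the allocatable extended resource nvidia.com/gpu. The following bash sketch sums that value across nodes, which is one way to check that the cluster exposes as many GPUs as your models need; total_allocatable_gpus is a hypothetical helper name, and nodes without the resource count as zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the total nvidia.com/gpu allocatable count across
# all nodes. Nodes that do not advertise the resource contribute 0.
total_allocatable_gpus() {
  # One "<node>\t<gpu count>" line per node; the count is empty on CPU-only nodes.
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' |
    awk -F'\t' '{ total += ($2 == "" ? 0 : $2) } END { print total + 0 }'
}
```

Usage:
$ total_allocatable_gpus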
Add the Enterprise LLM RAG Operator repository:
$ helm repo add rag-operator https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants \
    --username "\$oauthtoken" --password <ngc-cli-api-key>
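Passing the key with --password leaves it in shell history and in the process list. helm repo add also accepts --password-stdin, so a variant of the command above can pipe the key in instead. This bash sketch assumes the key is already in an NGC_API_KEY environment variable; add_rag_repo is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: adds the rag-operator Helm repository, reading the NGC
# API key from the assumed NGC_API_KEY variable and passing it on stdin.
add_rag_repo() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # '$oauthtoken' is single-quoted so the shell passes it literally.
  printf '%s' "$NGC_API_KEY" |
    helm repo add rag-operator https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants \
      --username '$oauthtoken' --password-stdin
}
```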
Create the RAG Operator namespace. If you already created it for Tanzu Kubernetes Grid Service, skip this step:
$ kubectl create namespace rag-operator
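kubectl create namespace fails with an AlreadyExists error when the namespace exists, for example because you created it for Tanzu Kubernetes Grid Service. The following bash sketch creates the namespace only when kubectl get cannot find it, so it is safe to rerun; ensure_namespace is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates the namespace only if `kubectl get` cannot find it.
ensure_namespace() {
  local ns="$1"
  kubectl get namespace "$ns" >/dev/null 2>&1 || kubectl create namespace "$ns"
}
```

Usage:
$ ensure_namespace rag-operator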
Create a Docker registry secret that the Operator uses to pull container images from NGC:
$ kubectl create secret -n rag-operator docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=<ngc-cli-api-key>
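kubectl create secret docker-registry creates a secret of type kubernetes.io/dockerconfigjson. The following bash sketch reads the type back to confirm that the secret exists in the namespace with the expected type; check_pull_secret is a hypothetical helper name, and the check does not validate the API key itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the secret (default: ngc-secret in
# rag-operator) exists and has type kubernetes.io/dockerconfigjson.
check_pull_secret() {
  local ns="${1:-rag-operator}" name="${2:-ngc-secret}" secret_type
  secret_type=$(kubectl get secret "$name" -n "$ns" -o jsonpath='{.type}') || return 1
  [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]
}
```

Usage:
$ check_pull_secret rag-operator ngc-secret && echo secret-ok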
Install the RAG Operator:
$ helm install rag-operator rag-operator/rag-operator -n rag-operator
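The controller can take some time to pull its image and start. Instead of polling, you can block until the deployments in the namespace report the Available condition. This bash sketch uses kubectl wait with --all, so it does not assume the deployment name; wait_for_operator and the 300-second default timeout are assumptions for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: waits until every deployment in the namespace is
# Available, or fails after the timeout (defaults are assumptions).
wait_for_operator() {
  local ns="${1:-rag-operator}" timeout="${2:-300s}"
  kubectl wait deployment --all -n "$ns" --for=condition=Available --timeout="$timeout"
}
```

Usage:
$ wait_for_operator rag-operator 300s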
Optional: Confirm the controller pod is running:
$ kubectl get pods -n rag-operator
Example output:

NAME                                                   READY   STATUS    RESTARTS   AGE
k8s-rag-operator-controller-manager-6b546f57d5-g4zgg   2/2     Running   0          35h
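In the READY column, n/n means that all containers in the pod are ready. The following bash sketch applies that rule, together with a Running status, to kubectl get pods --no-headers output so that a script can perform the same check; all_pods_ready is a hypothetical helper name, and it treats an empty pod list as not ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin and
# returns 0 only if there is at least one pod and every pod is Running with
# all containers ready (READY column of the form n/n).
all_pods_ready() {
  awk '
    { n++; split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad++ }
    END { exit (n == 0 || bad > 0) }
  '
}
```

Usage:
$ kubectl get pods -n rag-operator --no-headers | all_pods_ready && echo ready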
Refer to Sample RAG Pipeline to install and configure the inference and embedding services.