Sample RAG Application
Prerequisites
NVIDIA NIM microservices are deployed:
NVIDIA NIM for LLMs
NeMo Retriever Text Embedding NIM
NeMo Retriever Text Reranking NIM
An active subscription to an NVIDIA AI Enterprise product or membership in the NVIDIA Developer Program. Access to the Helm charts and containers is restricted.
Install a Vector Database
NVIDIA used Milvus in a standalone configuration during development and testing. Milvus provides a GPU-accelerated vector store. Pgvector is also supported by the Chain Server application.
If you do not already have Milvus running, refer to Run Milvus with GPU Support Using Helm Chart in the Milvus documentation.
Tip
The Milvus Helm chart does not specify a storage class for its PVCs. If your cluster does not have a default storage class provisioner, you can set one as the default with a command like the following:
$ kubectl patch storageclass <storage-class-name> \
-p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Install the NVIDIA Multi-Turn RAG Application
Create a RAG sample namespace:
$ kubectl create namespace rag-sample
Fetch the Helm chart from NGC:
$ helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/rag-app-multiturn-chatbot-v24.06.tgz \
    --username='$oauthtoken' --password=<ngc-api-key>
Save the values from the chart in a file:
$ helm show values rag-app-multiturn-chatbot-v24.06.tgz > values.yaml
Edit the values.yaml file and update the imagePullSecret.password and query.env fields with the following environment variables:

imagePullSecret:
  ...
  password: "<ngc-api-key>"

env:
  APP_VECTORSTORE_URL: "http://milvus.milvus.svc.cluster.local:19530"
  APP_VECTORSTORE_NAME: "milvus"
  APP_LLM_SERVERURL: "meta-llama3-8b-instruct.nim-service.svc.cluster.local:8000"
  APP_LLM_MODELNAME: meta/llama3-8b-instruct
  APP_LLM_MODELENGINE: nvidia-ai-endpoints
  APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000"
  APP_EMBEDDINGS_MODELNAME: nvidia/nv-embedqa-e5-v5
  APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
  APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.nim-service.svc.cluster.local:8000"
  APP_RANKING_MODELNAME: nvidia/nv-rerankqa-mistral-4b-v3
  APP_RANKING_MODELENGINE: nvidia-ai-endpoints
  COLLECTION_NAME: multi_turn_rag
  APP_RETRIEVER_TOPK: 2
  APP_RETRIEVER_SCORETHRESHOLD: 0.25
  APP_TEXTSPLITTER_CHUNKSIZE: 506
  APP_TEXTSPLITTER_CHUNKOVERLAP: 200
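To get a feel for the APP_TEXTSPLITTER_CHUNKSIZE and APP_TEXTSPLITTER_CHUNKOVERLAP settings, a back-of-envelope estimate of chunk counts can help. This is a hypothetical sliding-window model; the actual splitter also honors separators such as paragraph breaks, so real chunk counts can differ:

```shell
# Estimate how many chunks a document yields with the splitter settings above.
# Sliding-window model only (a sketch, not the chain server's exact splitter).
chunk_size=506
chunk_overlap=200
doc_len=10000                                    # example document length in characters

stride=$((chunk_size - chunk_overlap))           # new characters consumed per chunk
chunks=$(( (doc_len - chunk_overlap + stride - 1) / stride ))   # ceiling division
echo "stride=$stride chunks=$chunks"
```

With a 200-character overlap, each chunk after the first adds only 306 new characters, so large overlaps noticeably increase the number of vectors stored per document.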
The reranking microservice is optional. Set the APP_RANKING_SERVERURL and APP_RANKING_MODELNAME variables to empty strings ("") to prevent the chain server from attempting to use the reranking microservice. Keep APP_RANKING_MODELENGINE: nvidia-ai-endpoints even if you did not deploy a reranking microservice.

The APP_VECTORSTORE_URL value is for Milvus running in a milvus namespace. Substitute your cluster-specific namespace, or another address if Milvus is not running in the same cluster.

If you use pgvector, specify a connection string such as pgvector.<namespace>:5432 and set APP_VECTORSTORE_NAME: pgvector.

The APP_xxxxx_SERVERURL values are for services running in the nim-service namespace. Substitute your cluster-specific namespace.
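For example, if you did not deploy the reranking microservice, the ranking fields in values.yaml would look like the following fragment, based on the notes above:

```yaml
env:
  # Reranking disabled: empty server URL and model name,
  # but the model engine value is kept.
  APP_RANKING_SERVERURL: ""
  APP_RANKING_MODELNAME: ""
  APP_RANKING_MODELENGINE: nvidia-ai-endpoints
```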
Install the Helm chart:
$ helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml
Optional: List resources in the namespace:
$ kubectl get all -n rag-sample
Example Output
NAME                                                READY   STATUS    RESTARTS   AGE
pod/chain-server-multi-turn-9759ff9ff-62fdh         1/1     Running   0          99s
pod/rag-playground-multiturn-rag-5cbdc574d6-tgb9l   1/1     Running   0          99s

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/chain-server-multi-turn        ClusterIP   10.105.82.33    <none>        8082/TCP         99s
service/rag-playground-multiturn-rag   NodePort    10.99.241.217   <none>        3001:30621/TCP   99s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/chain-server-multi-turn        1/1     1            1           99s
deployment.apps/rag-playground-multiturn-rag   1/1     1            1           99s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/chain-server-multi-turn-9759ff9ff         1         1         1       99s
replicaset.apps/rag-playground-multiturn-rag-5cbdc574d6   1         1         1       99s
Accessing the RAG Playground
If your cluster is not configured to work with an external load balancer or ingress, you can port-forward the HTTP connection to the sample chat application.
Determine the node port for the sample chat application:
$ kubectl get service -n rag-sample rag-playground-multiturn-rag
In the following sample output, the application is listening on node port 30817.
NAME                           TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h
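If you script this step, the node port can be parsed from the PORT(S) column. The sketch below runs against the sample line above; on a live cluster, `kubectl get service -n rag-sample rag-playground-multiturn-rag -o jsonpath='{.spec.ports[0].nodePort}'` queries the same value directly:

```shell
# Extract the node port from the PORT(S) column of the sample output above.
line='rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h'
node_port=$(printf '%s\n' "$line" | sed -E 's|.*:([0-9]+)/TCP.*|\1|')
echo "$node_port"
```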
Forward the port:
$ kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample 30817:3001
After you forward the port, you can access the application at http://localhost:30817.
Next Steps
You can uninstall the Helm chart by running the following command:

$ helm uninstall -n rag-sample multiturn-rag