Sample RAG Application
Prerequisites
NVIDIA NIM microservices are deployed:
NVIDIA NIM for LLMs
NeMo Retriever Text Embedding NIM
NeMo Retriever Text Reranking NIM
An active subscription to an NVIDIA AI Enterprise product or membership in the NVIDIA Developer Program. Access to the Helm charts and containers is restricted.
Install a Vector Database
NVIDIA used Milvus in a standalone configuration during development and testing. Milvus provides a GPU-accelerated vector store. Pgvector is also supported by the Chain Server application.
If you do not already have Milvus running, refer to Run Milvus with GPU Support Using Helm Chart in the Milvus documentation.
Tip
The Milvus Helm chart does not specify a storage class for its PVCs. If your cluster does not have a default storage class provisioner, you can set one as the default with a command like the following:
$ kubectl patch storageclass <storage-class-name> \
-p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Install the NVIDIA Multi-Turn RAG Application
Create a RAG sample namespace:
$ kubectl create namespace rag-sample
Fetch the Helm chart from NGC:
$ helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/rag-app-multiturn-chatbot-v24.06.tgz \
    --username='$oauthtoken' --password=<ngc-api-key>
Save the values from the chart in a file:
$ helm show values rag-app-multiturn-chatbot-v24.06.tgz > values.yaml
Edit the values.yaml file and update the imagePullSecret.password and query.env fields with the following environment variables:

imagePullSecret:
  ...
  password: "<ngc-api-key>"

env:
  APP_VECTORSTORE_URL: "http://milvus.milvus.svc.cluster.local:19530"
  APP_VECTORSTORE_NAME: "milvus"
  APP_LLM_SERVERURL: "meta-llama3-8b-instruct.nim-service.svc.cluster.local:8000"
  APP_LLM_MODELNAME: meta/llama3-8b-instruct
  APP_LLM_MODELENGINE: nvidia-ai-endpoints
  APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000"
  APP_EMBEDDINGS_MODELNAME: nvidia/nv-embedqa-e5-v5
  APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
  APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.nim-service.svc.cluster.local:8000"
  APP_RANKING_MODELNAME: nvidia/nv-rerankqa-mistral-4b-v3
  APP_RANKING_MODELENGINE: nvidia-ai-endpoints
  COLLECTION_NAME: multi_turn_rag
  APP_RETRIEVER_TOPK: 2
  APP_RETRIEVER_SCORETHRESHOLD: 0.25
  APP_TEXTSPLITTER_CHUNKSIZE: 506
  APP_TEXTSPLITTER_CHUNKOVERLAP: 200
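To get a feel for the APP_TEXTSPLITTER_CHUNKSIZE and APP_TEXTSPLITTER_CHUNKOVERLAP settings, a back-of-envelope estimate of chunk counts can help. This is a hypothetical sliding-window model; the actual splitter also honors separators such as paragraph breaks, so real chunk counts can differ:

```shell
# Estimate how many chunks a document yields with the splitter settings above.
# Sliding-window model only (a sketch, not the chain server's exact splitter).
chunk_size=506
chunk_overlap=200
doc_len=10000                                    # example document length in characters

stride=$((chunk_size - chunk_overlap))           # new characters consumed per chunk
chunks=$(( (doc_len - chunk_overlap + stride - 1) / stride ))   # ceiling division
echo "stride=$stride chunks=$chunks"
```

With a 200-character overlap, each chunk after the first adds only 306 new characters, so large overlaps noticeably increase the number of vectors stored per document.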
The reranking microservice is optional. Set the APP_RANKING_SERVERURL and APP_RANKING_MODELNAME variables to empty strings ("") to prevent the chain server from attempting to use the reranking microservice. Keep APP_RANKING_MODELENGINE: nvidia-ai-endpoints even if you did not deploy a reranking microservice.

The APP_VECTORSTORE_URL value is for Milvus running in a milvus namespace. Substitute your cluster-specific namespace, or another address if Milvus is not running in the same cluster.

If you use pgvector, specify a connection string such as pgvector.<namespace>:5432 and set APP_VECTORSTORE_NAME: pgvector.

The APP_xxxxx_SERVERURL values are for services running in the nim-service namespace. Substitute your cluster-specific namespace.
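For example, if you did not deploy the reranking microservice, the ranking fields in values.yaml would look like the following fragment, based on the notes above:

```yaml
env:
  # Reranking disabled: empty server URL and model name,
  # but the model engine value is kept.
  APP_RANKING_SERVERURL: ""
  APP_RANKING_MODELNAME: ""
  APP_RANKING_MODELENGINE: nvidia-ai-endpoints
```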
Install the Helm chart:
$ helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml
Optional: List resources in the namespace:
$ kubectl get all -n rag-sample
Example Output
NAME                                                READY   STATUS    RESTARTS   AGE
pod/chain-server-multi-turn-9759ff9ff-62fdh         1/1     Running   0          99s
pod/rag-playground-multiturn-rag-5cbdc574d6-tgb9l   1/1     Running   0          99s

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/chain-server-multi-turn        ClusterIP   10.105.82.33    <none>        8082/TCP         99s
service/rag-playground-multiturn-rag   NodePort    10.99.241.217   <none>        3001:30621/TCP   99s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/chain-server-multi-turn        1/1     1            1           99s
deployment.apps/rag-playground-multiturn-rag   1/1     1            1           99s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/chain-server-multi-turn-9759ff9ff         1         1         1       99s
replicaset.apps/rag-playground-multiturn-rag-5cbdc574d6   1         1         1       99s
Accessing the RAG Playground
If your cluster is not configured to work with an external load balancer or ingress, you can port-forward the HTTP connection to the sample chat application.
Determine the node port for the sample chat application:
$ kubectl get service -n rag-sample rag-playground-multiturn-rag
In the following sample output, the application is listening on node port 30817.
NAME                           TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h
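If you script this step, the node port can be parsed from the PORT(S) column. The sketch below runs against the sample line above; on a live cluster, `kubectl get service -n rag-sample rag-playground-multiturn-rag -o jsonpath='{.spec.ports[0].nodePort}'` queries the same value directly:

```shell
# Extract the node port from the PORT(S) column of the sample output above.
line='rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h'
node_port=$(printf '%s\n' "$line" | sed -E 's|.*:([0-9]+)/TCP.*|\1|')
echo "$node_port"
```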
Forward the port:
$ kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample 30817:3001
After you forward the port, you can access the application at http://localhost:30817.
Next Steps
You can uninstall the Helm chart by running the following command:

$ helm uninstall -n rag-sample multiturn-rag