Sample RAG Application

Prerequisites

  • NVIDIA NIM microservices are deployed:

    • NVIDIA NIM for LLMs

    • NeMo Retriever Text Embedding NIM

    • NeMo Retriever Text Reranking NIM

  • An active subscription to an NVIDIA AI Enterprise product or membership in the NVIDIA Developer Program. Access to the Helm charts and containers is restricted.

Install a Vector Database

NVIDIA used Milvus in a standalone configuration during development and testing. Milvus provides a GPU-accelerated vector store. The Chain Server application also supports pgvector.

If you do not already have Milvus running, refer to Run Milvus with GPU Support Using Helm Chart in the Milvus documentation.

Tip

The Milvus Helm chart does not specify a storage class for its persistent volume claims (PVCs). If your cluster does not have a default storage class, you can mark one as the default with a command like the following example:

$ kubectl patch storageclass <storage-class-name> \
    -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
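The -p argument is a patch document that kubectl merges into the existing object, so only the annotation you specify changes and the rest of the StorageClass is untouched. The following Python sketch illustrates that merge behavior on a simplified stand-in object (not real cluster state, and list fields are ignored for brevity):

```python
import json

def merge_patch(obj, patch):
    """Recursively merge a patch dict into obj, the way kubectl patch
    combines map fields (simplified; list handling is omitted)."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(obj.get(key), dict):
            merge_patch(obj[key], value)
        else:
            obj[key] = value
    return obj

# Simplified stand-in for an existing StorageClass object.
sc = {"metadata": {"name": "local-path", "annotations": {}}}

# The same JSON document passed to kubectl patch with -p.
patch = json.loads(
    '{"metadata": {"annotations":'
    '{"storageclass.kubernetes.io/is-default-class":"true"}}}'
)

merge_patch(sc, patch)
print(sc["metadata"]["annotations"])
# {'storageclass.kubernetes.io/is-default-class': 'true'}
```

Note that the name field survives the patch: only the keys present in the patch document are written.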

Install the NVIDIA Multi-Turn RAG Application

  1. Create a RAG sample namespace:

    $ kubectl create namespace rag-sample
    
  2. Add a Docker registry secret for downloading the container images from NVIDIA NGC:

    $ kubectl create secret -n rag-sample docker-registry ngc-secret-multi-turn \
        --docker-server=nvcr.io \
        --docker-username='$oauthtoken' \
        --docker-password=<ngc-api-key>
    
  3. Fetch the Helm chart from NGC:

    $ helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/rag-app-multiturn-chatbot-v24.06.tgz \
        --username='$oauthtoken' --password=<ngc-api-key>
    
  4. Save the values from the chart in a file:

    $ helm show values rag-app-multiturn-chatbot-v24.06.tgz > values.yaml
    
  5. Edit the values.yaml file and update the query.env field with the following environment variables:

       env:
         APP_VECTORSTORE_URL: "http://milvus.milvus.svc.cluster.local:19530"
         APP_VECTORSTORE_NAME: "milvus"
         APP_LLM_SERVERURL: "meta-llama3-8b-instruct.nim-service.svc.cluster.local:8000"
         APP_LLM_MODELNAME: meta/llama3-8b-instruct
         APP_LLM_MODELENGINE: nvidia-ai-endpoints
         APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000"
         APP_EMBEDDINGS_MODELNAME: nvidia/nv-embedqa-e5-v5
         APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
         APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.nim-service.svc.cluster.local:8000"
         APP_RANKING_MODELNAME: nvidia/nv-rerankqa-mistral-4b-v3
         APP_RANKING_MODELENGINE: nvidia-ai-endpoints
         COLLECTION_NAME: multi_turn_rag
         APP_RETRIEVER_TOPK: 2
         APP_RETRIEVER_SCORETHRESHOLD: 0.25
         APP_TEXTSPLITTER_CHUNKSIZE: 506
         APP_TEXTSPLITTER_CHUNKOVERLAP: 200
    
    • The reranking microservice is optional. To prevent the Chain Server from attempting to use it, set the APP_RANKING_SERVERURL and APP_RANKING_MODELNAME variables to empty strings (""). Keep APP_RANKING_MODELENGINE: nvidia-ai-endpoints even if you did not deploy a reranking microservice.

    • The APP_VECTORSTORE_URL value assumes Milvus is running in a milvus namespace of the same cluster. Substitute your cluster-specific namespace, or a different address if Milvus is not running in the same cluster.

      If you use pgvector, set APP_VECTORSTORE_NAME: pgvector and specify a connection string such as pgvector.<namespace>:5432.

    • The APP_xxxxx_SERVERURL values are for services running in the nim-service namespace. Substitute your cluster-specific namespace.
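The APP_TEXTSPLITTER_CHUNKSIZE and APP_TEXTSPLITTER_CHUNKOVERLAP variables control how ingested documents are split before embedding: consecutive chunks repeat an overlapping region so text that falls on a chunk boundary is not lost. The Chain Server's actual splitter is separator-aware, so the following character-level sliding window is only a simplified illustration of how the two values interact:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters, where each
    chunk repeats the last chunk_overlap characters of the previous one.
    Assumes chunk_overlap < chunk_size (simplified sliding window)."""
    step = chunk_size - chunk_overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]

# With the values from the values.yaml example above:
chunks = split_text("x" * 1200, chunk_size=506, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])
# 4 [506, 506, 506, 282]
```

Larger overlap values preserve more cross-boundary context at the cost of more chunks, and therefore more embedding calls and vector store entries, per document.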

  6. Install the Helm chart:

    $ helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml
    
  7. Optional: List resources in the namespace:

    $ kubectl get all -n rag-sample
    

    Example Output

    NAME                                                READY   STATUS    RESTARTS   AGE
    pod/chain-server-multi-turn-9759ff9ff-62fdh         1/1     Running   0          99s
    pod/rag-playground-multiturn-rag-5cbdc574d6-tgb9l   1/1     Running   0          99s
    
    NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/chain-server-multi-turn        ClusterIP   10.105.82.33    <none>        8082/TCP         99s
    service/rag-playground-multiturn-rag   NodePort    10.99.241.217   <none>        3001:30621/TCP   99s
    
    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/chain-server-multi-turn        1/1     1            1           99s
    deployment.apps/rag-playground-multiturn-rag   1/1     1            1           99s
    
    NAME                                                      DESIRED   CURRENT   READY   AGE
    replicaset.apps/chain-server-multi-turn-9759ff9ff         1         1         1       99s
    replicaset.apps/rag-playground-multiturn-rag-5cbdc574d6   1         1         1       99s
    

Accessing the RAG Playground

If your cluster is not configured to work with an external load balancer or ingress, you can port-forward the HTTP connection to the sample chat application.

  1. Determine the node port for the sample chat application:

    $ kubectl get service -n rag-sample rag-playground-multiturn-rag
    

    In the following sample output, the application is listening on node port 30817.

    NAME                           TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h
    
  2. Forward the port:

    $ kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample 30817:3001
    

After you forward the port, you can access the application at http://localhost:30817.
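In the PORT(S) column of the service output, the value before the colon is the port inside the cluster (3001) and the value after it is the node port. If you script these steps, the node port can be parsed from that column; a small Python sketch, using the string copied from the sample output above rather than a live query:

```python
# PORT(S) is formatted as <service-port>:<node-port>/<protocol>.
ports = "3001:30817/TCP"  # sample value from `kubectl get service`
service_port, rest = ports.split(":", 1)
node_port = rest.split("/", 1)[0]
print(service_port, node_port)  # 3001 30817
```

Alternatively, kubectl can print the node port directly with a JSONPath query: kubectl get service rag-playground-multiturn-rag -n rag-sample -o jsonpath='{.spec.ports[0].nodePort}'.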

Next Steps

  • You can uninstall the Helm chart by running helm uninstall -n rag-sample multiturn-rag.