RAG Bot
This example chatbot showcases retrieval-augmented generation (RAG). The bot answers questions about the ingested documents by calling the RAG chain server's /generate endpoint through the Plugin server.
The RAG bot showcases the following ACE Agent features:

- Integrating an example from NVIDIA's Generative AI Examples
- Direct connection between the Chat Controller and the Plugin server
- Streaming the JSON response from the Plugin server
- Deployment in either the Chat Engine Server architecture or the Plugin Server architecture
RAG Chain server deployment
Deploy one of the RAG examples by following the instructions in the GenerativeAIExamples repository. A good example to start with is the NVIDIA API Catalog example. You can also deploy the RAG server in Kubernetes using the NVIDIA Enterprise RAG LLM Operator.
Ingest documents as required for your use case by visiting http://<your-ip>:8090/kb.
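Once the chain server is up, you can sanity-check its /generate endpoint directly with curl before wiring it to the bot. The request body below (a messages list plus a use_knowledge_base flag) and the port 8081 are assumptions based on common GenerativeAIExamples chain server setups; adjust both to match the example you actually deployed.

```shell
# Hypothetical smoke test for the RAG chain server's /generate endpoint.
# The port (8081) and the request fields are assumptions -- adjust them
# to match the example you deployed from GenerativeAIExamples.
PAYLOAD='{"messages": [{"role": "user", "content": "What do the ingested documents cover?"}], "use_knowledge_base": true}'

curl --silent --max-time 10 \
  -X POST "http://localhost:8081/generate" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "chain server not reachable yet"
```

If the server is running and documents are ingested, the response should reference the ingested content; a connection error here means the chain server deployment step has not completed.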
Docker-based bot deployment
The RAG sample bot is present in the quickstart directory at ./samples/rag_bot/.
Prepare the environment for the Docker compose commands.
```shell
export BOT_PATH=./samples/rag_bot/
source deploy/docker/docker_init.sh
```
Deploy the Speech models.
```shell
docker compose -f deploy/docker/docker-compose.yml up model-utils-speech
```
Deploy the ACE Agent microservices: the Chat Controller, Chat Engine, Plugin server, and NLP server.

```shell
docker compose -f deploy/docker/docker-compose.yml up speech-bot -d
```
Wait a few minutes for all services to be ready; you can check the Docker logs of the individual microservices to confirm. The Chat Controller container prints the following line in its Docker logs when it is ready:

```
Server listening on 0.0.0.0:50055
```

Try out the bot using a web browser. You can deploy a sample frontend application with voice capture and playback as well as text input and output support using the following command.
```shell
docker compose -f deploy/docker/docker-compose.yml up frontend-speech
```
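Before opening the browser, you can confirm readiness from the command line by grepping the Chat Controller logs for the listening message. This is a sketch: the container name `chat-controller` is an assumption, so confirm the actual name with `docker ps` on your workstation.

```shell
# Look for the readiness line in the Chat Controller logs.
# The container name "chat-controller" is an assumption -- verify the
# actual name with `docker ps` before relying on this check.
docker logs chat-controller 2>&1 \
  | grep -m1 "Server listening on 0.0.0.0:50055" \
  || echo "Chat Controller is not ready yet"
```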
Interact with the bot using the URL http://<workstation IP>:9001/. To access the microphone in the browser, you must either convert the http endpoint to an https endpoint by adding SSL validation, or update chrome://flags/ or edge://flags/ to allow http://<workstation IP>:9001 as a secure endpoint.
You can try asking questions related to the ingested documents.
Deploy using the Plugin Server architecture by connecting the Chat Controller directly to the Plugin server
Update the dialog_manager section of speech_config.yaml to point to the Plugin server instead of the Chat Engine server.

```yaml
dialog_manager:
  DialogManager:
    server: "http://localhost:9002/rag"
    use_streaming: true
```
Launch the Plugin server and the Chat Controller containers.
```shell
export BOT_PATH=./samples/rag_bot/
source deploy/docker/docker_init.sh

# Deploy the Speech models
docker compose -f deploy/docker/docker-compose.yml up model-utils-speech

# Deploy the Plugin server container
docker compose -f deploy/docker/docker-compose.yml up --build plugin-server -d

# Deploy the Chat Controller container
docker compose -f deploy/docker/docker-compose.yml up chat-controller -d

# Deploy the sample frontend application
docker compose -f deploy/docker/docker-compose.yml up frontend-speech -d
```
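After launching the containers, you can verify that the Plugin server, Chat Controller, and frontend services all came up by listing the compose services and their states. This is a generic Docker Compose check, not an ACE Agent-specific command:

```shell
# List the services defined in the compose file and their current state;
# each deployed service should show as running/healthy.
docker compose -f deploy/docker/docker-compose.yml ps
```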