RAG Bot
This example chatbot showcases retrieval-augmented generation (RAG). The bot answers questions about the ingested documents by calling the RAG chain server's /generate endpoint through the Plugin server.
The RAG bot showcases the following ACE Agent features:

- Integrating an example from NVIDIA's Generative AI Examples
- Direct connection between the Chat Controller and the Plugin server
- Streaming the JSON response from the Plugin server
- Deployment in either the Chat Engine Server architecture or the Plugin Server architecture
RAG Chain server deployment
Deploy one of the RAG examples by following the instructions in the GenerativeAIExamples repository. A good example to start with is the NVIDIA API Catalog example. You can also deploy the RAG server in Kubernetes using the NVIDIA Enterprise RAG LLM Operator.
Ingest documents as required for your use case by visiting http://<your-ip>:8090/kb.
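Once the chain server is up, you can sanity-check its /generate endpoint directly with curl before wiring it to the bot. The request body below (a messages list plus a use_knowledge_base flag) and the port 8081 are assumptions based on common GenerativeAIExamples chain server setups; adjust both to match the example you actually deployed.

```shell
# Hypothetical smoke test for the RAG chain server's /generate endpoint.
# The port (8081) and the request fields are assumptions -- adjust them
# to match the example you deployed from GenerativeAIExamples.
PAYLOAD='{"messages": [{"role": "user", "content": "What do the ingested documents cover?"}], "use_knowledge_base": true}'

curl --silent --max-time 10 \
  -X POST "http://localhost:8081/generate" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "chain server not reachable yet"
```

If the server is running and documents are ingested, the response should reference the ingested content; a connection error here means the chain server deployment step has not completed.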
Docker-based bot deployment
The RAG sample bot is present in the quickstart directory at ./samples/rag_bot/.
Prepare the environment for the Docker compose commands.
```shell
export BOT_PATH=./samples/rag_bot/
source deploy/docker/docker_init.sh
```
Deploy the Speech models.
```shell
docker compose -f deploy/docker/docker-compose.yml up model-utils-speech
```
Deploy the ACE Agent microservices: the Chat Controller, Chat Engine, Plugin server, and NLP server.

```shell
docker compose -f deploy/docker/docker-compose.yml up speech-bot -d
```
Wait a few minutes for all services to be ready; you can check the Docker logs of the individual microservices to confirm. The Chat Controller container prints the following line in its Docker logs when it is ready:

```
Server listening on 0.0.0.0:50055
```

Try out the bot using a web browser. You can deploy a sample frontend application with voice capture and playback as well as text input and output support using the following command.
```shell
docker compose -f deploy/docker/docker-compose.yml up frontend-speech
```
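Before opening the browser, you can confirm readiness from the command line by grepping the Chat Controller logs for the listening message. This is a sketch: the container name `chat-controller` is an assumption, so confirm the actual name with `docker ps` on your workstation.

```shell
# Look for the readiness line in the Chat Controller logs.
# The container name "chat-controller" is an assumption -- verify the
# actual name with `docker ps` before relying on this check.
docker logs chat-controller 2>&1 \
  | grep -m1 "Server listening on 0.0.0.0:50055" \
  || echo "Chat Controller is not ready yet"
```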
Interact with the bot using the URL http://<workstation IP>:9001/. To access the microphone in the browser, you must either convert the http endpoint to an https endpoint by adding SSL validation, or update chrome://flags/ or edge://flags/ to allow http://<workstation IP>:9001 as a secure endpoint.
You can try asking questions related to the ingested documents.
Deploy using the Plugin Server architecture by connecting the Chat Controller directly to the Plugin server
Update the dialog_manager section of speech_config.yaml to point to the Plugin server instead of the Chat Engine server.

```yaml
dialog_manager:
  DialogManager:
    server: "http://localhost:9002/rag"
    use_streaming: true
```
Launch the Plugin server and the Chat Controller containers.
```shell
export BOT_PATH=./samples/rag_bot/
source deploy/docker/docker_init.sh

# Deploy the Speech models
docker compose -f deploy/docker/docker-compose.yml up model-utils-speech

# Deploy the Plugin server container
docker compose -f deploy/docker/docker-compose.yml up --build plugin-server -d

# Deploy the Chat Controller container
docker compose -f deploy/docker/docker-compose.yml up chat-controller -d

# Deploy the sample frontend application
docker compose -f deploy/docker/docker-compose.yml up frontend-speech -d
```
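After launching the containers, you can verify that the Plugin server, Chat Controller, and frontend services all came up by listing the compose services and their states. This is a generic Docker Compose check, not an ACE Agent-specific command:

```shell
# List the services defined in the compose file and their current state;
# each deployed service should show as running/healthy.
docker compose -f deploy/docker/docker-compose.yml ps
```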