Deploy NVIDIA RAG Blueprint with Docker (NVIDIA-Hosted Models)#
Use this documentation to deploy the NVIDIA RAG Blueprint on a single node with Docker Compose, using NVIDIA-hosted models for testing and experimentation. For other deployment options, refer to Deployment Options.
Tip
If you want to run the RAG Blueprint with NVIDIA AI Workbench, use Quickstart for NVIDIA AI Workbench.
Note
When using NVIDIA-hosted endpoints, you might encounter rate limiting with larger file ingestions (>10 files). For details, see Troubleshoot.
Prerequisites#
Install Docker Engine. For more information, see Ubuntu.
Install Docker Compose. For more information, see install the Compose plugin.
a. Ensure the Docker Compose plugin version is 2.29.1 or later.
b. After you install the Docker Compose plugin, run the following command to confirm the version.
docker compose version
To pull images required by the blueprint from NGC, you must first authenticate Docker with nvcr.io. Use the NGC API Key you created in the first step.
export NGC_API_KEY="nvapi-..."
echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
Some containers are enabled with GPU acceleration, such as Milvus and NVIDIA NIMs deployed on-prem. To configure Docker for GPU-accelerated containers, install the NVIDIA Container Toolkit.
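The version requirement above can also be checked with a short script. The following is a minimal sketch, assuming a POSIX-compatible shell and a sort that supports version ordering (-V); the function names version_ge and check_compose_version are illustrative and not part of the blueprint.

```shell
# Returns 0 if version $1 is greater than or equal to version $2.
# Relies on sort -V, which orders dotted version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compares the installed Compose plugin version with the 2.29.1 minimum.
# Returns 2 if the version cannot be read (for example, Docker is missing).
check_compose_version() {
  required="2.29.1"
  installed="$(docker compose version --short 2>/dev/null)" || return 2
  installed="${installed#v}"   # drop a leading "v" if one is present
  if version_ge "$installed" "$required"; then
    echo "Docker Compose ${installed} meets the ${required} minimum."
  else
    echo "Docker Compose ${installed} is older than ${required}." >&2
    return 1
  fi
}
```

After defining the functions, run check_compose_version on the machine where Docker is installed.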
Start services using NVIDIA-hosted models#
Use the following procedure to start all containers needed for this blueprint.
Open deploy/compose/.env and uncomment the section Endpoints for using cloud NIMs. Then set the environment variables by running the following code.
source deploy/compose/.env
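Before starting any containers, it can help to confirm that the variables you expect are actually exported to the environment. The following is a minimal sketch; require_env is a hypothetical helper, and NGC_API_KEY is the only variable name taken from this page. Substitute whichever names your .env file defines.

```shell
# Fails with a message if any named variable is unset, empty, or not exported.
# printenv only reports exported variables, which is what child processes see.
require_env() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing or unexported variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_env NGC_API_KEY returns a non-zero status if the key was never exported in the current shell.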
Start the vector db containers from the repo root.
docker compose -f deploy/compose/vectordb.yaml up -d
Start the ingestion containers from the repo root. This pulls the prebuilt containers from NGC and deploys them on your system.
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
You can check the status of the ingestor-server by running the following code. Replace workstation_ip with the IP address or hostname of the machine that runs the containers.
curl -X 'GET' 'http://workstation_ip:8082/v1/health?check_dependencies=true' -H 'accept: application/json'
You should see output similar to the following.
{
  "message": "Service is up.",
  "databases": [ ... ],
  "object_storage": [ ... ],
  "nim": [
    { "service": "Embeddings", "status": "healthy", "message": "Using NVIDIA API Catalog", ... },
    { "service": "Summary LLM", "status": "healthy", "message": "Using NVIDIA API Catalog", ... },
    { "service": "Caption Model", "status": "healthy", "message": "Using NVIDIA API Catalog", ... }
  ],
  "processing": [
    { "service": "NV-Ingest", "status": "healthy", ... }
  ],
  "task_management": [
    { "service": "Redis", "status": "healthy", ... }
  ]
}
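Containers can take some time to become ready after docker compose returns, so a single health request may fail at first. The following is a minimal sketch of a retry loop around the same curl request; wait_for_health is a hypothetical helper, and the default retry count and delay are arbitrary assumptions. It only checks the top-level "Service is up." message and does not inspect the status of each dependency.

```shell
# Polls a health URL until the top-level message reports "Service is up.".
# Usage: wait_for_health URL [RETRIES] [DELAY_SECONDS]
wait_for_health() {
  url="$1"
  retries="${2:-30}"   # assumed default, adjust as needed
  delay="${3:-5}"      # assumed default, in seconds
  i=1
  while [ "$i" -le "$retries" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps errors.
    if curl -fsS -H 'accept: application/json' "$url" 2>/dev/null \
        | grep -q '"message": *"Service is up\."'; then
      echo "healthy: ${url}"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for ${url}" >&2
  return 1
}
```

For example, wait_for_health 'http://workstation_ip:8082/v1/health?check_dependencies=true' polls the ingestor-server endpoint shown above, with workstation_ip replaced by your host.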
Start the rag containers from the repo root. This pulls the prebuilt containers from NGC and deploys them on your system.
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
You can check the status of the rag-server and its dependencies by running the following code. Replace workstation_ip with the IP address or hostname of the machine that runs the containers.
curl -X 'GET' 'http://workstation_ip:8081/v1/health?check_dependencies=true' -H 'accept: application/json'
You should see output similar to the following.
{
  "message": "Service is up.",
  "databases": [ ... ],
  "object_storage": [ ... ],
  "nim": [
    { "service": "LLM", "status": "healthy", "message": "Using NVIDIA API Catalog", ... },
    { "service": "Embeddings", "status": "healthy", "message": "Using NVIDIA API Catalog", ... },
    { "service": "Ranking", "status": "healthy", "message": "Using NVIDIA API Catalog", ... }
  ]
}
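The health responses can be long, so a compact per-service summary is sometimes easier to scan. The following is a rough sketch that uses only grep and sed; summarize_health is a hypothetical helper, and it assumes each "service" key is immediately followed by its "status" key, as in the sample output above. A real JSON parser is more robust if the response layout differs.

```shell
# Reads a health response on stdin and prints one "service: status" line
# for each service entry it finds.
summarize_health() {
  tr '\n' ' ' \
    | grep -o '"service": *"[^"]*", *"status": *"[^"]*"' \
    | sed 's/"service": *"\([^"]*\)", *"status": *"\([^"]*\)"/\1: \2/'
}
```

For example, pipe the output of the curl health command into summarize_health.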
Check the status of the deployment by running the following code.
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
You should see output similar to the following. The container ID column is omitted here. Confirm that all of the following containers are running.
NAMES                            STATUS
compose-nv-ingest-ms-runtime-1   Up 5 minutes (healthy)
ingestor-server                  Up 5 minutes
compose-redis-1                  Up 5 minutes
rag-frontend                     Up 9 minutes
rag-server                       Up 9 minutes
milvus-standalone                Up 36 minutes
milvus-minio                     Up 35 minutes (healthy)
milvus-etcd                      Up 35 minutes (healthy)
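Comparing the container list by eye is error-prone, so the check can be scripted. The following is a minimal sketch that reports which of the containers listed above are not running; missing_containers and EXPECTED_CONTAINERS are hypothetical names, and the list assumes the container names in your deployment match the sample output.

```shell
# Container names taken from the sample output on this page.
EXPECTED_CONTAINERS="compose-nv-ingest-ms-runtime-1 ingestor-server compose-redis-1 rag-frontend rag-server milvus-standalone milvus-minio milvus-etcd"

# Reads running container names on stdin (one per line) and prints every
# expected name that is absent. Prints nothing when all are present.
missing_containers() {
  running="$(cat)"
  for name in $EXPECTED_CONTAINERS; do
    # -x matches whole lines only; -F treats the name as a literal string.
    printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
  done
}
```

For example, docker ps --format '{{.Names}}' | missing_containers prints nothing when every expected container is up.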
Experiment with the Web User Interface#
After the RAG Blueprint is deployed, you can use the RAG UI to start experimenting with it.
Open a web browser and access the RAG UI. You can start experimenting by uploading docs and asking questions. For details, see User Interface for NVIDIA RAG Blueprint.
Experiment with the Ingestion API Usage Notebook#
After the RAG Blueprint is deployed, you can use the Ingestion API Usage notebook to start experimenting with it. For details, refer to Experiment with the Ingestion API Usage Notebook.
Shut down services#
To stop all running services, run the following code.
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml down
docker compose -f deploy/compose/docker-compose-rag-server.yaml down
docker compose -f deploy/compose/vectordb.yaml down
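If you stop and restart the stack often, the same shutdown commands can be wrapped in a loop. The following is a minimal sketch; stop_all, COMPOSE_FILES, and the DRY_RUN switch are hypothetical names introduced for illustration. Setting DRY_RUN to any non-empty value prints the commands instead of running them.

```shell
# Compose files from this page, in the same order as the shutdown commands.
COMPOSE_FILES="deploy/compose/docker-compose-ingestor-server.yaml deploy/compose/docker-compose-rag-server.yaml deploy/compose/vectordb.yaml"

# Runs "docker compose down" for each file. Run from the repo root.
stop_all() {
  for f in $COMPOSE_FILES; do
    # When DRY_RUN is non-empty, this expands to "echo docker compose ...".
    ${DRY_RUN:+echo} docker compose -f "$f" down
  done
}
```

For example, DRY_RUN=1 stop_all lists the three commands without stopping anything.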
Advanced Deployment Considerations#
After you successfully deploy the RAG Blueprint for the first time, consider the following advanced deployment options:
For information about advanced settings, see Best Practices for Common Settings.
To turn on recommended configurations for accuracy optimization, run the following code:
source deploy/compose/accuracy_profile.env
To turn on recommended configurations for performance optimization, run the following code:
source deploy/compose/perf_profile.env
If you don’t have a GPU available, you can switch to CPU-only Milvus by following the instructions in milvus-configuration.md.
If you need to build the NVIDIA Ingest runtime container from source, follow the instructions here.