Milvus Configuration for NVIDIA RAG Blueprint#
You can configure how Milvus works with your NVIDIA RAG Blueprint.
GPU to CPU Mode Switch#
Milvus uses GPU acceleration by default for vector operations. Switch to CPU mode if you encounter either of the following:
GPU memory constraints
Development environments without GPU support
Docker Compose#
Configuration Steps#
1. Update Docker Compose Configuration (vectordb.yaml)#
First, you need to modify the deploy/compose/vectordb.yaml file to disable GPU usage:
Step 1: Comment Out GPU Reservations
Comment out the entire deploy section that reserves GPU resources:
# deploy:
#   resources:
#     reservations:
#       devices:
#         - driver: nvidia
#           capabilities: ["gpu"]
#           # count: ${INFERENCE_GPU_COUNT:-all}
#           device_ids: ['${VECTORSTORE_GPU_DEVICE_ID:-0}']
Step 2: Change the Milvus Docker Image
# Change this line:
image: milvusdb/milvus:v2.6.2-gpu # milvusdb/milvus:v2.6.2 for CPU
# To this:
image: milvusdb/milvus:v2.6.2 # milvusdb/milvus:v2.6.2-gpu for GPU
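If you prefer to script this edit, a helper like the following (hypothetical, not part of the blueprint) swaps the image tag in place; review the resulting file before starting services.

```shell
# Hypothetical helper: switch the Milvus image in a compose file from the
# GPU variant to the CPU variant. Keeps a .bak copy of the original file.
use_cpu_image() {
  file="${1:-deploy/compose/vectordb.yaml}"
  sed -i.bak 's|milvusdb/milvus:v2.6.2-gpu|milvusdb/milvus:v2.6.2|' "$file"
}
```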
2. Set Environment Variables#
Before starting any services, you must set these environment variables in your terminal. These variables tell the ingestor server to use CPU mode:
# Set these environment variables BEFORE starting the ingestor server
export APP_VECTORSTORE_ENABLEGPUSEARCH=False
export APP_VECTORSTORE_ENABLEGPUINDEX=False
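As a pre-flight check (a hypothetical convenience, not part of the blueprint), you can verify that both variables are set before bringing up the ingestor server:

```shell
# Hypothetical pre-flight check: fail fast if either CPU-mode variable is
# unset, or set to something other than "False", before starting the
# ingestor server.
check_cpu_mode() {
  for v in APP_VECTORSTORE_ENABLEGPUSEARCH APP_VECTORSTORE_ENABLEGPUINDEX; do
    if [ "$(printenv "$v")" != "False" ]; then
      echo "ERROR: $v must be set to False for CPU mode" >&2
      return 1
    fi
  done
  echo "CPU mode variables OK"
}
```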
3. Restart Services#
After making the configuration changes and setting environment variables, restart the services:
# 1. Stop existing services
docker compose -f deploy/compose/vectordb.yaml down
# 2. Start Milvus and dependencies
docker compose -f deploy/compose/vectordb.yaml up -d
# 3. Now start the ingestor server
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
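After restarting, you can wait for Milvus to report healthy before sending traffic. The snippet below is a sketch; it assumes the standalone container publishes its health endpoint on localhost port 9091 (the Milvus default), which may differ in your deployment.

```shell
# Sketch: poll the Milvus health endpoint until it responds, giving up
# after roughly 60 seconds. Pass a different URL if your deployment maps
# the port elsewhere.
wait_for_milvus() {
  url="${1:-http://localhost:9091/healthz}"
  for _ in $(seq 1 30); do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "Milvus is healthy"
      return 0
    fi
    sleep 2
  done
  echo "Milvus did not become healthy in time" >&2
  return 1
}
```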
Switching Milvus to CPU Mode using Helm#
To configure Milvus to run in CPU mode when deploying with Helm:
Disable GPU search and indexing by editing values.yaml.
A. In the `envVars` and `ingestor-server.envVars` sections, set the following environment variables:

```yaml
envVars:
  APP_VECTORSTORE_ENABLEGPUSEARCH: "False"

ingestor-server:
  envVars:
    APP_VECTORSTORE_ENABLEGPUSEARCH: "False"
    APP_VECTORSTORE_ENABLEGPUINDEX: "False"
```

B. Also, change the image under `milvus.image.all` to remove the `-gpu` tag:

```yaml
milvus:
  image:
    all:
      repository: milvusdb/milvus
      tag: v2.5.17 # instead of v2.5.17-gpu
```

C. (Optional) Remove or set GPU resource requests/limits to zero in the `milvus.standalone.resources` block:

```yaml
milvus:
  standalone:
    resources:
      limits:
        nvidia.com/gpu: 0
```

After you modify values.yaml, apply the changes as described in Change a Deployment.
GPU Indexing with CPU Search#
This mode uses the GPU to build indexes during ingestion while serving search on the CPU. It is useful when you want fast index construction but prefer CPU-based query serving for cost, capacity, or scheduling reasons.
For general GPU↔CPU switching instructions, see the GPU to CPU Mode Switch section above.
Environment Variables#
Set the following before starting the ingestor server:
export APP_VECTORSTORE_ENABLEGPUSEARCH=False
export APP_VECTORSTORE_ENABLEGPUINDEX=True
With APP_VECTORSTORE_ENABLEGPUSEARCH=False, the client automatically sets adapt_for_cpu=true, which instructs Milvus to build the index on the GPU while serving search on the CPU. When adapt_for_cpu is true, every search request must include the ef parameter.
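For illustration, a client's search-parameter payload in this mode looks like the following. The helper name is hypothetical, but the params/ef shape matches what Milvus expects when adapt_for_cpu is in effect.

```shell
# Illustrative only: build the JSON search parameters for a collection
# served with adapt_for_cpu -- note the required "ef" field.
build_search_params() {
  ef="${1:-64}"
  printf '{"metric_type": "L2", "params": {"ef": %s}}' "$ef"
}
```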
Docker Compose notes#
Keep Milvus running with a GPU-capable image if you want GPU index-building (for example, `milvusdb/milvus:v2.6.2-gpu`).
Set the environment variables above before starting the ingestor server.
For inference (search and generate) in `rag-server`, you can use either the GPU or CPU Docker image. Search runs on CPU for a Milvus collection built with GPU indexing when `APP_VECTORSTORE_ENABLEGPUSEARCH=False`.
Example sequence:
# Start/ensure Milvus is up (GPU image if you want GPU indexing)
docker compose -f deploy/compose/vectordb.yaml up -d
# Set env vars and start the ingestor (GPU indexing + CPU search)
export APP_VECTORSTORE_ENABLEGPUSEARCH=False
export APP_VECTORSTORE_ENABLEGPUINDEX=True
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
# Start rag-server (either Milvus CPU or GPU image is fine)
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Helm notes#
Set the environment variables in values.yaml:
envVars:
  APP_VECTORSTORE_ENABLEGPUSEARCH: "False"

ingestor-server:
  envVars:
    APP_VECTORSTORE_ENABLEGPUSEARCH: "False"
    APP_VECTORSTORE_ENABLEGPUINDEX: "True"
If you require GPU index-building, ensure the Milvus image variant supports GPU (for example, keep a -gpu tag where applicable). rag-server can be deployed with either CPU or GPU images for inference; search will be served on CPU for collections indexed with GPU when APP_VECTORSTORE_ENABLEGPUSEARCH is set to False.
Note
When adapt_for_cpu is in effect, your search requests must supply an ef parameter.
(Optional) Customize the Milvus Endpoint#
To use a custom Milvus endpoint, use the following procedure.
Update the `APP_VECTORSTORE_URL` and `MINIO_ENDPOINT` variables in both the RAG server and the ingestor server sections in values.yaml. Your changes should look similar to the following.

env:
  # ... existing code ...
  APP_VECTORSTORE_URL: "http://your-custom-milvus-endpoint:19530"
  MINIO_ENDPOINT: "http://your-custom-minio-endpoint:9000"
  # ... existing code ...

ingestor-server:
  env:
    # ... existing code ...
    APP_VECTORSTORE_URL: "http://your-custom-milvus-endpoint:19530"
    MINIO_ENDPOINT: "http://your-custom-minio-endpoint:9000"
    # ... existing code ...

nv-ingest:
  envVars:
    # ... existing code ...
    MINIO_INTERNAL_ADDRESS: "http://your-custom-minio-endpoint:9000"
    # ... existing code ...
Disable the Milvus deployment. Set `milvusDeployed: false` in the `nv-ingest` section to prevent deploying the default Milvus instance. Your changes should look like the following.

nv-ingest:
  # ... existing code ...
  milvusDeployed: false
  # ... existing code ...
Redeploy the Helm chart by running the following code.
helm upgrade rag https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev-dev.tgz -f nvidia-blueprint-rag/values.yaml -n rag
Milvus Authentication#
Enable authentication for Milvus to secure your vector database.
Docker Compose#
1. Configure Milvus Authentication#
Extract the default Milvus configuration:
docker cp milvus-standalone:/milvus/configs/milvus.yaml ./deploy/compose/
Edit deploy/compose/milvus.yaml to enable authentication:
security:
  authorizationEnabled: true
  defaultRootPassword: "your-secure-password"
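A quick sanity check (hypothetical, not part of the blueprint) can confirm the edit before you mount the file into the container:

```shell
# Hypothetical check: confirm authorization is enabled in the edited
# Milvus config file.
check_auth_enabled() {
  file="${1:-deploy/compose/milvus.yaml}"
  if grep -q 'authorizationEnabled: true' "$file"; then
    echo "authorization enabled"
  else
    echo "authorization NOT enabled in $file" >&2
    return 1
  fi
}
```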
Mount the configuration file in deploy/compose/vectordb.yaml by uncommenting the volume mount:
volumes:
  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
  - ${MILVUS_CONFIG_FILE:-./milvus.yaml}:/milvus/configs/milvus.yaml
2. Start Services#
Start Milvus with authentication:
docker compose -f deploy/compose/vectordb.yaml up -d
Set authentication credentials and start RAG services:
export APP_VECTORSTORE_USERNAME="root"
export APP_VECTORSTORE_PASSWORD="your-secure-password"
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Helm Chart#
1. Configure Milvus Authentication in Helm:#
Edit deploy/helm/nvidia-blueprint-rag/files/milvus.yaml to enable authentication:
security:
  authorizationEnabled: true
  defaultRootPassword: "your-secure-password"
Create a ConfigMap from the milvus.yaml file:
kubectl create configmap milvus-config --from-file=milvus.yaml=deploy/helm/nvidia-blueprint-rag/files/milvus.yaml
Configure Volume Mounting
The values.yaml file includes the necessary volume configuration:
milvus:
  standalone:
    extraVolumes:
      - name: milvus-config
        configMap:
          name: milvus-config
    extraVolumeMounts:
      - name: milvus-config
        mountPath: /milvus/configs/milvus.yaml
        subPath: milvus.yaml
2. Configure username and password in deploy/helm/nvidia-blueprint-rag/values.yaml:#
rag-server:
  envVars:
    APP_VECTORSTORE_USERNAME: "root"
    APP_VECTORSTORE_PASSWORD: "your-secure-password"

ingestor-server:
  envVars:
    APP_VECTORSTORE_USERNAME: "root"
    APP_VECTORSTORE_PASSWORD: "your-secure-password"
3. Deploy with Helm:#
helm upgrade --install rag -n rag https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev-dev-rc2.tgz \
--username '$oauthtoken' \
--password "${NGC_API_KEY}" \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
-f deploy/helm/nvidia-blueprint-rag/values.yaml
For detailed Helm deployment instructions, see Helm Deployment Guide.
Troubleshooting#
GPU_CAGRA Error#
If you encounter GPU_CAGRA errors that cannot be resolved by switching to CPU mode, try the following:
Stop all running services:
docker compose -f deploy/compose/vectordb.yaml down
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml down
Delete the Milvus volumes directory:
rm -rf deploy/compose/volumes
Restart the services:
docker compose -f deploy/compose/vectordb.yaml up -d
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
Note
This will delete all existing vector data, so ensure you have backups if needed.
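If you want a safety net before wiping the data, a helper like the following (hypothetical, not part of the blueprint) archives the volumes directory first so it can be restored later:

```shell
# Hypothetical helper: archive the Milvus volumes directory before it is
# deleted, so the vector data can be restored if needed.
backup_volumes() {
  src="${1:-deploy/compose/volumes}"
  if [ ! -d "$src" ]; then
    echo "no volumes directory at $src" >&2
    return 1
  fi
  archive="milvus-volumes-$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  echo "$archive"
}
```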