Milvus Configuration for NVIDIA RAG Blueprint#
You can configure how Milvus works with your NVIDIA RAG Blueprint.
GPU to CPU Mode Switch#
Milvus uses GPU acceleration by default for vector operations. Switch to CPU mode if you encounter either of the following:
GPU memory constraints
Development environments without GPU support
Docker Compose#
Configuration Steps#
1. Update Docker Compose Configuration (vectordb.yaml)#
First, you need to modify the deploy/compose/vectordb.yaml file to disable GPU usage:
Step 1: Comment Out GPU Reservations
Comment out the entire deploy section that reserves GPU resources:
# deploy:
#   resources:
#     reservations:
#       devices:
#         - driver: nvidia
#           capabilities: ["gpu"]
#           # count: ${INFERENCE_GPU_COUNT:-all}
#           device_ids: ['${VECTORSTORE_GPU_DEVICE_ID:-0}']
Step 2: Change the Milvus Docker Image
# Change this line:
image: milvusdb/milvus:v2.6.2-gpu # milvusdb/milvus:v2.6.2 for CPU
# To this:
image: milvusdb/milvus:v2.6.2 # milvusdb/milvus:v2.6.2-gpu for GPU
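If you prefer to script this edit, a helper like the following (hypothetical, not part of the blueprint) swaps the image tag in place; review the resulting file before starting services.

```shell
# Hypothetical helper: switch the Milvus image in a compose file from the
# GPU variant to the CPU variant. Keeps a .bak copy of the original file.
use_cpu_image() {
  file="${1:-deploy/compose/vectordb.yaml}"
  sed -i.bak 's|milvusdb/milvus:v2.6.2-gpu|milvusdb/milvus:v2.6.2|' "$file"
}
```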
2. Set Environment Variables#
Before starting any services, you must set these environment variables in your terminal. These variables tell the ingestor server to use CPU mode:
# Set these environment variables BEFORE starting the ingestor server
export APP_VECTORSTORE_ENABLEGPUSEARCH=False
export APP_VECTORSTORE_ENABLEGPUINDEX=False
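As a pre-flight check (a hypothetical convenience, not part of the blueprint), you can verify that both variables are set before bringing up the ingestor server:

```shell
# Hypothetical pre-flight check: fail fast if either CPU-mode variable is
# unset, or set to something other than "False", before starting the
# ingestor server.
check_cpu_mode() {
  for v in APP_VECTORSTORE_ENABLEGPUSEARCH APP_VECTORSTORE_ENABLEGPUINDEX; do
    if [ "$(printenv "$v")" != "False" ]; then
      echo "ERROR: $v must be set to False for CPU mode" >&2
      return 1
    fi
  done
  echo "CPU mode variables OK"
}
```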
3. Restart Services#
After making the configuration changes and setting environment variables, restart the services:
# 1. Stop existing services
docker compose -f deploy/compose/vectordb.yaml down
# 2. Start Milvus and dependencies
docker compose -f deploy/compose/vectordb.yaml up -d
# 3. Now start the ingestor server
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
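After restarting, you can wait for Milvus to report healthy before sending traffic. The snippet below is a sketch; it assumes the standalone container publishes its health endpoint on localhost port 9091 (the Milvus default), which may differ in your deployment.

```shell
# Sketch: poll the Milvus health endpoint until it responds, giving up
# after roughly 60 seconds. Pass a different URL if your deployment maps
# the port elsewhere.
wait_for_milvus() {
  url="${1:-http://localhost:9091/healthz}"
  for _ in $(seq 1 30); do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "Milvus is healthy"
      return 0
    fi
    sleep 2
  done
  echo "Milvus did not become healthy in time" >&2
  return 1
}
```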
Switching Milvus to CPU Mode using Helm#
To configure Milvus to run in CPU mode when deploying with Helm:
Disable GPU search and indexing by editing values.yaml.
A. In the `envVars` and `ingestor-server.envVars` sections, set the following environment variables:

```yaml
envVars:
  APP_VECTORSTORE_ENABLEGPUSEARCH: "False"

ingestor-server:
  envVars:
    APP_VECTORSTORE_ENABLEGPUSEARCH: "False"
    APP_VECTORSTORE_ENABLEGPUINDEX: "False"
```

B. Also, change the image under `milvus.image.all` to remove the `-gpu` tag:

```yaml
milvus:
  image:
    all:
      repository: milvusdb/milvus
      tag: v2.5.17 # instead of v2.5.17-gpu
```

C. (Optional) Remove or set GPU resource requests/limits to zero in the `milvus.standalone.resources` block:

```yaml
milvus:
  standalone:
    resources:
      limits:
        nvidia.com/gpu: 0
```

After you modify values.yaml, apply the changes as described in Change a Deployment.
GPU Indexing with CPU Search#
This mode uses the GPU to build indexes during ingestion while serving search on the CPU. It is useful when you want fast index construction but prefer CPU-based query serving for cost, capacity, or scheduling reasons.
For general GPU↔CPU switching instructions, see the GPU to CPU Mode Switch section above.
Environment Variables#
Set the following before starting the ingestor server:
export APP_VECTORSTORE_ENABLEGPUSEARCH=False
export APP_VECTORSTORE_ENABLEGPUINDEX=True
With APP_VECTORSTORE_ENABLEGPUSEARCH=False, the client automatically sets adapt_for_cpu=true, which instructs Milvus to build the index on the GPU while serving search on the CPU. When adapt_for_cpu is true, every search request must include the ef parameter.
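For illustration, a client's search-parameter payload in this mode looks like the following. The helper name is hypothetical, but the params/ef shape matches what Milvus expects when adapt_for_cpu is in effect.

```shell
# Illustrative only: build the JSON search parameters for a collection
# served with adapt_for_cpu -- note the required "ef" field.
build_search_params() {
  ef="${1:-64}"
  printf '{"metric_type": "L2", "params": {"ef": %s}}' "$ef"
}
```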
Docker Compose notes#
Keep Milvus running with a GPU-capable image if you want GPU index-building (for example, `milvusdb/milvus:v2.6.2-gpu`).
Set the environment variables above before starting the ingestor server.
For inference (search and generate) in `rag-server`, you can use either the GPU or CPU Docker image. Search runs on CPU for a Milvus collection built with GPU indexing when `APP_VECTORSTORE_ENABLEGPUSEARCH=False`.
Example sequence:
# Start/ensure Milvus is up (GPU image if you want GPU indexing)
docker compose -f deploy/compose/vectordb.yaml up -d
# Set env vars and start the ingestor (GPU indexing + CPU search)
export APP_VECTORSTORE_ENABLEGPUSEARCH=False
export APP_VECTORSTORE_ENABLEGPUINDEX=True
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
# Start rag-server (either Milvus CPU or GPU image is fine)
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Helm notes#
Set the environment variables in values.yaml:
envVars:
  APP_VECTORSTORE_ENABLEGPUSEARCH: "False"

ingestor-server:
  envVars:
    APP_VECTORSTORE_ENABLEGPUSEARCH: "False"
    APP_VECTORSTORE_ENABLEGPUINDEX: "True"
If you require GPU index-building, ensure the Milvus image variant supports GPU (for example, keep a -gpu tag where applicable). rag-server can be deployed with either CPU or GPU images for inference; search will be served on CPU for collections indexed with GPU when APP_VECTORSTORE_ENABLEGPUSEARCH is set to False.
Note
When adapt_for_cpu is in effect, your search requests must supply an ef parameter.
(Optional) Customize the Milvus Endpoint#
To use a custom Milvus endpoint, use the following procedure.
Update the `APP_VECTORSTORE_URL` and `MINIO_ENDPOINT` variables in both the RAG server and the ingestor server sections in values.yaml. Your changes should look similar to the following.

env:
  # ... existing code ...
  APP_VECTORSTORE_URL: "http://your-custom-milvus-endpoint:19530"
  MINIO_ENDPOINT: "http://your-custom-minio-endpoint:9000"
  # ... existing code ...

ingestor-server:
  env:
    # ... existing code ...
    APP_VECTORSTORE_URL: "http://your-custom-milvus-endpoint:19530"
    MINIO_ENDPOINT: "http://your-custom-minio-endpoint:9000"
    # ... existing code ...

nv-ingest:
  envVars:
    # ... existing code ...
    MINIO_INTERNAL_ADDRESS: "http://your-custom-minio-endpoint:9000"
    # ... existing code ...
Disable the Milvus deployment. Set `milvusDeployed: false` in the `nv-ingest` section to prevent deploying the default Milvus instance. Your changes should look like the following.

nv-ingest:
  # ... existing code ...
  milvusDeployed: false
  # ... existing code ...
Redeploy the Helm chart by running the following code.
helm upgrade rag https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev-dev.tgz -f nvidia-blueprint-rag/values.yaml -n rag
Milvus Authentication#
Enable authentication for Milvus to secure your vector database.
Docker Compose#
1. Configure Milvus Authentication#
Extract the default Milvus configuration:
docker cp milvus-standalone:/milvus/configs/milvus.yaml ./deploy/compose/
Edit deploy/compose/milvus.yaml to enable authentication:
security:
  authorizationEnabled: true
  defaultRootPassword: "your-secure-password"
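A quick sanity check (hypothetical, not part of the blueprint) can confirm the edit before you mount the file into the container:

```shell
# Hypothetical check: confirm authorization is enabled in the edited
# Milvus config file.
check_auth_enabled() {
  file="${1:-deploy/compose/milvus.yaml}"
  if grep -q 'authorizationEnabled: true' "$file"; then
    echo "authorization enabled"
  else
    echo "authorization NOT enabled in $file" >&2
    return 1
  fi
}
```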
Mount the configuration file in deploy/compose/vectordb.yaml by uncommenting the volume mount:
volumes:
  - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
  - ${MILVUS_CONFIG_FILE:-./milvus.yaml}:/milvus/configs/milvus.yaml
2. Start Services#
Start Milvus with authentication:
docker compose -f deploy/compose/vectordb.yaml up -d
Set authentication credentials and start RAG services:
export APP_VECTORSTORE_USERNAME="root"
export APP_VECTORSTORE_PASSWORD="your-secure-password"
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Helm Chart#
1. Configure Milvus Authentication in Helm:#
Edit deploy/helm/nvidia-blueprint-rag/files/milvus.yaml to enable authentication:
security:
  authorizationEnabled: true
  defaultRootPassword: "your-secure-password"
Create a ConfigMap from the milvus.yaml file:
kubectl create configmap milvus-config --from-file=milvus.yaml=deploy/helm/nvidia-blueprint-rag/files/milvus.yaml
Configure Volume Mounting
The values.yaml file includes the necessary volume configuration:
milvus:
  standalone:
    extraVolumes:
      - name: milvus-config
        configMap:
          name: milvus-config
    extraVolumeMounts:
      - name: milvus-config
        mountPath: /milvus/configs/milvus.yaml
        subPath: milvus.yaml
2. Configure username and password in deploy/helm/nvidia-blueprint-rag/values.yaml:#
rag-server:
  envVars:
    APP_VECTORSTORE_USERNAME: "root"
    APP_VECTORSTORE_PASSWORD: "your-secure-password"

ingestor-server:
  envVars:
    APP_VECTORSTORE_USERNAME: "root"
    APP_VECTORSTORE_PASSWORD: "your-secure-password"
3. Deploy with Helm:#
helm upgrade --install rag -n rag https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev-dev-rc2.tgz \
--username '$oauthtoken' \
--password "${NGC_API_KEY}" \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
-f deploy/helm/nvidia-blueprint-rag/values.yaml
For detailed Helm deployment instructions, see Helm Deployment Guide.
Troubleshooting#
GPU_CAGRA Error#
If you encounter GPU_CAGRA errors that cannot be resolved by switching to CPU mode, try the following:
Stop all running services:
docker compose -f deploy/compose/vectordb.yaml down
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml down
Delete the Milvus volumes directory:
rm -rf deploy/compose/volumes
Restart the services:
docker compose -f deploy/compose/vectordb.yaml up -d
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
Note
This will delete all existing vector data, so ensure you have backups if needed.
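If you want a safety net before wiping the data, a helper like the following (hypothetical, not part of the blueprint) archives the volumes directory first so it can be restored later:

```shell
# Hypothetical helper: archive the Milvus volumes directory before it is
# deleted, so the vector data can be restored if needed.
backup_volumes() {
  src="${1:-deploy/compose/volumes}"
  if [ ! -d "$src" ]; then
    echo "no volumes directory at $src" >&2
    return 1
  fi
  archive="milvus-volumes-$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  echo "$archive"
}
```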