Vector Database Configuration for NVIDIA RAG Blueprint#
NVIDIA RAG Blueprint is compatible with several vector database backends, including Elasticsearch and Milvus. Elasticsearch is the default option. Standard deployments automatically use Elasticsearch—no manual configuration is required. In both the RAG and Ingestor servers, the defaults are set to APP_VECTORSTORE_NAME=elasticsearch and APP_VECTORSTORE_URL=http://elasticsearch:9200 in Docker Compose, or to the bundled ECK Elasticsearch HTTP service when using Helm.
Milvus is available as an optional secondary backend if you prefer to use that stack.
After you’ve deployed the blueprint, use this page to configure Elasticsearch settings (including authentication and index options), work with the default Elasticsearch setup, switch to Milvus, or connect a custom vector database.
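The default resolution described above can be sketched in a few lines of Python. This is only an illustration, not the blueprint's configuration code: the helper name resolve_vectorstore_settings is hypothetical, and the fallback values simply mirror the Docker Compose defaults quoted on this page.

```python
import os

# Docker Compose defaults quoted on this page.
DEFAULT_NAME = "elasticsearch"
DEFAULT_URL = "http://elasticsearch:9200"


def resolve_vectorstore_settings(env=None):
    """Return (name, url), falling back to the documented defaults.

    Hypothetical helper for illustration; the blueprint reads these
    variables through its own configuration module.
    """
    env = os.environ if env is None else env
    # Treat unset or empty values as "use the default", like ${VAR:-default}.
    name = env.get("APP_VECTORSTORE_NAME") or DEFAULT_NAME
    url = env.get("APP_VECTORSTORE_URL") or DEFAULT_URL
    return name, url
```

Passing a plain dictionary instead of the process environment makes it easy to check what a given set of exports would resolve to before relaunching the services.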
Configuring Elasticsearch#
Use this section as a map to the topics below.
Elasticsearch client (library installs) – For local development without the pre-built Docker images, enable Elasticsearch support by installing nvidia_rag[elasticsearch] using pip install nvidia_rag[elasticsearch] or uv sync --extra elasticsearch. The Docker images already include this dependency.
Changing the backend – If you switch between vector databases (for example Elasticsearch and Milvus), you must re-upload your documents; data is not migrated automatically.
Port – Elasticsearch listens on port 9200 by default. Ensure it is available or adjust your configuration.
Elasticsearch data volume (Docker Compose) – Elasticsearch persists data in a dedicated rag-vol-elasticsearch Docker named volume (host path: /var/lib/docker/volumes/rag-vol-elasticsearch/_data/). For inspection, backup, reset, and migration from the legacy deploy/compose/volumes/ host directory, see Manage Persistent Data Volumes in the troubleshooting guide.
Authentication – Refer to Elasticsearch Authentication for xpack, API keys, and Helm (ECK) credentials.
Index and search tuning – Adjust index type, dense or hybrid search, and related behavior with APP_VECTORSTORE_* in the RAG and Ingestor compose files or Helm envVars (for example APP_VECTORSTORE_SEARCHTYPE, APP_VECTORSTORE_INDEXTYPE).
GPU indexing – For optional GPU-accelerated vector indexing (requires an Elastic Enterprise license and a GPU-enabled image), see Elasticsearch Configuration.
Using Elasticsearch (Default)#
The following steps describe the default Elasticsearch deployment for Docker Compose and Helm.
Docker Compose#
Start the vector database stack. Elasticsearch is included in the default profile for vectordb.yaml (you may pass --profile elasticsearch explicitly if you prefer).
docker compose -f deploy/compose/vectordb.yaml up -d
Confirm vector database settings. The compose files for the RAG and Ingestor servers already default to Elasticsearch; set or export these if you need to be explicit:
export APP_VECTORSTORE_URL="http://elasticsearch:9200"
export APP_VECTORSTORE_NAME="elasticsearch"
Relaunch the RAG and ingestion services.
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Update the RAG UI configuration.
Access the RAG UI at http://<host-ip>:8090. In the UI, navigate to: Settings > Endpoint Configuration > Vector Database Endpoint → set to http://elasticsearch:9200.
Helm#
If you’re using Helm for deployment, Elasticsearch (ECK) is enabled by default. Use the following steps to align the release with the default vector database.
Note
Performance Consideration: Slow VDB upload is observed in Helm deployments for Elasticsearch (ES). For more details, refer to the troubleshooting documentation.
Prerequisites#
Install the ECK (Elastic Cloud on Kubernetes) operator:
The ECK operator is required to manage Elasticsearch deployments on Kubernetes.
# Add Elastic Helm repository
helm repo add elastic https://helm.elastic.co
helm repo update
# Install ECK operator in its own namespace
helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
Tip
The ECK operator manages the Elasticsearch lifecycle, including deployment, upgrades, and configuration management.
Verify ECK operator installation:
Ensure the ECK operator is running before proceeding:
# Check ECK operator pod status
kubectl get pods -n elastic-system
# Expected output: elastic-operator-0 1/1 Running
# Verify ECK operator is ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=elastic-operator -n elastic-system --timeout=300s
Configuration Steps#
Confirm Elasticsearch settings in values.yaml.
The chart defaults to Elasticsearch; ensure both the RAG server and ingestor-server sections match your ECK service URL and credentials:
# RAG Server configuration
envVars:
  APP_VECTORSTORE_URL: "http://rag-eck-elasticsearch-es-http:9200"
  APP_VECTORSTORE_NAME: "elasticsearch"
  APP_VECTORSTORE_USERNAME: ""
  APP_VECTORSTORE_PASSWORD: ""
# Ingestor Server configuration
ingestor-server:
  envVars:
    APP_VECTORSTORE_URL: "http://rag-eck-elasticsearch-es-http:9200"
    APP_VECTORSTORE_NAME: "elasticsearch"
    APP_VECTORSTORE_USERNAME: ""
    APP_VECTORSTORE_PASSWORD: ""
Ensure that Elasticsearch (ECK) is enabled in values.yaml. This is the default setting; use the following block as the reference configuration:
eck-elasticsearch:
  enabled: true
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
        # Disable authentication for easier setup (default)
        xpack.security.enabled: false
        xpack.security.transport.ssl.enabled: false
Deploy the Helm chart:
After modifying values.yaml, apply the changes as described in Change a Deployment.
For detailed Helm deployment instructions, see Deploy the RAG Pipeline.
Verify Elasticsearch deployment:
Check that the Elasticsearch pod and service are running:
# Check Elasticsearch pod status
kubectl get pods -n rag | grep elasticsearch
# Expected output: rag-eck-elasticsearch-es-default-0 1/1 Running
# Check Elasticsearch service
kubectl get svc -n rag | grep elasticsearch
# Expected services:
# - rag-eck-elasticsearch-es-default (ClusterIP, port 9200)
# - rag-eck-elasticsearch-es-http (ClusterIP, port 9200)
# - rag-eck-elasticsearch-es-transport (ClusterIP, port 9300)
# Wait for Elasticsearch to be ready
kubectl wait --for=condition=ready pod -l elasticsearch.k8s.elastic.co/cluster-name=rag-eck-elasticsearch -n rag --timeout=300s
Test Elasticsearch health:
# Test from inside the cluster
kubectl exec -n rag rag-eck-elasticsearch-es-default-0 -- curl -s http://localhost:9200/_cluster/health
# Expected: {"cluster_name":"rag-eck-elasticsearch","status":"yellow" or "green",...}
(Optional) Enable authentication - see Elasticsearch Authentication (Helm) if you need to secure your Elasticsearch instance.
After the Helm deployment, port-forward the RAG UI service:
kubectl port-forward -n rag service/rag-frontend 3000:3000 --address 0.0.0.0
Access the UI at http://<host-ip>:3000 and set Settings > Endpoint Configuration > Vector Database Endpoint to http://rag-eck-elasticsearch-es-http:9200.
Verify Your Elasticsearch Vector Database Setup#
After you complete the setup, verify that Elasticsearch is running correctly:
For Docker deployments:#
curl -X GET "localhost:9200/_cluster/health?pretty"
For Helm deployments:#
# 1. Get the name of your Elasticsearch pod:
kubectl get pods -n rag | grep elasticsearch
# 2. Run the following command, replacing <elasticsearch-pod-name> with the actual pod name:
# Use the pod name from step 1 (e.g. rag-eck-elasticsearch-es-default-0)
kubectl exec -n rag <elasticsearch-pod-name> -- curl -s "localhost:9200/_cluster/health?pretty"
# Alternative: Port-forward and test from your machine (service name uses Helm release prefix)
kubectl port-forward -n rag svc/rag-eck-elasticsearch-es-http 9200:9200 &
curl -X GET "localhost:9200/_cluster/health?pretty"
You should see a response that indicates the cluster status is green or yellow, confirming that Elasticsearch is operational and ready to store embeddings.
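When the check runs from a script rather than by eye, the same rule (green or yellow means ready) can be applied to the JSON body returned by the _cluster/health endpoint. The following is a minimal sketch; the function name is_cluster_ready is hypothetical and not part of the blueprint.

```python
import json


def is_cluster_ready(health_response: str) -> bool:
    """Return True when the cluster status is green or yellow."""
    try:
        health = json.loads(health_response)
    except json.JSONDecodeError:
        # An empty or non-JSON body means the cluster did not answer properly.
        return False
    if not isinstance(health, dict):
        return False
    return health.get("status") in {"green", "yellow"}
```

A red status, or a response that is not valid JSON, is treated as not ready.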
Switching to Milvus#
Use Milvus when you want the optional Milvus stack instead of Elasticsearch. You must re-upload your documents after switching; embeddings stored in Elasticsearch are not migrated to Milvus automatically.
Docker Compose#
Start the Milvus profile (Milvus, etcd, SeaweedFS object store, and related services).
docker compose -f deploy/compose/vectordb.yaml --profile milvus up -d
Point the application at Milvus.
export APP_VECTORSTORE_NAME="milvus"
export APP_VECTORSTORE_URL="http://milvus:19530"
Relaunch the RAG and ingestion services.
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Update the RAG UI configuration.
Settings > Endpoint Configuration > Vector Database Endpoint → http://milvus:19530.
Helm#
Configure values.yaml so Milvus is deployed, ECK Elasticsearch is disabled, and both servers use the Milvus endpoint.
Set the vector database environment variables on the RAG server and ingestor server:
# RAG Server configuration
envVars:
  APP_VECTORSTORE_URL: "http://milvus:19530"
  APP_VECTORSTORE_NAME: "milvus"
  APP_VECTORSTORE_USERNAME: ""
  APP_VECTORSTORE_PASSWORD: ""
# Ingestor Server configuration
ingestor-server:
  envVars:
    APP_VECTORSTORE_URL: "http://milvus:19530"
    APP_VECTORSTORE_NAME: "milvus"
    APP_VECTORSTORE_USERNAME: ""
    APP_VECTORSTORE_PASSWORD: ""
Disable ECK Elasticsearch and deploy Milvus via nv-ingest:
eck-elasticsearch:
  enabled: false
nv-ingest:
  milvusDeployed: true
Adjust additional nv-ingest.milvus settings in values.yaml as needed (resources, images, authentication, and so on).
Deploy or upgrade the Helm release as described in Change a Deployment.
Verify Milvus pods and services, then set the RAG UI vector endpoint to match your Milvus HTTP/gRPC service (commonly http://milvus:19530 from application pods when using the default fullnameOverride).
Note
Kubernetes DNS names may include your Helm release prefix. Use kubectl get svc -n rag to confirm the Milvus service hostname if connections fail.
Elasticsearch Authentication#
For Elasticsearch authentication configuration (xpack security, API keys, and ECK credentials), refer to Elasticsearch Configuration.
Using VDB Auth Token at Runtime via APIs (Enterprise Feature)#
For runtime VDB token authentication with Elasticsearch, refer to Elasticsearch Configuration.
Define Your Own Vector Database#
You can create your own custom vector database operators by implementing the VDBRag base class.
This enables you to integrate with any vector database that isn’t already supported.
Caution
This section is for advanced developers who need to integrate custom vector databases beyond the supported database options.
For a complete example, refer to Custom VDB Operator Notebook.
Tip
Choose your integration path:
Start with Library Mode for fastest iteration during development (recommended for most users).
Advanced users who are comfortable with deployments can start directly with Server Mode. See: Integrate Into NVIDIA RAG (Server Mode).
Integrate Custom VDB in Library Mode (Developer-Friendly Approach)#
Before wiring your custom VDB into the servers, the quickest way to iterate is to run it in library mode. This is ideal for development, debugging, and ensuring the operator behaves correctly.
Tip
New to library mode? Check out the Containerless Deployment (Lite Mode) notebook for a complete example of using the RAG library without Docker containers.
Reference implementation (start here)
Read the notebook: ../notebooks/building_rag_vdb_operator.ipynb. It contains a complete, working example you can copy and adapt.
What you build
A class that inherits from VDBRag and implements the required methods for ingestion and retrieval.
Instantiate that class and pass it to NvidiaRAG and/or NvidiaRAGIngestor via the vdb_op parameter.
Minimal example
from nvidia_rag import NvidiaRAG, NvidiaRAGIngestor
from nvidia_rag.utils.vdb.vdb_base import VDBRag

class CustomVDB(VDBRag):
    def __init__(self, custom_url: str, index_name: str, embedding_model=None):
        # initialize client(s) here
        self.url = custom_url
        self.index_name = index_name
        self.embedding_model = embedding_model

    def create_collection(self, collection_name: str, dimension: int = 2048, collection_type: str = "text"):
        ...  # create index/collection

    def write_to_index(self, records: list[dict], **kwargs):
        ...  # bulk insert vectors + metadata

    # implement retrieval and other required methods used by the notebook

custom_vdb_op = CustomVDB(custom_url="http://localhost:9200", index_name="test_library")
rag = NvidiaRAG(vdb_op=custom_vdb_op)
ingestor = NvidiaRAGIngestor(vdb_op=custom_vdb_op)
Quick checklist:
Implement a VDBRag subclass with at least: create_collection, write_to_index, and retrieval helpers used in the notebook.
Initialize your operator and pass it via vdb_op to NvidiaRAG/NvidiaRAGIngestor.
Run the notebook cells to validate: create collection → upload documents → search/generate → list/delete documents.
Once satisfied, proceed to Server Mode integration below.
Step-by-Step Implementation Guide for Custom Vector Database#
Use the following steps to create and use your own custom database operators.
Create a class that inherits from VDBRag and implements all required methods.
from nvidia_rag.utils.vdb.vdb_base import VDBRag

class CustomVDB(VDBRag):
    def __init__(self, custom_url, index_name, embedding_model=None):
        # Initialize your custom VDB connection
        pass

    def create_collection(self, collection_name, dimension=2048):
        # Implement collection creation
        pass

    def write_to_index(self, records, **kwargs):
        # Implement document indexing
        pass

    # Implement other required methods...
Use your custom operator with NVIDIA RAG components.
from nvidia_rag import NvidiaRAG, NvidiaRAGIngestor

# Initialize custom VDB operator
# (embedding_model is the embedding model handle you created elsewhere)
custom_vdb_op = CustomVDB(
    custom_url="your://database:url",
    index_name="collection_name",
    embedding_model=embedding_model
)

# Use with NVIDIA RAG
rag = NvidiaRAG(vdb_op=custom_vdb_op)
ingestor = NvidiaRAGIngestor(vdb_op=custom_vdb_op)
Method Descriptions:
Use this as a minimal checklist for your VDBRag subclass. Keep names consistent with your codebase; ensure these behaviors exist.
Initialization
__init__(...): Initialize your backend client/connection, set collection/index name, capture metadata helpers, and optionally accept an embedding model handle.
collection_name (property): Getter/Setter mapping to your underlying collection/index identifier.
Core index operations
_check_index_exists(name): Return whether the target collection/index exists.
create_index(): Create the collection/index if missing with appropriate vector settings.
write_to_index(records, **kwargs): Clean incoming records, extract text, vector, and metadata (e.g., source, content_metadata), bulk-insert, and refresh visibility.
retrieval(queries, **kwargs): Optional for RAG. Implement multi-query retrieval or raise NotImplementedError if you expose a different retrieval entrypoint.
reindex(records, **kwargs): Optional for RAG. Implement reindex/update workflows or raise NotImplementedError.
run(records): Convenience helper to create (if needed) then write.
Collection management
create_collection(collection_name, dimension=2048, collection_type="text"): Ensure a collection exists and is ready for inserts/queries.
check_collection_exists(collection_name): Boolean existence check.
get_collection(): Return a list of collections with document counts, stored metadata schema, and collection-level document info.
delete_collections(collection_names): Delete specified collections and clean up stored schemas and document info.
Document management
get_documents(collection_name): Return unique documents (commonly grouped by a source field) with schema-aligned metadata values and document info.
delete_documents(collection_name, source_values): Bulk-delete documents matching provided sources; refresh visibility and clean up associated document info.
Metadata schema management
create_metadata_schema_collection(): Initialize storage for metadata schemas if missing.
add_metadata_schema(collection_name, metadata_schema): Replace the stored schema for a collection.
get_metadata_schema(collection_name): Fetch the stored schema; return an empty list if none.
Document info management (implementation of these methods is optional)
create_document_info_collection(): Initialize storage for document-level and collection-level information.
add_document_info(info_type, collection_name, document_name, info_value): Store document or collection info (e.g., processing statistics, custom metadata).
get_document_info(info_type, collection_name, document_name): Retrieve stored document/collection info; return an empty dict if none.
Catalog metadata management (implementation of these methods is optional)
get_catalog_metadata(collection_name): Retrieve catalog metadata (description, tags, owner, etc.) for a collection.
update_catalog_metadata(collection_name, updates): Update catalog metadata for a collection with merge semantics.
get_document_catalog_metadata(collection_name, document_name): Retrieve catalog metadata (description, tags) for a specific document.
update_document_catalog_metadata(collection_name, document_name, updates): Update catalog metadata for a specific document.
Retrieval helpers
Retrieval helper (e.g., retrieval_*): Return top‑k relevant documents using your backend’s semantic search. Support optional filters and tracing where applicable.
Vector index handle (e.g., get_*_vectorstore): Return a handle to your backend’s vector index suitable for retrieval operations.
Add collection tag (e.g., _add_collection_name_to_*docs): Add the originating collection name into each document’s metadata (useful for multi‑collection citations).
For a concrete, working example, see src/nvidia_rag/utils/vdb/elasticsearch/elastic_vdb.py and notebooks/building_rag_vdb_operator.ipynb.
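To make the checklist concrete without a running backend, here is a toy in-memory store that exercises a few of the behaviors listed above (collection creation, writes, document listing, deletion, and top-k search). It is a sketch under stated assumptions: the class does not inherit from VDBRag, the search method name and the flat record shape (text, vector, source) are hypothetical, and write_to_index takes the collection name explicitly for simplicity.

```python
import math


class InMemoryVDB:
    """Toy stand-in for a VDBRag subclass; it does not inherit from VDBRag."""

    def __init__(self):
        self._collections = {}  # collection name -> list of record dicts

    def check_collection_exists(self, collection_name):
        return collection_name in self._collections

    def create_collection(self, collection_name, dimension=2048, collection_type="text"):
        # Idempotent: existing records are kept if the collection already exists.
        self._collections.setdefault(collection_name, [])

    def write_to_index(self, collection_name, records):
        # Assumed record shape: {"text": str, "vector": list[float], "source": str}.
        self.create_collection(collection_name)
        self._collections[collection_name].extend(records)

    def get_documents(self, collection_name):
        # Unique documents, grouped by their "source" field.
        return sorted({r["source"] for r in self._collections.get(collection_name, [])})

    def delete_documents(self, collection_name, source_values):
        to_delete = set(source_values)
        records = self._collections.get(collection_name, [])
        self._collections[collection_name] = [
            r for r in records if r["source"] not in to_delete
        ]

    def search(self, collection_name, query_vector, top_k=10):
        # Rank records by cosine similarity to the query vector.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        records = self._collections.get(collection_name, [])
        ranked = sorted(records, key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
        return ranked[:top_k]
```

A real operator would replace the dictionary with calls to the backend client and return results in the format the servers expect, as the reference implementation does.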
Integrate Custom Vector Database Into NVIDIA RAG Servers (Docker Mode)#
Before proceeding in server mode, complete the Step-by-Step Implementation Guide above to implement and validate your operator.
Follow these steps to add your custom vector database to the NVIDIA RAG servers (RAG server and Ingestor server).
Reference implementation (read this first)
We strongly recommend reviewing the companion notebook: ../notebooks/building_rag_vdb_operator.ipynb. It contains a complete, working custom VDB example that you can adapt. The server-mode integration below reuses the same class and only adds a small registration step plus environment configuration.
Add your implementation
Create your operator under the project tree:
src/nvidia_rag/utils/vdb/custom_vdb_name/custom_vdb_name.py
Implement the class that inherits from VDBRag and fulfills the required methods (create collection, write, search, etc.).
Register your operator in the server
Update the VDB factory so the servers can instantiate your operator by name. Edit src/nvidia_rag/utils/vdb/__init__.py and add a branch inside _get_vdb_op:
elif CONFIG.vector_store.name == "your_custom_vdb":
    from nvidia_rag.utils.vdb.custom_vdb_name.custom_vdb_name import CustomVDB
    return CustomVDB(
        index_name=collection_name,
        custom_url=vdb_endpoint or CONFIG.vector_store.url,
        embedding_model=embedding_model,
    )
Add required client libraries (if needed)
If your custom operator depends on an external client SDK, add the library to pyproject.toml under [project].dependencies so it is installed consistently across local runs, CI, and Docker builds. Example:
[project]
dependencies = [
    # ...
    "opensearch-py>=3.0.0",  # or your custom client library
]
Rebuild your images if deploying with Docker so the new dependency is included.
Configure docker compose (server deployments)
Set APP_VECTORSTORE_NAME to your custom name and point APP_VECTORSTORE_URL to your service in both compose files:
deploy/compose/docker-compose-rag-server.yaml
deploy/compose/docker-compose-ingestor-server.yaml
Example overrides via environment (recommended):
export APP_VECTORSTORE_NAME="your_custom_vdb"
export APP_VECTORSTORE_URL="http://your-custom-vdb:1234"
# Build the containers to include your current codebase
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d --build
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d --build
Or, you may edit the files locally to show your custom value. Search for APP_VECTORSTORE_NAME and adjust defaults if desired:
# Type of vectordb used to store embedding (supports "milvus", "elasticsearch", or a custom value like "your_custom_vdb")
APP_VECTORSTORE_NAME: ${APP_VECTORSTORE_NAME:-"elasticsearch"}
# URL on which vectorstore is hosted
APP_VECTORSTORE_URL: ${APP_VECTORSTORE_URL:-http://your-custom-vdb:1234}
How the configuration is picked up
The application configuration (src/nvidia_rag/utils/configuration.py) maps environment variables into the AppConfig object. Specifically:
APP_VECTORSTORE_NAME → CONFIG.vector_store.name
APP_VECTORSTORE_URL → CONFIG.vector_store.url
The server calls the VDB factory with this configuration. See _get_vdb_op:
When CONFIG.vector_store.name == "your_custom_vdb", the branch you added is executed and your operator is constructed with CONFIG.vector_store.url (or the request override) and the embedding model.
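The name-based dispatch described above can be illustrated in isolation. The sketch below is not the blueprint's _get_vdb_op: it uses a dictionary lookup instead of an if/elif chain, and VectorStoreConfig, StubVDB, REGISTRY, and get_vdb_op are hypothetical names introduced only so the example runs on its own. It does show the same precedence rule, where a per-request endpoint overrides the configured URL.

```python
from dataclasses import dataclass


@dataclass
class VectorStoreConfig:
    """Stand-in for the vector_store section of the application config."""
    name: str
    url: str


class StubVDB:
    """Placeholder operator used only to make the sketch runnable."""

    def __init__(self, custom_url, index_name, embedding_model=None):
        self.custom_url = custom_url
        self.index_name = index_name
        self.embedding_model = embedding_model


# Maps a vector store name to the class that implements it.
REGISTRY = {"your_custom_vdb": StubVDB}


def get_vdb_op(config, collection_name, vdb_endpoint=None, embedding_model=None):
    try:
        vdb_class = REGISTRY[config.name]
    except KeyError:
        raise ValueError(f"Unsupported vector store: {config.name!r}") from None
    # A per-request endpoint, when given, takes precedence over the configured URL.
    return vdb_class(
        custom_url=vdb_endpoint or config.url,
        index_name=collection_name,
        embedding_model=embedding_model,
    )
```

Failing loudly on an unknown name is a deliberate choice here: a typo in APP_VECTORSTORE_NAME then surfaces at startup rather than as a silent fallback to another backend.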
TL;DR
Create a VDBRag subclass in src/nvidia_rag/utils/vdb/<your_name>/<your_name>.py (mirror the notebook example).
Add a new elif CONFIG.vector_store.name == "your_custom_vdb" branch in src/nvidia_rag/utils/vdb/__init__.py::_get_vdb_op that instantiates your class.
Set env vars for both servers: APP_VECTORSTORE_NAME=your_custom_vdb, APP_VECTORSTORE_URL=http://your-custom-vdb:1234.
Restart docker-compose services for the RAG server and Ingestor server.
That’s it—after these steps, both the RAG server and the Ingestor will use your custom vector database when APP_VECTORSTORE_NAME is set to your_custom_vdb.
Integrate Custom Vector Database Into NVIDIA RAG Servers (Helm/Kubernetes Mode)#
Warning
Advanced Developer Guide - Production Use Only
This section is for advanced developers with Kubernetes and Helm experience. Recommended for production environments only. For development and testing, use the Docker Compose approach instead.
Before proceeding with Helm deployment, ensure you have completed the implementation steps mentioned above, including:
Creating your custom VDB operator class that inherits from VDBRag
Registering your operator in src/nvidia_rag/utils/vdb/__init__.py
Adding required client libraries to pyproject.toml (if needed)
Refer to the steps above for detailed implementation guidance.
Build Custom Images#
Once your custom vector database implementation is complete, you need to build custom images for both the RAG server and Ingestor server:
Update image names in Docker Compose files:
Edit deploy/compose/docker-compose-rag-server.yaml and change the image name:
services:
  rag-server:
    image: your-registry/your-rag-server:your-tag
Edit deploy/compose/docker-compose-ingestor-server.yaml and change the image name:
services:
  ingestor-server:
    image: your-registry/your-ingestor-server:your-tag
Tip
Use a public registry for easier deployment and accessibility.
Build Ingestor server and RAG server image:
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml build
docker compose -f deploy/compose/docker-compose-rag-server.yaml build
Push images to your registry:
docker push your-registry/your-rag-server:your-tag
docker push your-registry/your-ingestor-server:your-tag
Configure Helm Values#
Update your values.yaml file to use your custom images and configure your vector database:
Update image repositories and tags:
# RAG server image configuration
image:
  repository: your-registry/your-rag-server
  tag: "your-tag"
  pullPolicy: Always
# Ingestor server image configuration
ingestor-server:
  image:
    repository: your-registry/your-ingestor-server
    tag: "your-tag"
    pullPolicy: Always
Configure vector database settings:
# RAG server environment variables
envVars:
  APP_VECTORSTORE_URL: "http://your-custom-vdb:port"
  APP_VECTORSTORE_NAME: "your_custom_vdb"
  # ... other existing configurations
# Ingestor server environment variables
ingestor-server:
  envVars:
    APP_VECTORSTORE_URL: "http://your-custom-vdb:port"
    APP_VECTORSTORE_NAME: "your_custom_vdb"
    # ... other existing configurations
Disable Default Vector Database and Add Custom Helm Chart#
Disable the chart-managed vector databases you are replacing in values.yaml. The default configuration deploys Elasticsearch (ECK) and keeps Milvus optional through nv-ingest; for a custom Helm chart backend, disable any bundled backends you no longer need—for example:
eck-elasticsearch:
  enabled: false
nv-ingest:
  enabled: true
  milvusDeployed: false
  milvus:
    enabled: false
Add your custom vector database Helm chart to Chart.yaml:
Edit deploy/helm/nvidia-blueprint-rag/Chart.yaml and add your custom VDB as a dependency:
dependencies:
  # ... existing dependencies
  - condition: your-custom-vdb.enabled
    name: your-custom-vdb
    repository: https://your-helm-repo.com/charts
    version: 1.0.0
Note
Replace your-custom-vdb, https://your-helm-repo.com/charts, and 1.0.0 with your actual chart name, repository URL, and version.
Add Helm repository and update dependencies:
cd deploy/helm/
# Add your custom VDB Helm repository
helm repo add your-vdb-repo https://your-helm-repo.com/charts
helm repo update
# Update Helm dependencies
helm dependency update nvidia-blueprint-rag
Enable your custom vector database in values.yaml:
# Add your custom VDB configuration
your-custom-vdb:
  enabled: true
  # Add your VDB-specific configuration here
  # Example configurations:
  service:
    type: ClusterIP
    port: 9200
  resources:
    limits:
      memory: "4Gi"
    requests:
      memory: "2Gi"
Deploy with Helm#
Deploy your updated NVIDIA RAG system with the custom vector database:
cd deploy/helm/
helm upgrade --install rag -n rag nvidia-blueprint-rag/ \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
-f nvidia-blueprint-rag/values.yaml
Verify Deployment#
After deployment, verify that your custom vector database is working correctly:
Check pod status:
kubectl get pods -n rag
Check service endpoints:
kubectl get services -n rag
Test vector database connectivity:
# Get your custom VDB pod name:
kubectl get pods -n rag
# Then run the health check (replace <custom-vdb-pod-name> with your pod name and correct /health endpoint):
kubectl exec -n rag <custom-vdb-pod-name> -- curl -X GET "localhost:port/health"
Access the RAG UI:
Port-forward the RAG UI service:
kubectl port-forward -n rag service/rag-frontend 3000:3000 --address 0.0.0.0
Access the UI at http://<host-ip>:3000 and configure:
Go to Settings > Endpoint Configuration > Vector Database Endpoint and set it to http://your-custom-vdb:port
Troubleshooting#
If you encounter issues during deployment:
Check Helm chart dependencies:
helm dependency list nvidia-blueprint-rag
Verify image pull secrets:
kubectl get secrets -n rag
Check pod logs:
kubectl logs -n rag deployment/rag-server
kubectl logs -n rag deployment/ingestor-server
Validate Helm values:
helm template rag nvidia-blueprint-rag/ -f nvidia-blueprint-rag/values.yaml
Implement Retrieval-Only Vector Database Integration#
You can integrate your own vector database with NVIDIA RAG by implementing only the retrieval functionality while managing ingestion separately. This approach allows you to use existing RAG server, RAG UI, and ingestor server components with your custom vector database backend.
Note
This approach is ideal when you have an existing vector database with pre-indexed documents and want to leverage NVIDIA RAG’s retrieval and generation capabilities without implementing full ingestion workflows in the NVIDIA RAG Blueprint.
Implementation Requirements#
Implement only the retrieval-focused methods from the VDBRag interface:
Required Methods:
__init__(vdb_endpoint, collection_name, embedding_model=None): Initialize connection
close(): Clean up connections
__enter__() / __exit__(): Context manager support
check_health(): Return database health status
get_collection(): Return available collections with metadata
check_collection_exists(collection_name): Verify collection existence
retrieval_langchain(query, collection_name, vectorstore=None, top_k=10, filter_expr="", otel_ctx=None): Core retrieval method - Must return langchain_core.documents.Document objects with:
  page_content: The document text content
  metadata: Dictionary containing:
    source: Document source identifier (e.g., “file1.pdf”)
    content_metadata: Nested dictionary with additional metadata (e.g., {"topic": "science"})
    collection_name: To be added in each Document’s metadata
get_langchain_vectorstore(collection_name): Return vectorstore handle (can return None)
Optional Methods: Raise NotImplementedError for all ingestion methods (create_collection(), write_to_index(), etc.) and document info management methods (create_document_info_collection(), add_document_info(), get_document_info())
Example Document Structure:
from langchain_core.documents import Document

# Example return from retrieval_langchain()
documents = [
    Document(
        page_content="Albert Einstein was playing chess with his friend",
        metadata={
            "source": "file1.pdf",
            "content_metadata": {"topic": "science"},
            "collection_name": "my_collection"
        }
    )
]
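Before wrapping backend results in Document objects, it can help to normalize and check the metadata layout shown above. The following sketch works on plain dictionaries so it runs without LangChain installed. Both function names are hypothetical, and the hit keys (source, content_metadata) are assumptions about what a custom backend returns.

```python
def build_document_metadata(hit: dict, collection_name: str) -> dict:
    """Map one backend search hit to the metadata layout shown above.

    The hit keys ("source", "content_metadata") are assumptions about
    what a custom backend returns; adapt them to your own schema.
    """
    return {
        "source": hit.get("source", ""),
        "content_metadata": dict(hit.get("content_metadata") or {}),
        "collection_name": collection_name,
    }


def missing_metadata_keys(metadata: dict) -> list:
    """Return the required metadata keys that are absent, in a fixed order."""
    required = ("source", "content_metadata", "collection_name")
    return [key for key in required if key not in metadata]
```

The resulting dictionary can be passed as the metadata argument when constructing each Document in retrieval_langchain.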
Integration Steps#
Follow the steps in Integrate Custom Vector Database Into NVIDIA RAG Servers (Docker Mode) with these key differences:
Skip ingestion implementations - Raise NotImplementedError for ingestion methods
Handle document indexing separately - Use your own processes or tools
Ensure proper document format - Your retrieval_langchain method must return Document objects with metadata
The integration process remains the same: create your VDB class, register it, configure environment variables, and deploy.
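As a rough skeleton of the retrieval-only pattern, the class below shows context manager support plus ingestion methods that raise NotImplementedError. It is a sketch, not a working operator: a real implementation would subclass VDBRag and add the retrieval methods listed under Implementation Requirements, and the class name RetrievalOnlyVDB and the error message are hypothetical.

```python
class RetrievalOnlyVDB:
    """Sketch of a retrieval-only operator; a real one would subclass VDBRag."""

    def __init__(self, vdb_endpoint, collection_name, embedding_model=None):
        self.vdb_endpoint = vdb_endpoint
        self.collection_name = collection_name
        self.embedding_model = embedding_model

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions

    def close(self):
        # Release client connections here; nothing to release in this sketch.
        pass

    def create_collection(self, collection_name, dimension=2048, collection_type="text"):
        raise NotImplementedError("Ingestion is managed outside NVIDIA RAG")

    def write_to_index(self, records, **kwargs):
        raise NotImplementedError("Ingestion is managed outside NVIDIA RAG")
```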