By default, the sample pipeline starts a pgvector database in a pod. The query server connects to the pgvector database to store and retrieve embeddings.
To connect to an alternative vector database, ensure your environment meets the following requirements:

- The database must be pgvector or Milvus.
- The Kubernetes cluster must have network connectivity to the database. The typical network ports are 5432 for pgvector and 19530 for Milvus.
- You installed the NVIDIA GPU Operator and NVIDIA Enterprise RAG LLM Operator.
- You created two persistent volume claims: `nemollm-inference-pvc` and `nemo-embedding-pvc`.
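Before you edit the pipeline, you can confirm the network-connectivity requirement with a quick TCP check. The following Python sketch is illustrative only and is not part of the sample pipeline; substitute your database host and port when you run it:

```python
import socket

# Default ports for the supported vector databases.
VECTOR_DB_PORTS = {"pgvector": 5432, "milvus": 19530}

def check_db_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `check_db_reachable("pgvector.db", VECTOR_DB_PORTS["pgvector"])` returns `False` if the cluster cannot reach the database, in which case fix connectivity before continuing.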
To connect to an alternative vector database, perform the following steps:
Edit the `config/samples/helmpipeline_app.yaml` file. Modify the query server specification to set the `APP_VECTORSTORE_URL` and `APP_VECTORSTORE_NAME` environment variables and, for pgvector, the credential environment variables:

```yaml
spec:
  pipeline:
  - repoEntry:
      name: rag-llm-app
      url: "file:///helm-charts/pipeline"
    # ...
    chartValues:
      query:
        image: ...
        # ...
        env:
          APP_VECTORSTORE_URL: "http://<ip-or-hostname>:<port>"
          APP_VECTORSTORE_NAME: "milvus"  # or "pgvector"
          # If the database is pgvector, specify credentials.
          POSTGRES_PASSWORD: <password>
          POSTGRES_USER: <postgres-user-id>
          POSTGRES_DB: <db-name>
          # ...
```
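For context, these environment variables are read by the query server at startup. The following Python sketch shows one plausible way such variables are consumed; it is illustrative only and does not reproduce the actual query-server code:

```python
import os

def vector_store_config() -> dict:
    """Illustrative sketch: assemble vector store settings from environment
    variables like those set in the pipeline manifest."""
    name = os.environ["APP_VECTORSTORE_NAME"]
    cfg = {"name": name, "url": os.environ["APP_VECTORSTORE_URL"]}
    if name == "pgvector":
        # pgvector additionally requires Postgres credentials.
        cfg.update(
            user=os.environ["POSTGRES_USER"],
            password=os.environ["POSTGRES_PASSWORD"],
            database=os.environ["POSTGRES_DB"],
        )
    return cfg
```

The sketch also shows why the `POSTGRES_*` variables are required only when `APP_VECTORSTORE_NAME` is `pgvector`: Milvus connections need no credentials in this sample.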
Tip: If the database is hosted in the same cluster, but in a different namespace, specify the Kubernetes DNS record for the service. For example, if Milvus Standalone is deployed in the `default` namespace, specify a value like `http://my-release-milvus.default:19530`. If pgvector is deployed in a namespace named `db`, specify a value like `pgvector.db:5432`. For more information, refer to DNS for Services and Pods in the Kubernetes documentation.
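The tip above follows the standard Kubernetes service DNS pattern, `<service>.<namespace>:<port>`. A small helper makes the pattern explicit; the function name is ours for illustration, not part of the operator:

```python
def vectorstore_url(service: str, namespace: str, port: int, scheme: str = "") -> str:
    """Build an in-cluster address following the Kubernetes service DNS
    convention <service>.<namespace>:<port>, with an optional URL scheme."""
    host = f"{service}.{namespace}:{port}"
    return f"{scheme}://{host}" if scheme else host

# Matches the examples in the tip:
# vectorstore_url("my-release-milvus", "default", 19530, scheme="http")
#   -> "http://my-release-milvus.default:19530"
# vectorstore_url("pgvector", "db", 5432)
#   -> "pgvector.db:5432"
```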
Disable the default pgvector pod:
```yaml
spec:
  pipeline:
  - repoEntry:
      name: rag-llm-app
      url: "file:///helm-charts/pipeline"
    # ...
    chartValues:
      pgvector:
        enabled: false
```
Apply the configuration change:
```console
$ kubectl apply -f config/samples/helmpipeline_app.yaml -n rag-sample
```
Example Output
```text
helmpipeline.package.nvidia.com/my-sample-pipeline configured
```
Optional: Monitor the query server logs.
View the logs of the query server pod:
```console
$ kubectl logs -n rag-sample -f -l app.kubernetes.io/name=query-router
```
Upload a document using the Knowledge Base tab of the sample web application.
Confirm the query logs resemble the following output:
```text
INFO:example:Ingesting dgxh100-user-guide.pdf in vectorDB
INFO:RetrievalAugmentedGeneration.common.utils:Using milvus as vector store
DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: b83ee3672c404217bd5e62a3da774ed0
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: intfloat/e5-large-v2
.gitattributes: 100%|██████████| 1.48k/1.48k [00:00<00:00, 3.38MB/s]
1_Pooling/config.json: 100%|██████████| 201/201 [00:00<00:00, 478kB/s]
README.md: 100%|██████████| 67.8k/67.8k [00:00<00:00, 70.1MB/s]
config.json: 100%|██████████| 616/616 [00:00<00:00, 1.41MB/s]
handler.py: 100%|██████████| 1.12k/1.12k [00:00<00:00, 2.83MB/s]
model.safetensors: 100%|██████████| 1.34G/1.34G [00:03<00:00, 349MB/s]
pytorch_model.bin: 100%|██████████| 1.34G/1.34G [00:04<00:00, 315MB/s]
sentence_bert_config.json: 100%|██████████| 57.0/57.0 [00:00<00:00, 138kB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 241kB/s]
tokenizer.json: 100%|██████████| 711k/711k [00:00<00:00, 2.95MB/s]
tokenizer_config.json: 100%|██████████| 314/314 [00:00<00:00, 527kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 3.94MB/s]
modules.json: 100%|██████████| 387/387 [00:00<00:00, 891kB/s]
INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cuda
INFO:example:Document dgxh100-user-guide.pdf ingested successfully
```
The example output is for Milvus Standalone; the output for pgvector differs.