Connecting to a Vector Database

Enterprise RAG LLM Operator - (Latest Version)

By default, the sample pipeline starts a pgvector database in a pod. The query server connects to the pgvector database to store and retrieve embeddings.

To connect to an alternative vector database, ensure that the following requirements are met:

  • The database must be pgvector or Milvus.

  • The Kubernetes cluster must have network connectivity to the database.

    The typical network ports are 5432 for pgvector and 19530 for Milvus.

  • You must have installed the NVIDIA GPU Operator and the NVIDIA Enterprise RAG LLM Operator.

  • You must have created two persistent volume claims: nemollm-inference-pvc and nemo-embedding-pvc.
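Before editing the pipeline manifest, you can verify basic TCP reachability of the database endpoint from a host with the same network access as the cluster. The following is a minimal, illustrative sketch using the Python standard library; `is_reachable` is a hypothetical helper, not part of the operator:

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: a throwaway local listener stands in for the database endpoint.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # bind to an ephemeral port
srv.listen(1)
host, port = srv.getsockname()
reachable = is_reachable(host, port)   # True: the listener accepts connections
srv.close()
```

For a real check, substitute the database host and the typical port (5432 for pgvector, 19530 for Milvus).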

To connect to an alternative vector database, perform the following steps:

  1. Edit the config/samples/helmpipeline_app.yaml file.

    • Modify the query server specification to set the APP_VECTORSTORE_URL and APP_VECTORSTORE_NAME environment variables, as well as the environment variables for the database credentials:


      spec:
        pipeline:
          - repoEntry:
              name: rag-llm-app
              url: "file:///helm-charts/pipeline"
            # ...
            chartValues:
              query:
                image: ...
                # ...
                env:
                  APP_VECTORSTORE_URL: "http://<ip-or-hostname>:<port>"
                  APP_VECTORSTORE_NAME: "milvus"  # or "pgvector"
                  # If the database is pgvector, specify credentials.
                  POSTGRES_PASSWORD: <password>
                  POSTGRES_USER: <postgres-user-id>
                  POSTGRES_DB: <db-name>
                  # ...


      If the database is hosted in the same cluster, but in a different namespace, specify the Kubernetes DNS record for the service. For example, if Milvus Standalone is deployed in the default namespace, specify a value like http://my-release-milvus.default:19530. If pgvector is deployed in a namespace named db, specify a value like pgvector.db:5432.

      For more information, refer to DNS for Services and Pods in the Kubernetes documentation.

    • Disable the default pgvector pod:


      spec:
        pipeline:
          - repoEntry:
              name: rag-llm-app
              url: "file:///helm-charts/pipeline"
            # ...
            chartValues:
              pgvector:
                enabled: false
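A common mistake is to switch APP_VECTORSTORE_NAME to pgvector without setting the Postgres credential variables. The check below is a minimal sketch of that rule; `missing_credentials` is a hypothetical helper for illustration, not part of the operator:

```python
# Credential keys the query server needs when the vector store is pgvector.
REQUIRED_PGVECTOR_KEYS = {"POSTGRES_PASSWORD", "POSTGRES_USER", "POSTGRES_DB"}

def missing_credentials(env: dict) -> set:
    """Return the credential keys absent from the env block for the chosen store."""
    if env.get("APP_VECTORSTORE_NAME") == "pgvector":
        return REQUIRED_PGVECTOR_KEYS - env.keys()
    return set()  # Milvus does not use the Postgres credential variables

milvus_env = {
    "APP_VECTORSTORE_URL": "http://my-release-milvus.default:19530",
    "APP_VECTORSTORE_NAME": "milvus",
}
pgvector_env = {
    "APP_VECTORSTORE_URL": "http://pgvector.db:5432",
    "APP_VECTORSTORE_NAME": "pgvector",
    "POSTGRES_USER": "postgres",   # password and db name are missing here
}
```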

  2. Apply the configuration change:


    $ kubectl apply -f config/samples/helmpipeline_app.yaml -n rag-sample



  3. Optional: Monitor the query server logs.

    1. View the logs of the query server pod:


      $ kubectl logs -n rag-sample -f -l

    2. Upload a document using the Knowledge Base tab of the sample web application.

    3. Confirm the query logs resemble the following output:


      INFO:example:Ingesting dgxh100-user-guide.pdf in vectorDB
      INFO:RetrievalAugmentedGeneration.common.utils:Using milvus as vector store
      DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: b83ee3672c404217bd5e62a3da774ed0
      INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: intfloat/e5-large-v2
      .gitattributes: 100%|██████████| 1.48k/1.48k [00:00<00:00, 3.38MB/s]
      1_Pooling/config.json: 100%|██████████| 201/201 [00:00<00:00, 478kB/s]
      100%|██████████| 67.8k/67.8k [00:00<00:00, 70.1MB/s]
      config.json: 100%|██████████| 616/616 [00:00<00:00, 1.41MB/s]
      100%|██████████| 1.12k/1.12k [00:00<00:00, 2.83MB/s]
      model.safetensors: 100%|██████████| 1.34G/1.34G [00:03<00:00, 349MB/s]
      pytorch_model.bin: 100%|██████████| 1.34G/1.34G [00:04<00:00, 315MB/s]
      sentence_bert_config.json: 100%|██████████| 57.0/57.0 [00:00<00:00, 138kB/s]
      special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 241kB/s]
      tokenizer.json: 100%|██████████| 711k/711k [00:00<00:00, 2.95MB/s]
      tokenizer_config.json: 100%|██████████| 314/314 [00:00<00:00, 527kB/s]
      vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 3.94MB/s]
      modules.json: 100%|██████████| 387/387 [00:00<00:00, 891kB/s]
      INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cuda
      INFO:example:Document dgxh100-user-guide.pdf ingested successfully

      The example output is for Milvus Standalone; the output for pgvector differs.

© Copyright 2024, NVIDIA Corporation. Last updated on May 21, 2024.