Release Notes
Initial Release
The initial release of the NVIDIA Enterprise RAG LLM Operator enables NVIDIA AI Enterprise customers to deploy an Operator that manages the life cycle of the following key components for RAG pipelines:
NVIDIA Inference Microservice
NVIDIA NeMo Retriever Embedding Microservice
NVIDIA provides a sample RAG pipeline to demonstrate deploying an LLM, pgvector as a sample vector database, a chatbot web application, and a query server that communicates with the microservices and the vector database.
Known Issues
Autoscaling the microservices is not operational. As a workaround, you can scale the microservices manually with the
kubectl scale sts <statefulset-name> --replicas=<n>
command.
Modifying a Helm pipeline specification and applying the change might not roll out the change. As a workaround, you can trigger the rollout manually with the
kubectl rollout restart sts <statefulset-name>
command.
The Operator is not verified in an air-gapped network environment.
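The two workarounds above can be combined into a short sequence. This is a sketch only; the StatefulSet name (my-pipeline) and namespace (rag-pipeline) are placeholders, not names shipped with this release.

```shell
# Placeholder names: substitute the StatefulSet and namespace from
# your own pipeline deployment.

# Autoscaling is not operational in this release, so scale the
# microservice StatefulSet manually to the desired replica count:
kubectl scale sts my-pipeline -n rag-pipeline --replicas=2

# After applying a modified Helm pipeline specification, force the
# change to roll out by restarting the StatefulSet:
kubectl rollout restart sts my-pipeline -n rag-pipeline

# Optionally wait until the restarted pods report ready:
kubectl rollout status sts my-pipeline -n rag-pipeline
```

Both commands operate only on the named StatefulSet in the given namespace; other pipeline components are unaffected.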