Release Notes
Initial Release
The initial release of the NVIDIA Enterprise RAG LLM Operator enables NVIDIA AI Enterprise customers to deploy an Operator that manages the life cycle of the following key components for RAG pipelines:
NVIDIA Inference Microservice
NVIDIA NeMo Retriever Embedding Microservice
NVIDIA provides a sample RAG pipeline to demonstrate deploying an LLM, pgvector as a sample vector database, a chatbot web application, and a query server that communicates with the microservices and the vector database.
Known Issues
Autoscaling the microservices is not operational. As a workaround, you can scale the microservices manually with the
kubectl scale sts <statefulset-name> --replicas=<n>
command.
Modifying a Helm pipeline specification and applying the change might not roll out the change. As a workaround, you can trigger the rollout manually with the
kubectl rollout restart sts <statefulset-name>
command.
The Operator is not verified in an air-gapped network environment.
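The two workarounds above can be combined into a short sequence. This is a sketch only; the StatefulSet name (my-pipeline) and namespace (rag-pipeline) are placeholders, not names shipped with this release.

```shell
# Placeholder names: substitute the StatefulSet and namespace from
# your own pipeline deployment.

# Autoscaling is not operational in this release, so scale the
# microservice StatefulSet manually to the desired replica count:
kubectl scale sts my-pipeline -n rag-pipeline --replicas=2

# After applying a modified Helm pipeline specification, force the
# change to roll out by restarting the StatefulSet:
kubectl rollout restart sts my-pipeline -n rag-pipeline

# Optionally wait until the restarted pods report ready:
kubectl rollout status sts my-pipeline -n rag-pipeline
```

Both commands operate only on the named StatefulSet in the given namespace; other pipeline components are unaffected.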