NVIDIA RAG Blueprint Documentation#
Welcome to the NVIDIA RAG Blueprint documentation. Here you can learn how to get started with the RAG Blueprint, customize it, and troubleshoot it.
To view this documentation on docs.nvidia.com, browse to NVIDIA RAG Blueprint Documentation.
To view this documentation on GitHub, browse to NVIDIA RAG Blueprint Documentation.
Release Notes#
For the release notes, refer to Release Notes.
Support Matrix#
For hardware requirements and other information, refer to the Support Matrix.
Get Started With RAG Blueprint#
Follow the procedures in Get Started to begin using the NVIDIA RAG Blueprint quickly.
Experiment and test in the Web User Interface.
Use the Python Package to interact with the RAG system directly from Python code.
Explore the notebooks that demonstrate how to use the APIs. For details, refer to Notebooks.
Deployment Options for RAG Blueprint#
You can deploy the RAG Blueprint with Docker, Helm, or NIM Operator, and target dedicated hardware or a Kubernetes cluster. Use the following documentation to deploy the blueprint.
Important
Before you deploy, consider the following:
Self-hosted deployments require approximately 200 GB of free disk space for model downloads and caching.
First-time deployments take 15-30 minutes (Docker) or 60-70 minutes (Kubernetes) because large models must be downloaded.
Model downloads do not show progress bars; see the deployment guides for monitoring commands.
Subsequent deployments are much faster (2-15 minutes) because models are already cached.
For detailed requirements, refer to Support Matrix.
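You can check the disk-space requirement above before you deploy. The following is a minimal sketch and is not part of the blueprint. It uses Python's standard-library `shutil.disk_usage`. The helper names `free_disk_gb` and `has_enough_space` and the `REQUIRED_FREE_GB` constant are illustrative assumptions, and the threshold is the approximate figure stated above, not an exact limit.

```python
import shutil

# Approximate requirement from the note above (assumption: 1 GB = 10**9 bytes).
REQUIRED_FREE_GB = 200


def free_disk_gb(path: str = ".") -> float:
    """Return the free space, in GB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 10**9


def has_enough_space(free_gb: float, required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if `free_gb` meets or exceeds the required amount."""
    return free_gb >= required_gb


if __name__ == "__main__":
    # Point this at the filesystem where models will be downloaded and cached.
    free = free_disk_gb(".")
    if has_enough_space(free):
        print(f"OK: {free:.0f} GB free (about {REQUIRED_FREE_GB} GB needed).")
    else:
        print(f"Warning: only {free:.0f} GB free; about {REQUIRED_FREE_GB} GB is needed.")
```

Run the script from the directory, or pass the path, where the model cache will live. Free space on a different mount does not count toward the requirement.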
Alternative Deployment Options:
Use the Python Package (Library Mode) - Use the NVIDIA RAG Python package directly for programmatic access to the RAG system.
Containerless Deployment (Lite Mode) - A simplified, Python-only setup that uses Milvus Lite and NVIDIA cloud APIs, without Docker containers.
Developer Guide#
After you deploy the RAG Blueprint, you can customize it for your use cases.
Common configurations
Data Ingestion & Processing
Vector Database and Retrieval
Multimodal and Advanced Generation
Evaluation
Governance
Observability and Telemetry
Troubleshoot RAG Blueprint#
Reference#
Blog Posts#
NVIDIA NeMo Retriever Delivers Accurate Multimodal PDF Data Extraction 15x Faster
Finding the Best Chunking Strategy for Accurate AI Responses