Minimum System Requirements for NVIDIA RAG Blueprint#
This documentation contains the system requirements for the NVIDIA RAG Blueprint.
Important
You can deploy the RAG Blueprint with Docker, Helm, or NIM Operator, and target dedicated hardware or a Kubernetes cluster. Some requirements are different depending on your target system and deployment method.
Disk Space Requirements#
Important
Ensure that you have at least 200GB of available disk space before you deploy the RAG Blueprint. This space is required for the following:
NIM model downloads and caching (largest component, ~100-150GB)
Container images (~20-30GB)
Vector database data and indices
Application logs and temporary files
Insufficient disk space causes deployment failures during model downloads or runtime operations.
Operating System#
For the RAG Blueprint you need the following operating system:
Ubuntu 22.04 OS
Driver Versions#
For the RAG Blueprint you need the following drivers:
GPU Driver - 560 or later
CUDA version - 12.9 or later
For details, see NVIDIA NIM for LLMs Software.
Hardware Requirements (Docker)#
By default, the RAG Blueprint deploys the NIM microservices locally (self-hosted). The default LLM (nemotron-3-super-120b-a12b) requires 2 GPUs (FP8 TP2). You need one of the following:
3 x H100
3 x B200
3 x RTX PRO 6000
Tip
You can also modify the RAG Blueprint to use NVIDIA-hosted NIM microservices.
Tip
No GPU Available? Try the Containerless Deployment (Lite Mode) which requires no GPU hardware and uses NVIDIA cloud APIs for all processing.
Hardware Requirements (Kubernetes)#
To install the default RAG Blueprint Helm chart on Kubernetes, you need one of the following:
8 x H100-80GB
8 x B200
8 x RTX PRO 6000
5 x H100-80GB (with Multi-Instance GPU)
Optional GPU-backed services increase the requirement. Plan for one additional GPU for each optional service that you enable, such as VLM generation, VLM captioning, VLM reranking, Nemotron Parse, or audio processing, unless you use MIG slicing or another explicit sharing strategy.
Hardware requirements for self-hosting all NVIDIA NIM microservices#
The following are requirements and recommendations for the individual components of the RAG Blueprint:
Pipeline operation – 1x L40 GPU or similar recommended. This is required if you use Milvus (optional) as the vector database with GPU acceleration. The default Elasticsearch VDB does not require a GPU. If you change the vector backend or enable optional GPU acceleration for Elasticsearch vector indexing, refer Elasticsearch Configuration and confirm GPU requirements for that configuration.
LLM NIM (nemotron-3-super-120b-a12b) – Refer to the Support Matrix.
Embedding NIM (llama-nemotron-embed-vl-1b-v2) – Refer to the Support Matrix for your deployment target.
Reranking NIM (llama-nemotron-rerank-1b-v2): Refer to the Support Matrix.
VLM Reranking NIM (llama-nemotron-rerank-vl-1b-v2, optional): Refer to the Support Matrix.
Nemotron OCR (Default): Refer to the Support Matrix.
NVIDIA NIMs for Object Detection:
Nemotron Page Elements v3 Support Matrix
Nemotron Graphic Elements v1 Support Matrix
Nemotron Table Structure v1 Support Matrix
Tip
Nemotron OCR is now the default OCR service. To use the legacy Paddle OCR instead, see OCR Configuration Guide.