Migration Guide for NVIDIA RAG Blueprint#
This documentation contains the information to upgrade NVIDIA RAG Blueprint from previous versions.
Tip
To navigate this page more easily, click the outline button at the top of the page. outline-button
Migration Guide: v2.5.1 to v2.6.0#
This guide summarizes the default changes and new capabilities introduced in NVIDIA RAG Blueprint v2.6.0. Review these items before upgrading an existing deployment.
Default Deployment Changes#
Vector database: Elasticsearch is now the default vector database. Milvus remains available as an optional backend. If you need to keep using Milvus, set
APP_VECTORSTORE_NAME=milvus, pointAPP_VECTORSTORE_URLto Milvus in both RAG and ingestor services, and follow Vector Database Configuration.Object store: SeaweedFS is now the default S3-compatible object store. Docker Compose deployments use named
rag-vol-*volumes for persistent data. If you are upgrading from a deployment that used host-mounted data underdeploy/compose/volumes/, follow Manage Persistent Data Volumes.LLM: The default LLM is now
nvidia/nemotron-3-super-120b-a12b. The v2.6.0 deployment files enable low-effort reasoning by default withLLM_ENABLE_THINKING=true,LLM_REASONING_BUDGET=256, andLLM_LOW_EFFORT=true. For latency-sensitive deployments, see Enable Reasoning for how to disable or tune reasoning.Embedding model: The default embedding model is now
nvidia/llama-nemotron-embed-vl-1b-v2. The text-onlynvidia/llama-nemotron-embed-1b-v2model remains available as an optional configuration. If you switch embedding models or dimensions, re-ingest your documents so the stored vectors match the retrieval embedder.OCR naming: OCR endpoint names now use
nemotron-ocr-v1instead ofnemoretriever-ocr-v1.
New Optional Features#
Agentic RAG: v2.6.0 adds an Agentic RAG plan-and-execute pipeline. It is disabled by default and can be enabled per request with the
agenticfield or by settingENABLE_AGENTIC_RAG=true. For details, see Agentic RAG.VLM reranker:
nvidia/llama-nemotron-rerank-vl-1b-v2is available as an opt-in reranker for image-heavy corpora. For details, see Change the LLM or Embedding Model.OpenShift Helm deployment: Red Hat OpenShift and OKD deployment is now documented for Helm. For details, see Deploy on OpenShift with Helm.
Evaluation and performance tooling: v2.6.0 adds the filesystem evaluation CLI under
scripts/eval/and therag-perfperformance benchmarking CLI underscripts/rag-perf/. For details, see Evaluate Your NVIDIA RAG Blueprint System and Benchmark the Performance of Your RAG System.
Migration Guide: v2.2.0 to v2.3.0#
This guide summarizes the key API changes and new features introduced in NVIDIA RAG Blueprint v2.3.0. Update your integrations to take advantage of the new confidence threshold filtering capability, enhanced summarization features, and prepare for upcoming deprecations.
API changes#
Confidence threshold filtering
A
confidence_threshold: floatfield has been added to the request schema of thePOST /generateandPOST /searchendpoints.This feature filters documents by their relevance score, improving response quality by excluding low-quality matches.
Works best when reranker is enabled to provide relevance scores.
Default value is 0.0 (no filtering).
Valid range is 0.0 to 1.0 (inclusive).
When confidence threshold is set but reranker is disabled, a warning will be logged.
Enhanced Summarization Features (v2.3.0)#
The summarization API has been significantly enhanced with new capabilities:
New summary_options Parameter#
In v2.2.0, only generate_summary: bool was available. v2.3.0 adds a new summary_options parameter with the following optional fields:
page_filter: Select specific pages to summarizeSupports ranges:
[[1, 10], [20, 30]]for specific page rangesSupports negative indexing:
[[-5, -1]]for last 5 pages (Pythonic style)Supports even/odd filters:
"even"or"odd"for pattern-based selection
shallow_summary: bool(default:false)Set to
truefor 10x faster text-only extraction (skips OCR/tables/images)Set to
falsefor full multimodal extraction
summarization_strategy: str | null(default:null)"single": Fastest - one-pass with truncation"hierarchical": Balanced - parallel processingnullor omit: Best quality - sequential refinement (iterative)
Example:
{
"collection_name": "my_collection",
"generate_summary": true,
"summary_options": {
"page_filter": [[1, 10], [-5, -1]],
"shallow_summary": true,
"summarization_strategy": "single"
}
}
Updated Environment Variables#
Token-based chunking (changed from character-based):
SUMMARY_LLM_MAX_CHUNK_LENGTH: Now9000tokens (default)SUMMARY_CHUNK_OVERLAP: Now400tokens (default)
New variable:
SUMMARY_MAX_PARALLELIZATION:20(default) - Global rate limit for concurrent summaries
Action Required:
If you’ve customized
SUMMARY_LLM_MAX_CHUNK_LENGTHorSUMMARY_CHUNK_OVERLAP, adjust values from characters to tokensTypical conversion: divide character count by ~4 for token estimate
The system now uses the same tokenizer as nv-ingest (
e5-large-unsupervised) for consistency
Migration Example#
Before (v2.2.0):
await ingestor.upload_documents(
collection_name="my_collection",
filepaths=["doc.pdf"],
generate_summary=True
)
After (v2.3.0) - with new optional features:
await ingestor.upload_documents(
collection_name="my_collection",
filepaths=["doc.pdf"],
generate_summary=True,
summary_options={
"page_filter": [[1, 10]], # NEW: Select specific pages
"shallow_summary": True, # NEW: Fast extraction
"summarization_strategy": "hierarchical" # NEW: Balanced approach
}
)
For complete details, see Document Summarization.
Migration Guide: v2.1.0 to v2.2.0#
This guide summarizes the key API changes and new features introduced in RAG v2.2.0. Update your integrations to take advantage of new summarization, metadata, and multi-collection capabilities, and to prepare for upcoming deprecations.
API changes#
Summarization support
A
generate_summary: boolfield has been added to thePOST /documentsandPATCH /documentsendpoints.A new
GET /summaryendpoint has been added to therag-server, allowing users to retrieve summaries of uploaded files.
Custom metadata support
POST /collectionswill be deprecated in favor ofPOST /collectionfor the ingestor-server.POST /collectionallows only a single collection to be created at a time.Developers can now define a custom metadata schema for all files uploaded to a collection.
POST /collectionswill be deprecated in a future release; developers are encouraged to migrate toPOST /collection.
Metadata information is now available in the responses of the
GET /collectionsandGET /documents APIs.
Multi-collection support
The
collection_names: List[str]field has been added to the request schema of thePOST /generateandPOST /searchendpoints, replacingcollection_name: str. The oldcollection_name: strfield has been removed.
Migration Guide: v2.0.0 to v2.1.0#
In RAG 2.1.0, the the behavior of POST /documents API which can be used to upload documents has changed. Developers can now upload documents in a non-blocking manner.
API Changes#
2.1 Changed Endpoints and Features#
Documents management:
Upload documents:
New field:
blocking: boolis added in the request schema. By default it is set toTrue. Developers are expected to call this API and then monitor the status of doc upload using/statusAPI.
Migration Guide: v1.0.0 to v2.0.0#
In RAG v1.0.0, a single server managed both ingestion and retrieval/generation APIs.
In RAG v2.0.0, the architecture has evolved to utilize two separate servers:
RAG Server - Manages retrieval and generation APIs.
Ingestion Server - Manages ingestion APIs.
Also the pipeline by default using on-prem models as default. Earlier it used to use NVIDIA cloud hosted models as default. The minimum hardware requirements for deploying the blueprint in its default settings is specified here. This guide outlines the key changes and steps required for migration.
1. Server Architecture Changes#
Feature |
RAG v1.0.0 (Single Server) |
RAG v2.0.0 (Separate Servers) |
|---|---|---|
API Hosting |
Single server for all APIs |
Two servers: RAG Server and Ingestion Server |
Retrieval & Generation |
Same server as ingestion |
Hosted separately in RAG Server |
Document Ingestion |
Same server as retrieval |
Hosted separately in Ingestion Server |
2. API Changes#
Updated OpenAPI schemas are available here and here.
2.1 New Endpoints and Features#
Collection Management:
Create Collection:
New Endpoint:
POST /collectionsDescription: Allows the creation of document collections. Previously, collections were implicitly created during document uploads.
Delete Collection:
New Endpoint:
DELETE /collections/{collection_name}Description: Enables deletion of entire collections.
Multi-file Document Upload:
Enhanced Endpoint:
POST /documentsDescription: Supports uploading multiple files in a single request. Previously, only single-file uploads were supported.
2.2 Endpoints Moved to Separate Servers#
API Endpoint |
RAG v1.0.0 |
RAG v2.0.0 |
|---|---|---|
|
Unified Server |
Now in Ingestion Server |
|
Unified Server |
Now in Ingestion Server |
|
Unified Server |
Now in Ingestion Server |
|
Unified Server |
Now in RAG Server |
|
Unified Server |
Now in RAG Server |
2.3 Breaking Endpoint Changes#
Ingestion API Enhancements:
PATCH /documentsintroduced in v2.0.0 for deleting & uploading documents in a single request.POST /documentswill throw error if a document exists in the collectionPOST /documentsnow accepts multiple files as a list instead a single file. The payload schema in v2.0.0 is non-backward compatible with v1.0.0.A separate
POST /collectionsAPI is now needed to be called to create a new collection. In v1.0.0, a new collection was automatically created whenPOST /documentswas called.New optional parameters introduced for all APIs to improve the runtime configurability of the pipeline.
DELETE /documentsAPI now accepts multiple files (List[str]) in the payload instead of a single string. This is again non-backward compatible with v1.0.0.
Document Search and Generate Enhancements:
searchandgenerateAPI now includes additional options added to refine retrieval results.Both of these APIs remain backward compatible with v1.0.0.
Health API remains unchanged:
/healthendpoint still exists in both servers and is backward compatible.
3. Migration Steps#
Step 1: Deploy Two Separate Containers#
Ensure that you run two separate containers for RAG Server and Ingestion Server by following the appropriate deployment guide.
Step 2: Update API Calls#
Modify API calls in your client applications:
For Retrieval & Generation, update requests to point to the RAG Server (e.g.,
http://rag-server:8081).For Document Ingestion, update requests to point to the Ingestion Server (e.g.,
http://ingestion-server:8082).
Step 3: Adjust API Payloads#
You can understand the updated schemas for APIs in v2.0.0 by following the notebooks.