Milvus Collection Schema Requirements for NVIDIA RAG Blueprint
When you create a collection in Milvus to use with the NVIDIA RAG Blueprint server, its schema must meet specific requirements to stay compatible with the search and generate APIs. This document outlines the required fields and their configurations.
Note
If you ingest data with either LangChain’s Milvus integration or NVIDIA’s nv-ingest tool, these schema requirements are handled for you automatically: both tools create and configure the collection with the correct schema fields. You only need to apply these requirements when you create collections manually or use custom ingestion methods.
Required Schema Fields
Your Milvus collection schema must include the following fields (the content metadata field is optional):
Vector Field
  Name: vector
  Description: Stores the document embeddings
Text Field
  Name: text
  Description: Stores the document content
Source Field
  Name: source
  Can be configured in two ways:
    Simple string format: Directly store the filename
    JSON format: Store a JSON object with a source_id field containing the filename
      { "source_id": "document.pdf" }
Content Metadata Field (Optional)
  Name: content_metadata
  Type: JSON (DataType.JSON)
  Description: Stores additional metadata about the document content
  Can be used for filtering during search and retrieval
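The field requirements above can be checked before a manually created collection is used. The following is a minimal sketch, not part of the RAG Blueprint or pymilvus: `check_fields` is a hypothetical helper that takes a plain mapping of field name to type name. The type names mirror pymilvus `DataType` members, and treating the "simple string" source format as `VARCHAR` is an assumption.

```python
# Hypothetical helper: not part of the RAG Blueprint or pymilvus.
REQUIRED_FIELDS = ("vector", "text", "source")

# Type names mirror pymilvus DataType members. Mapping the "simple string"
# source format to VARCHAR is an assumption.
ALLOWED_TYPES = {
    "source": {"VARCHAR", "JSON"},
    "content_metadata": {"JSON"},
}


def check_fields(fields: dict[str, str]) -> list[str]:
    """Return the problems found in a {field name: type name} mapping."""
    problems = []
    # Every required field must be present by its exact name.
    for name in REQUIRED_FIELDS:
        if name not in fields:
            problems.append(f"missing required field: {name}")
    # Fields with a constrained type are checked only if they are present,
    # so the optional content_metadata field may be left out.
    for name, allowed in ALLOWED_TYPES.items():
        if name in fields and fields[name] not in allowed:
            expected = " or ".join(sorted(allowed))
            problems.append(f"{name} must be {expected}, got {fields[name]}")
    return problems
```

An empty list means the mapping satisfies the names and types listed above. The sketch does not check the vector dimension or the text length limit.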
Example Schema Definition
Here’s an example of a complete schema definition that meets all requirements:
{
    'auto_id': True,
    'description': '',
    'fields': [
        {
            'name': 'pk',
            'description': '',
            'type': DataType.INT64,
            'is_primary': True,
            'auto_id': True
        },
        {
            'name': 'vector',
            'description': '',
            'type': DataType.FLOAT_VECTOR,
            'params': {'dim': 2048}
        },
        {
            'name': 'source',
            'description': '',
            'type': DataType.JSON
        },
        {
            'name': 'content_metadata',
            'description': '',
            'type': DataType.JSON
        },
        {
            'name': 'text',
            'description': '',
            'type': DataType.VARCHAR,
            'params': {'max_length': 65535}
        }
    ],
    'enable_dynamic_field': True
}
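For custom ingestion, each inserted row has to match this schema. The sketch below shows one way a row could be assembled in plain Python. `make_row` and `EMBEDDING_DIM` are hypothetical names introduced for illustration; the dimension of 2048 is taken from the example schema and should match your embedding model.

```python
from typing import Any, Optional

EMBEDDING_DIM = 2048  # matches 'dim' in the example schema


def make_row(
    embedding: list[float],
    text: str,
    filename: str,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build one row for the example schema (hypothetical helper).

    'pk' is omitted because the schema sets auto_id to True.
    """
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}"
        )
    return {
        "vector": embedding,
        "text": text,
        # JSON format of the source field: the filename goes in source_id.
        "source": {"source_id": filename},
        # Optional metadata; an empty object when none is supplied.
        "content_metadata": metadata or {},
    }
```

A dict shaped like this follows the row format that pymilvus insert calls accept as a list of dicts, but check this against the pymilvus version you use.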
Usage with RAG Server
When using this schema with the RAG server:
The search API will use the vector field for similarity search
The text field will be used to return the actual content
The source field will be used to track document sources
The content_metadata field can be used for filtering using the filter_expr parameter in search and generate APIs
For more information about using metadata for filtering, refer to the Custom Metadata Documentation.
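As a rough illustration of filtering on content_metadata, the sketch below builds a Milvus boolean expression string that could be supplied as filter_expr. `metadata_filter` is a hypothetical helper, and the keys `author` and `page` are made-up examples, not fields the blueprint defines. It assumes the Milvus syntax for JSON fields, where a key is addressed as `field["key"]`.

```python
import json
from typing import Union


def metadata_filter(key: str, value: Union[str, int, float]) -> str:
    """Build a Milvus expression matching one content_metadata key."""
    # json.dumps double-quotes and escapes strings and leaves numbers bare.
    quoted_key = json.dumps(key, ensure_ascii=False)
    quoted_value = json.dumps(value, ensure_ascii=False)
    return f"content_metadata[{quoted_key}] == {quoted_value}"


# Hypothetical metadata keys, for illustration only.
filter_expr = metadata_filter("author", "jane")
page_expr = metadata_filter("page", 3)
```

Building the string through `json.dumps` avoids hand-written quoting mistakes when a value contains quotes or backslashes.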