Vector databases
Use this documentation to learn how NeMo Retriever Library stores extracted embeddings and uploads data to vector databases.
Overview
NeMo Retriever Library extracts text representations from various forms of content and ingests them into a vector database. LanceDB is the vector database backend for storing and retrieving the extracted embeddings.
The data upload task (vdb_upload) pulls extraction results to the Python client,
and then pushes them to LanceDB (embedded, in-process).
The vector database stores only the extracted text representations of ingested data. It does not store the embeddings for images.
Storing Extracted Images
To persist extracted images, tables, and chart renderings to disk or object storage, use the store task in addition to vdb_upload. The store task supports any fsspec-compatible backend (local filesystem, S3, GCS, and other object stores). For details, refer to Store Extracted Images.
NeMo Retriever Library supports uploading data by using the Ingestor.vdb_upload API. Currently, data upload is not supported through the CLI.
Why LanceDB?
LanceDB is optimized for low-latency retrieval in this stack:
- Lance columnar format — Data is stored in Lance files, an Arrow/Parquet-style analytics layout optimized for fast local scans and indexed retrieval. This reduces serialization overhead compared with a separate database server.
- IVF_HNSW_SQ index — Vectors are scalar-quantized (SQ) within an IVF-HNSW index, compressing them for faster search with lower memory bandwidth cost.
- Embedded runtime — LanceDB runs in-process, so you do not run extra vector-database containers for the default path. Fewer moving parts to start, configure, and maintain.
This combination of file format, index strategy, and in-process runtime supports the latency characteristics described in benchmarks.
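To make the scalar-quantization idea in the index description concrete, the following is a minimal sketch that assumes 8-bit codes and a fixed value range. It is not LanceDB's implementation; the function names `sq_encode` and `sq_decode` and the range bounds are hypothetical and chosen only for illustration.

```python
# Illustrative only: 8-bit scalar quantization of one vector.
# sq_encode / sq_decode are hypothetical helpers, not LanceDB APIs.

def sq_encode(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255."""
    scale = (hi - lo) / 255.0
    return bytes(
        min(255, max(0, round((x - lo) / scale))) for x in vector
    )


def sq_decode(codes, lo, hi):
    """Recover approximate floats from the 8-bit codes."""
    scale = (hi - lo) / 255.0
    return [lo + code * scale for code in codes]


vector = [0.12, -0.53, 0.98, -1.0, 0.0]
codes = sq_encode(vector, lo=-1.0, hi=1.0)
approx = sq_decode(codes, lo=-1.0, hi=1.0)

# Each component now occupies one byte; the rounding error per
# component is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vector, approx))
```

Under these assumptions, each component shrinks from a 4-byte float32 to a single byte, which is the source of the lower memory bandwidth cost, at the price of a small bounded approximation error.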
Upload to LanceDB
Uploads to LanceDB go through the LanceDB operator class in the client library, which you configure through the Python API.
Programmatic API (Python)
Pass vdb_op="lancedb" to vdb_upload, or construct a LanceDB instance and pass it as vdb_op:
from nemo_retriever.vdb.lancedb import LanceDB

vdb = LanceDB(
    uri="./lancedb_data",         # Path to LanceDB database directory
    table_name="nemo-retriever",  # Table name
    index_type="IVF_HNSW_SQ",     # Index type (default)
)

# Ingest (results: extraction results produced by ingestion)
vdb.run(results)

# Retrieve with precomputed query vectors (queries: query embeddings)
docs = vdb.retrieval(queries, top_k=10)
Query ingested tables with LanceDB.retrieval() (precomputed vectors) or with Retriever.query (embeds the query string for you). Optional where predicates and client-side filters are documented under Metadata and filtering.
When you use the Ingestor with vdb_upload, pass vdb_op="lancedb" or a LanceDB instance so that uploads target LanceDB. If you omit vdb_op, the ingestion Python client still defaults the argument to the string "milvus" for backward compatibility, and that value does not select the LanceDB operator. Always pass vdb_op="lancedb" when you intend to use LanceDB.
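One way to avoid silently falling back to the "milvus" default is to route the argument through a small guard before calling vdb_upload. The sketch below is a hypothetical helper (`resolve_vdb_op` is not part of the library) and assumes only what this page states: that vdb_op accepts the string "lancedb" or a LanceDB instance.

```python
# Hypothetical guard, not a NeMo Retriever Library API.

def resolve_vdb_op(vdb_op=None):
    """Return the value to pass as vdb_op, refusing to rely on the default."""
    if vdb_op is None:
        raise ValueError(
            'vdb_op is required: pass "lancedb" or a LanceDB instance; '
            'the client default is "milvus"'
        )
    return vdb_op


# Usage sketch (Ingestor construction omitted):
# ingestor.vdb_upload(vdb_op=resolve_vdb_op("lancedb"))
```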
Semantic retrieval
Semantic retrieval uses dense embeddings to find content that is similar in meaning to a query. In NeMo Retriever Library, the default vector path is LanceDB. Use these resources together with the sections on this page:
- Metadata and filtering for sidecar metadata at ingest and query-time filters
- Concepts for broader pipeline and search patterns
- Use the NeMo Retriever Library Python API for Retriever.query and LanceDB.retrieval parameters
Evaluation — For evaluation and metrics, refer to Evaluate on your data.
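As a rough illustration of what "similar in meaning" reduces to numerically, the sketch below ranks a toy corpus by cosine similarity between dense vectors. It is a brute-force scan over hand-written three-dimensional vectors, not the indexed search that LanceDB performs, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math

# Illustrative brute-force semantic search over toy embeddings.
# Real embeddings come from an embedding model and have many more dimensions.

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


corpus = {
    "GPU memory bandwidth": [0.9, 0.1, 0.0],
    "Quarterly revenue table": [0.0, 0.2, 0.9],
    "CUDA kernel launch": [0.8, 0.3, 0.1],
}
hits = top_k([1.0, 0.2, 0.0], corpus, k=2)
# hits == ["GPU memory bandwidth", "CUDA kernel launch"]
```

An approximate index such as IVF_HNSW_SQ aims to return nearly the same ranking without scoring every stored vector.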
Metadata and filtering
This page covers LanceDB upload and retrieval. The metadata documentation is not duplicated here; refer to the following resources instead:
- Published guide — Custom metadata and filtering (sidecar meta_* on vdb_upload, compact JSON in LanceDB, server-side where on Retriever.query, and client-side filter_hits_by_content_metadata).
- Canonical reference — Vector DB operators and LanceDB — Metadata filtering in nemo_retriever/src/nemo_retriever/vdb/README.md (operator behavior and examples).
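The general shape of client-side filtering over compact JSON metadata can be sketched as follows. This is not the library's filter_hits_by_content_metadata function; the helper `keep_matching_hits`, the hit layout (a dict with `text` and `content_metadata` keys), and the `meta_department` field are all assumptions made for illustration. Consult the guides listed above for the actual behavior.

```python
import json

# Hypothetical client-side filter over hits whose metadata is stored
# as a compact JSON string. The hit layout here is an assumption.

def keep_matching_hits(hits, required):
    """Keep hits whose decoded metadata matches every required key/value."""
    kept = []
    for hit in hits:
        metadata = json.loads(hit["content_metadata"])
        if all(metadata.get(key) == value for key, value in required.items()):
            kept.append(hit)
    return kept


def compact(obj):
    """Serialize without extra whitespace, as a compact JSON string."""
    return json.dumps(obj, separators=(",", ":"))


hits = [
    {"text": "Q3 summary", "content_metadata": compact({"meta_department": "finance"})},
    {"text": "Driver notes", "content_metadata": compact({"meta_department": "engineering"})},
]
finance_hits = keep_matching_hits(hits, {"meta_department": "finance"})
```

A server-side where predicate narrows results before they leave the database, whereas a client-side filter like this one runs after retrieval, so it can reduce the number of hits below the requested top_k.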
LanceDB deployment characteristics
| Aspect | LanceDB |
|---|---|
| Runtime model | Embedded (in-process) |
| External services | None for the vector store itself |
| Helm / extra stack | Not required for LanceDB (default path) |
| Index type | IVF_HNSW_SQ (default) |
| Persistence | Lance files on disk under your configured URI |
Upload to a Custom Data Store
You can ingest to other data stores by using the Ingestor.vdb_upload method;
however, you must configure those data stores and their connections yourself.
NeMo Retriever Library does not provide connections to other data stores.
Vector database partners
NeMo Retriever Library integrates with vector databases used for RAG collections. The sections above focus on LanceDB as the shipped backend. This section lists that backend and describes how partner or custom VDB subclasses plug into graph operators. For chunking behavior, see Chunking.
Backends with VDB implementations (retriever adapters)
NeMo Retriever graph operators IngestVdbOperator and RetrieveVdbOperator wrap concrete classes that implement the VDB interface (run for ingest, retrieval for search). The library ships one first-party backend:
| Backend | Project | Implementation |
|---|---|---|
| LanceDB | LanceDB · documentation | lancedb.py — pass vdb_op="lancedb" (recommended). |
On the ingestion Python client's Ingestor.vdb_upload, omitting vdb_op does not select LanceDB; see Upload to LanceDB.
Pass vdb_op="lancedb" or a LanceDB instance. To integrate another vector database, subclass VDB and pass your operator instance as vdb (see Build a Custom Vector Database Operator).
RAG Blueprint and partner vector stores
Some deployments use a different vector store than the default LanceDB path on this page—for example the NVIDIA RAG Blueprint (Docker Compose or Helm) or a partner package that subclasses the same VDB interface. Use the following public references when you wire those stacks to ingestion and retrieval:
| Vector store | Where to configure or implement |
|---|---|
| Elasticsearch | Configure Elasticsearch as Your Vector Database for NVIDIA RAG Blueprint — compose profiles, environment variables, and Helm notes for the RAG Blueprint. |
| Pinecone | Customize your vector database (Pinecone + NVIDIA RAG) in the pinecone-io/nvidia-pinecone-rag repository. |
| Teradata | TeradataVDB (NVIDIA NIM Ingest integration) — teradatagenai.vector_store.teradataVDB.TeradataVDB implements the NeMo Retriever ingestion VDB abstract class for Teradata Vector Store. |
Testing and release cadence for these integrations follow the owning project (RAG Blueprint, Pinecone sample repo, or Teradata Generative AI package), not the first-party LanceDB operator validated for NeMo Retriever Library on this page.
More information (embeddings & custom VDB)
- Custom metadata and filtering and the package VDB README (metadata filtering)
- Multimodal embeddings (VLM)
- NeMo Retriever Text Embedding NIM
- NVIDIA NIM catalog for embedding and retrieval-related NIMs
Important
NVIDIA documents and validates the first-party LanceDB operator for this library. If you integrate a different vector store, you are responsible for testing and maintaining that integration.
To implement a custom operator, follow the VDB abstract interface described in Build a Custom Vector Database Operator.
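The run-for-ingest and retrieval-for-search contract can be sketched with a toy in-memory operator. The `VDB` class below is a local stand-in written for this example, not an import of the library's abstract class, and the record layout (dicts with `vector` and `text` keys), the method signatures, and the dot-product scoring are assumptions. The real interface is defined in Build a Custom Vector Database Operator.

```python
from abc import ABC, abstractmethod

# Stand-in for the library's VDB abstract class; the real class may
# declare additional methods or different signatures.
class VDB(ABC):
    @abstractmethod
    def run(self, records):
        """Ingest records into the store."""

    @abstractmethod
    def retrieval(self, queries, top_k=10):
        """Return the top_k records for each query vector."""


class InMemoryVDB(VDB):
    """Toy operator that keeps records in a Python list."""

    def __init__(self):
        self.rows = []

    def run(self, records):
        # Assumed record shape: {"vector": [...], "text": "..."}
        self.rows.extend(records)

    def retrieval(self, queries, top_k=10):
        def score(query, row):
            return sum(q * v for q, v in zip(query, row["vector"]))

        results = []
        for query in queries:
            ranked = sorted(self.rows, key=lambda row: score(query, row), reverse=True)
            results.append(ranked[:top_k])
        return results


store = InMemoryVDB()
store.run([
    {"vector": [1.0, 0.0], "text": "first chunk"},
    {"vector": [0.0, 1.0], "text": "second chunk"},
])
docs = store.retrieval([[0.9, 0.1]], top_k=1)
```

A production operator would replace the list with calls to the target database's client, and testing and maintaining that integration remains your responsibility as noted above.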