Release Notes for NeMo Retriever Extraction
This documentation contains the release notes for NeMo Retriever extraction.
Note
NeMo Retriever extraction is also known as NVIDIA Ingest and nv-ingest.
Release 26.01 (26.1.2)
The NeMo Retriever extraction 26.01 release adds new hardware and software support, and other improvements.
To upgrade the Helm Charts for this version, refer to NV-Ingest Helm Charts.
Highlights
This release contains the following key changes:
- Added functional support for H200 NVL. For details, refer to Support Matrix.
- All Helm deployments for Kubernetes now use NVIDIA NIM Operator. For details, refer to NV-Ingest Helm Charts.
- Updated RIVA NIM to version 1.4.0. For details, refer to Extract Speech.
- Updated VLM NIM to nemotron-nano-12b-v2-vl. For details, refer to Extract Captions from Images.
- Added VLM caption prompt customization parameters, including reasoning control. For details, refer to Caption Images and Control Reasoning.
- Added support for the nemotron-parse model which replaces the nemoretriever-parse model. For details, refer to Advanced Visual Parsing.
- Support for paddleocr is now deprecated.
- The meta-llama/Llama-3.2-1B tokenizer is now pre-downloaded, so you can run token-based splitting without making a network request (see the sketch after this list). For details, refer to Split Documents.
- For scanned PDFs, added specialized extraction strategies. For details, refer to PDF Extraction Strategies.
- Added support for LanceDB. For details, refer to Upload to a Custom Data Store.
- The V2 API is now available and is the default processing pipeline. The response format remains backwards compatible. You can enable the V2 API by passing message_client_kwargs={"api_version": "v2"}, as shown in the sketch after this list. For details, refer to API Reference.
- Large PDFs are now automatically split into chunks and processed in parallel, delivering faster ingestion for long documents. For details, refer to PDF Pre-Splitting.
- Issues maintaining extraction quality while processing very large files are now resolved with the V2 API. For details, refer to V2 API Guide.
- Updated the embedding task to support embedding on custom content fields like the results of summarization functions. For details, refer to Use the Python API.
- User-defined function summarization now uses nemotron-mini-4b-instruct, which provides significant speed improvements. For details, refer to User-defined Functions and NV-Ingest UDF Examples.
- In the Ingestor.extract method, the defaults for extract_text and extract_images are now set to true for consistency with extract_tables and extract_charts. For details, refer to Use the Python API.
- The table-structure profile is no longer available; it is now part of the default profile. For details, refer to Profile Information.
- New documentation Why Throughput Is Dataset-Dependent.
- New documentation Add User-defined Stages.
- New documentation Add User-defined Functions.
- New documentation Resource Scaling Modes.
- New documentation NimClient Usage.
- New documentation Use the API (V2).
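The following is a minimal sketch of how several of these changes fit together in the Python client: enabling the V2 API, the new Ingestor.extract defaults, and token-based splitting with the pre-downloaded tokenizer. It assumes the nv-ingest-client package is installed and a running service is reachable on localhost; only message_client_kwargs={"api_version": "v2"} and the tokenizer name come from these notes, while the host, file glob, and chunk sizes are illustrative.

```python
# Minimal sketch (assumes nv-ingest-client is installed and a service is
# reachable on localhost). Hostnames, file paths, and chunk sizes are
# illustrative; refer to API Reference and Use the Python API for the
# authoritative parameter names.
from nv_ingest_client.client import Ingestor

ingestor = (
    Ingestor(
        message_client_hostname="localhost",          # illustrative host
        message_client_kwargs={"api_version": "v2"},  # opt in to the V2 API
    )
    .files("data/*.pdf")
    # extract_text and extract_images now default to true, matching
    # extract_tables and extract_charts; they are listed here only for clarity.
    .extract(
        extract_text=True,
        extract_images=True,
        extract_tables=True,
        extract_charts=True,
    )
    # Token-based splitting with the pre-downloaded meta-llama/Llama-3.2-1B
    # tokenizer; no network request is needed to fetch it.
    .split(tokenizer="meta-llama/Llama-3.2-1B", chunk_size=512, chunk_overlap=64)
    .embed()
)

results = ingestor.ingest()
```

For the V2 response format and additional pipeline options, refer to Use the API (V2) and the V2 API Guide.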
Fixed Known Issues
The following known issues are fixed in this version:
- A10G support is restored. To use A10G hardware, use release 26.1.2 or later. For details, refer to Support Matrix.
- L40S support is restored. To use L40S hardware, use release 26.1.2 or later. For details, refer to Support Matrix.
- The page number field in the content metadata now starts at 1 instead of 0, so page numbers are no longer off by one from what you would expect. For details, refer to Content Metadata.
- Support for batches that include individual files larger than approximately 400 MB is restored. This includes audio files and PDFs.
All Known Issues
The following are the known issues for NeMo Retriever extraction:
- Advanced visual parsing is not supported on RTX Pro 6000, B200, or H200 NVL. For details, refer to Advanced Visual Parsing and Support Matrix.
- The Page Elements NIM (nemoretriever-page-elements-v3:1.7.0) may intermittently fail during inference under high-concurrency workloads. This happens when Triton's dynamic batching combines requests that exceed the model's maximum batch size, a situation more commonly seen in multi-GPU setups or large ingestion runs. In these cases, extraction fails for the impacted documents. A correction is planned for nemoretriever-page-elements-v3:1.7.1.
Release Notes for Previous Versions
| 26.1.1 | 25.9.0 | 25.6.3 | 25.6.2 | 25.4.2 | 25.3.0 | 24.12.1 | 24.12.0 |