Prerequisites & Support Matrix
Before you begin using NeMo Retriever Library, confirm your software stack, deployment hardware, and—if you use them—advanced features (audio and video, Nemotron Parse, VLM image captioning, reranking) against the guidance in this page.
Software Requirements
- Linux operating systems (Ubuntu 22.04 or later recommended)
- CUDA Toolkit (NVIDIA Driver >= 535, CUDA >= 12.2)
- Python 3.12: required to install and run the NeMo Retriever Library Python API, CLI, and related packages from PyPI (for example, with pip or uv). Older Python versions fail dependency resolution without a clear error.
- UV Python package and environment manager (optional; recommended for creating isolated environments)
- For audio and video, ffmpeg and ffprobe must be on PATH (for example, sudo apt-get install -y --no-install-recommends ffmpeg on Debian/Ubuntu). The ffmpeg-python and nemo-retriever[multimedia] packages do not install these binaries. On Helm with package-repository access, set service.installFfmpeg=true. For air-gapped clusters, see Air-gapped and disconnected deployment.
Note
When you use UV, create the environment with Python 3.12 — for example, uv venv --python 3.12. This matches the requires-python metadata in the library packages.
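Before installing, a quick preflight can confirm the software items above. The following is a minimal sketch, not a tool shipped with NeMo Retriever Library: the helper names are hypothetical, it treats Python 3.12 as the only target (matching the Note), and it expects you to pass in the banner text printed by nvidia-smi rather than querying the driver itself.

```python
import re
import shutil
import sys

# Targets from the Software Requirements list above.
REQUIRED_PYTHON = (3, 12)
MIN_DRIVER_MAJOR = 535
MIN_CUDA = (12, 2)


def version_tuple(text: str) -> tuple[int, ...]:
    """Convert a dotted version such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def python_ok(version_info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.12 (assumed to be the sole target)."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON


def parse_nvidia_smi_header(text: str) -> tuple[str, str]:
    """Extract the 'Driver Version' and 'CUDA Version' values from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    if driver is None or cuda is None:
        raise ValueError("could not find driver and CUDA versions")
    return driver.group(1), cuda.group(1)


def driver_ok(driver_version: str) -> bool:
    """Only the major driver version is compared against 535."""
    return version_tuple(driver_version)[0] >= MIN_DRIVER_MAJOR


def cuda_ok(cuda_version: str) -> bool:
    """Compare (major, minor) so that 12.10 correctly counts as newer than 12.2."""
    return version_tuple(cuda_version)[:2] >= MIN_CUDA


def missing_media_tools() -> list[str]:
    """Return the audio/video binaries that shutil.which cannot find on PATH."""
    return [tool for tool in ("ffmpeg", "ffprobe") if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python 3.12:", python_ok())
    print("Missing ffmpeg tools:", missing_media_tools() or "none")
```

For example, parse_nvidia_smi_header("Driver Version: 535.104.05   CUDA Version: 12.2") returns ("535.104.05", "12.2"), and both driver_ok and cuda_ok accept those values. The ffmpeg check matters only if you plan to use audio or video extraction.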
Hardware Requirements
The full ingestion pipeline is designed to consume significant CPU and memory resources to achieve maximal parallelism. Resource usage scales up to the limits of your deployed system.
For per-feature GPU memory, disk, and co-residency rules, refer to Model hardware requirements below.
Recommended Production Deployment Specifications
- System Memory: At least 256 GB RAM
- CPU Cores: At least 32 CPU cores
- GPU: NVIDIA GPU with at least 24 GB VRAM (for example, A100, H100, L40S, or equivalent)
Note
Using less powerful systems or lower resource limits is still viable, but performance will suffer.
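To see how a host compares with the recommended specifications, a small check like the following can help. It is a hypothetical sketch for Linux: it treats the GB figures above as GiB, reads RAM through os.sysconf, counts logical CPUs with os.cpu_count (which can exceed physical cores), and expects you to supply the VRAM of your largest GPU yourself.

```python
import os
from dataclasses import dataclass

# Recommended production minimums from the list above (GB treated as GiB).
MIN_RAM_GIB = 256
MIN_CPU_CORES = 32
MIN_GPU_VRAM_GIB = 24


@dataclass
class HostResources:
    ram_gib: float
    cpu_cores: int
    gpu_vram_gib: float  # largest single GPU, supplied by the caller


def read_host(gpu_vram_gib: float) -> HostResources:
    """Read installed RAM and the logical CPU count on Linux."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # os.cpu_count() reports logical CPUs, which can exceed physical cores.
    return HostResources(ram_bytes / 2**30, os.cpu_count() or 0, gpu_vram_gib)


def shortfalls(host: HostResources, ram_slack: float = 0.95) -> list[str]:
    """List the recommendations the host misses; an empty list means none.

    The kernel reports slightly less memory than is physically installed, so RAM
    is compared against MIN_RAM_GIB * ram_slack (an arbitrary 5% margin).
    """
    problems = []
    if host.ram_gib < MIN_RAM_GIB * ram_slack:
        problems.append(f"RAM {host.ram_gib:.0f} GiB < {MIN_RAM_GIB} GiB")
    if host.cpu_cores < MIN_CPU_CORES:
        problems.append(f"{host.cpu_cores} CPU cores < {MIN_CPU_CORES}")
    if host.gpu_vram_gib < MIN_GPU_VRAM_GIB:
        problems.append(f"GPU VRAM {host.gpu_vram_gib:.0f} GiB < {MIN_GPU_VRAM_GIB} GiB")
    return problems


if __name__ == "__main__":
    print(shortfalls(read_host(gpu_vram_gib=24)) or "meets recommended specs")
```

Treat any reported shortfall as a performance warning rather than a blocker, since smaller systems still work with reduced throughput.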
Resource Consumption Notes
- The pipeline performs runtime allocation of parallel resources based on system configuration
- Memory usage can approach full system capacity when processing large documents
- CPU utilization scales with the number of concurrent processing tasks
- GPU is required for inference using Hugging Face models or NIMs
- GPU is NOT required for build.nvidia.com hosted inference
Scaling Considerations
For production deployments processing large volumes of documents, consider:
- Higher memory configurations for processing large PDF files or image collections
- Additional CPU cores for improved parallel processing
- Multiple GPUs for distributed processing workloads
Environment Requirements
Ensure your deployment environment meets these specifications before running the full pipeline. Resource-constrained environments may experience performance degradation.
Core and Advanced Pipeline Features
The core extraction pipeline features of NeMo Retriever Library run on a single A10G (24 GB) or better GPU.
Default Helm NIMs
The production Helm chart enables these NIM microservices by default (for example via nimOperator.*.enabled=true):
| Helm flag | NIM | Role |
|---|---|---|
| page_elements | nemotron-page-elements-v3 | Page layout and element detection |
| table_structure | nemotron-table-structure-v1 | Table structure extraction |
| ocr | nemotron-ocr-v2 | Image OCR |
| vlm_embed | llama-nemotron-embed-vl-1b-v2 | Multimodal (VL) embedding |
Nemotron OCR v2 language mode
Note
Local Hugging Face inference: When you deploy locally with Hugging Face model weights (for example, pip install "nemo-retriever[local]" and GPU inference without remote OCR NIM URLs), the default OCR engine is Nemotron OCR v2, which runs in multilingual mode by default. For CLI flags and API parameters, see Nemotron OCR v2 — language mode. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.
Helm / NIM: The chart deploys the core OCR NIM under nimOperator.ocr. For image defaults, multilingual behavior, and upgrade notes, see Nemotron OCR v2 — language mode in the Helm chart README.
Default VL embedder container and model for release deployments:
- Image: nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0
- Model ID: nvidia/llama-nemotron-embed-vl-1b-v2
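If you pin images yourself (for example, in Helm values or an air-gapped registry mirror), a small check can confirm that the embedder reference matches the release default listed above. This is a hypothetical helper, not part of the chart or library; it assumes fully qualified references of the form registry/repository:tag and does not handle digests.

```python
from typing import NamedTuple

# Release default listed above.
DEFAULT_VL_EMBED_IMAGE = "nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0"


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository:tag'; digest references (@sha256:...) are not handled."""
    registry, _, rest = ref.partition("/")
    repository, sep, tag = rest.rpartition(":")
    if not sep:  # no explicit tag in the reference
        repository, tag = rest, "latest"
    return ImageRef(registry, repository, tag)


def is_release_default(ref: str) -> bool:
    """True when registry, repository, and tag all match the release default."""
    return parse_image_ref(ref) == parse_image_ref(DEFAULT_VL_EMBED_IMAGE)
```

Splitting the tag off with rpartition keeps a registry port (for example, host:5000) from being mistaken for a tag, because the registry is removed before the tag is parsed.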
Optional Helm NIMs (not auto-wired)
These NIM microservices are optional for the default extraction pipeline. The retriever service does not call them until you enable the matching pipeline stage (reranker, Nemotron Parse, caption, or audio). For 26.05 production, disable keys you do not need (see Recommended minimal install (26.05)). Set nimOperator.<key>.enabled=true when you want that NIM reconciled. Chart keys are in the NeMo Retriever Helm chart README.
| Helm flag | NIM | Role |
|---|---|---|
| rerankqa | llama-nemotron-rerank-vl-1b-v2 | Reranking for improved retrieval accuracy |
| nemotron_parse | nemotron-parse | Optional PDF extract_method="nemotron_parse" (default PDF extraction uses pdfium) |
| nemotron_3_nano_omni_30b_a3b_reasoning | nemotron-3-nano-omni-30b-a3b-reasoning | Supported image captioning for 26.05 when you enable the caption stage |
| audio | parakeet-1-1b-ctc-en-us | Audio and video transcription |
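Because each NIM is toggled with nimOperator.<key>.enabled, the chart keys from the default and optional tables can be turned into Helm --set arguments. The following is a hypothetical sketch: it keeps the four default NIMs on, enables only the optional NIMs you name, and explicitly sets every other optional key to false, in line with disabling keys you do not need.

```python
# Chart keys from the default and optional NIM tables on this page.
DEFAULT_NIMS = ("page_elements", "table_structure", "ocr", "vlm_embed")
OPTIONAL_NIMS = (
    "rerankqa",
    "nemotron_parse",
    "nemotron_3_nano_omni_30b_a3b_reasoning",
    "audio",
)


def helm_set_flags(enable_optional=()) -> list[str]:
    """Build --set arguments: default NIMs on, named optional NIMs on, the rest off."""
    unknown = set(enable_optional) - set(OPTIONAL_NIMS)
    if unknown:
        raise ValueError(f"unknown optional NIM keys: {sorted(unknown)}")
    flags = []
    for key in DEFAULT_NIMS + OPTIONAL_NIMS:
        enabled = key in DEFAULT_NIMS or key in enable_optional
        flags += ["--set", f"nimOperator.{key}.enabled={str(enabled).lower()}"]
    return flags


if __name__ == "__main__":
    print(" ".join(helm_set_flags(["rerankqa"])))
```

You could append the resulting arguments to a helm upgrade --install command with your own release name and chart reference. Enabling a NIM here only reconciles the microservice; the retriever service still does not call it until you enable the matching pipeline stage.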
Image captioning (26.05)
For 26.05, use nemotron_3_nano_omni_30b_a3b_reasoning when you enable the caption stage (hosted model ID nvidia/nemotron-3-nano-omni-30b-a3b-reasoning). The Helm key is in the optional NIMs table above.
Optional features listed in the table above require additional GPUs, disk space, and feature-specific system dependencies beyond the four default NIMs.
For published NIM model IDs and deployment-specific constraints, use the product support matrices linked under Related Topics below.
Model Hardware Requirements
NeMo Retriever Library supports the GPUs listed in the table below, subject to the per-feature constraints it shows. The size figures in the table mean the following:
- HF model weights: approximate Hugging Face checkpoint footprint (files such as model*.safetensors, weights.pth, or other published weight bundles in the model repository). Values are rounded from the current public file listing and can change when the repository is updated.
- NIM disk space: approximate container and on-disk model cache for self-hosted NIM microservices (not the same as the HF download size). For Nemotron 3 Nano Omni captioning, see the NVIDIA NIM for Vision Language Models support matrix.
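As a rough planning aid, the HF model weight sizes from the table below can be added up for the features you plan to run. This sketch copies those approximate values and estimates the Hugging Face download size only, not NIM disk space; the optional feature labels (for example, omni-fp8) are hypothetical names chosen for illustration.

```python
# Approximate HF weight sizes in GiB, copied from the hardware table.
CORE_WEIGHTS_GIB = {
    "llama-nemotron-embed-vl-1b-v2": 3.1,
    "nemotron-page-elements-v3": 0.41,
    "nemotron-table-structure-v1": 0.81,
    "nemotron-ocr-v2": 0.51,
}
OPTIONAL_WEIGHTS_GIB = {
    # Counted once: download either model.safetensors or the .nemo file, not both.
    "parakeet-1-1b-ctc-en-us": 4.0,
    "nemotron-parse": 3.5,
    "llama-nemotron-rerank-vl-1b-v2": 3.1,
    # Omni captioning: pick one precision.
    "omni-bf16": 62.0,
    "omni-fp8": 33.0,
    "omni-nvfp4": 21.0,
}


def estimate_weights_gib(optional=()) -> float:
    """Core weights plus any optional models, rounded to 0.01 GiB."""
    total = sum(CORE_WEIGHTS_GIB.values())
    for name in optional:
        total += OPTIONAL_WEIGHTS_GIB[name]  # raises KeyError for unknown labels
    return round(total, 2)
```

The core models alone come to about 4.83 GiB, which agrees with the ~4.8 GiB combined figure in the Core Features row.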
Model repositories and NIM references are linked in Core and Advanced Pipeline Features above.
| Feature | HF Model Weights | GPU Option | RTX Pro 6000 | B200 | H200 NVL | H100 | A100 80GB | A100 40GB | A10G | L40S | RTX PRO 4500 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU | — | Memory | 96GB | 180GB | 141GB | 80GB | 80GB | 40GB | 24GB | 48GB | 32GB GDDR7 (GB203) |
| Core Features | ~4.8 GiB combined: embed VL 1b ~3.1 GiB; page-elements ~0.41 GiB; table-structure ~0.81 GiB; OCR ~0.51 GiB | Total GPUs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Core Features | — | Total Disk Space | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB |
| Audio/video extraction (parakeet-1-1b-ctc-en-us) | ~4.0 GiB (model.safetensors; the repo also ships parakeet-ctc-1.1b.nemo of similar size, so use one format to avoid roughly doubling disk use) | Additional Dedicated GPUs | Not supported⁴ | Not supported⁴ | 1¹ | 1¹ | 1¹ | 1¹ | 1¹ | 1¹ | Not supported⁴ |
| Audio/video extraction (parakeet-1-1b-ctc-en-us) | — | Additional Disk Space | Not supported⁴ | Not supported⁴ | ~37GB¹ | ~37GB¹ | ~37GB¹ | ~37GB¹ | ~37GB¹ | ~37GB¹ | Not supported⁴ |
| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | 1 | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
| nemotron-parse | — | Additional Disk Space | Not supported | ~16GB | Not supported | ~16GB | ~16GB | ~16GB | ~16GB | ~16GB | Not supported² |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | ~62 GiB (BF16); ~33 GiB (FP8); ~21 GiB (NVFP4) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | Not supported | Not supported | 2 | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | — | Additional Disk Space (HF) | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | Not supported | Not supported | ~21–62GB | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | — | Additional Disk Space (NIM) | ~80GB | ~80GB | ~80GB | ~80GB | ~80GB | Not supported | Not supported | ~80GB | Not supported³ |
| Reranker | ~3.1 GiB (llama-nemotron-rerank-vl-1b-v2) | With Core Pipeline | Yes | Yes | Yes | Yes | Yes | No* | No* | No* | No* |
| Reranker | — | Standalone (recall only) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
¹ On other supported GPUs, Parakeet ASR (parakeet-1-1b-ctc-en-us:1.5.0) may require a runtime TensorRT engine build (no prebuilt profile in the chart image).
² Nemotron Parse fails to start on the 32 GB RTX PRO 4500 Blackwell.
³ Opt-in Omni captioning uses the nemotron-3-nano-omni-30b-a3b-reasoning NIM (nvcr.io/nim/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:1.7.0-variant). BF16 requires at least 80 GB total GPU memory; see the VLM NIM support matrix. L40S requires two GPUs. A100 40GB, A10G, and RTX PRO 4500 are below the minimum.
⁴ On Blackwell GPUs, including B200 (compute capability 10.0) and RTX PRO 6000 Blackwell and RTX PRO 4500 Blackwell (compute capability 12.0), self-hosted audio/video extraction via Parakeet ASR (parakeet-1-1b-ctc-en-us:1.5.0, nimOperator.audio) is not supported. Core PDF and multimodal extraction on Blackwell is unaffected. Video workflows that depend on Parakeet for speech transcription are affected in the same way. The NIMService for nimOperator.audio may remain not Ready or enter CrashLoopBackOff while building the Riva/TensorRT engine (for example, with ONNX Runtime IR version, cuDNN visibility, or FP8 tactic errors). Use a non-Blackwell dedicated GPU, hosted Parakeet on build.nvidia.com, or set nimOperator.audio.enabled=false.
* GPUs with less than 80GB VRAM cannot run the reranker concurrently with the core pipeline. To perform recall testing with the reranker on these GPUs, shut down the core pipeline NIM microservices and run only the embedder, reranker, and your vector database.
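To plan a cluster, the table and footnotes can be encoded as data. The following is a hypothetical sketch: the feature labels are illustrative rather than Helm keys, it ignores disk space and the runtime engine builds noted in footnote ¹, and it reflects only the table as written here, so re-check it against the linked support matrices for your release.

```python
# Columns and values transcribed from the Model Hardware Requirements table.
GPUS = ("RTX Pro 6000", "B200", "H200 NVL", "H100", "A100 80GB",
        "A100 40GB", "A10G", "L40S", "RTX PRO 4500 Blackwell")
GPU_MEMORY_GB = dict(zip(GPUS, (96, 180, 141, 80, 80, 40, 24, 48, 32)))

NS = None  # "Not supported" in the table
EXTRA_GPUS = {
    "audio": dict(zip(GPUS, (NS, NS, 1, 1, 1, 1, 1, 1, NS))),
    "nemotron_parse": dict(zip(GPUS, (NS, 1, NS, 1, 1, 1, 1, 1, NS))),
    "omni_caption": dict(zip(GPUS, (1, 1, 1, 1, 1, NS, NS, 2, NS))),
}


def total_gpus(gpu: str, features=()) -> int:
    """One GPU for the core pipeline plus the dedicated GPUs each feature adds."""
    total = 1
    for feature in features:
        extra = EXTRA_GPUS[feature][gpu]
        if extra is None:
            raise ValueError(f"{feature} is not supported on {gpu}")
        total += extra
    return total


def reranker_with_core(gpu: str) -> bool:
    """Footnote *: the reranker can share a GPU with the core pipeline only at 80 GB or more."""
    return GPU_MEMORY_GB[gpu] >= 80
```

For example, an H100 deployment with audio and Nemotron Parse needs three GPUs in total, Omni captioning on L40S adds two, and requesting audio on B200 raises an error, consistent with footnote ⁴.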
Related Topics
- Troubleshooting
- Release Notes
- Deployment options (local Python, hosted NIMs, and Kubernetes)
- Deploy with Helm
- NVIDIA NIM for Object Detection (support matrix)
- NVIDIA NIM for Image OCR (support matrix)
- NVIDIA NIM for Vision Language Models (support matrix)
- NVIDIA Speech NIM Microservices (support matrix)