Pre-Requisites & Support Matrix

Before you begin using NeMo Retriever Library, confirm your software stack, deployment hardware, and any advanced features you plan to use (audio and video, Nemotron Parse, VLM image captioning, reranking) against the guidance on this page.

Software Requirements

  • Linux operating systems (Ubuntu 22.04 or later recommended)
  • CUDA Toolkit (NVIDIA Driver >= 535, CUDA >= 12.2)
  • Python 3.12 — required to install and run the NeMo Retriever Library Python API, CLI, and related packages from PyPI (for example pip or uv). Older Python versions will fail dependency resolution without a clear error.
  • UV Python package and environment manager (optional; recommended for creating isolated environments)
  • For audio and video, ffmpeg and ffprobe must be on PATH (for example sudo apt-get install -y --no-install-recommends ffmpeg on Debian/Ubuntu). ffmpeg-python and nemo-retriever[multimedia] do not install these binaries. On Helm with package-repo access, set service.installFfmpeg=true. For air-gapped clusters, see Air-gapped and disconnected deployment.
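
The software checks above can be scripted before installation. The following is a minimal sketch, assuming only the Python standard library; the function names `check_python` and `missing_binaries` are illustrative and are not part of NeMo Retriever Library.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 12)                  # Python 3.12, per the requirement above
REQUIRED_BINARIES = ("ffmpeg", "ffprobe")  # only needed for audio and video


def check_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.12.x."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the binaries from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    if not check_python():
        print(f"Python 3.12 is required, found {sys.version.split()[0]}")
    for name in missing_binaries():
        print(f"{name} is not on PATH (required for audio and video)")
```

Running the check up front surfaces a wrong interpreter version directly, instead of through a dependency resolution failure.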

Note

When you use UV, create the environment with Python 3.12 — for example, uv venv --python 3.12. This matches the requires-python metadata in the library packages.

Hardware Requirements

The full ingestion pipeline is designed to consume significant CPU and memory resources to achieve maximal parallelism. Resource usage scales up to the limits of your deployed system.

For per-feature GPU memory, disk, and co-residency rules, refer to Model hardware requirements below.

  • System Memory: At least 256 GB RAM
  • CPU Cores: At least 32 CPU cores
  • GPU: NVIDIA GPU with at least 24 GB VRAM (for example, A100, H100, L40S, or equivalent)
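
As a rough illustration, the minimums above can be compared against a host with a few lines of Python. This is a sketch under stated assumptions: the 90% tolerance is an assumption (operating systems and drivers usually report slightly less than the nominal size), the RAM probe uses `os.sysconf` and therefore targets Linux, and GPU memory is passed in by hand rather than detected.

```python
import os

# Minimums from the list above.
MIN_CPU_CORES = 32
MIN_RAM_GB = 256
MIN_VRAM_GB = 24

# Assumption: reported sizes are usually a little below the nominal figure
# (reserved memory, GiB versus GB), so accept 90% of the nominal value.
TOLERANCE = 0.9


def shortfalls(cpu_cores, ram_gb, vram_gb):
    """Return human-readable shortfalls; an empty list means all minimums are met."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU cores: {cpu_cores} < {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB * TOLERANCE:
        problems.append(f"RAM: {ram_gb:.0f} GB < {MIN_RAM_GB} GB")
    if vram_gb < MIN_VRAM_GB * TOLERANCE:
        problems.append(f"GPU memory: {vram_gb:.0f} GB < {MIN_VRAM_GB} GB")
    return problems


def detect_cpu_and_ram():
    """Read the logical CPU count and physical RAM in GiB (Linux)."""
    cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, ram_bytes / 1024**3


if __name__ == "__main__":
    cores, ram_gb = detect_cpu_and_ram()
    # vram_gb is supplied manually here; replace 24 with your GPU's memory size.
    for problem in shortfalls(cores, ram_gb, vram_gb=24):
        print("below recommended minimum:", problem)
```

A non-empty result does not block the pipeline from running; it only flags where reduced performance is likely.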

Note

Using less powerful systems or lower resource limits is still viable, but performance will suffer.

Resource Consumption Notes

  • The pipeline performs runtime allocation of parallel resources based on system configuration
  • Memory usage can reach up to the full system capacity for large document processing
  • CPU utilization scales with the number of concurrent processing tasks
  • GPU is required for inference using HuggingFace models or NIMs
  • GPU is NOT required for build.nvidia.com hosted inference

Scaling Considerations

For production deployments processing large volumes of documents, consider:

  • Higher memory configurations for processing large PDF files or image collections
  • Additional CPU cores for improved parallel processing
  • Multiple GPUs for distributed processing workloads

Environment Requirements

Ensure your deployment environment meets these specifications before running the full pipeline. Resource-constrained environments may experience performance degradation.

Core and Advanced Pipeline Features

The core extraction pipeline features of NeMo Retriever Library run on a single A10G or better GPU.

Default Helm NIMs

The production Helm chart enables these NIM microservices by default (for example via nimOperator.*.enabled=true):

| Helm flag | NIM | Role |
|---|---|---|
| page_elements | nemotron-page-elements-v3 | Page layout and element detection |
| table_structure | nemotron-table-structure-v1 | Table structure extraction |
| ocr | nemotron-ocr-v2 | Image OCR |
| vlm_embed | llama-nemotron-embed-vl-1b-v2 | Multimodal (VL) embedding |

Nemotron OCR v2 language mode

Note

Local Hugging Face inference: When you deploy locally with HuggingFace model weights (for example pip install "nemo-retriever[local]" and GPU inference without remote OCR NIM URLs), the default OCR engine is Nemotron OCR v2, which runs in multilingual mode by default. For CLI flags and API parameters, see Nemotron OCR v2 — language mode. Remote OCR NIM endpoints use their own model and language behavior; local OCR language selectors are not sent on remote requests.

Helm / NIM: The chart deploys the core OCR NIM under nimOperator.ocr. For image defaults, multilingual behavior, and upgrade notes, see Nemotron OCR v2 — language mode in the Helm chart README.

Default VL embedder container and model for release deployments:

  • Image: nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:1.12.0
  • Model ID: nvidia/llama-nemotron-embed-vl-1b-v2

Optional Helm NIMs (not auto-wired)

These NIM microservices are optional for the default extraction pipeline. The retriever service does not call them until you enable the matching pipeline stage (reranker, Nemotron Parse, caption, or audio). For 26.05 production, disable keys you do not need (see Recommended minimal install (26.05)). Set nimOperator.<key>.enabled=true when you want that NIM reconciled. Chart keys are in the NeMo Retriever Helm chart README.

| Helm flag | NIM | Role |
|---|---|---|
| rerankqa | llama-nemotron-rerank-vl-1b-v2 | Reranking for improved retrieval accuracy |
| nemotron_parse | nemotron-parse | Optional PDF extract_method="nemotron_parse" (default PDF extraction uses pdfium) |
| nemotron_3_nano_omni_30b_a3b_reasoning | nemotron-3-nano-omni-30b-a3b-reasoning | Supported image captioning for 26.05 when you enable the caption stage |
| audio | parakeet-1-1b-ctc-en-us | Audio and video transcription |
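
To make the nimOperator.<key>.enabled pattern concrete, the sketch below builds the corresponding Helm `--set` values from the keys listed on this page. It assumes each Helm flag maps directly to nimOperator.<key>.enabled, as described above; the helper name `helm_set_flags` and the `<release>` and `<chart>` placeholders are hypothetical, so check the Helm chart README for the actual chart reference.

```python
# Keys taken from the default and optional NIM tables on this page.
DEFAULT_NIMS = ("page_elements", "table_structure", "ocr", "vlm_embed")
OPTIONAL_NIMS = (
    "rerankqa",
    "nemotron_parse",
    "nemotron_3_nano_omni_30b_a3b_reasoning",
    "audio",
)


def helm_set_flags(enabled_optional=()):
    """Return nimOperator.<key>.enabled=<bool> values for every known key.

    Default NIMs stay enabled; optional NIMs are disabled unless requested.
    """
    unknown = set(enabled_optional) - set(OPTIONAL_NIMS)
    if unknown:
        raise ValueError(f"unknown optional NIM keys: {sorted(unknown)}")
    flags = [f"nimOperator.{key}.enabled=true" for key in DEFAULT_NIMS]
    for key in OPTIONAL_NIMS:
        state = "true" if key in enabled_optional else "false"
        flags.append(f"nimOperator.{key}.enabled={state}")
    return flags


if __name__ == "__main__":
    # <release> and <chart> are placeholders, not real names.
    args = " ".join(f"--set {flag}" for flag in helm_set_flags(["rerankqa"]))
    print(f"helm upgrade --install <release> <chart> {args}")
```

Disabling every optional key explicitly keeps the install minimal and makes the intent visible in the command itself.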

Image captioning (26.05)

For 26.05, use nemotron_3_nano_omni_30b_a3b_reasoning when you enable the caption stage (hosted model ID nvidia/nemotron-3-nano-omni-30b-a3b-reasoning). The Helm key is in the optional NIMs table above.

Optional features listed in the table above require additional GPU support, disk space, and feature-specific system dependencies beyond the four default NIMs.

For published NIM model IDs and deployment-specific constraints, use the product support matrices linked under Related Topics below.

Model Hardware Requirements

NeMo Retriever Library supports the GPU hardware listed below, subject to the constraints in the table.

  • HF model weights — approximate Hugging Face checkpoint footprint (files such as model*.safetensors, weights.pth, or other published weight bundles in the model repository). Values are rounded from the current public file listing and can change when the repository is updated.
  • NIM disk space — approximate container and on-disk model cache for self-hosted NIM microservices (not the same as HF download size). For Nemotron 3 Nano Omni captioning, see the NVIDIA NIM for Vision Language Models support matrix.

Model repositories and NIM references are linked in Core and Advanced Pipeline Features above.

| Feature | HF Model Weights | GPU Option | RTX Pro 6000 | B200 | H200 NVL | H100 | A100 80GB | A100 40GB | A10G | L40S | RTX PRO 4500 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | GPU Memory | 96GB | 180GB | 141GB | 80GB | 80GB | 40GB | 24GB | 48GB | 32GB GDDR7 (GB203) |
| Core Features | ~4.8 GiB combined: embed VL 1b ~3.1 GiB; page-elements ~0.41 GiB; table-structure ~0.81 GiB; OCR ~0.51 GiB | Total GPUs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Core Features | | Total Disk Space | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB | ~150GB |
| Audio/video extraction (parakeet-1-1b-ctc-en-us) | ~4.0 GiB (model.safetensors; the repo also ships parakeet-ctc-1.1b.nemo of similar size; use one format to avoid roughly doubling disk use) | Additional Dedicated GPUs | Not supported⁴ | Not supported⁴ | | | | | | | Not supported⁴ |
| Audio/video extraction (parakeet-1-1b-ctc-en-us) | | Additional Disk Space | Not supported⁴ | Not supported⁴ | ~37GB¹ | ~37GB¹ | ~37GB¹ | ~37GB¹ | ~37GB¹ | ~37GB¹ | Not supported⁴ |
| nemotron-parse | ~3.5 GiB | Additional Dedicated GPUs | Not supported | 1 | Not supported | 1 | 1 | 1 | 1 | 1 | Not supported² |
| nemotron-parse | | Additional Disk Space | Not supported | ~16GB | Not supported | ~16GB | ~16GB | ~16GB | ~16GB | ~16GB | Not supported² |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | ~62 GiB (BF16); ~33 GiB (FP8); ~21 GiB (NVFP4) | Additional Dedicated GPUs | 1 | 1 | 1 | 1 | 1 | Not supported | Not supported | 2 | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | | Additional Disk Space (HF) | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | ~21–62GB | Not supported | Not supported | ~21–62GB | Not supported³ |
| Omni caption (nemotron-3-nano-omni-30b-a3b-reasoning) | | Additional Disk Space (NIM) | ~80GB | ~80GB | ~80GB | ~80GB | ~80GB | Not supported | Not supported | ~80GB | Not supported³ |
| Reranker | ~3.1 GiB (llama-nemotron-rerank-vl-1b-v2) | With Core Pipeline | Yes | Yes | Yes | Yes | Yes | No* | No* | No* | No* |
| Reranker | | Standalone (recall only) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
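
The disk figures in the table add up per enabled feature. The sketch below shows the arithmetic, assuming the approximate values listed on this page (core ~150GB, audio/video ~37GB, nemotron-parse ~16GB, Omni caption NIM ~80GB); the dictionary keys and the function name are illustrative labels, not chart keys, and the totals only apply on GPUs where the feature is supported.

```python
# Approximate values copied from the table on this page.
CORE_DISK_GB = 150
EXTRA_DISK_GB = {
    "audio": 37,             # Parakeet audio/video extraction
    "nemotron_parse": 16,    # Nemotron Parse
    "omni_caption_nim": 80,  # Omni caption, self-hosted NIM
}

# Hugging Face checkpoint sizes for the four core models, in GiB.
CORE_HF_WEIGHTS_GIB = {
    "embed_vl_1b": 3.1,
    "page_elements": 0.41,
    "table_structure": 0.81,
    "ocr": 0.51,
}


def disk_budget_gb(features=()):
    """Return the approximate total disk space for core plus selected features."""
    return CORE_DISK_GB + sum(EXTRA_DISK_GB[name] for name in features)


if __name__ == "__main__":
    print("core only:", disk_budget_gb(), "GB")
    print("core + audio + parse:", disk_budget_gb(["audio", "nemotron_parse"]), "GB")
    total_weights = sum(CORE_HF_WEIGHTS_GIB.values())
    print(f"core HF weights: ~{total_weights:.1f} GiB")
```

The per-model weights sum to about 4.8 GiB, which matches the combined figure in the Core Features row.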

¹ On other supported GPUs, Parakeet ASR (parakeet-1-1b-ctc-en-us:1.5.0) may require a runtime TensorRT engine build (no prebuilt profile in the chart image).

² Nemotron Parse fails to start on the 32GB RTX PRO 4500 Blackwell.

³ Opt-in Omni captioning uses the nemotron-3-nano-omni-30b-a3b-reasoning NIM (nvcr.io/nim/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:1.7.0-variant). BF16 requires at least 80 GB total GPU memory; see the VLM NIM support matrix. L40S requires two GPUs. A100 40GB, A10G, and RTX PRO 4500 are below the minimum.

⁴ On B200 and other Blackwell GPUs (compute capability 12.0), including RTX PRO 6000 Blackwell and RTX PRO 4500 Blackwell, self-hosted audio/video extraction via Parakeet ASR (parakeet-1-1b-ctc-en-us:1.5.0, nimOperator.audio) is not supported. Core PDF and multimodal extraction on Blackwell is unchanged. Video workflows that depend on Parakeet for speech transcription are affected the same way. The NIMService for nimOperator.audio may never become Ready, or may enter CrashLoopBackOff, while building the Riva/TensorRT engine (for example ONNX Runtime IR version, cuDNN visibility, or FP8 tactic errors). Use a non-Blackwell dedicated GPU, hosted Parakeet on build.nvidia.com, or set nimOperator.audio.enabled=false.

* GPUs with less than 80GB VRAM cannot run the reranker concurrently with the core pipeline. To perform recall testing with the reranker on these GPUs, shut down the core pipeline NIM microservices and run only the embedder, reranker, and your vector database.
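
The co-residency rule in the last footnote reduces to a single threshold on GPU memory. The following sketch encodes it, assuming the GPU memory sizes from the table on this page; the function name `reranker_mode` and the returned labels are illustrative.

```python
# GPU memory in GB, from the GPU Memory row of the table on this page.
GPU_MEMORY_GB = {
    "RTX Pro 6000": 96,
    "B200": 180,
    "H200 NVL": 141,
    "H100": 80,
    "A100 80GB": 80,
    "A100 40GB": 40,
    "A10G": 24,
    "L40S": 48,
    "RTX PRO 4500 Blackwell": 32,
}

# GPUs with less than 80GB cannot run the reranker alongside the core pipeline.
RERANKER_CORESIDENT_MIN_GB = 80


def reranker_mode(gpu_name):
    """Return how the reranker can run on the named GPU."""
    if GPU_MEMORY_GB[gpu_name] >= RERANKER_CORESIDENT_MIN_GB:
        return "with core pipeline"
    return "standalone"


if __name__ == "__main__":
    for name in GPU_MEMORY_GB:
        print(f"{name}: {reranker_mode(name)}")
```

On GPUs reported as standalone, recall testing with the reranker means stopping the core pipeline NIM microservices first, as the footnote describes.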