๐Ÿ“„ Intelligent Document Processing (IDP) with Nemotron RAG#

Build a production-grade Intelligent Document Processing (IDP) pipeline that transforms complex enterprise documentsโ€”containing dense text, charts, and tablesโ€”into a queryable knowledge base using NVIDIA Nemotron models.

๐Ÿ“‹ Overview#

Standard text extraction often fails on complex enterprise documents due to โ€œLinearization Lossโ€โ€”the loss of structural context. This example demonstrates how to solve this using NVIDIA NeMo Retriever and multimodal Nemotron models.

The pipeline handles:

  1. Complex Layouts: Preserves structure in financial reports and manuals.

  2. Visual Data: Extracts and understands trends in charts (e.g., bar graphs).

  3. Dense Tables: Maintains row/column alignment for accurate financial lookups.

๐Ÿ“ฆ Models Used#

Component

Model

Function

Deployment

Extraction

NeMo Retriever Library (nv-ingest)

A library for advanced document parsing

Self-hosted (Library Mode)

Embedding

nvidia/llama-nemotron-embed-vl-1b-v2

Multimodal Embedding (Text + Images)

Local Inference (Hugging Face)

Reranking

nvidia/llama-nemotron-rerank-vl-1b-v2

Visual Cross-Encoder Reranking

Local Inference (Hugging Face)

Reasoning

nvidia/llama-3.3-nemotron-super-49b

Citation-Backed Answer Generation

NVIDIA NIM / API

๐ŸŒŸ Key Features#

  • Multimodal RAG: Retrieves and reasons over both text and visual elements (charts/graphs).

  • Intelligent Extraction: Uses YOLOX and Transformer-based models to detect and crop charts and parse tables into Markdown.

  • Visual Reranking: The reranker โ€œseesโ€ the retrieved chart images to verify relevance, improving accuracy over text-only search.

  • Hardware-Aware: Includes robust fallback mechanisms to run on varying GPU architectures (e.g., T4 vs. H100) by adjusting precision and attention implementations automatically.

  • Citation & Verification: Enforces strict citation of sources and internal fact-checking (reasoning traces) to reduce hallucinations.

๐Ÿ—๏ธ Pipeline Architecture#

Pipeline Architecture

๐Ÿ”ง Requirements#

Hardware#

  • GPU: NVIDIA GPU recommended: H100 (Tested on H100 and T4).

    • Note: The notebook includes patches to support older GPUs (like T4) by disabling Flash Attention 2 where incompatible.

Software#

  • Python 3.12

  • NVIDIA API Key: Required for the Generation NIM (llama-3.3-nemotron-super-49b).

  • Dependencies: nv-ingest, pymilvus, pillow, transformer, etc.

๐Ÿš€ Quick Start#

1. Set Up Environment#

Install the required libraries. Note that nv-ingest is used in Library Mode for this example, which runs locally without requiring a complex Docker setup.

uv pip install -r pyproject.toml

2. Configure API Key#

Export your NVIDIA API key to access the cloud-hosted generation models.

export NVIDIA_API_KEY="nvapi-..."

3. Run the Notebook#

Launch Jupyter and run the pipeline:

jupyter notebook intelligent_document_processing_pipeline.ipynb

๐Ÿ“„ License#

This project uses NVIDIA open models. Each model is governed by its respective license: