Enable PDF extraction with Nemotron Parse for NVIDIA RAG Blueprint#
For enhanced PDF extraction, particularly for scanned documents or documents with complex layouts, you can use the Nemotron Parse service with the NVIDIA RAG Blueprint. This service provides more accurate text extraction and PDF parsing than the default PDF extraction method.
Warning
Nemotron Parse is not supported on NVIDIA B200 GPUs or RTX Pro 6000 GPUs. For this feature, use H100 or A100 GPUs instead.
Using Docker Compose#
Using On-Prem Models#
Prerequisites: Follow the deployment guide up to and including the step labelled “Start all required NIMs.”
Deploy the Nemotron Parse service along with other required NIMs:
USERID=$(id -u) docker compose --profile rag --profile nemotron-parse -f deploy/compose/nims.yaml up -d
Configure the ingestor-server to use Nemotron Parse by setting the environment variable:
export APP_NVINGEST_PDFEXTRACTMETHOD=nemotron_parse
Deploy the ingestor-server and rag-server containers following the remaining steps in the deployment guide.
You can now ingest PDF files using the ingestion API usage notebook.
Using NVIDIA Hosted API Endpoints#
Prerequisites: Follow the deployment guide up to and including the step labelled “Start the vector db containers from the repo root.”
Export the following variables to use the Nemotron Parse API endpoints:
export NEMOTRON_PARSE_HTTP_ENDPOINT=https://integrate.api.nvidia.com/v1/chat/completions
export NEMOTRON_PARSE_MODEL_NAME=nvidia/nemotron-parse
export NEMOTRON_PARSE_INFER_PROTOCOL=http
Configure the ingestor-server to use Nemotron Parse by setting the environment variable:
export APP_NVINGEST_PDFEXTRACTMETHOD=nemotron_parse
Deploy the ingestor-server and rag-server containers following the remaining steps in the deployment guide.
You can now ingest PDF files using the ingestion API usage notebook.
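Before deploying the containers, it can help to confirm that the variables from the steps above are actually set in the current shell. The following is a minimal sketch, not part of the blueprint: the function name `missing_nemotron_vars` is hypothetical, and it checks only the four variables named in this section.

```shell
# Hypothetical helper: print the name of each required variable that is
# unset or empty. The variable names come from the steps above.
missing_nemotron_vars() {
  for name in NEMOTRON_PARSE_HTTP_ENDPOINT NEMOTRON_PARSE_MODEL_NAME \
              NEMOTRON_PARSE_INFER_PROTOCOL APP_NVINGEST_PDFEXTRACTMETHOD; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing=$(missing_nemotron_vars)
if [ -n "$missing" ]; then
  printf 'Missing variables:\n%s\n' "$missing" >&2
fi
```

If the check prints nothing, all four variables are set in the shell that will start the containers.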
Note
When using NVIDIA hosted endpoints, you may encounter rate limiting when ingesting larger batches (more than 10 files).
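One way to stay under a rate limit is to pause between groups of files. The sketch below is an illustration only: `ingest_file` is a hypothetical placeholder for however you submit a file (for example, a client built from the ingestion API usage notebook), and the batch size and pause length are assumptions to tune, not documented limits.

```shell
# Hypothetical placeholder: replace the body with your real ingestion call.
ingest_file() {
  printf 'ingesting %s\n' "$1"
}

# Usage: ingest_with_pauses BATCH_SIZE PAUSE_SECONDS FILE...
# Submits files one by one and sleeps after every BATCH_SIZE files.
ingest_with_pauses() {
  batch_size=$1
  pause_seconds=$2
  shift 2
  total=$#
  count=0
  for file in "$@"; do
    ingest_file "$file"
    count=$((count + 1))
    # Pause after each full batch, except after the last file.
    if [ $((count % batch_size)) -eq 0 ] && [ "$count" -lt "$total" ]; then
      printf 'pausing %s second(s) after %d file(s)\n' "$pause_seconds" "$count"
      sleep "$pause_seconds"
    fi
  done
}
```

For example, `ingest_with_pauses 10 60 docs/*.pdf` would submit ten files, wait a minute, and continue with the next ten.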
Using Helm#
To enable PDF extraction with Nemotron Parse using Helm, you need to enable the Nemotron Parse service and configure the ingestor-server to use it.
Prerequisites#
Ensure you have sufficient GPU resources. Nemotron Parse requires a dedicated GPU.
Deployment Steps#
To deploy with Nemotron Parse enabled:
Modify values.yaml to enable Nemotron Parse:
# Enable Nemotron Parse NIM
nv-ingest:
  nimOperator:
    nemotron_parse:
      enabled: true
# Configure ingestor-server to use Nemotron Parse
ingestor-server:
  envVars:
    APP_NVINGEST_PDFEXTRACTMETHOD: "nemotron_parse"
After modifying values.yaml, apply the changes as described in Change a Deployment.
For detailed Helm deployment instructions, see the Helm Deployment Guide.
Note
Key Configuration Changes:
nv-ingest.nimOperator.nemotron_parse.enabled=true - Enables the Nemotron Parse NIM
ingestor-server.envVars.APP_NVINGEST_PDFEXTRACTMETHOD="nemotron_parse" - Configures the ingestor to use Nemotron Parse
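The same two keys can also be passed on the command line with Helm's `--set` flag instead of editing values.yaml. The sketch below only prints the command so you can review it before running it. The release name, chart reference, and namespace are hypothetical placeholders; take the real values from the Helm Deployment Guide.

```shell
# Hypothetical placeholders: replace with your release, chart, and namespace.
RELEASE_NAME=${RELEASE_NAME:-rag}
CHART_REF=${CHART_REF:-./path/to/chart}
NAMESPACE=${NAMESPACE:-rag}

# Print (do not run) a helm command that sets both keys listed above.
print_helm_command() {
  printf '%s\n' "helm upgrade --install $RELEASE_NAME $CHART_REF --namespace $NAMESPACE --set nv-ingest.nimOperator.nemotron_parse.enabled=true --set ingestor-server.envVars.APP_NVINGEST_PDFEXTRACTMETHOD=nemotron_parse"
}

print_helm_command
```

Keeping the settings in values.yaml is usually easier to track in version control; `--set` is convenient for a quick trial.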
Limitations and Requirements#
When using Nemotron Parse for PDF extraction, consider the following:
Nemotron Parse supports only PDF documents, not image files. Non-PDF files are extracted using the default extraction method instead.
The service requires GPU resources and must run on a dedicated GPU. Make sure you have sufficient GPU resources available before enabling this feature.
The extraction quality may vary depending on the PDF structure and content.
Nemotron Parse is not supported on NVIDIA B200 GPUs or RTX Pro 6000 GPUs.
For detailed information about hardware requirements and supported GPUs for all NeMo Retriever extraction NIMs, refer to the Nemotron Parse Support Matrix.
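To preview which files in a batch would fall back to the default method, you can classify them before ingestion. This is a rough sketch: it assumes the file extension reflects the actual format, the function name `extraction_method_for` is hypothetical, and the file names are made up for illustration.

```shell
# Hypothetical helper: report which extraction path a file is expected to
# take when APP_NVINGEST_PDFEXTRACTMETHOD=nemotron_parse, based on extension.
extraction_method_for() {
  case "$1" in
    *.[Pp][Dd][Ff]) printf '%s\n' "nemotron_parse" ;;
    *) printf '%s\n' "default" ;;
  esac
}

# Example file names (illustrative only).
for path in report.pdf SCAN.PDF diagram.png; do
  printf '%s -> %s\n' "$path" "$(extraction_method_for "$path")"
done
```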
Available PDF Extraction Methods#
The APP_NVINGEST_PDFEXTRACTMETHOD environment variable supports the following values:
nemotron_parse: Uses the Nemotron Parse service for enhanced PDF extraction (recommended for scanned documents or documents with complex layouts)
pdfium: Uses the default PDFium-based extraction
None: Uses the default extraction method
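A typo in this variable is easy to miss, so a small guard in a deployment script can catch it early. The sketch below assumes the three values listed above are the complete set. The function name is hypothetical, and how the server itself treats an unrecognized value is not covered here.

```shell
# Hypothetical helper: succeed only for the values listed above.
is_supported_pdf_extract_method() {
  case "$1" in
    nemotron_parse|pdfium|None) return 0 ;;
    *) return 1 ;;
  esac
}

method=${APP_NVINGEST_PDFEXTRACTMETHOD:-None}
if is_supported_pdf_extract_method "$method"; then
  printf 'PDF extraction method: %s\n' "$method"
else
  printf 'Unsupported PDF extraction method: %s\n' "$method" >&2
fi
```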