Ingestor Server Volume Mounting for NVIDIA RAG Blueprint#
You can mount a host directory to access extraction results from NeMo Retriever Library directly from the filesystem when you use the NVIDIA RAG Blueprint. This feature is designed for advanced developers who need programmatic access to raw extraction results for custom processing pipelines or external vector database integration.
Configuration#
Environment Variables#
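| Variable | Default | Description |
|---|---|---|
| INGESTOR_SERVER_DATA_DIR | /data/ | Container internal path. The compose file mounts the rag-vol-ingestor volume at this path. |
| APP_NVINGEST_SAVETODISK | | Enable disk persistence |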
Setup#
Enable disk persistence:
# Enable disk persistence
export APP_NVINGEST_SAVETODISK=True
# (Optional) Override container internal path
export INGESTOR_SERVER_DATA_DIR=/data/
The ingestor-server compose file already mounts rag-vol-ingestor at INGESTOR_SERVER_DATA_DIR; nothing else needs to be configured to persist results.
Accessing the Results from the Host#
Docker named volumes are owned by root on the host, so use one of the following patterns to read the files:
# Copy a single result file out of the volume:
docker run --rm -v rag-vol-ingestor:/src:ro -v "$PWD":/dst alpine \
cp /src/nv-ingest-results/<collection>/<file>.results.jsonl /dst/
# List the directory tree inside the volume:
docker run --rm -v rag-vol-ingestor:/src:ro alpine ls -la /src/nv-ingest-results
# Or copy directly from the running ingestor-server container:
docker cp ingestor-server:/data/nv-ingest-results ./nv-ingest-results
See Manage Persistent Data Volumes for backup, reset, and migration commands.
Result Structure#
Results are saved as .jsonl files, one per ingested document, using the naming convention {original_filename}.results.jsonl:
rag-vol-ingestor:/
└── nv-ingest-results/
    ├── collection_name1/
    │   ├── document1.pdf.results.jsonl
    │   ├── presentation.pptx.results.jsonl
    │   └── spreadsheet.xlsx.results.jsonl
    └── collection_name2/
        ├── report.pdf.results.jsonl
        ├── analysis.docx.results.jsonl
        └── data.xlsx.results.jsonl
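Once the results are copied to the host (for example into ./nv-ingest-results with the docker cp command shown earlier), the layout above can be walked programmatically. The following is a minimal sketch, not part of the blueprint: the names RESULT_SUFFIX, index_results, and result_path are hypothetical helpers, and the code assumes exactly the two-level layout shown above (collection directory, then result files).

```python
from collections import defaultdict
from pathlib import Path

# Suffix taken from the naming convention {original_filename}.results.jsonl
RESULT_SUFFIX = ".results.jsonl"


def index_results(results_root):
    """Map each collection name to the original filenames that have results.

    `results_root` is a local copy of the nv-ingest-results directory.
    Assumes the layout <results_root>/<collection>/<file>.results.jsonl.
    """
    index = defaultdict(list)
    for path in sorted(Path(results_root).glob(f"*/*{RESULT_SUFFIX}")):
        collection = path.parent.name
        # Strip the suffix to recover the original document name.
        original_name = path.name[: -len(RESULT_SUFFIX)]
        index[collection].append(original_name)
    return dict(index)


def result_path(results_root, collection, original_filename):
    """Build the expected result path for one ingested document."""
    return Path(results_root) / collection / f"{original_filename}{RESULT_SUFFIX}"
```

For example, index_results("./nv-ingest-results") returns a dictionary keyed by collection name, and result_path("./nv-ingest-results", "collection_name1", "document1.pdf") points at the file for that document if it was ingested.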
Each .jsonl file contains structured extraction metadata including text segments, document structure, images, tables, and chunk boundaries.
Advanced Usage: You can load these .jsonl files into an external vector database or feed them to custom processing workflows. This functionality is intended for advanced developers who need direct access to the structured extraction results.
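As a starting point for such a workflow, the sketch below reads one result file line by line. It relies only on the JSON Lines convention (one JSON value per line). The record schema is not specified here, so the code makes no assumption about field names and simply reports which top-level keys appear. The function names iter_records and top_level_keys are hypothetical.

```python
import json
from pathlib import Path


def iter_records(jsonl_path):
    """Yield one parsed JSON value per non-empty line of a .results.jsonl file."""
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{jsonl_path}:{line_number}: invalid JSON") from exc


def top_level_keys(jsonl_path):
    """Collect the top-level keys seen across all object records in the file."""
    keys = set()
    for record in iter_records(jsonl_path):
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)
```

Inspecting the keys of a real result file first is a reasonable way to decide which fields to map into your own vector database schema before writing the ingestion code.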
Note
This is an advanced feature for custom processing workflows. Standard RAG functionality stores results directly in the vector database.
Helm (Kubernetes)#
Overview#
The Helm chart supports persisting ingestor-server data to a PersistentVolumeClaim (PVC). When enabled, the chart mounts a PVC at the same path used by INGESTOR_SERVER_DATA_DIR (default /data/). Set APP_NVINGEST_SAVETODISK=True to write extraction results to disk.
Values#
Edit values.yaml and set:
ingestor-server:
  envVars:
    # Ensure results are written to disk inside the pod
    APP_NVINGEST_SAVETODISK: "True"
    # Directory inside the container where results will be written
    INGESTOR_SERVER_DATA_DIR: "/data/"
  # PVC configuration (created automatically unless existingClaim is set)
  persistence:
    enabled: true
    existingClaim: ""  # set to use an existing PVC; leave empty to create one
    storageClass: ""   # set if your cluster requires a specific class (e.g., "standard")
    accessModes:
      - ReadWriteOnce
    size: 50Gi
    # Optional: explicitly set the mount path (defaults to INGESTOR_SERVER_DATA_DIR)
    mountPath: "/data/"
    # Optional: mount a subPath within the PVC
    subPath: ""
Notes:
If existingClaim is empty, the chart will create a PVC named <appName>-data. With the default appName of ingestor-server, the PVC name will be ingestor-server-data.
The container writes results under /data/ by default. Structure matches the compose example: /data/nv-ingest-results/<collection>/file.results.jsonl.
Deploy the Changes#
After modifying values.yaml, apply the changes as described in Change a Deployment.
For detailed Helm deployment instructions, see Helm Deployment Guide.
List and Access Files#
List results inside the ingestor-server pod (default mount path /data/):
kubectl -n rag exec -it <ingestor-pod> -- ls -l /data/
Copy data from the pod to your local computer:
kubectl -n rag cp <ingestor-pod>:/data/ ./ingestor-data