Ingestor Server Volume Mounting for NVIDIA RAG Blueprint#
You can mount a host directory to access NV-Ingest extraction results directly from the filesystem when you use the NVIDIA RAG Blueprint. This capability is designed for advanced developers who need programmatic access to raw extraction results for custom processing pipelines or external vector database integration.
Configuration#
Environment Variables#
| Variable | Default | Description |
|---|---|---|
| INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT | ./volumes/ingestor-server | Host filesystem path |
| INGESTOR_SERVER_DATA_DIR | /data/ | Container internal path |
| APP_NVINGEST_SAVETODISK | False | Enable disk persistence |
Setup#
Export environment variables:
# Enable disk persistence
export APP_NVINGEST_SAVETODISK=True

# Set host directory path (optional - customize as needed)
export INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT=./volumes/ingestor-server

# Set container internal path (optional - customize as needed)
export INGESTOR_SERVER_DATA_DIR=/data/
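After you start the services and ingest documents, you can verify that results are written to the mounted host directory. A minimal check, assuming the default paths set above:

# List extraction results on the host after ingestion completes
ls -R ${INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT}/nv-ingest-results/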
Troubleshooting#
Optional: Fix permission issues. If you encounter permission errors when accessing the volume, run:
sudo chown -R 1000:1000 ${INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT}
sudo chmod -R 755 ${INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT}
Result Structure#
Results are saved as .jsonl files with the naming convention {original_filename}.results.jsonl:
${INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT}/
└── nv-ingest-results/
├── collection_name1/
│ ├── document1.pdf.results.jsonl
│ ├── presentation.pptx.results.jsonl
│ └── spreadsheet.xlsx.results.jsonl
└── collection_name2/
├── report.pdf.results.jsonl
├── analysis.docx.results.jsonl
└── data.xlsx.results.jsonl
Each .jsonl file contains structured extraction metadata including text segments, document structure, images, tables, and chunk boundaries.
Advanced Usage: These .jsonl files can be used for storing data in vector databases or performing custom processing workflows as desired. This functionality is intended for advanced developers who need direct access to the structured extraction results.
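For example, you can inspect a results file from the command line. The following is a minimal sketch that assumes jq is installed and uses the example paths from the structure above; the exact record fields depend on your NV-Ingest version:

# Count the number of extraction records (one JSON object per line)
wc -l ${INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT}/nv-ingest-results/collection_name1/document1.pdf.results.jsonl

# Show the top-level keys of the first record
head -n 1 ${INGESTOR_SERVER_EXTERNAL_VOLUME_MOUNT}/nv-ingest-results/collection_name1/document1.pdf.results.jsonl | jq 'keys'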
Note
This is an advanced feature for custom processing workflows. Standard RAG functionality stores results directly in the vector database.
Helm (Kubernetes)#
Overview#
The Helm chart supports persisting ingestor-server data to a PersistentVolumeClaim (PVC). When enabled, the chart mounts a PVC at the same path used by INGESTOR_SERVER_DATA_DIR (default /data/). Set APP_NVINGEST_SAVETODISK=True to write extraction results to disk.
Values#
Edit values.yaml and set:
ingestor-server:
envVars:
# Ensure results are written to disk inside the pod
APP_NVINGEST_SAVETODISK: "True"
# Directory inside the container where results will be written
INGESTOR_SERVER_DATA_DIR: "/data/"
# PVC configuration (created automatically unless existingClaim is set)
persistence:
enabled: true
existingClaim: "" # set to use an existing PVC; leave empty to create one
storageClass: "" # set if your cluster requires a specific class (e.g., "standard")
accessModes:
- ReadWriteOnce
size: 50Gi
# Optional: explicitly set the mount path (defaults to INGESTOR_SERVER_DATA_DIR)
mountPath: "/data/"
# Optional: mount a subPath within the PVC
subPath: ""
Notes:

- If existingClaim is empty, the chart creates a PVC named <appName>-data. With the default appName of ingestor-server, the PVC name is ingestor-server-data.
- The container writes results under /data/ by default. The directory structure matches the compose example: /data/nv-ingest-results/<collection>/file.results.jsonl.
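After installing with persistence enabled, you can confirm that the claim exists and is bound. For example, assuming the chart is deployed in the rag namespace with the default appName:

# Verify the automatically created PVC
kubectl -n rag get pvc ingestor-server-data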
Install / Upgrade (On-prem only)#
Ensure your NGC API key is available:
export NGC_API_KEY="<your-ngc-api-key>"
Using a custom values file:
helm upgrade --install rag -n rag \
https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev.tgz \
--username '$oauthtoken' \
--password "${NGC_API_KEY}" \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
-f deploy/helm/nvidia-blueprint-rag/values.yaml
Or with inline overrides:
helm upgrade --install rag -n rag \
https://helm.ngc.nvidia.com/0648981100760671/charts/nvidia-blueprint-rag-v2.4.0-dev.tgz \
--username '$oauthtoken' \
--password "${NGC_API_KEY}" \
--set imagePullSecret.password=$NGC_API_KEY \
--set ngcApiSecret.password=$NGC_API_KEY \
--set ingestor-server.envVars.APP_NVINGEST_SAVETODISK=True \
--set ingestor-server.envVars.INGESTOR_SERVER_DATA_DIR=/data/ \
--set ingestor-server.persistence.enabled=true \
--set ingestor-server.persistence.size=50Gi
List and Access Files#
List results inside the ingestor-server pod (default mount path /data/):
kubectl -n rag exec -it <ingestor-pod> -- ls -l /data/
Copy data from the pod to your local computer:
kubectl -n rag cp <ingestor-pod>:/data/ ./ingestor-data
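The copied directory mirrors the layout described in Result Structure. For example, to list all results files locally (assuming the default /data/ layout):

# Find every extraction results file in the copied data
find ./ingestor-data/nv-ingest-results -name '*.results.jsonl'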