Air-Gapped Deployment (End-to-End)
This guide consolidates what you need to run NeMo Retriever Library in a secured, network-isolated environment (for example, release 26.3.0 on Kubernetes or Docker Compose). It focuses on NeMo Retriever–specific images, Helm assets, and configuration so you can plug them into your broader air-gapped platform (private container registry, model artifact storage, NVIDIA GPU Operator, and NIM Operator).
Source of truth for versions
Image repositories and default tags change between releases. Always verify pins against the release/26.3.0 (or your exact stack) branches of:
- NeMo Retriever `docker-compose.yaml` for self-hosted Compose (check out the Git tag or branch that matches 26.3.0 in your environment; `main` moves forward)
- NeMo Retriever `helm/values.yaml` and `helm/README.md` for Kubernetes / Helm
End-to-end workflow
Use a staging machine (or bastion) that can reach the public internet and NGC, then promote artifacts into the disconnected site.
- Inventory — Decide the deployment mode (Compose vs. Helm) and optional profiles / NIMs (audio, Nemotron Parse, VLM, Milvus retrieval, reranker), and list every image and chart you must mirror (sections below).
- Mirror container images — Pull from upstream registries, retag to your private registry (optional but recommended), record digests for reproducibility, and push to the registry reachable from the air-gapped environment.
- Stage non-image assets — Helm chart `.tgz` packages from NGC (or vendor them internally), Python wheels for clients (`nv-ingest-client`), and any operator bundles your cluster policy requires.
- Configure pulls and runtime — Point all `image.repository` values (and Compose image env vars) at the private registry; use `imagePullSecrets`; set `imagePullPolicy: IfNotPresent` (or `Never` only if every node is preloaded). Ensure no workload still references `integrate.api.nvidia.com`, `ai.api.nvidia.com`, or other hosted NVIDIA APIs unless you intentionally proxy them.
- NIM models and caches — For Kubernetes, follow NIM Operator: Air-Gapped Environments to preload models into NIMCache / private artifact storage so NIM pods never need outbound registry access at runtime.
- Validate — From a jump host inside the enclave, run image pull tests and `helm template` with offline values, then smoke-test ingest health (`/v1/health/ready`) and a minimal extract job.
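The endpoint check in the validation step can be scripted: render your manifests offline and confirm nothing still references hosted NVIDIA APIs. A minimal sketch, assuming you feed it the output of `helm template` or `docker compose config` (the stub file below is illustrative):

```shell
#!/bin/sh
# Fail if a rendered manifest still references hosted NVIDIA endpoints.
check_offline() {
  # grep returns 0 when it finds a forbidden host; invert so that a
  # clean file yields success.
  ! grep -E 'integrate\.api\.nvidia\.com|ai\.api\.nvidia\.com' "$1"
}

# Stub standing in for real `helm template` output.
cat > /tmp/rendered.yaml <<'EOF'
env:
  - name: VLM_CAPTION_ENDPOINT
    value: http://nemotron-nano-vl:8000/v1/chat/completions
EOF

check_offline /tmp/rendered.yaml && echo "no hosted endpoints referenced"
```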
What NeMo Retriever / NV-Ingest needs (container images)
Core pipeline (typical default)
These services are commonly enabled for document extraction with self-hosted NIMs (names match Docker Compose services where applicable).
| Role | Default image (verify tag in repo) | Notes |
|---|---|---|
| Ingest runtime | `nvcr.io/nvidia/nemo-microservices/nv-ingest:26.3.0` | Main API / Ray service |
| Page elements NIM | `nvcr.io/nim/nvidia/nemotron-page-elements-v3` | Default tag in Compose: 1.8.0 |
| Graphic elements NIM | `nvcr.io/nim/nvidia/nemotron-graphic-elements-v1` | Default tag: 1.8.0 |
| Table structure NIM | `nvcr.io/nim/nvidia/nemotron-table-structure-v1` | Default tag: 1.8.0 |
| OCR NIM | `nvcr.io/nim/nvidia/nemotron-ocr-v1` | Default tag: 1.3.0 |
| Embedding NIM | `nvcr.io/nim/nvidia/llama-nemotron-embed-1b-v2` | Default tag: 1.13.0 |
| Redis | `redis/redis-stack` | Message broker / stack |
| OpenTelemetry Collector | `otel/opentelemetry-collector-contrib` | e.g. 0.140.0 in Helm values |
| Zipkin | `openzipkin/zipkin` | e.g. 3.5.0 in Helm values |
Optional profiles / features
Add these images only if you enable the matching Compose profiles or Helm NIM toggles.
| Feature | Image | When needed |
|---|---|---|
| Nemotron Parse | `nvcr.io/nim/nvidia/nemotron-parse` | Advanced PDF parsing profile |
| VLM captioning | `nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl` | `vlm` profile / `nemotron_nano_12b_v2_vl` in Helm |
| Audio (ASR) | `nvcr.io/nim/nvidia/parakeet-1-1b-ctc-en-us` | `audio` profile |
| Reranking NIM | `nvcr.io/nim/nvidia/llama-nemotron-rerank-1b-v2` | Reranker profile / `rerankqa` in Helm |
| Retrieval stack (Milvus path) | `milvusdb/milvus`, `minio/minio`, `quay.io/coreos/etcd`, `zilliz/attu` | Compose `retrieval` profile; Helm deploys Milvus/MinIO via subchart defaults |
| Observability extras | `prom/prometheus`, `grafana/grafana` | Only if you enable monitoring containers in Compose |
Kubernetes-only dependencies (Helm chart defaults)
When you install from the nv-ingest Helm chart, also plan to mirror subchart images your values enable, for example:
- Milvus (`milvusdb/milvus` and bundled etcd/minio images from chart values)
- Redis (`redis` with the chart tag, for example `8.2.3` in the default `values.yaml`)
Again, take the exact repository and tag from your pinned `values.yaml` for 26.3.0.
Helm charts and packaging artifacts
From a connected environment, download and version-control the chart archive you install, for example (see NV-Ingest Helm README):
- `nv-ingest-26.3.0.tgz` from NGC Helm (`helm pull` with NGC credentials)
If your process forbids live `helm install` from URLs, use `helm pull` on the staging host, copy the `.tgz` and any dependent charts into the enclave, then run `helm upgrade --install` using the local file path.
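When promoting the `.tgz` across the boundary, it helps to record a checksum on the staging host and re-verify it after transfer. A minimal sketch (the archive contents below are a stub standing in for the real `helm pull` output):

```shell
#!/bin/sh
# On the staging host: create the chart archive (stubbed here) and record its hash.
echo "stub chart payload" > /tmp/nv-ingest-26.3.0.tgz
( cd /tmp && sha256sum nv-ingest-26.3.0.tgz > nv-ingest-26.3.0.tgz.sha256 )

# Inside the enclave, after copying both files across:
( cd /tmp && sha256sum -c nv-ingest-26.3.0.tgz.sha256 ) && echo "chart archive verified"
```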
Pinning versions and digests
- Tags — Align Compose `*_TAG` / Helm `image.tag` fields with the same release line you qualified (for example `26.3.0` for the ingest image, NIM tags from `docker-compose.yaml` / `nimOperator` in `values.yaml`).
- Digests — After `docker pull nvcr.io/...:tag`, run `docker inspect --format='{{index .RepoDigests 0}}' image:tag` (or use crane / skopeo; see tooling below) and record `repository@sha256:...`. Prefer deploying with digests in highly regulated environments; keep a mapping table from digest → human-readable tag for operations.
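The digest → tag mapping table can be generated from the recorded pairs; a sketch assuming one `repository@sha256 tag` pair per input line (the digest below is a placeholder, not a real image digest):

```shell
#!/bin/sh
# Turn "repository@sha256:<digest> <tag>" lines into a Markdown mapping table.
to_table() {
  printf '| Digest | Tag |\n|---|---|\n'
  while read -r digest tag; do
    printf '| %s | %s |\n' "$digest" "$tag"
  done
}

# Placeholder digest for illustration only.
to_table <<'EOF'
nvcr.io/nvidia/nemo-microservices/nv-ingest@sha256:0000000000000000000000000000000000000000000000000000000000000000 26.3.0
EOF
```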
Mirroring images into a private registry
Typical pattern on the staging host:
- `docker login nvcr.io` (username `$oauthtoken`, NGC key as the password), and log in to any other upstream registries you use (`docker.io`, `quay.io`, etc.).
- For each image:

  ```
  docker pull upstream/image:tag
  docker tag upstream/image:tag <PRIVATE_REGISTRY>/nv-ingest-mirror/upstream-image:tag
  docker push <PRIVATE_REGISTRY>/nv-ingest-mirror/upstream-image:tag
  ```
For large fleets, prefer skopeo or crane for copy/sync between registries without loading into a local Docker daemon.
When the enclave has no registry yet, transfer tarballs instead: `docker save -o nv-ingest-bundle.tar` with multiple images, move the media, then `docker load` or `ctr -n k8s.io images import` on nodes (cluster-specific).
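For a long image inventory, the retag pattern above can be scripted so the generated commands can be reviewed before execution. A minimal sketch; the registry host, mirror path, and basename flattening are placeholder conventions:

```shell
#!/bin/sh
# Expand an image inventory into pull/tag/push commands (printed, not run).
PRIVATE_REGISTRY="registry.example.internal"   # placeholder mirror host

mirror_cmds() {
  while IFS= read -r image; do
    base="${image##*/}"   # keep only "name:tag", flattening the repo path
    echo "docker pull ${image}"
    echo "docker tag ${image} ${PRIVATE_REGISTRY}/nv-ingest-mirror/${base}"
    echo "docker push ${PRIVATE_REGISTRY}/nv-ingest-mirror/${base}"
  done
}

mirror_cmds <<'EOF'
nvcr.io/nvidia/nemo-microservices/nv-ingest:26.3.0
nvcr.io/nim/nvidia/nemotron-ocr-v1:1.3.0
EOF
```

Reviewing the printed commands before piping them to `sh` keeps a human in the loop; note that basename flattening can collide if two repositories share a final path segment, so check the generated targets before pushing.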
Pointing deployments at the private registry (avoid runtime pulls)
Docker Compose
- Override each `*_IMAGE` / `*_TAG` environment variable (see `docker-compose.yaml`) so every `image:` resolves to your mirror.
- Keep hosted API endpoints disabled: use in-stack URLs for NIMs (defaults in the compose file already prefer `http://…` service names over `https://integrate.api.nvidia.com`).
- Provide a `.env` file or config alongside the compose file in the enclave; never rely on pulling new images at `up` time without registry access.
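Put together, an enclave `.env` fragment might look like the following sketch; the variable names here are illustrative, so substitute the real `*_IMAGE` / `*_TAG` keys from your pinned `docker-compose.yaml`:

```
# Illustrative only -- substitute the real *_IMAGE / *_TAG names from
# your pinned docker-compose.yaml.
NV_INGEST_IMAGE=registry.example.internal/nv-ingest-mirror/nv-ingest
NV_INGEST_TAG=26.3.0
OCR_IMAGE=registry.example.internal/nv-ingest-mirror/nemotron-ocr-v1
OCR_TAG=1.3.0
```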
For a short Compose-oriented procedure, see Air-Gapped Deployment (Docker Compose) in the self-hosted quickstart; this page is the complete checklist.
Helm (Kubernetes)
- Set the main ingest image, for example:

  ```
  image.repository=<PRIVATE_REGISTRY>/nvidia/nemo-microservices/nv-ingest
  image.tag=26.3.0
  ```
- Under `nimOperator` in `values.yaml`, override `image.repository` / `image.tag` for each enabled NIM (page elements, graphic elements, table structure, OCR, embed, optional parse/VLM/audio/rerank) to your mirrored paths.
- Configure `imagePullSecrets` so kubelet can authenticate to your private registry (the chart defaults assume NGC-style secrets; replace them with secrets that reference your mirror's credentials).
- Set `imagePullPolicy: IfNotPresent` (the default in many paths) once images are pre-pulled or pulled through the mirror; use `Never` only if you fully preload images on every node and understand the scheduling failure modes.
- Review `envVars` for any URL that still points to the public internet (for example hosted VLM or Parse endpoints). For fully offline captioning, deploy the Nemotron Nano VL NIM in-cluster and set `VLM_CAPTION_ENDPOINT` to the in-cluster HTTP/gRPC endpoint (see the comments in `values.yaml` for the expected pattern).
- Install GPU Operator and NIM Operator, configured for offline registries and cached models (NIM Operator air-gap guide).
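Collected into one overrides file, the settings above might look like the following sketch; the structure is illustrative, and every key path (especially under `nimOperator`) must be verified against the `values.yaml` you pinned for 26.3.0:

```yaml
# overrides-airgap.yaml -- illustrative structure only; verify key paths
# against your pinned values.yaml.
image:
  repository: registry.example.internal/nv-ingest-mirror/nv-ingest
  tag: "26.3.0"
imagePullPolicy: IfNotPresent
imagePullSecrets:
  - name: private-registry-secret   # secret for your mirror, not NGC
# nimOperator: each enabled NIM takes the same repository/tag override
# pattern, pointed at your mirrored paths.
```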
Recommended tooling references
| Tool | Use case |
|---|---|
| skopeo | Copy/sign/inspect images between registries without Docker |
| crane | Bulk copy, tag, digest resolution |
| Helm | Package, template, and install charts from local .tgz |
| Harbor or Docker Registry | Private registry to host mirrored images |
| NIM Operator — Air-gapped environments | NIM-specific mirroring, secrets, and model cache behavior |
Related documentation
- Deploy With Helm for NeMo Retriever Library
- Deploy (Self-Hosted) Quickstart — includes Compose air-gap summary
- Environment Variables — runtime tuning and endpoints
- Generate Your NGC Keys — staging-time pulls from `nvcr.io`
Broader dependencies (outside this doc’s scope)
Plan separately for GPU drivers / GPU Operator, ingress / TLS, storage classes for Milvus and NIM PVCs, enterprise image scanning, and internal PyPI / wheel mirrors for Python clients. NeMo Retriever's ingest container is built so common tokenizer assets are present at build time; if you enable features that need extra Hugging Face access, you must pre-stage those artifacts per your security policy (see Environment Variables for tokens such as `HF_ACCESS_TOKEN`).