Air-Gapped Environment Deployment#

Overview#

Finetuning Microservices (FTMS) supports deployment in air-gapped environments, where Internet connectivity is restricted or unavailable. This feature enables organizations to use FTMS in secure, isolated environments while maintaining full functionality for model training, fine-tuning, and deployment.

The air-gapped deployment solution consists of three main components:

  1. Asset preparation (tao-core): Download and prepare pre-trained models, base experiments, and datasets in an Internet-connected environment

  2. Secure transfer: Transfer prepared assets (models + data) to the air-gapped environment via secure methods

  3. Chart deployment (tao-toolkit-api): Deploy FTMS in the air-gapped environment using Helm charts with SeaweedFS for distributed storage

Warning

Deployment in air-gapped environments requires disabling NVIDIA user authentication. Access control for the air-gapped deployment must be managed by the user.

Key Features#

  • Complete asset download: Download entire pre-trained models for offline use

  • Flexible model selection: Support for auto-discovery, CSV-based, and architecture-specific selection modes

  • Distributed storage: Integration with SeaweedFS for scalable file storage

  • Secure transfer: Support for multiple secure transfer methods

Prerequisites#

Internet-connected environment (for preparation):

  • Access to NGC (NVIDIA GPU Cloud) with valid API keys

  • Python environment with TAO Core repository cloned

  • Sufficient storage space for model downloads (can be several gigabytes per model)

  • Sufficient storage space for datasets (can be several gigabytes to terabytes depending on data size)

Air-gapped environment (for deployment):

  • Kubernetes cluster

  • Helm 3.x installed

  • Sufficient storage for models and data

  • Storage class for persistent volumes
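
As a quick sanity check before deployment, you can confirm the cluster tooling and storage class from a machine with kubectl access. This is a minimal sketch; the output and resource names will vary by cluster.

# Confirm Kubernetes and Helm versions
kubectl version
helm version

# Confirm a storage class exists for persistent volumes
kubectl get storageclass

# Confirm the nodes that will hold models and data are ready
kubectl get nodes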

Architecture#

Figure: FTMS air-gapped deployment architecture (airgapped-architecture.svg)

Configuration#

Environment variables:

The following environment variables control air-gapped mode behavior:

# Enable air-gapped mode
export AIRGAPPED_MODE=true

# NGC API key for model downloads (preparation phase only)
export PTM_API_KEY="your_ngc_api_key"

Helm chart configuration:

Configure your values.yaml file for air-gapped deployment:

# Environment configuration
airgapped:
  enabled: true
  environment:
    AIRGAPPED_MODE: "true"
    LOCAL_MODEL_REGISTRY: "/shared-storage/ptm/airgapped-models"  # Must match SeaweedFS upload path

Note

If you are using Docker Compose:

  • You can skip the Helm chart configuration and use the ./run.sh up-all --airgapped command to start the services in air-gapped mode.

  • You should change the local model registry in config.env to match the SeaweedFS upload path.

  • You should update the secrets.json file with your PTM_API_KEY.

  • Refer to Docker Compose Deployment for more details.

Asset Preparation (Internet-Connected Environment)#

The asset preparation phase must be completed in an Internet-connected environment. It includes model preparation using the pretrained_models.py script from tao-core, dataset preparation, and container image management.

Airgapped Docker Images#

For airgapped deployments, you can use the existing Docker image save/load scripts to transfer TAO container images to your airgapped environment. This process involves saving images on an Internet-connected machine and loading them on the airgapped machine.

The key steps involve:

  1. Saving images on an Internet-connected machine using ./save-docker-images.sh

  2. Transferring the saved images to your airgapped environment

  3. Loading images on the airgapped machine using ./load-docker-images.sh

This approach ensures that all required TAO container images are available locally without requiring internet access during deployment.

Container Registry Setup#

  1. Login to NGC Private Registry.

    docker login nvcr.io
    

    Note

You can use $oauthtoken as the username and a valid NGC API key as the password to log in to the registry.

  2. Clone the FTMS “getting started” repository.

    git clone https://github.com/NVIDIA/tao_tutorials.git
    
  3. Navigate to the tao_tutorials/setup/tao-docker-compose directory.

    cd tao_tutorials/setup/tao-docker-compose
    
  4. On your Internet-connected machine, run the script to generate a folder named saved_docker_images in the root of the repository.

    ./save-docker-images.sh
    
  5. Use scp or rsync to copy the folder to your airgapped machine.

    scp -r saved_docker_images user@airgapped-machine:/path/to/saved_docker_images
    
  6. On the airgapped machine, run the script to load the container images into the local Docker registry.

    ./load-docker-images.sh
    
  7. Verify that the container images are loaded successfully into Docker.

    docker images
    

Model Preparation#

Use the pretrained_models.py script from tao-core to download pre-trained models.

Step 1. Set Up the Environment#

# Install tao-core package
pip install -U nvidia-tao-core

# Set environment variables
export AIRGAPPED_MODE=True
export PTM_API_KEY="your_ngc_api_key"
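
Before downloading, you can confirm that the package is installed and the variables are visible to your shell. This is a minimal sketch using standard tooling; it only checks that the key is set without printing it.

# Confirm the package is installed and importable
pip show nvidia-tao-core
python -c "import nvidia_tao_core"

# Confirm the environment variables are set (without echoing the key itself)
echo "AIRGAPPED_MODE=$AIRGAPPED_MODE"
[ -n "$PTM_API_KEY" ] && echo "PTM_API_KEY is set" || echo "PTM_API_KEY is NOT set"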

Step 2. Choose a Download Method#

The script supports multiple model discovery and download methods. Option 2 (CSV file mode) is recommended for getting started quickly as it allows you to select only the specific models needed for your workflows, while Option 1 downloads all available TAO models.

Option 1. Auto-Discovery Mode (All TAO Models)#

Automatically discovers and downloads all available models from specified organizations:

# Download all models from nvidia/tao organization
python -m nvidia_tao_core.microservices.pretrained_models \
  --org-teams "nvidia/tao" \
  --ngc-key "your_personal_ngc_key" \
  --shared-folder-path ./airgapped-models

# Auto-discover all accessible organizations
python -m nvidia_tao_core.microservices.pretrained_models \
  --ngc-key "your_personal_ngc_key" \
  --shared-folder-path ./airgapped-models

Option 3. Model Names Mode (By Architecture)#

Downloads models by matching specific architecture names.

python -m nvidia_tao_core.microservices.pretrained_models \
  --model-names "classification_pyt,dino,segformer,centerpose" \
  --ngc-key "your_personal_ngc_key" \
  --shared-folder-path ./airgapped-models

Option 4. Combined Mode#

Uses both CSV file and auto-discovery together.

python -m nvidia_tao_core.microservices.pretrained_models \
  --use-both \
  --org-teams "nvidia/tao" \
  --ngc-key "your_personal_ngc_key" \
  --shared-folder-path ./airgapped-models

Step 3. Verify Download#

# Verify download completed successfully
ls -la airgapped-models/
# Should contain: ptm_metadatas.json and model directories (nvidia/, nvstaging/, etc.)
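
Beyond listing the directory, you can check that the metadata file parses as valid JSON and gauge the total size before packaging. This is a small sketch using standard tools.

# Confirm the metadata file is valid JSON
python -m json.tool airgapped-models/ptm_metadatas.json > /dev/null && echo "ptm_metadatas.json OK"

# Check the total download size before packaging and transfer
du -sh airgapped-models/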

Dataset Preparation#

Based on the TAO workflows you plan to run in an air-gapped environment, download and prepare the datasets in the Internet-connected environment:

# Create dataset directory structure
mkdir -p airgapped-datasets/{train,val,test}

# Copy your datasets to the prepared structure
cp -r /path/to/your/datasets/* airgapped-datasets/
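
The train/val/test layout above is only an example; the required structure depends on the workflow you plan to run. A quick sketch to review the prepared layout and sizes before packaging:

# Review the dataset directory structure and per-directory sizes
find airgapped-datasets -maxdepth 2 -type d
du -sh airgapped-datasets/*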

Step 4. Package and Transfer Assets#

# Combine models and datasets into a single transfer package
mkdir -p airgapped-assets
mv airgapped-models airgapped-assets/models
mv airgapped-datasets airgapped-assets/datasets  # If you have datasets

# Create compressed archive and checksum
tar -czf airgapped-assets.tar.gz airgapped-assets/
sha256sum airgapped-assets.tar.gz > airgapped-assets.tar.gz.sha256

# Transfer to air-gapped environment (USB, SCP, etc.)
# cp airgapped-assets.tar.gz* /media/usb-drive/
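
If the archive exceeds the file-size limit of your transfer medium (for example, FAT32-formatted USB drives cap files at 4 GB), a sketch using split to break the archive into chunks and reassemble it on the air-gapped side:

# Split the archive into 4 GB chunks for the transfer medium
split -b 4G airgapped-assets.tar.gz airgapped-assets.tar.gz.part-

# On the air-gapped side, reassemble and verify before extracting
cat airgapped-assets.tar.gz.part-* > airgapped-assets.tar.gz
sha256sum -c airgapped-assets.tar.gz.sha256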

Deployment (Air-Gapped Environment)#

Step 1. Extract Assets#

# Verify transfer integrity
sha256sum -c airgapped-assets.tar.gz.sha256

# Extract assets in air-gapped environment
tar -xzf airgapped-assets.tar.gz

# Verify extracted structure
ls -la airgapped-assets/
# Should contain: models/ and datasets/ directories

Step 2. Deploy FTMS and Set Up SeaweedFS#

# Deploy with air-gapped configuration
helm install tao-api /path/to/chart -f values.yaml

# Wait for pods to be ready
kubectl wait --for=condition=ready pod -l name=tao-api-app-pod --timeout=3600s

# Setup SeaweedFS access
CLUSTER_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
SEAWEED_ENDPOINT=http://$CLUSTER_IP:32333
export AWS_ACCESS_KEY_ID=seaweedfs
export AWS_SECRET_ACCESS_KEY=seaweedfs123

Note

If you are using Docker Compose, you can set the SEAWEED_ENDPOINT environment variable to http://localhost:8333.
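
Before uploading assets, you can confirm that the deployment picked up the air-gapped settings and that the SeaweedFS S3 gateway responds. This sketch assumes the chart injects the airgapped.environment entries as pod environment variables and reuses the pod label and credentials shown above.

# Confirm the FTMS pod sees the air-gapped environment variables
APP_POD=$(kubectl get pods -l name=tao-api-app-pod -o jsonpath='{.items[0].metadata.name}')
kubectl exec $APP_POD -- env | grep -E 'AIRGAPPED_MODE|LOCAL_MODEL_REGISTRY'

# Confirm the SeaweedFS S3 gateway answers on the NodePort
aws s3 ls --endpoint-url $SEAWEED_ENDPOINT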

Step 3. Upload Assets to SeaweedFS#

Important

The SeaweedFS upload path for models must exactly match the LOCAL_MODEL_REGISTRY path in your values.yaml file.

# Create storage bucket and upload assets
aws s3 mb --endpoint-url $SEAWEED_ENDPOINT s3://tao-storage

# Upload models (path must match LOCAL_MODEL_REGISTRY)
aws s3 cp --endpoint-url $SEAWEED_ENDPOINT \
  airgapped-assets/models/ \
  s3://tao-storage/shared-storage/ptm/airgapped-models/ \
  --recursive

# Upload datasets (if any)
aws s3 cp --endpoint-url $SEAWEED_ENDPOINT \
  airgapped-assets/datasets/ \
  s3://tao-storage/data/ \
  --recursive
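
After the upload, listing the bucket prefix is a quick way to confirm that the object path lines up with LOCAL_MODEL_REGISTRY (a sketch; the listing should show ptm_metadatas.json and the model directories):

# Verify the uploaded path matches LOCAL_MODEL_REGISTRY
aws s3 ls --endpoint-url $SEAWEED_ENDPOINT \
  s3://tao-storage/shared-storage/ptm/airgapped-models/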

Alternative: Direct Data Transfer via SSH Port Forwarding#

If your data is located on a different machine (e.g., an Internet-connected machine) than the one hosting the Kubernetes cluster with SeaweedFS, you can use SSH port forwarding to transfer data directly without having to first copy the data to the host machine. This method is particularly useful for transferring large datasets efficiently.

For Docker Compose deployments, port forwarding is not needed as SeaweedFS is accessible on localhost.

Setup on the Host Machine with Kubernetes:

First, set up port forwarding from the SeaweedFS pod to your local machine:

# Get the SeaweedFS S3 pod name
POD_NAME=$(kubectl get pods -l app.kubernetes.io/component=s3,app.kubernetes.io/name=seaweedfs -o jsonpath='{.items[0].metadata.name}')

# Set up port forwarding in the background
nohup kubectl port-forward pod/${POD_NAME} 8333:8333 --address 127.0.0.1 > /tmp/seaweedfs-port-forward.log 2>&1 &
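
You can confirm the forward is up before opening the SSH tunnel. A quick sketch; ss requires the iproute2 package, and netstat works as an alternative.

# Confirm the port-forward process is alive and listening on 8333
tail /tmp/seaweedfs-port-forward.log
ss -ltn | grep 8333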

Setup on the Machine with Data:

On the machine where your data is present, establish an SSH tunnel to the host machine:

# Create SSH tunnel to forward local port 8333 to the host machine
ssh -L 8333:localhost:8333 <username>@<host_machine_ip>

Once the SSH tunnel is established, configure AWS CLI and transfer your data:

# Install AWS CLI if not already installed
pip install awscli

# Configure AWS CLI with SeaweedFS credentials
aws configure set aws_access_key_id seaweedfs --profile seaweedfs
aws configure set aws_secret_access_key seaweedfs123 --profile seaweedfs
aws configure set region us-east-1 --profile seaweedfs

# Verify connection
aws s3 --endpoint-url http://localhost:8333 --profile seaweedfs ls

# Create bucket (if not already created)
aws s3 --endpoint-url http://localhost:8333 --profile seaweedfs mb s3://tao-storage

# List bucket contents
aws s3 --endpoint-url http://localhost:8333 --profile seaweedfs ls s3://tao-storage/

# Upload data directly to SeaweedFS
aws s3 --endpoint-url http://localhost:8333 --profile seaweedfs cp ~/data/annotations.json s3://tao-storage/data/cosmos_rl_its_subset/
aws s3 --endpoint-url http://localhost:8333 --profile seaweedfs cp ~/data/videos.tar.gz s3://tao-storage/data/cosmos_rl_its_subset/
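
For whole directories, aws s3 sync is an alternative to cp worth considering: it only transfers files that are missing or changed, which helps if a large upload is interrupted and rerun. The source and destination paths below follow the example above.

# Sync an entire local data directory to the bucket (only new or changed files are sent)
aws s3 --endpoint-url http://localhost:8333 --profile seaweedfs sync ~/data/ s3://tao-storage/data/cosmos_rl_its_subset/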

Note

This method enables direct data transfer from an Internet-connected machine to the SeaweedFS instance without requiring an intermediate transfer to the host machine. This is particularly efficient for large datasets.

Step 4. Load Model Metadata#

Use the FTMS API to load model metadata into the database.

import json

import requests

# Ensure that the airgapped-models/ directory with ptm_metadatas.json has been uploaded.

# Values specific to your deployment. Replace the placeholders before running.
host_url = "http://<ftms-host>:<port>"
ngc_org_name = "<your_ngc_org>"
ngc_key = "<your_ngc_key>"

# 1. Log in to the TAO-API service.
data = json.dumps({"ngc_org_name": ngc_org_name, "ngc_key": ngc_key})
response = requests.post(f"{host_url}/api/v1/login", data=data)
token = response.json()["token"]
base_url = f"{host_url}/api/v1/orgs/{ngc_org_name}"
headers = {"Authorization": f"Bearer {token}"}

# 2. Create a cloud workspace that points at the SeaweedFS bucket.
cloud_metadata = {
    "name": "SeaweedFS Workspace",
    "cloud_type": "seaweedfs",
    "cloud_specific_details": {
        "cloud_region": "us-east-1",
        "cloud_bucket_name": "tao-storage",
        "access_key": "seaweedfs",
        "secret_key": "seaweedfs123",
        "endpoint_url": "http://seaweedfs-s3:8333"
    }
}
endpoint = f"{base_url}/workspaces"
data = json.dumps(cloud_metadata)
response = requests.post(endpoint, headers=headers, data=data)
workspace_id = response.json()["id"]

# 3. Load the base experiments into the database.
endpoint = f"{base_url}/experiments:load_airgapped"
data = {"workspace_id": workspace_id}
response = requests.post(endpoint, headers=headers, json=data)
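
To confirm that the base experiments were loaded, you can list the experiments collection through the same API. This is a sketch using curl; it assumes a standard GET listing endpoint for experiments and uses placeholder shell variables that mirror the values from the Python snippet above.

# Set TOKEN, HOST_URL, and NGC_ORG_NAME to the values used in the Python snippet above
curl -s -H "Authorization: Bearer $TOKEN" \
  "$HOST_URL/api/v1/orgs/$NGC_ORG_NAME/experiments" | head -c 500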

Troubleshooting#

These are some common issues you may encounter:

  1. “Failed to download JSON file from cloud storage”:

     • Ensure that the SeaweedFS upload path matches LOCAL_MODEL_REGISTRY in values.yaml.

     • Verify that ptm_metadatas.json exists in uploaded models.

  2. SeaweedFS connection issues:

     • Check pods: kubectl get pods -l app.kubernetes.io/name=seaweedfs

     • Test connectivity: curl http://seaweedfs-s3:8333

  3. Path mismatch:

     • Verify that the models were uploaded to the correct S3 path matching LOCAL_MODEL_REGISTRY.

     • Use aws s3 ls to verify the path structure.

Conclusion#

The FTMS air-gapped deployment feature enables you to create secure, isolated environments while maintaining full functionality. By combining model preparation from tao-core with the deployment infrastructure from tao-toolkit-api, you can meet strict security requirements while supporting complete ML/AI workflows.