Air-Gapped Deployment#

NVIDIA NIM for Visual Generative AI supports serving models in air-gapped environments (also called air wall, air-gapping, or disconnected networks). In an air-gapped setup, you run NIM without internet access and without a connection to the NGC registry.

Before you begin, review all prerequisites and instructions in Getting Started.

For air-gapped deployment steps, refer to the offline cache option described below.

Air-Gapped Deployment (Offline Cache Option)#

When NIM finds a previously downloaded profile in its local cache, it automatically serves that profile from the cache without contacting the NGC registry.

Prerequisites#

Before you deploy in an air-gapped environment, use an internet-connected system to download the model profiles. Follow these steps:

Step 1: Set environment variables on the internet-connected system

# Set your NIM image name (replace with the actual image name from NGC)
export CONTAINER_NAME=<your-nim-image-name>

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
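The container launch in Step 2 passes NGC_API_KEY and HF_TOKEN into the container. If you have not already exported them on the internet-connected system, set them as well; the values below are placeholders:

# NGC API key used to authenticate the profile download (placeholder value)
export NGC_API_KEY=<your-ngc-api-key>

# Hugging Face token, if the selected model requires one (placeholder value)
export HF_TOKEN=<your-hf-token>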

Step 2: Download model profiles to the cache on the internet-connected system

Launch the NIM container with Docker or Podman to use its download utilities:

Docker:

docker run -it --rm --name=nim-server \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  $CONTAINER_NAME bash

Podman:

podman run -it --rm --name=nim-server \
  --device nvidia.com/gpu=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  $CONTAINER_NAME bash

In the container, list the available profiles, then download the one you need:

# List available model profiles
list-model-profiles

# Download the profile that you will serve on the air-gapped system (replace with the actual profile ID)
download-to-cache --profile 7ce6d27f47b2ae9e076714b14b14f0cac86a3ecf53cbf258d47970aac76b2c74

When the download finishes, exit the container:

exit
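Optionally, confirm from the host that the cache is populated before you transfer it. A quick check, using the cache path set in Step 1:

# Inspect the cache size and contents on the host
du -sh "$LOCAL_NIM_CACHE"
ls "$LOCAL_NIM_CACHE"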

Air-Gapped System Deployment#

Step 3: Transfer the cache to the air-gapped system

Copy the entire cache directory from the internet-connected system to the air-gapped system.
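Any transfer mechanism your environment allows works, for example packing the cache into an archive and moving it over removable media or scp. A minimal sketch; the archive name and staging path are placeholders, and the staging directory corresponds to <path-to-transferred-cache> in Step 4:

# On the internet-connected system: pack the cache into a single archive
tar -czf nim-cache.tar.gz -C "$LOCAL_NIM_CACHE" .

# Move nim-cache.tar.gz to the air-gapped system, then unpack it into a
# staging directory there (this becomes <path-to-transferred-cache> in Step 4)
mkdir -p ~/nim-cache-transfer
tar -xzf nim-cache.tar.gz -C ~/nim-cache-transfer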

Step 4: Set environment variables on the air-gapped system

# Set your NIM image name (same as used for download)
export CONTAINER_NAME=<your-nim-image-name>

# Path to the transferred cache directory on the air-gapped system
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache

# Set your NIM model profile ID (the profile you downloaded in Step 2)
export NIM_MODEL_PROFILE=<your-nim-model-profile>

# Ensure the directory exists and copy the transferred cache
mkdir -p "$AIR_GAP_NIM_CACHE"
cp -r <path-to-transferred-cache>/* "$AIR_GAP_NIM_CACHE"

Step 5: Launch NIM on the air-gapped system

Launch the container with Docker or Podman:

Docker:

docker run -it --rm --name=nim-server \
  --runtime=nvidia \
  --gpus all \
  -e NIM_MODEL_PROFILE=$NIM_MODEL_PROFILE \
  -e TORCH_CACHE_PATH=/tmp/torch_cache \
  -e TORCH_EXTENSIONS_DIR=/tmp/torch_cache/torch_extensions \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  $CONTAINER_NAME

Podman:

podman run -it --rm --name=nim-server \
  --device nvidia.com/gpu=all \
  -e NIM_MODEL_PROFILE=$NIM_MODEL_PROFILE \
  -e TORCH_CACHE_PATH=/tmp/torch_cache \
  -e TORCH_EXTENSIONS_DIR=/tmp/torch_cache/torch_extensions \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  $CONTAINER_NAME

In addition to NIM_MODEL_PROFILE, set TORCH_CACHE_PATH and TORCH_EXTENSIONS_DIR; NIM uses these locations for system-specific optimizations.
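The commands above run NIM interactively in the foreground. If you prefer to run it as a background service, a sketch of the same Docker command in detached mode (the equivalent -d flag also works with Podman):

# Run the server detached instead of interactively, then follow its logs
docker run -d --rm --name=nim-server \
  --runtime=nvidia \
  --gpus all \
  -e NIM_MODEL_PROFILE=$NIM_MODEL_PROFILE \
  -e TORCH_CACHE_PATH=/tmp/torch_cache \
  -e TORCH_EXTENSIONS_DIR=/tmp/torch_cache/torch_extensions \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  $CONTAINER_NAME

docker logs -f nim-server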

(Optional) Verification#

After NIM starts on the air-gapped system, verify the service is ready to handle inference requests:

curl -X GET http://localhost:8000/v1/health/ready

Example Output

{"description":"Triton liveness check","status":"live"}