Air Gap Deployment for NVIDIA NIM for LLMs#

NVIDIA NIM for large language models (LLMs) supports serving models in an air gap system (also known as an air wall, air-gapped network, or disconnected network). In an air gap system, you can run a NIM with no internet connection and no connection to the NGC registry or Hugging Face Hub.

Before you use this documentation, review all prerequisites and instructions in Get Started with NIM, and see Serving models from local assets.

You have two options for air gap deployment: offline cache and local model directory.

Air Gap Deployment for LLM-agnostic NIMs#

Local Model Directory Option#

The solution for the air gap route is to deploy a model repository that you create with the create-model-store command inside the NIM container. The command creates a repository for a single model, as shown in the following example. HF_TOKEN must be set so this tool can download the model when it creates the model store.

Initialize Container Setup#

# Choose a container name for bookkeeping
export CONTAINER_NAME=llm-nim

# The repository name from the previous ngc registry image list command
Repository=nim/nvidia/llm-nim

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.11.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

export MODEL_REPO=/path/to/model-repository
# provide write permissions to model-repo for the user
chown -R $(id -u) $MODEL_REPO
export NIM_SERVED_MODEL_NAME=my-model
# HuggingFace model repository
export NIM_MODEL_NAME=hf://nvidia/Llama-3.1-Nemotron-Nano-8B-v1
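
# HF_TOKEN must be set in your environment so create-model-store can pull the model
# from Hugging Face. The value below is a placeholder; substitute your own token.
export HF_TOKEN=<your Hugging Face token>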

Create Model Store in /path/to/model-repository#

docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -e HF_TOKEN \
  -p 8000:8000 \
  $IMG_NAME create-model-store --model-repo $NIM_MODEL_NAME --model-store /model-repo 
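
After the command completes, you can optionally confirm that the model files landed in the model repository before moving it to the air-gapped system. The listing below is only a sanity check; the exact directory layout depends on the model.

ls -R "$MODEL_REPO"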

Now run the following docker command in the air-gapped environment. Do not set HF_TOKEN in the air-gapped environment, as shown in the following example:

docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
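
Once the container reports that it is ready, you can verify the deployment from inside the air-gapped network by querying the OpenAI-compatible API. The example assumes the served model name my-model set through NIM_SERVED_MODEL_NAME and the -p 8000:8000 mapping above.

# List the models the NIM is serving
curl -s http://localhost:8000/v1/models

# Send a small chat completion request to the served model
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'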

Air Gap Deployment for LLM-specific NIMs#

Offline Cache Option#

If NIM detects a previously loaded profile in the cache, it serves that profile from the cache. After downloading the profiles to cache by using download-to-cache, you can transfer the cache to an air-gapped system to run a NIM without any internet connection and with no connection to the NGC registry.
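
To find a profile ID to download, you can list the profiles available on an internet-connected system first, assuming the list-model-profiles utility available in the NIM container, as shown below.

# Show the model profiles available in this NIM image
list-model-profiles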

download-to-cache -p 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b

After running download-to-cache, do NOT provide the NGC_API_KEY, as shown in the following example.

# Create an example air-gapped directory where the downloaded NIM will be deployed
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache
mkdir -p "$AIR_GAP_NIM_CACHE"

# Transport the downloaded NIM to an air-gapped directory
cp -r "$LOCAL_NIM_CACHE"/* "$AIR_GAP_NIM_CACHE"
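
# If the air-gapped host is a separate machine, one option is to archive the cache,
# move the archive with your approved transfer mechanism, and unpack it on the target.
# The archive name and paths below are examples only.
tar -czf nim-cache.tar.gz -C "$LOCAL_NIM_CACHE" .
# After transferring nim-cache.tar.gz to the air-gapped host:
# tar -xzf nim-cache.tar.gz -C "$AIR_GAP_NIM_CACHE"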

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository name from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.11.0"

# Assuming the command run prior was `download-to-cache`, downloading the optimal profile
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME

# Assuming the command run prior was `download-to-cache --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b`
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_PROFILE=09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
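
In either case, you can confirm that the NIM started correctly without leaving the air-gapped network by using the readiness and liveness endpoints, assuming the -p 8000:8000 mapping above.

# Returns HTTP 200 when the model is loaded and ready to serve requests
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/v1/health/ready

# Returns HTTP 200 while the service process is alive
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/v1/health/live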

Local Model Directory Option#

Another option for the air gap route is to deploy a model repository that you create with the create-model-store command inside the NIM container. The command creates a repository for a single model, as shown in the following example.

# provide write permissions to model-repo for the user
chown -R $(id -u) $MODEL_REPO
# ensure the model store is writable from inside the container
create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repo
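
If you run create-model-store from the host rather than from a shell inside the container, one way to invoke it (mirroring the LLM-agnostic example above) is to wrap it in docker run on the internet-connected system, using the CONTAINER_NAME, IMG_NAME, and MODEL_REPO values shown in the rest of this example. Passing NGC_API_KEY here is an assumption based on the profile download needing registry access; adjust for your environment.

docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v $MODEL_REPO:/model-repo \
  $IMG_NAME create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repo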

After running create-model-store, do NOT provide the NGC_API_KEY, as shown in the following example.

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository name from the previous ngc registry image list command
Repository=nim/meta/llama-3.1-8b-instruct

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.11.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

export MODEL_REPO=/path/to/model-repository
export NIM_SERVED_MODEL_NAME=my-model

docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
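
As a final check in the air-gapped environment, you can confirm that the local model directory is being served under the name set by NIM_SERVED_MODEL_NAME, again assuming the -p 8000:8000 mapping above.

# The response should list the served model name, for example my-model
curl -s http://localhost:8000/v1/models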