NVIDIA NIM for large language models (LLMs) supports serving models in an air-gapped system (also known as an air wall, air gap, or disconnected network). Before you use this documentation, review all prerequisites and instructions in Getting Started, and see Serving models from local assets.

Air Gap Deployment (offline cache route)

If NIM detects a previously loaded profile in the cache, it serves that profile from the cache. After you download the profiles to the cache with download-to-cache, you can transfer the cache to an air-gapped system and run NIM there without any internet connection and without access to the NGC registry.
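The documentation does not prescribe how the cache is moved. As a minimal sketch, assuming a Linux host with tar and GNU coreutils sha256sum, the two hypothetical shell functions below (pack_cache and unpack_cache are illustrative names, not NIM commands) archive a populated cache on the connected host, record a checksum, and verify that checksum on the air-gapped host before extracting.

```shell
# Hypothetical helpers (not part of NIM) for moving a populated NIM cache
# into an air-gapped system using tar and GNU coreutils sha256sum.

# pack_cache SRC_DIR ARCHIVE
# Connected host: archive the contents of SRC_DIR (the directory that
# download-to-cache populated) into ARCHIVE and write ARCHIVE.sha256 next to it.
# Keep ARCHIVE outside SRC_DIR so the tarball does not try to include itself.
pack_cache() {
  local src="$1" archive="$2"
  tar -czf "$archive" -C "$src" . || return 1
  # Record only the file name, so the checksum can be verified wherever the
  # archive and its .sha256 file end up, as long as they stay together.
  (cd "$(dirname "$archive")" &&
    sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256")
}

# unpack_cache ARCHIVE DEST_DIR
# Air-gapped host: verify ARCHIVE against ARCHIVE.sha256 and, only if the
# checksum matches, extract it into DEST_DIR (the directory you will use as
# the NIM cache). Returns non-zero and extracts nothing on a mismatch.
unpack_cache() {
  local archive="$1" dest="$2"
  (cd "$(dirname "$archive")" &&
    sha256sum -c --quiet "$(basename "$archive").sha256") || return 1
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest"
}
```

In this sketch you would call pack_cache with the cache directory that download-to-cache wrote to, carry both the archive and its .sha256 file across on your approved offline medium, and then call unpack_cache on the air-gapped host. Verifying the checksum before extraction is a design choice to catch a truncated or corrupted copy before NIM tries to load it.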

To run NIM from the transferred cache, do NOT provide the NGC_API_KEY environment variable when you start the container, as shown in the following example.