Utilities#

NIM includes a set of utility scripts to assist with NIM operation.

Utilities can be launched by adding the name of the desired utility to the docker run command. For example, you can execute the list-model-profiles utility with the following command:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles

You can get more information about each utility with the -h flag:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME download-to-cache -h

List Available Model Profiles#

Prints to the console the system information detected by NIM and the list of all profiles for the chosen NIM. Profiles are categorized by whether they are compatible with the current system, based on the detected system information.

list-model-profiles

Example#

Note

The model profile IDs shown in these examples might differ from the IDs in your logs. This is expected when you run on different GPUs, and profile IDs can also change with each release.

docker run -it --rm --gpus all $IMG_NAME list-model-profiles
SYSTEM INFO
- Free GPUs:
  -  [2331:10de] (0) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (1) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (2) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (3) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (4) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (5) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (6) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
  -  [2331:10de] (7) NVIDIA H100 PCIe (H100 80GB) [current utilization: 0%]
MODEL PROFILES
- Compatible with system and runnable:
  - 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce (h100-fp16-tp1-throughput)
  - With LoRA support:
- Incompatible with system:
  - 85a9d051b6ccfb146f764221fc73a3a0d8af9d2983ce42e04a112266f6b37524 (a100-fp16-tp1-throughput)
  - 76869db83fc0a776cf2eb6199ab173688c5ac03229aa33a00c59e17debb225ab (l40s-fp16-tp1-throughput)
  - 39dd37c06749c414ce4d1e5a87a26e0cd18f0471f5314247fa71c73f3d22fcee (h100l-fp16-tp1-throughput)
  - ...
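
Once you have identified a compatible profile, you can pin it at deployment time with the NIM_MODEL_PROFILE environment variable (also used in the air-gap example later in this section). A minimal sketch, assuming $IMG_NAME and $LOCAL_NIM_CACHE are set as in the other examples in this guide:

# Pin the h100-fp16-tp1-throughput profile found above (use the hash from your own log)
docker run -it --rm --gpus all -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  -p 8000:8000 \
  $IMG_NAME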

Download Model Profiles to NIM Cache#

Downloads the selected model profiles (or the default profile) to the NIM cache. Use this to pre-cache profiles before deployment. Requires NGC_API_KEY to be set in the environment.

download-to-cache

  --profiles [PROFILES ...], -p [PROFILES ...]
Profile hashes to download. If none are provided, the optimal profile is downloaded. Multiple profiles can be specified, separated by spaces.

  --all
Set to download all profiles to the cache.

  --lora
Set to download the default LoRA profile. This expects that the --profiles and --all arguments are not specified.

Example#

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  $IMG_NAME download-to-cache -p 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce
INFO 08-12 18:44:07.810 pre_download.py:80] Fetching contents for profile 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce
INFO 08-12 18:44:07.810 pre_download.py:86] {
  "gpu": "H100",
  "gpu_device": "2331:10de",
  "llm_engine": "tensorrt_llm",
  "pp": "1",
  "precision": "bf16",
  "profile": "throughput",
  "tp": "1"
}
...
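
To pre-cache every available profile rather than only the optimal one, pass the --all flag described above. A sketch, reusing the same $LOCAL_NIM_CACHE mount; note that downloading all profiles can consume substantial disk space:

# Download all profiles (can require significant disk space)
docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  $IMG_NAME download-to-cache --all

# Inspect the populated cache from the host
ls -lh "$LOCAL_NIM_CACHE"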

Create Model Store#

Extracts files from a cached model profile and creates a properly formatted model directory. If the profile is not already cached, it is downloaded to the model cache. Downloading the profile requires NGC_API_KEY to be set in the environment.

create-model-store

  --profile <PROFILE>, -p <PROFILE>
Profile hash to create a model directory from. The profile is downloaded if it is not already cached.


  --model-store <MODEL_STORE>, -m <MODEL_STORE>
Directory path to which the model profile specified with --profile is extracted and copied.

Example#

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME create-model-store -p 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce -m /tmp
INFO 08-12 19:49:47.629 pre_download.py:128] Fetching contents for profile 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce
INFO 08-12 19:49:47.629 pre_download.py:135] Copying contents for profile 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce to /tmp
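
In practice, you typically extract the model store to a mounted host directory rather than to /tmp inside the container, so the result survives after the container exits. A sketch, where $LOCAL_MODEL_STORE is a hypothetical host path of your choosing:

export LOCAL_MODEL_STORE=~/nim-model-store   # hypothetical host path
mkdir -p "$LOCAL_MODEL_STORE"

docker run -it --rm --gpus all -e NGC_API_KEY \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  -v "$LOCAL_MODEL_STORE":/model-store \
  $IMG_NAME create-model-store \
  -p 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce \
  -m /model-store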

Check NIM Cache#

Checks if the NIM cache directory is present and can be written to.

VLLM_NVEXT_LOG_LEVEL=debug nim-llm-check-cache-env

Example#

docker run -it --rm --gpus all -e VLLM_NVEXT_LOG_LEVEL=debug -v /bad_path:/opt/nim/.cache $IMG_NAME nim-llm-check-cache-env
WARNING 08-12 19:54:06.347 caches.py:30] /opt/nim/.cache is read-only, application may fail if model is not already present in cache
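
To avoid the warning above, ensure the host directory you mount exists and is writable by the UID the container runs as. A sketch, assuming you pass -u $(id -u) as in the air-gap example below:

# Prepare a writable cache directory on the host
mkdir -p "$LOCAL_NIM_CACHE"
chmod -R a+w "$LOCAL_NIM_CACHE"   # or chown it to the UID passed via -u

docker run -it --rm --gpus all -e VLLM_NVEXT_LOG_LEVEL=debug \
  -u $(id -u) -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  $IMG_NAME nim-llm-check-cache-env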

Set Cache Environment Variables#

Prints to the console the commands for setting cache-related environment variables.

nim-llm-set-cache-env

Example#

docker run -it --rm --gpus all -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME nim-llm-set-cache-env
export NUMBA_CACHE_DIR=/tmp/numba
export NGC_HOME=/opt/nim/.cache/ngc
export HF_HOME=/opt/nim/.cache/huggingface
export VLLM_CONFIG_ROOT=/opt/nim/.cache/vllm/config
export VLLM_CACHE_ROOT=/opt/nim/.cache/vllm/cache
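
The utility only prints these commands; it does not apply them. One way to apply them is to eval the output from an interactive shell inside the container. A sketch, assuming the image provides /bin/bash and that nim-llm-set-cache-env is on the container's PATH:

docker run -it --rm --gpus all -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  --entrypoint /bin/bash $IMG_NAME

# Inside the container:
eval "$(nim-llm-set-cache-env)"
echo "$NGC_HOME"   # should print /opt/nim/.cache/ngc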

Air Gap Deployment (offline cache route)#

NIM supports serving models on an air-gapped system (also known as an air wall, air gap, or disconnected network). If NIM detects a previously downloaded profile in the cache, it serves that profile from the cache. After downloading profiles to the cache with download-to-cache, you can transfer the cache to an air-gapped system and run NIM without any internet connection and with no connection to the NGC registry.

To do this, omit NGC_API_KEY from the docker run command, as shown in the following example.

# Create an example air-gapped directory where the downloaded NIM will be deployed
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache
mkdir -p "$AIR_GAP_NIM_CACHE"

# Transport the downloaded NIM to an air-gapped directory
cp -r "$LOCAL_NIM_CACHE"/* "$AIR_GAP_NIM_CACHE"

# Choose a container name for bookkeeping
export CONTAINER_NAME=nvidia-vila

# The repository and tag from the previous ngc registry image list command
Repository=vila-1.5-35b
Latest_Tag=latest

# Choose a LLM NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/${Repository}:${Latest_Tag}"

# Assuming the prior command was `download-to-cache`, which downloaded the optimal profile
docker run -it --rm --name=$CONTAINER_NAME \
   --runtime=nvidia \
   --gpus all \
   --shm-size=16GB \
   -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
   -u $(id -u) \
   -p 8000:8000 \
   $IMG_NAME

# Assuming the command run prior was `download-to-cache --profile 454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce`
docker run -it --rm --name=$CONTAINER_NAME \
   --runtime=nvidia \
   --gpus all \
   --shm-size=16GB \
   -e NIM_MODEL_PROFILE=454ab496734eabc2a40e03076f85a9691a1487a9fd987580a55421aaad2684ce \
   -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
   -u $(id -u) \
   -p 8000:8000 \
   $IMG_NAME
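
After the container reports that the server is ready, you can verify the air-gapped deployment from the host. A minimal check, assuming the standard NIM health endpoint is exposed on the published port:

# Returns an HTTP 200 response once the model is ready to serve
curl -s http://localhost:8000/v1/health/ready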