Utilities#

NIM includes a set of utility scripts to assist with NIM operation.

Utilities are launched by setting the container entrypoint to the name of the desired utility in the docker run or podman run command. For example, you can run the list-model-profiles utility with the following command:

docker run --rm --runtime=nvidia --gpus=all --entrypoint nim_list_model_profiles $IMG_NAME
podman run --rm --device nvidia.com/gpu=all --entrypoint nim_list_model_profiles $IMG_NAME
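
The examples on this page assume that $IMG_NAME is set to the NIM container image you pulled from NGC; for instance, with a placeholder image name:

export IMG_NAME=nvcr.io/nim/<org>/<image>:<tag>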

You can get more information about each utility with the -h flag:

docker run --rm --runtime=nvidia --gpus=all --entrypoint nim_list_model_profiles $IMG_NAME -h
podman run --rm --device nvidia.com/gpu=all --entrypoint nim_list_model_profiles $IMG_NAME -h

List available model profiles#

nim_list_model_profiles()#

Prints to the console the system information detected by NIM and the list of all profiles for the chosen NIM. Profiles are categorized by whether they are compatible with the current system, based on the detected system information. This function can also be called using its alias list-model-profiles.

--manifest-file <manifest_file>, -m <manifest_file>#

The manifest file path is an optional parameter; when specified, the utility reads profiles from the given manifest instead of the default.

Example#

docker run -it --rm --gpus all --entrypoint nim_list_model_profiles $IMG_NAME -m $HOME/model_manifest.yaml
podman run --rm --device nvidia.com/gpu=all --entrypoint nim_list_model_profiles $IMG_NAME -m $HOME/model_manifest.yaml
SYSTEM INFO
- Free GPUs:
  -  [26b5:10de] (0) NVIDIA L40 (L40S) [current utilization: 1%]
MODEL PROFILES
- All profiles:
  - 537add8d8b4db6120a905d2e43f674de8b671af8546a109d1ac68b1553481232 (backend=tensorrt, gpu=a10g, number_of_gpus=1, precision=int8, resolution=1024x1024, variant=base+refiner)
  - 596644b463df89398e2ca7309b57ac2320d396e5fa35ffd6c40174e5e262ea45 (backend=tensorrt, gpu=a100, number_of_gpus=1, precision=int8, resolution=1024x1024, variant=base+refiner)
  - 70526c17e24b2ce6d9a7901f6f66908468f42b1edf7eb57be67369bc0505ccf0 (backend=pytorch, gpu=generic, number_of_gpus=1, precision=fp16, variant=base)
  - 7212633181c6e19cf5fdff678b14994009bf2fb1132c0c2f199dcce9af68ada6 (backend=tensorrt, gpu=l40, number_of_gpus=1, precision=int8, resolution=1024x1024, variant=base+refiner)
  - 91537943f8f27fe7f7f40fb4bfbf4671008490c08811e17bdfe409648ce43780 (backend=tensorrt, gpu=a10g, number_of_gpus=1, precision=int8, resolution=768-1344x768-1344, variant=base+refiner)
  - acc5c183e9b76e38e70b94f8d4ee2ac6d607e59aa285e52894271b3688db1cad (backend=tensorrt, gpu=l40, number_of_gpus=1, precision=int8, resolution=768-1344x768-1344, variant=base+refiner)
  - badba2b53daa7021ddbdcf297a8d1cae41fb384fd3eee430001abc22b5df5734 (backend=tensorrt, gpu=h100, number_of_gpus=1, precision=fp8, resolution=1024x1024, variant=base+refiner)
  - bf33593ff91d33ccb81a70a90cdcc4b1b19cfa578b4030a15d2e6986b7057a0d (backend=pytorch, gpu=generic, number_of_gpus=1, precision=fp16, variant=base+refiner)
  - c37834c9f0b27b5502459f1939c950d35e40996a1b144bf5de3f91d4cf8a149d (backend=tensorrt, gpu=a100, number_of_gpus=1, precision=int8, resolution=768-1344x768-1344, variant=base+refiner)
  - f26976025835def8c32b29b910ee5fb33aac46ca368de7fd3f46a0ad65e1c9c7 (backend=tensorrt, gpu=h100, number_of_gpus=1, precision=fp8, resolution=768-1344x768-1344, variant=base+refiner)

Download model profiles to NIM cache#

nim_download_to_cache()#

Downloads the selected or default model profile(s) to the NIM cache. Can be used to pre-cache profiles before deployment. Requires NGC_API_KEY to be set in the environment. This function can also be called using its alias download-to-cache.

--profiles [PROFILES ...], -p [PROFILES ...]#

Profile hashes to download. If none are provided, the optimal profile is downloaded. Multiple profiles can be specified, separated by spaces.
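
For example, a sketch of downloading two specific profiles in one invocation (the hashes are placeholders; use values reported by list-model-profiles):

docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME -p <profile_hash_1> <profile_hash_2>
podman run --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME -p <profile_hash_1> <profile_hash_2>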

--all#

Download all available profiles to the cache.
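
For example (note that downloading every profile can consume significant disk space):

docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME --all
podman run --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME --all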

--lora#

Download the default LoRA profile. This option cannot be combined with --profiles or --all.
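
For example (with -p/--profiles and --all omitted):

docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME --lora
podman run --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME --lora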

--manifest-file <manifest_file>, -m <manifest_file>#

The manifest file path is an optional parameter; when specified, profiles are resolved and downloaded from the given manifest instead of the default.
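
For example, following the manifest path convention from the list-model-profiles example above:

docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME -m $HOME/model_manifest.yaml
podman run --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME -m $HOME/model_manifest.yaml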

--model-cache-path <model-cache-path>#

The model cache path is an optional parameter that overrides the default model_cache_path, controlling where downloaded profiles are stored.
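
For example, a sketch that redirects the cache to a non-default location inside the container (/opt/nim/custom_cache is an illustrative path; the host volume is mounted to match):

docker run --rm --runtime=nvidia --gpus=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/custom_cache --entrypoint nim_download_to_cache $IMG_NAME --model-cache-path /opt/nim/custom_cache
podman run --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/custom_cache --entrypoint nim_download_to_cache $IMG_NAME --model-cache-path /opt/nim/custom_cache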

Example#

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME -p 8543f214b6e05f3b2b25596c7a4990ee43d727c4671b13db58a2adf3630d7efe
podman run -it --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_download_to_cache $IMG_NAME -p 8543f214b6e05f3b2b25596c7a4990ee43d727c4671b13db58a2adf3630d7efe
"timestamp": "2024-10-28 12:54:40,457", "level": "INFO", "message": "Fetching contents for profile 8543f214b6e05f3b2b25596c7a4990ee43d727c4671b13db58a2adf3630d7efe"
"timestamp": "2024-10-28 12:54:40,457", "level": "INFO", "message": "{
"gpu": "a6000",
"llm_engine": "trtllm",
"precision": "fp16",
"tp": "1"
}"
...

Create model store#

nim_create_model_store()#

Extracts files from a cached model profile and creates a properly formatted model directory. If the profile is not already cached, it is downloaded to the model cache; downloading requires NGC_API_KEY to be set in the environment. This function can also be called using its alias create-model-store.

--profile <PROFILE>, -p <PROFILE>#

Hash of the profile to extract into a model directory. The profile is downloaded if it is not already cached.

--model-store <MODEL_STORE>, -m <MODEL_STORE>#

Directory path into which the contents of the selected --profile are extracted and copied.

--model-cache-path <model-cache-path>#

The model cache path is an optional parameter that overrides the default model_cache_path.

Example#

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_create_model_store $IMG_NAME -p 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb -m /tmp
podman run -it --rm --device nvidia.com/gpu=all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_create_model_store $IMG_NAME -p 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb -m /tmp
"timestamp": "2024-10-28 12:58:37,113", "level": "INFO", "message": "Creating model store at /tmp"
"timestamp": "2024-10-28 12:58:37,114", "level": "INFO", "message": "Copying contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb to /tmp"

Check NIM cache#

nim_check_cache_env()#

Checks whether the NIM cache directory is present and writable. This function can also be called using its alias nim-llm-check-cache-env.

Example#

docker run -it --rm --gpus all -v /bad_path:/opt/nim/.cache --entrypoint nim_check_cache_env $IMG_NAME
podman run -it --rm --device nvidia.com/gpu=all -v /bad_path:/opt/nim/.cache --entrypoint nim_check_cache_env $IMG_NAME
WARNING 08-12 19:54:06.347 caches.py:30] /opt/nim/.cache is read-only, application may fail if model is not already present in cache
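
For comparison, mounting a host directory that the container user can write to (an existing, writable $LOCAL_NIM_CACHE here) should complete the check without this warning:

docker run -it --rm --gpus all -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_check_cache_env $IMG_NAME
podman run -it --rm --device nvidia.com/gpu=all -v $LOCAL_NIM_CACHE:/opt/nim/.cache --entrypoint nim_check_cache_env $IMG_NAME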