Large Language Models (1.1.0)

Utilities

NIM includes a set of utility scripts to assist with NIM operation.

To launch a utility, append its name to the docker run command, after the image name. For example, the following command runs the list-model-profiles utility:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles

You can get more information about each utility with the -h flag:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME download-to-cache -h
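
Because every utility is invoked the same way, a small shell function can save retyping the common docker run flags. This is a sketch, not something shipped with NIM: the function name `nim_util` and the `DOCKER` override (set it to `echo` to print the command instead of running it) are hypothetical, and it assumes `IMG_NAME` is already set.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of NIM): runs a NIM utility in a throwaway
# container. Assumes IMG_NAME is set. Set DOCKER=echo to print the command
# instead of executing it.
nim_util() {
  if [ -z "${IMG_NAME:-}" ]; then
    echo "IMG_NAME is not set" >&2
    return 1
  fi
  # Everything after the image name is passed to the container as the
  # utility name and its arguments.
  "${DOCKER:-docker}" run --rm --runtime=nvidia --gpus=all "$IMG_NAME" "$@"
}
```

For example, `nim_util list-model-profiles` or `nim_util download-to-cache -h`. Utilities that need NGC_API_KEY or the cache volume require extra `-e` and `-v` flags, which this wrapper deliberately does not add.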

list-model-profiles

Prints the system information detected by NIM and lists all profiles for the chosen NIM. Profiles are grouped by whether they are compatible with the current system, based on the detected system information.

Example

docker run -it --rm --gpus all $IMG_NAME list-model-profiles

SYSTEM INFO
- Free GPUs:
  - [20b2:10de] (0) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (1) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (2) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (3) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (4) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (5) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (6) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
  - [20b2:10de] (7) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 0%]
MODEL PROFILES
- Compatible with system and runnable:
  - d86754a6413430bf502ece62fdcc8137d4ed24d6062e93c23c1090f0623d535f (tensorrt_llm-a100-bf16-tp8-latency)
  - 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb (tensorrt_llm-a100-bf16-tp4-throughput)
  - 7283d5adcddeeab03996f61a33c51552d9bcff16c38e4a52f1204210caeb393c (vllm-fp16-tp8)
  - cdcbc486dd076bc287cca6262c59fe90057d76ae18a407882075f65a99f5f038 (vllm-fp16-tp4)
  - With LoRA support:
    - 4cac7d500b9ed35bc51cb7845e637288c682f4a644f0b4e6a4f71d3b8b188101 (tensorrt_llm-a100-bf16-tp4-throughput-lora)
    - 7096ab12e70abc4ac0e125a90a8e40b296891603fad45d2b208d655ac1dea9d8 (vllm-fp16-tp8-lora)
    - d4bc4be4167c103b45d9375c9a907c11339f59235dfc5de321a9e13d8132aba6 (vllm-fp16-tp4-lora)
- Incompatible with system:
  - 5296eed82c6309b64b13da03fbb843d99c3276effd6a0c51e28ad5bb29f56017 (tensorrt_llm-h100-fp8-tp8-latency)
  - 4e0aeeefd4dfeae46ad40f16238bbde8858850ce0cf56c26449f447a02a9ac8f (tensorrt_llm-h100-fp8-tp4-throughput)
  - ...
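
The hashes in the compatible section are the values that the -p option of download-to-cache and create-model-store expects. A filter can pull them out of the listing. This is a hypothetical helper, not part of NIM; it assumes the tool prints one "- <64-character hash> (<name>)" entry per line, as in the sample output, and that format is not a documented, stable interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NIM): reads list-model-profiles output on
# stdin and prints "<hash> <name>" for every profile listed between
# "Compatible with system and runnable:" and "Incompatible with system:"
# (LoRA profiles in that section are included).
compatible_profiles() {
  awk '
    /Compatible with system and runnable:/ { in_compat = 1; next }
    /Incompatible with system:/            { in_compat = 0; next }
    in_compat && $1 == "-" && length($2) == 64 && $2 ~ /^[0-9a-f]+$/ {
      name = $3
      gsub(/[()\r]/, "", name)   # strip parentheses and any trailing CR
      print $2, name
    }
  '
}
```

A possible invocation is `docker run --rm --gpus all "$IMG_NAME" list-model-profiles 2>&1 | compatible_profiles`. The `-it` flags are dropped because a TTY is not needed when piping, and `2>&1` is only a precaution in case the listing is written to stderr.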

download-to-cache

Downloads the selected model profiles, or the default profile if none are selected, to the NIM cache. Use it to pre-cache profiles before deployment. Requires NGC_API_KEY to be set in the environment.

--profiles [PROFILES ...], -p [PROFILES ...]

Profile hashes to download. If none are provided, the optimal profile is downloaded. To download several profiles, list their hashes separated by spaces.

--all

Download all profiles to the cache.

--lora

Download the default LoRA profile. Do not combine with --profiles or --all.

Example

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  $IMG_NAME download-to-cache -p 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb

INFO 08-12 18:44:07.810 pre_download.py:80] Fetching contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb
INFO 08-12 18:44:07.810 pre_download.py:86] {
  "feat_lora": "false",
  "gpu": "A100",
  "gpu_device": "20b2:10de",
  "llm_engine": "tensorrt_llm",
  "pp": "1",
  "precision": "bf16",
  "profile": "throughput",
  "tp": "4"
}
...
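
To pre-cache profiles before deployment, the command above can be wrapped in a function that also creates the host cache directory. This is a sketch, not part of NIM: the name `precache` and the `DOCKER` override are hypothetical, and it assumes `IMG_NAME`, `NGC_API_KEY`, and `LOCAL_NIM_CACHE` are exported. Arguments are passed through unchanged, so the -p, --all, and --lora options behave as documented above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NIM) around download-to-cache.
# Assumes IMG_NAME, NGC_API_KEY and LOCAL_NIM_CACHE are exported.
# Set DOCKER=echo to print the docker command instead of running it.
precache() {
  if [ -z "${LOCAL_NIM_CACHE:-}" ]; then
    echo "LOCAL_NIM_CACHE is not set" >&2
    return 1
  fi
  # Create the host-side cache so the bind mount has somewhere to write.
  mkdir -p "$LOCAL_NIM_CACHE" || return 1
  "${DOCKER:-docker}" run --rm --gpus all -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    "$IMG_NAME" download-to-cache "$@"
}

# Usage:
#   precache                    # optimal profile for this system
#   precache -p HASH_1 HASH_2   # specific profiles, space separated
#   precache --all              # every profile
#   precache --lora             # default LoRA profile (no -p / --all)
```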

create-model-store

Extracts files from a cached model profile and creates a properly formatted model directory. If the profile is not already cached, it is first downloaded to the model cache; downloading requires NGC_API_KEY to be set in the environment.

--profile <PROFILE>, -p <PROFILE>

Hash of the profile from which to create the model directory. The profile is downloaded if it is not already cached.

--model-store <MODEL_STORE>, -m <MODEL_STORE>

Directory into which the files of the --profile profile are extracted and copied.

Example

docker run -it --rm --gpus all -e NGC_API_KEY -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME create-model-store -p 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb -m /tmp

INFO 08-12 19:49:47.629 pre_download.py:128] Fetching contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb
INFO 08-12 19:49:47.629 pre_download.py:135] Copying contents for profile 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb to /tmp
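
In the example above, -m /tmp points inside a container started with --rm, so the extracted directory is discarded when the container exits. One way to keep it is to bind-mount a host directory and pass its container path to -m. The sketch below assumes that approach; the function name, the /model-store mount point, and the `DOCKER` override are hypothetical choices rather than NIM conventions, and the user the container runs as must be able to write to the host directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NIM): extracts one profile into a host
# directory so the model store outlives the --rm container.
# Assumes IMG_NAME, NGC_API_KEY and LOCAL_NIM_CACHE are exported.
# Set DOCKER=echo to print the docker command instead of running it.
create_store_on_host() {
  if [ $# -ne 2 ]; then
    echo "usage: create_store_on_host PROFILE_HASH HOST_DIR" >&2
    return 1
  fi
  local profile="$1" host_dir="$2"
  mkdir -p "$LOCAL_NIM_CACHE" "$host_dir" || return 1
  # docker -v needs an absolute host path; a bare name means a named volume.
  host_dir=$(cd "$host_dir" && pwd) || return 1
  "${DOCKER:-docker}" run --rm --gpus all -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -v "$host_dir:/model-store" \
    "$IMG_NAME" create-model-store -p "$profile" -m /model-store
}
```

For example: `create_store_on_host 6f437946f8efbca34997428528d69b08974197de157460cbe36c34939dc99edb "$HOME/nim-model-store"`.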

nim-llm-check-cache-env

Checks whether the NIM cache directory exists and is writable.

Example

docker run -it --rm --gpus all -v /bad_path:/opt/nim/.cache $IMG_NAME nim-llm-check-cache-env

WARNING 08-12 19:54:06.347 caches.py:30] /opt/nim/.cache is read-only, application may fail if model is not already present in cache
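
The utility inspects the path from inside the container. A complementary host-side check before launching can catch a missing or read-only directory earlier. This is a hypothetical helper, not part of NIM. A directory that is writable by the host user may still be read-only for the user the container runs as, so nim-llm-check-cache-env remains the check that reflects what NIM actually sees.

```shell
#!/usr/bin/env bash
# Hypothetical host-side pre-flight check (not part of NIM): verifies that
# the directory you plan to mount at /opt/nim/.cache exists and is writable
# by the current host user.
check_cache_dir() {
  local dir="${1:-}"
  if [ -z "$dir" ]; then
    echo "usage: check_cache_dir DIR" >&2
    return 2
  fi
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

A typical use is to gate the launch on it, for example `check_cache_dir "$LOCAL_NIM_CACHE" && docker run ...`.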

nim-llm-set-cache-env

Prints to the console the shell commands that set the cache environment variables.

Example

docker run -it --rm --gpus all -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME nim-llm-set-cache-env

export NUMBA_CACHE_DIR=/tmp/numba
export NGC_HOME=/opt/nim/.cache/ngc
export HF_HOME=/opt/nim/.cache/huggingface
export VLLM_CONFIG_ROOT=/opt/nim/.cache/vllm/config
export VLLM_CACHE_ROOT=/opt/nim/.cache/vllm/cache
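
Not every variable in this sample points into /opt/nim/.cache: NUMBA_CACHE_DIR is under /tmp, so it is not persisted by the cache volume mount used in these examples. A small filter can make that split explicit for whatever a given image prints. The sketch is hypothetical (not part of NIM) and assumes one plain `export NAME=VALUE` line per variable, with unquoted values, as in the output shown.

```shell
#!/usr/bin/env bash
# Hypothetical filter (not part of NIM): reads nim-llm-set-cache-env output on
# stdin and labels each variable "persisted" if its value lies under
# /opt/nim/.cache (the mounted cache in the examples above) or "ephemeral"
# otherwise.
classify_cache_vars() {
  # Turn "export NAME=VALUE" into "NAME VALUE"; other lines are dropped.
  sed -n 's/^export \([A-Za-z_][A-Za-z0-9_]*\)=\(.*\)$/\1 \2/p' |
    while read -r name value; do
      case "$value" in
        /opt/nim/.cache | /opt/nim/.cache/*) echo "$name persisted $value" ;;
        *) echo "$name ephemeral $value" ;;
      esac
    done
}
```

For example: `docker run --rm --gpus all -v $LOCAL_NIM_CACHE:/opt/nim/.cache $IMG_NAME nim-llm-set-cache-env | classify_cache_vars`, with `-it` omitted because no TTY is needed when piping.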

© Copyright © 2024, NVIDIA Corporation. Last updated on Sep 9, 2024.