Date: 2025-01-13

NIM includes a set of utility scripts to assist with NIM operation.

Utilities can be launched by adding the name of the desired utility to the docker run command. For example, you can execute the list-model-profiles utility with the following command:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles

You can get more information about each utility with the -h flag:

docker run --rm --runtime=nvidia --gpus=all $IMG_NAME download-to-cache -h

List available model profiles

list-model-profiles

Prints to the console the system information detected by NIM, and the list of all profiles for the chosen NIM. Profiles are categorized by whether or not they are compatible with the current system, based on the system information detected.

Example Output:

SYSTEM INFO
- Free GPUs:
  - [20b2:10de] (0) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 1%]
  - [20b2:10de] (1) NVIDIA A100-SXM4-80GB (A100 80GB) [current utilization: 1%]
MODEL PROFILES
- Compatible with system and runnable:
  - a93a1a6b72643f2b2ee5e80ef25904f4d3f942a87f8d32da9e617eeccfaae04c (tensorrt_llm-a100-fp16-tp2-latency)
  - 751382df4272eafc83f541f364d61b35aed9cce8c7b0c869269cea5a366cd08c (tensorrt_llm-a100-fp16-tp1-throughput)
  - 19031a45cf096b683c4d66fff2a072c0e164a24f19728a58771ebfc4c9ade44f (vllm-fp16-tp2)
  - 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1)
  - With LoRA support:
    - cce57ae50c3af15625c1668d5ac4ccbe82f40fa2e8379cc7b842cc6c976fd334 (tensorrt_llm-a100-fp16-tp1-throughput-lora)
    - c5ffce8f82de1ce607df62a4b983e29347908fb9274a0b7a24537d6ff8390eb9 (vllm-fp16-tp2-lora)
    - 8d3824f766182a754159e88ad5a0bd465b1b4cf69ecf80bd6d6833753e945740 (vllm-fp16-tp1-lora)
- Incompatible with system:
  - dcd85d5e877e954f26c4a7248cd3b98c489fbde5f1cf68b4af11d665fa55778e (tensorrt_llm-h100-fp8-tp2-latency)
  - f59d52b0715ee1ecf01e6759dea23655b93ed26b12e57126d9ec43b397ea2b87 (tensorrt_llm-l40s-fp8-tp2-latency)
  - 30b562864b5b1e3b236f7b6d6a0998efbed491e4917323d04590f715aa9897dc (tensorrt_llm-h100-fp8-tp1-throughput)
  ...
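
When scripting around this output, it can help to extract only the hashes of the compatible profiles. The following is a minimal sketch, assuming the utility prints each profile on its own line under the "Compatible with system and runnable" and "Incompatible with system" headers, and that hashes are 64 lowercase hexadecimal characters as in the example output. The name compatible_profiles is a hypothetical helper, not part of NIM.

```shell
# Hypothetical helper (not part of NIM): read list-model-profiles output on
# stdin and print the hashes listed under "Compatible with system and runnable".
# Assumes one profile per line; LoRA profiles nested under that section are
# included in the result.
compatible_profiles() {
  # Keep the lines from the "Compatible" header through the "Incompatible"
  # header, then extract every 64-character hexadecimal hash.
  sed -n '/Compatible with system and runnable/,/Incompatible with system/p' |
    grep -oE '[0-9a-f]{64}'
}
```

For example, piping the output of the list-model-profiles command into compatible_profiles would print one hash per line, provided the utility writes the profile list to standard output (an assumption).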

Download model profiles to NIM cache

download-to-cache

Downloads selected or default model profile(s) to the NIM cache. Can be used to pre-cache profiles prior to deployment.

--profiles [PROFILES ...], -p [PROFILES ...]
    Profile hashes to download. If none are provided, the optimal profile is downloaded. Multiple profiles can be specified, separated by spaces.

--all
    Set this to download all profiles to the cache.

--lora
    Set this to download the default LoRA profile. This expects that the --profiles and --all arguments are not specified.
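
The option rules above can be sketched as a small dry-run helper. This is only an illustration: download_to_cache_cmd is a hypothetical name, it prints the docker command instead of running it, and it reuses the $IMG_NAME variable and docker flags from the earlier examples. Volume mounts and credentials that a real deployment may need are left out because this section does not specify them.

```shell
# Hypothetical helper (not part of NIM): print the docker command that would
# run download-to-cache with the given arguments, without executing it.
download_to_cache_cmd() {
  _lora=0
  _selects=0
  for _arg in "$@"; do
    case "$_arg" in
      --lora) _lora=1 ;;
      --profiles|-p|--all) _selects=1 ;;
    esac
  done
  # --lora expects that neither --profiles nor --all is specified.
  if [ "$_lora" -eq 1 ] && [ "$_selects" -eq 1 ]; then
    echo "error: --lora cannot be combined with --profiles or --all" >&2
    return 1
  fi
  # Prepend the base command to the caller's arguments and print one line.
  set -- docker run --rm --runtime=nvidia --gpus=all "$IMG_NAME" download-to-cache "$@"
  printf '%s\n' "$*"
}
```

Calling the helper with no arguments prints the command that downloads the optimal profile; passing --profiles followed by one or more hashes, --all, or --lora prints the corresponding variant.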

Create model store

create-model-store

Extracts files from a cached model profile and creates a properly formatted directory. If the profile is not already cached, it is downloaded to the model cache.

--profile <PROFILE>, -p <PROFILE>
    Hash of the profile to create a model directory for. The profile is downloaded if it is not already present.

--model-store <MODEL_STORE>, -m <MODEL_STORE>
    Directory path to which the profile given by --profile is extracted and copied.
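
As a rough illustration of how the two options fit together, the sketch below prints (but does not run) a create-model-store command. The name create_model_store_cmd is hypothetical, and the command reuses the $IMG_NAME variable and docker flags from the earlier examples. The model store path is presumably interpreted inside the container, so keeping the result on the host would likely require a volume mount that is not shown here.

```shell
# Hypothetical helper (not part of NIM): print the docker command that would
# run create-model-store for one profile hash and one target directory.
create_model_store_cmd() {
  # Both the profile hash and the model store path are required here.
  if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: create_model_store_cmd <PROFILE> <MODEL_STORE>" >&2
    return 1
  fi
  printf '%s\n' "docker run --rm --runtime=nvidia --gpus=all $IMG_NAME create-model-store --profile $1 --model-store $2"
}
```

The first argument is a profile hash as printed by list-model-profiles; the second is the directory to create.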

Check NIM cache

nim-llm-check-cache-env

Checks if the NIM cache directory is present and can be written to.
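
How the utility performs this check is not documented here. As a host-side analogue of the same idea, the following sketch tests that a directory exists and is writable by the current user. The name cache_dir_ok is hypothetical, and which host directory backs the NIM cache depends on your deployment.

```shell
# Hypothetical host-side analogue (not the utility's actual implementation):
# succeed only if the given path is an existing directory that the current
# user can write to.
cache_dir_ok() {
  [ -d "$1" ] && [ -w "$1" ]
}
```

A script could call cache_dir_ok with the cache directory path before starting the container and stop with an error message if it returns a non-zero status.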