Date: 2025-01-13

You can deploy custom, fine-tuned models on NIM. Given weights in the Hugging Face or NeMo format, NIM automatically builds an optimized TensorRT-LLM engine locally.

You can deploy the non-optimized model as described in Serving models from local assets.

Launch the NIM container

    export CUSTOM_WEIGHTS=/path/to/customized/llama
    # Set NIM_CUSTOM_MODEL_NAME to cache the model for faster subsequent runs.
    docker run -it --rm --name=llama3-8b-instruct \
      --gpus all \
      -e NIM_FT_MODEL=$CUSTOM_WEIGHTS \
      -e NIM_SERVED_MODEL_NAME="llama3.1-8b-my-domain" \
      -e NIM_CUSTOM_MODEL_NAME=custom_1 \
      -v $CUSTOM_WEIGHTS:$CUSTOM_WEIGHTS \
      -u $(id -u) \
      $NIM_IMAGE

You can also select an alternative profile by using the output of the list-model-profiles command, which lists the profiles available within the container.
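As a rough sketch, the profiles can be listed by passing list-model-profiles as the container command. The exact flags are an assumption here: the function name `list_model_profiles` and the `DRY_RUN` switch are illustrative helpers rather than NIM features, and your environment may need extra options such as credentials.

```shell
# Assumption: the image named by NIM_IMAGE accepts list-model-profiles as its command.
# If DRY_RUN is set to a non-empty value, the command is printed instead of executed.
list_model_profiles() {
  ${DRY_RUN:+echo} docker run --rm --gpus all "$NIM_IMAGE" list-model-profiles
}
```

Call `DRY_RUN=1 list_model_profiles` to review the command first, then `list_model_profiles` to run it.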

This command should produce output similar to the following.

    SYSTEM INFO
    - Free GPUs:
      - [26b3:10de] (0) NVIDIA RTX 5880 Ada Generation (RTX A6000 Ada) [current utilization: 0%]
      - [26b3:10de] (1) NVIDIA RTX 5880 Ada Generation (RTX A6000 Ada) [current utilization: 0%]
    MODEL PROFILES
    - Compatible with system and runnable:
      - 771c17ba45c566b400c5823af6188d479e3703e5b25f56260713afcc377bcfa5 (custom_1)
      - 19031a45cf096b683c4d66fff2a072c0e164a24f19728a58771ebfc4c9ade44f (vllm-fp16-tp2)
      - 8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d (vllm-fp16-tp1)
      - With LoRA support:
        - c5ffce8f82de1ce607df62a4b983e29347908fb9274a0b7a24537d6ff8390eb9 (vllm-fp16-tp2-lora)
        - 8d3824f766182a754159e88ad5a0bd465b1b4cf69ecf80bd6d6833753e945740 (vllm-fp16-tp1-lora)
    - Compilable to TRT-LLM using just-in-time compilation of HF models to TRTLLM engines:
      - 375dc0ff86133c2a423fbe9ef46d8fdf12d6403b3caa3b8e70d7851a89fc90dd (tensorrt_llm-trtllm_buildable-bf16-tp2)
      - 54946b08b79ecf9e7f2d5c000234bf2cce19c8fee21b243c1a084b03897e8c95 (tensorrt_llm-trtllm_buildable-bf16-tp1)
      - With LoRA support:
        - 7b8458eb682edb0d2a48b4019b098ba0bfbc4377aadeeaa11b346c63c7adf724 (tensorrt_llm-trtllm_buildable-bf16-tp2-lora)
        - 00172c81416075181f203532da34b88e371b8081d2ad801d9d30110ea88cbf95 (tensorrt_llm-trtllm_buildable-bf16-tp1-lora)
    - Incompatible with system:
      - dcd85d5e877e954f26c4a7248cd3b98c489fbde5f1cf68b4af11d665fa55778e (tensorrt_llm-h100-fp8-tp2-latency)
      - f59d52b0715ee1ecf01e6759dea23655b93ed26b12e57126d9ec43b397ea2b87 (tensorrt_llm-l40s-fp8-tp2-latency)
      - 30b562864b5b1e3b236f7b6d6a0998efbed491e4917323d04590f715aa9897dc (tensorrt_llm-h100-fp8-tp1-throughput)
      - 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b (tensorrt_llm-l40s-fp8-tp1-throughput)
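To pick candidate profiles out of a long listing, a small filter can help. The sketch below assumes the listing has one profile per line in the form `- <id> (<name>)` and that incompatible profiles appear under a heading containing "Incompatible with system". The function name `select_trtllm_profiles` is hypothetical and not part of NIM.

```shell
# Hypothetical helper: read a list-model-profiles listing on stdin and print
# "<id> <name>" for each tensorrt_llm profile listed before the incompatible section.
select_trtllm_profiles() {
  awk '
    /Incompatible with system/ { skip = 1 }   # ignore everything from this heading onward
    !skip && /\(tensorrt_llm[^)]*\)/ {
      name = $3                               # fields: "-", id, "(name)"
      gsub(/[()]/, "", name)                  # drop the surrounding parentheses
      print $2, name
    }
  '
}
```

For example, `select_trtllm_profiles < profiles.txt` filters a saved listing, where `profiles.txt` is a hypothetical file holding the command output. LoRA variants are included in the result because their names also start with tensorrt_llm.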

Select a compatible tensorrt_llm profile or a tensorrt_llm-trtllm_buildable profile. Then run the previous command again with the additional option -e NIM_MODEL_PROFILE=profile_name, where profile_name is the name of the profile you selected.
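Put together, the relaunch might look like the following sketch. It repeats the launch command from this page with the extra option added. The wrapper function `launch_with_profile` and the `DRY_RUN` switch are illustrative assumptions, not NIM features, and `CUSTOM_WEIGHTS` and `NIM_IMAGE` are expected to be set as in the launch step.

```shell
# Sketch only: relaunch the container with an explicit profile.
# If DRY_RUN is set to a non-empty value, the command is printed instead of executed.
launch_with_profile() {
  profile_name="$1"   # a profile name taken from the list-model-profiles output
  ${DRY_RUN:+echo} docker run -it --rm --name=llama3-8b-instruct \
    --gpus all \
    -e NIM_FT_MODEL="$CUSTOM_WEIGHTS" \
    -e NIM_SERVED_MODEL_NAME="llama3.1-8b-my-domain" \
    -e NIM_CUSTOM_MODEL_NAME=custom_1 \
    -e NIM_MODEL_PROFILE="$profile_name" \
    -v "$CUSTOM_WEIGHTS:$CUSTOM_WEIGHTS" \
    -u "$(id -u)" \
    "$NIM_IMAGE"
}
```

For example, `DRY_RUN=1 launch_with_profile tensorrt_llm-trtllm_buildable-bf16-tp1` prints the full command for review before you run it for real.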