Model Profiles
A NIM model profile defines which model engines NIM can use. Each profile is identified by a unique string derived from a hash of the profile's contents.
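The profile IDs in this document are 64 hexadecimal characters (256 bits), which is the length of a SHA-256 digest. The sketch below shows why a content hash makes a good profile identifier: identical contents always produce the same ID, and any change produces a different one. It assumes the profile is serialized as canonical JSON and hashed with SHA-256. NIM's actual serialization and hash function are not described here, and the `profile_id` helper and the example profile fields are hypothetical.

```python
import hashlib
import json


def profile_id(profile: dict) -> str:
    """Derive a 64-character hex ID from a profile's contents.

    Assumption (hypothetical scheme): canonical JSON (sorted keys, no extra
    whitespace) hashed with SHA-256. NIM's real scheme may differ.
    """
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical profile contents, for illustration only.
a = {"gpu": "L4", "precision": "fp16", "backend": "tensorrt"}
b = {"precision": "fp16", "backend": "tensorrt", "gpu": "L4"}  # same contents, different key order
c = {"gpu": "L4", "precision": "fp16", "backend": "onnx"}      # one field changed

print(len(profile_id(a)))               # 64
print(profile_id(a) == profile_id(b))   # True: identical contents give identical IDs
print(profile_id(a) == profile_id(c))   # False: changing the contents changes the ID
```

Sorting the keys before hashing is what makes the ID depend only on the contents and not on the order in which fields happen to be written.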
Users can select a profile at deployment time by following the steps in Profile Selection. If the user does not manually select a profile at deployment time, NIM automatically chooses a generic, non-optimized profile. To understand how profiles and their corresponding engines are created, see How Profiles are Created.
Model profiles are embedded in the NIM container in a model manifest file, which is located by default at /opt/nim/etc/config/default/model_manifest.yaml within the container filesystem.
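To see which profiles a container ships with, you can inspect the manifest directly. The manifest's YAML schema is not documented in this section, so the sketch below deliberately avoids parsing it: it scans the file for 64-character lowercase hex strings, the format the profile IDs in this document use. It needs only the standard library, but it may also pick up other 64-hex values (such as file checksums) if the manifest contains any. The function names are hypothetical.

```python
import re
from pathlib import Path

# Default manifest location inside the container, as documented above.
DEFAULT_MANIFEST = Path("/opt/nim/etc/config/default/model_manifest.yaml")

# Profile IDs in this document are 64 lowercase hexadecimal characters.
PROFILE_ID_RE = re.compile(r"\b[0-9a-f]{64}\b")


def find_profile_ids(manifest_text: str) -> list[str]:
    """Return distinct 64-hex strings in order of first appearance.

    Schema-agnostic: the YAML is not parsed, so no assumption is made about
    where in the file the profile IDs appear.
    """
    seen: dict[str, None] = {}  # dict preserves insertion order and deduplicates
    for match in PROFILE_ID_RE.finditer(manifest_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def list_manifest_profiles(path: Path = DEFAULT_MANIFEST) -> list[str]:
    """Read the manifest file and return the candidate profile IDs it contains."""
    return find_profile_ids(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    if DEFAULT_MANIFEST.exists():
        for pid in list_manifest_profiles():
            print(pid)
    else:
        print(f"{DEFAULT_MANIFEST} not found; run this inside the NIM container")
```

Run inside the container, this prints one candidate profile ID per line; any of them can be passed to NIM_MANIFEST_PROFILE.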
Profile Selection
To select a profile for deployment, pass its profile ID with -e NIM_MANIFEST_PROFILE=<value>. Choose the profile ID for your GPU from the following table:
GPU | GPU Memory (GB) | Precision | Profile ID |
---|---|---|---|
H100 | 80 | FP16 | 81b154f8a559772a3e1192354538166ad68e7e9a81ddebddf4afb2ee940f7c2c |
H100 NVL | 80 | FP16 | 227d6ddb4fdb21c7d8968982c4deb168d7ef8729cf097aaf3178eda06446335e |
H100 PCIe | 80 | FP16 | 6b831f71a16f631b02a04bf75dc59d44c194c015af2cc16502f01346060c0dac |
A100 SXM | 80 | FP16 | 9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6 |
A100 PCIe | 80 | FP16 | 9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6 |
L40S | 48 | FP16 | 140ed439c490059878dc8879b74ea90033350bcddd2b00b2b3ad76519cfb3535 |
L4 | 24 | FP16 | fd503a9c18276856474f6018aa5889950005132ff087739f7a1a43adc927172f |
A10G | 24 | FP16 | ef64b17e07abb04c17e80f72ab20ccc028c8874459998bb305b5b2cec2fdca24 |
A6000 Ada | 48 | FP16 | c172dbfd4e51dee6ecf2938960763539c27f9ba59ea559bd685bebe2689faf49 |
RTX 4090 | 24 | FP16 | e0ca2ec2230a21b45f986c4883c11658088ab0fd8db27aae390ef57dbe0359fe |
RTX 5080-WSL | 16 | FP16 | 7c4119292272959bbb1f7f759c54b13457f583d92e8857ed0acd5e93b27d6a06 |
RTX 5090-WSL | 32 | FP16 | a8842f1dbfa9209b4e53be67e8aff0b3d89a7807d40345f69adf0cbca98ba958 |
GH-200 | 480 | FP16 | 0f3f65dca8fc252950ec51891048f2d7c559a30a7bc8c5a1876a92f6d9704086 |
If you run on an unsupported GPU, NIM chooses a generic, non-optimized profile with profile ID 05dfcd65c81b0f7250d8fdeb64d6bb9c3c7db5845fbbb5055cac21ba3a2b7b41.
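If you want to pin a profile explicitly, for example in a deployment script, the table lookup and the unsupported-GPU fallback can be expressed as a dictionary lookup with a default. The sketch below copies the profile IDs from the table above and builds a docker run argument list that sets NIM_MANIFEST_PROFILE. The image name is a placeholder, the helper names are hypothetical, and other flags a real deployment needs (ports, credentials, volumes) are intentionally omitted.

```python
import shlex

# Profile IDs copied from the GPU table in this section.
OPTIMIZED_PROFILES = {
    "H100": "81b154f8a559772a3e1192354538166ad68e7e9a81ddebddf4afb2ee940f7c2c",
    "H100 NVL": "227d6ddb4fdb21c7d8968982c4deb168d7ef8729cf097aaf3178eda06446335e",
    "H100 PCIe": "6b831f71a16f631b02a04bf75dc59d44c194c015af2cc16502f01346060c0dac",
    "A100 SXM": "9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6",
    "A100 PCIe": "9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6",
    "L40S": "140ed439c490059878dc8879b74ea90033350bcddd2b00b2b3ad76519cfb3535",
    "L4": "fd503a9c18276856474f6018aa5889950005132ff087739f7a1a43adc927172f",
    "A10G": "ef64b17e07abb04c17e80f72ab20ccc028c8874459998bb305b5b2cec2fdca24",
    "A6000 Ada": "c172dbfd4e51dee6ecf2938960763539c27f9ba59ea559bd685bebe2689faf49",
    "RTX 4090": "e0ca2ec2230a21b45f986c4883c11658088ab0fd8db27aae390ef57dbe0359fe",
    "RTX 5080-WSL": "7c4119292272959bbb1f7f759c54b13457f583d92e8857ed0acd5e93b27d6a06",
    "RTX 5090-WSL": "a8842f1dbfa9209b4e53be67e8aff0b3d89a7807d40345f69adf0cbca98ba958",
    "GH-200": "0f3f65dca8fc252950ec51891048f2d7c559a30a7bc8c5a1876a92f6d9704086",
}

# Generic, non-optimized profile used on unsupported GPUs.
GENERIC_PROFILE = "05dfcd65c81b0f7250d8fdeb64d6bb9c3c7db5845fbbb5055cac21ba3a2b7b41"


def select_profile(gpu: str) -> str:
    """Return the optimized profile ID for a listed GPU, else the generic one."""
    return OPTIMIZED_PROFILES.get(gpu, GENERIC_PROFILE)


def docker_run_args(image: str, gpu: str) -> list[str]:
    """Build a minimal docker run command that pins the selected profile."""
    return [
        "docker", "run", "--rm", "--gpus", "all",
        "-e", f"NIM_MANIFEST_PROFILE={select_profile(gpu)}",
        image,
    ]


if __name__ == "__main__":
    # YOUR_NIM_IMAGE is a placeholder; substitute the image you deploy.
    print(shlex.join(docker_run_args("YOUR_NIM_IMAGE", "L4")))
```

Keying the lookup on the GPU names exactly as they appear in the table keeps the mapping easy to check against the documentation when profile IDs change.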
How Profiles are Created
NIM microservices have two main categories of profiles: optimized and generic. Optimized profiles are created for a subset of GPUs and models and apply model- and hardware-specific optimizations intended to improve inference performance. Over time, the range of models and GPUs for which optimized engines exist will grow. If no optimized engine exists for a particular combination of model and GPU configuration, NIM falls back to a generic backend.
Currently, optimized profiles use pre-compiled TensorRT engines, while generic profiles use ONNX.
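The fallback rule can be modeled as a lookup keyed on the (model, GPU) combination: if a pre-compiled TensorRT engine exists for that pair, use the optimized profile; otherwise use the generic ONNX backend. In the sketch below, the Engine type, the model name, and the registry contents are hypothetical illustrations; NIM's actual selection logic is not shown in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    category: str  # "optimized" or "generic"
    backend: str   # "tensorrt" or "onnx"


# Hypothetical registry of (model, gpu) pairs that have a pre-compiled
# TensorRT engine. The real set is defined by the NIM model manifest.
OPTIMIZED_ENGINES = {
    ("nv-clip", "H100"),
    ("nv-clip", "L4"),
}


def resolve_engine(model: str, gpu: str) -> Engine:
    """Prefer an optimized TensorRT engine; otherwise fall back to generic ONNX."""
    if (model, gpu) in OPTIMIZED_ENGINES:
        return Engine(category="optimized", backend="tensorrt")
    return Engine(category="generic", backend="onnx")


print(resolve_engine("nv-clip", "H100"))  # optimized TensorRT engine
print(resolve_engine("nv-clip", "T4"))    # no optimized engine, so generic ONNX
```

Keying on the pair rather than on the GPU alone reflects the point above that optimized engines exist per model and hardware combination.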
Quantization
For some models and GPU configurations, quantized engines with reduced numerical precision are available. Currently, NV-CLIP NIM supports FP16 precision for the GPU profiles listed in Profile Selection.
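Reduced precision trades numerical accuracy for smaller storage. The sketch below uses Python's struct module, whose "e" format is IEEE 754 half precision (FP16), to show the rounding error FP16 introduces, and compares weight storage at 4 bytes per parameter (FP32) and 2 bytes per parameter (FP16). The parameter count is hypothetical and is not NV-CLIP's actual size.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Storage needed for the weights alone, ignoring activations and overhead."""
    return num_params * bytes_per_param


x = 0.1
print(to_fp16(x))           # 0.0999755859375
print(abs(to_fp16(x) - x))  # rounding error introduced by FP16

# Hypothetical parameter count, for illustration only.
n = 400_000_000
print(weight_bytes(n, 4) / 1e9)  # FP32: 1.6 GB
print(weight_bytes(n, 2) / 1e9)  # FP16: 0.8 GB
```

Halving the bytes per parameter halves weight memory and bandwidth, which is the main reason reduced-precision engines can run faster, at the cost of small per-value rounding errors like the one shown.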