Model Profiles

A NIM model profile defines which model engines NIM can use. Each profile is identified by a unique string derived from a hash of the profile's contents.

Users may select a profile at deployment time by following the Profile Selection steps. If the user does not manually select a profile at deployment time, NIM automatically chooses a generic, non-optimized profile. To understand how profiles and their corresponding engines are created, see How Profiles are Created.

Model profiles are embedded in the NIM container in a Model Manifest file, located by default at /opt/nim/etc/config/default/model_manifest.yaml in the container filesystem.
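To illustrate what a content-derived profile ID means, here is a minimal sketch. It assumes SHA-256 over a canonical JSON serialization, which matches the 64-character hexadecimal form of the IDs on this page but is not confirmed as NIM's actual scheme. The `profile_id` helper and the profile field names are hypothetical.

```python
import hashlib
import json


def profile_id(profile: dict) -> str:
    """Return a 64-character hex ID derived from the profile contents."""
    # Assumption: canonical JSON (sorted keys, compact separators) hashed
    # with SHA-256. NIM's real serialization and hash are not documented here.
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical profile contents; the field names are illustrative only.
a = {"gpu": "H100", "precision": "fp16", "backend": "tensorrt"}
b = {"backend": "tensorrt", "precision": "fp16", "gpu": "H100"}  # same content, different key order
c = {"gpu": "L4", "precision": "fp16", "backend": "tensorrt"}

print(profile_id(a) == profile_id(b))  # True: identical contents give the same ID
print(profile_id(a) == profile_id(c))  # False: different contents give a different ID
print(len(profile_id(a)))              # 64
```

The practical consequence is that an ID names one exact profile definition: if the profile contents change, the ID changes with them.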

Profile Selection

To select a profile for deployment, pass its profile ID to the container with -e NIM_MANIFEST_PROFILE=<value>. Choose the profile ID for your GPU from the profiles listed below:

| GPU | GPU Memory (GB) | Precision | Profile ID |
| --- | --- | --- | --- |
| H100 | 80 | FP16 | 81b154f8a559772a3e1192354538166ad68e7e9a81ddebddf4afb2ee940f7c2c |
| H100 NVL | 80 | FP16 | 227d6ddb4fdb21c7d8968982c4deb168d7ef8729cf097aaf3178eda06446335e |
| H100 PCIe | 80 | FP16 | 6b831f71a16f631b02a04bf75dc59d44c194c015af2cc16502f01346060c0dac |
| A100 SXM | 80 | FP16 | 9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6 |
| A100 PCIe | 80 | FP16 | 9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6 |
| L40S | 48 | FP16 | 140ed439c490059878dc8879b74ea90033350bcddd2b00b2b3ad76519cfb3535 |
| L4 | 24 | FP16 | fd503a9c18276856474f6018aa5889950005132ff087739f7a1a43adc927172f |
| A10G | 24 | FP16 | ef64b17e07abb04c17e80f72ab20ccc028c8874459998bb305b5b2cec2fdca24 |
| A6000 Ada | 48 | FP16 | c172dbfd4e51dee6ecf2938960763539c27f9ba59ea559bd685bebe2689faf49 |
| RTX 4090 | 24 | FP16 | e0ca2ec2230a21b45f986c4883c11658088ab0fd8db27aae390ef57dbe0359fe |
| RTX 5080-WSL | 16 | FP16 | 7c4119292272959bbb1f7f759c54b13457f583d92e8857ed0acd5e93b27d6a06 |
| RTX 5090-WSL | 32 | FP16 | a8842f1dbfa9209b4e53be67e8aff0b3d89a7807d40345f69adf0cbca98ba958 |
| GH-200 | 480 | FP16 | 0f3f65dca8fc252950ec51891048f2d7c559a30a7bc8c5a1876a92f6d9704086 |

If you run on an unsupported GPU, NIM chooses a generic, non-optimized profile with profile ID 05dfcd65c81b0f7250d8fdeb64d6bb9c3c7db5845fbbb5055cac21ba3a2b7b41.
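The selection rule above can be sketched as a lookup with a fallback. This is a minimal illustration and not part of NIM: `select_profile` and `docker_env_args` are hypothetical helpers, the dictionary holds only a subset of the profile IDs listed above, and the 64-character hexadecimal check is an assumption based on the IDs shown on this page.

```python
import re

# Subset of the profile IDs listed above, keyed by GPU name.
PROFILE_IDS = {
    "H100": "81b154f8a559772a3e1192354538166ad68e7e9a81ddebddf4afb2ee940f7c2c",
    "L40S": "140ed439c490059878dc8879b74ea90033350bcddd2b00b2b3ad76519cfb3535",
    "L4": "fd503a9c18276856474f6018aa5889950005132ff087739f7a1a43adc927172f",
}

# Generic, non-optimized profile used for unsupported GPUs.
GENERIC_PROFILE_ID = "05dfcd65c81b0f7250d8fdeb64d6bb9c3c7db5845fbbb5055cac21ba3a2b7b41"


def select_profile(gpu_name: str) -> str:
    """Return the profile ID for a listed GPU, or the generic ID otherwise."""
    return PROFILE_IDS.get(gpu_name, GENERIC_PROFILE_ID)


def docker_env_args(profile_id: str) -> list[str]:
    """Build the -e argument that sets NIM_MANIFEST_PROFILE."""
    # Assumption: profile IDs are 64 lowercase hex characters, as in the list above.
    if not re.fullmatch(r"[0-9a-f]{64}", profile_id):
        raise ValueError(f"not a valid profile ID: {profile_id!r}")
    return ["-e", f"NIM_MANIFEST_PROFILE={profile_id}"]


print(docker_env_args(select_profile("L4")))
print(select_profile("T4") == GENERIC_PROFILE_ID)  # True: T4 is not in the mapping
```

The returned arguments would be appended to the container launch command (for example, `docker run`) along with the image name and any other options your deployment needs.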

How Profiles are Created

NIM microservices have two main categories of profiles: optimized and generic. Optimized profiles are created for a subset of GPUs and models, and they leverage model- and hardware-specific optimizations intended to improve the performance of large language models. Over time, the breadth of models and GPUs for which optimized engines exist will increase. However, if an optimized engine does not exist for a particular combination of model and GPU configuration, a generic backend is used as a fallback.

Currently, optimized profiles use pre-compiled TensorRT engines, while generic profiles use ONNX.
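The relationship between profile category and backend can be sketched as follows. This is an illustration under stated assumptions, not NIM's internal logic: `ProfileCategory`, `BACKEND`, `choose_category`, the model name, and the set of available engines are all hypothetical.

```python
from enum import Enum


class ProfileCategory(Enum):
    OPTIMIZED = "optimized"
    GENERIC = "generic"


# Backend used by each category, as described above.
BACKEND = {
    ProfileCategory.OPTIMIZED: "TensorRT",
    ProfileCategory.GENERIC: "ONNX",
}


def choose_category(model: str, gpu: str, optimized_engines: set) -> ProfileCategory:
    """Pick optimized when an engine exists for (model, gpu), else fall back to generic."""
    if (model, gpu) in optimized_engines:
        return ProfileCategory.OPTIMIZED
    return ProfileCategory.GENERIC


# Hypothetical set of (model, GPU) pairs that have a pre-compiled engine.
engines = {("nvclip", "H100"), ("nvclip", "L4")}

for gpu in ("H100", "T4"):
    category = choose_category("nvclip", gpu, engines)
    print(gpu, category.value, BACKEND[category])
# H100 optimized TensorRT
# T4 generic ONNX
```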

Quantization

For some models and GPU configurations, quantized engines with reduced numerical precision are available. Currently, the NV-CLIP NIM supports FP16 quantization across its GPU profiles.