Model Profiles

A NIM model profile defines which model engines NIM can use. Each profile is identified by a unique string based on a hash of the profile contents.
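To illustrate how an identifier can be derived from profile contents, here is a minimal sketch. It is not NIM's actual scheme: the use of SHA-256, the JSON canonicalization, the example profile fields, and the function name `profile_id` are all assumptions made for illustration. The only point it demonstrates is that identical contents yield the same ID and any change yields a different one.

```python
import hashlib
import json


def profile_id(profile: dict) -> str:
    """Hypothetical helper: derive a stable ID from profile contents.

    Assumption: SHA-256 over a canonical JSON encoding. NIM's real
    hashing scheme is not documented here and may differ.
    """
    # Sort keys and fix separators so equal contents always serialize
    # to the same bytes, regardless of dictionary insertion order.
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Illustrative field names only; they are not the manifest schema.
    a = {"gpu": "H100", "precision": "fp16"}
    b = {"precision": "fp16", "gpu": "H100"}
    print(profile_id(a) == profile_id(b))  # same contents, same ID
```

Because the ID depends only on the contents, two profiles with the same fields map to the same string, and editing any field produces a new ID.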

You can select a profile at deployment time by following the Profile Selection steps. If you do not select a profile manually, NIM automatically chooses a generic, non-optimized profile. To understand how profiles and their corresponding engines are created, see How Profiles are Created.

Model profiles are embedded within the NIM container in a Model Manifest file, located by default at /opt/nim/etc/config/default/model_manifest.yaml in the container filesystem.

Profile Selection

To select a profile for deployment, set a specific profile ID with -e NIM_MANIFEST_PROFILE=<value>. You can choose a profile ID for your GPU from the following list:

GPU	GPU Memory (GB)	Precision	Profile ID
H100	80	FP16	81b154f8a559772a3e1192354538166ad68e7e9a81ddebddf4afb2ee940f7c2c
H100 NVL	80	FP16	227d6ddb4fdb21c7d8968982c4deb168d7ef8729cf097aaf3178eda06446335e
H100 PCIe	80	FP16	6b831f71a16f631b02a04bf75dc59d44c194c015af2cc16502f01346060c0dac
A100 SXM	80	FP16	9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6
A100 PCIe	80	FP16	9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6
L40S	48	FP16	140ed439c490059878dc8879b74ea90033350bcddd2b00b2b3ad76519cfb3535
L4	24	FP16	fd503a9c18276856474f6018aa5889950005132ff087739f7a1a43adc927172f
A10G	24	FP16	ef64b17e07abb04c17e80f72ab20ccc028c8874459998bb305b5b2cec2fdca24
A6000 Ada	48	FP16	c172dbfd4e51dee6ecf2938960763539c27f9ba59ea559bd685bebe2689faf49
RTX 4090	24	FP16	e0ca2ec2230a21b45f986c4883c11658088ab0fd8db27aae390ef57dbe0359fe
RTX 5080-WSL	16	FP16	7c4119292272959bbb1f7f759c54b13457f583d92e8857ed0acd5e93b27d6a06
RTX 5090-WSL	32	FP16	a8842f1dbfa9209b4e53be67e8aff0b3d89a7807d40345f69adf0cbca98ba958
GH-200	480	FP16	0f3f65dca8fc252950ec51891048f2d7c559a30a7bc8c5a1876a92f6d9704086

If you run on an unsupported GPU, NIM chooses a generic, non-optimized profile with profile ID 05dfcd65c81b0f7250d8fdeb64d6bb9c3c7db5845fbbb5055cac21ba3a2b7b41.
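The lookup described above can be sketched as a small helper that maps a GPU name to its profile ID and falls back to the generic profile for any GPU not in the table. This is only an illustration for building the -e NIM_MANIFEST_PROFILE=<value> argument by hand. The function names `profile_for_gpu` and `docker_env_args` are hypothetical, and the sketch does not reproduce NIM's internal selection logic.

```python
# Profile IDs copied from the table above, keyed by GPU name.
PROFILE_IDS = {
    "H100": "81b154f8a559772a3e1192354538166ad68e7e9a81ddebddf4afb2ee940f7c2c",
    "H100 NVL": "227d6ddb4fdb21c7d8968982c4deb168d7ef8729cf097aaf3178eda06446335e",
    "H100 PCIe": "6b831f71a16f631b02a04bf75dc59d44c194c015af2cc16502f01346060c0dac",
    "A100 SXM": "9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6",
    "A100 PCIe": "9367a7048d21c405768203724f863e116d9aeb71d4847fca004930b9b9584bb6",
    "L40S": "140ed439c490059878dc8879b74ea90033350bcddd2b00b2b3ad76519cfb3535",
    "L4": "fd503a9c18276856474f6018aa5889950005132ff087739f7a1a43adc927172f",
    "A10G": "ef64b17e07abb04c17e80f72ab20ccc028c8874459998bb305b5b2cec2fdca24",
    "A6000 Ada": "c172dbfd4e51dee6ecf2938960763539c27f9ba59ea559bd685bebe2689faf49",
    "RTX 4090": "e0ca2ec2230a21b45f986c4883c11658088ab0fd8db27aae390ef57dbe0359fe",
    "RTX 5080-WSL": "7c4119292272959bbb1f7f759c54b13457f583d92e8857ed0acd5e93b27d6a06",
    "RTX 5090-WSL": "a8842f1dbfa9209b4e53be67e8aff0b3d89a7807d40345f69adf0cbca98ba958",
    "GH-200": "0f3f65dca8fc252950ec51891048f2d7c559a30a7bc8c5a1876a92f6d9704086",
}

# Generic, non-optimized profile used for unsupported GPUs.
GENERIC_PROFILE_ID = "05dfcd65c81b0f7250d8fdeb64d6bb9c3c7db5845fbbb5055cac21ba3a2b7b41"


def profile_for_gpu(gpu_name: str) -> str:
    """Hypothetical helper: return the table's profile ID for gpu_name,
    or the generic profile ID if the GPU is not listed."""
    return PROFILE_IDS.get(gpu_name, GENERIC_PROFILE_ID)


def docker_env_args(gpu_name: str) -> list[str]:
    """Hypothetical helper: build the environment argument pair that
    sets NIM_MANIFEST_PROFILE for the given GPU."""
    return ["-e", f"NIM_MANIFEST_PROFILE={profile_for_gpu(gpu_name)}"]


if __name__ == "__main__":
    # Print the argument for an L40S deployment.
    print(" ".join(docker_env_args("L40S")))
```

The GPU name must match a table entry exactly. Any other string, including a misspelled name, resolves to the generic profile, so it is worth checking the name before deploying.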

How Profiles are Created

NIM microservices have two main categories of profiles: optimized and generic. Optimized profiles are created for a subset of GPUs and models and leverage model- and hardware-specific optimizations intended to improve the performance of large language models. Over time, the breadth of models and GPUs for which optimized engines exist will increase. However, if an optimized engine does not exist for a particular combination of model and GPU configuration, a generic backend is used as a fallback.

Currently, optimized profiles leverage pre-compiled TensorRT engines, while generic profiles utilize ONNX.

Quantization

For some models and GPU configurations, quantized engines with reduced numerical precision are available. Currently, NV-CLIP NIM supports FP16 quantization for different GPU profiles.