NV-CLIP (Latest)

Model Profiles

A NIM model profile defines which model engines NIM can use. Each profile is identified by a unique string derived from a hash of the profile contents.

Users may select a profile at deployment time by following the Profile Selection steps. If the user does not manually select a profile at deployment time, NIM automatically chooses a generic, non-optimized profile. To understand how profiles and their corresponding engines are created, see How Profiles are Created.

Model profiles are embedded within the NIM container in a Model Manifest file, which is by default placed at /opt/nim/etc/config/default/model_manifest.yaml within the container filesystem.
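One way to inspect the embedded manifest is to print it from a throwaway container. The sketch below only builds the command line. It assumes Docker is the container runtime and that the image ships a `cat` binary; the image name is a hypothetical placeholder, not a real NV-CLIP image reference.

```python
from typing import List

# Default manifest location inside the container, as documented above.
MANIFEST_PATH = "/opt/nim/etc/config/default/model_manifest.yaml"


def manifest_dump_command(image: str) -> List[str]:
    """Build a `docker run` command that prints the manifest and exits.

    Assumption: the image contains `cat`; `--entrypoint` overrides the
    container's normal startup so that only the file is printed.
    """
    return ["docker", "run", "--rm", "--entrypoint", "cat", image, MANIFEST_PATH]


# "example.invalid/nv-clip:tag" is a placeholder image name for illustration.
print(" ".join(manifest_dump_command("example.invalid/nv-clip:tag")))
```

Returning the command as a list makes it easy to pass to `subprocess.run` without shell quoting issues.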

To select a profile for deployment, pass its profile ID with -e NIM_MANIFEST_PROFILE=<value>. Choose the profile ID for your GPU from the following list:

GPU          GPU Memory (GB)  Precision  Profile ID
H100 SXM     80               FP16       420b5bb2-cd51-4dac-be21-759f3df4e441
H100 PCIe    80               FP16       420b5bb2-cd51-4dac-be21-759f3df4e441
A100 SXM     80               FP16       3f5c5926-add5-402d-8877-c0798ffbb9e9
A100 PCIe    80               FP16       3f5c5926-add5-402d-8877-c0798ffbb9e9
L40S         48               FP16       3c28d914-ebbd-418b-8c5a-2a0da64bf4e3
A10G         24               FP16       d892ff5f-a51e-417b-bda8-63a004f4c3d7
A6000 Ada    48               FP16       a19c7bf8-b6c4-47b3-b519-0b67840c9951
RTX 4090     48               FP16       38ce5361-fd45-4d48-94b0-1ca7eb3c5d0b

If you run on an unsupported GPU, NIM chooses a generic, non-optimized profile with profile ID afd81bb5-1b82-4816-a1bd-312dd380e4d1.
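The lookup described above can be sketched as a small helper that maps a GPU label to its profile ID and falls back to the generic profile for anything else. The dictionary keys and function names are hypothetical choices for this illustration and are not identifiers that NIM itself uses; only the profile IDs and the NIM_MANIFEST_PROFILE variable come from this page.

```python
from typing import List

# Profile IDs copied from the list above. The keys are labels chosen for
# this sketch (assumption), not names that NIM reports or accepts.
PROFILE_BY_GPU = {
    "H100 SXM": "420b5bb2-cd51-4dac-be21-759f3df4e441",
    "H100 PCIe": "420b5bb2-cd51-4dac-be21-759f3df4e441",
    "A100 SXM": "3f5c5926-add5-402d-8877-c0798ffbb9e9",
    "A100 PCIe": "3f5c5926-add5-402d-8877-c0798ffbb9e9",
    "L40S": "3c28d914-ebbd-418b-8c5a-2a0da64bf4e3",
    "A10G": "d892ff5f-a51e-417b-bda8-63a004f4c3d7",
    "A6000 Ada": "a19c7bf8-b6c4-47b3-b519-0b67840c9951",
    "RTX 4090": "38ce5361-fd45-4d48-94b0-1ca7eb3c5d0b",
}

# Generic, non-optimized profile used for unsupported GPUs.
GENERIC_PROFILE_ID = "afd81bb5-1b82-4816-a1bd-312dd380e4d1"


def profile_for_gpu(gpu_label: str) -> str:
    """Return the listed profile ID, or the generic ID if the GPU is not listed."""
    return PROFILE_BY_GPU.get(gpu_label, GENERIC_PROFILE_ID)


def profile_env_args(gpu_label: str) -> List[str]:
    """Build the `-e NIM_MANIFEST_PROFILE=<value>` argument pair for deployment."""
    return ["-e", f"NIM_MANIFEST_PROFILE={profile_for_gpu(gpu_label)}"]


print(" ".join(profile_env_args("L40S")))
```

The returned pair can be appended to whatever container launch command you already use; the rest of that command is outside the scope of this sketch.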

NIM microservices have two main categories of profiles: optimized and generic. Optimized profiles are created for a subset of GPUs and models and use model- and hardware-specific optimizations intended to improve model performance. Over time, the breadth of models and GPUs for which optimized engines exist will increase. However, if no optimized engine exists for a particular combination of model and GPU configuration, a generic backend is used as a fallback.

Currently, optimized profiles leverage pre-compiled TensorRT engines, while generic profiles utilize ONNX.
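As a rough illustration of the two categories, the sketch below maps a profile ID to the backend it would currently use. It assumes that the per-GPU profile IDs listed on this page are the optimized ones and that the documented fallback ID is the only generic one; the function and constant names are hypothetical.

```python
# Backend per category, as stated above.
BACKEND_BY_CATEGORY = {"optimized": "TensorRT", "generic": "ONNX"}

# Generic fallback profile ID documented on this page.
GENERIC_PROFILE_ID = "afd81bb5-1b82-4816-a1bd-312dd380e4d1"

# Assumption: the per-GPU profile IDs listed on this page are optimized profiles.
OPTIMIZED_PROFILE_IDS = frozenset({
    "420b5bb2-cd51-4dac-be21-759f3df4e441",
    "3f5c5926-add5-402d-8877-c0798ffbb9e9",
    "3c28d914-ebbd-418b-8c5a-2a0da64bf4e3",
    "d892ff5f-a51e-417b-bda8-63a004f4c3d7",
    "a19c7bf8-b6c4-47b3-b519-0b67840c9951",
    "38ce5361-fd45-4d48-94b0-1ca7eb3c5d0b",
})


def backend_for_profile(profile_id: str) -> str:
    """Return "TensorRT" or "ONNX" for a known profile ID, else raise ValueError."""
    if profile_id == GENERIC_PROFILE_ID:
        return BACKEND_BY_CATEGORY["generic"]
    if profile_id in OPTIMIZED_PROFILE_IDS:
        return BACKEND_BY_CATEGORY["optimized"]
    raise ValueError(f"unknown profile ID: {profile_id}")


print(backend_for_profile(GENERIC_PROFILE_ID))
```

Raising on unknown IDs, rather than guessing a category, keeps the sketch from misreporting profiles that are not documented here.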

Quantization

For some models and GPU configurations, quantized engines with reduced numerical precision are available. Currently, the NV-CLIP NIM provides FP16 engines for each of its GPU profiles.

© Copyright 2024, NVIDIA Corporation. Last updated on Oct 3, 2024.