Support Matrix#
About Model Profiles#
NVIDIA NIM microservices run models on engines that are tuned for a specific combination of NVIDIA GPU model, number of GPUs, precision, and similar settings. NVIDIA produces model engines for several popular combinations, and these are referred to as model profiles. Each model profile is identified by a profile ID, which is a unique 64-character string of hexadecimal digits.
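Because a profile ID has a fixed shape, a quick format check can catch copy-and-paste mistakes before an ID is passed to a deployment. The following is a minimal sketch in Python; the function name `is_profile_id` is hypothetical, and accepting uppercase hexadecimal digits is an assumption (the IDs listed on this page are all lowercase). The check validates the format only, not whether the ID exists in a given container.

```python
import re

# A profile ID is described as a 64-character string of hexadecimal digits.
# Accepting uppercase digits is an assumption made for this sketch.
_PROFILE_ID_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_profile_id(value: str) -> bool:
    """Return True if value is exactly 64 hexadecimal characters."""
    return _PROFILE_ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    sample = "7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414"
    print(is_profile_id(sample))
    print(is_profile_id(sample[:-1]))  # one character short
```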
The NIM microservices support automatic profile selection by determining the GPU model and count on the node and attempting to match the optimal model profile. Alternatively, NIM microservices support running a specified model profile, but this requires that you review the profiles and know the profile ID.
The available model profiles are stored in a file in the NIM container file system.
The file is referred to as the model manifest file, and its default path in the container is /opt/nim/etc/default/model_manifest.yaml.
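As a rough way to see which IDs the manifest mentions, the sketch below scans the file as plain text for 64-digit hexadecimal strings. It deliberately does not parse the YAML, because the manifest schema is not described on this page. The function name `find_candidate_profile_ids` is hypothetical, and the results are only candidates: the file may contain other 64-digit hexadecimal values, such as checksums, that are not profile IDs.

```python
import re
from pathlib import Path

# Default manifest path inside the container, as stated above.
MANIFEST_PATH = Path("/opt/nim/etc/default/model_manifest.yaml")

# Word boundaries prevent a match inside a longer run of hex digits.
_HEX64_RE = re.compile(r"\b[0-9a-f]{64}\b")


def find_candidate_profile_ids(text: str) -> list[str]:
    """Return unique 64-hex-digit strings in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _HEX64_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    if MANIFEST_PATH.exists():
        for candidate in find_candidate_profile_ids(MANIFEST_PATH.read_text()):
            print(candidate)
    else:
        print(f"{MANIFEST_PATH} not found; run this inside the NIM container.")
```

Compare any candidate against the tables on this page before relying on it as a profile ID.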
NVIDIA Llama 3.1 NemoGuard 8B ContentSafety Model Profiles#
The model requires 48 GB of GPU memory. NVIDIA developed and tested the microservice using H100, A100, and A6000 GPUs. You can use a single GPU with at least that much memory, or two GPUs that together provide it.
For information about locally-buildable and generic model profiles, refer to Model Profiles in the NIM for LLMs documentation.
Note
The tensor parallel 4 GPU model profiles are not runnable. This is a known issue.
Locally-Buildable Model Profiles#
| Precision | # of GPUs | LoRA | LLM Engine | TensorRT-LLM Buildable | Disk Space | Profile ID |
|---|---|---|---|---|---|---|
| BF16 | 1 | False | TensorRT-LLM | True | 14.97 GB | 7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414 |
| BF16 | 1 | True | TensorRT-LLM | True | 14.97 GB | df4113435195daa68b56c83741d66b422c463c556fc1669f39f923427c1c57c5 |
| BF16 | 2 | True | TensorRT-LLM | True | 14.97 GB | 48696b63c4821ae61e3dae479a1a822f1d2aa4cc8d02fae64a59f1d88c487304 |
| BF16 | 2 | False | TensorRT-LLM | True | 14.97 GB | b7b6fa584441d9536091ce5cf80ccc31765780b8a46540da4e7bada5c5108ed9 |
| BF16 | 4 | False | TensorRT-LLM | True | 14.97 GB | 4e0d43c3245d0232d32bcca05648c98a70e9692518701cdd0cfd987acf5a3cfa |
Generic Model Profiles#
| Precision | # of GPUs | LoRA | LLM Engine | Disk Space | Profile ID |
|---|---|---|---|---|---|
| BF16 | 1 | False | vLLM | 14.97 GB | 193649a2eb95e821309d6023a2cabb31489d3b690a9973c7ab5d1ff58b0aa7eb |
| BF16 | 2 | False | vLLM | 14.97 GB | 395082aa40085d35f004dd3056d7583aea330417ed509b4315099a66cfc72bdd |
| BF16 | 4 | False | vLLM | 14.97 GB | 96e7cd0991f4ab5cf47a08cce8d1169daa8a431485be805fb00de0638bdeed9d |
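If you plan to run a specified profile rather than rely on automatic selection, it can help to narrow the rows above by GPU count, LoRA support, and engine. The following Python sketch encodes both tables as data and filters them. It is only a lookup over this page, not the selection logic that NIM uses. The names `Profile`, `PROFILES`, and `runnable_profiles` are hypothetical, and treating every 4-GPU row in both tables as not runnable is an assumption based on the known-issue note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    profile_id: str
    gpus: int
    lora: bool
    engine: str


# Rows copied from the two tables on this page (all are BF16, 14.97 GB).
PROFILES = [
    Profile("7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414", 1, False, "TensorRT-LLM"),
    Profile("df4113435195daa68b56c83741d66b422c463c556fc1669f39f923427c1c57c5", 1, True, "TensorRT-LLM"),
    Profile("48696b63c4821ae61e3dae479a1a822f1d2aa4cc8d02fae64a59f1d88c487304", 2, True, "TensorRT-LLM"),
    Profile("b7b6fa584441d9536091ce5cf80ccc31765780b8a46540da4e7bada5c5108ed9", 2, False, "TensorRT-LLM"),
    Profile("4e0d43c3245d0232d32bcca05648c98a70e9692518701cdd0cfd987acf5a3cfa", 4, False, "TensorRT-LLM"),
    Profile("193649a2eb95e821309d6023a2cabb31489d3b690a9973c7ab5d1ff58b0aa7eb", 1, False, "vLLM"),
    Profile("395082aa40085d35f004dd3056d7583aea330417ed509b4315099a66cfc72bdd", 2, False, "vLLM"),
    Profile("96e7cd0991f4ab5cf47a08cce8d1169daa8a431485be805fb00de0638bdeed9d", 4, False, "vLLM"),
]

# Assumption: the known issue applies to every 4-GPU profile listed above.
NOT_RUNNABLE_GPU_COUNTS = {4}


def runnable_profiles(gpus: int, lora: bool = False,
                      engine: Optional[str] = None) -> list[Profile]:
    """Return listed profiles that match the request and are not known-broken."""
    return [
        p for p in PROFILES
        if p.gpus == gpus
        and p.gpus not in NOT_RUNNABLE_GPU_COUNTS
        and p.lora == lora
        and (engine is None or p.engine == engine)
    ]


if __name__ == "__main__":
    for profile in runnable_profiles(gpus=2):
        print(profile.engine, profile.profile_id)
```

Leaving `engine` unset returns matches from both tables, so the caller still decides between a locally-buildable TensorRT-LLM profile and a generic vLLM profile.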