Support Matrix#
About Model Profiles#
The models for NVIDIA NIM microservices use model engines that are tuned for a specific combination of NVIDIA GPU model, number of GPUs, precision, and other factors. NVIDIA produces model engines for several popular combinations, and these are referred to as model profiles. Each model profile is identified by a unique 64-character string of hexadecimal digits that is referred to as a profile ID.
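Because a profile ID has a fixed shape, a quick format check can catch copy-and-paste mistakes before you pass an ID to a microservice. The following is a minimal sketch, not part of NIM; the function name `is_profile_id` is hypothetical, and accepting uppercase hexadecimal digits is an assumption.

```python
import re

# A profile ID is described above as exactly 64 hexadecimal digits.
# Accepting uppercase digits is an assumption made for convenience.
_PROFILE_ID_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_profile_id(value: str) -> bool:
    """Return True if value has the shape of a profile ID (64 hex digits)."""
    return _PROFILE_ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_profile_id("ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2"))
    print(is_profile_id("not-a-profile-id"))
```

The check only validates the format. It does not confirm that the ID exists in the model manifest of a given container.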
The NIM microservices support automatic profile selection by determining the GPU model and count on the node and attempting to match the optimal model profile. Alternatively, NIM microservices support running a specified model profile, but this requires that you review the profiles and know the profile ID.
The available model profiles are stored in a file in the NIM container file system.
The file is referred to as the model manifest file and the default path is /opt/nim/etc/default/model_manifest.yaml in the container.
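To review the profile IDs without knowing the manifest layout, you can scan the file text for strings that look like profile IDs. This sketch makes no assumption about the YAML structure; it only searches for standalone 64-digit hexadecimal strings, so it may also report other hashes if the file contains any. The function name `find_profile_id_candidates` is hypothetical.

```python
import re
from pathlib import Path

# Default manifest path inside the container, as stated above.
DEFAULT_MANIFEST_PATH = Path("/opt/nim/etc/default/model_manifest.yaml")

# Exactly 64 hex digits, not embedded in a longer run of hex digits.
_CANDIDATE_RE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{64}(?![0-9a-fA-F])")


def find_profile_id_candidates(manifest_text: str) -> list[str]:
    """Return unique 64-hex-digit strings in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _CANDIDATE_RE.finditer(manifest_text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


if __name__ == "__main__":
    if DEFAULT_MANIFEST_PATH.exists():
        text = DEFAULT_MANIFEST_PATH.read_text()
        for candidate in find_profile_id_candidates(text):
            print(candidate)
    else:
        print(f"{DEFAULT_MANIFEST_PATH} not found; run this inside the NIM container.")
```

Treat the output as a list of candidates and compare it against the profile tables for the microservice version that you run.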
NVIDIA Llama 3.1 NemoGuard 8B TopicGuard Model Profiles#
The model requires 48 GB of GPU memory. NVIDIA developed and tested the microservice using the following GPUs:
- B200 with tensor parallelism of 1 and 2
- GH200 480GB with tensor parallelism of 1
- H200 with tensor parallelism of 1 and 2
- H200 NVLink with tensor parallelism of 1 and 2
- H100 with tensor parallelism of 1 and 2
- H100 NVLink with tensor parallelism of 1 and 2
- A100 with tensor parallelism of 1 and 2
- A100 SXM4 40GB with tensor parallelism of 1 and 2
- A10G with tensor parallelism of 4 and 8
- L40S with tensor parallelism of 2 and 4
You can use a single GPU with the required memory capacity or multiple GPUs that meet the capacity.
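The capacity rule above can be expressed as a simple check. This is a rough sketch that assumes "meet the capacity" means the combined memory of the GPUs is at least the stated 48 GB; it ignores runtime overhead and any per-GPU minimums. The function name and the sample memory sizes are illustrative only.

```python
# Memory requirement stated above for this model, in GB.
REQUIRED_GPU_MEMORY_GB = 48


def meets_memory_requirement(gpu_memory_gb, required_gb=REQUIRED_GPU_MEMORY_GB):
    """Return True if the combined memory of the listed GPUs is at least required_gb.

    gpu_memory_gb is an iterable with one memory size (in GB) per GPU.
    Assumption: capacity is judged on the sum across GPUs.
    """
    return sum(gpu_memory_gb) >= required_gb


if __name__ == "__main__":
    print(meets_memory_requirement([80]))       # one large GPU
    print(meets_memory_requirement([24, 24]))   # two smaller GPUs combined
    print(meets_memory_requirement([24]))       # one smaller GPU alone
```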
For information about locally-buildable and generic model profiles, refer to Model Profiles in NVIDIA NIM for LLMs in the NIM for LLMs documentation.
Llama 3.1 NemoGuard 8B Topic Control Version 1.10.1 Model Profiles#
Locally-Buildable Model Profiles#
Precision | # of GPUs | LoRA | LLM Engine | TensorRT-LLM Buildable | Disk Space | Profile ID |
---|---|---|---|---|---|---|
BF16 | 1 | False | TensorRT-LLM | True | 14.97 GB | ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2 |
BF16 | 2 | False | TensorRT-LLM | True | 14.97 GB | 375dc0ff86133c2a423fbe9ef46d8fdf12d6403b3caa3b8e70d7851a89fc90dd |
BF16 | 4 | False | TensorRT-LLM | True | 14.97 GB | 54946b08b79ecf9e7f2d5c000234bf2cce19c8fee21b243c1a084b03897e8c95 |
BF16 | 8 | False | TensorRT-LLM | True | 14.97 GB | 1d7b604f835f74791e6bfd843047fc00a5aef0f72954ca48ce963811fb6f3f09 |
Generic Model Profiles#
Precision | # of GPUs | LoRA | LLM Engine | Disk Space | Profile ID |
---|---|---|---|---|---|
BF16 | 1 | False | vLLM | 14.97 GB | 4f904d571fe60ff24695b5ee2aa42da58cb460787a968f1e8a09f5a7e862728d |
BF16 | 2 | False | vLLM | 14.97 GB | 7fa4a5a68c0338f16aef61de94977acfdacb7cabd848d38c49c48d2f639f04b3 |
BF16 | 4 | False | vLLM | 14.97 GB | c84b2a068e56a551906563035ed77f88c88cbe1a63c6768fb2d4a9e0af1e67ba |
BF16 | 8 | False | vLLM | 14.97 GB | f95be114df33dd6613105f76fd567a071ed3bd08232888a5ba2f0545a99dbd92 |
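When you want to run a specified profile rather than rely on automatic selection, it can help to look up the ID by engine and GPU count. The sketch below copies the profile IDs for version 1.10.1 from the two tables above into a dictionary; the `PROFILES` mapping and the `profile_id_for` function are hypothetical helpers, not part of NIM, and all listed profiles are BF16 with LoRA disabled.

```python
# Profile IDs for version 1.10.1, keyed by (LLM engine, number of GPUs).
# Values are copied from the locally-buildable and generic profile tables.
PROFILES = {
    ("TensorRT-LLM", 1): "ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2",
    ("TensorRT-LLM", 2): "375dc0ff86133c2a423fbe9ef46d8fdf12d6403b3caa3b8e70d7851a89fc90dd",
    ("TensorRT-LLM", 4): "54946b08b79ecf9e7f2d5c000234bf2cce19c8fee21b243c1a084b03897e8c95",
    ("TensorRT-LLM", 8): "1d7b604f835f74791e6bfd843047fc00a5aef0f72954ca48ce963811fb6f3f09",
    ("vLLM", 1): "4f904d571fe60ff24695b5ee2aa42da58cb460787a968f1e8a09f5a7e862728d",
    ("vLLM", 2): "7fa4a5a68c0338f16aef61de94977acfdacb7cabd848d38c49c48d2f639f04b3",
    ("vLLM", 4): "c84b2a068e56a551906563035ed77f88c88cbe1a63c6768fb2d4a9e0af1e67ba",
    ("vLLM", 8): "f95be114df33dd6613105f76fd567a071ed3bd08232888a5ba2f0545a99dbd92",
}


def profile_id_for(engine: str, num_gpus: int) -> str:
    """Return the profile ID for an engine and GPU count, or raise ValueError."""
    try:
        return PROFILES[(engine, num_gpus)]
    except KeyError:
        raise ValueError(
            f"No profile listed for engine={engine!r} with {num_gpus} GPU(s)"
        ) from None


if __name__ == "__main__":
    print(profile_id_for("vLLM", 2))
```

A hard-coded mapping like this goes stale when the microservice version changes, so regenerate it from the tables or the model manifest for the version that you deploy.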