Support Matrix
Hardware Requirements
Locally-buildable model profiles require additional host memory on first use, when the model profile is built.
| Model Profile Type | GPU Models | GPU Memory | Host Memory (First Use) | Host Memory (Subsequent Use) |
|---|---|---|---|---|
| Locally-Buildable | H100, A100, A6000 | 48 GB | 60 to 80 GB | 32 GB |
| Generic | Any GPU with sufficient memory | 48 GB | 32 GB | 32 GB |
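As a rough pre-flight check, the host memory figures in the table can be encoded in a small lookup. This is only a sketch: the names `HOST_MEMORY_GB` and `host_memory_ok` are hypothetical, and the upper bound of the 60 to 80 GB range is used as a conservative assumption for locally-buildable profiles.

```python
# Host memory figures (GB) copied from the hardware requirements table.
# Assumption: for the "60 to 80 GB" range, the upper bound (80) is used.
HOST_MEMORY_GB = {
    "locally-buildable": {"first_use": 80, "subsequent_use": 32},
    "generic": {"first_use": 32, "subsequent_use": 32},
}


def host_memory_ok(profile_type: str, available_gb: float, first_use: bool = True) -> bool:
    """Return True if available host memory meets the table's requirement."""
    phase = "first_use" if first_use else "subsequent_use"
    return available_gb >= HOST_MEMORY_GB[profile_type][phase]


if __name__ == "__main__":
    # A 64 GB host: possibly too small to build a locally-buildable profile,
    # but enough to run it once the profile has been built.
    print(host_memory_ok("locally-buildable", 64))
    print(host_memory_ok("locally-buildable", 64, first_use=False))
```

Because the table gives a range for first use, a host with between 60 and 80 GB may still work; the check above simply errs on the side of caution.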
About Model Profiles
The models for NVIDIA NIM microservices use model engines that are tuned for specific NVIDIA GPU models, number of GPUs, precision, and so on. NVIDIA produces model engines for several popular combinations and these are referred to as model profiles. Each model profile is identified by a unique 64-character string of hexadecimal digits that is referred to as a profile ID.
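Since a profile ID is described as a 64-character string of hexadecimal digits, a quick format check can catch copy-and-paste mistakes before an ID is used. The following is a minimal sketch; the function name `is_profile_id` is hypothetical, and accepting uppercase digits is an assumption (the IDs listed on this page are lowercase).

```python
import re

# Exactly 64 hexadecimal digits. Accepting uppercase is an assumption;
# the profile IDs shown in this document are all lowercase.
_PROFILE_ID_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_profile_id(value: str) -> bool:
    """Return True if value has the shape of a profile ID (64 hex digits)."""
    return _PROFILE_ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_profile_id("7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414"))
    print(is_profile_id("7cc85976"))  # truncated ID
```

The check validates only the format. It does not confirm that the ID exists in the model manifest.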
The NIM microservices support automatic profile selection by determining the GPU model and count on the node and attempting to match the optimal model profile. Alternatively, NIM microservices support running a specified model profile, but this requires that you review the profiles and know the profile ID.
The available model profiles are stored in a file in the NIM container file system.
The file is referred to as the model manifest file and the default path is /opt/nim/etc/default/model_manifest.yaml in the container.
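To review which profile IDs a container ships with, one option is to scan the manifest text for ID-shaped strings. The sketch below deliberately avoids assuming anything about the YAML layout of the manifest, which this page does not describe. It relies on one assumption: profile IDs appear in the file as standalone 64-digit lowercase hexadecimal strings. Other 64-digit hex values in the file, such as checksums, would also match, so treat the output as candidates. The name `find_profile_ids` is hypothetical.

```python
import re
from pathlib import Path

# Default manifest path inside the container, as stated in the text.
MANIFEST_PATH = Path("/opt/nim/etc/default/model_manifest.yaml")

# Assumption: profile IDs appear as standalone 64-digit lowercase hex strings.
_ID_RE = re.compile(r"\b[0-9a-f]{64}\b")


def find_profile_ids(manifest_text: str) -> list[str]:
    """Return unique 64-hex-digit strings in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _ID_RE.finditer(manifest_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    # Run inside the container; outside it the file will not exist.
    if MANIFEST_PATH.exists():
        for candidate in find_profile_ids(MANIFEST_PATH.read_text(encoding="utf-8")):
            print(candidate)
```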
For information about locally-buildable and generic model profiles, refer to Model Profiles in the NIM for LLMs documentation.
NVIDIA Llama 3.1 NemoGuard 8B TopicGuard Model Profiles
Note
The model profiles that use 4 GPUs (tensor parallelism of 4) are not runnable. This is a known issue.
NVIDIA Llama 3.1 NemoGuard 8B Topic Control Version 1.3.0 Model Profiles
Locally-Buildable Model Profiles
| Precision | # of GPUs | LoRA | LLM Engine | TensorRT-LLM Buildable | Disk Space | Profile ID |
|---|---|---|---|---|---|---|
| BF16 | 1 | False | TensorRT-LLM | True | 14.97 GB | 7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414 |
| BF16 | 1 | True | TensorRT-LLM | True | 14.97 GB | df4113435195daa68b56c83741d66b422c463c556fc1669f39f923427c1c57c5 |
| BF16 | 2 | True | TensorRT-LLM | True | 14.97 GB | 48696b63c4821ae61e3dae479a1a822f1d2aa4cc8d02fae64a59f1d88c487304 |
| BF16 | 2 | False | TensorRT-LLM | True | 14.97 GB | b7b6fa584441d9536091ce5cf80ccc31765780b8a46540da4e7bada5c5108ed9 |
| BF16 | 4 | False | TensorRT-LLM | True | 14.97 GB | 4e0d43c3245d0232d32bcca05648c98a70e9692518701cdd0cfd987acf5a3cfa |
Generic Model Profiles
| Precision | # of GPUs | LoRA | LLM Engine | Disk Space | Profile ID |
|---|---|---|---|---|---|
| BF16 | 1 | False | vLLM | 14.97 GB | 193649a2eb95e821309d6023a2cabb31489d3b690a9973c7ab5d1ff58b0aa7eb |
| BF16 | 2 | False | vLLM | 14.97 GB | 395082aa40085d35f004dd3056d7583aea330417ed509b4315099a66cfc72bdd |
| BF16 | 4 | False | vLLM | 14.97 GB | 96e7cd0991f4ab5cf47a08cce8d1169daa8a431485be805fb00de0638bdeed9d |
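When running a specified model profile, the profile ID has to be looked up by engine, GPU count, and LoRA support. The sketch below encodes the 1-GPU and 2-GPU rows of the two tables as a lookup. It is an illustration of reading the tables, not a description of how NIM selects profiles internally. The names `Profile`, `PROFILES`, and `find_profile` are hypothetical, and the 4-GPU rows are omitted on purpose because the note above lists them as not runnable.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    engine: str
    gpus: int
    lora: bool
    profile_id: str


# 1-GPU and 2-GPU rows copied from the tables above (all BF16).
# The 4-GPU profiles are left out because they are noted as not runnable.
PROFILES = [
    Profile("TensorRT-LLM", 1, False, "7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414"),
    Profile("TensorRT-LLM", 1, True, "df4113435195daa68b56c83741d66b422c463c556fc1669f39f923427c1c57c5"),
    Profile("TensorRT-LLM", 2, True, "48696b63c4821ae61e3dae479a1a822f1d2aa4cc8d02fae64a59f1d88c487304"),
    Profile("TensorRT-LLM", 2, False, "b7b6fa584441d9536091ce5cf80ccc31765780b8a46540da4e7bada5c5108ed9"),
    Profile("vLLM", 1, False, "193649a2eb95e821309d6023a2cabb31489d3b690a9973c7ab5d1ff58b0aa7eb"),
    Profile("vLLM", 2, False, "395082aa40085d35f004dd3056d7583aea330417ed509b4315099a66cfc72bdd"),
]


def find_profile(engine: str, gpus: int, lora: bool = False) -> Optional[Profile]:
    """Return the table row matching engine, GPU count, and LoRA, or None."""
    for profile in PROFILES:
        if (profile.engine, profile.gpus, profile.lora) == (engine, gpus, lora):
            return profile
    return None


if __name__ == "__main__":
    match = find_profile("vLLM", 2)
    print(match.profile_id if match else "no matching profile")
```

Returning `None` for an unlisted combination, such as a LoRA-enabled vLLM profile, makes the gap explicit instead of silently falling back to a different profile.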