Support Matrix

About Model Profiles

NVIDIA NIM microservices run models with model engines that are tuned for a specific NVIDIA GPU model, number of GPUs, precision, and other settings. NVIDIA produces model engines for several popular combinations; these are referred to as model profiles. Each model profile is identified by a unique 64-character string of hexadecimal digits that is referred to as a profile ID.
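Because a profile ID has a fixed shape, a quick format check can catch a truncated or mistyped ID before you use it. The following Python sketch checks only the shape described above (64 hexadecimal digits); it cannot tell whether an ID actually exists in a given manifest. The function name is_valid_profile_id is hypothetical.

```python
import re

# Profile IDs are 64-character strings of hexadecimal digits. Accepting
# uppercase digits is an assumption; the IDs on this page are lowercase.
_PROFILE_ID_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_valid_profile_id(value: str) -> bool:
    """Return True if value has the shape of a profile ID (hypothetical helper)."""
    return _PROFILE_ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    sample = "ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2"
    print(is_valid_profile_id(sample))       # True
    print(is_valid_profile_id(sample[:-1]))  # False: only 63 characters
    print(is_valid_profile_id("g" * 64))     # False: "g" is not a hex digit
```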

NIM microservices support automatic profile selection: they detect the GPU model and GPU count on the node and attempt to match the optimal model profile. Alternatively, you can run a specific model profile, but you must first review the available profiles and know the ID of the profile you want.
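The idea of automatic selection can be illustrated with a deliberately simplified Python sketch. It is not NIM's actual selection algorithm, which, as described above, also takes the GPU model into account; the rule used here ("prefer the profile that uses the most GPUs without exceeding what the node has") and the Profile and pick_profile names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Profile:
    profile_id: str
    gpu_count: int  # number of GPUs the profile is built for


def pick_profile(profiles: Sequence[Profile], available_gpus: int) -> Optional[Profile]:
    """Hypothetical rule: use as many of the node's GPUs as possible.

    Profiles that need more GPUs than are available are skipped. Among the
    remaining profiles, the one with the largest GPU count is returned.
    Returns None when no profile fits. GPU model matching is not modeled.
    """
    candidates = [p for p in profiles if p.gpu_count <= available_gpus]
    return max(candidates, key=lambda p: p.gpu_count, default=None)


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    profiles = [Profile("a" * 64, 1), Profile("b" * 64, 2), Profile("c" * 64, 4)]
    print(pick_profile(profiles, 3).gpu_count)  # 2
    print(pick_profile(profiles, 0))            # None
```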

The available model profiles are stored in a file in the NIM container file system. The file is referred to as the model manifest file and the default path is /opt/nim/etc/default/model_manifest.yaml in the container.
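If you can read the manifest file, for example from a shell inside the container, a short Python sketch can list the strings in it that have the shape of a profile ID. This avoids assuming anything about the manifest's YAML layout, which this page does not describe. The function name find_candidate_profile_ids is hypothetical, and because other 64-digit hexadecimal values (such as SHA-256 checksums) could also match, the output is a list of candidates rather than a verified list of profiles.

```python
import re
from pathlib import Path
from typing import List

# Default manifest location inside the container, as documented above.
DEFAULT_MANIFEST = Path("/opt/nim/etc/default/model_manifest.yaml")

# A run of exactly 64 hex digits that is not part of a longer hex run.
_ID_RE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{64}(?![0-9a-fA-F])")


def find_candidate_profile_ids(manifest_text: str) -> List[str]:
    """Return unique 64-hex-digit strings in order of first appearance.

    No YAML schema is assumed. Other 64-hex-digit values, such as SHA-256
    checksums, would also match, so treat the result as candidates only.
    """
    seen = {}  # dicts preserve insertion order, so this deduplicates in order
    for match in _ID_RE.finditer(manifest_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__" and DEFAULT_MANIFEST.is_file():
    for candidate in find_candidate_profile_ids(DEFAULT_MANIFEST.read_text()):
        print(candidate)
```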

NVIDIA Llama 3.1 NemoGuard 8B TopicControl Model Profiles

The model requires 48 GB of GPU memory. NVIDIA developed and tested the microservice using the following GPUs:

B200 with tensor parallelism of 1 and 2
GH200 480GB with tensor parallelism of 1
H200 with tensor parallelism of 1 and 2
H200 NVLink with tensor parallelism of 1 and 2
H100 with tensor parallelism of 1 and 2
H100 NVLink with tensor parallelism of 1 and 2
A100 with tensor parallelism of 1 and 2
A100 SXM4 40GB with tensor parallelism of 1 and 2
A10G with tensor parallelism of 4 and 8
L40S with tensor parallelism of 2 and 4
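The tested combinations above can be captured as a lookup table, for instance to check a planned deployment against the list. In this Python sketch the GPU names and tensor parallelism sizes are copied from the list; the TESTED_CONFIGS and is_tested names are hypothetical. A False result only means the combination is not on NVIDIA's tested list, not that it cannot work.

```python
from typing import Dict, Set

# Tested GPU and tensor parallelism (TP) sizes, transcribed from the list above.
TESTED_CONFIGS: Dict[str, Set[int]] = {
    "B200": {1, 2},
    "GH200 480GB": {1},
    "H200": {1, 2},
    "H200 NVLink": {1, 2},
    "H100": {1, 2},
    "H100 NVLink": {1, 2},
    "A100": {1, 2},
    "A100 SXM4 40GB": {1, 2},
    "A10G": {4, 8},
    "L40S": {2, 4},
}


def is_tested(gpu: str, tensor_parallelism: int) -> bool:
    """Return True if the GPU name and TP size appear in the tested list."""
    return tensor_parallelism in TESTED_CONFIGS.get(gpu, set())


if __name__ == "__main__":
    print(is_tested("L40S", 2))  # True
    print(is_tested("L40S", 1))  # False: not in the tested list
```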

You can use a single GPU that has the required memory capacity, or multiple GPUs whose combined memory meets it.
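The capacity rule amounts to simple arithmetic: with n identical GPUs of m GB each, you need n × m ≥ 48, so n = ceil(48 / m). The Python sketch below (function name hypothetical) computes that minimum. It ignores any per-GPU overhead, and the tensor parallelism sizes that NVIDIA tested, listed above, are not always equal to this arithmetic minimum, so prefer a tested configuration when one is available.

```python
import math

# GPU memory requirement stated above for this model, in GB.
REQUIRED_GPU_MEMORY_GB = 48


def min_gpus_for_capacity(per_gpu_memory_gb: float,
                          required_gb: float = REQUIRED_GPU_MEMORY_GB) -> int:
    """Smallest number of identical GPUs whose combined memory >= required_gb."""
    if per_gpu_memory_gb <= 0:
        raise ValueError("per_gpu_memory_gb must be positive")
    return math.ceil(required_gb / per_gpu_memory_gb)


if __name__ == "__main__":
    # Hypothetical per-GPU capacities, not tied to a specific GPU model.
    for per_gpu in (24, 40, 80):
        print(per_gpu, "GB per GPU ->", min_gpus_for_capacity(per_gpu), "GPU(s)")
```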

For information about locally-buildable and generic model profiles, refer to Model Profiles in NVIDIA NIM for LLMs in the NIM for LLMs documentation.

Llama 3.1 NemoGuard 8B TopicControl Version 1.10.1 Model Profiles

Locally-Buildable Model Profiles

Precision	# of GPUs	LoRA	LLM Engine	TensorRT-LLM Buildable	Disk Space	Profile ID
BF16	1	False	TensorRT-LLM	True	14.97 GB	ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2
BF16	2	False	TensorRT-LLM	True	14.97 GB	375dc0ff86133c2a423fbe9ef46d8fdf12d6403b3caa3b8e70d7851a89fc90dd
BF16	4	False	TensorRT-LLM	True	14.97 GB	54946b08b79ecf9e7f2d5c000234bf2cce19c8fee21b243c1a084b03897e8c95
BF16	8	False	TensorRT-LLM	True	14.97 GB	1d7b604f835f74791e6bfd843047fc00a5aef0f72954ca48ce963811fb6f3f09

Generic Model Profiles

Precision	# of GPUs	LoRA	LLM Engine	Disk Space	Profile ID
BF16	1	False	vLLM	14.97 GB	4f904d571fe60ff24695b5ee2aa42da58cb460787a968f1e8a09f5a7e862728d
BF16	2	False	vLLM	14.97 GB	7fa4a5a68c0338f16aef61de94977acfdacb7cabd848d38c49c48d2f639f04b3
BF16	4	False	vLLM	14.97 GB	c84b2a068e56a551906563035ed77f88c88cbe1a63c6768fb2d4a9e0af1e67ba
BF16	8	False	vLLM	14.97 GB	f95be114df33dd6613105f76fd567a071ed3bd08232888a5ba2f0545a99dbd92
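To pin one of the profiles above in a script, the two tables can be combined into a single lookup keyed by LLM engine and GPU count. In this Python sketch the profile IDs are copied from the tables; the PROFILE_IDS and select_profile_id names are hypothetical, and how you then pass the chosen ID to the container depends on your NIM configuration.

```python
from typing import Dict, Tuple

# (LLM engine, number of GPUs) -> profile ID, copied from the two tables above.
# Every profile listed is BF16 with LoRA disabled.
PROFILE_IDS: Dict[Tuple[str, int], str] = {
    ("TensorRT-LLM", 1): "ac34857f8dcbd174ad524974248f2faf271bd2a0355643b2cf1490d0fe7787c2",
    ("TensorRT-LLM", 2): "375dc0ff86133c2a423fbe9ef46d8fdf12d6403b3caa3b8e70d7851a89fc90dd",
    ("TensorRT-LLM", 4): "54946b08b79ecf9e7f2d5c000234bf2cce19c8fee21b243c1a084b03897e8c95",
    ("TensorRT-LLM", 8): "1d7b604f835f74791e6bfd843047fc00a5aef0f72954ca48ce963811fb6f3f09",
    ("vLLM", 1): "4f904d571fe60ff24695b5ee2aa42da58cb460787a968f1e8a09f5a7e862728d",
    ("vLLM", 2): "7fa4a5a68c0338f16aef61de94977acfdacb7cabd848d38c49c48d2f639f04b3",
    ("vLLM", 4): "c84b2a068e56a551906563035ed77f88c88cbe1a63c6768fb2d4a9e0af1e67ba",
    ("vLLM", 8): "f95be114df33dd6613105f76fd567a071ed3bd08232888a5ba2f0545a99dbd92",
}


def select_profile_id(engine: str, gpu_count: int) -> str:
    """Look up the BF16, non-LoRA profile ID for an engine and GPU count."""
    try:
        return PROFILE_IDS[(engine, gpu_count)]
    except KeyError:
        listed = sorted(n for e, n in PROFILE_IDS if e == engine)
        raise ValueError(
            f"no {engine!r} profile for {gpu_count} GPU(s); listed counts: {listed}"
        ) from None


if __name__ == "__main__":
    print(select_profile_id("vLLM", 4))
```

Keying on the (engine, GPU count) pair keeps the choice explicit: both tables list the same GPU counts, so the engine is the only other decision to make.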