Support Matrix#

Hardware Requirements#

Locally-buildable model profiles require additional host memory on first use, when the model profile is built.

| Model Profile Type | GPU Models | GPU Memory | Host Memory (First Use) | Host Memory (Subsequent Use) |
| --- | --- | --- | --- | --- |
| Locally-Buildable | H100, A100, A6000 | 48 GB | 60 to 80 GB | 32 GB |
| Generic | Any GPU with sufficient memory | 48 GB | 32 GB | 32 GB |
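The requirements above can be captured as a small lookup for a pre-flight check. The following is a minimal sketch, not part of the NIM tooling: the names `HOST_MEMORY_GB`, `host_memory_range_gb`, and `host_has_enough_memory` are hypothetical, and the figures are copied from the table above.

```python
from typing import Tuple

# Host memory figures in GB, copied from the hardware requirements table.
# Each entry is a (low, high) range; a single value is stored as (x, x).
HOST_MEMORY_GB = {
    "locally-buildable": {"first_use": (60, 80), "subsequent_use": (32, 32)},
    "generic": {"first_use": (32, 32), "subsequent_use": (32, 32)},
}

# Both profile types list the same GPU memory requirement.
GPU_MEMORY_GB = 48


def host_memory_range_gb(profile_type: str, first_use: bool) -> Tuple[int, int]:
    """Return the (low, high) host memory range in GB for a profile type."""
    phase = "first_use" if first_use else "subsequent_use"
    return HOST_MEMORY_GB[profile_type.lower()][phase]


def host_has_enough_memory(available_gb: float, profile_type: str, first_use: bool) -> bool:
    """Conservative check: compare against the upper end of the listed range."""
    _, high = host_memory_range_gb(profile_type, first_use)
    return available_gb >= high
```

Comparing against the upper end of the range is a deliberately cautious choice for this sketch; a host with 64 GB passes the check for a generic profile but not for the first use of a locally-buildable profile.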

About Model Profiles#

The models for NVIDIA NIM microservices use model engines that are tuned for specific NVIDIA GPU models, number of GPUs, precision, and so on. NVIDIA produces model engines for several popular combinations and these are referred to as model profiles. Each model profile is identified by a unique 64-character string of hexadecimal digits that is referred to as a profile ID.
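Because a profile ID has a fixed shape, a value can be sanity-checked before it is used. The sketch below is illustrative only: `is_profile_id` is a hypothetical helper, and it assumes lowercase hexadecimal digits because the IDs listed in this document are lowercase.

```python
import re

# A profile ID is described as 64 hexadecimal digits.
# Assumption: lowercase only, matching the IDs shown in this document.
_PROFILE_ID_RE = re.compile(r"[0-9a-f]{64}")


def is_profile_id(value: str) -> bool:
    """Return True if the whole string is exactly 64 lowercase hex digits."""
    return _PROFILE_ID_RE.fullmatch(value) is not None
```

A check like this catches truncated or mistyped IDs, but it cannot tell whether an ID corresponds to a profile that actually exists in a given container.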

The NIM microservices support automatic profile selection by determining the GPU model and count on the node and attempting to match the optimal model profile. Alternatively, NIM microservices support running a specified model profile, but this requires that you review the profiles and know the profile ID.

The available model profiles are stored in the model manifest file in the NIM container file system. The default path in the container is /opt/nim/etc/default/model_manifest.yaml.
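One way to review the manifest from inside the container is to scan it for strings that have the shape of a profile ID. The sketch below makes no assumption about the YAML schema of the manifest, which is not described here; it treats the file as plain text. The function name `find_profile_id_candidates` is hypothetical, and the results are only candidates, since a manifest could contain other 64-digit hexadecimal values such as checksums.

```python
import re
from pathlib import Path
from typing import List

# Default manifest location inside the NIM container, as stated above.
DEFAULT_MANIFEST_PATH = Path("/opt/nim/etc/default/model_manifest.yaml")

# Exactly 64 lowercase hex digits, not embedded in a longer hex run.
_HEX64 = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def find_profile_id_candidates(manifest_text: str) -> List[str]:
    """Return unique 64-hex-digit strings in order of first appearance."""
    return list(dict.fromkeys(_HEX64.findall(manifest_text)))


if __name__ == "__main__":
    # Only meaningful when run inside the container, where the file exists.
    if DEFAULT_MANIFEST_PATH.exists():
        text = DEFAULT_MANIFEST_PATH.read_text(encoding="utf-8")
        for candidate in find_profile_id_candidates(text):
            print(candidate)
```

Each candidate should be compared against the profile tables for the model before it is treated as a profile ID.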

For information about locally-buildable and generic model profiles, refer to Model Profiles in the NIM for LLMs documentation.

NVIDIA Llama 3.1 NemoGuard 8B TopicControl Model Profiles#

Note

The tensor parallel 4 GPU model profiles are not runnable. This is a known issue.

NVIDIA Llama 3.1 NemoGuard 8B TopicControl Version 1.3.0 Model Profiles#

Locally-Buildable Model Profiles#

| Precision | Number of GPUs | LoRA | LLM Engine | TensorRT-LLM Buildable | Disk Space | Profile ID |
| --- | --- | --- | --- | --- | --- | --- |
| BF16 | 1 | False | TensorRT-LLM | True | 14.97 GB | 7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414 |
| BF16 | 1 | True | TensorRT-LLM | True | 14.97 GB | df4113435195daa68b56c83741d66b422c463c556fc1669f39f923427c1c57c5 |
| BF16 | 2 | True | TensorRT-LLM | True | 14.97 GB | 48696b63c4821ae61e3dae479a1a822f1d2aa4cc8d02fae64a59f1d88c487304 |
| BF16 | 2 | False | TensorRT-LLM | True | 14.97 GB | b7b6fa584441d9536091ce5cf80ccc31765780b8a46540da4e7bada5c5108ed9 |
| BF16 | 4 | False | TensorRT-LLM | True | 14.97 GB | 4e0d43c3245d0232d32bcca05648c98a70e9692518701cdd0cfd987acf5a3cfa |

Generic Model Profiles#

| Precision | Number of GPUs | LoRA | LLM Engine | Disk Space | Profile ID |
| --- | --- | --- | --- | --- | --- |
| BF16 | 1 | False | vLLM | 14.97 GB | 193649a2eb95e821309d6023a2cabb31489d3b690a9973c7ab5d1ff58b0aa7eb |
| BF16 | 2 | False | vLLM | 14.97 GB | 395082aa40085d35f004dd3056d7583aea330417ed509b4315099a66cfc72bdd |
| BF16 | 4 | False | vLLM | 14.97 GB | 96e7cd0991f4ab5cf47a08cce8d1169daa8a431485be805fb00de0638bdeed9d |
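Running a specified profile requires knowing its profile ID, so it can help to hold the rows above as data and look up an ID by engine, GPU count, and LoRA support. The sketch below is a hypothetical helper, not the selection logic that the NIM microservice uses. It assumes that the known issue with tensor parallel 4 profiles applies to every row that lists 4 GPUs, and it skips those rows unless asked otherwise.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    engine: str
    gpus: int
    lora: bool
    profile_id: str


# Rows copied from the version 1.3.0 profile tables; all are BF16.
PROFILES = [
    Profile("TensorRT-LLM", 1, False, "7cc8597690a35aba19a3636f35e7f1c7e7dbc005fe88ce9394cad4a4adeed414"),
    Profile("TensorRT-LLM", 1, True, "df4113435195daa68b56c83741d66b422c463c556fc1669f39f923427c1c57c5"),
    Profile("TensorRT-LLM", 2, True, "48696b63c4821ae61e3dae479a1a822f1d2aa4cc8d02fae64a59f1d88c487304"),
    Profile("TensorRT-LLM", 2, False, "b7b6fa584441d9536091ce5cf80ccc31765780b8a46540da4e7bada5c5108ed9"),
    Profile("TensorRT-LLM", 4, False, "4e0d43c3245d0232d32bcca05648c98a70e9692518701cdd0cfd987acf5a3cfa"),
    Profile("vLLM", 1, False, "193649a2eb95e821309d6023a2cabb31489d3b690a9973c7ab5d1ff58b0aa7eb"),
    Profile("vLLM", 2, False, "395082aa40085d35f004dd3056d7583aea330417ed509b4315099a66cfc72bdd"),
    Profile("vLLM", 4, False, "96e7cd0991f4ab5cf47a08cce8d1169daa8a431485be805fb00de0638bdeed9d"),
]

# Assumption: "tensor parallel 4" in the known-issue note means the 4-GPU rows.
NOT_RUNNABLE_GPU_COUNT = 4


def find_profile(
    engine: str,
    gpus: int,
    lora: bool = False,
    include_not_runnable: bool = False,
) -> Optional[Profile]:
    """Return the matching table row, or None if no usable row matches."""
    if gpus == NOT_RUNNABLE_GPU_COUNT and not include_not_runnable:
        return None
    for profile in PROFILES:
        if (
            profile.engine.lower() == engine.lower()
            and profile.gpus == gpus
            and profile.lora == lora
        ):
            return profile
    return None
```

For example, `find_profile("vLLM", 1)` returns the generic single-GPU row, while `find_profile("vLLM", 1, lora=True)` returns `None` because no generic profile in the table lists LoRA support.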