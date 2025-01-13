NVIDIA NIMs for large-language models will run on any NVIDIA GPU, as long as the GPU has sufficient memory, or on multiple, homogeneous NVIDIA GPUs with sufficient aggregate memory and CUDA compute capability > 7.0 (8.0 for bfloat16). Some model/GPU combinations, including vGPU, are optimized. See the following Supported Models section for further information.

The GPUs listed in the following sections have the following specifications.

General Guidelines

In general, NVIDIA recommends the following guidelines for models that NVIDIA NIMs support but that have been neither optimized for our TRT-LLM runtime nor tested against all of the GPUs in our lab. The values in these two tables are based on the number of parameters used during training.

Note: These values are estimates, not guarantees.

GPUs

Both H100 and A100 should be 80GB SXM/NVLink models, L40S should be 48GB PCIe models, and A10G should be 24GB PCIe models.

Billion Parameters    H100    A100    L40S    A10G
8 or fewer            1       1       1       1
8 to 70               1       1       2       4
70 to 300             4       4       8       16
300+                  8       8       16      32
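The GPU table above can be read as a simple lookup from parameter count to GPU count. The following is a minimal sketch of that lookup, not part of any NIM tooling: the function name `estimate_gpu_count` is hypothetical, and because the table's ranges share their endpoints (8, 70, and 300 each appear in two rows), the sketch assumes each upper bound is inclusive, so an 8-billion-parameter model falls in the "8 or fewer" row. Like the table itself, the result is an estimate, not a guarantee.

```python
import bisect

# Upper bounds, in billions of parameters, of the first three table rows.
# Assumption: each upper bound is inclusive (8 -> "8 or fewer", 70 -> "8 to 70").
_UPPER_BOUNDS = [8, 70, 300]

# GPU counts per row, copied from the table above (rows in table order).
_GPU_COUNTS = {
    "H100": [1, 1, 4, 8],
    "A100": [1, 1, 4, 8],
    "L40S": [1, 2, 8, 16],
    "A10G": [1, 4, 16, 32],
}


def estimate_gpu_count(billion_params: float, gpu: str) -> int:
    """Return the estimated number of GPUs for a model of the given size.

    Hypothetical helper: it only encodes the guideline table and does not
    query any hardware or NIM API.
    """
    if billion_params <= 0:
        raise ValueError("billion_params must be positive")
    if gpu not in _GPU_COUNTS:
        raise ValueError(f"unknown GPU {gpu!r}; expected one of {sorted(_GPU_COUNTS)}")
    # bisect_left finds the first row whose upper bound is >= billion_params;
    # sizes above the last bound fall through to the "300+" row.
    row = bisect.bisect_left(_UPPER_BOUNDS, billion_params)
    return _GPU_COUNTS[gpu][row]


if __name__ == "__main__":
    for size in (7, 13, 70, 405):
        counts = {gpu: estimate_gpu_count(size, gpu) for gpu in _GPU_COUNTS}
        print(f"{size}B parameters: {counts}")
```

Keeping the bounds and counts in plain data structures makes it easy to switch to exclusive upper bounds (use `bisect.bisect_right`) if that reading of the ranges turns out to be the intended one.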