Support Matrix#

This page lists the supported models, their deployment profiles, and the verified hardware SKUs for NIM LLM.

Supported Models and Profiles#

Use the following sections to identify the supported deployment profiles for each model. Profile strings follow a naming convention described in Model Profiles and Selection.

Note

For supported hardware, refer to the Verified GPUs list for each model or the GPU Compatibility section.
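Profile strings encode the backend, precision, tensor-parallel (TP) degree, pipeline-parallel (PP) degree, and an optional LoRA suffix. As an illustration, a profile string can be decomposed mechanically; a minimal sketch (the field names are my own, not part of the documented schema):

```python
def parse_profile(profile: str) -> dict:
    """Split a profile string such as 'vllm-mxfp4-tp2-pp1-lora' into
    backend, precision, TP degree, PP degree, and LoRA flag."""
    parts = profile.split("-")
    backend, precision = parts[0], parts[1]
    tp = int(parts[2][2:])  # strip the 'tp' prefix
    pp = int(parts[3][2:])  # strip the 'pp' prefix
    lora = len(parts) > 4 and parts[4] == "lora"
    return {"backend": backend, "precision": precision,
            "tp": tp, "pp": pp, "lora": lora,
            "gpus": tp * pp}  # total GPUs the profile occupies

print(parse_profile("vllm-mxfp4-tp2-pp1-lora"))
```

A profile's GPU footprint is the product of its TP and PP degrees, so `vllm-bf16-tp4-pp1` occupies four GPUs.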

gpt-oss-120b#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for openai/gpt-oss-120b:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| MXFP4 | vllm-mxfp4-tp1-pp1 | vllm-mxfp4-tp2-pp1 | vllm-mxfp4-tp4-pp1 | vllm-mxfp4-tp8-pp1 |
| MXFP4 + LoRA | vllm-mxfp4-tp1-pp1-lora | vllm-mxfp4-tp2-pp1-lora | vllm-mxfp4-tp4-pp1-lora | vllm-mxfp4-tp8-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition
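Once you have chosen a profile from the table above, a NIM container can be pinned to it with the `NIM_MODEL_PROFILE` environment variable. A hedged deployment sketch; the image path and tag below are illustrative placeholders, so verify the actual values on NGC before use:

```shell
# Illustrative only: confirm the image path and tag on NGC.
docker run -d --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=vllm-mxfp4-tp4-pp1 \
  -p 8000:8000 \
  nvcr.io/nim/openai/gpt-oss-120b:latest
```

If `NIM_MODEL_PROFILE` is unset, the container selects a compatible profile automatically; refer to Model Profiles and Selection for details.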

gpt-oss-20b#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for openai/gpt-oss-20b:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| MXFP4 | vllm-mxfp4-tp1-pp1 | vllm-mxfp4-tp2-pp1 | vllm-mxfp4-tp4-pp1 | vllm-mxfp4-tp8-pp1 |
| MXFP4 + LoRA | vllm-mxfp4-tp1-pp1-lora | vllm-mxfp4-tp2-pp1-lora | vllm-mxfp4-tp4-pp1-lora | vllm-mxfp4-tp8-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-A10G

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB10

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H200

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

llama-3.1-70b-instruct#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for meta/llama-3.1-70b-instruct:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |
| NVFP4 + LoRA | vllm-nvfp4-tp1-pp1-lora | vllm-nvfp4-tp2-pp1-lora | vllm-nvfp4-tp4-pp1-lora | vllm-nvfp4-tp8-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A10G

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

llama-3.1-8b-instruct#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for meta/llama-3.1-8b-instruct:

| Precision | TP1 |
|---|---|
| BF16 | vllm-bf16-tp1-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 |
| NVFP4 + LoRA | vllm-nvfp4-tp1-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB10

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

llama-3.3-70b-instruct#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for meta/llama-3.3-70b-instruct:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |
| NVFP4 + LoRA | -- | vllm-nvfp4-tp2-pp1-lora | vllm-nvfp4-tp4-pp1-lora | vllm-nvfp4-tp8-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A10G

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

llama-3.3-nemotron-super-49b-v1.5#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for nvidia/llama-3.3-nemotron-super-49b-v1.5:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |
| NVFP4 + LoRA | vllm-nvfp4-tp1-pp1-lora | vllm-nvfp4-tp2-pp1-lora | vllm-nvfp4-tp4-pp1-lora | vllm-nvfp4-tp8-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB10

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

nemotron-3-nano#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for nvidia/nemotron-3-nano:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |
| NVFP4 + LoRA | vllm-nvfp4-tp1-pp1-lora | vllm-nvfp4-tp2-pp1-lora | vllm-nvfp4-tp4-pp1-lora | vllm-nvfp4-tp8-pp1-lora |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB10

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

nemotron-3-super-120b-a12b#

Latest supported NIM LLM version: 2.0.2

The supported profile configurations for nvidia/nemotron-3-super-120b-a12b vary by GPU. Each of the following tables shows one per-GPU configuration; refer to the Verified GPUs list below for the verified hardware:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | -- | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | -- | -- | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | -- | -- | -- | -- |
| FP8 + LoRA | -- | -- | -- | -- |
| NVFP4 | -- | -- | -- | -- |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | -- | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | vllm-bf16-tp1-pp1-lora | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | vllm-bf16-tp2-pp1 | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | -- | vllm-bf16-tp2-pp1-lora | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | vllm-fp8-tp1-pp1 | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | vllm-fp8-tp1-pp1-lora | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | -- | -- | -- | -- |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | -- | vllm-bf16-tp4-pp1 | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | -- | -- | vllm-bf16-tp4-pp1-lora | vllm-bf16-tp8-pp1-lora |
| FP8 | -- | vllm-fp8-tp2-pp1 | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | -- | vllm-fp8-tp2-pp1-lora | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | -- | -- | -- | -- |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | -- | -- | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | -- | -- | -- | -- |
| FP8 | -- | -- | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | -- | -- | -- | -- |
| NVFP4 | -- | -- | -- | -- |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | -- | -- | -- |
| BF16 + LoRA | -- | -- | -- | -- |
| FP8 | -- | -- | -- | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | -- | -- | -- | vllm-fp8-tp8-pp1-lora |
| NVFP4 | -- | -- | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | -- | -- | -- | vllm-bf16-tp8-pp1 |
| BF16 + LoRA | -- | -- | -- | vllm-bf16-tp8-pp1-lora |
| FP8 | -- | -- | vllm-fp8-tp4-pp1 | vllm-fp8-tp8-pp1 |
| FP8 + LoRA | -- | -- | vllm-fp8-tp4-pp1-lora | vllm-fp8-tp8-pp1-lora |
| NVFP4 | vllm-nvfp4-tp1-pp1 | vllm-nvfp4-tp2-pp1 | vllm-nvfp4-tp4-pp1 | vllm-nvfp4-tp8-pp1 |

Note

This is a large model. Lower-TP profiles require substantially more GPU memory per device, so some verified GPUs support only TP4 or TP8 profiles.
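The note above can be checked with a rough back-of-envelope estimate (my own heuristic, not an official sizing formula): per-GPU weight memory is approximately the parameter count times bytes per parameter, divided by the TP degree, before accounting for KV cache and runtime overhead.

```python
# Approximate bytes per parameter for each precision; 4-bit formats
# carry some extra block-scale overhead not counted here.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nvfp4": 0.5, "mxfp4": 0.5}

def weight_gb_per_gpu(params_b: float, precision: str, tp: int) -> float:
    """Rough per-GPU weight memory in GB for a model with params_b
    billion parameters, sharded across tp tensor-parallel GPUs.
    Ignores KV cache and activations, so real requirements are higher."""
    return params_b * BYTES_PER_PARAM[precision] / tp

# A 120B-parameter model in BF16 at TP4 needs roughly 60 GB of weights
# per GPU, which already crowds an 80 GB part once cache is added.
print(round(weight_gb_per_gpu(120, "bf16", 4)))  # → 60
```

This is why the BF16 profiles for this model start at TP4 or TP8 on most GPUs, while the 4-bit NVFP4 profiles can run down to TP1 on the largest-memory parts.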

Verified GPUs

The following GPUs have been verified with one or more supported profiles for this model:

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B200

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GB200

  • NVIDIA-GH200-144G-HBM3e

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-L40S

  • NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

  • NVIDIA-RTX-PRO-6000-Blackwell-Server-Edition

starcoder2-7b#

Latest supported NIM LLM version: 2.0.1

The following table lists the supported profile configurations for bigcode/starcoder2-7b:

| Precision | TP1 | TP2 |
|---|---|---|
| BF16 | vllm-bf16-tp1-pp1 | vllm-bf16-tp2-pp1 |

Verified GPUs

This model has been verified on the following GPUs:

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H200

Model-Free NIM#

Latest supported NIM LLM version: 2.0.1

The following models have been tested and validated with nvidia/model-free-nim:

  • gpt-oss-20b

  • apriel-nemotron

  • codestral

While not explicitly validated, the model-free NIM can be used with any model supported by the underlying vLLM backend version. Refer to Model-Free NIM for deployment details.

Verified GPUs

The model-free NIM has been verified on the following GPUs:

  • NVIDIA-A100-80GB-PCIe

  • NVIDIA-A100-PCIE-40GB

  • NVIDIA-A100-SXM4-40GB

  • NVIDIA-A100-SXM4-80GB

  • NVIDIA-B300-SXM6-AC

  • NVIDIA-GH200-480GB

  • NVIDIA-H100-80GB-HBM3

  • NVIDIA-H100-NVL

  • NVIDIA-H100-PCIe

  • NVIDIA-H200

  • NVIDIA-H200-NVL

  • NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

GPU Compatibility#

Use the following lists to determine which models are supported on a given GPU:

NVIDIA-A10G

The following models have been verified on this GPU:

  • gpt-oss-20b

  • llama-3.1-70b-instruct

  • llama-3.3-70b-instruct

NVIDIA-GB10

The following models have been verified on this GPU:

  • gpt-oss-20b

  • llama-3.1-8b-instruct

  • llama-3.3-nemotron-super-49b-v1.5

  • nemotron-3-nano

NVIDIA-RTX-PRO-4500-Blackwell-Server-Edition

The following models have been verified on this GPU:

  • gpt-oss-20b

  • llama-3.1-8b-instruct

  • llama-3.3-nemotron-super-49b-v1.5

  • nemotron-3-nano

  • nemotron-3-super-120b-a12b

  • Model-Free NIM
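The per-GPU view above is simply the inverse of the per-model Verified GPUs lists. A minimal sketch of how such an index can be derived; the abbreviated mapping below is an illustrative excerpt, not the full matrix:

```python
from collections import defaultdict

# Abbreviated excerpt of the per-model verified-GPU lists on this page.
MODEL_GPUS = {
    "gpt-oss-20b": ["NVIDIA-A10G", "NVIDIA-GB10", "NVIDIA-H200"],
    "llama-3.1-70b-instruct": ["NVIDIA-A10G", "NVIDIA-H200"],
    "llama-3.1-8b-instruct": ["NVIDIA-GB10", "NVIDIA-H200"],
}

def gpu_index(model_gpus: dict) -> dict:
    """Invert a model -> verified-GPUs mapping into GPU -> sorted models."""
    index = defaultdict(list)
    for model, gpus in model_gpus.items():
        for gpu in gpus:
            index[gpu].append(model)
    return {gpu: sorted(models) for gpu, models in index.items()}

# Which models run on the A10G in this excerpt?
print(gpu_index(MODEL_GPUS)["NVIDIA-A10G"])
# → ['gpt-oss-20b', 'llama-3.1-70b-instruct']
```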

1.x NIM LLM Models#

For more information on version 1.x NIMs, refer to the 1.15 version of the NIM LLM Supported Models page.
