Support Matrix for Certified NIMs#

This page lists the supported models, their deployment profiles, and the verified hardware SKUs for NIM LLM Certified NIMs. For NIM Day 0, refer to Support Matrix for NIM Day 0.

Supported Models and Profiles#

Use the following sections to identify the supported deployment profiles for each model. Profile strings follow a naming convention described in Model Profiles and Selection.

The table below lists certified NIM profiles by model name, tensor parallelism (TP), precision, and LoRA support. Each row is one supported profile; details also appear in the per-model sections that follow.

Model	TP	Precision	LoRA
gpt-oss-120b	1	MXFP4	No
gpt-oss-120b	2	MXFP4	No
gpt-oss-120b	4	MXFP4	No
gpt-oss-120b	8	MXFP4	No
gpt-oss-120b	1	MXFP4	Yes
gpt-oss-120b	2	MXFP4	Yes
gpt-oss-120b	4	MXFP4	Yes
gpt-oss-120b	8	MXFP4	Yes
gpt-oss-20b	1	MXFP4	No
gpt-oss-20b	2	MXFP4	No
gpt-oss-20b	4	MXFP4	No
gpt-oss-20b	8	MXFP4	No
gpt-oss-20b	1	MXFP4	Yes
gpt-oss-20b	2	MXFP4	Yes
gpt-oss-20b	4	MXFP4	Yes
gpt-oss-20b	8	MXFP4	Yes
llama-3.1-70b-instruct	1	BF16	No
llama-3.1-70b-instruct	2	BF16	No
llama-3.1-70b-instruct	4	BF16	No
llama-3.1-70b-instruct	8	BF16	No
llama-3.1-70b-instruct	1	BF16	Yes
llama-3.1-70b-instruct	2	BF16	Yes
llama-3.1-70b-instruct	4	BF16	Yes
llama-3.1-70b-instruct	8	BF16	Yes
llama-3.1-70b-instruct	1	FP8	No
llama-3.1-70b-instruct	2	FP8	No
llama-3.1-70b-instruct	4	FP8	No
llama-3.1-70b-instruct	8	FP8	No
llama-3.1-70b-instruct	1	FP8	Yes
llama-3.1-70b-instruct	2	FP8	Yes
llama-3.1-70b-instruct	4	FP8	Yes
llama-3.1-70b-instruct	8	FP8	Yes
llama-3.1-70b-instruct	1	NVFP4	No
llama-3.1-70b-instruct	2	NVFP4	No
llama-3.1-70b-instruct	4	NVFP4	No
llama-3.1-70b-instruct	8	NVFP4	No
llama-3.1-70b-instruct	1	NVFP4	Yes
llama-3.1-70b-instruct	2	NVFP4	Yes
llama-3.1-70b-instruct	4	NVFP4	Yes
llama-3.1-70b-instruct	8	NVFP4	Yes
llama-3.1-8b-instruct	1	BF16	No
llama-3.1-8b-instruct	1	BF16	Yes
llama-3.1-8b-instruct	1	FP8	No
llama-3.1-8b-instruct	1	FP8	Yes
llama-3.1-8b-instruct	1	NVFP4	No
llama-3.1-8b-instruct	1	NVFP4	Yes
llama-3.3-70b-instruct	1	BF16	No
llama-3.3-70b-instruct	2	BF16	No
llama-3.3-70b-instruct	4	BF16	No
llama-3.3-70b-instruct	8	BF16	No
llama-3.3-70b-instruct	1	BF16	Yes
llama-3.3-70b-instruct	2	BF16	Yes
llama-3.3-70b-instruct	4	BF16	Yes
llama-3.3-70b-instruct	8	BF16	Yes
llama-3.3-70b-instruct	1	FP8	No
llama-3.3-70b-instruct	2	FP8	No
llama-3.3-70b-instruct	4	FP8	No
llama-3.3-70b-instruct	8	FP8	No
llama-3.3-70b-instruct	1	FP8	Yes
llama-3.3-70b-instruct	2	FP8	Yes
llama-3.3-70b-instruct	4	FP8	Yes
llama-3.3-70b-instruct	8	FP8	Yes
llama-3.3-70b-instruct	1	NVFP4	No
llama-3.3-70b-instruct	2	NVFP4	No
llama-3.3-70b-instruct	4	NVFP4	No
llama-3.3-70b-instruct	8	NVFP4	No
llama-3.3-70b-instruct	1	NVFP4	Yes
llama-3.3-70b-instruct	2	NVFP4	Yes
llama-3.3-70b-instruct	4	NVFP4	Yes
llama-3.3-70b-instruct	8	NVFP4	Yes
llama-3.3-nemotron-super-49b-v1.5	1	BF16	No
llama-3.3-nemotron-super-49b-v1.5	2	BF16	No
llama-3.3-nemotron-super-49b-v1.5	4	BF16	No
llama-3.3-nemotron-super-49b-v1.5	8	BF16	No
llama-3.3-nemotron-super-49b-v1.5	1	BF16	Yes
llama-3.3-nemotron-super-49b-v1.5	2	BF16	Yes
llama-3.3-nemotron-super-49b-v1.5	4	BF16	Yes
llama-3.3-nemotron-super-49b-v1.5	8	BF16	Yes
llama-3.3-nemotron-super-49b-v1.5	1	FP8	No
llama-3.3-nemotron-super-49b-v1.5	2	FP8	No
llama-3.3-nemotron-super-49b-v1.5	4	FP8	No
llama-3.3-nemotron-super-49b-v1.5	8	FP8	No
llama-3.3-nemotron-super-49b-v1.5	1	FP8	Yes
llama-3.3-nemotron-super-49b-v1.5	2	FP8	Yes
llama-3.3-nemotron-super-49b-v1.5	4	FP8	Yes
llama-3.3-nemotron-super-49b-v1.5	8	FP8	Yes
llama-3.3-nemotron-super-49b-v1.5	1	NVFP4	No
llama-3.3-nemotron-super-49b-v1.5	2	NVFP4	No
llama-3.3-nemotron-super-49b-v1.5	4	NVFP4	No
llama-3.3-nemotron-super-49b-v1.5	8	NVFP4	No
llama-3.3-nemotron-super-49b-v1.5	1	NVFP4	Yes
llama-3.3-nemotron-super-49b-v1.5	2	NVFP4	Yes
llama-3.3-nemotron-super-49b-v1.5	4	NVFP4	Yes
llama-3.3-nemotron-super-49b-v1.5	8	NVFP4	Yes
nemotron-3-nano	1	BF16	No
nemotron-3-nano	2	BF16	No
nemotron-3-nano	4	BF16	No
nemotron-3-nano	8	BF16	No
nemotron-3-nano	1	BF16	Yes
nemotron-3-nano	2	BF16	Yes
nemotron-3-nano	4	BF16	Yes
nemotron-3-nano	8	BF16	Yes
nemotron-3-nano	1	FP8	No
nemotron-3-nano	2	FP8	No
nemotron-3-nano	4	FP8	No
nemotron-3-nano	8	FP8	No
nemotron-3-nano	1	NVFP4	No
nemotron-3-nano	2	NVFP4	No
nemotron-3-nano	4	NVFP4	No
nemotron-3-nano	8	NVFP4	No
nemotron-3-nano	1	NVFP4	Yes
nemotron-3-super-120b-a12b	1	BF16	No
nemotron-3-super-120b-a12b	2	BF16	No
nemotron-3-super-120b-a12b	4	BF16	No
nemotron-3-super-120b-a12b	8	BF16	No
nemotron-3-super-120b-a12b	1	BF16	Yes
nemotron-3-super-120b-a12b	2	BF16	Yes
nemotron-3-super-120b-a12b	4	BF16	Yes
nemotron-3-super-120b-a12b	8	BF16	Yes
nemotron-3-super-120b-a12b	1	FP8	No
nemotron-3-super-120b-a12b	2	FP8	No
nemotron-3-super-120b-a12b	4	FP8	No
nemotron-3-super-120b-a12b	8	FP8	No
nemotron-3-super-120b-a12b	1	NVFP4	No
nemotron-3-super-120b-a12b	2	NVFP4	No
nemotron-3-super-120b-a12b	4	NVFP4	No
nemotron-3-super-120b-a12b	8	NVFP4	No
nemotron-3-super-120b-a12b	1	NVFP4	Yes
nemotron-3-super-120b-a12b	2	NVFP4	Yes
starcoder2-7b	1	BF16	No
starcoder2-7b	2	BF16	No
A configuration that does not appear in the table has no matching certified profile. It may still be deployable with memory tuning; refer to the memory troubleshooting guide for details. Some GPUs without a matching certified profile are verified for Model-Free NIM deployment.
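The table above can also be filtered programmatically. The following is a minimal sketch, assuming the rows have been copied into a tab-separated string with the same column order; only a hand-copied subset of rows is included here, and `load_rows` and `select` are hypothetical helper names, not part of NIM.

```python
import csv
import io

# Hand-copied subset of the table above (tab-separated, same column order).
TABLE = (
    "Model\tTP\tPrecision\tLoRA\n"
    "nemotron-3-nano\t1\tFP8\tNo\n"
    "nemotron-3-nano\t1\tNVFP4\tNo\n"
    "nemotron-3-nano\t2\tNVFP4\tNo\n"
    "nemotron-3-nano\t1\tNVFP4\tYes\n"
    "nemotron-3-nano\t1\tBF16\tYes\n"
    "starcoder2-7b\t1\tBF16\tNo\n"
    "starcoder2-7b\t2\tBF16\tNo\n"
)


def load_rows(text):
    """Parse the tab-separated table into dicts with typed TP and LoRA fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "model": rec["Model"],
            "tp": int(rec["TP"]),
            "precision": rec["Precision"],
            "lora": rec["LoRA"] == "Yes",
        })
    return rows


def select(rows, model=None, tp=None, precision=None, lora_only=False):
    """Return rows matching every filter that is set; None means 'any'."""
    return [
        r for r in rows
        if (model is None or r["model"] == model)
        and (tp is None or r["tp"] == tp)
        and (precision is None or r["precision"] == precision)
        and (not lora_only or r["lora"])
    ]


ROWS = load_rows(TABLE)
matches = select(ROWS, model="nemotron-3-nano", precision="NVFP4", lora_only=True)
print(matches)
```

An empty result from `select` corresponds to a combination with no certified profile in the rows that were loaded.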

gpt-oss-120b#

The following table lists the supported profile configurations for openai/gpt-oss-120b:

Precision	TP1	TP2	TP4	TP8
MXFP4	`vllm-mxfp4-tp1-pp1`	`vllm-mxfp4-tp2-pp1`	`vllm-mxfp4-tp4-pp1`	`vllm-mxfp4-tp8-pp1`
MXFP4 + LoRA	`vllm-mxfp4-tp1-pp1-lora`	`vllm-mxfp4-tp2-pp1-lora`	`vllm-mxfp4-tp4-pp1-lora`	`vllm-mxfp4-tp8-pp1-lora`
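
Judging only from the names in the table above, a profile string appears to consist of a backend, a precision, a `tp` degree, a `pp` degree, and an optional `-lora` suffix. The sketch below parses names on that assumption. Reading `pp` as pipeline parallelism is an assumption, `parse_profile` is a hypothetical helper rather than a NIM API, and the authoritative convention is the one described in Model Profiles and Selection.

```python
import re
from typing import NamedTuple


class ProfileName(NamedTuple):
    backend: str     # e.g. "vllm"
    precision: str   # e.g. "mxfp4", "bf16", "fp8", "nvfp4"
    tp: int          # tensor-parallel degree
    pp: int          # assumed to be the pipeline-parallel degree
    lora: bool       # True when the name ends in "-lora"


# Pattern inferred from the names listed on this page, not from a formal spec.
_PATTERN = re.compile(
    r"^(?P<backend>[a-z0-9]+)-(?P<precision>[a-z0-9]+)"
    r"-tp(?P<tp>\d+)-pp(?P<pp>\d+)(?P<lora>-lora)?$"
)


def parse_profile(name: str) -> ProfileName:
    """Split a profile name into its fields, or raise ValueError if it does not fit."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized profile name: {name!r}")
    return ProfileName(
        backend=m["backend"],
        precision=m["precision"],
        tp=int(m["tp"]),
        pp=int(m["pp"]),
        lora=m["lora"] is not None,
    )


print(parse_profile("vllm-mxfp4-tp8-pp1-lora"))
```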

gpt-oss-20b#

The following table lists the supported profile configurations for openai/gpt-oss-20b:

Precision	TP1	TP2	TP4	TP8
MXFP4	`vllm-mxfp4-tp1-pp1`	`vllm-mxfp4-tp2-pp1`	`vllm-mxfp4-tp4-pp1`	`vllm-mxfp4-tp8-pp1`
MXFP4 + LoRA	`vllm-mxfp4-tp1-pp1-lora`	`vllm-mxfp4-tp2-pp1-lora`	`vllm-mxfp4-tp4-pp1-lora`	`vllm-mxfp4-tp8-pp1-lora`

llama-3.1-70b-instruct#

The following table lists the supported profile configurations for meta/llama-3.1-70b-instruct:

Precision	TP1	TP2	TP4	TP8
BF16	`vllm-bf16-tp1-pp1`	`vllm-bf16-tp2-pp1`	`vllm-bf16-tp4-pp1`	`vllm-bf16-tp8-pp1`
BF16 + LoRA	`vllm-bf16-tp1-pp1-lora`	`vllm-bf16-tp2-pp1-lora`	`vllm-bf16-tp4-pp1-lora`	`vllm-bf16-tp8-pp1-lora`
FP8	`vllm-fp8-tp1-pp1`	`vllm-fp8-tp2-pp1`	`vllm-fp8-tp4-pp1`	`vllm-fp8-tp8-pp1`
FP8 + LoRA	`vllm-fp8-tp1-pp1-lora`	`vllm-fp8-tp2-pp1-lora`	`vllm-fp8-tp4-pp1-lora`	`vllm-fp8-tp8-pp1-lora`
NVFP4	`vllm-nvfp4-tp1-pp1`	`vllm-nvfp4-tp2-pp1`	`vllm-nvfp4-tp4-pp1`	`vllm-nvfp4-tp8-pp1`
NVFP4 + LoRA	`vllm-nvfp4-tp1-pp1-lora`	`vllm-nvfp4-tp2-pp1-lora`	`vllm-nvfp4-tp4-pp1-lora`	`vllm-nvfp4-tp8-pp1-lora`

llama-3.1-8b-instruct#

The following table lists the supported profile configurations for meta/llama-3.1-8b-instruct:

Precision	TP1
BF16	`vllm-bf16-tp1-pp1`
BF16 + LoRA	`vllm-bf16-tp1-pp1-lora`
FP8	`vllm-fp8-tp1-pp1`
FP8 + LoRA	`vllm-fp8-tp1-pp1-lora`
NVFP4	`vllm-nvfp4-tp1-pp1`
NVFP4 + LoRA	`vllm-nvfp4-tp1-pp1-lora`

llama-3.3-70b-instruct#

The following table lists the supported profile configurations for meta/llama-3.3-70b-instruct:

Precision	TP1	TP2	TP4	TP8
BF16	`vllm-bf16-tp1-pp1`	`vllm-bf16-tp2-pp1`	`vllm-bf16-tp4-pp1`	`vllm-bf16-tp8-pp1`
BF16 + LoRA	`vllm-bf16-tp1-pp1-lora`	`vllm-bf16-tp2-pp1-lora`	`vllm-bf16-tp4-pp1-lora`	`vllm-bf16-tp8-pp1-lora`
FP8	`vllm-fp8-tp1-pp1`	`vllm-fp8-tp2-pp1`	`vllm-fp8-tp4-pp1`	`vllm-fp8-tp8-pp1`
FP8 + LoRA	`vllm-fp8-tp1-pp1-lora`	`vllm-fp8-tp2-pp1-lora`	`vllm-fp8-tp4-pp1-lora`	`vllm-fp8-tp8-pp1-lora`
NVFP4	`vllm-nvfp4-tp1-pp1`	`vllm-nvfp4-tp2-pp1`	`vllm-nvfp4-tp4-pp1`	`vllm-nvfp4-tp8-pp1`
NVFP4 + LoRA	`vllm-nvfp4-tp1-pp1-lora`	`vllm-nvfp4-tp2-pp1-lora`	`vllm-nvfp4-tp4-pp1-lora`	`vllm-nvfp4-tp8-pp1-lora`

llama-3.3-nemotron-super-49b-v1.5#

The following table lists the supported profile configurations for nvidia/llama-3.3-nemotron-super-49b-v1.5:

Precision	TP1	TP2	TP4	TP8
BF16	`vllm-bf16-tp1-pp1`	`vllm-bf16-tp2-pp1`	`vllm-bf16-tp4-pp1`	`vllm-bf16-tp8-pp1`
BF16 + LoRA	`vllm-bf16-tp1-pp1-lora`	`vllm-bf16-tp2-pp1-lora`	`vllm-bf16-tp4-pp1-lora`	`vllm-bf16-tp8-pp1-lora`
FP8	`vllm-fp8-tp1-pp1`	`vllm-fp8-tp2-pp1`	`vllm-fp8-tp4-pp1`	`vllm-fp8-tp8-pp1`
FP8 + LoRA	`vllm-fp8-tp1-pp1-lora`	`vllm-fp8-tp2-pp1-lora`	`vllm-fp8-tp4-pp1-lora`	`vllm-fp8-tp8-pp1-lora`
NVFP4	`vllm-nvfp4-tp1-pp1`	`vllm-nvfp4-tp2-pp1`	`vllm-nvfp4-tp4-pp1`	`vllm-nvfp4-tp8-pp1`
NVFP4 + LoRA	`vllm-nvfp4-tp1-pp1-lora`	`vllm-nvfp4-tp2-pp1-lora`	`vllm-nvfp4-tp4-pp1-lora`	`vllm-nvfp4-tp8-pp1-lora`

nemotron-3-nano#

The following table lists the supported profile configurations for nvidia/nemotron-3-nano:

Precision	TP1	TP2	TP4	TP8
BF16	`vllm-bf16-tp1-pp1`	`vllm-bf16-tp2-pp1`	`vllm-bf16-tp4-pp1`	`vllm-bf16-tp8-pp1`
BF16 + LoRA	`vllm-bf16-tp1-pp1-lora`	`vllm-bf16-tp2-pp1-lora`	`vllm-bf16-tp4-pp1-lora`	`vllm-bf16-tp8-pp1-lora`
FP8	`vllm-fp8-tp1-pp1`	`vllm-fp8-tp2-pp1`	`vllm-fp8-tp4-pp1`	`vllm-fp8-tp8-pp1`
NVFP4	`vllm-nvfp4-tp1-pp1`	`vllm-nvfp4-tp2-pp1`	`vllm-nvfp4-tp4-pp1`	`vllm-nvfp4-tp8-pp1`
NVFP4 + LoRA	`vllm-nvfp4-tp1-pp1-lora`	–	–	–

nemotron-3-super-120b-a12b#

The following table lists the supported profile configurations for nvidia/nemotron-3-super-120b-a12b:

Precision	TP1	TP2	TP4	TP8
BF16	`vllm-bf16-tp1-pp1`	`vllm-bf16-tp2-pp1`	`vllm-bf16-tp4-pp1`	`vllm-bf16-tp8-pp1`
BF16 + LoRA	`vllm-bf16-tp1-pp1-lora`	`vllm-bf16-tp2-pp1-lora`	`vllm-bf16-tp4-pp1-lora`	`vllm-bf16-tp8-pp1-lora`
FP8	`vllm-fp8-tp1-pp1`	`vllm-fp8-tp2-pp1`	`vllm-fp8-tp4-pp1`	`vllm-fp8-tp8-pp1`
NVFP4	`vllm-nvfp4-tp1-pp1`	`vllm-nvfp4-tp2-pp1`	`vllm-nvfp4-tp4-pp1`	`vllm-nvfp4-tp8-pp1`
NVFP4 + LoRA	`vllm-nvfp4-tp1-pp1-lora`	`vllm-nvfp4-tp2-pp1-lora`	–	–

Note

This is a large model. Lower-TP profiles require substantially more GPU memory per device, so some verified GPUs support only TP4 or TP8 profiles.
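
To make the memory point in the note concrete, the sketch below gives a rough, weights-only estimate of memory per GPU. It rests on several assumptions: a total of 120 billion parameters (read from the model name), 2, 1, and 0.5 bytes per parameter for BF16, FP8, and NVFP4 respectively (ignoring quantization scale factors), and weights that shard evenly across TP ranks. KV cache, activations, and runtime overhead are not counted, so real requirements are higher.

```python
# Assumed bytes per parameter; the NVFP4 figure ignores per-block scale factors.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}

# Assumption: total parameter count taken from the "120b" in the model name.
PARAMS_BILLION = 120


def weight_gb_per_gpu(params_billion: float, precision: str, tp: int) -> float:
    """Approximate weight memory per GPU in GB (10**9 bytes), weights only."""
    return params_billion * BYTES_PER_PARAM[precision] / tp


for precision in BYTES_PER_PARAM:
    row = ", ".join(
        f"TP{tp}: {weight_gb_per_gpu(PARAMS_BILLION, precision, tp):.1f} GB"
        for tp in (1, 2, 4, 8)
    )
    print(f"{precision:>5}  {row}")
```

Under these assumptions, BF16 weights come to about 240 GB on a single GPU at TP1 and about 30 GB per GPU at TP8, which is consistent with some GPUs being able to host only the higher-TP profiles.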

starcoder2-7b#

The following table lists the supported profile configurations for bigcode/starcoder2-7b:

Precision	TP1	TP2
BF16	`vllm-bf16-tp1-pp1`	`vllm-bf16-tp2-pp1`

Model-Free NIM#

The following models are tested and validated for nvidia/model-free-nim:

gpt-oss-20b
apriel-nemotron
codestral

Other models are not explicitly validated, but the model-free NIM can be used with any model supported by the underlying backend (vLLM) version. Refer to Model-Free NIM for deployment details.

1.x NIM LLM Models#

For more information on version 1.x NIMs, refer to the 1.15 version of the NIM LLM Supported Models page.

Model (Hardware Requirements)	Organization/Model ID (Catalog Page)
DeepSeek-V3.1-Terminus	`deepseek-ai/deepseek-v3.1-terminus`
DeepSeek-V3.2-Exp	`deepseek-ai/deepseek-v32-exp-nim`
GLM-5	`zai-org/glm-5`
GLM-5.1	`zai-org/glm-51`
Llama-3.1-Nemotron-Nano-8B-Healthcare-Text2sql-v1.0	`nvidia/llama-3.1-nemotron-nano-8b-healthcare-text2sql-v1.0`
Llama-3.3-Nemotron-Super-49B-Healthcare-Text2sql-v1.0	`nvidia/llama-3.3-nemotron-super-49b-healthcare-text2sql-v1.0`
MiniMax-M2.5	`minimax-ai/minimax-m25`
NVIDIA-Nemotron-Nano-9B-v2-DGX-Spark	`nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark`
Nemotron-3-Super-120B-A12B	`nvidia/nemotron-3-super-120b-a12b`
Qwen3-Coder-Next	`qwen/qwen3-coder-next`
Qwen3-Next-80B-A3B-Instruct	`qwen/qwen3-next-80b-a3b-instruct`
Qwen3 Next 80B A3B Thinking	`qwen/qwen3-next-80b-a3b-thinking`
Qwen3-32B	`qwen/qwen3-32b`
Qwen3-32B NIM for DGX Spark	`qwen/qwen3-32b-dgx-spark`
Riva-Translate-4b-Instruct-v1.1	`nvidia/riva-translate-4b-instruct-v1.1`