Support Matrix for NIM Day 0

This page lists the supported models and GPUs for NIM Day 0.

nemotron-content-safety-reasoning-4b-experimental

Latest supported NIM LLM version: 2.0.3

For information on getting started with nemotron-content-safety-reasoning-4b-experimental, refer to version 2.0.3 of the Day 0 documentation.

nemotron-3-ultra-550b-a55b

Latest supported NIM LLM version: 2.0.5-variant

The following table lists the supported configurations for nvidia/nemotron-3-ultra-550b-a55b:

The GPU Memory values are per GPU and in GB.

| GPU | GPU Memory (GB) | Precision | TP | PP | Number of GPUs | LoRA Profile |
|---|---|---|---|---|---|---|
| NVIDIA-H100-80GB-HBM3 | 80 | BF16 | 8 | 2 | 16 (multi-node) | Base and LoRA |
| NVIDIA-H100-80GB-HBM3 | 80 | NVFP4 | 8 | 1 | 8 | Base only |
| NVIDIA-H200 | 141 | BF16 | 8 | 1 | 8 | Base only |
| NVIDIA-H200 | 141 | NVFP4 | 4 | 1 | 4 | Base only |
| NVIDIA-H200 | 141 | NVFP4 | 8 | 1 | 8 | Base only |
| NVIDIA-B200 | 180 | BF16 | 8 | 1 | 8 | Base and LoRA |
| NVIDIA-B200 | 180 | NVFP4 | 2 | 1 | 2 | Base only |
| NVIDIA-B200 | 180 | NVFP4 | 4 | 1 | 4 | Base only |
| NVIDIA-B200 | 180 | NVFP4 | 8 | 1 | 8 | Base only |
| NVIDIA-B300-SXM6-AC | 288 | BF16 | 4 | 1 | 4 | Base only |
| NVIDIA-B300-SXM6-AC | 288 | BF16 | 8 | 1 | 8 | Base and LoRA |
| NVIDIA-B300-SXM6-AC | 288 | NVFP4 | 2 | 1 | 2 | Base only |
| NVIDIA-B300-SXM6-AC | 288 | NVFP4 | 4 | 1 | 4 | Base only |
| NVIDIA-B300-SXM6-AC | 288 | NVFP4 | 8 | 1 | 8 | Base only |
| NVIDIA-GB200 | 192 | BF16 | 4 | 2 | 8 (multi-node) | Base and LoRA |
| NVIDIA-GB200 | 192 | NVFP4 | 2 | 1 | 2 | Base only |
| NVIDIA-GB200 | 192 | NVFP4 | 4 | 1 | 4 | Base only |
| NVIDIA-GB300 | 288 | BF16 | 4 | 1 | 4 | Base only |
| NVIDIA-GB300 | 288 | NVFP4 | 2 | 1 | 2 | Base only |
| NVIDIA-GB300 | 288 | NVFP4 | 4 | 1 | 4 | Base only |
| Any | >= 270 | BF16 | 4 | 1 | 4 | Base only |
| Any | >= 139 | BF16 | 8 | 1 | 8 | Base and LoRA |
| Any | >= 172 | NVFP4 | 2 | 1 | 2 | Base only |
| Any | >= 90 | NVFP4 | 4 | 1 | 4 | Base only |
| Any | >= 49 | NVFP4 | 8 | 1 | 8 | Base only |

LoRA profiles are available only for rows marked Base and LoRA.
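
As an illustration of reading the matrix programmatically, the following Python sketch encodes the NVIDIA-H200 and NVIDIA-B200 rows from the table above and filters them by GPU and LoRA support. The `Profile` type and the `find_profiles` helper are hypothetical names introduced for this example; they are not part of NIM.

```python
from typing import List, NamedTuple


class Profile(NamedTuple):
    gpu: str
    precision: str
    tp: int
    pp: int
    lora: bool  # True for "Base and LoRA", False for "Base only"


# Rows copied from the support matrix (NVIDIA-H200 and NVIDIA-B200 only).
PROFILES = [
    Profile("NVIDIA-H200", "BF16", 8, 1, False),
    Profile("NVIDIA-H200", "NVFP4", 4, 1, False),
    Profile("NVIDIA-H200", "NVFP4", 8, 1, False),
    Profile("NVIDIA-B200", "BF16", 8, 1, True),
    Profile("NVIDIA-B200", "NVFP4", 2, 1, False),
    Profile("NVIDIA-B200", "NVFP4", 4, 1, False),
    Profile("NVIDIA-B200", "NVFP4", 8, 1, False),
]


def find_profiles(gpu: str, need_lora: bool = False) -> List[Profile]:
    """Return the listed profiles for a GPU, optionally only LoRA-capable ones."""
    return [p for p in PROFILES if p.gpu == gpu and (p.lora or not need_lora)]


if __name__ == "__main__":
    for profile in find_profiles("NVIDIA-B200", need_lora=True):
        print(profile)
```

With these rows, `find_profiles("NVIDIA-B200", need_lora=True)` returns only the BF16 TP=8 profile, and the same query for NVIDIA-H200 returns an empty list.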

Profile Topologies

GPU-specific profiles use the following tensor-parallel (TP) and pipeline-parallel (PP) values:

  • NVIDIA-H100-80GB-HBM3: BF16 TP=8, PP=2 across two nodes (16 GPUs total), which requires inter-node InfiniBand or RoCE networking; NVFP4 TP=8, PP=1 on a single node (8 GPUs).

  • NVIDIA-H200: BF16 TP=8, PP=1 on a single node (8 GPUs); NVFP4 TP=4 or TP=8, PP=1 on a single node.

  • NVIDIA-B200: BF16 TP=8, PP=1 on a single node (8 GPUs); NVFP4 TP=2, TP=4, or TP=8, PP=1 on a single node.

  • NVIDIA-B300-SXM6-AC: BF16 TP=4 or TP=8, PP=1 on a single node; NVFP4 TP=2, TP=4, or TP=8, PP=1 on a single node.

  • NVIDIA-GB200: BF16 TP=4, PP=2 across two NVL trays of four GPUs (8 GPUs total), which requires inter-node InfiniBand or RoCE networking; NVFP4 TP=2 or TP=4, PP=1 on a single node.

  • NVIDIA-GB300: BF16 TP=4, PP=1 on a single node; NVFP4 TP=2 or TP=4, PP=1 on a single node.
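
In every row of the support matrix, the Number of GPUs value equals TP multiplied by PP. The short Python sketch below checks that arithmetic for the two multi-node profiles listed above. The `gpus_required` helper is a hypothetical name used only for illustration.

```python
def gpus_required(tp: int, pp: int) -> int:
    """Total GPUs for a profile: TP GPUs in each of PP pipeline stages."""
    return tp * pp


# (GPU, precision, TP, PP, GPU count listed in the support matrix)
MULTI_NODE_PROFILES = [
    ("NVIDIA-H100-80GB-HBM3", "BF16", 8, 2, 16),
    ("NVIDIA-GB200", "BF16", 4, 2, 8),
]

for gpu, precision, tp, pp, listed in MULTI_NODE_PROFILES:
    # The computed count should match the count published in the table.
    assert gpus_required(tp, pp) == listed
    print(f"{gpu} {precision}: TP={tp} x PP={pp} = {gpus_required(tp, pp)} GPUs")
```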

The Any rows are GPU-memory-gated profiles that are available when each GPU satisfies the listed min_vram_per_device_gb gate:

  • BF16 TP=4, PP=1 requires at least 270 GB per GPU.

  • BF16 TP=8, PP=1 requires at least 139 GB per GPU.

  • NVFP4 TP=2, PP=1 requires at least 172 GB per GPU.

  • NVFP4 TP=4, PP=1 requires at least 90 GB per GPU.

  • NVFP4 TP=8, PP=1 requires at least 49 GB per GPU.
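
The memory gates above can be expressed as a simple threshold check. The following Python sketch is illustrative only and does not reproduce NIM's actual profile selection logic. The `eligible_memory_gated` function is a hypothetical helper, and it assumes that a profile is usable only when the host has at least TP x PP GPUs.

```python
from typing import List, Tuple

# (precision, TP, PP, min_vram_per_device_gb) copied from the list above
MEMORY_GATED = [
    ("BF16", 4, 1, 270),
    ("BF16", 8, 1, 139),
    ("NVFP4", 2, 1, 172),
    ("NVFP4", 4, 1, 90),
    ("NVFP4", 8, 1, 49),
]


def eligible_memory_gated(vram_per_gpu_gb: float, num_gpus: int) -> List[Tuple[str, int, int]]:
    """Return (precision, TP, PP) for each memory-gated profile whose gate is met.

    Assumption for this sketch: the host must also have at least TP * PP GPUs.
    """
    return [
        (precision, tp, pp)
        for precision, tp, pp, min_gb in MEMORY_GATED
        if vram_per_gpu_gb >= min_gb and tp * pp <= num_gpus
    ]


if __name__ == "__main__":
    print(eligible_memory_gated(vram_per_gpu_gb=80, num_gpus=8))
    print(eligible_memory_gated(vram_per_gpu_gb=141, num_gpus=4))
```

Under these assumptions, eight GPUs with 80 GB each satisfy only the NVFP4 TP=8 gate, and four GPUs with 141 GB each satisfy only the NVFP4 TP=4 gate.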

For more information about deploying NIM across multiple nodes, refer to Multi-Node Deployment.