NVIDIA DGX SuperPOD: Release Notes 11.31.0

This document covers the latest DGX SuperPOD™ component versions, including the NVIDIA Base Command Manager (BCM) 11.31.0 and NVIDIA Run:ai 2.23 software releases, along with validated configurations for DGX SuperPOD systems.

Component Versions

DGX SuperPOD component versions for this release are listed in Table 1, Latest Validated SuperPOD Component Matrix.

Note

The matrix below is supported on DGX SuperPOD systems with B300, B200, H200, and H100 GPUs.

Table 1 Latest Validated SuperPOD Component Matrix

| Software Package | Component | H100/H200 | B200 | B300 | Deployment Target |
|---|---|---|---|---|---|
| DGX OS (BCM) | DGX OS | 7.3.1 | 7.3.1 | 7.3.1 | DGX Nodes |
| DGX OS (BCM) | Ubuntu | 24.04 | 24.04 | 24.04 | DGX Nodes |
| DGX OS (BCM) | DGX Kernel | 6.8.0-87-generic | 6.8.0-87-generic | 6.8.0-87-generic | DGX Nodes |
| DGX OS (BCM) | DCGM | 4.4.2 | 4.4.2 | 4.4.2 | DGX Nodes |
| DGX OS (BCM) | GPU Driver | R580TRD4 (580.105.08) | R580TRD4 (580.105.08) | R580TRD4 (580.105.08) | DGX Nodes |
| DGX OS (BCM) | DOCA OFED | 3.1.0-091548 | 3.1.0-091548 | 3.1.0-091548 | DGX Nodes |
| DGX OS (BCM) | nvidia-container-toolkit | 1.18.10 | 1.18.10 | 1.18.10 | DGX Nodes |
| BCM | BCM ISO | 11.31.0 | 11.31.0 | 11.31.0 | Head Nodes |
| BCM | Slurm | 25.05 | 25.05 | 25.05 | Slurm Login, DGX Nodes |
| BCM | Enroot | 3.5.0 | 3.5.0 | 3.5.0 | Slurm Login, DGX Nodes |
| BCM | Kubernetes | 1.34 | 1.34 | 1.34 | Kubernetes, DGX Nodes |
| BCM | GPU Operator | 25.10 | 25.10 | 25.10 | Kubernetes, DGX Nodes |
| BCM | Network Operator | 25.10 | 25.10 | 25.10 | Kubernetes, DGX Nodes |
| BCM | MetalLB | 0.15.2 | 0.15.2 | 0.15.2 | Kubernetes, DGX Nodes |
| BCM | Calico | 3.30.2 | 3.30.2 | 3.30.2 | Kubernetes, DGX Nodes |
| BCM | MPI Operator | 0.6.0 | 0.6.0 | 0.6.0 | Kubernetes, DGX Nodes |
| Run:ai | Run:ai | 2.23 | 2.23 | 2.23 | Kubernetes, DGX Nodes |
| Compute FW | DGX FW | 25.12.2 | 25.12.1 | 25.11.2 | DGX Nodes |
| Quantum (IB) | IB MLNX-OS/NVOS | 3.12.6000 (MLNX-OS) | 3.12.6000 (MLNX-OS) | 25.02.6077 (NVOS) | InfiniBand Switch |
| NIC | ConnectX7/ConnectX8 | 28.47.1026 (CX7) | 28.47.1026 (CX7) | 40.46.3052 (CX8) | All Nodes |
| BlueField-3 | BlueField-3 (BF3) | 32.47.1026 | 32.47.1026 | 32.43.2402 | DGX Nodes |
| Spectrum (Eth) | Cumulus OS | 5.15 | 5.15 | 5.15 | Spectrum (Ethernet) Switches |
| UFM Appliance | UFM XDR AC | 2.3.1 | 2.3.1 | 2.3.1 | UFM Appliance |
| UFM Appliance | UFM Enterprise Appliance | 1.14.1 | 1.14.1 | 1.14.1 | UFM Appliance |

NVIDIA Mission Control Component Versions

The table below outlines the additional components included with the purchase of NVIDIA Mission Control Software (B200/B300 platforms only).

Table 2 Components Included with the NVIDIA Mission Control Software 2.2 Release

| Software Package | Component | B200 | B300 | Deployment Target |
|---|---|---|---|---|
| Autonomous Hardware Recovery | Shoreline | 28.4 | Planned | All Nodes |
| Autonomous Job Recovery | Heimdall | 1.5 | Planned | All Nodes |
| Observability Stack | NMC Grafana Visualizations | 27.1.0 | Planned | All Nodes |
| Observability Stack | Loki | 3.5.2 | 3.5.2 | All Nodes |
| Observability Stack | Prometheus (via BCM) | 3.5.0 | 3.5.0 | All Nodes |
| Observability Stack | Grafana (via BCM) | 12.0.2 | 12.0.2 | All Nodes |
| Observability Stack | Node Exporter (via BCM) | 1.9.1 | 1.9.1 | All Nodes |
| Observability Stack | DCGM Exporter | 4.2.3-4.1.1 | 4.2.3-4.1.1 | All Nodes |
| Observability Stack | Promtail | 3.5.1 | 3.5.1 | All Nodes |
| Run:ai | Run:ai | 2.23 | 2.23 | All Nodes |
| Domain Power Service | Domain Power Service | 0.7.8 (beta) | 0.7.8 (beta) | All Nodes |

Note

The tables above show the latest validated DGX SuperPOD component matrix, which is the stack NVIDIA supports. Newer versions of individual components may be publicly available, but updating to them requires careful consideration of dependencies, and the DGX SuperPOD team may not be able to validate component updates that fall outside the matrix.
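As an illustration of how a site might spot drift from the validated stack, the following is a minimal Python sketch that compares an inventory of installed versions against a subset of Table 1. It is not an NVIDIA tool: the function names `validated_versions` and `find_mismatches`, the dictionary layout, and the sample inventory are all hypothetical, and how the installed versions are collected from a node is left out entirely.

```python
"""Sketch: flag components whose version differs from the validated matrix."""

# Subset of Table 1: components with the same version on every platform.
VALIDATED_COMMON = {
    "DGX OS": "7.3.1",
    "Ubuntu": "24.04",
    "DGX Kernel": "6.8.0-87-generic",
    "DCGM": "4.4.2",
    "DOCA OFED": "3.1.0-091548",
    "Slurm": "25.05",
    "Enroot": "3.5.0",
    "Kubernetes": "1.34",
}

# DGX firmware is the Table 1 entry used here that differs per platform.
VALIDATED_DGX_FW = {
    "H100/H200": "25.12.2",
    "B200": "25.12.1",
    "B300": "25.11.2",
}


def validated_versions(platform):
    """Return the validated versions (subset of Table 1) for one platform."""
    if platform not in VALIDATED_DGX_FW:
        raise ValueError(f"unknown platform: {platform!r}")
    versions = dict(VALIDATED_COMMON)
    versions["DGX FW"] = VALIDATED_DGX_FW[platform]
    return versions


def find_mismatches(platform, observed):
    """Map each differing component to an (observed, validated) pair.

    Components that are absent from `observed`, or that this sketch does
    not track, are ignored. Versions are compared as exact strings.
    """
    expected = validated_versions(platform)
    return {
        name: (observed[name], expected[name])
        for name in observed
        if name in expected and observed[name] != expected[name]
    }


if __name__ == "__main__":
    # Hypothetical inventory for a B200 node; real values would come from
    # the site's own tooling.
    observed = {"DGX OS": "7.3.1", "DCGM": "4.3.0", "DGX FW": "25.12.1"}
    for name, (got, want) in find_mismatches("B200", observed).items():
        print(f"{name}: found {got}, validated {want}")
```

Because the comparison is an exact string match, an inventory that reports a patch-level value such as a three-part Kubernetes version would be flagged against the two-part value in the matrix. A real check would need to decide how strictly each component should match.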

Reference Product Documentation