NVIDIA DGX SuperPOD: Release Notes 11.31.0
This document covers the latest DGX SuperPOD™ component versions, including the NVIDIA Base Command Manager (BCM) 11.31.0 and NVIDIA Run:ai 2.23 software releases, along with validated configurations for DGX SuperPOD systems.
Component Versions
The DGX SuperPOD component versions for this release are listed in the Latest Validated SuperPOD Component Matrix.
Note
The matrix below is supported on DGX SuperPOD systems with B300, B200, H200, and H100 GPUs.
| Software Package | Component | H100/H200 | B200 | B300 | Deployment Target |
|---|---|---|---|---|---|
| DGX OS (BCM) | DGX OS | 7.3.1 | 7.3.1 | 7.3.1 | DGX Nodes |
| DGX OS (BCM) | Ubuntu | 24.04 | 24.04 | 24.04 | DGX Nodes |
| DGX OS (BCM) | DGX Kernel | 6.8.0-87-generic | 6.8.0-87-generic | 6.8.0-87-generic | DGX Nodes |
| DGX OS (BCM) | DCGM | 4.4.2 | 4.4.2 | 4.4.2 | DGX Nodes |
| DGX OS (BCM) | GPU Driver | R580TRD4 (580.105.08) | R580TRD4 (580.105.08) | R580TRD4 (580.105.08) | DGX Nodes |
| DGX OS (BCM) | DOCA OFED | 3.1.0-091548 | 3.1.0-091548 | 3.1.0-091548 | DGX Nodes |
| DGX OS (BCM) | nvidia-container-toolkit | 1.18.10 | 1.18.10 | 1.18.10 | DGX Nodes |
| BCM | BCM ISO | 11.31.0 | 11.31.0 | 11.31.0 | Head Nodes |
| BCM | Slurm | 25.05 | 25.05 | 25.05 | Slurm Login, DGX Nodes |
| BCM | Enroot | 3.5.0 | 3.5.0 | 3.5.0 | Slurm Login, DGX Nodes |
| BCM | Kubernetes | 1.34 | 1.34 | 1.34 | Kubernetes, DGX Nodes |
| BCM | GPU Operator | 25.10 | 25.10 | 25.10 | Kubernetes, DGX Nodes |
| BCM | Network Operator | 25.10 | 25.10 | 25.10 | Kubernetes, DGX Nodes |
| BCM | MetalLB | 0.15.2 | 0.15.2 | 0.15.2 | Kubernetes, DGX Nodes |
| BCM | Calico | 3.30.2 | 3.30.2 | 3.30.2 | Kubernetes, DGX Nodes |
| BCM | MPI Operator | 0.6.0 | 0.6.0 | 0.6.0 | Kubernetes, DGX Nodes |
| Run:ai | Run:ai | 2.23 | 2.23 | 2.23 | Kubernetes, DGX Nodes |
| Compute FW | DGX FW | 25.12.2 | 25.12.1 | 25.11.2 | DGX Nodes |
| Quantum (IB) | IB MLNX-OS/NVOS | 3.12.6000 (MLNX-OS) | 3.12.6000 (MLNX-OS) | 25.02.6077 (NVOS) | InfiniBand Switch |
| NIC | ConnectX7/ConnectX8 | 28.47.1026 (CX7) | 28.47.1026 (CX7) | 40.46.3052 (CX8) | All Nodes |
| BlueField-3 | BlueField-3 (BF3) | 32.47.1026 | 32.47.1026 | 32.43.2402 | DGX Nodes |
| Spectrum (Eth) | Cumulus OS | 5.15 | 5.15 | 5.15 | Spectrum (Ethernet) Switches |
| UFM Appliance | UFM XDR AC | 2.3.1 | 2.3.1 | 2.3.1 | UFM Appliance |
| UFM Appliance | UFM Enterprise Appliance | 1.14.1 | 1.14.1 | 1.14.1 | UFM Appliance |
NVIDIA Mission Control Component Versions
The table below outlines the additional components included with the purchase of NVIDIA Mission Control Software (B200/B300 platforms only).
| Software Package | Component | B200 | B300 | Deployment Target |
|---|---|---|---|---|
| Autonomous Hardware Recovery | Shoreline | 28.4 | Planned | All Nodes |
| Autonomous Job Recovery | Heimdall | 1.5 | Planned | All Nodes |
| Observability Stack | NMC Grafana Visualizations | 27.1.0 | Planned | All Nodes |
| Observability Stack | Loki | 3.5.2 | 3.5.2 | All Nodes |
| Observability Stack | Prometheus (via BCM) | 3.5.0 | 3.5.0 | All Nodes |
| Observability Stack | Grafana (via BCM) | 12.0.2 | 12.0.2 | All Nodes |
| Observability Stack | Node Exporter (via BCM) | 1.9.1 | 1.9.1 | All Nodes |
| Observability Stack | DCGM Exporter | 4.2.3-4.1.1 | 4.2.3-4.1.1 | All Nodes |
| Observability Stack | Promtail | 3.5.1 | 3.5.1 | All Nodes |
| Run:ai | Run:ai | 2.23 | 2.23 | All Nodes |
| Domain Power Service | Domain Power Service | 0.7.8 (beta) | 0.7.8 (beta) | All Nodes |
Note
The tables above show the latest validated DGX SuperPOD component matrix. NVIDIA supports this validated stack. While newer versions of individual components may be publicly available, updating to those versions requires careful consideration of dependencies. Please note that the DGX SuperPOD team may not be able to validate component updates that fall outside this matrix.
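Because the validated stack is defined by exact component versions, it can help to compare what a node reports against the matrix before planning an update. The following is a minimal sketch, not an NVIDIA tool: `VALIDATED_B200` copies a few B200 values from the matrix above (the GPU driver is recorded by its numeric version only), while `find_drift` and `installed_example` are hypothetical names and data introduced for illustration. It uses exact string comparison, which is a simplification; a real check would need to handle patch-level suffixes.

```python
# A few validated versions for the B200 column, copied from the matrix above.
VALIDATED_B200 = {
    "DGX OS": "7.3.1",
    "DCGM": "4.4.2",
    "GPU Driver": "580.105.08",
    "nvidia-container-toolkit": "1.18.10",
    "Slurm": "25.05",
}


def find_drift(validated, installed):
    """Return {component: (expected, found)} for every component that differs.

    A component missing from `installed` is reported with found=None.
    """
    drift = {}
    for component, expected in validated.items():
        found = installed.get(component)
        if found != expected:
            drift[component] = (expected, found)
    return drift


# Hypothetical inventory for one node. In practice these values would be
# collected by whatever tooling the site already uses.
installed_example = {
    "DGX OS": "7.3.1",
    "DCGM": "4.4.2",
    "GPU Driver": "580.95.05",  # made-up value to illustrate a mismatch
    "nvidia-container-toolkit": "1.18.10",
}

for component, (expected, found) in find_drift(VALIDATED_B200, installed_example).items():
    print(f"{component}: validated {expected}, found {found}")
```

In this sketch a component absent from the inventory is reported rather than silently skipped, so an incomplete inventory is not mistaken for a compliant node.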