Scope

This document lists recommended software package updates for DGX BasePOD and SuperPOD and the steps to apply them. It does not cover a cluster upgrade (that is, a BCM major version change).

This document (June 2025 version) provides update recommendations for DGX H100/H200 BasePOD and SuperPOD customers using BCM version 10. It outlines a minor update process for environments running BCM version 10.23.12 and describes the steps required to update safely to BCM version 10.25.03.

For DGX SuperPOD customers: before you update any installed software stack, always consult the DGX SuperPOD release notes for the latest information about available components and their versions.

Note

Storage solutions and storage recommendations are not in scope for this document. Air-gapped deployments are also out of scope.

The table below records the update path that was followed while creating this document. Before you update any installed software stack, always consult the respective documentation for the latest information about available components.

Table 1 Software Update Reference Table

| Component | Old Version | Updated Version | Update Source |
|---|---|---|---|
| BCM CMD | 10.23.12 | 10.25.03 | Customer Portal ISO BCM/Ubuntu Repo |
| BCM OS | Ubuntu 22.04.2 LTS | Ubuntu 22.04.5 LTS | Customer Portal ISO BCM/Ubuntu Repo |
| DGX Release | 6.1.0 | 6.3.2 | BCM/DGX/CUDA/Ubuntu Repository |
| DGX OS | Ubuntu 22.04.2 LTS | Ubuntu 22.04.5 LTS | BCM/DGX/CUDA/Ubuntu Repository |
| DGX Kernel | 5.15.0-1042-nvidia | 5.15.0-1078-nvidia | Ubuntu Repository |
| DGX GPU Driver | 535.129.03 | 550.163.01 | Ubuntu Repository |
| DGX Firmware | 1.1.1 | 24.09.1 | Enterprise Support |
| Enroot | 3.4.1 | 3.5.0 | BCM Repository |
| CUDA toolkit | 12.2 | 12.4.1 | CUDA Repository |
| DCGM | 3.1.8 | 4.2.3 | CUDA Repository |
| Cumulus OS (TOR/IPMI) | 5.5.0 | 5.11.0 | CUMULUS Repository |
| Slurm | 23.02.6 | 23.02.8 | BCM Repository |
| Mellanox OFED / DOCA | MLNX_OFED_LINUX-23.10-0.5.5.0 (OFED-internal-24.10-3.2.5) | MLNX_OFED_LINUX-23.10-4.0.9.1 DOCA_OFED 2.9.3 | Mellanox/DOCA Page |
| UFM Appliance | v1.8.2.1 | v1.10.1 | NVIDIA Licensing |
| IB Switch | v3.11.3002 | v3.12.2002 | NVIDIA Licensing |
| Kubernetes | 1.28 | 1.31 | Kubernetes Repo |
| Helm Based Operators | GPU Operator: 24.9.0; Network Operator: 24.4.0; Metallb: 0.14.9; Prometheus Stack: 35.6.0; Kube State Metrics: 5.31.0 | GPU Operator: 24.9.1; Network Operator: 24.7.0; Metallb: 0.14.9; Prometheus Stack: 72.9.1; Kube State Metrics: 5.36.0 | Helm Repositories |
| Kubernetes Components | Kubeflow MPI: v1.5.0; MPI Operator: 0.4.0; Calico: v3.29.2 | Kubeflow MPI: v1.7.0; MPI Operator: 0.6.0; Calico: v3.29.4 | GitHub/Docker Repository |
| Run:ai | Control Plane: 2.19.58; Cluster: 2.19.58 | Control Plane: 2.21.25; Cluster: 2.21.25 | Run:ai Repository |
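
As an illustration only, the reference table can be encoded as data so that a version collected from a cluster can be compared against the documented update path before any work begins. The sketch below is not part of BCM or any NVIDIA tooling: the names `UPDATE_PATH`, `parse_version`, and `classify` are hypothetical, only a few rows with purely numeric dotted versions are included, and the `installed` values are made-up sample input rather than output from a real system.

```python
# Hypothetical pre-flight helper. Component names and version strings are
# copied from Table 1; all function and variable names are illustrative.

UPDATE_PATH = {
    # component: (old version, updated version)
    "BCM CMD": ("10.23.12", "10.25.03"),
    "DGX Release": ("6.1.0", "6.3.2"),
    "DGX GPU Driver": ("535.129.03", "550.163.01"),
    "Enroot": ("3.4.1", "3.5.0"),
    "DCGM": ("3.1.8", "4.2.3"),
    "Slurm": ("23.02.6", "23.02.8"),
}


def parse_version(text):
    """Turn a dotted numeric version such as '10.25.03' into (10, 25, 3)."""
    return tuple(int(part) for part in text.split("."))


def classify(component, installed_version):
    """Compare an installed version with the Table 1 row for a component."""
    old, new = (parse_version(v) for v in UPDATE_PATH[component])
    current = parse_version(installed_version)
    if current >= new:
        return "at or beyond target"
    if current == old:
        return "matches documented starting version"
    return "differs from documented path"


# Made-up sample input; in practice these values would be gathered from the
# cluster by whatever inventory method the site already uses.
installed = {"BCM CMD": "10.23.12", "Slurm": "23.02.8", "DCGM": "3.1.6"}

for name, version in installed.items():
    print(f"{name}: {version} -> {classify(name, version)}")
```

A component reported as "differs from documented path" started from a version this document did not test, so the respective release notes should be checked before proceeding.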