Release Notes#

This section provides detailed information for releases and upgrades available for the NVIDIA DGX™ Software Stack for Red Hat Enterprise Linux 9 and Rocky Linux 9.

Current Software Versions#

The following table shows the current version information of the software packages provided in the NVIDIA repositories for the NVIDIA DGX Software Stack.

Current Software Versions (Last Updated on December 9, 2024)#

| Component | Version | Additional Information |
|---|---|---|
| GPU Driver | 550.127.08 | |
| GPU Driver | 535.216.03 | |
| CUDA Toolkit | 12.4 Update 1 | R550: 12.4 Update 1 download |
| CUDA Toolkit | 12.2 Update 2 | R535: 12.2 Update 2 download |
| MLNX_OFED | 24.10-1.1.4.0 LTS | 24.10-1.1.4.0 download |
| DOCA OFED | 2.9.1 | 2.9.1 download |
| Inbox OFED | 39.0-1 | For DGX OS 6 only. |
| NCCL | 2.23.4 | |
| cuDNN | 9.6.0 | |
| DCGM | 3.3.9 | |
| GPUDirect Storage (GDS) | 1.11.1 for CUDA Toolkit 12.6 Update 2; 1.11 for CUDA Toolkit 12.6; 1.10 for CUDA Toolkit 12.5; 1.9 for CUDA Toolkit 12.4; 1.8 for CUDA Toolkit 12.3; 1.7 for CUDA Toolkit 12.2 | |
| NVIDIA Container Toolkit | 1.16.2 | NVIDIA Container Toolkit includes the following packages: nvidia-container-toolkit: 1.16.2; libnvidia-container-tools: 1.16.2; libnvidia-container1: 1.16.2 |
| nvidia-peer-memory | 1.3 | |

Note

  • CUDA Toolkit is installed by default only for DGX stations and is optional for DGX servers. Refer to the CUDA Release Notes for driver compatibility information.

  • For CUDA Toolkit minor version compatibility and the minimum required driver version, refer to CUDA Compatibility.
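The pairing of driver branches and CUDA Toolkit versions listed above can be expressed as a small lookup. The following is a minimal sketch, not an NVIDIA tool: the function names are hypothetical, and it assumes that the driver branch is the letter R followed by the first component of the driver version (for example, 550.127.08 belongs to R550), which matches the R550 and R535 labels in the table.

```python
# Driver branch -> CUDA Toolkit version, copied from the Current Software Versions table.
CUDA_TOOLKIT_BY_BRANCH = {
    "R550": "12.4 Update 1",
    "R535": "12.2 Update 2",
}


def driver_branch(driver_version: str) -> str:
    """Return the branch label (for example 'R550') for a driver version string.

    Assumption: the branch is 'R' plus the first dotted component of the version.
    """
    major = driver_version.strip().split(".")[0]
    if not major.isdigit():
        raise ValueError(f"unrecognized driver version: {driver_version!r}")
    return f"R{major}"


def listed_cuda_toolkit(driver_version: str):
    """Return the CUDA Toolkit listed for the driver's branch, or None if not listed."""
    return CUDA_TOOLKIT_BY_BRANCH.get(driver_branch(driver_version))
```

A driver on a branch that the table does not list returns None, in which case the CUDA Compatibility documentation remains the reference for the minimum required driver version.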

The following table provides information about the matching firmware versions for the NVIDIA DOCA™ Host package with the doca-ofed installation profile v2.9.1 and the NVIDIA® OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) v24.10-1.1.4.0 LTS. For information about the MLNX_OFED release transition, refer to the MLNX_OFED section in Adapter Software.

Matching Firmware Versions (Last Updated on December 9, 2024)#
| DGX-1, DGX-2 with ConnectX-4 (CX-4) or ConnectX-5 (CX-5) | DGX A100 with ConnectX-6 | DGX A100 with ConnectX-7 | DGX H100/H200 with ConnectX-7 |
|---|---|---|---|
| CX-5: 16.35.4030; CX-4: 12.28.2006 | 20.43.2026 | 28.43.2026 | 28.43.2026 |
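As an illustration of how the matching firmware versions above might be used, the following sketch compares a firmware version string against the listed one. It is a hedged example only: the dictionary keys and function names are hypothetical, the installed version is assumed to have been obtained separately by the reader, and versions are assumed to compare as dotted integer fields.

```python
# Matching firmware versions copied from the table above.
# The (system, adapter) key names are illustrative, not official identifiers.
MATCHING_FIRMWARE = {
    ("DGX-1/DGX-2", "ConnectX-5"): "16.35.4030",
    ("DGX-1/DGX-2", "ConnectX-4"): "12.28.2006",
    ("DGX A100", "ConnectX-6"): "20.43.2026",
    ("DGX A100", "ConnectX-7"): "28.43.2026",
    ("DGX H100/H200", "ConnectX-7"): "28.43.2026",
}


def parse_firmware(version: str) -> tuple:
    """Split a dotted firmware version into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def firmware_status(system: str, adapter: str, installed: str) -> str:
    """Return 'match', 'older', or 'newer' relative to the listed firmware version."""
    expected = parse_firmware(MATCHING_FIRMWARE[(system, adapter)])
    actual = parse_firmware(installed)
    if actual == expected:
        return "match"
    return "older" if actual < expected else "newer"
```

For example, `firmware_status("DGX A100", "ConnectX-7", "28.41.1000")` reports that the installed firmware is older than the version listed for this release.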

For installation instructions, refer to

Note

For information about LTS software versions for related networking components, refer to the Networking Long-Term Support Releases page.

Latest Release#

Important

Installing the DGX Software, or updating to a newer release of it, also updates the installed Red Hat Enterprise Linux 9 distribution to the latest version.

If you use NVIDIA MLNX_OFED, before installing or updating to EL9-24.12, confirm that an MLNX_OFED package version is available that supports the latest Red Hat Enterprise Linux 9 version.

  • To check the latest Red Hat Enterprise Linux 9 version, refer to Red Hat Knowledgebase article 3078.

  • To check the MLNX_OFED package OS support, visit Mellanox and click the latest NVIDIA MLNX_OFED software version. Use the side menu to navigate to Release Notes > General Support and view Supported Operating Systems.
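The two checks above can be combined into a quick local comparison. The following sketch reads the `VERSION_ID` field from the standard `/etc/os-release` file and tests it against a set of versions that you copy by hand from the Supported Operating Systems list. The function names and the contents of the supported set are assumptions for illustration; the sketch reports the version that is currently installed, not the version an update would install.

```python
from pathlib import Path


def os_version_id(os_release_text: str):
    """Return the VERSION_ID value from os-release formatted text, or None if absent."""
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == "VERSION_ID":
            # Values may be wrapped in single or double quotes.
            return value.strip().strip('"').strip("'")
    return None


def is_listed(os_release_text: str, supported_versions) -> bool:
    """Return True if the OS version appears in the hand-copied supported set."""
    return os_version_id(os_release_text) in supported_versions


def read_local_os_release(path: str = "/etc/os-release") -> str:
    """Read the local os-release file; call this on the system being checked."""
    return Path(path).read_text()
```

On a system being prepared for an update, `is_listed(read_local_os_release(), {"9.4", "9.5"})` would return True only if the installed version is in the set, where `{"9.4", "9.5"}` is a placeholder for the versions found in the MLNX_OFED release notes.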

Release EL9-24.12#

Release Date: December 18, 2024

Release Highlights#

Qualified Software Stack#

The following table shows the current version information of the software packages provided in the NVIDIA repositories for the NVIDIA DGX Software Stack.

| Component | Latest versions in the repositories |
|---|---|
| DGX Base OS | EL9-24.12 |
| OS | Red Hat Enterprise Linux 9.5 and Rocky Linux 9.5 |
| Kernel | 5.14.0-503.15.1.el9_5.x86_64 |
| GPU Driver | |
| CUDA Toolkit | |
| NCCL | 2.23.4 |
| cuDNN | 9.5.1 |
| DCGM | 3.3.8 |
| GPU Direct Storage | 1.9 for CUDA 12.4; 1.7 for CUDA 12.2 |
| NVIDIA System Management (NVSM) | 24.06.05 |
| Docker CE | 27.3.1 |
| NVIDIA Container Runtime | nvidia-container-toolkit: 1.16.2; libnvidia-container-tools: 1.16.2; libnvidia-container1: 1.16.2 |
| MIG Configuration Tool | 0.10.0 |
| GDRCopy | 2.4.3 |
| DLFW (Deep Learning Frameworks) | 24.10 |

The following table provides information about the supported OS and matching firmware versions for the NVIDIA DOCA™ Host package with the doca-ofed installation profile v2.9.1 and the NVIDIA® OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) v24.10-1.1.4.0.

| OS | DGX-1, DGX-2 with ConnectX-4 (CX-4) or ConnectX-5 (CX-5) | DGX A100 with ConnectX-6 | DGX A100 with ConnectX-7 | DGX H100/H200 with ConnectX-7 |
|---|---|---|---|---|
| RHEL 9 | CX-5: 16.35.4030; CX-4: 12.28.2006 | 20.43.2026 | 28.43.2026 | 28.43.2026 |

Supported DGX Systems#

The EL9-24.12 release supports the following DGX systems:

  • DGX H200 1,128 GB

  • DGX H100 640 GB

  • DGX A100 640 GB

  • DGX A100 320 GB

  • DGX A800 640 GB

  • DGX-2

  • DGX-1 32 GB

  • DGX Station A100 320 GB

  • DGX Station A100 160 GB

  • DGX Station A800 320 GB

  • DGX Station 32 GB

Previous Releases#

Release EL9-24.06#

Release Date: July 11, 2024

Release Highlights#

  • Added support for Red Hat Enterprise Linux 9.4 and Rocky Linux 9.4.

  • Introduced support for the NVIDIA DOCA™ Host package with the doca-ofed installation profile v2.7.0.

  • Included support for the NVIDIA® OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) v24.04-0.6.6.0.

  • Continued support for the single-port ConnectX-7 VPI adapter card for the DGX A100 system.

  • Updated the DGX Software Stack.

Qualified Software Stack#

The following table shows the current version information of the software packages provided in the NVIDIA repositories for the NVIDIA DGX Software Stack.

| Component | Latest versions in the repositories |
|---|---|
| DGX Base OS | EL9-24.06 |
| OS | Red Hat Enterprise Linux 9.4 and Rocky Linux 9.4 |
| Kernel | 5.14.0-427.18.1.el9_4.x86_64 |
| GPU Driver | |
| CUDA Toolkit | |
| NCCL | 2.21.5 |
| cuDNN | 9.1.1 |
| DCGM | 3.3.6 |
| GPU Direct Storage | 1.9.1 for CUDA 12.4; 1.7.2 for CUDA 12.2 |
| NVIDIA System Management (NVSM) | 24.03.03 |
| Docker CE | 26.1.3 |
| NVIDIA Container Runtime | nvidia-container-toolkit: 1.15.0; libnvidia-container-tools: 1.15.0; libnvidia-container1: 1.15.0 |
| MIG Configuration Tool | 0.7.0 |
| GDRCopy | 2.4.1 |
| DLFW (Deep Learning Frameworks) | 24.05 |
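When planning an update from EL9-24.06 to EL9-24.12, it can help to see which qualified components changed between the two releases. The following is a minimal sketch of that comparison. The version values are copied from the Qualified Software Stack tables for the two releases, the dictionary and function names are hypothetical, and only components with a single version string in both tables are included.

```python
# Versions copied from the Qualified Software Stack tables of each release.
EL9_24_06 = {
    "NCCL": "2.21.5",
    "cuDNN": "9.1.1",
    "DCGM": "3.3.6",
    "NVSM": "24.03.03",
    "Docker CE": "26.1.3",
    "MIG Configuration Tool": "0.7.0",
    "GDRCopy": "2.4.1",
}

EL9_24_12 = {
    "NCCL": "2.23.4",
    "cuDNN": "9.5.1",
    "DCGM": "3.3.8",
    "NVSM": "24.06.05",
    "Docker CE": "27.3.1",
    "MIG Configuration Tool": "0.10.0",
    "GDRCopy": "2.4.3",
}


def changed_components(old: dict, new: dict) -> dict:
    """Map each component present in both releases to (old, new) when the versions differ."""
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


if __name__ == "__main__":
    for name, (before, after) in changed_components(EL9_24_06, EL9_24_12).items():
        print(f"{name}: {before} -> {after}")
```

The comparison is a plain string inequality, so it reports that a version changed without judging whether the change is an upgrade.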

The following table provides information about the supported OS and matching firmware versions for the NVIDIA® OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) v24.04-0.6.6.0 and the NVIDIA DOCA™ Host package with the doca-ofed installation profile v2.7.0.

| OS | DGX-1, DGX-2 with ConnectX-4 (CX-4) or ConnectX-5 (CX-5) | DGX A100 with ConnectX-6 | DGX A100 with ConnectX-7 | DGX H100 with ConnectX-7 |
|---|---|---|---|---|
| RHEL 9 | CX-5: 16.35.3502; CX-4: 12.28.2006 | 20.41.1000 | 28.41.1000 | 28.41.1000 |

Supported DGX Systems#

The EL9-24.06 release supports the following DGX systems:

  • DGX H100

  • DGX A100 640 GB

  • DGX A100 320 GB

  • DGX A800 640 GB

  • DGX-2

  • DGX-1 32 GB

  • DGX Station A100 320 GB

  • DGX Station A100 160 GB

  • DGX Station A800 320 GB

  • DGX Station 32 GB

Release EL9-23.12#

Release Date: December 19, 2023

Release Highlights#

  • Added support for Red Hat Enterprise Linux 9.3 and Rocky Linux 9.3.

  • Continued support for Red Hat Enterprise Linux 9.2 and Rocky Linux 9.2.

  • Added support for the single-port ConnectX-7 VPI adapter card for the DGX A100 system.

  • Added support for NVIDIA® OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) version 23.10-1.1.9.0, a long-term support (LTS) release.

  • Continued support for DGX H100.

Qualified Software Stack#

The following table shows the current version information of the software packages provided in the NVIDIA repositories for the NVIDIA DGX Software Stack.

| Component | Latest versions in the repositories |
|---|---|
| DGX Base OS | EL9-23.12 |
| OS | Red Hat Enterprise Linux 9.3 and Rocky Linux 9.3 |
| Kernel | 5.14.0-362.8.1.el9_3 |
| GPU Driver and CUDA Toolkit | CUDA Toolkit 12.2 and GPU Driver 535.129.03 (Default) |
| NCCL | 2.19.3 |
| cuDNN | 8.9.6 |
| DCGM | 3.3.0-002 |
| GPU Direct Storage | 1.7.2 or later |
| NVIDIA System Management (NVSM) | 23.09.02 |
| Docker-CE | 24.0.7-1 |
| NVIDIA Container Runtime | nvidia-docker2: 2.13.0-1; nvidia-container-toolkit (and base): 1.14.3-1; libnvidia-container-tools: 1.14.3-1; libnvidia-container1: 1.14.3-1 |
| MIG Configuration Tool | 0.5.4-1 |
| NGC CLI | 3.17.0-1 |
| DLFW (Deep Learning Frameworks) | 23.10 |

The following table provides information about the supported OS and matching firmware versions for the NVIDIA® OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) version 23.10-1.1.9.0.

| OS | DGX-1, DGX-2 with ConnectX-4 (CX-4) or ConnectX-5 (CX-5) | DGX A100 with ConnectX-6 | DGX A100 with ConnectX-7 | DGX H100 with ConnectX-7 |
|---|---|---|---|---|
| RHEL 9 | CX-5: 16.35.3006; CX-4: 12.28.2006 | 20.39.1002 | 28.39.1002 | 28.39.1002 |

Supported DGX Systems#

NVIDIA has validated and tested EL9-23.12 with the following DGX systems:

  • DGX H100

  • DGX A100 640 GB

  • DGX A100 320 GB

  • DGX A800 640 GB

  • DGX-2

  • DGX-1 32 GB

  • DGX Station A100 320 GB

  • DGX Station A100 160 GB

  • DGX Station 32 GB

Resolved Issues#

The following issues have been resolved in the EL9-23.12 release:

| Bug ID | Issue |
|---|---|
| 4108242 | Running joc tests resulted in an unrecognized arguments: --local-rank error with GPU driver R525.105.17. |
| 4386925 | GPUDirect RDMA bandwidth test failed with the Xid (PCI:0000:0f:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus message. |

Release EL9-23.08#

Release Highlights#

  • Added support for the NVIDIA DGX H100 system. Support is limited to the Red Hat Enterprise Linux 9.1 release.

  • Added support for Red Hat Enterprise Linux 9.2 and Rocky Linux 9.2.

Qualified Software Stack#

The following table provides version information for EL9-23.08 and the software it has been qualified with:

| Component | Latest versions in the repositories |
|---|---|
| Linux Distribution | Red Hat Enterprise Linux 9.2 and Rocky Linux 9.2. For NVIDIA DGX H100 Systems, only Red Hat Enterprise Linux 9.1 is supported. |
| GPU Driver | 535.86.10 |
| CUDA Toolkit | 12.2.0 |
| NCCL | 2.18.3 |
| cuDNN | 8.9.2.26 |
| DCGM | 3.1.8 |
| MLNX OFED | ConnectX-7 with DGX H100: 5.9-0.5.6.0.125; ConnectX-7 with DGX A100: 5.4-3.7.5.0; ConnectX-6 with DGX A100: 5.8-3.0.7.0; ConnectX-5 and ConnectX-4: 5.8-3.0.7.0 |
| MLNX FW | ConnectX-7 and DGX H100: 28.36.2050; ConnectX-7 and DGX A100: 28.34.4000; ConnectX-6 and DGX A100: 20.35.4000; ConnectX-5: 16.35.3006; ConnectX-4: 12.28.2006 |
| GPU Direct Storage | 1.7.2 |
| NVIDIA System Management (NVSM) | 23.06.04 |
| Docker Engine | 23.0.4 |
| NVIDIA Container Runtime | nvidia-docker2: 2.13.1-1; nvidia-container-toolkit (and base): 1.13.1-1; libnvidia-container-tools: 1.13.1-1; libnvidia-container1: 1.13.1-1 |
| MIG Configuration Tool | 0.5.1 |
| NGC CLI | 3.17.0 |
| DLFW (Deep Learning Frameworks) | 23.07 |

The following table provides information about the supported OS and matching firmware versions for Mellanox OFED.

| OS | DGX-1, DGX-2 with ConnectX-4 or ConnectX-5 | DGX A100 with ConnectX-6 (CX-6) | DGX A100 with ConnectX-7 (CX-7) | DGX H100 with ConnectX-7 (CX-7) |
|---|---|---|---|---|
| RHEL 8 | 5.8-3.0.7.0; CX-5: 16.35.3006; CX-4: 12.28.2006; RHEL 8.8 | 5.8-3.0.7.0; CX-6: 20.35.3006; RHEL 8.8 | 5.4-3.7.5.0; CX-7: 28.34.4000; RHEL 8.8 | 5.9-0.5.6.0.127; CX-7: 28.36.2050; RHEL 8.7 |
| RHEL 9 | 5.8-3.0.7.0; CX-5: 16.35.3006; CX-4: 12.28.2006; RHEL 9.2 | 5.8-3.0.7.0; CX-6: 20.35.3006; RHEL 9.2 | 5.4-3.7.5.0; CX-7: 28.34.4000; RHEL 9.2 | 5.9-0.5.6.0.127; CX-7: 28.36.2050; RHEL 9.1 |

Supported DGX Systems#

NVIDIA has validated and tested EL9-23.08 with the following DGX systems:

  • NVIDIA DGX H100

  • NVIDIA DGX A100

  • NVIDIA DGX Station A100

  • NVIDIA DGX Station

  • NVIDIA DGX-2

  • NVIDIA DGX-1

Release EL9-23.01#

Initial release of the DGX Software Stack for Red Hat Enterprise Linux 9.

Qualified Software Stack#

The following table provides version information for EL9-23.01 and the software it has been qualified with:

| Component | Versions in this release |
|---|---|
| Linux Distribution | Red Hat Enterprise Linux 9.1 and Rocky Linux 9.1 |
| GPU Driver | 525.105.17 |
| CUDA Toolkit | 12.0 |
| NCCL | 2.18.1 |
| cuDNN | 8.9.1.23 |
| DCGM | 3.1.8 |
| NVIDIA MLNX_OFED | 5.8-2.0.3.0 |
| NVIDIA ConnectX Firmware | CX-4: 12.28.2006; CX-5: 16.35.2000; CX-6: 20.35.2000 |
| NVIDIA System Management (NVSM) | 22.12.04 |
| Docker Engine | 23.0.4 |
| NVIDIA Container Runtime | nvidia-docker2: 2.13.0-1; nvidia-container-toolkit (and base): 1.13.1-1; libnvidia-container-tools: 1.13.1-1; libnvidia-container1: 1.13.1-1 |
| MIG Configuration Tool | 0.5.1 |
| NGC CLI | 3.17.0 |
| DLFW (Deep Learning Frameworks) | 23.03 |

Supported DGX Systems#

NVIDIA has validated and tested EL9-23.01 with the following DGX systems:

  • DGX-1

  • DGX-2

  • DGX Station

  • DGX A100

  • DGX Station A100