Base OS - DGX OS 5

Release Notes

This section provides detailed information for releases and upgrades available for DGX OS 5.

Note

Software upgrades are cumulative, which means that your systems will always receive the latest versions of all installed software components. The packages in the repositories may also be newer than the current DGX OS release. You should evaluate information and advisories from all relevant releases and later upgrades.

Here is a current list of software versions of the qualified DGX software stack in the repositories:

  • GPU Driver: 450.216.04, 470.161.03, 515.86.01

    The default driver included in the DGX OS ISO is R470.

  • CUDA Toolkit: 11.4.4

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • NCCL: 2.15.1

  • cuDNN: 8.4.1

  • DCGM: 2.4.7

  • Mellanox OFED: 5.4-3.5.8.0

  • MLNX FW:

    • ConnectX-4: 12.28.2006

    • ConnectX-5: 16.31.2006

    • ConnectX-6: 20.31.2354

    • ConnectX-7: 28.34.4000

  • GPUDirect Storage (GDS): 1.0

  • NVSM: 22.09.07

  • Docker Engine: 23.0

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.12.0

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.12.0-1

      • libnvidia-container1: 1.12.0-1

      • nvidia-container-toolkit: 1.12.0-1

      • nvidia-docker2: 2.11.0

  • MIG Configuration Tool: 0.4.3

    Refer to the NVIDIA mig-parted project on GitHub.

  • NGC CLI: 2.2.0-1

    Refer to the NGC CLI Documentation.

  • nvipmitool: 1.0.6.0

  • nvidia-peer-memory and nvidia-peer-memory-dkms: 1.3.0
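Because the repositories can carry packages that are newer than any ISO, it can help to compare a system against the versions listed above. The following is a minimal bash sketch, assuming an Ubuntu-based DGX OS host where `dpkg-query` and `nvidia-smi` are available. The helper names `version_ge`, `driver_branch`, and `check_pkg` are hypothetical, and the package subset is only illustrative.

```bash
#!/usr/bin/env bash
# Minimal sketch: compare installed versions against the qualified list above.
# Helper names (version_ge, driver_branch, check_pkg) are hypothetical.

# Succeed if version $1 >= version $2. GNU `sort -V` approximates Debian
# version ordering; `dpkg --compare-versions` is the strict alternative.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Print the driver branch (for example R470) for a full driver version.
driver_branch() {
    echo "R${1%%.*}"
}

# Report one package's version, as recorded by dpkg, against a qualified version.
check_pkg() {
    local pkg="$1" qualified="$2" installed
    installed="$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null)"
    if [ -z "$installed" ]; then
        echo "$pkg: not installed (qualified $qualified)"
    elif version_ge "$installed" "$qualified"; then
        echo "$pkg: $installed (qualified $qualified) ok"
    else
        echo "$pkg: $installed is older than qualified $qualified"
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    ver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
    echo "GPU driver $ver (branch $(driver_branch "$ver"))"
fi

if command -v dpkg-query >/dev/null 2>&1; then
    check_pkg nvidia-container-toolkit 1.12.0-1
    check_pkg libnvidia-container1 1.12.0-1
    check_pkg nvidia-docker2 2.11.0
fi
```

For example, `driver_branch 470.161.03` prints `R470`. Version sorting (rather than a plain string sort) matters here because `1.12.0` must compare as newer than `1.7.0`.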

This section provides details of each DGX OS release. These mostly include new NVIDIA features, accumulated bug fixes, and security updates.

DGX OS 5.4

Here are the new features in DGX OS 5.4.

  • GPUDirect Storage 1.0 was added

  • Upgraded software packages:

    • NVSM to 22.06.02

    • DCGM to 2.4.7

    • MLNX OFED to 5.4-3.5.8.0

    • docker-ce to 20.10.18

nvidia-mig-parted now contains a set of checkpoint/restore commands. These let you checkpoint (and later restore) the MIG configuration applied across all GPUs on a node, regardless of which tool was used to set up that configuration.

In previous versions of nvidia-mig-parted, all MIG configurations had to be applied through nvidia-mig-parted itself in order for it to recognize and subsequently reconfigure the MIG state on a set of GPUs. With the new checkpoint/restore feature, tools such as nvidia-smi can be used to configure MIG as well.

The following example partitions the GPU and then saves and restores a checkpoint.

  • Partition the GPU


    $ sudo nvidia-smi mig -C -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb

  • Save a checkpoint of the GPU partition:


    $ sudo -E nvidia-mig-parted checkpoint

    This will save a checkpoint of the current MIG state to the default location of /var/lib/nvidia-mig-manager/checkpoint.json.

  • Later (after rebooting the system, for example) users can run restore to ensure that the checkpointed MIG configuration is properly restored:


    $ sudo -E nvidia-mig-parted restore
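The partitioning command above assumes MIG mode is already enabled on the GPUs (MIG is disabled by default; `sudo nvidia-smi -i 0 -mig 1` enables it on GPU 0), and the `1g.5gb` profile corresponds to an A100 40GB GPU. After a restore, one quick way to confirm the layout came back is to count MIG devices per GPU in the `nvidia-smi -L` listing. This sketch assumes that listing prints a `GPU N: ...` line for each GPU followed by indented `MIG <profile> Device N: ...` lines; `mig_counts_per_gpu` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: after `nvidia-mig-parted restore`, count MIG devices of one profile
# per GPU. Assumes the `nvidia-smi -L` layout described above.

# Read `nvidia-smi -L` output on stdin; print "<gpu-index> <count>" per GPU.
mig_counts_per_gpu() {
    awk -v profile="$1" '
        /^GPU [0-9]+:/ { gpu = $2; sub(/:$/, "", gpu); counts[gpu] = 0; order[++n] = gpu }
        $1 == "MIG" && $2 == profile { counts[gpu]++ }
        END { for (i = 1; i <= n; i++) print order[i], counts[order[i]] }
    '
}

expected=7   # the partitioning example creates seven 1g.5gb instances per MIG-enabled GPU

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L | mig_counts_per_gpu 1g.5gb | while read -r gpu count; do
        if [ "$count" -eq "$expected" ]; then
            echo "GPU $gpu: $count x 1g.5gb (as expected)"
        else
            echo "GPU $gpu: $count x 1g.5gb, expected $expected" >&2
        fi
    done
fi
```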

DGX OS 5.3

Here are the new features in DGX OS 5.3. See also Update: November 22, 2022 for important changes since the release.

Warning

The features and component versions in DGX OS 5.3 are identical to those in DGX OS 5.2. In DGX OS 5.3, the GPG keys that are used to sign the packages and metadata in the NVIDIA repositories need to be rotated.

Refer to Rotating the GPG Keys for more information.

DGX OS 5.2

Here are the new features in DGX OS 5.2:

  • Updated NVSM to 21.09.14

  • Updated DCGM to 2.3.2

  • Added DGX Software Stack installation method

The DGX Software Stack provides the option to install a vanilla version of Ubuntu 20.04 and then separately install the additional NVIDIA software (NVIDIA DGX Software Stack). This option is available for DGX servers (DGX A100, DGX-2, DGX-1). The DGX Software Stack is a streamlined version of the software stack incorporated into the DGX OS ISO image, and includes meta-packages to simplify the installation process. Refer to Installing the DGX Software Stack.

DGX OS 5.1

Here are the new features in DGX OS 5.1. See also Update: November 22, 2022 for important changes since the release.

  • Added NVIDIA GPU driver Release 470.

    Note

    When upgrading DGX OS, the system remains on the installed GPU driver branch. For example, the GPU driver branch on the system does not automatically switch from R450 to R470. Refer to the Changing Your GPU Branch section of the DGX OS User Guide for instructions on switching GPU driver branches.

  • Supports the CUDA Toolkit up to 11.4 natively, or newer versions via the compatibility module.

  • Updated the Docker Engine to 20.10.

  • Incorporates NVIDIA MLNX_OFED 5.4.

  • Updated NVSM

  • Added ability to generate a test alert/email.

  • NVSM dump/show health includes firmware version information (incorporates ‘nvsm show -level all’ in the command).

  • NVSM binds port 273 to 127.0.0.1 to limit external communications. To make NVSM reachable on other IPv4 or IPv6 addresses, edit the bindaddress setting in nvsm.config and then restart NVSM.

  • Added NVML libraries

  • Includes MOFED 5.4

  • Added NGC CLI

  • Added MIG Configuration Tool to define MIG partitions and provide a systemd service to make MIG partitions persist across reboots.

  • MIG is disabled by default

  • The MIG configuration file overrides any MIG-related nvidia-smi commands. Use nvidia-mig-parted instead of nvidia-smi for MIG configuration.

  • arp_ignore=1 and arp_announce=2 are now set on all configured InfiniBand interfaces.

  • Added LLDPd for validating network cabling. The default configuration now uses the interface name, rather than the MAC address, as the PortID.

  • Added support for GPUDirect Storage 1.0 (Refer to GDS Documentation for installation instructions)
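Two of the DGX OS 5.1 defaults listed above can be checked from the shell: the address NVSM listens on for port 273, and the ARP settings on InfiniBand interfaces. This is a minimal sketch, assuming the `ss` utility from iproute2 and the standard `/proc/sys/net/ipv4/conf/<interface>/` sysctl files. The helper names are hypothetical, and the `ib*` glob assumes InfiniBand interface names start with `ib`.

```bash
#!/usr/bin/env bash
# Sketch: check the NVSM listen address and ARP sysctls described above.
# Helper names (listen_addrs_for_port, ib_arp_settings) are hypothetical.

# Read `ss -ltnH` output on stdin and print the local address of every
# listening TCP socket on port $1 (column 4 is "address:port").
listen_addrs_for_port() {
    awk -v port="$1" '{
        n = split($4, part, ":")
        if (part[n] == port) { addr = $4; sub(":" port "$", "", addr); print addr }
    }'
}

# Print "<iface> arp_ignore=<v> arp_announce=<v>" for each ib* interface
# under $1 (default: /proc/sys/net/ipv4/conf).
ib_arp_settings() {
    local base="${1:-/proc/sys/net/ipv4/conf}" dir
    for dir in "$base"/ib*; do
        [ -d "$dir" ] || continue
        printf '%s arp_ignore=%s arp_announce=%s\n' "${dir##*/}" \
            "$(cat "$dir/arp_ignore")" "$(cat "$dir/arp_announce")"
    done
}

if command -v ss >/dev/null 2>&1; then
    echo "Listening on port 273 (NVSM):"
    ss -ltnH | listen_addrs_for_port 273   # expected: 127.0.0.1 by default
fi
ib_arp_settings   # expected: arp_ignore=1 arp_announce=2 on each ib* interface
```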

Warning

This release incorporates the following updates.

  • NVIDIA MLNX_OFED 5.4

Customers are advised to consider these updates and any effect they may have on their applications. For example, some MOFED-dependent applications may be affected.

A best practice is to upgrade on select systems and verify that your applications work as expected before deploying on more systems.

DGX OS 5.0

This is the initial DGX OS 5 release. Here are the new features in DGX OS 5:

  • NVIDIA GPU driver Release 450.

  • Supports the CUDA Toolkit up to 11.0 natively, or newer versions via the compatibility module.

  • Incorporates NVIDIA MLNX_OFED 5.1.

  • Added rootfs encryption option, configurable during the re-imaging process.

  • Added option to password protect the GRUB menu, configurable during the first boot process.

  • Updated NVSM

  • Added support for custom drive partitioning

  • Added monitoring of firmware health

  • Updated the default InfiniBand network naming policy.

The InfiniBand interfaces, enumerated as ibx in previous releases, now enumerate as ibpxsy, similar to Ethernet interfaces (enpxsy). Refer to the DGX A100 User Guide for the new naming.
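To see which naming scheme a system's interfaces follow, for example when updating scripts that hard-code `ib0`, a small filter over the names in `/sys/class/net` is enough. This sketch assumes the legacy form is `ib` followed by digits and the new form is `ibp<bus>s<slot>`, optionally with an `f<function>` suffix as in predictable Ethernet names; `classify_ib_names` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: classify InfiniBand interface names as legacy (ibX) or PCI-path
# based (ibpXsY, optionally fZ). `classify_ib_names` is a hypothetical helper.

# Read interface names on stdin (one per line); print "<name> <scheme>" for
# InfiniBand-style names and ignore everything else.
classify_ib_names() {
    awk '
        /^ib[0-9]+$/                   { print $0, "legacy"; next }
        /^ibp[0-9]+s[0-9]+(f[0-9]+)?$/ { print $0, "pci-path" }
    '
}

if [ -d /sys/class/net ]; then
    ls /sys/class/net | classify_ib_names
fi
```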

Warning

This release incorporates the following updates.

  • NVIDIA MLNX_OFED 5.1

Customers are advised to consider these updates and any effect they may have on their applications. For example, some MOFED-dependent applications may be affected.

A best practice is to upgrade on select systems and verify that your applications work as expected before deploying on more systems.

This section provides information about the updates to DGX OS 5. The updates listed include:

  • Major component updates in the Ubuntu repositories.

  • NVIDIA driver updates in the Ubuntu repositories

Update: November 22, 2022

  • The following changes were made to the Ubuntu repositories:

    • R515 NVIDIA GPU Driver: 515.86.01

    • R470 NVIDIA GPU Driver: 470.161.03

    • R450 NVIDIA GPU Driver: 450.216.04

Update: October 14, 2022

  • The following changes were made to the repositories:

    • GPUDirect Storage 1.0 was added.

    • NCCL 2.15.1

    • DCGM 2.4.7

    • MOFED 5.4-3.5.8.0

    • NVSM 22.06.02

    • Docker-ce 20.10.18

    • MIG Configuration Tool: 0.4.3

  • The following changes were made to the Ubuntu repositories:

    • R470 NVIDIA GPU Driver: 470.129.06

    • R450 NVIDIA GPU Driver: 450.203.03

  • The DGX OS ISO 5.4.1 has been released.

DGX OS 5.4 Release: August 8, 2022

Update: June 7, 2022

  • The installer version has been updated to 5.3.1.

  • The following changes were made to the repositories:

    • R470 NVIDIA GPU Driver: 470.129.06

    • R450 NVIDIA GPU Driver: 450.191.01

    • DCGM: 2.3.6

    • NVSM: 22.03.05

    • Docker CE: 20.10.16

    • nvidia-peer-memory/nvidia-peer-memory DKMS: 1.3.0

  • The DGX OS ISO 5.3.1 has been released.

Update: May 17, 2022

  • The following changes were made to the Ubuntu repositories:

    • NVIDIA GPU R470 Driver: 470.129.06

    • NVIDIA GPU R450 Driver: 450.191.01

DGX OS 5.3 Release: April 28, 2022

DGX OS 5.2 Release: February 17, 2022

  • DGX OS 5.2 has been released.

  • Installer version has been updated to 5.2.0.

  • Added DGX Software Stack installation method

    The DGX Software Stack provides the option to install a vanilla version of Ubuntu 20.04 and then separately install the additional NVIDIA software (NVIDIA DGX Software Stack). This option is available for DGX servers (DGX A100, DGX-2, DGX-1). The DGX Software Stack is a streamlined version of the software stack incorporated into the DGX OS ISO image, and includes meta-packages to simplify the installation process. Refer to Installing the DGX Software Stack for instructions.

  • The following changes were made to the Ubuntu repositories:

    • R470 NVIDIA GPU Driver: 470.103.01

    • R450 NVIDIA GPU Driver: 450.172.01

  • The following changes were made to NVIDIA repositories:

    • DCGM: 2.3.2

    • NVSM: 21.09.14

    • Docker CE: 20.10.11

    • nvidia-peer-memory/nvidia-peer-memory DKMS: 1.3.0

  • The DGX OS ISO 5.2.0 has been released.

Update: December 14, 2021

  • Installer version updated to 5.1.1.

  • The following changes were made to the Ubuntu repositories:

    • R470 NVIDIA GPU Driver: 470.82.01

  • The following changes were made to the NVIDIA repositories:

    • DCGM: 2.3.1

    • NVSM: 21.09.10

    • MOFED: MLNX 5.4-3.1.0.0

    • Docker CE: 20.10.11

    • nvidia-container stack:

      • nvidia-docker2-2.8.0-1

      • nvidia-container-runtime-3.7.0-1

      • nvidia-container-toolkit-1.7.0-1

      • libnvidia-container-tools-1.7.0-1

      • libnvidia-container1-1.7.0-1

    • nvipmitool: 1.0.6.0

    • nvidia-peer-memory/nvidia-peer-memory DKMS: 1.2.0

Update: October 26, 2021

  • The following changes were made to the Ubuntu repositories:

    • NVIDIA GPU Driver: 450.156.00

DGX OS 5.1 Release: August 26, 2021

  • The following updates were made to the NVIDIA repositories:

    • Docker Engine: 20.10.7

    • NVSM: 21.07.15

    • DCGM: 2.2.9

    • nvidia-container-runtime: 3.5.0-1

    • NVIDIA MLNX_OFED: 5.4-1.0.3.0

    • (New) NGC CLI: 2.2.0

    • (New) MIG Configuration Tool: 0.1.2-1

  • The following changes were made to the Ubuntu repositories:

    • Added the release 470 GPU Driver: 470.57.02

  • The DGX OS ISO 5.1.0 has been released.

Update: June 30, 2021

  • The following changes were made to the NVIDIA repositories:

    • GPUDirect Storage: Added support for GPUDirect Storage (GDS). It requires manual installation. For more information and installation instructions, refer to Installing GPUDirect Storage.

    • NVSM for GPUDirect Storage: Updated to 21.03.11 only when installing GPUDirect Storage

    • MOFED for GPUDirect Storage: Updated to 5.3-1.0.5.0 only when installing GPUDirect Storage.

Update: June 20, 2021

  • The following changes were made to the Ubuntu repositories:

    • NVIDIA GPU Driver: 450.142.00

Update: June 2, 2021

  • The following changes were made to the Ubuntu repositories:

    • NVIDIA GPU Driver: 450.119.04. These are signed drivers that replace the unsigned 450.119.04 drivers.

Update: May 27, 2021

  • The following changes were made to the NVIDIA repositories:

    • NVSM: 20.09.26

    • MOFED: MLNX 5.1-2.6.2.0

      Incorporates mlnx-fw-updater 5.2-1.0.4.0. When the update is installed, the Mellanox FW updater updates the ConnectX card firmware as follows:

      • ConnectX-4: 12.28.2006

      • ConnectX-5: 16.29.1016

      • ConnectX-6: 20.29.1016

      Note

      The ConnectX-4 firmware may already have been upgraded to a later release. Refer to Appendix B: Downgrade Firmware for Mellanox ConnectX-4 Cards for more information and for instructions on forcing a firmware downgrade.
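To compare a system's adapters against this table, the firmware version each ConnectX device reports can be read from sysfs. This sketch assumes the `/sys/class/infiniband/<device>/fw_ver` attribute is present and infers the card family from the firmware major number, following the prefixes used in the firmware tables in these notes (12, 16, 20, 28). `cx_family` and `expected_fw` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: report ConnectX firmware versions next to the versions that
# mlnx-fw-updater 5.2-1.0.4.0 installs. Helper names are hypothetical.

# Map a firmware version's major number to a card family, using the
# prefixes from the firmware tables in these release notes.
cx_family() {
    case "${1%%.*}" in
        12) echo ConnectX-4 ;;
        16) echo ConnectX-5 ;;
        20) echo ConnectX-6 ;;
        28) echo ConnectX-7 ;;
        *)  echo unknown ;;
    esac
}

# Firmware version listed in the table above for each family.
expected_fw() {
    case "$1" in
        ConnectX-4) echo 12.28.2006 ;;
        ConnectX-5) echo 16.29.1016 ;;
        ConnectX-6) echo 20.29.1016 ;;
        *)          echo n/a ;;
    esac
}

for f in /sys/class/infiniband/*/fw_ver; do
    [ -r "$f" ] || continue          # no RDMA devices: the glob does not match
    dev="$(basename "$(dirname "$f")")"
    ver="$(cat "$f")"
    fam="$(cx_family "$ver")"
    printf '%s %s %s expected=%s\n' "$dev" "$fam" "$ver" "$(expected_fw "$fam")"
done
```

A ConnectX-4 that reports a version later than 12.28.2006 is the case the note above describes.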

Update: May 06, 2021

  • The following change was made in the NVIDIA repositories:

    • NVIDIA GPU Driver: 450.119.04

      Unsigned precompiled 450.119.04 kernel modules have been added to the NVIDIA repositories. They fix the Driver Version Mismatch Reported issue and will be removed once Canonical provides signed precompiled 450.119.04 kernel modules.

      Warning

      Do not update if your system has Secure Boot enabled. Because these drivers are unsigned, systems with Secure Boot enabled will fail to load them.
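Before applying the unsigned modules from this update, you can check the Secure Boot state. This minimal sketch assumes the `mokutil` utility is installed and that `mokutil --sb-state` prints a line beginning with `SecureBoot enabled` when Secure Boot is on; `secure_boot_enabled` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: warn before installing unsigned kernel modules on a Secure Boot system.
# `secure_boot_enabled` is a hypothetical helper.

# Succeed if `mokutil --sb-state` output on stdin reports Secure Boot enabled.
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

if command -v mokutil >/dev/null 2>&1; then
    if mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
        echo "Secure Boot is enabled: do not install the unsigned 450.119.04 modules." >&2
    else
        echo "Secure Boot is not reported as enabled."
    fi
fi
```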

Update: April 20, 2021

Update: April 13, 2021

  • The following changes were made in the NVIDIA repositories:

    • GPUDirect Storage: Added support for GPUDirect Storage as a Technical Preview. GPUDirect Storage requires manual installation. For more information and installation instructions, refer to GDS Troubleshooting.

    • MOFED: Updated to MLNX 5.1-2.6.2.0

      Note

      You no longer need to manually uninstall previous MOFED versions before getting this update.

Update: March 30, 2021

  • The following changes were made in the NVIDIA repositories:

    • MOFED: MLNX 5.1-2.5.8.0.47

      Warning

      If you have already updated to the latest Ubuntu kernel (uname -a reports 5.4.0-67 or later), you must uninstall MOFED and then reinstall it as follows:

      $ sudo apt-get purge mlnx-ofed-all mlnx-ofed-kernel-dkms --auto-remove

      $ sudo apt-get update

      $ sudo apt-get install mlnx-ofed-all nvidia-peer-memory-dkms
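The kernel condition in this warning can be checked in a script. This minimal sketch assumes an Ubuntu kernel release string of the form `5.4.0-67-generic`, as printed by `uname -r`, and uses GNU `sort -V` for the comparison; `needs_mofed_reinstall` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: decide whether the MOFED purge/reinstall above applies, based on
# the running kernel. `needs_mofed_reinstall` is a hypothetical helper.

# Succeed if kernel release $1 (e.g. "5.4.0-67-generic") is 5.4.0-67 or later.
# Assumes a trailing flavour suffix such as "-generic", which is stripped.
needs_mofed_reinstall() {
    local release="${1%-*}" threshold="5.4.0-67"
    [ "$(printf '%s\n%s\n' "$release" "$threshold" | sort -V | head -n1)" = "$threshold" ]
}

if needs_mofed_reinstall "$(uname -r)"; then
    echo "Kernel $(uname -r) is 5.4.0-67 or later: reinstall MOFED as shown above."
else
    echo "Kernel $(uname -r) is older than 5.4.0-67: this reinstall is not needed."
fi
```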

Update: March 2, 2021

  • Added support for the DGX Station A100.

  • The following changes were made in the NVIDIA repositories:

    • DCGM: 2.0.14

    • NVSM: 20.09.20

  • The DGX OS ISO 5.0.2 has been released.

Update: February 23, 2021

  • The following change was made in the NVIDIA repositories:

    • NVSM: 20.09.17

Update: January 20, 2021

  • The following change was made in the Ubuntu repositories:

    • NVIDIA GPU Driver: 450.102.04

Update: December 11, 2020

  • The following changes were made in the NVIDIA repositories:

    • Docker: docker-ce 19.03.14. This addresses CVE-2020-15257.

    • MOFED: MLNX 5.1-2.5.8.0. When the update is installed, the Mellanox FW updater updates the ConnectX card firmware as follows:

      • ConnectX-4: 12.28.2006

      • ConnectX-5: 16.28.4000

      • ConnectX-6: 20.28.4000

      Note

      The ConnectX-4 firmware may already have been upgraded to a later release. Refer to Appendix B: Downgrade Firmware for Mellanox ConnectX-4 Cards for more information and for instructions on forcing a firmware downgrade.

Update: October 31, 2020 (DGX OS 5.0 Release)

This section lists all DGX OS ISO releases with the software versions included in the image.

DGX OS ISO 5.4.1

  • Ubuntu: 20.04 LTS

  • Ubuntu Kernel: 5.4.0-52.57

  • GPU Driver: 470.129.06

  • CUDA Toolkit: 11.4.4

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • NCCL: 2.15.1

  • cuDNN: 8.4.1

  • DCGM: 2.4.7

  • Mellanox OFED: 5.4-3.5.8.0

  • MLNX FW:

    • ConnectX-4: 12.28.2006

    • ConnectX-5: 16.31.2006

    • ConnectX-6: 20.31.2354

    • ConnectX-7: 28.34.4000

  • GPUDirect Storage (GDS): 1.0

  • NVSM: 22.09.07

  • Docker Engine: 23.0

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.12.0

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.12.0-1

      • libnvidia-container1: 1.12.0-1

      • nvidia-container-toolkit: 1.12.0-1

      • nvidia-docker2: 2.11.0

  • MIG Configuration Tool: 0.4.3

    Refer to the NVIDIA mig-parted project on GitHub.

  • NGC CLI: 2.2.0-1

    Refer to the NGC CLI Documentation.

  • nvipmitool: 1.0.6.0

  • nvidia-peer-memory and nvidia-peer-memory-dkms: 1.3.0

DGX OS ISO 5.3.1

  • Ubuntu: 20.04 LTS

  • Ubuntu Kernel: 5.4.0-113.127

  • GPU Driver: 470.129.06

  • CUDA Toolkit: 11.4.2

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • DCGM: 2.3.6

  • Mellanox OFED: 5.4-3.1.0.0

  • NVSM: 22.03.05

  • Docker Engine: 20.10.16

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.7.0

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.7.0-1

      • libnvidia-container1: 1.7.0-1

      • nvidia-container-toolkit: 1.7.0-1

      • nvidia-container-runtime: 3.7.0-1

      • nvidia-docker2: 2.8.0-1

  • MIG Configuration Tool: 0.1.2-1

    Refer to the NVIDIA mig-parted project on GitHub.

  • NGC CLI: 2.2.0-1

    Refer to the NGC CLI Documentation.

  • nvipmitool: 1.0.6.0

  • nvidia-peer-memory and nvidia-peer-memory-dkms: 1.3.0

DGX OS ISO 5.2.0

  • Ubuntu: 20.04 LTS

  • Ubuntu Kernel: 5.4.0-80.90

  • GPU Driver: 470.103.01

  • CUDA Toolkit: 11.4.4

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • DCGM: 2.3.2

  • Mellanox OFED: 5.4-1.0.3.0

  • MLNX FW:

    • ConnectX-4: 12.28.2006

    • ConnectX-5: 16.31.2006

    • ConnectX-6: 20.31.2354

    • ConnectX-7: 28.34.4000

  • NVSM: 21.09.14

  • Docker Engine: 20.10.11

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.7.0

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.7.0-1

      • libnvidia-container1: 1.7.0-1

      • nvidia-container-toolkit: 1.7.0-1

      • nvidia-container-runtime: 3.7.0-1

      • nvidia-docker2: 2.8.0

  • MIG Configuration Tool: 0.1.2-1

    Refer to the NVIDIA mig-parted project on GitHub.

  • NGC CLI: 2.2.0-1

    Refer to the NGC CLI Documentation.

  • nvipmitool: 1.0.6.0

  • nvidia-peer-memory and nvidia-peer-memory-dkms: 1.3.0

DGX OS ISO 5.1.0

  • Ubuntu: 20.04 LTS

  • Ubuntu Kernel: 5.4.0-80.90

  • GPU Driver: 470.57.02

  • CUDA Toolkit: 11.4.0

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • DCGM: 2.2.9

  • Mellanox OFED: 5.4-1.0.3.0

  • NVSM: 21.07.15

  • Docker Engine: 20.10.7

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.5.1

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.5.1-1

      • libnvidia-container1: 1.4.0-1

      • nvidia-container-toolkit: 1.4.0-1

      • nvidia-container-runtime: 3.5.0-1

      • nvidia-docker2: 2.6.0-1

  • MIG Configuration Tool: 0.1.2-1

    Refer to the NVIDIA mig-parted project on GitHub.

  • NGC CLI: 2.2.0-1

    Refer to the NGC CLI Documentation.

DGX OS ISO 5.0.2

  • Ubuntu: 20.04 LTS

  • Ubuntu Kernel: 5.4.0-58.127

  • GPU Driver: 450.80.02

  • CUDA Toolkit: 11.4.0

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • DCGM: 2.0.14

  • Mellanox OFED: 5.1-2.5.8.0

  • NVSM: 20.09.17

  • Docker Engine: 19.03.14

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.3.0

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.3.0-1

      • libnvidia-container1: 1.3.0-1

      • nvidia-container-toolkit: 1.3.0-1

      • nvidia-container-runtime: 3.4.0-1

      • nvidia-docker2: 2.5.0-1

DGX OS ISO 5.0.0

  • Ubuntu: 20.04 LTS

  • Ubuntu Kernel: 5.4.0-52.127

  • GPU Driver: 450.80.02

  • CUDA Toolkit: 11.4.0

    Note: The CUDA Toolkit is only installed on DGX Stations and is optional for DGX servers. Refer also to the latest CUDA Release Notes for driver compatibility information.

  • DCGM: 2.0.13

  • Mellanox OFED: 5.1-2.4.6.0

  • NVSM: 20.07.40

  • Docker Engine: 19.03.13

    Refer to Docker Engine.

  • NVIDIA Container Toolkit: 1.3.0

    NVIDIA Container Toolkit includes the following packages:

      • libnvidia-container-tools: 1.3.0-1

      • libnvidia-container1: 1.3.0-1

      • nvidia-container-toolkit: 1.3.0-1

      • nvidia-container-runtime: 3.4.0-1

      • nvidia-docker2: 2.5.0-1

© Copyright 2020-2023, NVIDIA. Last updated on Mar 24, 2023.