Version EL7-20.02
The DGX Software for Red Hat Enterprise Linux 7, EL7-20.02 update, is available. You must enable the update repository in order to obtain this update.
Installing or updating to EL7-20.02 also updates the installed Red Hat Enterprise Linux 7 distribution to the latest version. If you use the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED), confirm before installing or updating to EL7-20.02 that an MLNX_OFED package version is available that supports the latest Red Hat Enterprise Linux 7 version.
To check the latest Red Hat Enterprise Linux 7 version, visit https://access.redhat.com/articles/3078
To check the MLNX_OFED package OS support, visit https://docs.mellanox.com/category/mlnxofedib, click the latest MLNX_OFED software version and then use the side menu to navigate to Release Notes->General Support in MLNX_OFED and view Supported Operating Systems.
If a supporting MLNX_OFED package has been released, then be sure to install it.
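As a minimal sketch of the version check described above, the following Python snippet reads the installed release from /etc/redhat-release and compares it with a list that you fill in by hand from the MLNX_OFED Supported Operating Systems page. The function names and the `supported` list are illustrative assumptions and are not part of the DGX software. The sketch assumes the release file contains text of the form "Red Hat Enterprise Linux Server release 7.7 (Maipo)".

```python
import re
from pathlib import Path


def parse_release(text):
    """Return (major, minor) from a redhat-release style string, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_release(path="/etc/redhat-release"):
    """Read the release file; return None if it is missing or unparseable."""
    release_file = Path(path)
    if not release_file.is_file():
        return None
    return parse_release(release_file.read_text())


if __name__ == "__main__":
    # Hypothetical placeholder: fill in the (major, minor) releases listed
    # under Supported Operating Systems for your MLNX_OFED version.
    supported = []
    release = installed_release()
    if release is None:
        print("Could not determine the installed release.")
    elif release in supported:
        print("Release %d.%d is in the supported list." % release)
    else:
        print("Release %d.%d is not in the supported list." % release)
```

The script reports the release that is currently installed. Because the update moves the system to the latest release, compare the latest version from the Red Hat article against the same list before you update.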
Update Repository
The update repository was created for updating the NVIDIA driver to the R418 driver branch, CUDA 10.1, and other software packages associated with CUDA 10.1.
These updates are available only if you have enabled the update repository. See DGX Software for Red Hat Enterprise Linux 7 Installation Guide <https://docs.nvidia.com/dgx/dgx-rhel-install-guide/installing-dgx-sw.html#enabling-dgx-sw-repo> for instructions on updating the NVIDIA repositories.
Change Highlights
The following changes were made to the update repository.
Updated NVSM to version 20.01.15 <https://docs.nvidia.com/datacenter/nvsm/20.01/nvsm-release-notes/index.html>
Updated NCCL Runtime to version 2.5.6+cuda10.1 <https://docs.nvidia.com/deeplearning/sdk/nccl-archived/nccl_256/nccl-release-notes/rel_2-5-6.html#rel_2-5-6>
Updated cuDNN Library Runtime to version 7.6.5+cuda10.1 <https://docs.nvidia.com/deeplearning/sdk/cudnn-archived/cudnn_765/cudnn-release-notes/rel_765.html#rel_765>
Updated TensorRT to version 6.0.1+cuda10.1 <https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-release-notes/tensorrt-6.html#rel_6-0-1>
Updated NVIDIA GPU driver to version 418.126.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-418-12602/index.html>
Mellanox CX6 cards are now supported on DGX-1 (Tesla V100).
PXE boot is now supported on DGX-1 and DGX-2.
CPU mitigations can now be disabled and restored.
On the DGX Station, the NVSM commands nvsm show health and nvsm dump health are now supported and replace the DGX Station Diagnostic Components.
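The two NVSM health commands named above can also be called from a script. The following is a rough sketch, assuming the nvsm executable is on the PATH and that the calling user has sufficient privileges (NVSM commands typically need root). The helper names are hypothetical, and no options beyond the two commands named in these notes are used.

```python
import shutil
import subprocess


def nvsm_command(action):
    """Build the argument list for 'nvsm show health' or 'nvsm dump health'."""
    if action not in ("show", "dump"):
        raise ValueError("action must be 'show' or 'dump'")
    return ["nvsm", action, "health"]


def run_nvsm_health(action="show"):
    """Run the command and return the CompletedProcess, or None if nvsm is absent."""
    cmd = nvsm_command(action)
    if shutil.which(cmd[0]) is None:
        return None
    # stdout/stderr pipes with universal_newlines also work on Python 3.6.
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )


if __name__ == "__main__":
    result = run_nvsm_health("show")
    if result is None:
        print("nvsm was not found on this system.")
    else:
        print(result.stdout)
```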
Software Contents
The following table provides version information for software included in the DGX Software Stack for Red Hat Enterprise Linux 7.
Unlike the DGX OS shipped with the NVIDIA DGX system, the DGX software stack for Red Hat does not include the Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED) for Linux. MLNX_OFED kernel support is likely to be out of sync with the Red Hat distribution kernel, which can result in system instability. To use InfiniBand on the DGX system, see the DGX Software for Red Hat Enterprise Linux 7 Installation Guide <https://docs.nvidia.com/dgx/dgx-rhel-install-guide/installing-ib-drivers.html#installing-ib-drivers> for instructions.
Component | Version |
---|---|
GPU Driver | 418.126.02 |
NVIDIA System Management (NVSM) | 20.01.15 |
Data Center GPU Management (DCGM) | 1.7.2 |
DGX Station Theme | dgxstation-desktop - 19.10-0, dgx-gnome - 19.10-0 |
NCCL Runtime | 2.5.6+cuda10.1 |
cuDNN Library Runtime | 7.6.5+cuda10.1 |
TensorRT | 6.0.1+cuda10.1 |
CUDA Toolkit | 10.1.243 |
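To compare a system against the table, one option is to keep the expected versions in a small mapping and report any differences. This is a sketch only: how the installed versions are collected is left to the reader, and the shortened component keys and helper names are hypothetical.

```python
# Expected versions copied from the table above.
EXPECTED = {
    "GPU Driver": "418.126.02",
    "NVSM": "20.01.15",
    "DCGM": "1.7.2",
    "NCCL Runtime": "2.5.6+cuda10.1",
    "cuDNN Library Runtime": "7.6.5+cuda10.1",
    "TensorRT": "6.0.1+cuda10.1",
    "CUDA Toolkit": "10.1.243",
}


def split_version(version):
    """Split '2.5.6+cuda10.1' into ('2.5.6', 'cuda10.1'); suffix may be None."""
    base, sep, suffix = version.partition("+")
    return base, (suffix if sep else None)


def mismatches(installed, expected=EXPECTED):
    """Return {component: (installed, expected)} for every differing entry."""
    diffs = {}
    for component, want in expected.items():
        have = installed.get(component)
        if have != want:
            diffs[component] = (have, want)
    return diffs
```

A component that is absent from the `installed` mapping is reported with `None` as its installed version, so a partially filled mapping still produces a complete report.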
Compatibility
NVIDIA has validated and tested the DGX Software version EL7-20.02 on the following systems:
NVIDIA DGX-2 with Red Hat Enterprise Linux 7 and CentOS
NVIDIA DGX-1 (Tesla V100) with Red Hat Enterprise Linux 7 and CentOS
NVIDIA DGX Station with Red Hat Enterprise Linux 7 and CentOS
NVIDIA acknowledges the wide use of CentOS and understands that it is a community-developed derivative of the NVIDIA-supported Red Hat Enterprise Linux. Support for CentOS is available directly from the CentOS community. NVIDIA ensures that NVIDIA-provided software runs on tested CentOS versions and will try to identify and correct issues related to NVIDIA-provided software.
Update Instructions
See the section Installing and Updating the Software <install-update> for instructions.
Resolved Issues
DGX-1, DGX-2: NVSM Services May Fail to Load
DGX-2, DGX-1, DGX Station: Docker GPU Containers Cannot be Run
DGX-1, CentOS: NVSM CLI and API Reports Incorrect DGX-1 Serial Number
Known Issues
DGX-2: Unable to Boot from Degraded OS RAID 1 Array
DGX-2: Ubuntu Appears as a Boot Option
DGX-1: DKMS May not Build for New Kernel During Driver Update
DGX-1: NVSM Storage Alerts are Cleared When All Data Drives are Removed
DGX-1: Black screen on BMC Remote Console with Red Hat Enterprise Linux 7.5
DGX Station: The Symbolic Link to /usr/local/cuda Is Missing
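For the last known issue in the list, a quick way to see whether a system is affected is to inspect the path directly. The sketch below only reports the state of /usr/local/cuda and does not attempt a fix. The function name and return values are illustrative assumptions.

```python
import os


def cuda_link_status(link="/usr/local/cuda"):
    """Return (state, target) where state is 'link', 'not-a-link', or 'missing'."""
    if os.path.islink(link):
        # readlink returns the stored target, which may be a relative path.
        return "link", os.readlink(link)
    if os.path.exists(link):
        return "not-a-link", None
    return "missing", None


if __name__ == "__main__":
    print(cuda_link_status())
```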