DGX Software For Red Hat Enterprise Linux 7 Release Notes
NVIDIA Base OS

Version EL7-19.10

The DGX Software for Red Hat Enterprise Linux 7, EL7-19.10 update, is available. You must enable the update repository in order to obtain this update.

Important

Installing or updating to EL7-19.10 also updates the installed Red Hat Enterprise Linux 7 distribution to the latest version. If you need the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED), verify before installing or updating to EL7-19.10 that an MLNX_OFED package version supporting the latest Red Hat Enterprise Linux 7 version is available.

If a supporting MLNX_OFED package has been released, then be sure to install it.
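Before updating, it can help to record the running kernel, the installed MLNX_OFED version, and whether a kernel update is pending, so that you can compare them against the kernels that the available MLNX_OFED packages support. The following Bash sketch is not an NVIDIA-provided tool. It assumes yum's check-update output format (package name in the first column, version in the second) and that MLNX_OFED, when installed, provides the ofed_info command. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-update check (a sketch, not an NVIDIA-provided tool).
# Assumes yum and, when MLNX_OFED is installed, its ofed_info command.

# Reads "yum check-update" output on stdin and prints the version of a
# pending kernel update, if any. Assumed line format:
#   kernel.x86_64    <version>    <repository>
pending_kernel_version() {
  awk '$1 == "kernel.x86_64" { print $2 }'
}

# Prints the running kernel, the installed MLNX_OFED version, and any
# pending kernel update so they can be compared with MLNX_OFED support.
report_ofed_readiness() {
  echo "Running kernel: $(uname -r)"

  if command -v ofed_info >/dev/null 2>&1; then
    echo "Installed MLNX_OFED: $(ofed_info -s)"
  else
    echo "Installed MLNX_OFED: none found (ofed_info not on PATH)"
  fi

  # yum check-update exits with 100 when updates exist; the pipeline
  # status comes from awk, so that exit code does not stop the script.
  local pending
  pending=$(yum -q check-update kernel 2>/dev/null | pending_kernel_version)
  echo "Pending kernel update: ${pending:-none}"
}
```

For example, source the file and run report_ofed_readiness before running sudo yum update.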

Update Repository

The update repository was created for updating the NVIDIA driver to the R418 driver branch, CUDA 10.1, and other software packages associated with CUDA 10.1.

These updates are available only if you have enabled the update repository. See the document DGX-Software-Stack-for-Red-Hat-Enterprise-Linux-on-DGX <https://enterprise-support.nvidia.com/s/announcement/a4z1W000000WlMmQAK/dgx-software-stack-for-red-hat-enterprise-linux-on-dgx> (available to DGX customers with an NVIDIA Enterprise Support account) for instructions on updating the NVIDIA repositories.

Change Highlights

The following changes were made to the update repository.

  • Added NVSM version 19.08.

    See also the list of resolved issues.

  • Updated DCGM to version 1.7.1.

  • Updated NVIDIA GPU driver to version 418.87.01.

    Resolved a driver issue that caused the GPU to hang.

Software Contents

The following table provides version information for software included in the DGX Software Stack for Red Hat Enterprise Linux 7.

Note

Unlike the DGX OS shipped with the NVIDIA DGX system, the DGX software stack for Red Hat does not include the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED). This is because the MLNX_OFED kernel modules are likely to fall out of sync with the Red Hat distribution kernel, which can result in system instability. To use InfiniBand on the DGX system, see the DGX Software for Red Hat Enterprise Linux 7 Installation Guide <https://docs.nvidia.com/dgx/dgx-rhel-install-guide/installing-ib-drivers.html#installing-ib-drivers> for instructions.

Table 16. Contents of the Update Repository

Component                           Version
----------------------------------  -------------------
GPU Driver                          418.87.01
NVIDIA System Management (NVSM)     nvsm-cli 19.08.5
                                    nvsm-dshm 19.08.5
                                    nvsm-apis 19.08.8
                                    nvsm-health 19.08.6
Data Center GPU Manager (DCGM)      1.7.1
NCCL Runtime                        2.4.7+cuda10.1
cuDNN Library Runtime               7.6.2+cuda10.1
TensorRT                            5.1.5+cuda10.1
CUDA Toolkit                        10.1.243
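To confirm that a system matches the versions listed in Table 16, the following Bash sketch compares a few installed components against the table. It assumes GNU sort (for the -V and -C options), nvidia-smi for the driver version, and that the NVSM components are installed as RPM packages named as in the table (nvsm-cli, nvsm-dshm, nvsm-apis, nvsm-health); those package names are an assumption, as are the function names.

```shell
#!/usr/bin/env bash
# Sketch: compare installed versions with the EL7-19.10 table.
# Assumes GNU sort, nvidia-smi, and RPM package names matching the
# NVSM rows of the table (the package names are an assumption).

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  printf '%s\n' "$2" "$1" | sort -V -C
}

# Prints one status line for a component.
compare_version() {
  local name=$1 installed=$2 expected=$3
  if [[ -z $installed ]]; then
    echo "$name: not found (expected $expected)"
  elif version_ge "$installed" "$expected"; then
    echo "$name: $installed (meets $expected)"
  else
    echo "$name: $installed is older than $expected"
  fi
}

# Prints the installed version of an RPM package, or nothing if it is
# missing. rpm reports a missing package on stdout, so check first.
rpm_version() {
  if command -v rpm >/dev/null 2>&1 && rpm -q "$1" >/dev/null 2>&1; then
    rpm -q --qf '%{VERSION}\n' "$1" | head -n 1
  fi
}

check_el7_19_10() {
  local driver="" entry pkg
  if command -v nvidia-smi >/dev/null 2>&1; then
    driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  fi
  compare_version "GPU driver" "$driver" "418.87.01"

  # name:expected pairs taken from the NVSM rows of the table.
  for entry in nvsm-cli:19.08.5 nvsm-dshm:19.08.5 nvsm-apis:19.08.8 nvsm-health:19.08.6; do
    pkg=${entry%%:*}
    compare_version "$pkg" "$(rpm_version "$pkg")" "${entry#*:}"
  done
}
```

A version newer than the table is reported as meeting it rather than as a mismatch, because later updates from the repository may supersede these versions. Run check_el7_19_10 after the update to print one line per component.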

Compatibility

NVIDIA has validated and tested DGX Software version EL7-19.10 on the following systems:

  • NVIDIA DGX-2 with Red Hat Enterprise Linux 7 and CentOS

  • NVIDIA DGX-1 (Tesla V100) with Red Hat Enterprise Linux 7 and CentOS

NVIDIA acknowledges the wide use of CentOS and understands that it is a community-developed derivative of the NVIDIA-supported Red Hat Enterprise Linux. Support for CentOS itself is available directly from the CentOS community. NVIDIA ensures that NVIDIA-provided software runs on the tested CentOS versions and will try to identify and correct issues related to NVIDIA-provided software.
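Whether a host runs Red Hat Enterprise Linux 7 or CentOS 7 can be read from /etc/os-release, whose ID field is rhel on Red Hat Enterprise Linux and centos on CentOS. The following Bash sketch inspects only the operating system; it does not identify the DGX model, and the function names and messages are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify the host OS against the EL7-19.10 compatibility list.
# Reads an os-release file (default /etc/os-release); names are hypothetical.

# Prints "<ID> <VERSION_ID>" from the given os-release file.
os_release_info() {
  local file=${1:-/etc/os-release}
  # Source in a subshell so the os-release variables do not leak out.
  ( . "$file" && echo "${ID} ${VERSION_ID}" )
}

# Prints one line describing how the host relates to the tested platforms.
classify_platform() {
  local id version
  read -r id version < <(os_release_info "$@")

  case "$id" in
    rhel|centos) ;;
    *) echo "${id:-unknown} ${version}: not a tested platform for EL7-19.10"; return ;;
  esac

  if [[ $version != 7* ]]; then
    echo "$id $version: EL7-19.10 targets release 7"
  elif [[ $id == rhel ]]; then
    echo "Red Hat Enterprise Linux $version: tested and supported by NVIDIA"
  else
    echo "CentOS $version: tested; OS support comes from the CentOS community"
  fi
}
```

Calling classify_platform with no argument reads /etc/os-release on the local host.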

Update Instructions

  • For installing on a fresh DGX system, see the DGX Software for Red Hat Enterprise Linux 7 - Installation Guide <https://docs.nvidia.com/dgx/dgx-rhel-install-guide/index.html> or the DGX Software for CentOS - Installation Guide <https://docs.nvidia.com/dgx/dgx-centos-install-guide/index.html>.

  • To obtain additional updates, run the following command:


        sudo yum update

    The updates will depend on which repositories you have enabled. See the document DGX-Software-Stack-for-Red-Hat-Enterprise-Linux-on-DGX <https://enterprise-support.nvidia.com/s/announcement/a4z1W000000WlMmQAK/dgx-software-stack-for-red-hat-enterprise-linux-on-dgx> (available to DGX customers with an NVIDIA Enterprise Support account) for instructions on updating the NVIDIA repositories.
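Because the available updates depend on which repositories are enabled, it can be useful to list the enabled repositories and preview pending updates before running sudo yum update. This Bash sketch relies on documented yum behavior: yum check-update exits with 100 when updates are available, 0 when none are, and 1 on error. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: preview updates before running "sudo yum update".
# Function names are hypothetical; yum behavior is as documented in yum(8).

# Maps the exit status of "yum check-update" to a short message.
describe_check_update_status() {
  case "$1" in
    0)   echo "no updates available" ;;
    100) echo "updates available" ;;
    *)   echo "yum check-update failed (status $1)" ;;
  esac
}

# Lists enabled repositories, then the packages that would be updated.
preview_updates() {
  yum repolist enabled
  local status=0
  # Capture the exit status without aborting under "set -e".
  yum check-update || status=$?
  describe_check_update_status "$status"
}
```

For example, run preview_updates, review the kernel and NVIDIA packages in the list, and then run sudo yum update.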

Resolved Issues

  • DGX-2: NVSM is unable to detect PSU and fan sensors with BMC v1.05.07 (due to updated sensor names).

  • DGX-2: NVSM erroneously reports PSUs and fans as unhealthy after updating the BMC to version 1.05.07

  • DGX-1: Failure Reading Sector 0x0 May Occur on Reboot

Known Issues

  • DGX-2: NVSM Error Occurs When Accessing Systems/Localhost

  • DGX-2: Ubuntu Appears as a Boot Option

  • DGX-1: DKMS May Not Build for New Kernel During Driver Update

  • DGX-1: NVSM Storage Alerts are Cleared When All Data Drives are Removed

  • Black Screen on BMC Remote Console with Red Hat Enterprise Linux 7.5

  • DGX-1/CentOS: NVSM CLI and API Report Incorrect DGX-1 Serial Number

© Copyright 2022-2023, NVIDIA. Last updated on Jun 27, 2023.