Version EL7-19.07

The DGX Software for Red Hat Enterprise Linux 7, Version EL7-19.07, is available.

Change Highlights

  • Optional Installation Repository

    Added an optional repository for updating the NVIDIA driver to the R418 driver branch, CUDA 10.1, and other software packages associated with CUDA 10.1. The default repository updates to the component versions listed in the table Contents of Default Repository for EL7-19.07, and the optional repository updates to the component versions listed in the table Contents of Optional Repository for EL7-19.07.

  • NVSM Updated to Version 19.06

    Implemented the following NVSM updates:

      • Storage reporting: NVSM now reports the physical slot number, in addition to the device name, of a failed storage device.

      • The NVSM APIs are based on the OpenAPI project.

      • General bug fixes

Software Contents

The following tables provide version information for the software included in the default and optional repositories of the DGX Software Stack for Red Hat Enterprise Linux 7.

Note

Unlike the DGX OS shipped with the NVIDIA DGX system, the DGX Software Stack for Red Hat does not include the Mellanox OpenFabrics Enterprise Distribution (MLNX_OFED) for Linux. The MLNX_OFED kernel modules are likely to fall out of sync with the Red Hat distribution kernel, which can result in system instability. To use InfiniBand on the DGX system, see the DGX Software for Red Hat Enterprise Linux 7 Installation Guide <https://docs.nvidia.com/dgx/dgx-rhel-install-guide/index.html> for instructions.

Table 21. Contents of Default Repository for EL7-19.07

Component                              Version
-------------------------------------  -------------------
GPU Driver                             410.104
NVIDIA System Management (NVSM)        nvsm-cli 19.06.5-1
                                       nvsm-dshm 19.06-2
                                       nvsm-apis 19.06.9-1
                                       nvhealth 19.06.8-1
Data Center GPU Manager (DCGM)         1.5.9
NCCL Runtime                           2.4.7+cuda10.0
cuDNN Library Runtime                  7.6.0.64-1+cuda10.0
TensorRT                               5.1.5+cuda10.0
CUDA Toolkit                           10.0-130

Table 22. Contents of Optional Repository for EL7-19.07

Component                              Version
-------------------------------------  -------------------
GPU Driver                             418.67
NVIDIA System Management (NVSM)        Not included. Components are updated from the default repository.
Data Center GPU Manager (DCGM)         1.6.5
NCCL Runtime                           2.4.7+cuda10.1
cuDNN Library Runtime                  7.6.0.64-1+cuda10.1
TensorRT                               5.1.5+cuda10.1
CUDA Toolkit                           10.1.168

Compatibility

NVIDIA has tested and validated DGX Software version EL7-19.07 on the following systems:

  • NVIDIA DGX-2 with Red Hat Enterprise Linux 7.6 and CentOS

  • NVIDIA DGX-1 (Tesla V100) with Red Hat Enterprise Linux 7.6 and CentOS

NVIDIA acknowledges the wide use of CentOS and understands that it is a community-developed derivative of the NVIDIA-supported Red Hat Enterprise Linux. Support for CentOS is available directly from the CentOS community. NVIDIA ensures that NVIDIA-provided software runs on tested CentOS versions and will try to identify and correct issues related to NVIDIA-provided software.

Update Instructions

  • To install on a fresh DGX system, see the DGX Software for Red Hat Enterprise Linux 7 - Installation Guide <https://docs.nvidia.com/dgx/dgx-rhel-install-guide/index.html>.

  • To obtain additional updates, run the following command:

sudo yum update

The updates will depend on which repositories you have enabled.

See the document DGX-Software-Stack-for-Red-Hat-Enterprise-Linux-on-DGX <https://npncommunity.force.com/ESPCommunity/s/announcement/a4z1W000000WlMmQAK/dgx-software-stack-for-red-hat-enterprise-linux-on-dgx> (available to DGX customers with an NVIDIA Enterprise Support account) for instructions on updating the NVIDIA repositories.

See the section Change Highlights for an explanation of the two repositories.
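The default and optional repositories install different driver branches: 410.104 in Contents of Default Repository and 418.67 in Contents of Optional Repository. Checking the installed driver version is therefore one rough way to tell which repository a system was updated from. The sketch below is not an NVIDIA-provided tool. The function name `driver_repo` is hypothetical, and the sketch makes these assumptions:

  • `nvidia-smi` is on the PATH.

  • All GPUs run the same driver, so only the first line of output is examined.

  • Only the two driver branches in this release need to be recognized. Any other version is reported as unknown.

```shell
# Hypothetical helper (not part of the DGX software): report which repository's
# driver branch appears to be installed, based on the EL7-19.07 tables
# (410.104 from the default repository, 418.67 from the optional repository).
driver_repo() {
  local version
  # Assumes every GPU uses the same driver, so only the first line is checked.
  version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  # Keep only the major version (for example, 418.67 -> 418).
  case "${version%%.*}" in
    410) echo "default (R410, CUDA 10.0)" ;;
    418) echo "optional (R418, CUDA 10.1)" ;;
    *)   echo "unknown driver version: ${version}" ;;
  esac
}
```

For example, after you define the function, running `driver_repo` on a system updated from the optional repository prints `optional (R418, CUDA 10.1)`.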

Fixed Issues

  • DGX-1: DSHM Does Not Clear Alerts After RAID 0 Rebuild

  • DGX-2: NVSM Does Not Raise an Alert When the EFI Directory is Modified

  • DGX-2: NVSM Reports ‘Unsupported Drive’ Alerts During RAID 1 Rebuild

  • DGX-2: NVSM EFI Sync Hangs on CentOS

Known Issues

  • DGX-2: Ubuntu Appears as a Boot Option

  • DGX-1: DKMS May Not Build for New Kernel During Driver Update

  • DGX-1: NVSM Storage Alerts are Cleared When All Data Drives are Removed

  • Black Screen on BMC Remote Console with Red Hat Enterprise Linux 7.5

  • DGX-1: NVSM CLI Returns HTTP Code 500 Error After Hot-Plugging a Previously Removed SSD

© Copyright 2022-2023, NVIDIA. Last updated on Jun 27, 2023.