Introduction

The NVIDIA® DGX™ servers (DGX-1 and DGX-2) ship with DGX™ OS, which incorporates the NVIDIA DGX software stack built on the Ubuntu Linux distribution. Instead of running Ubuntu, you can run Red Hat Enterprise Linux on the DGX system and still take advantage of the advanced DGX features.

This document explains how to install and configure the NVIDIA DGX software stack on DGX systems installed with Red Hat Enterprise Linux.

Note: The NVIDIA DGX software stack for Red Hat Enterprise Linux is currently not supported on the NVIDIA DGX Station™ workstation.
Note: While it may be possible to use other derived Linux distributions besides Red Hat Enterprise Linux, not all have been tested and qualified by NVIDIA. Refer to the DGX Software for Red Hat Enterprise Linux 7 Release Notes for the list of tested and qualified software and Linux distributions.

Prerequisites

The following are required (or recommended where indicated).

Red Hat Subscription

You need a Red Hat subscription if you plan to install and use Red Hat Enterprise Linux 7 on the DGX. A subscription also lets you obtain update packages and additional packages for Red Hat Enterprise Linux. You can either purchase a subscription or obtain a free evaluation subscription from the Red Hat Software & Download Center.

Access to Repositories

The repositories can be accessed from the internet. If your installation does not allow connection to the internet, see the section Installing Software on Air-Gapped NVIDIA DGX Systems for information about updating software on “air-gapped” systems.

Note:

You can use yum-config-manager to conveniently enable certain repositories. To use yum-config-manager, first install the yum utilities.

sudo yum -y install yum-utils 
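
To confirm that the utility is on the PATH before relying on it, a minimal check could look like the following sketch. The helper name have_cmd is hypothetical and used only for illustration; command -v is the standard POSIX shell lookup.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether yum-config-manager (provided by yum-utils) is present.
if have_cmd yum-config-manager; then
    echo "yum-config-manager is available"
else
    echo "yum-config-manager not found; install yum-utils first"
fi
```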

NVIDIA Repositories

  • NVIDIA DGX Software Repository

    After installing Red Hat Enterprise Linux on the DGX-1 system, you must enable the NVIDIA DGX software repository. Instructions are provided in the document DGX-Software-Stack-for-Red-Hat-Enterprise-Linux-on-DGX-1 (available to DGX customers with an NVIDIA Enterprise Support account).

Red Hat Repositories

Installing the DGX software on Red Hat Enterprise Linux 7 requires access to the following additional Red Hat repositories.

  • Red Hat Enterprise Server Extras Repository: rhel-7-server-extras-rpms

    Required for container support.

  • Red Hat Enterprise Server Optional Repository: rhel-7-server-optional-rpms

    Required by NVIDIA System Manager (NVSM) and the GPU driver.

  • Red Hat Software Collections Repository: rhel-server-rhscl-7-rpms

    Required by the NVSM tool for Python 3. If you do not have access to the Red Hat Software Collections repository, refer to https://access.redhat.com/solutions/472793 for instructions on requesting free access.
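
The three Red Hat repositories listed above can be enabled with yum-config-manager. The following is a sketch under stated assumptions rather than an official procedure: the function name enable_repos and the DRY_RUN variable are hypothetical, and the script only prints the commands unless DRY_RUN is set to 0. On systems registered through Red Hat Subscription Manager, subscription-manager repos --enable=<repository> may be the more appropriate command.

```shell
#!/bin/sh
# Repository IDs taken from the Red Hat Repositories list in this section.
REPOS="rhel-7-server-extras-rpms rhel-7-server-optional-rpms rhel-server-rhscl-7-rpms"

# enable_repos is a hypothetical helper.
# By default (DRY_RUN=1) it only prints each command; set DRY_RUN=0 to run them.
enable_repos() {
    for repo in $REPOS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sudo yum-config-manager --enable $repo"
        else
            sudo yum-config-manager --enable "$repo"
        fi
    done
}

enable_repos
```

After enabling the repositories for real (DRY_RUN=0), running yum repolist enabled is one way to confirm that they appear in the list.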

Network File System

A network file system (NFS) is recommended to take advantage of the cache file system provided by the DGX software stack.

BMC Password

The DGX BMC comes with default login credentials as specified in Appendix B: Changing the BMC Login.

Important:

NVIDIA recommends disabling the default username and creating a unique BMC username and strong password as soon as possible. Refer to Appendix B: Changing the BMC Login for instructions.