Introduction

The NVIDIA® DGX™ systems (DGX-1, DGX-2, and DGX A100 servers, and NVIDIA DGX Station™ and DGX Station A100 systems) are shipped with DGX™ OS, which incorporates the NVIDIA DGX software stack built on the Ubuntu Linux distribution. Instead of running Ubuntu, you can run CentOS on the DGX system and still take advantage of the advanced DGX features.

This document explains how to install and configure the NVIDIA DGX software stack on DGX systems installed with CentOS.

Important: NVIDIA acknowledges the wide use of CentOS and understands that it is a community-developed derivative of Red Hat Enterprise Linux, which NVIDIA supports. Support for CentOS is available directly from the CentOS community. NVIDIA ensures that NVIDIA-provided software runs on tested CentOS versions and will try to identify and correct issues related to NVIDIA-provided software.
Note: While it may be possible to use other derived Linux distributions besides CentOS, not all have been tested and qualified by NVIDIA. Refer to the DGX Software for Red Hat Enterprise Linux 7 Release Notes for the list of tested and qualified software and Linux distributions.

Prerequisites

The following are required (or recommended where indicated).

Access to Repositories

The repositories are hosted on the internet, so the DGX system must be able to reach them.

If you are using a proxy server, follow the instructions in the section Configuring a System Proxy to make sure the system can access the necessary URIs.

Note:

You can use yum-config-manager to conveniently enable certain repositories. It is provided by the yum-utils package, so install that package first:

sudo yum -y install yum-utils 
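As a minimal sketch, assuming a CentOS 7 host with yum-utils already installed, the two bash functions below enable a repository by its ID with yum-config-manager --enable and then check that the ID appears in the output of yum repolist enabled. The function names are hypothetical, and the parsing assumes the CentOS 7 repolist layout, in which the first column holds the repository ID, optionally prefixed with ! or * and suffixed with /release/arch.

```shell
#!/usr/bin/env bash
# Sketch: enable a yum repository by ID and confirm that it is enabled.
# Assumes yum-utils (which provides yum-config-manager) is installed.
# Function names are hypothetical helpers, not part of any NVIDIA tool.

# enable_repo REPO_ID: set enabled=1 for the repository in its .repo file.
enable_repo() {
    local repo_id="$1"
    sudo yum-config-manager --enable "$repo_id"
}

# repo_is_enabled REPO_ID: succeed only if REPO_ID is in the enabled list.
repo_is_enabled() {
    local repo_id="$1"
    # Take the first column, strip a leading "!" or "*" marker and any
    # "/release/arch" suffix, then look for an exact match on the ID.
    yum -q repolist enabled \
        | awk '{print $1}' \
        | sed -e 's|^[!*]||' -e 's|/.*||' \
        | grep -qxF -- "$repo_id"
}

# Example:
#   enable_repo centos-sclo-rh-testing && repo_is_enabled centos-sclo-rh-testing
```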

NVIDIA Repositories

  • NVIDIA DGX Software Repository

    After installing CentOS on the DGX system, you must enable the NVIDIA DGX software repository. The repository includes the NVIDIA drivers and software for supporting DGX systems.

    See the section Enabling the Repositories for instructions on how to enable the repositories.

CentOS Repositories

Installing the DGX software on CentOS requires access to the following additional repositories.

  • CentOS Software Collections Repository: centos-release-scl

    This repository is required by the NVSM tool for Python 3.

  • CentOS Testing Repository: centos-sclo-rh-testing

    This repository is required by the NVSM tool for Python 3.
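The two CentOS repositories above could be set up roughly as follows. This is a sketch only, assuming CentOS 7 with yum-utils installed; the function name is hypothetical, and the section Enabling the Repositories remains the authoritative procedure. Installing the centos-release-scl package adds the Software Collections .repo definitions, in which the testing repository is disabled by default, so the sketch enables it explicitly.

```shell
#!/usr/bin/env bash
# Sketch: make the CentOS Software Collections repositories available.
# Assumes CentOS 7 with yum-utils installed; the function name is hypothetical.

setup_centos_scl_repos() {
    # Install the release package that ships the SCL .repo definitions.
    sudo yum -y install centos-release-scl || return 1
    # The testing repository is defined but disabled by default; enable it
    # because the NVSM tool needs it for Python 3.
    sudo yum-config-manager --enable centos-sclo-rh-testing
}

# Example:
#   setup_centos_scl_repos
```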

Network File System

On DGX servers, the data drives are meant to be used as a cache. DGX Station users can use them the same way or, alternatively, use them for storage. When the data drives are used as a cache, a network file system (NFS) is recommended so that you can take advantage of the cache file system provided by the DGX software stack.
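If the cache file system provided by the DGX software stack is the Linux FS-Cache facility managed by the cachefilesd service (an assumption here, not something this section states), NFS mounts only use the cache when they carry the fsc mount option. The hypothetical helper below prints an /etc/fstab entry with that option; the server name, export path, and mount point in the example are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build an /etc/fstab line for an NFS share that opts in to
# client-side caching through the "fsc" mount option.
# Assumes the cache is FS-Cache (cachefilesd); the helper name is hypothetical.

# nfs_fstab_entry SERVER EXPORT_PATH MOUNT_POINT
nfs_fstab_entry() {
    local server="$1" export_path="$2" mount_point="$3"
    # Fields: device, mount point, type, options, dump, fsck pass.
    printf '%s:%s %s nfs rw,fsc 0 0\n' "$server" "$export_path" "$mount_point"
}

# Example (placeholder names):
#   nfs_fstab_entry nfs.example.com /export/data /mnt/data | sudo tee -a /etc/fstab
#   sudo mount /mnt/data
```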

BMC Password

The DGX BMC comes with default login credentials as specified in Appendix B: Changing the BMC Login.

Important:

NVIDIA recommends disabling the default username and creating a unique BMC username and strong password as soon as possible. Refer to Appendix B: Changing the BMC Login for instructions.
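Appendix B describes the supported procedure for changing the BMC login. As an illustrative sketch only, a similar change can be made in-band from the host with ipmitool, assuming the ipmitool package and the IPMI kernel drivers are available. The user IDs, account name, and LAN channel number 1 are hypothetical; list the existing accounts first with sudo ipmitool user list 1. Passing the password as an argument briefly exposes it in the process list, so the example reads it with read -rs to at least keep it out of shell history.

```shell
#!/usr/bin/env bash
# Sketch: create a BMC administrator and disable another account with ipmitool.
# User IDs, names, and the channel number are hypothetical; function names are
# illustrative helpers, not part of any NVIDIA tool.

# create_bmc_admin USER_ID USER_NAME PASSWORD [CHANNEL]
create_bmc_admin() {
    local user_id="$1" user_name="$2" password="$3" channel="${4:-1}"
    sudo ipmitool user set name "$user_id" "$user_name" || return 1
    sudo ipmitool user set password "$user_id" "$password" || return 1
    # Privilege level 4 is ADMINISTRATOR in the IPMI specification.
    sudo ipmitool user priv "$user_id" 4 "$channel" || return 1
    sudo ipmitool user enable "$user_id"
}

# disable_bmc_user USER_ID: disable an account, such as the default one.
disable_bmc_user() {
    sudo ipmitool user disable "$1"
}

# Example (hypothetical IDs and name):
#   read -rsp 'New BMC password: ' bmc_pw; echo
#   create_bmc_admin 3 dgxadmin "$bmc_pw"
#   disable_bmc_user 2
```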