DGX Software for Red Hat Enterprise Linux 10#

The DGX Software for Red Hat Enterprise Linux 10 User Guide is also available as a PDF.

Introduction#

This document explains how to install and configure the NVIDIA BaseOS 8 software on DGX systems on which the user has installed Red Hat Enterprise Linux 10. NVIDIA BaseOS defines a configuration of the Linux operating system for running AI, machine learning, and analytics applications. This guide provides the information needed to configure and optimize systems for these workloads.

Derived from the proven software stack embedded in DGX OS, NVIDIA BaseOS comprises additional software as well as hardware and software configurations. The software, which is continuously tested and qualified, includes libraries, utilities, and diagnostic and monitoring tools. The hardware and software configurations are provided as packages, cookbooks, and scripts.

NVIDIA BaseOS may be extended to support other RHEL releases and other distributions.

The main changes from the DGX Software for previous RHEL releases are:

  • Branding:

    • “BaseOS” refers to the software that is released on top of, and independently from, the OS distribution

  • Functional:

    • Move to TuneD for system configurations as much as possible

    • Sort packages into different repositories based on whether packages are generic or DGX-specific

Note

NVIDIA acknowledges the wide use of Rocky Linux and understands that it is a community-developed derivative of the NVIDIA-supported Red Hat Enterprise Linux. Support for Rocky Linux is available directly from the Rocky Linux community. NVIDIA ensures that NVIDIA-provided software runs on tested Rocky Linux versions and will try to identify and correct issues related to NVIDIA-provided software.

While it might be possible to use other derived Linux distributions, not all have been tested and qualified by NVIDIA. Refer to the Release Notes for the list of tested and qualified software and Linux distributions.

Items Needed Before You Begin#

The following are required (or recommended wherever indicated).

Red Hat Subscription#

You need a Red Hat subscription if you plan to install and use Red Hat Enterprise Linux on the DGX system. A subscription also lets you obtain update packages and additional packages for Red Hat Enterprise Linux. You can either purchase a subscription or obtain a free evaluation subscription from the Red Hat Software & Download Center.

Access to Repositories#

The repositories listed in this section are accessed over the internet.

If you are using a proxy server, follow the instructions in the section Configuring a System Proxy to make sure the system can access the necessary URIs.
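As a quick sanity check before installation, the following minimal Python sketch reports which proxy the standard library would use for the NVIDIA repository URL given later in this section, and whether that URL answers at all. It is an illustration, not part of the DGX software: the function names `proxy_for` and `is_reachable` are hypothetical, and the sketch assumes the proxy is configured through the usual `https_proxy` and `no_proxy` environment variables.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# URL taken from the "NVIDIA and DGX Repositories" item in this section.
NVIDIA_REPO_URL = "https://repo.download.nvidia.com"


def proxy_for(url):
    """Return the proxy URL urllib would use for `url`, or None if it goes direct."""
    parts = urlsplit(url)
    # Hosts matched by no_proxy bypass the proxy entirely.
    if parts.hostname and urllib.request.proxy_bypass(parts.hostname):
        return None
    # getproxies() maps a scheme ("http", "https") to the configured proxy URL.
    return urllib.request.getproxies().get(parts.scheme)


def is_reachable(url, timeout=10.0):
    """Return True if a server answers at `url`, even with an HTTP error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied (for example 403 or 404), so the network path works.
        return True
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return False
```

Calling `proxy_for(NVIDIA_REPO_URL)` followed by `is_reachable(NVIDIA_REPO_URL)` from the DGX system shows whether requests would go through the proxy and whether they succeed. A `False` result usually means the proxy settings need attention.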

  • Red Hat Repositories

    Installation of the DGX Software over Red Hat Enterprise Linux 10 requires access to the following Red Hat repositories:

    • Red Hat Enterprise BaseOS Repository: rhel-10-for-x86_64-baseos-rpms

    • Red Hat Enterprise AppStream Repository: rhel-10-for-x86_64-appstream-rpms

    • Red Hat Enterprise CodeReady Linux Builder Repository: codeready-builder-for-rhel-10-x86_64-rpms

  • NVIDIA and DGX Repositories

    After installing Red Hat Enterprise Linux on the DGX system, enable the NVIDIA software repository and the DGX software repository (https://repo.download.nvidia.com). These repositories include software for supporting DGX systems.

    See the Enabling the NVIDIA and DGX Software Repositories and Installing Required Components section for instructions on how to enable these repositories.
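To illustrate how the Red Hat repository requirement above might be verified, the sketch below parses the text printed by `subscription-manager repos --list-enabled` and lists any required repository that is not yet enabled. It is a hedged example only: the helper names are hypothetical, the parser assumes the command prints one `Repo ID:` line per repository (English locale), and the CodeReady Linux Builder entry assumes the binary RPM repository ID without a `-debug` suffix.

```python
import subprocess

# Required Red Hat repositories. The CodeReady Builder ID is assumed to be the
# binary RPM repository (no "-debug" suffix); adjust the tuple if yours differs.
REQUIRED_REPOS = (
    "rhel-10-for-x86_64-baseos-rpms",
    "rhel-10-for-x86_64-appstream-rpms",
    "codeready-builder-for-rhel-10-x86_64-rpms",
)


def enabled_repo_ids(listing):
    """Extract repository IDs from `subscription-manager repos` output text."""
    ids = set()
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Repo ID":
            ids.add(value.strip())
    return ids


def missing_repos(listing, required=REQUIRED_REPOS):
    """Return the required repository IDs that are absent from the listing."""
    enabled = enabled_repo_ids(listing)
    return [repo for repo in required if repo not in enabled]


def enable_commands(missing):
    """Build (but do not run) one subscription-manager command per missing repo."""
    return [["subscription-manager", "repos", f"--enable={repo}"] for repo in missing]


def suggest_enable_commands():
    """Query the live system and print the commands that would enable missing repos."""
    result = subprocess.run(
        ["subscription-manager", "repos", "--list-enabled"],
        capture_output=True, text=True, check=True,
    )
    for command in enable_commands(missing_repos(result.stdout)):
        print(" ".join(command))
```

Running `suggest_enable_commands()` as root on a registered system prints the commands without executing them, so the output can be reviewed before any repository is enabled.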

Network File System#

On DGX servers, the data drives are intended for use as a cache. When they are used this way, a network file system (NFS) is recommended so that the system can take advantage of the cache file system provided by NVIDIA BaseOS.

BMC Password#

The NVIDIA DGX server includes a baseboard management controller (BMC) for out-of-band management of the DGX system. NVIDIA recommends disabling the default username and creating a unique username and password as soon as possible.