DGX OS 5 / Ubuntu 20.04

Introduction

DGX™ OS provides a customized installation of Ubuntu Linux with system-specific optimizations and configurations, additional drivers, and diagnostic and monitoring tools. It provides a stable, fully tested, and supported OS for running AI, machine learning, and analytics applications on DGX supercomputers.

NVIDIA® DGX systems ship with DGX OS preinstalled to provide a turnkey solution for running AI and analytics workloads. Basic system configuration is deferred to a setup wizard on first boot, which gives users a fast onboarding experience with DGX systems.

The DGX OS installer is released as an ISO image that you can use to reimage a DGX system. The additional software included in DGX OS, the NVIDIA DGX Software Stack, is distributed as packages from software repositories over the internet.

You also have the option to install the NVIDIA DGX Software Stack on a standard Ubuntu 20.04 installation while still benefiting from the advanced DGX features. This installation method offers more flexibility, such as custom partition schemes. Cluster deployments also benefit from this method because they can take advantage of Ubuntu's standardized, automated, and non-interactive installation process.
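Before taking the Ubuntu installation path, it can help to confirm that the target host actually runs Ubuntu 20.04. The following is a minimal sketch, assuming the standard `/etc/os-release` file (freedesktop.org format), in which Ubuntu 20.04 reports `ID=ubuntu` and `VERSION_ID="20.04"`. The helpers `parse_os_release` and `is_ubuntu_2004` are hypothetical and are not part of DGX OS or the DGX Software Stack.

```python
"""Check whether a host runs Ubuntu 20.04 by reading /etc/os-release.

Hypothetical helper for illustration only; not shipped with DGX OS.
"""
import shlex
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse os-release KEY=value lines into a dict, removing shell-style quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex strips quoting such as VERSION_ID="20.04" -> 20.04
        fields[key] = " ".join(shlex.split(raw))
    return fields


def is_ubuntu_2004(fields: dict) -> bool:
    """Return True only for Ubuntu with VERSION_ID 20.04."""
    return fields.get("ID") == "ubuntu" and fields.get("VERSION_ID") == "20.04"


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        print(info.get("PRETTY_NAME", "unknown"), "->", is_ubuntu_2004(info))
    else:
        print("/etc/os-release not found; cannot determine the OS release")
```

Matching on `ID` and `VERSION_ID` rather than `PRETTY_NAME` avoids depending on the point-release string (for example, 20.04.6), which changes over the life of the release.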

DGX OS 5 Features

The following are the key features of DGX OS Release 5:

  • Based on Ubuntu 20.04 LTS

  • Includes Extended Security Maintenance updates from Ubuntu

  • Common ISO for all DGX systems

  • Option to manually install Ubuntu and the DGX Software Stack

  • DGX system-specific performance optimizations

  • NVIDIA System Management (NVSM): NVSM provides active health monitoring and system alerts for NVIDIA DGX nodes in a data center. It also provides simple commands for checking the health of DGX systems from the command line.

  • Data Center GPU Manager (DCGM): This software enables node-wide administration of GPUs and can be used for cluster-level and data-center-level management.

  • NVIDIA GPU driver, CUDA toolkit, and domain-specific libraries

  • Docker Engine

  • NVIDIA Container Toolkit

  • Cachefiles Daemon for caching NFS reads

  • Tools to convert data disks between RAID levels

  • Disk drive encryption and root filesystem encryption (optional)

  • Mellanox OpenFabrics Enterprise Distribution for Linux (MOFED) and Mellanox Software Tools (MST) for systems with Mellanox network cards

Overview

This document covers deployment and upgrade options for DGX OS. It also provides instructions for setting up the system and installing additional software.

  • Initial Setup

    If your system is already running DGX OS 5, you can skip to Initial Setup for instructions on setting up the system on first boot. Be sure to review Upgrading for information on upgrading the software to the latest versions.

  • Upgrading

    For instructions on upgrading DGX OS to the latest software versions, or on performing a release upgrade from DGX OS 5 to DGX OS 6, refer to Upgrading.

  • Reimaging

    In situations where you want to restore a DGX system to a default DGX OS installation and erase all data, you can use the ISO image that includes an autonomous installer. Refer to Reimaging for instructions.

  • Installing on Ubuntu

    If you want to install Ubuntu and then add the DGX Software Stack, you can find instructions in Installing on Ubuntu. That section also covers automating the installation process, for example for cluster deployments.

  • Additional Software

    DGX OS and Ubuntu provide many additional software packages from the repositories, including additional NVIDIA software and driver options. Refer to Additional Software for more information and installation instructions.

Important

Before you upgrade or install any new software, always consult the Release Notes for the latest information about available upgrades. For more information about the DGX OS release cadence and release methods, refer to Release Guidance.

Additional Documentation

Here are links to some additional DGX documentation.

  • DGX Documentation

    All documentation for DGX products, including product user guides, software release notes, and firmware update container information.

  • MIG User Guide

    The Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate GPU instances.

  • NGC Private Registry

    How to access the NGC container registry for using containerized, GPU-accelerated deep learning applications.

  • NVSM Software User Guide

    Contains instructions for using the NVIDIA System Management (NVSM) software.

  • DCGM Software User Guide

    Contains instructions for using the Data Center GPU Manager software.

NVIDIA Enterprise Support

NVIDIA Enterprise Support is the support resource for DGX customers and can assist with hardware, software, or NGC application issues. For more information about how to obtain support, visit NVIDIA Enterprise Support.