DGX OS 5.0 User Guide

This document describes the NVIDIA® DGX™ OS 5.0 software for DGX systems.

1. Introduction to the NVIDIA DGX OS 5.0 User Guide

The DGX OS is a customized Linux distribution that is based on Ubuntu Linux. It includes platform-specific configurations, diagnostic and monitoring tools, and the drivers that are required to provide a stable, tested, and supported OS to run AI, machine learning, and analytics applications on DGX systems.

DGX OS 5 includes the following features:

  • An Ubuntu 20.04 LTS distribution
  • One ISO for all DGX systems
  • NVIDIA System Management (NVSM)

    NVSM provides active health monitoring and system alerts for NVIDIA DGX nodes in a data center. It also provides simple commands to check the health of the DGX systems from the command line.

  • Data Center GPU Management (DCGM)

    This software enables node-wide administration of GPUs and can be used for cluster and data-center level management.

  • DGX system-specific support packages
  • NVIDIA GPU driver, CUDA toolkit, and domain specific libraries
  • Docker Engine
  • NVIDIA Container Toolkit
  • Cachefiles Daemon for caching NFS reads
  • Tools to convert data disks between RAID levels
  • Disk drive encryption and root filesystem encryption (optional)
  • Mellanox OpenFabrics Enterprise Distribution for Linux (MOFED) and Mellanox Software Tools (MST) for systems with Mellanox network cards

For more information, refer to the Release Notes section in the DGX documentation and locate the release notes for your DGX OS 5.x release.

1.1. Additional Documentation

Here are links to some additional DGX documentation.

  • DGX Documentation

    All documentation for DGX products, including product user guides, software release notes, firmware update container information, and best practices documentation.

  • MIG User Guide

    The new Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate GPU Instances for CUDA applications.

  • NGC Private Registry

    How to access the NGC container registry for using containerized deep learning GPU-accelerated applications on your DGX system.

  • NVSM Software User Guide

    Contains instructions for using the NVIDIA System Management software.

  • DCGM Software User Guide

    Contains instructions for using the Data Center GPU Manager software.

1.2. Customer Support

NVIDIA Enterprise Support is the support resource for DGX customers and can assist with hardware, software, or NGC application issues. For more information about how to obtain support, visit the NVIDIA Enterprise Support website.

2. Preparing for Operation

2.1. Software Installation and Setup

DGX OS 5 is preinstalled on new DGX systems. A setup wizard in the First Boot procedure requires you to create a user, set locales and keyboard layout, set passwords, and perform basic network configuration.

For systems that are running DGX OS version 4, you can upgrade the system to DGX OS 5 from network repositories (distribution upgrade) or reimage the system from the DGX OS 5 ISO image. The reimaging process installs the OS but defers the initial setup to the First Boot Process for DGX Servers or First Boot Process for DGX Station.

Note: If your system is already installed with DGX OS 5, you can continue to Initial DGX OS Setup.
There might be other situations where you need to reimage a system, such as the following:
  • When the OS becomes corrupt.
  • When the OS drive is replaced or both drives in a RAID-1 configuration are replaced.
  • When you want to encrypt the root filesystem.
  • When you want a fresh installation of DGX OS 5.
Important:

When you upgrade the OS, the configurations and data are preserved. Reimaging wipes the drives and, consequently, all configurations and data on the system.

2.2. Connecting to the DGX System

During the initial installation and configuration steps, you need to connect to the console of the DGX system.

There are several ways to connect to the DGX system, including the following:

  • Through a virtual keyboard, video, and mouse (KVM) in the BMC.
  • A direct connection with a local monitor and keyboard.

Refer to the appropriate DGX product user guide for a list of supported connection methods and specific product instructions.

3. Installing the DGX OS (Reimaging the System)

This section provides information about how to install the DGX OS.

Important: Installing DGX OS erases all data stored on the OS drives. This includes the /home partition, where all users' documents, software settings, and other personal files are stored. If you need to preserve data through the reimaging, you can move the files and documents to the /raid directory and install the DGX OS software with the option to preserve the RAID array content.

3.1. Installation Overview

Here is high-level information about how to install your DGX system.

  1. Obtain the latest DGX OS ISO image from NVIDIA Enterprise Support. See Obtaining the DGX OS ISO Image for more information.
  2. Install the DGX OS ISO image in one of the following ways:
    • Remotely through the BMC for systems that provide a BMC.
    • Locally from a UEFI-bootable USB flash drive or DVD-ROM.

3.2. Obtaining the DGX OS ISO

To ensure that you install the latest available version of DGX OS, obtain the current ISO image file from NVIDIA Enterprise Support.

Before you begin, ensure that you have an NVIDIA Enterprise Support account.
  1. Go to the DGX Software Firmware Download Matrix, then locate and click the announcement for the latest DGX OS 5 release for your system.
  2. Download the ISO image that is referenced in the release notification and save it to your local disk.
  3. To verify the integrity and authenticity of the image, note the MD5 value in the announcement.
  4. Run the md5sum command to print the MD5 hash and compare it with the value in the announcement.
    md5sum DGXOS-5.0.0-2020-09-21-15-40-02.iso
    e4c77338ed35d7a34e772d8552e9d080  DGXOS-5.0.0-2020-09-21-15-40-02.iso

3.3. Installing the DGX OS Image from a USB Flash Drive or DVD-ROM

After obtaining the DGX OS 5 ISO image from NVIDIA Enterprise Support, create a bootable installation medium, such as a USB flash drive or DVD-ROM, that contains the image.

3.3.1. Creating a Bootable USB Flash Drive by Using the dd Command

On a Linux system, you can use the dd command to create a bootable USB flash drive that contains the DGX OS software image.

Note: To ensure that the resulting flash drive is bootable, use the dd command to perform a device bit copy of the image. If you use other commands to perform a simple file copy of the image, the resulting flash drive may not be bootable.

Ensure that the following prerequisites are met:

  • The correct DGX OS software image is saved to your local disk.

    For more information, see Obtaining the Software ISO Image and Checksum File.

  • The USB flash drive meets the following requirements:
    • The USB flash drive has a capacity of at least 16 GB.
    • This requirement applies only to DGX A100: The partition scheme on the USB flash drive is a GPT partition scheme for UEFI.
  1. Plug the USB flash drive into one of the USB ports of your Linux system.
  2. Obtain the device name of the USB flash drive by running the lsblk command.
    lsblk

    You can identify the USB flash drive from its size, which is much smaller than the size of the SSDs in the DGX system, and from the mount points of any partitions on the drive, which are under /media.

    In the following example, the device name of the USB flash drive is sde.

    ~$ lsblk
    NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda      8:0    0   1.8T  0 disk 
    |_sda1   8:1    0   121M  0 part /boot/efi
    |_sda2   8:2    0   1.8T  0 part /
    sdb      8:16   0   1.8T  0 disk 
    |_sdb1   8:17   0   1.8T  0 part 
    sdc      8:32   0   1.8T  0 disk 
    sdd      8:48   0   1.8T  0 disk 
    sde      8:64   1   7.6G  0 disk 
    |_sde1   8:65   1   7.6G  0 part /media/deeplearner/DGXSTATION
    ~$
  3. As root, convert and copy the image to the USB flash drive.
    $ sudo dd if=path-to-software-image bs=2048 of=usb-drive-device-name
    CAUTION:
    The dd command erases all data on the device that you specify in the of option of the command. To avoid losing data, ensure that you specify the correct path to the USB flash drive.
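    For example, to write the ISO image from the download step to the flash drive identified earlier with lsblk, where /dev/sde is the device name from that example and might differ on your system:
    $ sudo dd if=DGXOS-5.0.0-2020-09-21-15-40-02.iso bs=2048 of=/dev/sde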

3.3.2. Creating a Bootable USB Flash Drive by Using Akeo Rufus

On a Windows system, you can use the Akeo Reliable USB Formatting Utility (Rufus) to create a bootable USB flash drive that contains the DGX OS software image.

Ensure that the following prerequisite is met:

  • The correct DGX OS software image is saved to your local disk. For more information, see Obtaining the DGX OS ISO.

  1. Plug the USB flash drive into one of the USB ports of your Windows system.
  2. Download and launch the Akeo Reliable USB Formatting Utility (Rufus).
  3. In Drive Properties, select the following options:
    1. In Device, select your USB flash drive.
    2. In Boot selection, click SELECT, locate, and select the DGX OS software image.

      You can leave the other settings at the default.

  4. Click Start. This step prompts you to select whether to write the image in ISO Image mode (file copy) or DD Image mode (disk image).
  5. Select Write in DD Image mode and click OK.

3.4. Install DGX OS

This section provides information about the options to install DGX OS software.

The reimaging process creates a fresh installation of the DGX OS. During the OS installation or reimage process, in the menu that appears when you boot the installer image, the default selection is to install DGX OS. When you accept this option, the installation process repartitions all drives, including the OS and the data drives. The data drives are configured as a RAID array and mounted under the /raid directory. This process overwrites all the data and file systems that might exist on the OS and data drives.

The boot menu provides these additional options:
  • Preserve the content and configuration of the data drives.
  • Encrypt the system drives.
CAUTION:
Encryption cannot be enabled or disabled after the installation. To change the encryption state, you must reimage the drives.

3.4.1. Installation Options

This section provides information about the available installation options.

  1. Boot the DGX system from the DGX OS installation media, such as a USB drive or an ISO mounted in Virtual Media, as appropriate for the DGX platform that you are using.

    Booting from virtual media requires a connection to the BMC, which is available only on DGX servers. Refer to the appropriate DGX system user guide for instructions on how to boot from virtual media.

  2. When the system boots up, select one of the following options from the GRUB menu:
    • Install DGX OS <version>: Install DGX OS and reformat data RAID
    • Install DGX OS <version>: Without Reformatting Data RAID
    • Advanced Installation Options: Select to install with an encrypted root filesystem and select one of the following options:
      • Install DGX OS <version> With Encrypted Root
      • Install DGX OS <version> With Encrypted Root and Without Reformatting Data RAID
    • Boot Into Live Environment
    • Check Disc for Defects
  3. Verify that the DGX system booted up and that the image is being installed.

    This process iterates through the software components, copying and installing them while showing the executed commands. It generally takes between 15 and 60 minutes, depending on the DGX platform and how the system is being imaged (for example, through the BMC over a slow network or locally with a fast USB flash drive).

See Installation Options for more information about each GRUB menu option.

Note: On DGX servers, the Mellanox InfiniBand driver is installed and the Mellanox card firmware is updated. This process can take up to 5 minutes for each card. Other system firmware is not updated.

The reimage process does not change persistent hardware configurations such as MIG settings or data drive encryption.

After the installation is completed, the system reboots into the OS, and prompts for configuration information. See Initial DGX OS Setup for more information about how to boot up the DGX system for the first time after a fresh installation.

3.4.1.1. Install DGX OS without Reformatting the Data RAID

Here are the steps to install your DGX system without reformatting the data RAID.

The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so this should not be disruptive. However, if you are an advanced user and have set up the disks for a non-cache purpose and want to keep the data on those drives, select the Install DGX OS <version>: Without Reformatting Data RAID option at the boot menu during the installation. This option retains data on the RAID disks, and the following tasks are completed:
  • Installs the cache daemon but leaves it disabled by commenting out the RUN=yes line in /etc/default/cachefilesd.
  • Creates a /raid directory and leaves it out of the file system table by commenting out the entry containing /raid in /etc/fstab.
  • Does not format the RAID disks.
When the installation is completed, you can repeat any configuration steps that you had performed to use the RAID disks as other than cache disks. You can always choose to use the RAID disks as cache disks later by enabling cachefilesd and adding /raid to the file system table:
  1. Uncomment the #RUN=yes line in /etc/default/cachefilesd.
  2. Uncomment the /raid line in /etc/fstab.
  3. Run the following:
    1. Mount /raid.
      sudo mount /raid
    2. Reload the systemd manager configuration.
      sudo systemctl daemon-reload
    3. Start the cache daemon.
      sudo systemctl start cachefilesd

These changes are preserved across system reboots.

3.4.1.2. Advanced Installation Options (Encrypted Root)

When you select this menu item, you have the ability to encrypt the root filesystem of the DGX system.

Important: This option should only be selected when you want to encrypt the root filesystem.

Aside from the encrypted root filesystem, the behavior is identical. See Install DGX OS and Install DGX OS Without Reformatting Data RAID for more information.

Selecting Encrypted Root instructs the installer to encrypt the root filesystem. The encryption is fully automated during the installation, and you must manually unlock the root partition by entering a passphrase at the console (through a direct keyboard and mouse connection or through the BMC) each time the system boots.

During the First Boot Process for DGX Servers or the First Boot Process for DGX Station, you can create your passphrase for the drive. If necessary, you can change this passphrase later.

3.4.1.3. Boot Into a Live Environment

The DGX OS installer image can also be used as a Live image, which means that the image boots up and runs a minimal DGX OS in system memory and does not overwrite anything on the disks in the system.

Live mode does not load drivers, and is essentially a simple Ubuntu Server configuration. This mode can be used as a tool to debug a system when the disks on the system are not accessible or should not be touched.

In a typical operation, this option should not be selected.

3.4.1.4. Check Disc for Defects

Here is some information about how you can check the disc for defects.

If you are experiencing anomalies when you install the DGX OS and suspect that the installation media might have an issue, select this item to complete an extensive test of the installation media contents.

The process is time consuming, and the installation media is usually not the source of the problem. In a typical operation, this option should not be selected.

4. Initial DGX OS Setup

This section describes the setup process when the DGX system is powered on for the first time after delivery or after the server is reimaged.

To start the process, you need to accept the End User License Agreements (EULA) and to set up your username and password. To preview the EULA, visit https://www.nvidia.com/en-us/data-center/dgx-systems/support/ and click the DGX EULA link.

4.1. First Boot Process for DGX Servers

Here are the steps to complete the first boot process for DGX servers.

  1. If the DGX OS was installed with an encrypted root filesystem, you will be prompted to unlock the drive. See Advanced Installation Options (Encrypted Root) for more information.
  2. Enter nvidia3d at the crypt prompt.
  3. Accept the EULA to proceed with the DGX system set up.
  4. Complete the following steps:
    1. Select your language and locale preferences.
    2. Select the country for your keyboard.
    3. Select your time zone.
    4. Confirm the UTC clock setting.
    5. Create an administrative user account with your name, username, and password.
      • This username is also used as the BMC and GRUB username.

        The BMC software will not accept sysadmin for a username, and you will not be able to log in to the BMC with that username.

      • The username must be composed of lower-case letters.
      • The username will be used for administrative activities instead of the root account.
      • Ensure you enter a strong password.

        If the password that you entered is weak, a warning appears.

    6. Create a BMC admin password. The BMC password must consist of a minimum of 13 characters. After you create your login credentials, the default credentials will no longer work.
    7. Create a GRUB password.
      • Your GRUB password must have at least 8 characters.

        If it has less than 8 characters, you cannot click Continue.

      • If you continue without entering a password, the GRUB protection will be disabled.

        For added security, NVIDIA recommends that you set the GRUB password.

    8. Create a root filesystem passphrase. This dialog only appears if root filesystem encryption was selected at the time of the DGX OS installation. See Advanced Installation Options (Encrypted Root) for more information.
    9. Select a primary network interface for the DGX system.
      This should typically be the interface that you will use for subsequent system configuration or in-band management. For example:
      • DGX-1: enp1s0f0
      • DGX-2: enp6s0
      • DGX A100: enp226s0

      Do not select enp37s0f3u1u3c2, bmc_redfish0, or something similar, as this interface is intended only for out-of-band management or future support of in-band tools that will access the Redfish APIs.

      After you select the primary network interface, the system attempts to configure the interface for DHCP and prompts you to enter the name server addresses.
      • If no DHCP is available, click OK at the Network autoconfiguration failed dialog and manually configure the network.
      • To configure a static address, click Cancel at the dialog after the DHCP configuration completes to restart the network configuration steps.
      • To select a different network interface, click Cancel at the dialog after the DHCP configuration completes to restart the network configuration steps.
    10. If prompted, enter the requested networking information, such as the name server or the domain name.
    11. Select a host name for the DGX system.
After you complete the first boot process, the DGX system configures the operating system, starts the system services, and displays a login prompt on the console. If the IP of the configured network interface is known, you can log in by using the console or secure shell (SSH).

4.2. First Boot Process for DGX Station

When you power on your DGX Station for the first time, you are prompted to accept end user license agreements for NVIDIA software. You are then guided through the process to complete the initial Ubuntu OS configuration.

During the configuration process, to prevent unauthorized users from using non-default boot entries and modifying boot parameters, you need to enter a GRUB password.

  1. Accept the EULA and click Continue.
  2. Select your language, for example, English – English, and click Continue.
  3. Select your keyboard, for example, English (US), and click Continue.
  4. Select your location, for example, Los Angeles, and click Continue.
  5. Enter your username and password, enter the password again to confirm it, and click Continue.
    Here are some requirements to remember:
    • The username must be composed of lower-case letters.
    • The username will be used instead of the root account for administrative activities.
    • It is also used as the GRUB username.
    • Ensure you enter a strong password.

    If the password that you entered is weak, a warning appears.

  6. Enter the GRUB password and click OK.
    • Your GRUB password must have at least 8 characters.

      If it has less than 8 characters, you cannot click Continue.

    • If you do not enter a password, GRUB password protection will be disabled.
  7. If you performed the automated encryption install, you will also be prompted to create a new passphrase for your root filesystem.
    • The default passphrase is seeded as nvidia3d and will be disabled after you complete this step.
    • This new passphrase will be used to unlock your root filesystem when the system boots.

5. Post-Installation Tasks

You can complete the following tasks after you install your DGX system.

5.1. Adding Support for Additional Languages to the DGX Station

During the initial Ubuntu OS configuration, you are prompted to select the default language on the DGX Station. If the language that you select is in the DGX OS 5 software image, it is installed in addition to English, and you will see that language after you log in to access your desktop. If the language that you select is not included, you will still see English after logging in, and you will need to install the language separately.

The following languages are included in the DGX OS 5 software image:

  • English
  • Chinese (Simplified)
  • French
  • German
  • Italian
  • Portuguese
  • Russian
  • Spanish

For information about how to install languages, see Install languages.

5.2. Configuring your DGX Station To Use Multiple Displays

The DGX Display Adapter card provides DGX OS with multiple display outputs, which allow you to connect multiple monitors to the DGX Station A100. If you plan to use more than one display, configure the DGX Station A100 to use multiple displays after you complete the initial DGX OS configuration. See First Boot Process for DGX Station for more information.

  1. Connect the displays that you want to use to the mini DisplayPort (DP) connectors (or the DisplayPort connectors on the DGX Station V100) at the back of the unit.
    Note: DGX Station A100 also supplies two mini DP to DP adapters if your monitors do not natively support mini DP input.

    Each display is automatically detected as you connect it.



    Screen capture showing the DGX OS when two displays are connected to the DGX Station.

  2. Optional: If necessary, adjust the display configuration, such as switching the primary display, or changing monitor positions or orientation.
    1. Open the Displays window.
    2. In the Displays window, update the necessary display settings and click Apply.

      Screen capture showing the Ubuntu Displays window.

5.2.1. DGX Station V100

This information only applies to DGX Station V100.

High-resolution displays consume a large quantity of GPU memory. If you connected three 4K displays to the DGX Station V100, the displays might consume most of the GPU memory on the NVIDIA Tesla V100 GPU card to which these displays are connected, especially if you are running graphics-intensive applications.

If you are running memory-intensive compute workloads on the DGX Station V100, and are experiencing performance issues, consider conserving GPU memory by reducing or minimizing the graphics workload.

  • To reduce the graphics workload, disconnect any additional displays you connected and use only one display with the DGX Station V100.
  • If you disconnect a display from the DGX Station V100, the disconnection is automatically detected, and the display settings are automatically adjusted for the remaining displays.
  • To minimize the graphics workload, shut down the display manager and use secure shell (SSH) to remotely log in to the DGX Station.
  • In DGX OS 5.x, log in to the DGX Station remotely and run the following commands:
    • To stop the GNOME Display Manager (GDM3):
      $ sudo systemctl stop gdm3
    • To start the GDM3 again:
      $ sudo systemctl start gdm3

5.3. Enabling Multiple Users to Remotely Access the DGX System

To enable multiple users to remotely access the DGX system, an SSH server is installed and enabled on the DGX system.

Add other Ubuntu OS users to the DGX system to allow them to remotely log in to the DGX system through SSH. Refer to Add a new user account for more information.
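For example, you can add a new user from the command line, where new-user-login-id is a placeholder:

$ sudo adduser new-user-login-id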

For information about how to log in remotely through SSH, see Connecting to an OpenSSH Server on the Ubuntu Community Help Wiki.

Important: The DGX system does not provide any additional isolation guarantees between users beyond the guarantees that the Ubuntu OS offers. For guidelines about how to secure access to the DGX system over SSH, see Configuring an OpenSSH Server on the Ubuntu Community Help Wiki.

6. Upgrading Your DGX OS Release

This section provides information about upgrading your DGX system.

The following information describes the differences between the types of upgrades:
  • When you perform a release upgrade, you currently have the DGX OS 4.x installed, and you want to move to DGX OS 5.

    You can upgrade to DGX OS 5 only from the latest DGX OS 4.x (for DGX Station, DGX-2, or DGX-1 systems) or from the latest DGX OS 4.99.x release (for DGX A100 systems). Refer to the DGX OS Desktop Software Release Notes or the NVIDIA® DGX™ OS Server Software Release Notes for the appropriate upgrade instructions. The instructions also provide information about completing an over-the-internet upgrade.

    Important: If your installed software packages do not have upgrade candidates, and you try to upgrade, an error message will be displayed. You need to use the --force option, and the packages will be removed as part of the upgrade process. Refer to the DGX OS Software Release Notes for a list of packages that are no longer available in DGX OS 5.
  • When you perform a package upgrade, you want to install upgrades that have been made available in the network repository since the initial DGX OS release.

    The network repositories are periodically updated with package upgrades and will include new features that are available with the latest DGX OS minor version release.

Warning: The instructions in this section upgrade all software for which updates are available from your configured software sources, including applications that you installed yourself. If you want to prevent an application from being upgraded, you can instruct the Ubuntu package manager to keep the current version. For more information, see Introduction to Holding Packages on the Ubuntu Community Help Wiki.

6.1. Getting Release Information for DGX Systems

Here is some information about how you can determine the release information for your DGX systems.

The /etc/dgx-release file provides release information, such as the product name and serial number. This file also tracks the history of the DGX OS software updates by providing the following information:

  • The version number and installation date of the last version to be installed from an ISO image (DGX_SWBUILD_VERSION).
  • The version number and update date of each over-the-network update applied since the software was last installed from an ISO image (DGX_OTA_VERSION).

For DGX OS 5, the latest DGX_OTA_VERSION entry indicates the latest ISO version that was released, and upgrades to the system include the changes that were made in the network repository up to the indicated date.

You can use this information to determine whether your DGX system is running the current version of the DGX OS software.

To get release information for the DGX system, view the content of the /etc/dgx-release file.

For example:

$ more /etc/dgx-release
DGX_NAME="DGX Station"
DGX_PRETTY_NAME="NVIDIA DGX Station"
DGX_SWBUILD_DATE="2017-09-18"
DGX_SWBUILD_VERSION="3.1.2"
DGX_COMMIT_ID="15cd1f473bb53d9b64503e06c5fee8d2e3738ece"
DGX_SERIAL_NUMBER=XXXXXXXXXXXXX

DGX_OTA_VERSION="3.1.3"
DGX_OTA_DATE="Wed Nov 15 15:35:25 PST 2017"

DGX_OTA_VERSION="4.7.0"
DGX_OTA_DATE="Fri Dec 19 13:49:06 PST 2020"

DGX_OTA_VERSION="5.0.0"
DGX_OTA_DATE="Tue Jan 19 14:23:18 PDT 2021"

DGX_OTA_VERSION="5.0.0"
DGX_OTA_DATE="Tue Feb 23 17:45:30 PST 2021"

6.2. Preparing to Upgrade the Software

This section provides information about the tasks you need to complete before you can upgrade your DGX OS software.

6.2.1. Connect to the DGX System Console

Connect to the console of the DGX system using a direct connection or a remote connection through the BMC. See Connecting to the DGX System for more information.
Note: SSH can be used to perform the upgrade. However, if the Ethernet port is configured for DHCP, the IP address might change after the DGX server is rebooted during the upgrade, which results in the loss of connection. A loss of connection might also occur if you are connecting through a VPN. If this happens, connect by using a direct connection or through the BMC to continue the upgrade process.
Warning:

Connect directly to the DGX server console if the DGX is connected to a 172.17.xx.xx subnet.

DGX OS software installs Docker CE, which uses the 172.17.xx.xx subnet by default for Docker containers. If the DGX server is on the same subnet, you will not be able to establish a network connection to the DGX server.

See Configuring Docker IP Addresses for instructions on how to change the default Docker network settings after performing the upgrade.

If you are using a GUI to connect to the console, see Performing Package Upgrades by Using the GUI.

6.2.2. Verifying the DGX System Connection to the Repositories

Before you attempt to complete the update, verify that the network connection for your DGX system can access the public repositories and that the connection is not blocked by a firewall or proxy.

On the DGX system, enter the following:
$ wget -O f1-changelogs http://changelogs.ubuntu.com/meta-release-lts
$ wget -O f2-archive http://archive.ubuntu.com/ubuntu/dists/focal/Release
$ wget -O f3-security http://security.ubuntu.com/ubuntu/dists/focal/Release
$ wget -O f4-nvidia-baseos http://repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/dists/focal/Release
$ wget -O f5-nvidia-cuda https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Release

The wget commands should be successful, and there should be five files in the directory with non-zero content.
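For example, you can confirm that the five files exist and have non-zero sizes:

$ ls -l f1-changelogs f2-archive f3-security f4-nvidia-baseos f5-nvidia-cuda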

6.3. Upgrading to DGX OS 5

This section provides information about how to upgrade to DGX OS 5 from a DGX OS 4.x (for DGX Station, DGX-1, or DGX-2) or a DGX OS 4.99.x release.

See Connecting to the DGX System for guidance on connecting to the console to perform the upgrade.

  1. Download information from all configured sources about the latest versions of the packages.
    $ sudo apt update
  2. Install all available upgrades for your current DGX OS release.
    $ sudo apt -y full-upgrade
  3. Install the nvidia-release-upgrade package for upgrading to the next major DGX OS release.
    $ sudo apt install -y nvidia-release-upgrade
  4. Start the DGX OS release upgrade process.
    $ sudo nvidia-release-upgrade
    If you are using a proxy server, add the -E option to keep your proxy environment variables. For example:
     $ sudo -E nvidia-release-upgrade 
    Tip: Depending on which packages were updated when running sudo apt -y full-upgrade, you might be prompted to reboot the system before performing nvidia-release-upgrade.
  5. Complete the following tasks:
    Note: (Only for DGX Station)

    During an upgrade to a DGX OS 5 release from an earlier release, you are prompted to resolve conflicts in configuration files. When prompted, evaluate the changes before accepting the maintainer’s version, keeping the local version, or manually resolving the difference.

    Conflicts in the following configuration files are the result of customizations to the Ubuntu Desktop OS made for DGX OS 5.

    • /etc/ssh/sshd_config. You can keep the local version that is currently installed.
    • /etc/gdm3/custom.conf.distrib. You can keep your currently installed version.
    • /etc/gdm3/custom.conf. You can keep your currently installed version.
    • /etc/apt/sources.list.d/dgx.list. You should install the package maintainer's version.
    Tip: Some package updates require that you reboot the system before completing the upgrade. Ensure that you reboot the system when prompted.
    1. If you have packages that do not have upgrade candidates, you will see the following message:
      WARNING: The following packages are installed, but have no 20.04 upgrade path.
      They will be uninstalled during the release upgrade process.
      libnccl2 libnccl-dev libcudnn7 libcudnn7-dev libcudnn7-doc libcudnn8 libcudnn8-dev libcudnn8-samples
      The --force option must be used to proceed.

      If you see this message, run the nvidia-release-upgrade command with the --force option.
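      For example:
      $ sudo nvidia-release-upgrade --force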

    2. If you are logged in to the DGX system remotely through secure shell (SSH), you are prompted about whether you want to continue running under SSH.
      Continue running under SSH?
      
      This session appears to be running under ssh. It is not recommended to perform a upgrade over ssh currently because in case of failure it is harder to recover.
      
      If you continue, an additional ssh daemon will be started at port '1022'.
      Do you want to continue?
      Continue [yN]
    3. Enter y to continue.
    4. An additional sshd daemon is started and the following message is displayed:
      Starting additional sshd
      To make recovery in case of failure easier, an additional sshd will be started on port '1022'. If anything goes wrong with the running ssh you can still connect to the additional one.
      If you run a firewall, you may need to temporarily open this port. As this is potentially dangerous it's not done automatically. You can
      open the port with e.g.:
      'iptables -I INPUT -p tcp --dport 1022 -j ACCEPT'
      To continue please press [ENTER]
    5. Press Enter.
    6. You are warned that third-party sources are disabled.
      Third party sources disabled
      Some third party entries in your sources.list were disabled. You can re-enable them after the upgrade with the 'software-properties' tool or your package manager.
      To continue please press [ENTER]
      

      Canonical and DGX repositories are preserved for the upgrade, but any other repositories, for example, Google Chrome or VSCode, will be disabled. After the upgrade, you must manually re-enable any third-party sources that you want to keep.

    7. Press Enter.
    8. You are asked to confirm that you want to start the upgrade.
      Do you want to start the upgrade?
      ...
      Installing the upgrade can take several hours. Once the download has finished, the process cannot be canceled.
      Continue [yN] Details [d]
    9. Press Enter.
    10. (DGX Station only) In response to the warning that lock screen is disabled, press Enter to continue. Do not press Ctrl+C to respond to this warning, because pressing Ctrl+C terminates the upgrade process.
    11. When you are prompted to resolve conflicts in configuration files, evaluate the changes before selecting one of the following options:
      • Accepting the maintainer’s version.
      • Keeping the local version.
      • Manually resolving the difference.

      Conflicts in some configuration files might be the result of customizations to the Ubuntu Desktop OS made for DGX OS software. For guidance about how to resolve these conflicts, see the chapter in the DGX OS Desktop Release Notes for the release family to which you are upgrading.

    12. When prompted to confirm that you want to remove obsolete packages, enter y, N, or d.
      Remove obsolete packages?
      371 packages are going to be removed. Removing the packages can take several hours.
      Continue [yN]	Details [d]
    13. Determine whether to remove obsolete packages and continue with the upgrade.
      1. Review the list of packages that will be removed.

        To identify obsolete DGX OS Desktop packages, see the lists of obsolete packages in the DGX OS Desktop Release Notes for all releases after your current release.

      2. If the list contains only packages that you want to remove, enter y to continue with the upgrade.
    14. When the system upgrade is complete, you are prompted to restart the system.
      System upgrade is complete.
      Restart required
      To finish the upgrade, a restart is required.
      If you select 'y' the system will be restarted. Continue [yN]
      
    15. Enter y.
After the system is restarted, the upgrade process takes several minutes to perform some final installation steps.

6.3.1. Verifying the Upgrade

Here are the steps to verify your upgrade.

  1. Confirm the Linux kernel version.

    For example, when you upgrade to DGX OS 5.0, the Linux kernel version is at least 5.4.0-52-generic.
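    To check the kernel version, run the following command; the output shown here is an example:
    $ uname -r
    5.4.0-52-generic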

  2. For the minimum Linux kernel version of the release to which you are upgrading, refer to the release notes for that release.
  3. Confirm the NVIDIA Graphics Drivers for Linux version.
    $ nvidia-smi
    For example, for an upgrade to DGX OS Desktop 5.0, the NVIDIA Graphics Drivers for Linux version is at least 450.80.02:
    Tue Oct 13 09:02:14 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+

6.3.2. Recovering from an Interrupted or Failed Update

If the update script is interrupted because of a loss of power or a loss of network connectivity, restore power or the network connection, depending on the issue.

If the system encounters a kernel panic after you restore power and reboot the DGX system, you cannot perform the over-the-network update. You need to reinstall DGX OS 5 with the latest image instead. See Installing the DGX OS (Reimaging the System) for instructions and complete the network update.

If you can successfully return to the Linux command line, complete the following steps.

  1. Reconfigure the packages.
    sudo dpkg --configure -a
  2. Fix the broken package installs.
    sudo apt -f install -y
  3. Determine where the release-upgrader was extracted.
    /tmp/ubuntu-release-upgrader-<random-string>
  4. Start a bash shell, go to the upgrader, and configure.
    $ sudo bash
    cd /tmp/ubuntu-release-upgrader-<random-string>
    RELEASE_UPGRADER_ALLOW_THIRD_PARTY=1 \
    ./focal --frontend=DistUpgradeViewText

    Do not reboot at this time.

  5. Issue the following command and reboot.
    bash /usr/bin/nvidia-post-release-upgrade
    reboot

6.4. Performing Package Upgrades by Using the CLI

NVIDIA and Canonical provide updates to the OS in the form of updated software packages between releases with security mitigations and bug fixes.

You should evaluate the available updates at regular intervals and update the system by using the apt full-upgrade command, based on the threat level.
  • For more information about upgrading to a supported version of Ubuntu, refer to the Ubuntu Wiki Upgrades.
  • For a list of the known Common Vulnerabilities and Exposures (CVEs), including those that can be resolved by updating the DGX OS software, refer to the Ubuntu Security Notices.

For details about the available updates, see the DGX OS Software Release Notes. These updates might contain important security updates.

Important: You are responsible for upgrading the software on the DGX system to install the updates from these sources.

If updates are available, you can obtain the package upgrades by completing the following steps:

  1. Update the list of available packages and their versions.
    $ sudo apt update
  2. Review the packages that will be upgraded.
    $ sudo apt full-upgrade -s

    To prevent an application from being upgraded, instruct the Ubuntu package manager to keep the current version. Refer to Introduction to Holding Packages for more information.
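    For example, to hold a package at its current version, where package-name is a placeholder:
    $ sudo apt-mark hold package-name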

  3. Upgrade to the latest version.
    $ sudo apt full-upgrade
    Answer any questions that appear.
    • Most questions require a Yes or No response.
      • When prompted to select which GRUB configuration to use, select the current one on the system.
      • When prompted to select the GRUB install devices, keep the default selection.
      • The other questions will depend on what other packages were installed before the update, and how those packages interact with the update.
    • If a message appears that indicates that the nvidia-docker.service failed to start, you can disregard it and continue with the next step.

      The service will start at that time.

  4. When the upgrade is complete, reboot the system.
    $ sudo reboot

    Upgrades to the NVIDIA Graphics Drivers for Linux require a restart. If you upgrade the NVIDIA Graphics Drivers for Linux without restarting the DGX system, running the nvidia-smi command displays an error message.

    $ nvidia-smi
    Failed to initialize NVML: Driver/library version mismatch

6.5. Managing Software Upgrades on the Desktop

This section provides information about managing upgrades between DGX OS releases by using a GUI tool on DGX Station.

6.5.1. Performing Package Upgrades by Using the GUI

You can use the graphical Software Updater application to manage package upgrades on the DGX Station.

Ensure that you are logged in to your Ubuntu desktop on the DGX Station as an administrator user.
  1. Press the Super key.

    This key is usually found on the bottom-left of your keyboard, next to the Alt key. Refer to What is the Super key? for more information.

    • If you are using a Windows keyboard, the Super key usually has a Windows logo on it, and it is sometimes called the Windows key or system key.
    • If you are using an Apple keyboard, this key is known as the Apple key.
  2. In the search bar, type Software Updater.
  3. Open the Software Updater, review the available updates, and click Install Now.

    Screen capture showing the software updater window.

    • If no updates are available, the Software Updater informs you that your software is up to date.
    • If an update requires the removal of obsolete packages, you will be warned that not all updates can be installed.
      To continue with the update, complete the following steps:
      1. Click Partial Upgrade.
      2. Review the list of packages that will be removed.

        To identify obsolete DGX Station packages, see the lists of obsolete packages in the DGX OS Desktop Release Notes for all releases after your current release.

      3. If the list contains only packages that you want to remove, click Start Upgrade.
  4. When prompted to authenticate, type your password into the Password field and click Authenticate.
  5. When the update is complete, restart your DGX Station.

    Restart the system even if you are not prompted to restart it to complete the updates.

    Any update to the NVIDIA Graphics Drivers for Linux requires a restart.

    If you update the NVIDIA Graphics Drivers for Linux without restarting the DGX Station, running the nvidia-smi command displays an error message.

    $ nvidia-smi
    Failed to initialize NVML: Driver/library version mismatch
    

6.5.2. Checking for Updates to DGX Station Software

In Software & Updates, you can change your settings to automatically check for package updates and to configure updates from the Ubuntu software repositories. You can also configure your DGX Station to notify you of important security updates more frequently than other updates.

In the following example, the DGX Station is configured to check for updates daily, to display important security updates immediately, and to display other updates every two weeks.



Screen capture showing the options in the Updates tab of Ubuntu Software & Updates window to check for updates daily, to display important security updates immediately, and to display other updates every two weeks.

7. Installing Additional Software

DGX OS 5 is an optimized version of the Ubuntu 20.04 Linux distribution with access to a large collection of additional software that is available from the Ubuntu repositories. You can install the additional software using the apt command or through a graphical tool.

Tip: The graphical tool is only available in DGX Station.
For more information, refer to the Ubuntu Desktop documentation.
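For example, you can install an additional package from the Ubuntu repositories by using apt; the htop package here is only an illustration:

$ sudo apt update
$ sudo apt install htop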

7.1. Upgrading a New NVIDIA Driver Branch Release

Although the DGX OS supports all released Data Center driver branches, DGX OS releases include a default NVIDIA driver branch that might not be the most recently released branch. Unless you need features that are available only in a newer qualified driver branch, we recommend that you remain on the default branch.

NVIDIA drivers are released as precompiled and signed kernel modules by Canonical and are available directly from the Ubuntu repository. Signed drivers are required to verify the integrity of driver packages and identity of the vendor. However, the verification process requires that Canonical build and release the drivers with Ubuntu kernel updates after their release cycle is complete, and this process might sometimes delay new driver branch releases and updates. For more information about the NVIDIA driver release, refer to the NVIDIA Driver Documentation.

Important: The Ubuntu repositories provide the following versions of the signed and precompiled NVIDIA drivers:
  • The general NVIDIA display drivers
  • The NVIDIA Data Center GPU drivers

On your DGX system, you should only install the packages that include the NVIDIA Data Center GPU drivers. The metapackages for the NVIDIA Data Center GPU driver have the -server suffix.

7.1.1. Checking the Currently Installed Driver Branch

This section explains how to check the driver branch that is currently installed.

Before you install a new NVIDIA driver branch, check the currently installed driver branch by running the following command:
apt list --installed nvidia-driver*server

7.1.2. Determining the New Available Driver Branches

These steps help you determine which new driver branches are available.

To see the new available NVIDIA driver branches:

  1. Update the local database with the latest information from the Ubuntu repository.
    apt update
  2. Show all available driver branches.
    apt list nvidia-driver*server

7.1.3. Upgrading your NVIDIA Data Center GPU Driver to a Newer Branch

Before you begin, complete the following tasks:
  • Install the corresponding metapackages.
  • On systems that incorporate the NVIDIA NVSwitch technology, such as the DGX-2 and DGX A100, install the NVIDIA Fabric Manager and NSCQ library.
If you do not know whether your system requires Fabric Manager, run the following command to determine whether the package is installed on your system:
apt list nvidia-fabricmanager*

To upgrade to a newer NVIDIA driver:

  1. Update packages.
    apt update
  2. Purge the existing driver packages.
    apt-get purge "*nvidia*450*"
  3. Install the latest kernel.
    apt install -y linux-generic
  4. Install the new packages.
    • Issue the following on non-Fabric Manager systems.
      apt install -y linux-modules-nvidia-460-server-generic nvidia-driver-460-server libnvidia-nscq-460
      
    • Issue the following on Fabric Manager systems.
      apt install -y linux-modules-nvidia-460-server-generic nvidia-driver-460-server libnvidia-nscq-460 nvidia-fabricmanager-460
    Note: The version number 460 is an example, and you should replace this value with the actual version that you want to install.

7.2. Installing or Upgrading to a Newer CUDA Toolkit Release

Only DGX Station and DGX Station A100 have a CUDA Toolkit release installed by default. DGX servers are intended to be shared resources that use containers and do not have CUDA Toolkit installed by default. However, you have the option to install a qualified CUDA Toolkit release.

Although the DGX OS supports all CUDA Toolkit releases that interoperate with the installed driver, DGX OS releases might include a default CUDA Toolkit release that might not be the most recently released version. Unless you must use a new CUDA Toolkit version that contains the new features, we recommend that you remain on the default version that is included in the DGX OS release. Refer to the DGX OS Software Release Notes for the default CUDA Toolkit release.

Important: Before you install or upgrade to any CUDA Toolkit release, ensure the release is compatible with the driver that is installed on the system. Refer to CUDA Compatibility for more information and a compatibility matrix.

7.2.1. Checking the Currently Installed CUDA Toolkit Release

This section explains how to check the CUDA Toolkit release that is currently installed.

Important: The CUDA Toolkit is not installed on DGX servers by default, and if you try to run the following command, no installed package will be listed.
Before you install a new CUDA Toolkit release, to check the currently installed release, run the following command:
apt list --installed cuda-toolkit-*
For example, the following output shows that CUDA Toolkit 11.0 is installed:
$ apt list --installed cuda-toolkit-*
Listing... Done
cuda-toolkit-11-0/unknown,unknown,now 11.0.3-1 amd64 [installed]
N: There is 1 additional version. Please use the '-a' switch to see it

7.2.2. Determining the New Available CUDA Toolkit Releases

These steps help you determine which new CUDA Toolkit releases are available.

To see the new available CUDA Toolkit releases:

  1. Update the local database with the latest information from the Ubuntu repository.
    apt update
  2. Show all available CUDA Toolkit releases.
    apt list cuda-toolkit-*
The following output shows that 11.0, 11.1, and 11.2 are the possible CUDA Toolkit versions that can be installed:
$ apt list cuda-toolkit-*
Listing... Done
cuda-toolkit-11-0/unknown,unknown,now 11.0.3-1 amd64 [installed]
cuda-toolkit-11-1/unknown,unknown 11.1.1-1 amd64
cuda-toolkit-11-2/unknown,unknown 11.2.1-1 amd64

7.2.3. Installing the CUDA Toolkit or Upgrading Your CUDA Toolkit to a Newer Release

You can install or upgrade your CUDA Toolkit to a newer release.

To install or upgrade the CUDA Toolkit, run the following command:
apt install cuda-toolkit-11-2
Important: Here, version 11.2 is an example, and you should replace this value with the actual version that you want to install.

8. Network Configuration

This section provides information about how you can configure the network in your DGX system.

8.1. Configuring Network Proxies

If your network needs to use a proxy server, you need to set up configuration files to ensure the DGX system communicates through the proxy.

8.1.1. For the OS and Most Applications

Here is some information about configuring the network for the OS and other applications.

Edit the /etc/environment file and add the following proxy addresses to the file, below the PATH line.
http_proxy="http://<username>:<password>@<host>:<port>/"
ftp_proxy="ftp://<username>:<password>@<host>:<port>/"
https_proxy="https://<username>:<password>@<host>:<port>/"
no_proxy="localhost,127.0.0.1,localaddress,.localdomain.com"
HTTP_PROXY="http://<username>:<password>@<host>:<port>/"
FTP_PROXY="ftp://<username>:<password>@<host>:<port>/"
HTTPS_PROXY="https://<username>:<password>@<host>:<port>/"
NO_PROXY="localhost,127.0.0.1,localaddress,.localdomain.com"

Where username and password are optional.

For example:

http_proxy="http://myproxy.server.com:8080/"
ftp_proxy="ftp://myproxy.server.com:8080/"
https_proxy="https://myproxy.server.com:8080/"

8.1.2. For the apt Package Manager

Here is some information about configuring the network for the apt package manager.

Edit or create the /etc/apt/apt.conf.d/myproxy proxy configuration file and include the following lines:
Acquire::http::proxy "http://<username>:<password>@<host>:<port>/";
Acquire::ftp::proxy "ftp://<username>:<password>@<host>:<port>/";
Acquire::https::proxy "https://<username>:<password>@<host>:<port>/";
For example:
Acquire::http::proxy "http://myproxy.server.com:8080/";
Acquire::ftp::proxy "ftp://myproxy.server.com:8080/";
Acquire::https::proxy "https://myproxy.server.com:8080/";

8.1.3. For Docker

To ensure that Docker can access the NGC container registry through a proxy, Docker uses environment variables.

For best practice recommendations on configuring proxy environment variables for Docker, refer to Control Docker with systemd.
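As a sketch of the approach described there, you can create a systemd drop-in file for the Docker service that sets proxy environment variables, and then reload and restart Docker. The file name and proxy addresses below are examples; adjust them for your environment:

[Service]
Environment="HTTP_PROXY=http://myproxy.server.com:8080/"
Environment="HTTPS_PROXY=https://myproxy.server.com:8080/"
Environment="NO_PROXY=localhost,127.0.0.1"

Save the file as, for example, /etc/systemd/system/docker.service.d/http-proxy.conf, and then run:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker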

8.2. Preparing the DGX System to be Used With Docker

Some initial setup of the DGX system is required to ensure that users have the required privileges to run Docker containers and to prevent IP address conflicts between Docker and the DGX system.

8.2.1. Enabling Users To Run Docker Containers

To prevent the docker daemon from running without protection against escalation of privileges, the Docker software requires sudo privileges to run containers. Meeting this requirement involves enabling users who will run Docker containers to run commands with sudo privileges.

You should ensure that only users whom you trust and who are aware of the potential risks to the DGX system of running commands with sudo privileges can run Docker containers.

Before you allow multiple users to run commands with sudo privileges, consult your IT department to determine whether you might be violating your organization's security policies. For the security implications of enabling users to run Docker containers, see Docker daemon attack surface.

You can enable users to run the Docker containers in one of the following ways:

  • Add each user as an administrator user with sudo privileges.
  • Add each user as a standard user without sudo privileges and then add the user to the docker group.

    This approach is inherently insecure because any user who can send commands to the docker engine can escalate privilege and run root-user operations.

    To add an existing user to the docker group, run this command:

    $ sudo usermod -aG docker user-login-id
    user-login-id
    The user login ID of the existing user that you are adding to the docker group.
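    The user must log out and log back in for the new group membership to take effect. As a quick check, the user can then run a test container, for example:

    $ docker run --rm hello-world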

8.2.2. Configuring Docker IP Addresses

To ensure that your DGX system can access the network interfaces for Docker containers, Docker should be configured to use a subnet distinct from other network resources used by the DGX system.

By default, Docker uses the 172.17.0.0/16 subnet. Consult your network administrator to find out which IP addresses are used by your network. If your network does not conflict with the default Docker IP address range, no changes are needed and you can skip this section.

However, if your network uses the addresses in this range for the DGX system, you should change the default Docker network addresses.

You can change the default Docker network addresses by modifying the /etc/docker/daemon.json file or the /etc/systemd/system/docker.service.d/docker-override.conf file. These instructions provide an example of modifying the /etc/systemd/system/docker.service.d/docker-override.conf file to override the default Docker network addresses.

  1. Open the docker-override.conf file for editing.
    $ sudo vi /etc/systemd/system/docker.service.d/docker-override.conf
    [Service] 
    ExecStart=
    ExecStart=/usr/bin/dockerd -H fd:// -s overlay2 
    LimitMEMLOCK=infinity
    LimitSTACK=67108864
  2. Make the following changes to the ExecStart line, setting the correct bridge IP address (--bip) and IP address range (--fixed-cidr) for your network.

    Consult your IT administrator for the correct addresses.

    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd -H fd:// -s overlay2 --bip=192.168.127.1/24 --fixed-cidr=192.168.127.128/25
    LimitMEMLOCK=infinity
    LimitSTACK=67108864
    
  3. Save and close the /etc/systemd/system/docker.service.d/docker-override.conf file.
  4. Reload the systemd manager configuration.
    $ sudo systemctl daemon-reload
  5. Restart Docker.
    $ sudo systemctl restart docker

8.3. DGX OS Connectivity Requirements

In normal operation, DGX OS runs services to support typical usage of the DGX system.

Some of these services require network communication. The table below describes the port, protocol, direction, and communication purpose for the services. DGX administrators should consider their site-specific access needs and allow or disallow communication with the services as necessary.

8.3.1. In-Band Management, Storage, and Compute Networks

This table provides information about the in-band management, storage, and compute networks.

Table 1. In-Band Management, Storage, and Compute Networks

Port (Protocol)   Direction          Use
22 (TCP)          Inbound            SSH
53 (UDP)          Outbound           DNS
80 (TCP)          Outbound           HTTP, package updates
111 (TCP)         Inbound/Outbound   RPCBIND, required by NFS
273 (TCP)         -                  NVIDIA System Management
443 (TCP)         Outbound           Internet (HTTP/HTTPS) connection to NVIDIA GPU Cloud. If port 443 is proxied through a corporate firewall, then WebSocket protocol traffic must be supported.
1883 (TCP)        -                  Mosquitto database (used by NVIDIA System Management)

8.3.2. Out-of-Band Management

This table provides information about out-of-band management for your DGX system.

Table 2. Out-of-Band Management

Port (Protocol)   Direction   Use
443 (TCP)         Inbound     BMC web services, remote console services, and CD-media service. If port 443 is proxied through a corporate firewall, then WebSocket protocol traffic must be supported.
623 (UDP)         Inbound     IPMI

8.4. Connectivity Requirements for NGC Containers

To run NVIDIA NGC containers from the NGC container registry, your network must be able to access the NGC registry URLs, such as nvcr.io.

To verify the connection to nvcr.io, run:

$ wget https://nvcr.io/v2

You should see connection verification followed by a 401 error:
--2018-08-01 19:42:58--  https://nvcr.io/v2
Resolving nvcr.io (nvcr.io)... 52.8.131.152, 52.9.8.8
Connecting to nvcr.io (nvcr.io)|52.8.131.152|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
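
If you prefer curl, an equivalent check (assuming curl is installed) prints only the HTTP status code; a 401 response likewise confirms that the registry is reachable:

$ curl -s -o /dev/null -w '%{http_code}\n' https://nvcr.io/v2
401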

8.5. Configuring Static IP Addresses for the Network Ports

Here are the steps to configure static IP addresses for network ports.

During the initial boot setup process for your DGX system, one of the steps was to configure static IP addresses for a network interface. If you did not configure the addresses at that time, you can configure the static IP addresses from the Ubuntu command line by using the following instructions.

Note: If you are connecting to the DGX console remotely, connect by using the BMC remote console. If you connect using SSH, your connection will be lost when you complete the final step. Also, if you encounter issues with the configuration file, the BMC connection will help with troubleshooting.

If you cannot remotely access the DGX system, connect a display with a 1440x900 or lower resolution, and a keyboard directly to the DGX system.

  1. Determine the port designation that you want to configure, based on the physical Ethernet port that you have connected to your network. See Configuring Network Proxies for the port designation of the connection that you want to configure.
  2. Edit the network configuration yaml file.
    Note: Ensure that your file is identical to the following sample and uses spaces, not tabs.
    $ sudo vi /etc/netplan/01-netcfg.yaml
    
    network:
      version: 2
      renderer: networkd
      ethernets:
        <port-designation>:
          dhcp4: no
          dhcp6: no
          addresses: [10.10.10.2/24]
          gateway4: 10.10.10.1
          nameservers:
            search: [<mydomain>, <other-domain>]
            addresses: [10.10.10.1, 1.1.1.1]

    Consult your network administrator for the appropriate values for your network, such as the interface, gateway, and nameserver addresses, and use the port designation that you determined in step 1.

  3. After you complete your edits, press ESC to switch to the command mode.
  4. Save the file to the disk and exit the editor.
  5. Apply the changes.
    $ sudo netplan apply
    Note: If you are not returned to the command-line prompt after a minute, reboot the system. For additional information, see Changes, errors, and bugs in the Ubuntu Server Guide.
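
    To confirm that the static address was applied, you can query the interface with the standard iproute2 tooling:

    $ ip addr show <port-designation>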

9. Additional Features and Instructions

This section provides information about less common configurations and features.

9.1. Managing CPU Mitigations

DGX OS software includes security updates to mitigate CPU speculative side-channel vulnerabilities. These mitigations can decrease the performance of deep learning and machine learning workloads.

If your DGX system installation incorporates other measures to mitigate these vulnerabilities, such as measures at the cluster level, you can disable the CPU mitigations for individual DGX nodes and increase performance.

9.1.1. Determining the CPU Mitigation State of the DGX System

Here is information about how you can determine the CPU mitigation state of your DGX system.

If you do not know whether CPU mitigations are enabled or disabled, issue the following command:
$ cat /sys/devices/system/cpu/vulnerabilities/* 

CPU mitigations are enabled when the output consists of multiple lines prefixed with Mitigation:.

For example:
KVM: Mitigation: Split huge pages
Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
Mitigation: Clear CPU buffers; SMT vulnerable
Mitigation: PTI
Mitigation: Speculative Store Bypass disabled via prctl and seccomp
Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
Mitigation: Clear CPU buffers; SMT vulnerable
CPU mitigations are disabled if the output consists of multiple lines prefixed with Vulnerable. For example:
KVM: Vulnerable
Mitigation: PTE Inversion; VMX: vulnerable
Vulnerable; SMT vulnerable
Vulnerable
Vulnerable
Vulnerable: user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerable
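
Because each file under /sys/devices/system/cpu/vulnerabilities/ corresponds to one vulnerability, it can be easier to print the file names alongside their values. One simple way to do this is with grep:

$ grep . /sys/devices/system/cpu/vulnerabilities/*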

9.1.2. Disabling CPU Mitigations

Here are the steps to disable CPU mitigations.

CAUTION:
Performing the following instructions will disable the CPU mitigations provided by the DGX OS software.
  1. Install the nv-mitigations-off package.
    $ sudo apt install nv-mitigations-off -y
  2. Reboot the system.
  3. Verify that the CPU mitigations are disabled.
    $ cat /sys/devices/system/cpu/vulnerabilities/*
The output should include several Vulnerable lines. See Determining the CPU Mitigation State of the DGX System for example output.

9.1.3. Re-enabling CPU Mitigations

Here are the steps to enable CPU mitigations again.

  1. Remove the nv-mitigations-off package.
    $ sudo apt purge nv-mitigations-off
  2. Reboot the system.
  3. Verify that the CPU mitigations are enabled.
    $ cat /sys/devices/system/cpu/vulnerabilities/*
The output should include several Mitigation lines. See Determining the CPU Mitigation State of the DGX System for example output.

9.2. Managing the DGX Crash Dump Feature

This section provides information about managing the DGX Crash Dump feature. You can use the script that is included in the DGX OS to manage this feature.

9.2.1. Using the Script

Here are commands that help you complete the necessary tasks with the script.

  • To enable only dmesg crash dumps, run:
    $ /usr/sbin/nvidia-kdump-config enable-dmesg-dump

    This option reserves memory for the crash kernel.

  • To enable both dmesg and vmcore crash dumps, run:
    $ /usr/sbin/nvidia-kdump-config enable-vmcore-dump

    This option reserves memory for the crash kernel.

  • To disable crash dumps, run:
    $ /usr/sbin/nvidia-kdump-config disable

    This option disables the use of kdump and ensures that no memory is reserved for the crash kernel.
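
If the standard Ubuntu kdump-tools package is present (an assumption; nvidia-kdump-config is the supported interface on DGX OS), you can inspect the resulting state, including the crash-kernel memory reservation:

$ kdump-config show                    # show the current kdump configuration
$ cat /sys/kernel/kexec_crash_size     # bytes reserved for the crash kernel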

9.2.2. Connecting to Serial Over LAN

You can connect to serial over a LAN.

Important: This information applies only to systems that have the BMC.

While dumping vmcore, the BMC screen console goes blank approximately 11 minutes after the crash dump is started. To view the console output during the crash dump, connect to serial over LAN as follows:

$ ipmitool -I lanplus -H <BMC-IP-address> -U <username> -P <password> sol activate

9.3. Filesystem Quotas

Here is some information about filesystem quotas.

When running NGC containers, you might need to limit the amount of disk space that is used on a filesystem to avoid filling up the partition. Refer to How to Set Filesystem Quotas on Ubuntu 18.04 for instructions, which apply to Ubuntu 18.04 and later.
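
As an illustrative sketch only (the /raid mount point and the someuser account are hypothetical examples; refer to the linked guide for full details), user quotas on an ext4 data filesystem can be enabled along these lines:

$ sudo apt install quota
# Add the usrquota option to the filesystem's entry in /etc/fstab, then:
$ sudo mount -o remount /raid
$ sudo quotacheck -ugm /raid     # build the quota index files
$ sudo quotaon -v /raid          # enable quota enforcement
# Limits are in 1 KiB blocks: ~100 GiB soft and ~110 GiB hard for someuser
$ sudo setquota -u someuser 104857600 115343360 0 0 /raid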

9.4. Running Workloads on Systems with Mixed Types of GPUs

The DGX Station A100 comes equipped with four high performance NVIDIA A100 GPUs and one DGX Display GPU. The NVIDIA A100 GPU is used to run high performance and AI workloads, and the DGX Display card is used to drive a high-quality display on a monitor.

When running applications on this system, it is important to identify the best method to launch applications and workloads to make sure the high performance NVIDIA A100 GPUs are used. You can achieve this in one of the ways described in the following subsections.
When you log in to the system and check which GPUs are available, you find the following:
lab@ro-dvt-058-80gb:~$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
GPU 1: Graphics Device (UUID: GPU-0f2dff15-7c85-4320-da52-d3d54755d182)
GPU 2: Graphics Device (UUID: GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5)
GPU 3: DGX Display (UUID: GPU-91b9d8c8-e2b9-6264-99e0-b47351964c52)
GPU 4: Graphics Device (UUID: GPU-e32263f2-ae07-f1db-37dc-17d1169b09bf)

A total of five GPUs are listed by nvidia-smi. This is because nvidia-smi includes the DGX Display GPU that drives the monitor and high-quality graphics output.

When running an application or workload, the DGX Display GPU can get in the way because it does not have direct NVLink connectivity, sufficient memory, or the performance characteristics of the NVIDIA A100 GPUs that are installed on the system. As a result, you should ensure that the correct GPUs are being used.

9.4.1. Running with Docker Containers

On DGX OS, Docker is already configured to identify the high performance NVIDIA A100 GPUs and assign them to containers, so this is the simplest method.

A simple test is to run a small container with the --gpus all flag and then run nvidia-smi inside it. The output shows that only the high-performance GPUs are available to the container:
lab@ro-dvt-058-80gb:~$ docker run --gpus all --rm -it ubuntu nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
GPU 1: Graphics Device (UUID: GPU-0f2dff15-7c85-4320-da52-d3d54755d182)
GPU 2: Graphics Device (UUID: GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5)
GPU 3: Graphics Device (UUID: GPU-e32263f2-ae07-f1db-37dc-17d1169b09bf)
This step will also work when the --gpus n flag is used, where n can be 1, 2, 3, or 4. These values represent the number of GPUs that should be assigned to that container. For example:
lab@ro-dvt-058-80gb:~ $ docker run --gpus 2 --rm -it ubuntu nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
GPU 1: Graphics Device (UUID: GPU-0f2dff15-7c85-4320-da52-d3d54755d182)
In this example, Docker selected the first two GPUs to run the container, but if the device option is used, you can specify which GPUs to use:
lab@ro-dvt-058-80gb:~$ docker run --gpus '"device=GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5,GPU-e32263f2-ae07-f1db-37dc-17d1169b09bf"' --rm -it ubuntu nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5)
GPU 1: Graphics Device (UUID: GPU-e32263f2-ae07-f1db-37dc-17d1169b09bf)

In this example, the two GPUs that were not used earlier are now assigned to run on the container.

9.4.2. Running on Bare Metal

To run applications by using the four high performance GPUs, the CUDA_VISIBLE_DEVICES variable must be specified before you run the application.

Note: This method does not use containers.

CUDA orders the GPUs by performance, so GPU 0 will be the highest performing GPU, and the last GPU will be the slowest GPU.

Important: If the CUDA_DEVICE_ORDER variable is set to PCI_BUS_ID, this ordering will be overridden.
In the following example, a CUDA application that comes with CUDA samples is run. In the output, GPU 0 is the fastest in a DGX Station A100, and GPU 4 (DGX Display GPU) is the slowest:
lab@ro-dvt-058-80gb:~$ sudo apt install cuda-samples-11-2
lab@ro-dvt-058-80gb:~$ cd /usr/local/cuda-11.2/samples/1_Utilities/p2pBandwidthLatencyTest
lab@ro-dvt-058-80gb:/usr/local/cuda-11.2/samples/1_Utilities/p2pBandwidthLatencyTest$ sudo make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc  -m64    --threads 
0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 
-gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 
-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 
-gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 
-gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 
-gencode arch=compute_86,code=compute_86 -o p2pBandwidthLatencyTest.o -c p2pBandwidthLatencyTest.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      
-gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 
-gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 
-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 
-gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 
-gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 
-gencode arch=compute_86,code=compute_86 -o p2pBandwidthLatencyTest p2pBandwidthLatencyTest.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp p2pBandwidthLatencyTest ../../bin/x86_64/linux/release
lab@ro-dvt-058-80gb:/usr/local/cuda-11.2/samples/1_Utilities/p2pBandwidthLatencyTest $ cd /usr/local/cuda-11.2/samples/bin/x86_64/linux/release
lab@ro-dvt-058-80gb:/usr/local/cuda-11.2/samples/bin/x86_64/linux/release $ ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Graphics Device, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, Graphics Device, pciBusID: 47, pciDeviceID: 0, pciDomainID:0
Device: 2, Graphics Device, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device: 3, Graphics Device, pciBusID: c2, pciDeviceID: 0, pciDomainID:0
Device: 4, DGX Display, pciBusID: c1, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
Device=0 CANNOT Access Peer Device=4
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=3
Device=1 CANNOT Access Peer Device=4
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=2 CANNOT Access Peer Device=4
Device=3 CAN Access Peer Device=0
Device=3 CAN Access Peer Device=1
Device=3 CAN Access Peer Device=2
Device=3 CANNOT Access Peer Device=4
Device=4 CANNOT Access Peer Device=0
Device=4 CANNOT Access Peer Device=1
Device=4 CANNOT Access Peer Device=2
Device=4 CANNOT Access Peer Device=3


***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1     2     3     4
     0	     1     1     1     1     0
     1	     1     1     1     1     0
     2	     1     1     1     1     0
     3	     1     1     1     1     0
     4	     0     0     0     0     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4
     0 1323.03  15.71  15.37  16.81  12.04
     1  16.38 1355.16  15.47  15.81  11.93
     2  16.25  15.85 1350.48  15.87  12.06
     3  16.14  15.71  16.80 1568.78  11.75
     4  12.61  12.47  12.68  12.55 140.26
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3      4
     0 1570.35  93.30  93.59  93.48  12.07
     1  93.26 1583.08  93.55  93.53  11.93
     2  93.44  93.58 1584.69  93.34  12.05
     3  93.51  93.55  93.39 1586.29  11.79
     4  12.68  12.54  12.75  12.51 140.26
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4
     0 1588.71  19.60  19.26  19.73  16.53
     1  19.59 1582.28  19.85  19.13  16.43
     2  19.53  19.39 1583.88  19.61  16.58
     3  19.51  19.11  19.58 1592.76  15.90
     4  16.36  16.31  16.39  15.80 139.42
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4
     0 1590.33 184.91 185.37 185.45  16.46
     1 185.04 1587.10 185.19 185.21  16.37
     2 185.15 185.54 1516.25 184.71  16.47
     3 185.55 185.32 184.86 1589.52  15.71
     4  16.26  16.28  16.16  15.69 139.43
P2P=Disabled Latency Matrix (us)
   GPU     0      1      2      3      4
     0   3.53  21.60  22.22  21.38  12.46
     1  21.61   2.62  21.55  21.65  12.34
     2  21.57  21.54   2.61  21.55  12.40
     3  21.57  21.54  21.58   2.51  13.00
     4  13.93  12.41  21.42  21.58   1.14

   CPU     0      1      2      3      4
     0   4.26  11.81  13.11  12.00  11.80
     1  11.98   4.11  11.85  12.19  11.89
     2  12.07  11.72   4.19  11.82  12.49
     3  12.14  11.51  11.85   4.13  12.04
     4  12.21  11.83  12.11  11.78   4.02
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1      2      3      4
     0   3.79   3.34   3.34   3.37  13.85
     1   2.53   2.62   2.54   2.52  12.36
     2   2.55   2.55   2.61   2.56  12.34
     3   2.58   2.51   2.51   2.53  14.39
     4  19.77  12.32  14.75  21.60   1.13

   CPU     0      1      2      3      4
     0   4.27   3.63   3.65   3.59  13.15
     1   3.62   4.22   3.61   3.62  11.96
     2   3.81   3.71   4.35   3.73  12.15
     3   3.64   3.61   3.61   4.22  12.06
     4  12.32  11.92  13.30  12.03   4.05

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

The example above shows the peer-to-peer bandwidth and latency test across all five GPUs, including the DGX Display GPU. The application also shows that there is no peer-to-peer connectivity between any GPU and GPU 4. This indicates that GPU 4 should not be used for high-performance workloads.

Run the example one more time by using the CUDA_VISIBLE_DEVICES variable, which limits the number of GPUs that the application can see.

Note: All GPUs can communicate with all other peer devices.
lab@ro-dvt-058-80gb: /usr/local/cuda-11.2/samples/bin/x86_64/linux/release$  CUDA_VISIBLE_DEVICES=0,1,2,3 ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Graphics Device, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, Graphics Device, pciBusID: 47, pciDeviceID: 0, pciDomainID:0
Device: 2, Graphics Device, pciBusID: 81, pciDeviceID: 0, pciDomainID:0
Device: 3, Graphics Device, pciBusID: c2, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=0 CAN Access Peer Device=2
Device=0 CAN Access Peer Device=3
Device=1 CAN Access Peer Device=0
Device=1 CAN Access Peer Device=2
Device=1 CAN Access Peer Device=3
Device=2 CAN Access Peer Device=0
Device=2 CAN Access Peer Device=1
Device=2 CAN Access Peer Device=3
Device=3 CAN Access Peer Device=0
Device=3 CAN Access Peer Device=1
Device=3 CAN Access Peer Device=2

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0     1     2     3
     0	     1     1     1     1
     1	     1     1     1     1
     2	     1     1     1     1
     3	     1     1     1     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3
     0 1324.15  15.54  15.62  15.47
     1  16.55 1353.99  15.52  16.23
     2  15.87  17.26 1408.93  15.91
     3  16.33  17.31  18.22 1564.06
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3
     0 1498.08  93.30  93.53  93.48
     1  93.32 1583.08  93.54  93.52
     2  93.55  93.60 1583.08  93.36
     3  93.49  93.55  93.28 1576.69
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3
     0 1583.08  19.92  20.47  19.97
     1  20.74 1586.29  20.06  20.22
     2  20.08  20.59 1590.33  20.01
     3  20.44  19.92  20.60 1589.52
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3
     0 1592.76 184.88 185.21 185.30
     1 184.99 1589.52 185.19 185.32
     2 185.28 185.30 1585.49 185.01
     3 185.45 185.39 184.84 1587.91
P2P=Disabled Latency Matrix (us)
   GPU     0      1      2      3
     0   2.38  21.56  21.61  21.56
     1  21.70   2.34  21.54  21.56
     2  21.55  21.56   2.41  21.06
     3  21.57  21.34  21.56   2.39

   CPU     0      1      2      3
     0   4.22  11.99  12.71  12.09
     1  11.86   4.09  12.00  11.71
     2  12.52  11.98   4.27  12.24
     3  12.22  11.75  12.19   4.25
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0      1      2      3
     0   2.32   2.57   2.55   2.59
     1   2.55   2.32   2.59   2.52
     2   2.59   2.56   2.41   2.59
     3   2.57   2.55   2.56   2.40

   CPU     0      1      2      3
     0   4.24   3.57   3.72   3.81
     1   3.68   4.26   3.75   3.63
     2   3.79   3.75   4.34   3.71
     3   3.72   3.64   3.66   4.32

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
For bare metal applications, the UUID can also be specified in the CUDA_VISIBLE_DEVICES variable as shown below:
lab@ro-dvt-058-80gb:/usr/local/cuda-11.2/samples/bin/x86_64/linux/release $ CUDA_VISIBLE_DEVICES=GPU-0f2dff15-7c85-4320-da52-d3d54755d182,GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5 ./p2pBandwidthLatencyTest

The GPU specification is longer because of the nature of UUIDs, but this is the most precise way to pin specific GPUs to the application.

9.4.3. Using Multi-Instance GPUs

Multi-Instance GPU (MIG) is available on NVIDIA A100 GPUs. If MIG is enabled on the GPUs, and if the GPUs have already been partitioned, then applications can be limited to run on these devices.

This works both for Docker containers and on bare metal using the CUDA_VISIBLE_DEVICES variable, as shown in the examples below. For instructions on how to configure and use MIG, refer to the NVIDIA Multi-Instance GPU User Guide.

Identify the MIG instances that will be used. Here is the output from a system that has GPU 0 partitioned into seven MIG instances:

lab@ro-dvt-058-80gb:~$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
  MIG 1g.10gb Device 0: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0)
  MIG 1g.10gb Device 1: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/8/0)
  MIG 1g.10gb Device 2: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/9/0)
  MIG 1g.10gb Device 3: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/11/0)
  MIG 1g.10gb Device 4: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/12/0)
  MIG 1g.10gb Device 5: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/13/0)
  MIG 1g.10gb Device 6: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/14/0)
GPU 1: Graphics Device (UUID: GPU-0f2dff15-7c85-4320-da52-d3d54755d182)
GPU 2: Graphics Device (UUID: GPU-dc598de6-dd4d-2f43-549f-f7b4847865a5)
GPU 3: DGX Display (UUID: GPU-91b9d8c8-e2b9-6264-99e0-b47351964c52)
GPU 4: Graphics Device (UUID: GPU-e32263f2-ae07-f1db-37dc-17d1169b09bf)

For Docker, specify the MIG UUID from this output. In the following example, GPU 0, Device 0 is selected.

If you are running on DGX Station A100, restart the nv-docker-gpus and docker system services any time MIG instances are created, destroyed, or modified by running the following commands:
lab@ro-dvt-058-80gb:~$ sudo systemctl restart nv-docker-gpus; sudo systemctl restart docker

nv-docker-gpus has to be restarted on DGX Station A100 because this service is used to mask the available GPUs that can be used by Docker. When the set of available GPU devices changes, the service needs to be refreshed.

lab@ro-dvt-058-80gb:~$ docker run --gpus '"device=MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0"' --rm -it ubuntu nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-269d95f8-328a-08a7-5985-ab09e6e2b751)
  MIG 1g.10gb Device 0: (UUID: MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0)

On bare metal, specify the MIG instances:

Remember: This application measures the communication across GPUs, so bandwidth and latency results with only a single MIG instance are not meaningful.

The purpose of this example is to illustrate how to pin an application to a specific GPU instance, as shown below.

lab@ro-dvt-058-80gb: /usr/local/cuda-11.2/samples/bin/x86_64/linux/release$ CUDA_VISIBLE_DEVICES=MIG-GPU-269d95f8-328a-08a7-5985-ab09e6e2b751/7/0 ./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, Graphics Device MIG 1g.10gb, pciBusID: 1, pciDeviceID: 0, pciDomainID:0

***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.

P2P Connectivity Matrix
     D\D     0
     0	     1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0
     0 176.20
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0
     0 187.87
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0
     0 190.77
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
   D\D     0
     0 190.53
P2P=Disabled Latency Matrix (us)
   GPU     0
     0   3.57

   CPU     0
     0   4.07
P2P=Enabled Latency (P2P Writes) Matrix (us)
   GPU     0
     0   3.55

   CPU     0
     0   4.07

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

9.5. Updating the containerd Override File

When you add MIG instances, the containerd override file is not automatically updated, and the new MIG instances that you add are not added to the allow list.

When DGX Station A100 starts, after the nv-docker-gpus service runs, a containerd override file is created in the /etc/systemd/system/containerd.service.d/ directory.

Note: This file blocks Docker from using the display GPU.
Here is an example of an override file:
[Service]
DeviceAllow=/dev/nvidia1
DeviceAllow=/dev/nvidia2
DeviceAllow=/dev/nvidia3
DeviceAllow=/dev/nvidia4
DeviceAllow=/dev/nvidia-caps/nvidia-cap1
DeviceAllow=/dev/nvidia-caps/nvidia-cap2
DeviceAllow=/dev/nvidiactl
DeviceAllow=/dev/nvidia-modeset
DeviceAllow=/dev/nvidia-uvm
DeviceAllow=/dev/nvidia-uvm-tools

The service can only add devices of which it is aware. To ensure that your new MIG instances are added to the allow list, complete the following steps:

  1. To refresh the override file, run the following commands:
    colossus@ro-evt-038-80gb:~$ sudo systemctl restart nv-docker-gpus
    colossus@ro-evt-038-80gb:~$ sudo systemctl restart docker
  2. Verify that your new MIG instances are now allowed in containers.
    Here is example nvidia-smi output that shows the MIG instance is available:
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Graphics Device     On   | 00000000:C2:00.0 Off |                   On |
    | N/A   32C    P0    65W / 275W |                  N/A |     N/A      Default |
    |                               |                      |              Enabled |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | MIG devices:                                                                |
    +------------------+----------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
    |      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
    |                  |                      |        ECC|                       |
    |==================+======================+===========+=======================|
    |  0    0   0   0  |      0MiB / 81252MiB | 98      0 |  7   0    5    1    1 |
    |                  |      1MiB / 13107... |           |                       |
    +------------------+----------------------+-----------+-----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    

10. Data Storage Configuration

By default, the DGX system includes several drives in a RAID 0 configuration. These drives are intended for application caching, so you must set up your own NFS storage for long-term data storage.

10.1. Using Data Storage for NFS Caching

This section provides information about how you can use data storage for NFS caching.

The DGX systems use cachefilesd to manage NFS caching.

10.1.1. Using cachefilesd

Here are the steps that describe how you can mount the NFS on the DGX system, and how you can cache the NFS by using the DGX SSDs for improved performance.

  • Ensure that you have an NFS server with one or more exports that contain data that the DGX system will access.
  • Ensure that there is network access between the DGX system and the NFS server.
  1. Configure an NFS mount for the DGX system.
    1. Edit the filesystem tables configuration.
      $ sudo vi /etc/fstab
    2. Add a new line for the NFS mount that uses /mnt as the local mount point.
      <nfs_server>:<export_path> /mnt nfs rw,noatime,rsize=32768,wsize=32768,nolock,tcp,intr,fsc,nofail 0 0

      Here, /mnt is used as an example mount point.

      • Contact your Network Administrator for the correct values for <nfs_server> and <export_path>.
      • The nfs arguments presented here are a list of recommended values based on typical use cases.

        However, fsc must always be included because that argument specifies using FS-Cache.

    3. Save the changes.
  2. Verify that the NFS server is reachable.
    $ ping <nfs_server>

    Use the server IP address or the server name that was provided by your network administrator.

  3. Mount the NFS export.
    $ sudo mount /mnt

    /mnt is an example mount point.

  4. Verify that caching is enabled.
    $ cat /proc/fs/nfsfs/volumes
  5. In the output, find FSC=yes.

    The NFS will be automatically mounted and cached on the DGX system in subsequent reboot cycles.
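
    You can also confirm that the caching daemon itself is running:

    $ systemctl status cachefilesd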

10.1.2. Disabling cachefilesd

Here is some information about how to disable cachefilesd.

If you do not want to use cachefilesd, stop and disable it by running:

$ sudo systemctl stop cachefilesd
$ sudo systemctl disable cachefilesd

10.2. Changing the RAID Configuration for Data Drives

Here is information that describes how to change the RAID configuration for your data drives.

CAUTION:
You must have a minimum of two drives to complete these tasks.

From the factory, the RAID level of the DGX RAID array is RAID 0. This level provides the maximum storage capacity, but it does not provide redundancy. If one SSD in the array fails, the data that is stored on the array is lost. If you are willing to accept reduced capacity in return for a level of protection against drive failure, you can change the level of the RAID array to RAID 5.

Remember: If you change the RAID level from RAID 0 to RAID 5, the total storage capacity of the RAID array is reduced.

Before you change the RAID level of the DGX RAID array, back up the data on the array that you want to preserve. When you change the RAID level of the DGX RAID array, the data that is stored on the array is erased.

You can use the configure_raid_array.py custom script, which is installed on the system, to change the level of the RAID array without unmounting the RAID volume.

  • To change the RAID level to RAID 5, run the following command:
    $ sudo configure_raid_array.py -m raid5

    After you change the RAID level to RAID 5, the RAID array is rebuilt. Although a RAID array that is being rebuilt is online and ready to be used, a check on the health of the DGX system reports the status of the RAID volume as unhealthy. The time required to rebuild the RAID array depends on the workload on the system. For example, on an idle system, the rebuild might be completed in 30 minutes. A progress-check sketch appears after this list.

  • To change the RAID level to RAID 0, run the following command:
    $ sudo configure_raid_array.py -m raid0

    To confirm that the RAID level was changed, run the lsblk command. The entry in the TYPE column for each drive in the RAID array indicates the RAID level of the array.
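
One way to watch the rebuild progress mentioned above is to read /proc/mdstat (this assumes the data array is managed by the Linux md driver):

$ cat /proc/mdstat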

11. Running NGC Containers

This section provides information about how to run NGC containers with your DGX system.

11.1. Obtaining an NGC Account

Here is some information about how you can obtain an NGC account.

NVIDIA NGC provides simple access to GPU-optimized software for deep learning, machine learning, and high-performance computing (HPC). An NGC account grants you access to these tools and gives you the ability to set up a private registry to manage your customized software.

If you are the organization administrator for your DGX system purchase, work with NVIDIA Enterprise Support to set up an NGC enterprise account. Refer to the NGC Private Registry User Guide for more information about getting an NGC enterprise account.

11.2. Running NGC Containers with GPU Support

To obtain the best performance when running NGC containers on DGX systems, you can use one of the following methods to provide GPU support for Docker containers:

  • Native GPU support (included in Docker 19.03 and later, which is installed on the system)
  • NVIDIA Container Runtime for Docker

    This is in the nvidia-docker2 package.

    The recommended method for DGX OS 5 is native GPU support. To run GPU-enabled containers, run docker run --gpus.

    Here is an example that uses all GPUs:
    $ docker run --gpus all …
    Here is an example that uses 2 GPUs:
    $ docker run --gpus 2 …
    Here is an example that uses specific GPUs:
    • $ docker run --gpus '"device=1,2"' ...
    • $ docker run --gpus '"device=UUID-ABCDEF,1"' ...
  • Refer to Running Containers for more information about running NGC containers on MIG devices.

A. Installing the Software on Air-Gapped DGX Systems

For security purposes, some installations require that systems be isolated from the internet or outside networks.

An air-gapped system is not connected to an unsecured network, such as the public Internet, to an unsecured LAN, or to other computers that are connected to an unsecured network. The default mechanisms to update software on DGX systems and loading container images from the NGC Container Registry require an Internet connection. On an air-gapped system, which is isolated from the Internet, you must provide alternative mechanisms to update software and load container images.

Since most DGX software updates are completed through an over-the-network process with NVIDIA servers, this section explains how updates can be made when using an over-the-network method is not an option. It also includes a process to install Docker containers.

Here are the methods you can use:
  • Download the ISO image, copy it to removable media and then reimage the DGX System from the media.

    This method is available only for software versions that are available as ISO images for download. For details, see Installing the DGX OS (Reimaging the System).

  • Update the DGX software by performing a network update from a local repository.

    This method is available only for software versions that are available for over-the-network updates.

A.1. Creating a Local Mirror of the NVIDIA and Canonical Repositories

Here are the steps to download the necessary packages to create a mirror of the repositories that are needed to update NVIDIA DGX systems. For more information on DGX OS versions and the release notes available, refer to DGX OS Server Release Number Scheme.
Note: These procedures apply only to upgrades in the same major release, such as from 5.x to 5.y. The steps do not support upgrades across major releases, such as from 4.x to 5.x.
  1. Identify the sources that correspond to the public NVIDIA and Canonical repositories that provide updates to the DGX OS.

    You can identify these sources from the /etc/apt/sources.list file and the contents of the /etc/apt/sources.list.d/ directory, or by using System Settings > Software & Updates.

  2. Create and maintain a private mirror of the repository sources that you identified in the previous step.
  3. Update the sources that provide updates to the DGX system to use your private repository mirror instead of the public repositories.

    To update these sources, modify the /etc/apt/sources.list file and the contents of the /etc/apt/sources.list.d/ directory.

A.2. Creating the Mirror in a DGX OS 5 System

The instructions in this section are to be performed on a system with network access.

The following are the prerequisites.
  • A system installed with Ubuntu OS is needed to create the mirror because there are several Ubuntu tools that need to be used.
  • You must be logged in to the system installed with Ubuntu OS as an administrator user because this procedure requires sudo privileges.
  • The system must contain enough storage space to replicate the repositories to a file system. The space requirement could be as high as 250 GB.
  • An efficient way to move large amounts of data is needed, for example, shared storage in a DMZ or portable USB drives that can be brought into the air-gapped area.

    The data will need to be moved to the systems that need to be updated. Make sure that any portable drives are formatted using ext4 or FAT32.

  1. Ensure that the storage device is attached to the system with network access and identify the mount point of the device.
    Here is a sample mount point that was used in these instructions:
    /media/usb/repository
  2. Install the apt-mirror package.
    $ sudo apt update 
    $ sudo apt install apt-mirror
  3. Change the ownership of the target directory to the apt-mirror user in the apt-mirror group.
    $ sudo chown apt-mirror:apt-mirror /media/usb/repository

    The target directory must be owned by the user apt-mirror or the replication will not work.

  4. Configure the path of the destination directory in /etc/apt/mirror.list and use the included list of repositories below to retrieve the packages for both Ubuntu base OS and the NVIDIA DGX OS packages.
    ############# config ################## 
    # 
    set base_path /media/usb/repository #/your/path/here 
    # 
    # set mirror_path $base_path/mirror 
    # set skel_path $base_path/skel 
    # set var_path $base_path/var 
    # set cleanscript $var_path/clean.sh 
    # set defaultarch <running host architecture> 
    # set postmirror_script $var_path/postmirror.sh 
    set run_postmirror 0 
    set nthreads 20 
    set _tilde 0 
    # 
    ############# end config ############## 
    # Standard Canonical package repositories: 
    deb http://security.ubuntu.com/ubuntu focal-security main multiverse universe restricted
    deb http://archive.ubuntu.com/ubuntu/ focal main multiverse universe restricted
    deb http://archive.ubuntu.com/ubuntu/ focal-updates main multiverse universe restricted
    # 
    deb-i386 http://security.ubuntu.com/ubuntu focal-security main multiverse universe restricted
    deb-i386 http://archive.ubuntu.com/ubuntu/ focal main multiverse universe restricted
    deb-i386 http://archive.ubuntu.com/ubuntu/ focal-updates main multiverse universe restricted
    # 
    # CUDA specific repositories: 
    deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /
    #
    # DGX specific repositories: 
    deb http://repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/ focal common dgx
    deb http://repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/ focal-updates common dgx
    # 
    deb-i386 http://repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/ focal common dgx
    deb-i386 http://repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/ focal-updates common dgx
    # Clean unused items 
    clean http://archive.ubuntu.com/ubuntu 
    clean http://security.ubuntu.com/ubuntu
  5. Run apt-mirror and wait for it to finish downloading content.

    This will take a long time depending on the network connection speed.

    $ sudo apt-mirror
  6. Eject the removable storage with all packages.
    $ sudo eject /media/usb/repository 

A.3. Configuring the Target Air-Gapped DGX OS 5 System

Here are the steps that explain how you can configure a target air-gapped DGX OS 5 system.

The instructions in this section are to be performed on the target air-gapped DGX system.

The following are the prerequisites.
  • The target air-gapped DGX system is installed, has gone through the first boot process, and is ready to be updated with the latest packages.
  • The USB storage device on which the mirrors were created is attached to the target DGX system.

    There are other ways to transfer the data that are not covered in this document as they will depend on the data center policies for the air-gapped environment.

  1. Mount the storage device on the air-gapped system to /media/usb/repository for consistency.
  2. Configure the apt command to use the file system as the repository in the file /etc/apt/sources.list by modifying the following lines.
    deb file:///media/usb/repository/mirror/security.ubuntu.com/ubuntu focal-security main multiverse universe restricted
    deb file:///media/usb/repository/mirror/archive.ubuntu.com/ubuntu/ focal main multiverse universe restricted
    deb file:///media/usb/repository/mirror/archive.ubuntu.com/ubuntu/ focal-updates main multiverse universe restricted
  3. Configure apt to use the NVIDIA DGX OS packages in the /etc/apt/sources.list.d/dgx.list file.
    deb file:///media/usb/repository/mirror/repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/ focal common dgx
    deb file:///media/usb/repository/mirror/repo.download.nvidia.com/baseos/ubuntu/focal/x86_64/ focal-updates common dgx
  4. Configure apt to use the NVIDIA CUDA packages in the /etc/apt/sources.list.d/cuda-compute-repo.list file.

    On DGX Station:

    deb file:///media/usb/repository/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /
    On DGX Server:
    deb file:///media/usb/repository/mirror/developer.download.nvidia.com/compute
  5. Update the apt repository.
    $ sudo apt update

    Output from this command is similar to the following example.

    Get:1 file:/media/usb/repository/mirror/security.ubuntu.com/ubuntu focal-security InRelease [107 kB]
    Get:2 file:/media/usb/repository/mirror/archive.ubuntu.com/ubuntu focal InRelease [265 kB]
    Get:3 file:/media/usb/repository/mirror/archive.ubuntu.com/ubuntu focal-updates InRelease [111 kB]
    Get:4 file:/media/usb/repository/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
    Get:5 file:/media/usb/repository/mirror/repo.download.nvidia.com/baseos/ubuntu/focal/x86_64 focal InRelease [12.5 kB]
    Get:6 file:/media/usb/repository/mirror/repo.download.nvidia.com/baseos/ubuntu/focal/x86_64 focal-updates InRelease [12.4 kB]
    Get:7 file:/media/usb/repository/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release [697 B]
    Get:8 file:/media/usb/repository/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Release.gpg [836 B]
    Reading package lists... Done
    
  6. Upgrade the system using the newly configured local repositories.
    $ sudo apt full-upgrade

B. Third-Party License Notices

This NVIDIA product contains third party software that is being made available to you under their respective open source software licenses. Some of those licenses also require specific legal information to be included in the product. This section provides such information.

B.1. msecli

The msecli utility is provided under the following terms:
Micron Technology, Inc. Software License Agreement
PLEASE READ THIS LICENSE AGREEMENT ("AGREEMENT") FROM MICRON TECHNOLOGY, INC. ("MTI") CAREFULLY: BY INSTALLING, COPYING OR OTHERWISE USING THIS SOFTWARE AND ANY RELATED PRINTED MATERIALS ("SOFTWARE"), YOU ARE ACCEPTING AND AGREEING TO THE TERMS OF THIS AGREEMENT. IF YOU DO NOT AGREE WITH THE TERMS OF THIS AGREEMENT, DO NOT INSTALL THE SOFTWARE.

LICENSE: MTI hereby grants to you the following rights: You may use and make one (1) backup copy of the Software subject to the terms of this Agreement. You must maintain all copyright notices on all copies of the Software. You agree not to modify, adapt, decompile, reverse engineer, disassemble, or otherwise translate the Software. MTI may make changes to the Software at any time without notice to you. In addition MTI is under no obligation whatsoever to update, maintain, or provide new versions or other support for the Software.

OWNERSHIP OF MATERIALS: You acknowledge and agree that the Software is proprietary property of MTI (and/or its licensors) and is protected by United States copyright law and international treaty provisions. Except as expressly provided herein, MTI does not grant any express or implied right to you under any patents, copyrights, trademarks, or trade secret information. You further acknowledge and agree that all right, title, and interest in and to the Software, including associated proprietary rights, are and shall remain with MTI (and/or its licensors). This Agreement does not convey to you an interest in or to the Software, but only a limited right to use and copy the Software in accordance with the terms of this Agreement. The Software is licensed to you and not sold.

DISCLAIMER OF WARRANTY: THE SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. MTI EXPRESSLY DISCLAIMS ALL WARRANTIES EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, NONINFRINGEMENT OF THIRD PARTY RIGHTS, AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. MTI DOES NOT WARRANT THAT THE SOFTWARE WILL MEET YOUR REQUIREMENTS, OR THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE. FURTHERMORE, MTI DOES NOT MAKE ANY REPRESENTATIONS REGARDING THE USE OR THE RESULTS OF THE USE OF THE SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY, OR OTHERWISE. THE ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE REMAINS WITH YOU. IN NO EVENT SHALL MTI, ITS AFFILIATED COMPANIES OR THEIR SUPPLIERS BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, INCIDENTAL, OR SPECIAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF YOUR USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF MTI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Because some jurisdictions prohibit the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you.

TERMINATION OF THIS LICENSE: MTI may terminate this license at any time if you are in breach of any of the terms of this Agreement. Upon termination, you will immediately destroy all copies of the Software.

GENERAL: This Agreement constitutes the entire agreement between MTI and you regarding the subject matter hereof and supersedes all previous oral or written communications between the parties. This Agreement shall be governed by the laws of the State of Idaho without regard to its conflict of laws rules.

CONTACT: If you have any questions about the terms of this Agreement, please contact MTI's legal department at (208) 368-4500. By proceeding with the installation of the Software, you agree to the terms of this Agreement. You must agree to the terms in order to install and use the Software.

B.2. Mellanox (OFED)

MLNX OFED (http://www.mellanox.com/) is provided under the following terms:
Copyright (c) 2006 Mellanox Technologies.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Notices

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.

No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.

Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.

THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks

NVIDIA, the NVIDIA logo, DGX, DGX-1, DGX-2, DGX A100, DGX Station, and DGX Station A100 are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.