Introduction
DGX™ OS provides a customized installation of Ubuntu Linux with platform-specific configurations, additional drivers, and diagnostic and monitoring tools. It provides a stable, fully tested, and supported OS to run AI, machine learning, and analytics applications on DGX systems.
NVIDIA® DGX systems are shipped preinstalled with DGX OS to provide a turnkey solution for running AI and analytics workloads. Basic system configuration is deferred to a setup wizard on first boot, which gives users a fast onboarding experience.
DGX OS is released in the form of an ISO image and as packages that are available from software repositories over the internet. The ISO image includes an autonomous installer to reimage a DGX system. Users also have the option to install Ubuntu and the DGX Software Stack manually. This provides more flexibility, such as defining custom partition schemes, but requires more expertise. Cluster deployments also benefit from this installation method by taking advantage of Ubuntu’s standardized automated and non-interactive installation process.
The following are the key features of DGX OS Release 5:
Based on Ubuntu 20.04 LTS
Includes Extended Security Maintenance updates from Ubuntu
Common ISO for all DGX systems
Option to manually install Ubuntu and the DGX Software Stack
NVIDIA System Management (NVSM): Provides active health monitoring and system alerts for NVIDIA DGX nodes in a data center. It also provides simple commands to check the health of the DGX systems from the command line.
Data Center GPU Manager (DCGM): Enables node-wide administration of GPUs and can be used for cluster and data-center level management.
DGX system-specific support packages
NVIDIA GPU driver, CUDA toolkit, and domain specific libraries
Docker Engine
NVIDIA Container Toolkit
Cachefiles Daemon for caching NFS reads
Includes drive encryption for added security
Tools to convert data disks between RAID levels
Disk drive encryption and root filesystem encryption (optional)
Mellanox OpenFabrics Enterprise Distribution for Linux (MOFED) and Mellanox Software Tools (MST) for systems with Mellanox network cards
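Several of the components listed above ship command-line tools, so a quick way to see which ones are present on a node is to invoke them and record the result. The following is a minimal sketch, not an NVIDIA-provided utility. The helper names `run_check` and `run_all` are hypothetical, and the commands `nvsm show health`, `dcgmi discovery -l`, and `docker --version` are assumed to match your installed release; confirm them against the NVSM and DCGM user guides before relying on them.

```python
import shutil
import subprocess

# Commands are assumptions based on the tools named in the feature list;
# check the NVSM and DCGM user guides for the exact syntax in your release.
CHECKS = {
    "nvsm": ["nvsm", "show", "health"],     # NVSM health summary (typically needs root)
    "dcgm": ["dcgmi", "discovery", "-l"],   # DCGM listing of GPUs on this node
    "docker": ["docker", "--version"],      # Docker Engine version string
}


def run_check(name, command, timeout=120):
    """Run one command and return (status, output).

    status is 'missing' if the executable is not on PATH, 'ok' if the
    command exits with code 0, and 'failed' otherwise.
    """
    if shutil.which(command[0]) is None:
        return "missing", f"{command[0]} not found on PATH"
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "failed", f"{name} timed out after {timeout}s"
    except OSError as exc:
        return "failed", str(exc)
    status = "ok" if result.returncode == 0 else "failed"
    return status, (result.stdout or result.stderr).strip()


def run_all(checks=CHECKS):
    """Run every check and return a dict of name -> (status, output)."""
    return {name: run_check(name, cmd) for name, cmd in checks.items()}
```

Calling `run_all()` returns a dictionary that maps each check name to its status and captured output. A `missing` status means only that the executable was not found on `PATH`, which is expected on a machine that is not running DGX OS.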
This document covers deployment and upgrade options for DGX OS. It also provides instructions for setting up the system and installing additional software.
Initial DGX OS Setup
If your system is already running DGX OS 5, skip to Initial DGX OS Setup for instructions on setting up the system on first boot. Also read Upgrading DGX OS for information on upgrading the software to the latest versions.
Upgrading DGX OS
To upgrade DGX OS to the latest software versions, or to perform a release upgrade from DGX OS 4 to DGX OS 5, refer to Upgrading DGX OS.
Installing the DGX Software Stack
To install Ubuntu and the DGX Software Stack manually, refer to Installing the DGX Software Stack. That section also covers automating the installation process, for example, for cluster deployments.
Upgrading or Installing Additional Software
DGX OS and Ubuntu provide many additional software packages from the repositories, including additional NVIDIA software and driver options. Refer to Upgrading or Installing Additional Software for more information and installation instructions.
Before you upgrade or install any new software, always consult the Release Notes for the latest information about available upgrades. You can find out more about the release cadence and release methodologies for DGX OS in Release Information.
Here are links to some additional DGX documentation.
-
All documentation for DGX products, including product user guides, software release notes, firmware update container information
-
The new Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate
-
How to access the NGC container registry for using containerized deep learning GPU
-
Contains instructions for using
-
Contains instructions for using the Data Center GPU Manager software.
NVIDIA Enterprise Support is the support resource for DGX customers and can assist with hardware, software, or NGC application issues. For more information about how to obtain support, visit NVIDIA Enterprise Support.