Base OS - DGX OS 5

Introduction

DGX™ OS provides a customized installation of Ubuntu Linux with platform-specific configurations, additional drivers, and diagnostic and monitoring tools. It provides a stable, fully tested, and supported OS for running AI, machine learning, and analytics applications on DGX systems.

NVIDIA® DGX systems ship with DGX OS preinstalled to provide a turnkey solution for running AI and analytics workloads. Basic system configuration is deferred to a setup wizard that runs on first boot, which gives users a fast onboarding experience with DGX systems.

DGX OS is released both as an ISO image and as packages available from software repositories over the internet. The ISO image includes an autonomous installer for reimaging a DGX system. Alternatively, users can install Ubuntu and the DGX Software Stack manually. This offers more flexibility, such as defining custom partition schemes, but requires more expertise. Cluster deployments also benefit from the manual method because it can take advantage of Ubuntu’s standardized, automated, non-interactive installation process.

The following are the key features of DGX OS Release 5:

  • Based on Ubuntu 20.04 LTS

  • Includes Extended Security Maintenance updates from Ubuntu

  • Common ISO for all DGX systems

  • Option to manually install Ubuntu and the DGX Software Stack

  • NVIDIA System Management (NVSM): provides active health monitoring and system alerts for NVIDIA DGX nodes in a data center, as well as simple commands for checking the health of DGX systems from the command line.

  • Data Center GPU Manager (DCGM): enables node-wide administration of GPUs and can be used for cluster-level and data-center-level management.

  • DGX system-specific support packages

  • NVIDIA GPU driver, CUDA toolkit, and domain-specific libraries

  • Docker Engine

  • NVIDIA Container Toolkit

  • Cachefiles Daemon for caching NFS reads

  • Tools to convert data disks between RAID levels

  • Disk drive encryption and root filesystem encryption (optional) for added security

  • Mellanox OpenFabrics Enterprise Distribution for Linux (MOFED) and Mellanox Software Tools (MST) for systems with Mellanox network cards
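Because DGX OS 5 is based on Ubuntu 20.04 LTS, one quick sanity check after reimaging or a manual installation is to confirm the base release reported by the standard `/etc/os-release` file. The following is a minimal sketch, assuming DGX OS leaves Ubuntu's `os-release` file in place. The helper names `parse_os_release` and `is_ubuntu_2004` are hypothetical, not part of any DGX tool, and the parser ignores the backslash escapes that the file format allows.

```python
"""Hypothetical sketch: check that a host reports the Ubuntu 20.04 base of DGX OS 5."""
from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse os-release KEY=value lines into a dict (simplified: no escape handling)."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields


def is_ubuntu_2004(fields: Dict[str, str]) -> bool:
    """Return True if the parsed fields describe Ubuntu 20.04."""
    return fields.get("ID") == "ubuntu" and fields.get("VERSION_ID") == "20.04"


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        print(info.get("PRETTY_NAME", "unknown"), "->", is_ubuntu_2004(info))
    else:
        print("/etc/os-release not found")
```

The script uses `typing.Dict` rather than `dict[str, str]` so that it also runs on Python 3.8, the default Python version in Ubuntu 20.04.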

This document covers deployment and upgrade options for DGX OS. It also provides instructions for setting up the system and installing additional software.

Important

Before you upgrade or install any new software, always consult the Release Notes for the latest information about available upgrades. You can find out more about the release cadence and release methodologies for DGX OS in Release Information.

Here are links to some additional DGX documentation.

  • DGX Documentation

    All documentation for DGX products, including product user guides, software release notes, and firmware update container information.

  • MIG User Guide

    The Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate GPU instances.

  • NGC Private Registry

    How to access the NGC container registry for using containerized, GPU-accelerated deep learning applications.

  • NVSM Software User Guide

    Contains instructions for using the NVIDIA System Management (NVSM) software.

  • DCGM Software User Guide

    Contains instructions for using the Data Center GPU Manager software.

NVIDIA Enterprise Support is the support resource for DGX customers and can assist with hardware, software, or NGC application issues. For more information about how to obtain support, visit NVIDIA Enterprise Support.

© Copyright 2020-2023, NVIDIA. Last updated on Mar 24, 2023.