Installing DGX OS

This section provides information about installing the DGX OS by reimaging the system from the DGX OS ISO image.

DGX OS is preinstalled on new DGX systems, so reimaging is required only in limited cases. You can skip to Initial DGX OS Setup for instructions about the initial setup of the system. To upgrade a system from DGX OS 4, refer to Upgrading DGX OS. You can also install Ubuntu and the DGX software manually, for example, if you require custom installation options such as a specific drive partition scheme. Refer to Installing the DGX Software Stack for more details. That section also describes automating the installation process, for example, for cluster deployments.

There are situations where you want to reimage a system, such as the following:

  • When you want to install the latest version on a new system.

  • When you need to install an older version.

  • When the OS becomes corrupt.

  • When the OS drive is replaced or both drives in a RAID-1 configuration are replaced.

  • When you want to encrypt the root filesystem.

  • When you want a fresh installation of DGX OS 5.

Warning

Reimaging the system erases all data stored on the OS drives. This includes the /home partition, where all users’ documents, software settings, and other personal files are stored. If you need to preserve data through the reimaging, you can move the files and documents to the /raid directory and install the DGX OS software with the option to preserve the RAID array content.

The reimage process does not change persistent hardware configurations such as MIG settings or data drive encryption.

Important

After completing the installation, refer to Upgrading DGX OS for information on upgrading the system to the latest software versions released since the DGX OS ISO release, including security updates.

Note

Before you begin, ensure that you have an active NVIDIA Enterprise Support account.

To ensure that you install the latest available version of DGX OS, obtain the latest ISO image file from NVIDIA Enterprise Support:

  1. Go to the Download Center.

  2. Click [Server/Workstation] -> [DGX], and select All Downloads for your system.

  3. Click on the download link for the latest ISO release to go to the announcement.

  4. Download the ISO image that is referenced in the announcement and save it to your local disk.

  5. Run the md5sum command to print the MD5 hash and compare it with the value in the announcement.

    $ md5sum DGXOS-5.0.0-2020-09-21-15-40-02.iso
    e4c77338ed35d7a34e772d8552e9d080  DGXOS-5.0.0-2020-09-21-15-40-02.iso
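The comparison in step 5 can also be scripted so that a mismatch is not overlooked. The following is a minimal sketch, not an NVIDIA-provided tool: verify_md5 is a hypothetical helper name, and the expected hash must be copied from the announcement for the release you downloaded.

```shell
# Sketch: compare a file's MD5 hash with a published value.
# verify_md5 is a hypothetical helper; md5sum is the tool used in step 5.
verify_md5() {
    file="$1"        # path to the downloaded ISO
    expected="$2"    # hash copied from the release announcement
    # md5sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches the published hash"
    else
        echo "MISMATCH: expected $expected but got $actual" >&2
        return 1
    fi
}

# Example, using the file name and hash shown above:
#   verify_md5 DGXOS-5.0.0-2020-09-21-15-40-02.iso e4c77338ed35d7a34e772d8552e9d080
```

The function returns a non-zero status on a mismatch, so it can gate later steps in a script.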

Install the DGX OS ISO image in one of the following ways:

Installing the DGX OS Image Remotely through the BMC

These instructions describe how to reimage the system remotely through the BMC. This method does not require a bootable USB flash drive or DVD-ROM. Instead, the DGX OS 5 ISO image that you obtained from NVIDIA Enterprise Support is mounted as virtual media from the system on which you run the BMC remote console.

  1. Log in to the BMC (refer to Connecting to the DGX System).

  2. Click [Remote Control] and then click [Launch KVM].

  3. Set up the ISO image as virtual media.

    1. From the top bar, click [Browse File], locate and select the DGX OS ISO file, and then click [Open].

    2. Click [Start Media].

  4. Reset the system and boot the virtual media image.

    1. From the top menu, click [Power] and select [Hard Reset], then click [Perform Action].

    2. Click [Yes] and then [OK] at the Power Control dialogs, then wait for the system to power down and then come back online.

    3. Refer to DGX OS ISO Boot Options for a description of the GRUB menu options and for instructions on completing the installation process.

Installing the DGX OS Image from a USB Flash Drive or DVD-ROM

After obtaining the DGX OS 5 ISO image from NVIDIA Enterprise Support, create a bootable installation medium, such as a USB flash drive or DVD-ROM, that contains the image.

Creating a Bootable USB Flash Drive by Using the dd Command

On a Linux system, you can use the dd command to create a bootable USB flash drive that contains the DGX OS software image.

Note

To ensure that the resulting flash drive is bootable, use the dd command to perform a device bit copy of the image. If you use other commands to perform a simple file copy of the image, the resulting flash drive may not be bootable.

Ensure that the following prerequisites are met:

  • The correct DGX OS software image is saved to your local disk.

    For more information see the Checksum File.

  • The USB flash drive meets the following requirements:

    • The USB flash drive has a capacity of at least 16 GB.

    • This requirement applies only to DGX A100: The partition scheme on the USB flash drive is a GPT partition scheme for UEFI.
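The capacity prerequisite can be checked from the command line before you write the image. This is a minimal sketch under stated assumptions: has_min_capacity and MIN_BYTES are hypothetical names, "16 GB" is interpreted as 16 × 1000³ bytes, and /dev/sdX stands for your USB flash drive.

```shell
# Sketch: check the USB flash drive capacity prerequisite.
# MIN_BYTES assumes that "16 GB" means 16 * 1000^3 bytes (an assumption).
MIN_BYTES=16000000000

# Succeeds when the size in bytes passed as $1 meets the minimum.
has_min_capacity() {
    [ "$1" -ge "$MIN_BYTES" ]
}

# Example (replace /dev/sdX with the device name of your drive):
#   size=$(lsblk -b -d -n -o SIZE /dev/sdX | tr -d ' ')
#   has_min_capacity "$size" && echo "capacity OK"
#   lsblk -d -n -o PTTYPE /dev/sdX    # prints "gpt" for a GPT partition table
```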

  1. Plug the USB flash drive into one of the USB ports of your Linux system.

  2. Obtain the device name of the USB flash drive by running the lsblk command.

    $ lsblk

    You can identify the USB flash drive from its size, which is much smaller than the size of the SSDs in the DGX system, and from the mount points of any partitions on the drive, which are under /media.

    In the following example, the device name of the USB flash drive is sde.

    $ lsblk
    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda       8:0    0  1.8T  0 disk
    |_sda1    8:1    0  121M  0 part /boot/efi
    |_sda2    8:2    0  1.8T  0 part /
    sdb       8:16   0  1.8T  0 disk
    |_sdb1    8:17   0  1.8T  0 part
    sdc       8:32   0  1.8T  0 disk
    sdd       8:48   0  1.8T  0 disk
    sde       8:64   1  7.6G  0 disk
    |_sde1    8:65   1  7.6G  0 part /media/deeplearner/DGXSTATION

  3. As root, convert and copy the image to the USB flash drive.

    $ sudo dd if=path-to-software-image bs=2048 of=usb-drive-device-name

    Warning

    The dd command erases all data on the device that you specify in the of option of the command. To avoid losing data, ensure that you specify the correct path to the USB flash drive.
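Because dd overwrites whatever device it is given, it can help to wrap the command in a guard that refuses any target that lsblk does not report as a removable whole disk. The following is a minimal sketch, not part of the DGX OS tooling: write_image and is_removable_disk are hypothetical helper names, and the conv=fsync and status=progress options assume GNU dd.

```shell
# Sketch: write the ISO to a USB drive only if the target is a removable whole disk.
# write_image and is_removable_disk are hypothetical helper names.

# Reads "NAME RM TYPE" lines (as printed by `lsblk -d -n -o NAME,RM,TYPE`) on stdin
# and succeeds only if device $1 is listed as a removable disk.
is_removable_disk() {
    awk -v dev="$1" '$1 == dev && $2 == "1" && $3 == "disk" { found = 1 }
                     END { exit !found }'
}

write_image() {
    image="$1"     # path to the DGX OS ISO image
    device="$2"    # whole-disk device, for example /dev/sde
    [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
    lsblk -d -n -o NAME,RM,TYPE | is_removable_disk "${device#/dev/}" ||
        { echo "$device is not a removable disk, refusing to write" >&2; return 1; }
    # Unmount partitions that the desktop may have auto-mounted under /media.
    sudo umount "$device"?* 2>/dev/null || true
    # bs=2048 matches the command above; conv=fsync flushes the data before dd exits.
    sudo dd if="$image" of="$device" bs=2048 conv=fsync status=progress
}

# Example: write_image DGXOS-5.0.0-2020-09-21-15-40-02.iso /dev/sde
```

The removable flag (the RM column) is only a heuristic. Some external drive enclosures report RM as 0, so confirm the device name with lsblk in any case.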

Creating a Bootable USB Flash Drive by Using Akeo Rufus

On a Windows system, you can use the Akeo Reliable USB Formatting Utility (Rufus) to create a bootable USB flash drive that contains the DGX OS software image.

Ensure that the following prerequisites are met:

  • The correct DGX OS software image is saved to your local disk.

    For more information see the Checksum File.

  • The USB flash drive has a capacity of at least 16 GB.

Follow these steps to create the bootable USB Flash drive:

  1. Plug the USB flash drive into one of the USB ports of your Windows system.

  2. Download and launch the Akeo Reliable USB Formatting Utility (Rufus).

  3. In Drive Properties, select the following options:

    1. In Device, select your USB flash drive.

    2. In Boot selection, click [SELECT], locate, and select the DGX OS software image.

    You can leave the other settings at the default.

  4. Click [Start]. This step prompts you to select whether to write the image in ISO Image mode (file copy) or DD Image mode (disk image).

  5. Select [Write in DD Image mode] and click [OK].

Booting the DGX OS ISO Image

These instructions describe how to boot the DGX OS ISO image locally.

  1. Plug the USB flash drive containing the OS image into the DGX system.

  2. Connect a monitor and keyboard directly to the DGX system.

  3. Boot the system and then press F11 when the NVIDIA logo appears to get to the boot menu.

  4. Select the USB volume name that corresponds to the inserted USB flash drive and boot the system from it.

  5. Continue to the next chapter (DGX OS ISO Boot Options) for a description of the GRUB menu options and for instructions on completing the installation process.

DGX OS ISO Boot Options

This section provides information about the available installation and boot options of the DGX OS ISO installer.

These instructions assume that you have booted the DGX OS ISO, either remotely through the BMC or locally from a USB flash drive.

  • When the system boots up, select one of the following options from the GRUB menu:

    • Install DGX OS <version>: Install DGX OS and Reformat the Data RAID

    • Install DGX OS <version>: Without Reformatting Data RAID

    • Advanced Installation Options: Select to install with an encrypted root filesystem

      • Install DGX OS <version> With Encrypted Root

      • Install DGX OS <version> With Encrypted Root and Without Reformatting Data RAID

    • Boot Into Live Environment

    • Check Disc for Defects

    See the subsections below for more information about these options.

  • Verify that the DGX system booted up and that the image is being installed.

    The installer iterates through the software components, copying and installing each one and showing the executed commands. This process generally takes between 15 and 60 minutes, depending on the DGX platform and on how the system is being imaged (for example, through the BMC over a slow network or locally from a fast USB flash drive).

Note

On DGX servers, the NVIDIA InfiniBand driver is installed and the firmware on the ConnectX cards is updated. This process can take up to 5 minutes for each card. Other system firmware is not updated.

After the installation is completed, the system reboots into the OS, and prompts for configuration information. See Initial DGX OS Setup for more information about how to boot up the DGX system for the first time after a fresh installation.

Install DGX OS and Reformat the Data RAID

This option installs DGX OS and reformats the data RAID.

When you select this option, the installation process repartitions all drives, including the OS and the data drives. The data drives are configured as a RAID array and mounted under the /raid directory. This process overwrites all the data and file systems that might exist on the OS and data drives. The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so reformatting the data RAID should not be disruptive.

These changes are preserved across system reboots.
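After the installation completes, one way to confirm that the data RAID is mounted under /raid is to look it up in the kernel mount table. This is a minimal sketch: is_mount_point is a hypothetical helper name, and the /proc/mdstat hint assumes that the data array is a Linux software RAID (md) device.

```shell
# Sketch: check whether a path is listed as a mount point.
# is_mount_point is a hypothetical helper. $2 defaults to the kernel mount
# table, whose second field on each line is the mount point.
is_mount_point() {
    path="$1"
    table="${2:-/proc/mounts}"
    awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$table"
}

# Example:
#   is_mount_point /raid && echo "/raid is mounted"
#   cat /proc/mdstat    # lists md arrays, assuming the data RAID is an md device
```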

Install DGX OS without Reformatting the Data RAID

This option installs DGX OS without reformatting the data RAID.

The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so reformatting it should not be disruptive. However, if you are an advanced user who has set up the disks for a non-cache purpose and you want to keep the data on those drives, select the Install DGX OS <version>: Without Reformatting Data RAID option at the boot menu during the installation. This option retains data on the RAID disks, and the installer completes the following tasks:

  • Installs the cache daemon but leaves it disabled by commenting out the RUN=yes line in /etc/default/cachefilesd.

  • Creates a /raid directory but leaves it out of the file system table by commenting out the entry containing /raid in /etc/fstab.

  • Does not format the RAID disks.

When the installation is completed, you can repeat any configuration steps that you had performed to use the RAID disks for purposes other than caching. You can always choose to use the RAID disks as cache disks later by enabling cachefilesd and adding /raid to the file system table:

  1. Uncomment the #RUN=yes line in /etc/default/cachefilesd.

  2. Uncomment the /raid line in /etc/fstab.

  3. Run the following:

    1. Mount /raid.

      $ sudo mount /raid

    2. Reload the systemd manager configuration.

      $ sudo systemctl daemon-reload

    3. Start the cache daemon.

      $ sudo systemctl start cachefilesd.service

These changes are preserved across system reboots.
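Steps 1 and 2 above can be sketched as sed edits. This is an illustration under stated assumptions, not the installer's own logic: the function names are hypothetical, GNU sed is assumed for the -i and -E options, and the commented-out lines are assumed to start with a "#" character. Both functions take the file path as an argument so that you can try them on a copy first.

```shell
# Sketch: uncomment the lines that the installer left disabled.
# Function names are hypothetical; GNU sed is assumed (-i edits in place).

# Step 1: uncomment the RUN=yes line (intended for /etc/default/cachefilesd).
enable_cachefilesd_run() {
    sed -i -E 's/^#[[:space:]]*RUN=yes/RUN=yes/' "$1"
}

# Step 2: uncomment the entry whose mount point field is /raid
# (intended for /etc/fstab). Other commented lines are left untouched.
enable_raid_fstab_entry() {
    sed -i -E '/^#.*[[:space:]]\/raid([[:space:]]|$)/ s/^#[[:space:]]*//' "$1"
}

# Example, from a root shell after backing up both files:
#   enable_cachefilesd_run /etc/default/cachefilesd
#   enable_raid_fstab_entry /etc/fstab
```

Review both files after the edit and before you mount /raid, because an incorrect /etc/fstab entry can interfere with booting.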

Advanced Installation Options (Encrypted Root)

When you select this menu item, you can encrypt the root filesystem of the DGX system.

Warning

This option should only be selected when you want to encrypt the root filesystem.

Aside from the encrypted root filesystem, the behavior is identical to the default installation.

Selecting Encrypted Root instructs the installer to encrypt the root filesystem. The encryption itself is fully automated, but you must manually unlock the root partition by entering a passphrase at the console (through a direct keyboard and mouse connection or through the BMC) each time the system boots.

During the First Boot Process for DGX Servers or the First Boot Process for DGX Station, you can create your passphrase for the drive. If necessary, you can change this passphrase later.

Warning

Encryption cannot be enabled or disabled after the installation. To change the encryption state, you need to reimage the drives.
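To see whether an installed system currently has an encrypted volume, you can inspect the block device types that lsblk reports. This is a minimal sketch that assumes the encrypted root is implemented as a dm-crypt mapping, which lsblk lists with the type crypt; has_encrypted_volume is a hypothetical helper name.

```shell
# Sketch: detect a dm-crypt mapping in lsblk output.
# Reads "NAME TYPE" lines (as printed by `lsblk -r -n -o NAME,TYPE`) on stdin
# and succeeds if any device has the type "crypt".
# Assumption: the encrypted root appears as a dm-crypt device.
has_encrypted_volume() {
    awk '$2 == "crypt" { found = 1 } END { exit !found }'
}

# Example:
#   lsblk -r -n -o NAME,TYPE | has_encrypted_volume && echo "encrypted volume present"
```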

Boot Into a Live Environment

The DGX OS installer image can also be used as a Live image, which means that the image boots up and runs a minimal DGX OS in system memory and does not overwrite anything on the disks in the system.

Live mode does not load drivers, and is essentially a simple Ubuntu Server configuration. This mode can be used as a tool to debug a system when the disks on the system are not accessible or should not be touched.

In a typical operation, this option should not be selected.

Check Disc for Defects

This option checks the installation media for defects.

If you are experiencing anomalies when you install the DGX OS and suspect that the installation media might have an issue, select this item to run an extensive test of the installation media contents.

The process is time consuming, and the installation media is usually not the source of the problem. In a typical operation, this option should not be selected.
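If the media is a USB flash drive that you wrote on a Linux system, a separate check that does not rely on the installer menu is to compare the drive with the ISO image byte for byte. This is a minimal sketch, not the test that the GRUB menu item runs: media_matches_image is a hypothetical helper name, and the -n option of cmp and the -c option of stat assume the GNU versions of those tools.

```shell
# Sketch: compare the start of a device with the image that was written to it.
# media_matches_image is a hypothetical helper; GNU cmp and GNU stat are assumed.
media_matches_image() {
    image="$1"     # path to the ISO image
    device="$2"    # device the image was written to, for example /dev/sde
    size=$(stat -c %s "$image") || return 1
    # Compare only the first $size bytes, because the drive is larger than the image.
    cmp -s -n "$size" "$image" "$device"
}

# Example, from a root shell (reading the raw device requires root access):
#   media_matches_image DGXOS-5.0.0-2020-09-21-15-40-02.iso /dev/sde && echo "media OK"
```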

© Copyright 2020-2023, NVIDIA. Last updated on Mar 24, 2023.