Reimaging the System#

This section provides information about installing the DGX OS by reimaging the system from the DGX OS ISO image.

DGX OS is already preinstalled on new DGX systems and only requires reimaging in limited cases. If your system is already running DGX OS 6, you can skip to Initial Setup for instructions about the initial setup of the system. To upgrade a system from DGX OS 5, refer to Upgrading the OS.

You can also install Ubuntu and the DGX software manually, for example, if you require custom installation options such as a specific drive partition scheme. Refer to Installing DGX Software on Ubuntu for more details. That section also describes automating the installation process, for example, for cluster deployments.

There are situations where you want to reimage a system, such as the following:

  • You want to install the latest version on a new system.

  • You need to install an older version.

  • The OS becomes corrupted.

  • The OS drive is replaced or both drives in a RAID-1 configuration are replaced.

  • You want to encrypt the root filesystem.

  • You want to revert the DGX system to the originally installed DGX OS.

Warning

Reimaging the system erases all data stored on the OS drives. This includes the /home partition, where all users’ documents, software settings, and other personal files are stored. If you need to preserve data through the reimaging, you can move the files and documents to the /raid directory and install the DGX OS software with the option to preserve the RAID array content.

The reimage process does not change persistent hardware configurations such as MIG settings or data drive encryption.

Important

After completing the installation, refer to Upgrading the OS to upgrade the packages to the latest software versions released since the DGX OS ISO, including security updates.

Obtaining the DGX OS ISO Image#

Note

Before you begin, ensure that you have an active NVIDIA Enterprise Support account.

To ensure that you install the latest available version of DGX OS, obtain the latest ISO image file from NVIDIA Enterprise Support:

  1. Go to the Download Center.

  2. Click [Server/Workstation] -> [DGX], and select All Downloads for your system.

  3. Click on the download link for the latest ISO release to go to the announcement.

  4. Download the ISO image that is referenced in the announcement and save it to your local disk.

  5. Run the md5sum command to print the MD5 hash and compare it with the value in the announcement. For example:

    $ md5sum DGXOS-6.1.0-2023-08-09-12-30-10.iso
    

    Example Output

    d38620ffa58905330c1efe49b3d7ff53  DGXOS-6.1.0-2023-08-09-12-30-10.iso
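
The manual comparison in step 5 can also be scripted. The following is a minimal sketch, assuming GNU md5sum is available on the download host. The function name verify_iso_md5 is hypothetical and is not part of DGX OS. The expected hash must be copied from the announcement.

```shell
# verify_iso_md5 ISO EXPECTED: hypothetical helper, not part of DGX OS.
# Compares the MD5 hash of ISO with EXPECTED (case-insensitive hex string).
verify_iso_md5() {
    local iso="$1" expected actual
    # Normalize the expected hash to lowercase, as md5sum prints lowercase hex.
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    # The first field of the md5sum output is the hash.
    actual="$(md5sum "$iso" 2>/dev/null | cut -d' ' -f1)"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: MD5 matches"
    else
        echo "MISMATCH: expected $expected, got ${actual:-nothing}" >&2
        return 1
    fi
}
```

With the file name and hash from the example above, the call would be verify_iso_md5 DGXOS-6.1.0-2023-08-09-12-30-10.iso d38620ffa58905330c1efe49b3d7ff53. A non-zero return status means the download should not be used.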
    

Installing the DGX OS Image#

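Install the DGX OS ISO image in one of the following ways:

  • Remotely through the BMC, as described in Installing the DGX OS Image Remotely through the BMC.

  • Locally from a USB flash drive or DVD-ROM, as described in Installing the DGX OS Image from a USB Flash Drive or DVD-ROM.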

Installing the DGX OS Image Remotely through the BMC#

These instructions describe how to reimage the system remotely through the BMC.

After obtaining the DGX OS 6 ISO image from NVIDIA Enterprise Support, make sure the host that you use for your web browser can access the ISO image file.

  1. Log in to the BMC.

    Refer to Connecting to the DGX System for more information.

  2. Click [Remote Control] and then click [Launch KVM].

  3. Set up the ISO image as virtual media.

    1. From the top bar, click [Browse File], locate and select the DGX OS ISO file, and then click [Open].

    2. Click [Start Media].

  4. Reset the system and boot the virtual media image.

    1. From the top menu, click [Power] and select [Hard Reset], then click [Perform Action].

    2. Click [Yes] and then [OK] at the Power Control dialogs.

      Wait for the system to power down and then come back online.

    3. Refer to DGX OS ISO Boot Options for a description of the GRUB menu options and for instructions on completing the installation process.

Installing the DGX OS Image from a USB Flash Drive or DVD-ROM#

After obtaining the DGX OS 6 ISO image from NVIDIA Enterprise Support, create a bootable installation medium, such as a USB flash drive or DVD-ROM, that contains the image.

Creating a Bootable USB Flash Drive by Using the dd Command#

On a Linux system, you can use the dd command to create a bootable USB flash drive that contains the DGX OS software image.

Note

To ensure that the resulting flash drive is bootable, use the dd command to perform a device bit copy of the image. If you use other commands to perform a simple file copy of the image, the resulting flash drive may not be bootable.

Ensure that the following prerequisites are met:

  • The correct DGX OS software image is saved to your local disk.

    For more information, refer to Obtaining the DGX OS ISO Image.

  • The USB flash drive meets the following requirements:

    • The USB flash drive has a capacity of at least 16 GB.

    • (DGX A100 only) The partition scheme on the USB flash drive is a GPT partition for UEFI.

Create the bootable USB Flash drive:

  1. Plug the USB flash drive into one of the USB ports of your Linux host. Obtain the device name of the USB flash drive by running the lsblk command.

    lsblk
    

    You can identify the USB flash drive from its size, which is much smaller than the size of the SSDs in the DGX system, and from the mount points of any partitions on the drive, which are under /media.

    In the following example output, the device name of the USB flash drive is sde.

    NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda      8:0    0   1.8T  0 disk
    |_sda1   8:1    0   121M  0 part /boot/efi
    |_sda2   8:2    0   1.8T  0 part /
    sdb      8:16   0   1.8T  0 disk
    |_sdb1   8:17   0   1.8T  0 part
    sdc      8:32   0   1.8T  0 disk
    sdd      8:48   0   1.8T  0 disk
    sde      8:64   1   7.6G  0 disk
    |_sde1   8:65   1   7.6G  0 part /media/deeplearner/DGXSTATION
    
  2. As root, convert and copy the image to the USB flash drive.

    sudo dd if=<path-to-ISO-image> bs=2048 of=<usb-drive-device-name>
    

    Warning

    The dd command erases all data on the device that you specify in the of argument. To avoid losing data, ensure that you specify the correct path to the USB flash drive.
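
Because a wrong of argument destroys data, it can help to check the target before writing. The following is a minimal sketch, assuming a Linux host with lsblk (util-linux) and GNU dd. The function names is_safe_dd_target and write_iso_to_usb are hypothetical, and the conv=fsync and status=progress options are additions that are not part of the command shown above.

```shell
# Hypothetical helpers; the names are illustrative and not part of DGX OS.

# is_safe_dd_target RM TYPE: succeeds only for a removable whole disk,
# using the RM and TYPE column values that lsblk reports.
is_safe_dd_target() {
    [ "$1" = "1" ] && [ "$2" = "disk" ]
}

# write_iso_to_usb ISO DEVICE: refuses to run dd unless DEVICE passes the check.
write_iso_to_usb() {
    local iso="$1" dev="$2" info
    # -d: no partitions, -n: no header, -o: only the RM and TYPE columns.
    info="$(lsblk -dno RM,TYPE "$dev" 2>/dev/null)" || info=""
    # Intentionally unquoted so that the two fields split into $1 and $2.
    set -- $info
    if ! is_safe_dd_target "${1:-}" "${2:-}"; then
        echo "Refusing to write: $dev is not a removable whole disk" >&2
        return 1
    fi
    # conv=fsync flushes data to the drive before dd exits;
    # status=progress prints a running byte count (GNU dd only).
    sudo dd if="$iso" of="$dev" bs=2048 conv=fsync status=progress
}
```

In the example lsblk output above, the call would be write_iso_to_usb <path-to-ISO-image> /dev/sde. Some USB devices report RM as 0, in which case the check refuses the write and you need to confirm the device manually.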

Creating a Bootable USB Flash Drive by Using Akeo Rufus#

On a Windows system, you can use the Akeo Reliable USB Formatting Utility (Rufus) to create a bootable USB flash drive that contains the DGX OS software image.

Ensure that the following prerequisites are met:

  • The correct DGX OS software image is saved to your local disk.

    For more information, refer to Obtaining the DGX OS ISO Image.

  • The USB flash drive has a capacity of at least 16 GB.

Follow these steps to create the bootable USB Flash drive:

  1. Plug the USB flash drive into one of the USB ports of your Windows system.

  2. Download and launch the Akeo Reliable USB Formatting Utility (Rufus).

  3. In Drive Properties, select the following options:

    1. In Device, select your USB flash drive.

    2. In Boot selection, click [SELECT], locate, and select the DGX OS software image.

    You can leave the other settings at the default.

  4. Click [Start]. This step prompts you to select whether to write the image in ISO Image mode (file copy) or DD Image mode (disk image).

  5. Select [Write in DD Image mode] and click [OK].

Booting the DGX OS ISO Image#

These instructions describe how to boot the DGX OS ISO image locally.

  1. Plug the USB flash drive containing the OS image into the DGX system.

  2. Connect a monitor and keyboard directly to the DGX system.

  3. Boot the system and then press F11 when the NVIDIA logo appears to access the boot menu.

  4. Select the USB volume name that corresponds to the inserted USB flash drive and boot the system from it.

Refer to DGX OS ISO Boot Options for a description of the GRUB menu options and for information about completing the installation process.

DGX OS ISO Boot Options#

This section provides information about the available installation and boot options of the DGX OS ISO installer.

These instructions assume that you have booted the DGX OS ISO, either remotely through the BMC or locally from a USB flash drive.

  • When the system boots up, select one of the following options from the GRUB menu:

    • Install DGX OS <version>

    • Install DGX OS <version>: Without Reformatting Data RAID (does not mount /raid)

    • Advanced Installation Options

      • Install DGX OS <version> Without NVIDIA Drivers

      • Install DGX OS <version> With Encrypted Root

      • Install DGX OS <version> With Encrypted Root and Without Reformatting Data RAID

    • Boot Into Live Environment

    • Check Media for Defects

    See the following sections for more information about these options.

  • Verify that the DGX system booted up and that the image is being installed.

    This process iterates through the software components, copying and installing them while showing the executed commands. It generally takes between 15 and 60 minutes, depending on the DGX platform and on how the system is being imaged (for example, through the BMC over a slow network or locally from a fast USB flash drive).

Note

On DGX servers, the NVIDIA InfiniBand driver is installed and the firmware on the ConnectX cards is updated. This process can take up to 5 minutes for each card. Other system firmware is not updated.

After the installation is complete, the system reboots into the OS, and prompts for configuration information. Refer to Initial Setup for more information about how to boot up the DGX system for the first time after reimaging the system.

Install DGX OS#

Select this option to install DGX OS and reformat the data RAID.

When you accept this option, the installation process repartitions all drives, including the OS and the data drives. The data drives are configured as a RAID array and mounted under the /raid directory. This process overwrites all the data and file systems that might exist on the OS and data drives. The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so reformatting the data RAID should not be disruptive.

These changes are preserved across system reboots.

Install DGX OS without Reformatting the Data RAID#

Select this option to install DGX OS without reformatting the data RAID.

The RAID array on the DGX data disks is intended for use as a cache and not for long-term data storage, so reformatting it is normally not disruptive. However, if you are an advanced user who has set up the disks for a non-cache purpose and you want to keep the data on those drives, select the Install DGX OS <version>: Without Reformatting Data RAID option at the boot menu. This option retains the data on the RAID disks and performs the following tasks:

  • Installs the cache daemon but leaves it disabled by commenting out the RUN=yes line in /etc/default/cachefilesd.

  • Creates a /raid directory but leaves it out of the file system table by commenting out the entry containing /raid in /etc/fstab.

  • Does not format the RAID disks.

When the installation is complete, you can repeat any configuration steps that you previously performed to use the RAID disks for a purpose other than caching. You can always choose to use the RAID disks as cache disks later by enabling cachefilesd and adding /raid to the file system table:

  1. Uncomment the #RUN=yes line in /etc/default/cachefilesd.

  2. To mount a block device or RAID array at /raid, uncomment the /raid line in /etc/fstab.

    Be sure to replace <device> with the proper value for your environment.

  3. Run the following:

    1. Mount /raid.

      sudo mount /raid
      
    2. Reload the systemd manager configuration.

      sudo systemctl daemon-reload
      
    3. Start the cache daemon.

      sudo systemctl start cachefilesd
      

These changes are preserved across system reboots.
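
The file edits in steps 1 and 2 can be partly scripted. The following is a minimal sketch, assuming GNU sed and grep and assuming that the commented lines look like the ones described above. The function names uncomment_run_yes and raid_entry_is_commented are hypothetical. The /raid entry in /etc/fstab is only checked, not edited, because its <device> value must be set by hand.

```shell
# Hypothetical helpers; the names are illustrative and not part of DGX OS.

# uncomment_run_yes FILE: turns a commented "#RUN=yes" line into "RUN=yes".
uncomment_run_yes() {
    sed -i 's/^#[[:space:]]*RUN=yes/RUN=yes/' "$1"
}

# raid_entry_is_commented FILE: succeeds if FILE still contains a
# commented-out entry whose mount point is /raid.
raid_entry_is_commented() {
    grep -Eq '^[[:space:]]*#.*[[:space:]]/raid([[:space:]]|$)' "$1"
}
```

On a DGX system, both files are owned by root, so the same sed expression would be run with sudo against /etc/default/cachefilesd, and raid_entry_is_commented /etc/fstab reports whether step 2 still needs to be done.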

Advanced Installation Options (Encrypted Root)#

When you select this menu item, you can encrypt the root filesystem of the DGX system.

Warning

Select this option only if you want to encrypt the root filesystem.

Aside from the encrypted root filesystem, the behavior is identical to the default installation.

Selecting Encrypted Root instructs the installer to encrypt the root filesystem. The encryption setup is fully automated, but you must manually unlock the root partition by entering a passphrase at the console (through a direct keyboard and mouse connection or through the BMC) each time the system boots.

When you power on your DGX system for the first time, you are prompted to accept end user license agreements for NVIDIA software. You are then guided through the initial Ubuntu OS configuration, during which you create your passphrase for the drive. If necessary, you can change this passphrase later. For more details, see First Boot Process for DGX Servers or First Boot Process for DGX Station.

Warning

Encryption cannot be enabled or disabled after the installation. To change the encryption state again, you need to reimage the drives.
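
Because the encryption state is fixed at installation time, it can be useful to confirm it on a running system before deciding to reimage. The following is a minimal sketch, assuming that findmnt and lsblk (util-linux) are available and that an encrypted root appears as a device of type crypt somewhere beneath the root filesystem. The function names are hypothetical, and the actual DGX OS disk layout is not confirmed here.

```shell
# Hypothetical helpers; the names are illustrative and not part of DGX OS.

# has_crypt_layer: reads lsblk TYPE values on stdin, one per line,
# and succeeds if any of them is exactly "crypt".
has_crypt_layer() {
    grep -qx 'crypt'
}

# root_is_encrypted: succeeds if the device backing / or any device
# underneath it is a dm-crypt mapping.
root_is_encrypted() {
    local src
    # Find the block device that is mounted at /.
    src="$(findmnt -no SOURCE /)" || return 2
    # -s walks from that device down to the physical disk; -r and -n give
    # plain output without a header; -o TYPE prints only the device type.
    lsblk -srno TYPE "$src" 2>/dev/null | has_crypt_layer
}
```

Running root_is_encrypted && echo "root is encrypted" on the system prints the message only when a crypt layer is found.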

Boot Into a Live Environment#

The DGX OS installer image can also be used as a Live image, which means that the image boots up and runs a minimal DGX OS in system memory and does not overwrite anything on the disks in the system.

Live mode does not load drivers, and is essentially a simple Ubuntu Server configuration. This mode can be used as a tool to debug a system when the disks on the system are not accessible or should not be touched.

In a typical operation, this option should not be selected.

Check Disc for Defects#

This option checks the installation media for defects.

If you experience anomalies when you install the DGX OS and suspect that the installation media might have an issue, select this item to run an extensive test of the installation media contents.

The process is time consuming, and the installation media is usually not the source of the problem. In a typical operation, this option should not be selected.