Reimaging#
This section provides information about installing the DGX OS by reimaging the system from the DGX OS ISO image.
DGX OS is already preinstalled on new DGX systems and only requires reimaging in limited cases. If your system is already running DGX OS 5, you can skip to Initial Setup for instructions about the initial setup of the system. To upgrade a system from DGX OS 4, refer to Upgrading.
You can also install Ubuntu and the DGX software manually, for example, if you require custom installation options such as a specific drive partition scheme. Refer to Installing on Ubuntu for more details. That section also describes automating the installation process, for example, for cluster deployments.
There are situations where you want to reimage a system, such as the following:
When you want to install the latest version on a new system.
When you need to install an older version.
When the OS becomes corrupt.
When the OS drive is replaced or both drives in a RAID-1 configuration are replaced.
When you want to encrypt the root filesystem.
When you want a fresh installation of DGX OS 5.
Warning
Reimaging the system erases all data stored on the OS drives. This includes the /home partition, where all users’ documents, software settings, and other personal files are stored. If you need to preserve data through the reimaging, you can move the files and documents to the /raid directory and install the DGX OS software with the option to preserve the RAID array content.
The reimage process does not change persistent hardware configurations such as MIG settings or data drive encryption.
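One way to stage files on /raid before reimaging is sketched below. The helper name stage_for_reimage and the destination /raid/home-backup are hypothetical and do not come from the DGX documentation. The sketch assumes that /raid is mounted, has enough free space, and that you later choose the installation option that does not reformat the data RAID.

```shell
# stage_for_reimage SRC DEST (hypothetical helper, not part of DGX OS)
# Copies the contents of SRC into DEST with attributes preserved,
# then compares the two trees.
stage_for_reimage() (
    src="$1"
    dest="$2"
    mkdir -p "$dest" || exit 1
    # -a preserves ownership, permissions, timestamps, and symlinks;
    # "$src/." also copies hidden files
    cp -a "$src/." "$dest/" || exit 1
    # diff -r exits non-zero if any file differs or is missing
    diff -r "$src" "$dest" >/dev/null
)

# Example (placeholder paths; run as root so that ownership is preserved):
# stage_for_reimage /home /raid/home-backup
```

The function returns a non-zero status if the copy or the comparison fails, so it can gate any later step in a script.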
Important
After completing the installation, refer to Upgrading for information on upgrading the system to the latest software versions released since the DGX OS ISO release, including security updates.
Obtaining the DGX OS ISO Image#
Note
Before you begin, ensure that you have an active NVIDIA Enterprise Support account.
To ensure that you install the latest available version of DGX OS, obtain the latest ISO image file from NVIDIA Enterprise Support:
Go to the Download Center.
Click [Server/Workstation] -> [DGX], and select All Downloads for your system.
Click on the download link for the latest ISO release to go to the announcement.
Download the ISO image that is referenced in the announcement and save it to your local disk.
Run the md5sum command to print the MD5 hash and compare it with the value in the announcement.
md5sum DGXOS-5.0.0-2020-09-21-15-40-02.iso
e4c77338ed35d7a34e772d8552e9d080 --> DGXOS-5.0.0-2020-09-21-15-40-02.iso
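The manual comparison can also be scripted. The following is a minimal sketch; verify_iso_md5 is a hypothetical helper name, and the expected hash must be copied from the release announcement for the ISO that you actually downloaded.

```shell
# verify_iso_md5 FILE EXPECTED_MD5 (hypothetical helper)
# Succeeds only if the MD5 hash of FILE equals EXPECTED_MD5 (case-insensitive).
verify_iso_md5() (
    # md5sum prints the hash followed by the file name; keep the hash, lowercased
    actual="$(md5sum "$1" 2>/dev/null | awk '{ print tolower($1) }')"
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    # An empty hash means md5sum could not read the file
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
)

# Example, using the file name and hash from the step above:
# verify_iso_md5 DGXOS-5.0.0-2020-09-21-15-40-02.iso e4c77338ed35d7a34e772d8552e9d080 \
#     && echo "checksum OK" || echo "checksum mismatch, download the image again"
```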
Installing the DGX OS Image#
Install the DGX OS ISO image in one of the following ways:
Remotely through the BMC for systems that provide a BMC. Refer to Installing the DGX OS Image Remotely through the BMC below for instructions.
Note
This method is not available for DGX Station (V100).
Locally from a UEFI-bootable USB flash drive or DVD-ROM.
Refer to Installing the DGX OS Image from a USB Flash Drive or DVD-ROM section in the corresponding DGX user guide for instructions.
After obtaining the DGX OS 5 ISO image from NVIDIA Enterprise Support, create a bootable installation medium, such as a USB flash drive or DVD-ROM, that contains the image.
Installing the DGX OS Image Remotely through the BMC#
These instructions describe how to reimage the system remotely through the BMC. After obtaining the DGX OS 5 ISO image from NVIDIA Enterprise Support, save it on the system from which you access the BMC so that you can set it up as virtual media.
Log in to the BMC. Refer to Connecting to the DGX System for instructions.
Click [Remote Control] and then click [Launch KVM].
Set up the ISO image as virtual media.
From the top bar, click [Browse File], locate and select the DGX OS ISO file, and then click [Open].
Click [Start Media].
Reset the system and boot the virtual media image.
From the top menu, click [Power], select [Hard Reset], and then click [Perform Action].
Click [Yes] and then [OK] at the Power Control dialogs, then wait for the system to power down and come back online.
Refer to DGX OS ISO Boot Options for a description of the GRUB menu options and for instructions on completing the installation process.
Installing the DGX OS Image from a USB Flash Drive or DVD-ROM#
After obtaining the DGX OS 5 ISO image from NVIDIA Enterprise Support, create a bootable installation medium, such as a USB flash drive or DVD-ROM, that contains the image.
If you are creating a bootable USB flash drive, refer to one of the following links for more information:
On Linux, see Creating a Bootable USB Flash Drive by Using the dd Command
On Windows, see Creating a Bootable USB Flash Drive by Using Akeo Rufus
If you are creating a bootable DVD-ROM, refer to Burning the ISO on to a DVD-ROM on the Ubuntu Community Help Wiki for more information about the available methods.
Creating a Bootable USB Flash Drive by Using the dd Command#
On a Linux system, you can use the dd command to create a bootable USB flash drive that contains the DGX OS software image.
Note
To ensure that the resulting flash drive is bootable, use the dd command to perform a device bit copy of the image. If you use other commands to perform a simple file copy of the image, the resulting flash drive may not be bootable.
Ensure that the following prerequisites are met:
The correct DGX OS software image is saved to your local disk.
For more information see the Checksum File.
The USB flash drive meets the following requirements:
The USB flash drive has a capacity of at least 16 GB.
For DGX A100 only: the partition scheme on the USB flash drive is a GPT partition scheme for UEFI.
Plug the USB flash drive into one of the USB ports of your Linux system.
Obtain the device name of the USB flash drive by running the lsblk command.
lsblk
You can identify the USB flash drive from its size, which is much smaller than the size of the SSDs in the DGX system, and from the mount points of any partitions on the drive, which are under /media.
In the following example, the device name of the USB flash drive is sde.
lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0  1.8T  0 disk
|_sda1    8:1    0  121M  0 part /boot/efi
|_sda2    8:2    0  1.8T  0 part /
sdb       8:16   0  1.8T  0 disk
|_sdb1    8:17   0  1.8T  0 part
sdc       8:32   0  1.8T  0 disk
sdd       8:48   0  1.8T  0 disk
sde       8:64   1  7.6G  0 disk
|_sde1    8:65   1  7.6G  0 part /media/deeplearner/DGXSTATION
As root, convert and copy the image to the USB flash drive.
sudo dd if=path-to-software-image bs=2048 of=usb-drive-device-name
Warning
The dd command erases all data on the device that you specify in the of option of the command. To avoid losing data, ensure that you specify the correct path to the USB flash drive.
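Because a wrong of target is destructive, it can help to check the device before running dd. The sketch below is one possible guard, not a procedure from the DGX documentation: both function names are hypothetical, and the first one assumes the default lsblk column order shown in the example above (NAME, MAJ:MIN, RM, SIZE, RO, TYPE). It relies on the RM column, which lsblk sets to 1 for removable devices.

```shell
# list_removable_disks (hypothetical helper)
# Reads default `lsblk` output on stdin and prints the names of whole disks
# whose RM (removable) column is 1.
list_removable_disks() {
    # Assumed columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]
    awk 'NR > 1 && $3 == "1" && $6 == "disk" { print $1 }'
}

# is_removable_disk DEVICE (hypothetical helper)
# Succeeds only if lsblk reports DEVICE as a removable whole disk,
# that is, not a partition and not a fixed SSD.
is_removable_disk() {
    # -d: do not list partitions, -n: no header, -o: select columns
    lsblk -dno RM,TYPE "$1" 2>/dev/null |
        awk '$1 == "1" && $2 == "disk" { ok = 1 } END { exit !ok }'
}

# Example (/dev/sdX and the image path are placeholders):
# lsblk | list_removable_disks
# is_removable_disk /dev/sdX && sudo dd if=path-to-software-image bs=2048 of=/dev/sdX
```

The guard only reduces the risk of overwriting a fixed drive; it cannot tell two removable drives apart, so still confirm the device name by size and mount point.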
Creating a Bootable USB Flash Drive by Using Akeo Rufus#
On a Windows system, you can use the Akeo Reliable USB Formatting Utility (Rufus) to create a bootable USB flash drive that contains the DGX OS software image.
Ensure that the following prerequisites are met:
The correct DGX OS software image is saved to your local disk.
For more information see the Checksum File.
The USB flash drive has a capacity of at least 16 GB.
Follow these steps to create the bootable USB Flash drive:
Plug the USB flash drive into one of the USB ports of your Windows system.
Download and launch the Akeo Reliable USB Formatting Utility (Rufus).
In Drive Properties, select the following options:
In Device, select your USB flash drive.
In Boot selection, click [SELECT], locate, and select the DGX OS software image.
You can leave the other settings at the default.
Click [Start]. This step prompts you to select whether to write the image in ISO Image mode (file copy) or DD Image mode (disk image).
Select [Write in DD Image mode] and click [OK].
Booting the DGX OS ISO Image#
These instructions describe how to boot the DGX OS ISO image locally.
Plug the USB flash drive containing the OS image into the DGX system.
Connect a monitor and keyboard directly to the DGX system.
Boot the system and then press F11 when the NVIDIA logo appears to get to the boot menu.
Select the USB volume name that corresponds to the inserted USB flash drive and boot the system from it.
Continue to the next chapter (DGX OS ISO Boot Options) for a description of the GRUB menu options and for instructions on completing the installation process.
DGX OS ISO Boot Options#
This section provides information about the available installation and boot options of the DGX OS ISO installer.
These instructions assume that you have booted the DGX OS ISO, either remotely through the BMC or locally from a USB flash drive.
When the system boots up, select one of the following options from the GRUB menu:
Install DGX OS <version>: Install DGX OS and Reformat the Data RAID
Install DGX OS <version>: Without Reformatting Data RAID
Advanced Installation Options: Select to install with an encrypted root filesystem
Install DGX OS <version> With Encrypted Root
Install DGX OS <version> With Encrypted Root and Without Reformatting Data RAID
Boot Into Live Environment
Check Disc for Defects
See the subsections below for more information about these options.
Verify that the DGX system booted up and that the image is being installed.
The installer iterates through the software components, copying and installing them while displaying the executed commands. This process generally takes between 15 and 60 minutes, depending on the DGX platform and how the system is being imaged (for example, through the BMC over a slow network or locally from a fast USB flash drive).
Note
On DGX servers, the NVIDIA InfiniBand driver is installed and the firmware on the ConnectX cards is updated. This process can take up to 5 minutes for each card. Other system firmware is not updated.
After the installation is completed, the system reboots into the OS, and prompts for configuration information. See Initial Setup for more information about how to boot up the DGX system for the first time after a fresh installation.
Install DGX OS and Reformat the Data RAID#
Here are the steps to install your DGX system and reformat the data RAID.
When you accept this option, the installation process repartitions all drives, including the OS and the data drives. The data drives are configured as a RAID array and mounted under the /raid directory. This process overwrites all the data and file systems that might exist on the OS and data drives. The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so reformatting the data RAID should not be disruptive.
These changes are preserved across system reboots.
Install DGX OS without Reformatting the Data RAID#
Here are the steps to install your DGX system without reformatting the data RAID.
The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so reformatting it should not be disruptive. However, if you are an advanced user who has set up the disks for a non-cache purpose and you want to keep the data on those drives, select the Install DGX OS <version>: Without Reformatting Data RAID option at the GRUB menu during installation. This option retains data on the RAID disks, and the following tasks are completed:
Installs the cache daemon but leaves it disabled by commenting out the RUN=yes line in /etc/default/cachefilesd.
Creates a /raid directory but leaves it out of the file system table by commenting out the entry containing /raid in /etc/fstab.
Does not format the RAID disks.
When the installation is completed, you can repeat any configuration steps that you had performed to use the RAID disks as other than cache disks. You can always choose to use the RAID disks as cache disks later by enabling cachefilesd and adding /raid to the file system table:
Uncomment the #RUN=yes line in /etc/default/cachefilesd.
Uncomment the /raid line in /etc/fstab.
Run the following:
Mount /raid.
sudo mount /raid
Reload the systemd manager configuration.
sudo systemctl daemon-reload
Start the cache daemon.
sudo systemctl start cachefilesd.service
These changes are preserved across system reboots.
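The two uncomment steps can be scripted instead of edited by hand. The following is a rough sketch only: uncomment_matching is a hypothetical helper, and the patterns assume that the installer commented the lines out with a single leading # and that the /raid entry in /etc/fstab has /raid as a whitespace-delimited field. Inspect both files first and back them up before changing them.

```shell
# uncomment_matching FILE REGEX (hypothetical helper)
# Removes the leading "#" from every line of FILE whose remainder starts
# with a match for the extended regular expression REGEX.
# REGEX must not contain the "|" character, which is used as the sed delimiter.
uncomment_matching() (
    file="$1"
    pattern="$2"
    tmp="$(mktemp)" || exit 1
    sed -E "s|^#[[:space:]]*(${pattern})|\1|" "$file" > "$tmp" || exit 1
    # Write back through the original file to keep its owner and mode
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example, run from a root shell (patterns are assumptions about the file contents):
# cp /etc/fstab /etc/fstab.bak
# uncomment_matching /etc/default/cachefilesd 'RUN=yes'
# uncomment_matching /etc/fstab '.*[[:space:]]/raid[[:space:]].*'
```

Lines that do not match, including ordinary comments, are left unchanged.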
Advanced Installation Options (Encrypted Root)#
When you select this menu item, you have the ability to encrypt the root filesystem of the DGX system.
Warning
This option should only be selected when you want to encrypt the root filesystem.
Aside from the encrypted root filesystem, the behavior is identical to the default installation.
Selecting Encrypted Root instructs the installer to encrypt the root filesystem. The encryption itself is fully automated, but you must manually unlock the root partition by entering a passphrase at the console (through a direct keyboard and mouse connection or through the BMC) each time the system boots.
During the first boot process (refer to First Boot Process for DGX Servers or First Boot Process for DGX Station), you can create your passphrase for the drive. If necessary, you can change this passphrase later.
Warning
Encryption cannot be enabled or disabled after the installation. To change the encryption state again, you need to reimage the drives.
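Because the encryption state is fixed at installation time, it can be useful to confirm which state a running system is in before deciding whether to reimage. The sketch below is an assumption-based check, not a documented DGX procedure: root_is_encrypted is a hypothetical helper, and it assumes that an encrypted root appears as a dm-crypt mapping, which lsblk reports with the TYPE value crypt.

```shell
# root_is_encrypted (hypothetical helper)
# Succeeds if the device mounted at / is, or sits on top of, a dm-crypt mapping.
root_is_encrypted() (
    # findmnt prints the source device of the root mount
    src="$(findmnt -n -o SOURCE /)" || exit 1
    # -s walks from the device down to the disks it depends on;
    # any row of TYPE "crypt" indicates an encrypted layer
    lsblk -s -n -o TYPE "$src" 2>/dev/null | grep -qw 'crypt'
)

# Example:
# root_is_encrypted && echo "root is encrypted" || echo "root is not encrypted"
```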
Boot Into a Live Environment#
The DGX OS installer image can also be used as a Live image, which means that the image boots up and runs a minimal DGX OS in system memory and does not overwrite anything on the disks in the system.
Live mode does not load drivers, and is essentially a simple Ubuntu Server configuration. This mode can be used as a tool to debug a system when the disks on the system are not accessible or should not be touched.
In a typical operation, this option should not be selected.
Check Disc for Defects#
Here is some information about how you can check the disc for defects.
If you experience anomalies when you install DGX OS and suspect that the installation media might have an issue, select this item to run an extensive test of the installation media contents.
The process is time consuming, and the installation media is usually not the source of the problem. In a typical operation, this option should not be selected.