Reimaging the System#
This section provides information about installing the DGX OS by reimaging the system from the DGX OS ISO image.
DGX OS is already preinstalled on new DGX systems and only requires reimaging in limited cases. If your system is already running DGX OS 6, you can skip to Initial Setup for instructions about the initial setup of the system. To upgrade a system from DGX OS 5, refer to Upgrading the OS.
You also have the option to install Ubuntu and the DGX software manually, for example, if you require custom installation options, such as a specific drive partition scheme. Refer to Installing DGX Software on Ubuntu for more details. That section also describes how to automate the installation process, for example, for cluster deployments.
There are situations where you want to reimage a system, such as the following:
You want to install the latest version on a new system.
You need to install an older version.
The OS becomes corrupted.
The OS drive is replaced or both drives in a RAID-1 configuration are replaced.
You want to encrypt the root filesystem.
You want to revert the DGX system to the originally installed DGX OS.
Warning
Reimaging the system erases all data stored on the OS drives. This includes the /home partition, where all users’ documents, software settings, and other personal files are stored. If you need to preserve data through the reimaging, you can move the files and documents to the /raid directory and install the DGX OS software with the option to preserve the RAID array content.
The reimage process does not change persistent hardware configurations such as MIG settings or data drive encryption.
Important
After completing the installation, refer to Upgrading the OS to perform a package upgrade to the latest software versions released since the DGX OS ISO release, including security updates.
Obtaining the DGX OS ISO Image#
Note
Before you begin, ensure that you have an active NVIDIA Enterprise Support account.
To ensure that you install the latest available version of DGX OS, obtain the latest ISO image file from NVIDIA Enterprise Support:
Go to the Download Center.
Click [Server/Workstation] -> [DGX], and select All Downloads for your system.
Click on the download link for the latest ISO release to go to the announcement.
Download the ISO image that is referenced in the announcement and save it to your local disk.
Run the md5sum command to print the MD5 hash and compare it with the value in the announcement. For example:
$ md5sum DGXOS-6.1.0-2023-08-09-12-30-10.iso
Example Output
d38620ffa58905330c1efe49b3d7ff53 DGXOS-6.1.0-2023-08-09-12-30-10.iso
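The comparison can also be scripted so that a mismatch is hard to overlook. The following is a minimal sketch, not part of the official procedure; verify_md5 is a hypothetical helper name, and the expected hash is whatever value the announcement lists for your ISO.

```shell
# Sketch: compare a file's MD5 hash with the value from the announcement.
# verify_md5 is a hypothetical helper, not a DGX OS tool.
verify_md5() {
    local file="$1" expected actual
    # Normalize the expected hash to lowercase hex.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file") || return 2
    actual=${actual%% *}
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (computed $actual)" >&2
        return 1
    fi
}
```

With the example above, the call would be verify_md5 DGXOS-6.1.0-2023-08-09-12-30-10.iso d38620ffa58905330c1efe49b3d7ff53. A non-zero return status means the download should be repeated.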
Installing the DGX OS Image#
Install the DGX OS ISO image in one of the following ways:
Remotely through the BMC for systems that provide a BMC. Refer to Installing the DGX OS Image Remotely through the BMC below for instructions.
Note
This method is not available for DGX Station (V100).
Locally from a UEFI-bootable USB flash drive or DVD-ROM.
Refer to Installing the DGX OS Image from a USB Flash Drive or DVD-ROM below for instructions.
Installing the DGX OS Image Remotely through the BMC#
These instructions describe how to reimage the system remotely through the BMC.
After obtaining the DGX OS 6 ISO image from NVIDIA Enterprise Support, make sure the host that you use for your web browser can access the ISO image file.
Log in to the BMC.
Refer to Connecting to the DGX System for more information.
Click [Remote Control] and then click [Launch KVM].
Set up the ISO image as virtual media.
From the top bar, click [Browse File], locate and select the DGX OS ISO file, and then click [Open].
Click [Start Media].
Reset the system and boot the virtual media image.
From the top menu, click [Power] and select [Hard Reset], then click [Perform Action].
Click [Yes] and then [OK] at the Power Control dialogs.
Wait for the system to power down and then come back online.
Refer to DGX OS ISO Boot Options for a description of the GRUB menu options and for instructions on completing the installation process.
Installing the DGX OS Image from a USB Flash Drive or DVD-ROM#
After obtaining the DGX OS 6 ISO image from NVIDIA Enterprise Support, create a bootable installation medium, such as a USB flash drive or DVD-ROM, that contains the image.
To create a bootable USB flash drive, refer to one of the following links for more information:
On Linux, refer to Creating a Bootable USB Flash Drive by Using the dd Command.
On Windows, refer to Creating a Bootable USB Flash Drive by Using Akeo Rufus.
To create a bootable DVD ROM, refer to Burning the ISO on to a DVD-ROM on the Ubuntu Community Help Wiki for more information about the available methods.
Creating a Bootable USB Flash Drive by Using the dd Command#
On a Linux system, you can use the dd command to create a bootable USB flash drive that contains the DGX OS software image.
Note
To ensure that the resulting flash drive is bootable, use the dd command to perform a device bit copy of the image. If you use other commands to perform a simple file copy of the image, the resulting flash drive may not be bootable.
Ensure that the following prerequisites are met:
The correct DGX OS software image is saved to your local disk.
For more information, refer to Obtaining the DGX OS ISO Image.
The USB flash drive meets the following requirements:
The USB flash drive has a capacity of at least 16 GB.
(DGX A100 only) The partition scheme on the USB flash drive is a GPT partition for UEFI.
Create the bootable USB flash drive:
Plug the USB flash drive into one of the USB ports of your Linux host. Obtain the device name of the USB flash drive by running the lsblk command.
lsblk
You can identify the USB flash drive from its size, which is much smaller than the size of the SSDs in the DGX system, and from the mount points of any partitions on the drive, which are under /media.
In the following example output, the device name of the USB flash drive is sde.
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0  1.8T  0 disk
|_sda1    8:1    0  121M  0 part /boot/efi
|_sda2    8:2    0  1.8T  0 part /
sdb       8:16   0  1.8T  0 disk
|_sdb1    8:17   0  1.8T  0 part
sdc       8:32   0  1.8T  0 disk
sdd       8:48   0  1.8T  0 disk
sde       8:64   1  7.6G  0 disk
|_sde1    8:65   1  7.6G  0 part /media/deeplearner/DGXSTATION
As root, convert and copy the image to the USB flash drive.
sudo dd if=<path-to-ISO-image> bs=2048 of=<usb-drive-device-name>
Warning
The dd command erases all data on the device that you specify in the of argument. To avoid losing data, ensure that you specify the correct path to the USB flash drive.
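Because a wrong of argument is destructive, it can help to wrap the dd step in a few checks. The following is a sketch under stated assumptions, not the documented procedure: the helper names are hypothetical, the removable flag is read from /sys/block/<device>/removable (the same value that lsblk shows in the RM column), and conv=fsync and status=progress are GNU dd options added here for convenience. Some USB devices, such as external SSD enclosures, do not report themselves as removable, so treat the check as a guard rather than a guarantee.

```shell
# Sketch only: guard rails around the dd step. Helper names are hypothetical.

# Succeeds if the kernel flags the disk as removable (the RM column in lsblk).
# $1: device name such as "sde"; $2: sysfs root, overridable for testing.
is_removable_disk() {
    local dev="$1" sysfs="${2:-/sys}"
    [ -r "$sysfs/block/$dev/removable" ] || return 1
    [ "$(cat "$sysfs/block/$dev/removable")" = "1" ]
}

# Writes the ISO with the documented dd invocation, but only to a
# removable disk that has no mounted partitions.
write_iso_to_usb() {
    local iso="$1" dev="$2"
    if ! is_removable_disk "$dev"; then
        echo "refusing: /dev/$dev is not a removable disk" >&2
        return 1
    fi
    if grep -q "^/dev/$dev" /proc/mounts; then
        echo "refusing: unmount the partitions on /dev/$dev first" >&2
        return 1
    fi
    # conv=fsync and status=progress are additions to the documented command:
    # flush data before dd exits, and show progress while writing.
    sudo dd if="$iso" of="/dev/$dev" bs=2048 conv=fsync status=progress
}
```

For the example output above, the call would be write_iso_to_usb <path-to-ISO-image> sde, after unmounting /media/deeplearner/DGXSTATION.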
Creating a Bootable USB Flash Drive by Using Akeo Rufus#
On a Windows system, you can use the Akeo Reliable USB Formatting Utility (Rufus) to create a bootable USB flash drive that contains the DGX OS software image.
Ensure that the following prerequisites are met:
The correct DGX OS software image is saved to your local disk.
For more information, refer to Obtaining the DGX OS ISO Image.
The USB flash drive has a capacity of at least 16 GB.
Follow these steps to create the bootable USB flash drive:
Plug the USB flash drive into one of the USB ports of your Windows system.
Download and launch the Akeo Reliable USB Formatting Utility (Rufus).
In Drive Properties, select the following options:
In Device, select your USB flash drive.
In Boot selection, click [SELECT], and then locate and select the DGX OS software image.
You can leave the other settings at the default.
Click [Start]. This step prompts you to select whether to write the image in ISO Image mode (file copy) or DD Image mode (disk image).
Select [Write in DD Image mode] and click [OK].
Booting the DGX OS ISO Image#
These instructions describe how to boot the DGX OS ISO image locally.
Plug the USB flash drive containing the OS image into the DGX system.
Connect a monitor and keyboard directly to the DGX system.
Boot the system and then press F11 when the NVIDIA logo appears to access the boot menu.
Select the USB volume name that corresponds to the inserted USB flash drive and boot the system from it.
Refer to DGX OS ISO Boot Options for a description of the GRUB menu options and for information about completing the installation process.
DGX OS ISO Boot Options#
This section provides information about the available installation and boot options of the DGX OS ISO installer.
These instructions assume that you have booted the DGX OS ISO, either remotely through the BMC or locally from a USB flash drive.
When the system boots up, select one of the following options from the GRUB menu:
Install DGX OS <version>
Install DGX OS <version>: Without Reformatting Data RAID (does not mount /raid)
Advanced Installation Options
- Install DGX OS <version> Without NVIDIA Drivers
- Install DGX OS <version> With Encrypted Root
- Install DGX OS <version> With Encrypted Root and Without Reformatting Data RAID
Boot Into Live Environment
Check Media for Defects
See the following sections for more information about these options.
Verify that the DGX system booted up and that the image is being installed.
The installer iterates through the software components, copying and installing them while displaying the executed commands. This process generally takes between 15 and 60 minutes, depending on the DGX platform and on how the system is being imaged (for example, through the BMC over a slow network or locally from a fast USB flash drive).
Note
On DGX servers, the NVIDIA InfiniBand driver is installed and the firmware on the ConnectX cards is updated. This process can take up to 5 minutes for each card. Other system firmware is not updated.
After the installation is complete, the system reboots into the OS, and prompts for configuration information. Refer to Initial Setup for more information about how to boot up the DGX system for the first time after reimaging the system.
Install DGX OS#
This option installs DGX OS and reformats the data RAID.
When you accept this option, the installation process repartitions all drives, including the OS and the data drives. The data drives are configured as a RAID array and mounted under the /raid directory. This process overwrites all the data and file systems that might exist on the OS and data drives. The RAID array on the DGX data disks is intended to be used as a cache and not for long-term data storage, so reformatting the data RAID should not be disruptive.
These changes are preserved across system reboots.
Install DGX OS without Reformatting the Data RAID#
This option installs DGX OS without reformatting the data RAID.
The RAID array on the DGX data disks is intended for use as a cache and not for long-term data storage, so reformatting it should not be disruptive. However, if you are an advanced user who has set up the disks for a non-cache purpose and you want to keep the data on those drives, select the Install DGX OS <version>: Without Reformatting Data RAID option at the boot menu. This option retains the data on the RAID disks and completes the following tasks:
Installs the cache daemon but leaves it disabled by commenting out the RUN=yes line in /etc/default/cachefilesd.
Creates a /raid directory but leaves it out of the file system table by commenting out the entry containing /raid in /etc/fstab.
Does not format the RAID disks.
When the installation is completed, you can repeat any configuration steps that you had performed to use the RAID disks as other than cache disks. You can always choose to use the RAID disks as cache disks later by enabling cachefilesd and adding /raid to the file system table:
Uncomment the #RUN=yes line in /etc/default/cachefilesd.
To mount a block device or RAID array at /raid, uncomment the /raid line in /etc/fstab. Be sure to replace <device> with the proper value for your environment.
Run the following:
Mount /raid.
sudo mount /raid
Reload the systemd manager configuration.
sudo systemctl daemon-reload
Start the cache daemon.
sudo systemctl start cachefilesd
These changes are preserved across system reboots.
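The two uncommenting edits described above can be sketched with sed. This is an illustration only: enable_raid_cache is a hypothetical helper, the exact layout of the commented lines on your system is an assumption, and the function takes the file paths as parameters so that you can try it on copies first. It keeps a .bak backup of each file and does not replace <device> for you.

```shell
# Sketch: uncomment RUN=yes and the /raid fstab entry.
# enable_raid_cache is a hypothetical helper, not a DGX OS tool.
# $1: cachefilesd defaults file; $2: file system table.
enable_raid_cache() {
    local cachefilesd_conf="${1:-/etc/default/cachefilesd}"
    local fstab="${2:-/etc/fstab}"
    # "#RUN=yes" becomes "RUN=yes"; a backup is kept as <file>.bak.
    sed -i.bak -E 's/^#[[:space:]]*(RUN=yes)/\1/' "$cachefilesd_conf"
    # Uncomment only lines whose second field is the /raid mount point,
    # so ordinary comments that merely mention /raid stay untouched.
    sed -i.bak -E 's|^#[[:space:]]*([^#[:space:]]+[[:space:]]+/raid[[:space:]])|\1|' "$fstab"
}
```

On a live system the function needs root privileges to modify the files under /etc. Review the resulting /raid entry in /etc/fstab before you mount it.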
Advanced Installation Options (Encrypted Root)#
When you select this menu item, you can encrypt the root filesystem of the DGX system.
Warning
Select this option only if you want to encrypt the root filesystem.
Aside from the encrypted root filesystem, the behavior is identical to the default installation.
Selecting Encrypted Root instructs the installer to encrypt the root filesystem. The encryption itself is fully automated, but you must manually unlock the root partition by entering a passphrase at the console (through a direct keyboard and mouse connection or through the BMC) each time the system boots.
When you power on your DGX system for the first time, you are prompted to accept end user license agreements for NVIDIA software. You are then guided through the initial Ubuntu OS configuration, during which you create your passphrase for the drive. If necessary, you can change this passphrase later. For more details, see First Boot Process for DGX Servers or First Boot Process for DGX Station.
Warning
Encryption cannot be enabled or disabled after the installation. To change the encryption state again, you need to reimage the drives.
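After the first boot, you may want to confirm that an encrypted volume is actually present. The following sketch assumes that the encrypted root is implemented as a LUKS container, which this guide does not state; has_luks_device is a hypothetical helper that scans lsblk output for the crypto_LUKS file system type.

```shell
# Sketch: detect a LUKS container in lsblk output.
# Assumption: the encrypted root uses LUKS (not confirmed by this guide).
# Reads "NAME FSTYPE" pairs, as printed by `lsblk -rno NAME,FSTYPE`, from stdin.
has_luks_device() {
    awk '$2 == "crypto_LUKS" { found = 1 } END { exit !found }'
}
```

A possible invocation is lsblk -rno NAME,FSTYPE | has_luks_device && echo "encrypted volume present". If nothing is printed on a system that was installed with Encrypted Root, check the assumption about LUKS before drawing conclusions.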
Boot Into a Live Environment#
The DGX OS installer image can also be used as a Live image, which means that the image boots up and runs a minimal DGX OS in system memory and does not overwrite anything on the disks in the system.
Live mode does not load drivers, and is essentially a simple Ubuntu Server configuration. This mode can be used as a tool to debug a system when the disks on the system are not accessible or should not be touched.
In a typical operation, this option should not be selected.
Check Disc for Defects#
This option checks the installation media for defects.
If you experience anomalies when you install the DGX OS and suspect that the installation media might have an issue, select this item to run an extensive test of the installation media contents.
The process is time consuming, and the installation media is usually not the source of the problem. In a typical operation, this option should not be selected.
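If the media is a USB flash drive written from a Linux host, a byte comparison on that host is a possible first check before you run the installer's media test. The following is a sketch, not a replacement for the built-in check; verify_written_image is a hypothetical helper, and reading a device such as /dev/sde requires a root shell.

```shell
# Sketch: compare the ISO with the first bytes of the device (or file)
# that it was written to. verify_written_image is a hypothetical helper.
verify_written_image() {
    local iso="$1" target="$2" size
    size=$(wc -c < "$iso") || return 2
    # Limit the comparison to the ISO length, because the flash drive
    # is larger than the image.
    cmp -n "$size" "$iso" "$target"
}
```

A zero return status means that the first bytes of the target match the ISO exactly. Any difference is reported by cmp with the offset of the first mismatching byte.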