PXE Boot Setup#

The DGX-Server UEFI BIOS supports PXE boot. Several manual customization steps are required before PXE can boot the Base OS image.

Caution

This document is meant to be used as a reference. Explicit instructions are not given for configuring the DHCP, HTTP, and TFTP servers. The end user’s IT team is expected to configure these servers to fit their company’s environment and security guidelines.

Requirements#

  • TFTP server

    • Software that provides TFTP service.

  • HTTP server

An HTTP server is used to transfer large files, such as the ISO image and initrd. Alternatively, TFTP can be used for this purpose. HTTP is used in the example below.

  • DHCP server

Software that provides Dynamic Host Configuration Protocol (DHCP) service.

      Note

      The TFTP server, HTTP server, and DHCP server can all be configured on the same system, or they can each be on different systems.

  • Linux bootloader

  • IP address: <ftp ip>

  • Fully qualified hostname: <ftp host>

This topic provides guidance on how to set up a PXE boot environment for DGX systems. For complete details, refer to online documentation for setting up a PXE boot server. In this example, xinetd provides the TFTP service, the ISC DHCP server (dhcpd) provides the DHCP service, and GRUB is used as the bootloader.

Overview of the PXE Server#

The PXE server requires configuration in the following areas:

  • bootloader (grub)

  • TFTP contents (the kernel and initrd)

    In this example, TFTP is configured to serve files from /local/tftp/. You will need to configure your TFTP server to serve files from /local/tftp/ or from a directory of your choice.

  • HTTP contents (the iso image)

    In this example, HTTP is configured to serve files from /local/http/. You will need to configure your HTTP server to serve files from /local/http/ or from a directory of your choice.

  • DHCP

PXE Server Configuration#

x86_64 Instructions

In this example, the directory structure on the HTTP and TFTP server looks like this:

/local/
   http/
      base_os_7.0.0/
         base_os_7.0.0.iso
   tftp/
      grub2/
         base_os_7.0.0/
            vmlinuz
            initrd
         grub.cfg
         bootx64.efi
         grubx64.efi

Note

The vmlinuz and initrd files are specified relative to the TFTP root, /local/tftp/; and the location of the ISO, base_os_7.0.0.iso, is relative to the HTTP root, /local/http/.

Here, the DHCP and PXE servers are configured to use the above directory structure. The person responsible for deploying the PXE environment should change the directory names and structure to fit their infrastructure.

You can set up the directory structure on your HTTP and TFTP server similarly.
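
For example, the directory skeleton above could be created with commands along these lines (a sketch using the example paths; adjust them to fit your own layout):

sudo mkdir -p /local/http/base_os_7.0.0
sudo mkdir -p /local/tftp/grub2/base_os_7.0.0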

The contents of the /local/tftp/grub2/grub.cfg file should look something like this:

set default=0
set timeout=-1
insmod all_video

menuentry 'Install BaseOS 7.0.0' {
  linuxefi /grub2/base_os_7.0.0/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=http://<Server IP>/base_os_7.0.0/base_os_7.0.0.iso nvme-core.multipath=n nouveau.modeset=0
  initrdefi /grub2/base_os_7.0.0/initrd
}

Note

The kernel boot parameters should match the contents of the corresponding ISO’s boot menu found in /mnt/boot/grub/grub.cfg.
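
For example, once the ISO is mounted at /mnt (see the TFTP directory steps below), you can review those menu entries with a command such as:

grep -A 3 menuentry /mnt/boot/grub/grub.cfg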

When the system being installed boots via PXE, the boot files are retrieved from the TFTP server, which serves them from /local/tftp. (In this example, the TFTP server is provided by the xinetd service, whose configuration file, /etc/xinetd.d/tftp, specifies that the boot files are located in /local/tftp.) The bootx64.efi file designated in the DHCP server’s dhcpd.conf file is retrieved by TFTP transfer (see Configure Your DHCP Server). After bootx64.efi is booted, the PXE boot grub.cfg file, grub2/grub.cfg in this example, provides menu options for booting further. This grub.cfg file specifies the locations of the kernel and initrd files relative to the TFTP directory.
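
For reference, a minimal sketch of what such an /etc/xinetd.d/tftp entry might look like; the in.tftpd path and arguments are assumptions and depend on which TFTP server package you install:

service tftp
{
   protocol    = udp
   socket_type = dgram
   wait        = yes
   user        = root
   server      = /usr/sbin/in.tftpd
   server_args = -s /local/tftp
   disable     = no
}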

  • Configure the HTTP Directory:

Configure the HTTP file directory and ISO image by placing a copy of the Base OS 7.0.0 ISO in the directory /local/http/base_os_7.0.0/. In this example, the full path is /local/http/base_os_7.0.0/base_os_7.0.0.iso.
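
For example, assuming the ISO has already been downloaded to the current directory (the source filename may differ in your environment):

sudo cp base_os_7.0.0.iso /local/http/base_os_7.0.0/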

  • Configure the TFTP Directory By Using the Following Steps:

Mount the Base OS 7.0.0 ISO. Assume your mount point is /mnt:

sudo mount -o loop /local/http/base_os_7.0.0/base_os_7.0.0.iso /mnt

Copy the kernel and initrd from the ISO to the tftp directory:

cp /mnt/casper/vmlinuz /local/tftp/grub2/base_os_7.0.0/
cp /mnt/casper/initrd /local/tftp/grub2/base_os_7.0.0/

Unmount the Base OS 7.0.0 ISO:

umount /mnt

  • Download GRUB Packages For x86_64 and Copy the *.efi shim Binaries for PXE Booting into Place:

cd /tmp
wget https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages/grub2-efi-x64-2.06-94.el9.x86_64.rpm
wget https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages/shim-x64-15-15.el8_2.x86_64.rpm

Note

If either grub2-efi-x64-2.06-94.el9.x86_64.rpm or shim-x64-15-15.el8_2.x86_64.rpm does not exist at https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages, use the most recent version of that file from the same location.

Unpack the RPMs with the following commands:

rpm2cpio grub2-efi-x64-2.06-94.el9.x86_64.rpm | cpio -idmv
rpm2cpio shim-x64-15-15.el8_2.x86_64.rpm | cpio -idmv

Copy the following binaries from the unpacked RPMs to /local/tftp/grub2/:

cp -p $(find . -name shimx64.efi) /local/tftp/grub2/shimx64.efi
cp -p $(find . -name grubx64.efi) /local/tftp/grub2/grubx64.efi
cd -

Make a copy of shimx64.efi in /local/tftp/grub2/ and name the copy bootx64.efi:

cp -p /local/tftp/grub2/shimx64.efi /local/tftp/grub2/bootx64.efi

For more information, refer to How to netboot the server installer on amd64.

x86_64 Alternate Instructions If initrd File Transfer Times Out Over TFTP

In some cases, the transfer of the initrd can time out over TFTP. A workaround for this is to host the requisite files - initrd, vmlinuz, and the ISO - over HTTP instead. Hosting these over HTTP makes the transfer faster and more reliable. In this example, assume the HTTP server files are hosted from /local/http. The directory structure on the HTTP and TFTP server looks like this:

/local/
   http/
      base_os_7.0.0/
         base_os_7.0.0.iso
         vmlinuz
         initrd
   tftp/
      grub2/
         grub.cfg
         bootx64.efi
         grubx64.efi

Note

In this alternate configuration, the vmlinuz and initrd files and the ISO, base_os_7.0.0.iso, are all specified relative to the HTTP root, /local/http/.
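
To populate this layout, the kernel and initrd are copied out of the mounted ISO into the HTTP directory instead of the TFTP directory; a sketch following the same mount steps as in the previous section:

sudo mount -o loop /local/http/base_os_7.0.0/base_os_7.0.0.iso /mnt
cp /mnt/casper/vmlinuz /local/http/base_os_7.0.0/
cp /mnt/casper/initrd /local/http/base_os_7.0.0/
sudo umount /mnt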

When configured this way, the grub.cfg file will contain text that will look something like this:

set default=0
set timeout=-1
insmod all_video

menuentry 'Install BaseOS 7.0.0'  {
  linuxefi (http,<HTTP Server IP>)/base_os_7.0.0/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=http://<Server IP>/base_os_7.0.0/base_os_7.0.0.iso nvme-core.multipath=n nouveau.modeset=0
  initrdefi (http,<HTTP Server IP>)/base_os_7.0.0/initrd
}

Note

The kernel boot parameters should match the contents of the corresponding ISO’s boot menu, found in /mnt/boot/grub/grub.cfg.

ARM64 Instructions

Download the relevant grub package with the correct architecture specified:

wget https://mirror.stream.centos.org/9-stream/BaseOS/aarch64/os/Packages/grub2-efi-aa64-2.06-94.el9.aarch64.rpm

Unpack the RPM with the following command:

rpm2cpio grub2-efi-aa64-2.06-94.el9.aarch64.rpm | cpio -idmv

Copy the following binary from the unpacked RPM to /local/tftp/grub2/:

grubaa64.efi
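
Mirroring the x86_64 steps, this copy can be done with a command along these lines (assuming the RPM was unpacked in the current directory):

cp -p $(find . -name grubaa64.efi) /local/tftp/grub2/grubaa64.efi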

An example of the directory structure on the HTTP and TFTP server could be similar to:

/local/
   http/
      base_os_7.0.0/
         base_os_7.0.0.iso
         vmlinuz
         initrd
   tftp/
      grub2/
         grub.cfg
         bootx64.efi
         grubaa64.efi

When configured this way, the grub.cfg file would contain lines similar to:

set default=0
set timeout=-1
insmod all_video

menuentry 'Install BaseOS 7.0.0' {
    linuxefi (http,<HTTP Server IP>)/base_os_7.0.0/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=http://<Server IP>/base_os_7.0.0/base_os_7.0.0.iso nvme-core.multipath=n nouveau.modeset=0
    initrdefi (http,<HTTP Server IP>)/base_os_7.0.0/initrd
}

Note

The kernel boot parameters should match the contents of the corresponding ISO’s boot menu, found in /mnt/boot/grub/grub.cfg.

TFTP and HTTP Server Verification#

After you have set up all elements of your PXE server, prior to doing a PXE install, verify that the TFTP and HTTP servers are working properly.

TFTP Server Verification

To verify that the TFTP server has been set up correctly, from a different system on the same subnet, use tftp to get one of the files that will be obtained via TFTP during the PXE boot. In this example, the TFTP server has been set up to serve files from /local/tftp. The GRUB configuration file, grub.cfg, is located at /local/tftp/grub2/grub.cfg; therefore, from the tftp command prompt, request it with get grub2/grub.cfg.

cd /tmp
tftp <TFTP_Server_IP>
get grub2/grub.cfg
quit

HTTP Server Verification

To verify that the HTTP server has been set up correctly, use the wget command to get one of the files that will be obtained via HTTP during the PXE boot. In this example, the HTTP server has been set up to serve files from /local/http. The ISO, base_os_7.0.0.iso, is located at /local/http/base_os_7.0.0/base_os_7.0.0.iso; therefore, test the HTTP request to retrieve base_os_7.0.0.iso by running the following commands:

cd /tmp
wget http://<HTTP Server IP>/base_os_7.0.0/base_os_7.0.0.iso

Useful Parameters for Configuring Your System’s Network Interfaces#

ip=dhcp: tells the initramfs to automatically configure the system’s interfaces using DHCP.

  • If only one interface is connected to the network, this should be sufficient.

  • If multiple interfaces are connected to the network, the first interface that receives a reply will be used.

Parameters Unique to the Base OS Installer#

  • rebuild-raid tells the installer to rebuild the data RAID. Factory installs should always specify this; it is optional otherwise.

  • md5checkdisc does not perform an installation when specified. The installer simply unpacks the ISO and checks that its contents match what is described in md5sum.txt.

  • offwhendone powers off the system after the installation. Otherwise, the system reboots when the installation is done. Factory installs specify this.

  • nooemconfig skips oem-config and creates the default user nvidia with a seeded initial password. This is used for touchless installations, such as PXE installs or automated VM creation.

  • force-ai allows you to supply your own autoinstall file. If networking is set up, you can provide a URL; otherwise, the value must refer to an autoinstall file that exists in the installer.

For example:

force-ai=/ai/dgx_a100-ai.yaml
force-ai=http://your-server.com/your-ai.yaml
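
As an illustration only, a hedged sketch of how such parameters could be appended to the kernel line in the PXE grub.cfg; which parameters apply depends on your deployment, and the force-ai URL is a placeholder:

linuxefi /grub2/base_os_7.0.0/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=http://<Server IP>/base_os_7.0.0/base_os_7.0.0.iso nvme-core.multipath=n nouveau.modeset=0 rebuild-raid nooemconfig force-ai=http://your-server.com/your-ai.yaml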

Note

Refer to the note in the Autoinstall Customizations section for special formatting considerations when using custom autoinstall files along with the force-ai parameter.

Configure Your DHCP Server#

The DHCP server is responsible for providing the IP address of the TFTP server and the name of the bootloader file, in addition to its usual role of providing dynamic IP addresses. The address of the TFTP server is specified in the DHCP configuration file as next-server, and the bootloader file is specified as filename. The architecture option can be used to detect the architecture of the client system and to serve the correct version of the GRUB bootloader (x86, x64, ARM64, and so on).

An example of the PXE portion of dhcpd.conf is:

class "pxeclients" {
   match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";

   next-server <TFTP_Server_IP>;

   # x86 UEFI
   if option arch = 00:06 {
      filename "grub2/bootx64.efi";
   # x64 UEFI
   } else if option arch = 00:07 {
      filename "grub2/bootx64.efi";
   # ARM 64-bit UEFI
   } else if option arch = 00:0b {
      filename "grub2/grubaa64.efi";
   } else {
      filename "pxelinux.0";
   }
}

An example of the subnet portion of dhcpd.conf for the system is shown below. The target network is 10.10.10.0. The DHCP server will assign an IP address from the specified range. Specific (fixed) IP addresses can also be configured. Consult the documentation for your DHCP server for more detailed information.

subnet 10.10.10.0 netmask 255.255.255.0 {
  option broadcast-address 10.10.10.255;
  option routers 10.10.10.1;
  option subnet-mask 255.255.255.0;
  option interface-mtu 9000;
  range 10.10.10.200 10.10.10.250;
}
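
If particular systems should always receive the same address, a host declaration can be added to dhcpd.conf; a minimal sketch in which the host name, MAC address, and fixed address are placeholders:

host dgx-01 {
  hardware ethernet <MAC_Address>;
  fixed-address 10.10.10.50;
}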

(Optional) Configure CX-4/5/6/7 Cards to PXE Boot#

DGX-Servers may also PXE boot using the MLNX CX-4/5/6/7 cards. If you are logged into the DGX-Server host OS and are running DGX Base OS 4.4 or later, you can perform this section’s steps using the /usr/sbin/mlnx_pxe_setup.bash tool, which enables the UEFI PXE ROM of every MLNX InfiniBand device found.
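
For example, a typical invocation might simply be the following (an assumption; consult the tool itself for any supported options):

sudo /usr/sbin/mlnx_pxe_setup.bash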

Otherwise, proceed with the manual steps below.

Enable UEFI for ConnectX Cards#

To PXE boot from the MLNX ConnectX-4/5/6/7 cards, you must first enable the UEFI PXE ROM of the card you wish to PXE boot from, because it is disabled by default. This must be performed from the DGX Server host OS itself; it cannot be done remotely.

Follow these steps using the DOCA OFED software, which provides the same drivers and tools as MLNX_OFED.
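
Note that mlxconfig addresses devices through /dev/mst. If those device nodes are not present yet, you may need to start the Mellanox Software Tools service first (an assumption that depends on your installation):

sudo mst start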

  1. To determine the device name and current configurations of the MLNX ConnectX cards, run the sudo mlxconfig query command:

    $ sudo mlxconfig query
    
    Device #1:
    ----------
    
    Device type:    ConnectX4
    Name:           MCX455A-ECA_Ax
    Description:    ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; single-port QSFP28; PCIe3.0 x16; ROHS R6
    Device:         /dev/mst/mt4115_pciconf3
    
    Configurations:                              Next Boot
             ...
             ...
             EXP_ROM_UEFI_x86_ENABLE             False(0)
             ...
             ...
    
  2. Enable UEFI PXE ROM of the ConnectX card:

    user@dgx1server$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf3 set EXP_ROM_UEFI_x86_ENABLE=1
    
  3. Reboot the server for the change to take effect.

    user@dgx1server$ sudo reboot
    
  4. Upon reboot, confirm the configuration was set.

    user@dgx1server$ sudo mlxconfig query
    
    Device #1:
    ----------
    
    Device type:    ConnectX4
    Name:           MCX455A-ECA_Ax
    Description:    ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; single-port QSFP28; PCIe3.0 x16; ROHS R6
    Device:         /dev/mst/mt4115_pciconf3
    
    Configurations:                              Next Boot
             ...
             ...
             EXP_ROM_UEFI_x86_ENABLE             True(1)
             ...
             ...
    

(Optional) Configure the DGX-Server to PXE Boot Automatically#

Add PXE to the Top of the UEFI Boot Order#

On systems with a BMC, you can configure the DGX-Server to PXE boot by adding PXE to the top of the UEFI boot order. This can be done out-of-band via IPMI.

ipmitool -I lanplus -H <DGX_BMC_IP> -U <ADMIN> -P <PASSWORD> chassis bootdev pxe options=efiboot
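
To confirm that the override took effect, you can also query boot parameter 5 out-of-band; a sketch using the same credentials:

ipmitool -I lanplus -H <DGX_BMC_IP> -U <ADMIN> -P <PASSWORD> chassis bootparam get 5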

Note

This only sets the DGX-Server to PXE boot, but doesn’t specify the order of network devices to attempt PXE from. This is a limitation of our current UEFI and BMC FW. See the following section to specify the network device boot order.

Configure Network Boot Priorities#

The UEFI Network Drive BBS Priorities menu allows you to specify the order of network devices to PXE boot from. To modify this, you must reboot your DGX-Server and enter the UEFI boot selection menu by pressing F2 or Del when you see the splash screen. Navigate to the Boot menu, and then scroll down to UEFI NETWORK Drive BBS Priorities.

[Image: sbios-splash-screen-1.png]

Configure the order of devices to attempt network boots from using this menu.

[Image: sbios-splash-screen-2.png]

Save and Exit.

After you have finished ordering the network boot priorities, save your changes and reset.

[Image: sbios-splash-screen-3.png]

Make the DGX-Server PXE Boot#

Automated PXE Boot Process#

If you followed the optional steps above, you can now reboot, and UEFI will attempt to PXE boot using the devices in the order specified in the Network Drive BBS Priorities list.

Manual PXE Boot Process#

If you want to manually trigger the PXE boot, reboot your DGX-Server and enter the UEFI boot selection menu by pressing F2 or Del when you see the splash screen.

Navigate to the Save & Exit menu, scroll down to the Boot Override section, and choose the appropriate network port to boot from. The MLNX cards will only appear if you have enabled the UEFI PXE ROM of those cards.

[Image: sbios-splash-screen-4.png]

Alternatively, you can press F12 at the SBIOS splash screen, and the SBIOS will iterate through each NIC and attempt PXE boot on each one. The order of the NICs attempted is specified by the Network Drive BBS Priorities.

Other IPMI Boot Options#

For more information about specifying boot order via IPMI, see the chassis command and bootdev subcommand in the ipmitool man page.

For more information about the IPMI specification, refer to Intelligent Platform Management Interface Specification v2.0 rev. 1.1.

Autoinstall Customizations#

There are many autoinstall files that you can reference inside the Base OS 7.x ISO; these are contained in:

casper/ubuntu-server-minimal.ubuntu-server.installer.kernel.nvidia.squashfs

You can mount the ISO and then mount this squashfs to view the many autoinstall files that are packed within:

mkdir -p /tmp/iso_mnt
mkdir -p /tmp/squash_mnt
sudo mount /path/to/DGXOS-<version>-<date>.iso /tmp/iso_mnt/
sudo mount /tmp/iso_mnt/casper/ubuntu-server-minimal.ubuntu-server.installer.kernel.nvidia.squashfs /tmp/squash_mnt/
find /tmp/squash_mnt/ai/ -name '*.yaml'
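
When you are done inspecting the autoinstall files, unmount in reverse order:

sudo umount /tmp/squash_mnt
sudo umount /tmp/iso_mnt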

For some deployments, you might want to use your own autoinstall files. This section describes some sections contained in the built-in autoinstall files as well as how to perform some common customizations.

Note

The installer expects a unified autoinstall file rather than the typical split vendor/user/meta-data format. This means that the user-supplied autoinstall file will need to account for some formatting differences, namely, the autoinstall: keyword needs to be dropped and the indentations adjusted accordingly:

#
# typical user-data file
#
#cloud-config
autoinstall:
  version: 1
  identity:
    realname: 'DGX User'
    username: dgxuser
    password: '$6$g3vXaGj.MQpP/inN$l6.JtAueRAfMtQweK7qASjxXiEX8Vue3CvRcwON81Rt9BJmlEQKtnfOVSnCqHrTsy88PbMDDHq6k.iM6PWfHr1'

#
# unified autoinstall file
#
version: 1
identity:
  realname: 'DGX User'
  username: dgxuser
  password: '$6$g3vXaGj.MQpP/inN$l6.JtAueRAfMtQweK7qASjxXiEX8Vue3CvRcwON81Rt9BJmlEQKtnfOVSnCqHrTsy88PbMDDHq6k.iM6PWfHr1'

NVIDIA-Specific Autoinstall Variables#

The autoinstall files contained in the ISO are platform-specific, and serve as a good starting point for custom versions. Many of them contain variables, prefixed with CHANGE_, which will be substituted by the installer:

  • CHANGE_STORAGE_REG This marker gets removed, and its stanza uncommented, when the boot parameter ai-encrypt-root is not present. Uncommenting this stanza results in the standard disk partitioning scheme without LUKS encryption.

  • CHANGE_STORAGE_ENC This marker gets removed, and its stanza uncommented, when the boot parameter ai-encrypt-root is present. Uncommenting this stanza results in an encrypted root partition.

  • CHANGE_BOOT_DISK_NAME_x This is a disk-name, without the /dev prefix. There may be multiple ones (for example, CHANGE_BOOT_DISK_NAME_1 and CHANGE_BOOT_DISK_NAME_2) for platforms that expect a RAIDed boot device as is the case for DGX A100.

    Note

    The installer will find the appropriate disk name to substitute here. Alternatively, the force-bootdisk parameter can be used to specify the disk name(s).

  • CHANGE_BOOT_DISK_PATH_x This is the same as the CHANGE_BOOT_DISK_NAME_x variable above, except that it is prefixed with /dev/.

  • CHANGE_DESC_PLATFORM The installer will substitute this with a platform-specific descriptive name.

  • CHANGE_SERIAL_NUMBER The installer will substitute this with the serial number reported by dmidecode.

  • CHANGE_INSTALL_PKGS The installer will substitute this value with a list of packages specific to the platform. The lists of packages are specified by the *-pkgs files in the squashfs.

    Note

    The list of packages here will include oem-config and its dependencies. When you supply your own autoinstall file, you will typically want the additional setup steps normally provided by oem-config to be performed during autoinstall instead. For this use case we recommend adding, in the late-commands section, a step to remove the oem-config and ubiquity packages:

    late-commands:
     ...
     - curtin in-target -- apt-get purge -y oem-config ubiquity
    
  • CHANGE_REBUILD_RAID This gets replaced with either true or false based on whether or not the rebuild-raid boot parameter is present.

  • CHANGE_IPMISOL This gets replaced with either true or false based on whether or not the ai-encrypt-root boot parameter is present. When we set the system up with encryption, we also undo the IPMI serial-over-LAN configuration to ensure that the LUKS passphrase prompt shows up on the console rather than the serial-over-LAN interface.

Attention

While it is possible to replace these values on your own, we strongly recommend letting the installer handle this.

Common Customizations#

This section describes some common customizations that may be useful in more specialized deployments.

Network Configuration#

To configure the network at install time, you can add a network section to your autoinstall file. In this example we will create a netplan configuration file that sets the enp1s0f0 interface to use DHCP:

network:
  version: 2
  ethernets:
    enp1s0f0:
      dhcp4: yes
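
If an interface needs a static address instead, the same network section accepts standard netplan keys; a minimal sketch in which the address, gateway, and DNS server are placeholders:

network:
  version: 2
  ethernets:
    enp1s0f0:
      addresses: [10.10.10.50/24]
      routes:
        - to: default
          via: 10.10.10.1
      nameservers:
        addresses: [10.10.10.2]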

Creating a User#

To create a user at install time, you can add an identity section to your autoinstall file. The following example sets the system hostname to dgx and creates a user nvidia with the password nvidia.

#  To generate an encrypted password:
#    printf '<plaintext_password>' | openssl passwd -6 -stdin
#
#  For example:
#    printf 'nvidia' | openssl passwd -6 -stdin
identity:
  hostname: dgx
  password: $6$8fqF54QDoaLMtDXJ$J02iNH1xW9hHtzH6APpUX4X4HkRx2xY2ZKy9DQpGOQhW7OOuTk3DwHr9FnAAh1JIyqn3L277Jy9MEzW4MyVsV0
  username: nvidia

For more examples, refer to the Autoinstall configuration reference manual in the Ubuntu installation documentation.