PXE Boot Setup
The DGX-Server UEFI BIOS supports PXE boot. Several manual customization steps are required to get PXE to boot the Base OS image.
Caution
This document is meant to be used as a reference. Explicit instructions are not given to configure the DHCP, FTP, and TFTP servers. It is expected that the end user’s IT team will configure them to fit within their company’s security guidelines.
Pre-requisites
- TFTP server is set up
  - TFTP is configured to serve files from /local/tftp/
- HTTP server is set up
  - HTTP is configured to serve files from /local/http/
- DHCP server is set up
  - IP address is <FTP IP>
  - Fully qualified host is <FTP host>
This document is intended to provide detailed step-by-step instructions on how to set up a PXE boot environment for DGX systems. The examples are based on a DGX A100. There are several major components of the solution:
DHCP server: dnsmasq is used in this doc.
TFTP server: dnsmasq is also used as a TFTP server.
HTTP server: An HTTP server is used to transfer large files, such as the ISO image and initrd. Alternatively, FTP can be used for this purpose. HTTP is used in this doc.
Syslinux: Linux bootloader software package.
Overview of the PXE Server
The PXE Server is divided up into three general areas:
Bootloader (grub)
TFTP contents (the kernel and initrd)
HTTP contents (the ISO image)
The rough directory structure on the TFTP and HTTP server will look like this:
/local/
    http/
        base_os_6.0.0/
            base_os_6.0.0.iso
    tftp/
        grub2/
            base_os_6.0.0/
                vmlinuz
                initrd
            grub.cfg
            bootx64.efi
The tftp-server (controlled by the xinetd service, with its configuration in /etc/xinetd.d/tftp) serves files from the /local/tftp directory when the system PXE boots. TFTP transfers the bootx64.efi file that is designated in the DHCP server’s dhcpd.conf file (see Configure DHCP). By default, once bootx64.efi has booted, it looks for a grub2/grub.cfg file containing the menu options for booting further. That config file looks for its kernel and initrd files relative to the TFTP directory.
The following steps will assume the DHCP and PXE servers are configured to use the above directory structure. The lab admin, or whoever is in charge of deploying the PXE environment, should change the directory names and structure to fit their infrastructure.
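The directory skeleton shown above can be created in one step. The following is a minimal sketch, not part of the original procedure: the function name make_pxe_tree and its root argument are illustrative, and the root defaults to the /local path used throughout this document.

```shell
# make_pxe_tree [ROOT]
# Creates the HTTP and TFTP directories used in this document under ROOT.
# ROOT defaults to /local; the function name is illustrative only.
make_pxe_tree() {
    root="${1:-/local}"
    mkdir -p "$root/http/base_os_6.0.0" \
             "$root/tftp/grub2/base_os_6.0.0"
}
```

Run it as a user with write access to the chosen root, for example: make_pxe_tree /local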
Configuring the HTTP File Directory and ISO Image
Place a copy of the BaseOS 6.0.0 ISO in /local/http/base_os_6.0.0/
Mount the BaseOS 6.0.0 ISO
Assume your mount point is “/mnt”:
sudo mount -o loop /local/http/base_os_6.0.0/base_os_6.0.0.iso /mnt
Copy the kernel and initrd from the ISO to the TFTP Directory
cp /mnt/casper/vmlinuz /local/tftp/grub2/base_os_6.0.0/
cp /mnt/casper/initrd /local/tftp/grub2/base_os_6.0.0/
Configure the TFTP directory
Download GRUB Packages
For x86_64:
Download the relevant grub packages with the correct architecture specified:
wget http://mirror.centos.org/centos/7/updates/x86_64/Packages/grub2-efi-x64-2.02-0.87.el7.centos.7.x86_64.rpm
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/shim-x64-15-8.el7.x86_64.rpm
Unpack the RPMs with the following commands:
rpm2cpio grub2-efi-x64-2.02-0.87.el7.centos.7.x86_64.rpm | cpio -idmv
rpm2cpio shim-x64-15-8.el7.x86_64.rpm | cpio -idmv
Copy the following binaries from the unpacked RPMs to /local/tftp/grub2/:
shim.efi
shimx64.efi
grubx64.efi
Make a copy of shimx64.efi in /local/tftp/grub2/ and name the copy bootx64.efi.
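The x86_64 copy steps above can be sketched as a small helper. This is an illustration under assumptions: the function name stage_efi_binaries is hypothetical, and the binaries are located with find because the exact directory layout inside the unpacked RPMs is not stated in this document.

```shell
# stage_efi_binaries SRC_DIR DEST_DIR
# Copies shim.efi, shimx64.efi and grubx64.efi from anywhere under SRC_DIR
# (the directory where the RPMs were unpacked) into DEST_DIR, then creates
# bootx64.efi as a copy of shimx64.efi. Returns 1 if a binary is missing.
stage_efi_binaries() {
    src="$1"
    dest="$2"
    for name in shim.efi shimx64.efi grubx64.efi; do
        found=$(find "$src" -type f -name "$name" | head -n 1)
        if [ -z "$found" ]; then
            echo "missing $name under $src" >&2
            return 1
        fi
        cp "$found" "$dest/"
    done
    cp "$dest/shimx64.efi" "$dest/bootx64.efi"
}
```

For example, if the RPMs were unpacked in the current directory: stage_efi_binaries . /local/tftp/grub2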
For arm64:
Download the relevant grub package with the correct architecture specified:
wget http://mirror.centos.org/altarch/7/updates/aarch64/Packages/grub2-efi-aa64-2.02-0.87.el7.centos.7.aarch64.rpm
Unpack the RPM with the following command:
rpm2cpio grub2-efi-aa64-2.02-0.87.el7.centos.7.aarch64.rpm | cpio -idmv
Copy the following binary from the unpacked RPM to /local/tftp/grub2/:
grubaa64.efi
Create the grub configuration file
The contents of the /local/tftp/grub2/grub.cfg file should look something like:
set default=0
set timeout=-1
insmod all_video
menuentry 'Install BaseOS 6.0.0' {
    linuxefi /grub2/base_os_6.0.0/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=http://<Server IP>/base_os_6.0.0/base_os_6.0.0.iso nvme-core.multipath=n nouveau.modeset=0
    initrdefi /grub2/base_os_6.0.0/initrd
}
Note
NOTE 1: The vmlinuz and initrd files are specified relative to the TFTP root - in this example, relative to /local/tftp/. The location of the ISO is relative to the HTTP root - in this example, /local/http/.
NOTE 2: The kernel boot parameters should match the contents of the corresponding ISO’s boot menu, found in /mnt/boot/grub/grub.cfg.
NOTE 3: In some cases, the transfer of the initrd can time out over TFTP. A workaround is to host the requisite files (initrd, vmlinuz, and the ISO) over HTTP instead, which makes the transfer faster and more reliable. In this example, we assume that the HTTP server serves files from /local/http. Copy these files to this location:
/local/
    http/
        base_os_6.0.0/
            base_os_6.0.0.iso
            vmlinuz
            initrd
    tftp/
        grub2/
            grub.cfg
            bootx64.efi
            grubaa64.efi
When configured this way, the grub.cfg file may look something like:
set default=0
set timeout=-1
insmod all_video
menuentry 'Install BaseOS 6.0.0' {
    linuxefi (http,<HTTP Server IP>)/base_os_6.0.0/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=http://<Server IP>/base_os_6.0.0/base_os_6.0.0.iso nvme-core.multipath=n nouveau.modeset=0
    initrdefi (http,<HTTP Server IP>)/base_os_6.0.0/initrd
}
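Both grub.cfg variants shown in this section differ only in where the kernel and initrd are fetched from, so the file can be generated from two inputs. The following is a minimal sketch; the function name render_grub_cfg and the 192.0.2.10 address in the usage line are placeholders, not values from this document.

```shell
# render_grub_cfg KERNEL_PREFIX ISO_URL
# Prints a grub.cfg like the examples in this section.
# KERNEL_PREFIX is the location of vmlinuz and initrd as GRUB sees it, e.g.
#   /grub2/base_os_6.0.0               (fetched over TFTP)
#   (http,192.0.2.10)/base_os_6.0.0    (fetched over HTTP)
# ISO_URL is the full URL of the ISO on the HTTP server.
render_grub_cfg() {
    prefix="$1"
    iso_url="$2"
    cat <<EOF
set default=0
set timeout=-1
insmod all_video
menuentry 'Install BaseOS 6.0.0' {
    linuxefi ${prefix}/vmlinuz fsck.mode=skip autoinstall ip=dhcp url=${iso_url} nvme-core.multipath=n nouveau.modeset=0
    initrdefi ${prefix}/initrd
}
EOF
}
```

For example: render_grub_cfg '(http,192.0.2.10)/base_os_6.0.0' 'http://192.0.2.10/base_os_6.0.0/base_os_6.0.0.iso' > /local/tftp/grub2/grub.cfg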
Useful parameters for configuring your system’s network interfaces:
ip=dhcp: tells the initramfs to automatically configure the system’s interfaces using DHCP.
- If only one interface is connected to the network, then this should be enough.
- If multiple interfaces are connected to the network, then it will go with the first one that receives a reply.
Parameters unique to the Base OS installer
rebuild-raid
Tells the installer to rebuild the data RAID if specified. Installs from the factory should always specify this, but it is optional otherwise.
md5checkdisc
Does not perform an installation when specified. It simply unpacks the ISO and checks that its contents match what is described in md5sum.txt.
offwhendone
Powers off the system after the installation. Otherwise, the system reboots when done. Factory installs specify this.
nooemconfig
Skips oemconfig and creates the default user “nvidia”, seeding an initial password. Used for touchless installs via PXE or for automatic VM creation/installation.
force-ai
Allows users to supply their own autoinstall file. If networking is set up, users can provide a URL. Otherwise, the file must be one that exists in the installer.
For example:
force-ai=/ai/dgx2-ai.yaml
force-ai=http://your-server.com/your-ai.yaml
Note
Refer to the note in the Autoinstall Customizations section for special formatting considerations when using custom autoinstall files along with the force-ai parameter.
Configure DHCP
The DHCP server is responsible for providing the IP address of the TFTP server and the name of the bootloader file in addition to the usual role of providing dynamic IP addresses. The address of the TFTP server is specified in the DHCP configuration file as “next-server”, and the bootloader file is specified as “filename”. The architecture option can be used to detect the architecture of the client system and used to serve the correct version of the grub bootloader (x86, ia32, arm, etc).
An example of the PXE portion of dhcpd.conf is:
next-server <TFTP_Server_IP>;
# x86 UEFI
if option arch = 00:06 {
    filename "grub2/bootx64.efi";
# x64 UEFI
} else if option arch = 00:07 {
    filename "grub2/bootx64.efi";
# ARM 64-bit UEFI
} else if option arch = 00:0b {
    filename "grub2/grubaa64.efi";
} else {
    filename "pxelinux.0";
}
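Since dnsmasq is named earlier in this document as the DHCP and TFTP server, the same architecture-based selection might be expressed as a dnsmasq fragment. The sketch below is not taken from the source: the function name render_dnsmasq_pxe and the tag names are arbitrary, only the x64 (7) and ARM64 (11) cases are mirrored, and a working setup would still need a dhcp-range line and interface settings appropriate to the site.

```shell
# render_dnsmasq_pxe [TFTP_ROOT]
# Prints a dnsmasq configuration fragment that mirrors the dhcpd.conf
# example above. TFTP_ROOT defaults to /local/tftp.
render_dnsmasq_pxe() {
    tftp_root="${1:-/local/tftp}"
    cat <<EOF
# Built-in TFTP server rooted at the directory used in this document
enable-tftp
tftp-root=${tftp_root}
# Tag clients by DHCP option 93 (client architecture)
dhcp-match=set:efi-x64,option:client-arch,7
dhcp-match=set:efi-arm64,option:client-arch,11
# Serve the matching bootloader, relative to tftp-root
dhcp-boot=tag:efi-x64,grub2/bootx64.efi
dhcp-boot=tag:efi-arm64,grub2/grubaa64.efi
EOF
}
```

The output could be written to a file under /etc/dnsmasq.d/ on distributions that include that directory in the dnsmasq configuration; that location is an assumption about the host, not something this document specifies.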
Optional: Configure CX-4/5/6/7 cards to PXE boot
DGX-Servers may also PXE boot using the MLNX CX-4/5/6/7 cards. If you are logged into the DGX-Server host OS and running DGX Base OS 4.4 or later, you can perform this section’s steps using the “/usr/sbin/mlnx_pxe_setup.bash” tool, which enables the UEFI PXE ROM of every MLNX InfiniBand device found.
Otherwise, proceed with the manual steps below.
Query UEFI PXE ROM state
To PXE boot from the MLNX CX-4/5/6/7 cards, you must first enable the UEFI PXE ROM of the card you wish to PXE boot from, because it is disabled by default. This must be performed from the DGX Server host OS itself; it cannot be done remotely.
DGX OS 6 provides the in-tree OFED stack by default, but users may optionally install MOFED on top. The commands used to query and enable the UEFI PXE ROM will differ based on whether you are using the in-tree OFED vs. MOFED stack.
MOFED Instructions
To determine the device name and current configurations of the MLNX CX cards, run “sudo mlxconfig query”:
user@dgx1server$ sudo mlxconfig query
Device #1:
----------
Device type: ConnectX4
Name: MCX455A-ECA_Ax
Description: ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; single-port QSFP28; PCIe3.0 x16; ROHS R6
Device: /dev/mst/mt4115_pciconf3
Configurations: Next Boot
...
...
EXP_ROM_UEFI_x86_ENABLE False(0)
...
...
In-tree OFED Instructions
To determine the device name and current configurations of the MLNX CX cards, run "sudo mstconfig query":
user@dgxserver:~$ sudo mstconfig query
Device #1:
----------
Device type: ConnectX7
Name: MCX755206AS-NEA_Ax
Description: NVIDIA ConnectX-7 VPI adapter card; 400Gb/s IB and 200GbE; dual-port QSFP; PCIe 5.0 x16 with x16 PCIe extension option; dual slot; secure boot; no crypto; tall bracket for Nvidia DGX storage
Device: /sys/bus/pci/devices/0000:b1:00.0/config
Configurations: Next Boot
...
...
EXP_ROM_UEFI_x86_ENABLE False(0)
...
...
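On systems with several cards, the query output is long. The following sketch reduces it to one line per device showing the ROM setting. The function name uefi_rom_state is illustrative, and the parsing assumes the output keeps the "Device:" and "EXP_ROM_UEFI_x86_ENABLE" line shapes shown in the examples above.

```shell
# uefi_rom_state
# Reads "mlxconfig query" or "mstconfig query" output on stdin and prints
# "<device> <EXP_ROM_UEFI_x86_ENABLE value>" for each device.
uefi_rom_state() {
    awk '
        /^[ \t]*Device:/            { dev = $2 }        # remember the current device
        /EXP_ROM_UEFI_x86_ENABLE/   { print dev, $2 }   # report its ROM setting
    '
}
```

For example: sudo mstconfig query | uefi_rom_state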
Enable UEFI PXE ROM
The "EXP_ROM_UEFI_x86_ENABLE" configuration must be set to True(1) for the MLNX CX card that you wish to PXE boot from, after which the system must be rebooted.
MOFED Instructions
user@dgx1server$ sudo mlxconfig -y -d /dev/mst/mt4115_pciconf3 set EXP_ROM_UEFI_x86_ENABLE=1
user@dgx1server$ sudo reboot
Upon reboot, confirm the configuration was set.
user@dgx1server$ sudo mlxconfig query
Device #1:
----------
Device type: ConnectX4
Name: MCX455A-ECA_Ax
Description: ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; single-port QSFP28; PCIe3.0 x16; ROHS R6
Device: /dev/mst/mt4115_pciconf3
Configurations: Next Boot
...
...
EXP_ROM_UEFI_x86_ENABLE True(1)
...
...
In-tree OFED Instructions
user@dgxserver:~$ sudo mstconfig -y -d b1:00.0 set EXP_ROM_UEFI_x86_ENABLE=1
user@dgxserver:~$ sudo reboot
Upon reboot, confirm the configuration was set.
user@dgxserver:~$ sudo mstconfig query
Device #1:
----------
Device type: ConnectX7
Name: MCX755206AS-NEA_Ax
Description: NVIDIA ConnectX-7 VPI adapter card; 400Gb/s IB and 200GbE; dual-port QSFP; PCIe 5.0 x16 with x16 PCIe extension option; dual slot; secure boot; no crypto; tall bracket for Nvidia DGX storage
Device: /sys/bus/pci/devices/0000:b1:00.0/config
Configurations: Next Boot
...
...
EXP_ROM_UEFI_x86_ENABLE True(1)
...
...
Optional: Configure the DGX-Server to PXE boot automatically
Add PXE to the top of the UEFI boot order
On systems that have a BMC, you can specify the DGX-Server to PXE boot by adding it to the top of the UEFI boot order. This may be done out-of-band via IPMI.
ipmitool -I lanplus -H <DGX_BMC_IP> -U <ADMIN> -P <PASSWORD> chassis bootdev pxe options=efiboot
Note
This only sets the DGX-Server to PXE boot; it does not specify the order of network devices to attempt PXE from. This is a limitation of our current UEFI and BMC FW. See the following section to specify the network device boot order.
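The ipmitool command above can be wrapped so that the boot flag is set and then read back. This is a sketch under assumptions: the function name bmc_set_pxe and the DRY_RUN variable are illustrative, and the read-back uses "chassis bootparam get 5", the ipmitool subcommand that displays the boot flags parameter. Note that the password is passed on the command line, as in the original command, so it is visible in the dry-run output and in the process list.

```shell
# bmc_set_pxe BMC_IP USER PASSWORD
# Sets the next boot device to PXE (EFI) and then queries the boot flags.
# With DRY_RUN set to a non-empty value, the commands are printed instead
# of being executed.
bmc_set_pxe() {
    bmc_ip="$1"
    user="$2"
    pass="$3"
    for action in "chassis bootdev pxe options=efiboot" "chassis bootparam get 5"; do
        # $action is left unquoted on purpose so it splits into separate arguments.
        ${DRY_RUN:+echo} ipmitool -I lanplus -H "$bmc_ip" -U "$user" -P "$pass" $action
    done
}
```

For example, to preview the commands without contacting the BMC: DRY_RUN=1 bmc_set_pxe <DGX_BMC_IP> <ADMIN> <PASSWORD>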
Configure network boot priorities
The UEFI Network Drive BBS Priorities setting allows you to specify the order of network devices to PXE boot from. To modify it, reboot your DGX-Server and enter the UEFI boot selection menu by pressing “F2” or “Del” when you see the splash screen. Navigate to the “Boot” menu, and then scroll down to “UEFI NETWORK Drive BBS Priorities”.
Configure the order of devices to attempt network boots from using this menu.
Save and Exit.
Once you’ve finished ordering the network boot priorities, save your changes and reset.
Make the DGX-Server PXE boot
Automated PXE Boot Process
If you’ve followed the optional steps above, then you can simply reboot, and UEFI will attempt PXE boot using the devices in order specified in the Network Drive BBS Priorities list.
Manual PXE Boot Process
If you want to manually trigger the PXE boot, then reboot your DGX-Server and enter the UEFI boot selection menu by pressing “F2” or “Del” when you see the splash screen.
Navigate to the “Save & Exit” menu, scroll down to the Boot Override section, and choose the appropriate network port to boot from. The MLNX cards will only appear if you enabled the UEFI PXE ROM of that particular card.
Alternatively, you can press “F12” at the SBIOS splash screen, and SBIOS will iterate through each NIC and try PXE on each one. The order of the NICs attempted is specified by the Network Drive BBS Priorities.
Other IPMI boot options
For more information about specifying boot order via IPMI, see the “ipmitool” man page, and look at the “chassis” command, and “bootdev” subcommand: https://linux.die.net/man/1/ipmitool
For more information about the IPMI specification, refer to Intelligent Platform Management Interface Specification v2.0 rev. 1.1.
Autoinstall Customizations
The Base OS 6.x installer has undergone major changes compared to the Base OS 5.x installer. The Base OS 6.x installer now uses subiquity, which supports autoinstall instead of curtin.
Autoinstall and curtin both serve similar purposes but have some syntactic differences – be aware of these when porting old curtin files. There are many autoinstall files that users can reference inside the Base OS 6.x ISO; these are contained in:
casper/ubuntu-server-minimal.ubuntu-server.installer.kernel.nvidia.squashfs
Users can mount the ISO and then mount this squashfs to view the many autoinstall files that are packed within:
mkdir -p /tmp/iso_mnt
mkdir -p /tmp/squash_mnt
sudo mount /path/to/DGXOS-<version>-<date>.iso /tmp/iso_mnt/
sudo mount /tmp/iso_mnt/casper/ubuntu-server-minimal.ubuntu-server.installer.kernel.nvidia.squashfs /tmp/squash_mnt/
find /tmp/squash_mnt/ai/ -name '*.yaml'
For some deployments, users may want to use their own autoinstall files. This section will describe some sections contained in the built-in autoinstall files as well as how to perform some common customizations.
Note
The installer expects a unified autoinstall file rather than the typical split vendor/user/meta-data format. This means that the user-supplied autoinstall file will need to account for some formatting differences: namely, the autoinstall: keyword needs to be dropped and the indentation adjusted accordingly:
#
# typical user-data file
#
#cloud-config
autoinstall:
  version: 1
  identity:
    realname: 'DGX User'
    username: dgxuser
    password: '$6$g3vXaGj.MQpP/inN$l6.JtAueRAfMtQweK7qASjxXiEX8Vue3CvRcwON81Rt9BJmlEQKtnfOVSnCqHrTsy88PbMDDHq6k.iM6PWfHr1'
#
# unified autoinstall file
#
version: 1
identity:
  realname: 'DGX User'
  username: dgxuser
  password: '$6$g3vXaGj.MQpP/inN$l6.JtAueRAfMtQweK7qASjxXiEX8Vue3CvRcwON81Rt9BJmlEQKtnfOVSnCqHrTsy88PbMDDHq6k.iM6PWfHr1'
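The conversion described in the note (drop the autoinstall: keyword and shift the remaining lines left) can be sketched with sed. This is an illustration only: the function name to_unified_autoinstall is hypothetical, and it assumes the input is indented with exactly two spaces per level under a top-level autoinstall: key. Review the result by hand before using it.

```shell
# to_unified_autoinstall
# Reads a typical user-data file on stdin and prints a unified autoinstall
# file: the "#cloud-config" header and the "autoinstall:" line are removed,
# and one level (two spaces) of indentation is stripped from every line.
to_unified_autoinstall() {
    sed -e '/^#cloud-config[[:space:]]*$/d' \
        -e '/^autoinstall:[[:space:]]*$/d' \
        -e 's/^  //'
}
```

For example: to_unified_autoinstall < user-data > your-ai.yaml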
NVIDIA-Specific Autoinstall Variables
The autoinstall files contained in the ISO are platform-specific, and serve as a good starting point for custom versions.
Many of them contain variables, prefixed with CHANGE_, which will be substituted by the installer:
CHANGE_STORAGE_REG
This gets removed and uncommented when the boot parameter “ai-encrypt-root” is not present. Uncommenting this stanza results in the standard disk partitioning scheme without LUKS encryption.
CHANGE_STORAGE_ENC
This gets removed and uncommented when the boot parameter “ai-encrypt-root” is present. Uncommenting this stanza results in an encrypted root partition.
CHANGE_BOOT_DISK_NAME_x
This is a disk name, without the “/dev” prefix. There may be multiple ones (e.g. CHANGE_BOOT_DISK_NAME_1 and CHANGE_BOOT_DISK_NAME_2) for platforms that expect a RAIDed boot device, as is the case for DGX-2 and DGX A100.
Note
The installer will find the appropriate disk name to substitute here. Alternatively, the “force-bootdisk” parameter can be used to specify the disk name(s).
CHANGE_BOOT_DISK_PATH_x
This is the same as the CHANGE_BOOT_DISK_NAME_x variable above, except that it is prefixed with “/dev/”.
CHANGE_DESC_PLATFORM
The installer will substitute this with a platform-specific descriptive name.
CHANGE_SERIAL_NUMBER
The installer will substitute this with the serial number reported by dmidecode.
CHANGE_INSTALL_PKGS
The installer will substitute this value with a list of packages specific to the platform. The lists of packages are specified by the *-pkgs files in the squashfs.
Note
The list of packages here will include oem-config and its dependencies. When users supply their own autoinstall file, they’ll generally also want to forego the additional setup steps provided by oem-config and have these steps performed during autoinstall instead. For this use case we recommend adding, in the late-commands section, a step to remove the oem-config and ubiquity packages:
late-commands:
  ...
  - curtin in-target -- apt-get purge -y oem-config ubiquity
CHANGE_REBUILD_RAID
This gets replaced with either “true” or “false” based on whether or not the “rebuild-raid” boot parameter is present.
CHANGE_IPMISOL
This gets replaced with either “true” or “false” based on whether or not the “ai-encrypt-root” boot parameter is present. When we set the system up with encryption, we also undo the IPMI serial-over-LAN configuration to ensure that the LUKS passphrase prompt shows up on the console rather than the serial-over-LAN interface.
Attention
While it is possible to replace these values on your own, we strongly recommend letting the installer handle this.
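When starting from one of the built-in autoinstall files, it can help to see which placeholders it still contains before handing it to the installer. The following is a minimal sketch; the function name list_change_placeholders is illustrative, and it assumes the placeholders appear literally in the file as CHANGE_ followed by uppercase letters, digits, and underscores.

```shell
# list_change_placeholders FILE
# Prints each distinct CHANGE_* placeholder found in FILE, one per line.
list_change_placeholders() {
    grep -o 'CHANGE_[A-Z0-9_]*' "$1" | sort -u
}
```

For example: list_change_placeholders /tmp/squash_mnt/ai/dgx2-ai.yaml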
Common Customizations
In this section, we will describe some common customizations that may be useful in more custom deployments.
Network Configuration
To configure the network at install time, you can add a “network” section to your autoinstall file. In this example we will create a netplan configuration file that sets the enp1s0f0 interface to use DHCP:
network:
  version: 2
  ethernets:
    enp1s0f0:
      dhcp4: yes
Creating a User
To create a user at install time, you can add an “identity” section to your autoinstall file. In this example, we set the system’s hostname to “dgx” and create a user with the name/password of nvidia/nvidia.
# To generate an encrypted password:
# printf '<plaintext_password>' | openssl passwd -6 -stdin
#
# For example:
# printf 'nvidia' | openssl passwd -6 -stdin
identity:
  hostname: dgx
  password: $6$8fqF54QDoaLMtDXJ$J02iNH1xW9hHtzH6APpUX4X4HkRx2xY2ZKy9DQpGOQhW7OOuTk3DwHr9FnAAh1JIyqn3L277Jy9MEzW4MyVsV0
  username: nvidia
There are many more examples documented in the Ubuntu autoinstall reference.