Category Creation#
Categories are typically defined per node type, and the settings of a category are configured for that particular type of node. This assumes that the hardware configuration is the same for every node in the category (in other words, all the nodes of a particular type should have the same make, model, and configuration). While mixing different types of hardware in a single category is possible, it is much simpler not to do so.
Each major device type in the control plane is given a category. The category-level settings apply to all nodes within that category. Each category is also assigned a software image that is used to provision and boot all the nodes of that category.
The categories that need to be defined are:
slogin
k8s-system-admin
k8s-system-user
dgx-gb200/gb300-k8s and dgx-gb200/gb300-slurm
Note
The dgx-gb200 category is created by the bcm-post-install module. If that module is not being used (OEMs), the category needs to be defined manually.
For each category the following tasks need to be completed:
Add <category name>.
cmsh -c "category; add <category name>; commit"
Set the software image.
cmsh -c "category; use <category name>; set softwareimage <category name>-image; commit"
Set the management network.
This is typically the network that the nodes in this category are provisioned from.
cmsh -c "category; use <category name>; set managementnetwork internalnet; commit"
Note
For both control planes and the dgx-gb200/gb300 categories, the management network is set to internalnet by default.
If bcm-netautogen is used, or if a separate dgxnet is created, the management network should be set to match (dgxnet) if that is the network that provisions the category.
Ensure that this setting is cleared at the node level so that the nodes inherit it from the category.
Add BMC login credentials to the category. This works if all nodes in the category have had their BMC username and password set to the same values. If not, specify the credentials at the node level for the control plane nodes.
cmsh -c "category use <category name>; bmcsettings; set username <bmc username>; set userid <bmc user id>; set password <bmc password>; commit"
Create and assign a disksetup.xml.
cmsh; category use <category>; set disksetup <double tab to see options>; commit
Note
Press Enter to input the XML manually (or copy and paste it), or use set disksetup <disksetup file name> if the file has already been created.
The disk setup is unique to each control plane node type because each type has different requirements. This is covered in the next section.
For any categories that will provision aarch64/ARM architecture nodes, the boot loader must be changed from syslinux to GRUB.
cmsh -c "category use <category name>; set bootloader grub; commit"
or
cmsh; category; use <aarch64/ARM category>; set bootloader grub; commit
For the dgx-gb200/gb300 category, ensure that the BMC settings are defined so that OOB power control can be established via BCM 11 itself. The firmware management mode also needs to be set for the firmware update process to work properly through BCM.
cmsh -c "category use <gb200 category>; bmcsettings; set firmwaremanagemode GB200; set password 0penBmc; set privilege ADMINISTRATOR; set userid 0; set username root; commit"
or
cmsh; category use <gb200 category>; bmcsettings; set firmwaremanagemode GB200; set password <default bmc password>; set privilege ADMINISTRATOR; set userid 0; set username <default bmc username>; commit
Note
For GB300 categories, set the firmwaremanagemode to GB200 at this time. This is a temporary requirement and will be updated to a dedicated GB300 mode in a future version of BCM.
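Because the same steps repeat for every category, the command lines can be generated and reviewed before they are run. The following is a minimal Python sketch and not part of BCM: the helper name category_commands is hypothetical, the image name follows the <category name>-image pattern from the example above, and treating slogin as an ARM category is only an illustration. The script prints the cmsh -c lines; it does not execute them.

```python
import shlex

# Control plane categories listed in this section; adjust for the site.
CATEGORIES = ["slogin", "k8s-system-admin", "k8s-system-user"]


def category_commands(name, management_network="internalnet", aarch64=False):
    """Return the cmsh one-liners from the steps above for one category.

    Hypothetical helper: it only builds strings and assumes the software
    image is named "<category name>-image", as in the example above.
    """
    steps = [
        f"category; add {name}; commit",
        f"category; use {name}; set softwareimage {name}-image; commit",
        f"category; use {name}; set managementnetwork {management_network}; commit",
    ]
    if aarch64:
        # aarch64/ARM categories boot with GRUB instead of syslinux.
        steps.append(f"category use {name}; set bootloader grub; commit")
    # shlex.quote wraps each cmsh script in single quotes for the shell.
    return ["cmsh -c " + shlex.quote(step) for step in steps]


if __name__ == "__main__":
    for category in CATEGORIES:
        # slogin is marked as ARM here purely as an example.
        for line in category_commands(category, aarch64=(category == "slogin")):
            print(line)
```

BMC credentials are deliberately left out of the sketch so that passwords do not end up in generated output.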
Control Plane Disk Setup#
Each control plane category can have a specific disk setup depending on the server's hardware model. It is assumed that all the servers in a particular category are of the same make and model. Since the control nodes have varying hardware topologies, some information about PCIe addressing and topology needs to be gathered first. This is covered in the Hardware Information Gathering section of the Appendix. The disksetup configurations below are provided for each category, assuming the reference architecture models are used.
Note
If a non-reference server is being used, edit the example(s) below to reflect the drive count and PCI Express addresses of the drives. Keep the partitioning scheme itself intact, because correct partitioning is crucial to the installation of NVIDIA Mission Control Software.
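When a disk setup file is edited by hand, one easy mistake is a RAID member that no longer refers to any partition. The following is a minimal Python sketch of a consistency check, assuming only the element names used in the examples in this section (partition, raid, member); the function name undefined_raid_members is hypothetical, and the sample XML is a deliberately incomplete fragment rather than a usable configuration.

```python
import xml.etree.ElementTree as ET

# Shortened fragment for illustration only: slash2 is intentionally undefined.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<diskSetup>
  <device>
    <blockdev>/dev/disk/by-path/pci-0014:01:00.0-nvme-1</blockdev>
    <partition id="slash1"><size>max</size><type>linux raid</type></partition>
  </device>
  <raid id="slashraid">
    <member>slash1</member>
    <member>slash2</member>
    <level>1</level>
  </raid>
</diskSetup>
"""


def undefined_raid_members(xml_text):
    """Return (raid id, member) pairs whose member matches no partition id."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    defined = {part.get("id") for part in root.iter("partition")}
    missing = []
    for raid in root.iter("raid"):
        for member in raid.iter("member"):
            name = (member.text or "").strip()
            if name not in defined:
                missing.append((raid.get("id"), name))
    return missing


if __name__ == "__main__":
    for raid_id, member in undefined_raid_members(SAMPLE):
        print(f"raid {raid_id}: member {member} is not defined as a partition")
```

An empty result only means that the references are consistent; it does not confirm that the PCIe addresses match the hardware.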
slogin disksetup file#
Create and add a slogin disk setup file in /cm/local/apps/cmd/etc/htdocs/disk-setup/slogin-disksetup.xml.
Note
For non-reference servers, determine the equivalent PCIe address for each drive and update the disksetup file accordingly. The PCIe addresses shown in the examples are specific to the reference hardware models and may not match all server configurations.
Reference: Disk Setup for slogin nodes (based on Supermicro ARS-221GL-FNB-NC24B-DC Model).
<?xml version="1.0" encoding="UTF-8"?>
<diskSetup>
  <device>
    <blockdev>/dev/disk/by-path/pci-0014:01:00.0-nvme-1</blockdev>
    <partition id="boot1" partitiontype="esp">
      <size>512M</size>
      <type>linux</type>
      <filesystem>fat</filesystem>
      <mountPoint>/boot/efi</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="slash1">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0015:01:00.0-nvme-1</blockdev>
    <partition id="boot2" partitiontype="esp">
      <size>512M</size>
      <type>linux</type>
      <filesystem>fat</filesystem>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="slash2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0000:01:00.0-nvme-1</blockdev>
    <partition id="var1">
      <size>1500G</size>
      <type>linux raid</type>
    </partition>
    <partition id="tmp1">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0001:01:00.0-nvme-1</blockdev>
    <partition id="var2">
      <size>1500G</size>
      <type>linux raid</type>
    </partition>
    <partition id="tmp2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <raid id="slashraid">
    <member>slash1</member>
    <member>slash2</member>
    <level>1</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="varraid">
    <member>var1</member>
    <member>var2</member>
    <level>1</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/var</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="tmpraid">
    <member>tmp1</member>
    <member>tmp2</member>
    <level>0</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/tmp</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
</diskSetup>
Reference: slogin disk layout after provisioning
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1     259:0    0     7T  0 disk
├─nvme0n1p1 259:1    0   1.5T  0 part
│ └─md1       9:1    0   1.5T  0 raid1 /var
└─nvme0n1p2 259:2    0   5.5T  0 part
  └─md2       9:2    0    11T  0 raid0 /tmp
nvme1n1     259:3    0     7T  0 disk
├─nvme1n1p1 259:4    0   1.5T  0 part
│ └─md1       9:1    0   1.5T  0 raid1 /var
└─nvme1n1p2 259:5    0   5.5T  0 part
  └─md2       9:2    0    11T  0 raid0 /tmp
nvme3n1     259:6    0 894.3G  0 disk
├─nvme3n1p1 259:14   0   512M  0 part
└─nvme3n1p2 259:15   0 893.7G  0 part
  └─md0       9:0    0 893.7G  0 raid1 /
nvme2n1     259:7    0 894.3G  0 disk
├─nvme2n1p1 259:12   0   512M  0 part  /boot/efi
└─nvme2n1p2 259:13   0 893.7G  0 part
  └─md0       9:0    0 893.7G  0 raid1 /
root@a03-p1-aps-arm-01:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           240G   62M  240G   1% /run
/dev/md0        879G  7.1G  827G   1% /
none            240G     0  240G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
efivarfs        384K   21K  364K   6% /sys/firmware/efi/efivars
/dev/nvme2n1p1  511M  4.0K  511M   1% /boot/efi
/dev/md1        1.5T  4.2G  1.4T   1% /var
/dev/md2         11T  1.9M   11T   1% /tmp
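The array sizes in the lsblk output above follow directly from the RAID levels in the disk setup file. The short Python sketch below reproduces that arithmetic as a sanity check for other drive sizes. The helper name md_size is hypothetical, the sizes are the rounded values that lsblk prints, and metadata overhead is ignored.

```python
def md_size(level, member_sizes_tb):
    """Approximate usable size of an md array from its member sizes.

    RAID 0 stripes across members, so the capacities add up. RAID 1
    mirrors, so the capacity is that of the smallest member.
    """
    if level == 0:
        return sum(member_sizes_tb)
    if level == 1:
        return min(member_sizes_tb)
    raise ValueError(f"unsupported RAID level: {level}")


disk_tb = 7.0               # each data drive, as shown by lsblk
var_tb = 1.5                # the 1500G partition, rounded as lsblk shows it
tmp_tb = disk_tb - var_tb   # the "max" partition takes the remainder

print(md_size(1, [var_tb, var_tb]))   # mirrored /var array
print(md_size(0, [tmp_tb, tmp_tb]))   # striped /tmp array
```

With these inputs the mirrored array comes out at 1.5 and the striped array at 11.0, which matches the md1 and md2 rows above.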
Set the disksetup file in the category.
cmsh; category use slogin; set disksetup slogin-disksetup.xml; commit
k8s-system-admin disksetup file#
Create and add a k8s-system-admin disk setup file in /cm/local/apps/cmd/etc/htdocs/disk-setup/k8s-system-admin-disksetup.xml.
Reference: Disk setup for k8s-system-admin nodes (based on Supermicro SYS-221GE-FNB-NC24B-DC model)
<?xml version="1.0" encoding="UTF-8"?>
<diskSetup>
  <device>
    <blockdev>/dev/disk/by-path/pci-0000:04:00.0-nvme-1</blockdev>
    <partition id="boot2" partitiontype="esp">
      <size>512M</size>
      <type>linux</type>
      <filesystem>fat</filesystem>
      <mountPoint>/boot/efi</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="slash2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0000:3d:00.0-nvme-1</blockdev>
    <partition id="shoreline1">
      <size>1500G</size>
      <type>linux raid</type>
    </partition>
    <partition id="raid1">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0000:3e:00.0-nvme-1</blockdev>
    <partition id="shoreline2">
      <size>1500G</size>
      <type>linux raid</type>
    </partition>
    <partition id="raid2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <raid id="slashraid">
    <member>slash1</member>
    <member>slash2</member>
    <level>1</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="shorelineraid">
    <member>shoreline1</member>
    <member>shoreline2</member>
    <level>1</level>
  </raid>
  <raid id="localraid">
    <member>raid1</member>
    <member>raid2</member>
    <level>0</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/local</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
</diskSetup>
Reference: k8s-system-admin disk layout after provisioning
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0         7:0    0   1.5T  0 loop
nvme0n1     259:0    0     7T  0 disk
├─nvme0n1p1 259:1    0   1.5T  0 part
│ └─md1       9:1    0   1.5T  0 raid1
└─nvme0n1p2 259:2    0   5.5T  0 part
  └─md2       9:2    0    11T  0 raid0 /local
nvme3n1     259:6    0 894.3G  0 disk
├─nvme3n1p1 259:8    0   512M  0 part
└─nvme3n1p2 259:9    0 893.7G  0 part
  └─md0       9:0    0 893.7G  0 raid1 /
nvme2n1     259:7    0 894.3G  0 disk
├─nvme2n1p1 259:10   0   512M  0 part  /boot/efi
└─nvme2n1p2 259:11   0 893.7G  0 part
  └─md0       9:0    0 893.7G  0 raid1 /
# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           240G  114M  240G   1% /run
/dev/md0        879G   26G  809G   4% /
tmpfs           240G     0  240G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
efivarfs        384K   21K  364K   6% /sys/firmware/efi/efivars
/dev/nvme2n1p1  511M  4.0K  511M   1% /boot/efi
/dev/md2         11T   28K   11T   1% /local
Note
/dev/md1 is an unformatted RAID device used by NMC Autonomous Hardware Recovery (AHR).
Set the disksetup file in the category.
cmsh; category use k8s-system-admin; set disksetup k8s-system-admin-disksetup.xml; commit
k8s-system-user disksetup file#
Create and add a k8s-system-user disk setup file in /cm/local/apps/cmd/etc/htdocs/disk-setup/k8s-system-user-disksetup.xml.
Reference: Disk setup for k8s-system-user nodes (based on Supermicro ARS-221GL-FNB-NC24B-DC Model)
<?xml version="1.0" encoding="UTF-8"?>
<diskSetup>
  <device>
    <blockdev>/dev/disk/by-path/pci-0014:01:00.0-nvme-1</blockdev>
    <partition id="boot1" partitiontype="esp">
      <size>512M</size>
      <type>linux</type>
      <filesystem>fat</filesystem>
      <mountPoint>/boot/efi</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="slash1">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0015:01:00.0-nvme-1</blockdev>
    <partition id="boot2" partitiontype="esp">
      <size>512M</size>
      <type>linux</type>
      <filesystem>fat</filesystem>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="slash2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0000:01:00.0-nvme-1</blockdev>
    <partition id="var1">
      <size>1500G</size>
      <type>linux raid</type>
    </partition>
    <partition id="tmp1">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-path/pci-0001:01:00.0-nvme-1</blockdev>
    <partition id="var2">
      <size>1500G</size>
      <type>linux raid</type>
    </partition>
    <partition id="tmp2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <raid id="slashraid">
    <member>slash1</member>
    <member>slash2</member>
    <level>1</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="varraid">
    <member>var1</member>
    <member>var2</member>
    <level>1</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/var</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="tmpraid">
    <member>tmp1</member>
    <member>tmp2</member>
    <level>0</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/tmp</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
</diskSetup>
Reference: k8s-system-user disk layout after provisioning
lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1     259:0    0     7T  0 disk
├─nvme0n1p1 259:1    0   1.5T  0 part
│ └─md1       9:1    0   1.5T  0 raid1 /var
└─nvme0n1p2 259:2    0   5.5T  0 part
  └─md2       9:2    0    11T  0 raid0 /tmp
nvme1n1     259:3    0     7T  0 disk
├─nvme1n1p1 259:4    0   1.5T  0 part
│ └─md1       9:1    0   1.5T  0 raid1 /var
└─nvme1n1p2 259:5    0   5.5T  0 part
  └─md2       9:2    0    11T  0 raid0 /tmp
nvme3n1     259:6    0 894.3G  0 disk
├─nvme3n1p1 259:14   0   512M  0 part
└─nvme3n1p2 259:15   0 893.7G  0 part
  └─md0       9:0    0 893.7G  0 raid1 /
nvme2n1     259:7    0 894.3G  0 disk
├─nvme2n1p1 259:12   0   512M  0 part  /boot/efi
└─nvme2n1p2 259:13   0 893.7G  0 part
  └─md0       9:0    0 893.7G  0 raid1 /
# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           240G   62M  240G   1% /run
/dev/md0        879G  7.1G  827G   1% /
none            240G     0  240G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
efivarfs        384K   21K  364K   6% /sys/firmware/efi/efivars
/dev/nvme2n1p1  511M  4.0K  511M   1% /boot/efi
/dev/md1        1.5T  4.2G  1.4T   1% /var
/dev/md2         11T  1.9M   11T   1% /tmp
Set the disksetup file in the category.
cmsh; category use k8s-system-user; set disksetup k8s-system-user-disksetup.xml; commit
DGX GB200/GB300 Disk Setup#
The following post-install process ensures consistent naming of the NVMe/disk drive devices.
Note
For systems that have two M.2 NVMe drives, NVMe multipath is disabled in these instructions. For DGX GB200 systems specifically, this does not have to be done because the compute tray only has a single M.2 OS drive. See the UDEV Rules KB article for more details.
For both DGX GB200 and GB300, follow the steps below. The only difference is the rules file, which uses different PCIe addresses for each system. The disk setup configuration is the same for both. For OEM GB200/GB300 systems, ensure that the proper research has been done to determine the correct PCIe addresses for each disk, and modify the rules file and disk setup configuration accordingly.
Create the rules file
Create the appropriate rules file for your system:
For DGX GB200 (save as 60-persistent-storage-gb200.rules):
########## persistent nvme rules by HW address (GB200) ##########
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0015:01:00.0", SYMLINK+="disk/by-id/osdisk-1"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0006:07:00.0", SYMLINK+="disk/by-id/raiddisk-1"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0006:09:00.0", SYMLINK+="disk/by-id/raiddisk-2"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0016:07:00.0", SYMLINK+="disk/by-id/raiddisk-3"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0016:09:00.0", SYMLINK+="disk/by-id/raiddisk-4"
########## persistent nvme rules by HW address ##########
For DGX GB300 (save as 60-persistent-storage-gb300.rules):
########## persistent nvme rules by HW address (GB300) ##########
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0015:01:00.0", SYMLINK+="disk/by-id/osdisk-1"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0012:07:00.0", SYMLINK+="disk/by-id/raiddisk-1"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0010:07:00.0", SYMLINK+="disk/by-id/raiddisk-2"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:07:00.0", SYMLINK+="disk/by-id/raiddisk-3"
KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0002:07:00.0", SYMLINK+="disk/by-id/raiddisk-4"
########## persistent nvme rules by HW address ##########
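For OEM systems with different PCIe addresses, the rules can be rendered from a small mapping instead of being edited by hand. The following is a minimal Python sketch that assumes the same rule format as the files above. The function name render_rules and the dictionary layout are hypothetical, and the addresses shown are the DGX GB200 values from this section.

```python
# Symlink name -> PCIe address, copied from the GB200 rules above.
GB200_DISKS = {
    "osdisk-1": "0015:01:00.0",
    "raiddisk-1": "0006:07:00.0",
    "raiddisk-2": "0006:09:00.0",
    "raiddisk-3": "0016:07:00.0",
    "raiddisk-4": "0016:09:00.0",
}


def render_rules(disks):
    """Render one udev rule per disk in the format used in this section."""
    if len(set(disks.values())) != len(disks):
        # Two symlinks pointing at one address would hide a missing drive.
        raise ValueError("each PCIe address may only be used once")
    lines = [
        f'KERNEL=="nvme[0-9]n[0-9]", ATTRS{{address}}=="{address}", '
        f'SYMLINK+="disk/by-id/{name}"'
        for name, address in disks.items()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_rules(GB200_DISKS), end="")
```

Redirecting the output to 60-persistent-storage-gb200.rules gives the rule lines of the file shown above, without the comment banners.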
Add the rules file to the node-installer images:
# For GB200
cp 60-persistent-storage-gb200.rules /cm/node-installer/usr/lib/udev/rules.d/60-persistent-storage-gb200.rules
cp 60-persistent-storage-gb200.rules /cm/node-installer-ubuntu2404-aarch64/usr/lib/udev/rules.d/60-persistent-storage-gb200.rules
# For GB300
cp 60-persistent-storage-gb300.rules /cm/node-installer/usr/lib/udev/rules.d/60-persistent-storage-gb300.rules
cp 60-persistent-storage-gb300.rules /cm/node-installer-ubuntu2404-aarch64/usr/lib/udev/rules.d/60-persistent-storage-gb300.rules
Note
If the head node is C2/ARM, copying the rules file to /cm/node-installer is sufficient.
Copy the rules file to the OS image:
# For GB200
cp 60-persistent-storage-gb200.rules /cm/images/<dgxos image>/usr/lib/udev/rules.d/60-persistent-storage-gb200.rules
# For GB300
cp 60-persistent-storage-gb300.rules /cm/images/<dgxos image>/usr/lib/udev/rules.d/60-persistent-storage-gb300.rules
Create and add the disk setup file (same for both GB200 and GB300) as gb200-disksetup.xml or gb300-disksetup.xml in the directory /cm/local/apps/cmd/etc/htdocs/disk-setup:
Disk setup for DGX GB200 and GB300
<?xml version="1.0" encoding="UTF-8"?>
<diskSetup>
  <device>
    <blockdev>/dev/disk/by-id/osdisk-1</blockdev>
    <partition id="efi" partitiontype="esp">
      <size>100M</size>
      <type>linux</type>
      <filesystem>fat</filesystem>
      <mountPoint>/boot/efi</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="boot1">
      <size>4G</size>
      <type>linux</type>
      <filesystem>ext2</filesystem>
      <mountPoint>/boot</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
    <partition id="slash1">
      <size>max</size>
      <type>linux</type>
      <filesystem>ext4</filesystem>
      <mountPoint>/</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-id/raiddisk-1</blockdev>
    <partition id="raid1">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-id/raiddisk-2</blockdev>
    <partition id="raid2">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-id/raiddisk-3</blockdev>
    <partition id="raid3">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/disk/by-id/raiddisk-4</blockdev>
    <partition id="raid4">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <raid id="scratch_local">
    <member>raid1</member>
    <member>raid2</member>
    <member>raid3</member>
    <member>raid4</member>
    <level>0</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/raid</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
</diskSetup>
Note
Both DGX GB200 and GB300 systems use the disk-by-id method for disk setup, consistent with the referenced KB article.
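Because the disk setup refers to the drives only through the by-id symlinks, every blockdev path in it has to be created by a rule in the matching rules file. The following is a minimal Python sketch of that cross-check. The function names are hypothetical, and the RULES and DISKSETUP strings are shortened fragments for illustration, with raiddisk-2 deliberately left without a rule.

```python
import re
import xml.etree.ElementTree as ET


def rule_symlinks(rules_text):
    """Return the /dev paths created by SYMLINK+= assignments in the rules."""
    links = re.findall(r'SYMLINK\+="([^"]+)"', rules_text)
    # udev symlink names are relative to /dev.
    return {"/dev/" + link for link in links}


def unmatched_blockdevs(disksetup_xml, rules_text):
    """Return blockdev paths from the disk setup that no rule creates."""
    root = ET.fromstring(disksetup_xml)
    wanted = [(dev.text or "").strip() for dev in root.iter("blockdev")]
    available = rule_symlinks(rules_text)
    return [path for path in wanted if path not in available]


# Shortened fragments for illustration only.
RULES = (
    'KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0015:01:00.0", '
    'SYMLINK+="disk/by-id/osdisk-1"\n'
    'KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0006:07:00.0", '
    'SYMLINK+="disk/by-id/raiddisk-1"\n'
)

DISKSETUP = """<diskSetup>
  <device><blockdev>/dev/disk/by-id/osdisk-1</blockdev></device>
  <device><blockdev>/dev/disk/by-id/raiddisk-1</blockdev></device>
  <device><blockdev>/dev/disk/by-id/raiddisk-2</blockdev></device>
</diskSetup>"""

if __name__ == "__main__":
    for path in unmatched_blockdevs(DISKSETUP, RULES):
        print(f"no udev rule creates {path}")
```

To check real files, read the rules file and the disk setup file from disk and pass their contents to unmatched_blockdevs; an empty list means that every blockdev has a matching rule.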
Set the disksetup file in the appropriate category.
For GB200
cmsh; category use dgx-gb200; set disksetup gb200-disksetup.xml; commit
For GB300
cmsh; category use dgx-gb300; set disksetup gb300-disksetup.xml; commit