Category Creation#

Categories are typically organized by node type, and each category's settings apply to every node of that type. This assumes that the hardware configuration is identical across the category: all nodes of a given type should have the same make, model, and configuration. While mixing different hardware in a single category is possible, it is much simpler not to do so.

Each major device type in the control plane is given a category. Category-level settings apply to all nodes within that category. Each category is also assigned a software image from which all of its nodes are provisioned and booted.

The categories that need to be defined are:

  • slogin

  • k8s-system-admin

  • k8s-system-user

  • dgx-gb200/gb300-k8s and dgx-gb200/gb300-slurm

Note

The dgx-gb200 category is created by the bcm-post-install module. If that module is not being used (for example, on OEM systems), the category must be defined manually.

For each category the following tasks need to be completed:

  1. Add <category name>.

cmsh -c "category; add <category name>; commit"
  1. Set the software image.

cmsh -c "category; use <category name>; set softwareimage <category name>-image; commit"
  1. Set the management network.

This is typically the network that the nodes in this category are provisioned from.

cmsh -c "category; use <category name>; set managementnetwork internalnet; commit"

Note

  • For both the control plane categories and the dgx-gb200/gb300 categories, the management network is set to internalnet by default.

  • If bcm-netautogen is used, or if a separate dgxnet is created, set the management network to dgxnet if that is the network that provisions the category.

  • Ensure that the management network is cleared at the node level so that nodes inherit this property from the category.

  1. Add BMC login credentials to the category. This works when all nodes in the category have had their BMC username and password set to the same value. If they have not, specify the credentials at the node level for the control plane nodes.

cmsh -c "category use <category name>; bmcsettings; set username <bmc username>; set userid <bmc user id>; set password <bmc password>; commit"
  1. Create and assign a disksetup.xml.

cmsh; category use <category>; set disksetup <double tab to see options>; commit

Note

Press Enter to type or paste the XML manually, or use set disksetup <disksetup file name> if the file has already been created.

The disk setup is unique to each control plane node type because the node types have different requirements. This is covered in the next section.

  1. For any categories that will provision aarch64/ARM architecture nodes, the boot loader must be changed from syslinux to GRUB.

    cmsh -c "category use <category name>; set bootloader grub; commit"
    

    or

    cmsh; category; use <aarch64/ARM category>; set bootloader grub; commit
    
  2. For the dgx-gb200/gb300 category, ensure that the BMC settings are defined so that OOB power control can be established via BCM 11 itself. The firmware management mode also needs to be set for the firmware update process to work properly through BCM.

    cmsh -c "category use <gb200 category>; bmcsettings; set firmwaremanagemode GB200; set password 0penBmc; set privilege ADMINISTRATOR; set userid 0; set username root; commit"
    

    or

    cmsh; category use <gb200 category>; bmcsettings;
    set firmwaremanagemode GB200
    set password <default bmc password>
    set privilege ADMINISTRATOR
    set userid 0
    set username <default bmc username>
    commit
    

Note

For GB300 categories, set the firmwaremanagemode to GB200 at this time. This is a temporary requirement and will be updated to a dedicated GB300 mode in a future version of BCM.
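
The per-category steps above repeat the same few cmsh commands, so they lend themselves to scripting. The following is a minimal Python sketch that only builds the command strings for the first three steps (add the category, set the software image, set the management network) so they can be reviewed before being run; it does not invoke cmsh. The helper name `category_commands` is hypothetical, and the `<category name>-image` naming simply follows the convention used in the commands above.

```python
def category_commands(name, network="internalnet"):
    """Return the cmsh invocations for the first three steps of one category.

    Assumption: the software image is named '<category name>-image',
    following the convention used in this section.
    """
    return [
        f'cmsh -c "category; add {name}; commit"',
        f'cmsh -c "category; use {name}; set softwareimage {name}-image; commit"',
        f'cmsh -c "category; use {name}; set managementnetwork {network}; commit"',
    ]


if __name__ == "__main__":
    # Example category names taken from the list in this section.
    for category in ("slogin", "k8s-system-admin", "k8s-system-user"):
        for command in category_commands(category):
            print(command)
```

Pass `network="dgxnet"` for a category that is provisioned from a separate dgxnet.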

Control Plane Disk Setup#

Each control plane category can have a specific disk setup depending on the server’s hardware model. It is assumed that all the servers in a particular category are of the same make and model. Because control nodes vary in hardware topology, the PCIe addressing and topology of each model must be gathered first. This is covered in the Hardware Information Gathering section of the Appendix. The disksetup configurations provided below assume that the reference architecture models are used for each category.

Note

If a non-reference server is being used, edit the examples below to reflect the drive count and PCI Express addresses of its drives. Take care to keep the partitioning correct, because it is crucial to the installation of NVIDIA Mission Control software.
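
For non-reference servers, the blockdev entries in the examples below have to be rebuilt from the PCIe address gathered for each drive. A minimal Python sketch of that mapping follows. It assumes the same `/dev/disk/by-path/pci-<address>-nvme-1` naming that the reference examples use, and the function name `nvme_by_path` is hypothetical. It only formats and sanity-checks the address; it does not probe hardware.

```python
import re

# PCIe address in domain:bus:device.function form, e.g. 0014:01:00.0
_PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def nvme_by_path(pci_address):
    """Build the by-path block device name used in the disksetup examples."""
    address = pci_address.strip().lower()
    if not _PCI_ADDRESS.match(address):
        raise ValueError(f"not a full PCIe address: {pci_address!r}")
    return f"/dev/disk/by-path/pci-{address}-nvme-1"


if __name__ == "__main__":
    # Addresses of the two OS drives in the slogin reference example.
    for addr in ("0014:01:00.0", "0015:01:00.0"):
        print(f"<blockdev>{nvme_by_path(addr)}</blockdev>")
```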

slogin disksetup file#

  1. Create and add a slogin disk setup file in /cm/local/apps/cmd/etc/htdocs/disk-setup/slogin-disksetup.xml.

    Note

    For non-reference servers, determine the equivalent PCIe address for each drive and update the disksetup file accordingly. The PCIe addresses shown in the examples are specific to the reference hardware models and may not match all server configurations.

    Reference: Disk Setup for slogin nodes (based on Supermicro ARS-221GL-FNB-NC24B-DC Model).

    <?xml version="1.0" encoding="UTF-8"?>
    
    <diskSetup>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0014:01:00.0-nvme-1</blockdev>
      <partition id="boot1" partitiontype="esp">
        <size>512M</size>
        <type>linux</type>
        <filesystem>fat</filesystem>
        <mountPoint>/boot/efi</mountPoint>
        <mountOptions>defaults,noatime,nodiratime</mountOptions>
      </partition>
      <partition id="slash1">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0015:01:00.0-nvme-1</blockdev>
      <partition id="boot2" partitiontype="esp">
        <size>512M</size>
        <type>linux</type>
        <filesystem>fat</filesystem>
        <mountOptions>defaults,noatime,nodiratime</mountOptions>
      </partition>
      <partition id="slash2">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0000:01:00.0-nvme-1</blockdev>
      <partition id="var1">
        <size>1500G</size>
        <type>linux raid</type>
      </partition>
      <partition id="tmp1">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0001:01:00.0-nvme-1</blockdev>
      <partition id="var2">
        <size>1500G</size>
        <type>linux raid</type>
      </partition>
      <partition id="tmp2">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <raid id="slashraid">
      <member>slash1</member>
      <member>slash2</member>
      <level>1</level>
      <filesystem>ext4</filesystem>
      <mountPoint>/</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </raid>
    
    <raid id="varraid">
      <member>var1</member>
      <member>var2</member>
      <level>1</level>
      <filesystem>ext4</filesystem>
      <mountPoint>/var</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </raid>
    
    <raid id="tmpraid">
      <member>tmp1</member>
      <member>tmp2</member>
      <level>0</level>
      <filesystem>ext4</filesystem>
      <mountPoint>/tmp</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </raid>
    
    </diskSetup>
    

    Reference: slogin disk layout after provisioning

    lsblk
    
    NAME         MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
    nvme0n1      259:0    0     7T  0 disk
    ├─nvme0n1p1  259:1    0   1.5T  0 part
    │ └─md1         9:1    0   1.5T  0 raid1 /var
    └─nvme0n1p2  259:2    0   5.5T  0 part
      └─md2         9:2    0    11T  0 raid0 /tmp
    nvme1n1      259:3    0     7T  0 disk
    ├─nvme1n1p1  259:4    0   1.5T  0 part
    │ └─md1         9:1    0   1.5T  0 raid1 /var
    └─nvme1n1p2  259:5    0   5.5T  0 part
      └─md2         9:2    0    11T  0 raid0 /tmp
    nvme3n1      259:6    0 894.3G  0 disk
    ├─nvme3n1p1 259:14    0   512M  0 part
    └─nvme3n1p2 259:15    0 893.7G  0 part
      └─md0         9:0    0 893.7G  0 raid1 /
    nvme2n1      259:7    0 894.3G  0 disk
    ├─nvme2n1p1 259:12    0   512M  0 part /boot/efi
    └─nvme2n1p2 259:13    0 893.7G  0 part
      └─md0         9:0    0 893.7G  0 raid1 /
    
    # df -h
    
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           240G   62M  240G   1% /run
    /dev/md0        879G  7.1G  827G   1% /
    none            240G     0  240G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    efivarfs        384K   21K  364K   6% /sys/firmware/efi/efivars
    /dev/nvme2n1p1  511M  4.0K  511M   1% /boot/efi
    /dev/md1        1.5T  4.2G  1.4T   1% /var
    /dev/md2         11T  1.9M   11T   1% /tmp
    
  2. Set the disksetup file in the category.

    cmsh; category use slogin; set disksetup slogin-disksetup.xml; commit
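
A disksetup file whose RAID arrays reference a partition id that is not defined anywhere is easy to produce when editing these files by hand, so it can help to check the XML before assigning it to a category. The sketch below is an illustrative Python check, not a BCM tool: it parses the document with the standard library and reports any `<member>` that does not match a `<partition id>`, or that is claimed by more than one array. The function name `check_disksetup` and the embedded sample are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_disksetup(xml_text):
    """Return a list of problems found in a disksetup document.

    Checks only that every RAID <member> names a defined partition id
    and that no partition is claimed by two arrays.
    """
    root = ET.fromstring(xml_text)
    partition_ids = {p.get("id") for p in root.iter("partition")}
    problems = []
    seen = {}  # member name -> id of the first raid that used it
    for raid in root.iter("raid"):
        raid_id = raid.get("id")
        for member in raid.findall("member"):
            name = (member.text or "").strip()
            if name not in partition_ids:
                problems.append(f"{raid_id}: member {name} is not a defined partition")
            if name in seen:
                problems.append(f"{raid_id}: member {name} already used by {seen[name]}")
            seen.setdefault(name, raid_id)
    return problems


# Deliberately broken sample: slash2 is referenced but never defined.
SAMPLE = """<diskSetup>
  <device>
    <blockdev>/dev/disk/by-path/pci-0014:01:00.0-nvme-1</blockdev>
    <partition id="slash1"><size>max</size><type>linux raid</type></partition>
  </device>
  <raid id="slashraid">
    <member>slash1</member>
    <member>slash2</member>
    <level>1</level>
  </raid>
</diskSetup>"""

if __name__ == "__main__":
    for problem in check_disksetup(SAMPLE):
        print(problem)
```

An empty list means the two checks passed; it does not guarantee that BCM will accept the file.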
    

k8s-system-admin disksetup file#

  1. Create and add a k8s-system-admin disk setup file in /cm/local/apps/cmd/etc/htdocs/disk-setup/k8s-system-admin-disksetup.xml.

    Reference: Disk setup for k8s-system-admin nodes (based on Supermicro SYS-221GE-FNB-NC24B-DC model)

    <?xml version="1.0" encoding="UTF-8"?>
    
    <diskSetup>
    
     <device>
       <blockdev>/dev/disk/by-path/pci-0000:04:00.0-nvme-1</blockdev>
       <partition id="boot2" partitiontype="esp">
         <size>512M</size>
         <type>linux</type>
         <filesystem>fat</filesystem>
         <mountPoint>/boot/efi</mountPoint>
         <mountOptions>defaults,noatime,nodiratime</mountOptions>
       </partition>
       <partition id="slash2">
         <size>max</size>
         <type>linux raid</type>
       </partition>
     </device>
    
     <device>
       <blockdev>/dev/disk/by-path/pci-0000:3d:00.0-nvme-1</blockdev>
       <partition id="shoreline1">
         <size>1500G</size>
         <type>linux raid</type>
       </partition>
       <partition id="raid1">
         <size>max</size>
         <type>linux raid</type>
       </partition>
     </device>
    
     <device>
       <blockdev>/dev/disk/by-path/pci-0000:3e:00.0-nvme-1</blockdev>
       <partition id="shoreline2">
         <size>1500G</size>
         <type>linux raid</type>
       </partition>
       <partition id="raid2">
         <size>max</size>
         <type>linux raid</type>
       </partition>
     </device>
    
     <raid id="slashraid">
       <member>slash1</member>
       <member>slash2</member>
       <level>1</level>
       <filesystem>ext4</filesystem>
       <mountPoint>/</mountPoint>
       <mountOptions>defaults,noatime,nodiratime</mountOptions>
     </raid>
    
     <raid id="shorelineraid">
       <member>shoreline1</member>
       <member>shoreline2</member>
       <level>1</level>
     </raid>
    
     <raid id="localraid">
       <member>raid1</member>
       <member>raid2</member>
       <level>0</level>
       <filesystem>ext4</filesystem>
       <mountPoint>/local</mountPoint>
       <mountOptions>defaults,noatime,nodiratime</mountOptions>
     </raid>
    
    </diskSetup>

    Note

    The slashraid array lists slash1 as a member, but the listing above defines only slash2. Define slash1 in an additional device block for the other OS drive, using that drive's PCIe address, before using this file.
    

    Reference: k8s-system-admin disk layout after provisioning

    lsblk
    
    NAME         MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
    loop0          7:0    0   1.5T  0 loop
    nvme0n1      259:0    0     7T  0 disk
    ├─nvme0n1p1  259:1    0   1.5T  0 part
    │ └─md1        9:1    0   1.5T  0 raid1
    └─nvme0n1p2  259:2    0   5.5T  0 part
      └─md2        9:2    0    11T  0 raid0 /local
    nvme3n1      259:6    0 894.3G  0 disk
    ├─nvme3n1p1  259:8    0   512M  0 part
    └─nvme3n1p2  259:9    0 893.7G  0 part
      └─md0        9:0    0 893.7G  0 raid1 /
    nvme2n1      259:7    0 894.3G  0 disk
    ├─nvme2n1p1  259:10   0   512M  0 part /boot/efi
    └─nvme2n1p2  259:11   0 893.7G  0 part
      └─md0        9:0    0 893.7G  0 raid1 /
    
    # df -h
    
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           240G  114M  240G   1% /run
    /dev/md0        879G   26G  809G   4% /
    tmpfs           240G     0  240G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    efivarfs        384K   21K  364K   6% /sys/firmware/efi/efivars
    /dev/nvme2n1p1  511M  4.0K  511M   1% /boot/efi
    /dev/md2         11T   28K   11T   1% /local
    

Note

/dev/md1 is an unformatted partition used by NMC Autonomous Hardware Recovery (AHR).

  2. Set the disksetup file in the category.

    cmsh; category use k8s-system-admin; set disksetup k8s-system-admin-disksetup.xml; commit
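
The shorelineraid array in the reference file above has no `<filesystem>` or `<mountPoint>` element, which is consistent with /dev/md1 appearing unformatted and unmounted after provisioning. A short Python sketch for listing how each array in a disksetup file will be used is shown below. `summarize_raids` is a hypothetical helper, and the sample is trimmed from the reference file.

```python
import xml.etree.ElementTree as ET


def summarize_raids(xml_text):
    """Map each raid id to (level, filesystem, mount point).

    Missing <filesystem> or <mountPoint> elements are reported as None,
    meaning the array is assembled but left unformatted or unmounted.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for raid in root.iter("raid"):
        summary[raid.get("id")] = (
            raid.findtext("level"),
            raid.findtext("filesystem"),
            raid.findtext("mountPoint"),
        )
    return summary


SAMPLE = """<diskSetup>
  <raid id="shorelineraid">
    <member>shoreline1</member>
    <member>shoreline2</member>
    <level>1</level>
  </raid>
  <raid id="localraid">
    <member>raid1</member>
    <member>raid2</member>
    <level>0</level>
    <filesystem>ext4</filesystem>
    <mountPoint>/local</mountPoint>
  </raid>
</diskSetup>"""

if __name__ == "__main__":
    for raid_id, (level, fs, mount) in summarize_raids(SAMPLE).items():
        print(raid_id, f"raid{level}", fs or "unformatted", mount or "-")
```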
    

k8s-system-user disksetup file#

  1. Create and add a k8s-system-user disk setup file in /cm/local/apps/cmd/etc/htdocs/disk-setup/k8s-system-user-disksetup.xml.

    Reference: Disk setup for k8s-system-user nodes (based on Supermicro ARS-221GL-FNB-NC24B-DC Model)

    <?xml version="1.0" encoding="UTF-8"?>
    
    <diskSetup>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0014:01:00.0-nvme-1</blockdev>
      <partition id="boot1" partitiontype="esp">
        <size>512M</size>
        <type>linux</type>
        <filesystem>fat</filesystem>
        <mountPoint>/boot/efi</mountPoint>
        <mountOptions>defaults,noatime,nodiratime</mountOptions>
      </partition>
      <partition id="slash1">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0015:01:00.0-nvme-1</blockdev>
      <partition id="boot2" partitiontype="esp">
        <size>512M</size>
        <type>linux</type>
        <filesystem>fat</filesystem>
        <mountOptions>defaults,noatime,nodiratime</mountOptions>
      </partition>
      <partition id="slash2">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0000:01:00.0-nvme-1</blockdev>
      <partition id="var1">
        <size>1500G</size>
        <type>linux raid</type>
      </partition>
      <partition id="tmp1">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <device>
      <blockdev>/dev/disk/by-path/pci-0001:01:00.0-nvme-1</blockdev>
      <partition id="var2">
        <size>1500G</size>
        <type>linux raid</type>
      </partition>
      <partition id="tmp2">
        <size>max</size>
        <type>linux raid</type>
      </partition>
    </device>
    
    <raid id="slashraid">
      <member>slash1</member>
      <member>slash2</member>
      <level>1</level>
      <filesystem>ext4</filesystem>
      <mountPoint>/</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </raid>
    
    <raid id="varraid">
      <member>var1</member>
      <member>var2</member>
      <level>1</level>
      <filesystem>ext4</filesystem>
      <mountPoint>/var</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </raid>
    
    <raid id="tmpraid">
      <member>tmp1</member>
      <member>tmp2</member>
      <level>0</level>
      <filesystem>ext4</filesystem>
      <mountPoint>/tmp</mountPoint>
      <mountOptions>defaults,noatime,nodiratime</mountOptions>
    </raid>
    
    </diskSetup>
    

    Reference: k8s-system-user disk layout after provisioning

    lsblk
    
    NAME         MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
    nvme0n1      259:0    0     7T  0 disk
    ├─nvme0n1p1  259:1    0   1.5T  0 part
    │ └─md1         9:1    0   1.5T  0 raid1 /var
    └─nvme0n1p2  259:2    0   5.5T  0 part
      └─md2         9:2    0    11T  0 raid0 /tmp
    nvme1n1      259:3    0     7T  0 disk
    ├─nvme1n1p1  259:4    0   1.5T  0 part
    │ └─md1         9:1    0   1.5T  0 raid1 /var
    └─nvme1n1p2  259:5    0   5.5T  0 part
      └─md2         9:2    0    11T  0 raid0 /tmp
    nvme3n1      259:6    0 894.3G  0 disk
    ├─nvme3n1p1 259:14    0   512M  0 part
    └─nvme3n1p2 259:15    0 893.7G  0 part
      └─md0         9:0    0 893.7G  0 raid1 /
    nvme2n1      259:7    0 894.3G  0 disk
    ├─nvme2n1p1 259:12    0   512M  0 part /boot/efi
    └─nvme2n1p2 259:13    0 893.7G  0 part
      └─md0         9:0    0 893.7G  0 raid1 /
    
    # df -h
    
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           240G   62M  240G   1% /run
    /dev/md0        879G  7.1G  827G   1% /
    none            240G     0  240G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    efivarfs        384K   21K  364K   6% /sys/firmware/efi/efivars
    /dev/nvme2n1p1  511M  4.0K  511M   1% /boot/efi
    /dev/md1        1.5T  4.2G  1.4T   1% /var
    /dev/md2         11T  1.9M   11T   1% /tmp
    
  2. Set the disksetup file in the category.

    cmsh; category use k8s-system-user; set disksetup k8s-system-user-disksetup.xml; commit
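
The array sizes in the lsblk output follow from the RAID levels in the disksetup file: a RAID 1 array offers roughly the capacity of its smallest member, while a RAID 0 array offers roughly the sum of its members. The Python sketch below reproduces that arithmetic for the partition sizes shown above. It ignores metadata overhead, and `raid_capacity` is a hypothetical helper name.

```python
def raid_capacity(level, member_sizes):
    """Approximate usable capacity of an md array, ignoring metadata overhead.

    Level 0 stripes across all members; level 1 mirrors them.
    Sizes can be in any unit as long as it is used consistently.
    """
    if not member_sizes:
        raise ValueError("an array needs at least one member")
    if level == 0:
        return sum(member_sizes)
    if level == 1:
        return min(member_sizes)
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Partition sizes in the units reported by lsblk above (T).
    print("/var (raid1):", raid_capacity(1, [1.5, 1.5]))
    print("/tmp (raid0):", raid_capacity(0, [5.5, 5.5]))
```

This matches the 1.5T reported for /var and the 11T reported for /tmp.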
    

DGX GB200/GB300 Disk Setup#

The following post-install process ensures consistent naming of the NVMe drive devices.

Note

For systems that have two M.2 NVMe drives, NVMe multipath is disabled in these instructions. For DGX GB200 systems specifically, this does not have to be done because the compute tray has only a single M.2 OS drive. See the UDEV Rules KB article for more details.

For both DGX GB200 and GB300, follow the steps below. The only difference is the rules file, which uses different PCIe addresses for each system; the disk setup configuration is the same for both. For OEM GB200/GB300 systems, first determine the correct PCIe address of each disk, then modify the rules file and disk setup configuration accordingly.

  1. Create the rules file

    Create the appropriate rules file for your system:

    • For DGX GB200 (save as 60-persistent-storage-gb200.rules):

    ########## persistent nvme rules by HW address (GB200) ##########
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0015:01:00.0", SYMLINK+="disk/by-id/osdisk-1"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0006:07:00.0", SYMLINK+="disk/by-id/raiddisk-1"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0006:09:00.0", SYMLINK+="disk/by-id/raiddisk-2"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0016:07:00.0", SYMLINK+="disk/by-id/raiddisk-3"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0016:09:00.0", SYMLINK+="disk/by-id/raiddisk-4"
    ########## persistent nvme rules by HW address ##########
    
    • For DGX GB300 (save as 60-persistent-storage-gb300.rules):

    ########## persistent nvme rules by HW address (GB300) ##########
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0015:01:00.0", SYMLINK+="disk/by-id/osdisk-1"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0012:07:00.0", SYMLINK+="disk/by-id/raiddisk-1"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0010:07:00.0", SYMLINK+="disk/by-id/raiddisk-2"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:07:00.0", SYMLINK+="disk/by-id/raiddisk-3"
    KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0002:07:00.0", SYMLINK+="disk/by-id/raiddisk-4"
    ########## persistent nvme rules by HW address ##########
    
  2. Add the rules file to the node-installer images:

    # For GB200
    cp 60-persistent-storage-gb200.rules /cm/node-installer/usr/lib/udev/rules.d/60-persistent-storage-gb200.rules
    cp 60-persistent-storage-gb200.rules /cm/node-installer-ubuntu2404-aarch64/usr/lib/udev/rules.d/60-persistent-storage-gb200.rules
    
    # For GB300
    cp 60-persistent-storage-gb300.rules /cm/node-installer/usr/lib/udev/rules.d/60-persistent-storage-gb300.rules
    cp 60-persistent-storage-gb300.rules /cm/node-installer-ubuntu2404-aarch64/usr/lib/udev/rules.d/60-persistent-storage-gb300.rules
    

    Note

    If the head node is C2/ARM, copying the rules file to /cm/node-installer is sufficient.

  3. Copy the rules file to the OS image:

    # For GB200
    cp 60-persistent-storage-gb200.rules /cm/images/<dgxos image>/usr/lib/udev/rules.d/60-persistent-storage-gb200.rules
    
    # For GB300
    cp 60-persistent-storage-gb300.rules /cm/images/<dgxos image>/usr/lib/udev/rules.d/60-persistent-storage-gb300.rules
    
  4. Create and add the disk setup file (same for both GB200 and GB300) as gb200-disksetup.xml or gb300-disksetup.xml in the directory /cm/local/apps/cmd/etc/htdocs/disk-setup:

    Disk setup for DGX GB200 and GB300
    <?xml version="1.0" encoding="UTF-8"?>
    <diskSetup>
        <device>
            <blockdev>/dev/disk/by-id/osdisk-1</blockdev>
            <partition id="efi" partitiontype="esp">
                <size>100M</size>
                <type>linux</type>
                <filesystem>fat</filesystem>
                <mountPoint>/boot/efi</mountPoint>
                <mountOptions>defaults,noatime,nodiratime</mountOptions>
            </partition>
            <partition id="boot1">
                <size>4G</size>
                <type>linux</type>
                <filesystem>ext2</filesystem>
                <mountPoint>/boot</mountPoint>
                <mountOptions>defaults,noatime,nodiratime</mountOptions>
            </partition>
            <partition id="slash1">
                <size>max</size>
                <type>linux</type>
                <filesystem>ext4</filesystem>
                <mountPoint>/</mountPoint>
                <mountOptions>defaults,noatime,nodiratime</mountOptions>
            </partition>
        </device>
        <device>
            <blockdev>/dev/disk/by-id/raiddisk-1</blockdev>
            <partition id="raid1">
                <size>max</size>
                <type>linux raid</type>
            </partition>
        </device>
        <device>
            <blockdev>/dev/disk/by-id/raiddisk-2</blockdev>
            <partition id="raid2">
                <size>max</size>
                <type>linux raid</type>
            </partition>
        </device>
        <device>
            <blockdev>/dev/disk/by-id/raiddisk-3</blockdev>
            <partition id="raid3">
                <size>max</size>
                <type>linux raid</type>
            </partition>
        </device>
        <device>
            <blockdev>/dev/disk/by-id/raiddisk-4</blockdev>
            <partition id="raid4">
                <size>max</size>
                <type>linux raid</type>
            </partition>
        </device>
        <raid id="scratch_local">
            <member>raid1</member>
            <member>raid2</member>
            <member>raid3</member>
            <member>raid4</member>
            <level>0</level>
            <filesystem>ext4</filesystem>
            <mountPoint>/raid</mountPoint>
            <mountOptions>defaults,noatime,nodiratime</mountOptions>
        </raid>
    </diskSetup>
    

    Note

    Both DGX GB200 and GB300 systems use the disk-by-id method for disk setup, consistent with the referenced KB article.

  5. Set the disksetup file in the appropriate category.

    For GB200

    cmsh; category use dgx-gb200; set disksetup gb200-disksetup.xml; commit
    

    For GB300

    cmsh; category use dgx-gb300; set disksetup gb300-disksetup.xml; commit
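
Since the GB200 and GB300 rules files differ only in their PCIe addresses, the file for an OEM system can be generated from a small address table once the addresses are known. The following Python sketch renders rules in the same form as the examples above. The function name `render_rules` is hypothetical, and the addresses in the example are the DGX GB200 values from this section.

```python
def render_rules(disks, label):
    """Render udev rules that add stable /dev/disk/by-id symlinks.

    disks maps a symlink name (e.g. 'osdisk-1') to the PCIe address of
    the NVMe controller, as gathered for the target system.
    """
    lines = [f"########## persistent nvme rules by HW address ({label}) ##########"]
    for name, address in disks.items():
        lines.append(
            f'KERNEL=="nvme[0-9]n[0-9]", ATTRS{{address}}=="{address}", '
            f'SYMLINK+="disk/by-id/{name}"'
        )
    lines.append("########## persistent nvme rules by HW address ##########")
    return "\n".join(lines) + "\n"


# DGX GB200 addresses copied from the rules file shown in this section.
GB200_DISKS = {
    "osdisk-1": "0015:01:00.0",
    "raiddisk-1": "0006:07:00.0",
    "raiddisk-2": "0006:09:00.0",
    "raiddisk-3": "0016:07:00.0",
    "raiddisk-4": "0016:09:00.0",
}

if __name__ == "__main__":
    print(render_rules(GB200_DISKS, "GB200"), end="")
```

Keeping the symlink names (osdisk-1, raiddisk-1 through raiddisk-4) unchanged means the shared disk setup file can be reused as-is.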