Cluster Configuration

This documentation is part of NVIDIA DGX BasePOD: Deployment Guide Featuring NVIDIA DGX A100 Systems.

Note

Before you complete the steps on this page, complete the steps in Deployment.

Warning

The # prompt indicates commands that you execute as the root user on a head node. The % prompt indicates commands that you execute within cmsh.

Cluster Configuration Steps

  1. Log in to the Base Command Manager (BCM) head node assigned to externalnet.

    ssh <externalnet>
    
  2. Install the cluster license by running the request-license command. Because HA is used, specify the MAC address of the first NIC of the secondary head node so that it can also serve the BCM licenses in the event of a failover.

    This example is for a head node with Internet access. For air-gapped clusters, see “Off-cluster WWW access” in Section 4.3.3 of the NVIDIA Base Command Manager Installation Manual.

    # request-license
    Product Key (XXXXXX-XXXXXX-XXXXXX-XXXXXX-XXXXXX): 123456-123456-123456-123456
    ...
    
  3. Back up the default software image. The backup image can be used to create additional software images.

    # cmsh
    % softwareimage
    % clone default-image default-image-orig
    % commit

    Wait for the ramdisk to be regenerated and the following text to be displayed.

    Wed Jul 26 09:00:53 2023 [notice] bcm10-headnode: Initial ramdisk for image default-image-orig was generated successfully
  4. Back up the DGX software image. The backup image can be used to create additional software images.

    % softwareimage
    % clone dgx-os-6.0-a100-image dgx-os-6.0-a100-image-orig
    % commit
    

    Wait for the ramdisk to be regenerated and the following text to be displayed.

    Wed Jul 26 09:01:11 2023 [notice] bcm10-headnode: Initial ramdisk for image dgx-os-6.0-a100-image-orig was generated successfully
    
  5. Create the K8s software image by cloning the default software image. This software image will be further configured and provisioned onto the K8s control plane nodes. Wait for the ramdisk to be regenerated.

    % softwareimage
    % clone default-image k8s-master-image
    % commit
    
  6. Add the required kernel modules to the k8s-master-image software image.

    % /
    % softwareimage
    % use k8s-master-image
    % kernelmodules
    % add mlx5_core
    % add bonding
    % softwareimage commit
    
  7. Create the k8s-master node category and assign the k8s-master-image software image to it. All nodes assigned to the k8s-master category will be provisioned with the k8s-master-image software image.

    % category
    % clone default k8s-master
    % set softwareimage k8s-master-image
    % commit
    
  8. Create the DGX nodes. node01 was created during head node installation. Clone node01 to create the DGX nodes, which will initially be named node02, node03, node04, and node05.

    % device
    % foreach --clone node01 -n node02..node05 ()
    % commit
    
  9. Rename the DGX nodes so they are more easily identified later.

    % use node02
    % set hostname dgx01
    % use node03
    % set hostname dgx02
    % use node04
    % set hostname dgx03
    % use node05
    % set hostname dgx04
    % device commit
    
  10. Clone node01 to create the K8s control plane nodes, which will initially be named node06, node07, and node08.

    % device
    % foreach --clone node01 -n node06..node08 ()
    % commit
  11. Rename the K8s control plane nodes so they are more easily identifiable.

    % device
    % use node06
    % set hostname knode01
    % use node07
    % set hostname knode02
    % use node08
    % set hostname knode03
    % device commit
  12. Rename node01 to template01 to indicate that it serves only as a template for cloning other nodes.

    % device
    % use node01
    % set hostname template01
    % commit
  13. Assign the DGX nodes to the dgx-a100 node category.

    % foreach -n dgx01..dgx04 (set category dgx-a100)
  14. Assign the K8s nodes to the k8s-master node category. The commit also saves the category changes made in the previous step.

    % foreach -n knode01..knode03 (set category k8s-master)
    % commit
  15. Check the nodes and their categories. The -f option of device list selects the columns and their widths to make the output easier to read.

    % device list -f hostname:20,category:10,ip:20,status:15
    hostname (key)       category   ip                   status
    -------------------- ---------- -------------------- ---------------
    bcm10-headnode                  10.227.48.8          [   UP   ]
    dgx01                dgx-a100   10.227.48.5          [  DOWN  ]
    dgx02                dgx-a100   10.227.48.6          [  DOWN  ]
    dgx03                dgx-a100   10.227.48.7          [  DOWN  ]
    dgx04                dgx-a100   10.227.48.4          [  DOWN  ]
    knode01              k8s-master 10.227.48.4          [  DOWN  ]
    knode02              k8s-master 10.227.48.4          [  DOWN  ]
    knode03              k8s-master 10.227.48.4          [  DOWN  ]
    template01           default    10.227.48.4          [  DOWN  ]
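For larger clusters, reading this table by eye becomes error-prone. The following Python sketch parses the fixed-width output of the device list -f hostname:20,category:10,ip:20,status:15 command shown above and reports nodes that are missing or not in the expected category. It assumes each column is padded to the width given with -f and separated by a single space, as in the sample output; the function names are hypothetical.

```python
"""Check node categories in cmsh `device list -f ...` output (hypothetical helper)."""

# Column order and widths, matching the -f option used in the step above.
COLUMNS = [("hostname", 20), ("category", 10), ("ip", 20), ("status", 15)]


def parse_device_list(text: str) -> dict[str, dict[str, str]]:
    """Return {hostname: {column: value}} for every data row in the output."""
    rows: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        # Skip blank lines, the header row, and the dashed separator row.
        if not line.strip() or line.startswith("hostname") or line.startswith("-"):
            continue
        fields: dict[str, str] = {}
        pos = 0
        for name, width in COLUMNS:
            fields[name] = line[pos:pos + width].strip()
            pos += width + 1  # one space separates adjacent columns
        rows[fields["hostname"]] = fields
    return rows


def wrong_categories(rows: dict[str, dict[str, str]],
                     expected: dict[str, str]) -> list[str]:
    """Return hostnames that are missing or not in their expected category."""
    return sorted(host for host, category in expected.items()
                  if rows.get(host, {}).get("category") != category)
```

For example, save the command output to a file, pass its contents to parse_device_list, and call wrong_categories with a mapping such as {"dgx01": "dgx-a100", "knode01": "k8s-master"}; an empty list means every listed node is in the expected category.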

Network Configuration

  1. Add a network for InfiniBand (ibnet).

    % network
    % add ibnet
    % set domainname ibnet.cluster.local
    % set baseaddress 10.126.0.0
    % set netmaskbits 16
    % set mtu 2048
    % commit
    
  2. Verify the results.

    % list -f name:20,type:10,netmaskbits:10,baseaddress:15,domainname:20
    name (key)           type       netmaskbit baseaddress     domainname
    -------------------- ---------- ---------- --------------- --------------------
    externalnet          External   26         10.227.52.0     nvidia.com
    globalnet            Global     0          0.0.0.0         cm.cluster
    ibnet                Internal   16         10.126.0.0      ibnet.cluster.local
    internalnet          Internal   26         10.227.48.0     eth.cluster
    ipminet              Internal   26         10.227.20.64    ipmi.cluster
    
  3. Ensure that head node interfaces are configured correctly.

    % device
    % use bcm10-headnode
    % interfaces
    % list
    Type         Network device name  IP               Network          Start if
    ------------ -------------------- ---------------- ---------------- --------
    bmc          ipmi0                10.227.20.91     ipminet          always
    physical     ens10f0              10.227.52.8      externalnet      always
    physical     ens10f1              10.227.20.126    ipminet          always
    physical     ens1f1np1 [prov]     10.227.48.8      internalnet      always
    
  4. If any interfaces are missing or not configured, add them and assign each one to the appropriate network.

  5. Reboot the head node if the network interfaces were changed.

    % /
    % device
    % use bcm10-headnode
    % reboot
    Reboot in progress for: bcm10-headnode
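Overlapping networks, or an interface address outside its network, are easy to miss in these listings. The following Python sketch uses the standard ipaddress module to check the values from the network list and head node interface list above. The network and interface values are copied from the example output and should be replaced with your own; the helper names are hypothetical.

```python
import ipaddress

# Networks from the example `network list` output. globalnet (0.0.0.0/0) is
# left out on purpose because it overlaps every other network by design.
NETWORKS = {
    "externalnet": ipaddress.ip_network("10.227.52.0/26"),
    "ibnet": ipaddress.ip_network("10.126.0.0/16"),
    "internalnet": ipaddress.ip_network("10.227.48.0/26"),
    "ipminet": ipaddress.ip_network("10.227.20.64/26"),
}

# Head node interfaces from the example `interfaces; list` output:
# device name -> (IP address, network name).
HEADNODE_INTERFACES = {
    "ipmi0": ("10.227.20.91", "ipminet"),
    "ens10f0": ("10.227.52.8", "externalnet"),
    "ens10f1": ("10.227.20.126", "ipminet"),
    "ens1f1np1": ("10.227.48.8", "internalnet"),
}


def overlapping(networks):
    """Return pairs of network names whose address ranges overlap."""
    names = sorted(networks)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if networks[a].overlaps(networks[b])]


def misplaced(interfaces, networks):
    """Return devices whose IP is not a usable host address in their network."""
    bad = []
    for device, (ip, net_name) in interfaces.items():
        addr = ipaddress.ip_address(ip)
        net = networks[net_name]
        # Reject addresses outside the network and the network/broadcast addresses.
        if addr not in net or addr in (net.network_address, net.broadcast_address):
            bad.append(device)
    return sorted(bad)


print("overlapping networks:", overlapping(NETWORKS))
print("misplaced interfaces:", misplaced(HEADNODE_INTERFACES, NETWORKS))
```

With the example values, both lists are empty. Note that each /26 network holds only 62 usable host addresses, which limits how many nodes internalnet can address.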
    

Configure Disk Layouts for Node Categories

Managing nodes in a DGX BasePOD with BCM includes defining their disk partitions. The DGX BasePOD node categories include the K8s control plane (k8s-master) category and the DGX (dgx-a100) category. The DGX categories come preconfigured with the correct disk partitions.

These steps detail how to configure the disk layout for the k8s-master category.

  1. Define the disk setup for the k8s-master category.

    For the K8s control plane nodes, an EFI System Partition of 100 MB is created at the start of the disk, with the remainder of the disk dedicated to the OS as a single large partition. Note that this disk setup does not have a swap partition.

    The configuration file uses /dev/nvme0n1 as the block device. Change this value if the systems intended as K8s control plane nodes use a different device name.

    Save the following text to /cm/local/apps/cmd/etc/htdocs/disk-setup/k8s-disksetup.xml, making any changes that the target systems require.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <diskSetup xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <device>

       <blockdev>/dev/nvme0n1</blockdev>

       <partition id="a0" partitiontype="esp">
          <size>100M</size>
          <type>linux</type>
          <filesystem>fat</filesystem>
          <mountPoint>/boot/efi</mountPoint>
          <mountOptions>defaults,noatime,nodiratime</mountOptions>
       </partition>
       <partition id="a1">
          <size>max</size>
          <type>linux</type>
          <filesystem>xfs</filesystem>
          <mountPoint>/</mountPoint>
          <mountOptions>defaults,noatime,nodiratime</mountOptions>
       </partition>
    </device>
    </diskSetup>
    
  2. Assign this disk layout to the k8s-master node category.

    # cmsh
    % category
    % use k8s-master
    % set disksetup /cm/local/apps/cmd/etc/htdocs/disk-setup/k8s-disksetup.xml
    % commit
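Before assigning the file, you may want to sanity-check it. The following Python sketch parses a disk setup file with xml.etree.ElementTree and checks a few properties of the layout described above: each device has a blockdev, exactly one ESP is mounted at /boot/efi, a root (/) partition exists, and only the last partition uses size max. These checks are assumptions drawn from this example, not BCM's own validation rules, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_disk_setup(xml_bytes: bytes) -> list[str]:
    """Return a list of problems found in a disk setup document (empty if none)."""
    # Parse bytes so that the ISO-8859-1 encoding declaration is honored.
    root = ET.fromstring(xml_bytes)
    problems = []
    if root.tag != "diskSetup":
        problems.append(f"unexpected root element <{root.tag}>")
    devices = root.findall("device")
    if not devices:
        problems.append("no <device> element")
    for device in devices:
        blockdev = (device.findtext("blockdev") or "").strip()
        if not blockdev:
            problems.append("<device> without <blockdev>")
        parts = device.findall("partition")
        mounts = [(p.findtext("mountPoint") or "").strip() for p in parts]
        if "/" not in mounts:
            problems.append(f"{blockdev}: no root (/) partition")
        esps = [p for p in parts if p.get("partitiontype") == "esp"]
        if len(esps) != 1 or (esps[0].findtext("mountPoint") or "").strip() != "/boot/efi":
            problems.append(f"{blockdev}: expected one ESP mounted at /boot/efi")
        sizes = [(p.findtext("size") or "").strip() for p in parts]
        # A "max" partition takes the remaining space, so it should come last.
        if "max" in sizes[:-1]:
            problems.append(f"{blockdev}: only the last partition should use size max")
    return problems
```

To use it, open the saved k8s-disksetup.xml in binary mode and pass its contents to check_disk_setup; an empty list means none of these checks failed.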
    

Configure Node Network Interfaces

Configure BCM to Allow MAC Addresses to PXE Boot

  1. Run the following steps in the root shell on the head node, not in cmsh.

  2. In /cm/local/apps/cmd/etc/cmd.conf, uncomment the AdvancedConfig parameter and set its value as follows.

    AdvancedConfig = { "DeviceResolveAnyMAC=1" } # modified value
    
  3. Restart the CMDaemon to enable reliable PXE booting from bonded interfaces.

    # systemctl restart cmd
    
  4. Restarting CMDaemon disconnects any open cmsh session. After CMDaemon has restarted, type connect to reconnect, or enter exit and start cmsh again.
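If you prefer to script the cmd.conf change instead of editing the file by hand, the following Python sketch shows one cautious approach. It assumes the file contains a commented-out line beginning with # AdvancedConfig =, which it replaces with the value shown above. It leaves an already-enabled setting untouched and refuses to overwrite an active AdvancedConfig line that holds other settings. The function name is hypothetical; back up cmd.conf before writing the result, and restart CMDaemon afterward as described above.

```python
import re

# Setting from the step above (hypothetical helper; verify against your cmd.conf).
TARGET_LINE = 'AdvancedConfig = { "DeviceResolveAnyMAC=1" }'
COMMENTED = re.compile(r"^\s*#\s*AdvancedConfig\s*=")
ACTIVE = re.compile(r"^\s*AdvancedConfig\s*=")


def enable_resolve_any_mac(conf_text: str) -> str:
    """Return cmd.conf text with the AdvancedConfig line uncommented and set."""
    lines = conf_text.splitlines()
    active = [line for line in lines if ACTIVE.match(line)]
    if active:
        if any("DeviceResolveAnyMAC=1" in line for line in active):
            return conf_text  # already enabled; nothing to change
        # Another AdvancedConfig value exists; merging entries is left to a human.
        raise ValueError("AdvancedConfig is already set; add DeviceResolveAnyMAC=1 by hand")
    for i, line in enumerate(lines):
        if COMMENTED.match(line):
            lines[i] = TARGET_LINE  # uncomment and set the new value
            return "\n".join(lines) + "\n"
    raise ValueError("no AdvancedConfig line found in cmd.conf")
```

Raising an error instead of silently rewriting an active AdvancedConfig line avoids discarding other advanced settings that may already be in use.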

Configure Provisioning Interfaces on the DGX Nodes

The following steps are performed on the head node and configure all DGX systems.

Warning

Double-check the MAC address of each interface and the IP address of the bond0 interface. Mistakes here are difficult to diagnose.
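One way to act on this warning is to check the site-survey values before typing them into cmsh. The following Python sketch flags MAC addresses that are not six colon-separated hex octets, and MAC addresses that appear on more than one node. Repeats within a single node are allowed because the steps below intentionally assign the same MAC to both names of a port (for example, enp225s0f1 and enp225s0f1np1). The survey dictionary layout and the function name are hypothetical.

```python
import re
from collections import defaultdict

# Six pairs of hex digits separated by colons, in either case.
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def check_survey_macs(survey: dict[str, dict[str, str]]) -> list[str]:
    """survey maps node -> {interface: MAC}; returns a list of problems."""
    problems = []
    owners = defaultdict(set)  # normalized MAC -> nodes that use it
    for node, interfaces in survey.items():
        for iface, mac in interfaces.items():
            if not MAC_RE.fullmatch(mac):
                problems.append(f"{node}/{iface}: malformed MAC {mac!r}")
                continue
            owners[mac.upper()].add(node)
    for mac, nodes in sorted(owners.items()):
        if len(nodes) > 1:
            problems.append(f"{mac} is used on more than one node: {', '.join(sorted(nodes))}")
    return problems


# Example values for dgx01, copied from the interface step below.
survey = {
    "dgx01": {
        "enp225s0f1": "B8:CE:F6:2F:08:69",
        "enp225s0f1np1": "B8:CE:F6:2F:08:69",
        "enp97s0f1": "B8:CE:F6:2D:0E:A7",
        "enp97s0f1np1": "B8:CE:F6:2D:0E:A7",
    },
}
print(check_survey_macs(survey))
```

For the dgx01 example this prints an empty list; extend the survey dictionary with every node before running it for real.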

  1. Use a cmsh foreach loop to add the new physical interfaces and the bond0 interface to all four DGX A100 systems.

    # cmsh
    % device
    % foreach -n dgx01..dgx04 (interfaces; add physical enp225s0f1; add physical enp97s0f1; add physical enp225s0f1np1; add physical enp97s0f1np1; commit)
    % foreach -n dgx01..dgx04 (interfaces; add bond bond0; set interfaces enp225s0f1 enp97s0f1 enp225s0f1np1 enp97s0f1np1; set network internalnet; set mode 4; set options miimon=100; commit)
    
  2. Set the MAC addresses of the physical interfaces, and change the IP addresses of the ipmi0 and bond0 interfaces if needed. Repeat this step on each DGX system; a single system is shown here.

    # cmsh
    % device
    % use dgx01
    % interfaces
    % set enp225s0f1 mac B8:CE:F6:2F:08:69
    % set enp97s0f1 mac B8:CE:F6:2D:0E:A7
    % set enp225s0f1np1 mac B8:CE:F6:2F:08:69
    % set enp97s0f1np1 mac B8:CE:F6:2D:0E:A7
    % set ipmi0 ip 10.227.20.69
    % set bond0 ip 10.227.48.13
    % commit
    % list
    Type         Network device name  IP               Network          Start if
    ------------ -------------------- ---------------- ---------------- --------
    bmc          ipmi0                10.227.20.69     ipminet          always
    physical     BOOTIF [prov]        10.227.48.4      internalnet      always
    bond         bond0                10.227.48.13     internalnet      always
    physical     enp225s0f1           0.0.0.0                           always
    physical     enp225s0f1np1        0.0.0.0                           always
    physical     enp97s0f1            0.0.0.0                           always
    physical     enp97s0f1np1         0.0.0.0                           always
    
  3. Using a foreach loop, set bond0 as the provisioning interface and remove the BOOTIF interface on all four DGX systems.

    % /                       # go to top level of cmsh
    % device
    % foreach -n dgx01..dgx04 (set provisioninginterface bond0; commit; interfaces; remove bootif; commit)
    
  4. Verify the configuration.

    % device
    % use dgx01
    % get provisioninginterface
    bond0
    % interfaces
    % list
    Type         Network device name    IP               Network          Start if
    ------------ ---------------------- ---------------- ---------------- --------
    bmc          ipmi0                  10.227.20.69     ipminet          always
    bond         bond0 [prov]           10.227.48.13     internalnet      always
    physical     enp225s0f1 (bond0)     0.0.0.0                           always
    physical     enp225s0f1np1 (bond0)  0.0.0.0                           always
    physical     enp97s0f1 (bond0)      0.0.0.0                           always
    physical     enp97s0f1np1 (bond0)   0.0.0.0                           always
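On a cluster with many DGX systems, the verification step can be scripted. The following Python sketch parses the text printed by the interfaces list command and reports three kinds of problems: bond0 not marked [prov], a BOOTIF interface still present, or an expected member interface not shown as part of bond0. It assumes the layout shown above, where the device name is followed by an optional [prov] or (bond0) marker; the function names are hypothetical.

```python
def parse_interfaces(text: str) -> dict[str, dict[str, str]]:
    """Map device name -> {"type", "marker", "ip"} from cmsh `interfaces; list` output."""
    result = {}
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not tokens or tokens[0] == "Type" or tokens[0].startswith("-"):
            continue
        kind, name = tokens[0], tokens[1]
        # An optional marker such as "[prov]" or "(bond0)" follows the device name.
        has_marker = len(tokens) > 2 and tokens[2][0] in "[("
        marker = tokens[2] if has_marker else ""
        rest = tokens[3:] if has_marker else tokens[2:]
        result[name] = {"type": kind, "marker": marker, "ip": rest[0] if rest else ""}
    return result


def bond_problems(interfaces, members, bond="bond0"):
    """Return problems with the provisioning bond and its expected members."""
    problems = []
    if interfaces.get(bond, {}).get("marker") != "[prov]":
        problems.append(f"{bond} is not the provisioning interface")
    if "BOOTIF" in interfaces:
        problems.append("BOOTIF is still present")
    for name in members:
        if interfaces.get(name, {}).get("marker") != f"({bond})":
            problems.append(f"{name} is not a member of {bond}")
    return problems
```

Passing the member names explicitly keeps the check valid after the InfiniBand interfaces are added later, because those physical interfaces are intentionally not part of bond0.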
    

Configure Provisioning Interfaces on the Kubernetes Nodes

The steps in this section configure all three Kubernetes (K8s) control plane nodes. The interface MAC and IP addresses must be set on each node individually.

  1. Use a cmsh foreach loop to add the new physical interfaces and the bond0 interface to all three K8s control plane nodes.

    % /                       # go to top level of cmsh
    % device
    % foreach -n knode01..knode03 (interfaces; add physical ens1f1; add physical ens2f1; add physical ens1f1np1; add physical ens2f1np1; commit)
    % foreach -n knode01..knode03 (interfaces; add bond bond0; set interfaces ens1f1np1 ens2f1np1 ens1f1 ens2f1; set network internalnet; set mode 4; set options miimon=100; commit)
  2. Set the MAC addresses of the physical interfaces, and change the IP addresses of the ipmi0 and bond0 interfaces if needed. Repeat this step on each K8s control plane node; a single node is shown here.

    % /
    % device
    % use knode01
    % interfaces
    % set ens1f1 mac 04:3F:72:E7:64:97
    % set ens1f1np1 mac 04:3F:72:E7:64:97
    % set ens2f1 mac 0C:42:A1:79:9B:15
    % set ens2f1np1 mac 0C:42:A1:79:9B:15
    % set ipmi0 ip 10.227.20.80
    % set bond0 ip 10.227.48.30
    % commit
    % list
    Type         Network device name  IP               Network          Start if
    ------------ -------------------- ---------------- ---------------- --------
    bmc          ipmi0                10.227.20.80     ipminet          always
    physical     BOOTIF [prov]        10.227.48.4      internalnet      always
    bond         bond0                10.227.48.30     internalnet      always
    physical     ens1f1 (bond0)       0.0.0.0                           always
    physical     ens1f1np1 (bond0)    0.0.0.0                           always
    physical     ens2f1 (bond0)       0.0.0.0                           always
    physical     ens2f1np1 (bond0)    0.0.0.0                           always
    
  3. Using a foreach loop, set bond0 as the provisioning interface and remove the BOOTIF interface on all three K8s control plane nodes.

    % /
    % device
    % foreach -n knode01..knode03 (set provisioninginterface bond0; commit; interfaces; remove bootif; commit)
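Because the MAC and IP step is repeated on each K8s control plane node, the commands can be generated from the site survey. The following Python sketch builds the same cmsh command sequence used for knode01 above, after checking that the bond0 and ipmi0 addresses fall within internalnet and ipminet. The network ranges are copied from the example network list; the function name is hypothetical. The returned lines are meant to be pasted into cmsh at the top level.

```python
import ipaddress

# Ranges from the example `network list` output; adjust them for your site.
INTERNALNET = ipaddress.ip_network("10.227.48.0/26")
IPMINET = ipaddress.ip_network("10.227.20.64/26")


def interface_commands(node: str, macs: dict[str, str], ipmi_ip: str, bond_ip: str) -> list[str]:
    """Return cmsh lines that set interface MACs and the ipmi0/bond0 IPs for one node."""
    if ipaddress.ip_address(bond_ip) not in INTERNALNET:
        raise ValueError(f"{node}: bond0 IP {bond_ip} is outside internalnet")
    if ipaddress.ip_address(ipmi_ip) not in IPMINET:
        raise ValueError(f"{node}: ipmi0 IP {ipmi_ip} is outside ipminet")
    lines = ["device", f"use {node}", "interfaces"]
    lines += [f"set {iface} mac {mac}" for iface, mac in macs.items()]
    lines += [f"set ipmi0 ip {ipmi_ip}", f"set bond0 ip {bond_ip}", "commit", "list"]
    return lines


knode01 = interface_commands(
    "knode01",
    {
        "ens1f1": "04:3F:72:E7:64:97",
        "ens1f1np1": "04:3F:72:E7:64:97",
        "ens2f1": "0C:42:A1:79:9B:15",
        "ens2f1np1": "0C:42:A1:79:9B:15",
    },
    ipmi_ip="10.227.20.80",
    bond_ip="10.227.48.30",
)
print("\n".join(knode01))
```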
    

Configure InfiniBand Interfaces on DGX Nodes

The following procedure adds four physical InfiniBand interfaces to each DGX node.

  1. Use a cmsh foreach loop to add the new physical InfiniBand interfaces to all four DGX nodes.

    % /                       # go to top level of cmsh
    % device
    % foreach -n dgx01..dgx04 (interfaces; add physical ibp12s0; set network ibnet; add physical ibp141s0; set network ibnet; add physical ibp186s0; set network ibnet; add physical ibp75s0; set network ibnet; commit)
    
  2. Set the IP address of each physical InfiniBand interface. Repeat this step on each DGX system; a single system is shown here.

    % /                       # go to top level of cmsh
    % device
    % use dgx01
    % interfaces
    % set ibp12s0 ip 10.126.0.13
    % set ibp141s0 ip 10.126.2.13
    % set ibp186s0 ip 10.126.3.13
    % set ibp75s0 ip 10.126.1.13
    % commit
    % list
    Type         Network device name    IP               Network          Start if
    ------------ ---------------------- ---------------- ---------------- --------
    bmc          ipmi0                  10.227.20.69     ipminet          always
    bond         bond0 [prov]           10.227.48.13     internalnet      always
    physical     enp225s0f1 (bond0)     0.0.0.0                           always
    physical     enp225s0f1np1 (bond0)  0.0.0.0                           always
    physical     enp97s0f1 (bond0)      0.0.0.0                           always
    physical     enp97s0f1np1 (bond0)   0.0.0.0                           always
    physical     ibp12s0                10.126.0.13      ibnet            always
    physical     ibp141s0               10.126.2.13      ibnet            always
    physical     ibp186s0               10.126.3.13      ibnet            always
    physical     ibp75s0                10.126.1.13      ibnet            always
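In the dgx01 example, each InfiniBand interface has its own third octet in ibnet (ibp12s0 uses 10.126.0.x, ibp75s0 uses 10.126.1.x, ibp141s0 uses 10.126.2.x, and ibp186s0 uses 10.126.3.x), and the last octet matches the last octet of the node's bond0 address (13). If you follow the same pattern, the following Python sketch derives the four addresses from a node's bond0 IP and emits the matching set commands. The pattern is inferred from this one example and is not a BCM requirement; the mapping and function names are hypothetical.

```python
import ipaddress

IBNET = ipaddress.ip_network("10.126.0.0/16")
# Third octet per InfiniBand interface, inferred from the dgx01 example (assumption).
THIRD_OCTET = {"ibp12s0": 0, "ibp75s0": 1, "ibp141s0": 2, "ibp186s0": 3}


def ib_addresses(bond0_ip: str) -> dict[str, str]:
    """Map each InfiniBand interface to 10.126.<octet>.<last octet of bond0_ip>."""
    host = ipaddress.ip_address(bond0_ip).packed[-1]  # last octet as an int
    base = int(IBNET.network_address)
    addresses = {}
    for iface, octet in THIRD_OCTET.items():
        addr = ipaddress.ip_address(base + octet * 256 + host)
        if addr not in IBNET:
            raise ValueError(f"{addr} is outside ibnet")
        addresses[iface] = str(addr)
    return addresses


def ib_commands(node: str, bond0_ip: str) -> list[str]:
    """Return cmsh lines that set the InfiniBand IPs for one DGX node."""
    lines = ["device", f"use {node}", "interfaces"]
    lines += [f"set {iface} ip {ip}" for iface, ip in ib_addresses(bond0_ip).items()]
    return lines + ["commit"]


print("\n".join(ib_commands("dgx01", "10.227.48.13")))
```

Tying the InfiniBand host octet to the bond0 host octet makes it easy to tell which node owns an address on any network, but any consistent scheme works.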
    

Identify the Cluster Nodes

  1. Identify the nodes by setting the MAC address of each node's provisioning interface to the value listed in the site survey.

    % device
    % set dgx01 mac b8:ce:f6:2f:08:69
    % set dgx02 mac 0c:42:a1:54:32:a7
    % set dgx03 mac 0c:42:a1:0a:7a:51
    % set dgx04 mac 1c:34:da:29:17:6e
    % set knode01 mac 04:3F:72:E7:64:97
    % set knode02 mac 04:3F:72:D3:FC:EB
    % set knode03 mac 04:3F:72:D3:FC:DB
    % foreach -c dgx-a100,k8s-master (get mac)
    B8:CE:F6:2F:08:69
    0C:42:A1:54:32:A7
    0C:42:A1:0A:7A:51
    1C:34:DA:29:17:6E
    04:3F:72:E7:64:97
    04:3F:72:D3:FC:EB
    04:3F:72:D3:FC:DB
    
  2. If all the MAC addresses are set properly, commit the changes.

    % device commit
    % quit
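The foreach output lists MAC addresses in upper case, while some of the set commands above use lower case, so comparing them by eye is error-prone. The following Python sketch compares the site-survey MACs with the lines printed by foreach -c dgx-a100,k8s-master (get mac), ignoring case and order. The survey dictionary and function name are hypothetical; the example values are copied from the commands above.

```python
def compare_macs(survey: dict[str, str], foreach_output: str) -> tuple[list[str], list[str]]:
    """Return (MACs in the survey but not reported, MACs reported but not in the survey)."""
    expected = {mac.strip().upper() for mac in survey.values()}
    reported = {line.strip().upper() for line in foreach_output.splitlines() if line.strip()}
    return sorted(expected - reported), sorted(reported - expected)


survey = {
    "dgx01": "b8:ce:f6:2f:08:69",
    "dgx02": "0c:42:a1:54:32:a7",
    "dgx03": "0c:42:a1:0a:7a:51",
    "dgx04": "1c:34:da:29:17:6e",
    "knode01": "04:3F:72:E7:64:97",
    "knode02": "04:3F:72:D3:FC:EB",
    "knode03": "04:3F:72:D3:FC:DB",
}
output = """B8:CE:F6:2F:08:69
0C:42:A1:54:32:A7
0C:42:A1:0A:7A:51
1C:34:DA:29:17:6E
04:3F:72:E7:64:97
04:3F:72:D3:FC:EB
04:3F:72:D3:FC:DB"""
print(compare_macs(survey, output))
```

With the example values this prints ([], []). Because the comparison ignores order, it does not confirm which node holds which MAC; it only catches typos and missing or extra entries.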
    

Next Steps

After you complete the steps on this page, see Power On and Provision Cluster Nodes.