Control Plane Node Entries#

With the categories, software images, and networks defined, the control nodes themselves can now be defined for provisioning. For each node type, a “golden node” is created and then cloned. As a best practice, assign each interface an IP that is easy to locate in the IP plan and that increments from one node to the next. If the IP plan does not follow this pattern, adjust the IPs in the node entries accordingly.

To define the control nodes and provision them:

  1. Create a golden node entry.

  2. Define network interfaces used by each node based on the P2P connections for that NIC. Add their specific MAC addresses.

  3. Create bonds on the interfaces where needed.

  4. Assign a network to each NIC or bond.

  5. Set an IP for each NIC or bond based on the IP plan.

  6. Set the provisioning interface.

  7. Add the BMC interface and set its IP, network, and BMC MAC address (if available).

  8. Set up the login information for the BMC interface for OOB control of the node. (This applies if each control plane node has a unique BMC user/password.)

  9. Clone the golden node based on the node count.

  10. All Control Nodes are ready for provisioning.

Example: Generic node entry cloning with incrementing IPs.

foreach -o <goldennode> -n <hostname with first node number>..<hostname with last node number> --next-ip ()

Example: k8s-admin node entry cloning with incrementing IPs.

foreach -o k8s-admin-01 -n k8s-admin-02..k8s-admin-03 --next-ip ()

# If the cloned node names do not form a numeric range, list each name explicitly

foreach -o k8s-admin-01 -n <name1> <name2> --next-ip ()
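
To make the range form concrete, the following Python sketch expands a `first..last` host range and pairs each clone with the next address after the golden node. It only illustrates the naming and addressing pattern and does not call cmsh. The helper names (`expand_range`, `plan_clones`) and the sample address are hypothetical, and the sketch assumes that "incrementing" means adding one to the golden node's address per clone, which should be checked against the actual IP plan.

```python
import ipaddress
import re


def expand_range(spec: str) -> list[str]:
    """Expand 'name-02..name-03' into ['name-02', 'name-03'].

    Assumes both ends share a prefix and end in a zero-padded number.
    """
    first, last = spec.split("..")
    m_first = re.fullmatch(r"(.*?)(\d+)", first)
    m_last = re.fullmatch(r"(.*?)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"not a simple numeric range: {spec!r}")
    prefix, width = m_first.group(1), len(m_first.group(2))
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


def plan_clones(golden_ip: str, names: list[str]) -> dict[str, str]:
    """Give each clone the golden node's IP plus 1, 2, 3, ... (assumed plan)."""
    base = ipaddress.ip_address(golden_ip)
    return {name: str(base + offset) for offset, name in enumerate(names, start=1)}


# Hypothetical golden node address; substitute the value from the IP plan.
clones = expand_range("k8s-admin-02..k8s-admin-03")
for name, ip in plan_clones("192.0.2.11", clones).items():
    print(name, ip)
```

Printing the planned pairs before cloning gives a quick way to compare the intended addresses with the IP plan.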

Control Plane Host Naming Conventions#

The following naming schema applies to:

  • Head nodes

  • Slurm login (slogin)

  • Kubernetes User Space Nodes

  • Kubernetes Admin Space Nodes

Table 1 Control Plane Host Names Schema#

Term                                 Definition
-----------------------------------  ---------------------------------------------------------
<RACK>-<RU>-P[1-16]-HEAD-0[1-2]      BCM head nodes, per POD; head node count does not increase
<RACK>-<RU>-P[1-16]-SLOGIN-0[1-2]    Slurm login (slogin) nodes; count does not increase
<RACK>-<RU>-P[1-16]-K8U-0[1-3]       Kubernetes User Space Nodes
<RACK>-<RU>-P[1-16]-K8A-0[1-3]       Kubernetes Admin Space Nodes
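
The schema can be checked mechanically before node entries are created. The Python sketch below validates a name against the four patterns in the table. It assumes that `<RACK>` and `<RU>` are plain alphanumeric tokens and that names are compared case-insensitively; the function name `is_valid_hostname` and the sample names are hypothetical. It does not cover the `<arch>` field that appears in some later example hostnames.

```python
import re

# Highest node index allowed per role, taken from the schema table.
MAX_INDEX = {"HEAD": 2, "SLOGIN": 2, "K8U": 3, "K8A": 3}

# Assumption: <RACK> and <RU> are simple alphanumeric tokens.
PATTERN = re.compile(
    r"(?P<rack>[A-Za-z0-9]+)-(?P<ru>[A-Za-z0-9]+)-"
    r"P(?P<pod>[1-9]\d?)-(?P<role>HEAD|SLOGIN|K8U|K8A)-0(?P<index>\d)",
    re.IGNORECASE,
)


def is_valid_hostname(name: str) -> bool:
    """Return True if the name fits <RACK>-<RU>-P[1-16]-<ROLE>-0[n]."""
    match = PATTERN.fullmatch(name)
    if match is None:
        return False
    pod = int(match["pod"])
    index = int(match["index"])
    role = match["role"].upper()
    return 1 <= pod <= 16 and 1 <= index <= MAX_INDEX[role]


# Hypothetical sample names.
for candidate in ("A03-U20-P1-HEAD-01", "A03-U20-P17-HEAD-01", "A03-U20-P1-HEAD-03"):
    print(candidate, is_valid_hostname(candidate))
```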

Supermicro Reference Servers#

For the reference control plane, two models of Supermicro servers are used:

  • ARM/C2-based Supermicro ARS-221GL-FNB-NC24B.

  • x86-based Supermicro SYS-221GE-FNB-NC24B.

The storage configuration is the same on both models:

  • OS: 2x 960 GB M.2 in a SW RAID1 configuration.

  • 2x 7.68 TB NVMe in either a SW RAID0 or RAID1 configuration, depending on the control node.

The NIC counts are also the same:

  • 4x ConnectX-7 (200 Gb NDR, 1x 200 Gb/s, single-port OSFP).

  • 2x 1 GbE in-band ports.

  • 1x 1 GbE BMC.

Reference: Supermicro ARS-221GL-FNB-NC24B network interfaces (C2/ARM)

root@head-01:~# lshw -c network -businfo
Bus info          Device       Class    Description
=============================================================
pci@0002:01:00.0  enP2s2f0     network  Ethernet Controller X550
pci@0002:01:00.1  enP2s2f1     network  Ethernet Controller X550
pci@0004:01:00.0  enP4s4np0    network  MT2910 Family [ConnectX-7]
pci@0006:01:00.0  enP6s6np0    network  MT2910 Family [ConnectX-7]
pci@0012:01:00.0  enP18s18np0  network  MT2910 Family [ConnectX-7]
pci@0016:01:00.0  enP22s22np0  network  MT2910 Family [ConnectX-7]

Reference: Supermicro SYS-221GE-FNB-NC24B network interfaces (x86)

root@a03-p1-k8s-admin-x86-01:~# lshw -c network -businfo
Bus info          Device       Class    Description
=============================================================
pci@0000:17:00.0  ens1np0      network  MT2910 Family [ConnectX-7]
pci@0000:2a:00.0  enp42s0np0   network  MT2910 Family [ConnectX-7]
pci@0000:63:00.0  enp99s0f0    network  Ethernet Controller X550
pci@0000:63:00.1  enp99s0f1    network  Ethernet Controller X550
pci@0000:ab:00.0  enp171s0np0  network  MT2910 Family [ConnectX-7]
pci@0000:bd:00.0  enp189s0np0  network  MT2910 Family [ConnectX-7]
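
When many servers have to be inventoried, the `lshw -c network -businfo` listings above can be parsed to pick out the ConnectX-7 ports that are used for the bonds and storage links. The Python sketch below is one minimal way to do that. The function names are hypothetical, and it assumes every row has a device name (lshw can leave the Device column empty for unclaimed hardware, which this parser does not handle).

```python
# Sample text copied from the ARM/C2 reference listing.
SAMPLE = """\
Bus info          Device       Class    Description
=============================================================
pci@0002:01:00.0  enP2s2f0     network  Ethernet Controller X550
pci@0002:01:00.1  enP2s2f1     network  Ethernet Controller X550
pci@0004:01:00.0  enP4s4np0    network  MT2910 Family [ConnectX-7]
pci@0006:01:00.0  enP6s6np0    network  MT2910 Family [ConnectX-7]
pci@0012:01:00.0  enP18s18np0  network  MT2910 Family [ConnectX-7]
pci@0016:01:00.0  enP22s22np0  network  MT2910 Family [ConnectX-7]
"""


def parse_businfo(text: str) -> list[dict[str, str]]:
    """Turn 'lshw -businfo' rows into dictionaries, skipping the header."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("pci@"):
            continue  # header, rule, or blank line
        bus, device, cls, description = line.split(None, 3)
        rows.append(
            {"bus": bus, "device": device, "class": cls, "description": description}
        )
    return rows


def connectx7_devices(text: str) -> list[str]:
    """Return the interface names whose description mentions ConnectX-7."""
    return [
        row["device"]
        for row in parse_businfo(text)
        if "ConnectX-7" in row["description"]
    ]


print(connectx7_devices(SAMPLE))
```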

Head Nodes#

Only the primary head node needs to be configured. Its settings are duplicated and propagated to the secondary head node in the high availability (HA) setup process.

  1. Add BMC (ipmi0 or rf0) and its IP and network information.

    Defining the interface as ipmi0 makes BCM use ipmitool for OOB communication; defining it as rf0 makes BCM use Redfish. For DGX SuperPOD GB200 nodes, use rf0. For OEM GB200 compute trays, ipmi0 may be needed if rf0 does not work.

    cmsh -c "device use master;interfaces;add bmc rf0;set ip <rf0 ip on ipminet>;set network <ipminetx>;set mac <BMC MAC>;commit"
    

    Note

    The set mac <BMC MAC> setting is new in BCM11; it allows the MAC address of the BMC interface to be defined.

  2. Add the provisioning NICs, then create a network bond for in-band management communication with internalnet (200G). The single command is shown first; the example that follows performs the same steps interactively.

    cmsh -c "device use master;interfaces;add physical enP4s4np0;set mac <M1 MAC>;add physical enP6s6np0;set mac <M2 MAC>;add bond bond0;set ip <headnode IP>;set network internalnet;set interfaces enP4s4np0 enP6s6np0;set mode 4;set options miimon=100;commit;..;..;set provisioninginterface bond0;commit"
    

    Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 head node bond0 settings

    cmsh
    device use master
    interfaces
    
    add physical enP4s4np0   # (In the P2P this is M1)
    set mac <M1 MAC>
    
    add physical enP6s6np0   # (In the P2P this is M2)
    set mac <M2 MAC>
    
    add bond bond0
    set ip <headnode IP>
    set network internalnet
    set interfaces enP4s4np0 enP6s6np0
    set mode 4   # (enables LACP bonding)
    set options miimon=100
    
    commit
    ..
    ..
    
    set provisioninginterface bond0
    
    commit
    

    Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 head node bond0 summary

    cmsh -c "device use master;interfaces;use bond0;show"
    
    Parameter                Value
    ---------                -----
    Revision
    Type                     bond
    Network device name      bond0 [prov]
    Network                  internalnet
    IP                       7.241.16.8
    DHCP                     no
    Alternative Hostname
    Additional Hostnames
    Switch ports
    Start if                 always
    BringUpDuringInstall     no
    On network priority      70
    Bootable                 no
    MAC                      00:00:00:00:00:00
    Mode                     4 (802.3ad)
    Options
    Interfaces               enP4s4np0,enP6s6np0
    
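  3. Add the NICs/ports for a second network bond that connects to the IPMI network (ipminet0), which is used to communicate with the NVLink Switch devices (COMe1). Here <headnode IP> is the head node's address on ipminet0, not its internalnet address.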

    cmsh -c "device use master;interfaces;add physical enP18s18np0;set mac <M3 MAC>;add physical enP22s22np0;set mac <M4 MAC>;add bond bond1;set ip <headnode IP>;set network ipminet0;set interfaces enP18s18np0 enP22s22np0;set mode 4;commit"
    

    Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 head node bond1 settings

    cmsh
    device use master
    interfaces
    
    add physical enP18s18np0   # (In the P2P this is M3)
    set mac <M3 MAC>
    
    add physical enP22s22np0   # (In the P2P this is M4)
    set mac <M4 MAC>
    
    add bond bond1
    set ip <headnode IP>
    set network ipminet0
    set interfaces enP18s18np0 enP22s22np0
    set mode 4
    
    commit
    

    Reference: Head node interface configuration (before HA setup):

    [a03-p1-head-01->device[a03-p1-head-01]->interfaces]% list
    
    Type      Network device name   IP            Network      Start if
    --------  -------------------  -------------  ----------   --------
    bmc       rf0                  7.241.0.5      ipminet0     always
    bond      bond0 [prov]         7.241.16.8     internalnet  always
    bond      bond1                7.241.0.8      ipminet0     always
    physical  enP18s18np0 (bond1)  0.0.0.0        -            always
    physical  enP22s22np0 (bond1)  0.0.0.0        -            always
    physical  enP4s4np0 (bond0)    0.0.0.0        -            always
    physical  enP6s6np0 (bond0)    0.0.0.0        -            always
    
  4. Optional: If there is an extra 1G RJ45 LOM port that is identical to one available on the secondary head node, add it now so that it can later be used for a dedicated HA heartbeat cable.

  5. Set the BMC credentials for the head nodes.

    cmsh -c "device use master;bmcsettings;set userid <2 is the default for most OEMs>;set username <oem default user name>;set password <Unique password found on the asset tag>;commit"
    
    device use master
    bmcsettings
    set userid <2 is the default for most OEMS>
    set username <oem default user name>
    set password
    commit
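
The single-command and interactive forms in this section contain the same cmsh steps, joined by semicolons in the first case. As an illustration of that correspondence, the Python sketch below assembles the bond one-liner from its parts. It only builds a string and does not run cmsh; the function name `bond_one_liner` and its parameters are hypothetical, and the placeholders are kept exactly as they appear in the commands above.

```python
def bond_one_liner(device, bond, ip, network, members, options=None, provisioning=False):
    """Build the 'cmsh -c' string for a mode 4 bond.

    members is a list of (interface_name, mac) tuples.
    """
    steps = [f"device use {device}", "interfaces"]
    for name, mac in members:
        steps += [f"add physical {name}", f"set mac {mac}"]
    steps += [
        f"add bond {bond}",
        f"set ip {ip}",
        f"set network {network}",
        "set interfaces " + " ".join(name for name, _ in members),
        "set mode 4",
    ]
    if options:
        steps.append(f"set options {options}")
    steps.append("commit")
    if provisioning:
        # Leave the interface submodes, then mark the bond as the provisioning interface.
        steps += ["..", "..", f"set provisioninginterface {bond}", "commit"]
    return 'cmsh -c "' + ";".join(steps) + '"'


bond0 = bond_one_liner(
    device="master",
    bond="bond0",
    ip="<headnode IP>",
    network="internalnet",
    members=[("enP4s4np0", "<M1 MAC>"), ("enP6s6np0", "<M2 MAC>")],
    options="miimon=100",
    provisioning=True,
)
print(bond0)
```

Generating the string from a small table of interface names and MACs can reduce copy-and-paste mistakes when the same pattern is repeated for several node types.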
    

SLURM Login Nodes (slogin)#

Here are the requirements for the slogin nodes:

Node Count: 2 nodes are required.

  • Reason: Redundancy and load balancing.

CPU Architecture: The slogin space nodes are ARM based.

  • Reason: The slogin nodes are used for user access to the cluster and for running Slurm jobs. Generally this is where users will natively compile ARM code to run on the GB200 compute trays.

Network Connectivity: These nodes require one bond interface for provisioning and two ports that connect to the fast storage fabric (storagenet).

Required Interfaces: Each node must have the following network interfaces:

  • A bond for provisioning and normal N/S (host-to-host) communication.

  • Two fast storage NICs/ports on storagenet, each with a /31 IP.

  1. Create the slogin golden node entry.

    Set a MAC at the device level; this can be the MAC of either management port (M1 or M2). This assumes the MACs are known. The MAC is required for the first provisioning.

cmsh -c "device use master;device; add physicalnode <nodename>; set category slogin; set mac <M1/M2 MAC>; commit"

Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node entry

cmsh
device; add physicalnode <nodename>;
set category slogin
set mac <M1/M2 MAC>
commit

  2. Add BMC (ipmi0 or rf0).

    Defining the interface as ipmi0 makes BCM use ipmitool for OOB communication; defining it as rf0 makes BCM use Redfish. For DGX SuperPOD, use rf0.

cmsh -c "device use <slogin-node name>;interfaces;add bmc rf0;set ip <rf0 ip on ipminet>;set network <ipminetx>;set mac <BMC MAC>;commit"

Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node rf0 settings

cmsh
device use <slogin-node name>;interfaces
add bmc rf0  # in this generation, it is now set up as rf0 (redfish0) vs. ipmi0
set ip <rf0 ip on ipminet>
set network <ipminetx>
set mac <BMC MAC>  # this is new in BCM11 where the MAC for the IPMI can be defined
commit

Note

When this is committed, it will set the power control to rf0/ipmi0.

  3. Add the provisioning NICs, then create a bond for in-band management communication with internalnet (200G).

On the slogin nodes, the two-port X550 Ethernet card is unused and is not configured in BCM.

cmsh -c "device use <slogin-node name>;interfaces;add physical enP4s4np0;set mac <M1 MAC>;add physical enP6s6np0;set mac <M2 MAC>;add bond bond0;set ip <first slogin node IP>;set network internalnet;set interfaces enP4s4np0 enP6s6np0;set mode 4;set options miimon=100;commit;..;..;set provisioninginterface bond0;commit"

Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node bond0 settings

cmsh
device use <slogin-node name>;interfaces;

add physical enP4s4np0  # (In the P2P this is M1)
set mac <M1 MAC>

add physical enP6s6np0
set mac <M2 MAC>

add bond bond0
set ip <first slogin node IP>
set network internalnet
set interfaces enP4s4np0 enP6s6np0
set mode 4
set options miimon=100

commit
..
..

set provisioninginterface bond0

commit

  4. For slogin, add two ports to connect to the fast storage network (/31 IPs).

cmsh -c "device use <slogin-node name>;interfaces;add physical enP18s18np0;set mac <M3 MAC>;set ip <S1 IP>;set network storagenet;add physical enP22s22np0;set mac <M4 MAC>;set ip <S2 IP>;set network storagenet;commit"

Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node fast storage network settings

cmsh

device use <slogin-node name>;interfaces;

add physical enP18s18np0  # (In the P2P this is M3)
set mac <M3 MAC>
set ip <S1 IP>
set network storagenet

add physical enP22s22np0  # (In the P2P this is M4)
set mac <M4 MAC>
set ip <S2 IP>
set network storagenet

commit

  5. Set the slogin BMC credentials here if each node has a unique user/password. If they are the same, this should have already been set at the category level.

cmsh -c "device use <slogin-node name>;bmcsettings;set userid 2;set username ADMIN;set password <Unique password found on the asset tag>;commit"

Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node BMC settings

cmsh
device use <slogin-node name>
bmcsettings
set userid 2
set username ADMIN
set password <Unique password found on the asset tag>
commit

Reference: slogin node interface settings

Type        Network device name   IP             Network      Start if
----------- -------------------- --------------  -----------  --------
bmc         rf0                  7.241.0.23     ipminet0     always
bond        bond0 [prov]         7.241.16.23    internalnet  always
physical    enP4s4np0 (bond0)    0.0.0.0        -            always
physical    enP6s6np0 (bond0)    0.0.0.0        -            always
physical    enP18s18np0          <S1 IP>        storagenet   always
physical    enP22s22np0          <S2 IP>        storagenet   always

  6. Clone the golden slogin node to the second slogin node.

foreach -o <rack>-<ru>-p<podnumber>-slogin-01 -n <rack>-<ru>-p<podnumber>-slogin-02 --next-ip ()
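
Because each storage port uses a /31, its subnet contains exactly two addresses: the node port and one peer. The Python sketch below computes that peer for a given storage IP, which can help when cross-checking the IP plan. It assumes that the peer is the switch port at the other end of the link; the function name `p2p_peer` is hypothetical, and the sample addresses are the storagenet values shown in the k8s-user reference listing in this document.

```python
import ipaddress


def p2p_peer(address: str) -> str:
    """Return the other address in the /31 that contains 'address'."""
    iface = ipaddress.ip_interface(f"{address}/31")
    low, high = list(iface.network)  # a /31 holds exactly two addresses
    return str(high if iface.ip == low else low)


for storage_ip in ("100.127.0.1", "100.127.128.1"):
    print(storage_ip, "<->", p2p_peer(storage_ip))
```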

k8s-admin Nodes (k8a)#

Here are the requirements for the k8s-admin Kubernetes cluster:

Node Count: 3 nodes are required.

  • Reason: Kubernetes requires an odd number of control plane nodes to maintain quorum.

CPU Architecture: All nodes must be x86-based.

  • Reason: The NMX-M software for managing the NVLink fabric is only compatible with x86 at this time.

Network Connectivity: The cluster needs access to the Out of Band (OOB) network.

  • Purpose: NMX-M software must communicate with NVLink switches, which are only available on the OOB network.

Required Interfaces: Each node must have two bonded network interfaces:

  • A bond for provisioning and normal N/S (host-to-host) communication.

  • A bond for the OOB network (also called the NVLink COMe0/COMe1 network).
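
The quorum reason behind the node count can be checked with simple arithmetic. The sketch below assumes a plain majority rule, as used by consensus stores such as etcd: a cluster of n members needs floor(n/2) + 1 members available. The function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Under this rule, a fourth node raises the quorum without raising the number of failures that can be tolerated, which is why odd counts such as 3 are used.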

  1. Create the k8s-admin golden node entry.

Set a MAC at the device level; this can be the MAC of either management port (M1 or M2). This assumes the MACs are known. The MAC is required for the first provisioning.

cmsh -c "device use master;device; add physicalnode <rack>-<RU>-p<podnumber>-k8a-<arch>-01; set category k8s-admin; set mac <M1/M2 MAC>; commit"

Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node entry

cmsh
device; add physicalnode <rack>-<RU>-p<podnumber>-k8a-<arch>-01;
set category k8s-admin
set mac <M1/M2 MAC>
commit

  2. Add the BMC interface (rf0).

cmsh -c "device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces;add bmc rf0;set ip <rf0 ip on ipminet>;set network <ipminetx>;set mac <BMC MAC>;commit"

Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node rf0 settings

cmsh
device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces
add bmc rf0
set ip <rf0 ip on ipminet>
set network <ipminetx>
set mac <BMC MAC>
commit

  3. Add the physical NICs for the provisioning bond (bond0) and the OOB bond (bond1).

cmsh -c "device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces;add physical ens1np0;set mac <M1 MAC>;add physical enp42s0np0;set mac <M2 MAC>;add physical enp171s0np0;set mac <M3 MAC>;add physical enp189s0np0;set mac <M4 MAC>;commit"

Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node interfaces

cmsh
device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces;

# NICs in bond0 (management/internalnet)
add physical ens1np0  # In the P2P this is M1
set mac <M1 MAC>
add physical enp42s0np0
set mac <M2 MAC>

# NICS in bond1 (NICs to bond for NVLink COMe0/COMe1 connections)
add physical enp171s0np0  # In the P2P this is M3
set mac <M3 MAC>
add physical enp189s0np0
set mac <M4 MAC>
commit

  4. Create bond0 for the connection to internalnet and bond1 for the connection to the NVLink COMe network (ipminet).

cmsh -c "device use <k8s-admin node name>;interfaces;add bond bond0;set ip <first k8s-admin node IP>;set network internalnet;set interfaces ens1np0 enp42s0np0;set mode 4;commit;..;..;set provisioninginterface bond0;interfaces;add bond bond1;set interfaces enp171s0np0 enp189s0np0;set mode 4;commit"

Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node bond settings

cmsh
device use <k8s-admin node name>
interfaces

add bond bond0
set ip <first k8s-admin node IP>
set network internalnet
set interfaces ens1np0 enp42s0np0
set mode 4

commit
..
..

set provisioninginterface bond0

interfaces
add bond bond1
set interfaces enp171s0np0 enp189s0np0
set mode 4
commit

  5. Clone the golden k8s-admin node to the two other k8s-admin hostnames.

foreach -o <rack>-<RU>-p<podnumber>-k8a-<arch>-01 -n <rack>-<RU>-p<podnumber>-k8a-<arch>-02 <rack>-<RU>-p<podnumber>-k8a-<arch>-03 --next-ip ()

K8s-user (k8u)#

Here are the requirements for the k8s-user Kubernetes cluster:

Node Count: 3 nodes are required.

  • Reason: Kubernetes requires an odd number of control plane nodes to maintain quorum.

CPU Architecture: The k8s-user space nodes can be either ARM or x86 based.

  • Reason: The primary use of the k8s-user space is the installation and hosting of Run:ai, which supports both architectures.

Network Connectivity: These nodes require one bond interface for provisioning, and the other two ports connect to the fast storage fabric (storagenet).

Required Interfaces: Each node must have the following network interfaces:

  • A bond for provisioning and normal N/S (host-to-host) communication.

  • Two fast storage NICs/ports, each with a /31 IP, each IP on a different border TOR.

  1. Create the k8s-user golden node entry.

Set a MAC at the device level; this can be the MAC of either management port (M1 or M2). This assumes the MACs are known. The MAC is required for the first provisioning.

cmsh
device; add physicalnode <rack>-<ru>-p<podnumber>-k8u-<arch>-01;
set category k8s-user
set mac <M1/M2 MAC>
commit

  2. Add BMC (ipmi0 or rf0).

Defining the interface as ipmi0 makes BCM use ipmitool for OOB communication; defining it as rf0 makes BCM use Redfish. For DGX SuperPOD, use rf0.

cmsh
device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01;interfaces
add bmc rf0
set ip <rf0 ip on ipminet>
set network <ipminetx>
set mac <BMC MAC>
commit

  3. Add the provisioning NICs, then create a bond for in-band management communication with internalnet (200G).

cmsh
device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01;interfaces;

add physical enP4s4np0  # In the P2P this is M1
set mac <M1 MAC>

add physical enP6s6np0
set mac <M2 MAC>

add bond bond0
set ip <first k8s-user IP>
set network internalnet
set interfaces enP4s4np0 enP6s6np0
set mode 4

commit
..
..
set provisioninginterface bond0

commit

  4. Add two fast storage NICs, each with a /31 IP, each IP on a different border TOR.

cmsh
device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01;interfaces;

add physical enP18s18np0  # In the P2P this is M3
set mac <M3 MAC>
set ip <S1 IP>
set network storagenet

add physical enP22s22np0  # In the P2P this is M4
set mac <M4 MAC>
set ip <S2 IP>
set network storagenet

commit

  5. Set the k8s-user BMC credentials here if each node has a unique user/password. If they are the same, this should have already been set at the category level.

cmsh
device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01
bmcsettings
set userid 2
set username ADMIN
set password <Unique password found on the asset tag>
commit

Reference: k8s-user interface settings

Type       Network device name   IP              Network      Start if
---------- -------------------- --------------- ------------ --------
bmc        rf0                  7.241.0.11      ipminet0     always
bond       bond0 [prov]         7.241.16.11     internalnet  always
physical   enP18s18np0          100.127.0.1     storagenet   always
physical   enP22s22np0          100.127.128.1   storagenet   always
physical   enP4s4np0 (bond0)    0.0.0.0         -            always
physical   enP6s6np0 (bond0)    0.0.0.0         -            always

  6. Clone the golden k8s-user node to the two other k8s-user nodes (-02 and -03).

foreach -o <rack>-<ru>-p<podnumber>-k8u-<arch>-01 -n <rack>-<ru>-p<podnumber>-k8u-<arch>-02 <rack>-<ru>-p<podnumber>-k8u-<arch>-03 --next-ip ()
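
Before or after cloning, the interface listing for a node can be compared with the requirements stated at the top of this section. The Python sketch below parses the k8s-user reference listing shown above and runs three checks: exactly one provisioning interface, no IP on bond member NICs, and two storagenet ports. It assumes the listing columns are separated by two or more spaces; the function names and the checks themselves are illustrative and not part of BCM.

```python
import re

# Text copied from the k8s-user reference listing.
LISTING = """\
Type       Network device name   IP              Network      Start if
---------- -------------------- --------------- ------------ --------
bmc        rf0                  7.241.0.11      ipminet0     always
bond       bond0 [prov]         7.241.16.11     internalnet  always
physical   enP18s18np0          100.127.0.1     storagenet   always
physical   enP22s22np0          100.127.128.1   storagenet   always
physical   enP4s4np0 (bond0)    0.0.0.0         -            always
physical   enP6s6np0 (bond0)    0.0.0.0         -            always
"""


def parse_listing(text: str) -> list[dict[str, str]]:
    """Split each data row on runs of two or more spaces."""
    rows = []
    for line in text.splitlines()[2:]:  # skip the header and the rule
        if not line.strip():
            continue
        kind, device, ip, network, start_if = re.split(r"\s{2,}", line.strip())
        rows.append({"type": kind, "device": device, "ip": ip, "network": network})
    return rows


def check_k8s_user(rows: list[dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if sum("[prov]" in row["device"] for row in rows) != 1:
        problems.append("expected exactly one provisioning interface")
    for row in rows:
        is_bond_member = row["type"] == "physical" and "(" in row["device"]
        if is_bond_member and row["ip"] != "0.0.0.0":
            problems.append(f"bond member {row['device']} should not carry an IP")
    if sum(row["network"] == "storagenet" for row in rows) != 2:
        problems.append("expected two storagenet ports")
    return problems


print(check_k8s_user(parse_listing(LISTING)) or "all checks passed")
```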