Control Plane Node Entries#
Now that the categories, software images, and networks have been defined, the control nodes themselves can be defined for provisioning. For each node type, a “golden node” is created and then cloned. As a best practice, assign IPs on each interface that are incremental and logically easy to find. If the IPs are not incremental, adjust them in the node entries accordingly.
To define the control nodes and provision them:
Create a golden node entry.
Define network interfaces used by each node based on the P2P connections for that NIC. Add their specific MAC addresses.
Create bonds on the interfaces where needed.
Assign a network to that NIC.
Set an IP for the NIC/Bond based on the IP plan.
Set the provisioning interface.
Add the BMC interface and set its IP, network, and BMC MAC address (if available).
Set up the login information for the BMC interface for OOB control of the node. (This applies if each control plane node has a unique BMC user/password.)
Clone the golden node based on the node count.
All Control Nodes are ready for provisioning.
Example: Generic node entry cloning with incrementing IPs.
foreach -o <goldennode> -n <hostname with first node number>..<hostname with last node number> --next-ip ()
Example: k8s-admin node entry cloning with incrementing IPs.
foreach -o k8s-admin-01 -n k8s-admin-02..k8s-admin-03 --next-ip ()
# If each cloned node name is unique
foreach -o k8s-admin-01 -n <name1> <name2> --next-ip ()
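When the node count varies between deployments, the clone line can be generated rather than typed. The following is a minimal Python sketch, not part of BCM: `clone_command` is a hypothetical helper that only assembles the `foreach` string shown in the examples above, and it assumes that the golden hostname ends in a zero-padded index such as `-01`.

```python
import re


def clone_command(golden: str, count: int) -> str:
    """Build the foreach line that clones `golden` so that `count` nodes exist in total.

    Assumption: the golden hostname ends in a zero-padded index (for example -01).
    """
    match = re.fullmatch(r"(.*?)(\d+)", golden)
    if match is None:
        raise ValueError(f"{golden!r} does not end in a numeric index")
    if count < 2:
        raise ValueError("count must cover the golden node plus at least one clone")
    prefix, digits = match.groups()
    width = len(digits)
    first = int(digits) + 1
    last = int(digits) + count - 1
    first_name = f"{prefix}{first:0{width}d}"
    last_name = f"{prefix}{last:0{width}d}"
    # A single clone is named directly; several clones use the first..last range form.
    targets = first_name if first == last else f"{first_name}..{last_name}"
    return f"foreach -o {golden} -n {targets} --next-ip ()"


print(clone_command("k8s-admin-01", 3))
```

The generated line is meant to be reviewed and then pasted into cmsh device mode; the helper does not contact the cluster.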
Control Plane Host Naming Conventions#
The following naming schemas apply to:
Head nodes
Slurm login (slogin)
Kubernetes User Space Nodes
Kubernetes Admin Space Nodes
Term | Definition |
---|---|
<RACK>-<RU>-P[1-16]-HEAD-0[1-2] | BCM head nodes, per POD; the head node count does not increase |
<RACK>-<RU>-P[1-16]-SLOGIN-0[1-2] | Slurm login (slogin) nodes; the count does not increase |
<RACK>-<RU>-P[1-16]-K8U-0[1-3] | Kubernetes User Space Nodes |
<RACK>-<RU>-P[1-16]-K8A-0[1-3] | Kubernetes Admin Space Nodes |
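The schema in the table can be expressed as a small generator, which helps keep hostnames consistent across racks. This is an illustrative Python sketch: the `hostname` function, the `ROLE_MAX` table, and the rack and RU values in the example call are hypothetical; only the field order and the index ranges come from the table.

```python
# Highest node index allowed per role, taken from the naming table.
ROLE_MAX = {"HEAD": 2, "SLOGIN": 2, "K8U": 3, "K8A": 3}


def hostname(rack: str, ru: str, pod: int, role: str, index: int) -> str:
    """Return a name of the form <RACK>-<RU>-P<pod>-<ROLE>-0<index>."""
    role = role.upper()
    if role not in ROLE_MAX:
        raise ValueError(f"unknown role {role!r}")
    if not 1 <= pod <= 16:
        raise ValueError("pod must be between 1 and 16")
    if not 1 <= index <= ROLE_MAX[role]:
        raise ValueError(f"{role} index must be between 1 and {ROLE_MAX[role]}")
    return f"{rack}-{ru}-P{pod}-{role}-{index:02d}"


# Hypothetical rack "A03" and rack unit "U20".
print(hostname("A03", "U20", 1, "K8A", 2))
```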
Supermicro Reference Servers#
For the reference control plane, two models of Supermicro servers are used:
ARM/C2-based Supermicro ARS-221GL-FNB-NC24B.
x86-based Supermicro SYS-221GE-FNB-NC24B.
The drive configuration is the same on both models:
OS 2x 960 GB M.2 in a SW RAID1 configuration.
2x NVMe 7.68 TB in either SW RAID0 or RAID1 configuration depending on the control node.
The NIC configuration is also the same:
4x ConnectX-7 (200 Gb NDR, 1x 200 Gb/s, single-port OSFP).
2x 1 GbE in-band ports.
1x 1 GbE BMC.
Reference: Supermicro ARS-221GL-FNB-NC24B network interfaces (C2/ARM)
root@head-01:~# lshw -c network -businfo
Bus info Device Class Description
=============================================================
pci@0002:01:00.0 enP2s2f0 network Ethernet Controller X550
pci@0002:01:00.1 enP2s2f1 network Ethernet Controller X550
pci@0004:01:00.0 enP4s4np0 network MT2910 Family [ConnectX-7]
pci@0006:01:00.0 enP6s6np0 network MT2910 Family [ConnectX-7]
pci@0012:01:00.0 enP18s18np0 network MT2910 Family [ConnectX-7]
pci@0016:01:00.0 enP22s22np0 network MT2910 Family [ConnectX-7]
Reference: Supermicro SYS-221GE-FNB-NC24B network interfaces (x86)
root@a03-p1-k8s-admin-x86-01:~# lshw -c network -businfo
Bus info Device Class Description
=============================================================
pci@0000:17:00.0 ens1np0 network MT2910 Family [ConnectX-7]
pci@0000:2a:00.0 enp42s0np0 network MT2910 Family [ConnectX-7]
pci@0000:63:00.0 enp99s0f0 network Ethernet Controller X550
pci@0000:63:00.1 enp99s0f1 network Ethernet Controller X550
pci@0000:ab:00.0 enp171s0np0 network MT2910 Family [ConnectX-7]
pci@0000:bd:00.0 enp189s0np0 network MT2910 Family [ConnectX-7]
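Because the interface names differ between the ARM and x86 models, it can be convenient to pull the ConnectX-7 device names out of the `lshw` output before writing node entries. The sketch below is one way to do that in Python. The function name is hypothetical, the sample text is the ARM listing above, and the parser assumes every row carries a device name in the second column.

```python
from __future__ import annotations

ARM_LSHW = """\
Bus info          Device       Class    Description
=============================================================
pci@0002:01:00.0  enP2s2f0     network  Ethernet Controller X550
pci@0002:01:00.1  enP2s2f1     network  Ethernet Controller X550
pci@0004:01:00.0  enP4s4np0    network  MT2910 Family [ConnectX-7]
pci@0006:01:00.0  enP6s6np0    network  MT2910 Family [ConnectX-7]
pci@0012:01:00.0  enP18s18np0  network  MT2910 Family [ConnectX-7]
pci@0016:01:00.0  enP22s22np0  network  MT2910 Family [ConnectX-7]
"""


def connectx7_interfaces(lshw_output: str) -> list[str]:
    """Return the device names of all ConnectX-7 rows, in listing order."""
    names = []
    for line in lshw_output.splitlines():
        fields = line.split()
        # Data rows start with a pci@ bus address; the device name is the second column.
        if len(fields) >= 3 and fields[0].startswith("pci@") and "ConnectX-7" in line:
            names.append(fields[1])
    return names


print(connectx7_interfaces(ARM_LSHW))
```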
Head Nodes#
Only the primary head node needs to be configured. Its settings are duplicated and propagated to the secondary head node in the high availability (HA) setup process.
Add BMC (ipmi0 or rf0) and its IP and network information.
Defining this as ipmi0 makes BCM use ipmitool for OOB communication; defining it as rf0 makes BCM use Redfish. For DGX SuperPOD GB200 nodes, use rf0. For OEM GB200 compute trays, ipmi0 may be needed if rf0 does not work.
cmsh -c "device use master;interfaces;add bmc rf0;set ip <rf0 ip on ipminet>;set network <ipminetx>;set mac <BMC MAC>;commit"
Note
The set mac <BMC MAC> setting is new in BCM11, which allows the MAC address of the BMC interface to be defined.
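The BMC one-liner has the same shape for every node, so it can be generated from the IP plan. The following Python sketch assembles that string and rejects obviously malformed input first. `bmc_interface_cmd` is a hypothetical helper, the MAC address in the example is a placeholder, and the command text itself is copied from the one-liner above.

```python
import ipaddress
import re

MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def bmc_interface_cmd(device: str, bmc_name: str, ip: str, network: str, mac: str) -> str:
    """Assemble the cmsh one-liner that adds a BMC interface to `device`."""
    if bmc_name not in ("rf0", "ipmi0"):
        raise ValueError("BMC interface must be rf0 (Redfish) or ipmi0 (ipmitool)")
    ipaddress.ip_address(ip)  # raises ValueError if the IP is malformed
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"malformed MAC address: {mac!r}")
    steps = [
        f"device use {device}",
        "interfaces",
        f"add bmc {bmc_name}",
        f"set ip {ip}",
        f"set network {network}",
        f"set mac {mac}",
        "commit",
    ]
    return 'cmsh -c "' + ";".join(steps) + '"'


# Placeholder MAC; substitute the value from the asset records.
print(bmc_interface_cmd("master", "rf0", "7.241.0.5", "ipminet0", "AA:BB:CC:DD:EE:FF"))
```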
Add the provisioning NICs, then create a network bond for communication with internalnet (200G), the in-band management network. The single command is shown first; the examples that follow show the same steps individually and the resulting bond0 summary.
cmsh -c "device use master;interfaces;add physical enP4s4np0;set mac <M1 MAC>;add physical enP6s6np0;set mac <M2 MAC>;add bond bond0;set ip <headnode IP>;set network internalnet;set interfaces enP4s4np0 enP6s6np0;set mode 4;set options miimon=100;commit;..;..;set provisioninginterface bond0;commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 head node bond0 settings
cmsh
device use master
interfaces
add physical enP4s4np0 # (In the P2P this is M1)
set mac <M1 MAC>
add physical enP6s6np0 # (In the P2P this is M2)
set mac <M2 MAC>
add bond bond0
set ip <headnode IP>
set network internalnet
set interfaces enP4s4np0 enP6s6np0
set mode 4 # (enables LACP bonding)
set options miimon=100
commit
..
..
set provisioninginterface bond0
commit
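The single command and the step-by-step session contain the same cmsh statements; the one-liner simply joins them with semicolons. The Python sketch below makes that mapping explicit by splitting a one-liner back into its steps. `cmsh_steps` is a hypothetical helper, and the naive split assumes that no value contains a semicolon.

```python
from __future__ import annotations

HEAD_BOND0 = (
    'cmsh -c "device use master;interfaces;'
    "add physical enP4s4np0;set mac <M1 MAC>;"
    "add physical enP6s6np0;set mac <M2 MAC>;"
    "add bond bond0;set ip <headnode IP>;set network internalnet;"
    "set interfaces enP4s4np0 enP6s6np0;set mode 4;set options miimon=100;"
    'commit;..;..;set provisioninginterface bond0;commit"'
)


def cmsh_steps(one_liner: str) -> list[str]:
    """Split a cmsh -c "..." one-liner into the individual statements."""
    prefix = 'cmsh -c "'
    if not (one_liner.startswith(prefix) and one_liner.endswith('"')):
        raise ValueError('expected a command of the form cmsh -c "..."')
    body = one_liner[len(prefix):-1]
    # Assumption: no value contains ';', so a plain split is enough.
    return [step.strip() for step in body.split(";") if step.strip()]


for step in cmsh_steps(HEAD_BOND0):
    print(step)
```

The two `..` entries in the output are the statements that leave the bond and the interfaces submode, which is why `set provisioninginterface` is applied at the device level.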
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 head node bond0 summary
cmsh -c "device use master;interfaces;use bond0;show"
Parameter              Value
---------------------- -------------------
Revision
Type                   bond
Network device name    bond0 [prov]
Network                internalnet
IP                     7.241.16.8
DHCP                   no
Alternative Hostname
Additional Hostnames
Switch ports
Start if               always
BringUpDuringInstall   no
On network priority    70
Bootable               no
MAC                    00:00:00:00:00:00
Mode                   4 (802.3ad)
Options
Interfaces             enP4s4np0,enP6s6np0
Add the NICs/ports for a second network bond that connects to the IPMI network, which is used to communicate with the NVLink Switch devices (COMe1).
cmsh -c "device use master;interfaces;add physical enP18s18np0;set mac <M3 MAC>;add physical enP22s22np0;set mac <M4 MAC>;add bond bond1;set ip <headnode IP>;set network ipminet0;set interfaces enP18s18np0 enP22s22np0;set mode 4;commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 head node bond1 settings
cmsh
device use master
interfaces
add physical enP18s18np0 # (In the P2P this is M3)
set mac <M3 MAC>
add physical enP22s22np0 # (In the P2P this is M4)
set mac <M4 MAC>
add bond bond1
set ip <headnode IP>
set network ipminet0
set interfaces enP18s18np0 enP22s22np0
set mode 4
commit
Reference: Head node interface configuration (before HA setup):
[a03-p1-head-01->device[a03-p1-head-01]->interfaces]% list
Type      Network device name   IP           Network      Start if
--------  --------------------  -----------  -----------  --------
bmc       rf0                   7.241.0.5    ipminet0     always
bond      bond0 [prov]          7.241.16.8   internalnet  always
bond      bond1                 7.241.0.8    ipminet0     always
physical  enP18s18np0 (bond1)   0.0.0.0      -            always
physical  enP22s22np0 (bond1)   0.0.0.0      -            always
physical  enP4s4np0 (bond0)     0.0.0.0      -            always
physical  enP6s6np0 (bond0)     0.0.0.0      -            always
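Before committing a node entry, it can help to sanity-check the planned interface layout against the rules implied by the steps above: every bond member must exist as a physical interface, no NIC may belong to two bonds, and exactly one interface is the provisioning interface. The sketch below encodes the head node layout from the reference listing in Python. The `Interface` class and `check_plan` function are hypothetical and do not query BCM.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interface:
    kind: str                      # "bmc", "bond", or "physical"
    name: str
    members: Tuple[str, ...] = ()  # bonded NIC names; empty unless kind == "bond"
    provisioning: bool = False


def check_plan(interfaces: List[Interface]) -> List[str]:
    """Return a list of problems found in the plan; an empty list means none."""
    problems = []
    physical = {i.name for i in interfaces if i.kind == "physical"}
    owner = {}
    for bond in (i for i in interfaces if i.kind == "bond"):
        for member in bond.members:
            if member not in physical:
                problems.append(f"{bond.name}: member {member} is not defined")
            if member in owner:
                problems.append(f"{member} is in both {owner[member]} and {bond.name}")
            owner[member] = bond.name
    provisioning = [i.name for i in interfaces if i.provisioning]
    if len(provisioning) != 1:
        problems.append(f"expected exactly one provisioning interface, found {len(provisioning)}")
    return problems


HEAD_NODE = [
    Interface("bmc", "rf0"),
    Interface("bond", "bond0", ("enP4s4np0", "enP6s6np0"), provisioning=True),
    Interface("bond", "bond1", ("enP18s18np0", "enP22s22np0")),
    Interface("physical", "enP4s4np0"),
    Interface("physical", "enP6s6np0"),
    Interface("physical", "enP18s18np0"),
    Interface("physical", "enP22s22np0"),
]

print(check_plan(HEAD_NODE))
```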
Optional: If there is an extra 1G RJ45 LOM port and an identical port is available on the secondary head node, add this device so that it can later be used with a dedicated heartbeat cable for HA.
Set the BMC credentials for the head nodes.
cmsh -c "device use master;bmcsettings;set userid <2 is the default for most OEMs>;set username <oem default user name>;set password <Unique password found on the asset tag>;commit"
device use master
bmcsettings
set userid <2 is the default for most OEMs>
set username <oem default user name>
set password <Unique password found on the asset tag>
commit
SLURM Login Nodes (slogin)#
Here are the requirements for the slogin nodes:
Node Count: 2 nodes are required.
Reason: Redundancy and load balancing.
CPU Architecture: The slogin space nodes are ARM based.
Reason: The slogin nodes are used for user access to the cluster and for running Slurm jobs. Generally this is where users will natively compile ARM code to run on the GB200 compute trays.
Network Connectivity: These nodes require one bond interface for provisioning, and two ports connect to the fast storage fabric (storagenet).
Required Interfaces: Each node must have the following network interfaces:
A bond for provisioning and normal N/S (host-to-host) communication.
Two fast storage NICs/ports, each with a /31 IP.
Create the slogin golden node entry.
Set a MAC at the device level; this can be the MAC of either management port (M1/M2). This assumes the MACs are known. The MAC is required for the first provisioning.
cmsh -c "device use master;device; add physicalnode <nodename>; set category slogin; set mac <M1/M2 MAC>; commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node entry
cmsh
device; add physicalnode <nodename>;
set category slogin
set mac <M1/M2 MAC>
commit
Add BMC (ipmi0 or rf0).
Defining this as ipmi0 makes BCM use ipmitool for OOB communication; defining it as rf0 makes BCM use Redfish. For DGX SuperPOD, use rf0.
cmsh -c "device use <slogin-node name>;interfaces;add bmc rf0;set ip <rf0 ip on ipminet>;set network <ipminetx>;set mac <BMC MAC>;commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node rf0 settings
cmsh
device use <slogin-node name>;interfaces
add bmc rf0 # in this generation, it is now set up as rf0 (redfish0) vs. ipmi0
set ip <rf0 ip on ipminet>
set network <ipminetx>
set mac <BMC MAC> # this is new in BCM11 where the MAC for the IPMI can be defined
commit
Note
When this is committed, it will set the power control to rf0/ipmi0.
Add the provisioning NICs, then create a bond for communication with internalnet (200G), the in-band management network.
On the slogin nodes, the two-port X550 Ethernet card is unused and is not configured in BCM.
cmsh -c "device use <slogin-node name>;interfaces;add physical enP4s4np0;set mac <M1 MAC>;add physical enP6s6np0;set mac <M2 MAC>;add bond bond0;set ip <first slogin node IP>;set network internalnet;set interfaces enP4s4np0 enP6s6np0;set mode 4;set options miimon=100;commit;..;..;set provisioninginterface bond0;commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node bond0 settings
cmsh
device use <slogin-node name>;interfaces;
add physical enP4s4np0 # (In the P2P this is M1)
set mac <M1 MAC>
add physical enP6s6np0
set mac <M2 MAC>
add bond bond0
set ip <first slogin node IP>
set network internalnet
set interfaces enP4s4np0 enP6s6np0
set mode 4
set options miimon=100
commit
..
..
set provisioninginterface bond0
commit
For slogin, add two ports to connect to the fast storage network (/31 IPs).
cmsh -c "device use <slogin-node name>;interfaces;add physical enP18s18np0;set mac <M3 MAC>;set ip <S1 IP>;set network storagenet;add physical enP22s22np0;set mac <M4 MAC>;set ip <S2 IP>;set network storagenet;commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node fast storage network settings
cmsh
device use <slogin-node name>;interfaces;
add physical enP18s18np0 # (In the P2P this is M3)
set mac <M3 MAC>
set ip <S1 IP>
set network storagenet
add physical enP22s22np0 # (In the P2P this is M4)
set mac <M4 MAC>
set ip <S2 IP>
set network storagenet
commit
Set the slogin BMC credentials here if each node has a unique user/password. If they are the same, this should have already been set at the category level.
cmsh -c "device use <slogin-node name>;bmcsettings;set userid 2;set username ADMIN;set password <Unique password found on the asset tag>;commit"
Example: Supermicro ARS-221GL-FNB-NC24B ARM/C2 slogin node BMC settings
cmsh
device use <slogin-node name>
bmcsettings
set userid 2
set username ADMIN
set password <Unique password found on the asset tag>
commit
Reference: slogin node interface settings
Type Network device name IP Network Start if
----------- -------------------- -------------- ----------- --------
bmc rf0 7.241.0.23 ipminet0 always
bond bond0 [prov] 7.241.16.23 internalnet always
physical enP4s4np0 (bond0) 0.0.0.0 - always
physical enP6s6np0 (bond0) 0.0.0.0 - always
physical enP18s18np0 <S1 IP> storagenet always
physical enP22s22np0 <S2 IP> storagenet always
Clone the golden slogin node to the second slogin node.
foreach -o <rack>-<ru>-p<podnumber>-slogin-01 -n <rack>-<ru>-p<podnumber>-slogin-02 --next-ip ()
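Before cloning, it can be useful to preview which addresses the clones are expected to receive so that they can be compared with the IP plan. The Python sketch below assumes that `--next-ip` hands each clone the next consecutive address after the golden node; that behavior is an assumption here, and the helper `preview_clone_ips` is hypothetical. The golden IP is the bond0 address from the slogin reference listing.

```python
from __future__ import annotations

import ipaddress


def preview_clone_ips(golden_ip: str, clones: int) -> list[str]:
    """List the addresses that follow `golden_ip`, one per clone.

    Assumption: each clone receives the next consecutive address.
    """
    start = ipaddress.ip_address(golden_ip)
    return [str(start + offset) for offset in range(1, clones + 1)]


print(preview_clone_ips("7.241.16.23", 1))
```

After cloning, the actual values should still be verified in cmsh, since the node entries are the source of truth.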
k8s-admin Nodes (k8a)#
Here are the requirements for the k8s-admin Kubernetes cluster:
Node Count: 3 nodes are required.
Reason: Kubernetes requires an odd number of control plane nodes to maintain quorum.
CPU Architecture: All nodes must be x86-based.
Reason: The NMX-M software for managing the NVLink fabric is only compatible with x86 at this time.
Network Connectivity: The cluster needs access to the Out of Band (OOB) network.
Purpose: NMX-M software must communicate with NVLink switches, which are only available on the OOB network.
Required Interfaces: Each node must have two bonded network interfaces:
A bond for provisioning and normal N/S (host-to-host) communication.
A bond for the OOB network (also called the NVLink COMe0/COMe1 network).
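The odd node count follows from majority-based quorum, as used by etcd, the Kubernetes control plane datastore: a cluster of n members needs floor(n/2) + 1 of them available. The short Python sketch below works through the arithmetic; the function names are illustrative.

```python
def quorum(nodes: int) -> int:
    """Smallest number of members that forms a majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many members can fail while a majority remains."""
    return nodes - quorum(nodes)


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Three nodes tolerate one failure, and a fourth node does not raise that number, which is why the node count is odd.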
Create the k8s-admin golden node entry.
Set a MAC at the device level; this can be the MAC of either management port (M1/M2). This assumes the MACs are known. The MAC is required for the first provisioning.
cmsh -c "device use master;device; add physicalnode <rack>-<RU>-p<podnumber>-k8a-<arch>-01; set category k8s-admin; set mac <M1/M2 MAC>; commit"
Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node entry
cmsh;device; add physicalnode <rack>-<RU>-p<podnumber>-k8a-<arch>-01;
set category k8s-admin
set mac <M1/M2 MAC>
commit
Add bmc rf0.
cmsh -c "device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces;add bmc rf0;set ip <rf0 ip on ipminet>;set network <ipminetx>;set mac <BMC MAC>;commit"
Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node rf0 settings
cmsh;device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces
add bmc rf0
set ip <rf0 ip on ipminet>
set network <ipminetx>
set mac <BMC MAC>
commit
Add provisioning NICs.
cmsh -c "device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces;add physical ens1np0;set mac <M1 MAC>;add physical enp42s0np0;set mac <M2 MAC>;add physical enp171s0np0;set mac <M3 MAC>;add physical enp189s0np0;set mac <M4 MAC>;commit"
Example: Supermicro SYS-221GE-FNB-NC24B x86 k8s-admin node interfaces
# NICs in bond0 (management/internalnet)
cmsh;device use <rack>-<RU>-p<podnumber>-k8a-<arch>-01;interfaces;
add physical ens1np0 # In the P2P this is M1
set mac <M1 MAC>
add physical enp42s0np0 # In the P2P this is M2
set mac <M2 MAC>
# NICs in bond1 (NICs to bond for NVLink COMe0/COMe1 connections)
add physical enp171s0np0 # In the P2P this is M3
set mac <M3 MAC>
add physical enp189s0np0 # In the P2P this is M4
set mac <M4 MAC>
commit
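The P2P labels M1 to M4 map to different interface names on the two server models, which is easy to get wrong when switching between ARM and x86 node entries. The lookup below collects the names used in this section into one Python table. The `arm` and `x86` keys and the `bond_members` helper are hypothetical; the interface names and the bond assignments (M1/M2 in bond0, M3/M4 in bond1) are the ones used in the commands in this section.

```python
from __future__ import annotations

# Interface name per P2P label, per server model.
P2P_NICS = {
    "arm": {"M1": "enP4s4np0", "M2": "enP6s6np0", "M3": "enP18s18np0", "M4": "enP22s22np0"},
    "x86": {"M1": "ens1np0", "M2": "enp42s0np0", "M3": "enp171s0np0", "M4": "enp189s0np0"},
}

# P2P labels that make up each bond.
BOND_LABELS = {"bond0": ("M1", "M2"), "bond1": ("M3", "M4")}


def bond_members(arch: str, bond: str) -> list[str]:
    """Return the interface names that belong to `bond` on the given model."""
    return [P2P_NICS[arch][label] for label in BOND_LABELS[bond]]


print("set interfaces " + " ".join(bond_members("x86", "bond1")))
```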
Create bond0 for the connection to internalnet and bond1 for the connection to the NVLink COMe network (ipminet).
cmsh -c "device use <k8s-admin node name>;interfaces;add bond bond0;set ip <first k8s-admin node IP>;set network internalnet;set interfaces ens1np0 enp42s0np0;set mode 4;commit;..;..;set provisioninginterface bond0;interfaces;add bond bond1;set interfaces enp171s0np0 enp189s0np0;set mode 4;commit"
cmsh
device use <k8s-admin node name>
interfaces
add bond bond0
set ip <first k8s-admin node IP>
set network internalnet
set interfaces ens1np0 enp42s0np0
set mode 4
commit
..
..
set provisioninginterface bond0
interfaces
add bond bond1
set interfaces enp171s0np0 enp189s0np0
set mode 4
commit
Clone the golden k8s-admin node to the two other k8s-admin hostnames.
foreach -o <rack>-<RU>-p<podnumber>-k8a-<arch>-01 -n <rack>-<RU>-p<podnumber>-k8a-<arch>-02 <rack>-<RU>-p<podnumber>-k8a-<arch>-03 --next-ip ()
K8s-user (k8u)#
Here are the requirements for the k8s-user Kubernetes cluster:
Node Count: 3 nodes are required.
Reason: Kubernetes requires an odd number of control plane nodes to maintain quorum.
CPU Architecture: The k8s-user space nodes can either be ARM or x86 based.
Reason: The primary use of the k8s-user space is for the installation and hosting of Run:ai, which supports both architectures.
Network Connectivity: These nodes require one bond interface for provisioning, and the other two ports connect to the fast storage fabric (storagenet).
Required Interfaces: Each node must have the following network interfaces:
A bond for provisioning and normal N/S (host-to-host) communication.
Two fast storage NICs/ports, each with a /31 IP, each IP on a different border TOR.
Create the k8s-user golden node entry.
Set a MAC at the device level; this can be the MAC of either management port (M1/M2). This assumes the MACs are known. The MAC is required for the first provisioning.
cmsh;device; add physicalnode <rack>-<ru>-p<podnumber>-k8u-<arch>-01;
set category k8s-user
set mac <M1/M2 MAC>
commit
Add BMC (ipmi0 or rf0).
Defining this as ipmi0 makes BCM use ipmitool for OOB communication; defining it as rf0 makes BCM use Redfish. For DGX SuperPOD, use rf0.
cmsh;device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01;interfaces;
add bmc rf0
set ip <rf0 ip on ipminet>
set network <ipminetx>
set mac <BMC MAC>
commit
Add provisioning NICs then create a bond for communication with internalnet (200G) in-band management.
cmsh;device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01;interfaces;
add physical enP4s4np0 # In the P2P this is M1
set mac <M1 MAC>
add physical enP6s6np0
set mac <M2 MAC>
add bond bond0
set ip <first k8s-user IP>
set network internalnet
set interfaces enP4s4np0 enP6s6np0
set mode 4
commit
..
..
set provisioninginterface bond0
commit
Add two fast storage NICs, each with a /31 IP, each IP on a different border TOR.
cmsh;device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01;interfaces;
add physical enP18s18np0 # In the P2P this is M3
set mac <M3 MAC>
set ip <S1 IP>
set network storagenet
add physical enP22s22np0 # In the P2P this is M4
set mac <M4 MAC>
set ip <S2 IP>
set network storagenet
commit
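A /31 network contains exactly two addresses, so each storage port forms a point-to-point link with one peer. The Python sketch below uses the standard `ipaddress` module to find the other address in the /31 for the two storagenet IPs shown in the k8s-user reference listing. Treating that peer as the border TOR side of the link is an assumption, and `p2p_peer` is a hypothetical helper.

```python
import ipaddress


def p2p_peer(address: str) -> str:
    """Return the other address in the /31 that contains `address`."""
    iface = ipaddress.ip_interface(f"{address}/31")
    first, second = list(iface.network)  # a /31 holds exactly two addresses
    return str(second if iface.ip == first else first)


# Storagenet addresses from the k8s-user reference listing.
for nic_ip in ("100.127.0.1", "100.127.128.1"):
    print(nic_ip, "peers with", p2p_peer(nic_ip))
```

The two node addresses fall in different /31 networks, which matches the requirement that each storage port lands on a different border TOR.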
Set the k8s-user BMC credentials here if each node has a unique user/password. If they are the same, this should have already been set at the category level.
device use <rack>-<ru>-p<podnumber>-k8u-<arch>-01
bmcsettings
set userid 2
set username ADMIN
set password <Unique password found on the asset tag>
commit
Reference: k8s-user interface settings
Type Network device name IP Network Start if
---------- -------------------- --------------- ------------ --------
bmc rf0 7.241.0.11 ipminet0 always
bond bond0 [prov] 7.241.16.11 internalnet always
physical enP18s18np0 100.127.0.1 storagenet always
physical enP22s22np0 100.127.128.1 storagenet always
physical enP4s4np0 (bond0) 0.0.0.0 - always
physical enP6s6np0 (bond0) 0.0.0.0 - always
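Interface listings like the one above can be turned into records for automated checks, for example to confirm that both storage ports are on storagenet before cloning. The parser below is a Python sketch that assumes the whitespace-separated column layout shown in this reference listing. `parse_interface_rows` is a hypothetical helper and not a BCM API.

```python
LISTING = """\
Type       Network device name  IP             Network      Start if
---------- -------------------- -------------- ------------ --------
bmc        rf0                  7.241.0.11     ipminet0     always
bond       bond0 [prov]         7.241.16.11    internalnet  always
physical   enP18s18np0          100.127.0.1    storagenet   always
physical   enP22s22np0          100.127.128.1  storagenet   always
physical   enP4s4np0 (bond0)    0.0.0.0        -            always
physical   enP6s6np0 (bond0)    0.0.0.0        -            always
"""


def parse_interface_rows(listing: str) -> list:
    """Parse each data row into a dict; header and rule lines are skipped."""
    rows = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] not in ("bmc", "bond", "physical"):
            continue
        rows.append({
            "type": fields[0],
            "name": fields[1],
            "note": " ".join(fields[2:-3]),  # "[prov]" or "(bond0)" when present
            "ip": fields[-3],
            "network": fields[-2],
            "start_if": fields[-1],
        })
    return rows


storage = [row["name"] for row in parse_interface_rows(LISTING) if row["network"] == "storagenet"]
print(storage)
```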
Clone the golden k8s-user node to k8s-user-02 and k8s-user-03.
foreach -o <rack>-<ru>-p<podnumber>-k8u-<arch>-01 -n <rack>-<ru>-p<podnumber>-k8u-<arch>-02 <rack>-<ru>-p<podnumber>-k8u-<arch>-03 --next-ip ()