Control Plane Configuration Verification Checklist#

Use this checklist to verify that every control plane node is configured correctly after you complete the steps in Control Plane Node Entries. Work through it systematically before proceeding to provisioning or high availability setup.

General Prerequisites#

  • [ ] Categories, software images, and networks have been defined

  • [ ] Point-to-point (P2P) connection documentation is available as the source of MAC addresses

  • [ ] IP allocation plan is documented and followed

  • [ ] BMC credentials are documented (asset tags, default usernames)
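
Because every later section copies MAC addresses from the P2P documentation, it can help to sanity-check that list first. The following is a minimal sketch, assuming the MAC addresses have been exported to a text file with one address per line; the function names `is_valid_mac` and `duplicate_macs` and the file layout are hypothetical and not part of the product.

```shell
# Hypothetical helpers; the one-MAC-per-line input format is an assumption.

# Succeeds if $1 looks like a colon-separated 48-bit MAC address.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Prints every MAC that occurs more than once in file $1 (case-insensitive).
duplicate_macs() {
    tr 'A-F' 'a-f' < "$1" | sort | uniq -d
}
```

An empty result from `duplicate_macs` suggests that no address was entered twice in the list; it does not confirm that each address is assigned to the right port.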

Head Nodes Verification#

Primary Head Node Configuration#

BMC Interface (rf0/ipmi0)

  • [ ] BMC interface added (rf0 for DGX SuperPOD, ipmi0 for OEM if rf0 fails)

  • [ ] IP address assigned on correct IPMI network

  • [ ] BMC MAC address configured

  • [ ] Power control set to rf0/ipmi0

Provisioning Bond (bond0)

  • [ ] Physical interfaces added: enP4s4np0 (M1) and enP6s6np0 (M2)

  • [ ] Correct MAC addresses assigned to each interface

  • [ ] Bond configured with mode 4 (LACP)

  • [ ] IP assigned on internalnet

  • [ ] Bond set as provisioning interface

  • [ ] miimon=100 option configured
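
On a node where the bond is already up, the mode and miimon items above can also be cross-checked against the Linux bonding driver's status file. This is a sketch only: it inspects the running kernel state, not the cmsh configuration, and the function name `check_bond` is hypothetical.

```shell
# Sketch: checks a Linux bonding status file for 802.3ad (mode 4) and miimon=100.
# Defaults to the kernel's standard location; pass another file to test offline.
check_bond() {
    status_file="${1:-/proc/net/bonding/bond0}"
    grep -q '^Bonding Mode: IEEE 802.3ad' "$status_file" \
        || { echo "mode is not 802.3ad"; return 1; }
    grep -q '^MII Polling Interval (ms): 100$' "$status_file" \
        || { echo "miimon is not 100"; return 1; }
    echo "ok"
}
```

Calling `check_bond` with no argument reads `/proc/net/bonding/bond0`; passing `/proc/net/bonding/bond1` would apply the same two checks to the second bond.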

IPMI Network Bond (bond1)

  • [ ] Physical interfaces added: enP18s18np0 (M3) and enP22s22np0 (M4)

  • [ ] Correct MAC addresses assigned

  • [ ] Bond configured with mode 4 (LACP)

  • [ ] IP assigned on ipminet0

BMC Credentials

  • [ ] User ID set (typically 2)

  • [ ] Username configured (OEM default)

  • [ ] Password set (from asset tag)

Optional Heartbeat Interface

  • [ ] Extra 1G RJ45 LOM port configured if available for HA

SLURM Login Nodes (slogin) Verification#

Golden Node Configuration#

Node Creation

  • [ ] Physical node created with category slogin

  • [ ] Device-level MAC set (M1/M2)

  • [ ] ARM/C2 architecture confirmed

BMC Configuration

  • [ ] BMC interface rf0 added

  • [ ] IP assigned on IPMI network

  • [ ] BMC MAC address configured

Provisioning Bond (bond0)

  • [ ] Physical interfaces: enP4s4np0 (M1) and enP6s6np0 (M2)

  • [ ] Bond mode 4 configured

  • [ ] IP assigned on internalnet

  • [ ] Set as provisioning interface

  • [ ] miimon=100 option configured

Fast Storage Network

  • [ ] Two physical ports configured: enP18s18np0 (M3) and enP22s22np0 (M4)

  • [ ] Each port has /31 IP on storagenet

  • [ ] Correct MAC addresses assigned

BMC Credentials (if unique per node)

  • [ ] User ID, username, and password configured

Node Cloning#

Second slogin Node

  • [ ] Cloned using foreach command with --next-ip

  • [ ] Hostname follows naming convention: <rack>-<ru>-p<podnumber>-slogin-02

  • [ ] IPs incremented correctly
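
To check the incremented addresses, it may help to compute what the clone's address should be. The sketch below assumes that a clone receives the address immediately after the golden node's on the same network; confirm that assumption against the IP allocation plan, especially for the /31 storage links. The function name `next_ip` is hypothetical, and octets are assumed to have no leading zeros.

```shell
# Prints the IPv4 address that follows $1 (hypothetical helper).
# Assumes dotted-quad input without leading zeros in any octet.
next_ip() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four positional parameters
    IFS=$old_ifs
    n=$(( ( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ) + 1 ))
    printf '%d.%d.%d.%d\n' \
        $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 )) $(( n & 255 ))
}
```

For example, `next_ip 10.1.0.11` prints `10.1.0.12`, which can then be compared with the address shown for the cloned node.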

K8s-Admin Nodes (k8a) Verification#

Golden Node Configuration#

Node Creation

  • [ ] Physical node created with category k8s-admin

  • [ ] x86 architecture confirmed (required for NMX-M software)

  • [ ] Device-level MAC set (M1/M2)

BMC Configuration

  • [ ] BMC interface rf0 added

  • [ ] IP assigned on IPMI network

  • [ ] BMC MAC address configured

Physical Interfaces

  • [ ] Four interfaces configured:

    • [ ] ens1np0 (M1) with correct MAC

    • [ ] enp42s0np0 (M2) with correct MAC

    • [ ] enp171s0np0 (M3) with correct MAC

    • [ ] enp189s0np0 (M4) with correct MAC

Bond Configuration

  • [ ] Bond0 (Management/Provisioning)

    • [ ] Interfaces: ens1np0 and enp42s0np0

    • [ ] Mode 4 (LACP)

    • [ ] IP on internalnet

    • [ ] Set as provisioning interface

  • [ ] Bond1 (NVLink COMe Network)

    • [ ] Interfaces: enp171s0np0 and enp189s0np0

    • [ ] Mode 4 (LACP)

    • [ ] Connected to OOB network for NMX-M communication

Node Cloning#

Additional k8s-admin Nodes

  • [ ] Total of 3 nodes (odd number for quorum)

  • [ ] Nodes 02 and 03 cloned with foreach command

  • [ ] Hostnames follow convention: <rack>-<ru>-p<podnumber>-k8a-<arch>-02/03

  • [ ] IPs incremented correctly

K8s-User Nodes (k8u) Verification#

Golden Node Configuration#

Node Creation

  • [ ] Physical node created with category k8s-user

  • [ ] Architecture documented (ARM or x86 supported)

  • [ ] Device-level MAC set (M1/M2)

BMC Configuration

  • [ ] BMC interface rf0 added

  • [ ] IP assigned on IPMI network

  • [ ] BMC MAC address configured

Provisioning Bond (bond0)

  • [ ] Physical interfaces: enP4s4np0 (M1) and enP6s6np0 (M2)

  • [ ] Bond mode 4 configured

  • [ ] IP assigned on internalnet

  • [ ] Set as provisioning interface

Fast Storage Network

  • [ ] Two physical ports: enP18s18np0 (M3) and enP22s22np0 (M4)

  • [ ] Each port has /31 IP on storagenet

  • [ ] Each IP on different border TOR

  • [ ] Correct MAC addresses assigned
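
A /31 contains exactly two addresses, so each storage port shares its subnet with a single peer, assumed here to be the TOR switch port. The sketch below derives that peer address and checks that the two node ports sit in different /31 subnets, which is what separate border TORs would imply. The function names and the example addresses are hypothetical.

```shell
# Prints the other address in the /31 that contains $1 (the assumed TOR-side peer).
peer_31() {
    printf '%s.%s\n' "${1%.*}" $(( ${1##*.} ^ 1 ))
}

# Succeeds if $1 and $2 fall in different /31 subnets.
different_31() {
    [ "${1%.*}.$(( ${1##*.} >> 1 ))" != "${2%.*}.$(( ${2##*.} >> 1 ))" ]
}
```

For example, `peer_31 10.2.0.4` prints `10.2.0.5`, and `different_31 10.2.0.4 10.2.0.5` fails because both addresses belong to the same point-to-point link.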

BMC Credentials (if unique per node)

  • [ ] User ID, username, and password configured

Node Cloning#

Additional k8s-user Nodes

  • [ ] Total of 3 nodes (odd number for quorum)

  • [ ] Nodes 02 and 03 cloned with foreach command

  • [ ] Hostnames follow convention: <rack>-<ru>-p<podnumber>-k8u-<arch>-02/03

  • [ ] IPs incremented correctly

Final Verification Steps#

Network Connectivity#

Interface Status Check

  • [ ] All interfaces show “always” for the Start if setting

  • [ ] Bond interfaces show correct member interfaces

  • [ ] IP addresses match planning documentation

Network Assignment Verification

  • [ ] internalnet assignments for all provisioning bonds

  • [ ] ipminet0 assignments for IPMI/OOB interfaces

  • [ ] storagenet assignments for storage interfaces

Documentation and Naming#

Hostname Conventions

  • [ ] All nodes follow <RACK>-<RU>-P[1-16]-<TYPE>-0[1-X] pattern

  • [ ] HEAD nodes: P[1-16]-HEAD-0[1-2]

  • [ ] SLOGIN nodes: P[1-16]-SLOGIN-0[1-2]

  • [ ] K8U nodes: P[1-16]-K8U-0[1-3]

  • [ ] K8A nodes: P[1-16]-K8A-0[1-3]
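
The patterns above can be turned into a quick hostname check. This is a minimal sketch under stated assumptions: rack and RU tokens are treated as alphanumeric, matching is case-insensitive, and the optional <arch> field used in the cloning sections is not modeled. The function name `valid_hostname` and the example hostnames are hypothetical.

```shell
# Succeeds if hostname $1 matches <RACK>-<RU>-P[1-16]-<TYPE>-0[1-X].
# Assumptions: alphanumeric rack/RU tokens, case-insensitive match,
# HEAD and SLOGIN limited to 01-02, K8U and K8A limited to 01-03.
valid_hostname() {
    printf '%s\n' "$1" | grep -Eiq \
        '^[a-z0-9]+-[a-z0-9]+-p([1-9]|1[0-6])-((head|slogin)-0[12]|(k8u|k8a)-0[1-3])$'
}
```

Under these assumptions, `valid_hostname A05-U20-P1-HEAD-01` succeeds, while `valid_hostname A05-U20-P17-HEAD-01` fails because the pod number is out of range.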

Ready for Next Steps#

High Availability Setup

  • [ ] Primary head node configuration complete

  • [ ] Ready for HA setup process (secondary head node will be auto-configured)

Provisioning Readiness

  • [ ] All control nodes defined and configured

  • [ ] Ready to begin provisioning process

  • [ ] Network infrastructure validated

Commands for Quick Verification#

Use these commands to quickly verify configurations:

Check Interface Configuration

cmsh -c "device use <node-name>;interfaces;list"

Verify BMC Settings

cmsh -c "device use <node-name>;bmcsettings;show"

Check Bond Configuration

cmsh -c "device use <node-name>;interfaces;use bond0;show"

List All Nodes by Category

cmsh -c "device;list -c slogin"
cmsh -c "device;list -c k8s-admin"
cmsh -c "device;list -c k8s-user"

Verify Network Assignments

cmsh -c "networks;list"
cmsh -c "device use <node-name>;interfaces;list"
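
When several nodes need the same checks, the per-node commands above can be wrapped in a small loop. The sketch below reuses only the cmsh invocations already listed in this section; the function name `verify_nodes` and the node names in the usage note are hypothetical.

```shell
# Runs the per-node quick-verification commands for each node name given.
verify_nodes() {
    for node in "$@"; do
        echo "== $node =="
        cmsh -c "device use $node;interfaces;list"
        cmsh -c "device use $node;bmcsettings;show"
        cmsh -c "device use $node;interfaces;use bond0;show"
    done
}
```

A call such as `verify_nodes <slogin-node-01> <slogin-node-02>` prints the interface list, BMC settings, and bond0 details for each node in turn, ready to be compared against the checklist.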

Next Steps#

Once all items in this checklist are verified:

  1. Proceed to Finalize Headnode Setup

  2. Configure High Availability if required

  3. Begin Control Plane Power On and Provisioning

Note

Complete this checklist before proceeding to the provisioning phase. To address any missing configuration, return to the appropriate section in Control Plane Node Entries.