Head Node Configuration
This section addresses configuration steps to be performed on BCM head nodes.
Use the root (not cmsh) shell.
In /cm/local/apps/cmd/etc/cmd.conf, uncomment the AdvancedConfig parameter.
AdvancedConfig = { "DeviceResolveAnyMAC=1" } # modified value
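Before restarting CMDaemon, it can help to confirm that the line is really active rather than still commented out. The following is a minimal sketch, not a BCM tool: the function name advancedconfig_enabled is hypothetical, and it assumes only that commented lines in cmd.conf begin with #.

```shell
# Hypothetical helper (not part of BCM): succeeds if the given cmd.conf-style
# file contains an uncommented AdvancedConfig line with DeviceResolveAnyMAC=1.
advancedconfig_enabled() {
    conf_file="$1"
    # The pattern anchors AdvancedConfig at the start of the line (after
    # optional whitespace), so a line that begins with '#' does not match.
    grep -Eq '^[[:space:]]*AdvancedConfig[[:space:]]*=.*"DeviceResolveAnyMAC=1"' "$conf_file"
}

# Example use on the head node (path taken from the step above):
# advancedconfig_enabled /cm/local/apps/cmd/etc/cmd.conf && echo "AdvancedConfig is active"
```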
Restart the CMDaemon to enable reliable PXE booting from bonded interfaces.
systemctl restart cmd
Restarting the CMDaemon disconnects any open cmsh session. After the CMDaemon has restarted, type connect to reconnect, or enter exit and then start cmsh again.
The steps that follow are performed on the head node and should be run for all DGX systems.
On the head node, set the MAC addresses on the physical interfaces.
Note
Double-check the MAC address for each interface and the IP address for the bond0 interface. Mistakes here are difficult to diagnose.
For each DGX H100 system, set the MAC and IP addresses as in this code block. Ensure that the addresses match the site survey.
cmsh
[bcm10-headnode->device]% use dgx-01
[bcm10-headnode->device[dgx-01]]% set mac 94:6D:AE:53:91:FB
[bcm10-headnode->device*[dgx-01*]]% interfaces
[bcm10-headnode->device*[dgx-01*]->interfaces]% set enp170s0f1np1 mac 94:6D:AE:53:91:FB
[bcm10-headnode->device*[dgx-01*]->interfaces*]% set enp41s0f1np1 mac 94:6D:AE:53:74:0B
[bcm10-headnode->device*[dgx-01*]->interfaces*]% set ipmi0 ip 10.133.3.39
[bcm10-headnode->device*[dgx-01*]->interfaces*]% set bond0 ip 10.133.5.31
[bcm10-headnode->device*[dgx-01*]->interfaces*]% exit
[bcm10-headnode->device*[dgx-01*]]% commit
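Because the same commands are repeated for every DGX system, it may be convenient to generate them from the site survey values and reject malformed MAC addresses before anything is typed into cmsh. The following is a sketch under stated assumptions: both function names are hypothetical, the interface names are the ones from the example above and may differ on other systems, and the output is meant to be reviewed and entered in cmsh device mode by hand.

```shell
# Hypothetical helpers, not part of BCM.

# Succeeds if $1 is written as six colon-separated hexadecimal byte pairs,
# for example 94:6D:AE:53:91:FB. This checks the format only, not whether
# the address matches the site survey.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Prints the cmsh commands from the example above for one node.
# Arguments: node name, MAC of enp170s0f1np1, MAC of enp41s0f1np1,
# ipmi0 IP address, bond0 IP address.
print_dgx_commands() {
    node="$1"; mac_a="$2"; mac_b="$3"; ipmi_ip="$4"; bond_ip="$5"
    for mac in "$mac_a" "$mac_b"; do
        if ! is_valid_mac "$mac"; then
            echo "invalid MAC address for $node: $mac" >&2
            return 1
        fi
    done
    printf '%s\n' \
        "use $node" \
        "set mac $mac_a" \
        "interfaces" \
        "set enp170s0f1np1 mac $mac_a" \
        "set enp41s0f1np1 mac $mac_b" \
        "set ipmi0 ip $ipmi_ip" \
        "set bond0 ip $bond_ip" \
        "exit" \
        "commit"
}

# Example:
# print_dgx_commands dgx-01 94:6D:AE:53:91:FB 94:6D:AE:53:74:0B 10.133.3.39 10.133.5.31
```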
Verify the configuration.
[bcm10-headnode->device]% use dgx-01
[bcm10-headnode->device*[dgx-01]]% interfaces
[bcm10-headnode->device[dgx-01]->interfaces]% ls
Type         Network device name    IP               Network          Start if
------------ ---------------------- ---------------- ---------------- --------
bmc          ipmi0                  10.133.3.39      ipminet          always
bond         bond0 [prov]           10.133.5.31      dgxnet1          always
physical     enp170s0f1np1 (bond0)  0.0.0.0                           always
physical     enp41s0f1np1 (bond0)   0.0.0.0                           always
physical     ibp154s0               100.126.0.17     computenet       always
physical     ibp170s0f0             100.127.0.14     storagenet       always
physical     ibp192s0               100.126.0.18     computenet       always
physical     ibp206s0               100.126.0.19     computenet       always
physical     ibp220s0               100.126.0.20     computenet       always
physical     ibp24s0                100.126.0.13     computenet       always
physical     ibp41s0f0              100.127.0.13     storagenet       always
physical     ibp64s0                100.126.0.14     computenet       always
physical     ibp79s0                100.126.0.15     computenet       always
physical     ibp94s0                100.126.0.16     computenet       always
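When many nodes have to be checked, comparing the listing against the site survey by eye is error-prone. As a rough sketch, assuming the interface listing above has been saved to a text file, a small helper (the name iface_ip is hypothetical) can pull out the IP address of one device for comparison.

```shell
# Hypothetical helper, not part of BCM: reads an interface listing (as printed
# by "ls" in interfaces mode) on standard input and prints the IP address
# shown for the device named in $1.
iface_ip() {
    awk -v name="$1" '
        $2 == name {
            # The IP is the first field after the device name that looks like
            # a dotted quad; this skips markers such as "[prov]" or "(bond0)".
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print $i; exit }
            }
        }'
}

# Example, with the listing saved in a file called dgx-01-interfaces.txt
# (a hypothetical file name):
# iface_ip bond0 < dgx-01-interfaces.txt
```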
(Optional) Delete any extra DGX nodes that will not be provisioned. The list of nodes can be comma-separated or specified as a range, as in the example below.
[bcm10-headnode]% device
[bcm10-headnode->device]% remove -n dgx-21..dgx-31
[bcm10-headnode->device*]% commit
Successfully removed 11 Devices
Successfully committed 0 Devices
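The range in the example is inclusive at both ends, which is why dgx-21..dgx-31 removes 11 devices. A short sketch (the function name expand_node_range is hypothetical) lists the node names such a range covers, so the count can be checked before committing. It assumes two-digit node numbering as in the example.

```shell
# Hypothetical helper, not part of BCM: prints the node names covered by a
# range such as dgx-21..dgx-31.
# Arguments: name prefix, first number, last number. Pass the numbers without
# leading zeros (8, not 08) so the shell does not read them as octal.
expand_node_range() {
    prefix="$1"; first="$2"; last="$3"
    i="$first"
    while [ "$i" -le "$last" ]; do
        # Two-digit zero padding matches names like dgx-01.
        printf '%s%02d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Example: count the nodes in dgx-21..dgx-31
# expand_node_range dgx- 21 31 | grep -c ''
```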
Delete the slogin nodes and create the first Kubernetes (k8s) master node by renaming cpu-01 to knode-01. The knodes will be configured during the Kubernetes setup.
[bcm10-headnode]% device
[bcm10-headnode->device]% remove -n slogin-01,slogin-02
[bcm10-headnode->device*]% set cpu-01 hostname knode-01
[bcm10-headnode->device*]% commit
Successfully removed 2 Devices
Successfully committed 1 Devices
(Optional) If the head node will use a bonded interface, use the following commands. Afterward, you may need to reboot the head node and repeat the request-license steps.
[bcm10-headnode]% device
[bcm10-headnode->device]% use bcm10-headnode
[bcm10-headnode->device[bcm10-headnode]]% interfaces
[bcm10-headnode->device[bcm10-headnode]->interfaces]% clear ens3f1np1 ip
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*]% clear ens3f1np1 network
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*]% add physical ens2np0
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[ens2np0*]]% set mac 88:e9:a4:20:18:d8
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[ens2np0*]]% add bond bond0
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[bond0*]]% append interfaces ens3f1np1 ens2np0
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[bond0*]]% set mode 1
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[bond0*]]% set network internalnet
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[bond0*]]% set ip 10.133.4.24
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*[bond0*]]% ..
[bcm10-headnode->device*[bcm10-headnode*]->interfaces*]% ..
[bcm10-headnode->device*[bcm10-headnode*]]% set provisioninginterface bond0
[bcm10-headnode->device*[bcm10-headnode*]]% commit
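Once the bond is up on the head node, the Linux bonding driver reports its state in /proc/net/bonding/bond0, including one "Slave Interface:" line per member. The sketch below assumes the head node uses the standard Linux bonding driver and that mode 1 corresponds to the driver's active-backup mode; the function name bond_slaves is hypothetical.

```shell
# Hypothetical helper, not part of BCM: prints the member interfaces listed
# in a Linux bonding status file such as /proc/net/bonding/bond0.
bond_slaves() {
    status_file="$1"
    # Each member appears on a line of the form "Slave Interface: <name>".
    awk -F': ' '$1 == "Slave Interface" { print $2 }' "$status_file"
}

# Example use on the head node after the bond has been brought up:
# bond_slaves /proc/net/bonding/bond0
# grep 'Bonding Mode' /proc/net/bonding/bond0
```

With the configuration above, the expected members are ens3f1np1 and ens2np0.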
Power on and provision the DGX nodes.
For initial provisioning, the DGX nodes must be powered on either directly or by using a KVM. It takes several minutes for the nodes to get through BIOS initialization. After that, node status progress is displayed as the nodes are provisioned. Monitor the /var/log/messages and /var/log/node-installer log files to verify that everything is proceeding smoothly.
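One way to watch both log files is to follow them with tail and, when a single node is of interest, filter for its name. This is a minimal sketch: the function name node_log_lines is hypothetical, and it assumes that the node name (for example dgx-01) appears literally in the relevant log lines.

```shell
# Hypothetical helper, not part of BCM: prints the lines of a log file that
# mention a given node name.
node_log_lines() {
    node="$1"; log_file="$2"
    # -F treats the node name as a fixed string rather than a pattern.
    grep -F -- "$node" "$log_file"
}

# Example use on the head node:
# tail -F /var/log/messages /var/log/node-installer   # follow new entries
# node_log_lines dgx-01 /var/log/node-installer       # review one node
```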