Head Node Configuration
This section addresses configuration steps to be performed on BCM head nodes.
Use the root (not cmsh) shell.
In /cm/local/apps/cmd/etc/cmd.conf, uncomment the AdvancedConfig parameter and set its value as follows:
AdvancedConfig = { "DeviceResolveAnyMAC=1" } # modified value
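The edit above can also be scripted. The following is a minimal sketch, assuming the file contains a single AdvancedConfig line (commented out or not). The helper name enable_device_resolve_any_mac is hypothetical. Because it replaces the whole line, any other AdvancedConfig entries already present would be lost, so inspect the file first on a customized head node.

```shell
#!/usr/bin/env bash
# Sketch: uncomment (or overwrite) the AdvancedConfig line in cmd.conf so that
# it reads exactly as shown above. Assumes exactly one AdvancedConfig line exists;
# any other entries on that line are replaced.

enable_device_resolve_any_mac() {
  local conf="$1"
  local new_line='AdvancedConfig = { "DeviceResolveAnyMAC=1" } # modified value'
  local pattern='^[[:space:]]*#?[[:space:]]*AdvancedConfig[[:space:]]*='

  # Stop if there is no AdvancedConfig line to edit.
  if ! grep -Eq "$pattern" "$conf"; then
    echo "no AdvancedConfig line found in $conf" >&2
    return 1
  fi

  # Rewrite the matching line in place, keeping the original as "$conf.bak".
  sed -i.bak -E "s/${pattern}.*/${new_line}/" "$conf"
}
```

Run it as root, for example enable_device_resolve_any_mac /cm/local/apps/cmd/etc/cmd.conf, and then restart CMDaemon. A copy of the original file is kept as cmd.conf.bak.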
Restart CMDaemon to apply the change, which enables reliable PXE booting from bonded interfaces.
systemctl restart cmd
Restarting CMDaemon disconnects any open cmsh session. After CMDaemon has restarted, type connect to reconnect, or enter exit and start cmsh again. The steps that follow are performed on the head node and should be run for all DGX systems.
Warning
Older method: The steps below are necessary only if you are using the MAC to IP Allocation method. Newer methods do not assign MAC addresses explicitly; nodes are identified by their switch ports instead. If you are using a newer method, skip to Step #9.
On the head node, set the MAC addresses on the physical interfaces.
Note
Double-check the MAC address of each interface and the IP address of the bond0 interface. Mistakes here are difficult to diagnose later.
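One way to catch typos before they reach cmsh is a small format and duplicate check. This sketch only checks syntax (six colon-separated hex octets) and flags the same MAC entered twice; it cannot confirm that a MAC belongs to the right physical port, so the site survey remains the reference. The function names is_valid_mac and check_macs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: syntax and duplicate checks for MAC addresses (bash 4+).

# Succeeds if $1 looks like six colon-separated hex octets (any letter case).
is_valid_mac() {
  local re='^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
  [[ "$1" =~ $re ]]
}

# Checks every argument; reports malformed or repeated MACs on stderr and
# returns non-zero if any were found.
check_macs() {
  local mac norm status=0
  local -A seen=()
  for mac in "$@"; do
    if ! is_valid_mac "$mac"; then
      echo "invalid MAC: $mac" >&2
      status=1
      continue
    fi
    norm=${mac,,}  # compare case-insensitively
    if [[ -n "${seen[$norm]:-}" ]]; then
      echo "duplicate MAC: $mac" >&2
      status=1
    fi
    seen[$norm]=1
  done
  return "$status"
}
```

For example, check_macs B8:CE:F6:2F:08:69 B8:CE:F6:2D:0E:A7 succeeds silently, while a truncated or repeated value produces a message and a non-zero exit status.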
For DGX A100 systems, the commands are similar to the following.
# cmsh
% device
% use bcm-dgx-a100-01
% interfaces
% use enp225s0f1np1
% set mac B8:CE:F6:2F:08:69
% use enp97s0f1np1
% set mac B8:CE:F6:2D:0E:A7
% ..
% commit
For DGX H100 systems, the commands are similar to the following.
# cmsh
% device
% use bcm-dgx-h100-01
% interfaces
% use enp170s0f1np1
% set mac B8:CE:F6:2F:08:69
% use enp41s0f1np1
% set mac B8:CE:F6:2D:0E:A7
% ..
% commit
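When many DGX systems need the same change, the command sequence above can be generated rather than typed by hand. This sketch prints the cmsh lines for one node; dgx_bond_mac_cmds is a hypothetical helper, and the interface names and MACs passed to it must come from the site survey.

```shell
#!/usr/bin/env bash
# Sketch: print the cmsh commands that set the MACs of a node's bond member
# interfaces, mirroring the sequence shown above.
# Usage: dgx_bond_mac_cmds NODE IFACE MAC [IFACE MAC ...]
# Values are printed as given; they are not validated here.

dgx_bond_mac_cmds() {
  if (( $# < 3 || ($# - 1) % 2 != 0 )); then
    echo "usage: dgx_bond_mac_cmds NODE IFACE MAC [IFACE MAC ...]" >&2
    return 1
  fi
  local node="$1"
  shift
  printf '%s\n' device "use $node" interfaces
  # One "use" / "set mac" pair per interface.
  while (( $# > 0 )); do
    printf 'use %s\nset mac %s\n' "$1" "$2"
    shift 2
  done
  # Leave the interfaces submode, then save the changes.
  printf '%s\n' .. commit
}
```

The output can be pasted into an interactive cmsh session. If your cmsh version accepts a command string with -c, joining the lines with semicolons (for example with paste -sd ';') should also work, but try it on a single node first.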
Verify the configuration.
This example is for a DGX A100 system. The output for a DGX H100 system is similar.
% get provisioninginterface
bond0
% interfaces
% list
Type         Network device name  IP               Network          Start if
------------ -------------------- ---------------- ---------------- --------
bmc          ipmi0                10.130.111.68    ipminet          always
bond         bond0 [prov]         10.130.122.5     internalnet      always
physical     enp225s0f1np1 (bond0) 0.0.0.0                          always
physical     enp97s0f1np1 (bond0) 0.0.0.0                           always
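For many nodes, reading each listing by eye is error prone. The sketch below checks saved interfaces list output, assuming the column layout shown above: it passes only if bond0 is marked [prov] and at least two physical interfaces are enslaved to it. check_bond_layout is a hypothetical name, and the parsing is whitespace-based, so it may need adjusting if your BCM version formats the table differently.

```shell
#!/usr/bin/env bash
# Sketch: check saved "interfaces list" output for one node (read from stdin).
# Assumes the layout shown above: Type in column 1, device name in column 2,
# and "[prov]" or "(bond0)" as the next whitespace-separated field.

check_bond_layout() {
  awk '
    $1 == "bond" && $2 == "bond0" && $3 == "[prov]" { prov = 1 }
    $1 == "physical" && $3 == "(bond0)"             { members++ }
    END {
      if (!prov) {
        print "bond0 is not marked as the provisioning interface" > "/dev/stderr"
        exit 1
      }
      if (members < 2) {
        printf "bond0 has %d physical member(s), expected at least 2\n", members > "/dev/stderr"
        exit 1
      }
    }
  '
}
```

Save the listing from cmsh to a file and pipe it in, for example check_bond_layout < a100-01-interfaces.txt.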
Identify the nodes by setting the MAC address for the provisioning interface for each node to the MAC address listed in the site survey.
% device
% use bcm-dgx-h100-01
% set mac b8:ce:f6:2f:08:69
% use bcm-dgx-h100-02
% set mac 0c:42:a1:54:32:a7
% use bcm-dgx-h100-03
% set mac 0c:42:a1:0a:7a:51
% use bcm-dgx-h100-04
% set mac 1c:34:da:29:17:6e
% foreach -c dgx-h100 (get mac)
B8:CE:F6:2F:08:69
0C:42:A1:54:32:A7
0C:42:A1:0A:7A:51
1C:34:DA:29:17:6E
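If the site survey is available as a spreadsheet, the use and set mac lines can be generated from an export of it. The CSV layout here (one hostname,mac pair per line, with an optional header) is an assumption, not a BCM format, and survey_to_cmsh is a hypothetical helper. It deliberately emits no commit, so the MACs can be reviewed with foreach before committing.

```shell
#!/usr/bin/env bash
# Sketch: turn a site-survey export into the cmsh lines shown above.
# Hypothetical input format: one "hostname,mac" pair per line. Blank lines and a
# header line whose first field is "hostname" are skipped. MACs are not validated,
# and no commit is emitted so the result can be reviewed first.

survey_to_cmsh() {
  local csv="$1" host mac
  echo device
  # The "|| [[ -n $host ]]" keeps a final line that lacks a trailing newline.
  while IFS=, read -r host mac || [[ -n "$host" ]]; do
    host=${host//[[:space:]]/}  # drop stray spaces
    mac=${mac//[[:space:]]/}    # drop stray spaces and Windows CRs
    [[ -z "$host" || "$host" == hostname ]] && continue
    printf 'use %s\nset mac %s\n' "$host" "$mac"
  done < "$csv"
}
```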
If all the MAC addresses are set properly, commit the changes.
% device commit
% quit
Set the MAC addresses for the Ethernet interfaces.
For control nodes connected to DGX A100 systems, use the following commands.
% device
% use bcm-cpu-01
% interfaces
% use ens2f0np0
% set mac 88:e9:a4:92:26:ba
% use ens2f1np1
% set mac 88:e9:a4:92:26:bb
% commit
For control nodes connected to DGX H100 systems, use the following commands.
% device
% use bcm-cpu-01
% interfaces
% use enp37s0np0
% set mac 88:e9:a4:92:26:ba
% use enp65s0np0
% set mac 88:e9:a4:92:26:bb
% commit
If the head node uses a bonded interface, use the following commands. Afterward, you may need to reboot the head node and repeat the request-license steps.
% device
% use headnode-01
% interfaces
% use ens1np0
% clear ip
% clear network
% add physical ens2np0
% set mac 88:e9:a4:20:18:d8
% add bond bond0
% append interfaces ens1np0 ens2np0
% set mode 1
% set network internalnet
% set ip 10.180.115.189
% ..
% ..
% set provisioninginterface bond0
% interfaces
% use ipmi0
% set ip 10.180.217.154
% commit
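In the commands above, set mode 1 selects the Linux bonding driver's active-backup mode. After the change has been applied (and the head node rebooted if needed), the kernel's bonding status file offers a quick check. This sketch assumes the standard /proc/net/bonding/bond0 format; check_head_bond is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: confirm a bond runs in active-backup mode (cmsh "set mode 1") and has
# the expected member interfaces, using the Linux bonding driver's status file.
# Usage: check_head_bond STATUS_FILE [IFACE ...]

check_head_bond() {
  if (( $# < 1 )); then
    echo "usage: check_head_bond STATUS_FILE [IFACE ...]" >&2
    return 1
  fi
  local status_file="$1" iface
  shift
  # The kernel reports bonding mode 1 as "fault-tolerance (active-backup)".
  if ! grep -q '^Bonding Mode: fault-tolerance (active-backup)' "$status_file"; then
    echo "bond is not in active-backup mode" >&2
    return 1
  fi
  # Each enslaved interface appears on its own "Slave Interface:" line.
  for iface in "$@"; do
    if ! grep -qxF "Slave Interface: $iface" "$status_file"; then
      echo "missing bond member: $iface" >&2
      return 1
    fi
  done
}
```

For the example above, the check would be check_head_bond /proc/net/bonding/bond0 ens1np0 ens2np0.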
Set the IP address for the bond0 interface.
% device
% use bcm-cpu-01
% interfaces
% use bond0
% set ip 10.127.3.15
% commit
Power on and provision the cluster nodes.
For initial provisioning, power on the cluster nodes either directly or through a KVM. The nodes take several minutes to get through their BIOS. After that, node status progress is displayed as the nodes are provisioned. Monitor the /var/log/messages and /var/log/node-installer log files to verify that provisioning is proceeding smoothly.
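To follow a single node through provisioning, the two logs can be filtered by hostname. This is a minimal sketch: node_log_lines is a hypothetical helper, and it assumes the relevant log lines contain the node name as a whole word.

```shell
#!/usr/bin/env bash
# Sketch: show provisioning log lines that mention one node.
# Usage: node_log_lines NODE [LOGFILE ...]
# Without LOGFILE arguments it reads /var/log/messages and /var/log/node-installer.

node_log_lines() {
  if (( $# < 1 )); then
    echo "usage: node_log_lines NODE [LOGFILE ...]" >&2
    return 1
  fi
  local node="$1"
  shift
  local files=("$@")
  (( ${#files[@]} > 0 )) || files=(/var/log/messages /var/log/node-installer)
  # -F matches the name literally, -w avoids matching longer names such as
  # "...-010", -h omits file names, and -s hides errors for missing files
  # (grep still exits non-zero in that case).
  grep -hswF -- "$node" "${files[@]}"
}
```

For live monitoring, tail -F /var/log/messages /var/log/node-installer | grep --line-buffered bcm-dgx-h100-01 gives a continuous view of the same information.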
Newer Method
Prerequisite
The switch must be up so that Bright can retrieve the MAC address from the switch and switch port specified in the CSV file.
If any top-of-rack (TOR) switches are offline, Bright cannot retrieve MAC addresses from them.
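Before relying on switch-port detection, it helps to know which TOR switches the CSV file references, since each of them must be up. The sketch below assumes a hypothetical hostname,switch,port layout; adjust the column numbers to match the CSV file you actually use. csv_switch_check is a hypothetical helper that prints every referenced switch once and fails if a row is missing its switch or port.

```shell
#!/usr/bin/env bash
# Sketch: pre-check the node CSV used for switch-port detection.
# Hypothetical layout: "hostname,switch,port" per line, optional header whose
# first field is "hostname". Prints each referenced switch once; exits non-zero
# if any row is missing its switch or port.

csv_switch_check() {
  awk -F, '
    { gsub(/[ \t\r]/, "") }                 # strip spaces, tabs and Windows CRs
    NF == 0 || $1 == "hostname" { next }    # skip blank lines and the header
    $2 == "" || $3 == "" {
      printf "line %d (%s): missing switch or port\n", NR, $1 > "/dev/stderr"
      bad = 1
      next
    }
    !seen[$2]++ { print $2 }                # first time this switch appears
    END { exit bad }
  ' "$1"
}
```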
Verify
Ensure that the node interfaces have been assigned a switch and switch port. Navigate to cmsh > device > use <node> and run the show command.
Next Step
Reboot the Node
Per-node MAC addresses are not required, because Bright automatically detects them from the switch and switch port assignments.