Head Node Configuration

This section describes the configuration steps to perform on the BCM head node.

Start from the root shell, not from within a cmsh session.

  1. In /cm/local/apps/cmd/etc/cmd.conf, uncomment the AdvancedConfig parameter and set it as follows:

    AdvancedConfig = { "DeviceResolveAnyMAC=1" } # modified value
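    If the same change has to be made on more than one head node, it can be scripted. The following Python sketch is illustrative only and is not a BCM tool. It assumes that cmd.conf contains a single AdvancedConfig line, which may be commented out with "#". The helper name enable_device_resolve_any_mac is hypothetical. Back up cmd.conf before writing the result back.

```python
import re

# Value from step 1; the trailing comment mirrors the documented line.
ADVANCED_CONFIG_LINE = 'AdvancedConfig = { "DeviceResolveAnyMAC=1" } # modified value'

# Assumption: the directive sits on one line, optionally commented out with a
# leading "#". [ \t]* is used instead of \s* so a match never spans (and
# deletes) neighbouring blank lines.
_ADVANCED_CONFIG_RE = re.compile(r"^[ \t]*#?[ \t]*AdvancedConfig[ \t]*=.*$", re.MULTILINE)


def enable_device_resolve_any_mac(conf_text: str) -> str:
    """Return conf_text with the first AdvancedConfig line uncommented and set.

    The whole line is replaced, so any other entries already inside the
    AdvancedConfig braces are dropped; merge them by hand if you have any.
    """
    new_text, count = _ADVANCED_CONFIG_RE.subn(ADVANCED_CONFIG_LINE, conf_text, count=1)
    if count == 0:
        raise ValueError("no AdvancedConfig line found in cmd.conf")
    return new_text
```

    In practice, read the file, pass its contents to the function, review the returned text, and write it back before restarting CMDaemon.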
    
  2. Restart CMDaemon to apply the change. The DeviceResolveAnyMAC setting enables reliable PXE booting from bonded interfaces.

    systemctl restart cmd
    

    Restarting CMDaemon disconnects any open cmsh session. After CMDaemon has restarted, type connect to reconnect, or enter exit and then start cmsh again.

    Warning

    Older method: Steps 3 through 8 are only necessary if you are using the MAC-to-IP allocation method. Newer methods do not assign MAC addresses based on ports. If you are using a newer method, skip to step 9.

  3. On the head node, set the MAC addresses of the physical interfaces. This step and the steps that follow are performed on the head node and should be run for all DGX systems.

    Note

    Double-check the MAC address of each interface and the IP address of the bond0 interface. Mistakes here are difficult to diagnose.

    For DGX A100 systems, the commands are similar to the following:

      # cmsh
      % device
      % use bcm-dgx-a100-01
      % interfaces
      % use enp225s0f1np1
      % set mac B8:CE:F6:2F:08:69
      % use enp97s0f1np1
      % set mac B8:CE:F6:2D:0E:A7
      % ..
      % commit
    

    For DGX H100 systems, the commands are similar to the following:

      # cmsh
      % device
      % use bcm-dgx-h100-01
      % interfaces
      % use enp170s0f1np1
      % set mac B8:CE:F6:2F:08:69
      % use enp41s0f1np1
      % set mac B8:CE:F6:2D:0E:A7
      % ..
      % commit
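    Typos in these MAC addresses are hard to diagnose later, so it can help to generate the command sequence from the site-survey data instead of typing it by hand. The Python sketch below is illustrative only, and the function names are hypothetical. It emits the same cmsh lines shown above for pasting into a cmsh session. Before emitting anything, it checks that each MAC address is well formed and not duplicated.

```python
import re

# Six colon-separated hex octets, e.g. B8:CE:F6:2F:08:69.
_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def normalize_mac(mac: str) -> str:
    """Validate a colon-separated MAC address and return it in upper case."""
    mac = mac.strip()
    if not _MAC_RE.fullmatch(mac):
        raise ValueError(f"malformed MAC address: {mac!r}")
    return mac.upper()


def bond_member_commands(node: str, macs: dict[str, str]) -> list[str]:
    """Build the step 3 cmsh command sequence for one DGX node.

    macs maps each physical interface name to its MAC address as recorded in
    the site survey. A malformed or duplicated MAC raises ValueError.
    """
    lines = ["device", f"use {node}", "interfaces"]
    seen: set[str] = set()
    for iface, raw_mac in macs.items():
        mac = normalize_mac(raw_mac)
        if mac in seen:
            raise ValueError(f"MAC {mac} is listed twice for {node}")
        seen.add(mac)
        lines += [f"use {iface}", f"set mac {mac}"]
    lines += ["..", "commit"]
    return lines


commands = bond_member_commands(
    "bcm-dgx-h100-01",
    {"enp170s0f1np1": "b8:ce:f6:2f:08:69", "enp41s0f1np1": "B8:CE:F6:2D:0E:A7"},
)
print("\n".join(commands))
```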
    
  4. Verify the configuration.

    This example is for a DGX A100 system. The output for a DGX H100 system is similar.

      % get provisioninginterface
      bond0
      % interfaces
      % list
      Type         Network device name  IP               Network          Start if
      ------------ -------------------- ---------------- ---------------- --------
      bmc          ipmi0                10.130.111.68    ipminet          always
      bond         bond0 [prov]         10.130.122.5     internalnet      always
      physical     enp225s0f1np1 (bond0)   0.0.0.0                        always
      physical     enp97s0f1np1 (bond0)    0.0.0.0                        always
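    The listing can also be checked mechanically. This Python sketch assumes the whitespace-separated layout shown above: data rows follow the dashed rule, bond members are tagged "(bond0)", and the provisioning interface is tagged "[prov]". Real output may differ between BCM versions, and verify_bond is a hypothetical helper.

```python
def parse_interface_rows(listing: str) -> list[list[str]]:
    """Split every non-empty row that follows the dashed header rule."""
    rows: list[list[str]] = []
    in_body = False
    for line in listing.splitlines():
        if line.strip().startswith("---"):
            in_body = True
            continue
        if in_body and line.strip():
            rows.append(line.split())
    return rows


def verify_bond(listing: str, expected: set[str], bond: str = "bond0") -> list[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    rows = [r for r in parse_interface_rows(listing) if len(r) > 1]
    problems: list[str] = []
    bond_rows = [r for r in rows if r[0] == "bond" and r[1] == bond]
    if not bond_rows:
        problems.append(f"{bond} not found")
    elif "[prov]" not in bond_rows[0]:
        problems.append(f"{bond} is not marked [prov]")
    # Physical interfaces carry their bond name in parentheses, e.g. "(bond0)".
    members = {r[1] for r in rows if r[0] == "physical" and f"({bond})" in r}
    for iface in sorted(expected - members):
        problems.append(f"{iface} is not a member of {bond}")
    for iface in sorted(members - expected):
        problems.append(f"unexpected member {iface} in {bond}")
    return problems


LISTING = """\
Type         Network device name  IP               Network          Start if
------------ -------------------- ---------------- ---------------- --------
bmc          ipmi0                10.130.111.68    ipminet          always
bond         bond0 [prov]         10.130.122.5     internalnet      always
physical     enp225s0f1np1 (bond0)   0.0.0.0                        always
physical     enp97s0f1np1 (bond0)    0.0.0.0                        always
"""
print(verify_bond(LISTING, {"enp225s0f1np1", "enp97s0f1np1"}))
```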
    
  5. Identify each node by setting its MAC address to the provisioning-interface MAC address listed in the site survey.

      % device
      % use bcm-dgx-h100-01
      % set mac b8:ce:f6:2f:08:69
      % use bcm-dgx-h100-02
      % set mac 0c:42:a1:54:32:a7
      % use bcm-dgx-h100-03
      % set mac 0c:42:a1:0a:7a:51
      % use bcm-dgx-h100-04
      % set mac 1c:34:da:29:17:6e
      % foreach -c dgx-h100 (get mac)
      B8:CE:F6:2F:08:69
      0C:42:A1:54:32:A7
      0C:42:A1:0A:7A:51
      1C:34:DA:29:17:6E
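    Comparing the foreach output against the site survey by eye is error-prone. The following Python sketch automates the comparison. check_node_macs is a hypothetical helper, and the survey is assumed to be available as a mapping from node name to MAC address.

```python
import re
from collections import Counter

_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def check_node_macs(survey: dict[str, str], foreach_output: str) -> list[str]:
    """Compare site-survey MACs with the MACs printed by foreach (get mac).

    Lines that are not MAC addresses (such as the cmsh prompt) are ignored,
    and the comparison is case-insensitive. Because foreach prints only the
    MACs, this catches typos and omissions but not two MACs swapped between
    nodes.
    """
    expected = [mac.strip().upper() for mac in survey.values()]
    reported = [
        line.strip().upper()
        for line in foreach_output.splitlines()
        if _MAC_RE.fullmatch(line.strip())
    ]
    problems = []
    for mac, count in Counter(expected).items():
        if count > 1:
            problems.append(f"{mac} appears {count} times in the survey")
    for mac in sorted(set(expected) - set(reported)):
        problems.append(f"{mac} is in the survey but not set on any node")
    for mac in sorted(set(reported) - set(expected)):
        problems.append(f"{mac} is set on a node but not in the survey")
    return problems


SURVEY = {
    "bcm-dgx-h100-01": "b8:ce:f6:2f:08:69",
    "bcm-dgx-h100-02": "0c:42:a1:54:32:a7",
    "bcm-dgx-h100-03": "0c:42:a1:0a:7a:51",
    "bcm-dgx-h100-04": "1c:34:da:29:17:6e",
}
OUTPUT = """\
% foreach -c dgx-h100 (get mac)
B8:CE:F6:2F:08:69
0C:42:A1:54:32:A7
0C:42:A1:0A:7A:51
1C:34:DA:29:17:6E
"""
print(check_node_macs(SURVEY, OUTPUT))
```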
    
  6. If all the MAC addresses are set properly, commit the changes.

      % device commit
      % quit
    
  7. Set the MAC addresses for the Ethernet interfaces.

    For control nodes connected to DGX A100 systems, use the following commands.

      % device
      % use bcm-cpu-01
      % interfaces
      % use ens2f0np0
      % set mac 88:e9:a4:92:26:ba
      % use ens2f1np1
      % set mac 88:e9:a4:92:26:bb
      % commit
    

    For control nodes connected to DGX H100 systems, use the following commands.

      % device
      % use bcm-cpu-01
      % interfaces
      % use enp37s0np0
      % set mac 88:e9:a4:92:26:ba
      % use enp65s0np0
      % set mac 88:e9:a4:92:26:bb
      % commit
    

    If the head node uses a bonded interface, use the following commands. You may need to reboot the head node and repeat the request-license steps afterward.

      % device
      % use headnode-01
      % interfaces
      % use ens1np0
      % clear ip
      % clear network
      % add physical ens2np0
      % set mac 88:e9:a4:20:18:d8
      % add bond bond0
      % append interfaces ens1np0 ens2np0
      % set mode 1
      % set network internalnet
      % set ip 10.180.115.189
      % ..
      % ..
      % set provisioninginterface bond0
      % interfaces
      % use ipmi0
      % set ip 10.180.217.154
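    The head-node bonding sequence has several parameters that are easy to mistype. This Python sketch rebuilds the sequence above from a few inputs, and the helper name is hypothetical. It assumes that the value given to set mode is the Linux bonding driver's mode number, in which mode 1 is active-backup, and lets you choose the mode by that name. It does not cover the ipmi0 address change or a commit.

```python
# Linux bonding driver mode numbers (see the kernel bonding documentation).
BOND_MODES = {
    "balance-rr": 0,
    "active-backup": 1,
    "balance-xor": 2,
    "broadcast": 3,
    "802.3ad": 4,
    "balance-tlb": 5,
    "balance-alb": 6,
}


def head_node_bond_commands(
    node: str,
    primary: str,
    secondary: str,
    secondary_mac: str,
    ip: str,
    network: str = "internalnet",
    bond: str = "bond0",
    mode: str = "active-backup",
) -> list[str]:
    """Rebuild the head-node bonding sequence from step 7 as cmsh lines."""
    if mode not in BOND_MODES:
        raise ValueError(f"unknown bonding mode {mode!r}")
    return [
        "device",
        f"use {node}",
        "interfaces",
        f"use {primary}",
        "clear ip",  # the address moves from the physical NIC to the bond
        "clear network",
        f"add physical {secondary}",
        f"set mac {secondary_mac}",
        f"add bond {bond}",
        f"append interfaces {primary} {secondary}",
        f"set mode {BOND_MODES[mode]}",
        f"set network {network}",
        f"set ip {ip}",
        "..",
        "..",
        f"set provisioninginterface {bond}",
    ]


print("\n".join(head_node_bond_commands(
    "headnode-01", "ens1np0", "ens2np0", "88:e9:a4:20:18:d8", "10.180.115.189")))
```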
    
  8. Set the IP address for the bond0 interface.

      % device
      % use bcm-cpu-01
      % interfaces
      % use bond0
      % set ip 10.127.3.15
      % commit
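    Step 3 warns that a wrong bond0 IP address is hard to diagnose. The Python sketch below shows a quick pre-check using the standard ipaddress module. This guide does not state the internalnet range, so the network passed in (10.127.0.0/16 in the example) is a hypothetical value. The helper name check_bond_ip is also hypothetical.

```python
import ipaddress


def check_bond_ip(ip: str, network_cidr: str, assigned: set[str]) -> list[str]:
    """Return problems with a proposed bond0 address; empty means it looks usable.

    network_cidr is the internalnet base address and prefix, and assigned holds
    addresses already given to other interfaces (both are inputs you supply).
    """
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(network_cidr)
    problems = []
    if addr not in net:
        problems.append(f"{ip} is outside {net}")
    elif addr in (net.network_address, net.broadcast_address):
        problems.append(f"{ip} is the network or broadcast address of {net}")
    if addr in {ipaddress.ip_address(a) for a in assigned}:
        problems.append(f"{ip} is already assigned to another interface")
    return problems


# 10.127.0.0/16 is a hypothetical internalnet; use your site's actual network.
print(check_bond_ip("10.127.3.15", "10.127.0.0/16", {"10.127.3.14"}))
```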
    
  9. Power on and provision the cluster nodes.

    For initial provisioning, power on the cluster nodes either directly or by using a KVM. The nodes take several minutes to complete BIOS startup; after that, node status progress is displayed while the nodes are provisioned. Monitor the /var/log/messages and /var/log/node-installer log files to verify that provisioning is proceeding normally.
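    When watching the log files, it can help to surface suspicious lines with their line numbers. The Python sketch below is a heuristic only. The keyword list is an assumption, not a documented set of BCM node-installer messages, and the sample lines are invented for illustration. The helper name suspect_lines is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Heuristic only: these keywords are an assumption, not a documented list of
# BCM node-installer messages. Adjust them to what you actually see.
_SUSPECT_RE = re.compile(r"\b(error|fail(ed|ure)?|timeout|timed out)\b", re.IGNORECASE)


def suspect_lines(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line number, text) for log lines that match the suspect keywords."""
    for number, line in enumerate(lines, start=1):
        if _SUSPECT_RE.search(line):
            yield number, line.rstrip("\n")


# Invented sample lines; real node-installer output will look different.
sample = [
    "node-installer: Starting provisioning\n",
    "node-installer: rsync failed, retrying\n",
    "node-installer: Provisioning completed\n",
]
for number, text in suspect_lines(sample):
    print(number, text)
```

    To scan a real log, pass an open file object (for example, one opened on /var/log/node-installer) instead of the sample list.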

Newer Method

Prerequisite

  • The switch must be up so that the MAC address can be retrieved from the switch and switch port specified in the CSV file.

    • If any top-of-rack (ToR) switches are offline, Bright cannot retrieve MAC addresses from them.

Verify

  • Ensure that the node interfaces have been assigned a switch and switch port. To check, navigate to cmsh > device > use <node> and run the “show” command.

[Figure: head-node-01.png]
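The switch and switch-port assignments can also be sanity-checked in the CSV file before rebooting. This guide does not describe the column layout of that CSV, so the sketch below assumes hypothetical columns named hostname, switch, and switch_port, and the helper name check_switch_ports is also hypothetical. It flags nodes with no assignment and ports claimed by more than one node. Either problem would likely prevent a MAC address from being matched to a single node.

```python
import csv
import io
from collections import defaultdict


def check_switch_ports(csv_text: str) -> list[str]:
    """Flag nodes without a switch/port and ports claimed by several nodes.

    Assumes hypothetical column names hostname, switch and switch_port; rename
    them to match the CSV file you actually use.
    """
    problems = []
    owners: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = (row.get("hostname") or "").strip() or "<unnamed>"
        switch = (row.get("switch") or "").strip()
        port = (row.get("switch_port") or "").strip()
        if not switch or not port:
            problems.append(f"{node} has no switch/port assignment")
            continue
        owners[(switch, port)].append(node)
    for (switch, port), nodes in sorted(owners.items()):
        if len(nodes) > 1:
            problems.append(f"{switch} port {port} is assigned to {', '.join(nodes)}")
    return problems


# Illustrative data only; "tor-01" is a made-up switch name.
SAMPLE = """\
hostname,switch,switch_port
bcm-dgx-h100-01,tor-01,1
bcm-dgx-h100-02,tor-01,1
bcm-dgx-h100-03,,
"""
print(check_switch_ports(SAMPLE))
```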

Next Step

  • Reboot the node.

    • Per-node MAC addresses do not need to be entered manually, because Bright detects them automatically from the switch and switch port assignments.