Manual Addition of GB200 Rack Entries#

If the GB200 rack components need to be added manually in BCM (through cmsh), the explicit steps and requirements are documented here.

GB200 Compute Tray Golden Node#

  1. Add the rack entry into cmsh.

    Step 1: Add rack entry commands
    cmsh -c "rack; add <rack number>; set x-coordinate 1; set y-coordinate 1; commit"
    
  2. Add the node entry. Follow the nomenclature described in the rack inventory section.

    Step 2: Add node entry commands
    cmsh -c "device; add physicalnode <dgx-node-01>"
    
  3. Set category (many attributes will be inherited based on what was set up for the GB200 category).

    Step 3: Set category commands
    cmsh -c "device; use <dgx-node-01>; set category <gb200 category>; commit"
    
  4. Add the BMC connection (rf0 for Redfish-based control, ipmi0 for ipmitool; rf0 is the default). This sets the power control of the node to this interface.

    Step 4: Add BMC connection commands
    cmsh -c "device; use <dgx-node-01>; interfaces; add bmc rf0; set network <name of ipminet where ipmi is configured>; set ip <bmc IP>; set mac <bmc mac>; commit"

    If the BMC MAC is not yet available, omit the "set mac" part; it can be set later.
    
  5. Add the BlueField and CX-7 interfaces. The individual netnames for management ports 1 and 2 and storage ports 1 and 2 are:

    • M1—enP6p3s0f0np0

    • M2—enP22p3s0f0np0

    • S1—enP6p3s0f1np1

    • S2—enP22p3s0f1np1

    Step 5: Add BlueField and CX-7 interface commands
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP6p3s0f0np0; set mac <M1 MAC>; commit"
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP22p3s0f0np0; set mac <M2 MAC>; commit"
    
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP6p3s0f1np1; set network <storage network name>; set mac <S1 MAC>; commit"
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP22p3s0f1np1; set network <storage network name>; set mac <S2 MAC>; commit"
    

    Note

    • If the MACs are not available, they can be set after cloning the golden node entry.

    • M1 and M2 will be configured as bond0, so the IP address and network will be set up during the bond configuration.

    • S1 and S2 will be configured as individual interfaces, so the IP address and network will be set up during the interface configuration.

  6. Set MACs for each interface (if available). This can be done after cloning the golden node entry.

    Note

    MAC addresses can be set in the previous step along with the interface configuration.
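When the MACs only become known after the entries have been cloned, the per-node cmsh commands can be generated from a simple mapping file instead of being typed by hand. The sketch below only prints the commands for review before they are run; the CSV layout, the file name mac-map.csv, and the gen_mac_cmds helper are illustrative assumptions, not part of BCM.

```shell
#!/bin/bash
# Hypothetical sketch: generate (not execute) the cmsh commands that set
# interface MACs on cloned node entries, from a CSV of "hostname,iface,mac".
gen_mac_cmds() {
  local csv="$1"
  # One cmsh invocation per row keeps a bad entry from blocking the rest.
  while IFS=, read -r host iface mac; do
    echo "cmsh -c \"device; use ${host}; interfaces; use ${iface}; set mac ${mac}; commit\""
  done < "$csv"
}
```

Review the output (for example with `gen_mac_cmds mac-map.csv | less`) and pipe it to `bash` once it looks correct.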

  7. Configure bonds. Add and configure bond0 (assuming LACP bonding is enabled).

    Configure bond0 single command
    cmsh -c "device; use <dgx-node-01>; interfaces; add bond bond0; set interfaces enP6p3s0f0np0 enP22p3s0f0np0; set mode 4; set options miimon=100; set network <internalnet or whatever network it is being provisioned on>; set ip <bond ip>; commit"
    
    Configure bond commands (multiple commands, from within cmsh)
    device
    use <dgx-node-01>
    interfaces
    add bond bond0
    set interfaces enP6p3s0f0np0 enP22p3s0f0np0
    set mode 4
    set options miimon=100
    set network <internalnet or whatever network it is being provisioned on>
    set ip <bond ip>
    commit
    
  8. Set provisioning interface.

    Step 8: Set provisioning interface commands
    cmsh -c "device; use <dgx-node-01>; set provisioninginterface bond0; commit"
    
  9. Add InfiniBand interfaces.

    Step 9: Add InfiniBand interface commands
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibp3s0; set network computenet; set ip <IP>; commit"
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibP2p3s0; set network computenet; set ip <IP>; commit"
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibP16p3s0; set network computenet; set ip <IP>; commit"
    cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibP18p3s0; set network computenet; set ip <IP>; commit"

    If an IP is not yet available, omit the "set ip" part; it can be set later.
    
  10. Set the system MAC for the initial boot. Choose the M1 or M2 MAC and set it for the device, unless console or remote BMC/KVM access to the node is available to select the boot interface there.

    Step 10: Set system MAC commands
    cmsh -c "device; use <dgx-node-01>; set mac <M1 or M2 MAC>; commit"
    
  11. Clone the golden node entry to create the remaining compute tray entries (18 total).

    Step 11: Clone commands (run from within cmsh, in device mode)
    foreach -o <goldennode> -n <hostname with first node number>..<hostname with last node number> --next-ip ()

    For example:

    foreach -o <rack number>-<pod number>-gb200-n01 -n <rack number>-<pod number>-gb200-n02..<rack number>-<pod number>-gb200-n18 --next-ip ()
    commit
  12. IPs and MACs must be updated manually for each cloned entry.
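The per-entry updates can also be scripted. The sketch below only prints cmsh commands for review rather than executing them; the hostname pattern, the addressing scheme, and the gen_bmc_ip_cmds helper are hypothetical placeholders, not part of BCM.

```shell
#!/bin/bash
# Hypothetical sketch: print (not run) cmsh commands that assign sequential
# BMC IPs to the cloned entries. Hostname pattern and addressing are placeholders.
gen_bmc_ip_cmds() {
  local prefix="$1"   # first three octets of the ipminet, e.g. 10.148.0
  local base="$2"     # last octet to assign to node n01
  local rack="$3" pod="$4" i
  for i in $(seq -w 1 18); do
    # 10#$i forces decimal so leading zeros (08, 09) are not parsed as octal.
    echo "cmsh -c \"device; use ${rack}-${pod}-gb200-n${i}; interfaces; use rf0; set ip ${prefix}.$((base + 10#$i - 1)); commit\""
  done
}
```

For example, `gen_bmc_ip_cmds 10.148.0 101 <rack number> <pod number> | less` shows the full set of commands before anything is committed.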

  13. Set the rack name and position for each compute tray, NVLink switch, and power shelf.

The following script prompts the user for the hostname nomenclature of the rack to be added. This is needed for each rack to display properly in the rack submenu, a new feature of BCM 11. The script handles all rack components with proper physical positioning and consistent RU-based hostname mapping throughout the rack.

Manual rack position update script
#!/bin/bash

# --- Get hostname components from the user ---

echo "Please define the hostname components."

read -p "Enter the 'rack number' for the hostname (this will also be used for 'set rack <rack_number>'): " HOSTNAME_RACK_ID

read -p "Enter the 'pod number' for the hostname (e.g., p1): " HOSTNAME_POD_ID

# Validate inputs (basic check if they are not empty)

if [ -z "$HOSTNAME_RACK_ID" ] || [ -z "$HOSTNAME_POD_ID" ]; then
   echo "Error: Hostname rack number and pod number cannot be empty."
   exit 1
fi

echo # Adding a blank line for better readability

echo "Using Hostname Rack ID (and for 'set rack' command): $HOSTNAME_RACK_ID"
echo "Using Hostname Pod ID: $HOSTNAME_POD_ID"
echo "Hostname formats will be:"
echo "  Compute trays: ${HOSTNAME_RACK_ID}-<RU>-${HOSTNAME_POD_ID}-gb200-n<node_number>"
echo "  NVLink switches: ${HOSTNAME_RACK_ID}-<RU>-${HOSTNAME_POD_ID}-nvsw-n<switch_number>"
echo "  Power shelves: ${HOSTNAME_RACK_ID}-<RU>-${HOSTNAME_POD_ID}-pwr-n<shelf_number>"

echo # Adding a blank line

# --- First range of nodes (01 to 08) ---

k_rack_position=11 # Initialize for the first compute tray group (RU 11-18, per the rack position table)

echo "Processing nodes 01 to 08..."

for i in {01..08}; do
   # The host_RU matches the rack position
   host_RU=$k_rack_position

   # Construct the device hostname
   device_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-gb200-n${i}"
   echo "Target Device: $device_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
   cmsh -c "device; use ${device_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
   k_rack_position=$((k_rack_position + 1))
done

echo "Finished processing nodes 01 to 08."
echo # Adding a blank line

# --- Second range of nodes (09 to 18) ---

k_rack_position=28 # Re-initialize for the second compute tray group (RU 28-37, per the rack position table)

echo "Processing nodes 09 to 18..."

for i in {09..18}; do
   # The host_RU matches the rack position
   host_RU=$k_rack_position

   # Construct the device hostname
   device_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-gb200-n${i}"
   echo "Target Device: $device_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
   cmsh -c "device; use ${device_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
   k_rack_position=$((k_rack_position + 1))
done

echo "Finished processing nodes 09 to 18."
echo # Adding a blank line

# --- Third range - NVLink Switches (01 to 09) ---

k_rack_position=19 # Initialize for the NVLink switches (RU 19-27, per the rack position table)

echo "Processing NVLink switches 01 to 09..."

for i in {01..09}; do
   # The host_RU matches the rack position
   host_RU=$k_rack_position

   # Construct the NVLink switch hostname
   switch_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-nvsw-n${i}"
   echo "Target NVLink Switch: $switch_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
   cmsh -c "device; use ${switch_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
   k_rack_position=$((k_rack_position + 1))
done

echo "Finished processing NVLink switches 01 to 09."
echo # Adding a blank line

# --- Fourth range - Power Shelves (01 to 04) - Below first compute tray group ---

k_rack_position=6 # Initialize k_rack_position for power shelves (below first compute group)

echo "Processing Power shelves 01 to 04..."

for i in {01..04}; do
   # The host_RU matches the rack position
   host_RU=$k_rack_position

   # Construct the power shelf hostname
   pwr_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-pwr-n${i}"
   echo "Target Power Shelf: $pwr_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
   cmsh -c "device; use ${pwr_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
   k_rack_position=$((k_rack_position + 1))
done

echo "Finished processing Power shelves 01 to 04."
echo # Adding a blank line

# --- Fifth range - Power Shelves (05 to 08) - Above top compute tray group ---

k_rack_position=39 # Initialize k_rack_position for power shelves (above second compute group)

echo "Processing Power shelves 05 to 08..."

for i in {05..08}; do
   # The host_RU matches the rack position
   host_RU=$k_rack_position

   # Construct the power shelf hostname
   pwr_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-pwr-n${i}"
   echo "Target Power Shelf: $pwr_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
   cmsh -c "device; use ${pwr_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
   k_rack_position=$((k_rack_position + 1))
done

echo "Finished processing Power shelves 05 to 08."
echo "All nodes, switches, and power shelves processed. Confirm in cmsh by doing cmsh;rack; display <rack id>"
  14. After the nodes have been added, check that the rack positions look correct:

    cmsh -c "rack; display <rack number>"

    or

    cmsh -c "rack; list"   # verify that the devices are in the expected positions
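To cross-check that listing, the expected compute-tray hostname at each RU can be generated from the layout in the rack position table (compute trays at RU 11-18 and 28-37). The expected_compute_hosts helper below is an illustrative assumption, not a BCM tool.

```shell
#!/bin/bash
# Hypothetical helper: print the compute-tray hostname expected at each RU,
# following the RU ranges in the rack position table (11-18 and 28-37).
expected_compute_hosts() {
  local rack="$1" pod="$2" n=1 ru
  for ru in $(seq 11 18) $(seq 28 37); do
    # Hostname format: <rack>-<RU>-<pod>-gb200-n<node number>
    printf '%s-%s-%s-gb200-n%02d\n' "$rack" "$ru" "$pod" "$n"
    n=$((n + 1))
  done
}
```

Diffing this output (for example, `expected_compute_hosts A05 P1`) against the hostnames reported by the rack listing makes misplaced entries easy to spot.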
Rack position reference for a GB200 rack:

Table 3 GB200 Rack Position Layout (48U Rack, Top-Down View)#

| Rack Position | Device Type    | Device # | Example Hostname             |
|---------------|----------------|----------|------------------------------|
| 48-46         | Infrastructure | N/A      | (Empty/Other infrastructure) |
| 45-44         | SN2201 TOR/OOB | N/A      | (TOR/OOB switch)             |
| 43            | Infrastructure | N/A      | (Empty/Other infrastructure) |
| 42            | Power Shelf    | 08       | A05-42-P1-pwr-n08            |
| 41            | Power Shelf    | 07       | A05-41-P1-pwr-n07            |
| 40            | Power Shelf    | 06       | A05-40-P1-pwr-n06            |
| 39            | Power Shelf    | 05       | A05-39-P1-pwr-n05            |
| 38            | Infrastructure | N/A      | (Empty)                      |
| 37            | Compute Tray   | 18       | A05-37-P1-gb200-n18          |
| 36            | Compute Tray   | 17       | A05-36-P1-gb200-n17          |
| 35            | Compute Tray   | 16       | A05-35-P1-gb200-n16          |
| 34            | Compute Tray   | 15       | A05-34-P1-gb200-n15          |
| 33            | Compute Tray   | 14       | A05-33-P1-gb200-n14          |
| 32            | Compute Tray   | 13       | A05-32-P1-gb200-n13          |
| 31            | Compute Tray   | 12       | A05-31-P1-gb200-n12          |
| 30            | Compute Tray   | 11       | A05-30-P1-gb200-n11          |
| 29            | Compute Tray   | 10       | A05-29-P1-gb200-n10          |
| 28            | Compute Tray   | 09       | A05-28-P1-gb200-n09          |
| 27            | NVLink Switch  | 09       | A05-27-P1-nvsw-n09           |
| 26            | NVLink Switch  | 08       | A05-26-P1-nvsw-n08           |
| 25            | NVLink Switch  | 07       | A05-25-P1-nvsw-n07           |
| 24            | NVLink Switch  | 06       | A05-24-P1-nvsw-n06           |
| 23            | NVLink Switch  | 05       | A05-23-P1-nvsw-n05           |
| 22            | NVLink Switch  | 04       | A05-22-P1-nvsw-n04           |
| 21            | NVLink Switch  | 03       | A05-21-P1-nvsw-n03           |
| 20            | NVLink Switch  | 02       | A05-20-P1-nvsw-n02           |
| 19            | NVLink Switch  | 01       | A05-19-P1-nvsw-n01           |
| 18            | Compute Tray   | 08       | A05-18-P1-gb200-n08          |
| 17            | Compute Tray   | 07       | A05-17-P1-gb200-n07          |
| 16            | Compute Tray   | 06       | A05-16-P1-gb200-n06          |
| 15            | Compute Tray   | 05       | A05-15-P1-gb200-n05          |
| 14            | Compute Tray   | 04       | A05-14-P1-gb200-n04          |
| 13            | Compute Tray   | 03       | A05-13-P1-gb200-n03          |
| 12            | Compute Tray   | 02       | A05-12-P1-gb200-n02          |
| 11            | Compute Tray   | 01       | A05-11-P1-gb200-n01          |
| 10            | Infrastructure | N/A      | (Empty)                      |
| 9             | Power Shelf    | 04       | A05-9-P1-pwr-n04             |
| 8             | Power Shelf    | 03       | A05-8-P1-pwr-n03             |
| 7             | Power Shelf    | 02       | A05-7-P1-pwr-n02             |
| 6             | Power Shelf    | 01       | A05-6-P1-pwr-n01             |
| 5-1           | Infrastructure | N/A      | (Empty/Other infrastructure) |

The power shelves are strategically positioned to provide power distribution to the compute node groups above and below them.