Manual Addition of GB200 Rack Entries#
If the GB200 rack components need to be added manually in BCM (through cmsh), the explicit steps and requirements are documented here.
GB200 Compute Tray Golden Node#
Add the rack entry into cmsh.
Step 1: Add rack entry commands
cmsh -c "rack; add <rack number>; set x-coordinate 1; set y-coordinate 1; commit"
Add the node entry. Follow the nomenclature described in the rack inventory section.
Step 2: Add node entry commands
cmsh -c "device; add physicalnode <dgx-node-01>; commit"
Set category (many attributes will be inherited based on what was set up for the GB200 category).
Step 3: Set category commands
cmsh -c "device; use <dgx-node-01>; set category <gb200 category>; commit"
Add the BMC connection (rf0 for Redfish-based control, ipmi0 for ipmitool; the default is rf0). This sets the power control of the node to this interface.
Step 4: Add BMC connection commands
cmsh -c "device; use <dgx-node-01>; interfaces; add bmc rf0; set network <name of ipminet where ipmi is configured>; set ip <bmc IP>; set mac <bmc MAC>; commit"
Omit `set mac` if the BMC MAC is not yet available.
Add Bluefield and CX-7 interfaces. The individual netnames for management ports 1 and 2 and storage ports 1 and 2 are:
M1: enP6p3s0f0np0
M2: enP22p3s0f0np0
S1: enP6p3s0f1np1
S2: enP22p3s0f1np1
Step 5: Add Bluefield and CX-7 interface commands
cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP6p3s0f0np0; set mac <M1 MAC>; commit"
cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP22p3s0f0np0; set mac <M2 MAC>; commit"
cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP6p3s0f1np1; set network <storage network name>; set mac <S1 MAC>; commit"
cmsh -c "device; use <dgx-node-01>; interfaces; add physical enP22p3s0f1np1; set network <storage network name>; set mac <S2 MAC>; commit"
Note
If the MACs are not available, they can be set after cloning the golden node entry.
M1 and M2 will be configured as bond0, so the IP address and network will be set up during the bond configuration.
S1 and S2 will be configured as individual interfaces, so the IP address and network will be set up during the interface configuration.
Set MACs for each interface (if available). This can be done after cloning the golden node entry.
Note
MAC addresses can be set in the previous step along with the interface configuration.
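As one way to apply the MACs after cloning, the cmsh commands can be generated in a loop and reviewed before running them. This is only a sketch: the node name and MAC values below are placeholders, not real hardware data.

```shell
#!/bin/bash
# Sketch: emit (not execute) the cmsh commands that set per-interface MACs
# for one node. Node name and MAC values are illustrative placeholders.
node="dgx-node-01"
declare -A macs=(
  [enP6p3s0f0np0]="aa:bb:cc:00:00:01"   # M1 (placeholder)
  [enP22p3s0f0np0]="aa:bb:cc:00:00:02"  # M2 (placeholder)
  [enP6p3s0f1np1]="aa:bb:cc:00:00:03"   # S1 (placeholder)
  [enP22p3s0f1np1]="aa:bb:cc:00:00:04"  # S2 (placeholder)
)
for ifname in "${!macs[@]}"; do
  # Printing the commands allows a review pass before piping them to a shell.
  echo "cmsh -c \"device; use ${node}; interfaces; use ${ifname}; set mac ${macs[$ifname]}; commit\""
done
```

Once the output looks correct, the loop can be piped to `bash` to apply the changes.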
Configure bonds. Add and configure bond0 (assuming LACP bonding is enabled; mode 4 is 802.3ad/LACP).
Step 6: Configure bond0 (single command)
cmsh -c "device; use <dgx-node-01>; interfaces; add bond bond0; set interfaces enP6p3s0f0np0 enP22p3s0f0np0; set mode 4; set options miimon=100; set network <internalnet or whatever network it is being provisioned on>; set ip <bond ip>; commit"
Step 7: Configure bond0 (multiple commands, from within the node's interfaces submode)
add bond bond0
set interfaces enP6p3s0f0np0 enP22p3s0f0np0
set mode 4
set options miimon=100
set network <internalnet or whatever network it is being provisioned on>
set ip <bond ip>
commit
Set provisioning interface.
Step 8: Set provisioning interface commands
cmsh -c "device; use <dgx-node-01>; set provisioninginterface bond0; commit"
Add InfiniBand interfaces.
Step 9: Add InfiniBand interface commands
cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibp3s0; set network computenet; set ip <IP>; commit"
cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibP2p3s0; set network computenet; set ip <IP>; commit"
cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibP16p3s0; set network computenet; set ip <IP>; commit"
cmsh -c "device; use <dgx-node-01>; interfaces; add physical ibP18p3s0; set network computenet; set ip <IP>; commit"
Omit `set ip` if the IP is not yet available.
Set the system MAC for the initial boot. Choose the M1 or M2 MAC and set it on the device, unless console or remote BMC/KVM access to the node is available to select the boot interface there.
Step 10: Set system MAC commands
cmsh -c "device; use <dgx-node-01>; set mac <M1 or M2 MAC>; commit"
Clone the golden node entry to create the remaining compute tray entries (18 total). From cmsh device mode:
Generic form:
foreach -o <goldennode> -n <hostname with first node number>..<hostname with last node number> --next-ip
Example:
foreach -o <rack number>-<pod number>-gb200-c01 -n <rack number>-<pod number>-gb200-c02..<rack number>-<pod number>-gb200-c18 --next-ip
commit
IPs and MACs will have to be updated manually for each entry.
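One hedged way to make those manual updates less error-prone is to keep the per-node values in a small table and generate the cmsh commands from it. The hostnames, MACs, and bond IPs below are illustrative placeholders only:

```shell
#!/bin/bash
# Sketch: generate per-node cmsh update commands after cloning.
# Hostnames, MACs, and bond IPs below are illustrative placeholders.
cmds=$(while read -r node m1_mac bond_ip; do
  echo "cmsh -c \"device; use ${node}; set mac ${m1_mac}; commit\""
  echo "cmsh -c \"device; use ${node}; interfaces; use bond0; set ip ${bond_ip}; commit\""
done <<'EOF'
a05-p1-gb200-c02 aa:bb:cc:00:01:02 10.141.0.2
a05-p1-gb200-c03 aa:bb:cc:00:01:03 10.141.0.3
EOF
)
# Review the generated commands before executing them.
echo "$cmds"
```

Extending the here-document to all cloned entries gives one reviewable command list per rack.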
Set the rack name and position for each compute tray, NVLink Switch, and Power Shelf.
The following script prompts the user for the hostname nomenclature of the rack to be added. This is needed for each rack to display properly in the rack submenu, a new feature of BCM 11. The script handles all rack components with their physical positions and a consistent RU-based hostname mapping across the rack.
Manual rack position update script
#!/bin/bash
# --- Get hostname components from the user ---
echo "Please define the hostname components."
read -p "Enter the 'rack number' for the hostname (this will also be used for 'set rack <rack_number>'): " HOSTNAME_RACK_ID
read -p "Enter the 'pod number' for the hostname (e.g., p1): " HOSTNAME_POD_ID
# Validate inputs (basic check if they are not empty)
if [ -z "$HOSTNAME_RACK_ID" ] || [ -z "$HOSTNAME_POD_ID" ]; then
echo "Error: Hostname rack number and pod number cannot be empty."
exit 1
fi
echo # Adding a blank line for better readability
echo "Using Hostname Rack ID (and for 'set rack' command): $HOSTNAME_RACK_ID"
echo "Using Hostname Pod ID: $HOSTNAME_POD_ID"
echo "Hostname formats will be:"
echo " Compute trays: ${HOSTNAME_RACK_ID}-<RU>-${HOSTNAME_POD_ID}-gb200-n<node_number>"
echo " NVLink switches: ${HOSTNAME_RACK_ID}-<RU>-${HOSTNAME_POD_ID}-nvsw-n<switch_number>"
echo " Power shelves: ${HOSTNAME_RACK_ID}-<RU>-${HOSTNAME_POD_ID}-pwr-n<shelf_number>"
echo # Adding a blank line
# --- First range of nodes (01 to 08) ---
k_rack_position=11 # Initialize k_rack_position for the first set of nodes
echo "Processing nodes 01 to 08..."
for i in {01..08}; do
# The host_RU matches the rack position
host_RU=$k_rack_position
# Construct the device hostname
device_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-gb200-n${i}"
echo "Target Device: $device_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
cmsh -c "device; use ${device_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
k_rack_position=$((k_rack_position + 1))
done
echo "Finished processing nodes 01 to 08."
echo # Adding a blank line
# --- Second range of nodes (09 to 18) ---
k_rack_position=28 # Re-initialize k_rack_position for the second set of nodes
echo "Processing nodes 09 to 18..."
for i in {09..18}; do
# The host_RU matches the rack position
host_RU=$k_rack_position
# Construct the device hostname
device_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-gb200-n${i}"
echo "Target Device: $device_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
cmsh -c "device; use ${device_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
k_rack_position=$((k_rack_position + 1))
done
echo "Finished processing nodes 09 to 18."
echo # Adding a blank line
# --- Third range - NVLink Switches (01 to 09) ---
k_rack_position=19 # Initialize k_rack_position for NVLink switches
echo "Processing NVLink switches 01 to 09..."
for i in {01..09}; do
# The host_RU matches the rack position
host_RU=$k_rack_position
# Construct the NVLink switch hostname
switch_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-nvsw-n${i}"
echo "Target NVLink Switch: $switch_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
cmsh -c "device; use ${switch_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
k_rack_position=$((k_rack_position + 1))
done
echo "Finished processing NVLink switches 01 to 09."
echo # Adding a blank line
# --- Fourth range - Power Shelves (01 to 04) - Below first compute tray group ---
k_rack_position=6 # Initialize k_rack_position for power shelves (below first compute group)
echo "Processing Power shelves 01 to 04..."
for i in {01..04}; do
# The host_RU matches the rack position
host_RU=$k_rack_position
# Construct the power shelf hostname
pwr_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-pwr-n${i}"
echo "Target Power Shelf: $pwr_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
cmsh -c "device; use ${pwr_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
k_rack_position=$((k_rack_position + 1))
done
echo "Finished processing Power shelves 01 to 04."
echo # Adding a blank line
# --- Fifth range - Power Shelves (05 to 08) - Above top compute tray group ---
k_rack_position=39 # Initialize k_rack_position for power shelves (above second compute group)
echo "Processing Power shelves 05 to 08..."
for i in {05..08}; do
# The host_RU matches the rack position
host_RU=$k_rack_position
# Construct the power shelf hostname
pwr_hostname="${HOSTNAME_RACK_ID}-${host_RU}-${HOSTNAME_POD_ID}-pwr-n${i}"
echo "Target Power Shelf: $pwr_hostname, Setting Rack Position $k_rack_position in Rack ${HOSTNAME_RACK_ID}"
cmsh -c "device; use ${pwr_hostname}; set rack ${HOSTNAME_RACK_ID} $k_rack_position; commit"
k_rack_position=$((k_rack_position + 1))
done
echo "Finished processing Power shelves 05 to 08."
echo "All nodes, switches, and power shelves processed. Confirm in cmsh with: rack; display <rack number>"
After the nodes have been added, check if the rack positions look correct.
cmsh -c "rack; display <rack number>"
or
cmsh -c "rack; list"  # check that the nodes are in the expected positions
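Beyond eyeballing the listing, the check can be scripted: the helper below compares a device's rack position against the RU embedded in its hostname (`<rack>-<RU>-<pod>-...`). This is a sketch with illustrative sample values; in practice the hostname/position pairs would be parsed from the `rack; list` output.

```shell
#!/bin/bash
# Sketch: verify that a device's rack position matches the RU embedded in its
# hostname (<rack>-<RU>-<pod>-<type>-nNN). Sample inputs are illustrative.
check_position() {
  local hostname=$1 position=$2
  local ru
  ru=$(echo "$hostname" | cut -d- -f2)  # second dash-separated field is the RU
  if [ "$ru" = "$position" ]; then
    echo "OK       $hostname @ RU $position"
  else
    echo "MISMATCH $hostname @ RU $position (hostname says RU $ru)"
  fi
}
check_position "A05-11-P1-gb200-n01" 11  # consistent sample entry
check_position "A05-27-P1-nvsw-n09" 26   # deliberately inconsistent sample
```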
Rack position reference for a GB200 rack
| Rack Position | Device Type | Device # | Example Hostname |
|---|---|---|---|
| 48-46 | Infrastructure | N/A | (Empty/Other infrastructure) |
| 45-44 | SN2201 TOR/OOB | N/A | (TOR/OOB switch) |
| 43 | Infrastructure | N/A | (Empty/Other infrastructure) |
| 42 | Power Shelf | 08 | A05-42-P1-pwr-n08 |
| 41 | Power Shelf | 07 | A05-41-P1-pwr-n07 |
| 40 | Power Shelf | 06 | A05-40-P1-pwr-n06 |
| 39 | Power Shelf | 05 | A05-39-P1-pwr-n05 |
| 38 | Infrastructure | N/A | (Empty) |
| 37 | Compute Tray | 18 | A05-37-P1-gb200-n18 |
| 36 | Compute Tray | 17 | A05-36-P1-gb200-n17 |
| 35 | Compute Tray | 16 | A05-35-P1-gb200-n16 |
| 34 | Compute Tray | 15 | A05-34-P1-gb200-n15 |
| 33 | Compute Tray | 14 | A05-33-P1-gb200-n14 |
| 32 | Compute Tray | 13 | A05-32-P1-gb200-n13 |
| 31 | Compute Tray | 12 | A05-31-P1-gb200-n12 |
| 30 | Compute Tray | 11 | A05-30-P1-gb200-n11 |
| 29 | Compute Tray | 10 | A05-29-P1-gb200-n10 |
| 28 | Compute Tray | 09 | A05-28-P1-gb200-n09 |
| 27 | NVLink Switch | 09 | A05-27-P1-nvsw-n09 |
| 26 | NVLink Switch | 08 | A05-26-P1-nvsw-n08 |
| 25 | NVLink Switch | 07 | A05-25-P1-nvsw-n07 |
| 24 | NVLink Switch | 06 | A05-24-P1-nvsw-n06 |
| 23 | NVLink Switch | 05 | A05-23-P1-nvsw-n05 |
| 22 | NVLink Switch | 04 | A05-22-P1-nvsw-n04 |
| 21 | NVLink Switch | 03 | A05-21-P1-nvsw-n03 |
| 20 | NVLink Switch | 02 | A05-20-P1-nvsw-n02 |
| 19 | NVLink Switch | 01 | A05-19-P1-nvsw-n01 |
| 18 | Compute Tray | 08 | A05-18-P1-gb200-n08 |
| 17 | Compute Tray | 07 | A05-17-P1-gb200-n07 |
| 16 | Compute Tray | 06 | A05-16-P1-gb200-n06 |
| 15 | Compute Tray | 05 | A05-15-P1-gb200-n05 |
| 14 | Compute Tray | 04 | A05-14-P1-gb200-n04 |
| 13 | Compute Tray | 03 | A05-13-P1-gb200-n03 |
| 12 | Compute Tray | 02 | A05-12-P1-gb200-n02 |
| 11 | Compute Tray | 01 | A05-11-P1-gb200-n01 |
| 10 | Infrastructure | N/A | (Empty) |
| 9 | Power Shelf | 04 | A05-9-P1-pwr-n04 |
| 8 | Power Shelf | 03 | A05-8-P1-pwr-n03 |
| 7 | Power Shelf | 02 | A05-7-P1-pwr-n02 |
| 6 | Power Shelf | 01 | A05-6-P1-pwr-n01 |
| 5-1 | Infrastructure | N/A | (Empty/Other infrastructure) |
The power shelves are positioned to distribute power to the compute tray groups above and below them.