GB200/GB300 Rack Configuration Verification Checklist

This checklist helps you verify that every GB200/GB300 rack configuration was completed correctly, whichever rack configuration process you followed. Work through it systematically before proceeding to rack power-on and provisioning.

General Prerequisites

  • [ ] Rack inventory file obtained from the factory after L11 testing (required for automated or manual import)

  • [ ] Point-to-Point (P2P) documentation, including MAC addresses, available

  • [ ] Site information and IP allocation plan documented

  • [ ] GB200/GB300 and NVLink Switch categories configured in BCM

  • [ ] Network subnets defined and available for device assignment

Automated Rack Import Process Verification

If you used the automated bcm-netautogen tool and bcm-post-install automation:

Prerequisites Check

  • [ ] Rack inventory file from the factory available

  • [ ] Point-to-Point (P2P) file available

  • [ ] siteinformation.yaml file properly configured

  • [ ] bcm-netautogen tool available and configured

Process Verification

  • [ ] bcm-netautogen tool executed successfully with all required input files

  • [ ] A .json file generated for every rack component:
    • [ ] 18 GB200/GB300 compute tray .json files

    • [ ] 9 NVLink Switch tray .json files

    • [ ] 8 power shelf .json files

  • [ ] bcm-post-install automation completed successfully

  • [ ] All .json files imported into BCM without errors
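Part of the file check above can be scripted before import. The following is a minimal Python sketch. It assumes that all generated .json files are in a single directory. It only confirms that each file parses as JSON and that the number of files matches the 35 components listed (18 + 9 + 8). It does not validate the BCM import schema.

```python
import json
from pathlib import Path

# Expected per-rack component count, taken from the checklist:
# 18 compute trays + 9 NVLink Switch trays + 8 power shelves.
EXPECTED_TOTAL = 18 + 9 + 8


def check_json_dir(directory):
    """Parse every *.json file in `directory`.

    Returns (parsed_count, errors), where errors maps file name -> parse error.
    Only JSON syntax is checked, not the BCM import format.
    """
    errors = {}
    parsed = 0
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open() as fh:
                json.load(fh)
            parsed += 1
        except json.JSONDecodeError as exc:
            errors[path.name] = str(exc)
    return parsed, errors


if __name__ == "__main__":
    import sys

    count, errs = check_json_dir(sys.argv[1] if len(sys.argv) > 1 else ".")
    for name, msg in errs.items():
        print(f"INVALID {name}: {msg}")
    print(f"{count} valid files, expected {EXPECTED_TOTAL}")
```

Run the script against the output directory of bcm-netautogen before the import step. A count other than 35 or any INVALID line means the files should be regenerated or fixed first.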

Naming Convention Verification

  • [ ] GB200/GB300 compute trays follow: <RACK>-<RU>-P[1-16]-<ROLE>-0[1-8]-C0[1-18]

  • [ ] NVLink switches follow: <RACK>-<RU>-P[1-16]-<switch_role>-0[1-9]

  • [ ] Power shelves follow proper naming convention
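The compute tray and switch patterns above can be checked mechanically. The following Python sketch is a hypothetical reading of those patterns. It assumes that <RACK>, <RU>, and <ROLE> are alphanumeric tokens without hyphens. It reads P[1-16] as P1 to P16, 0[1-8] as 01 to 08, 0[1-9] as 01 to 09, and C0[1-18] as C01 to C18. Adjust the expressions to your site's actual convention.

```python
import re

# Hypothetical interpretation of the documented patterns (see lead-in):
#   <RACK>-<RU>-P[1-16]-<ROLE>-0[1-8]-C0[1-18]   (compute trays)
#   <RACK>-<RU>-P[1-16]-<switch_role>-0[1-9]      (NVLink switches)
COMPUTE_RE = re.compile(
    r"(?P<rack>[A-Za-z0-9]+)-(?P<ru>[A-Za-z0-9]+)-P(?P<pod>[1-9]|1[0-6])"
    r"-(?P<role>[A-Za-z0-9]+)-0(?P<group>[1-8])-C(?P<tray>0[1-9]|1[0-8])"
)
SWITCH_RE = re.compile(
    r"(?P<rack>[A-Za-z0-9]+)-(?P<ru>[A-Za-z0-9]+)-P(?P<pod>[1-9]|1[0-6])"
    r"-(?P<role>[A-Za-z0-9]+)-0(?P<switch>[1-9])"
)


def invalid_names(hostnames, pattern):
    """Return the hostnames that do not fully match `pattern`."""
    return [h for h in hostnames if not pattern.fullmatch(h)]
```

For example, the check flags a tray name ending in C19, and a switch name ending in -10, because both are out of range.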

Manual Rack Import Process Verification

If you manually created and imported .json files:

File Creation Verification

  • [ ] Rack inventory file reviewed (or MAC addresses collected manually)

  • [ ] Individual .json files created for each component:

    • [ ] 18 GB200/GB300 compute tray .json files with proper MAC addresses

    • [ ] 9 NVLink Switch tray .json files with proper MAC addresses

    • [ ] 8 power shelf .json files

  • [ ] All .json files use valid JSON syntax and follow the BCM import format

  • [ ] IP addresses assigned according to the defined network subnets
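When MAC addresses are copied into .json files by hand, typos and copy-paste duplicates are the most common errors. The following Python sketch checks a collection of MACs for format and uniqueness. The input mapping and its "<hostname>/<interface>" labels are hypothetical; transcribe them from the rack inventory or P2P file. Colon-separated notation is assumed.

```python
import re
from collections import defaultdict

# Colon-separated six-octet MAC address, e.g. "02:00:00:00:00:01".
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def check_macs(mac_by_interface):
    """Validate MAC addresses collected for the manual .json files.

    `mac_by_interface` maps a label such as "<hostname>/bmc" (hypothetical
    naming) to a MAC string. Returns (malformed_labels, duplicates), where
    duplicates maps a lower-cased MAC to every label that uses it.
    """
    malformed = []
    seen = defaultdict(list)
    for label, mac in mac_by_interface.items():
        if not MAC_RE.fullmatch(mac):
            malformed.append(label)
            continue
        seen[mac.lower()].append(label)
    duplicates = {mac: labels for mac, labels in seen.items() if len(labels) > 1}
    return malformed, duplicates
```

MACs are compared case-insensitively, so "0A" and "0a" are reported as the same address.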

Import Verification

  • [ ] All .json files successfully imported into BCM

  • [ ] No import errors or warnings in BCM logs

  • [ ] All devices appear in BCM device list

Naming Convention Verification

  • [ ] All hostnames follow: <Rack Location>-<RU>-<POD Number>-<Tray Type>-<node number>

  • [ ] Tray types properly designated: DGX, NVSW, or PWR
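The hostname pattern and tray types above can be combined into one per-rack audit that also confirms the 18/9/8 split. The following Python sketch assumes that every field except the tray type is an alphanumeric token without hyphens. That field format is an assumption, not part of the documented convention.

```python
import re
from collections import Counter

# Hypothetical reading of
#   <Rack Location>-<RU>-<POD Number>-<Tray Type>-<node number>
# where only the tray type (DGX, NVSW, PWR) is fixed by the checklist.
MANUAL_NAME_RE = re.compile(
    r"(?P<rack>[A-Za-z0-9]+)-(?P<ru>[A-Za-z0-9]+)-(?P<pod>[A-Za-z0-9]+)"
    r"-(?P<tray>DGX|NVSW|PWR)-(?P<node>[0-9]+)"
)
# Expected devices per rack, from the checklist.
EXPECTED = {"DGX": 18, "NVSW": 9, "PWR": 8}


def audit_hostnames(hostnames):
    """Return (non_conforming_names, count_mismatches) for one rack.

    count_mismatches maps tray type -> (found, expected).
    """
    bad, counts = [], Counter()
    for name in hostnames:
        match = MANUAL_NAME_RE.fullmatch(name)
        if match is None:
            bad.append(name)
        else:
            counts[match.group("tray")] += 1
    mismatches = {t: (counts[t], n) for t, n in EXPECTED.items() if counts[t] != n}
    return bad, mismatches
```

An empty result, ([], {}), means every hostname matched and the per-type counts are as expected.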

Manual Addition of GB200/GB300 Rack Entries Verification

If you manually added GB200/GB300 compute trays using cmsh commands:

Rack Entry Verification

  • [ ] Rack entry created with proper coordinates

  • [ ] Rack number matches site documentation

GB200/GB300 Compute Tray Golden Node

  • [ ] Physical node created with proper hostname

  • [ ] GB200/GB300 category assigned correctly

  • [ ] BMC interface configured (rf0 preferred, ipmi0 as fallback)

  • [ ] BMC IP, network, and MAC address configured

Network Interface Configuration

  • [ ] BlueField and CX-7/CX-8 interfaces added:
    • [ ] M1 (enP6p3s0f0np0) with correct MAC

    • [ ] M2 (enP22p3s0f0np0) with correct MAC (not present on GB300, which has only M1)

    • [ ] S1 (enP6p3s0f1np1) with correct MAC and storage network

    • [ ] S2 (enP22p3s0f1np1) with correct MAC and storage network (not present on GB300, which has only S1)
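The platform difference above is easy to get wrong when cloning from a golden node. The following Python sketch encodes the interface names from this checklist and reports which expected interfaces are missing. The `configured` list is a hypothetical input, for example names transcribed from `interfaces; list` output.

```python
# Interface roles and names as listed in this checklist. GB300 compute trays
# expose only M1 and S1, so M2 and S2 are not expected on that platform.
INTERFACES = {
    "M1": "enP6p3s0f0np0",
    "M2": "enP22p3s0f0np0",
    "S1": "enP6p3s0f1np1",
    "S2": "enP22p3s0f1np1",
}


def expected_interfaces(platform):
    """Return {role: interface_name} expected on a GB200 or GB300 compute tray."""
    if platform == "GB200":
        roles = ("M1", "M2", "S1", "S2")
    elif platform == "GB300":
        roles = ("M1", "S1")
    else:
        raise ValueError(f"unknown platform: {platform}")
    return {role: INTERFACES[role] for role in roles}


def missing_interfaces(platform, configured):
    """Return the expected interface names absent from `configured`, sorted."""
    return sorted(set(expected_interfaces(platform).values()) - set(configured))
```

Extra interfaces in `configured`, such as rf0, are ignored. The check only reports what is missing.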

Bond Configuration

  • [ ] bond0 created with the M1 and M2 interfaces (GB200 only; GB300 trays have no bond0)
    • [ ] Bond mode 4 (LACP) configured

    • [ ] Bond IP assigned on internal/management network

    • [ ] Bond set as provisioning interface
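The bond rules above can be written as a small check. In Linux bonding, mode 4 is 802.3ad (LACP). The following Python sketch uses a hypothetical `BondConfig` record that you would fill in from the node's interface settings. It is not a BCM data structure.

```python
from dataclasses import dataclass, field


@dataclass
class BondConfig:
    """Hypothetical summary of a node's bond0, transcribed by hand."""
    members: set = field(default_factory=set)
    mode: int = 0
    ip: str = ""
    is_provisioning_interface: bool = False


def bond_problems(platform, bond):
    """Return checklist violations; pass bond=None when no bond0 exists."""
    if platform == "GB300":
        # GB300 trays have only M1, so no bond0 is expected.
        return [] if bond is None else ["GB300 trays should not have bond0"]
    if bond is None:
        return ["bond0 missing"]
    problems = []
    if bond.members != {"M1", "M2"}:
        problems.append(f"unexpected members: {sorted(bond.members)}")
    if bond.mode != 4:
        problems.append(f"mode {bond.mode} is not 4 (802.3ad/LACP)")
    if not bond.ip:
        problems.append("no IP assigned on the internal/management network")
    if not bond.is_provisioning_interface:
        problems.append("bond0 is not the provisioning interface")
    return problems
```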

InfiniBand Interfaces

  • [ ] Four InfiniBand interfaces added (if required):
    • [ ] ibp3s0 with compute network assignment

    • [ ] ibP2p3s0 with compute network assignment

    • [ ] ibP16p3s0 with compute network assignment

    • [ ] ibP18p3s0 with compute network assignment

System Configuration

  • [ ] System MAC set for initial boot (M1 or M2; M1 on GB300)

Node Cloning

  • [ ] Remaining compute tray entries cloned from the golden node (18 compute trays in total)

  • [ ] All cloned nodes have sequentially incremented IP addresses

  • [ ] All cloned nodes have unique MAC addresses

  • [ ] Rack positions set correctly for all nodes
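The cloning checks above can be cross-checked with Python's standard `ipaddress` module. In the following sketch, `clone_plan` lists the sequential addresses expected from the golden node's IP. `check_clones` verifies subnet membership, sequential IPs, and unique MACs for a hypothetical hostname -> (ip, mac) mapping transcribed from BCM.

```python
import ipaddress


def clone_plan(golden_ip, count=18):
    """Return `count` sequential IPv4 addresses starting at the golden node's IP."""
    start = ipaddress.IPv4Address(golden_ip)
    return [str(start + i) for i in range(count)]


def check_clones(nodes, subnet):
    """`nodes` maps hostname -> (ip, mac). Return a list of problems found."""
    net = ipaddress.IPv4Network(subnet)
    problems = []
    addrs = [ipaddress.IPv4Address(ip) for ip, _ in nodes.values()]
    outside = [str(a) for a in addrs if a not in net]
    if outside:
        problems.append(f"outside {net}: {outside}")
    ordered = sorted(int(a) for a in addrs)
    if len(set(ordered)) != len(ordered):
        problems.append("duplicate IP addresses")
    elif any(b - a != 1 for a, b in zip(ordered, ordered[1:])):
        problems.append("IP addresses are not sequential")
    macs = [mac.lower() for _, mac in nodes.values()]
    if len(set(macs)) != len(macs):
        problems.append("duplicate MAC addresses")
    return problems
```

Comparing integer forms of the addresses keeps the gap check simple. A single out-of-sequence IP or a MAC copied unchanged from the golden node then shows up as a problem.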

Final Rack Verification

Device Inventory Check

  • [ ] Device count in BCM matches the expected count:
    • [ ] 18 GB200/GB300 compute trays

    • [ ] 9 NVLink switches

    • [ ] 8 power shelves

  • [ ] All devices show correct status in BCM
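The count and status checks above can be combined into one rack report. The following Python sketch works on hypothetical (hostname, kind, status) tuples transcribed from the BCM device list. The kind labels are illustrative. The set of acceptable status values is a parameter because the correct status depends on the rack's current phase.

```python
from collections import Counter

# Expected devices per rack, from the checklist; kind labels are hypothetical.
EXPECTED_PER_RACK = {"compute": 18, "nvswitch": 9, "powershelf": 8}


def inventory_report(devices, healthy_states):
    """`devices` is a list of (hostname, kind, status) tuples for one rack.

    Returns (count_mismatches, unhealthy): count_mismatches maps
    kind -> (found, expected), and unhealthy lists hostnames whose status
    is not in `healthy_states`.
    """
    counts = Counter(kind for _, kind, _ in devices)
    mismatches = {
        kind: (counts[kind], want)
        for kind, want in EXPECTED_PER_RACK.items()
        if counts[kind] != want
    }
    unhealthy = [name for name, _, status in devices if status not in healthy_states]
    return mismatches, unhealthy
```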

Network Configuration Validation

  • [ ] All devices have correct IP assignments

  • [ ] Connectivity between the management network and all devices verified

  • [ ] BMC connectivity tested for each compute tray and switch
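BMC connectivity can also be spot-checked from a management host in parallel. The following Python sketch attempts a TCP connection to each BMC. It assumes that the BMCs serve Redfish on the default HTTPS port 443 (the checklist prefers the rf0 Redfish interface). The port, timeout, and host list are assumptions to adjust. A successful connection shows only that the port is reachable, not that credentials or BCM settings are correct.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def tcp_reachable(host, port=443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def unreachable(hosts, port=443, timeout=2.0):
    """Probe `hosts` in parallel; return those that did not accept a connection."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(lambda h: tcp_reachable(h, port, timeout), hosts))
    return [h for h, ok in zip(hosts, results) if not ok]
```

Pass the BMC IP addresses from your site records as `hosts`. Investigate any address that is returned before moving on.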

Rack Physical Layout

  • [ ] All devices assigned to proper rack positions

  • [ ] Rack layout in BCM matches physical hardware

  • [ ] Device naming conforms to site conventions

Documentation and Naming

  • [ ] Hostnames follow established site naming standards

  • [ ] Device categories assigned correctly

  • [ ] MAC addresses are documented and match physical assets

  • [ ] IP allocations are documented in site/network records

Power and Environmental

  • [ ] Power shelf configuration complete

  • [ ] Environmental monitoring enabled (if required)

  • [ ] Rack power requirements documented

Ready for Next Steps

High Availability and Networking

  • [ ] All rack components configured and ready

  • [ ] Network infrastructure validated

  • [ ] Power infrastructure validated

Provisioning Readiness

  • [ ] All GB200/GB300 compute trays are ready for provisioning

  • [ ] All NVLink switches are ready for configuration

  • [ ] Rack management system is operational

Commands for Quick Verification

Use these commands to quickly verify rack configurations:

List All Rack Devices

cmsh -c "device; list"
cmsh -c "device; list -c gb200"

Check Rack Layout

cmsh -c "rack; display <rack_number>"
cmsh -c "rack; list"

Verify Network Assignments

cmsh -c "device use <device_name>; interfaces; list"
cmsh -c "device use <device_name>; ping"
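To run the interface listing for every device in a rack, wrap the cmsh invocation in a loop. The following Python sketch builds the same `cmsh -c "device use <device_name>; interfaces; list"` command for each hostname. By default it only prints the commands (dry run). With dry_run=False it executes them through `subprocess`, which requires running on the BCM head node where cmsh is available. The device names are hypothetical placeholders.

```python
import shlex
import subprocess


def interface_check_command(device):
    """Build the cmsh invocation from this section for one device."""
    # Passing an argument list avoids shell quoting issues with the ';' separators.
    return ["cmsh", "-c", f"device use {device}; interfaces; list"]


def run_for_all(devices, dry_run=True):
    """Print (dry run) or execute the interface listing for every device."""
    for device in devices:
        cmd = interface_check_command(device)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Hypothetical hostnames; replace with the devices in your rack.
    run_for_all(["node001", "node002"])
```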

Check BMC Connectivity

cmsh -c "device use <device_name>; bmcsettings; show"

Verify Switch ZTP Status (if applicable). Open an SSH session to the switch through cmsh, then run the status command on the switch itself:

cmsh -c "device; use <switch_name>; ssh"
sudo ztp status

Monitor Switch Health (if applicable)

cmsh -c "device; use <switch_name>; latesthealthdata"

Next Steps

Once all items in this checklist are verified:

  1. Proceed to High Availability if HA is required

  2. Begin the rack power-on sequence described in GB200/GB300 Rack Power On and Bring Up

  3. Start the compute node provisioning process

Note

Complete this checklist before proceeding to the rack bring-up phase. If any item is incomplete, return to the corresponding configuration section and resolve it before continuing.