GB200/GB300 Rack Configuration Verification Checklist#
This checklist helps verify that all GB200/GB300 rack configuration steps were completed correctly, whichever rack configuration process you followed. Work through it systematically before proceeding to rack power-on and provisioning.
General Prerequisites#
[ ] Rack inventory file obtained from the factory after L11 testing (for automated or manual import)
[ ] Point-to-Point (P2P) documentation available with MAC addresses
[ ] Site information and IP allocation plan documented
[ ] GB200/GB300 and NVLink Switch categories configured in BCM
[ ] Network subnets defined and available for device assignment
Automated Rack Import Process Verification#
If you used the automated bcm-netautogen tool and bcm-post-install automation:
Prerequisites Check
[ ] Rack inventory file from factory available
[ ] Point-to-Point (P2P) file available
[ ] siteinformation.yaml file properly configured
[ ] bcm-netautogen tool available and configured
Process Verification
[ ] bcm-netautogen tool executed successfully with all required input files
- [ ] All .json files generated for each rack component:
[ ] 18 GB200/GB300 compute tray .json files
[ ] 9 NVLink Switch tray .json files
[ ] 8 power shelf .json files
[ ] bcm-post-install automation completed successfully
[ ] All .json files imported into BCM without errors
Naming Convention Verification
[ ] GB200/GB300 compute trays follow:
<RACK>-<RU>-P[1-16]-<ROLE>-0[1-8]-C0[1-18]
[ ] NVLink switches follow:
<RACK>-<RU>-P[1-16]-<switch_role>-0[1-9]
[ ] Power shelves follow the proper naming convention
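The naming patterns above can be checked with a short script before import. The Python sketch below is a hypothetical helper and is not part of bcm-netautogen or BCM. It rests on several assumptions:

- The bracketed parts are numeric ranges: pod P1 to P16, role index 01 to 08 for compute trays and 01 to 09 for switches, and compute tray index C01 to C18.
- <RACK>, <RU>, and the role name contain no hyphens.

Adjust the patterns if your site convention differs.

```python
import re

# Hypothetical patterns derived from the naming conventions above.
# Assumption: <RACK>, <RU>, and role tokens contain no hyphens.
COMPUTE_RE = re.compile(
    r"^(?P<rack>[^-]+)-(?P<ru>[^-]+)-P(?P<pod>\d{1,2})-(?P<role>[^-]+)"
    r"-(?P<idx>0[1-8])-C(?P<tray>\d{2})$"
)
SWITCH_RE = re.compile(
    r"^(?P<rack>[^-]+)-(?P<ru>[^-]+)-P(?P<pod>\d{1,2})-(?P<role>[^-]+)"
    r"-(?P<idx>0[1-9])$"
)


def _pod_ok(match: re.Match) -> bool:
    """Pod number must fall in P1..P16."""
    return 1 <= int(match["pod"]) <= 16


def is_valid_compute_name(name: str) -> bool:
    m = COMPUTE_RE.match(name)
    # Compute tray index must be C01..C18.
    return bool(m) and _pod_ok(m) and 1 <= int(m["tray"]) <= 18


def is_valid_switch_name(name: str) -> bool:
    m = SWITCH_RE.match(name)
    return bool(m) and _pod_ok(m)


if __name__ == "__main__":
    print(is_valid_compute_name("R01-U10-P1-GPU-01-C07"))   # True
    print(is_valid_switch_name("R01-U20-P1-NVSW-03"))       # True
    print(is_valid_compute_name("R01-U10-P17-GPU-01-C07"))  # False: pod out of range
```

The example hostnames are made up. Feed the script the hostnames you plan to import, and investigate any name that returns False before running the import.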
Manual Rack Import Process Verification#
If you manually created and imported .json files:
File Creation Verification
[ ] Rack inventory file reviewed (or MAC addresses collected manually)
[ ] Individual .json files created for each component:
[ ] 18 GB200/GB300 compute tray .json files with proper MAC addresses
[ ] 9 NVLink Switch tray .json files with proper MAC addresses
[ ] 8 power shelf .json files
[ ] All .json files follow proper BCM format and syntax
[ ] IP addresses assigned correctly according to the defined network subnets
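Before importing, it can help to confirm that every file parses as JSON and that any MAC address it carries is well-formed. This Python sketch is a hypothetical pre-import check. It assumes the files sit in one directory and that each file stores its MAC under a top-level "mac" key. That key name is an assumption; the real BCM import format may name or nest the field differently. Files without the key, such as power shelf files, are only checked for valid JSON.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import List

# Colon-separated, six-octet MAC address, e.g. 00:11:22:AA:BB:CC.
MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def check_json_files(directory: str, mac_key: str = "mac") -> List[str]:
    """Return problems found in *.json files under `directory`.

    `mac_key` is a hypothetical field name; adjust it to the schema
    your BCM version actually uses.
    """
    problems = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg}, line {exc.lineno})")
            continue
        mac = data.get(mac_key) if isinstance(data, dict) else None
        if mac is not None and not MAC_RE.match(str(mac)):
            problems.append(f"{path.name}: malformed MAC {mac!r}")
    return problems


if __name__ == "__main__":
    # Demonstration on a scratch directory containing one broken file.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "tray01.json").write_text('{"mac": "00:11:22:AA:BB:CC"}')
        Path(tmp, "tray02.json").write_text('{"mac": "00:11:22:AA:BB"')
        for problem in check_json_files(tmp):
            print(problem)
```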
Import Verification
[ ] All .json files successfully imported into BCM
[ ] No import errors or warnings in BCM logs
[ ] All devices appear in BCM device list
Naming Convention Verification
[ ] All hostnames follow:
<Rack Location>-<RU>-<POD Number>-<Tray Type>-<node number>
[ ] Tray types properly designated: DGX, NVSW, or PWR
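A hostname in this five-field form can be split and checked field by field. The Python sketch below is a hypothetical helper with three assumptions:

- No field contains a hyphen.
- Only DGX, NVSW, and PWR are valid tray types.
- Node numbers run from 1 up to the per-rack count in this checklist: 18 compute trays, 9 NVLink switch trays, and 8 power shelves.

That last range mapping is an assumption, not a documented BCM rule.

```python
from typing import NamedTuple, Optional

# Hypothetical node-number limits per tray type, based on the per-rack
# counts in this checklist (18 compute, 9 NVLink switch, 8 power shelf).
MAX_NODE = {"DGX": 18, "NVSW": 9, "PWR": 8}


class Hostname(NamedTuple):
    rack_location: str
    ru: str
    pod: str
    tray_type: str
    node: int


def parse_hostname(name: str) -> Optional[Hostname]:
    """Split <Rack Location>-<RU>-<POD Number>-<Tray Type>-<node number>.

    Returns None when the name does not have five fields, the tray type
    is not DGX/NVSW/PWR, or the node number is out of range.
    """
    parts = name.split("-")
    if len(parts) != 5:
        return None
    rack, ru, pod, tray, node = parts
    if tray not in MAX_NODE or not node.isdigit():
        return None
    number = int(node)
    if not 1 <= number <= MAX_NODE[tray]:
        return None
    return Hostname(rack, ru, pod, tray, number)


if __name__ == "__main__":
    print(parse_hostname("A12-U05-P3-NVSW-04"))
    print(parse_hostname("A12-U05-P3-PWR-09"))  # None: only 8 power shelves
```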
Manual Addition of GB200/GB300 Rack Entries Verification#
If you manually added GB200/GB300 compute trays using cmsh commands:
Rack Entry Verification
[ ] Rack entry created with proper coordinates
[ ] Rack number matches site documentation
GB200/GB300 Compute Tray Golden Node
[ ] Physical node created with proper hostname
[ ] GB200/GB300 category assigned correctly
[ ] BMC interface configured (rf0 preferred, ipmi0 as fallback)
[ ] BMC IP, network, and MAC address configured
Network Interface Configuration
- [ ] BlueField and CX-7/CX-8 interfaces added:
[ ] M1 (enP6p3s0f0np0) with correct MAC
[ ] M2 (enP22p3s0f0np0) with correct MAC (For GB300, only M1 is present)
[ ] S1 (enP6p3s0f1np1) with correct MAC and storage network
[ ] S2 (enP22p3s0f1np1) with correct MAC and storage network (For GB300, only S1 is present)
Bond Configuration
- [ ] bond0 created with M1 and M2 interfaces (GB200 only; GB300 has no bond0)
[ ] Bond mode 4 (LACP) configured
[ ] Bond IP assigned on internal/management network
[ ] Bond set as provisioning interface
InfiniBand Interfaces
- [ ] The following four InfiniBand interfaces added if needed:
[ ] ibp3s0 with compute network assignment
[ ] ibP2p3s0 with compute network assignment
[ ] ibP16p3s0 with compute network assignment
[ ] ibP18p3s0 with compute network assignment
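The interface expectations above differ between GB200 and GB300. The Python sketch below encodes them as sets so that a tray's configured interface names can be compared against them. The interface names and the M1/M2/S1/S2 mapping come from this checklist. The helper itself is hypothetical, and it assumes you have already collected the configured names for each tray, for example by reading the interface list in cmsh.

```python
from typing import Set, Tuple

# Names from the checklist: M1 = enP6p3s0f0np0, M2 = enP22p3s0f0np0,
# S1 = enP6p3s0f1np1, S2 = enP22p3s0f1np1.
ETHERNET = {
    "GB200": {"enP6p3s0f0np0", "enP22p3s0f0np0", "enP6p3s0f1np1", "enP22p3s0f1np1"},
    "GB300": {"enP6p3s0f0np0", "enP6p3s0f1np1"},  # GB300: only M1 and S1
}
INFINIBAND = {"ibp3s0", "ibP2p3s0", "ibP16p3s0", "ibP18p3s0"}
# Ethernet/bond names that are only valid on some platforms.
KNOWN_ETH = ETHERNET["GB200"] | {"bond0"}


def check_interfaces(
    platform: str, configured: Set[str], use_ib: bool = False
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) interface names for one compute tray."""
    expected = set(ETHERNET[platform])
    if platform == "GB200":
        expected.add("bond0")  # bond0 (M1 + M2, LACP) exists only on GB200
    if use_ib:
        expected |= INFINIBAND
    missing = expected - configured
    # Flag M2, S2, or bond0 if they appear where they should not exist.
    unexpected = (configured & KNOWN_ETH) - expected
    return missing, unexpected


if __name__ == "__main__":
    print(check_interfaces("GB300", {"enP6p3s0f0np0", "enP6p3s0f1np1", "bond0"}))
```

Names outside the known Ethernet and bond set, such as the BMC interface, are ignored by the "unexpected" check.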
System Configuration
[ ] System MAC set for initial boot (M1 or M2)
Node Cloning
[ ] 18 compute tray entries cloned from golden node
[ ] All cloned nodes have incremental IPs
[ ] All cloned nodes have unique MAC addresses
[ ] Rack positions set correctly for all nodes
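The cloning checks (entry count, incremental IPs, unique MACs) lend themselves to a quick script. The Python sketch below is hypothetical and makes two assumptions:

- Entries are given as (hostname, ip, mac) tuples in clone order, with the golden node first.
- "Incremental" means each IP is exactly one higher than the previous entry's IP.

Sites that leave gaps between addresses would need to relax that rule.

```python
import ipaddress
from typing import Dict, List, Tuple


def check_clones(nodes: List[Tuple[str, str, str]], expected_count: int = 18) -> List[str]:
    """Validate cloned entries given as (hostname, ip, mac), golden node first.

    Assumption: each IP must be exactly one higher than the previous one.
    """
    problems = []
    if len(nodes) != expected_count:
        problems.append(f"expected {expected_count} entries, found {len(nodes)}")
    ips = [ipaddress.ip_address(ip) for _, ip, _ in nodes]
    for i in range(1, len(nodes)):
        if ips[i] != ips[i - 1] + 1:
            problems.append(
                f"{nodes[i][0]}: IP {ips[i]} does not follow {nodes[i - 1][0]} ({ips[i - 1]})"
            )
    first_owner: Dict[str, str] = {}
    for name, _, mac in nodes:
        key = mac.lower()  # compare MACs case-insensitively
        if key in first_owner:
            problems.append(f"{name}: MAC {mac} duplicates {first_owner[key]}")
        else:
            first_owner[key] = name
    return problems


if __name__ == "__main__":
    trays = [(f"tray{i:02d}", f"10.0.0.{10 + i}", f"00:00:00:00:00:{i:02x}") for i in range(1, 19)]
    print(check_clones(trays))  # []
```

The same function works for the nine NVLink switch clones if called with expected_count=9.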
Manual Addition of NVLink Switch Entries Verification#
If you manually added NVLink switches using cmsh commands:
Basic Switch Entry Creation#
Switch Configuration#
[ ] Switch entry created in BCM with proper hostname
[ ] MAC address configured for switch
[ ] cm-lite-daemon enabled (hasclientdaemon = yes)
[ ] Switch kind set to nvlink
[ ] SNMP disabled (disablesnmp = yes)
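The three switch properties above have fixed expected values, so they can be compared mechanically. This Python sketch assumes the switch settings were captured as a dictionary of property name to string value, for example transcribed from the switch entry in cmsh. The key names mirror the checklist, but the exact property names BCM displays, particularly for the switch kind, may differ and should be treated as assumptions.

```python
from typing import Dict, List

# Expected values from the switch configuration checklist.
# Assumption: keys match the names BCM shows; adjust if they differ.
EXPECTED_SWITCH_SETTINGS = {
    "hasclientdaemon": "yes",
    "kind": "nvlink",
    "disablesnmp": "yes",
}


def switch_setting_mismatches(settings: Dict[str, str]) -> List[str]:
    """List properties that are missing or differ from the expected value."""
    mismatches = []
    for key, want in EXPECTED_SWITCH_SETTINGS.items():
        got = settings.get(key)
        # Compare case-insensitively and ignore surrounding whitespace.
        if got is None or got.strip().lower() != want:
            mismatches.append(f"{key}: expected {want!r}, found {got!r}")
    return mismatches


if __name__ == "__main__":
    print(switch_setting_mismatches({"hasclientdaemon": "yes", "kind": "nvlink"}))
```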
Network Interface Configuration#
[ ] eth0/COMe0 interface added with correct MAC
[ ] eth0 IP address assigned on management network
[ ] Management network properly configured
SSH Access Configuration#
[ ] Username and password configured for SSH access
[ ] REST port set to 443
[ ] SSH credentials tested and working
ZTP Settings Configuration (Optional)#
If ZTP was configured for automatic NVOS updates:
ZTP Directory Structure#
[ ] ZTP settings configured with proper templates
[ ] API enabled on switch
[ ] initialize command executed successfully
[ ] Switch-specific directory created: /cm/local/apps/cmd/etc/htdocs/switch/<switch_name>/
File Repository Setup#
[ ] NVOS firmware image files copied to: /cm/local/apps/cmd/etc/htdocs/switch/image/
[ ] Startup configuration files copied to switch-specific directories
[ ] Startup configuration modified with hashed password field
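On the head node, the ZTP file layout above can be checked with a few filesystem calls. The directory paths come from this checklist. The Python helper is hypothetical and only checks that the image directory and each switch directory exist and are non-empty. It does not check which files they contain or whether the startup configuration is valid.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Root of the ZTP repository, taken from the checklist above.
HTDOCS_SWITCH = Path("/cm/local/apps/cmd/etc/htdocs/switch")


def missing_ztp_paths(switch_names: Iterable[str], root: Path = HTDOCS_SWITCH) -> List[str]:
    """Report missing or empty ZTP directories under `root`."""
    missing = []
    image_dir = root / "image"
    if not image_dir.is_dir() or not any(image_dir.iterdir()):
        missing.append(f"{image_dir}: missing or empty (no NVOS image files)")
    for name in switch_names:
        switch_dir = root / name
        if not switch_dir.is_dir() or not any(switch_dir.iterdir()):
            missing.append(f"{switch_dir}: missing or empty (no startup configuration)")
    return missing


if __name__ == "__main__":
    # Demonstration on a scratch tree; on the head node, call
    # missing_ztp_paths(<switch names>) with the default root instead.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "image").mkdir()
        print(missing_ztp_paths(["sw01"], root))
```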
ZTP Parameters#
[ ] Configuration mode set to file
[ ] Startup YAML file specified correctly
[ ] JSON template configured
[ ] Image name specified for updates
[ ] checkimageonboot enabled
Switch Cloning#
[ ] 9 NVLink switch entries cloned from golden switch
[ ] All cloned switches have incremental IPs
[ ] All cloned switches have unique MAC addresses
[ ] ZTP settings inherited by all cloned switches
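To confirm that cloned switches inherited the golden switch's ZTP settings, you can compare each switch's settings against the golden entry key by key. The Python sketch below assumes the settings were captured as dictionaries. The setting names used in the example (mode, image, checkimageonboot) are hypothetical placeholders for the ZTP parameters listed above.

```python
from typing import Dict, List


def ztp_drift(golden: Dict[str, str], clones: Dict[str, Dict[str, str]]) -> List[str]:
    """Report ZTP settings on cloned switches that differ from the golden switch."""
    drift = []
    for switch, settings in sorted(clones.items()):
        # Check every key present on either side, so missing keys are caught too.
        for key in sorted(set(golden) | set(settings)):
            if golden.get(key) != settings.get(key):
                drift.append(
                    f"{switch}: {key} = {settings.get(key)!r}, golden has {golden.get(key)!r}"
                )
    return drift


if __name__ == "__main__":
    # Hypothetical setting names and values, for illustration only.
    golden = {"mode": "file", "image": "nvos.bin", "checkimageonboot": "yes"}
    clones = {"sw02": dict(golden), "sw03": {**golden, "checkimageonboot": "no"}}
    for line in ztp_drift(golden, clones):
        print(line)
```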
ZTP Process Execution#
[ ] Switch restart/reset completed to initiate ZTP
[ ] ZTP status monitored using cmdaemon logs
[ ] ZTP completion verified with SUCCESS status
[ ] CM Lite Daemon installed successfully on all switches
[ ] Switch monitoring and health data validated
Final Rack Verification#
Device Inventory Check#
- [ ] Device count in BCM matches expected:
[ ] 18 GB200/GB300 compute trays
[ ] 9 NVLink switches
[ ] 8 power shelves
[ ] All devices show correct status in BCM
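The expected device counts can be compared against an exported device list in a few lines. This Python sketch assumes you have reduced the BCM device list to (hostname, kind) pairs. The kind labels "compute", "nvlink", and "power" are hypothetical, so map your actual BCM categories onto them first.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Expected per-rack counts from this checklist; kind labels are hypothetical.
EXPECTED = {"compute": 18, "nvlink": 9, "power": 8}


def inventory_diff(devices: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {kind: (found, expected)} for every kind whose count is off."""
    counts = Counter(kind for _, kind in devices)
    kinds = set(EXPECTED) | set(counts)  # also catch unexpected kinds
    return {
        k: (counts[k], EXPECTED.get(k, 0))
        for k in sorted(kinds)
        if counts[k] != EXPECTED.get(k, 0)
    }


if __name__ == "__main__":
    devs = [(f"c{i}", "compute") for i in range(18)] + [(f"s{i}", "nvlink") for i in range(9)]
    print(inventory_diff(devs))  # {'power': (0, 8)}
```

An empty result means the counts match.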
Network Configuration Validation#
[ ] All devices have correct IP assignments
[ ] Network connectivity from the management network to all devices verified
[ ] BMC connectivity tested for each compute tray and switch
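Part of the IP check can be done offline: every address should fall inside its network's subnet and no address should be assigned twice. The Python sketch below is a hypothetical helper that checks one network at a time. It assumes an IPv4 subnet of /30 or larger, so that the network and broadcast addresses are reserved. It does not replace the live connectivity and BMC tests above.

```python
import ipaddress
from typing import Dict, List


def ip_assignment_problems(assignments: Dict[str, str], subnet: str) -> List[str]:
    """Check hostname -> IP assignments for one network (IPv4, /30 or larger)."""
    net = ipaddress.ip_network(subnet)
    problems = []
    owner: Dict[str, str] = {}
    for host, ip_text in sorted(assignments.items()):
        ip = ipaddress.ip_address(ip_text)
        if ip not in net:
            problems.append(f"{host}: {ip} is outside {net}")
        elif ip in (net.network_address, net.broadcast_address):
            problems.append(f"{host}: {ip} is the network or broadcast address")
        # Detect the same address given to two devices.
        if str(ip) in owner:
            problems.append(f"{host}: {ip} already assigned to {owner[str(ip)]}")
        else:
            owner[str(ip)] = host
    return problems


if __name__ == "__main__":
    plan = {"c01": "10.1.0.11", "c02": "10.1.0.12", "s01": "10.2.0.5"}
    for line in ip_assignment_problems(plan, "10.1.0.0/24"):
        print(line)
```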
Rack Physical Layout#
[ ] All devices assigned to proper rack positions
[ ] Rack layout in BCM matches physical hardware
[ ] Device naming conforms to site conventions
Documentation and Naming#
[ ] Hostnames follow established site naming standards
[ ] Device categories assigned correctly
[ ] MAC addresses are documented and match physical assets
[ ] IP allocations are documented in site/network records
Power and Environmental#
[ ] Power shelf configuration complete
[ ] Environmental monitoring enabled (if required)
[ ] Rack power requirements documented
Ready for Next Steps#
High Availability and Networking#
[ ] All rack components configured and ready
[ ] Network infrastructure validated
[ ] Power infrastructure validated
Provisioning Readiness#
[ ] All GB200/GB300 compute trays are ready for provisioning
[ ] All NVLink switches are ready for configuration
[ ] Rack management system is operational
Commands for Quick Verification#
Use these commands to quickly verify rack configurations:
List All Rack Devices
cmsh -c "device; list"
cmsh -c "device; list -c gb200"
Check Rack Layout
cmsh -c "rack; display <rack_number>"
cmsh -c "rack; list"
Verify Network Assignments
cmsh -c "device use <device_name>; interfaces; list"
cmsh -c "device use <device_name>; ping"
Check BMC Connectivity
cmsh -c "device use <device_name>; bmcsettings; show"
Verify Switch ZTP Status (if applicable)
cmsh -c "device; use <switch_name>; ssh"
Then run the following on the switch:
sudo ztp status
Monitor Switch Health (if applicable)
cmsh -c "device; use <switch_name>; latesthealthdata"
Next Steps#
Once all items in this checklist are verified:
Proceed to High Availability if HA is required
Follow GB200/GB300 Rack Power On and Bring Up for the rack power-on sequence
Start the compute node provisioning process
Note
This checklist should be completed before proceeding to the rack bring-up phase. Any missing configurations should be addressed by returning to the appropriate configuration sections.