Initial Setup Verification Checklist#
This checklist helps verify that all initial BCM setup configuration tasks have been completed correctly before proceeding to node configuration. Use this systematic verification process after completing mixed architecture setup, software image creation, and category configuration.
General Prerequisites#
[ ] Head node BCM installation completed successfully
[ ] Head node architecture identified (x86 or ARM/aarch64)
[ ] Network configuration completed on head node
[ ] Required node types and categories documented
[ ] Target architectures for cluster nodes identified
Mixed Architecture Setup Verification#
Architecture Requirements Assessment
[ ] Head node architecture documented
[ ] Target node architectures identified for cluster
[ ] Mixed architecture requirements confirmed (if any node architecture differs from the head node)
Method 1: Pre-compiled Images Import (if used)#
Image Download Verification
[ ] Pre-compiled .tar.gz files downloaded:
[ ] node-installer.tar.gz for target architecture
[ ] cmshared.tar.gz for target architecture
[ ] default-image-ubuntu2404-<arch>.tar.gz for target architecture
[ ] All downloaded files have correct checksums/integrity
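The checklist does not say how checksums are published, so the following is only a minimal sketch. It assumes a checksum file in `sha256sum` format sits alongside the downloads. The function name `verify_downloads` and the file name `SHA256SUMS` are hypothetical.

```shell
# verify_downloads DIR SUMFILE
# Checks every file listed in SUMFILE (sha256sum format: "<hash>  <name>")
# against the copies in DIR. The checksum file's name and location are
# assumptions; use whatever checksum file accompanies your downloads.
verify_downloads() {
    local dir="$1" sumfile="$2" abs_sumfile
    if [ ! -r "$sumfile" ]; then
        echo "checksum file not readable: $sumfile" >&2
        return 2
    fi
    # Resolve the checksum file to an absolute path before changing directory.
    abs_sumfile="$(cd "$(dirname "$sumfile")" && pwd)/$(basename "$sumfile")"
    # Run in a subshell so the caller's working directory is left untouched.
    ( cd "$dir" && sha256sum -c "$abs_sumfile" )
}
```

A call such as `verify_downloads /path/to/downloads /path/to/downloads/SHA256SUMS` prints one line per file and returns non-zero if any file fails to match.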
Image Extraction Verification
[ ] Default software image extracted successfully:
[ ] Directory created:
/cm/images/default-image-ubuntu2404-<arch>
[ ] Image files extracted without errors
[ ] Node-installer extracted successfully:
[ ] Directory created:
/cm/images/node-installer-<arch>
[ ] Node-installer files extracted without errors
[ ] CM-shared extracted successfully:
[ ] Directory created:
/cm/images/shared-<arch>
[ ] Shared files extracted without errors
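The three directory checks above can be scripted. This sketch only confirms that each directory exists and is non-empty; it does not prove the extraction was complete. The function name `check_extracted_dirs` is hypothetical, and the architecture suffix is whatever your extracted directories actually carry.

```shell
# check_extracted_dirs ARCH [IMAGE_ROOT]
# Confirms that the three directories named in the checklist exist and are
# non-empty. IMAGE_ROOT defaults to /cm/images; ARCH is the suffix written
# as <arch> in the checklist (its exact spelling is site-dependent).
check_extracted_dirs() {
    local arch="$1" root="${2:-/cm/images}" rc=0 d
    for d in \
        "$root/default-image-ubuntu2404-$arch" \
        "$root/node-installer-$arch" \
        "$root/shared-$arch"
    do
        if [ ! -d "$d" ]; then
            echo "MISSING  $d"; rc=1
        elif [ -z "$(ls -A "$d" 2>/dev/null)" ]; then
            echo "EMPTY    $d"; rc=1
        else
            echo "OK       $d"
        fi
    done
    return "$rc"
}
```

Run it as `check_extracted_dirs <arch>`; a non-zero exit status means at least one directory is missing or empty.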
Method 2: Image Creation with cm-image Tool (if used)#
Tool Preparation
[ ] cm-image tool available and functional
[ ] QEMU emulation configured for cross-architecture builds
[ ] Sufficient disk space available for image creation (50 GB or more recommended)
[ ] Vanilla/base-distribution .tar.gz obtained (not .iso)
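The disk space item can be checked before starting a build. The sketch below uses `df` and treats the recommended 50 GB as 50 GiB for simplicity. The function name `has_free_space` is hypothetical, and the path to check depends on where your images are built.

```shell
# has_free_space PATH MIN_GB
# Succeeds when the filesystem holding PATH has at least MIN_GB GiB
# available. 50 is the checklist's recommended minimum for image creation.
has_free_space() {
    local path="$1" min_gb="$2" avail_kb
    # -P forces the portable one-line-per-filesystem format;
    # -k reports sizes in 1024-byte blocks.
    avail_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}
```

For example, `has_free_space /cm/images 50 || echo "not enough space"` warns when the filesystem under /cm/images falls below the recommended minimum.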
Image Creation Process
[ ] Default image creation completed successfully
[ ] Node-installer image creation completed successfully
[ ] CM-shared image creation completed successfully
[ ] All images created without errors in build logs
Image Verification
[ ] All created images bootable and functional
[ ] Architecture-specific binaries present in images
[ ] Image sizes reasonable and within expected ranges
Software Image Setup Verification#
Base Image Availability
[ ] Default images available for all required architectures
[ ] DGX OS 7 image available for GB200 nodes
[ ] All base images validated and functional
Control Plane Software Images#
ARM/aarch64 Control Plane Images (if applicable)
[ ] slogin-image created from ARM default image
[ ] k8s-user-image created from ARM default image
[ ] All ARM images cloned successfully without errors
x86 Control Plane Images (if applicable)
[ ] slogin-image created from x86 default image (not common)
[ ] k8s-admin-image created from x86 default image
[ ] k8s-user-image created from x86 default image
[ ] All x86 images cloned successfully without errors
GB200 Node Images
[ ] dgx-gb200-image created or imported successfully
[ ] DGX OS 7 compatibility verified
[ ] GB200-specific drivers and software included
Image Customization Verification#
cm-chroot-sw-image Functionality
[ ] cm-chroot-sw-image tool functional for all images
[ ] Virtual filesystems mount correctly during chroot
[ ] Package installation works within chroot environment
[ ] Custom configurations applied successfully
Image-Specific Customizations
[ ] slogin images: User access tools and development packages installed
[ ] k8s-admin images: Kubernetes administration tools configured
[ ] k8s-user images: User-space Kubernetes tools and Run:ai prerequisites installed
[ ] dgx-gb200 images: GPU drivers and CUDA libraries verified
Custom Software Installation (if applicable)
[ ] Additional packages installed via chroot
[ ] Custom scripts and configurations added
[ ] Service configurations updated as needed
[ ] All customizations documented
Category Creation Verification#
Required Categories#
Category Existence
[ ] slogin category created
[ ] k8s-admin category created
[ ] k8s-user category created
[ ] dgx-gb200 category created (or verified from bcm-post-install)
Software Image Assignment
[ ] slogin category assigned slogin-image
[ ] k8s-admin category assigned k8s-admin-image
[ ] k8s-user category assigned k8s-user-image
[ ] dgx-gb200 category assigned dgx-gb200-image
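The four pairings above form a small lookup table, so a mismatch can be caught mechanically. This sketch encodes only the category and image names listed in this checklist. The function names are hypothetical, and how you obtain the assigned image (for example from the output of `cmsh -c "category; use <category_name>; show"`) is left to the caller.

```shell
# expected_image CATEGORY
# Prints the software image this checklist pairs with each category.
expected_image() {
    case "$1" in
        slogin)     echo "slogin-image" ;;
        k8s-admin)  echo "k8s-admin-image" ;;
        k8s-user)   echo "k8s-user-image" ;;
        dgx-gb200)  echo "dgx-gb200-image" ;;
        *)          return 1 ;;
    esac
}

# check_assignment CATEGORY ACTUAL_IMAGE
# Compares the image actually assigned to CATEGORY with the expected one.
# Returns 0 on match, 1 on mismatch, 2 for a category not in the table.
check_assignment() {
    local want
    want="$(expected_image "$1")" || { echo "unknown category: $1" >&2; return 2; }
    if [ "$2" = "$want" ]; then
        echo "OK        $1 -> $2"
    else
        echo "MISMATCH  $1 -> $2 (expected $want)"
        return 1
    fi
}
```

Calling `check_assignment k8s-user k8s-admin-image` reports a mismatch and returns 1, which makes the check easy to wire into a larger verification script.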
Category Configuration Verification#
Network Configuration
[ ] All categories assigned appropriate management network
[ ] Network assignments match site network planning
[ ] DHCP settings configured appropriately for each category
BMC Settings (where applicable)
[ ] BMC user credentials configured at category level
[ ] BMC settings consistent across similar node types
[ ] BMC networks assigned correctly
Boot and Installation Options
[ ] Boot options configured for each category
[ ] Installation parameters set appropriately
[ ] PXE boot settings verified for each category
Storage and File System Settings
[ ] Root file system settings configured
[ ] Swap settings configured appropriately
[ ] Additional mount points configured as needed
Hardware-Specific Settings
[ ] slogin category: ARM-specific settings applied (if applicable)
[ ] k8s-admin category: x86 requirements verified (for NMX-M compatibility)
[ ] k8s-user category: Architecture settings match hardware
[ ] dgx-gb200 category: GPU and networking hardware settings configured
Category Testing and Validation#
Category Functionality
[ ] Test node can be assigned to each category
[ ] Category inheritance working correctly
[ ] Software image assignment functional
[ ] Network assignments working in category context
Category Consistency
[ ] Similar categories have consistent settings
[ ] No conflicting configurations between categories
[ ] All required parameters set for each category
Integration Verification#
Architecture and Image Compatibility
[ ] Software images compatible with target hardware architectures
[ ] Cross-architecture functionality verified where needed
[ ] No architecture mismatches in category assignments
Image and Category Integration
[ ] All categories have valid software image assignments
[ ] Software images contain required software for category purpose
[ ] No missing dependencies between images and category requirements
Network Integration
[ ] Management network assignments consistent across setup
[ ] Network settings compatible with planned node configurations
[ ] No network conflicts between categories
System Readiness Verification#
File System and Storage
[ ] Sufficient disk space for all images and future operations
[ ] Image directories have correct permissions
[ ] Backup procedures in place for custom images
Performance and Resources
[ ] Head node performance adequate for mixed architecture workloads
[ ] Memory usage reasonable with multiple images
[ ] Network bandwidth sufficient for image deployment
Documentation and Procedures
[ ] All custom configurations documented
[ ] Image creation procedures documented for reproducibility
[ ] Category settings recorded in site documentation
Commands for Quick Verification#
Check Available Images
ls -la /cm/images/
cm-image-info --list
Verify Categories
cmsh -c "category; list"
cmsh -c "category; use <category_name>; show"
Check Software Image Assignments
cmsh -c "category; use slogin; show" | grep -i software
cmsh -c "category; use k8s-admin; show" | grep -i software
cmsh -c "category; use k8s-user; show" | grep -i software
cmsh -c "category; use dgx-gb200; show" | grep -i software
Test Image Accessibility
cm-chroot-sw-image /cm/images/<image_name>
# Test package installation and basic functionality
Verify Architecture Compatibility
file /cm/images/<image_name>/bin/bash
# Should show correct architecture (x86-64 or aarch64)
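Where the `file` utility is not installed, or when the result needs to be consumed by a script, the architecture can be read directly from the ELF header. This sketch inspects the `e_machine` field and recognizes only the two architectures discussed here. The function name `elf_arch` is hypothetical, and the code assumes little-endian ELF files, which holds for both x86-64 and aarch64 Linux binaries.

```shell
# elf_arch FILE
# Prints "x86-64", "aarch64", or "unknown(<n>)" based on the ELF header's
# e_machine field (bytes 18-19, read as little-endian).
# Fails with status 1 if FILE does not start with the ELF magic number.
elf_arch() {
    local f="$1" magic machine
    magic="$(od -An -tx1 -N4 "$f" 2>/dev/null | tr -d ' \n')"
    [ "$magic" = "7f454c46" ] || { echo "not an ELF file: $f" >&2; return 1; }
    # Read both e_machine bytes as unsigned values and combine them.
    machine="$(od -An -tu1 -j18 -N2 "$f" | awk '{ print $1 + 256 * $2 }')"
    case "$machine" in
        62)  echo "x86-64" ;;    # EM_X86_64
        183) echo "aarch64" ;;   # EM_AARCH64
        *)   echo "unknown($machine)" ;;
    esac
}
```

A loop such as `for d in /cm/images/*/; do echo "$d $(elf_arch "${d}bin/bash")"; done` then lists the architecture of every image in one pass.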
Check Network Assignments
cmsh -c "category; use <category_name>; networks; list"
Troubleshooting Common Issues#
Image Creation Problems
[ ] QEMU emulation working for cross-architecture builds
[ ] Sufficient disk space available during creation
[ ] Base distribution files not corrupted
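One way to test the QEMU item is to look for a registered binfmt_misc handler. This is a sketch under assumptions: the handler name `qemu-<arch>` follows the convention of common qemu-user-static packaging and is not stated in this checklist, and the function name `qemu_binfmt_enabled` is hypothetical. Your setup may register emulation differently.

```shell
# qemu_binfmt_enabled ARCH [BINFMT_DIR]
# Succeeds when a binfmt_misc handler named qemu-ARCH is registered and
# enabled. The handler name is an assumption based on common packaging,
# not a documented BCM requirement.
qemu_binfmt_enabled() {
    local arch="$1" dir="${2:-/proc/sys/fs/binfmt_misc}" entry
    entry="$dir/qemu-$arch"
    [ -r "$entry" ] || { echo "no handler registered at $entry" >&2; return 1; }
    # The first line of a binfmt_misc entry reads "enabled" or "disabled".
    [ "$(head -n 1 "$entry")" = "enabled" ]
}
```

On an x86 head node building ARM images, `qemu_binfmt_enabled aarch64` returning non-zero suggests that emulation is not registered or has been disabled.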
Category Assignment Issues
[ ] Software image paths correct and accessible
[ ] Network names match existing network definitions
[ ] No typos in category or image names
Architecture Mismatches
[ ] Software images match intended hardware architecture
[ ] Cross-architecture emulation working correctly
[ ] Native architecture images preferred for performance
Next Steps#
Once all items in this checklist are verified:
Proceed to Control Plane Node Entries
Begin Automated Rack Import Process or manual rack configuration
Start node provisioning after completing node entries
Note
This checklist should be completed before proceeding to node configuration. Any missing configurations should be addressed by returning to the appropriate setup sections: