Initial Setup Verification Checklist#

Use this checklist to confirm that every initial BCM setup task has been completed correctly before you proceed to node configuration. Work through it after finishing mixed architecture setup, software image creation, and category configuration.

General Prerequisites#

  • [ ] Head node BCM installation completed successfully

  • [ ] Head node architecture identified (x86 or ARM/aarch64)

  • [ ] Network configuration completed on head node

  • [ ] Required node types and categories documented

  • [ ] Target architectures for cluster nodes identified

Mixed Architecture Setup Verification#

Architecture Requirements Assessment

  • [ ] Head node architecture documented

  • [ ] Target node architectures identified for cluster

  • [ ] Mixed architecture requirements confirmed (if different from head node)

Method 1: Pre-compiled Images Import (if used)#

Image Download Verification

  • [ ] Pre-compiled .tar.gz files downloaded:

    • [ ] node-installer.tar.gz for target architecture

    • [ ] cmshared.tar.gz for target architecture

    • [ ] default-image-ubuntu2404-<arch>.tar.gz for target architecture

  • [ ] All downloaded files pass checksum/integrity verification
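
One way to carry out the integrity check is to compare each tarball against a SHA-256 list. The sketch below assumes a checksum file named SHA256SUMS sits next to the downloads; that file name and the helper verify_downloads are illustrative assumptions, not something BCM provides.

```shell
# verify_downloads DIR
# Runs sha256sum in check mode against DIR/SHA256SUMS (assumed file name).
# Returns 0 only if every listed file is present and matches its checksum.
verify_downloads() {
    local dir="$1"
    if [ ! -f "$dir/SHA256SUMS" ]; then
        echo "no SHA256SUMS file in $dir" >&2
        return 2
    fi
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$dir" && sha256sum --check --quiet SHA256SUMS )
}
```

For example, `verify_downloads /path/to/downloads && echo "checksums OK"` prints the confirmation only when every file matches.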

Image Extraction Verification

  • [ ] Default software image extracted successfully:

    • [ ] Directory created: /cm/images/default-image-ubuntu2404-<arch>

    • [ ] Image files extracted without errors

  • [ ] Node-installer extracted successfully:

    • [ ] Directory created: /cm/images/node-installer-<arch>

    • [ ] Node-installer files extracted without errors

  • [ ] CM-shared extracted successfully:

    • [ ] Directory created: /cm/images/shared-<arch>

    • [ ] Shared files extracted without errors
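
The directory checks in this list can be scripted. The following is a minimal sketch that only tests whether the three directories named above exist and are non-empty; the helper name check_extracted_images is hypothetical, and the check does not prove that extraction finished without errors.

```shell
# check_extracted_images ROOT ARCH
# Confirms that the three extraction targets named in the checklist exist
# under ROOT and are not empty. Prints one line per directory and returns
# non-zero if any directory is missing or empty.
check_extracted_images() {
    local root="$1" arch="$2" status=0 dir
    for dir in \
        "default-image-ubuntu2404-$arch" \
        "node-installer-$arch" \
        "shared-$arch"
    do
        if [ -d "$root/$dir" ] && [ -n "$(ls -A "$root/$dir" 2>/dev/null)" ]; then
            echo "ok       $root/$dir"
        else
            echo "MISSING  $root/$dir"
            status=1
        fi
    done
    return "$status"
}
```

A typical call would be `check_extracted_images /cm/images aarch64`.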

Method 2: Image Creation with cm-image Tool (if used)#

Tool Preparation

  • [ ] cm-image tool available and functional

  • [ ] QEMU emulation configured for cross-architecture builds

  • [ ] Sufficient disk space available for image creation (50 GB or more recommended)

  • [ ] Vanilla/base-distribution .tar.gz obtained (not .iso)
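
Two of the preparation items lend themselves to a quick scripted check: free disk space and QEMU emulation. The sketch below is one possible approach. Both helper names are hypothetical, and the binfmt_misc entry name qemu-ARCH follows the qemu-user-static convention, which is an assumption about how emulation was set up on the head node.

```shell
# has_free_space PATH MIN_GB
# Succeeds if the filesystem holding PATH has at least MIN_GB GiB available.
has_free_space() {
    local path="$1" min_gb="$2" avail_kb
    # df -P gives stable POSIX columns; column 4 is available 1K blocks.
    avail_kb="$(df -Pk "$path" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# qemu_binfmt_registered ARCH [BINFMT_DIR]
# Succeeds if a binfmt_misc handler named qemu-ARCH exists.
# BINFMT_DIR defaults to the usual mount point of binfmt_misc.
qemu_binfmt_registered() {
    local arch="$1" dir="${2:-/proc/sys/fs/binfmt_misc}"
    [ -e "$dir/qemu-$arch" ]
}
```

For an ARM build on an x86 head node, `has_free_space /cm/images 50 && qemu_binfmt_registered aarch64` would cover both items.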

Image Creation Process

  • [ ] Default image creation completed successfully

  • [ ] Node-installer image creation completed successfully

  • [ ] CM-shared image creation completed successfully

  • [ ] All images created without errors in build logs

Image Verification

  • [ ] All created images bootable and functional

  • [ ] Architecture-specific binaries present in images

  • [ ] Image sizes reasonable and within expected ranges

Software Image Setup Verification#

Base Image Availability

  • [ ] Default images available for all required architectures

  • [ ] DGX OS 7 image available for GB200 nodes

  • [ ] All base images validated and functional

Control Plane Software Images#

ARM/aarch64 Control Plane Images (if applicable)

  • [ ] slogin-image created from ARM default image

  • [ ] k8s-user-image created from ARM default image

  • [ ] All ARM images cloned successfully without errors

x86 Control Plane Images (if applicable)

  • [ ] slogin-image created from x86 default image (uncommon)

  • [ ] k8s-admin-image created from x86 default image

  • [ ] k8s-user-image created from x86 default image

  • [ ] All x86 images cloned successfully without errors

GB200 Node Images

  • [ ] dgx-gb200-image created or imported successfully

  • [ ] DGX OS 7 compatibility verified

  • [ ] GB200-specific drivers and software included

Image Customization Verification#

cm-chroot-sw-image Functionality

  • [ ] cm-chroot-sw-image tool functional for all images

  • [ ] Virtual filesystems mount correctly during chroot

  • [ ] Package installation works within chroot environment

  • [ ] Custom configurations applied successfully

Image-Specific Customizations

  • [ ] slogin images: User access tools and development packages installed

  • [ ] k8s-admin images: Kubernetes administration tools configured

  • [ ] k8s-user images: User-space Kubernetes tools and Run:ai prerequisites

  • [ ] dgx-gb200 images: GPU drivers and CUDA libraries verified

Custom Software Installation (if applicable)

  • [ ] Additional packages installed via chroot

  • [ ] Custom scripts and configurations added

  • [ ] Service configurations updated as needed

  • [ ] All customizations documented

Category Creation Verification#

Required Categories#

Category Existence

  • [ ] slogin category created

  • [ ] k8s-admin category created

  • [ ] k8s-user category created

  • [ ] dgx-gb200 category created (or verified from bcm-post-install)

Software Image Assignment

  • [ ] slogin category assigned slogin-image

  • [ ] k8s-admin category assigned k8s-admin-image

  • [ ] k8s-user category assigned k8s-user-image

  • [ ] dgx-gb200 category assigned dgx-gb200-image

Category Configuration Verification#

Network Configuration

  • [ ] All categories assigned appropriate management network

  • [ ] Network assignments match site network planning

  • [ ] DHCP settings configured appropriately for each category

BMC Settings (where applicable)

  • [ ] BMC user credentials configured at category level

  • [ ] BMC settings consistent across similar node types

  • [ ] BMC networks assigned correctly

Boot and Installation Options

  • [ ] Boot options configured for each category

  • [ ] Installation parameters set appropriately

  • [ ] PXE boot settings verified for each category

Storage and File System Settings

  • [ ] Root file system settings configured

  • [ ] Swap settings configured appropriately

  • [ ] Additional mount points configured as needed

Hardware-Specific Settings

  • [ ] slogin category: ARM-specific settings (if applicable)

  • [ ] k8s-admin category: x86 requirements verified (for NMX-M compatibility)

  • [ ] k8s-user category: Architecture settings match hardware

  • [ ] dgx-gb200 category: GPU and networking hardware settings

Category Testing and Validation#

Category Functionality

  • [ ] Test node can be assigned to each category

  • [ ] Category inheritance working correctly

  • [ ] Software image assignment functional

  • [ ] Network assignments working in category context

Category Consistency

  • [ ] Similar categories have consistent settings

  • [ ] No conflicting configurations between categories

  • [ ] All required parameters set for each category

Integration Verification#

Architecture and Image Compatibility

  • [ ] Software images compatible with target hardware architectures

  • [ ] Cross-architecture functionality verified where needed

  • [ ] No architecture mismatches in category assignments

Image and Category Integration

  • [ ] All categories have valid software image assignments

  • [ ] Software images contain required software for category purpose

  • [ ] No missing dependencies between images and category requirements

Network Integration

  • [ ] Management network assignments consistent across setup

  • [ ] Network settings compatible with planned node configurations

  • [ ] No network conflicts between categories

System Readiness Verification#

File System and Storage

  • [ ] Sufficient disk space for all images and future operations

  • [ ] Image directories have correct permissions

  • [ ] Backup procedures in place for custom images

Performance and Resources

  • [ ] Head node performance adequate for mixed architecture workloads

  • [ ] Memory usage reasonable with multiple images

  • [ ] Network bandwidth sufficient for image deployment

Documentation and Procedures

  • [ ] All custom configurations documented

  • [ ] Image creation procedures documented for reproducibility

  • [ ] Category settings recorded in site documentation

Commands for Quick Verification#

Check Available Images

ls -la /cm/images/
cm-image-info --list

Verify Categories

cmsh -c "category; list"
cmsh -c "category; use <category_name>; show"
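
To compare the category list against the four categories this checklist expects, the list output can be piped through a small filter. This is a sketch: missing_categories is a hypothetical helper, and it assumes the category name is the first whitespace-separated column of the list output.

```shell
# missing_categories NAME...
# Reads `cmsh -c "category; list"` output on stdin and prints each NAME
# that does not appear in the first column. Returns non-zero if any
# name is missing.
missing_categories() {
    local listed name status=0
    listed="$(awk '{print $1}')"
    for name in "$@"; do
        if ! printf '%s\n' "$listed" | grep -qxF -- "$name"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `cmsh -c "category; list" | missing_categories slogin k8s-admin k8s-user dgx-gb200` prints nothing when all four categories exist.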

Check Software Image Assignments

cmsh -c "category; use slogin; show" | grep -i software
cmsh -c "category; use k8s-admin; show" | grep -i software
cmsh -c "category; use k8s-user; show" | grep -i software
cmsh -c "category; use dgx-gb200; show" | grep -i software
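
The four commands above can be folded into one pass that also compares each result with the expected image. The sketch below assumes that `get softwareimage` in category mode prints only the image name; check_image_assignments is a hypothetical helper name.

```shell
# check_image_assignments PAIR...
# Each PAIR has the form category=image. Queries the softwareimage
# property of each category through cmsh and reports any category whose
# value differs from the expected image.
check_image_assignments() {
    local pair category expected actual status=0
    for pair in "$@"; do
        category="${pair%%=*}"
        expected="${pair#*=}"
        # "|| true" keeps the loop going if cmsh fails for one category.
        actual="$(cmsh -c "category; use $category; get softwareimage" 2>/dev/null || true)"
        if [ "$actual" = "$expected" ]; then
            echo "ok        $category -> $actual"
        else
            echo "MISMATCH  $category: expected $expected, got ${actual:-nothing}"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_image_assignments slogin=slogin-image k8s-admin=k8s-admin-image k8s-user=k8s-user-image dgx-gb200=dgx-gb200-image`.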

Test Image Accessibility

cm-chroot-sw-image /cm/images/<image_name>
# Test package installation and basic functionality

Verify Architecture Compatibility

file /cm/images/<image_name>/bin/bash
# Should show correct architecture (x86-64 or aarch64)
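
When several images need checking, the description printed by `file` can be mapped to a machine name and compared with the expected architecture. This is a minimal sketch with hypothetical helper names; it recognizes only the two architectures this checklist mentions.

```shell
# arch_from_file_output TEXT
# Maps the description printed by `file` to a machine name.
arch_from_file_output() {
    case "$1" in
        *"ARM aarch64"*) echo "aarch64" ;;
        *"x86-64"*)      echo "x86_64" ;;
        *)               echo "unknown" ;;
    esac
}

# image_matches_arch IMAGE_DIR EXPECTED
# Succeeds if IMAGE_DIR/bin/bash was built for EXPECTED (aarch64 or x86_64).
image_matches_arch() {
    local desc
    # -b omits the file name; -L inspects the target if bash is a symlink.
    desc="$(file -bL "$1/bin/bash")"
    [ "$(arch_from_file_output "$desc")" = "$2" ]
}
```

Usage: `image_matches_arch /cm/images/<image_name> aarch64 || echo "architecture mismatch"`.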

Check Network Assignments

cmsh -c "category; use <category_name>; networks; list"

Troubleshooting Common Issues#

Image Creation Problems

  • [ ] QEMU emulation working for cross-architecture builds

  • [ ] Sufficient disk space available during creation

  • [ ] Base distribution files not corrupted

Category Assignment Issues

  • [ ] Software image paths correct and accessible

  • [ ] Network names match existing network definitions

  • [ ] No typos in category or image names

Architecture Mismatches

  • [ ] Software images match intended hardware architecture

  • [ ] Cross-architecture emulation working correctly

  • [ ] Native architecture images preferred for performance

Next Steps#

Once all items in this checklist are verified:

  1. Proceed to Control Plane Node Entries

  2. Begin Automated Rack Import Process or manual rack configuration

  3. Start node provisioning after completing node entries

Note

Complete this checklist before proceeding to node configuration. Address any missing configuration by returning to the appropriate setup section: