Control Plane Power On and Provisioning
At this point, the control plane nodes’ racks can be powered on and provisioned.
Power on the nodes to be provisioned, either physically or through the BMC (using its KVM software or ipmitool).
Watch the boot progress on the KVM and confirm that the node PXE boots successfully from the head node.
Watch the node installer log to look for any issues during the provisioning.
tail -f /var/log/node-installer
Watch the syslog for any issues/errors.
tail -f /var/log/syslog | grep -i cmd
Check cmsh to confirm the nodes are in an UP state.
After provisioning, log in to the nodes from the head node and confirm the state of the NICs (all bonds are up and all connections are up, at least for the north-south networking).
This verifies that the BCM 11 software is correctly set up to do provisioning over the network. If the GB200 racks need to be brought up immediately, the HA and NFS configuration can be deferred.
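The power-on and verification steps above can be sketched as a few small shell helpers. This is a minimal sketch under stated assumptions, not the documented procedure: the function names are hypothetical, the BMC is assumed to accept IPMI over LAN with the password supplied in the IPMI_PASSWORD environment variable, and the cmsh status output is assumed to list one node per line with its state in square brackets.

```shell
# Hypothetical helpers; names and the assumed output formats are illustrative only.

# Power on a node through its BMC.
# Assumes BMC_USER is set and the password is in IPMI_PASSWORD (read by ipmitool -E).
# IPMITOOL can be overridden, for example for a dry run.
power_on_node() {
  "${IPMITOOL:-ipmitool}" -I lanplus -H "$1" -U "${BMC_USER:?set BMC_USER}" -E chassis power on
}

# Read node status lines on stdin and print the names of nodes not marked UP.
# Assumes lines of the form: "<node> ........ [   UP   ]".
nodes_not_up() {
  awk 'NF && $0 !~ /\[ *UP *\]/ { print $1 }'
}

# Print the name of every bond that reports "MII Status: down" for the bond
# itself or for any of its member interfaces.
# The directory defaults to /proc/net/bonding and can be overridden.
bonds_down() {
  bond_dir="${1:-/proc/net/bonding}"
  for bond_file in "$bond_dir"/*; do
    [ -f "$bond_file" ] || continue
    if grep -q '^MII Status: down' "$bond_file"; then
      basename "$bond_file"
    fi
  done
}
```

On the head node, the status check could be run as cmsh -c "device; status" | nodes_not_up, and bonds_down can be run on each provisioned node after logging in. Empty output from either helper means nothing was flagged.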
Troubleshooting Provisioning Failures
Failed to detect boot interface (GB300 k8s-system-user nodes)
When provisioning k8s-system-user control plane nodes on GB300 systems, nodes may fail with “Failed to detect boot interface” even though they PXE boot successfully. The boot log may show “You should probably insert the correct kernel module into the ramdisk.”
If you encounter this issue:
Verify software image consistency: Ensure all k8s-system-user nodes use identical software images for their architecture. To rule out software image differences as the cause, try assigning a different control plane image (for example, slogin-image) temporarily to the affected node. If it provisions with that image, the issue is likely specific to the k8s-system-user image configuration.
Clear DHCP leases on the head node and retry provisioning.
Verify static routes: Ensure static routes for BMC access (compute tray and NVSwitch) are configured correctly if they are required for your deployment.
Optional: Add Mellanox/InfiniBand kernel modules to the k8s-system-user-image (mlx5_core, mlx5_ib, ib_core) if the preceding steps do not resolve the issue. The root cause in reported cases has not been definitively confirmed; refer to Software Image Setup for the command to add kernel modules.
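Before adding the modules to the image, it can help to confirm that the module files are present in the image's kernel module tree at all. The following is a minimal sketch: the function name is hypothetical, and the image root path used in the usage note (/cm/images/k8s-system-user-image) is an assumption about where the software image is stored on the head node.

```shell
# Hypothetical helper: print each requested kernel module that has no .ko file
# (plain or compressed, for example .ko.xz or .ko.zst) under the image's
# lib/modules tree.
# Usage: modules_missing_in_image <image_root> <module>...
modules_missing_in_image() {
  image_root="$1"
  shift
  for mod in "$@"; do
    if ! find "$image_root/lib/modules/" -name "${mod}.ko*" 2>/dev/null | grep -q .; then
      echo "$mod"
    fi
  done
}
```

For example, modules_missing_in_image /cm/images/k8s-system-user-image mlx5_core mlx5_ib ib_core prints the modules that are not present in that image. Empty output means all three module files were found. It does not show whether they are included in the ramdisk.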