Configure Cluster Ethernet Networking#
The files required for this procedure can be either created manually or formatted automatically (see the Prerequisite Stage).
For each network that is set up within BCM, there is an equivalent set of physical switches that must be deployed and configured:
BCM Inband Network Switch - Control Plane/Internalnet
Ethernet Core and Spine Switch (For greater than 1 SU SuperPODs)
DGX Inband/Storage Network Switches
OOB Network (SN2201)
NVSwitch COMe Network
Control Plane OOB Network (Mgmt Nodes, DDN, etc.)
Deployment Process Options#
The following table provides an overview of the deployment processes available for configuring cluster Ethernet networking:
Process Type | Description | Reference Section |
---|---|---|
Pre-Stage | Initial setup and preparation steps | Prerequisite Stage |
Pre-Run | Validation and preparation steps | Pre-Run Process |
Manual Process | Step-by-step manual configuration | Manual Process |
Semi-Automated Process | DHCP/ZTP-based automated deployment | Semi-Automated Process (DHCP/ZTP) |
Fully Automated Process | BCM/ZTP-based complete automation | Fully Automated Process (BCM/ZTP) |
Prerequisite Stage#
This section covers the initial setup and preparation steps required before beginning any deployment process.
Prerequisites:
✓ Prefix Allocation (see Appendix):
- BMS use: /24 subnet, outside of the supernet allocated for the cluster
✓ P2P-Ethernet formatted for Netautogen to consume:
- p2p_standard_guide (see Appendix)
- p2p example (see Appendix)
- p2p column header / worksheet naming / default hostname (see Appendix)
✓ SiteInfo file (see Appendix) with the following:
- Customer-provided prefix
- BGP between Customer SPOD handoff
- Rack Inventory Mapping for GB200: <Serial>
- Miscellaneous data such as NTP, Syslog, and DNS
✓ GB200 Rack Inventory file:
- Format: .csv (only files exported from Splunk DB are supported)
- Example (see Appendix)
✓ Download Files:
- Cumulus Linux Installer Binary
- NVOS
- InfiniBand OS
✓ Build 2 USB sticks for production:
USB 1 | USB 2 |
---|---|
BCM ISO | p2p-ethernet.csv, GB200-rack-inventory, OS: Cumulus OS: 5.11.0 (x86), NVOS: 1.1.0 (x86), InfiniBand OS |
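The prefix requirement in the prerequisites (a /24 that sits outside the cluster supernet) can be checked before any files are generated. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `check_bms_prefix` and the sample prefix `10.10.10.0/24` are hypothetical; the supernet `100.126.0.0/16` is borrowed from the `bcm-pod-setup` example later in this guide.

```python
import ipaddress


def check_bms_prefix(bms_prefix: str, cluster_supernet: str) -> list[str]:
    """Return a list of problems; an empty list means the prefix looks usable.

    Hypothetical helper, not part of any BCM tool.
    """
    bms = ipaddress.ip_network(bms_prefix, strict=True)
    supernet = ipaddress.ip_network(cluster_supernet, strict=True)
    problems = []
    # The prerequisite asks for a /24.
    if bms.prefixlen != 24:
        problems.append(f"{bms} is a /{bms.prefixlen}, expected a /24")
    # The prefix must not fall inside (or contain) the cluster supernet.
    if bms.overlaps(supernet):
        problems.append(f"{bms} overlaps the cluster supernet {supernet}")
    return problems


if __name__ == "__main__":
    # Sample values only; substitute the prefixes from your own allocation.
    print(check_bms_prefix("10.10.10.0/24", "100.126.0.0/16"))
```

An empty list indicates that both conditions hold; each string in a non-empty list names one violated condition.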
Pre-Run Process#
This section covers the validation and preparation steps that should be performed before beginning the actual deployment.
To validate P2P and Network Auto Configuration:
Analyze / Restructure P2P (using GUI)
Analyze Point-to-Point (P2P) network connections
Identify P2P topology patterns and dependencies
Restructure P2P connections for netautogen to consume
Validate P2P link configurations
Network Auto Generate
Generate network configurations based on P2P.
Generate JSON that can optionally be imported into BCM.
To Access the UI:
Go to Network Analyzer Dashboard:
Temporary Login: bcm / bcm123
Select 2: Network Auto Generate
Project Name
DGX Type
Upload siteinfo.yaml. For more information, see Section 1.1: siteinfo.yaml.
Upload p2p_ethernet.csv
Upload rack inventory files
Press: Generate Configuration
If there is an error, please check the file and logs to fix it manually.
Option 1: Download the generated project. (Download Project)
This produces a tar.gz file that contains all the generated configuration files for the switches and the BCM JSON files.
Option 2: Deploy Virtual Env. in AIR for network validation (Launch Air SIM)
This creates a virtual environment in AIR, which can be used to validate the network configuration.
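After downloading the project archive, it can be useful to confirm that it contains switch configuration files and BCM JSON files before copying it anywhere. The sketch below uses Python's standard `tarfile` module to group archive members by extension. The internal layout of the archive is not documented here, so the grouping by `.yaml`/`.yml` and `.json` suffixes is an assumption, and `summarize_project_archive` is a hypothetical helper name.

```python
import tarfile


def summarize_project_archive(path: str) -> dict[str, list[str]]:
    """Group regular files in a .tar.gz archive by extension.

    Assumption: switch configurations are YAML files and BCM imports
    are JSON files; anything else is reported under "other".
    """
    groups: dict[str, list[str]] = {"yaml": [], "json": [], "other": []}
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and so on
            name = member.name
            if name.endswith((".yaml", ".yml")):
                groups["yaml"].append(name)
            elif name.endswith(".json"):
                groups["json"].append(name)
            else:
                groups["other"].append(name)
    return groups
```

The function only reads member names; it does not extract anything, so it is safe to run against an archive of unknown origin.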
To validate bcm-pod-setup, use Krusty as a virtual example environment:
Bring up Base Command Manager (BCM) in Krusty Environment
cod cc --version 11.0-dev --distro ubuntu2404 --head-node-root-volume-size 80 -m cod.8g4c -n 0
apt-get update; apt-get install bcm-superpod-network bcm-post-install
Upload all the files onto Krusty
Execute BCM “NetAutogen” Automation
module load bcm-superpod-network
bcm-netautogen --config-path <path_to_config_files> -l netauto.log
The commands above do the following:
Gather data from the p2p_ethernet and parse the data.
Generate Network subnets.
Allocate IP for each component.
Parse GB200 rack inventory information from the file and allocate subnets and IPs.
Generate all North-to-South switch configurations in a startup.yaml, along with the common NVSwitch switch configurations.
Generate a JSON file suitable for manual import into BCM as an alternative backup method.
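To make the subnet-generation and IP-allocation steps above more concrete, the following sketch shows the general idea with Python's standard `ipaddress` module. It is an illustration only and is not the algorithm used by `bcm-netautogen`. The function names, the network name `dgxnet`, the hostnames, and the convention of reserving the first usable address are all hypothetical.

```python
import ipaddress
from itertools import islice


def carve_subnets(supernet: str, new_prefix: int, names: list[str]) -> dict:
    """Assign consecutive subnets of the supernet to the given network names."""
    net = ipaddress.ip_network(supernet)
    plan = dict(zip(names, net.subnets(new_prefix=new_prefix)))
    if len(plan) < len(names):
        raise ValueError("supernet is too small for the requested subnets")
    return plan


def allocate_hosts(network, hostnames: list[str], reserved: int = 1) -> dict:
    """Assign usable addresses to hostnames, skipping the first `reserved` ones.

    Assumption: the first usable address is kept for a gateway.
    Hostnames are expected to be unique.
    """
    usable = islice(network.hosts(), reserved, None)
    allocation = dict(zip(hostnames, usable))
    if len(allocation) < len(hostnames):
        raise ValueError(f"{network} has too few usable addresses")
    return allocation


if __name__ == "__main__":
    plan = carve_subnets("100.126.0.0/16", 24, ["internalnet", "dgxnet"])
    for name, subnet in plan.items():
        print(name, subnet)
    print(allocate_hosts(plan["dgxnet"], ["dgx-01", "dgx-02"]))
```

Keeping subnet carving separate from host allocation mirrors the order of the steps listed above: subnets are fixed first, then addresses are handed out within each subnet.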
CM-Create IMAGE:
bcm-pod-setup looks for a pre-existing DGX image directory (/cm/image/<dgx_image>).
Depending on the Environment:
Airgapped Environment: add --skipdist (so that the apt-get packages are not updated)
cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist
Non-Airgapped Environment: creating the image is required. Download the .tar.gz file from: https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz
cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso
Perform BCM “Pod Install” Process
module load bcm-post-install
bcm-pod-setup -I /root/bcm-11.0-ubuntu2404-dgx-os-7.2_arm64_RC3.iso \
--cpu-node-base storage-cpu \
--dgx-node-base storage-dgx \
-C 100.126.0.0/16 \
--dgx-image dgx-image
bcm-pod-setup will do the following:
Status | Action |
---|---|
✓ | [Manual] If no BCM bond0 is set, bond1 will be set up to finalize the NICs. |
✓ | Creates category: dgx-<dgxtype> (dgx-gb200, dgx-b200, slogin, dgx) |
✓ | [Manual] Update node-installer.conf file: |
✓ | Creates DGX Storage Setup (xml) |
✓ | category; use dgx-gb200; |
✓ | set disksetup |
✓ | Creates Persistent-storage-<dgxtype> |
✓ | Adds 60-persistent-storage-<dgxtype>.rules to |
Manual Configuration Required:
/cm/node-installer/scripts/node-installer.conf
- setupBmc = false
- failOnMissingBmc = false
- strictBmcUserId = false
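Because these three settings are edited by hand, a quick check after editing can catch a missed or mistyped line. The sketch below parses `key = value` lines and reports any of the three settings that are absent or not set to `false`. It assumes the file uses plain `key = value` lines with `#` comments, which matches the form shown above but is not verified against the full file format; `REQUIRED` and both function names are hypothetical.

```python
# Settings named in the manual configuration step above.
REQUIRED = {
    "setupBmc": "false",
    "failOnMissingBmc": "false",
    "strictBmcUserId": "false",
}


def parse_settings(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, ignoring '#' comments and blank lines.

    Assumption: the file has no sections or multi-line values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"').lower()
    return settings


def missing_or_wrong(text: str) -> dict:
    """Return {key: current_value_or_None} for settings that still need editing."""
    settings = parse_settings(text)
    return {
        key: settings.get(key)
        for key, wanted in REQUIRED.items()
        if settings.get(key) != wanted
    }
```

To use it, read `/cm/node-installer/scripts/node-installer.conf` into a string and pass it to `missing_or_wrong`; an empty dictionary means all three settings are in place.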
Attention
Importing the JSON is not required, because bcm-pod-setup can add the networks, devices, and racks automatically.
The user must provide the following passwords when running the netautogen command:
Cumulus OS Password
NVSwitch NVOS Password
Server BMC Password
Note
Cumulus and NVOS are x86 systems.
If BCM is running on the ARM64 architecture, an additional x86 (AMD) ISO image, <name>_x86.iso, must be mounted.
This ISO is required to retrieve the cm-lite-daemon package, which is necessary for installation on Cumulus and NVOS systems.
Steps:
# Upload bcm_x86.iso on to BCM
mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
cd /mnt/dvd/data
cm-lite-daemon-repo /mnt/dvd/
# File will be copied over here:
ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite
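Whether the extra x86 ISO is needed depends only on the architecture of the BCM head node. As a small sketch, the check can be expressed with Python's standard `platform` module. The function name `needs_x86_iso` is hypothetical, and the set of architecture strings treated as ARM64 is an assumption based on common values reported by Linux.

```python
import platform
from typing import Optional

# Assumption: these are the strings platform.machine() reports on ARM64 hosts.
ARM64_NAMES = {"aarch64", "arm64"}


def needs_x86_iso(machine: Optional[str] = None) -> bool:
    """Return True when the head node is ARM64 and the x86 ISO must be mounted."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in ARM64_NAMES


if __name__ == "__main__":
    if needs_x86_iso():
        print("ARM64 head node: mount <name>_x86.iso to obtain cm-lite-daemon")
    else:
        print("No additional x86 ISO required")
```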
Manual Process#
This section provides general guidance for a 1-2 rack setup only. It does not include all the steps required to configure the entire fabric manually.
Please refer to the prerequisites section for gathering required files.
p2p_ethernet.csv (required): Format the Excel file to a CSV file.
siteinfo.yaml (required): For more information, see Section 1.1: siteinfo.yaml.
rack-<rackName>.csv (optional): Rack Inventory for GB200. This file is provided by the factory.
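Before starting, it helps to confirm that the required inputs are present. The sketch below checks a directory for the two required files and lists any optional rack inventory files, using the file names given above. It assumes all files are collected in a single directory, and `check_inputs` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("p2p_ethernet.csv", "siteinfo.yaml")


def check_inputs(config_dir: str) -> dict:
    """Report missing required files and any rack-<rackName>.csv files found.

    Assumption: every input file sits directly in config_dir.
    """
    root = Path(config_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    racks = sorted(p.name for p in root.glob("rack-*.csv"))
    return {"missing": missing, "rack_inventories": racks}
```

An empty `missing` list means both required files were found; `rack_inventories` may legitimately be empty because the rack inventory is optional.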
To configure the Cumulus switches using the template:
Create Cumulus NVUE commands from scratch based on the reference (Ref. Guide)
Prepare Installation Media (USB) (Guide)
Connect the USB#2 stick to each Ethernet switch, reboot the switch, and let it install or upgrade Cumulus Linux via ONIE
After the switches are rebooted, connect to the switch via serial console and apply the configuration
Apply Configuration
Upload the startup.yaml
Copy the content to /etc/nvue.d/startup.yaml
nv config replace /etc/nvue.d/startup.yaml
nv config apply -y
After the configuration has been applied to the switches, the next step is to add the networks, devices, and racks into BCM.
See the section rack-bring-up-install guide for next steps.
Semi-Automated Process (DHCP/ZTP)#
The following steps provide a Semi-Automated way to bring up the Network Switch Configuration:
Note
The semi-automated process provides a method to bring up the network switch configuration without requiring manual setup. It uses pre-generated startup.yaml files prepared in advance of deployment. Unlike the fully automated workflow, it does not require p2p_ethernet.csv or siteinfo.yaml, since all switch configurations and the corresponding BCM JSON files are already pre-generated.
Copy Cumulus Switch Configuration Files from USB:
Copy the switch startup.yaml from USB#2 to /cm/local/apps/cmd/etc/htdocs/switch/config/
Adjust ZTP File Contents (Deployment Specific)
Next, configure BCM Server for DHCP/ZTP Ethernet Switch Bringup:
Using cmsh, add the networks, devices, and racks:
Import the .json file into BCM; it was created by NetAutoGen.
Upload the Following Files to BCM Server:
Cumulus Linux Installer Binary
Cumulus Linux Switch Configuration Files
NVOS image
Initiate Cumulus Linux Switch Provisioning.
Fully Automated Process (BCM/ZTP)#
Note
The fully automated process provides a method to bring up the network switch configuration automatically, by using the bcm-netautogen and bcm-pod-setup tools.
On-Site Process#
Use USB#1 to bring up the BCM server
Use USB#2 to bring up the network switch configuration
Copy the gathered files over to the BCM server. See the prerequisites section for gathering the required files.
Insert USB#2 into the BCM server.
Copy the content from USB#2 into BCM
sudo fdisk -l
sudo mkdir -p /media/BCM
sudo mount -t vfat /dev/sdb1 /media/BCM
lsblk
sudo rsync -av /media/BCM/*.bin /cm/local/apps/cmd/etc/htdocs/switch/image
sudo rsync -av /media/BCM/* <any local path>
Follow the same process as the Pre-Run Process starting from step 4.
Build Access-OOB switch first before provisioning other switches
[Semi-Manual] add the network to BCM if BCM-01 has a connection to the Access-OOB switch, where the eth0 interfaces of other switches are physically connected. (Ref. Guide)
Use a USB-to-serial connection to apply the startup.yaml configuration file to the Access-OOB switch.
Use the ZTP process to take the switch configuration.
[Manual] Once NetAutoGen has completed generating configurations for all switches, use a USB-to-serial connection to upload the startup.yaml file to each switch.
Order of Switch Bringup is below:
Power cycle: All the OOBs, SPINEs, TORs,
[Pod-setup] BCM will also have all the devices added including NVSwitches.
Ensure all parameters are properly configured before provisioning the control plane nodes.
Warning
If you choose the options below, no networks or devices will be added to BCM. Please proceed with caution.
Alternative Deployment Options#
Option 1: NO bcm-netautogen, YES bcm-pod-install#
If BCM is running on the ARM64 architecture, an additional ISO image named <name>_x86.iso must be mounted. This ISO contains the cm-lite-daemon package, which is required for installation on Cumulus and NVOS systems.
# Upload bcm_x86.iso on to BCM
mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
cd /mnt/dvd/data
cm-lite-daemon-repo /mnt/dvd/
# File will be copied over here:
ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite
bcm-pod-setup -I /root/bcm-<image>.iso -C 100.126.0.0/16 -S 100.127.0.0/16 --dgx-type gb200
Note: The --dgx-type gb200 flag is added here because the bcm-netautogen tool was not executed. When bcm-netautogen is run, the dgx-type value is instead taken from siteinfo.yaml.
Option 2: NO bcm-netautogen, NO bcm-pod-install#
CM-Create IMAGE
A pre-existing DGX image is required (/cm/image/<dgx_image>)
Depending on the Environment:
Airgapped Environment: add --skipdist (so that the apt-get packages are not updated)
cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist
Non-Airgapped Environment
# CREATE IMAGE REQUIRED
# To Download the .tar file https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz
cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso