Configure Cluster Ethernet Networking#

The following files are required. They can be created manually or generated automatically.

For each network that is set up within BCM, there is an equivalent set of physical switches that needs to be deployed and configured:

SN5600 Switches

  • BCM Inband Network Switch - Control Plane/Internalnet

  • Ethernet Core and Spine Switch (for SuperPODs with more than 1 SU)

  • DGX Inband/Storage Network Switches

SN2201 Switches

  • OOB Network (SN2201)

  • NVSwitch COMe Network

  • Control Plane OOB Network (Mgmt Nodes, DDN, etc.)

Deployment Process Options#

The following table provides an overview of the deployment processes available for configuring cluster Ethernet networking:

Process Type            | Description                          | Reference Section
------------------------|--------------------------------------|----------------------------------
Pre-Stage               | Initial setup and preparation steps  | Prerequisite Stage
Pre-Run                 | Validation and preparation steps     | Pre-Run Process
Manual Process          | Step-by-step manual configuration    | Manual Process
Semi-Automated Process  | DHCP/ZTP-based automated deployment  | Semi-Automated Process (DHCP/ZTP)
Fully Automated Process | BCM/ZTP-based complete automation    | Fully Automated Process (BCM/ZTP)

Prerequisite Stage#

This section covers the initial setup and preparation steps required before beginning any deployment process.

Prerequisites:

  • Prefix Allocation (see Appendix):

    • BMS use - /24 subnet, outside of the supernet allocated for the cluster

  • P2P-Ethernet formatted for Netautogen to consume:

    • p2p_standard_guide (see Appendix)

    • p2p example (see Appendix)

    • p2p column header / worksheet naming / default hostname (see Appendix)

  • SiteInfo file (see Appendix) with the following:

    • Customer-provided prefix

    • BGP between Customer SPOD handoff

    • Rack Inventory Mapping for GB200: <Serial>

    • Misc data like: NTP, Syslog, DNS

  • GB200 Rack Inventory file:

    • Format: .csv (only files from Splunk DB are supported)

    • Example (see Appendix)

  • Download Files:

    • Cumulus Linux Installer Binary

    • NVOS

    • InfiniBand OS

  • Build 2 USB sticks for production:

    • USB 1: BCM ISO

    • USB 2: p2p-ethernet.csv, GB200-rack-inventory, Cumulus OS 5.11.0 (x86), NVOS 1.1.0 (x86), InfiniBand OS
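The prefix-allocation prerequisite (a /24 that sits outside the cluster supernet) can be sanity-checked before any files are generated. The following is a minimal POSIX shell sketch of that check. The function names are hypothetical and are not part of any BCM tool, and the check assumes IPv4 and a supernet prefix length of 24 or shorter.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
)

# Return 0 if address $1 falls inside CIDR block $2 (for example 100.126.0.0/16).
in_cidr() (
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Return 0 if the /24 starting at $1 lies outside supernet $2.
# Assumption: the supernet prefix length is 24 or shorter, so a /24 is either
# entirely inside or entirely outside it and checking the base address is enough.
subnet24_outside() (
    ! in_cidr "$1" "$2"
)
```

For example, `subnet24_outside 10.10.10.0 100.126.0.0/16` succeeds. Here 10.10.10.0 is a made-up prefix, and 100.126.0.0/16 is the value passed to `-C` in the bcm-pod-setup example later on this page.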

Pre-Run Process#

This section covers the validation and preparation steps that should be performed before beginning the actual deployment.

  1. To validate P2P and Network Auto Configuration:

    • Analyze / Restructure P2P (using GUI)

      1. Analyze Point-to-Point (P2P) network connections

      2. Identify P2P topology patterns and dependencies

      3. Restructure P2P connections for netautogen to consume

      4. Validate P2P link configurations

    • Network Auto Generate

      1. Generate network configurations based on P2P.

      2. Generate JSON that can optionally be imported into BCM.

    • To Access the UI:

      1. Go to Network Analyzer Dashboard:

        1. Temporary Login: bcm / bcm123

        2. Select 2: Network Auto Generate

          1. Project Name

          2. DGX Type

          3. Upload siteinfo.yml. For more information, see Section 1.1: siteinfo.yaml.

          4. Upload p2p_ethernet.csv

          5. Upload rack inventory files

        3. Press: Generate Configuration

          1. If there is an error, please check the file and logs to fix it manually.

    • Option 1: Download the generated project. (Download Project)

      1. Produces a tar.gz file that contains all the generated switch configuration files and BCM JSON files.

    • Option 2: Deploy Virtual Env. in AIR for network validation (Launch Air SIM)

      1. Produces a virtual environment in AIR, which can be used to validate the network configuration.
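After choosing Option 1, it can help to confirm what the downloaded archive contains before copying it anywhere. The sketch below lists the configuration and JSON files without extracting the archive. The function name is hypothetical, and the assumption that switch configurations end in .yaml or .yml and BCM files end in .json is mine, not something the tool documents here.

```shell
#!/bin/sh
# List the switch configuration (YAML) and BCM import (JSON) files inside a
# downloaded project archive without extracting it.
# Assumption: configs end in .yaml/.yml and BCM files end in .json.
list_project_files() (
    archive=$1
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; exit 1; }
    tar -tzf "$archive" | grep -E '\.(yaml|yml|json)$'
)
```

Calling `list_project_files <project>.tar.gz` prints one archive path per line and returns non-zero if the archive is missing or holds no matching files.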

To validate bcm-pod-setup, use Krusty as a virtual example environment:

  1. Bring up Base Command Manager (BCM) in Krusty Environment

cod cc --version 11.0-dev --distro ubuntu2404 --head-node-root-volume-size 80 -m cod.8g4c -n 0

apt-get update; apt-get install bcm-superpod-network bcm-post-install

  2. Upload all the files to Krusty

  3. Execute BCM “NetAutogen” Automation

module load bcm-superpod-network

bcm-netautogen --config-path <path_to_config_files> -l netauto.log

The commands above do the following:

  • Gather data from the p2p_ethernet and parse the data.

  • Generate Network subnets.

  • Allocate IP for each component.

  • Parse GB200 rack inventory information from the file and allocate subnets and IPs.

  • Generate all North-to-South Switch configuration in a startup.yaml along with NVSwitch common switch configurations.

  • Generate a .JSON file suitable for manual import into BCM as an alternative backup method.
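Because bcm-netautogen starts by parsing its input files, a quick pre-flight check of the directory passed to `--config-path` can catch a missing file early. The sketch below is one way to do that. The function is hypothetical, and it assumes the inputs use the file names this guide mentions (p2p_ethernet.csv, siteinfo.yaml, and optional rack-<rackName>.csv files) and sit at the top level of the directory.

```shell
#!/bin/sh
# Pre-flight check for the directory passed to --config-path.
# Prints each missing required file and returns non-zero if any is absent.
check_netautogen_inputs() (
    dir=$1
    missing=0
    for f in p2p_ethernet.csv siteinfo.yaml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing required file: $dir/$f" >&2
            missing=1
        fi
    done
    # Rack inventory files are optional; only report how many were found.
    racks=$(find "$dir" -maxdepth 1 -name 'rack-*.csv' 2>/dev/null | wc -l)
    echo "rack inventory files found: $((racks))"
    exit "$missing"
)
```

A typical use would be to run the check first and start bcm-netautogen only if it succeeds.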

CM-Create IMAGE:

  • bcm-pod-setup looks for a pre-existing DGX image in the image directory (/cm/image/<dgx_image>)

Depending on the Environment:

  • Air-gapped environment: add --skipdist (so that apt-get packages are not updated)

cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist

  • Non-air-gapped environment: creating the image is required. To download the .tar file: https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz

cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso
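The two cm-create-image invocations differ only in the trailing `--skipdist` (air-gapped) and the `-s` flag (non-air-gapped). The sketch below makes that difference explicit by printing, not running, the command for a given environment. The function name and the "airgapped"/"connected" keywords are hypothetical. The flags themselves are copied from the two commands above.

```shell
#!/bin/sh
# Print (do not run) the cm-create-image command for the given environment.
# $1: "airgapped" or "connected"; $2: base OS tarball; $3: BCM ISO.
# The flags mirror the two commands in this section; nothing is executed.
create_image_cmd() (
    mode=$1 tarball=$2 iso=$3
    common="cm-create-image -n dgx-os-7.1-gb200-image -a $tarball"
    case $mode in
        airgapped) echo "$common --dgx -r --no-cm-cuda-repo --cmdvd $iso --skipdist" ;;
        connected) echo "$common -s --dgx -r --no-cm-cuda-repo --cmdvd $iso" ;;
        *) echo "unknown environment: $mode" >&2; exit 1 ;;
    esac
)
```

Printing the command first lets the operator review the tarball and ISO paths before starting a long-running image build.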
  • Perform BCM “Pod Install” Process

module load bcm-post-install
bcm-pod-setup -I /root/bcm-11.0-ubuntu2404-dgx-os-7.2_arm64_RC3.iso \
    --cpu-node-base storage-cpu \
    --dgx-node-base storage-dgx \
    -C 100.126.0.0/16 \
    --dgx-image dgx-image

bcm-pod-setup will do the following:

Status

Action

[Manual] If no BCM bond0 is set, bond1 will be set up to finalize the NICs.

Creates category: dgx-<dgxtype> (dgx-gb200, dgx-b200, slogin, dgx)

[Manual] Update node-installer.conf file:

Creates DGX Storage Setup (xml)

category; use dgx-gb200;

set disksetup

Creates Persistent-storage-<dgxtype>

Adds 60-persistent-storage-<dgxtype>.rules to

Manual Configuration Required:

/cm/node-installer/scripts/node-installer.conf

- setupBmc = false
- failOnMissingBmc = false
- strictBmcUserId = false
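The three node-installer.conf settings above can be applied by hand in an editor. As an alternative, the sketch below shows one way to script the change. It assumes the file stores settings as `key = value` lines, one per line, and the helper names are hypothetical. It keeps a single .bak copy of the original file.

```shell
#!/bin/sh
# Set "key = value" in a config file: rewrite the existing line for that key,
# or append a new line if the key is absent.
set_conf_key() (
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || exit 1
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp"
    else
        { cat "$file"; echo "${key} = ${value}"; } > "$tmp"
    fi
    # Write back through the original path so its permissions are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
)

# Apply the three settings listed above to the file given as $1.
disable_bmc_checks() {
    cp "$1" "$1.bak" || return 1   # one backup of the original file
    set_conf_key "$1" setupBmc false &&
    set_conf_key "$1" failOnMissingBmc false &&
    set_conf_key "$1" strictBmcUserId false
}
```

It would be invoked as `disable_bmc_checks /cm/node-installer/scripts/node-installer.conf`. Trying it on a copy of the file first is a sensible precaution.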

Attention

Importing the JSON is not required, because bcm-pod-setup adds the networks, devices, and racks automatically.

  • User will be required to provide the following passwords when running the netautogen command:

    • Cumulus OS Password

    • NVSwitch NVOS Password

    • Server BMC Password

Note

Cumulus and NVOS are x86. If BCM is running on ARM64, an additional x86 (AMD64) ISO image, <name>_x86.iso, must be mounted.

This ISO is required to retrieve the cm-lite-daemon package, which is necessary for installation on Cumulus and NVOS systems.

Steps:

# Upload bcm_x86.iso on to BCM
mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
cd /mnt/dvd/data
cm-lite-daemon-repo /mnt/dvd/

# File will be copied over here:
ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite
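Whether the extra x86 ISO is needed depends only on the architecture of the BCM head node. The sketch below encodes that decision as a small helper. The function name is hypothetical, and it assumes an ARM64 head node reports `aarch64` or `arm64` from `uname -m`.

```shell
#!/bin/sh
# Return 0 when the given machine architecture means the extra x86 ISO is needed.
# $1 defaults to the output of `uname -m` on the current host.
needs_x86_iso() (
    arch=${1:-$(uname -m)}
    case $arch in
        aarch64|arm64) exit 0 ;;
        *) exit 1 ;;
    esac
)
```

For example, `needs_x86_iso && echo "mount the x86 ISO first"` prints the reminder only on an ARM64 host.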

Manual Process#

This section provides general guidance for setting up 1-2 racks only. It does not include all the steps involved in configuring the entire fabric manually.

Please refer to the prerequisites section for gathering required files.

  • p2p_ethernet.csv (required): Format the Excel file to a CSV file.

  • siteinfo.yaml (required): For more information, see Section 1.1: siteinfo.yaml.

  • rack-<rackName>.csv (optional): Rack Inventory for GB200. This file is provided by the factory.

To configure the Cumulus switches using the template:

  • Create Cumulus NVUE commands from scratch based on the reference (Ref. Guide)

  • Prepare Installation Media (USB) (Guide)

  • Connect the USB#2 stick to each Ethernet switch, reboot the switch, and let it install or upgrade Cumulus Linux via ONIE

  • After the switches are rebooted, connect to the switch via serial console and apply the configuration

  • Apply Configuration

    • Upload the startup.yaml

      • copy the content to /etc/nvue.d/startup.yaml

      • nv config replace /etc/nvue.d/startup.yaml

      • nv config apply -y

  • After the switches have rebooted and the configuration has been applied via the serial console, the next step is to add the networks, devices, and racks to BCM.

  • See the section rack-bring-up-install guide for next steps.
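The Apply Configuration steps above can be wrapped in a small helper so that the same three commands run in the same order on every switch. The sketch below is one way to do this. The function name and the `DRY_RUN` variable are hypothetical, and with `DRY_RUN=1` (the default here) the commands are only printed, so the sequence can be reviewed before it is run on a switch.

```shell
#!/bin/sh
# Apply a generated startup.yaml on a Cumulus switch using the NVUE commands
# from the steps above. With DRY_RUN=1 (the default) commands are only printed.
apply_startup() (
    src=$1
    [ -f "$src" ] || { echo "config not found: $src" >&2; exit 1; }
    run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run cp "$src" /etc/nvue.d/startup.yaml &&
    run nv config replace /etc/nvue.d/startup.yaml &&
    run nv config apply -y
)
```

Running `DRY_RUN=0 apply_startup <file>` on the switch executes the commands for real. The helper stops at the first command that fails.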

Semi-Automated Process (DHCP/ZTP)#

The following steps provide a Semi-Automated way to bring up the Network Switch Configuration:

Note

The semi-automated process provides a method to bring up the network switch configuration without requiring manual setup. It uses pre-generated startup.yaml files prepared in advance of deployment. Unlike the fully automated workflow, it does not require p2p_ethernet.csv or siteinfo.yaml, since all switch configurations and the corresponding BCM JSON files are already pre-generated.

  • Copy Cumulus Switch Configuration Files from USB:

  • Copy the switch startup.yaml from USB#2 to /cm/local/apps/cmd/etc/htdocs/switch/config/

  • Adjust ZTP File Contents (Deployment Specific)
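Copying the pre-generated switch configurations can be scripted so that nothing is missed when there are many switches. The sketch below copies every YAML file from the top level of a source directory into a destination directory. The function name is hypothetical, and it assumes the startup files on USB#2 end in .yaml and are not nested in subdirectories.

```shell
#!/bin/sh
# Copy every YAML file found at the top level of $1 (for example a mounted
# USB stick) into $2 (the BCM switch configuration directory).
stage_switch_configs() (
    src=$1 dest=$2
    mkdir -p "$dest" || exit 1
    count=0
    for f in "$src"/*.yaml; do
        [ -f "$f" ] || continue      # the glob matched nothing
        cp "$f" "$dest/" || exit 1
        count=$((count + 1))
    done
    echo "copied $count file(s) to $dest"
    [ "$count" -gt 0 ]               # fail if there was nothing to copy
)
```

The first argument would be the USB#2 mount point and the second the switch configuration directory named in the step above.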

Next, configure the BCM server for DHCP/ZTP Ethernet switch bring-up:

  • Using CMSH, add the networks, devices, and racks:

    • Import the .json file created by NetAutoGen into BCM.

  • Upload the following files to the BCM server:

    • Cumulus Linux Installer Binary

    • Cumulus Linux Switch Configuration Files

    • NVOS image

  • Initiate Cumulus Linux switch provisioning.

Fully Automated Process (BCM/ZTP)#

Note

The fully automated process provides a method to bring up the network switch configuration automatically, by using the bcm-netautogen and bcm-pod-setup tools.

On-Site Process#

  • Use USB#1 to bring up the BCM server

  • Use USB#2 to bring up the network switch configuration

  • Copy the gathered files to the BCM server. See the prerequisites section for gathering the required files.

  • Insert USB#2 into the BCM.

  • Copy the content from USB#2 into BCM

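# Identify the USB device (the /dev/sdb1 used below may differ on your system)
sudo fdisk -l
# Mount the USB stick
sudo mkdir -p /media/BCM
sudo mount -t vfat /dev/sdb1 /media/BCM
lsblk
# Copy the switch OS installer binaries to the BCM switch image directory
sudo rsync -av /media/BCM/*.bin /cm/local/apps/cmd/etc/htdocs/switch/image
# Copy all files from the USB stick to a local working directory
sudo rsync -av /media/BCM/* <any local path>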
  • Follow the same process as the Pre-Run Process starting from step 4.

  • Build Access-OOB switch first before provisioning other switches

    • [Semi-Manual] add the network to BCM if BCM-01 has a connection to the Access-OOB switch, where the eth0 interfaces of other switches are physically connected. (Ref. Guide)

      • Use a USB-to-serial connection to apply the startup.yaml configuration file to the Access-OOB switch.

      • Use the ZTP process to take the switch configuration.

    • [Manual] Once NetAutoGen has completed generating configurations for all switches, use a USB-to-serial connection to upload the startup.yaml file to each switch.

  • The order of switch bring-up is as follows:

    • Power cycle: All the OOBs, SPINEs, TORs,

  • [Pod-setup] BCM will also have all the devices added including NVSwitches.

  • Ensure all parameters are properly configured before provisioning the control plane nodes.

Warning

If you choose one of the options below, no networks or devices will be added to BCM. Please proceed with caution.

Alternative Deployment Options#

Option 1: NO bcm-netautogen, YES bcm-pod-install#

If BCM is running on ARM64, an additional ISO image named <name>_x86.iso must be mounted. This ISO contains the cm-lite-daemon package, which is required for installation on Cumulus and NVOS systems.

# Upload bcm_x86.iso on to BCM
mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
cd /mnt/dvd/data
cm-lite-daemon-repo /mnt/dvd/

# File will be copied over here:
ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite

bcm-pod-setup -I /root/bcm-<image>.iso -C 100.126.0.0/16 -S 100.127.0.0/16 --dgx-type gb200

Note: The --dgx-type gb200 flag is added when bcm-netautogen has not been executed. Since bcm-netautogen was not run, the dgx-type value is instead taken from siteinfo.yaml.

Option 2: NO bcm-netautogen, NO bcm-pod-install#

CM-Create IMAGE

A pre-existing DGX image is required (/cm/image/<dgx_image>).

Depending on the Environment:

Air-gapped environment: add --skipdist (so that apt-get packages are not updated)

cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist

Non-Airgapped Environment

# CREATE IMAGE REQUIRED
# To Download the .tar file https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz

cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso