Configure Cluster Ethernet Networking#

The files required for this procedure can be created manually or generated automatically.

For each network that is set up within BCM, an equivalent set of physical switches must be deployed and configured:

SN5600 Switches

  • BCM Inband Network Switch - Control Plane/Internalnet

  • Ethernet Core and Spine Switch (For greater than 1 SU SuperPODs)

  • DGX Inband/Storage Network Switches

SN2201 Switches

  • OOB Network (SN2201)

  • NVSwitch COMe Network

  • Control Plane OOB Network (Mgmt Nodes, DDN, etc.)

Deployment Process Options#

The following table provides an overview of the deployment processes available for configuring cluster Ethernet networking:

Process Type             Description                          Reference Section
-----------------------  -----------------------------------  ---------------------------------
Pre-Stage                Initial setup and preparation steps  Prerequisite Stage
Pre-Run                  Validation and preparation steps     Pre-Run Process
Manual Process           Step-by-step manual configuration    Manual Process
Semi-Automated Process   DHCP/ZTP-based automated deployment  Semi-Automated Process (DHCP/ZTP)
Fully Automated Process  BCM/ZTP-based complete automation    Fully Automated Process (BCM/ZTP)

Prerequisite Stage#

This section covers the initial setup and preparation steps required before beginning any deployment process.

Prerequisites:

  • Prefix Allocation: (see Appendix)

    • BMS use - /24 subnet, outside of supernet allocated for cluster

  • P2P-Ethernet formatted for Netautogen to consume

    • p2p_standard_guide (see Appendix)

    • p2p example (see Appendix)

    • p2p column header / worksheet naming / default hostname (see Appendix)

  • SiteInfo file with the following: (see Appendix)

    • Customer provided prefix

    • BGP between Customer SPOD handoff

    • Rack Inventory Mapping for GB200: <Serial>

    • Misc data like: NTP, Syslog, DNS

  • GB200 Rack Inventory file

    • Format: .csv (Only supporting file from: Splunk DB)

    • Example (see Appendix)

  • Download Files:

    • Cumulus Linux Installer Binary

    • NVOS

    • InfiniBand OS

  • Build 2 USB sticks for production:

    Table 8

    USB 1      USB 2
    -------    ---------------------------------------
    BCM ISO    p2p-ethernet.csv, GB200-rack-inventory,
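
Before moving on, it can help to confirm that the gathered files are actually present in one staging directory. The sketch below is a minimal, hypothetical helper: the function name check_inputs and the staging path in the example are assumptions and are not part of BCM. It only reports which of the named files are missing or empty.

```shell
# Hypothetical pre-flight helper, not part of BCM.
# Usage: check_inputs <staging_dir> <file> [<file> ...]
# Prints each missing or empty file to stderr and returns non-zero
# if at least one file is missing or empty.
check_inputs() {
    dir="$1"
    shift
    missing=0
    for name in "$@"; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (the staging path is an assumption; adjust file names to your site):
# check_inputs /root/staging p2p_ethernet.csv siteinfo.yaml
```

Passing the file names as arguments keeps the helper independent of the exact naming used at a given site.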

Pre-Run Process#

This section covers the validation and preparation steps that should be performed before beginning the actual deployment.

  1. To validate P2P and Network Auto Configuration:

    • Analyze / Restructure P2P (using GUI)

      1. Analyze Point-to-Point (P2P) network connections

      2. Identify P2P topology patterns and dependencies

      3. Restructure P2P connections for netautogen to consume

      4. Validate P2P link configurations

    • Network Auto Generate

      1. Generate network configurations based on P2P.

      2. Generate JSON to be imported into BCM as part of optional

    • To Access the UI:

      1. Go to Network Analyzer Dashboard:

        1. Temporary Login: bcm / bcm123

        2. Select 2: Network Auto Generate

          1. Project Name

          2. DGX Type

          3. Upload siteinfo.yml. For more information, see Section 1.1: siteinfo.yaml.

          4. Upload p2p_ethernet.csv

          5. Upload rack inventory files

        3. Press: Generate Configuration

          1. If there is an error, please check the file and logs to fix it manually.

    • Option 1: Download the generated project. (Download Project)

      1. This produces a tar.gz file that contains all the generated configuration files for the switches, along with the BCM JSON files.

    • Option 2: Deploy Virtual Env. in AIR for network validation (Launch Air SIM)

      1. This creates a virtual environment in AIR that can be used to validate the network configuration.

  2. Bring up Base Command Manager (BCM) in Krusty Environment

    Note

    To validate bcm-pod-setup, use Krusty as a virtual example environment.

    $ cod cc --version 11.0-dev --distro ubuntu2404 --head-node-root-volume-size 80 -m cod.8g4c -n 0
    
    $ apt-get update; apt-get install bcm-superpod-network bcm-post-install
    
  3. Upload all the files on to Krusty

  4. Execute BCM “NetAutogen” Automation

    $ module load bcm-superpod-network
    
    $ bcm-netautogen --config-path <path_to_config_files> -l netauto.log
    

    The commands above do the following:

    • Gather data from the p2p_ethernet and parse the data.

    • Generate Network subnets.

    • Allocate IP for each component.

    • Parse GB200 rack inventory information from the file and allocate subnets and IPs.

    • Generate all North-to-South Switch configuration in a startup.yaml along with NVSwitch common switch configurations.

    • Generate a .json file suitable for manual import into BCM as an alternative backup method.

    CM-Create IMAGE:

    bcm-pod-setup looks for a pre-existing DGX image directory (/cm/image/<dgx_image>).

    The cm-create-image command depends on the environment.

    Air-gapped environment: add --skipdist to skip updating the apt-get packages.

    $ cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist
    

    Non-air-gapped environment: creating the image is required. Download the .tar.gz file from: https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz

    $ cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso
    
  5. Perform BCM “Pod Install” Process

    $ module load bcm-post-install
    $ bcm-pod-setup -I /root/bcm-11.0-ubuntu2404-dgx-os-7.2_arm64_RC3.iso \
        --cpu-node-base storage-cpu \
        --dgx-node-base storage-dgx \
        -C 100.126.0.0/16 \
        --dgx-image dgx-image
    

    bcm-pod-setup will do the following:

    Status

    Action

    [Manual] If no BCM bond0 is set, bond1 will be setup to finalize the NICs.

    Creates category: dgx-<dgxtype> (dgx-gb200, dgx-b200, slogin, dgx)

    [Manual] Update node-installer.conf file:

    Creates DGX Storage Setup (xml)

    category; use dgx-gb200;

    set disksetup

    Creates Persistent-storage-<dgxtype>

    Adds 60-persistent-storage-<dgxtype>.rules to

    Manual Configuration Required: edit the following file and set the values listed below.

    /cm/node-installer/scripts/node-installer.conf
    
    • setupBmc = false

    • failOnMissingBmc = false

    • strictBmcUserId = false

    Attention

    Importing the .json file is not required, because bcm-pod-setup adds the networks, devices, and racks automatically.

    The user must provide the following passwords when running the netautogen command:

    • Cumulus OS Password

    • NVSwitch NVOS Password

    • Server BMC Password

    Note

    Cumulus and NVOS are x86-based. If BCM is running on ARM64, an additional x86 image, <NAME>_x86.iso, must be mounted. This ISO is required to retrieve the cm-lite-daemon package, which is necessary for installation on Cumulus and NVOS systems.

    Steps:

    # Upload bcm_x86.iso on to BCM
    $ mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
    $ cd /mnt/dvd/data
    $ cm-lite-daemon-repo /mnt/dvd/
    
    # File will be copied over here:
    $ ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite
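
Whether the extra x86 ISO is needed depends only on the architecture of the BCM head node. As a minimal sketch, the hypothetical helper below (the name needs_x86_iso is an assumption, not a BCM command) classifies a machine string as printed by uname -m.

```shell
# Hypothetical helper, not part of BCM.
# Takes a machine string as printed by `uname -m` and returns 0 when the
# head node is ARM64, which is the case where <NAME>_x86.iso must be mounted.
needs_x86_iso() {
    case "$1" in
        aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# if needs_x86_iso "$(uname -m)"; then
#     echo "ARM64 head node: mount <NAME>_x86.iso before provisioning switches"
# fi
```

Taking the machine string as an argument, rather than calling uname inside the function, makes the check easy to try out on any workstation.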
    

Manual Process#

This section provides general guidance for setups of one or two racks only. It does not include all the steps required to configure the entire fabric manually.

Please refer to the prerequisites section for gathering required files.

  • p2p_ethernet.csv (required): Format the Excel file to a .csv file.

  • siteinfo.yaml (required): For more information, see Section 1.1: siteinfo.yaml.

  • rack-<RACK_NAME>.csv (optional): Rack Inventory for GB200. This file is provided by the factory.
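
After exporting the spreadsheet to .csv, a quick structural check can catch rows that were damaged by the export. The sketch below is a hypothetical helper (csv_rows_consistent is not a BCM tool) and assumes that no field contains a quoted comma. It does not validate column names, which are described in the Appendix.

```shell
# Hypothetical sanity check for an exported CSV file, not a BCM tool.
# Assumption: fields contain no quoted commas.
# Prints every line whose field count differs from the header row and
# returns non-zero if any such line exists.
csv_rows_consistent() {
    awk -F',' '
        NR == 1    { want = NF; next }
        NF != want { printf "line %d: %d fields, expected %d\n", NR, NF, want; bad = 1 }
        END        { exit bad }
    ' "$1"
}

# Example:
# csv_rows_consistent p2p_ethernet.csv
```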

Configuring the Cumulus Switch using the Template#

To configure the Cumulus Switch using the template, follow the steps below:

  1. Create Cumulus NVUE commands from scratch based on the reference (Guide).

  2. Prepare Installation Media: Includes USB and Guide.

  3. Connect the USB#2 stick to each Ethernet switch, reboot the switch, and let it install or upgrade Cumulus Linux using ONIE.

  4. After the switches are rebooted, connect to the switch using the serial console and apply the configuration.

  5. Apply the following configuration to the switch:

    • Upload the startup.yaml file to the switch.

    • Copy the content to: /etc/nvue.d/startup.yaml.

    • Run the following commands:

      $ nv config replace /etc/nvue.d/startup.yaml
      $ nv config apply -y
      
  6. Once the configuration has been applied to the switches, the next step is to add the networks, devices, and racks to BCM.

  7. See the “Rack Bring Up and Install” guide for next steps.
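
The two NVUE commands in step 5 can be wrapped with a basic guard so that an empty or missing file is never applied. This is a sketch only: the function name apply_startup_config is hypothetical, and it is meant to be run on the switch itself.

```shell
# Sketch only: wraps the two NVUE commands from step 5 with a basic guard.
# The function name is hypothetical. The default path is the one used in
# this guide.
apply_startup_config() {
    cfg="${1:-/etc/nvue.d/startup.yaml}"
    if [ ! -s "$cfg" ]; then
        echo "refusing to apply: $cfg is missing or empty" >&2
        return 1
    fi
    # Apply only if the replace step succeeded.
    nv config replace "$cfg" && nv config apply -y
}

# Example:
# apply_startup_config /etc/nvue.d/startup.yaml
```

Chaining the commands with && means a failed replace leaves the running configuration untouched.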

Semi-Automated Process (DHCP/ZTP)#

The following steps provide a Semi-Automated way to bring up the Network Switch Configuration:

Note

The semi-automated process provides a method to bring up the network switch configuration without requiring manual setup. It uses pre-generated startup.yaml files prepared in advance of deployment. Unlike the fully automated workflow, it does not require the p2p_ethernet.csv or siteinfo.yaml files, because all switch configurations and the corresponding BCM JSON files are already pre-generated.

Copy Cumulus Switch Configuration Files from USB:

  1. Copy the switch startup.yaml from USB#2 to /cm/local/apps/cmd/etc/htdocs/switch/config/

  2. Adjust ZTP File Contents (Deployment Specific)

Next, configure BCM Server for DHCP/ZTP Ethernet Switch Bringup:

  1. Using cmsh, add the networks, devices, and racks.

  2. Import the .json file created by NetAutoGen into BCM.

  3. Upload the Following Files to BCM Server:

    • Cumulus Linux Installer Binary

    • Cumulus Linux Switch Configuration Files

    • NVOS image

  4. Initiate Cumulus Linux switch provisioning.
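
Copying the pre-generated switch configurations from the USB stick can be sketched as a small helper. The function name stage_switch_configs is hypothetical, and the sketch assumes that the per-switch configuration files on the stick end in .yaml. Both the USB mount point and the BCM switch configuration directory are passed in as arguments.

```shell
# Hypothetical helper, not part of BCM.
# Copies every *.yaml file from a mounted USB stick into the directory that
# BCM serves switch configurations from.
# Usage: stage_switch_configs <usb_mount_point> <bcm_switch_config_dir>
stage_switch_configs() {
    src="$1"
    dest="$2"
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    count=0
    for f in "$src"/*.yaml; do
        [ -f "$f" ] || continue          # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1
        count=$((count + 1))
    done
    echo "copied $count file(s) to $dest"
    [ "$count" -gt 0 ]                   # fail if nothing was copied
}
```

The helper returns non-zero when no configuration file was found, which makes an empty or wrongly mounted stick visible before ZTP starts.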

Fully Automated Process (BCM/ZTP)#

Note

The fully automated process provides a method to bring up the network switch configuration automatically, by using the bcm-netautogen and bcm-pod-setup tools.

On-Site Process#

To perform the fully automated on-site process, follow the steps below:

  • Use USB#1 to bring up the BCM server.

  • Use USB#2 to bring up the network switch configuration.

  • Copy the gathered files to the BCM server. See Prerequisite Stage for the list of required files.

  • Insert USB#2 into the BCM server.

  • Copy the content from USB#2 into BCM using the following commands:

    $ sudo fdisk -l
    $ sudo mkdir -p /media/BCM
    $ sudo mount -t vfat /dev/sdb1 /media/BCM
    $ lsblk
    $ sudo rsync -av /media/BCM/*.bin /cm/local/apps/cmd/etc/htdocs/switch/image
    $ sudo rsync -av /media/BCM/* <any local path>
    
  • Follow the same process as the Pre-Run Process starting from step 4.

  • Build Access-OOB switch first before provisioning other switches

    • [Semi-Manual] add the network to BCM if BCM-01 has a connection to the Access-OOB switch, where the eth0 interfaces of other switches are physically connected: (Ref Guide)

      • Use a USB-to-serial connection to apply the startup.yaml configuration file to the Access-OOB switch.

      • Use the ZTP process to take the switch configuration.

    • [Manual] Once NetAutoGen has completed generating configurations for all switches, use a USB-to-serial connection to upload the startup.yaml file to each switch.

  • Order of Switch Bringup is below:

    • Power cycle: All the OOBs, SPINEs, TORs.

    • [Pod-setup] BCM will also have all the devices added including NV Link Switches.

  • Ensure all parameters are properly configured before provisioning the control plane nodes.
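
After the rsync step, it may be worth confirming that each installer image arrived intact before switches start pulling it. The sketch below is a hypothetical helper (verify_images is not a BCM command) that compares each .bin file on the USB stick byte for byte with its copy.

```shell
# Hypothetical post-copy check, not part of BCM.
# Confirms that each *.bin file on the USB stick has a byte-identical copy
# in the destination directory.
# Usage: verify_images <usb_mount_point> <image_dir>
verify_images() {
    src="$1"
    dest="$2"
    status=0
    for f in "$src"/*.bin; do
        [ -f "$f" ] || continue          # the glob matched nothing
        name=$(basename "$f")
        if cmp -s "$f" "$dest/$name" 2>/dev/null; then
            echo "ok: $name"
        else
            echo "MISMATCH or missing: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, using the paths from the commands above:
# verify_images /media/BCM /cm/local/apps/cmd/etc/htdocs/switch/image
```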

Warning

If you choose the options below, no networks or devices will be added to the BCM. Proceed with caution.

Alternative Deployment Options#

Option 1: NO bcm-netautogen, YES bcm-pod-install#

If BCM is running on ARM64:

If the BCM is operating on an ARM64 architecture, an additional ISO image named <NAME>_x86.iso must be mounted. This ISO contains the cm-lite-daemon package, which is required for installing on Cumulus and NVOS systems.

# Upload bcm_x86.iso on to BCM
$ mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
$ cd /mnt/dvd/data
$ cm-lite-daemon-repo /mnt/dvd/

# File will be copied over here:
$ ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite

$ bcm-pod-setup -I /root/bcm-<image>.iso -C 100.126.0.0/16 -S 100.127.0.0/16 --dgx-type gb200

Note

Add the --dgx-type gb200 flag when the bcm-netautogen tool has not been executed. When bcm-netautogen has been executed, the dgx-type value is instead taken from siteinfo.yaml.

Option 2: NO bcm-netautogen, NO bcm-pod-install#

CM-Create IMAGE

The DGX image requires a pre-existing image (/cm/image/<DGX_IMAGE>).
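
A simple way to check for the pre-existing image is to test that its directory exists and is not empty. The helper below is a hypothetical sketch (image_present is not a BCM command). The image root is passed as an argument so that the path from this guide can be supplied as written.

```shell
# Hypothetical pre-flight check, not part of BCM.
# Succeeds only if the named image directory exists and is not empty.
# Usage: image_present <image_root> <image_name>
image_present() {
    dir="$1/$2"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Example (image root as written in this guide):
# image_present /cm/image "<DGX_IMAGE>" || echo "image not found; run cm-create-image first"
```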

Airgapped Environment

Add --skipdist to skip updating the apt-get packages.

$ cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist

Non-Airgapped Environment

# CREATE IMAGE REQUIRED
# To Download the .tar file https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz

$ cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso