Configure Cluster Ethernet Networking#

The following files are required; they can be created manually or generated automatically.

For each network that is set up within BCM, there is an equivalent set of physical switches that must be deployed and configured:

  • (edge) - Edge Network

  • (internalnet) - BCM Inband Network Switch - Control Plane/Internalnet

  • (storagenet) - DGX Inband/Storage Network Switches

  • (ipminet0) - OOB Network (SN2201)

  • (ipminet(n)) - NVSwitch COMe Network, PDU/Power Shelves, IB Switches, In-Rack OOB

Refer to the M0 document for pre-work that must be completed before deploying the configuration for each switch type above.

USB Preparation#

Prepare two USB sticks (~64 GB each):

USB #1: Bootable BCM ISO

USB #2:

  • Cumulus OS (.bin), NVOS (.bin), Infiniband Switch OS (.img)

  • p2p_ethernet.csv, siteinfo.yml (Please see the Appendix …. for examples)
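Before heading on site, it is worth verifying that USB #2 actually carries everything listed above. The following is a minimal sketch of such a check; the file names in `REQUIRED` are placeholders (the real `.bin`/`.img` names depend on the versions you download), and the mount point is an assumption.

```python
from pathlib import Path

# Placeholder names -- substitute the actual files you downloaded.
REQUIRED = [
    "cumulus-linux.bin",   # Cumulus OS image
    "nvos.bin",            # NVOS image
    "ib-switch-os.img",    # InfiniBand switch OS image
    "p2p_ethernet.csv",
    "siteinfo.yml",
]

def missing_files(mount_point: str) -> list[str]:
    """Return the required files that are not present on the USB stick."""
    root = Path(mount_point)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

For example, `missing_files("/media/BCM")` returns an empty list when the stick is complete.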

GB200 Rack Inventories Workflow#


Figure 2 GB200 Rack Inventories Workflow#

  1. Factory Preparation: Prepare the racks at the factory.

  2. Data Collection: The factory collects data for each rack’s components and sends an Excel file for each rack.

  3. Cable Mapping: Prepare point-to-point (P2P) files for cable mapping.

  4. Rack Arrival and Placement: Receive racks from the factory (not necessarily in order) and roll the GB200 racks into their reserved locations as they arrive.

  5. Parser: Parse the rack inventory file and create the MAC-to-IP allocation (check with the NVIS team).

  6. Rack Identification: Identify the rack serial numbers and map them to rack names in the site survey file, along with other SuperPOD build information.

    1. Mapping Format: <CustomerRackPositionName>: <RackSerial#>

    2. Netautogen Tool: Run the netautogen tool.

    3. File Retrieval: Pull the P2P file and site survey data, including rack mapping, network details, and other information.

  7. Data Processing: The tool identifies data based on the rack-name mapping and retrieves serial numbers, interface names, and MAC addresses for each component.

  8. Data Generation: Generate data for each component, including IP addresses, serial numbers, MAC addresses, and interfaces.

  9. BCM Configuration: Add configuration to BCM with the network, devices, and packages.
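Steps 5-8 above amount to joining the rack-name mapping against the per-rack inventory and handing out IP addresses in order. The sketch below illustrates that logic only; the data structures, field names, and subnet are illustrative assumptions, not the actual bcm-netautogen input format.

```python
import ipaddress

# Hypothetical inputs -- real data comes from the factory Excel files
# and the site survey; field names here are illustrative only.
rack_mapping = {  # <CustomerRackPositionName>: <RackSerial#>
    "A01": "RACK-SN-0001",
    "A02": "RACK-SN-0002",
}
inventory = {  # per-rack component list, keyed by rack serial number
    "RACK-SN-0001": [{"serial": "C1", "iface": "eth0", "mac": "aa:bb:cc:00:00:01"}],
    "RACK-SN-0002": [{"serial": "C2", "iface": "eth0", "mac": "aa:bb:cc:00:00:02"}],
}

def allocate(rack_mapping, inventory, subnet):
    """Assign one IP per component, walking racks in site-survey order."""
    hosts = ipaddress.ip_network(subnet).hosts()
    records = []
    for position, rack_serial in sorted(rack_mapping.items()):
        for comp in inventory[rack_serial]:
            records.append({**comp, "rack": position, "ip": str(next(hosts))})
    return records

records = allocate(rack_mapping, inventory, "100.127.0.0/24")
```

Each resulting record carries the rack position, component serial, interface, MAC, and allocated IP, which is the data BCM consumes in step 9.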

Manual Process#

This section provides general guidance for a 1-2 rack setup only. It does not include all the steps required to configure the entire fabric manually.

The following stages make up the workflow for manual setup of the rack:

  1. Preparation and Documentation

    • Prepare all necessary files and documentation for deployment

    • Prepare configuration files and templates

    • Create USB sticks with deployment tools

    • Document network topology and requirements

    • Validate hardware inventory and specifications

  2. On-Site Validation and Testing

    • Test network connectivity and routing

    • Validate BGP underlay and EVPN overlay

    • Test device discovery and management

    • Verify service functionality and performance

  3. Zero Touch Provisioning (ZTP) Readiness

    • Validate ZTP configuration and readiness

Stage 1: Preparation: Build the IP breakout as follows:#

GB200:#

Minimum subnet calculation for 4 full GB200 racks (number of switches):

| POD | RACKs | GPUs | DGX Systems | CPU |
|-----|-------|------|-------------|-----|
| 1   | 8     | 576  | 144         | 10  |
| 2   | 16    | 1152 | 288         | 10  |
| 3   | 24    | 1728 | 432         | 10  |
| 4   | 32    | 2304 | 576         | 10  |
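The GPU and DGX-system counts scale linearly with the rack count; the per-rack figures below (72 GPUs and 18 compute trays per GB200 rack) are inferred from the table itself and can be used as a quick sanity check when sizing subnets for other POD counts.

```python
# Per-rack figures inferred from the table above.
GPUS_PER_RACK = 576 // 8   # 72 GPUs per GB200 rack
DGX_PER_RACK = 144 // 8    # 18 compute trays per GB200 rack

def pod_totals(pods: int, racks_per_pod: int = 8) -> dict:
    """Scale the per-rack figures to a POD count, mirroring the table."""
    racks = pods * racks_per_pod
    return {"racks": racks,
            "gpus": racks * GPUS_PER_RACK,
            "dgx": racks * DGX_PER_RACK}
```

For example, `pod_totals(4)` reproduces the last table row: 32 racks, 2304 GPUs, 576 DGX systems.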

OOB Subnet Breakouts

| OOB Networks    | Configuration 1 | Configuration 2 |
|-----------------|-----------------|-----------------|
| ipminet1        | 2 x /24         | 1 x /23         |
| ipminet1[1-16]  | 2 x /24         | 2 x /23         |
| ipminet2[1-16]  | 2 x /24         | 3 x /23         |
|                 | 2 x /24         | 4 x /23         |

DATA Subnet Breakouts

| internalnet | MISC (Lo0/Edge) | dgxnet1[1-n] | OOB Root Prefix | DATA Root Prefix | Root Prefix Size |
|-------------|-----------------|--------------|-----------------|------------------|------------------|
| 1 x /24     | 1 x /24         | 2 x /25      | /21             | /22              | /20              |
| 1 x /24     | 1 x /24         | 4 x /25      | /21             | /22              | /20              |
| 1 x /24     | 1 x /24         | 6 x /25      | /21             | /21              | /20              |
| 1 x /24     | 1 x /24         | 8 x /25      | /21             | /21              | /20              |
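The breakouts above can be reproduced with Python's `ipaddress` module: carve the root prefix into /24s, hand one each to internalnet and MISC, and split further /24s into /25s for the dgxnet networks. The root prefix below is illustrative, not the site's actual allocation.

```python
import ipaddress

# Illustrative root prefix; substitute the site's actual allocation.
root = ipaddress.ip_network("100.127.0.0/21")

free = list(root.subnets(new_prefix=24))            # a /21 yields 8 x /24
internalnet = free.pop(0)                           # 1 x /24
misc = free.pop(0)                                  # 1 x /24 (Lo0/Edge)
dgxnets = list(free.pop(0).subnets(new_prefix=25))  # 1 x /24 -> 2 x /25
```

Each additional pair of /25 dgxnet subnets consumes one more /24 from the pool, which is why the DATA root prefix grows from /22 to /21 as the breakout reaches 6-8 x /25.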

Switch Configuration using Template:#

Reference the Cumulus NVUE command templates for the OOB, TOR, and SPINE switches.

Once the switch configurations are prepared, copy them onto USB #2 and then download the Cumulus Linux installer binary:

Go to the NVIDIA Enterprise Portal and download: Downloads > Switches and Gateways > Switch Software > NVIDIA Cumulus Linux.

Prepare the installation media (USB).

Stage 2: On-Field Deployment#

Option 1: Provision the Ethernet switch manually#

Use the following steps to provision the OS manually using a USB stick:

  1. Connect USB #2 to the Cumulus switch.

  2. Power cycle the switch.

  3. Using a USB-C to RJ45 console cable, connect the USB-C end to the laptop (Mac) and the RJ45 end to the console port of the switch.

Console into the Switch:

If you are using macOS, the following commands can be used:

ls /dev/cu.*    ## Look for a device named something like usbserial
screen /dev/cu.<usbserial-device> 115200

If NVUE commands were created, copy and paste them directly into the command line:

nv config show
nv config diff
nv config apply -y

If you have prepared a startup.yaml file, then do the following:

## Copy the startup.yaml
cp /media/BCM/<hostname>_startup.yaml /etc/nvue.d/startup.yaml
nv config replace /etc/nvue.d/startup.yaml
nv config apply -y
  4. Disconnect USB #2 and the console cable (RJ45) from the switch.

  5. Repeat steps 1-4 for all the Cumulus Ethernet switches.

Option 2: Provision your Ethernet Switch using BCM through the ZTP process#

If you are following the second option, ZTP-provisioning the Ethernet switch using BCM:

  1. Ensure USB #2 contains the pre-generated and/or modified configuration files.

  2. Insert USB #2 into the BCM head node.

  3. Copy the content from USB #2 into BCM using the following command sequence:

sudo fdisk -l
lsblk
sudo mkdir -p /media/BCM
sudo mount -t vfat /dev/sdb1 /media/BCM
sudo rsync -av /media/BCM/*.bin /cm/local/apps/cmd/etc/htdocs/switch/image
sudo rsync -av /media/BCM/* <any local path>
  4. Use the configuration tailored to the customer project.

  5. Copy the Cumulus switch configuration files from USB #2.

  6. Transfer startup.yaml to the following directory: /cm/local/apps/cmd/etc/htdocs/switch/<hostname>/startup.yaml.

Choose your option:#

NO bcm-netautogen, YES bcm-pod-install:#

If the BCM is operating on an ARM64 architecture, an additional ISO image named <name>_x86.iso must be mounted. This ISO contains the cm-lite-daemon package, which is required for installation on Cumulus and NVOS systems.

Upload the bcm_x86.iso onto BCM using the following commands:

mount -o loop <file_name_bcm_x86.iso> /mnt/dvd/
cd /mnt/dvd/data
cm-lite-daemon-repo /mnt/dvd/

## The file will be copied over here:
ls -l /cm/local/apps/cmd/etc/htdocs/switch/ | grep cm-lite

Then run bcm-pod-setup:

bcm-pod-setup -I /root/bcm-<image>.iso -C 100.126.0.0/16 -S 100.127.0.0/16 --dgx-type gb200

The --dgx-type gb200 flag is added because the bcm-netautogen tool was not executed; in that case, the dgx-type value is taken from siteinfo.yaml instead.

NO bcm-netautogen, NO bcm-pod-install#

cm-create-image requires a pre-existing DGX image (/cm/image/<dgx_image>).

Depending on the Environment:

Airgapped Environment

Add the --skipdist flag so that apt packages are not updated:

cm-create-image -n dgx-os-7.1-gb200-image -a /root/baseos7.1-image-arm64-04-25-2025.tar.gz --dgx -r --no-cm-cuda-repo --cmdvd /root/bcm-11.0-ubuntu2404-dgx-os-7.1.iso --skipdist

Non-Airgapped Environment

Creating the image requires downloading the .tar file from https://support2.brightcomputing.com/baseos7-<ARCH>/<latest>.tar.gz, then running:

cm-create-image -n dgx-os-7.1-gb200-image -a baseos7.1-image-arm64-04-25-2025.tar.gz -s --dgx -r --no-cm-cuda-repo --cmdvd bcm-11.0-ubuntu2404-dgx-os-7.1.iso

Configure the following features manually:

Network

cmsh
network; list
add ipminet0
set netmaskbits 26
set baseaddress <subnet network>
set nodebooting yes
set dynamicrangestart <network start range>
set dynamicrangeend <network end range>
set gateway <subnet gw>
set type Internal
set domainname cm.ipminet0
exit
commit
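The values plugged into cmsh above (base address, netmask bits, gateway, and dynamic range) can be derived with Python's `ipaddress` module. The /26 subnet below is an example, and placing the gateway on the first usable address is a common convention, not a requirement.

```python
import ipaddress

# Example ipminet0 subnet; substitute the site's actual values.
subnet = ipaddress.ip_network("10.0.0.0/26")
hosts = list(subnet.hosts())                # 62 usable addresses in a /26

baseaddress = str(subnet.network_address)   # -> set baseaddress
netmaskbits = subnet.prefixlen              # -> set netmaskbits 26
gateway = str(hosts[0])                     # -> set gateway (convention only)
dynamicrangestart = str(hosts[1])           # -> set dynamicrangestart
dynamicrangeend = str(hosts[-1])            # -> set dynamicrangeend
```

For 10.0.0.0/26 this gives a gateway of 10.0.0.1 and a dynamic range of 10.0.0.2 through 10.0.0.62.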

Switch

add switch <switch_hostname>
set ip <ipv4address>
set network ipminet0
set mac <provide correct mac>
set nvconfigurationmode file
set nvconfigurationfile <file path of switch configuration>
set hasclientdaemon yes
set disablesnmp yes
ztpsettings
set enableapi yes
set checkimageonboot yes
set image cumulus-linux-<version>-mlx-amd64.bin
exit
commit

The following images show examples of the switch settings:

NV Configuration

Figure 3 NV configuration example output#

ZTP Settings

Figure 4 ZTP settings example output#

Initiate primary OOB Cumulus Linux switch provisioning:

  1. Connect the USB-to-serial cable and install/upgrade Cumulus Linux.

  2. Apply the configuration.

  3. Upload the startup.yaml.

  4. Copy the content to /etc/nvue.d/startup.yaml:

nv config replace /etc/nvue.d/startup.yaml
nv config apply -y

Once the Access-OOB is configured, all ETH0 interfaces should be connected to the Access-OOB network. Based on the modified startup configuration placed in the designated BCM location, ZTP will automatically apply the configuration and install the cm-lite-daemon as part of the Cumulus ZTP process.

Setup Access OOB Switch#

  1. Build the Access-OOB switch first, before provisioning other switches. What is an Access OOB switch? It is an SN2201 OOB switch connected to BCM through an additional 1G RJ45 cable. Its purpose is to provision the first OOB switch, which is typically where most of the core Ethernet switch management connections are made.

    Access OOB Switch

    Figure 5 Access OOB Switch#

  2. Check that a 1G cable is connected to the OOB switch ETH0 port, then run the following command:

    cmsh -c "network; add provision; set domainname provision.cluster; set baseaddress 192.168.0.0; set netmaskbits 30; set nodebooting yes; set dynamicrangestart 192.168.0.1; set dynamicrangeend 192.168.0.1; commit"
    

    A new network named “provision” will be created in the BCM with the network address 192.168.0.0/30.

  3. Reboot Access-OOB:

    The rebooted switch will receive the 192.168.0.1 IP address from BCM and apply the startup.yaml file.
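The /30 chosen for the provision network leaves exactly two usable addresses, which is why the dynamic range collapses to the single address 192.168.0.1; the other usable address stays on the BCM side of the link. A quick check with Python's `ipaddress` module:

```python
import ipaddress

provision = ipaddress.ip_network("192.168.0.0/30")
usable = [str(h) for h in provision.hosts()]
# A /30 holds 4 addresses, only 2 of them usable: one for the BCM side of
# the link, the other (192.168.0.1) handed to the Access-OOB switch via
# the one-address DHCP range configured above.
```

This guarantees that only the single directly cabled Access-OOB switch can ever receive a lease on the provision network.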