BCM Networking Setup#
This chapter describes the manual method for configuring the networks of a reference DGX SuperPOD. Customers who do not follow the reference architecture will need to deviate from these instructions. The automated method uses the bcm-netautogen tool, which creates the necessary networks based on the DGX SuperPOD Ethernet reference architecture (RA).
Manual BCM Networking Setup#
For OEMs, the networks must be added to BCM manually. internalnet is defined during BCM head node installation but can be modified here. globalnet is also a predefined network and is where the global network type is defined. For a more in-depth explanation of adding and configuring networks within BCM 11, consult Section 3.2 Network Settings of the BCM 11 Administrator Manual. In general, the following network definitions need to be added for a GB200 NVL72 cluster:
internalnet
dgxnet (only if a separate network subnet is being used to provision GB200 compute nodes)
ipminet
computenet
storagenet
failovernet (only used if a dedicated heartbeat RJ45 cable is connected between head nodes)
For internalnet, dgxnet, and ipminet, set node booting to yes and management allowed to yes.
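Each of these settings corresponds to a cmsh network parameter. A minimal sketch of enabling them on an existing network, using dgxnet as an example (the per-network examples later in this section set the same parameters when the network is first added):
cmsh
network
use dgxnet
set nodebooting yes
set managementallowed yes
commit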
internalnet#
internalnet is used to provision the control plane nodes in a reference SuperPOD design. By default, both node booting and management allowed are set to yes. This means that the network will hand out DHCP leases and can be assigned to a category. This should have been set up during the BCM software installation.
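Because internalnet already exists from the head node installation, it is modified rather than added. A minimal sketch of reviewing and adjusting it in cmsh, assuming the values shown in the reference output below:
cmsh
network
use internalnet
show
set mtu 9000
set domainname eth.cluster
commit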
Reference: internalnet settings#
[a03-p1-head-01->network[internalnet]]% show
Parameter Value
-------------------------- ---------------------
Name internalnet
Private Cloud
Revision
Domain Name eth.cluster
Type Internal
MTU 9000
Allow autosign Automatic
Write DNS zone both
Node booting yes
Lock down dhcpd no
Management allowed yes
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 7.241.16.0
Broadcast address 7.241.16.255
Dynamic range start 7.241.16.249
Dynamic range end 7.241.16.254
Netmask bits 24
Gateway 7.241.16.1
Gateway metric
Cloud Subnet ID
EC2AvailabilityZone
Layer3 no
Notes <0B>
dgxnet#
dgxnet is a separate subnet that provisions the DGX nodes. The DHCP pool defined here provides the initial IP addresses handed out by the node installer until a node is identified by BCM and its defined configuration is applied. For a 1SU (8x GB200 rack) DGX SuperPOD, there should be two dgxnet subnets; a sketch of the second subnet follows the example below.
Example: dgxnet#
cmsh
network
add dgxnet
set mtu 9000
set domainname dgxnet.cluster
set nodebooting yes
set managementallowed yes
set baseaddress <dgxnet subnet>
set dynamicrangestart <dynamic range ip start value>
set dynamicrangeend <dynamic range ip end value>
set netmaskbits 24
set gateway <gateway ip value>
commit
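For a 1SU build with two dgxnet subnets, the second network can be added the same way. A minimal sketch, assuming the name dgxnet2 and placeholder addressing values:
cmsh
network
add dgxnet2
set mtu 9000
set domainname dgxnet2.cluster
set nodebooting yes
set managementallowed yes
set baseaddress <dgxnet2 subnet>
set dynamicrangestart <dynamic range ip start value>
set dynamicrangeend <dynamic range ip end value>
set netmaskbits <dgxnet2 netmask bits>
set gateway <gateway ip value>
commit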
Reference: dgxnet settings#
[a03-p1-head-01->network[dgxnet1]]% show
Parameter Value
-------------------------- ---------------------
Name dgxnet1
Private Cloud
Revision
Domain Name dgxnet1.cluster
Type Internal
MTU 9000
Allow autosign Automatic
Write DNS zone both
Node booting yes
Lock down dhcpd no
Management allowed yes
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 7.241.18.0
Broadcast address 7.241.18.127
Dynamic range start 7.241.18.100
Dynamic range end 7.241.18.126
Netmask bits 25
Gateway 7.241.18.1
Gateway metric 0
Cloud Subnet ID
EC2AvailabilityZone
Layer3 no
Notes <0B>
ipminet#
For DGX SuperPOD, the RA defines several ipminet networks that provide OOB access to the control plane nodes, the PDUs, the InfiniBand switches, and the GB200 racks.
OOB subnet overview:
/16 - /21: The subnet will be divided into blocks of /24s. (DEFAULT)
The first /24 will be allocated to the control plane nodes.
The second /24 will be allocated to the PDU, PWR, and InfiniBand switches.
The remaining /23 blocks will be distributed across the four GB200 rack groups, which include:
4x GB200 compute racks (DGX compute nodes).
Repeat the following example for each IPMI network.
Example: ipminet0#
cmsh
network
add ipminet0
set mtu 9000
set domainname ipminet0.cluster
set nodebooting yes
set managementallowed yes
set baseaddress <ipminet0 base address>
set dynamicrangestart <ipminet0 dynamic range start>
set dynamicrangeend <ipminet0 dynamic range end>
set netmaskbits 24
set gateway <ipminet0 gateway>
commit
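Rather than retyping every parameter, the remaining IPMI networks can also be created by cloning the first one and changing only the per-network values. A minimal sketch, assuming ipminet1 as the next network name and placeholder addressing values:
cmsh
network
clone ipminet0 ipminet1
set domainname ipminet1.cluster
set baseaddress <ipminet1 base address>
set dynamicrangestart <ipminet1 dynamic range start>
set dynamicrangeend <ipminet1 dynamic range end>
set gateway <ipminet1 gateway>
commit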
Reference: ipminet0 settings#
[a03-p1-head-01->network[ipminet0]]% show
Parameter Value
----------------------- ----------------------
Name ipminet0
Private Cloud
Revision
Domain Name ipminet0.cluster
Type Internal
MTU 9000
Allow autosign Automatic
Write DNS zone both
Node booting yes
Lock down dhcpd no
Management allowed yes
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 7.241.0.0
Broadcast address 7.241.0.255
Dynamic range start 7.241.0.150
Dynamic range end 7.241.0.254
Netmask bits 24
Gateway 7.241.0.1
Gateway metric 10
Cloud Subnet ID
EC2AvailabilityZone
Layer3 no
Notes <0B>
computenet#
computenet is a non-routable subnet used for the East-West/InfiniBand configuration.
Example: computenet#
cmsh
network
add computenet
set mtu 4096
set domainname computenet.cluster
set nodebooting no
set lockdowndhcpd no
set managementallowed no
set baseaddress 100.126.0.0
set dynamicrangestart 0.0.0.0
set dynamicrangeend 0.0.0.0
set netmaskbits 16
set gateway 0.0.0.0
commit
Reference: computenet settings#
[a03-p1-head-01->network[computenet]]% show
Parameter Value
-------------------------- ---------------------
Name computenet
Private Cloud
Revision
Domain Name computenet.cluster
Type Internal
MTU 4096
Allow autosign Automatic
Write DNS zone both
Node booting no
Lock down dhcpd no
Management allowed no
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 100.126.0.0
Broadcast address 100.126.255.255
Dynamic range start 0.0.0.0
Dynamic range end 0.0.0.0
Netmask bits 16
Gateway 0.0.0.0
Gateway metric 0
Cloud Subnet ID
EC2AvailabilityZone
Layer3 no
Notes <0B>
storagenet#
storagenet is a subnet used for the converged Ethernet fabric through the second port on each BlueField-3 (BF3) NIC in a GB200 compute tray.
Example: storagenet#
cmsh
network
add storagenet
set mtu 9000
set domainname storagenet.cluster
set nodebooting no
set lockdowndhcpd no
set managementallowed no
set baseaddress 100.127.0.0
set dynamicrangestart 0.0.0.0
set dynamicrangeend 0.0.0.0
set netmaskbits 16
set gateway 0.0.0.0
commit
Reference: storagenet settings#
[a03-p1-head-01->network[storagenet]]% show
Parameter Value
-------------------------- ---------------------
Name storagenet
Private Cloud
Revision
Domain Name storagenet.cluster
Type Internal
MTU 9000
Allow autosign Automatic
Write DNS zone both
Node booting no
Lock down dhcpd no
Management allowed no
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 100.127.0.0
Broadcast address 100.127.255.255
Dynamic range start 0.0.0.0
Dynamic range end 0.0.0.0
Netmask bits 16
Gateway 0.0.0.0
Gateway metric 0
Cloud Subnet ID
EC2AvailabilityZone
Layer3 yes
Layer3 route none
Layer3 ecmp no
Layer3 split static route no
Notes <0B>
failovernet#
failovernet is a generic network set up for high availability (HA) with a direct connection between the head nodes. It is a simple network, and it is configured during the HA setup. Do not add this network manually. Its details are listed here for reference.
Example: Headnode failovernet IPs for HA setup#
#headnode 1
physical enP2s2f0 10.151.0.1 failovernet always
#headnode 2
physical enP2s2f0 10.151.0.2 failovernet always
Reference: failovernet settings#
[a03-p1-head-01->network[failovernet]]% show
Parameter Value
-------------------------- ---------------------
Name failovernet
Private Cloud
Revision
Domain Name failover.cluster
Type Internal
MTU 1500
Allow autosign Automatic
Write DNS zone both
Node booting no
Lock down dhcpd no
Management allowed no
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 10.151.0.0
Broadcast address 10.151.255.255
Dynamic range start 0.0.0.0
Dynamic range end 0.0.0.0
Netmask bits 16
Gateway 0.0.0.0
Gateway metric 0
Cloud Subnet ID
EC2AvailabilityZone
Layer3 no
Notes <0B>
globalnet#
globalnet is an automatic network and is present by default; no configuration is required. The administrator can change the network type (Type 1, Type 2, or Type 3) in the globalnet settings.
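A minimal sketch of inspecting globalnet from cmsh; the specific parameter used to change the network type is described in the BCM 11 Administrator Manual:
cmsh
network
use globalnet
show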
Reference: globalnet settings#
[a03-p1-head-01->network[globalnet]]% show
Parameter Value
-------------------------- ---------------------
Name globalnet
Private Cloud
Revision type3
Domain Name cm.cluster
Type Global
MTU 1500
Allow autosign Automatic
Write DNS zone both
Node booting no
Lock down dhcpd no
Management allowed no
Search domain index 0
Exclude from search domain no
Disable automatic exports no
Base address 0.0.0.0
Broadcast address 255.255.255.255
Dynamic range start 0.0.0.0
Dynamic range end 0.0.0.0
Netmask bits 0
Gateway 0.0.0.0
Gateway metric 0
Cloud Subnet ID
EC2AvailabilityZone
Layer3 no
Notes <0B>
bcm-netautogen#
Note
Refer to the NVIDIA Mission Control DGX SuperPOD Ethernet North-South Network Configuration Guide for more in-depth instructions on how to use the bcm-netautogen tool.
For DGX SuperPOD, bcm-netautogen has been re-architected for the GB200 generation and beyond. It generates the configuration for the top-of-rack (TOR) switches:
200G in-band network switches (SN5600).
Control plane rack OOB switches as well as per-rack OOB switches (SN2201).
BCM import:
The networks that are used and their CIDR information.
The DGX GB200 node configurations within BCM, including all network interface cards (NICs) with the appropriate assigned networks and IPs.
NVLink Switch configuration and setup.
Required information:
The three documents/files that are inputs to bcm-netautogen are:
p2p_ethernet.csv, which contains the installation site's point-to-point information for the Ethernet fabric.
GB200 rack inventory files, also in .csv format, with MAC addresses for all NICs on every device in the GB200 rack.
Site information pertaining to the prefixes and BGP ASNs (required for generating the IP plan), placed into a site-info.yaml file.
For DGX SuperPOD configurations, the project manager will receive the serial numbers for each DGX GB200 rack. Once a rack arrives on site, the deployment engineer will confirm the serial number on the rack and then assign it to a rack location based on the rack elevation.