Network Deployment#

Familiarize yourself with the DGX BasePOD Reference Architecture before proceeding.

Note: Before powering on any of the switches, ensure physical serial port connections have been established, then proceed to power on all the switches.

SN4600C – managementnet ethernet switches#

The SN4600C managementnet fabric provides connectivity for in-band management and provisioning of the nodes. The key configuration requirements are:

  • MLAG between the two SN4600C switches

  • L3 SVI/VRRP for all the pod ethernet networks

  • Each headnode / K8s node / DGX is dual homed to the SN4600C switches via bond interface

  • External connectivity to the customer network, using customer-specified routing arrangements such as BGP (Border Gateway Protocol), static routes, or another dynamic routing protocol

  • Link to the IPMI network so that BCM can access the node BMCs, either directly or indirectly via the customer network

SN4600C-1 reference configuration#

# Basic management configuration
nv set system hostname 4600C-1
#
# Create SVIs for Internal/Management Network with VRRP as FHRP
nv set bridge domain br_default vlan 102
nv set interface vlan102 type svi
nv set interface vlan102 ip vrr mac-address 00:00:5E:00:01:01
nv set interface vlan102 ip vrr address 10.184.94.1/24
nv set interface vlan102 ip address 10.184.94.2/24
nv set interface vlan102 ip vrr state up
# Repeat the same for other SVI interfaces
# Configure MLAG
# Define inter-chassis peerlink etherchannel/bond
nv set interface peerlink bond member swp63,swp64
nv set interface peerlink type peerlink
#
# Loopback for BGP/MLAG backup routing
nv set interface lo ip address 10.160.254.22
#
# Configure Peerlink L3 parameters
nv set interface peerlink.4094 base-interface peerlink
nv set interface peerlink.4094 type sub
nv set interface peerlink.4094 vlan 4094
nv set mlag backup 10.160.254.23
nv set mlag enable on
nv set mlag mac-address 44:38:39:ff:00:02
nv set mlag peer-ip linklocal
# MLAG Primary
nv set mlag priority 2048
# Example port configuration for head nodes (BCM, Kube)
# BCM Head Nodes
nv set interface bond1 bond member swp1
nv set interface bond1 description "BCM Headnode 1"
nv set interface bond1 bond mlag id 1
nv set interface bond1 bridge domain br_default access 102
nv set interface bond1 bond mlag enable on
nv set interface bond1 bond lacp-bypass on
# Repeat for other management/workloads/compute nodes
#
# Uplink to the customer network.
# Example configuration with BGP unnumbered
nv set router bgp autonomous-system 4200004001
nv set router bgp enable on
nv set router bgp router-id 10.160.254.22
nv set vrf default router bgp address-family ipv4-unicast enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute connected enable on
nv set vrf default router bgp enable on
# Uplinks via swp50
nv set vrf default router bgp neighbor swp50 type unnumbered
# Peering to MLAG peer switch
nv set vrf default router bgp neighbor peerlink.4094 remote-as internal
nv set vrf default router bgp neighbor peerlink.4094 type unnumbered

Refer to the appendix for complete switch configuration.
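The reference configuration asks you to repeat the bond1 stanza for the other management, workload, and compute nodes. The following is a minimal Python sketch of one way to generate those repeated lines. It only emits the same six `nv set` commands shown above for bond1. The port map in `NODES` (bond IDs, switch ports, descriptions) is hypothetical and must be replaced with the actual cabling plan.

```python
def bond_commands(bond_id: int, port: str, description: str, vlan: int = 102) -> list[str]:
    """Return the NVUE lines for one MLAG host bond, mirroring the bond1 example."""
    name = f"bond{bond_id}"
    return [
        f"nv set interface {name} bond member {port}",
        f'nv set interface {name} description "{description}"',
        f"nv set interface {name} bond mlag id {bond_id}",
        f"nv set interface {name} bridge domain br_default access {vlan}",
        f"nv set interface {name} bond mlag enable on",
        f"nv set interface {name} bond lacp-bypass on",
    ]


# Hypothetical port map: (bond/MLAG id, switch port, description).
NODES = [
    (1, "swp1", "BCM Headnode 1"),
    (2, "swp2", "BCM Headnode 2"),
]


def render(nodes) -> str:
    """Join the per-node command lists into one block of text."""
    lines: list[str] = []
    for bond_id, port, description in nodes:
        lines.extend(bond_commands(bond_id, port, description))
    return "\n".join(lines)
```

Calling `print(render(NODES))` produces text that can be reviewed and then pasted into the switch CLI. Because the MLAG ID must match on both switches for a given host, the same port map can be used to generate the lines for SN4600C-2.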

SN4600C-2 reference configuration#

Same as SN4600C-1, with the following changes:

# Basic management configuration
nv set system hostname 4600C-2
#
# Create SVIs - Internal/Management Network with VRRP as FHRP
nv set bridge domain br_default vlan 102
nv set interface vlan102 type svi
nv set interface vlan102 ip vrr mac-address 00:00:5E:00:01:01
nv set interface vlan102 ip vrr address 10.184.94.1/24
nv set interface vlan102 ip address 10.184.94.3/24
nv set interface vlan102 ip vrr state up
#
# Configure MLAG
# Define inter-chassis peerlink etherchannel/bond
#
# BGP/MLAG backup routing loopback
nv set interface lo ip address 10.160.254.23
#
# Configure Peerlink L3 parameters
nv set mlag backup 10.160.254.22
nv set mlag mac-address 44:38:39:ff:00:02
# MLAG Secondary
nv set mlag priority 4096
#
# Example port configuration - head nodes (BCM, Kube)
# Same as 4600C-1
#
# Uplink to the customer network.
# Same as 4600C-1

Refer to the appendix for complete switch configuration.
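The two switches share the VRR gateway address and MAC on each SVI and differ only in their own interface address (.2 on SN4600C-1, .3 on SN4600C-2 in the example). The sketch below shows one way to derive both variants for any additional VLAN. The addressing scheme (gateway on the first host address, switches on the next two) is inferred from the vlan102 example and is an assumption for other networks. Reusing the same VRR MAC on every VLAN is also an assumption, so adjust it if your design calls for per-VLAN values.

```python
import ipaddress

# VRR MAC taken from the vlan102 example; override per VLAN if required.
DEFAULT_VRR_MAC = "00:00:5E:00:01:01"


def svi_commands(vlan: int, subnet: str, switch_index: int,
                 vrr_mac: str = DEFAULT_VRR_MAC) -> list[str]:
    """Return the SVI/VRR lines for one VLAN on switch 1 or switch 2."""
    if switch_index not in (1, 2):
        raise ValueError("switch_index must be 1 or 2")
    net = ipaddress.ip_network(subnet)
    gateway = net.network_address + 1            # shared virtual address
    own = net.network_address + 1 + switch_index  # .2 on switch 1, .3 on switch 2
    svi = f"vlan{vlan}"
    return [
        f"nv set bridge domain br_default vlan {vlan}",
        f"nv set interface {svi} type svi",
        f"nv set interface {svi} ip vrr mac-address {vrr_mac}",
        f"nv set interface {svi} ip vrr address {gateway}/{net.prefixlen}",
        f"nv set interface {svi} ip address {own}/{net.prefixlen}",
        f"nv set interface {svi} ip vrr state up",
    ]
```

For example, `svi_commands(102, "10.184.94.0/24", 2)` reproduces the vlan102 lines listed for SN4600C-2.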

You can verify the MLAG status using the following command:

root@mgmt-net-leaf-1:mgmt:/home/cumulus# clagctl
The peer is alive
     Our Priority, ID, and Role: 2048 9c:05:91:dd:cc:28 primary
    Peer Priority, ID, and Role: 2048 9c:05:91:f1:73:28 secondary
          Peer Interface and IP: peerlink.4094 fe80::9e05:91ff:fef1:7328 (linklocal)
                      Backup IP: 10.160.254.23 vrf mgmt (inactive)
                     System MAC: 44:38:39:ff:0a:00

CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
           bond1   -                  1         -                      -
          bond10   -                  10        -                      -
          bond11   -                  11        -                      -
          bond12   -                  12        -                      -
          bond13   -                  13        -                      -
          bond14   -                  14        -                      -
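When many bonds are configured, the CLAG Interfaces table can be checked with a script. The Python sketch below lists the bonds whose Peer Interface column shows `-`, which generally means the peer switch does not currently have a matching MLAG bond up for that ID. It assumes the whitespace-separated five-column layout shown above and that the captured `clagctl` text is passed in as a string. How the text is collected from the switch is left open.

```python
def single_attached_bonds(clagctl_output: str) -> list[str]:
    """Return bonds from the CLAG Interfaces table whose peer interface is '-'."""
    bonds: list[str] = []
    in_table = False
    for line in clagctl_output.splitlines():
        text = line.strip()
        if text.startswith("----"):
            in_table = True   # the ruler line marks the start of the rows
            continue
        if not in_table:
            continue
        if not text:
            in_table = False  # a blank line ends the table
            continue
        fields = text.split()
        # fields: our interface, peer interface, CLAG id, conflicts, proto-down reason
        if len(fields) >= 2 and fields[1] == "-":
            bonds.append(fields[0])
    return bonds
```

An empty result suggests that every listed bond is seen on both switches. A non-empty result is a prompt to check the host-side links and the matching bond on the peer switch.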

For troubleshooting, you can use the consistency check command. Here is an example output from a working MLAG pair.

cumulus@mgmt-net-leaf-2:mgmt:~$ nv show mlag consistency-checker global
Parameter               LocalValue                 PeerValue                  Conflict  Summary
----------------------  -------------------------  -------------------------  --------  -------
anycast-ip              -                          -                          -
bridge-priority         32768                      32768                      -
bridge-stp-mode         rstp                       rstp                       -
bridge-stp-state        on                         on                         -
bridge-type             vlan-aware                 vlan-aware                 -
clag-pkg-version        1.6.0-cl5.11.0u2           1.6.0-cl5.11.0u2           -
clag-protocol-version   1.7.0                      1.7.0                      -
peer-ip                 fe80::9e05:91ff:fedd:cc28  fe80::9e05:91ff:fedd:cc28  -
peerlink-bridge-member  Yes                        Yes                        -
peerlink-mtu            9216                       9216                       -
peerlink-native-vlan    1                          1                          -
peerlink-vlans          1, 100->102                1, 100->102                -
redirect2-enable        yes                        yes                        -
system-mac              44:38:39:ff:0a:00          44:38:39:ff:0a:00          -
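The consistency-checker output is a fixed-width table, so it can be scanned automatically. The following Python sketch parses the table using the column positions of the dashed ruler line (needed because values such as `1, 100->102` contain spaces) and returns the parameters whose Conflict column is anything other than `-`. The exact text the switch prints in that column when a conflict exists is not shown in this guide, so treating any non-dash value as a conflict is an assumption.

```python
import re


def parse_table(output: str) -> list[dict[str, str]]:
    """Parse a fixed-width table that has a header line followed by a dashed ruler."""
    lines = output.splitlines()
    ruler_idx = None
    for i, line in enumerate(lines):
        text = line.strip()
        if text and set(text) <= {"-", " "} and "--" in text:
            ruler_idx = i
            break
    if ruler_idx is None or ruler_idx == 0:
        raise ValueError("no header/ruler pair found in output")
    # Each run of dashes marks where a column starts.
    starts = [m.start() for m in re.finditer(r"-+", lines[ruler_idx])]
    bounds = list(zip(starts, starts[1:] + [None]))
    header = [lines[ruler_idx - 1][a:b].strip() for a, b in bounds]
    rows = []
    for line in lines[ruler_idx + 1:]:
        if line.strip():
            rows.append(dict(zip(header, (line[a:b].strip() for a, b in bounds))))
    return rows


def conflicting_parameters(rows: list[dict[str, str]]) -> list[str]:
    """Return parameter names whose Conflict column is not '-' or empty."""
    return [r["Parameter"] for r in rows if r.get("Conflict", "-") not in ("", "-")]
```

For the working pair shown above, `conflicting_parameters(parse_table(text))` would be expected to return an empty list.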

SN2201 – IPMI Switch for Out-of-Band Management#

All BMCs are in the same oobmanagementnet subnet, so configure all switch ports connected to the BMCs in the same VLAN. The oobmanagementnet must be reachable from the managementnet so that the BCM head nodes can control the BMCs. In this example, the oobmanagementnet is routed via the managementnet SN4600C switches. Adding an additional uplink to the customer’s OOB network is recommended.

Example Configuration for the SN2201 switch.

nv set system hostname IPMI-SW
#<Basic management configuration>
#
# VLAN - BMC ports. Adjust according to the customer specification
nv set bridge domain br_default vlan 101
#
# Enable the BMC Ports to the Access VLAN
#
nv set interface swp1-48 bridge domain br_default
nv set bridge domain br_default untagged 1
nv set interface swp1-48
nv set interface swp1-48 link state up
nv set interface swp1-48 description "BMC Ports"
nv set interface swp1-48 bridge domain br_default access 101
#
# Uplink to customer OOB/IPMI Network
# In this example the uplink is a layer 2 trunk with etherchannel/bond.
# Adjust according to the customer specification
nv set interface swp49-50 link state up
nv set interface bond1 bond member swp49,swp50
nv set interface bond1 bridge domain br_default untagged 1
nv set interface bond1 bridge domain br_default vlan all

Refer to the appendix for complete switch configuration.

Reference: Cumulus Network configuration Guide.

You can also use NVIDIA Air to simulate and model the network configuration.

Once the SN2201 switches have been successfully configured, verify that the out-of-band management interfaces of all devices are reachable from the network (that is, make sure you can access each BMC/iLO/iDRAC).
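The reachability check can be scripted from a host on the managementnet. The Python sketch below attempts a TCP connection to each BMC and reports the addresses that do not answer. It assumes the BMCs expose their web interface on TCP port 443, which is common but not guaranteed, so change the port if your BMCs differ. The address list you pass in is site-specific and is not defined by this guide.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def unreachable_bmcs(addresses, port: int = 443, timeout: float = 2.0) -> list[str]:
    """Return the subset of addresses that did not accept a TCP connection."""
    return [a for a in addresses if not tcp_reachable(a, port, timeout)]
```

A successful TCP connection only shows that the BMC is reachable at the network level. Logging in to the BMC/iLO/iDRAC interface is still needed to confirm credentials and firmware state.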

Computenet Configuration#

Before powering on any of the QM9700 switches in the Compute or Storage switch stacks, ensure that serial port connectivity can be established (either via a remote serial concentrator or by physically connecting to the serial port of the switch), then power on all Compute and Storage switches.

QM9700 IB Switches#

We recommend configuring the InfiniBand switches with subnet manager HA enabled.

Example configuration

QM-9700-1#

ib sm
ib sm virt enable
ib smnode 9700-1 create
ib smnode 9700-1 enable
ib smnode 9700-1 sm-priority 15
ib ha infiniband-default ip <HA VIP> <mask>

QM-9700-2#

ib sm virt enable
ib smnode 9700-1 create
ib smnode 9700-1 enable
ib smnode 9700-1 sm-priority 15

Verify the IB SM HA status using the following command:

QM9700-1[infiniband-default: master] # show ib smnodes
HA state of switch infiniband-default:
IB Subnet HA name: infiniband-default
HA IP address    : 10.185.230.247/22
Active HA nodes  : 2

HA node local information:
  Name       : 9700-2 (active)
  SM-HA state: standby
  SM Running : stopped
  SM Enabled : disabled
  SM Priority: 0
  IP         : 10.185.230.243

HA node local information:
  Name       : 9700-1 (active)  <--- (local node)
  SM-HA state: master
  SM Running : running
  SM Enabled : enabled - master
  SM Priority: 15
  IP         : 10.185.231.43

Refer to the Appendix for complete switch configuration.
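The `show ib smnodes` output can be checked for the expected HA state: two nodes listed, exactly one of them master, and the subnet manager running on the master. The Python sketch below parses the per-node blocks shown above into dictionaries and returns a list of problems. The field names ("SM-HA state", "SM Running") are taken from the sample output, and the pass criteria are an interpretation of that sample rather than a vendor-defined health check.

```python
def parse_smnodes(output: str) -> list[dict[str, str]]:
    """Split 'show ib smnodes' output into one dict per 'HA node' block."""
    nodes: list[dict[str, str]] = []
    current = None
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("HA node local information"):
            current = {}
            nodes.append(current)
            continue
        if current is None or ":" not in text:
            continue  # skip the summary header and blank lines
        key, _, value = text.partition(":")
        current[key.strip()] = value.strip()
    return nodes


def ha_problems(nodes: list[dict[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    if len(nodes) < 2:
        problems.append(f"expected 2 HA nodes, found {len(nodes)}")
    masters = [n for n in nodes if n.get("SM-HA state") == "master"]
    if len(masters) != 1:
        problems.append(f"expected exactly one master, found {len(masters)}")
    for node in masters:
        if node.get("SM Running") != "running":
            problems.append(f"SM is not running on master {node.get('Name', '?')}")
    return problems
```

With the sample output above, `ha_problems(parse_smnodes(text))` returns an empty list: 9700-1 is master with the SM running and 9700-2 is standby.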

Reference: NVIDIA QM9700 InfiniBand Switch user manual

InfiniBand/Ethernet Storage Fabric Specific Configurations

A DGX BasePOD typically also includes dedicated storage, but the configuration is outside the scope of this document. Contact the vendor of the storage solution being used for instructions on configuring the high-performance storage portions of a DGX BasePOD.