Automated Rack Import Process#
The rack consists of eighteen GB200 compute trays, nine NVLink Switch trays, and eight power shelves. Most of the configuration of these components is handled by the rack import process, which is part of the bcm-netautogen tool and the bcm-post-install automation.
Get the rack inventory file from the factory once the rack passes L11 testing.
Use the bcm-netautogen tool, with the rack inventory file, the Point to Point (P2P) file, and a siteinformation.yaml file as inputs, to create a .json file for each GB200 compute tray, NVLink Switch tray, and power shelf. IPs are assigned from the available networking subnets as defined by the customer.
Run the bcm-post-install automation to import the .json files into BCM.
For more details about the inputs to the bcm-netautogen tool and the bcm-post-install automation, see the NVIDIA Mission Control DGX SuperPOD Ethernet North-South Network Configuration Guide.
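The second step above assigns IPs from customer-defined subnets. As an illustration of that idea only, the following minimal sketch hands out consecutive host addresses from one subnet. It is not the algorithm bcm-netautogen uses; the function name `assign_ips`, the `reserved` parameter, and the `192.0.2.0/27` documentation subnet are all assumptions made for this example.

```python
import ipaddress
from typing import Dict, List


def assign_ips(subnet: str, hostnames: List[str], reserved: int = 1) -> Dict[str, str]:
    """Map each hostname to the next unused host address in `subnet`.

    The first `reserved` host addresses (for example a gateway) are skipped.
    Hypothetical helper: the real tool may allocate addresses differently.
    """
    network = ipaddress.ip_network(subnet)
    usable = list(network.hosts())[reserved:]
    if len(hostnames) > len(usable):
        raise ValueError(
            f"{subnet} has {len(usable)} free addresses, need {len(hostnames)}"
        )
    return {name: str(ip) for name, ip in zip(hostnames, usable)}


if __name__ == "__main__":
    # Eighteen compute tray names, following the file names used in this guide.
    trays = [f"a05-p1-dgx-01-c{n:02d}" for n in range(1, 19)]
    for name, ip in assign_ips("192.0.2.0/27", trays).items():
        print(name, ip)
```

Failing early when the subnet is too small mirrors the kind of check worth running before any .json files are generated.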
GB200 and NVLink Switch Host Naming Conventions#
bcm-netautogen follows the naming convention schema in the following table.
Term | Definition |
---|---|
GPU rack (incremental) | |
<RACK>-<RU>-P[1-16]-<ROLE>-0[1-8]-C0[1-18] | RACKNAME, POD#, ROLE: DGX, C#: ComputeTray. Example: A01-P1-DGX-01-C01 .. A01-P1-DGX-01-C18, B09-P1-DGX-08-C01 .. B09-P1-DGX-02-C18 |
NVLink Switch (incremental) | |
<RACK>-<RU>-P[1-16]-<switch_role>-0[1-9] | Pod#, SwitchRole: nvsw, Rack# [1-8] (within a pod, there are 8 racks), NVLink Switch incremental [1-9]. Example: A01-P1-NVSW-01 .. A01-P1-NVSW-09, B01-P1-NVSW-01 .. B01-P1-NVSW-09 |
Storage rack (incremental) | |
<RACK>-<RU>-P[1-16]-<storage_vendor>-0[1-n] | Storage Appliance |
Ethernet switch (incremental) | |
<RACK>-<RU>-P[1-16]-<switch_role>-0[1-n] | Pod#, SwitchRole: TOR, IPMI, SPINE, SSPINE |
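A naming convention like this can be checked mechanically before import. The sketch below parses compute tray and NVLink Switch hostnames and enforces the numeric ranges given in the table (pod 1-16, rack index 1-8, tray 1-18, switch 1-9). It follows the example hostnames, which omit the `<RU>` field shown in the pattern; the regular expressions and the `parse_hostname` helper are assumptions made for illustration, not part of bcm-netautogen.

```python
import re
from typing import Dict, Optional, Union

# Patterns inferred from the example hostnames (case-insensitive).
_COMPUTE = re.compile(
    r"^(?P<rack>[A-Z]\d{2})-P(?P<pod>\d{1,2})-DGX-(?P<rack_index>\d{2})-C(?P<tray>\d{2})$",
    re.IGNORECASE,
)
_NVSW = re.compile(
    r"^(?P<rack>[A-Z]\d{2})-P(?P<pod>\d{1,2})-NVSW-(?P<switch>\d{2})$",
    re.IGNORECASE,
)
# Upper bounds taken from the naming table; every field starts at 1.
_LIMITS = {"pod": 16, "rack_index": 8, "tray": 18, "switch": 9}


def parse_hostname(name: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the parsed fields of a hostname, or None if it does not conform."""
    for kind, pattern in (("compute", _COMPUTE), ("nvlink_switch", _NVSW)):
        match = pattern.match(name)
        if match is None:
            continue
        fields: Dict[str, Union[str, int]] = {"kind": kind, "rack": match["rack"].upper()}
        for key, value in match.groupdict().items():
            if key == "rack":
                continue
            number = int(value)
            if not 1 <= number <= _LIMITS[key]:
                return None  # numeric field outside the documented range
            fields[key] = number
        return fields
    return None


if __name__ == "__main__":
    for host in ("A01-P1-DGX-01-C18", "a05-p1-nvsw-09", "A01-P1-DGX-01-C19"):
        print(host, parse_hostname(host))
```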
Rack Inventory File#
After each rack passes L11 testing at the factory, a rack inventory file is generated and sent to the customer or the relevant deployment contact. After import, all device information is available in BCM, including the rack data used by the rack management feature, such as each device's RU location in the rack.
Rack Entry .json Creation#
From the rack inventory file, the bcm-netautogen script generates a .json file for each rack component. The script pulls the relevant information, such as NIC MAC addresses, from the inventory and assigns the appropriate IP to each interface. The following is an example of all the .json files that bcm-netautogen generates for import:
Example: Rack .json files for a DGX GB200 rack
a05-p1-dgx-01-c01.json
a05-p1-dgx-01-c02.json
a05-p1-dgx-01-c03.json
a05-p1-dgx-01-c04.json
a05-p1-dgx-01-c05.json
a05-p1-dgx-01-c06.json
a05-p1-dgx-01-c07.json
a05-p1-dgx-01-c08.json
a05-p1-dgx-01-c09.json
a05-p1-dgx-01-c10.json
a05-p1-dgx-01-c11.json
a05-p1-dgx-01-c12.json
a05-p1-dgx-01-c13.json
a05-p1-dgx-01-c14.json
a05-p1-dgx-01-c15.json
a05-p1-dgx-01-c16.json
a05-p1-dgx-01-c17.json
a05-p1-dgx-01-c18.json
a05-p1-nvsw-01.json
a05-p1-nvsw-02.json
a05-p1-nvsw-03.json
a05-p1-nvsw-04.json
a05-p1-nvsw-05.json
a05-p1-nvsw-06.json
a05-p1-nvsw-07.json
a05-p1-nvsw-08.json
a05-p1-nvsw-09.json
a05-p1-pwr-01.json
a05-p1-pwr-02.json
a05-p1-pwr-03.json
a05-p1-pwr-04.json
a05-p1-pwr-05.json
a05-p1-pwr-06.json
a05-p1-pwr-07.json
a05-p1-pwr-08.json
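Because a full rack always produces the same set of 35 files (18 compute trays, 9 NVLink Switch trays, 8 power shelves), the expected list can be generated and compared against what is actually present before importing. The following is a minimal sketch; the helper names are hypothetical, and the `dgx`, `nvsw`, and `pwr` name parts are taken from the file listing above.

```python
from typing import Iterable, List


def expected_rack_files(rack: str, pod: int, rack_index: int,
                        trays: int = 18, switches: int = 9,
                        shelves: int = 8) -> List[str]:
    """Build the .json file names expected for one rack."""
    prefix = f"{rack.lower()}-p{pod}"
    names = [f"{prefix}-dgx-{rack_index:02d}-c{n:02d}" for n in range(1, trays + 1)]
    names += [f"{prefix}-nvsw-{n:02d}" for n in range(1, switches + 1)]
    names += [f"{prefix}-pwr-{n:02d}" for n in range(1, shelves + 1)]
    return [name + ".json" for name in names]


def missing_files(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Return the expected file names that are not in `present`."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


if __name__ == "__main__":
    expected = expected_rack_files("A05", pod=1, rack_index=1)
    print(len(expected), "files expected")
    # Pretend one power shelf file was never generated.
    print(missing_files(expected, expected[:-1]))
```

In practice the `present` argument could come from a directory listing, for example `[p.name for p in pathlib.Path(json_dir).glob("*.json")]`.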
Compute tray .json#
The following example provides details for a compute tray .json file that can be modified to import node information into BCM11.
Example: Compute tray .json
{
"baseType": "Device",
"biosSetup": null,
"bmcSettings": {
"baseType": "BMCSettings",
"firmwareManageMode": "GB200"
},
"bootLoader": "CATEGORY",
"bootLoaderProtocol": "CATEGORY",
"category": "dgx-gb200",
"childType": "PhysicalNode",
"cmdaemonUrl": "https://7.241.21.141:8081",
"creationTime": 1744054247,
"extra_values": {
"Leak Detection": "GB200"
},
"fips": "CATEGORY",
"hostname": "a08-p1-dgx-04-c01",
"interfaces": [
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkBondInterface",
"interfaces": [
"enP22p3s0f0np0",
"enP6p3s0f0np0"
],
"ip": "7.241.21.141",
"mode": 4,
"name": "bond0",
"network": "dgxnet2",
"onNetworkPriority": 70,
"options": "miimon=100",
"startIf": "ALWAYS"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"name": "ibp3s0",
"onNetworkPriority": 60,
"startIf": "ALWAYS"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"name": "ibP2p3s0",
"onNetworkPriority": 60,
"startIf": "ALWAYS"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"name": "ibP16p3s0",
"onNetworkPriority": 60,
"startIf": "ALWAYS"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"name": "ibP18p3s0",
"onNetworkPriority": 60,
"startIf": "ALWAYS"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "7.241.7.10",
"name": "eth3",
"network": "ipminet2",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"mac": "E0:9D:73:E8:A0:69"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "7.241.7.11",
"name": "eth4",
"network": "ipminet2",
"onNetworkPriority": 10,
"startIf": "ALWAYS",
"mac": "E0:9D:73:E8:9F:F7"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkBmcInterface",
"ip": "7.241.7.12",
"name": "rf0",
"network": "ipminet2",
"onNetworkPriority": 10,
"startIf": "ALWAYS",
"mac": "3C:6D:66:15:B7:FB"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "7.241.20.141",
"name": "enP5p9s0",
"network": "internalnet2",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"mac": "D0:F4:05:5B:E4:28"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"name": "enP6p3s0f0np0",
"onNetworkPriority": 65,
"startIf": "ALWAYS",
"mac": "E0:9D:73:E8:A0:44"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"name": "enP22p3s0f0np0",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"mac": "E0:9D:73:E8:9F:D2"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "100.127.1.109",
"name": "enP6p3s0f1np1",
"network": "storagenet",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"mac": "E0:9D:73:E8:A0:45"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "100.127.129.109",
"name": "enP22p3s0f1np1",
"network": "storagenet",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"mac": "E0:9D:73:E8:9F:D3"
}
],
"mac": "E0:9D:73:E8:9F:D2",
"managementNetwork": "dgxnet2",
"partition": "base",
"provisioningInterface": "bond0",
"provisioningTransport": "RSYNCDAEMON",
"rackPosition": {
"baseType": "RackPosition",
"position": 11,
"rack": "A08"
},
"tag": "1830625000348"
}
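When a compute tray .json file is edited by hand, a few internal references are easy to break: the provisioning interface must exist, every bond member must be defined as an interface, and no IP should appear twice. The sketch below checks those three properties on a trimmed copy of the example above. The checks are a suggestion based on the structure of the example, not validation that BCM is documented to perform, and `check_compute_tray` is a hypothetical helper.

```python
from typing import Dict, List

# Trimmed version of the compute tray example (only the fields the checks use).
SAMPLE_TRAY = {
    "hostname": "a08-p1-dgx-04-c01",
    "provisioningInterface": "bond0",
    "interfaces": [
        {"childType": "NetworkBondInterface", "name": "bond0",
         "ip": "7.241.21.141", "interfaces": ["enP22p3s0f0np0", "enP6p3s0f0np0"]},
        {"childType": "NetworkPhysicalInterface", "name": "enP6p3s0f0np0"},
        {"childType": "NetworkPhysicalInterface", "name": "enP22p3s0f0np0"},
        {"childType": "NetworkBmcInterface", "name": "rf0", "ip": "7.241.7.12"},
    ],
}


def check_compute_tray(device: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    interfaces = device.get("interfaces", [])
    names = {iface["name"] for iface in interfaces}

    prov = device.get("provisioningInterface")
    if prov not in names:
        problems.append(f"provisioningInterface {prov!r} is not a defined interface")

    seen_ips: Dict[str, str] = {}
    for iface in interfaces:
        # Every bond member must itself be listed as an interface.
        if iface.get("childType") == "NetworkBondInterface":
            for member in iface.get("interfaces", []):
                if member not in names:
                    problems.append(f"bond {iface['name']} member {member} is not defined")
        # The same IP must not be assigned to two interfaces.
        ip = iface.get("ip")
        if ip is not None:
            if ip in seen_ips:
                problems.append(f"IP {ip} used by both {seen_ips[ip]} and {iface['name']}")
            else:
                seen_ips[ip] = iface["name"]
    return problems


if __name__ == "__main__":
    print(check_compute_tray(SAMPLE_TRAY))
```

For a real file, load it first with `json.load(open(path))` and pass the resulting dictionary to the function.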
NVLink Switch .json#
Example: NVLink Switch tray .json
{
"accessSettings": {
"baseType": "AccessSettings",
"username": "admin",
"password": "0penBmc"
},
"baseType": "Device",
"bmcSettings": {
"baseType": "BMCSettings",
"firmwareManageMode": "gb200sw",
"privilege": "ADMINISTRATOR",
"userID": 1,
"userName": "root",
"password": "0penBmc"
},
"childType": "Switch",
"creationTime": 1744054247,
"disablePortDetection": true,
"disableSNMP": true,
"hasClientDaemon": true,
"hostname": "A08-P1-NVSW-01",
"interfaces": [
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkBmcInterface",
"ip": "7.241.7.111",
"mac": "2C:5E:AB:D0:91:BE",
"name": "rf0",
"network": "ipminet2",
"onNetworkPriority": 10,
"startIf": "ALWAYS",
"switchPort": "A08-P1-OOB-02:swp9"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "7.241.7.91",
"name": "eth0",
"network": "ipminet2",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"switchPort": "A08-P1-OOB-02:swp28"
},
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "7.241.7.101",
"name": "eth1",
"network": "ipminet2",
"onNetworkPriority": 60,
"startIf": "ALWAYS",
"switchPort": "A08-P1-OOB-02:swp37"
}
],
"kind": "NVLINK",
"mac": "E0:9D:73:F0:3D:36",
"managementNetwork": "ipminet2",
"nvConfiguration": null,
"nvConfigurationMode": "AUTO",
"partition": "base",
"powerControl": "rf0",
"rackPosition": {
"baseType": "RackPosition",
"position": 19,
"rack": "A08"
},
"ztpSettings": {
"baseType": "ZTPSettings",
"enableAPI": true,
"switchImage": "nvos-amd64-25.02.1782.bin",
"ztpJsonTemplate": "nvlink-nvos.json",
"ztpScriptTemplate": "nvos-ztp.sh"
},
"tag": "MT2508602958"
}
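Each interface in the NVLink Switch example carries a `switchPort` value such as `A08-P1-OOB-02:swp9`. Assuming that value always has the form `<switch hostname>:<port>`, which is inferred from this one example, a short sketch can turn a device entry into a cabling map for cross-checking against the P2P file. The `uplink_map` helper is hypothetical.

```python
from typing import Dict, Tuple

# Interfaces copied from the NVLink Switch example (other fields omitted).
SAMPLE_SWITCH = {
    "hostname": "A08-P1-NVSW-01",
    "interfaces": [
        {"name": "rf0", "switchPort": "A08-P1-OOB-02:swp9"},
        {"name": "eth0", "switchPort": "A08-P1-OOB-02:swp28"},
        {"name": "eth1", "switchPort": "A08-P1-OOB-02:swp37"},
    ],
}


def uplink_map(device: Dict) -> Dict[str, Tuple[str, str]]:
    """Map each interface name to (upstream switch, upstream port).

    Interfaces without a `switchPort` value are skipped.
    """
    result: Dict[str, Tuple[str, str]] = {}
    for iface in device.get("interfaces", []):
        switch_port = iface.get("switchPort")
        if not switch_port:
            continue
        # Assumed format: "<switch hostname>:<port>".
        switch, _, port = switch_port.partition(":")
        result[iface["name"]] = (switch, port)
    return result


if __name__ == "__main__":
    for name, (switch, port) in uplink_map(SAMPLE_SWITCH).items():
        print(f"{name} -> {switch} port {port}")
```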
Power shelf .json#
Example: Power shelf .json
{
"accessSettings": {
"baseType": "AccessSettings",
"username": "admin",
"password": "0penBmc"
},
"baseType": "Device",
"childType": "PowerShelf",
"creationTime": 1744054247,
"hostname": "a08-p1-pwr-01",
"interfaces": [
{
"baseType": "NetworkInterface",
"bringupduringinstall": "NO",
"childType": "NetworkPhysicalInterface",
"ip": "7.241.6.141",
"mac": "00:18:23:0C:40:7D",
"name": "eth0",
"network": "ipminet1",
"onNetworkPriority": 60,
"startIf": "ALWAYS"
}
],
"mac": "00:18:23:0C:40:7D",
"managementNetwork": "ipminet1",
"partition": "base",
"pmcSettings": {
"baseType": "PMCSettings",
"userName": "root",
"password": "0penBmc"
},
"rackPosition": {
"baseType": "RackPosition",
"rack": "A08",
"position": 6
}
}
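All three device types share the same `rackPosition` block, which is what gives BCM each device's RU location. As a quick pre-import sanity check, the sketch below groups device entries by rack and sorts them by position so the result can be compared with the physical rack elevation. The `rack_layout` helper is hypothetical; the sample values are the hostnames and positions from the three examples above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hostname and rackPosition taken from the three example files.
SAMPLE_DEVICES = [
    {"hostname": "a08-p1-dgx-04-c01", "rackPosition": {"rack": "A08", "position": 11}},
    {"hostname": "A08-P1-NVSW-01", "rackPosition": {"rack": "A08", "position": 19}},
    {"hostname": "a08-p1-pwr-01", "rackPosition": {"rack": "A08", "position": 6}},
]


def rack_layout(devices: Iterable[Dict]) -> Dict[str, List[Tuple[int, str]]]:
    """Group devices by rack and list them as (position, hostname), lowest RU first."""
    layout: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for device in devices:
        position = device.get("rackPosition")
        if not position:
            continue  # device has no rack data
        layout[position["rack"]].append((position["position"], device["hostname"]))
    return {rack: sorted(entries) for rack, entries in layout.items()}


if __name__ == "__main__":
    for rack, entries in rack_layout(SAMPLE_DEVICES).items():
        for position, hostname in entries:
            print(f"{rack} RU{position:02d} {hostname}")
```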
Import Compute Nodes, NVLink Switches, and Power Shelves#
Each .json file can be imported individually into BCM.
cmsh -c "main;import <.json file location>/<rack number>/<.json file name>"
Example: .json import process
% cmsh
% main
% import /root/<rack number>/a05-p1-dgx-01-c01.json
% import /root/<rack number>/a05-p1-dgx-01-c02.json
% # ...repeat the import for each remaining device .json file in the rack,
% # including all power shelves, NVLink Switch trays, etc.
Note
No separate commit is necessary after imports; the devices are committed automatically as part of the import command.
Caution
Importing a .json of the same name will overwrite the current settings without warning or prompting.
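With 35 files per rack, typing each import by hand is error-prone. The sketch below builds one `cmsh -c "main;import <path>"` invocation per file, in the non-interactive form shown earlier in this section, and prints them as a dry run. The `build_import_commands` helper, its `skip` parameter, and the `/root/a05` directory are assumptions for illustration. Because an import overwrites existing settings without prompting, the sketch deliberately does not execute anything by default.

```python
import shlex
from pathlib import PurePosixPath
from typing import Iterable, List


def build_import_commands(json_dir: str, filenames: Iterable[str],
                          skip: Iterable[str] = ()) -> List[List[str]]:
    """Return one cmsh argument list per .json file, sorted by file name.

    Names listed in `skip` are left out, for example devices that were
    already imported and must not be overwritten.
    """
    skipped = set(skip)
    commands = []
    for name in sorted(filenames):
        if name in skipped:
            continue
        path = PurePosixPath(json_dir) / name
        commands.append(["cmsh", "-c", f"main;import {path}"])
    return commands


if __name__ == "__main__":
    files = ["a05-p1-dgx-01-c02.json", "a05-p1-dgx-01-c01.json", "a05-p1-pwr-01.json"]
    for command in build_import_commands("/root/a05", files):
        # Dry run: print the command instead of running it.
        print(shlex.join(command))
        # To execute on the head node instead (review the list first):
        # subprocess.run(command, check=True)
```

This assumes file paths contain no spaces or semicolons, since the path is embedded directly in the cmsh command string.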
After the rack import is complete and the devices are provisioned, they can be controlled at any level, from an individual node up to the whole rack.
Example: GB200 compute tray node information in BCM
[a03-p1-head-01->device[b05-p1-dgx-05-c01]]% show
Parameter Value
-------------------------------------- ------------------------------------------------------------
Leak Detection GB200
Hostname b05-p1-dgx-05-c01
IP 7.241.18.139
Network dgxnet2
Revision
Type PhysicalNode
Mac EE:67:84:CB:A1:6B
Use exclusively for (category:dgx-gb200)
Category dgx-gb200
Activation Mon, 10 Feb 2025 14:33:26 PST
Rack B05:11
Chassis < not set >
Access Settings <submode>
Roles <0 in submode>
Software image dgxos7-570.82-image (category:dgx-gb200)
Node installer disk no
Install boot record no
Install mode
Next install mode
Time zone America/Los_Angeles (partition:base)
Management network dgxnet2
Data node no
Disk setup <1818 bytes> (category:dgx-gb200)
Hardware RAID configuration <0B>
Initialize script <0B>
Finalize script <0B>
BIOS setup < not set >
Allow networking restart no
PXE Label
Block devices cleared on next boot
Provisioning interface bond0
Provisioning Transport RSYNCDAEMON
Exclude list manipulate script <0B>
Exclude list full install <0B>
Exclude list sync install <0B>
Exclude list update <0B>
Exclude list grab <0B>
Exclude list grab new <0B>
Power control rf0
Custom power script
Custom power script argument
Power distribution units
IO scheduler
Kernel version 6.8.0-1018-nvidia-64k (software image:dgxos7-570.82-image)
Kernel parameters rd.driver.blacklist=nouveau systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller (software image:dgxos7-570.82-image)
Kernel output console tty0 (software image:dgxos7-570.82-image)
Boot loader grub (dgx-gb200)
Boot loader protocol HTTP (dgx-gb200)
Boot loader file
Kernel modules 51 (software image:dgxos7-570.82-image)
FIPS no (dgx-gb200)
Template node no
From template node
Default gateway 7.241.18.129 (network: dgxnet2)
Default gateway metric 0
Switch ports
Interfaces <13 in submode>
Static routes <0 in submode>
GPU Settings <0 in submode>
BMC Settings <submode>
SELinux Settings <submode>
Filesystem mounts <0 in submode>
Filesystem exports <0 in submode>
Services <0 in submode>
Userdefined1
Userdefined2
User defined resources
Supports GNSS no
Custom ping script
Custom ping script argument
Custom remote console script
Custom remote console script argument
Partition base
Version config files no
Part number
Serial number
Notes <0B>
Prometheus metric forwarders <0 in submode>
Example: GB200 compute tray interfaces
[a03-p1-head-01->device[b05-p1-dgx-05-c01]->interfaces]% list
Type Network device Name IP Network Start if
name
--------- ------------------ -------------- --------------- ------------ --------
bmc rf0 - 7.241.4.13 ipminet3 always
bond bond0 [prov] - 7.241.18.139 dgxnet2 always
physical enP22p3s0f0np0 (bond0) 0.0.0.0 - always
physical enP22p3s0f1np1 - 100.127.130.1 storagenet always
physical enP5p9s0 - 7.241.19.83 internalnet2 always
physical enP6p3s0f0np0 (bond0) 0.0.0.0 - always
physical enP6p3s0f1np1 - 100.127.2.1 storagenet always
physical enP6p9s0 - 7.241.4.11 ipminet3 always
physical eth4 - 7.241.4.12 ipminet3 always
physical ibP16p3s0 - 0.0.0.0 - always
physical ibP18p3s0 - 0.0.0.0 - always
physical ibP2p3s0 - 0.0.0.0 - always
physical ibp3s0 - 0.0.0.0 - always
Example: NVLink Switch information
[a03-p1-head-01->device[a05-p1-nvsw-01]]% show
Parameter Value
-------------------------------- ----------------------------------------
Hostname a05-p1-nvsw-01
IP 7.241.3.1
Network ipminet2
Revision
Type Switch
Mac E0:9D:73:3F:E0:50
Model
Ports 0
Kind nvlink
Control script
Control script timeout 5
SNMP Settings <submode>
Lowest port 1
Uplinks
Disable port detection yes
Disable port mapping no
Activation Sun, 23 Feb 2025 12:55:30 PST
Rack A05:19
Chassis < not set >
Access Settings <submode>
Priority 0
VLAN cache time 5m
Has client daemon yes
ZTP Settings <submode>
Subnet manager no
Disable SNMP yes
GUID 00000000-0000-0000-0000-000000000000
Services <0 in submode>
NV configuration mode AUTO
Members
Management network ipminet2
Power control rf0
Custom power script
Custom power script argument
Power distribution units
Default gateway metric 0
Switch ports
Interfaces <3 in submode>
BMC Settings <submode>
Userdefined1
Userdefined2
User defined resources
Supports GNSS no
Custom ping script
Custom ping script argument
Partition base
Part number
Serial number
Notes <0B>
Prometheus metric forwarders <0 in submode>
Example: NVLink Switch tray interfaces
[head-01->device[a05-p1-nvsw-01]->interfaces]% list
Type Network device name IP Network Start if
--------- -------------------- -------------- ------------- --------
bmc rf0 7.241.3.21 ipminet2 always
physical eth0 7.241.3.1 ipminet2 always
physical eth1 7.241.3.11 ipminet2 always