Understanding Your Grace-Blackwell Systems#
GB200 NVL is a rack-scale solution in which the GPUs are connected using NVLink.
Architecture#
The GB200 architecture supports 36x2 and 72x1 configurations.
The 36x2 Configuration#
Here is the NVIDIA reference configuration for 36x2:
- MGX Rack
- NVL36x2 GPU Racks:
  - 9x 2RU Compute trays.
  - 9x 1RU NVL36 NVLink Switch trays.
  - Cable cartridges and manifolds in the RU pitch.
  - Hybrid cooling trays:
    - Grace CPU, Blackwell GPU, CX7, and NVLink Switch ASICs + OSFPs are liquid cooled.
    - The rest of the components are air cooled.
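As a quick sanity check on these numbers, the sketch below recomputes the GPU counts. The 4-GPUs-per-tray figure is an assumption borrowed from the 72x1 description that follows, not something stated in the list above.

```python
# Sanity check on the NVL36x2 reference configuration above.
# Assumption (not stated in this list): each 2RU compute tray carries
# 4 Blackwell GPUs, mirroring the per-tray count given for 72x1 below.
RACKS = 2
COMPUTE_TRAYS_PER_RACK = 9
GPUS_PER_COMPUTE_TRAY = 4  # assumed

gpus_per_rack = COMPUTE_TRAYS_PER_RACK * GPUS_PER_COMPUTE_TRAY
total_gpus = gpus_per_rack * RACKS

print(f"GPUs per rack: {gpus_per_rack}")  # 36, matching the NVL36 name
print(f"Total GPUs:    {total_gpus}")     # 72 across both racks
```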

The 72x1 Configuration#
GB200 72x1 contains one rack with 72 GPUs.
Reference configuration:
- 18x 1RU Compute trays (4 GPUs per compute tray).
- 9x 1RU NVL72 NVLink Switch trays.
- Hybrid cooling trays:
  - Grace CPU, Blackwell GPU, CX7, and NVLink Switch ASICs are liquid cooled.
  - The rest of the components are air cooled.
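The stated per-tray GPU count makes the 72-GPU total straightforward to verify; the short sketch below also tallies the rack units occupied by the trays themselves (other rack components are not counted).

```python
# Illustrative check of the 72x1 reference configuration above.
COMPUTE_TRAYS = 18         # 1RU each
GPUS_PER_COMPUTE_TRAY = 4  # stated above
SWITCH_TRAYS = 9           # 1RU NVL72 NVLink Switch trays

total_gpus = COMPUTE_TRAYS * GPUS_PER_COMPUTE_TRAY
tray_rack_units = COMPUTE_TRAYS + SWITCH_TRAYS  # 1RU per tray

print(f"GPUs in the rack:      {total_gpus}")       # 72
print(f"RUs occupied by trays: {tray_rack_units}")  # 27 (trays only)
```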

Topology of a Compute Node#
Here is the topology of a compute node:
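As a rough textual complement to the diagram, and relying only on the per-tray components called out in the Network Overview below (two BF-3 NICs, four CX7 HCAs, a BMC, and 10G/1G ports), a compute tray's connectivity could be sketched as follows; any names or groupings beyond those counts are illustrative.

```python
# Hypothetical connectivity map for one GB200 NVL compute tray.
# Component counts are taken from the Network Overview section below;
# how the 400G/200G ports map onto the two BF-3 NICs is an assumption,
# and the field names are illustrative rather than an NVIDIA schema.
compute_tray = {
    "bf3_nics": [
        {"name": f"bf3_{i}", "ports": {"400G": "external-net", "200G": "internal-net"}}
        for i in range(2)  # two BF-3 NICs per compute tray
    ],
    "cx7_hcas": [
        {"name": f"cx7_{i}", "network": "fabric-net"}  # East-West compute fabric
        for i in range(4)  # four CX7 HCAs per compute tray
    ],
    "bmc": {"network": "ipmi-net"},  # out-of-band management
    "mgmt_ports": [{"speed": "10G/1G", "network": "os-net"}],
}

for component, detail in compute_tray.items():
    print(component, "->", detail)
```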

Network Overview#
Here is a list of the networks in a GB200 rack installation; a consolidated summary appears as a sketch after the list:
- External-net: Connects the rack to an external network, such as a corporate or campus network, and possibly to the Internet and to dedicated or shared remote storage.
  - This network is also usually referred to as the North-South or Front-end network.
  - Each compute tray connects to the Ethernet switch through its two BF-3 NICs; the external-net is on the 400G ports.
- Internal-net: Used exclusively to manage the rack, covering SSH access and control-plane functions such as compute node provisioning, workload orchestration, and telemetry collection.
  - This network is also referred to as the in-band management network.
  - Each compute tray connects to the Ethernet switch through its two BF-3 NICs; the internal-net is on the 200G ports.
- IPMI-net: Used for out-of-band (OOB) management and connects the BMCs of the compute and switch trays.
  - This network is also referred to as the BMC OOB Eth network.
  - Each management node (host) requires one interface for the host's IPMI interface and a second interface that gives the host OS direct access to the IPMI subnet.
- Fabric-net: The InfiniBand or Ethernet network that connects the GB200 NVL rack's Compute Fabric HCAs (typically ConnectX-7).
  - This network is also referred to as the East-West or Back-end network.
  - For the fabric-net, the four CX7s in each compute tray are connected to a Compute InfiniBand or Ethernet switch. To manage an InfiniBand fabric, a Subnet Manager (SM) is required.
- OS-net: The network over which the OS of each compute tray communicates for debugging and other miscellaneous requirements.
  - The 10G/1G ports on each compute tray are connected to create the os-net.
  - The os-net is typically optional and is sometimes used early in the engineering phase to bring up the first nodes, install the OS by SSHing into the nodes, or debug issues.
- NVLink-net: The network used to manage NVLink.
  - The two 1G ports on each switch tray are linked to form the nvlink-net.
  - Only one port is needed for functionality; the second port is provided for high availability if the first port goes down.
  - This network is typically used to communicate with the NMX-C and NMX-T services, which run on one of the switch trays to manage the NVLink fabric and collect telemetry.
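To keep the aliases and attachment points straight, the list above can be restated as a small lookup table. This is purely a summary of the text; the key and field names are illustrative rather than any official schema.

```python
# Consolidated view of the GB200 NVL rack networks described above.
NETWORKS = {
    "external-net": {
        "alias": "North-South / Front-end",
        "attaches": "BF-3 400G ports on each compute tray",
        "purpose": "corporate/campus network, Internet, remote storage",
    },
    "internal-net": {
        "alias": "In-band management",
        "attaches": "BF-3 200G ports on each compute tray",
        "purpose": "SSH access, provisioning, orchestration, telemetry",
    },
    "ipmi-net": {
        "alias": "BMC OOB Eth",
        "attaches": "BMCs of compute and switch trays",
        "purpose": "out-of-band management",
    },
    "fabric-net": {
        "alias": "East-West / Back-end",
        "attaches": "4x CX7 HCAs per compute tray to the compute IB/Ethernet switch",
        "purpose": "compute fabric (SM required for InfiniBand)",
    },
    "os-net": {
        "alias": "optional debug network",
        "attaches": "10G/1G ports on each compute tray",
        "purpose": "early bring-up, OS install over SSH, debugging",
    },
    "nvlink-net": {
        "alias": "NVLink management",
        "attaches": "2x 1G ports on each switch tray (second port for HA)",
        "purpose": "NMX-C / NMX-T services, NVLink fabric management and telemetry",
    },
}

for name, info in NETWORKS.items():
    print(f"{name:13s} {info['alias']:25s} {info['attaches']}")
```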