Hardware#

The DGX GB200 is a rack-scale solution for graphics processing units (GPUs) connected by NVLink through the NVLink passive copper cable cartridge backplane. The complete DGX GB200 rack-scale solution comprises compute trays with one or two compute boards, NVLink switch trays, an NVLink passive copper cable backplane, power shelves, a bus bar, and liquid cooling manifolds.

Rack Configuration#

The DGX GB200 rack-scale architecture is an NVIDIA rack with a 72-GPU NVLink (NVL) domain. Each 72-GPU rack contains the following; the resulting per-rack totals are tallied in the sketch after the list:

  • 18x 1RU compute trays, each with 2 Grace CPUs and 4 Blackwell GPUs.

  • 9x 1RU NVLink switch trays.

  • 2x TOR switches for management.

  • Power shelves for supplying power to all trays and switches.
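
The per-rack GPU and CPU totals follow directly from these counts. A minimal sketch of the arithmetic, using only the numbers listed above:

```python
# Per-rack totals derived from the tray counts above.
COMPUTE_TRAYS_PER_RACK = 18
GRACE_CPUS_PER_TRAY = 2
BLACKWELL_GPUS_PER_TRAY = 4
NVLINK_SWITCH_TRAYS_PER_RACK = 9

gpus_per_rack = COMPUTE_TRAYS_PER_RACK * BLACKWELL_GPUS_PER_TRAY  # 72-GPU NVLink domain
cpus_per_rack = COMPUTE_TRAYS_PER_RACK * GRACE_CPUS_PER_TRAY      # 36 Grace CPUs

print(f"GPUs per rack: {gpus_per_rack}")
print(f"Grace CPUs per rack: {cpus_per_rack}")
```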

Image showing four 72-GPU racks side-by-side

DGX GB200 Rack Configuration - Front view of four 72-GPU racks#

The rear side of the system provides access to the cable management system, the inlets and outlets to the liquid cooling manifolds and the manifolds themselves, the cable cartridges, and the power bus bar.

Image showing the rear view of a DGX GB200 rack

DGX GB200 Rack Configuration - Rear view#

Compute Trays#

The DGX GB200 compute tray is an enclosure that holds the compute boards and peripheral accessory boards and runs an OS image. The compute trays are cooled by liquid that runs up and down the rack through manifolds, then through the cold plates attached to the CPUs and GPUs in the tray. The remaining components, such as networking and storage devices, are air cooled; the air is pushed through the system by the fans. The compute trays in a rack are interconnected via NVLink through the connectors at the back of the tray, which enables communication with the other compute trays via the NVSwitch trays.
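
From the OS image running on a compute tray, the GPUs and their NVLink links can be inspected with NVML. The sketch below assumes the NVIDIA driver and the `pynvml` Python bindings are installed; it is illustrative only and not a prescribed part of the DGX GB200 software stack.

```python
# Minimal sketch: enumerate the GPUs visible to this compute tray and count
# the active NVLink links on each, via the NVML Python bindings (pynvml).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        active = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                    active += 1
            except pynvml.NVMLError:
                break  # ran past the last link on this device
        print(f"GPU {i} ({name}): {active} active NVLink links")
finally:
    pynvml.nvmlShutdown()
```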

Image showing DGX GB200 compute tray

DGX GB200 Compute Tray#

The following table provides information about the DGX GB200 compute tray configuration; the same configuration is sketched as a simple data structure after the table.

| Component | Function | Description |
| --- | --- | --- |
| Compute | Compute node | 2x Grace™ CPUs and 4x Blackwell™ GPUs |
| Networking | Cluster network | 4x NVIDIA ConnectX-7 single-port 400G OSFP NICs |
| Networking | Storage/management network | 2x NVIDIA BlueField-3 DPUs, dual-port 400G InfiniBand or Ethernet |
| Networking | Out-of-band management network | 1x 1GbE RJ45 from the compute tray BMC module; 2x 1GbE RJ45 from the BlueField-3 BMC interface |
| Storage | Data cache | 4x 3.84TB E1.S NVMe per compute tray with software RAID 0 |
| Storage | Boot drive | 1x 1.92TB M.2 NVMe |
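
Where an inventory or validation script needs the same information, the table can be captured as a plain data structure. The sketch below is illustrative only; the field names are not an NVIDIA API.

```python
# Compute tray configuration from the table above, as a plain Python dict.
# Field names here are illustrative, not an NVIDIA-defined schema.
COMPUTE_TRAY = {
    "compute": {"grace_cpus": 2, "blackwell_gpus": 4},
    "cluster_network": {"connectx7_400g_osfp_nics": 4},
    "storage_mgmt_network": {"bluefield3_dual_port_400g_dpus": 2},
    "storage": {
        "data_cache_drives": 4,
        "data_cache_drive_tb": 3.84,  # E1.S NVMe, software RAID 0
        "boot_drive_tb": 1.92,        # M.2 NVMe
    },
}

# RAID 0 stripes across all members, so the raw data-cache capacity is the sum.
cache_tb = COMPUTE_TRAY["storage"]["data_cache_drives"] * COMPUTE_TRAY["storage"]["data_cache_drive_tb"]
print(f"Data cache capacity (RAID 0): {cache_tb:.2f} TB")  # 15.36 TB
```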

DGX GB200 compute tray block diagram

DGX GB200 Compute Tray - Block diagram#

Out-of-Band (OOB) management of the compute tray hardware resources is provided by a combination of Baseboard Management Controller (BMC) and Host Management Controller (HMC) microcontrollers. External access is available through standard BMC and console interfaces.
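
A BMC that exposes the standard Redfish interface can be queried out of band over the management network. The sketch below is a generic Redfish request against a hypothetical BMC address with placeholder credentials; refer to the BMC documentation for the endpoints actually supported on the DGX GB200.

```python
# Minimal sketch: query a BMC's Redfish Systems collection over the OOB network.
# The BMC address, credentials, and TLS handling below are placeholders.
import requests

BMC_URL = "https://192.0.2.10"   # hypothetical BMC address on the management network
AUTH = ("admin", "password")     # placeholder credentials

resp = requests.get(f"{BMC_URL}/redfish/v1/Systems", auth=AUTH, verify=False, timeout=10)
resp.raise_for_status()
for member in resp.json().get("Members", []):
    print(member["@odata.id"])   # system resources managed by this BMC
```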

The figure below shows a top view of the DGX GB200 compute tray and identifies the main components.

Image showing top view of the DGX GB200 compute tray

DGX GB200 Compute Tray - Top view#

The figure below shows a front view of the DGX GB200 compute tray and identifies the main components.

Image showing front view of the DGX GB200 compute tray

DGX GB200 Compute Tray - Front view#

Power Shelves#

The DGX GB200 rack uses a bus bar structure to distribute power in the rack. Power whips energize power shelves from a remote power panel. The power shelves convert AC power into nominal 50V-51V DC output and distribute it through the bus bar to the rack components. Multiple power shelves are used in the rack to supply redundant power.

The rack power consumption is approximately 120kW. Each power shelf contains six air-cooled 5.5kW PSUs, giving 33kW of output per shelf; eight power shelves per rack provide the required input power with N+N redundancy.
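
The arithmetic behind those figures, assuming N+N means half of the shelves back up the other half:

```python
# Rack power budget, derived from the figures above.
PSUS_PER_SHELF = 6
PSU_KW = 5.5
SHELVES_PER_RACK = 8
RACK_LOAD_KW = 120                            # approximate rack power consumption

shelf_kw = PSUS_PER_SHELF * PSU_KW            # 33 kW output per shelf
installed_kw = SHELVES_PER_RACK * shelf_kw    # 264 kW installed
usable_kw = installed_kw / 2                  # N+N: half the shelves are redundant
print(f"{shelf_kw} kW per shelf, {usable_kw} kW usable with N+N, load ~{RACK_LOAD_KW} kW")
```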

Image showing top view of power shelf

Power Shelf - Top view#

Top-of-Rack Ethernet Switches#

Top-of-rack (TOR) switches are used to connect all BMCs in the compute trays (including the BlueField-3 BMCs) and the switch trays to the management network. The following ports are connected to these switches:

  • Out-of-band management BMC from the compute trays

  • BlueField-3 BMCs from the compute trays

Image showing TOR ethernet switch

DGX GB200 TOR Ethernet Switch#

Console Access#

Each NVLink switch provides an RJ45 interface for RS-232 serial console access. The default baud rate is 115200.
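
As an illustration, a serial console session at that rate can be opened with the `pyserial` package; the device path below is a placeholder for whichever serial adapter is attached to the switch's RJ45 console port.

```python
# Minimal sketch: open the NVLink switch serial console at the default 115200 baud.
# Requires the pyserial package; the device path is a placeholder.
import serial

with serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1) as console:
    console.write(b"\r\n")                              # nudge the console for a prompt
    print(console.read(256).decode(errors="replace"))   # show whatever the switch echoes back
```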

Leak Detection#

Leak detection in the DGX GB200 rack’s liquid cooling system prevents damage to equipment, ensures system reliability and uptime, protects data integrity, and maintains safety. Early detection helps avoid costly repairs, downtime, and regulatory compliance issues, optimizes the cooling efficiency, and prevents secondary damage to infrastructure.