Appendix A – Terminology
Table 9: Terminology
| Term | Definition |
|---|---|
| Node | The distinct hardware tray in the GB300 NVL72 rack that supports a bare-metal OS instance |
| Rack | A physical arrangement of equipment, such as servers or switches, in a vertical closet |
| Open Compute Project (OCP) Rack | Rack based on the Open Rack V3 standard from the Open Compute Project, with the vertical space divided into 48 mm units (OU) and with power distributed using power shelves, a DC bus bar, and AC power cords |
| Electronic Industries Alliance (EIA) Rack | Conventional “Enterprise” rack, originally based on the EIA-310-D standard, with the vertical space divided into 44.45 mm rack units (RU) and with power distributed using rack power distribution units (rPDUs) and AC power |
| Tray | One of the systems inserted into the GB300 NVL72 solution, generally either a “Node” or an “NVSwitch” (which may include 1-2 NVSwitch ASICs), installed directly into a Rack |
| Hot Air Containment (HAC) | A set of racks that exhaust into the same hot-air containment area, generally from two opposing rows |
| NVLink Switch ASIC | The silicon building block with a specific logical port count that drives the network topology |
| Global Fabric Manager (GFM) | The external services that manage the fabric of an NVLink Domain |
| NVLink Domain | The full set of nodes connected with a single multi-node NVLink fabric |
| NVLink Block | A group of nodes within an NVLink Domain that are allocated to the same job. A single job may have multiple blocks assigned across multiple domains. |
| NVLink Partition | A group of nodes within an NVLink Domain that are isolated so that they can access only each other’s GPU memory. A partition can span an entire domain, or a domain can contain multiple partitions. Partitions are intended to protect work running on one partition from interruptions in the other partitions. Partitions are set up via API calls to the GFM. |
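
The relationship between an NVLink Domain and its NVLink Partitions can be illustrated with a small data model. The following is a minimal sketch for illustration only: the class and method names (`NVLinkDomain`, `add_partition`, `partition_of`, `can_access_gpu_memory`) and the node names are hypothetical and are not part of the GFM API. It also assumes that partitions within a domain do not share nodes, which the definitions above imply but do not state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NVLinkDomain:
    """Hypothetical model of one NVLink Domain: all nodes on a single fabric."""

    name: str
    nodes: set[str]
    # Maps a partition name to the node names assigned to that partition.
    partitions: dict[str, set[str]] = field(default_factory=dict)

    def add_partition(self, partition: str, members: set[str]) -> None:
        """Register a partition made of nodes that belong to this domain.

        Assumption: a node belongs to at most one partition at a time.
        """
        if partition in self.partitions:
            raise ValueError(f"partition {partition!r} already exists")
        unknown = members - self.nodes
        if unknown:
            raise ValueError(f"nodes not in domain {self.name}: {sorted(unknown)}")
        for other, existing in self.partitions.items():
            overlap = members & existing
            if overlap:
                raise ValueError(f"nodes already in {other!r}: {sorted(overlap)}")
        self.partitions[partition] = set(members)

    def partition_of(self, node: str) -> Optional[str]:
        """Return the name of the partition that contains the node, if any."""
        for partition, members in self.partitions.items():
            if node in members:
                return partition
        return None

    def can_access_gpu_memory(self, node_a: str, node_b: str) -> bool:
        """Nodes may access each other's GPU memory only within one partition."""
        partition_a = self.partition_of(node_a)
        return partition_a is not None and partition_a == self.partition_of(node_b)


# Example: one domain with four nodes split into two isolated partitions.
domain = NVLinkDomain("domain-0", {f"node-{i}" for i in range(4)})
domain.add_partition("p0", {"node-0", "node-1"})
domain.add_partition("p1", {"node-2", "node-3"})

print(domain.can_access_gpu_memory("node-0", "node-1"))  # True
print(domain.can_access_gpu_memory("node-0", "node-2"))  # False
```

A partition that spans the entire domain corresponds to a single `add_partition` call that passes every node in `domain.nodes`. In a real deployment, partitions are created through API calls to the GFM rather than through a local data structure like this one.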