Major Components
The major components of the DGX SuperPOD configuration are listed in Table 7. The counts and models shown are representative and must be finalized against the actual site design.
Table 7. Major components of the 4 SU, 127-node DGX SuperPOD

| Count | Component | Recommended Model |
|---|---|---|
| **Racks** | | |
| 38 | Rack (Legrand) | NVIDPD13 |
| **Nodes** | | |
| 127 | GPU nodes | DGX H100 system |
| 4 | UFM appliance | NVIDIA Unified Fabric Manager Appliance 3.1 |
| 5 | Management servers | Intel-based x86, 2× socket, 24 cores or greater, 384 GB RAM, OS drives (2× 480 GB M.2 or SATA/SAS SSD in RAID 1), 7.68 TB (raw) NVMe, 4× HDR200 VPI ports, TPM 2.0 |
| **Ethernet Network** | | |
| 8 | In-band management | NVIDIA SN4600C switch with Cumulus Linux |
| 8 | OOB management | NVIDIA SN2201 switch with Cumulus Linux |
| **Compute InfiniBand Fabric** | | |
| 48 | Fabric switches | NVIDIA Quantum QM9700 switch, 920-9B210-00FN-0M0 |
| **Storage InfiniBand Fabric** | | |
| 16 | Fabric switches | NVIDIA Quantum QM9700 switch, 920-9B210-00FN-0M0 |
| **PDUs** | | |
| 96 | Rack PDUs | Raritan PX3-5878I2R-P1Q2R1A15D5 |
| 12 | Rack PDUs | Raritan PX3-5747V-V2 |
The associated cables and transceivers are listed in Table 8. Optical networking links use multi-mode fiber; OOB management links use Cat5e copper.
Table 8. Estimate of cables required for a 4 SU, 127-node DGX SuperPOD

| Count | Component | Connection | Recommended Model¹ |
|---|---|---|---|
| **In-Band Ethernet Cables** | | | |
| 254 | 100 Gbps | DGX H100 systems | Varies |
| 32 | 100 Gbps QSFP to QSFP AOC | Management nodes | Varies |
| 6 | 100 Gbps | ISL cables | Varies |
| Varies | Ethernet (performance varies) | Storage | Varies |
| Varies | Varies | Core DC | Varies |
| **OOB Ethernet Cables** | | | |
| 127 | 1 Gbps | DGX H100 systems | Cat5e |
| 64 | 1 Gbps | InfiniBand switches | Cat5e |
| 11 | 1 Gbps | Management/UFM nodes | Cat5e |
| 8 | 1 Gbps | In-band Ethernet switches | Cat5e |
| Varies | 1 Gbps | Storage | Cat5e |
| 108 | 1 Gbps | PDUs | Cat5e |
| 16 | 100 Gbps | Two uplinks per OOB to in-band switch | Varies |
| **Compute InfiniBand Cabling** | | | |
| 2040 | NDR cables¹, 400 Gbps | DGX H100 systems to leaf, leaf to spine | 980-9I57X-00N010 |
| 2 | NDR cables, 200 Gbps | UFM to leaf ports | 980-9I111-00H010 |
| 1536 | Switch OSFP transceivers | Leaf and spine transceivers | 980-9IA2O-00NS00 |
| 508 | System OSFP transceivers | Transceivers in the DGX H100 systems | 980-9I89P-00N000 |
| 4 | UFM system transceivers | UFM to leaf connections | 980-9I89R-00NS00 |
| **Storage InfiniBand Cables¹ ²** | | | |
| 494 | NDR cables, 400 Gbps | DGX H100 systems to leaf, leaf to spine | 980-9I57X-00N010 |
| 48² | NDR cables, 200 Gbps | Storage | 980-9I111-00H010 |
| 4 | UFM system transceivers | UFM to leaf connections | 980-9I51S-00NS00 |
| 369 | Switch transceivers | Leaf and spine transceivers | 980-9I510-00NS00 |
| 254 | DGX system transceivers | QSFP112 transceivers | 980-9I693-00NS00 |
| 2 | NDR cables, 200 Gbps | UFM to leaf ports | 980-9I557-00N030 |
| 4 | HDR 400 Gbps to 2× 200 Gbps | Slurm management | 980-9I117-00H030 |
| Varies | Storage cables, NDR200 | Varies | 980-9I117-00H030 |

¹ Part number depends on the exact cable lengths required by the data center layout.
² Count and cable type depend on the specific storage selected.
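The compute-fabric NDR cable count in Table 8 can be cross-checked with a short calculation. The topology parameters below (8 compute ports per DGX H100 system, 8 leaf switches per SU for 32 leaves total, and a non-blocking fat tree with equal uplinks and downlinks per leaf) are assumptions typical of this class of design, not values stated in the tables, so treat this as a sanity-check sketch rather than a sizing tool:

```python
# Sanity check of the 2040 compute-fabric NDR cables in Table 8.
# Topology assumptions (not stated in the tables): two-level fat tree,
# 8 compute ports per DGX H100, 8 leaves per SU, non-blocking design.

DGX_NODES = 127        # GPU nodes (Table 7)
PORTS_PER_NODE = 8     # compute InfiniBand ports per DGX H100 (assumption)
LEAVES = 8 * 4         # 8 leaf switches per SU x 4 SU (assumption)

# Every node compute port is one node-to-leaf cable.
node_to_leaf = DGX_NODES * PORTS_PER_NODE   # 127 x 8 = 1016

# Non-blocking: uplinks per leaf match a full SU's downlinks
# (32 nodes x 8 ports / 8 leaves = 32 uplinks per leaf).
UPLINKS_PER_LEAF = 32
leaf_to_spine = LEAVES * UPLINKS_PER_LEAF   # 32 x 32 = 1024

total = node_to_leaf + leaf_to_spine
print(total)  # 2040, matching the NDR cable count in Table 8
```

The 1016 node-to-leaf links also line up with the 508 system OSFP transceivers listed for the compute fabric, since each twin-port OSFP transceiver serves two 400 Gbps links.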