DGX SuperPOD Architecture

The DGX SuperPOD architecture combines DGX systems, Ethernet networking, InfiniBand networking, management nodes, and storage. Figure 4 shows the rack layout of a single SU. DGX SuperPOD with DGX RUBIN NVL8 systems uses standard racks with traditional power supplies and PDUs.

In our reference design, eight DGX RUBIN NVL8 systems fit within a single rack, and each rack consumes approximately 225 kW. The rack layout can be adjusted to meet local data center requirements, such as the maximum power per rack and the placement of DGX systems relative to supporting equipment for power and cooling distribution.

Figure 4 shows 72 NVIDIA DGX RUBIN NVL8 PS systems in standard racks, each with three 2U rack PDUs for maximum redundancy. Depending on your data center's capabilities, you might need to reduce the number of DGX systems hosted in each rack.

_images/image5.png

Figure 4 DGX RUBIN NVL8 in Racks
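As a rough illustration of the rack-density trade-off described above, the following Python sketch estimates how many DGX systems fit in a rack under a facility power limit, and how many racks one SU would then need. It is a planning aid under stated assumptions, not an NVIDIA sizing tool: the function name `rack_plan` and the parameter `max_rack_power_kw` are hypothetical, and the per-system power is assumed to be the ~225 kW rack figure split evenly across eight systems, which ignores PDUs and any other in-rack equipment.

```python
import math

# Figures taken from the reference design text.
SYSTEMS_PER_RACK = 8     # DGX RUBIN NVL8 systems per fully populated rack
RACK_POWER_KW = 225.0    # approximate power of a fully populated rack
SYSTEMS_PER_SU = 72      # systems in one SU (Figure 4)


def rack_plan(max_rack_power_kw: float) -> tuple[int, int]:
    """Return (systems per rack, racks per SU) for a facility power limit.

    Assumption: rack power scales linearly with the number of systems,
    so each system is estimated at RACK_POWER_KW / SYSTEMS_PER_RACK.
    """
    per_system_kw = RACK_POWER_KW / SYSTEMS_PER_RACK
    # Never exceed the reference density, even if more power is available.
    systems = min(SYSTEMS_PER_RACK, int(max_rack_power_kw // per_system_kw))
    if systems < 1:
        raise ValueError("power limit is below the estimate for one system")
    racks = math.ceil(SYSTEMS_PER_SU / systems)
    return systems, racks
```

For example, `rack_plan(225)` keeps the reference density of eight systems in nine racks, while `rack_plan(120)` drops to four systems per rack and doubles the rack count for the SU to 18. Actual per-system power should come from the system's published specifications.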

Figure 5 shows an example management rack configuration with networking switches, management servers, storage arrays, and UFM appliances. Sizes and quantities vary depending on the models used. This example is for one SU.

_images/image6.png

Figure 5 Management Rack Configuration with Networking Switches

This reference architecture focuses on eight SUs with 576 DGX nodes. DGX SuperPOD can scale to much larger configurations, up to and beyond 72 SUs with more than 2000 DGX RUBIN NVL8 nodes. For more information, see Table 3.

Table 3 DGX SuperPOD Scalability

SU Count   Node Count   GPU Count   Leaf Switches   Spine Switches   Node-Leaf Cables   Leaf-Spine Cables
1          72           576         8               4                576                576
2          144          1152        16              8                1152               1152
4          288          2304        32              18               2304               2304
8          576          4608        64              36               4608               4608
16         1152         9216        128             64               9216               9216

Contact NVIDIA for information regarding DGX SuperPOD solutions for 18 SUs or more.
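The linear columns of Table 3 can be reproduced with a few lines of Python, which may help when checking a quote or bill of materials. This is a sketch based only on patterns visible in the table (72 nodes per SU, 8 GPUs per node, 8 leaf entries per SU, and cable counts equal to the GPU count). It is not an official sizing formula, and the function name `su_scaling` is hypothetical. The spine count is deliberately left out because its values in Table 3 do not follow a simple multiple of the SU count.

```python
NODES_PER_SU = 72    # DGX nodes per SU (Table 3, first row)
GPUS_PER_NODE = 8    # 576 GPUs / 72 nodes in Table 3
LEAVES_PER_SU = 8    # leaf count per SU, as observed in Table 3


def su_scaling(su_count: int) -> dict[str, int]:
    """Return the Table 3 quantities that scale linearly with SU count.

    Assumption: the per-SU ratios observed in Table 3 hold for the
    SU counts listed there (1, 2, 4, 8, 16); larger designs may differ.
    """
    if su_count < 1:
        raise ValueError("su_count must be at least 1")
    nodes = su_count * NODES_PER_SU
    gpus = nodes * GPUS_PER_NODE
    return {
        "nodes": nodes,
        "gpus": gpus,
        "leaf": su_count * LEAVES_PER_SU,
        # In Table 3, both cable counts equal the GPU count.
        "node_leaf_cables": gpus,
        "leaf_spine_cables": gpus,
    }


if __name__ == "__main__":
    for su in (1, 2, 4, 8, 16):
        print(su, su_scaling(su))
```

For the eight-SU configuration that this reference architecture focuses on, `su_scaling(8)` gives 576 nodes and 4608 GPUs, matching the corresponding row of Table 3.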