Abstract

The NVIDIA DGX SuperPOD™ with NVIDIA DGX™ H100 systems is the next generation of data center architecture for artificial intelligence (AI). It is designed to provide the levels of computing performance required to solve advanced computational challenges in AI, high performance computing (HPC), and hybrid applications that combine the two to improve prediction performance and time-to-solution. The DGX SuperPOD is based on the infrastructure built at NVIDIA for internal research and is designed to solve the most challenging computational problems of today. Systems based on the DGX SuperPOD architecture have been deployed at customer data centers and cloud service providers around the world.

To achieve maximum scalability, the DGX SuperPOD is powered by several key NVIDIA technologies, including:

  • NVIDIA DGX H100 system: provides the most powerful computational building block for AI and HPC.

  • NVIDIA NDR (400 Gbps) InfiniBand: provides the highest-performance, lowest-latency, and most scalable network interconnect.

  • NVIDIA NVLink® technology: connects GPUs at the NVLink layer to provide unprecedented performance for the most demanding communication patterns.

The DGX SuperPOD architecture is managed by NVIDIA solutions including NVIDIA Base Command™, NVIDIA AI Enterprise, CUDA, and NVIDIA Magnum IO™. These technologies help keep the system running at the highest levels of availability and performance, and NVIDIA Enterprise Support (NVEX) keeps all components and applications running smoothly.

This reference architecture (RA) describes the components that define the scalable and modular architecture of the DGX SuperPOD. The system is built from scalable units (SUs), each containing 32 DGX H100 systems, which allows rapid deployment of systems of multiple sizes. This RA details the SU design; the InfiniBand, NVLink network, and Ethernet fabric topologies; storage system specifications; recommended rack layouts; and wiring guides.
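
As a rough illustration of how the SU building block scales, the following Python sketch computes system and GPU counts for a given number of SUs. The figure of 32 systems per SU comes from the text above; the figure of 8 GPUs per DGX H100 system is an assumption not stated in this abstract, and the function name `superpod_size` is hypothetical. The sketch makes no claim about which SU counts are supported configurations.

```python
# Figure taken from the text: 32 DGX H100 systems per scalable unit (SU).
SYSTEMS_PER_SU = 32
# Assumption (not stated in this abstract): each DGX H100 system has 8 GPUs.
GPUS_PER_SYSTEM = 8


def superpod_size(num_sus: int) -> dict:
    """Return system and GPU counts for a deployment of num_sus scalable units."""
    if num_sus < 1:
        raise ValueError("num_sus must be at least 1")
    systems = num_sus * SYSTEMS_PER_SU
    return {
        "scalable_units": num_sus,
        "systems": systems,
        "gpus": systems * GPUS_PER_SYSTEM,
    }


if __name__ == "__main__":
    # Print the totals for a few illustrative SU counts.
    for n in (1, 2, 4):
        print(superpod_size(n))
```

Because every SU has the same composition, the totals grow linearly with the number of SUs, which is what makes the SU a convenient unit for sizing a deployment.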