Key Components of the DGX SuperPOD#

The DGX SuperPOD architecture is designed to maximize performance for state-of-the-art model training, scale to exaflops of performance, provide the highest-performance access to storage, and support customers across enterprise, higher education, research, and the public sector. It is a digital twin of the main NVIDIA research and development system, meaning the company’s software, applications, and support structure are first tested and vetted on the same architecture. By using scalable units (SUs), system deployment times are reduced from months to weeks. Leveraging the DGX SuperPOD design reduces time-to-solution and time-to-market for next-generation models and applications.

DGX SuperPOD integrates key NVIDIA components with storage solutions from partners certified to work in the DGX SuperPOD environment.

NVIDIA DGX B200 System#

The NVIDIA DGX B200 system (Figure 1) is an AI powerhouse that enables enterprises to expand the frontiers of business innovation and optimization. The DGX B200 system delivers breakthrough AI performance with the most powerful chips ever built, in an eight-GPU configuration. The NVIDIA Blackwell GPU architecture provides the latest technologies, bringing months of computational effort down to days or hours on some of the largest AI/ML workloads.

Figure 1. DGX B200 system#

Some of the key highlights of the DGX B200 system compared to the DGX H200 system include:

  • 72 petaFLOPS of FP8 training performance and 144 petaFLOPS of FP4 inference performance

  • Fifth-generation NVIDIA NVLink

  • 1,440 GB of aggregate HBM3e GPU memory (aggregation across the eight GPUs is sketched below)
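
The system-level figures above are simply per-GPU figures summed across the eight GPUs in a DGX B200 system. A minimal arithmetic sketch, assuming the per-GPU values are the system totals divided by eight:

```python
# Back-of-the-envelope aggregation for a DGX B200 system.
# Per-GPU values are assumptions: the system totals above divided by eight.
NUM_GPUS = 8

per_gpu_fp8_pflops = 9    # assumption: 72 petaFLOPS FP8 / 8 GPUs
per_gpu_fp4_pflops = 18   # assumption: 144 petaFLOPS FP4 / 8 GPUs
per_gpu_hbm_gb = 180      # assumption: 1,440 GB aggregate memory / 8 GPUs

print(f"FP8 training:     {NUM_GPUS * per_gpu_fp8_pflops} petaFLOPS")
print(f"FP4 inference:    {NUM_GPUS * per_gpu_fp4_pflops} petaFLOPS")
print(f"Total GPU memory: {NUM_GPUS * per_gpu_hbm_gb} GB")
```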

NVIDIA InfiniBand Technology#

InfiniBand is a high-performance, low-latency, RDMA-capable networking technology, proven over 20 years in the harshest compute environments to provide the best inter-node network performance. InfiniBand continues to evolve and lead data center network performance.

The NDR generation of InfiniBand has a peak speed of 400 Gbps per direction with extremely low port-to-port latency, and it is backward compatible with previous generations of the InfiniBand specification. InfiniBand is more than just peak bandwidth and low latency. It provides additional features to optimize performance, including adaptive routing (AR), collective communication with SHARP™, and dynamic network healing with SHIELD™, and it supports several network topologies, including fat-tree, Dragonfly, and multi-dimensional torus, to build the largest fabrics and compute systems possible.
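
On a Linux host with the RDMA stack installed, one common way to confirm that the fabric links are up at NDR speed is to read the standard InfiniBand sysfs interface. A minimal sketch (device names vary per system; on an NDR fabric the rate file typically reads "400 Gb/sec (4X NDR)"):

```python
# Report the state and link rate of each InfiniBand port on this host
# using the standard Linux RDMA sysfs interface.
from pathlib import Path

IB_SYSFS = Path("/sys/class/infiniband")

if not IB_SYSFS.exists():
    print("No InfiniBand devices found (is the RDMA stack installed?)")
else:
    for device in sorted(IB_SYSFS.iterdir()):
        for port in sorted((device / "ports").iterdir()):
            rate = (port / "rate").read_text().strip()
            state = (port / "state").read_text().strip()
            print(f"{device.name} port {port.name}: state={state}, rate={rate}")
```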

Runtime and System Management#

The DGX SuperPOD reference architecture (RA) represents best practices for building high-performance data centers. There is flexibility in how these systems can be presented to customers and users. NVIDIA Base Command Manager software is used to manage all DGX SuperPOD deployments.

DGX SuperPOD can be deployed on-premises, meaning the customer owns and manages the hardware as a traditional system. This can be within a customer’s data center or co-located at a commercial data center, but the customer owns the hardware.

Components#

The hardware components of DGX SuperPOD are described in Table 1. The software components are shown in Table 2.

Table 1. DGX SuperPOD hardware components by NVIDIA#

  • Compute nodes: NVIDIA DGX B200 system with eight B200 GPUs. The world’s premier purpose-built AI systems featuring NVIDIA B200 Tensor Core GPUs, fifth-generation NVIDIA NVLink, and fourth-generation NVIDIA NVSwitch™ technologies.

  • Compute fabric: NVIDIA Quantum QM9700 NDR 400 Gbps InfiniBand. Rail-optimized, non-blocking, full fat-tree network with eight NDR400 connections per system.

  • InfiniBand storage fabric: NVIDIA Quantum QM9700 NDR 400 Gbps InfiniBand. The fabric is optimized to match the peak performance of the configured storage array.

  • Ethernet storage fabric: NVIDIA Spectrum-X SN5600 800 Gbps Ethernet. Optional storage fabric for Ethernet-based storage solutions.

  • Compute/storage InfiniBand fabric management: NVIDIA Unified Fabric Manager Appliance, Enterprise Edition. NVIDIA UFM combines enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to manage scale-out InfiniBand data centers.

  • In-band management network: NVIDIA SN5600 switch. A 64-port 800 Gbps Ethernet switch (configurable as up to 256 ports of 200 Gbps) providing high port density with high performance.

  • In-band and out-of-band (OOB) management network: NVIDIA SN2201 switch. A 48-port 1 Gbps Ethernet switch with four 100 Gbps ports, leveraging copper ports to minimize complexity.

Table 2. DGX SuperPOD software components#

  • NVIDIA Base Command Manager: Comprehensive AI infrastructure management for AI clusters. It automates provisioning and administration and supports cluster sizes into the thousands of nodes.

  • NVIDIA AI Enterprise: An end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade co-pilots and other generative AI applications.

  • Magnum IO: Enables increased performance for AI and HPC.

  • NVIDIA NGC: The NGC catalog provides a collection of GPU-optimized containers for AI and HPC.

  • Slurm: A classic workload manager used to manage complex workloads in a multi-node, batch-style compute environment.

  • Run:ai: A cloud-native AI workload and GPU orchestration platform enabling fractional, full, and multi-node support for the entire enterprise AI lifecycle, including interactive development environments, training, and inference.

Note

DGX SuperPOD supports only NVIDIA Base Command Manager with Slurm or Run:ai.
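
Since Slurm is one of the two supported orchestration options, the following hedged sketch shows how a multi-node batch job is typically sized to DGX B200 nodes (eight GPUs each). The partition name and training command are hypothetical placeholders; the #SBATCH directives are standard Slurm options.

```python
# Generate and submit a Slurm batch job sized to DGX B200 nodes,
# with one task per GPU. Partition and train.py are hypothetical.
import subprocess
from pathlib import Path

NODES = 2           # e.g., two DGX B200 systems
GPUS_PER_NODE = 8   # eight B200 GPUs per system

batch_script = f"""#!/bin/bash
#SBATCH --job-name=multinode-train
#SBATCH --partition=batch          # hypothetical partition name
#SBATCH --nodes={NODES}
#SBATCH --ntasks-per-node={GPUS_PER_NODE}
#SBATCH --gpus-per-node={GPUS_PER_NODE}
#SBATCH --exclusive
#SBATCH --time=04:00:00

srun python train.py               # hypothetical training entry point
"""

script_path = Path("multinode_train.sbatch")
script_path.write_text(batch_script)
subprocess.run(["sbatch", str(script_path)], check=True)
```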

Design Requirements#

DGX SuperPOD is designed to minimize system bottlenecks throughout the tightly coupled configuration to provide the best performance and application scalability. Each subsystem has been thoughtfully designed to meet this goal. In addition, the overall design remains flexible so that it can be tailored to integrate better into existing data centers.

System Design#

DGX SuperPOD is optimized for a customer’s particular multi-node AI and HPC workloads:

  • A modular architecture based on SUs of 32 DGX B200 systems each.

  • A fully tested system scales to four SUs, but larger deployments can be built based on customer requirements (see the sizing sketch after this list).

  • Racks that support two DGX B200 systems each, so that the rack layout can be modified to accommodate different data center requirements.

  • Storage partner equipment that has been certified to work in DGX SuperPOD environments.

  • Full system support—including compute, storage, network, and software—is provided by NVIDIA Enterprise Experience (NVEX).
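
The sizing implied by this list follows directly from the figures in this document: 32 DGX B200 systems per SU, eight GPUs per system, and 72 petaFLOPS of FP8 per system. A minimal arithmetic sketch:

```python
# Cluster sizing arithmetic from the figures in this document.
SYSTEMS_PER_SU = 32
GPUS_PER_SYSTEM = 8
FP8_PFLOPS_PER_SYSTEM = 72  # from the DGX B200 highlights above

for num_su in range(1, 5):  # fully tested scale is four SUs
    systems = num_su * SYSTEMS_PER_SU
    gpus = systems * GPUS_PER_SYSTEM
    fp8 = systems * FP8_PFLOPS_PER_SYSTEM
    print(f"{num_su} SU(s): {systems:4d} systems, {gpus:5d} GPUs, "
          f"~{fp8 / 1000:.1f} exaFLOPS FP8")
```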

Compute Fabric#

  • The compute fabric is a rail-optimized, balanced, full fat-tree topology (rail assignment is illustrated in the sketch after this list).

  • Managed NDR switches are used throughout the design to provide better management of the fabric.

  • The fabric is designed to support the latest SHARP features.
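
As a simplified illustration of what rail optimization means here (not the actual cabling plan): each DGX B200 system has eight compute-fabric connections, and connection i of every system is cabled to a leaf switch serving rail i, so same-rail traffic between systems crosses a single leaf.

```python
# Simplified illustration of a rail-optimized layout.
# Switch names are hypothetical; this is not the actual cabling plan.
SYSTEMS = 4   # a few example DGX B200 systems
RAILS = 8     # eight NDR400 connections per system

def leaf_for(hca_index: int) -> str:
    """Return the (hypothetical) leaf switch serving a given rail."""
    return f"leaf-rail{hca_index}"

for system in range(SYSTEMS):
    for hca in range(RAILS):
        print(f"dgx{system:02d} HCA {hca} -> {leaf_for(hca)}")
```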

Storage Fabric (High Speed Storage)#

The storage fabric provides high bandwidth to shared storage. It also has the following characteristics:

  • It is independent of the compute fabric to maximize both storage and application performance.

  • It provides single-node bandwidth of at least 40 GB/s to each DGX B200 system (see the arithmetic sketch after this list).

  • Storage is accessed over InfiniBand or RDMA over Converged Ethernet (RoCE) to maximize performance and minimize CPU overhead.

  • It is flexible and can scale to meet specific capacity and bandwidth requirements.

  • Connectivity to management nodes is required to provide storage access independent of the compute nodes.
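
For rough context on the 40 GB/s figure (a sketch using only the numbers in this document; protocol overhead and oversubscription are ignored): a single NDR link carries 400 Gb/s, or 50 GB/s, per direction, and an SU of 32 systems therefore needs roughly 1.28 TB/s of aggregate storage bandwidth.

```python
# Rough storage-bandwidth arithmetic from the figures in this document.
PER_NODE_GBYTES_PER_S = 40    # minimum per DGX B200 system
SYSTEMS_PER_SU = 32
NDR_LINK_GBITS_PER_S = 400    # per direction

print(f"One NDR link (raw):       {NDR_LINK_GBITS_PER_S / 8:.0f} GB/s")
print(f"Per-SU storage bandwidth: {PER_NODE_GBYTES_PER_S * SYSTEMS_PER_SU} GB/s")
```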

In-Band Management Network#

  • The in-band management network fabric is Ethernet-based and is used for node provisioning, data movement, Internet access, and other services that must be accessible by the users.

  • The in-band management network connections for compute and management nodes operate at 200 Gbps and are bonded for resiliency (see the sketch after this list).
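
One hedged way to confirm that the bonded in-band links are healthy on a Linux node is to read the kernel’s bonding status file; the bond device name below is a hypothetical example.

```python
# Inspect the state of a bonded interface via the Linux bonding status file.
from pathlib import Path

BOND = Path("/proc/net/bonding/bond0")   # hypothetical bond device name

if BOND.exists():
    for line in BOND.read_text().splitlines():
        if line.startswith(("Bonding Mode", "MII Status", "Slave Interface", "Speed")):
            print(line.strip())
else:
    print(f"{BOND} not found; bonding may not be configured on this host")
```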

Out-of-Band Management Network#

The OOB management network connects all the baseboard management controller (BMC) ports, as well as other devices that should be physically isolated from users. The Switch Management Network is a subset of the OOB network that provides additional security and resiliency.

Storage Requirements#

The DGX SuperPOD compute architecture must be paired with a high-performance, balanced storage system to maximize overall system performance. DGX SuperPOD is designed to use two separate storage systems: high-performance storage (HPS), optimized for throughput and parallel I/O, and user storage, optimized for higher IOPS and metadata workloads.

High-Performance Storage#

High-performance storage is provided by InfiniBand-connected storage from a DGX SuperPOD-certified storage partner and is engineered and tested with the following attributes in mind:

  • A high-performance, resilient, POSIX-style file system optimized for multi-threaded read and write operations across multiple nodes (see the sketch after this list).

  • Support for RDMA over InfiniBand or Ethernet.

  • Use of local system RAM for transparent data caching.

  • Transparent use of local flash devices for read and write caching.
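
As a minimal sketch of the multi-threaded read pattern such a file system is optimized for (the mount point, file set, and thread count are illustrative assumptions, and this is a crude measurement rather than a storage benchmark):

```python
# Multi-threaded sequential-read pass over files on a hypothetical
# high-performance storage mount, reporting aggregate throughput.
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MOUNT = Path("/mnt/hps/dataset")   # hypothetical HPS mount point
THREADS = 16                       # illustrative thread count
CHUNK = 16 * 1024 * 1024           # 16 MiB sequential reads

def read_file(path: Path) -> int:
    """Stream one file in large chunks and return the number of bytes read."""
    total = 0
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

files = [p for p in MOUNT.glob("*") if p.is_file()] if MOUNT.exists() else []
start = time.time()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    total_bytes = sum(pool.map(read_file, files))
elapsed = time.time() - start

if total_bytes:
    print(f"Read {total_bytes / 1e9:.1f} GB in {elapsed:.1f} s "
          f"({total_bytes / 1e9 / elapsed:.2f} GB/s)")
else:
    print(f"No files found under {MOUNT}")
```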

The specific storage fabric topology, capacity, and components are determined by the DGX SuperPOD certified storage partner as part of the DGX SuperPOD design process.

User Storage#

User Storage differs from High-Performance Storage in that it exposes an NFS share on the in-band management fabric for multiple uses. It is typically used for “home directory” type usage (especially with clusters deployed with Slurm), administrative scratch space, shared storage needed by DGX SuperPOD components in a High Availability configuration (for example, Base Command Manager), and log files.

With that in mind, User Storage has the following requirements:

  • 100 Gb/s connectivity is required.

  • It is designed for high metadata performance, IOPS, and key enterprise features such as checkpointing. This differs from the HPS, which is optimized for parallel I/O and large capacity.

  • It communicates over Ethernet using NFS (see the sketch after this list).
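
A quick, hedged way to see how user storage is mounted on a node is to list the NFS entries in the standard Linux mount table; the output depends entirely on the site’s configuration.

```python
# List NFS mounts (e.g., the user-storage export on the in-band network)
# along with their mount options.
from pathlib import Path

MOUNTS = Path("/proc/mounts")   # standard Linux mount table

if MOUNTS.exists():
    for line in MOUNTS.read_text().splitlines():
        device, mountpoint, fstype, options, *_ = line.split()
        if fstype.startswith("nfs"):
            print(f"{device} mounted on {mountpoint} ({fstype}): {options}")
else:
    print("/proc/mounts not available on this system")
```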

The User Storage requirement in a DGX SuperPOD is often satisfied by NFS servers already deployed in the data center; a new export is created and made accessible to the DGX SuperPOD’s in-band management network. User Storage is therefore not described in detail in this DGX SuperPOD reference architecture; however, a minimum of 100 Gb/s of bandwidth is required.