Key Components of the DGX SuperPOD#
The DGX SuperPOD architecture is designed to maximize performance for state-of-the-art model training, scale to exaflops of compute, deliver the highest storage performance, and support customers across enterprise, higher education, research, and the public sector. It is a digital twin of the main NVIDIA research and development system, meaning the company’s software, applications, and support structure are first tested and vetted on the same architecture. By using scalable units (SUs), system deployment times are reduced from months to weeks. Leveraging the DGX SuperPOD design reduces time-to-solution and time-to-market for next-generation models and applications.
DGX SuperPOD is the integration of key NVIDIA components, as well as storage solutions from partners certified to work in the DGX SuperPOD environment.
NVIDIA DGX B300 System#
The NVIDIA DGX B300 system (Figure 1) is an AI powerhouse that enables enterprises to expand the frontiers of business innovation and optimization. The DGX B300 system delivers breakthrough AI performance in an eight-GPU configuration built on the most powerful chips ever made. The NVIDIA Blackwell GPU architecture provides the latest technologies that bring months of computational effort down to days or hours on some of the largest AI/ML workloads.

Figure 1 DGX B300 system#
Some of the key highlights of the DGX B300 system when compared to the DGX B200 system include:
DC-busbar-powered, MGX-rack-capable design for high-density deployment in modern data centers
Alternative AC PSU-powered appliance design
72 petaFLOPS FP8 training and 144 petaFLOPS FP4 inference
Fifth-generation NVIDIA NVLink
1,440 GB of aggregated HBM3 memory
MGX Racks and DGX Power Shelves#
The DGX B300 system features a DC-busbar-powered MGX v1.1 design similar to that of the DGX GB200/GB300 systems. This design enables higher rack density, better power efficiency, and data-center-level compatibility with DGX GB200/GB300 systems.
Each power shelf used in the DGX B300 SuperPOD contains six 5.5 kW power supplies configured for N redundancy and can deliver up to 33 kW of power. There are four power shelves in a single DGX B300 rack. At the rear of each power shelf is a set of RJ45 ports used for the power-brake and current-sharing features; the power shelves are daisy-chained to each other through these ports. At the front of the power shelf is the BMC port. Figure 2 shows the front of the power shelf.

Figure 2 Power Shelf#
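As a quick check on the power arithmetic above, the following minimal Python sketch computes usable power per shelf and per rack from the figures in this section. The redundancy parameter is an assumption added purely for illustration of the formula, not a statement about supported configurations.

```python
# Back-of-the-envelope power math for a DGX B300 rack, based on the figures
# above: six 5.5 kW supplies per shelf (N redundancy) and four shelves per rack.
# The redundancy parameter is an illustrative assumption only.

PSU_KW = 5.5            # rated output per power supply, kW
PSUS_PER_SHELF = 6      # supplies installed in each power shelf
SHELVES_PER_RACK = 4    # power shelves in a DGX B300 rack


def shelf_capacity_kw(redundant_psus: int = 0) -> float:
    """Usable kW per shelf after setting aside `redundant_psus` supplies."""
    active = PSUS_PER_SHELF - redundant_psus
    return active * PSU_KW


def rack_capacity_kw(redundant_psus: int = 0) -> float:
    """Usable kW per rack across all shelves."""
    return SHELVES_PER_RACK * shelf_capacity_kw(redundant_psus)


if __name__ == "__main__":
    # N redundancy (no spare supplies): 6 x 5.5 kW = 33 kW per shelf.
    print(f"Per shelf (N):   {shelf_capacity_kw(0):.1f} kW")
    print(f"Per rack  (N):   {rack_capacity_kw(0):.1f} kW")
    # Hypothetical N+1 setting, shown only to illustrate the formula.
    print(f"Per shelf (N+1): {shelf_capacity_kw(1):.1f} kW")
```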
NVIDIA Spectrum-X Technology#
The NVIDIA Spectrum™-X Ethernet platform is designed specifically to improve the performance and efficiency of Ethernet-based AI clouds. This breakthrough technology achieves 1.6x better AI networking performance, along with consistent and predictable performance. Spectrum-X is built on network innovations powered by the tight coupling of the NVIDIA Spectrum-4 Ethernet switch and the NVIDIA® ConnectX-8 Smart NIC. Spectrum-X network optimizations reduce the runtimes of massive transformer-based generative AI models and deliver faster time to insight.
Spectrum-X 2.0 provides the same 800 Gbps connectivity, with latency characteristics comparable to XDR InfiniBand, while using more affordable switches and optics.
NVIDIA InfiniBand Technology#
InfiniBand is a high-performance, low latency, RDMA capable networking technology, proven over 20 years in the harshest compute environments to provide the best inter-node network performance. InfiniBand continues to evolve and lead data center network performance.
NVIDIA InfiniBand XDR has a peak speed of 800 Gbps per direction with extremely low port-to-port latency and is backward compatible with previous generations of the InfiniBand specification. InfiniBand is more than just peak bandwidth and low latency: it provides additional features to optimize performance, including Adaptive Routing (AR), collective communication with SHARP™, dynamic network healing with SHIELD™, and support for several network topologies.
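To make the 800 Gbps per-direction figure concrete, the sketch below gives an idealized Python estimate of single-link transfer time. It ignores protocol overhead, encoding, and congestion, and the payload sizes are illustrative assumptions rather than figures from this document.

```python
# Rough transfer-time estimate for an 800 Gbps (per direction) link, as used by
# both the XDR InfiniBand and Spectrum-X fabrics described above. This ignores
# protocol overhead, encoding, and congestion, so real numbers will be lower.

LINK_GBPS = 800  # per-direction link speed, gigabits per second


def transfer_seconds(payload_gigabytes: float, link_gbps: float = LINK_GBPS) -> float:
    """Idealized time to move `payload_gigabytes` over a single link."""
    payload_gigabits = payload_gigabytes * 8
    return payload_gigabits / link_gbps


if __name__ == "__main__":
    # Illustrative payload sizes only (e.g., a large checkpoint shard).
    for gb in (10, 100, 1_000):
        print(f"{gb:>5} GB over one link: ~{transfer_seconds(gb):.1f} s (idealized)")
```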
NVIDIA Mission Control#
The DGX B300 SuperPOD Reference Architecture represents the best practices for building high-performance AI factories. There is flexibility in how these systems can be presented to customers and users. NVIDIA Mission Control software is used to manage all DGX B300 SuperPOD deployments.
NVIDIA Mission Control is a sophisticated full-stack software solution. As an essential part of the DGX SuperPOD experience, it optimizes developer workload performance and resiliency, ensures unmatched uptime with automated failure handling, and provides unified cluster-scale telemetry and manageability. Key features include full-stack resiliency, predictive maintenance, unified error reporting, data center optimizations, cluster health checks, and automated node management.
NVIDIA Mission Control software incorporates the same technology that NVIDIA uses to manage thousands of systems for our award-winning data scientists and provides an immediate path to AI Factory for organizations that need the best of the best.
DGX SuperPOD is deployed on-premises, meaning the customer owns and manages the hardware. It can be installed in the customer’s own data center or co-located at a commercial data center. In either case, the customer owns the hardware and the service it provides, and is responsible for the cluster infrastructure as well as for providing the building management system for integration.
Components#
The hardware components of DGX SuperPOD are described in Table 1. The software components are shown in Table 2.
| Component | NVIDIA Technology | Description |
|---|---|---|
| Compute nodes | NVIDIA DGX B300 system with eight B300 GPUs | The world’s premier purpose-built AI systems featuring NVIDIA B300 Tensor Core GPUs, fifth-generation NVIDIA NVLink, and fourth-generation NVIDIA NVSwitch™ technologies. |
| Compute node transceiver and cable | NVIDIA OSFP twin-port flat-top transceiver, MMF passive fiber cable | Transceivers and cables for DGX nodes |
| Compute fabric | NVIDIA Spectrum-4 SN5600D 800 Gbps Ethernet | Rail-optimized, non-blocking, twin-plane, fat-tree topology for next-generation extreme-scale AI factories. |
| Spectrum-X Compute fabric switch transceiver and cable | NVIDIA QSFP single-port flat-top 800 Gbps transceiver, NVIDIA OSFP twin-port finned 1600 Gbps transceiver, MMF passive fiber cable | Transceivers and cables for Spectrum-4 800 Gb/s switches |
| InfiniBand Storage fabric | NVIDIA Quantum Q3200/Q3400 XDR 800 Gbps InfiniBand | The fabric is optimized to match the peak performance of the configured storage array |
| InfiniBand Storage fabric switch transceiver and cable | NVIDIA QSFP single-port flat-top transceiver, MMF passive fiber cable | Transceivers and cables for InfiniBand switches |
| Ethernet Storage fabric | NVIDIA Spectrum-4 SN5600D 800 Gbps Ethernet | Optional storage fabric for Ethernet-based storage solutions |
| Ethernet Storage fabric switch transceiver and cable | NVIDIA QSFP single-port flat-top 800 Gbps transceiver, NVIDIA OSFP twin-port finned 1600 Gbps transceiver, MMF passive fiber cable | Transceivers and cables for Spectrum-4 switches |
| Storage InfiniBand fabric management | NVIDIA Unified Fabric Manager Appliance, Enterprise Edition | NVIDIA UFM combines enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to manage scale-out InfiniBand data centers |
| In-band management network | NVIDIA SN5600D switch | 64-port 800 Gbps Ethernet switch (up to 256 ports at 200 Gbps) providing high port density with high performance |
| Ethernet In-band fabric switch transceiver and cable | NVIDIA QSFP single-port flat-top 400 Gbps transceiver, NVIDIA OSFP twin-port finned 800 Gbps transceiver, MMF passive fiber cable | Transceivers and cables for Spectrum-4 switches |
| Out-of-band (OOB) management network | NVIDIA SN2201 switch | 48-port 1 Gbps and 4-port 100 Gbps Ethernet switch leveraging copper ports to minimize complexity |
| Component | Description |
|---|---|
| NVIDIA Mission Control | Simplified AI data center operations, cluster management, and workload orchestration with agility, resilience, and hyperscale efficiency for enterprises. |
| NVIDIA Run:ai | Cloud-native AI workload and GPU orchestration platform enabling fractional, full, and multi-node support for the entire enterprise AI lifecycle, including interactive development environments, training, and inference. |
| NVIDIA AI Enterprise | NVIDIA AI Enterprise is an end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade co-pilots and other generative AI applications. |
| NVIDIA Magnum IO | Enables increased performance for AI and HPC. |
| NVIDIA NGC | The NGC catalog provides a collection of GPU-optimized containers for AI and HPC. |
| Slurm | A classic workload manager used to manage complex workloads in a multi-node, batch-style, compute environment. |
Note
NVIDIA Mission Control now includes Base Command Manager and Run:ai functionality; no separate purchase is needed. DGX SuperPOD supports multi-team environments only through Base Command Manager; multitenancy is not currently supported with DGX SuperPOD.
Design Requirements#
DGX SuperPOD is designed to minimize system bottlenecks throughout the tightly coupled configuration to provide the best performance and application scalability. Each subsystem has been thoughtfully designed to meet this goal. In addition, the overall design remains flexible so that it can be tailored to better integrate into existing data centers.
System Design#
DGX SuperPOD is optimized for customers’ multi-node AI and HPC workloads:
A modular architecture based on SUs of 64 DGX B300 systems each; the sizing sketch after this list works through the resulting node, GPU, and rack counts.
MGX-rack based, DC busbar powered, integrated data-center scale design.
A fully tested system scales to four SUs, but larger deployments can be built based on customer requirements.
Racks that support up to four DGX B300 systems each, enabling modification to accommodate different data center requirements.
Storage partner equipment that has been certified to work in DGX SuperPOD environments.
Full system support, including compute, storage, network, and software is provided by NVIDIA Enterprise Experience (NVEX).
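The following minimal Python sketch works through the sizing implied by the list above (64 systems per SU, eight GPUs per system, up to four systems per rack). The rack count covers compute racks only; management, networking, and storage racks are determined by the full design and are omitted here as a simplifying assumption.

```python
# Sizing arithmetic for the modular SU design described above: 64 DGX B300
# systems per SU, eight GPUs per system, and up to four systems per rack.
# Compute racks only; other rack types are intentionally not modeled.

SYSTEMS_PER_SU = 64
GPUS_PER_SYSTEM = 8
SYSTEMS_PER_RACK = 4


def su_summary(num_sus: int) -> dict:
    """Return node, GPU, and minimum compute-rack counts for `num_sus` SUs."""
    systems = num_sus * SYSTEMS_PER_SU
    return {
        "scalable_units": num_sus,
        "dgx_b300_systems": systems,
        "gpus": systems * GPUS_PER_SYSTEM,
        "compute_racks_min": -(-systems // SYSTEMS_PER_RACK),  # ceiling division
    }


if __name__ == "__main__":
    for sus in (1, 2, 4):  # a fully tested system scales to four SUs
        print(su_summary(sus))
```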
Compute Fabric#
The compute fabric is a rail-optimized, twin-plane, full fat-tree topology.
Managed Quantum-3 and Spectrum-X switches are used throughout the design to provide better management of the fabric.
Storage Fabric (High Speed Storage)#
The storage fabric provides high bandwidth to shared storage. It also has the following characteristics:
It is independent of the compute fabric to maximize both storage and application performance.
Provides single-node bandwidth of at least 40 Gbps to each DGX B300 system; the sketch after this list aggregates this across an SU.
Storage is provided over InfiniBand or RDMA over Converged Ethernet to maximize performance and minimize CPU overhead.
It is flexible and can scale to meet specific capacity and bandwidth requirements.
Connectivity to management nodes is required to provide storage access independent of compute nodes.
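As a rough lower bound implied by the list above, the sketch below aggregates the per-node minimum of 40 Gbps across the nodes in one or more SUs. It is illustrative only; actual requirements are set by the workload and by the certified storage partner's design.

```python
# Aggregate storage-bandwidth arithmetic based on the per-node figure above
# (at least 40 Gbps of storage bandwidth per DGX B300 system). This is a lower
# bound for sizing discussions, not a substitute for the storage partner's design.

MIN_NODE_STORAGE_GBPS = 40
SYSTEMS_PER_SU = 64


def aggregate_storage_gbps(num_systems: int,
                           per_node_gbps: float = MIN_NODE_STORAGE_GBPS) -> float:
    """Minimum aggregate storage bandwidth (Gbps) for `num_systems` nodes."""
    return num_systems * per_node_gbps


if __name__ == "__main__":
    for sus in (1, 4):
        systems = sus * SYSTEMS_PER_SU
        gbps = aggregate_storage_gbps(systems)
        print(f"{sus} SU(s), {systems} systems: >= {gbps:.0f} Gbps "
              f"(~{gbps / 8:.0f} GB/s) aggregate")
```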
In-Band Management Network#
The in-band management network fabric is Ethernet-based and is used for node provisioning, data movement, Internet access, and other services that must be accessible by the users.
The in-band management network connections for compute and management nodes operate at 200 Gbps and are bonded for resiliency.
Out-of-Band Management Network#
The OOB management network connects all the baseboard management controller (BMC) ports, as well as other devices that should be physically isolated from users. The Switch Management Network is a subset of the OOB network that provides additional security and resiliency.
Storage Requirements#
The DGX SuperPOD compute architecture must be paired with a high-performance, balanced storage system to maximize overall system performance. DGX SuperPOD is designed to use two separate storage systems, high-performance storage (HPS) and user storage, optimized respectively for throughput and parallel I/O, and for higher IOPS and metadata workloads.
High-Performance Storage#
High-Performance Storage is provided by InfiniBand-connected storage from a DGX SuperPOD-certified storage partner and is engineered and tested with the following attributes in mind:
High-performance, resilient, POSIX-style file system optimized for multi-threaded read and write operations across multiple nodes.
RDMA on InfiniBand or Ethernet support
Local system RAM for transparent caching of data.
Leverages local flash devices transparently for read and write caching.
The specific storage fabric topology, capacity, and components are determined by the DGX SuperPOD certified storage partner as part of the DGX SuperPOD design process.
User Storage#
User Storage differs from High-Performance Storage in that it exposes an NFS share on the in-band management fabric for multiple uses. It is typically used for “home directory” style usage (especially with clusters deployed with Slurm), administrative scratch space, shared storage needed by DGX SuperPOD components in a High Availability configuration (for example, Base Command Manager), and log files.
With that in mind, User Storage has the following requirements:
100 Gb/s DR1 connectivity is required.
Designed for high metadata performance, IOPS, and key enterprise features such as checkpointing. This differs from the HPS, which is optimized for parallel I/O and large capacity.
Communicates over Ethernet using NFS.
User storage in a DGX SuperPOD is often satisfied by existing NFS servers already deployed; a new export is created and made accessible to the DGX SuperPOD’s in-band management network. For the best performance, a minimum bandwidth of 100 Gb/s is required for user storage.