Mainstream Cluster Configuration#

The Mainstream configuration builds upon the Entry configuration but provides more powerful GPUs and networking, offering maximum performance per node while maximizing the number of nodes per rack. Adding high-performance NVIDIA Mellanox networking increases throughput between the nodes, resulting in performance gains when executing multi-node AI Enterprise workloads.

This configuration also allows organizations to use the same infrastructure for mixed workloads by standardizing the server CPU specification.

It is important to note that sizing calculations are based on a 14 kW redundant PDU per rack and a dual 1600W PSU per server, since most enterprise data centers have these requirements. Due to these power requirements, the rack density calculation yields fewer GPU nodes per rack than CPU-only nodes per rack for the Mainstream configuration. This configuration may require network upgrades depending on the current infrastructure; more information is provided in the Networking section.
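
As a rough illustration of how these power limits drive rack density, the following minimal sketch divides the rack power budget by an assumed per-node draw. The ~0.9 kW per-node figure is an assumption back-derived from the "15 nodes requiring ~13.6 kW" figure later in this section, not a measured value.

```python
# Minimal rack-density estimate under a fixed PDU power budget.
# NODE_DRAW_KW is an assumption derived from the "15 nodes requiring
# ~13.6 kW" figure in this guide, not a measured value.

RACK_POWER_BUDGET_KW = 14.0  # 14 kW redundant PDU per rack
NODE_DRAW_KW = 0.9           # assumed typical draw per GPU node

nodes_per_rack = int(RACK_POWER_BUDGET_KW // NODE_DRAW_KW)
estimated_draw_kw = nodes_per_rack * NODE_DRAW_KW

print(f"Nodes per rack: {nodes_per_rack}")                 # 15
print(f"Estimated rack draw: {estimated_draw_kw:.1f} kW")  # 13.5 kW
```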

Server and Rack Configuration#

The following table describes an example of the Mainstream configuration.

With the upgraded GPU, this Mainstream configuration has ample headroom to serve additional workloads alongside the AI workloads. The additional NVMe drive also reduces AI workload latency, ensuring the GPU has sufficient data to process.

Enterprise AI / Edge AI / Data Analytics

Server: 2U NVIDIA-Certified System
CPU: Dual Intel® Xeon® Gold 6248R 3.0G, 24C/48T, 10.4GT/s, 35.75M Cache, Turbo, HT (205W) DDR4-2933
Memory: 16x 32GB RDIMM, 3200MT/s, Dual Rank
SSD Storage: 2x 1.92TB SSD SATA Mix Use 6Gbps 512, 2.5in Hot-plug AG Drive, 3 DWPD, 10512 TBW
NVMe Storage: 1x 1.92TB Enterprise NVMe Read Intensive AG Drive U.2 Gen4 with carrier
Boot Device: 1x 16GB microSDHC/SDXC Card
Power Supply: Dual, Hot-plug, Redundant (1+1), 1600W
Network Adapter: 1x NVIDIA Mellanox® ConnectX-6 Dx PCIe 25G/100G
Network Switch: NVIDIA® SN3420/SN3700 Top of Rack
GPU: 1x NVIDIA A100 for PCIe

Important

NVIDIA A30 and A100 GPUs are compute-only GPUs and are not suitable for Remote Collaboration/ProViz workloads.

The following table illustrates the rack density for the Mainstream configuration, which provides increased storage, networking, and GPU resources. This rack configuration consists of 15 nodes requiring ~13.6 kW of power. Refer to the Sizing Guide Appendix for additional clarification regarding Mainstream sizing calculations.

Enterprise AI / Edge AI / Data Analytics

[Figure _images/better-01.png: Mainstream rack configuration]

Rack Density: 15 nodes requiring ~13.6 kW of power.

Networking#

To scale out efficiently for multi-node AI workloads, high-performance networking between the nodes is recommended for optimal peer-to-peer communication speed. Two networking options are provided, depending on whether the current infrastructure includes high-performance networking capabilities.

If the current infrastructure is based on 10G, upgrading to a 100G NVIDIA Mellanox networking infrastructure is recommended for optimal multi-node scale-out performance.

If the current infrastructure supports 25G, there is an option to leverage the 25G version of the NVIDIA Mellanox ConnectX-6 Dx PCIe adapter instead of upgrading the networking infrastructure to 100G. This provides improved performance over 10G infrastructure but is not as performant as the recommended 100G configuration.
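
To make the trade-off between the 10G, 25G, and 100G options concrete, the following minimal sketch estimates how long a fixed-size inter-node data exchange would take at each link speed. The 4 GB payload is a hypothetical value chosen for illustration; real multi-node performance also depends on topology, protocol overhead, and the collective-communication algorithm.

```python
# Illustrative best-case transfer times at the three link speeds
# discussed above. PAYLOAD_GB is a hypothetical example value, and
# protocol overhead is ignored.

PAYLOAD_GB = 4.0  # assumed data exchanged between two nodes per step

for link_gbps in (10, 25, 100):
    seconds = PAYLOAD_GB * 8 / link_gbps
    print(f"{link_gbps:>3}G link: ~{seconds:.2f} s per exchange")
```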

The following diagram describes the networking topology illustrated in the Reference Architecture for NVIDIA AI Enterprise on VMware vSphere.

Within this reference architecture, two network switches were used. The mgmt-leaf01 switch is the infrastructure Top of Rack switch. The gpu-leaf01 switch is the high-performance 100G NVIDIA Mellanox Networking switch and provides more throughput between the nodes, resulting in performance gains when executing multi-node Enterprise AI workloads.

[Figure _images/better-03.png: Mainstream networking topology]

Note

mgmt-leaf01 is the Infrastructure Top of Rack switch.

gpu-leaf01 is the compute Top of Rack NVIDIA Mellanox SN3700 switch.

Storage#

The Mainstream cluster configuration moves data storage onto the node so that the data is local to the GPU. This configuration increases training and inferencing throughput: local NVMe storage accelerates access to the data because NVMe drives connect to the PCIe bus at higher speeds than SATA SSDs. This removes any bottleneck caused by storage, freeing the GPU to access the data as fast as possible. Depending on the size of the data, enabling an NFS cache may play an essential role in your node configuration.
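
As a back-of-the-envelope view of why the NVMe drive matters here, the following sketch compares raw interface bandwidth for the two drive types in this configuration. These are theoretical link rates, not measured drive throughput, and the four-lane width is an assumption reflecting a typical U.2 NVMe drive.

```python
# Approximate raw interface bandwidth: SATA 3.0 (the 6Gbps SSDs above)
# vs. a PCIe Gen4 x4 NVMe drive. Link rates only; actual drive
# throughput is lower and workload-dependent.

SATA_LINK_GBPS = 6.0        # SATA 3.0 link rate
PCIE_GEN4_LANE_GBPS = 16.0  # PCIe Gen4 raw rate per lane (16 GT/s)
NVME_LANES = 4              # assumed U.2 drive width

sata_gbs = SATA_LINK_GBPS / 8                    # ~0.75 GB/s
nvme_gbs = PCIE_GEN4_LANE_GBPS * NVME_LANES / 8  # ~8 GB/s

print(f"SATA SSD link:     ~{sata_gbs:.2f} GB/s")
print(f"NVMe Gen4 x4 link: ~{nvme_gbs:.1f} GB/s "
      f"(~{nvme_gbs / sata_gbs:.0f}x the SATA link rate)")
```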

Performance#

This configuration can improve performance by up to 30x when compared to a CPU-only rack. By adding A100 GPUs and high-performance networking to existing rack infrastructure, organizations can dramatically increase performance throughput for AI Enterprise workloads using the Mainstream configuration. However, because this configuration requires more power than the CPU-only configuration at the rack level, a rack can accommodate only 15 GPU nodes compared to 20 CPU-only nodes.
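
To express that rack-level figure in per-node terms, the following minimal sketch divides the quoted rack speedup by the ratio of node counts. This is illustrative arithmetic only; real gains depend heavily on the workload.

```python
# If a 15-node GPU rack delivers up to 30x the throughput of a 20-node
# CPU-only rack, the implied per-node gain is larger still. Illustrative
# arithmetic based on the figures quoted in this section.

CPU_NODES_PER_RACK = 20
GPU_NODES_PER_RACK = 15
RACK_SPEEDUP = 30  # "up to 30x" vs. a CPU-only rack

per_node = RACK_SPEEDUP * CPU_NODES_PER_RACK / GPU_NODES_PER_RACK
print(f"Implied per-node speedup: up to ~{per_node:.0f}x")  # ~40x
```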

For more information regarding performance test results, see the Sizing Guide Appendix.