Entry Cluster Configuration


The Entry configuration is designed to be dropped into your existing rack infrastructure without modifying the current power or networking. It lets an organization support NVIDIA AI Enterprise workloads quickly, but it offers lower throughput potential than the Mainstream and Best configurations.

Using an NVIDIA-Certified Systems server with ~1600 W power supplies, it is possible to populate a 14 kW rack with 17 to 20 nodes.

GPU-accelerated Enterprise/Edge AI workloads require minimal CPU cycles, so lower-specification CPUs, such as 85 W Intel Xeon Silver processors, can be used for the Entry configuration.

The following table illustrates an example Entry configuration for each of these workloads. This configuration uses existing rack infrastructure and requires no modifications to power or networking.

Enterprise AI / Edge AI / Data Analytics

Server: 2U NVIDIA-Certified System
CPU: Dual Intel Xeon Silver 4215 2.5G, 8C/16T, 9.6GT/s, 11M Cache, Turbo, HT (85W) DDR4-2400
Memory: 24x 16GB RDIMM, 3200MT/s, Dual Rank
Storage: 2x 1.92TB SSD SATA Mix Use 6Gbps 512, 2.5in Hot-plug AG Drive, 3 DWPD, 10512 TBW
Boot device: 1x 16GB microSDHC/SDXC Card
Networking: Onboard networking
Power: Dual, Hot-plug, Redundant Power Supply (1+1), 1600W
NIC: NVIDIA ConnectX-6 Lx 25G
Switch: NVIDIA SN2410 Top of Rack
GPU: 1x NVIDIA A30 (Optional: A100)
Important

NVIDIA A30 and A100 GPUs are compute-only GPUs and are not suitable for Remote Collaboration/ProViz workloads.
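
Because the A30 and A100 are compute-only parts, it can be worth confirming what each node actually exposes before scheduling workloads onto it. Below is a minimal sketch using the NVML Python bindings (the nvidia-ml-py package); the check itself is illustrative and not part of this guide's procedure.

```python
# Minimal GPU inventory check via NVML (pip install nvidia-ml-py).
# Prints each GPU's name and board power limit; illustrative only.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        limit_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000.0
        print(f"GPU {i}: {name}, power limit {limit_w:.0f} W")
finally:
    pynvml.nvmlShutdown()
```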

The following table illustrates the rack density using existing power and networking. Note that rack density is maintained even when GPU resources are added, because the sizing calculations offset the GPUs' power draw by specifying a lower-wattage CPU.

This rack configuration consists of 20 Enterprise/Edge AI nodes requiring ~12.4 kW of power. Refer to the Sizing Guide Appendix for details of the Entry sizing calculations.

Enterprise AI / Edge AI / Data Analytics

[Figure: Entry rack layout]
Rack density: 20 nodes requiring ~12.4 kW of power
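
As a quick sanity check on these figures, the per-node draw can be back-calculated from the 20-node, ~12.4 kW rack and compared against the 14 kW budget. The ~620 W per node value below is inferred from those totals rather than measured; the Sizing Guide Appendix documents the actual sizing method.

```python
# Back-of-the-envelope rack density check (illustrative only).
# The per-node draw is inferred from the 20-node / ~12.4 kW figure
# above; it is not a measured value.
RACK_BUDGET_KW = 14.0        # existing rack power budget
NODES = 20                   # Entry nodes per rack
PER_NODE_KW = 12.4 / NODES   # ~0.62 kW per node (assumed average)

total_kw = NODES * PER_NODE_KW
print(f"{NODES} nodes x {PER_NODE_KW:.2f} kW = {total_kw:.1f} kW "
      f"({RACK_BUDGET_KW - total_kw:.1f} kW headroom in a "
      f"{RACK_BUDGET_KW:.0f} kW rack)")
```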

Entry configuration networking options depend on whether the current infrastructure is based on 10G or 25G networking.

If the current infrastructure is based on 10G, the servers can use their onboard networking. If the existing infrastructure supports 25G, it is recommended to use the NVIDIA Mellanox ConnectX-6 Lx PCIe NIC, either with an existing 25G switch that supports RoCE or paired with an NVIDIA® Mellanox® SN2410 switch. This results in greater performance when executing NVIDIA AI Enterprise multi-node workloads.
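
On a Linux node, one quick way to confirm which ports negotiated 25G is to read the link speed from sysfs. The sketch below is illustrative; the interface discovery and the 25G threshold are assumptions, not part of this guide's procedure.

```python
# Report the negotiated link speed of each network interface via sysfs.
# /sys/class/net/<iface>/speed reports Mb/s; illustrative only.
from pathlib import Path

for iface in sorted(Path("/sys/class/net").iterdir()):
    try:
        speed_mbps = int((iface / "speed").read_text().strip())
    except (OSError, ValueError):
        continue  # down or virtual interfaces report no speed
    label = "25G or faster" if speed_mbps >= 25_000 else "below 25G"
    print(f"{iface.name}: {speed_mbps} Mb/s ({label})")
```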

Optimal access to storage is essential for AI Enterprise multi-node workloads; it depends on the workload type, such as training or inference, and the size of the dataset. If the storage array cannot deliver the dataset promptly, overall performance can suffer while the GPU waits for more data. Enabling an NFS cache is an option to consider for the Entry configuration, as it reduces the load on the centralized storage array and can be used with any existing storage infrastructure. For more information on NFS caching and its benefits, refer to the DGX Best Practices document, NFS Cache for Deep Learning.
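
Before layering an NFS cache on top, it can help to measure how fast the existing mount can actually stream a dataset. The sketch below times sequential reads from a directory; the mount point is a placeholder, and this is a coarse illustration rather than a proper storage benchmark (it ignores client-side caching, parallelism, and access patterns).

```python
# Coarse sequential-read throughput check for an NFS-backed dataset.
# DATASET_DIR is a hypothetical mount point; treat the result as a
# rough indicator of whether storage can keep a GPU fed.
import time
from pathlib import Path

DATASET_DIR = Path("/mnt/nfs/dataset")  # placeholder mount point
CHUNK = 8 * 1024 * 1024                 # 8 MiB reads

total_bytes = 0
start = time.perf_counter()
for path in DATASET_DIR.rglob("*"):
    if not path.is_file():
        continue
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            total_bytes += len(chunk)
elapsed = time.perf_counter() - start
print(f"Read {total_bytes / 1e9:.2f} GB in {elapsed:.1f} s "
      f"-> {total_bytes / 1e6 / elapsed:.0f} MB/s")
```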

By adding A30 GPUs to existing rack infrastructure, organizations can dramatically increase throughput for AI Enterprise workloads using an Entry configuration. The Entry configuration can improve performance by up to 20x compared to a rack of CPU-only nodes.

For more information regarding performance test results, see the Sizing Guide Appendix.
