Entry Cluster Configuration

The Entry configuration is designed to drop into your existing rack infrastructure without modifying the current power or networking. This configuration allows an organization to support AI Enterprise workloads quickly, but offers lower throughput potential than the Mainstream and Best configurations.

Server and Rack Configuration

Using an NVIDIA-Certified Systems server with ~1600W power supplies, a 14 kW rack can be populated with 17 to 20 nodes.

GPU-accelerated Enterprise/Edge AI workloads require minimal CPU cycles. Therefore, lower-wattage CPUs, such as 85W Intel Xeon Silver processors, can be used for the Entry configuration.

The following table illustrates an example Entry configuration for these workloads. The configuration uses existing rack infrastructure and requires no modifications to power or networking.

Enterprise AI / Edge AI / Data Analytics

- 2U NVIDIA-Certified System
- Dual Intel Xeon Silver 4215 2.5 GHz, 8C/16T, 9.6 GT/s, 11 MB cache, Turbo, HT (85W), DDR4-2400
- 24x 16GB RDIMM, 3200 MT/s, dual rank
- 2x 1.92TB SATA mixed-use SSD, 6 Gbps, 512, 2.5in hot-plug AG drive, 3 DWPD, 10512 TBW
- 1x 16GB microSDHC/SDXC card
- Onboard networking
- Dual hot-plug redundant power supply (1+1), 1600W
- NVIDIA ConnectX-6 Lx 25G NIC
- NVIDIA SN2410 top-of-rack switch
- 1x NVIDIA A30 (Optional: A100)

Important

NVIDIA A30 and A100 GPUs are compute-only GPUs and are not suitable for Remote Collaboration/ProViz workloads.

The following table illustrates the rack density achievable using existing power and networking. Note that rack density is maintained even when GPU resources are added, because the sizing calculations offset the GPUs' power draw by specifying a lower-wattage CPU.

This rack configuration consists of 20 Enterprise/Edge AI nodes requiring ~12.4 kW of power. Refer to the Sizing Guide Appendix for additional detail on Entry sizing calculations.
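The power math above can be sketched as a quick check. The ~620 W per-node figure below is an assumption inferred from the quoted 20-node, ~12.4 kW total, not an official specification:

```python
# Rough rack-power sizing check. NODE_POWER_W is an assumed per-node draw
# derived from the ~12.4 kW / 20-node figure above, not a measured value.
NODE_POWER_W = 620.0
RACK_BUDGET_KW = 14.0

def rack_power_kw(node_count: int, node_power_w: float) -> float:
    """Total rack draw in kW for a homogeneous population of nodes."""
    return node_count * node_power_w / 1000.0

def fits_budget(node_count: int, node_power_w: float, budget_kw: float) -> bool:
    """True if the configuration stays within the rack power budget."""
    return rack_power_kw(node_count, node_power_w) <= budget_kw

print(rack_power_kw(20, NODE_POWER_W))                # 12.4
print(fits_budget(20, NODE_POWER_W, RACK_BUDGET_KW))  # True
```

Under these assumptions, 20 nodes draw ~12.4 kW, comfortably inside a 14 kW rack budget; the same check shows why denser populations require the lower-wattage CPU specification.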

Enterprise AI / Edge AI / Data Analytics

Rack Density: 20 nodes requiring ~12.4 kW of power

Networking

Entry configuration networking options depend on whether the current infrastructure is based on 10G or 25G networking.

If the current infrastructure is based on 10G, the servers can use their onboard networking. If the existing infrastructure supports 25G, the NVIDIA ConnectX-6 Lx PCIe NIC is recommended, either with an existing 25G switch that supports RoCE or paired with an NVIDIA® Mellanox® SN2410 switch. This results in greater performance when executing AI Enterprise multi-node workloads.
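A quick way to confirm which path applies is to check the NIC's negotiated link speed; the interface name below is an illustrative assumption and will differ per system:

```shell
# Check the negotiated link speed on the server NIC (interface name ens1f0
# is illustrative; substitute your own). A 25G link reports "Speed: 25000Mb/s".
ethtool ens1f0 | grep -i speed
```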

Storage

Optimal access to storage is essential for AI Enterprise multi-node workloads; requirements depend on the workload type, such as training or inference, and the dataset's size. If the storage array cannot provide prompt access to the dataset, overall performance can suffer while the GPU waits for more data. Enabling NFS Cache is an option to consider for the Entry configuration; it reduces load on the centralized storage array and can be used with any existing storage infrastructure. For more information on NFS Cache and its benefits, refer to the DGX Best Practices, NFS Cache for Deep Learning.
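On Linux, one common way to enable a client-side NFS cache is FS-Cache via the cachefilesd daemon. The sketch below assumes a hypothetical export name and mount point, and requires root:

```shell
# Enable the local FS-Cache daemon (package: cachefilesd; the cache lives
# under /var/cache/fscache by default).
sudo systemctl enable --now cachefilesd

# Mount the dataset export with the "fsc" option so reads are cached locally.
# "storage-server:/datasets" and "/mnt/datasets" are illustrative placeholders.
sudo mount -t nfs -o fsc,vers=4.2 storage-server:/datasets /mnt/datasets
```

Repeated reads of the same dataset (common in multi-epoch training) are then served from local disk instead of the storage array.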

Performance

By adding A30 GPUs to existing rack infrastructure, organizations can dramatically increase throughput for AI Enterprise workloads using an Entry configuration. The Entry configuration can improve performance by up to 20x compared to a CPU-only node rack.

For more information regarding performance test results, see the Sizing Guide Appendix.