T4 NGC-ready Platform Design Guide

This document provides the platform specification for an NGC-Ready server using the NVIDIA T4 GPU.

1. Introduction

NGC (NVIDIA® GPU Cloud) is the hub for GPU-optimized software for deep learning, machine learning, and HPC that takes care of all the software plumbing, so data scientists, developers, and researchers can focus on building solutions, gathering insights, and delivering business value.

An NGC-Ready server using the NVIDIA T4 graphics processing unit (GPU) is specified to run NGC software for deep learning (DL) training and inference, machine learning (ML), and high-performance computing (HPC) with consistent, predictable performance.

This design guide provides the platform specification for an NGC-Ready server using the NVIDIA T4 GPU. It includes the GPU, CPU, system memory, network, and storage requirements needed for NGC-Ready compliance.

In addition, some design guidance is provided for T4-based servers that may not meet the requirements specified for NGC-Ready but are viable solutions for other target markets and workloads, such as video analytics.

2. Software Support

This chapter lists the software deliverables, versions, and target applications currently supported by NVIDIA as “NGC-Ready.”

Refer to the latest list of supported Tesla Recommended Drivers (TRDs), NVIDIA® CUDA® drivers, and containerized software at NVIDIA GPU Cloud Documentation for more details and updates.

Note: The T4 NGC-Ready program does not currently include NVIDIA GRID™ or IVA (video analytics) applications. Check the NGC website referenced above for the most up-to-date list of containerized software supported by NGC.

3. T4 NGC-Ready Platform Specification

This chapter provides the system configuration requirements for an NGC-Ready server using T4 GPUs. NVIDIA has defined three different platform configurations for a T4 NGC-Ready server:

  • Good – Server configuration that meets minimum requirements with limited scale-out support
  • Better – Server configuration that meets minimum requirements with adequate scale-out support
  • Best – Recommended server configuration for optimal performance and scale-out support

Table 1. T4 NGC-Ready Platform Specification

Platform Element | Good | Better | Best
GPU | Four T4 GPUs (two GPUs per socket) | Four T4 GPUs (two GPUs per socket) | Four T4 GPUs (two GPUs per socket)
CPU | Dual-socket Xeon Gold (Skylake or Cascade Lake) | Dual-socket Xeon Gold (Skylake or Cascade Lake) | Dual-socket Xeon Gold (Skylake or Cascade Lake)
CPU cores | 16 cores per CPU | 16 cores per CPU | 18 or more cores per CPU
CPU speed | 2.1 GHz base clock (minimum) | 2.1 GHz base clock (minimum) | 2.1 GHz base clock (minimum)
System memory | 192 GB per socket / 6 DRAM channels / two DIMMs per channel | 192 GB per socket / 6 DRAM channels / two DIMMs per channel | 192 GB per socket / 6 DRAM channels / two DIMMs per channel
Networking | One 10 Gbit NIC per server | One 25 Gbit NIC per socket | One 50 Gbit NIC per socket
Storage | One NVMe drive per system (either socket) | Two NVMe drives per system (one per socket) | Two NVMe drives per system (one per socket)

Note:
  1. All GPUs should be connected to the host via PCIe Gen3 x16.
  2. For best RDMA performance between the GPU and NIC, choose a NIC that supports NVIDIA® GPUDirect™, such as the Mellanox ConnectX-4 EN.
  3. Best NIC performance is achieved with a PCIe x16 Gen3 connection, but if PCIe lanes are limited, a x8 Gen3 connection is sufficient.
  4. NVMe connections to PCIe can be x4 Gen3.
Note: See the system diagrams in Chapter 4 for the system topology details.
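
For readers who want to sanity-check a proposed build against these tiers, the following Python sketch encodes Table 1 as data and reports the highest tier a configuration satisfies. It is illustrative only; the ServerConfig fields and highest_tier helper are hypothetical names used in this sketch, not part of any NVIDIA tool.

    # Illustrative sketch: encode the Table 1 tiers as data and report the
    # highest tier a candidate server configuration satisfies. All names
    # here are hypothetical; this is not an NVIDIA validation tool.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ServerConfig:
        gpus: int                  # total T4 GPUs (Table 1 specifies four)
        cores_per_cpu: int         # physical cores per socket
        base_clock_ghz: float      # CPU base clock
        memory_gb_per_socket: int  # installed DRAM per socket
        nic_gbit: int              # bandwidth of each NIC
        nic_count: int             # NICs in the system
        nvme_count: int            # NVMe drives in the system

    # Per-tier requirements transcribed from Table 1.
    TIERS = {
        "Best":   dict(cores=18, nic_gbit=50, nics=2, nvme=2),
        "Better": dict(cores=16, nic_gbit=25, nics=2, nvme=2),
        "Good":   dict(cores=16, nic_gbit=10, nics=1, nvme=1),
    }

    def highest_tier(cfg: ServerConfig) -> Optional[str]:
        # Requirements common to all three tiers.
        if cfg.gpus != 4 or cfg.base_clock_ghz < 2.1 or cfg.memory_gb_per_socket < 192:
            return None
        for name, t in TIERS.items():  # checked best-first; dict keeps order
            if (cfg.cores_per_cpu >= t["cores"] and cfg.nic_gbit >= t["nic_gbit"]
                    and cfg.nic_count >= t["nics"] and cfg.nvme_count >= t["nvme"]):
                return name
        return None

    print(highest_tier(ServerConfig(4, 20, 2.3, 192, 50, 2, 2)))  # -> Best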

4. System Topologies

The recommended system topologies for a T4 NGC-Ready server are shown in the following figures.

Figure 1 shows the “Best” NGC-Ready configuration, with full multi-node support using 50 Gbit NICs and maximum storage.

  • Dual-socket Xeon Gold with 192 GB system memory per socket.
  • Two T4 GPUs per CPU socket.
  • One 50 Gbit NIC per CPU socket.
    • Note that if a x8 PCIe connection is used for the NIC, then a PCIe switch is not needed, and the GPUs and NICs can be connected directly to the CPU. Using a x16 PCIe connection to each NIC may increase performance but will require a PCIe switch (not shown) due to a limitation in the number of PCIe lanes supported by the Xeon CPU.
    • For best NIC performance, choose a NIC that supports GPUDirect, such as the Mellanox ConnectX-4 EN.
  • One NVMe drive per socket.
Figure 1. T4 NGC-Ready Server Topology – Best

Figure 2 shows the “Better” NGC-Ready configuration with adequate multi-node support using 25 Gbit NICs and maximum storage.

  • Dual-socket Xeon Gold with 192 GB system memory per socket.
  • Two T4 GPUs per CPU socket.
  • One 25 Gbit NIC per CPU socket.
    • For best NIC performance, choose a NIC that supports GPUDirect, such as the Mellanox ConnectX-4 EN.
  • One NVMe drive per socket.
Figure 2. T4 NGC-Ready Server Topology – Better

Figure 3 shows the “Good” NGC-Ready configuration with minimal multi-node support and maximum storage. Because multi-node NIC support is minimal, a server with this configuration may have performance issues when running larger DL and RAPIDS workloads.

  • Dual-socket Xeon Gold with 192 GB system memory per socket.
  • Two T4 GPUs per CPU socket. Note that while a x16 PCIe connection to each GPU is preferred, some workloads can operate with acceptable performance when using a x8 PCIe connection to each T4 GPU.
  • One 10 Gbit NIC per server.
  • One NVMe drive per socket.
Figure 3. T4 NGC-Ready Server Topology – Good
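
One way to confirm that an assembled server matches these topologies is to inspect the PCIe connectivity matrix that the NVIDIA driver reports. The minimal sketch below simply shells out to nvidia-smi topo -m; in its output, PIX indicates devices sharing a PCIe switch, PHB a connection through the CPU host bridge, and SYS a path that crosses the socket interconnect.

    # Minimal sketch: print the GPU/NIC connectivity matrix reported by
    # the NVIDIA driver. Interpreting the matrix (PIX, PHB, SYS, ...) is
    # left to the reader; see the nvidia-smi documentation for the legend.
    import subprocess

    def show_topology() -> None:
        result = subprocess.run(["nvidia-smi", "topo", "-m"],
                                capture_output=True, text=True, check=True)
        print(result.stdout)

    if __name__ == "__main__":
        show_topology()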

5. Design Discussion

5.1. PCI Express Interface

Most NGC-Ready platforms should be able to support direct x16 Gen3 PCIe connections from each CPU socket to the two T4 GPUs for best performance.

  • A x16 Gen3 connection to each T4 GPU is recommended for optimal performance. Use of a x8 PCIe connection to the T4 GPU may result in performance loss for some workloads; further analysis may be needed to quantify the degree of degradation.
  • A x8 PCIe Gen3 connection to each 50 Gbit NIC is a minimum requirement for the Best NGC-Ready server configuration. Using a x16 PCIe connection may result in better performance, but it would require use of a PCIe switch to accommodate all the PCIe devices (two GPUs, one NIC, and one NVMe drive per CPU).
  • If there are insufficient PCIe lanes to support direct connections to the GPUs and NICs, then a PCIe switch should be used. In this case, the GPUs and NICs should be located downstream of a common PCIe switch for optimal P2P and RDMA performance.
  • The NVMe drives can connect to the host via a x4 Gen3 link. (A sketch for verifying the negotiated links follows this list.)
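
On a Linux system, the negotiated link speed and width of each NVIDIA device can be read directly from sysfs, which is enough to confirm the x16 Gen3 connections called for above. A minimal sketch, assuming standard sysfs paths and the NVIDIA PCI vendor ID (0x10de):

    # Sketch (Linux only): report the negotiated PCIe link for each NVIDIA
    # device so the x16 Gen3 recommendation can be verified. Uses standard
    # sysfs attributes; Gen3 negotiates 8 GT/s per lane.
    from pathlib import Path

    NVIDIA_VENDOR_ID = "0x10de"

    def check_links() -> None:
        for dev in Path("/sys/bus/pci/devices").iterdir():
            try:
                if (dev / "vendor").read_text().strip() != NVIDIA_VENDOR_ID:
                    continue
                speed = (dev / "current_link_speed").read_text().strip()
                width = (dev / "current_link_width").read_text().strip()
            except OSError:
                continue  # attribute missing on this device; skip it
            print(f"{dev.name}: {speed}, x{width}")

    if __name__ == "__main__":
        check_links()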

5.2. CPU and System Memory

Currently, NVIDIA specifies the T4 NGC-Ready platform only with Intel Xeon CPUs. A single-CPU server configuration (for example, with AMD Rome) may be released at a future date once more testing has been completed.

  • For Best performance, a minimum of 18 CPU cores for every two T4 GPUs is preferred. For Better or Good performance, 16 CPU cores for every two T4 GPUs is the minimum requirement. Xeon Gold CPUs can meet this requirement.
  • The minimum system memory for NGC-Ready workloads is 192 GB when using four T4 GPUs.
  • NVIDIA’s minimum system memory configuration is 192 GB per socket, because of the target application requirements and because memory bandwidth is maximized with all six DRAM channels populated. (A worked example of the resulting DIMM population follows this list.)
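
As a worked example of the memory population (plain arithmetic, not an NVIDIA sizing tool): six channels with two DIMMs each gives twelve DIMMs per socket, so the 192 GB per-socket target works out to 16 GB DIMMs.

    # Worked arithmetic for the recommended memory population: 192 GB per
    # socket across 6 DRAM channels with 2 DIMMs per channel implies
    # twelve 16 GB DIMMs per socket.
    GB_PER_SOCKET = 192
    CHANNELS = 6
    DIMMS_PER_CHANNEL = 2

    dimms = CHANNELS * DIMMS_PER_CHANNEL   # 12 DIMMs per socket
    dimm_gb = GB_PER_SOCKET // dimms       # 16 GB per DIMM
    print(f"{dimms} DIMMs of {dimm_gb} GB each per socket")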

5.3. Network Interface

In general, the preferred NIC-to-GPU ratio is roughly 33 Gbit of network bandwidth for every pair of T4 GPUs. Rounded up to a standard NIC speed, this results in one 50 Gbit NIC per socket for the optimal (Best) system configuration. For the “Better” system configuration, one 25 Gbit NIC per socket is adequate, but multi-node workload performance may be impacted. For the “Good” system configuration, one 10 Gbit NIC per server is the minimum.
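
A minimal sketch of that sizing rule, using only the NIC speeds discussed in this guide (the nic_for helper and the rounding approach are this sketch's own, not an NVIDIA formula):

    # Sketch: pick the smallest NIC speed discussed in this guide that
    # meets the rule of thumb of roughly 33 Gbit of network bandwidth per
    # pair of T4 GPUs. Hypothetical helper for illustration only.
    NIC_SPEEDS_GBIT = [10, 25, 50]   # NIC speeds named in this guide
    GBIT_PER_GPU_PAIR = 33

    def nic_for(gpu_pairs_per_socket: int) -> int:
        needed = GBIT_PER_GPU_PAIR * gpu_pairs_per_socket
        return next(s for s in NIC_SPEEDS_GBIT if s >= needed)

    print(nic_for(1))  # one pair of T4s per socket -> 50 Gbit NIC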

For best throughput between the NICs and GPUs, choose a NIC that supports GPUDirect, such as the Mellanox ConnectX-4 EN. If the server design uses PCIe switches to support its topology, the NIC and GPUs should be located under the same PCIe switch for best performance.

5.4. Storage

One NVMe SSD for each socket, connected via x4 Gen3 PCIe, is the best option for an NGC-Ready server using T4. One NVMe drive per server is sufficient, but performance may be impacted for some workloads. Consult NVIDIA directly if the use of non-NVMe SSD drives is desired.
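
For context, the usable bandwidth of the recommended x4 Gen3 link can be worked out from PCIe Gen3's 8 GT/s per-lane rate and 128b/130b encoding (back-of-the-envelope arithmetic; real drive throughput will be lower):

    # Back-of-the-envelope arithmetic: payload bandwidth of a x4 Gen3
    # link. Gen3 signals at 8 GT/s per lane with 128b/130b encoding,
    # i.e. about 0.985 GB/s of payload per lane in each direction.
    GT_PER_S = 8.0
    ENCODING = 128 / 130   # 128b/130b line coding overhead
    LANES = 4

    gbit_per_lane = GT_PER_S * ENCODING         # ~7.88 Gbit/s payload
    gbyte_total = gbit_per_lane * LANES / 8     # ~3.94 GB/s per direction
    print(f"x{LANES} Gen3: ~{gbyte_total:.2f} GB/s each direction")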

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries.

Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries.

Other company and product names may be trademarks of the respective companies with which they are associated.