NVIDIA-Certified Systems Configuration Guide#
NVIDIA-Certified Systems configuration is a methodology for configuring system hardware for optimal GPU-accelerated workload performance.
Introduction#
This document provides guidelines for configuring NVIDIA-Certified Systems to run various GPU-accelerated computing workloads in production environments. These recommendations serve as a starting point for addressing workload-specific needs.
Optimal PCIe server configurations depend on the target workloads (or applications) for each server and will vary on a case-by-case basis. GPU servers are commonly configured to execute the following types of applications or target workloads:
Large Language Models (LLM)
Natural Language Recognition (NLR)
Omniverse applications
Inference and Intelligent Video Analytics (IVA)
Deep learning (DL) training / AI
High-performance computing (HPC)
Cloud gaming
Rendering and virtual workstation
Virtual Desktop Infrastructure (VDI)
Virtual workstation (vWS)
Transcoding
It’s important to note that NVIDIA-Certified Systems testing by NVIDIA partners requires a standardized setup to simplify performance measurements.
The following sections outline architectural considerations for solutions discussed between sales teams and end customers, and they serve as conversation starters for deeper dives into specific configurations and designs.
Application Considerations—Workload Sizing#
You can configure a GPU server to execute many different workloads. The size of your application workload, datasets, models, and specific use case will impact your hardware selections and deployment considerations. This guide provides an overview of options as a starting point. Additional cluster sizing overviews are available in the NVIDIA AI Enterprise Sizing Guide. Please discuss specific workload requirements with your provider to ensure your solution will meet your business needs.
GPU Scalability Considerations#
Enterprise hardware can be configured to meet the specific requirements of your AI application, with multiple customization options available.
Single GPU: An application or workload has access to the entire GPU.
Multi-Instance GPU (MIG) Partitioning: Certain GPUs can run as a single unit or be partitioned into multiple GPU instances to support multiple workloads in parallel.
Multiple GPUs: Multiple GPUs within a single server. These GPUs can be MIG capable, shared across multiple workloads, or dedicated to a high-performance computing workload within that server. A short enumeration sketch follows the lists below.
Single Node Workloads
Clustered Workloads
NVIDIA Enterprise Reference Architectures (Enterprise RAs)
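As an illustration of these scalability options, the following minimal sketch, assuming the NVIDIA driver and the `nvidia-ml-py` (pynvml) bindings are installed, enumerates the GPUs in a node and reports whether MIG is enabled on each. It is a starting point for inspection, not a definitive tool.

```python
# Minimal sketch: enumerate GPUs and report MIG mode with the NVML Python
# bindings (pip install nvidia-ml-py). Assumes the NVIDIA driver is installed.
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)      # bytes on older bindings, str on newer
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        try:
            current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            mig = "enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
        except pynvml.NVMLError:
            mig = "not supported"                    # e.g. GPUs without MIG capability
        print(f"GPU {i}: {name}, {mem.total // 1024**3} GiB, MIG {mig}")
finally:
    pynvml.nvmlShutdown()
```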
Single Node Workloads#
Single-node workloads are a deployment pattern in which all resources for a workload are allocated within a single server or workstation. This can mean training or inferencing on a single server, using the entire system, or partitioning the GPU to run multiple applications within the same node. Resources can sometimes be upgraded by adding GPUs, CPUs, or memory within that server, but these solutions typically do not scale to the cluster level. Single-node deployments typically do not require high-speed networking to connect multiple nodes for the AI workload, but they may require it for connecting to other applications.
Clustered Workloads#
Workload clustering is an application deployment pattern in which additional resources are allocated across multiple servers. Multiple nodes are connected with high-speed networking (either InfiniBand or RoCE) or via NVLink and NVSwitch, allowing the workload to spread across the nodes in a cluster. Much as an application workload can process across a single GPU, a MIG partition, or multiple GPUs in a single server, it can also process across multiple GPUs on multiple servers, at multiple locations, to run the most complex high-performance workloads.
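To make the pattern concrete, here is a minimal, hypothetical sketch of a clustered training job using PyTorch DistributedDataParallel over the NCCL backend, which communicates between nodes across InfiniBand or RoCE fabrics. The model, data, and launch endpoint are placeholders; a real workload would substitute its own.

```python
# Minimal sketch of a clustered (multi-node) training setup using PyTorch
# DistributedDataParallel over the NCCL backend. Launch on every node with
# torchrun, for example:
#   torchrun --nnodes=2 --nproc-per-node=8 --rdzv-backend=c10d \
#            --rdzv-endpoint=<head-node>:29500 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])        # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(10):                               # placeholder training loop
        x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
        loss = model(x).square().mean()
        loss.backward()                               # gradients all-reduced across nodes
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```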
Enterprise Reference Architectures#
NVIDIA Enterprise Reference Architectures (Enterprise RAs) are tailored for enterprise-class deployments, ranging from 32 to 1024 GPUs. Depending on the base technology, they include configurations for 4 up to 128 nodes, complete with the appropriate networking topology, switching, and allocations for storage and control plane nodes. Each reference architecture is designed around an NVIDIA-Certified server that follows a prescriptive design pattern, called a Reference Configuration, to ensure optimal performance when deployed in a cluster. Refer to NVIDIA Enterprise Reference Architecture Whitepaper for more details.
Deployment Considerations#
AI workloads can be deployed in multiple locations depending on the business requirements for the application and use case. Your specific use case will help guide your hardware needs. The following sections describe example locations.
**Data Center**: Data Centers (DC) encompass the standard IT infrastructure location. These typically include servers deployed in racks, with Top-of-Rack (TOR) switches connecting multiple racks within a row. Rows are laid out with hot and cold aisles to service the hardware.
**Edge (Enterprise Edge / Industrial Edge)**: Enterprise Edge locations cover non-standard data center locations and include the remote management capabilities found in standard data centers. Often, the same servers can be found in both standard data centers and edge locations. These systems are usually based on traditional enterprise servers that have been adapted for use in edge applications, and they are typically intended for temperature-controlled environments.
Industrial Edge locations cover applications where standard DC management capabilities traditionally do not exist, such as factory floors or cell phone towers. Systems deployed to industrial locations tend to undergo more rigorous thermal, shock, and vibration testing to handle conditions that standard servers in a data center would not tolerate. These systems are ruggedized industrial PCs or other specialized devices deployed on-premises or in vehicles, specifically designed for the environments in which they are used.
**Workstations (Desktop Workstations / Mobile Workstations)**: Desktop Workstations are tower-based systems designed for limited mobility. Mobile Workstations are typically laptop-based systems designed for portability.
**VDI (Virtual Desktop Infrastructure)**: Virtual Desktop Infrastructure (VDI) allows the creation of virtual desktops hosted on centralized servers, often located in a data center. With NVIDIA’s vGPU (Virtual GPU) technology, VDI enables the efficient delivery of high-performance graphics to virtual desktops.
Refer to Example VDI Deployment Configurations and NVIDIA Virtual GPU (vGPU) Software for more information.
Security Considerations#
Security becomes paramount as your accelerated workloads scale beyond the traditional data center. Specific security recommendations are beyond the scope of this guide, but the following features are validated as part of the certification process:
**Trusted Platform Module (TPM)**: NVIDIA-Certified systems are tested for TPM 2.0 modules. TPM is an international security standard that enables platform integrity, disk encryption, and system identification and attestation.
**Unified Extensible Firmware Interface (UEFI)**: UEFI is a public specification that replaces the legacy Basic Input/Output System (BIOS) boot firmware. NVIDIA-Certified systems are tested for UEFI bootloader compatibility.
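As a quick illustration on Linux systems, the following sketch checks whether the platform booted via UEFI and whether a TPM device is exposed by the kernel, using standard sysfs and /dev paths. It only detects presence; it does not perform attestation or certification-level validation.

```python
# Minimal sketch (Linux): check for UEFI boot and a TPM device using standard
# sysfs/dev paths. Detection only; no secure boot configuration or attestation.
from pathlib import Path

def booted_with_uefi() -> bool:
    # /sys/firmware/efi exists only when the kernel was booted via UEFI.
    return Path("/sys/firmware/efi").is_dir()

def tpm_present() -> bool:
    # A kernel-exposed TPM appears as /dev/tpm0 (and /dev/tpmrm0 for TPM 2.0).
    return Path("/dev/tpm0").exists() or Path("/dev/tpmrm0").exists()

if __name__ == "__main__":
    print("UEFI boot:", booted_with_uefi())
    print("TPM device present:", tpm_present())
```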
Thermal Considerations#
NVIDIA-Certified Systems are qualified and tested to run workloads within the OEM manufacturer’s temperature and airflow specifications.
Industrial-certified systems are tested at the OEM’s maximum supported temperature.
Component temperature can impact workload performance and is in turn affected by environmental conditions, airflow, and hardware selection. Consider these variables when building a solution to ensure optimal performance.
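For monitoring these variables under load, a minimal sketch using the `nvidia-ml-py` (pynvml) bindings to read GPU core temperatures is shown below; it assumes the NVIDIA driver is installed.

```python
# Minimal sketch: read GPU core temperatures with the NVML Python bindings
# (pip install nvidia-ml-py) so thermal behavior can be observed under load.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU {i}: {temp} C")
finally:
    pynvml.nvmlShutdown()
```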
Configurations#
Inference System#
Inference application performance is greatly accelerated by NVIDIA GPUs and NVIDIA inference microservices such as NVIDIA NIM™. Inference workloads include:
Large Language Model Inference
Natural Language Recognition (NLR)
Omniverse applications
DeepStream – GPU-accelerated Intelligent Video Analytics (IVA)
NVIDIA® TensorRT™ and Triton – inference software with GPU acceleration (see the client sketch after this list)
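The sketch below illustrates a single HTTP inference request against a Triton Inference Server using the `tritonclient` Python package. The model name and tensor names (`my_model`, `INPUT`, `OUTPUT`) are placeholders chosen for illustration, not part of any shipped model.

```python
# Minimal sketch of an inference request against a Triton Inference Server
# over HTTP (pip install tritonclient[http]). Model and tensor names below
# are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)      # placeholder input
infer_input = httpclient.InferInput("INPUT", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(model_name="my_model", inputs=[infer_input])
print(response.as_numpy("OUTPUT").shape)
```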
A GPU server designed for executing inference workloads can be deployed at the edge or in the data center. Each server location has its own set of environmental and compliance requirements. For example, an edge server may require NEBS compliance with more stringent thermal and mechanical requirements.
Table 1 provides the system configuration recommendations for an inference server using NVIDIA GPUs. Discuss specific use cases for your application workload with your integration partner. Large Language Model workloads should target the higher-end specifications; Omniverse and visualization applications require L40S or L40 GPUs.
| Parameter | Inference Server Recommendations |
|---|---|
| NVIDIA GPU models | Refer to NVIDIA Data Center GPUs for more details. |
| GPU count | 2x / 4x / 8x GPUs per server for a balanced configuration. GPUs should typically be balanced across CPU sockets and root ports. See topology diagrams for details. |
| CPU | PCIe Gen5 (or later when available) capable CPUs are recommended, such as NVIDIA Grace, Intel Xeon Scalable processors (Emerald Rapids), or AMD Turin. CPU sockets: minimum 2. CPU speed: minimum 2.1 GHz base clock. CPU cores: minimum 6 physical CPU cores per GPU. |
| System Memory | Minimum 2x the total GPU memory, spread evenly across all CPU sockets and memory channels. Populating all memory slots can increase bandwidth. |
| PCI Express | Insert NVIDIA GPUs into PCIe slots whose speed and lane count match the GPU specification for optimal performance. Refer to the GPU specification for its PCIe interface requirements. |
| PCIe Topology | For a balanced PCIe architecture, distribute GPUs evenly across CPU sockets and PCIe root ports. Place NICs and NVMe drives under the same PCIe switch or root complex as the GPUs. A PCIe switch may be optional for cost-effective inference servers. |
| PCIe Switches | PCIe Gen5 switches are recommended. |
| Network Adapter (NIC) | Minimum 200 Gbps for multi-node inference; up to 400 Gbps per GPU. |
| Storage | For local storage, one NVMe drive per CPU socket is recommended. Minimum 1 TB. |
| Remote Systems Management | Redfish 1.0 (or later) compatible. |
| Security Key Management | TPM 2.0 module (secure boot). |
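As a worked example of the sizing rules in the table above, consider a hypothetical 8-GPU server in which each GPU has 80 GB of memory; the numbers are illustrative only.

```python
# Worked example of the sizing rules above (illustrative numbers only):
# an 8-GPU server where each GPU has 80 GB of memory.
gpus = 8
gpu_memory_gb = 80                          # per-GPU memory (illustrative)

total_gpu_memory = gpus * gpu_memory_gb     # 640 GB of total GPU memory
min_system_memory = 2 * total_gpu_memory    # >= 1280 GB, spread across sockets/channels
min_cpu_cores = 6 * gpus                    # >= 48 physical CPU cores

print(f"Minimum system memory: {min_system_memory} GB")
print(f"Minimum physical CPU cores: {min_cpu_cores}")
```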
Deep Learning Training System#
Deep Learning (DL) training application performance is greatly accelerated by NVIDIA GPUs and NVIDIA training microservices such as NVIDIA NeMo™. Training workloads include:
NVIDIA TensorRT-LLM (Large Language Model Training)
Recommender Training
Natural Language Processing Training
Computer Vision Training
GPU servers optimized for training workloads are usually located in data centers. Each data center or Cloud Service Provider (CSP) may have their own environmental and compliance standards, but these tend to be less strict than the requirements for NEBS or edge servers.
Table 2 provides the system configuration recommendations for a DL training server using NVIDIA GPUs.
| Parameter | Deep Learning Server Recommendations |
|---|---|
| NVIDIA GPU models | Refer to NVIDIA Data Center GPUs for more details. |
| GPU Configuration | 2x / 4x / 8x GPUs per server. GPUs should be balanced across CPU sockets and root ports. See topology diagrams for details. |
| CPU | PCIe Gen5 (or later when available) capable CPUs are recommended, such as NVIDIA Grace, Intel Xeon Scalable processors (Emerald Rapids), or AMD Turin. CPU sockets: minimum 2. CPU speed: minimum 2.1 GHz base clock. CPU cores: minimum 6 physical CPU cores per GPU. |
| System Memory | Minimum 2x the total GPU memory, spread evenly across all CPU sockets and memory channels. Populating all memory slots can increase bandwidth. |
| PCI Express | Insert NVIDIA GPUs into PCIe slots whose speed and lane count match the GPU specification for optimal performance. Refer to the GPU specification for its PCIe interface requirements. |
| PCIe Topology | Balanced PCIe topology with GPUs spread evenly across CPU sockets and PCIe root ports. NICs and NVMe drives should be under the same PCIe switch or PCIe root complex as the GPUs. See topology diagrams for details. |
| PCIe Switches | PCIe Gen5 switches are recommended. |
| Network Adapter (NIC) | Minimum 200 Gbps for multi-node training; up to 400 Gbps per GPU. |
| Storage | For local storage, one NVMe drive per CPU socket is recommended. Minimum 1 TB. |
| Remote Systems Management | Redfish 1.0 (or later) compatible. |
| Security Key Management | TPM 2.0 module (secure boot). |
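Both tables call for Redfish-compatible remote systems management. The following sketch queries the standard Redfish service root and systems collection over HTTPS; the BMC address and credentials are placeholders, and disabling certificate verification is for lab use with self-signed certificates only.

```python
# Minimal sketch: query a server's Redfish service over HTTPS via the standard
# service root (/redfish/v1/). BMC address and credentials are placeholders.
import requests

BMC = "https://192.0.2.10"                 # placeholder BMC address
AUTH = ("admin", "password")               # placeholder credentials

root = requests.get(f"{BMC}/redfish/v1/", auth=AUTH, verify=False, timeout=10).json()
print("Redfish version:", root.get("RedfishVersion"))

systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=False, timeout=10).json()
for member in systems.get("Members", []):
    print("System resource:", member.get("@odata.id"))
```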
Inference and Deep Learning Training Topology Diagrams#
This section shows the system configurations that correspond to those outlined in Table 1 and Table 2 for inference and DL training servers, starting from the simplest configuration and progressing to the most complex.
Note that, depending on the number of PCIe lanes available from the CPU, a server with one or two GPUs per socket may not require a PCIe switch.
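To confirm that GPUs and NICs land under the intended sockets, root ports, and PCIe switches shown in the figures below, the connectivity matrix reported by the driver can be inspected, for example with the small sketch below, which simply wraps the `nvidia-smi topo -m` command.

```python
# Minimal sketch: print the GPU/NIC PCIe connectivity matrix reported by the
# NVIDIA driver, which helps confirm that GPUs and NICs sit under the intended
# sockets, root ports, and PCIe switches.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "topo", "-m"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```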

Figure 1. 2P Server with Two GPUs#

Figure 2. 2P Server with Four GPUs#

Figure 3. 2P Server with Eight GPUs and PCIe Switch#