Abstract

This NVIDIA Enterprise Reference Architecture (RA) is a practical design guide for an NVIDIA HGX AI Factory. It is based on a 2-8-9-800 infrastructure configuration (2 CPUs, 8 GPUs, and 9 NICs, with 800 Gb/s of network bandwidth per GPU) built on NVIDIA HGX™ B300 servers, each featuring eight NVIDIA Blackwell Ultra GPUs connected via fifth-generation NVLink with 14.4 TB/s of total interconnect bandwidth. The design also incorporates NVIDIA ConnectX-8 SuperNICs, NVIDIA BlueField®-3 DPUs, and NVIDIA Spectrum-X™ Ethernet networking at the 32-, 64-, and 128-node design points.

The NVIDIA HGX AI Factory is designed to deliver industry-leading performance in an air-cooled form factor for enterprise AI inference, AI training and fine-tuning, and high-performance computing (HPC) workloads.

This document outlines the hardware components that define this scalable and modular architecture, including guidance on scalable unit (SU) design and the specifics of the Ethernet fabric topologies.