Core Components

The compute nodes, their host channel adapters (HCAs), and the switch resources form the foundation of the DGX BasePOD. The specific components used in the DGX BasePOD Reference Architectures are described in this section.

NVIDIA DGX Systems

NVIDIA DGX BasePOD configurations use DGX B200, DGX H200, and DGX H100 systems. The systems are described in the following sections.

NVIDIA DGX B200 System

The NVIDIA DGX B200 system (Figure 4) offers unprecedented compute density, performance, and flexibility.

Figure 4. DGX B200 system

Key specifications of the DGX B200 system are:

  • Built with eight NVIDIA B200 GPUs

  • 1.4 TB of total GPU memory

  • 4x OSFP ports serving 8x single-port NVIDIA ConnectX-7 VPI, up to 400 Gb/s InfiniBand/Ethernet

  • 2x dual-port QSFP112 NVIDIA BlueField-3 DPU, up to 400 Gb/s InfiniBand/Ethernet

  • Dual 5th generation Intel® Xeon® Scalable Processors

The rear ports of the DGX B200 CPU tray are shown in Figure 5.

Figure 5. DGX B200 CPU tray rear ports

The four ConnectX-7 OSFP ports are used for the compute fabric. The two dual-port BlueField-3 adapters (in NIC mode) provide parallel pathways to the storage and management fabrics. The out-of-band (OOB) port is used for BMC access.
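
The fabric assignments can be verified from the DGX OS. The following Python sketch is a minimal example, assuming the ibstat utility from the standard InfiniBand diagnostics tools is installed; the parsing follows a typical CA/State/Rate output layout and may need adjustment for other driver releases.

    # Minimal sketch: list local adapters with their port state and link rate.
    # Assumes the ibstat utility (infiniband-diags / MLNX_OFED / DOCA tools)
    # is installed; adjust the parsing if the output format differs.
    import re
    import subprocess

    def summarize_adapters() -> None:
        output = subprocess.run(
            ["ibstat"], capture_output=True, text=True, check=True
        ).stdout
        current_ca = None
        for raw in output.splitlines():
            line = raw.strip()
            ca = re.match(r"CA '(\S+)'", line)
            if ca:
                current_ca = ca.group(1)  # e.g. mlx5_0
            elif line.startswith("State:"):
                print(f"{current_ca}: state={line.split(':', 1)[1].strip()}", end="")
            elif line.startswith("Rate:"):
                print(f", rate={line.split(':', 1)[1].strip()} Gb/s")

    if __name__ == "__main__":
        summarize_adapters()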

NVIDIA DGX H200 and H100 Systems

The DGX H200 and H100 systems (Figure 6) are AI powerhouses accelerated by the groundbreaking performance of the NVIDIA Hopper GPU.

Figure 6. DGX H200 and H100 system

Key specifications of the DGX H200 and H100 systems are listed below; the GPU memory totals are worked through in the short sketch after the list:

  • Eight NVIDIA Hopper GPUs.

  • 1,128 GB total GPU memory for H200.

  • 640 GB total GPU memory for H100.

  • Four NVIDIA NVSwitch™ chips.

  • Dual Intel® Xeon® Platinum 8480C processors, 112 cores total, 2.00 GHz (Base), 3.80 GHz (Max Boost) with PCIe 5.0 support.

  • 2 TB of DDR5 system memory.

  • Four OSFP ports serving eight single-port NVIDIA ConnectX-7 VPI and two dual-port QSFP112 NVIDIA ConnectX-7 VPI, up to 400 Gb/s InfiniBand/Ethernet.

  • 10 Gb/s onboard NIC with RJ45, 100 Gb/s Ethernet NIC, BMC with RJ45.

  • Two 1.92 TB M.2 NVMe drives for DGX OS, eight 3.84 TB U.2 NVMe drives for storage/cache.
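
The total GPU memory figures quoted for the DGX B200, H200, and H100 systems follow directly from the per-GPU HBM capacities (eight GPUs per system). The short Python sketch below works through the arithmetic; the per-GPU values are the published capacities for each GPU.

    # Worked arithmetic for the total GPU memory figures quoted above.
    GPUS_PER_SYSTEM = 8
    PER_GPU_MEMORY_GB = {
        "DGX B200": 180,  # B200 HBM3e -> 1,440 GB total (~1.4 TB)
        "DGX H200": 141,  # H200 HBM3e -> 1,128 GB total
        "DGX H100": 80,   # H100 HBM3  -> 640 GB total
    }

    for system, per_gpu_gb in PER_GPU_MEMORY_GB.items():
        total_gb = GPUS_PER_SYSTEM * per_gpu_gb
        print(f"{system}: {GPUS_PER_SYSTEM} x {per_gpu_gb} GB = {total_gb} GB")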

The rear ports of the DGX H200 and H100 CPU tray are shown in Figure 7.

Figure 7. DGX H200 and H100 CPU tray rear ports

Four of the OSFP ports serve eight ConnectX-7 HCAs for the compute fabric. Each pair of dual-port ConnectX-7 HCAs provides parallel pathways to the storage and management fabrics. The OOB port is used for BMC access.
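
As a back-of-the-envelope illustration of the compute fabric capacity this provides, the sketch below totals the eight ConnectX-7 compute links of a single system, assuming every link runs at NDR (400 Gb/s); the result is per direction and ignores protocol overhead.

    # Aggregate compute-fabric bandwidth per DGX system, assuming all eight
    # ConnectX-7 compute links run at NDR (400 Gb/s each), per direction.
    COMPUTE_LINKS_PER_SYSTEM = 8
    NDR_LINK_RATE_GBPS = 400

    total_gbps = COMPUTE_LINKS_PER_SYSTEM * NDR_LINK_RATE_GBPS
    total_gbytes_per_s = total_gbps / 8  # gigabits -> gigabytes

    print(f"{total_gbps} Gb/s aggregate (~{total_gbytes_per_s:.0f} GB/s per direction)")
    # -> 3200 Gb/s aggregate (~400 GB/s per direction)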

NVIDIA Networking Adapters

NVIDIA DGX B200, H200, and H100 systems are equipped with NVIDIA® ConnectX®-7 network adapters; the DGX B200 also has NVIDIA BlueField-3 network adapters. The network adapters are described in this section.

Note

Going forward, HCA will refer to network adapter cards configured for InfiniBand and NIC for those configured for Ethernet.

The ConnectX-7 VPI adapter (Figure 8) is the latest generation of the ConnectX adapter line and can provide 25/50/100/200/400 Gb/s of throughput. NVIDIA DGX systems use ConnectX-7 and BlueField-3 (NIC mode) adapters to provide flexibility in DGX BasePOD deployments with NDR200, NDR400, and RoCE. Specifications are available here.

Figure 8. NVIDIA ConnectX-7 HCA

Figure 9. NVIDIA BlueField-3 HCA

NVIDIA Networking Switches

DGX BasePOD configurations can be equipped with the NVIDIA networking switches described in this section; how the switches are deployed is covered in the Reference Architectures section.

NVIDIA QM9700 Switch

NVIDIA QM9700 switches (Figure 12) with NDR InfiniBand connectivity power the compute fabric in NDR BasePOD configurations. ConnectX-7 single-port adapters are used for the InfiniBand compute fabric. Each NVIDIA DGX system has dual connections to each QM9700 switch, providing multiple high-bandwidth, low-latency paths between the systems.

Figure 12. NVIDIA QM9700 Switch
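
To illustrate how compute-fabric ports are budgeted on the QM9700, the sketch below counts the ports consumed per switch for a given number of DGX systems. The two-switch count and the even split of the eight compute links are example assumptions rather than a prescribed topology, and inter-switch links are not counted; the actual cabling for each configuration is given in the Reference Architectures section.

    # Illustrative compute-fabric port budget for QM9700 switches. The switch
    # count and even link split are example assumptions, not a prescribed
    # BasePOD topology; inter-switch links are not counted.
    NDR_PORTS_PER_QM9700 = 64   # 400 Gb/s NDR ports per switch
    COMPUTE_LINKS_PER_DGX = 8   # one per ConnectX-7 compute HCA

    def ports_used_per_switch(num_dgx: int, num_switches: int = 2) -> int:
        links_per_switch = COMPUTE_LINKS_PER_DGX // num_switches
        return num_dgx * links_per_switch

    for nodes in (4, 8, 16):
        used = ports_used_per_switch(nodes)
        print(f"{nodes} DGX systems: {used}/{NDR_PORTS_PER_QM9700} ports per switch")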

NVIDIA SN4600C Switch

NVIDIA SN4600C switches (Figure 15) offer 128 total ports (64 per switch) to provide redundant connectivity for in-band management of the DGX BasePOD. The SN4600C switch supports speeds from 1 GbE to 100 GbE.

For storage appliances connected over Ethernet, the SN4600C switches are also used. The ports on the NVIDIA DGX dual-port network adapters are used for both in-band management and storage connectivity.

Figure 15. NVIDIA SN4600C switch

NVIDIA SN2201 Switch

NVIDIA SN2201 switches (Figure 16) offer 48 ports to provide connectivity for OOB management. OOB management provides consolidated management connectivity for all components of the DGX BasePOD.

Figure 16. NVIDIA SN2201 switch

Control Plane

The minimum requirements for each server in the control plane are listed below; a minimal validation sketch follows the list:

  • 2 × Intel x86 Xeon Gold or better

  • 512 GB memory

  • 1 × 6.4 TB NVMe for storage

  • 2 × 480 GB M.2 RAID for OS

  • 4 × 200 Gbps network

  • 2 × 100 GbE network
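
The sketch below shows one way these minimums might be checked against a server inventory record. The field names and the example inventory are hypothetical and are used here only to illustrate the check; this is not an NVIDIA-provided format or tool.

    # Hypothetical check of a candidate control plane server against the
    # minimum requirements listed above; field names are illustrative only.
    CONTROL_PLANE_MINIMUMS = {
        "cpu_sockets": 2,         # Intel x86 Xeon Gold or better
        "memory_gb": 512,
        "nvme_storage_tb": 6.4,   # 1 x 6.4 TB NVMe for storage
        "os_m2_raid_gb": 480,     # 2 x 480 GB M.2 RAID for OS
        "ports_200g": 4,          # 4 x 200 Gbps network
        "ports_100gbe": 2,        # 2 x 100 GbE network
    }

    def shortfalls(server: dict) -> list[str]:
        """Return the requirement keys the server fails to meet."""
        return [
            key for key, minimum in CONTROL_PLANE_MINIMUMS.items()
            if server.get(key, 0) < minimum
        ]

    candidate = {
        "cpu_sockets": 2, "memory_gb": 1024, "nvme_storage_tb": 6.4,
        "os_m2_raid_gb": 480, "ports_200g": 4, "ports_100gbe": 2,
    }
    missing = shortfalls(candidate)
    print("Meets minimums" if not missing else f"Below minimum: {missing}")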