Appendix: Node Configurations#
This appendix provides information on the compute and control plane node configurations.
Compute Nodes#
The starting point and basic building block of this Enterprise RA is the NVIDIA-Certified HGX B300 platform. It is covered in more detail in the Node Hardware section of this document; for completeness, a simplified summary is provided below.
Table 8: NVIDIA HGX B300 system components

| Component | Quantity | Description |
|---|---|---|
| GPUs | 8 on baseboard | Eight NVIDIA B300 SXM GPUs on an NVIDIA B300 baseboard, with GPU memory of up to 2304 GB |
| DPU | 1 | NVIDIA BlueField-3 B3240 DPU with 2x 400G ports and a 1 GbE RJ45 management port |
| SuperNIC | 8 on baseboard | NVIDIA ConnectX-8 SuperNIC in a dual-port 400G configuration |
| CPUs | 2 | Minimum of 32 physical CPU cores per socket; 56 physical CPU cores per socket recommended |
| System Memory | 1 | Minimum of 2 TB system memory |
| Storage | 3 | Inference servers: minimum 1 TB NVMe drive per CPU socket. Training/DL servers: minimum 2 TB NVMe drive per CPU socket. HPC servers: minimum 1 TB NVMe drive per CPU socket. Plus a 1x 1 TB NVMe boot drive |
| BMC/iLO | 1 | 1 GbE RJ45 management port |
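As a quick sanity check of these requirements, the following sketch validates a hypothetical node inventory against the Table 8 minimums. It is illustrative only: the `NodeSpec` fields, the `check_node` helper, and the workload keys are assumptions made for this example, not part of any NVIDIA tool or API.

```python
from dataclasses import dataclass

# Hypothetical node description; field names are illustrative, not an NVIDIA API.
@dataclass
class NodeSpec:
    gpus: int                  # NVIDIA B300 SXM GPUs on the baseboard
    dpus: int                  # NVIDIA BlueField-3 B3240 DPUs
    supernics: int             # NVIDIA ConnectX-8 SuperNICs
    sockets: int               # CPU sockets
    cores_per_socket: int      # physical CPU cores per socket
    system_memory_tb: float    # total system memory in TB
    nvme_tb_per_socket: float  # data NVMe per socket, excluding the boot drive

def check_node(spec: NodeSpec, workload: str = "inference") -> list[str]:
    """Return violations of the Table 8 minimums (empty list = compliant)."""
    # Minimum NVMe capacity per CPU socket depends on the workload class.
    min_nvme_tb = {"inference": 1.0, "training": 2.0, "hpc": 1.0}[workload]
    problems = []
    if spec.gpus != 8:
        problems.append("expected 8 B300 SXM GPUs on the baseboard")
    if spec.dpus < 1:
        problems.append("expected at least 1 BlueField-3 B3240 DPU")
    if spec.supernics != 8:
        problems.append("expected 8 ConnectX-8 SuperNICs on the baseboard")
    if spec.sockets != 2 or spec.cores_per_socket < 32:
        problems.append("expected 2 sockets with >= 32 physical cores each")
    if spec.system_memory_tb < 2.0:
        problems.append("expected >= 2 TB of system memory")
    if spec.nvme_tb_per_socket < min_nvme_tb:
        problems.append(f"expected >= {min_nvme_tb} TB NVMe per socket ({workload})")
    return problems

# A compliant training node: two 56-core sockets, 2 TB RAM, 2 TB NVMe per socket.
print(check_node(NodeSpec(8, 1, 8, 2, 56, 2.0, 2.0), workload="training"))  # []
```

Returning a list of violations rather than a single boolean makes it easy to report every shortfall at once when screening candidate hardware.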
For more information, refer to the NVIDIA-Certified Systems documentation.
Switches and Cables#
Table 9 and Table 10 provide an overview of the switches and cables used in the cluster for a dual-plane configuration. The network adapters of the NVIDIA HGX B300 servers, the storage servers, and the control plane servers are included to clarify the connectivity requirements.
Table 9: Switching with required inter-switch links, transceivers, and cables
| Component | 32 Server Nodes | 64 Server Nodes | 128 Server Nodes |
|---|---|---|---|
| NVIDIA HGX B300 SXM GPUs (8/node, on baseboard) | 256 | 512 | 1024 |
| NVIDIA ConnectX-8 SuperNIC, compute fabric (8/node, on baseboard) | 256 | 512 | 1024 |
| BlueField-3 B3240 DPU, in-band management, customer & storage fabric, and support servers | 32 | 64 | 128 |
| NVIDIA Spectrum-4 SN5600 Ethernet switch, compute core fabric | 12 | 24 | 48 |
| NVIDIA Spectrum-4 SN5600 Ethernet switch, converged core fabric | 12 | 24 | 48 |
| SN2201 leaf switches for OOB management fabric | 4 | 8 | 16 |
| OSFP, 2x400G transceiver used for N/S inter-switch links (ISL) | 128 | 256 | 512 |
| Cable used for N/S inter-switch links (ISL) | 128 | 256 | 512 |
| OSFP, 2x400G transceiver used for E/W inter-switch links (E/W ISL) | 512 | 1024 | 2048 |
| Cable used for E/W inter-switch links (ISL) | 512 | 1024 | 2048 |
| QSFP 100G transceiver used for OOB leaf switches | 8 | 16 | 32 |
| OSFP, 2x400G transceiver used for OOB leaf switches | 2 | 4 | 8 |
| Cable used for OOB leaf switches | 2 | 4 | 8 |
Table 10: End-point connections with required transceivers and cables (dual plane)
| Component | 32 Server Nodes | 64 Server Nodes | 128 Server Nodes |
|---|---|---|---|
| NVIDIA HGX B300 SXM GPUs (8/node, on baseboard) | 256 | 512 | 1024 |
| NVIDIA ConnectX-8 SuperNIC, compute fabric (8/node, on baseboard) | 256 | 512 | 1024 |
| BlueField-3 B3240 DPU, in-band management, customer & storage fabric, and support servers | 32 | 64 | 128 |
| OSFP, 2x400G transceiver used for switch to compute node (DPU) | 32 | 64 | 128 |
| OSFP, 2x400G transceiver used for switch to compute node (SuperNIC) | 256 | 512 | 1024 |
| OSFP, 2x400G transceiver used for SuperNIC to switch | 256 | 512 | 1024 |
| QSFP, 400G transceiver used for DPU to switch | 64 | 128 | 256 |
| Cable for switch to ConnectX-8 SuperNIC | 512 | 1024 | 2048 |
| Cable for switch to B3240 DPU | 64 | 128 | 256 |
| OSFP, 2x400G transceiver used for switch to storage | 4 | 8 | 16 |
| QSFP 100G transceiver used for upstream storage | 32 | 64 | 128 |
| Cable for storage | 8 | 16 | 32 |
| OSFP, 2x400G transceiver used for switch to customer network | 8 | 16 | 32 |
| QSFP 100G transceiver used for customer network | 64 | 128 | 256 |
| Cable for customer network | 16 | 32 | 64 |
| OSFP, 2x400G transceiver used for switch to control-plane B3220 DPUs | 4 | 4 | 4 |
| QSFP, 400G transceiver used for control-plane B3220 DPUs | 16 | 16 | 16 |
| Cable for control plane | 8 | 8 | 8 |
| CAT6 RJ45 cable for 1G OOB fabric | 192 | 384 | 768 |
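Table 10 mixes two scaling behaviors: the compute, storage, and customer-network rows grow linearly with node count, while the control-plane rows stay constant regardless of cluster size. The hedged sketch below encodes that split, with per-node rates read straight from the 32-node column; all names (`PER_NODE`, `FIXED`, `endpoint_bom`) are illustrative assumptions, not NVIDIA tooling.

```python
import math

# Per-node end-point rates from the 32-node column of Table 10 (quantity / 32).
PER_NODE = {
    "OSFP 2x400G, switch to compute node (DPU)": 1,
    "OSFP 2x400G, switch to compute node (SuperNIC)": 8,
    "OSFP 2x400G, SuperNIC to switch": 8,
    "QSFP 400G, DPU to switch": 2,
    "Cables, switch to SuperNIC": 16,
    "Cables, switch to DPU": 2,
    "OSFP 2x400G, switch to storage": 0.125,
    "QSFP 100G, upstream storage": 1,
    "Cables, storage": 0.25,
    "OSFP 2x400G, switch to customer network": 0.25,
    "QSFP 100G, customer network": 2,
    "Cables, customer network": 0.5,
    "CAT6 RJ45 cables, 1G OOB fabric": 6,
}

# Control-plane connectivity is fixed and does not grow with node count.
FIXED = {
    "OSFP 2x400G, switch to control-plane DPUs": 4,
    "QSFP 400G, control-plane DPUs": 16,
    "Cables, control plane": 8,
}

def endpoint_bom(nodes: int) -> dict[str, int]:
    """Combine linearly scaled per-node items with fixed control-plane items."""
    bom = {item: math.ceil(rate * nodes) for item, rate in PER_NODE.items()}
    bom.update(FIXED)
    return bom

# Reproduces the 64-node column of Table 10.
assert endpoint_bom(64)["Cables, switch to SuperNIC"] == 1024
assert endpoint_bom(64)["OSFP 2x400G, switch to control-plane DPUs"] == 4
```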