Appendix: Node Configurations#

This appendix provides additional information on the compute and control plane node configurations.

Compute Nodes#

The starting point and basic building block of this Enterprise RA is the NVIDIA-Certified HGX B300 platform. It is covered in more detail in the Node Hardware section of this document; for completeness, a simplified summary is provided in Table 8 below.

Table 8: NVIDIA HGX B300 system components

| Component | Quantity | Description |
|---|---|---|
| GPUs | 8 on baseboard | Eight NVIDIA B300 SXM GPUs on an NVIDIA B300 baseboard, with GPU memory of up to 2304 GB |
| DPU | 1 | NVIDIA BlueField-3 B3240 DPU with 2x 400G ports and a 1 GbE RJ45 management port |
| SuperNIC | 8 on baseboard | NVIDIA ConnectX-8 SuperNIC with a dual-port 400G configuration |
| CPUs | 2 | Minimum of 32 physical CPU cores per socket; 56 physical CPU cores per socket recommended |
| System Memory | 1 | Minimum of 2 TB system memory |
| Storage | 3 | Inference servers: minimum 1 TB NVMe drive per CPU socket. Training/DL servers: minimum 2 TB NVMe drive per CPU socket. HPC servers: minimum 1 TB NVMe drive per CPU socket. Plus a 1x 1 TB NVMe boot drive |
| BMC/iLO | 1 | 1 GbE RJ45 management port |

NVIDIA-Certified Systems documentation can be found here.
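
The minimums in Table 8 lend themselves to a quick automated check. The sketch below is not part of the reference architecture; it only illustrates, assuming standard Linux tooling (lscpu, /proc/meminfo) and nvidia-smi are available on the node, how the Table 8 thresholds could be verified on delivered hardware.

```python
#!/usr/bin/env python3
"""Sketch: check a compute node against the Table 8 minimums.

Thresholds are taken from Table 8; probes rely only on lscpu,
/proc/meminfo, and nvidia-smi.
"""
import subprocess

MIN_GPUS = 8                 # eight B300 SXM GPUs on the baseboard
MIN_CORES_PER_SOCKET = 32    # minimum physical cores per socket (56 recommended)
MIN_SYSTEM_MEMORY_GB = 2048  # minimum 2 TB system memory


def gpu_count() -> int:
    """Count GPUs reported by nvidia-smi (one CSV line per GPU)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return sum(1 for line in out.stdout.splitlines() if line.strip())


def system_memory_gb() -> float:
    """Read MemTotal (reported in kB) from /proc/meminfo."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    return 0.0


def cores_per_socket() -> float:
    """Average physical cores per socket, parsed from 'lscpu -p=CORE,SOCKET'."""
    out = subprocess.run(["lscpu", "-p=CORE,SOCKET"],
                         capture_output=True, text=True, check=True)
    pairs = {tuple(line.split(",")) for line in out.stdout.splitlines()
             if line and not line.startswith("#")}
    sockets = {socket for _core, socket in pairs}
    return len(pairs) / max(len(sockets), 1)


if __name__ == "__main__":
    checks = [
        ("GPUs", gpu_count(), MIN_GPUS),
        ("System memory (GB)", system_memory_gb(), MIN_SYSTEM_MEMORY_GB),
        ("Physical cores per socket", cores_per_socket(), MIN_CORES_PER_SOCKET),
    ]
    for name, actual, minimum in checks:
        status = "OK" if actual >= minimum else "BELOW MINIMUM"
        print(f"{name}: {actual:.0f} (minimum {minimum}) -> {status}")
```

Storage and NIC checks could be added in the same way (for example, by parsing lsblk or lspci output); they are omitted here to keep the sketch short.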

Switches and Cables#

Table 9 and Table 10 provide an overview of the switches and cables used in the cluster for a dual-plane configuration. The network adapters of the NVIDIA HGX B300 servers, the storage servers, and the control plane servers are included to clarify the connectivity requirements.

Table 9: Switching with required inter-switch links, transceivers, and cables

| Component | 32 server nodes | 64 server nodes | 128 server nodes |
|---|---|---|---|
| NVIDIA HGX B300 SXM GPUs (8/node, on baseboard) | 256 | 512 | 1024 |
| NVIDIA ConnectX-8 SuperNIC, compute fabric (8/node, on baseboard) | 256 | 512 | 1024 |
| BlueField-3 B3240 DPU, in-band management, customer & storage fabric and support servers | 32 | 64 | 128 |
| NVIDIA Spectrum-4 SN5600 Ethernet switch, compute core fabric | 12 | 24 | 48 |
| NVIDIA Spectrum-4 SN5600 Ethernet switch, converged core fabric | 12 | 24 | 48 |
| SN2201 leaf switches for OOB management fabric | 4 | 8 | 16 |
| OSFP, 2x400G transceiver used for N/S inter-switch links (ISL) | 26 | 256 | 512 |
| Cable used for N/S inter-switch links (ISL) | 26 | 256 | 512 |
| OSFP, 2x400G transceiver used for E/W inter-switch links (ISL) | 512 | 1024 | 2048 |
| Cable used for E/W inter-switch links (ISL) | 512 | 1024 | 2048 |
| QSFP 100G transceivers used for OOB leaf switches | 8 | 16 | 32 |
| OSFP, 2x400G transceiver used for OOB leaf switches | 2 | 4 | 8 |
| Cable used for OOB leaf switches | 2 | 4 | 8 |
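
Because Table 9 only defines quantities at the three reference sizes (32, 64, and 128 server nodes), it can be convenient to keep it in machine-readable form when preparing a switching bill of materials. The snippet below is a minimal sketch of that idea: the dictionary transcribes the Table 9 rows verbatim, and the helper returns the column for one of the listed cluster sizes; nothing is derived or extrapolated for intermediate sizes.

```python
# Sketch: Table 9 switching quantities as a lookup keyed by the three
# reference cluster sizes. Numbers are transcribed from Table 9, not derived.
CLUSTER_SIZES = (32, 64, 128)  # server nodes

TABLE_9_SWITCHING = {
    # component: quantities at 32 / 64 / 128 server nodes
    "SN5600 Ethernet switch, compute core fabric":   (12, 24, 48),
    "SN5600 Ethernet switch, converged core fabric": (12, 24, 48),
    "SN2201 leaf switch, OOB management fabric":     (4, 8, 16),
    "OSFP 2x400G transceiver, N/S ISL":              (26, 256, 512),
    "Cable, N/S ISL":                                (26, 256, 512),
    "OSFP 2x400G transceiver, E/W ISL":              (512, 1024, 2048),
    "Cable, E/W ISL":                                (512, 1024, 2048),
    "QSFP 100G transceiver, OOB leaf switches":      (8, 16, 32),
    "OSFP 2x400G transceiver, OOB leaf switches":    (2, 4, 8),
    "Cable, OOB leaf switches":                      (2, 4, 8),
}


def switching_bom(nodes: int) -> dict:
    """Return the Table 9 switching quantities for one reference cluster size."""
    column = CLUSTER_SIZES.index(nodes)  # raises ValueError for other sizes
    return {component: counts[column]
            for component, counts in TABLE_9_SWITCHING.items()}


if __name__ == "__main__":
    for component, quantity in switching_bom(64).items():
        print(f"{quantity:>5}  {component}")
```

Calling switching_bom(64), for example, returns the middle column of Table 9.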

Table 10: End-point connections with required transceivers and cables (dual plane)

| Component | 32 server nodes | 64 server nodes | 128 server nodes |
|---|---|---|---|
| NVIDIA HGX B300 SXM GPUs (8/node, on baseboard) | 256 | 512 | 1024 |
| NVIDIA ConnectX-8 SuperNIC, compute fabric (8/node, on baseboard) | 256 | 512 | 1024 |
| BlueField-3 B3240 DPU, in-band management, customer & storage fabric and support servers | 32 | 64 | 128 |
| OSFP, 2x400G transceiver used for switch to compute node (DPU) | 32 | 64 | 128 |
| OSFP, 2x400G transceiver used for switch to compute node (SuperNIC) | 256 | 512 | 1024 |
| OSFP, 2x400G used for SuperNIC to switch | 256 | 512 | 1024 |
| QSFP, 400G transceiver used for DPU to switch | 64 | 128 | 256 |
| Cable for switch to ConnectX-8 SuperNIC | 512 | 1024 | 2048 |
| Cable for switch to B3220 DPU | 64 | 128 | 256 |
| OSFP, 2x400G transceiver used for switch to storage | 4 | 8 | 16 |
| QSFP 100G transceiver used for upstream storage | 32 | 64 | 128 |
| Cable for storage | 8 | 16 | 32 |
| OSFP, 2x400G transceiver used for switch to customer network | 8 | 16 | 32 |
| QSFP 100G transceiver used for customer network | 64 | 128 | 256 |
| Cable for customer network | 16 | 32 | 64 |
| OSFP, 2x400G transceiver used for switch to control-plane B3220 DPUs | 4 | 4 | 4 |
| QSFP, 400G transceiver used for control-plane B3220 DPUs | 16 | 16 | 16 |
| Cable for control plane | 8 | 8 | 8 |
| CAT6 RJ45 cable for 1G OOB fabric | 192 | 384 | 768 |
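
As a sanity check on the end-point rows of Table 10, the node-attached quantities can be divided by the server-node count; they work out to the same per-node figures at all three reference sizes. The sketch below is just that arithmetic on the table data (rows for storage, the customer network, and the control plane are omitted because they do not scale per compute node), not a wiring rule from the RA itself.

```python
# Sketch: per-node view of the node-attached rows in Table 10. The totals
# are transcribed from Table 10 and simply divided by the server-node count.
CLUSTER_SIZES = (32, 64, 128)  # server nodes

TABLE_10_NODE_ATTACHED = {
    # component: totals at 32 / 64 / 128 server nodes
    "OSFP 2x400G transceiver, switch to compute node (DPU)":      (32, 64, 128),
    "OSFP 2x400G transceiver, switch to compute node (SuperNIC)": (256, 512, 1024),
    "OSFP 2x400G, SuperNIC to switch":                            (256, 512, 1024),
    "QSFP 400G transceiver, DPU to switch":                       (64, 128, 256),
    "Cable, switch to ConnectX-8 SuperNIC":                       (512, 1024, 2048),
    "Cable, switch to DPU":                                       (64, 128, 256),
    "CAT6 RJ45 cable, 1G OOB fabric":                             (192, 384, 768),
}

if __name__ == "__main__":
    for component, totals in TABLE_10_NODE_ATTACHED.items():
        per_node = {nodes: total // nodes
                    for nodes, total in zip(CLUSTER_SIZES, totals)}
        print(f"{component}: {per_node} per node")
```

Running it shows, for example, 16 switch-to-SuperNIC cables and 2 switch-to-DPU cables per node at every reference size.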