Appendix#
This appendix provides information on the compute node configuration, along with the switches, cables, and transceivers used in the cluster.
Compute Nodes#
The starting point and basic building block of this NVIDIA Enterprise RA is an NVIDIA-Certified RTX PRO Server.
Table 8: RTX PRO Server system components
| Component | Quantity | Specification |
|---|---|---|
| GPUs | 8 | Eight RTX™ PRO 6000 Blackwell Server Edition GPUs with 768 GB of total GPU memory |
| DPU | 1 | NVIDIA BlueField-3 B3220 DPU with 2x200G ports and a 1 GbE RJ45 management port |
| SuperNIC | 4 | NVIDIA BlueField-3 B3140H SuperNIC with one 400G port and a 1 GbE RJ45 management port |
| CPUs | 2 | Minimum 7 physical CPU cores per GPU. For configurations using MIG, a minimum of 2 CPU cores per MIG instance is required; CSPs may provision cores based on VM administration. An additional two cores per GPU are required for the OS kernel or virtualization. For RAPIDS Apache Spark data processing (ETL) workloads, an additional core per GPU is required. At least 48 physical CPU cores per socket are recommended for an 8-GPU RTX™ PRO 6000 Blackwell Server Edition configuration. |
| System Memory | 1 | Minimum 128 GB of system memory per GPU. At least 1 TB of system memory is recommended. |
| Storage | 3 | Inference servers: minimum 1 TB NVMe drive per CPU socket. DL servers: minimum 2 TB NVMe drive per CPU socket. HPC servers: minimum 1 TB NVMe drive per CPU socket. Plus one 1 TB NVMe boot drive. |
| BMC | 1 | 1 GbE RJ45 management port |
NVIDIA-Certified Systems documentation can be found here.
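The per-GPU CPU-core and memory rules in Table 8 amount to simple per-node arithmetic. The Python sketch below is illustrative only; the function name, parameters, and defaults are not part of this RA, and it simply applies the minimum and recommended values from Table 8 to a single 8-GPU, two-socket node.

```python
# Minimal sizing sketch based on the per-GPU rules in Table 8.
# The function name and workload flags are illustrative, not part of the RA.

def min_node_resources(num_gpus=8, num_sockets=2, etl_workload=False):
    """Estimate minimum CPU cores and system memory for one RTX PRO Server node."""
    cores_per_gpu = 7          # minimum physical cores per GPU
    os_cores_per_gpu = 2       # additional cores per GPU for OS kernel / virtualization
    etl_cores_per_gpu = 1 if etl_workload else 0  # extra core per GPU for RAPIDS/Spark ETL

    min_cores = num_gpus * (cores_per_gpu + os_cores_per_gpu + etl_cores_per_gpu)
    recommended_cores = 48 * num_sockets   # at least 48 physical cores per socket recommended

    min_memory_gb = 128 * num_gpus         # minimum 128 GB of system memory per GPU
    recommended_memory_gb = max(min_memory_gb, 1024)  # at least 1 TB recommended

    return {
        "min_cpu_cores": min_cores,
        "recommended_cpu_cores": recommended_cores,
        "min_system_memory_gb": min_memory_gb,
        "recommended_system_memory_gb": recommended_memory_gb,
    }

# Example: an 8-GPU node running ETL workloads needs at least 80 physical cores
# (8 x (7 + 2 + 1)) and 1024 GB of system memory; 96 cores (48 per socket) are recommended.
print(min_node_resources(etl_workload=True))
```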
Switches and Cables#
Table 9 and Table 10 provide an overview of the switches and cables used in the cluster. The network adapters of the 8-GPU RTX PRO Server compute nodes, the storage servers, and the control plane servers are included to clarify the connectivity requirements. Your specific requirements may vary depending on your configuration; additional storage servers or compute nodes will require corresponding increases in cables and transceivers.
Table 9: Switching with required inter-switch links, transceivers, and cables (GPU, NIC, DPU counts for reference)
| Component | 16 Server Nodes | 32 Server Nodes |
|---|---|---|
| RTX PRO Server GPUs (8/node) | 128 | 256 |
| BlueField-3 B3140H SuperNIC, compute fabric (4/node) | 64 | 128 |
| BlueField-3 B3220 DPU, in-band management, customer & storage fabric, and support servers | 16 | 32 |
| SN5610 Ethernet switch, compute core fabric | 2 | 2 |
| SN5610 Ethernet switch, converged core fabric | 2 | |
| SN2201 leaf switches for OOB management fabric | 2 | 4 |
| OSFP, 2x400G transceiver used for inter-switch links (ISL) | 32 | 82 |
| Cable used for inter-switch links (ISL) | 32 | 82 |
| QSFP 100G transceiver used for OOB leaf switches | 4 | 8 |
| OSFP, 2x400G transceiver used for OOB leaf switches | 2 | 2 |
| Cable used for OOB leaf switches | 2 | 2 |
Table 10: End-point connections with required transceivers and cables (GPU, NIC, DPU counts for reference)
| Component | 16 Server Nodes | 32 Server Nodes |
|---|---|---|
| RTX PRO Server GPUs (8/node) | 128 | 256 |
| BlueField-3 B3140H SuperNIC, compute fabric (4/node) | 64 | 128 |
| BlueField-3 B3220 DPU, in-band management, customer & storage fabric, and support servers | 16 | 32 |
| QSFP, 2x400G transceiver used for switch to compute node (DPU & SuperNIC) | 40 | 80 |
| QSFP, 400G transceiver used for compute node (DPU & SuperNIC) | 96 | 192 |
| Cable for switch to B3140H SuperNIC | 64 | 128 |
| Cable for switch to B3220 DPU | 16 | 32 |
| OSFP, 2x400G transceiver used for switch to storage | 2 | 4 |
| QSFP 100G transceiver used for upstream storage | 16 | 32 |
| Cable for storage | 4 | 8 |
| OSFP, 2x400G transceiver used for switch to customer network | 4 | 8 |
| QSFP 100G transceiver used for customer network | 32 | 64 |
| Cable for customer network | 8 | 16 |
| OSFP, 2x400G transceiver used for switch to control-plane B3220 DPUs | 4 | 4 |
| QSFP, 400G transceiver used for control-plane B3220 DPUs | 16 | 16 |
| Cable for control-plane | 8 | 8 |
| CAT6 RJ45 cable for 1G OOB fabric | 96 | 192 |
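Most endpoint quantities in Table 10 grow in proportion to the number of compute nodes (the 32-node column doubles the 16-node column), while the control-plane rows stay fixed. The sketch below is a rough, hypothetical estimator derived from those two columns; the item names, per-node rates, and the per-node/fixed split are assumptions for illustration, not an official sizing tool.

```python
import math

# Hypothetical estimator derived from the 16- and 32-node columns of Table 10.
# Item names, per-node rates, and the per-node/fixed split are assumptions for
# illustration; confirm the exact BOM with your OEM/ODM partner and NVIDIA.

PER_NODE = {
    "QSFP 2x400G transceiver, switch to compute node (DPU & SuperNIC)": 2.5,  # 40 per 16 nodes
    "QSFP 400G transceiver, compute node (DPU & SuperNIC)": 6,                # 96 per 16 nodes
    "Cable, switch to B3140H SuperNIC": 4,                                    # 64 per 16 nodes
    "Cable, switch to B3220 DPU": 1,                                          # 16 per 16 nodes
    "QSFP 100G transceiver, upstream storage": 1,                             # 16 per 16 nodes
    "CAT6 RJ45 cable, 1G OOB fabric": 6,                                      # 96 per 16 nodes
}

FIXED = {
    # Control-plane connectivity does not grow with compute node count in Table 10.
    "OSFP 2x400G transceiver, switch to control-plane B3220 DPUs": 4,
    "QSFP 400G transceiver, control-plane B3220 DPUs": 16,
    "Cable, control-plane": 8,
}

def endpoint_bom(nodes: int) -> dict:
    """Scale the per-node quantities to a given compute node count."""
    bom = {item: math.ceil(rate * nodes) for item, rate in PER_NODE.items()}
    bom.update(FIXED)
    return bom

# Example: reproduce the 32-node column of Table 10 for the items listed above.
for item, qty in endpoint_bom(32).items():
    print(f"{qty:4d}  {item}")
```

Quantities for the switch-to-storage and customer-network rows are intentionally left out of the sketch because they depend on the storage and upstream network design rather than the compute node count alone.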
For the exact BOM and part numbers, work with your OEM/ODM partners and the NVIDIA support team, as the details depend on your specific rack and data center conditions.