Networking Physical Topologies#
The Enterprise RA configurations use three physical network fabrics:
Compute (Node East/West) Network
CPU Converged (Node North/South) Network
Out-of-Band Management Network
The usage and configuration of each of these three networks is described in the sections below.
Compute (Node East/West) Network#
The compute fabric (East-West) is built using switches with NVIDIA Spectrum technology in a fully non-blocking fat-tree topology to provide the highest level of performance for applications running over the cluster. The compute fabric is RDMA-enabled. It is designed to provide the shortest hop count through the network for the application, in a spine-leaf manner, where the GPUs are connected using a rail-optimized network topology through their respective BlueField-3 B3140H SuperNICs. This design allows the most efficient communication for multi-GPU applications within and across nodes.
The collapsed leaf-and-spine architecture allows for a scalable and reliable network that can accommodate clusters of varied sizes using the same architecture.
The compute fabric is designed to maximize bandwidth and minimize network latency required to connect GPUs within a server and within a rail.
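As a rough illustration of how such a non-blocking two-tier fabric scales, the sketch below estimates leaf and spine switch counts from a given switch radix and number of node-facing ports. The 64-port radix and the GPU-port counts are illustrative assumptions, not prescribed configurations of the Enterprise RA.

```python
# Rough sizing sketch for a two-tier, non-blocking (1:1) leaf-spine fat tree.
# Illustrative assumptions: 64-port switches, one compute-fabric port per GPU;
# rail-optimized cabling is not modeled here.
import math

def size_fat_tree(num_node_ports: int, switch_ports: int = 64) -> tuple[int, int]:
    """Estimate leaf and spine switch counts for a non-blocking two-tier fabric."""
    down_per_leaf = switch_ports // 2            # half the leaf ports face nodes
    up_per_leaf = switch_ports - down_per_leaf   # the rest face spines (1:1 ratio)
    leaves = math.ceil(num_node_ports / down_per_leaf)
    spines = math.ceil(leaves * up_per_leaf / switch_ports)
    return leaves, spines

if __name__ == "__main__":
    for gpu_ports in (64, 256, 1024):
        leaves, spines = size_fat_tree(gpu_ports)
        print(f"{gpu_ports:5d} GPU ports -> {leaves} leaf / {spines} spine switches")
```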
The compute network is not necessarily required for inference workloads. Most common pre-trained models do not typically exceed the memory capacity of a single GPU. Models beyond approximately 40B FP16 parameters (on the RTX™ PRO 6000 Blackwell Server Edition) may require model parallelism and therefore more than one GPU.
Note
In the architectures below, the Compute Fabric is merged with the Converged Fabric for smaller design points. This is to lower cost and to simplify the network architecture.
A Note on Pure Inference Deployments#
For the use case of pure inference, an East-West compute network may not be necessary. Each RTX™ PRO 6000 Blackwell Server Edition GPU can support a model size of approximately 70B parameters. Notably, tensor parallelism yields lower per-GPU performance than running the model on the same GPUs in data-parallel mode.
Since there are no performance gains from running an inference model on multiple GPUs, either on a single node or across multiple nodes, a compute network is not necessary.
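The parameter-count thresholds above reduce to a simple memory estimate: if a model's weights (plus some headroom for activations and KV cache) fit within a single GPU's memory, data parallelism is sufficient and no compute fabric is needed for inference. The following is a minimal back-of-the-envelope sketch; the 96 GB per-GPU memory figure and the 15% headroom factor are illustrative assumptions.

```python
# Back-of-the-envelope check: do a model's weights fit on a single GPU?
# Illustrative assumptions: ~96 GB of memory per RTX PRO 6000 Blackwell
# Server Edition GPU and ~15% reserved as headroom for activations/KV cache.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def fits_on_one_gpu(params_billion: float, precision: str,
                    gpu_mem_gb: float = 96.0, headroom: float = 0.15) -> bool:
    weights_gb = params_billion * BYTES_PER_PARAM[precision]  # 1e9 params x bytes ~= GB
    return weights_gb <= gpu_mem_gb * (1.0 - headroom)

if __name__ == "__main__":
    for params, precision in [(40, "fp16"), (70, "fp16"), (70, "fp8")]:
        verdict = ("data parallel is enough" if fits_on_one_gpu(params, precision)
                   else "needs model parallelism (and a compute fabric)")
        print(f"{params}B @ {precision}: {verdict}")
```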
The drawback of deploying an infrastructure without a compute network is that it cannot be used for any hybrid workflows, including model fine-tuning. The infrastructure can be deployed without a compute network and retrofitted later, albeit with potentially significant downtime and reconfiguration.
CPU Converged (Node North/South) Network#
A converged network for both storage and in-band management is used in the Enterprise RA. This provides enterprises with flexible storage allocation and easy network management. The converged network has the following attributes:
It provides high bandwidth to shared storage and connects the customer's network through the converged fabric.
It is independent of the compute fabric to maximize both storage and application performance.
Each compute and management node is connected with two 200 GbE ports to two separate switches to provide redundancy and high storage throughput that can reach up to 40 GB/s per node (see the arithmetic sketch after this list).
The fabric is built on Ethernet technology with RDMA over Converged Ethernet (RoCE) support and utilizes NVIDIA BlueField-3 B3220 DPUs in each compute node to deliver existing and emerging cloud and storage services.
It is flexible and can scale to meet specific capacity and bandwidth requirements.
Tenant-controlled management nodes give tenants the flexibility to deploy the OS and job scheduler of their choice.
A hybrid storage fabric design with support for tenant isolation provides access to both shared and dedicated storage per tenant.
It is used for node provisioning, data movement, Internet access, and other services that must be accessible to users.
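The "up to 40 GB/s per node" figure follows directly from the two 200 GbE links; the short sketch below shows the arithmetic, with the ~80% effective-efficiency factor being an illustrative assumption rather than a measured value.

```python
# Arithmetic behind the per-node storage throughput figure.
# Illustrative assumption: ~80% effective efficiency after Ethernet/RoCE
# framing and protocol overhead; real efficiency varies by workload.
def per_node_throughput_gbytes(links: int = 2, link_gbits: int = 200,
                               efficiency: float = 0.80) -> float:
    raw_gbits = links * link_gbits      # 2 x 200 GbE = 400 Gb/s raw
    return raw_gbits / 8 * efficiency   # bits -> bytes, minus overhead ~= 40 GB/s

if __name__ == "__main__":
    print(f"~{per_node_throughput_gbytes():.0f} GB/s usable per node")
```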
Out-of-Band Management (Node) Network#
The OOB management network connects all of the Baseboard Management Controller (BMC) ports, as well as other devices that should be physically isolated from system users, to enable infrastructure management. This includes the 1 GbE switch and server management ports and the BlueField-3 DPU management ports.