Management Servers#
For DGX SuperPOD with B300, we provide two means of cluster access for end users (AI practitioners):
With SLURM: best suited to pretraining-type workflows, with bare-metal or containerized access to the compute infrastructure.
With Kubernetes and NVIDIA Run:AI.
To support the operations, monitoring, and installation of the DGX SuperPOD, a set of management servers is required.
Management Server Quantities and Connectivity#
These nodes are used for the following purposes:
Base Command Manager in High Availability (HA): 2 nodes, connected to Inband and OOB
K8s Management Servers: 3 nodes, connected to Inband and storage
SLURM Login Nodes: 2 nodes, connected to Inband and storage
All devices also connect to OOB with 1GbE for IPMI/Redfish.
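As a planning aid, the quantities and connectivity listed above can be captured in a small inventory structure. The following Python sketch is illustrative only: the `ManagementRole` class, the lowercase network labels, and the helper functions are hypothetical names introduced here, not part of any NVIDIA tool. It also assumes one 1GbE OOB connection per node for IPMI/Redfish, which the text implies but does not state as a port count.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ManagementRole:
    """One management server role, as listed in this section (hypothetical type)."""
    name: str
    nodes: int
    networks: Tuple[str, ...]  # data networks the role connects to


# Quantities and connectivity copied from the list above.
ROLES = (
    ManagementRole("Base Command Manager (HA)", 2, ("inband", "oob")),
    ManagementRole("K8s Management Server", 3, ("inband", "storage")),
    ManagementRole("SLURM Login Node", 2, ("inband", "storage")),
)


def total_nodes(roles: Tuple[ManagementRole, ...] = ROLES) -> int:
    """Total number of management servers across all roles."""
    return sum(role.nodes for role in roles)


def bmc_oob_links(roles: Tuple[ManagementRole, ...] = ROLES) -> int:
    """1GbE OOB links for IPMI/Redfish, ASSUMING one link per node."""
    return total_nodes(roles)


def roles_on(network: str, roles: Tuple[ManagementRole, ...] = ROLES) -> List[str]:
    """Names of the roles that connect to the given data network."""
    return [role.name for role in roles if network in role.networks]


if __name__ == "__main__":
    print("Management servers:", total_nodes())
    print("BMC OOB links (assumed 1 per node):", bmc_oob_links())
    print("Roles on storage network:", roles_on("storage"))
```

A structure like this makes it straightforward to check cabling or switch-port plans against the role list, for example by summing nodes per network before ordering cables.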
System | Power Supply, Form Factor
---|---
B300 PDU | AC, EIA
B300 Busbar | DC, MGX