Master Node

A master node, sometimes called a head node, is the server that hosts a cluster's control services. Typically, it runs the cluster management software, the resource manager, and any monitoring tools that are used. On smaller clusters, it also serves as the login node where users create and submit jobs.

For clusters of any size that include the DGX-2, DGX-1, or even a group of DGX Stations, a master node can be very helpful. It lets the DGX systems focus solely on computing rather than on handling interactive logins or any post-processing that users may be doing. The case for a master node becomes stronger as the number of nodes in the cluster increases.

Size the master node to handle workloads such as:
  • Interactive user logins
  • Resource management (running a job scheduler)
  • Graphical pre-processing and post-processing
    • Consider a GPU in the master node for visualization
  • Cluster monitoring
  • Cluster management

Since the master node becomes an important part of the operation of the cluster, consider using RAID-1 for the OS drive in the master node as well as redundant power supplies. This can help improve the uptime of the master node.

For smaller clusters, you can also use the master node as an NFS server: add storage and more memory to the master node, then export the storage over NFS to the cluster clients. For larger clusters, it is recommended to have dedicated storage, either NFS or a parallel file system.
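
As a rough illustration of the NFS export step, the sketch below builds a single entry in the standard /etc/exports format (path, client, options in parentheses). The helper name `exports_line`, the path /export/home, the subnet 10.0.0.0/24, and the option set are all assumptions chosen for the example, not values taken from this guide; adjust them to match your site.

```python
import ipaddress


def exports_line(path, client_subnet, options=("rw", "sync", "no_subtree_check")):
    """Return one /etc/exports entry that shares `path` with `client_subnet`.

    Hypothetical helper for illustration only; the default options are a
    common starting point, not a recommendation from this guide.
    """
    if not path.startswith("/"):
        raise ValueError("export path must be absolute")
    # strict=True rejects subnets written with host bits set (e.g. 10.0.0.5/24)
    network = ipaddress.ip_network(client_subnet, strict=True)
    return f"{path} {network.with_prefixlen}({','.join(options)})"


if __name__ == "__main__":
    # Example values are assumptions: a home-directory export to one cluster subnet
    print(exports_line("/export/home", "10.0.0.0/24"))
```

The resulting line would be appended to /etc/exports on the master node, and the export table reloaded (for example with `exportfs -ra`) before the clients mount it.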

For InfiniBand networks, the master node can also run the software subnet manager (SM). If you want some high availability (HA) for the SM, run the primary SM on the master node and use the SM on the IB switch (the hardware SM) as the secondary SM.

As the cluster grows, or as the number of users grows, consider moving the login and data processing functions off the master node to one or more dedicated login nodes. You can then run the primary SM on the master node and additional SMs on the login nodes. You could even use the hardware SMs on the switches as backups.
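
The primary and backup SM arrangement described above relies on SM priorities. The sketch below is a simplified model, not an actual subnet manager: it assumes the usual InfiniBand selection rule in which the running SM with the highest priority becomes master, with ties going to the lowest GUID. The `SubnetManager` class, the `elect_master` function, and the priority and GUID values are hypothetical and exist only to show how a priority ordering produces the intended failover sequence. Real handover behavior depends on the SM implementation and its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    name: str
    priority: int      # assumed range 0-15; higher is preferred
    guid: int          # port GUID, used only to break priority ties
    alive: bool = True


def elect_master(sms):
    """Pick the master SM: highest priority wins, lowest GUID breaks ties.

    Returns None when no SM is running.
    """
    candidates = [sm for sm in sms if sm.alive]
    if not candidates:
        return None
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


if __name__ == "__main__":
    # Illustrative priorities: master node first, login node next, switch last
    fabric = [
        SubnetManager("master-node", priority=15, guid=0x10),
        SubnetManager("login-node", priority=10, guid=0x20),
        SubnetManager("ib-switch", priority=5, guid=0x30),
    ]
    print(elect_master(fabric).name)

    # Simulate losing the master node; the next-highest priority SM takes over
    degraded = [SubnetManager("master-node", 15, 0x10, alive=False)] + fabric[1:]
    print(elect_master(degraded).name)
```

With OpenSM, the priority is typically set through the `sm_priority` option in the OpenSM configuration file or the `-p` command-line flag; the values used here are placeholders.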