Head Node

A head node is a server that hosts the shared services of a cluster. Typically, it runs the cluster management software, the resource manager, and any monitoring tools. On smaller clusters, it also serves as the login node where users create and submit jobs.

For clusters of any size that include the DGX-2, DGX-1, or even a group of DGX Stations, a head node can be very helpful. It lets the DGX systems focus solely on computing instead of handling interactive logins or user post-processing. The more nodes a cluster has, the stronger the case for a head node.

Size the head node for the tasks it will handle, such as:
  • Interactive user logins
  • Resource management (running a job scheduler)
  • Graphical pre-processing and post-processing
    • Consider a GPU in the head node for visualization
  • Cluster monitoring
  • Cluster management

Because the head node becomes a critical part of cluster operation, consider using RAID-1 for its OS drive and fitting it with redundant power supplies. Both measures help improve the uptime of the head node.

For smaller clusters, you can also use the head node as an NFS server: add storage and memory to the head node, then export the storage over NFS to the cluster clients. For larger clusters, dedicated storage is recommended, either NFS or a parallel file system.
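As an illustration of the NFS export step, the following minimal sketch builds one entry in the format used by the Linux /etc/exports file. The export path, the client subnet, and the option set are assumptions chosen for the example, not values from this guide; adjust them for your site.

```python
def nfs_export_line(path, clients, options=("rw", "sync", "no_subtree_check")):
    """Return one /etc/exports line that exports `path` to each client spec.

    `clients` is a list of host names, IP addresses, or subnets in CIDR form.
    The default options are only an example and should be reviewed per site.
    """
    if not path.startswith("/"):
        raise ValueError("export path must be absolute")
    if not clients:
        raise ValueError("at least one client is required")
    opts = ",".join(options)
    # Each client gets its own option list, with no space before the parenthesis.
    return path + " " + " ".join(f"{client}({opts})" for client in clients)


if __name__ == "__main__":
    # Hypothetical path and subnet, for illustration only.
    print(nfs_export_line("/export/home", ["10.0.0.0/24"]))
```

The function only formats the line. Writing it to /etc/exports and reloading the export table on the head node are left to the administrator.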

For InfiniBand (IB) networks, the head node can also run the software subnet manager (SM). For high availability (HA) of the SM, run the primary SM on the head node and use the hardware SM on an IB switch as the secondary SM.
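The primary and secondary arrangement can be pictured with a small model of SM master election. This is a sketch under stated assumptions, not a description of any particular SM implementation: it assumes the usual InfiniBand rule that the live SM with the highest priority becomes master, with the lowest port GUID breaking ties. The names, priorities, and GUIDs are invented for the example. With OpenSM, the priority is typically set through the sm_priority option.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SubnetManager:
    name: str
    priority: int      # 0 to 15; a higher value is preferred as master
    guid: int          # port GUID, used only to break priority ties
    alive: bool = True


def elect_master(sms):
    """Return the SM that would act as master, or None if none is alive.

    Assumed rule: highest priority wins; lowest GUID breaks ties.
    """
    candidates = [sm for sm in sms if sm.alive]
    if not candidates:
        return None
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


if __name__ == "__main__":
    # Invented values: software SM on the head node, hardware SM on a switch.
    head = SubnetManager("head-node software SM", priority=15, guid=0x20)
    switch = SubnetManager("switch hardware SM", priority=1, guid=0x10)

    print(elect_master([head, switch]).name)

    # If the head node fails, the switch SM takes over.
    print(elect_master([replace(head, alive=False), switch]).name)
```

Giving the head node the higher priority keeps it as the master whenever it is up, while the switch SM stays on standby.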

As the cluster grows, consider moving the login and data processing functions from the head node to one or more dedicated login nodes. The same applies as the number of users grows. In that layout, you can run the primary SM on the head node and additional SMs on the login nodes. You could even use the hardware SMs on the switches as backups.