Provisioning and Cluster Management

Cluster management tools go beyond resource managers and job schedulers, managing the state of each node in an entire cluster. They typically include mechanisms to provision the nodes in the cluster (install the operating system image, firmware, and drivers), deploy a job scheduler, monitor and manage hardware, configure user access, and make modifications to the software stack.
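As a rough illustration of what "managing the state of each node" can mean, the sketch below compares a desired per-node state with an observed one and lists the actions needed to close the gap. It is a minimal sketch only. The names NodeState and plan_actions, the action labels, and the image and driver strings are all hypothetical, and they do not reflect how DeepOps, Bright Cluster Manager, or any other tool is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical, simplified view of one node's software state."""
    os_image: str
    driver_version: str
    scheduler: str


def plan_actions(desired, observed):
    """Return {node_name: [actions]} for nodes that differ from the desired state.

    Nodes missing from `observed` are treated as not yet provisioned.
    Nodes that already match the desired state are left out of the plan.
    """
    plan = {}
    for node, want in desired.items():
        have = observed.get(node)
        if have is None:
            plan[node] = ["provision"]
            continue
        actions = []
        if have.os_image != want.os_image:
            actions.append("reimage")
        if have.driver_version != want.driver_version:
            actions.append("update_driver")
        if have.scheduler != want.scheduler:
            actions.append("deploy_scheduler")
        if actions:
            plan[node] = actions
    return plan


# Illustrative values only; these strings are placeholders, not real versions.
target = NodeState("base-image-v2", "driver-v2", "slurm")
desired = {"node01": target, "node02": target, "node03": target}
observed = {
    "node01": target,
    "node02": NodeState("base-image-v1", "driver-v2", "slurm"),
}

plan = plan_actions(desired, observed)
print(plan)
```

In this sketch, node01 needs nothing, node02 is flagged for a new image, and node03 is flagged for provisioning because it has not been observed at all. A real cluster manager tracks far more state (firmware, user access, hardware health) and executes the actions rather than only listing them.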

Provisioning and cluster management of DGX Systems may be bootstrapped with DeepOps. DeepOps is open source and highly modular. Its defaults can be adjusted to meet organizational needs, and it incorporates best practices for deploying GPU-accelerated Kubernetes and Slurm.

Alternatively, Bright Cluster Manager deploys complete DGX PODs on bare metal and manages the entire DGX POD, including the hardware, operating system, and users. It also manages the data analytics software, NGC, Bright Data Science, Kubernetes, and Docker and Singularity containers. With Bright Cluster Manager, a system administrator can quickly stand up DGX PODs and keep them running reliably throughout their life cycle, using a full-featured, enterprise-grade cluster manager.