NVIDIA DGX SuperPOD Overview

The NVIDIA DGX SuperPOD™ is a multi-user system designed to run large AI and HPC applications efficiently. Although a DGX SuperPOD is composed of many different components, it should be treated as a single entity that manages simultaneous use by many users, provides advanced access controls for queuing, and schedules resources fairly to ensure maximum performance. It provides tools for collaboration between users, along with security controls that protect data and limit interaction between users where necessary. The management tools are designed to treat the multiple components as a single system. For more details about the physical architecture, refer to the NVIDIA DGX SuperPOD Reference Architecture.

This document describes the features and tasks that are supported on the DGX SuperPOD. The individual hardware and software components that make up a DGX SuperPOD support more features than the integrated DGX SuperPOD solution does, so a feature being available in a component does not mean it is supported on the DGX SuperPOD. Contact your NVIDIA Technical Account Manager (TAM) if you need clarification on whether specific functionality is supported by the DGX SuperPOD product.

Important: NVIDIA DGX SuperPOD supports Slurm, Run:ai, or both for scheduling workloads depending on the deployment configuration.

NVIDIA Mission Control shares a common set of features and control plane design across different hardware and software configurations. The supported workload schedulers outlined in this document are Slurm and Run:ai.

Features and deployment architecture vary slightly between these configurations, and this guide outlines those variations.

Quick Start Guide Overview