Workload Power Profile Solution (WPPS) Introduction

This document introduces the Workload Power Profile Solution (WPPS), an energy-optimized power management system that draws on detailed knowledge of Blackwell series GPU architectures to balance performance and efficiency across a range of workloads. WPPS adjusts key parameters, such as the power budget, to configure GPU power settings for specific use cases like inference and training, favoring either MAX-Q (performance per watt) or MAX-P (peak performance).

WPPS is integrated as an optional feature of NVIDIA Base Command Manager (BCM) and can be activated on BCM-provisioned Slurm nodes either before a job starts or from a job prolog, so that each job runs with the power settings suited to it. Through the admin settings UI in BCM Base View, DGX SuperPOD administrators and model builders can apply these profiles to individual Slurm nodes or to groups of nodes dedicated to HPC and AI training or inference workloads. This addresses the challenge of balancing power, performance, and resource allocation, and helps teams make full use of their assigned GPU compute capacity.

Motivation

In enterprise and research data centers, administrators and model builders (including ML engineers, applied scientists, and others) must orchestrate a wide range of workloads efficiently, often with competing requirements for throughput, power efficiency, and resource scaling. WPPS addresses this by letting them enable or disable four pre-configured GPU Workload Power Profiles:

  • MAX-P Training

  • MAX-P Inference

  • MAX-Q Training

  • MAX-Q Inference

These profiles can be applied to BCM-provisioned Slurm nodes, either during cluster provisioning or when defining a Slurm job, so that each node is tuned for its workload, whether that is high-throughput HPC training or energy-efficient AI inference.
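The four profiles listed above are every combination of two independent choices: a power goal (MAX-P or MAX-Q) and a workload type (Training or Inference). The following minimal Python sketch models that structure. The names `Mode`, `Workload`, `PROFILES`, and `select_profile` are hypothetical, and the strings are the display labels from the list above, not identifiers used by BCM or any NVIDIA tool. Actually applying a profile to a node is done through BCM and is not shown here.

```python
from enum import Enum
from itertools import product


class Mode(Enum):
    # MAX-P favors peak performance; MAX-Q favors performance per watt.
    MAX_P = "MAX-P"
    MAX_Q = "MAX-Q"


class Workload(Enum):
    TRAINING = "Training"
    INFERENCE = "Inference"


# The four pre-configured profiles are every (mode, workload) pair,
# in the same order as the list above. Labels are illustrative only.
PROFILES = [f"{m.value} {w.value}" for m, w in product(Mode, Workload)]


def select_profile(workload: Workload, prefer_efficiency: bool) -> str:
    """Hypothetical helper: map a workload type and a power goal to a profile label."""
    mode = Mode.MAX_Q if prefer_efficiency else Mode.MAX_P
    return f"{mode.value} {workload.value}"


if __name__ == "__main__":
    print(PROFILES)
    print(select_profile(Workload.INFERENCE, prefer_efficiency=True))
```

For example, an energy-efficient inference node maps to `select_profile(Workload.INFERENCE, prefer_efficiency=True)`, which returns `"MAX-Q Inference"`. Treating the profiles as a product of two axes makes it clear that choosing a profile means answering two separate questions about the workload.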

WPPS overview diagram