Introduction#

Motivation#

Modern datacenters are increasingly constrained not by compute resources but by power. As compute demand accelerates, driven by AI, LLMs, and large-scale R&D, organizations continue to build more datacenters.

Yet, despite this growth, real-world power utilization within general-purpose clusters remains low, typically between 30–60%. This inefficiency stems from several factors: nodes frequently undergo maintenance; scheduling gaps leave resources idle; and many workloads are either not optimized for power efficiency or are limited by other bottlenecks such as memory, interconnect bandwidth, or serialization in code. Additionally, phases like job startup, termination, and checkpointing are characterized by reduced GPU activity. Even administrative operations, such as health checks or taking nodes offline due to hardware faults, further degrade utilization.

This power underutilization is magnified in AI datacenters, where GPUs exhibit wide dynamic power ranges and workloads are bursty. Adding to the complexity, strict power compliance is critical: exceeding the allocated budget can trip breakers or trigger costly penalties from utility providers. To simplify design and avoid risk, operators often allocate power evenly across systems, trading flexibility for safety.

The consequence is significant: many datacenters operate at just a fraction of their designed power budget, wasting valuable CapEx, as power infrastructure (transformers, PDUs, cooling) is built to handle peak load.

Power Reservation Steering (PRS) was developed to address this inefficiency: it safely reclaims unused power headroom and redirects it dynamically to maximize performance within the existing power envelope.

What is PRS?#

PRS is a prediction-based, datacenter-scale dynamic power management system. PRS enables operators to increase their datacenter's compute density through power oversubscription, adding nodes beyond the nominal power budget. The system manages the power budget dynamically, ensuring power budget compliance while minimizing the impact on application performance.

A potential impact of power oversubscription on running jobs is increased job runtime. If too many power-intensive jobs are executed concurrently, job latency increases because jobs share the power budget and may not receive their required power allocation. To control the impact of power oversubscription on running jobs, PRS uses a custom Slurm plugin that treats power as a schedulable resource, ensuring jobs are placed with awareness of both compute and power availability.
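The idea of treating power as a schedulable resource can be illustrated with a minimal sketch. This is not the PRS or Slurm plugin implementation; all names (`PowerBudget`, `admit`, `watts_per_gpu`) are hypothetical, and it assumes each job carries a predicted per-GPU power draw:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int
    watts_per_gpu: float  # predicted per-GPU draw for this workload (assumed input)


class PowerBudget:
    """Illustrative power-aware admission: power is reserved like any other resource."""

    def __init__(self, budget_watts: float):
        self.budget_watts = budget_watts
        self.reserved_watts = 0.0

    def admit(self, job: Job) -> bool:
        """Start a job only if its predicted draw fits the remaining power budget."""
        demand = job.gpus * job.watts_per_gpu
        if self.reserved_watts + demand > self.budget_watts:
            return False  # defer the job: compute may be free, but power is not
        self.reserved_watts += demand
        return True

    def release(self, job: Job) -> None:
        """Return a finished job's power reservation to the pool."""
        self.reserved_watts -= job.gpus * job.watts_per_gpu


budget = PowerBudget(budget_watts=10_000)
print(budget.admit(Job("train-a", gpus=8, watts_per_gpu=600)))  # fits (4800 W reserved)
print(budget.admit(Job("train-b", gpus=8, watts_per_gpu=600)))  # fits (9600 W reserved)
print(budget.admit(Job("train-c", gpus=8, watts_per_gpu=600)))  # deferred: would exceed budget
```

The key point is the deferral branch: in an oversubscribed cluster, a job can be held back even when GPUs are idle, because admitting it would push predicted draw past the power budget.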

Additionally, to support critical workloads, PRS introduces the concept of hero jobs. These jobs can use the theoretical maximum power of the devices allocated to them. The hero attribute is supported as a Slurm QOS, and should be reserved for special use cases, such as GPU benchmarking jobs.
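The distinction between hero and ordinary jobs can be sketched as a reservation rule: a hero job reserves the theoretical maximum power of its devices (e.g. the GPU TDP) rather than a predicted draw, so it is never short of power. The function name and the TDP value below are illustrative, not the PRS implementation:

```python
GPU_TDP_WATTS = 700  # illustrative theoretical per-device maximum (assumption)


def reserved_watts(gpus: int, predicted_watts_per_gpu: float, hero: bool) -> float:
    """Power to reserve for a job: predicted draw normally, full TDP for hero jobs."""
    per_gpu = GPU_TDP_WATTS if hero else predicted_watts_per_gpu
    return gpus * per_gpu


print(reserved_watts(8, 450, hero=False))  # 3600.0 — predicted draw only
print(reserved_watts(8, 450, hero=True))   # 5600 — full TDP, never power-capped
```

Because hero jobs reserve the worst-case draw, each one shrinks the headroom available for oversubscription, which is why the attribute should be limited to special cases such as GPU benchmarking.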