FAQ#
What is node static power?#
Dynamic (managed) power: The sum of the theoretical maximum power of all devices in the node that PRS manages, such as GPUs and CPUs.
Static (unmanaged) power: Everything else, i.e. the theoretical maximum power of all devices and equipment in the node that PRS does not manage, such as fans, NICs, and storage. PRS cannot manage this power and assumes it is always fully utilized.
Who should configure the PDN?#
Only engineers with in-depth knowledge of your cluster’s PDN should define PDs, set budgets, and tweak thresholds.
Why can’t I submit one large job across all nodes?#
PRS estimates a job’s power consumption before the job starts, typically assuming an average value above idle. To avoid risking job performance (e.g., longer runtimes), PRS may block job placement if the available power budget is below the theoretical maximum, even for non-hero jobs.
The utilization_thresh is a PD attribute that sets the maximum allowed power utilization of the PD for starting a new job. If the power utilization of the PD, including the potential job, exceeds this threshold, the job stays pending. The default is 93%, preserving headroom for runtime power fluctuations. Setting it to 100% permits full power utilization across all nodes.
To set the allowed power utilization threshold to 100%:
cmsh -c "configurationoverlay use prs-server ; roles; use PRS::Server ; domains ; set <domain-name> -e utilization_thresh 100 ; commit "
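The threshold check described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this FAQ, not PRS's actual implementation; the function name `can_start_job` and its parameters are hypothetical.

```python
def can_start_job(pd_budget_w, current_power_w, job_estimate_w,
                  utilization_thresh=93.0):
    """Return True if the job may start, False if it should stay pending.

    Hypothetical helper: all names here are assumptions for illustration.
    The job stays pending when the PD's projected utilization (current
    power plus the job's estimated power) exceeds the threshold.
    """
    projected_w = current_power_w + job_estimate_w
    utilization_pct = 100.0 * projected_w / pd_budget_w
    return utilization_pct <= utilization_thresh


if __name__ == "__main__":
    # A PD with a 100 kW budget that already draws 80 kW; the new job is
    # estimated at 15 kW, so the projected utilization is 95%.
    print(can_start_job(100_000, 80_000, 15_000))                            # default 93%
    print(can_start_job(100_000, 80_000, 15_000, utilization_thresh=100.0))  # raised to 100%
```

With the default threshold the example job stays pending because 95% exceeds 93%; with the threshold raised to 100% the same job is allowed to start.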
Example: Configuring a PD for a GB200 NVL72 rack#
A GB200 NVL72 rack has 18 nodes, each with 4 Blackwell GPUs and 2 Grace CPUs. Total rack power at full load ≈ 120 kW, including nodes and rack components (switches, interconnects). For PRS, we divide the total by 18 to assign a maximum power per node, embedding external rack power into each node’s budget.
Hence, the maximum node power consumption is about 6.7 kW (120 kW / 18). PRS manages 5.3 kW of that (4 × 1.2 kW for the GPUs + 0.5 kW for the CPUs), which leaves ≈ 1.4 kW of static power. Therefore:
Static power usage: 1400 Watts.
Static power usage down: 50 Watts. A node that is down still draws a small amount of standby power, mainly due to the BMC.
Budgeting:
Full-infra capacity: The total power budget for the PD depends on the site’s infrastructure, including the capacity of power distribution units (PDUs), circuit breakers, and upstream power sources. If the infrastructure is provisioned for the full theoretical load, the required budget would be at least 120 kW.
With 20% margin: 96 kW. Note that since PRS only manages the dynamic portion (GPU and CPU power), any reduction in power budget must be absorbed entirely within that 5.3 kW × 18 nodes = 95.4 kW managed power envelope. Thus, a 20% reduction in the budget (equal to 120 kW × 0.2 = 24 kW) translates into a ~25% cut in the managed power budget (since 24 / 95.4 × 100 ≈ 25).
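The arithmetic in this example can be reproduced with a short Python sketch. It uses only the figures quoted above (120 kW rack, 18 nodes, 4 × 1.2 kW GPUs and 0.5 kW of CPU power per node). The function name and dictionary keys are hypothetical, and the per-node total is rounded to one decimal place to match the text.

```python
def gb200_rack_budget(margin=0.20):
    """Recompute the GB200 NVL72 example figures (all values in kW).

    Hypothetical helper for illustration; it is not part of PRS or cmsh.
    """
    nodes = 18
    rack_kw = 120.0                            # full-load rack power from the text
    managed_per_node_kw = 4 * 1.2 + 0.5        # 4 GPUs plus the CPU share
    node_total_kw = round(rack_kw / nodes, 1)  # rack power spread evenly over nodes
    static_per_node_kw = round(node_total_kw - managed_per_node_kw, 1)

    managed_envelope_kw = managed_per_node_kw * nodes
    reduction_kw = rack_kw * margin
    return {
        "node_total_kw": node_total_kw,
        "managed_per_node_kw": managed_per_node_kw,
        "static_per_node_kw": static_per_node_kw,
        "budget_kw": rack_kw - reduction_kw,
        "managed_envelope_kw": managed_envelope_kw,
        # The whole reduction has to come out of the managed envelope.
        "managed_cut_pct": 100.0 * reduction_kw / managed_envelope_kw,
    }


if __name__ == "__main__":
    for name, value in gb200_rack_budget().items():
        print(f"{name}: {value:.1f}")
```

Calling `gb200_rack_budget(margin=0.10)` shows the same effect for a smaller margin: the percentage cut to the managed envelope is always larger than the percentage cut to the total budget, because static power cannot be reduced.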
Will PRS PDs update when a node is removed from BCM or a category?#
Yes—any removal is evaluated by CMD and sent to the PRS config server automatically.
Will PRS PDs update when a new node is added to a category?#
Yes—additions are likewise picked up by CMD and pushed to PRS.
What is the lifecycle of updates to the PRS config server?#
Any change that can affect PRS (node, category, overlay) is evaluated by CMD and sent to the PRS config server.
Does PRS reflect autoscaler changes automatically?#
Yes—nodes inherit properties from their categories, and when the autoscaler scales in/out, CMD propagates those updates to PRS.
When is the CPU included in PRS-managed devices?#
PRS includes the CPU as a managed device only if it supports power capping. This applies in the following cases:
Grace CPUs: Always supported.
Certain Intel CPUs: Supported if power capping is enabled in the BIOS.
To check if a CPU supports power capping, run:
cmsh -c "device use <node-name> ; sysinfo" | grep Core
Look for a line that includes “powerCapEnabled” in the output. If it is set to yes, PRS will manage the CPU’s power limit.
What does it mean to stop or start a PD?#
Starting or stopping a PD controls whether PRS actively manages power for the nodes in that PD.
Start: When a PD is in the running state, PRS continuously monitors and adjusts power limits for the nodes assigned to it. The control loop is active, and PRS enforces the power budget dynamically based on workload and predictions.
Stop: When a PD is in the stopped state, PRS halts power control operations for that PD. No new power limits are applied, but the last limits set by PRS remain in effect. PRS does not reset or release those limits—it simply suspends further updates until the PD is started again.
This functionality is useful for scenarios like maintenance, staged rollouts, or debugging.
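The start/stop behavior can be pictured with a toy Python model. This is only a sketch of the semantics described above, not PRS code: the class `PowerDomainSketch`, its methods, and the node name are all hypothetical.

```python
class PowerDomainSketch:
    """Toy model of a PD's running/stopped states (hypothetical names)."""

    def __init__(self):
        self.running = False
        self.limits = {}  # node name -> last power limit applied, in watts

    def start(self):
        # Control loop becomes active: new limits can be applied.
        self.running = True

    def stop(self):
        # Control loop is suspended. Existing limits are deliberately
        # left untouched: they are neither reset nor released.
        self.running = False

    def apply_limit(self, node, watts):
        """Apply a new limit only while running; return whether it was applied."""
        if not self.running:
            return False
        self.limits[node] = watts
        return True


if __name__ == "__main__":
    domain = PowerDomainSketch()
    domain.start()
    domain.apply_limit("node001", 900)
    domain.stop()
    domain.apply_limit("node001", 1200)  # ignored while the PD is stopped
    print(domain.limits)                 # the limit set before the stop is still there
```

The point of the model is the `stop` method: it changes only the state flag, so the last limits stay in effect until the PD is started again and a new limit is applied.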