Power Policies
Overview
Power policies define specific power limits and configurations that can be applied to topology entities. They specify maximum power consumption for different component types (nodes, GPUs, CPUs, memory) and serve as the primary mechanism for controlling power usage across datacenter equipment.
Policy Structure
Each power policy includes:
- Name - Unique identifier (e.g., “Node-High”, “GPU-Efficiency”)
- Limits - Power constraints for different component types
Power limits can be specified as:
- Absolute Watts - Fixed power limits (e.g., 700W per GPU)
- Percentage - Relative to device capabilities (e.g., 80% of max power)
Both formats are supported and can be mixed within the same policy.
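As a rough illustration of how the two formats relate, the sketch below resolves a PowerLimit entry to a value in watts. The helper name `resolve_watts`, the `max_watts` argument, and the sample device maximums are hypothetical and not part of DPS; only the `Watts` and `Percentage` keys come from the policy examples in this document. It also assumes each entry carries exactly one of the two keys.

```python
def resolve_watts(power_limit, max_watts):
    """Return the limit in watts for one PowerLimit entry (hypothetical helper).

    power_limit is assumed to hold exactly one of "Watts" or "Percentage".
    max_watts is the device's maximum power; it is used only for percentages.
    """
    has_watts = "Watts" in power_limit
    has_pct = "Percentage" in power_limit
    if has_watts == has_pct:
        raise ValueError("PowerLimit needs exactly one of 'Watts' or 'Percentage'")
    if has_watts:
        return float(power_limit["Watts"])
    return max_watts * power_limit["Percentage"] / 100.0


# A mixed policy: the GPU limit is absolute, the CPU limit is relative.
limits = [
    {"ElementType": "GPU", "PowerLimit": {"Watts": 700}},
    {"ElementType": "CPU", "PowerLimit": {"Percentage": 80}},
]

# Assumed device maximums, for illustration only.
device_max = {"GPU": 700.0, "CPU": 350.0}

resolved = {
    entry["ElementType"]: resolve_watts(
        entry["PowerLimit"], device_max[entry["ElementType"]]
    )
    for entry in limits
}
print(resolved)
```

Keeping the conversion in one place means a percentage limit and an absolute limit can be compared or summed in the same unit, whichever format the policy author chose.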
Policy Examples
High Performance Policy
{
  "Name": "Node-High",
  "Limits": [
    {"ElementType": "Node", "PowerLimit": {"Watts": 10200}},
    {"ElementType": "GPU", "PowerLimit": {"Watts": 7650}},
    {"ElementType": "CPU", "PowerLimit": {"Watts": 1530}},
    {"ElementType": "Memory", "PowerLimit": {"Watts": 1020}}
  ]
}
Note: Different hardware platforms may use different power limits.
Balanced Policy
{
  "Name": "Node-Med",
  "Limits": [
    {"ElementType": "Node", "PowerLimit": {"Watts": 7140}},
    {"ElementType": "GPU", "PowerLimit": {"Watts": 5355}},
    {"ElementType": "CPU", "PowerLimit": {"Watts": 1071}},
    {"ElementType": "Memory", "PowerLimit": {"Watts": 714}}
  ]
}
Power Saving Policy
{
  "Name": "Node-Low",
  "Limits": [
    {"ElementType": "Node", "PowerLimit": {"Watts": 5250}},
    {"ElementType": "GPU", "PowerLimit": {"Watts": 2800}},
    {"ElementType": "CPU", "PowerLimit": {"Watts": 720}},
    {"ElementType": "Memory", "PowerLimit": {"Watts": 480}}
  ]
}
Policy using percentage limits
Percentage-based policies allow power limits to be specified as a percentage of the device’s maximum power capacity, making them more portable across different hardware configurations:
Note: When using percentage limits, DPS calculates the actual watt values based on the device specifications. For example, a GPU with maxLoadWatts: 700 and an 80% limit would result in a 560W power limit.
{
  "Name": "Node-Efficiency",
  "Limits": [
    {"ElementType": "Node", "PowerLimit": {"Percentage": 70}},
    {"ElementType": "GPU", "PowerLimit": {"Percentage": 80}},
    {"ElementType": "CPU", "PowerLimit": {"Percentage": 75}},
    {"ElementType": "Memory", "PowerLimit": {"Percentage": 60}}
  ]
}
Mixed Policy (Absolute and Percentage)
Policies can combine both absolute watt values and percentage limits for different component types:
{
  "Name": "Node-Hybrid",
  "Limits": [
    {"ElementType": "Node", "PowerLimit": {"Percentage": 85}},
    {"ElementType": "GPU", "PowerLimit": {"Watts": 600}},
    {"ElementType": "CPU", "PowerLimit": {"Percentage": 90}},
    {"ElementType": "Memory", "PowerLimit": {"Watts": 800}}
  ]
}
Policy Application Hierarchy
Power policies are applied in a three-level hierarchy, from the base default to the most granular override:
1. Topology Default Policies (Base Level)
{
  "Type": "ComputerSystem",
  "Name": "node001",
  "Policy": "Node-Med"
}
2. Resource Group Policies (Override Level)
dpsctl resource-group create \
  --resource-group "ml-training" \
  --policy "Node-High"
3. Entity-Specific Policies (Granular Level)
dpsctl resource-group update \
  --resource-group "ml-training" \
  --entity node001 \
  --policy "GPU-Optimized"
Complete Policy Set Example
{
  "Policies": [
    {
      "Name": "Node-Low",
      "Limits": [
        {"ElementType": "Node", "PowerLimit": {"Watts": 5250}},
        {"ElementType": "GPU", "PowerLimit": {"Watts": 2800}},
        {"ElementType": "CPU", "PowerLimit": {"Watts": 720}},
        {"ElementType": "Memory", "PowerLimit": {"Watts": 480}}
      ]
    },
    {
      "Name": "Node-Med",
      "Limits": [
        {"ElementType": "Node", "PowerLimit": {"Watts": 7140}},
        {"ElementType": "GPU", "PowerLimit": {"Watts": 5355}},
        {"ElementType": "CPU", "PowerLimit": {"Watts": 1071}},
        {"ElementType": "Memory", "PowerLimit": {"Watts": 714}}
      ]
    },
    {
      "Name": "Node-High",
      "Limits": [
        {"ElementType": "Node", "PowerLimit": {"Watts": 10200}},
        {"ElementType": "GPU", "PowerLimit": {"Watts": 7650}},
        {"ElementType": "CPU", "PowerLimit": {"Watts": 1530}},
        {"ElementType": "Memory", "PowerLimit": {"Watts": 1020}}
      ]
    }
  ]
}
Usage
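Before importing, a policy set like the one shown above can be sanity-checked offline. The sketch below is a hypothetical helper, not a DPS tool. It checks only what this page states: policy names are unique, and each limit is given either in watts or as a percentage. The function name `check_policy_set` and the wording of the messages are assumptions made for illustration.

```python
import json


def check_policy_set(doc):
    """Return a list of problems found in a policy set (empty if none)."""
    problems = []
    seen = set()
    for policy in doc.get("Policies", []):
        name = policy.get("Name")
        if not name:
            problems.append("policy without a Name")
        elif name in seen:
            problems.append("duplicate policy name: " + name)
        else:
            seen.add(name)
        for limit in policy.get("Limits", []):
            # Count how many of the two supported formats are present.
            formats = set(limit.get("PowerLimit", {})) & {"Watts", "Percentage"}
            if len(formats) != 1:
                problems.append(
                    "{}/{}: expected exactly one of Watts or Percentage".format(
                        name, limit.get("ElementType")
                    )
                )
    return problems


# A small policy set in the same shape as the example above.
sample = json.loads("""
{
  "Policies": [
    {"Name": "Node-Low",
     "Limits": [{"ElementType": "Node", "PowerLimit": {"Watts": 5250}}]},
    {"Name": "Node-Efficiency",
     "Limits": [{"ElementType": "GPU", "PowerLimit": {"Percentage": 80}}]}
  ]
}
""")

problems = check_policy_set(sample)
print(problems if problems else "policy set looks consistent")
```

A check like this only catches structural slips. Whether a given watt value is sensible for a device still depends on the device specification.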
Import Policies
# Import policies from topology file
dpsctl topology import datacenter.json
# List available policies
dpsctl policy list
Apply to Entities
# Set entity default policy
dpsctl entity update node001 --policy "Node-High"
Use in Resource Groups
# Create resource group with policy
dpsctl resource-group create \
  --resource-group "ml-training" \
  --policy "Node-High"
# Update resource group policy
dpsctl resource-group update \
  --resource-group "ml-training" \
  --policy "Node-Med"
GPU-Level Control
# Set per-GPU power limits for the node in the resource group
dpsctl gpu-policy \
  --node node001=500,550,600,700,650,700,550,600
Policy Implementation
Power policies are implemented through device-specific plugins that:
- Translate policy limits into hardware commands
- Communicate with BMCs through Redfish APIs
- Monitor compliance and report status
# Device specification references policy plugin
- type: ComputerSystem
  model: DGX_H100
  spec:
    powerPolicyPlugin: DGX_H100
For the standard devices that are bundled with DPS, specifying powerPolicyPlugin is optional. Internally, DPS stores a default power policy plugin for the given model name, if applicable.
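The fallback described above can be pictured as a two-step lookup: use the plugin named in the device specification if there is one, otherwise fall back to a default keyed by model name. The sketch below is illustrative only. The `DEFAULT_PLUGINS` table, its single entry, and the `select_plugin` function are assumptions, since the actual mapping is internal to DPS.

```python
from typing import Optional

# Hypothetical table of bundled defaults, keyed by model name.
# The entry below is an assumption based on the example specification.
DEFAULT_PLUGINS = {"DGX_H100": "DGX_H100"}


def select_plugin(device_spec: dict) -> Optional[str]:
    """Pick the power policy plugin for a device specification.

    An explicit spec.powerPolicyPlugin wins; otherwise the model's
    default is used, and None is returned when no default is known.
    """
    explicit = device_spec.get("spec", {}).get("powerPolicyPlugin")
    if explicit:
        return explicit
    return DEFAULT_PLUGINS.get(device_spec.get("model"))


# Specification without an explicit plugin: the model default applies.
bundled = {"type": "ComputerSystem", "model": "DGX_H100"}
# Specification with a made-up explicit plugin: it takes precedence.
custom = {
    "type": "ComputerSystem",
    "model": "DGX_H100",
    "spec": {"powerPolicyPlugin": "MyPlugin"},
}

print(select_plugin(bundled))
print(select_plugin(custom))
```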
Further Reading
- Device Specifications - Define policy plugin capabilities
- Entities - Apply default policies to hardware
- Resource Groups - Dynamic policy application
- Topologies - Entity default policy configuration
- Workload Power Profiles Settings - GPU optimization profiles for specific workloads