Power Compliance
The Domain Power Service (DPS) ensures power compliance through a validation system that operates at two critical stages: topology activation and resource group creation. This guide explains how DPS maintains power compliance by validating constraints and automatically adjusting policies when necessary.
Overview
DPS power compliance is achieved through:
- Topology Constraint Validation - Ensuring the power topology can support the defined constraints before activation
- Dynamic Resource Group Validation - Validating power requirements when new resource groups are created
- Automatic Policy Adjustment - Falling back to lower power policies when constraints cannot be met
- Continuous Monitoring - Ongoing validation of power consumption against defined limits
Topology Activation Compliance
Initial Validation Process
When a topology is activated, DPS performs validation to ensure all power constraints can be satisfied:
1. Power Budget Verification
- Total Power Calculation: DPS calculates the total power requirements across all entities in the topology
- Constraint Checking: Validates that power domains have sufficient capacity to support their child entities
- Hierarchy Validation: Ensures power flows correctly through the entire power distribution hierarchy
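The budget check described above can be pictured as a recursive walk over the topology. The following is a minimal sketch, not the DPS implementation: the `Entity` class, its fields, and the wattage values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entity:
    """Hypothetical topology entity; not the DPS data model."""
    name: str
    capacity_w: Optional[float] = None  # limit for domains/PDUs; None for leaf nodes
    demand_w: float = 0.0               # draw of a leaf node; ignored when children exist
    children: List["Entity"] = field(default_factory=list)


def total_demand_w(entity: Entity) -> float:
    """Sum the demand of all leaf nodes at or beneath this entity."""
    if not entity.children:
        return entity.demand_w
    return sum(total_demand_w(child) for child in entity.children)


def find_violations(entity: Entity) -> List[Tuple[str, float, float]]:
    """Return (name, demand, capacity) for every entity whose demand exceeds its capacity."""
    violations = []
    demand = total_demand_w(entity)
    if entity.capacity_w is not None and demand > entity.capacity_w:
        violations.append((entity.name, demand, entity.capacity_w))
    for child in entity.children:
        violations.extend(find_violations(child))
    return violations


if __name__ == "__main__":
    # Assumed example values: pdu-a is oversubscribed, the domain as a whole is not.
    topology = Entity("domain", capacity_w=3000, children=[
        Entity("pdu-a", capacity_w=1500, children=[
            Entity("node-01", demand_w=800), Entity("node-02", demand_w=800)]),
        Entity("pdu-b", capacity_w=1500, children=[
            Entity("node-03", demand_w=600), Entity("node-04", demand_w=600)]),
    ])
    for name, demand, capacity in find_violations(topology):
        print(f"{name}: demand {demand}W exceeds capacity {capacity}W")
```

Checking every level rather than only the root matters because a domain can have spare capacity overall while one of its PDUs is oversubscribed.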
2. Power Factor Validation
- Loss Calculations: Accounts for power losses through power distribution components using defined power factors
- Efficiency Modeling: Validates that power supply efficiency ratings are realistic for the expected loads
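As an illustration of the loss calculation, the sketch below assumes that each power factor is the fraction of input power a distribution component delivers downstream (a value in the range (0, 1]). That interpretation, the function name, and the numbers are assumptions; the actual DPS calculation may differ.

```python
from typing import List


def upstream_demand_w(load_w: float, power_factors: List[float]) -> float:
    """Power that must be supplied at the top of a chain of components
    so that load_w arrives at the bottom.

    power_factors lists the assumed delivery fraction of each component
    on the path from the load up to the source.
    """
    demand = load_w
    for factor in power_factors:
        if not 0 < factor <= 1:
            raise ValueError(f"power factor must be in (0, 1], got {factor}")
        demand /= factor  # a lossy component needs more input than it delivers
    return demand


if __name__ == "__main__":
    # 9000 W of node load behind a component that delivers 90% of its input.
    print(round(upstream_demand_w(9000, [0.9]), 1))
```

Under this assumption, a domain budget has to cover the upstream figure, not just the sum of node loads.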
3. Activation Failure Scenarios
Topology activation fails if any of the following conditions holds:
- Total power demand exceeds available power budget in any power domain
- Power distribution components cannot handle the calculated power flows
- Constraint violations are detected at any level of the power hierarchy
Resource Group Compliance
Dynamic Validation During Creation
When resource groups are created (typically through Slurm prolog scripts), DPS performs real-time compliance validation:
1. Policy Application Validation
- Resource Allocation: DPS calculates power requirements for the specific nodes allocated to the resource group
- Policy Integration: Applies the requested power policy to the allocated resources
- Constraint Verification: Validates that the modified topology still satisfies all power constraints
2. Power Tree Constraint Checking
DPS validates the entire power tree with the new policy applied:
PowerDomain (Root)
├── PDU-A
│   ├── Node-01 [Applied Policy: High-Performance]
│   └── Node-02 [Applied Policy: High-Performance]
└── PDU-B
    ├── Node-03 [Default Policy]
    └── Node-04 [Default Policy]
The validation ensures:
- The root power domain can supply the total demand
- Each PDU can handle its assigned load
- No thermal or electrical limits are exceeded
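A minimal sketch of the electrical part of this check, using the four-node tree from the example above. The per-policy wattages and the PDU and root limits are hypothetical values, and thermal limits are not modeled.

```python
from typing import Dict, List

# Hypothetical per-node draw for each policy, in watts (illustrative values only).
POLICY_WATTS = {"High-Performance": 1000, "Default": 600}

# Tree layout from the example; the capacity limits are assumed values.
PDU_NODES = {"PDU-A": ["Node-01", "Node-02"], "PDU-B": ["Node-03", "Node-04"]}
PDU_LIMIT_W = {"PDU-A": 2200, "PDU-B": 1500}
ROOT_LIMIT_W = 3500


def check_tree(applied: Dict[str, str]) -> List[str]:
    """Return the names of entities whose limit would be exceeded.

    applied maps node name to policy name; unlisted nodes use "Default".
    """
    failed = []
    root_load = 0
    for pdu, nodes in PDU_NODES.items():
        load = sum(POLICY_WATTS[applied.get(node, "Default")] for node in nodes)
        root_load += load
        if load > PDU_LIMIT_W[pdu]:
            failed.append(pdu)
    if root_load > ROOT_LIMIT_W:
        failed.append("PowerDomain")
    return failed


if __name__ == "__main__":
    # Policy applied to the two nodes under PDU-A, as in the tree above.
    print(check_tree({"Node-01": "High-Performance", "Node-02": "High-Performance"}))
```

An empty result means every PDU and the root domain stay within their limits with the new policy applied.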
3. Resource Group Creation Process
- Initial Request: Resource group creation request with desired power policy
- Power Calculation: DPS calculates total power requirements with the requested policy
- Constraint Validation: Validates against all topology constraints
- Success Path: If validation passes, resource group is created with the requested policy
- Fallback Path: If validation fails, DPS attempts policy adjustment
Automatic Policy Adjustment
Policy Fallback Mechanism
When the requested power policy cannot be satisfied, DPS falls back through progressively lower-power policies until one satisfies the constraints or no policies remain.
1. Policy Hierarchy
DPS maintains a hierarchy of power policies from highest to lowest power consumption, for example:
High-Performance → Balanced → Power-Saver → Emergency
2. Automatic Adjustment Process
- Validation Failure: Initial policy validation fails due to power constraints
- Policy Downgrade: DPS automatically tries the next lower power policy
- Re-validation: Performs constraint validation with the lower policy
- Iteration: Continues until a compliant policy is found or all options are exhausted
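The adjustment loop above can be sketched as follows. This is an illustration under stated assumptions, not DPS code: the function name, the per-node wattages, and the 13000 W budget are hypothetical, and validation is reduced to a single budget comparison.

```python
from typing import List, Optional, Tuple

# Policy hierarchy from highest to lowest power consumption.
POLICY_ORDER = ["High-Performance", "Balanced", "Power-Saver", "Emergency"]

# Hypothetical per-node draw for each policy, in watts.
PER_NODE_W = {
    "High-Performance": 7750,
    "Balanced": 6900,
    "Power-Saver": 6250,
    "Emergency": 3000,
}


def select_policy(
    requested: str, nodes: List[str], budget_w: int
) -> Tuple[Optional[str], List[Tuple[str, int]]]:
    """Try the requested policy, then each lower one, until the demand fits.

    Returns (chosen policy or None, list of (policy, deficit in watts) attempts).
    """
    attempts = []
    start = POLICY_ORDER.index(requested)
    for policy in POLICY_ORDER[start:]:
        demand = PER_NODE_W[policy] * len(nodes)
        deficit = max(demand - budget_w, 0)
        attempts.append((policy, deficit))
        if deficit == 0:
            return policy, attempts  # first compliant policy wins
    return None, attempts  # all options exhausted


if __name__ == "__main__":
    chosen, attempts = select_policy(
        "High-Performance", ["node-01", "node-02"], budget_w=13000
    )
    for policy, deficit in attempts:
        status = "SUCCESS" if deficit == 0 else f"FAILED (deficit: {deficit}W)"
        print(f"{policy}: {status}")
    print("Applied policy:", chosen)
```

Starting the search at the requested policy, rather than at the top of the hierarchy, means a resource group is never given a higher-power policy than it asked for.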
3. Fallback Example
# Resource group creation with automatic policy adjustment
Initial Request: Policy "High-Performance" for nodes [node-01, node-02]
Validation Result: FAILED - Insufficient power budget (deficit: 2500W)
Attempting Fallback: Policy "Balanced"
Validation Result: FAILED - Insufficient power budget (deficit: 800W)
Attempting Fallback: Policy "Power-Saver"
Validation Result: SUCCESS - Resource group created with "Power-Saver" policy
Resource Group ID: rg-12345
Applied Policy: Power-Saver
Allocated Nodes: node-01, node-02
Power Consumption: 12500W (within budget)
Best Practices for Compliance
1. Topology Design
- Conservative Budgeting: Allocate 10-15% headroom in power domains for unexpected demands
- Realistic Power Factors: Use measured power factors rather than theoretical values
- Hierarchical Constraints: Implement constraints at multiple levels for better granularity
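As a quick worked example of conservative budgeting, the sketch below derates a domain's rated capacity by a headroom fraction. The function name and the 20000 W capacity are illustrative assumptions.

```python
def usable_budget_w(capacity_w: float, headroom_fraction: float = 0.15) -> float:
    """Budget left for planned load after reserving headroom.

    headroom_fraction is the share of capacity held back, e.g. 0.10 to 0.15.
    """
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    return capacity_w * (1 - headroom_fraction)


if __name__ == "__main__":
    # A 20000 W domain with 10% and 15% headroom.
    for fraction in (0.10, 0.15):
        print(fraction, round(usable_budget_w(20000, fraction)))
```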
2. Policy Management
- Graduated Policies: Create policies with gradual power differences to enable smooth fallbacks
- Emergency Policies: Always maintain ultra-low power emergency policies
3. Monitoring and Maintenance
- Regular Validation: Periodically re-validate topology constraints as hardware changes
- Trend Analysis: Monitor power consumption trends to predict future capacity needs
Troubleshooting Compliance Issues
Common Compliance Problems
1. Topology Activation Failures
Problem: Topology fails to activate due to power budget violations.
Solution:
- Review power domain capacities and increase if necessary
- Adjust default topology entity policies
- Validate entity power specifications
2. Resource Group Creation Failures
Problem: All policy fallbacks fail during resource group creation.
Solution:
- Review current power allocation across existing resource groups
- Implement more aggressive power-saving policies
Integration with Workload Managers
Slurm Integration Compliance
When integrated with Slurm, DPS compliance checks run as part of the job lifecycle:
- Job Submission: User submits job with power requirements
- Prolog Execution: Slurm prolog script calls DPS to create resource group
- Compliance Check: DPS validates power compliance for allocated nodes
- Policy Adjustment: If needed, DPS automatically adjusts to compliant policy
- Job Execution: Job runs with compliant power configuration
- Epilog Cleanup: Slurm epilog script removes resource group, freeing power budget
This integration keeps computational workloads within the limits of the power infrastructure while maximizing the performance available under those constraints.
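The prolog and epilog steps above amount to reserving and releasing power budget around each job. The sketch below models that bookkeeping in memory. The `PowerLedger` class and its method names are hypothetical and do not represent the DPS or Slurm interfaces; a real prolog script would call DPS instead.

```python
from typing import Dict


class PowerLedger:
    """Hypothetical in-memory model of a power budget shared by jobs."""

    def __init__(self, budget_w: int) -> None:
        self.budget_w = budget_w
        self.reserved: Dict[str, int] = {}  # job id -> reserved watts

    def available_w(self) -> int:
        """Budget not currently reserved by any job."""
        return self.budget_w - sum(self.reserved.values())

    def prolog(self, job_id: str, demand_w: int) -> bool:
        """Reserve power for a job; return False if it would exceed the budget."""
        if demand_w > self.available_w():
            return False
        self.reserved[job_id] = demand_w
        return True

    def epilog(self, job_id: str) -> None:
        """Release the job's reservation, freeing budget for later jobs."""
        self.reserved.pop(job_id, None)


if __name__ == "__main__":
    ledger = PowerLedger(budget_w=10000)
    print(ledger.prolog("101", 6000))  # first job fits
    print(ledger.prolog("102", 6000))  # second job does not fit yet
    ledger.epilog("101")               # first job ends, budget is freed
    print(ledger.prolog("102", 6000))  # second job now fits
```

The sketch rejects a job that does not fit; in the flow described above, DPS would first attempt a lower-power policy before giving up.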