Workload Power Profiles Settings
Overview
Workload Power Profiles Settings (WPPS) are NVIDIA’s pre-tuned GPU power optimization profiles. DPS provides a mechanism to enable and disable these profiles on supported GPUs through resource group settings.
WPPS profiles are designed for different performance and power optimization goals (e.g., Max-Q vs. Max-P) and automatically configure GPU power settings without requiring manual tuning.
Key Concepts
Profile IDs
WPPS profiles are identified by numeric IDs ranging from 3 to 258. DPS allows you to:
- Enable specific profile IDs for your workloads
- Enable multiple profiles simultaneously (GPU firmware handles conflicts automatically)
- Apply profiles to entire resource groups for consistent optimization
Important: The power profile values configured via DPS use Out-Of-Band (OOB) management (Redfish APIs on the BMC), which uses a different indexing system than the one used in DCGMI and NVSMI. Specifically, the OOB values are offset by 3:
OOB Value = DCGMI/NVSMI Value + 3
For example, if a profile is set to 1 in DCGMI or NVSMI, it should be set to 4 in OOB management.
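As a quick sanity check, the offset can be applied with shell arithmetic; the variable names below are illustrative only, and only the +3 offset comes from the rule above.
# Convert an in-band (DCGMI/NVSMI) profile ID to its OOB (Redfish) value
DCGMI_PROFILE_ID=1                                # illustrative value
OOB_PROFILE_ID=$((DCGMI_PROFILE_ID + 3))
echo "DCGMI/NVSMI ID ${DCGMI_PROFILE_ID} -> OOB ID ${OOB_PROFILE_ID}"   # prints "... 1 -> OOB ID 4"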
Example Usage
# Enable single profile
--workload-profile-ids 5
# Enable multiple profiles
--workload-profile-ids 3,7,12
Profile States
For reference, each GPU maintains three profile state masks that are managed automatically by WPPS:
- Supported Profile Mask - Profiles available on the hardware (read-only)
- Requested Profile Mask - Profiles requested by DPS
- Enforced Profile Mask - Profiles currently active on the GPU
DPS users don’t need to manage these states directly - they are handled automatically by the GPU firmware.
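For intuition only, here is a purely illustrative sketch of how the requested and enforced masks relate, assuming each mask is an integer bitmask in which bit N corresponds to profile ID N (this encoding and the example values are assumptions, not taken from real hardware).
# Assumption: each mask is an integer bitmask where bit N corresponds to profile ID N
is_set() { (( ($1 >> $2) & 1 )); }
REQUESTED_MASK=0xA0   # illustrative value: bits 5 and 7 set
ENFORCED_MASK=0x20    # illustrative value: bit 5 set
if is_set "$REQUESTED_MASK" 7 && ! is_set "$ENFORCED_MASK" 7; then
  echo "Profile 7 requested but not yet enforced"
fi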
DPS Integration and Operations
Resource Group Management
DPS provides simple commands to manage WPPS through resource groups:
# Create resource group with WPPS
dpsctl resource-group create \
--resource-group "ml-training" \
--policy "Node-High" \
--workload-profile-ids 5,7
# Update existing resource group profiles
dpsctl resource-group update \
--resource-group "ml-training" \
--workload-profile-ids 3,12
# Remove all profiles
dpsctl resource-group update \
--resource-group "training-job" \
--remove-all-workload-profiles
# Delete existing resource group with WPPS
dpsctl resource-group delete \
--resource-group "ml-training"
DPS applies the profiles to all GPUs in the resource group and handles any conflicts automatically. When a resource group configured with WPPS is deleted, DPS removes the WPPS profiles it configured.
SLURM Integration
Use SLURM job comments to automatically apply WPPS:
# Submit job with WPPS
sbatch --comment="dps_policy:Node-High,dps_wpps:3,9" training_job.sh
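The comment can also live in the job script itself as an #SBATCH directive, assuming DPS reads the job comment the same way regardless of how it was set; the resource requests and training command below are placeholders.
#!/bin/bash
#SBATCH --job-name=ml-training                           # placeholder job name
#SBATCH --comment="dps_policy:Node-High,dps_wpps:3,9"    # DPS policy and WPPS profile IDs
#SBATCH --gres=gpu:8                                     # placeholder GPU request

srun python train.py                                     # placeholder training command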
Device Requirements
GPUs must have wppsSupport: true in their device specification to use WPPS.
Note: WPPS is not supported on GPU architectures before Blackwell.
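One quick way to check is to search the device specification files for the flag; the path below is a placeholder and assumes YAML-formatted specifications, so adjust it to your deployment.
# Placeholder path and YAML assumption; substitute your deployment's device spec location
grep -rl "wppsSupport: true" /etc/dps/device-specs/ \
  || echo "No WPPS-capable device specifications found"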
Best Practices
- Choose appropriate profiles for your workload type (training vs inference)
- Test profile combinations to find optimal performance for your specific use case (see the sweep sketch after this list)
- Use SLURM job comments for automatic profile selection
- Monitor job performance to validate profile effectiveness
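As a starting point for testing profile combinations, the same job can be swept over several WPPS settings using the comment mechanism shown above; the profile IDs below are arbitrary examples, not recommendations.
# Submit the same job under different WPPS combinations (IDs are examples only)
for wpps in "3" "5,7" "3,9,12"; do
  sbatch --comment="dps_policy:Node-High,dps_wpps:${wpps}" training_job.sh
done
Job runtimes can then be compared (for example with sacct) to judge which combination performs best for the workload.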
Troubleshooting
If WPPS profiles aren’t being applied:
- Check device support: Verify GPU specifications include wppsSupport: true
- Verify profile IDs: Ensure IDs are in the valid range (3-258)
- Check logs: Monitor DPS server logs for profile application errors (see the sketch below)
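For the log check, the service journal is a common starting point; the unit name below is an assumption, so substitute the name (or container logs) used by your deployment.
# "dps-server" is an assumed unit name; adjust to match your installation
journalctl -u dps-server --since "1 hour ago" | grep -i "profile"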
Further Reading
- Power Policies - Static power limit management
- Resource Groups - Dynamic workload resource management
- Device Specifications - Hardware capability definitions
- Redfish API - Low-level hardware communication protocol
- SLURM Integration - Automated profile selection for jobs