Workload Power Profiles Settings

Overview

Workload Power Profiles Settings (WPPS) are NVIDIA’s pre-tuned GPU power optimization profiles. DPS provides a mechanism to enable and disable these profiles on supported GPUs through resource group settings.

WPPS profiles are designed for different performance and power optimization goals (e.g. Max-Q vs Max-P) and automatically configure GPU power settings without requiring manual tuning.

Key Concepts

Profile IDs

WPPS profiles are identified by numeric IDs ranging from 3 to 258. DPS allows you to:

  • Enable specific profile IDs for your workloads
  • Enable multiple profiles simultaneously (GPU firmware handles conflicts automatically)
  • Apply profiles to entire resource groups for consistent optimization

Important: The power profile values configured via DPS use Out-Of-Band (OOB) management (Redfish APIs on the BMC), which uses a different indexing system than the one used in DCGMI and NVSMI. Specifically, the OOB values are offset by 3:

OOB Value = DCGMI/NVSMI Value + 3

For example, if a profile is set to 1 in DCGMI or NVSMI, it should be set to 4 in OOB management.
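
As a quick illustration, the offset can be applied with simple shell arithmetic (a minimal sketch; the variable names are only for illustration):

# Convert a DCGMI/NVSMI profile index to the OOB value expected by DPS
dcgmi_id=1
oob_id=$((dcgmi_id + 3))
echo "DCGMI/NVSMI profile ${dcgmi_id} maps to OOB profile ${oob_id}"   # prints 4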

Example Usage

# Enable single profile
--workload-profile-ids 5

# Enable multiple profiles
--workload-profile-ids 3,7,12

Profile States

For reference, each GPU maintains three profile state masks that are managed automatically by WPPS:

  • Supported Profile Mask - Profiles available on the hardware (read-only)
  • Requested Profile Mask - Profiles requested by DPS
  • Enforced Profile Mask - Profiles currently active on the GPU

DPS users don’t need to manage these states directly - they are handled automatically by the GPU firmware.
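
For context, a profile mask is typically a bitmask in which each bit position corresponds to a profile ID. The snippet below is a minimal sketch of what such a membership check could look like; the mask value and the one-bit-per-profile-ID layout are assumptions for illustration, since DPS and the GPU firmware handle these masks for you.

# Illustrative only: check whether a profile ID is set in a mask value,
# assuming one bit per profile ID (DPS manages these masks automatically)
mask=0xA8        # example mask value with bits 3, 5, and 7 set
profile_id=5
if (( (mask >> profile_id) & 1 )); then
  echo "profile ${profile_id} is enabled in the mask"
fi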

DPS Integration and Operations

Resource Group Management

DPS provides simple commands to manage WPPS through resource groups:

# Create resource group with WPPS
dpsctl resource-group create \
  --resource-group "ml-training" \
  --policy "Node-High" \
  --workload-profile-ids 5,7

# Update existing resource group profiles
dpsctl resource-group update \
  --resource-group "ml-training" \
  --workload-profile-ids 3,12

# Remove all profiles
dpsctl resource-group update \
  --resource-group "training-job" \
  --remove-all-workload-profiles

# Delete an existing resource group with WPPS
dpsctl resource-group delete \
  --resource-group "ml-training"

DPS applies the profiles to all GPUs in the resource group and handles any conflicts automatically. When a resource group configured with WPPS is deleted, DPS removes the WPPS profiles it configured.

SLURM Integration

Use SLURM job comments to automatically apply WPPS:

# Submit job with WPPS
sbatch --comment="dps_policy:Node-High,dps_wpps:3,9" training_job.sh

Device Requirements

GPUs must have wppsSupport: true in their device specification to use WPPS.
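
For example, a supported GPU’s device specification will contain the flag below. The file name and retrieval step are placeholders for illustration; how the spec is obtained depends on your deployment.

# Hypothetical check against a device specification file (path is a placeholder)
grep "wppsSupport" gpu-device-spec.yaml
# Expected output on a supported GPU:
#   wppsSupport: true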

Note: WPPS is not supported on GPU architectures before Blackwell.

Best Practices

  • Choose appropriate profiles for your workload type (training vs inference)
  • Test profile combinations to find optimal performance for your specific use case
  • Use SLURM job comments for automatic profile selection
  • Monitor job performance to validate profile effectiveness

Troubleshooting

If WPPS profiles aren’t working:

  1. Check device support: Verify GPU specifications include wppsSupport: true
  2. Verify profile IDs: Ensure IDs are in the valid range (3-258); see the quick check after this list
  3. Check logs: Monitor DPS server logs for profile application errors
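
The range check from step 2 can be done ahead of time with a short shell loop (a minimal sketch; the example IDs are arbitrary):

# Flag any requested profile IDs outside the valid 3-258 range
for id in 5 7 300; do
  if (( id < 3 || id > 258 )); then
    echo "profile ID ${id} is out of range (valid: 3-258)"
  fi
done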

Further Reading