SLURM Integration
Integrating the Domain Power Service (DPS) with Slurm allows for automated, workload-specific power management through Slurm’s job scheduling. This integration is primarily achieved by configuring prolog and epilog scripts within your Slurm environment. These scripts create and manage DPS resource groups that correspond to High-Performance Computing (HPC) jobs.
Prerequisites for Slurm Integration
Before integrating DPS with Slurm, ensure the following are in place:
- A Slurm cluster is installed and configured.
- DPS server is running and accessible.
- The dpsctl command-line client is installed and authenticated on the head node.
- Device specifications are defined in DPS.
- An active topology with entities configured exists in DPS.
- Your user account has administrative access to Slurm and root access on the head node.
How DPS Integrates with Slurm
DPS integration with Slurm is managed through the PrologSlurmctld and EpilogSlurmctld parameters in the slurm.conf file. These parameters point to scripts that execute automatically before a job starts and after it completes. Each HPC job is represented in DPS as a resource group.
In a nutshell: The prolog and epilog scripts simply execute dpsctl commands to manage the compute nodes that Slurm has allocated for each job. When a job starts, the prolog script uses dpsctl to create a resource group and add the allocated nodes to it. When the job ends, the epilog script uses dpsctl to clean up by removing the resource group.
1. Prolog Script (Job Start)
The prolog script runs before a job starts on a compute node. Its main responsibilities include:
- Creating a resource group using the job ID as an external identifier.
- Adding allocated compute nodes to the newly created resource group.
- Activating power policies to apply specific power configurations to the hardware resources.
- Configuring workload-specific settings, such as power policies, Power Reallocation System (PRS) status, Dynamic Power Management (DPM) status, and Workload Power Profile Settings (WPPS).
- If DPS detects insufficient power to start the job at the requested settings, it may adjust the settings or return a failure; a failed prolog causes Slurm to re-queue the job.
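To make these responsibilities concrete, here is a minimal sketch of what a prolog script might look like. The dpsctl subcommands (create resource-group, add node, activate policy) are hypothetical placeholders, not the actual dpsctl syntax; the SLURM_JOB_ID and SLURM_JOB_NODELIST variables and the scontrol call are standard Slurm.

```bash
#!/bin/bash
# Sketch of a DPS prolog script (PrologSlurmctld).
# NOTE: the dpsctl subcommands below are hypothetical placeholders;
# consult the DPS documentation for the real dpsctl command syntax.
set -euo pipefail

# Slurm exports these variables to PrologSlurmctld scripts.
job_id="${SLURM_JOB_ID:?SLURM_JOB_ID not set}"
nodelist="${SLURM_JOB_NODELIST:?SLURM_JOB_NODELIST not set}"

# Expand the compact nodelist (e.g. "node[01-04]") into hostnames.
nodes=$(scontrol show hostnames "$nodelist")

# Create a resource group keyed to the job ID and add the allocated nodes.
dpsctl create resource-group --external-id "job-${job_id}"    # hypothetical
for node in $nodes; do
    dpsctl add node --resource-group "job-${job_id}" "$node"  # hypothetical
done

# Activate the power policy for the group; any non-zero exit status
# from this script causes Slurm to re-queue the job.
dpsctl activate policy --resource-group "job-${job_id}"       # hypothetical
```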
2. Epilog Script (Job End)
The epilog script runs after a job completes on a compute node. Its primary function is to delete the resource group created by the prolog script. This action ensures that the devices return to their original power configuration, as defined in the topology, after the job has finished.
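A matching epilog sketch, again using a hypothetical dpsctl subcommand rather than the real syntax:

```bash
#!/bin/bash
# Sketch of a DPS epilog script (EpilogSlurmctld).
# NOTE: "dpsctl delete resource-group" is a hypothetical placeholder.

job_id="${SLURM_JOB_ID:?SLURM_JOB_ID not set}"

# Deleting the resource group returns the devices to the power
# configuration defined in the topology.
dpsctl delete resource-group --external-id "job-${job_id}"
```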
Configuration Steps
To integrate DPS with Slurm, you will configure the prolog and epilog scripts in your Slurm configuration.
1. Install DPS Integration Scripts: First, ensure the DPS integration scripts are available on your system. These scripts should be installed alongside the DPS software package.

2. Configure Slurm Prolog/Epilog Scripts: Edit your slurm.conf file to include the prolog and epilog script paths. Add or modify the following lines in your slurm.conf:

```
PrologSlurmctld=/usr/share/dps/prolog.sh
EpilogSlurmctld=/usr/share/dps/epilog.sh
```

Note: Adjust the paths above to match the actual location of the DPS prolog and epilog scripts on your system.

3. Restart Slurm Services: After modifying the slurm.conf file, restart the Slurm controller daemon to apply the changes:

```
sudo systemctl restart slurmctld
```

4. Verify Script Settings: Verify that the script settings were correctly configured in your slurm.conf file:

```
grep -E "(PrologSlurmctld|EpilogSlurmctld)" /etc/slurm/slurm.conf
```

Expected output will show lines like:

```
PrologSlurmctld=/usr/share/dps/prolog.sh
EpilogSlurmctld=/usr/share/dps/epilog.sh
```
Advanced SLURM Integration with Job Comments
Advanced Slurm integration supports passing DPS settings directly through job comments. The prolog script parses these settings automatically and applies the corresponding configuration:
- dps_policy:<string>: Sets a specific power policy.
- dps_prs:<bool>: Enables or disables the Power Reallocation System.
- dps_dpm:<bool>: Enables or disables Dynamic Power Management.
- dps_wpps:<comma-separated ints>: Sets workload power profile IDs for supported GPUs. For WPPS, the Out-Of-Band (OOB) value used by Redfish APIs on the BMC is offset by 3 from the DCGMI/NVSMI value (OOB Value = DCGMI/NVSMI Value + 3).
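As an illustration, these settings might be supplied at submission time through the job's comment field. The space-separated key:value format and the policy name "capped" are assumptions here, not confirmed dpsctl or prolog syntax:

```bash
# Hypothetical submission; the space-separated key:value comment format
# and the policy name "capped" are assumed for illustration.
sbatch --comment="dps_policy:capped dps_prs:false dps_dpm:true dps_wpps:1,1" job.sh

# WPPS offset example: a DCGMI/NVSMI profile value of 1 corresponds to
# an OOB (Redfish/BMC) value of 1 + 3 = 4.
```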
Troubleshooting
- Prolog Script Failures: If the DPS prolog script fails (returns a non-zero exit code), the job will be re-queued. This typically indicates that DPS was unable to configure a corresponding resource group with the requested power settings, and thus the job should not run.
- Logging: The DPS prolog/epilog scripts log to STDOUT by default.
- Authentication and Authorization: Ensure that dpsctl has been properly configured for authentication and authorization.
- Slurm Documentation: Refer to the official Slurm documentation for further information on prolog and epilog scripts.
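Because the scripts log to STDOUT, their output is typically captured in the slurmctld log. A quick way to inspect it, assuming the common log location /var/log/slurm/slurmctld.log (the actual path varies by installation):

```bash
# Search the controller log for prolog/epilog output; the log path
# shown here is an assumption and varies by installation.
grep -iE "prolog|epilog" /var/log/slurm/slurmctld.log
```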