SLURM
Overview
SLURM (Simple Linux Utility for Resource Management) is an open-source job scheduler and workload manager designed for high-performance computing (HPC) environments. It manages compute resources, schedules jobs, and provides a framework for parallel and distributed computing. SLURM is widely used in datacenters, research institutions, and enterprise environments to efficiently allocate computing resources among multiple users and applications.
Key Concepts
Job Scheduling
SLURM manages the allocation of compute resources (nodes, CPUs, GPUs, memory) to user jobs. It provides:
- Job queuing - Jobs wait in queues until resources become available
- Resource allocation - Automatic assignment of compute nodes and resources
- Job prioritization - Fair-share scheduling based on user quotas and priorities
- Preemption - Ability to suspend or terminate lower-priority jobs for higher-priority ones
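When fair-share and preemption are active, the scheduler's computed priority for each pending job can be inspected directly with sprio; a quick sketch (12345 is a placeholder job ID):

```shell
# Show the weighted priority factors (age, fair-share, partition, QoS)
# for all pending jobs
sprio -l

# Show the factors for a single job
sprio -j 12345

# List the queue with the computed priority (%Q) for each job
squeue -o "%.10i %.9P %.20j %.10Q %.8T"
```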
Resource Management
SLURM tracks and manages:
- Compute nodes - Individual servers in the cluster
- Partitions - Logical groupings of nodes (e.g., "gpu", "cpu-only", "debug")
- Accounts - User groups with resource quotas and limits
- Quality of Service (QoS) - Service levels that affect job priority and limits
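The configured QoS levels, and the limits attached to them, can be listed with sacctmgr; a minimal sketch:

```shell
# List each QoS with its priority weight and maximum walltime
sacctmgr show qos format=Name,Priority,MaxWall
```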
SLURM CLI Tools
Core Commands
Job Submission:
# Submit a simple job
sbatch job_script.sh
# Submit with specific requirements
sbatch --nodes=4 --ntasks-per-node=8 --gres=gpu:4 job_script.sh
# Submit an interactive job
srun --pty --nodes=1 --ntasks=1 bash
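These options compose; for example, a post-processing job can be chained behind the main job with a dependency so that it starts only on success (the script names are placeholders):

```shell
# Submit the main job; --parsable makes sbatch print only the job ID
jobid=$(sbatch --parsable main_job.sh)

# Run post-processing only if the main job finishes with exit code 0
sbatch --dependency=afterok:${jobid} postprocess.sh
```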
Job Management:
# List all jobs
squeue
# List jobs for specific user
squeue -u username
# Cancel a job
scancel job_id
# Hold a job (prevent from starting)
scontrol hold job_id
# Release a held job
scontrol release job_id
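Pending jobs can also be modified in place with scontrol update; a sketch (the job ID and values are illustrative):

```shell
# Shorten the requested walltime (often lets the job backfill sooner)
scontrol update JobId=12345 TimeLimit=01:00:00

# Move the pending job to another partition
scontrol update JobId=12345 Partition=debug
```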
Resource Information:
# Show cluster status
sinfo
# Show detailed node information
scontrol show nodes
# Show partition information
scontrol show partitions
# Show account information
sacctmgr show accounts
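Both sinfo and squeue accept output format strings, which is useful for compact per-node views; a sketch:

```shell
# One line per node: name, CPU count, memory (MB), generic resources (GRES), state
sinfo -N -o "%.14N %.6c %.8m %.15G %.6t"
```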
Job Scripts
SLURM job scripts are ordinary shell scripts in which #SBATCH comment lines declare the job's resource requests:
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --gres=gpu:4
#SBATCH --time=02:00:00
#SBATCH --partition=gpu
#SBATCH --account=my_account
# Job commands
module load cuda/11.8
mpirun -np 32 ./my_application
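The same directive style extends to job arrays, which are convenient for parameter sweeps; a minimal sketch (the input file naming is illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --array=0-9
#SBATCH --time=00:30:00

# Each array task receives its own index in SLURM_ARRAY_TASK_ID
./my_application --input "input_${SLURM_ARRAY_TASK_ID}.dat"
```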
Prolog and Epilog Functionality
Overview
SLURM's prolog and epilog scripts provide hooks for custom actions before and after job execution. These scripts run on the compute nodes and can perform setup, cleanup, and integration tasks.
Prolog Scripts
Prolog scripts execute before a job starts on a compute node. Common uses include:
File System Operations:
#!/bin/bash
# The prolog runs as the SlurmdUser (typically root), so create the
# job-specific directory and hand it over to the job owner
mkdir -p "/scratch/job_${SLURM_JOB_ID}"
chown "${SLURM_JOB_USER}" "/scratch/job_${SLURM_JOB_ID}"
Note that SLURM_SUBMIT_DIR is not exported to the node-level prolog, so symlinks into the submission directory are better created from the job script itself.
Environment Setup:
A node-level prolog runs in its own shell as the SlurmdUser, so module loads and exported variables do not reach the job itself. To inject environment variables into tasks, use a TaskProlog, which applies any export NAME=VALUE lines written to its standard output:
#!/bin/bash
# TaskProlog: lines printed as "export NAME=VALUE" are added to the task environment
echo "export OMP_NUM_THREADS=4"
# CUDA_VISIBLE_DEVICES is set by SLURM itself when the job requests --gres=gpu,
# so it normally does not need to be exported here
Epilog Scripts
Epilog scripts execute after a job completes on a compute node. Common uses include:
Cleanup Operations:
#!/bin/bash
# Remove job-specific files; ${VAR:?} aborts the expansion if the variable
# is unset, guarding against an accidental rm -rf of /scratch/job_
rm -rf "/scratch/job_${SLURM_JOB_ID:?}"
As with the prolog, SLURM_SUBMIT_DIR is not available here, so submit-directory symlinks should be cleaned up from the job script.
Logging and Monitoring:
#!/bin/bash
# Log job completion
echo "$(date): Job ${SLURM_JOB_ID} completed on $(hostname)" >> /var/log/slurm/jobs.log
# Collect accounting metrics (the accounting record may still be settling
# when the epilog runs)
sacct -j "${SLURM_JOB_ID}" --format=JobID,JobName,Elapsed,MaxRSS,MaxVMSize >> /var/log/slurm/metrics.log
Further Reading
- SLURM Documentation - Official SLURM documentation
- Resource Groups - DPS resource group management
- DPS Integration Guide - Detailed DPS-SLURM integration
- SLURM Prolog/Epilog Guide - Official prolog/epilog documentation