SLURM

Overview

SLURM (Simple Linux Utility for Resource Management) is an open-source job scheduler and workload manager designed for high-performance computing (HPC) environments. It manages compute resources, schedules jobs, and provides a framework for parallel and distributed computing. SLURM is widely used in datacenters, research institutions, and enterprise environments to efficiently allocate computing resources among multiple users and applications.

Key Concepts

Job Scheduling

SLURM manages the allocation of compute resources (nodes, CPUs, GPUs, memory) to user jobs. It provides:

  • Job queuing - Jobs wait in queues until resources become available
  • Resource allocation - Automatic assignment of compute nodes and resources
  • Job prioritization - Multi-factor priority, including fair-share scheduling based on historical usage, job age, and QoS (see the example after this list)
  • Preemption - Ability to suspend or terminate lower-priority jobs for higher-priority ones
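
How a pending job's priority is computed, and when it is expected to start, can be inspected from the command line; in the sketch below, the job ID and username are placeholders:

# Show the priority factors (age, fair-share, partition, QoS) of pending jobs
sprio -l

# Show SLURM's estimated start time for a pending job
squeue --start -j 12345

# Show fair-share usage for a user
sshare -u username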

Resource Management

SLURM tracks and manages:

  • Compute nodes - Individual servers in the cluster
  • Partitions - Logical groupings of nodes (e.g., “gpu”, “cpu-only”, “debug”), defined in slurm.conf (see the sketch after this list)
  • Accounts - Groupings of users in the SLURM accounting database that carry usage limits and fair-share allocations
  • Quality of Service (QoS) - Service levels that affect job priority and limits
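
On the administrative side, nodes and partitions are declared in slurm.conf, while accounts and QoS live in the accounting database; the sketch below is illustrative (node names, sizes, and limits are made up):

# slurm.conf excerpt (hypothetical node names and limits)
NodeName=node[01-08] CPUs=64 RealMemory=256000 Gres=gpu:4 State=UNKNOWN
PartitionName=gpu   Nodes=node[01-04] MaxTime=24:00:00 State=UP
PartitionName=debug Nodes=node[05-08] MaxTime=00:30:00 Default=YES State=UP

# Accounts and QoS are managed with sacctmgr
sacctmgr show qos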

SLURM CLI Tools

Core Commands

Job Submission:

# Submit a simple job
sbatch job_script.sh

# Submit with specific requirements
sbatch --nodes=4 --ntasks-per-node=8 --gres=gpu:4 job_script.sh

# Submit an interactive job
srun --pty --nodes=1 --ntasks=1 bash

Job Management:

# List all jobs
squeue

# List jobs for specific user
squeue -u username

# Cancel a job
scancel job_id

# Hold a job (prevent from starting)
scontrol hold job_id

# Release a held job
scontrol release job_id

Resource Information:

# Show cluster status
sinfo

# Show detailed node information
scontrol show nodes

# Show partition information
scontrol show partitions

# Show account information
sacctmgr show accounts

Job Scripts

SLURM job scripts are shell scripts with special directives:

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --gres=gpu:4
#SBATCH --time=02:00:00
#SBATCH --partition=gpu
#SBATCH --account=my_account

# Job commands
module load cuda/11.8
mpirun -np 32 ./my_application
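
A script like the one above is submitted with sbatch, which prints the assigned job ID; by default, stdout and stderr go to slurm-<jobid>.out in the submission directory. The script name and job ID below are placeholders:

# Submit the batch script
sbatch my_job.sh

# Watch the job while it is pending or running
squeue -u $USER

# After completion, inspect the output file (default name: slurm-<jobid>.out)
cat slurm-12345.out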

Prolog and Epilog Functionality

Overview

SLURM’s prolog and epilog scripts provide hooks for custom actions before and after job execution. These scripts run on the compute nodes and can perform setup, cleanup, and integration tasks.
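
Both hooks are enabled by pointing slurm.conf at the scripts, which slurmd runs on each compute node (as root by default); the paths below are illustrative:

# slurm.conf excerpt (illustrative paths; the scripts must exist and be
# executable on every compute node)
Prolog=/etc/slurm/prolog.sh
Epilog=/etc/slurm/epilog.sh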

Prolog Scripts

Prolog scripts execute before a job starts on a compute node (a per-task variant, TaskProlog, runs as the job user just before each task starts). Common uses include:

File System Operations:

#!/bin/bash
# Prolog: create a job-specific scratch directory and hand it to the job user
# (the prolog itself typically runs as root)
mkdir -p "/scratch/job_${SLURM_JOB_ID}"
chown "${SLURM_JOB_USER}" "/scratch/job_${SLURM_JOB_ID}"
ln -s "/scratch/job_${SLURM_JOB_ID}" "${SLURM_SUBMIT_DIR}/scratch"

Environment Setup:

#!/bin/bash
# TaskProlog-style script: slurmd parses its standard output, and lines of
# the form "export NAME=value" are injected into each task's environment.
# (A node-level Prolog runs as root outside the job and cannot modify the
# job's environment; module loads are therefore usually left to the job
# script itself.)

# Set environment variables for the job's tasks
echo "export CUDA_VISIBLE_DEVICES=0,1,2,3"
echo "export OMP_NUM_THREADS=4"

Epilog Scripts

Epilog scripts execute after a job completes on a compute node. Common uses include:

Cleanup Operations:

#!/bin/bash
# Remove the job-specific scratch directory and symlink created by the prolog
rm -rf "/scratch/job_${SLURM_JOB_ID}"
rm -f "${SLURM_SUBMIT_DIR}/scratch"

Logging and Monitoring:

#!/bin/bash
# Log job completion
echo "$(date): Job ${SLURM_JOB_ID} completed on $(hostname)" >> /var/log/slurm/jobs.log

# Collect accounting metrics (requires SLURM accounting storage to be enabled)
sacct -j "${SLURM_JOB_ID}" --format=JobID,JobName,Elapsed,MaxRSS,MaxVMSize >> /var/log/slurm/metrics.log

Further Reading