Slurm Executor#
The Slurm executor runs evaluations on high‑performance computing (HPC) clusters managed by Slurm, an open‑source workload manager widely used in research and enterprise environments. It schedules and executes jobs across cluster nodes, enabling parallel, large‑scale evaluation runs while preserving reproducibility via containerized benchmarks.
See common concepts and commands in Executors.
The Slurm executor can optionally host your model for the duration of an evaluation by deploying a serving container on the cluster and pointing the benchmark at that temporary endpoint. In this mode two containers run: one for the evaluation harness and one for the model server. When deployment is enabled, the evaluation configuration includes a deployment section. See the examples/ directory for ready‑to‑use configurations.
If you do not need deployment on Slurm, omit the deployment section from your configuration and set the model’s endpoint URL directly to any OpenAI‑compatible endpoint you host elsewhere.
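For reference, here is a minimal sketch of a deployment‑free configuration. It assumes the launcher ships a `none` deployment default and accepts the endpoint under a `target.api_endpoint` section; the URL, model name, and exact field names are placeholders that may differ in your version, so check the configs in examples/ for the exact schema:

```yaml
# Hypothetical minimal config: evaluate against an endpoint hosted elsewhere.
# Field names (deployment: none, target.api_endpoint.*) are assumptions.
defaults:
  - execution: slurm/default
  - deployment: none   # assumption: no model server is deployed on the cluster
  - _self_

execution:
  account: your_account
  output_dir: /shared/results
  partition: cpu       # GPUs are unnecessary when the model runs elsewhere

target:
  api_endpoint:
    url: https://your-host.example.com/v1/chat/completions  # placeholder URL
    model_id: your-model-name                               # placeholder name

evaluation:
  tasks:
    - name: hellaswag
```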
Prerequisites#
- Access to a Slurm cluster (with appropriate partitions/queues)
- Pyxis SPANK plugin installed on the cluster
Configuration Example#
Here’s a complete Slurm executor configuration with model deployment:
```yaml
# examples/slurm_llama_3_1_8b_instruct.yaml
defaults:
  - execution: slurm/default
  - deployment: vllm
  - _self_

execution:
  account: your_account
  output_dir: /shared/results
  partition: gpu
  walltime: "04:00:00"
  gpus_per_node: 8

deployment:
  checkpoint_path: /shared/models/llama-3.1-8b-instruct
  served_model_name: meta-llama/Llama-3.1-8B-Instruct
  tensor_parallel_size: 1

evaluation:
  tasks:
    - name: hellaswag
    - name: arc_challenge
    - name: winogrande
```
This configuration:

- Uses the Slurm execution backend
- Deploys a vLLM model server on the cluster (a multi‑GPU variant is sketched after this list)
- Requests 8 GPUs per node with a 4‑hour walltime
- Runs three benchmark tasks (hellaswag, arc_challenge, winogrande)
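If a checkpoint is too large for a single GPU, vLLM can shard it across the GPUs requested per node via tensor parallelism. A hedged variant of the deployment section above (the path and model name are placeholders):

```yaml
deployment:
  checkpoint_path: /shared/models/your-larger-model  # placeholder path
  served_model_name: your-org/your-larger-model      # placeholder name
  tensor_parallel_size: 8  # shard across all 8 GPUs from gpus_per_node: 8
```

Keep `tensor_parallel_size` at or below `gpus_per_node`, since vLLM places one model shard per GPU.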