Slurm Deployment via Launcher#
Deploy and evaluate models on HPC clusters that use the Slurm workload manager, orchestrated by the NeMo Evaluator Launcher.
Overview#
Launcher-orchestrated deployment on Slurm:
Submits jobs to Slurm-managed HPC clusters
Supports multi-node evaluation runs
Handles resource allocation and job scheduling
Manages model deployment lifecycle within Slurm jobs
Quick Start#
# Deploy and evaluate on Slurm cluster
nemo-evaluator-launcher run \
--config packages/nemo-evaluator-launcher/examples/slurm_vllm_checkpoint_path.yaml \
-o deployment.checkpoint_path=/shared/models/llama-3.1-8b-instruct \
-o execution.partition=gpu
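After the job is submitted, you can track the run with the status command (covered under Job Management below):
# Check the status of the submitted run
nemo-evaluator-launcher status <job_id>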
vLLM Deployment#
# Slurm with vLLM deployment
defaults:
- execution: slurm/default
- deployment: vllm
- _self_
deployment:
type: vllm
checkpoint_path: /shared/models/llama-3.1-8b-instruct
served_model_name: meta-llama/Llama-3.1-8B-Instruct
tensor_parallel_size: 1
data_parallel_size: 8
port: 8000
execution:
account: my-account
output_dir: /shared/results
partition: gpu
num_nodes: 1
ntasks_per_node: 1
gres: gpu:8
walltime: "02:00:00"
target:
api_endpoint:
url: http://localhost:8000/v1/chat/completions
model_id: meta-llama/Llama-3.1-8B-Instruct
evaluation:
tasks:
- name: ifeval
- name: gpqa_diamond
- name: mbpp
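Assuming the configuration above is saved as slurm_vllm.yaml (the filename is illustrative), a run could look like the following. Note that hostname is a required execution parameter (see the next section) and must point at your cluster's login node:
# Launch the configuration above; the hostname value is illustrative
nemo-evaluator-launcher run \
    --config slurm_vllm.yaml \
    -o execution.hostname=slurm.example.com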
Slurm Configuration#
Supported Parameters#
The following execution parameters are supported for Slurm deployments. See configs/execution/slurm/default.yaml in the launcher package for the base configuration:
execution:
# Required parameters
hostname: ??? # Slurm cluster hostname
username: ${oc.env:USER} # SSH username (defaults to USER environment variable)
account: ??? # Slurm account for billing
output_dir: ??? # Results directory
# Resource allocation
partition: batch # Slurm partition/queue
  num_nodes: 1                # Total Slurm nodes
num_instances: 1 # Independent deployment instances (HAProxy auto-enabled when > 1)
ntasks_per_node: 1 # Tasks per node
gres: gpu:8 # GPU resources
walltime: "01:00:00" # Wall time limit (HH:MM:SS)
# Environment variables and mounts
env_vars:
deployment: {} # Environment variables for deployment container
evaluation: {} # Environment variables for evaluation container
mounts:
deployment: {} # Mount points for deployment container (source:target format)
evaluation: {} # Mount points for evaluation container (source:target format)
mount_home: true # Whether to mount home directory
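As an illustration, a populated env_vars and mounts block might look like the following; the token variable and the paths are placeholders, not requirements:
execution:
  env_vars:
    deployment:
      HF_TOKEN: ${oc.env:HF_TOKEN}  # placeholder: forward a HuggingFace token to the serving container
    evaluation: {}
  mounts:
    deployment:
      /shared/models: /models       # source path on the cluster mapped to a target path in the container
    evaluation:
      /shared/datasets: /datasets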
Note
The gpus_per_node parameter can be used as an alternative to gres for specifying GPU resources. However, gres is the default in the base configuration.
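For example, a minimal sketch requesting GPUs with the alternative parameter:
execution:
  gpus_per_node: 8  # alternative to gres: gpu:8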
Multi-Node Deployment#
Multi-node deployment can be achieved with or without Ray.
Without Ray (Custom Command)#
For multi-node setups using vLLM’s native data parallelism or other custom coordination, override deployment.command with your own multi-node logic. The launcher exports MASTER_IP and SLURM_PROCID to help coordinate nodes:
defaults:
- execution: slurm/default
- deployment: vllm
- _self_
execution:
num_nodes: 2
deployment:
command: >-
bash -c 'if [ "$SLURM_PROCID" -eq 0 ]; then
vllm serve ${deployment.hf_model_handle} --data-parallel-size 16 --data-parallel-address $MASTER_IP ...;
else
vllm serve ${deployment.hf_model_handle} --headless --data-parallel-address $MASTER_IP ...;
fi'
See examples/slurm_vllm_multinode_dp.yaml for a complete native data parallelism example.
With Ray (vllm_ray)#
For models that require tensor or pipeline parallelism across nodes, use the vllm_ray deployment config, which includes a built-in Ray cluster setup script:
defaults:
- execution: slurm/default
- deployment: vllm_ray # Ray-managed multi-node vLLM deployment
- _self_
execution:
num_nodes: 2 # Single instance spanning 2 nodes
deployment:
  tensor_parallel_size: 8     # 8-way tensor parallelism
  pipeline_parallel_size: 2   # 2 pipeline stages; 8 × 2 = 16 GPUs across the 2 nodes
Multi-Instance with HAProxy#
To run multiple independent deployment instances with HAProxy load-balancing:
execution:
  num_nodes: 4       # Total Slurm nodes
num_instances: 2 # 2 instances of 2 nodes each → HAProxy auto-enabled
When num_instances > 1, HAProxy is automatically configured to distribute requests across instance head nodes. See the examples/ directory for complete configurations.
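The same scaling can also be applied from the command line. Assuming a single-node configuration saved as slurm_vllm.yaml (illustrative filename):
# Scale an existing config out to 2 load-balanced instances (4 nodes total)
nemo-evaluator-launcher run \
    --config slurm_vllm.yaml \
    -o execution.num_nodes=4 \
    -o execution.num_instances=2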
Configuration Examples#
Benchmark Suite Evaluation#
# Run multiple benchmarks on a single model
defaults:
- execution: slurm/default
- deployment: vllm
- _self_
deployment:
type: vllm
checkpoint_path: /shared/models/llama-3.1-8b-instruct
served_model_name: meta-llama/Llama-3.1-8B-Instruct
tensor_parallel_size: 1
data_parallel_size: 8
port: 8000
execution:
account: my-account
output_dir: /shared/results
hostname: slurm.example.com
partition: gpu
num_nodes: 1
ntasks_per_node: 1
gres: gpu:8
walltime: "06:00:00"
target:
api_endpoint:
url: http://localhost:8000/v1/chat/completions
model_id: meta-llama/Llama-3.1-8B-Instruct
evaluation:
tasks:
- name: ifeval
- name: gpqa_diamond
- name: mbpp
- name: hellaswag
Tasks Requiring Dataset Mounting#
Some tasks require access to local datasets stored on the cluster’s shared filesystem:
evaluation:
tasks:
- name: mteb.techqa
dataset_dir: /shared/datasets/techqa # Path on shared filesystem
The system will automatically:
Mount the dataset directory into the evaluation container
Set the NEMO_EVALUATOR_DATASET_DIR environment variable
Validate that all required environment variables are configured
Custom mount path example:
evaluation:
tasks:
- name: mteb.techqa
dataset_dir: /shared/datasets/techqa
dataset_mount_path: /data/techqa # Optional: customize container mount point
Note
Ensure the dataset directory is accessible from all cluster nodes via shared storage (e.g., NFS, Lustre).
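A quick sanity check is to list the directory from a compute node in the target partition (the partition name here is illustrative):
# Verify the dataset path resolves on a compute node
srun --partition=gpu --nodes=1 ls /shared/datasets/techqa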
Job Management#
Submitting Jobs#
# Submit job with configuration
nemo-evaluator-launcher run \
--config packages/nemo-evaluator-launcher/examples/slurm_vllm_basic.yaml
# Submit with configuration overrides
nemo-evaluator-launcher run \
--config packages/nemo-evaluator-launcher/examples/slurm_vllm_basic.yaml \
-o execution.walltime="04:00:00" \
-o execution.partition=gpu-long
Monitoring Jobs#
# Check job status
nemo-evaluator-launcher status <job_id>
# List all runs (optionally filter by executor)
nemo-evaluator-launcher ls runs --executor slurm
Managing Jobs#
# Cancel job
nemo-evaluator-launcher kill <job_id>
Native Slurm Commands#
You can also use native Slurm commands to manage jobs directly:
# View job details
squeue -j <slurm_job_id> -o "%.18i %.9P %.50j %.8u %.2t %.10M %.6D %R"
# Check job efficiency
seff <slurm_job_id>
# Cancel Slurm job directly
scancel <slurm_job_id>
# Hold/release job
scontrol hold <slurm_job_id>
scontrol release <slurm_job_id>
# View detailed job information
scontrol show job <slurm_job_id>
Troubleshooting#
Common Issues#
Job Pending:
# Check node availability
sinfo -p gpu
# Try different partition
-o execution.partition="gpu-shared"
Job Failed:
# Check job status
nemo-evaluator-launcher status <job_id>
# View Slurm job details
scontrol show job <slurm_job_id>
# Check job output logs (location shown in status output)
Job Timeout:
# Increase walltime
-o execution.walltime="08:00:00"
# Check current walltime limit for partition
sinfo -p <partition_name> -o "%P %l"
Resource Allocation:
# Adjust GPU allocation via gres; keep deployment parallelism consistent with the GPU count
-o execution.gres=gpu:4
-o deployment.tensor_parallel_size=4
Debugging with Slurm Commands#
# View job details
scontrol show job <slurm_job_id>
# Monitor resource usage
sstat -j <slurm_job_id> --format=AveCPU,AveRSS,MaxRSS,AveVMSize
# Job accounting information
sacct -j <slurm_job_id> --format=JobID,JobName,State,ExitCode,DerivedExitCode
# Check job efficiency after completion
seff <slurm_job_id>