# Example: Deploy Multi-node SGLang with Dynamo on SLURM
This folder contains an example of deploying SGLang DeepSeek-R1 Disaggregated with WideEP on a SLURM cluster.
## Overview
The scripts in this folder set up multiple cluster nodes to run the SGLang DeepSeek-R1 Disaggregated with WideEP example, with separate nodes handling prefill and decode. Node setup is driven by a Python job submission script that renders Jinja2 templates for flexible configuration, and includes GPU utilization monitoring to track performance during benchmarks.
## Scripts
- `submit_job_script.py`: Main script for generating and submitting SLURM job scripts from templates
- `job_script_template.j2`: Jinja2 template for generating SLURM job scripts
- `scripts/worker_setup.py`: Worker script that handles the setup on each node
- `scripts/monitor_gpu_utilization.sh`: Script for monitoring GPU utilization during benchmarks
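To illustrate how these pieces fit together, below is a minimal sketch of the kind of job script a template like `job_script_template.j2` could render. The Jinja2 variable names (`job_name`, `account`, `total_nodes`, `time_limit`, `container_image`, `model_dir`) are illustrative assumptions, not the template's actual contents:

```bash
#!/bin/bash
# Hypothetical sketch of a rendered SLURM job script; the Jinja2
# placeholders are assumed names, not those used by job_script_template.j2.
#SBATCH --job-name={{ job_name }}
#SBATCH --account={{ account }}
#SBATCH --nodes={{ total_nodes }}
#SBATCH --time={{ time_limit }}

# Pyxis adds the --container-* flags to srun (see Setup below).
srun --container-image="{{ container_image }}" \
     --container-mounts="{{ model_dir }}:/model" \
     python scripts/worker_setup.py
```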
## Logs Folder Structure
Each SLURM job creates a unique log directory under `logs/` using the job ID. For example, job ID `3062824` creates the directory `logs/3062824/`.
### Log File Structure
```text
logs/
├── 3062824/                                 # Job ID directory
│   ├── log.out                              # Main job output (node allocation, IP addresses, launch commands)
│   ├── log.err                              # Main job errors
│   ├── node0197_prefill.out                 # Prefill node stdout (node0197)
│   ├── node0197_prefill.err                 # Prefill node stderr (node0197)
│   ├── node0200_prefill.out                 # Prefill node stdout (node0200)
│   ├── node0200_prefill.err                 # Prefill node stderr (node0200)
│   ├── node0201_decode.out                  # Decode node stdout (node0201)
│   ├── node0201_decode.err                  # Decode node stderr (node0201)
│   ├── node0204_decode.out                  # Decode node stdout (node0204)
│   ├── node0204_decode.err                  # Decode node stderr (node0204)
│   ├── node0197_prefill_gpu_utilization.log # GPU utilization monitoring (node0197)
│   ├── node0200_prefill_gpu_utilization.log # GPU utilization monitoring (node0200)
│   ├── node0201_decode_gpu_utilization.log  # GPU utilization monitoring (node0201)
│   └── node0204_decode_gpu_utilization.log  # GPU utilization monitoring (node0204)
├── 3063137/                                 # Another job ID directory
├── 3062689/                                 # Another job ID directory
└── ...
```
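With this layout, a quick way to scan a finished job for failures is to search its per-node stderr files; here using the job ID `3062824` from the example above:

```bash
# Scan all per-node stderr files of one job for errors (case-insensitive).
grep -i "error" logs/3062824/*.err
```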
## Setup
For simplicity, this example makes a few assumptions about your SLURM cluster:
- We assume you have access to a SLURM cluster with multiple GPU nodes available. For functional testing, most setups should be fine. For performance testing, you should aim to allocate groups of nodes with performant interconnects, such as those in an NVL72 setup.
- We assume this SLURM cluster has the Pyxis SPANK plugin set up. In particular, the `job_script_template.j2` template in this example uses `srun` arguments like `--container-image`, `--container-mounts`, and `--container-env`, which are added to `srun` by Pyxis. If your cluster supports a similar container-based plugin, you may be able to modify the template to use that instead; a quick way to check for Pyxis is sketched after this list.
- We assume you have already built a recent Dynamo+SGLang container image as described here. This is the image that can be passed to the `--container-image` argument in later steps.
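As a quick sanity check (our suggestion, not a step from the example itself), you can verify that Pyxis has registered its container flags with `srun`:

```bash
# If Pyxis is installed, srun's help output lists the --container-* flags.
srun --help | grep -- --container

# Optional end-to-end check: run a trivial command inside a small image.
srun --nodes=1 --container-image=ubuntu:24.04 cat /etc/os-release
```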
## Usage
1. Submit a benchmark job:

   ```bash
   python submit_job_script.py \
     --template job_script_template.j2 \
     --model-dir /path/to/model \
     --config-dir /path/to/configs \
     --container-image container-image-uri \
     --account your-slurm-account
   ```
   Required arguments:

   - `--template`: Path to the Jinja2 template file
   - `--model-dir`: Model directory path
   - `--config-dir`: Config directory path
   - `--container-image`: Container image URI (e.g., `registry/repository:tag`)
   - `--account`: SLURM account
   Optional arguments:

   - `--prefill-nodes`: Number of prefill nodes (default: `2`)
   - `--decode-nodes`: Number of decode nodes (default: `2`)
   - `--gpus-per-node`: Number of GPUs per node (default: `8`)
   - `--network-interface`: Network interface to use (default: `eth3`)
   - `--job-name`: SLURM job name (default: `dynamo_setup`)
   - `--time-limit`: Time limit in HH:MM:SS format (default: `01:00:00`)
   **Note**: The script automatically calculates the total number of nodes needed based on the `--prefill-nodes` and `--decode-nodes` parameters.
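   For example, requesting four prefill and four decode nodes (illustrative values, not the defaults) would allocate eight nodes in total:

   ```bash
   # 4 prefill + 4 decode = 8 nodes requested from SLURM in a single job.
   python submit_job_script.py \
     --template job_script_template.j2 \
     --model-dir /path/to/model \
     --config-dir /path/to/configs \
     --container-image container-image-uri \
     --account your-slurm-account \
     --prefill-nodes 4 \
     --decode-nodes 4 \
     --time-limit 04:00:00
   ```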
2. Monitor job progress:

   ```bash
   squeue -u $USER
   ```
3. Check logs in real time:

   ```bash
   tail -f logs/{JOB_ID}/log.out
   ```
4. Monitor GPU utilization:

   ```bash
   tail -f logs/{JOB_ID}/{node}_prefill_gpu_utilization.log
   ```
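To follow the main log and every per-node log for a job at once, you can lean on the naming convention shown in the logs section above (substitute a real job ID for `{JOB_ID}`):

```bash
# Follow the main log plus all per-node stdout/stderr files for one job.
tail -f logs/{JOB_ID}/*.out logs/{JOB_ID}/*.err
```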
## Outputs
Benchmark results and outputs are stored in the `outputs/` directory, which is mounted into the container.