The installation of Slurm is primarily done through the cm-wlm-setup tool included with BCM.
Note
If any images are added to the system (namely a new DGX OS image) after this installation, this wizard has to be run again so that the appropriate pyxis/enroot and CMDaemon packages are installed.
Here is the process to install Slurm:
Run cm-wlm-setup.
Choose Setup (Step By Step) on the WLM operations window and then select OK.
Choose Slurm on the Select Workload Manager screen and then select Ok.
Enter WLM cluster name and then select Ok.
Use the default cluster name of slurm.
Choose (only) two nodes for Workload Manager server role and then select Ok.
Only select the head node(s), not the slogin node(s).
Enter the new configuration overlay name and priority for server role and then select Ok.
Use the default values of slurm-server and 500.
Ensure that all categories for the Workload Manager client role are unchecked and then select Ok.
Ensure that all nodes for the Workload Manager client role are unchecked and then select Ok.
Enter the configuration overlay name and priority for client role and select Ok.
Use the default values of slurm-client and 500.
Pick healthcheck producers that will be configured as pre-job checks and then select Ok.
Select the following healthcheck producers to be run as pre-job checks:
The compute tray nodes with GPU are selected in a different step and have their own configuration overlay.
Enter the name for the ConfigurationOverlay and then select Ok.
Use the default value of slurm-client-gpu.
Pick dgx (or whatever the GB200 category is named) for the Workload Manager client role and then select Ok.
All GPU compute tray nodes will be added and controlled at the category level.
Do not select individual nodes with GPUs or any control nodes.
Ensure that nothing is chosen for the Workload Manager client role and then select Ok.
Enter the new configuration overlay priority for client role and then select Ok.
Use the default value of 450.
Leave Tune number of slots empty and then select Ok.
Choose the categories for Workload Manager submit role and then select Ok.
Pick dgx-gb200 and slogin.
This allows both the GB200 nodes and the slogin nodes to submit Slurm jobs.
Choose the nodes for Workload Manager submit role and then select Ok.
Pick the head node(s).
Enter new configuration overlay name and priority for submit role and then select Ok.
Use the default values of slurm-submit and 500.
Enter the new configuration overlay name and priority for accounting role and then select Ok.
Use the default values of slurm-accounting and 500.
Choose the accounting nodes and then select Ok.
Pick the head node(s).
Choose no for activate Slurm Accounting High Availability and then select Ok.
Choose Use accounting node on the storage server type for accounting screen and then select Ok.
Choose no for automatically run takeover on BCM failover? and then select Ok.
Choose no for Enable Slurm power saving features? and then select Ok.
Choose BCM autodetects GPUs for the GPU configuration method and then select Ok.
Choose yes for Configure Pyxis plugin? and then select Ok.
Do not choose anything on the Enroot settings page and then select Ok.
Select Internal for the topology source so that the generated topology is based only on cluster-internal resources.
Choose Block for the topology plugin and then select Ok.
Choose Constrain devices for Cgroups resource constraints and then select Ok.
Choose no for Install NVIDIA GPU packages? and then select Ok.
The required packages are already included in the DGX OS 7 image.
Use the default queue name of defq and then select Ok.
If different queues are requested, define them here, or later in the configuration overlay within cmsh, where racks or sets of nodes can be assigned to different queues; see the sketch below.
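For reference, a hedged cmsh sketch of adding a queue after the fact; the queue name rack1q is only an example, and the wlm/jobqueue submode layout is assumed:

wlm
use slurm
jobqueue
add rack1q
commit
configurationoverlay
use slurm-client-gpu
roles
use slurmclient
set queues rack1q
commit

The Queues parameter of the slurmclient role (shown later in this section) controls which queue(s) the nodes covered by the overlay serve.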
Choose Save config & deploy on the Summary screen and then select Ok.
IMEX must be set up for proper inter-GPU memory sharing across all nodes within an NVLink domain.
Global IMEX will configure/populate all IMEX peers in the /etc/nvidia-imex/nodes_config.cfg file with their management interface IP address (typically bond0). This setting is used to run validation testing during initial cluster bring-up:
# global (default)
category
services dgx-gb200    (or whatever the compute node category name is)
add nvidia-imex
set autostart yes
set monitored yes
set managed yes
commit
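Once the category change has propagated to the compute trays, the result can be spot-checked from the head node (an illustrative check; the node name is a placeholder):

ssh <gb200-node> systemctl status nvidia-imex
ssh <gb200-node> cat /etc/nvidia-imex/nodes_config.cfg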
The final settings should resemble the following (with the global IMEX configuration):
[a03-p1-head-01->configurationoverlay[slurm-client-gpu]->roles[slurmclient]]% show
Parameter                          Value
---------------------------------- ------------------------------------------------
Name                               slurmclient
Revision
Type                               SlurmClientRole
Add services                       yes
WLM cluster                        slurm
Slots                              0
All queues                         no
Queues                             defq
Features
Sockets                            0
Cores per socket                   0
Threads per core                   0
Boards                             0
Sockets per board                  0
Real memory                        0B
Node address
Weight                             0
Port                               0
Tmp disk                           0
Reason
CPU spec list
Core spec count                    0
Mem spec limit                     0B
GPU auto detect                    BCM
Node customizations                <0 in submode>
Generic resources                  <1 in submode>
Cpu bindings                       None
Slurm hardware probe auto detect   yes
Memory autodetection slack         2.00%
IMEX                               no
Write procs always                 no
Write only Procs                   no
Nodesets
Power profiles                     <submode>
Nodeset features

# genericresources sub-menu
[a03-p1-head-01->configurationoverlay[slurm-client-gpu]->roles[slurmclient]]% genericresources
[a03-p1-head-01->configurationoverlay[slurm-client-gpu]->roles[slurmclient]->genericresources]% list
Alias (key)        Name     Type     Count    File
------------------ -------- -------- -------- ----------------
Configure Workload/per-job IMEX.
The workload IMEX configuration is intended for after customer handoff, where the IMEX configuration is done per Slurm job: all nodes in the job are added to a job-specific IMEX domain. In this case, be sure to make changes only in the configuration overlay. If the cluster was previously set up for global IMEX, be sure to undo the changes to the services within the GB200 category:
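A hedged cmsh sketch of that cleanup, followed by enabling per-job IMEX on the GPU client overlay (this assumes the role's IMEX parameter is what controls the per-job behavior; adjust the category name as needed):

category
services dgx-gb200
remove nvidia-imex
commit
configurationoverlay
use slurm-client-gpu
roles
use slurmclient
set imex yes
commit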
If a node is in drain state, then either it was taken down due to a node failure or it was put into drain state on purpose to pull it out of the queue of available nodes to perform work/maintenance/debug.
Use scontrol show node <node in drain> to find the reason why the node is drained.
If the administrator wants to put a node into the drain state, use scontrol update nodename=<nodename> state=drain reason="maintenance".
When fixed, add it back to the idle queue with scontrol update nodename=<nodename> state=resume.
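To see every drained or down node and its recorded reason at a glance, the standard Slurm summary command can be run first:

sinfo -R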
If the nodes are showing invalid, it means that the configuration of the node does not match what Slurm expects. Commonly this is due to an incorrect GPU count/missing GPU(s).
If it shows Reason=gres/gpu count reported lower than configured (0 < 8), this means Slurm is expecting 8 GPUs and sees zero.
This sometimes indicates that the autodetection of GPUs failed for some reason.
A reason this could fail is if the cuda-dcgm package is missing from the DGX OS image.
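One way to confirm this is to enter the image and check for the package (a sketch; the image path is assumed from the prompt shown later in this section):

cm-chroot-sw-img /cm/images/dgxos-image-ubuntu2404-aarch64
dpkg -l | grep cuda-dcgm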
If it shows Reason=gres/gpu count reported lower than configured (7 < 8), then a GPU has failed, perhaps due to GPU tray seating issues (this should not be an issue in the GB200 generation).
If the slurmctld service logs (systemctl status slurmctld or journalctl -xeu slurmctld) or the slurmdbd service logs (systemctl status slurmdbd or journalctl -xeu slurmdbd) indicate that a connection is being refused, the MySQL password may need to be reset:
root@bcm11-head-01:~# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/slurmctld.service.d
             └─99-cmd.conf
     Active: active (running) since Mon 2025-06-02 18:49:14 PDT; 1min 8s ago
   Main PID: 344308 (slurmctld)
      Tasks: 83
     Memory: 33.9M (peak: 55.6M)
        CPU: 339ms
     CGroup: /system.slice/slurmctld.service
             ├─344308 /cm/local/apps/slurm/24.11/sbin/slurmctld --systemd
             └─344374 "slurmctld: slurmscriptd"

Jun 02 18:50:21 bcm11-head-01 slurmctld[344308]: slurmctld: error: Sending PersistInit msg: Connection refused
Jun 02 18:50:21 bcm11-head-01 slurmctld[344308]: slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
Jun 02 18:50:21 bcm11-head-01 slurmctld[344308]: slurmctld: error: Sending PersistInit msg: Connection refused
Jun 02 18:50:21 bcm11-head-01 slurmctld[344308]: slurmctld: error: Still don't know my ClusterID
Jun 02 18:50:23 bcm11-head-01 slurmctld[344308]: slurmctld: error: Retrying initial connection to slurmdbd
Jun 02 18:50:23 bcm11-head-01 slurmctld[344308]: slurmctld: error: _open_persist_conn: failed to open persistent connection to host:master:6819: Connection refused
Jun 02 18:50:23 bcm11-head-01 slurmctld[344308]: slurmctld: error: Sending PersistInit msg: Connection refused
Jun 02 18:50:23 bcm11-head-01 slurmctld[344308]: slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
Jun 02 18:50:23 bcm11-head-01 slurmctld[344308]: slurmctld: error: Sending PersistInit msg: Connection refused
Jun 02 18:50:23 bcm11-head-01 slurmctld[344308]: slurmctld: error: Still don't know my ClusterID
Set a new MySQL password for the slurm_acct_db database (user 'slurm') on the head node using /cm/local/apps/slurm/current/scripts/cm-restore-db-password.
Specify the slurmdbd.conf path [/cm/shared/apps/slurm/etc/slurmdbd.conf].
Specify the slurmdbd.conf template path [/cm/local/apps/slurm/current/templates/slurmdbd.conf.template].
Set the MySQL password to match the head node password.
If HA is configured, the utility will ask for the IP of the secondary head node. Leave blank if it is not.
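After the password is reset, it is worth confirming that the connection-refused errors have stopped; if the daemons do not reconnect on their own, restarting them is one option (same service names as above):

systemctl restart slurmdbd slurmctld
journalctl -xeu slurmdbd
journalctl -xeu slurmctld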
Slurm Troubleshooting: Pyxis Plug-in Unavailable on GB200 Software Image
Sometimes, if the software image for the GB200 compute trays was not assigned to the category when cm-wlm-setup was run, the Pyxis plug-in may be missing, as indicated by:
root@dgxos-image-ubuntu2404-aarch64:/# ls -la /cm/local/apps/slurm/current/lib64/slurm/spank_pyxis.so
/usr/bin/ls: cannot access '/cm/local/apps/slurm/current/lib64/slurm/spank_pyxis.so': No such file or directory
To correct this:
Use the cm-chroot-sw-img tool to enter the GB200 software image, install pyxis-sources, and then run the command to compile and install the Pyxis plugin for Slurm.
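A hedged sketch of that sequence (the image path is assumed from the prompt above, an apt-based DGX OS image is assumed, and the exact build/install step should follow the pyxis-sources package documentation):

cm-chroot-sw-img /cm/images/dgxos-image-ubuntu2404-aarch64
apt-get update
apt-get install -y pyxis-sources
# build and install the Pyxis plugin per the pyxis-sources instructions, then confirm:
ls -la /cm/local/apps/slurm/current/lib64/slurm/spank_pyxis.so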