Cluster Management#

This chapter introduces cluster management with NVIDIA Base Command Manager (BCM). A cluster running the cluster manager exports a cluster management interface to the outside world, which can be used by any application designed to communicate with the cluster.

Concepts#

In this section, some concepts central to cluster management with the cluster manager are introduced.

Devices#

A device in the cluster manager infrastructure represents a component of a cluster. A device can be any of the following types:

Head node
Physical node
Chassis
Ethernet switch
InfiniBand switch
Power distribution unit
Generic device

A device can have several properties (such as rack position, hostname, and switch port) which can be set to configure the device. Using the cluster manager, operations (for example, power on) may be performed on a device. The property changes and operations that can be performed on a device depend on the type of device. For example, it is possible to mount a new filesystem to a node, but not to an Ethernet switch.

Every device that is managed by the cluster manager has a device state associated with it. Table 3 describes the most important states for devices. All have state tracking enabled.

Table 3. Cluster manager device states

Device status	Device is	Monitored by BCM?
[ UP ]	UP	monitored
[ DOWN ]	DOWN	monitored
[ CLOSED ] (UP)	UP	mostly ignored
[ CLOSED ] (DOWN)	DOWN	mostly ignored

These and other states are described in more detail in Section 5.5 of the Bright Cluster Manager Administrator Manual.

[ DOWN ] and [ CLOSED ] (DOWN) states have an important difference. In the case of [ DOWN ], the device is down, but is typically intended to be available, and thus typically indicates a failure. In the case of [ CLOSED ] (DOWN), the device is down, but is intended to be unavailable, and typically indicates that the administrator deliberately brought the device down and would like the device to be ignored.
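The distinction between these states can be sketched in Python. This is only an illustration of Table 3 and the paragraph above; the names DeviceState and needs_attention are hypothetical and are not part of any BCM interface.

```python
from enum import Enum


class DeviceState(Enum):
    """Hypothetical model of the four states in Table 3 (not a BCM API).

    Each member carries: the label shown in brackets, whether the device
    is up, and whether the cluster manager monitors it (False stands for
    "mostly ignored").
    """
    UP = ("UP", True, True)
    DOWN = ("DOWN", False, True)
    CLOSED_UP = ("CLOSED", True, False)
    CLOSED_DOWN = ("CLOSED", False, False)

    def __init__(self, label, is_up, monitored):
        self.label = label
        self.is_up = is_up
        self.monitored = monitored


def needs_attention(state: DeviceState) -> bool:
    """A device that is down but still monitored typically indicates a failure."""
    return state.monitored and not state.is_up


for state in DeviceState:
    print(state.name, needs_attention(state))
```

Only the monitored DOWN state is flagged; a device that is closed and down is treated as deliberately unavailable.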

Software Images#

A software image is a blueprint for the contents of the local filesystems on a compute node. In practice, a software image is a directory on the head node containing a full Linux filesystem.

In the DGX SuperPOD, all nodes managed by BCM (meaning all management and DGX nodes) share the same base operating system (OS), with the DGX nodes including the customizations of DGX Base OS.

When a non-compute node boots, the node provisioning system sets up the node with the software image associated with that node category (Section 2.1.3). Often this is a copy of the default software image, called default-image. DGX nodes are provisioned with a copy of the DGX OS image, identified by the dgx- prefix in the image name.

After the node is fully booted, it is possible to instruct the node to re-synchronize its local filesystems with the software image. This procedure can be used to distribute changes to the software image without rebooting nodes.

It is also possible to lock a software image so that no node is able to pick up the image until the software image is unlocked.

Software images can be changed using regular Linux tools and commands (such as apt and chroot). More details on making changes to software images and doing image package management can be found in Chapter 11 of the Bright Cluster Manager Administrator Manual.

Node Categories#

The collection of settings in the cluster manager that can apply to a node is called the configuration of the node. The administrator usually configures nodes using the Base View or cmsh front-end tools, and the configurations are managed internally with a database.

A node category is a group of compute nodes that share the same configuration. Node categories bring efficiency, enabling an administrator to:

Configure a large group of nodes concurrently. For example, to set up a group of nodes with a particular disk layout.
Operate on a large group of nodes concurrently. For example, to conduct a reboot on an entire category.

The default node categories for BCM installed on a DGX SuperPOD are shown in Table 4.

Table 4. Default node categories

System Type	Node Category	Software Image
login	slogin	login-image
compute	dgxnodes	dgxos-image
others	default	default-image

The default category can be changed by accessing the base object of partition mode and setting the value of defaultcategory to another, existing, category. System types other than login or compute are automatically placed in the default node category.

Nodes are typically divided into categories based upon their hardware specifications or the tasks that they are to perform. Whether or not nodes should be placed in a separate category depends on whether the configuration (for example, monitoring setup, disk layout, or role assignment) for these nodes differs from the rest of the nodes.

A node inherits values from the category that it is in. Each value is treated as the default property value for a node and can be overruled by specifying the node property value for a particular node.
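The inheritance rule can be sketched in a few lines of Python. This is a conceptual model only, not how the cluster manager stores its configuration; the function name and the property values are hypothetical.

```python
from collections import ChainMap


def effective_settings(category_settings, node_overrides):
    """Return the settings a node ends up with.

    ChainMap looks each key up in node_overrides first and falls back to
    category_settings, which mirrors "the category value is the default".
    """
    return dict(ChainMap(node_overrides, category_settings))


# Hypothetical property names and values, for illustration only.
category = {"softwareimage": "default-image", "disksetup": "category-layout.xml"}
node = {"disksetup": "node-layout.xml"}

print(effective_settings(category, node))
```

Here the node keeps the software image of its category but uses its own disk layout, because only that one property was overruled at node level.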

One configuration property value of a node category is its software image. However, there is no requirement for a one-to-one correspondence between node categories and software images. Multiple node categories may use the same software image, and conversely, one variable image—it is variable because it can be changed by the node setting—may be used in the same node category.

Software images can have their parameters overruled by the category settings. By default, however, the category settings that can overrule the software image parameters are unset.

Node Groups#

A node group consists of nodes that have been grouped together for convenience. The default node group assignments for BCM on a DGX SuperPOD are shown in Table 5.

Table 5. Default node group assignments

Node Group	Members
su1	dgx001..dgx020
su2	dgx021..dgx040
login	slogin1, slogin2

Node groups can consist of any mix of all kinds of nodes, irrespective of whether they are head nodes or compute nodes, and irrespective of what category they are in. A node may be in zero or more node groups at one time. That is, a node may belong to many node groups.

Node groups are used for carrying out operations on an entire group of nodes at a time. Because the nodes inside a node group do not necessarily share the same configuration, configuration changes cannot be conducted using node groups.

One important use for node groups is in the nodegroups property of the provisioning role configuration, where the list of node groups that a provisioning node provisions is specified.

Roles#

A role is a task that can be performed by a node. By assigning a certain role to a node, an administrator activates the functionality that the role represents on this node. For example, a node can be turned into a provisioning node, or into a storage node, by assigning the corresponding role to the node.

Roles can be assigned to individual nodes or to node categories. When a role is assigned to a node category, it is implicitly assigned to all nodes inside the category.

A configuration overlay is a group of roles that can be assigned to designated groups of nodes within a cluster. This enables configuration of many configuration parameters in various combinations of nodes.

Some roles allow parameters to be set that influence the behavior of the role. For example, the Slurm client role (which turns a node into a Slurm client) uses parameters to control how the node is configured within Slurm in terms of queues and the number of GPUs.

When a role has been assigned to a node category with a certain set of parameters, it is possible to override those parameters. This can be done by reassigning the role to the individual node with a different set of parameters. Roles that have been thus assigned override roles that have been assigned to a node category.

Roles have a priority setting associated with them, and the assignment with the higher priority value takes precedence. Roles assigned at category level have a fixed priority of 250, while roles assigned at node level have a fixed priority of 750. The configuration overlay priority is variable but is set to 500 by default. Thus, roles assigned at the node level override roles assigned at the category level. Roles assigned at the node level also override roles assigned by the default configuration overlay.
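The priority rule can be illustrated with a short Python sketch. It assumes only what is stated above (250 for categories, 750 for nodes, 500 as the overlay default, with the larger value winning). The RoleAssignment class, the role name, and the parameter names are hypothetical and do not correspond to a BCM API.

```python
from dataclasses import dataclass, field

CATEGORY_PRIORITY = 250          # fixed, per the text
OVERLAY_DEFAULT_PRIORITY = 500   # variable, this is the default
NODE_PRIORITY = 750              # fixed, per the text


@dataclass
class RoleAssignment:
    role: str        # hypothetical role name
    source: str      # "category", "overlay", or "node"
    priority: int
    parameters: dict = field(default_factory=dict)


def effective_role(assignments, role):
    """Pick the assignment of `role` with the highest priority, or None."""
    candidates = [a for a in assignments if a.role == role]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.priority)


assignments = [
    RoleAssignment("slurmclient", "category", CATEGORY_PRIORITY, {"queues": ["defq"]}),
    RoleAssignment("slurmclient", "overlay", OVERLAY_DEFAULT_PRIORITY, {"queues": ["gpuq"]}),
    RoleAssignment("slurmclient", "node", NODE_PRIORITY, {"queues": ["debugq"]}),
]

winner = effective_role(assignments, "slurmclient")
print(winner.source, winner.parameters)
```

With all three assignments present, the node-level assignment is selected. If the node-level entry is removed, the overlay at its default priority takes precedence over the category.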

A role can be imported from another entity, such as a role, a category, or a configuration overlay. Examples of role assignment are given in Sections 5.2.2 and 5.2.3 of the Bright Cluster Manager Administrator Manual.

Authentication#

Changing Administrative Passwords#

Note

How to set up or change regular user passwords is discussed in the user management section.

The cm-change-passwd command is used to change the following root passwords:

Head node: allows a root login to the head node.
Software images: allows a root login to a compute node running with that image and is stored in the image file.
Node installer: allows a root login to the node when the node-installer, a stripped-down operating system (OS), is running. The node-installer stage prepares the node for the final OS when the node is booting up. See Section 5.4 of the Bright Cluster Manager Administrator Manual for more information about the node-installer.
MySQL: allows a root login to the MySQL server used by the cluster manager.

The command prompts the administrator about which of them, if any, should be changed:

[root@headnode ~]# cm-change-passwd
With this utility you can easily change the following passwords:
* root password of head node
* root password of slave images
* root password of node-installer
* root password of mysql

Note: if this cluster has a high-availability setup with 2 head
    nodes, be sure to run this script on both head nodes.

Change password for root on head node? [y/N]: y
Changing password for root on head node.
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Change password for root in default-image [y/N]: y
Changing password for root in default-image.
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Change password for root in node-installer? [y/N]: y
Changing password for root in node-installer.
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Change password for MYSQL root user? [y/N]: y
Changing password for MYSQL root user.
Old password:
New password:
Re-enter new password:

For an HA configuration, the passwords are copied over automatically to the other head node when a change is made to the root password of the software image. This allows a root login to a regular node running with that image.

For the remaining password cases (head root password, MySQL root password, and node-installer root password), the passwords are best “copied” to the other head node by rerunning the script on that head node.

Also, for software image passwords used by the compute nodes: the new password that is set for a compute node only works on the node after the image on the node itself has been updated, for example with the imageupdate command. Alternatively, the new password can be made to work on the node by rebooting the node to pick up the new image.

The LDAP root password is a random string set during installation. It cannot be changed using cm-change-passwd. It can be changed as explained in Appendix I of the Bright Cluster Manager Administrator Manual.

If the administrator has stored the password to the cluster in the Base View front-end, then the password should be modified there too (Figure 2).

ssh Logins#

The standard system login root password of the head node, the software image, and the node-installer, can be set using the cm-change-passwd command (2.2.1). In contrast, ssh logins from the head node to the compute nodes are set by default to be passwordless:

For non-root users, an ssh passwordless login works if the /home directory that contains the authorized keys for these users is mounted. The /home directory is mounted by default on the head node as well as on the compute node, so that by default a passwordless login works from the head node to the compute nodes, as well as from the compute nodes to the head node.
For the root user, an ssh passwordless login should always work from the head node to the compute nodes since the authorized keys are stored in /root. Logins from the compute node to the head node are configured by default to request a password, as a security consideration.

Users can be restricted from ssh logins:

On compute nodes using the usernodelogin or User node login settings.
On the head node by modifying the sshd configuration. For example, to allow only root logins, the value of AllowUsers can be set in /etc/ssh/sshd_config to root. See the sshd_config man page for more information.

Certificates#

PEM Certificates and CMDaemon Front-end Authentication#

While nodes in the cluster accept ordinary ssh logins, the cluster manager accepts public key authentication using X509v3 certificates. Public key authentication using X509v3 certificates means that the user authenticating to the cluster manager must present their public certificate, and in addition must have access to the private key that corresponds to the certificate.

The cluster manager uses the PEM format for certificates. In this format, the certificate and private key are stored as plain text in two separate PEM-encoded files, ending in .pem and .key.

Using cmsh and Authenticating to CMDaemon#

By default, one administrator certificate is created for root for the cmsh front-end to interact with the cluster manager. The certificate and corresponding private key are thus found on a newly installed cluster manager cluster on the head node at:

/root/.cm/admin.pem
/root/.cm/admin.key

The cmsh front-end, when accessing the certificate and key pair as user root, uses this pair by default, so that prompting for authentication is then not a security requirement. The logic that is followed to access the certificate and key by default is explained in detail in Section 4.4.2.6.

Using Base View and Authenticating to the Cluster Manager#

When an administrator uses the Base View front-end, a login to the cluster is conducted with username and password authentication (Figure 2) unless the authentication has already been stored in the browser, or unless certificate-based authentication is used.

Certificate-based authentication can be carried out using a PKCS#12 certificate file. This can be generated from the PEM format certificates. For example, for the root user, an openssl command that can be used to generate the admin.pfx file is:

openssl pkcs12 -export -in ~/.cm/admin.pem -inkey ~/.cm/admin.key -out ~/.cm/admin.pfx

In Chrome, the IMPORT wizard at chrome://settings/certificates can be used to save the file into the browser. For Firefox, the equivalent clickpath is about:preferences#privacy>Certificates>View Certificates>Your Certificates>Import.

The browser can then access the Base View front-end without a username/password combination.

If the administrator certificate and key are replaced, then any other certificates signed by the original administrator certificate must be generated again using the replacement, because otherwise they will no longer function.

Certificate generation in general, including the generation and use of non-administrator certificates, is described in greater detail in 4.4.

Profiles#

Certificates that authenticate to CMDaemon contain a profile.

A profile determines which cluster management operations the certificate holder may perform. The administrator certificate is created with the admin profile, which is a built-in profile that enables all cluster management operations to be performed. In this sense, it is like the root account on Unix systems. Other certificates may be created with different profiles giving certificate owners access to a predefined subset of the cluster management functionality (4.4).

Base View GUI#

This section introduces the basics of the Base View, which is the web application front end to the cluster manager. Base View is supported to run on the latest two publicly available desktop versions of Firefox, Google Chrome, Edge, and Safari at the time of release of BCM. Browsers that run on mobile devices are not supported.

Cluster Management GUI Service#

In the DGX SuperPOD, the GUI interface is provided as a web service on port 8081 from the head node to the browser. Its direct URL takes the form:

https://<host name or IP address>:8081/base-view

The cluster manager package that provides the service is base-view.

Base View Login Window#

Figure 2 shows the login dialog window for Base View. Use this window to administer that Base View service on the cluster. At the time of DGX SuperPOD deployment, at least one login is available: the root user, with the password selected during DGX SuperPOD installation (often documented on the Site Survey before installation).

Figure 2. Base View login

Base View Default Display on Connection#

By default, an overview window is displayed (Figure 3). It shows the Occupation rate, memory used, CPU cycles used, node statuses, and other cluster details. It corresponds to clickpath Cluster>Partition base.

Figure 3. Cluster overview

Navigating the Cluster with Base View#

Aspects of the cluster can be managed by administrators using Base View (Figure 3). The resource tree, displayed on the left side of the window, consists of available cluster usage concepts such as Provisioning, Grouping, HPC, Cloud, and Containers. It also has a cluster-centric approach to miscellaneous system concepts such as hardware devices (Devices), non-hardware resources such as Identity Management, and Networking.

Selecting a resource opens a window that allows parameters related to the resource to be viewed and managed. As an example, the Cluster resource can be selected. This opens the Partition base window, which is a representation of the cluster instance. The tabs within the Partition base window are mapped out in Figure 4 and described next.

Figure 4. Cluster navigation within the Partition base window

Settings#

The Settings tab has several global cluster properties and property groups. These are loosely grouped as follows:

Buttons for jumping to various operational settings.
Cluster name, administrator e-mail, partition name.
Node basename, node digits.
Name servers, time servers.
Search domains, relay host.
Externally visible IP, Provisioning node auto update timeout.
Default burn configuration.
External network, management network.
Default category: sets the default category.
Sign installer certificates.
Notes.

Other Tab Information#

Information about other tabs is shown in Table 6.

Table 6. Additional tab information

Tab	Description
System information	Shows the main hardware specifications of the node (CPU, memory, BIOS), along with the OS version that it runs.
Version Information	Shows version information for important cluster software components, such as the CMDaemon database version, and the cluster manager version and builds.
Run Command	Allows a specified command to be run on a selected node of the cluster.
Fabrics	Displays the topology and switches for the fabrics used.
Rack View	Displays a view of the rack as defined by node allocations made by the administrator to racks and chassis.

Cluster Management Shell#

This section introduces the basics of the cluster management shell, cmsh. This is the command-line interface (CLI) to cluster management. cmsh and Base View give access to the same cluster management functionality. The cmsh front-end allows commands to be run with it and can be used in batch mode. Although cmsh commands often use constructs familiar to programmers, it is designed for managing the cluster efficiently rather than for trying to be a good or complete programming language. For programming cluster management, use Python bindings instead of using cmsh in batch mode. Usually, cmsh is invoked from an interactive session (for example, through ssh) on the head node, but it can also be used to manage the cluster from outside.

Invoking cmsh#

From the head node, cmsh can be invoked as follows:

[root@dgxsuperpod ~]# cmsh
[dgxsuperpod]%

By default, it connects to the IP address of the local management network interface using the default cluster manager port. If it fails to connect as in the preceding example, but a connection takes place using cmsh localhost, then the management interface is most probably not up. In that case, bringing the management interface up allows cmsh to connect to CMDaemon. Running cmsh without arguments starts an interactive cluster management session. To go back to the Unix shell, enter quit or press Ctrl-D:

[dgxsuperpod]% quit
[root@dgxsuperpod ~]#

Batch Mode and Piping in cmsh#

The -c flag allows cmsh to be used in batch mode. Commands may be separated using semicolons:

[root@dgxsuperpod ~]# cmsh -c "main showprofile; device status apc01"
admin
apc01 ............... [   UP   ]
[root@dgxsuperpod ~]#

Alternatively, commands can be piped to cmsh:

[root@dgxsuperpod ~]# echo device status | cmsh
device status
apc01 ............... [   UP   ]
dgxsuperpod ......... [   UP   ]
dgx001 .............. [   UP   ]
dgx002 .............. [   UP   ]
switch01 ............ [   UP   ]
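For quick ad hoc checks, batch-mode output such as the above can be post-processed by a small script. The following Python sketch only builds the argument list for a cmsh -c call and parses text shaped like the sample output shown here; the function names are hypothetical, and the line format is assumed from that sample rather than from a documented specification. For real automation, the Python bindings mentioned earlier remain the recommended route.

```python
import re

# Matches lines such as "apc01 ............... [   UP   ]".
STATUS_LINE = re.compile(r"^(?P<host>\S+)\s+\.+\s*\[\s*(?P<state>[A-Z_ ]+?)\s*\]")


def build_batch_argv(commands):
    """Join cmsh commands with semicolons for use with the -c flag."""
    return ["cmsh", "-c", "; ".join(commands)]


def parse_device_status(output):
    """Map each hostname to the state shown in brackets; skip other lines."""
    states = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            states[match.group("host")] = match.group("state")
    return states


sample = """device status
apc01 ............... [   UP   ]
dgx001 .............. [   UP   ]
dgx002 .............. [  DOWN  ]
"""

print(build_batch_argv(["main showprofile", "device status apc01"]))
print(parse_device_status(sample))
```

The argument list could be handed to subprocess.run on a head node where cmsh is installed. The sample string is made-up input used only to exercise the parser.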

Dotfiles and /etc/cmshrc File for cmsh#

In a similar way to Unix shells, cmsh sources dotfiles, if they exist, upon start-up in both batch and interactive mode. In the following list of dotfiles, a setting in the file that is in the shorter path will override a setting in the file with the longer path:

~/.cm/cmsh/.cmshrc
~/.cm/.cmshrc
~/.cmshrc

If the file /etc/cmshrc exists, it is sourced and its settings are used. Its values can be overridden by user dotfiles, which is standard Unix behavior.
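The precedence among the user dotfiles can be modeled in Python as follows. This is a sketch of the stated rule only (a setting in the shorter path overrides one in the longer path); it does not describe how cmsh implements sourcing, and the setting names and values are hypothetical.

```python
def merge_dotfile_settings(settings_by_path):
    """Combine settings so that the shortest path has the final say.

    Paths are applied from longest to shortest, so a setting from a
    shorter path overwrites the same setting from a longer path.
    """
    merged = {}
    for path in sorted(settings_by_path, key=len, reverse=True):
        merged.update(settings_by_path[path])
    return merged


# Hypothetical settings found in each dotfile.
dotfiles = {
    "~/.cm/cmsh/.cmshrc": {"color": "no", "delimiter": ","},
    "~/.cm/.cmshrc": {"delimiter": "|"},
    "~/.cmshrc": {"color": "yes"},
}

print(merge_dotfile_settings(dotfiles))
```

In this example the color setting comes from ~/.cmshrc and the delimiter from ~/.cm/.cmshrc, since each is the shortest path that defines that setting.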

Defining Command Aliases in cmsh#

Sourcing settings is convenient when defining command aliases. Command aliases can be used to abbreviate longer commands. For example, putting the following in .cmshrc would allow lv to be used as an alias for device list virtualnode:

alias lv device list virtualnode

Besides defining aliases in dotfiles, aliases in cmsh can also be created with the alias command. The preceding example can be run within cmsh to create the lv alias. Running the alias command within cmsh lists the existing aliases. Aliases can be exported from within cmsh together with other cmsh dot settings with the help of the export command:

[dgxsuperpod]% export > /root/mydotsettings

The dot settings can be taken into cmsh by running the run command from within cmsh:

[dgxsuperpod]% run /root/mydotsettings

Built-in Aliases in cmsh#

The following aliases are built-ins and are not defined in any .cmshrc or cmshrc files:

[headnode]% alias
alias - goto -
alias .. exit
alias / home
alias ? help
alias ds device status
alias ls list

The meanings are:

goto -: go to previous directory level of cmsh
exit: go up a directory level or leave cmsh if already at top level.
home: go to the top-level directory.
help: show help text for current level.
device status: show status of devices that can be accessed in device mode.
list: list state for all modes.

Automatic Aliases in cmsh#

A cmsh script is a file that has a sequence of cmsh commands that run within a cmsh session.

The directory .cm/cmsh/ can have a cmsh script placed in it with a .cmsh suffix and an arbitrary prefix. The prefix then automatically becomes an alias in cmsh.

In the following example:

The file tablelist.cmsh provides the alias tablelist, to list devices using the | symbol as a delimiter.
The file dfh.cmsh provides the alias dfh to conduct the Linux shell command df -h.

[root@dgxsuperpod ~]# cat /root/.cm/cmsh/tablelist.cmsh
list -d "|"
[root@dgxsuperpod ~]# cat /root/.cm/cmsh/dfh.cmsh
!df -h
[root@dgxsuperpod ~]# cmsh
[dgxsuperpod]% device
[dgxsuperpod->device]% alias | egrep '(tablelist|dfh)'
alias dfh run /root/.cm/cmsh/dfh.cmsh
alias tablelist run /root/.cm/cmsh/tablelist.cmsh
[dgxsuperpod->device]% list
Type                   Hostname (key)   MAC                Category         Ip
---------------------- ---------------- ------------------ ---------------- ---------------
HeadNode               dgxsuperpod      FA:16:3E:B4:39:DB                   10.141.255.254
PhysicalNode           dgx001           FA:16:3E:D5:87:71  default          10.141.0.1
PhysicalNode           dgx002           FA:16:3E:BE:05:FE  default          10.141.0.2
[dgxsuperpod->device]% tablelist
Type                  |Hostname (key)  |MAC               |Category        |Ip
----------------------|----------------|------------------|----------------|---------------
HeadNode              |dgxsuperpod     |FA:16:3E:B4:39:DB |                |10.141.255.254
PhysicalNode          |dgx001          |FA:16:3E:D5:87:71 |default         |10.141.0.1
PhysicalNode          |dgx002          |FA:16:3E:BE:05:FE |default         |10.141.0.2
[dgxsuperpod->device]% dfh
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.8G     0  1.8G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G   33M  1.8G   2% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vdb1        25G   17G  8.7G  66% /
tmpfs           374M     0  374M   0% /run/user/0

The cmsh session does not need restarting for the alias to become active.
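The naming rule for automatic aliases (the script prefix becomes the alias, and the alias runs the script) can be mirrored in Python. This sketch only reproduces the mapping visible in the alias listing above; the function name is hypothetical, and it does not interact with cmsh.

```python
from pathlib import Path


def automatic_aliases(script_dir):
    """Map each <prefix>.cmsh file in script_dir to 'run <path>'."""
    aliases = {}
    for script in sorted(Path(script_dir).glob("*.cmsh")):
        # The part of the file name before the .cmsh suffix is the alias.
        aliases[script.stem] = f"run {script}"
    return aliases


if __name__ == "__main__":
    # Assumes the per-user script directory described in the text.
    for name, command in automatic_aliases(Path.home() / ".cm" / "cmsh").items():
        print(f"alias {name} {command}")
```

Files without the .cmsh suffix are ignored, which matches the requirement that the script carries that suffix.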

Default Arguments in cmsh Scripts#

In a cmsh script, the parameters $1, $2, and so on, can be used to pass arguments. If the argument being passed is blank, then the values the parameters take also remain blank. However, if the parameter format has a suffix of the form -<value>, then <value> is the default value that the parameter takes if the argument being passed is blank.

[root@dgxsuperpod ~]# cat .cm/cmsh/encrypt-node-disk.cmsh
home
device use ${1-dgx001}
set disksetup /root/my-encrypted-node-disk.xml
set revision ${2-test}
commit

The script can be run without an argument (a blank value for the argument), in which case it takes on the default value of dgx001 for the parameter:

[root@dgxsuperpod ~]# cmsh
[dgxsuperpod]% encrypt-node-disk
[dgxsuperpod->device[dgx001]]%

The script can be run with an argument (dgx002 here), in which case it takes on the passed value of dgx002 for the parameter:

[root@dgxsuperpod ~]# cmsh
[dgxsuperpod]% encrypt-node-disk dgx002
[dgxsuperpod->device[dgx002]]%
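The substitution behavior described in this section can be emulated in Python to check what a script line expands to before running it. This is a sketch under the rules stated above ($N is replaced by the Nth argument, and ${N-value} falls back to value when that argument is blank); the function name is hypothetical and the code is not the cmsh implementation.

```python
import re

# Matches ${N} and ${N-default} (groups 1 and 2) or plain $N (group 3).
PARAM = re.compile(r"\$\{(\d+)(?:-([^}]*))?\}|\$(\d+)")


def expand_parameters(line, args):
    """Replace positional parameters in `line` using the list `args`."""
    def substitute(match):
        index = int(match.group(1) or match.group(3))
        default = match.group(2) or ""
        value = args[index - 1] if 1 <= index <= len(args) else ""
        # A blank argument falls back to the default (blank if none given).
        return value if value != "" else default
    return PARAM.sub(substitute, line)


print(expand_parameters("device use ${1-dgx001}", []))
print(expand_parameters("device use ${1-dgx001}", ["dgx002"]))
print(expand_parameters("set revision ${2-test}", ["dgx002"]))
```

The three calls correspond to running the script without an argument, with dgx002 as the first argument, and with no second argument supplied.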

cmsh Options#

The options usage information is shown with cmsh -h:

Usage:
cmsh [options] [hostname[:port]]
cmsh [options] -c <command>
cmsh [options] -f <filename>
Options:
--help|-h
        Display this help
--noconnect|-u
        Start unconnected
--controlflag| -z
        ETX in non-interactive mode
--color <yes/no>
        Define usage of colors
--spool <directory>
        Alternative /var/spool/cmd
--tty| -t
        Pretend a TTY is available
--noredirect| -r
        Do not follow redirects
--norc| -n
        Do not load cmshrc file on start-up
--noquitconfirmation| -Q
        Do not ask for quit confirmation
--echo| -x
    Echo all commands
--quit|-q
    Exit immediately after error
--disablemultiline|-m
    Disable multiline support
--hide-events
    Hide all events by default
--disable-events
    Disable all events by default
Arguments:
hostname
    The hostname or IP to connect to
command
    A list of cmsh commands to execute
filename
    A file which contains a list of cmsh commands to execute
Examples:
cmsh                                    run in interactive mode
cmsh -c 'device status'                 run the device status command and exit
cmsh --hide-events -c 'device status'   run the device status command and exit, without
                                        showing any events that arrive during this time
cmsh -f some.file -q -x                 run and echo the commands from some.file, exit

There is also a man page for cmsh(8), which is a bit more extensive than the help text. However, it does not cover the modes and interactive behavior.

Levels, Modes, Help, and Commands Syntax in cmsh#

The top-level of cmsh is the level that cmsh is in when entered without any options.

To avoid overloading a user with commands, cluster management functionality has been grouped and placed in separate cmsh mode levels. Mode levels and associated objects for a level make up a hierarchy available below the top level.

There is an object-oriented terminology associated with managing using this hierarchy. To perform cluster management functions, the administrator descends through cmsh into the appropriate mode and object and conducts actions relevant to the mode or object.

For example, within user mode, an object representing a user instance, userthree, might be added or removed. Within the object userthree, the administrator can manage its properties. The properties can be data such as a password password123, or a home directory /home/userthree.

Typing help at the top level of cmsh shows the top-level commands.

alias ......................... Set aliases
category ...................... Enter category mode
cert .......................... Enter cert mode
cloud ......................... Enter cloud mode
cmjob ......................... Enter cmjob mode
color ......................... Manage console text color settings
configurationoverlay .......... Enter configurationoverlay mode
connect ....................... Connect to cluster
delimiter ..................... Display/set delimiter
device ........................ Enter device mode
disconnect .................... Disconnect from cluster
edgesight ..................... Enter edgesight mode
etcd .......................... Enter etcd mode
events ........................ Manage events
exit .......................... Exit from current object or mode
export ........................ Display list of aliases current list formats
fspart ........................ Enter fspart mode
group ......................... Enter group mode
groupingsyntax ................ Manage the default grouping syntax
help .......................... Display this help
hierarchy ..................... Enter hierarchy mode
history ....................... Display command history
keyvaluestore ................. Enter keyvaluestore mode
kubernetes .................... Enter kubernetes mode
list .......................... List state for all modes
main .......................... Enter main mode
modified ...................... List modified objects
monitoring .................... Enter monitoring mode
network ....................... Enter network mode
nodegroup ..................... Enter nodegroup mode
partition ..................... Enter partition mode
process ....................... Enter process mode
profile ....................... Enter profile mode
quit .......................... Quit shell
quitconfirmation .............. Manage the status of quit confirmation
rack .......................... Enter rack mode
refresh ....................... Refresh all modes
run ........................... Execute cmsh commands from specified file
session ....................... Enter session mode
softwareimage ................. Enter softwareimage mode
task .......................... Enter task mode
time .......................... Measure time of executing command
unalias ....................... Unset aliases
unmanagednodeconfiguration .... Enter unmanagednodeconfiguration mode
user .......................... Enter user mode
watch ......................... Execute a command periodically, showing output
wlm ........................... Enter wlm mode

All levels inside cmsh provide these top-level commands. Passing a command as an argument to help gets details for it:

[myheadnode]% help run
Name:       run - Execute all commands in the given file(s)
Usage:      run [OPTIONS] <filename> [<filename2> …]
Options:  -x, --echo
                    Echo all commands
                -q, --quit
                    Exit immediately after error
[myheadnode]%

In the general case, invoking help at any mode level or within an object, without an argument, provides two lists:

Under the title of Top: a list of top-level commands.
Under the title of the level it was invoked at: a list of commands that may be used at that level.

For example, entering session mode and then typing in help displays, firstly, output with a title of Top, and secondly, output with a title of session:

[myheadnode]% session
[myheadnode->session]% help
============================ Top =============================
alias ......................... Set aliases
category ...................... Enter  category  mode
ceph .......................... Enter ceph mode
...
==========================  session  ===========================
id ....................... Display current session id
killsession .............. Kill a session
list ..................... Provide overview of active sessions
[myheadnode->session]%

Navigation Through Modes and Objects in cmsh#

The major modes tree is shown in Appendix M.1 of the Bright Cluster Manager Administrator Manual. The following notes can help the cluster administrator in navigating the cmsh shell:

To enter a mode, a user enters the mode name at the cmsh prompt. The prompt changes to indicate that cmsh is in the requested mode, and commands for that mode can then be run.
To use an object within a mode, the use command is used with the object name. In other words, a mode is entered, and an object within that mode is used. When an object is used, the prompt changes to display that the object within the mode is now being used, and that commands are applied for that object.
The exit command is used to leave a mode and go back up a level. Similarly, if an object is in use, the exit command exits the object. At the top level, exit has the same effect as the quit command, that is, the user leaves cmsh and returns to the unix shell. The string .. is an alias for exit.
The home command, which is aliased to /, takes the user from any mode depth to the top level.
The path command at any mode depth displays a string that can be used as a path to the current mode and object, in a form that is convenient for copying and pasting into cmsh. The string can be used in many ways. For example, an alias can be defined in .cmshrc (2.4.1.2).

In the following example, the path command is used to print out a string. This string makes it easy to construct a bash shell command to run a list from the correct place within cmsh:
[headnode->configurationoverlay[slurm-client]->roles[slurmclient]]% list
Name (key)
----------------------------
slurmclient
[headnode->configurationoverlay[slurm-client]->roles[slurmclient]]% path
home;configurationoverlay;use "slurm-client";roles;use slurmclient;

Pasting the string into a bash shell, using the cmsh command with the -c option, and appending the list command to the string, replicates the session output of the list command:

[headnode ~]# cmsh -c 'home;configurationoverlay;use "slurm-client";roles;use slurmclient; list'
Name (key)
----------------------------
slurmclient

The following example shows that the path command can also be used inside the cmsh session itself for convenience:

[headnode]% device
[headnode->device]% list
Type          Hostname (key)   MAC               Category Ip             Network     Status
------------- -------------    ----------------- -------- -------------  ----------- ------
EthernetSwitch switch01        00:00:00:00:00:00          10.141.0.50    internalnet [ UP ]
HeadNode       headnode        00:0C:29:5D:55:46          10.141.255.254 internalnet [ UP ]
PhysicalNode   dgx001          00:0C:29:7A:41:78 default  10.141.0.1     internalnet [ UP ]
PhysicalNode   dgx002          00:0C:29:CC:4F:79 default  10.141.0.2     internalnet [ UP ]
[headnode->device]% exit
[headnode]% device
[headnode->device]% use dgx001
[headnode->device[dgx001]]% path
home;device;use dgx001;
[headnode->device[dgx001]]% home
[headnode]% home;device;use dgx001 #copy-pasted from path output earlier
[headnode->device[dgx001]]%

A command can also be executed in a mode without staying within that mode. This is done by specifying the mode before the command that is to be executed within that mode. Most commands also accept arguments after the command. Multiple commands can be executed in one line by separating commands with semicolons. A cmsh input line has the following syntax:

<mode> <cmd> <arg> . . . <arg>; . . . ; <mode> <cmd> <arg> . . . <arg>

Where <mode> and <arg> are optional. A more precise synopsis is: [<mode>] <cmd> [<arg> ... ] [; ... ; [<mode>] <cmd> [<arg> ... ]]

[headnode->network]% device status headnode; list
headnode ............ [   UP   ]
Name (key)    Type      Netmask bits  Base address  Domain name          Ipv6
------------- --------- ------------- ------------- -------------------- ----
externalnet   External  16            192.168.1.0   userdomain.com       no
globalnet     Global    0             0.0.0.0       cm.cluster
internalnet   Internal  16            10.141.0.0    eth.cluster
[headnode->network]%

In the preceding example, while in network mode, the status command is executed in device mode on the host name of the head node, making it display the status of the head node. The list command on the same line after the semicolon still runs in network mode, as expected, and not in device mode, and so displays a list of networks. Inserting a semicolon makes a difference, in that the mode is entered, so that the list displays a list of nodes:

[headnode->network]% device; status headnode; list
headnode ................ [ UP      ]
Type          Hostname (key)  MAC                Category  Ip             Network     Status
------------- --------------- ------------------ --------- -------------- ----------- ------
HeadNode      headnode        FA:16:3E:C8:06:D1            10.141.255.254 internalnet [ UP ]
PhysicalNode  dgx001          FA:16:3E:A2:9C:87  default   10.141.0.1     internalnet [ UP ]
[headnode->device]%
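The difference between the two input lines can be made concrete with a small model. The following Python sketch is an illustration of the rule described above and not the cmsh parser: a mode name followed by a command runs that one command in the named mode, while a mode name on its own changes the current mode. The MODES set is a hypothetical subset of the real mode names, and the split on semicolons deliberately ignores quoting.

```python
import shlex

# Hypothetical subset of the top-level mode names listed by help.
MODES = {"category", "device", "network", "session"}


def trace(line, current_mode):
    """Return ([(mode, cmd, args), ...], final_mode) for one cmsh input line."""
    steps = []
    for part in line.split(";"):  # naive: does not handle ";" inside quotes
        tokens = shlex.split(part)
        if not tokens:
            continue
        if tokens[0] in MODES:
            if len(tokens) == 1:
                # A bare mode name enters that mode for what follows.
                current_mode = tokens[0]
            else:
                # "<mode> <cmd> ..." runs one command there, then returns.
                steps.append((tokens[0], tokens[1], tokens[2:]))
        else:
            steps.append((current_mode, tokens[0], tokens[1:]))
    return steps, current_mode


print(trace("device status headnode; list", "network"))
print(trace("device; status headnode; list", "network"))
```

In the first call the list step is attributed to network mode and the final mode is still network. In the second call both steps are attributed to device mode, which is also the final mode.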

Working with Objects#

Modes in cmsh work with associated groupings of data called objects. For instance, device mode works with device objects, and network mode works with network objects. The commands used to deal with objects have similar behavior in all modes. Not all the commands exist in every mode, and not all the commands function with an explicit object (Table 7).

Table 7. Command and objects

Command	Description
`use`	Use the specified object. That is: Make the specified object the current object
`add`	Create the object and use it
`assign`	Assign a new object
`unassign`	Unassign an object
`clear`	Clear the values of the object
`clone`	Clone the object and use it
`commit`	Commit local changes, done to an object, to CMDaemon
`refresh`	Undo local changes done to the object
`list`	List all objects at current level
`sort`	Sort the order of display for the list command
`format`	Set formatting preferences for list output
`foreach`	Execute a set of commands on several objects
`show`	Display all properties of the object
`swap`	Swap (exchange) the names of two objects
`get`	Display specified property of the object
`set`	Set a specified property of the object
`clear`	Set default value for a specified property of the object.
`append`	Append a value to a property of the object, for a multi-valued property
`removefrom`	Remove a value from a specific property of the object, for a multi-valued property
`modified`	List objects with uncommitted local changes
`usedby`	List objects that depend on the object
`validate`	Do a validation check on the properties of the object
`exit`	Exit from the current object or mode level

Working with objects with these commands is demonstrated with several examples in this section.

use and exit#

[dgxsuperpod->device]% use dgx001
[dgxsuperpod->device[dgx001]]% status
dgx001 ............. [      UP      ]
[dgxsuperpod->device[dgx001]]% exit
[dgxsuperpod->device]%

In the preceding example, use dgx001 issued from within device mode makes dgx001 the current object. The prompt changes accordingly. The status command, without an argument, then returns status information just for dgx001, because making an object the current object makes subsequent commands within that mode level apply only to that object. Finally, the exit command exits the current object level.

add, commit, and remove#

The commands introduced in this section have many implicit concepts associated with them. So an illustrative session is first presented as an example. What happens in the session is then explained to familiarize the reader with the commands and associated concepts.

[dgxsuperpod->device]% add physicalnode dgx100 10.141.0.100
[dgxsuperpod->device*[dgx100*]]% commit
[dgxsuperpod->device[dgx100]]% category add test-category
[dgxsuperpod->category*[test-category*]]% commit
[dgxsuperpod->category[test-category]]% remove test-category
[dgxsuperpod->category*]% commit
Successfully removed 1 Categories
Successfully committed 0 Categories
[dgxsuperpod->category]% device remove dgx100
[dgxsuperpod->category]% device
[dgxsuperpod->device*]% commit
Successfully removed 1 Devices
Successfully committed 0 Devices
[dgxsuperpod->device]%

add: this command creates an object within its associated mode, and in cmsh the prompt drops into the object level just created. Thus, at the start in the preceding example, within device mode, a new object, named dgx100, is added. For this object, properties such as the type (physicalnode) and IP address (10.141.0.100) can be set. The node object level ([dgx100*]) is automatically dropped into from device mode when the add command is executed. After execution, the state achieved is that the object has been created with some properties. However, it is still in a temporary, modified state, and not yet persistent. Asterisk tags in the prompt are a useful reminder of a modified state, with each asterisk indicating a tagged object that has an unsaved, modified property. In this case, the unsaved properties are the IP address setting, the node name, and the node type.

The add command—syntax notes:

In most modes the add command takes only one argument, namely the name of the object that is to be created. However, in device mode, an extra object-type, in this case physicalnode, is also required as argument, and an optional extra IP argument may also be specified. The response to help add while in device mode gives details:

[myheadnode->device]% help add
Name:
add - Create a new device of the given type with specified hostname. The IP address may also be set.
Usage:
add <type> <hostname> [IP address]
Arguments:
type
chassis, genericdevice, gpuunit, litenode, cloudnode, physicalnode, headnode, powerdistributionunit, racksensor, ethernetswitch, ibswitch, myrinetswitch

commit: this command is a further step that saves any changes made after executing a command. In this case, in the second line, it saves the dgx100 object with its properties. The asterisk tag disappears for the prompt if settings for that mode level and below have been saved.

The top-level modes, such as the category mode, can be accessed directly from within this level if the mode is stated before the command. So, stating the mode category before running the add command allows the specified category test-category to be added. Again, the test-category object level within category mode is automatically dropped into when the add command is executed.

commit -w|--wait: by default, the commit command does not wait for a state change to complete, so the prompt becomes available right away. As a result, it is not obvious when the change has actually taken place, which can cause problems when scripting with cmsh, for example when cloning a software image (2.1.2). The -w|--wait option to the commit command works around this issue by waiting for any associated background task, such as the cloning of a software image, to complete before making the prompt available.

remove: this command removes a specified object within its associated mode. On successful execution, if the prompt is at the object level, then the prompt moves one level up. The removal is not conducted yet; it is only a proposed removal. This is indicated by the asterisk tag, which remains visible until the commit command is executed, and the test-category removal is saved. The remove command can also remove an object in a non-local mode if the non-local mode is associated with the command. This is illustrated in the example where, from within category mode, the device mode is declared before running the remove command for dgx100. The proposed removal is configured without being made permanent, but in this case no asterisk tag shows up in the category mode, because the change is in device mode. To drop into device mode, the mode command “device” is executed. An asterisk tag appears to remind the administrator that there is still an uncommitted change (the node that is to be removed) for the mode. The commit command would remove the object whichever mode it is in—the non-existence of the asterisk tag does not change the effectiveness of commit.

remove -d|--data: the remove command by default removes an object, and not the represented data. An example is if, in softwareimage mode, a software image is removed with the remove (without options) command. As far as the cluster manager is concerned, the image is removed after running commit. However, the data in the directory for that software image is not removed. The -d|--data option to the remove command arranges removal of the data in the directory for the specified image, as well as removal of its associated object.

remove -a|--all: the remove command by default does not remove software image revisions. The -a|--all option to the remove command also removes all software image revisions.
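The staging behavior of add, remove, and commit described in this section can be summarized with a toy model. The Python sketch below is a conceptual illustration only and is not how CMDaemon is implemented. The class name Mode and its attributes are hypothetical, and the choice to silently drop a removed object that was never committed is an assumption.

```python
class Mode:
    """Toy model of one cmsh mode: committed objects plus uncommitted changes."""

    def __init__(self, committed=()):
        self.committed = set(committed)
        self.pending_add = set()
        self.pending_remove = set()

    def add(self, name):
        # The object exists only locally until commit is run.
        self.pending_add.add(name)

    def remove(self, name):
        if name in self.pending_add:
            self.pending_add.discard(name)  # assumption: never-committed object is dropped
        elif name in self.committed:
            self.pending_remove.add(name)   # proposed removal only
        else:
            raise KeyError(name)

    @property
    def modified(self):
        """True while the prompt would carry an asterisk for this mode."""
        return bool(self.pending_add or self.pending_remove)

    def commit(self):
        self.committed |= self.pending_add
        self.committed -= self.pending_remove
        self.pending_add.clear()
        self.pending_remove.clear()


device = Mode()
device.add("dgx100")
print(device.modified, "dgx100" in device.committed)
device.commit()
device.remove("dgx100")
print(device.modified, "dgx100" in device.committed)
device.commit()
print(device.modified, "dgx100" in device.committed)
```

The three printed pairs correspond to the states after add, after remove, and after the final commit: a change is visible as modified first, and only commit alters the committed set.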

clone, modified, and swap#

The node object dgx100 that was created in the previous example, can be cloned to dgx101 as follows:

[dgxsuperpod->device]% clone dgx100 dgx101
Warning: The Ethernet switch settings were not cloned, and have to be set manually
[dgxsuperpod->device*[dgx101*]]% exit
[dgxsuperpod->device*]% modified
State  Type                     Name
------ ------------------------ -----------------------------------
+      Device                   dgx101
[dgxsuperpod->device*]% commit
[dgxsuperpod->device]%
[dgxsuperpod->device]% remove dgx100
[dgxsuperpod->device*]% commit
[dgxsuperpod->device]%

The modified command is used to check which objects have uncommitted changes, and the new object dgx101, which is seen to be modified, is saved with a commit. The device dgx100 is then removed by using the remove command. A commit executes the removal.

The modified command corresponds to the functionality of the Unsaved entities icon (Figure 11).

The + entry in the State column in the output of the modified command in the preceding example indicates that the object is a newly added one, but not yet committed. Similarly, a ~ entry indicates an object that is to be removed on committing, while a blank entry indicates that the object has been modified without an addition or removal involved.

Cloning an object is a convenient method of duplicating a fully configured object. When duplicating a device object, cmsh will attempt to automatically assign a new IP address using several heuristics. In the preceding example, dgx101 is assigned IP address 10.141.0.101. The attempt is a best-effort and does not guarantee a properly configured object. The cluster administrator should therefore inspect the result.

Sometimes an object may have been misnamed, or physically swapped. For example, dgx001 may have been physically exchanged with dgx002 in the rack, or the hardware device eth0 may have been misnamed by the kernel and should be eth1. In that case it can be convenient to swap their names using the cluster manager front-end rather than change the physical device or adjust kernel configurations. This is equivalent to exchanging all the attributes from one name to the other. For example, if the two interfaces on the head node must have their names exchanged, it can be done as follows:

[dgxsuperpod->device]% use dgxsuperpod
[dgxsuperpod->device[dgxsuperpod]]% interfaces
[dgxsuperpod->device[dgxsuperpod]->interfaces]%  list
Type         Network device name  IP               Network
------------ -------------------- ---------------- --------------
physical     eth0 [dhcp]          10.150.4.46      externalnet
physical     eth1 [prov]          10.141.255.254   internalnet
[dgxsuperpod->device[dgxsuperpod]->interfaces]% swap eth0 eth1; commit
[dgxsuperpod->device[dgxsuperpod]->interfaces]% list
Type         Network device name  IP               Network
------------ -------------------- ---------------- --------------
physical     eth0 [prov]          10.141.255.254   internalnet
physical     eth1 [dhcp]          10.150.4.46      externalnet
[dgxsuperpod->device[dgxsuperpod]->interfaces]% exit; exit

get, set, and refresh#

The get command is used to retrieve a specified property from an object, and set is used to set it:

[dgxsuperpod->device]% use dgx101
[dgxsuperpod->device[dgx101]]% get category
test-category
[dgxsuperpod->device[dgx101]]% set category default
[dgxsuperpod->device*[dgx101*]]% get category
default
[dgxsuperpod->device*[dgx101*]]% modified
State  Type                     Name
------ ------------------------ -------------------------------
       Device                   dgx101
[dgxsuperpod->device*[dgx101*]]% refresh
[dgxsuperpod->device[dgx101]]% modified
No modified objects of type device
[dgxsuperpod->device[dgx101]]% get category
test-category
[dgxsuperpod->device[dgx101]]%

Here, the category property of the dgx101 object is retrieved by using the get command. The property is then changed using the set command. Using get confirms that the value of the property has changed, and the modified command reconfirms that dgx101 has local uncommitted changes. The refresh command undoes the changes made and corresponds to the Revert button in Base View when viewing Unsaved entities (Figure 11). The modified command then confirms that no local changes exist. Finally, the get command reconfirms that no local change took place.

Among the possible values a property can take on are strings and Booleans:

A string can be set as a revision label for any object:

[dgxsuperpod->device[dgx101]]% set revision "changed on 10th May"
[dgxsuperpod->device*[dgx101*]]% get revision
changed on 10th May
[dgxsuperpod->device*[dgx101*]]%

This can be useful when using shell scripts with an input text to label and track revisions when sending commands to cmsh. How to send commands from the shell to cmsh is introduced in 2.4.1.

For Booleans, the values yes, 1, on and true are equivalent to each other, as are their opposites no, 0, off and false. These values are case-insensitive.
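A script that prepares values before sending them to cmsh may want to normalize Booleans in the same way. The following Python sketch encodes only the equivalences stated above. The function name parse_bool is hypothetical, and rejecting any other input with an error is an assumption rather than documented cmsh behavior.

```python
TRUE_VALUES = {"yes", "1", "on", "true"}
FALSE_VALUES = {"no", "0", "off", "false"}


def parse_bool(text: str) -> bool:
    """Map the equivalent Boolean spellings to True or False, ignoring case."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a Boolean value: {text!r}")


print(parse_bool("Yes"), parse_bool("OFF"))
```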

clear#

[dgxsuperpod->device]% set dgx101 mac 00:11:22:33:44:55
[dgxsuperpod->device*]% get dgx101 mac
00:11:22:33:44:55
[dgxsuperpod->device*]% clear dgx101 mac
[dgxsuperpod->device*]% get dgx101 mac
00:00:00:00:00:00
[dgxsuperpod->device*]%

The get and set commands are used to view and set the MAC address of dgx101 without running the use command to make dgx101 the current object. The clear command then unsets the value of the property. The result of clear depends on the type of the property that it acts on. In the case of string properties, the empty string is assigned, whereas for MAC addresses the special value 00:00:00:00:00:00 is assigned.

list, format, and sort#

The list command is used to list objects in a mode. The command has many options. The ones that are valid for the current mode can be viewed by running help list. The -f|--format option is available in all modes and takes a format string as argument. The string specifies what properties are printed for each object, and how many characters are used to display each property in the output line. In the following example, a list of objects is requested for device mode, displaying the hostname, switchports, and ip properties for each device object.

[headnode->device]% list -f hostname:14,switchports:15,ip
hostname (key) switchports     ip
-------------- --------------- --------------------
apc01                          10.142.254.1
headnode       switch01:46     10.142.255.254
dgx001         switch01:47     10.142.0.1
dgx002         switch01:45     10.142.0.2
switch01                       10.142.253.1
[headnode->device]%

Running the list command with no argument uses the current format string for the mode. Running the format command without arguments displays the current format string, and displays all available properties including a description of each property:

[headnode->device]% format
Current list printing format:
-----------------------------
type:22, hostname:[16-32], mac:18, category:[16-32], ip:15, network:[14-32], status:[16-32]
Valid fields:
-------------
activation             : Date on which node was defined
additionalhostnames    : List of additional hostnames that should resolve to the interfaces IP address
allownetworkingrestart : Allow node to update ifcfg files and restart networking
banks                  : Number of banks
...

The print specification of the format command uses a colon (:) as the delimiter that separates the parameter from the value for the width of the parameter column. For example, a width of ten can be set with:

[headnode->device]% format hostname:10
[headnode->device]% list
hostname (
----------
apc01
headnode
dgx001
dgx002
switch01

A range of widths can be set, from a minimum to a maximum, using square brackets. The smallest width in the range that fits all the characters of the column is chosen. If the number of characters in the column exceeds the maximum, then the maximum value is chosen. For example:

[headnode->device]% format hostname:[10-14]
[headnode->device]% list
hostname (key)
--------------
apc01
headnode
dgx001
dgx002
switch01
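The width rule can be written down as a clamp. The Python sketch below is an illustration of the rule as described, not the cmsh implementation, and the function names are hypothetical. It assumes that the column header counts as one of the cells, which is consistent with the header hostname (key) being cut short at a width of 10 and shown in full at a width of 14 in the preceding examples.

```python
def column_width(cells, minimum, maximum):
    """Smallest width in [minimum, maximum] that fits every cell, else maximum."""
    longest = max(len(cell) for cell in cells)
    return min(max(longest, minimum), maximum)


def render(cells, width):
    """Cut each cell to the column width and pad it on the right."""
    return [cell[:width].ljust(width) for cell in cells]


column = ["hostname (key)", "apc01", "headnode", "dgx001", "dgx002", "switch01"]
print(column_width(column, 10, 14))    # width picked for hostname:[10-14]
print(render(column, 10)[0])           # header as shown for hostname:10
```

A fixed width such as hostname:10 is the special case where the minimum and the maximum are equal.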

The parameters to be viewed can be chosen from a list of valid fields by running the format command without any options, as shown earlier. The format command can take as an argument a string that is made up of multiple parameters in a comma-separated list. Each parameter takes a colon-delimited width specification.

[headnode->device]% format hostname:[10-14],switchports:14,ip:20
[headnode->device]% list
hostname (key) switchports    ip
-------------- -------------- --------------------
apc01                         10.142.254.1
headnode       switch01:46    10.142.255.254
dgx001         switch01:47    10.142.0.1
dgx002         switch01:45    10.142.0.2
switch01                      10.142.253.1

The output of the format command without arguments shows the current list printing format string, with spaces. In general, the string used in the format command can be set with enclosing quotes ("):

[headnode->device]% format "hostname:[16-32], network:[14-32], status:[16-32]"

Or with the spaces removed:

[headnode->device]%  format  hostname:[16-32],network:[14-32],status:[16-32]
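To show that the quoted form with spaces and the unquoted form without spaces describe the same specification, the following Python sketch parses a format string into field names and width ranges. It is an illustration based only on the syntax shown in this section. The function name parse_format and the regular expression are assumptions, and the real cmsh parser may accept more than this.

```python
import re

# name, optionally followed by ":<width>" or ":[<min>-<max>]"
FIELD = re.compile(
    r"^(?P<name>\w+)(?::(?:(?P<fixed>\d+)|\[(?P<lo>\d+)-(?P<hi>\d+)\]))?$"
)


def parse_format(spec):
    """Return [(name, (min_width, max_width) or None), ...] for a format string."""
    fields = []
    for item in spec.split(","):
        match = FIELD.match(item.strip())
        if match is None:
            raise ValueError(f"bad field specification: {item!r}")
        if match["fixed"] is not None:
            width = (int(match["fixed"]), int(match["fixed"]))
        elif match["lo"] is not None:
            width = (int(match["lo"]), int(match["hi"]))
        else:
            width = None  # no width given, as for "ip" in an earlier example
        fields.append((match["name"], width))
    return fields


with_spaces = parse_format("hostname:[16-32], network:[14-32], status:[16-32]")
without_spaces = parse_format("hostname:[16-32],network:[14-32],status:[16-32]")
print(with_spaces == without_spaces)
```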

The default parameter settings can be restored with the -r|--reset option:

[headnode->device]% format -r
[headnode->device]% format | head -3
Current list printing format:
-----------------------------
type:22, hostname:[16-32], mac:18, category:[16-32], ip:15, network:[14-32], status:[16-32]
[headnode->device]%

The sort command sorts output in alphabetical order for specified parameters when the list command is run. The sort is done according to the precedence of the parameters passed to the sort command:

[headnode->device]% sort type mac
[headnode->device]% list -f type:15,hostname:15,mac
type            hostname (key)  mac
--------------- --------------- --------------------
HeadNode        headnode        08:0A:27:BA:B9:43
PhysicalNode    dgx002          00:00:00:00:00:00
PhysicalNode    log001          52:54:00:DE:E3:6B
[headnode->device]% sort type hostname
[headnode->device]% list -f type:15,hostname:15,mac
type            hostname (key)  mac
--------------- --------------- --------------------
HeadNode        headnode        08:0A:27:BA:B9:43
PhysicalNode    dgx002          00:00:00:00:00:00
PhysicalNode    log001          52:54:00:DE:E3:6B
[headnode->device]% sort mac hostname
[headnode->device]% list -f type:15,hostname:15,mac
type            hostname (key)  mac
--------------- --------------- --------------------
PhysicalNode    dgx002          00:00:00:00:00:00
HeadNode        headnode        08:0A:27:BA:B9:43
PhysicalNode    log001          52:54:00:DE:E3:6B

The preceding sort commands can alternatively be specified with the -s|--sort option to the list command:

[headnode->device]% list -f type:15,hostname:15,mac --sort type,mac
[headnode->device]% list -f type:15,hostname:15,mac --sort type,hostname
[headnode->device]% list -f type:15,hostname:15,mac --sort mac,hostname
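Sorting by the precedence of the parameters means that the first parameter is the primary key and each following parameter only breaks ties. A minimal Python sketch of that idea follows, using the three devices from the preceding listings. The plain string comparison used here is an assumption about how cmsh compares values, and the function name sort_devices is hypothetical.

```python
devices = [
    {"type": "PhysicalNode", "hostname": "log001", "mac": "52:54:00:DE:E3:6B"},
    {"type": "HeadNode", "hostname": "headnode", "mac": "08:0A:27:BA:B9:43"},
    {"type": "PhysicalNode", "hostname": "dgx002", "mac": "00:00:00:00:00:00"},
]


def sort_devices(rows, *fields):
    """Sort rows by the given fields, earlier fields taking precedence."""
    return sorted(rows, key=lambda row: tuple(row[field] for field in fields))


# Equivalent in spirit to "sort type mac" and "sort mac hostname".
for fields in (("type", "mac"), ("mac", "hostname")):
    print(fields, [row["hostname"] for row in sort_devices(devices, *fields)])
```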

append and removefrom#

When dealing with a property of an object that can take more than one value at a time—a list of values—the append and removefrom commands can be used to respectively append to and remove elements from the list. If more than one element is appended, they should be space-separated. The set command may also be used to assign a new list immediately, overwriting the existing list. In the following example, values are appended and removed from the powerdistributionunits properties of device dgx001.

The powerdistributionunits properties represent the list of ports on power distribution units that a particular device is connected to. This information is relevant when power operations are performed on a node.

[dgxsuperpod->device]% use dgx001
[dgxsuperpod->device[dgx001]]% get powerdistributionunits
apc01:1
[...device[dgx001]]% append powerdistributionunits apc01:5
[...device*[dgx001*]]% get powerdistributionunits
apc01:1 apc01:5
[...device*[dgx001*]]% append powerdistributionunits apc01:6
[...device*[dgx001*]]% get powerdistributionunits
apc01:1 apc01:5 apc01:6
[...device*[dgx001*]]% removefrom powerdistributionunits apc01:5
[...device*[dgx001*]]% get powerdistributionunits
apc01:1 apc01:6
[...device*[dgx001*]]% set powerdistributionunits apc01:1 apc01:02
[...device*[dgx001*]]% get powerdistributionunits
apc01:1 apc01:2

Chapter 4 of the Bright Cluster Manager Administrator Manual has more information on power settings and operations.
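The difference between append, removefrom, and set on a multi-valued property can be modeled with ordinary list operations. The Python sketch below is a conceptual illustration only. The function names mirror the cmsh commands but are hypothetical helpers, and details such as how duplicates are treated are not covered by this section and are not modeled.

```python
def append(values, elements):
    """Add space-separated elements to the end of the existing list."""
    return values + elements.split()


def removefrom(values, elements):
    """Drop the space-separated elements from the existing list."""
    unwanted = set(elements.split())
    return [value for value in values if value not in unwanted]


def set_values(elements):
    """Replace the whole list, ignoring whatever was there before."""
    return elements.split()


pdus = ["apc01:1"]
pdus = append(pdus, "apc01:5")
pdus = append(pdus, "apc01:6")
pdus = removefrom(pdus, "apc01:5")
print(" ".join(pdus))
```

The printed line corresponds to the state reached in the session after the removefrom command.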

usedby#

Removing a specific object is only possible if other objects do not have references to it. To help the administrator discover a list of objects that depend on (“use”) the specified object, the usedby command may be used. In the following example, objects depending on device apc01 are requested. The usedby property of powerdistributionunits indicates that device objects dgx001 and dgx002 contain references to (“use”) the object apc01. In addition, the apc01 device is itself displayed as being in the up state, indicating a dependency of apc01 on itself. If the device is to be removed, then the two references to it first must be removed, and the device then must be brought to the CLOSED state by using the close command.

[dgxsuperpod->device]% usedby apc01
Device used by the following:
Type                    Name      Parameter
---------------- ---------- ----------------------
Device               apc01     Device is up
Device               dgx001    powerDistributionUnits
Device               dgx002    powerDistributionUnits
[dgxsuperpod->device]%
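Conceptually, usedby is a reverse lookup over the references that objects hold. The following Python sketch shows that idea for the powerdistributionunits property only. The port numbers are made-up values for illustration, the function name used_by is hypothetical, and the self-dependency reported for a device that is up is not modeled.

```python
# Hypothetical device data; the port numbers are illustrative only.
devices = {
    "apc01": {"powerdistributionunits": []},
    "dgx001": {"powerdistributionunits": ["apc01:1"]},
    "dgx002": {"powerdistributionunits": ["apc01:2"]},
}


def used_by(target, devices):
    """List (device, parameter) pairs for devices that reference target."""
    users = []
    for name, properties in sorted(devices.items()):
        ports = properties["powerdistributionunits"]
        # A reference has the form "<pdu name>:<port>".
        if any(port.split(":")[0] == target for port in ports):
            users.append((name, "powerDistributionUnits"))
    return users


for name, parameter in used_by("apc01", devices):
    print(name, parameter)
```

Removing the references from dgx001 and dgx002 makes the lookup return an empty list, which mirrors the condition that must hold before the object can be removed.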

validate#

Whenever committing changes to an object, the cluster management infrastructure checks the object to be committed for consistency. If one or more consistency requirements are not met, then cmsh reports the violations that must be resolved before the changes are committed. The validate command allows an object to be checked for consistency without committing local changes.

[dgxsuperpod->device]% use dgx001
[dgxsuperpod->device[dgx001]]% clear category
[dgxsuperpod->device*[dgx001*]]% commit
Code  Field                    Message
----- ------------------------ ---------------------------
1     category                 The category should be set
[dgxsuperpod->device*[dgx001*]]% set category default
[dgxsuperpod->device*[dgx001*]]% validate
All good
[dgxsuperpod->device*[dgx001*]]% commit
[dgxsuperpod->device[dgx001]]%
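The relationship between validate and commit can be sketched as a check that commit always runs first and that validate exposes on its own. The Python model below is an illustration and not the cluster manager's validation logic. It implements only the single rule seen in the session above, and the function names and the dictionary representation of a device are assumptions.

```python
def validate(device):
    """Return a list of (code, field, message) violations; empty means consistent."""
    problems = []
    if not device.get("category"):
        problems.append((1, "category", "The category should be set"))
    return problems


def commit(device, store):
    """Save the device only if validation passes; return the violations otherwise."""
    problems = validate(device)
    if not problems:
        store[device["hostname"]] = dict(device)
    return problems


store = {}
node = {"hostname": "dgx001", "category": ""}
print(commit(node, store), "dgx001" in store)   # rejected, nothing saved
node["category"] = "default"
print(validate(node))                            # check without saving
print(commit(node, store), "dgx001" in store)   # accepted and saved
```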

show#

The show command is used to show the parameters and values of a specific object. For example, for the object dgx001, the attributes displayed are:

[dgxsuperpod->device[dgx001]]% show
Parameter                               Value
--------------------------------------- ------------------------------------
Activation                              Thu, 03 Aug 2017 15:57:42 CEST
BMC Settings                            <submode>
Block devices cleared on next boot
Category                                default
...
Data node                               no
Default gateway                         10.141.255.254 (network: internalnet)
...
Software image                          default-image
Static routes                           <0 in submode>
...

assign and unassign#

The assign and unassign commands are analogous to add and remove. The difference between assign and add from the system administrator point of view is that assign sets an object with settable properties from a choice of existing names, whereas add sets an object with settable properties that include the name that is to be given. This makes assign suited for cases where multiple versions of a specific object choice cannot be used.

For example:

If a node is going to be configured to run with particular Slurm settings, the node can be assigned a slurmclient role with the assign command. The node cannot be assigned another slurmclient role with other Slurm settings at the same time. Only the settings within the assigned Slurm client role can be changed.
If a node is to be configured to run with added interfaces eth3 and eth4, then the node can have both physical interfaces added to it with the add command.

The only place where the assign command is currently used within cmsh is within the roles submode, available under category mode, configurationoverlay mode, or device mode. Within roles, assign is used for assigning roles objects to give properties associated with that role to the category, configuration overlay, or device.

import#

The import command is an advanced command that works within a role. It is used to clone roles between entities. A node inherits all roles from the category and configuration overlay it is a part of.

[root@headnode ~]# cmsh
[headnode]% device roles dgx001
[headnode->device[dgx001]->roles]% list
Name (key)
-------------------------------
[category:default] cgroupsupervisor
[category:default] slurmclient

If there is a small change to the default roles to be made, only for dgx001, in slurmclient, then the role can be imported from a category or overlay. Importing the role duplicates the object and assigns the duplicate value to dgx001.

This differs from simply assigning a slurmclient role to dgx001, because importing provides the values from the category or overlay, whereas assigning provides unset values.

After running import, just as for assign, changes to the role made at dgx001 level stay at that node level, and changes made to the category-level or overlay-level slurmclient role are not automatically inherited by the dgx001 slurmclient role.

[headnode->device[dgx001]->roles]% import<TAB><TAB>
cephmds     cloudgateway    elasticsearch
...and other available roles including  slurmclient...
[headnode->device[dgx001]->roles]% import --overlay slurm-client slurmclient
[headnode->device*[dgx001*]->roles*]% list
Name (key)
-------------------------------
[category:default] cgroupsupervisor
slurmclient
[headnode->device*[dgx001*]->roles*]% set slurmclient queues dgx1q
[headnode->device*[dgx001*]->roles*]% commit

The preceding shows that a list of roles is prompted for using tab-completion after having typed import, and that the settings from the configuration overlay level are brought into dgx001 for the slurmclient role. The slurmclient values at node level then override any of the overlay-level or category-level settings, as suggested by the new list output. The Slurm client settings for dgx001 are then the same as the settings at the overlay level, except for the one change made: a special queue, dgx1q, is configured just for dgx001.

The import command in roles mode can duplicate any role between any two entities. Options can be used to import from a category (-c|--category), a node (-n|--node), or an overlay (-o|--overlay), as indicated by its help text (help import).

Advanced cmsh Features#

This section describes some advanced features of cmsh.

CLI Editing#

CLI editing and history features from the readline library are available. http://tiswww.case.edu/php/chet/readline/rluserman.html provides a full list of key-bindings. For users who are familiar with the bash shell running with readline, probably the most useful and familiar features provided by readline within cmsh are:

Tab-completion of commands and arguments.
Being able to select earlier commands from the command history using <ctrl>-r or using the up- and down-arrow keys.

history and timestamps#

The history command within cmsh explicitly displays the cmsh command history as a list. The --timestamps|-t option to the history command displays the command history with timestamps.

[headnode->device[dgx001]]% history | tail -3
use dgx001
history
history |  tail -3
[headnode->device[dgx001]]% history -t | tail -3
Thu Dec  3 15:15:18 2015 history
Thu Dec  3 15:15:43 2015 history | tail -3
Thu Dec  3 15:15:49 2015 history -t | tail -3

This history is saved in the file .cm/.cmshhistory in the cmsh user’s directory. The timestamps in the file are in unix epoch time format and can be converted to human friendly format with the standard date utility.

[root@dgxsuperpod ~]# tail -2 .cm/.cmshhistory
1615412046
device list
[root@dgxsuperpod ~]# date -d @1615412046
Wed Mar 10 22:34:06 CET 2021
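
The same conversion can be scripted for the whole file. The following is a minimal Python sketch, assuming that the history file alternates a line holding the epoch timestamp with a line holding the command, as the tail output above suggests. The helper name parse_cmsh_history is hypothetical and is not part of cmsh.

```python
from datetime import datetime, timezone


def parse_cmsh_history(text):
    """Pair each epoch-timestamp line with the command line that follows it.

    Assumption: the file alternates timestamp lines and command lines.
    """
    entries = []
    timestamp = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if timestamp is None and line.isdigit():
            # A line of digits is taken to be the timestamp of the next command.
            timestamp = int(line)
        elif timestamp is not None:
            when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            entries.append((when, line))
            timestamp = None
    return entries


sample = "1615412046\ndevice list\n"
for when, command in parse_cmsh_history(sample):
    # Rendered in UTC here, whereas `date -d @...` uses the local time zone.
    print(when.strftime("%Y-%m-%d %H:%M:%S UTC"), command)
```

For the sample entry this prints 2021-03-10 21:34:06 UTC, which is the same instant as the CET time shown by the date utility.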

Mixing cmsh and Unix Shell Commands#

It is often useful for an administrator to be able to execute unix shell commands while conducting cluster management tasks. The cluster manager shell, cmsh, therefore allows users to execute commands in a subshell if the command is prefixed with a ! character:

[dgxsuperpod]% !hostname -f
dgxsuperpod.cm.cluster
[dgxsuperpod]%

Executing the ! command by itself will start an interactive login subshell. By exiting the subshell, the user will return to the cmsh prompt. Besides simply executing commands from within cmsh, the output of OS shell commands can also be used within cmsh. This is done by using the legacy-style “backtick syntax” available in most unix shells.

[dgxsuperpod]% device use `hostname`
[dgxsuperpod->device[dgxsuperpod]]% status
dgxsuperpod ................ [   UP   ]
[dgxsuperpod->device[dgxsuperpod]]%

Output Redirection#

Like unix shells, cmsh also supports output redirection to the shell through common operators such as “>”, “>>”, and “|”.

[dgxsuperpod]% device list > devices
[dgxsuperpod]% device status >> devices
[dgxsuperpod]% device list | grep dgx001
Type           Hostname (key)  MAC (key)          Category
-------------- --------------- ------------------ ----------
PhysicalNode   dgx001          00:E0:81:2E:F7:96  default

Input Redirection#

Input redirection with cmsh is possible. As is usual, the input can be a string or a file. For example, for a file runthis with some commands stored in it:

[root@dgxsuperpod ~]# cat runthis
device
get dgx001 ip

The commands can be run with the redirection operator as:

[root@dgxsuperpod ~]# cmsh < runthis
device
get dgx001 ip
10.141.0.1

Running the file with the -f option avoids echoing the commands:

[root@dgxsuperpod ~]# cmsh -f runthis
10.141.0.1

ssh#

The ssh command is run from within the device mode of cmsh. If an ssh session is launched from within cmsh, then it clears the screen and is connected to the specified node. Exiting from the ssh session returns the user back to the cmsh launch point.

[headnode]% device ssh dgx001
<screen is cleared>
<some MOTD text and login information is displayed>
[root@dgx001 ~]# exit
Connection to dgx001 closed.
[headnode]% device use headnode
[headnode->device[headnode]]% #now let us connect to the head node from the head node object
[headnode->device[headnode]]% ssh
<screen is cleared>
<some MOTD text and login information is displayed>
[root@headnode ~]# exit
logout
Connection to headnode closed.
[headnode->device[headnode]]%

An alternative to running ssh within cmsh is to launch it in a subshell anywhere from within cmsh, by using !ssh.

time#

The time command within cmsh is a simplified version of the standard unix time command. The time command takes as its argument a second command that is to be executed within cmsh. On execution of the time command, the second command is executed. After execution of the time command is complete, the time the second command took to execute is displayed.

[headnode->device]% time ds dgx001
dgx001 .................. [   UP   ]
time: 0.108s

watch#

The watch command within cmsh is a simplified version of the standard unix watch command. The watch command takes as its argument a second command that is to be executed within cmsh. On execution of the watch command, the second command is executed every two seconds by default, and the output of that second command is displayed.

The repeat interval of the watch command can be set with the --interval|-n option. A running watch command can be interrupted with a <Ctrl>-c.

[headnode->device]% watch newnodes
<screen clears>
Every 2.0s: newnodes        Thu Dec 3 13:01:45 2015
No new nodes currently available.
[headnode->device]% watch -n 3 status -n dgx001,dgx002
<screen clears>
Every 3.0s: status -n dgx001,dgx002 Thu Jun 30 17:53:21 2016
dgx001 ...............[ UP ]
dgx002 ...............[ UP ]

foreach#

It is frequently convenient to be able to execute a cmsh command on several objects in parallel. The foreach command is available in several cmsh modes for this purpose. A foreach command takes a list of space-separated object names (the keys of the object) and a list of commands that must be enclosed by parentheses. The foreach command will then iterate through the objects, executing the list of commands on the iterated object each iteration.

Basic syntax for the foreach command:

foreach <object1> <object2> ... ( <command1>; <command2> ... )

[dgxsuperpod->device]% foreach dgx001 dgx002 (get hostname; status)
dgx001
dgx001 ............. [  UP  ]
dgx002
dgx002 ............. [  UP  ]
[dgxsuperpod->device]%

With the foreach command, it is possible to perform set commands on groups of objects simultaneously, or to perform an operation on a group of objects. The range command (2.4.4.12) provides an alternative to it in many cases.

Advanced options for the foreach command: the advanced options can be viewed from the help page:

[root@headnode ~]# cmsh -c "device help foreach"

The options can be classed as: grouping options (list, type), adding options, conditional options, and looping options.

Grouping options:

-n|--nodes, -g|--group, -c|--category, -r|--rack, -h|--chassis, -e|--overlay, -l|--role, -m|--image, -u|--union, -i|--intersection
-t|--type chassis|genericdevice|gpuunit|litenode|cloudnode|node|physicalnode|headnode|powerdistributionunit|racksensor|ethernetswitch|ibswitch|myrinetswitch|unmanagednode

There are two forms of grouping options shown in the preceding text. The first form uses a list of the objects being grouped, while the second form uses the type of the objects being grouped. These options become available according to the cmsh mode used. In the device mode of cmsh, for example, the foreach command has many grouping options available. If objects are specified with a grouping option, then the specified objects can be looped over.

For example, with the list form, the --category|-c option takes a node category argument (or several categories), while the --node|-n option takes a node-list argument. Node-lists (2.4.4.10) can also use the following, more elaborate, syntax: <node>,…,<node>,<node>..<node>:

[demo->device]% foreach -c default (status)
dgx001 ............. [ DOWN ]
dgx002 ............. [ DOWN ]
[demo->device]% foreach -g rack8 (status)
...
[demo->device]% foreach -n dgx001,dgx008..dgx016,dgx032 (status)
...
[demo->device]%

With the type form, using the -t|--type option, the literal value to this option must be one of node, cloudnode, virtualnode, and so on. If multiple grouping options are used, then the union operation takes place by default. Both grouping option forms are often used in commands other than foreach for node selection.

-o|--clone: this option allows the cloning (2.4.3.3) of objects in a loop. In the following example, from device mode, dgx001 is used as the base object from which other nodes from dgx022 up to dgx024 are cloned:

[headnode->device]% foreach --clone dgx001 -n dgx022..dgx024 ()
[headnode->device*]% list | grep node
Type                 Hostname (key) Ip
------------        -------------- -----------
PhysicalNode        dgx001         10.141.0.1
PhysicalNode        dgx022         10.141.0.22
PhysicalNode        dgx023         10.141.0.23
PhysicalNode        dgx024         10.141.0.24
[headnode->device*]% commit

The cloned objects are placeholder schematics and settings, with different values for some of the settings, such as IP addresses, decided by heuristics. Object cloning does not duplicate the software disk image of dgx001 to the other nodes at this time.

-a|--add: this option creates the device for a specified device type if it does not exist. Valid types are shown in the help output, and include physicalnode, headnode, and ibswitch.

Conditional options: -s|--status, -q|--quitonunknown

The --status|-s option allows nodes to be filtered by the device status (2.1.1).

[headnode->device]% foreach -n dgx001..dgx004 --status UP (get IP)
10.141.0.1
10.141.0.3

Since the --status option is also a grouping option, the union operation applies to it by default too, when more than one grouping option is being run. The --quitonunknown|-q option allows the foreach loop to exit when an unknown command is detected.

Looping options: *, --verbose|-v

The wildcard character * with foreach implies all the objects that the list command lists for that mode. It is used without grouping options:

[myheadnode->device]% foreach * (get ip; status)
10.141.253.1
switch01 ............ [  DOWN  ]
10.141.255.254
myheadnode .......... [   UP   ]
10.141.0.1
dgx001 .............. [ CLOSED ]
10.141.0.2
dgx002 .............. [ CLOSED ]
[myheadnode->device]%

Another example that lists all the nodes per category, by running the listnodes command within category mode:

[headnode->category]% foreach * (get name; listnodes)
default
Type          Hostname  MAC                 Category  Ip          Network      Status
------------- --------- ------------------- --------- ----------- ------------ --------
PhysicalNode  dgx001    FA:16:3E:79:4B:77   default   10.141.0.1  internalnet  [ UP ]
PhysicalNode  dgx002    FA:16:3E:41:9E:A8   default   10.141.0.2  internalnet  [ UP ]
PhysicalNode  dgx003    FA:16:3E:C0:1F:E1   default   10.141.0.3  internalnet  [ UP ]

The --verbose|-v option displays the loop headers during a running loop with timestamps, which can help in debugging.

Node List Syntax#

Node list specifications, as used in the foreach specification and elsewhere, can be of several types. Here are some examples:

adhoc (with a comma, or a space):

example: dgx001,dgx003,dgx005,dgx006

sequential (with two dots or square brackets):

example: dgx001..dgx004 or equivalently: dgx00[1-4] which is: dgx001,dgx002,dgx003,dgx004

sequential extended expansion (only for square brackets):

example: dgx[001-002]s[001-005] which is: dgx001s001,dgx001s002,dgx001s003,dgx001s004,dgx001s005,dgx002s001,dgx002s002,dgx002s003,dgx002s004,dgx002s005

rack-based:

This is intended to hint which rack a node is located in. Thus:

example: r[1-2]n[01-03] which is: r1n01,r1n02,r1n03,r2n01,r2n02,r2n03 This might hint at two racks, r1 and r2, with three nodes each.

example: rack[1-2]dgx0[1-3] which is: rack1dgx01,rack1dgx02,rack1dgx03,rack2dgx01,rack2dgx02,rack2dgx03 Like the previous one, but for nodes that were named more verbosely.

sequential exclusion (negation):

example: dgx001..dgx005,-dgx002..dgx003 which is: dgx001,dgx004,dgx005

sequential stride (every <stride> steps):

example: dgx00[1..7:2] which is: dgx001,dgx003,dgx005,dgx007

mixed list:

The square brackets and the two dots input specification cannot be used at the same time in one argument. Other than this, specifications can be mixed:

example: r1n001..r1n003,r2n003 which is: r1n001,r1n002,r1n003,r2n003

example: r2n003,r[3-5]n0[01-03] which is: r2n003,r3n001,r3n002,r3n003,r4n001,r4n002,r4n003,r5n001,r5n002,r5n003

example: node[001-100],-node[004-100:4] which is: every node in the 100 nodes, except for every fourth node.

path to file that contains a list of nodes:

example: ^/some/filepath/<file with list of nodes>

The caret sign is a special character in cmsh for node list specifications. It indicates the string that follows is a file path that is to be read.
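
The expansion rules in the preceding examples can be sketched in Python. This is an illustrative model only, written from the examples in this section rather than from the cmsh implementation. The function name expand_nodelist and its helpers are hypothetical. Space-separated lists and the caret file path form are not handled.

```python
import itertools
import re


def _split_top_level(spec):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def _expand_bracket_body(body):
    """Expand a bracket body such as '1,2,5-7' or '004-100:4'."""
    values = []
    for piece in body.split(","):
        piece, _, stride = piece.partition(":")
        # Accept both '1-7' and '1..7' inside the brackets.
        lo, _, hi = piece.replace("..", "-").partition("-")
        hi = hi or lo
        for n in range(int(lo), int(hi) + 1, int(stride or 1)):
            values.append(str(n).zfill(len(lo)))  # keep the zero padding
    return values


def _expand_item(item):
    if "[" in item:
        # After the split, literal text and bracket bodies alternate.
        tokens = re.split(r"\[([^\]]*)\]", item)
        choices = [
            _expand_bracket_body(tok) if i % 2 else [tok]
            for i, tok in enumerate(tokens)
        ]
        return ["".join(combo) for combo in itertools.product(*choices)]
    if ".." in item:
        first, last = item.split("..")
        m1 = re.fullmatch(r"(.*?)(\d+)", first)
        m2 = re.fullmatch(r"(.*?)(\d+)", last)
        if not (m1 and m2):
            raise ValueError(f"range endpoints need a numeric suffix: {item}")
        prefix, lo, hi = m1.group(1), m1.group(2), m2.group(2)
        return [prefix + str(n).zfill(len(lo)) for n in range(int(lo), int(hi) + 1)]
    return [item]


def expand_nodelist(spec):
    """Expand a node list; items with a leading '-' are exclusions."""
    result = []
    for item in _split_top_level(spec):
        if item.startswith("-"):
            excluded = set(_expand_item(item[1:]))
            result = [name for name in result if name not in excluded]
        else:
            result.extend(n for n in _expand_item(item) if n not in result)
    return result


print(expand_nodelist("dgx001..dgx005,-dgx002..dgx003"))
print(expand_nodelist("r[1-2]n[01-03]"))
```

In this sketch, exclusions are applied in the order in which they appear, so an exclusion only removes names that were added earlier in the same specification.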

groupingsyntax#

groupingsyntax refers to usage of dots and square brackets. In other words, it is the syntax of how a grouping is marked so that it is accepted as a list. The list that is specified in this manner can be for input or output purposes. The groupingsyntax command sets the grouping syntax using the following options:

bracket: the square brackets specification.
dot: the two dots specification.
auto: the default. Setting auto means that:
- Either the dot or the bracket specifications are accepted as input.
- The dot specification is used for output.

The chosen groupingsyntax option can be made persistent by adding it to the .cmshrc dotfiles, or to /etc/cmshrc (2.4.1).

[root@headnode ~]# cat .cm/cmsh/.cmshrc
groupingsyntax auto

range#

The range command provides an interactive option to conduct basic foreach commands over a grouping of nodes. When the grouping option has been chosen, the cmsh prompt indicates the chosen range within braces ({}).

[headnode->device]% range -n dgx0[01-24]
[headnode->device{-n dgx001..024}]%

In the preceding example, commands applied at device level will be applied to the range of 24 node objects. Continuing the preceding session, a category can be selected with the -c option. If the default category has just three nodes, then the output displayed could look like:

[headnode->device{-n dgx001..024}]% range -c default
[headnode->device{-c default}]% ds
dgx001      [       UP      ] state flapping
dgx002      [       UP      ]
dgx003      [       UP      ]

Values can be set at device mode level for the selected grouping.

[headnode->device{-c default}]% get revision
[headnode->device{-c default}]% set revision test
[headnode->device{-c default}]% get revision
test test test

Values can also be set within a submode. However, staying in the submode for a full interaction is not possible. The settings must be done by entering the submode using a semicolon (new command statement continuation on same line) syntax, as follows:

[headnode->device{-c default}]% roles; assign pbsproclient; commit

The range command can be regarded as a modal way to carry out an implicit foreach on the grouping object. Many administrators should find it easier than a foreach:

[headnode->device{-c default}]% get ip
10.141.0.1
10.141.0.2
10.141.0.3
[headnode->device{-c default}]% ..
[headnode->device]% foreach -c default (get ip)
10.141.0.1
10.141.0.2
10.141.0.3

Commands can be run inside a range. However, running a pexec command inside a range is typically not the intention of the cluster administrator, even though it can be done:

[headnode->device]% range -n node[001-100]
[headnode->device{-n node[001-100]}]% pexec -n node[001-100] hostname

The preceding starts 100 pexec commands, one for each node in the range, and each of those commands runs hostname on all 100 nodes.

Further options to the range command can be seen with the help text for the command (output truncated):

[root@headnode ~]# cmsh -c "device help range"
Name:       range - Set a range of several devices to execute future commands on
Usage:      range [OPTIONS] * (command)
            range [OPTIONS] <device> [<device> ...] (command)
Options:    --show         Show the current range
            --clear        Clear the range
            -v, --verbose  Show header before each element
...

bookmark#

A bookmark can be:

Set with the bookmark command.
Reached using the goto command.

A bookmark is set with arguments to the bookmark command within cmsh as follows:

The user can set the current location as a bookmark:

By using no argument. This is the same as setting no name for it.

By using an arbitrary argument. This is the same as setting an arbitrary name for it.

Apart from any user-defined bookmark names, cmsh automatically sets the special name: “-”. This is always the previous location in the cmsh hierarchy that the user has just come from.

All bookmarks that have been set can be listed with the -l|--list option.

Reaching a bookmark: a bookmark can be reached with the goto command. The goto command can take the following as arguments: a blank (no argument), any arbitrary bookmark name, or “-”. The bookmark corresponding to the chosen argument is then reached.

The - bookmark does not need to be preceded by a goto.

[dgxsuperpod]% device use dgx001
[dgxsuperpod->device[dgx001]]% bookmark
[dgxsuperpod->device[dgx001]]% bookmark -l
Name                    Bookmark
----------------  ------------------------
                home;device;use dgx001;
-                  home;
[dgxsuperpod->device[dgx001]]% home
[dgxsuperpod]% goto
[dgxsuperpod->device[dgx001]]% goto -
[dgxsuperpod]% goto
[dgxsuperpod->device[dgx001]]% bookmark dn1
[dgxsuperpod->device[dgx001]]% goto -
[dgxsuperpod]% goto dn1
[dgxsuperpod->device[dgx001]]%

Saving bookmarks, and making them persistent: bookmarks can be saved to a file, such as mysaved, with the -s|--save option, as follows:

[dgxsuperpod]% bookmark -s mysaved

Bookmarks can be made persistent by setting .cmshrc files (2.4.1.2) to load a previously saved bookmarks file whenever a new cmsh session is started. The bookmark command loads a saved bookmark file using the -x|--load option.

[root@headnode ~]# cat .cm/cmsh/.cmshrc
bookmark -x mysaved

rename#

Nodes can be renamed globally from within partition mode, in the Node basename field associated with the prefix of the node in Base View or in cmsh. However, a more fine-grained batch renaming is also possible with the rename command, and typically avoids having to resort to scripting mechanisms. Using rename is best illustrated by examples:

The examples begin with using the default basename of node and default node digits (padded suffix number length) of 3. A simple rename that is a prefix change, can then be conducted as:

[headnode->device]% rename dgx001..dgx003 test
Renamed: dgx001 to test1
Renamed: dgx002 to test2
Renamed: dgx003 to test3

The rename starts up its own numbering from 1, independent of the original numbering. The change is committed using the commit command. Zero-padding occurs if the number of nodes is sufficiently large to need it. For example, if ten nodes are renamed:

[headnode->device]% rename dgx[001-010] test
Renamed: dgx001 to test01
Renamed: dgx002 to test02
...
Renamed: dgx009 to test09
Renamed: dgx010 to test10

Then two digits are used for each number suffix, to match the size of the last number. String formatting can be used to specify the number of digits in the padded number field:

[headnode->device]% rename dgx[001-003] test%04d
Renamed: dgx001 to test0001
Renamed: dgx002 to test0002
Renamed: dgx003 to test0003
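
The numbering behavior in the preceding examples can be modelled with a few lines of Python. This is only a sketch inferred from the outputs shown above, not the rename implementation. The function name rename_targets is hypothetical.

```python
def rename_targets(count, pattern):
    """Return the target names for `count` nodes, numbered from 1.

    With a printf-style pattern such as 'test%04d' the width is taken from
    the pattern. Otherwise the suffix is zero-padded to the number of
    digits in the last number, as in the examples above.
    """
    if "%" in pattern:
        return [pattern % index for index in range(1, count + 1)]
    width = len(str(count))
    return [f"{pattern}{index:0{width}d}" for index in range(1, count + 1)]


print(rename_targets(3, "test"))      # ['test1', 'test2', 'test3']
print(rename_targets(3, "test%04d"))  # ['test0001', 'test0002', 'test0003']
```

With ten nodes and the plain prefix test, the sketch yields test01 up to test10, matching the two-digit padding described above.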

The target names can conveniently be specified exactly. It requires an exact name mapping. That is, it assumes the source list size and target list size match:

[headnode->device]% rename dgx[001-005] test0[1,2,5-7]
Renamed: dgx001 to test01
Renamed: dgx002 to test02
Renamed: dgx003 to test05
Renamed: dgx004 to test06
Renamed: dgx005 to test07

The hostnames are sorted alphabetically before they are applied, with some exceptions based on the listing method used.

A --dry-run option can be used to show how the devices will be renamed. Alternatively, the refresh command can clear a proposed set of changes before a commit command commits the change, although the refresh would also remove other pending changes.

Exact name mapping could be used to allocate individual servers to several people:

[root@headnode ~]# cmsh
[headnode]% device
[headnode->device]% rename dgx[001-004] userone, usertwo, userthree, userfour
Renamed: dgx001 to userone
Renamed: dgx002 to usertwo
Renamed: dgx003 to userthree
Renamed: dgx004 to userfour
[headnode->device]% commit

Skipping by several nodes is possible using a colon (:). An example might be to skip by two so that twin servers can be segregated into left/right.

[root@headnode ~]# cmsh
[headnode]% device
[headnode->device]% rename dgx[001-100:2] left[001-050]
Renamed: dgx001 to left001
Renamed: dgx003 to left002
...
Renamed: dgx097 to left049
Renamed: dgx099 to left050
[headnode->device]% rename dgx[002-100:2] right[001-050]
Renamed: dgx002 to right001
Renamed: dgx004 to right002
...
Renamed: dgx098 to right049
Renamed: dgx100 to right050
[headnode->device]% commit

Using CMDaemon Environment Variables in Scripts#

Within device mode, the environment command shows the CMDaemon environment variables that can be passed to scripts for a particular device.

[dgxsuperpod->device]% environment dgx001
Key                                         Value
------------------------------------------- ------------------
CMD_ACTIVE_MASTER_IP                        10.141.255.254
CMD_CATEGORY                                default
CMD_CLUSTERNAME                             dgxsuperpod
CMD_DEVICE_TYPE                             ComputeNode
CMD_ENVIRONMENT_CACHE_EPOCH_MILLISECONDS    1615465821582
...

The environment variables can be prepared for use in Bash scripts with the --export|-e option:

[dgxsuperpod->device]% environment -e dgx001
export CMD_ENVIRONMENT_CACHE_UPDATES=4
export CMD_CATEGORY=default
export CMD_SOFTWAREIMAGE=default-image
export CMD_DEVICE_TYPE=ComputeNode
export CMD_ROLES=
export CMD_FSMOUNT__SLASH_home_FILESYSTEM=nfs
...
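
If the exported variables are needed in a script that is not written in Bash, the export lines can be parsed directly. The following Python sketch assumes simple KEY=value lines without shell quoting, as in the output above. The function name parse_exports is hypothetical.

```python
def parse_exports(text):
    """Turn 'export KEY=value' lines into a dictionary.

    Assumption: values are plain strings without shell quoting or escapes.
    Lines that do not start with 'export ' are ignored.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue
        key, _, value = line[len("export "):].partition("=")
        env[key] = value
    return env


sample = """\
export CMD_CATEGORY=default
export CMD_SOFTWAREIMAGE=default-image
export CMD_ROLES=
"""
env = parse_exports(sample)
print(env["CMD_CATEGORY"], env["CMD_SOFTWAREIMAGE"])
```

An unset variable such as CMD_ROLES in the sample is kept in the dictionary with an empty string as its value.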

Converting Tables to JSON with cmsh#

A list of table entries can be converted to a JSON representation by using the delimiter specification option -d {}. By default, the indentation value used is 2. Other values can be set by putting the value inside the braces.

[headnode->device]% list -f hostname,ip,mac,status
hostname (key)       ip                   mac                  status
-------------------- -------------------- -------------------- --------------------
dgx001               10.141.0.1           FA:16:3E:95:80:9F    [  UP  ]
headnode             10.141.255.254       FA:16:3E:D3:56:E0    [  UP  ]
[headnode->device]% color off; list -f hostname,ip,mac,status -d {}
[
  {
    "hostname (key)": "headnode",
    "ip": "10.141.255.254",
    "mac": "FA:16:3E:D3:56:E0",
    "status": "[       UP      ]"
  },
  {
    "hostname (key)": "dgx001",
    "ip": "10.141.0.1",
    "mac": "FA:16:3E:95:80:9F",
    "status": "[       UP      ]"
  }
]
[headnode->device]%

The color off setting is needed to remove the default console coloring. If the command is to run from the bash shell, the same output can be achieved with:

[root@headnode ~]# cmsh --color=no -c "device; list -f hostname,ip,mac,status -d {}"
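
The JSON output is convenient to consume from a script. The following Python sketch assumes that the output is a JSON array with one object per table row, keyed by the column headers. The function names cmsh_json_argv and index_by_hostname are hypothetical, and the sample text stands in for real cmsh output.

```python
import json


def cmsh_json_argv(mode, fields):
    """Build the argument list for a non-interactive cmsh call with JSON output."""
    command = f"{mode}; list -f {','.join(fields)} -d {{}}"
    return ["cmsh", "--color=no", "-c", command]


def index_by_hostname(json_text):
    """Parse the JSON rows and index them by the 'hostname (key)' column."""
    rows = json.loads(json_text)
    return {row["hostname (key)"]: row for row in rows}


# Sample text in the assumed output shape. On a head node, the text could be
# captured instead with subprocess.run(argv, capture_output=True, text=True).
sample = """
[
  {"hostname (key)": "headnode", "ip": "10.141.255.254"},
  {"hostname (key)": "dgx001", "ip": "10.141.0.1"}
]
"""
argv = cmsh_json_argv("device", ["hostname", "ip"])
devices = index_by_hostname(sample)
print(argv[-1])
print(devices["dgx001"]["ip"])
```

Passing the command as a list of arguments avoids shell quoting problems with the braces and semicolons in the cmsh command string.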