ClusterKit

ClusterKit is a multifaceted node assessment tool for high-performance clusters. It currently measures latency, bandwidth, effective bandwidth, memory bandwidth, per-node GFLOPS, per-rack collective performance, and bandwidth and latency between GPUs and local or remote memory. ClusterKit uses well-known techniques and tests to obtain these metrics, and it is intended to give the user a general view of the health and performance of a cluster.

After loading the HPC-X package, run one of the following commands from within a job allocation. To select a specific network device (port 1 of mlx5_4 in this example):

mpirun -x UCX_NET_DEVICES=mlx5_4:1 $HPCX_CLUSTERKIT_DIR/bin/clusterkit

To let UCX select the network device(s) automatically:

mpirun $HPCX_CLUSTERKIT_DIR/bin/clusterkit

Note that multi-rail is enabled by default.

When not using a job scheduler, add the mpirun command-line arguments that specify the hosts (for example, a hostfile).
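As a minimal sketch of launching ClusterKit without a scheduler, the Bash function below assumes the Open MPI mpirun shipped with HPC-X, a hostfile with one hostname per line, and one ClusterKit process per node. The function name run_clusterkit and the DRY_RUN switch are hypothetical helpers, not part of ClusterKit. UCX_NET_DEVICES is passed only when a device list is given; otherwise device selection is left to UCX.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): launch ClusterKit with mpirun when no job
# scheduler is present.
# Assumptions: Open MPI's mpirun (as shipped with HPC-X), a hostfile with one
# hostname per line, and one ClusterKit process per node.
# Set DRY_RUN=1 to print the command instead of running it.
run_clusterkit() {
  local hostfile=$1 net_devices=${2:-}
  local nhosts
  # Count non-empty lines, i.e. one host per line.
  nhosts=$(grep -c '[^[:space:]]' "$hostfile")
  local cmd=(mpirun -np "$nhosts" --hostfile "$hostfile" --map-by ppr:1:node)
  # Optionally pin UCX to specific devices; otherwise UCX selects them.
  if [ -n "$net_devices" ]; then
    cmd+=(-x "UCX_NET_DEVICES=$net_devices")
  fi
  cmd+=("$HPCX_CLUSTERKIT_DIR/bin/clusterkit")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 run_clusterkit hostfile.txt mlx5_4:1 only prints the mpirun command; without DRY_RUN it runs it. UCX_NET_DEVICES also accepts a comma-separated list such as mlx5_0:1,mlx5_2:1.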

By default, the application runs the default set of tests. Run it with --help to list all command-line options. During the run, interim results are printed for each test so that you can track progress; this is particularly useful on very large clusters with thousands of nodes.

Near the end of the program output, ClusterKit prints the name of the output directory, which is derived from the date and time of the run, for example:

Output directory: 20190915_061634/

The output directory is created automatically, and .txt and .json result files are written to it for each test.

The .txt files are human-readable; the .json files are intended for import into the UFM-hosted viewer. On small clusters the .txt files generally suffice, but on larger clusters the UFM-hosted viewer is recommended for viewing the .json files.
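To inspect the newest human-readable results from the shell, a sketch such as the following may help. It assumes the output directory is created in the directory ClusterKit was launched from and follows the YYYYMMDD_HHMMSS naming of the example directory name, so plain string sorting is also chronological. latest_clusterkit_output and print_txt_results are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): find the newest ClusterKit output directory
# and print its .txt results.
# Assumption: directories are named YYYYMMDD_HHMMSS (e.g. 20190915_061634)
# and live under the directory ClusterKit was launched from.

# Print the path of the newest output directory under $1 (default: .).
latest_clusterkit_output() {
  local base=${1:-.}
  local d8='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
  local d6='[0-9][0-9][0-9][0-9][0-9][0-9]'
  local dir
  # Timestamp names sort chronologically as strings, so the last one is newest.
  dir=$(find "$base" -mindepth 1 -maxdepth 1 -type d -name "${d8}_${d6}" | sort | tail -n 1)
  [ -n "$dir" ] || return 1
  printf '%s\n' "$dir"
}

# Print every .txt result file from the newest output directory.
print_txt_results() {
  local dir f
  dir=$(latest_clusterkit_output "${1:-.}") || {
    echo "no ClusterKit output directory found" >&2
    return 1
  }
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '== %s ==\n' "${f##*/}"
    cat "$f"
  done
}
```

Running print_txt_results from the launch directory prints each .txt file with its name as a header; the .json files are left for the UFM-hosted viewer.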

ClusterKit can also be run with the supplied clusterkit.sh convenience script, which provides a simple interface for configuring some internal UCX parameters.

./clusterkit.sh [options] <parameters>

Parameters:
  -v|--verbose                    Set verbose mode
  -f|--hostfile <hostfile>        File with newline separated hostnames to run tests on
  -r|--hpcx_dir <path>            Path to HPCX installation root folder (or use env HPCX_DIR)

Options:
  -p|--ppn <number>               Select number of processes per hostname (default: 1)
  -d|--hca_list "string"          Comma separated list of HCAs to use (default: autoselect)
  -t|--transport_list "string"    List of RDMA transports to use (rc,dc,ud) (default: autoselect best)
  -z|--traffic <nn>               Run traffic for 'nn' minutes
  -s|--ssh                        Use ssh for process launching (default: autoselect)
  -h|--help                       Show help message
  -n|--dry-run                    Dry run (do nothing, only print)
  -m|--map-by [node|core|socket]  (Used in MPI argument: --map-by ppr:ppn:map-by)
  -y|--bycore                     Run on ALL cores, not just a single core per node
  -k|--test_intra_node            Run intra-node tests for bandwidth and latency (default: skip intra-node)
  -U|--unidirectional             Run unidirectional bandwidth tests (default: bidirectional)
  -e|--mapper                     Shell script that maps local MPI rank to a core and one or more HCAs,
                                  e.g. for testing machines with multiple HCAs, where each HCA needs to be tested
  -g|--gpu                        Run GPU lat/bw/neighbor tests
  -G|--gpudirect                  Run GPU tests with GPU-Direct
  -w|--rdma_write                 Use RDMA-write to pass data to the remote host
  -o|--rdma_read                  Use RDMA-read to access data from the remote host
  -P|--performance                Set CPU scaling governor to 'performance'. Set back to 'powersave' after execution
  -a|--output                     Generate zip of heatmaps and tgz of JSON files from output. Overrides -k.

Output options:
  -l|--normalize                  Normalize latency results (default: false)
  -C|--clean                      Erase output cache directory (default: false)

  -x|--exe_opt                    Options for clusterkit
  -i|--mpi_opt                    Options for mpirun

To pass additional MPI options, use the mpi_opt environment variable.
To pass additional options to the clusterkit executable, use the exe_opt environment variable.

Examples:
  % ./clusterkit.sh --ssh --hostfile hostfile.txt
  % ./clusterkit.sh --hca_list "mlx5_0:1,mlx5_2:1" --hostfile hostfile.txt
  % exe_opt="--gpudirect " ./clusterkit.sh --hca_list "mlx5_0:1,mlx5_2:1" --hostfile hostfile.txt
  % mpi_opt="-x UCX_RNDV_SCHEME=get_zcopy" ./clusterkit.sh --hca_list "mlx5_0:1,mlx5_2:1" --hostfile hostfile.txt
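The --hostfile parameter expects newline-separated hostnames. As a sketch, assuming a Slurm allocation (ClusterKit itself does not require Slurm), the hostfile can be generated from the job's node list and checked with --dry-run before a real run. make_hostfile is a hypothetical helper that keeps the first field of each non-empty line and drops duplicates while preserving order.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): normalize hostnames read from stdin into a
# hostfile for clusterkit.sh. Blank lines and duplicate hosts are dropped,
# order is preserved, and only the first field is kept, so Open MPI-style
# "host slots=N" lines reduce to the bare hostname.
make_hostfile() {
  awk 'NF && !seen[$1]++ { print $1 }'
}

# Usage (assumes Slurm; scontrol expands a compressed node list such as node[01-04]):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile > hostfile.txt
#   ./clusterkit.sh --dry-run --hostfile hostfile.txt
```

Once the dry-run output looks right, dropping --dry-run launches the tests.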

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.