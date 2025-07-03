
Test Descriptions and Options
All command-line options mentioned in the test descriptions are applicable to the ClusterKit binary (see Running ClusterKit).
Bandwidth Test (-d bw)
The bandwidth test uses nonblocking MPI_Isend and MPI_Irecv calls.
Options:
Iterations: -b<iters>, --biters=<iters> (default: 16)
Message Size: -B<size>, --bsize=<size> (default: 32 MB)
Unidirectional: -U, --unidirectional (send data in one direction only; default is bidirectional)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
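As a rough illustration of how the iteration count, message size, and direction setting combine into a single figure, the Python sketch below converts a timed run into a bandwidth value. The formula (bytes moved divided by elapsed time, with bidirectional runs counting traffic in both directions) and the function name are assumptions made for illustration; ClusterKit's internal accounting may differ.

```python
def bandwidth_mb_per_s(msg_bytes, iters, elapsed_s, bidirectional=True):
    """Return MB/s (1 MB = 10**6 bytes) for `iters` messages of `msg_bytes` each.

    Assumption: a bidirectional run moves the same payload in both directions,
    so its traffic is counted twice.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    directions = 2 if bidirectional else 1
    total_bytes = msg_bytes * iters * directions
    return total_bytes / elapsed_s / 1e6


# Hypothetical measurement: 16 messages of 1,000,000 bytes in one second.
one_way = bandwidth_mb_per_s(1_000_000, 16, 1.0, bidirectional=False)
both_ways = bandwidth_mb_per_s(1_000_000, 16, 1.0)
```

Under these assumptions the same timing yields twice the figure in bidirectional mode, which is worth keeping in mind when comparing runs made with and without -U.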
Latency Test (-d lat)
The latency test is performed with a series of MPI_Send and MPI_Recv calls, where one partner sends a message to the other, which then sends a message back. This process is repeated <iters> times.
Options:
Iterations: -l<iters>, --liters=<iters> (default: 1024)
Message Size: -L<size>, --lsize=<size> (default: 0 bytes)
Tolerance: -t <tol>, --ltol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
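The send-and-reply pattern described for the latency test can be sketched without MPI. The Python example below is an analogy only: it uses a local socket pair and a thread in place of MPI_Send and MPI_Recv between two nodes, and it assumes the common convention of reporting half of the mean round-trip time. The function names are hypothetical, and the payload must be at least one byte here because a stream socket cannot signal an empty message.

```python
import socket
import threading
import time


def _recv_exact(sock, n):
    """Read exactly n bytes from sock."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def ping_pong_latency(iters=1024, size=1):
    """Return the mean one-way latency in seconds over `iters` round trips."""
    if size < 1:
        raise ValueError("size must be at least 1 byte in this sketch")
    left, right = socket.socketpair()
    payload = b"x" * size

    def echo():
        # The partner receives each message and sends one back.
        for _ in range(iters):
            right.sendall(_recv_exact(right, size))

    partner = threading.Thread(target=echo)
    partner.start()
    start = time.perf_counter()
    for _ in range(iters):
        left.sendall(payload)
        _recv_exact(left, size)
    elapsed = time.perf_counter() - start
    partner.join()
    left.close()
    right.close()
    # Assumption: one-way latency is taken as half of the round-trip time.
    return elapsed / (2 * iters)


latency_s = ping_pong_latency(iters=100, size=8)
```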
GPU-GPU Latency Test (-d gpu_gpu_lat)
Measures latency of GPU-to-GPU communication with MPI_Isend and MPI_Irecv.
Options:
Iterations: -k, --gpulati=<iters> (default: 1024)
Message Size: -K, --gpulats=<size> (default: 0 bytes)
Tolerance: -t <tol>, --ltol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
Per-GPU test: -z, --bygpu (test corresponding GPU pairs: GPU0-to-GPU0, GPU1-to-GPU1, etc.)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
GPU-GPU Bandwidth Test (-d gpu_gpu_bw)
Measures bandwidth of GPU-to-GPU communication with MPI_Isend and MPI_Irecv.
Options:
Iterations: -a, --gpubwi=<iters> (default: 64)
Message Size: -A, --gpubws=<size> (default: 1 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
Per-GPU test: -z, --bygpu (test corresponding GPU pairs from different nodes: GPU0-to-GPU0, GPU1-to-GPU1, etc.)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
NCCL GPU-GPU Bandwidth Test (-d nccl_bw)
Measures bandwidth of GPU-to-GPU communication with NCCL communications primitives.
Options:
Iterations: -a, --gpubwi=<iters> (default: 64)
Message Size: -A, --gpubws=<size> (default: 1 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; see ClusterKit Evaluation Logic for Pairwise Tests)
NCCL GPU-GPU Latency Test (-d nccl_lat)
Measures latency of GPU-to-GPU communication with NCCL communications primitives.
Options:
Iterations: -k, --gpulati=<iters> (default: 1024)
Message Size: -K, --gpulats=<size> (default: 0 bytes)
Collective Tests
Collective tests perform selected collective operations across all nodes in a defined scope.
Types of tests (set as the argument to the -d option):
barrier
Allreduce
bcast
Alltoall
Options:
Iterations: -n, --niter=<iters> (default: 10000)
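To make the role of the iteration count concrete, the Python sketch below times a barrier repeated niter times and reports the mean cost per operation. It is an analogy only: threads and threading.Barrier stand in for MPI ranks and an MPI barrier, and the function name and the choice to report the mean as seen by one participant are assumptions, not ClusterKit's actual method.

```python
import threading
import time


def time_barrier(num_workers=4, niter=1000):
    """Return the mean seconds per barrier as observed by worker 0."""
    barrier = threading.Barrier(num_workers)
    result = {}

    def worker(rank):
        barrier.wait()  # line everyone up before the timed loop starts
        start = time.perf_counter()
        for _ in range(niter):
            barrier.wait()
        if rank == 0:
            result["mean_s"] = (time.perf_counter() - start) / niter

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["mean_s"]


mean_barrier_s = time_barrier(num_workers=3, niter=200)
```

A larger iteration count averages out scheduling noise at the cost of a longer run, which is the usual trade-off behind a high default for short operations.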
NCCL Collective Tests
Performs NCCL collective operations among nodes in the same scope.
Types of Tests:
nccl_bcast
nccl_allreduce
nccl_reduce
nccl_allgather
nccl_reducescatter
Options:
Iterations: -n, --niter=<iters> (default: 10000)
Bisectional Bandwidth Test (-d bisect_bw)
Measures bisectional bandwidth by enabling communication between corresponding nodes in different scopes, assessing potential interference.
Options:
Iterations: -b<iters>, --biters=<iters> (default: 16)
Message Size: -B<size>, --bsize=<size> (default: 32 MB)
Unidirectional: -U, --unidirectional (sends data in one direction only)
Scope Order: --scope_order=<scope_order> (sets order of scopes for testing)
Scope Order File Format: The file consists of lines formatted as follows:
<pass_num>,<scope1>,<scope2>
Example:
1,scope01,scope02
1,scope03,scope04
2,scope02,scope03
3,scope01,scope04
3,scope02,scope03
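This instructs ClusterKit to execute 3 passes: pass 1 tests scope01 against scope02 and scope03 against scope04, pass 2 tests scope02 against scope03, and pass 3 tests scope01 against scope04 and scope02 against scope03.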
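The scope order file format is simple enough to check before a run. The Python sketch below parses lines of the form <pass_num>,<scope1>,<scope2> and groups the scope pairs by pass. The function name is hypothetical, and skipping blank lines and trimming whitespace are choices made for this sketch rather than documented ClusterKit behavior.

```python
from collections import defaultdict


def parse_scope_order(text):
    """Map each pass number to its list of (scope1, scope2) pairs."""
    passes = defaultdict(list)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected <pass_num>,<scope1>,<scope2>"
            )
        pass_num, scope1, scope2 = fields
        passes[int(pass_num)].append((scope1, scope2))
    return dict(sorted(passes.items()))


example = """\
1,scope01,scope02
1,scope03,scope04
2,scope02,scope03
3,scope01,scope04
3,scope02,scope03
"""
schedule = parse_scope_order(example)
for pass_num, pairs in schedule.items():
    print(pass_num, pairs)
```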
Memory Bandwidth Test (-d mb)
The memory bandwidth test can be conducted with one of the following operations:
ADD: a[i] = b[i] + c[i]
COPY: a[i] = b[i]
SCALE: a[i] = D * b[i]
TRIAD: a[i] = b[i] + D * c[i]
Options:
Iterations: -I <iters>, --mbiters=<iters> (default: 16)
Array Size: -I <size>, --mbsize=<size> (default: 4 * L3 cache size)
Test Type: -m <type>, --memtest=add|copy|scale|triad (default: TRIAD)
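The four operations can be written out directly. The Python sketch below implements them on plain lists to show what each kernel computes, and adds a byte count based on how many arrays each one touches. The byte accounting follows the STREAM benchmark convention (two arrays for COPY and SCALE, three for ADD and TRIAD), which is an assumption here, as are the function names and the 8-byte element size.

```python
def run_kernel(kind, b, c, d=3.0):
    """Return array a computed from b (and c) for the named operation."""
    kernels = {
        "add": lambda i: b[i] + c[i],
        "copy": lambda i: b[i],
        "scale": lambda i: d * b[i],
        "triad": lambda i: b[i] + d * c[i],
    }
    if kind not in kernels:
        raise ValueError(f"unknown operation: {kind}")
    op = kernels[kind]
    return [op(i) for i in range(len(b))]


# Arrays read or written per element (assumed STREAM-style accounting).
ARRAYS_TOUCHED = {"add": 3, "copy": 2, "scale": 2, "triad": 3}


def bytes_moved(kind, n, elem_bytes=8):
    """Bytes transferred by one pass of the kernel over n elements."""
    return ARRAYS_TOUCHED[kind] * n * elem_bytes


triad = run_kernel("triad", [1.0, 2.0], [10.0, 20.0], d=2.0)
copy_bytes = bytes_moved("copy", 4)
```

Dividing the byte count by the time of one pass gives a bandwidth figure; sizing the arrays well beyond the last-level cache, as the default array size does, keeps the measurement from reflecting cache speed instead of memory speed.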
Effective Bandwidth Ordered Test (-d beff_o)
Rings of doubling size are formed, starting at 2, and messages are passed in one direction based on rank ordering.
Options:
Iterations: -e, --beffi=<iters> (default: 512)
Message Size: -E, --beffs=<size> (default: 32 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; nodes showing results worse than max * tolerance are considered ‘bad’)
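The ring construction used by the ordered test can be pictured with a short Python sketch. It assumes that consecutive ranks are grouped into rings, that ranks which do not fill a complete ring at a given size are left out, and that the last rank in a ring sends to the first. These details and the function names are illustrative assumptions; the source only states that ring sizes double from 2 and that messages travel in one direction by rank order.

```python
def ordered_rings(num_ranks):
    """Yield (ring_size, rings) for ring sizes 2, 4, 8, ... up to num_ranks."""
    size = 2
    while size <= num_ranks:
        rings = [list(range(start, start + size))
                 for start in range(0, num_ranks - size + 1, size)]
        yield size, rings
        size *= 2


def send_targets(ring):
    """Each rank sends to the next rank in the ring; the last wraps around."""
    return {rank: ring[(i + 1) % len(ring)] for i, rank in enumerate(ring)}


stages = list(ordered_rings(8))
for size, rings in stages:
    print(size, rings)
```

With 8 ranks this produces four rings of 2, then two rings of 4, then one ring of 8, so each stage stresses progressively longer paths through the fabric.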
Effective Bandwidth Random Test (-d beff_or)
Similar to the ordered test, but rings are created randomly.
Options:
Iterations: -e, --beffi=<iters> (default: 512)
Message Size: -E, --beffs=<size> (default: 32 MB)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; nodes showing results worse than max * tolerance are considered ‘bad’)
GPU Memory Bandwidth Test (-d gpumb)
Measures bandwidth for host-to-GPU and GPU-to-host memory transfers.
Options:
Iterations: -j, --gpumbi=<iters> (default: 16)
Message Size: -J, --gpumbs=<size> (default: 0 bytes)
Tolerance: -u <tol>, --btol=<tol> (specify tolerance; nodes showing results worse than max * tolerance are considered ‘bad’)
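The max * tolerance rule quoted in the tolerance option can be expressed in a few lines. The Python sketch below assumes that the tolerance is a fraction between 0 and 1, that higher bandwidth is better, and that a node is flagged only when it falls strictly below the threshold. The function name and the sample node names and values are hypothetical.

```python
def flag_bad_nodes(bandwidth_by_node, tolerance):
    """Return the nodes whose bandwidth falls below max * tolerance."""
    if not bandwidth_by_node:
        return []
    threshold = max(bandwidth_by_node.values()) * tolerance
    return sorted(node for node, bw in bandwidth_by_node.items()
                  if bw < threshold)


# Hypothetical per-node results in arbitrary bandwidth units.
results = {"node01": 100.0, "node02": 95.0, "node03": 60.0}
bad = flag_bad_nodes(results, tolerance=0.9)
```

Because the threshold is relative to the best node rather than to a fixed target, a uniformly slow cluster would pass this check; it highlights outliers, not absolute performance.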
GPU Neighbor Latency Test (-d gpu_neighbor_lat)
A restricted variant of the GPU-GPU latency test that measures communication only between GPUs on neighboring nodes.
Options:
Iterations: -k, --gpulati=<iters> (default: 1024)
Message Size: -K, --gpulats=<size> (default: 0 bytes)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
GPU Neighbor Bandwidth Test (-d gpu_neighbor_bw)
A restricted variant of the GPU-GPU bandwidth test that measures communication only between GPUs on neighboring nodes.
Options:
Iterations: -a, --gpubwi=<iters> (default: 64)
Message Size: -A, --gpubws=<size> (default: 1 MB)
Use GPUDIRECT: -G, --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)