The bandwidth test uses nonblocking MPI_Isend and MPI_Irecv calls.

Options:

Iterations : -b<iters> , --biters=<iters> (Default: 16)

Message Size : -B<size> , --bsize=<size> (Default: 32 MB)

Unidirectional : -U , --unidirectional (send data in one direction only; default is bidirectional)

Tolerance : -u <tol> , --btol=<tol> (specify tolerance - see ClusterKit Evaluation Logic for Pairwise Tests)
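
As a rough illustration of how the iteration count and message size relate to a bandwidth figure, the Python sketch below computes throughput from a measured elapsed time. The function name, the choice of 1 MB = 2**20 bytes, the sample timing, and the rule of counting both directions in a bidirectional run are assumptions made for this example; they are not ClusterKit's documented reporting logic.

```python
MB = 1 << 20  # assumption: 1 MB = 2**20 bytes


def bandwidth_mb_per_s(size_bytes, iters, elapsed_s, bidirectional=True):
    """Hypothetical helper: throughput in MB/s for one pair of partners.

    Assumes every iteration moves one message of size_bytes per direction,
    and that a bidirectional run counts the traffic of both directions.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    directions = 2 if bidirectional else 1
    total_bytes = size_bytes * iters * directions
    return total_bytes / elapsed_s / MB


# Default options from the list above: 16 iterations of 32 MB messages.
# The 0.5 s elapsed time is a made-up value for illustration.
print(bandwidth_mb_per_s(32 * MB, 16, 0.5, bidirectional=False))  # 1024.0
print(bandwidth_mb_per_s(32 * MB, 16, 0.5))                       # 2048.0
```

Under these assumptions, the same elapsed time yields twice the reported bandwidth in a bidirectional run, which is worth keeping in mind when comparing results taken with and without -U.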

The latency test is performed with a series of MPI_Send and MPI_Recv calls, where one partner sends a message to the other, which then sends a message back. This process is repeated <iters> times.

Options:

Iterations : -l<iters> , --liters=<iters> (Default: 1024)

Message Size : -L<size> , --lsize=<size> (Default: 0 Bytes)

Tolerance : -t <tol> , --ltol=<tol> (specify tolerance - see ClusterKit Evaluation Logic for Pairwise Tests)
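
The ping-pong pattern described above can be sketched without MPI. The Python example below uses a local socket pair and a thread as stand-ins for the two partners and their MPI_Send and MPI_Recv calls. The function names, the 1-byte payload (a stream socket cannot carry a 0-byte message), and the convention of reporting half the average round-trip time are assumptions for illustration only, not ClusterKit's implementation.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Receive exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def ping_pong_latency_us(iters=1024, size=1):
    """Hypothetical ping-pong: returns half the mean round trip, in microseconds."""
    a, b = socket.socketpair()
    payload = b"x" * size

    def responder():
        # Second partner: wait for each message, then send it straight back.
        for _ in range(iters):
            b.sendall(recv_exact(b, size))

    worker = threading.Thread(target=responder)
    worker.start()

    start = time.perf_counter()
    for _ in range(iters):
        a.sendall(payload)    # first partner sends ...
        recv_exact(a, size)   # ... and waits for the reply
    elapsed = time.perf_counter() - start

    worker.join()
    a.close()
    b.close()
    # Each iteration is one full round trip, so one-way time is elapsed / (2 * iters).
    return elapsed / (2 * iters) * 1e6


print(f"one-way latency: {ping_pong_latency_us():.2f} us")
```

Repeating the exchange many times, as the <iters> option does, averages out timer resolution and scheduling noise that would dominate a single round trip.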

Measures latency of GPU-to-GPU communication with MPI_Isend and MPI_Irecv.

Options:

Iterations : -k , --gpulati=<iters> (Default: 1024)

Message Size : -K , --gpulats=<size> (Default: 0 Bytes)

Tolerance : -t <tol> , --ltol=<tol> (specify tolerance - see ClusterKit Evaluation Logic for Pairwise Tests)

Per-GPU test : -z , --bygpu (test corresponding GPU pairs: GPU0-to-GPU0, GPU1-to-GPU1, etc.)

Use GPUDIRECT : -G , --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)

Measures bandwidth of GPU-to-GPU communication with MPI_Isend and MPI_Irecv.

Options:

Iterations : -a , --gpubwi=<iters> (Default: 64)

Message Size : -A , --gpubws=<size> (Default: 1 MB)

Tolerance : -u <tol> , --btol=<tol> (specify tolerance - see ClusterKit Evaluation Logic for Pairwise Tests)

Per-GPU test : -z , --bygpu (test corresponding GPU pairs from different nodes: GPU0-to-GPU0, GPU1-to-GPU1, etc.)

Use GPUDIRECT : -G , --gpudirect (use GPUDIRECT; default is to copy from GPU memory to host)
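
To make the per-GPU pairing concrete, the short Python sketch below enumerates the corresponding GPU pairs between two nodes. The function name, the node names, and the choice to skip GPUs that have no counterpart when the two nodes hold different GPU counts are all assumptions for this example; the source only states that GPU0 is paired with GPU0, GPU1 with GPU1, and so on.

```python
def corresponding_gpu_pairs(node_a, gpus_a, node_b, gpus_b):
    """Hypothetical helper: pair GPU i on node_a with GPU i on node_b."""
    # Assumption: GPUs without a same-index partner on the other node are skipped.
    count = min(gpus_a, gpus_b)
    return [((node_a, i), (node_b, i)) for i in range(count)]


# Node names and GPU counts are placeholders.
for src, dst in corresponding_gpu_pairs("node01", 4, "node02", 4):
    print(f"{src[0]}:GPU{src[1]} <-> {dst[0]}:GPU{dst[1]}")
```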

Measures bandwidth of GPU-to-GPU communication with NCCL communication primitives.

Options:

Iterations : -a , --gpubwi=<iters> (Default: 64)

Message Size : -A , --gpubws=<size> (Default: 1 MB)

Tolerance : -u <tol> , --btol=<tol> (specify tolerance - see ClusterKit Evaluation Logic for Pairwise Tests)

Measures latency of GPU-to-GPU communication with NCCL communication primitives.

Options: