Getting Started#
, cudaFree(), cudaMemcpy(), and cudaMemcpyAsync().
Hardware and Software requirements#
Hardware requirements#
x86_64 CPU architecture
NVIDIA data center GPUs with Volta (SM 7.0), Ampere (SM 8.0), or Hopper (SM 9.0) architectures
Recommended: NVIDIA InfiniBand solutions for accelerated inter-node communication
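The GPU requirement above can be checked at startup. The following is a minimal sketch in plain C that tests a compute capability against the three architectures listed. The helper name `is_listed_sm` is hypothetical and not part of any NVIDIA library. In a CUDA program, the two inputs would typically come from the `major` and `minor` fields of `cudaDeviceProp`, filled in by `cudaGetDeviceProperties()`. The check is deliberately literal: it accepts only SM 7.0, 8.0, and 9.0, because the list above does not say whether other minor revisions are supported.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true only for the compute capabilities
 * listed in the hardware requirements (SM 7.0, SM 8.0, SM 9.0).
 * Other minor revisions are rejected because the list does not mention them. */
static bool is_listed_sm(int major, int minor)
{
    return minor == 0 && (major == 7 || major == 8 || major == 9);
}
```

A caller could print a warning and exit early when `is_listed_sm()` returns false, rather than failing later inside a library call.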
Software requirements#
Supported OS: Linux x86_64
Supported CUDA: 11.8, 12.1.1
Required packages
CUDA 11.8.0
CUDA Toolkit 11.8.0
HPC-X v2.14 - contains OpenUCC and OpenUCX that satisfy cuSOLVERMp requirements.
NCCL v2.16.x - required to achieve good performance.
CAL - a companion library used for communication.
CUDA 12.1.1
CUDA Toolkit 12.1.1
HPC-X v2.16 - contains OpenUCC and OpenUCX that satisfy cuSOLVERMp requirements.
NCCL v2.16.x - required to achieve good performance.
CAL - a companion library used for communication.
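The two required-package sets differ only in the CUDA Toolkit and HPC-X versions. As a rough sketch, the lookup below maps a CUDA runtime version to the HPC-X version listed for it. It assumes the integer encoding `1000 * major + 10 * minor` that `cudaRuntimeGetVersion()` reports (for example, 11080 for CUDA 11.8). The function name `required_hpcx` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the HPC-X version listed above for a given
 * CUDA runtime version, or NULL if that CUDA version is not listed.
 * cuda_version uses the encoding 1000 * major + 10 * minor. */
static const char *required_hpcx(int cuda_version)
{
    int major = cuda_version / 1000;
    int minor = (cuda_version % 1000) / 10;

    if (major == 11 && minor == 8) return "v2.14";
    if (major == 12 && minor == 1) return "v2.16";
    return NULL; /* not one of the supported CUDA versions */
}
```

Because the patch level is not part of the encoding, CUDA 12.1.0 and 12.1.1 map to the same entry.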
Recommended packages
OpenUCX v1.10+ (openucx/ucx) and OpenUCC v1.1+ (openucx/ucc) - as an alternative to HPC-X, you can install OpenUCX and OpenUCC manually. Both need to be configured with CUDA support.
GDRCopy v2.0+ (NVIDIA/gdrcopy) and nv_peer_mem (Mellanox/nv_peer_memory) - allow the underlying communication packages to use GPUDirect RDMA. If you install OpenUCX yourself, it should be configured with GDRCopy support.
Mellanox OFED - drivers for NVIDIA InfiniBand adapters. If you install OpenUCX yourself, it should be configured with IB communication support.
Synchronous Execution#
Data Layout of Local Matrices#
1. Bootstrap CAL communicator: cal_comm_create().
2. Initialize the library handle: cusolverMpCreate().
3. Initialize grid descriptors: cusolverMpCreateDeviceGrid().
4. Initialize matrix descriptors: cusolverMpCreateMatrixDesc().
5. Query the host and device buffer sizes for a given routine.
6. Allocate host and device workspace buffers for a given routine.
7. Execute the routine to perform the desired computation.
8. Synchronize local stream to make sure the result is available, if required: cal_stream_sync().
9. Deallocate host and device workspace.
10. Destroy matrix descriptors: cusolverMpDestroyMatrixDesc().
11. Destroy grid descriptors: cusolverMpDestroyGrid().
12. Destroy cuSOLVERMp library handle: cusolverMpDestroy().
13. Destroy CAL library handle: cal_comm_destroy().
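Creating matrix descriptors (step 4) requires each process to know the size of its local part of the distributed matrix. The sketch below assumes a ScaLAPACK-style 2D block-cyclic distribution, which this section does not spell out, and reimplements the arithmetic of ScaLAPACK's NUMROC routine in plain C. The name `local_extent` is hypothetical and is not part of cuSOLVERMp or CAL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring ScaLAPACK's NUMROC.
 * A dimension of length n is cut into blocks of size nb, and the blocks are
 * dealt round-robin to nprocs processes, starting at process isrcproc.
 * Returns how many rows (or columns) end up on process iproc. */
static int64_t local_extent(int64_t n, int64_t nb,
                            int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    assert(iproc >= 0 && iproc < nprocs);
    assert(isrcproc >= 0 && isrcproc < nprocs);

    /* Distance of this process from the one holding the first block. */
    int mydist = (nprocs + iproc - isrcproc) % nprocs;

    int64_t nblocks   = n / nb;                  /* number of full blocks   */
    int64_t extent    = (nblocks / nprocs) * nb; /* full rounds for everyone */
    int64_t extrablks = nblocks % nprocs;        /* leftover full blocks    */

    if (mydist < extrablks) {
        extent += nb;        /* this process gets one extra full block */
    } else if (mydist == extrablks) {
        extent += n % nb;    /* this process gets the trailing partial block */
    }
    return extent;
}
```

For example, a dimension of length 10 with block size 2 over 2 processes gives 6 local entries on process 0 and 4 on process 1. Calling the helper once with the row parameters and once with the column parameters yields the local matrix shape for one process.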