cuSOLVERMp: A High-Performance CUDA Library for Distributed Dense Linear Algebra
NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library that provides tools for solving dense linear systems and eigenvalue problems.
cuSOLVERMp is compatible with the 2D block-cyclic data layout and provides ScaLAPACK-like C APIs.
A companion library, CAL, contains utilities to manage communicators and to synchronize processes in a safe way.
Download: The cuSOLVERMp library is available through the NVIDIA HPC SDK.
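The 2D block-cyclic layout mentioned above can be illustrated along one dimension with a few lines of index arithmetic. The following is a minimal sketch that follows the usual ScaLAPACK conventions (0-based indices here); the helper names `local_extent`, `owner_of`, and `local_index_of` are hypothetical and are not part of the cuSOLVERMp API. Applying the same rule independently to rows and to columns of the process grid gives the 2D layout.

```c
/* Hypothetical helpers (not cuSOLVERMp functions) for a 1D block-cyclic
 * distribution of n elements in blocks of size nb over nprocs processes. */

/* Number of elements stored on process `iproc` when the first block lives on
 * process `isrc`. This mirrors the rule used by ScaLAPACK's NUMROC. */
long local_extent(long n, long nb, int iproc, int isrc, int nprocs)
{
    int  dist        = (nprocs + iproc - isrc) % nprocs; /* distance from source */
    long full_blocks = n / nb;                           /* complete blocks      */
    long extent      = (full_blocks / nprocs) * nb;      /* evenly shared part   */
    long extra       = full_blocks % nprocs;             /* leftover full blocks */

    if (dist < extra) {
        extent += nb;        /* this process gets one more full block   */
    } else if (dist == extra) {
        extent += n % nb;    /* this process gets the trailing partial block */
    }
    return extent;
}

/* Process that owns global index g (0-based). */
int owner_of(long g, long nb, int isrc, int nprocs)
{
    return (int)((g / nb + isrc) % nprocs);
}

/* Position of global index g (0-based) inside its owner's local storage. */
long local_index_of(long g, long nb, int nprocs)
{
    return (g / (nb * nprocs)) * nb + g % nb;
}
```

For example, with 10 rows, a block size of 2, and 3 process rows, the processes hold 4, 4, and 2 rows respectively, and global row 6 is stored as local row 2 on process 0.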
Key Features
Multi-process, multi-GPU.
One process per GPU.
ScaLAPACK-like C functionalities and interfaces to facilitate porting.
Configurable communication backends (UCC, NCCL, UCX, etc.).
Logging and tracing.
Tensor-core accelerated.
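The one-process-per-GPU model listed above implies that each process on a node selects a distinct device. A minimal sketch of that selection rule follows; `device_for_local_rank` is a hypothetical helper, not a cuSOLVERMp or CAL function, and it assumes the caller already knows its node-local rank. In a real program the device count would typically come from the CUDA runtime call `cudaGetDeviceCount`, and the result would be passed to `cudaSetDevice`.

```c
/* Hypothetical helper: map a node-local process rank to a GPU ordinal under
 * a strict one-process-per-GPU policy.
 * Returns the device ordinal, or -1 if the inputs are invalid or if there
 * are more local processes than GPUs (which would force device sharing). */
int device_for_local_rank(int local_rank, int device_count)
{
    if (local_rank < 0 || device_count <= 0 || local_rank >= device_count) {
        return -1;
    }
    return local_rank; /* rank i on the node uses device i */
}
```

Returning an error instead of wrapping around with a modulo keeps two processes from silently sharing one device.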
Support
Supported SM Architectures:
SM 8.0, SM 8.6
Supported OSes:
Linux
Supported CPU Architectures:
x86_64, ppc64
Supported communication Libraries:
UCC 1.1+ (shipped with HPC-SDK)
Prerequisites
HPC-SDK and its HPC-X communication module need to be set up.
Dependencies:
cudart, nvrtc, cublas.h, cusolverDn.h, cal.h headers.
libcal.so and cusolverMp.so binaries.