cuSOLVERMp: A High-Performance CUDA Library for Distributed Dense Linear Algebra
NVIDIA cuSOLVERMp is a high-performance, distributed-memory, GPU-accelerated library that provides tools for solving dense linear systems and eigenvalue problems.
cuSOLVERMp is compatible with 2D block-cyclic data layout and provides ScaLAPACK-like C APIs.
A companion library, CAL, contains utilities to manage communicators and to synchronize processes in a safe way.
Download: The cuSOLVERMp library is available through the NVIDIA HPC SDK.
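
To make the 2D block-cyclic layout concrete, the sketch below maps a global index along one matrix dimension to the process coordinate that owns it and to its position in that process's local storage. It follows the usual ScaLAPACK index conventions but uses 0-based indices. `BlockCyclicPos` and `block_cyclic_map` are hypothetical names chosen for illustration; they are not part of the cuSOLVERMp or CAL API.

```c
#include <assert.h>

/* Owner and local position of one global index along a single dimension.
 * Hypothetical helper type, not part of cuSOLVERMp. */
typedef struct {
    int  proc;   /* process coordinate that owns the element */
    long local;  /* 0-based index within that process's local storage */
} BlockCyclicPos;

/* Map 0-based global index g to its owner and local index, given block
 * size nb, nprocs processes along this dimension, and src, the process
 * coordinate that holds the first block. */
BlockCyclicPos block_cyclic_map(long g, int nb, int nprocs, int src)
{
    assert(g >= 0 && nb > 0 && nprocs > 0 && src >= 0 && src < nprocs);

    long block = g / nb;  /* index of the global block containing g */

    BlockCyclicPos pos;
    /* Blocks are dealt out round-robin, starting at process src. */
    pos.proc = (int)((block + src) % nprocs);
    /* Each full round places one block of nb elements on every process. */
    pos.local = (block / nprocs) * nb + g % nb;
    return pos;
}
```

For example, with block size 2 on 2 processes and the first block on process 0, global index 5 falls in block 2, which wraps around to process 0 at local index 3. Applying the function once to the row index and once to the column index locates an element on a 2D process grid.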
Key Features
- Multi-process, multi-GPU. 
- One process per GPU. 
- ScaLAPACK-like C functionality and interfaces to ease porting.
- Configurable communication backends (UCC, NCCL, UCX, etc.).
- Logging and tracing. 
- Tensor-core accelerated. 
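
The one-process-per-GPU model means each process on a node must bind to a distinct device. A minimal sketch of that selection logic follows, assuming the launcher exposes a node-local rank (for instance through an environment variable, which varies by launcher). `device_for_local_rank` is a hypothetical helper, not a cuSOLVERMp or CAL function.

```c
#include <assert.h>

/* Choose the CUDA device ordinal for a process from its node-local rank.
 * Returns -1 when the node has more processes than GPUs, since sharing a
 * device would violate the one-process-per-GPU model.
 * Hypothetical helper, not part of cuSOLVERMp or CAL. */
int device_for_local_rank(int local_rank, int device_count)
{
    assert(local_rank >= 0 && device_count >= 0);

    if (local_rank >= device_count) {
        return -1;  /* oversubscribed: no free device for this process */
    }
    /* Ranks 0..device_count-1 map one-to-one onto device ordinals. */
    return local_rank;
}
```

In a real application, `device_count` would come from `cudaGetDeviceCount`, and a non-negative result would be passed to `cudaSetDevice` before any library handle is created.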
Support
- Supported SM Architectures: SM 8.0, SM 8.6
- Supported OSes: Linux
- Supported CPU Architectures: x86_64, ppc64le
- Supported communication libraries: UCC 1.1+ (shipped with the HPC SDK)
Prerequisites
- The HPC SDK and its HPC-X communication module must be set up.
- Dependencies: cudart, nvrtc; the cublas.h, cusolverDn.h, and cal.h headers; and the libcal.so and cusolverMp.so binaries.