cusolverMp: A High-Performance CUDA Library for Distributed Dense Linear Algebra

NVIDIA cusolverMp is a high-performance, distributed-memory, GPU-accelerated library that provides tools for the solution of dense linear systems and eigenvalue problems.

cusolverMp is compatible with 2D block-cyclic data layout and provides ScaLAPACK-like C APIs.
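
To make the 2D block-cyclic layout concrete, the sketch below shows the index arithmetic along one dimension of the process grid (the same rule is applied independently to rows and columns). It is not part of the cusolverMp API: the helper names owner_of, local_index, and local_count are hypothetical, indices are 0-based, and the formulas follow the conventions of ScaLAPACK's INDXG2P, INDXG2L, and NUMROC tool routines.

```c
/* Illustrative helpers only; these names are NOT cusolverMp functions.
 * All indices are 0-based. nb is the block size, nprocs the number of
 * processes along this grid dimension, and src the process that owns
 * the first block. */

/* Grid coordinate of the process that owns global index g. */
int owner_of(int g, int nb, int src, int nprocs)
{
    return (src + g / nb) % nprocs;
}

/* Position of global index g inside its owner's local storage. */
int local_index(int g, int nb, int nprocs)
{
    return (g / (nb * nprocs)) * nb + g % nb;
}

/* Number of rows (or columns) of an n-long dimension stored by process iproc. */
int local_count(int n, int nb, int iproc, int src, int nprocs)
{
    int dist    = (nprocs + iproc - src) % nprocs; /* distance from the source process */
    int nblocks = n / nb;                          /* number of complete blocks */
    int count   = (nblocks / nprocs) * nb;         /* full rounds of the block cycle */
    int extra   = nblocks % nprocs;                /* complete blocks left over */

    if (dist < extra) {
        count += nb;        /* this process receives one more complete block */
    } else if (dist == extra) {
        count += n % nb;    /* this process receives the trailing partial block */
    }
    return count;
}
```

For example, with n = 10, nb = 2, and two processes along the dimension, blocks alternate between process 0 and process 1, so process 0 stores 6 entries and process 1 stores 4. Sizing local buffers with a NUMROC-style count like this is the usual first step when porting ScaLAPACK code.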

A companion library, CAL, contains utilities to manage communicators and to synchronize processes in a safe way.

Download: The cusolverMp library is available through the NVIDIA HPC SDK.

Key Features

  • Multi-process, multi-GPU.
  • One process per GPU.
  • ScaLAPACK-like C functionalities and interfaces to facilitate porting.
  • Configurable communication backends (NCCL, MPI, UCX).
  • Logging and tracing.
  • Tensor-core accelerated.

Compatibility

  • Supported SM Architectures: SM 8.0, SM 8.6
  • Supported OSes: Linux
  • Supported CPU Architectures: x86_64, ppc64le
  • Supported MPI Libraries: OpenMPI (shipped with the NVIDIA HPC SDK), SpectrumMPI 11.x

Prerequisites

  • NVIDIA HPC SDK.
  • Dependencies: cudart, nvrtc, and the cublas.h, cusolverDn.h, and cal.h headers together with their library binaries.