cuBLASMp: A High-Performance CUDA Library for Distributed Dense Linear Algebra
NVIDIA cuBLASMp is a high-performance, multi-process, GPU-accelerated library for distributed basic dense linear algebra.
cuBLASMp supports the 2D block-cyclic data layout (as used by ScaLAPACK) and provides PBLAS-like C APIs.
A companion library, CAL, provides utilities to manage communicators and to synchronize processes safely.
Download: the cuBLASMp library is available as early access through the NVIDIA Developer Zone and the NVIDIA HPC SDK.
Key Features
Multi-process, multi-GPU.
One process per GPU.
PBLAS-like C functionality and interfaces to ease porting of existing PBLAS code.
Configurable communication backends (UCC, NCCL, UCX, etc.).
Logging and tracing.
Tensor Core accelerated.