NVIDIA cuBLASDx
The cuBLAS Device Extensions (cuBLASDx) library lets you perform selected cuBLAS linear algebra functions inside your CUDA kernel. Currently, only General Matrix Multiplication (GEMM) is supported. Fusing linear algebra routines with other operations can reduce latency and improve the overall performance of your application.
cuBLASDx is part of the MathDx package, which also includes cuFFTDx for FFT calculations. Both libraries are designed to work together, and examples of such fusion are included in the package. When using multiple device extensions libraries in a single project, they should all come from the same MathDx release.
The documentation consists of the following main components:
A quick start guide, General Matrix Multiply Using cuBLASDx.
An API reference for a comprehensive overview of the provided functionality.
Highlights
The cuBLASDx library currently provides:
A BLAS GEMM routine that can be embedded in a CUDA kernel.
High performance, with no unnecessary data movement to and from global memory.
Customizability, with options to adjust the GEMM routine for different needs (size, precision, type, targeted CUDA architecture, etc.).
The ability to fuse BLAS kernels with other operations to avoid round trips to global memory.
Compatibility with future versions of the CUDA Toolkit.
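To make the fusion point above concrete, here is a minimal host-side C++ sketch. It is not cuBLASDx code and does not use the cuBLASDx API; the function names (gemm, gemm_then_relu, fused_gemm_relu) and the choice of ReLU as the follow-up operation are assumptions made purely for illustration. The unfused path stores the full GEMM result and then re-reads it in a second pass, which is analogous to two kernels communicating through global memory. The fused path applies the follow-up operation to each accumulator before its single store, which is the pattern an in-kernel GEMM makes possible.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuBLASDx): row-major C = A * B,
// where A is m x k and B is k x n.
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Unfused: the GEMM result is stored in full, then a second pass re-reads
// and rewrites every element (analogous to an extra global memory trip).
std::vector<float> gemm_then_relu(const std::vector<float>& a, const std::vector<float>& b,
                                  std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> c = gemm(a, b, m, n, k);
    for (float& x : c) {
        x = std::max(x, 0.0f);
    }
    return c;
}

// Fused: ReLU is applied to the accumulator while it is still local,
// so each output element is written exactly once and never re-read.
std::vector<float> fused_gemm_relu(const std::vector<float>& a, const std::vector<float>& b,
                                   std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> c(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = std::max(acc, 0.0f);
        }
    }
    return c;
}
```

Both functions produce the same output; they differ only in how often the result buffer is touched. On a GPU, that buffer would typically live in global memory in the unfused case, which is where the savings described above come from.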