cuBLASDx APIs (nvmath.device)
Overview
These APIs offer integration with the NVIDIA cuBLASDx library. Detailed documentation of cuBLASDx can be found in the cuBLASDx documentation.
Note
The Matmul device API in module nvmath.device currently supports cuBLASDx 0.4.1, also available as part of MathDx 25.06.
API Reference
|
A class that encapsulates a partial Matmul device function. |
|
Create an |
|
make_tensor is a helper function for creating |
|
AXPBY operation: y = alpha * x + beta * y |
|
Copies data from the source tensor to the destination tensor. |
|
A bidirectional copying method to copy data between register fragments and global memory tensors. |
|
Clears the contents of the given tensor by setting all elements to zero. |
|
Creates a synchronization point. |
|
Abstraction over the cuBLASDx tensor type (an alias of the CuTe tensor type). |
|
Layout for the |
|
Partition of a global memory tensor into a partitioned tensor. |
|
Partitioner is an abstraction for partitioning a global memory tensor into a partitioned tensor. |
|
Helper class to calculate shared storage size. |
|
A namedtuple class that encapsulates the three leading dimensions in matrix multiplication \(C = \alpha Op(A) Op(B) + \beta C\). |
|
A namedtuple class that encapsulates the transpose mode for input matrices |
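The two formulas listed above, AXPBY and \(C = \alpha Op(A) Op(B) + \beta C\), can be illustrated with a minimal host-side sketch in plain Python. It does not call nvmath.device or cuBLASDx. The helper names (`axpby_reference`, `element`, `op_element`, `gemm_reference`) are hypothetical, and the column-major storage with a per-matrix leading dimension is an assumption made for the illustration only.

```python
# Reference semantics only: plain Python, no GPU and no nvmath calls.
# All names here are hypothetical helpers, not part of nvmath.device.


def axpby_reference(alpha, x, beta, y):
    """Return alpha * x + beta * y element-wise as a new list."""
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]


def element(buf, ld, row, col):
    """Read (row, col) from a flat buffer, assuming column-major storage.

    ld is the leading dimension: the stride between consecutive columns.
    It may exceed the number of rows when the buffer is padded.
    """
    return buf[row + col * ld]


def op_element(buf, ld, transposed, row, col):
    """Read (row, col) of Op(M), where Op is the identity or the transpose."""
    if transposed:
        return element(buf, ld, col, row)
    return element(buf, ld, row, col)


def gemm_reference(m, n, k, alpha, a, lda, trans_a, b, ldb, trans_b, beta, c, ldc):
    """Compute C = alpha * Op(A) Op(B) + beta * C in place on the flat buffer c."""
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += op_element(a, lda, trans_a, i, p) * op_element(b, ldb, trans_b, p, j)
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


# A = [[1, 2], [3, 4]] stored column-major with lda = 3 (one padding slot per column).
A = [1.0, 3.0, 0.0, 2.0, 4.0, 0.0]
# B is the 2x2 identity, stored column-major with ldb = 2.
B = [1.0, 0.0, 0.0, 1.0]

# With B = I the result is 2 * Op(A) + 1, stored column-major with ldc = 2.
c_plain = gemm_reference(2, 2, 2, 2.0, A, 3, False, B, 2, False, 1.0, [1.0] * 4, 2)
c_trans = gemm_reference(2, 2, 2, 2.0, A, 3, True, B, 2, False, 1.0, [1.0] * 4, 2)
```

Here the three leading dimensions (`lda`, `ldb`, `ldc`) and the two transpose flags play the roles that the last two table entries describe. The padded buffer for `A` shows why a leading dimension is tracked separately from the matrix shape.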