cuBLASDx APIs (`nvmath.device`)

Overview

These APIs integrate the NVIDIA cuBLASDx library, which performs BLAS operations such as matrix multiplication inside CUDA kernels. For details of the underlying library, see the cuBLASDx documentation.

Note

The Matmul device API in module nvmath.device currently supports cuBLASDx 0.4.1, also available as part of MathDx 25.06.

`Matmul`(size, precision, data_type, *[, sm, ...])	A class that encapsulates a partial Matmul device function.
`matmul`(*[, compiler, code_type, ...])	Create a `Matmul` object that encapsulates a compiled and ready-to-use device function for matrix multiplication.
`make_tensor`(array, layout)	A helper function for creating `nvmath.device.OpaqueTensor` objects.
`axpby`(alpha, x_tensor, beta, y_tensor)	AXPBY operation: y = alpha * x + beta * y
`copy`(src, dst[, alignment])	Copies data from the source tensor to the destination tensor.
`copy_fragment`(src, dst)	Copies data in either direction between register fragments and global memory tensors.
`clear`(arr)	Clears the contents of the given tensor by setting all elements to zero.
`copy_wait`()	Creates a synchronization point.
`OpaqueTensor`(*args)	Abstraction over the cuBLASDx tensor type (an alias of the CuTe tensor type).
`Layout`()	Layout for the `nvmath.device.OpaqueTensor`.
`Partition`(*args)	Partition of a global memory tensor into a partitioned tensor.
`Partitioner`(*args)	An abstraction for partitioning a global memory tensor into a partitioned tensor.
`SharedStorageCalc`()	Helper class to calculate shared storage size.
`LeadingDimension`(a, b, c)	A namedtuple class that encapsulates the three leading dimensions in matrix multiplication \(C = \alpha Op(A) Op(B) + \beta C\).
`TransposeMode`(a, b)	A namedtuple class that encapsulates the transpose mode for input matrices `A` and `B` in matrix multiplication.
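The table names `Matmul`, `TransposeMode`, and `LeadingDimension` but does not spell out what the resulting device function computes on stored data. The following host-side NumPy sketch is a reference for \(C = \alpha \, \mathrm{op}(A) \, \mathrm{op}(B) + \beta C\) on column-major buffers, which can be handy for checking results copied back from a kernel. It does not use `nvmath.device` and does not run on the GPU. The `TransposeMode` and `LeadingDimension` namedtuples below are local stand-ins that only copy the field names from the table. The helpers `load_col_major`, `apply_op`, and `reference_gemm` are hypothetical. The transpose-mode strings (`non_transposed`, `transposed`, `conj_transposed`) and the cuBLAS-style rule that a transposed operand is stored with its dimensions swapped are assumptions.

```python
from collections import namedtuple

import numpy as np

# Local stand-ins that only mirror the field names in the table above;
# they are NOT imported from nvmath.device.
TransposeMode = namedtuple("TransposeMode", ["a", "b"])
LeadingDimension = namedtuple("LeadingDimension", ["a", "b", "c"])


def load_col_major(buf, rows, cols, ld):
    """Gather a rows x cols column-major matrix from a flat buffer with leading dimension ld."""
    if ld < rows:
        raise ValueError("leading dimension must be >= number of stored rows")
    # Column j starts at offset j * ld; entries past `rows` in each column are padding.
    return np.stack([buf[j * ld : j * ld + rows] for j in range(cols)], axis=1)


def apply_op(mat, mode):
    """Apply a transpose mode (assumed cuBLAS-style names) to a stored matrix."""
    if mode == "non_transposed":
        return mat
    if mode == "transposed":
        return mat.T
    if mode == "conj_transposed":
        return mat.conj().T
    raise ValueError(f"unknown transpose mode: {mode}")


def reference_gemm(m, n, k, alpha, a_buf, b_buf, beta, c_buf, tmode, ld):
    """Host-side reference for C = alpha * op(A) @ op(B) + beta * C (column-major)."""
    # op(A) is m x k and op(B) is k x n; a transposed operand is assumed to be
    # stored with its dimensions swapped, as in cuBLAS.
    a_rows, a_cols = (m, k) if tmode.a == "non_transposed" else (k, m)
    b_rows, b_cols = (k, n) if tmode.b == "non_transposed" else (n, k)
    a = apply_op(load_col_major(a_buf, a_rows, a_cols, ld.a), tmode.a)
    b = apply_op(load_col_major(b_buf, b_rows, b_cols, ld.b), tmode.b)
    c = load_col_major(c_buf, m, n, ld.c)
    return alpha * (a @ b) + beta * c


# Usage: A is stored transposed (k x m) with one padding entry per column.
m, n, k = 2, 3, 4
ld = LeadingDimension(a=k + 1, b=k, c=m)
tmode = TransposeMode(a="transposed", b="non_transposed")
rng = np.random.default_rng(0)
a_buf = rng.standard_normal(ld.a * m)  # stored k x m, lda = k + 1
b_buf = rng.standard_normal(ld.b * n)  # stored k x n, ldb = k
c_buf = rng.standard_normal(ld.c * n)  # stored m x n, ldc = m
result = reference_gemm(m, n, k, 2.0, a_buf, b_buf, 0.5, c_buf, tmode, ld)
print(result.shape)  # (2, 3)
```

Padding the leading dimension of `A` (`k + 1` instead of `k`) only changes where each stored column starts in the buffer; the computed product is the same as with unpadded storage, which is why the leading dimension is kept separate from the matrix sizes.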