Introduction to GEMM with nvmath-python#
In this tutorial we will demonstrate how to perform GEMM (General Matrix Multiply) with the nvmath-python library.
We will demonstrate two APIs to execute matrix multiplication with nvmath-python:
- the matmul function (stateless API), which performs a single GEMM on its arguments and returns the result;
- the Matmul class (stateful API), which can perform multiple GEMMs on different input data, allowing you to amortize the cost of initialization and planning.
Stateless API#
Let us demonstrate the usage of the matmul function. We will perform the computations on CuPy arrays, but nvmath-python supports NumPy arrays and PyTorch tensors as well.
[1]:
import cupy
from nvmath.linalg.advanced import matmul
Basic matrix multiplication#
In its most basic use case, matmul can be used to simply multiply two matrices:
[2]:
a = cupy.asarray([[1, 2], [3, 4]], dtype=cupy.float32)
b = cupy.asarray([[0, 1], [2, 3]], dtype=cupy.float32)
result = matmul(a, b)
print(result)
print("Result type:", type(result))
[[ 4. 7.]
[ 8. 15.]]
Result type: <class 'cupy.ndarray'>
Note that the result is a CuPy array as well. nvmath-python always returns a result of the same type as its inputs: if the inputs are PyTorch tensors, the result will be a PyTorch tensor as well.
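For illustration, here is a minimal sketch (it assumes NumPy is installed and relies on the NumPy support mentioned above) showing that NumPy inputs yield a NumPy result:
[ ]:
import numpy

# With NumPy inputs, matmul returns a NumPy array.
a_np = numpy.asarray([[1, 2], [3, 4]], dtype=numpy.float32)
b_np = numpy.asarray([[0, 1], [2, 3]], dtype=numpy.float32)
result_np = matmul(a_np, b_np)
print("Result type:", type(result_np))  # Expected: <class 'numpy.ndarray'>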
GEMM#
We can also use it to perform a full GEMM, which is defined as

\[\alpha A B + \beta C\]

for matrices \(A\), \(B\) and \(C\) and constants \(\alpha\) and \(\beta\).
[3]:
a = cupy.asarray([[1, 2], [3, 4], [5, 6]], dtype=cupy.float32)
b = cupy.asarray([[0, 1], [2, 3]], dtype=cupy.float32)
c = cupy.asarray([[4, 5], [6, 7], [8, 9]], dtype=cupy.float32)
result = matmul(a, b, c=c, alpha=0.5, beta=100)
print(result)
# Check with cupy
assert cupy.allclose(result, 0.5 * a @ b + 100 * c)
[[402. 503.5]
[604. 707.5]
[806. 911.5]]
Stateful API#
The stateless matmul function, which we demonstrated above, plans and executes the matrix multiplication at once.
The stateful API allows you to first create a Matmul object, plan the multiplication, optionally fine-tune it, and then execute it (possibly multiple times). This stateful API is recommended for use cases where a similar multiplication (with the same shapes and types of tensors, but different values) is performed many times, since it amortizes the planning cost and significantly reduces the overhead.
[4]:
from nvmath.linalg.advanced import Matmul
Basic usage#
Let us show an example of how a Matmul object can be used.
[5]:
a = cupy.asarray([[1, 2], [3, -4]], dtype=cupy.float32)
b = cupy.asarray([[5, -6], [7, -8]], dtype=cupy.float32)
mm = Matmul(a, b)
With our Matmul object created, let us execute the planning. In this phase, the underlying cuBLAS library chooses the best algorithms based on the input shapes and types. The proposed algorithms are stored in mm.algorithms, ordered starting from the best one.
[6]:
mm.plan()
print(f"{len(mm.algorithms)} algorithms were proposed in the planning phase.")
7 algorithms were proposed in the planning phase.
Optionally, you can autotune the multiplication. Autotuning benchmarks each of the algorithms in mm.algorithms and reorders the list according to the results.
[7]:
mm.autotune()
With the planning complete, you can execute the multiplication:
[8]:
result1 = mm.execute()
print(result1)
[[ 19. -22.]
[-13. 14.]]
Resetting the operands#
To perform the next multiplication, you can use mm.reset_operands to change all or some of the inputs:
[9]:
mm.reset_operands(a=cupy.asarray([[0, 1], [0, 0]], dtype=cupy.float32))
result2 = mm.execute()
print(result2)
[[ 7. -8.]
[ 0. 0.]]
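This is where the stateful API pays off: once the plan is in place, you can keep resetting the operands and executing. The following sketch (the inputs are just random placeholder matrices of the same shape and dtype as the original operand) illustrates the pattern:
[ ]:
# Reuse the planned Matmul object for several inputs of the same
# shape and dtype; only the data changes between executions.
for _ in range(3):
    new_a = cupy.random.rand(2, 2).astype(cupy.float32)
    mm.reset_operands(a=new_a)
    print(mm.execute())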
The new inputs must be of the same shape and type as the original. Otherwise, an error will be raised:
[10]:
try:
    mm.reset_operands(a=cupy.asarray([[7]], dtype=cupy.float32))  # This is a 1x1 matrix!
except ValueError as e:
    print("Error:", e)
Error: The extents of the new operand must match the extents of the original operand.
[11]:
try:
    mm.reset_operands(a=cupy.asarray([[0, 1], [0, 0]], dtype=cupy.float64))  # This is float64!
except ValueError as e:
    print("Error:", e)
Error: The data type of the new operand must match the data type of the original operand.
Managing the resources#
Finally, we should release the resources of mm:
[12]:
mm.free()
Alternatively, we can use a with statement to manage the resources automatically:
[ ]:
a = cupy.asarray([[1, 2], [3, -4]], dtype=cupy.float32)
b = cupy.asarray([[5, -6], [7, -8]], dtype=cupy.float32)
c = cupy.asarray([[10, 10], [20, 30]], dtype=cupy.float32)
with Matmul(a, b, c=c, alpha=2, beta=0.7) as mm:
    mm.plan()
    result = mm.execute()
    # mm.free() is no longer needed, the resources are freed by the context manager.
print(result)
[[ 45. -37.]
[-12. 49.]]
In fact, matmul is just a thin wrapper around Matmul, which under the hood creates a Matmul object and then calls .plan and .execute just as we did above.
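Roughly speaking, it behaves like the following sketch (a simplified illustration, not the actual library implementation; extra options such as streams and planning preferences are omitted):
[ ]:
# A simplified sketch of what the stateless function does under the hood
# (not the real implementation).
def matmul_sketch(a, b, c=None, alpha=None, beta=None):
    with Matmul(a, b, c=c, alpha=alpha, beta=beta) as mm:
        mm.plan()
        return mm.execute()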
Learning more#
To learn more, we encourage you to visit our documentation pages for linear algebra.