Epilogs#

In this tutorial, we will demonstrate the use of cuBLAS epilogs. An epilog is a simple computation executed after the matrix multiplication, as part of the same fused operation. Epilogs are usually much faster than performing the same computation manually on the result.

[1]:
from nvmath.linalg.advanced import MatmulEpilog as Epilog

Let us begin by listing the available epilogs:

[2]:
", ".join(e.name for e in Epilog)
[2]:
'DEFAULT, RELU, RELU_AUX, BIAS, RELU_BIAS, RELU_AUX_BIAS, DRELU, DRELU_BGRAD, GELU, GELU_AUX, GELU_BIAS, GELU_AUX_BIAS, DGELU, DGELU_BGRAD, BGRADA, BGRADB'

Epilogs can be specified via the epilog= keyword argument of matmul (or of Matmul.plan, if you are using the stateful API).
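
For illustration, here is a minimal sketch of the stateful path. It assumes the Matmul class from nvmath.linalg.advanced with its plan()/execute() workflow and context-manager support; consult the nvmath-python documentation for the exact interface:

# Minimal sketch of the stateful API, assuming the Matmul class and
# its plan()/execute() workflow from nvmath.linalg.advanced.
import cupy
from nvmath.linalg.advanced import Matmul

a = cupy.random.rand(4, 4).astype(cupy.float32)
b = cupy.random.rand(4, 4).astype(cupy.float32)

with Matmul(a, b) as mm:
    mm.plan(epilog=Epilog.RELU)  # the epilog is fixed at plan time
    result = mm.execute()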

Now, let us describe a few of the available epilogs in detail.

RELU#

The RELU epilog applies ReLU (Rectified Linear Unit) to the result, replacing all negative elements with zeros:

[3]:
import cupy
from nvmath.linalg.advanced import matmul

a = cupy.asarray([[1, 2], [3, -4]], dtype=cupy.float32)
b = cupy.asarray([[5, -6], [7, -8]], dtype=cupy.float32)

print("A @ B:")
print(matmul(a, b))
print()
print("relu(A @ B):")
print(matmul(a, b, epilog=Epilog.RELU))
A @ B:
[[ 19. -22.]
 [-13.  14.]]

relu(A @ B):
[[19.  0.]
 [ 0. 14.]]
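
As a sanity check, the epilog result should match a manual ReLU (cupy.maximum) applied to the plain product. The timing below is only a sketch using cupyx.profiler.benchmark; actual speedups depend on your GPU and problem size:

# The fused epilog should agree with a manual ReLU on the result.
manual = cupy.maximum(matmul(a, b), 0)
fused = matmul(a, b, epilog=Epilog.RELU)
assert cupy.allclose(manual, fused)

# Timing sketch on a larger problem (numbers vary by GPU).
from cupyx.profiler import benchmark

big_a = cupy.random.rand(4096, 4096).astype(cupy.float32)
big_b = cupy.random.rand(4096, 4096).astype(cupy.float32)
print(benchmark(lambda: cupy.maximum(matmul(big_a, big_b), 0), n_repeat=10))
print(benchmark(lambda: matmul(big_a, big_b, epilog=Epilog.RELU), n_repeat=10))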

BIAS#

The BIAS epilog adds a 1-dimensional bias to the result, broadcasting it along the columns of the result (one bias element per row). This epilog requires an extra input (the bias to add), which we can provide with the epilog_inputs argument:

[4]:
a = cupy.ones((3, 5), dtype=cupy.float32)
b = cupy.ones((5, 4), dtype=cupy.float32)
bias = cupy.asarray([0, 10, 20], dtype=cupy.float32)

print("a @ b:")
print(matmul(a, b))
print()
print("a @ b + bias:")
print(matmul(a, b, epilog=Epilog.BIAS, epilog_inputs={"bias": bias}))
a @ b:
[[5. 5. 5. 5.]
 [5. 5. 5. 5.]
 [5. 5. 5. 5.]]

a @ b + bias:
[[ 5.  5.  5.  5.]
 [15. 15. 15. 15.]
 [25. 25. 25. 25.]]
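
Since the bias holds one element per row, the epilog is equivalent to adding it as a column vector. A quick sketch using the arrays above:

# Equivalent manual computation: add the bias as a column vector.
manual = matmul(a, b) + bias[:, None]
fused = matmul(a, b, epilog=Epilog.BIAS, epilog_inputs={"bias": bias})
assert cupy.allclose(manual, fused)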

RELU_BIAS#

There is also a RELU_BIAS epilog, which first adds the bias and then applies ReLU:

[5]:
a = cupy.asarray([[1, 2], [3, -4]], dtype=cupy.float32)
b = cupy.asarray([[5, -6], [7, -8]], dtype=cupy.float32)
bias = cupy.asarray([-10, 10], dtype=cupy.float32)

# No epilog
print("a @ b:")
print(matmul(a, b))
print()

# BIAS epilog, for reference only
print("a @ b + bias:")
print(matmul(a, b, epilog=Epilog.BIAS, epilog_inputs={"bias": bias}))
print()

# RELU_BIAS epilog
print("relu(a @ b + bias):")
print(matmul(a, b, epilog=Epilog.RELU_BIAS, epilog_inputs={"bias": bias}))
a @ b:
[[ 19. -22.]
 [-13.  14.]]

a @ b + bias:
[[  9. -32.]
 [ -3.  24.]]

relu(a @ b + bias):
[[ 9.  0.]
 [ 0. 24.]]
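
In other words, RELU_BIAS simply fuses the two steps above; a quick equivalence sketch:

# relu(a @ b + bias), with the bias again broadcast along the columns.
manual = cupy.maximum(matmul(a, b) + bias[:, None], 0)
fused = matmul(a, b, epilog=Epilog.RELU_BIAS, epilog_inputs={"bias": bias})
assert cupy.allclose(manual, fused)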

Gradient computations (_AUX epilogs)#

As you may have noticed, the functions available in epilogs are commonly used in neural network training. During backpropagation in a neural network, we need to compute the gradients of the transformations applied. For this reason, we need to store auxiliary information about their inputs.

For example, to compute the gradient of ReLU, we need to know which elements of the input matrix were negative. To get this information, we can use the RELU_AUX epilog, which returns an auxiliary output indicating the sign of the elements before ReLU was applied. In the case of ReLU, this auxiliary output is a bitmask, for space optimization reasons. You don’t need to unpack or interpret this bitmask in any way: there is a dedicated DRELU epilog which handles it for you (we sketch its use after the example below).

Let us show a simple example with RELU_AUX:

[6]:
a = cupy.asarray([[1, 2], [3, -4]], dtype=cupy.float32)
b = cupy.asarray([[5, -6], [7, -8]], dtype=cupy.float32)

print("a @ b:")
print(matmul(a, b))
print()

print("relu(a @ b):")
result, aux = matmul(a, b, epilog=Epilog.RELU_AUX)  # Note that a tuple is returned!
print(result)
print()

print("aux is a", type(aux), "with the following keys:", aux.keys())
print("aux[relu_aux] is", type(aux["relu_aux"]), "with dtype", aux["relu_aux"].dtype)
a @ b:
[[ 19. -22.]
 [-13.  14.]]

relu(a @ b):
[[19.  0.]
 [ 0. 14.]]

aux is a <class 'dict'> with the following keys: dict_keys(['relu_aux'])
aux[relu_aux] is <class 'cupy.ndarray'> with dtype int8
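
To sketch how this auxiliary output is consumed, here is a hypothetical backward-pass step with the DRELU epilog. We assume the epilog input is named "relu_aux", matching the key returned above; check the nvmath-python documentation for the exact calling convention:

# Hypothetical backward-pass sketch: DRELU masks the gradient using
# the bitmask produced by RELU_AUX in the forward pass.
# The "relu_aux" input name is assumed from the aux key shown above.
grad_out = cupy.ones_like(result)           # stand-in upstream gradient
identity = cupy.eye(2, dtype=cupy.float32)  # stand-in for the weight matrix

grad_in = matmul(
    grad_out,
    identity,
    epilog=Epilog.DRELU,
    epilog_inputs={"relu_aux": aux["relu_aux"]},
)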

Learning more#

We will show a practical use case for the epilogs in the next tutorial, in which we will implement a simple digit-recognition neural network using nvmath-python's matmul and its epilogs.

To learn more about the available epilogs, see the cuBLAS documentation on epilogs.