Matmul

class nvmath.device.Matmul(
size,
precision,
data_type,
*,
sm=None,
block_size=None,
block_dim=None,
leading_dimension=None,
transpose_mode=None,
arrangement=None,
alignment=None,
function='MM',
static_block_dim=False,
execution='Block',
)

A class that encapsulates a partial Matmul device function. A partial device function can be queried for available or optimal values for some knobs (such as leading_dimension or block_dim).

Changed in version 0.7.0: Matmul has replaced BlasOptions and BlasOptionsComplete.

Parameters:
  • size – A sequence of integers denoting the three dimensions (m, n, k) for the matrix multiplication problem.

  • precision – The computation precision, specified as a NumPy floating-point dtype. Currently supported: numpy.float16, numpy.float32, and numpy.float64.

  • data_type – The data type of the input matrices, either 'real' or 'complex'.

  • sm (ComputeCapability) – Target mathdx compute-capability.

  • block_size (int) – Optional. The total block size. If omitted or set to 'suggested', a suggested value for a 1D block dimension is used.

  • block_dim (Dim3) – Optional. The block dimension for launching the CUDA kernel. If omitted or set to 'suggested', a suggested value is used. Cannot be used when block_size is explicitly specified.

  • leading_dimension (LeadingDimension) – Optional. The leading dimensions for the input matrices. If omitted, each leading dimension matches the corresponding matrix row/column dimension. If set to 'suggested', a value suggested for optimal performance is used.

  • transpose_mode (TransposeMode) – The transpose mode for all input matrices. Either transpose_mode or arrangement must be provided.

  • arrangement (Arrangement) – The arrangement for all input matrices. Either transpose_mode or arrangement must be provided.

  • alignment (Alignment) – The alignment for the input matrices in shared memory. Defines the alignments (in bytes) of the input matrices A, B, and C (either arrays or wrapped in opaque tensors) that are passed to the execute(…) method. The default alignment equals the element size of the matrix, unless a suggested layout is used, in which case the alignment is greater than or equal to the element size.

  • function (str) – A string specifying the name of the function. Currently supports 'MM' (default) for matrix multiplication.

  • execution (str) – A string specifying the execution method, either 'Block' or 'Thread'.

  • execute_api (str) –

    A string specifying the signature of the function that handles problems with default or custom/dynamic leading dimensions. Either 'static_leading_dimensions' or 'dynamic_leading_dimensions'.

    Changed in version 0.5.0: execute_api is not part of the Matmul (formerly Blas) type. Pass this argument to nvmath.device.matmul() instead.

  • tensor_types (Sequence[str]) –

    A list of strings specifying the tensors used in the execute signature.

    Changed in version 0.5.0: tensor_types is not part of the Matmul (formerly Blas) type. Pass this argument to nvmath.device.matmul() instead.
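The size, arrangement, and leading_dimension parameters interact: for a problem of size (m, n, k), A is m×k, B is k×n, and C is m×n, and an omitted leading dimension matches a row or column dimension of the matrix. The following is a minimal sketch in plain Python that does not call nvmath. The helper names (matrix_shapes, default_leading_dimensions) and the 'col_major' / 'row_major' labels are hypothetical, introduced only for illustration. It assumes the usual BLAS convention that a column-major matrix has a default leading dimension equal to its row count and a row-major matrix has one equal to its column count.

```python
def matrix_shapes(size):
    """Return the logical (rows, cols) of A, B, and C for a problem of size (m, n, k)."""
    m, n, k = size
    return {"a": (m, k), "b": (k, n), "c": (m, n)}


def default_leading_dimensions(size, arrangement):
    """Return an assumed default leading dimension for each matrix.

    `arrangement` holds three labels, one per matrix (A, B, C). The labels
    "col_major" and "row_major" are hypothetical and not nvmath values.
    """
    shapes = matrix_shapes(size)
    leading = {}
    for name, order in zip("abc", arrangement):
        rows, cols = shapes[name]
        if order == "col_major":
            leading[name] = rows  # consecutive columns are `rows` elements apart
        elif order == "row_major":
            leading[name] = cols  # consecutive rows are `cols` elements apart
        else:
            raise ValueError(f"unknown arrangement label: {order!r}")
    return leading


shapes = matrix_shapes((64, 32, 16))
lds = default_leading_dimensions((64, 32, 16), ("row_major", "col_major", "col_major"))
print(shapes)
print(lds)
```

A padded leading dimension (for example, one returned when 'suggested' is requested) would be larger than these defaults; the sketch only covers the unpadded case.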

See also

The attributes of this class provide a 1:1 mapping with the CUDA C++ cuBLASDx APIs. For further details, please refer to cuBLASDx documentation.

Methods

__init__(
size,
precision,
data_type,
*,
sm=None,
block_size=None,
block_dim=None,
leading_dimension=None,
transpose_mode=None,
arrangement=None,
alignment=None,
function='MM',
static_block_dim=False,
execution='Block',
)

create(
code_type=None,
compiler=None,
execute_api=None,
tensor_types=None,
global_memory_alignment=None,
**kwargs,
)

Creates a copy of the instance with provided arguments updated.

Deprecated since version 0.7.0: Please use functools.partial() instead.
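Because create() is deprecated in favor of functools.partial(), the following minimal sketch shows that pattern. To stay runnable without a GPU or an nvmath installation, it uses a hypothetical stand-in class, MatmulStandIn, that only stores its arguments. The assumption is that the real constructor accepts the same keyword names as the signature listed for this class; the string values used here are placeholders.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class MatmulStandIn:
    """Hypothetical stand-in, NOT nvmath.device.Matmul. It only records its arguments."""

    size: tuple
    precision: str
    data_type: str
    arrangement: tuple = None
    block_size: int = None


# Bind the arguments shared by every variant once.
base = functools.partial(
    MatmulStandIn,
    size=(32, 32, 32),
    precision="float32",
    data_type="real",
    arrangement=("col_major", "col_major", "col_major"),
)

# Each call supplies only what differs, much like create() updated selected arguments.
small_block = base(block_size=128)
large_block = base(block_size=256)

# Keywords passed at call time override the ones stored in the partial.
complex_variant = base(data_type="complex", block_size=128)

print(small_block)
print(large_block)
print(complex_variant)
```

Unlike create(), which copied an existing instance, a partial defers construction until it is called, so every variant is built from the same bound arguments.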

definition()

Deprecated since version 0.7.0.

execute(*args)

get_layout_gmem_a(leading_dimension: int | None = None) → Layout
get_layout_gmem_b(leading_dimension: int | None = None) → Layout
get_layout_gmem_c(leading_dimension: int | None = None) → Layout

get_layout_smem_a() → Layout
get_layout_smem_b() → Layout
get_layout_smem_c() → Layout

get_shared_storage_size() → int
get_shared_storage_size(lda: int, ldb: int, ldc: int) → int
get_shared_storage_size(matrix_a_layout: Layout, matrix_b_layout: Layout, matrix_c_layout: Layout) → int

get_shared_storage_size_ab() → int
get_shared_storage_size_ab(lda: int, ldb: int) → int
get_shared_storage_size_ab(matrix_a_layout: Layout, matrix_b_layout: Layout) → int

suggest_layout_rmem_c() → Layout
suggest_layout_smem_a() → Layout
suggest_layout_smem_b() → Layout
suggest_layout_smem_c() → Layout

suggest_partitioner() → Partitioner

valid(*knobs)
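The get_shared_storage_size overloads accept explicit leading dimensions or layouts. As a rough illustration of why leading dimensions and alignment change the required storage, the sketch below estimates a byte count in plain Python. It is not the formula used by nvmath or cuBLASDx. The helper estimate_shared_bytes is hypothetical, and it assumes column-major matrices placed back to back in A, B, C order, with each matrix start rounded up to that matrix's alignment.

```python
def align_up(offset, alignment):
    """Round `offset` up to the next multiple of `alignment` (both in bytes)."""
    return (offset + alignment - 1) // alignment * alignment


def estimate_shared_bytes(size, leading_dims, element_bytes, alignments):
    """Estimate the bytes needed for A, B, and C under the assumptions stated above.

    This is a hypothetical helper, not an nvmath API.
    """
    m, n, k = size
    lda, ldb, ldc = leading_dims
    # Column-major footprint in elements: leading dimension times column count.
    footprints = (lda * k, ldb * n, ldc * n)
    offset = 0
    for elements, alignment in zip(footprints, alignments):
        offset = align_up(offset, alignment)  # pad so the matrix starts aligned
        offset += elements * element_bytes
    return offset


# 32x32x32 problem with 4-byte elements and 16-byte alignment for each matrix.
tight = estimate_shared_bytes((32, 32, 32), (32, 32, 32), 4, (16, 16, 16))
padded = estimate_shared_bytes((32, 32, 32), (33, 33, 33), 4, (16, 16, 16))
print(tight, padded)
```

In this sketch, padding each leading dimension by one element increases the estimate. For real kernels, the value returned by get_shared_storage_size should be used rather than a hand-computed figure.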

Attributes

a_dim
a_size
a_value_type
alignment
arrangement
b_dim
b_size
b_value_type
block_dim
block_size
c_dim
c_size
c_value_type
codes

A list of Code objects for all LTO functions.

data_type
execution
files

The list of binary files for the LTO functions.

function
input_type
leading_dimension
max_threads_per_block
output_type
precision
shared_memory_size
size
sm
static_block_dim
transpose_mode
value_type