BlasOptions

class nvmath.device.BlasOptions(
size,
precision,
data_type,
*,
code_type=None,
block_size=None,
block_dim=None,
leading_dimension=None,
transpose_mode=None,
arrangement=None,
alignment=None,
global_memory_alignment=None,
function='MM',
static_block_dim=False,
execution='Block',
execute_api='static_leading_dimensions',
tensor_types=None,
)

A class that encapsulates a partial BLAS device function. A partial device function can be queried for available or optimal values of some knobs (such as leading_dimension or block_dim). It does not contain a compiled, ready-to-use device function until it is finalized with create().

Parameters:
  • size – A sequence of integers denoting the three dimensions (m, n, k) for the matrix multiplication problem.

  • precision – The computation precision, specified as a NumPy float dtype. Currently supported: numpy.float16, numpy.float32, and numpy.float64.

  • data_type – The data type of the input matrices: either 'real' or 'complex'.

  • code_type (CodeType) – The target GPU code and compute-capability.

  • block_size (int) – The total block size, optional. If not provided or set to 'suggested', a suggested value for a 1D block dimension is used.

  • block_dim (Dim3) – The block dimension for launching the CUDA kernel, optional. If not provided or set to 'suggested', a suggested value is used. Cannot be used when block_size is explicitly specified.

  • leading_dimension (LeadingDimension) – The leading dimensions for the input matrices, optional. If not provided, they are set to match the matrix row/column dimension. If provided as 'suggested', they are set to a suggested value for optimal performance.

  • transpose_mode (TransposeMode) – The transpose mode for all input matrices; transpose_mode or arrangement must be provided.

  • arrangement (Arrangement) – The arrangement for all input matrices; transpose_mode or arrangement must be provided.

  • alignment (Alignment) – The alignment for the input matrices in shared memory. Defines the alignments (in bytes) of the input matrices A, B, and C (either arrays or wrapped in opaque tensors) that are passed to the execute(…) method. The default alignment equals the element size of the matrix unless a suggested layout is used, in which case the alignment is greater than or equal to the element size.

  • function (str) – A string specifying the name of the function. Currently supports 'MM' (default) for matrix multiplication.

  • execution (str) – A string specifying the execution method, can be 'Block' or 'Thread'.

  • execute_api (str) – A string specifying the signature of the function that handles problems with default or custom/dynamic leading dimensions. Can be 'static_leading_dimensions' or 'dynamic_leading_dimensions'.

  • global_memory_alignment (Alignment) – Same as alignment, but for global memory. Used to optimize copying between shared and global memory.
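
Several of the parameter descriptions above state constraints that can be checked before constructing the object. The following is a minimal sketch of such a check, not part of nvmath: the helper name check_blas_option_args is hypothetical, the arrangement strings in the usage note are illustrative assumptions, and treating block_size='suggested' as "not explicitly specified" is one reading of the description of block_dim.

```python
import numpy as np

# Hypothetical helper, NOT part of nvmath. It only mirrors constraints
# stated in the parameter descriptions of BlasOptions.
SUPPORTED_PRECISIONS = [np.dtype(p) for p in (np.float16, np.float32, np.float64)]


def check_blas_option_args(size, precision, data_type, **options):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # size: a sequence of three integers (m, n, k).
    if len(size) != 3 or not all(isinstance(d, int) for d in size):
        problems.append("size must be three integers (m, n, k)")

    # precision: one of the documented NumPy float dtypes.
    if np.dtype(precision) not in SUPPORTED_PRECISIONS:
        problems.append("precision must be float16, float32 or float64")

    # data_type: 'real' or 'complex'.
    if data_type not in ("real", "complex"):
        problems.append("data_type must be 'real' or 'complex'")

    # transpose_mode or arrangement must be provided.
    if options.get("transpose_mode") is None and options.get("arrangement") is None:
        problems.append("transpose_mode or arrangement must be provided")

    # block_dim cannot be used when block_size is explicitly specified.
    # Assumption: None and 'suggested' do not count as explicit.
    block_size = options.get("block_size")
    explicit_block_size = block_size is not None and block_size != "suggested"
    if explicit_block_size and options.get("block_dim") is not None:
        problems.append("block_dim cannot be combined with an explicit block_size")

    return problems
```

For instance, check_blas_option_args((32, 32, 32), np.float32, "real", arrangement=("row_major", "col_major", "col_major")) returns an empty list, while omitting both transpose_mode and arrangement reports one problem. Arguments that pass this check still have to be accepted by BlasOptions itself, which may enforce further rules not covered here.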

See also

The attributes of this class provide a 1:1 mapping with the CUDA C++ cuBLASDx APIs. For further details, please refer to the cuBLASDx documentation.

Methods

__init__(
size,
precision,
data_type,
*,
code_type=None,
block_size=None,
block_dim=None,
leading_dimension=None,
transpose_mode=None,
arrangement=None,
alignment=None,
global_memory_alignment=None,
function='MM',
static_block_dim=False,
execution='Block',
execute_api='static_leading_dimensions',
tensor_types=None,
)
create(**kwargs)
valid(*knobs)

Attributes

alignment
arrangement
block_dim
block_size
code_type
data_type
execute_api
execution
function
leading_dimension
precision
size
static_block_dim
transpose_mode