Partitioner

class nvmath.device.Partitioner(*args)

Partitioner is an abstraction for partitioning a global memory tensor among the threads of a block, so that each thread can access the elements that correspond to its register fragment.

Note

Do not create instances directly; use nvmath.device.Matmul.suggest_partitioner() instead.

Refer to the cuBLASDx documentation for more details on how to use this class: https://docs.nvidia.com/cuda/cublasdx/api/other_tensors.html#partitioner-register-tensor-other-label

Methods

__init__(*args)

abstract is_index_in_bounds(index: int) -> bool

Checks if the given index is within the bounds of the partitioned tensor. This is used to prevent out-of-bounds access in the kernel.

abstract is_predicated() -> bool

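Checks whether the partitioning is predicated, that is, whether some elements of the current thread's fragment may map outside the global memory tensor. When it returns True, guard each access with is_index_in_bounds().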
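The CPU-only toy model below (not nvmath code) illustrates why bounds checks can be needed. It assumes a hypothetical layout in which a 2 x 2 grid of threads each hold a 2 x 2 fragment placed cyclically over a 3 x 4 matrix. The helpers map_fragment_index, is_predicated, and is_index_in_bounds only mirror the documented method names; they take the thread index tid explicitly, whereas the real methods act on the calling thread. Because the fragments cover 4 rows but the matrix has only 3, some fragment elements fall outside it.

```python
# Hypothetical toy model (CPU-only, not nvmath code): a 2 x 2 thread grid where
# each thread owns a 2 x 2 fragment laid out cyclically over a 3 x 4 matrix.
TM, TN = 2, 2  # threads per column/row of the assumed thread grid
VM, VN = 2, 2  # assumed fragment elements per thread in each dimension
M, N = 3, 4    # matrix shape; M = 3 is smaller than TM * VM = 4


def map_fragment_index(tid, f):
    tr, tc = divmod(tid, TN)          # thread coordinates in a row-major grid
    i, j = f % VM, f // VM            # column-major position inside the fragment
    return (tr + i * TM, tc + j * TN)  # cyclic (interleaved) placement


def is_predicated():
    # Bounds checks are needed only if the fragments overhang the matrix.
    return TM * VM > M or TN * VN > N


def is_index_in_bounds(tid, f):
    row, col = map_fragment_index(tid, f)
    return row < M and col < N


# Collect (thread, fragment index) pairs that would fall outside the matrix.
out_of_bounds = [
    (tid, f)
    for tid in range(TM * TN)
    for f in range(VM * VN)
    if not is_index_in_bounds(tid, f)
]
print(is_predicated(), len(out_of_bounds))  # prints: True 4
```

In this model the elements held at fragment indices 1 and 3 by threads 2 and 3 map to row 3 and would have to be skipped.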

abstract is_thread_active() -> bool

Checks whether the current thread takes part in the GEMM computation.
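As a rough illustration, consider a hypothetical toy model (not nvmath code) in which the assumed thread layout uses only TM * TN threads while the block is launched with more. The check then reduces to a comparison on the thread index. The constants and the tid argument are assumptions of this sketch; the real method takes no arguments and queries the calling thread.

```python
# Hypothetical toy model (not nvmath code): the assumed GEMM thread layout
# uses TM * TN threads, but the block is launched with BLOCK_SIZE threads.
TM, TN = 4, 8     # threads used by the assumed thread layout
BLOCK_SIZE = 64   # assumed number of threads in the launched block


def is_thread_active(tid):
    # Only threads with a slot in the thread layout take part in the GEMM.
    return tid < TM * TN


active = [tid for tid in range(BLOCK_SIZE) if is_thread_active(tid)]
print(len(active), BLOCK_SIZE - len(active))  # prints: 32 32
```

In a kernel, inactive threads would skip the code that reads or writes C through the partitioned tensor.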

abstract map_fragment_index(fragment_index: int) -> tuple[int, int]

Maps the given register fragment index to the (row, column) coordinates of the corresponding element in the global memory tensor. This is used to access the correct element of the partitioned tensor.
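To make the mapping concrete, the sketch below records which thread owns each element of a 4 x 4 matrix. It is a CPU-only toy model, not the layout cuBLASDx actually uses: the 2 x 2 thread grid, the 2 x 2 column-major fragment, and the cyclic placement are all assumptions, and the hypothetical helper takes the thread index tid explicitly instead of querying the calling thread.

```python
# Hypothetical toy model (not nvmath code): cyclic layout of a 4 x 4 matrix over
# a 2 x 2 thread grid, each thread holding a 2 x 2 fragment in column-major order.
TM, TN = 2, 2  # assumed thread grid
VM, VN = 2, 2  # assumed fragment shape per thread


def map_fragment_index(tid, fragment_index):
    """Return the (row, col) of the element held at fragment_index by thread tid."""
    tr, tc = divmod(tid, TN)                           # row-major thread position
    i, j = fragment_index % VM, fragment_index // VM   # column-major fragment position
    return (tr + i * TM, tc + j * TN)


# Record which thread owns each element of the matrix.
owner = [[None] * (TN * VN) for _ in range(TM * VM)]
for tid in range(TM * TN):
    for f in range(VM * VN):
        row, col = map_fragment_index(tid, f)
        owner[row][col] = tid

for r in owner:
    print(r)
# prints:
# [0, 1, 0, 1]
# [2, 3, 2, 3]
# [0, 1, 0, 1]
# [2, 3, 2, 3]
```

Every element is assigned to exactly one (thread, fragment index) pair, which is the property a partitioner needs so that threads can write C without overlapping.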

abstract partition_like_C(gmem_c: OpaqueTensor) -> Partition

Partitions the given global memory tensor gmem_c into a partitioned tensor. Use the partitioned tensor to access the C matrix when working with a register fragment.
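The methods above are typically combined in an epilogue that writes C = alpha * acc + beta * C from each thread's register fragment. The sketch below models that pattern on the CPU. ToyPartitioner, ToyPartition, and epilogue are hypothetical names that mirror the documented method names but are not nvmath classes; the thread grid, fragment shape, and cyclic layout are assumptions, and the thread index is passed to the constructor rather than queried from the calling thread.

```python
# Hypothetical CPU-only model of the partitioner workflow (not nvmath code).
class ToyPartition:
    """Per-thread view of a global matrix, indexed by fragment index."""

    def __init__(self, gmem, partitioner):
        self.gmem = gmem
        self.p = partitioner

    def __getitem__(self, f):
        row, col = self.p.map_fragment_index(f)
        return self.gmem[row][col]

    def __setitem__(self, f, value):
        row, col = self.p.map_fragment_index(f)
        self.gmem[row][col] = value


class ToyPartitioner:
    TM, TN, VM, VN = 2, 2, 2, 2  # assumed thread grid and fragment shape

    def __init__(self, tid, m, n):
        self.tid, self.m, self.n = tid, m, n

    def is_thread_active(self):
        return self.tid < self.TM * self.TN

    def is_predicated(self):
        return self.TM * self.VM > self.m or self.TN * self.VN > self.n

    def map_fragment_index(self, f):
        tr, tc = divmod(self.tid, self.TN)
        i, j = f % self.VM, f // self.VM
        return (tr + i * self.TM, tc + j * self.TN)

    def is_index_in_bounds(self, f):
        row, col = self.map_fragment_index(f)
        return row < self.m and col < self.n

    def partition_like_C(self, gmem_c):
        return ToyPartition(gmem_c, self)


def epilogue(tid, acc, gmem_c, alpha, beta):
    """Write C = alpha * acc + beta * C for one thread's fragment."""
    p = ToyPartitioner(tid, len(gmem_c), len(gmem_c[0]))
    if not p.is_thread_active():
        return  # this thread holds no part of C
    c_part = p.partition_like_C(gmem_c)
    for f, value in enumerate(acc):
        # Skip fragment elements that fall outside C when the layout overhangs it.
        if not p.is_predicated() or p.is_index_in_bounds(f):
            c_part[f] = alpha * value + beta * c_part[f]


C = [[1.0] * 4 for _ in range(3)]  # 3 x 4 matrix, so the toy layout is predicated
for tid in range(8):               # more "threads" than the layout uses
    epilogue(tid, [10.0] * 4, C, alpha=2.0, beta=0.5)
print(C)  # every element becomes 2.0 * 10.0 + 0.5 * 1.0 = 20.5
```

The check order mirrors the methods above: inactive threads exit early, and predicated layouts guard each access with is_index_in_bounds().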