Partitioner

class nvmath.device.Partitioner(*args)

Partitioner is an abstraction for partitioning a global memory tensor into a partitioned tensor.

Note

Do not create this class directly; use nvmath.device.Matmul.suggest_partitioner() instead.

Refer to the cuBLASDx documentation for more details on how to use this class: https://docs.nvidia.com/cuda/cublasdx/api/other_tensors.html#partitioner-register-tensor-other-label

Methods

__init__(*args)

abstract is_index_in_bounds(index: int) → bool: Checks whether the given index is within the bounds of the partitioned tensor. Use this to prevent out-of-bounds accesses in the kernel.

abstract is_predicated() → bool: Checks if the current thread is predicated. This is used to determine if the thread should execute the kernel.

abstract is_thread_active() → bool: Checks whether the current thread takes part in the GEMM.

abstract map_fragment_index(fragment_index: int) → tuple[int, int]: Maps the given fragment index to a two-dimensional global memory index. Use this to access the correct element in the partitioned tensor.

abstract partition_like_C(gmem_c: OpaqueTensor) → Partition: Partitions the given global memory tensor gmem_c into a partitioned tensor. The partitioned tensor is used to access the C matrix when working with a register fragment.
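
The following is a minimal CPU-only sketch of how the query methods above might combine when a thread writes its register fragment back to C. It does not use nvmath or run on a GPU. ToyPartitioner, store_fragment, and the row-major ownership layout are hypothetical names and assumptions made for illustration; the real mapping is determined by the partitioner returned from nvmath.device.Matmul.suggest_partitioner().

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPartitioner:
    """Hypothetical CPU stand-in; NOT the nvmath.device.Partitioner implementation."""

    rows: int           # rows of the C tile
    cols: int           # columns of the C tile
    num_threads: int    # threads that take part in the toy GEMM
    fragment_size: int  # elements each thread holds in its register fragment
    thread_id: int      # index of the "current" thread

    def is_thread_active(self) -> bool:
        # Threads beyond num_threads do not take part in the GEMM.
        return self.thread_id < self.num_threads

    def map_fragment_index(self, fragment_index: int) -> tuple[int, int]:
        # Assumed layout: each thread owns a consecutive row-major run of C.
        linear = self.thread_id * self.fragment_size + fragment_index
        return divmod(linear, self.cols)  # (row, col)

    def is_index_in_bounds(self, fragment_index: int) -> bool:
        # The column is always valid under this layout, so only the row is checked.
        row, _ = self.map_fragment_index(fragment_index)
        return row < self.rows


def store_fragment(partitioner: ToyPartitioner, fragment: list, c: list) -> None:
    """Write one thread's fragment into c, skipping inactive threads and
    out-of-bounds elements."""
    if not partitioner.is_thread_active():
        return
    for i, value in enumerate(fragment):
        if partitioner.is_index_in_bounds(i):
            row, col = partitioner.map_fragment_index(i)
            c[row][col] = value


ROWS, COLS = 3, 4
c = [[None] * COLS for _ in range(ROWS)]

# Simulate five threads: threads 0-2 cover the tile, thread 3 is active but
# maps entirely out of bounds, and thread 4 is inactive.
for thread_id in range(5):
    p = ToyPartitioner(ROWS, COLS, num_threads=4, fragment_size=4, thread_id=thread_id)
    fragment = [10 * thread_id + i for i in range(4)]
    store_fragment(p, fragment, c)

print(c)
```

In this toy setup the four active threads provide 16 fragment slots for a 12-element tile, so the bounds check is what keeps thread 3 from writing past the last row.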