Accumulator

class nvmath.device.Accumulator(*args)

Accumulator is an abstraction that links the global memory and register layouts. It offers operations such as partitioning, copying data, and mapping register indices to matrix coordinates.

Refer to the cuBLASDx documentation for more details on how to use this class: https://docs.nvidia.com/cuda/cublasdx/api/other_tensors.html#accumulator-and-register-fragment-tensors
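
The overall workflow can be modeled off-device with a small pure-Python sketch: each thread owns a register fragment of the accumulator, clears it, accumulates partial results, and finally copies its elements back to global memory. Everything below (the class name `ToyAccumulator`, the cyclic partition, the plain-list "fragment") is an illustrative assumption, not the actual cuBLASDx layout, which is opaque and chosen by the library:

```python
# Conceptual pure-Python model of the accumulator workflow; the real
# Accumulator lives on the GPU and its layout is chosen by cuBLASDx.
class ToyAccumulator:
    def __init__(self, m, n, thread_id, num_threads):
        # Hypothetical cyclic partition: thread t owns flat indices
        # t, t + T, t + 2T, ... of the row-major m x n accumulator.
        self.n = n
        self.indices = list(range(thread_id, m * n, num_threads))
        self.frag = [0.0] * len(self.indices)

    def size(self):
        # Number of accumulator elements this thread owns.
        return len(self.frag)

    def clear(self):
        # Zero the register fragment before accumulation.
        self.frag = [0.0] * len(self.frag)

    def is_index_in_bounds(self, i):
        # Guard against out-of-bounds fragment access.
        return 0 <= i < len(self.frag)

    def map_fragment_index(self, i):
        # Fragment index -> (row, col) coordinate in the result matrix.
        return divmod(self.indices[i], self.n)

    def store(self, c):
        # Copy the fragment back to the "global memory" matrix c.
        for i, flat in enumerate(self.indices):
            row, col = divmod(flat, self.n)
            c[row][col] = self.frag[i]


# Eight "threads" cooperatively fill a 4 x 4 result matrix.
C = [[0.0] * 4 for _ in range(4)]
for t in range(8):
    acc = ToyAccumulator(4, 4, t, 8)
    acc.clear()
    for i in range(acc.size()):
        row, col = acc.map_fragment_index(i)
        acc.frag[i] += float(row * 4 + col)  # stand-in for a partial product
    acc.store(C)
```

The per-thread loop over `size()` with `map_fragment_index()` mirrors how kernels visit the elements of their own fragment without knowing the global partitioning.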

Methods

__init__(*args)
axpby()
clear()
get_alignment() → int
get_results(out=None) → OpaqueTensor
is_index_in_bounds(index: int) → bool

Checks if the given index is within the bounds of the partitioned tensor. This is used to prevent out-of-bounds access in the kernel.

is_predicated() → bool

Checks if the current thread is predicated. This is used to determine if the thread should execute the kernel.

is_thread_active() → bool

Checks whether the current thread takes part in the GEMM.

make_empty_fragment() → OpaqueTensor

Creates an empty fragment tensor in register memory. The fragment layout is the same as the accumulator layout.

make_partition_and_copy(src: OpaqueTensor) → OpaqueTensor

Same as partition_and_copy, but returns the partitioned rmem tensor instead of copying into a provided dst.

map_fragment_index(fragment_index: int) → tuple[int, int]

Maps the given fragment index to its (row, column) coordinate in the matrix. This is used to access the corresponding element of the global memory tensor.
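
As a concrete example of such a mapping, consider a hypothetical cyclic, row-major partition of an M × N accumulator across T threads, where thread t's k-th fragment element sits at flat index t + k·T. This is an assumption for illustration only; the actual cuBLASDx mapping is opaque and generally different:

```python
def toy_map_fragment_index(fragment_index, thread_id, num_threads, n):
    # Hypothetical cyclic row-major partition of an M x N accumulator:
    # thread t's k-th fragment element sits at flat index t + k * num_threads.
    # The real cuBLASDx mapping is opaque; this only illustrates the idea.
    flat = thread_id + fragment_index * num_threads
    return divmod(flat, n)  # (row, col) coordinate in the result matrix
```

For example, with 8 threads on a 4 × 4 matrix, thread 3's fragment element 1 lands at flat index 11, i.e. coordinate (2, 3).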

partition_and_copy(src: OpaqueTensor, dst: OpaqueTensor)

Partitions the gmem tensor and copies this thread's part into the rmem fragment.
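
A pure-Python sketch of the idea, under the assumed cyclic partition used above (a plain list stands in for the register fragment; the real element-to-thread assignment is chosen by cuBLASDx):

```python
def toy_partition_and_copy(gmem, thread_id, num_threads):
    # Gather this thread's slice of a flattened global-memory tile into a
    # private "register" fragment (a plain list). Hypothetical cyclic
    # partition; the real assignment is opaque to the user.
    return [gmem[i] for i in range(thread_id, len(gmem), num_threads)]
```

With 8 threads over a 16-element tile, thread 1's fragment would hold the elements at flat indices 1 and 9.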

partition_and_store(tensor: OpaqueTensor)
partition_like_C(gmem_c: OpaqueTensor) → OpaqueTensor

Partitions the given global memory tensor gmem_c into a partitioned tensor, which is used to access the C matrix when working with register fragments.
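
Assuming axpby() follows the conventional BLAS-like element-wise update y := alpha·x + beta·y, a C fragment obtained this way enables the standard GEMM epilogue C = alpha·(A·B) + beta·C on per-thread fragments. A pure-Python sketch of that element-wise step (the function name and list-based fragments are illustrative):

```python
def toy_axpby(alpha, x_frag, beta, y_frag):
    # Element-wise alpha * x + beta * y over two per-thread register
    # fragments (plain lists here); assumes the usual BLAS-like axpby
    # convention, which the actual method may or may not follow exactly.
    return [alpha * x + beta * y for x, y in zip(x_frag, y_frag)]
```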

size()