Accumulator
- class nvmath.device.Accumulator(*args)
Accumulator is an abstraction that provides the link between the global memory and register layouts. It offers operations like partitioning, copying data, and mapping register indices to matrix coordinates.
Refer to the cuBLASDx documentation for more details on how to use this class: https://docs.nvidia.com/cuda/cublasdx/api/other_tensors.html#accumulator-and-register-fragment-tensors
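To make the idea of linking a global memory layout to a register layout more concrete, the following is a minimal pure-Python sketch. It does not use nvmath or cuBLASDx. The `ToyAccumulator` class, its cyclic distribution of elements over threads, and its constructor arguments are assumptions made for illustration only; the method names merely mirror the ones listed on this page, and the real register layout is chosen by the library.

```python
class ToyAccumulator:
    """Toy model of one thread's view of an m x n tile.

    Assumption (not the real layout): the element with linear index
    row * n + col is owned by thread (row * n + col) % num_threads.
    """

    def __init__(self, m, n, num_threads, thread_id):
        self.m, self.n = m, n
        self.num_threads = num_threads
        self.thread_id = thread_id

    def fragment_size(self):
        # Register slots per thread, rounded up so every element is covered.
        total = self.m * self.n
        return (total + self.num_threads - 1) // self.num_threads

    def make_empty_fragment(self):
        # A plain list stands in for a register-memory fragment.
        return [0] * self.fragment_size()

    def map_fragment_index(self, fragment_index):
        # Register slot -> (row, col) coordinates in the tile.
        linear = fragment_index * self.num_threads + self.thread_id
        return divmod(linear, self.n)

    def is_index_in_bounds(self, fragment_index):
        # The column is always < n after divmod, so only the row can overflow.
        row, _ = self.map_fragment_index(fragment_index)
        return row < self.m

    def partition_and_copy(self, gmem, fragment):
        # Copy this thread's elements from "global memory" into its fragment.
        for i in range(len(fragment)):
            if self.is_index_in_bounds(i):
                row, col = self.map_fragment_index(i)
                fragment[i] = gmem[row][col]


gmem = [[10, 11, 12], [20, 21, 22]]
for tid in range(4):  # threads are simulated one after another
    acc = ToyAccumulator(m=2, n=3, num_threads=4, thread_id=tid)
    frag = acc.make_empty_fragment()
    acc.partition_and_copy(gmem, frag)
    print(tid, frag)
# prints:
# 0 [10, 21]
# 1 [11, 22]
# 2 [12, 0]
# 3 [20, 0]
```

In this toy setup, threads 2 and 3 each hold one register slot that maps outside the 2 x 3 tile, which is the kind of situation a bounds check on the fragment index is meant to guard against.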
Methods
- get_results(
- out=None,
- is_index_in_bounds(index: int) -> bool
Checks if the given index is within the bounds of the partitioned tensor. This is used to prevent out-of-bounds access in the kernel.
- is_predicated() -> bool
Checks if the current thread is predicated. This is used to determine if the thread should execute the kernel.
- make_empty_fragment() -> OpaqueTensor
Creates an empty fragment tensor in register memory. The fragment layout is the same as the accumulator layout.
- make_partition_and_copy(
- src: OpaqueTensor,
Same as partition_and_copy, but returns the partitioned register memory (rmem) tensor.
- map_fragment_index(fragment_index: int) -> tuple[int, int]
Maps the given fragment index to the corresponding global memory index, returned as a tuple of two integers (matrix coordinates). This is used to access the correct element in the partitioned tensor.
- partition_and_copy(
- src: OpaqueTensor,
- dst: OpaqueTensor,
Partitions the global memory (gmem) tensor and copies it to the register memory (rmem) fragment.
- partition_and_store(
- tensor: OpaqueTensor,
- partition_like_C(
- gmem_c: OpaqueTensor,
Partitions the given global memory tensor gmem_c into a partitioned tensor. The partitioned tensor is used for accessing the C matrix when working with a register fragment.
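As a rough illustration of how a partitioned view of C can be used together with a register fragment, here is a pure-Python sketch of a GEMM-style epilogue, C = alpha * acc + beta * C. It does not call nvmath or cuBLASDx. The `ToyPartitionedC` class, the `toy_epilogue` function, and the cyclic element-to-thread assignment are hypothetical names and assumptions introduced only for this example.

```python
class ToyPartitionedC:
    """Per-thread view of a global C matrix, indexed by fragment index.

    Assumption (not the real layout): linear element index
    row * cols + col belongs to thread (row * cols + col) % num_threads.
    """

    def __init__(self, gmem_c, num_threads, thread_id):
        self.c = gmem_c
        self.rows = len(gmem_c)
        self.cols = len(gmem_c[0])
        self.num_threads = num_threads
        self.thread_id = thread_id

    def __len__(self):
        # Slots per thread, rounded up.
        total = self.rows * self.cols
        return (total + self.num_threads - 1) // self.num_threads

    def coords(self, i):
        # Fragment index -> (row, col) in the global C matrix.
        return divmod(i * self.num_threads + self.thread_id, self.cols)

    def in_bounds(self, i):
        return self.coords(i)[0] < self.rows

    def __getitem__(self, i):
        row, col = self.coords(i)
        return self.c[row][col]

    def __setitem__(self, i, value):
        row, col = self.coords(i)
        self.c[row][col] = value


def toy_epilogue(gmem_c, fragments, alpha, beta, num_threads):
    """Apply C = alpha * acc + beta * C through per-thread views of C."""
    for tid in range(num_threads):  # threads simulated sequentially
        part_c = ToyPartitionedC(gmem_c, num_threads, tid)
        frag = fragments[tid]
        for i in range(len(part_c)):
            if part_c.in_bounds(i):  # skip slots that fall outside C
                part_c[i] = alpha * frag[i] + beta * part_c[i]


# Each thread's fragment holds the linear index of the element it owns.
num_threads = 4
c = [[1, 1, 1], [1, 1, 1]]
fragments = [[i * num_threads + tid for i in range(2)] for tid in range(num_threads)]
toy_epilogue(c, fragments, alpha=2, beta=3, num_threads=num_threads)
print(c)  # prints [[3, 5, 7], [9, 11, 13]]
```

The point of the sketch is that the same fragment index addresses both the register fragment and the partitioned view of C, so the element-wise update needs no explicit row or column arithmetic in the loop body.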