DevicePipeline

class nvmath.device.DevicePipeline(mm: Matmul, pipeline_depth: int, a: ndarray, b: ndarray)

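DevicePipeline helps users choose a suitable kernel launch configuration for pipelined matrix multiplication, given a Matmul object mm, the operands a and b, and the number of pipeline stages pipeline_depth. Within a kernel, it is also the access point for obtaining a TilePipeline object.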

Refer to the cuBLASDx documentation for more details on how to use this class: https://docs.nvidia.com/cuda/cublasdx/using_pipelines.html
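The pipelining idea behind this class can be illustrated on the host. The sketch below is a conceptual NumPy model, not the nvmath API: it assumes the pipeline splits the K dimension into tiles and keeps up to pipeline_depth (A, B) tile pairs in a circular buffer, refilling a stage as soon as it has been consumed. On a GPU the loads would be asynchronous copies into shared memory that overlap with computation; here they are plain slices, so only the bookkeeping is modeled. The function `pipelined_matmul` and its `tile_k` parameter are hypothetical names used for illustration.

```python
import numpy as np


def pipelined_matmul(a, b, tile_k, pipeline_depth):
    """Conceptual host-side model of a K-dimension software pipeline.

    Computes a @ b by splitting K into tiles of width tile_k and keeping up to
    pipeline_depth (A, B) tile pairs staged in a circular buffer.
    This is NOT the nvmath API; it only models the stage bookkeeping.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b or k % tile_k != 0 or pipeline_depth < 1:
        raise ValueError("incompatible shapes, tile_k, or pipeline_depth")
    num_tiles = k // tile_k
    # One slot per pipeline stage; each slot holds an (A tile, B tile) pair.
    stages = [None] * pipeline_depth
    acc = np.zeros((m, n), dtype=np.result_type(a, b))

    def load(t):
        # Stand-in for an asynchronous global-to-shared-memory copy of K tile t.
        ks = slice(t * tile_k, (t + 1) * tile_k)
        stages[t % pipeline_depth] = (a[:, ks], b[ks, :])

    # Prologue: fill the pipeline with the first pipeline_depth tiles.
    for t in range(min(pipeline_depth, num_tiles)):
        load(t)

    # Steady state: consume tile t, then reuse its slot for tile t + pipeline_depth.
    for t in range(num_tiles):
        a_tile, b_tile = stages[t % pipeline_depth]
        acc += a_tile @ b_tile
        if t + pipeline_depth < num_tiles:
            load(t + pipeline_depth)
    return acc


# Usage: the result does not depend on the pipeline depth.
a = np.arange(24).reshape(4, 6)
b = np.arange(18).reshape(6, 3)
print(np.array_equal(pipelined_matmul(a, b, tile_k=2, pipeline_depth=2), a @ b))  # prints True
```

In this model the depth changes only how far ahead tiles are staged, not the result; on real hardware a deeper pipeline can hide more memory latency at the cost of more shared memory.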

Methods

__init__(mm: Matmul, pipeline_depth: int, a: ndarray, b: ndarray)
get_tile(smem: ndarray, blockIdx_x: int, blockIdx_y: int) → TilePipeline
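get_tile takes block indices, which suggests that each thread block works on one output tile of C. The sketch below assumes, hypothetically, that blockIdx_x indexes tiles along M and blockIdx_y indexes tiles along N, with tile sizes tile_m and tile_n taken from the Matmul block size; the cuBLASDx documentation is the authority on the actual mapping. The helper names `tile_grid` and `tile_bounds` are illustrative only, not part of nvmath.

```python
def ceil_div(x: int, y: int) -> int:
    """Integer ceiling of x / y for positive y."""
    return -(-x // y)


def tile_grid(m: int, n: int, tile_m: int, tile_n: int) -> tuple[int, int]:
    """Hypothetical grid shape: one block per (tile_m x tile_n) tile of C.

    Assumes grid x runs over M tiles and grid y over N tiles.
    """
    return ceil_div(m, tile_m), ceil_div(n, tile_n)


def tile_bounds(bx: int, by: int, m: int, n: int, tile_m: int, tile_n: int):
    """Row and column ranges of C assumed to be owned by block (bx, by).

    Edge blocks are clipped to the matrix size.
    """
    rows = range(bx * tile_m, min((bx + 1) * tile_m, m))
    cols = range(by * tile_n, min((by + 1) * tile_n, n))
    return rows, cols


# Usage: a 256 x 128 output split into 64 x 32 tiles needs a 4 x 4 grid.
print(tile_grid(256, 128, 64, 32))  # prints (4, 4)
```

The ceiling division lets the sketch cover sizes that are not multiples of the tile size; whether the real pipeline supports such partial edge tiles is not stated on this page.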
reset_tile(tile_pipeline: TilePipeline, idx: int | tuple[int, int], idy: int | tuple[int, int])

Attributes

a_strides
b_strides
block_dim
buffer_alignment
buffer_size
storage_alignment
storage_bytes
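Attributes such as block_dim, buffer_size, and buffer_alignment are presumably the values a kernel launch and its shared-memory declaration would need. As a rough, hypothetical cross-check, the sketch below estimates the shared memory a pipeline might require, assuming each of the pipeline_depth stages holds one tile_m x tile_k tile of A and one tile_k x tile_n tile of B, rounded up to a caller-supplied alignment. It ignores any padding, layout, or synchronization storage the library may add, so buffer_size, not this estimate, is the value to use. The function names are illustrative, not part of nvmath.

```python
def round_up(nbytes: int, alignment: int) -> int:
    """Round nbytes up to the next multiple of alignment."""
    return -(-nbytes // alignment) * alignment


def estimate_pipeline_smem_bytes(tile_m: int, tile_n: int, tile_k: int,
                                 pipeline_depth: int, a_itemsize: int,
                                 b_itemsize: int, alignment: int) -> int:
    """Hypothetical lower-bound estimate of a pipeline's shared-memory buffer.

    Assumes every stage stores one A tile (tile_m x tile_k) and one
    B tile (tile_k x tile_n); the real buffer may be larger.
    """
    stage_bytes = tile_m * tile_k * a_itemsize + tile_k * tile_n * b_itemsize
    return round_up(pipeline_depth * stage_bytes, alignment)


# Usage: 64 x 64 x 32 tiles of 2-byte elements, two stages, 128-byte alignment.
print(estimate_pipeline_smem_bytes(64, 64, 32, 2, 2, 2, 128))  # prints 16384
```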