DevicePipeline

class nvmath.device.DevicePipeline(mm: Matmul, pipeline_depth: int, a: ndarray, b: ndarray)

DevicePipeline helps users configure kernel launches for pipelined matrix multiplication. It is also the access point for obtaining a TilePipeline object inside a kernel.

Refer to the cuBLASDx documentation for more details on how to use this class: https://docs.nvidia.com/cuda/cublasdx/using_pipelines.html
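
As a conceptual illustration of pipelined matrix multiplication only, the following pure-Python sketch streams tiles of the K dimension through a buffer that holds at most pipeline_depth tiles. It does not use nvmath or cuBLASDx. The function pipelined_matmul, its tile_k parameter, and the load helper are hypothetical names introduced here, and the synchronous load merely stands in for what would be an asynchronous copy into shared memory on a GPU.

```python
from collections import deque


def pipelined_matmul(a, b, tile_k, pipeline_depth):
    """Compute a @ b for list-of-lists matrices, one K-tile at a time.

    Hypothetical helper for illustration; not part of nvmath.device.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    starts = list(range(0, k, tile_k))

    def load(k0):
        # Stand-in for an asynchronous global-to-shared-memory copy.
        k1 = min(k0 + tile_k, k)
        a_tile = [row[k0:k1] for row in a]
        b_tile = b[k0:k1]
        return a_tile, b_tile

    stages = deque()
    next_load = 0

    # Prologue: fill the pipeline with up to pipeline_depth tiles.
    while next_load < len(starts) and len(stages) < pipeline_depth:
        stages.append(load(starts[next_load]))
        next_load += 1

    # Steady state: consume the oldest tile, then refill the freed stage.
    while stages:
        a_tile, b_tile = stages.popleft()
        for i in range(m):
            for j in range(n):
                c[i][j] += sum(
                    a_tile[i][p] * b_tile[p][j] for p in range(len(b_tile))
                )
        if next_load < len(starts):
            stages.append(load(starts[next_load]))
            next_load += 1

    return c
```

The result does not depend on pipeline_depth. In this sketch the depth only bounds how many tiles are buffered ahead of the multiply step, which is the role the extra stages play when loads and compute overlap on real hardware.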

Methods

__init__(mm: Matmul, pipeline_depth: int, a: ndarray, b: ndarray)

get_tile(smem: ndarray, blockIdx_x: int, blockIdx_y: int) → TilePipeline

reset_tile(tile_pipeline: TilePipeline, idx: int | tuple[int, int], idy: int | tuple[int, int])

Attributes

a_strides

b_strides

block_dim

buffer_alignment

buffer_size

storage_alignment

storage_bytes