cuda.tile.TiledView
- class cuda.tile.TiledView
Class for tiled view objects.
- property dtype: DType
The data type of the elements in the tiled view.
- Return type:
DType (constant)
- property tile_shape: tuple[int, ...]
The shape of tiles produced by each indexed access.
- Return type:
tuple[const int,…]
- num_tiles(axis)
The number of tiles along a tiled view’s given axis.
- Parameters:
axis (const int) – The axis of the tile index space.
- Return type:
int32
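To make the tile index space concrete, the following plain-Python sketch models how a tile count could relate to the length of an axis. It assumes that a partial trailing tile counts as one tile (ceiling division), which this page implies but does not state. `model_num_tiles` is a hypothetical host-side helper, not part of cuda.tile.

```python
def model_num_tiles(shape, tile_shape, axis):
    """Hypothetical model of num_tiles; assumes a partial trailing tile is counted."""
    extent = shape[axis]     # number of elements along the axis
    tile = tile_shape[axis]  # tile size along the same axis
    return (extent + tile - 1) // tile  # ceiling division


print(model_num_tiles((8,), (4,), 0))   # 2
print(model_num_tiles((10,), (4,), 0))  # 3
```

Under this assumption, a 10-element axis tiled by 4 has three tiles, the last of which is only partially inside the view.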
- load(index, *, latency=None, allow_tma=None)
Loads a tile from the tiled view at the given tile index.
The returned tile has shape tile_shape.
For a tile that partially extends beyond the tiled view boundaries, out-of-bound elements are filled according to the view’s padding mode. If the tile lies entirely outside the tiled view, the behavior is undefined.
- Parameters:
index (tuple[int,...]) – An index in the tiled view’s tile space.
latency (const int) – A hint indicating how heavy DRAM traffic will be. It shall be an integer between 1 (low) and 10 (high). By default, the compiler will infer the latency.
allow_tma (const bool) – If False, the load will not use TMA. By default, TMA is allowed.
- Return type:
Tile
Examples
import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x):
    tv = x.tiled_view(4)
    tile = tv.load(0)
    print(tile)

x = torch.arange(8, device='cuda')
ct.launch(stream, (1,), kernel, (x,))
torch.cuda.synchronize()
Output
[0, 1, 2, 3]
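The boundary behavior described for load can be illustrated with a host-side model in plain Python. This is only a sketch of the documented semantics for a 1D view, not how cuda.tile implements it: `model_load` is a hypothetical helper, and `pad_value` stands in for the view’s padding mode, which is assumed here to be a constant fill.

```python
def model_load(data, tile_size, index, pad_value=0):
    """Hypothetical 1D model of TiledView.load on a Python list.

    pad_value is an assumption standing in for the view's padding mode.
    """
    start = index * tile_size
    if start >= len(data):
        # The real API leaves this case undefined; the model rejects it.
        raise IndexError("tile lies entirely outside the view")
    tile = data[start:start + tile_size]
    # Fill the out-of-bounds part of a partial tile with the padding value.
    return tile + [pad_value] * (tile_size - len(tile))


data = list(range(10))
print(model_load(data, 4, 0))  # [0, 1, 2, 3]
print(model_load(data, 4, 2))  # [8, 9, 0, 0]
```

The second call shows a partial tile: only two elements lie inside the view, and the remaining two are padded so the result still has the full tile shape.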
- store(index, tile, *, latency=None, allow_tma=None)
Stores a tile into the tiled view at the given tile index.
The tile’s shape must be broadcastable to tile_shape. If the tile’s dtype differs from the view’s dtype, an implicit cast is performed.
For a tile that partially extends beyond the tiled view boundaries, out-of-bound elements are ignored. If the tile lies entirely outside the tiled view, the behavior is undefined.
- Parameters:
index (tuple[int,...]) – An index in the tiled view’s tile space.
tile (Tile) – The tile to store.
latency (const int) – A hint indicating how heavy DRAM traffic will be. It shall be an integer between 1 (low) and 10 (high). By default, the compiler will infer the latency.
allow_tma (const bool) – If False, the store will not use TMA. By default, TMA is allowed.
Examples
import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x):
    tv = x.tiled_view(4)
    tile = ct.full((4,), 99, dtype=ct.int32)
    tv.store(0, tile)

x = torch.zeros(8, dtype=torch.int32, device='cuda')
ct.launch(stream, (1,), kernel, (x,))
print(x.tolist())
torch.cuda.synchronize()
Output
[99, 99, 99, 99, 0, 0, 0, 0]
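The three rules stated for store (broadcasting, implicit cast, and ignoring out-of-bounds elements) can be sketched with a host-side model in plain Python. `model_store` is a hypothetical helper for a 1D integer view; it supports only the simplest broadcast (a single-element tile) and uses `int()` as a stand-in for the cast to the view’s dtype.

```python
def model_store(data, tile_size, index, tile):
    """Hypothetical 1D model of TiledView.store; mutates data in place."""
    start = index * tile_size
    if start >= len(data):
        # The real API leaves this case undefined; the model rejects it.
        raise IndexError("tile lies entirely outside the view")
    if len(tile) == 1:
        tile = tile * tile_size  # broadcast a single element to the tile shape
    if len(tile) != tile_size:
        raise ValueError("tile is not broadcastable to the tile shape")
    for offset, value in enumerate(tile):
        pos = start + offset
        if pos < len(data):          # out-of-bounds elements are ignored
            data[pos] = int(value)   # stand-in for the implicit dtype cast


data = [0] * 10
model_store(data, 4, 0, [99.0])        # broadcast and cast
model_store(data, 4, 2, [7, 7, 7, 7])  # partial tile: only two elements land
print(data)  # [99, 99, 99, 99, 0, 0, 0, 0, 7, 7]
```

The second call writes a tile that extends past the end of the data, so only the first two of its elements are stored.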