cuda.tile.TiledView#

class cuda.tile.TiledView#

Class for tiled view objects.

property dtype: DType#

The data type of the elements in the tiled view.

Return type:

DType (constant)

property tile_shape: tuple[int, ...]#

The shape of tiles produced by each indexed access.

Return type:

tuple[const int, ...]

num_tiles(axis)#

The number of tiles along the given axis of the tiled view.

Parameters:

axis (const int) – The axis of the tile index space.

Return type:

int32

load(index, *, latency=None, allow_tma=None)#

Loads a tile from the tiled view at the given tile index.

The returned tile has shape tile_shape.

For a tile that partially extends beyond the tiled view boundaries, out-of-bound elements are filled according to the view’s padding mode. If the tile lies entirely outside the tiled view, the behavior is undefined.

Parameters:
  • index (tuple[int,...]) – An index in the tiled view’s tile space.

  • latency (const int) – A hint indicating how heavy the expected DRAM traffic is. Must be an integer between 1 (low) and 10 (high). By default, the compiler infers the latency.

  • allow_tma (const bool) – If False, the load will not use TMA. By default, TMA is allowed.

Return type:

Tile

Examples

import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x):
    tv = x.tiled_view(4)
    tile = tv.load(0)
    print(tile)

x = torch.arange(8, device='cuda')
ct.launch(stream, (1,), kernel, (x,))

torch.cuda.synchronize()

Output

[0, 1, 2, 3]

store(index, tile, *, latency=None, allow_tma=None)#

Stores a tile into the tiled view at the given tile index.

The tile’s shape must be broadcastable to tile_shape. If the tile’s dtype differs from the view’s dtype, an implicit cast is performed.

For a tile that partially extends beyond the tiled view boundaries, out-of-bound elements are ignored. If the tile lies entirely outside the tiled view, the behavior is undefined.

Parameters:
  • index (tuple[int,...]) – An index in the tiled view’s tile space.

  • tile (Tile) – The tile to store.

  • latency (const int) – A hint indicating how heavy the expected DRAM traffic is. Must be an integer between 1 (low) and 10 (high). By default, the compiler infers the latency.

  • allow_tma (const bool) – If False, the store will not use TMA. By default, TMA is allowed.

Examples

import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x):
    tv = x.tiled_view(4)
    tile = ct.full((4,), 99, dtype=ct.int32)
    tv.store(0, tile)

x = torch.zeros(8, dtype=torch.int32, device='cuda')
ct.launch(stream, (1,), kernel, (x,))
print(x.tolist())

torch.cuda.synchronize()

Output

[99, 99, 99, 99, 0, 0, 0, 0]