cuda.tile.extract

cuda.tile.extract(x, /, index, shape)

Extracts a smaller tile from the input tile.

Partitions the input tile into a grid of subtiles of the given shape and returns the subtile at position index in that grid. This is similar to load(), but operates on an existing tile rather than on an array.
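The partitioning described above can be modeled on the host with plain NumPy slicing. This is a sketch for illustration only: extract_reference is a hypothetical helper, not part of cuda.tile, and it skips the divisibility and index-range checks that the real function's contract requires.

```python
import numpy as np


def extract_reference(x, index, shape):
    """Hypothetical host-side model of extract semantics (not part of cuda.tile).

    Partitions `x` into a grid of subtiles of size `shape` and returns the
    subtile at grid position `index`. No divisibility or range checks are done.
    """
    if len(index) != x.ndim or len(shape) != x.ndim:
        raise ValueError("index and shape must have one entry per dimension of x")
    # Grid index i in a dimension with subtile size s covers elements [i * s, (i + 1) * s).
    slices = tuple(slice(i * s, (i + 1) * s) for i, s in zip(index, shape))
    return x[slices]


tile = np.arange(8, dtype=np.int32)
print(extract_reference(tile, (0,), (4,)).tolist())  # [0, 1, 2, 3]
print(extract_reference(tile, (1,), (4,)).tolist())  # [4, 5, 6, 7]
```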

Parameters:

x (Tile) – input tile.
index (Shape) – Index into the grid of subtiles, not an element index. Each dimension i has x.shape[i] // shape[i] subtiles, so valid values are in [0, x.shape[i] // shape[i]). For example, extracting shape (4,) from a (128,) tile gives 32 subtiles, so valid indices are 0–31.
shape (Shape) – The shape of the extracted tile. Each dimension must evenly divide the corresponding dimension of x.shape.
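The index and shape constraints above can be expressed as a small validation routine. subtile_grid and check_index are hypothetical helpers written for illustration; they mirror the stated rules but are not part of the cuda.tile API, and this sketch makes no claim about how cuda.tile itself reports a violation.

```python
def subtile_grid(x_shape, shape):
    """Hypothetical helper: number of subtiles per dimension, x_shape[i] // shape[i]."""
    if len(shape) != len(x_shape):
        raise ValueError("shape must have the same rank as x")
    grid = []
    for dim, (n, s) in enumerate(zip(x_shape, shape)):
        # Each subtile dimension must evenly divide the input dimension.
        if s <= 0 or n % s != 0:
            raise ValueError(f"shape[{dim}]={s} does not evenly divide x.shape[{dim}]={n}")
        grid.append(n // s)
    return tuple(grid)


def check_index(x_shape, index, shape):
    """Hypothetical helper: raise if index falls outside [0, x_shape[i] // shape[i])."""
    grid = subtile_grid(x_shape, shape)
    if len(index) != len(grid):
        raise ValueError("index must have the same rank as x")
    for dim, (i, g) in enumerate(zip(index, grid)):
        if not 0 <= i < g:
            raise IndexError(f"index[{dim}]={i} is outside [0, {g})")


print(subtile_grid((128,), (4,)))  # (32,)
check_index((128,), (31,), (4,))   # passes: 31 is the last valid subtile index
```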

Return type:

Tile

Examples

1D tile.

import cuda.tile as ct
import torch

@ct.kernel
def kernel():
    tile = ct.arange(8, dtype=ct.int32)
    sub = ct.extract(tile, (0,), shape=(4,))
    print(f'(0,): {sub}')
    sub = ct.extract(tile, (1,), shape=(4,))
    print(f'(1,): {sub}')


torch.cuda.init()
ct.launch(torch.cuda.current_stream(), (1,), kernel, ())
torch.cuda.synchronize()

Output

(0,): [0, 1, 2, 3]
(1,): [4, 5, 6, 7]

2D tile.

import cuda.tile as ct
import torch

@ct.kernel
def kernel():
    tile = ct.arange(16, dtype=ct.int32).reshape((4, 4))
    sub = ct.extract(tile, (0, 0), shape=(2, 2))
    print(f'(0, 0): {sub}')
    sub = ct.extract(tile, (0, 1), shape=(2, 2))
    print(f'(0, 1): {sub}')


torch.cuda.init()
ct.launch(torch.cuda.current_stream(), (1,), kernel, ())
torch.cuda.synchronize()

Output

(0, 0): [[0, 1], [4, 5]]
(0, 1): [[2, 3], [6, 7]]
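As a host-side cross-check of the 2D output (plain NumPy, assumed available; this does not execute cuda.tile), the 4x4 tile can be viewed as a 2x2 grid of 2x2 blocks with reshape and transpose, so that blocks[i, j] holds the same values that extracting grid index (i, j) is documented to return:

```python
import numpy as np

tile = np.arange(16, dtype=np.int32).reshape(4, 4)
sub_shape = (2, 2)
grid = (tile.shape[0] // sub_shape[0], tile.shape[1] // sub_shape[1])  # (2, 2)

# Split each axis into (grid index, offset within subtile), giving axes
# (grid_row, row_in_block, grid_col, col_in_block), then move both grid axes
# to the front so that blocks[i, j] is the subtile at grid index (i, j).
blocks = tile.reshape(grid[0], sub_shape[0], grid[1], sub_shape[1]).transpose(0, 2, 1, 3)

print(blocks[0, 0].tolist())  # [[0, 1], [4, 5]]
print(blocks[0, 1].tolist())  # [[2, 3], [6, 7]]
```

Every grid index appears exactly once in blocks, which reflects that the subtiles partition the input tile without overlap.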