cuda.tile.Array

class cuda.tile.Array

Class for global array objects.

property dtype: DType

The data type of the array’s elements.

Return type:

DType (constant)

property shape: tuple[int, ...]

The number of elements in each of the array’s dimensions.

Return type:

tuple[int32, ...]

property strides: tuple[int, ...]

The number of elements to step in each dimension while traversing the array.

Return type:

tuple[int32, ...]
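
As an illustration of how shape and strides relate, the following host-side sketch computes element strides for a contiguous row-major layout and uses them to locate an element. It is plain Python, not part of cuda.tile; the helper names contiguous_strides and element_offset are hypothetical, and the row-major assumption is ours (arrays passed to a kernel may have other strides).

```python
def contiguous_strides(shape):
    """Element strides of a contiguous row-major array (assumed layout)."""
    strides = []
    step = 1
    # Walk the axes from last to first: the last axis is densest in memory.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def element_offset(index, strides):
    """Offset, in elements, of the element at `index` from the array base."""
    return sum(i * s for i, s in zip(index, strides))


strides = contiguous_strides((4, 4))
print(strides)                          # (4, 1)
print(element_offset((2, 1), strides))  # 9
```

Under this assumption, stepping one position along axis 0 of a (4, 4) array skips 4 elements, while stepping along axis 1 skips 1.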

property ndim: int

The number of dimensions in the array.

Return type:

int (constant)

slice(axis, start, stop)

Creates a view of the array sliced along a single axis.

The returned array references the same underlying memory as this array, but is restricted to the range from index start (inclusive) to stop (exclusive) along the specified axis. No data is copied.

axis must be a constant integer. Negative values are supported and count from the last dimension (e.g., axis=-1 refers to the last axis).

start and stop must be integers (scalars or 0D tiles). They must satisfy 0 <= start < N and start <= stop <= N, where N is the size of this array along the sliced axis.
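
The constraints above can be restated as a small host-side check. This is a sketch in plain Python, not the library's implementation: the function name sliced_shape is hypothetical, and raising ValueError is our choice for illustration (how cuda.tile reacts to out-of-range bounds is not stated here).

```python
def sliced_shape(shape, axis, start, stop):
    """Shape of the view produced by slicing `shape` along `axis`."""
    shape = tuple(shape)
    ndim = len(shape)
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
    if axis < 0:
        axis += ndim  # negative axes count from the last dimension
    n = shape[axis]
    # The documented constraints: 0 <= start < N and start <= stop <= N.
    if not (0 <= start < n and start <= stop <= n):
        raise ValueError(f"invalid bounds [{start}, {stop}) for axis size {n}")
    return shape[:axis] + (stop - start,) + shape[axis + 1:]


print(sliced_shape((4, 4), axis=0, start=1, stop=3))   # (2, 4)
print(sliced_shape((4, 4), axis=-1, start=0, stop=2))  # (4, 2)
```

Note that start == stop is permitted by the constraints and yields an axis of size 0, while start == N is not.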

For example, consider a 2-dimensional array A of shape (M, N). Slicing along axis 0 (whose size is M) from start to stop produces an array of shape (stop - start, N):

import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x):
    sub = x.slice(axis=0, start=1, stop=3)
    print(ct.load(sub, (0, 0), shape=(2, 4)))

x = torch.arange(16, device='cuda').reshape(4, 4)
ct.launch(stream, (1,), kernel, (x,))

torch.cuda.synchronize()

Output

[[4, 5, 6, 7], [8, 9, 10, 11]]

Using NumPy slice notation for illustration, this is equivalent to:

sub = A[start:stop, :]  # NumPy notation for reference only
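
To see what "no data is copied" means in practice, here is a loose analogy using Python's built-in memoryview, which also produces views that share storage with the original buffer. It is only an analogy on a 1-dimensional byte buffer and does not use cuda.tile; the variable names are illustrative.

```python
backing = bytearray(range(16))   # 16 elements: 0, 1, ..., 15
view = memoryview(backing)

# Analogous to slicing axis 0 with start=4, stop=12: no bytes are copied.
sub = view[4:12]

backing[4] = 99   # a write to the original is visible through the view
sub[1] = 42       # a write through the view is visible in the original

print(len(sub))       # 8
print(list(sub[:4]))  # [99, 42, 6, 7]
print(backing[5])     # 42
```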

The slice bounds can be dynamic (runtime values):

import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x, offset, length):
    sub = x.slice(axis=0, start=offset, stop=offset+length)
    print(ct.load(sub, (0,), shape=(4,)))
    print(ct.load(sub, (1,), shape=(4,)))

x = torch.arange(16, device='cuda')
ct.launch(stream, (1,), kernel, (x, 8, 8))

torch.cuda.synchronize()

Output

[8, 9, 10, 11]
[12, 13, 14, 15]

tiled_view(tile_shape, *, padding_mode=PaddingMode.UNDETERMINED, traversal_steps=None)

Creates a tiled view of this array with a fixed tile_shape.

The resulting TiledView partitions this array into a grid of equally sized tiles.

Parameters:
  • tile_shape (tuple[const int, ...]) – The shape of each tile in the view. Must have the same rank as this array.

  • padding_mode (PaddingMode) – The value used to pad tiles that extend beyond the array boundaries. By default, the padding value is undetermined.

  • traversal_steps (tuple[const int, ...], optional) –

    Number of elements between consecutive tile origins along each axis. Must have the same rank as the array, or be None (default).

    • None or traversal_steps[i] == tile_shape[i]: tiles partition axis i with no overlap or gaps.

    • traversal_steps[i] < tile_shape[i]: tiles overlap along axis i.

    • traversal_steps[i] > tile_shape[i]: gaps between tiles along axis i.

    (Since CTK 13.3)

Return type:

TiledView

Examples

import cuda.tile as ct
import torch

torch.cuda.init()
stream = torch.cuda.current_stream()

@ct.kernel
def kernel(x):
    tv = x.tiled_view((2, 4))
    print(tv.load((0, 0)))
    print(tv.load((1, 0)))
    # traversal_steps=(1, 4): advance 1 row per step, tiles overlap
    tv2 = x.tiled_view((2, 4), traversal_steps=(1, 4))
    print(tv2.load((0, 0)))
    print(tv2.load((1, 0)))

x = torch.arange(16, device='cuda').reshape(4, 4)
ct.launch(stream, (1,), kernel, (x,))

torch.cuda.synchronize()

Output

[[0, 1, 2, 3], [4, 5, 6, 7]]
[[8, 9, 10, 11], [12, 13, 14, 15]]
[[0, 1, 2, 3], [4, 5, 6, 7]]
[[4, 5, 6, 7], [8, 9, 10, 11]]
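
The relationship between a tile index, tile_shape, and traversal_steps can be modeled on the host with ordinary Python lists. This sketch assumes that the tile at index i along an axis starts at element i * traversal_steps[axis], which is consistent with the output above; tile_bounds and load_tile are hypothetical helpers, not cuda.tile functions, and padding of tiles that extend past the array is not modeled.

```python
def tile_bounds(tile_index, tile_shape, traversal_steps=None):
    """Per-axis (start, stop) element ranges covered by one tile."""
    if traversal_steps is None:
        traversal_steps = tile_shape  # default: tiles partition each axis
    if not (len(tile_index) == len(tile_shape) == len(traversal_steps)):
        raise ValueError("tile_index, tile_shape and traversal_steps must have the same rank")
    bounds = []
    for i, size, step in zip(tile_index, tile_shape, traversal_steps):
        origin = i * step  # assumed origin of tile i along this axis
        bounds.append((origin, origin + size))
    return bounds


def load_tile(rows, bounds):
    """Extract a 2-D tile from a nested list (no padding handled)."""
    (r0, r1), (c0, c1) = bounds
    return [row[c0:c1] for row in rows[r0:r1]]


x = [[4 * r + c for c in range(4)] for r in range(4)]  # same values as the 4x4 example

print(load_tile(x, tile_bounds((1, 0), (2, 4))))          # [[8, 9, 10, 11], [12, 13, 14, 15]]
print(load_tile(x, tile_bounds((1, 0), (2, 4), (1, 4))))  # [[4, 5, 6, 7], [8, 9, 10, 11]]
```

With a step of 1 along axis 0 and a tile height of 2, consecutive tiles share one row, which is the overlap seen in the last two printed tiles.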

See also

Tiled Views