cuda.tile.pack_to_bytes

cuda.tile.pack_to_bytes(x, /)

Flattens a tile and reinterprets its raw bytes as uint8 elements.

The total number of bits of the input tile must be divisible by 8.

Parameters:

x (Tile) – input tile.

Returns:

a 1D uint8 tile with total_elements * bitwidth // 8 elements, where total_elements is the number of elements in x and bitwidth is the bit width of its dtype.

Return type:

Tile
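
The length rule above can be checked on the host with plain Python. This is a minimal sketch, not part of cuda.tile: packed_length is a hypothetical helper introduced here for illustration, and it takes the dtype's bit width as a plain integer.

```python
from math import prod


def packed_length(shape, bitwidth):
    """Hypothetical helper (not a cuda.tile API): expected uint8 element count."""
    # Total bits = number of elements * bits per element.
    total_bits = prod(shape) * bitwidth
    # Mirrors the documented constraint: total bits must be divisible by 8.
    if total_bits % 8 != 0:
        raise ValueError(f"{total_bits} bits is not a whole number of bytes")
    return total_bits // 8


print(packed_length((1,), 32))    # one 32-bit element -> 4
print(packed_length((4, 8), 16))  # 32 16-bit elements -> 64
```

A shape and bit width whose product is not a multiple of 8, such as packed_length((3,), 4), raises ValueError in this sketch.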

Examples

import cuda.tile as ct
import torch

@ct.kernel
def kernel():
    tx = ct.full(1, 0x04030201, dtype=ct.int32)
    ty = ct.pack_to_bytes(tx)
    print(ty)


torch.cuda.init()
ct.launch(torch.cuda.current_stream(), (1,), kernel, ())
torch.cuda.synchronize()

Output

[1, 2, 3, 4]
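
The output lists the least-significant byte of 0x04030201 first, which is consistent with a little-endian byte layout. As a host-side analogy only, the following sketch reproduces the same bytes with the standard-library struct module. It does not call cuda.tile, and the explicit "<" (little-endian) format is an assumption chosen to match the output shown above.

```python
import struct

# Pack one 32-bit signed integer in explicit little-endian order.
raw = struct.pack("<i", 0x04030201)

# Reinterpret the raw bytes as a list of unsigned 8-bit values.
as_bytes = list(raw)
print(as_bytes)  # [1, 2, 3, 4]

# Two 32-bit elements flatten into 2 * 32 // 8 = 8 bytes.
raw2 = struct.pack("<2i", 0x04030201, 0x08070605)
print(list(raw2))  # [1, 2, 3, 4, 5, 6, 7, 8]
```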