cuda.tile.static_iter

cuda.tile.static_iter(iterable)

Iterates at compile time.

Can only be used as the iterable of a for loop:

for ... in ct.static_iter(...):
    ...

The expression passed to static_iter() is evaluated using the same rules as static_eval(): it can reference global and local variables and use the full Python syntax, but it must not perform any run-time operations.

The expression must return a Python iterable whose length does not exceed a predefined maximum number of iterations (currently 1000). Before any further processing is done, the contents of the iterable are saved to a temporary list, and each item is checked for validity as if it were the result of a static_eval() expression (i.e., it must be a supported compile-time constant value or a proxy object for a dynamic value such as a tile).
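The save-then-check step described above can be modeled in plain Python. This is only a sketch of the described behavior, not the library's implementation: the names `MAX_STATIC_ITERATIONS`, `ALLOWED_CONSTANT_TYPES`, and `materialize_static_iterable` are hypothetical, and the set of accepted types is an assumption standing in for whatever static_eval() actually supports.

```python
import itertools
from typing import Any, Iterable, List

# Limit quoted in the text above; the constant name is hypothetical.
MAX_STATIC_ITERATIONS = 1000

# Assumed stand-in for "supported compile-time constant" types.
ALLOWED_CONSTANT_TYPES = (int, float, bool, type(None), tuple)


def materialize_static_iterable(iterable: Iterable[Any]) -> List[Any]:
    """Copy the iterable into a list, then enforce the length and item checks."""
    # Take at most one item past the limit so an unbounded iterable cannot hang.
    items = list(itertools.islice(iterable, MAX_STATIC_ITERATIONS + 1))
    if len(items) > MAX_STATIC_ITERATIONS:
        raise ValueError(
            f"iterable yields more than {MAX_STATIC_ITERATIONS} items"
        )
    # Every saved item is validated before any loop body would be processed.
    for position, item in enumerate(items):
        if not isinstance(item, ALLOWED_CONSTANT_TYPES):
            raise TypeError(
                f"item {position} has unsupported type {type(item).__name__}"
            )
    return items
```

Because the contents are copied to a list first, a generator is consumed exactly once, and later changes to the original iterable would not affect the iterations in this model.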

Finally, for each item of the iterable, the loop body is inlined, with the induction variable(s) bound to the item. The break, continue, and return statements are not allowed inside a static_iter loop.
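As a rough illustration of the inlining described above, the following plain-Python sketch compares a loop with a hand-unrolled version in which the body is repeated once per item and the induction variables are rebound before each copy. It does not use cuda.tile at all; the variable names and the loop body are made up for the example, and the real compiler may bind the variables differently.

```python
# Items the loop would iterate over: (index, weight) pairs.
weights = (10, 20, 30)

# Loop form: the shape the source code has when written with a for loop.
looped = ()
for i, weight in enumerate(weights):
    looped += (i * weight,)

# Hand-unrolled form: one copy of the body per item, with both
# induction variables bound to that item's values first.
unrolled = ()
i, weight = 0, 10
unrolled += (i * weight,)
i, weight = 1, 20
unrolled += (i * weight,)
i, weight = 2, 30
unrolled += (i * weight,)

print(looped)    # (0, 20, 60)
print(unrolled)  # (0, 20, 60)
```

Since the unrolled form contains no loop, there is nothing for break or continue to act on, which is consistent with those statements being disallowed.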

tile = ct.zeros(4, dtype=ct.int32)
size = 4

states = ()
for i in ct.static_iter(range(size)):
    states += (tile + i,)
print(states)

new_states = ()
for i in ct.static_iter(range(size)):
    new_states += (states[(i + 1) % size] + states[i], )
print(new_states)

Complete example:

import cuda.tile as ct
import torch

@ct.kernel
def kernel():
    tile = ct.zeros(4, dtype=ct.int32)
    size = 4

    states = ()
    for i in ct.static_iter(range(size)):
        states += (tile + i,)
    print(states)

    new_states = ()
    for i in ct.static_iter(range(size)):
        new_states += (states[(i + 1) % size] + states[i], )
    print(new_states)


torch.cuda.init()
ct.launch(torch.cuda.current_stream(), (1,), kernel, ())
torch.cuda.synchronize()

Output

([0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3])
([1, 1, 1, 1], [3, 3, 3, 3], [5, 5, 5, 5], [3, 3, 3, 3])
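The arithmetic behind this output can be checked without a GPU. The sketch below replaces tiles with plain Python lists and uses a hypothetical helper `add` for elementwise addition; it only mirrors the values in the example and says nothing about how cuda.tile represents tiles.

```python
size = 4
tile = [0, 0, 0, 0]  # stand-in for ct.zeros(4, dtype=ct.int32)


def add(a, b):
    """Elementwise addition of a list with a scalar or with another list."""
    if isinstance(b, int):
        return [x + b for x in a]
    return [x + y for x, y in zip(a, b)]


# First loop: states[i] is the zero tile plus i.
states = tuple(add(tile, i) for i in range(size))

# Second loop: each entry adds its right-hand neighbor, wrapping around at the end.
new_states = tuple(add(states[(i + 1) % size], states[i]) for i in range(size))

print(states)      # ([0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3])
print(new_states)  # ([1, 1, 1, 1], [3, 3, 3, 3], [5, 5, 5, 5], [3, 3, 3, 3])
```

The last entry is 3 rather than 7 because the index (i + 1) % size wraps to 0 when i is 3, so the final sum is states[0] + states[3].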