cuda.tile.static_iter
- cuda.tile.static_iter(iterable)
Iterates at compile time.
Can only be used as the iterable of a for loop:
for ... in ct.static_iter(...): ...
The enclosed expression is evaluated using the same rules as static_eval(): it can reference global and local variables and use the full Python syntax, but it must not perform any run-time operations.
The expression must return a Python iterable whose length does not exceed a pre-defined maximum number of iterations (currently 1000). Before any further processing is done, the contents of the iterable are saved to a temporary list, and each item is checked for validity as if it were the result of a static_eval() expression (i.e., it must be a supported compile-time constant value or a proxy object for a dynamic value such as a tile).
Finally, the loop body is inlined once for each item of the iterable, with the induction variable(s) bound to that item. The break, continue, and return statements are not allowed inside a static_iter loop.
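The steps described above (snapshot the iterable, validate each item, inline the body once per item) can be modeled on the host in plain Python. This is only a sketch of the described semantics, not the cuda.tile implementation: the names `unroll`, `is_valid_item`, and `MAX_STATIC_ITERATIONS` are hypothetical, the set of accepted constant types is an assumption, and proxy objects for dynamic values such as tiles are not modeled.

```python
from itertools import islice

# Limit quoted in the text above; the constant name is hypothetical.
MAX_STATIC_ITERATIONS = 1000

# Assumption: a small stand-in for "supported compile-time constant" types.
# The real set of accepted values is defined by static_eval(), not by this tuple.
_CONSTANT_TYPES = (bool, int, float, str, type(None))


def is_valid_item(item):
    """Hypothetical validity check: a constant, or a tuple of valid items."""
    if isinstance(item, tuple):
        return all(is_valid_item(x) for x in item)
    return isinstance(item, _CONSTANT_TYPES)


def unroll(iterable, body):
    """Host-side model of a static_iter loop; not the cuda.tile implementation."""
    # Step 1: snapshot the iterable into a temporary list, reading at most one
    # item past the limit so an oversized (or endless) iterable is rejected.
    items = list(islice(iterable, MAX_STATIC_ITERATIONS + 1))
    if len(items) > MAX_STATIC_ITERATIONS:
        raise ValueError(f"more than {MAX_STATIC_ITERATIONS} iterations")
    # Step 2: validate every item before any copy of the body runs.
    for item in items:
        if not is_valid_item(item):
            raise TypeError(f"unsupported item: {item!r}")
    # Step 3: "inline" the body once per item, binding the induction variable.
    for item in items:
        body(item)


seen = []
unroll(range(4), seen.append)
print(seen)  # [0, 1, 2, 3]
```

Because the items are copied into a list first, changes made to the source iterable by the loop body do not affect how many times the body is expanded in this model.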
tile = ct.zeros(4, dtype=ct.int32)
size = 4
states = ()
for i in ct.static_iter(range(size)):
    states += (tile + i,)
print(states)
new_states = ()
for i in ct.static_iter(range(size)):
    new_states += (states[(i + 1) % size] + states[i],)
print(new_states)
import cuda.tile as ct
import torch

@ct.kernel
def kernel():
    tile = ct.zeros(4, dtype=ct.int32)
    size = 4
    states = ()
    for i in ct.static_iter(range(size)):
        states += (tile + i,)
    print(states)
    new_states = ()
    for i in ct.static_iter(range(size)):
        new_states += (states[(i + 1) % size] + states[i],)
    print(new_states)

torch.cuda.init()
ct.launch(torch.cuda.current_stream(), (1,), kernel, ())
torch.cuda.synchronize()
Output
([0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3])
([1, 1, 1, 1], [3, 3, 3, 3], [5, 5, 5, 5], [3, 3, 3, 3])
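The arithmetic behind this output can be cross-checked without a GPU. The sketch below is a plain-Python stand-in, not cuda.tile code: a list of four integers plays the role of the tile, and the hypothetical helper `add` imitates elementwise addition with scalar broadcasting.

```python
size = 4
tile = [0] * 4  # stands in for ct.zeros(4, dtype=ct.int32)


def add(a, b):
    """Elementwise a + b, where b is either a list of equal length or a scalar."""
    b_values = b if isinstance(b, list) else [b] * len(a)
    return [x + y for x, y in zip(a, b_values)]


# First loop: states[i] = tile + i
states = ()
for i in range(size):
    states += (add(tile, i),)
print(states)

# Second loop: each entry is the sum of a state and its cyclic successor.
new_states = ()
for i in range(size):
    new_states += (add(states[(i + 1) % size], states[i]),)
print(new_states)
```

The last entry of the second tuple is 3 rather than 7 because the index `(i + 1) % size` wraps around to `states[0]` when `i` is 3.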