Control Flow

Overview

CuTe DSL walks the Python AST and converts each control-flow construct it finds into structured intermediate representation (IR). You can therefore write ordinary Python loops and branches, and the compiler decides, statement by statement, whether to

  • evaluate at compile time if the controlling value is a Constexpr (compile-time Python value), or

  • emit IR when the value is dynamic.
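The per-statement decision can be pictured with a small pure-Python toy. This is not CuTe DSL code: Dynamic and lower_if are hypothetical names, and the "IR" is only a list of strings. A plain Python value stands in for a Constexpr and selects a branch immediately, while a Dynamic value cannot be evaluated yet, so both branches are recorded inside a conditional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dynamic:
    """Hypothetical stand-in for a value that is only known at run time."""
    name: str


def lower_if(pred, then_op, else_op, ir):
    """Toy model of how one if/else statement is handled.

    A plain Python predicate (playing the role of a Constexpr) is evaluated
    immediately and only the chosen branch is returned; nothing is added to ir.
    A Dynamic predicate cannot be evaluated yet, so both branches are kept
    inside a conditional appended to ir.
    """
    if isinstance(pred, Dynamic):
        ir.append(f"if {pred.name} {{ {then_op} }} else {{ {else_op} }}")
        return None
    return then_op if pred else else_op


ir = []
chosen = lower_if(True, "print_a", "print_b", ir)     # decided now
lower_if(Dynamic("flag"), "print_a", "print_b", ir)  # deferred to IR
print(chosen)  # print_a
print(ir)      # ['if flag { print_a } else { print_b }']
```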

For a high-level discussion of the overall pipeline, see the code-generation overview.

For Loops

CuTe DSL recognises three kinds of ranges for for loops:

  • range – the Python built-in

  • cutlass.range_dynamic – always lowers to IR

  • cutlass.range_constexpr – always unrolls at compile time

range(…)

The AST rewriter inserts a small helper stub. When the function is traced (that is, when its Python code runs to generate IR), the loop bounds are inspected:

  • Constant bounds → the loop is unrolled at compile time.

  • Dynamic bounds → the loop is emitted as structured IR.
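A toy model of this range(...) dispatch, written in plain Python rather than CuTe DSL. The helper trace_range, the Dynamic marker, and the string-based "IR" are hypothetical illustrations, not the stub the AST rewriter actually inserts. Integer bounds make the body run once per index (unrolling), while any dynamic bound makes the body be traced once, with a symbolic index, inside a single loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dynamic:
    """Hypothetical stand-in for a value that is only known at run time."""
    name: str


def _label(value):
    return value.name if isinstance(value, Dynamic) else value


def trace_range(start, stop, body, ir):
    """Unroll for constant bounds; emit a single loop for dynamic bounds."""
    if isinstance(start, int) and isinstance(stop, int):
        for i in range(start, stop):   # unrolled: body runs once per index
            body(i, ir)
    else:
        ir.append(f"for i in [{_label(start)}, {_label(stop)}) {{")
        body(Dynamic("i"), ir)         # traced once with a symbolic index
        ir.append("}")


def print_body(i, ir):
    # The single "operation" recorded for each traced iteration.
    ir.append(f"  printf({_label(i)})")


unrolled, looped = [], []
trace_range(0, 3, print_body, unrolled)
trace_range(0, Dynamic("n"), print_body, looped)
print(unrolled)  # ['  printf(0)', '  printf(1)', '  printf(2)']
print(looped)    # ['for i in [0, n) {', '  printf(i)', '}']
```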

cutlass.range_dynamic(…)

Use this when you always want a loop in the generated IR, even if the bounds are constant.

cutlass.range_constexpr(…)

Runs in the Python interpreter and is fully unrolled before code generation. The loop bounds must be Constexpr (compile-time Python) values, so each loop index is an ordinary Python integer.
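The contracts of the two explicit variants can be summarised with another plain-Python toy. The names toy_range_constexpr and toy_range_dynamic are hypothetical and only mimic the behaviour described above: the constexpr form rejects any bound that is not a compile-time value and yields ordinary integers, while the dynamic form always produces one loop, even for constant bounds such as 0 and 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dynamic:
    """Hypothetical stand-in for a value that is only known at run time."""
    name: str


def toy_range_constexpr(start, stop):
    """Mimics range_constexpr: every bound must be a compile-time value."""
    for bound in (start, stop):
        if isinstance(bound, Dynamic):
            raise TypeError(f"bound {bound.name!r} is not a compile-time value")
    return range(start, stop)  # iterated in Python, i.e. fully unrolled


def toy_range_dynamic(start, stop):
    """Mimics range_dynamic: always one loop op, even for constant bounds."""
    def label(b):
        return b.name if isinstance(b, Dynamic) else b
    return [f"for i in [{label(start)}, {label(stop)}) {{", "  <body traced once>", "}"]


print(list(toy_range_constexpr(0, 3)))  # [0, 1, 2]
print(toy_range_dynamic(0, 3)[0])       # for i in [0, 3) {
try:
    toy_range_constexpr(0, Dynamic("n"))
except TypeError as err:
    print(err)                           # bound 'n' is not a compile-time value
```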

Limitations of Dynamic For Loops

  • break, continue, and raising an exception inside the loop body are not yet supported.

  • Operations in the loop body are traced only when tracing is active in that region.

Example:

import cutlass
import cutlass.cute as cute


@cute.jit
def loop_example():
    n = 10

    # ❌ This loop is dynamic, so early exit isn't allowed.
    for i in cutlass.range_dynamic(n):
        if i == 5:
            break         # Early exit: not supported in a dynamic loop
        cute.printf("%d\n", i)

    # ✅ This loop is constexpr, so early exit is allowed.
    for i in cutlass.range_constexpr(n):
        if i == 5:
            break         # Early exit: fine, the loop is unrolled in Python
        cute.printf("%d\n", i)
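Why a dynamic loop cannot honour a data-dependent break can be seen in a plain-Python toy (not CuTe DSL; Symbol and the two functions below are hypothetical). When the body is traced once with a symbolic index, i == 5 produces another symbolic value rather than a Python bool, so Python's if has nothing concrete to test. In the unrolled case i is an ordinary int, and break works as usual. CuTe DSL detects and reports this situation through its own machinery; the TypeError here belongs only to the toy.

```python
class Symbol:
    """Hypothetical symbolic value, like a loop index in a dynamic loop."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Comparing a symbol yields another symbol, not True/False.
        return Symbol(f"({self.expr} == {other!r})")

    def __bool__(self):
        # An `if` needs a concrete bool, which a traced value does not have yet.
        raise TypeError(f"cannot branch on traced value {self.expr}")


def unrolled_with_break(n):
    """Constexpr case: i is a plain int, so `i == 5` and `break` just work."""
    emitted = []
    for i in range(n):
        if i == 5:
            break
        emitted.append(f"printf({i})")
    return emitted


def traced_body(i):
    """Dynamic case: the body runs once with a symbolic i."""
    if i == 5:          # Symbol.__bool__ raises here
        return "break"
    return f"printf({i.expr})"


print(unrolled_with_break(10))  # ['printf(0)', 'printf(1)', 'printf(2)', 'printf(3)', 'printf(4)']
try:
    traced_body(Symbol("i"))
except TypeError as err:
    print(err)                  # cannot branch on traced value (i == 5)
```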

If-Else Statements

Standard Python if/else is supported.

  • Predicate is Constexpr → evaluated at compile time.

  • Predicate is dynamic → lowered to IR.

Example:

import cutlass
import cutlass.cute as cute


@cute.jit
def main(const_var: cutlass.Constexpr, dynamic_var: cutlass.Int32):
    if const_var:                         # compile-time branch
        cute.printf("Const branch\n")
    else:
        cute.printf("Const else\n")

    if dynamic_var == 10:                 # dynamic branch
        cute.printf("Dynamic True\n")
    else:
        cute.printf("Dynamic False\n")

As with for loops, if cutlass.const_expr(...) and if cutlass.dynamic_expr(...) can be used to force compile-time evaluation or IR generation, respectively. Unstructured control flow is supported only with if cutlass.const_expr(...).
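The two forcing wrappers can be modelled the same way. In this plain-Python toy, toy_const_expr and toy_dynamic_expr are hypothetical stand-ins for cutlass.const_expr and cutlass.dynamic_expr, and the "IR" is a list of strings. In the toy, forcing compile-time evaluation of a dynamic predicate raises an error, while forcing IR generation keeps a conditional even when the predicate is already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dynamic:
    """Hypothetical stand-in for a value that is only known at run time."""
    name: str


def toy_const_expr(pred, then_op, else_op):
    """Force compile-time evaluation: a dynamic predicate is rejected."""
    if isinstance(pred, Dynamic):
        raise TypeError(f"{pred.name!r} is not a compile-time value")
    return [then_op if pred else else_op]


def toy_dynamic_expr(pred, then_op, else_op):
    """Force IR: emit a conditional even if the predicate is a constant."""
    cond = pred.name if isinstance(pred, Dynamic) else repr(bool(pred))
    return [f"if {cond} {{ {then_op} }} else {{ {else_op} }}"]


print(toy_const_expr(True, "relu", "noop"))    # ['relu']
print(toy_dynamic_expr(True, "relu", "noop"))  # ['if True { relu } else { noop }']
```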

While Loops

Python while loops are always treated as dynamic, because the loop condition may become dynamic after the first iteration. As with for loops and if/else, while cutlass.const_expr(...) and while cutlass.dynamic_expr(...) are available to force compile-time evaluation or IR generation, respectively.
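The reason while loops default to dynamic can be checked with a tiny plain-Python toy. Dyn is a hypothetical symbolic value (not a CuTe DSL type) and step stands for a hypothetical runtime argument. The condition x < 8 is an ordinary bool on entry, but after one iteration of x = x + step it has become symbolic, so checking the condition once before the loop is not enough to decide how to lower it.

```python
class Dyn:
    """Hypothetical symbolic value produced while tracing."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Dyn(f"({self.expr} + {other})")

    def __radd__(self, other):          # handles `int + Dyn`
        return Dyn(f"({other} + {self.expr})")

    def __lt__(self, other):
        return Dyn(f"({self.expr} < {other})")


step = Dyn("step")   # runtime kernel argument (hypothetical)
x = 0                # loop-carried variable, constant on entry

cond_entry = x < 8   # plain Python bool
x = x + step         # body of one iteration
cond_next = x < 8    # now a symbolic value

print(cond_entry)      # True
print(cond_next.expr)  # ((0 + step) < 8)
```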

Compile-Time Metaprogramming

Mix compile-time constructs with normal CuTe DSL code to generate specialised kernels without runtime overhead. A compile-time flag can, for example, toggle an optional ReLU epilogue:

import cutlass
import cutlass.cute as cute


@cute.kernel
def gemm(..., do_relu: cutlass.Constexpr):
    # main GEMM work
    ...
    if cutlass.const_expr(do_relu):    # compile-time guard
        # ReLU code is emitted only when do_relu is True
        ...

gemm(..., False)   # ReLU is omitted from the generated IR
gemm(..., True)    # ReLU is included
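The effect of the compile-time guard can be imitated in plain Python. generate_gemm_ops is a hypothetical "generator" whose output is just a list of op names (also hypothetical placeholders). do_relu plays the role of the Constexpr flag: it is consumed while the op list is built and leaves no branch behind, so each flag value yields its own specialised op list.

```python
def generate_gemm_ops(do_relu: bool) -> list[str]:
    """Toy specialisation: the flag is read at generation time, like a Constexpr."""
    ops = ["load_tiles", "mma", "accumulate"]   # main GEMM work (placeholder names)
    if do_relu:                                 # decided now, never emitted as a branch
        ops.append("relu")
    ops.append("store")
    return ops


plain = generate_gemm_ops(False)
fused = generate_gemm_ops(True)
print(plain)  # ['load_tiles', 'mma', 'accumulate', 'store']
print(fused)  # ['load_tiles', 'mma', 'accumulate', 'relu', 'store']
```

Neither list contains a conditional: the choice was made entirely while generating, which is the sense in which the specialised kernel carries no runtime overhead for the flag.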