Types#
Overview#
CuTe DSL provides a set of core types that form the foundation of tensor layout algebra and GPU programming. These types enable precise control over memory layout, data representation, and tensor operations. This document covers the key types available in cutlass.cute.core.
Core Numeric Types#
IntValue#
IntValue is an internal representation of constrained integer types with divisibility information. It serves as a proxy for constrained integer types in the CuTe IR, automatically tracking divisibility constraints that are crucial for layout operations.
Key Features:
Inherits from ArithValue with extensions for divisibility tracking
Automatically emits cute.get_scalars operations in the IR
Supports arithmetic operations that propagate divisibility information
Used internally for type-safe integer operations in layout algebra
API Methods:
get_typed_value() - Returns the value as an IntTupleType
get_divisibility() - Returns the divisibility constraint of the value
divisibility - Property that returns the divisibility constraint
Supported Operations:
The IntValue type supports standard arithmetic operations with divisibility tracking:
# Addition, subtraction, multiplication, division, and modulo
result = int_val1 + int_val2
result = int_val1 - int_val2
result = int_val1 * int_val2
result = int_val1 // int_val2
result = int_val1 % int_val2
String Representation:
# IntValue with divisibility 1
str(int_val) # Returns "?"
# IntValue with divisibility 4
str(int_val) # Returns "?{div=4}"
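As a rough illustration of what divisibility tracking means, the sketch below models it in plain Python. The class name DivInt and its propagation rules are assumptions made for this example (they follow from ordinary number theory), not the actual IntValue implementation, which operates on IR values.

```python
from math import gcd


class DivInt:
    """Hypothetical stand-in for a dynamic integer with a known divisor."""

    def __init__(self, divisibility=1):
        self.divisibility = divisibility

    def __add__(self, other):
        # A multiple of d1 plus a multiple of d2 is a multiple of gcd(d1, d2).
        return DivInt(gcd(self.divisibility, other.divisibility))

    __sub__ = __add__  # the same reasoning applies to subtraction

    def __mul__(self, other):
        # A multiple of d1 times a multiple of d2 is a multiple of d1 * d2.
        return DivInt(self.divisibility * other.divisibility)

    def __str__(self):
        # Mirrors the "?" / "?{div=N}" format shown above.
        d = self.divisibility
        return "?" if d == 1 else f"?{{div={d}}}"


a = DivInt(8)
b = DivInt(12)
print(a + b)     # ?{div=4}
print(a * b)     # ?{div=96}
print(DivInt())  # ?
```

The value itself stays unknown ("?"); only the guaranteed divisor is carried through each operation.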
Ratio#
Ratio represents a rational number as a ratio of two integers. It is used in CuTe to represent exact fractional values that arise in tensor layout operations, particularly in composition operations where divisibility conditions may not be satisfied.
Constructor:
ratio = cute.Ratio(numerator, denominator)
- param numerator:
The numerator of the ratio
- type numerator:
int
- param denominator:
The denominator of the ratio
- type denominator:
int
- raises TypeError:
If numerator or denominator are not integers
Methods:
is_integral() - Returns True if the ratio represents an integer value (numerator divisible by denominator)
reduced() - Returns a new Ratio with numerator and denominator reduced to lowest terms
to(dtype) - Converts the ratio to another type (Ratio, float, or int)
Arithmetic Operations:
# Multiplication with another ratio
ratio1 = cute.Ratio(1, 2)
ratio2 = cute.Ratio(3, 4)
result = ratio1 * ratio2 # Returns Ratio(3, 8)
# Multiplication with integer
ratio = cute.Ratio(2, 3)
result = ratio * 5 # Returns Ratio(10, 3)
result = 5 * ratio # Returns Ratio(10, 3)
Type Conversion:
ratio = cute.Ratio(3, 2)
# Convert to float
float_val = ratio.to(float) # Returns 1.5
# Convert to int (floor division)
int_val = ratio.to(int) # Returns 1
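The arithmetic above can be checked without the DSL by using Python's standard fractions module as a stand-in. This is only an analogy: Fraction always stores its value in lowest terms, whereas Ratio exposes reduced() as an explicit step, and the helper is_integral below is a hypothetical free function written for this sketch.

```python
from fractions import Fraction
from math import floor


def is_integral(numerator, denominator):
    # True when the numerator is divisible by the denominator.
    return numerator % denominator == 0


product = Fraction(1, 2) * Fraction(3, 4)  # 3/8
scaled = Fraction(2, 3) * 5                # 10/3

three_halves = Fraction(3, 2)
as_float = float(three_halves)  # 1.5
as_int = floor(three_halves)    # 1 (floor division, as in ratio.to(int))

print(product, scaled, as_float, as_int)
print(is_integral(6, 3), is_integral(3, 2))  # True False
```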
Layout Algebra Types#
ScaledBasis#
ScaledBasis represents a scaled basis element in CuTe’s layout algebra. It consists of a scale value and a mode that identifies which basis element in the layout algebra is being referenced. ScaledBasis elements are fundamental to CuTe’s coordinate system representation.
Constructor:
sb = cute.ScaledBasis(value, mode)
- param value:
The scale value
- type value:
Union[int, Integer, Ratio, ir.Value]
- param mode:
The mode identifying the basis element
- type mode:
Union[int, List[int]]
- raises TypeError:
If mode is not an integer or list of integers
Examples:
# Create a scaled basis with integer scale and mode
sb1 = cute.ScaledBasis(2, 0) # 2 * E(0)
# Create a scaled basis with a Ratio scale
sb2 = cute.ScaledBasis(cute.Ratio(1, 2), 1) # (1/2) * E(1)
# Create a scaled basis with a list of modes
sb3 = cute.ScaledBasis(4, [0, 1]) # 4 * E([0, 1])
# Scaled basis elements are commonly used in layout strides
layout = cute.make_layout((4, 8), stride=(cute.ScaledBasis(2, 0), cute.ScaledBasis(1, 1)))
# This creates a layout with strides (2@0, 1@1) representing
# a coordinate system where each dimension has its own basis
# Example: Mapping coordinates to indices using the layout
coord = (2, 3)
idx = cute.crd2idx(coord, layout) # Maps (2, 3) to (4, 3)
Properties:
value - Get the scale value
mode - Get the mode as a list of integers
is_static() - Returns True if the value is statically known
Methods:
to(dtype)- Convert to another type (ScaledBasis or internal _ScaledBasis)
Operations:
# Reflected multiplication: the scale factor appears on the left of the ScaledBasis
sb = cute.ScaledBasis(2, 0)
result = 3 * sb # Creates ScaledBasis(6, 0)
Utility Function:
# Create a basis element with unit scale
basis = cute.E(mode) # Equivalent to ScaledBasis(1, mode)
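To see why the earlier example maps the coordinate (2, 3) to (4, 3), the following plain-Python sketch models each stride as a (scale, mode) pair and accumulates the result per mode. The function crd2idx_basis is a hypothetical name for this illustration, and only integer modes are modeled (list-valued modes such as [0, 1] are not).

```python
def crd2idx_basis(coord, strides, rank):
    """Map a coordinate through basis strides.

    strides holds one (scale, mode) pair per coordinate entry, standing in
    for ScaledBasis(scale, mode). The result has `rank` entries.
    """
    result = [0] * rank
    for c, (scale, mode) in zip(coord, strides):
        # Each coordinate contributes c * scale along its own basis element.
        result[mode] += c * scale
    return tuple(result)


# Strides (2@0, 1@1), as in the make_layout example above.
strides = [(2, 0), (1, 1)]
print(crd2idx_basis((2, 3), strides, rank=2))  # (4, 3)
```

Because the two strides reference different basis elements, their contributions never mix, and the output is a coordinate rather than a single linear index.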
Swizzle#
Swizzle is a transformation that permutes the elements of a layout. Swizzles are used to rearrange data elements to improve memory access patterns and computational efficiency, particularly for avoiding bank conflicts in shared memory.
Swizzle Parameters:
A swizzle is defined by three parameters:
MBase: The number of least-significant bits to keep constant
BBits: The number of bits in the mask
SShift: The distance to shift the mask
Bit Pattern:
0bxxxxxxxxxxxxxxxYYYxxxxxxxZZZxxxx
                              ^--^  MBase (least-sig bits kept constant)
                 ^-^       ^-^      BBits (number of bits in mask)
                   ^---------^      SShift (distance to shift YYY)
                                    (positive: right, negative: left)

Given:  0bxxxxxxxxxxxxxxxxYYxxxxxxxxxZZxxx
Result: 0bxxxxxxxxxxxxxxxxYYxxxxxxxxxAAxxx
        where AA = ZZ xor YY
Usage:
Swizzles are typically created using CuTe’s swizzle factory functions and composed with layouts to create optimized memory access patterns.
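The bit pattern described in this section can be modeled on plain integers. The function below is a sketch written for this document, not the DSL's swizzle object: its name and argument order are assumptions, and it operates on Python ints rather than layout offsets in the IR.

```python
def swizzle(offset, bbits, mbase, sshift):
    """XOR the YYY bits of `offset` into its ZZZ bits."""
    bit_mask = (1 << bbits) - 1
    # For sshift > 0, YYY sits sshift bits above ZZZ (which starts at mbase).
    # For sshift < 0, YYY starts at mbase and ZZZ sits |sshift| bits above it.
    yyy_mask = bit_mask << (mbase + max(0, sshift))
    selected = offset & yyy_mask
    shifted = selected >> sshift if sshift >= 0 else selected << -sshift
    return offset ^ shifted


# BBits=3, MBase=0, SShift=3: bits 3..5 are XORed into bits 0..2.
print(swizzle(8, 3, 0, 3))  # 9
print(swizzle(9, 3, 0, 3))  # 8

# BBits=3, MBase=4, SShift=3: the low 4 bits are never modified.
print(swizzle(128, 3, 4, 3))      # 144
print(swizzle(128 + 5, 3, 4, 3))  # 149
```

Since the YYY bits themselves are left untouched, applying the same swizzle twice returns the original offset, which is why a swizzle is a permutation of the offsets it covers.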
Layout#
Layout is CuTe’s core abstraction for representing tensor layouts. A Layout maps from a logical coordinate space to an index space, defined by a pair of (Shape, Stride). Layouts present a common interface to multidimensional array access that abstracts away the details of how array elements are organized in memory.
Key Concepts:
Shape: Defines the abstract dimensions of the Layout
Stride: Defines how coordinates within the Shape map to linear indices
Hierarchical Structure: CuTe layouts are inherently hierarchical, constructed from smaller nested layouts
Properties:
shape - An IntTuple representing the dimensions of the layout
stride - An IntTuple representing the strides of the layout
max_alignment - The maximum alignment of the layout in bytes
Examples:
# Creating a layout with shape (4,8) and default stride (column major)
layout = cute.make_layout((4, 8))
# Creating a layout with explicit shape and stride (row major)
layout = cute.make_layout((4, 8), stride=(8, 1))
# Accessing layout properties
shape = layout.shape # Returns (4, 8)
stride = layout.stride # Returns (8, 1)
# Mapping a coordinate to an index: (2, 3) -> 2 * 8 + 3 * 1 = 19
idx = cute.crd2idx((2, 3), layout)
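The coordinate-to-index arithmetic in these examples is an inner product of the coordinate with the stride, applied recursively for nested (hierarchical) modes. The plain-Python sketch below reproduces it; compact_col_major and this crd2idx are illustrative helpers operating on tuples and are not the cute functions of similar name.

```python
def compact_col_major(shape):
    """Default strides for a flat shape: the leftmost mode varies fastest."""
    strides, running = [], 1
    for extent in shape:
        strides.append(running)
        running *= extent
    return tuple(strides)


def crd2idx(coord, stride):
    """Inner product of coord and stride, recursing into nested tuples."""
    if isinstance(coord, tuple):
        return sum(crd2idx(c, s) for c, s in zip(coord, stride))
    return coord * stride


print(compact_col_major((4, 8)))           # (1, 4)
print(crd2idx((2, 3), (8, 1)))             # 19 (row major)
print(crd2idx((2, 3), (1, 4)))             # 14 (column major)
print(crd2idx(((1, 2), 3), ((1, 4), 16)))  # 57 (nested mode)
```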
Layout Operations:
Layouts support a rich algebra of operations:
Concatenation: Combining layouts along dimensions
Coalescence: Merging adjacent modes
Composition: Composing layouts with functions or other layouts
Complement: Computing the complement space
Inversion: Inverting the layout mapping
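As one concrete instance of these operations, coalescence on a flat layout can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions: it handles only flat (non-nested) shapes and strides given as tuples, and the function name coalesce here refers to this sketch rather than to the DSL operation.

```python
def coalesce(shape, stride):
    """Merge adjacent modes of a flat layout without changing its mapping."""
    out_shape, out_stride = [], []
    for s, d in zip(shape, stride):
        if s == 1:
            continue  # a size-1 mode contributes nothing to the index
        if out_shape and d == out_shape[-1] * out_stride[-1]:
            # This mode continues exactly where the previous one ends: merge.
            out_shape[-1] *= s
        else:
            out_shape.append(s)
            out_stride.append(d)
    if not out_shape:
        return (1,), (0,)  # every mode had size 1
    return tuple(out_shape), tuple(out_stride)


print(coalesce((4, 8), (1, 4)))        # ((32,), (1,))
print(coalesce((4, 8), (8, 1)))        # ((4, 8), (8, 1))
print(coalesce((2, 1, 6), (1, 7, 2)))  # ((12,), (1,))
```

A column-major (4,8):(1,4) layout collapses to a single contiguous mode 32:1, while the row-major (4,8):(8,1) layout cannot be merged in this mode order.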
String Representation:
layout = cute.make_layout((4, 8), stride=(1, 4))
print(layout) # Prints "shape:stride" format, e.g., "(4,8):(1,4)"
ComposedLayout#
ComposedLayout represents a composition of layouts and transformations. It is a generalization of normal layouts that can support arbitrary function mappings from coordinate to coordinate as an inner layout.
Structure:
A ComposedLayout consists of three components:
inner: The inner transformation (Swizzle or Layout)
offset: An offset applied to coordinates
outer: The outer layout
Properties:
inner - Returns the inner transformation (Union[Swizzle, Layout])
offset - Returns the offset as an IntTuple
outer - Returns the outer layout
shape - Returns the shape of the composed layout
max_alignment - Returns the maximum alignment
is_normal - Returns True if this is a normal layout (not a general composition)
Examples:
# ComposedLayouts are typically created through composition operations
# For example, composing a layout with a swizzle
layout = cute.make_layout((8, 8))
swizzle = cute.make_swizzle(...)
composed = cute.composition(swizzle, layout)
# Accessing components
inner = composed.inner # Returns the swizzle
outer = composed.outer # Returns the layout
offset = composed.offset # Returns the offset
String Representation:
print(composed) # Prints "inner o offset o outer" format
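Reading "inner o offset o outer" as ordinary function composition, a coordinate is first mapped by the outer layout, then shifted by the offset, then passed through the inner transformation. The sketch below models that evaluation order with plain integers; the helper names, the column-major 8x8 outer layout, and the swizzle parameters (3, 0, 3) are all choices made for this illustration.

```python
def crd2idx(coord, stride):
    """Inner product of a flat coordinate with a flat stride."""
    return sum(c * s for c, s in zip(coord, stride))


def swizzle(offset, bbits, mbase, sshift):
    """XOR-based swizzle on an integer offset (non-negative sshift only)."""
    yyy_mask = ((1 << bbits) - 1) << (mbase + sshift)
    return offset ^ ((offset & yyy_mask) >> sshift)


def composed(coord, outer_stride, offset, swizzle_params):
    """Evaluate inner(offset + outer(coord)) with a swizzle as inner."""
    return swizzle(offset + crd2idx(coord, outer_stride), *swizzle_params)


outer_stride = (1, 8)  # column-major strides for shape (8, 8)
params = (3, 0, 3)     # bbits, mbase, sshift

print(composed((1, 1), outer_stride, 0, params))  # outer gives 9, swizzled to 8
print(composed((0, 1), outer_stride, 0, params))  # outer gives 8, swizzled to 9
```

Every one of the 64 coordinates still lands on a distinct index, so the composition only rearranges where elements live.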
Memory and Pointer Types#
Pointer#
Pointer represents a memory address with specific properties. Pointers are a fundamental type of iterator/engine that support random-access operations. They can be offset by elements of a layout’s codomain and dereferenced to produce values.
Properties:
dtype - The type of value this pointer points to
type - The MLIR type of the pointer
memspace - The memory space where the pointer data resides (e.g., gmem, smem, rmem)
alignment - The alignment of the pointer in bytes
max_alignment - The maximum alignment of the pointer in bytes
Operations:
# Pointer arithmetic
ptr2 = ptr + offset # Offset pointer forward
ptr3 = offset + ptr # Offset pointer forward (commutative)
ptr4 = ptr - offset # Offset pointer backward
# Convert pointer to integer
int_addr = ptr.toint()
# Align pointer to specified byte boundary
aligned_ptr = ptr.align(16) # Align to 16-byte boundary
Tensor Composition:
When composed with a layout, a pointer forms a tensor: T = E ∘ L, where E is the pointer (engine) and L is the layout. The tensor evaluates the layout by mapping a coordinate c to the codomain, offsets the pointer accordingly, and dereferences the result:
T(c) = (E ∘ L)(c) = *(E + L(c))
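The formula can be traced with a toy model in which memory is a Python list and the pointer E is a base index into it. Everything here (the memory list, tensor_load, the base value 4) is a stand-in invented for this sketch; real pointers carry a dtype, memory space, and alignment.

```python
# Stand-in for device memory: element i holds the value 10 * i.
memory = [10 * i for i in range(64)]


def crd2idx(coord, stride):
    """L(c): inner product of a flat coordinate with a flat stride."""
    return sum(c * s for c, s in zip(coord, stride))


def tensor_load(base, coord, stride):
    """T(c) = *(E + L(c)): offset the base by L(c), then dereference."""
    return memory[base + crd2idx(coord, stride)]


# Layout (4,8):(1,4) with the "pointer" at element 4.
print(tensor_load(4, (2, 3), (1, 4)))  # memory[4 + 14] -> 180
print(tensor_load(4, (0, 0), (1, 4)))  # memory[4] -> 40
```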
Methods:
llvm_ptr - Get the LLVM pointer representation (low-level use only)
align(min_align) - Align pointer to specified byte alignment (must be power of 2)
toint() - Convert pointer to integer address (Int64 for gmem/generic, Int32 otherwise)
Examples:
# Create a pointer from a tensor's data
ptr = tensor.data()
# Offset the pointer
offset_ptr = ptr + 16
# Check pointer properties
print(f"Memory space: {ptr.memspace}")
print(f"Alignment: {ptr.alignment}")
print(f"Data type: {ptr.dtype}")
Structured Data Types#
struct#
The struct decorator abstracts C structures in the Python DSL. It lets you define structured data types with precise control over layout, alignment, and nesting.
Supported Elements:
Base DSL scalar int/float elements
Arrays (MemRange)
Nested structures
Aligned elements
Basic Usage:
# Define a simple struct
@cute.struct
class complex:
    real : cutlass.Float32
    imag : cutlass.Float32
# Define a struct with arrays and nested structures
@cute.struct
class StorageA:
    mbarA : cute.struct.MemRange[cutlass.Int64, stage]
    compA : complex
    intA : cutlass.Int16
Alignment Control:
# Define a struct with explicit alignment
@cute.struct
class StorageB:
    a: cute.struct.Align[
        cute.struct.MemRange[cutlass.Float32, size_a], 1024
    ]
    b: cute.struct.Align[
        cute.struct.MemRange[cutlass.Float32, size_b], 1024
    ]
    x: cute.struct.Align[cutlass.Int32, 16]
    compA: cute.struct.Align[complex, 16]
Static Queries:
# Get size and alignment at compile time
size = StorageB.__sizeof__()
align = StorageB.__alignof__()
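To make the size and alignment queries concrete, the sketch below applies the usual C struct layout rules (each member placed at the next multiple of its alignment, total size padded to the largest member alignment) to the StorageA example from Basic Usage. It is a plain-Python model under stated assumptions: stage = 2 is an arbitrary value, the helper names are hypothetical, and this document does not state whether cute.struct pads in exactly the same way.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def c_struct_layout(members):
    """members: list of (name, size_in_bytes, alignment_in_bytes)."""
    offsets, cursor, max_align = {}, 0, 1
    for name, size, alignment in members:
        cursor = align_up(cursor, alignment)  # insert padding if needed
        offsets[name] = cursor
        cursor += size
        max_align = max(max_align, alignment)
    # The total size is padded so arrays of the struct stay aligned.
    return offsets, align_up(cursor, max_align), max_align


stage = 2  # assumed value, for illustration only
members = [
    ("mbarA", 8 * stage, 8),  # MemRange[Int64, stage]
    ("compA", 8, 4),          # nested struct of two Float32 fields
    ("intA", 2, 2),           # Int16
]
offsets, size, alignment = c_struct_layout(members)
print(offsets)    # {'mbarA': 0, 'compA': 16, 'intA': 24}
print(size)       # 32
print(alignment)  # 8
```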
Allocation and Access:
# Allocate and reference elements
storage = allocator.allocate(StorageB)
# Access struct members
storage.a[0] = ...
storage.x = ...
... = storage.compA.real.ptr
... = storage.x.ptr.load()
Methods:
__sizeof__() - Returns the size of the struct in bytes
__alignof__() - Returns the alignment of the struct in bytes
size_in_bytes() - Returns the size of the struct in bytes
struct.MemRange#
MemRange defines a contiguous range of memory with a specific element type and size.
Syntax:
cute.struct.MemRange[dtype, size]
- param dtype:
The data type (must be a DSL scalar type)
- type dtype:
Type[Numeric]
- param size:
The number of elements in the range
- type size:
int
Properties:
size - Number of elements in the range
elem_width - Width of each element in bits
size_in_bytes - Total size in bytes
Methods:
data_ptr() - Returns a pointer to the start of the memory range
get_tensor(layout, swizzle=None, dtype=None) - Creates a tensor from the memory range
__getitem__(index) - Returns the element at the specified index
Examples:
@cute.struct
class Buffer:
    data : cute.struct.MemRange[cutlass.Float32, 128]
# Allocate buffer
buf = allocator.allocate(Buffer)
# Get pointer to data
ptr = buf.data.data_ptr()
# Access individual elements
element = buf.data[5]
# Create tensor from memory range
layout = cute.make_layout((8, 16))
tensor = buf.data.get_tensor(layout)
struct.Align#
Align specifies explicit alignment requirements for struct members.
Syntax:
cute.struct.Align[dtype, alignment]
- param dtype:
The type to align (scalar, MemRange, or struct)
- type dtype:
Type
- param alignment:
The alignment in bytes (must be > 0)
- type alignment:
int
Properties:
dtype - The data type being aligned
align - The alignment value
Examples:
@cute.struct
class AlignedStorage:
    # Align scalar to 16 bytes
    counter: cute.struct.Align[cutlass.Int32, 16]
    # Align array to 1024 bytes
    buffer: cute.struct.Align[
        cute.struct.MemRange[cutlass.Float32, 256], 1024
    ]
union#
The union decorator abstracts C unions in the Python DSL. It is similar to struct, but all members start at offset 0 and the size is the maximum size of all members.
Layout Characteristics:
All objects start at offset 0
Alignment is the maximum alignment of all objects
Size is the maximum size of all objects
Usage:
# Define a union with scalar elements
@cute.union
class value_union:
    as_int : cutlass.Int32
    as_float : cutlass.Float32
# Allocate union
val = allocator.allocate(value_union)
# Access different interpretations of same memory
val.as_int = 42
float_val = val.as_float.ptr.load() # Interpret same bits as float
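What "interpret the same bits as float" means can be checked on the host with Python's struct module (the standard-library byte packer, unrelated to cute.struct). This sketch assumes little-endian 32-bit storage and models the union's size and alignment with a plain dictionary; none of it is DSL code.

```python
import struct

# name -> (size_in_bytes, alignment_in_bytes) for the two members above.
members = {"as_int": (4, 4), "as_float": (4, 4)}
union_size = max(size for size, _ in members.values())
union_align = max(align for _, align in members.values())

# Store 42 as a 32-bit integer, then read the same 4 bytes as a float32.
raw = struct.pack("<i", 42)
(as_float,) = struct.unpack("<f", raw)

print(union_size, union_align)    # 4 4
print(as_float == 42 * 2.0**-149) # True: 42 is a tiny subnormal float32
```

The integer 42 does not become 42.0; its bit pattern corresponds to a subnormal float32 on the order of 1e-44, which is the expected result of reinterpreting rather than converting.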
Methods:
Same as struct:
__sizeof__() - Returns the size of the union in bytes
__alignof__() - Returns the alignment of the union in bytes
Deprecated Types#
ThrMma#
Deprecated: cute.core.ThrMma is deprecated. Use cute.ThrMma instead.
ThrCopy#
Deprecated: cute.core.ThrCopy is deprecated. Use cute.ThrCopy instead.
Type Hierarchies and Relationships#
Type Protocol Support:
Many CuTe types implement standard Python protocols for integration:
__str__() - String representation for debugging
__eq__() / __ne__() - Equality comparison
__getitem__() - Indexing operations
__add__() / __sub__() / __mul__() / __floordiv__() / __mod__() - Arithmetic
MLIR Integration:
Internal types like IntValue, Layout, Pointer, and ComposedLayout are registered as MLIR value casters, enabling seamless integration with the underlying compiler infrastructure.
Best Practices#
Choosing Between Static and Dynamic:
Use static values (Python int) when dimensions are known at compile time for maximum optimization
Use dynamic values (IntValue) when dimensions must be determined at runtime
Refer to Static vs Dynamic layouts for detailed guidance
Memory Alignment:
Always specify alignment requirements for shared memory structures to avoid bank conflicts
Use struct.Align to enforce alignment constraints
Check max_alignment properties to verify pointer and layout alignment
Layout Operations:
Prefer built-in layout operations (make_layout, composition, etc.) over manual construction
Use ScaledBasis for explicit control over stride modes in multi-modal layouts
Leverage ComposedLayout for complex transformations like swizzling
Type Safety:
Use type annotations in @jit and @kernel functions
Let the DSL infer types when possible for cleaner code
Check dtype and memspace properties when working with pointers
See Also#
Introduction - Introduction to CuTe DSL decorators and calling conventions
Control Flow - Control flow with static and dynamic values
Static vs Dynamic layouts - Working with static and dynamic layouts
Integration with Frameworks - Integration with deep learning frameworks
Debugging - Debugging techniques for CuTe DSL programs