cutlass.cute#
- class cutlass.cute.ComposedLayout(*args: Any, **kwargs: Any)#
Bases:
Value
ComposedLayout represents the functional composition of layouts in CuTe.
A ComposedLayout is formed by the composition of three components: inner ∘ offset ∘ outer, where:
inner: The inner layout or swizzle that is applied last
offset: An integer tuple representing a coordinate offset
outer: The outer layout that is applied first
ComposedLayout implements the functional composition operation where: R(c) := (inner ∘ offset ∘ outer)(c) := inner(offset + outer(c))
This composition allows for complex transformations of coordinates and indices, enabling operations like tiling, partitioning, and reshaping of data.
- Variables:
inner – The inner layout or swizzle component
offset – The coordinate offset applied between inner and outer layouts
outer – The outer layout component
max_alignment – The maximum alignment of the composed layout
Examples:
# Create a composed layout with inner layout, offset, and outer layout
# inner layout: (4, 8):(1, 4)
inner_layout = make_layout((4, 8))
offset = (0, 0)
# outer layout: (2, 2):(1@0, 1@1)
outer_layout = make_layout((2, 2), stride=(1 * E(0), 1 * E(1)))
# composed layout: (inner o offset o outer)
composed = make_composed_layout(inner_layout, offset, outer_layout)

# Accessing components of the composed layout
inner = composed.inner
offset = composed.offset
outer = composed.outer

# map coordinate (1, 2) to linear index
# - outer(1, 2) = (1, 2)
# - offset + outer(1, 2) = (1, 2)
# - inner(1, 2) = 1 * 1 + 2 * 4 = 9
idx = crd2idx((1, 2), composed)

# Composition is used in many tiling operations,
# for example in logical_product, raked_product, and blocked_product
- __init__(*args: Any, **kwargs: Any) None #
- class cutlass.cute.Swizzle(*args: Any, **kwargs: Any)#
Bases:
Value
Swizzle is a transformation that permutes the elements of a layout.
Swizzles are used to rearrange data elements to improve memory access patterns and computational efficiency.
Swizzle is defined by three parameters:
- MBase: The number of least-significant bits to keep constant
- BBits: The number of bits in the mask
- SShift: The distance to shift the mask
The mask is applied to the least-significant bits of the layout.
0bxxxxxxxxxxxxxxxYYYxxxxxxxZZZxxxx
                              ^--^  MBase is the number of least-sig bits to keep constant
               ^-^       ^-^        BBits is the number of bits in the mask
                 ^---------^        SShift is the distance to shift the YYY mask
                                    (pos shifts YYY to the right, neg shifts YYY to the left)

e.g. Given
0bxxxxxxxxxxxxxxxxYYxxxxxxxxxZZxxx
the result is
0bxxxxxxxxxxxxxxxxYYxxxxxxxxxAAxxx
where AA = ZZ xor YY
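The following plain-Python sketch illustrates this transformation for a positive SShift (this is not the CuTe API; the function and parameter names here are ours):

def swizzle_offset(offset: int, b_bits: int, m_base: int, s_shift: int) -> int:
    # YYY field: b_bits wide, located s_shift bits above the ZZZ field,
    # which itself starts m_base bits from the least-significant end.
    yyy_mask = ((1 << b_bits) - 1) << (m_base + s_shift)
    # XOR the YYY bits, shifted down by s_shift, into the ZZZ bits.
    return offset ^ ((offset & yyy_mask) >> s_shift)

# e.g. with b_bits=2, m_base=4, s_shift=3, YYY occupies bits 7-8 and
# ZZZ occupies bits 4-5, so ZZ becomes ZZ xor YY:
# swizzle_offset(0b110010000, 2, 4, 3) == 0b110100000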
- cutlass.cute.E(mode: int | List[int]) ScaledBasis #
Create a unit ScaledBasis element with the specified mode.
This function creates a ScaledBasis with value 1 and the given mode. The mode represents the coordinate axis or dimension in the layout.
- Parameters:
mode (Union[int, List[int]]) – The mode (dimension) for the basis element, either a single integer or a list of integers
- Returns:
A ScaledBasis with value 1 and the specified mode
- Return type:
ScaledBasis
- Raises:
TypeError – If mode is not an integer or a list
Examples:
# Create a basis element for the first dimension (mode 0)
e0 = E(0)

# Create a basis element for the second dimension (mode 1)
e1 = E(1)

# Create a basis element for a hierarchical dimension
e_hier = E([0, 1])
- class cutlass.cute.Atom(op: Op, trait: Trait)#
Bases:
ABC
Atom base class.
An Atom is the composition of
a MMA or Copy Operation;
an internal MMA or Copy Trait.
An Operation is a pure Python class that is used to model a specific MMA or Copy instruction. The Trait wraps the underlying IR Value and provides access to the metadata of the instruction encoded using CuTe Layouts. When the Trait can be constructed straightforwardly from an Operation, the make_mma_atom or make_copy_atom API should be used. There are cases where constructing the metadata is not trivial and requires more information, for example to determine the number of bytes copied per TMA instruction ("the TMA vector length"). In such cases, dedicated helper functions are provided with an appropriate API such that the Atom is constructed internally in an optimal fashion for the user.
- __init__(
- op: Op,
- trait: Trait,
- property op: Op#
- property type#
- set(modifier, value, *, loc=None, ip=None) None #
Sets runtime fields of the Atom.
Some Atoms have runtime state, for example a tcgen05 MMA Atom
tiled_mma = cute.make_tiled_mma(some_tcgen05_mma_op)
tiled_mma.set(cute.nvgpu.tcgen05.Field.ACCUMULATE, True)
The set method provides a way for the user to modify such runtime state. Modifiable fields are provided by arch-specific enumerations, for example tcgen05.Field. The Atom instance internally validates both the field and the value provided by the user.
- _unpack(*, loc=None, ip=None, **kwargs) cutlass._mlir.ir.Value #
- _abc_impl = <_abc._abc_data object>#
- class cutlass.cute.MmaAtom(op: Op, trait: Trait)#
Bases:
Atom
The MMA Atom class.
- property thr_id: cutlass.cute.typing.Layout#
- property shape_mnk: cutlass.cute.typing.Shape#
- property tv_layout_A: cutlass.cute.typing.Layout#
- property tv_layout_B: cutlass.cute.typing.Layout#
- property tv_layout_C: cutlass.cute.typing.Layout#
- make_fragment_A(input, *, loc=None, ip=None)#
- make_fragment_B(input, *, loc=None, ip=None)#
- make_fragment_C(input, *, loc=None, ip=None)#
- _abc_impl = <_abc._abc_data object>#
- class cutlass.cute.CopyAtom(op: Op, trait: Trait)#
Bases:
Atom
The Copy Atom class.
- property value_type: Type[cutlass.cute.typing.Numeric]#
- property thr_id: cutlass.cute.typing.Layout#
- property layout_src_tv: cutlass.cute.typing.Layout#
- property layout_dst_tv: cutlass.cute.typing.Layout#
- _abc_impl = <_abc._abc_data object>#
- class cutlass.cute.TiledCopy(op: Op, trait: Trait)#
Bases:
CopyAtom
The tiled Copy class.
- property layout_tv_tiled: cutlass.cute.typing.Layout#
- property tiler_mn: cutlass.cute.typing.Tile#
- property layout_src_tv_tiled: cutlass.cute.typing.Layout#
- property layout_dst_tv_tiled: cutlass.cute.typing.Layout#
- property size: int#
- get_slice(
- thr_idx: int | cutlass.cute.typing.Int32,
- retile(src, *, loc=None, ip=None)#
- _abc_impl = <_abc._abc_data object>#
- class cutlass.cute.TiledMma(op: Op, trait: Trait)#
Bases:
MmaAtom
The tiled MMA class.
- property tv_layout_A_tiled: cutlass.cute.typing.Layout#
- property tv_layout_B_tiled: cutlass.cute.typing.Layout#
- property tv_layout_C_tiled: cutlass.cute.typing.Layout#
- property permutation_mnk: cutlass.cute.typing.Tile#
- property thr_layout_vmnk: cutlass.cute.typing.Layout#
- property size: int#
- get_tile_size(mode_idx: int) cutlass.cute.typing.Shape #
- get_slice(
- thr_idx: int | cutlass.cute.typing.Int32,
- _partition_shape(operand_id, shape, *, loc=None, ip=None)#
- partition_shape_A(shape_mk, *, loc=None, ip=None)#
- partition_shape_B(shape_nk, *, loc=None, ip=None)#
- partition_shape_C(shape_mn, *, loc=None, ip=None)#
- _thrfrg(
- operand_id,
- input: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- _thrfrg(
- operand_id,
- input: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- _thrfrg_A(
- input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- _thrfrg_B(
- input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- _thrfrg_C(
- input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- _abc_impl = <_abc._abc_data object>#
- class cutlass.cute.TensorSSA(*args: Any, **kwargs: Any)#
Bases:
ArithValue
A class representing thread-local data from a CuTe Tensor with value semantics; instances are immutable.
- Parameters:
value (ir.Value) – Flattened vector, as an ir.Value, holding the logical data of the SSA tensor
shape (Shape) – The nested shape in CuTe of the vector
dtype (Type[Numeric]) – Data type of the tensor elements
- Variables:
_shape – The nested shape in CuTe of the vector
_dtype – Data type of the tensor elements
- Raises:
ValueError – If shape is not static
- __init__(
- value,
- shape: cutlass.cute.typing.Shape,
- dtype: Type[cutlass.cute.typing.Numeric],
Initialize a new TensorSSA object.
- Parameters:
value (ir.Value) – Flattened vector, as an ir.Value, holding the logical data of the SSA tensor
shape (Shape) – The nested shape in CuTe of the vector
dtype (Type[Numeric]) – Data type of the tensor elements
- Raises:
ValueError – If shape is not static
- property dtype: Type[cutlass.cute.typing.Numeric]#
- property element_type: Type[cutlass.cute.typing.Numeric]#
- property shape#
- _apply_op(
- op,
- other: TensorSSA,
- flip,
- *,
- loc,
- ip,
- _apply_op(
- op,
- other: cutlass.cutlass_dsl.cutlass_arith.ArithValue,
- flip,
- *,
- loc,
- ip,
- _apply_op(
- op,
- other: int | float | bool,
- flip,
- *,
- loc,
- ip,
- _flatten_shape_and_coord(crd, *, loc=None, ip=None)#
- _build_result(res_vect, res_shp, *, loc=None, ip=None)#
- to(
- dtype: Type[cutlass.cute.typing.Numeric],
- *,
- loc=None,
- ip=None,
Convert the tensor to a different numeric type.
- Parameters:
dtype (Type[Numeric]) – The target numeric type to cast to.
- Returns:
A new tensor with the same shape but with elements cast to the target type.
- Return type:
TensorSSA
- Raises:
TypeError – If dtype is not a subclass of Numeric.
NotImplementedError – If dtype is an unsigned integer type.
- ir_value(*, loc=None, ip=None)#
- reduce(
- op,
- init_val,
- reduction_profile: cutlass.cute.typing.Coord,
- *,
- loc=None,
- ip=None,
Perform a reduction over selected modes with a given predefined reduction op.
- Parameters:
op (operator) – The reduction operator to use (operator.add or operator.mul)
init_val (numeric) – The initial value for the reduction
reduction_profile (Coord) – Specifies which dimensions to reduce. Dimensions marked with ‘_’ are kept.
- Returns:
The reduced tensor
- Return type:
Examples
- reduce(f32 o (4,))
=> f32
- reduce(f32 o (4, 5))
=> f32
- reduce(f32 o (4, (5, 4)), reduction_profile=(_, 1))
=> f32 o (4,)
- reduce(f32 o (4, (5, 4)), reduction_profile=(_, (_, 1)))
=> f32 o (4, (5,))
- cutlass.cute.assume(src, divby=None, *, loc=None, ip=None)#
- cutlass.cute.is_integer(a) bool #
Check if an object is a static or dynamic integer.
- cutlass.cute.is_int_tuple(a) bool #
- cutlass.cute.is_static(
- x: cutlass._mlir.ir.Type | cutlass._mlir.ir.Value | cutlass.cute.typing.XTuple,
Check if a value is statically known at compile time.
In CuTe, static values are those whose values are known at compile time, as opposed to dynamic values which are only known at runtime.
- Parameters:
x (Union[ir.Type, ir.Value, XTuple]) – The value to check
- Returns:
True if the value is static, False otherwise
- Return type:
bool
- Raises:
TypeError – If an unsupported type is provided
- cutlass.cute.size(
- a: cutlass.cute.typing.IntTuple | cutlass.cute.typing.Shape | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster | cutlass.cute.typing.Tensor,
- mode: List[int] = [],
- *,
- loc=None,
- ip=None,
Return size of domain of layout or tensor.
Computes the size (number of elements) in the domain of a layout or tensor. For layouts, this corresponds to the shape of the coordinate space. See Section 3.2 in the CuTe Whitepaper for more details on layout domains.
- Parameters:
a (IntTuple, Shape, Layout, ComposedLayout or Tensor) – The input object whose size to compute
mode (list of int, optional) – List of mode(s) for size calculation. If empty, computes total size, defaults to []
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
Static size of layout or tensor if static, otherwise a Value
- Return type:
int or Value
- Raises:
ValueError – If mode contains non-integer elements
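Illustrative sketch (expected values in the comments follow the definition above; the layout is a made-up example):

layout = make_layout((4, (2, 8)))
n = size(layout)            # 64 -- total size of the coordinate domain
m = size(layout, mode=[1])  # 16 -- size of mode 1 only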
- cutlass.cute.has_underscore(a: cutlass.cute.typing.XTuple) bool #
- cutlass.cute.slice_(src, coord: cutlass.cute.typing.Coord, *, loc=None, ip=None)#
Perform a slice operation on a source object using the given coordinate.
This function implements CuTe’s slicing operation which extracts a subset of elements from a source object (tensor, layout, etc.) based on a coordinate pattern. The slice operation preserves the structure of the source while selecting specific elements.
- Parameters:
src (Union[Tensor, Layout, IntTuple, Value]) – Source object to be sliced (tensor, layout, tuple, etc.)
coord (Coord) – Coordinate pattern specifying which elements to select
loc (Optional[Location]) – Source location information, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None
- Returns:
A new object containing the sliced elements
- Return type:
Union[Tensor, Layout, IntTuple, tuple]
- Raises:
ValueError – If the coordinate pattern is incompatible with source
Examples:
# Layout slicing
layout = make_layout((4,4))
# Select 1st index of first mode and keep all elements in second mode
sub_layout = slice_(layout, (1, None))

# Basic tensor slicing
tensor = make_tensor(...)  # Create a 2D tensor
# Select 1st index of first mode and keep all elements in second mode
sliced = slice_(tensor, (1, None))

# Select 2nd index of second mode and keep all elements in first mode
sliced = slice_(tensor, (None, 2))
Note
None represents keeping all elements in that mode
Slicing preserves the layout/structure of the original object
Can be used for:
* Extracting sub-tensors/sub-layouts
* Creating views into data
* Selecting specific patterns of elements
- cutlass.cute.depth(
- a: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster,
Returns the depth (nesting level) of a tuple, layout, or tensor.
The depth of a tuple is the maximum depth of its elements plus 1. For an empty tuple, the depth is 1. For layouts and tensors, the depth is determined by the depth of their shape. For non-tuple values (e.g., integers), the depth is considered 0.
- Parameters:
a (Union[XTuple, Layout, ComposedLayout, Tensor, Any]) – The object whose depth is to be determined
- Returns:
The depth of the input object
- Return type:
int
Example:
>>> depth(1)
0
>>> depth((1, 2))
1
>>> depth(((1, 2), (3, 4)))
2
- cutlass.cute.rank(
- a: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster,
Returns the rank (dimensionality) of a tuple, layout, or tensor.
The rank of a tuple is its length. For layouts and tensors, the rank is determined by the rank of their shape. For non-tuple values (e.g., integers), the rank is considered 1 for convenience.
- Parameters:
a (Union[XTuple, Layout, ComposedLayout, Tensor, Any]) – The object whose rank is to be determined
- Returns:
The rank of the input object
- Return type:
int
This function is used in layout algebra to determine the dimensionality of tensors and layouts for operations like slicing and evaluation.
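Illustrative sketch (expected values in the comments follow the definition above):

rank((4, 8))       # 2
rank(((4, 8), 2))  # 2 -- rank counts top-level modes only
rank(5)            # 1 -- non-tuple values have rank 1 by convention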
- cutlass.cute.printf(*args, loc=None, ip=None) None #
Print a value or a list of values.
- Parameters:
args (list) – List of values to print
loc (source location, optional) – Source location where it’s called, defaults to None
ip (insertion pointer, optional) – Insertion pointer, defaults to None
- Raises:
ValueError – If no arguments are provided or if an unsupported argument type is passed
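Illustrative sketch (idx and val are placeholder values; the format-string form follows the style seen in CuTe DSL examples and is an assumption here):

cute.printf(idx)                             # print a value directly
cute.printf("idx = {}, val = {}", idx, val)  # format-string form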
- cutlass.cute.print_tensor(
- tensor: cutlass.cute.typing.Tensor,
- *,
- verbose: bool = False,
- loc=None,
- ip=None,
Print the content of the tensor in a human-readable format.

Example output:
tensor(raw_ptr<@…, Float32, generic, align(4)> o (8,5):(5,1), data=
  [[-0.4326, -0.5434,  0.1238,  0.7132,  0.8042],
   [-0.8462,  0.9871,  0.4389,  0.7298,  0.6948],
   [ 0.3426,  0.5856,  0.1541,  0.2923,  0.6976],
   [-0.1649,  0.8811,  0.1788,  0.1404,  0.2568],
   [-0.2944,  0.8593,  0.4171,  0.8998,  0.1766],
   [ 0.8814,  0.7919,  0.7390,  0.4566,  0.1576],
   [ 0.9159,  0.7577,  0.6918,  0.0754,  0.0591],
   [ 0.6551,  0.1626,  0.1189,  0.0292,  0.8655]])
- cutlass.cute.pretty_str(arg) str #
Constructs a concise readable pretty string.
- cutlass.cute.make_layout(
- shape: cutlass.cute.typing.Shape,
- *,
- stride: cutlass.cute.typing.Stride | None = None,
- loc=None,
- ip=None,
Create a CuTe Layout object from shape and optional stride information.
A Layout in CuTe represents the mapping between logical and physical coordinates of a tensor. This function creates a Layout object that defines how tensor elements are arranged in memory.
- Parameters:
shape (Shape) – Shape of the layout defining the size of each mode
stride (Union[Stride, None]) – Optional stride values for each mode, defaults to None
loc (Optional[Location]) – Source location information, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None
- Returns:
A new Layout object with the specified shape and stride
- Return type:
Layout
Examples:
# Create a 2D compact left-most layout with shape (4,4)
layout = make_layout((4,4))  # compact left-most layout

# Create a left-most layout with custom strides
layout = make_layout((4,4), stride=(1,4))  # left-most layout with strides (1,4)

# Create a layout for a 3D tensor
layout = make_layout((32,16,8))  # left-most layout

# Create a layout with custom strides
layout = make_layout((2,2,2), stride=(4,1,2))  # layout with strides (4,1,2)
Note
If stride is not provided, a default compact left-most stride is computed based on the shape
The resulting layout maps logical coordinates to physical memory locations
The layout object can be used for tensor creation and memory access patterns
Strides can be used to implement:
* Row-major vs column-major layouts
* Padding and alignment
* Blocked/tiled memory arrangements
* Interleaved data formats
Stride is a keyword-only argument to improve readability, e.g.:
* make_layout((3,4), (1,4)) can be confused with make_layout(((3,4), (1,4)))
* make_layout((3,4), stride=(1,4)) is more readable
- cutlass.cute.recast_layout(
- new_type_bits,
- old_type_bits,
- src_layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.make_identity_layout(
- shape: cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
Create an identity layout with the given shape.
An identity layout maps logical coordinates directly to themselves without any transformation. This is equivalent to a layout with stride (1@0,1@1,…,1@(N-1)).
- Parameters:
shape (Shape) – The shape of the layout
loc (Optional[Location]) – Source location information, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None
- Returns:
A new identity Layout object with the specified shape
- Return type:
Layout
Examples:
# Create a 2D identity layout with shape (4,4)
layout = make_identity_layout((4,4))  # stride=(1@0,1@1)

# Create a 3D identity layout
layout = make_identity_layout((32,16,8))  # stride=(1@0,1@1,1@2)
Note
An identity layout is a special case where each coordinate maps to itself
Useful for direct coordinate mapping without any transformation
- cutlass.cute.make_ordered_layout(
- shape: cutlass.cute.typing.Shape,
- order: cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
Create a layout with a specific ordering of dimensions.
This function creates a layout where the dimensions are ordered according to the specified order parameter, allowing for custom dimension ordering in the layout.
- Parameters:
shape (Shape) – The shape of the layout
order (Shape) – The ordering of dimensions
loc (Optional[Location]) – Source location information, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None
- Returns:
A new Layout object with the specified shape and dimension ordering
- Return type:
Layout
Examples:
# Create a row-major layout
layout = make_ordered_layout((4,4), order=(1,0))

# Create a column-major layout
layout = make_ordered_layout((4,4), order=(0,1))  # stride=(1,4)

# Create a layout with custom dimension ordering for a 3D tensor
layout = make_ordered_layout((32,16,8), order=(2,0,1))  # stride=(128,1,16)
Note
The order parameter specifies the ordering of dimensions from fastest-varying to slowest-varying
For a 2D tensor, (0,1) creates a column-major layout, while (1,0) creates a row-major layout
The length of order must match the rank of the shape
- cutlass.cute.make_composed_layout(
- inner,
- offset: cutlass.cute.typing.IntTuple,
- outer: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
Create a composed layout by composing an inner transformation with an outer layout.
As described in the CuTe whitepaper, a composed layout applies a sequence of transformations to coordinates. The composition is defined as (inner ∘ offset ∘ outer), where the operations are applied from right to left.
- Parameters:
inner (Union[Layout, Swizzle]) – The inner transformation (can be a Layout or Swizzle)
offset (IntTuple) – An integral offset applied between transformations
outer (Layout) – The outer (right-most) layout that is applied first
loc (Optional[Location]) – Source location information, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None
- Returns:
A new ComposedLayout representing the composition
- Return type:
Examples:
# Create a basic layout
inner = make_layout(...)
outer = make_layout((4,4), stride=(E(0), E(1)))

# Create a composed layout with an offset
composed = make_composed_layout(inner, (2,0), outer)
Note
The composition applies transformations in the order: outer → offset → inner
The stride divisibility condition must be satisfied for valid composition
Certain compositions (like Swizzle with scaled basis) are invalid and will raise errors
Composed layouts inherit many properties from the outer layout
- cutlass.cute.make_layout_tv(
- thr_layout: cutlass.cute.typing.Layout,
- val_layout: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
Create a TV layout given separate thr and val layouts. A TV partitioner is inferred from the inputs. Requires the input thr layout to be compact.
- Parameters:
thr_layout (Layout) – mn -> tid layout (must be compact)
val_layout (Layout) – mn -> vid layout
loc (optional) – source location for MLIR, defaults to None
ip (optional) – insertion point, defaults to None
- Returns:
layout_mn – logical tile size
layout_tv – thread-value layout (tid, vid) -> mn
- cutlass.cute.make_layout_image_mask(
- lay: cutlass.cute.typing.Layout,
- coord: cutlass.cute.typing.Coord,
- mode: int,
- *,
- loc=None,
- ip=None,
Makes a 16-bit integer mask of the image of a layout sliced at a given mode and accounting for the offset given by the input coordinate for the other modes.
- cutlass.cute.make_ptr(
- dtype: Type[cutlass.cute.typing.Numeric] | None,
- value,
- mem_space: cutlass.cute.typing.AddressSpace = cutlass.cute.typing.AddressSpace.generic,
- *,
- assumed_align=None,
- loc=None,
- ip=None,
- cutlass.cute.make_tensor(
- iterator,
- layout: cutlass.cute.typing.Shape | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster,
- *,
- loc=None,
- ip=None,
Creates a tensor by composing an engine (iterator/pointer) with a layout.
A tensor is defined as T = E ∘ L, where E is an engine (array, pointer, or counting iterator) and L is a layout that maps logical coordinates to physical offsets. The tensor evaluates coordinates by applying the layout mapping and dereferencing the engine at the resulting offset.
- Parameters:
iterator (Union[Pointer, IntTuple]) – Engine component (pointer, iterator, or counting iterator) that provides data access capabilities
layout (Union[Shape, Layout, ComposedLayout]) – Layout component that defines the mapping from logical coordinates to physical offsets
loc (Optional[Location]) – Source location for MLIR operation tracking, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for MLIR operation, defaults to None
- Returns:
A tensor object representing the composition E ∘ L
- Return type:
Tensor
- Raises:
ValueError – If iterator type is not supported
Examples:
# Create a tensor with row-major layout
layout = make_layout((64, 128), stride=(128, 1))
tensor = make_tensor(ptr, layout)

# Create a tensor with hierarchical layout
layout = make_layout(((128, 8), (1, 4, 1)), stride=((32, 1), (0, 8, 4096)))
tensor = make_tensor(smem_ptr, layout)

# Create a counting tensor
layout = make_layout(2, stride=16 * E(0))
tensor = make_tensor(5, layout)
Notes
The engine (iterator) must support random access operations
Common engine types include raw pointers, arrays, and random-access iterators
The layout defines both the shape (logical dimensions) and stride (physical mapping)
Supports both direct coordinate evaluation T(c) and partial evaluation (slicing)
- cutlass.cute.make_identity_tensor(
- shape: cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
Creates an identity tensor with the given shape.
An identity tensor maps each coordinate to itself, effectively creating a counting sequence within the shape’s bounds. This is useful for generating coordinate indices or creating reference tensors for layout transformations.
- Parameters:
shape (Shape) – The shape defining the tensor’s dimensions. Can be a simple integer sequence or a hierarchical structure ((m,n),(p,q))
loc (Optional[Location]) – Source location for MLIR operation tracking, defaults to None
ip (Optional[InsertionPoint]) – Insertion point for MLIR operation, defaults to None
- Returns:
A tensor that maps each coordinate to itself
- Return type:
Tensor
Examples:
# Create a simple 1D counting tensor
tensor = make_identity_tensor(6)  # [0,1,2,3,4,5]

# Create a 2D counting tensor
tensor = make_identity_tensor((3,2))  # [(0,0),(1,0),(2,0),(0,1),(1,1),(2,1)]

# Create hierarchical counting tensor
tensor = make_identity_tensor(((2,1),3))
# [((0,0),0),((1,0),0),((0,0),1),((1,0),1),((0,0),2),((1,0),2)]
Notes
The shape parameter follows CuTe’s IntTuple concept
Coordinates are ordered colexicographically
Useful for generating reference coordinates in layout transformations
- cutlass.cute.make_fragment(
- layout_or_shape: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape,
- dtype: Type[cutlass.cute.typing.Numeric],
- *,
- loc=None,
- ip=None,
- cutlass.cute.make_fragment_like(src, dtype=None, *, loc=None, ip=None)#
Create a tensor on the stack with a compact layout in the same shape as the source.
This function either creates a fragment tensor with a compact layout in the same shape as the source layout, or a new layout with the same shape as the source. The strides of the new layout follow the order induced by the source's strides, with special handling of the 0th mode: it is always stride-1 and generated in column-major order (LayoutLeft).
- Parameters:
src (Union[Layout, ComposedLayout, Tensor]) – The source layout or tensor whose shape will be matched
dtype (Type[Numeric], optional) – The element type for the fragment tensor, defaults to None
loc (Location, optional) – Source location for MLIR operations, defaults to None
ip (InsertionPoint, optional) – Insertion point for MLIR operations, defaults to None
- Returns:
A new layout or fragment tensor with matching shape
- Return type:
Union[Layout, Tensor]
Examples
Creating a rmem tensor from a tensor:
smem_tensor = cute.make_tensor(smem_ptr, layout)
frag_tensor = cute.make_fragment_like(smem_tensor, cutlass.Float32)
# frag_tensor will be a register-backed tensor with the same shape
Creating a fragment with a different element type:
tensor = cute.make_tensor(gmem_ptr, layout)
bool_frag = cute.make_fragment_like(tensor, cutlass.Boolean)
# bool_frag will be a register-backed tensor with Boolean elements
Notes
When used with a Tensor, if a type is provided, it will create a new fragment tensor with that element type.
For layouts with ScaledBasis strides, the function creates a fragment from the shape only.
This function is commonly used in GEMM and other tensor operations to create register storage for intermediate results.
- cutlass.cute.recast_ptr(
- ptr: cutlass.cute.typing.Pointer,
- swizzle_=None,
- dtype: Type[cutlass.cute.typing.Numeric] | None = None,
- loc=None,
- ip=None,
- cutlass.cute.recast_tensor(
- src: cutlass.cute.typing.Tensor,
- dtype: Type[cutlass.cute.typing.Numeric],
- swizzle_=None,
- *,
- loc=None,
- ip=None,
- cutlass.cute.get(input, mode: List[int], *, loc=None, ip=None)#
Extract a specific element or sub-layout from a layout or tuple.
This function recursively traverses the input according to the mode indices, extracting the element at the specified path. For layouts, this operation corresponds to extracting a specific sub-layout.
- Parameters:
input (Layout, ComposedLayout, tuple) – The input layout or tuple to extract from
mode (List[int]) – Indices specifying the path to traverse for extraction
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
The extracted element or sub-layout
- Return type:
Layout, ComposedLayout, or element type
- Raises:
ValueError – If any index in mode is out of range
TypeError – If mode contains non-integer elements or if input has unsupported type
Examples:
For a layout like ((4,8),2):((16,1),8), get with mode=[0,1] extracts the sub-layout 8:1.
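The same example as code (illustrative sketch; the expected sub-layout in the comment is derived from the definition above):

layout = make_layout(((4, 8), 2), stride=((16, 1), 8))
sub = get(layout, mode=[0, 1])  # sub-layout 8:1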
- cutlass.cute.select(input, mode: List[int], *, loc=None, ip=None)#
Select modes from input.
- Parameters:
input (Layout, ComposedLayout, tuple) – Input to select from
mode (List[int]) – Indices specifying which dimensions or elements to select
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
A new instance with selected dimensions/elements
- Return type:
Layout, ComposedLayout, tuple
- Raises:
ValueError – If any index in mode is out of range
TypeError – If the input type is invalid
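Illustrative sketch (expected results in the comments follow the definition; modes are assumed to be returned in the order listed, as in CuTe's C++ select):

select((2, 3, 4, 5), mode=[0, 2])  # (2, 4)
layout = make_layout((2, 3, 4))
sub = select(layout, mode=[2, 0])  # layout with shape (4, 2)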
- cutlass.cute.front(input, *, loc=None, ip=None)#
Recursively get the first element of input.
This function traverses a hierarchical structure (like a layout or tensor) and returns the first element at the deepest level. It’s particularly useful for accessing the first stride value in a layout to determine properties like majorness.
- Parameters:
input (Union[Tensor, Layout, Stride]) – The hierarchical structure to traverse
loc (source location, optional) – Source location where it’s called, defaults to None
ip (insertion pointer, optional) – Insertion pointer for IR generation, defaults to None
- Returns:
The first element at the deepest level of the input structure
- Return type:
Union[int, float, bool, ir.Value]
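Illustrative sketch (assumes Layout exposes a stride property, as suggested by the description above):

layout = make_layout((4, 4), stride=(1, 4))
front(layout.stride)  # 1 -- mode 0 is the major (stride-1) mode
front(((1, 2), 3))    # 1 -- first element at the deepest level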
- cutlass.cute.is_major(
- mode,
- stride: cutlass.cute.typing.Stride,
- *,
- loc=None,
- ip=None,
Check whether a mode in stride is the major mode.
- cutlass.cute.find(
- x: int,
- t: tuple | cutlass._mlir.ir.Value | int,
- *,
- exclude_when: cutlass.cute.typing.IntTuple | None = None,
- loc=None,
- ip=None,
Find the first position of x in t. If exclude_when is provided, positions where comparison equals comparison_value are excluded from the search results.
- Parameters:
x (int) – The static integer x to search for
t (Union[tuple, ir.Value, int]) – The search space
exclude_when (Optional[Tuple[int, Union[tuple, ir.Value, int]]]) – A tuple of (comparison_value, comparison) - positions where comparison equals comparison_value will be excluded from the search results
- Returns:
Index if found at top level, tuple of indices showing nested position, or None if not found
- Return type:
Union[int, Tuple[int, …], None]
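Illustrative sketch (expected results in the comments follow the description above):

find(3, (1, 3, 5))    # 1
find(2, ((4, 2), 7))  # (0, 1) -- nested position
find(9, (1, 3, 5))    # None -- not found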
- cutlass.cute.coalesce(
- input,
- *,
- target_profile: cutlass.cute.typing.Coord | None = None,
- loc=None,
- ip=None,
- cutlass.cute.group_modes(input, begin: int, end: int = -1, *, loc=None, ip=None)#
Group modes of a hierarchical tuple or layout into a single mode.
This function groups a range of modes from the input object into a single mode, creating a hierarchical structure. For tuples, it creates a nested tuple containing the specified range of elements. For layouts and other CuTe objects, it creates a hierarchical representation where the specified modes are grouped together.
- Parameters:
input (Layout, ComposedLayout, tuple, Shape, Stride, etc.) – Input object to group modes from (layout, tuple, etc.)
begin (int) – Beginning index of the range to group (inclusive)
end (int) – Ending index of the range to group (exclusive)
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
A new object with the specified modes grouped
- Return type:
Same type as input with modified structure
Examples:
# Group modes in a tuple
t = (2, 3, 4, 5)
grouped = group_modes(t, 1, 3)  # (2, (3, 4), 5)

# Group modes in a layout
layout = make_layout((2, 3, 4, 5))
grouped_layout = group_modes(layout, 1, 3)  # Layout with shape (2, (3, 4), 5)

# Group modes in a shape
shape = make_shape(2, 3, 4, 5)
grouped_shape = group_modes(shape, 0, 2)  # Shape ((2, 3), 4, 5)
- cutlass.cute.cosize(
- a: cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster | cutlass.cute.typing.Tensor,
- mode: List[int] = [],
- *,
- loc=None,
- ip=None,
Return size of codomain of layout or tensor. Return static value if type is static.
- Parameters:
a (Union[Layout, ComposedLayout, Tensor]) – Layout, ComposedLayout, or Tensor object
mode (List[int], optional) – List of mode(s) for cosize calculation
loc (optional) – Location information for diagnostics, defaults to None
ip (optional) – Instruction pointer for diagnostics, defaults to None
- Returns:
Static size of layout or tensor (fast fold) if static, or a dynamic Value
- Return type:
Union[int, Value]
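Illustrative sketch (values in the comments follow CuTe's definition of cosize as the extent of the codomain, i.e. layout(size - 1) + 1):

layout = make_layout((4, 4), stride=(1, 4))
cosize(layout)  # 16 -- compact layout: codomain size equals domain size
padded = make_layout((4, 4), stride=(1, 8))
cosize(padded)  # 28 -- padded((3, 3)) + 1 = 27 + 1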
- cutlass.cute.size_in_bytes(
- dtype: Type[cutlass.cute.typing.Numeric],
- layout: cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster,
- *,
- loc=None,
- ip=None,
Calculate the size in bytes from a data type and layout.
- Parameters:
dtype (Type[Numeric]) – The DSL numeric data type
layout (Layout, optional) – The layout of the elements. If None, the function returns 0
loc (optional) – Location information for diagnostics, defaults to None
ip (optional) – Instruction pointer for diagnostics, defaults to None
- Returns:
The total size in bytes. Returns 0 if the layout is None
- Return type:
int
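Illustrative sketch for a compact layout (Float32 occupies 4 bytes):

layout = make_layout((4, 4))
size_in_bytes(cutlass.Float32, layout)  # 64 -- 16 elements * 4 bytes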
- cutlass.cute.flatten_to_tuple(
- a: cutlass.cute.typing.IntTuple | cutlass.cute.typing.Coord | cutlass.cute.typing.Shape | cutlass.cute.typing.Stride,
Flattens a potentially nested tuple structure into a flat tuple.
This function recursively traverses the input structure and flattens it into a single-level tuple, preserving the order of elements.
- Parameters:
a (Union[IntTuple, Coord, Shape, Stride]) – The structure to flatten
- Returns:
A flattened tuple containing all elements from the input
- Return type:
tuple
Examples:
flatten_to_tuple((1, 2, 3))       # Returns (1, 2, 3)
flatten_to_tuple(((1, 2), 3))     # Returns (1, 2, 3)
flatten_to_tuple((1, (2, (3,))))  # Returns (1, 2, 3)
- cutlass.cute.flatten(
- a: cutlass.cute.typing.IntTuple | cutlass.cute.typing.Coord | cutlass.cute.typing.Shape | cutlass.cute.typing.Stride | cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
Flattens a CuTe data structure into a simpler form.
For tuples, this function flattens the structure into a single-level tuple. For non-tuple types, it returns the input unchanged.
- Parameters:
a (Union[IntTuple, Coord, Shape, Stride, Layout, Tensor]) – The structure to flatten
- Returns:
The flattened structure
- Return type:
Union[tuple, Any]
- Raises:
NotImplementedError – If input is a Layout or Tensor
Examples:
flatten((1, 2, 3))          # Returns (1, 2, 3)
flatten(((1, 2), (3, 4)))   # Returns (1, 2, 3, 4)
flatten(5)                  # Returns 5
- cutlass.cute.product(
- a: cutlass.cute.typing.IntTuple | cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
Return product of the given IntTuple or Shape.
Computes the product of all elements in the input tuple or shape. Returns static value if type is static.
- Parameters:
a (IntTuple or Shape) – The input tuple or shape
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
Static product of IntTuple or Shape if static, otherwise a Value
- Return type:
int or Value
- Raises:
TypeError – If input is not an IntTuple or Shape
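Illustrative sketch (expected values in the comments follow the definition above):

product((2, 3, 4))    # 24
product(((2, 3), 4))  # 24 -- nested tuples are fully reduced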
- cutlass.cute.product_like(
- a: cutlass.cute.typing.IntTuple | cutlass.cute.typing.Shape,
- target_profile: cutlass.cute.typing.XTuple,
- *,
- loc=None,
- ip=None,
Return product of the given IntTuple or Shape at leaves of target_profile.
This function computes products according to the structure defined by target_profile.
- Parameters:
a (IntTuple or Shape) – The input tuple or shape
target_profile (XTuple) – The profile that guides how products are computed
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
The resulting tuple with products computed according to target_profile
- Return type:
IntTuple or Shape
- Raises:
TypeError – If inputs have incompatible types
ValueError – If inputs have incompatible shapes
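Illustrative sketch, assuming each leaf of target_profile receives the product of the corresponding mode of a (as in CuTe's C++ product_like):

product_like(((2, 3), (4, 5)), (1, 1))  # (6, 20)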
- cutlass.cute.product_each(a, *, loc=None, ip=None)#
Compute products for each component of the input.
Returns a rank(a) tuple result such that get(result, mode=[i]) == product(get(a, mode=[i]))
- Parameters:
a (IntTuple or Shape) – The input tuple or shape
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
A tuple containing products for each component
- Return type:
tuple
- Raises:
TypeError – If input is not an IntTuple or Shape
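Illustrative sketch (values in the comments follow the identity stated above):

product_each(((2, 3), (4, 5)))  # (6, 20)
product_each((2, (3, 4)))       # (2, 12)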
- cutlass.cute.prepend(
- input,
- elem,
- up_to_rank: None | int = None,
- *,
- loc=None,
- ip=None,
Extend input to rank up_to_rank by prepending elem in front of input.
This function extends the input object by prepending elements to reach a desired rank. It supports various CuTe types including shapes, layouts, tensors etc.
- Parameters:
input (Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor]) – Source to be prepended to
elem (Union[Shape, Stride, Coord, IntTuple, Tile, Layout]) – Element to prepend to input
up_to_rank (Union[None, int], optional) – The target rank after extension, defaults to None
loc (Optional[Location]) – Source location for MLIR, defaults to None
ip (Optional[InsertionPoint]) – Insertion point, defaults to None
- Returns:
The extended result with prepended elements
- Return type:
Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor]
- Raises:
ValueError – If up_to_rank is less than input’s current rank
TypeError – If input or elem has unsupported type
Examples:
# Prepend to a Shape
shape = (4,4)
prepend(shape, 2)  # Returns (2,4,4)

# Prepend to a Layout
layout = make_layout((8,8))
prepend(layout, make_layout((2,)))  # Returns (2,8,8):(1,1,8)

# Prepend with target rank
coord = (1,1)
prepend(coord, 0, up_to_rank=4)  # Returns (0,0,1,1)
- cutlass.cute.append(
- input,
- elem,
- up_to_rank: None | int = None,
- *,
- loc=None,
- ip=None,
Extend input to rank up_to_rank by appending elem to the end of input.
This function extends the input object by appending elements to reach a desired rank. It supports various CuTe types including shapes, layouts, tensors etc.
- Parameters:
input (Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor]) – Source to be appended to
elem (Union[Shape, Stride, Coord, IntTuple, Tile, Layout]) – Element to append to input
up_to_rank (Union[None, int], optional) – The target rank after extension, defaults to None
loc (Optional[Location]) – Source location for MLIR, defaults to None
ip (Optional[InsertionPoint]) – Insertion point, defaults to None
- Returns:
The extended result with appended elements
- Return type:
Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor]
- Raises:
ValueError – If up_to_rank is less than input’s current rank
TypeError – If input or elem has unsupported type
Examples:
# Append to a Shape
shape = (4,4)
append(shape, 2)  # Returns (4,4,2)

# Append to a Layout
layout = make_layout((8,8))
append(layout, make_layout((2,)))  # Returns (8,8,2):(1,8,1)

# Append with target rank
coord = (1,1)
append(coord, 0, up_to_rank=4)  # Returns (1,1,0,0)
Note
The function preserves the structure of the input while extending it
Can be used to extend tensors, layouts, shapes and other CuTe types
When up_to_rank is specified, fills remaining positions with elem
Useful for tensor reshaping and layout transformations
- cutlass.cute.prepend_ones(
- t: cutlass.cute.typing.Tensor,
- up_to_rank: None | int = None,
- *,
- loc=None,
- ip=None,
- cutlass.cute.append_ones(
- t: cutlass.cute.typing.Tensor,
- up_to_rank: None | int = None,
- *,
- loc=None,
- ip=None,
- cutlass.cute.ceil_div(
- input: cutlass.cute.typing.Shape,
- tiler: cutlass.cute.typing.Tiler,
- *,
- loc=None,
- ip=None,
Compute the ceiling division of a target shape by a tiling specification.
This function computes the number of tiles required to cover the target domain. It is equivalent to the second mode of zipped_divide(input, tiler).
- Parameters:
input (Shape) – A tuple of integers representing the dimensions of the target domain.
tiler (Union[Layout, Shape, Tile]) – The tiling specification.
loc (optional) – Optional location information for IR diagnostics.
ip (optional) – Optional instruction pointer or context for underlying IR functions.
- Returns:
A tuple of integers representing the number of tiles required along each dimension, i.e. the result of the ceiling division of the input dimensions by the tiler dimensions.
- Return type:
Shape
Example:
import cutlass.cute as cute

@cute.jit
def foo():
    input = (10, 6)
    tiler = (3, 4)
    result = cute.ceil_div(input, tiler)
    print(result)  # Outputs: (4, 2)
- cutlass.cute.round_up(
- a: cutlass.cute.typing.IntTuple,
- b: cutlass.cute.typing.IntTuple,
Rounds up elements of a using elements of b.
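Illustrative sketch, assuming each element of a is rounded up to the nearest multiple of the corresponding element of b:

round_up((10, 6), (4, 4))  # (12, 8)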
- cutlass.cute.slice_and_offset(coord, src, *, loc=None, ip=None)#
- cutlass.cute.crd2idx(
- coord: cutlass.cute.typing.Coord,
- layout,
- *,
- loc=None,
- ip=None,
Convert a multi-dimensional coordinate into a value using the specified layout.
This function computes the inner product of the flattened coordinate and stride:
index = sum(flatten(coord)[i] * flatten(stride)[i] for i in range(len(flatten(coord))))
- Parameters:
coord (Coord) – A tuple or list representing the multi-dimensional coordinate (e.g., (i, j) for a 2D layout).
layout (Layout or ComposedLayout) – A layout object that defines the memory storage layout, including shape and stride, used to compute the inner product.
loc (optional) – Optional location information for IR diagnostics.
ip (optional) – Optional instruction pointer or context for underlying IR functions.
- Returns:
The result of applying the layout transformation to the provided coordinate.
- Return type:
Any type that the layout maps to
Example:
import cutlass.cute as cute

@cute.jit
def foo():
    L = cute.make_layout((5, 4), stride=(4, 1))
    idx = cute.crd2idx((2, 3), L)  # Computed as: 2 * 4 + 3 = 11
    print(idx)

foo()  # Expected output: 11
- cutlass.cute.domain_offset(
- coord: cutlass.cute.typing.Coord,
- tensor: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- cutlass.cute.elem_less(
- lhs: cutlass.cute.typing.Shape | cutlass.cute.typing.IntTuple | cutlass.cute.typing.Coord,
- rhs: cutlass.cute.typing.Shape | cutlass.cute.typing.IntTuple | cutlass.cute.typing.Coord,
- *,
- loc=None,
- ip=None,
- cutlass.cute.filter_zeros(input, *, target_profile=None, loc=None, ip=None)#
Filter out zeros from a layout or tensor.
This function removes zero-stride dimensions from a layout or tensor. See Section 3.3 in the CuTe Whitepaper for more details on layout operations.
- Parameters:
input (Layout or Tensor) – The input layout or tensor to filter
target_profile (optional) – Target profile for the filtered result, defaults to None
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
The filtered layout or tensor with zeros removed
- Return type:
Layout or Tensor
- Raises:
TypeError – If input is not a Layout or Tensor
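Illustrative sketch, assuming stride-0 modes are replaced by size-1 modes (as in CuTe's C++ filter_zeros) rather than dropped outright:

layout = make_layout((4, 8), stride=(1, 0))
filter_zeros(layout)  # (4, 1):(1, 0)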
- cutlass.cute.filter(
- input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
Filter a layout or tensor.
This function filters a layout or tensor by removing stride-0 and size-1 modes and coalescing the result, following CuTe's filtering rules.
- Parameters:
input (Layout or Tensor) – The input layout or tensor to filter
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
The filtered layout or tensor
- Return type:
Layout or Tensor
- Raises:
TypeError – If input is not a Layout or Tensor
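Illustrative sketch, assuming the result is equivalent to coalescing filter_zeros(input) as in CuTe's C++ filter:

layout = make_layout((4, 8), stride=(1, 0))
filter(layout)  # 4:1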
- cutlass.cute.tile_to_shape(
- atom: cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster,
- trg_shape: cutlass.cute.typing.Shape,
- order: cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
- cutlass.cute.shape_div(
- lhs: cutlass.cute.typing.Shape,
- rhs: cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
Perform element-wise division of shapes.
This function performs element-wise division between two shapes.
- Parameters:
lhs (Shape) – Left-hand side shape
rhs (Shape) – Right-hand side shape
loc (optional) – Source location for MLIR, defaults to None
ip (optional) – Insertion point, defaults to None
- Returns:
The result of element-wise division
- Return type:
Shape
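Illustrative sketch (expected values in the comments follow the element-wise definition):

shape_div((8, 4), (2, 4))  # (4, 1)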
- cutlass.cute.composition(
- lhs,
- rhs: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape | cutlass.cute.typing.Tile,
- *,
- loc=None,
- ip=None,
Compose two layout representations using the CuTe layout algebra.
Compose a left-hand layout (or tensor) with a right-hand operand into a new layout R, such that for every coordinate c in the domain of the right-hand operand, the composed layout satisfies:
R(c) = A(B(c))
where A is the left-hand operand provided as lhs and B is the right-hand operand provided as rhs. In this formulation, B defines the coordinate domain while A applies its transformation to B's output, and the resulting layout R inherits the stride and shape adjustments from A.
- Satisfies:
cute.shape(cute.composition(lhs, rhs)) is compatible with cute.shape(rhs)
- Parameters:
lhs (Layout or Tensor) – The left-hand operand representing the transformation to be applied.
rhs (Layout, Shape, or Tile, or int or tuple) – The right-hand operand defining the coordinate domain. If provided as an int or tuple, it will be converted to a tile layout.
loc (optional) – Optional location information for IR diagnostics.
ip (optional) – Optional instruction pointer or context for underlying IR functions.
- Returns:
A new composed layout R, such that for all coordinates c in the domain of rhs, R(c) = lhs(rhs(c)).
- Return type:
Layout or Tensor
Example:
import cutlass.cute as cute

@cute.jit
def foo():
    # Create a layout that maps (i,j) to i*4 + j
    L1 = cute.make_layout((2, 3), stride=(4, 1))
    # Create a layout that maps (i,j) to i*3 + j
    L2 = cute.make_layout((3, 4), stride=(3, 1))
    # Compose L1 and L2
    L3 = cute.composition(L1, L2)
    # L3 now maps coordinates through L2 then L1
- cutlass.cute.complement(
- input: cutlass.cute.typing.Layout,
- cotarget: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape,
- *,
- loc=None,
- ip=None,
Compute the complement layout of the input layout with respect to the cotarget.
The complement of a layout A with respect to cotarget n is a layout A* such that for every k in Z_n, there exist a unique c in the domain of A and a unique c* in the domain of A* with k = A(c) + A*(c*).
This operation is useful for creating layouts that partition a space in complementary ways, such as row and column layouts that together cover a matrix.
- Parameters:
input (Layout) – The layout to compute the complement of
cotarget (Union[Layout, Shape]) – The target layout or shape that defines the codomain
loc (optional) – Optional location information for IR diagnostics
ip (optional) – Optional instruction pointer or context for underlying IR functions
- Returns:
The complement layout
- Return type:
Layout
Example:
import cutlass.cute as cute

@cute.jit
def foo():
    # Create a right-major layout for a 4x4 matrix
    row_layout = cute.make_layout((4, 4), stride=(4, 1))
    # Create a left-major layout that complements the row layout
    col_layout = cute.complement(row_layout, 16)
    # The two layouts are complementary under 16
- cutlass.cute.right_inverse(
- input: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.left_inverse(
- input: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.max_common_layout(
- a: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- b: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- cutlass.cute.max_common_vector(
- a: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- b: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- cutlass.cute.is_congruent(
- a: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster | cutlass.cute.typing.Tensor,
- b: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster | cutlass.cute.typing.Tensor,
Returns whether a is congruent to b.
- cutlass.cute.is_weakly_congruent(
- a: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster | cutlass.cute.typing.Tensor,
- b: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass._mlir.ir.register_value_caster | cutlass.cute.typing.Tensor,
Returns whether a is weakly congruent to b.
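Illustrative sketch, assuming the semantics of CuTe's C++ congruence predicates: congruence compares hierarchical profiles exactly, while weak congruence lets a leaf of a correspond to an entire subtree of b:

is_congruent((1, (2, 3)), (4, (5, 6)))  # True  -- same nested profile
is_congruent((1, 2), (1, (2, 3)))       # False
is_weakly_congruent(1, (2, 3))          # True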
- cutlass.cute.logical_product(
- block,
- tiler: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.zipped_product(
- block,
- tiler: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.tiled_product(
- block,
- tiler: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.flat_product(
- block,
- tiler: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.raked_product(
- block,
- tiler: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.blocked_product(
- block,
- tiler: cutlass.cute.typing.Layout,
- *,
- loc=None,
- ip=None,
- cutlass.cute.flat_divide(
- target,
- tiler: cutlass.cute.typing.Tiler,
- *,
- loc=None,
- ip=None,
- cutlass.cute.logical_divide(
- target,
- tiler: cutlass.cute.typing.Tiler,
- *,
- loc=None,
- ip=None,
- cutlass.cute.zipped_divide(
- target,
- tiler: cutlass.cute.typing.Tiler,
- *,
- loc=None,
- ip=None,
- cutlass.cute.tiled_divide(
- target,
- tiler: cutlass.cute.typing.Tiler,
- *,
- loc=None,
- ip=None,
- cutlass.cute.local_partition(
- target: cutlass.cute.typing.Tensor,
- tiler: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape,
- index,
- proj: cutlass.cute.typing.XTuple = 1,
- *,
- loc=None,
- ip=None,
- cutlass.cute.local_tile(
- input: cutlass.cute.typing.Tensor,
- tiler: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape,
- coord: cutlass.cute.typing.Coord,
- proj: cutlass.cute.typing.XTuple | None = None,
- *,
- loc=None,
- ip=None,
- cutlass.cute.make_mma_atom(
- op: MmaOp,
- *,
- loc=None,
- ip=None,
- **kwargs,
Makes an MMA Atom from an MMA Operation.
This function creates an MMA Atom from a given MMA Operation. Arbitrary keyword arguments can be provided for Op-specific additional parameters; they are currently unused.
- Parameters:
op (MmaOp) – The MMA Operation to construct an Atom for
- Returns:
The MMA Atom
- Return type:
MmaAtom
- cutlass.cute.make_tiled_mma(
- op_or_atom: Op | MmaAtom,
- atom_layout_mnk=(1, 1, 1),
- permutation_mnk=None,
- *,
- loc=None,
- ip=None,
- **kwargs,
Makes a tiled MMA from an MMA Operation or an MMA Atom.
- Parameters:
op_or_atom (Union[Op, MmaAtom]) – The MMA Operation or Atom
atom_layout_mnk (Layout) – A Layout describing the tiling of Atom across threads
permutation_mnk (Tiler) – A permutation Tiler describing the tiling of Atom across values including any permutation of such tiling
- Returns:
The resulting tiled MMA
- Return type:
TiledMma
- cutlass.cute.make_copy_atom(
- op: CopyOp,
- copy_internal_type: Type[cutlass.cute.typing.Numeric],
- *,
- loc=None,
- ip=None,
- **kwargs,
Makes a Copy Atom from a Copy Operation.
This function creates a Copy Atom from a given Copy Operation. Arbitrary keyword arguments can be provided for Op-specific additional parameters.
Example:
op = cute.nvgpu.CopyUniversalOp()
atom = cute.make_copy_atom(op, tensor_dtype, num_bits_per_copy=64)
- Parameters:
op (CopyOp) – The Copy Operation to construct an Atom for
copy_internal_type (Type[Numeric]) – An internal data type used to construct the source/destination layouts in unit of tensor elements
- Returns:
The Copy Atom
- Return type:
CopyAtom
- cutlass.cute.make_tiled_copy_tv(
- atom,
- thr_layout,
- val_layout,
- *,
- loc=None,
- ip=None,
Create a tiled copy given separate thr and val layouts. A TV partitioner is inferred from the inputs. Requires the input thr layout to be compact.
- Parameters:
atom (CopyAtom) – copy atom, e.g. simt_copy, simt_async_copy, tma_load, etc.
thr_layout (Layout) – mn -> tid layout (must be compact)
val_layout (Layout) – mn -> vid layout
loc (optional) – source location for MLIR, defaults to None
ip (optional) – insertion point, defaults to None
- Returns:
A tiled copy for partitioner
- Return type:
tiled_copy
- cutlass.cute.make_tiled_copy(atom, layout_tv, tiler_mn, *, loc=None, ip=None)#
Create a tiled copy given a TV partitioner and tiler.
- Parameters:
atom (CopyAtom) – copy atom, e.g. simt_copy, simt_async_copy, tma_load, etc.
layout_tv (Layout) – thread-value layout
tiler_mn (Tiler) – tile size in (M, N)
loc (optional) – source location for MLIR, defaults to None
ip (optional) – insertion point, defaults to None
- Returns:
The tiled copy constructed from the atom
- Return type:
tiled_copy
- cutlass.cute.make_tiled_copy_C_atom( )#
Create the smallest tiled copy that can retile LayoutC_TV for use with pipelined epilogues with subtiled stores
- cutlass.cute.basic_copy(
- src: cutlass.cute.typing.Tensor,
- dst: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
Performs a basic element-wise copy.
This function assumes the following pre-condition:
1. size(src) == size(dst)
When the src and dst shapes are static, the pre-condition is actually verified and the element-wise loop is fully unrolled.
- cutlass.cute.basic_copy_if(
- pred: cutlass.cute.typing.Tensor,
- src: cutlass.cute.typing.Tensor,
- dst: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
Performs a basic predicated element-wise copy.
This function assumes the following pre-conditions:
1. size(src) == size(dst)
2. size(src) == size(pred)
When all shapes are static, the pre-conditions are actually verified and the element-wise loop is fully unrolled.
- cutlass.cute.autovec_copy(
- src: cutlass.cute.typing.Tensor,
- dst: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
Auto-vectorizing SIMT copy policy.
Given source and destination tensors that are statically shaped, this policy figures out the largest safe vector width that the copy instruction can take and performs the copy.
- cutlass.cute.copy(
- atom: CopyAtom,
- src: cutlass.cute.typing.Tensor,
- dst: cutlass.cute.typing.Tensor,
- *,
- pred: cutlass.cute.typing.Tensor | None = None,
- loc=None,
- ip=None,
- **kwargs,
The Copy algorithm.
The “copy with Atom” expects source and destination tensors to be partitioned according to the provided Copy Atom. Some Atoms require additional Op-specific kw arguments, for example TMA copies:
cute.copy(tma_atom, src, dst, tma_bar_ptr=mbar_ptr, mcast_mask=mask)
An additional predication tensor can be provided. If the partitioned tensors have the logical profile ((ATOM_V,ATOM_REST),REST_M,...), the predication tensor must have a profile consistent with (ATOM_REST,REST_M,...).
- cutlass.cute.gemm(
- atom: MmaAtom,
- d: cutlass.cute.typing.Tensor,
- a: cutlass.cute.typing.Tensor,
- b: cutlass.cute.typing.Tensor,
- c: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- **kwargs,
The GEMM algorithm.
Computes D <- A * B + C, where C and D can alias. Note that some MMA Atoms (e.g. warpgroup-wide or tcgen05 MMAs) require manually setting an "accumulate" boolean field.
All tensors must be partitioned according to the provided MMA Atom.
- cutlass.cute.full(
- shape,
- fill_value,
- dtype: Type[cutlass.cute.typing.Numeric],
- *,
- loc=None,
- ip=None,
Return a new TensorSSA of given shape and type, filled with fill_value.
- Parameters:
shape (tuple) – Shape of the new tensor.
fill_value (scalar) – Value to fill the tensor with.
dtype (Type[Numeric]) – Data type of the tensor.
- Returns:
Tensor of fill_value with the specified shape and dtype.
- Return type:
TensorSSA
- cutlass.cute.full_like(
- a: TensorSSA,
- fill_value,
- dtype: Type[cutlass.cute.typing.Numeric] | None = None,
- *,
- loc=None,
- ip=None,
Return a full TensorSSA with the same shape and type as a given array.
- Parameters:
a (array_like) – The shape and data-type of a define these same attributes of the returned array.
fill_value (array_like) – Fill value.
dtype (Union[None, Type[Numeric]], optional) – Overrides the data type of the result, defaults to None
- Returns:
Tensor of fill_value with the same shape and type as a.
- Return type:
TensorSSA
See also
empty_like(): Return an empty array with shape and type of input.
ones_like(): Return an array of ones with shape and type of input.
zeros_like(): Return an array of zeros with shape and type of input.
full(): Return a new array of given shape filled with value.
Examples
frg = cute.make_fragment((2, 3), Float32)
a = frg.load()
b = cute.full_like(a, 1.0)
- cutlass.cute.empty_like(a, dtype=None)#
Return a new TensorSSA with the same shape and type as a given array, without initializing entries.
- Parameters:
a (TensorSSA) – The shape and data-type of a define these same attributes of the returned array.
dtype (Type[Numeric], optional) – Overrides the data type of the result, defaults to None
- Returns:
Uninitialized tensor with the same shape and type (unless overridden) as a.
- Return type:
TensorSSA
- cutlass.cute.ones_like(a, dtype=None)#
Return a TensorSSA of ones with the same shape and type as a given array.
- Parameters:
a (TensorSSA) – The shape and data-type of a define these same attributes of the returned array.
dtype (Type[Numeric], optional) – Overrides the data type of the result, defaults to None
- Returns:
Tensor of ones with the same shape and type (unless overridden) as a.
- Return type:
TensorSSA
- cutlass.cute.zeros_like(a, dtype=None, *, loc=None, ip=None)#
Return a TensorSSA of zeros with the same shape and type as a given array.
- Parameters:
a (TensorSSA) – The shape and data-type of a define these same attributes of the returned array.
dtype (Type[Numeric], optional) – Overrides the data type of the result, defaults to None
- Returns:
Tensor of zeros with the same shape and type (unless overridden) as a.
- Return type:
TensorSSA
- cutlass.cute.where( ) TensorSSA #
Return elements chosen from x or y depending on condition.
- Parameters:
- Returns:
A tensor with elements from x where condition is True, and elements from y where condition is False.
- Return type:
TensorSSA
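A sketch of the numpy-style selection; the operand construction and the (condition, x, y) argument order are illustrative assumptions:
x = cute.full((2, 3), 1.0, Float32)
y = cute.full_like(x, 0.0)
cond = cute.full((2, 3), True, Boolean)  # illustrative all-True mask
r = cute.where(cond, x, y)  # selects x where cond is True, y elsewhere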
- cutlass.cute.any_(
- x: TensorSSA,
- *,
- loc=None,
- ip=None,
Test whether any tensor element evaluates to True.
- cutlass.cute.all_(
- x: TensorSSA,
- *,
- loc=None,
- ip=None,
Test whether all tensor elements evaluate to True.
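A sketch covering both reductions (the mask construction is an illustrative assumption):
mask = cute.full((4,), True, Boolean)
has_any = cute.any_(mask)  # True if at least one element is True
has_all = cute.all_(mask)  # True only if every element is True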
- cutlass.cute.repeat_like(x, target)#
Creates an object congruent to target and filled with x.
This function recursively creates a nested tuple structure that matches the structure of the target, with each leaf node filled with the value x.
- Parameters:
x (Any) – The value to fill the resulting structure with
target (Union[tuple, Any]) – The structure to mimic
- Returns:
A structure matching target but filled with x
- Return type:
Union[tuple, Any]
Examples:
repeat_like(0, (1, 2, 3))   # Returns (0, 0, 0)
repeat_like(1, ((1, 2), 3)) # Returns ((1, 1), 1)
repeat_like(2, 5)           # Returns 2
- class cutlass.cute.ScaledBasis(value, mode)#
Bases:
object
A class representing a scaled basis element in CuTe’s layout algebra.
ScaledBasis is used to represent elements in the layout algebra, particularly in the context of composition operations. It consists of a value (the scale) and a mode that identifies the basis element.
- Parameters:
value (Union[int, Integer, Ratio, ir.Value]) – The scale value
mode (Union[int, List[int]]) – The mode identifying the basis element
- Raises:
TypeError – If mode is not an integer or list of integers
Examples:
# Create a scaled basis with integer scale and mode
sb1 = ScaledBasis(2, 0)  # 2 * E(0)
# Create a scaled basis with a Ratio scale
sb2 = ScaledBasis(Ratio(1, 2), 1)  # (1/2) * E(1)
# Create a scaled basis with a list of modes
sb3 = ScaledBasis(4, [0, 1])  # 4 * E([0, 1])
# Scaled basis elements are commonly used in layout strides
layout = make_layout((4, 8), stride=(ScaledBasis(1, 0), ScaledBasis(1, 1)))
# This creates a layout with strides (1@0, 1@1) representing
# a coordinate system where each dimension has its own basis
- __init__(value, mode) None #
- is_static() bool #
Check if the value is statically known.
- Returns:
True if the value is not a dynamic expression
- Return type:
bool
- to(dtype)#
Convert to another type.
- Parameters:
dtype (type) – The target type for conversion
- Returns:
The ScaledBasis converted to the specified type
- Raises:
TypeError – If conversion to the specified type is not supported
- property value#
Get the scale value.
- Returns:
The scale value
- property mode: List[int]#
Get the mode identifying the basis element.
- Returns:
The mode as a list of integers
- Return type:
List[int]
- class cutlass.cute.struct(cls)#
Bases:
object
Decorator to abstract C structure in Python DSL.
Usage:
# Supports base_dsl scalar int/float elements, array and nested struct:
@cute.struct
class complex:
    real : cutlass.Float32
    imag : cutlass.Float32

@cute.struct
class StorageA:
    mbarA : cute.struct.MemRange[cutlass.Int64, stage]
    compA : complex
    intA : cutlass.Int16

# Supports alignment for its elements:
@cute.struct
class StorageB:
    a: cute.struct.Align[
        cute.struct.MemRange[cutlass.Float32, size_a], 1024
    ]
    b: cute.struct.Align[
        cute.struct.MemRange[cutlass.Float32, size_b], 1024
    ]
    x: cute.struct.Align[cutlass.Int32, 16]
    compA: cute.struct.Align[complex, 16]

# Statically get size and alignment:
size = StorageB.__sizeof__()
align = StorageB.__alignof__()

# Allocate and reference elements:
storage = allocator.allocate(StorageB)
storage.a[0] ...
storage.x ...
storage.compA.real ...
- Parameters:
cls – The struct class with annotations.
- Returns:
The decorated struct class.
- class _MemRangeMeta(name, bases, dct)#
Bases:
type
A metaclass for creating MemRange classes.
This metaclass is used to dynamically create MemRange classes with specific data types and sizes.
- Variables:
_dtype – The data type of the MemRange.
_size – The size of the MemRange.
- _dtype = None#
- _size = None#
- property size#
- property elem_width#
- property size_in_bytes#
- class MemRange#
Bases:
object
Defines a range of memory by MemRange[T, size].
- class _MemRangeData(dtype, size, base)#
Bases:
object
Represents a range of memory.
- Parameters:
dtype – The data type.
size – The size of the memory range in bytes.
base – The base address of the memory range.
- __init__(dtype, size, base)#
Initializes a new memory range.
- Parameters:
dtype – The data type.
size – The size of the memory range in bytes.
base – The base address of the memory range.
- data_ptr()#
Returns a pointer to the start of the data in this memory range.
- Returns:
A pointer to the start of the memory range.
- Raises:
AssertionError – If the size of the memory range is not greater than zero.
- get_tensor(layout, swizzle=None, dtype=None)#
Creates a tensor from the memory range.
- Parameters:
layout – The layout of the tensor.
swizzle – Optional swizzle pattern.
dtype – Optional data type; defaults to the memory range’s data type if not specified.
- Returns:
A tensor representing the memory range.
- Raises:
TypeError – If the layout is incompatible with the swizzle.
AssertionError – If the size of the memory range is not greater than zero.
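A sketch of viewing a MemRange as a tensor, reusing the hypothetical storage and size_a names from the struct usage example above (the layout is illustrative):
t = storage.a.get_tensor(cute.make_layout(size_a))  # 1-D view over the MemRange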
- class _AlignMeta(name, bases, dct)#
Bases:
type
Aligns the given object by setting its alignment attribute.
- Parameters:
v – The object to align. Must be a struct, MemRange, or a scalar type.
align – The alignment value to set.
- Returns:
A copy of the object with the specified alignment.
- Raises:
TypeError – If the object is not a struct, MemRange, or a scalar type.
- class Align#
Bases:
object
Aligns the given type by Align[T, alignment].
- static _is_scalar_type(dtype)#
Checks if the given type is a scalar numeric type.
- Parameters:
dtype – The type to check.
- Returns:
True if the type is a subclass of Numeric, False otherwise.
- __init__(cls)#
Initializes a new struct decorator instance.
- Parameters:
cls – The class representing the structured data type.
- Raises:
TypeError – If the struct is empty.
- size_in_bytes() int #
Returns the size of the struct in bytes.
- Returns:
The size of the struct.