cutlass.cute
- class cutlass.cute.Swizzle(*args: Any, **kwargs: Any)
- Bases: Value
- Swizzle is a transformation that permutes the elements of a layout. Swizzles are used to rearrange data elements to improve memory access patterns and computational efficiency.
- Swizzle is defined by three parameters:
  - MBase: the number of least-significant bits to keep constant
  - BBits: the number of bits in the mask
  - SShift: the distance to shift the mask
- The mask is applied to the bits of the offset produced by the layout, as the bit diagram shows:
```text
0bxxxxxxxxxxxxxxxYYYxxxxxxxZZZxxxx
                              ^--^  MBase is the number of least-sig bits to keep constant
                 ^-^       ^-^      BBits is the number of bits in the mask
                   ^---------^      SShift is the distance to shift the YYY mask
                                    (pos shifts YYY to the right, neg shifts YYY to the left)

e.g. Given
0bxxxxxxxxxxxxxxxxYYxxxxxxxxxZZxxx
the result is
0bxxxxxxxxxxxxxxxxYYxxxxxxxxxAAxxx where AA = ZZ xor YY
```
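To make the bit diagram concrete, the following plain-Python sketch applies the same XOR permutation to an integer offset. The function name `swizzle` and its argument order are hypothetical illustrations, not part of the cutlass.cute API; the negative-shift branch follows the sign convention stated in the diagram.

```python
def swizzle(offset: int, b_bits: int, m_base: int, s_shift: int) -> int:
    """XOR the BBits-wide field YYY into ZZZ, as in the diagram (hypothetical helper)."""
    bit_mask = (1 << b_bits) - 1
    # YYY is the field that gets read; for a positive shift it sits SShift bits above ZZZ.
    yyy_mask = bit_mask << (m_base + max(0, s_shift))
    yyy = offset & yyy_mask
    # Positive SShift moves YYY to the right onto ZZZ; negative SShift moves it to the left.
    moved = yyy >> s_shift if s_shift >= 0 else yyy << -s_shift
    return offset ^ moved


# BBits=3, MBase=4, SShift=3: bits 7..9 (YYY) are XOR-ed into bits 4..6 (ZZZ).
print(swizzle(0b1010000000, 3, 4, 3))  # 720, i.e. 0b1011010000
```

Because YYY and ZZZ do not overlap when |SShift| >= BBits, applying the same swizzle twice returns the original offset, so the mapping is a permutation of the offsets.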
- cutlass.cute.E(mode: int | List[int]) -> ScaledBasis
- Create a unit ScaledBasis element with the specified mode.
- This function creates a ScaledBasis with value 1 and the given mode. The mode represents the coordinate axis or dimension in the layout.
- Parameters:
- mode (Union[int, List[int]]) – The mode (dimension) for the basis element, either a single integer or a list of integers
- Returns:
- A ScaledBasis with value 1 and the specified mode
- Return type:
- ScaledBasis
- Raises:
- TypeError – If mode is not an integer or a list
- Examples:
```python
# Create a basis element for the first dimension (mode 0)
e0 = E(0)
# Create a basis element for the second dimension (mode 1)
e1 = E(1)
# Create a basis element for a hierarchical dimension
e_hier = E([0, 1])
```
- class cutlass.cute.ScaledBasis(value, mode)
- Bases: object
- A class representing a scaled basis element in CuTe’s layout algebra.
- ScaledBasis is used to represent elements in the layout algebra, particularly in the context of composition operations. It consists of a value (the scale) and a mode that identifies which basis element is scaled.
- Parameters:
- value (Union[int, Integer, Ratio, ir.Value]) – The scale value
- mode (Union[int, List[int]]) – The mode identifying the basis element

- Raises:
- TypeError – If mode is not an integer or list of integers
- Examples:
```python
# Create a scaled basis with integer scale and mode
sb1 = ScaledBasis(2, 0)  # 2 * E(0)
# Create a scaled basis with a Ratio scale
sb2 = ScaledBasis(Ratio(1, 2), 1)  # (1/2) * E(1)
# Create a scaled basis with a list of modes
sb3 = ScaledBasis(4, [0, 1])  # 4 * E([0, 1])

# Scaled basis elements are commonly used in layout strides
layout = make_layout((4, 8), stride=(ScaledBasis(2, 0), ScaledBasis(1, 1)))
# This creates a layout with strides (2@0, 1@1) representing
# a coordinate system where each dimension has its own basis

# Example: Mapping coordinates to indices using the layout
coord = (2, 3)
idx = crd2idx(coord, layout)  # Maps (2, 3) to (4, 3)
```
- __init__(value, mode) -> None
- is_static() -> bool
- Check if the value is statically known.
- Returns:
- True if the value is not a dynamic expression
- Return type:
- bool

- to(dtype)
- Convert to another type.
- Parameters:
- dtype (type) – The target type for conversion
- Returns:
- The ScaledBasis converted to the specified type
- Raises:
- TypeError – If conversion to the specified type is not supported

- property value
- Get the scale value.
- Returns:
- The scale value

- property mode: List[int]
- Get the mode identifying the basis element.
- Returns:
- The mode as a list of integers
- Return type:
- List[int]
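The strides-as-scaled-bases idea from the ScaledBasis example can be modeled in plain Python by treating each stride `scale@mode` as a (scale, mode) pair and accumulating `coord * scale` along the named axis. This is a hypothetical sketch restricted to integer modes and integer scales; it does not use or reproduce the cutlass.cute implementation.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical stand-in for ScaledBasis(scale, mode): the pair (scale, mode), i.e. scale@mode.
Basis = Tuple[int, int]


def crd2idx_scaled(coord: Sequence[int], strides: Sequence[Basis], rank: int) -> Tuple[int, ...]:
    """Map a flat coordinate through scaled-basis strides to a rank-`rank` result."""
    acc: Dict[int, int] = {}
    for c, (scale, mode) in zip(coord, strides):
        # Each coordinate contributes c * scale along the axis named by `mode`.
        acc[mode] = acc.get(mode, 0) + c * scale
    return tuple(acc.get(m, 0) for m in range(rank))


# Strides (2@0, 1@1), as in the make_layout example in the ScaledBasis examples
print(crd2idx_scaled((2, 3), [(2, 0), (1, 1)], rank=2))  # (4, 3)
```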
 
 
- class cutlass.cute.Atom(op: Op, trait: Trait)
- Bases: ABC
- Atom base class.
- An Atom is the composition of:
  - an MMA or Copy Operation;
  - an internal MMA or Copy Trait.
- An Operation is a pure Python class that is used to model a specific MMA or Copy instruction. The Trait wraps the underlying IR Value and provides access to the metadata of the instruction encoded using CuTe Layouts. When the Trait can be constructed straightforwardly from an Operation, the make_mma_atom or make_copy_atom API should be used. There are cases where constructing the metadata is not trivial and requires more information, for example to determine the number of bytes copied per TMA instruction (“the TMA vector length”). In such cases, dedicated helper functions are provided with an appropriate API so that the Atom is constructed internally in an optimal fashion for the user.
- __init__(op: Op, trait: Trait)
- property op: Op
- property type
- set(modifier, value, *, loc=None, ip=None) -> None
- Sets runtime fields of the Atom.
- Some Atoms have runtime state, for example a tcgen05 MMA Atom:
```python
tiled_mma = cute.make_tiled_mma(some_tcgen05_mma_op)
tiled_mma.set(cute.nvgpu.tcgen05.Field.ACCUMULATE, True)
```
- The set method lets the user modify such runtime state. Modifiable fields are provided by arch-specific enumerations, for example tcgen05.Field. The Atom instance internally validates both the field and the value the user provides for it.
- get(field, *, loc=None, ip=None) -> Any
- Gets runtime fields of the Atom.
- Some Atoms have runtime state, for example a tcgen05 MMA Atom:
```python
tiled_mma = cute.make_tiled_mma(some_tcgen05_mma_op)
accum = tiled_mma.get(cute.nvgpu.tcgen05.Field.ACCUMULATE)
```
- The get method lets the user read such runtime state. Accessible fields are provided by arch-specific enumerations, for example tcgen05.Field. The Atom instance internally validates the requested field.
- _unpack(*, loc=None, ip=None, **kwargs) -> cutlass._mlir.ir.Value
 
- class cutlass.cute.MmaAtom(op: Op, trait: Trait)
- Bases: Atom
- The MMA Atom class.
- property thr_id: cutlass.cute.typing.Layout
- property shape_mnk: cutlass.cute.typing.Shape
- property tv_layout_A: cutlass.cute.typing.Layout
- property tv_layout_B: cutlass.cute.typing.Layout
- property tv_layout_C: cutlass.cute.typing.Layout
- make_fragment_A(input, *, loc=None, ip=None)
- make_fragment_B(input, *, loc=None, ip=None)
- make_fragment_C(input, *, loc=None, ip=None)
 
- class cutlass.cute.CopyAtom(op: Op, trait: Trait)
- Bases: Atom
- The Copy Atom class.
- property value_type: Type[cutlass.cute.typing.Numeric]
- property thr_id: cutlass.cute.typing.Layout
- property layout_src_tv: cutlass.cute.typing.Layout
- property layout_dst_tv: cutlass.cute.typing.Layout
 
- class cutlass.cute.TiledCopy(op: Op, trait: Trait)
- Bases: CopyAtom
- The tiled Copy class.
- property layout_tv_tiled: cutlass.cute.typing.Layout
- property tiler_mn: cutlass.cute.typing.Tile
- property layout_src_tv_tiled: cutlass.cute.typing.Layout
- property layout_dst_tv_tiled: cutlass.cute.typing.Layout
- property size: int
- retile(src, *, loc=None, ip=None)
 
- class cutlass.cute.TiledMma(op: Op, trait: Trait)
- Bases: MmaAtom
- The tiled MMA class.
- property tv_layout_A_tiled: cutlass.cute.typing.Layout
- property tv_layout_B_tiled: cutlass.cute.typing.Layout
- property tv_layout_C_tiled: cutlass.cute.typing.Layout
- property permutation_mnk: cutlass.cute.typing.Tile
- property thr_layout_vmnk: cutlass.cute.typing.Layout
- property size: int
- get_tile_size(mode_idx: int) -> cutlass.cute.typing.Shape
- _partition_shape(operand_id, shape, *, loc=None, ip=None)
- partition_shape_A(shape_mk, *, loc=None, ip=None)
- partition_shape_B(shape_nk, *, loc=None, ip=None)
- partition_shape_C(shape_mn, *, loc=None, ip=None)
- _thrfrg(operand_id, input: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- _thrfrg(operand_id, input: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- _thrfrg_A(input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- _thrfrg_B(input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- _thrfrg_C(input: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, *, loc=None, ip=None)
 
- class cutlass.cute.ThrMma(op: Op, trait: Trait, thr_idx: int | cutlass.cute.typing.Int32)
- Bases: TiledMma
- The thread MMA class for modeling a thread-slice of a tiled MMA.
- __init__(op: Op, trait: Trait, thr_idx: int | cutlass.cute.typing.Int32)
- property thr_idx
- partition_A(input_mk: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- partition_B(input_nk: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- partition_C(input_mn: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
 
- class cutlass.cute.ThrCopy(op: Op, trait: Trait, thr_idx: int | cutlass.cute.typing.Int32)
- Bases: TiledCopy
- The thread Copy class for modeling a thread-slice of a tiled Copy.
- __init__(op: Op, trait: Trait, thr_idx: int | cutlass.cute.typing.Int32)
- property thr_idx
- partition_S(src: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- partition_D(dst: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
 
- class cutlass.cute.TensorSSA(*args: Any, **kwargs: Any)
- Bases: ArithValue
- A class representing thread-local data from a CuTe Tensor, with immutable value semantics.
- Parameters:
- value (ir.Value) – Flattened vector, as an ir.Value, holding the logical data of the SSA tensor
- shape (Shape) – The nested CuTe shape of the vector
- dtype (Type[Numeric]) – Data type of the tensor elements

- Variables:
- _shape – The nested CuTe shape of the vector
- _dtype – Data type of the tensor elements

- Raises:
- ValueError – If shape is not static
- __init__(value, shape: cutlass.cute.typing.Shape, dtype: Type[cutlass.cute.typing.Numeric])
- Initialize a new TensorSSA object.
- Parameters:
- value (ir.Value) – Flattened vector, as an ir.Value, holding the logical data of the SSA tensor
- shape (Shape) – The nested CuTe shape of the vector
- dtype (Type[Numeric]) – Data type of the tensor elements

- Raises:
- ValueError – If shape is not static
 
- property dtype: Type[cutlass.cute.typing.Numeric]
- property element_type: Type[cutlass.cute.typing.Numeric]
- property shape
- _apply_op(op, other: TensorSSA, flip=False, *, loc, ip)
- _apply_op(op, other: cutlass.cutlass_dsl.cutlass_arith.ArithValue, flip=False, *, loc, ip)
- _apply_op(op, other: int | float | bool, flip=False, *, loc, ip)
- broadcast_to(target_shape: cutlass.cute.typing.Shape, *, loc=None, ip=None)
- Broadcast the tensor to the target shape.
- _flatten_shape_and_coord(crd, *, loc=None, ip=None)
- _build_result(res_vect, res_shp, *, loc=None, ip=None)
- to(dtype: Type[cutlass.cute.typing.Numeric], *, loc=None, ip=None)
- Convert the tensor to a different numeric type.
- Parameters:
- dtype (Type[Numeric]) – The target numeric type to cast to.
- Returns:
- A new tensor with the same shape but with elements cast to the target type.
- Return type:
- TensorSSA
- Raises:
- TypeError – If dtype is not a subclass of Numeric.
- NotImplementedError – If dtype is an unsigned integer type.
 
- ir_value(*, loc=None, ip=None)
- ir_value_int8(*, loc=None, ip=None)
- Returns the int8 IR value of a Boolean tensor. Use ir_value_int8() when a Boolean TensorSSA needs to be stored.
- Parameters:
- loc (Optional[Location], optional) – Source location information, defaults to None
- ip (Optional[InsertionPoint], optional) – Insertion point for MLIR operations, defaults to None

- Returns:
- The int8 value of this Boolean tensor
- Return type:
- ir.Value
 
- reduce(op, init_val, reduction_profile: cutlass.cute.typing.Coord, *, loc=None, ip=None)
- Perform a reduction over the selected modes using the given predefined reduction operator.
- Parameters:
- op (operator) – The reduction operator to use (operator.add or operator.mul)
- init_val (numeric) – The initial value for the reduction
- reduction_profile (Coord) – Specifies which modes to reduce. Modes marked with None are kept.

- Returns:
- The reduced tensor
- Return type:
- Examples:
```text
reduce(f32 o (4,))                                              => f32
reduce(f32 o (4, 5))                                            => f32
reduce(f32 o (4, (5, 4)), reduction_profile=(None, 1))          => f32 o (4,)
reduce(f32 o (4, (5, 4)), reduction_profile=(None, (None, 1)))  => f32 o (4, (5,))
```
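The shape rules in the reduce examples can be reproduced with a small recursive function over nested tuples: a None entry in the profile keeps that (sub)mode, and any other leaf (such as 1) reduces it away. This is a plain-Python sketch of the shape bookkeeping only, inferred from the examples; `reduced_shape` is a hypothetical helper, `()` stands for a scalar result, and the sketch does not perform the arithmetic reduction itself.

```python
from typing import Any

_REDUCED = object()  # sentinel: this (sub)mode was reduced away


def _kept(shape: Any, profile: Any) -> Any:
    if profile is None:
        return shape  # None keeps the whole (sub)mode
    if isinstance(profile, tuple):
        parts = [_kept(s, p) for s, p in zip(shape, profile)]
        kept = tuple(p for p in parts if p is not _REDUCED)
        return kept if kept else _REDUCED
    return _REDUCED  # any non-None leaf (e.g. 1) reduces the whole (sub)mode


def reduced_shape(shape: Any, profile: Any) -> Any:
    """Shape left after reduction; () stands for a scalar result (hypothetical helper)."""
    result = _kept(shape, profile)
    return () if result is _REDUCED else result


print(reduced_shape((4, (5, 4)), (None, (None, 1))))  # (4, (5,))
```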
 
- cutlass.cute.assume(src, divby=None, *, loc=None, ip=None)
- cutlass.cute.is_static(x: cutlass._mlir.ir.Type | cutlass._mlir.ir.Value | cutlass.cute.typing.XTuple)
- Check if a value is statically known at compile time.
- In CuTe, static values are those whose values are known at compile time, as opposed to dynamic values, which are only known at runtime.
- Parameters:
- x (Union[ir.Type, ir.Value, XTuple]) – The value to check 
- Returns:
- True if the value is static, False otherwise 
- Return type:
- bool 
- Raises:
- TypeError – If an unsupported type is provided 
 
- cutlass.cute.has_underscore(a: cutlass.cute.typing.XTuple) -> bool
- cutlass.cute.printf(*args, loc=None, ip=None) -> None
- Print one or more values with optional formatting.
- This function provides printf-style formatted printing. It can print values directly or format them using C-style format strings. It supports printing various types, including layouts, numeric values, tensors, and other CuTe objects.
- The function accepts either:
  1. a list of values to print directly, or
  2. a format string followed by values to format.
- Parameters:
- args (Any) – Variable-length argument list containing either one or more values to print directly, or a format string followed by values to format
- loc (Optional[Location]) – Source location information for debugging, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for code generation, defaults to None 
 
- Raises:
- ValueError – If no arguments are provided 
- TypeError – If an unsupported argument type is passed 
 
- Examples:
- Direct printing of values:
```python
a = cute.make_layout(shape=(10, 10), stride=(10, 1))
b = cutlass.Float32(1.234)
cute.printf(a, b)  # Prints values directly
```
- Formatted printing:
```python
# Using format string with generic format specifiers
cute.printf("a={}, b={}", a, b)
# Using format string with C-style format specifiers
cute.printf("a={}, b=%.2f", a, b)
```
- cutlass.cute.print_tensor(tensor: cutlass.cute.typing.Tensor | TensorSSA, *, verbose: bool = False, loc=None, ip=None)
- Print the contents of a tensor in a human-readable format.
- Outputs the tensor data in a structured format showing both metadata and the actual data values. The output includes tensor type information, layout details, and a formatted array representation of the values.
- Parameters:
- tensor (Union[Tensor, TensorSSA]) – The tensor to print
- verbose (bool) – If True, includes additional debug information in the output
- loc (source location, optional) – Source location where it’s called, defaults to None
- ip (insertion point, optional) – Insertion point for IR generation, defaults to None

- Raises:
- NotImplementedError – If the tensor type doesn’t support trivial dereferencing
- Example output:
```text
tensor(raw_ptr<@..., Float32, generic, align(4)> o (8,5):(5,1), data=
       [[-0.4326, -0.5434, 0.1238, 0.7132, 0.8042],
        [-0.8462, 0.9871, 0.4389, 0.7298, 0.6948],
        [ 0.3426, 0.5856, 0.1541, 0.2923, 0.6976],
        [-0.1649, 0.8811, 0.1788, 0.1404, 0.2568],
        [-0.2944, 0.8593, 0.4171, 0.8998, 0.1766],
        [ 0.8814, 0.7919, 0.7390, 0.4566, 0.1576],
        [ 0.9159, 0.7577, 0.6918, 0.0754, 0.0591],
        [ 0.6551, 0.1626, 0.1189, 0.0292, 0.8655]])
```
- cutlass.cute.pretty_str(arg) -> str
- Constructs a concise, readable pretty string.
- cutlass.cute.make_layout(shape: cutlass.cute.typing.Shape, *, stride: cutlass.cute.typing.Stride | None = None, loc=None, ip=None)
- Create a CuTe Layout object from shape and optional stride information.
- A Layout in CuTe represents the mapping between logical and physical coordinates of a tensor. This function creates a Layout object that defines how tensor elements are arranged in memory.
- Parameters:
- shape (Shape) – Shape of the layout defining the size of each mode 
- stride (Union[Stride, None]) – Optional stride values for each mode, defaults to None 
- loc (Optional[Location]) – Source location information, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None 
 
- Returns:
- A new Layout object with the specified shape and stride 
- Return type:
- Layout 
- Examples:
```python
# Create a 2D compact left-most layout with shape (4,4)
layout = make_layout((4,4))  # compact left-most layout, stride=(1,4)
# Create a left-most layout with custom strides
layout = make_layout((4,4), stride=(1,4))  # left-most layout with strides (1,4)
# Create a layout for a 3D tensor
layout = make_layout((32,16,8))  # left-most layout, stride=(1,32,512)
# Create a layout with custom strides
layout = make_layout((2,2,2), stride=(4,1,2))  # layout with strides (4,1,2)
```
- Note
- If stride is not provided, a default compact left-most stride is computed from the shape
- The resulting layout maps logical coordinates to physical memory locations
- The layout object can be used for tensor creation and to describe memory access patterns
- Strides can be used to implement:
  - row-major vs column-major layouts
  - padding and alignment
  - blocked/tiled memory arrangements
  - interleaved data formats
- stride is a keyword-only argument to improve readability, e.g.:
  - make_layout((3,4), (1,4)) is easily confused with make_layout(((3,4), (1,4)))
  - make_layout((3,4), stride=(1,4)) is more readable
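The default compact left-most (column-major) stride and the coordinate-to-offset mapping for flat shapes can be sketched in plain Python as follows. `compact_left_strides` and `crd2idx_flat` are hypothetical helpers that only handle flat (non-nested) integer shapes; they illustrate the arithmetic, not the cutlass.cute implementation.

```python
from typing import Sequence, Tuple


def compact_left_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Compact left-most strides: the first mode varies fastest (stride 1)."""
    strides, running = [], 1
    for extent in shape:
        strides.append(running)
        running *= extent  # the next mode steps over everything to its left
    return tuple(strides)


def crd2idx_flat(coord: Sequence[int], stride: Sequence[int]) -> int:
    """Offset of a coordinate: the inner product of coordinate and stride."""
    return sum(c * d for c, d in zip(coord, stride))


print(compact_left_strides((32, 16, 8)))  # (1, 32, 512)
print(crd2idx_flat((1, 1, 1), (4, 1, 2)))  # 7, using the (2,2,2):(4,1,2) example
```

Enumerating every coordinate of (4,4):(1,4) with `crd2idx_flat` yields each offset 0..15 exactly once, which is what makes that layout compact.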
 
- cutlass.cute.recast_layout(new_type_bits, old_type_bits, src_layout, *, loc=None, ip=None)
- cutlass.cute.make_identity_layout(shape: cutlass.cute.typing.Shape, *, loc=None, ip=None)
- Create an identity layout with the given shape.
- An identity layout maps logical coordinates directly to themselves without any transformation. This is equivalent to a layout with stride (1@0,1@1,…,1@(N-1)).
- Parameters:
- shape (Shape) – The shape of the layout 
- loc (Optional[Location]) – Source location information, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None 
 
- Returns:
- A new identity Layout object with the specified shape 
- Return type:
- Layout 
- Examples:
```python
# Create a 2D identity layout with shape (4,4)
layout = make_identity_layout((4,4))  # stride=(1@0,1@1)
# Create a 3D identity layout
layout = make_identity_layout((32,16,8))  # stride=(1@0,1@1,1@2)
```
- Note
- An identity layout is a special case where each coordinate maps to itself
- Useful for direct coordinate mapping without any transformation 
 
- cutlass.cute.make_ordered_layout(shape: cutlass.cute.typing.Shape, order: cutlass.cute.typing.Shape, *, loc=None, ip=None)
- Create a layout with a specific ordering of dimensions.
- This function creates a layout whose strides are assigned according to the specified order parameter, allowing custom dimension ordering in the layout.
- Parameters:
- shape (Shape) – The shape of the layout
- order (Shape) – The rank of each mode in the stride ordering; the mode with the smallest value is the fastest-varying
- loc (Optional[Location]) – Source location information, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None 
 
- Returns:
- A new Layout object with the specified shape and dimension ordering 
- Return type:
- Layout 
- Examples:
```python
# Create a row-major layout
layout = make_ordered_layout((4,4), order=(1,0))  # stride=(4,1)
# Create a column-major layout
layout = make_ordered_layout((4,4), order=(0,1))  # stride=(1,4)
# Create a layout with custom dimension ordering for a 3D tensor
layout = make_ordered_layout((32,16,8), order=(2,0,1))  # stride=(128,1,16)
```
- Note
- order[i] gives the rank of mode i in the stride ordering: the mode with the smallest order value is the fastest-varying (stride 1) and the mode with the largest order value is the slowest-varying
- For a 2D tensor, (0,1) creates a column-major layout, while (1,0) creates a row-major layout
- The length of order must match the rank of the shape
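The stride computation implied by these examples can be sketched in plain Python: modes are visited in increasing order value, and each receives the product of the extents of the modes visited before it. `ordered_strides` is a hypothetical helper for flat shapes, written to reproduce the documented strides, not the library's code.

```python
from typing import Sequence, Tuple


def ordered_strides(shape: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Compact strides where order[i] is the rank of mode i (0 = fastest-varying)."""
    if len(shape) != len(order):
        raise ValueError("order must have the same rank as shape")
    strides = [0] * len(shape)
    running = 1
    # Visit modes from the smallest order value (fastest) to the largest (slowest)
    for mode in sorted(range(len(shape)), key=lambda i: order[i]):
        strides[mode] = running
        running *= shape[mode]
    return tuple(strides)


print(ordered_strides((32, 16, 8), (2, 0, 1)))  # (128, 1, 16)
```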
 
- cutlass.cute.make_composed_layout(inner, offset: cutlass.cute.typing.IntTuple, outer: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- Create a composed layout by composing an inner transformation with an outer layout.
- A composed layout applies a sequence of transformations to coordinates. The composition is defined as (inner ∘ offset ∘ outer), where the operations are applied from right to left.
- Parameters:
- inner (Union[Layout, Swizzle]) – The inner transformation (can be a Layout or Swizzle) 
- offset (IntTuple) – An integral offset applied between transformations 
- outer (Layout) – The outer (right-most) layout that is applied first 
- loc (Optional[Location]) – Source location information, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for IR generation, defaults to None 
 
- Returns:
- A new ComposedLayout representing the composition 
- Return type:
- ComposedLayout 
- Examples:
```python
# Create a basic layout
inner = make_layout(...)
outer = make_layout((4,4), stride=(E(0), E(1)))
# Create a composed layout with an offset
composed = make_composed_layout(inner, (2,0), outer)
```
- Note
- The composition applies transformations in the order: outer → offset → inner
- The stride divisibility condition must be satisfied for valid composition 
- Certain compositions (like Swizzle with scaled basis) are invalid and will raise errors 
- Composed layouts inherit many properties from the outer layout 
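The right-to-left order (outer first, then the offset, then inner) can be illustrated with plain Python functions on integer offsets. This is a simplified, hypothetical sketch: `compose`, `affine`, `swizzle` and the one-bit swizzle parameters are illustrative choices, and the offset is a plain integer rather than a coordinate-valued offset such as (2,0).

```python
from typing import Callable, Sequence

Coord = Sequence[int]


def affine(stride: Sequence[int]) -> Callable[[Coord], int]:
    """A flat layout viewed as a function: coord -> sum(c_i * d_i)."""
    return lambda coord: sum(c * d for c, d in zip(coord, stride))


def swizzle(offset: int, b_bits: int, m_base: int, s_shift: int) -> int:
    """XOR-swizzle for a non-negative s_shift (hypothetical helper)."""
    yyy_mask = ((1 << b_bits) - 1) << (m_base + s_shift)
    return offset ^ ((offset & yyy_mask) >> s_shift)


def compose(inner: Callable[[int], int], offset: int,
            outer: Callable[[Coord], int]) -> Callable[[Coord], int]:
    """inner ∘ offset ∘ outer: apply outer, add the offset, then apply inner."""
    return lambda coord: inner(offset + outer(coord))


outer = affine((4, 1))  # row-major (4,4):(4,1)
composed = compose(lambda x: swizzle(x, 1, 1, 2), 8, outer)
print(composed((1, 2)))  # outer -> 6, +8 -> 14, swizzle -> 12
```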
 
- cutlass.cute.make_layout_tv(thr_layout: cutlass.cute.typing.Layout, val_layout: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- Create a thread-value layout for partitioning data tensors.
- This function creates a thread-value layout that maps between (thread_idx, value_idx) coordinates and logical (M,N) coordinates. The thread layout must be compact to ensure proper partitioning.
- This implements the thread-value partitioning pattern shown in Figure TVLayout, where data is partitioned across threads and values within each thread.
- Parameters:
- thr_layout (Layout) – Layout mapping from (TileM,TileN) coordinates to thread IDs (must be compact)
- val_layout (Layout) – Layout mapping from (ValueM,ValueN) coordinates to value IDs within each thread
- loc (Optional[Location], optional) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint], optional) – Insertion point, defaults to None 
 
- Returns:
- A tuple containing tiler_mn and layout_tv
- Return type:
- Tuple[Shape, Layout]
- where:
- tiler_mn is the tiler, and shape(tiler_mn) is compatible with shape(zipped_divide(x, tiler_mn))[0]
- layout_tv: Thread-value layout mapping (thread_idx, value_idx) -> (M,N)

- Example:
```python
tiler_mn, layout_tv = cute.make_layout_tv(
    cute.make_layout((4, 8), stride=(8, 1)),
    cute.make_layout(2, stride=1),
)
```
- The code above creates a TV layout that maps between thread/value coordinates and the logical coordinates in an 8x8 matrix with:
  - thread block layout (4,8):(8,1)
  - 2 elements per thread
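For this example, the following plain-Python sketch checks the tiler arithmetic and one possible (thread_idx, value_idx) -> (m, n) assignment. The thread part is the inverse of the given thread layout (4,8):(8,1); the value placement (each thread's 2 values adjacent along M, with the 1-D value layout read as shape (2,1)) is an assumption made for illustration, not a statement of what make_layout_tv returns.

```python
from typing import Tuple

THR_SHAPE = (4, 8)  # thread layout (4,8):(8,1)
VAL_SHAPE = (2, 1)  # assumption: value layout 2:1 read as 2 values along M

# Tiler: elementwise product of thread and value extents -> (8, 8)
TILER_MN = tuple(t * v for t, v in zip(THR_SHAPE, VAL_SHAPE))


def tv_to_mn(tid: int, vid: int) -> Tuple[int, int]:
    """Hypothetical (thread_idx, value_idx) -> (m, n) map for this example."""
    # Invert the thread layout (4,8):(8,1): tid = 8 * m_t + n_t
    m_t, n_t = divmod(tid, THR_SHAPE[1])
    # Assumed placement: a thread's values are contiguous along M
    return (m_t * VAL_SHAPE[0] + vid, n_t)


print(TILER_MN)  # (8, 8)
print(tv_to_mn(9, 1))  # (3, 1)
```

Under this assumption the 32 threads x 2 values cover all 64 cells of the 8x8 tile exactly once.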
 
- cutlass.cute.make_layout_image_mask(lay: cutlass.cute.typing.Layout, coord: cutlass.cute.typing.Coord, mode: int, *, loc=None, ip=None)
- Makes a 16-bit integer mask of the image of a layout sliced at a given mode, accounting for the offset given by the input coordinate in the other modes.
- cutlass.cute.make_ptr(dtype: Type[cutlass.cute.typing.Numeric] | None, value, mem_space: cutlass.cute.typing.AddressSpace = cutlass.cute.typing.AddressSpace.generic, *, assumed_align=None, loc=None, ip=None)
- cutlass.cute.make_tensor(iterator, layout: cutlass.cute.typing.Shape | cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout, *, loc=None, ip=None)
- Creates a tensor by composing an engine (iterator/pointer) with a layout.
- A tensor is defined as T = E ∘ L, where E is an engine (array, pointer, or counting iterator) and L is a layout that maps logical coordinates to physical offsets. The tensor evaluates a coordinate by applying the layout mapping and dereferencing the engine at the resulting offset.
- Parameters:
- iterator (Union[Pointer, IntTuple]) – Engine component (pointer, iterator, or counting iterator) that provides data access capabilities 
- layout (Union[Shape, Layout, ComposedLayout]) – Layout component that defines the mapping from logical coordinates to physical offsets 
- loc (Optional[Location]) – Source location for MLIR operation tracking, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for MLIR operation, defaults to None 
 
- Returns:
- A tensor object representing the composition E ∘ L 
- Return type:
- Tensor 
- Raises:
- ValueError – If iterator type is not supported 
- Examples:
```python
# Create a tensor with row-major layout
layout = make_layout((64, 128), stride=(128, 1))
tensor = make_tensor(ptr, layout)
# Create a tensor with hierarchical layout
layout = make_layout(((128, 8), (1, 4, 1)), stride=((32, 1), (0, 8, 4096)))
tensor = make_tensor(smem_ptr, layout)
# Create a coord tensor
layout = make_layout(2, stride=16 * E(0))
tensor = make_tensor(5, layout)
```
- Notes
- The engine (iterator) must support random access operations
- Common engine types include raw pointers, arrays, and random-access iterators 
- The layout defines both the shape (logical dimensions) and stride (physical mapping) 
- Supports both direct coordinate evaluation T(c) and partial evaluation (slicing) 
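The definition T = E ∘ L can be mimicked in plain Python by using a list as the engine and a flat stride tuple as the layout. `make_tensor_view` is a hypothetical helper for illustration only; it shows how the same engine read through two different layouts yields different elements for the same coordinate.

```python
from typing import Callable, Sequence


def make_tensor_view(engine: Sequence[float], stride: Sequence[int]) -> Callable[..., float]:
    """Return T with T(c) = engine[L(c)], where L(c) = sum(c_i * stride_i)."""
    def tensor(*coord: int) -> float:
        offset = sum(c * d for c, d in zip(coord, stride))  # apply the layout
        return engine[offset]  # dereference the engine at that offset
    return tensor


data = [float(i) for i in range(6)]  # engine: six contiguous values
row_major = make_tensor_view(data, (3, 1))  # layout (2,3):(3,1)
col_major = make_tensor_view(data, (1, 2))  # layout (2,3):(1,2)
print(row_major(1, 0), col_major(1, 0))  # 3.0 1.0
```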
 
- cutlass.cute.make_identity_tensor(shape: cutlass.cute.typing.Shape, *, loc=None, ip=None)
- Creates an identity tensor with the given shape.
- An identity tensor maps each coordinate to itself, effectively creating a counting sequence within the shape’s bounds. This is useful for generating coordinate indices or creating reference tensors for layout transformations.
- Parameters:
- shape (Shape) – The shape defining the tensor’s dimensions. Can be a simple integer sequence or a hierarchical structure ((m,n),(p,q)) 
- loc (Optional[Location]) – Source location for MLIR operation tracking, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point for MLIR operation, defaults to None 
 
- Returns:
- A tensor that maps each coordinate to itself 
- Return type:
- Tensor 
- Examples:
```python
# Create a simple 1D coord tensor
tensor = make_identity_tensor(6)  # [0,1,2,3,4,5]
# Create a 2D coord tensor
tensor = make_identity_tensor((3,2))  # [(0,0),(1,0),(2,0),(0,1),(1,1),(2,1)]
# Create hierarchical coord tensor
tensor = make_identity_tensor(((2,1),3))  # [((0,0),0),((1,0),0),((0,0),1),((1,0),1),((0,0),2),((1,0),2)]
```
- Notes
- The shape parameter follows CuTe’s IntTuple concept
- Coordinates are ordered colexicographically 
- Useful for generating reference coordinates in layout transformations 
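The colexicographic ordering in these examples (first mode fastest, nested modes expanded in place) can be reproduced with a short recursive function. `colex_coords` is a hypothetical plain-Python helper that lists the coordinates an identity tensor would hold; it is not part of cutlass.cute.

```python
from typing import Any, List


def colex_coords(shape: Any) -> List[Any]:
    """All coordinates of a (possibly nested) shape, first mode fastest."""
    if isinstance(shape, int):
        return list(range(shape))
    coords: List[Any] = [()]
    for mode in shape:
        sub = colex_coords(mode)
        # Later modes vary slower, so they form the outer loop
        coords = [prev + (c,) for c in sub for prev in coords]
    return coords


print(colex_coords((3, 2)))  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```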
 
- cutlass.cute.make_fragment(layout_or_shape: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape, dtype: Type[cutlass.cute.typing.Numeric], *, loc=None, ip=None)
- cutlass.cute.make_fragment_like(src, dtype=None, *, loc=None, ip=None)
- Create a tensor on the stack with a compact layout and the same shape as the source.
- This function creates either a fragment tensor with a compact layout in the same shape as the source, or a new layout with the same shape as the source. The strides of the new layout follow the order induced by the source’s strides, with special handling of the 0th mode: it is always stride-1 and generated in column-major order (LayoutLeft).
- Parameters:
- src (Union[Layout, ComposedLayout, Tensor]) – The source layout or tensor whose shape will be matched 
- dtype (Type[Numeric], optional) – The element type for the fragment tensor, defaults to None 
- loc (Location, optional) – Source location for MLIR operations, defaults to None 
- ip (InsertionPoint, optional) – Insertion point for MLIR operations, defaults to None 
 
- Returns:
- A new layout or fragment tensor with matching shape 
- Return type:
- Union[Layout, Tensor] 
- Examples:
- Creating an rmem tensor from a tensor:
```python
smem_tensor = cute.make_tensor(smem_ptr, layout)
frag_tensor = cute.make_fragment_like(smem_tensor, cutlass.Float32)
# frag_tensor will be a register-backed tensor with the same shape
```
- Creating a fragment with a different element type:
```python
tensor = cute.make_tensor(gmem_ptr, layout)
bool_frag = cute.make_fragment_like(tensor, cutlass.Boolean)
# bool_frag will be a register-backed tensor with Boolean elements
```
- Notes
- When used with a Tensor, if a type is provided, it creates a new fragment tensor with that element type.
- For layouts with ScaledBasis strides, the function creates a fragment from the shape only. 
- This function is commonly used in GEMM and other tensor operations to create register storage for intermediate results. 
 
- cutlass.cute.recast_ptr(ptr: cutlass.cute.typing.Pointer, swizzle_=None, dtype: Type[cutlass.cute.typing.Numeric] | None = None, loc=None, ip=None)
- cutlass.cute.recast_tensor(src: cutlass.cute.typing.Tensor, dtype: Type[cutlass.cute.typing.Numeric], swizzle_=None, *, loc=None, ip=None)
- cutlass.cute.get(input, mode: List[int], *, loc=None, ip=None)
- Extract a specific element or sub-layout from a layout or tuple.
- This function recursively traverses the input according to the mode indices, extracting the element at the specified path. For layouts, this operation corresponds to extracting a specific sub-layout.
- Parameters:
- input (Layout, ComposedLayout, tuple) – The input layout or tuple to extract from 
- mode (List[int]) – Indices specifying the path to traverse for extraction 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- The extracted element or sub-layout 
- Return type:
- Layout, ComposedLayout, or element type 
- Raises:
- ValueError – If any index in mode is out of range 
- TypeError – If mode contains non-integer elements or if input has unsupported type 
 
- Postcondition:
- get(t, mode=find(x,t)) == x if find(x,t) != None else True
- Examples:
```python
layout = make_layout(((4, 8), (16, 1), 8), stride=((1, 4), (32, 0), 512))
sub_layout = get(layout, mode=[0, 1])  # 8:4
sub_layout = get(layout, mode=[1])  # (16, 1):(32, 0)
```
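On plain nested tuples, get amounts to indexing one level per entry of mode. The hypothetical sketch below applies that to the shape and stride of the example layout separately, reproducing the documented results; it ignores layouts, MLIR values and the library's exact error messages.

```python
from typing import Any, List


def get_path(t: Any, mode: List[int]) -> Any:
    """Follow the indices in `mode` into a nested tuple (hypothetical helper)."""
    for i in mode:
        if not isinstance(i, int):
            raise TypeError(f"mode entries must be int, got {type(i).__name__}")
        if not isinstance(t, tuple) or not 0 <= i < len(t):
            raise ValueError(f"index {i} out of range")
        t = t[i]
    return t


shape = ((4, 8), (16, 1), 8)
stride = ((1, 4), (32, 0), 512)
print(get_path(shape, [0, 1]), get_path(stride, [0, 1]))  # 8 4, i.e. 8:4
print(get_path(shape, [1]), get_path(stride, [1]))  # (16, 1) (32, 0)
```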
- cutlass.cute.select(input, mode: List[int], *, loc=None, ip=None)
- Select modes from input.
- Parameters:
- input (Layout, ComposedLayout, tuple) – Input to select from 
- mode (List[int]) – Indices specifying which dimensions or elements to select 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- A new instance with selected dimensions/elements 
- Return type:
- Layout, ComposedLayout, tuple 
- Raises:
- ValueError – If any index in mode is out of range 
- TypeError – If the input type is invalid 
 
- Examples:
```python
# Select specific dimensions from a layout
layout = make_layout((4, 8, 16), stride=(32, 4, 1))
selected = select(layout, mode=[0, 2])  # Select mode 0 and mode 2
# Result: (4, 16):(32, 1)

# Select elements from a tuple
t = (1, 2, 3, 4, 5)
selected = select(t, mode=[0, 2, 4])  # Select mode 0, mode 2, and mode 4
# Result: (1, 3, 5)
```
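Unlike get, which follows a path down to one nested element, select keeps several top-level modes in the order given. A hypothetical plain-Python sketch on tuples, applied to the shape and stride of the example layout separately:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_modes(t: Sequence[T], mode: List[int]) -> Tuple[T, ...]:
    """Keep the top-level entries of `t` named by `mode` (hypothetical helper)."""
    for i in mode:
        if not 0 <= i < len(t):
            raise ValueError(f"index {i} out of range")
    return tuple(t[i] for i in mode)


print(select_modes((4, 8, 16), [0, 2]), select_modes((32, 4, 1), [0, 2]))  # (4, 16) (32, 1)
```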
- cutlass.cute.front(input, *, loc=None, ip=None)
- Recursively get the first element of input.
- This function traverses a hierarchical structure (like a layout or tensor) and returns the first element at the deepest level. It’s particularly useful for accessing the first stride value in a layout to determine properties like majorness.
- Parameters:
- input (Union[Tensor, Layout, Stride]) – The hierarchical structure to traverse
- loc (source location, optional) – Source location where it’s called, defaults to None
- ip (insertion point, optional) – Insertion point for IR generation, defaults to None
 
- Returns:
- The first element at the deepest level of the input structure 
- Return type:
- Union[int, float, bool, ir.Value] 
 
- cutlass.cute.is_major(mode, stride: cutlass.cute.typing.Stride, *, loc=None, ip=None)
- Check whether a mode in stride is the major mode.
- cutlass.cute.leading_dim(shape: cutlass.cute.typing.Shape, stride: cutlass.cute.typing.Stride)
- Find the leading dimension of a shape and stride. - Parameters:
- shape (Shape) – The shape of the tensor or layout 
- stride (Stride) – The stride of the tensor or layout 
 
- Returns:
- The leading dimension index or indices 
- Return type:
- Union[int, Tuple[int, …], None] 
 - The return value depends on the stride pattern: - If a single leading dimension is found, returns an integer index 
- If nested leading dimensions are found, returns a tuple of indices 
- If no leading dimension is found, returns None 
 
- cutlass.cute.find(t: tuple | cutlass._mlir.ir.Value | int, x: int, *, loc=None, ip=None)
- Find the first position of a value x in a hierarchical structure t. - Searches for the first occurrence of x in t, optionally excluding positions where a comparison value matches. The search can traverse nested structures and returns either a single index or a tuple of indices for nested positions. - Parameters:
- t (Union[tuple, ir.Value, int]) – The search space 
- x (int) – The static integer x to search for 
 
- Returns:
- Index if found at top level, tuple of indices showing nested position, or None if not found 
- Return type:
- Union[int, Tuple[int, …], None] 
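The search order can be illustrated with a small pure-Python model over nested tuples. This is a hypothetical sketch, assuming a depth-first, left-to-right search and tuple-only input; it does not handle `ir.Value` inputs. A hit at the top level yields an int, and a nested hit yields the index path, consistent with the `get(t, mode=find(x, t)) == x` postcondition listed for `get` when an int result i is read as the path `[i]`.

```python
def find(t, x):
    """Return the index (int) or index path (tuple) of the first x in t, else None."""
    for i, elem in enumerate(t):
        if isinstance(elem, tuple):
            sub = find(elem, x)
            if sub is not None:
                # Prefix this level's index to the nested position
                return (i,) + (sub if isinstance(sub, tuple) else (sub,))
        elif elem == x:
            return i
    return None


print(find((4, 1, 2), 1))    # 1
print(find((1, (2, 3)), 3))  # (1, 1)
print(find((1, 2), 9))       # None
```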
 
- cutlass.cute.find_if(t: tuple | cutlass._mlir.ir.Value | int, pred_fn: Callable[[int, Tuple[int, ...]], bool], *, loc=None, ip=None)
- cutlass.cute.transform_leaf(f, *args)#
- Apply a function to the leaf nodes of nested tuple structures. - This function traverses nested tuple structures in parallel and applies the function f to corresponding leaf nodes. All input tuples must have the same nested structure. - Parameters:
- f (Callable) – Function to apply to leaf nodes 
- args – One or more nested tuple structures with matching profiles 
 
- Returns:
- A new nested tuple with the same structure as the inputs, but with leaf values transformed by f 
- Raises:
- TypeError – If the input tuples have different nested structures 
- Example:
```python
>>> transform_leaf(lambda x: x + 1, (1, 2))
(2, 3)
>>> transform_leaf(lambda x, y: x + y, (1, 2), (3, 4))
(4, 6)
>>> transform_leaf(lambda x: x * 2, ((1, 2), (3, 4)))
((2, 4), (6, 8))
```
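The parallel traversal of `transform_leaf` can be sketched in plain Python as below. This is a hypothetical model of the documented behavior, not the library source: it recurses while the inputs are tuples, requires every input to have the same structure at each level, and raises `TypeError` otherwise.

```python
def transform_leaf(f, *args):
    """Apply f to corresponding leaves of nested tuples with matching profiles."""
    first = args[0]
    if isinstance(first, tuple):
        # Every input must be a tuple of the same length at this level
        if not all(isinstance(a, tuple) and len(a) == len(first) for a in args):
            raise TypeError("input tuples have different nested structures")
        return tuple(transform_leaf(f, *elems) for elems in zip(*args))
    if any(isinstance(a, tuple) for a in args):
        raise TypeError("input tuples have different nested structures")
    return f(*args)


print(transform_leaf(lambda x, y: x + y, (1, 2), (3, 4)))  # (4, 6)
```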
- cutlass.cute.coalesce(input, *, target_profile: cutlass.cute.typing.Coord | None = None, loc=None, ip=None)
- cutlass.cute.group_modes(input, begin: int, end: int = -1, *, loc=None, ip=None)#
- Group modes of a hierarchical tuple or layout into a single mode. - This function groups a range of modes from the input object into a single mode, creating a hierarchical structure. For tuples, it creates a nested tuple containing the specified range of elements. For layouts and other CuTe objects, it creates a hierarchical representation where the specified modes are grouped together. - Parameters:
- input (Layout, ComposedLayout, tuple, Shape, Stride, etc.) – Input object to group modes from (layout, tuple, etc.) 
- begin (int) – Beginning index of the range to group (inclusive) 
- end (int) – Ending index of the range to group (exclusive) 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- A new object with the specified modes grouped 
- Return type:
- Same type as input with modified structure 
- Examples:
```python
# Group modes in a tuple
t = (2, 3, 4, 5)
grouped = group_modes(t, 1, 3)  # (2, (3, 4), 5)

# Group modes in a layout
layout = make_layout((2, 3, 4, 5))
grouped_layout = group_modes(layout, 1, 3)  # Layout with shape (2, (3, 4), 5)

# Group modes in a shape
shape = make_shape(2, 3, 4, 5)
grouped_shape = group_modes(shape, 0, 2)  # Shape ((2, 3), 4, 5)
```
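For plain tuples, the grouping rule of `group_modes` reduces to slicing. The sketch below is a hypothetical pure-Python model, assuming an explicit, non-negative `end` (it does not model the default `end=-1`). Grouping a layout amounts to grouping its shape and stride the same way, as the last line shows for the column-major layout `(2,3,4,5):(1,2,6,24)`.

```python
def group_modes(t, begin, end):
    """Wrap t[begin:end] into a single nested mode (hypothetical model)."""
    return t[:begin] + (t[begin:end],) + t[end:]


print(group_modes((2, 3, 4, 5), 1, 3))  # (2, (3, 4), 5)
print(group_modes((2, 3, 4, 5), 0, 2))  # ((2, 3), 4, 5)

# A layout groups its shape and stride in lockstep
shape, stride = (2, 3, 4, 5), (1, 2, 6, 24)
print(group_modes(shape, 1, 3), group_modes(stride, 1, 3))  # (2, (3, 4), 5) (1, (2, 6), 24)
```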
- cutlass.cute.cosize(a: cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout | cutlass.cute.typing.Tensor, mode: List[int] = [], *, loc=None, ip=None)
- Return the size of the codomain of a layout or tensor. Returns a static value if the type is static. - For a layout L = S:D, where S is the shape and D is the stride, the codomain size is the minimum size needed to store all possible offsets generated by the layout. It is calculated as the maximum offset plus 1. - For example, given a layout L = (4,(3,2)):(2,(8,1)):
- Shape S = (4,(3,2))
- Stride D = (2,(8,1))
- Maximum offset = 2*(4-1) + 8*(3-1) + 1*(2-1) = 6 + 16 + 1 = 23
- Therefore cosize(L) = 24
 
- Examples:
```python
L = cute.make_layout((4, (3, 2)), stride=(2, (8, 1)))  # L = (4,(3,2)):(2,(8,1))
print(cute.cosize(L))  # => 24
```
- Parameters:
- a (Union[Layout, ComposedLayout, Tensor]) – Layout, ComposedLayout, or Tensor object 
- mode (List[int], optional) – List of mode(s) for cosize calculation. If empty, calculates over all modes. If specified, calculates cosize only for the given modes. 
- loc (optional) – Location information for diagnostics, defaults to None 
- ip (optional) – Insertion point for IR generation, defaults to None 
 
- Returns:
- Static cosize of the layout or tensor if it is static, otherwise a dynamic Value 
- Return type:
- Union[int, Value] 
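The worked cosize example can be checked with a few lines of plain Python. This is a hypothetical model, assuming all strides are non-negative so that the largest offset is reached at the last coordinate of every mode; it is not how the DSL evaluates `cosize`, and it ignores the `mode` argument.

```python
def flatten(t):
    """Flatten a nested tuple of ints into a flat tuple."""
    if isinstance(t, tuple):
        return tuple(x for e in t for x in flatten(e))
    return (t,)


def cosize(shape, stride):
    """Maximum offset plus one, assuming non-negative strides."""
    max_offset = sum((s - 1) * d for s, d in zip(flatten(shape), flatten(stride)))
    return max_offset + 1


print(cosize((4, (3, 2)), (2, (8, 1))))  # 24
```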
 
- cutlass.cute.size_in_bytes(dtype: Type[cutlass.cute.typing.Numeric], layout: cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout | None, *, loc=None, ip=None)
- Calculate the size in bytes of data with the given data type and layout. The result is rounded up to the nearest byte. - Parameters:
- dtype (Type[Numeric]) – The DSL numeric data type 
- layout (Layout, optional) – The layout of the elements. If None, the function returns 0 
- loc (optional) – Location information for diagnostics, defaults to None 
- ip (optional) – Insertion point for IR generation, defaults to None 
 
- Returns:
- The total size in bytes. Returns 0 if the layout is None 
- Return type:
- int 
 
- cutlass.cute.flatten_to_tuple(a: cutlass.cute.typing.XTuple)
- Flattens a potentially nested tuple structure into a flat tuple. - This function recursively traverses the input structure and flattens it into a single-level tuple, preserving the order of elements. - Parameters:
- a (Union[IntTuple, Coord, Shape, Stride]) – The structure to flatten 
- Returns:
- A flattened tuple containing all elements from the input 
- Return type:
- tuple 
- Examples:
```python
flatten_to_tuple((1, 2, 3))       # Returns (1, 2, 3)
flatten_to_tuple(((1, 2), 3))     # Returns (1, 2, 3)
flatten_to_tuple((1, (2, (3,))))  # Returns (1, 2, 3)
```
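A recursive pure-Python equivalent of `flatten_to_tuple` for nested tuples is sketched below. It is a hypothetical model for illustration; in particular, wrapping a non-tuple input as a one-element tuple is an assumption of this sketch rather than documented library behavior.

```python
def flatten_to_tuple(a):
    """Flatten a nested tuple depth-first, preserving element order."""
    if not isinstance(a, tuple):
        return (a,)  # assumption of this sketch: a scalar becomes a 1-tuple
    return tuple(x for e in a for x in flatten_to_tuple(e))


print(flatten_to_tuple((1, (2, (3,)))))  # (1, 2, 3)
```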
- cutlass.cute.flatten(a)#
- Flattens a CuTe data structure into a simpler form. - For tuples, this function flattens the structure into a single-level tuple. For layouts, it returns a new layout with flattened shape and stride. For tensors, it returns a new tensor with flattened layout. For other types, it returns the input unchanged. - Parameters:
- a (Union[IntTuple, Coord, Shape, Stride, Layout, Tensor]) – The structure to flatten 
- Returns:
- The flattened structure 
- Return type:
- Union[tuple, Any] 
- Examples:
```python
flatten((1, 2, 3))              # Returns (1, 2, 3)
flatten(((1, 2), (3, 4)))       # Returns (1, 2, 3, 4)
flatten(5)                      # Returns 5
flatten(Layout(shape, stride))  # Returns Layout(flatten(shape), flatten(stride))
flatten(Tensor(layout))         # Returns Tensor(flatten(layout))
```
- cutlass.cute.unflatten(sequence: Tuple[Any, ...] | List[Any] | Iterable[Any], profile: cutlass.cute.typing.XTuple)
- Unflatten a flat tuple into a nested tuple structure according to a profile. - This function transforms a flat sequence of elements into a nested tuple structure that matches the structure defined by the profile parameter. It traverses the profile structure and populates it with elements from the sequence. - The sequence must be long enough to fill the profile; a RuntimeError is raised otherwise. - Parameters:
- sequence (Union[Tuple[Any, ...], List[Any], Iterable[Any]]) – A flat sequence of elements to be restructured 
- profile (XTuple) – A nested tuple structure that defines the shape of the output 
 
- Returns:
- A nested tuple with the same structure as profile but containing elements from sequence 
- Return type:
- XTuple 
 - Examples: - unflatten([1, 2, 3, 4], ((0, 0), (0, 0))) # Returns ((1, 2), (3, 4)) 
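The traversal that `unflatten` performs can be modeled in plain Python by walking the profile and pulling the next element from the sequence at each leaf. This is a hypothetical sketch of the documented behavior, not the library code; only the profile's structure matters, not its leaf values.

```python
def unflatten(sequence, profile):
    """Fill the leaves of `profile` with consecutive elements of `sequence`."""
    it = iter(sequence)

    def build(p):
        if isinstance(p, tuple):
            return tuple(build(e) for e in p)
        try:
            return next(it)
        except StopIteration:
            raise RuntimeError("sequence is too short to fill the profile") from None

    return build(profile)


print(unflatten([1, 2, 3, 4], ((0, 0), (0, 0))))  # ((1, 2), (3, 4))
```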
- cutlass.cute.product(a: cutlass.cute.typing.IntTuple | cutlass.cute.typing.Shape, *, loc=None, ip=None)
- cutlass.cute.product_like(a: cutlass.cute.typing.IntTuple, target_profile: cutlass.cute.typing.XTuple, *, loc=None, ip=None)
- Return the product of the given IntTuple or Shape at the leaves of target_profile. - This function computes products according to the structure defined by target_profile. - Parameters:
- a (IntTuple or Shape) – The input tuple or shape 
- target_profile (XTuple) – The profile that guides how products are computed 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- The resulting tuple with products computed according to target_profile 
- Return type:
- IntTuple or Shape 
- Raises:
- TypeError – If inputs have incompatible types 
- ValueError – If inputs have incompatible shapes 
 
 
- cutlass.cute.product_each(a: cutlass.cute.typing.IntTuple, *, loc=None, ip=None)
- cutlass.cute.prepend(input, elem, up_to_rank: None | int = None, *, loc=None, ip=None)
- Extend input to rank up_to_rank by prepending elem in front of input. - This function extends the input object by prepending elements to reach a desired rank. It supports various CuTe types, including shapes, layouts, and tensors. - Parameters:
- input (Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor]) – Source to be prepended to 
- elem (Union[Shape, Stride, Coord, IntTuple, Tile, Layout]) – Element to prepend to input 
- up_to_rank (Union[None, int], optional) – The target rank after extension, defaults to None 
- loc (Optional[Location]) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point, defaults to None 
 
- Returns:
- The extended result with prepended elements 
- Return type:
- Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor] 
- Raises:
- ValueError – If up_to_rank is less than input’s current rank 
- TypeError – If input or elem has unsupported type 
 
- Examples:
```python
# Prepend to a Shape
shape = (4,4)
prepend(shape, 2)  # Returns (2,4,4)

# Prepend to a Layout
layout = make_layout((8,8))
prepend(layout, make_layout((2,)))  # Returns (2,8,8):(1,1,8)

# Prepend with target rank
coord = (1,1)
prepend(coord, 0, up_to_rank=4)  # Returns (0,0,1,1)
```
- cutlass.cute.append(input, elem, up_to_rank: None | int = None, *, loc=None, ip=None)
- Extend input to rank up_to_rank by appending elem to the end of input. - This function extends the input object by appending elements to reach a desired rank. It supports various CuTe types, including shapes, layouts, and tensors. - Parameters:
- input (Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor]) – Source to be appended to 
- elem (Union[Shape, Stride, Coord, IntTuple, Tile, Layout]) – Element to append to input 
- up_to_rank (Union[None, int], optional) – The target rank after extension, defaults to None 
- loc (Optional[Location]) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint]) – Insertion point, defaults to None 
 
- Returns:
- The extended result with appended elements 
- Return type:
- Union[Shape, Stride, Coord, IntTuple, Tile, Layout, ComposedLayout, Tensor] 
- Raises:
- ValueError – If up_to_rank is less than input’s current rank 
- TypeError – If input or elem has unsupported type 
 
- Examples:
```python
# Append to a Shape
shape = (4,4)
append(shape, 2)  # Returns (4,4,2)

# Append to a Layout
layout = make_layout((8,8))
append(layout, make_layout((2,)))  # Returns (8,8,2):(1,8,1)

# Append with target rank
coord = (1,1)
append(coord, 0, up_to_rank=4)  # Returns (1,1,0,0)
```
- Note
- The function preserves the structure of the input while extending it 
- Can be used to extend tensors, layouts, shapes and other CuTe types 
- When up_to_rank is specified, fills remaining positions with elem 
- Useful for tensor reshaping and layout transformations 
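Both `prepend` and `append` reduce to tuple concatenation on flat tuples. The sketch below is a hypothetical pure-Python model, assuming flat tuple inputs and a scalar `elem`; for a layout, the same rule is applied to the shape and to the stride, which reproduces the `(2,8,8):(1,1,8)` and `(8,8,2):(1,8,1)` results in the examples.

```python
def _fill(t, up_to_rank):
    """Number of copies of elem needed to reach up_to_rank (one if unspecified)."""
    if up_to_rank is None:
        return 1
    if up_to_rank < len(t):
        raise ValueError("up_to_rank is less than the input's current rank")
    return up_to_rank - len(t)


def prepend(t, elem, up_to_rank=None):
    return (elem,) * _fill(t, up_to_rank) + t


def append(t, elem, up_to_rank=None):
    return t + (elem,) * _fill(t, up_to_rank)


print(prepend((4, 4), 2))               # (2, 4, 4)
print(append((1, 1), 0, up_to_rank=4))  # (1, 1, 0, 0)
# The layout (2,):(1,) appended to (8,8):(1,8), applied to shape and stride separately
print(append((8, 8), 2), append((1, 8), 1))  # (8, 8, 2) (1, 8, 1)
```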
 
- cutlass.cute.prepend_ones(t: cutlass.cute.typing.Tensor, up_to_rank: None | int = None, *, loc=None, ip=None)
- cutlass.cute.append_ones(t, up_to_rank: None | int = None, *, loc=None, ip=None)#
- cutlass.cute.elem_less(lhs: cutlass.cute.typing.Shape | cutlass.cute.typing.IntTuple | cutlass.cute.typing.Coord, rhs: cutlass.cute.typing.Shape | cutlass.cute.typing.IntTuple | cutlass.cute.typing.Coord, *, loc=None, ip=None)
- cutlass.cute.ceil_div(input: cutlass.cute.typing.Shape, tiler: cutlass.cute.typing.Tiler, *, loc=None, ip=None)
- Compute the ceiling division of a target shape by a tiling specification. - This function computes the number of tiles required to cover the target domain. It is equivalent to the shape of the second mode of zipped_divide(input, tiler). - Parameters:
- input (Shape) – A tuple of integers representing the dimensions of the target domain. 
- tiler (Union[Layout, Shape, Tile]) – The tiling specification. 
- loc (optional) – Optional location information for IR diagnostics. 
- ip (optional) – Optional insertion point for the generated IR. 
 
- Returns:
- A tuple of integers representing the number of tiles required along each dimension, i.e. the result of the ceiling division of the input dimensions by the tiler dimensions. 
- Return type:
- Shape 
- Example:
```python
import cutlass.cute as cute

@cute.jit
def foo():
    input = (10, 6)
    tiler = (3, 4)
    result = cute.ceil_div(input, tiler)
    print(result)  # Outputs: (4, 2)

foo()
```
- cutlass.cute.round_up(a: cutlass.cute.typing.IntTuple, b: cutlass.cute.typing.IntTuple)
- Rounds each element of a up to the nearest multiple of the corresponding element of b.
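For flat integer shapes and tilers, `ceil_div` and `round_up` are element-wise integer arithmetic. The sketch below is a hypothetical pure-Python model, assuming flat tuples of positive integers and assuming that `round_up` rounds each element of `a` up to a multiple of the matching element of `b`; the DSL versions also accept layouts, tiles, and nested profiles, which this sketch does not handle.

```python
def ceil_div(shape, tiler):
    """Number of tiles needed along each mode: ceil(s / t)."""
    return tuple(-(-s // t) for s, t in zip(shape, tiler))


def round_up(a, b):
    """Round each element of a up to the nearest multiple of the matching element of b."""
    return tuple(n * t for n, t in zip(ceil_div(a, b), b))


print(ceil_div((10, 6), (3, 4)))  # (4, 2)
print(round_up((10, 6), (3, 4)))  # (12, 8)
```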
- cutlass.cute.slice_and_offset(coord, src, *, loc=None, ip=None)#
- cutlass.cute.crd2idx(coord: cutlass.cute.typing.Coord, layout, *, loc=None, ip=None)
- Convert a multi-dimensional coordinate into a value using the specified layout. - For a coordinate congruent with the layout's shape, this function computes the inner product of the flattened coordinate and the flattened stride: - index = sum(flatten(coord)[i] * flatten(stride)[i] for i in range(len(flatten(coord)))) - Parameters:
- coord (Coord) – A tuple or list representing the multi-dimensional coordinate (e.g., (i, j) for a 2D layout). 
- layout (Layout or ComposedLayout) – A layout object that defines the memory storage layout, including shape and stride, used to compute the inner product. 
- loc (optional) – Optional location information for IR diagnostics. 
- ip (optional) – Optional insertion point for the generated IR. 
 
- Returns:
- The result of applying the layout transformation to the provided coordinate. 
- Return type:
- Any type that the layout maps to 
- Example:
```python
import cutlass.cute as cute

@cute.jit
def foo():
    L = cute.make_layout((5, 4), stride=(4, 1))
    idx = cute.crd2idx((2, 3), L)
    # Computed as: 2 * 4 + 3 * 1 = 11
    print(idx)

foo()  # Expected output: 11
```
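The sketch below models `crd2idx` in plain Python for nested integer tuples. It is a hypothetical model, not the DSL implementation. Besides the congruent case covered by the inner-product formula, it also handles an integer coordinate passed to a multi-mode shape by splitting it colexicographically (mode 0 varies fastest), which is the usual CuTe convention; it assumes the coordinate is in range.

```python
from math import prod


def size(shape):
    """Number of elements in a (possibly nested) shape."""
    return prod(size(s) for s in shape) if isinstance(shape, tuple) else shape


def crd2idx(coord, shape, stride):
    if isinstance(shape, tuple):
        if isinstance(coord, tuple):
            # Congruent coordinate: sum the contribution of each mode
            return sum(crd2idx(c, s, d) for c, s, d in zip(coord, shape, stride))
        # Integer coordinate: split it colexicographically across the modes
        idx = 0
        for s, d in zip(shape, stride):
            idx += crd2idx(coord % size(s), s, d)
            coord //= size(s)
        return idx
    return coord * stride


print(crd2idx((2, 3), (5, 4), (4, 1)))  # 11
print(crd2idx(7, (5, 4), (4, 1)))       # 9, since 7 -> (2, 1) -> 2*4 + 1*1
```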
- cutlass.cute.domain_offset(coord: cutlass.cute.typing.Coord, tensor: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- cutlass.cute.filter_zeros(input, *, target_profile=None, loc=None, ip=None)#
- Filter out zeros from a layout or tensor. - This function filters out the zero-stride modes of a layout or tensor by replacing their shapes with 1, so that they no longer contribute to its size. Refer to NVIDIA/cutlass for more layout algebra operations. - Parameters:
- input (Layout or Tensor) – The input layout or tensor to filter 
- target_profile (Stride, optional) – Target stride profile for the filtered result, defaults to None 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- The filtered layout or tensor with zeros removed 
- Return type:
- Layout or Tensor 
- Raises:
- TypeError – If input is not a Layout or Tensor 
 
- cutlass.cute.filter(input, *, loc=None, ip=None)#
- Filter a layout or tensor. - This function filters a layout or tensor according to CuTe’s filtering rules: zero-stride and size-1 modes are removed and the remaining modes are coalesced. - Parameters:
- input (Layout or Tensor) – The input layout or tensor to filter 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- The filtered layout or tensor 
- Return type:
- Layout or Tensor 
- Raises:
- TypeError – If input is not a Layout or Tensor 
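The following pure-Python sketch models `filter_zeros` and `filter` on flat shapes and strides, assuming CuTe's usual conventions: `filter_zeros` turns each zero-stride mode into a size-1 mode, and `filter` additionally drops size-1 modes and coalesces modes that continue each other contiguously. The helper names (`coalesce`, `filter_layout`) and the flat-tuple restriction are assumptions of this hypothetical model; it does not handle nested profiles or `target_profile`.

```python
def filter_zeros(shape, stride):
    """Replace the shape of every zero-stride mode with 1; strides are unchanged."""
    return tuple(1 if d == 0 else s for s, d in zip(shape, stride)), stride


def coalesce(shape, stride):
    """Drop size-1 modes and merge a mode into the previous one when stride == prev_shape * prev_stride."""
    out_shape, out_stride = [], []
    for s, d in zip(shape, stride):
        if s == 1:
            continue  # a size-1 mode never changes the offset
        if out_shape and d == out_shape[-1] * out_stride[-1]:
            out_shape[-1] *= s  # contiguous continuation of the previous mode
        else:
            out_shape.append(s)
            out_stride.append(d)
    if not out_shape:
        return (1,), (0,)
    return tuple(out_shape), tuple(out_stride)


def filter_layout(shape, stride):
    """Hypothetical model of `filter`: filter_zeros followed by coalesce."""
    return coalesce(*filter_zeros(shape, stride))


print(filter_zeros((4, 2, 3), (1, 0, 4)))   # ((4, 1, 3), (1, 0, 4))
print(filter_layout((4, 2, 3), (1, 0, 4)))  # ((12,), (1,))
```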
 
- cutlass.cute.tile_to_shape(atom, trg_shape: cutlass.cute.typing.Shape, order: cutlass.cute.typing.Shape, *, loc=None, ip=None)
- cutlass.cute.shape_div(lhs: cutlass.cute.typing.Shape, rhs: cutlass.cute.typing.Shape, *, loc=None, ip=None)
- Perform element-wise division of shapes. - This function performs element-wise division between two shapes. - Parameters:
- lhs (Shape) – Left-hand side shape 
- rhs (Shape) – Right-hand side shape 
- loc (optional) – Source location for MLIR, defaults to None 
- ip (optional) – Insertion point, defaults to None 
 
- Returns:
- The result of element-wise division 
- Return type:
- Shape 
 
- cutlass.cute.composition(lhs, rhs: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape | cutlass.cute.typing.Tile, *, loc=None, ip=None)
- Compose two layout representations using the CuTe layout algebra. - Compose a left-hand layout (or tensor) with a right-hand operand into a new layout R such that, for every coordinate c in the domain of the right-hand operand, the composed layout satisfies: - R(c) = A(B(c)) - where A is the left-hand operand provided as lhs and B is the right-hand operand provided as rhs. In this formulation, B defines the coordinate domain and A is applied to B’s output, so R takes its domain from B and maps into A’s codomain. - Satisfies:
- cute.shape(cute.composition(lhs, rhs)) is compatible with cute.shape(rhs) 
 - Parameters:
- lhs (Layout or Tensor) – The left-hand operand representing the transformation to be applied. 
- rhs (Layout, Shape, or Tile, or int or tuple) – The right-hand operand defining the coordinate domain. If provided as an int or tuple, it will be converted to a tile layout. 
- loc (optional) – Optional location information for IR diagnostics. 
- ip (optional) – Optional insertion point for the generated IR. 
 
- Returns:
- A new composed layout R such that, for all coordinates c in the domain of rhs, R(c) = lhs(rhs(c)).
- Return type:
- Layout or Tensor 
- Example:
```python
import cutlass.cute as cute

@cute.jit
def foo():
    # L1 maps (i,j) to i*4 + j and has 6 elements
    L1 = cute.make_layout((2, 3), stride=(4, 1))
    # L2 maps (i,j) to i*2 + j, producing indices 0..5 that lie in L1's domain
    L2 = cute.make_layout((3, 2), stride=(2, 1))
    # Compose L1 and L2: L3(c) = L1(L2(c))
    L3 = cute.composition(L1, L2)
    # L3 maps coordinates through L2 then L1; here L3 is (3,2):(1,4)
    print(L3)

foo()
```
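The defining property R(c) = A(B(c)) can be checked pointwise in plain Python. For example, composing A = (2,3):(4,1) with B = (3,2):(2,1) should give R = (3,2):(1,4). The sketch below is a hypothetical model that evaluates flat layouts on 1-D indices (split colexicographically, mode 0 fastest) and confirms that R agrees with A(B(i)) on every index of B's domain; it verifies a candidate result rather than computing the composition.

```python
from math import prod


def eval_layout(i, shape, stride):
    """Offset of 1-D index i in a flat layout; i is split colexicographically."""
    offset = 0
    for s, d in zip(shape, stride):
        offset += (i % s) * d
        i //= s
    return offset


A = ((2, 3), (4, 1))  # lhs: (2,3):(4,1)
B = ((3, 2), (2, 1))  # rhs: (3,2):(2,1), produces indices 0..5 inside A's domain
R = ((3, 2), (1, 4))  # candidate composition

# R(c) == A(B(c)) for every coordinate c in B's domain
print(all(eval_layout(i, *R) == eval_layout(eval_layout(i, *B), *A)
          for i in range(prod(B[0]))))  # True
```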
- cutlass.cute.complement(input: cutlass.cute.typing.Layout, cotarget: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape, *, loc=None, ip=None)
- Compute the complement layout of the input layout with respect to the cotarget. - The complement of a layout A with respect to cotarget n is a layout A* such that, for every k in Z_n, there is a unique pair (c, c*), with c in the domain of A and c* in the domain of A*, where k = A(c) + A*(c*). - This operation is useful for creating layouts that partition a space in complementary ways, such as row and column layouts that together cover a matrix. - Parameters:
- input (Layout) – The layout to compute the complement of 
- cotarget (Union[Layout, Shape]) – The target layout or shape that defines the codomain 
- loc (optional) – Optional location information for IR diagnostics 
- ip (optional) – Optional insertion point for the generated IR 
 
- Returns:
- The complement layout 
- Return type:
- Layout 
- Example:
```python
import cutlass.cute as cute

@cute.jit
def foo():
    # A layout that touches offsets {0, 2, 4, 6}
    layout = cute.make_layout(4, stride=2)
    # Its complement with respect to 16 covers the remaining offsets: (2,2):(1,8)
    comp = cute.complement(layout, 16)
    print(comp)

foo()
```
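The complement property can be checked numerically for a small case. Taking A = 4:2, which touches offsets {0, 2, 4, 6}, the complement with respect to 16 is expected to be (2,2):(1,8). The hypothetical pure-Python check below confirms that every k in [0, 16) is produced by exactly one sum A(c) + A*(c*); it verifies the property and does not compute the complement itself.

```python
def eval_layout(i, shape, stride):
    """Offset of 1-D index i in a flat layout; i is split colexicographically."""
    offset = 0
    for s, d in zip(shape, stride):
        offset += (i % s) * d
        i //= s
    return offset


A = ((4,), (2,))           # 4:2 -> offsets 0, 2, 4, 6
A_star = ((2, 2), (1, 8))  # (2,2):(1,8) -> offsets 0, 1, 8, 9

sums = sorted(eval_layout(c, *A) + eval_layout(cs, *A_star)
              for c in range(4) for cs in range(4))
print(sums == list(range(16)))  # True: each k in [0, 16) appears exactly once
```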
- cutlass.cute.right_inverse(input: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.left_inverse(input: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.max_common_layout(a: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, b: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- cutlass.cute.max_common_vector(a: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, b: cutlass.cute.typing.Layout | cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- cutlass.cute.is_congruent(a: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout | cutlass.cute.typing.Tensor, b: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout | cutlass.cute.typing.Tensor)
- Returns whether a is congruent to b. - Congruence is an equivalence relation between hierarchical structures. - Two objects are congruent if:
- they are both non-tuple values, OR
- they are both tuples of the same rank AND all corresponding elements are congruent.
- Congruence requires type matching at each level: scalar values match with scalar values, and tuples match with tuples of the same rank. - Parameters:
- a (Union[XTuple, Layout, ComposedLayout, Tensor]) – First object to compare 
- b (Union[XTuple, Layout, ComposedLayout, Tensor]) – Second object to compare 
 
- Returns:
- True if a and b are congruent, False otherwise 
- Return type:
- bool 
 
- cutlass.cute.is_weakly_congruent(a: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout | cutlass.cute.typing.Tensor, b: cutlass.cute.typing.XTuple | cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout | cutlass.cute.typing.Tensor)
- Returns whether a is weakly congruent to b. - Weak congruence is a partial order on hierarchical structures. - Object X is weakly congruent to object Y if:
- X is a non-tuple value, OR
- X and Y are both tuples of the same rank AND all corresponding elements are weakly congruent.
- Weak congruence allows scalar values to match with tuples, making it useful for determining whether an object has a hierarchical structure “up to” another. - Parameters:
- a (Union[XTuple, Layout, ComposedLayout, Tensor]) – First object to compare 
- b (Union[XTuple, Layout, ComposedLayout, Tensor]) – Second object to compare 
 
- Returns:
- True if a and b are weakly congruent, False otherwise 
- Return type:
- bool 
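Both congruence relations translate directly into short recursive functions. The sketch below is a hypothetical pure-Python model over nested tuples that follows the rules listed for `is_congruent` and `is_weakly_congruent`; it does not accept `Layout`, `ComposedLayout`, or `Tensor` objects.

```python
def is_congruent(a, b):
    """Same nesting everywhere: scalars match scalars, tuples match same-rank tuples."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(is_congruent(x, y) for x, y in zip(a, b))
    return not isinstance(a, tuple) and not isinstance(b, tuple)


def is_weakly_congruent(a, b):
    """A scalar in a may stand in for any sub-structure of b."""
    if not isinstance(a, tuple):
        return True
    return (isinstance(b, tuple) and len(a) == len(b)
            and all(is_weakly_congruent(x, y) for x, y in zip(a, b)))


print(is_congruent((1, 2), (1, (2, 3))))         # False
print(is_weakly_congruent((1, 2), (1, (2, 3))))  # True
```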
 
- cutlass.cute.logical_product(block, tiler: cutlass.cute.typing.Tile, *, loc=None, ip=None)
- cutlass.cute.zipped_product(block, tiler: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.tiled_product(block, tiler: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.flat_product(block, tiler: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.raked_product(block, tiler: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.blocked_product(block, tiler: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- cutlass.cute.flat_divide(target, tiler: cutlass.cute.typing.Tile, *, loc=None, ip=None)
- cutlass.cute.logical_divide(target, tiler: cutlass.cute.typing.Tiler, *, loc=None, ip=None)
- cutlass.cute.zipped_divide(target, tiler: cutlass.cute.typing.Tiler, *, loc=None, ip=None)
- cutlass.cute.tiled_divide(target, tiler: cutlass.cute.typing.Tiler, *, loc=None, ip=None)
- cutlass.cute.local_partition(target: cutlass.cute.typing.Tensor, tiler: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape, index: int | cutlass.cute.typing.Numeric, proj: cutlass.cute.typing.XTuple = 1, *, loc=None, ip=None)
- cutlass.cute.local_tile(input: cutlass.cute.typing.Tensor, tiler: cutlass.cute.typing.Layout | cutlass.cute.typing.Shape, coord: cutlass.cute.typing.Coord, proj: cutlass.cute.typing.XTuple | None = None, *, loc=None, ip=None)
- cutlass.cute.make_atom(ty, values=None, *, loc=None, ip=None)#
- This is a wrapper around the _cute_ir.make_atom operation, providing a default value for the values argument. 
- cutlass.cute.make_mma_atom(op: MmaOp, *, loc=None, ip=None, **kwargs)
- Makes an MMA Atom from an MMA Operation. - This function creates an MMA Atom from a given MMA Operation. Arbitrary keyword arguments can be provided for Op-specific additional parameters; they are currently unused. - Parameters:
- op (MmaOp) – The MMA Operation to construct an Atom for 
- Returns:
- The MMA Atom 
- Return type:
 
- cutlass.cute.make_tiled_mma(op_or_atom: Op | MmaAtom, atom_layout_mnk=(1, 1, 1), permutation_mnk=None, *, loc=None, ip=None, **kwargs)
- Makes a tiled MMA from an MMA Operation or an MMA Atom. - Parameters:
- op_or_atom (Union[Op, MmaAtom]) – The MMA Operation or Atom 
- atom_layout_mnk (Layout) – A Layout describing the tiling of Atom across threads 
- permutation_mnk (Tiler) – A permutation Tiler describing the tiling of Atom across values including any permutation of such tiling 
 
- Returns:
- The resulting tiled MMA 
- Return type:
 
- cutlass.cute.make_copy_atom(op: CopyOp, copy_internal_type: Type[cutlass.cute.typing.Numeric], *, loc=None, ip=None, **kwargs)
- Makes a Copy Atom from a Copy Operation. - This function creates a Copy Atom from a given Copy Operation. Arbitrary keyword arguments can be provided for Op-specific additional parameters. - Example:
```python
op = cute.nvgpu.CopyUniversalOp()
atom = cute.make_copy_atom(op, tensor_dtype, num_bits_per_copy=64)
```
- Parameters:
- op (CopyOp) – The Copy Operation to construct an Atom for 
- copy_internal_type (Type[Numeric]) – An internal data type used to construct the source/destination layouts in unit of tensor elements 
 
- Returns:
- The Copy Atom 
- Return type:
 
- cutlass.cute.make_tiled_copy_tv(atom: CopyAtom, thr_layout: cutlass.cute.typing.Layout, val_layout: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- Create a tiled copy given separate thread and value layouts. - A TV partitioner is inferred based on the input layouts. The input thread layout must be compact. - Parameters:
- atom (CopyAtom) – Copy atom 
- thr_layout (Layout) – Layout mapping from (TileM,TileN) coordinates to thread IDs (must be compact)
- val_layout (Layout) – Layout mapping from (ValueM,ValueN) coordinates to value IDs
- loc (Optional[Location], optional) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint], optional) – Insertion point, defaults to None 
 
- Returns:
- A tiled copy for the partitioner 
- Return type:
 
- cutlass.cute.make_tiled_copy(atom, layout_tv, tiler_mn, *, loc=None, ip=None)#
- Create a tiled copy given a TV partitioner and tiler. - Parameters:
- atom (CopyAtom) – Copy atom, e.g. simt_copy, simt_async_copy, tma_load, etc. 
- layout_tv (Layout) – Thread-value layout 
- tiler_mn (Tiler) – Tile size 
- loc (Optional[Location], optional) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint], optional) – Insertion point, defaults to None 
 
- Returns:
- A tiled copy for the partitioner 
- Return type:
 
- cutlass.cute.make_tiled_copy_S(atom, tiled_copy, *, loc=None, ip=None)#
- Create a tiled copy out of the copy_atom that matches the Src-Layout of tiled_copy. 
- cutlass.cute.make_tiled_copy_D(atom, tiled_copy, *, loc=None, ip=None)#
- Create a tiled copy out of the copy_atom that matches the Dst-Layout of tiled_copy. 
- cutlass.cute.make_tiled_copy_A(atom, tiled_mma, *, loc=None, ip=None)#
- Create a tiled copy out of the copy_atom that matches the A-Layout of tiled_mma. 
- cutlass.cute.make_tiled_copy_B(atom, tiled_mma, *, loc=None, ip=None)#
- Create a tiled copy out of the copy_atom that matches the B-Layout of tiled_mma. 
- cutlass.cute.make_tiled_copy_C(atom, tiled_mma, *, loc=None, ip=None)#
- Create a tiled copy out of the copy_atom that matches the C-Layout of tiled_mma. 
- cutlass.cute.make_tiled_copy_C_atom( )#
- Create the smallest tiled copy that can retile LayoutC_TV for use with pipelined epilogues with subtiled stores. - Parameters:
- Returns:
- A tiled copy for partitioner 
- Return type:
- Raises:
- ValueError – If the number of values in the CopyAtom’s source layout is greater than the size of the TiledMma’s LayoutC_TV 
 
- cutlass.cute.make_cotiled_copy(atom: CopyAtom, atom_layout_tv: cutlass.cute.typing.Layout, data_layout: cutlass.cute.typing.Layout, *, loc=None, ip=None)
- Produce a TiledCopy from thread and value offset maps. The TV layout maps threads and values to the codomain of data_layout, and the intended codomain is verified to be valid within data_layout. This is useful when threads and values do not need to own specific coordinates but do care about the vector width and the offsets between them. - Parameters:
- atom (CopyAtom) – Copy atom, e.g. simt_copy, simt_async_copy, tcgen05.st, etc. 
- atom_layout_tv (Layout) – Maps (tid, vid) to a data address 
- data_layout (Layout) – Maps a data coordinate to a data address 
- loc (optional) – Source location for MLIR 
- ip (optional) – Insertion point 
 
- Returns:
- A tiled copy built from atom 
- Return type:
- tiled_copy 
 
- cutlass.cute.copy_atom_call(atom: CopyAtom, src: cutlass.cute.typing.Tensor, dst: cutlass.cute.typing.Tensor, *, pred: cutlass.cute.typing.Tensor | None = None, loc=None, ip=None, **kwargs)
- Execute a single copy atom operation. - The copy_atom_call operation executes a copy atom with the given operands. The following src/dst layouts of the atom are valid:
- ((atom_v))
- (atom_v)
- Note: The format ((atom_v, rest_v)) is NOT valid for copy_atom_call, since it would require multiple atom operations, which contradicts the definition of a single copy atom call. - Examples:
```python
# Call a copy atom operation
cute.copy_atom_call(copy_atom, src_tensor, dst_tensor)
```
- An additional predication tensor can be provided. If the partitioned tensors have the logical profile ((ATOM_V,ATOM_REST),REST_M,...), the predication tensor must have a profile consistent with (ATOM_REST,REST_M,...).
- cutlass.cute.basic_copy(src: cutlass.cute.typing.Tensor, dst: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- Performs a basic element-wise copy. - This function assumes the precondition size(src) == size(dst). - When the src and dst shapes are static, the precondition is verified and the element-wise loop is fully unrolled. - Parameters:
- src (Tensor) – Source tensor 
- dst (Tensor) – Destination tensor 
- loc (Optional[Location], optional) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint], optional) – Insertion point, defaults to None 
 
 
- cutlass.cute.basic_copy_if(pred: cutlass.cute.typing.Tensor, src: cutlass.cute.typing.Tensor, dst: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- Performs a basic predicated element-wise copy. - This function assumes the following preconditions:
- size(src) == size(dst)
- size(src) == size(pred)
- When all shapes are static, the preconditions are verified and the element-wise loop is fully unrolled. 
- cutlass.cute.autovec_copy(src: cutlass.cute.typing.Tensor, dst: cutlass.cute.typing.Tensor, *, loc=None, ip=None)
- Auto-vectorization SIMT copy policy. - Given statically shaped source and destination tensors, this policy determines the largest safe vector width the copy instruction can use and performs the copy. 
- cutlass.cute.copy(atom: CopyAtom, src: cutlass.cute.typing.Tensor, dst: cutlass.cute.typing.Tensor, *, pred: cutlass.cute.typing.Tensor | None = None, loc=None, ip=None, **kwargs)
- The Copy algorithm. - The “copy with Atom” expects source and destination tensors to be partitioned according to the provided Copy Atom. Some Atoms require additional Op-specific keyword arguments, for example TMA copies: - cute.copy(tma_atom, src, dst, tma_bar_ptr=mbar_ptr, mcast_mask=mask) - An additional predication tensor can be provided. If the partitioned tensors have the logical profile ((ATOM_V,ATOM_REST),REST_M,...), the predication tensor must have a profile consistent with (ATOM_REST,REST_M,...). - For Copy Atoms that require single-threaded execution, the copy op automatically handles thread election internally. Manual thread selection is not required in such cases. 
- cutlass.cute.prefetch(
- atom: CopyAtom,
- src: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- The Prefetch algorithm. - prefetch expects the source tensor to be partitioned according to the provided Copy Atom. Prefetch loads tensors from global memory into L2. - Prefetch takes a Copy Atom, but not every atom is allowed; currently only TMA prefetch is supported. - cute.prefetch(tma_prefetch, src) - For Copy Atoms that require single-threaded execution, the op handles thread election internally, so manual thread selection is not required.
- cutlass.cute.gemm(
- atom: MmaAtom,
- d: cutlass.cute.typing.Tensor,
- a: cutlass.cute.typing.Tensor,
- b: cutlass.cute.typing.Tensor,
- c: cutlass.cute.typing.Tensor,
- *,
- loc=None,
- ip=None,
- **kwargs,
- The GEMM algorithm. - Computes D <- A * B + C, where C and D can alias. Note that some MMA Atoms (e.g. warpgroup-wide or tcgen05 MMAs) require manually setting an “accumulate” boolean field. - All tensors must be partitioned according to the provided MMA Atom. - For MMA Atoms that require single-threaded execution, the gemm op handles thread election internally, so manual thread selection is not required. - The following dispatch rules are supported:
- Dispatch [1]: (V) x (V) => (V) => (V,1,1) x (V,1,1) => (V,1,1)
- Dispatch [2]: (M) x (N) => (M,N) => (1,M,1) x (1,N,1) => (1,M,N) 
- Dispatch [3]: (M,K) x (N,K) => (M,N) => (1,M,K) x (1,N,K) => (1,M,N) 
- Dispatch [4]: (V,M) x (V,N) => (V,M,N) => (V,M,1) x (V,N,1) => (V,M,N) 
- Dispatch [5]: (V,M,K) x (V,N,K) => (V,M,N) 
 - Parameters:
- atom (MmaAtom) – MMA atom 
- d (Tensor) – Destination tensor 
- a (Tensor) – First source tensor 
- b (Tensor) – Second source tensor 
- c (Tensor) – Third source tensor (the addend C in D <- A * B + C) 
- loc (Optional[Location], optional) – Source location for MLIR, defaults to None 
- ip (Optional[InsertionPoint], optional) – Insertion point for MLIR, defaults to None 
- kwargs (dict) – Additional keyword arguments 
 
- Returns:
- None 
- Return type:
- None 
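The dispatch table promotes every operand to the canonical form (V,M,K) x (V,N,K) => (V,M,N). The sketch below models only that shape bookkeeping. It assumes, as in CuTe's C++ cute::gemm, that the rule is selected by the ranks of D, A and B, which is how rules [1] and [2] (both with rank-1 A and B) are told apart. `promote_gemm_shapes` is a hypothetical name.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def promote_gemm_shapes(d_rank: int, a: Shape, b: Shape) -> Tuple[int, Shape, Shape, Shape]:
    """Hypothetical model of the documented dispatch table.

    Returns (rule, A', B', D') with operands promoted to the canonical
    rank-3 form (V,M,K) x (V,N,K) => (V,M,N).
    """
    key = (d_rank, len(a), len(b))
    if key == (1, 1, 1):  # [1] (V) x (V) => (V)
        (v,), (_,) = a, b
        return 1, (v, 1, 1), (v, 1, 1), (v, 1, 1)
    if key == (2, 1, 1):  # [2] (M) x (N) => (M,N)
        (m,), (n,) = a, b
        return 2, (1, m, 1), (1, n, 1), (1, m, n)
    if key == (2, 2, 2):  # [3] (M,K) x (N,K) => (M,N)
        (m, k), (n, _) = a, b
        return 3, (1, m, k), (1, n, k), (1, m, n)
    if key == (3, 2, 2):  # [4] (V,M) x (V,N) => (V,M,N)
        (v, m), (_, n) = a, b
        return 4, (v, m, 1), (v, n, 1), (v, m, n)
    if key == (3, 3, 3):  # [5] (V,M,K) x (V,N,K) => (V,M,N)
        (v, m, _), (_, n, _) = a, b
        return 5, a, b, (v, m, n)
    raise ValueError(f"no dispatch rule for ranks (D, A, B) = {key}")
```

In the canonical form the K mode of A and B is reduced, which is why it never appears in D.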
 
- cutlass.cute.full(
- shape,
- fill_value,
- dtype: Type[cutlass.cute.typing.Numeric],
- *,
- loc=None,
- ip=None,
- Return a new TensorSSA of the given shape and type, filled with fill_value. - Parameters:
- shape (tuple) – Shape of the new tensor. 
- fill_value (scalar) – Value to fill the tensor with. 
- dtype (Type[Numeric]) – Data type of the tensor. 
 
- Returns:
- Tensor of fill_value with the specified shape and dtype. 
- Return type:
- TensorSSA
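As a mental model of full(), the sketch below builds nested Python lists of a flat shape, with dtype acting as a conversion applied once to fill_value. It is hypothetical (`full_model` is not a CuTe function), ignores hierarchical shapes, and the nesting is only visual; it says nothing about how a TensorSSA is laid out.

```python
from typing import Any, Callable, Tuple


def full_model(shape: Tuple[int, ...], fill_value: Any, dtype: Callable[[Any], Any]) -> Any:
    """Hypothetical model of full(): nested lists standing in for a TensorSSA."""
    value = dtype(fill_value)  # convert once, like casting to the element type

    def build(dims: Tuple[int, ...]) -> Any:
        if not dims:
            return value
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(tuple(shape))
```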
 
- cutlass.cute.full_like(
- a: TensorSSA | cutlass.cute.typing.Tensor,
- fill_value,
- dtype: None | Type[cutlass.cute.typing.Numeric] = None,
- *,
- loc=None,
- ip=None,
- Return a TensorSSA with the same shape and type as a given tensor, filled with fill_value. - Parameters:
- a (Union[TensorSSA, Tensor]) – The shape and data type of a define the same attributes of the returned tensor. 
- fill_value (array_like) – Fill value. 
- dtype (Union[None, Type[Numeric]], optional) – Overrides the data type of the result, defaults to None 
 
- Returns:
- Tensor of fill_value with the same shape and type as a. 
- Return type:
- TensorSSA
 - See also
- empty_like(): Return an uninitialized tensor with the shape and type of the input.
- ones_like(): Return a tensor of ones with the shape and type of the input.
- zeros_like(): Return a tensor of zeros with the shape and type of the input.
- full(): Return a new tensor of the given shape filled with a value.
- Examples:

    # Inside a @cute.jit function: create a 2x3 Float32 fragment,
    # load it as a TensorSSA, then build a same-shaped tensor filled with 1.0
    frg = cute.make_fragment(Float32, (2, 3))
    a = frg.load()
    b = cute.full_like(a, 1.0)

- cutlass.cute.empty_like(a, dtype=None, *, loc=None, ip=None)
- Return a new TensorSSA with the same shape and type as a given tensor, without initializing its entries. - Parameters:
- a (TensorSSA) – The shape and data type of a define the same attributes of the returned tensor. 
- dtype (Type[Numeric], optional) – Overrides the data type of the result, defaults to None 
 
- Returns:
- Uninitialized tensor with the same shape and type (unless overridden) as a. 
- Return type:
- TensorSSA
 
- cutlass.cute.ones_like(a, dtype=None, *, loc=None, ip=None)
- Return a TensorSSA of ones with the same shape and type as a given tensor. - Parameters:
- a (TensorSSA) – The shape and data type of a define the same attributes of the returned tensor. 
- dtype (Type[Numeric], optional) – Overrides the data type of the result, defaults to None 
 
- Returns:
- Tensor of ones with the same shape and type (unless overridden) as a. 
- Return type:
- TensorSSA
 
- cutlass.cute.zeros_like(a, dtype=None, *, loc=None, ip=None)
- Return a TensorSSA of zeros with the same shape and type as a given tensor. - Parameters:
- a (TensorSSA) – The shape and data type of a define the same attributes of the returned tensor. 
- dtype (Type[Numeric], optional) – Overrides the data type of the result, defaults to None 
 
- Returns:
- Tensor of zeros with the same shape and type (unless overridden) as a. 
- Return type:
- TensorSSA
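The *_like functions share one rule: the shape always comes from a, and the element type comes from a unless dtype overrides it. The sketch below captures that rule with a hypothetical stand-in class (`FakeTensorSSA` and the `*_model` functions are not CuTe APIs). empty_like follows the same shape/dtype rule but leaves values unspecified, so it is not modeled.

```python
from dataclasses import dataclass
from math import prod
from typing import Any, Optional, Tuple


@dataclass
class FakeTensorSSA:
    """Hypothetical stand-in for TensorSSA: a shape, an element type and flat values."""
    shape: Tuple[int, ...]
    dtype: type
    values: list


def full_like_model(a: FakeTensorSSA, fill_value: Any,
                    dtype: Optional[type] = None) -> FakeTensorSSA:
    # Shape always comes from `a`; dtype comes from `a` unless overridden
    out_dtype = a.dtype if dtype is None else dtype
    return FakeTensorSSA(a.shape, out_dtype, [out_dtype(fill_value)] * prod(a.shape))


def ones_like_model(a: FakeTensorSSA, dtype: Optional[type] = None) -> FakeTensorSSA:
    return full_like_model(a, 1, dtype)


def zeros_like_model(a: FakeTensorSSA, dtype: Optional[type] = None) -> FakeTensorSSA:
    return full_like_model(a, 0, dtype)
```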
 
- cutlass.cute.where(
- cond: TensorSSA,
- x: TensorSSA | cutlass.cute.typing.Numeric,
- y: TensorSSA | cutlass.cute.typing.Numeric,
- *,
- loc=None,
- ip=None,
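- Return elements chosen from x or y depending on cond; x or y is broadcast automatically if needed. - Parameters:
- cond (TensorSSA) – Condition tensor selecting, per element, between x and y. 
- x (Union[TensorSSA, Numeric]) – Values used where cond is True. 
- y (Union[TensorSSA, Numeric]) – Values used where cond is False. 
 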
- Returns:
- A tensor with elements from x where condition is True, and elements from y where condition is False. 
- Return type:
- TensorSSA
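The selection semantics of where can be sketched on flat lists. This hypothetical model (`where_model` is not a CuTe function) only broadcasts scalar operands; TensorSSA broadcasting between tensors of different shapes is not modeled.

```python
from typing import List, Union

Num = Union[int, float]


def where_model(cond: List[bool],
                x: Union[List[Num], Num],
                y: Union[List[Num], Num]) -> List[Num]:
    """Hypothetical flat-list model of where(); scalars are broadcast to len(cond)."""
    n = len(cond)
    # Broadcast a scalar operand to the condition's size
    xs = x if isinstance(x, list) else [x] * n
    ys = y if isinstance(y, list) else [y] * n
    if len(xs) != n or len(ys) != n:
        raise ValueError("x and y must match the size of cond or be scalars")
    return [xv if c else yv for c, xv, yv in zip(cond, xs, ys)]
```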
 
- cutlass.cute.any_(
- x: TensorSSA,
- *,
- loc=None,
- ip=None,
- Test whether any tensor element evaluates to True. 
- cutlass.cute.all_(
- x: TensorSSA,
- *,
- loc=None,
- ip=None,
- Test whether all tensor elements evaluate to True. 
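any_ and all_ reduce every element of the tensor to a single truth value. The sketch below models this on nested tuples/lists; `any_model` and `all_model` are hypothetical names, and the empty-input results simply follow Python's any/all convention.

```python
from typing import Any, Iterator


def _leaves(x: Any) -> Iterator[Any]:
    """Yield the scalar leaves of a nested tuple/list standing in for a tensor."""
    if isinstance(x, (tuple, list)):
        for item in x:
            yield from _leaves(item)
    else:
        yield x


def any_model(x: Any) -> bool:
    # True if at least one element is truthy
    return any(bool(v) for v in _leaves(x))


def all_model(x: Any) -> bool:
    # True only if every element is truthy
    return all(bool(v) for v in _leaves(x))
```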
- cutlass.cute.repeat_like(x, target)
- Creates an object congruent to target and filled with x. - This function recursively creates a nested tuple structure that matches the structure of the target, with each leaf node filled with the value x. - Parameters:
- x (Any) – The value to fill the resulting structure with 
- target (Union[tuple, Any]) – The structure to mimic 
 
- Returns:
- A structure matching target but filled with x 
- Return type:
- Union[tuple, Any] 
 - Examples:

    repeat_like(0, (1, 2, 3))    # Returns (0, 0, 0)
    repeat_like(1, ((1, 2), 3))  # Returns ((1, 1), 1)
    repeat_like(2, 5)            # Returns 2

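The recursive description translates directly into a few lines of Python. This is a sketch of the documented behavior, not the library source; it is named `repeat_like_sketch` to keep it distinct from cutlass.cute.repeat_like, and it reproduces the three examples above.

```python
from typing import Any, Union


def repeat_like_sketch(x: Any, target: Union[tuple, Any]) -> Union[tuple, Any]:
    """Pure-Python sketch of repeat_like's documented behavior."""
    if isinstance(target, tuple):
        # Recurse into each mode so the nesting of target is preserved
        return tuple(repeat_like_sketch(x, t) for t in target)
    # Every leaf of target is replaced by x
    return x
```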
- class cutlass.cute.struct(cls)
- Bases: object
- Decorator to abstract a C structure in the Python DSL. - Usage:

    # Supports base_dsl scalar int/float elements, arrays, and nested structs:
    @cute.struct
    class complex:
        real : cutlass.Float32
        imag : cutlass.Float32

    # stage, size_a and size_b are compile-time constants defined elsewhere
    @cute.struct
    class StorageA:
        mbarA : cute.struct.MemRange[cutlass.Int64, stage]
        compA : complex
        intA : cutlass.Int16

    # Supports alignment for its elements:
    @cute.struct
    class StorageB:
        a: cute.struct.Align[
            cute.struct.MemRange[cutlass.Float32, size_a], 1024
        ]
        b: cute.struct.Align[
            cute.struct.MemRange[cutlass.Float32, size_b], 1024
        ]
        x: cute.struct.Align[cutlass.Int32, 16]
        compA: cute.struct.Align[complex, 16]

    # Statically get size and alignment:
    size = StorageB.__sizeof__()
    align = StorageB.__alignof__()

    # Allocate and reference elements (allocator is created elsewhere):
    storage = allocator.allocate(StorageB)
    storage.a[0] ...
    storage.x ...
    storage.compA.real ...

- Parameters:
- cls – The struct class with annotations. 
- Returns:
- The decorated struct class. 
 - class _MemRangeMeta(name, bases, dct)
- Bases: type
- A metaclass for creating MemRange classes. - This metaclass is used to dynamically create MemRange classes with specific data types and sizes. - Variables:
- _dtype – The data type of the MemRange. 
- _size – The size of the MemRange. 
 
 - _dtype = None
 - _size = None
 - property size
 - property elem_width
 - property size_in_bytes
 
 - class MemRange
- Bases: object
- Defines a range of memory, declared as MemRange[T, size]. 
 - class _MemRangeData(dtype, size, base)
- Bases: object
- Represents a range of memory. - Parameters:
- dtype – The data type. 
- size – The size of the memory range in bytes. 
- base – The base address of the memory range. 
 
 - __init__(dtype, size, base)
- Initializes a new memory range. - Parameters:
- dtype – The data type. 
- size – Size of the memory range in bytes. A size of 0 is accepted, but in that case the range can only be used for its address (e.g. as a partition marker). 
- base – The base address of the memory range. 
 
 
 - data_ptr()
- Returns a pointer to the start of the data in this memory range. - Returns:
- A pointer to the start of the memory range. 
- Raises:
- AssertionError – If the size of the memory range is negative. 
 
 - get_tensor(layout, swizzle=None, dtype=None)
- Creates a tensor from the memory range. - Parameters:
- layout – The layout of the tensor. 
- swizzle – Optional swizzle pattern. 
- dtype – Optional data type; defaults to the memory range’s data type if not specified. 
 
- Returns:
- A tensor representing the memory range. 
- Raises:
- TypeError – If the layout is incompatible with the swizzle. 
- AssertionError – If the size of the memory range is not greater than zero. 
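get_tensor accepts an optional swizzle. For intuition about what a swizzle does to offsets, the sketch below models the bit-level definition of CuTe's C++ Swizzle<BBits, MBase, SShift>: the BBits "YYY" bits are XORed onto the BBits "ZZZ" bits starting at MBase, with a positive shift moving YYY right and a negative shift moving it left. Treating the Python Swizzle as following the same definition is an assumption, and `make_swizzle` is a hypothetical helper. The abs(SShift) >= BBits check mirrors the C++ requirement and keeps YYY and ZZZ disjoint, which makes the mapping its own inverse.

```python
def make_swizzle(b_bits: int, m_base: int, s_shift: int):
    """Return f(offset) modeling CuTe's Swizzle<BBits, MBase, SShift> (sketch)."""
    if abs(s_shift) < b_bits:
        raise ValueError("abs(SShift) must be at least BBits")
    bit_mask = (1 << b_bits) - 1
    # YYY: the bits that are read; ZZZ: the bits they are XORed into
    yyy_mask = bit_mask << (m_base + max(0, s_shift))
    zzz_mask = bit_mask << (m_base - min(0, s_shift))

    def apply(offset: int) -> int:
        y = offset & yyy_mask
        # Positive shift moves YYY right onto ZZZ; negative shift moves it left
        moved = y >> s_shift if s_shift >= 0 else y << -s_shift
        assert (moved & ~zzz_mask) == 0  # YYY lands exactly on ZZZ
        return offset ^ moved

    return apply
```

With BBits=3, MBase=4, SShift=3, offset 128 (bit 7 set) maps to 144: bit 7 is copied onto bit 4, and the 16-element groups within each 128-element row are permuted.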
 
 
 
 - class _AlignMeta(name, bases, dct)
- Bases: type
- Aligns the given object by setting its alignment attribute. - Parameters:
- v – The object to align. Must be a struct, MemRange, or a scalar type. 
- align – The alignment value to set. 
 
- Raises:
- TypeError – If the object is not a struct, MemRange, or a scalar type. 
- Variables:
- _dtype – The data type to be aligned. 
- _align – The alignment of the data type. 
 
 - _dtype = None
 - _align = None
 - property dtype
 - property align
 
 - class Align
- Bases: object
- Aligns the given type, declared as Align[T, alignment]. 
 - static _is_scalar_type(dtype)
- Checks if the given type is a scalar numeric type. - Parameters:
- dtype – The type to check. 
- Returns:
- True if the type is a subclass of Numeric, False otherwise. 
 
 - __init__(cls)
- Initializes a new struct decorator instance. - Parameters:
- cls – The class representing the structured data type. 
- Raises:
- TypeError – If the struct is empty. 
 
 - size_in_bytes() -> int
- Returns the size of the struct in bytes. - Returns:
- The size of the struct. 
 
 - static align_offset(offset, align)
- Returns offset rounded up to the next multiple of align.
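align_offset is the building block for placing aligned members. The sketch below pairs a straightforward round-up with a C-style layout pass over (name, size_in_bytes, alignment) triples. Applying ordinary C struct rules (pad each member to its alignment, pad the tail to the largest alignment) is an assumption about how @cute.struct places members; `layout_struct` is hypothetical. The example sizes assume size_a = size_b = 4 Float32 elements for the StorageB example above.

```python
from typing import Dict, List, Tuple


def align_offset(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def layout_struct(fields: List[Tuple[str, int, int]]) -> Tuple[Dict[str, int], int, int]:
    """Hypothetical C-style layout of (name, size_in_bytes, alignment) fields.

    Returns (offsets, total_size, struct_alignment).
    """
    if not fields:
        raise TypeError("empty struct")  # the decorator also rejects empty structs
    offsets: Dict[str, int] = {}
    offset, max_align = 0, 1
    for name, size, align in fields:
        offset = align_offset(offset, align)  # pad so the member is aligned
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so consecutive structs stay aligned
    return offsets, align_offset(offset, max_align), max_align
```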