Common#
- class cutlass.cute.nvgpu.OpError(*args: Any, **kwargs: Any)#
Bases: DSLBaseError
An exception class for Op construction errors.
- cutlass.cute.nvgpu.normalize_field_to_ir_name(field, admissible_fields) → str#
Normalize a field specifier to its IR logical field name.
Accepted inputs:
- an Enum value present in admissible_fields (must expose _to_ir_field_name())
- an exact string IR name (e.g., "accum_c", "neg_a", "sf_a")
Any other form is rejected.
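The two accepted forms can be illustrated with a minimal pure-Python sketch; this is a hypothetical reimplementation of the rules stated above, not the DSL's actual code, and the `OperandSource` enum is invented for the example:

```python
from enum import Enum


class OperandSource(Enum):
    # Hypothetical field enum; real admissible_fields come from the DSL.
    ACCUM_C = "accum_c"
    NEG_A = "neg_a"

    def _to_ir_field_name(self) -> str:
        return self.value


def normalize_field_to_ir_name(field, admissible_fields) -> str:
    """Sketch of the normalization rules described above."""
    # Case 1: an Enum member listed in admissible_fields.
    if isinstance(field, Enum) and field in admissible_fields:
        return field._to_ir_field_name()
    # Case 2: an exact IR name string such as "accum_c".
    ir_names = {f._to_ir_field_name() for f in admissible_fields}
    if isinstance(field, str) and field in ir_names:
        return field
    # Any other form is rejected.
    raise ValueError(f"unrecognized field specifier: {field!r}")
```

For example, both `OperandSource.ACCUM_C` and the string `"accum_c"` normalize to `"accum_c"`, while an unknown string raises.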
- class cutlass.cute.nvgpu.MmaUniversalOp(abacc_dtype: Type[cutlass.cute.typing.Numeric])#
Bases: MmaOp
The universal MMA Operation.
This Operation currently expects the A/B operands as well as the accumulator to share the same data types.
- Parameters:
abacc_dtype (Type[Numeric]) – The data type for the A/B operands and the accumulator
- abacc_dtype: Type[cutlass.cute.typing.Numeric]#
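As a usage sketch (not runnable outside a CUTLASS DSL compilation context; `make_mma_atom` and `cutlass.Float32` are assumed from the DSL's public API):

```python
import cutlass
import cutlass.cute as cute

# A/B operands and the accumulator all share one data type.
op = cute.nvgpu.MmaUniversalOp(cutlass.Float32)

# Assumed constructor, analogous to make_copy_atom for Copy Ops.
atom = cute.make_mma_atom(op)
```

The single abacc_dtype parameter reflects the documented constraint that A, B, and the accumulator currently share the same data type.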
- class cutlass.cute.nvgpu.MmaUniversalTrait(value: cutlass._mlir.ir.Value)#
Bases: Trait
- class cutlass.cute.nvgpu.CopyUniversalOp#
Bases: CopyOp
The universal Copy Operation.
When creating a Copy Atom out of this operation, the expected usage pattern is

op = cute.nvgpu.CopyUniversalOp()
atom = cute.make_copy_atom(
    op,
    tensor_dtype,
    num_bits_per_copy=64,
    l1c_evict_priority=cute.nvgpu.CacheEvictionPriority.EVICT_NORMAL,
)

where:
- tensor_dtype is the data type used to build the reference TV Layout (either the source or the destination TV Layout) in units of tensor elements; it is used for partitioning by TiledCopy, for example.
- num_bits_per_copy is a keyword argument specifying the number of bits to copy per Atom execution. This can be larger than the width of the above data type. When not provided, the compiler makes a best effort at auto-vectorizing.
- l1c_evict_priority is a keyword argument specifying the L1 cache eviction priority hint for the copy operation. Defaults to EVICT_NORMAL if not provided.
- invariant is a keyword argument specifying whether the load is invariant (read-only data that never changes). This enables compiler optimizations such as instruction reordering. Defaults to False if not provided.
- class cutlass.cute.nvgpu.CopyUniversalTrait(value: cutlass._mlir.ir.Value)#
Bases: Trait
- class cutlass.cute.nvgpu.MemoryOrder(value)#
Bases: Enum
An enumeration.
- class cutlass.cute.nvgpu.MemoryScope(value)#
Bases: Enum
An enumeration.
- class cutlass.cute.nvgpu.CacheEvictionPriority(value)#
Bases: Enum
An enumeration.
- cutlass.cute.nvgpu.make_tiled_tma_atom_A(
- op: CopyBulkTensorTileG2SOp | CopyBulkTensorTileG2SMulticastOp,
- gmem_tensor: cutlass.cute.typing.Tensor,
- smem_layout: cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout,
- mma_tiler_mnk: cutlass.cute.typing.Shape,
- tiled_mma: TiledMma,
- cluster_shape_vmnk: cutlass.cute.typing.Shape | None = None,
- *,
- internal_type: Type[cutlass.cute.typing.Numeric] | None = None,
- loc=None,
- ip=None,
Makes a TMA Copy Atom mapping to the .tile mode of the cp.async.bulk.tensor PTX operation, accounting for the MK projections of the TiledMMA for A tensor loads.
Given
- a GMEM tensor
- an SMEM layout
- an MMA Tiler
- a TiledMma
- a Cluster-level shape
this function figures out the bulk tensor asynchronous copy instruction to use with the maximum “TMA vector length” to copy tiles of the GMEM tensor to an SMEM buffer with the provided layout and consistent with the provided Tiler & tiled_mma (considering the M-mode & K-mode). The Cluster-level shape is used to determine the multicast factor across the N-mode for A tensor loads.
This function returns two results:
the Copy Atom
the so-called TMA tensor used to map logical coordinates of the GMEM tensor to coordinates that the TMA unit can consume. TMA tensors have so-called basis stride elements so that the associated layout can output coordinates. Otherwise, TMA tensors can be partitioned similarly to any other CuTe tensors using the algebra.
- Parameters:
op (Union[CopyBulkTensorTileG2SOp, CopyBulkTensorTileG2SMulticastOp]) – The Copy Operation to construct an Atom for
gmem_tensor (Tensor) – The GMEM tensor to be loaded by this copy atom
smem_layout (Union[Layout, ComposedLayout]) – Shared memory layout to load the tensor into (PDSL)
mma_tiler_mnk (Shape) – The MMA Tiler shape (TILE_M, TILE_N, TILE_K) in MNK dimensions
tiled_mma (atom.TiledMma) – The TiledMMA that will consume the load as operands
cluster_shape_vmnk (Shape) – The Cluster-level shape in VMNK dimensions
internal_type (Type[Numeric]) – An optional internal data type to use when the element type does not match the copy type
- Returns:
A copy atom for this operation and the associated TMA coord tensor
- Return type:
Tuple[atom.CopyAtom, Tensor]
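As a hedged sketch of a call site (placeholder names throughout; assumes a CUTLASS DSL kernel-construction context where the GMEM tensor, SMEM layout, tiler, and TiledMma have already been built, so this is not runnable standalone):

```python
# Illustrative only: requires the CUTLASS DSL and a GPU target.
tma_atom_a, tma_tensor_a = cute.nvgpu.make_tiled_tma_atom_A(
    CopyBulkTensorTileG2SOp(),   # or the multicast variant
    gmem_tensor_a,               # GMEM tensor holding A
    smem_layout_a,               # per-tile SMEM staging layout
    mma_tiler_mnk,               # (TILE_M, TILE_N, TILE_K)
    tiled_mma,                   # TiledMma that consumes the loads
    cluster_shape_vmnk,          # drives N-mode multicast for A
)
# tma_tensor_a maps logical GMEM coordinates to TMA-consumable
# coordinates and is partitioned like any other CuTe tensor.
```

The companion make_tiled_tma_atom_B below follows the same shape of call, with the NK projections and M-mode multicast instead.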
- cutlass.cute.nvgpu.make_tiled_tma_atom_B(
- op: CopyBulkTensorTileG2SOp | CopyBulkTensorTileG2SMulticastOp,
- gmem_tensor: cutlass.cute.typing.Tensor,
- smem_layout: cutlass.cute.typing.Layout | cutlass.cute.typing.ComposedLayout,
- mma_tiler_mnk: cutlass.cute.typing.Shape,
- tiled_mma: TiledMma,
- cluster_shape_vmnk: cutlass.cute.typing.Shape | None = None,
- *,
- internal_type: Type[cutlass.cute.typing.Numeric] | None = None,
- loc=None,
- ip=None,
Makes a TMA Copy Atom mapping to the .tile mode of the cp.async.bulk.tensor PTX operation, accounting for the NK projections of the TiledMMA for B tensor loads.
Given
- a GMEM tensor
- an SMEM layout
- an MMA Tiler
- a TiledMma
- a Cluster-level shape
this function figures out the bulk tensor asynchronous copy instruction to use with the maximum “TMA vector length” to copy tiles of the GMEM tensor to an SMEM buffer with the provided layout and consistent with the provided Tiler & tiled_mma (considering the N-mode & K-mode). The Cluster-level shape is used to determine the multicast factor across the M-mode for B tensor loads.
This function returns two results:
the Copy Atom
the so-called TMA tensor used to map logical coordinates of the GMEM tensor to coordinates that the TMA unit can consume. TMA tensors have so-called basis stride elements so that the associated layout can output coordinates. Otherwise, TMA tensors can be partitioned similarly to any other CuTe tensors using the algebra.
- Parameters:
op (Union[CopyBulkTensorTileG2SOp, CopyBulkTensorTileG2SMulticastOp]) – The Copy Operation to construct an Atom for
gmem_tensor (Tensor) – The GMEM tensor to be loaded by this copy atom
smem_layout (Union[Layout, ComposedLayout]) – Shared memory layout to load the tensor into (PDSL)
mma_tiler_mnk (Shape) – The MMA Tiler shape (TILE_M, TILE_N, TILE_K) in MNK dimensions
tiled_mma (core.TiledMma) – The TiledMMA that will consume the load as operands
cluster_shape_vmnk (Shape) – The Cluster-level shape in VMNK dimensions
internal_type (Type[Numeric]) – An optional internal data type to use when the element type does not match the copy type
- Returns:
A Copy Atom for this Operation and the associated TMA tensor
- Return type:
Tuple[atom.CopyAtom, Tensor]