CUTLASS 3.x

- Design
  - CUTLASS 3.0 design goals
  - A new Conceptual GEMM Hierarchy
  - Adoption of CuTe Layout and Tensors
  - Reducing the number of named types and iterator concepts
  - Correctness by default, Performance through clear, individual points of tuning
- GEMM Backwards Compatibility
  - Compatible Device API
  - Compatible Kernel API
  - Threadblock API and Inner Loops
  - Porting from 2.x to 3.0 API
- GEMM API
  - CUTLASS GEMM Model
  - CUTLASS GEMM Components
  - Kernel API
  - Device API
  - Tiled MMA and Copy Atom API