Functionality#

The CUTLASS DSL 4.0 release supports Python 3.12 only. It shares the same driver requirements as the CUDA Toolkit 12.9. Specifically, the driver version must be 575.51.03 or later.

Currently, only Linux x86_64 is supported. Additional platform support will be added in future releases.

Supported MMA Operations#

NVIDIA Ampere Architecture:

  • FP16 / BF16 tensor core instructions

NVIDIA Hopper Architecture:

  • FP16 / BF16

  • FP8

NVIDIA Blackwell Architecture:

  • FP16 / BF16

  • TF32

  • I8

  • F8

Notable Limitations#

For current constraints and unsupported features, refer to the Limitations section.