Release Notes: Release 0.4.0 (BETA)

Key Features and Enhancements

Added a new C function, nvte_multi_cast_transpose(), for handling multiple casts at the same time.

Moved softmax kernels to the framework-agnostic C API layer.

Added a performance optimization tutorial.

Fixed Issues in This Release

Fixed a crash that occurred for some inputs in the LayerNorm() backward call.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

The C API has been reworked to handle scaling parameters more flexibly.

The LayerNorm module parameters have been renamed to weight and bias to match PyTorch.
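
Checkpoints saved before the rename may still carry the old LayerNorm parameter names. The following is a minimal sketch of remapping such keys before loading, not an official migration tool. The old names in OLD_TO_NEW (layer_norm_weight, layer_norm_bias), the helper rename_layernorm_keys, and the module prefix used in the usage lines are all assumptions for illustration; check the keys in your own checkpoint and substitute the real ones.

```python
from collections import OrderedDict

# Hypothetical old-to-new parameter names. The old names are assumed for
# illustration only; replace them with the names found in your checkpoint.
OLD_TO_NEW = {
    "layer_norm_weight": "weight",
    "layer_norm_bias": "bias",
}


def rename_layernorm_keys(state_dict, layernorm_prefixes, renames=OLD_TO_NEW):
    """Return a copy of state_dict with renamed LayerNorm parameter keys.

    Only keys that sit directly under one of layernorm_prefixes are renamed,
    so identically named parameters of other modules are left untouched.
    """
    prefixes = set(layernorm_prefixes)
    renamed = OrderedDict()
    for key, value in state_dict.items():
        # Split "a.b.leaf" into the module path "a.b" and the parameter "leaf".
        prefix, _, leaf = key.rpartition(".")
        if prefix in prefixes and leaf in renames:
            key = f"{prefix}.{renames[leaf]}"
        renamed[key] = value
    return renamed


# Usage with placeholder values standing in for tensors.
old_state = OrderedDict([
    ("block.ln.layer_norm_weight", 1),
    ("block.ln.layer_norm_bias", 2),
    ("block.fc.layer_norm_weight", 3),  # different module, not renamed
])
new_state = rename_layernorm_keys(old_state, ["block.ln"])
```

Restricting the rename to explicitly listed module prefixes is a deliberate choice: it avoids touching parameters of other modules that may legitimately keep similar names.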

Deprecated Features

There are no deprecated features in this release.