Release Notes

Release 0.9.0 (BETA)

Key Features and Enhancements

Added support for FlashAttention with no masking.
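
A minimal sketch of how the unmasked attention path might be exercised. The transformer_engine.pytorch import, the DotProductAttention module, the attn_mask_type="no_mask" option, the NVTE_FLASH_ATTN environment variable, and all shapes are assumptions for illustration and are not stated in these notes.

```python
import os
import torch
import transformer_engine.pytorch as te  # assumption: this is the package the notes describe

# Assumption: NVTE_FLASH_ATTN=1 requests the FlashAttention backend when it is supported.
os.environ["NVTE_FLASH_ATTN"] = "1"

# Assumption: attn_mask_type="no_mask" selects the unmasked path mentioned above.
attn = te.DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    attn_mask_type="no_mask",
).cuda()

# Tensors in [seq_len, batch, heads, head_dim] layout; sizes are illustrative.
q = torch.randn(128, 2, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
out = attn(q, k, v)
```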

Added DDP support for the no-bias training option in PyTorch.
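
An illustrative sketch of the no-bias option under PyTorch DistributedDataParallel. The te.Linear module, the bias=False argument, and the sizes are assumptions chosen for the example, not taken from these notes.

```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te  # assumption: package providing the no-bias modules

# Typically launched with torchrun so that the process group environment is set up.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# bias=False is the no-bias training option referred to above (layer and sizes illustrative).
layer = te.Linear(1024, 1024, bias=False).cuda()
model = torch.nn.parallel.DistributedDataParallel(
    layer, device_ids=[torch.cuda.current_device()]
)

inp = torch.randn(32, 1024, device="cuda")
model(inp).sum().backward()
```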

Added support for cuDNN fused attention for specific cases.

Added JAX support for Praxis.

Added JIT-compiled transpose kernels.

Fixed Issues

Fixed a bug in PyTorch sequence parallelism where the wrong tensor was passed to a GEMM in a specific code path.

Fixed an issue with handling nested fp8 autocasts in PyTorch.
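A minimal sketch of the nested-autocast pattern this fix concerns, assuming the fp8_autocast context manager and DelayedScaling recipe from transformer_engine; the module and sizes are illustrative.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling

recipe = DelayedScaling()
model = te.Linear(1024, 1024).cuda()
inp = torch.randn(32, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = model(inp)
    # Inner region temporarily disables FP8; handling of this nesting is what was fixed.
    with te.fp8_autocast(enabled=False):
        out = model(out)
```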

Fixed a bug during warmup of JIT kernels that made execution non-deterministic.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

The TransformerLayer arguments attention_softmax_in_fp32 and apply_query_key_layer_scaling are deprecated and will be removed in a future release. The default behavior is as if these arguments were set to True.
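
A short sketch of the migration path, assuming TransformerLayer is constructed from transformer_engine.pytorch; since the defaults already behave as if both arguments were True, simply omitting them preserves behavior. Sizes are illustrative.

```python
import transformer_engine.pytorch as te

# Before (deprecated arguments passed explicitly):
# layer = te.TransformerLayer(1024, 4096, 16,
#                             attention_softmax_in_fp32=True,
#                             apply_query_key_layer_scaling=True)

# After: rely on the defaults, which match the behavior described above.
layer = te.TransformerLayer(1024, 4096, 16)
```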