Release Notes

Release 0.9.0 (BETA)

Key Features and Enhancements

Added support for FlashAttention with no masking.
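
A minimal sketch of how the unmasked attention path might be exercised. The transformer_engine.pytorch import, the DotProductAttention module, the attn_mask_type="no_mask" option, the NVTE_FLASH_ATTN environment variable, and all shapes are assumptions for illustration and are not stated in these notes.

```python
import os
import torch
import transformer_engine.pytorch as te  # assumption: this is the package the notes describe

# Assumption: NVTE_FLASH_ATTN=1 requests the FlashAttention backend when it is supported.
os.environ["NVTE_FLASH_ATTN"] = "1"

# Assumption: attn_mask_type="no_mask" selects the unmasked path mentioned above.
attn = te.DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    attn_mask_type="no_mask",
).cuda()

# Tensors in [seq_len, batch, heads, head_dim] layout; sizes are illustrative.
q = torch.randn(128, 2, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
out = attn(q, k, v)
```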

Added DDP support for the no-bias training option in PyTorch.
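
An illustrative sketch of the no-bias option under PyTorch DistributedDataParallel. The te.Linear module, the bias=False argument, and the sizes are assumptions chosen for the example, not taken from these notes.

```python
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te  # assumption: package providing the no-bias modules

# Typically launched with torchrun so that the process group environment is set up.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# bias=False is the no-bias training option referred to above (layer and sizes illustrative).
layer = te.Linear(1024, 1024, bias=False).cuda()
model = torch.nn.parallel.DistributedDataParallel(
    layer, device_ids=[torch.cuda.current_device()]
)

inp = torch.randn(32, 1024, device="cuda")
model(inp).sum().backward()
```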

Added support for cuDNN fused attention for specific cases.

Added JAX support for Praxis.

Added JIT-compiled transpose kernels.

Fixed Issues

Fixed a bug in PyTorch sequence parallelism where the wrong tensor was passed to a GEMM in a specific code path.

Fixed an issue with handling nested fp8 autocasts in PyTorch.
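A minimal sketch of the nested-autocast pattern this fix concerns, assuming the fp8_autocast context manager and DelayedScaling recipe from transformer_engine; the module and sizes are illustrative.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling

recipe = DelayedScaling()
model = te.Linear(1024, 1024).cuda()
inp = torch.randn(32, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = model(inp)
    # Inner region temporarily disables FP8; handling of this nesting is what was fixed.
    with te.fp8_autocast(enabled=False):
        out = model(out)
```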

Fixed a bug during warmup of JIT kernels that made execution non-deterministic.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

The TransformerLayer arguments attention_softmax_in_fp32 and apply_query_key_layer_scaling are deprecated and will be removed in a future release. The default behavior is as if these arguments were set to True.
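
A short sketch of the migration path, assuming TransformerLayer is constructed from transformer_engine.pytorch; since the defaults already behave as if both arguments were True, simply omitting them preserves behavior. Sizes are illustrative.

```python
import transformer_engine.pytorch as te

# Before (deprecated arguments passed explicitly):
# layer = te.TransformerLayer(1024, 4096, 16,
#                             attention_softmax_in_fp32=True,
#                             apply_query_key_layer_scaling=True)

# After: rely on the defaults, which match the behavior described above.
layer = te.TransformerLayer(1024, 4096, 16)
```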