.. include:: /content/common.rsts

Release Notes |ndash| Release 0.9.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

* Added support for FlashAttention with no masking.
* Added DDP support for the no-bias training option in PyTorch.
* Added support for cuDNN fused attention in specific cases.
* Added JAX support for Praxis.
* Added JIT-compiled transpose kernels.

Fixed Issues
@@@@@@@@@@@@

* Fixed a bug in PyTorch sequence parallelism where the wrong tensor was passed to GEMM in a specific code path.
* Fixed an issue with handling nested fp8 autocasts in PyTorch (see the usage sketch at the end of these notes).
* Fixed a bug during warmup of JIT kernels that made execution non-deterministic.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

The ``TransformerLayer`` arguments ``attention_softmax_in_fp32`` and ``apply_query_key_layer_scaling`` are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
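
To migrate, callers can simply stop passing the deprecated arguments, since the defaults already match the ``True`` behavior. Below is a minimal sketch of the forward-compatible call; the layer dimensions are illustrative assumptions, not values taken from this release.

.. code-block:: python

    import transformer_engine.pytorch as te

    # Before (deprecated): passing the arguments explicitly.
    # layer = te.TransformerLayer(
    #     hidden_size=1024,
    #     ffn_hidden_size=4096,
    #     num_attention_heads=16,
    #     attention_softmax_in_fp32=True,        # deprecated
    #     apply_query_key_layer_scaling=True,    # deprecated
    # )

    # After: omit the deprecated arguments; behavior is unchanged because
    # the defaults are equivalent to setting both arguments to True.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
    )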
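
For reference, the nested fp8 autocast pattern mentioned under Fixed Issues is sketched below. This is a minimal illustration under assumed settings (the recipe parameters and tensor shapes are placeholders), not the exact case that was fixed.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    fp8_recipe = recipe.DelayedScaling(margin=0, interval=1)
    linear = te.Linear(1024, 1024).cuda()
    x = torch.randn(16, 1024, device="cuda")

    # The outer autocast enables FP8 execution; the nested autocast
    # disables it for an inner region only.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = linear(x)                      # runs with FP8
        with te.fp8_autocast(enabled=False):
            z = linear(y)                  # runs in higher precision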