.. include:: /content/common.rsts

Release Notes |ndash| Release 0.13.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Support for switching training precision between iterations (see the usage sketch at the end of these notes).

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed the misaligned address issue in the unfused softmax path (https://github.com/NVIDIA/TransformerEngine/issues/295).
- [pyTorch] Fixed an issue where, in some cases, using the cuDNN backend for fused attention would corrupt the random number generator state.
- Enabled rigorous error checking in the FusedAttention backend to catch unsupported use cases.
- [pyTorch] Fixed a bug in ONNX export that prevented users from specifying the type of attention mask.
- [pyTorch] Fixed a bug in LayerNorm when using grouped query attention.
- [JAX] Fixed a bug in the LayerNorm backward pass that resulted in incorrect sharding when using FSDP.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable ``MAX_JOBS=1`` during Transformer Engine installation, or by installing FlashAttention v1 (e.g. by running ``pip install flash-attn==1.0.9``) before attempting to install Transformer Engine.
- There is a known crash when using the ``TransformerLayer`` and ``MultiheadAttention`` APIs with the *rotary_pos_emb* option.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- [pyTorch] The ``TransformerLayer`` arguments *attention_softmax_in_fp32* and *apply_query_key_layer_scaling* are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``; existing callers can simply omit them, as sketched below.
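
As a migration sketch for the deprecation above: the two deprecated arguments can simply be dropped, since the default behavior already matches ``True``. The concrete layer sizes below are illustrative placeholders, not values taken from this release.

.. code-block:: python

    import transformer_engine.pytorch as te

    # Previously, the deprecated arguments may have been passed explicitly:
    #
    #   layer = te.TransformerLayer(
    #       hidden_size=1024,
    #       ffn_hidden_size=4096,
    #       num_attention_heads=16,
    #       apply_query_key_layer_scaling=True,
    #       attention_softmax_in_fp32=True,
    #   )
    #
    # Going forward, omit them; the defaults behave as if both were True.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
    )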
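
The precision-switching feature listed under Key Features and Enhancements can be combined with the existing ``fp8_autocast`` context manager roughly as follows. This is a minimal sketch, assuming the standard ``DelayedScaling`` recipe and a toy ``te.Linear`` model on an FP8-capable GPU; the alternation schedule, sizes, and optimizer are illustrative placeholders.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    fp8_recipe = recipe.DelayedScaling(margin=0, interval=1,
                                       fp8_format=recipe.Format.HYBRID)

    model = te.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(10):
        inp = torch.randn(32, 1024, device="cuda")
        # The precision used for a given iteration can now differ from the
        # previous one, e.g. alternating FP8 and higher-precision steps.
        use_fp8 = step % 2 == 0
        with te.fp8_autocast(enabled=use_fp8, fp8_recipe=fp8_recipe):
            out = model(inp)
        loss = out.float().sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()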