.. include:: /content/common.rsts

Release Notes |ndash| Release 0.13.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Support for switching training precision between iterations (see the usage sketch at the end of these notes).

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed the misaligned address issue in the unfused softmax path (https://github.com/NVIDIA/TransformerEngine/issues/295).
- [pyTorch] Fixed an issue where, in some cases, using the cuDNN backend for fused attention would corrupt the random number generator state.
- Enabled rigorous error checking in the FusedAttention backend to catch unsupported use cases.
- [pyTorch] Fixed a bug in ONNX export that prevented users from specifying the type of attention mask.
- [pyTorch] Fixed a bug in LayerNorm when using grouped query attention.
- [JAX] Fixed a bug in the LayerNorm backward pass that resulted in incorrect sharding when using FSDP.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable ``MAX_JOBS=1`` during Transformer Engine installation, or by installing FlashAttention v1 (e.g. by running ``pip install flash-attn==1.0.9``) before attempting to install Transformer Engine.
- There is a known crash when using the ``TransformerLayer`` and ``MultiheadAttention`` APIs with the *rotary_pos_emb* option.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- [pyTorch] The ``TransformerLayer`` arguments *attention_softmax_in_fp32* and *apply_query_key_layer_scaling* are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``; existing callers can simply omit them, as sketched below.
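
As a migration sketch for the deprecation above: the two deprecated arguments can simply be dropped, since the default behavior already matches ``True``. The concrete layer sizes below are illustrative placeholders, not values taken from this release.

.. code-block:: python

    import transformer_engine.pytorch as te

    # Previously, the deprecated arguments may have been passed explicitly:
    #
    #   layer = te.TransformerLayer(
    #       hidden_size=1024,
    #       ffn_hidden_size=4096,
    #       num_attention_heads=16,
    #       apply_query_key_layer_scaling=True,
    #       attention_softmax_in_fp32=True,
    #   )
    #
    # Going forward, omit them; the defaults behave as if both were True.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
    )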
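
The precision-switching feature listed under Key Features and Enhancements can be combined with the existing ``fp8_autocast`` context manager roughly as follows. This is a minimal sketch, assuming the standard ``DelayedScaling`` recipe and a toy ``te.Linear`` model on an FP8-capable GPU; the alternation schedule, sizes, and optimizer are illustrative placeholders.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    fp8_recipe = recipe.DelayedScaling(margin=0, interval=1,
                                       fp8_format=recipe.Format.HYBRID)

    model = te.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(10):
        inp = torch.randn(32, 1024, device="cuda")
        # The precision used for a given iteration can now differ from the
        # previous one, e.g. alternating FP8 and higher-precision steps.
        use_fp8 = step % 2 == 0
        with te.fp8_autocast(enabled=use_fp8, fp8_recipe=fp8_recipe):
            out = model(inp)
        loss = out.float().sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()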