.. include:: /content/common.rsts

Release Notes |ndash| Release 0.11.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Added the ``RMSNorm`` module.
- [pyTorch] Added the ``normalization`` option to the ``LayerNormLinear``, ``LayerNormMLP``, and ``TransformerLayer`` modules to let the user choose between ``LayerNorm`` and ``RMSNorm`` normalization. A usage sketch appears at the end of these notes.
- [pyTorch] Added FlashAttention v2 support.
- [pyTorch] Added support for Multi-Query and Grouped-Query Attention.
- [pyTorch] Added cuDNN attention for long sequence lengths as a backend for ``DotProductAttention``.

Fixed Issues
@@@@@@@@@@@@

- Fixed issues with the ONNX export of the ``LayerNorm`` module.
- Fixed a problem with discovery of the Transformer Engine library in Python virtual environments.
- Fixed a crash that occurred when combining ``torch.compile`` with Transformer Engine modules.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable ``MAX_JOBS=1`` during Transformer Engine installation, or by installing FlashAttention v1 (e.g. by running ``pip install flash-attn==1.0.9``) before attempting to install Transformer Engine.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [JAX] The ``TransformerLayer`` argument ``attn_type`` has been removed and is superseded by the ``attn_mask_type`` argument.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- [pyTorch] The ``TransformerLayer`` arguments ``attention_softmax_in_fp32`` and ``apply_query_key_layer_scaling`` are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
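
Usage Example
@@@@@@@@@@@@@

The following is a minimal sketch of how the new ``RMSNorm`` module and the ``normalization`` option might be used from pyTorch. The constructor arguments shown (hidden sizes, ``eps``) are illustrative assumptions; refer to the API documentation for the authoritative signatures.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    hidden_size, ffn_hidden_size = 1024, 4096

    # Standalone RMSNorm module (new in this release).
    rmsnorm = te.RMSNorm(hidden_size, eps=1e-5)

    # Fused modules can now select RMSNorm instead of the default
    # LayerNorm via the new normalization option.
    ln_linear = te.LayerNormLinear(hidden_size, 3 * hidden_size,
                                   normalization="RMSNorm")
    ln_mlp = te.LayerNormMLP(hidden_size, ffn_hidden_size,
                             normalization="RMSNorm")

    # Inputs follow the usual [sequence, batch, hidden] layout.
    x = torch.randn(128, 4, hidden_size, device="cuda")
    y = rmsnorm(x)
    z = ln_linear(x)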