.. include:: /content/common.rsts

Release Notes |ndash| Release 0.11.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Added the ``RMSNorm`` module.
- [pyTorch] Added the ``normalization`` option to the ``LayerNormLinear``, ``LayerNormMLP``, and ``TransformerLayer`` modules to let the user choose between ``LayerNorm`` and ``RMSNorm`` normalization. A usage sketch appears at the end of these notes.
- [pyTorch] Added FlashAttention v2 support.
- [pyTorch] Added support for Multi-Query and Grouped-Query Attention.
- [pyTorch] Added cuDNN attention for long sequence lengths as a backend for ``DotProductAttention``.

Fixed Issues
@@@@@@@@@@@@

- Fixed issues with the ONNX export of the ``LayerNorm`` module.
- Fixed a problem with discovery of the Transformer Engine library in Python virtual environments.
- Fixed a crash that occurred when combining ``torch.compile`` with Transformer Engine modules.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable ``MAX_JOBS=1`` during Transformer Engine installation, or by installing FlashAttention v1 (e.g. by running ``pip install flash-attn==1.0.9``) before attempting to install Transformer Engine.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [JAX] The ``TransformerLayer`` argument ``attn_type`` has been removed and is superseded by the ``attn_mask_type`` argument.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- [pyTorch] The ``TransformerLayer`` arguments ``attention_softmax_in_fp32`` and ``apply_query_key_layer_scaling`` are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
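
Usage Example
@@@@@@@@@@@@@

The following is a minimal sketch of how the new ``RMSNorm`` module and the ``normalization`` option might be used from pyTorch. The constructor arguments shown (hidden sizes, ``eps``) are illustrative assumptions; refer to the API documentation for the authoritative signatures.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    hidden_size, ffn_hidden_size = 1024, 4096

    # Standalone RMSNorm module (new in this release).
    rmsnorm = te.RMSNorm(hidden_size, eps=1e-5)

    # Fused modules can now select RMSNorm instead of the default
    # LayerNorm via the new normalization option.
    ln_linear = te.LayerNormLinear(hidden_size, 3 * hidden_size,
                                   normalization="RMSNorm")
    ln_mlp = te.LayerNormMLP(hidden_size, ffn_hidden_size,
                             normalization="RMSNorm")

    # Inputs follow the usual [sequence, batch, hidden] layout.
    x = torch.randn(128, 4, hidden_size, device="cuda")
    y = rmsnorm(x)
    z = ln_linear(x)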