.. include:: /content/common.rsts

Release Notes |ndash| Release 1.4
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C/pyTorch] Added support for the QuickGELU activation.
- [C/pyTorch] Added a fused RoPE implementation for improved performance.
- [C/pyTorch] Added support for zero-centered gamma in RMSNorm.
- [C/pyTorch] Added support for ALiBi slopes in all attention backends.
- [docs/pyTorch] Added a tutorial on accelerating Hugging Face Llama models with Transformer Engine.
- [JAX] Added support for sequence parallelism.
- [JAX] Added support for RoPE.
- [JAX] Added support for GELU.
- [JAX] Improved execution speed of grouped query attention (GQA).
- [Paddle] Added support for grouped query attention (GQA).

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed an issue where uninitialized/unused module buffers resulted in increased memory usage with the ``fp8_model_init`` API call.
- [pyTorch] Fixed an issue in MultiheadAttention where the attention type was not properly passed down into granular API calls.
- [pyTorch] Fixed an issue that caused Transformer Engine to crash when used with pyTorch version >=\ |nbsp|\ 2.0 and <\ |nbsp|\ 2.1.
- [pyTorch] Fixed a convergence issue when using FP8 with activation recompute.
- [pyTorch] Fixed a numerical bug associated with use of pipeline parallelism.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable ``MAX_JOBS=1`` during Transformer Engine installation or by installing FlashAttention v1 (e.g. with the command ``pip install flash-attn==1.0.9``) before attempting to install Transformer Engine (see the sketch at the end of these notes).
- [pyTorch] FlashAttention v2.1 changed the behavior of the causal mask when performing cross-attention (see https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag for reference). To keep Transformer Engine's behavior consistent across versions and backends, FlashAttention is disabled for the "cross attention with causal masking" use case when FlashAttention v2.1 or later is installed.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.

Miscellaneous Changes
@@@@@@@@@@@@@@@@@@@@@

FlashAttention v1 is no longer supported in Transformer Engine. Support for it was dropped in version 1.3. The minimum required FlashAttention version is v2.0.6.
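
For convenience, the FlashAttention v2 installation workaround described in the Known Issues section is sketched below. The ``pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable`` command is only an assumption standing in for however you normally install Transformer Engine; substitute your usual installation command.

.. code-block:: bash

   # Option 1: cap the number of parallel compilation jobs so that building
   # FlashAttention v2 does not exhaust host memory.
   MAX_JOBS=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

   # Option 2 (as described in the Known Issues section): install
   # FlashAttention v1 before installing Transformer Engine.
   pip install flash-attn==1.0.9
   pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable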