.. include:: /content/common.rsts

Release Notes |ndash| Release 0.12.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Added a ``device`` option for all modules (with ``cpu`` and ``cuda`` as possible values), enabling initialization of the model on the CPU.
- [pyTorch] Added the ``MultiheadAttention`` module.
- [pyTorch] The ``DotProductAttention`` module now exposes the ``attn_mask_type`` parameter in its ``forward`` method, enabling easy switching between causal and non-causal execution (e.g. when switching between training and inference).
- [JAX] Added support for FLAX 0.7.1.
- [JAX] Added support for fused attention with sequence lengths longer than 512.
- [JAX] Added support for FSDP in FLAX and Praxis.
- [JAX] Added support for FP8 execution in Praxis.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed an issue with the reproducibility of results between runs with and without activation recomputation.
- [pyTorch] Fixed an issue where in some cases memory would be allocated on the wrong device when loading from a checkpoint (https://github.com/NVIDIA/TransformerEngine/issues/342).
- [pyTorch] Fixed a crash when sequence parallelism is used with frozen weights.
- [pyTorch] Fixed the behavior of the ``LayerNorm`` and ``RMSNorm`` modules when running under AMP.
- [pyTorch] Fixed an issue where in some cases using the cuDNN backend of fused attention would corrupt the random number generator state.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable ``MAX_JOBS=1`` during Transformer Engine installation, or by installing FlashAttention v1 (e.g. by running ``pip install flash-attn==1.0.9``) before attempting to install Transformer Engine.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- [pyTorch] The ``TransformerLayer`` arguments *attention_softmax_in_fp32* and *apply_query_key_layer_scaling* are deprecated, and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
- [pyTorch] The ``DotProductAttention`` argument ``attn_mask_type`` has been moved to the ``forward`` method and is deprecated. It will be fully removed in a future release; see the sketch below for the new calling convention.
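
As a minimal sketch of the new calling convention: ``DotProductAttention`` and the ``forward``-level ``attn_mask_type`` parameter come from this release, while the constructor arguments, tensor shapes, and mask-type values shown below are illustrative assumptions and may differ from the exact API.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Hypothetical sizes, for illustration only.
    seq_len, batch_size, num_heads, head_dim = 128, 2, 16, 64

    # Constructor arguments shown here are assumed; consult the API
    # reference for the exact signature.
    attn = te.DotProductAttention(num_heads, head_dim)

    q = torch.randn(seq_len, batch_size, num_heads, head_dim,
                    device="cuda", dtype=torch.bfloat16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Deprecated: passing attn_mask_type to the constructor.
    # New: select the mask type per call, e.g. causal masking during
    # training and no masking during inference.
    out_train = attn(q, k, v, attn_mask_type="causal")
    out_infer = attn(q, k, v, attn_mask_type="no_mask")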