.. include:: /content/common.rsts

Release Notes |ndash| Release 0.10.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

* Added Rotary Position Embedding support to the ``TransformerLayer`` API for PyTorch.
* Added more activation options (ReLU, ReGLU, GeGLU, and SwiGLU, in addition to the previously existing GeLU) to the ``LayerNormMLP`` and ``TransformerLayer`` APIs for PyTorch.
* Made the attention mask type configurable in the ``TransformerLayer`` API for JAX.

Fixed Issues
@@@@@@@@@@@@

* Fixed an issue that caused larger than expected memory consumption when training models in FP8.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

* [PyTorch] The ``TransformerLayer`` arguments ``attention_softmax_in_fp32`` and ``apply_query_key_layer_scaling`` are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
* [JAX] The ``TransformerLayer`` argument ``attn_type`` is deprecated; it is ignored in Transformer Engine 0.10 and will be removed in version 0.11. It is superseded by the argument ``attn_mask_type``.
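
For illustration, a minimal sketch of the PyTorch-side changes described above: it selects one of the newly added activations and omits the now-deprecated attention arguments. The module path (``transformer_engine.pytorch``), the constructor arguments, and the tensor layout shown here are assumptions about the API rather than details taken from these notes.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Build a Transformer layer that uses SwiGLU, one of the activations added in
    # this release (GeLU remains the default). The deprecated arguments
    # attention_softmax_in_fp32 and apply_query_key_layer_scaling are omitted;
    # the default behavior matches setting them to True.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
        activation="swiglu",
    ).cuda()

    # Transformer Engine modules typically expect (sequence, batch, hidden) input.
    x = torch.randn(128, 2, 1024, device="cuda")
    y = layer(x)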