.. include:: /content/common.rsts

Release Notes |ndash| Release 0.10.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

* Added Rotary Position Embedding support to the ``TransformerLayer`` API for PyTorch.
* Added more activation options (ReLU, ReGLU, GeGLU, and SwiGLU, in addition to the previously existing GeLU) to the ``LayerNormMLP`` and ``TransformerLayer`` APIs for PyTorch.
* Made the attention mask type configurable in the ``TransformerLayer`` API for JAX.

Fixed Issues
@@@@@@@@@@@@

* Fixed an issue that caused larger than expected memory consumption when training models in FP8.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

* [PyTorch] The ``TransformerLayer`` arguments ``attention_softmax_in_fp32`` and ``apply_query_key_layer_scaling`` are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
* [JAX] The ``TransformerLayer`` argument ``attn_type`` is deprecated; it is ignored in Transformer Engine 0.10 and will be removed in version 0.11. It is superseded by the argument ``attn_mask_type``.
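
For illustration, a minimal sketch of the PyTorch-side changes described above: it selects one of the newly added activations and omits the now-deprecated attention arguments. The module path (``transformer_engine.pytorch``), the constructor arguments, and the tensor layout shown here are assumptions about the API rather than details taken from these notes.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Build a Transformer layer that uses SwiGLU, one of the activations added in
    # this release (GeLU remains the default). The deprecated arguments
    # attention_softmax_in_fp32 and apply_query_key_layer_scaling are omitted;
    # the default behavior matches setting them to True.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
        activation="swiglu",
    ).cuda()

    # Transformer Engine modules typically expect (sequence, batch, hidden) input.
    x = torch.randn(128, 2, 1024, device="cuda")
    y = layer(x)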