Release Notes - Release 0.10.0 (BETA)

Key Features and Enhancements

Added Rotary Position Embedding support in the TransformerLayer API for PyTorch.
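
A minimal usage sketch follows. The rotary_pos_emb argument name, the frequency-tensor layout, and the manual frequency construction are assumptions for illustration rather than details taken from the 0.10.0 documentation; consult the PyTorch API reference for the exact interface.

    # Hypothetical sketch: enabling Rotary Position Embedding in TransformerLayer.
    # The rotary_pos_emb keyword and the [seq, 1, 1, head_dim] layout are assumptions.
    import torch
    import transformer_engine.pytorch as te

    hidden_size, num_heads, seq_len, batch = 1024, 16, 128, 2
    layer = te.TransformerLayer(hidden_size, 4 * hidden_size, num_heads).cuda()

    # Precompute rotary frequencies for each position (standard RoPE construction).
    head_dim = hidden_size // num_heads
    inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1).reshape(seq_len, 1, 1, head_dim).cuda()

    x = torch.randn(seq_len, batch, hidden_size, device="cuda")  # [seq, batch, hidden]
    y = layer(x, rotary_pos_emb=emb)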

Added more activation options (ReLU, ReGLU, GeGLU, and SwiGLU, in addition to the previously existing GeLU) for the LayerNormMLP and TransformerLayer APIs for PyTorch.
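
As a sketch, the activation is assumed to be selected with an activation keyword taking lowercase names such as "swiglu" or "geglu"; the exact argument name and accepted strings should be checked against the PyTorch API reference.

    # Hypothetical sketch: choosing an alternative activation for LayerNormMLP
    # and TransformerLayer. The "activation" keyword and string values are assumptions.
    import torch
    import transformer_engine.pytorch as te

    mlp = te.LayerNormMLP(hidden_size=1024, ffn_hidden_size=4096,
                          activation="swiglu").cuda()
    x = torch.randn(128, 2, 1024, device="cuda")
    y = mlp(x)

    # The same keyword is assumed to be available on TransformerLayer.
    layer = te.TransformerLayer(1024, 4096, 16, activation="geglu").cuda()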

Made the attention mask type configurable in the TransformerLayer API for JAX.
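
A minimal sketch follows. The module path transformer_engine.jax.flax, the constructor argument names, and the mask-type strings ("padding", "causal") are assumptions for illustration; the JAX API reference has the authoritative names.

    # Hypothetical sketch: configuring the attention mask type in the JAX TransformerLayer.
    import jax
    import jax.numpy as jnp
    import transformer_engine.jax.flax as te_flax  # module path assumed

    layer = te_flax.TransformerLayer(
        hidden_size=1024,
        mlp_hidden_size=4096,
        num_attention_heads=16,
        attn_mask_type="padding",  # or "causal"; accepted values assumed
    )
    x = jnp.zeros((2, 128, 1024), dtype=jnp.bfloat16)
    params = layer.init(jax.random.PRNGKey(0), x)
    y = layer.apply(params, x)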

Fixed Issues

Fixed an issue resulting in larger than expected memory consumption when training models in FP8.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

[PyTorch]: The TransformerLayer arguments attention_softmax_in_fp32 and apply_query_key_layer_scaling are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to True.

[JAX]: The TransformerLayer argument attn_type is deprecated; it is ignored in version 0.10 of Transformer Engine and will be removed in version 0.11. It is superseded by the attn_mask_type argument.
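
A migration sketch, reusing the assumed module path and argument values from the example above: pass attn_mask_type instead of the deprecated attn_type when constructing the layer.

    # Before (deprecated; ignored in 0.10 and removed in 0.11):
    #   layer = te_flax.TransformerLayer(..., attn_type="padding")
    # After (attn_mask_type values assumed):
    import transformer_engine.jax.flax as te_flax  # module path assumed
    layer = te_flax.TransformerLayer(hidden_size=1024,
                                     mlp_hidden_size=4096,
                                     num_attention_heads=16,
                                     attn_mask_type="padding")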