.. include:: /content/common.rsts

Release Notes |ndash| Release 1.13
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C/PyTorch/Jax] Added support for THD layout for MQA/GQA.
- [Jax] Expanded FFI (Foreign Function Interface) support to include quantization, transpose, layernorms, fused-attention, and CUDA graphs; fixed miscellaneous bugs in the existing FFI implementations.
- [Jax] Added support for ring attention for context parallelism.
- [PyTorch] Expanded support for the Sequential/Operations Based API to include activations, communication overlap, normalizations, and other fusions.
- [PyTorch] Made miscellaneous fixes to reduce CPU overhead during execution.
- [PyTorch] Leveraged cuDNN 9.6+ to reduce memory usage for the THD input format to attention.

Fixed Issues
@@@@@@@@@@@@

- [PyTorch] Fixed a crash that could occur when using FlashAttention with context parallelism.
- [C/Jax] Adopted 64-bit offsets to fix overflow for large tensors in the cuDNN attention back end.
- [C/Jax] Fixed the build when using the Clang compiler to build JAX native extensions.
- [PyTorch] Fixed a crash when importing ``transformer-engine`` on CPU-only systems.
- [PyTorch] Fixed a crash when using context parallelism with RoPE.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- Transformer Engine support for the PaddlePaddle framework is deprecated and will be fully removed in version 2.0.
- Support for exporting Transformer Engine modules via ONNX is deprecated and will be removed in version 2.0. This feature will be supported again in a later minor release of version 2.