.. include:: /content/common.rsts

Release Notes |ndash| Release 1.13
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C/PyTorch/Jax] Added support for THD layout for MQA/GQA.
- [Jax] Expanded FFI (Foreign Function Interface) support to include quantization, transpose,
  layernorms, fused-attention, and CUDA graphs; fixed miscellaneous bugs in the existing FFI implementations.
- [Jax] Added support for Ring attention for context parallelism.
- [PyTorch] Expanded support for the Sequential/Operations Based API to include activations,
  communication overlap, normalizations, and other fusions.
- [PyTorch] Made miscellaneous fixes to reduce CPU overhead during execution.
- [PyTorch] Reduced attention memory usage for the THD input format when running with cuDNN 9.6 or later.
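
As a rough illustration of the THD layout mentioned above, the sketch below packs
variable-length sequences along a single token axis and maps query heads to shared
key/value heads for MQA/GQA. It is plain Python and does not call Transformer Engine;
the helper names ``cumulative_seqlens`` and ``kv_head_for_query_head`` are hypothetical,
and the contiguous head grouping is an assumed convention, not one confirmed by these notes.

.. code-block:: python

    from itertools import accumulate


    def cumulative_seqlens(seq_lens):
        """Return [0, l0, l0 + l1, ...]: where each sequence starts on the packed token axis."""
        return [0] + list(accumulate(seq_lens))


    def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
        """Map a query head to the KV head it shares (assumed contiguous grouping).

        MQA is the special case num_kv_heads == 1.
        """
        if num_q_heads % num_kv_heads != 0:
            raise ValueError("num_q_heads must be divisible by num_kv_heads")
        group_size = num_q_heads // num_kv_heads
        return q_head // group_size


    # Three sequences of different lengths packed without padding.
    seq_lens = [3, 5, 2]
    cu_seqlens = cumulative_seqlens(seq_lens)
    total_tokens = cu_seqlens[-1]

    # THD shapes: (total tokens, heads, head dimension). Values are illustrative.
    num_q_heads, num_kv_heads, head_dim = 8, 2, 64
    q_shape = (total_tokens, num_q_heads, head_dim)
    kv_shape = (total_tokens, num_kv_heads, head_dim)

    # Sequence i occupies tokens cu_seqlens[i] up to (not including) cu_seqlens[i + 1].
    for i in range(len(seq_lens)):
        print(f"sequence {i}: tokens [{cu_seqlens[i]}, {cu_seqlens[i + 1]})")

In this sketch the key/value tensors carry fewer heads than the query tensor, which is
where MQA/GQA save memory relative to standard multi-head attention.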


Fixed Issues
@@@@@@@@@@@@

- [PyTorch] Fixed a crash that could occur when using FlashAttention with context parallelism.
- [C/Jax] Adopted 64-bit offsets to fix overflow for large tensors in the cuDNN attention back end.
- [C/Jax] Fixed a build failure when compiling the JAX native extensions with Clang.
- [PyTorch] Fixed a crash when importing ``transformer-engine`` on CPU-only systems.
- [PyTorch] Fixed a crash when using context parallelism with RoPE.
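
To show why the 64-bit offsets noted above matter, the sketch below computes a
row-major element offset for a large tensor and what happens when that offset is
stored in a signed 32-bit integer. The tensor shape and the helper names
``flat_offset`` and ``as_int32`` are illustrative assumptions; this is not the
cuDNN or Transformer Engine implementation.

.. code-block:: python

    import ctypes


    def flat_offset(index, shape):
        """Row-major element offset of `index` within a tensor of `shape`."""
        offset = 0
        for i, dim in zip(index, shape):
            offset = offset * dim + i
        return offset


    def as_int32(value):
        """Reinterpret an integer as a signed 32-bit value (wraps silently on overflow)."""
        return ctypes.c_int32(value).value


    # Hypothetical attention tensor: (batch, heads, sequence length, head dimension).
    shape = (16, 64, 32768, 128)
    last_index = tuple(dim - 1 for dim in shape)

    offset64 = flat_offset(last_index, shape)  # Python integers do not overflow
    offset32 = as_int32(offset64)              # what a 32-bit offset would hold

    print("exact offset: ", offset64)
    print("32-bit offset:", offset32)
    print("fits in int32:", offset64 <= 2**31 - 1)

Once the element count passes ``2**31 - 1``, a 32-bit offset no longer addresses the
intended element, so wider offsets are needed for tensors of this size.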

   
Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.


Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.


Deprecated Features
@@@@@@@@@@@@@@@@@@@

- Transformer Engine support for the PaddlePaddle framework
  is deprecated and will be fully removed in version 2.0.
- Support for exporting Transformer Engine modules via ONNX
  is deprecated and will be removed in version 2.0. This feature
  will be supported again in a later minor release of version 2.