.. include:: /content/common.rsts

Release Notes |ndash| Release 2.5
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [Jax] Added support for sliding window attention (SWA) in context parallel ring attention using THD format and striped sharding.
- [Jax] Improved performance for per-tensor scaling FP8 recipe.
- [PyTorch] Enabled FP8 tensor-parallel communication for FP8 block scaling recipe for Hopper by supporting coalesced gather of FP8 quantized tensors.
- [PyTorch] Optimized MXFP8 Userbuffers implementation by overlapping wgrad NCCL all-gather with dgrad GEMM.
- [PyTorch] Added support for CPU offloading when using FP8 parameters.
- [PyTorch] Added support for Context Parallel for Multi Latent Attention (MLA).
- [PyTorch] Reduced CPU overhead in MoE.
- [C][PyTorch] Improved performance for FP8 padding and unpadding kernels for MoE.
- [Jax] Added MXFP8 support for the ``GroupedDense`` module and handling of the case with zero input tokens.
- Added support for Python 3.12+.
- Added support for head dimension (``head_dim``) > 128 for attention for all architectures.
- [PyTorch] Added support for FP8 current scaling in operation-based API.

Fixed Issues
@@@@@@@@@@@@

- [Jax] Fixed a numerical error in the scaled masked softmax kernel.
- [Jax] Fixed output dtype for FP8 GEMM.
- [PyTorch] Fixed a bug that appeared when the FP8 recipe was changed between training steps.
- [PyTorch] Made miscellaneous fixes in ``TransformerLayer``: passed the missing arguments *cu_seqlens* and *max_seqlen* to cross-attention and allowed ``attn_input_format=thd``.
- [PyTorch] Fixed a crash when loading checkpoints generated by previous Transformer Engine versions.
- [PyTorch] Made miscellaneous fixes in CPU offloading logic.
- [PyTorch] Fixed a numerical issue in cross-entropy loss.
- [C][PyTorch][Jax] Fixed source installation when using ``NVTE_FRAMEWORK=all``.
- [PyTorch] Fixed a crash in ``GroupedLinear`` when using CUDA graphs.


Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.

Miscellaneous
@@@@@@@@@@@@@

There are no miscellaneous issues in this release.