.. include:: /content/common.rsts

.. |ge| replace:: :html:`≥`

Release Notes |ndash| Release 2.6
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [PyTorch] Added support for gradient accumulation fusion when using FSDP from megatron-core.
- [PyTorch] Optimized memory usage when using |NVIDIA(r)| CUDA\ |reg| graphs with TE using the ``make_graphed_callables`` function.
- [PyTorch] Optimized performance of permute fusion kernels for MoE.
- [PyTorch] Added support for ONNX export of Transformer Engine modules.
- [PyTorch] Added a ``save_original_input`` option to the ``Linear`` and ``GroupedLinear`` modules to decouple row-wise (forward) and column-wise (backward) quantization. This option saves memory for certain workloads and training recipes.
- [PyTorch] Improved performance of MXFP8 quantization kernels.
- [Core] Improved performance of KV caching kernels.

Fixed Issues
@@@@@@@@@@@@

- [PyTorch] Fixed an issue in the ``LayerNormLinear`` module where the returned normalization output was of different shape than the input tensor.
- [PyTorch] Fixed an issue with the ``align_size`` calculation in FP8 padding/unpadding modules.
- [PyTorch] Made miscellaneous fixes and enhancements to the fusible ops (``te.sequential``) API.
- [PyTorch] Reduced CPU overhead in these workloads: DelayedScaling recipe, MXFP8 MoE, and pipeline parallelism.
- [PyTorch] Fixed a bug in the multi-tensor adam kernel that incorrectly downcast an FP32 tensor to BF16.
- [PyTorch] Fixed an issue with caching FP8 weights when running validation steps between training steps.
- [PyTorch] Fixed a logical error that could lead to using a sub-optimal attention backend when a better-performing backend is available.
- [PyTorch] Fixed miscellaneous errors during runtime loading of shared libraries by expanding search paths.
- [PyTorch] Fixed a use-after-free bug in cases where quantization and normalization are unfused.
- [Jax] Fixed a crash with grouped GEMM in CUDA version |ge| 12.9.1.
- [Jax] Fixed build with JAX v0.7.0 that failed due to removal of ``jax.extend.ffi``.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.

Miscellaneous
@@@@@@@@@@@@@

There are no miscellaneous issues in this release.