Release Notes – Release 2.6¶
Key Features and Enhancements¶
[PyTorch] Added support for gradient accumulation fusion when using FSDP from megatron-core.
[PyTorch] Optimized memory usage when using NVIDIA® CUDA® graphs with TE through the make_graphed_callables function.
[PyTorch] Optimized performance of permute fusion kernels for MoE.
[PyTorch] Added support for ONNX export of Transformer Engine modules.
[PyTorch] Added a save_original_input option to the Linear and GroupedLinear modules to decouple row-wise (forward) and column-wise (backward) quantization. This option saves memory for certain workloads and training recipes.
[PyTorch] Improved performance of MXFP8 quantization kernels.
[Core] Improved performance of KV caching kernels.
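To make the first item above concrete: with gradient accumulation fusion, each microbatch's gradient is accumulated directly into a persistent buffer instead of being materialized in a temporary tensor and added in a separate pass. The following is a minimal, dependency-free sketch of that idea; it is a conceptual toy, not Transformer Engine's fused CUDA implementation, and the function names are illustrative.

```python
# Conceptual sketch of gradient accumulation fusion. Function names are
# illustrative only; TE's real fusion happens inside CUDA kernels.

def unfused_accumulate(microbatch_grads):
    """Two passes per step: materialize each gradient, then add it."""
    acc = [0.0] * len(microbatch_grads[0])
    for grad in microbatch_grads:
        tmp = list(grad)          # separate gradient buffer is allocated...
        for i, g in enumerate(tmp):
            acc[i] += g           # ...then a second pass accumulates it
    return acc

def fused_accumulate(microbatch_grads):
    """Fused path: write each gradient directly into the persistent buffer."""
    acc = [0.0] * len(microbatch_grads[0])
    for grad in microbatch_grads:
        for i, g in enumerate(grad):
            acc[i] += g           # single pass, no temporary buffer
    return acc

grads = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(fused_accumulate(grads) == unfused_accumulate(grads))  # → True
```

Both paths produce identical results; the fused path simply skips the intermediate gradient buffer, which is where the memory and bandwidth savings come from.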
Fixed Issues¶
[PyTorch] Fixed an issue in the LayerNormLinear module where the returned normalization output had a different shape than the input tensor.
[PyTorch] Fixed an issue with the align_size calculation in the FP8 padding/unpadding modules.
[PyTorch] Made miscellaneous fixes and enhancements to the fusible ops (te.sequential) API.
[PyTorch] Reduced CPU overhead in these workloads: the DelayedScaling recipe, MXFP8 MoE, and pipeline parallelism.
[PyTorch] Fixed a bug in the multi-tensor Adam kernel that incorrectly downcast an FP32 tensor to BF16.
[PyTorch] Fixed an issue with caching FP8 weights when running validation steps between training steps.
[PyTorch] Fixed a logical error that could lead to using a sub-optimal attention backend when a better-performing backend is available.
[PyTorch] Fixed miscellaneous errors during runtime loading of shared libraries by expanding search paths.
[PyTorch] Fixed a use-after-free bug in cases where quantization and normalization are unfused.
[Jax] Fixed a crash with grouped GEMM in CUDA version ≥ 12.9.1.
[Jax] Fixed a build failure with JAX v0.7.0 caused by the removal of jax.extend.ffi.
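For context on the align_size fix above: FP8 GEMM kernels typically require tensor dimensions to be padded to a multiple of an alignment size, and the padding/unpadding modules compute that rounded-up size. The sketch below shows the general round-up-and-pad pattern; the alignment value of 16 and the helper names are illustrative assumptions, not Transformer Engine's API.

```python
# Conceptual sketch of FP8 padding/unpadding alignment. The align_size value
# (16 here) and helper names are illustrative, not TE's actual implementation.

def padded_size(n: int, align_size: int = 16) -> int:
    """Round n up to the next multiple of align_size."""
    return ((n + align_size - 1) // align_size) * align_size

def pad(rows, align_size: int = 16):
    """Append zero rows so the row count meets the alignment requirement."""
    target = padded_size(len(rows), align_size)
    width = len(rows[0]) if rows else 0
    return rows + [[0.0] * width for _ in range(target - len(rows))]

def unpad(rows, original_len: int):
    """Drop the padding rows after the GEMM to recover the original shape."""
    return rows[:original_len]

tokens = [[1.0, 2.0]] * 10               # 10 rows -> padded up to 16
padded = pad(tokens)
print(len(padded))                        # → 16
print(len(unpad(padded, len(tokens))))    # → 10
```

Getting the round-up calculation wrong produces shape mismatches in the downstream GEMM, which is the class of bug the align_size fix addresses.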
Known Issues in This Release¶
There are no known issues in this release.
Breaking Changes in This Release¶
There are no breaking changes in this release.
Deprecated Features¶
There are no deprecated features in this release.
Miscellaneous¶
There are no miscellaneous issues in this release.