.. include:: /content/common.rsts

Release Notes |ndash| Release 2.3
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [PyTorch] Sped up the import of Transformer Engine by moving to lazy compilation of functions that use ``torch.compile``.
- [PyTorch] Enabled FP8 weights when using FSDP.
- [C][PyTorch] Added support for the Float8 block scaling recipe, as used in the `DeepSeek-V3 paper <https://arxiv.org/abs/2412.19437>`_, for Hopper GPUs.
- [PyTorch] Made miscellaneous fixes to reduce CPU overhead.
- [PyTorch] Added support for CPU offloading of activation tensors when using FP8 attention.
- [PyTorch] Enabled the MXFP8 recipe for the ``GroupedLinear`` module (a usage sketch follows at the end of these notes).
- [PyTorch] Added a feature to decouple the weight gradient computation from the backward function of Transformer Engine modules. Users can invoke the weight gradient (wgrad) computation explicitly, giving them finer-grained control over when gradients are computed in order to support certain advanced parallelism and overlap schemes.
- Added support for RTX 5090.
- Added support for staggered application of RoPE embeddings to the sequences in a batch, depending on their starting positions.

Fixed Issues
@@@@@@@@@@@@

- [PyTorch] Fixed a numerical bug in the use of custom DDP from megatron-core.
- [PyTorch] Fixed a crash when using the ``checkpoint`` method for activation recompute on non-Transformer Engine modules.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [JAX] Praxis layers have been removed, as PAXML is no longer supported.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

- Installing Transformer Engine now requires the ``--no-build-isolation`` flag, whether installing the PyPI package or building from source (an example command follows at the end of these notes). Support for installations with build isolation will be removed in a future release.
- [PyTorch] CPU offloading of weight tensors is deprecated.

Miscellaneous
@@@@@@@@@@@@@

There are no miscellaneous issues in this release.
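
As a usage illustration for the MXFP8 ``GroupedLinear`` support listed above, the following is a minimal sketch rather than a definitive example: it assumes the ``MXFP8BlockScaling`` recipe class from ``transformer_engine.common.recipe``, the usual ``fp8_autocast`` context, and MXFP8-capable hardware; the sizes, splits, and keyword arguments are hypothetical.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import MXFP8BlockScaling

    # One module holding 4 independent GEMMs (e.g., 4 MoE experts);
    # the sizes and arguments here are illustrative only.
    grouped = te.GroupedLinear(4, 1024, 4096, params_dtype=torch.bfloat16)

    x = torch.randn(4096, 1024, device="cuda", dtype=torch.bfloat16)
    m_splits = [1024, 1024, 1024, 1024]  # rows of ``x`` assigned to each GEMM

    # Run the forward pass under the MXFP8 recipe.
    with te.fp8_autocast(enabled=True, fp8_recipe=MXFP8BlockScaling()):
        y = grouped(x, m_splits)

    y.sum().backward()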
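
Similarly, a concrete form of the ``--no-build-isolation`` requirement noted under Deprecated Features, assuming the PyPI package with the PyTorch extra, is:

.. code-block:: bash

    pip install --no-build-isolation transformer_engine[pytorch]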