.. include:: /content/common.rsts

Release Notes |ndash| Release 1.7
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [JAX] Added support for SwiGLU, gated/non-gated ReLU, Quick GeLU, and squared ReLU activations.
- [pyTorch] Added support for attention bias and various QKV formats when using context parallelism.
- [pyTorch] Expanded the Linear API to handle zero input tokens for MoE-like use cases (see the first sketch at the end of these notes).
- [pyTorch] Added support for upstream AMP (``torch.amp.autocast``) in the checkpoint API (see the second sketch at the end of these notes).
- [pyTorch] Added squared ReLU activation.
- [pyTorch] Updated flash-attention support to version 2.5.8.
- [paddle-paddle] Added support for gradient accumulation fusion.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed an uninitialized TP group error that could occur when training with certain tensor-parallel configurations.
- [pyTorch] Fixed a bug that occurred when loading a checkpoint with calibrated high-precision weights.
- [pyTorch] Improved the documentation for attention masks.
- [JAX] Fixed a bug with mismatched shapes between activations and their corresponding sharding constraints.
- [JAX] Fixed an internal bug that caused an incorrect shape to be passed for the LayerNorm gradient.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.
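
Usage Sketches
@@@@@@@@@@@@@@

The following is a minimal sketch of the zero-input-token case in the Linear API. It assumes the ``transformer_engine.pytorch.Linear`` module; the sizes and tensor names are illustrative, not taken from the library's documentation.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Hypothetical sizes for illustration only.
    linear = te.Linear(1024, 1024, bias=True).cuda()

    # In MoE-style dispatch, an expert can receive zero tokens. As of this
    # release, a zero-length token dimension passes through the forward
    # instead of raising an error.
    empty_tokens = torch.empty(0, 1024, device="cuda")
    out = linear(empty_tokens)
    print(out.shape)  # torch.Size([0, 1024])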
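
The second sketch shows upstream AMP around the checkpoint API. It assumes ``transformer_engine.pytorch.checkpoint`` and a ``LayerNormMLP`` block; the wrapped module, shapes, and dtype are illustrative choices, not prescribed by the release.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Hypothetical module and input for illustration only.
    layer = te.LayerNormMLP(1024, 4096).cuda()
    inp = torch.randn(32, 1024, device="cuda", requires_grad=True)

    # torch.amp.autocast is the upstream replacement for
    # torch.cuda.amp.autocast; as of this release it is honored inside
    # the checkpointed region, including during recomputation.
    with torch.amp.autocast("cuda", dtype=torch.bfloat16):
        out = te.checkpoint(layer, inp)

    out.sum().backward()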