.. include:: /content/common.rsts

Release Notes |ndash| Release 1.7
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [JAX] Added support for SwiGLU, gated/non-gated ReLU, Quick GeLU, and squared ReLU activations.
- [pyTorch] Added support for attention bias and various QKV formats when using context parallelism.
- [pyTorch] Expanded the Linear API to handle zero input tokens for MoE-like use cases (see the first sketch at the end of these notes).
- [pyTorch] Added support for upstream AMP (``torch.amp.autocast``) in the checkpoint API (see the second sketch at the end of these notes).
- [pyTorch] Added squared ReLU activation.
- [pyTorch] Updated flash-attention support to version 2.5.8.
- [paddle-paddle] Added support for gradient accumulation fusion.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed an uninitialized TP group error that could occur when training with certain tensor-parallel configurations.
- [pyTorch] Fixed a bug that occurred when loading a checkpoint with calibrated high-precision weights.
- [pyTorch] Improved the documentation for attention masks.
- [JAX] Fixed a bug with mismatched shapes between activations and their corresponding sharding constraints.
- [JAX] Fixed an internal bug that caused an incorrect shape to be passed for the LayerNorm gradient.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.
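
Usage Sketches
@@@@@@@@@@@@@@

The following is a minimal sketch of the zero-input-token case in the Linear API. It assumes the ``transformer_engine.pytorch.Linear`` module; the sizes and tensor names are illustrative, not taken from the library's documentation.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Hypothetical sizes for illustration only.
    linear = te.Linear(1024, 1024, bias=True).cuda()

    # In MoE-style dispatch, an expert can receive zero tokens. As of this
    # release, a zero-length token dimension passes through the forward
    # instead of raising an error.
    empty_tokens = torch.empty(0, 1024, device="cuda")
    out = linear(empty_tokens)
    print(out.shape)  # torch.Size([0, 1024])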
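
The second sketch shows upstream AMP around the checkpoint API. It assumes ``transformer_engine.pytorch.checkpoint`` and a ``LayerNormMLP`` block; the wrapped module, shapes, and dtype are illustrative choices, not prescribed by the release.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te

    # Hypothetical module and input for illustration only.
    layer = te.LayerNormMLP(1024, 4096).cuda()
    inp = torch.randn(32, 1024, device="cuda", requires_grad=True)

    # torch.amp.autocast is the upstream replacement for
    # torch.cuda.amp.autocast; as of this release it is honored inside
    # the checkpointed region, including during recomputation.
    with torch.amp.autocast("cuda", dtype=torch.bfloat16):
        out = te.checkpoint(layer, inp)

    out.sum().backward()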