.. include:: /content/common.rsts

Release Notes |ndash| Release 1.11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Added dtensor support for optimizers.
- [pyTorch] Added a context parallel implementation that uses QKV all-to-all collectives.
- [pyTorch] Added support for CPU offloading when using FP8 attention (see the FP8 usage sketch at the end of these notes).
- [pyTorch] Implemented padding and unpadding modules for FP8 that improve the end-to-end performance of MoE models by ~2%.
- [C/pyTorch] Added support for permutation operations for MoE and exposed them in the C API.
- [pyTorch] Added support for RoPE when using FP8 attention.
- [pyTorch] Added support for FlashAttention-3.
- [JAX] Implemented context parallel fused attention using all-gather and reduce-scatter collectives.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed a crash in the fused Adam optimizer when master parameters are not set.
- [pyTorch] Fixed a crash when using activation recompute with Python 3.10.
- [pyTorch] Made miscellaneous fixes to the logic that selects the correct attention backend.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.
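
FP8 Usage Example
@@@@@@@@@@@@@@@@@

Several of the items above (CPU offloading with FP8 attention, RoPE with FP8 attention, and the FP8 padding and unpadding modules) apply when modules run under FP8 autocast. The following is a minimal sketch of that execution context using the standard delayed-scaling recipe; the layer sizes and recipe arguments are illustrative placeholders, and the feature-specific options added in this release are not shown here.

.. code-block:: python

    # Minimal sketch: run a Transformer Engine module under FP8 autocast.
    # Layer sizes and recipe arguments below are illustrative placeholders.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    # Delayed-scaling FP8 recipe: E4M3 in the forward pass, E5M2 in the backward pass.
    fp8_recipe = DelayedScaling(
        fp8_format=Format.HYBRID,
        amax_history_len=16,
        amax_compute_algo="max",
    )

    layer = te.Linear(1024, 1024, bias=True).cuda()
    inp = torch.randn(128, 1024, device="cuda", requires_grad=True)

    # Supported GEMMs inside this context execute in FP8.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = layer(inp)

    out.sum().backward()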