Release Notes: Release 1.11

Key Features and Enhancements

  • [pyTorch] Added DTensor support for optimizers.

  • [pyTorch] Added a context parallel implementation that uses QKV all-to-all collectives.

  • [pyTorch] Added support for CPU offloading when using FP8 attention.

  • [pyTorch] Implemented padding and unpadding modules for FP8 that improve end-to-end performance of MoE models by ~2% (see the first sketch after this list).

  • [C/pyTorch] Added permutation operations for MoE and exposed them in the C API.

  • [pyTorch] Added support for RoPE when using FP8 attention (see the second sketch after this list).

  • [pyTorch] Added support for FlashAttention-3 (see the third sketch after this list).

  • [JAX] Implemented context parallel fused attention using all-gather and reduce-scatter collectives.
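
The FP8 padding change can be pictured as follows: per-expert token blocks are padded up to a multiple of 16 rows so the FP8 GEMMs see aligned shapes, and the padding is stripped again after the expert computation. The sketch below illustrates that idea in plain PyTorch; it is not the library API, and the helper names are illustrative only.

    # Plain-PyTorch illustration of the FP8 padding/unpadding idea for MoE:
    # each expert's token block is padded to a multiple of 16 so the FP8 GEMMs
    # see aligned shapes, then the padding is stripped after the expert MLP.
    # These helpers are illustrative, not the library API.
    import torch

    ALIGN = 16  # FP8 GEMMs prefer row counts that are multiples of 16

    def pad_expert_tokens(tokens: torch.Tensor, counts: list[int]):
        """Split `tokens` by per-expert `counts` and pad each split to ALIGN rows."""
        padded, padded_counts = [], []
        for chunk in torch.split(tokens, counts):
            target = -(-chunk.shape[0] // ALIGN) * ALIGN  # round up to a multiple of ALIGN
            pad_rows = target - chunk.shape[0]
            padded.append(torch.nn.functional.pad(chunk, (0, 0, 0, pad_rows)))
            padded_counts.append(target)
        return torch.cat(padded), padded_counts

    def unpad_expert_tokens(tokens: torch.Tensor, padded_counts: list[int], counts: list[int]):
        """Inverse of pad_expert_tokens: drop the pad rows from each expert block."""
        chunks = torch.split(tokens, padded_counts)
        return torch.cat([c[:n] for c, n in zip(chunks, counts)])

    tokens = torch.randn(50, 1024)   # routed tokens, hidden size 1024
    counts = [23, 11, 16]            # tokens assigned to each of 3 experts
    padded, padded_counts = pad_expert_tokens(tokens, counts)
    restored = unpad_expert_tokens(padded, padded_counts, counts)
    assert torch.equal(restored, tokens)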
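
A minimal sketch of combining RoPE with FP8 execution from PyTorch is shown below. It assumes the transformer_engine.pytorch module paths given in the imports (TransformerLayer, fp8_autocast, RotaryPositionEmbedding) and the rotary_pos_emb forward argument; verify these against the installed version. Enabling FP8 inside the attention kernels themselves may require additional recipe options not shown here.

    # Sketch only: runs a TransformerLayer with rotary position embeddings
    # under FP8 autocast. Import paths and the rotary_pos_emb argument are
    # assumptions to verify against the installed version.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format
    from transformer_engine.pytorch.attention import RotaryPositionEmbedding

    seq_len, batch, heads, head_dim = 512, 2, 16, 64
    hidden = heads * head_dim

    layer = te.TransformerLayer(
        hidden_size=hidden,
        ffn_hidden_size=4 * hidden,
        num_attention_heads=heads,
        params_dtype=torch.bfloat16,
    ).cuda()

    rope = RotaryPositionEmbedding(head_dim)
    freqs = rope(seq_len)  # rotary frequencies consumed by the attention block

    x = torch.randn(seq_len, batch, hidden, dtype=torch.bfloat16, device="cuda")

    recipe = DelayedScaling(fp8_format=Format.HYBRID)
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        y = layer(x, rotary_pos_emb=freqs)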
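
Attention backend selection is steered by environment variables; the sketch below shows one way to prefer the FlashAttention path, which can pick up FlashAttention-3 kernels when a compatible flash-attn build and GPU are present. The flag names and their precedence are assumptions to verify for your version.

    # Sketch of steering attention backend selection via environment variables.
    # Set the flags before importing the library; whether FlashAttention-3 is
    # actually used also depends on the installed flash-attn build and GPU,
    # so treat this as an assumption to verify.
    import os
    os.environ["NVTE_FLASH_ATTN"] = "1"   # allow the FlashAttention backend
    os.environ["NVTE_FUSED_ATTN"] = "0"   # disable the cuDNN fused-attention backend

    import torch
    import transformer_engine.pytorch as te

    seq_len, batch, heads, head_dim = 512, 2, 16, 64
    attn = te.DotProductAttention(num_attention_heads=heads, kv_channels=head_dim)

    q, k, v = (torch.randn(seq_len, batch, heads, head_dim,
                           dtype=torch.bfloat16, device="cuda") for _ in range(3))
    out = attn(q, k, v)  # context of shape (seq_len, batch, heads * head_dim)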

Fixed Issues

  • [pyTorch] Fixed a crash in the fused Adam optimizer when master parameters are not set (see the sketch after this list).

  • [pyTorch] Fixed a crash when using activation recompute with Python 3.10.

  • [pyTorch] Made miscellaneous fixes to the attention backend selection logic.
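
The fused Adam fix concerns the configuration in which no master parameters are supplied to the optimizer. A minimal sketch of that configuration, assuming FusedAdam is importable from transformer_engine.pytorch.optimizers (verify the path for your version):

    # Sketch of the configuration touched by the fused Adam fix: the optimizer
    # is constructed without any master parameter copies. The import path is
    # an assumption to verify against the installed version.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.pytorch.optimizers import FusedAdam

    model = te.Linear(1024, 1024).cuda()
    opt = FusedAdam(model.parameters(), lr=1e-4)  # no master parameters configured

    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()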

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.