Release Notes: Release 1.8

Key Features and Enhancements

  • [pyTorch] Added a new argument, softmax_scale, to the DotProductAttention API (see the first sketch after this list).

  • [pyTorch] Extended Transformer Engine’s pyTorch build to always compile with tensor parallelism (TP) communication overlap support, removing the MPI dependency. Also exposed the initialize_ub and destroy_ub APIs for configuring communication-GEMM overlap (see the second sketch after this list).

  • [pyTorch] Improved documentation for the DotProductAttention API, including benchmarks and end-to-end test scripts.

  • [pyTorch] Incorporated the Fused Adam and Fused SGD optimizers into Transformer Engine (see the third sketch after this list). Previously they had to be installed from the GitHub repository https://github.com/NVIDIA/apex.
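
A minimal sketch of the new softmax_scale argument, under assumed shapes, dtypes, and scale value; none of these specifics come from the release notes. When the argument is left unset, the standard 1/sqrt(head_dim) scaling presumably applies.

    # Sketch: passing the new softmax_scale argument to DotProductAttention.
    # Shapes, dtypes, and the scale value are illustrative assumptions.
    import torch
    import transformer_engine.pytorch as te

    attn = te.DotProductAttention(
        num_attention_heads=16,
        kv_channels=64,          # per-head dimension
        softmax_scale=0.125,     # new in 1.8; overrides the default scaling
    )

    # Default "sbhd" layout: [sequence, batch, heads, head_dim].
    q = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
    k = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
    v = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
    out = attn(q, k, v)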
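A hedged sketch of the newly exposed pair. The import location and the keyword arguments shown (shape, tp_size, use_fp8, dtype) are assumptions about the released signature; consult the API documentation before relying on them.

    # Sketch: setting up and tearing down buffers for TP communication overlap.
    # Argument names and import path are assumptions; see the TE docs for the
    # exact signature.
    import torch
    import transformer_engine.pytorch as te

    SEQ_LEN, BATCH, HIDDEN = 2048, 2, 4096

    # Allocate the buffers that let TP communication overlap with GEMMs.
    te.initialize_ub(
        shape=[SEQ_LEN * BATCH, HIDDEN],  # global input shape seen by TP layers
        tp_size=8,                        # tensor-parallel group size
        use_fp8=False,
        dtype=torch.bfloat16,
    )

    try:
        ...  # build modules with overlap enabled and run training
    finally:
        te.destroy_ub()  # release the communication buffers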
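A short sketch of the bundled optimizers, assuming they are importable from transformer_engine.pytorch.optimizers; the model and hyperparameters are illustrative.

    # Sketch: using the fused optimizers now shipped with Transformer Engine,
    # with no apex installation required. Hyperparameters are illustrative.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.pytorch.optimizers import FusedAdam, FusedSGD

    model = te.Linear(1024, 1024).cuda()
    optimizer = FusedAdam(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
    # Alternatively: FusedSGD(model.parameters(), lr=1e-3, momentum=0.9)

    inp = torch.randn(8, 1024, device="cuda")
    model(inp).sum().backward()
    optimizer.step()
    optimizer.zero_grad()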

Fixed Issues

  • [pyTorch] Made internal changes to reduce CPU overhead.

  • [pyTorch] Fixed a crash that occurred when using TorchDynamo with the checkpoint API (see the sketch after this list).

  • [pyTorch] Fixed an issue with loading an FP8 checkpoint when using FP8 attention.
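
For context on the TorchDynamo fix, here is a minimal sketch of the checkpoint API it concerns; the module choice and tensor shapes are assumptions.

    # Sketch: activation checkpointing through TE's checkpoint API, the code
    # path exercised by the TorchDynamo fix. Module and shapes are illustrative.
    import torch
    import transformer_engine.pytorch as te

    layer = te.TransformerLayer(
        hidden_size=1024, ffn_hidden_size=4096, num_attention_heads=16
    ).cuda()

    # Default layout: [sequence, batch, hidden].
    hidden = torch.randn(128, 2, 1024, device="cuda", requires_grad=True)

    # Recompute activations during backward instead of storing them.
    out = te.checkpoint(layer, hidden)
    out.sum().backward()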

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.