.. include:: /content/common.rsts

Release Notes |ndash| Release 1.8
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Added a new argument, ``softmax_scale``, to the ``DotProductAttention`` API (a usage sketch appears at the end of these notes).
- [pyTorch] Extended Transformer Engine's pyTorch build to always compile with tensor parallelism (TP) communication overlap support, and to remove the MPI dependency. Also exposed the ``initialize_ub`` and ``destroy_ub`` APIs for configuring communication-GEMM overlap (see the sketch at the end of these notes).
- [pyTorch] Improved documentation for the ``DotProductAttention`` API, including benchmarks and end-to-end test scripts.
- [pyTorch] Incorporated the Fused Adam and Fused SGD optimizers into Transformer Engine (see the sketch at the end of these notes). Previously they had to be installed from https://github.com/NVIDIA/apex.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Made internal changes to reduce CPU overhead.
- [pyTorch] Fixed a crash that occurred when using TorchDynamo with the ``checkpoint`` API.
- [pyTorch] Fixed an issue with loading an FP8 checkpoint when using FP8 attention.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.
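
Usage Sketches
@@@@@@@@@@@@@@

The sketches below illustrate the APIs mentioned in the feature list above. They are illustrative rather than canonical: shapes, layouts, and default values shown here are assumptions, so consult the API documentation for authoritative signatures.

First, a minimal sketch of the new ``softmax_scale`` argument to ``DotProductAttention``. The tensor layout and sizes are chosen for illustration; when ``softmax_scale`` is left unset, the usual 1/sqrt(head_dim) scaling is presumed to apply.

.. code-block:: python

    import torch
    from transformer_engine.pytorch import DotProductAttention

    # Override the default scaling applied to the attention scores.
    attn = DotProductAttention(
        num_attention_heads=16,
        kv_channels=64,          # per-head hidden size
        softmax_scale=0.125,     # custom scale instead of the default
    )

    # Query/key/value in the "sbhd" layout: (seq, batch, heads, head_dim).
    q = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
    k = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
    v = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
    out = attn(q, k, v)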
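
Next, a hedged sketch of setting up and tearing down the communication-GEMM overlap buffers with ``initialize_ub`` and ``destroy_ub``. The buffer shape and the surrounding distributed setup are assumptions here.

.. code-block:: python

    import torch
    import torch.distributed as dist
    import transformer_engine.pytorch as te

    # Assumes torch.distributed is already initialized, one process per GPU.
    seq_len, batch_size, hidden_size = 2048, 2, 4096
    tp_size = dist.get_world_size()

    # Allocate the shared buffers used for comm-GEMM overlap. The shape
    # below (tokens x hidden) is an assumption for illustration.
    te.initialize_ub(
        shape=[seq_len * batch_size, hidden_size],
        tp_size=tp_size,
    )

    # ... build and run TP-overlapped modules here (e.g. te.Linear layers
    # constructed with their userbuffers overlap options enabled) ...

    # Release the shared communication buffers when finished.
    te.destroy_ub()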
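
Finally, a short sketch of the fused optimizers now shipped with Transformer Engine. The constructor arguments shown mirror their ``torch.optim`` counterparts; treat the exact defaults as assumptions.

.. code-block:: python

    import torch
    from transformer_engine.pytorch.optimizers import FusedAdam, FusedSGD

    model = torch.nn.Linear(1024, 1024).cuda()

    # Drop-in replacements for torch.optim.Adam / torch.optim.SGD.
    optimizer = FusedAdam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    # optimizer = FusedSGD(model.parameters(), lr=1e-2, momentum=0.9)

    loss = model(torch.randn(8, 1024, device="cuda")).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()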