Release Notes – Release 1.12
Key Features and Enhancements
[pyTorch] Added a rotary_base argument for RoPE instead of hard-coding the base to 10000 (see the first sketch after this list).
[pyTorch] Added support for the pool argument in the make_graphed_callables API (see the second sketch after this list).
[pyTorch] Made miscellaneous minor improvements to mitigate CPU overhead.
[pyTorch] Expanded fused RoPE kernel support to include context parallelism and the “thd” qkv format.
[pyTorch] Made flash-attn an optional dependency.
[JAX] Added support for sliding window attention.
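
A minimal sketch of the rotary_base change is below. It assumes the new argument landed on RotaryPositionEmbedding in transformer_engine.pytorch.attention; the base value of 500000 and the tensor shapes are illustrative, not from the release notes.

```python
import torch
from transformer_engine.pytorch.attention import (
    RotaryPositionEmbedding,
    apply_rotary_pos_emb,
)

head_dim = 128
# The RoPE base was previously hard-coded to 10000; it is now configurable
# (sketch assumption: via a rotary_base constructor argument).
rope = RotaryPositionEmbedding(head_dim, rotary_base=500000.0)
freqs = rope(max_seq_len=4096)  # rotary frequencies for each position

# q in "sbhd" layout: [sequence, batch, heads, head_dim]
q = torch.randn(4096, 2, 16, head_dim, device="cuda")
q_rot = apply_rotary_pos_emb(q, freqs, tensor_format="sbhd", fused=True)
```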
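A second sketch shows the pool argument. It assumes the argument mirrors the same-named parameter of torch.cuda.make_graphed_callables, i.e. it accepts a handle from torch.cuda.graph_pool_handle() so that several graph captures can share one memory pool; the layer and batch sizes are illustrative.

```python
import torch
import transformer_engine.pytorch as te

layer = te.Linear(1024, 1024).cuda()
sample_args = (torch.randn(8, 1024, device="cuda", requires_grad=True),)

# Shared CUDA graph memory pool, reusable across multiple captures.
mempool = torch.cuda.graph_pool_handle()
graphed_layer = te.make_graphed_callables(layer, sample_args, pool=mempool)

out = graphed_layer(torch.randn(8, 1024, device="cuda", requires_grad=True))
```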
Fixed Issues
[pyTorch/C] Fixed window size calculation when using the cuDNN attention backend.
[pyTorch] Fixed miscellaneous bugs in the flash-attn version 3 backend.
[pyTorch] Fixed an issue when using the flash-attn backend with context parallelism.
[pyTorch] Fixed a numerical error when using FP8 with activation recompute.
[pyTorch] Fixed an issue in the backward pass of the GroupedLinear class when the weights do not require gradients (see the sketch after this list).
[JAX] Fixed a numerical bug in the cuDNN attention backend when using context parallelism.
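
A minimal sketch of the GroupedLinear scenario addressed by the fix above is shown here. It assumes GroupedLinear's forward(inp, m_splits) interface; the sizes and splits are illustrative.

```python
import torch
import transformer_engine.pytorch as te

grouped = te.GroupedLinear(3, 256, 256).cuda()  # three GEMMs in one fused module
for p in grouped.parameters():
    p.requires_grad_(False)  # frozen weights: the case covered by the fix

inp = torch.randn(96, 256, device="cuda", requires_grad=True)
out = grouped(inp, m_splits=[32, 32, 32])  # rows of `inp` assigned to each GEMM
out.sum().backward()  # gradients flow to `inp` only, not to the frozen weights
```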
Known Issues in This Release
There are no known issues in this release.
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.