Release Notes – Release 1.12
Key Features and Enhancements
[pyTorch] Added a rotary_base argument for RoPE instead of hard-coding the value to 10000 (see the first sketch after this list).
[pyTorch] Added support for the pool argument in the make_graphed_callables API (see the second sketch after this list).
[pyTorch] Made miscellaneous minor improvements to mitigate CPU overhead.
[pyTorch] Expanded fused RoPE kernel support to include context parallelism and the “thd” qkv-format (see the third sketch after this list).
[pyTorch] Made flash-attn an optional dependency.
[JAX] Added support for sliding window attention.
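First, a minimal sketch of the new rotary_base knob, assuming it is exposed on the RotaryPositionEmbedding class in transformer_engine.pytorch.attention; the head dimension, base value, and sequence length below are illustrative:

```python
import torch
from transformer_engine.pytorch.attention import RotaryPositionEmbedding

# The RoPE base was previously fixed at 10000; rotary_base makes it configurable,
# e.g. for long-context models that use a larger base.
rope = RotaryPositionEmbedding(64, rotary_base=500000.0)

# Rotary frequencies for up to 4096 positions, to be passed to the attention module.
freqs = rope(max_seq_len=4096)
```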
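Second, a sketch of the pool argument, assuming make_graphed_callables forwards it to CUDA graph capture the same way torch.cuda.make_graphed_callables does; the layer sizes and batch shape are illustrative:

```python
import torch
import transformer_engine.pytorch as te

layer_a = te.Linear(1024, 1024).cuda()
layer_b = te.Linear(1024, 1024).cuda()
sample_input = torch.randn(32, 1024, device="cuda")

# A shared memory pool lets several captured graphs reuse the same allocations.
pool = torch.cuda.graph_pool_handle()
graphed_a = te.make_graphed_callables(layer_a, (sample_input,), pool=pool)
graphed_b = te.make_graphed_callables(layer_b, (sample_input,), pool=pool)

out = graphed_b(graphed_a(sample_input))
```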
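Third, a sketch of the fused RoPE path with packed “thd” input, assuming apply_rotary_pos_emb in transformer_engine.pytorch.attention accepts fused=True, tensor_format="thd", and cu_seqlens; the shapes and sequence lengths are illustrative, and the context-parallel arguments are omitted:

```python
import torch
from transformer_engine.pytorch.attention import (
    RotaryPositionEmbedding,
    apply_rotary_pos_emb,
)

heads, head_dim = 16, 64
# Three packed sequences of lengths 5, 7, and 8 (20 tokens in total).
cu_seqlens = torch.tensor([0, 5, 12, 20], dtype=torch.int32, device="cuda")
total_tokens = int(cu_seqlens[-1])

q = torch.randn(total_tokens, heads, head_dim, device="cuda", dtype=torch.bfloat16)
freqs = RotaryPositionEmbedding(head_dim)(max_seq_len=20).cuda()

# Fused kernel applied per sequence, using cu_seqlens to locate sequence boundaries.
q_rotated = apply_rotary_pos_emb(
    q, freqs, tensor_format="thd", fused=True, cu_seqlens=cu_seqlens
)
```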
Fixed Issues
[pyTorch/C] Fixed window size calculation when using cuDNN attention backend.
[pyTorch] Fixed miscellaneous bugs in the flash-attn version 3 backend.
[pyTorch] Fixed an issue using the flash-attn backend with Context Parallelism.
[pyTorch] Fixed a numerical error when using FP8 with activation recompute.
[pyTorch] Fixed an issue in the backward pass of the GroupedLinear class when weights don’t require gradient.
[JAX] Fixed a numerical bug in the cuDNN attention backend when using Context Parallelism.
Known Issues in This Release
There are no known issues in this release.
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.