Release Notes – Release 1.12
Key Features and Enhancements
[pyTorch] Added a rotary_base argument for RoPE instead of hard-coding the value to 10000 (see the first sketch after this list).
[pyTorch] Added support for the pool argument in the make_graphed_callables API (see the second sketch after this list).
[pyTorch] Made miscellaneous minor improvements to mitigate CPU overhead.
[pyTorch] Expanded fused RoPE kernel support to include context parallelism and the “thd” qkv-format (see the third sketch after this list).
[pyTorch] Made flash-attn an optional dependency.
[JAX] Added support for sliding window attention.
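First, a minimal sketch of the new rotary_base knob, assuming it is exposed on the RotaryPositionEmbedding class in transformer_engine.pytorch.attention; the head dimension, base value, and sequence length below are illustrative:

```python
import torch
from transformer_engine.pytorch.attention import RotaryPositionEmbedding

# The RoPE base was previously fixed at 10000; rotary_base makes it configurable,
# e.g. for long-context models that use a larger base.
rope = RotaryPositionEmbedding(64, rotary_base=500000.0)

# Rotary frequencies for up to 4096 positions, to be passed to the attention module.
freqs = rope(max_seq_len=4096)
```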
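Second, a sketch of the pool argument, assuming make_graphed_callables forwards it to CUDA graph capture the same way torch.cuda.make_graphed_callables does; the layer sizes and batch shape are illustrative:

```python
import torch
import transformer_engine.pytorch as te

layer_a = te.Linear(1024, 1024).cuda()
layer_b = te.Linear(1024, 1024).cuda()
sample_input = torch.randn(32, 1024, device="cuda")

# A shared memory pool lets several captured graphs reuse the same allocations.
pool = torch.cuda.graph_pool_handle()
graphed_a = te.make_graphed_callables(layer_a, (sample_input,), pool=pool)
graphed_b = te.make_graphed_callables(layer_b, (sample_input,), pool=pool)

out = graphed_b(graphed_a(sample_input))
```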
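Third, a sketch of the fused RoPE path with packed “thd” input, assuming apply_rotary_pos_emb in transformer_engine.pytorch.attention accepts fused=True, tensor_format="thd", and cu_seqlens; the shapes and sequence lengths are illustrative, and the context-parallel arguments are omitted:

```python
import torch
from transformer_engine.pytorch.attention import (
    RotaryPositionEmbedding,
    apply_rotary_pos_emb,
)

heads, head_dim = 16, 64
# Three packed sequences of lengths 5, 7, and 8 (20 tokens in total).
cu_seqlens = torch.tensor([0, 5, 12, 20], dtype=torch.int32, device="cuda")
total_tokens = int(cu_seqlens[-1])

q = torch.randn(total_tokens, heads, head_dim, device="cuda", dtype=torch.bfloat16)
freqs = RotaryPositionEmbedding(head_dim)(max_seq_len=20).cuda()

# Fused kernel applied per sequence, using cu_seqlens to locate sequence boundaries.
q_rotated = apply_rotary_pos_emb(
    q, freqs, tensor_format="thd", fused=True, cu_seqlens=cu_seqlens
)
```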
Fixed Issues
[pyTorch/C] Fixed window size calculation when using cuDNN attention backend.
[pyTorch] Fixed miscellaneous bugs in the flash-attn version 3 backend.
[pyTorch] Fixed an issue using the flash-attn backend with Context Parallelism.
[pyTorch] Fixed a numerical error when using FP8 with activation recompute.
[pyTorch] Fixed an issue in the backward pass of the GroupedLinear class when weights don’t require gradient.
[JAX] Fixed a numerical bug in the cuDNN attention backend when using Context Parallelism.
Known Issues in This Release
There are no known issues in this release.
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.