.. include:: /content/common.rsts

Release Notes |ndash| Release 1.9
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [PyTorch] Added support for sliding window attention
  in the cuDNN backend.
- [PyTorch] Added an experimental `torch.nn.Sequential`-style API
  for automatic operation-based fusions.
- [C/PyTorch] Added support for a bottom-right-aligned diagonal causal mask.
- [C/PyTorch] Added support for grouped GEMM for MoE training.
- [JAX] Added support for the THD attention format.
- [PaddlePaddle] Added support for CUDA graphs.
- [PaddlePaddle] Added support for PaddlePaddle versions >= 2.6.1.
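
The two attention masks named in the list above can be pictured with a small NumPy sketch. This is only an illustration of the masking patterns, not the library's implementation or API: the function names ``causal_mask`` and ``sliding_window_mask`` are hypothetical, and the sketch assumes that ``True`` marks a position a query is allowed to attend to.

```python
import numpy as np


def causal_mask(seqlen_q, seqlen_kv, bottom_right=True):
    """Boolean mask; True marks key/value positions a query may attend to.

    Hypothetical helper for illustration only.
    """
    q_idx = np.arange(seqlen_q)[:, None]
    kv_idx = np.arange(seqlen_kv)[None, :]
    # Bottom-right alignment shifts the diagonal so that the last query row
    # lines up with the last key/value column. Top-left alignment starts the
    # diagonal at position (0, 0) instead.
    offset = seqlen_kv - seqlen_q if bottom_right else 0
    return kv_idx <= q_idx + offset


def sliding_window_mask(seqlen, left):
    """Causal self-attention mask limited to `left` previous positions.

    Hypothetical helper for illustration only.
    """
    q_idx = np.arange(seqlen)[:, None]
    kv_idx = np.arange(seqlen)[None, :]
    rel = kv_idx - q_idx
    # Keep the current position and at most `left` positions before it.
    return (rel <= 0) & (rel >= -left)


if __name__ == "__main__":
    print(causal_mask(2, 4, bottom_right=True).astype(int))
    print(causal_mask(2, 4, bottom_right=False).astype(int))
    print(sliding_window_mask(4, left=1).astype(int))
```

The two alignments differ only when the query and key/value sequence lengths differ, for example when decoding with a cache of earlier keys and values. For square masks they are identical.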


Fixed Issues
@@@@@@@@@@@@

- [PyTorch] Fixed incorrect outputs
  when handling non-contiguous input tensors.
- [PyTorch] Fixed a hang in the `initialize_ub` function
  during multi-node runs and made miscellaneous improvements
  to communication-GEMM overlap with userbuffers.
- [PyTorch] Fixed a convergence issue when using CPU offloading.
- [PyTorch] Fixed a crash that occurred when using MoE
  and an expert received zero tokens.
- [JAX] Fixed a crash with newer JAX versions,
  which restrict the output format of HLO lowering.
- [PaddlePaddle] Fixed a crash when using
  the standalone column parallel linear API.
- Fixed a numerical bug in the QGeLU activation.
- Fixed a compilation bug in the core library with CUDA 12.1.
- Fixed a bug in the selection of tuned RMSNorm kernels.
- Reduced performance overhead
  by reducing the number of calls to the CUDA driver.
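
The zero-token edge case from the MoE fix above can be pictured with a reference grouped GEMM in plain NumPy. This is a minimal sketch under stated assumptions, not the library's kernel and not a description of what caused the crash: the function name ``grouped_gemm_reference`` is hypothetical, and tokens are assumed to be already sorted so that each expert owns a contiguous block of rows.

```python
import numpy as np


def grouped_gemm_reference(tokens, expert_weights, tokens_per_expert):
    """Multiply each contiguous group of token rows by its expert's weight.

    Hypothetical reference helper for illustration only.
    """
    assert sum(tokens_per_expert) == tokens.shape[0]
    out_features = expert_weights[0].shape[1]
    out = np.empty((tokens.shape[0], out_features), dtype=tokens.dtype)
    start = 0
    for weight, count in zip(expert_weights, tokens_per_expert):
        end = start + count
        # When count == 0 both slices are empty, so this assignment is a
        # no-op and the expert is skipped without special handling.
        out[start:end] = tokens[start:end] @ weight
        start = end
    return out


if __name__ == "__main__":
    tokens = np.arange(6, dtype=np.float32).reshape(3, 2)
    eye = np.eye(2, dtype=np.float32)
    weights = [1.0 * eye, 2.0 * eye, 3.0 * eye]
    # The second expert receives no tokens.
    print(grouped_gemm_reference(tokens, weights, [2, 0, 1]))
```

An optimized implementation typically launches all per-expert products together, which is where an empty group needs to be handled explicitly.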


Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.


Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.


Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.