.. include:: /content/common.rsts

Release Notes |ndash| Release 0.9.0 (BETA)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

* Added support for FlashAttention with no masking.
* Added DDP support for the no-bias training option in PyTorch.
* Added support for cuDNN fused attention in specific cases.
* Added JAX support for Praxis.
* Added JIT-compiled transpose kernels.

Fixed Issues
@@@@@@@@@@@@

* Fixed a bug in PyTorch sequence parallelism where the wrong tensor was passed to GEMM in a specific code path.
* Fixed an issue with handling nested fp8 autocasts in PyTorch (see the usage sketch at the end of these notes).
* Fixed a bug during warmup of JIT kernels that made execution non-deterministic.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

The ``TransformerLayer`` arguments ``attention_softmax_in_fp32`` and ``apply_query_key_layer_scaling`` are deprecated and will be removed in a future release. The default behavior is as if those arguments were set to ``True``.
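
To migrate, callers can simply stop passing the deprecated arguments, since the defaults already match the ``True`` behavior. Below is a minimal sketch of the forward-compatible call; the layer dimensions are illustrative assumptions, not values taken from this release.

.. code-block:: python

    import transformer_engine.pytorch as te

    # Before (deprecated): passing the arguments explicitly.
    # layer = te.TransformerLayer(
    #     hidden_size=1024,
    #     ffn_hidden_size=4096,
    #     num_attention_heads=16,
    #     attention_softmax_in_fp32=True,        # deprecated
    #     apply_query_key_layer_scaling=True,    # deprecated
    # )

    # After: omit the deprecated arguments; behavior is unchanged because
    # the defaults are equivalent to setting both arguments to True.
    layer = te.TransformerLayer(
        hidden_size=1024,
        ffn_hidden_size=4096,
        num_attention_heads=16,
    )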
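
For reference, the nested fp8 autocast pattern mentioned under Fixed Issues is sketched below. This is a minimal illustration under assumed settings (the recipe parameters and tensor shapes are placeholders), not the exact case that was fixed.

.. code-block:: python

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    fp8_recipe = recipe.DelayedScaling(margin=0, interval=1)
    linear = te.Linear(1024, 1024).cuda()
    x = torch.randn(16, 1024, device="cuda")

    # The outer autocast enables FP8 execution; the nested autocast
    # disables it for an inner region only.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        y = linear(x)                      # runs with FP8
        with te.fp8_autocast(enabled=False):
            z = linear(y)                  # runs in higher precision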