Release Notes: Release 0.8.0 (BETA)

Key Features and Enhancements

Added experimental support for TensorFlow (single GPU only for now).

Added C++ API for FP8 fused attention from cuDNN.

Optimized performance in some cases when using FlashAttention.

Added an option to train without biases in LayerNormMLP.
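
The LayerNorm -> Linear -> activation -> Linear pattern that LayerNormMLP fuses can be illustrated as below. This is a minimal numpy sketch, not the library's implementation; the function name layernorm_mlp and its signature are hypothetical, and passing None for the bias arguments mimics training without biases.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layernorm_mlp(x, ln_gamma, ln_beta, w1, w2, b1=None, b2=None, eps=1e-5):
    # Hypothetical reference of the LayerNorm -> Linear -> GELU -> Linear
    # pattern; b1=None and b2=None corresponds to running without biases.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    h = (x - mu) / np.sqrt(var + eps) * ln_gamma + ln_beta
    h = h @ w1
    if b1 is not None:
        h = h + b1
    h = gelu(h)
    h = h @ w2
    if b2 is not None:
        h = h + b2
    return h
```

Skipping the bias adds saves parameters and a small amount of compute per layer; whether that is acceptable depends on the model.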

Added support for zero-centered gamma in LayerNorm in JAX.
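
With zero-centered gamma, the stored LayerNorm weight is offset by one, so a freshly initialized (all-zero) gamma corresponds to an identity scale. A minimal numpy sketch of the formulation (the function name is illustrative, not the library's API):

```python
import numpy as np

def layernorm_zero_centered_gamma(x, gamma, beta, eps=1e-5):
    # The effective scale is (1 + gamma), so gamma == 0 leaves the
    # normalized activations unscaled.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return x_hat * (1.0 + gamma) + beta
```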

Added an option to perform the amax reduction asynchronously when training with FP8.
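
The idea behind an asynchronous amax reduction is to kick off the cross-GPU reduction of the absolute-maximum statistics used for FP8 scaling in the background and overlap it with other work, consuming the result at a later point rather than blocking. The following is a hypothetical single-process sketch using a thread pool; all_reduce_amax is a stand-in for a real cross-GPU max-reduction and is not the library's API.

```python
import concurrent.futures
import numpy as np

def all_reduce_amax(local_amaxes):
    # Stand-in for a cross-GPU max-reduction of amax values.
    return max(local_amaxes)

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
history = []
pending = None
for step in range(3):
    x = np.random.default_rng(step).standard_normal(16).astype(np.float32)
    local_amax = float(np.abs(x).max())
    if pending is not None:
        # Consume the reduction launched on the previous step.
        history.append(pending.result())
    # Launch this step's reduction without waiting for it.
    pending = executor.submit(all_reduce_amax, [local_amax])
if pending is not None:
    history.append(pending.result())
executor.shutdown()
```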

Fixed Issues

Fixed multiple issues with exporting the model to ONNX.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

The TransformerLayer arguments attention_softmax_in_fp32 and apply_query_key_layer_scaling are deprecated and will be removed in a future release. The default behavior is as if both arguments were set to True.