Release Notes: Release 0.8.0 (BETA)

Key Features and Enhancements

Added experimental support for TensorFlow (single GPU only for now).

Added C++ API for FP8 fused attention from cuDNN.

Optimized performance in some cases when using FlashAttention.

Added an option to train without biases in LayerNormMLP.
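
The LayerNorm -> Linear -> activation -> Linear pattern that LayerNormMLP fuses can be illustrated as below. This is a minimal numpy sketch, not the library's implementation; the function name layernorm_mlp and its signature are hypothetical, and passing None for the bias arguments mimics training without biases.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layernorm_mlp(x, ln_gamma, ln_beta, w1, w2, b1=None, b2=None, eps=1e-5):
    # Hypothetical reference of the LayerNorm -> Linear -> GELU -> Linear
    # pattern; b1=None and b2=None corresponds to running without biases.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    h = (x - mu) / np.sqrt(var + eps) * ln_gamma + ln_beta
    h = h @ w1
    if b1 is not None:
        h = h + b1
    h = gelu(h)
    h = h @ w2
    if b2 is not None:
        h = h + b2
    return h
```

Skipping the bias adds saves parameters and a small amount of compute per layer; whether that is acceptable depends on the model.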

Added support for zero-centered gamma in LayerNorm in JAX.
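
With zero-centered gamma, the stored LayerNorm weight is offset by one, so a freshly initialized (all-zero) gamma corresponds to an identity scale. A minimal numpy sketch of the formulation (the function name is illustrative, not the library's API):

```python
import numpy as np

def layernorm_zero_centered_gamma(x, gamma, beta, eps=1e-5):
    # The effective scale is (1 + gamma), so gamma == 0 leaves the
    # normalized activations unscaled.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return x_hat * (1.0 + gamma) + beta
```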

Added an option to perform the amax reduction asynchronously when training with FP8.
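
The idea behind an asynchronous amax reduction is to kick off the cross-GPU reduction of the absolute-maximum statistics used for FP8 scaling in the background and overlap it with other work, consuming the result at a later point rather than blocking. The following is a hypothetical single-process sketch using a thread pool; all_reduce_amax is a stand-in for a real cross-GPU max-reduction and is not the library's API.

```python
import concurrent.futures
import numpy as np

def all_reduce_amax(local_amaxes):
    # Stand-in for a cross-GPU max-reduction of amax values.
    return max(local_amaxes)

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
history = []
pending = None
for step in range(3):
    x = np.random.default_rng(step).standard_normal(16).astype(np.float32)
    local_amax = float(np.abs(x).max())
    if pending is not None:
        # Consume the reduction launched on the previous step.
        history.append(pending.result())
    # Launch this step's reduction without waiting for it.
    pending = executor.submit(all_reduce_amax, [local_amax])
if pending is not None:
    history.append(pending.result())
executor.shutdown()
```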

Fixed Issues

Fixed multiple issues with exporting the model to ONNX.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

The TransformerLayer arguments attention_softmax_in_fp32 and apply_query_key_layer_scaling are deprecated and will be removed in a future release. The default behavior is as if both arguments were set to True.