Release 0.1.0 (BETA)

Using Transformer Engine 0.1.0

To upgrade to Transformer Engine 0.1.0 from an older version of Transformer Engine, follow the installation and usage information in the NVIDIA Transformer Engine User Guide.

Note

The internal Transformer Engine C++ API used to implement operators is not yet officially supported. This API may therefore change in future releases without advance notice.

Key Features and Enhancements

Easy-to-use PyTorch modules for building Transformer layers with FP8 support on H100 GPUs
Optimizations (e.g., fused kernels) for Transformer models across all precisions and NVIDIA GPU architectures
Support for parallel execution via data parallelism, tensor parallelism, and sequence parallelism

Fixed Issues

There are no fixed issues in this release.

Known Issues in This Release

The following issues are known to exist in this release:

For some model configurations, the default fusion pattern in the LayerNormMLP and TransformerLayer modules does not give the best performance. In those cases, you can set the environment variable NVTE_BIAS_GELU_NVFUSION to 1 to improve performance.
Running a model in higher precision and then switching to FP8 precision currently results in an assertion failure.
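
As an illustration of the workaround for the first known issue, the following minimal sketch sets NVTE_BIAS_GELU_NVFUSION from within a Python script. It assumes the variable must be set before Transformer Engine is imported or its modules are constructed; these notes do not state when the variable is read, so setting it as early as possible is the conservative choice. The commented-out import only indicates placement.

```python
import os

# Set the variable at the very top of the script. Assumption: Transformer Engine
# may read it at import time or at module construction, so set it before either.
os.environ["NVTE_BIAS_GELU_NVFUSION"] = "1"

# Import Transformer Engine only after the variable is set (placement sketch):
# import transformer_engine.pytorch as te
```

Exporting the variable in the shell that launches the training job has the same effect and applies it to every process started from that shell.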

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.