Release 0.1.0 (BETA)

Using Transformer Engine 0.1.0

To upgrade to Transformer Engine 0.1.0 from an older version of Transformer Engine, follow the installation and usage information in the NVIDIA Transformer Engine User Guide.

Note

The internal Transformer Engine C++ API used to implement operators is not yet officially supported, so it may change in future releases without advance notice.

Key Features and Enhancements

  • Easy-to-use PyTorch modules for building Transformer layers with FP8 support on H100 GPUs

  • Optimizations (e.g. fused kernels) for Transformer models across all precisions and NVIDIA GPU architectures

  • Support for parallel execution via data parallelism, tensor parallelism and sequence parallelism
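As a rough illustration of the first feature above, the following sketch runs one FP8 forward and backward pass through a single Transformer Engine linear layer. It assumes the module and recipe names used in later Transformer Engine documentation (transformer_engine.pytorch.Linear, fp8_autocast, and recipe.DelayedScaling); the exact names and arguments in 0.1.0 may differ. The function name and tensor sizes are hypothetical, and the function needs an FP8-capable GPU such as H100 to run.

```python
def fp8_linear_forward_backward(batch=2048, in_features=768, out_features=3072):
    """Run one FP8 forward/backward pass through a single TE Linear layer.

    Hypothetical helper for illustration only. Imports are done lazily so
    that merely defining this function does not require a GPU or an
    installed Transformer Engine.
    """
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # Assumed API: te.Linear mirrors torch.nn.Linear's basic signature.
    layer = te.Linear(in_features, out_features, bias=True)
    inp = torch.randn(batch, in_features, device="cuda")

    # Assumed API: a delayed-scaling recipe using the E4M3 FP8 format.
    fp8_recipe = recipe.DelayedScaling(
        margin=0, interval=1, fp8_format=recipe.Format.E4M3
    )

    # Only the forward pass goes inside the FP8 autocast region.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = layer(inp)

    # The backward pass is run outside the autocast context.
    out.sum().backward()
    return out
```

Calling fp8_linear_forward_backward() on a supported GPU should return the layer output; on other hardware the call is expected to fail.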

Fixed Issues

There are no fixed issues in this release.

Known Issues in This Release

The following issues are known to exist in this release:

  • For some model configurations, the default fusion pattern in the LayerNormMLP and TransformerLayer modules does not give the best performance. Setting the environment variable NVTE_BIAS_GELU_NVFUSION to 1 can improve performance in those cases.

  • Running the model in higher precision and then switching to FP8 precision currently results in an assertion failure.
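For the first known issue above, the environment variable can be set in the shell or from Python. The sketch below sets it from Python before Transformer Engine is imported. That ordering is an assumption: these notes do not say when the variable is read, so setting it first is the conservative choice. The helper name is hypothetical.

```python
import os


def enable_bias_gelu_nvfusion():
    """Set NVTE_BIAS_GELU_NVFUSION=1 for the current process.

    Hypothetical helper. Call it before importing transformer_engine,
    on the assumption that the variable may be read at import time.
    """
    os.environ["NVTE_BIAS_GELU_NVFUSION"] = "1"
    return os.environ["NVTE_BIAS_GELU_NVFUSION"]


enable_bias_gelu_nvfusion()

# Import Transformer Engine only after the variable is set, e.g.:
# import transformer_engine.pytorch as te
```

Whether this helps depends on the model configuration, so it is worth timing a few training steps with and without the variable set.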

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.