Release 0.1.0 (BETA)
Using Transformer Engine 0.1.0
To upgrade to Transformer Engine 0.1.0 from an older version of Transformer Engine, follow the installation and usage information in the NVIDIA Transformer Engine User Guide.
Note
The internal Transformer Engine C++ API used to implement operators is not yet officially supported, so it may change in future releases without advance notice.
Key Features and Enhancements
- Easy-to-use PyTorch modules for building Transformer layers with FP8 support on H100 GPUs
- Optimizations (e.g. fused kernels) for Transformer models across all precisions and NVIDIA GPU architectures 
- Support for parallel execution via data parallelism, tensor parallelism and sequence parallelism 
Fixed Issues
There are no fixed issues in this release.
Known Issues in This Release
The following issues are known to exist in this release:
- For some model configurations, the default fusion pattern in the LayerNormMLP and TransformerLayer modules does not give the best performance. You can set the environment variable NVTE_BIAS_GELU_NVFUSION to 1 to improve performance in those cases.
- Running the model in higher precision and then switching to FP8 precision currently results in an assertion failure.
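For the first known issue above, one way to set the environment variable is from the Python script itself. This is a minimal sketch: only the variable name and the value 1 come from these notes. Exactly when Transformer Engine reads the variable is not stated here, so the sketch assumes the safe choice of setting it before the library is imported.

```python
import os

# Set the variable named in the known issue above.
# Assumption: setting it before importing Transformer Engine guarantees
# the library sees it; these notes do not say when the value is read.
os.environ["NVTE_BIAS_GELU_NVFUSION"] = "1"

# Import Transformer Engine and build the model only after this point.
print("NVTE_BIAS_GELU_NVFUSION =", os.environ["NVTE_BIAS_GELU_NVFUSION"])
```

Exporting the variable in the shell that launches the training script should have the same effect. Because the notes say the default pattern is slower only for some model configurations, it is worth measuring performance both with and without the setting.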
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.