Release Notes – Release 1.2.0
Key Features and Enhancements
[pyTorch] Sliding window support is added for DotProductAttention.
[pyTorch] DotProductAttention performance on Hopper GPUs is improved by using cuDNN.
[pyTorch] Support for the Falcon architecture is added in TransformerLayer via the new option parallel_attention_mlp.
[pyTorch] Checkpointing logic when using fp8_model_init is improved.
[JAX] Support is added for controlling the SM margin in the LayerNorm and RMSNorm kernels via the environment variables NVTE_FWD_LAYERNORM_SM_MARGIN and NVTE_BWD_LAYERNORM_SM_MARGIN.
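To illustrate what sliding window attention restricts, here is a minimal pure-Python sketch of a sliding-window attention mask (1 = attend, 0 = masked). It assumes a (left, right) window convention similar to FlashAttention's window_size argument, with a negative value meaning "unbounded on that side", and equal query and key lengths. The helper sliding_window_mask is hypothetical and is not part of the Transformer Engine API; check the DotProductAttention documentation for the actual parameter names and semantics.

```python
def sliding_window_mask(seq_len, left, right):
    """Return a seq_len x seq_len keep-mask (1 = attend, 0 = masked).

    Hypothetical helper for illustration only. Query position i may attend
    to key position j when i - left <= j <= i + right. A negative window
    size is treated as unbounded on that side (assumed convention, modeled
    on FlashAttention's window_size=(-1, -1) meaning "no window").
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            within_left = left < 0 or j >= i - left
            within_right = right < 0 or j <= i + right
            row.append(1 if within_left and within_right else 0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Causal sliding window: each token sees itself and the 2 previous tokens.
    for row in sliding_window_mask(5, left=2, right=0):
        print(*row)
```

With left=2 and right=0, each row keeps at most three entries, so attention cost grows linearly with sequence length instead of quadratically; setting left=-1, right=0 recovers an ordinary causal mask.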
Fixed Issues
Weight gradient could be computed incorrectly in some cases when FP8 execution and sequence parallelism were used together.
Statistics were computed incorrectly during FP8 calibration.
Using torch.compile on the DotProductAttention module caused a crash.
Rotary embeddings during pipeline-parallel inference did not operate correctly.
The decoder in encoder-decoder architectures used an incorrect mask type.
Exporting Transformer Engine modules to ONNX did not work correctly with recent versions of PyTorch.
Known Issues in This Release
FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358).
You can work around this issue either by setting the environment variable MAX_JOBS=1 during Transformer Engine installation, or by installing FlashAttention v1 (e.g., by running pip install flash-attn==1.0.9) before installing Transformer Engine.
[pyTorch] FlashAttention v2.1 changed the behavior of the causal mask when performing cross-attention. (See https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag for reference.) To keep Transformer Engine behavior consistent across versions and backends, FlashAttention is disabled for this use case (cross-attention with causal masking) when FlashAttention 2.1 or later is installed.
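The two causal-mask conventions only differ when the query and key lengths differ, as in cross-attention. The following pure-Python sketch reproduces the top-left alignment (FlashAttention before 2.1) and the bottom-right alignment (FlashAttention 2.1 and later) described in the FlashAttention README referenced above, with 1 = attend and 0 = masked. The helper causal_mask is hypothetical and is not a Transformer Engine or FlashAttention function.

```python
def causal_mask(seqlen_q, seqlen_k, bottom_right):
    """Return a seqlen_q x seqlen_k causal keep-mask (1 = attend, 0 = masked).

    Hypothetical helper for illustration only.
    Top-left alignment (FlashAttention < 2.1): query i sees keys j <= i.
    Bottom-right alignment (FlashAttention >= 2.1): the diagonal is shifted
    so that the last query row sees every key, i.e. j <= i + (seqlen_k - seqlen_q).
    """
    offset = seqlen_k - seqlen_q if bottom_right else 0
    return [[1 if j <= i + offset else 0 for j in range(seqlen_k)]
            for i in range(seqlen_q)]


if __name__ == "__main__":
    # Cross-attention with 2 queries and 5 keys: the two conventions disagree.
    for name, flag in (("top-left", False), ("bottom-right", True)):
        print(name)
        for row in causal_mask(2, 5, bottom_right=flag):
            print(" ", *row)
```

For self-attention (seqlen_q == seqlen_k) the offset is zero and both conventions produce the same lower-triangular mask, which is why the behavior change only affects cross-attention with causal masking.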
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.