Transformer Engine v2.12 Release Notes
Key Features and Enhancements
Made miscellaneous improvements and fixes to the documentation.
[C] Improved performance of NVFP4 quantization kernels. (#2412)
[C] Documented environment variables. (#2552)
[PyTorch] Added fused permute+pad and unpermute+unpad operations for FP8 optimization. (#1921)
[PyTorch] Improved performance in CPU-limited scenarios.
[PyTorch] Added support for Sliding Window Attention (left, right) with fused attention. (#2477)
[PyTorch] Improved the performance of MXFP8 and NVFP4 by fusing the swizzling into the quantization kernels. (#2486)
[PyTorch] Added CUDA Graph support for activation recomputation. (#2518)
[JAX] Added a tutorial for integrating TE/JAX quantization into existing frameworks. (#2423)
[JAX] Added custom partitioning for permutation primitives. (#2591)
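For readers new to the fused permute+pad operation (#1921), the sketch below shows what padding after an MoE permutation means: each expert's contiguous group of tokens is rounded up to a multiple of an alignment so that FP8 GEMMs see aligned row counts, and unpadding drops the filler rows again. This is a pure-Python illustration of the data layout only, not Transformer Engine's API or kernel. The helper names (padded_splits, pad_groups, unpad_groups) are hypothetical, and the default alignment of 16 is an assumption chosen for illustration.

```python
def padded_splits(tokens_per_expert, align=16):
    """Round each expert's token count up to a multiple of `align`.

    Returns (padded_counts, start_offsets) in the padded buffer.
    Hypothetical helper; the alignment value is an assumption.
    """
    padded = [(n + align - 1) // align * align for n in tokens_per_expert]
    offsets, start = [], 0
    for p in padded:
        offsets.append(start)
        start += p
    return padded, offsets


def pad_groups(rows, tokens_per_expert, align=16, fill=None):
    """Append `fill` rows after each expert's group until it is aligned."""
    padded_counts, _ = padded_splits(tokens_per_expert, align)
    out, pos = [], 0
    for n, p in zip(tokens_per_expert, padded_counts):
        out.extend(rows[pos:pos + n])   # real tokens for this expert
        out.extend([fill] * (p - n))    # filler rows up to the aligned size
        pos += n
    return out


def unpad_groups(padded_rows, tokens_per_expert, align=16):
    """Inverse of pad_groups: keep only the real rows of each group."""
    _, offsets = padded_splits(tokens_per_expert, align)
    out = []
    for n, off in zip(tokens_per_expert, offsets):
        out.extend(padded_rows[off:off + n])
    return out


if __name__ == "__main__":
    rows = list(range(8))            # 5 tokens for expert 0, 3 for expert 1
    padded = pad_groups(rows, [5, 3], align=4)
    print(padded)                    # [0, 1, 2, 3, 4, None, None, None, 5, 6, 7, None]
    print(unpad_groups(padded, [5, 3], align=4))
```

Fusing these steps into the permutation avoids materializing the unpadded permuted buffer and then copying it again, which is the usual motivation for such a fusion.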
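The sliding window attention item (#2477) refers to a (left, right) window. The sketch below assumes the FlashAttention-style convention, in which query i may attend to key j when i - left <= j <= i + right and a negative bound leaves that side unlimited, so (-1, 0) reduces to a causal mask. It builds the boolean mask for a square, self-attention case only. The function name is hypothetical and this is not the fused-attention implementation.

```python
def sliding_window_mask(seq_len, left, right):
    """Return mask[i][j] == True where query i may attend to key j.

    Assumed convention: key j is visible when i - left <= j <= i + right;
    a negative `left` or `right` means that side of the window is unbounded.
    Square (self-attention) case only.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            ok_left = left < 0 or j >= i - left
            ok_right = right < 0 or j <= i + right
            row.append(ok_left and ok_right)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Window of 2 past tokens plus the current token, no future tokens.
    for row in sliding_window_mask(5, 2, 0):
        print("".join("x" if v else "." for v in row))
```

A fused kernel does not materialize this mask; it skips key blocks that fall entirely outside the window, which is where the speedup over dense masking comes from.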
Fixed Issues
[C] Fixed SM120 compilation with CUDA 12. (#2482)
[C] Fixed overflow in padding and unpadding kernels. (#2548)
[C] Fixed a numerical issue in sort_chunks_by_index. (#2566)
[C] Fixed a numerical issue in swizzling blockwise E8 scales. (#2589)
[PyTorch] Fixed an AttributeError when checkpointing a model with MXFP8 parameters. (#2427)
[PyTorch] Fixed cross-entropy loss calculation when some tokens are ignored. (#2476)
[PyTorch] Fixed Float8Tensor.contiguous autograd support. (#2533)
[PyTorch] Fixed multiple CPU offloading issues. (#2535)
[PyTorch] Fixed uninitialized permuted_scale values. (#2547)
[PyTorch] Fixed FP8 quantization for the second MLP in LayerNormMLP. (#2577)
[PyTorch] Fixed ONNX tests and added FP8 attention export support. (#2598)
[JAX] Removed unused TE DPA dtype handling to improve cuDNN backend dtype detection. (#2485)
[JAX] Fixed segment-position calculation from segment IDs in the SequenceDescriptor class. (#2523)
[JAX] Fixed bugs in permutation custom partitioning. (#2617)
[JAX] Fixed an issue in the encoder and MNIST examples caused by a change in the dataset path. (#2625)
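The cross-entropy fix (#2476) concerns batches in which some tokens are ignored. The notes do not describe the exact bug, so the sketch below only shows the standard semantics such a loss is expected to follow: with a mean reduction, ignored tokens contribute nothing and are also excluded from the denominator. The ignore value of -100 is an assumption borrowed from the PyTorch convention, the function name is hypothetical, and this is not Transformer Engine's implementation.

```python
import math

IGNORE_INDEX = -100  # assumption: PyTorch-style marker for ignored targets


def cross_entropy_mean(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over tokens whose target is not `ignore_index`.

    Ignored tokens add nothing to the sum and are not counted in the
    denominator. Returns NaN when every token is ignored, matching the
    behavior of a plain mean over zero elements.
    """
    total, count = 0.0, 0
    for row, t in zip(logits, targets):
        if t == ignore_index:
            continue
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[t]
        count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    logits = [[0.0, 0.0], [5.0, 1.0], [0.0, 0.0]]
    targets = [0, IGNORE_INDEX, 1]
    # Averages over 2 valid tokens, not 3, giving log(2).
    print(cross_entropy_mean(logits, targets))
```

Dividing by the total token count instead of the valid-token count is a common way this kind of loss goes wrong, since it silently shrinks the loss as more tokens are ignored.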
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.