Transformer Engine v2.12 Release Notes

Key Features and Enhancements

  • Made miscellaneous improvements and fixes to the documentation.

  • [C] Improved performance of NVFP4 quantization kernels. (#2412)

  • [C] Documented environment variables. (#2552)

  • [PyTorch] Added fused permute+pad and unpermute+unpad operations for FP8 optimization. (#1921)

  • [PyTorch] Improved performance in CPU-limited scenarios.

  • [PyTorch] Added support for Sliding Window Attention (left, right) with fused attention. (#2477)

  • [PyTorch] Improved the performance of MXFP8 and NVFP4 by fusing swizzling into quantization. (#2486)

  • [PyTorch] Added CUDA graph support for activation recomputation. (#2518)

  • [JAX] Added a tutorial for integrating TE/JAX quantization into existing frameworks. (#2423)

  • [JAX] Added custom partitioning for permutation primitives. (#2591)
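
For readers unfamiliar with the (left, right) window convention mentioned in the Sliding Window Attention item above, the following is a minimal pure-Python sketch of the mask such a window implies. It does not call Transformer Engine; the function name sliding_window_mask is hypothetical, and the convention that a negative value means "unbounded on that side" is an assumption made for illustration, for self-attention with equal query and key lengths.

```python
def sliding_window_mask(seq_len, left, right):
    """Return mask[i][j] == True where query i may attend to key j.

    Assumed convention (illustrative only): query i sees keys in
    [i - left, i + right]; a negative bound means no limit on that side.
    """
    mask = []
    for i in range(seq_len):
        lo = 0 if left < 0 else max(0, i - left)
        hi = seq_len - 1 if right < 0 else min(seq_len - 1, i + right)
        mask.append([lo <= j <= hi for j in range(seq_len)])
    return mask


if __name__ == "__main__":
    # One token of left context and no right context.
    for row in sliding_window_mask(4, left=1, right=0):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumed convention, (left=-1, right=0) reduces to an ordinary causal mask, and (left=-1, right=-1) to full attention.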

Fixed Issues

  • [C] Fixed SM120 compilation with CUDA 12. (#2482)

  • [C] Fixed overflow in padding and unpadding kernels. (#2548)

  • [C] Fixed a numerical issue in sort_chunks_by_index. (#2566)

  • [C] Fixed a numerical issue in swizzling blockwise E8 scales. (#2589)

  • [PyTorch] Fixed an AttributeError when checkpointing a model with MXFP8 parameters. (#2427)

  • [PyTorch] Fixed cross-entropy loss calculation when some tokens are ignored. (#2476)

  • [PyTorch] Fixed Float8Tensor.contiguous autograd support. (#2533)

  • [PyTorch] Fixed multiple CPU offloading issues. (#2535)

  • [PyTorch] Fixed uninitialized permuted_scale values. (#2547)

  • [PyTorch] Fixed FP8 quantization for the second MLP in LayerNormMLP. (#2577)

  • [PyTorch] Fixed ONNX tests and added FP8 attention export support. (#2598)

  • [JAX] Removed unused TE DPA dtype handling to improve cuDNN backend dtype detection. (#2485)

  • [JAX] Fixed segment-position calculation from segment IDs in the SequenceDescriptor class. (#2523)

  • [JAX] Fixed bugs in permutation custom partitioning. (#2617)

  • [JAX] Fixed an issue in the encoder and MNIST examples caused by a moved dataset path. (#2625)
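
As background for the segment-position item above, here is a minimal pure-Python sketch of what deriving positions from segment IDs generally means for packed sequences: the position counter restarts at 0 whenever the segment ID changes. This is not the SequenceDescriptor implementation; the function name segment_positions is hypothetical, and treating a run of padding IDs like any other segment is an assumption made for illustration.

```python
def segment_positions(segment_ids):
    """Return the position of each token within its own segment.

    Illustrative assumption: a new segment starts wherever the ID differs
    from the previous token's ID, and positions count up from 0 within it.
    """
    positions = []
    prev = None
    pos = 0
    for seg in segment_ids:
        # Continue counting inside a segment; restart at a boundary.
        pos = pos + 1 if seg == prev else 0
        positions.append(pos)
        prev = seg
    return positions


if __name__ == "__main__":
    # Two packed sequences (IDs 1 and 2) followed by padding (ID 0).
    print(segment_positions([1, 1, 1, 2, 2, 0, 0]))
```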

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.