.. include:: /content/common.rsts

Release Notes |ndash| Release 2.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C] Added MXFP8 support in functions for casting, GEMMs, normalization, and activations.
- [C] Added a generic API for quantized tensors, including generic quantize and dequantize functions.
- [C] Exposed cuDNN ``LayerNorm`` and ``RMSNorm`` kernels.
- [pyTorch] Added an MXFP8 recipe.
- [pyTorch] Added MXFP8 support in the ``Linear``, ``LayerNormLinear``, ``LayerNormMLP``, and ``TransformerLayer`` modules, and in the operation-based API.
- [pyTorch] Changed the default quantization scheme from FP8 to MXFP8 for Blackwell GPUs.
- [pyTorch] Added a custom tensor class for MXFP8 data.
- [pyTorch] Reduced CPU overhead in FP8/MXFP8 execution.
- [pyTorch] Enabled efficient handling of FP8 parameters with PyTorch FSDP2.
- [pyTorch] Expanded the support matrix for Sliding Window Attention.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed bugs in capturing CUDA Graphs for MoE models.
- [pyTorch] Fixed errors with the FP8 state when loading HuggingFace checkpoints.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Overlapping tensor-parallel communication with Userbuffers is not supported with MXFP8.
- [pyTorch] When running linear modules with MXFP8, the memory footprint and tensor-parallel communication volume are larger than necessary.
- [pyTorch] Userbuffers support in the operation-based API is disabled.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C] Updated the minimum requirements to CUDA 12.1 and cuDNN 9.3.
- [PaddlePaddle] Removed the PaddlePaddle integration.
- [pyTorch] Changed the default quantization scheme from FP8 to MXFP8 for Blackwell GPUs.
- [pyTorch] Removed support for exporting ONNX models. Support for ONNX export will be re-enabled in a future release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.
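As an illustration of the MXFP8 recipe listed under Key Features, the sketch below shows how it might be enabled in pyTorch. This is a minimal, hedged example, not an official sample: it assumes the ``MXFP8BlockScaling`` recipe class and the ``fp8_autocast`` context manager from Transformer Engine, and it requires a Blackwell GPU (where MXFP8 is the default scheme in this release).

.. code-block:: python

    # Hypothetical usage sketch; class and module names are assumptions
    # based on the Transformer Engine pyTorch API, and running it
    # requires a Blackwell GPU with Transformer Engine 2.0 installed.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import MXFP8BlockScaling

    recipe = MXFP8BlockScaling()           # MXFP8 quantization recipe
    model = te.Linear(1024, 1024).cuda()   # TE module with MXFP8 support

    inp = torch.randn(16, 1024, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        out = model(inp)                   # forward pass runs in MXFP8
    out.sum().backward()

The same pattern applies to the other modules named above (``LayerNormLinear``, ``LayerNormMLP``, ``TransformerLayer``); only the module construction changes.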