.. include:: /content/common.rsts

Release Notes |ndash| Release 2.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C] Added MXFP8 support in functions for casting, GEMMs, normalization, and activations.
- [C] Added a generic API for quantized tensors, including generic quantize and dequantize functions.
- [C] Exposed cuDNN ``LayerNorm`` and ``RMSNorm`` kernels.
- [pyTorch] Added an MXFP8 recipe.
- [pyTorch] Added MXFP8 support in the ``Linear``, ``LayerNormLinear``, ``LayerNormMLP``, and ``TransformerLayer`` modules, and in the operation-based API.
- [pyTorch] Changed the default quantization scheme from FP8 to MXFP8 for Blackwell GPUs.
- [pyTorch] Added a custom tensor class for MXFP8 data.
- [pyTorch] Reduced CPU overhead in FP8/MXFP8 execution.
- [pyTorch] Enabled efficient handling of FP8 parameters with PyTorch FSDP2.
- [pyTorch] Expanded the support matrix for Sliding Window Attention.

Fixed Issues
@@@@@@@@@@@@

- [pyTorch] Fixed bugs in capturing CUDA Graphs for MoE models.
- [pyTorch] Fixed errors with the FP8 state when loading HuggingFace checkpoints.

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [pyTorch] Overlapping tensor-parallel communication with Userbuffers is not supported with MXFP8.
- [pyTorch] When running linear modules with MXFP8, the memory footprint and tensor-parallel communication volume are larger than necessary.
- [pyTorch] Userbuffers support in the operation-based API is disabled.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [C] Updated the minimum requirements to CUDA 12.1 and cuDNN 9.3.
- [PaddlePaddle] Removed the PaddlePaddle integration.
- [pyTorch] Changed the default quantization scheme from FP8 to MXFP8 for Blackwell GPUs.
- [pyTorch] Removed support for exporting ONNX models. Support for ONNX export will be re-enabled in a future release.

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.
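As an illustration of the MXFP8 recipe listed under Key Features, the sketch below shows how it might be enabled in pyTorch. This is a minimal, hedged example, not an official sample: it assumes the ``MXFP8BlockScaling`` recipe class and the ``fp8_autocast`` context manager from Transformer Engine, and it requires a Blackwell GPU (where MXFP8 is the default scheme in this release).

.. code-block:: python

    # Hypothetical usage sketch; class and module names are assumptions
    # based on the Transformer Engine pyTorch API, and running it
    # requires a Blackwell GPU with Transformer Engine 2.0 installed.
    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import MXFP8BlockScaling

    recipe = MXFP8BlockScaling()           # MXFP8 quantization recipe
    model = te.Linear(1024, 1024).cuda()   # TE module with MXFP8 support

    inp = torch.randn(16, 1024, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        out = model(inp)                   # forward pass runs in MXFP8
    out.sum().backward()

The same pattern applies to the other modules named above (``LayerNormLinear``, ``LayerNormMLP``, ``TransformerLayer``); only the module construction changes.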