.. include:: /content/common.rsts

.. |ge| replace:: :html:`≥`

Release Notes |ndash| Release 2.11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Key Features and Enhancements
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

- [PyTorch] Enabled the reference Current Scaling recipe for FP8 training. (`#2368 `_)
- [PyTorch] Improved Random Hadamard Transform (RHT) device tensor caching to reduce memory allocations and improve performance for NVFP4 quantization. (`#2395 `_)
- [PyTorch] Implemented selective activation checkpointing for the ``LayerNormMLP`` module. (`#2311 `_)
- [C, PyTorch, JAX] Improved performance of MXFP8 quantization. (`#2062 `_)
- [C, PyTorch] Improved performance of NVFP4 quantization. (`#2351 `_)
- [PyTorch] Improved FSDP2 all-gather performance and added support for the ``FusedAdam`` optimizer with FSDP2. (`#2370 `_)
- [PyTorch] Extended debug tools to support ``GroupedLinear`` layers. (`#1953 `_)
- [JAX] Added Triton kernel bindings for JAX, enabling custom Triton kernels in JAX workflows. (`#2437 `_)
- [C] Introduced the experimental ``NVTEGroupedTensor`` class and helper functions. (`#2388 `_)
- [C, PyTorch, JAX] Added FP8 support for primary weights in MXFP8 format with partial casting and amax calculations. (`#2055 `_)
- [JAX] Added context parallelism (CP) support for the THD format and sliding window attention (SWA) using all-gather (AG) and striped load balancing with a stripe size greater than 1. (`#2379 `_)
- [JAX] Implemented JAX primitives for token permutation operations on a single GPU for mixture-of-experts routing. (`#2473 `_)
- [PyTorch] Added THD format support for ``max_logit`` clipping and ``MuonClip`` gradient clipping operations. (`#2480 `_)

Fixed Issues
@@@@@@@@@@@@

- [PyTorch] Fixed a numerical issue that occurred when a noncontiguous tensor was passed to the ``cross_entropy`` backward pass. (`#2402 `_)
- [PyTorch] Fixed CUDA graph execution order for backward weight gradient computation when using chunked layers. (`#2376 `_)
- [C] Fixed runtime library loading logic to properly handle missing dependencies and load order. (`#2297 `_)
- [JAX] Removed the use of a scan loop as the default for ring attention to improve performance. (`#2503 `_)

Known Issues in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no known issues in this release.

Breaking Changes in This Release
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

There are no breaking changes in this release.

.. _DeprecatedFeatures:

Deprecated Features
@@@@@@@@@@@@@@@@@@@

There are no deprecated features in this release.