cuDNN Release Notes
NVIDIA CUDA Deep Neural Network (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly tuned implementations of routines arising frequently in DNN applications. These release notes describe the key features, software enhancements and improvements, and known issues for the NVIDIA cuDNN 8.8.1 and earlier releases.
1. cuDNN Release 8.x.x
1.1. cuDNN Release 8.8.1
These are the NVIDIA cuDNN 8.8.1 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.2. cuDNN Release 8.8.0
These are the NVIDIA cuDNN 8.8.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.3. cuDNN Release 8.7.0
These are the NVIDIA cuDNN 8.7.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.4. cuDNN Release 8.6.0
These are the NVIDIA cuDNN 8.6.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.5. cuDNN Release 8.5.0
These are the NVIDIA cuDNN 8.5.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Performance Results
Model | Batch size | A100 8.5.0 vs V100 8.4.1 (FP16) | A100 8.5.0 vs V100 8.4.1 (FP32) | V100 8.5.0 vs V100 8.4.1 (FP16) | V100 8.5.0 vs V100 8.4.1 (FP32) |
---|---|---|---|---|---|
V-Net (3D-Image segmentation) | 2 | 1.1x | 2.9x | 1.0x | 1.0x |
V-Net (3D-Image segmentation) | 8 | 1.4x | 3.4x | 1.0x | 1.0x |
V-Net (3D-Image segmentation) | 16 | 1.6x | 3.8x | 1.0x | 1.1x |
V-Net (3D-Image segmentation) | 32 | 1.8x | 3.7x | 1.0x | 1.0x |
3D-UNet (3D-Image segmentation) | 2 | 2.1x | 6.0x | 1.0x | 1.2x |
3D-UNet (3D-Image segmentation) | 4 | 2.1x | 5.7x | 1.0x | 1.4x |
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
1.6. cuDNN Release 8.4.1
These are the NVIDIA cuDNN 8.4.1 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
1.7. cuDNN Release 8.4.0
These are the NVIDIA cuDNN 8.4.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
1.8. cuDNN Release 8.3.3
These are the NVIDIA cuDNN 8.3.3 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Deprecated Features
- We are deprecating the reporting of performance results in the Best Practices For Using cuDNN 3D Convolutions and will instead update these Release Notes if there is anything interesting to report release-over-release. Starting with cuDNN 8.4.0, this section will be removed. For past performance tables, refer to the NVIDIA cuDNN Archives.
- Updated and migrated the content from the Best Practices For Using cuDNN 3D Convolutions to the NVIDIA cuDNN Developer Guide. The Best Practices document has been deprecated.
1.9. cuDNN Release 8.3.2
These are the NVIDIA cuDNN 8.3.2 Release Notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Deprecated Features
- We are deprecating the reporting of performance results in the Best Practices For Using cuDNN 3D Convolutions and will instead update these Release Notes if there is anything interesting to report release-over-release. Starting with cuDNN 8.4.0, this section will be removed. For past performance tables, refer to the NVIDIA cuDNN Archives > Best Practices For Using cuDNN 3D Convolutions.
1.10. cuDNN Release 8.3.1
These are the NVIDIA cuDNN 8.3.1 Release Notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
1.11. cuDNN Release 8.3.0
These are the NVIDIA cuDNN 8.3.0 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
1.12. cuDNN Release 8.2.4
These are the cuDNN 8.2.4 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.13. cuDNN Release 8.2.2
These are the cuDNN 8.2.2 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
1.14. cuDNN Release 8.2.1
These are the cuDNN 8.2.1 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, see the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.15. cuDNN Release 8.2.0
These are the cuDNN 8.2.0 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.16. cuDNN Release 8.1.1
These are the cuDNN 8.1.1 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.17. cuDNN Release 8.1.0
These are the cuDNN 8.1.0 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.18. cuDNN Release 8.0.5
These are the cuDNN 8.0.5 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.19. cuDNN Release 8.0.4
These are the cuDNN 8.0.4 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
- GA102 support with improved convolution performance
- Now includes convolution heuristics targeting the NVIDIA GA102 GPU. (not applicable for Jetson platforms)
- RNN API v8 sample
- A new RNN sample illustrating the usage of the RNN version 8 API has been added. The sample's workflow consists of several routines that create RNN descriptors, create RNN data descriptors, set up the weight space, and run the compute routines. The sample takes several input parameters that can set up different RNN configurations and input data specifications (data type, cell mode, bias mode, and so on).
- RNN functional and performance improvements
- ARM Server Base System Architecture (SBSA)
- Added support for ARM SBSA for Linux.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.20. cuDNN Release 8.0.3
These are the cuDNN 8.0.3 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
- Documentation for the cuDNN backend API has been included in this release. Users specify the computational case, set up an execution plan for it, and execute the computation using numerous descriptors. The typical use pattern for a descriptor with attributes consists of the following sequence of API calls:
- cudnnBackendCreateDescriptor() creates a descriptor of a specified type.
- cudnnBackendSetAttribute() sets the values of a settable attribute for the descriptor. All required attributes must be set before the next step.
- cudnnBackendFinalize() finalizes the descriptor.
- cudnnBackendGetAttribute() gets the values of an attribute from a finalized descriptor.
For more information, refer to the NVIDIA cuDNN Backend API.
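The call sequence above behaves like a small state machine: attributes are written before the descriptor is finalized and read back afterwards. The following pure-Python model is only a sketch of that ordering and does not call cuDNN. The class `ToyDescriptor`, its method names, and the attribute names `"dims"` and `"dtype"` are hypothetical, and rejecting a set after finalization is an assumption of this model.

```python
class ToyDescriptor:
    """Toy model of the create -> set -> finalize -> get ordering (not cuDNN itself)."""

    def __init__(self, required):
        # "Create": no attributes are set yet and the descriptor is not finalized.
        self._required = set(required)
        self._attrs = {}
        self._finalized = False

    def set_attribute(self, name, value):
        # "Set": assumed here to be allowed only before finalization.
        if self._finalized:
            raise RuntimeError("descriptor is already finalized")
        self._attrs[name] = value

    def finalize(self):
        # "Finalize": every required attribute must have been set beforehand.
        missing = self._required.difference(self._attrs)
        if missing:
            raise ValueError(f"missing required attributes: {sorted(missing)}")
        self._finalized = True

    def get_attribute(self, name):
        # "Get": values are read back from a finalized descriptor.
        if not self._finalized:
            raise RuntimeError("descriptor is not finalized yet")
        return self._attrs[name]


desc = ToyDescriptor(required=["dims", "dtype"])
desc.set_attribute("dims", (1, 3, 224, 224))
desc.set_attribute("dtype", "float32")
desc.finalize()
print(desc.get_attribute("dims"))
```

In real code each step would be the corresponding `cudnnBackend*` call with its status code checked; the model only makes the required ordering explicit.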
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.21. cuDNN Release 8.0.2
These are the cuDNN 8.0.2 release notes. This is the first GA release of cuDNN 8.x. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
- The key features mentioned in cuDNN 8.0.1 Preview and 8.0.0 Preview are now GA quality in this release.
- cudnnRNNBackwardData_v8() and cudnnRNNBackwardWeights_v8() are now documented in the cudnn_adv_train.so Library. For a list of functions and data types that were added in this release, see API changes for cuDNN 8.0.2.
- TF32 performance for 3D convolutions and deconvolutions is significantly better, up to 3.9x, compared to cuDNN 8.0.1.
- TF32 performance for grouped convolutions on A100 improved by up to 1.5x compared to cuDNN 8.0.1 on ResNext convolution layers, and by up to 3x compared to V100 with cuDNN v7.6. (not applicable for Jetson platforms)
The above performance improvements were measured using only cuDNN operations. The observed performance improvements will depend on a number of factors, such as non-cuDNN operations, kernel run time, and model architecture type.
- This release includes performance improvements on all architectures for 2D and 3D grouped convolutions compared with version 7.6. Additionally, we improved kernel selection heuristics on several known deep learning GitHub examples (also known as model scripts).
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.22. cuDNN Release 8.0.1 Preview
These release notes are applicable to NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.23. cuDNN Release 8.0.0 Preview
For previous cuDNN documentation, see the cuDNN Archived Documentation.
Key Features and Enhancements
- cuDNN library
- Multiple dynamic libraries
- In order to link against a subset of cuDNN, you must know which subset of the API you are using and then link against the appropriate cuDNN subcomponents. The cuDNN subcomponents are as follows:
- cudnn_ops_infer.so
- cudnn_ops_train.so
- cudnn_cnn_infer.so
- cudnn_cnn_train.so
- cudnn_adv_infer.so
- cudnn_adv_train.so
- cuDNN linking options
- There are two different linking options:
- cuDNN loading options
- For users who want a smaller memory footprint, there are two ways of loading the library.
- New API functions
- For a list of functions and data types that were added in this release, refer to the API changes for cuDNN 8.0.0.
- General Support of CUDA Graph Capture
- cuDNN 8.0.0 does not at this time offer API support to add operations to an existing CUDA graph directly; however, the captured graph may be added to an existing graph through the existing CUDA Graphs API.
Regarding texture usage, cuDNN 8.0.0 by default will not enable texture usage; expert users may enable texture usage where allowed, but that usage will prevent a successful CUDA Graph capture until disabled. In order for cuDNN 8.0.0 to be graph-capture compatible library-wide, the cuDNN 8.0.0 CTC API was updated as described elsewhere.
The usual restrictions for CUDA Graphs apply in addition to these restrictions here.
- New APIs for convolution
- A new set of API functions provides a brand new approach to cuDNN that offers more fine-grained control of performance, numerical properties, and so on for convolution. Using this API, users directly access various engines that compute convolution forward propagation, backward data, backward filter, and generic support for fusion, with limited support in this cuDNN 8.0.0 release and expanded support in follow-up releases. Each engine has performance-tuning knobs such as GEMM tiling and split-K. Users can use this API to fine-tune their network by querying cuDNN's heuristics, or by applying their own, to find the best engine configuration with which cuDNN computes each network layer.
- NVIDIA Ampere architecture GPU support (not applicable for Jetson platforms)
- NVIDIA Turing and NVIDIA Volta architecture improvements
- Operation fusion
- Operation fusion can be achieved using the backend API. The general workflow is similar to running unfused operations, except that instead of creating a single operation Operation Graph, the user may specify a multi-operation Operation Graph. For more information, refer to Operation Fusion using the Backend API.
- Depthwise convolution extension
- We have extended the fprop and dgrad NHWC depthwise kernels to support more combinations (filter sizes/strides) such as 5x5/1x1, 5x5/2x2, 7x7/1x1, 7x7/2x2 (in addition to what we already have, 1x1/1x1, 3x3/1x1, 3x3/2x2), which provides good performance.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
2. cuDNN Release 7.x.x
2.1. cuDNN Release 7.6.5
These are the cuDNN 7.6.5 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
The following features and enhancements have been added to this release:
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
- RNN and multihead attention API calls may exhibit non-deterministic behavior when the cuDNN 7.6.5 library is built with CUDA Toolkit 10.2 or higher. This is the result of new buffer management and heuristics in the cuBLAS library. As described in the Results Reproducibility section in the cuBLAS Library User's Guide, numerical results may not be deterministic when cuBLAS APIs are launched in more than one CUDA stream using the same cuBLAS handle. This is caused by the two buffer sizes (16 KB and 4 MB) used in the default configuration.
When a larger buffer size is not available at runtime, instead of waiting for a buffer of that size to be released, a smaller buffer may be used with a different GPU kernel. The kernel selection may affect numerical results. The user can eliminate the non-deterministic behavior of the cuDNN RNN and multihead attention APIs by setting a single buffer size in the CUBLAS_WORKSPACE_CONFIG environment variable, for example, :16:8 or :4096:2.
The first configuration instructs cuBLAS to allocate eight buffers of 16 KB each in GPU memory, while the second creates two buffers of 4 MB each. The default buffer configuration in cuBLAS 10.2 and 11.0 is :16:8:4096:2, that is, two buffer sizes. Earlier cuBLAS libraries, such as cuBLAS 10.0, used the non-adjustable :16:8 configuration. When buffers of only one size are available, the behavior of cuBLAS calls is deterministic in multi-stream setups.
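As a minimal sketch of the workaround described above, the following Python snippet parses a CUBLAS_WORKSPACE_CONFIG value into (size in KB, buffer count) pairs, checks that only one buffer size is configured, and then sets the variable for the current process. The helper names `parse_workspace_config` and `is_single_size` are hypothetical, and the snippet assumes the variable has to be set before anything in the process initializes cuBLAS or cuDNN.

```python
import os


def parse_workspace_config(value):
    """Parse ':SIZE_KB:COUNT[:SIZE_KB:COUNT...]' into a list of (size_kb, count) pairs."""
    fields = value.strip().lstrip(":").split(":")
    if len(fields) % 2 != 0 or not all(f.isdigit() for f in fields):
        raise ValueError(f"unexpected CUBLAS_WORKSPACE_CONFIG value: {value!r}")
    return [(int(fields[i]), int(fields[i + 1])) for i in range(0, len(fields), 2)]


def is_single_size(value):
    """True when the configuration names exactly one buffer size."""
    return len({size_kb for size_kb, _ in parse_workspace_config(value)}) == 1


# ':4096:2' requests two buffers of 4096 KB (4 MB) each, a single buffer size.
config = ":4096:2"
if is_single_size(config):
    # Assumption: set this before any library in the process initializes cuBLAS.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = config

print(parse_workspace_config(":16:8:4096:2"))
```

The same effect can be had by exporting the variable in the shell that launches the application; the check simply guards against accidentally reintroducing a two-size configuration such as the default.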
Known Issues
- Updated: August 24, 2020
Two-dimensional forward convolutions using algo1 may segfault when the filter size is large. For example, we have observed this issue when the filter width and height are greater than or equal to 363.
- Updated: September 28, 2020
cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() calls with algo0 or algo1 can result in an illegal memory access for PSEUDO_HALF_CONFIG data configuration when the number of elements in the output tensor is odd. This can be mitigated by allocating one extra element in the output buffer.
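The mitigation above comes down to rounding an odd element count up by one before allocating the output buffer. The following is a small sketch of that sizing arithmetic only; the function name `output_buffer_elements` and the example shapes are hypothetical, and the 2-byte element size assumes half-precision output tensors.

```python
from math import prod

BYTES_PER_HALF = 2  # assumption: output elements are stored as 16-bit floats


def output_buffer_elements(dims):
    """Number of elements to allocate: the tensor size, plus one if that size is odd."""
    count = prod(dims)
    return count + (count % 2)


# A 1x3x5x5 output has 75 elements (odd), so one extra element is allocated.
odd_shape = (1, 3, 5, 5)
# A 2x3x4x4 output has 96 elements (even), so no padding is needed.
even_shape = (2, 3, 4, 4)

for shape in (odd_shape, even_shape):
    elements = output_buffer_elements(shape)
    print(shape, elements, elements * BYTES_PER_HALF)
```

Only the allocation grows; the tensor descriptor passed to cuDNN would keep its original dimensions.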
2.2. cuDNN Release 7.6.4
These are the cuDNN 7.6.4 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
The following features and enhancements have been added to this release:
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
2.3. cuDNN Release 7.6.3
These are the cuDNN 7.6.3 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
2.4. cuDNN Release 7.6.2
These are the cuDNN 7.6.2 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.