cuDNN Release Notes
NVIDIA CUDA Deep Neural Network (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly tuned implementations of routines arising frequently in DNN applications. These release notes describe the key features, software enhancements and improvements, and known issues for the NVIDIA cuDNN 8.9.4 and earlier releases.
1. cuDNN Release 8.x.x
1.1. cuDNN Release 8.9.4
These are the NVIDIA cuDNN 8.9.4 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.2. cuDNN Release 8.9.3
These are the NVIDIA cuDNN 8.9.3 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.3. cuDNN Release 8.9.2
These are the NVIDIA cuDNN 8.9.2 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.4. cuDNN Release 8.9.1
These are the NVIDIA cuDNN 8.9.1 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.5. cuDNN Release 8.9.0
These are the NVIDIA cuDNN 8.9.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.6. cuDNN Release 8.8.1
These are the NVIDIA cuDNN 8.8.1 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.7. cuDNN Release 8.8.0
These are the NVIDIA cuDNN 8.8.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.8. cuDNN Release 8.7.0
These are the NVIDIA cuDNN 8.7.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.9. cuDNN Release 8.6.0
These are the NVIDIA cuDNN 8.6.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
1.10. cuDNN Release 8.5.0
These are the NVIDIA cuDNN 8.5.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Performance Results
| Model | Batchsize | A100 8.5.0 vs V100 8.4.1 | V100 8.5.0 vs V100 8.4.1 | ||
|---|---|---|---|---|---|
| FP16 | FP32 | FP16 | FP32 | ||
| V-Net (3D-Image segmentation) | 2 | 1.1x | 2.9x | 1.0x | 1.0x | 
| 8 | 1.4x | 3.4x | 1.0x | 1.0x | |
| 16 | 1.6x | 3.8x | 1.0x | 1.1x | |
| 32 | 1.8x | 3.7x | 1.0x | 1.0x | |
| 3D-UNet (3D-Image Segmentation) | 2 | 2.1x | 6.0x | 1.0x | 1.2x | 
| 4 | 2.1x | 5.7x | 1.0x | 1.4x | |
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
1.11. cuDNN Release 8.4.1
These are the NVIDIA cuDNN 8.4.1 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
1.12. cuDNN Release 8.4.0
These are the NVIDIA cuDNN 8.4.0 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
1.13. cuDNN Release 8.3.3
These are the NVIDIA cuDNN 8.3.3 Release Notes. These Release Notes include fixes from the previous cuDNN releases as well as the following additional changes.
These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Deprecated Features
- We are deprecating the reporting of performance results in the Best Practices For Using cuDNN 3D Convolutions and will instead update these Release Notes if there is anything interesting to report release-over-release. Starting with cuDNN 8.4.0, this section will be removed. For past performance tables, refer to the NVIDIA cuDNN Archives.
- Updated and migrated the content from the Best Practices For Using cuDNN 3D Convolutions to the NVIDIA cuDNN Developer Guide. The Best Practices document has been deprecated.
1.14. cuDNN Release 8.3.2
This is the NVIDIA cuDNN 8.3.2 Release Notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Deprecated Features
- We are deprecating the reporting of performance results in the Best Practices For Using cuDNN 3D Convolutions and will instead update these Release Notes if there is anything interesting to report release-over-release. Starting with cuDNN 8.4.0, this section will be removed. For past performance tables, refer to the NVIDIA cuDNN Archives > Best Practices For Using cuDNN 3D Convolutions.
1.15. cuDNN Release 8.3.1
This is the NVIDIA cuDNN 8.3.1 Release Notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These Release Notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
1.16. cuDNN Release 8.3.0
This is the NVIDIA cuDNN 8.3.0 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack™ users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previously released cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
1.17. cuDNN Release 8.2.4
This is the cuDNN 8.2.4 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.18. cuDNN Release 8.2.2
This is the cuDNN 8.2.2 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
1.19. cuDNN Release 8.2.1
This is the cuDNN 8.2.1 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, see the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.20. cuDNN Release 8.2.0
This is the cuDNN 8.2.0 release notes. This release includes fixes from the previous cuDNN v8.1.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.21. cuDNN Release 8.1.1
This is the cuDNN 8.1.1 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.22. cuDNN Release 8.1.0
This is the cuDNN 8.1.0 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.23. cuDNN Release 8.0.5
This is the cuDNN 8.0.5 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
1.24. cuDNN Release 8.0.4
This is the cuDNN 8.0.4 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
- GA102 support with improved convolution performance
- Now includes convolution heuristics targeting the NVIDIA GA102 GPU. (not applicable for Jetson platforms)
- RNN API v8 sample
- The new RNN sample illustrating the usage of the new RNN version 8 API has been added. The sample's workflow consists of the several routines to create RNN descriptors, create RNN data descriptors, set up weight space, and compute routines. The sample takes several input parameters that can set up different RNN configurations and input data specifications (data type, cell mode, bias mode, and so on).
- RNN functional and performance improvements
- ARM Server Base System Architecture (SBSA)
- Added support for ARM SBSA for Linux.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.25. cuDNN Release 8.0.3
This is the cuDNN 8.0.3 release notes. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
- 
                                    Documentation for the cuDNN backend API has been included in this release. Users specify the computational case, set up an execution plan for it, and execute the computation using numerous descriptors. The typical use pattern for a descriptor with attributes consists of the following sequence of API calls:- cudnnBackendCreateDescriptor() creates a descriptor of a specified type.
- cudnnBackendSetAttribute() sets the values of a settable attribute for the descriptor. All required attributes must be set before the next step.
- cudnnBackendFinalize() finalizes the descriptor.
- cudnnBackendGetAttribute() gets the values of an attribute from a finalized descriptor.
 For more information, refer to the NVIDIA cuDNN Backend API. 
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.26. cuDNN Release 8.0.2
This is the cuDNN 8.0.2 release notes and first GA release of cuDNN 8.x. This release includes fixes from the previous cuDNN v8.0.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
- The key features mentioned in cuDNN 8.0.1 Preview and 8.0.0 Preview are now GA quality in this release.
- cudnnRNNBackwardData_v8() and cudnnRNNBackwardWeights_v8() are now documented in the cudnn_adv_train.so Library. For a list of functions and data types that were added in this release, see API changes for cuDNN 8.0.2.
- 
                                    - TF32 for 3D convolutions and deconvolution performance is significantly better, up to 3.9x, compared to cuDNN 8.0.1.
- TF32 for grouped convolutions on A100 were improved up to 1.5x performance compared to cuDNN 8.0.1 on ResNext convolution layers and up to 3x the performance compared to V100 with cuDNN v7.6. (not applicable for Jetson platforms)
 The above performance improvements were measured using only cuDNN operations. The observed performance improvements will depend on a number of factors, such as non-cuDNN operations, kernel run time, and model architecture type. 
- This release includes performance improvements on all architectures for 2D and 3D grouped convolutions compared with version 7.6. Additionally, we improved kernel selection heuristics on several known deep learning GitHub examples (also known as model scripts).
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.27. cuDNN Release 8.0.1 Preview
These release notes are applicable to NVIDIA JetPack users of cuDNN unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN documentation, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
Limitations
Known Issues
1.28. cuDNN Release 8.0.0 Preview
For previous cuDNN documentation, see the cuDNN Archived Documentation.
Key Features and Enhancements
- cuDNN library
- Multiple dynamic libraries
- In order to link against a subset of cuDNN, you must know which subset
                                    							of the API you are using and then link against the appropriate cuDNN sub
                                    							components. The cuDNN sub components are as follows:
                                       - cudnn_ops_infer.so
- cudnn_ops_train.so
- cudnn_cnn_infer.so
- cudnn_cnn_train.so
- cudnn_adv_infer.so
- cudnn_adv_train.so
 
- cuDNN linking options
- There are two different linking options:
                                       
- cuDNN loading options
- For users who want a smaller memory footprint, there are two ways of
                                    							loading the library.
                                       
- New API functions
- For a list of functions and data types that were added in this release, refer to the API changes for cuDNN 8.0.0.
- General Support of CUDA Graph Capture
- 
                                    
                                    cuDNN 8.0.0 does not at this time offer API support to add operations to an existing CUDA graph directly; however, the captured graph may be added to an existing graph through the existing CUDA Graphs API. Regarding texture usage, cuDNN 8.0.0 by default will not enable texture usage; expert users may enable texture usage where allowed, but that usage will prevent a successful CUDA Graph capture until disabled. In order for cuDNN 8.0.0 to be graph-capture compatible library-wide, the cuDNN 8.0.0 CTC API was updated as described elsewhere. The usual restrictions for CUDA Graphs apply in addition to these restrictions here. 
- New APIs for convolution
- A new set of API functions to provide a brand new approach to cuDNN that offers more fine-grain control of performance, numerical properties, and so on for convolution. Using this API, users directly access various engines that compute convolution forward propagation, backward data, backward filter, and generic support for fusion starting with a limited support in this cuDNN 8.0.0 release and expanding support in follow-up releases. Each engine has performance-tuning knobs such as GEMM tiling and split-K. Users can use this API to fine-tune their network by querying cuDNN’s heuristics, or doing their own, to find the most optimal engine configuration with which cuDNN computes each network layer.
- NVIDIA Ampere architecture GPU support (not applicable for Jetson platforms)
- 
                                       
- NVIDIA Turing and NVIDIA Volta architecture improvements
- 
                                       
- Operation fusion
- Operation fusion can be achieved using the backend API. The general workflow is similar to running unfused operations, except that instead of creating a single operation Operation Graph, the user may specify a multi-operation Operation Graph. For more information, refer to Operation Fusion using the Backend API.
- Depthwise convolution extension
- We have extended the fprop and dgrad NHWC depthwise kernels to support more combinations (filter sizes/strides) such as 5x5/1x1, 5x5/2x2, 7x7/1x1, 7x7/2x2 (in addition to what we already have, 1x1/1x1, 3x3/1x1, 3x3/2x2), which provides good performance.
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, refer to the NVIDIA cuDNN Support Matrix.
2. cuDNN Release 7.x.x
2.1. cuDNN Release 7.6.5
This is the cuDNN 7.6.5 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
The following features and enhancements have been added to this release:
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
Limitations
- 
                                    RNN and multihead attention API calls may exhibit non-deterministic behavior when the cuDNN 7.6.5 library is built with CUDA Toolkit 10.2 or higher. This is the result of a new buffer management and heuristics in the cuBLAS library. As described in the Results Reproducibility section in the cuBLAS Library User's Guide, numerical results may not be deterministic when cuBLAS APIs are launched in more than one CUDA stream using the same cuBLAS handle. This is caused by two buffer sizes (16 KB and 4 MB) used in the default configuration. When a larger buffer size is not available at runtime, instead of waiting for a buffer of that size to be released, a smaller buffer may be used with a different GPU kernel. The kernel selection may affect numerical results. The user can eliminate the non-deterministic behavior of cuDNN RNN and multihead attention APIs, by setting a single buffer size in the CUBLAS_WORKSPACE_CONFIG environmental variable, for example, :16:8 or :4096:2. The first configuration instructs cuBLAS to allocate eight buffers of 16 KB each in GPU memory while the second setting creates two buffers of 4 MB each. The default buffer configuration in cuBLAS 10.2 and 11.0 is :16:8:4096:2, that is, we have two buffer sizes. In earlier cuBLAS libraries, such as cuBLAS 10.0, it used the :16:8 non-adjustable configuration. When buffers of only one size are available, the behavior of cuBLAS calls is deterministic in multi-stream setups. 
Known Issues
- Updated: August 24, 2020Two-dimensional forward convolutions using algo1 may segfault when the filter size is large. For example, we have observed this issue when the filter width and height are more than or equal to 363. 
- Updated: September 28, 2020cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() calls with algo0 or algo1 can result in an illegal memory access for PSEUDO_HALF_CONFIG data configuration when the number of elements in the output tensor is odd. This can be mitigated by allocating one extra element in the output buffer. 
2.2. cuDNN Release 7.6.4
This is the cuDNN 7.6.4 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
Key Features and Enhancements
The following features and enhancements have been added to this release:
Compatibility
For the latest compatibility software versions of the OS, CUDA, the CUDA driver, and the NVIDIA hardware, see the NVIDIA cuDNN Support Matrix.
2.3. cuDNN Release 7.6.3
This is the cuDNN 7.6.3 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes. These release notes are applicable to both cuDNN and NVIDIA JetPack users unless appended specifically with (not applicable for Jetson platforms).
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
2.4. cuDNN Release 7.6.2
This is the cuDNN 7.6.2 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
For previous cuDNN release notes, refer to the NVIDIA cuDNN Archives.
2.5. cuDNN Release 7.6.1
2.6. cuDNN Release 7.6.0
2.7. cuDNN Release 7.5.1
2.8. cuDNN Release 7.5.0
2.9. cuDNN Release 7.4.2
This is the cuDNN 7.4.2 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Fixed Issues
The following issues have been fixed in this release:
- In some cases when the data is in CUDNN_DATA_HALF and NHWC, illegal memory access may occur for cudnnBatchNormalization* functions in the cuDNN 7.4.1 library. This is now fixed.
- When the data is in CUDNN_DATA_HALF and NHWC, for cudnnBatchNormalization* functions when (N*H*W) is large and odd number, the output may contain wrong results. This is fixed.
- When calling the cudnnConvolutionBiasActivationForward() function with the algo parameter set to CUDNN_CONVOLUTION_FWD_ALGO_FFT and the activationDesc parameter set to CUDNN_ACTIVATION_RELU and sufficiently large inputs, the ReLU operation is not applied and negative values are passed through to the output. This issue is now fixed. This issue was present in all previous cuDNN versions.
- Performance regression was introduced in cuDNN 7.4.1 for cudnnConvolutionBwdFilterAlgo_t() function with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 algorithm. This is fixed.
Known Issues
The following issues and limitations exist in this release:
- When cudnnBatchNormMode_t is set to CUDNN_BATCHNORM_SPATIAL_PERSISTENT and the I/O tensors are in NHWC format and of CUDNN_DATA_HALF datatype, then, on Windows only, the cudnnBatchNormalization*Ex functions are supported only with the device in TCC mode. See Tesla Compute Cluster Mode for Windows. This issue is not present on Linux systems. This issue is present in cuDNN 7.4.1 and this current version.
- 
                                 In some cases, the 3D convolution will have a reduced performance on NVIDIA Turing GPUs, compared to the previous cuDNN releases. 
- 
                                 The functions cudnnGetConvolutionForwardAlgorithm_v7() and cudnnGetConvolutionForwardWorkspaceSize() will return CUDNN_STATUS_SUCCESS, but the execution of the convolution returns CUDNN_STATUS_NOT_SUPPORTED. This issue is present in cuDNN 7.2.2 library and later versions. 
2.10. cuDNN Release 7.4.1
This is the cuDNN 7.4.1 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
- 
                                 Added a new family of fast NHWC batch normalization functions. Refer to the following five new functions and one new type descriptor: - 
                                       cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize() function 
- 
                                       cudnnBatchNormalizationForwardTrainingEx function 
- 
                                       cudnnGetBatchNormalizationBackwardExWorkspaceSize() function 
- 
                                       cudnnBatchNormalizationBackwardEx() function 
- 
                                       cudnnGetBatchNormalizationTrainingExReserveSpaceSize() function 
- 
                                       cudnnBatchNormOps_t type descriptor 
 
- 
                                       
- For API Logging, a conversion specifier for the process id is added. With this, the process id can be included in the log file name. Refer to API Logging for more information.
- Performance of cudnnPoolingBackward() is enhanced for the average pooling when using NHWC data format-for both the CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING and CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING cases of cudnnPoolingMode_t.
- Performance of the strided convolution in cudnnConvolutionBackwardData() is enhanced when the filter is in NHWC format and the data type is TRUE_HALF_CONFIG, PSEUDO_HALF_CONFIG, or FLOAT_CONFIG. For strides u,v < r,s the performance is further enhanced.
- Significantly improved the performance of cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() functions on RCNN models such as Fast RCNN, Faster RCNN, and Mask RCNN.
Fixed Issues
The following issues have been fixed in this release:
- The following set-up was giving Misaligned Address error in cuDNN 7.3.x. This is fixed in cuDNN 7.4.1: For the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algorithm, in the data type configuration of PSEUDO_HALF_CONFIG, when the input and output tensors are in NHWC and the filter is 1x1 and NCHW, and Tensor Op is enabled.
- For a few convolution sizes for ALGO_0 and ALGO_1, the performance of the function cudnnConvolutionBackwardFilter() was degraded in cuDNN 7.3.1. This is now fixed.
- Fixed. In cuDNN 7.3.1, the function cudnnAddTensor was computing incorrect results when run on GPUs with the compute capability < 6.0 (before NVIDIA Pascal).
Known Issues
The following issues and limitations exist in this release:
- When calling the cudnnConvolutionBiasActivationForward() function with the algo parameter set to CUDNN_CONVOLUTION_FWD_ALGO_FFT and the activationDesc parameter set to CUDNN_ACTIVATION_RELU and sufficiently large inputs, the ReLU operation is not applied and negative values are passed through to the output. This issue is present in all previous cuDNN versions.
2.11. cuDNN Release 7.3.1
This is the cuDNN 7.3.1 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
- The FFT tiling algorithms for convolution have been enhanced to support strided convolution. In specific, for the algorithms CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING and CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING, the convDesc's vertical and horizontal filter stride can be 2 when neither the filter width nor the filter height is 1.
- The CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD algorithm for cudnnConvolutionForward() and cudnnConvolutionBackwardData() now give superior performance for NVIDIA Volta architecture. In addition, the mobile version of this algorithm in the same functions gives superior performance for Maxwell and NVIDIA Pascal architectures.
- Dilated convolutions now give superior performance for cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() on NVIDIA Volta architecture, in some cases.
Known Issues and Limitations
The following issues and limitations exist in this release:
- For the cudnnConvolutionForward(), when using a 1x1 filter with input and output tensors of NHWC format and of CUDNN_DATA_HALF (half precision) type, and the filter format is NCHW, with compute type of float, cuDNN will generate incorrect results.
- On Quadro P4000, when calling cudnnConvolutionForward() function with CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED algorithm, there may be a small chance of seeing intermittent inaccurate results.
- When using cudnnConvolutionBackwardFilter() with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 in mixed precision computation, with I/O in CUDNN_DATA_HALF (half precision) and compute type of float, when the number of batches (N) is larger than 1 the results might include INF due to an intermediate down convert to half float. In other words, with an accumulation of float for all intermediate values (such as in CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1) the result will be a finite half precision float. This limitation also exists in all previous cuDNN versions.
Fixed Issues
The following issues have been fixed in this release:
- Fixed a pointer arithmetic integer overflow issue in RNN forward and backward functions, when sequence length and mini-batch size are sufficiently large.
- When tensor cores are enabled in cuDNN 7.3.0, the cudnnConvolutionBackwardFilter() calculations were performing an illegal memory access when K and C values are both non-integral multiples of 8. This issue is fixed.
- For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 algorithm in cudnnConvolutionBackwardFilter(), on NVIDIA Volta, the tensor operations were occasionally failing when the filter-spatial size (filter h * filter w) was greater than 64. This issue is fixed.
- While running cuDNN 7.3.0 on NVIDIA Turing with CUDA 10.0, r400 driver, the functions cudnnRNNForwardTraining(Ex) and cudnnRNNForwardInference(Ex) errored out returning CUDNN_STATUS_NOT_SUPPORTED. This issue is fixed.
- In cuDNN 7.3.0, when using CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 with tensor data or filter data in NHWC format, the function might have resulted in a silent failure. This is now fixed.
2.12. cuDNN Release 7.3.0
This is the cuDNN 7.3.0release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
- Support is added to the following for the dilated convolution, for
                                    NCHW and NHWC filter formats:
                                       - cudnnConvolutionForward() for 2D
- CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
- cudnnConvolutionBackwardData() for 2D
- CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
- cudnnConvolutionBackwardFilter() for 2D
- CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
 For these supported cases, the dilated convolution is expected to offer superior speed, compared to the existing dilated convolution with algo 0. 
- Grouped convolutions for depth-wise separable convolutions are optimized for the following NHWC formats: HHH (input: Half, compute: Half, output: Half), HSH, and SSS.
- While using CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION, with the tensor cores, the c, and k dimensions of the tensors are now padded to multiples of 8 (as needed), to allow a tensor core kernel to run.
- The CUDNN_BATCHNORM_SPATIAL_PERSISTENT algo is enhanced in cudnnBatchNormalizationForwardTraining() and cudnnBatchNormalizationBackward() to propagate NaN-s or Inf-s as in a pure floating point implementation (the "persistent" flavor of the batch normalization is optimized for speed and it uses integer atomics for inter thread-block reductions). In earlier versions of cuDNN, we recommended invoking cudnnQueryRuntimeError() to ensure that no overflow was encountered. When it happened, the best practice was to discard the results, and use CUDNN_BATCHNORM_SPATIAL instead, as some results generated by CUDNN_BATCHNORM_SPATIAL_PERSISTENT could be finite but invalid. This behavior is now corrected: NaN-s and Inf-s are consistently output when intermediate results are out of range. The refined implementation simulates math operations on special floating point values, for example, +Inf-Inf=NaN.
Known Issues and Limitations
Following issues and limitations exist in this release:
- When tensor cores are enabled in cuDNN 7.3.0, the wgrad calculations will perform an illegal memory access when K and C values are both non-integral multiples of 8. This will not likely produce incorrect results, but may corrupt other memory depending on the user buffer locations. This issue is present on NVIDIA Volta and NVIDIA Turing architectures.
- Using cudnnGetConvolution*_v7 routines with cudnnConvolutionDescriptor_t set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION leads to incorrect outputs. These incorrect outputs will consist only of CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION cases, instead of also returning the performance results for both DEFAULT_MATH and CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION cases.
Fixed Issues
The following issues have been fixed in this release:
- Using cudnnConvolutionBackwardData() with CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD algorithm produced incorrect results due to an incorrect filter transform. This issue was present in cuDNN 7.2.1.
- For INT8 type, with xDesc and yDesc of NHWC format, the cudnnGetConvolutionForwardAlgorithm_v7 function was incorrectly returning CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM as a valid algorithm. This is fixed.
- cudnnConvolutionForward() using CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD intermittently produced incorrect results in cuDNN 7.2, due to a race condition. This issue is fixed.
- When running cudnnConvolutionBackwardFilter() with NHWC filter format, when n, c, and k are all multiple of 8, and when the workSpace input is exactly as indicated by cudnnGetConvolutionBackwardFilterWorkspaceSize(), leads to error in cuDNN 7.2. This is fixed.
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 I/O on sm_70 or sm_72, with RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and cudnnMathType_t type set to CUDNN_TENSOR_OP_MATH using cudnnSetRNNMatrixMathType, then the results were incorrect. This is fixed.
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 I/O on sm_70 or sm_72, with RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and cudnnMathType_t type set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION using cudnnSetRNNMatrixMathType, then the resulting performance was suboptimal. This is fixed.
- Convolution routines with filter format as NHWC require both input and output formats to be NHWC. However, in cuDNN 7.2 and earlier, this condition was not being checked for, as a result of which silent failures may have occurred. This is fixed in 7.3.0 to correctly return CUDNN_STATUS_NOT_SUPPORTED.
2.13. cuDNN Release 7.2.1
This is the cuDNN 7.2.1 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
- The following new functions are added to provide support for the padding
                                    mask for the cudnnRNN* family of functions: 
                                       - cudnnSetRNNPaddingMode(): Enables/disables the padded RNN I/O.
- cudnnGetRNNPaddingMode(): Reads the padding mode status.
- cudnnCreateRNNDataDescriptor() and cudnnDestroyRNNDataDescriptor(): Creates and destroys, respectively, cudnnRNNDataDescriptor_t, an RNN data descriptor.
- cudnnSetRNNDataDescriptor() and cudnnGetRNNDataDescriptor(): Initializes and reads, respectively, the RNN data descriptor.
- cudnnRNNForwardTrainingEx(): An extended version of the cudnnRNNForwardTraining() to allow for the padded (unpacked) layout for the I/O.
- cudnnRNNForwardInferenceEx(): An extended version of the cudnnRNNForwardInference() to allow for the padded (unpacked) layout for the I/O.
- cudnnRNNBackwardDataEx(): An extended version of the cudnnRNNBackwardData() to allow for the padded (unpacked) layout for the I/O.
- cudnnRNNBackwardWeightsEx(): An extended version of the cudnnRNNBackwardWeights() to allow for the padded (unpacked) layout for the I/O.
 
- 
                                    Added support for cell clipping in cuDNN LSTM. The following new functions are added: - cudnnRNNSetClip() and cudnnRNNGetClip(): Sets and retrieves, respectively, the LSTM cell clipping mode.
 
- Accelerate your convolution computation with this new feature: When the
                                    input channel size c is a multiple of 32, you can use the
                                    new data type CUDNN_DATA_INT8x32 to accelerate your
                                    convolution computation. 
                                    Note: This new data type CUDNN_DATA_INT8x32 is only supported by sm_72.
- Enhanced the family of cudnnFindRNN* functions. The findIntensity input to these functions now enables the user to control the overall runtime of the RNN find algorithms, by selecting a percentage of a large Cartesian product space to be searched.
- A new mode CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION is added to cudnnMathType_t. The computation time for FP32 tensors can be reduced by selecting this mode.
- The functions cudnnRNNForwardInference(), cudnnRNNForwardTraining(), cudnnRNNBackwardData(), and cudnnRNNBackwardWeights() will now perform down conversion of FP32 I/O only when CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION is set.
- Improved the heuristics for cudnnGet*Algorithm() functions.
Known Issues and Limitations
Following issues and limitations exist in this release:
- For FP16 inputs, the functions cudnnGetConvolutionForwardAlgorithm(), cudnnGetConvolutionBackwardDataAlgorithm(), and cudnnGetConvolutionBackwardFilterAlgorithm() will obtain a slower algorithm.
- For cases where beta is not equal to zero, and when the input
                                 channel size is greater than 65535, then the below
                                 cudnnConvolutionBackwardFilter() algorithms may return
                                 EXECUTION_FAILED error: 
                                    - CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
- CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
- CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3
 
- This is a rare occurrence: When beta is not equal to zero, the function cudnnFindConvolutionBackwardFilterAlgorithm() may not return the fastest algorithm available for cudnnConvolutionBackwardFilter().
- Grouped convolutions are not supported in the TRUE_HALF_CONFIG (convDesc is CUDNN_DATA_HALF) data type configuration. As a workaround, the PSEUDO_HALF_CONFIG (convDesc is CUDNN_DATA_FLOAT) data type configuration can be used without losing any precision.
- For the cudnnConvolutionBiasActivationForward() function, if the input cudnnActivationMode_t is set to enum value CUDNN_ACTIVATION_IDENTITY, then the input cudnnConvolutionFwdAlgo_t must be set to the enum value CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 I/O, on sm_70 or sm_72, with RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and math type set to CUDNN_TENSOR_OP_MATH using cudnnSetRNNMatrixMathType(), then the results are incorrect.
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 I/O, on sm_70 or sm_72, with RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and math type set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION using cudnnSetRNNMatrixMathType(), then the resulting performance is suboptimal.
Fixed Issues
The following issues have been fixed in this release:
- The cudnnConvolutionBackwardData() function produced incorrect
                                 					result under these conditions: 
                                    - The algo input is set to CUDNN_CONVOLUTION_BWD_DATA_ALGO_1 in cudnnConvolutionBwdDataAlgo_t, and
- CUDNN_TENSOR_OP_MATH is selected. 
                                       Under these conditions, the dgrad computation was giving incorrect results when the data is not packed and the data format is NCHW. This is fixed. 
 
- 
                                 When the cudnnConvolutionFwdAlgo_t() was set to CONVOLUTION_FWD_ALGO_FFT_TILING then the function cudnnConvolutionForward() was leading to illegal memory access. This is now fixed. 
- cudnnPoolingBackward() was failing when using a large kernel size used for 'global_pooling' with NHWC I/O layout. This is fixed.
- The below two items are fixed: If you set RNN mathtype to
                                 CUDNN_TENSOR_OP_MATH, and run RNN on sm6x or earlier
                                 hardware: 
                                    - You may have received CUDNN_STATUS_NOT_SUPPORTED when algo selected is CUDNN_RNN_ALGO_STANDARD or CUDNN_RNN_ALGO_PERSIST_STATIC.
- You may have received incorrect results when algo selected is CUDNN_RNN_ALGO_PERSIST_DYNAMIC.
 
- If you passed in variable sequence length input tensor to cudnnRNNForwardInference(), cudnnRNNForwardTraining(), cudnnRNNBackwardData(), and used CUDNN_RNN_ALGO_PERSIST_STATIC or CUDNN_RNN_ALGO_PERSIST_DYNAMIC, then you may have received incorrect results. Now this is being checked, and CUDNN_STATUS_NOT_SUPPORTED will be returned.
2.14. cuDNN Release 7.1.4
This is the cuDNN 7.1.4 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
Known Issues
Following are known issues in this release:
Fixed Issues
The following issues have been fixed in this release:
2.15. cuDNN Release 7.1.3
This is the cuDNN 7.1.3 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Known Issues
Following are known issues in this release:
Fixed Issues
The following issues have been fixed in this release:
2.16. cuDNN Release 7.1.2
This is the cuDNN 7.1.2 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
Known Issues
Following are known issues in this release:
Fixed Issues
The following issues have been fixed in this release:
2.17. cuDNN Release 7.1.1
This is the cuDNN 7.1.1 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following enhancements have been added to this release:
Known Issues
Following are known issues in this release:
Fixed Issues
The following issues have been fixed in this release:
2.18. cuDNN Release 7.0.5
This is the cuDNN 7.0.5 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Known Issues
Following are known issues in this release:
Fixed Issues
The following issues have been fixed in this release:
2.19. cuDNN Release 7.0.4
This is the cuDNN 7.0.4 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
- Performance improvements for grouped convolutions when input channels and
                                    output channels per group are one, two, or four for the following
                                    algorithms:
                                       - CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
- CUDNN_CONVOLUTION_BWD_DATA_ALGO0
- CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
- CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
- CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
 
Known Issues
Following are known issues in this release:
Fixed Issues
The following issues have been fixed in this release:
2.20. cuDNN Release 7.0.3
This is the cuDNN 7.0.3 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Known Issues
The following are known issues in this release:
- CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING may cause CUDA_ERROR_ILLEGAL_ADDRESS. This issue affects input images of just one pixel in width and certain n, c, k, h combinations.
Fixed Issues
The following issues have been fixed in this release:
2.22. cuDNN Release 7.0.2
This is the cuDNN 7.0.2 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
- This is a patch release of cuDNN 7.0 and includes bug fixes and performance
                                    improvements mainly on NVIDIA Volta.
                                    - Algo 1 Convolutions Performance Improvements
- Performance improvements were made to CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM, CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, and CUDNN_CONVOLUTION_BWD_DATA_ALGO_1. These improvements consist of new SASS kernels and improved heuristics. The new kernels implement convolutions over various data sizes and tile sizes. The improved heuristics take advantage of these new kernels.
 
Known Issues
The following are known issues in this release:
cuDNN Release 7.0.1
This is the cuDNN 7.0.1 release notes. This release includes the following changes.
cuDNN v7.0.1 is the first release to support the NVIDIA Volta GPU architecture. In addition, cuDNN v7.0.1 brings new layers, grouped convolutions, and improved convolution find as error query mechanism.
Key Features and Enhancements
This cuDNN release includes the following key features and enhancements.
- Version 7.0.1 of cuDNN is the first to support the Tensor Core operations in its implementation. Tensor Cores provide highly optimized matrix multiplication building blocks that do not have an equivalent numerical behavior in the traditional instructions, therefore, its numerical behavior is slightly different.
- The cudnnSetConvolutionMathType and
                                    cudnnSetRNNMatrixMathType functions enable you to
                                    choose whether or not to use Tensor Core operations in the convolution
                                    and RNN layers respectively by setting the math mode to either
                                    CUDNN_TENSOR_OP_MATH or
                                    CUDNN_DEFAULT_MATH. 
                                    Tensor Core operations perform parallel floating point accumulation of multiple floating point products. Setting the math mode to CUDNN_TENSOR_OP_MATH indicates that the library will use Tensor Core operations. The default is CUDNN_DEFAULT_MATH. This default indicates that the Tensor Core operations will be avoided by the library. The default mode is a serialized operation whereas, the Tensor Core is a parallelized operation, therefore, the two might result in slightly different numerical results due to the different sequencing of operations.Note: The library falls back to the default math mode when Tensor Core operations are not supported or not permitted.
- A new interface that allows applications to perform convolution groups in the convolution layers in a single API call.
- cudnnCTCLoss provides a GPU implementation of the Connectionist Temporal Classification (CTC) loss function for RNNs. The CTC loss function is used for phoneme recognition in speech and handwriting recognition.
- The CUDNN_BATCHNORM_SPATIAL_PERSISTENT function is a new batch normalization mode for cudnnBatchNormalizationForwardTraining and cudnnBatchNormalizationBackward. This mode is similar to CUDNN_BATCHNORM_SPATIAL, however, it can be faster for some tasks.
- The cudnnQueryRuntimeError function reports error codes written by GPU kernels when executing cudnnBatchNormalizationForwardTraining and cudnnBatchNormalizationBackward with the CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode.
- This new API returns all algorithms sorted by expected performance (using internal heuristics). These algorithms are output similarly to cudnnFindConvolutionForwardAlgorithm.
- This new API returns all algorithms sorted by expected performance (using internal heuristics). These algorithms are output similarly to cudnnFindConvolutionBackwardAlgorithm.
- This new API returns all algorithms sorted by expected performance (using internal heuristics). These algorithms are output similarly to cudnnFindConvolutionBackwardFilterAlgorithm.
- The MUL_NO_ZEROS function is a multiplication reduction that ignores zeros in the data.
- The OP_TENSOR_NOT function is a unary operation that takes the negative of (alpha*A).
- The cudnnGetDropoutDescriptor function allows applications to get dropout values.
Notices
Notice
This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.
Arm
Arm, AMBA and Arm Powered are registered trademarks of Arm Limited. Cortex, MPCore and Mali are trademarks of Arm Limited. "Arm" is used to represent Arm Holdings plc; its operating company Arm Limited; and the regional subsidiaries Arm Inc.; Arm KK; Arm Korea Limited.; Arm Taiwan Limited; Arm France SAS; Arm Consulting (Shanghai) Co. Ltd.; Arm Germany GmbH; Arm Embedded Technologies Pvt. Ltd.; Arm Norway, AS and Arm Sweden AB.
HDMI
HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC.
Blackberry/QNX
Copyright © 2020 BlackBerry Limited. All rights reserved.
Trademarks, including but not limited to BLACKBERRY, EMBLEM Design, QNX, AVIAGE, MOMENTICS, NEUTRINO and QNX CAR are the trademarks or registered trademarks of BlackBerry Limited, used under license, and the exclusive rights to such trademarks are expressly reserved.
Trademarks
NVIDIA, the NVIDIA logo, and BlueField, CUDA, DALI, DRIVE, Hopper, JetPack, Jetson AGX Xavier, Jetson Nano, Maxwell, NGC, Nsight, Orin, Pascal, Quadro, Tegra, TensorRT, Triton, Turing and Volta are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.