These are the cuDNN 7.6.4 release notes. This release includes fixes from the previous
cuDNN v7.x.x releases as well as the following additional changes.
Key Features and Enhancements
The following features and enhancements have been added to this release:
- Significantly improved the performance of multi-head attention forward training
and inference.
Compatibility
For the latest compatible versions of the OS, CUDA, the CUDA driver, and NVIDIA
hardware, see the cuDNN Support Matrix for v7.6.4.
Limitations
- When launching a CUDA graph constructed via a stream capture that includes a
cudnnConvolutionForward operation, the subsequent
synchronization point reports a cudaErrorLaunchFailure
error. This error appears when cuDNN is set to use a non-default
stream.
Fixed Issues
The following issues have been fixed in this release:
- Earlier versions of cuDNN v7.6 contained symbols that conflicted with those in
TensorRT 5.1 and later. In some cases, these conflicts could crash applications
linked against both cuDNN and TensorRT. This issue is fixed in cuDNN 7.6.4.
- Addressed the regressions that were introduced in the
cudnnConvolutionBiasActivationForward function in cuDNN
7.6.3. Previously, if the destination data buffer and the zData buffer held
different values, this API computed incorrect results. This issue has been
resolved, and the API now computes correct results even if users provide
arbitrary values in the destination data and zData buffers.
- Multi-head attention now returns CUDNN_STATUS_ARCH_MISMATCH
for true-half configurations on devices with compute capability lower than 5.3
(for example, most Maxwell devices and all Kepler devices), which do not have
native hardware support for true-half computation. Previously, an error such as
CUDNN_STATUS_EXECUTION_FAILED could be triggered, or
inaccurate results could be produced.
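
The compute capability threshold in the multi-head attention fix can be checked by an application before it selects a true-half configuration. The following is a minimal sketch, not part of cuDNN: the name supportsTrueHalf and the two constants are hypothetical, and it assumes the caller already holds the device's major and minor compute capability numbers (for example, from the major and minor fields of cudaDeviceProp, as filled in by cudaGetDeviceProperties).

```cpp
// Hypothetical constants: the 5.3 threshold quoted in the fixed-issue note.
constexpr int kMinTrueHalfMajor = 5;
constexpr int kMinTrueHalfMinor = 3;

// Hypothetical helper, not a cuDNN API.
// Returns true when compute capability major.minor is 5.3 or higher,
// that is, when the release note says native true-half support exists.
// Returns false for lower capabilities, where cuDNN 7.6.4 reports
// CUDNN_STATUS_ARCH_MISMATCH for true-half multi-head attention.
constexpr bool supportsTrueHalf(int major, int minor) {
    return major > kMinTrueHalfMajor ||
           (major == kMinTrueHalfMajor && minor >= kMinTrueHalfMinor);
}

// Compile-time spot checks of the boundary.
static_assert(supportsTrueHalf(5, 3), "5.3 is the first supported capability");
static_assert(!supportsTrueHalf(5, 2), "5.2 is below the threshold");
```

An application could call supportsTrueHalf(prop.major, prop.minor) and fall back to a different data type configuration when it returns false, rather than waiting for the status code from the library.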