cuDNN Release Notes v7.6.1

Key Features and Enhancements

The following features and enhancements have been added to this release:

Fixed Issues

The following issues have been fixed in this release:

  • In cuDNN 7.6.0, the function cudnnGetConvolutionBackwardDataWorkspaceSize() returns a workspace size for which cudnnConvolutionBackwardData(), when used with CUDNN_CONVOLUTION_BWD_DATA_ALGO_0, returns CUDNN_STATUS_NOT_SUPPORTED. This is fixed in cuDNN 7.6.1, so that cudnnGetConvolutionBackwardDataWorkspaceSize() now returns a workspace size that cudnnConvolutionBackwardData() accepts.
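
    A minimal sketch of the query-then-allocate pattern this fix restores is shown below. The descriptor and pointer names (handle, wDesc, dyDesc, convDesc, dxDesc, w, dy, dx) are placeholders assumed to be already created; error checking is omitted.

      size_t wsSize = 0;
      cudnnStatus_t st = cudnnGetConvolutionBackwardDataWorkspaceSize(
          handle, wDesc, dyDesc, convDesc, dxDesc,
          CUDNN_CONVOLUTION_BWD_DATA_ALGO_0, &wsSize);
      if (st == CUDNN_STATUS_SUCCESS) {
          void *workspace = NULL;
          cudaMalloc(&workspace, wsSize);
          float alpha = 1.0f, beta = 0.0f;
          /* With cuDNN 7.6.1, this call accepts the queried size. */
          st = cudnnConvolutionBackwardData(
              handle, &alpha, wDesc, w, dyDesc, dy, convDesc,
              CUDNN_CONVOLUTION_BWD_DATA_ALGO_0, workspace, wsSize,
              &beta, dxDesc, dx);
          cudaFree(workspace);
      }
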
  • In cuDNN 7.6.0 and earlier versions, when all of the following conditions are true:
    • the RNN model is bi-directional,
    • the cell type is LSTM,
    • cudnnRNNAlgo_t = CUDNN_RNN_ALGO_STANDARD, and
    • the dropout probability is greater than zero,

    then the cudnnRNNBackwardWeights() function produces inaccurate and occasionally non-deterministic results.

    This is fixed in cuDNN 7.6.1.

    The underlying cause was that the same buffer was used for the left-to-right and right-to-left directions when re-computing forward dropout results passed from one RNN layer to the next.
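
    The following minimal sketch shows a descriptor configuration that meets all of the trigger conditions; handle is assumed to be an existing cuDNN handle, hiddenSize and numLayers are placeholder values, and error checking is omitted.

      cudnnDropoutDescriptor_t dropoutDesc;
      cudnnRNNDescriptor_t rnnDesc;
      cudnnCreateDropoutDescriptor(&dropoutDesc);
      cudnnCreateRNNDescriptor(&rnnDesc);

      size_t stateSize = 0;
      void *states = NULL;
      cudnnDropoutGetStatesSize(handle, &stateSize);
      cudaMalloc(&states, stateSize);
      /* Dropout probability greater than zero: one trigger condition. */
      cudnnSetDropoutDescriptor(dropoutDesc, handle, 0.2f,
                                states, stateSize, /*seed=*/1337ULL);

      int hiddenSize = 512, numLayers = 2;  /* placeholder sizes */
      /* Bi-directional LSTM with the standard algorithm: the remaining
         trigger conditions. */
      cudnnSetRNNDescriptor_v6(handle, rnnDesc, hiddenSize, numLayers,
                               dropoutDesc, CUDNN_LINEAR_INPUT,
                               CUDNN_BIDIRECTIONAL, CUDNN_LSTM,
                               CUDNN_RNN_ALGO_STANDARD, CUDNN_DATA_FLOAT);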

  • A dropout-related bug in the cudnnRNNForwardTraining() function, present in cuDNN 7.6.0 and earlier versions, is fixed in cuDNN 7.6.1.

    When all the following conditions are true:

    • cudnnRNNAlgo_t = CUDNN_RNN_ALGO_PERSIST_STATIC,
    • cudnnMathType_t = CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION, and
    • the input data type is CUDNN_DATA_FLOAT,

    then an FP32-to-FP16 conversion might be applied as a performance optimization.

    When this down-conversion was scheduled, a GPU kernel invoked by cudnnDropoutForward() would crash because incorrect parameters were passed to it. In this case, the CUDA runtime reports a "misaligned address" error when reading the data from global memory.
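
    A minimal sketch of the affected configuration is shown below, assuming rnnDesc is an RNN descriptor already configured with CUDNN_RNN_ALGO_PERSIST_STATIC and FP32 input data.

      /* Allow FP32-to-FP16 down-conversion as a performance optimization;
         before cuDNN 7.6.1, this combination could make a kernel invoked
         by cudnnDropoutForward() crash. */
      cudnnSetRNNMatrixMathType(rnnDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);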

  • In cuDNN 7.6.0, on RHEL7 only, the /usr/src/cudnn_samples_v7/samples_common.mk file is missing, which requires a workaround to compile the cuDNN samples. This is fixed in cuDNN 7.6.1, and the workaround is no longer needed.
  • In cuDNN 7.6.0, on pre-Volta hardware only, the function cudnnGetConvolutionBackwardFilterWorkspaceSize() can erroneously return CUDNN_STATUS_SUCCESS for cudnnConvolutionBackwardFilter() for 3D convolutions using CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 with the NDHWC layout. When this occurs, cudnnConvolutionBackwardFilter() processes the data using a kernel that expects the NCDHW layout (the only format supported by wDesc in this case), leading to incorrect results. In cuDNN 7.6.1, this is fixed so that cudnnGetConvolutionBackwardFilterWorkspaceSize() now returns CUDNN_STATUS_NOT_SUPPORTED.
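
    A minimal defensive sketch, with placeholder descriptor names (xDesc and dyDesc in NDHWC, dwDesc in NCDHW):

      size_t wsSize = 0;
      cudnnStatus_t st = cudnnGetConvolutionBackwardFilterWorkspaceSize(
          handle, xDesc, dyDesc, convDesc, dwDesc,
          CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, &wsSize);
      if (st == CUDNN_STATUS_NOT_SUPPORTED) {
          /* Reported correctly as of cuDNN 7.6.1: choose a different
             algorithm here rather than calling
             cudnnConvolutionBackwardFilter() with an unsupported
             algorithm/layout combination. */
      }
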
  • In cuDNN 7.5.x and 7.6.0 for the Jetson platform, the function cudnnConvolutionBackwardData(), when used with CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD, might in some cases return incorrect results. This is fixed in cuDNN 7.6.1.
  • When the data type configuration is FLOAT_CONFIG, cudnnGetConvolution*Algorithm() incorrectly returns a slow algorithm for a few convolution sizes on the Pascal architecture. This is fixed in cuDNN 7.5.0 and later versions.
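
    For cases where the heuristic picks a slow algorithm, measuring the actual fastest algorithm is an alternative; a minimal sketch with placeholder descriptor names:

      cudnnConvolutionFwdAlgo_t algo;
      /* Heuristic-based selection (the call affected by this issue). */
      cudnnGetConvolutionForwardAlgorithm(
          handle, xDesc, wDesc, convDesc, yDesc,
          CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, /*memoryLimitInBytes=*/0,
          &algo);

      /* Measurement-based selection as a cross-check. */
      int returned = 0;
      cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
      cudnnFindConvolutionForwardAlgorithm(
          handle, xDesc, wDesc, convDesc, yDesc,
          CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &returned, perf);
      /* perf[0].algo is the fastest algorithm actually measured. */
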
  • When using the fusedOps API with the enum CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS or CUDNN_FUSED_SCALE_BIAS_ACTIVATION_WGRAD, and the input tensor is in NCHW format or is not fully packed, incorrect results may be produced. This is fixed in cuDNN 7.6.1.
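
    For versions prior to 7.6.1, keeping the input tensor fully packed in NHWC avoids the affected path; a minimal sketch with placeholder dimensions:

      cudnnTensorDescriptor_t xDesc;
      cudnnCreateTensorDescriptor(&xDesc);
      /* cudnnSetTensor4dDescriptor() always creates a fully-packed
         layout; CUDNN_TENSOR_NHWC avoids the NCHW input case. */
      cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NHWC, CUDNN_DATA_HALF,
                                 /*n=*/32, /*c=*/64, /*h=*/56, /*w=*/56);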

Known Issues

The following issues and limitations exist in this release:

  • Algorithms returned by cudnnGetConvolution*Algorithm() may, in some limited use cases, fail when they are actually executed. This is a cuDNN library-wide issue that applies to the convolution forward, convolution backward data, and convolution backward filter operations. This issue is also present in versions prior to cuDNN 7.6.1.
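
    One possible fallback pattern for this limitation, sketched with placeholder names for the forward case:

      cudnnStatus_t st = cudnnConvolutionForward(
          handle, &alpha, xDesc, x, wDesc, w, convDesc, algo,
          workspace, wsSize, &beta, yDesc, y);
      if (st == CUDNN_STATUS_NOT_SUPPORTED ||
          st == CUDNN_STATUS_EXECUTION_FAILED) {
          /* Re-select with the measurement-based finder and retry. */
          int returned = 0;
          cudnnConvolutionFwdAlgoPerf_t perf;
          cudnnFindConvolutionForwardAlgorithm(
              handle, xDesc, wDesc, convDesc, yDesc, 1, &returned, &perf);
          if (returned > 0) algo = perf.algo;
      }
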
  • When the input and output tensors are in NHWC and the filter is 1x1 and in NCHW, the performance of the function cudnnConvolutionBackwardData() might be degraded.
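
    A minimal sketch of the layout combination that hits this slow path, with placeholder dimensions:

      int n = 32, c = 64, k = 128, h = 56, w = 56;  /* placeholder sizes */
      /* NHWC input and output tensors... */
      cudnnSetTensor4dDescriptor(dyDesc, CUDNN_TENSOR_NHWC, CUDNN_DATA_FLOAT,
                                 n, k, h, w);
      cudnnSetTensor4dDescriptor(dxDesc, CUDNN_TENSOR_NHWC, CUDNN_DATA_FLOAT,
                                 n, c, h, w);
      /* ...combined with a 1x1 filter in NCHW. */
      cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                                 k, c, /*h=*/1, /*w=*/1);
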
  • In cuDNN 7.6.1, when using the experimental multi-head attention API, the forward and backward paths might produce different results for the BERT model when the batch size is greater than one and/or the number of heads is greater than one.
  • In cuDNN 7.6.1, on Volta architecture only, there may be a performance degradation when the function cudnnConvolutionBackwardFilter() is used for 3D convolutions with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1.
  • In cuDNN 7.6.1, on Turing and Pascal architectures, performance may be degraded for cudnnConvolutionBackwardData() when all of the following conditions are true:
    • CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 for 3D convolutions
    • wDesc, dyDesc, and dxDesc are all in the NCDHW layout
    • Data type configuration is FLOAT_CONFIG (i.e., single precision data and compute)
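
    A minimal sketch of the affected tensor configuration, with placeholder dimensions, is shown below; benchmarking other algorithms (for example, CUDNN_CONVOLUTION_BWD_DATA_ALGO_1) may be worthwhile on these architectures.

      int n = 16, c = 32, d = 8, h = 32, w = 32;  /* placeholder sizes */
      /* Fully-packed 5-D NCDHW tensor with single-precision data. */
      int dims[5]    = {n, c, d, h, w};
      int strides[5] = {c*d*h*w, d*h*w, h*w, w, 1};
      cudnnSetTensorNdDescriptor(dxDesc, CUDNN_DATA_FLOAT, 5, dims, strides);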