cuDNN Release Notes v7.0.3

Key Features and Enhancements

Performance has been improved in the following cases:
  • Forward grouped convolutions where the number of input channels per group is 1, 2, or 4, on Volta or Pascal hardware.
  • cudnnTransformTensor() where the input and output tensors are packed.
    Note: This is an improved fallback path, so the improvement will not be seen in all cases.
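To make "input channels per group" concrete, here is a minimal CPU reference sketch of a forward grouped convolution in C. It is not cuDNN's implementation and assumes NCHW layout, stride 1, no padding or dilation, and cross-correlation (no kernel flip); the function name grouped_conv_fwd is hypothetical. With C input channels split into groupCount groups, each group convolves C / groupCount channels, and the tuned cases above are those where that quotient is 1, 2, or 4. In cuDNN itself, the group count is set on the convolution descriptor with cudnnSetConvolutionGroupCount().

```c
#include <assert.h>

/* Naive reference forward grouped convolution (sketch).
 * Layouts: x[N][C][H][W], w[K][C/groups][R][S], y[N][K][H-R+1][W-S+1].
 * Assumes stride 1, no padding, no dilation, cross-correlation.
 * Group g reads input channels [g*cpg, (g+1)*cpg) and writes
 * output channels [g*kpg, (g+1)*kpg). */
void grouped_conv_fwd(const float *x, const float *w, float *y,
                      int N, int C, int H, int W,
                      int K, int R, int S, int groups)
{
    assert(groups > 0 && C % groups == 0 && K % groups == 0);
    int cpg = C / groups;          /* input channels per group */
    int kpg = K / groups;          /* output channels per group */
    int P = H - R + 1, Q = W - S + 1;

    for (int n = 0; n < N; ++n)
        for (int k = 0; k < K; ++k) {
            int g = k / kpg;       /* group that owns output channel k */
            for (int p = 0; p < P; ++p)
                for (int q = 0; q < Q; ++q) {
                    float acc = 0.0f;
                    for (int c = 0; c < cpg; ++c) {
                        int ci = g * cpg + c;   /* absolute input channel */
                        for (int r = 0; r < R; ++r)
                            for (int s = 0; s < S; ++s)
                                acc += x[((n * C + ci) * H + p + r) * W + q + s]
                                     * w[((k * cpg + c) * R + r) * S + s];
                    }
                    y[((n * K + k) * P + p) * Q + q] = acc;
                }
        }
}
```

For example, with C = 4 and groups = 2, each output channel reads only 2 input channels; with groups = C = K (a depthwise convolution), each output channel reads exactly 1.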

Known Issues

The following are known issues in this release:

  • CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING may cause CUDA_ERROR_ILLEGAL_ADDRESS. This issue affects input images that are only 1 pixel wide, in combination with certain n, c, k, h values.
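One possible application-side workaround, which these notes do not prescribe, is to skip FFT_TILING whenever the input width is 1 when choosing from a ranked list of candidate algorithms (for example, the results of an algorithm search). The sketch below is deliberately conservative: it ignores the n, c, k, h conditions, which the notes do not specify. The fwd_algo enum and pick_fwd_algo() are hypothetical stand-ins; real code would use cudnnConvolutionFwdAlgo_t and the CUDNN_CONVOLUTION_FWD_ALGO_* values from cudnn.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for cuDNN forward algorithm IDs; real code
 * would use cudnnConvolutionFwdAlgo_t values from cudnn.h instead. */
typedef enum {
    FWD_ALGO_IMPLICIT_GEMM,
    FWD_ALGO_GEMM,
    FWD_ALGO_FFT,
    FWD_ALGO_FFT_TILING,
    FWD_ALGO_WINOGRAD
} fwd_algo;

/* Return the index of the first acceptable algorithm in `ranked`
 * (ordered best first), skipping FFT_TILING when the input width is 1.
 * Returns -1 if no acceptable candidate remains. */
int pick_fwd_algo(const fwd_algo *ranked, size_t count, int input_width)
{
    assert(ranked != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (ranked[i] == FWD_ALGO_FFT_TILING && input_width == 1)
            continue; /* known issue: may cause CUDA_ERROR_ILLEGAL_ADDRESS */
        return (int)i;
    }
    return -1;
}
```

Returning an index rather than an algorithm ID lets the caller look up the matching workspace size or timing from the same ranked results.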

Fixed Issues

The following issues have been fixed in this release:

  • AddTensor and TensorOp produced incorrect results for half and INT8 inputs in various use cases.
  • cudnnPoolingBackward() could produce incorrect values in rare cases of non-deterministic MAX pooling with window_width > 256. These rare cases occur when the maximum element in a window is duplicated horizontally (along the width) at a stride of 256*k for some k. The behavior is now fixed so that derivatives accumulate into the left-most duplicate.
  • cudnnGetConvolutionForwardWorkspaceSize() returned an incorrect workspace size for the FFT_TILING algorithm with 1D convolutions. This occurred only for large convolutions in which intermediate calculations produced values greater than 2^31.
  • cudnnPooling*() functions returned CUDNN_STATUS_NOT_SUPPORTED for small x images (channels * height * width < 4).
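The tie-breaking rule in the cudnnPoolingBackward() fix can be illustrated with a small CPU reference for 1D max-pooling backward. This is a sketch under simplifying assumptions (a single row, no padding; the name maxpool1d_backward is hypothetical) and does not model cuDNN's GPU kernels or the window_width > 256 condition. It only shows the intended result: each output gradient is routed to the left-most maximum in its window, and contributions from overlapping windows accumulate.

```c
#include <assert.h>

/* Reference 1D max-pooling backward (sketch): window `win`, step `stride`,
 * no padding. dy has (width - win) / stride + 1 entries; dx has `width`. */
void maxpool1d_backward(const float *x, int width, int win, int stride,
                        const float *dy, float *dx)
{
    assert(win > 0 && stride > 0 && win <= width);
    int out_w = (width - win) / stride + 1;

    for (int i = 0; i < width; ++i)
        dx[i] = 0.0f;

    for (int o = 0; o < out_w; ++o) {
        int start = o * stride;
        int arg = start;
        for (int i = start + 1; i < start + win; ++i)
            if (x[i] > x[arg])   /* strict '>' keeps the left-most index on ties */
                arg = i;
        dx[arg] += dy[o];        /* overlapping windows accumulate */
    }
}
```

For x = {1, 5, 2, 5} with a single window covering all four elements, both index 1 and index 3 hold the maximum, and the entire gradient goes to index 1.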