Key Features and Enhancements
Performance improvements for various cases:
- Forward grouped convolutions where the number of input channels per group is 1, 2, or 4 and the hardware is Volta or Pascal.
- cudnnTransformTensor() where the input and output tensors are packed. Note: This is an improved fallback; improvements will not be seen in all cases.
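
As a rough illustration of the grouped-convolution condition above, the sketch below checks whether a layer has 1, 2, or 4 input channels per group. The helper name and its parameters are hypothetical and not part of cuDNN. It only tests the arithmetic condition stated in the note; it does not check the GPU architecture or predict whether a particular call will actually be faster.

```c
#include <stdbool.h>

/* Hypothetical helper (not a cuDNN function): returns true when the
 * number of input channels per group is 1, 2, or 4, which is the
 * condition named in the release note. Assumes the input channels
 * must divide evenly across the groups. */
bool channels_per_group_is_1_2_or_4(int input_channels, int group_count)
{
    if (input_channels <= 0 || group_count <= 0) {
        return false;
    }
    if (input_channels % group_count != 0) {
        return false; /* channels cannot be split evenly into groups */
    }
    int per_group = input_channels / group_count;
    return per_group == 1 || per_group == 2 || per_group == 4;
}
```

For example, 32 input channels in 32 groups (a depthwise-style layer) gives 1 channel per group and satisfies the condition, while 64 input channels in 8 groups gives 8 and does not.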
Known Issues
The following are known issues in this release:
- CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING may cause CUDA_ERROR_ILLEGAL_ADDRESS. This issue affects input images of just 1 pixel in width and certain n, c, k, h combinations.
Fixed Issues
The following issues have been fixed in this release:
- AddTensor and TensorOp produce incorrect results for half and INT8 inputs for various use cases.
- cudnnPoolingBackward() can produce incorrect values for rare cases of non-deterministic MAX pooling with window_width > 256. These rare cases are when the maximum element in a window is duplicated horizontally (along width) by a stride of 256*k for some k. The behavior is now fixed to accumulate derivatives for the duplicate that is left-most.
- cudnnGetConvolutionForwardWorkspaceSize() produces incorrect workspace size for algorithm FFT_TILING for 1d convolutions. This only occurs for large sized convolutions where intermediate calculations produce values greater than 2^31 (2 to the power of 31).
- CUDNN_STATUS_NOT_SUPPORTED returned by cudnnPooling*() functions for small x image (channels * height * width < 4).
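
The left-most tie-breaking rule described for cudnnPoolingBackward() above can be sketched with a plain CPU reference. This is not cuDNN's implementation. It is a minimal one-dimensional illustration, assuming no padding, in which each output gradient is accumulated into the left-most maximum of its window. The function name and parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical CPU reference (not cuDNN code): 1D max-pooling backward
 * over a single row, without padding.
 *   x  : forward input, `width` elements
 *   dy : gradient w.r.t. each pooled output
 *   dx : gradient w.r.t. the input, `width` elements (overwritten)
 * When the maximum value appears more than once in a window, the
 * gradient goes to the left-most occurrence. */
void max_pool_1d_backward_leftmost(const float *x, const float *dy, float *dx,
                                   size_t width, size_t window, size_t stride)
{
    for (size_t i = 0; i < width; ++i) {
        dx[i] = 0.0f;
    }
    if (window == 0 || stride == 0 || width < window) {
        return; /* no complete window fits */
    }
    size_t out_width = (width - window) / stride + 1;
    for (size_t o = 0; o < out_width; ++o) {
        size_t start = o * stride;
        size_t best = start;
        for (size_t j = start + 1; j < start + window; ++j) {
            /* Strict comparison keeps the left-most index on ties. */
            if (x[j] > x[best]) {
                best = j;
            }
        }
        dx[best] += dy[o];
    }
}
```

With x = {1, 5, 2, 5, 0, 5}, a window of 4 and a stride of 2, the first window has its maximum at indices 1 and 3, and the gradient goes to index 1. The second window has its maximum at indices 3 and 5, and the gradient goes to index 3. The fixed issue concerns much wider windows (window_width > 256), but the tie-breaking rule is the same.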