cuDNN Release Notes v7.0.2
Key Features and Enhancements
This is a patch release of cuDNN 7.0 and includes bug fixes and performance improvements mainly on Volta.
- Algo 1 Convolutions Performance Improvements
Performance improvements were made to CUDNN_CONVOLUTION_BWD_DATA_ALGO_1. These improvements consist of new SASS kernels and improved heuristics. The new kernels implement convolutions over various data sizes and tile sizes, and the improved heuristics take advantage of these new kernels; a minimal usage sketch follows.
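For reference, the sketch below shows one way to request this algorithm explicitly via cudnnConvolutionBackwardData(). The tensor shapes, the CHECK_CUDNN macro, and the omission of CUDA error checking are illustrative choices, not part of cuDNN or of this release.

```cpp
// Hypothetical sketch: requesting CUDNN_CONVOLUTION_BWD_DATA_ALGO_1 explicitly
// for a small NCHW fp32 case. Shapes and the CHECK_CUDNN macro are illustrative.
#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>

#define CHECK_CUDNN(call)                                              \
    do {                                                               \
        cudnnStatus_t s_ = (call);                                     \
        if (s_ != CUDNN_STATUS_SUCCESS) {                              \
            std::printf("cuDNN error: %s\n", cudnnGetErrorString(s_)); \
            return 1;                                                  \
        }                                                              \
    } while (0)

int main() {
    cudnnHandle_t handle;
    CHECK_CUDNN(cudnnCreate(&handle));

    // dx and dy: N=1, C=64, H=56, W=56; w: K=64, C=64, R=3, S=3 (pad 1, stride 1)
    cudnnTensorDescriptor_t dxDesc, dyDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&dxDesc));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&dyDesc));
    CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
    CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(dxDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, 1, 64, 56, 56));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(dyDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, 1, 64, 56, 56));
    CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                           CUDNN_TENSOR_NCHW, 64, 64, 3, 3));
    CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                                CUDNN_CROSS_CORRELATION,
                                                CUDNN_DATA_FLOAT));

    // Ask for the algorithm covered by this release's improvements and size its workspace.
    cudnnConvolutionBwdDataAlgo_t algo = CUDNN_CONVOLUTION_BWD_DATA_ALGO_1;
    size_t wsBytes = 0;
    CHECK_CUDNN(cudnnGetConvolutionBackwardDataWorkspaceSize(
        handle, wDesc, dyDesc, convDesc, dxDesc, algo, &wsBytes));

    // CUDA error checking omitted for brevity.
    void *w, *dy, *dx, *ws;
    cudaMalloc(&w,  64 * 64 * 3 * 3 * sizeof(float));
    cudaMalloc(&dy, 1 * 64 * 56 * 56 * sizeof(float));
    cudaMalloc(&dx, 1 * 64 * 56 * 56 * sizeof(float));
    cudaMalloc(&ws, wsBytes);

    const float alpha = 1.0f, beta = 0.0f;
    CHECK_CUDNN(cudnnConvolutionBackwardData(handle, &alpha, wDesc, w,
                                             dyDesc, dy, convDesc, algo,
                                             ws, wsBytes, &beta, dxDesc, dx));
    std::printf("BWD_DATA_ALGO_1 workspace: %zu bytes\n", wsBytes);

    cudaFree(w); cudaFree(dy); cudaFree(dx); cudaFree(ws);
    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(dyDesc);
    cudnnDestroyTensorDescriptor(dxDesc);
    cudnnDestroy(handle);
    return 0;
}
```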
Known Issues
The following are known issues in this release:
- cudnnGetConvolutionForwardWorkspaceSize() returns an overflowed size_t value for certain input shapes for
- cudnnPoolingBackward() fails for pooling window size > 256.
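For the workspace-size overflow noted in the first item above, one possible defensive check is sketched below: an overflowed size_t is far larger than any real device, so comparing the returned size against total device memory lets the caller reject it and fall back to another algorithm. The helper name queryFwdWorkspaceChecked and the comparison heuristic are assumptions for illustration, not an official workaround.

```cpp
// Hypothetical workaround sketch: sanity-check the workspace size returned by
// cudnnGetConvolutionForwardWorkspaceSize() before allocating it.
#include <cudnn.h>
#include <cuda_runtime.h>

cudnnStatus_t queryFwdWorkspaceChecked(cudnnHandle_t handle,
                                       cudnnTensorDescriptor_t xDesc,
                                       cudnnFilterDescriptor_t wDesc,
                                       cudnnConvolutionDescriptor_t convDesc,
                                       cudnnTensorDescriptor_t yDesc,
                                       cudnnConvolutionFwdAlgo_t algo,
                                       size_t *wsBytes) {
    cudnnStatus_t s = cudnnGetConvolutionForwardWorkspaceSize(
        handle, xDesc, wDesc, convDesc, yDesc, algo, wsBytes);
    if (s != CUDNN_STATUS_SUCCESS) return s;

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    // Treat an implausibly large request (assumed overflow) as a failure so the
    // caller can fall back to a different forward algorithm.
    if (*wsBytes > totalBytes) return CUDNN_STATUS_ALLOC_FAILED;
    return CUDNN_STATUS_SUCCESS;
}
```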
Fixed Issues
The following issues have been fixed in this release:
- Batch Norm CUDNN_BATCHNORM_SPATIAL_PERSISTENT might get into race conditions in certain scenarios.
- cuDNN convolution layers using TENSOR_OP_MATH with fp16 inputs and outputs and fp32 compute will use “round to nearest” mode instead of the “round to zero” mode used in 7.0.1. This rounding mode has proven to achieve better results in training (see the configuration sketch after this list).
- Fixed synchronization logic in the CUDNN_CTC_LOSS_ALGO_DETERMINISTIC algo for CTC. The original code would hang in rare cases.
- Convolution algorithms using TENSOR_OP_MATH returned a workspace size from *GetWorkspaceSize() smaller than actually necessary.
- The results of int8 are inaccurate in certain cases when calling cudnnConvolutionForward() in a convolution layer.
- Convolutions where xDesc’s channel = yDesc’s channel = groupCount could compute incorrect values when vertical padding > 0.
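The sketch below illustrates the configuration the TENSOR_OP_MATH items above refer to: fp16 tensor and filter descriptors, an fp32 compute type on the convolution descriptor, CUDNN_TENSOR_OP_MATH opted in via cudnnSetConvolutionMathType(), and the workspace queried only after the math type is set. The helper name setupTensorOpConvolution and the padding/stride values are illustrative assumptions.

```cpp
// Illustrative sketch: fp16 data with fp32 accumulation and Tensor Core math.
#include <cudnn.h>

cudnnStatus_t setupTensorOpConvolution(cudnnHandle_t handle,
                                       cudnnConvolutionDescriptor_t convDesc,
                                       cudnnTensorDescriptor_t xDesc,  // CUDNN_DATA_HALF
                                       cudnnFilterDescriptor_t wDesc,  // CUDNN_DATA_HALF
                                       cudnnTensorDescriptor_t yDesc,  // CUDNN_DATA_HALF
                                       cudnnConvolutionFwdAlgo_t algo,
                                       size_t *wsBytes) {
    // fp32 compute type with fp16 data, as in the rounding-mode item above
    // (padding/stride values here are arbitrary examples).
    cudnnStatus_t s = cudnnSetConvolution2dDescriptor(
        convDesc, /*pad*/ 1, 1, /*stride*/ 1, 1, /*dilation*/ 1, 1,
        CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    if (s != CUDNN_STATUS_SUCCESS) return s;

    // Opt in to Tensor Core kernels on Volta.
    s = cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);
    if (s != CUDNN_STATUS_SUCCESS) return s;

    // Query the workspace only after the math type is set; the fix above ensures
    // the returned size is large enough for the Tensor Core algorithms.
    return cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc,
                                                   convDesc, yDesc, algo,
                                                   wsBytes);
}
```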