Key Features and Enhancements
This is a patch release of cuDNN 7.0 and includes bug fixes and performance improvements mainly on Volta.
- Algo 1 Convolutions Performance Improvements

  Performance improvements were made to CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM, CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, and CUDNN_CONVOLUTION_BWD_DATA_ALGO_1. These improvements consist of new SASS kernels and improved heuristics. The new kernels implement convolutions over various data sizes and tile sizes. The improved heuristics take advantage of these new kernels.
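
As a rough illustration of where these kernels and heuristics come into play, the sketch below lets cuDNN 7's heuristics choose a forward algorithm (the pool includes CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM) and then runs the convolution. This is a minimal sketch, not a complete program: the handle, descriptors, and device buffers are assumed to be already created and initialized, and error checking is elided.

```cpp
#include <cudnn.h>
#include <cuda_runtime.h>

// Hypothetical helper; handle, xDesc/x, wDesc/w, convDesc, and yDesc/y are
// assumed to be valid, initialized cuDNN objects and device buffers.
void runForwardConv(cudnnHandle_t handle,
                    cudnnTensorDescriptor_t xDesc, const void* x,
                    cudnnFilterDescriptor_t wDesc, const void* w,
                    cudnnConvolutionDescriptor_t convDesc,
                    cudnnTensorDescriptor_t yDesc, void* y) {
    // Ask the heuristics to pick the fastest available algorithm.
    cudnnConvolutionFwdAlgo_t algo;
    cudnnGetConvolutionForwardAlgorithm(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, /*memoryLimitInBytes=*/0, &algo);

    // Allocate the workspace the chosen algorithm reports it needs.
    size_t workspaceSize = 0;
    cudnnGetConvolutionForwardWorkspaceSize(
        handle, xDesc, wDesc, convDesc, yDesc, algo, &workspaceSize);
    void* workspace = nullptr;
    cudaMalloc(&workspace, workspaceSize);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                            algo, workspace, workspaceSize, &beta, yDesc, y);
    cudaFree(workspace);
}
```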
Known Issues
The following are known issues in this release:
- cudnnGetConvolutionForwardWorkspaceSize() returns an overflowed size_t value for certain input shapes for CUDNN_CONVOLUTION_*_ALGO_FFT_TILING (a defensive check is sketched after this list).
- cudnnPoolingBackward() fails for pooling window sizes > 256.
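
Until the workspace-size issue is resolved, a defensive check along these lines can guard against the overflowed value. This is a sketch under assumptions: the descriptors are presumed valid, and the 64 GB plausibility bound is an arbitrary illustration, not part of the cuDNN API.

```cpp
#include <cudnn.h>

// Hypothetical workaround sketch: treat an implausibly large size_t from the
// FFT_TILING workspace query as a failed query so callers can fall back to
// another algorithm. The 64 GB bound is an assumption for illustration only.
bool fftTilingWorkspaceIsUsable(cudnnHandle_t handle,
                                cudnnTensorDescriptor_t xDesc,
                                cudnnFilterDescriptor_t wDesc,
                                cudnnConvolutionDescriptor_t convDesc,
                                cudnnTensorDescriptor_t yDesc,
                                size_t* workspaceSize) {
    cudnnStatus_t status = cudnnGetConvolutionForwardWorkspaceSize(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING, workspaceSize);
    const size_t kPlausibleLimit = 64ULL << 30;  // 64 GB sanity bound
    return status == CUDNN_STATUS_SUCCESS && *workspaceSize < kPlausibleLimit;
}
```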
Fixed Issues
The following issues have been fixed in this release:
- Batch Norm CUDNN_BATCHNORM_SPATIAL_PERSISTENT might get into race conditions in certain scenarios.
- cuDNN convolution layers using TENSOR_OP_MATH with fp16 inputs and outputs and fp32 compute now use “round to nearest” mode instead of the “round to zero” mode used in 7.0.1. This rounding mode has proven to achieve better results in training. (A sketch of the TENSOR_OP_MATH setup appears after this list.)
- Fixed the synchronization logic in the CUDNN_CTC_LOSS_ALGO_DETERMINISTIC algo for CTC. The original code would hang in rare cases.
- Convolution algorithms using TENSOR_OP_MATH returned a workspace size from *GetWorkspaceSize() that was smaller than actually necessary.
- The results of int8 convolutions were inaccurate in certain cases when calling cudnnConvolutionForward() in a convolution layer.
- cudnnConvolutionForward() called with xDesc’s channel = yDesc’s channel = groupCount could compute incorrect values when vertical padding > 0. (A sketch of this grouped-convolution configuration appears after this list.)
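
For reference, the sketch below shows the opt-in path that both TENSOR_OP_MATH items above refer to: the math type is enabled on the convolution descriptor, with fp16 tensors and fp32 compute. All dimensions are arbitrary illustrative values, and the helper name is hypothetical.

```cpp
#include <cudnn.h>

// Sketch (assumed setup): opting a convolution into Tensor Core math
// (TENSOR_OP_MATH) with fp16 inputs/outputs and fp32 compute.
// Dimensions are arbitrary illustrative values.
void setupTensorOpConv(cudnnConvolutionDescriptor_t convDesc,
                       cudnnTensorDescriptor_t xDesc,
                       cudnnTensorDescriptor_t yDesc,
                       cudnnFilterDescriptor_t wDesc) {
    // fp16 tensors, e.g. NCHW 32x64x56x56 in, 32x128x56x56 out.
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF,
                               32, 64, 56, 56);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF,
                               32, 128, 56, 56);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW,
                               128, 64, 3, 3);
    // fp32 accumulation/compute type on the convolution itself.
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    // Opt in to Tensor Core kernels.
    cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);
}
```

With the workspace fix above, the safe pattern remains to allocate the full size reported by the corresponding *GetWorkspaceSize() call at run time rather than reusing a size cached from an earlier release.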
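The grouped-convolution case in the last item corresponds to a configuration like the following depthwise-style sketch, where the input and output channel counts equal the group count and vertical padding is nonzero. The dimensions and helper name are illustrative assumptions.

```cpp
#include <cudnn.h>

// Sketch: a depthwise-style grouped convolution where xDesc's channel
// count = yDesc's channel count = groupCount (here 32), with vertical
// padding > 0 -- the configuration the fix above concerns.
void setupDepthwiseConv(cudnnConvolutionDescriptor_t convDesc,
                        cudnnTensorDescriptor_t xDesc,
                        cudnnTensorDescriptor_t yDesc,
                        cudnnFilterDescriptor_t wDesc) {
    const int groupCount = 32;
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, groupCount, 28, 28);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, groupCount, 28, 28);
    // One 3x3 filter per group; each filter sees a single input channel.
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               groupCount, 1, 3, 3);
    // Vertical padding > 0, previously the trigger for incorrect values.
    cudnnSetConvolution2dDescriptor(convDesc, /*pad_h=*/1, /*pad_w=*/1,
                                    1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    cudnnSetConvolutionGroupCount(convDesc, groupCount);
}
```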