## cuDNN Release Notes v7.0.2

## Key Features and Enhancements

This is a patch release of cuDNN 7.0 and includes bug fixes and performance improvements mainly on Volta.

- Algo 1 Convolution Performance Improvements

  Performance improvements were made to `CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM`, `CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1`, and `CUDNN_CONVOLUTION_BWD_DATA_ALGO_1`. These improvements consist of new SASS kernels and improved heuristics. The new kernels implement convolutions over various data sizes and tile sizes, and the improved heuristics take advantage of these new kernels.

## Known Issues

The following are known issues in this release:

- `cudnnGetConvolutionForwardWorkspaceSize()` returns an overflowed `size_t` value for certain input shapes for `CUDNN_CONVOLUTION_*_ALGO_FFT_TILING`.

## Fixed Issues

The following issues have been fixed in this release:

- Batch Norm `CUDNN_BATCHNORM_SPATIAL_PERSISTENT` might get into race conditions in certain scenarios.

- cuDNN convolution layers using `TENSOR_OP_MATH` with fp16 inputs and outputs and fp32 compute now use “round to nearest” mode instead of the “round to zero” mode used in 7.0.1. This rounding mode has proven to achieve better results in training.

- Fixed synchronization logic in the `CUDNN_CTC_LOSS_ALGO_DETERMINISTIC` algo for CTC. The original code would hang in rare cases.

- Convolution algorithms using `TENSOR_OP_MATH` returned a workspace size from `*GetWorkspaceSize()` smaller than actually necessary.

- cuDNN pooling backward fails for pooling window sizes > 256.

- The results of int8 are inaccurate in certain cases when calling `cudnnConvolutionForward()` in convolution layers.

- `cudnnConvolutionForward()` called with `xDesc`'s channel = `yDesc`'s channel = `groupCount` could compute incorrect values when vertical padding > 0.