Key Features and Enhancements
The following enhancements have been added to this release:
- The following new functions are added to provide support for the padding mask for the cudnnRNN* family of functions (a usage sketch follows this list):
  - cudnnSetRNNPaddingMode(): Enables/disables the padded RNN input/output.
  - cudnnGetRNNPaddingMode(): Reads the padding mode status.
  - cudnnCreateRNNDataDescriptor() and cudnnDestroyRNNDataDescriptor(): Creates and destroys, respectively, cudnnRNNDataDescriptor_t, an RNN data descriptor.
  - cudnnSetRNNDataDescriptor() and cudnnGetRNNDataDescriptor(): Initializes and reads, respectively, the RNN data descriptor.
  - cudnnRNNForwardTrainingEx(): An extended version of cudnnRNNForwardTraining() that allows the padded (unpacked) layout for the input/output.
  - cudnnRNNForwardInferenceEx(): An extended version of cudnnRNNForwardInference() that allows the padded (unpacked) layout for the input/output.
  - cudnnRNNBackwardDataEx(): An extended version of cudnnRNNBackwardData() that allows the padded (unpacked) layout for the input/output.
  - cudnnRNNBackwardWeightsEx(): An extended version of cudnnRNNBackwardWeights() that allows the padded (unpacked) layout for the input/output.
- Added support for cell clipping in the cuDNN LSTM. The following new functions are added (see the clipping sketch following this list):
  - cudnnRNNSetClip() and cudnnRNNGetClip(): Sets and retrieves, respectively, the LSTM cell clipping mode.
- When the input channel size c is a multiple of 32, you can use the new data type CUDNN_DATA_INT8x32 to accelerate your convolution computation. Note: the CUDNN_DATA_INT8x32 data type is supported only on sm_72.
- Enhanced the cudnnFindRNN* family of functions. The findIntensity input to these functions now enables the user to control the overall runtime of the RNN find algorithms by selecting a percentage of a large Cartesian product space to be searched.
- A new mode, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION, is added to cudnnMathType_t. Selecting this mode can reduce the computation time for FP32 tensors (a one-line sketch follows this list).
- The functions cudnnRNNForwardInference(), cudnnRNNForwardTraining(), cudnnRNNBackwardData(), and cudnnRNNBackwardWeights() now perform down conversion of FP32 input/output only when CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION is set.
- Improved the heuristics for the cudnnGet*Algorithm() functions.
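
As a usage illustration of the padded-I/O additions above, here is a minimal sketch, assuming an already-initialized cudnnRNNDescriptor_t and FP32 data; the batch and vector sizes are arbitrary and error checking is elided:

```c
#include <cudnn.h>

/* Sketch: enable padded I/O and describe a variable-length batch.
   rnnDesc is assumed to be a fully initialized RNN descriptor. */
void setup_padded_rnn_io(cudnnRNNDescriptor_t rnnDesc)
{
    /* Turn on the padded (unpacked) input/output layout. */
    cudnnSetRNNPaddingMode(rnnDesc, CUDNN_RNN_PADDED_IO_ENABLED);

    /* Three sequences of lengths 5, 3, and 2, padded to maxSeqLength = 5,
       each time step carrying a 64-element vector. */
    const int seqLengths[3] = {5, 3, 2};
    float paddingFill = 0.0f; /* value written into the padding positions */

    cudnnRNNDataDescriptor_t xDesc;
    cudnnCreateRNNDataDescriptor(&xDesc);
    cudnnSetRNNDataDescriptor(xDesc, CUDNN_DATA_FLOAT,
                              CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_UNPACKED,
                              /*maxSeqLength=*/5, /*batchSize=*/3,
                              /*vectorSize=*/64, seqLengths, &paddingFill);

    /* xDesc (and a matching yDesc) would then be passed to
       cudnnRNNForwardTrainingEx() or cudnnRNNForwardInferenceEx(). */
    cudnnDestroyRNNDataDescriptor(xDesc);
}
```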
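A minimal sketch of the new cell clipping calls, assuming a valid handle and an LSTM descriptor; the clip range here is arbitrary:

```c
#include <cudnn.h>

/* Sketch: clip the LSTM cell state to [-10, 10], then read the mode back. */
void set_lstm_cell_clip(cudnnHandle_t handle, cudnnRNNDescriptor_t rnnDesc)
{
    cudnnRNNSetClip(handle, rnnDesc,
                    CUDNN_RNN_CLIP_MINMAX,   /* enable min/max clipping */
                    CUDNN_NOT_PROPAGATE_NAN,
                    /*lclip=*/-10.0, /*rclip=*/10.0);

    cudnnRNNClipMode_t mode;
    cudnnNanPropagation_t nanOpt;
    double lclip, rclip;
    cudnnRNNGetClip(handle, rnnDesc, &mode, &nanOpt, &lclip, &rclip);
}
```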
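And a short sketch of opting FP32 work into the new conversion mode; both descriptors are assumed to be already configured:

```c
#include <cudnn.h>

/* Sketch: allow cuDNN to down-convert FP32 data internally where Tensor
   Cores can be used; results are still accumulated in FP32. */
void allow_fp32_conversion(cudnnConvolutionDescriptor_t convDesc,
                           cudnnRNNDescriptor_t rnnDesc)
{
    cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
    cudnnSetRNNMatrixMathType(rnnDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
}
```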
Known Issues and Limitations
The following issues and limitations exist in this release:
- For FP16 inputs, the functions cudnnGetConvolutionForwardAlgorithm(), cudnnGetConvolutionBackwardDataAlgorithm(), and cudnnGetConvolutionBackwardFilterAlgorithm() will obtain a slower algorithm.
- For cases where beta is not equal to zero and the input channel size is greater than 65535, the following cudnnConvolutionBackwardFilter() algorithms may return an EXECUTION_FAILED error:
  - CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
  - CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
  - CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3
- This is a rare occurrence: when beta is not equal to zero, the function cudnnFindConvolutionBackwardFilterAlgorithm() may not return the fastest algorithm available for cudnnConvolutionBackwardFilter().
- Grouped convolutions are not supported in the TRUE_HALF_CONFIG (convDesc is CUDNN_DATA_HALF) data type configuration. As a workaround, the PSEUDO_HALF_CONFIG (convDesc is CUDNN_DATA_FLOAT) data type configuration can be used without losing any precision (a workaround sketch follows this list).
- For the cudnnConvolutionBiasActivationForward() function, if the input cudnnActivationMode_t is set to the enum value CUDNN_ACTIVATION_IDENTITY, then the input cudnnConvolutionFwdAlgo_t must be set to the enum value CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM (see the sketch after this list).
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 input/output, on sm_70 or sm_72, with the RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and the math type set to CUDNN_TENSOR_OP_MATH via cudnnSetRNNMatrixMathType(), then the results are incorrect (a workaround sketch follows this list).
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 input/output, on sm_70 or sm_72, with the RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and the math type set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION via cudnnSetRNNMatrixMathType(), then the resulting performance is suboptimal.
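
For the grouped-convolution limitation above, here is a minimal workaround sketch that keeps FP16 tensors but sets the convolution descriptor's compute type to CUDNN_DATA_FLOAT; the padding, stride, and dilation values are placeholders:

```c
#include <cudnn.h>

/* Sketch: PSEUDO_HALF_CONFIG for a grouped convolution — FP16 x/w/y tensors
   with FP32 compute, instead of the unsupported TRUE_HALF_CONFIG. */
cudnnStatus_t configure_grouped_conv(cudnnConvolutionDescriptor_t convDesc,
                                     int groupCount)
{
    cudnnStatus_t st = cudnnSetConvolution2dDescriptor(
        convDesc,
        /*pad_h=*/1, /*pad_w=*/1,
        /*stride_h=*/1, /*stride_w=*/1,
        /*dilation_h=*/1, /*dilation_w=*/1,
        CUDNN_CROSS_CORRELATION,
        CUDNN_DATA_FLOAT);  /* compute in FP32, not CUDNN_DATA_HALF */
    if (st != CUDNN_STATUS_SUCCESS)
        return st;
    return cudnnSetConvolutionGroupCount(convDesc, groupCount);
}
```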
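A sketch of the cudnnConvolutionBiasActivationForward() constraint, assuming the surrounding descriptors and buffers already exist; only the activation mode and algorithm selection are shown:

```c
#include <cudnn.h>

/* Sketch: with CUDNN_ACTIVATION_IDENTITY, only the IMPLICIT_PRECOMP_GEMM
   forward algorithm is valid for cudnnConvolutionBiasActivationForward(). */
void pick_identity_activation(cudnnActivationDescriptor_t actDesc,
                              cudnnConvolutionFwdAlgo_t *algo)
{
    cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_IDENTITY,
                                 CUDNN_NOT_PROPAGATE_NAN, /*coef=*/0.0);
    *algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
    /* actDesc and *algo are then passed to
       cudnnConvolutionBiasActivationForward(). */
}
```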
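For the FP32 CUDNN_RNN_ALGO_PERSIST_STATIC issues above, one possible workaround (an assumption on our part, not a documented fix) is to leave the RNN math type at its default on the affected hardware:

```c
#include <cudnn.h>

/* Sketch: keep an FP32 persistent RNN on the default math type to stay
   clear of the incorrect-result path described above. */
void avoid_persist_static_tensor_ops(cudnnRNNDescriptor_t rnnDesc)
{
    cudnnSetRNNMatrixMathType(rnnDesc, CUDNN_DEFAULT_MATH);
}
```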
Fixed Issues
The following issues have been fixed in this release:
- The cudnnConvolutionBackwardData() function produced incorrect results under these conditions:
  - The algo input is set to CUDNN_CONVOLUTION_BWD_DATA_ALGO_1 in cudnnConvolutionBwdDataAlgo_t, and
  - CUDNN_TENSOR_OP_MATH is selected.
  Under the above conditions, the dgrad computation was giving incorrect results when the data is not packed and the data format is NCHW. This is fixed.
- When cudnnConvolutionFwdAlgo_t was set to CONVOLUTION_FWD_ALGO_FFT_TILING, the function cudnnConvolutionForward() was leading to an illegal memory access. This is now fixed.
- The cudnnPoolingBackward() function was failing when using a large kernel size for 'global_pooling' with the NHWC I/O layout. This is fixed.
- The below two items are fixed: if you set the RNN math type to CUDNN_TENSOR_OP_MATH and run the RNN on sm_6x or earlier hardware:
  - a. You may have received CUDNN_STATUS_NOT_SUPPORTED when the selected algo is CUDNN_RNN_ALGO_STANDARD or CUDNN_RNN_ALGO_PERSIST_STATIC.
  - b. You may have received incorrect results when the selected algo is CUDNN_RNN_ALGO_PERSIST_DYNAMIC.
- If you passed a variable sequence length input tensor to cudnnRNNForwardInference(), cudnnRNNForwardTraining(), or cudnnRNNBackwardData(), and used CUDNN_RNN_ALGO_PERSIST_STATIC or CUDNN_RNN_ALGO_PERSIST_DYNAMIC, then you may have received incorrect results. Now this case is checked, and CUDNN_STATUS_NOT_SUPPORTED will be returned (see the fallback sketch after this list).
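
Since the persistent algorithms now report CUDNN_STATUS_NOT_SUPPORTED for variable-length input rather than computing silently wrong results, a caller can fall back to the standard algorithm. A minimal sketch, where run_forward_with_algo() is a hypothetical application wrapper (not a cuDNN API) around cudnnRNNForwardTraining():

```c
#include <cudnn.h>

/* Hypothetical application helper, not a cuDNN API: configures the RNN
   descriptor with the given algo and runs one forward pass. */
cudnnStatus_t run_forward_with_algo(cudnnHandle_t handle,
                                    cudnnRNNAlgo_t algo);

void forward_with_fallback(cudnnHandle_t handle)
{
    /* Persistent algos now reject variable-length batches explicitly... */
    cudnnStatus_t st = run_forward_with_algo(handle,
                                             CUDNN_RNN_ALGO_PERSIST_STATIC);
    if (st == CUDNN_STATUS_NOT_SUPPORTED) {
        /* ...so fall back to the standard algorithm, which supports
           variable sequence lengths. */
        run_forward_with_algo(handle, CUDNN_RNN_ALGO_STANDARD);
    }
}
```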