cudnn_ops Library#
Data Type References#
These are the data type references in the cudnn_ops library.
Pointer To Opaque Struct Types#
These are the pointers to the opaque struct types in the cudnn_ops library.
cudnnActivationDescriptor_t#
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.
cudnnActivationDescriptor_t is a pointer to an opaque structure holding the description of an activation operation. cudnnCreateActivationDescriptor() is used to create one instance, and cudnnSetActivationDescriptor() must be used to initialize this instance.
cudnnCTCLossDescriptor_t#
cudnnCTCLossDescriptor_t is a pointer to an opaque structure holding the description of a CTC loss operation. cudnnCreateCTCLossDescriptor() is used to create one instance, cudnnSetCTCLossDescriptor() is used to initialize this instance, and cudnnDestroyCTCLossDescriptor() is used to destroy this instance.
cudnnDropoutDescriptor_t#
cudnnDropoutDescriptor_t is a pointer to an opaque structure holding the description of a dropout operation. cudnnCreateDropoutDescriptor() is used to create one instance, cudnnSetDropoutDescriptor() is used to initialize this instance, cudnnDestroyDropoutDescriptor() is used to destroy this instance, cudnnGetDropoutDescriptor() is used to query fields of a previously initialized instance, and cudnnRestoreDropoutDescriptor() is used to restore an instance to a previously saved state.
cudnnFilterDescriptor_t#
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.
cudnnFilterDescriptor_t is a pointer to an opaque structure holding the description of a filter dataset. cudnnCreateFilterDescriptor() is used to create one instance, and cudnnSetFilter4dDescriptor() or cudnnSetFilterNdDescriptor() must be used to initialize this instance.
cudnnLRNDescriptor_t#
cudnnLRNDescriptor_t is a pointer to an opaque structure holding the parameters of a local response normalization. cudnnCreateLRNDescriptor() is used to create one instance, and the routine cudnnSetLRNDescriptor() must be used to initialize this instance.
cudnnOpTensorDescriptor_t#
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.
cudnnOpTensorDescriptor_t is a pointer to an opaque structure holding the description of a Tensor Core operation, used as a parameter to cudnnOpTensor(). cudnnCreateOpTensorDescriptor() is used to create one instance, and cudnnSetOpTensorDescriptor() must be used to initialize this instance.
cudnnPoolingDescriptor_t#
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.
cudnnPoolingDescriptor_t is a pointer to an opaque structure holding the description of a pooling operation. cudnnCreatePoolingDescriptor() is used to create one instance, and cudnnSetPoolingNdDescriptor() or cudnnSetPooling2dDescriptor() must be used to initialize this instance.
cudnnReduceTensorDescriptor_t#
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.
cudnnReduceTensorDescriptor_t is a pointer to an opaque structure holding the description of a tensor reduction operation, used as a parameter to cudnnReduceTensor(). cudnnCreateReduceTensorDescriptor() is used to create one instance, and cudnnSetReduceTensorDescriptor() must be used to initialize this instance.
cudnnSpatialTransformerDescriptor_t#
cudnnSpatialTransformerDescriptor_t is a pointer to an opaque structure holding the description of a spatial transformation operation. cudnnCreateSpatialTransformerDescriptor() is used to create one instance, cudnnSetSpatialTransformerNdDescriptor() is used to initialize this instance, and cudnnDestroySpatialTransformerDescriptor() is used to destroy this instance.
cudnnTensorDescriptor_t#
cudnnTensorDescriptor_t is a pointer to an opaque structure holding the description of a generic n-D dataset. cudnnCreateTensorDescriptor() is used to create one instance, and one of the routines cudnnSetTensorNdDescriptor(), cudnnSetTensor4dDescriptor(), or cudnnSetTensor4dDescriptorEx() must be used to initialize this instance.
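For a fully packed NCHW layout (a common way to initialize a 4D tensor descriptor), the strides follow directly from the dimensions. The plain-C sketch below illustrates only that layout arithmetic; it is not cuDNN code and the function name is hypothetical:

```c
#include <assert.h>

/* Illustrative only, not part of cuDNN. For a fully packed NCHW tensor,
   element (n,c,h,w) lives at n*nStride + c*cStride + h*hStride + w*wStride. */
static void packed_nchw_strides(int n, int c, int h, int w, int stride[4])
{
    (void)n;                 /* the outermost dimension contributes no stride factor */
    stride[3] = 1;           /* wStride: elements along W are contiguous */
    stride[2] = w;           /* hStride: one row of W elements */
    stride[1] = h * w;       /* cStride: one HxW plane */
    stride[0] = c * h * w;   /* nStride: one CxHxW image */
}
```

With N=2, C=3, H=4, W=5 this yields strides {60, 20, 5, 1}, i.e. the HW-packed case that several routines below single out for best performance.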
cudnnTensorTransformDescriptor_t#
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.
cudnnTensorTransformDescriptor_t is an opaque structure containing the description of the tensor transform. Use the cudnnCreateTensorTransformDescriptor() function to create an instance of this descriptor, and cudnnDestroyTensorTransformDescriptor() function to destroy a previously created instance.
Enumeration Types#
These are the enumeration types in the cudnn_ops library.
cudnnBatchNormMode_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnBatchNormMode_t is an enumerated type used to specify the mode of operation in cudnnBatchNormalizationForwardInference(), cudnnBatchNormalizationForwardTraining(), cudnnBatchNormalizationBackward(), and cudnnDeriveBNTensorDescriptor() routines.
Values
CUDNN_BATCHNORM_PER_ACTIVATION
Normalization is performed per-activation. This mode is intended to be used after the non-convolutional network layers. In this mode, the tensor dimensions of bnBias and bnScale and the parameters used in the cudnnBatchNormalization* functions are 1xCxHxW.

CUDNN_BATCHNORM_SPATIAL
Normalization is performed over N+spatial dimensions. This mode is intended for use after convolutional layers (where spatial invariance is desired). In this mode, the bnBias and bnScale tensor dimensions are 1xCx1x1.

CUDNN_BATCHNORM_SPATIAL_PERSISTENT
This mode is similar to CUDNN_BATCHNORM_SPATIAL but it can be faster for some tasks.
An optimized path may be selected for CUDNN_DATA_FLOAT and CUDNN_DATA_HALF types, compute capability 6.0 or higher, for the following two batch normalization API calls: cudnnBatchNormalizationForwardTraining() and cudnnBatchNormalizationBackward(). In the case of cudnnBatchNormalizationBackward(), the savedMean and savedInvVariance arguments should not be NULL.
NCHW Mode Only
This mode may use a scaled atomic integer reduction that is deterministic but imposes more restrictions on the input data range. When a numerical overflow occurs, the algorithm may produce NaNs or Infs (infinity) in output buffers.
When Infs/NaNs are present in the input data, the output in this mode is the same as from a pure floating-point implementation.
For finite but very large input values, the algorithm may encounter overflows more frequently due to a lower dynamic range and emit Infs/NaNs while CUDNN_BATCHNORM_SPATIAL will produce finite results. The user can invoke cudnnQueryRuntimeError() to check if a numerical overflow occurred in this mode.
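The difference between the per-activation and spatial modes is only in which dimensions the statistics are averaged over. The plain-C sketch below (not cuDNN code; function names are illustrative) computes the mean in both layouts for a fully packed NCHW tensor:

```c
#include <assert.h>
#include <stddef.h>

/* CUDNN_BATCHNORM_PER_ACTIVATION: one mean per (c,h,w) location, averaged
   over the batch dimension N only -> a 1xCxHxW parameter tensor. */
static void mean_per_activation(const float *x, int n, int c, int h, int w,
                                float *mean /* c*h*w values */)
{
    size_t chw = (size_t)c * h * w;
    for (size_t i = 0; i < chw; i++) {
        double s = 0.0;
        for (int b = 0; b < n; b++)
            s += x[(size_t)b * chw + i];
        mean[i] = (float)(s / n);
    }
}

/* CUDNN_BATCHNORM_SPATIAL: one mean per channel, averaged over N, H, and W
   -> a 1xCx1x1 parameter tensor. */
static void mean_spatial(const float *x, int n, int c, int h, int w,
                         float *mean /* c values */)
{
    size_t hw = (size_t)h * w, chw = (size_t)c * hw;
    for (int ch = 0; ch < c; ch++) {
        double s = 0.0;
        for (int b = 0; b < n; b++)
            for (size_t i = 0; i < hw; i++)
                s += x[(size_t)b * chw + ch * hw + i];
        mean[ch] = (float)(s / ((double)n * hw));
    }
}
```

The variance and the learned bnScale/bnBias parameters follow the same two shapes, which is why the parameter tensors are 1xCxHxW in one mode and 1xCx1x1 in the other.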
cudnnBatchNormOps_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnBatchNormOps_t is an enumerated type used to specify the mode of operation in cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize(), cudnnBatchNormalizationForwardTrainingEx(), cudnnGetBatchNormalizationBackwardExWorkspaceSize(), cudnnBatchNormalizationBackwardEx(), and cudnnGetBatchNormalizationTrainingExReserveSpaceSize() functions.
Values
CUDNN_BATCHNORM_OPS_BN
Only batch normalization is performed, per-activation.

CUDNN_BATCHNORM_OPS_BN_ACTIVATION
First, the batch normalization is performed, and then the activation is performed.

CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION
Performs the batch normalization, then element-wise addition, followed by the activation operation.
cudnnConvolutionBwdDataAlgo_t#
cudnnConvolutionBwdDataAlgo_t is an enumerated type that exposes the different algorithms available to execute the backward data convolution operation.
Values
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0
This algorithm expresses the convolution as a sum of matrix products without actually explicitly forming the matrix that holds the input tensor data. The sum is done using the atomic add operation, thus the results are non-deterministic.

CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
This algorithm expresses the convolution as a matrix product without actually explicitly forming the matrix that holds the input tensor data. The results are deterministic.

CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT
This algorithm uses a Fast-Fourier Transform approach to compute the convolution. A significant memory workspace is needed to store intermediate results. The results are deterministic.

CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING
This algorithm uses the Fast-Fourier Transform approach but splits the inputs into tiles. A significant memory workspace is needed to store intermediate results, but less than CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT for large size images. The results are deterministic.

CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD
This algorithm uses the Winograd Transform approach to compute the convolution. A reasonably sized workspace is needed to store intermediate results. The results are deterministic.

Note
The CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD algorithm is not supported on GPUs based on the NVIDIA Hopper GPU architecture and later GPU architectures.

CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED
This algorithm uses the Winograd Transform approach to compute the convolution. A significant workspace may be needed to store intermediate results. The results are deterministic.
cudnnConvolutionBwdFilterAlgo_t#
cudnnConvolutionBwdFilterAlgo_t is an enumerated type that exposes the different algorithms available to execute the backward filter convolution operation.
Values
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
This algorithm expresses the convolution as a sum of matrix products without actually explicitly forming the matrix that holds the input tensor data. The sum is done using the atomic add operation, thus the results are non-deterministic.

CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
This algorithm expresses the convolution as a matrix product without actually explicitly forming the matrix that holds the input tensor data. The results are deterministic.

CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT
This algorithm uses the Fast-Fourier Transform approach to compute the convolution. A significant workspace is needed to store intermediate results. The results are deterministic.

CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3
This algorithm is similar to CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 but uses some small workspace to precompute some indices. The results are also non-deterministic.

CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED
This algorithm uses the Winograd Transform approach to compute the convolution. A significant workspace may be needed to store intermediate results. The results are deterministic.

CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING
This algorithm uses the Fast-Fourier Transform approach to compute the convolution but splits the input tensor into tiles. A significant workspace may be needed to store intermediate results. The results are deterministic.
cudnnConvolutionFwdAlgo_t#
cudnnConvolutionFwdAlgo_t is an enumerated type that exposes the different algorithms available to execute the forward convolution operation.
Values
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
This algorithm expresses the convolution as a matrix product without actually explicitly forming the matrix that holds the input tensor data.

CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
This algorithm expresses the convolution as a matrix product without actually explicitly forming the matrix that holds the input tensor data, but still needs some memory workspace to precompute some indices in order to facilitate the implicit construction of the matrix that holds the input tensor data.

CUDNN_CONVOLUTION_FWD_ALGO_GEMM
This algorithm expresses the convolution as an explicit matrix product. A significant memory workspace is needed to store the matrix that holds the input tensor data.

CUDNN_CONVOLUTION_FWD_ALGO_DIRECT
This algorithm expresses the convolution as a direct convolution (for example, without implicitly or explicitly doing a matrix multiplication).

CUDNN_CONVOLUTION_FWD_ALGO_FFT
This algorithm uses the Fast-Fourier Transform approach to compute the convolution. A significant memory workspace is needed to store intermediate results.

CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING
This algorithm uses the Fast-Fourier Transform approach but splits the inputs into tiles. A significant memory workspace is needed to store intermediate results, but less than CUDNN_CONVOLUTION_FWD_ALGO_FFT for large size images.

CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD
This algorithm uses the Winograd Transform approach to compute the convolution. A reasonably sized workspace is needed to store intermediate results.

Note
The CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD algorithm is not supported on GPUs based on the NVIDIA Hopper GPU architecture and later GPU architectures.

CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED
This algorithm uses the Winograd Transform approach to compute the convolution. A significant workspace may be needed to store intermediate results.
cudnnCTCLossAlgo_t#
cudnnCTCLossAlgo_t is an enumerated type that exposes the different algorithms available to execute the CTC loss operation.
Values
CUDNN_CTC_LOSS_ALGO_DETERMINISTIC
Results are guaranteed to be reproducible.

CUDNN_CTC_LOSS_ALGO_NON_DETERMINISTIC
Results are not guaranteed to be reproducible.
cudnnDeterminism_t#
cudnnDeterminism_t is an enumerated type used to indicate if the computed results are deterministic (reproducible). For more information, refer to Reproducibility (Determinism).
Values
CUDNN_NON_DETERMINISTIC
Results are not guaranteed to be reproducible.

CUDNN_DETERMINISTIC
Results are guaranteed to be reproducible.
cudnnDivNormMode_t#
cudnnDivNormMode_t is an enumerated type used to specify the mode of operation in cudnnDivisiveNormalizationForward() and cudnnDivisiveNormalizationBackward().
Values
CUDNN_DIVNORM_PRECOMPUTED_MEANS
The means tensor data pointer is expected to contain means or other kernel convolution values precomputed by the user. The means pointer can also be NULL, in which case it is considered to be filled with zeroes. This is equivalent to spatial LRN.
In the backward pass, the means are treated as independent inputs and the gradient over means is computed independently. In this mode, to yield a net gradient over the entire LCN computational graph, the destDiffMeans result should be backpropagated through the user's means layer (which can be implemented using average pooling) and added to the destDiffData tensor produced by cudnnDivisiveNormalizationBackward().
cudnnFoldingDirection_t#
cudnnFoldingDirection_t is an enumerated type used to select the folding direction. For more information, refer to cudnnTensorTransformDescriptor_t.
Data Member
CUDNN_TRANSFORM_FOLD = 0U
Selects folding.

CUDNN_TRANSFORM_UNFOLD = 1U
Selects unfolding.
cudnnIndicesType_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnIndicesType_t is an enumerated type used to indicate the data type for the indices to be computed by the cudnnReduceTensor() routine. This enumerated type is used as a field for the cudnnReduceTensorDescriptor_t descriptor.
Values
CUDNN_32BIT_INDICES
Compute unsigned int indices.

CUDNN_64BIT_INDICES
Compute unsigned long indices.

CUDNN_16BIT_INDICES
Compute unsigned short indices.

CUDNN_8BIT_INDICES
Compute unsigned char indices.
cudnnLRNMode_t#
cudnnLRNMode_t is an enumerated type used to specify the mode of operation in cudnnLRNCrossChannelForward() and cudnnLRNCrossChannelBackward().
Values
CUDNN_LRN_CROSS_CHANNEL_DIM1
LRN computation is performed across the tensor's dimension dimA[1].
cudnnNormAlgo_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnNormAlgo_t is an enumerated type used to specify the algorithm to execute the normalization operation.
Values
CUDNN_NORM_ALGO_STANDARD
Standard normalization is performed.

CUDNN_NORM_ALGO_PERSIST
This mode is similar to CUDNN_NORM_ALGO_STANDARD, however it only supports CUDNN_NORM_PER_CHANNEL and can be faster for some tasks.
An optimized path may be selected for CUDNN_DATA_FLOAT and CUDNN_DATA_HALF types, compute capability 6.0 or higher, for the following two normalization API calls: cudnnNormalizationForwardTraining() and cudnnNormalizationBackward(). In the case of cudnnNormalizationBackward(), the savedMean and savedInvVariance arguments should not be NULL.
NCHW Mode Only
This mode may use a scaled atomic integer reduction that is deterministic but imposes more restrictions on the input data range. When a numerical overflow occurs, the algorithm may produce NaNs or Infs (infinity) in output buffers.
When Infs/NaNs are present in the input data, the output in this mode is the same as from a pure floating-point implementation.
For finite but very large input values, the algorithm may encounter overflows more frequently due to a lower dynamic range and emit Infs/NaNs while CUDNN_NORM_ALGO_STANDARD will produce finite results. The user can invoke cudnnQueryRuntimeError() to check if a numerical overflow occurred in this mode.
cudnnNormMode_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnNormMode_t is an enumerated type used to specify the mode of operation in cudnnNormalizationForwardInference(), cudnnNormalizationForwardTraining(), cudnnBatchNormalizationBackward(), cudnnGetNormalizationForwardTrainingWorkspaceSize(), cudnnGetNormalizationBackwardWorkspaceSize(), and cudnnGetNormalizationTrainingReserveSpaceSize() routines.
Values
CUDNN_NORM_PER_ACTIVATION
Normalization is performed per-activation. This mode is intended to be used after the non-convolutional network layers. In this mode, the tensor dimensions of normBias and normScale and the parameters used in the cudnnNormalization* functions are 1xCxHxW.

CUDNN_NORM_PER_CHANNEL
Normalization is performed per-channel over N+spatial dimensions. This mode is intended for use after convolutional layers (where spatial invariance is desired). In this mode, the normBias and normScale tensor dimensions are 1xCx1x1.
cudnnNormOps_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnNormOps_t is an enumerated type used to specify the mode of operation in cudnnGetNormalizationForwardTrainingWorkspaceSize(), cudnnNormalizationForwardTraining(), cudnnGetNormalizationBackwardWorkspaceSize(), cudnnNormalizationBackward(), and cudnnGetNormalizationTrainingReserveSpaceSize() functions.
Values
CUDNN_NORM_OPS_NORM
Only normalization is performed.

CUDNN_NORM_OPS_NORM_ACTIVATION
First, the normalization is performed, then the activation is performed.

CUDNN_NORM_OPS_NORM_ADD_ACTIVATION
Performs the normalization, then element-wise addition, followed by the activation operation.
cudnnOpTensorOp_t#
cudnnOpTensorOp_t is an enumerated type used to indicate the Tensor Core operation to be used by the cudnnOpTensor() routine. This enumerated type is used as a field for the cudnnOpTensorDescriptor_t descriptor.
Values
CUDNN_OP_TENSOR_ADD
The operation to be performed is addition.

CUDNN_OP_TENSOR_MUL
The operation to be performed is multiplication.

CUDNN_OP_TENSOR_MIN
The operation to be performed is a minimum comparison.

CUDNN_OP_TENSOR_MAX
The operation to be performed is a maximum comparison.

CUDNN_OP_TENSOR_SQRT
The operation to be performed is square root, performed on only the A tensor.

CUDNN_OP_TENSOR_NOT
The operation to be performed is negation, performed on only the A tensor.
cudnnPoolingMode_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnPoolingMode_t is an enumerated type passed to cudnnSetPooling2dDescriptor() to select the pooling method to be used by cudnnPoolingForward() and cudnnPoolingBackward().
Values
CUDNN_POOLING_MAX
The maximum value inside the pooling window is used.

CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING
Values inside the pooling window are averaged. The number of elements used to calculate the average includes spatial locations falling in the padding region.

CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING
Values inside the pooling window are averaged. The number of elements used to calculate the average excludes spatial locations falling in the padding region.

CUDNN_POOLING_MAX_DETERMINISTIC
The maximum value inside the pooling window is used. The algorithm used is deterministic.
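The only difference between the two averaging modes is the divisor used when the window overlaps the padding region. The plain-C sketch below (not cuDNN code; the helper is hypothetical) shows this for a one-dimensional window with zero padding:

```c
#include <assert.h>

/* Average one pooling window of a 1-D signal with symmetric zero padding.
   The window covers logical positions [start, start+win); positions outside
   [0, len) fall in the padding region. */
static float avg_pool_window(const float *x, int len, int start, int win,
                             int include_padding)
{
    double sum = 0.0;
    int inside = 0;
    for (int i = start; i < start + win; i++) {
        if (i >= 0 && i < len) { sum += x[i]; inside++; }
    }
    /* The divisor is the only difference between the two modes. */
    int divisor = include_padding ? win : inside;
    return (float)(sum / divisor);
}
```

For a window half inside the padding, the include-padding mode divides by the full window size (pulling the average toward zero), while the exclude-padding mode divides only by the count of valid positions.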
cudnnReduceTensorIndices_t#
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnReduceTensorIndices_t is an enumerated type used to indicate whether indices are to be computed by the cudnnReduceTensor() routine. This enumerated type is used as a field for the cudnnReduceTensorDescriptor_t descriptor.
Values
CUDNN_REDUCE_TENSOR_NO_INDICES
Do not compute indices.

CUDNN_REDUCE_TENSOR_FLATTENED_INDICES
Compute indices. The resulting indices are relative, and flattened.
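As an illustration of relative, flattened indices, consider reducing each row of a matrix to its maximum: the reported index is the flattened offset within the slice being reduced, not an offset into the whole tensor. The plain-C sketch below is an interpretation of that behavior, not cuDNN code, and the helper name is hypothetical:

```c
#include <assert.h>

/* Reduce each row of an n x m row-major matrix to its maximum, reporting
   an index relative to the reduced slice (0..m-1). */
static void reduce_max_with_indices(const float *x, int n, int m,
                                    float *out, int *idx)
{
    for (int r = 0; r < n; r++) {
        int best = 0;
        for (int c = 1; c < m; c++)
            if (x[r * m + c] > x[r * m + best]) best = c;
        out[r] = x[r * m + best];
        idx[r] = best;  /* relative to the slice, not the full tensor */
    }
}
```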
cudnnSamplerType_t#
cudnnSamplerType_t is an enumerated type passed to cudnnSetSpatialTransformerNdDescriptor() to select the sampler type to be used by cudnnSpatialTfSamplerForward() and cudnnSpatialTfSamplerBackward().
Values
CUDNN_SAMPLER_BILINEAR
Selects the bilinear sampler.
cudnnSoftmaxAlgorithm_t#
cudnnSoftmaxAlgorithm_t is used to select an implementation of the softmax function used in cudnnSoftmaxForward() and cudnnSoftmaxBackward().
Values
CUDNN_SOFTMAX_FAST
This implementation applies the straightforward softmax operation.

CUDNN_SOFTMAX_ACCURATE
This implementation scales each point of the softmax input domain by its maximum value to avoid potential floating point overflows in the softmax evaluation.

CUDNN_SOFTMAX_LOG
This entry performs the log softmax operation, avoiding overflows by scaling each point in the input domain as in CUDNN_SOFTMAX_ACCURATE.
cudnnSoftmaxMode_t#
cudnnSoftmaxMode_t is used to select over which data the cudnnSoftmaxForward() and cudnnSoftmaxBackward() are computing their results.
Values
CUDNN_SOFTMAX_MODE_INSTANCE
The softmax operation is computed per image (N) across the dimensions C,H,W.
CUDNN_SOFTMAX_MODE_CHANNEL
The softmax operation is computed per spatial location (H,W) per image (N) across dimension C.
API Functions#
These are the API functions in the cudnn_ops library.
cudnnActivationBackward()#
This function has been deprecated in cuDNN 9.0.
This routine computes the gradient of a neuron activation function.
cudnnStatus_t cudnnActivationBackward(
    cudnnHandle_t handle,
    cudnnActivationDescriptor_t activationDesc,
    const void *alpha,
    const cudnnTensorDescriptor_t yDesc,
    const void *y,
    const cudnnTensorDescriptor_t dyDesc,
    const void *dy,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const void *beta,
    const cudnnTensorDescriptor_t dxDesc,
    void *dx)
In-place operation is allowed for this routine; meaning dy and dx pointers may be equal. However, this requires the corresponding tensor descriptors to be identical (particularly, the strides of the input and output must match for an in-place operation to be allowed).
All tensor formats are supported for 4 and 5 dimensions; however, the best performance is obtained when the strides of yDesc and xDesc are equal and HW-packed. For more than 5 dimensions, the tensors must have their spatial dimensions packed.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

activationDesc
Input. Activation descriptor. For more information, refer to cudnnActivationDescriptor_t.

alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the computation result with the prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
yDesc
Input. Handle to the previously initialized input tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

y
Input. Data pointer to GPU memory associated with the tensor descriptor yDesc.

dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.

dy
Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.

xDesc
Input. Handle to the previously initialized output tensor descriptor.

x
Input. Data pointer to GPU memory associated with the output tensor descriptor xDesc.

dxDesc
Input. Handle to the previously initialized output differential tensor descriptor.

dx
Output. Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
Returns
CUDNN_STATUS_SUCCESS
The function launched successfully.

CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
- The strides nStride, cStride, hStride, and wStride of the input differential tensor and output differential tensor differ and an in-place operation is used.

CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration. Refer to the following for some examples of non-supported configurations:
- The dimensions n, c, h, and w of the input tensor and output tensor differ.
- The datatype of the input tensor and output tensor differs.
- The strides nStride, cStride, hStride, and wStride of the input tensor and the input differential tensor differ.
- The strides nStride, cStride, hStride, and wStride of the output tensor and the output differential tensor differ.

CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
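The alpha/beta convention used by this routine (and the other routines in this section) blends the freshly computed result into whatever is already in the destination buffer. A plain-C sketch of that semantics (not cuDNN code; the helper is hypothetical):

```c
#include <assert.h>

/* dstValue = alpha[0]*result + beta[0]*priorDstValue, element-wise.
   alpha and beta are pointers to host-memory scalars, as in the cuDNN docs. */
static void blend_output(const float *result, float *dst, int n,
                         const float *alpha, const float *beta)
{
    for (int i = 0; i < n; i++)
        dst[i] = alpha[0] * result[i] + beta[0] * dst[i];
}
```

With alpha = 1 and beta = 0 the destination is simply overwritten with the result; with beta = 1 the result is accumulated into the prior contents.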
cudnnActivationForward()#
This function has been deprecated in cuDNN 9.0.
This routine applies a specified neuron activation function element-wise over each input value.
cudnnStatus_t cudnnActivationForward(
    cudnnHandle_t handle,
    cudnnActivationDescriptor_t activationDesc,
    const void *alpha,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const void *beta,
    const cudnnTensorDescriptor_t yDesc,
    void *y)
In-place operation is allowed for this routine; meaning, xData and yData pointers may be equal. However, this requires xDesc and yDesc descriptors to be identical (particularly, the strides of the input and output must match for an in-place operation to be allowed).
All tensor formats are supported for 4 and 5 dimensions; however, the best performance is obtained when the strides of xDesc and yDesc are equal and HW-packed. For more than 5 dimensions, the tensors must have their spatial dimensions packed.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

activationDesc
Input. Activation descriptor. For more information, refer to cudnnActivationDescriptor_t.

alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the computation result with the prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.

xDesc
Input. Handle to the previously initialized input tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

yDesc
Input. Handle to the previously initialized output tensor descriptor.

y
Output. Data pointer to GPU memory associated with the tensor descriptor yDesc.
Returns
CUDNN_STATUS_SUCCESS
The function launched successfully.

CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
- The parameter mode has an invalid enumerant value.
- The dimensions n, c, h, and w of the input tensor and output tensor differ.
- The datatype of the input tensor and output tensor differs.
- The strides nStride, cStride, hStride, and wStride of the input tensor and output tensor differ and an in-place operation is used (meaning, the x and y pointers are equal).

CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
cudnnAddTensor()#
This function has been deprecated in cuDNN 9.0.
This function adds the scaled values of a bias tensor to another tensor. Each dimension of the bias tensor A must match the corresponding dimension of the destination tensor C or must be equal to 1. In the latter case, the same value from the bias tensor for those dimensions will be used to blend into the C tensor.
cudnnStatus_t cudnnAddTensor(
    cudnnHandle_t handle,
    const void *alpha,
    const cudnnTensorDescriptor_t aDesc,
    const void *A,
    const void *beta,
    const cudnnTensorDescriptor_t cDesc,
    void *C)
Only 4D and 5D tensors are supported; this routine does not support tensors with more dimensions.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the source value with the prior value in the destination tensor as follows:
dstValue = alpha[0]*srcValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.

aDesc
Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

A
Input. Pointer to data of the tensor described by the aDesc descriptor.

cDesc
Input. Handle to a previously initialized tensor descriptor.

C
Input/Output. Pointer to data of the tensor described by the cDesc descriptor.
Returns
CUDNN_STATUS_SUCCESS
The function executed successfully.

CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM
The dimensions of the bias tensor refer to an amount of data that is incompatible with the output tensor dimensions, or the dataType of the two tensor descriptors are different.

CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
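The broadcast-and-blend behavior described for cudnnAddTensor can be sketched on the CPU for the common bias case, where A is 1xCx1x1 and C is NxCxHxW. This is plain C for illustration, not cuDNN code, and the helper name is hypothetical:

```c
#include <assert.h>

/* C = alpha*A (broadcast over N, H, W) + beta*C, for a fully packed NCHW
   destination tensor and a 1xCx1x1 bias tensor a (one value per channel). */
static void add_bias_nchw(const float *a, float *cdata,
                          int n, int c, int h, int w,
                          float alpha, float beta)
{
    for (int b = 0; b < n; b++)
        for (int ch = 0; ch < c; ch++)
            for (int i = 0; i < h * w; i++) {
                int off = ((b * c + ch) * h * w) + i;
                /* the single per-channel bias value is reused for every
                   (n, h, w) position: the "same value ... will be used to
                   blend" behavior for dimensions equal to 1 */
                cdata[off] = alpha * a[ch] + beta * cdata[off];
            }
}
```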
cudnnBatchNormalizationBackward()#
This function has been deprecated in cuDNN 9.0.
This function performs the backward batch normalization layer computation. This layer is based on the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper.
cudnnStatus_t cudnnBatchNormalizationBackward(
    cudnnHandle_t handle,
    cudnnBatchNormMode_t mode,
    const void *alphaDataDiff,
    const void *betaDataDiff,
    const void *alphaParamDiff,
    const void *betaParamDiff,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const cudnnTensorDescriptor_t dyDesc,
    const void *dy,
    const cudnnTensorDescriptor_t dxDesc,
    void *dx,
    const cudnnTensorDescriptor_t bnScaleBiasDiffDesc,
    const void *bnScale,
    void *resultBnScaleDiff,
    void *resultBnBiasDiff,
    double epsilon,
    const void *savedMean,
    const void *savedInvVariance)
Only 4D and 5D tensors are supported.
The epsilon value has to be the same during training, backpropagation, and inference.
Higher performance can be obtained when HW-packed tensors are used for all of x, dy, and dx.
For more information, refer to cudnnDeriveBNTensorDescriptor() for the secondary tensor descriptor generation for the parameters used in this function.
Parameters
handle
Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.

mode
Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.

*alphaDataDiff, *betaDataDiff
Inputs. Pointers to scaling factors (in host memory) used to blend the gradient output dx with a prior value in the destination tensor as follows:
dstValue = alphaDataDiff[0]*resultValue + betaDataDiff[0]*priorDstValue
For more information, refer to Scaling Parameters.

*alphaParamDiff, *betaParamDiff
Inputs. Pointers to scaling factors (in host memory) used to blend the gradient outputs resultBnScaleDiff and resultBnBiasDiff with prior values in the destination tensor as follows:
dstValue = alphaParamDiff[0]*resultValue + betaParamDiff[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc,dxDesc,dyDescInputs. Handles to the previously initialized tensor descriptors.
*xInputs. Data pointer to GPU memory associated with the tensor descriptor
xDesc, for the layer’sxdata.*dyInputs. Data pointer to GPU memory associated with the tensor descriptor
dyDesc, for the backpropagated differentialdyinput.*dxInputs/Outputs. Data pointer to GPU memory associated with the tensor descriptor
dxDesc, for the resulting differential output with respect tox.bnScaleBiasDiffDescInput. Shared tensor descriptor for the following five tensors:
bnScale,resultBnScaleDiff,resultBnBiasDiff,savedMean, andsavedInvVariance. The dimensions for this tensor descriptor are dependent on normalization mode. For more information, refer to cudnnDeriveBNTensorDescriptor().Note
The data type of this tensor descriptor must be
floatfor FP16 and FP32 input tensors, anddoublefor FP64 input tensors.*bnScaleInput. Pointer in the device memory for the batch normalization
scaleparameter (in the original paper the quantityscaleis referred to as gamma).Note
The
bnBiasparameter is not needed for this layer’s computation.resultBnScaleDiff,resultBnBiasDiffOutputs. Pointers in device memory for the resulting scale and bias differentials computed by this routine. Note that these scale and bias gradients are weight gradients specific to this batch normalization operation, and by definition are not backpropagated.
epsilonInput. Epsilon value used in batch normalization formula. Its value should be equal to or greater than the value defined for
CUDNN_BN_MIN_EPSILONincudnn.h. The sameepsilonvalue should be used in forward and backward functions.*savedMean,*savedInvVarianceInputs. Optional cache parameters containing saved intermediate results that were computed during the forward pass. For this to work correctly, the layer’s
xandbnScaledata have to remain unchanged until this backward function is called.Note
Both these parameters can be
NULLbut only at the same time. It is recommended to use this cache since the memory overhead is relatively small.
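The alpha/beta blending pattern used by the scaling parameters above recurs throughout these entry points. As a minimal host-side sketch (a hypothetical helper, not part of the cuDNN API): with beta[0] = 0 the destination is overwritten, and with beta[0] = 1 the result accumulates into it.

```c
#include <assert.h>

/* Hypothetical host-side illustration of the blending rule
 * dstValue = alpha[0]*resultValue + beta[0]*priorDstValue
 * used by the cuDNN scaling parameters. */
static float blend(const float *alpha, const float *beta,
                   float resultValue, float priorDstValue)
{
    return alpha[0] * resultValue + beta[0] * priorDstValue;
}
```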
Supported Configurations
This function supports several combinations of data types for the various descriptors.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.

CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:

- Any of the pointers alpha, beta, x, dy, dx, bnScale, resultBnScaleDiff, and resultBnBiasDiff is NULL.
- The number of xDesc, yDesc, or dxDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
- bnScaleBiasDiffDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- Exactly one of the savedMean and savedInvVariance pointers is NULL.
- The epsilon value is less than CUDNN_BN_MIN_EPSILON.
- Dimensions or data types mismatch for any pair of xDesc, dyDesc, or dxDesc.
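The descriptor-dimension rule in the CUDNN_STATUS_BAD_PARAM conditions above (1xCx1x1 for spatial mode, 1xCxHxW for per-activation mode, for a 4D NxCxHxW input) can be illustrated with a small host-side helper. This is hypothetical; in practice, cudnnDeriveBNTensorDescriptor() performs this derivation.

```c
#include <assert.h>

/* Hypothetical host-side sketch of the dimension derivation performed by
 * cudnnDeriveBNTensorDescriptor() for a 4D NxCxHxW input tensor:
 * spatial mode keeps only C; per-activation mode keeps C, H, and W. */
enum bn_mode { BN_SPATIAL, BN_PER_ACTIVATION };

static void derive_bn_dims(enum bn_mode mode, const int in[4], int out[4])
{
    out[0] = 1;                                  /* N collapses to 1 */
    out[1] = in[1];                              /* C is always kept */
    out[2] = (mode == BN_SPATIAL) ? 1 : in[2];   /* H */
    out[3] = (mode == BN_SPATIAL) ? 1 : in[3];   /* W */
}
```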
cudnnBatchNormalizationBackwardEx()#
This function has been deprecated in cuDNN 9.0.
This function is an extension of cudnnBatchNormalizationBackward() that performs the backward batch normalization layer computation with a fast NHWC semi-persistent kernel.
cudnnStatus_t cudnnBatchNormalizationBackwardEx( cudnnHandle_t handle, cudnnBatchNormMode_t mode, cudnnBatchNormOps_t bnOps, const void *alphaDataDiff, const void *betaDataDiff, const void *alphaParamDiff, const void *betaParamDiff, const cudnnTensorDescriptor_t xDesc, const void *xData, const cudnnTensorDescriptor_t yDesc, const void *yData, const cudnnTensorDescriptor_t dyDesc, const void *dyData, const cudnnTensorDescriptor_t dzDesc, void *dzData, const cudnnTensorDescriptor_t dxDesc, void *dxData, const cudnnTensorDescriptor_t dBnScaleBiasDesc, const void *bnScaleData, const void *bnBiasData, void *dBnScaleData, void *dBnBiasData, double epsilon, const void *savedMean, const void *savedInvVariance, const cudnnActivationDescriptor_t activationDesc, void *workspace, size_t workSpaceSizeInBytes, void *reserveSpace, size_t reserveSpaceSizeInBytes)
This API will trigger the new semi-persistent NHWC kernel when the following conditions are true:
- All tensors, namely x, y, dz, dy, and dx, must be NHWC-fully packed and must be of the type CUDNN_DATA_HALF.
- The input parameter mode must be set to CUDNN_BATCHNORM_SPATIAL_PERSISTENT.
- Before cuDNN version 8.2.0, the tensor C dimension should always be a multiple of 4. After 8.2.0, the tensor C dimension should be a multiple of 4 only when bnOps is CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION.
- workspace is not NULL.
- workSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetBatchNormalizationBackwardExWorkspaceSize().
- reserveSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetBatchNormalizationTrainingExReserveSpaceSize().
- The content in reserveSpace stored by cudnnBatchNormalizationForwardTrainingEx() must be preserved.
If workspace is NULL and workSpaceSizeInBytes of zero is passed in, this API will function exactly like the non-extended function cudnnBatchNormalizationBackward().
This workspace is not required to be clean. Moreover, the workspace does not have to remain unchanged between the forward and backward pass, as it is not used for passing any information.
This extended function can accept a *workspace pointer to the GPU workspace, and workSpaceSizeInBytes, the size of the workspace, from the user.
The bnOps input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.
Only 4D and 5D tensors are supported. The epsilon value has to be the same during training, backpropagation, and inference.
When the tensor layout is NCHW, higher performance can be obtained when HW-packed tensors are used for x, dy, and dx.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.

mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.

bnOps: Input. Mode of operation. Currently, CUDNN_BATCHNORM_OPS_BN_ACTIVATION and CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnBatchNormOps_t. This input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.

*alphaDataDiff, *betaDataDiff: Inputs. Pointers to scaling factors (in host memory) used to blend the gradient output dx with a prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.

*alphaParamDiff, *betaParamDiff: Inputs. Pointers to scaling factors (in host memory) used to blend the gradient outputs dBnScaleData and dBnBiasData with prior values in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.

xDesc, *xData, yDesc, *yData, dyDesc, *dyData: Inputs. Tensor descriptors and pointers in device memory for the layer's x data, the backpropagated gradient input dy, and the original forward output y data. yDesc and yData are not needed if bnOps is set to CUDNN_BATCHNORM_OPS_BN; users may pass NULL. For more information, refer to cudnnTensorDescriptor_t.

dzDesc, dxDesc: Inputs. Tensor descriptors for the computed gradient outputs dz and dx. dzDesc is not needed when bnOps is CUDNN_BATCHNORM_OPS_BN or CUDNN_BATCHNORM_OPS_BN_ACTIVATION; users may pass NULL. For more information, refer to cudnnTensorDescriptor_t.

*dzData, *dxData: Outputs. Pointers in device memory for the computed gradient outputs dz and dx. *dzData is not needed when bnOps is CUDNN_BATCHNORM_OPS_BN or CUDNN_BATCHNORM_OPS_BN_ACTIVATION; users may pass NULL. For more information, refer to cudnnTensorDescriptor_t.

dBnScaleBiasDesc: Input. Shared tensor descriptor for the following six tensors: bnScaleData, bnBiasData, dBnScaleData, dBnBiasData, savedMean, and savedInvVariance. The dimensions for this tensor descriptor are dependent on the normalization mode. For more information, refer to cudnnDeriveBNTensorDescriptor(). Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors. For more information, refer to cudnnTensorDescriptor_t.

*bnScaleData: Input. Pointer in device memory for the batch normalization scale parameter (in the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper, the quantity scale is referred to as gamma).

*bnBiasData: Input. Pointer in device memory for the batch normalization bias parameter (in the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper, bias is referred to as beta). This parameter is used only when activation should be performed.

*dBnScaleData, *dBnBiasData: Outputs. Pointers in device memory for the gradients of bnScaleData and bnBiasData, respectively.

epsilon: Input. Epsilon value used in the batch normalization formula. Its value should be equal to or greater than the value defined for CUDNN_BN_MIN_EPSILON in cudnn.h. The same epsilon value should be used in the forward and backward functions.

*savedMean, *savedInvVariance: Inputs. Optional cache parameters containing saved intermediate results computed during the forward pass. For this to work correctly, the layer's x, bnScaleData, and bnBiasData data have to remain unchanged until this backward function is called. Note that both of these parameters can be NULL, but only at the same time. It is recommended to use this cache, since the memory overhead is relatively small.

activationDesc: Input. Descriptor for the activation operation. When the bnOps input is set to either CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.

workspace: Input. Pointer to the GPU workspace. If workspace is NULL and a workSpaceSizeInBytes of zero is passed in, then this API will function exactly like the non-extended function cudnnBatchNormalizationBackward().

workSpaceSizeInBytes: Input. The size of the workspace. It must be large enough to trigger the fast NHWC semi-persistent kernel by this function.

*reserveSpace: Input. Pointer to the GPU workspace for the reserveSpace.

reserveSpaceSizeInBytes: Input. The size of the reserveSpace. It must be equal to or larger than the amount required by cudnnGetBatchNormalizationTrainingExReserveSpaceSize().
Supported Configurations
This function supports several combinations of data types for the various descriptors.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.

CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:

- Any of the pointers alphaDataDiff, betaDataDiff, alphaParamDiff, betaParamDiff, x, dy, dx, bnScale, resultBnScaleDiff, and resultBnBiasDiff is NULL.
- The number of xDesc, yDesc, or dxDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
- dBnScaleBiasDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- Exactly one of the savedMean and savedInvVariance pointers is NULL.
- The epsilon value is less than CUDNN_BN_MIN_EPSILON.
- Dimensions or data types mismatch for any pair of xDesc, dyDesc, and dxDesc.
cudnnBatchNormalizationForwardInference()#
This function has been deprecated in cuDNN 9.0.
This function performs the forward batch normalization layer computation for the inference phase. This layer is based on the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper.
cudnnStatus_t cudnnBatchNormalizationForwardInference( cudnnHandle_t handle, cudnnBatchNormMode_t mode, const void *alpha, const void *beta, const cudnnTensorDescriptor_t xDesc, const void *x, const cudnnTensorDescriptor_t yDesc, void *y, const cudnnTensorDescriptor_t bnScaleBiasMeanVarDesc, const void *bnScale, const void *bnBias, const void *estimatedMean, const void *estimatedVariance, double epsilon)
Only 4D and 5D tensors are supported.
The input transformation performed by this function is defined as:
y = beta*y + alpha*[bnBias + bnScale*(x - estimatedMean)/sqrt(epsilon + estimatedVariance)]
The epsilon value has to be the same during training, backpropagation and inference.
For the training phase, refer to cudnnBatchNormalizationForwardTraining().
Higher performance can be obtained when HW-packed tensors are used for both x and y.
Refer to cudnnDeriveBNTensorDescriptor() for the secondary tensor descriptor generation for the parameters used in this function.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.

mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.

alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with a prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.

xDesc, yDesc: Inputs. Handles to the previously initialized tensor descriptors.

*x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc, for the layer's x input data.

*y: Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc, for the y output of the batch normalization layer.

bnScaleBiasMeanVarDesc, bnScale, bnBias: Inputs. Tensor descriptors and pointers in device memory for the batch normalization scale and bias parameters (in the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper, bias is referred to as beta and scale as gamma).

estimatedMean, estimatedVariance: Inputs. Mean and variance tensors (these have the same descriptor as the bias and scale). The resultRunningMean and resultRunningVariance, accumulated during the training phase from the cudnnBatchNormalizationForwardTraining() call, should be passed as inputs here.

epsilon: Input. Epsilon value used in the batch normalization formula. Its value should be equal to or greater than the value defined for CUDNN_BN_MIN_EPSILON in cudnn.h. The same epsilon value should be used in the forward and backward functions.
Supported Configurations
This function supports several combinations of data types for the various descriptors.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.

CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:

- One of the pointers alpha, beta, x, y, bnScale, bnBias, estimatedMean, and estimatedVariance is NULL.
- The number of xDesc or yDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
- bnScaleBiasMeanVarDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- The epsilon value is less than CUDNN_BN_MIN_EPSILON.
- Dimensions or data types mismatch for xDesc and yDesc.
cudnnBatchNormalizationForwardTraining()#
This function has been deprecated in cuDNN 9.0.
This function performs the forward batch normalization layer computation for the training phase. This layer is based on the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper.
cudnnStatus_t cudnnBatchNormalizationForwardTraining( cudnnHandle_t handle, cudnnBatchNormMode_t mode, const void *alpha, const void *beta, const cudnnTensorDescriptor_t xDesc, const void *x, const cudnnTensorDescriptor_t yDesc, void *y, const cudnnTensorDescriptor_t bnScaleBiasMeanVarDesc, const void *bnScale, const void *bnBias, double exponentialAverageFactor, void *resultRunningMean, void *resultRunningVariance, double epsilon, void *resultSaveMean, void *resultSaveInvVariance)
Only 4D and 5D tensors are supported.
The epsilon value has to be the same during training, backpropagation, and inference.
For the inference phase, use cudnnBatchNormalizationForwardInference().
Higher performance can be obtained when HW-packed tensors are used for both x and y.
Refer to cudnnDeriveBNTensorDescriptor() for the secondary tensor descriptor generation for the parameters used in this function.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.

mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.

alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with a prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.

xDesc, yDesc: Inputs. Tensor descriptors for the layer's x and y data. For more information, refer to cudnnTensorDescriptor_t.

*x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc, for the layer's x input data.

*y: Output. Data pointer to GPU memory associated with the tensor descriptor yDesc, for the y output of the batch normalization layer.

bnScaleBiasMeanVarDesc: Input. Shared tensor descriptor for the secondary tensor that was derived by cudnnDeriveBNTensorDescriptor(). The dimensions for this tensor descriptor are dependent on the normalization mode.

bnScale, bnBias: Inputs. Pointers in device memory for the batch normalization scale and bias parameters (in the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper, bias is referred to as beta and scale as gamma). Note that the bnBias parameter can replace the previous layer's bias parameter for improved efficiency.

exponentialAverageFactor: Input. Factor used in the moving average computation as follows: runningMean = runningMean*(1-factor) + newMean*factor. Use factor=1/(1+n) at the N-th call to the function to get the Cumulative Moving Average (CMA) behavior CMA[n] = (x[1]+...+x[n])/n. For example: CMA[n+1] = (n*CMA[n]+x[n+1])/(n+1) = ((n+1)*CMA[n]-CMA[n])/(n+1) + x[n+1]/(n+1) = CMA[n]*(1-1/(n+1)) + x[n+1]*(1/(n+1)) = CMA[n]*(1-factor) + x[n+1]*factor

resultRunningMean, resultRunningVariance: Inputs/Outputs. Running mean and variance tensors (these have the same descriptor as the bias and scale). Both of these pointers can be NULL, but only at the same time. The value stored in resultRunningVariance (or passed as an input in inference mode) is the sample variance and is the moving average of variance[x], where the variance is computed either over batch or spatial+batch dimensions depending on the mode. If these pointers are not NULL, the tensors should be initialized to some reasonable values or to 0.

epsilon: Input. Epsilon value used in the batch normalization formula. Its value should be equal to or greater than the value defined for CUDNN_BN_MIN_EPSILON in cudnn.h. The same epsilon value should be used in the forward and backward functions.

resultSaveMean, resultSaveInvVariance: Outputs. Optional cache to save intermediate results computed during the forward pass. These buffers can be used to speed up the backward pass when supplied to the cudnnBatchNormalizationBackward() function. The intermediate results stored in the resultSaveMean and resultSaveInvVariance buffers should not be used directly by the user. Depending on the batch normalization mode, the results stored in resultSaveInvVariance may vary. For the cache to work correctly, the input layer data must remain unchanged until the backward function is called. Note that both parameters can be NULL, but only at the same time. In such a case, intermediate statistics will not be saved, and cudnnBatchNormalizationBackward() will have to re-compute them. It is recommended to use this cache, as the memory overhead is relatively small because these tensors have a much lower product of dimensions than the data tensors.
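The exponentialAverageFactor behavior described above can be checked with a short host-side sketch (a hypothetical helper): choosing factor = 1/(1+n) at the n-th call (n starting from 0) reproduces the cumulative moving average of the per-batch means.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical host-side illustration of the running-mean update
 *   runningMean = runningMean*(1-factor) + newMean*factor
 * used via exponentialAverageFactor. */
static double update_running_mean(double runningMean, double newMean, double factor)
{
    return runningMean * (1.0 - factor) + newMean * factor;
}
```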
Supported Configurations
This function supports several combinations of data types for the various descriptors.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.

CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:

- One of the pointers alpha, beta, x, y, bnScale, and bnBias is NULL.
- The number of xDesc or yDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
- bnScaleBiasMeanVarDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- Exactly one of the resultSaveMean and resultSaveInvVariance pointers is NULL.
- Exactly one of the resultRunningMean and resultRunningVariance pointers is NULL.
- The epsilon value is less than CUDNN_BN_MIN_EPSILON.
- Dimensions or data types mismatch for xDesc and yDesc.
cudnnBatchNormalizationForwardTrainingEx()#
This function has been deprecated in cuDNN 9.0.
This function is an extension of cudnnBatchNormalizationForwardTraining() that performs the forward batch normalization layer computation.
cudnnStatus_t cudnnBatchNormalizationForwardTrainingEx( cudnnHandle_t handle, cudnnBatchNormMode_t mode, cudnnBatchNormOps_t bnOps, const void *alpha, const void *beta, const cudnnTensorDescriptor_t xDesc, const void *xData, const cudnnTensorDescriptor_t zDesc, const void *zData, const cudnnTensorDescriptor_t yDesc, void *yData, const cudnnTensorDescriptor_t bnScaleBiasMeanVarDesc, const void *bnScaleData, const void *bnBiasData, double exponentialAverageFactor, void *resultRunningMeanData, void *resultRunningVarianceData, double epsilon, void *saveMean, void *saveInvVariance, const cudnnActivationDescriptor_t activationDesc, void *workspace, size_t workSpaceSizeInBytes, void *reserveSpace, size_t reserveSpaceSizeInBytes)
This API will trigger the new semi-persistent NHWC kernel when the following conditions are true:
- All tensors, namely x, y, dz, dy, and dx, must be NHWC-fully packed and must be of the type CUDNN_DATA_HALF.
- The input parameter mode must be set to CUDNN_BATCHNORM_SPATIAL_PERSISTENT.
- workspace is not NULL.
- Before cuDNN version 8.2.0, the tensor C dimension should always be a multiple of 4. After 8.2.0, the tensor C dimension should be a multiple of 4 only when bnOps is CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION.
- workSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize().
- reserveSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetBatchNormalizationTrainingExReserveSpaceSize().
- The content in reserveSpace stored by cudnnBatchNormalizationForwardTrainingEx() must be preserved.
If workspace is NULL and workSpaceSizeInBytes of zero is passed in, this API will function exactly like the non-extended function cudnnBatchNormalizationForwardTraining().
This workspace is not required to be clean. Moreover, the workspace does not have to remain unchanged between the forward and backward pass, as it is not used for passing any information.
This extended function can accept a *workspace pointer to the GPU workspace, and workSpaceSizeInBytes, the size of the workspace, from the user.
The bnOps input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.
Only 4D and 5D tensors are supported. The epsilon value has to be the same during training, backpropagation, and inference.
When the tensor layout is NCHW, higher performance can be obtained when HW-packed tensors are used for x and y.
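The operation ordering selected by CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION (batch normalization of x, element-wise addition of z, then activation) can be sketched per element as follows. This is a hypothetical host-side illustration, using ReLU as an example activation and taking precomputed statistics as scalars.

```c
#include <assert.h>

/* Hypothetical single-element sketch of the fused operation selected by
 * CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION: y = act(BN(x) + z). */
static double bn_add_act(double x, double z, double bnScale, double bnBias,
                         double mean, double invStd)
{
    double bn = bnScale * (x - mean) * invStd + bnBias;
    double sum = bn + z;              /* residual addition of the z tensor */
    return sum > 0.0 ? sum : 0.0;     /* ReLU activation as an example */
}
```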
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.

mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.

bnOps: Input. Mode of operation for the fast NHWC kernel. For more information, refer to cudnnBatchNormOps_t. This input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.

*alpha, *beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with a prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.

xDesc, *xData, zDesc, *zData, yDesc, *yData: Inputs. Tensor descriptors and pointers in device memory for the layer's input x and output y, and for the optional z tensor input for residual addition to the result of the batch normalization operation, prior to the activation. The optional zDesc and *zData descriptors are only used when bnOps is CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION; otherwise, users may pass NULL. When in use, z should have exactly the same dimensions as x and the final output y. For more information, refer to cudnnTensorDescriptor_t.

bnScaleBiasMeanVarDesc: Input. Shared tensor descriptor for the secondary tensor that was derived by cudnnDeriveBNTensorDescriptor(). The dimensions for this tensor descriptor are dependent on the normalization mode.

*bnScaleData, *bnBiasData: Inputs. Pointers in device memory for the batch normalization scale and bias data. In the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper, bias is referred to as beta and scale as gamma. Note that the bnBiasData parameter can replace the previous operation's bias parameter for improved efficiency.

exponentialAverageFactor: Input. Factor used in the moving average computation as follows: runningMean = runningMean*(1-factor) + newMean*factor. Use factor=1/(1+n) at the N-th call to the function to get the Cumulative Moving Average (CMA) behavior CMA[n] = (x[1]+...+x[n])/n. For example: CMA[n+1] = (n*CMA[n]+x[n+1])/(n+1) = ((n+1)*CMA[n]-CMA[n])/(n+1) + x[n+1]/(n+1) = CMA[n]*(1-1/(n+1)) + x[n+1]*(1/(n+1)) = CMA[n]*(1-factor) + x[n+1]*factor

*resultRunningMeanData, *resultRunningVarianceData: Inputs/Outputs. Pointers to the running mean and running variance data. Both of these pointers can be NULL, but only at the same time. The value stored in resultRunningVarianceData (or passed as an input in inference mode) is the sample variance and is the moving average of variance[x], where the variance is computed either over batch or spatial+batch dimensions depending on the mode. If these pointers are not NULL, the tensors should be initialized to some reasonable values or to 0.

epsilon: Input. Epsilon value used in the batch normalization formula. Its value should be equal to or greater than the value defined for CUDNN_BN_MIN_EPSILON in cudnn.h. The same epsilon value should be used in the forward and backward functions.

*saveMean, *saveInvVariance: Outputs. Optional cache parameters containing saved intermediate results computed during the forward pass. For this to work correctly, the layer's x, bnScaleData, and bnBiasData data have to remain unchanged until the backward function is called. Note that both of these parameters can be NULL, but only at the same time. It is recommended to use this cache, since the memory overhead is relatively small.

activationDesc: Input. The tensor descriptor for the activation operation. When the bnOps input is set to either CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.

*workspace, workSpaceSizeInBytes: Inputs. *workspace is a pointer to the GPU workspace, and workSpaceSizeInBytes is the size of the workspace. When *workspace is not NULL and workSpaceSizeInBytes is large enough, and the tensor layout is NHWC and the data type configuration is supported, this function will trigger a new semi-persistent NHWC kernel for batch normalization. The workspace is not required to be clean. Also, the workspace does not need to remain unchanged between the forward and backward passes.

*reserveSpace: Input. Pointer to the GPU workspace for the reserveSpace.

reserveSpaceSizeInBytes: Input. The size of the reserveSpace. It must be equal to or larger than the amount required by cudnnGetBatchNormalizationTrainingExReserveSpaceSize().
Supported Configurations
This function supports several combinations of data types for the various descriptors.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.

CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:

- One of the pointers alpha, beta, x, y, bnScaleData, and bnBiasData is NULL.
- The number of xDesc or yDesc tensor descriptor dimensions is not within the [4,5] range (only 4D and 5D tensors are supported).
- bnScaleBiasMeanVarDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- Exactly one of the saveMean and saveInvVariance pointers is NULL.
- Exactly one of the resultRunningMeanData and resultRunningVarianceData pointers is NULL.
- The epsilon value is less than CUDNN_BN_MIN_EPSILON.
- Dimensions or data types mismatch for xDesc and yDesc.
cudnnCreateActivationDescriptor()#
This function creates an activation descriptor object by allocating the memory needed to hold its opaque structure. For more information, refer to cudnnActivationDescriptor_t.
cudnnStatus_t cudnnCreateActivationDescriptor( cudnnActivationDescriptor_t *activationDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateDropoutDescriptor()#
This function creates a generic dropout descriptor object by allocating the memory needed to hold its opaque structure. For more information, refer to cudnnDropoutDescriptor_t.
cudnnStatus_t cudnnCreateDropoutDescriptor( cudnnDropoutDescriptor_t *dropoutDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateFilterDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function creates a filter descriptor object by allocating the memory needed to hold its opaque structure. For more information, refer to cudnnFilterDescriptor_t.
cudnnStatus_t cudnnCreateFilterDescriptor( cudnnFilterDescriptor_t *filterDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateLRNDescriptor()#
This function allocates the memory needed to hold the data used by the LRN and DivisiveNormalization layer operations, and returns a descriptor used with subsequent layer forward and backward calls.
cudnnStatus_t cudnnCreateLRNDescriptor( cudnnLRNDescriptor_t *poolingDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateOpTensorDescriptor()#
This function creates a tensor pointwise math descriptor. For more information, refer to cudnnOpTensorDescriptor_t.
cudnnStatus_t cudnnCreateOpTensorDescriptor( cudnnOpTensorDescriptor_t* opTensorDesc)
Parameters
opTensorDesc: Output. Pointer to the structure holding the description of the tensor pointwise math such as add, multiply, and more.
Returns
CUDNN_STATUS_SUCCESS: The function returned successfully.
CUDNN_STATUS_BAD_PARAM: The tensor pointwise math descriptor passed to the function is invalid.
CUDNN_STATUS_ALLOC_FAILED: Memory allocation for this tensor pointwise math descriptor failed.
cudnnCreatePoolingDescriptor()#
This function creates a pooling descriptor object by allocating the memory needed to hold its opaque structure.
cudnnStatus_t cudnnCreatePoolingDescriptor( cudnnPoolingDescriptor_t *poolingDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateReduceTensorDescriptor()#
This function creates a reduced tensor descriptor object by allocating the memory needed to hold its opaque structure.
cudnnStatus_t cudnnCreateReduceTensorDescriptor( cudnnReduceTensorDescriptor_t* reduceTensorDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_BAD_PARAM: reduceTensorDesc is a NULL pointer.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateSpatialTransformerDescriptor()#
This function creates a generic spatial transformer descriptor object by allocating the memory needed to hold its opaque structure.
cudnnStatus_t cudnnCreateSpatialTransformerDescriptor( cudnnSpatialTransformerDescriptor_t *stDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
cudnnCreateTensorDescriptor()#
This function creates a generic tensor descriptor object by allocating the memory needed to hold its opaque structure. The data is initialized to all zeros.
cudnnStatus_t cudnnCreateTensorDescriptor( cudnnTensorDescriptor_t *tensorDesc)
Parameters
tensorDesc: Input. Pointer to pointer where the address of the allocated tensor descriptor object should be stored.
Returns
CUDNN_STATUS_BAD_PARAM: Invalid input argument.
CUDNN_STATUS_ALLOC_FAILED: The resources could not be allocated.
CUDNN_STATUS_SUCCESS: The object was created successfully.
cudnnCreateTensorTransformDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function creates a tensor transform descriptor object by allocating the memory needed to hold its opaque structure. The tensor data is initialized to all zeros. Use the cudnnSetTensorTransformDescriptor() function to initialize the descriptor created by this function.
cudnnStatus_t cudnnCreateTensorTransformDescriptor( cudnnTensorTransformDescriptor_t *transformDesc);
Parameters
transformDesc: Output. A pointer to an uninitialized tensor transform descriptor.
Returns
CUDNN_STATUS_SUCCESS: The descriptor object was created successfully.
CUDNN_STATUS_BAD_PARAM: transformDesc is NULL.
CUDNN_STATUS_ALLOC_FAILED: The memory allocation failed.
cudnnDeriveBNTensorDescriptor()#
This function derives a secondary tensor descriptor for the batch normalization scale, invVariance, bnBias, and bnScale subtensors from the layer’s x data descriptor.
cudnnStatus_t cudnnDeriveBNTensorDescriptor( cudnnTensorDescriptor_t derivedBnDesc, const cudnnTensorDescriptor_t xDesc, cudnnBatchNormMode_t mode)
Use the tensor descriptor produced by this function as the bnScaleBiasMeanVarDesc parameter for the cudnnBatchNormalizationForwardInference() and cudnnBatchNormalizationForwardTraining() functions, and as the bnScaleBiasDiffDesc parameter in the cudnnBatchNormalizationBackward() function.
The resulting dimensions will be:
1xCx1x1 for 4D and 1xCx1x1x1 for 5D for CUDNN_BATCHNORM_MODE_SPATIAL mode
1xCxHxW for 4D and 1xCxDxHxW for 5D for CUDNN_BATCHNORM_MODE_PER_ACTIVATION mode
For HALF input data type the resulting tensor descriptor will have a FLOAT type. For other data types, it will have the same type as the input data.
Note
Only 4D and 5D tensors are supported.
The derivedBnDesc should be first created using cudnnCreateTensorDescriptor().
xDesc is the descriptor for the layer's x data and has to be set up with proper dimensions prior to calling this function.
Parameters
derivedBnDesc: Output. Handle to a previously created tensor descriptor.
xDesc: Input. Handle to a previously created and initialized layer's x data descriptor.
mode: Input. Batch normalization layer mode of operation.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_BAD_PARAM: Invalid batch normalization mode.
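The dimension rules above can be sketched in plain C. The helper below is illustrative only, not part of cuDNN; the name derive_bn_dims and the mode_spatial flag are hypothetical stand-ins for cudnnDeriveBNTensorDescriptor() and cudnnBatchNormMode_t:

```c
#include <assert.h>

/* Illustrative sketch (not cuDNN code): computes the dimensions that
 * cudnnDeriveBNTensorDescriptor() would produce from a 4D or 5D xDesc.
 * mode_spatial != 0 models spatial mode, otherwise per-activation mode. */
static void derive_bn_dims(const int xDims[], int nbDims,
                           int mode_spatial, int outDims[])
{
    assert(nbDims == 4 || nbDims == 5);  /* only 4D and 5D tensors */
    for (int i = 0; i < nbDims; ++i)
        outDims[i] = 1;                  /* N collapses to 1 */
    outDims[1] = xDims[1];               /* C is always preserved */
    if (!mode_spatial)                   /* per-activation keeps spatial dims */
        for (int i = 2; i < nbDims; ++i)
            outDims[i] = xDims[i];
}
```

For an NxCxHxW input of 2x3x4x5, this yields 1x3x1x1 in spatial mode and 1x3x4x5 in per-activation mode, matching the table above.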
cudnnDeriveNormTensorDescriptor()#
This function derives tensor descriptors for the normalization mean, invariance, normBias, and normScale subtensors from the layer’s x data descriptor and norm mode. Normalization mean and invariance share the same descriptor while bias and scale share the same descriptor.
cudnnStatus_t CUDNNWINAPI cudnnDeriveNormTensorDescriptor(cudnnTensorDescriptor_t derivedNormScaleBiasDesc, cudnnTensorDescriptor_t derivedNormMeanVarDesc, const cudnnTensorDescriptor_t xDesc, cudnnNormMode_t mode, int groupCnt)
Use the tensor descriptor produced by this function as the normScaleBiasDesc or normMeanVarDesc parameter for the cudnnNormalizationForwardInference() and cudnnNormalizationForwardTraining() functions, and as the dNormScaleBiasDesc and normMeanVarDesc parameters in the cudnnNormalizationBackward() function.
The resulting dimensions will be:
1xCx1x1 for 4D and 1xCx1x1x1 for 5D for CUDNN_NORM_PER_ACTIVATION mode
1xCxHxW for 4D and 1xCxDxHxW for 5D for CUDNN_NORM_PER_CHANNEL mode
For HALF input data type the resulting tensor descriptor will have a FLOAT type. For other data types, it will have the same type as the input data.
Note
Only 4D and 5D tensors are supported.
The derivedNormScaleBiasDesc and derivedNormMeanVarDesc should be first created using cudnnCreateTensorDescriptor().
xDesc is the descriptor for the layer's x data and has to be set up with proper dimensions prior to calling this function.
Parameters
derivedNormScaleBiasDesc: Output. Handle to a previously created tensor descriptor.
derivedNormMeanVarDesc: Output. Handle to a previously created tensor descriptor.
xDesc: Input. Handle to a previously created and initialized layer's x data descriptor.
mode: Input. Normalization layer mode of operation.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_BAD_PARAM: Invalid normalization mode.
cudnnDestroyActivationDescriptor()#
This function destroys a previously created activation descriptor object.
cudnnStatus_t cudnnDestroyActivationDescriptor( cudnnActivationDescriptor_t activationDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyDropoutDescriptor()#
This function destroys a previously created dropout descriptor object.
cudnnStatus_t cudnnDestroyDropoutDescriptor( cudnnDropoutDescriptor_t dropoutDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyFilterDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function destroys a filter object.
cudnnStatus_t cudnnDestroyFilterDescriptor( cudnnFilterDescriptor_t filterDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyLRNDescriptor()#
This function destroys a previously created LRN descriptor object.
cudnnStatus_t cudnnDestroyLRNDescriptor( cudnnLRNDescriptor_t lrnDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyOpTensorDescriptor()#
This function deletes a tensor pointwise math descriptor object.
cudnnStatus_t cudnnDestroyOpTensorDescriptor( cudnnOpTensorDescriptor_t opTensorDesc)
Parameters
opTensorDesc: Input. Pointer to the structure holding the description of the tensor pointwise math to be deleted.
Returns
CUDNN_STATUS_SUCCESS: The function returned successfully.
cudnnDestroyPoolingDescriptor()#
This function destroys a previously created pooling descriptor object.
cudnnStatus_t cudnnDestroyPoolingDescriptor( cudnnPoolingDescriptor_t poolingDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyReduceTensorDescriptor()#
This function destroys a previously created reduce tensor descriptor object. When the input pointer is NULL, this function performs no destroy operation.
cudnnStatus_t cudnnDestroyReduceTensorDescriptor( cudnnReduceTensorDescriptor_t tensorDesc)
Parameters
tensorDesc: Input. Pointer to the reduce tensor descriptor object to be destroyed.
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroySpatialTransformerDescriptor()#
This function destroys a previously created spatial transformer descriptor object.
cudnnStatus_t cudnnDestroySpatialTransformerDescriptor( cudnnSpatialTransformerDescriptor_t stDesc)
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyTensorDescriptor()#
This function destroys a previously created tensor descriptor object. When the input pointer is NULL, this function performs no destroy operation.
cudnnStatus_t cudnnDestroyTensorDescriptor(cudnnTensorDescriptor_t tensorDesc)
Parameters
tensorDesc: Input. Pointer to the tensor descriptor object to be destroyed.
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDestroyTensorTransformDescriptor()#
This function has been deprecated in cuDNN 9.0.
Destroys a previously created tensor transform descriptor.
cudnnStatus_t cudnnDestroyTensorTransformDescriptor( cudnnTensorTransformDescriptor_t transformDesc);
Parameters
transformDesc: Input. The tensor transform descriptor to be destroyed.
Returns
CUDNN_STATUS_SUCCESS: The object was destroyed successfully.
cudnnDivisiveNormalizationBackward()#
This function performs the backward DivisiveNormalization layer computation.
cudnnStatus_t cudnnDivisiveNormalizationBackward( cudnnHandle_t handle, cudnnLRNDescriptor_t normDesc, cudnnDivNormMode_t mode, const void *alpha, const cudnnTensorDescriptor_t xDesc, const void *x, const void *means, const void *dy, void *temp, void *temp2, const void *beta, const cudnnTensorDescriptor_t dxDesc, void *dx, void *dMeans)
Supported tensor formats are NCHW for 4D and NCDHW for 5D with any non-overlapping non-negative strides. Only 4D and 5D tensors are supported.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor.
normDesc: Input. Handle to a previously initialized LRN parameter descriptor (this descriptor is used for both LRN and DivisiveNormalization layers).
mode: Input. DivisiveNormalization layer mode of operation. Currently only CUDNN_DIVNORM_PRECOMPUTED_MEANS is implemented. Normalization is performed using the means input tensor that is expected to be precomputed by the user.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
xDesc, x, means: Inputs. Tensor descriptor and pointers in device memory for the layer's x and means data. Note that the means tensor is expected to be precomputed by the user. It can also contain any valid values (it is not required to hold actual means, and can for instance be the result of a convolution with a Gaussian kernel).
dy: Input. Tensor pointer in device memory for the layer's dy cumulative loss differential data (error backpropagation).
temp, temp2: Workspace. Temporary tensors in device memory used for computing intermediate values during the backward pass. These tensors do not have to be preserved from the forward to the backward pass. Both use xDesc as a descriptor.
dxDesc: Input. Tensor descriptor for dx and dMeans.
dx, dMeans: Outputs. Tensor pointers (in device memory) for the layer's resulting cumulative gradients dx and dMeans (dLoss/dx and dLoss/dMeans). Both share the same descriptor.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
One of the tensor pointers x, dx, temp, temp2, and dy is NULL.
The number of any of the input or output tensor dimensions is not within the [4,5] range.
Either the alpha or beta pointer is NULL.
A mismatch in dimensions between xDesc and dxDesc.
LRN descriptor parameters are outside of their valid ranges.
Any of the tensor strides is negative.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration; for example, a mismatch between any input and output tensor strides (for the same dimension) is a non-supported configuration.
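The alpha/beta blending used by this and many other cuDNN routines (dstValue = alpha[0]*resultValue + beta[0]*priorDstValue) can be sketched in plain C. blend_output below is a hypothetical helper for illustration, not a cuDNN routine:

```c
/* Illustrative sketch of cuDNN's alpha/beta output blending:
 * dst[i] = alpha[0]*result[i] + beta[0]*dst[i].
 * With alpha = 1 and beta = 0 the result simply overwrites dst;
 * with beta != 0 the prior destination contents are accumulated. */
static void blend_output(const float *result, float *dst, int n,
                         const float *alpha, const float *beta)
{
    for (int i = 0; i < n; ++i)
        dst[i] = alpha[0] * result[i] + beta[0] * dst[i];
}
```

cuDNN takes alpha and beta as host pointers so their element type can match the computation type of the operation, which is why the sketch dereferences alpha[0] and beta[0] rather than taking plain scalars.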
cudnnDivisiveNormalizationForward()#
This function performs the forward spatial DivisiveNormalization layer computation. It divides every value in a layer by the standard deviation of its spatial neighbors. Note that DivisiveNormalization only implements the x/max(c, sigma_x) portion of the computation, where sigma_x is the variance over the spatial neighborhood of x.
cudnnStatus_t cudnnDivisiveNormalizationForward( cudnnHandle_t handle, cudnnLRNDescriptor_t normDesc, cudnnDivNormMode_t mode, const void *alpha, const cudnnTensorDescriptor_t xDesc, const void *x, const void *means, void *temp, void *temp2, const void *beta, const cudnnTensorDescriptor_t yDesc, void *y)
The full LCN (Local Contrastive Normalization) computation can be implemented as a two-step process:
x_m = x-mean(x); y = x_m/max(c, sigma(x_m));
The x-mean(x) term, often referred to as the "subtractive normalization" portion of the computation, can be implemented using a cuDNN average pooling layer followed by a call to cudnnAddTensor().
Supported tensor formats are NCHW for 4D and NCDHW for 5D with any non-overlapping non-negative strides. Only 4D and 5D tensors are supported.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor.
normDesc: Input. Handle to a previously initialized LRN parameter descriptor. This descriptor is used for both LRN and DivisiveNormalization layers.
mode: Input. DivisiveNormalization layer mode of operation. Currently only CUDNN_DIVNORM_PRECOMPUTED_MEANS is implemented. Normalization is performed using the means input tensor that is expected to be precomputed by the user.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
xDesc, yDesc: Inputs. Tensor descriptor objects for the input and output tensors. Note that xDesc is shared between the x, means, temp, and temp2 tensors.
x: Input. Input tensor data pointer in device memory.
means: Input. Input means tensor data pointer in device memory. Note that this tensor can be NULL (in that case its values are assumed to be zero during the computation). This tensor also does not have to contain means; it can hold any values, and a frequently used variation is the result of convolution with a normalized positive kernel (such as Gaussian).
temp, temp2: Workspace. Temporary tensors in device memory used for computing intermediate values during the forward pass. These tensors do not have to be preserved as inputs from the forward to the backward pass. Both use xDesc as their descriptor.
y: Output. Pointer in device memory to a tensor for the result of the forward DivisiveNormalization computation.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
One of the tensor pointers x, y, temp, and temp2 is NULL.
The number of input or output tensor dimensions is outside the [4,5] range.
A mismatch in dimensions between any two of the input or output tensors.
For in-place computation (when the pointers x == y), a mismatch in strides between the input data and output data tensors.
The alpha or beta pointer is NULL.
LRN descriptor parameters are outside of their valid ranges.
Any of the tensor strides are negative.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration; for example, a mismatch between any input and output tensor strides (for the same dimension) is a non-supported configuration.
cudnnDropoutBackward()#
This function performs the backward dropout operation over dy, returning results in dx. If, during the forward dropout operation, a value from x was propagated to y, then during the backward operation the value from dy will be propagated to dx; otherwise, the dx value will be set to 0.
cudnnStatus_t cudnnDropoutBackward( cudnnHandle_t handle, const cudnnDropoutDescriptor_t dropoutDesc, const cudnnTensorDescriptor_t dydesc, const void *dy, const cudnnTensorDescriptor_t dxdesc, void *dx, void *reserveSpace, size_t reserveSpaceSizeInBytes)
Better performance is obtained for fully packed tensors.
Parameters
handle: Input. Handle to a previously created cuDNN context.
dropoutDesc: Input. Previously created dropout descriptor object.
dyDesc: Input. Handle to a previously initialized tensor descriptor.
dy: Input. Pointer to data of the tensor described by the dyDesc descriptor.
dxDesc: Input. Handle to a previously initialized tensor descriptor.
dx: Output. Pointer to data of the tensor described by the dxDesc descriptor.
reserveSpace: Input. Pointer to user-allocated GPU memory used by this function. It is expected that reserveSpace was populated during a call to cudnnDropoutForward() and has not been changed.
reserveSpaceSizeInBytes: Input. Specifies the size in bytes of the provided memory for the reserve space.
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
The number of elements of the input and output tensors differ.
The data type of the input and output tensors differs.
The strides of the input and output tensors differ and in-place operation is used (meaning, the x and y pointers are equal).
The provided reserveSpaceSizeInBytes is less than the value returned by cudnnDropoutGetReserveSpaceSize().
cudnnSetDropoutDescriptor() has not been called on dropoutDesc with a non-NULL states argument.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
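The propagation rule above can be sketched in plain C. dropout_backward and its explicit mask array are hypothetical stand-ins for the state cuDNN keeps in reserveSpace; this is an illustrative sketch, not cuDNN code:

```c
/* Illustrative sketch (not cuDNN code) of dropout backward semantics:
 * where the forward pass kept x[i] (mask[i] != 0), dy[i] is propagated
 * to dx[i] with the same 1/(1-dropout) scaling as the forward pass;
 * where x[i] was dropped, dx[i] is set to 0. */
static void dropout_backward(const float *dy, float *dx, const int *mask,
                             int n, float dropout)
{
    float scale = 1.0f / (1.0f - dropout);
    for (int i = 0; i < n; ++i)
        dx[i] = mask[i] ? dy[i] * scale : 0.0f;
}
```

This is why reserveSpace must be preserved unchanged between the forward and backward calls: it records which elements were kept.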
cudnnDropoutForward()#
This function performs the forward dropout operation over x, returning results in y. If dropout was used as a parameter to cudnnSetDropoutDescriptor(), approximately the dropout fraction of x values will be replaced by 0, and the rest will be scaled by 1/(1-dropout). This function should not run concurrently with another cudnnDropoutForward() function using the same states.
cudnnStatus_t cudnnDropoutForward( cudnnHandle_t handle, const cudnnDropoutDescriptor_t dropoutDesc, const cudnnTensorDescriptor_t xdesc, const void *x, const cudnnTensorDescriptor_t ydesc, void *y, void *reserveSpace, size_t reserveSpaceSizeInBytes)
Note
Better performance is obtained for fully packed tensors.
This function should not be called during inference.
Parameters
handle: Input. Handle to a previously created cuDNN context.
dropoutDesc: Input. Previously created dropout descriptor object.
xDesc: Input. Handle to a previously initialized tensor descriptor.
x: Input. Pointer to data of the tensor described by the xDesc descriptor.
yDesc: Input. Handle to a previously initialized tensor descriptor.
y: Output. Pointer to data of the tensor described by the yDesc descriptor.
reserveSpace: Output. Pointer to user-allocated GPU memory used by this function. It is expected that the contents of reserveSpace do not change between cudnnDropoutForward() and cudnnDropoutBackward() calls.
reserveSpaceSizeInBytes: Input. Specifies the size in bytes of the provided memory for the reserve space.
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
The number of elements of the input and output tensors differ.
The data type of the input and output tensors differs.
The strides of the input and output tensors differ and in-place operation is used (meaning, the x and y pointers are equal).
The provided reserveSpaceSizeInBytes is less than the value returned by cudnnDropoutGetReserveSpaceSize().
cudnnSetDropoutDescriptor() has not been called on dropoutDesc with a non-NULL states argument.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
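The forward semantics above can be sketched in plain C. dropout_forward and its explicit mask array are hypothetical stand-ins for the random generator states cuDNN holds in the dropout descriptor; this is an illustrative sketch, not cuDNN code:

```c
/* Illustrative sketch (not cuDNN code) of dropout forward semantics:
 * each kept element (mask[i] != 0) is scaled by 1/(1-dropout) so the
 * expected value of y matches x; dropped elements become 0. In cuDNN
 * the keep/drop decision comes from the random generator states set
 * via cudnnSetDropoutDescriptor(). */
static void dropout_forward(const float *x, float *y, const int *mask,
                            int n, float dropout)
{
    float scale = 1.0f / (1.0f - dropout);
    for (int i = 0; i < n; ++i)
        y[i] = mask[i] ? x[i] * scale : 0.0f;
}
```

The 1/(1-dropout) scaling is what makes this "inverted dropout": no rescaling is needed at inference time, which is also why this function should not be called during inference.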
cudnnDropoutGetReserveSpaceSize()#
This function is used to query the amount of reserve space needed to run dropout with the input dimensions given by xDesc. The same reserve space is expected to be passed to cudnnDropoutForward() and cudnnDropoutBackward(), and its contents are expected to remain unchanged between the cudnnDropoutForward() and cudnnDropoutBackward() calls.
cudnnStatus_t cudnnDropoutGetReserveSpaceSize( cudnnTensorDescriptor_t xDesc, size_t *sizeInBytes)
Parameters
xDesc: Input. Handle to a previously initialized tensor descriptor, describing input to a dropout operation.
sizeInBytes: Output. Amount of GPU memory needed as reserve space to be able to run dropout with an input tensor descriptor specified by xDesc.
Returns
CUDNN_STATUS_SUCCESS: The query was successful.
cudnnDropoutGetStatesSize()#
This function is used to query the amount of space required to store the states of the random number generators used by the cudnnDropoutForward() function.
cudnnStatus_t cudnnDropoutGetStatesSize( cudnnHandle_t handle, size_t *sizeInBytes)
Parameters
handle: Input. Handle to a previously created cuDNN context.
sizeInBytes: Output. Amount of GPU memory needed to store random generator states.
Returns
CUDNN_STATUS_SUCCESS: The query was successful.
cudnnGetActivationDescriptor()#
This function queries a previously initialized generic activation descriptor object.
cudnnStatus_t cudnnGetActivationDescriptor( const cudnnActivationDescriptor_t activationDesc, cudnnActivationMode_t *mode, cudnnNanPropagation_t *reluNanOpt, double *coef)
Parameters
activationDesc: Input. Handle to a previously created activation descriptor.
mode: Output. Enumerant to specify the activation mode.
reluNanOpt: Output. Enumerant to specify the NaN propagation mode.
coef: Output. Floating point number to specify the clipping threshold when the activation mode is set to CUDNN_ACTIVATION_CLIPPED_RELU, or to specify the alpha coefficient when the activation mode is set to CUDNN_ACTIVATION_ELU.
Returns
CUDNN_STATUS_SUCCESS: The object was queried successfully.
cudnnGetActivationDescriptorSwishBeta()#
This function queries the current beta parameter set for SWISH activation.
cudnnStatus_t cudnnGetActivationDescriptorSwishBeta(cudnnActivationDescriptor_t activationDesc, double* swish_beta)
Parameters
activationDesc: Input. Handle to a previously created activation descriptor.
swish_beta: Output. Pointer to a double value that will receive the currently configured SWISH beta parameter.
Returns
CUDNN_STATUS_SUCCESS: The beta parameter was queried successfully.
CUDNN_STATUS_BAD_PARAM: At least one of activationDesc or swish_beta was NULL.
cudnnGetBatchNormalizationBackwardExWorkspaceSize()#
This function returns the amount of GPU memory workspace the user should allocate to be able to call cudnnBatchNormalizationBackwardEx() with the specified bnOps input setting. The workspace allocated will then be passed to cudnnBatchNormalizationBackwardEx().
cudnnStatus_t cudnnGetBatchNormalizationBackwardExWorkspaceSize( cudnnHandle_t handle, cudnnBatchNormMode_t mode, cudnnBatchNormOps_t bnOps, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t yDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnTensorDescriptor_t dzDesc, const cudnnTensorDescriptor_t dxDesc, const cudnnTensorDescriptor_t dBnScaleBiasDesc, const cudnnActivationDescriptor_t activationDesc, size_t *sizeInBytes);
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.
bnOps: Input. Mode of operation for the fast NHWC kernel. For more information, refer to cudnnBatchNormOps_t. This input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.
xDesc, yDesc, dyDesc, dzDesc, dxDesc: Inputs. Tensor descriptors and pointers in device memory for the layer's x data, the back-propagated differential dy (inputs), the optional y input data, the optional dz output, and the dx output, which is the resulting differential with respect to x. For more information, refer to cudnnTensorDescriptor_t.
dBnScaleBiasDesc: Input. Shared tensor descriptor for the following six tensors: bnScaleData, bnBiasData, dBnScaleData, dBnBiasData, savedMean, and savedInvVariance. This is the shared tensor descriptor for the secondary tensor that was derived by cudnnDeriveBNTensorDescriptor(). The dimensions for this tensor descriptor are dependent on the normalization mode. Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors.
activationDesc: Input. Descriptor for the activation operation. When the bnOps input is set to either CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, this activation is used; otherwise the user may pass NULL.
sizeInBytes: Output. Amount of GPU memory required for the workspace, as determined by this function, to be able to execute cudnnBatchNormalizationBackwardEx() with the specified bnOps input setting.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
The number of xDesc, yDesc, or dxDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
dBnScaleBiasDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D for spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D for per-activation mode.
Dimensions or data types mismatch for any pair of xDesc, dyDesc, or dxDesc.
cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize()#
This function has been deprecated in cuDNN 9.0.
This function returns the amount of GPU memory workspace the user should allocate to be able to call cudnnBatchNormalizationForwardTrainingEx() with the specified bnOps input setting. The workspace allocated should then be passed by the user to cudnnBatchNormalizationForwardTrainingEx().
cudnnStatus_t cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize( cudnnHandle_t handle, cudnnBatchNormMode_t mode, cudnnBatchNormOps_t bnOps, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t zDesc, const cudnnTensorDescriptor_t yDesc, const cudnnTensorDescriptor_t bnScaleBiasMeanVarDesc, const cudnnActivationDescriptor_t activationDesc, size_t *sizeInBytes);
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.
bnOps: Input. Mode of operation for the fast NHWC kernel. For more information, refer to cudnnBatchNormOps_t. This input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.
xDesc, zDesc, yDesc: Inputs. Tensor descriptors and pointers in device memory for the layer's x data, the optional z input data, and the y output. zDesc is only needed when bnOps is CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION; otherwise the user may pass NULL. For more information, refer to cudnnTensorDescriptor_t.
bnScaleBiasMeanVarDesc: Input. Shared tensor descriptor for the following six tensors: bnScaleData, bnBiasData, dBnScaleData, dBnBiasData, savedMean, and savedInvVariance. This is the shared tensor descriptor for the secondary tensor that was derived by cudnnDeriveBNTensorDescriptor(). The dimensions for this tensor descriptor are dependent on the normalization mode. Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors.
activationDesc: Input. Descriptor for the activation operation. When the bnOps input is set to either CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, this activation is used; otherwise the user may pass NULL.
sizeInBytes: Output. Amount of GPU memory required for the workspace, as determined by this function, to be able to execute cudnnBatchNormalizationForwardTrainingEx() with the specified bnOps input setting.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
The number of xDesc, zDesc, or yDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
bnScaleBiasMeanVarDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D for spatial mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D for per-activation mode.
Dimensions or data types mismatch for xDesc or yDesc.
cudnnGetBatchNormalizationTrainingExReserveSpaceSize()#
This function has been deprecated in cuDNN 9.0.
This function returns the amount of reserve GPU memory workspace the user should allocate for the batch normalization operation, for the specified bnOps input setting. In contrast to the workspace, the reserved space should be preserved between the forward and backward calls, and the data should not be altered.
cudnnStatus_t cudnnGetBatchNormalizationTrainingExReserveSpaceSize( cudnnHandle_t handle, cudnnBatchNormMode_t mode, cudnnBatchNormOps_t bnOps, const cudnnActivationDescriptor_t activationDesc, const cudnnTensorDescriptor_t xDesc, size_t *sizeInBytes);
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
mode: Input. Mode of operation (spatial or per-activation). For more information, refer to cudnnBatchNormMode_t.
bnOps: Input. Mode of operation for the fast NHWC kernel. For more information, refer to cudnnBatchNormOps_t. This input can be used to set this function to perform either only the batch normalization, or batch normalization followed by activation, or batch normalization followed by element-wise addition and then activation.
xDesc: Input. Tensor descriptor and pointer in device memory for the layer's x data. For more information, refer to cudnnTensorDescriptor_t.
activationDesc: Input. Descriptor for the activation operation. When the bnOps input is set to either CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, this activation is used; otherwise the user may pass NULL.
sizeInBytes: Output. Amount of GPU memory reserved.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: The xDesc tensor descriptor dimension is not within the [4,5] range (only 4D and 5D tensors are supported).
cudnnGetDropoutDescriptor()#
This function queries the fields of a previously initialized dropout descriptor.
cudnnStatus_t cudnnGetDropoutDescriptor( cudnnDropoutDescriptor_t dropoutDesc, cudnnHandle_t handle, float *dropout, void **states, unsigned long long *seed)
Parameters
dropoutDesc: Input. Previously initialized dropout descriptor.
handle: Input. Handle to a previously created cuDNN context.
dropout: Output. The probability with which the value from input is set to 0 during the dropout layer.
states: Output. Pointer to user-allocated GPU memory that holds random number generator states.
seed: Output. Seed used to initialize random number generator states.
Returns
CUDNN_STATUS_SUCCESSThe call was successful.
CUDNN_STATUS_BAD_PARAMOne or more of the arguments was an invalid pointer.
cudnnGetFilter4dDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function queries the parameters of the previously initialized Filter4d descriptor object.
cudnnStatus_t cudnnGetFilter4dDescriptor( const cudnnFilterDescriptor_t filterDesc, cudnnDataType_t *dataType, cudnnTensorFormat_t *format, int *k, int *c, int *h, int *w)
Parameters
filterDescInput. Handle to a previously created filter descriptor.
datatypeOutput. Data type.
formatOutput. Type of format.
kOutput. Number of output feature maps.
cOutput. Number of input feature maps.
hOutput. Height of each filter.
wOutput. Width of each filter.
Returns
CUDNN_STATUS_SUCCESSThe object was set successfully.
cudnnGetFilterNdDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function queries a previously initialized FilterNd descriptor object.
cudnnStatus_t cudnnGetFilterNdDescriptor( const cudnnFilterDescriptor_t wDesc, int nbDimsRequested, cudnnDataType_t *dataType, cudnnTensorFormat_t *format, int *nbDims, int filterDimA[])
Parameters
wDescInput. Handle to a previously initialized filter descriptor.
nbDimsRequestedInput. Dimension of the expected filter descriptor. It is also the minimum size of the array filterDimA in order to be able to hold the results.
datatypeOutput. Data type.
formatOutput. Type of format.
nbDimsOutput. Actual dimension of the filter.
filterDimAOutput. Array of dimensions of at least nbDimsRequested that will be filled with the filter parameters from the provided filter descriptor.
Returns
CUDNN_STATUS_SUCCESSThe object was set successfully.
CUDNN_STATUS_BAD_PARAMThe parameter nbDimsRequested is negative.
cudnnGetFilterSizeInBytes()#
This function has been deprecated in cuDNN 9.0.
This function returns the size of the filter tensor in memory with respect to the given descriptor. It can be used to know the amount of GPU memory to be allocated to hold that filter tensor.
cudnnStatus_t cudnnGetFilterSizeInBytes(const cudnnFilterDescriptor_t filterDesc, size_t *size);
Parameters
filterDescInput. Handle to a previously initialized filter descriptor.
sizeOutput. Size in bytes needed to hold the tensor in GPU memory.
Returns
CUDNN_STATUS_SUCCESSfilterDesc is valid.
CUDNN_STATUS_BAD_PARAMfilterDesc is invalid.
cudnnGetLRNDescriptor()#
This function retrieves values stored in the previously initialized LRN descriptor object.
cudnnStatus_t cudnnGetLRNDescriptor( cudnnLRNDescriptor_t normDesc, unsigned *lrnN, double *lrnAlpha, double *lrnBeta, double *lrnK)
Parameters
normDescInput. Handle to a previously created LRN descriptor.
lrnN, lrnAlpha, lrnBeta, lrnK
Outputs. Pointers to receive values of parameters stored in the descriptor object. For more information, refer to cudnnSetLRNDescriptor(). Any of these pointers can be NULL (no value is returned for the corresponding parameter).
Returns
CUDNN_STATUS_SUCCESSFunction completed successfully.
cudnnGetNormalizationBackwardWorkspaceSize()#
This function returns the amount of GPU memory workspace the user should allocate to be able to call the cudnnNormalizationBackward() function for the specified normOps and algo input settings. The allocated workspace is then passed to cudnnNormalizationBackward().
cudnnStatus_t cudnnGetNormalizationBackwardWorkspaceSize(cudnnHandle_t handle, cudnnNormMode_t mode, cudnnNormOps_t normOps, cudnnNormAlgo_t algo, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t yDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnTensorDescriptor_t dzDesc, const cudnnTensorDescriptor_t dxDesc, const cudnnTensorDescriptor_t dNormScaleBiasDesc, const cudnnActivationDescriptor_t activationDesc, const cudnnTensorDescriptor_t normMeanVarDesc, size_t *sizeInBytes, int groupCnt);
Parameters
handleInput. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
modeInput. Mode of operation (per-channel or per-activation). For more information, refer to cudnnNormMode_t.
normOpsInput. Mode of post-normalization operations. Currently, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnNormOps_t. This input can be used to set this function to perform either only the normalization, or normalization followed by activation, or normalization followed by element-wise addition and then activation.
algoInput. Algorithm to be performed. For more information, refer to cudnnNormAlgo_t.
xDesc, yDesc, dyDesc, dzDesc, dxDescInputs. Tensor descriptors for the layer's x data, the back-propagated differential dy (input), the optional y input data, the optional dz output, and the dx output, which is the resulting differential with respect to x. For more information, refer to cudnnTensorDescriptor_t.
dNormScaleBiasDescInput. Shared tensor descriptor for the following four tensors: normScaleData, normBiasData, dNormScaleData, and dNormBiasData. The dimensions for this tensor descriptor are dependent on normalization mode. Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors.
activationDescInput. Descriptor for the activation operation. When the normOps input is set to either CUDNN_NORM_OPS_NORM_ACTIVATION or CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.
normMeanVarDescInput. Shared tensor descriptor for the savedMean and savedInvVariance tensors. The dimensions for this tensor descriptor are dependent on normalization mode. Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors.
*sizeInBytesOutput. Amount of GPU memory required for the workspace, as determined by this function, to be able to execute cudnnNormalizationBackward() with the specified normOps input setting.
groupCntInput. Only 1 is supported for now.
Returns
CUDNN_STATUS_SUCCESSThe computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTEDThe function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
The number of xDesc, yDesc, or dxDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
dNormScaleBiasDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in per-channel mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
Dimensions or data types mismatch for any pair of xDesc, dyDesc, or dxDesc.
cudnnGetNormalizationForwardTrainingWorkspaceSize()#
This function returns the amount of GPU memory workspace the user should allocate to be able to call the cudnnNormalizationForwardTraining() function for the specified normOps and algo input settings. The allocated workspace should then be passed by the user to cudnnNormalizationForwardTraining().
cudnnStatus_t cudnnGetNormalizationForwardTrainingWorkspaceSize(cudnnHandle_t handle, cudnnNormMode_t mode, cudnnNormOps_t normOps, cudnnNormAlgo_t algo, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t zDesc, const cudnnTensorDescriptor_t yDesc, const cudnnTensorDescriptor_t normScaleBiasDesc, const cudnnActivationDescriptor_t activationDesc, const cudnnTensorDescriptor_t normMeanVarDesc, size_t *sizeInBytes, int groupCnt);
Parameters
handleInput. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
modeInput. Mode of operation (per-channel or per-activation). For more information, refer to cudnnNormMode_t.
normOpsInput. Mode of post-normalization operations. Currently, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnNormOps_t. This input can be used to set this function to perform either only the normalization, or normalization followed by activation, or normalization followed by element-wise addition and then activation.
algoInput. Algorithm to be performed. For more information, refer to cudnnNormAlgo_t.
xDesc, zDesc, yDescInputs. Tensor descriptors for the layer's x input data, the optional z input data used by the element-wise addition, and the y output data. For more information, refer to cudnnTensorDescriptor_t.
normScaleBiasDescInput. Shared tensor descriptor for the normScaleData and normBiasData tensors. The dimensions for this tensor descriptor are dependent on normalization mode. Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors.
activationDescInput. Descriptor for the activation operation. When the normOps input is set to either CUDNN_NORM_OPS_NORM_ACTIVATION or CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.
normMeanVarDescInput. Shared tensor descriptor for the savedMean and savedInvVariance tensors. The dimensions for this tensor descriptor are dependent on normalization mode. Note that the data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors.
*sizeInBytesOutput. Amount of GPU memory required for the workspace, as determined by this function, to be able to execute cudnnNormalizationForwardTraining() with the specified normOps input setting.
groupCntInput. Only 1 is supported for now.
Returns
CUDNN_STATUS_SUCCESSThe computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTEDThe function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
The number of xDesc, yDesc, or zDesc tensor descriptor dimensions is not within the range of [4,5] (only 4D and 5D tensors are supported).
normScaleBiasDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in per-channel mode, and are not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
Dimensions or data types mismatch for xDesc and yDesc.
cudnnGetNormalizationTrainingReserveSpaceSize()#
This function returns the amount of reserve GPU memory workspace the user should allocate for the normalization operation, for the specified normOps input setting. In contrast to the workspace, the reserved space should be preserved between the forward and backward calls, and the data should not be altered.
cudnnStatus_t cudnnGetNormalizationTrainingReserveSpaceSize(cudnnHandle_t handle, cudnnNormMode_t mode, cudnnNormOps_t normOps, cudnnNormAlgo_t algo, const cudnnActivationDescriptor_t activationDesc, const cudnnTensorDescriptor_t xDesc, size_t *sizeInBytes, int groupCnt);
Parameters
handleInput. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
modeInput. Mode of operation (per-channel or per-activation). For more information, refer to cudnnNormMode_t.
normOpsInput. Mode of post-normalization operations. Currently, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnNormOps_t. This input can be used to set this function to perform either only the normalization, or normalization followed by activation, or normalization followed by element-wise addition and then activation.
algoInput. Algorithm to be performed. For more information, refer to cudnnNormAlgo_t.
xDescInput. Tensor descriptor for the layer's x data. For more information, refer to cudnnTensorDescriptor_t.
activationDescInput. Descriptor for the activation operation. When the normOps input is set to either CUDNN_NORM_OPS_NORM_ACTIVATION or CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.
*sizeInBytesOutput. Amount of GPU memory reserved.
groupCntInput. Only 1 is supported for now.
Returns
CUDNN_STATUS_SUCCESSThe computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTEDThe function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
The xDesc tensor descriptor dimension is not within the [4,5] range (only 4D and 5D tensors are supported).
cudnnGetOpTensorDescriptor()#
This function returns the configuration of the passed tensor pointwise math descriptor.
cudnnStatus_t cudnnGetOpTensorDescriptor( const cudnnOpTensorDescriptor_t opTensorDesc, cudnnOpTensorOp_t *opTensorOp, cudnnDataType_t *opTensorCompType, cudnnNanPropagation_t *opTensorNanOpt)
Parameters
opTensorDescInput. Tensor pointwise math descriptor passed to get the configuration from.
opTensorOpOutput. Pointer to the tensor pointwise math operation type, associated with this tensor pointwise math descriptor.
opTensorCompTypeOutput. Pointer to the cuDNN data-type associated with this tensor pointwise math descriptor.
opTensorNanOptOutput. Pointer to the NAN propagation option associated with this tensor pointwise math descriptor.
Returns
CUDNN_STATUS_SUCCESSThe function returned successfully.
CUDNN_STATUS_BAD_PARAMInput tensor pointwise math descriptor passed is invalid.
cudnnGetPooling2dDescriptor()#
This function queries a previously created Pooling2d descriptor object.
cudnnStatus_t cudnnGetPooling2dDescriptor( const cudnnPoolingDescriptor_t poolingDesc, cudnnPoolingMode_t *mode, cudnnNanPropagation_t *maxpoolingNanOpt, int *windowHeight, int *windowWidth, int *verticalPadding, int *horizontalPadding, int *verticalStride, int *horizontalStride)
Parameters
poolingDescInput. Handle to a previously created pooling descriptor.
modeOutput. Enumerant to specify the pooling mode.
maxpoolingNanOptOutput. Enumerant to specify the Nan propagation mode.
windowHeightOutput. Height of the pooling window.
windowWidthOutput. Width of the pooling window.
verticalPaddingOutput. Size of vertical padding.
horizontalPaddingOutput. Size of horizontal padding.
verticalStrideOutput. Pooling vertical stride.
horizontalStrideOutput. Pooling horizontal stride.
Returns
CUDNN_STATUS_SUCCESSThe object was set successfully.
cudnnGetPooling2dForwardOutputDim()#
This function provides the output dimensions of a tensor after Pooling2d has been applied.
cudnnStatus_t cudnnGetPooling2dForwardOutputDim( const cudnnPoolingDescriptor_t poolingDesc, const cudnnTensorDescriptor_t inputDesc, int *outN, int *outC, int *outH, int *outW)
Each dimension h and w of the output images is computed as follows:
outputDim = 1 + (inputDim + 2*padding - windowDim)/poolingStride;
Parameters
poolingDescInput. Handle to a previously initialized pooling descriptor.
inputDescInput. Handle to the previously initialized input tensor descriptor.
outNOutput. Number of images in the output.
outCOutput. Number of channels in the output.
outHOutput. Height of images in the output.
outWOutput. Width of images in the output.
Returns
CUDNN_STATUS_SUCCESSThe function launched successfully.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
poolingDesc has not been initialized.
poolingDesc or inputDesc has an invalid number of dimensions (2 and 4, respectively, are required).
cudnnGetPoolingNdDescriptor()#
This function queries a previously initialized generic PoolingNd descriptor object.
cudnnStatus_t cudnnGetPoolingNdDescriptor( const cudnnPoolingDescriptor_t poolingDesc, int nbDimsRequested, cudnnPoolingMode_t *mode, cudnnNanPropagation_t *maxpoolingNanOpt, int *nbDims, int windowDimA[], int paddingA[], int strideA[])
Parameters
poolingDescInput. Handle to a previously created pooling descriptor.
nbDimsRequestedInput. Dimension of the expected pooling descriptor. It is also the minimum size of the arrays windowDimA, paddingA, and strideA in order to be able to hold the results.
modeOutput. Enumerant to specify the pooling mode.
maxpoolingNanOptOutput. Enumerant to specify the Nan propagation mode.
nbDimsOutput. Actual dimension of the pooling descriptor.
windowDimAOutput. Array of dimension of at least nbDimsRequested that will be filled with the window parameters from the provided pooling descriptor.
paddingAOutput. Array of dimension of at least nbDimsRequested that will be filled with the padding parameters from the provided pooling descriptor.
strideAOutput. Array of dimension of at least nbDimsRequested that will be filled with the stride parameters from the provided pooling descriptor.
Returns
CUDNN_STATUS_SUCCESSThe object was queried successfully.
CUDNN_STATUS_NOT_SUPPORTEDThe parameter nbDimsRequested is greater than CUDNN_DIM_MAX.
cudnnGetPoolingNdForwardOutputDim()#
This function provides the output dimensions of a tensor after PoolingNd has been applied.
cudnnStatus_t cudnnGetPoolingNdForwardOutputDim( const cudnnPoolingDescriptor_t poolingDesc, const cudnnTensorDescriptor_t inputDesc, int nbDims, int outDimA[])
Each dimension of the (nbDims-2)-D images of the output tensor is computed as follows:
outputDim = 1 + (inputDim + 2*padding - windowDim)/poolingStride;
Parameters
poolingDescInput. Handle to a previously initialized pooling descriptor.
inputDescInput. Handle to the previously initialized input tensor descriptor.
nbDimsInput. Number of dimensions in which pooling is to be applied.
outDimAOutput. Array of nbDims output dimensions.
Returns
CUDNN_STATUS_SUCCESSThe function launched successfully.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
poolingDesc has not been initialized.
The value of nbDims is inconsistent with the dimensionality of poolingDesc and inputDesc.
cudnnGetReduceTensorDescriptor()#
This function queries a previously initialized reduce tensor descriptor object.
cudnnStatus_t cudnnGetReduceTensorDescriptor( const cudnnReduceTensorDescriptor_t reduceTensorDesc, cudnnReduceTensorOp_t *reduceTensorOp, cudnnDataType_t *reduceTensorCompType, cudnnNanPropagation_t *reduceTensorNanOpt, cudnnReduceTensorIndices_t *reduceTensorIndices, cudnnIndicesType_t *reduceTensorIndicesType)
Parameters
reduceTensorDescInput. Pointer to a previously initialized reduce tensor descriptor object.
reduceTensorOpOutput. Enumerant to specify the reduced tensor operation.
reduceTensorCompTypeOutput. Enumerant to specify the computation datatype of the reduction.
reduceTensorNanOptOutput. Enumerant to specify the Nan propagation mode.
reduceTensorIndicesOutput. Enumerant to specify the reduced tensor indices.
reduceTensorIndicesTypeOutput. Enumerant to specify the reduced tensor indices type.
Returns
CUDNN_STATUS_SUCCESSThe object was queried successfully.
CUDNN_STATUS_BAD_PARAMreduceTensorDesc is NULL.
cudnnGetReductionIndicesSize()#
This is a helper function to return the minimum size of the index space to be passed to the reduction given the input and output tensors.
cudnnStatus_t cudnnGetReductionIndicesSize( cudnnHandle_t handle, const cudnnReduceTensorDescriptor_t reduceDesc, const cudnnTensorDescriptor_t aDesc, const cudnnTensorDescriptor_t cDesc, size_t *sizeInBytes)
Parameters
handleInput. Handle to a previously created cuDNN library descriptor.
reduceDescInput. Pointer to a previously initialized reduce tensor descriptor object.
aDescInput. Pointer to the input tensor descriptor.
cDescInput. Pointer to the output tensor descriptor.
sizeInBytesOutput. Minimum size of the index space to be passed to the reduction.
Returns
CUDNN_STATUS_SUCCESSThe index space size is returned successfully.
cudnnGetReductionWorkspaceSize()#
This is a helper function to return the minimum size of the workspace to be passed to the reduction given the input and output tensors.
cudnnStatus_t cudnnGetReductionWorkspaceSize( cudnnHandle_t handle, const cudnnReduceTensorDescriptor_t reduceDesc, const cudnnTensorDescriptor_t aDesc, const cudnnTensorDescriptor_t cDesc, size_t *sizeInBytes)
Parameters
handleInput. Handle to a previously created cuDNN library descriptor.
reduceDescInput. Pointer to a previously initialized reduce tensor descriptor object.
aDescInput. Pointer to the input tensor descriptor.
cDescInput. Pointer to the output tensor descriptor.
sizeInBytesOutput. Minimum size of the workspace to be passed to the reduction.
Returns
CUDNN_STATUS_SUCCESSThe workspace size is returned successfully.
cudnnGetTensor4dDescriptor()#
This function queries the parameters of the previously initialized Tensor4d descriptor object.
cudnnStatus_t cudnnGetTensor4dDescriptor( const cudnnTensorDescriptor_t tensorDesc, cudnnDataType_t *dataType, int *n, int *c, int *h, int *w, int *nStride, int *cStride, int *hStride, int *wStride)
Parameters
tensorDescInput. Handle to a previously initialized tensor descriptor.
datatypeOutput. Data type.
nOutput. Number of images.
cOutput. Number of feature maps per image.
hOutput. Height of each feature map.
wOutput. Width of each feature map.
nStrideOutput. Stride between two consecutive images.
cStrideOutput. Stride between two consecutive feature maps.
hStrideOutput. Stride between two consecutive rows.
wStrideOutput. Stride between two consecutive columns.
Returns
CUDNN_STATUS_SUCCESSThe operation succeeded.
cudnnGetTensorNdDescriptor()#
This function retrieves values stored in a previously initialized TensorNd descriptor object.
cudnnStatus_t cudnnGetTensorNdDescriptor( const cudnnTensorDescriptor_t tensorDesc, int nbDimsRequested, cudnnDataType_t *dataType, int *nbDims, int dimA[], int strideA[])
Parameters
tensorDescInput. Handle to a previously initialized tensor descriptor.
nbDimsRequestedInput. Number of dimensions to extract from a given tensor descriptor. It is also the minimum size of the arrays dimA and strideA. If this number is greater than the resulting nbDims[0], only nbDims[0] dimensions will be returned.
datatypeOutput. Data type.
nbDimsOutput. Actual number of dimensions of the tensor will be returned in nbDims[0].
dimAOutput. Array of dimensions of at least nbDimsRequested that will be filled with the dimensions from the provided tensor descriptor.
strideAOutput. Array of dimensions of at least nbDimsRequested that will be filled with the strides from the provided tensor descriptor.
Returns
CUDNN_STATUS_SUCCESSThe results were returned successfully.
CUDNN_STATUS_BAD_PARAMEither the tensorDesc or nbDims pointer is NULL.
cudnnGetTensorSizeInBytes()#
This function returns the size of the tensor in memory with respect to the given descriptor. This function can be used to know the amount of GPU memory to be allocated to hold that tensor.
cudnnStatus_t cudnnGetTensorSizeInBytes( const cudnnTensorDescriptor_t tensorDesc, size_t *size)
Parameters
tensorDescInput. Handle to a previously initialized tensor descriptor.
sizeOutput. Size in bytes needed to hold the tensor in GPU memory.
Returns
CUDNN_STATUS_SUCCESSThe results were returned successfully.
cudnnGetTensorTransformDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function returns the values stored in a previously initialized tensor transform descriptor.
cudnnStatus_t cudnnGetTensorTransformDescriptor( cudnnTensorTransformDescriptor_t transformDesc, uint32_t nbDimsRequested, cudnnTensorFormat_t *destFormat, int32_t padBeforeA[], int32_t padAfterA[], uint32_t foldA[], cudnnFoldingDirection_t *direction);
Parameters
transformDescInput. A previously initialized tensor transform descriptor.
nbDimsRequestedInput. The number of dimensions to consider. For more information, refer to Tensor Descriptor.
destFormatOutput. The transform format that will be returned.
padBeforeA[]Output. An array filled with the amount of padding to add before each dimension. The dimension of this padBeforeA[] parameter is equal to nbDimsRequested.
padAfterA[]Output. An array filled with the amount of padding to add after each dimension. The dimension of this padAfterA[] parameter is equal to nbDimsRequested.
foldA[]Output. An array filled with the folding parameters for each spatial dimension. The dimension of this foldA[] array is nbDimsRequested-2.
directionOutput. The setting that selects folding or unfolding. For more information, refer to cudnnFoldingDirection_t.
Returns
CUDNN_STATUS_SUCCESSThe results were obtained successfully.
CUDNN_STATUS_BAD_PARAMIf transformDesc is NULL, or if nbDimsRequested is less than 3 or greater than CUDNN_DIM_MAX.
cudnnInitTransformDest()#
This function has been deprecated in cuDNN 9.0.
This function initializes and returns a destination tensor descriptor destDesc for tensor transform operations. The initialization is done with the desired parameters described in the transform descriptor of type cudnnTensorTransformDescriptor_t.
cudnnStatus_t cudnnInitTransformDest( const cudnnTensorTransformDescriptor_t transformDesc, const cudnnTensorDescriptor_t srcDesc, cudnnTensorDescriptor_t destDesc, size_t *destSizeInBytes);
The returned Tensor descriptor will be packed.
Parameters
transformDescInput. Handle to a previously initialized tensor transform descriptor.
srcDescInput. Handle to a previously initialized tensor descriptor.
destDescOutput. Handle of the tensor descriptor that will be initialized and returned.
destSizeInBytesOutput. A pointer to hold the size, in bytes, of the new tensor.
Returns
CUDNN_STATUS_SUCCESSThe tensor descriptor was initialized successfully.
CUDNN_STATUS_BAD_PARAMIf either srcDesc or destDesc is NULL, or if the tensor descriptor's nbDims is incorrect. For more information, refer to Tensor Descriptor.
CUDNN_STATUS_NOT_SUPPORTEDIf the provided configuration is not 4D.
CUDNN_STATUS_EXECUTION_FAILEDFunction failed to launch on the GPU.
cudnnLRNCrossChannelBackward()#
This function performs the backward LRN layer computation.
cudnnStatus_t cudnnLRNCrossChannelBackward( cudnnHandle_t handle, cudnnLRNDescriptor_t normDesc, cudnnLRNMode_t lrnMode, const void *alpha, const cudnnTensorDescriptor_t yDesc, const void *y, const cudnnTensorDescriptor_t dyDesc, const void *dy, const cudnnTensorDescriptor_t xDesc, const void *x, const void *beta, const cudnnTensorDescriptor_t dxDesc, void *dx)
Supported formats are: positive-strided, NCHW and NHWC for 4D x and y, and only NCDHW DHW-packed for 5D (for both x and y). Only non-overlapping 4D and 5D tensors are supported. NCHW layout is preferred for performance.
Parameters
handleInput. Handle to a previously created cuDNN library descriptor.
normDescInput. Handle to a previously initialized LRN parameter descriptor.
lrnModeInput. LRN layer mode of operation. Currently, only CUDNN_LRN_CROSS_CHANNEL_DIM1 is implemented. Normalization is performed along the tensor's dimA[1].
alpha, betaInputs. Pointers to scaling factors (in host memory) used to blend the layer output value with prior value in the destination tensor as follows:
dstValue = alpha[0]*resultValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
yDesc, yInputs. Tensor descriptor and pointer in device memory for the layer's y data.
dyDesc, dyInputs. Tensor descriptor and pointer in device memory for the layer's input cumulative loss differential data dy (including error backpropagation).
xDesc, xInputs. Tensor descriptor and pointer in device memory for the layer's x data. Note that these values are not modified during backpropagation.
dxDesc, dxOutputs. Tensor descriptor and pointer in device memory for the layer's resulting cumulative loss differential data dx (including error backpropagation).
Returns
CUDNN_STATUS_SUCCESSThe computation was performed successfully.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
One of the tensor pointers x or y is NULL.
Number of input tensor dimensions is 2 or less.
LRN descriptor parameters are outside of their valid ranges.
One of the tensor parameters is 5D but is not in NCDHW DHW-packed format.
CUDNN_STATUS_NOT_SUPPORTEDThe function does not support the provided configuration. Some examples include:
Any of the input tensor data types is not the same as any of the output tensor data types.
Any pairwise tensor dimensions mismatch for x, y, dx, or dy.
Any tensor parameter strides are negative.
cudnnLRNCrossChannelForward()#
This function performs the forward LRN layer computation.
cudnnStatus_t cudnnLRNCrossChannelForward( cudnnHandle_t handle, cudnnLRNDescriptor_t normDesc, cudnnLRNMode_t lrnMode, const void *alpha, const cudnnTensorDescriptor_t xDesc, const void *x, const void *beta, const cudnnTensorDescriptor_t yDesc, void *y)
Supported formats are: positive-strided, NCHW and NHWC for 4D x and y, and only NCDHW DHW-packed for 5D (for both x and y). Only non-overlapping 4D and 5D tensors are supported. NCHW layout is preferred for performance.
Parameters
handleInput. Handle to a previously created cuDNN library descriptor.
normDescInput. Handle to a previously initialized LRN parameter descriptor.
lrnModeInput. LRN layer mode of operation. Currently, only CUDNN_LRN_CROSS_CHANNEL_DIM1 is implemented. Normalization is performed along the tensor's dimA[1].
alpha, betaInputs. Pointers to scaling factors (in host memory) used to blend the layer output value with prior value in the destination tensor as follows:
dstValue = alpha[0]*resultValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc,yDescInputs. Tensor descriptor objects for the input and output tensors.
xInput. Input tensor data pointer in device memory.
yOutput. Output tensor data pointer in device memory.
Returns
CUDNN_STATUS_SUCCESSThe computation was performed successfully.
CUDNN_STATUS_BAD_PARAMAt least one of the following conditions is met:
One of the tensor pointers x or y is NULL.
Number of input tensor dimensions is 2 or less.
LRN descriptor parameters are outside of their valid ranges.
One of the tensor parameters is 5D but is not in NCDHW DHW-packed format.
CUDNN_STATUS_NOT_SUPPORTEDThe function does not support the provided configuration. Some examples include:
Any of the input tensor data types is not the same as any of the output tensor data types.
x and y tensor dimensions mismatch.
Any tensor parameter strides are negative.
cudnnNormalizationBackward()#
This function has been deprecated in cuDNN 9.0.
This function performs the backward normalization layer computation that is specified by mode. The per-channel normalization layer is based on the Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift paper.
cudnnStatus_t cudnnNormalizationBackward(cudnnHandle_t handle, cudnnNormMode_t mode, cudnnNormOps_t normOps, cudnnNormAlgo_t algo, const void *alphaDataDiff, const void *betaDataDiff, const void *alphaParamDiff, const void *betaParamDiff, const cudnnTensorDescriptor_t xDesc, const void *xData, const cudnnTensorDescriptor_t yDesc, const void *yData, const cudnnTensorDescriptor_t dyDesc, const void *dyData, const cudnnTensorDescriptor_t dzDesc, void *dzData, const cudnnTensorDescriptor_t dxDesc, void *dxData, const cudnnTensorDescriptor_t dNormScaleBiasDesc, const void *normScaleData, const void *normBiasData, void *dNormScaleData, void *dNormBiasData, double epsilon, const cudnnTensorDescriptor_t normMeanVarDesc, const void *savedMean, const void *savedInvVariance, cudnnActivationDescriptor_t activationDesc, void *workSpace, size_t workSpaceSizeInBytes, void *reserveSpace, size_t reserveSpaceSizeInBytes, int groupCnt)
Only 4D and 5D tensors are supported.
The epsilon value has to be the same during training, backpropagation, and inference. This function accepts a *workSpace pointer to GPU memory and its size, workSpaceSizeInBytes, from the user. The workspace is not required to be clean. Moreover, the workspace does not have to remain unchanged between the forward and backward passes, as it is not used for passing any information.
The normOps input can be used to set this function to perform either only the normalization, or normalization followed by activation, or normalization followed by element-wise addition and then activation.
When the tensor layout is NCHW, higher performance can be obtained when HW-packed tensors are used for x, dy, dx.
Higher performance for CUDNN_NORM_PER_CHANNEL mode can be obtained when the following conditions are true:
- All tensors, namely x, y, dz, dy, and dx, must be NHWC fully-packed and of type CUDNN_DATA_HALF.
- The tensor C dimension should be a multiple of 4.
- The input parameter mode must be set to CUDNN_NORM_PER_CHANNEL.
- The input parameter algo must be set to CUDNN_NORM_ALGO_PERSIST.
- workspace is not NULL.
- workSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetNormalizationBackwardWorkspaceSize().
- reserveSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetNormalizationTrainingReserveSpaceSize().
- The content in reserveSpace stored by cudnnNormalizationForwardTraining() must be preserved.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
mode: Input. Mode of operation (per-channel or per-activation). For more information, refer to cudnnNormMode_t.
normOps: Input. Mode of post-operation. Currently, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnNormOps_t. This input can be used to set this function to perform only the normalization, normalization followed by activation, or normalization followed by element-wise addition and then activation.
algo: Input. Algorithm to be performed. For more information, refer to cudnnNormAlgo_t.
*alphaDataDiff, *betaDataDiff: Inputs. Pointers to scaling factors (in host memory) used to blend the gradient output dx with a prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
*alphaParamDiff, *betaParamDiff: Inputs. Pointers to scaling factors (in host memory) used to blend the gradient outputs dNormScaleData and dNormBiasData with prior values in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
xDesc, *xData, yDesc, *yData, dyDesc, *dyData: Inputs. Tensor descriptors and pointers in device memory for the layer's x data, the backpropagated gradient input dy, and the original forward output y data. yDesc and yData are not needed if normOps is set to CUDNN_NORM_OPS_NORM; users may pass NULL. For more information, refer to cudnnTensorDescriptor_t.
dzDesc, dxDesc: Inputs. Tensor descriptors for the computed gradient outputs dz and dx. dzDesc is not needed when normOps is CUDNN_NORM_OPS_NORM or CUDNN_NORM_OPS_NORM_ACTIVATION; users may pass NULL. For more information, refer to cudnnTensorDescriptor_t.
*dzData, *dxData: Outputs. Pointers in device memory for the computed gradient outputs dz and dx. *dzData is not needed when normOps is CUDNN_NORM_OPS_NORM or CUDNN_NORM_OPS_NORM_ACTIVATION; users may pass NULL. For more information, refer to cudnnTensorDescriptor_t.
dNormScaleBiasDesc: Input. Shared tensor descriptor for the following four tensors: normScaleData, normBiasData, dNormScaleData, and dNormBiasData. The dimensions for this tensor descriptor depend on the normalization mode. The data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors. For more information, refer to cudnnTensorDescriptor_t.
*normScaleData: Input. Pointer in device memory for the normalization scale parameter (in the paper, the scale quantity is referred to as gamma).
*normBiasData: Input. Pointer in device memory for the normalization bias parameter (in the paper, bias is referred to as beta). This parameter is used only when activation should be performed.
*dNormScaleData, *dNormBiasData: Outputs. Pointers in device memory for the gradients of normScaleData and normBiasData, respectively.
epsilon: Input. Epsilon value used in the normalization formula. Its value should be equal to or greater than zero. The same epsilon value should be used in the forward and backward functions.
normMeanVarDesc: Input. Shared tensor descriptor for the following tensors: savedMean and savedInvVariance. The dimensions for this tensor descriptor depend on the normalization mode. The data type of this tensor descriptor must be float for FP16 and FP32 input tensors, and double for FP64 input tensors. For more information, refer to cudnnTensorDescriptor_t.
*savedMean, *savedInvVariance: Inputs. Optional cache parameters containing saved intermediate results computed during the forward pass. For this to work correctly, the layer's x, normScaleData, and normBiasData data has to remain unchanged until this backward function is called. Note that both parameters can be NULL, but only at the same time. It is recommended to use this cache since the memory overhead is relatively small.
activationDesc: Input. Descriptor for the activation operation. When the normOps input is set to either CUDNN_NORM_OPS_NORM_ACTIVATION or CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.
*workspace: Input. Pointer to the GPU workspace.
workSpaceSizeInBytes: Input. The size of the workspace. It must be large enough to trigger the fast NHWC semi-persistent kernel by this function.
*reserveSpace: Input. Pointer to the GPU workspace for the reserveSpace.
reserveSpaceSizeInBytes: Input. The size of the reserveSpace. It must be equal to or larger than the amount required by cudnnGetNormalizationTrainingReserveSpaceSize().
groupCnt: Input. Only 1 is supported for now.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
- Any of the pointers alphaDataDiff, betaDataDiff, alphaParamDiff, betaParamDiff, xData, dyData, dxData, normScaleData, dNormScaleData, and dNormBiasData is NULL.
- The number of xDesc, yDesc, or dxDesc tensor descriptor dimensions is not within the range [4,5] (only 4D and 5D tensors are supported).
- dNormScaleBiasDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in per-channel mode, or not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- Exactly one of the savedMean, savedInvVariance pointers is NULL.
- The epsilon value is less than zero.
- Dimensions or data types mismatch for any pair of xDesc, dyDesc, dxDesc, dNormScaleBiasDesc, or normMeanVarDesc.
cudnnNormalizationForwardInference()#
This function has been deprecated in cuDNN 9.0.
This function performs the forward normalization layer computation for the inference phase. The per-channel normalization layer is based on the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
cudnnStatus_t cudnnNormalizationForwardInference(cudnnHandle_t handle, cudnnNormMode_t mode, cudnnNormOps_t normOps, cudnnNormAlgo_t algo, const void *alpha, const void *beta, const cudnnTensorDescriptor_t xDesc, const void *x, const cudnnTensorDescriptor_t normScaleBiasDesc, const void *normScale, const void *normBias, const cudnnTensorDescriptor_t normMeanVarDesc, const void *estimatedMean, const void *estimatedVariance, const cudnnTensorDescriptor_t zDesc, const void *z, cudnnActivationDescriptor_t activationDesc, const cudnnTensorDescriptor_t yDesc, void *y, double epsilon, int groupCnt);
Only 4D and 5D tensors are supported.
The input transformation performed by this function is defined as:
y = beta*y + alpha*(normBias + normScale*(x - estimatedMean)/sqrt(epsilon + estimatedVariance))
The epsilon value has to be the same during training, backpropagation, and inference.
For the training phase, refer to cudnnNormalizationForwardTraining().
Higher performance can be obtained when HW-packed tensors are used for both x and y.
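The per-channel inference transform above can be checked with a minimal CPU reference in Python. This is an illustrative sketch, not part of the cuDNN API: each channel c applies its own scale, bias, and estimated statistics to every spatial position.

```python
import math

def norm_forward_inference(x, norm_scale, norm_bias, est_mean, est_var,
                           epsilon=1e-5, alpha=1.0, beta=0.0, y_prior=None):
    """CPU reference for the per-channel inference transform:
    y = beta*y + alpha*(normBias[c] + normScale[c]*(x - mean[c])/sqrt(eps + var[c]))
    x is an NCHW tensor as nested lists; the parameters are per-channel lists."""
    n_, c_, h_, w_ = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    y = y_prior or [[[[0.0] * w_ for _ in range(h_)] for _ in range(c_)] for _ in range(n_)]
    for n in range(n_):
        for c in range(c_):
            inv_std = 1.0 / math.sqrt(epsilon + est_var[c])
            for h in range(h_):
                for w in range(w_):
                    norm = norm_bias[c] + norm_scale[c] * (x[n][c][h][w] - est_mean[c]) * inv_std
                    y[n][c][h][w] = beta * y[n][c][h][w] + alpha * norm

    return y

# With identity statistics (mean 0, variance 1, scale 1, bias 0) the
# transform leaves the input unchanged (up to epsilon).
y = norm_forward_inference([[[[2.0, 4.0]]]], [1.0], [0.0], [0.0], [1.0], epsilon=0.0)
```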
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
mode: Input. Mode of operation (per-channel or per-activation). For more information, refer to cudnnNormMode_t.
normOps: Input. Mode of post-operation. Currently, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnNormOps_t. This input can be used to set this function to perform only the normalization, normalization followed by activation, or normalization followed by element-wise addition and then activation.
algo: Input. Algorithm to be performed. For more information, refer to cudnnNormAlgo_t.
*alpha, *beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
xDesc, yDesc: Inputs. Handles to the previously initialized tensor descriptors.
*x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc, for the layer's x input data.
*y: Output. Data pointer to GPU memory associated with the tensor descriptor yDesc, for the y output of the normalization layer.
zDesc, *z: Inputs. Tensor descriptor and pointer in device memory for residual addition to the result of the normalization operation, prior to the activation. zDesc and *z are optional and are only used when normOps is CUDNN_NORM_OPS_NORM_ADD_ACTIVATION; otherwise, users may pass NULL. When in use, z should have exactly the same dimensions as x and the final output y. For more information, refer to cudnnTensorDescriptor_t. Since normOps is only supported for CUDNN_NORM_OPS_NORM, these can be set to NULL for now.
normScaleBiasDesc, normScale, normBias: Inputs. Tensor descriptor and pointers in device memory for the normalization scale and bias parameters (in the paper, bias is referred to as beta and scale as gamma).
normMeanVarDesc, estimatedMean, estimatedVariance: Inputs. Mean and variance tensors and their tensor descriptor. The estimatedMean and estimatedVariance inputs, accumulated during the training phase from the cudnnNormalizationForwardTraining() call, should be passed as inputs here.
activationDesc: Input. Descriptor for the activation operation. When the normOps input is set to either CUDNN_NORM_OPS_NORM_ACTIVATION or CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL. Since normOps is only supported for CUDNN_NORM_OPS_NORM, this can be set to NULL for now.
epsilon: Input. Epsilon value used in the normalization formula. Its value should be equal to or greater than zero.
groupCnt: Input. Only 1 is supported for now.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTED: A compute or data type other than what is supported was chosen, or an unknown algorithm type was chosen.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
- One of the pointers alpha, beta, x, y, normScale, normBias, estimatedMean, and estimatedVariance is NULL.
- The number of xDesc or yDesc tensor descriptor dimensions is not within the range [4,5] (only 4D and 5D tensors are supported).
- normScaleBiasDesc and normMeanVarDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in per-channel mode, or not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- The epsilon value is less than zero.
- Dimensions or data types mismatch for xDesc and yDesc.
cudnnNormalizationForwardTraining()#
This function has been deprecated in cuDNN 9.0.
This function performs the forward normalization layer computation for the training phase. Depending on mode, different normalization operations will be performed. The per-channel layer is based on the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
cudnnStatus_t cudnnNormalizationForwardTraining(cudnnHandle_t handle, cudnnNormMode_t mode, cudnnNormOps_t normOps, cudnnNormAlgo_t algo, const void *alpha, const void *beta, const cudnnTensorDescriptor_t xDesc, const void *xData, const cudnnTensorDescriptor_t normScaleBiasDesc, const void *normScale, const void *normBias, double exponentialAverageFactor, const cudnnTensorDescriptor_t normMeanVarDesc, void *resultRunningMean, void *resultRunningVariance, double epsilon, void *resultSaveMean, void *resultSaveInvVariance, cudnnActivationDescriptor_t activationDesc, const cudnnTensorDescriptor_t zDesc, const void *zData, const cudnnTensorDescriptor_t yDesc, void *yData, void *workspace, size_t workSpaceSizeInBytes, void *reserveSpace, size_t reserveSpaceSizeInBytes, int groupCnt);
Only 4D and 5D tensors are supported.
The epsilon value has to be the same during training, backpropagation, and inference.
For the inference phase, refer to cudnnNormalizationForwardInference().
Higher performance can be obtained when HW-packed tensors are used for both x and y.
This API will trigger the new semi-persistent NHWC kernel when the following conditions are true:
- All tensors, namely xData and yData, must be NHWC fully-packed and of type CUDNN_DATA_HALF.
- The tensor C dimension should be a multiple of 4.
- The input parameter mode must be set to CUDNN_NORM_PER_CHANNEL.
- The input parameter algo must be set to CUDNN_NORM_ALGO_PERSIST.
- workspace is not NULL.
- workSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetNormalizationForwardTrainingWorkspaceSize().
- reserveSpaceSizeInBytes is equal to or larger than the amount required by cudnnGetNormalizationTrainingReserveSpaceSize().
- The content in reserveSpace stored by cudnnNormalizationForwardTraining() must be preserved.
This workspace is not required to be clean. Moreover, the workspace does not have to remain unchanged between the forward and backward pass, as it is not used for passing any information. This extended function can accept a *workspace pointer to the GPU workspace, and workSpaceSizeInBytes, the size of the workspace, from the user.
The normOps input can be used to set this function to perform either only the normalization, or normalization followed by activation, or normalization followed by element-wise addition and then activation.
When the tensor layout is NCHW, higher performance can be obtained when HW-packed tensors are used for xData, yData.
Parameters
handle: Input. Handle to a previously created cuDNN library descriptor. For more information, refer to cudnnHandle_t.
mode: Input. Mode of operation (per-channel or per-activation). For more information, refer to cudnnNormMode_t.
normOps: Input. Mode of post-operation. Currently, CUDNN_NORM_OPS_NORM_ACTIVATION and CUDNN_NORM_OPS_NORM_ADD_ACTIVATION are only supported in the NHWC layout. For more information, refer to cudnnNormOps_t. This input can be used to set this function to perform only the normalization, normalization followed by activation, or normalization followed by element-wise addition and then activation.
algo: Input. Algorithm to be performed. For more information, refer to cudnnNormAlgo_t.
*alpha, *beta: Inputs. Pointers to scaling factors (in host memory) used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
xDesc, yDesc: Inputs. Handles to the previously initialized tensor descriptors.
*xData: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc, for the layer's x input data.
*yData: Output. Data pointer to GPU memory associated with the tensor descriptor yDesc, for the y output of the normalization layer.
zDesc, *zData: Inputs. Tensor descriptor and pointer in device memory for residual addition to the result of the normalization operation, prior to the activation. zDesc and *zData are optional and are only used when normOps is CUDNN_NORM_OPS_NORM_ADD_ACTIVATION; otherwise, the user may pass NULL. When in use, z should have exactly the same dimensions as xData and the final output yData. For more information, refer to cudnnTensorDescriptor_t.
normScaleBiasDesc, normScale, normBias: Inputs. Tensor descriptor and pointers in device memory for the normalization scale and bias parameters (in the paper, bias is referred to as beta and scale as gamma). The dimensions for the tensor descriptor depend on the normalization mode.
exponentialAverageFactor: Input. Factor used in the moving average computation as follows:
runningMean = runningMean*(1-factor) + newMean*factor
Use factor = 1/(1+n) at the N-th call to the function to get Cumulative Moving Average (CMA) behavior, such that CMA[n] = (x[1]+...+x[n])/n. This is proved below:
CMA[n+1] = (n*CMA[n] + x[n+1])/(n+1) = ((n+1)*CMA[n] - CMA[n])/(n+1) + x[n+1]/(n+1) = CMA[n]*(1 - 1/(n+1)) + x[n+1]*(1/(n+1)) = CMA[n]*(1-factor) + x[n+1]*factor
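The factor schedule above can be verified numerically with a short Python sketch (illustrative only, not part of the cuDNN API): applying factor = 1/(1+n) at each update reproduces the plain average of the observed means.

```python
def cma_via_factor(samples):
    """Update a running mean with factor = 1/(1+n) at the n-th call
    (n starting at 0), i.e. runningMean = runningMean*(1-factor) + newMean*factor.
    This reproduces the cumulative moving average of the samples."""
    running = 0.0
    for n, new_mean in enumerate(samples):
        factor = 1.0 / (1.0 + n)
        running = running * (1.0 - factor) + new_mean * factor
    return running

result = cma_via_factor([2.0, 4.0, 9.0])
# Matches the plain average (2 + 4 + 9) / 3 = 5.0.
```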
normMeanVarDesc: Input. Tensor descriptor used for the following tensors: resultRunningMean, resultRunningVariance, resultSaveMean, and resultSaveInvVariance.
*resultRunningMean, *resultRunningVariance: Inputs/Outputs. Pointers to the running mean and running variance data. Both pointers can be NULL, but only at the same time. The value stored in resultRunningVariance (or passed as an input in inference mode) is the sample variance and is the moving average of variance[x], where the variance is computed either over batch or spatial+batch dimensions depending on the mode. If these pointers are not NULL, the tensors should be initialized to some reasonable values or to 0.
epsilon: Input. Epsilon value used in the normalization formula. Its value should be equal to or greater than zero.
*resultSaveMean, *resultSaveInvVariance: Outputs. Optional cache parameters containing saved intermediate results computed during the forward pass. For this to work correctly, the layer's x, normScale, and normBias data has to remain unchanged until the backward function is called. Note that both parameters can be NULL, but only at the same time. It is recommended to use this cache since the memory overhead is relatively small.
activationDesc: Input. The descriptor for the activation operation. When the normOps input is set to either CUDNN_NORM_OPS_NORM_ACTIVATION or CUDNN_NORM_OPS_NORM_ADD_ACTIVATION, this activation is used; otherwise, the user may pass NULL.
*workspace, workSpaceSizeInBytes: Inputs. *workspace is a pointer to the GPU workspace, and workSpaceSizeInBytes is the size of the workspace. When *workspace is not NULL and workSpaceSizeInBytes is large enough, and the tensor layout is NHWC and the data type configuration is supported, this function will trigger a semi-persistent NHWC kernel for normalization. The workspace is not required to be clean. Also, the workspace does not need to remain unchanged between the forward and backward passes.
*reserveSpace: Input. Pointer to the GPU workspace for the reserveSpace.
reserveSpaceSizeInBytes: Input. The size of the reserveSpace. It must be equal to or larger than the amount required by cudnnGetNormalizationTrainingReserveSpaceSize().
groupCnt: Input. Only 1 is supported for now.
Returns
CUDNN_STATUS_SUCCESS: The computation was performed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
- One of the pointers alpha, beta, xData, yData, normScale, and normBias is NULL.
- The number of xDesc or yDesc tensor descriptor dimensions is not within the [4,5] range (only 4D and 5D tensors are supported).
- normScaleBiasDesc dimensions are not 1xCx1x1 for 4D and 1xCx1x1x1 for 5D in per-channel mode, or not 1xCxHxW for 4D and 1xCxDxHxW for 5D in per-activation mode.
- Exactly one of the resultSaveMean, resultSaveInvVariance pointers is NULL.
- Exactly one of the resultRunningMean, resultRunningVariance pointers is NULL.
- The epsilon value is less than zero.
- Dimensions or data types mismatch for xDesc or yDesc.
cudnnOpTensor()#
This function has been deprecated in cuDNN 9.0.
This function implements the equation C = op(alpha1[0] * A, alpha2[0] * B) + beta[0] * C, given the tensors A, B, and C and the scaling factors alpha1, alpha2, and beta. The op to use is indicated by the opTensorDesc descriptor, of type cudnnOpTensorDescriptor_t. Currently supported ops are listed by the cudnnOpTensorOp_t enum.
cudnnStatus_t cudnnOpTensor( cudnnHandle_t handle, const cudnnOpTensorDescriptor_t opTensorDesc, const void *alpha1, const cudnnTensorDescriptor_t aDesc, const void *A, const void *alpha2, const cudnnTensorDescriptor_t bDesc, const void *B, const void *beta, const cudnnTensorDescriptor_t cDesc, void *C)
The following restrictions on the input and destination tensors apply:
Each dimension of the input tensor A must match the corresponding dimension of the destination tensor C, and each dimension of the input tensor B must match the corresponding dimension of the destination tensor C or must be equal to 1. In the latter case, the same value from the input tensor B for those dimensions will be used to blend into the C tensor.
[Table: supported data type combinations for Tensor A, Tensor B, and Destination Tensor C]
CUDNN_TENSOR_NCHW_VECT_C is not supported as input tensor format. All tensors up to dimension five (5) are supported. This routine does not support tensor formats beyond these dimensions.
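The blending equation and the broadcasting rule for tensor B described above can be sketched with a small CPU reference in Python (illustrative only, not the cuDNN implementation; shown here for 1-D tensors):

```python
def op_tensor(op, alpha1, A, alpha2, B, beta, C):
    """CPU sketch of C = op(alpha1*A, alpha2*B) + beta*C for 1-D tensors.
    B may have size 1, in which case its single value is broadcast
    against every element of A, mirroring the dimension-equal-to-1 rule."""
    out = []
    for i, a in enumerate(A):
        b = B[i] if len(B) > 1 else B[0]   # broadcast B when its dimension is 1
        out.append(op(alpha1 * a, alpha2 * b) + beta * C[i])
    return out

add = lambda p, q: p + q
mul = lambda p, q: p * q

# B = [10.0] has dimension 1, so it is broadcast across A.
C = op_tensor(add, 1.0, [1.0, 2.0, 3.0], 1.0, [10.0], 0.0, [0.0, 0.0, 0.0])
```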
Parameters
handle: Input. Handle to a previously created cuDNN context.
opTensorDesc: Input. Handle to a previously initialized op tensor descriptor.
alpha1, alpha2, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the source value with the prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
aDesc, bDesc, cDesc: Inputs. Handles to previously initialized tensor descriptors.
A, B: Inputs. Pointers to data of the tensors described by the aDesc and bDesc descriptors, respectively.
C: Input/Output. Pointer to data of the tensor described by the cDesc descriptor.
Returns
CUDNN_STATUS_SUCCESS: The function executed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. Some examples include:
- The dimensions of the bias tensor and the output tensor are above 5.
- opTensorCompType is not set as stated above.
CUDNN_STATUS_BAD_PARAM: The data type of the destination tensor C is unrecognized, or the restrictions on the input and destination tensors, stated above, are not met.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnOpsVersionCheck()#
Cross-library version checker. Each sublibrary has a version checker that checks whether its own version matches that of its dependencies.
cudnnStatus_t cudnnOpsVersionCheck(void)
Returns
CUDNN_STATUS_SUCCESS: The version check passed.
CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH: The versions are inconsistent.
cudnnPoolingBackward()#
This function has been deprecated in cuDNN 9.0.
This function computes the gradient of a pooling operation.
cudnnStatus_t cudnnPoolingBackward( cudnnHandle_t handle, const cudnnPoolingDescriptor_t poolingDesc, const void *alpha, const cudnnTensorDescriptor_t yDesc, const void *y, const cudnnTensorDescriptor_t dyDesc, const void *dy, const cudnnTensorDescriptor_t xDesc, const void *xData, const void *beta, const cudnnTensorDescriptor_t dxDesc, void *dx)
As of cuDNN version 6.0, a deterministic algorithm is implemented for max backwards pooling. This algorithm can be chosen via the pooling mode enum of poolingDesc. The deterministic algorithm has been measured to be up to 50% slower than the legacy max backwards pooling algorithm, or up to 20% faster, depending upon the use case.
Tensor vectorization is not supported for any tensor descriptor arguments in this function. Best performance is expected when using HW-packed tensors. Only 2 and 3 spatial dimensions are supported.
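The gradient of max pooling routes each output gradient back to the input position that produced the maximum. The following Python sketch (illustrative only, not the cuDNN implementation) shows this for a single 1-D row; ties go to the first maximum, which is one deterministic choice:

```python
def max_pool_backward_1d(x, dy, window, stride):
    """CPU sketch of the max-pooling gradient for a 1-D row: each output
    gradient dy[j] is accumulated into the (first) input position that
    produced the max in its window; all other positions receive zero."""
    dx = [0.0] * len(x)
    for j, g in enumerate(dy):
        start = j * stride
        win = x[start:start + window]
        argmax = start + win.index(max(win))   # first max wins on ties
        dx[argmax] += g
    return dx

x = [1.0, 5.0, 2.0, 7.0]
dy = [10.0, 20.0]   # gradients for the outputs max(1,5)=5 and max(2,7)=7
dx = max_pool_backward_1d(x, dy, window=2, stride=2)
```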
Parameters
handle: Input. Handle to a previously created cuDNN context.
poolingDesc: Input. Handle to the previously initialized pooling descriptor.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
yDesc: Input. Handle to the previously initialized input tensor descriptor. Can be NULL for avg pooling.
y: Input. Data pointer to GPU memory associated with the tensor descriptor yDesc. Can be NULL for avg pooling.
dyDesc: Input. Handle to the previously initialized input differential tensor descriptor. Must be of type FLOAT, DOUBLE, HALF, or BFLOAT16. For more information, refer to cudnnDataType_t.
dy: Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.
xDesc: Input. Handle to the previously initialized output tensor descriptor. Can be NULL for avg pooling.
x: Input. Data pointer to GPU memory associated with the output tensor descriptor xDesc. Can be NULL for avg pooling.
dxDesc: Input. Handle to the previously initialized output differential tensor descriptor. Must be of type FLOAT, DOUBLE, HALF, or BFLOAT16. For more information, refer to cudnnDataType_t.
dx: Output. Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
- The dimensions n, c, h, w of the yDesc and dyDesc tensors differ.
- The strides nStride, cStride, hStride, wStride of the yDesc and dyDesc tensors differ.
- The dimensions n, c, h, w of the xDesc and dxDesc tensors differ.
- The strides nStride, cStride, hStride, wStride of the xDesc and dxDesc tensors differ.
- The data types of the four tensors differ.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. Some examples include:
- The wStride of the input tensor or output tensor is not 1.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnPoolingForward()#
This function has been deprecated in cuDNN 9.0.
This function computes pooling of input values (meaning, the maximum or average of several adjacent values) to produce an output with smaller height and/or width.
cudnnStatus_t cudnnPoolingForward( cudnnHandle_t handle, const cudnnPoolingDescriptor_t poolingDesc, const void *alpha, const cudnnTensorDescriptor_t xDesc, const void *x, const void *beta, const cudnnTensorDescriptor_t yDesc, void *y)
All tensor formats are supported; best performance is expected when using HW-packed tensors. Only 2 and 3 spatial dimensions are allowed. Vectorized tensors are only supported if they have 2 spatial dimensions.
The dimensions of the output tensor yDesc can be smaller or bigger than the dimensions advised by the routine cudnnGetPooling2dForwardOutputDim() or cudnnGetPoolingNdForwardOutputDim().
For average pooling, the compute type is float even for integer input and output data types. Output rounding is to nearest-even, and out-of-range results are clamped to the most negative or most positive value of the type.
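The float accumulation, nearest-even rounding, and clamping behavior for integer types can be sketched on the CPU as follows (illustrative Python only, not the cuDNN implementation; shown for a 1-D row with INT8 input/output):

```python
def avg_pool_int8_1d(x, window, stride):
    """CPU sketch of average pooling with INT8 input/output: the sum is
    accumulated in float, the average is rounded to nearest-even, and the
    result is clamped to the INT8 range [-128, 127]."""
    out = []
    n_out = (len(x) - window) // stride + 1
    for j in range(n_out):
        s = sum(float(v) for v in x[j * stride : j * stride + window]) / window
        r = round(s)                       # Python's round() is round-half-to-even
        out.append(max(-128, min(127, r))) # clamp to the INT8 range
    return out

vals = avg_pool_int8_1d([3, 4, 120, 125], window=2, stride=2)
# avg(3,4) = 3.5 rounds (nearest-even) to 4; avg(120,125) = 122.5 rounds to 122
```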
Parameters
handle: Input. Handle to a previously created cuDNN context.
poolingDesc: Input. Handle to a previously initialized pooling descriptor.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
xDesc: Input. Handle to the previously initialized input tensor descriptor. Must be of type FLOAT, DOUBLE, HALF, INT8, INT8x4, INT8x32, or BFLOAT16. For more information, refer to cudnnDataType_t.
x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc: Input. Handle to the previously initialized output tensor descriptor. Must be of type FLOAT, DOUBLE, HALF, INT8, INT8x4, INT8x32, or BFLOAT16. For more information, refer to cudnnDataType_t.
y: Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
- The dimensions n, c of the input and output tensors differ.
- The data types of the input and output tensors differ.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnReduceTensor()#
This function has been deprecated in cuDNN 9.0.
This function reduces tensor A by implementing the equation C = alpha * reduce op ( A ) + beta * C, given tensors A and C and scaling factors alpha and beta. The reduction op to use is indicated by the descriptor reduceTensorDesc. Currently-supported ops are listed by the cudnnReduceTensorOp_t enum.
cudnnStatus_t cudnnReduceTensor( cudnnHandle_t handle, const cudnnReduceTensorDescriptor_t reduceTensorDesc, void *indices, size_t indicesSizeInBytes, void *workspace, size_t workspaceSizeInBytes, const void *alpha, const cudnnTensorDescriptor_t aDesc, const void *A, const void *beta, const cudnnTensorDescriptor_t cDesc, void *C)
Each dimension of the output tensor C must match the corresponding dimension of the input tensor A or must be equal to 1. The dimensions equal to 1 indicate the dimensions of A to be reduced.
The implementation will generate indices for the min and max ops only, as indicated by the cudnnReduceTensorIndices_t enum of the reduceTensorDesc. Requesting indices for the other reduction ops results in an error. The data type of the indices is indicated by the cudnnIndicesType_t enum; currently only the 32-bit (unsigned int) type is supported.
The indices returned by the implementation are not absolute indices but relative to the dimensions being reduced. The indices are also flattened, meaning, not coordinate tuples.
The data types of the tensors A and C must match if they are of type double. In this case, alpha and beta and the computation enum of reduceTensorDesc are all assumed to be of type double.
The HALF and INT8 data types may be mixed with the FLOAT data types. In these cases, the computation enum of reduceTensorDesc is required to be of type FLOAT.
Up to dimension 8, all tensor formats are supported. Beyond those dimensions, this routine is not supported.
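The index semantics described above (indices only for min/max ops, flattened and relative to the reduced dimensions rather than absolute) can be illustrated with a small CPU sketch in Python (not the cuDNN implementation; shown for a 2-D tensor reduced over its innermost dimension):

```python
def reduce_max_with_indices(A):
    """CPU sketch of a max reduction over the innermost dimension of a
    2-D tensor A (that output dimension is reduced to 1). The returned
    indices are relative to the reduced dimension and flattened, not
    absolute coordinate tuples."""
    C, idx = [], []
    for row in A:
        m = max(row)
        C.append(m)
        idx.append(row.index(m))   # position within the reduced dimension
    return C, idx

C, idx = reduce_max_with_indices([[3.0, 9.0, 1.0], [8.0, 2.0, 5.0]])
```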
Parameters
handle: Input. Handle to a previously created cuDNN context.
reduceTensorDesc: Input. Handle to a previously initialized reduce tensor descriptor.
indices: Output. Handle to a previously allocated space for writing indices.
indicesSizeInBytes: Input. Size of the above previously allocated space.
workspace: Input. Handle to a previously allocated space for the reduction implementation.
workspaceSizeInBytes: Input. Size of the above previously allocated space.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the source value with the prior value in the destination tensor as follows: dstValue = alpha[0]*resultValue + beta[0]*priorDstValue. For more information, refer to Scaling Parameters.
aDesc, cDesc: Inputs. Handles to previously initialized tensor descriptors.
A: Input. Pointer to data of the tensor described by the aDesc descriptor.
C: Input/Output. Pointer to data of the tensor described by the cDesc descriptor.
Returns
CUDNN_STATUS_SUCCESS: The function executed successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. Some examples include:
- The dimensions of the input tensor and the output tensor are above 8.
- reduceTensorCompType is not set as stated above.
CUDNN_STATUS_BAD_PARAM: The corresponding dimensions of the input and output tensors all match (leaving nothing to reduce), or the conditions in the above paragraphs are not met.
CUDNN_STATUS_INVALID_VALUE: The allocations for the indices or workspace are insufficient.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnRestoreDropoutDescriptor()#
This function restores a dropout descriptor to a previously saved-off state.
cudnnStatus_t cudnnRestoreDropoutDescriptor( cudnnDropoutDescriptor_t dropoutDesc, cudnnHandle_t handle, float dropout, void *states, size_t stateSizeInBytes, unsigned long long seed)
Parameters
dropoutDesc: Input/Output. Previously created dropout descriptor.
handle: Input. Handle to a previously created cuDNN context.
dropout: Input. Probability with which the value from an input tensor is set to 0 when performing dropout.
states: Input. Pointer to GPU memory that holds random number generator states initialized by a prior call to cudnnSetDropoutDescriptor().
stateSizeInBytes: Input. Size in bytes of the buffer holding the random number generator states.
seed: Input. Seed used in prior calls to cudnnSetDropoutDescriptor() that initialized the states buffer. Using a different seed from this has no effect. A change of seed, and a subsequent update to the random number generator states, can be achieved by calling cudnnSetDropoutDescriptor().
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_INVALID_VALUE: The states buffer size (as indicated in stateSizeInBytes) is too small.
cudnnScaleTensor()#
This function has been deprecated in cuDNN 9.0.
This function scales all the elements of a tensor by a given factor.
cudnnStatus_t cudnnScaleTensor( cudnnHandle_t handle, const cudnnTensorDescriptor_t yDesc, void *y, const void *alpha)
Parameters
handle: Input. Handle to a previously created cuDNN context.
yDesc: Input. Handle to a previously initialized tensor descriptor.
y: Input/Output. Pointer to data of the tensor described by the yDesc descriptor.
alpha: Input. Pointer in host memory to a single value by which all elements of the tensor will be scaled. For more information, refer to Scaling Parameters.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: One of the provided pointers is NULL.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
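The effect of cudnnScaleTensor() can be sketched on the host as an in-place element-wise multiply by the single value alpha[0]. This illustrative C function is an assumption-free model of the semantics only, not the cuDNN implementation, which operates on GPU memory:

```c
#include <stddef.h>

/* Host-side model of cudnnScaleTensor(): every element of y is
 * multiplied by the single host value alpha[0]. */
static void scale_tensor_host(float *y, size_t n, const float *alpha)
{
    for (size_t i = 0; i < n; ++i)
        y[i] *= alpha[0];
}
```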
cudnnSetActivationDescriptor()#
This function initializes a previously created generic activation descriptor object.
cudnnStatus_t cudnnSetActivationDescriptor( cudnnActivationDescriptor_t activationDesc, cudnnActivationMode_t mode, cudnnNanPropagation_t reluNanOpt, double coef)
Parameters
activationDesc: Input/Output. Handle to a previously created activation descriptor.
mode: Input. Enumerant to specify the activation mode.
reluNanOpt: Input. Enumerant to specify the NaN propagation mode.
coef: Input. Floating point number. When the activation mode (refer to cudnnActivationMode_t) is set to CUDNN_ACTIVATION_CLIPPED_RELU, this input specifies the clipping threshold; when the activation mode is set to CUDNN_ACTIVATION_RELU, this input specifies the upper bound.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: mode or reluNanOpt has an invalid enumerant value.
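The role of the coef argument for CUDNN_ACTIVATION_CLIPPED_RELU can be illustrated with a host-side model: values are clamped to the range [0, coef]. This sketch is illustrative only and is not the cuDNN implementation:

```c
/* Host-side model of clipped ReLU as configured through the coef
 * argument of cudnnSetActivationDescriptor(): clamp x to [0, coef]. */
static double clipped_relu(double x, double coef)
{
    if (x < 0.0)  return 0.0;
    if (x > coef) return coef;
    return x;
}
```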
cudnnSetActivationDescriptorSwishBeta()#
This function sets the beta parameter of the SWISH activation function to swish_beta.
cudnnStatus_t cudnnSetActivationDescriptorSwishBeta(cudnnActivationDescriptor_t activationDesc, double swish_beta)
Parameters
activationDesc: Input/Output. Handle to a previously created activation descriptor.
swish_beta: Input. The value to which the SWISH activation's beta parameter is set.
Returns
CUDNN_STATUS_SUCCESS: The value was set successfully.
CUDNN_STATUS_BAD_PARAM: The activation descriptor is a NULL pointer.
cudnnSetDropoutDescriptor()#
This function initializes a previously created dropout descriptor object. If the states argument is equal to NULL, then the random number generator states won’t be initialized, and only the dropout value will be set. The user is expected not to change the memory pointed at by states for the duration of the computation.
cudnnStatus_t cudnnSetDropoutDescriptor( cudnnDropoutDescriptor_t dropoutDesc, cudnnHandle_t handle, float dropout, void *states, size_t stateSizeInBytes, unsigned long long seed)
When the states argument is not NULL, a cuRAND initialization kernel is invoked by cudnnSetDropoutDescriptor(). This kernel requires a substantial amount of GPU memory for the stack. Memory is released when the kernel finishes. The CUDNN_STATUS_ALLOC_FAILED status is returned when no sufficient free memory is available for the GPU stack.
Parameters
dropoutDesc: Input/Output. Previously created dropout descriptor object.
handle: Input. Handle to a previously created cuDNN context.
dropout: Input. The probability with which the value from the input is set to zero during the dropout layer.
states: Output. Pointer to user-allocated GPU memory that will hold the random number generator states.
stateSizeInBytes: Input. Specifies the size in bytes of the provided memory for the states.
seed: Input. Seed used to initialize the random number generator states.
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_INVALID_VALUE: The stateSizeInBytes argument is less than the value returned by cudnnDropoutGetStatesSize().
CUDNN_STATUS_ALLOC_FAILED: The function failed to temporarily extend the GPU stack.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
CUDNN_STATUS_INTERNAL_ERROR: Internally used CUDA functions returned an error status.
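The seed semantics above (same seed and states reproduce the same mask) can be modeled on the host with inverted dropout: each element is zeroed with probability dropout, and survivors are scaled by 1/(1-dropout). This sketch uses a toy xorshift PRNG rather than cuRAND and is illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy xorshift PRNG standing in for the cuRAND states; the real
 * generator states live in the GPU buffer passed as `states`. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return *s = x;
}

/* Host-side model of inverted dropout as configured by
 * cudnnSetDropoutDescriptor(): zero with probability `dropout`,
 * scale survivors by 1/(1-dropout); same seed => same mask. */
static void dropout_forward(const float *x, float *y, size_t n,
                            float dropout, uint64_t seed)
{
    uint64_t state = seed ? seed : 1u;
    for (size_t i = 0; i < n; ++i) {
        /* uniform value in [0,1) from the top 53 bits */
        double u = (double)(xorshift64(&state) >> 11) / 9007199254740992.0;
        y[i] = (u < dropout) ? 0.0f : x[i] / (1.0f - dropout);
    }
}
```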
cudnnSetFilter4dDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function initializes a previously created filter descriptor object into a 4D filter. The layout of the filters must be contiguous in memory.
cudnnStatus_t cudnnSetFilter4dDescriptor( cudnnFilterDescriptor_t filterDesc, cudnnDataType_t dataType, cudnnTensorFormat_t format, int k, int c, int h, int w)
Tensor format CUDNN_TENSOR_NHWC has limited support in cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter().
Parameters
filterDesc: Input/Output. Handle to a previously created filter descriptor.
dataType: Input. Data type.
format: Input. Type of the filter layout format. If this input is set to CUDNN_TENSOR_NCHW, which is one of the enumerant values allowed by the cudnnTensorFormat_t descriptor, then the layout of the filter is in the form of KCRS, where:
- K represents the number of output feature maps
- C is the number of input feature maps
- R is the number of rows per filter
- S is the number of columns per filter
If this input is set to CUDNN_TENSOR_NHWC, then the layout of the filter is in the form of KRSC.
k: Input. Number of output feature maps.
c: Input. Number of input feature maps.
h: Input. Height of each filter.
w: Input. Width of each filter.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the parameters k, c, h, w is negative, or dataType or format has an invalid enumerant value.
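The KCRS and KRSC layouts can be made concrete by computing the flat offset of filter element (k, c, r, s), assuming the fully packed layouts described above (the helper names are illustrative, not cuDNN API):

```c
/* Flat offset of filter element (k, c, r, s) in a fully packed
 * CUDNN_TENSOR_NCHW filter (KCRS layout): s varies fastest. */
static int kcrs_offset(int k, int c, int r, int s, int C, int R, int S)
{
    return ((k * C + c) * R + r) * S + s;
}

/* Same element in a fully packed CUDNN_TENSOR_NHWC filter
 * (KRSC layout): c varies fastest. */
static int krsc_offset(int k, int c, int r, int s, int C, int R, int S)
{
    return ((k * R + r) * S + s) * C + c;
}
```

In both layouts, k has the largest stride (C*R*S); the formats differ only in which of c or s is innermost.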
cudnnSetFilterNdDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function initializes a previously created filter descriptor object. The layout of the filters must be contiguous in memory.
cudnnStatus_t cudnnSetFilterNdDescriptor( cudnnFilterDescriptor_t filterDesc, cudnnDataType_t dataType, cudnnTensorFormat_t format, int nbDims, const int filterDimA[])
The tensor format CUDNN_TENSOR_NHWC has limited support in cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter().
Parameters
filterDesc: Input/Output. Handle to a previously created filter descriptor.
dataType: Input. Data type.
format: Input. Type of the filter layout format. If this input is set to CUDNN_TENSOR_NCHW, which is one of the enumerant values allowed by the cudnnTensorFormat_t descriptor, then the layout of the filter is as follows:
- For N=4, a 4D filter descriptor, the filter layout is in the form of KCRS, where:
  - K represents the number of output feature maps
  - C is the number of input feature maps
  - R is the number of rows per filter
  - S is the number of columns per filter
- For N=3, a 3D filter descriptor, the number S (number of columns per filter) is omitted.
- For N=5 and greater, the layout of the higher dimensions immediately follows RS.
On the other hand, if this input is set to CUDNN_TENSOR_NHWC, then the layout of the filter is as follows:
- For N=4, a 4D filter descriptor, the filter layout is in the form of KRSC.
- For N=3, a 3D filter descriptor, the number S (number of columns per filter) is omitted, and the layout of C immediately follows R.
- For N=5 and greater, the layout of the higher dimensions is inserted between S and C.
nbDims: Input. Dimension of the filter.
filterDimA: Input. Array of dimension nbDims containing the size of the filter for each dimension.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the elements of the array filterDimA is negative, or dataType or format has an invalid enumerant value.
CUDNN_STATUS_NOT_SUPPORTED: The parameter nbDims exceeds CUDNN_DIM_MAX.
cudnnSetLRNDescriptor()#
This function initializes a previously created LRN descriptor object.
cudnnStatus_t cudnnSetLRNDescriptor( cudnnLRNDescriptor_t normDesc, unsigned lrnN, double lrnAlpha, double lrnBeta, double lrnK)
Note: The macros CUDNN_LRN_MIN_N, CUDNN_LRN_MAX_N, CUDNN_LRN_MIN_K, and CUDNN_LRN_MIN_BETA defined in cudnn.h specify valid ranges for the parameters. Values of double parameters will be cast down to the tensor data type during computation.
Parameters
normDesc: Output. Handle to a previously created LRN descriptor.
lrnN: Input. Normalization window width in elements. The LRN layer uses a window [center-lookBehind, center+lookAhead], where lookBehind = floor((lrnN-1)/2) and lookAhead = lrnN-lookBehind-1. So for lrnN=10, the window is [k-4...k...k+5] with a total of 10 samples. For the DivisiveNormalization layer, the window has the same extent as above in all spatial dimensions (dimA[2], dimA[3], dimA[4]). By default, lrnN is set to 5 in cudnnCreateLRNDescriptor().
lrnAlpha: Input. Value of the alpha variance scaling parameter in the normalization formula. Inside the library code, this value is divided by the window width for LRN and by (window width)^#spatialDimensions for DivisiveNormalization. By default, this value is set to 1e-4 in cudnnCreateLRNDescriptor().
lrnBeta: Input. Value of the beta power parameter in the normalization formula. By default, this value is set to 0.75 in cudnnCreateLRNDescriptor().
lrnK: Input. Value of the k parameter in the normalization formula. By default, this value is set to 2.0.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: One of the input parameters was outside the valid range described above.
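The lrnN window arithmetic above is easy to get wrong for even window widths, so it helps to compute it explicitly (illustrative helper, not a cuDNN function):

```c
/* Window extents for an LRN window of width lrnN, as described for
 * cudnnSetLRNDescriptor(): the layer normalizes over
 * [center - lookBehind, center + lookAhead]. */
static void lrn_window(unsigned lrnN, int *lookBehind, int *lookAhead)
{
    *lookBehind = (int)((lrnN - 1) / 2);       /* floor((lrnN-1)/2) */
    *lookAhead  = (int)lrnN - *lookBehind - 1;
}
```

For even widths the window is asymmetric: lrnN=10 gives [k-4, k+5], while the default lrnN=5 gives the symmetric [k-2, k+2].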
cudnnSetOpTensorDescriptor()#
This function initializes a tensor pointwise math descriptor.
cudnnStatus_t cudnnSetOpTensorDescriptor( cudnnOpTensorDescriptor_t opTensorDesc, cudnnOpTensorOp_t opTensorOp, cudnnDataType_t opTensorCompType, cudnnNanPropagation_t opTensorNanOpt)
Parameters
opTensorDesc: Output. Pointer to the structure holding the description of the tensor pointwise math descriptor.
opTensorOp: Input. Tensor pointwise math operation for this tensor pointwise math descriptor.
opTensorCompType: Input. Computation datatype for this tensor pointwise math descriptor.
opTensorNanOpt: Input. NaN propagation policy.
Returns
CUDNN_STATUS_SUCCESS: The function returned successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the input parameters passed is invalid.
cudnnSetPooling2dDescriptor()#
This function initializes a previously created generic pooling descriptor object into a 2D description.
cudnnStatus_t cudnnSetPooling2dDescriptor( cudnnPoolingDescriptor_t poolingDesc, cudnnPoolingMode_t mode, cudnnNanPropagation_t maxpoolingNanOpt, int windowHeight, int windowWidth, int verticalPadding, int horizontalPadding, int verticalStride, int horizontalStride)
Parameters
poolingDesc: Input/Output. Handle to a previously created pooling descriptor.
mode: Input. Enumerant to specify the pooling mode.
maxpoolingNanOpt: Input. Enumerant to specify the NaN propagation mode.
windowHeight: Input. Height of the pooling window.
windowWidth: Input. Width of the pooling window.
verticalPadding: Input. Size of vertical padding.
horizontalPadding: Input. Size of horizontal padding.
verticalStride: Input. Pooling vertical stride.
horizontalStride: Input. Pooling horizontal stride.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the parameters windowHeight, windowWidth, verticalStride, horizontalStride is negative, or mode or maxpoolingNanOpt has an invalid enumerant value.
cudnnSetPoolingNdDescriptor()#
This function initializes a previously created generic pooling descriptor object.
cudnnStatus_t cudnnSetPoolingNdDescriptor( cudnnPoolingDescriptor_t poolingDesc, const cudnnPoolingMode_t mode, const cudnnNanPropagation_t maxpoolingNanOpt, int nbDims, const int windowDimA[], const int paddingA[], const int strideA[])
Parameters
poolingDesc: Input/Output. Handle to a previously created pooling descriptor.
mode: Input. Enumerant to specify the pooling mode.
maxpoolingNanOpt: Input. Enumerant to specify the NaN propagation mode.
nbDims: Input. Dimension of the pooling operation. Must be greater than zero.
windowDimA: Input. Array of dimension nbDims containing the window size for each dimension. The value of array elements must be greater than zero.
paddingA: Input. Array of dimension nbDims containing the padding size for each dimension. Negative padding is allowed.
strideA: Input. Array of dimension nbDims containing the striding size for each dimension. The value of array elements must be greater than zero (meaning, negative striding size is not allowed).
Returns
CUDNN_STATUS_SUCCESS: The object was initialized successfully.
CUDNN_STATUS_NOT_SUPPORTED: nbDims > CUDNN_DIM_MAX-2.
CUDNN_STATUS_BAD_PARAM: Either nbDims, or at least one of the elements of the arrays windowDimA or strideA, is negative, or mode or maxpoolingNanOpt has an invalid enumerant value.
cudnnSetReduceTensorDescriptor()#
This function initializes a previously created reduce tensor descriptor object.
cudnnStatus_t cudnnSetReduceTensorDescriptor( cudnnReduceTensorDescriptor_t reduceTensorDesc, cudnnReduceTensorOp_t reduceTensorOp, cudnnDataType_t reduceTensorCompType, cudnnNanPropagation_t reduceTensorNanOpt, cudnnReduceTensorIndices_t reduceTensorIndices, cudnnIndicesType_t reduceTensorIndicesType)
Parameters
reduceTensorDesc: Input/Output. Handle to a previously created reduce tensor descriptor.
reduceTensorOp: Input. Enumerant to specify the reduce tensor operation.
reduceTensorCompType: Input. Enumerant to specify the computation datatype of the reduction.
reduceTensorNanOpt: Input. Enumerant to specify the NaN propagation mode.
reduceTensorIndices: Input. Enumerant to specify the reduced tensor indices.
reduceTensorIndicesType: Input. Enumerant to specify the reduced tensor indices type.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: reduceTensorDesc is NULL, or reduceTensorOp, reduceTensorCompType, reduceTensorNanOpt, reduceTensorIndices, or reduceTensorIndicesType has an invalid enumerant value.
cudnnSetSpatialTransformerNdDescriptor()#
This function initializes a previously created generic spatial transformer descriptor object.
cudnnStatus_t cudnnSetSpatialTransformerNdDescriptor( cudnnSpatialTransformerDescriptor_t stDesc, cudnnSamplerType_t samplerType, cudnnDataType_t dataType, const int nbDims, const int dimA[])
Parameters
stDesc: Input/Output. Previously created spatial transformer descriptor object.
samplerType: Input. Enumerant to specify the sampler type.
dataType: Input. Data type.
nbDims: Input. Dimension of the transformed tensor.
dimA: Input. Array of dimension nbDims containing the size of the transformed tensor for every dimension.
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
Either stDesc or dimA is NULL.
Either dataType or samplerType has an invalid enumerant value.
cudnnSetTensor()#
This function sets all the elements of a tensor to a given value.
cudnnStatus_t cudnnSetTensor( cudnnHandle_t handle, const cudnnTensorDescriptor_t yDesc, void *y, const void *valuePtr)
Parameters
handle: Input. Handle to a previously created cuDNN context.
yDesc: Input. Handle to a previously initialized tensor descriptor.
y: Input/Output. Pointer to data of the tensor described by the yDesc descriptor.
valuePtr: Input. Pointer in host memory to a single value. All elements of the y tensor will be set to value[0]. The data type of the element in value[0] has to match the data type of tensor y.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: One of the provided pointers is NULL.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnSetTensor4dDescriptor()#
This function initializes a previously created generic tensor descriptor object into a 4D tensor. The strides of the four dimensions are inferred from the format parameter and set in such a way that the data is contiguous in memory with no padding between dimensions.
cudnnStatus_t cudnnSetTensor4dDescriptor( cudnnTensorDescriptor_t tensorDesc, cudnnTensorFormat_t format, cudnnDataType_t dataType, int n, int c, int h, int w)
The total size of a tensor including the potential padding between dimensions is limited to 2 Giga-elements of type datatype.
Parameters
tensorDesc: Input/Output. Handle to a previously created tensor descriptor.
format: Input. Type of format.
dataType: Input. Data type.
n: Input. Number of images.
c: Input. Number of feature maps per image.
h: Input. Height of each feature map.
w: Input. Width of each feature map.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the parameters n, c, h, w was negative, or format or dataType has an invalid enumerant value.
CUDNN_STATUS_NOT_SUPPORTED: The total size of the tensor descriptor exceeds the maximum limit of 2 Giga-elements.
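The strides that this function infers from the format argument correspond to fully packed layouts. The following C helpers (illustrative only, not cuDNN API) show the packed strides for the two common formats:

```c
/* Packed strides inferred for a 4D NCHW tensor (n, c, h, w):
 * w is innermost, so its stride is 1. The n argument does not
 * affect the strides of a packed tensor. */
static void nchw_strides(int n, int c, int h, int w,
                         int *ns, int *cs, int *hs, int *ws)
{
    (void)n;
    *ws = 1; *hs = w; *cs = h * w; *ns = c * h * w;
}

/* Packed strides for the same dimensions in NHWC: c is innermost. */
static void nhwc_strides(int n, int c, int h, int w,
                         int *ns, int *cs, int *hs, int *ws)
{
    (void)n;
    *cs = 1; *ws = c; *hs = w * c; *ns = h * w * c;
}
```

cudnnSetTensor4dDescriptorEx() accepts these strides explicitly, which also allows non-packed layouts with gaps between dimensions.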
cudnnSetTensor4dDescriptorEx()#
This function initializes a previously created generic tensor descriptor object into a 4D tensor, similarly to cudnnSetTensor4dDescriptor() but with the strides explicitly passed as parameters. This can be used to lay out the 4D tensor in any order or simply to define gaps between dimensions.
cudnnStatus_t cudnnSetTensor4dDescriptorEx( cudnnTensorDescriptor_t tensorDesc, cudnnDataType_t dataType, int n, int c, int h, int w, int nStride, int cStride, int hStride, int wStride)
At present, some cuDNN routines have limited support for strides. Those routines will return CUDNN_STATUS_NOT_SUPPORTED if a 4D tensor object with an unsupported stride is used. cudnnTransformTensor() can be used to convert the data to a supported layout.
The total size of a tensor including the potential padding between dimensions is limited to 2 Giga-elements of type datatype.
Parameters
tensorDesc: Input/Output. Handle to a previously created tensor descriptor.
dataType: Input. Data type.
n: Input. Number of images.
c: Input. Number of feature maps per image.
h: Input. Height of each feature map.
w: Input. Width of each feature map.
nStride: Input. Stride between two consecutive images.
cStride: Input. Stride between two consecutive feature maps.
hStride: Input. Stride between two consecutive rows.
wStride: Input. Stride between two consecutive columns.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the parameters n, c, h, w, nStride, cStride, hStride, wStride is negative, or dataType has an invalid enumerant value.
CUDNN_STATUS_NOT_SUPPORTED: The total size of the tensor descriptor exceeds the maximum limit of 2 Giga-elements.
cudnnSetTensorNdDescriptor()#
This function initializes a previously created generic tensor descriptor object.
cudnnStatus_t cudnnSetTensorNdDescriptor( cudnnTensorDescriptor_t tensorDesc, cudnnDataType_t dataType, int nbDims, const int dimA[], const int strideA[])
The total size of a tensor including the potential padding between dimensions is limited to 2 Giga-elements of type datatype. Tensors are restricted to having at least 4 dimensions, and at most CUDNN_DIM_MAX dimensions (defined in cudnn.h). When working with lower dimensional data, it is recommended that the user create a 4D tensor, and set the size along unused dimensions to 1.
Parameters
tensorDesc: Input/Output. Handle to a previously created tensor descriptor.
dataType: Input. Data type.
nbDims: Input. Dimension of the tensor. Do not use 2 dimensions. Due to historical reasons, the minimum number of dimensions in the filter descriptor is three.
dimA: Input. Array of dimension nbDims that contains the size of the tensor for every dimension. The size along unused dimensions should be set to 1. By convention, the ordering of dimensions in the array follows the format [N, C, D, H, W], with W occupying the smallest index in the array.
strideA: Input. Array of dimension nbDims that contains the stride of the tensor for every dimension. By convention, the ordering of the strides in the array follows the format [Nstride, Cstride, Dstride, Hstride, Wstride], with Wstride occupying the smallest index in the array.
Returns
CUDNN_STATUS_SUCCESS: The object was set successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the elements of the array dimA was negative or zero, or dataType has an invalid enumerant value.
CUDNN_STATUS_NOT_SUPPORTED: The parameter nbDims is outside the range [4, CUDNN_DIM_MAX], or the total size of the tensor descriptor exceeds the maximum limit of 2 Giga-elements.
cudnnSetTensorNdDescriptorEx()#
This function initializes an Nd tensor descriptor.
cudnnStatus_t cudnnSetTensorNdDescriptorEx( cudnnTensorDescriptor_t tensorDesc, cudnnTensorFormat_t format, cudnnDataType_t dataType, int nbDims, const int dimA[])
Parameters
tensorDesc: Output. Pointer to the tensor descriptor struct to be initialized.
format: Input. Tensor format.
dataType: Input. Tensor data type.
nbDims: Input. Dimension of the tensor. Do not use 2 dimensions. Due to historical reasons, the minimum number of dimensions in the filter descriptor is three.
dimA: Input. Array containing the size of each dimension.
Returns
CUDNN_STATUS_SUCCESS: The function was successful.
CUDNN_STATUS_BAD_PARAM: The tensor descriptor was not allocated properly, or the input parameters are not set correctly.
CUDNN_STATUS_NOT_SUPPORTED: The dimension size requested is larger than the maximum dimension size supported.
cudnnSetTensorTransformDescriptor()#
This function has been deprecated in cuDNN 9.0.
This function initializes a tensor transform descriptor that was previously created using the cudnnCreateTensorTransformDescriptor() function.
cudnnStatus_t cudnnSetTensorTransformDescriptor( cudnnTensorTransformDescriptor_t transformDesc, const uint32_t nbDims, const cudnnTensorFormat_t destFormat, const int32_t padBeforeA[], const int32_t padAfterA[], const uint32_t foldA[], const cudnnFoldingDirection_t direction);
Parameters
transformDesc: Output. The tensor transform descriptor to be initialized.
nbDims: Input. The dimensionality of the transform operands. Must be greater than 2. For more information, refer to Tensor Descriptor.
destFormat: Input. The desired destination format.
padBeforeA[]: Input. An array that contains the amount of padding that should be added before each dimension. Set to NULL for no padding.
padAfterA[]: Input. An array that contains the amount of padding that should be added after each dimension. Set to NULL for no padding.
foldA[]: Input. An array that contains the folding parameters for each spatial dimension (dimensions 2 and up). Set to NULL for no folding.
direction: Input. Selects folding or unfolding. This input has no effect when the folding parameters are all <= 1. For more information, refer to cudnnFoldingDirection_t.
Returns
CUDNN_STATUS_SUCCESS: The function was launched successfully.
CUDNN_STATUS_BAD_PARAM: The parameter transformDesc is NULL, direction is invalid, or nbDims is <= 2.
CUDNN_STATUS_NOT_SUPPORTED: The dimension size requested is larger than the maximum dimension size supported (meaning, one of the nbDims is larger than CUDNN_DIM_MAX), or destFormat is something other than NCHW or NHWC.
cudnnSoftmaxBackward()#
This routine computes the gradient of the softmax function.
cudnnStatus_t cudnnSoftmaxBackward( cudnnHandle_t handle, cudnnSoftmaxAlgorithm_t algorithm, cudnnSoftmaxMode_t mode, const void *alpha, const cudnnTensorDescriptor_t yDesc, const void *yData, const cudnnTensorDescriptor_t dyDesc, const void *dy, const void *beta, const cudnnTensorDescriptor_t dxDesc, void *dx)
In-place operation is allowed for this routine; meaning, dy and dx pointers may be equal. However, this requires dyDesc and dxDesc descriptors to be identical (particularly, the strides of the input and output must match for in-place operation to be allowed).
All tensor formats are supported for all modes and algorithms with 4D and 5D tensors. Performance is expected to be highest with NCHW fully-packed tensors. Tensors with more than 5 dimensions must be packed in their spatial dimensions.
Parameters
handle: Input. Handle to a previously created cuDNN context.
algorithm: Input. Enumerant to specify the softmax algorithm.
mode: Input. Enumerant to specify the softmax mode.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the computation result with the prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
yDesc: Input. Handle to the previously initialized input tensor descriptor.
y: Input. Data pointer to GPU memory associated with the tensor descriptor yDesc.
dyDesc: Input. Handle to the previously initialized input differential tensor descriptor.
dy: Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.
dxDesc: Input. Handle to the previously initialized output differential tensor descriptor.
dx: Output. Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
The dimensions n, c, h, w of the yDesc, dyDesc, and dxDesc tensors differ.
The strides nStride, cStride, hStride, wStride of the yDesc and dyDesc tensors differ.
The datatype of the three tensors differs.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnSoftmaxForward()#
This routine computes the softmax function.
cudnnStatus_t cudnnSoftmaxForward( cudnnHandle_t handle, cudnnSoftmaxAlgorithm_t algorithm, cudnnSoftmaxMode_t mode, const void *alpha, const cudnnTensorDescriptor_t xDesc, const void *x, const void *beta, const cudnnTensorDescriptor_t yDesc, void *y)
In-place operation is allowed for this routine; meaning, x and y pointers may be equal. However, this requires xDesc and yDesc descriptors to be identical (particularly, the strides of the input and output must match for in-place operation to be allowed).
All tensor formats are supported for all modes and algorithms with 4D and 5D tensors. Performance is expected to be highest with NCHW fully-packed tensors. Tensors with more than 5 dimensions must be packed in their spatial dimensions.
Parameters
handle: Input. Handle to a previously created cuDNN context.
algorithm: Input. Enumerant to specify the softmax algorithm.
mode: Input. Enumerant to specify the softmax mode.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the computation result with the prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc: Input. Handle to the previously initialized input tensor descriptor.
x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc: Input. Handle to the previously initialized output tensor descriptor.
y: Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
The dimensions n, c, h, w of the input and output tensors differ.
The datatype of the input and output tensors differs.
The parameters algorithm or mode have an invalid enumerant value.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnSpatialTfGridGeneratorBackward()#
This function computes the gradient of a grid generation operation.
cudnnStatus_t cudnnSpatialTfGridGeneratorBackward( cudnnHandle_t handle, const cudnnSpatialTransformerDescriptor_t stDesc, const void *dgrid, void *dtheta)
Only 2D transformation is supported.
Parameters
handle: Input. Handle to a previously created cuDNN context.
stDesc: Input. Previously created spatial transformer descriptor object.
dgrid: Input. Data pointer to GPU memory containing the input differential data.
dtheta: Output. Data pointer to GPU memory containing the output differential data.
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
handle is NULL.
One of the parameters dgrid or dtheta is NULL.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. Some examples include:
The dimension of the transformed tensor specified in stDesc is greater than 4.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
cudnnSpatialTfGridGeneratorForward()#
This function generates a grid of coordinates in the input tensor corresponding to each pixel from the output tensor.
cudnnStatus_t cudnnSpatialTfGridGeneratorForward( cudnnHandle_t handle, const cudnnSpatialTransformerDescriptor_t stDesc, const void *theta, void *grid)
Only 2D transformation is supported.
Parameters
handle: Input. Handle to a previously created cuDNN context.
stDesc: Input. Previously created spatial transformer descriptor object.
theta: Input. Affine transformation matrix. It should be of size n*2*3 for a 2D transformation, where n is the number of images specified in stDesc.
grid: Output. A grid of coordinates. It is of size n*h*w*2 for a 2D transformation, where n, h, w are specified in stDesc. In the 4th dimension, the first coordinate is x, and the second coordinate is y.
Returns
CUDNN_STATUS_SUCCESS: The call was successful.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions is met:
handle is NULL.
One of the parameters grid or theta is NULL.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. Some examples include:
The dimension of the transformed tensor specified in stDesc is greater than 4.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
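The per-pixel computation behind the grid generator is a 2x3 affine map from target to source coordinates. The sketch below models that map on the host; the normalized-coordinate convention of the generated grid is an assumption here, and the helper is illustrative only:

```c
/* Host-side model of the 2D affine mapping applied per output pixel:
 * a 2x3 matrix theta (row-major: [a b tx; c d ty]) maps normalized
 * target coordinates (xt, yt) to source coordinates (xs, ys). */
static void affine_map(const double theta[6], double xt, double yt,
                       double *xs, double *ys)
{
    *xs = theta[0] * xt + theta[1] * yt + theta[2];
    *ys = theta[3] * xt + theta[4] * yt + theta[5];
}
```

The identity matrix {1,0,0, 0,1,0} maps each pixel to itself, so the sampler reproduces the input; the last column of each row acts as a translation.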
cudnnSpatialTfSamplerBackward()#
This function computes the gradient of a sampling operation.
cudnnStatus_t cudnnSpatialTfSamplerBackward( cudnnHandle_t handle, const cudnnSpatialTransformerDescriptor_t stDesc, const void *alpha, const cudnnTensorDescriptor_t xDesc, const void *x, const void *beta, const cudnnTensorDescriptor_t dxDesc, void *dx, const void *alphaDgrid, const cudnnTensorDescriptor_t dyDesc, const void *dy, const void *grid, const void *betaDgrid, void *dgrid)
Only 2D transformation is supported.
Parameters
handle: Input. Handle to a previously created cuDNN context.
stDesc: Input. Previously created spatial transformer descriptor object.
alpha, beta: Inputs. Pointers to scaling factors (in host memory) used to blend the source value with the prior value in the destination tensor as follows:
dstValue = alpha[0]*srcValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc: Input. Handle to the previously initialized input tensor descriptor.
x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
dxDesc: Input. Handle to the previously initialized output differential tensor descriptor.
dx: Output. Data pointer to GPU memory associated with the output tensor descriptor dxDesc.
alphaDgrid, betaDgrid: Inputs. Pointers to scaling factors (in host memory) used to blend the gradient output dgrid with the prior value in the destination pointer as follows:
dstValue = alpha[0]*srcValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
dyDesc: Input. Handle to the previously initialized input differential tensor descriptor.
dy: Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.
grid: Input. A grid of coordinates generated by cudnnSpatialTfGridGeneratorForward().
dgrid: Output. Data pointer to GPU memory containing the output differential data.
Returns
CUDNN_STATUS_SUCCESS
The call was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
handle is NULL.
One of the parameters x, dx, y, dy, grid, or dgrid is NULL.
The dimension of dy differs from those specified in stDesc.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration. Some examples include:
The dimension of the transformed tensor is greater than 4.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
cudnnSpatialTfSamplerForward()#
This function performs a sampler operation and generates the output tensor using the grid given by the grid generator.
cudnnStatus_t cudnnSpatialTfSamplerForward(
    cudnnHandle_t handle,
    const cudnnSpatialTransformerDescriptor_t stDesc,
    const void *alpha,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const void *grid,
    const void *beta,
    cudnnTensorDescriptor_t yDesc,
    void *y)
Only 2D transformation is supported.
Parameters
handle
Input. Handle to a previously created cuDNN context.
stDesc
Input. Previously created spatial transformer descriptor object.
alpha, beta
Inputs. Pointers to scaling factors (in host memory) used to blend the source value with the prior value in the destination tensor as follows:
dstValue = alpha[0]*srcValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
grid
Input. A grid of coordinates generated by cudnnSpatialTfGridGeneratorForward().
yDesc
Input. Handle to the previously initialized output tensor descriptor.
y
Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.
Returns
CUDNN_STATUS_SUCCESS
The call was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
handle is NULL.
One of the parameters x, y, or grid is NULL.
The dimension of y differs from those specified in stDesc.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration. Some examples include:
The dimension of the transformed tensor is greater than 4.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
cudnnTransformFilter()#
This function has been deprecated in cuDNN 9.0.
This function converts the filter between different formats, data types, or dimensions based on the described transformation. It can be used to convert a filter with an unsupported layout format to a filter with a supported layout format.
cudnnStatus_t cudnnTransformFilter(
    cudnnHandle_t handle,
    const cudnnTensorTransformDescriptor_t transDesc,
    const void *alpha,
    const cudnnFilterDescriptor_t srcDesc,
    const void *srcData,
    const void *beta,
    const cudnnFilterDescriptor_t destDesc,
    void *destData);
This function copies the scaled data from the input filter srcDesc to the output filter destDesc with a different layout. If the filter descriptors srcDesc and destDesc have different dimensions, those dimensions must be consistent with the folding and padding amounts and ordering specified in transDesc.
The srcDesc and destDesc tensors must not overlap in any way (meaning, tensors cannot be transformed in place).
When performing a folding transform or a zero-padding transform, the scaling factors (alpha, beta) should be set to (1, 0). However, unfolding transforms support any (alpha, beta) values. This function is thread safe.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
transDesc
Input. A descriptor containing the details of the requested filter transformation. For more information, refer to cudnnTensorTransformDescriptor_t.
alpha, beta
Inputs. Pointers, in the host memory, to the scaling factors used to scale the data in the input tensor srcDesc. beta is used to scale the destination tensor, while alpha is used to scale the source tensor. For more information, refer to Scaling Parameters.
The beta scaling value is not honored in the folding and zero-padding cases. Unfolding supports any (alpha, beta).
srcDesc, destDesc
Inputs. Handles to the previously initialized filter descriptors. srcDesc and destDesc must not overlap. For more information, refer to cudnnFilterDescriptor_t.
srcData, destData
Inputs. Pointers, in the host memory, to the data of the tensors described by srcDesc and destDesc respectively.
Returns
CUDNN_STATUS_SUCCESS
The function launched successfully.
CUDNN_STATUS_BAD_PARAM
A parameter is uninitialized or initialized incorrectly, or the number of dimensions is different between srcDesc and destDesc.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration. Also, in the folding and padding paths, any scaling factors other than alpha = 1 and beta = 0 will result in CUDNN_STATUS_NOT_SUPPORTED.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
cudnnTransformTensor()#
This function has been deprecated in cuDNN 9.0.
This function copies the scaled data from one tensor to another tensor with a different layout. Those descriptors need to have the same dimensions but not necessarily the same strides. The input and output tensors must not overlap in any way (meaning, tensors cannot be transformed in place). This function can be used to convert a tensor with an unsupported format to a supported one.
cudnnStatus_t cudnnTransformTensor(
    cudnnHandle_t handle,
    const void *alpha,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const void *beta,
    const cudnnTensorDescriptor_t yDesc,
    void *y)
Parameters
handle
Input. Handle to a previously created cuDNN context.
alpha, beta
Inputs. Pointers to scaling factors (in host memory) used to blend the source value with the prior value in the destination tensor as follows:
dstValue = alpha[0]*srcValue + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc
Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
x
Input. Pointer to data of the tensor described by the xDesc descriptor.
yDesc
Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
y
Output. Pointer to data of the tensor described by the yDesc descriptor.
Returns
CUDNN_STATUS_SUCCESS
The function launched successfully.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM
The dimensions n, c, h, w or the dataType of the two tensor descriptors differ.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
cudnnTransformTensorEx()#
This function has been deprecated in cuDNN 9.0.
This function converts the tensor layouts between different formats. It can be used to convert a tensor with an unsupported layout format to a tensor with a supported layout format.
cudnnStatus_t cudnnTransformTensorEx(
    cudnnHandle_t handle,
    const cudnnTensorTransformDescriptor_t transDesc,
    const void *alpha,
    const cudnnTensorDescriptor_t srcDesc,
    const void *srcData,
    const void *beta,
    const cudnnTensorDescriptor_t destDesc,
    void *destData);
This function copies the scaled data from the input tensor srcDesc to the output tensor destDesc with a different layout. The tensor descriptors of srcDesc and destDesc should have the same dimensions but need not have the same strides.
The srcDesc and destDesc tensors must not overlap in any way (meaning, tensors cannot be transformed in place).
When performing a folding transform or a zero-padding transform, the scaling factors (alpha, beta) should be set to (1, 0). However, unfolding transforms support any (alpha, beta) values. This function is thread safe.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
transDesc
Input. A descriptor containing the details of the requested tensor transformation. For more information, refer to cudnnTensorTransformDescriptor_t.
alpha, beta
Inputs. Pointers, in the host memory, to the scaling factors used to scale the data in the input tensor srcDesc. beta is used to scale the destination tensor, while alpha is used to scale the source tensor. For more information, refer to Scaling Parameters.
The beta scaling value is not honored in the folding and zero-padding cases. Unfolding supports any (alpha, beta).
srcDesc, destDesc
Inputs. Handles to the previously initialized tensor descriptors. srcDesc and destDesc must not overlap. For more information, refer to cudnnTensorDescriptor_t.
srcData, destData
Inputs. Pointers, in the host memory, to the data of the tensors described by srcDesc and destDesc respectively.
Returns
CUDNN_STATUS_SUCCESS
The function was launched successfully.
CUDNN_STATUS_BAD_PARAM
A parameter is uninitialized or initialized incorrectly, or the number of dimensions is different between srcDesc and destDesc.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration. Also, in the folding and padding paths, any scaling factors other than alpha = 1 and beta = 0 will result in CUDNN_STATUS_NOT_SUPPORTED.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.