cudnn_cnn Library
Data Type References
These are the data type references in the cudnn_cnn library.
Struct Types
These are the struct types in the cudnn_cnn library.
cudnnConvolutionBwdDataAlgoPerf_t
This struct is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnConvolutionBwdDataAlgoPerf_t is a structure containing performance results returned by cudnnFindConvolutionBackwardDataAlgorithm() or heuristic results returned by cudnnGetConvolutionBackwardDataAlgorithm_v7().
Data Members
cudnnConvolutionBwdDataAlgo_t algo
The algorithm run to obtain the associated performance metrics.
cudnnStatus_t status
If any error occurs during the workspace allocation or timing of cudnnConvolutionBackwardData(), this status will represent that error. Otherwise, this status will be the return status of cudnnConvolutionBackwardData().
CUDNN_STATUS_ALLOC_FAILED if any error occurred during workspace allocation or if the provided workspace is insufficient.
CUDNN_STATUS_INTERNAL_ERROR if any error occurred during timing calculations or workspace deallocation.
Otherwise, this will be the return status of cudnnConvolutionBackwardData().
float time
The execution time of cudnnConvolutionBackwardData() (in milliseconds).
size_t memory
The workspace size (in bytes).
cudnnDeterminism_t determinism
The determinism of the algorithm.
cudnnMathType_t mathType
The math type provided to the algorithm.
int reserved[3]
Reserved space for future properties.
cudnnConvolutionBwdFilterAlgoPerf_t
This struct is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnConvolutionBwdFilterAlgoPerf_t is a structure containing performance results returned by cudnnFindConvolutionBackwardFilterAlgorithm() or heuristic results returned by cudnnGetConvolutionBackwardFilterAlgorithm_v7().
Data Members
cudnnConvolutionBwdFilterAlgo_t algo
The algorithm run to obtain the associated performance metrics.
cudnnStatus_t status
If any error occurs during the workspace allocation or timing of cudnnConvolutionBackwardFilter(), this status will represent that error. Otherwise, this status will be the return status of cudnnConvolutionBackwardFilter().
CUDNN_STATUS_ALLOC_FAILED if any error occurred during workspace allocation or if the provided workspace is insufficient.
CUDNN_STATUS_INTERNAL_ERROR if any error occurred during timing calculations or workspace deallocation.
Otherwise, this will be the return status of cudnnConvolutionBackwardFilter().
float time
The execution time of cudnnConvolutionBackwardFilter() (in milliseconds).
size_t memory
The workspace size (in bytes).
cudnnDeterminism_t determinism
The determinism of the algorithm.
cudnnMathType_t mathType
The math type provided to the algorithm.
int reserved[3]
Reserved space for future properties.
cudnnConvolutionFwdAlgoPerf_t
This struct is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnConvolutionFwdAlgoPerf_t is a structure containing performance results returned by cudnnFindConvolutionForwardAlgorithm() or heuristic results returned by cudnnGetConvolutionForwardAlgorithm_v7().
Data Members
cudnnConvolutionFwdAlgo_t algo
The algorithm run to obtain the associated performance metrics.
cudnnStatus_t status
If any error occurs during the workspace allocation or timing of cudnnConvolutionForward(), this status will represent that error. Otherwise, this status will be the return status of cudnnConvolutionForward().
CUDNN_STATUS_ALLOC_FAILED if any error occurred during workspace allocation or if the provided workspace is insufficient.
CUDNN_STATUS_INTERNAL_ERROR if any error occurred during timing calculations or workspace deallocation.
Otherwise, this will be the return status of cudnnConvolutionForward().
float time
The execution time of cudnnConvolutionForward() (in milliseconds).
size_t memory
The workspace size (in bytes).
cudnnDeterminism_t determinism
The determinism of the algorithm.
cudnnMathType_t mathType
The math type provided to the algorithm.
int reserved[3]
Reserved space for future properties.
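All three *AlgoPerf_t structures are consumed the same way: the caller allocates an array, passes it to the corresponding Find or Get routine, and reads back entries sorted by compute time. Below is a minimal sketch for the forward case; pickForwardAlgo and workspaceBudget are illustrative names, and the handle and descriptors are assumed to have been created and initialized elsewhere.

#include <stdio.h>
#include <stdlib.h>
#include <cudnn.h>

/* Sketch: benchmark forward algorithms and pick the fastest one whose
   workspace fits a caller-supplied budget. Error checking is omitted. */
cudnnStatus_t pickForwardAlgo(cudnnHandle_t handle,
                              cudnnTensorDescriptor_t xDesc,
                              cudnnFilterDescriptor_t wDesc,
                              cudnnConvolutionDescriptor_t convDesc,
                              cudnnTensorDescriptor_t yDesc,
                              size_t workspaceBudget,
                              cudnnConvolutionFwdAlgo_t *chosen)
{
    int maxAlgos = 0, returned = 0;
    cudnnGetConvolutionForwardAlgorithmMaxCount(handle, &maxAlgos);

    cudnnConvolutionFwdAlgoPerf_t *perf = malloc(maxAlgos * sizeof(*perf));
    cudnnStatus_t st = cudnnFindConvolutionForwardAlgorithm(
        handle, xDesc, wDesc, convDesc, yDesc, maxAlgos, &returned, perf);
    if (st != CUDNN_STATUS_SUCCESS) { free(perf); return st; }

    st = CUDNN_STATUS_NOT_SUPPORTED;
    for (int i = 0; i < returned; ++i) {          /* fastest entries first */
        if (perf[i].status == CUDNN_STATUS_SUCCESS &&
            perf[i].memory <= workspaceBudget) {
            printf("algo %d: %.3f ms, %zu bytes workspace\n",
                   (int)perf[i].algo, perf[i].time, perf[i].memory);
            *chosen = perf[i].algo;
            st = CUDNN_STATUS_SUCCESS;
            break;
        }
    }
    free(perf);
    return st;
}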
Pointer To Opaque Struct Types
These are the pointers to the opaque struct types in the cudnn_cnn library.
cudnnConvolutionDescriptor_t
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnConvolutionDescriptor_t is a pointer to an opaque structure holding the description of a convolution operation. cudnnCreateConvolutionDescriptor() is used to create one instance, and cudnnSetConvolutionNdDescriptor() or cudnnSetConvolution2dDescriptor() must be used to initialize this instance.
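A minimal sketch of the create/set lifecycle, assuming a 3x3 cross-correlation with unit stride, unit dilation, and padding of 1; makeConv3x3 is an illustrative helper name, not part of the cuDNN API.

#include <stddef.h>
#include <cudnn.h>

/* Sketch: create and initialize a 2D convolution descriptor. The caller is
   responsible for cudnnDestroyConvolutionDescriptor() when done. */
static cudnnConvolutionDescriptor_t makeConv3x3(void)
{
    cudnnConvolutionDescriptor_t convDesc = NULL;
    cudnnCreateConvolutionDescriptor(&convDesc);
    cudnnSetConvolution2dDescriptor(convDesc,
                                    1, 1,   /* pad_h, pad_w */
                                    1, 1,   /* stride_h, stride_w */
                                    1, 1,   /* dilation_h, dilation_w */
                                    CUDNN_CROSS_CORRELATION,
                                    CUDNN_DATA_FLOAT);
    return convDesc;
}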
cudnnFusedOpsConstParamPack_t
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnFusedOpsConstParamPack_t is a pointer to an opaque structure holding the description of the cudnnFusedOps constant parameters. Use the function cudnnCreateFusedOpsConstParamPack() to create one instance of this structure, and the function cudnnDestroyFusedOpsConstParamPack() to destroy a previously-created descriptor.
cudnnFusedOpsPlan_t
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnFusedOpsPlan_t is a pointer to an opaque structure holding the description of the cudnnFusedOps plan. This descriptor contains the plan information, including the problem type and size, which kernels should be run, and the internal workspace partition. Use the function cudnnCreateFusedOpsPlan() to create one instance of this structure, and the function cudnnDestroyFusedOpsPlan() to destroy a previously-created descriptor.
cudnnFusedOpsVariantParamPack_t
This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use it.
cudnnFusedOpsVariantParamPack_t is a pointer to an opaque structure holding the description of the cudnnFusedOps variant parameters. Use the function cudnnCreateFusedOpsVariantParamPack() to create one instance of this structure, and the function cudnnDestroyFusedOpsVariantParamPack() to destroy a previously-created descriptor.
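A minimal sketch of how the three fused-ops descriptors are typically created and destroyed together; the attribute setters, cudnnMakeFusedOpsPlan(), and cudnnFusedOpsExecute() calls that would sit between creation and destruction are omitted, and fusedOpsLifecycle is an illustrative name.

#include <stddef.h>
#include <cudnn.h>

/* Sketch: pair every Create call with the matching Destroy call. */
void fusedOpsLifecycle(void)
{
    cudnnFusedOps_t ops = CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS;

    cudnnFusedOpsConstParamPack_t   constPack = NULL;
    cudnnFusedOpsPlan_t             plan      = NULL;
    cudnnFusedOpsVariantParamPack_t varPack   = NULL;

    cudnnCreateFusedOpsConstParamPack(&constPack, ops);
    cudnnCreateFusedOpsPlan(&plan, ops);
    cudnnCreateFusedOpsVariantParamPack(&varPack, ops);

    /* ... describe the problem in constPack, finalize the plan,
           bind device pointers in varPack, execute the plan ... */

    cudnnDestroyFusedOpsVariantParamPack(varPack);
    cudnnDestroyFusedOpsPlan(plan);
    cudnnDestroyFusedOpsConstParamPack(constPack);
}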
Enumeration Types
These are the enumeration types in the cudnn_cnn library.
cudnnFusedOps_t
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
The cudnnFusedOps_t type is an enumerated type to select a specific sequence of computations to perform in the fused operations.
Members and Descriptions
CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS = 0
On a per-channel basis, it performs these operations in this order: scale, add bias, activation, convolution, and generate batchNorm statistics.
CUDNN_FUSED_SCALE_BIAS_ACTIVATION_WGRAD = 1
On a per-channel basis, it performs these operations in this order: scale, add bias, activation, convolution backward weights, and generate batchNorm statistics.
CUDNN_FUSED_BN_FINALIZE_STATISTICS_TRAINING = 2
Computes the equivalent scale and bias from ySum, ySqSum, learned scale, and bias. Optionally, update running statistics and generate saved stats.
CUDNN_FUSED_BN_FINALIZE_STATISTICS_INFERENCE = 3
Computes the equivalent scale and bias from the learned running statistics and the learned scale and bias.
CUDNN_FUSED_CONV_SCALE_BIAS_ADD_ACTIVATION = 4
On a per-channel basis, performs these operations in this order: convolution, scale, add bias, element-wise addition with another tensor, and activation.
CUDNN_FUSED_SCALE_BIAS_ADD_ACTIVATION_GEN_BITMASK = 5
On a per-channel basis, performs these operations in this order: scale and bias on one tensor, scale and bias on a second tensor, element-wise addition of these two tensors, and on the resulting tensor performs activation and generates an activation bit mask.
CUDNN_FUSED_DACTIVATION_FORK_DBATCHNORM = 6
On a per-channel basis, performs these operations in this order: backward activation, fork (meaning, write out gradient for the residual branch), and backward batch norm.
cudnnFusedOpsConstParamLabel_t
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
The cudnnFusedOpsConstParamLabel_t is an enumerated type for the selection of the type of the cudnnFusedOps descriptor. For more information, refer to cudnnSetFusedOpsConstParamPackAttribute().
typedef enum {
    CUDNN_PARAM_XDESC                          = 0,
    CUDNN_PARAM_XDATA_PLACEHOLDER              = 1,
    CUDNN_PARAM_BN_MODE                        = 2,
    CUDNN_PARAM_BN_EQSCALEBIAS_DESC            = 3,
    CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER         = 4,
    CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER          = 5,
    CUDNN_PARAM_ACTIVATION_DESC                = 6,
    CUDNN_PARAM_CONV_DESC                      = 7,
    CUDNN_PARAM_WDESC                          = 8,
    CUDNN_PARAM_WDATA_PLACEHOLDER              = 9,
    CUDNN_PARAM_DWDESC                         = 10,
    CUDNN_PARAM_DWDATA_PLACEHOLDER             = 11,
    CUDNN_PARAM_YDESC                          = 12,
    CUDNN_PARAM_YDATA_PLACEHOLDER              = 13,
    CUDNN_PARAM_DYDESC                         = 14,
    CUDNN_PARAM_DYDATA_PLACEHOLDER             = 15,
    CUDNN_PARAM_YSTATS_DESC                    = 16,
    CUDNN_PARAM_YSUM_PLACEHOLDER               = 17,
    CUDNN_PARAM_YSQSUM_PLACEHOLDER             = 18,
    CUDNN_PARAM_BN_SCALEBIAS_MEANVAR_DESC      = 19,
    CUDNN_PARAM_BN_SCALE_PLACEHOLDER           = 20,
    CUDNN_PARAM_BN_BIAS_PLACEHOLDER            = 21,
    CUDNN_PARAM_BN_SAVED_MEAN_PLACEHOLDER      = 22,
    CUDNN_PARAM_BN_SAVED_INVSTD_PLACEHOLDER    = 23,
    CUDNN_PARAM_BN_RUNNING_MEAN_PLACEHOLDER    = 24,
    CUDNN_PARAM_BN_RUNNING_VAR_PLACEHOLDER     = 25,
    CUDNN_PARAM_ZDESC                          = 26,
    CUDNN_PARAM_ZDATA_PLACEHOLDER              = 27,
    CUDNN_PARAM_BN_Z_EQSCALEBIAS_DESC          = 28,
    CUDNN_PARAM_BN_Z_EQSCALE_PLACEHOLDER       = 29,
    CUDNN_PARAM_BN_Z_EQBIAS_PLACEHOLDER        = 30,
    CUDNN_PARAM_ACTIVATION_BITMASK_DESC        = 31,
    CUDNN_PARAM_ACTIVATION_BITMASK_PLACEHOLDER = 32,
    CUDNN_PARAM_DXDESC                         = 33,
    CUDNN_PARAM_DXDATA_PLACEHOLDER             = 34,
    CUDNN_PARAM_DZDESC                         = 35,
    CUDNN_PARAM_DZDATA_PLACEHOLDER             = 36,
    CUDNN_PARAM_BN_DSCALE_PLACEHOLDER          = 37,
    CUDNN_PARAM_BN_DBIAS_PLACEHOLDER           = 38,
} cudnnFusedOpsConstParamLabel_t;
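A minimal sketch of setting a couple of these labels on a const parameter pack with cudnnSetFusedOpsConstParamPackAttribute(); describeProblem is an illustrative helper, the pack is assumed to have been created for a matching cudnnFusedOps_t value, and the placeholder and batch-normalization mode are example choices only.

#include <cudnn.h>

/* Sketch: record compile-time-like problem facts in the const pack.
   Tensor, convolution, and activation descriptors would be set with the
   corresponding *_DESC labels in the same way. */
void describeProblem(cudnnFusedOpsConstParamPack_t constPack)
{
    /* Promise that the x device pointer supplied later in the variant
       pack will be non-NULL and 16-byte aligned. */
    cudnnFusedOpsPointerPlaceHolder_t place = CUDNN_PTR_16B_ALIGNED;
    cudnnSetFusedOpsConstParamPackAttribute(constPack,
                                            CUDNN_PARAM_XDATA_PLACEHOLDER,
                                            &place);

    /* Per-channel (spatial) batch-normalization statistics. */
    cudnnBatchNormMode_t bnMode = CUDNN_BATCHNORM_SPATIAL;
    cudnnSetFusedOpsConstParamPackAttribute(constPack,
                                            CUDNN_PARAM_BN_MODE,
                                            &bnMode);
}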
Short-Form Used |
Stands For |
---|---|
Setter |
|
Getter |
|
|
|
|
Stands for |
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
Description |
Default Value After Creation |
---|---|---|---|
|
In the setter, the |
Tensor descriptor describing the size, layout, and datatype of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes the mode of operation for the scale, bias and the statistics. As of cuDNN 7.6.0, only |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and datatype of the batchNorm equivalent scale and bias tensors. The shapes must match the mode specified in |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes the activation operation. As of 7.6.0, only activation modes of |
|
|
In the setter, the |
Describes the convolution operation. |
|
|
In the setter, the |
Filter descriptor describing the size, layout and datatype of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Tensor descriptor describing the size, layout and datatype of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Tensor descriptor describing the size, layout and datatype of the sum of |
|
|
In the setter, the |
Describes whether sum of |
|
|
In the setter, the |
Describes whether sum of |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
As of cuDNN 7.6.0, if the following conditions in the table are met, then the fully fused fast path will be triggered. Otherwise, a slower partially fused path will be triggered.
Parameter |
Condition |
---|---|
Device compute capability |
Needs to be one of |
|
|
|
|
|
|
|
|
|
|
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
Description |
Default Value After Creation |
---|---|---|---|
|
In the setter, the |
Tensor descriptor describing the size, layout, and datatype of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes the mode of operation for the scale, bias and the statistics. As of cuDNN 7.6.0, only |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and datatype of the batchNorm equivalent scale and bias tensors. The shapes must match the mode specified in |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes the activation operation. As of 7.6.0, only activation modes of |
|
|
In the setter, the |
Describes the convolution operation. |
|
|
In the setter, the |
Filter descriptor describing the size, layout and datatype of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Tensor descriptor describing the size, layout and datatype of the |
|
|
In the setter, the |
Describes whether |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
As of cuDNN 7.6.0, if the following conditions in the table are met, then the fully fused fast path will be triggered. Otherwise, a slower partially fused path will be triggered.
Parameter |
Condition |
---|---|
Device compute capability |
Needs to be one of |
|
|
|
|
|
|
|
|
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
Description |
Default Value After Creation |
---|---|---|---|
|
In the setter, the |
Describes the mode of operation for the scale, bias and the statistics. As of cuDNN 7.6.0, only |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the sum of |
|
|
In the setter, the |
Describes whether sum of |
|
|
In the setter, the |
Describes whether sum of |
|
|
In the setter, the |
A common tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes whether |
|
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
Description |
Default Value After Creation |
---|---|---|---|
|
In the setter, the |
Describes the mode of operation for the scale, bias, and the statistics. As of cuDNN 7.6.0, only |
|
|
In the setter, the |
A common tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether the |
|
|
In the setter, the |
Describes whether the |
|
The following operation performs the computation, where \(*\) denotes convolution operator: \(y=\alpha_{1}\left( w*x \right)+\alpha_{2}z+b\)
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
Description |
Default Value After Creation |
---|---|---|---|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes the convolution operation. |
|
|
In the setter, the |
Filter descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the \(\alpha_{1}\) scale and bias tensors. The tensor should have shape (1,K,1,1), K is the number of output features. |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the \(\alpha_{2}\) tensor. If set to |
|
|
In the setter, the |
Describes whether |
|
|
In the setter, the |
Describes the activation operation. As of 7.6.0, only activation modes of |
|
|
In the setter, the |
Tensor descriptor describing the size, layout, and data type of the |
|
|
In the setter, the |
Describes whether |
|
cudnnFusedOpsPointerPlaceHolder_t
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
cudnnFusedOpsPointerPlaceHolder_t is an enumerated type used to select the alignment type of the cudnnFusedOps descriptor pointer.
Members and Descriptions
CUDNN_PTR_NULL = 0
Indicates that the pointer to the tensor in the variantPack will be NULL.
CUDNN_PTR_ELEM_ALIGNED = 1
Indicates that the pointer to the tensor in the variantPack will not be NULL, and will have element alignment.
CUDNN_PTR_16B_ALIGNED = 2
Indicates that the pointer to the tensor in the variantPack will not be NULL, and will have 16-byte alignment.
cudnnFusedOpsVariantParamLabel_t
This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.
The cudnnFusedOpsVariantParamLabel_t is an enumerated type that is used to set the buffer pointers. These buffer pointers can be changed in each iteration.
typedef enum {
    CUDNN_PTR_XDATA                              = 0,
    CUDNN_PTR_BN_EQSCALE                         = 1,
    CUDNN_PTR_BN_EQBIAS                          = 2,
    CUDNN_PTR_WDATA                              = 3,
    CUDNN_PTR_DWDATA                             = 4,
    CUDNN_PTR_YDATA                              = 5,
    CUDNN_PTR_DYDATA                             = 6,
    CUDNN_PTR_YSUM                               = 7,
    CUDNN_PTR_YSQSUM                             = 8,
    CUDNN_PTR_WORKSPACE                          = 9,
    CUDNN_PTR_BN_SCALE                           = 10,
    CUDNN_PTR_BN_BIAS                            = 11,
    CUDNN_PTR_BN_SAVED_MEAN                      = 12,
    CUDNN_PTR_BN_SAVED_INVSTD                    = 13,
    CUDNN_PTR_BN_RUNNING_MEAN                    = 14,
    CUDNN_PTR_BN_RUNNING_VAR                     = 15,
    CUDNN_PTR_ZDATA                              = 16,
    CUDNN_PTR_BN_Z_EQSCALE                       = 17,
    CUDNN_PTR_BN_Z_EQBIAS                        = 18,
    CUDNN_PTR_ACTIVATION_BITMASK                 = 19,
    CUDNN_PTR_DXDATA                             = 20,
    CUDNN_PTR_DZDATA                             = 21,
    CUDNN_PTR_BN_DSCALE                          = 22,
    CUDNN_PTR_BN_DBIAS                           = 23,
    CUDNN_SCALAR_SIZE_T_WORKSPACE_SIZE_IN_BYTES  = 100,
    CUDNN_SCALAR_INT64_T_BN_ACCUMULATION_COUNT   = 101,
    CUDNN_SCALAR_DOUBLE_BN_EXP_AVG_FACTOR        = 102,
    CUDNN_SCALAR_DOUBLE_BN_EPSILON               = 103,
} cudnnFusedOpsVariantParamLabel_t;
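A minimal sketch of binding per-iteration buffers with cudnnSetFusedOpsVariantParamPackAttribute(); bindBuffers is an illustrative helper, devX and devWorkspace are assumed to be valid device allocations, and the epsilon value is an example only. Device pointers are passed directly for the CUDNN_PTR_* labels, while host scalars are passed by address for the CUDNN_SCALAR_* labels.

#include <cudnn.h>

/* Sketch: rebind the buffers that may change every iteration. */
void bindBuffers(cudnnFusedOpsVariantParamPack_t varPack,
                 void *devX, void *devWorkspace, size_t workspaceBytes)
{
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_XDATA, devX);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_WORKSPACE,
                                              devWorkspace);

    cudnnSetFusedOpsVariantParamPackAttribute(varPack,
        CUDNN_SCALAR_SIZE_T_WORKSPACE_SIZE_IN_BYTES, &workspaceBytes);

    double epsilon = 1e-5;  /* example value; must be >= CUDNN_BN_MIN_EPSILON */
    cudnnSetFusedOpsVariantParamPackAttribute(varPack,
        CUDNN_SCALAR_DOUBLE_BN_EPSILON, &epsilon);
}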
Short-Form Used |
Stands For |
---|---|
Setter |
|
Getter |
|
|
Stands for |
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
I/O Type |
Description |
Default Value |
---|---|---|---|---|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to sum of |
|
|
|
input |
Pointer to sum of |
|
|
|
input |
Pointer to user allocated workspace on device. Can be |
|
|
|
input |
Pointer to a |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
I/O Type |
Description |
Default Value |
---|---|---|---|---|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
output |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to user allocated workspace on device. Can be |
|
|
|
input |
Pointer to a |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
I/O Type |
Description |
Default Value |
---|---|---|---|---|
|
|
input |
Pointer to sum of |
|
|
|
input |
Pointer to sum of |
|
|
|
input |
Pointer to sum of |
|
|
|
input |
Pointer to sum of |
|
|
|
output |
Pointer to sum of |
|
|
|
output |
Pointer to sum of |
|
|
|
input/output |
Pointer to sum of |
|
|
|
input/output |
Pointer to sum of |
|
|
|
output |
Pointer to |
|
|
|
output |
Pointer to |
|
|
|
input |
Pointer to a scalar value in |
|
|
|
input |
Pointer to a scalar value in double on host memory. Factor used in the moving average computation. Refer to |
|
|
|
input |
Pointer to a scalar value in double on host memory. A conditioning constant used in the batch normalization formula. Its value should be equal to or greater than the value defined for |
|
|
|
input |
Pointer to user allocated workspace on device. Can be |
|
|
|
input |
Pointer to a |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
I/O Type |
Description |
Default Value |
---|---|---|---|---|
|
|
input |
Pointer to sum of |
|
|
|
input |
Pointer to sum of |
|
|
|
input/output |
Pointer to sum of |
|
|
|
input/output |
Pointer to sum of |
|
|
|
output |
Pointer to |
|
|
|
output |
Pointer to |
|
|
|
input |
Pointer to a scalar value in double on host memory. A conditioning constant used in the batch normalization formula. Its value should be equal to or greater than the value defined for |
|
|
|
input |
Pointer to user allocated workspace on device. Can be |
|
|
|
input |
Pointer to a |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
Attribute Key |
Expected Descriptor Type Passed in, in the Setter |
I/O Type |
Description |
Default Value |
---|---|---|---|---|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
input |
Pointer to |
|
|
|
output |
Pointer to |
|
|
|
input |
Pointer to user allocated workspace on device. Can be |
|
|
|
input |
Pointer to a |
|
Note
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.
If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack must not be NULL and needs to be at least element-aligned or 16-byte aligned, respectively.
API Functions
These are the API functions in the cudnn_cnn library.
cudnnCnnVersionCheck()
Cross-library version checker. Each sublibrary has a version checker that checks whether its own version matches that of its dependencies.
Returns
CUDNN_STATUS_SUCCESS
The version check passed.
CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH
The versions are inconsistent.
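A minimal sketch of calling the checker at startup; the surrounding program structure is illustrative, and cudnnGetVersion() is used here only to report the loaded library version.

#include <stdio.h>
#include <cudnn.h>

/* Sketch: fail fast if the cudnn_cnn sublibrary was loaded alongside
   mismatched dependency versions. */
int main(void)
{
    if (cudnnCnnVersionCheck() != CUDNN_STATUS_SUCCESS) {
        fprintf(stderr, "cudnn_cnn sublibrary version mismatch\n");
        return 1;
    }
    printf("cuDNN version: %zu\n", cudnnGetVersion());
    return 0;
}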
cudnnConvolutionBackwardBias()
This function has been deprecated in cuDNN 9.0.
This function computes the convolution function gradient with respect to the bias, which is the sum of every element belonging to the same feature map across all of the images of the input tensor. Therefore, the number of elements produced is equal to the number of features maps of the input tensor.
cudnnStatus_t cudnnConvolutionBackwardBias(
    cudnnHandle_t handle,
    const void *alpha,
    const cudnnTensorDescriptor_t dyDesc,
    const void *dy,
    const void *beta,
    const cudnnTensorDescriptor_t dbDesc,
    void *db)
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
dyDesc
Input. Handle to the previously initialized input tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
dy
Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.
dbDesc
Input. Handle to the previously initialized output tensor descriptor.
db
Output. Data pointer to GPU memory associated with the output tensor descriptor dbDesc.
Returns
CUDNN_STATUS_SUCCESS
The operation was launched successfully.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
One of the parameters n, height, or width of the output tensor is not 1.
The numbers of feature maps of the input tensor and output tensor differ.
The dataType of the two tensor descriptors is different.
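A minimal sketch of computing the bias gradient for an NCHW activation gradient of shape (n, c, h, w) into a (1, c, 1, 1) tensor; biasGradient is an illustrative helper, the handle and device buffers are assumed to exist, and error checking is omitted.

#include <cudnn.h>

/* Sketch: db = sum of dy over n, h, w for each channel. */
void biasGradient(cudnnHandle_t handle, int n, int c, int h, int w,
                  const void *dy, void *db)
{
    cudnnTensorDescriptor_t dyDesc, dbDesc;
    cudnnCreateTensorDescriptor(&dyDesc);
    cudnnCreateTensorDescriptor(&dbDesc);
    cudnnSetTensor4dDescriptor(dyDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, c, h, w);
    cudnnSetTensor4dDescriptor(dbDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, c, 1, 1);

    const float alpha = 1.0f, beta = 0.0f;  /* beta = 0 overwrites db */
    cudnnConvolutionBackwardBias(handle, &alpha, dyDesc, dy, &beta, dbDesc, db);

    cudnnDestroyTensorDescriptor(dyDesc);
    cudnnDestroyTensorDescriptor(dbDesc);
}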
cudnnConvolutionBackwardData()
This function has been deprecated in cuDNN 9.0.
This function computes the convolution data gradient of the tensor dy, where y is the output of the forward convolution in cudnnConvolutionForward(). It uses the specified algo, and returns the results in the output tensor dx. Scaling factors alpha and beta can be used to scale the computed result or accumulate with the current dx.
cudnnStatus_t cudnnConvolutionBackwardData(
    cudnnHandle_t handle,
    const void *alpha,
    const cudnnFilterDescriptor_t wDesc,
    const void *w,
    const cudnnTensorDescriptor_t dyDesc,
    const void *dy,
    const cudnnConvolutionDescriptor_t convDesc,
    cudnnConvolutionBwdDataAlgo_t algo,
    void *workSpace,
    size_t workSpaceSizeInBytes,
    const void *beta,
    const cudnnTensorDescriptor_t dxDesc,
    void *dx)
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
wDesc
Input. Handle to a previously initialized filter descriptor. For more information, refer to cudnnFilterDescriptor_t.
w
Input. Data pointer to GPU memory associated with the filter descriptor wDesc.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
dy
Input. Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc.
convDesc
Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.
algo
Input. Enumerant that specifies which backward data convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionBwdDataAlgo_t.
workSpace
Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NULL.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
dxDesc
Input. Handle to the previously initialized output tensor descriptor.
dx
Input/Output. Data pointer to GPU memory associated with the output tensor descriptor dxDesc that carries the result.
Supported Configurations
This function supports the following combinations of data types for wDesc, dyDesc, convDesc, and dxDesc.
Data Type Configurations |
|
|
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Supported Algorithms
Specifying a separate algorithm can cause changes in performance, support and computation determinism. Refer to the following list of algorithm options, and their respective supported parameters and deterministic behavior.
The table below shows the list of the supported 2D and 3D convolutions. The 2D convolutions are described first, followed by the 3D convolutions.
For brevity, the short-form versions followed by > are used in the table below:
CUDNN_CONVOLUTION_BWD_DATA_ALGO_0
>_ALGO_0
CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
>_ALGO_1
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT
>_FFT
CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING
>_FFT_TILING
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD
>_WINOGRAD
CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED
>_WINOGRAD_NONFUSED
CUDNN_TENSOR_NCHW
>_NCHW
CUDNN_TENSOR_NHWC
>_NHWC
CUDNN_TENSOR_NCHW_VECT_C
>_NCHW_VECT_C
Algo Name |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
NHWC HWC-packed |
NHWC HWC-packed |
|
Algo Name |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
No |
NCHW CHW-packed |
All except |
|
Dilation: Greater than |
|
Yes |
NCHW CHW-packed |
All except |
|
Dilation: Greater than |
|
Yes |
NCHW CHW-packed |
NCHW HW-packed |
|
Dilation: |
|
Yes |
NCHW CHW-packed |
NCHW HW-packed |
|
Dilation: |
|
Yes |
NCHW CHW-packed |
All except |
|
Dilation: |
|
Yes |
NCHW CHW-packed |
All except |
|
Dilation: |
Algo Name (3D Convolutions) |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
Yes |
NCDHW CDHW-packed |
All except |
|
Dilation: Greater than |
|
Yes |
NCDHW CDHW-packed |
NCDHW CDHW-packed |
|
Dilation: |
|
Yes |
NCDHW CDHW-packed |
NCDHW DHW-packed |
|
Dilation: |
Algo Name (3D Convolutions) |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
Yes |
NDHWC DHWC-packed |
NDHWC DHWC-packed |
|
Dilation: Greater than |
Returns
CUDNN_STATUS_SUCCESS
The operation was launched successfully.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
At least one of the following is NULL: handle, dyDesc, wDesc, convDesc, dxDesc, dy, w, dx, alpha, and beta
wDesc and dyDesc have a non-matching number of dimensions
wDesc and dxDesc have a non-matching number of dimensions
wDesc has fewer than three dimensions
wDesc, dxDesc, and dyDesc have a non-matching data type
wDesc and dxDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions)
dyDesc spatial sizes do not match with the expected size as determined by cudnnGetConvolutionNdForwardOutputDim()
CUDNN_STATUS_NOT_SUPPORTED
At least one of the following conditions is met:
dyDesc or dxDesc have a negative tensor striding
dyDesc, wDesc, or dxDesc has a number of dimensions that is not 4 or 5
The chosen algo does not support the parameters provided; refer to the above tables for an exhaustive list of parameters that support each algo.
dyDesc or wDesc indicate an output channel count that isn't a multiple of group count (if group count has been set in convDesc)
CUDNN_STATUS_MAPPING_ERROR
An error occurs during the texture binding of texture object creation associated with the filter data or the input differential tensor data.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
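A minimal sketch of one way to call this function, using CUDNN_CONVOLUTION_BWD_DATA_ALGO_1 and sizing the workspace with cudnnGetConvolutionBackwardDataWorkspaceSize(); convBackwardData is an illustrative helper, the descriptors and device pointers are assumed to be set up consistently, and error checking is omitted.

#include <cuda_runtime.h>
#include <cudnn.h>

/* Sketch: dx = data gradient of dy through the filter w. */
void convBackwardData(cudnnHandle_t handle,
                      cudnnFilterDescriptor_t wDesc, const void *w,
                      cudnnTensorDescriptor_t dyDesc, const void *dy,
                      cudnnConvolutionDescriptor_t convDesc,
                      cudnnTensorDescriptor_t dxDesc, void *dx)
{
    cudnnConvolutionBwdDataAlgo_t algo = CUDNN_CONVOLUTION_BWD_DATA_ALGO_1;

    size_t wsBytes = 0;
    cudnnGetConvolutionBackwardDataWorkspaceSize(handle, wDesc, dyDesc,
                                                 convDesc, dxDesc, algo,
                                                 &wsBytes);
    void *ws = NULL;
    if (wsBytes > 0) cudaMalloc(&ws, wsBytes);

    const float alpha = 1.0f, beta = 0.0f;  /* beta = 1 would accumulate into dx */
    cudnnConvolutionBackwardData(handle, &alpha, wDesc, w, dyDesc, dy,
                                 convDesc, algo, ws, wsBytes,
                                 &beta, dxDesc, dx);
    cudaFree(ws);
}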
cudnnConvolutionBackwardFilter()
This function has been deprecated in cuDNN 9.0.
This function computes the convolution weight (filter) gradient of the tensor dy, where y is the output of the forward convolution in cudnnConvolutionForward(). It uses the specified algo, and returns the results in the output tensor dw. Scaling factors alpha and beta can be used to scale the computed result or accumulate with the current dw.
cudnnStatus_t cudnnConvolutionBackwardFilter(
    cudnnHandle_t handle,
    const void *alpha,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const cudnnTensorDescriptor_t dyDesc,
    const void *dy,
    const cudnnConvolutionDescriptor_t convDesc,
    cudnnConvolutionBwdFilterAlgo_t algo,
    void *workSpace,
    size_t workSpaceSizeInBytes,
    const void *beta,
    const cudnnFilterDescriptor_t dwDesc,
    void *dw)
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc
Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
dy
Input. Data pointer to GPU memory associated with the backpropagation gradient tensor descriptor dyDesc.
convDesc
Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.
algo
Input. Enumerant that specifies which convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionBwdFilterAlgo_t.
workSpace
Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NULL.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
dwDesc
Input. Handle to a previously initialized filter gradient descriptor. For more information, refer to cudnnFilterDescriptor_t.
dw
Input/Output. Data pointer to GPU memory associated with the filter gradient descriptor dwDesc that carries the result.
Supported Configurations
This function supports the following combinations of data types for xDesc, dyDesc, convDesc, and dwDesc.
Data Type Configurations |
|
|
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Supported Algorithms
Specifying a separate algorithm can cause changes in performance, support, and computation determinism. Refer to the following table for an exhaustive list of algorithm options and their respective supported parameters and deterministic behavior.
The table below shows the list of the supported 2D and 3D convolutions. The 2D convolutions are described first, followed by the 3D convolutions.
For brevity, the short-form versions followed by > are used in the table below:
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
>_ALGO_0
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
>_ALGO_1
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3
>_ALGO_3
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT
>_FFT
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING
>_FFT_TILING
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED
>_WINOGRAD_NONFUSED
CUDNN_TENSOR_NCHW
>_NCHW
CUDNN_TENSOR_NHWC
>_NHWC
CUDNN_TENSOR_NCHW_VECT_C
>_NCHW_VECT_C
Algo Name |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
All except |
NHWC HWC-packed |
|
Algo Name |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
No |
All except |
NCHW CHW-packed |
|
Dilation: Greater than |
|
Yes |
All except |
NCHW CHW-packed |
|
Dilation: Greater than |
|
Yes |
NCHW CHW-packed |
NCHW CHW-packed |
|
Dilation: |
|
No |
All except |
NCHW CHW-packed |
|
Dilation: |
|
Yes |
All except |
NCHW CHW-packed |
|
Dilation: |
|
Yes |
NCHW CHW-packed |
NCHW CHW-packed |
|
Dilation: |
Algo Name (3D Convolutions) |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
No |
All except |
NCDHW CDHW-packed NCDHW W-packed NDHWC |
|
Dilation: Greater than |
|
No |
All except |
NCDHW CDHW-packed NCDHW W-packed NDHWC |
|
Dilation: Greater than |
|
No |
NCDHW fully-packed |
NCDHW fully-packed |
|
Dilation: Greater than |
Algo Name (3D Convolutions) |
Deterministic |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|---|
|
Yes |
NDHWC HWC-packed |
NDHWC HWC-packed |
|
Dilation: Greater than |
Returns
CUDNN_STATUS_SUCCESS
The operation was launched successfully.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
At least one of the following is NULL: handle, xDesc, dyDesc, convDesc, dwDesc, xData, dyData, dwData, alpha, or beta
xDesc and dyDesc have a non-matching number of dimensions
xDesc and dwDesc have a non-matching number of dimensions
xDesc has fewer than three dimensions
xDesc, dyDesc, and dwDesc have a non-matching data type
xDesc and dwDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions)
yDesc or dwDesc indicate an output channel count that isn't a multiple of group count (if group count has been set in convDesc)
CUDNN_STATUS_NOT_SUPPORTED
At least one of the following conditions is met:
xDesc or dyDesc have negative tensor striding
xDesc, dyDesc, or dwDesc has a number of dimensions that is not 4 or 5
The chosen algo does not support the parameters provided; see above for an exhaustive list of parameter support for each algo
CUDNN_STATUS_MAPPING_ERROR
An error occurs during the texture object creation associated with the filter data.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
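A minimal sketch of one way to call this function, using CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 and sizing the workspace with cudnnGetConvolutionBackwardFilterWorkspaceSize(); convBackwardFilter is an illustrative helper, descriptors and device pointers are assumed valid, and error checking is omitted.

#include <cuda_runtime.h>
#include <cudnn.h>

/* Sketch: dw = filter gradient from the input x and the gradient dy. */
void convBackwardFilter(cudnnHandle_t handle,
                        cudnnTensorDescriptor_t xDesc, const void *x,
                        cudnnTensorDescriptor_t dyDesc, const void *dy,
                        cudnnConvolutionDescriptor_t convDesc,
                        cudnnFilterDescriptor_t dwDesc, void *dw)
{
    cudnnConvolutionBwdFilterAlgo_t algo = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1;

    size_t wsBytes = 0;
    cudnnGetConvolutionBackwardFilterWorkspaceSize(handle, xDesc, dyDesc,
                                                   convDesc, dwDesc, algo,
                                                   &wsBytes);
    void *ws = NULL;
    if (wsBytes > 0) cudaMalloc(&ws, wsBytes);

    const float alpha = 1.0f, beta = 0.0f;  /* beta = 1 accumulates into dw */
    cudnnConvolutionBackwardFilter(handle, &alpha, xDesc, x, dyDesc, dy,
                                   convDesc, algo, ws, wsBytes,
                                   &beta, dwDesc, dw);
    cudaFree(ws);
}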
cudnnConvolutionBiasActivationForward()
This function has been deprecated in cuDNN 9.0.
This function applies a bias and then an activation to the convolutions or cross-correlations of cudnnConvolutionForward(), returning results in y. The full computation follows the equation y = act (alpha1 * conv(x) + alpha2 * z + bias).
cudnnStatus_t cudnnConvolutionBiasActivationForward(
    cudnnHandle_t handle,
    const void *alpha1,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const cudnnFilterDescriptor_t wDesc,
    const void *w,
    const cudnnConvolutionDescriptor_t convDesc,
    cudnnConvolutionFwdAlgo_t algo,
    void *workSpace,
    size_t workSpaceSizeInBytes,
    const void *alpha2,
    const cudnnTensorDescriptor_t zDesc,
    const void *z,
    const cudnnTensorDescriptor_t biasDesc,
    const void *bias,
    const cudnnActivationDescriptor_t activationDesc,
    const cudnnTensorDescriptor_t yDesc,
    void *y)
The routine cudnnGetConvolution2dForwardOutputDim() or cudnnGetConvolutionNdForwardOutputDim() can be used to determine the proper dimensions of the output tensor descriptor yDesc with respect to xDesc, convDesc, and wDesc.
Only the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algo is enabled with CUDNN_ACTIVATION_IDENTITY. In other words, in the cudnnActivationDescriptor_t structure of the input activationDesc, if the mode of the cudnnActivationMode_t field is set to the enum value CUDNN_ACTIVATION_IDENTITY, then the input cudnnConvolutionFwdAlgo_t of this function cudnnConvolutionBiasActivationForward() must be set to the enum value CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM. For more information, refer to cudnnSetActivationDescriptor().
Device pointers z and y may point to the same buffer; however, x cannot point to the same buffer as z or y.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
alpha1, alpha2
Input. Pointers to scaling factors (in host memory) used to blend the computation result of convolution with z and bias as follows: y = act (alpha1 * conv(x) + alpha2 * z + bias)
For more information, refer to Scaling Parameters.
xDesc
Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
wDesc
Input. Handle to a previously initialized filter descriptor. For more information, refer to cudnnFilterDescriptor_t.
w
Input. Data pointer to GPU memory associated with the filter descriptor wDesc.
convDesc
Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.
algo
Input. Enumerant that specifies which convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionFwdAlgo_t.
workSpace
Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NULL.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
zDesc
Input. Handle to a previously initialized tensor descriptor.
z
Input. Data pointer to GPU memory associated with the tensor descriptor zDesc.
biasDesc
Input. Handle to a previously initialized tensor descriptor.
bias
Input. Data pointer to GPU memory associated with the tensor descriptor biasDesc.
activationDesc
Input. Handle to a previously initialized activation descriptor. For more information, refer to cudnnActivationDescriptor_t.
yDesc
Input. Handle to a previously initialized tensor descriptor.
y
Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution.
For the convolution step, this function supports the specific combinations of data types for xDesc, wDesc, convDesc, and yDesc as listed in the documentation of cudnnConvolutionForward(). The following table specifies the supported combinations of data types for x, y, z, bias, alpha1, and alpha2.
|
|
|
|
|
|
---|---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Returns
In addition to the error values listed by the documentation of cudnnConvolutionForward(), the possible error values returned by this function and their meanings are listed below.
CUDNN_STATUS_SUCCESS
The operation was launched successfully.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
At least one of the following is NULL: handle, xDesc, wDesc, convDesc, yDesc, zDesc, biasDesc, activationDesc, xData, wData, yData, zData, bias, alpha1, and alpha2.
The number of dimensions of xDesc, wDesc, yDesc, and zDesc is not equal to the array length of convDesc + 2.
CUDNN_STATUS_NOT_SUPPORTED
The function does not support the provided configuration. Some examples of non-supported configurations include:
The mode of activationDesc is not CUDNN_ACTIVATION_RELU or CUDNN_ACTIVATION_IDENTITY.
The reluNanOpt of activationDesc is not CUDNN_NOT_PROPAGATE_NAN.
The second stride of biasDesc is not equal to 1.
The first dimension of biasDesc is not equal to 1.
The second dimension of biasDesc and the first dimension of filterDesc are not equal.
The data type of biasDesc does not correspond to the data type of yDesc as listed in the above data type tables.
zDesc and destDesc do not match.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
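A minimal sketch computing y = ReLU(alpha1 * conv(x) + alpha2 * z + bias) with the IMPLICIT_PRECOMP_GEMM algorithm; convBiasRelu is an illustrative helper, the descriptors and device buffers are assumed to be created and shape-consistent, and error checking is omitted.

#include <cuda_runtime.h>
#include <cudnn.h>

/* Sketch: fused convolution + bias + ReLU. With alpha2 = 0, the contents
   of z do not contribute to the result. */
void convBiasRelu(cudnnHandle_t handle,
                  cudnnTensorDescriptor_t xDesc, const void *x,
                  cudnnFilterDescriptor_t wDesc, const void *w,
                  cudnnConvolutionDescriptor_t convDesc,
                  cudnnTensorDescriptor_t zDesc, const void *z,
                  cudnnTensorDescriptor_t biasDesc, const void *bias,
                  cudnnTensorDescriptor_t yDesc, void *y)
{
    cudnnConvolutionFwdAlgo_t algo =
        CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;

    cudnnActivationDescriptor_t actDesc;
    cudnnCreateActivationDescriptor(&actDesc);
    cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_RELU,
                                 CUDNN_NOT_PROPAGATE_NAN, 0.0);

    size_t wsBytes = 0;
    cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc,
                                            yDesc, algo, &wsBytes);
    void *ws = NULL;
    if (wsBytes > 0) cudaMalloc(&ws, wsBytes);

    const float alpha1 = 1.0f, alpha2 = 0.0f;
    cudnnConvolutionBiasActivationForward(handle, &alpha1, xDesc, x, wDesc, w,
                                          convDesc, algo, ws, wsBytes,
                                          &alpha2, zDesc, z, biasDesc, bias,
                                          actDesc, yDesc, y);
    cudaFree(ws);
    cudnnDestroyActivationDescriptor(actDesc);
}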
cudnnConvolutionForward()
This function has been deprecated in cuDNN 9.0.
This function executes convolutions or cross-correlations over x using filters specified with w, returning results in y. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
cudnnStatus_t cudnnConvolutionForward(
    cudnnHandle_t handle,
    const void *alpha,
    const cudnnTensorDescriptor_t xDesc,
    const void *x,
    const cudnnFilterDescriptor_t wDesc,
    const void *w,
    const cudnnConvolutionDescriptor_t convDesc,
    cudnnConvolutionFwdAlgo_t algo,
    void *workSpace,
    size_t workSpaceSizeInBytes,
    const void *beta,
    const cudnnTensorDescriptor_t yDesc,
    void *y)
The routine cudnnGetConvolution2dForwardOutputDim() or cudnnGetConvolutionNdForwardOutputDim() can be used to determine the proper dimensions of the output tensor descriptor yDesc with respect to xDesc, convDesc, and wDesc.
Parameters
handle
Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.
alpha, beta
Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:
dstValue = alpha[0]*result + beta[0]*priorDstValue
For more information, refer to Scaling Parameters.
xDesc
Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.
x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
wDesc
Input. Handle to a previously initialized filter descriptor. For more information, refer to cudnnFilterDescriptor_t.
w
Input. Data pointer to GPU memory associated with the filter descriptor wDesc.
convDesc
Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.
algo
Input. Enumerant that specifies which convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionFwdAlgo_t.
workSpace
Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NULL.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
yDesc
Input. Handle to a previously initialized tensor descriptor.
y
Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution.
Supported Configurations
This function supports the following combinations of data types for xDesc, wDesc, convDesc, and yDesc.
Data Type Configurations |
|
|
|
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Supported Algorithms
For this function, all algorithms perform deterministic computations. Specifying a separate algorithm can cause changes in performance and support.
The following table shows the list of the supported 2D and 3D convolutions. The 2D convolutions are described first, followed by the 3D convolutions.
For brevity, the short-form versions followed by > are used in the table below:
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
>_IMPLICIT_GEMM
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
>_IMPLICIT_PRECOMP_GEMM
CUDNN_CONVOLUTION_FWD_ALGO_GEMM
>_GEMM
CUDNN_CONVOLUTION_FWD_ALGO_DIRECT
>_DIRECT
CUDNN_CONVOLUTION_FWD_ALGO_FFT
>_FFT
CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING
>_FFT_TILING
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD
>_WINOGRAD
CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED
>_WINOGRAD_NONFUSED
CUDNN_TENSOR_NCHW
>_NCHW
CUDNN_TENSOR_NHWC
>_NHWC
CUDNN_TENSOR_NCHW_VECT_C
>_NCHW_VECT_C
Algo Name |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|
|
All except |
All except |
|
Dilation: Greater than |
|
All except |
All except |
|
Dilation: |
|
All except |
All except |
|
Dilation: |
|
NCHW HW-packed |
NCHW HW-packed |
|
Dilation: |
|
NCHW HW-packed |
NCHW HW-packed |
|
Dilation: |
|
All except |
All except |
|
Dilation: |
|
All except |
All except |
|
Dilation: |
|
Currently, not implemented in cuDNN. |
Currently, not implemented in cuDNN. |
Currently, not implemented in cuDNN. |
Currently, not implemented in cuDNN. |
Algo Name |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|
|
All except |
All except |
|
Dilation: |
|
All except |
All except |
|
Dilation: |
Algo Name |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|
|
NHWC fully-packed |
NHWC fully-packed |
|
Dilation: |
|
NHWC HWC-packed |
NHWC HWC-packed NCHW CHW-packed |
|
|
Algo Name |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|
|
All except |
All except |
|
Dilation: Greater than |
|
NCDHW DHW-packed |
NCDHW DHW-packed |
|
Dilation: |
Algo Name |
Tensor Formats Supported for |
Tensor Formats Supported for |
Data Type Configurations Supported |
Important |
---|---|---|---|---|
|
NDHWC DHWC-packed |
NDHWC DHWC-packed |
|
Dilation: Greater than |
Tensors can be converted to and from CUDNN_TENSOR_NCHW_VECT_C with cudnnTransformTensor().
Returns
CUDNN_STATUS_SUCCESS
The operation was launched successfully.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions is met:
At least one of the following is NULL: handle, xDesc, wDesc, convDesc, yDesc, xData, w, yData, alpha, and beta
xDesc and yDesc have a non-matching number of dimensions
xDesc and wDesc have a non-matching number of dimensions
xDesc has fewer than three dimensions
xDesc number of dimensions is not equal to convDesc array length + 2
xDesc and wDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions)
yDesc or wDesc indicate an output channel count that isn't a multiple of group count (if group count has been set in convDesc)
xDesc, wDesc, and yDesc have a non-matching data type
For some spatial dimension, wDesc has a spatial size that is larger than the input spatial size (including zero-padding size)
CUDNN_STATUS_NOT_SUPPORTED
At least one of the following conditions is met:
xDesc or yDesc have negative tensor striding
xDesc, wDesc, or yDesc has a number of dimensions that is not 4 or 5
yDesc spatial sizes do not match with the expected size as determined by cudnnGetConvolutionNdForwardOutputDim()
The chosen algo does not support the parameters provided; see above for an exhaustive list of parameters supported for each algo
CUDNN_STATUS_MAPPING_ERROR
An error occurs during the texture object creation associated with the filter data.
CUDNN_STATUS_EXECUTION_FAILED
The function failed to launch on the GPU.
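A minimal end-to-end sketch of a single-precision NCHW forward convolution that lets cuDNN report the output shape and the workspace size; convForward is an illustrative helper, the device buffers x, w, and y are assumed to be allocated with matching sizes, and error checking is omitted.

#include <stddef.h>
#include <cuda_runtime.h>
#include <cudnn.h>

/* Sketch: y = conv(x, w) with padding 1, stride 1, and ALGO_IMPLICIT_PRECOMP_GEMM. */
void convForward(cudnnHandle_t handle,
                 int n, int c, int h, int w_in,   /* input  NCHW */
                 int k, int r, int s,             /* filter KCRS */
                 const void *x, const void *w, void *y)
{
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateTensorDescriptor(&yDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, c, h, w_in);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               k, c, r, s);
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    int on, oc, oh, ow;
    cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                          &on, &oc, &oh, &ow);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               on, oc, oh, ow);

    cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
    size_t wsBytes = 0;
    cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc,
                                            yDesc, algo, &wsBytes);
    void *ws = NULL;
    if (wsBytes > 0) cudaMalloc(&ws, wsBytes);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                            algo, ws, wsBytes, &beta, yDesc, y);

    cudaFree(ws);
    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyTensorDescriptor(xDesc);
}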
cudnnCreateConvolutionDescriptor()
This function has been deprecated in cuDNN 9.0.
This function creates a convolution descriptor object by allocating the memory needed to hold its opaque structure. For more information, refer to cudnnConvolutionDescriptor_t.
cudnnStatus_t cudnnCreateConvolutionDescriptor( cudnnConvolutionDescriptor_t *convDesc)
Returns
CUDNN_STATUS_SUCCESS
The object was created successfully.
CUDNN_STATUS_ALLOC_FAILED
The resources could not be allocated.
cudnnCreateFusedOpsConstParamPack()
This function has been deprecated in cuDNN 9.0.
This function creates an opaque structure to store the various problem size information, such as the shape, layout and the type of tensors, and the descriptors for convolution and activation, for the selected sequence of cudnnFusedOps computations.
cudnnStatus_t cudnnCreateFusedOpsConstParamPack( cudnnFusedOpsConstParamPack_t *constPack, cudnnFusedOps_t ops);
Parameters
constPack
Input. The opaque structure that is created by this function. For more information, refer to cudnnFusedOpsConstParamPack_t.
ops
Input. The specific sequence of computations to perform in the cudnnFusedOps computations, as defined in the enumerant type cudnnFusedOps_t.
Returns
CUDNN_STATUS_BAD_PARAM
If either constPack or ops is NULL.
CUDNN_STATUS_ALLOC_FAILED
The resources could not be allocated.
CUDNN_STATUS_SUCCESS
If the descriptor is created successfully.
cudnnCreateFusedOpsPlan()
This function has been deprecated in cuDNN 9.0.
This function creates the plan descriptor for the cudnnFusedOps computation. This descriptor contains the plan information, including the problem type and size, which kernels should be run, and the internal workspace partition.
cudnnStatus_t cudnnCreateFusedOpsPlan( cudnnFusedOpsPlan_t *plan, cudnnFusedOps_t ops);
Parameters
plan
Input. A pointer to the instance of the descriptor created by this function.
ops
Input. The specific sequence of fused operations computations for which this plan descriptor should be created. For more information, refer to cudnnFusedOps_t.
Returns
CUDNN_STATUS_BAD_PARAM
If either the input *plan is NULL or the ops input is not a valid cudnnFusedOps_t enum.
CUDNN_STATUS_ALLOC_FAILED
The resources could not be allocated.
CUDNN_STATUS_SUCCESS
The plan descriptor is created successfully.
cudnnCreateFusedOpsVariantParamPack()
This function has been deprecated in cuDNN 9.0.
This function creates the variant pack descriptor for the cudnnFusedOps computation.
cudnnStatus_t cudnnCreateFusedOpsVariantParamPack( cudnnFusedOpsVariantParamPack_t *varPack, cudnnFusedOps_t ops);
Parameters
varPack
Input. Pointer to the descriptor created by this function. For more information, refer to cudnnFusedOpsVariantParamPack_t.
ops
Input. The specific sequence of fused operations computations for which this descriptor should be created.
Returns
CUDNN_STATUS_SUCCESS
The descriptor was created successfully.
CUDNN_STATUS_ALLOC_FAILED
The resources could not be allocated.
CUDNN_STATUS_BAD_PARAM
If any input is invalid.
cudnnDestroyConvolutionDescriptor()
This function has been deprecated in cuDNN 9.0.
This function destroys a previously created convolution descriptor object.
cudnnStatus_t cudnnDestroyConvolutionDescriptor( cudnnConvolutionDescriptor_t convDesc)
Returns
CUDNN_STATUS_SUCCESS
The descriptor was destroyed successfully.
cudnnDestroyFusedOpsConstParamPack()
This function has been deprecated in cuDNN 9.0.
This function destroys a previously-created cudnnFusedOpsConstParamPack_t structure.
cudnnStatus_t cudnnDestroyFusedOpsConstParamPack( cudnnFusedOpsConstParamPack_t constPack);
Parameters
constPack
Input. The cudnnFusedOpsConstParamPack_t structure that should be destroyed.
Returns
CUDNN_STATUS_SUCCESS
The descriptor was destroyed successfully.
CUDNN_STATUS_INTERNAL_ERROR
The ops enum value is either not supported or is invalid.
cudnnDestroyFusedOpsPlan()
This function has been deprecated in cuDNN 9.0.
This function destroys the plan descriptor provided.
cudnnStatus_t cudnnDestroyFusedOpsPlan( cudnnFusedOpsPlan_t plan);
Parameters
plan
Input. The descriptor that should be destroyed by this function.
Returns
CUDNN_STATUS_SUCCESS
Either the plan descriptor is NULL or the descriptor was successfully destroyed.
cudnnDestroyFusedOpsVariantParamPack()
This function has been deprecated in cuDNN 9.0.
This function destroys a previously-created descriptor for cudnnFusedOps variant parameters.
cudnnStatus_t cudnnDestroyFusedOpsVariantParamPack( cudnnFusedOpsVariantParamPack_t varPack);
Parameters
varPack
Input. The descriptor that should be destroyed.
Returns
CUDNN_STATUS_SUCCESS
The descriptor was successfully destroyed.
cudnnFindConvolutionBackwardDataAlgorithm()
This function has been deprecated in cuDNN 9.0.
This function attempts all algorithms available for cudnnConvolutionBackwardData(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).
cudnnStatus_t cudnnFindConvolutionBackwardDataAlgorithm(
    cudnnHandle_t handle,
    const cudnnFilterDescriptor_t wDesc,
    const cudnnTensorDescriptor_t dyDesc,
    const cudnnConvolutionDescriptor_t convDesc,
    const cudnnTensorDescriptor_t dxDesc,
    const int requestedAlgoCount,
    int *returnedAlgoCount,
    cudnnConvolutionBwdDataAlgoPerf_t *perfResults)
Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.
Memory is allocated via cudaMalloc()
. The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdDataAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardDataAlgorithmMaxCount().
Note
This function is host blocking.
It is recommended to run this function prior to allocating layer data; doing otherwise may needlessly inhibit some algorithm options due to resource usage.
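For illustration, a minimal sketch of the usual call sequence (the handle and descriptors are assumed to have been created and initialized elsewhere; error checking is omitted):

#include <cudnn.h>
#include <stdlib.h>

/* Sketch: exhaustively benchmark backward-data algorithms and return the fastest. */
static cudnnConvolutionBwdDataAlgo_t pick_bwd_data_algo(cudnnHandle_t handle,
        cudnnFilterDescriptor_t wDesc, cudnnTensorDescriptor_t dyDesc,
        cudnnConvolutionDescriptor_t convDesc, cudnnTensorDescriptor_t dxDesc)
{
    int maxAlgoCount = 0, returnedAlgoCount = 0;
    cudnnGetConvolutionBackwardDataAlgorithmMaxCount(handle, &maxAlgoCount);
    cudnnConvolutionBwdDataAlgoPerf_t *perfResults =
        (cudnnConvolutionBwdDataAlgoPerf_t *)malloc(maxAlgoCount * sizeof(*perfResults));
    cudnnFindConvolutionBackwardDataAlgorithm(handle, wDesc, dyDesc, convDesc, dxDesc,
                                              maxAlgoCount, &returnedAlgoCount, perfResults);
    /* Results are sorted by measured time; real code should also check
     * returnedAlgoCount and perfResults[0].status before using the result. */
    cudnnConvolutionBwdDataAlgo_t best = perfResults[0].algo;
    free(perfResults);
    return best;
}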
Parameters
handle
Input. Handle to a previously created cuDNN context.
wDesc
Input. Handle to a previously initialized filter descriptor.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
convDesc
Input. Previously initialized convolution descriptor.
dxDesc
Input. Handle to the previously initialized output tensor descriptor.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly
wDesc, dyDesc, or dxDesc is not allocated properly
wDesc, dyDesc, or dxDesc has fewer than 1 dimension
Either returnedAlgoCount or perfResults is NIL
requestedAlgoCount is less than 1
CUDNN_STATUS_ALLOC_FAILED
This function was unable to allocate memory to store sample input, filters, and output.
CUDNN_STATUS_INTERNAL_ERROR
At least one of the following conditions are met:
The function was unable to allocate necessary timing objects
The function was unable to deallocate necessary timing objects
The function was unable to deallocate sample input, filters, and output
cudnnFindConvolutionBackwardDataAlgorithmEx()
This function has been deprecated in cuDNN 9.0.
This function attempts all algorithms available for cudnnConvolutionBackwardData(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).
cudnnStatus_t cudnnFindConvolutionBackwardDataAlgorithmEx( cudnnHandle_t handle, const cudnnFilterDescriptor_t wDesc, const void *w, const cudnnTensorDescriptor_t dyDesc, const void *dy, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t dxDesc, void *dx, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionBwdDataAlgoPerf_t *perfResults, void *workSpace, size_t workSpaceSizeInBytes)
Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.
Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdDataAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardDataAlgorithmMaxCount().
Note
This function is host blocking.
Parameters
handle
Input. Handle to a previously created cuDNN context.
wDesc
Input. Handle to a previously initialized filter descriptor.
w
Input. Data pointer to GPU memory associated with the filter descriptor wDesc.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
dy
Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.
convDesc
Input. Previously initialized convolution descriptor.
dxDesc
Input. Handle to the previously initialized output tensor descriptor.
dx
Input/Output. Data pointer to GPU memory associated with the tensor descriptor dxDesc. The content of this tensor will be overwritten with arbitrary values.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
workSpace
Input. Data pointer to GPU memory that is a necessary workspace for some algorithms. The size of this workspace will determine the availability of algorithms. A NIL pointer is considered a workSpace of 0 bytes.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly
wDesc, dyDesc, or dxDesc is not allocated properly
wDesc, dyDesc, or dxDesc has fewer than 1 dimension
Either returnedAlgoCount or perfResults is NIL
requestedAlgoCount is less than 1
CUDNN_STATUS_INTERNAL_ERROR
At least one of the following conditions are met:
The function was unable to allocate necessary timing objects
The function was unable to deallocate necessary timing objects
The function was unable to deallocate sample input, filters, and output
cudnnFindConvolutionBackwardFilterAlgorithm()
This function has been deprecated in cuDNN 9.0.
This function attempts all algorithms available for cudnnConvolutionBackwardFilter(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).
cudnnStatus_t cudnnFindConvolutionBackwardFilterAlgorithm( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnFilterDescriptor_t dwDesc, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionBwdFilterAlgoPerf_t *perfResults)
Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.
Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdFilterAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardFilterAlgorithmMaxCount().
Note
This function is host blocking.
It is recommended to run this function prior to allocating layer data; doing otherwise may needlessly inhibit some algorithm options due to resource usage.
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
convDesc
Input. Previously initialized convolution descriptor.
dwDesc
Input. Handle to a previously initialized filter descriptor.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly
xDesc, dyDesc, or dwDesc is not allocated properly
xDesc, dyDesc, or dwDesc has fewer than 1 dimension
Either returnedAlgoCount or perfResults is NIL
requestedAlgoCount is less than 1
CUDNN_STATUS_ALLOC_FAILED
This function was unable to allocate memory to store sample input, filters and output.
CUDNN_STATUS_INTERNAL_ERROR
At least one of the following conditions are met:
The function was unable to allocate necessary timing objects.
The function was unable to deallocate necessary timing objects.
The function was unable to deallocate sample input, filters, and output.
cudnnFindConvolutionBackwardFilterAlgorithmEx()
This function has been deprecated in cuDNN 9.0.
This function attempts all algorithms available for cudnnConvolutionBackwardFilter(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).
cudnnStatus_t cudnnFindConvolutionBackwardFilterAlgorithmEx( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const void *x, const cudnnTensorDescriptor_t dyDesc, const void *dy, const cudnnConvolutionDescriptor_t convDesc, const cudnnFilterDescriptor_t dwDesc, void *dw, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionBwdFilterAlgoPerf_t *perfResults, void *workSpace, size_t workSpaceSizeInBytes)
Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.
Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdFilterAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardFilterAlgorithmMaxCount().
Note
This function is host blocking.
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
dy
Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.
convDesc
Input. Previously initialized convolution descriptor.
dwDesc
Input. Handle to a previously initialized filter descriptor.
dw
Input/Output. Data pointer to GPU memory associated with the filter descriptor dwDesc. The content of this tensor will be overwritten with arbitrary values.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
workSpace
Input. Data pointer to GPU memory that is a necessary workspace for some algorithms. The size of this workspace will determine the availability of algorithms. A NIL pointer is considered a workSpace of 0 bytes.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly
xDesc, dyDesc, or dwDesc is not allocated properly
xDesc, dyDesc, or dwDesc has fewer than 1 dimension
x, dy, or dw is NIL
Either returnedAlgoCount or perfResults is NIL
requestedAlgoCount is less than 1
CUDNN_STATUS_INTERNAL_ERROR
At least one of the following conditions are met:
The function was unable to allocate necessary timing objects.
The function was unable to deallocate necessary timing objects.
The function was unable to deallocate sample input, filters, and output.
cudnnFindConvolutionForwardAlgorithm()
This function has been deprecated in cuDNN 9.0.
This function attempts all algorithms available for cudnnConvolutionForward(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).
cudnnStatus_t cudnnFindConvolutionForwardAlgorithm( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const cudnnFilterDescriptor_t wDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t yDesc, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionFwdAlgoPerf_t *perfResults)
Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.
Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionForwardAlgorithmMaxCount().
Note
This function is host blocking.
It is recommended to run this function prior to allocating layer data; doing otherwise may needlessly inhibit some algorithm options due to resource usage.
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
wDesc
Input. Handle to a previously initialized filter descriptor.
convDesc
Input. Previously initialized convolution descriptor.
yDesc
Input. Handle to the previously initialized output tensor descriptor.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly
xDesc, wDesc, or yDesc is not allocated properly
xDesc, wDesc, or yDesc has fewer than 1 dimension
Either returnedAlgoCount or perfResults is NIL
requestedAlgoCount is less than 1
CUDNN_STATUS_ALLOC_FAILED
This function was unable to allocate memory to store sample input, filters, and output.
CUDNN_STATUS_INTERNAL_ERROR
At least one of the following conditions are met:
The function was unable to allocate necessary timing objects.
The function was unable to deallocate necessary timing objects.
The function was unable to deallocate sample input, filters, and output.
cudnnFindConvolutionForwardAlgorithmEx()
This function has been deprecated in cuDNN 9.0.
This function attempts all algorithms available for cudnnConvolutionForward(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).
cudnnStatus_t cudnnFindConvolutionForwardAlgorithmEx( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const void *x, const cudnnFilterDescriptor_t wDesc, const void *w, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t yDesc, void *y, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionFwdAlgoPerf_t *perfResults, void *workSpace, size_t workSpaceSizeInBytes)
Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.
Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionForwardAlgorithmMaxCount().
Note
This function is host blocking.
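A sketch of how this variant is typically driven, with user-supplied device buffers and a scratch workspace (the 256 MiB workspace size is an arbitrary choice for the sketch; x, w, and y are assumed to be device buffers sized to match their descriptors; error checking is omitted):

#include <cudnn.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Sketch: benchmark forward algorithms against real buffers; y is overwritten. */
static cudnnConvolutionFwdAlgo_t pick_fwd_algo_ex(cudnnHandle_t handle,
        cudnnTensorDescriptor_t xDesc, const void *x,
        cudnnFilterDescriptor_t wDesc, const void *w,
        cudnnConvolutionDescriptor_t convDesc,
        cudnnTensorDescriptor_t yDesc, void *y)
{
    int maxCount = 0, returned = 0;
    cudnnGetConvolutionForwardAlgorithmMaxCount(handle, &maxCount);
    cudnnConvolutionFwdAlgoPerf_t *perf =
        (cudnnConvolutionFwdAlgoPerf_t *)malloc(maxCount * sizeof(*perf));

    size_t workSpaceSize = (size_t)256 * 1024 * 1024;  /* larger workspace widens the candidate set */
    void *workSpace = NULL;
    cudaMalloc(&workSpace, workSpaceSize);

    cudnnFindConvolutionForwardAlgorithmEx(handle, xDesc, x, wDesc, w, convDesc, yDesc, y,
                                           maxCount, &returned, perf, workSpace, workSpaceSize);
    cudnnConvolutionFwdAlgo_t best = perf[0].algo;     /* results are sorted fastest-first */
    cudaFree(workSpace);
    free(perf);
    return best;
}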
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
x
Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
wDesc
Input. Handle to a previously initialized filter descriptor.
w
Input. Data pointer to GPU memory associated with the filter descriptor wDesc.
convDesc
Input. Previously initialized convolution descriptor.
yDesc
Input. Handle to the previously initialized output tensor descriptor.
y
Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc. The content of this tensor will be overwritten with arbitrary values.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
workSpace
Input. Data pointer to GPU memory that is a necessary workspace for some algorithms. The size of this workspace will determine the availability of algorithms. A NIL pointer is considered a workSpace of 0 bytes.
workSpaceSizeInBytes
Input. Specifies the size in bytes of the provided workSpace.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
handle is not allocated properly
xDesc, wDesc, or yDesc is not allocated properly
xDesc, wDesc, or yDesc has fewer than 1 dimension
x, w, or y is NIL
Either returnedAlgoCount or perfResults is NIL
requestedAlgoCount is less than 1
CUDNN_STATUS_INTERNAL_ERROR
At least one of the following conditions are met:
The function was unable to allocate necessary timing objects.
The function was unable to deallocate necessary timing objects.
The function was unable to deallocate sample input, filters, and output.
cudnnFusedOpsExecute()
This function executes the sequence of cudnnFusedOps
operations.
cudnnStatus_t cudnnFusedOpsExecute( cudnnHandle_t handle, const cudnnFusedOpsPlan_t plan, cudnnFusedOpsVariantParamPack_t varPack);
Parameters
handle
Input. Pointer to the cuDNN library context.
plan
Input. Pointer to a previously-created and initialized plan descriptor.
varPack
Input. Pointer to the descriptor to the variant parameters pack.
Returns
CUDNN_STATUS_BAD_PARAM
If the type of cudnnFusedOps_t in the plan descriptor is unsupported.
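A hedged outline of where this call sits in the fused-ops workflow. The creation calls and the fusion enum value below are only an example choice, and the attribute labels that must be set through the const and variant parameter packs depend entirely on the selected fusion, so they are elided:

#include <cudnn.h>

/* Sketch of the fused-ops call sequence; descriptor attribute setup is elided. */
static void run_fused_ops(cudnnHandle_t handle)
{
    cudnnFusedOps_t ops = CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS;  /* example fusion */

    cudnnFusedOpsConstParamPack_t constPack;
    cudnnFusedOpsPlan_t plan;
    cudnnFusedOpsVariantParamPack_t varPack;
    cudnnCreateFusedOpsConstParamPack(&constPack, ops);
    cudnnCreateFusedOpsPlan(&plan, ops);
    cudnnCreateFusedOpsVariantParamPack(&varPack, ops);

    /* ... describe tensors, convolution, and activation via
     *     cudnnSetFusedOpsConstParamPackAttribute() ... */

    size_t workspaceSize = 0;
    cudnnMakeFusedOpsPlan(handle, plan, constPack, &workspaceSize);

    /* ... allocate the workspace and bind data pointers via
     *     cudnnSetFusedOpsVariantParamPackAttribute() ... */

    cudnnFusedOpsExecute(handle, plan, varPack);

    cudnnDestroyFusedOpsVariantParamPack(varPack);
    cudnnDestroyFusedOpsPlan(plan);
    cudnnDestroyFusedOpsConstParamPack(constPack);
}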
cudnnGetConvolution2dDescriptor()
This function has been deprecated in cuDNN 9.0.
This function queries a previously initialized 2D convolution descriptor object.
cudnnStatus_t cudnnGetConvolution2dDescriptor( const cudnnConvolutionDescriptor_t convDesc, int *pad_h, int *pad_w, int *u, int *v, int *dilation_h, int *dilation_w, cudnnConvolutionMode_t *mode, cudnnDataType_t *computeType)
Parameters
convDesc
Input/Output. Handle to a previously created convolution descriptor.
pad_h
Output. Zero-padding height: number of rows of zeros implicitly concatenated onto the top and onto the bottom of input images.
pad_w
Output. Zero-padding width: number of columns of zeros implicitly concatenated onto the left and onto the right of input images.
u
Output. Vertical filter stride.
v
Output. Horizontal filter stride.
dilation_h
Output. Filter height dilation.
dilation_w
Output. Filter width dilation.
mode
Output. Convolution mode.
computeType
Output. Compute precision.
Returns
CUDNN_STATUS_SUCCESS
The operation was successful.
CUDNN_STATUS_BAD_PARAM
The parameter convDesc is NIL.
cudnnGetConvolution2dForwardOutputDim()
This function has been deprecated in cuDNN 9.0.
This function returns the dimensions of the resulting 4D tensor of a 2D convolution, given the convolution descriptor, the input tensor descriptor and the filter descriptor. This function can help to set up the output tensor and allocate the proper amount of memory prior to launching the actual convolution.
cudnnStatus_t cudnnGetConvolution2dForwardOutputDim( const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t inputTensorDesc, const cudnnFilterDescriptor_t filterDesc, int *n, int *c, int *h, int *w)
Each dimension h and w of the output images is computed as follows:
outputDim = 1 + ( inputDim + 2*pad - (((filterDim-1)*dilation)+1) )/convolutionStride;
Note
The dimensions provided by this routine must be strictly respected when calling cudnnConvolutionForward() or cudnnConvolutionBackwardBias(). Providing a smaller or larger output tensor is not supported by the convolution routines.
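For illustration, a short sketch of the typical use, assuming xDesc and wDesc already describe the input and filter and that the output tensor uses NCHW FP32 layout. As a worked instance of the formula above, a 224x224 input with a 3x3 filter, pad 1, stride 1, dilation 1 gives outputDim = 1 + (224 + 2*1 - ((3-1)*1 + 1))/1 = 224:

#include <cudnn.h>

/* Sketch: size the output tensor descriptor from the convolution geometry. */
static void make_output_desc(cudnnConvolutionDescriptor_t convDesc,
                             cudnnTensorDescriptor_t xDesc,
                             cudnnFilterDescriptor_t wDesc,
                             cudnnTensorDescriptor_t *yDescOut)
{
    int n, c, h, w;
    cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &n, &c, &h, &w);
    cudnnCreateTensorDescriptor(yDescOut);
    cudnnSetTensor4dDescriptor(*yDescOut, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w);
}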
Parameters
convDesc
Input. Handle to a previously created convolution descriptor.
inputTensorDesc
Input. Handle to a previously initialized tensor descriptor.
filterDesc
Input. Handle to a previously initialized filter descriptor.
n
Output. Number of output images.
c
Output. Number of output feature maps per image.
h
Output. Height of each output feature map.
w
Output. Width of each output feature map.
Returns
CUDNN_STATUS_BAD_PARAM
One or more of the descriptors has not been created correctly, or there is a mismatch between the feature maps of inputTensorDesc and filterDesc.
CUDNN_STATUS_SUCCESS
The dimensions were returned successfully.
cudnnGetConvolutionBackwardDataAlgorithm_v7()
This function has been deprecated in cuDNN 9.0.
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardData() for the given layer specifications. This function will return all algorithms (including CUDNN_TENSOR_OP_MATH and CUDNN_DEFAULT_MATH versions of algorithms where CUDNN_TENSOR_OP_MATH may be available) sorted by expected (based on internal heuristic) relative performance, with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, use cudnnFindConvolutionBackwardDataAlgorithm(). The total number of resulting algorithms can be queried through the returnedAlgoCount variable.
cudnnStatus_t cudnnGetConvolutionBackwardDataAlgorithm_v7( cudnnHandle_t handle, const cudnnFilterDescriptor_t wDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t dxDesc, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionBwdDataAlgoPerf_t *perfResults)
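A sketch of the heuristic query (descriptors assumed initialized; error checking omitted):

#include <cudnn.h>
#include <stdlib.h>

/* Sketch: take the heuristically ranked list and keep the top choice. */
static cudnnConvolutionBwdDataAlgoPerf_t heuristic_bwd_data_choice(cudnnHandle_t handle,
        cudnnFilterDescriptor_t wDesc, cudnnTensorDescriptor_t dyDesc,
        cudnnConvolutionDescriptor_t convDesc, cudnnTensorDescriptor_t dxDesc)
{
    int maxCount = 0, returned = 0;
    cudnnGetConvolutionBackwardDataAlgorithmMaxCount(handle, &maxCount);
    cudnnConvolutionBwdDataAlgoPerf_t *perf =
        (cudnnConvolutionBwdDataAlgoPerf_t *)malloc(maxCount * sizeof(*perf));
    cudnnGetConvolutionBackwardDataAlgorithm_v7(handle, wDesc, dyDesc, convDesc, dxDesc,
                                                maxCount, &returned, perf);
    cudnnConvolutionBwdDataAlgoPerf_t top = perf[0];  /* expected fastest; top.memory is its workspace need */
    free(perf);
    return top;
}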
Parameters
handle
Input. Handle to a previously created cuDNN context.
wDesc
Input. Handle to a previously initialized filter descriptor.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
convDesc
Input. Previously initialized convolution descriptor.
dxDesc
Input. Handle to the previously initialized output tensor descriptor.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
One of the parameters handle, wDesc, dyDesc, convDesc, dxDesc, perfResults, or returnedAlgoCount is NULL.
The numbers of feature maps of the input tensor and output tensor differ.
The dataType of the two tensor descriptors or the filters are different.
requestedAlgoCount is less than or equal to 0.
cudnnGetConvolutionBackwardDataAlgorithmMaxCount()
This function has been deprecated in cuDNN 9.0.
This function returns the maximum number of algorithms which can be returned from cudnnFindConvolutionBackwardDataAlgorithm() and cudnnGetConvolutionBackwardDataAlgorithm_v7(). This is the sum of all algorithms plus the sum of all algorithms with Tensor Core operations supported for the current device.
cudnnStatus_t cudnnGetConvolutionBackwardDataAlgorithmMaxCount( cudnnHandle_t handle, int *count)
Parameters
handle
Input. Handle to a previously created cuDNN context.
count
Output. The resulting maximum number of algorithms.
Returns
CUDNN_STATUS_SUCCESS
The function was successful.
CUDNN_STATUS_BAD_PARAM
The provided handle is not allocated properly.
cudnnGetConvolutionBackwardDataWorkspaceSize()
This function has been deprecated in cuDNN 9.0.
This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionBackwardData() with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionBackwardData(). The specified algorithm can be the result of the call to cudnnGetConvolutionBackwardDataAlgorithm_v7() or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.
cudnnStatus_t cudnnGetConvolutionBackwardDataWorkspaceSize( cudnnHandle_t handle, const cudnnFilterDescriptor_t wDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t dxDesc, cudnnConvolutionBwdDataAlgo_t algo, size_t *sizeInBytes)
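For example, querying and allocating the workspace for one specific algorithm (CUDNN_CONVOLUTION_BWD_DATA_ALGO_1 is just an example choice; error checking omitted):

#include <cudnn.h>
#include <cuda_runtime.h>

/* Sketch: query the workspace need for a chosen algorithm and allocate it. */
static void *alloc_bwd_data_workspace(cudnnHandle_t handle,
        cudnnFilterDescriptor_t wDesc, cudnnTensorDescriptor_t dyDesc,
        cudnnConvolutionDescriptor_t convDesc, cudnnTensorDescriptor_t dxDesc,
        size_t *sizeOut)
{
    size_t sizeInBytes = 0;
    cudnnGetConvolutionBackwardDataWorkspaceSize(handle, wDesc, dyDesc, convDesc, dxDesc,
                                                 CUDNN_CONVOLUTION_BWD_DATA_ALGO_1, &sizeInBytes);
    void *workSpace = NULL;
    if (sizeInBytes > 0)
        cudaMalloc(&workSpace, sizeInBytes);   /* passed later to cudnnConvolutionBackwardData() */
    *sizeOut = sizeInBytes;
    return workSpace;
}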
Parameters
handle
Input. Handle to a previously created cuDNN context.
wDesc
Input. Handle to a previously initialized filter descriptor.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
convDesc
Input. Previously initialized convolution descriptor.
dxDesc
Input. Handle to the previously initialized output tensor descriptor.
algo
Input. Enumerant that specifies the chosen convolution algorithm.
sizeInBytes
Output. Amount of GPU memory needed as workspace to be able to execute a backward data convolution with the specified algo.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
The numbers of feature maps of the input tensor and output tensor differ.
The dataType of the two tensor descriptors or the filter are different.
CUDNN_STATUS_NOT_SUPPORTED
The combination of the tensor descriptors, filter descriptor, and convolution descriptor is not supported for the specified algorithm.
cudnnGetConvolutionBackwardFilterAlgorithm_v7()
This function has been deprecated in cuDNN 9.0.
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardFilter() for the given layer specifications. This function will return all algorithms (including CUDNN_TENSOR_OP_MATH and CUDNN_DEFAULT_MATH versions of algorithms where CUDNN_TENSOR_OP_MATH may be available) sorted by expected (based on internal heuristic) relative performance, with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, use cudnnFindConvolutionBackwardFilterAlgorithm(). The total number of resulting algorithms can be queried through the returnedAlgoCount variable.
cudnnStatus_t cudnnGetConvolutionBackwardFilterAlgorithm_v7( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnFilterDescriptor_t dwDesc, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionBwdFilterAlgoPerf_t *perfResults)
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
convDesc
Input. Previously initialized convolution descriptor.
dwDesc
Input. Handle to a previously initialized filter descriptor.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
One of the parameters handle, xDesc, dyDesc, convDesc, dwDesc, perfResults, or returnedAlgoCount is NULL.
The numbers of feature maps of the input tensor and output tensor differ.
The dataType of the two tensor descriptors or the filter are different.
requestedAlgoCount is less than or equal to 0.
cudnnGetConvolutionBackwardFilterAlgorithmMaxCount()
This function has been deprecated in cuDNN 9.0.
This function returns the maximum number of algorithms which can be returned from cudnnFindConvolutionBackwardFilterAlgorithm() and cudnnGetConvolutionBackwardFilterAlgorithm_v7(). This is the sum of all algorithms plus the sum of all algorithms with Tensor Core operations supported for the current device.
cudnnStatus_t cudnnGetConvolutionBackwardFilterAlgorithmMaxCount( cudnnHandle_t handle, int *count)
Parameters
handle
Input. Handle to a previously created cuDNN context.
count
Output. The resulting maximum count of algorithms.
Returns
CUDNN_STATUS_SUCCESS
The function was successful.
CUDNN_STATUS_BAD_PARAM
The provided handle is not allocated properly.
cudnnGetConvolutionBackwardFilterWorkspaceSize()
This function has been deprecated in cuDNN 9.0.
This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionBackwardFilter() with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionBackwardFilter(). The specified algorithm can be the result of the call to cudnnGetConvolutionBackwardFilterAlgorithm_v7() or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.
cudnnStatus_t cudnnGetConvolutionBackwardFilterWorkspaceSize( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const cudnnTensorDescriptor_t dyDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnFilterDescriptor_t dwDesc, cudnnConvolutionBwdFilterAlgo_t algo, size_t *sizeInBytes)
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
dyDesc
Input. Handle to the previously initialized input differential tensor descriptor.
convDesc
Input. Previously initialized convolution descriptor.
dwDesc
Input. Handle to a previously initialized filter descriptor.
algo
Input. Enumerant that specifies the chosen convolution algorithm.
sizeInBytes
Output. Amount of GPU memory needed as workspace to be able to execute a backward filter convolution with the specified algo.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
The numbers of feature maps of the input tensor and output tensor differ.
The dataType of the two tensor descriptors or the filter are different.
CUDNN_STATUS_NOT_SUPPORTED
The combination of the tensor descriptors, filter descriptor and convolution descriptor is not supported for the specified algorithm.
cudnnGetConvolutionForwardAlgorithm_v7()
This function has been deprecated in cuDNN 9.0.
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward() for the given layer specifications. This function will return all algorithms (including CUDNN_TENSOR_OP_MATH and CUDNN_DEFAULT_MATH versions of algorithms where CUDNN_TENSOR_OP_MATH may be available) sorted by expected (based on internal heuristic) relative performance, with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, use cudnnFindConvolutionForwardAlgorithm(). The total number of resulting algorithms can be queried through the returnedAlgoCount variable.
cudnnStatus_t cudnnGetConvolutionForwardAlgorithm_v7( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const cudnnFilterDescriptor_t wDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t yDesc, const int requestedAlgoCount, int *returnedAlgoCount, cudnnConvolutionFwdAlgoPerf_t *perfResults)
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized input tensor descriptor.
wDesc
Input. Handle to a previously initialized convolution filter descriptor.
convDesc
Input. Previously initialized convolution descriptor.
yDesc
Input. Handle to the previously initialized output tensor descriptor.
requestedAlgoCount
Input. The maximum number of elements to be stored in perfResults.
returnedAlgoCount
Output. The number of output elements stored in perfResults.
perfResults
Output. A user-allocated array to store performance metrics sorted ascending by compute time.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
One of the parameters handle, xDesc, wDesc, convDesc, yDesc, perfResults, or returnedAlgoCount is NULL.
Either yDesc or wDesc has different dimensions from xDesc.
The data types of tensors xDesc, yDesc, or wDesc are not all the same.
The number of feature maps in xDesc and wDesc differs.
The tensor xDesc has a dimension smaller than 3.
requestedAlgoCount is less than or equal to 0.
cudnnGetConvolutionForwardAlgorithmMaxCount()
This function has been deprecated in cuDNN 9.0.
This function returns the maximum number of algorithms which can be returned from cudnnFindConvolutionForwardAlgorithm() and cudnnGetConvolutionForwardAlgorithm_v7(). This is the sum of all algorithms plus the sum of all algorithms with Tensor Core operations supported for the current device.
cudnnStatus_t cudnnGetConvolutionForwardAlgorithmMaxCount( cudnnHandle_t handle, int *count)
Parameters
handle
Input. Handle to a previously created cuDNN context.
count
Output. The resulting maximum number of algorithms.
Returns
CUDNN_STATUS_SUCCESS
The function was successful.
CUDNN_STATUS_BAD_PARAM
The provided handle is not allocated properly.
cudnnGetConvolutionForwardWorkspaceSize()
This function has been deprecated in cuDNN 9.0.
This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionForward() with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionForward(). The specified algorithm can be the result of the call to cudnnGetConvolutionForwardAlgorithm_v7() or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.
cudnnStatus_t cudnnGetConvolutionForwardWorkspaceSize( cudnnHandle_t handle, const cudnnTensorDescriptor_t xDesc, const cudnnFilterDescriptor_t wDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t yDesc, cudnnConvolutionFwdAlgo_t algo, size_t *sizeInBytes)
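A sketch that chains the heuristic choice from cudnnGetConvolutionForwardAlgorithm_v7() to the workspace query before calling cudnnConvolutionForward() (descriptors assumed initialized; error checking omitted):

#include <cudnn.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Sketch: pick the heuristically fastest forward algorithm, then size its workspace. */
static void prepare_forward(cudnnHandle_t handle,
        cudnnTensorDescriptor_t xDesc, cudnnFilterDescriptor_t wDesc,
        cudnnConvolutionDescriptor_t convDesc, cudnnTensorDescriptor_t yDesc,
        cudnnConvolutionFwdAlgo_t *algoOut, void **workSpaceOut, size_t *sizeOut)
{
    int maxCount = 0, returned = 0;
    cudnnGetConvolutionForwardAlgorithmMaxCount(handle, &maxCount);
    cudnnConvolutionFwdAlgoPerf_t *perf =
        (cudnnConvolutionFwdAlgoPerf_t *)malloc(maxCount * sizeof(*perf));
    cudnnGetConvolutionForwardAlgorithm_v7(handle, xDesc, wDesc, convDesc, yDesc,
                                           maxCount, &returned, perf);
    *algoOut = perf[0].algo;                      /* expected fastest candidate */
    free(perf);

    cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc, yDesc,
                                            *algoOut, sizeOut);
    *workSpaceOut = NULL;
    if (*sizeOut > 0)
        cudaMalloc(workSpaceOut, *sizeOut);       /* handed to cudnnConvolutionForward() */
}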
Parameters
handle
Input. Handle to a previously created cuDNN context.
xDesc
Input. Handle to the previously initialized x tensor descriptor.
wDesc
Input. Handle to a previously initialized filter descriptor.
convDesc
Input. Previously initialized convolution descriptor.
yDesc
Input. Handle to the previously initialized y tensor descriptor.
algo
Input. Enumerant that specifies the chosen convolution algorithm.
sizeInBytes
Output. Amount of GPU memory needed as workspace to be able to execute a forward convolution with the specified algo.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
One of the parameters handle, xDesc, wDesc, convDesc, or yDesc is NULL.
The tensor yDesc or wDesc is not of the same dimension as xDesc.
The tensors xDesc, yDesc, or wDesc are not of the same data type.
The numbers of feature maps of the tensors xDesc and wDesc differ.
The tensor xDesc has a dimension smaller than 3.
CUDNN_STATUS_NOT_SUPPORTED
The combination of the tensor descriptors, filter descriptor, and convolution descriptor is not supported for the specified algorithm.
cudnnGetConvolutionGroupCount()
This function has been deprecated in cuDNN 9.0.
This function returns the group count specified in the given convolution descriptor.
cudnnStatus_t cudnnGetConvolutionGroupCount( cudnnConvolutionDescriptor_t convDesc, int *groupCount)
Returns
CUDNN_STATUS_SUCCESS
The group count was returned successfully.
CUDNN_STATUS_BAD_PARAM
An invalid convolution descriptor was provided.
cudnnGetConvolutionMathType()
This function has been deprecated in cuDNN 9.0.
This function returns the math type specified in a given convolution descriptor.
cudnnStatus_t cudnnGetConvolutionMathType( cudnnConvolutionDescriptor_t convDesc, cudnnMathType_t *mathType)
Returns
CUDNN_STATUS_SUCCESS
The math type was returned successfully.
CUDNN_STATUS_BAD_PARAM
An invalid convolution descriptor was provided.
cudnnGetConvolutionNdDescriptor()
This function has been deprecated in cuDNN 9.0.
This function queries a previously initialized convolution descriptor object.
cudnnStatus_t cudnnGetConvolutionNdDescriptor( const cudnnConvolutionDescriptor_t convDesc, int arrayLengthRequested, int *arrayLength, int padA[], int filterStrideA[], int dilationA[], cudnnConvolutionMode_t *mode, cudnnDataType_t *dataType)
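A small sketch of querying a descriptor whose spatial dimensionality is not known in advance; the CUDNN_DIM_MAX macro bounds the array sizes, and requesting CUDNN_DIM_MAX-2 stays within the supported limit noted below:

#include <cudnn.h>

/* Sketch: read back the settings stored in an Nd convolution descriptor. */
static void query_conv_nd(cudnnConvolutionDescriptor_t convDesc)
{
    int pad[CUDNN_DIM_MAX], stride[CUDNN_DIM_MAX], dilation[CUDNN_DIM_MAX];
    int arrayLength = 0;
    cudnnConvolutionMode_t mode;
    cudnnDataType_t computeType;
    cudnnGetConvolutionNdDescriptor(convDesc, CUDNN_DIM_MAX - 2, &arrayLength,
                                    pad, stride, dilation, &mode, &computeType);
    /* arrayLength now holds the actual number of spatial dimensions stored in convDesc. */
}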
Parameters
convDesc
Input/Output. Handle to a previously created convolution descriptor.
arrayLengthRequested
Input. Dimension of the expected convolution descriptor. It is also the minimum size of the arrays padA, filterStrideA, and dilationA in order to be able to hold the results.
arrayLength
Output. Actual dimension of the convolution descriptor.
padA
Output. Array of dimension of at least arrayLengthRequested that will be filled with the padding parameters from the provided convolution descriptor.
filterStrideA
Output. Array of dimension of at least arrayLengthRequested that will be filled with the filter stride from the provided convolution descriptor.
dilationA
Output. Array of dimension of at least arrayLengthRequested that will be filled with the dilation parameters from the provided convolution descriptor.
mode
Output. Convolution mode of the provided descriptor.
dataType
Output. Data type of the provided descriptor.
Returns
CUDNN_STATUS_SUCCESS
The query was successful.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
The descriptor convDesc is NIL.
The arrayLengthRequested is negative.
CUDNN_STATUS_NOT_SUPPORTED
The arrayLengthRequested is greater than CUDNN_DIM_MAX-2.
cudnnGetConvolutionNdForwardOutputDim()
This function has been deprecated in cuDNN 9.0.
This function returns the dimensions of the resulting Nd tensor of a nbDims-2-D convolution, given the convolution descriptor, the input tensor descriptor, and the filter descriptor. This function can help to set up the output tensor and allocate the proper amount of memory prior to launching the actual convolution.
cudnnStatus_t cudnnGetConvolutionNdForwardOutputDim( const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t inputTensorDesc, const cudnnFilterDescriptor_t filterDesc, int nbDims, int tensorOuputDimA[])
Each dimension of the (nbDims-2)-D
images of the output tensor is computed as follows:
outputDim = 1 + ( inputDim + 2*pad - (((filterDim-1)*dilation)+1) )/convolutionStride;
The dimensions provided by this routine must be strictly respected when calling cudnnConvolutionForward() or cudnnConvolutionBackwardBias(). Providing a smaller or larger output tensor is not supported by the convolution routines.
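For instance, a sketch of sizing the output of a 3-D (volumetric) convolution, where the tensor descriptors have nbDims = 5 (NCDHW layout assumed):

#include <cudnn.h>

/* Sketch: compute the 5-D output shape of a 3-D convolution before creating yDesc. */
static void output_dims_3d(cudnnConvolutionDescriptor_t convDesc,
                           cudnnTensorDescriptor_t xDesc,
                           cudnnFilterDescriptor_t wDesc,
                           int outDims[5])
{
    cudnnGetConvolutionNdForwardOutputDim(convDesc, xDesc, wDesc, 5, outDims);
    /* outDims = {N, C_out, D_out, H_out, W_out}; use them with cudnnSetTensorNdDescriptor(). */
}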
Parameters
convDesc
Input. Handle to a previously created convolution descriptor.
inputTensorDesc
Input. Handle to a previously initialized tensor descriptor.
filterDesc
Input. Handle to a previously initialized filter descriptor.
nbDims
Input. Dimension of the output tensor.
tensorOuputDimA
Output. Array of dimension nbDims that contains, on exit of this routine, the sizes of the output tensor.
Returns
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
One of the parameters convDesc, inputTensorDesc, or filterDesc is NIL.
The dimension of the filter descriptor filterDesc is different from the dimension of the input tensor descriptor inputTensorDesc.
The dimension of the convolution descriptor is different from the dimension of the input tensor descriptor inputTensorDesc minus 2.
The feature maps of the filter descriptor filterDesc differ from those of the input tensor descriptor inputTensorDesc.
The size of the dilated filter filterDesc is larger than the padded sizes of the input tensor.
The dimension nbDims of the output array is negative or greater than the dimension of the input tensor descriptor inputTensorDesc.
CUDNN_STATUS_SUCCESS
The routine exited successfully.
cudnnGetConvolutionReorderType()
This function has been deprecated in cuDNN 9.0.
This function retrieves the convolution reorder type from the given convolution descriptor.
cudnnStatus_t cudnnGetConvolutionReorderType( cudnnConvolutionDescriptor_t convDesc, cudnnReorderType_t *reorderType);
Parameters
convDesc
Input. The convolution descriptor from which the reorder type should be retrieved.
reorderType
Output. The retrieved reorder type. For more information, refer to cudnnReorderType_t.
Returns
CUDNN_STATUS_BAD_PARAM
One of the inputs to this function is not valid.
CUDNN_STATUS_SUCCESS
The reorder type is retrieved successfully.
cudnnGetFoldedConvBackwardDataDescriptors()
This function calculates folding descriptors for backward data gradients. It takes as input the data descriptors along with the convolution descriptor and computes the folded data descriptors and the folding transform descriptors. These can then be used to do the actual folding transform.
cudnnStatus_t cudnnGetFoldedConvBackwardDataDescriptors(const cudnnHandle_t handle, const cudnnFilterDescriptor_t filterDesc, const cudnnTensorDescriptor_t diffDesc, const cudnnConvolutionDescriptor_t convDesc, const cudnnTensorDescriptor_t gradDesc, const cudnnTensorFormat_t transformFormat, cudnnFilterDescriptor_t foldedFilterDesc, cudnnTensorDescriptor_t paddedDiffDesc, cudnnConvolutionDescriptor_t foldedConvDesc, cudnnTensorDescriptor_t foldedGradDesc, cudnnTensorTransformDescriptor_t filterFoldTransDesc, cudnnTensorTransformDescriptor_t diffPadTransDesc, cudnnTensorTransformDescriptor_t gradFoldTransDesc, cudnnTensorTransformDescriptor_t gradUnfoldTransDesc) ;
Parameters
handle
Input. Handle to a previously created cuDNN context.
filterDesc
Input. Filter descriptor before folding.
diffDesc
Input. Diff descriptor before folding.
convDesc
Input. Convolution descriptor before folding.
gradDesc
Input. Gradient descriptor before folding.
transformFormat
Input. Transform format for folding.
foldedFilterDesc
Output. Folded filter descriptor.
paddedDiffDesc
Output. Padded Diff descriptor.
foldedConvDesc
Output. Folded convolution descriptor.
foldedGradDesc
Output. Folded gradient descriptor.
filterFoldTransDesc
Output. Folding transform descriptor for the filter.
diffPadTransDesc
Output. Folding transform descriptor for the diff descriptor.
gradFoldTransDesc
Output. Folding transform descriptor for gradient.
gradUnfoldTransDesc
Output. Unfolding transform descriptor for folded gradient.
Returns
CUDNN_STATUS_SUCCESS
Folded descriptors were computed successfully.
CUDNN_STATUS_BAD_PARAM
If any of the input parameters is NULL or if the input tensor has more than 4 dimensions.
CUDNN_STATUS_EXECUTION_FAILED
Computing the folded descriptors failed.
cudnnGetFusedOpsConstParamPackAttribute()
This function retrieves the values of the descriptor pointed to by the param pointer input. The type of the descriptor is indicated by the enum value of the paramLabel input.
cudnnStatus_t cudnnGetFusedOpsConstParamPackAttribute( const cudnnFusedOpsConstParamPack_t constPack, cudnnFusedOpsConstParamLabel_t paramLabel, void *param, int *isNULL);
Parameters
constPack
Input. The opaque cudnnFusedOpsConstParamPack_t structure that contains the various problem size information, such as the shape, layout, and the type of tensors, and the descriptors for convolution and activation, for the selected sequence of cudnnFusedOps_t computations.
paramLabel
Input. Several types of descriptors can be retrieved by this getter function. The param input points to the descriptor itself, and this input indicates the type of the descriptor pointed to by the param input. The cudnnFusedOpsConstParamLabel_t enumerant type enables the selection of the type of the descriptor. Refer to the param description below.
param
Input. Data pointer to the host memory associated with the descriptor that should be retrieved. The type of this descriptor depends on the value of paramLabel. For the given paramLabel, if the associated value inside the constPack is set to NULL or by default NULL, then cuDNN will copy the value or the opaque structure in the constPack to the host memory buffer pointed to by param. For more information, refer to the table in cudnnFusedOpsConstParamLabel_t.
isNULL
Input/Output. Users must pass a pointer to an integer in the host memory in this field. If the value in the constPack associated with the given paramLabel is by default NULL or previously set by the user to NULL, then cuDNN will write a non-zero value to the location pointed to by isNULL.
Returns
CUDNN_STATUS_SUCCESS
The descriptor values are retrieved successfully.
CUDNN_STATUS_BAD_PARAM
If either constPack, param, or isNULL is NULL; or if paramLabel is invalid.
cudnnGetFusedOpsVariantParamPackAttribute()
This function retrieves the settings of the variable parameter pack descriptor.
cudnnStatus_t cudnnGetFusedOpsVariantParamPackAttribute( const cudnnFusedOpsVariantParamPack_t varPack, cudnnFusedOpsVariantParamLabel_t paramLabel, void *ptr);
Parameters
varPack
Input. Pointer to the cudnnFusedOps variant parameter pack (varPack) descriptor.
paramLabel
Input. Type of the buffer pointer parameter (in the varPack descriptor). For more information, refer to cudnnFusedOpsVariantParamLabel_t. The retrieved descriptor values vary according to this type.
ptr
Output. Pointer to the host or device memory where the retrieved value is written by this function. The data type of the pointer, and the host/device memory location, depend on the paramLabel input selection. For more information, refer to cudnnFusedOpsVariantParamLabel_t.
Returns
CUDNN_STATUS_SUCCESS
The descriptor values are retrieved successfully.
CUDNN_STATUS_BAD_PARAM
If either varPack or ptr is NULL, or if paramLabel is set to an invalid value.
cudnnIm2Col()
This function has been deprecated in cuDNN 9.0.
This function constructs the A matrix necessary to perform a forward pass of GEMM convolution.
cudnnStatus_t cudnnIm2Col( cudnnHandle_t handle, cudnnTensorDescriptor_t srcDesc, const void *srcData, cudnnFilterDescriptor_t filterDesc, cudnnConvolutionDescriptor_t convDesc, void *colBuffer)
This A matrix has a height of batch_size*y_height*y_width and a width of input_channels*filter_height*filter_width, where:
batch_size is the first dimension of srcDesc
y_height/y_width are computed from cudnnGetConvolutionNdForwardOutputDim()
input_channels is the second dimension of srcDesc (when in NCHW layout)
filter_height/filter_width are the third and fourth dimensions of filterDesc
The A matrix is stored in format HW fully-packed in GPU memory.
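A hedged sketch of preparing the column buffer; the dimension arguments are placeholders the caller must supply from the descriptors above, and FP32 element size is assumed:

#include <cudnn.h>
#include <cuda_runtime.h>

/* Sketch: allocate the A matrix and fill it from the input image tensor.
 * batchSize, outH, outW, inC, filterH, filterW are assumed known to the caller. */
static void *build_im2col(cudnnHandle_t handle,
                          cudnnTensorDescriptor_t srcDesc, const void *srcData,
                          cudnnFilterDescriptor_t filterDesc,
                          cudnnConvolutionDescriptor_t convDesc,
                          size_t batchSize, size_t outH, size_t outW,
                          size_t inC, size_t filterH, size_t filterW)
{
    size_t elems = batchSize * outH * outW * inC * filterH * filterW;
    void *colBuffer = NULL;
    cudaMalloc(&colBuffer, elems * sizeof(float));   /* assumes FP32 tensor data */
    cudnnIm2Col(handle, srcDesc, srcData, filterDesc, convDesc, colBuffer);
    return colBuffer;
}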
Parameters
handle
Input. Handle to a previously created cuDNN context.
srcDesc
Input. Handle to a previously initialized tensor descriptor.
srcData
Input. Data pointer to GPU memory associated with the input tensor descriptor.
filterDesc
Input. Handle to a previously initialized filter descriptor.
convDesc
Input. Handle to a previously initialized convolution descriptor.
colBuffer
Output. Data pointer to GPU memory storing the output matrix.
Returns
CUDNN_STATUS_BAD_PARAM
srcData or colBuffer is NULL.
CUDNN_STATUS_NOT_SUPPORTED
Any of srcDesc, filterDesc, or convDesc has a dataType of CUDNN_DATA_INT8, CUDNN_DATA_INT8x4, CUDNN_DATA_UINT8, or CUDNN_DATA_UINT8x4.
convDesc has a groupCount larger than 1.
CUDNN_STATUS_EXECUTION_FAILED
The CUDA kernel execution was unsuccessful.
CUDNN_STATUS_SUCCESS
The output data array is successfully generated.
cudnnMakeFusedOpsPlan()
This function has been deprecated in cuDNN 9.0.
This function determines the optimum kernel to execute, and the workspace size the user should allocate, prior to the actual execution of the fused operations by cudnnFusedOpsExecute().
cudnnStatus_t cudnnMakeFusedOpsPlan( cudnnHandle_t handle, cudnnFusedOpsPlan_t plan, const cudnnFusedOpsConstParamPack_t constPack, size_t *workspaceSizeInBytes);
Parameters
handle
Input. Pointer to the cuDNN library context.
plan
Input. Pointer to a previously-created and initialized plan descriptor.
constPack
Input. Pointer to the descriptor to the const parameters pack.
workspaceSizeInBytes
Output. The amount of workspace size the user should allocate for the execution of this plan.
Returns
CUDNN_STATUS_BAD_PARAM
If any of the inputs is NULL, or if the type of cudnnFusedOps_t in the constPack descriptor is unsupported.
CUDNN_STATUS_SUCCESS
The function executed successfully.
cudnnReorderFilterAndBias()
This function has been deprecated in cuDNN 9.0.
This function reorders the filter and bias values for tensors with data type CUDNN_DATA_INT8x32 and tensor format CUDNN_TENSOR_NCHW_VECT_C. It can be used to enhance the inference time by separating the reordering operation from convolution.
cudnnStatus_t cudnnReorderFilterAndBias( cudnnHandle_t handle, const cudnnFilterDescriptor_t filterDesc, cudnnReorderType_t reorderType, const void *filterData, void *reorderedFilterData, int reorderBias, const void *biasData, void *reorderedBiasData);
Filter and bias tensors with data type CUDNN_DATA_INT8x32 (also implying tensor format CUDNN_TENSOR_NCHW_VECT_C) require permutation of output channel axes in order to take advantage of the Tensor Core IMMA instruction. This is done in every cudnnConvolutionForward() and cudnnConvolutionBiasActivationForward() call when the reorder type attribute of the convolution descriptor is set to CUDNN_DEFAULT_REORDER. Users can avoid the repeated reordering kernel call by first using this call to reorder the filter and bias tensors, and then calling the convolution forward APIs with the reorder type set to CUDNN_NO_REORDER.
For example, convolutions in a neural network of multiple layers can require reordering of kernels at every layer, which can take up a significant fraction of the total inference time. Using this function, the reordering can be done once on the filter and bias data, followed by the convolution operations at the multiple layers, which improves the inference time.
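A sketch of the one-time reordering followed by disabling per-call reordering; the data pointers are assumed to be device buffers of the appropriate sizes:

#include <cudnn.h>

/* Sketch: reorder an INT8x32 filter and bias once, then tell the convolution
 * descriptor to skip reordering in subsequent forward calls. */
static void reorder_once(cudnnHandle_t handle,
                         cudnnFilterDescriptor_t filterDesc,
                         cudnnConvolutionDescriptor_t convDesc,
                         const void *filterData, void *reorderedFilterData,
                         const void *biasData, void *reorderedBiasData)
{
    cudnnReorderFilterAndBias(handle, filterDesc, CUDNN_DEFAULT_REORDER,
                              filterData, reorderedFilterData,
                              1 /* reorderBias > 0 */, biasData, reorderedBiasData);
    cudnnSetConvolutionReorderType(convDesc, CUDNN_NO_REORDER);
    /* Later cudnnConvolutionForward()/cudnnConvolutionBiasActivationForward() calls
     * should be given the reordered filter and bias buffers. */
}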
Parameters
handle
Input. Handle to a previously created cuDNN context.
filterDesc
Input. Descriptor for the kernel dataset.
reorderType
Input. Setting to either perform reordering or not. For more information, refer to cudnnReorderType_t.
filterData
Input. Pointer to the filter (kernel) data location in the device memory.
reorderedFilterData
Output. Pointer to the location in the device memory where the reordered filter data will be written to, by this function. This tensor has the same dimensions as filterData.
reorderBias
Input. If > 0, then reorders the biasData also. If <= 0, then does not perform reordering operations on the biasData.
biasData
Input. Pointer to the bias data location in the device memory.
reorderedBiasData
Output. Pointer to the location in the device memory where the reordered biasData will be written to, by this function. This tensor has the same dimensions as biasData.
Returns
CUDNN_STATUS_SUCCESS
Reordering was successful.
CUDNN_STATUS_EXECUTION_FAILED
Either the reordering of the filter data or of the biasData failed.
CUDNN_STATUS_BAD_PARAM
The handle, filter descriptor, filter data, or reordered data is NULL. Or, if the bias reordering is requested (reorderBias > 0), the biasData or reordered biasData is NULL. This status can also be returned if the filter dimension size is not 4.
CUDNN_STATUS_NOT_SUPPORTED
The filter descriptor data type is not CUDNN_DATA_INT8x32, or the filter descriptor tensor is not in a vectorized layout (CUDNN_TENSOR_NCHW_VECT_C).
cudnnSetConvolution2dDescriptor()
This function has been deprecated in cuDNN 9.0.
This function initializes a previously created convolution descriptor object into a 2D correlation. This function assumes that the tensor and filter descriptors correspond to the forward convolution path and checks if their settings are valid. That same convolution descriptor can be reused in the backward path provided it corresponds to the same layer.
cudnnStatus_t cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc, int pad_h, int pad_w, int u, int v, int dilation_h, int dilation_w, cudnnConvolutionMode_t mode, cudnnDataType_t computeType)
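For example, a 3x3 cross-correlation with symmetric padding of 1, unit strides, and unit dilation, accumulating in FP32 (a minimal sketch; error checking omitted):

#include <cudnn.h>

/* Sketch: create and configure a 2-D convolution descriptor. */
static cudnnConvolutionDescriptor_t make_conv_3x3(void)
{
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateConvolutionDescriptor(&convDesc);
    cudnnSetConvolution2dDescriptor(convDesc,
                                    1, 1,                    /* pad_h, pad_w           */
                                    1, 1,                    /* u, v (strides)         */
                                    1, 1,                    /* dilation_h, dilation_w */
                                    CUDNN_CROSS_CORRELATION,
                                    CUDNN_DATA_FLOAT);       /* computeType            */
    return convDesc;
}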
Parameters
convDesc
Input/Output. Handle to a previously created convolution descriptor.
pad_h
Input. Zero-padding height: number of rows of zeros implicitly concatenated onto the top and onto the bottom of input images.
pad_w
Input. Zero-padding width: number of columns of zeros implicitly concatenated onto the left and onto the right of input images.
u
Input. Vertical filter stride.
v
Input. Horizontal filter stride.
dilation_h
Input. Filter height dilation.
dilation_w
Input. Filter width dilation.
mode
Input. Selects between
CUDNN_CONVOLUTION
andCUDNN_CROSS_CORRELATION
.computeType
Input. Compute precision.
Returns
CUDNN_STATUS_SUCCESS
The object was set successfully.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
The descriptor convDesc is NIL.
One of the parameters pad_h or pad_w is strictly negative.
One of the parameters u or v is negative or zero.
One of the parameters dilation_h or dilation_w is negative or zero.
The parameter mode has an invalid enumerant value.
cudnnSetConvolutionGroupCount()
This function has been deprecated in cuDNN 9.0.
This function allows the user to specify the number of groups to be used in the associated convolution.
cudnnStatus_t cudnnSetConvolutionGroupCount( cudnnConvolutionDescriptor_t convDesc, int groupCount)
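For example, a depthwise convolution over 64 input channels uses one group per channel; the filter descriptor must then describe per-group filters, that is, its input-channel dimension is the total channel count divided by the group count (sketch only):

#include <cudnn.h>

/* Sketch: turn an ordinary convolution descriptor into a 64-group (depthwise) one. */
static void make_depthwise(cudnnConvolutionDescriptor_t convDesc)
{
    cudnnSetConvolutionGroupCount(convDesc, 64);
}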
Returns
CUDNN_STATUS_SUCCESS
The group count was set successfully.
CUDNN_STATUS_BAD_PARAM
An invalid convolution descriptor was provided.
cudnnSetConvolutionMathType()
This function has been deprecated in cuDNN 9.0.
This function allows the user to specify whether or not the use of tensor op is permitted in the library routines associated with a given convolution descriptor.
cudnnStatus_t cudnnSetConvolutionMathType( cudnnConvolutionDescriptor_t convDesc, cudnnMathType_t mathType)
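For example, opting a convolution into Tensor Core kernels; CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION additionally permits implicit down-conversion of FP32 data, while CUDNN_DEFAULT_MATH restricts the selection to non-Tensor-Core paths (sketch only):

#include <cudnn.h>

/* Sketch: allow Tensor Core math on this convolution descriptor. */
static void enable_tensor_cores(cudnnConvolutionDescriptor_t convDesc)
{
    cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
}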
Returns
CUDNN_STATUS_SUCCESS
The math type was set successfully.
CUDNN_STATUS_BAD_PARAM
Either an invalid convolution descriptor was provided or an invalid math type was specified.
cudnnSetConvolutionNdDescriptor()
This function has been deprecated in cuDNN 9.0.
This function initializes a previously created generic convolution descriptor object into an Nd correlation. That same convolution descriptor can be reused in the backward path provided it corresponds to the same layer. The convolution computation will be done in the specified dataType, which can potentially be different from the input/output tensors.
cudnnStatus_t cudnnSetConvolutionNdDescriptor( cudnnConvolutionDescriptor_t convDesc, int arrayLength, const int padA[], const int filterStrideA[], const int dilationA[], cudnnConvolutionMode_t mode, cudnnDataType_t dataType)
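For example, a 3-D (volumetric) cross-correlation with padding 1, stride 1, and dilation 1 on each spatial dimension, computed in FP32 (a minimal sketch for an already-created descriptor):

#include <cudnn.h>

/* Sketch: configure an Nd convolution descriptor for a 3-D convolution. */
static void make_conv_3d(cudnnConvolutionDescriptor_t convDesc)
{
    int padA[3]          = {1, 1, 1};
    int filterStrideA[3] = {1, 1, 1};
    int dilationA[3]     = {1, 1, 1};
    cudnnSetConvolutionNdDescriptor(convDesc, 3, padA, filterStrideA, dilationA,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
}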
Parameters
convDesc
Input/Output. Handle to a previously created convolution descriptor.
arrayLength
Input. Dimension of the convolution.
padA
Input. Array of dimension arrayLength containing the zero-padding size for each dimension. For every dimension, the padding represents the number of extra zeros implicitly concatenated at the start and at the end of every element of that dimension.
filterStrideA
Input. Array of dimension arrayLength containing the filter stride for each dimension. For every dimension, the filter stride represents the number of elements to slide to reach the next start of the filtering window of the next point.
dilationA
Input. Array of dimension arrayLength containing the dilation factor for each dimension.
mode
Input. Selects between CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION.
dataType
Input. Selects the data type in which the computation will be done.
Note
CUDNN_DATA_HALF in cudnnSetConvolutionNdDescriptor() with HALF_CONVOLUTION_BWD_FILTER is not recommended as it is known to not be useful for any practical use case for training and will be considered to be blocked in a future cuDNN release. The use of CUDNN_DATA_HALF for input tensors in cudnnSetTensorNdDescriptor() and CUDNN_DATA_FLOAT in cudnnSetConvolutionNdDescriptor() with HALF_CONVOLUTION_BWD_FILTER is recommended and is used with the automatic mixed precision (AMP) training in many well known deep learning frameworks.
Returns
CUDNN_STATUS_SUCCESS
The object was set successfully.
CUDNN_STATUS_BAD_PARAM
At least one of the following conditions are met:
The descriptor convDesc is NIL.
The arrayLength is negative.
The enumerant mode has an invalid value.
The enumerant dataType has an invalid value.
One of the elements of padA is strictly negative.
One of the elements of filterStrideA is negative or zero.
One of the elements of dilationA is negative or zero.
CUDNN_STATUS_NOT_SUPPORTED
At least one of the following conditions are met:
The arrayLength is greater than CUDNN_DIM_MAX.
cudnnSetConvolutionReorderType()
This function has been deprecated in cuDNN 9.0.
This function sets the convolution reorder type for the given convolution descriptor.
cudnnStatus_t cudnnSetConvolutionReorderType( cudnnConvolutionDescriptor_t convDesc, cudnnReorderType_t reorderType);
Parameters
convDesc
Input. The convolution descriptor for which the reorder type should be set.
reorderType
Input. Set the reorder type to this value. For more information, refer to cudnnReorderType_t.
Returns
CUDNN_STATUS_BAD_PARAM
The reorder type supplied is not supported.
CUDNN_STATUS_SUCCESS
Reorder type is set successfully.
cudnnSetFusedOpsConstParamPackAttribute()
This function has been deprecated in cuDNN 9.0.
This function sets the descriptor pointed to by the param pointer input. The type of the descriptor to be set is indicated by the enum value of the paramLabel input.
cudnnStatus_t cudnnSetFusedOpsConstParamPackAttribute( cudnnFusedOpsConstParamPack_t constPack, cudnnFusedOpsConstParamLabel_t paramLabel, const void *param);
Parameters
constPack
Input. The opaque cudnnFusedOpsConstParamPack_t structure that contains the various problem size information, such as the shape, layout and the type of tensors, the descriptors for convolution and activation, and settings for operations such as convolution and activation.
paramLabel
Input. Several types of descriptors can be set by this setter function. The param input points to the descriptor itself, and this input indicates the type of the descriptor pointed to by the param input. The cudnnFusedOpsConstParamLabel_t enumerant type enables the selection of the type of the descriptor.
param
Input. Data pointer to the host memory, associated with the specific descriptor. The type of the descriptor depends on the value of paramLabel. For more information, refer to the table in cudnnFusedOpsConstParamLabel_t. If this pointer is set to NULL, then the cuDNN library will record it as such. If not, then the values pointed to by this pointer (meaning, the value or the opaque structure underneath) will be copied into the constPack during the cudnnSetFusedOpsConstParamPackAttribute() operation.
Returns
CUDNN_STATUS_SUCCESS
The descriptor is set successfully.
CUDNN_STATUS_BAD_PARAM
If constPack is NULL, or if paramLabel or the ops setting for constPack is invalid.
cudnnSetFusedOpsVariantParamPackAttribute()
This function has been deprecated in cuDNN 9.0.
This function sets the variable parameter pack descriptor.
cudnnStatus_t cudnnSetFusedOpsVariantParamPackAttribute( cudnnFusedOpsVariantParamPack_t varPack, cudnnFusedOpsVariantParamLabel_t paramLabel, void *ptr);
Parameters
varPack
Input. Pointer to the cudnnFusedOps variant parameter pack (varPack) descriptor.
paramLabel
Input. Type to which the buffer pointer parameter (in the varPack descriptor) is set by this function. For more information, refer to cudnnFusedOpsVariantParamLabel_t.
ptr
Input. Pointer to the host or device memory, to the value to which the descriptor parameter is set. The data type of the pointer, and the host/device memory location, depend on the paramLabel input selection. For more information, refer to cudnnFusedOpsVariantParamLabel_t.
Returns
CUDNN_STATUS_BAD_PARAM
If varPack is NULL or if paramLabel is set to an unsupported value.
CUDNN_STATUS_SUCCESS
The descriptor is set successfully.