cudnn_cnn Library


Data Type References

These are the data type references in the cudnn_cnn library.

Struct Types

These are the struct types in the cudnn_cnn library.

cudnnConvolutionBwdDataAlgoPerf_t

This struct is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this struct.

cudnnConvolutionBwdDataAlgoPerf_t is a structure containing performance results returned by cudnnFindConvolutionBackwardDataAlgorithm() or heuristic results returned by cudnnGetConvolutionBackwardDataAlgorithm_v7().

Data Members

cudnnConvolutionBwdDataAlgo_t algo

The algorithm that was run to obtain the associated performance metrics.

cudnnStatus_t status

If any error occurs during the workspace allocation or timing of cudnnConvolutionBackwardData(), this status will represent that error. Otherwise, this status will be the return status of cudnnConvolutionBackwardData().

  • CUDNN_STATUS_ALLOC_FAILED if any error occurred during workspace allocation or if the provided workspace is insufficient.

  • CUDNN_STATUS_INTERNAL_ERROR if any error occurred during timing calculations or workspace deallocation.

  • Otherwise, this will be the return status of cudnnConvolutionBackwardData().

float time

The execution time of cudnnConvolutionBackwardData() (in milliseconds).

size_t memory

The workspace size (in bytes).

cudnnDeterminism_t determinism

The determinism of the algorithm.

cudnnMathType_t mathType

The math type provided to the algorithm.

int reserved[3]

Reserved space for future properties.
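
The members above are typically consumed as an array of performance results. The following is an illustrative sketch only, not part of the API reference: it assumes handle, wDesc, dyDesc, convDesc, and dxDesc have been created and initialized elsewhere, and the helper name list_bwd_data_algos is arbitrary.

#include <stdio.h>
#include <cudnn.h>

/* Sketch: benchmark the backward-data algorithms and print the returned
   cudnnConvolutionBwdDataAlgoPerf_t entries. Descriptors are assumed to be
   created and initialized by the caller. */
static void list_bwd_data_algos(cudnnHandle_t handle,
                                cudnnFilterDescriptor_t wDesc,
                                cudnnTensorDescriptor_t dyDesc,
                                cudnnConvolutionDescriptor_t convDesc,
                                cudnnTensorDescriptor_t dxDesc)
{
    cudnnConvolutionBwdDataAlgoPerf_t perf[CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT];
    int returned = 0;

    if (cudnnFindConvolutionBackwardDataAlgorithm(
            handle, wDesc, dyDesc, convDesc, dxDesc,
            CUDNN_CONVOLUTION_BWD_DATA_ALGO_COUNT, &returned, perf) != CUDNN_STATUS_SUCCESS)
        return;

    for (int i = 0; i < returned; ++i) {
        if (perf[i].status != CUDNN_STATUS_SUCCESS)
            continue;  /* this algorithm failed for these descriptors */
        printf("algo %d: %.3f ms, workspace %zu bytes, %s\n",
               (int)perf[i].algo, perf[i].time, perf[i].memory,
               perf[i].determinism == CUDNN_DETERMINISTIC ? "deterministic"
                                                          : "non-deterministic");
    }
}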

cudnnConvolutionBwdFilterAlgoPerf_t

This struct is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this struct.

cudnnConvolutionBwdFilterAlgoPerf_t is a structure containing performance results returned by cudnnFindConvolutionBackwardFilterAlgorithm() or heuristic results returned by cudnnGetConvolutionBackwardFilterAlgorithm_v7().

Data Members

cudnnConvolutionBwdFilterAlgo_t algo

The algorithm that was run to obtain the associated performance metrics.

cudnnStatus_t status

If any error occurs during the workspace allocation or timing of cudnnConvolutionBackwardFilter(), this status will represent that error. Otherwise, this status will be the return status of cudnnConvolutionBackwardFilter().

  • CUDNN_STATUS_ALLOC_FAILED if any error occurred during workspace allocation or if the provided workspace is insufficient.

  • CUDNN_STATUS_INTERNAL_ERROR if any error occurred during timing calculations or workspace deallocation.

  • Otherwise, this will be the return status of cudnnConvolutionBackwardFilter().

float time

The execution time of cudnnConvolutionBackwardFilter() (in milliseconds).

size_t memory

The workspace size (in bytes).

cudnnDeterminism_t determinism

The determinism of the algorithm.

cudnnMathType_t mathType

The math type provided to the algorithm.

int reserved[3]

Reserved space for future properties.

cudnnConvolutionFwdAlgoPerf_t

This struct is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this struct.

cudnnConvolutionFwdAlgoPerf_t is a structure containing performance results returned by cudnnFindConvolutionForwardAlgorithm() or heuristic results returned by cudnnGetConvolutionForwardAlgorithm_v7().

Data Members

cudnnConvolutionFwdAlgo_t algo

The algorithm that was run to obtain the associated performance metrics.

cudnnStatus_t status

If any error occurs during the workspace allocation or timing of cudnnConvolutionForward(), this status will represent that error. Otherwise, this status will be the return status of cudnnConvolutionForward().

  • CUDNN_STATUS_ALLOC_FAILED if any error occurred during workspace allocation or if the provided workspace is insufficient.

  • CUDNN_STATUS_INTERNAL_ERROR if any error occurred during timing calculations or workspace deallocation.

  • Otherwise, this will be the return status of cudnnConvolutionForward().

float time

The execution time of cudnnConvolutionForward() (in milliseconds).

size_t memory

The workspace size (in bytes).

cudnnDeterminism_t determinism

The determinism of the algorithm.

cudnnMathType_t mathType

The math type provided to the algorithm.

int reserved[3]

Reserved space for future properties.
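
Similarly, an illustrative sketch (not part of the API reference) of consuming the heuristic results returned by cudnnGetConvolutionForwardAlgorithm_v7(); the descriptors are assumed to be initialized elsewhere and the helper name pick_fwd_algo is arbitrary.

#include <cudnn.h>

/* Sketch: query the forward-convolution heuristics and pick the fastest
   algorithm whose status is CUDNN_STATUS_SUCCESS. Entries are ordered by
   expected performance. */
static cudnnStatus_t pick_fwd_algo(cudnnHandle_t handle,
                                   cudnnTensorDescriptor_t xDesc,
                                   cudnnFilterDescriptor_t wDesc,
                                   cudnnConvolutionDescriptor_t convDesc,
                                   cudnnTensorDescriptor_t yDesc,
                                   cudnnConvolutionFwdAlgo_t *algo,
                                   size_t *workspaceBytes)
{
    cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
    int returned = 0;

    cudnnStatus_t st = cudnnGetConvolutionForwardAlgorithm_v7(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &returned, perf);
    if (st != CUDNN_STATUS_SUCCESS)
        return st;

    for (int i = 0; i < returned; ++i) {
        if (perf[i].status == CUDNN_STATUS_SUCCESS) {
            *algo = perf[i].algo;             /* chosen algorithm */
            *workspaceBytes = perf[i].memory; /* workspace it requires */
            return CUDNN_STATUS_SUCCESS;
        }
    }
    return CUDNN_STATUS_NOT_SUPPORTED;
}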

Pointer To Opaque Struct Types

These are the pointers to the opaque struct types in the cudnn_cnn library.

cudnnConvolutionDescriptor_t

This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.

cudnnConvolutionDescriptor_t is a pointer to an opaque structure holding the description of a convolution operation. cudnnCreateConvolutionDescriptor() is used to create one instance, and cudnnSetConvolutionNdDescriptor() or cudnnSetConvolution2dDescriptor() must be used to initialize this instance.
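
A minimal sketch of the create/set lifecycle, assuming a 3x3 filter with padding 1, stride 1, and dilation 1; error checking is omitted and the helper name make_conv_desc is arbitrary.

#include <cudnn.h>

/* Sketch: create and initialize a 2D cross-correlation descriptor with
   FP32 compute. The caller destroys it later with
   cudnnDestroyConvolutionDescriptor(). */
static void make_conv_desc(cudnnConvolutionDescriptor_t *convDesc)
{
    cudnnCreateConvolutionDescriptor(convDesc);
    cudnnSetConvolution2dDescriptor(*convDesc,
                                    1, 1,   /* zero-padding height, width */
                                    1, 1,   /* vertical, horizontal filter stride */
                                    1, 1,   /* dilation height, width */
                                    CUDNN_CROSS_CORRELATION,
                                    CUDNN_DATA_FLOAT);
    /* Optionally allow Tensor Core math. */
    cudnnSetConvolutionMathType(*convDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);
}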

cudnnFusedOpsConstParamPack_t

This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.

cudnnFusedOpsConstParamPack_t is a pointer to an opaque structure holding the description of the cudnnFusedOps constant parameters. Use the function cudnnCreateFusedOpsConstParamPack() to create one instance of this structure, and the function cudnnDestroyFusedOpsConstParamPack() to destroy a previously-created descriptor.

cudnnFusedOpsPlan_t

This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.

cudnnFusedOpsPlan_t is a pointer to an opaque structure holding the description of the cudnnFusedOpsPlan. This descriptor contains the plan information, including the problem type and size, which kernels should be run, and the internal workspace partition. Use the function cudnnCreateFusedOpsPlan() to create one instance of this structure, and the function cudnnDestroyFusedOpsPlan() to destroy a previously-created descriptor.

cudnnFusedOpsVariantParamPack_t

This type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this type.

cudnnFusedOpsVariantParamPack_t is a pointer to an opaque structure holding the description of the cudnnFusedOps variant parameters. Use the function cudnnCreateFusedOpsVariantParamPack() to create one instance of this structure, and the function cudnnDestroyFusedOpsVariantParamPack() to destroy a previously-created descriptor.
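
The three fused-ops descriptors described in this section are normally used together. The following sketch (deprecated API; error checking omitted, attribute population elided) shows the overall create, make-plan, execute, and destroy sequence; the helper name run_fused_op is arbitrary.

#include <cudnn.h>

/* Sketch of the fused-ops descriptor lifecycle. Attribute population via
   cudnnSetFusedOpsConstParamPackAttribute() and
   cudnnSetFusedOpsVariantParamPackAttribute() is elided. */
static void run_fused_op(cudnnHandle_t handle, cudnnFusedOps_t op)
{
    cudnnFusedOpsConstParamPack_t   constPack;
    cudnnFusedOpsVariantParamPack_t varPack;
    cudnnFusedOpsPlan_t             plan;
    size_t workspaceBytes = 0;

    cudnnCreateFusedOpsConstParamPack(&constPack, op);
    cudnnCreateFusedOpsVariantParamPack(&varPack, op);
    cudnnCreateFusedOpsPlan(&plan, op);

    /* ... set constant attributes on constPack here ... */

    /* Finalize the plan; returns the workspace size the plan requires. */
    cudnnMakeFusedOpsPlan(handle, plan, constPack, &workspaceBytes);

    /* ... set device pointers, workspace, and scalars on varPack here ... */

    cudnnFusedOpsExecute(handle, plan, varPack);

    cudnnDestroyFusedOpsPlan(plan);
    cudnnDestroyFusedOpsVariantParamPack(varPack);
    cudnnDestroyFusedOpsConstParamPack(constPack);
}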

Enumeration Types

These are the enumeration types in the cudnn_cnn library.

cudnnFusedOps_t

This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.

The cudnnFusedOps_t type is an enumerated type to select a specific sequence of computations to perform in the fused operations.

Members and Descriptions

CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS = 0

On a per-channel basis, it performs these operations in this order: scale, add bias, activation, convolution, and generate batchNorm statistics.

CUDNN_FUSED_SCALE_BIAS_ACTIVATION_WGRAD = 1

On a per-channel basis, it performs these operations in this order: scale, add bias, activation, convolution backward weights, and generate batchNorm statistics.

CUDNN_FUSED_BN_FINALIZE_STATISTICS_TRAINING = 2

Computes the equivalent scale and bias from ySum, ySqSum, learned scale, and bias. Optionally, update running statistics and generate saved stats.

CUDNN_FUSED_BN_FINALIZE_STATISTICS_INFERENCE = 3

Computes the equivalent scale and bias from the learned running statistics and the learned scale and bias.

CUDNN_FUSED_CONV_SCALE_BIAS_ADD_ACTIVATION = 4

On a per-channel basis, performs these operations in this order: convolution, scale, add bias, element-wise addition with another tensor, and activation.

CUDNN_FUSED_SCALE_BIAS_ADD_ACTIVATION_GEN_BITMASK = 5

On a per-channel basis, performs these operations in this order: scale and bias on one tensor, scale and bias on a second tensor, element-wise addition of these two tensors, and on the resulting tensor performs activation and generates activation bit mask.

CUDNN_FUSED_DACTIVATION_FORK_DBATCHNORM = 6

On a per-channel basis, performs these operations in this order: backward activation, fork (meaning, write out gradient for the residual branch), and backward batch norm.
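
For the two finalize-statistics operations above, the "equivalent scale and bias" follow the standard batch-normalization folding. The per-channel sketch below illustrates that algebra; it is an assumption based on the standard batchNorm formula, not a cuDNN API, and the helper name fold_bn is arbitrary.

#include <math.h>

/* Per-channel batch-norm folding: y = eqScale * x + eqBias reproduces
   y = scale * (x - mean) / sqrt(var + eps) + bias. */
static void fold_bn(const float *scale, const float *bias,
                    const float *mean, const float *var,
                    double eps, int channels,
                    float *eqScale, float *eqBias)
{
    for (int c = 0; c < channels; ++c) {
        float invStd = (float)(1.0 / sqrt((double)var[c] + eps));
        eqScale[c] = scale[c] * invStd;
        eqBias[c]  = bias[c] - scale[c] * mean[c] * invStd;
    }
}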

cudnnFusedOpsConstParamLabel_t

This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.

The cudnnFusedOpsConstParamLabel_t is an enumerated type used to select which constant parameter of the cudnnFusedOps const param pack is being set or queried. For more information, refer to cudnnSetFusedOpsConstParamPackAttribute().

typedef enum {
    CUDNN_PARAM_XDESC                          = 0,
    CUDNN_PARAM_XDATA_PLACEHOLDER              = 1,
    CUDNN_PARAM_BN_MODE                        = 2,
    CUDNN_PARAM_BN_EQSCALEBIAS_DESC            = 3,
    CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER         = 4,
    CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER          = 5,
    CUDNN_PARAM_ACTIVATION_DESC                = 6,
    CUDNN_PARAM_CONV_DESC                      = 7,
    CUDNN_PARAM_WDESC                          = 8,
    CUDNN_PARAM_WDATA_PLACEHOLDER              = 9,
    CUDNN_PARAM_DWDESC                         = 10,
    CUDNN_PARAM_DWDATA_PLACEHOLDER             = 11,
    CUDNN_PARAM_YDESC                          = 12,
    CUDNN_PARAM_YDATA_PLACEHOLDER              = 13,
    CUDNN_PARAM_DYDESC                         = 14,
    CUDNN_PARAM_DYDATA_PLACEHOLDER             = 15,
    CUDNN_PARAM_YSTATS_DESC                    = 16,
    CUDNN_PARAM_YSUM_PLACEHOLDER               = 17,
    CUDNN_PARAM_YSQSUM_PLACEHOLDER             = 18,
    CUDNN_PARAM_BN_SCALEBIAS_MEANVAR_DESC      = 19,
    CUDNN_PARAM_BN_SCALE_PLACEHOLDER           = 20,
    CUDNN_PARAM_BN_BIAS_PLACEHOLDER            = 21,
    CUDNN_PARAM_BN_SAVED_MEAN_PLACEHOLDER      = 22,
    CUDNN_PARAM_BN_SAVED_INVSTD_PLACEHOLDER    = 23,
    CUDNN_PARAM_BN_RUNNING_MEAN_PLACEHOLDER    = 24,
    CUDNN_PARAM_BN_RUNNING_VAR_PLACEHOLDER     = 25,
    CUDNN_PARAM_ZDESC                          = 26,
    CUDNN_PARAM_ZDATA_PLACEHOLDER              = 27,
    CUDNN_PARAM_BN_Z_EQSCALEBIAS_DESC          = 28,
    CUDNN_PARAM_BN_Z_EQSCALE_PLACEHOLDER       = 29,
    CUDNN_PARAM_BN_Z_EQBIAS_PLACEHOLDER        = 30,
    CUDNN_PARAM_ACTIVATION_BITMASK_DESC        = 31,
    CUDNN_PARAM_ACTIVATION_BITMASK_PLACEHOLDER = 32,
    CUDNN_PARAM_DXDESC                         = 33,
    CUDNN_PARAM_DXDATA_PLACEHOLDER             = 34,
    CUDNN_PARAM_DZDESC                         = 35,
    CUDNN_PARAM_DZDATA_PLACEHOLDER             = 36,
    CUDNN_PARAM_BN_DSCALE_PLACEHOLDER          = 37,
    CUDNN_PARAM_BN_DBIAS_PLACEHOLDER           = 38,
    } cudnnFusedOpsConstParamLabel_t;
Legend For Tables in cudnnFusedOpsConstParamLabel_t

  • Setter: cudnnSetFusedOpsConstParamPackAttribute()

  • Getter: cudnnGetFusedOpsConstParamPackAttribute()

  • X_PointerPlaceHolder_t: cudnnFusedOpsPointerPlaceHolder_t

  • X_ prefix in the Attribute Key column: stands for CUDNN_PARAM_ in the enumerator name.

For the Attribute CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS in cudnnFusedOpsConstParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

Description

Default Value After Creation

X_XDESC

In the setter, the *param should be xDesc, a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and datatype of the x (input) tensor.

NULL

X_XDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether xData pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_MODE

In the setter, the *param should be a pointer to a previously initialized cudnnBatchNormMode_t*.

Describes the mode of operation for the scale, bias and the statistics. As of cuDNN 7.6.0, only CUDNN_BATCHNORM_SPATIAL and CUDNN_BATCHNORM_SPATIAL_PERSISTENT are supported, meaning, scale, bias, and statistics are all per-channel.

CUDNN_BATCHNORM_PER_ACTIVATION

X_BN_EQSCALEBIAS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and datatype of the batchNorm equivalent scale and bias tensors. The shapes must match the mode specified in CUDNN_PARAM_BN_MODE. If set to NULL, both scale and bias operation will become a NOP.

NULL

X_BN_EQSCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent scale pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the scale operation becomes a NOP.

CUDNN_PTR_NULL

X_BN_EQBIAS_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent bias pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the bias operation becomes a NOP.

CUDNN_PTR_NULL

X_ACTIVATION_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnActivationDescriptor_t*.

Describes the activation operation. As of 7.6.0, only activation modes of CUDNN_ACTIVATION_RELU and CUDNN_ACTIVATION_IDENTITY are supported. If set to NULL or if the activation mode is set to CUDNN_ACTIVATION_IDENTITY, then the activation in the op sequence becomes a NOP.

NULL

X_CONV_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnConvolutionDescriptor_t*.

Describes the convolution operation.

NULL

X_WDESC

In the setter, the *param should be a pointer to a previously initialized cudnnFilterDescriptor_t*.

Filter descriptor describing the size, layout and datatype of the w (filter) tensor.

NULL

X_WDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether w (filter) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_YDESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t*.

Tensor descriptor describing the size, layout and datatype of the y (output) tensor.

NULL

X_YDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether y (output) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_YSTATS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t*.

Tensor descriptor describing the size, layout and datatype of the sum of y and sum of y square tensors. The shapes need to match the mode specified in CUDNN_PARAM_BN_MODE. If set to NULL, the y statistics generation operation will become a NOP.

NULL

X_YSUM_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether sum of y pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, the y statistics generation operation will become a NOP.

CUDNN_PTR_NULL

X_YSQSUM_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether sum of y square pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, the y statistics generation operation will become a NOP.

CUDNN_PTR_NULL

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

As of cuDNN 7.6.0, if the following conditions in the table are met, then the fully fused fast path will be triggered. Otherwise, a slower partially fused path will be triggered.

Conditions for Fully Fused Fast Path (Forward) for cudnnFusedOpsConstParamLabel_t

Parameter

Condition

Device compute capability

Needs to be one of 7.0, 7.2, or 7.5.

  • CUDNN_PARAM_XDESC

  • CUDNN_PARAM_XDATA_PLACEHOLDER

  • Tensor is 4 dimensional

  • Datatype is CUDNN_DATA_HALF

  • Layout is NHWC fully packed

  • Alignment is CUDNN_PTR_16B_ALIGNED

  • Tensor C dimension is a multiple of 8.

  • CUDNN_PARAM_BN_EQSCALEBIAS_DESC

  • CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER

  • CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER

  • If either the scale or the bias operation is not turned into a NOP:
    • Tensor is 4 dimensional with shape 1xCx1x1

    • Datatype is CUDNN_DATA_HALF

    • Layout is fully packed

    • Alignment is CUDNN_PTR_16B_ALIGNED

  • CUDNN_PARAM_CONV_DESC

  • CUDNN_PARAM_WDESC

  • CUDNN_PARAM_WDATA_PLACEHOLDER

  • Convolution descriptor's mode needs to be CUDNN_CROSS_CORRELATION.

  • Convolution descriptor's dataType needs to be CUDNN_DATA_FLOAT.

  • Convolution descriptor's dilationA is (1,1).

  • Convolution descriptor's group count needs to be 1.

  • Convolution descriptor's mathType needs to be CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION.

  • Filter is in NHWC layout

  • Filter data type is CUDNN_DATA_HALF

  • Filter K dimension is a multiple of 32

  • Filter size RxS is either 1x1 or 3x3

  • If filter size RxS is 1x1, the convolution descriptor's padA needs to be (0,0) and filterStrideA needs to be (1,1).

  • Filter alignment is CUDNN_PTR_16B_ALIGNED

  • CUDNN_PARAM_YDESC

  • CUDNN_PARAM_YDATA_PLACEHOLDER

  • Tensor is 4 dimensional

  • Datatype is CUDNN_DATA_HALF

  • Layout is NHWC fully packed

  • Alignment is CUDNN_PTR_16B_ALIGNED

  • CUDNN_PARAM_YSTATS_DESC

  • CUDNN_PARAM_YSUM_PLACEHOLDER

  • CUDNN_PARAM_YSQSUM_PLACEHOLDER

  • If the generate statistics operation is not turned into a NOP:
    • Tensor is 4 dimensional with shape 1xKx1x1

    • Datatype is CUDNN_DATA_FLOAT

    • Layout is fully packed

    • Alignment is CUDNN_PTR_16B_ALIGNED

For the attribute CUDNN_FUSED_SCALE_BIAS_ACTIVATION_WGRAD in cudnnFusedOpsConstParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

Description

Default Value After Creation

X_XDESC

In the setter, the *param should be xDesc, a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and datatype of the x (input) tensor.

NULL

X_XDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether xData pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_MODE

In the setter, the *param should be a pointer to a previously initialized cudnnBatchNormMode_t*.

Describes the mode of operation for the scale, bias and the statistics. As of cuDNN 7.6.0, only CUDNN_BATCHNORM_SPATIAL and CUDNN_BATCHNORM_SPATIAL_PERSISTENT are supported, meaning, scale, bias, and statistics are all per-channel.

CUDNN_BATCHNORM_PER_ACTIVATION

X_BN_EQSCALEBIAS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and datatype of the batchNorm equivalent scale and bias tensors. The shapes must match the mode specified in CUDNN_PARAM_BN_MODE. If set to NULL, both scale and bias operation will become a NOP.

NULL

X_BN_EQSCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent scale pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the scale operation becomes a NOP.

CUDNN_PTR_NULL

X_BN_EQBIAS_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent bias pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the bias operation becomes a NOP.

CUDNN_PTR_NULL

X_ACTIVATION_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnActivationDescriptor_t*.

Describes the activation operation. As of 7.6.0, only activation modes of CUDNN_ACTIVATION_RELU and CUDNN_ACTIVATION_IDENTITY are supported. If set to NULL or if the activation mode is set to CUDNN_ACTIVATION_IDENTITY, then the activation in the op sequence becomes a NOP.

NULL

X_CONV_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnConvolutionDescriptor_t*.

Describes the convolution operation.

NULL

X_DWDESC

In the setter, the *param should be a pointer to a previously initialized cudnnFilterDescriptor_t*.

Filter descriptor describing the size, layout and datatype of the dw (filter gradient output) tensor.

NULL

X_DWDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether dw (filter gradient output) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_DYDESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t*.

Tensor descriptor describing the size, layout and datatype of the dy (gradient input) tensor.

NULL

X_DYDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether dy (gradient input) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

As of cuDNN 7.6.0, if the following conditions in the table are met, then the fully fused fast path will be triggered. Otherwise, a slower partially fused path will be triggered.

Conditions for Fully Fused Fast Path (Backward) for cudnnFusedOpsConstParamLabel_t

Parameter

Condition

Device compute capability

Needs to be one of 7.0, 7.2, or 7.5.

  • CUDNN_PARAM_XDESC

  • CUDNN_PARAM_XDATA_PLACEHOLDER

  • Tensor is 4 dimensional

  • Datatype is CUDNN_DATA_HALF

  • Layout is NHWC fully packed

  • Alignment is CUDNN_PTR_16B_ALIGNED

  • Tensor C dimension is a multiple of 8.

  • CUDNN_PARAM_BN_EQSCALEBIAS_DESC

  • CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER

  • CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER

  • If either the scale or the bias operation is not turned into a NOP:
    • Tensor is 4 dimensional with shape 1xCx1x1

    • Datatype is CUDNN_DATA_HALF

    • Layout is fully packed

    • Alignment is CUDNN_PTR_16B_ALIGNED

  • CUDNN_PARAM_CONV_DESC

  • CUDNN_PARAM_DWDESC

  • CUDNN_PARAM_DWDATA_PLACEHOLDER

  • Convolution descriptor's mode needs to be CUDNN_CROSS_CORRELATION.

  • Convolution descriptor's dataType needs to be CUDNN_DATA_FLOAT.

  • Convolution descriptor's dilationA is (1,1).

  • Convolution descriptor's group count needs to be 1.

  • Convolution descriptor's mathType needs to be CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION.

  • Filter gradient is in NHWC layout

  • Filter gradient data type is CUDNN_DATA_HALF

  • Filter gradient K dimension is a multiple of 32.

  • Filter gradient size RxS is either 1x1 or 3x3

  • If filter gradient size RxS is 1x1, the convolution descriptor's padA needs to be (0,0) and filterStrideA needs to be (1,1).

  • Filter gradient alignment is CUDNN_PTR_16B_ALIGNED

  • CUDNN_PARAM_DYDESC

  • CUDNN_PARAM_DYDATA_PLACEHOLDER

  • Tensor is 4 dimensional

  • Datatype is CUDNN_DATA_HALF

  • Layout is NHWC fully packed

  • Alignment is CUDNN_PTR_16B_ALIGNED

For the attribute CUDNN_FUSED_BN_FINALIZE_STATISTICS_TRAINING in cudnnFusedOpsConstParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

Description

Default Value After Creation

X_BN_MODE

In the setter, the *param should be a pointer to a previously initialized cudnnBatchNormMode_t*.

Describes the mode of operation for the scale, bias and the statistics. As of cuDNN 7.6.0, only CUDNN_BATCHNORM_SPATIAL and CUDNN_BATCHNORM_SPATIAL_PERSISTENT are supported, meaning, scale, bias and statistics are all per-channel.

CUDNN_BATCHNORM_PER_ACTIVATION

X_YSTATS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and data type of the sum of y and sum of y square tensors. The shapes need to match the mode specified in CUDNN_PARAM_BN_MODE.

NULL

X_YSUM_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether sum of y pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_YSQSUM_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether sum of y square pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_SCALEBIAS_MEANVAR_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

A common tensor descriptor describing the size, layout, and data type of the batchNorm trained scale, bias, and statistics tensors. The shapes need to match the mode specified in CUDNN_PARAM_BN_MODE (similar to the bnScaleBiasMeanVarDesc field in the cudnnBatchNormalization* API).

NULL

X_BN_SCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm trained scale pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If the output of BN_EQSCALE is not needed, then this is not needed and may be NULL.

CUDNN_PTR_NULL

X_BN_BIAS_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm trained bias pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If neither output of BN_EQSCALE or BN_EQBIAS is needed, then this is not needed and may be NULL.

CUDNN_PTR_NULL

X_BN_SAVED_MEAN_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm saved mean pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

X_BN_SAVED_INVSTD_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm saved inverse standard deviation pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

X_BN_RUNNING_MEAN_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm running mean pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

X_BN_RUNNING_VAR_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm running variance pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

X_BN_EQSCALEBIAS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and data type of the batchNorm equivalent scale and bias tensors. The shapes need to match the mode specified in CUDNN_PARAM_BN_MODE. If neither output of BN_EQSCALE or BN_EQBIAS is needed, then this is not needed and may be NULL.

NULL

X_BN_EQSCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent scale pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

X_BN_EQBIAS_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent bias pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

For the attribute CUDNN_FUSED_BN_FINALIZE_STATISTICS_INFERENCE in cudnnFusedOpsConstParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

Description

Default Value After Creation

X_BN_MODE

In the setter, the *param should be a pointer to a previously initialized cudnnBatchNormMode_t*.

Describes the mode of operation for the scale, bias, and the statistics. As of cuDNN 7.6.0, only CUDNN_BATCHNORM_SPATIAL and CUDNN_BATCHNORM_SPATIAL_PERSISTENT are supported, meaning, scale, bias, and statistics are all per-channel.

CUDNN_BATCHNORM_PER_ACTIVATION

X_BN_SCALEBIAS_MEANVAR_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

A common tensor descriptor describing the size, layout, and data type of the batchNorm trained scale, bias, and statistics tensors. The shapes need to match the mode specified in CUDNN_PARAM_BN_MODE (similar to the bnScaleBiasMeanVarDesc field in the cudnnBatchNormalization* API).

NULL

X_BN_SCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm trained scale pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_BIAS_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm trained bias pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_RUNNING_MEAN_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm running mean pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_RUNNING_VAR_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm running variance pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_EQSCALEBIAS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and data type of the batchNorm equivalent scale and bias tensors. The shapes need to match the mode specified in CUDNN_PARAM_BN_MODE.

NULL

X_BN_EQSCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm equivalent scale pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

X_BN_EQBIAS_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether the batchNorm equivalent bias pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the computation for this output becomes a NOP.

CUDNN_PTR_NULL

The fused operation performs the following computation, where * denotes the convolution operator: y = α₁ (w * x) + α₂ z + b

For the attribute CUDNN_FUSED_CONV_SCALE_BIAS_ADD_ACTIVATION in cudnnFusedOpsConstParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

Description

Default Value After Creation

X_XDESC

In the setter, the *param should be xDesc, a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and data type of the x (input) tensor.

NULL

X_XDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether xData pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_CONV_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnConvolutionDescriptor_t*.

Describes the convolution operation.

NULL

X_WDESC

In the setter, the *param should be a pointer to a previously initialized cudnnFilterDescriptor_t*.

Filter descriptor describing the size, layout, and data type of the w (filter) tensor.

NULL

X_WDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether w (filter) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

X_BN_EQSCALEBIAS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t*.

Tensor descriptor describing the size, layout, and data type of the \(\alpha_{1}\) scale and bias tensors. The tensor should have shape (1,K,1,1), K is the number of output features.

NULL

X_BN_EQSCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm equivalent scale or \(\alpha_{1}\) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then \(\alpha_{1}\) scaling becomes an NOP.

CUDNN_PTR_NULL

X_ZDESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t.

Tensor descriptor describing the size, layout, and data type of the z tensor. If unset, then z scale-add term becomes a NOP.

NULL

CUDNN_PARAM_ZDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether z tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then z scale-add term becomes a NOP.

CUDNN_PTR_NULL

CUDNN_PARAM_BN_Z_EQSCALEBIAS_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t*.

Tensor descriptor describing the size, layout, and data type of the \(\alpha_{2}\) tensor. If set to NULL then scaling for input z becomes a NOP.

NULL

CUDNN_PARAM_BN_Z_EQSCALE_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether batchNorm z equivalent scaling pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *. If set to CUDNN_PTR_NULL, then the scaling for input z becomes a NOP.

CUDNN_PTR_NULL

X_ACTIVATION_DESC

In the setter, the *param should be a pointer to a previously initialized cudnnActivationDescriptor_t*.

Describes the activation operation. As of 7.6.0, only activation modes of CUDNN_ACTIVATION_RELU and CUDNN_ACTIVATION_IDENTITY are supported. If set to NULL or if the activation mode is set to CUDNN_ACTIVATION_IDENTITY, then the activation in the op sequence becomes a NOP.

NULL

X_YDESC

In the setter, the *param should be a pointer to a previously initialized cudnnTensorDescriptor_t*.

Tensor descriptor describing the size, layout, and data type of the y (output) tensor.

NULL

X_YDATA_PLACEHOLDER

In the setter, the *param should be a pointer to a previously initialized X_PointerPlaceHolder_t*.

Describes whether y (output) tensor pointer in the VariantParamPack will be NULL, or if not, user promised pointer alignment *.

CUDNN_PTR_NULL

cudnnFusedOpsPointerPlaceHolder_t

This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.

cudnnFusedOpsPointerPlaceHolder_t is an enumerated type used to select the alignment type of the cudnnFusedOps descriptor pointer.

Members and Descriptions

CUDNN_PTR_NULL = 0

Indicates that the pointer to the tensor in the variantPack will be NULL.

CUDNN_PTR_ELEM_ALIGNED = 1

Indicates that the pointer to the tensor in the variantPack will not be NULL, and will have element alignment.

CUDNN_PTR_16B_ALIGNED = 2

Indicates that the pointer to the tensor in the variantPack will not be NULL, and will have 16 byte alignment.

cudnnFusedOpsVariantParamLabel_t

This enumerated type is deprecated and is currently only used by deprecated APIs. Consider using replacements for the deprecated APIs that use this enumerated type.

The cudnnFusedOpsVariantParamLabel_t is an enumerated type used to select which variant parameter (a device buffer pointer or a host-side scalar) of the cudnnFusedOps variant param pack is being set or queried. These values can be changed in each iteration.

typedef enum {
    CUDNN_PTR_XDATA                              = 0,
    CUDNN_PTR_BN_EQSCALE                         = 1,
    CUDNN_PTR_BN_EQBIAS                          = 2,
    CUDNN_PTR_WDATA                              = 3,
    CUDNN_PTR_DWDATA                             = 4,
    CUDNN_PTR_YDATA                              = 5,
    CUDNN_PTR_DYDATA                             = 6,
    CUDNN_PTR_YSUM                               = 7,
    CUDNN_PTR_YSQSUM                             = 8,
    CUDNN_PTR_WORKSPACE                          = 9,
    CUDNN_PTR_BN_SCALE                           = 10,
    CUDNN_PTR_BN_BIAS                            = 11,
    CUDNN_PTR_BN_SAVED_MEAN                      = 12,
    CUDNN_PTR_BN_SAVED_INVSTD                    = 13,
    CUDNN_PTR_BN_RUNNING_MEAN                    = 14,
    CUDNN_PTR_BN_RUNNING_VAR                     = 15,
    CUDNN_PTR_ZDATA                              = 16,
    CUDNN_PTR_BN_Z_EQSCALE                       = 17,
    CUDNN_PTR_BN_Z_EQBIAS                        = 18,
    CUDNN_PTR_ACTIVATION_BITMASK                 = 19,
    CUDNN_PTR_DXDATA                             = 20,
    CUDNN_PTR_DZDATA                             = 21,
    CUDNN_PTR_BN_DSCALE                          = 22,
    CUDNN_PTR_BN_DBIAS                           = 23,
    CUDNN_SCALAR_SIZE_T_WORKSPACE_SIZE_IN_BYTES  = 100,
    CUDNN_SCALAR_INT64_T_BN_ACCUMULATION_COUNT   = 101,
    CUDNN_SCALAR_DOUBLE_BN_EXP_AVG_FACTOR        = 102,
    CUDNN_SCALAR_DOUBLE_BN_EPSILON               = 103,
    } cudnnFusedOpsVariantParamLabel_t;
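
An illustrative sketch of populating a variant param pack for CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS and executing a previously finalized plan. Per the tables below, device buffers are passed to the setter directly as void *, while host-side scalars are passed by address. The plan and packs are assumed to have been created and configured elsewhere, the helper name execute_plan is arbitrary, and error checking is omitted.

#include <cudnn.h>

/* Sketch: fill in the per-iteration device pointers and workspace, then
   execute the fused-ops plan. */
static void execute_plan(cudnnHandle_t handle,
                         cudnnFusedOpsPlan_t plan,
                         cudnnFusedOpsVariantParamPack_t varPack,
                         void *devX, void *devW, void *devY,
                         void *devYSum, void *devYSqSum,
                         void *devWorkspace, size_t workspaceBytes)
{
    /* Device buffers are passed directly as void *. */
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_XDATA, devX);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_WDATA, devW);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_YDATA, devY);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_YSUM, devYSum);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_YSQSUM, devYSqSum);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_WORKSPACE, devWorkspace);

    /* Host-side scalars are passed by address. */
    cudnnSetFusedOpsVariantParamPackAttribute(
        varPack, CUDNN_SCALAR_SIZE_T_WORKSPACE_SIZE_IN_BYTES, &workspaceBytes);

    cudnnFusedOpsExecute(handle, plan, varPack);
}
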
Legend For Tables in cudnnFusedOpsVariantParamLabel_t

  • Setter: cudnnSetFusedOpsVariantParamPackAttribute()

  • Getter: cudnnGetFusedOpsVariantParamPackAttribute()

  • X_ prefix in the Attribute Key column: stands for CUDNN_PTR_ or CUDNN_SCALAR_ in the enumerator name.

For the Attribute CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS in cudnnFusedOpsVariantParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

I/O Type

Description

Default Value

X_XDATA

void *

input

Pointer to x (input) tensor on device, need to agree with previously set CUDNN_PARAM_XDATA_PLACEHOLDER attribute *.

NULL

X_BN_EQSCALE

void *

input

Pointer to batchNorm equivalent scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER attribute *.

NULL

X_BN_EQBIAS

void *

input

Pointer to batchNorm equivalent bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER attribute *.

NULL

X_WDATA

void *

input

Pointer to w (filter) tensor on device, need to agree with previously set CUDNN_PARAM_WDATA_PLACEHOLDER attribute *.

NULL

X_YDATA

void *

input

Pointer to y (output) tensor on device, need to agree with previously set CUDNN_PARAM_YDATA_PLACEHOLDER attribute *.

NULL

X_YSUM

void *

input

Pointer to sum of y tensor on device, need to agree with previously set CUDNN_PARAM_YSUM_PLACEHOLDER attribute *.

NULL

X_YSQSUM

void *

input

Pointer to sum of y square tensor on device, need to agree with previously set CUDNN_PARAM_YSQSUM_PLACEHOLDER attribute *.

NULL

X_WORKSPACE

void *

input

Pointer to user allocated workspace on device. Can be NULL if the workspace size requested is 0.

NULL

X_SIZE_T_WORKSPACE_SIZE_IN_BYTES

size_t *

input

Pointer to a size_t value in host memory describing the user allocated workspace size in bytes. The amount needs to be equal or larger than the amount requested in cudnnMakeFusedOpsPlan().

0

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

For the attribute CUDNN_FUSED_SCALE_BIAS_ACTIVATION_WGRAD in cudnnFusedOpsVariantParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

I/O Type

Description

Default Value

X_XDATA

void *

input

Pointer to x (input) tensor on device, need to agree with previously set CUDNN_PARAM_XDATA_PLACEHOLDER attribute *.

NULL

X_BN_EQSCALE

void *

input

Pointer to batchNorm equivalent scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER attribute *.

NULL

X_BN_EQBIAS

void *

input

Pointer to batchNorm equivalent bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER attribute *.

NULL

X_DWDATA

void *

output

Pointer to dw (filter gradient output) tensor on device, need to agree with previously set CUDNN_PARAM_DWDATA_PLACEHOLDER attribute *.

NULL

X_DYDATA

void *

input

Pointer to dy (gradient input) tensor on device, need to agree with previously set CUDNN_PARAM_DYDATA_PLACEHOLDER attribute *.

NULL

X_WORKSPACE

void *

input

Pointer to user allocated workspace on device. Can be NULL if the workspace size requested is 0.

NULL

X_SIZE_T_WORKSPACE_SIZE_IN_BYTES

size_t *

input

Pointer to a size_t value in host memory describing the user allocated workspace size in bytes. The amount needs to be equal or larger than the amount requested in cudnnMakeFusedOpsPlan().

0

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

For the attribute CUDNN_FUSED_BN_FINALIZE_STATISTICS_TRAINING in cudnnFusedOpsVariantParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

I/O Type

Description

Default Value

X_YSUM

void *

input

Pointer to sum of y tensor on device, need to agree with previously set CUDNN_PARAM_YSUM_PLACEHOLDER attribute *.

NULL

X_YSQSUM

void *

input

Pointer to sum of y square tensor on device, need to agree with previously set CUDNN_PARAM_YSQSUM_PLACEHOLDER attribute *.

NULL

X_BN_SCALE

void *

input

Pointer to the batchNorm trained scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_SCALE_PLACEHOLDER attribute *.

NULL

X_BN_BIAS

void *

input

Pointer to the batchNorm trained bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_BIAS_PLACEHOLDER attribute *.

NULL

X_BN_SAVED_MEAN

void *

output

Pointer to the batchNorm saved mean tensor on device, need to agree with previously set CUDNN_PARAM_BN_SAVED_MEAN_PLACEHOLDER attribute *.

NULL

X_BN_SAVED_INVSTD

void *

output

Pointer to the batchNorm saved inverse standard deviation tensor on device, need to agree with previously set CUDNN_PARAM_BN_SAVED_INVSTD_PLACEHOLDER attribute *.

NULL

X_BN_RUNNING_MEAN

void *

input/output

Pointer to the batchNorm running mean tensor on device, need to agree with previously set CUDNN_PARAM_BN_RUNNING_MEAN_PLACEHOLDER attribute *.

NULL

X_BN_RUNNING_VAR

void *

input/output

Pointer to the batchNorm running variance tensor on device, need to agree with previously set CUDNN_PARAM_BN_RUNNING_VAR_PLACEHOLDER attribute *.

NULL

X_BN_EQSCALE

void *

output

Pointer to batchNorm equivalent scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER attribute *.

NULL

X_BN_EQBIAS

void *

output

Pointer to batchNorm equivalent bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER attribute *.

NULL

X_INT64_T_BN_ACCUMULATION_COUNT

int64_t *

input

Pointer to a scalar value in int64_t on host memory. This value should describe the number of tensor elements accumulated in the sum of y and sum of y square tensors. For example, in the single GPU use case, if the mode is CUDNN_BATCHNORM_SPATIAL or CUDNN_BATCHNORM_SPATIAL_PERSISTENT, the value should be equal to N*H*W of the tensor from which the statistics are calculated. In multi-GPU use case, if all-reduce has been performed on the sum of y and sum of y square tensors, this value should be the sum of the single GPU accumulation count on each of the GPUs.

0

X_DOUBLE_BN_EXP_AVG_FACTOR

double *

input

Pointer to a scalar value in double on host memory. Factor used in the moving average computation. Refer to exponentialAverageFactor in cudnnBatchNormalization* APIs.

0.0

X_DOUBLE_BN_EPSILON

double *

input

Pointer to a scalar value in double on host memory. A conditioning constant used in the batch normalization formula. Its value should be equal to or greater than the value defined for CUDNN_BN_MIN_EPSILON in cudnn.h. Refer to the epsilon parameter in the cudnnBatchNormalization* APIs.

0.0

X_WORKSPACE

void *

input

Pointer to user allocated workspace on device. Can be NULL if the workspace size requested is 0.

NULL

X_SIZE_T_WORKSPACE_SIZE_IN_BYTES

size_t *

input

Pointer to a size_t value in host memory describing the user allocated workspace size in bytes. The amount needs to be equal or larger than the amount requested in cudnnMakeFusedOpsPlan().

0

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

For the attribute CUDNN_FUSED_BN_FINALIZE_STATISTICS_INFERENCE in cudnnFusedOpsVariantParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

I/O Type

Description

Default Value

X_BN_SCALE

void *

input

Pointer to the batchNorm trained scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_SCALE_PLACEHOLDER attribute *.

NULL

X_BN_BIAS

void *

input

Pointer to the batchNorm trained bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_BIAS_PLACEHOLDER attribute *.

NULL

X_BN_RUNNING_MEAN

void *

input/output

Pointer to the batchNorm running mean tensor on device, need to agree with previously set CUDNN_PARAM_BN_RUNNING_MEAN_PLACEHOLDER attribute *.

NULL

X_BN_RUNNING_VAR

void *

input/output

Pointer to the batchNorm running variance tensor on device, need to agree with previously set CUDNN_PARAM_BN_RUNNING_VAR_PLACEHOLDER attribute *.

NULL

X_BN_EQSCALE

void *

output

Pointer to batchNorm equivalent scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER attribute *.

NULL

X_BN_EQBIAS

void *

output

Pointer to batchNorm equivalent bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQBIAS_PLACEHOLDER attribute *.

NULL

X_DOUBLE_BN_EPSILON

double *

input

Pointer to a scalar value in double on host memory. A conditioning constant used in the batch normalization formula. Its value should be equal to or greater than the value defined for CUDNN_BN_MIN_EPSILON in cudnn.h. Refer to the epsilon parameter in the cudnnBatchNormalization* APIs.

0.0

X_WORKSPACE

void *

input

Pointer to user allocated workspace on device. Can be NULL if the workspace size requested is 0.

NULL

X_SIZE_T_WORKSPACE_SIZE_IN_BYTES

size_t *

input

Pointer to a size_t value in host memory describing the user allocated workspace size in bytes. The amount needs to be equal or larger than the amount requested in cudnnMakeFusedOpsPlan().

0

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

For the attribute CUDNN_FUSED_CONV_SCALE_BIAS_ADD_ACTIVATION in cudnnFusedOpsVariantParamLabel_t

Attribute Key

Expected Descriptor Type Passed in, in the Setter

I/O Type

Description

Default Value

X_XDATA

void *

input

Pointer to x (image) tensor on device, need to agree with previously set CUDNN_PARAM_XDATA_PLACEHOLDER attribute *.

NULL

X_WDATA

void *

input

Pointer to w (filter) tensor on device, need to agree with previously set CUDNN_PARAM_WDATA_PLACEHOLDER attribute *.

NULL

X_BN_EQSCALE

void *

input

Pointer to alpha1 or batchNorm equivalent scale tensor on device, need to agree with previously set CUDNN_PARAM_BN_EQSCALE_PLACEHOLDER attribute *.

NULL

X_ZDATA

void *

input

Pointer to z tensor on device, need to agree with previously set CUDNN_PARAM_ZDATA_PLACEHOLDER attribute *.

NULL

X_BN_Z_EQBIAS

void *

input

Pointer to batchNorm equivalent bias tensor on device, need to agree with previously set CUDNN_PARAM_BN_Z_EQBIAS_PLACEHOLDER attribute *.

NULL

X_YDATA

void *

output

Pointer to y (output) tensor on device, need to agree with previously set CUDNN_PARAM_YDATA_PLACEHOLDER attribute *.

NULL

X_WORKSPACE

void *

input

Pointer to user allocated workspace on device. Can be NULL if the workspace size requested is 0.

NULL

X_SIZE_T_WORKSPACE_SIZE_IN_BYTES

size_t *

input

Pointer to a size_t value in host memory describing the user allocated workspace size in bytes. The amount needs to be equal or larger than the amount requested in cudnnMakeFusedOpsPlan().

0

Note

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_NULL, then the device pointer in the VariantParamPack needs to be NULL as well.

  • If the corresponding pointer placeholder in ConstParamPack is set to CUDNN_PTR_ELEM_ALIGNED or CUDNN_PTR_16B_ALIGNED, then the device pointer in the VariantParamPack may not be NULL and need to be at least element-aligned or 16 bytes-aligned, respectively.

API Functions

These are the API functions in the cudnn_cnn library.

cudnnCnnVersionCheck()

Cross-library version checker. Each sublibrary has a version checker that checks whether its own version matches that of its dependencies.

Returns

CUDNN_STATUS_SUCCESS

The version check passed.

CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH

The versions are inconsistent.
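
A minimal usage sketch:

#include <stdio.h>
#include <cudnn.h>

int main(void)
{
    /* Verify that the cudnn_cnn sublibrary matches its dependencies. */
    cudnnStatus_t st = cudnnCnnVersionCheck();
    if (st != CUDNN_STATUS_SUCCESS) {
        fprintf(stderr, "cudnn_cnn version check failed: %s\n", cudnnGetErrorString(st));
        return 1;
    }
    printf("cudnn_cnn version check passed\n");
    return 0;
}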

cudnnConvolutionBackwardBias()

This function has been deprecated in cuDNN 9.0.

This function computes the convolution function gradient with respect to the bias, which is the sum of every element belonging to the same feature map across all of the images of the input tensor. Therefore, the number of elements produced is equal to the number of feature maps of the input tensor.

cudnnStatus_t cudnnConvolutionBackwardBias(
    cudnnHandle_t                    handle,
    const void                      *alpha,
    const cudnnTensorDescriptor_t    dyDesc,
    const void                      *dy,
    const void                      *beta,
    const cudnnTensorDescriptor_t    dbDesc,
    void                            *db)

Parameters

handle

Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

alpha, beta

Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:

dstValue = alpha[0]*result + beta[0]*priorDstValue

For more information, refer to Scaling Parameters.

dyDesc

Input. Handle to the previously initialized input tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

dy

Input. Data pointer to GPU memory associated with the tensor descriptor dyDesc.

dbDesc

Input. Handle to the previously initialized output tensor descriptor.

db

Output. Data pointer to GPU memory associated with the output tensor descriptor dbDesc.

Returns

CUDNN_STATUS_SUCCESS

The operation was launched successfully.

CUDNN_STATUS_NOT_SUPPORTED

The function does not support the provided configuration.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions is met:

  • One of the parameters n, height, or width of the output tensor is not 1.

  • The numbers of feature maps of the input tensor and output tensor differ.

  • The dataType of the two tensor descriptors is different.
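
A minimal usage sketch of this deprecated function; the handle, descriptors, and device pointers are assumed to be set up elsewhere, and the helper name backward_bias is arbitrary.

#include <cudnn.h>

/* Sketch: compute the bias gradient db from dy. dbDesc is typically a
   1xCx1x1 tensor matching the number of feature maps of dyDesc. */
static cudnnStatus_t backward_bias(cudnnHandle_t handle,
                                   cudnnTensorDescriptor_t dyDesc, const void *d_dy,
                                   cudnnTensorDescriptor_t dbDesc, void *d_db)
{
    const float alpha = 1.0f;  /* scale the computed result */
    const float beta  = 0.0f;  /* overwrite (do not accumulate into) db */
    return cudnnConvolutionBackwardBias(handle, &alpha, dyDesc, d_dy,
                                        &beta, dbDesc, d_db);
}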

cudnnConvolutionBackwardData()

This function has been deprecated in cuDNN 9.0.

This function computes the convolution data gradient of the tensor dy, where y is the output of the forward convolution in cudnnConvolutionForward(). It uses the specified algo, and returns the results in the output tensor dx. Scaling factors alpha and beta can be used to scale the computed result or accumulate with the current dx.

cudnnStatus_t cudnnConvolutionBackwardData(
    cudnnHandle_t                       handle,
    const void                         *alpha,
    const cudnnFilterDescriptor_t       wDesc,
    const void                         *w,
    const cudnnTensorDescriptor_t       dyDesc,
    const void                         *dy,
    const cudnnConvolutionDescriptor_t  convDesc,
    cudnnConvolutionBwdDataAlgo_t       algo,
    void                               *workSpace,
    size_t                              workSpaceSizeInBytes,
    const void                         *beta,
    const cudnnTensorDescriptor_t       dxDesc,
    void                               *dx)

Parameters

handle

Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

alpha, beta

Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:

dstValue = alpha[0]*result + beta[0]*priorDstValue

For more information, refer to Scaling Parameters.

wDesc

Input. Handle to a previously initialized filter descriptor. For more information, refer to cudnnFilterDescriptor_t.

w

Input. Data pointer to GPU memory associated with the filter descriptor wDesc.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

dy

Input. Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc.

convDesc

Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.

algo

Input. Enumerant that specifies which backward data convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionBwdDataAlgo_t.

workSpace

Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NIL.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

dxDesc

Input. Handle to the previously initialized output tensor descriptor.

dx

Input/Output. Data pointer to GPU memory associated with the output tensor descriptor dxDesc that carries the result.
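
As an illustration only, the following minimal sketch (the helper name backwardDataExample is hypothetical, not part of the cuDNN API) queries the workspace requirement of a fixed algorithm with cudnnGetConvolutionBackwardDataWorkspaceSize(), allocates it with cudaMalloc(), and launches the data-gradient computation. It assumes the handle, the descriptors, and the device buffers have already been created and configured, and that the data type configuration uses float scaling factors (DOUBLE_CONFIG requires double alpha and beta instead).

#include <cudnn.h>
#include <cuda_runtime.h>

/* Hypothetical helper: dx = conv_data_grad(w, dy), overwriting dx (beta = 0).
   All descriptors and device buffers are assumed to be set up by the caller. */
cudnnStatus_t backwardDataExample(cudnnHandle_t handle,
                                  const cudnnFilterDescriptor_t wDesc, const void *w,
                                  const cudnnTensorDescriptor_t dyDesc, const void *dy,
                                  const cudnnConvolutionDescriptor_t convDesc,
                                  const cudnnTensorDescriptor_t dxDesc, void *dx)
{
    const float alpha = 1.0f, beta = 0.0f;   /* host scaling factors */
    cudnnConvolutionBwdDataAlgo_t algo = CUDNN_CONVOLUTION_BWD_DATA_ALGO_1;

    size_t wsSize = 0;                       /* workspace needed by this algo */
    cudnnStatus_t st = cudnnGetConvolutionBackwardDataWorkspaceSize(
        handle, wDesc, dyDesc, convDesc, dxDesc, algo, &wsSize);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    void *ws = NULL;
    if (wsSize > 0 && cudaMalloc(&ws, wsSize) != cudaSuccess)
        return CUDNN_STATUS_ALLOC_FAILED;

    st = cudnnConvolutionBackwardData(handle, &alpha, wDesc, w, dyDesc, dy,
                                      convDesc, algo, ws, wsSize,
                                      &beta, dxDesc, dx);
    cudaFree(ws);                            /* cudaFree(NULL) is a no-op */
    return st;
}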

Supported Configurations

This function supports the following combinations of data types for wDesc, dyDesc, convDesc, and dxDesc.

Supported Configurations for cudnnConvolutionBackwardData()

Data Type Configurations

wDesc, dyDesc, and dxDesc Data Type

convDesc Data Type

TRUE_HALF_CONFIG (only supported on architectures with true FP16 support, meaning, compute capability 5.3 and later)

CUDNN_DATA_HALF

CUDNN_DATA_HALF

PSEUDO_HALF_CONFIG

CUDNN_DATA_HALF

CUDNN_DATA_FLOAT

PSEUDO_BFLOAT16_CONFIG

CUDNN_DATA_BFLOAT16

CUDNN_DATA_FLOAT

FLOAT_CONFIG

CUDNN_DATA_FLOAT

CUDNN_DATA_FLOAT

DOUBLE_CONFIG

CUDNN_DATA_DOUBLE

CUDNN_DATA_DOUBLE

Supported Algorithms

Specifying a separate algorithm can cause changes in performance, support and computation determinism. Refer to the following list of algorithm options, and their respective supported parameters and deterministic behavior.

The table below shows the list of the supported 2D and 3D convolutions. The 2D convolutions are described first, followed by the 3D convolutions.

For brevity, the short-form versions followed by > are used in the table below:

  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 > _ALGO_0

  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_1 > _ALGO_1

  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT > _FFT

  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING > _FFT_TILING

  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD > _WINOGRAD

  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD_NONFUSED > _WINOGRAD_NONFUSED

  • CUDNN_TENSOR_NCHW > _NCHW

  • CUDNN_TENSOR_NHWC > _NHWC

  • CUDNN_TENSOR_NCHW_VECT_C > _NCHW_VECT_C

Supported Algorithms for cudnnConvolutionBackwardData() 2D convolutions. Filter descriptor wDesc: _NHWC (refer to cudnnTensorFormat_t).

Algo Name

Deterministic

Tensor Formats Supported for dyDesc

Tensor Formats Supported for dxDesc

Data Type Configurations Supported

Important

_ALGO_0 _ALGO_1

NHWC HWC-packed

NHWC HWC-packed

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Supported Algorithms for cudnnConvolutionBackwardData() 2D convolutions. Filter descriptor wDesc: _NCHW.

Algo Name

Deterministic

Tensor Formats Supported for dyDesc

Tensor Formats Supported for dxDesc

Data Type Configurations Supported

Important

_ALGO_0

No

NCHW CHW-packed

All except _NCHW_VECT_C

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_ALGO_1

Yes

NCHW CHW-packed

All except _NCHW_VECT_C

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_FFT

Yes

NCHW CHW-packed

NCHW HW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; dxDesc feature map height + 2 * convDesc zero-padding height must equal 256 or less; dxDesc feature map width + 2 * convDesc zero-padding width must equal 256 or less; convDesc vertical and horizontal filter stride must equal 1; wDesc filter height must be greater than convDesc zero-padding height; wDesc filter width must be greater than convDesc zero-padding width

_FFT_TILING

Yes

NCHW CHW-packed

NCHW HW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG; DOUBLE_CONFIG is also supported when the task can be handled by 1D FFT, meaning, one of the filter dimensions (width or height) is 1.

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; when neither of the wDesc filter dimensions is 1, the filter width and height must not be larger than 32; when either of the wDesc filter dimensions is 1, the largest filter dimension should not exceed 256; convDesc vertical and horizontal filter stride must equal 1 when either the filter width or filter height is 1, otherwise, the stride can be 1 or 2; wDesc filter height must be greater than convDesc zero-padding height; wDesc filter width must be greater than convDesc zero-padding width

_WINOGRAD

Yes

NCHW CHW-packed

All except _NCHW_VECT_C

PSEUDO_HALF_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; convDesc vertical and horizontal filter stride must equal 1; wDesc filter height must be 3; wDesc filter width must be 3

_WINOGRAD_NONFUSED

Yes

NCHW CHW-packed

All except _NCHW_VECT_C

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; convDesc vertical and horizontal filter stride must equal 1; wDesc filter (height, width) must be (3,3) or (5,5); if wDesc filter (height, width) is (5,5), then the data type config TRUE_HALF_CONFIG is not supported

Supported Algorithms for cudnnConvolutionBackwardData() 3D convolutions. Filter descriptor wDesc: _NCHW.

Algo Name (3D Convolutions)

Deterministic

Tensor Formats Supported for dyDesc

Tensor Formats Supported for dxDesc

Data Type Configurations Supported

Important

_ALGO_0

Yes

NCDHW CDHW-packed

All except _NCDHW_VECT_C

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_ALGO_1

Yes

NCDHW CDHW-packed

NCDHW CDHW-packed

TRUE_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG PSEUDO_HALF_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions convDesc Group Count Support: Greater than 0

_FFT_TILING

Yes

NCDHW CDHW-packed

NCDHW DHW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; wDesc filter height must equal 16 or less; wDesc filter width must equal 16 or less; wDesc filter depth must equal 16 or less; convDesc must have all filter strides equal to 1; wDesc filter height must be greater than convDesc zero-padding height; wDesc filter width must be greater than convDesc zero-padding width; wDesc filter depth must be greater than convDesc zero-padding depth

Supported Algorithms for cudnnConvolutionBackwardData() 3D convolutions. Filter descriptor wDesc: _NHWC.

Algo Name (3D Convolutions)

Deterministic

Tensor Formats Supported for dyDesc

Tensor Formats Supported for dxDesc

Data Type Configurations Supported

Important

_ALGO_1

Yes

NDHWC DHWC-packed

NDHWC DHWC-packed

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

Returns

CUDNN_STATUS_SUCCESS

The operation was launched successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • At least one of the following is NULL: handle, dyDesc, wDesc, convDesc, dxDesc, dy, w, dx, alpha, and beta

  • wDesc and dyDesc have a non-matching number of dimensions

  • wDesc and dxDesc have a non-matching number of dimensions

  • wDesc has fewer than three dimensions

  • wDesc, dxDesc, and dyDesc have a non-matching data type

  • wDesc and dxDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions)

  • dyDesc spatial sizes do not match with the expected size as determined by cudnnGetConvolutionNdForwardOutputDim()

CUDNN_STATUS_NOT_SUPPORTED

At least one of the following conditions are met:

  • dyDesc or dxDesc have a negative tensor striding

  • dyDesc, wDesc, or dxDesc has a number of dimensions that is not 4 or 5

  • The chosen algo does not support the parameters provided; refer to the above tables for an exhaustive list of parameters that support each algo.

  • dyDesc or wDesc indicate an output channel count that isn’t a multiple of group count (if group count has been set in convDesc)

CUDNN_STATUS_MAPPING_ERROR

An error occurs during the texture binding or texture object creation associated with the filter data or the input differential tensor data.

CUDNN_STATUS_EXECUTION_FAILED

The function failed to launch on the GPU.

cudnnConvolutionBackwardFilter()

This function has been deprecated in cuDNN 9.0.

This function computes the convolution weight (filter) gradient of the tensor dy, where y is the output of the forward convolution in cudnnConvolutionForward(). It uses the specified algo, and returns the results in the output tensor dw. Scaling factors alpha and beta can be used to scale the computed result or accumulate with the current dw.

cudnnStatus_t cudnnConvolutionBackwardFilter(
    cudnnHandle_t                       handle,
    const void                         *alpha,
    const cudnnTensorDescriptor_t       xDesc,
    const void                         *x,
    const cudnnTensorDescriptor_t       dyDesc,
    const void                         *dy,
    const cudnnConvolutionDescriptor_t  convDesc,
    cudnnConvolutionBwdFilterAlgo_t     algo,
    void                               *workSpace,
    size_t                              workSpaceSizeInBytes,
    const void                         *beta,
    const cudnnFilterDescriptor_t       dwDesc,
    void                               *dw)

Parameters

handle

Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

alpha, beta

Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:

dstValue = alpha[0]*result + beta[0]*priorDstValue

For more information, refer to Scaling Parameters.

xDesc

Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

dy

Input. Data pointer to GPU memory associated with the backpropagation gradient tensor descriptor dyDesc.

convDesc

Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.

algo

Input. Enumerant that specifies which convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionBwdFilterAlgo_t.

workSpace

Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NIL.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

dwDesc

Input. Handle to a previously initialized filter gradient descriptor. For more information, refer to cudnnFilterDescriptor_t.

dw

Input/Output. Data pointer to GPU memory associated with the filter gradient descriptor dwDesc that carries the result.
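
As with the data-gradient case, a minimal sketch follows (the helper name backwardFilterExample is hypothetical, not part of the cuDNN API). It accumulates the weight gradient into dw by setting beta to 1, a common choice when gradients from several sources are summed; descriptors and device buffers are assumed to be configured already, and float scaling factors are assumed.

#include <cudnn.h>
#include <cuda_runtime.h>

/* Hypothetical helper: dw += conv_filter_grad(x, dy) (beta = 1 accumulates). */
cudnnStatus_t backwardFilterExample(cudnnHandle_t handle,
                                    const cudnnTensorDescriptor_t xDesc, const void *x,
                                    const cudnnTensorDescriptor_t dyDesc, const void *dy,
                                    const cudnnConvolutionDescriptor_t convDesc,
                                    const cudnnFilterDescriptor_t dwDesc, void *dw)
{
    const float alpha = 1.0f, beta = 1.0f;   /* beta = 1 keeps the existing dw */
    cudnnConvolutionBwdFilterAlgo_t algo = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1;

    size_t wsSize = 0;
    cudnnStatus_t st = cudnnGetConvolutionBackwardFilterWorkspaceSize(
        handle, xDesc, dyDesc, convDesc, dwDesc, algo, &wsSize);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    void *ws = NULL;
    if (wsSize > 0 && cudaMalloc(&ws, wsSize) != cudaSuccess)
        return CUDNN_STATUS_ALLOC_FAILED;

    st = cudnnConvolutionBackwardFilter(handle, &alpha, xDesc, x, dyDesc, dy,
                                        convDesc, algo, ws, wsSize,
                                        &beta, dwDesc, dw);
    cudaFree(ws);
    return st;
}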

Supported Configurations

This function supports the following combinations of data types for xDesc, dyDesc, convDesc, and dwDesc.

Supported Configurations for cudnnConvolutionBackwardFilter()

Data Type Configurations

xDesc, dyDesc, and dwDesc Data Type

convDesc Data Type

TRUE_HALF_CONFIG (only supported on architectures with true FP16 support, meaning, compute capability 5.3 and later)

CUDNN_DATA_HALF

CUDNN_DATA_HALF

PSEUDO_HALF_CONFIG

CUDNN_DATA_HALF

CUDNN_DATA_FLOAT

PSEUDO_BFLOAT16_CONFIG

CUDNN_DATA_BFLOAT16

CUDNN_DATA_FLOAT

FLOAT_CONFIG

CUDNN_DATA_FLOAT

CUDNN_DATA_FLOAT

DOUBLE_CONFIG

CUDNN_DATA_DOUBLE

CUDNN_DATA_DOUBLE

Supported Algorithms

Specifying a separate algorithm can cause changes in performance, support, and computation determinism. Refer to the following table for an exhaustive list of algorithm options and their respective supported parameters and deterministic behavior.

The table below shows the list of the supported 2D and 3D convolutions. The 2D convolutions are described first, followed by the 3D convolutions.

For brevity, the short-form versions followed by > are used in the table below:

  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 > _ALGO_0

  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 > _ALGO_1

  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3 > _ALGO_3

  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT > _FFT

  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING > _FFT_TILING

  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED > _WINOGRAD_NONFUSED

  • CUDNN_TENSOR_NCHW > _NCHW

  • CUDNN_TENSOR_NHWC > _NHWC

  • CUDNN_TENSOR_NCHW_VECT_C > _NCHW_VECT_C

Supported Algorithms for cudnnConvolutionBackwardFilter() 2D Convolutions. Filter descriptor dwDesc: _NHWC (refer to cudnnTensorFormat_t).

Algo Name

Deterministic

Tensor Formats Supported for xDesc

Tensor Formats Supported for dyDesc

Data Type Configurations Supported

Important

_ALGO_0 and _ALGO_1

All except _NCHW_VECT_C

NHWC HWC-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Supported Algorithms for cudnnConvolutionBackwardFilter() 2D Convolutions. Filter descriptor dwDesc: _NCHW.

Algo Name

Deterministic

Tensor Formats Supported for xDesc

Tensor Formats Supported for dyDesc

Data Type Configurations Supported

Important

_ALGO_0

No

All except _NCHW_VECT_C

NCHW CHW-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_ALGO_1

Yes

All except _NCHW_VECT_C

NCHW CHW-packed

PSEUDO_HALF_CONFIG TRUE_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_FFT

Yes

NCHW CHW-packed

NCHW CHW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; xDesc feature map height + 2 * convDesc zero-padding height must equal 256 or less; xDesc feature map width + 2 * convDesc zero-padding width must equal 256 or less; convDesc vertical and horizontal filter stride must equal 1; dwDesc filter height must be greater than convDesc zero-padding height; dwDesc filter width must be greater than convDesc zero-padding width

_ALGO_3

No

All except _NCHW_VECT_C

NCHW CHW-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions convDesc Group Count Support: Greater than 0

_WINOGRAD_NONFUSED

Yes

All except _NCHW_VECT_C

NCHW CHW-packed

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; convDesc vertical and horizontal filter stride must equal 1; dwDesc filter (height, width) must be (3,3) or (5,5); if dwDesc filter (height, width) is (5,5), then the data type config TRUE_HALF_CONFIG is not supported.

_FFT_TILING

Yes

NCHW CHW-packed

NCHW CHW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0; dyDesc width or height must equal 1 (the same dimension as in xDesc), and the other dimension must be less than or equal to 256, meaning, the largest 1D tile size currently supported; convDesc vertical and horizontal filter stride must equal 1; dwDesc filter height must be greater than convDesc zero-padding height; dwDesc filter width must be greater than convDesc zero-padding width

Supported Algorithms for cudnnConvolutionBackwardFilter() 3D Convolutions. Filter descriptor dwDesc: _NCHW.

Algo Name (3D Convolutions)

Deterministic

Tensor Formats Supported for xDesc

Tensor Formats Supported for dyDesc

Data Type Configurations Supported

Important

_ALGO_0

No

All except _NCDHW_VECT_C

NCDHW CDHW-packed, NCDHW W-packed, NDHWC

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_ALGO_1

No

All except _NCDHW_VECT_C

NCDHW CDHW-packed, NCDHW W-packed, NDHWC

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

_ALGO_3

No

NCDHW fully-packed

NCDHW fully-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

Supported Algorithms for cudnnConvolutionBackwardFilter() 3D Convolutions. Filter descriptor dwDesc: _NHWC.

Algo Name (3D Convolutions)

Deterministic

Tensor Formats Supported for xDesc

Tensor Formats Supported for dyDesc

Data Type Configurations Supported

Important

_ALGO_1

Yes

NDHWC HWC-packed

NDHWC HWC-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG TRUE_HALF_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0

Returns

CUDNN_STATUS_SUCCESS

The operation was launched successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • At least one of the following is NULL: handle, xDesc, dyDesc, convDesc, dwDesc, xData, dyData, dwData, alpha, or beta

  • xDesc and dyDesc have a non-matching number of dimensions

  • xDesc and dwDesc have a non-matching number of dimensions

  • xDesc has fewer than three dimensions

  • xDesc, dyDesc, and dwDesc have a non-matching data type

  • xDesc and dwDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions)

  • dyDesc or dwDesc indicate an output channel count that isn’t a multiple of group count (if group count has been set in convDesc)

CUDNN_STATUS_NOT_SUPPORTED

At least one of the following conditions are met:

  • xDesc or dyDesc have negative tensor striding

  • xDesc, dyDesc, or dwDesc has a number of dimensions that is not 4 or 5

  • The chosen algo does not support the parameters provided; see above for an exhaustive list of parameter support for each algo

CUDNN_STATUS_MAPPING_ERROR

An error occurs during the texture object creation associated with the filter data.

CUDNN_STATUS_EXECUTION_FAILED

The function failed to launch on the GPU.

cudnnConvolutionBiasActivationForward()

This function has been deprecated in cuDNN 9.0.

This function applies a bias and then an activation to the convolutions or cross-correlations of cudnnConvolutionForward(), returning results in y. The full computation follows the equation y = act (alpha1 * conv(x) + alpha2 * z + bias).

cudnnStatus_t cudnnConvolutionBiasActivationForward(
    cudnnHandle_t                       handle,
    const void                         *alpha1,
    const cudnnTensorDescriptor_t       xDesc,
    const void                         *x,
    const cudnnFilterDescriptor_t       wDesc,
    const void                         *w,
    const cudnnConvolutionDescriptor_t  convDesc,
    cudnnConvolutionFwdAlgo_t           algo,
    void                               *workSpace,
    size_t                              workSpaceSizeInBytes,
    const void                         *alpha2,
    const cudnnTensorDescriptor_t       zDesc,
    const void                         *z,
    const cudnnTensorDescriptor_t       biasDesc,
    const void                         *bias,
    const cudnnActivationDescriptor_t   activationDesc,
    const cudnnTensorDescriptor_t       yDesc,
    void                               *y)

The routine cudnnGetConvolution2dForwardOutputDim() or cudnnGetConvolutionNdForwardOutputDim() can be used to determine the proper dimensions of the output tensor descriptor yDesc with respect to xDesc, convDesc, and wDesc.

Only the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algo is enabled with CUDNN_ACTIVATION_IDENTITY. In other words, in the cudnnActivationDescriptor_t structure of the input activationDesc, if the mode of the cudnnActivationMode_t field is set to the enum value CUDNN_ACTIVATION_IDENTITY, then the input cudnnConvolutionFwdAlgo_t of this function cudnnConvolutionBiasActivationForward() must be set to the enum value CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM. For more information, refer to cudnnSetActivationDescriptor().

Device pointers z and y may point to the same buffer; however, x cannot point to the same buffer as z or y.

Parameters

handle

Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

alpha1, alpha2

Input. Pointers to scaling factors (in host memory) used to blend the computation result of convolution with z and bias as follows:

y = act (alpha1 * conv(x) + alpha2 * z + bias)

For more information, refer to Scaling Parameters.

xDesc

Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

wDesc

Input. Handle to a previously initialized filter descriptor. For more information, refer to cudnnFilterDescriptor_t.

w

Input. Data pointer to GPU memory associated with the filter descriptor wDesc.

convDesc

Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.

algo

Input. Enumerant that specifies which convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionFwdAlgo_t.

workSpace

Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NIL.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

zDesc

Input. Handle to a previously initialized tensor descriptor.

z

Input. Data pointer to GPU memory associated with the tensor descriptor zDesc.

biasDesc

Input. Handle to a previously initialized tensor descriptor.

bias

Input. Data pointer to GPU memory associated with the tensor descriptor biasDesc.

activationDesc

Input. Handle to a previously initialized activation descriptor. For more information, refer to cudnnActivationDescriptor_t.

yDesc

Input. Handle to a previously initialized tensor descriptor.

y

Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution.
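
The sketch below is illustrative only (the helper name fusedConvBiasReluExample is hypothetical, not part of the cuDNN API). It builds a ReLU activation descriptor, sets alpha2 to 0 so the z term contributes nothing, and passes y as z, which the description above explicitly allows; the workspace is assumed to have been sized for CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM by the caller, and float scaling factors are assumed.

#include <cudnn.h>

/* Hypothetical helper: y = relu(conv(x) + bias), with the z term disabled. */
cudnnStatus_t fusedConvBiasReluExample(cudnnHandle_t handle,
                                       const cudnnTensorDescriptor_t xDesc, const void *x,
                                       const cudnnFilterDescriptor_t wDesc, const void *w,
                                       const cudnnConvolutionDescriptor_t convDesc,
                                       void *workSpace, size_t workSpaceSizeInBytes,
                                       const cudnnTensorDescriptor_t biasDesc, const void *bias,
                                       const cudnnTensorDescriptor_t yDesc, void *y)
{
    const float alpha1 = 1.0f, alpha2 = 0.0f;   /* alpha2 = 0 ignores the z input */
    cudnnActivationDescriptor_t actDesc;
    cudnnStatus_t st = cudnnCreateActivationDescriptor(&actDesc);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    st = cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_RELU,
                                      CUDNN_NOT_PROPAGATE_NAN, 0.0);
    if (st == CUDNN_STATUS_SUCCESS) {
        st = cudnnConvolutionBiasActivationForward(
            handle, &alpha1, xDesc, x, wDesc, w, convDesc,
            CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
            workSpace, workSpaceSizeInBytes,
            &alpha2, yDesc, y,          /* z shares y's descriptor and buffer */
            biasDesc, bias, actDesc, yDesc, y);
    }
    cudnnDestroyActivationDescriptor(actDesc);
    return st;
}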

For the convolution step, this function supports the specific combinations of data types for xDesc, wDesc, convDesc, and yDesc as listed in the documentation of cudnnConvolutionForward(). The following table specifies the supported combinations of data types for x, y, z, bias, alpha1, and alpha2.

Supported Combinations of Data Types (X = CUDNN_DATA) for cudnnConvolutionBiasActivationForward()

x             w             convDesc      y and z       bias          alpha1 and alpha2
X_DOUBLE      X_DOUBLE      X_DOUBLE      X_DOUBLE      X_DOUBLE      X_DOUBLE
X_FLOAT       X_FLOAT       X_FLOAT       X_FLOAT       X_FLOAT       X_FLOAT
X_HALF        X_HALF        X_FLOAT       X_HALF        X_HALF        X_FLOAT
X_BFLOAT16    X_BFLOAT16    X_FLOAT       X_BFLOAT16    X_BFLOAT16    X_FLOAT
X_INT8        X_INT8        X_INT32       X_INT8        X_FLOAT       X_FLOAT
X_INT8        X_INT8        X_INT32       X_FLOAT       X_FLOAT       X_FLOAT
X_INT8x4      X_INT8x4      X_INT32       X_INT8x4      X_FLOAT       X_FLOAT
X_INT8x4      X_INT8x4      X_INT32       X_FLOAT       X_FLOAT       X_FLOAT
X_UINT8       X_INT8        X_INT32       X_INT8        X_FLOAT       X_FLOAT
X_UINT8       X_INT8        X_INT32       X_FLOAT       X_FLOAT       X_FLOAT
X_UINT8x4     X_INT8x4      X_INT32       X_INT8x4      X_FLOAT       X_FLOAT
X_UINT8x4     X_INT8x4      X_INT32       X_FLOAT       X_FLOAT       X_FLOAT
X_INT8x32     X_INT8x32     X_INT32       X_INT8x32     X_FLOAT       X_FLOAT

Returns

In addition to the error values listed by the documentation of cudnnConvolutionForward(), the possible error values returned by this function and their meanings are listed below.

CUDNN_STATUS_SUCCESS

The operation was launched successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • At least one of the following is NULL: handle, xDesc, wDesc, convDesc, yDesc, zDesc, biasDesc, activationDesc, xData, wData, yData, zData, bias, alpha1, and alpha2.

  • The number of dimensions of xDesc, wDesc, yDesc, and zDesc is not equal to the array length of convDesc + 2.

CUDNN_STATUS_NOT_SUPPORTED

The function does not support the provided configuration. Some examples of non-supported configurations include:

  • The mode of activationDesc is not CUDNN_ACTIVATION_RELU or CUDNN_ACTIVATION_IDENTITY.

  • The reluNanOpt of activationDesc is not CUDNN_NOT_PROPAGATE_NAN.

  • The second stride of biasDesc is not equal to 1.

  • The first dimension of biasDesc is not equal to 1.

  • The second dimension of biasDesc and the first dimension of wDesc are not equal.

  • The data type of biasDesc does not correspond to the data type of yDesc as listed in the above data type tables.

  • zDesc and yDesc do not match.

CUDNN_STATUS_EXECUTION_FAILED

The function failed to launch on the GPU.

cudnnConvolutionForward()

This function has been deprecated in cuDNN 9.0.

This function executes convolutions or cross-correlations over x using filters specified with w, returning results in y. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.

cudnnStatus_t cudnnConvolutionForward(
    cudnnHandle_t                       handle,
    const void                         *alpha,
    const cudnnTensorDescriptor_t       xDesc,
    const void                         *x,
    const cudnnFilterDescriptor_t       wDesc,
    const void                         *w,
    const cudnnConvolutionDescriptor_t  convDesc,
    cudnnConvolutionFwdAlgo_t           algo,
    void                               *workSpace,
    size_t                              workSpaceSizeInBytes,
    const void                         *beta,
    const cudnnTensorDescriptor_t       yDesc,
    void                               *y)

The routine cudnnGetConvolution2dForwardOutputDim() or cudnnGetConvolutionNdForwardOutputDim() can be used to determine the proper dimensions of the output tensor descriptor yDesc with respect to xDesc, convDesc, and wDesc.
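
For illustration, a minimal sketch (the helper name setupOutputDescExample is hypothetical, not part of the cuDNN API) that derives the 2D forward output dimensions and uses them to configure a 4D NCHW float output descriptor; xDesc, wDesc, and convDesc are assumed to be configured, and yDesc is assumed to have been created with cudnnCreateTensorDescriptor().

#include <cudnn.h>

/* Hypothetical helper: size yDesc from xDesc, wDesc, and convDesc (NCHW float). */
cudnnStatus_t setupOutputDescExample(const cudnnTensorDescriptor_t xDesc,
                                     const cudnnFilterDescriptor_t wDesc,
                                     const cudnnConvolutionDescriptor_t convDesc,
                                     cudnnTensorDescriptor_t yDesc)
{
    int n, c, h, w;
    cudnnStatus_t st = cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                             &n, &c, &h, &w);
    if (st != CUDNN_STATUS_SUCCESS) return st;
    return cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                                      n, c, h, w);
}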

Parameters

handle

Input. Handle to a previously created cuDNN context. For more information, refer to cudnnHandle_t.

alpha, beta

Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows:

dstValue = alpha[0]*result + beta[0]*priorDstValue

For more information, refer to Scaling Parameters.

xDesc

Input. Handle to a previously initialized tensor descriptor. For more information, refer to cudnnTensorDescriptor_t.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

wDesc

Input. Handle to a previously initialized filter descriptor. For more information, refer to cudnnFilterDescriptor_t.

w

Input. Data pointer to GPU memory associated with the filter descriptor wDesc.

convDesc

Input. Previously initialized convolution descriptor. For more information, refer to cudnnConvolutionDescriptor_t.

algo

Input. Enumerant that specifies which convolution algorithm should be used to compute the results. For more information, refer to cudnnConvolutionFwdAlgo_t.

workSpace

Input. Data pointer to GPU memory to a workspace needed to be able to execute the specified algorithm. If no workspace is needed for a particular algorithm, that pointer can be NIL.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

yDesc

Input. Handle to a previously initialized tensor descriptor.

y

Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution.
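
As an illustration (the helper name convForwardExample is hypothetical, not part of the cuDNN API), the sketch below sizes and allocates the workspace for a caller-chosen algorithm with cudnnGetConvolutionForwardWorkspaceSize() and then launches the forward convolution with alpha = 1 and beta = 0; descriptors and device buffers are assumed to be configured, and float scaling factors are assumed (DOUBLE_CONFIG requires double scalars).

#include <cudnn.h>
#include <cuda_runtime.h>

/* Hypothetical helper: y = conv(x, w) using the caller-selected algo. */
cudnnStatus_t convForwardExample(cudnnHandle_t handle,
                                 const cudnnTensorDescriptor_t xDesc, const void *x,
                                 const cudnnFilterDescriptor_t wDesc, const void *w,
                                 const cudnnConvolutionDescriptor_t convDesc,
                                 cudnnConvolutionFwdAlgo_t algo,
                                 const cudnnTensorDescriptor_t yDesc, void *y)
{
    const float alpha = 1.0f, beta = 0.0f;

    size_t wsSize = 0;
    cudnnStatus_t st = cudnnGetConvolutionForwardWorkspaceSize(
        handle, xDesc, wDesc, convDesc, yDesc, algo, &wsSize);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    void *ws = NULL;
    if (wsSize > 0 && cudaMalloc(&ws, wsSize) != cudaSuccess)
        return CUDNN_STATUS_ALLOC_FAILED;

    st = cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                                 algo, ws, wsSize, &beta, yDesc, y);
    cudaFree(ws);
    return st;
}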

Supported Configurations

This function supports the following combinations of data types for xDesc, wDesc, convDesc, and yDesc.

Supported Configurations for cudnnConvolutionForward()

Data Type Configurations

xDesc and wDesc Data Type

convDesc Data Type

yDesc Data Type

TRUE_HALF_CONFIG (only supported on architectures with true FP16 support, meaning, compute capability 5.3 and later)

CUDNN_DATA_HALF

CUDNN_DATA_HALF

CUDNN_DATA_HALF

PSEUDO_HALF_CONFIG

CUDNN_DATA_HALF

CUDNN_DATA_FLOAT

CUDNN_DATA_HALF

PSEUDO_BFLOAT16_CONFIG (only supported on architectures with bfloat16 support, meaning, compute capability 8.0 and later)

CUDNN_DATA_BFLOAT16

CUDNN_DATA_FLOAT

CUDNN_DATA_BFLOAT16

FLOAT_CONFIG

CUDNN_DATA_FLOAT

CUDNN_DATA_FLOAT

CUDNN_DATA_FLOAT

DOUBLE_CONFIG

CUDNN_DATA_DOUBLE

CUDNN_DATA_DOUBLE

CUDNN_DATA_DOUBLE

INT8_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

CUDNN_DATA_INT8

CUDNN_DATA_INT32

CUDNN_DATA_INT8

INT8_EXT_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

CUDNN_DATA_INT8

CUDNN_DATA_INT32

CUDNN_DATA_FLOAT

INT8x4_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

CUDNN_DATA_INT8x4

CUDNN_DATA_INT32

CUDNN_DATA_INT8x4

INT8x4_EXT_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

CUDNN_DATA_INT8x4

CUDNN_DATA_INT32

CUDNN_DATA_FLOAT

UINT8_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

xDesc: CUDNN_DATA_UINT8 wDesc: CUDNN_DATA_INT8

CUDNN_DATA_INT32

CUDNN_DATA_INT8

UINT8x4_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

xDesc: CUDNN_DATA_UINT8x4 wDesc: CUDNN_DATA_INT8x4

CUDNN_DATA_INT32

CUDNN_DATA_INT8x4

UINT8_EXT_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

xDesc: CUDNN_DATA_UINT8 wDesc: CUDNN_DATA_INT8

CUDNN_DATA_INT32

CUDNN_DATA_FLOAT

UINT8x4_EXT_CONFIG (only supported on architectures with DP4A support, meaning, compute capability 6.1 and later)

xDesc: CUDNN_DATA_UINT8x4 wDesc: CUDNN_DATA_INT8x4

CUDNN_DATA_INT32

CUDNN_DATA_FLOAT

INT8x32_CONFIG (only supported on architectures with IMMA support, meaning compute capability 7.5 and later)

CUDNN_DATA_INT8x32

CUDNN_DATA_INT32

CUDNN_DATA_INT8x32

Supported Algorithms

For this function, all algorithms perform deterministic computations. Specifying a separate algorithm can cause changes in performance and support.

The following table shows the list of the supported 2D and 3D convolutions. The 2D convolutions are described first, followed by the 3D convolutions.

For brevity, the short-form versions followed by > are used in the table below:

  • CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM > _IMPLICIT_GEMM

  • CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM > _IMPLICIT_PRECOMP_GEMM

  • CUDNN_CONVOLUTION_FWD_ALGO_GEMM > _GEMM

  • CUDNN_CONVOLUTION_FWD_ALGO_DIRECT > _DIRECT

  • CUDNN_CONVOLUTION_FWD_ALGO_FFT > _FFT

  • CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING > _FFT_TILING

  • CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD > _WINOGRAD

  • CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED > _WINOGRAD_NONFUSED

  • CUDNN_TENSOR_NCHW > _NCHW

  • CUDNN_TENSOR_NHWC > _NHWC

  • CUDNN_TENSOR_NCHW_VECT_C > _NCHW_VECT_C

Supported Algorithms for cudnnConvolutionForward() 2D Convolutions. Filter descriptor wDesc: _NCHW (refer to cudnnTensorFormat_t).

Algo Name

Tensor Formats Supported for xDesc

Tensor Formats Supported for yDesc

Data Type Configurations Supported

Important

_IMPLICIT_GEMM

All except _NCHW_VECT_C

All except _NCHW_VECT_C

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0 for all algos

_IMPLICIT_PRECOMP_GEMM

All except _NCHW_VECT_C

All except _NCHW_VECT_C

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions convDesc Group Count Support: Greater than 0 for all algos

_GEMM

All except _NCHW_VECT_C

All except _NCHW_VECT_C

PSEUDO_HALF_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions convDesc Group Count Support: Greater than 0 for all algos

_FFT

NCHW HW-packed

NCHW HW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0 for all algos; xDesc feature map height + 2 * convDesc zero-padding height must equal 256 or less; xDesc feature map width + 2 * convDesc zero-padding width must equal 256 or less; convDesc vertical and horizontal filter stride must equal 1; wDesc filter height must be greater than convDesc zero-padding height; wDesc filter width must be greater than convDesc zero-padding width

_FFT_TILING

NCHW HW-packed

NCHW HW-packed

PSEUDO_HALF_CONFIG FLOAT_CONFIG; DOUBLE_CONFIG is also supported when the task can be handled by 1D FFT, meaning, one of the filter dimensions (width or height) is 1.

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0 for all algos; when neither of the wDesc filter dimensions is 1, the filter width and height must not be larger than 32; when either of the wDesc filter dimensions is 1, the largest filter dimension should not exceed 256; convDesc vertical and horizontal filter stride must equal 1 when either the filter width or filter height is 1, otherwise the stride can be 1 or 2; wDesc filter height must be greater than convDesc zero-padding height; wDesc filter width must be greater than convDesc zero-padding width

_WINOGRAD

All except _NCHW_VECT_C

All except _NCHW_VECT_C

PSEUDO_HALF_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0 for all algos; convDesc vertical and horizontal filter stride must equal 1; wDesc filter height must be 3; wDesc filter width must be 3

_WINOGRAD_NONFUSED

All except _NCHW_VECT_C

All except _NCHW_VECT_C

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0 for all algos; convDesc vertical and horizontal filter stride must equal 1; wDesc filter (height, width) must be (3,3) or (5,5); if wDesc filter (height, width) is (5,5), then data type config TRUE_HALF_CONFIG is not supported.

_DIRECT

Currently, not implemented in cuDNN.

Currently, not implemented in cuDNN.

Currently, not implemented in cuDNN.

Currently, not implemented in cuDNN.

Supported Algorithms for cudnnConvolutionForward() 2D Convolutions. Filter descriptor wDesc: _NCHW_VECT_C.

Algo Name

Tensor Formats Supported for xDesc

Tensor Formats Supported for yDesc

Data Type Configurations Supported

Important

_IMPLICIT_GEMM _IMPLICIT_PRECOMP_GEMM

All except _NCHW_VECT_C

All except _NCHW_VECT_C

INT8x4_CONFIG UINT8x4_CONFIG

Dilation: 1 for all dimensions convDesc Group Count Support: Greater than 0

_IMPLICIT_PRECOMP_GEMM

All except _NCHW_VECT_C

All except _NCHW_VECT_C

INT8x32_CONFIG

Dilation: 1 for all dimensions; convDesc Group Count Support: Greater than 0 for all algos; requires compute capability 7.2 or above.

Supported Algorithms for cudnnConvolutionForward() 2D Convolutions. Filter descriptor wDesc: _NHWC.

Algo Name

Tensor Formats Supported for xDesc

Tensor Formats Supported for yDesc

Data Type Configurations Supported

Important

_IMPLICIT_GEMM _IMPLICIT_PRECOMP_GEMM

NHWC fully-packed

NHWC fully-packed

INT8_CONFIG INT8_EXT_CONFIG UINT8_CONFIG UINT8_EXT_CONFIG

Dilation: 1 for all dimensions; input and output feature maps must be a multiple of 4 (output feature maps can be a non-multiple in the case of INT8_EXT_CONFIG or UINT8_EXT_CONFIG); convDesc Group Count Support: Greater than 0

_IMPLICIT_GEMM _IMPLICIT_PRECOMP_GEMM

NHWC HWC-packed

NHWC HWC-packed, NCHW CHW-packed

TRUE_HALF_CONFIG PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

convDesc Group Count Support: Greater than 0

Supported Algorithms for cudnnConvolutionForward() 3D Convolutions. Filter descriptor wDesc: _NCHW.

Algo Name

Tensor Formats Supported for xDesc

Tensor Formats Supported for yDesc

Data Type Configurations Supported

Important

_IMPLICIT_GEMM _IMPLICIT_PRECOMP_GEMM

All except _NCHW_VECT_C

All except _NCHW_VECT_C

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0 for all algos

_FFT_TILING

NCDHW DHW-packed

NCDHW DHW-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG DOUBLE_CONFIG

Dilation: 1 for all dimensions; wDesc filter height must equal 16 or less; wDesc filter width must equal 16 or less; wDesc filter depth must equal 16 or less; convDesc must have all filter strides equal to 1; wDesc filter height must be greater than convDesc zero-padding height; wDesc filter width must be greater than convDesc zero-padding width; wDesc filter depth must be greater than convDesc zero-padding depth; convDesc Group Count Support: Greater than 0 for all algos

Supported Algorithms for cudnnConvolutionForward() 3D Convolutions. Filter descriptor wDesc: _NHWC.

Algo Name

Tensor Formats Supported for xDesc

Tensor Formats Supported for yDesc

Data Type Configurations Supported

Important

_IMPLICIT_PRECOMP_GEMM

NDHWC DHWC-packed

NDHWC DHWC-packed

PSEUDO_HALF_CONFIG PSEUDO_BFLOAT16_CONFIG FLOAT_CONFIG

Dilation: Greater than 0 for all dimensions convDesc Group Count Support: Greater than 0 for all algos

Tensors can be converted to and from CUDNN_TENSOR_NCHW_VECT_C with cudnnTransformTensor().
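
For illustration only (the helper name toVectCExample is hypothetical, not part of the cuDNN API), a minimal sketch of such a conversion: srcDesc is assumed to describe an INT8 NCHW tensor and dstDesc the same logical shape as CUDNN_TENSOR_NCHW_VECT_C with CUDNN_DATA_INT8x4, both configured by the caller, with src and dst pointing to device buffers of the appropriate sizes.

#include <cudnn.h>

/* Hypothetical helper: repack an NCHW INT8 tensor into NCHW_VECT_C layout. */
cudnnStatus_t toVectCExample(cudnnHandle_t handle,
                             const cudnnTensorDescriptor_t srcDesc, const void *src,
                             const cudnnTensorDescriptor_t dstDesc, void *dst)
{
    const float alpha = 1.0f, beta = 0.0f;   /* copy, do not blend with dst */
    return cudnnTransformTensor(handle, &alpha, srcDesc, src, &beta, dstDesc, dst);
}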

Returns

CUDNN_STATUS_SUCCESS

The operation was launched successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • At least one of the following is NULL: handle, xDesc, wDesc, convDesc, yDesc, xData, w, yData, alpha, and beta

  • xDesc and yDesc have a non-matching number of dimensions

  • xDesc and wDesc have a non-matching number of dimensions

  • xDesc has fewer than three dimensions

  • xDesc number of dimensions is not equal to convDesc array length + 2

  • xDesc and wDesc have a non-matching number of input feature maps per image (or group in case of grouped convolutions)

  • yDesc or wDesc indicate an output channel count that isn’t a multiple of group count (if group count has been set in convDesc)

  • xDesc, wDesc, and yDesc have a non-matching data type

  • For some spatial dimension, wDesc has a spatial size that is larger than the input spatial size (including zero-padding size)

CUDNN_STATUS_NOT_SUPPORTED

At least one of the following conditions are met:

  • xDesc or yDesc have negative tensor striding

  • xDesc, wDesc, or yDesc has a number of dimensions that is not 4 or 5

  • yDesc spatial sizes do not match with the expected size as determined by cudnnGetConvolutionNdForwardOutputDim()

  • The chosen algo does not support the parameters provided; see above for an exhaustive list of parameters supported for each algo

CUDNN_STATUS_MAPPING_ERROR

An error occurs during the texture object creation associated with the filter data.

CUDNN_STATUS_EXECUTION_FAILED

The function failed to launch on the GPU.

cudnnCreateConvolutionDescriptor()

This function has been deprecated in cuDNN 9.0.

This function creates a convolution descriptor object by allocating the memory needed to hold its opaque structure. For more information, refer to cudnnConvolutionDescriptor_t.

cudnnStatus_t cudnnCreateConvolutionDescriptor(
    cudnnConvolutionDescriptor_t *convDesc)
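
A minimal lifecycle sketch follows (the helper name convDescExample and the specific values are illustrative, not part of the cuDNN API): the descriptor is created, configured as a pad-1, stride-1, dilation-1 cross-correlation computed in FP32 with cudnnSetConvolution2dDescriptor(), and destroyed with cudnnDestroyConvolutionDescriptor().

#include <cudnn.h>

/* Hypothetical helper: create, configure, and destroy a convolution descriptor. */
cudnnStatus_t convDescExample(void)
{
    cudnnConvolutionDescriptor_t convDesc;
    cudnnStatus_t st = cudnnCreateConvolutionDescriptor(&convDesc);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    st = cudnnSetConvolution2dDescriptor(convDesc,
                                         1, 1,               /* pad_h, pad_w */
                                         1, 1,               /* vertical, horizontal stride */
                                         1, 1,               /* dilation_h, dilation_w */
                                         CUDNN_CROSS_CORRELATION,
                                         CUDNN_DATA_FLOAT);  /* compute type */

    /* ... use convDesc with the convolution routines documented above ... */

    cudnnDestroyConvolutionDescriptor(convDesc);
    return st;
}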

Returns

CUDNN_STATUS_SUCCESS

The object was created successfully.

CUDNN_STATUS_ALLOC_FAILED

The resources could not be allocated.

cudnnCreateFusedOpsConstParamPack()

This function has been deprecated in cuDNN 9.0.

This function creates an opaque structure to store the various problem size information, such as the shape, layout and the type of tensors, and the descriptors for convolution and activation, for the selected sequence of cudnnFusedOps computations.

cudnnStatus_t cudnnCreateFusedOpsConstParamPack(
  cudnnFusedOpsConstParamPack_t *constPack,
  cudnnFusedOps_t ops);

Parameters

constPack

Input. The opaque structure that is created by this function. For more information, refer to cudnnFusedOpsConstParamPack_t.

ops

Input. The specific sequence of computations to perform in the cudnnFusedOps computations, as defined in the enumerant type cudnnFusedOps_t.

Returns

CUDNN_STATUS_BAD_PARAM

If either constPack or ops is NULL.

CUDNN_STATUS_ALLOC_FAILED

The resources could not be allocated.

CUDNN_STATUS_SUCCESS

If the descriptor is created successfully.

cudnnCreateFusedOpsPlan()

This function has been deprecated in cuDNN 9.0.

This function creates the plan descriptor for the cudnnFusedOps computation. This descriptor contains the plan information, including the problem type and size, which kernels should be run, and the internal workspace partition.

cudnnStatus_t cudnnCreateFusedOpsPlan(
  cudnnFusedOpsPlan_t *plan,
  cudnnFusedOps_t ops);

Parameters

plan

Input. A pointer to the instance of the descriptor created by this function.

ops

Input. The specific sequence of fused operations computations for which this plan descriptor should be created. For more information, refer to cudnnFusedOps_t.

Returns

CUDNN_STATUS_BAD_PARAM

If either the input *plan is NULL or the ops input is not a valid cudnnFusedOps_t enumerant.

CUDNN_STATUS_ALLOC_FAILED

The resources could not be allocated.

CUDNN_STATUS_SUCCESS

The plan descriptor is created successfully.

cudnnCreateFusedOpsVariantParamPack()

This function has been deprecated in cuDNN 9.0.

This function creates the variant pack descriptor for the cudnnFusedOps computation.

cudnnStatus_t cudnnCreateFusedOpsVariantParamPack(
  cudnnFusedOpsVariantParamPack_t *varPack,
  cudnnFusedOps_t ops);

Parameters

varPack

Input. Pointer to the descriptor created by this function. For more information, refer to cudnnFusedOpsVariantParamPack_t.

ops

Input. The specific sequence of fused operations computations for which this descriptor should be created.
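
The three Create functions above are typically used together. The sketch below (the helper name fusedOpsLifecycleExample is hypothetical, not part of the cuDNN API) creates the constant-parameter pack, the plan, and the variant pack for one caller-supplied cudnnFusedOps_t value and then destroys them in reverse order; populating the packs and making and executing the plan are omitted here.

#include <cudnn.h>

/* Hypothetical helper: create and tear down the fused-ops descriptors for one ops value. */
cudnnStatus_t fusedOpsLifecycleExample(cudnnFusedOps_t ops)
{
    cudnnFusedOpsConstParamPack_t   constPack = NULL;
    cudnnFusedOpsPlan_t             plan      = NULL;
    cudnnFusedOpsVariantParamPack_t varPack   = NULL;

    cudnnStatus_t st = cudnnCreateFusedOpsConstParamPack(&constPack, ops);
    if (st == CUDNN_STATUS_SUCCESS) st = cudnnCreateFusedOpsPlan(&plan, ops);
    if (st == CUDNN_STATUS_SUCCESS) st = cudnnCreateFusedOpsVariantParamPack(&varPack, ops);

    /* ... fill the packs, make the plan, and execute it here ... */

    if (varPack)   cudnnDestroyFusedOpsVariantParamPack(varPack);
    if (plan)      cudnnDestroyFusedOpsPlan(plan);
    if (constPack) cudnnDestroyFusedOpsConstParamPack(constPack);
    return st;
}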

Returns

CUDNN_STATUS_SUCCESS

The descriptor was created successfully.

CUDNN_STATUS_ALLOC_FAILED

The resources could not be allocated.

CUDNN_STATUS_BAD_PARAM

If any input is invalid.

cudnnDestroyConvolutionDescriptor()

This function has been deprecated in cuDNN 9.0.

This function destroys a previously created convolution descriptor object.

cudnnStatus_t cudnnDestroyConvolutionDescriptor(
    cudnnConvolutionDescriptor_t convDesc)

Returns

CUDNN_STATUS_SUCCESS

The descriptor was destroyed successfully.

cudnnDestroyFusedOpsConstParamPack()

This function has been deprecated in cuDNN 9.0.

This function destroys a previously-created cudnnFusedOpsConstParamPack_t structure.

cudnnStatus_t cudnnDestroyFusedOpsConstParamPack(
  cudnnFusedOpsConstParamPack_t constPack);

Parameters

constPack

Input. The cudnnFusedOpsConstParamPack_t structure that should be destroyed.

Returns

CUDNN_STATUS_SUCCESS

The descriptor was destroyed successfully.

CUDNN_STATUS_INTERNAL_ERROR

The ops enum value is either not supported or is invalid.

cudnnDestroyFusedOpsPlan()

This function has been deprecated in cuDNN 9.0.

This function destroys the plan descriptor provided.

cudnnStatus_t cudnnDestroyFusedOpsPlan(
  cudnnFusedOpsPlan_t plan);

Parameters

plan

Input. The descriptor that should be destroyed by this function.

Returns

CUDNN_STATUS_SUCCESS

Either the plan descriptor is NULL or the descriptor was successfully destroyed.

cudnnDestroyFusedOpsVariantParamPack()

This function has been deprecated in cuDNN 9.0.

This function destroys a previously-created descriptor for cudnnFusedOps variant parameters.

cudnnStatus_t cudnnDestroyFusedOpsVariantParamPack(
  cudnnFusedOpsVariantParamPack_t varPack);

Parameters

varPack

Input. The descriptor that should be destroyed.

Returns

CUDNN_STATUS_SUCCESS

The descriptor was successfully destroyed.

cudnnFindConvolutionBackwardDataAlgorithm()

This function has been deprecated in cuDNN 9.0.

This function attempts all algorithms available for cudnnConvolutionBackwardData(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).

cudnnStatus_t cudnnFindConvolutionBackwardDataAlgorithm(
    cudnnHandle_t                          handle,
    const cudnnFilterDescriptor_t          wDesc,
    const cudnnTensorDescriptor_t          dyDesc,
    const cudnnConvolutionDescriptor_t     convDesc,
    const cudnnTensorDescriptor_t          dxDesc,
    const int                              requestedAlgoCount,
    int                                   *returnedAlgoCount,
    cudnnConvolutionBwdDataAlgoPerf_t     *perfResults)

Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.

Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdDataAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardDataAlgorithmMaxCount().

Note

  • This function is host blocking.

  • It is recommended to run this function prior to allocating layer data; doing otherwise may needlessly inhibit some algorithm options due to resource usage.

Parameters

handle

Input. Handle to a previously created cuDNN context.

wDesc

Input. Handle to a previously initialized filter descriptor.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

convDesc

Input. Previously initialized convolution descriptor.

dxDesc

Input. Handle to the previously initialized output tensor descriptor.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.
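
As an illustration (the helper name pickBwdDataAlgoExample is hypothetical, not part of the cuDNN API), the sketch below sizes the perfResults array with cudnnGetConvolutionBackwardDataAlgorithmMaxCount(), runs the finder, and selects perfResults[0].algo, which is valid because the entries come back sorted by compute time; descriptors are assumed to be configured.

#include <stdlib.h>
#include <cudnn.h>

/* Hypothetical helper: benchmark all backward-data algorithms, return the fastest. */
cudnnStatus_t pickBwdDataAlgoExample(cudnnHandle_t handle,
                                     const cudnnFilterDescriptor_t wDesc,
                                     const cudnnTensorDescriptor_t dyDesc,
                                     const cudnnConvolutionDescriptor_t convDesc,
                                     const cudnnTensorDescriptor_t dxDesc,
                                     cudnnConvolutionBwdDataAlgo_t *bestAlgo)
{
    int maxCount = 0, returned = 0;
    cudnnStatus_t st = cudnnGetConvolutionBackwardDataAlgorithmMaxCount(handle, &maxCount);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    cudnnConvolutionBwdDataAlgoPerf_t *perf =
        (cudnnConvolutionBwdDataAlgoPerf_t *)malloc(maxCount * sizeof(*perf));
    if (!perf) return CUDNN_STATUS_ALLOC_FAILED;

    st = cudnnFindConvolutionBackwardDataAlgorithm(handle, wDesc, dyDesc, convDesc,
                                                   dxDesc, maxCount, &returned, perf);
    if (st == CUDNN_STATUS_SUCCESS) {
        if (returned > 0 && perf[0].status == CUDNN_STATUS_SUCCESS)
            *bestAlgo = perf[0].algo;         /* entries are sorted by time */
        else
            st = CUDNN_STATUS_NOT_SUPPORTED;  /* no algorithm ran successfully */
    }
    free(perf);
    return st;
}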

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • handle is not allocated properly

  • wDesc, dyDesc, or dxDesc is not allocated properly

  • wDesc, dyDesc, or dxDesc has fewer than 1 dimension

  • Either returnedAlgoCount or perfResults is NIL

  • requestedAlgoCount is less than 1

CUDNN_STATUS_ALLOC_FAILED

This function was unable to allocate memory to store sample input, filters, and output.

CUDNN_STATUS_INTERNAL_ERROR

At least one of the following conditions are met:

  • The function was unable to allocate necessary timing objects

  • The function was unable to deallocate necessary timing objects

  • The function was unable to deallocate sample input, filters, and output

cudnnFindConvolutionBackwardDataAlgorithmEx()

This function has been deprecated in cuDNN 9.0.

This function attempts all algorithms available for cudnnConvolutionBackwardData(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).

cudnnStatus_t cudnnFindConvolutionBackwardDataAlgorithmEx(
    cudnnHandle_t                          handle,
    const cudnnFilterDescriptor_t          wDesc,
    const void                            *w,
    const cudnnTensorDescriptor_t          dyDesc,
    const void                            *dy,
    const cudnnConvolutionDescriptor_t     convDesc,
    const cudnnTensorDescriptor_t          dxDesc,
    void                                  *dx,
    const int                              requestedAlgoCount,
    int                                   *returnedAlgoCount,
    cudnnConvolutionBwdDataAlgoPerf_t     *perfResults,
    void                                  *workSpace,
    size_t                                 workSpaceSizeInBytes)

Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.

Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdDataAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardDataAlgorithmMaxCount().

Note

This function is host blocking.

Parameters

handle

Input. Handle to a previously created cuDNN context.

wDesc

Input. Handle to a previously initialized filter descriptor.

w

Input. Data pointer to GPU memory associated with the filter descriptor wDesc.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

dy

Input. Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc.

convDesc

Input. Previously initialized convolution descriptor.

dxDesc

Input. Handle to the previously initialized output tensor descriptor.

dx

Input/Output. Data pointer to GPU memory associated with the tensor descriptor dxDesc. The content of this tensor will be overwritten with arbitrary values.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

workSpace

Input. Data pointer to GPU memory of a workspace needed by some algorithms. The size of this workspace will determine the availability of algorithms. A NIL pointer is considered a workSpace of 0 bytes.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • handle is not allocated properly

  • wDesc, dyDesc, or dxDesc is not allocated properly

  • wDesc, dyDesc, or dxDesc has fewer than 1 dimension

  • Either returnedAlgoCount or perfResults is NIL

  • requestedAlgoCount is less than 1

CUDNN_STATUS_INTERNAL_ERROR

At least one of the following conditions are met:

  • The function was unable to allocate necessary timing objects

  • The function was unable to deallocate necessary timing objects

  • The function was unable to deallocate sample input, filters, and output

cudnnFindConvolutionBackwardFilterAlgorithm()

This function has been deprecated in cuDNN 9.0.

This function attempts all algorithms available for cudnnConvolutionBackwardFilter(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).

cudnnStatus_t cudnnFindConvolutionBackwardFilterAlgorithm(
    cudnnHandle_t                          handle,
    const cudnnTensorDescriptor_t          xDesc,
    const cudnnTensorDescriptor_t          dyDesc,
    const cudnnConvolutionDescriptor_t     convDesc,
    const cudnnFilterDescriptor_t          dwDesc,
    const int                              requestedAlgoCount,
    int                                   *returnedAlgoCount,
    cudnnConvolutionBwdFilterAlgoPerf_t     *perfResults)

Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.

Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdFilterAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardFilterAlgorithmMaxCount().

Note

  • This function is host blocking.

  • It is recommended to run this function prior to allocating layer data; doing otherwise may needlessly inhibit some algorithm options due to resource usage.

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

convDesc

Input. Previously initialized convolution descriptor.

dwDesc

Input. Handle to a previously initialized filter descriptor.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.
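
For illustration (the helper name listBwdFilterAlgosExample is hypothetical, not part of the cuDNN API), the sketch below runs the finder and prints each candidate's time, workspace requirement, and determinism as reported in the cudnnConvolutionBwdFilterAlgoPerf_t entries, skipping entries whose status is not CUDNN_STATUS_SUCCESS.

#include <stdio.h>
#include <stdlib.h>
#include <cudnn.h>

/* Hypothetical helper: enumerate backward-filter algorithm candidates. */
cudnnStatus_t listBwdFilterAlgosExample(cudnnHandle_t handle,
                                        const cudnnTensorDescriptor_t xDesc,
                                        const cudnnTensorDescriptor_t dyDesc,
                                        const cudnnConvolutionDescriptor_t convDesc,
                                        const cudnnFilterDescriptor_t dwDesc)
{
    int maxCount = 0, returned = 0;
    cudnnStatus_t st = cudnnGetConvolutionBackwardFilterAlgorithmMaxCount(handle, &maxCount);
    if (st != CUDNN_STATUS_SUCCESS) return st;

    cudnnConvolutionBwdFilterAlgoPerf_t *perf =
        (cudnnConvolutionBwdFilterAlgoPerf_t *)malloc(maxCount * sizeof(*perf));
    if (!perf) return CUDNN_STATUS_ALLOC_FAILED;

    st = cudnnFindConvolutionBackwardFilterAlgorithm(handle, xDesc, dyDesc, convDesc,
                                                     dwDesc, maxCount, &returned, perf);
    for (int i = 0; st == CUDNN_STATUS_SUCCESS && i < returned; ++i) {
        if (perf[i].status != CUDNN_STATUS_SUCCESS) continue;   /* algo not usable here */
        printf("algo %d: %.3f ms, %zu bytes workspace, deterministic=%d\n",
               (int)perf[i].algo, perf[i].time, perf[i].memory,
               perf[i].determinism == CUDNN_DETERMINISTIC);
    }
    free(perf);
    return st;
}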

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • handle is not allocated properly

  • xDesc, dyDesc, or dwDesc are not allocated properly

  • xDesc, dyDesc, or dwDesc has fewer than 1 dimension

  • Either returnedAlgoCount or perfResults is NIL

  • requestedAlgoCount is less than 1

CUDNN_STATUS_ALLOC_FAILED

This function was unable to allocate memory to store sample input, filters, and output.

CUDNN_STATUS_INTERNAL_ERROR

At least one of the following conditions are met:

  • The function was unable to allocate necessary timing objects.

  • The function was unable to deallocate necessary timing objects.

  • The function was unable to deallocate sample input, filters, and output.

cudnnFindConvolutionBackwardFilterAlgorithmEx()

This function has been deprecated in cuDNN 9.0.

This function attempts all algorithms available for cudnnConvolutionBackwardFilter(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).

cudnnStatus_t cudnnFindConvolutionBackwardFilterAlgorithmEx(
    cudnnHandle_t                          handle,
    const cudnnTensorDescriptor_t          xDesc,
    const void                            *x,
    const cudnnTensorDescriptor_t          dyDesc,
    const void                            *dy,
    const cudnnConvolutionDescriptor_t     convDesc,
    const cudnnFilterDescriptor_t          dwDesc,
    void                                  *dw,
    const int                              requestedAlgoCount,
    int                                   *returnedAlgoCount,
    cudnnConvolutionBwdFilterAlgoPerf_t   *perfResults,
    void                                  *workSpace,
    size_t                                 workSpaceSizeInBytes)

Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.

Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionBwdFilterAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionBackwardFilterAlgorithmMaxCount().

Note

This function is host blocking.

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

dy

Input. Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc.

convDesc

Input. Previously initialized convolution descriptor.

dwDesc

Input. Handle to a previously initialized filter descriptor.

dw

Input/Output. Data pointer to GPU memory associated with the filter descriptor dwDesc. The content of this tensor will be overwritten with arbitrary values.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

workSpace

Input. Data pointer to GPU memory of a workspace needed by some algorithms. The size of this workspace will determine the availability of algorithms. A NIL pointer is considered a workSpace of 0 bytes.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • handle is not allocated properly

  • xDesc, dyDesc, or dwDesc are not allocated properly

  • xDesc, dyDesc, or dwDesc has fewer than 1 dimension

  • x, dy, or dw is NIL

  • Either returnedAlgoCount or perfResults is NIL

  • requestedAlgoCount is less than 1

CUDNN_STATUS_INTERNAL_ERROR

At least one of the following conditions are met:

  • The function was unable to allocate necessary timing objects.

  • The function was unable to deallocate necessary timing objects.

  • The function was unable to deallocate sample input, filters, and output.

cudnnFindConvolutionForwardAlgorithm()

This function has been deprecated in cuDNN 9.0.

This function attempts all algorithms available for cudnnConvolutionForward(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).

cudnnStatus_t cudnnFindConvolutionForwardAlgorithm(
    cudnnHandle_t                      handle,
    const cudnnTensorDescriptor_t      xDesc,
    const cudnnFilterDescriptor_t      wDesc,
    const cudnnConvolutionDescriptor_t convDesc,
    const cudnnTensorDescriptor_t      yDesc,
    const int                          requestedAlgoCount,
    int                               *returnedAlgoCount,
    cudnnConvolutionFwdAlgoPerf_t     *perfResults)

Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.

Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionForwardAlgorithmMaxCount().

Note

  • This function is host blocking.

  • It is recommended to run this function prior to allocating layer data; doing otherwise may needlessly inhibit some algorithm options due to resource usage.
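
A hedged sketch of a typical call, assuming handle and the four descriptors already exist (error checking omitted):

// Exhaustive forward-algorithm search using cuDNN-managed scratch buffers.
int maxAlgos = 0, returnedAlgos = 0;
cudnnGetConvolutionForwardAlgorithmMaxCount(handle, &maxAlgos);

cudnnConvolutionFwdAlgoPerf_t *perf =
    (cudnnConvolutionFwdAlgoPerf_t *)malloc(maxAlgos * sizeof(*perf));

cudnnFindConvolutionForwardAlgorithm(
    handle, xDesc, wDesc, convDesc, yDesc,
    maxAlgos, &returnedAlgos, perf);

// perf[0] is the fastest candidate; check perf[0].status before using
// perf[0].algo, and size the workspace from perf[0].memory.
free(perf);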

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

wDesc

Input. Handle to a previously initialized filter descriptor.

convDesc

Input. Previously initialized convolution descriptor.

yDesc

Input. Handle to the previously initialized output tensor descriptor.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • handle is not allocated properly

  • xDesc, wDesc, or yDesc are not allocated properly

  • xDesc, wDesc, or yDesc has fewer than 1 dimension

  • Either returnedAlgoCount or perfResults is NIL

  • requestedAlgoCount is less than 1

CUDNN_STATUS_ALLOC_FAILED

This function was unable to allocate memory to store sample input, filters, and output.

CUDNN_STATUS_INTERNAL_ERROR

At least one of the following conditions are met:

  • The function was unable to allocate necessary timing objects.

  • The function was unable to deallocate necessary timing objects.

  • The function was unable to deallocate sample input, filters, and output.

cudnnFindConvolutionForwardAlgorithmEx()

This function has been deprecated in cuDNN 9.0.

This function attempts all algorithms available for cudnnConvolutionForward(). It will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ).

cudnnStatus_t cudnnFindConvolutionForwardAlgorithmEx(
    cudnnHandle_t                      handle,
    const cudnnTensorDescriptor_t      xDesc,
    const void                        *x,
    const cudnnFilterDescriptor_t      wDesc,
    const void                        *w,
    const cudnnConvolutionDescriptor_t convDesc,
    const cudnnTensorDescriptor_t      yDesc,
    void                              *y,
    const int                          requestedAlgoCount,
    int                               *returnedAlgoCount,
    cudnnConvolutionFwdAlgoPerf_t     *perfResults,
    void                              *workSpace,
    size_t                             workSpaceSizeInBytes)

Algorithms without the CUDNN_TENSOR_OP_MATH availability will only be tried with CUDNN_DEFAULT_MATH, and returned as such.

Memory is allocated via cudaMalloc(). The performance metrics are returned in the user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in a sorted fashion where the first element has the lowest compute time. The total number of resulting algorithms can be queried through the API cudnnGetConvolutionForwardAlgorithmMaxCount().

Note

This function is host blocking.

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

wDesc

Input. Handle to a previously initialized filter descriptor.

w

Input. Data pointer to GPU memory associated with the filter descriptor wDesc.

convDesc

Input. Previously initialized convolution descriptor.

yDesc

Input. Handle to the previously initialized output tensor descriptor.

y

Input/Output. Data pointer to GPU memory associated with the tensor descriptor yDesc. The content of this tensor will be overwritten with arbitrary values.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

workSpace

Input. Data pointer to GPU memory that serves as a workspace needed by some algorithms. The size of this workspace will determine the availability of algorithms. A NIL pointer is considered a workSpace of 0 bytes.

workSpaceSizeInBytes

Input. Specifies the size in bytes of the provided workSpace.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • handle is not allocated properly

  • xDesc, wDesc, or yDesc are not allocated properly

  • xDesc, wDesc, or yDesc has fewer than 1 dimension

  • x, w, or y is NIL

  • Either returnedAlgoCount or perfResults is NIL

  • requestedAlgoCount is less than 1

CUDNN_STATUS_INTERNAL_ERROR

At least one of the following conditions are met:

  • The function was unable to allocate necessary timing objects.

  • The function was unable to deallocate necessary timing objects.

  • The function was unable to deallocate sample input, filters, and output.

cudnnFusedOpsExecute()

This function executes the sequence of cudnnFusedOps operations.

cudnnStatus_t cudnnFusedOpsExecute(
  cudnnHandle_t handle,
  const cudnnFusedOpsPlan_t plan,
  cudnnFusedOpsVariantParamPack_t varPack);

Parameters

handle

Input. Pointer to the cuDNN library context.

plan

Input. Pointer to a previously-created and initialized plan descriptor.

varPack

Input. Pointer to the descriptor to the variant parameters pack.

Returns

CUDNN_STATUS_BAD_PARAM

If the type of cudnnFusedOps_t in the plan descriptor is unsupported.

cudnnGetConvolution2dDescriptor()

This function has been deprecated in cuDNN 9.0.

This function queries a previously initialized 2D convolution descriptor object.

cudnnStatus_t cudnnGetConvolution2dDescriptor(
    const cudnnConvolutionDescriptor_t  convDesc,
    int                                *pad_h,
    int                                *pad_w,
    int                                *u,
    int                                *v,
    int                                *dilation_h,
    int                                *dilation_w,
    cudnnConvolutionMode_t             *mode,
    cudnnDataType_t                    *computeType)

Parameters

convDesc

Input/Output. Handle to a previously created convolution descriptor.

pad_h

Output. Zero-padding height: number of rows of zeros implicitly concatenated onto the top and onto the bottom of input images.

pad_w

Output. Zero-padding width: number of columns of zeros implicitly concatenated onto the left and onto the right of input images.

u

Output. Vertical filter stride.

v

Output. Horizontal filter stride.

dilation_h

Output. Filter height dilation.

dilation_w

Output. Filter width dilation.

mode

Output. Convolution mode.

computeType

Output. Compute precision.

Returns

CUDNN_STATUS_SUCCESS

The operation was successful.

CUDNN_STATUS_BAD_PARAM

The parameter convDesc is NIL.

cudnnGetConvolution2dForwardOutputDim()

This function has been deprecated in cuDNN 9.0.

This function returns the dimensions of the resulting 4D tensor of a 2D convolution, given the convolution descriptor, the input tensor descriptor and the filter descriptor. This function can help to set up the output tensor and allocate the proper amount of memory prior to launching the actual convolution.

cudnnStatus_t cudnnGetConvolution2dForwardOutputDim(
    const cudnnConvolutionDescriptor_t  convDesc,
    const cudnnTensorDescriptor_t       inputTensorDesc,
    const cudnnFilterDescriptor_t       filterDesc,
    int                                *n,
    int                                *c,
    int                                *h,
    int                                *w)

Each dimension h and w of the output images is computed as follows:

outputDim = 1 + ( inputDim + 2*pad - (((filterDim-1)*dilation)+1) )/convolutionStride;

Note

The dimensions provided by this routine must be strictly respected when calling cudnnConvolutionForward() or cudnnConvolutionBackwardBias(). Providing a smaller or larger output tensor is not supported by the convolution routines.
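
For instance, with inputDim = 224, pad = 3, filterDim = 7, dilation = 1, and convolutionStride = 2, the formula gives outputDim = 1 + (224 + 6 - 7)/2 = 112. A minimal sketch of sizing the output tensor from the returned dimensions (descriptor creation for convDesc, inputTensorDesc, and filterDesc is assumed to have been done already; error checking omitted):

// Query the output shape of the 2D convolution and allocate y accordingly.
int n, c, h, w;
cudnnGetConvolution2dForwardOutputDim(convDesc, inputTensorDesc, filterDesc,
                                      &n, &c, &h, &w);

cudnnTensorDescriptor_t yDesc;
cudnnCreateTensorDescriptor(&yDesc);
cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                           n, c, h, w);

void *y = NULL;
cudaMalloc(&y, (size_t)n * c * h * w * sizeof(float));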

Parameters

convDesc

Input. Handle to a previously created convolution descriptor.

inputTensorDesc

Input. Handle to a previously initialized tensor descriptor.

filterDesc

Input. Handle to a previously initialized filter descriptor.

n

Output. Number of output images.

c

Output. Number of output feature maps per image.

h

Output. Height of each output feature map.

w

Output. Width of each output feature map.

Returns

CUDNN_STATUS_BAD_PARAM

One or more of the descriptors has not been created correctly or there is a mismatch between the feature maps of inputTensorDesc and filterDesc.

CUDNN_STATUS_SUCCESS

The object was set successfully.

cudnnGetConvolutionBackwardDataAlgorithm_v7()

This function has been deprecated in cuDNN 9.0.

This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardData() for the given layer specifications. This function will return all algorithms (including CUDNN_TENSOR_OP_MATH and CUDNN_DEFAULT_MATH versions of algorithms where CUDNN_TENSOR_OP_MATH may be available) sorted by expected (based on internal heuristic) relative performance with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, use cudnnFindConvolutionBackwardDataAlgorithm(). The total number of resulting algorithms can be queried through the returnedAlgoCount variable.

cudnnStatus_t cudnnGetConvolutionBackwardDataAlgorithm_v7(
    cudnnHandle_t                          handle,
    const cudnnFilterDescriptor_t          wDesc,
    const cudnnTensorDescriptor_t          dyDesc,
    const cudnnConvolutionDescriptor_t     convDesc,
    const cudnnTensorDescriptor_t          dxDesc,
    const int                              requestedAlgoCount,
    int                                   *returnedAlgoCount,
    cudnnConvolutionBwdDataAlgoPerf_t     *perfResults)

Parameters

handle

Input. Handle to a previously created cuDNN context.

wDesc

Input. Handle to a previously initialized filter descriptor.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

convDesc

Input. Previously initialized convolution descriptor.

dxDesc

Input. Handle to the previously initialized output tensor descriptor.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • One of the parameters handle, wDesc, dyDesc, convDesc, dxDesc, perfResults, or returnedAlgoCount is NULL.

  • The numbers of feature maps of the input tensor and output tensor differ.

  • The dataType of the two tensor descriptors or the filters are different.

  • requestedAlgoCount is less than or equal to 0.

cudnnGetConvolutionBackwardDataAlgorithmMaxCount()

This function has been deprecated in cuDNN 9.0.

This function returns the maximum number of algorithms which can be returned from cudnnFindConvolutionBackwardDataAlgorithm() and cudnnGetConvolutionBackwardDataAlgorithm_v7(). This is the sum of all algorithms plus the sum of all algorithms with Tensor Core operations supported for the current device.

cudnnStatus_t cudnnGetConvolutionBackwardDataAlgorithmMaxCount(
    cudnnHandle_t       handle,
    int                 *count)

Parameters

handle

Input. Handle to a previously created cuDNN context.

count

Output. The resulting maximum number of algorithms.

Returns

CUDNN_STATUS_SUCCESS

The function was successful.

CUDNN_STATUS_BAD_PARAM

The provided handle is not allocated properly.

cudnnGetConvolutionBackwardDataWorkspaceSize()

This function has been deprecated in cuDNN 9.0.

This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionBackwardData() with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionBackwardData(). The specified algorithm can be the result of the call to cudnnGetConvolutionBackwardDataAlgorithm_v7() or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.

cudnnStatus_t cudnnGetConvolutionBackwardDataWorkspaceSize(
    cudnnHandle_t                       handle,
    const cudnnFilterDescriptor_t       wDesc,
    const cudnnTensorDescriptor_t       dyDesc,
    const cudnnConvolutionDescriptor_t  convDesc,
    const cudnnTensorDescriptor_t       dxDesc,
    cudnnConvolutionBwdDataAlgo_t       algo,
    size_t                             *sizeInBytes)
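
A common pattern combines the three backward-data query functions above: pick an algorithm with the heuristic, query its workspace requirement, then allocate that workspace before calling cudnnConvolutionBackwardData(). A hedged sketch, with descriptors assumed to be set up already and error checking omitted:

// Heuristic algorithm choice followed by workspace sizing.
int maxAlgos = 0, returnedAlgos = 0;
cudnnGetConvolutionBackwardDataAlgorithmMaxCount(handle, &maxAlgos);

cudnnConvolutionBwdDataAlgoPerf_t *perf =
    (cudnnConvolutionBwdDataAlgoPerf_t *)malloc(maxAlgos * sizeof(*perf));
cudnnGetConvolutionBackwardDataAlgorithm_v7(
    handle, wDesc, dyDesc, convDesc, dxDesc,
    maxAlgos, &returnedAlgos, perf);

cudnnConvolutionBwdDataAlgo_t algo = perf[0].algo;   // fastest per heuristic

size_t wsSize = 0;
cudnnGetConvolutionBackwardDataWorkspaceSize(
    handle, wDesc, dyDesc, convDesc, dxDesc, algo, &wsSize);

void *ws = NULL;
if (wsSize > 0) cudaMalloc(&ws, wsSize);
// ... call cudnnConvolutionBackwardData() with algo, ws, and wsSize ...
free(perf);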

Parameters

handle

Input. Handle to a previously created cuDNN context.

wDesc

Input. Handle to a previously initialized filter descriptor.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

convDesc

Input. Previously initialized convolution descriptor.

dxDesc

Input. Handle to the previously initialized output tensor descriptor.

algo

Input. Enumerant that specifies the chosen convolution algorithm.

sizeInBytes

Output. Amount of GPU memory needed as workspace to be able to execute cudnnConvolutionBackwardData() with the specified algo.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The numbers of feature maps of the input tensor and output tensor differ.

  • The dataType of the two tensor descriptors or the filter are different.

CUDNN_STATUS_NOT_SUPPORTED

The combination of the tensor descriptors, filter descriptor, and convolution descriptor is not supported for the specified algorithm.

cudnnGetConvolutionBackwardFilterAlgorithm_v7()

This function has been deprecated in cuDNN 9.0.

This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardFilter() for the given layer specifications. This function will return all algorithms (including CUDNN_TENSOR_OP_MATH and CUDNN_DEFAULT_MATH versions of algorithms where CUDNN_TENSOR_OP_MATH may be available) sorted by expected (based on internal heuristic) relative performance with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, use cudnnFindConvolutionBackwardFilterAlgorithm(). The total number of resulting algorithms can be queried through the returnedAlgoCount variable.

cudnnStatus_t cudnnGetConvolutionBackwardFilterAlgorithm_v7(
    cudnnHandle_t                          handle,
    const cudnnTensorDescriptor_t          xDesc,
    const cudnnTensorDescriptor_t          dyDesc,
    const cudnnConvolutionDescriptor_t     convDesc,
    const cudnnFilterDescriptor_t          dwDesc,
    const int                              requestedAlgoCount,
    int                                   *returnedAlgoCount,
    cudnnConvolutionBwdFilterAlgoPerf_t   *perfResults)

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

convDesc

Input. Previously initialized convolution descriptor.

dwDesc

Input. Handle to a previously initialized filter descriptor.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • One of the parameters handle, xDesc, dyDesc, convDesc, dwDesc, perfResults, or returnedAlgoCount is NULL.

  • The numbers of feature maps of the input tensor and output tensor differ.

  • The dataType of the two tensor descriptors or the filter are different.

  • requestedAlgoCount is less than or equal to 0.

cudnnGetConvolutionBackwardFilterAlgorithmMaxCount()

This function has been deprecated in cuDNN 9.0.

This function returns the maximum number of algorithms which can be returned from cudnnFindConvolutionBackwardFilterAlgorithm() and cudnnGetConvolutionBackwardFilterAlgorithm_v7(). This is the sum of all algorithms plus the sum of all algorithms with Tensor Core operations supported for the current device.

cudnnStatus_t cudnnGetConvolutionBackwardFilterAlgorithmMaxCount(
    cudnnHandle_t       handle,
    int                 *count)

Parameters

handle

Input. Handle to a previously created cuDNN context.

count

Output. The resulting maximum count of algorithms.

Returns

CUDNN_STATUS_SUCCESS

The function was successful.

CUDNN_STATUS_BAD_PARAM

The provided handle is not allocated properly.

cudnnGetConvolutionBackwardFilterWorkspaceSize()

This function has been deprecated in cuDNN 9.0.

This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionBackwardFilter() with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionBackwardFilter(). The specified algorithm can be the result of the call to cudnnGetConvolutionBackwardFilterAlgorithm_v7() or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.

cudnnStatus_t cudnnGetConvolutionBackwardFilterWorkspaceSize(
    cudnnHandle_t                       handle,
    const cudnnTensorDescriptor_t       xDesc,
    const cudnnTensorDescriptor_t       dyDesc,
    const cudnnConvolutionDescriptor_t  convDesc,
    const cudnnFilterDescriptor_t       dwDesc,
    cudnnConvolutionBwdFilterAlgo_t     algo,
    size_t                             *sizeInBytes)

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

dyDesc

Input. Handle to the previously initialized input differential tensor descriptor.

convDesc

Input. Previously initialized convolution descriptor.

dwDesc

Input. Handle to a previously initialized filter descriptor.

algo

Input. Enumerant that specifies the chosen convolution algorithm.

sizeInBytes

Output. Amount of GPU memory needed as workspace to be able to execute cudnnConvolutionBackwardFilter() with the specified algo.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The numbers of feature maps of the input tensor and output tensor differ.

  • The dataType of the two tensor descriptors or the filter are different.

CUDNN_STATUS_NOT_SUPPORTED

The combination of the tensor descriptors, filter descriptor and convolution descriptor is not supported for the specified algorithm.

cudnnGetConvolutionForwardAlgorithm_v7()

This function has been deprecated in cuDNN 9.0.

This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward() for the given layer specifications. This function will return all algorithms (including CUDNN_TENSOR_OP_MATH and CUDNN_DEFAULT_MATH versions of algorithms where CUDNN_TENSOR_OP_MATH may be available) sorted by expected (based on internal heuristic) relative performance with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, use cudnnFindConvolutionForwardAlgorithm(). The total number of resulting algorithms can be queried through the returnedAlgoCount variable.

cudnnStatus_t cudnnGetConvolutionForwardAlgorithm_v7(
    cudnnHandle_t                       handle,
    const cudnnTensorDescriptor_t       xDesc,
    const cudnnFilterDescriptor_t       wDesc,
    const cudnnConvolutionDescriptor_t  convDesc,
    const cudnnTensorDescriptor_t       yDesc,
    const int                           requestedAlgoCount,
    int                                *returnedAlgoCount,
    cudnnConvolutionFwdAlgoPerf_t      *perfResults)

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized input tensor descriptor.

wDesc

Input. Handle to a previously initialized convolution filter descriptor.

convDesc

Input. Previously initialized convolution descriptor.

yDesc

Input. Handle to the previously initialized output tensor descriptor.

requestedAlgoCount

Input. The maximum number of elements to be stored in perfResults.

returnedAlgoCount

Output. The number of output elements stored in perfResults.

perfResults

Output. A user-allocated array to store performance metrics sorted ascending by compute time.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • One of the parameters handle, xDesc, wDesc, convDesc, yDesc, perfResults, or returnedAlgoCount is NULL.

  • Either yDesc or wDesc have different dimensions from xDesc.

  • The data types of tensors xDesc, yDesc, or wDesc are not all the same.

  • The number of feature maps in xDesc and wDesc differs.

  • The tensor xDesc has a dimension smaller than 3.

  • requestedAlgoCount is less than or equal to 0.

cudnnGetConvolutionForwardAlgorithmMaxCount()

This function has been deprecated in cuDNN 9.0.

This function returns the maximum number of algorithms which can be returned from cudnnFindConvolutionForwardAlgorithm() and cudnnGetConvolutionForwardAlgorithm_v7(). This is the sum of all algorithms plus the sum of all algorithms with Tensor Core operations supported for the current device.

cudnnStatus_t cudnnGetConvolutionForwardAlgorithmMaxCount(
    cudnnHandle_t   handle,
    int             *count)

Parameters

handle

Input. Handle to a previously created cuDNN context.

count

Output. The resulting maximum number of algorithms.

Returns

CUDNN_STATUS_SUCCESS

The function was successful.

CUDNN_STATUS_BAD_PARAM

The provided handle is not allocated properly.

cudnnGetConvolutionForwardWorkspaceSize()

This function has been deprecated in cuDNN 9.0.

This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionForward() with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionForward(). The specified algorithm can be the result of the call to cudnnGetConvolutionForwardAlgorithm_v7() or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.

cudnnStatus_t cudnnGetConvolutionForwardWorkspaceSize(
    cudnnHandle_t   handle,
    const   cudnnTensorDescriptor_t         xDesc,
    const   cudnnFilterDescriptor_t         wDesc,
    const   cudnnConvolutionDescriptor_t    convDesc,
    const   cudnnTensorDescriptor_t         yDesc,
    cudnnConvolutionFwdAlgo_t               algo,
    size_t                                 *sizeInBytes)
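
As a hedged sketch of the forward path, the workspace size is queried for the chosen algorithm, allocated, and then passed to cudnnConvolutionForward(); handle, the descriptors, and device buffers x, w, and y are assumed to exist, and error checking is omitted:

// Workspace sizing and execution for a chosen forward algorithm.
cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
// (or take the algo field of the fastest entry from cudnnGetConvolutionForwardAlgorithm_v7())

size_t wsSize = 0;
cudnnGetConvolutionForwardWorkspaceSize(
    handle, xDesc, wDesc, convDesc, yDesc, algo, &wsSize);

void *ws = NULL;
if (wsSize > 0) cudaMalloc(&ws, wsSize);

const float alpha = 1.0f, beta = 0.0f;
cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                        algo, ws, wsSize, &beta, yDesc, y);
if (ws) cudaFree(ws);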

Parameters

handle

Input. Handle to a previously created cuDNN context.

xDesc

Input. Handle to the previously initialized x tensor descriptor.

wDesc

Input. Handle to a previously initialized filter descriptor.

convDesc

Input. Previously initialized convolution descriptor.

yDesc

Input. Handle to the previously initialized y tensor descriptor.

algo

Input. Enumerant that specifies the chosen convolution algorithm.

sizeInBytes

Output. Amount of GPU memory needed as workspace to be able to execute a forward convolution with the specified algo.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • One of the parameters handle, xDesc, wDesc, convDesc, or yDesc is NULL.

  • The tensor yDesc or wDesc are not of the same dimension as xDesc.

  • The tensor xDesc, yDesc, or wDesc are not of the same data type.

  • The numbers of feature maps of the tensor xDesc and wDesc differ.

  • The tensor xDesc has a dimension smaller than 3.

CUDNN_STATUS_NOT_SUPPORTED

The combination of the tensor descriptors, filter descriptor, and convolution descriptor is not supported for the specified algorithm.

cudnnGetConvolutionGroupCount()

This function has been deprecated in cuDNN 9.0.

This function returns the group count specified in the given convolution descriptor.

cudnnStatus_t cudnnGetConvolutionGroupCount(
    cudnnConvolutionDescriptor_t    convDesc,
    int                            *groupCount)

Returns

CUDNN_STATUS_SUCCESS

The group count was returned successfully.

CUDNN_STATUS_BAD_PARAM

An invalid convolution descriptor was provided.

cudnnGetConvolutionMathType()

This function has been deprecated in cuDNN 9.0.

This function returns the math type specified in a given convolution descriptor.

cudnnStatus_t cudnnGetConvolutionMathType(
    cudnnConvolutionDescriptor_t    convDesc,
    cudnnMathType_t                *mathType)

Returns

CUDNN_STATUS_SUCCESS

The math type was returned successfully.

CUDNN_STATUS_BAD_PARAM

An invalid convolution descriptor was provided.

cudnnGetConvolutionNdDescriptor()

This function has been deprecated in cuDNN 9.0.

This function queries a previously initialized convolution descriptor object.

cudnnStatus_t cudnnGetConvolutionNdDescriptor(
    const cudnnConvolutionDescriptor_t  convDesc,
    int                                 arrayLengthRequested,
    int                                *arrayLength,
    int                                 padA[],
    int                                 filterStrideA[],
    int                                 dilationA[],
    cudnnConvolutionMode_t             *mode,
    cudnnDataType_t                    *dataType)

Parameters

convDesc

Input/Output. Handle to a previously created convolution descriptor.

arrayLengthRequested

Input. Dimension of the expected convolution descriptor. It is also the minimum size of the arrays padA, filterStrideA, and dilationA in order to be able to hold the results.

arrayLength

Output. Actual dimension of the convolution descriptor.

padA

Output. Array of dimension of at least arrayLengthRequested that will be filled with the padding parameters from the provided convolution descriptor.

filterStrideA

Output. Array of dimension of at least arrayLengthRequested that will be filled with the filter stride from the provided convolution descriptor.

dilationA

Output. Array of dimension of at least arrayLengthRequested that will be filled with the dilation parameters from the provided convolution descriptor.

mode

Output. Convolution mode of the provided descriptor.

dataType

Output. Datatype of the provided descriptor.

Returns

CUDNN_STATUS_SUCCESS

The query was successful.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The descriptor convDesc is NIL.

  • The arrayLengthRequested is negative.

CUDNN_STATUS_NOT_SUPPORTED

The arrayLengthRequested is greater than CUDNN_DIM_MAX-2.

cudnnGetConvolutionNdForwardOutputDim()

This function has been deprecated in cuDNN 9.0.

This function returns the dimensions of the resulting Nd tensor of a (nbDims-2)-D convolution, given the convolution descriptor, the input tensor descriptor, and the filter descriptor. This function can help to set up the output tensor and allocate the proper amount of memory prior to launching the actual convolution.

cudnnStatus_t cudnnGetConvolutionNdForwardOutputDim(
    const cudnnConvolutionDescriptor_t  convDesc,
    const cudnnTensorDescriptor_t       inputTensorDesc,
    const cudnnFilterDescriptor_t       filterDesc,
    int                                 nbDims,
    int                                 tensorOuputDimA[])

Each dimension of the (nbDims-2)-D images of the output tensor is computed as follows:

outputDim = 1 + ( inputDim + 2*pad - (((filterDim-1)*dilation)+1) )/convolutionStride;

The dimensions provided by this routine must be strictly respected when calling cudnnConvolutionForward() or cudnnConvolutionBackwardBias(). Providing a smaller or larger output tensor is not supported by the convolution routines.

Parameters

convDesc

Input. Handle to a previously created convolution descriptor.

inputTensorDesc

Input. Handle to a previously initialized tensor descriptor.

filterDesc

Input. Handle to a previously initialized filter descriptor.

nbDims

Input. Dimension of the output tensor.

tensorOuputDimA

Output. Array of dimension nbDims that contains, on exit of this routine, the sizes of the output tensor.

Returns

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • One of the parameters convDesc, inputTensorDesc, and filterDesc is NIL.

  • The dimension of the filter descriptor filterDesc is different from the dimension of input tensor descriptor inputTensorDesc.

  • The dimension of the convolution descriptor is different from the dimension of the input tensor descriptor inputTensorDesc minus 2.

  • The number of feature maps of the filter descriptor filterDesc is different from that of the input tensor descriptor inputTensorDesc.

  • The size of the dilated filter filterDesc is larger than the padded sizes of the input tensor.

  • The dimension nbDims of the output array is negative or greater than the dimension of input tensor descriptor inputTensorDesc.

CUDNN_STATUS_SUCCESS

The routine exited successfully.

cudnnGetConvolutionReorderType()

This function has been deprecated in cuDNN 9.0.

This function retrieves the convolution reorder type from the given convolution descriptor.

cudnnStatus_t cudnnGetConvolutionReorderType(
  cudnnConvolutionDescriptor_t convDesc,
  cudnnReorderType_t *reorderType);

Parameters

convDesc

Input. The convolution descriptor from which the reorder type should be retrieved.

reorderType

Output. The retrieved reorder type. For more information, refer to cudnnReorderType_t.

Returns

CUDNN_STATUS_BAD_PARAM

One of the inputs to this function is not valid.

CUDNN_STATUS_SUCCESS

The reorder type is retrieved successfully.

cudnnGetFoldedConvBackwardDataDescriptors()

This function calculates folding descriptors for backward data gradients. It takes as input the data descriptors along with the convolution descriptor and computes the folded data descriptors and the folding transform descriptors. These can then be used to do the actual folding transform.

cudnnStatus_t
cudnnGetFoldedConvBackwardDataDescriptors(const cudnnHandle_t handle,
                                          const cudnnFilterDescriptor_t filterDesc,
                                          const cudnnTensorDescriptor_t diffDesc,
                                          const cudnnConvolutionDescriptor_t convDesc,
                                          const cudnnTensorDescriptor_t gradDesc,
                                          const cudnnTensorFormat_t transformFormat,
                                          cudnnFilterDescriptor_t foldedFilterDesc,
                                          cudnnTensorDescriptor_t paddedDiffDesc,
                                          cudnnConvolutionDescriptor_t foldedConvDesc,
                                          cudnnTensorDescriptor_t foldedGradDesc,
                                          cudnnTensorTransformDescriptor_t filterFoldTransDesc,
                                          cudnnTensorTransformDescriptor_t diffPadTransDesc,
                                          cudnnTensorTransformDescriptor_t gradFoldTransDesc,
                                          cudnnTensorTransformDescriptor_t gradUnfoldTransDesc) ;

Parameters

handle

Input. Handle to a previously created cuDNN context.

filterDesc

Input. Filter descriptor before folding.

diffDesc

Input. Diff descriptor before folding.

convDesc

Input. Convolution descriptor before folding.

gradDesc

Input. Gradient descriptor before folding.

transformFormat

Input. Transform format for folding.

foldedFilterDesc

Output. Folded filter descriptor.

paddedDiffDesc

Output. Padded Diff descriptor.

foldedConvDesc

Output. Folded convolution descriptor.

foldedGradDesc

Output. Folded gradient descriptor.

filterFoldTransDesc

Output. Folding transform descriptor for filter.

diffPadTransDesc

Output. Folding transform descriptor for the diff tensor.

gradFoldTransDesc

Output. Folding transform descriptor for gradient.

gradUnfoldTransDesc

Output. Unfolding transform descriptor for folded gradient.

Returns

CUDNN_STATUS_SUCCESS

Folded descriptors were computed successfully.

CUDNN_STATUS_BAD_PARAM

If any of the input parameters is NULL or if the input tensor has more than 4 dimensions.

CUDNN_STATUS_EXECUTION_FAILED

Computing the folded descriptors failed.

cudnnGetFusedOpsConstParamPackAttribute()

This function retrieves the values of the descriptor pointed to by the param pointer input. The type of the descriptor is indicated by the enum value of paramLabel input.

cudnnStatus_t cudnnGetFusedOpsConstParamPackAttribute(
  const cudnnFusedOpsConstParamPack_t constPack,
  cudnnFusedOpsConstParamLabel_t paramLabel,
  void *param,
  int *isNULL);

Parameters

constPack

Input. The opaque cudnnFusedOpsConstParamPack_t structure that contains the various problem size information, such as the shape, layout, and the type of tensors, and the descriptors for convolution and activation, for the selected sequence of cudnnFusedOps_t computations.

paramLabel

Input. Several types of descriptors can be retrieved by this getter function. The param input points to the descriptor itself, and this input indicates the type of the descriptor pointed to by the param input. The cudnnFusedOpsConstParamLabel_t enumerant type enables the selection of the type of the descriptor. Refer to the param description below.

param

Input. Data pointer to the host memory associated with the descriptor that should be retrieved. The type of this descriptor depends on the value of paramLabel. For the given paramLabel, if the associated value inside the constPack is set to NULL or by default NULL, then cuDNN will copy the value or the opaque structure in the constPack to the host memory buffer pointed to by param. For more information, refer to the table in cudnnFusedOpsConstParamLabel_t.

isNULL

Input/Output. Users must pass a pointer to an integer in the host memory in this field. If the value in the constPack associated with the given paramLabel is by default NULL or previously set by the user to NULL, then cuDNN will write a non-zero value to the location pointed by isNULL.
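
For illustration only, a sketch of reading one attribute back and testing isNULL; CUDNN_PARAM_XDESC is one label of cudnnFusedOpsConstParamLabel_t, and, following the setter convention, the tensor descriptor handle itself is passed as param. This usage is an assumption, not taken from this section, and error checking is omitted:

// Retrieve the input tensor descriptor stored in a const pack, if any.
cudnnTensorDescriptor_t xDesc;
cudnnCreateTensorDescriptor(&xDesc);     // destination the stored descriptor is copied into

int isNull = 0;
cudnnGetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_XDESC,
                                        xDesc, &isNull);
if (isNull) {
    // No xDesc has been set on this const pack; xDesc holds no meaningful value.
}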

Returns

CUDNN_STATUS_SUCCESS

The descriptor values are retrieved successfully.

CUDNN_STATUS_BAD_PARAM

If either constPack, param, or isNULL is NULL; or if paramLabel is invalid.

cudnnGetFusedOpsVariantParamPackAttribute()

This function retrieves the settings of the variable parameter pack descriptor.

cudnnStatus_t cudnnGetFusedOpsVariantParamPackAttribute(
  const cudnnFusedOpsVariantParamPack_t varPack,
  cudnnFusedOpsVariantParamLabel_t paramLabel,
  void *ptr);

Parameters

varPack

Input. Pointer to the cudnnFusedOps variant parameter pack (varPack) descriptor.

paramLabel

Input. Type of the buffer pointer parameter (in the varPack descriptor). For more information, refer to cudnnFusedOpsVariantParamLabel_t. The retrieved descriptor values vary according to this type.

ptr

Output. Pointer to the host or device memory where the retrieved value is written by this function. The data type of the pointer, and the host/device memory location, depend on the paramLabel input selection. For more information, refer to cudnnFusedOpsVariantParamLabel_t.

Returns

CUDNN_STATUS_SUCCESS

The descriptor values are retrieved successfully.

CUDNN_STATUS_BAD_PARAM

If either varPack or ptr is NULL, or if paramLabel is set to an invalid value.

cudnnIm2Col()

This function has been deprecated in cuDNN 9.0.

This function constructs the A matrix necessary to perform a forward pass of GEMM convolution.

cudnnStatus_t cudnnIm2Col(
    cudnnHandle_t                   handle,
    cudnnTensorDescriptor_t         srcDesc,
    const void                      *srcData,
    cudnnFilterDescriptor_t         filterDesc,
    cudnnConvolutionDescriptor_t    convDesc,
    void                            *colBuffer)

This A matrix has a height of batch_size*y_height*y_width and width of input_channels*filter_height*filter_width, where:

  • batch_size is the first dimension of srcDesc

  • y_height/y_width are computed from cudnnGetConvolutionNdForwardOutputDim()

  • input_channels is the second dimension of srcDesc (when in NCHW layout)

  • filter_height/filter_width are the third and fourth dimensions of filterDesc

The A matrix is stored in format HW fully-packed in GPU memory.
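
A hedged sketch of sizing and filling the column buffer, assuming float data, that handle and the three descriptors already exist, and that inChannels, filterH, and filterW are illustrative variables holding the corresponding descriptor dimensions:

// Build the im2col "A" matrix for a GEMM-based forward convolution.
int n, c, h, w;                 // output shape of the convolution
cudnnGetConvolution2dForwardOutputDim(convDesc, srcDesc, filterDesc,
                                      &n, &c, &h, &w);

// rows = n * h * w, cols = inChannels * filterH * filterW (see the matrix
// shape described above); float data assumed.
size_t colBytes = (size_t)n * h * w *
                  (size_t)inChannels * filterH * filterW * sizeof(float);

void *colBuffer = NULL;
cudaMalloc(&colBuffer, colBytes);
cudnnIm2Col(handle, srcDesc, srcData, filterDesc, convDesc, colBuffer);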

Parameters

handle

Input. Handle to a previously created cuDNN context.

srcDesc

Input. Handle to a previously initialized tensor descriptor.

srcData

Input. Data pointer to GPU memory associated with the input tensor descriptor.

filterDesc

Input. Handle to a previously initialized filter descriptor.

convDesc

Input. Handle to a previously initialized convolution descriptor.

colBuffer

Output. Data pointer to GPU memory storing the output matrix.

Returns

CUDNN_STATUS_BAD_PARAM

srcData or colBuffer is NULL.

CUDNN_STATUS_NOT_SUPPORTED

Any of srcDesc, filterDesc, or convDesc has a dataType of CUDNN_DATA_INT8 or CUDNN_DATA_INT8x4, or convDesc has a groupCount larger than 1.

CUDNN_STATUS_EXECUTION_FAILED

The CUDA kernel execution was unsuccessful.

CUDNN_STATUS_SUCCESS

The output data array is successfully generated.

cudnnMakeFusedOpsPlan()

This function has been deprecated in cuDNN 9.0.

This function determines the optimum kernel to execute, and the workspace size the user should allocate, prior to the actual execution of the fused operations by cudnnFusedOpsExecute().

cudnnStatus_t cudnnMakeFusedOpsPlan(
  cudnnHandle_t handle,
  cudnnFusedOpsPlan_t plan,
  const cudnnFusedOpsConstParamPack_t constPack,
  size_t *workspaceSizeInBytes);

Parameters

handle

Input. Pointer to the cuDNN library context.

plan

Input. Pointer to a previously-created and initialized plan descriptor.

constPack

Input. Pointer to the descriptor to the const parameters pack.

workspaceSizeInBytes

Output. The amount of workspace size the user should allocate for the execution of this plan.

Returns

CUDNN_STATUS_BAD_PARAM

If any of the inputs is NULL, or if the type of cudnnFusedOps_t in the constPack descriptor is unsupported.

CUDNN_STATUS_SUCCESS

The function executed successfully.

cudnnReorderFilterAndBias()

This function has been deprecated in cuDNN 9.0.

This function reorders the filter and bias values for tensors with data type CUDNN_DATA_INT8x32 and tensor format CUDNN_TENSOR_NCHW_VECT_C. It can be used to enhance the inference time by separating the reordering operation from convolution.

cudnnStatus_t cudnnReorderFilterAndBias(
  cudnnHandle_t handle,
  const cudnnFilterDescriptor_t filterDesc,
  cudnnReorderType_t reorderType,
  const void *filterData,
  void *reorderedFilterData,
  int reorderBias,
  const void *biasData,
  void *reorderedBiasData);

Filter and bias tensors with data type CUDNN_DATA_INT8x32 (also implying tensor format CUDNN_TENSOR_NCHW_VECT_C) requires permutation of output channel axes in order to take advantage of the Tensor Core IMMA instruction. This is done in every cudnnConvolutionForward() and cudnnConvolutionBiasActivationForward() call when the reorder type attribute of the convolution descriptor is set to CUDNN_DEFAULT_REORDER. Users can avoid the repeated reordering kernel call by first using this call to reorder the filter and bias tensor and call the convolution forward APIs with reorder type set to CUDNN_NO_REORDER.

For example, convolutions in a neural network of multiple layers can require reordering of kernels at every layer, which can take up a significant fraction of the total inference time. Using this function, the reordering can be done once on the filter and bias data, followed by the convolution operations at the multiple layers, thereby improving the inference time.
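
A hedged sketch of that one-time reordering, assuming the INT8x32 filter/bias descriptors and device buffers already exist (error checking omitted):

// Reorder the filter and bias once, up front.
cudnnReorderFilterAndBias(handle, filterDesc, CUDNN_DEFAULT_REORDER,
                          filterData, reorderedFilterData,
                          1 /* reorderBias */, biasData, reorderedBiasData);

// Tell subsequent convolution calls that the data is already reordered.
cudnnSetConvolutionReorderType(convDesc, CUDNN_NO_REORDER);

// Repeated cudnnConvolutionForward() / cudnnConvolutionBiasActivationForward()
// calls can now be made with reorderedFilterData and reorderedBiasData without
// paying the per-call reordering cost.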

Parameters

handle

Input. Handle to a previously created cuDNN context.

filterDesc

Input. Descriptor for the kernel dataset.

reorderType

Input. Setting to either perform reordering or not. For more information, refer to cudnnReorderType_t.

filterData

Input. Pointer to the filter (kernel) data location in the device memory.

reorderedFilterData

Output. Pointer to the location in the device memory where the reordered filter data will be written to, by this function. This tensor has the same dimensions as filterData.

reorderBias

Input. If greater than 0, the biasData is also reordered. If less than or equal to 0, no reordering operation is performed on the biasData.

biasData

Input. Pointer to the bias data location in the device memory.

reorderedBiasData

Output. Pointer to the location in the device memory where the reordered biasData will be written to, by this function. This tensor has the same dimensions as biasData.

Returns

CUDNN_STATUS_SUCCESS

Reordering was successful.

CUDNN_STATUS_EXECUTION_FAILED

Either the reordering of the filter data or of the biasData failed.

CUDNN_STATUS_BAD_PARAM

The handle, filter descriptor, filter data, or reordered data is NULL. Or, if the bias reordering is requested (reorderBias > 0), the biasData or reordered biasData is NULL. This status can also be returned if the filter dimension size is not 4.

CUDNN_STATUS_NOT_SUPPORTED

The filter descriptor data type is not CUDNN_DATA_INT8x32, or the filter descriptor tensor is not in a vectorized layout (CUDNN_TENSOR_NCHW_VECT_C).

cudnnSetConvolution2dDescriptor()

This function has been deprecated in cuDNN 9.0.

This function initializes a previously created convolution descriptor object into a 2D correlation. This function assumes that the tensor and filter descriptors correspond to the forward convolution path and checks if their settings are valid. That same convolution descriptor can be reused in the backward path provided it corresponds to the same layer.

cudnnStatus_t cudnnSetConvolution2dDescriptor(
    cudnnConvolutionDescriptor_t    convDesc,
    int                             pad_h,
    int                             pad_w,
    int                             u,
    int                             v,
    int                             dilation_h,
    int                             dilation_w,
    cudnnConvolutionMode_t          mode,
    cudnnDataType_t                 computeType)
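
For example, a minimal sketch of a 3x3 cross-correlation with stride 1, symmetric padding 1, no dilation, and float accumulation (illustrative only, not taken from the cuDNN samples):

cudnnConvolutionDescriptor_t convDesc;
cudnnCreateConvolutionDescriptor(&convDesc);
cudnnSetConvolution2dDescriptor(convDesc,
                                /*pad_h=*/1, /*pad_w=*/1,
                                /*u=*/1, /*v=*/1,
                                /*dilation_h=*/1, /*dilation_w=*/1,
                                CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
// ... use convDesc, then release it with cudnnDestroyConvolutionDescriptor(convDesc) ...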

Parameters

convDesc

Input/Output. Handle to a previously created convolution descriptor.

pad_h

Input. Zero-padding height: number of rows of zeros implicitly concatenated onto the top and onto the bottom of input images.

pad_w

Input. Zero-padding width: number of columns of zeros implicitly concatenated onto the left and onto the right of input images.

u

Input. Vertical filter stride.

v

Input. Horizontal filter stride.

dilation_h

Input. Filter height dilation.

dilation_w

Input. Filter width dilation.

mode

Input. Selects between CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION.

computeType

Input. Compute precision.

Returns

CUDNN_STATUS_SUCCESS

The object was set successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The descriptor convDesc is NIL.

  • One of the parameters pad_h, pad_w is strictly negative.

  • One of the parameters u, v is negative or zero.

  • One of the parameters dilation_h, dilation_w is negative or zero.

  • The parameter mode has an invalid enumerant value.

cudnnSetConvolutionGroupCount()

This function has been deprecated in cuDNN 9.0.

This function allows the user to specify the number of groups to be used in the associated convolution.

cudnnStatus_t cudnnSetConvolutionGroupCount(
    cudnnConvolutionDescriptor_t    convDesc,
    int                             groupCount)
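
For instance, a depthwise convolution over 32 input channels can be sketched as follows (illustrative only):

// One group per input channel turns the convolution into a depthwise convolution.
cudnnSetConvolutionGroupCount(convDesc, 32);
// The filter descriptor's input-channel dimension must then be C/groupCount,
// that is, 1 for the depthwise case.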

Returns

CUDNN_STATUS_SUCCESS

The group count was set successfully.

CUDNN_STATUS_BAD_PARAM

An invalid convolution descriptor was provided.

cudnnSetConvolutionMathType()

This function has been deprecated in cuDNN 9.0.

This function allows the user to specify whether or not the use of tensor op is permitted in the library routines associated with a given convolution descriptor.

cudnnStatus_t cudnnSetConvolutionMathType(
    cudnnConvolutionDescriptor_t    convDesc,
    cudnnMathType_t                 mathType)
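
A minimal sketch of opting a convolution into Tensor Core kernels where they are available (illustrative only):

cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);
// The current setting can be read back with cudnnGetConvolutionMathType().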

Returns

CUDNN_STATUS_SUCCESS

The math type was set successfully.

CUDNN_STATUS_BAD_PARAM

Either an invalid convolution descriptor was provided or an invalid math type was specified.

cudnnSetConvolutionNdDescriptor()

This function has been deprecated in cuDNN 9.0.

This function initializes a previously created generic convolution descriptor object into an Nd correlation. That same convolution descriptor can be reused in the backward path provided it corresponds to the same layer. The convolution computation will be done in the specified dataType, which can be potentially different from the input/output tensors.

cudnnStatus_t cudnnSetConvolutionNdDescriptor(
    cudnnConvolutionDescriptor_t    convDesc,
    int                             arrayLength,
    const int                       padA[],
    const int                       filterStrideA[],
    const int                       dilationA[],
    cudnnConvolutionMode_t          mode,
    cudnnDataType_t                 dataType)

Parameters

convDesc

Input/Output. Handle to a previously created convolution descriptor.

arrayLength

Input. Dimension of the convolution.

padA

Input. Array of dimension arrayLength containing the zero-padding size for each dimension. For every dimension, the padding represents the number of extra zeros implicitly concatenated at the start and at the end of every element of that dimension.

filterStrideA

Input. Array of dimension arrayLength containing the filter stride for each dimension. For every dimension, the filter stride represents the number of elements to slide to reach the next start of the filtering window of the next point.

dilationA

Input. Array of dimension arrayLength containing the dilation factor for each dimension.

mode

Input. Selects between CUDNN_CONVOLUTION and CUDNN_CROSS_CORRELATION.

datatype

Input. Selects the data type in which the computation will be done.

Note

CUDNN_DATA_HALF in cudnnSetConvolutionNdDescriptor() with HALF_CONVOLUTION_BWD_FILTER is not recommended as it is known to not be useful for any practical use case for training and will be considered to be blocked in a future cuDNN release. The use of CUDNN_DATA_HALF for input tensors in cudnnSetTensorNdDescriptor() and CUDNN_DATA_FLOAT in cudnnSetConvolutionNdDescriptor() with HALF_CONVOLUTION_BWD_FILTER is recommended and is used with the automatic mixed precision (AMP) training in many well known deep learning frameworks.
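
As a hedged illustration, a 3-D cross-correlation (5-D NCDHW tensors) with padding 1, stride 1, and no dilation could be set up as follows, and its output shape then queried with cudnnGetConvolutionNdForwardOutputDim(); inputTensorDesc and filterDesc are assumed to be 5-D descriptors created elsewhere:

int padA[3]      = {1, 1, 1};
int strideA[3]   = {1, 1, 1};
int dilationA[3] = {1, 1, 1};
cudnnSetConvolutionNdDescriptor(convDesc, 3, padA, strideA, dilationA,
                                CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

// Query the resulting 5-D output shape for the given input/filter pair.
int outDimA[5];
cudnnGetConvolutionNdForwardOutputDim(convDesc, inputTensorDesc, filterDesc,
                                      5, outDimA);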

Returns

CUDNN_STATUS_SUCCESS

The object was set successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The descriptor convDesc is NIL.

  • The arrayLength is negative.

  • The enumerant mode has an invalid value.

  • The enumerant datatype has an invalid value.

  • One of the elements of padA is strictly negative.

  • One of the elements of filterStrideA is negative or zero.

  • One of the elements of dilationA is negative or zero.

CUDNN_STATUS_NOT_SUPPORTED

At least one of the following conditions are met:

  • The arrayLength is greater than CUDNN_DIM_MAX.

cudnnSetConvolutionReorderType()

This function has been deprecated in cuDNN 9.0.

This function sets the convolution reorder type for the given convolution descriptor.

cudnnStatus_t cudnnSetConvolutionReorderType(
  cudnnConvolutionDescriptor_t convDesc,
  cudnnReorderType_t reorderType);

Parameters

convDesc

Input. The convolution descriptor for which the reorder type should be set.

reorderType

Input. Set the reorder type to this value. For more information, refer to cudnnReorderType_t.

Returns

CUDNN_STATUS_BAD_PARAM

The reorder type supplied is not supported.

CUDNN_STATUS_SUCCESS

Reorder type is set successfully.

cudnnSetFusedOpsConstParamPackAttribute()

This function has been deprecated in cuDNN 9.0.

This function sets the descriptor pointed to by the param pointer input. The type of the descriptor to be set is indicated by the enum value of the paramLabel input.

cudnnStatus_t cudnnSetFusedOpsConstParamPackAttribute(
  cudnnFusedOpsConstParamPack_t constPack,
  cudnnFusedOpsConstParamLabel_t paramLabel,
  const void *param);

Parameters

constPack

Input. The opaque cudnnFusedOpsConstParamPack_t structure that contains the various problem size information, such as the shape, layout and the type of tensors, the descriptors for convolution and activation, and settings for operations such as convolution and activation.

paramLabel

Input. Several types of descriptors can be set by this setter function. The param input points to the descriptor itself, and this input indicates the type of the descriptor pointed to by the param input. The cudnnFusedOpsConstParamLabel_t enumerant type enables the selection of the type of the descriptor.

param

Input. Data pointer to the host memory, associated with the specific descriptor. The type of the descriptor depends on the value of paramLabel. For more information, refer to the table in cudnnFusedOpsConstParamLabel_t.

If this pointer is set to NULL, then the cuDNN library will record as such. If not, then the values pointed to by this pointer (meaning, the value or the opaque structure underneath) will be copied into the constPack during cudnnSetFusedOpsConstParamPackAttribute() operation.

Returns

CUDNN_STATUS_SUCCESS

The descriptor is set successfully.

CUDNN_STATUS_BAD_PARAM

If constPack is NULL, or if paramLabel or the ops setting for constPack is invalid.

cudnnSetFusedOpsVariantParamPackAttribute()

This function has been deprecated in cuDNN 9.0.

This function sets the variable parameter pack descriptor.

cudnnStatus_t cudnnSetFusedOpsVariantParamPackAttribute(
  cudnnFusedOpsVariantParamPack_t varPack,
  cudnnFusedOpsVariantParamLabel_t paramLabel,
  void *ptr);

Parameters

varPack

Input. Pointer to the cudnnFusedOps variant parameter pack (varPack) descriptor.

paramLabel

Input. Type to which the buffer pointer parameter (in the varPack descriptor) is set by this function. For more information, refer to cudnnFusedOpsVariantParamLabel_t.

ptr

Input. Pointer to the host or device memory, to the value to which the descriptor parameter is set. The data type of the pointer, and the host/device memory location, depend on the paramLabel input selection. For more information, refer to cudnnFusedOpsVariantParamLabel_t.

Returns

CUDNN_STATUS_BAD_PARAM

If varPack is NULL or if paramLabel is set to an unsupported value.

CUDNN_STATUS_SUCCESS

The descriptor is set successfully.
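
Taken together, the fused-ops entry points documented in this section are typically used in the order sketched below. The create/destroy calls, the cudnnFusedOps_t value, and the specific label constants shown are assumptions drawn from elsewhere in the cudnn_cnn API rather than from this section, and a real setup must populate many more attributes; error checking is omitted:

// Rough fused-ops call sequence.
cudnnFusedOps_t ops = CUDNN_FUSED_SCALE_BIAS_ACTIVATION_CONV_BNSTATS;  // example op

cudnnFusedOpsConstParamPack_t constPack;
cudnnFusedOpsVariantParamPack_t varPack;
cudnnFusedOpsPlan_t plan;
cudnnCreateFusedOpsConstParamPack(&constPack, ops);
cudnnCreateFusedOpsVariantParamPack(&varPack, ops);
cudnnCreateFusedOpsPlan(&plan, ops);

// Describe the problem (shapes, layouts, descriptors) on the const pack,
// for example the input tensor descriptor; further CUDNN_PARAM_* attributes
// are required in practice.
cudnnSetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_XDESC, xDesc);

// Finalize the plan and learn how much workspace to allocate.
size_t wsSize = 0;
cudnnMakeFusedOpsPlan(handle, plan, constPack, &wsSize);

void *ws = NULL;
if (wsSize > 0) cudaMalloc(&ws, wsSize);

// Provide the per-call pointers (data buffers, workspace) on the variant pack,
// then execute.
cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_WORKSPACE, ws);
cudnnSetFusedOpsVariantParamPackAttribute(varPack,
    CUDNN_SCALAR_SIZE_T_WORKSPACE_SIZE_IN_BYTES, &wsSize);
cudnnFusedOpsExecute(handle, plan, varPack);

cudnnDestroyFusedOpsPlan(plan);
cudnnDestroyFusedOpsVariantParamPack(varPack);
cudnnDestroyFusedOpsConstParamPack(constPack);
if (ws) cudaFree(ws);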