6.24. Execution Control [DEPRECATED]

This section describes the deprecated execution control functions of the low-level CUDA driver application programming interface.

Functions

CUresult cuFuncSetBlockShape ( CUfunction hfunc, int  x, int  y, int  z )
Sets the block-dimensions for the function.
CUresult cuFuncSetSharedMemConfig ( CUfunction hfunc, CUsharedconfig config )
Sets the shared memory configuration for a device function.
CUresult cuFuncSetSharedSize ( CUfunction hfunc, unsigned int  bytes )
Sets the dynamic shared-memory size for the function.
CUresult cuLaunch ( CUfunction f )
Launches a CUDA function.
CUresult cuLaunchCooperativeKernelMultiDevice ( CUDA_LAUNCH_PARAMS* launchParamsList, unsigned int  numDevices, unsigned int  flags )
Launches CUDA functions on multiple devices where thread blocks can cooperate and synchronize as they execute.
CUresult cuLaunchGrid ( CUfunction f, int  grid_width, int  grid_height )
Launches a CUDA function.
CUresult cuLaunchGridAsync ( CUfunction f, int  grid_width, int  grid_height, CUstream hStream )
Launches a CUDA function.
CUresult cuParamSetSize ( CUfunction hfunc, unsigned int  numbytes )
Sets the parameter size for the function.
CUresult cuParamSetTexRef ( CUfunction hfunc, int  texunit, CUtexref hTexRef )
Adds a texture-reference to the function's argument list.
CUresult cuParamSetf ( CUfunction hfunc, int  offset, float  value )
Adds a floating-point parameter to the function's argument list.
CUresult cuParamSeti ( CUfunction hfunc, int  offset, unsigned int  value )
Adds an integer parameter to the function's argument list.
CUresult cuParamSetv ( CUfunction hfunc, int  offset, void* ptr, unsigned int  numbytes )
Adds arbitrary data to the function's argument list.

Functions

CUresult cuFuncSetBlockShape ( CUfunction hfunc, int  x, int  y, int  z )
Sets the block-dimensions for the function.
Parameters
hfunc
- Kernel to specify dimensions of
x
- X dimension
y
- Y dimension
z
- Z dimension
Deprecated

Description
Specifies the x, y, and z dimensions of the thread blocks that are created when the kernel given by hfunc is launched.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetSharedSize, cuFuncSetCacheConfig, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel

CUresult cuFuncSetSharedMemConfig ( CUfunction hfunc, CUsharedconfig config )
Sets the shared memory configuration for a device function.
Parameters
hfunc
- Kernel to be given a shared memory configuration
config
- Requested shared memory configuration
Deprecated

Description
On devices with configurable shared memory banks, this function will force all subsequent launches of the specified device function to have the given shared memory bank size configuration. On any given launch of the function, the shared memory configuration of the device will be temporarily changed if needed to suit the function's preferred configuration. Changes in shared memory configuration between subsequent launches of functions may introduce a device-side synchronization point.

Any per-function setting of shared memory bank size set via cuFuncSetSharedMemConfig will override the context wide setting set with cuCtxSetSharedMemConfig.

Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.
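The effect of bank width on conflicts can be illustrated with a simplified host-side model. The sketch below assumes 32 banks and that consecutive bank-width words map to consecutive banks; the bank count, the warp size constant, and the helper name max_conflict_degree are assumptions made for illustration and are not part of the driver API. Real hardware may combine or broadcast some accesses, so the result is only an approximation.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 32u   /* assumption: 32 shared memory banks */
#define WARP_SIZE 32u   /* assumption: 32 threads per warp */

/*
 * Simplified model: thread t of a warp reads elem_size bytes starting at
 * byte offset t * stride_bytes. Returns the largest number of distinct
 * bank-width words that fall into a single bank (1 means conflict-free
 * in this model). Threads that touch the same word are counted once.
 */
unsigned max_conflict_degree(unsigned elem_size, size_t stride_bytes,
                             unsigned bank_width)
{
    size_t seen[256];                    /* distinct words touched so far */
    unsigned n_seen = 0;
    unsigned per_bank[NUM_BANKS] = {0};  /* distinct words per bank */
    unsigned worst = 0;

    assert(elem_size > 0 && bank_width > 0);
    assert(elem_size <= 4 * bank_width); /* keeps seen[] large enough */

    for (unsigned t = 0; t < WARP_SIZE; ++t) {
        size_t first = (t * stride_bytes) / bank_width;
        size_t last  = (t * stride_bytes + elem_size - 1) / bank_width;
        for (size_t w = first; w <= last; ++w) {
            int dup = 0;
            for (unsigned k = 0; k < n_seen; ++k) {
                if (seen[k] == w) { dup = 1; break; }
            }
            if (dup)
                continue;                /* same word: no extra conflict */
            seen[n_seen++] = w;
            unsigned b = (unsigned)(w % NUM_BANKS);
            if (++per_bank[b] > worst)
                worst = per_bank[b];
        }
    }
    return worst;
}
```

In this model, consecutive 8-byte elements (elem_size 8, stride 8) produce a degree of 2 with 4-byte banks and a degree of 1 with 8-byte banks, which is the kind of difference the paragraph above refers to.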

This function will do nothing on devices with fixed shared memory bank size.

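The supported bank configurations are:

  • CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE: use the context's shared memory configuration when launching this function.

  • CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE: set the shared memory bank width to four bytes when launching this function.

  • CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE: set the shared memory bank width to eight bytes when launching this function.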

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuCtxGetCacheConfig, cuCtxSetCacheConfig, cuCtxGetSharedMemConfig, cuCtxSetSharedMemConfig, cuFuncGetAttribute, cuLaunchKernel, cudaFuncSetSharedMemConfig

CUresult cuFuncSetSharedSize ( CUfunction hfunc, unsigned int  bytes )
Sets the dynamic shared-memory size for the function.
Parameters
hfunc
- Kernel to specify dynamic shared-memory size for
bytes
- Dynamic shared-memory size per thread block in bytes
Deprecated

Description
Sets through bytes the amount of dynamic shared memory that will be available to each thread block when the kernel given by hfunc is launched.
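Because the size applies to a whole thread block rather than to a single thread, the value is usually derived from the block shape. The following is a minimal sketch of that arithmetic, assuming a kernel that keeps a fixed number of elements in dynamic shared memory for each thread; the helper name shared_bytes_per_block and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of dynamic shared memory for one thread block of shape
 * x * y * z, when each thread needs elems_per_thread elements of
 * elem_size bytes. Hypothetical helper, not part of the driver API.
 */
unsigned int shared_bytes_per_block(int x, int y, int z,
                                    size_t elems_per_thread,
                                    size_t elem_size)
{
    assert(x > 0 && y > 0 && z > 0);
    size_t threads = (size_t)x * (size_t)y * (size_t)z;
    return (unsigned int)(threads * elems_per_thread * elem_size);
}
```

The result would be the value passed as bytes, using the same x, y, and z that are given to cuFuncSetBlockShape().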

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetCacheConfig, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel

CUresult cuLaunch ( CUfunction f )
Launches a CUDA function.
Parameters
f
- Kernel to launch
Deprecated

Description
Invokes the kernel f on a 1 x 1 x 1 grid of blocks. The block contains the number of threads specified by a previous call to cuFuncSetBlockShape().

The block shape, dynamic shared memory size, and parameter information must be set using cuFuncSetBlockShape(), cuFuncSetSharedSize(), cuParamSetSize(), cuParamSeti(), cuParamSetf(), and cuParamSetv() prior to calling this function.

Launching a function via cuLaunchKernel() invalidates the function's block shape, dynamic shared memory size, and parameter information. After launching via cuLaunchKernel, this state must be re-initialized prior to calling this function. Failure to do so results in undefined behavior.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel

CUresult cuLaunchCooperativeKernelMultiDevice ( CUDA_LAUNCH_PARAMS* launchParamsList, unsigned int  numDevices, unsigned int  flags )
Launches CUDA functions on multiple devices where thread blocks can cooperate and synchronize as they execute.
Parameters
launchParamsList
- List of launch parameters, one per device
numDevices
- Size of the launchParamsList array
flags
- Flags to control launch behavior
Deprecated

This function is deprecated as of CUDA 11.3.

Description

Invokes kernels as specified in the launchParamsList array where each element of the array specifies all the parameters required to perform a single kernel launch. These kernels can cooperate and synchronize as they execute. The size of the array is specified by numDevices.

No two kernels can be launched on the same device. All the devices targeted by this multi-device launch must be identical. All devices must have a non-zero value for the device attribute CU_DEVICE_ATTRIBUTE_COOPERATIVE_MULTI_DEVICE_LAUNCH.

All kernels launched must be identical with respect to the compiled code. Note that any __device__, __constant__ or __managed__ variables present in the module that owns the kernel launched on each device are independently instantiated on every device. It is the application's responsibility to ensure these variables are initialized and used appropriately.

The size of the grids as specified in blocks, the size of the blocks themselves and the amount of shared memory used by each thread block must also match across all launched kernels.
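An application may want to check this requirement before launching. The sketch below shows one way to do so on the host. LaunchShape is a hypothetical stand-in that carries only the size fields of a launch description, so that the example compiles without the CUDA headers; launch_shapes_match is likewise an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the size fields of a launch. */
typedef struct {
    unsigned int gridDimX, gridDimY, gridDimZ;
    unsigned int blockDimX, blockDimY, blockDimZ;
    unsigned int sharedMemBytes;
} LaunchShape;

/*
 * Returns true when every entry uses the same grid size, block size,
 * and shared memory size as the first entry.
 */
bool launch_shapes_match(const LaunchShape *list, size_t numDevices)
{
    assert(list != NULL || numDevices == 0);
    for (size_t i = 1; i < numDevices; ++i) {
        const LaunchShape *a = &list[0];
        const LaunchShape *b = &list[i];
        if (a->gridDimX != b->gridDimX || a->gridDimY != b->gridDimY ||
            a->gridDimZ != b->gridDimZ || a->blockDimX != b->blockDimX ||
            a->blockDimY != b->blockDimY || a->blockDimZ != b->blockDimZ ||
            a->sharedMemBytes != b->sharedMemBytes) {
            return false;
        }
    }
    return true;
}
```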

The streams used to launch these kernels must have been created via either cuStreamCreate or cuStreamCreateWithPriority. The NULL stream or CU_STREAM_LEGACY or CU_STREAM_PER_THREAD cannot be used.

The total number of blocks launched per kernel cannot exceed the maximum number of blocks per multiprocessor as returned by cuOccupancyMaxActiveBlocksPerMultiprocessor (or cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of multiprocessors as specified by the device attribute CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT. Since the total number of blocks launched per device has to match across all devices, the maximum number of blocks that can be launched per device will be limited by the device with the fewest multiprocessors.
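The limit described above amounts to taking a minimum over the participating devices. The following sketch shows that arithmetic only. It assumes the caller has already obtained, for each device, the blocks-per-multiprocessor value from the occupancy query and the multiprocessor count from the device attribute; the function name max_cooperative_blocks and the array parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * blocks_per_sm[i]: value reported by the occupancy query on device i.
 * sm_count[i]:      multiprocessor count of device i.
 * Returns the largest per-device block count that fits on every device.
 */
unsigned long max_cooperative_blocks(const int *blocks_per_sm,
                                     const int *sm_count,
                                     size_t numDevices)
{
    assert(numDevices > 0 && blocks_per_sm != NULL && sm_count != NULL);
    unsigned long limit =
        (unsigned long)blocks_per_sm[0] * (unsigned long)sm_count[0];
    for (size_t i = 1; i < numDevices; ++i) {
        unsigned long cap =
            (unsigned long)blocks_per_sm[i] * (unsigned long)sm_count[i];
        if (cap < limit)
            limit = cap;   /* the smallest device sets the limit */
    }
    return limit;
}
```

The product gridDimX * gridDimY * gridDimZ chosen for every launch would then need to stay at or below the returned value.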

The kernels cannot make use of CUDA dynamic parallelism.

The CUDA_LAUNCH_PARAMS structure is defined as:

    typedef struct CUDA_LAUNCH_PARAMS_st
    {
        CUfunction function;
        unsigned int gridDimX;
        unsigned int gridDimY;
        unsigned int gridDimZ;
        unsigned int blockDimX;
        unsigned int blockDimY;
        unsigned int blockDimZ;
        unsigned int sharedMemBytes;
        CUstream hStream;
        void **kernelParams;
    } CUDA_LAUNCH_PARAMS;
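where:

  • CUDA_LAUNCH_PARAMS::function specifies the kernel to be launched. All kernels must be identical with respect to the compiled code.

  • CUDA_LAUNCH_PARAMS::gridDimX, CUDA_LAUNCH_PARAMS::gridDimY, and CUDA_LAUNCH_PARAMS::gridDimZ are the width, height, and depth of the grid in blocks. These must match across all kernels launched.

  • CUDA_LAUNCH_PARAMS::blockDimX, CUDA_LAUNCH_PARAMS::blockDimY, and CUDA_LAUNCH_PARAMS::blockDimZ are the X, Y, and Z dimensions of each thread block. These must match across all kernels launched.

  • CUDA_LAUNCH_PARAMS::sharedMemBytes is the dynamic shared-memory size per thread block in bytes. This must match across all kernels launched.

  • CUDA_LAUNCH_PARAMS::hStream is the handle to the stream to perform the launch in. This cannot be the NULL stream, CU_STREAM_LEGACY, or CU_STREAM_PER_THREAD.

  • CUDA_LAUNCH_PARAMS::kernelParams is an array of pointers to kernel parameters. If the kernel has N parameters, kernelParams must be an array of N pointers, each pointing to a region of memory from which the actual kernel parameter will be copied.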

By default, the kernel won't begin execution on any GPU until all prior work in all the specified streams has completed. This behavior can be overridden by specifying the flag CUDA_COOPERATIVE_LAUNCH_MULTI_DEVICE_NO_PRE_LAUNCH_SYNC. When this flag is specified, each kernel will only wait for prior work in the stream corresponding to that GPU to complete before it begins execution.

Similarly, by default, any subsequent work pushed in any of the specified streams will not begin execution until the kernels on all GPUs have completed. This behavior can be overridden by specifying the flag CUDA_COOPERATIVE_LAUNCH_MULTI_DEVICE_NO_POST_LAUNCH_SYNC. When this flag is specified, any subsequent work pushed in any of the specified streams will only wait for the kernel launched on the GPU corresponding to that stream to complete before it begins execution.

Calling cuLaunchCooperativeKernelMultiDevice() sets persistent function state that is the same as the function state set through the cuLaunchKernel API when it is called individually for each element in launchParamsList.

When kernels are launched via cuLaunchCooperativeKernelMultiDevice(), the previous block shape, shared size and parameter info associated with each CUDA_LAUNCH_PARAMS::function in launchParamsList is overwritten.

Note that to use cuLaunchCooperativeKernelMultiDevice(), the kernels must either have been compiled with toolchain version 3.2 or later so that they contain kernel parameter information, or have no kernel parameters. If either of these conditions is not met, then cuLaunchCooperativeKernelMultiDevice() will return CUDA_ERROR_INVALID_IMAGE.

Note:
  • This function uses standard default stream semantics.

  • Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuCtxGetCacheConfig, cuCtxSetCacheConfig, cuFuncSetCacheConfig, cuFuncGetAttribute, cuLaunchCooperativeKernel, cudaLaunchCooperativeKernelMultiDevice

CUresult cuLaunchGrid ( CUfunction f, int  grid_width, int  grid_height )
Launches a CUDA function.
Parameters
f
- Kernel to launch
grid_width
- Width of grid in blocks
grid_height
- Height of grid in blocks
Deprecated

Description
Invokes the kernel f on a grid_width x grid_height grid of blocks. Each block contains the number of threads specified by a previous call to cuFuncSetBlockShape().
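Since the grid is expressed in blocks, grid_width and grid_height are commonly derived from the problem size and the block shape with a ceiling division. The sketch below shows that calculation for a two-dimensional problem, assuming one thread per element; the helper names div_up and grid_for_problem are hypothetical and the code does not call the driver API.

```c
#include <assert.h>

/* Ceiling of n / d for n >= 0 and d > 0. */
int div_up(int n, int d)
{
    assert(n >= 0 && d > 0);
    return n / d + (n % d != 0);
}

/*
 * Computes how many blocks of block_x * block_y threads are needed to
 * cover a width * height problem with one thread per element.
 */
void grid_for_problem(int width, int height, int block_x, int block_y,
                      int *grid_width, int *grid_height)
{
    assert(grid_width != 0 && grid_height != 0);
    *grid_width  = div_up(width, block_x);
    *grid_height = div_up(height, block_y);
}
```

When the problem size is not a multiple of the block shape, the last row or column of blocks contains threads that fall outside the problem, so the kernel is expected to check its indices against the problem size.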

The block shape, dynamic shared memory size, and parameter information must be set using cuFuncSetBlockShape(), cuFuncSetSharedSize(), cuParamSetSize(), cuParamSeti(), cuParamSetf(), and cuParamSetv() prior to calling this function.

Launching a function via cuLaunchKernel() invalidates the function's block shape, dynamic shared memory size, and parameter information. After launching via cuLaunchKernel, this state must be re-initialized prior to calling this function. Failure to do so results in undefined behavior.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGridAsync, cuLaunchKernel

CUresult cuLaunchGridAsync ( CUfunction f, int  grid_width, int  grid_height, CUstream hStream )
Launches a CUDA function.
Parameters
f
- Kernel to launch
grid_width
- Width of grid in blocks
grid_height
- Height of grid in blocks
hStream
- Stream identifier
Deprecated

Description
Invokes the kernel f on a grid_width x grid_height grid of blocks. Each block contains the number of threads specified by a previous call to cuFuncSetBlockShape().

The block shape, dynamic shared memory size, and parameter information must be set using cuFuncSetBlockShape(), cuFuncSetSharedSize(), cuParamSetSize(), cuParamSeti(), cuParamSetf(), and cuParamSetv() prior to calling this function.

Launching a function via cuLaunchKernel() invalidates the function's block shape, dynamic shared memory size, and parameter information. After launching via cuLaunchKernel, this state must be re-initialized prior to calling this function. Failure to do so results in undefined behavior.

Note:
  • In certain cases where cubins are created with no ABI (i.e., using ptxas --abi-compile=no), this function may serialize kernel launches. The CUDA driver retains asynchronous behavior by growing the per-thread stack as needed per launch and not shrinking it afterwards.

  • This function uses standard default stream semantics.

  • Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchKernel

CUresult cuParamSetSize ( CUfunction hfunc, unsigned int  numbytes )
Sets the parameter size for the function.
Parameters
hfunc
- Kernel to set parameter size for
numbytes
- Size of parameter list in bytes
Deprecated

Description
Sets through numbytes the total size in bytes needed by the function parameters of the kernel corresponding to hfunc.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel

CUresult cuParamSetTexRef ( CUfunction hfunc, int  texunit, CUtexref hTexRef )
Adds a texture-reference to the function's argument list.
Parameters
hfunc
- Kernel to add texture-reference to
texunit
- Texture unit (must be CU_PARAM_TR_DEFAULT)
hTexRef
- Texture-reference to add to argument list
Deprecated

Description
Makes the CUDA array or linear memory bound to the texture reference hTexRef available to a device program as a texture. In this version of CUDA, the texture-reference must be obtained via cuModuleGetTexRef() and the texunit parameter must be set to CU_PARAM_TR_DEFAULT.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

CUresult cuParamSetf ( CUfunction hfunc, int  offset, float  value )
Adds a floating-point parameter to the function's argument list.
Parameters
hfunc
- Kernel to add parameter to
offset
- Offset to add parameter to argument list
value
- Value of parameter
Deprecated

Description
Sets a floating-point parameter that will be passed the next time the kernel corresponding to hfunc is invoked. offset is a byte offset.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel

CUresult cuParamSeti ( CUfunction hfunc, int  offset, unsigned int  value )
Adds an integer parameter to the function's argument list.
Parameters
hfunc
- Kernel to add parameter to
offset
- Offset to add parameter to argument list
value
- Value of parameter
Deprecated

Description
Sets an integer parameter that will be passed the next time the kernel corresponding to hfunc is invoked. offset is a byte offset.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel

CUresult cuParamSetv ( CUfunction hfunc, int  offset, void* ptr, unsigned int  numbytes )
Adds arbitrary data to the function's argument list.
Parameters
hfunc
- Kernel to add data to
offset
- Offset to add data to argument list
ptr
- Pointer to arbitrary data
numbytes
- Size of data to copy in bytes
Deprecated

Description
Copies an arbitrary amount of data (specified in numbytes) from ptr into the parameter space of the kernel corresponding to hfunc. offset is a byte offset.
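With this family of functions the caller is responsible for choosing each byte offset. The sketch below models that bookkeeping on the host with a plain byte buffer standing in for the kernel's parameter space. It assumes each parameter is placed at an offset aligned to its own alignment requirement, which this section does not state explicitly; the helper names align_up and pack_param are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rounds offset up to the next multiple of align (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Copies size bytes from src into buf at the first offset at or after
 * `offset` that satisfies `align`. Stores the offset actually used in
 * *placed (if not NULL) and returns the offset just past the data.
 */
size_t pack_param(unsigned char *buf, size_t buf_size, size_t offset,
                  const void *src, size_t size, size_t align,
                  size_t *placed)
{
    assert(buf != NULL && src != NULL);
    offset = align_up(offset, align);
    assert(offset + size <= buf_size);
    memcpy(buf + offset, src, size);
    if (placed != NULL)
        *placed = offset;
    return offset + size;
}
```

In legacy code the offset reported through placed would correspond to the offset argument of cuParamSeti(), cuParamSetf(), or cuParamSetv(), and the value returned after the last parameter would correspond to numbytes for cuParamSetSize().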

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuLaunch, cuLaunchGrid, cuLaunchGridAsync, cuLaunchKernel