4.6. Execution Control

This section describes the execution control functions of the CUDA runtime application programming interface.

Some functions have overloaded C++ API template versions documented separately in the C++ API Routines module.

Functions

__host____device__cudaError_t cudaFuncGetAttributes ( cudaFuncAttributes* attr, const void* func )
Find out attributes for a given function.
__host__cudaError_t cudaFuncSetCacheConfig ( const void* func, cudaFuncCache cacheConfig )
Sets the preferred cache configuration for a device function.
__host__cudaError_t cudaFuncSetSharedMemConfig ( const void* func, cudaSharedMemConfig config )
Sets the shared memory configuration for a device function.
__device__ ​ void* cudaGetParameterBuffer ( size_t alignment, size_t size )
Obtains a parameter buffer.
__device__ ​ void* cudaGetParameterBufferV2 ( void* func, dim3 gridDimension, dim3 blockDimension, unsigned int  sharedMemSize )
Launches a specified kernel.
__host__cudaError_t cudaLaunchKernel ( const void* func, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMem, cudaStream_t stream )
Launches a device function.
__host__cudaError_t cudaSetDoubleForDevice ( double* d )
Converts a double argument to be executed on a device.
__host__cudaError_t cudaSetDoubleForHost ( double* d )
Converts a double argument after execution on a device.

Functions

__host____device__cudaError_t cudaFuncGetAttributes ( cudaFuncAttributes* attr, const void* func )
Find out attributes for a given function.
Parameters
attr
- Return pointer to function's attributes
func
- Device function symbol
Description

This function obtains the attributes of a function specified via func. func is a device function symbol and must be declared as a __global__ function. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned. For templated functions, pass the function symbol as follows: func_name<template_arg_0,...,template_arg_N>

Note that some function attributes such as maxThreadsPerBlock may vary based on the device that is currently being used.

Note:
  • Note that this function may also return error codes from previous, asynchronous launches.

  • Use of a string naming a function as the func paramater was deprecated in CUDA 4.1 and removed in CUDA 5.0.

See also:

cudaConfigureCall, cudaFuncSetCacheConfig ( C API), cudaFuncGetAttributes ( C++ API), cudaLaunchKernel ( C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument ( C API)

__host__cudaError_t cudaFuncSetCacheConfig ( const void* func, cudaFuncCache cacheConfig )
Sets the preferred cache configuration for a device function.
Parameters
func
- Device function symbol
cacheConfig
- Requested cache configuration
Description

On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the function specified via func. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute func.

func is a device function symbol and must be declared as a __global__ function. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned. For templated functions, pass the function symbol as follows: func_name<template_arg_0,...,template_arg_N>

This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

The supported cache configurations are:

Note:
  • Note that this function may also return error codes from previous, asynchronous launches.

  • Use of a string naming a function as the func paramater was deprecated in CUDA 4.1 and removed in CUDA 5.0.

See also:

cudaConfigureCall, cudaFuncSetCacheConfig ( C++ API), cudaFuncGetAttributes ( C API), cudaLaunchKernel ( C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument ( C API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig

__host__cudaError_t cudaFuncSetSharedMemConfig ( const void* func, cudaSharedMemConfig config )
Sets the shared memory configuration for a device function.
Parameters
func
- Device function symbol
config
- Requested shared memory configuration
Description

On devices with configurable shared memory banks, this function will force all subsequent launches of the specified device function to have the given shared memory bank size configuration. On any given launch of the function, the shared memory configuration of the device will be temporarily changed if needed to suit the function's preferred configuration. Changes in shared memory configuration between subsequent launches of functions, may introduce a device side synchronization point.

Any per-function setting of shared memory bank size set via cudaFuncSetSharedMemConfig will override the device wide setting set by cudaDeviceSetSharedMemConfig.

Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.

This function will do nothing on devices with fixed shared memory bank size.

For templated functions, pass the function symbol as follows: func_name<template_arg_0,...,template_arg_N>

The supported bank configurations are:

  • cudaSharedMemBankSizeDefault: use the device's shared memory configuration when launching this function.

  • cudaSharedMemBankSizeFourByte: set shared memory bank width to be four bytes natively when launching this function.

  • cudaSharedMemBankSizeEightByte: set shared memory bank width to be eight bytes natively when launching this function.

Note:
  • Note that this function may also return error codes from previous, asynchronous launches.

  • Use of a string naming a function as the func paramater was deprecated in CUDA 4.1 and removed in CUDA 5.0.

See also:

cudaConfigureCall, cudaDeviceSetSharedMemConfig, cudaDeviceGetSharedMemConfig, cudaDeviceSetCacheConfig, cudaDeviceGetCacheConfig, cudaFuncSetCacheConfig

__device__ ​ void* cudaGetParameterBuffer ( size_t alignment, size_t size )
Obtains a parameter buffer.
Parameters
alignment
- Specifies alignment requirement of the parameter buffer
size
- Specifies size requirement in bytes
Returns

Returns pointer to the allocated parameterBuffer

Description

Obtains a parameter buffer which can be filled with parameters for a kernel launch. Parameters passed to cudaLaunchDevice must be allocated via this function.

This is a low level API and can only be accessed from Parallel Thread Execution (PTX). CUDA user code should use <<< >>> to launch kernels.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cudaLaunchDevice

__device__ ​ void* cudaGetParameterBufferV2 ( void* func, dim3 gridDimension, dim3 blockDimension, unsigned int  sharedMemSize )
Launches a specified kernel.
Parameters
func
- Pointer to the kernel to be launched
gridDimension
- Specifies grid dimensions
blockDimension
- Specifies block dimensions
sharedMemSize
- Specifies size of shared memory
Description

Launches a specified kernel with the specified parameter buffer. A parameter buffer can be obtained by calling cudaGetParameterBuffer().

This is a low level API and can only be accessed from Parallel Thread Execution (PTX). CUDA user code should use <<< >>> to launch the kernels.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

Please refer to Execution Configuration and Parameter Buffer Layout from the CUDA Programming Guide for the detailed descriptions of launch configuration and parameter layout respectively.

See also:

cudaGetParameterBuffer

__host__cudaError_t cudaLaunchKernel ( const void* func, dim3 gridDim, dim3 blockDim, void** args, size_t sharedMem, cudaStream_t stream )
Launches a device function.
Parameters
func
- Device function symbol
gridDim
- Grid dimentions
blockDim
- Block dimentions
args
- Arguments
sharedMem
- Shared memory
stream
- Stream identifier
Description

The function invokes kernel func on gridDim (gridDim.x × gridDim.y × gridDim.z) grid of blocks. Each block contains blockDim (blockDim.x × blockDim.y × blockDim.z) threads.

If the kernel has N parameters the args should point to array of N pointers. Each pointer, from args[0] to args[N - 1], point to the region of memory from which the actual parameter will be copied.

For templated functions, pass the function symbol as follows: func_name<template_arg_0,...,template_arg_N>

sharedMem sets the amount of dynamic shared memory that will be available to each thread block.

stream specifies a stream the invocation is associated to.

Note:
  • This function uses standard default stream semantics.

  • Note that this function may also return error codes from previous, asynchronous launches.

cudaLaunchKernel ( C++ API)

__host__cudaError_t cudaSetDoubleForDevice ( double* d )
Converts a double argument to be executed on a device.
Parameters
d
- Double to convert
Returns

cudaSuccess

Deprecated

This function is deprecated as of CUDA 7.5

Description

Converts the double value of d to an internal float representation if the device does not support double arithmetic. If the device does natively support doubles, then this function does nothing.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

cudaLaunch ( C API), cudaFuncSetCacheConfig ( C API), cudaFuncGetAttributes ( C API), cudaSetDoubleForHost, cudaSetupArgument ( C API)

__host__cudaError_t cudaSetDoubleForHost ( double* d )
Converts a double argument after execution on a device.
Parameters
d
- Double to convert
Returns

cudaSuccess

Deprecated

This function is deprecated as of CUDA 7.5

Description

Converts the double value of d from a potentially internal float representation if the device does not support double arithmetic. If the device does natively support doubles, then this function does nothing.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

cudaLaunch ( C API), cudaFuncSetCacheConfig ( C API), cudaFuncGetAttributes ( C API), cudaSetDoubleForDevice, cudaSetupArgument ( C API)