5.8. Occupancy
This section describes the occupancy calculation functions of the CUDA runtime application programming interface.
Besides the occupancy calculator functions (cudaOccupancyMaxActiveBlocksPerMultiprocessor and cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags), there are also C++ only occupancy-based launch configuration functions documented in C++ API Routines module.
See cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API)cudaOccupancyAvailableDynamicSMemPerBlock ( C++ API),
Functions
- __host__ cudaError_t cudaOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, const void* func, int numBlocks, int blockSize )
- Returns dynamic shared memory available per block when launching numBlocks blocks on SM.
- __host__ __device__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize )
- Returns occupancy for a device function.
- __host__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags )
- Returns occupancy for a device function with the specified flags.
Functions
- __host__ cudaError_t cudaOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, const void* func, int numBlocks, int blockSize )
-
Returns dynamic shared memory available per block when launching numBlocks blocks on SM.
Parameters
- dynamicSmemSize
- - Returned maximum dynamic shared memory
- func
- - Kernel function for which occupancy is calculated
- numBlocks
- - Number of blocks to fit on SM
- blockSize
- - Size of the block
Returns
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue, cudaErrorUnknown,
Description
Returns in *dynamicSmemSize the maximum size of dynamic shared memory to allow numBlocks blocks per SM.
Note:-
Note that this function may also return error codes from previous, asynchronous launches.
-
Note that this function may also return cudaErrorInitializationError, cudaErrorInsufficientDriver or cudaErrorNoDevice if this call tries to initialize internal CUDA RT state.
-
Note that as specified by cudaStreamAddCallback no CUDA function may be called from callback. cudaErrorNotPermitted may, but is not guaranteed to, be returned as a diagnostic in such case.
See also:
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags ( C++ API), cudaOccupancyAvailableDynamicSMemPerBlock
- __host__ __device__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize )
-
Returns occupancy for a device function.
Parameters
- numBlocks
- - Returned occupancy
- func
- - Kernel function for which occupancy is calculated
- blockSize
- - Block size the kernel is intended to be launched with
- dynamicSMemSize
- - Per-block dynamic shared memory usage intended, in bytes
Returns
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue, cudaErrorUnknown,
Description
Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for the device function.
Note:-
Note that this function may also return error codes from previous, asynchronous launches.
-
Note that this function may also return cudaErrorInitializationError, cudaErrorInsufficientDriver or cudaErrorNoDevice if this call tries to initialize internal CUDA RT state.
-
Note that as specified by cudaStreamAddCallback no CUDA function may be called from callback. cudaErrorNotPermitted may, but is not guaranteed to, be returned as a diagnostic in such case.
See also:
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags ( C++ API), cudaOccupancyAvailableDynamicSMemPerBlock ( C++ API), cuOccupancyMaxActiveBlocksPerMultiprocessor
- __host__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags )
-
Returns occupancy for a device function with the specified flags.
Parameters
- numBlocks
- - Returned occupancy
- func
- - Kernel function for which occupancy is calculated
- blockSize
- - Block size the kernel is intended to be launched with
- dynamicSMemSize
- - Per-block dynamic shared memory usage intended, in bytes
- flags
- - Requested behavior for the occupancy calculator
Returns
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue, cudaErrorUnknown,
Description
Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for the device function.
The flags parameter controls how special cases are handled. Valid flags include:
-
cudaOccupancyDefault: keeps the default behavior as cudaOccupancyMaxActiveBlocksPerMultiprocessor
-
cudaOccupancyDisableCachingOverride: This flag suppresses the default behavior on platform where global caching affects occupancy. On such platforms, if caching is enabled, but per-block SM resource usage would result in zero occupancy, the occupancy calculator will calculate the occupancy as if caching is disabled. Setting this flag makes the occupancy calculator to return 0 in such cases. More information can be found about this feature in the "Unified L1/Texture Cache" section of the Maxwell tuning guide.
Note:-
Note that this function may also return error codes from previous, asynchronous launches.
-
Note that this function may also return cudaErrorInitializationError, cudaErrorInsufficientDriver or cudaErrorNoDevice if this call tries to initialize internal CUDA RT state.
-
Note that as specified by cudaStreamAddCallback no CUDA function may be called from callback. cudaErrorNotPermitted may, but is not guaranteed to, be returned as a diagnostic in such case.
See also:
cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags ( C++ API), cudaOccupancyAvailableDynamicSMemPerBlock ( C++ API), cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags