6.12. Library Management

This section describes the library management functions of the low-level CUDA driver application programming interface.

Functions

CUresult cuKernelGetAttribute ( int* pi, CUfunction_attribute attrib, CUkernel kernel, CUdevice dev )
Returns information about a kernel.
CUresult cuKernelGetFunction ( CUfunction* pFunc, CUkernel kernel )
Returns a function handle.
CUresult cuKernelSetAttribute ( CUfunction_attribute attrib, int  val, CUkernel kernel, CUdevice dev )
Sets information about a kernel.
CUresult cuKernelSetCacheConfig ( CUkernel kernel, CUfunc_cache config, CUdevice dev )
Sets the preferred cache configuration for a device kernel.
CUresult cuLibraryGetGlobal ( CUdeviceptr* dptr, size_t* bytes, CUlibrary library, const char* name )
Returns a global device pointer.
CUresult cuLibraryGetKernel ( CUkernel* pKernel, CUlibrary library, const char* name )
Returns a kernel handle.
CUresult cuLibraryGetManaged ( CUdeviceptr* dptr, size_t* bytes, CUlibrary library, const char* name )
Returns a pointer to managed memory.
CUresult cuLibraryGetModule ( CUmodule* pMod, CUlibrary library )
Returns a module handle.
CUresult cuLibraryGetUnifiedFunction ( void** fptr, CUlibrary library, const char* symbol )
Returns a pointer to a universal function.
CUresult cuLibraryLoadData ( CUlibrary* library, const void* code, CUjit_option* jitOptions, void** jitOptionsValues, unsigned int  numJitOptions, CUlibraryOption* libraryOptions, void** libraryOptionValues, unsigned int  numLibraryOptions )
Load a library with specified code and options.
CUresult cuLibraryLoadFromFile ( CUlibrary* library, const char* fileName, CUjit_option* jitOptions, void** jitOptionsValues, unsigned int  numJitOptions, CUlibraryOption* libraryOptions, void** libraryOptionValues, unsigned int  numLibraryOptions )
Load a library with specified file and options.
CUresult cuLibraryUnload ( CUlibrary library )
Unloads a library.

Functions

CUresult cuKernelGetAttribute ( int* pi, CUfunction_attribute attrib, CUkernel kernel, CUdevice dev )
Returns information about a kernel.
Parameters
pi
- Returned attribute value
attrib
- Attribute requested
kernel
- Kernel to query attribute of
dev
- Device to query attribute of
Description

Returns in *pi the integer value of the attribute attrib for the kernel kernel for the requested device dev. The supported attributes are:

  • CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK: The maximum number of threads per block, beyond which a launch of the kernel would fail. This number depends on both the kernel and the requested device.

  • CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES: The size in bytes of statically-allocated shared memory per block required by this kernel. This does not include dynamically-allocated shared memory requested by the user at runtime.

  • CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES: The size in bytes of user-allocated constant memory required by this kernel.

  • CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES: The size in bytes of local memory used by each thread of this kernel.

  • CU_FUNC_ATTRIBUTE_NUM_REGS: The number of registers used by each thread of this kernel.

  • CU_FUNC_ATTRIBUTE_PTX_VERSION: The PTX virtual architecture version for which the kernel was compiled. This value is the major PTX version * 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13. Note that this may return the undefined value of 0 for cubins compiled prior to CUDA 3.0.

  • CU_FUNC_ATTRIBUTE_BINARY_VERSION: The binary architecture version for which the kernel was compiled. This value is the major binary version * 10 + the minor binary version, so a binary version 1.3 function would return the value 13. Note that this will return a value of 10 for legacy cubins that do not have a properly-encoded binary architecture version.

  • CU_FUNC_CACHE_MODE_CA: The attribute to indicate whether the kernel has been compiled with user specified option "-Xptxas --dlcm=ca" set.

  • CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES: The maximum size in bytes of dynamically-allocated shared memory.

  • CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT: Preferred shared memory-L1 cache split ratio in percent of total shared memory.

  • CU_FUNC_ATTRIBUTE_CLUSTER_SIZE_MUST_BE_SET: If this attribute is set, the kernel must launch with a valid cluster size specified.

  • CU_FUNC_ATTRIBUTE_REQUIRED_CLUSTER_WIDTH: The required cluster width in blocks.

  • CU_FUNC_ATTRIBUTE_REQUIRED_CLUSTER_HEIGHT: The required cluster height in blocks.

  • CU_FUNC_ATTRIBUTE_REQUIRED_CLUSTER_DEPTH: The required cluster depth in blocks.

  • CU_FUNC_ATTRIBUTE_NON_PORTABLE_CLUSTER_SIZE_ALLOWED: Indicates whether the function can be launched with non-portable cluster size. 1 is allowed, 0 is disallowed. A non-portable cluster size may only function on the specific SKUs the program is tested on. The launch might fail if the program is run on a different hardware platform. CUDA API provides cudaOccupancyMaxActiveClusters to assist with checking whether the desired size can be launched on the current device. A portable cluster size is guaranteed to be functional on all compute capabilities higher than the target compute capability. The portable cluster size for sm_90 is 8 blocks per cluster. This value may increase for future compute capabilities. The specific hardware unit may support higher cluster sizes that’s not guaranteed to be portable.

  • CU_FUNC_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE: The block scheduling policy of a function. The value type is CUclusterSchedulingPolicy.

Note:

If another thread is trying to set the same attribute on the same device using cuKernelSetAttribute() simultaneously, the attribute query will give the old or new value depending on the interleavings chosen by the OS scheduler and memory consistency.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuKernelSetAttribute, cuLibraryGetKernel, cuLaunchKernel, cuKernelGetFunction, cuLibraryGetModule, cuModuleGetFunction, cuFuncGetAttribute

CUresult cuKernelGetFunction ( CUfunction* pFunc, CUkernel kernel )
Returns a function handle.
Parameters
pFunc
- Returned function handle
kernel
- Kernel to retrieve function for the requested context
Description

Returns in pFunc the handle of the function for the requested kernel kernel and the current context. If function handle is not found, the call returns CUDA_ERROR_NOT_FOUND.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuLibraryGetKernel, cuLibraryGetModule, cuModuleGetFunction

CUresult cuKernelSetAttribute ( CUfunction_attribute attrib, int  val, CUkernel kernel, CUdevice dev )
Sets information about a kernel.
Parameters
attrib
- Attribute requested
val
- Value to set
kernel
- Kernel to set attribute of
dev
- Device to set attribute of
Description

This call sets the value of a specified attribute attrib on the kernel kernel for the requested device dev to an integer value specified by val. This function returns CUDA_SUCCESS if the new value of the attribute could be successfully set. If the set fails, this call will return an error. Not all attributes can have values set. Attempting to set a value on a read-only attribute will result in an error (CUDA_ERROR_INVALID_VALUE)

Note that attributes set using cuFuncSetAttribute() will override the attribute set by this API irrespective of whether the call to cuFuncSetAttribute() is made before or after this API call. However, cuKernelGetAttribute() will always return the attribute value set by this API.

Supported attributes are:

Note:

The API has stricter locking requirements in comparison to its legacy counterpart cuFuncSetAttribute() due to device-wide semantics. If multiple threads are trying to set the same attribute on the same device simultaneously, the attribute setting will depend on the interleavings chosen by the OS scheduler and memory consistency.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuKernelGetAttribute, cuLibraryGetKernel, cuLaunchKernel, cuKernelGetFunction, cuLibraryGetModule, cuModuleGetFunction, cuFuncSetAttribute

CUresult cuKernelSetCacheConfig ( CUkernel kernel, CUfunc_cache config, CUdevice dev )
Sets the preferred cache configuration for a device kernel.
Parameters
kernel
- Kernel to configure cache for
config
- Requested cache configuration
dev
- Device to set attribute of
Description

On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the device kernel kernel on the requested device dev. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute kernel. Any context-wide preference set via cuCtxSetCacheConfig() will be overridden by this per-kernel setting.

Note that attributes set using cuFuncSetCacheConfig() will override the attribute set by this API irrespective of whether the call to cuFuncSetCacheConfig() is made before or after this API call.

This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

The supported cache configurations are:

Note:

The API has stricter locking requirements in comparison to its legacy counterpart cuFuncSetCacheConfig() due to device-wide semantics. If multiple threads are trying to set a config on the same device simultaneously, the cache config setting will depend on the interleavings chosen by the OS scheduler and memory consistency.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuLibraryGetKernel, cuKernelGetFunction, cuLibraryGetModule, cuModuleGetFunction, cuFuncSetCacheConfig, cuCtxSetCacheConfig, cuLaunchKernel

CUresult cuLibraryGetGlobal ( CUdeviceptr* dptr, size_t* bytes, CUlibrary library, const char* name )
Returns a global device pointer.
Parameters
dptr
- Returned global device pointer for the requested context
bytes
- Returned global size in bytes
library
- Library to retrieve global from
name
- Name of global to retrieve
Description

Returns in *dptr and *bytes the base pointer and size of the global with name name for the requested library library and the current context. If no global for the requested name name exists, the call returns CUDA_ERROR_NOT_FOUND. One of the parameters dptr or bytes (not both) can be NULL in which case it is ignored.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuLibraryGetModule, cuModuleGetGlobal

CUresult cuLibraryGetKernel ( CUkernel* pKernel, CUlibrary library, const char* name )
Returns a kernel handle.
Parameters
pKernel
- Returned kernel handle
library
- Library to retrieve kernel from
name
- Name of kernel to retrieve
Description

Returns in pKernel the handle of the kernel with name name located in library library. If kernel handle is not found, the call returns CUDA_ERROR_NOT_FOUND.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuKernelGetFunction, cuLibraryGetModule, cuModuleGetFunction

CUresult cuLibraryGetManaged ( CUdeviceptr* dptr, size_t* bytes, CUlibrary library, const char* name )
Returns a pointer to managed memory.
Parameters
dptr
- Returned pointer to the managed memory
bytes
- Returned memory size in bytes
library
- Library to retrieve managed memory from
name
- Name of managed memory to retrieve
Description

Returns in *dptr and *bytes the base pointer and size of the managed memory with name name for the requested library library. If no managed memory with the requested name name exists, the call returns CUDA_ERROR_NOT_FOUND. One of the parameters dptr or bytes (not both) can be NULL in which case it is ignored. Note that managed memory for library library is shared across devices and is registered when the library is loaded into atleast one context.

Note:

The API requires a CUDA context to be present and initialized on at least one device. If no context is present, the call returns CUDA_ERROR_NOT_FOUND.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload,

CUresult cuLibraryGetModule ( CUmodule* pMod, CUlibrary library )
Returns a module handle.
Parameters
pMod
- Returned module handle
library
- Library to retrieve module from
Description

Returns in pMod the module handle associated with the current context located in library library. If module handle is not found, the call returns CUDA_ERROR_NOT_FOUND.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuModuleGetFunction

CUresult cuLibraryGetUnifiedFunction ( void** fptr, CUlibrary library, const char* symbol )
Returns a pointer to a universal function.
Parameters
fptr
- Returned pointer to a universal function
library
- Library to retrieve function pointer memory from
symbol
- Name of function pointer to retrieve
Description

Returns in *fptr the function pointer to a global function denoted by symbol. If no universal function with name symbol exists, the call returns CUDA_ERROR_NOT_FOUND. If there is no device with attrubute CU_DEVICE_ATTRIBUTE_UNIFIED_FUNCTION_POINTERS present in the system, the call may return CUDA_ERROR_NOT_FOUND.

See also:

cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload,

CUresult cuLibraryLoadData ( CUlibrary* library, const void* code, CUjit_option* jitOptions, void** jitOptionsValues, unsigned int  numJitOptions, CUlibraryOption* libraryOptions, void** libraryOptionValues, unsigned int  numLibraryOptions )
Load a library with specified code and options.
Parameters
library
- Returned library
code
- Code to load
jitOptions
- Options for JIT
jitOptionsValues
- Option values for JIT
numJitOptions
- Number of options
libraryOptions
- Options for loading
libraryOptionValues
- Option values for loading
numLibraryOptions
- Number of options for loading
Description

Takes a pointer code and loads the corresponding library library into all contexts existent at the time of the call and future contexts at the time of creation until the library is unloaded with cuLibraryUnload().

The pointer may be obtained by mapping a cubin or PTX or fatbin file, passing a cubin or PTX or fatbin file as a NULL-terminated text string, or incorporating a cubin or fatbin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer. Options are passed as an array via jitOptions and any corresponding parameters are passed in jitOptionsValues. The number of total JTT options is supplied via numJitOptions. Any outputs will be returned via jitOptionsValues.

See also:

cuLibraryLoadFromFile, cuLibraryUnload, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx

CUresult cuLibraryLoadFromFile ( CUlibrary* library, const char* fileName, CUjit_option* jitOptions, void** jitOptionsValues, unsigned int  numJitOptions, CUlibraryOption* libraryOptions, void** libraryOptionValues, unsigned int  numLibraryOptions )
Load a library with specified file and options.
Parameters
library
- Returned library
fileName
jitOptions
- Options for JIT
jitOptionsValues
- Option values for JIT
numJitOptions
- Number of options
libraryOptions
- Options for loading
libraryOptionValues
- Option values for loading
numLibraryOptions
- Number of options for loading
Description

Takes a filename fileName and loads the corresponding library library into all contexts existent at the time of the call and future contexts at the time of creation until the library is unloaded with cuLibraryUnload().

The file should be a cubin file as output by nvcc, or a PTX file either as output by nvcc or handwritten, or a fatbin file as output by nvcc from toolchain 4.0 or later.

Options are passed as an array via jitOptions and any corresponding parameters are passed in jitOptionsValues. The number of total options is supplied via numJitOptions. Any outputs will be returned via jitOptionsValues.

See also:

cuLibraryLoadData, :: cuLibraryUnload, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx

CUresult cuLibraryUnload ( CUlibrary library )
Unloads a library.
Parameters
library
- Library to unload
Description

Unloads the library specified with library

See also:

cuLibraryLoadDatacuLibraryLoadFromFile, cuModuleUnload