6.34. Coredump Attributes Control API
This section describes the coredump attribute control functions of the low-level CUDA driver application programming interface.
Typedefs
- typedef CUcoredumpCallbackEntry_st * CUcoredumpCallbackHandle
- Opaque handle representing a registered coredump status callback.
- typedef void(CUDA_CB* CUcoredumpStatusCallback )( void* userData, int pid, CUdevice dev )
- Callback function prototype for GPU coredump status notifications.
Enumerations
- enum CUCoredumpGenerationFlags
- enum CUcoredumpSettings
Functions
- CUresult cuCoredumpDeregisterCompleteCallback ( CUcoredumpCallbackHandle callback )
- Deregister a previously registered coredump complete callback.
- CUresult cuCoredumpDeregisterStartCallback ( CUcoredumpCallbackHandle callback )
- Deregister a previously registered coredump start callback.
- CUresult cuCoredumpGetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
- Allows caller to fetch a coredump attribute value for the current context.
- CUresult cuCoredumpGetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
- Allows caller to fetch a coredump attribute value for the entire application.
- CUresult cuCoredumpRegisterCompleteCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
- Register a callback to be invoked when a GPU coredump completes.
- CUresult cuCoredumpRegisterStartCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
- Register a callback to be invoked when a GPU coredump begins.
- CUresult cuCoredumpSetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
- Allows caller to set a coredump attribute value for the current context.
- CUresult cuCoredumpSetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
- Allows caller to set a coredump attribute value globally.
Typedefs
- typedef CUcoredumpCallbackEntry_st * CUcoredumpCallbackHandle
-
Opaque handle representing a registered coredump status callback. This handle is returned when registering a callback and must be provided when deregistering the callback.
- void(CUDA_CB* CUcoredumpStatusCallback )( void* userData, int pid, CUdevice dev )
-
Callback function prototype for GPU coredump status notifications. This callback will be invoked when a GPU coredump begins or completes, depending on which registration function was used. The callback executes synchronously during the coredump process.
Parameters
- userData
- - User-provided data pointer that was passed during registration
- pid
- - Process ID of the process in which the coredump is being generated
- dev
- - The device on which the coredump is being generated
Enumerations
- enum CUCoredumpGenerationFlags
-
Flags for controlling coredump contents
Values
- CU_COREDUMP_DEFAULT_FLAGS = 0
- CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES = (1<<0)
- CU_COREDUMP_SKIP_GLOBAL_MEMORY = (1<<1)
- CU_COREDUMP_SKIP_SHARED_MEMORY = (1<<2)
- CU_COREDUMP_SKIP_LOCAL_MEMORY = (1<<3)
- CU_COREDUMP_SKIP_ABORT = (1<<4)
- CU_COREDUMP_SKIP_CONSTBANK_MEMORY = (1<<5)
- CU_COREDUMP_GZIP_COMPRESS = (1<<6)
- CU_COREDUMP_LIGHTWEIGHT_FLAGS = CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES | CU_COREDUMP_SKIP_GLOBAL_MEMORY | CU_COREDUMP_SKIP_SHARED_MEMORY | CU_COREDUMP_SKIP_LOCAL_MEMORY | CU_COREDUMP_SKIP_CONSTBANK_MEMORY
- enum CUcoredumpSettings
-
Flags for choosing a coredump attribute to get/set
Values
- CU_COREDUMP_ENABLE_ON_EXCEPTION = 1
- CU_COREDUMP_TRIGGER_HOST
- CU_COREDUMP_LIGHTWEIGHT
- CU_COREDUMP_ENABLE_USER_TRIGGER
- CU_COREDUMP_FILE
- CU_COREDUMP_PIPE
- CU_COREDUMP_GENERATION_FLAGS
- CU_COREDUMP_MAX
Functions
- CUresult cuCoredumpDeregisterCompleteCallback ( CUcoredumpCallbackHandle callback )
-
Deregister a previously registered coredump complete callback.
Parameters
- callback
- - The callback handle to deregister
Returns
Description
This function removes a callback that was registered with cuCoredumpRegisterCompleteCallback. The callback handle becomes invalid after this call.
Note:It is the caller's responsibility to deregister callbacks before they go out of scope.
See also:
cuCoredumpRegisterCompleteCallback, cuCoredumpDeregisterStartCallback
- CUresult cuCoredumpDeregisterStartCallback ( CUcoredumpCallbackHandle callback )
-
Deregister a previously registered coredump start callback.
Parameters
- callback
- - The callback handle to deregister
Returns
Description
This function removes a callback that was registered with cuCoredumpRegisterStartCallback. The callback handle becomes invalid after this call.
Note:It is the caller's responsibility to deregister callbacks before they go out of scope.
See also:
cuCoredumpRegisterStartCallback, cuCoredumpDeregisterCompleteCallback
- CUresult cuCoredumpGetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
-
Allows caller to fetch a coredump attribute value for the current context.
Parameters
- attrib
- - The enum defining which value to fetch.
- value
- - void* containing the requested data.
- size
- - The size of the memory region value points to.
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_CONTEXT_IS_DESTROYED
Description
Returns in *value the requested value specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.
If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.
The supported attributes are:
-
CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false unless set to true globally or locally, or the CU_CTX_USER_COREDUMP_ENABLE flag was set during context creation.
-
CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - set the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.
-
CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false unless set to true globally or locally. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.
-
CU_COREDUMP_ENABLE_USER_TRIGGER: Bool where true means that a coredump can be created by writing to the system pipe specified by CU_COREDUMP_PIPE. The default value is false unless set to true globally or locally.
-
CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_PIPE: String of up to 1023 characters that defines the name of the pipe that will be monitored if user-triggered coredumps are enabled. The default value is corepipe.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
- CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
- CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
- CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
- CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
- CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
- CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
- CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
See also:
cuCoredumpGetAttributeGlobal, cuCoredumpSetAttribute, cuCoredumpSetAttributeGlobal
- CUresult cuCoredumpGetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
-
Allows caller to fetch a coredump attribute value for the entire application.
Parameters
- attrib
- - The enum defining which value to fetch.
- value
- - void* containing the requested data.
- size
- - The size of the memory region value points to.
Returns
Description
Returns in *value the requested value specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.
If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.
The supported attributes are:
-
CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false.
-
CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - set the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.
-
CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.
-
CU_COREDUMP_ENABLE_USER_TRIGGER: Bool where true means that a coredump can be created by writing to the system pipe specified by CU_COREDUMP_PIPE. The default value is false.
-
CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_PIPE: String of up to 1023 characters that defines the name of the pipe that will be monitored if user-triggered coredumps are enabled. The default value is corepipe.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
- CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
- CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
- CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
- CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
- CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
- CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
- CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
See also:
cuCoredumpGetAttribute, cuCoredumpSetAttribute, cuCoredumpSetAttributeGlobal
- CUresult cuCoredumpRegisterCompleteCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
-
Register a callback to be invoked when a GPU coredump completes.
Parameters
- callback
- - The callback function to register
- userData
- - User data pointer to pass to the callback
- callbackOut
- - Location to store the callback handle (optional, may be NULL)
Description
This function registers a callback that will be called when a GPU coredump has been fully collected and written to disk. Callbacks are executed in the order they were registered. The same callback function can be registered multiple times with different userData, and each registration will receive a unique handle.
Note:Callbacks execute synchronously during the coredump process and will block coredump progress while running.
See also:
cuCoredumpDeregisterCompleteCallback, cuCoredumpRegisterStartCallback
- CUresult cuCoredumpRegisterStartCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
-
Register a callback to be invoked when a GPU coredump begins.
Parameters
- callback
- - The callback function to register
- userData
- - User data pointer to pass to the callback
- callbackOut
- - Location to store the callback handle (optional, may be NULL)
Description
This function registers a callback that will be called when a GPU coredump is initiated, before any coredump data is collected. Callbacks are executed in the order they were registered. The same callback function can be registered multiple times with different userData, and each registration will receive a unique handle.
Note:Callbacks execute synchronously during the coredump process and will block coredump progress while running.
See also:
cuCoredumpDeregisterStartCallback, cuCoredumpRegisterCompleteCallback
- CUresult cuCoredumpSetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
-
Allows caller to set a coredump attribute value for the current context.
Parameters
- attrib
- - The enum defining which value to set.
- value
- - void* containing the requested data.
- size
- - The size of the memory region value points to.
Returns
CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_PERMITTED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_CONTEXT_IS_DESTROYED, CUDA_ERROR_NOT_SUPPORTED
Description
This function should be considered an alternate interface to the CUDA-GDB environment variables defined in this document: https://docs.nvidia.com/cuda/cuda-gdb/index.html#gpu-coredump
An important design decision to note is that any coredump environment variable values set before CUDA initializes will take permanent precedence over any values set with this function. This decision was made to ensure no change in behavior for any users that may be currently using these variables to get coredumps.
*value shall contain the value to be set for the attribute specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.
If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.
Note:This function will return CUDA_ERROR_NOT_SUPPORTED if the caller attempts to set CU_COREDUMP_ENABLE_ON_EXCEPTION on a GPU with Compute Capability < 6.0. cuCoredumpSetAttributeGlobal works on those platforms as an alternative.
Note:CU_COREDUMP_ENABLE_USER_TRIGGER and CU_COREDUMP_PIPE cannot be set on a per-context basis.
The supported attributes are:
-
CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false.
-
CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - set the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.
-
CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.
-
CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
- CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
- CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
- CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
- CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
- CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
- CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
- CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
See also:
cuCoredumpGetAttributeGlobal, cuCoredumpGetAttribute, cuCoredumpSetAttributeGlobal
- CUresult cuCoredumpSetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
-
Allows caller to set a coredump attribute value globally.
Parameters
- attrib
- - The enum defining which value to set.
- value
- - void* containing the requested data.
- size
- - The size of the memory region value points to.
Description
This function should be considered an alternate interface to the CUDA-GDB environment variables defined in this document: https://docs.nvidia.com/cuda/cuda-gdb/index.html#gpu-coredump
An important design decision to note is that any coredump environment variable values set before CUDA initializes will take permanent precedence over any values set with this function. This decision was made to ensure no change in behavior for any users that may be currently using these variables to get coredumps.
*value shall contain the value to be set for the attribute specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.
If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.
The supported attributes are:
-
CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false.
-
CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - set the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.
-
CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.
-
CU_COREDUMP_ENABLE_USER_TRIGGER: Bool where true means that a coredump can be created by writing to the system pipe specified by CU_COREDUMP_PIPE. The default value is false.
-
CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_PIPE: String of up to 1023 characters that defines the name of the pipe that will be monitored if user-triggered coredumps are enabled. This value may not be changed after CU_COREDUMP_ENABLE_USER_TRIGGER is set to true. The default value is corepipe.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.
-
CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
- CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
- CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
- CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
- CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
- CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
- CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
- CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
See also:
cuCoredumpGetAttribute, cuCoredumpGetAttributeGlobal, cuCoredumpSetAttribute