6.34. Coredump Attributes Control API

This section describes the coredump attribute control functions of the low-level CUDA driver application programming interface.

Typedefs

typedef CUcoredumpCallbackEntry_st *  CUcoredumpCallbackHandle
Opaque handle representing a registered coredump status callback.
typedef void(CUDA_CB*  CUcoredumpStatusCallback )( void*  userData,  int pid,  CUdevice dev )
Callback function prototype for GPU coredump status notifications.

Enumerations

enum CUCoredumpGenerationFlags
enum CUcoredumpSettings

Functions

CUresult cuCoredumpDeregisterCompleteCallback ( CUcoredumpCallbackHandle callback )
Deregister a previously registered coredump complete callback.
CUresult cuCoredumpDeregisterStartCallback ( CUcoredumpCallbackHandle callback )
Deregister a previously registered coredump start callback.
CUresult cuCoredumpGetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to fetch a coredump attribute value for the current context.
CUresult cuCoredumpGetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to fetch a coredump attribute value for the entire application.
CUresult cuCoredumpRegisterCompleteCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
Register a callback to be invoked when a GPU coredump completes.
CUresult cuCoredumpRegisterStartCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
Register a callback to be invoked when a GPU coredump begins.
CUresult cuCoredumpSetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to set a coredump attribute value for the current context.
CUresult cuCoredumpSetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to set a coredump attribute value globally.

Typedefs

typedef CUcoredumpCallbackEntry_st * CUcoredumpCallbackHandle

Opaque handle representing a registered coredump status callback. This handle is returned when registering a callback and must be provided when deregistering the callback.

typedef void(CUDA_CB* CUcoredumpStatusCallback )( void* userData, int pid, CUdevice dev )

Callback function prototype for GPU coredump status notifications. This callback will be invoked when a GPU coredump begins or completes, depending on which registration function was used. The callback executes synchronously during the coredump process.

Parameters
userData
- User-provided data pointer that was passed during registration
pid
- Process ID of the CUDA application for which the coredump notification was raised
dev
- The device on which the coredump event occurred
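For illustration, a minimal function matching this prototype might log the notification; the tag string passed via userData is hypothetical:

```c
#include <stdio.h>
#include <cuda.h>

/* Hypothetical example: a status callback that logs which process and
 * device the coredump notification refers to. userData is whatever
 * pointer was supplied at registration (here, a tag string). */
static void CUDA_CB logCoredumpStatus(void* userData, int pid, CUdevice dev)
{
    fprintf(stderr, "[%s] coredump event: pid=%d, device=%d\n",
            (const char*)userData, pid, (int)dev);
}
```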

Enumerations

enum CUCoredumpGenerationFlags

Flags for controlling coredump contents

Values
CU_COREDUMP_DEFAULT_FLAGS = 0
CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES = (1<<0)
CU_COREDUMP_SKIP_GLOBAL_MEMORY = (1<<1)
CU_COREDUMP_SKIP_SHARED_MEMORY = (1<<2)
CU_COREDUMP_SKIP_LOCAL_MEMORY = (1<<3)
CU_COREDUMP_SKIP_ABORT = (1<<4)
CU_COREDUMP_SKIP_CONSTBANK_MEMORY = (1<<5)
CU_COREDUMP_GZIP_COMPRESS = (1<<6)
CU_COREDUMP_LIGHTWEIGHT_FLAGS = CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES | CU_COREDUMP_SKIP_GLOBAL_MEMORY | CU_COREDUMP_SKIP_SHARED_MEMORY | CU_COREDUMP_SKIP_LOCAL_MEMORY | CU_COREDUMP_SKIP_CONSTBANK_MEMORY
enum CUcoredumpSettings

Flags for choosing a coredump attribute to get/set

Values
CU_COREDUMP_ENABLE_ON_EXCEPTION = 1
CU_COREDUMP_TRIGGER_HOST
CU_COREDUMP_LIGHTWEIGHT
CU_COREDUMP_ENABLE_USER_TRIGGER
CU_COREDUMP_FILE
CU_COREDUMP_PIPE
CU_COREDUMP_GENERATION_FLAGS
CU_COREDUMP_MAX

Functions

CUresult cuCoredumpDeregisterCompleteCallback ( CUcoredumpCallbackHandle callback )
Deregister a previously registered coredump complete callback.
Parameters
callback
- The callback handle to deregister
Description

This function removes a callback that was registered with cuCoredumpRegisterCompleteCallback. The callback handle becomes invalid after this call.

Note:

It is the caller's responsibility to deregister callbacks before they go out of scope.

See also:

cuCoredumpRegisterCompleteCallback

CUresult cuCoredumpDeregisterStartCallback ( CUcoredumpCallbackHandle callback )
Deregister a previously registered coredump start callback.
Parameters
callback
- The callback handle to deregister
Description

This function removes a callback that was registered with cuCoredumpRegisterStartCallback. The callback handle becomes invalid after this call.

Note:

It is the caller's responsibility to deregister callbacks before they go out of scope.

See also:

cuCoredumpRegisterStartCallback

CUresult cuCoredumpGetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to fetch a coredump attribute value for the current context.
Parameters
attrib
- The enum defining which value to fetch.
value
- void* containing the requested data.
size
- The size of the memory region value points to.
Description

Returns in *value the requested value specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.

If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.

The supported attributes are:

  • CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false unless set to true globally or locally, or the CU_CTX_USER_COREDUMP_ENABLE flag was set during context creation.

  • CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - raise the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.

  • CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false unless set to true globally or locally. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.

  • CU_COREDUMP_ENABLE_USER_TRIGGER: Bool where true means that a coredump can be created by writing to the system pipe specified by CU_COREDUMP_PIPE. The default value is false unless set to true globally or locally.

  • CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_PIPE: String of up to 1023 characters that defines the name of the pipe that will be monitored if user-triggered coredumps are enabled. The default value is corepipe.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
      - CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
      - CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
      - CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
      - CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
      - CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
      - CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
      - CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
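A common usage pattern (a sketch, assuming cuInit() has run and a context is current) is to call the function twice: once with value equal to NULL to learn the required size, and again with an appropriately sized buffer:

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

/* Sketch: fetch the per-context coredump file name using the two-call
 * size-query pattern. Error handling is reduced to early returns. */
static int printCoredumpFile(void)
{
    size_t size = 0;
    /* First call with value == NULL: only the required size is returned. */
    if (cuCoredumpGetAttribute(CU_COREDUMP_FILE, NULL, &size) != CUDA_SUCCESS)
        return -1;

    char* path = (char*)malloc(size);
    if (!path)
        return -1;

    if (cuCoredumpGetAttribute(CU_COREDUMP_FILE, path, &size) == CUDA_SUCCESS)
        printf("coredumps will be written to: %s\n", path);

    free(path);
    return 0;
}
```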

See also:

cuCoredumpGetAttributeGlobal, cuCoredumpSetAttribute, cuCoredumpSetAttributeGlobal

CUresult cuCoredumpGetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to fetch a coredump attribute value for the entire application.
Parameters
attrib
- The enum defining which value to fetch.
value
- void* containing the requested data.
size
- The size of the memory region value points to.
Description

Returns in *value the requested value specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.

If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.

The supported attributes are:

  • CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false.

  • CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - raise the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.

  • CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.

  • CU_COREDUMP_ENABLE_USER_TRIGGER: Bool where true means that a coredump can be created by writing to the system pipe specified by CU_COREDUMP_PIPE. The default value is false.

  • CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_PIPE: String of up to 1023 characters that defines the name of the pipe that will be monitored if user-triggered coredumps are enabled. The default value is corepipe.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
      - CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
      - CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
      - CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
      - CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
      - CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
      - CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
      - CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.

See also:

cuCoredumpGetAttribute, cuCoredumpSetAttribute, cuCoredumpSetAttributeGlobal

CUresult cuCoredumpRegisterCompleteCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
Register a callback to be invoked when a GPU coredump completes.
Parameters
callback
- The callback function to register
userData
- User data pointer to pass to the callback
callbackOut
- Location to store the callback handle (optional, may be NULL)
Description

This function registers a callback that will be called when a GPU coredump has been fully collected and written to disk. Callbacks are executed in the order they were registered. The same callback function can be registered multiple times with different userData, and each registration will receive a unique handle.

Note:

Callbacks execute synchronously during the coredump process and will block coredump progress while running.
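A registration/deregistration round trip might look like the following sketch; the callback body and tag string are hypothetical:

```c
#include <stdio.h>
#include <cuda.h>

static void CUDA_CB onCoredumpComplete(void* userData, int pid, CUdevice dev)
{
    fprintf(stderr, "coredump complete for pid %d on device %d (%s)\n",
            pid, (int)dev, (const char*)userData);
}

/* Sketch: register the callback, keep the returned handle, and deregister
 * it before the handle's owner goes out of scope. */
static void registerAndCleanup(void)
{
    CUcoredumpCallbackHandle handle;
    if (cuCoredumpRegisterCompleteCallback(onCoredumpComplete,
                                           (void*)"example-tag",
                                           &handle) == CUDA_SUCCESS) {
        /* ... application runs; the callback fires if a coredump completes ... */
        cuCoredumpDeregisterCompleteCallback(handle);
    }
}
```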

See also:

cuCoredumpDeregisterCompleteCallback, cuCoredumpRegisterStartCallback

CUresult cuCoredumpRegisterStartCallback ( CUcoredumpStatusCallback callback, void* userData, CUcoredumpCallbackHandle* callbackOut )
Register a callback to be invoked when a GPU coredump begins.
Parameters
callback
- The callback function to register
userData
- User data pointer to pass to the callback
callbackOut
- Location to store the callback handle (optional, may be NULL)
Description

This function registers a callback that will be called when a GPU coredump is initiated, before any coredump data is collected. Callbacks are executed in the order they were registered. The same callback function can be registered multiple times with different userData, and each registration will receive a unique handle.

Note:

Callbacks execute synchronously during the coredump process and will block coredump progress while running.

See also:

cuCoredumpDeregisterStartCallback, cuCoredumpRegisterCompleteCallback

CUresult cuCoredumpSetAttribute ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to set a coredump attribute value for the current context.
Parameters
attrib
- The enum defining which value to set.
value
- void* containing the requested data.
size
- The size of the memory region value points to.
Description

This function should be considered an alternate interface to the CUDA-GDB environment variables defined in this document: https://docs.nvidia.com/cuda/cuda-gdb/index.html#gpu-coredump

An important design decision to note is that any coredump environment variable values set before CUDA initializes will take permanent precedence over any values set with this function. This decision was made to ensure no change in behavior for any users that may be currently using these variables to get coredumps.

*value shall contain the new value of the attribute specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.

If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.

Note:

This function will return CUDA_ERROR_NOT_SUPPORTED if the caller attempts to set CU_COREDUMP_ENABLE_ON_EXCEPTION on a GPU with Compute Capability < 6.0. cuCoredumpSetAttributeGlobal works on those platforms as an alternative.

Note:

CU_COREDUMP_ENABLE_USER_TRIGGER and CU_COREDUMP_PIPE cannot be set on a per-context basis.

The supported attributes are:

  • CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false.

  • CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - raise the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.

  • CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.

  • CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
      - CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
      - CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
      - CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
      - CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
      - CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
      - CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
      - CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
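As a sketch, a caller might enable coredumps on exception for the current context and trim their contents; this assumes cuInit() has run and a context is current, and the use of bool for boolean attributes is an assumption based on the "Bool" wording above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <cuda.h>

/* Sketch: request coredumps on GPU exceptions for the current context,
 * then skip local memory and compress whatever is dumped. */
static CUresult enableCompactCoredumps(void)
{
    bool enable = true;  /* assumption: boolean attributes passed as bool */
    size_t size = sizeof(enable);
    CUresult rc = cuCoredumpSetAttribute(CU_COREDUMP_ENABLE_ON_EXCEPTION,
                                         &enable, &size);
    if (rc != CUDA_SUCCESS)
        return rc;  /* e.g. CUDA_ERROR_NOT_SUPPORTED on GPUs below CC 6.0 */

    int flags = CU_COREDUMP_SKIP_LOCAL_MEMORY | CU_COREDUMP_GZIP_COMPRESS;
    size = sizeof(flags);
    return cuCoredumpSetAttribute(CU_COREDUMP_GENERATION_FLAGS, &flags, &size);
}
```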

See also:

cuCoredumpGetAttributeGlobal, cuCoredumpGetAttribute, cuCoredumpSetAttributeGlobal

CUresult cuCoredumpSetAttributeGlobal ( CUcoredumpSettings attrib, void* value, size_t* size )
Allows caller to set a coredump attribute value globally.
Parameters
attrib
- The enum defining which value to set.
value
- void* containing the requested data.
size
- The size of the memory region value points to.
Description

This function should be considered an alternate interface to the CUDA-GDB environment variables defined in this document: https://docs.nvidia.com/cuda/cuda-gdb/index.html#gpu-coredump

An important design decision to note is that any coredump environment variable values set before CUDA initializes will take permanent precedence over any values set with this function. This decision was made to ensure no change in behavior for any users that may be currently using these variables to get coredumps.

*value shall contain the new value of the attribute specified by attrib. It is up to the caller to ensure that the data type and size of *value matches the request.

If the caller calls this function with *value equal to NULL, the size of the memory region (in bytes) expected for attrib will be placed in size.

The supported attributes are:

  • CU_COREDUMP_ENABLE_ON_EXCEPTION: Bool where true means that GPU exceptions from this context will create a coredump at the location specified by CU_COREDUMP_FILE. The default value is false.

  • CU_COREDUMP_TRIGGER_HOST: Bool where true means that the host CPU will also create a coredump. The default value is true unless set to false globally or locally. This value is deprecated as of CUDA 12.5 - raise the CU_COREDUMP_SKIP_ABORT flag to disable the host-side abort() if needed.

  • CU_COREDUMP_LIGHTWEIGHT: Bool where true means that any resulting coredumps will not have a dump of GPU memory or non-reloc ELF images. The default value is false. This attribute is deprecated as of CUDA 12.5, please use CU_COREDUMP_GENERATION_FLAGS instead.

  • CU_COREDUMP_ENABLE_USER_TRIGGER: Bool where true means that a coredump can be created by writing to the system pipe specified by CU_COREDUMP_PIPE. The default value is false.

  • CU_COREDUMP_FILE: String of up to 1023 characters that defines the location where any coredumps generated by this context will be written. The default value is core.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_PIPE: String of up to 1023 characters that defines the name of the pipe that will be monitored if user-triggered coredumps are enabled. This value may not be changed after CU_COREDUMP_ENABLE_USER_TRIGGER is set to true. The default value is corepipe.cuda.HOSTNAME.PID where HOSTNAME is the host name of the machine running the CUDA application and PID is the process ID of the CUDA application.

  • CU_COREDUMP_GENERATION_FLAGS: An integer that allows granular control over the data contained in a coredump, specified as a bitwise OR combination of the following values:
      - CU_COREDUMP_DEFAULT_FLAGS - If set by itself, coredump generation returns to its default settings of including all memory regions that it is able to access.
      - CU_COREDUMP_SKIP_NONRELOCATED_ELF_IMAGES - Coredump will not include the data from CUDA source modules that are not relocated at runtime.
      - CU_COREDUMP_SKIP_GLOBAL_MEMORY - Coredump will not include device-side global data that does not belong to any context.
      - CU_COREDUMP_SKIP_SHARED_MEMORY - Coredump will not include grid-scale shared memory for the warp that the dumped kernel belonged to.
      - CU_COREDUMP_SKIP_LOCAL_MEMORY - Coredump will not include local memory from the kernel.
      - CU_COREDUMP_LIGHTWEIGHT_FLAGS - Enables all of the above options. Equivalent to setting the CU_COREDUMP_LIGHTWEIGHT attribute to true.
      - CU_COREDUMP_SKIP_ABORT - If set, GPU exceptions will not raise an abort() in the host CPU process. Serves the same functional goal as CU_COREDUMP_TRIGGER_HOST but better reflects the default behavior.
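Because CU_COREDUMP_PIPE may not be changed after CU_COREDUMP_ENABLE_USER_TRIGGER is set to true, a sketch of enabling user-triggered dumps configures the pipe name first; the pipe name here is illustrative, and passing booleans as bool is an assumption based on the "Bool" wording above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <cuda.h>

/* Sketch: globally enable user-triggered coredumps. The pipe name must be
 * set before the trigger is enabled, since CU_COREDUMP_PIPE cannot change
 * afterward. "my_app_corepipe" is an illustrative name. */
static CUresult enableUserTriggeredDumps(void)
{
    char pipeName[] = "my_app_corepipe";
    size_t size = sizeof(pipeName);  /* includes the NUL terminator */
    CUresult rc = cuCoredumpSetAttributeGlobal(CU_COREDUMP_PIPE,
                                               pipeName, &size);
    if (rc != CUDA_SUCCESS)
        return rc;

    bool enable = true;  /* assumption: boolean attributes passed as bool */
    size = sizeof(enable);
    return cuCoredumpSetAttributeGlobal(CU_COREDUMP_ENABLE_USER_TRIGGER,
                                        &enable, &size);
}
```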

See also:

cuCoredumpGetAttribute, cuCoredumpGetAttributeGlobal, cuCoredumpSetAttribute