4.14. Event Management

This section describes the event management functions of the low-level CUDA driver application programming interface.

Functions

CUresult cuEventCreate ( CUevent* phEvent, unsigned int Flags )
Creates an event.
CUresult cuEventDestroy ( CUevent hEvent )
Destroys an event.
CUresult cuEventElapsedTime ( float* pMilliseconds, CUevent hStart, CUevent hEnd )
Computes the elapsed time between two events.
CUresult cuEventQuery ( CUevent hEvent )
Queries an event's status.
CUresult cuEventRecord ( CUevent hEvent, CUstream hStream )
Records an event.
CUresult cuEventSynchronize ( CUevent hEvent )
Waits for an event to complete.
CUresult cuStreamBatchMemOp ( CUstream stream, unsigned int count, CUstreamBatchMemOpParams* paramArray, unsigned int flags )
Batch operations to synchronize the stream via memory operations.
CUresult cuStreamWaitValue32 ( CUstream stream, CUdeviceptr addr, cuuint32_t value, unsigned int flags )
Wait on a memory location.
CUresult cuStreamWaitValue64 ( CUstream stream, CUdeviceptr addr, cuuint64_t value, unsigned int flags )
Wait on a memory location.
CUresult cuStreamWriteValue32 ( CUstream stream, CUdeviceptr addr, cuuint32_t value, unsigned int flags )
Write a value to memory.
CUresult cuStreamWriteValue64 ( CUstream stream, CUdeviceptr addr, cuuint64_t value, unsigned int flags )
Write a value to memory.

Functions

CUresult cuEventCreate ( CUevent* phEvent, unsigned int Flags )
Creates an event.
Parameters
phEvent
- Returns newly created event
Flags
- Event creation flags
Description

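Creates an event *phEvent with the flags specified via Flags. Valid flags include:

  • CU_EVENT_DEFAULT: Default event creation flag.

  • CU_EVENT_BLOCKING_SYNC: Specifies that the created event should use blocking synchronization. A CPU thread that uses cuEventSynchronize() to wait on an event created with this flag will block until the event has actually been completed.

  • CU_EVENT_DISABLE_TIMING: Specifies that the created event does not need to record timing data.

  • CU_EVENT_INTERPROCESS: Specifies that the created event may be used as an interprocess event. CU_EVENT_INTERPROCESS must be specified along with CU_EVENT_DISABLE_TIMING.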

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuEventRecord, cuEventQuery, cuEventSynchronize, cuEventDestroy, cuEventElapsedTime, cudaEventCreate, cudaEventCreateWithFlags

CUresult cuEventDestroy ( CUevent hEvent )
Destroys an event.
Parameters
hEvent
- Event to destroy
Description

Destroys the event specified by hEvent.

If hEvent has been recorded but has not yet completed when cuEventDestroy() is called, the function returns immediately and the resources associated with hEvent are released automatically once the device has completed hEvent.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuEventCreate, cuEventRecord, cuEventQuery, cuEventSynchronize, cuEventElapsedTime, cudaEventDestroy

CUresult cuEventElapsedTime ( float* pMilliseconds, CUevent hStart, CUevent hEnd )
Computes the elapsed time between two events.
Parameters
pMilliseconds
- Time between hStart and hEnd in ms
hStart
- Starting event
hEnd
- Ending event
Description

Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). Because cuEventRecord() takes place asynchronously, there is no guarantee that the measured latency covers only the work between the two events. Any number of operations in other streams could execute between the two measured events and significantly alter the timing.

If cuEventRecord() has not been called on either event then CUDA_ERROR_INVALID_HANDLE is returned. If cuEventRecord() has been called on both events but one or both of them has not yet been completed (that is, cuEventQuery() would return CUDA_ERROR_NOT_READY on at least one of the events), CUDA_ERROR_NOT_READY is returned. If either event was created with the CU_EVENT_DISABLE_TIMING flag, then this function will return CUDA_ERROR_INVALID_HANDLE.
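The return-code rules in the paragraph above can be read as a small decision procedure. The following is a host-only sketch of that reading, not CUDA code: the `model_event` struct, the `model_result` enum, and `model_elapsed_status()` are hypothetical names introduced purely for illustration, and the real driver tracks this state internally.

```c
/* Hypothetical host-side model of the documented return-code rules.
   None of these names exist in the CUDA driver API. */
typedef enum {
    MODEL_SUCCESS,
    MODEL_INVALID_HANDLE, /* stands in for CUDA_ERROR_INVALID_HANDLE */
    MODEL_NOT_READY       /* stands in for CUDA_ERROR_NOT_READY */
} model_result;

typedef struct {
    int timing_disabled; /* created with CU_EVENT_DISABLE_TIMING */
    int recorded;        /* cuEventRecord() has been called on it */
    int completed;       /* the device has completed the recorded event */
} model_event;

static model_result model_elapsed_status(const model_event *start,
                                         const model_event *end)
{
    /* Either event lacks timing data: invalid handle. */
    if (start->timing_disabled || end->timing_disabled)
        return MODEL_INVALID_HANDLE;
    /* Either event was never recorded: invalid handle. */
    if (!start->recorded || !end->recorded)
        return MODEL_INVALID_HANDLE;
    /* Both recorded, but at least one is still pending: not ready. */
    if (!start->completed || !end->completed)
        return MODEL_NOT_READY;
    return MODEL_SUCCESS;
}
```

In this model, an elapsed time is available only when both events carry timing data, have been recorded, and have completed. The two invalid-handle checks can be evaluated in either order because they produce the same result.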

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuEventCreate, cuEventRecord, cuEventQuery, cuEventSynchronize, cuEventDestroy, cudaEventElapsedTime

CUresult cuEventQuery ( CUevent hEvent )
Queries an event's status.
Parameters
hEvent
- Event to query
Description

Queries the status of all device work preceding the most recent call to cuEventRecord() (in the appropriate compute streams, as specified by the arguments to cuEventRecord()).

If this work has successfully been completed by the device, or if cuEventRecord() has not been called on hEvent, then CUDA_SUCCESS is returned. If this work has not yet been completed by the device then CUDA_ERROR_NOT_READY is returned.

For the purposes of Unified Memory, a return value of CUDA_SUCCESS is equivalent to having called cuEventSynchronize().

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuEventCreate, cuEventRecord, cuEventSynchronize, cuEventDestroy, cuEventElapsedTime, cudaEventQuery

CUresult cuEventRecord ( CUevent hEvent, CUstream hStream )
Records an event.
Parameters
hEvent
- Event to record
hStream
- Stream to record event for
Description

Records an event. See note on NULL stream behavior. Since the operation is asynchronous, cuEventQuery() or cuEventSynchronize() must be used to determine when the event has actually been recorded.

If cuEventRecord() has previously been called on hEvent, then this call will overwrite any existing state in hEvent. Any subsequent calls which examine the status of hEvent will only examine the completion of this most recent call to cuEventRecord().

hEvent and hStream must be created on the same context.

Note:
  • This function uses standard default stream semantics.

  • Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuEventCreate, cuEventQuery, cuEventSynchronize, cuStreamWaitEvent, cuEventDestroy, cuEventElapsedTime, cudaEventRecord

CUresult cuEventSynchronize ( CUevent hEvent )
Waits for an event to complete.
Parameters
hEvent
- Event to wait for
Description

Waits until the completion of all device work preceding the most recent call to cuEventRecord() (in the appropriate compute streams, as specified by the arguments to cuEventRecord()).

If cuEventRecord() has not been called on hEvent, CUDA_SUCCESS is returned immediately.

Waiting for an event that was created with the CU_EVENT_BLOCKING_SYNC flag will cause the calling CPU thread to block until the event has been completed by the device. If the CU_EVENT_BLOCKING_SYNC flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuEventCreate, cuEventRecord, cuEventQuery, cuEventDestroy, cuEventElapsedTime, cudaEventSynchronize

CUresult cuStreamBatchMemOp ( CUstream stream, unsigned int count, CUstreamBatchMemOpParams* paramArray, unsigned int flags )
Batch operations to synchronize the stream via memory operations.
Parameters
stream
- The stream to enqueue the operations in.
count
- The number of operations in the array. Must be less than 256.
paramArray
- The types and parameters of the individual operations.
flags
- Reserved for future expansion; must be 0.
Description

This is a batch version of cuStreamWaitValue32() and cuStreamWriteValue32(). Batching operations may avoid some performance overhead in both the API call and the device execution versus adding them to the stream in separate API calls. The operations are enqueued in the order they appear in the array.

See CUstreamBatchMemOpType for the full set of supported operations, and cuStreamWaitValue32(), cuStreamWaitValue64(), cuStreamWriteValue32(), and cuStreamWriteValue64() for details of specific operations.

Basic support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS. See related APIs for details on querying support for specific operations.
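Because count must be less than 256, a longer sequence of operations would have to be split across several calls. The sketch below shows only the chunking arithmetic on the host. The helper names and the `MAX_OPS_PER_BATCH` constant are assumptions made for illustration (255 being the largest value that satisfies "less than 256"), and no CUDA call is made.

```c
#include <stddef.h>

/* Assumption: the largest count that satisfies "must be less than 256". */
#define MAX_OPS_PER_BATCH ((size_t)255)

/* Hypothetical helper: how many batched calls are needed for total_ops. */
static size_t batch_calls_needed(size_t total_ops)
{
    /* Ceiling division; zero operations need zero calls. */
    return (total_ops + MAX_OPS_PER_BATCH - 1) / MAX_OPS_PER_BATCH;
}

/* Hypothetical helper: the count to pass for the batch that starts at
   index offset in the full operation array. */
static size_t batch_size_at(size_t total_ops, size_t offset)
{
    if (offset >= total_ops)
        return 0; /* nothing left to enqueue */
    size_t remaining = total_ops - offset;
    return remaining < MAX_OPS_PER_BATCH ? remaining : MAX_OPS_PER_BATCH;
}
```

A caller would advance offset by the returned batch size after each call and pass `paramArray + offset` for the next one. Splitting a sequence this way gives up part of the overhead saving that batching is meant to provide, so it is presumably worthwhile only when the sequence cannot be shortened.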

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuStreamWaitValue32, cuStreamWaitValue64, cuStreamWriteValue32, cuStreamWriteValue64, cuMemHostRegister

CUresult cuStreamWaitValue32 ( CUstream stream, CUdeviceptr addr, cuuint32_t value, unsigned int flags )
Wait on a memory location.
Parameters
stream
- The stream to synchronize on the memory location.
addr
- The memory location to wait on.
value
- The value to compare with the memory location.
flags
- See CUstreamWaitValue_flags.
Description

Enqueues a synchronization of the stream on the given memory location. Work ordered after the operation will block until the given condition on the memory is satisfied. By default, the condition is to wait for (int32_t)(*addr - value) >= 0, a cyclic greater-or-equal. Other condition types can be specified via flags.
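The cyclic greater-or-equal condition is easiest to see on plain host integers. The sketch below evaluates the same expression outside of any stream. The helper names `cyclic_geq32()` and `cyclic_geq64()` are hypothetical and not part of the CUDA API, and the code assumes a two's-complement platform where converting an out-of-range unsigned value to a signed type wraps.

```c
#include <stdint.h>

/* Hypothetical helper: the default 32-bit wait condition,
   (int32_t)(*addr - value) >= 0, applied to plain integers. */
static int cyclic_geq32(uint32_t current, uint32_t target)
{
    /* The unsigned subtraction wraps modulo 2^32; reading the difference
       as signed treats values up to 2^31 - 1 steps ahead of target as
       satisfying the condition, even across a counter wraparound. */
    return (int32_t)(current - target) >= 0;
}

/* Hypothetical helper: the same condition at 64-bit width. */
static int cyclic_geq64(uint64_t current, uint64_t target)
{
    return (int64_t)(current - target) >= 0;
}
```

With these helpers, `cyclic_geq32(5, 3)` is 1 and `cyclic_geq32(3, 5)` is 0, as with an ordinary comparison. `cyclic_geq32(2, 0xFFFFFFFE)` is also 1, because a counter that has wrapped from 0xFFFFFFFE to 2 is four steps ahead of the target. This is what makes the default condition suitable for monotonically increasing sequence numbers that may overflow.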

If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed memory (cuMemAllocManaged).

Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS. The only requirement for basic support is that on Windows, a device must be in TCC mode.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuStreamWaitValue64, cuStreamWriteValue32, cuStreamWriteValue64, cuStreamBatchMemOp, cuMemHostRegister, cuStreamWaitEvent

CUresult cuStreamWaitValue64 ( CUstream stream, CUdeviceptr addr, cuuint64_t value, unsigned int flags )
Wait on a memory location.
Parameters
stream
- The stream to synchronize on the memory location.
addr
- The memory location to wait on.
value
- The value to compare with the memory location.
flags
- See CUstreamWaitValue_flags.
Description

Enqueues a synchronization of the stream on the given memory location. Work ordered after the operation will block until the given condition on the memory is satisfied. By default, the condition is to wait for (int64_t)(*addr - value) >= 0, a cyclic greater-or-equal. Other condition types can be specified via flags.

If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer().

Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS. The requirements are compute capability 7.0 or greater, and on Windows, that the device be in TCC mode.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuStreamWaitValue32, cuStreamWriteValue32, cuStreamWriteValue64, cuStreamBatchMemOp, cuMemHostRegister, cuStreamWaitEvent

CUresult cuStreamWriteValue32 ( CUstream stream, CUdeviceptr addr, cuuint32_t value, unsigned int flags )
Write a value to memory.
Parameters
stream
- The stream to do the write in.
addr
- The device address to write to.
value
- The value to write.
flags
- See CUstreamWriteValue_flags.
Description

Write a value to memory. Unless the CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER flag is passed, the write is preceded by a system-wide memory fence, equivalent to a __threadfence_system() but scoped to the stream rather than a CUDA thread.

If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed memory (cuMemAllocManaged).

Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS. The only requirement for basic support is that on Windows, a device must be in TCC mode.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuStreamWriteValue64, cuStreamWaitValue32, cuStreamWaitValue64, cuStreamBatchMemOp, cuMemHostRegister, cuEventRecord

CUresult cuStreamWriteValue64 ( CUstream stream, CUdeviceptr addr, cuuint64_t value, unsigned int flags )
Write a value to memory.
Parameters
stream
- The stream to do the write in.
addr
- The device address to write to.
value
- The value to write.
flags
- See CUstreamWriteValue_flags.
Description

Write a value to memory. Unless the CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER flag is passed, the write is preceded by a system-wide memory fence, equivalent to a __threadfence_system() but scoped to the stream rather than a CUDA thread.

If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer().

Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS. The requirements are compute capability 7.0 or greater, and on Windows, that the device be in TCC mode.

Note:

Note that this function may also return error codes from previous, asynchronous launches.

See also:

cuStreamWriteValue32, cuStreamWaitValue32, cuStreamWaitValue64, cuStreamBatchMemOp, cuMemHostRegister, cuEventRecord