3. Modules

3.1. CUPTI Result Codes

Error and result codes returned by CUPTI functions.

Enumerations

enum CUptiResult
CUPTI result codes.

Functions

CUptiResult cuptiGetResultString ( CUptiResult result, const char** str )
Get the descriptive string for a CUptiResult.

Enumerations

enum CUptiResult

Error and result codes returned by CUPTI functions.

Values
CUPTI_SUCCESS = 0
No error.
CUPTI_ERROR_INVALID_PARAMETER = 1
One or more of the parameters is invalid.
CUPTI_ERROR_INVALID_DEVICE = 2
The device does not correspond to a valid CUDA device.
CUPTI_ERROR_INVALID_CONTEXT = 3
The context is NULL or not valid.
CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID = 4
The event domain id is invalid.
CUPTI_ERROR_INVALID_EVENT_ID = 5
The event id is invalid.
CUPTI_ERROR_INVALID_EVENT_NAME = 6
The event name is invalid.
CUPTI_ERROR_INVALID_OPERATION = 7
The current operation cannot be performed due to dependency on other factors.
CUPTI_ERROR_OUT_OF_MEMORY = 8
Unable to allocate enough memory to perform the requested operation.
CUPTI_ERROR_HARDWARE = 9
An error occurred on the performance monitoring hardware.
CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT = 10
The output buffer size is not sufficient to return all requested data.
CUPTI_ERROR_API_NOT_IMPLEMENTED = 11
API is not implemented.
CUPTI_ERROR_MAX_LIMIT_REACHED = 12
The maximum limit is reached.
CUPTI_ERROR_NOT_READY = 13
The object is not yet ready to perform the requested operation.
CUPTI_ERROR_NOT_COMPATIBLE = 14
The current operation is not compatible with the current state of the object.
CUPTI_ERROR_NOT_INITIALIZED = 15
CUPTI is unable to initialize its connection to the CUDA driver.
CUPTI_ERROR_INVALID_METRIC_ID = 16
The metric id is invalid.
CUPTI_ERROR_INVALID_METRIC_NAME = 17
The metric name is invalid.
CUPTI_ERROR_QUEUE_EMPTY = 18
The queue is empty.
CUPTI_ERROR_INVALID_HANDLE = 19
Invalid handle (internal?).
CUPTI_ERROR_INVALID_STREAM = 20
Invalid stream.
CUPTI_ERROR_INVALID_KIND = 21
Invalid kind.
CUPTI_ERROR_INVALID_EVENT_VALUE = 22
Invalid event value.
CUPTI_ERROR_DISABLED = 23
CUPTI is disabled due to conflicts with other enabled profilers.
CUPTI_ERROR_INVALID_MODULE = 24
Invalid module.
CUPTI_ERROR_INVALID_METRIC_VALUE = 25
Invalid metric value.
CUPTI_ERROR_HARDWARE_BUSY = 26
The performance monitoring hardware is in use by another client.
CUPTI_ERROR_NOT_SUPPORTED = 27
The attempted operation is not supported on the current system or device.
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED = 28
Unified memory profiling is not supported on the system. A potential reason is an unsupported OS or architecture.
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_DEVICE = 29
Unified memory profiling is not supported on the device.
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_NON_P2P_DEVICES = 30
Unified memory profiling is not supported on a multi-GPU configuration without P2P support between any pair of devices.
CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_WITH_MPS = 31
Unified memory profiling is not supported under the Multi-Process Service (MPS) environment. CUDA 7.5 removes this restriction.
CUPTI_ERROR_CDP_TRACING_NOT_SUPPORTED = 32
In CUDA 9.0, devices with compute capability 7.0 don't support CDP tracing.
CUPTI_ERROR_VIRTUALIZED_DEVICE_NOT_SUPPORTED = 33
Profiling on virtualized GPU is not supported.
CUPTI_ERROR_CUDA_COMPILER_NOT_COMPATIBLE = 34
Profiling results might be incorrect for CUDA applications compiled with an nvcc version older than 9.0 for devices with compute capability 6.0 and 6.1. The profiling session will continue, and CUPTI will report the condition using this error code. Users are advised to recompile the application code with nvcc version 9.0 or later. Ignore this warning if the code is already compiled with the recommended nvcc version.
CUPTI_ERROR_INSUFFICIENT_PRIVILEGES = 35
The user doesn't have the privileges required to start the profiling session. One possible reason is that the NVIDIA driver or your system administrator has restricted access to the NVIDIA GPU performance counters. To learn how to resolve this issue and find more information, please visit https://developer.nvidia.com/CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
CUPTI_ERROR_OLD_PROFILER_API_INITIALIZED = 36
The old profiling APIs are not supported together with the new profiling APIs.
CUPTI_ERROR_OPENACC_UNDEFINED_ROUTINE = 37
Missing definition of the OpenACC API routine in the linked OpenACC library. One possible reason is that the OpenACC library is linked statically in the user application, which might not have the definition of all the OpenACC API routines needed for OpenACC profiling, as the compiler might ignore definitions for functions not used in the application. This issue can be mitigated by linking the OpenACC library dynamically.
CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED = 38
Legacy CUPTI Profiling is not supported on devices with Compute Capability 7.5 or higher (Turing+). This error is used to identify this case and differentiate it from other errors.
CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED = 39
CUPTI doesn't allow multiple callback subscribers. Only a single subscriber can be registered at a time. The same error code is used when the application is launched using NVIDIA tools like nvprof, Visual Profiler, Nsight Systems, Nsight Compute, cuda-gdb and cuda-memcheck.
CUPTI_ERROR_UNKNOWN = 999
An unknown internal error has occurred.
CUPTI_ERROR_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiGetResultString ( CUptiResult result, const char** str )
Get the descriptive string for a CUptiResult.
Parameters
result
The result to get the string for
str
Returns the string
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_INVALID_PARAMETER

    if str is NULL or result is not a valid CUptiResult

Description

Return the descriptive string for a CUptiResult in *str.

Note:

Thread-safety: this function is thread safe.
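
Example (sketch): the pattern below shows one way to turn a result code into a log message. It is a minimal, self-contained illustration and does not link against CUPTI: the DemoResult enum mirrors a few values from the table above, and demo_get_result_string is a hypothetical stand-in that only follows the documented contract of cuptiGetResultString (string written to *str, CUPTI_ERROR_INVALID_PARAMETER when str is NULL or the code is not recognized). The strings it returns are illustrative and are not claimed to match the strings CUPTI returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Local mirror of a few CUptiResult values from the table above.
 * Real code would include <cupti.h> instead of redefining them. */
typedef enum {
    DEMO_CUPTI_SUCCESS = 0,
    DEMO_CUPTI_ERROR_INVALID_PARAMETER = 1,
    DEMO_CUPTI_ERROR_NOT_INITIALIZED = 15,
    DEMO_CUPTI_ERROR_UNKNOWN = 999
} DemoResult;

/* Hypothetical stand-in with the same contract as cuptiGetResultString. */
static DemoResult demo_get_result_string(DemoResult result, const char **str)
{
    if (str == NULL)
        return DEMO_CUPTI_ERROR_INVALID_PARAMETER;
    switch (result) {
    case DEMO_CUPTI_SUCCESS:                 *str = "CUPTI_SUCCESS"; break;
    case DEMO_CUPTI_ERROR_INVALID_PARAMETER: *str = "CUPTI_ERROR_INVALID_PARAMETER"; break;
    case DEMO_CUPTI_ERROR_NOT_INITIALIZED:   *str = "CUPTI_ERROR_NOT_INITIALIZED"; break;
    case DEMO_CUPTI_ERROR_UNKNOWN:           *str = "CUPTI_ERROR_UNKNOWN"; break;
    default:
        return DEMO_CUPTI_ERROR_INVALID_PARAMETER; /* not a known result code */
    }
    return DEMO_CUPTI_SUCCESS;
}

/* Writes a one-line description of `result` into `out`.
 * Returns 1 if `result` indicates a failure, 0 for success. */
static int demo_describe(DemoResult result, char *out, size_t outSize)
{
    const char *name = NULL;
    assert(out != NULL && outSize > 0);
    /* The lookup itself can fail, so fall back to a placeholder name. */
    if (demo_get_result_string(result, &name) != DEMO_CUPTI_SUCCESS)
        name = "<unrecognized result code>";
    snprintf(out, outSize, "CUPTI call returned %d (%s)", (int)result, name);
    return result != DEMO_CUPTI_SUCCESS;
}
```

With the real library, the same shape applies with CUptiResult and cuptiGetResultString substituted for the stand-ins.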

3.2. CUPTI Version

Function and macro to determine the CUPTI version.

Defines

#define CUPTI_API_VERSION 13
The API version for this implementation of CUPTI.

Functions

CUptiResult cuptiGetVersion ( uint32_t* version )
Get the CUPTI API version.

Defines

#define CUPTI_API_VERSION 13

The API version for this implementation of CUPTI. This define along with cuptiGetVersion can be used to dynamically detect if the version of CUPTI compiled against matches the version of the loaded CUPTI library.

v1 : CUDAToolsSDK 4.0
v2 : CUDAToolsSDK 4.1
v3 : CUDA Toolkit 5.0
v4 : CUDA Toolkit 5.5
v5 : CUDA Toolkit 6.0
v6 : CUDA Toolkit 6.5
v7 : CUDA Toolkit 6.5 (with sm_52 support)
v8 : CUDA Toolkit 7.0
v9 : CUDA Toolkit 8.0
v10 : CUDA Toolkit 9.0
v11 : CUDA Toolkit 9.1
v12 : CUDA Toolkit 10.0, 10.1 and 10.2
v13 : CUDA Toolkit 11.0

Functions

CUptiResult cuptiGetVersion ( uint32_t* version )
Get the CUPTI API version.
Parameters
version
Returns the version
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_INVALID_PARAMETER

    if version is NULL

Description

Return the API version in *version.

See also:

CUPTI_API_VERSION
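
Example (sketch): the version check described above can be expressed as a comparison between the compile-time value and the run-time value. The code below is a self-contained illustration that does not call CUPTI; in real code `compiledAgainst` would be CUPTI_API_VERSION and `loaded` would be the value written by cuptiGetVersion. The helper names and the lookup table are hypothetical; the table contents are copied from the version list in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toolkit names indexed by API version (slot 0 is unused). */
static const char *const kDemoToolkitByVersion[] = {
    NULL,
    "CUDAToolsSDK 4.0",
    "CUDAToolsSDK 4.1",
    "CUDA Toolkit 5.0",
    "CUDA Toolkit 5.5",
    "CUDA Toolkit 6.0",
    "CUDA Toolkit 6.5",
    "CUDA Toolkit 6.5 (with sm_52 support)",
    "CUDA Toolkit 7.0",
    "CUDA Toolkit 8.0",
    "CUDA Toolkit 9.0",
    "CUDA Toolkit 9.1",
    "CUDA Toolkit 10.0, 10.1 and 10.2",
    "CUDA Toolkit 11.0"
};

/* Returns the toolkit name for an API version, or NULL if the version
 * is outside the range listed in this document (v1 to v13). */
static const char *demo_toolkit_for_version(uint32_t version)
{
    size_t n = sizeof kDemoToolkitByVersion / sizeof kDemoToolkitByVersion[0];
    assert(n == 14); /* v1..v13 plus the unused slot 0 */
    return (version >= 1 && version < n) ? kDemoToolkitByVersion[version] : NULL;
}

/* Returns <0 if the loaded library is older than the headers used at
 * compile time, 0 if they match, and >0 if the loaded library is newer. */
static int demo_compare_versions(uint32_t compiledAgainst, uint32_t loaded)
{
    return (loaded > compiledAgainst) - (loaded < compiledAgainst);
}
```

A non-zero comparison result is the mismatch the define is meant to detect; how to react to it (warn, abort, or continue) is left to the application.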

3.3. CUPTI Activity API

Functions, types, and enums that implement the CUPTI Activity API.

Classes

struct 
The base activity record.
struct 
The activity record for a driver or runtime API invocation.
struct 
Device auto boost state structure.
struct 
The activity record for source level result branch. (deprecated).
struct 
The activity record for source level result branch.
struct 
The activity record for CDP (CUDA Dynamic Parallelism) kernel.
struct 
The activity record for a context.
struct 
The activity record for CUDA event.
struct 
The activity record for a device. (deprecated).
struct 
The activity record for a device. (CUDA 7.0 onwards).
struct 
The activity record for a device attribute.
struct 
The activity record for CUPTI environmental data.
struct 
The activity record for a CUPTI event.
struct 
The activity record for a CUPTI event with instance information.
struct 
The activity record for correlation with external records.
struct 
The activity record for global/device functions.
struct 
The activity record for source-level global access. (deprecated).
struct 
The activity record for source-level global access. (deprecated in CUDA 9.0).
struct 
The activity record for source-level global access.
struct 
The activity record for an instantaneous CUPTI event.
struct 
The activity record for an instantaneous CUPTI event with event domain instance information.
struct 
The activity record for an instantaneous CUPTI metric.
struct 
The instantaneous activity record for a CUPTI metric with instance information.
struct 
The activity record for source-level sass/source line-by-line correlation.
struct 
The activity record for source-level instruction execution.
struct 
The activity record for kernel. (deprecated).
struct 
The activity record for kernel. (deprecated).
struct 
The activity record for a kernel (CUDA 6.5(with sm_52 support) onwards). (deprecated in CUDA 9.0).
struct 
The activity record for a kernel (CUDA 9.0(with sm_70 support) onwards). (deprecated in CUDA 11.0).
struct 
struct 
The activity record providing a marker which is an instantaneous point in time. (deprecated in CUDA 8.0).
struct 
The activity record providing a marker which is an instantaneous point in time.
struct 
The activity record providing detailed information for a marker.
struct 
The activity record for memory copies.
struct 
The activity record for memory copies.
struct 
The activity record for peer-to-peer memory copies.
struct 
The activity record for peer-to-peer memory copies.
struct 
The activity record for memory.
struct 
The activity record for memset.
struct 
The activity record for memset.
struct 
The activity record for a CUPTI metric.
struct 
The activity record for a CUPTI metric with instance information.
struct 
The activity record for a CUDA module.
struct 
The activity record providing a name.
struct 
NVLink information. (deprecated in CUDA 9.0).
struct 
NVLink information. (deprecated in CUDA 10.0).
struct 
NVLink information.
union 
Identifiers for object kinds as specified by CUpti_ActivityObjectKind.
struct 
The base activity record for OpenACC records.
struct 
The activity record for OpenACC data.
struct 
The activity record for OpenACC launch.
struct 
The activity record for OpenACC other.
struct 
The base activity record for OpenMP records.
struct 
The activity record for CUPTI and driver overheads.
struct 
The activity record for PC sampling. (deprecated in CUDA 8.0).
struct 
The activity record for PC sampling. (deprecated in CUDA 9.0).
struct 
The activity record for PC sampling.
struct 
PC sampling configuration structure.
struct 
The activity record for record status for PC sampling.
struct 
PCI devices information required to construct topology.
struct 
The activity record for a preemption of a CDP kernel.
struct 
The activity record for source-level shared access.
struct 
The activity record for source locator.
struct 
The activity record for CUDA stream.
struct 
The activity record for synchronization management.
struct 
The activity record for Unified Memory counters (deprecated in CUDA 7.0).
struct 
The activity record for Unified Memory counters (CUDA 7.0 and beyond).
struct 
Unified Memory counters configuration structure.

Defines

#define CUPTI_AUTO_BOOST_INVALID_CLIENT_PID 0
#define CUPTI_CORRELATION_ID_UNKNOWN 0
#define CUPTI_FUNCTION_INDEX_ID_INVALID 0
#define CUPTI_GRID_ID_UNKNOWN 0LL
#define CUPTI_MAX_NVLINK_PORTS 16
#define CUPTI_NVLINK_INVALID_PORT -1
#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN 0
#define CUPTI_SYNCHRONIZATION_INVALID_VALUE -1
#define CUPTI_TIMESTAMP_UNKNOWN 0LL

Typedefs

typedef void ( *CUpti_BuffersCallbackCompleteFunc )( CUcontext context, uint32_t streamId, uint8_t* buffer, size_t size, size_t validSize )
Function type for callback used by CUPTI to return a buffer of activity records.
typedef void ( *CUpti_BuffersCallbackRequestFunc )( uint8_t** buffer, size_t* size, size_t* maxNumRecords )
Function type for callback used by CUPTI to request an empty buffer for storing activity records.

Enumerations

enum CUpti_ActivityAttribute
Activity attributes.
enum CUpti_ActivityComputeApiKind
The kind of a compute API.
enum CUpti_ActivityEnvironmentKind
The kind of environment data. Used to indicate what type of data is being reported by an environment activity record.
enum CUpti_ActivityFlag
Flags associated with activity records.
enum CUpti_ActivityInstructionClass
SASS instruction classification.
enum CUpti_ActivityKind
The kinds of activity records.
enum CUpti_ActivityLaunchType
The type of the CUDA kernel launch.
enum CUpti_ActivityMemcpyKind
The kind of a memory copy, indicating the source and destination targets of the copy.
enum CUpti_ActivityMemoryKind
The kinds of memory accessed by a memory operation/copy.
enum CUpti_ActivityObjectKind
The kinds of activity objects.
enum CUpti_ActivityOverheadKind
The kinds of activity overhead.
enum CUpti_ActivityPCSamplingPeriod
Sampling period for the PC sampling method. The sampling period can be set using cuptiActivityConfigurePCSampling.
enum CUpti_ActivityPCSamplingStallReason
The stall reason for PC sampling activity.
enum CUpti_ActivityPartitionedGlobalCacheConfig
Partitioned global caching option.
enum CUpti_ActivityPreemptionKind
The kind of a preemption activity.
enum CUpti_ActivityStreamFlag
Stream type.
enum CUpti_ActivitySynchronizationType
Synchronization type.
enum CUpti_ActivityThreadIdType
Thread-Id types.
enum CUpti_ActivityUnifiedMemoryAccessType
Memory access type for unified memory page faults.
enum CUpti_ActivityUnifiedMemoryCounterKind
Kind of the Unified Memory counter.
enum CUpti_ActivityUnifiedMemoryCounterScope
Scope of the unified memory counter (deprecated in CUDA 7.0).
enum CUpti_ActivityUnifiedMemoryMigrationCause
Migration cause of the Unified Memory counter.
enum CUpti_DevType
The device type for device connected to NVLink.
enum CUpti_DeviceVirtualizationMode
enum CUpti_EnvironmentClocksThrottleReason
Reasons for clock throttling.
enum CUpti_ExternalCorrelationKind
The kind of external APIs supported for correlation.
enum CUpti_FuncShmemLimitConfig
The shared memory limit per block config for a kernel. This should be used to set the 'cudaOccFuncShmemConfig' field in the occupancy calculator API.
enum CUpti_LinkFlag
Link flags.
enum CUpti_OpenAccConstructKind
The OpenACC parent construct kind for OpenACC activity records.
enum CUpti_OpenAccEventKind
The OpenACC event kind for OpenACC activity records.
enum CUpti_PcieDeviceType
enum CUpti_PcieGen

Functions

CUptiResult cuptiActivityConfigurePCSampling ( CUcontext ctx, CUpti_ActivityPCSamplingConfig* config )
Set PC sampling configuration.
CUptiResult cuptiActivityConfigureUnifiedMemoryCounter ( CUpti_ActivityUnifiedMemoryCounterConfig* config, uint32_t count )
Set Unified Memory Counter configuration.
CUptiResult cuptiActivityDisable ( CUpti_ActivityKind kind )
Disable collection of a specific kind of activity record.
CUptiResult cuptiActivityDisableContext ( CUcontext context, CUpti_ActivityKind kind )
Disable collection of a specific kind of activity record for a context.
CUptiResult cuptiActivityEnable ( CUpti_ActivityKind kind )
Enable collection of a specific kind of activity record.
CUptiResult cuptiActivityEnableContext ( CUcontext context, CUpti_ActivityKind kind )
Enable collection of a specific kind of activity record for a context.
CUptiResult cuptiActivityEnableLatencyTimestamps ( uint8_t enable )
Controls the collection of queued and submitted timestamps for kernels.
CUptiResult cuptiActivityFlush ( CUcontext context, uint32_t streamId, uint32_t flag )
Wait until all activity records are delivered via the completion callback.
CUptiResult cuptiActivityFlushAll ( uint32_t flag )
Wait until all activity records are delivered via the completion callback.
CUptiResult cuptiActivityGetAttribute ( CUpti_ActivityAttribute attr, size_t* valueSize, void* value )
Read an activity API attribute.
CUptiResult cuptiActivityGetNextRecord ( uint8_t* buffer, size_t validBufferSizeBytes, CUpti_Activity** record )
Iterate over the activity records in a buffer.
CUptiResult cuptiActivityGetNumDroppedRecords ( CUcontext context, uint32_t streamId, size_t* dropped )
Get the number of activity records that were dropped because of insufficient buffer space.
CUptiResult cuptiActivityPopExternalCorrelationId ( CUpti_ExternalCorrelationKind kind, uint64_t* lastId )
Pop an external correlation id for the calling thread.
CUptiResult cuptiActivityPushExternalCorrelationId ( CUpti_ExternalCorrelationKind kind, uint64_t id )
Push an external correlation id for the calling thread.
CUptiResult cuptiActivityRegisterCallbacks ( CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted )
Registers callback functions with CUPTI for activity buffer handling.
CUptiResult cuptiActivitySetAttribute ( CUpti_ActivityAttribute attr, size_t* valueSize, void* value )
Write an activity API attribute.
CUptiResult cuptiComputeCapabilitySupported ( int  major, int  minor, int* support )
Check support for a compute capability.
CUptiResult cuptiDeviceSupported ( CUdevice dev, int* support )
Check support for a compute device.
CUptiResult cuptiDeviceVirtualizationMode ( CUdevice dev, CUpti_DeviceVirtualizationMode* mode )
Query the virtualization mode of the device.
CUptiResult cuptiFinalize ( void )
Detach CUPTI from the running process.
CUptiResult cuptiGetAutoBoostState ( CUcontext context, CUpti_ActivityAutoBoostState* state )
Get auto boost state.
CUptiResult cuptiGetContextId ( CUcontext context, uint32_t* contextId )
Get the ID of a context.
CUptiResult cuptiGetDeviceId ( CUcontext context, uint32_t* deviceId )
Get the ID of a device.
CUptiResult cuptiGetGraphNodeId ( CUgraphNode node, uint64_t* nodeId )
Get the unique ID of a graph node.
CUptiResult cuptiGetLastError ( void )
Returns the last error from a CUPTI call or callback.
CUptiResult cuptiGetStreamId ( CUcontext context, CUstream stream, uint32_t* streamId )
Get the ID of a stream.
CUptiResult cuptiGetStreamIdEx ( CUcontext context, CUstream stream, uint8_t perThreadStream, uint32_t* streamId )
Get the ID of a stream.
CUptiResult cuptiGetThreadIdType ( CUpti_ActivityThreadIdType* type )
Get the thread-id type.
CUptiResult cuptiGetTimestamp ( uint64_t* timestamp )
Get the CUPTI timestamp.
CUptiResult cuptiSetThreadIdType ( CUpti_ActivityThreadIdType type )
Set the thread-id type.

Defines

#define CUPTI_AUTO_BOOST_INVALID_CLIENT_PID 0

An invalid/unknown process id.

#define CUPTI_CORRELATION_ID_UNKNOWN 0

An invalid/unknown correlation ID. A correlation ID of this value indicates that there is no correlation for the activity record.

#define CUPTI_FUNCTION_INDEX_ID_INVALID 0

An invalid function index ID.

#define CUPTI_GRID_ID_UNKNOWN 0LL

An invalid/unknown grid ID.

#define CUPTI_MAX_NVLINK_PORTS 16

Maximum NVLink port numbers.

#define CUPTI_NVLINK_INVALID_PORT -1

Invalid/unknown NVLink port number.

#define CUPTI_SOURCE_LOCATOR_ID_UNKNOWN 0

The source-locator ID that indicates an unknown source location. There is not an actual CUpti_ActivitySourceLocator object corresponding to this value.

#define CUPTI_SYNCHRONIZATION_INVALID_VALUE -1

An invalid/unknown value.

#define CUPTI_TIMESTAMP_UNKNOWN 0LL

An invalid/unknown timestamp for a start, end, queued, submitted, or completed time.

Typedefs

void ( *CUpti_BuffersCallbackCompleteFunc )( CUcontext context, uint32_t streamId, uint8_t* buffer, size_t size, size_t validSize )

Function type for callback used by CUPTI to return a buffer of activity records. This callback function returns to the CUPTI client a buffer containing activity records. The buffer contains validSize bytes of activity records, which should be read using cuptiActivityGetNextRecord. The number of dropped records can be read using cuptiActivityGetNumDroppedRecords. After this call CUPTI relinquishes ownership of the buffer and will not use it anymore. The client may return the buffer to CUPTI using the CUpti_BuffersCallbackRequestFunc callback. Note: From CUDA 6.0 onwards, all buffers returned by this callback are global buffers, i.e. there is no context/stream specific buffer. The user needs to parse the global buffer to extract the context/stream specific activity records.

Parameters
context
The context this buffer is associated with. If NULL, the buffer is associated with the global activities. This field is deprecated as of CUDA 6.0 and will always be NULL.
uint32_t streamId
buffer
The activity record buffer.
size_t size
size_t validSize

void ( *CUpti_BuffersCallbackRequestFunc )( uint8_t** buffer, size_t* size, size_t* maxNumRecords )

Function type for callback used by CUPTI to request an empty buffer for storing activity records. This callback function signals the CUPTI client that an activity buffer is needed by CUPTI. The activity buffer is used by CUPTI to store activity records. The callback function can decline the request by setting *buffer to NULL. In this case CUPTI may drop activity records.

Parameters
*buffer
size
Returns the size of the returned buffer.
maxNumRecords
Returns the maximum number of records that should be placed in the buffer. If 0 then the buffer is filled with as many records as possible. If > 0 the buffer is filled with at most that many records before it is returned.
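
Example (sketch): a minimal pair of callbacks with the shapes described above. This is an illustration under stated assumptions, not a definitive implementation: DemoContext is a stand-in for CUcontext so the code compiles without the CUDA headers, the 32 KB buffer size and the g_demoValidBytes counter are arbitrary choices made for the example, and the record parsing step is only indicated by a comment. In real code both functions would be passed to cuptiActivityRegisterCallbacks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BUF_SIZE ((size_t)32 * 1024) /* arbitrary size for the example */

typedef void *DemoContext; /* stand-in for CUcontext */

static size_t g_demoValidBytes; /* total bytes of records handed back so far */

/* Same parameter list as CUpti_BuffersCallbackRequestFunc. */
static void demo_buffer_requested(uint8_t **buffer, size_t *size,
                                  size_t *maxNumRecords)
{
    assert(buffer != NULL && size != NULL && maxNumRecords != NULL);
    /* calloc yields a zero-filled buffer; leaving *buffer NULL on
     * allocation failure declines the request, as described above. */
    *buffer = (uint8_t *)calloc(1, DEMO_BUF_SIZE);
    *size = (*buffer != NULL) ? DEMO_BUF_SIZE : 0;
    *maxNumRecords = 0; /* 0: fill with as many records as possible */
}

/* Same parameter list as CUpti_BuffersCallbackCompleteFunc. */
static void demo_buffer_completed(DemoContext context, uint32_t streamId,
                                  uint8_t *buffer, size_t size,
                                  size_t validSize)
{
    (void)context;  /* documented as always NULL from CUDA 6.0 onwards */
    (void)streamId;
    (void)size;
    /* Real code would walk the first validSize bytes here with
     * cuptiActivityGetNextRecord before releasing the buffer. */
    g_demoValidBytes += validSize;
    free(buffer); /* ownership is back with the client at this point */
}
```

Freeing the buffer in the completion callback is one option; a client could instead keep a pool and hand the same buffer out again from the request callback.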

Enumerations

enum CUpti_ActivityAttribute

These attributes are used to control the behavior of the activity API.

Values
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE = 0
The device memory size (in bytes) reserved for storing profiling data for non-CDP operations, especially for concurrent kernel tracing, for each buffer on a context. The value is a size_t. A larger buffer size means fewer flush operations but consumes more device memory. A smaller buffer size increases the risk of dropping timestamps for kernel records if too many kernels are launched/replayed at one time. This value only applies to new buffer allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations. The default value is 8388608 (8MB). Note: The actual amount of device memory per buffer reserved by CUPTI might be larger.
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP = 1
The device memory size (in bytes) reserved for storing profiling data for CDP operations for each buffer on a context. The value is a size_t. A larger buffer size means fewer flush operations but consumes more device memory. This value only applies to new allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations. The default value is 8388608 (8MB). Note: The actual amount of device memory per context reserved by CUPTI might be larger.
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT = 2
The maximum number of memory buffers per context. The value is a size_t. Buffers can be reused by the context. Increasing this value reduces the number of times CUPTI needs to flush the buffers. Setting this value will not modify the number of memory buffers currently stored. Set this value before initializing CUDA to ensure the limit is not exceeded. The default value is 100.
CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE = 3
The profiling semaphore pool size reserved for storing profiling data for serialized kernels and memory operations for each context. The value is a size_t. A larger pool size means fewer semaphore query operations but consumes more device resources. A smaller pool size increases the risk of dropping timestamps for kernel and memcpy records if too many kernels or memcpy operations are launched/replayed at one time. This value only applies to new pool allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations. The default value is 65536.
CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT = 4
The maximum number of profiling semaphore pools per context. The value is a size_t. A profiling semaphore pool can be reused by the context. Increasing this value reduces the number of times CUPTI needs to query semaphores in the pool. Setting this value will not modify the number of semaphore pools currently stored. Set this value before initializing CUDA to ensure the limit is not exceeded. The default value is 100.
CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER = 5
The flag to indicate whether the user should provide a zero-filled activity buffer. The value is a uint8_t. If the value of this attribute is non-zero, the user should provide a zero-filled buffer in the CUpti_BuffersCallbackRequestFunc. If the user does not provide a zero-filled buffer after setting this to non-zero, the activity buffer may contain some uninitialized values when CUPTI returns it in CUpti_BuffersCallbackCompleteFunc. If the value of this attribute is zero, CUPTI will initialize the user buffer received in the CUpti_BuffersCallbackRequestFunc to zero before filling it. If the user sets this to zero, a few stalls may appear in the critical path because CUPTI will zero out the buffer in the main thread. Set this value before returning from CUpti_BuffersCallbackRequestFunc to ensure it is considered for all the subsequent user buffers. The default value is 0.
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_FORCE_INT = 0x7fffffff
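
Example (sketch): a rough way to reason about the device memory trade-off described for CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. The estimate below simply multiplies the per-buffer size by the maximum number of buffers per context. This product is an assumption drawn from the attribute descriptions, not a documented formula, and the descriptions note that CUPTI might actually reserve more. The helper and macro names are hypothetical; the default values are the ones listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Default values as listed for the two attributes above. */
#define DEMO_DEFAULT_DEVICE_BUFFER_SIZE       UINT64_C(8388608) /* 8 MB */
#define DEMO_DEFAULT_DEVICE_BUFFER_POOL_LIMIT UINT64_C(100)

/* Rough per-context ceiling (in bytes) for non-CDP profiling buffers:
 * per-buffer size times the maximum number of buffers. */
static uint64_t demo_max_device_buffer_bytes(uint64_t bufferSize,
                                             uint64_t poolLimit)
{
    /* Guard against overflow of the 64-bit product. */
    assert(poolLimit == 0 || bufferSize <= UINT64_MAX / poolLimit);
    return bufferSize * poolLimit;
}
```

With the defaults, the estimate is 8388608 * 100 = 838860800 bytes (800 MB) per context if every buffer in the pool were allocated, which is why lowering either attribute can matter on devices with little free memory.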
enum CUpti_ActivityComputeApiKind

Values
CUPTI_ACTIVITY_COMPUTE_API_UNKNOWN = 0
The compute API is not known.
CUPTI_ACTIVITY_COMPUTE_API_CUDA = 1
The compute APIs are for CUDA.
CUPTI_ACTIVITY_COMPUTE_API_CUDA_MPS = 2
The compute APIs are for CUDA running in MPS (Multi-Process Service) environment.
CUPTI_ACTIVITY_COMPUTE_API_FORCE_INT = 0x7fffffff
enum CUpti_ActivityEnvironmentKind

Values
CUPTI_ACTIVITY_ENVIRONMENT_UNKNOWN = 0
Unknown data.
CUPTI_ACTIVITY_ENVIRONMENT_SPEED = 1
The environment data is related to speed.
CUPTI_ACTIVITY_ENVIRONMENT_TEMPERATURE = 2
The environment data is related to temperature.
CUPTI_ACTIVITY_ENVIRONMENT_POWER = 3
The environment data is related to power.
CUPTI_ACTIVITY_ENVIRONMENT_COOLING = 4
The environment data is related to cooling.
CUPTI_ACTIVITY_ENVIRONMENT_COUNT
CUPTI_ACTIVITY_ENVIRONMENT_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityFlag

Activity record flags. Flags can be combined by bitwise OR to associate multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below.

Values
CUPTI_ACTIVITY_FLAG_NONE = 0
Indicates the activity record has no flags.
CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS = 1<<0
Indicates the activity represents a device that supports concurrent kernel execution. Valid for CUPTI_ACTIVITY_KIND_DEVICE.
CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE = 1<<0
Indicates if the activity represents a CUdevice_attribute value or a CUpti_DeviceAttribute value. Valid for CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE.
CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC = 1<<0
Indicates the activity represents an asynchronous memcpy operation. Valid for CUPTI_ACTIVITY_KIND_MEMCPY.
CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS = 1<<0
Indicates the activity represents an instantaneous marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_START = 1<<1
Indicates the activity represents a region start marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_END = 1<<2
Indicates the activity represents a region end marker. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE = 1<<3
Indicates the activity represents an attempt to acquire a user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_SUCCESS = 1<<4
Indicates the activity represents success in acquiring the user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_FAILED = 1<<5
Indicates the activity represents failure in acquiring the user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_SYNC_RELEASE = 1<<6
Indicates the activity represents releasing a reservation on user defined synchronization object. Valid for CUPTI_ACTIVITY_KIND_MARKER.
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE = 1<<0
Indicates the activity represents a marker that does not specify a color. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.
CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB = 1<<1
Indicates the activity represents a marker that specifies a color in alpha-red-green-blue format. Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK = 0xFF<<0
The number of bytes requested by each thread. Valid for CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD = 1<<8
If this bit is set in the flag, the access was a load; otherwise it was a store. Valid for CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED = 1<<9
If this bit is set in the flag, the load access was cached; otherwise it was uncached. Valid for CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_FLAG_METRIC_OVERFLOWED = 1<<0
If this bit is set in the flag, the metric value overflowed. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_FLAG_METRIC_VALUE_INVALID = 1<<1
If this bit is set in the flag, the metric value couldn't be calculated. This occurs when one or more values required to calculate the metric are missing. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_FLAG_INSTRUCTION_VALUE_INVALID = 1<<0
If this bit is set in the flag, the source level metric value couldn't be calculated. This occurs when one or more values required to calculate the source level metric cannot be evaluated. Valid for CUpti_ActivityInstructionExecution.
CUPTI_ACTIVITY_FLAG_INSTRUCTION_CLASS_MASK = 0xFF<<1
The mask for the instruction class, CUpti_ActivityInstructionClass. Valid for CUpti_ActivityInstructionExecution and CUpti_ActivityInstructionCorrelation.
CUPTI_ACTIVITY_FLAG_FLUSH_FORCED = 1<<0
When calling cuptiActivityFlushAll, this flag can be set to force CUPTI to flush all records in the buffer, whether finished or not.
CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_SIZE_MASK = 0xFF<<0
The number of bytes requested by each thread. Valid for CUpti_ActivitySharedAccess.
CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_LOAD = 1<<8
If this bit is set in the flag, the access was a load; otherwise it was a store. Valid for CUpti_ActivitySharedAccess.
CUPTI_ACTIVITY_FLAG_MEMSET_ASYNC = 1<<0
Indicates the activity represents an asynchronous memset operation. Valid for CUPTI_ACTIVITY_KIND_MEMSET.
CUPTI_ACTIVITY_FLAG_THRASHING_IN_CPU = 1<<0
Indicates the activity represents thrashing in CPU. Valid for counter of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.
CUPTI_ACTIVITY_FLAG_THROTTLING_IN_CPU = 1<<0
Indicates the activity represents page throttling in CPU. Valid for counter of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.
CUPTI_ACTIVITY_FLAG_FORCE_INT = 0x7fffffff
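
Example (sketch): because the same bit positions are reused across activity kinds (for instance, 1<<0 appears for device, memcpy, marker, metric and other flags), a flags word can only be interpreted once the record type is known. The self-contained code below decodes the flags word of a global access record and extracts the instruction class from an instruction execution record. The DEMO_ constants mirror the values listed above; the struct and function names are hypothetical. The instruction class extraction assumes the class value is stored shifted left by one bit, as the 0xFF<<1 mask suggests; that shift is an inference from the mask, not something this section states.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of selected CUpti_ActivityFlag values from the list above. */
enum {
    DEMO_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK = 0xFF << 0,
    DEMO_FLAG_GLOBAL_ACCESS_KIND_LOAD      = 1 << 8,
    DEMO_FLAG_GLOBAL_ACCESS_KIND_CACHED    = 1 << 9,
    DEMO_FLAG_INSTRUCTION_VALUE_INVALID    = 1 << 0,
    DEMO_FLAG_INSTRUCTION_CLASS_MASK       = 0xFF << 1
};

typedef struct {
    uint32_t bytesPerThread; /* low 8 bits of the flags word */
    int isLoad;              /* 1 for a load, 0 for a store */
    int isCached;            /* only meaningful when isLoad is 1 */
} DemoGlobalAccess;

/* Decodes the flags word of a global access record. */
static DemoGlobalAccess demo_decode_global_access(uint32_t flags)
{
    DemoGlobalAccess a;
    /* The size bits and the kind bits must not overlap. */
    assert((DEMO_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK &
            (DEMO_FLAG_GLOBAL_ACCESS_KIND_LOAD |
             DEMO_FLAG_GLOBAL_ACCESS_KIND_CACHED)) == 0);
    a.bytesPerThread = flags & DEMO_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK;
    a.isLoad = (flags & DEMO_FLAG_GLOBAL_ACCESS_KIND_LOAD) != 0;
    a.isCached = (flags & DEMO_FLAG_GLOBAL_ACCESS_KIND_CACHED) != 0;
    return a;
}

/* Extracts the instruction class from the flags word of an instruction
 * execution record (assumption: the class occupies bits 1 to 8). */
static uint32_t demo_instruction_class(uint32_t flags)
{
    return (flags & DEMO_FLAG_INSTRUCTION_CLASS_MASK) >> 1;
}
```

For example, a flags word of 4 | (1<<8) | (1<<9) on a global access record decodes to a cached 4-byte load per thread.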
enum CUpti_ActivityInstructionClass

SASS instructions are broadly divided into different classes. Each enum value represents one classification.

Values
CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNKNOWN = 0
The instruction class is not known.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_32 = 1
Represents a 32 bit floating point operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_64 = 2
Represents a 64 bit floating point operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTEGER = 3
Represents an integer operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_BIT_CONVERSION = 4
Represents a bit conversion operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONTROL_FLOW = 5
Represents a control flow instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL = 6
Represents a global load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED = 7
Represents a shared load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_LOCAL = 8
Represents a local load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_GENERIC = 9
Represents a generic load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE = 10
Represents a surface load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONSTANT = 11
Represents a constant load instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_TEXTURE = 12
Represents a texture load-store instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL_ATOMIC = 13
Represents a global atomic instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED_ATOMIC = 14
Represents a shared atomic instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE_ATOMIC = 15
Represents a surface atomic instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTER_THREAD_COMMUNICATION = 16
Represents an inter-thread communication instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_BARRIER = 17
Represents a barrier instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_MISCELLANEOUS = 18
Represents some miscellaneous instructions which do not fit in the above classification.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_16 = 19
Represents a 16 bit floating point operation.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNIFORM = 20
Represents a uniform instruction.
CUPTI_ACTIVITY_INSTRUCTION_CLASS_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityKind

Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with an activity record structure that holds the information associated with the kind.

See also:

CUpti_Activity

CUpti_ActivityAPI

CUpti_ActivityContext

CUpti_ActivityDevice

CUpti_ActivityDevice2

CUpti_ActivityDeviceAttribute

CUpti_ActivityEvent

CUpti_ActivityEventInstance

CUpti_ActivityKernel

CUpti_ActivityKernel2

CUpti_ActivityKernel3

CUpti_ActivityKernel4

CUpti_ActivityKernel5

CUpti_ActivityCdpKernel

CUpti_ActivityPreemption

CUpti_ActivityMemcpy

CUpti_ActivityMemcpy3

CUpti_ActivityMemcpyPtoP

CUpti_ActivityMemcpyPtoP2

CUpti_ActivityMemset

CUpti_ActivityMemset2

CUpti_ActivityMetric

CUpti_ActivityMetricInstance

CUpti_ActivityName

CUpti_ActivityMarker

CUpti_ActivityMarker2

CUpti_ActivityMarkerData

CUpti_ActivitySourceLocator

CUpti_ActivityGlobalAccess

CUpti_ActivityGlobalAccess2

CUpti_ActivityGlobalAccess3

CUpti_ActivityBranch

CUpti_ActivityBranch2

CUpti_ActivityOverhead

CUpti_ActivityEnvironment

CUpti_ActivityInstructionExecution

CUpti_ActivityUnifiedMemoryCounter

CUpti_ActivityFunction

CUpti_ActivityModule

CUpti_ActivitySharedAccess

CUpti_ActivityPCSampling

CUpti_ActivityPCSampling2

CUpti_ActivityPCSampling3

CUpti_ActivityPCSamplingRecordInfo

CUpti_ActivityCudaEvent

CUpti_ActivityStream

CUpti_ActivitySynchronization

CUpti_ActivityInstructionCorrelation

CUpti_ActivityExternalCorrelation

CUpti_ActivityUnifiedMemoryCounter2

CUpti_ActivityOpenAccData

CUpti_ActivityOpenAccLaunch

CUpti_ActivityOpenAccOther

CUpti_ActivityOpenMp

CUpti_ActivityNvLink

CUpti_ActivityNvLink2

CUpti_ActivityNvLink3

CUpti_ActivityMemory

CUpti_ActivityPcie

Values
CUPTI_ACTIVITY_KIND_INVALID = 0
The activity record is invalid.
CUPTI_ACTIVITY_KIND_MEMCPY = 1
A host<->host, host<->device, or device<->device memory copy. The corresponding activity record structure is CUpti_ActivityMemcpy3.
CUPTI_ACTIVITY_KIND_MEMSET = 2
A memory set executing on the GPU. The corresponding activity record structure is CUpti_ActivityMemset2.
CUPTI_ACTIVITY_KIND_KERNEL = 3
A kernel executing on the GPU. This activity kind may significantly change the overall performance characteristics of the application because all kernel executions are serialized on the GPU. The other kernel activity kind, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL, doesn't break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel5.
CUPTI_ACTIVITY_KIND_DRIVER = 4
A CUDA driver API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_RUNTIME = 5
A CUDA runtime API function execution. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_EVENT = 6
An event value. The corresponding activity record structure is CUpti_ActivityEvent.
CUPTI_ACTIVITY_KIND_METRIC = 7
A metric value. The corresponding activity record structure is CUpti_ActivityMetric.
CUPTI_ACTIVITY_KIND_DEVICE = 8
Information about a device. The corresponding activity record structure is CUpti_ActivityDevice2.
CUPTI_ACTIVITY_KIND_CONTEXT = 9
Information about a context. The corresponding activity record structure is CUpti_ActivityContext.
CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL = 10
A kernel executing on the GPU. This activity kind doesn't break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel5.
CUPTI_ACTIVITY_KIND_NAME = 11
Thread, device, context, etc. name. The corresponding activity record structure is CUpti_ActivityName.
CUPTI_ACTIVITY_KIND_MARKER = 12
Instantaneous, start, or end marker. The corresponding activity record structure is CUpti_ActivityMarker2.
CUPTI_ACTIVITY_KIND_MARKER_DATA = 13
Extended, optional, data about a marker. The corresponding activity record structure is CUpti_ActivityMarkerData.
CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR = 14
Source information about source level result. The corresponding activity record structure is CUpti_ActivitySourceLocator.
CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS = 15
Results for source-level global access. The corresponding activity record structure is CUpti_ActivityGlobalAccess3.
CUPTI_ACTIVITY_KIND_BRANCH = 16
Results for source-level branch. The corresponding activity record structure is CUpti_ActivityBranch2.
CUPTI_ACTIVITY_KIND_OVERHEAD = 17
Overhead activity records. The corresponding activity record structure is CUpti_ActivityOverhead.
CUPTI_ACTIVITY_KIND_CDP_KERNEL = 18
A CDP (CUDA Dynamic Parallel) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityCdpKernel. This activity cannot be directly enabled or disabled. It is enabled and disabled through the concurrent kernel activity, i.e. CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.
CUPTI_ACTIVITY_KIND_PREEMPTION = 19
Preemption activity record indicating a preemption of a CDP (CUDA Dynamic Parallel) kernel executing on the GPU. The corresponding activity record structure is CUpti_ActivityPreemption.
CUPTI_ACTIVITY_KIND_ENVIRONMENT = 20
Environment activity records indicating power, clock, thermal, etc. levels of the GPU. The corresponding activity record structure is CUpti_ActivityEnvironment.
CUPTI_ACTIVITY_KIND_EVENT_INSTANCE = 21
An event value associated with a specific event domain instance. The corresponding activity record structure is CUpti_ActivityEventInstance.
CUPTI_ACTIVITY_KIND_MEMCPY2 = 22
A peer to peer memory copy. The corresponding activity record structure is CUpti_ActivityMemcpyPtoP2.
CUPTI_ACTIVITY_KIND_METRIC_INSTANCE = 23
A metric value associated with a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityMetricInstance.
CUPTI_ACTIVITY_KIND_INSTRUCTION_EXECUTION = 24
Results for source-level instruction execution. The corresponding activity record structure is CUpti_ActivityInstructionExecution.
CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER = 25
Unified Memory counter record. The corresponding activity record structure is CUpti_ActivityUnifiedMemoryCounter2.
CUPTI_ACTIVITY_KIND_FUNCTION = 26
Device global/function record. The corresponding activity record structure is CUpti_ActivityFunction.
CUPTI_ACTIVITY_KIND_MODULE = 27
CUDA Module record. The corresponding activity record structure is CUpti_ActivityModule.
CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE = 28
A device attribute value. The corresponding activity record structure is CUpti_ActivityDeviceAttribute.
CUPTI_ACTIVITY_KIND_SHARED_ACCESS = 29
Results for source-level shared access. The corresponding activity record structure is CUpti_ActivitySharedAccess.
CUPTI_ACTIVITY_KIND_PC_SAMPLING = 30
Enable PC sampling for kernels. This will serialize kernels. The corresponding activity record structure is CUpti_ActivityPCSampling3.
CUPTI_ACTIVITY_KIND_PC_SAMPLING_RECORD_INFO = 31
Summary information about PC sampling records. The corresponding activity record structure is CUpti_ActivityPCSamplingRecordInfo.
CUPTI_ACTIVITY_KIND_INSTRUCTION_CORRELATION = 32
SASS/Source line-by-line correlation record. This will generate sass/source correlation for functions that have source level analysis or pc sampling results. The records will be generated only when either of source level analysis or pc sampling activity is enabled. The corresponding activity record structure is CUpti_ActivityInstructionCorrelation.
CUPTI_ACTIVITY_KIND_OPENACC_DATA = 33
OpenACC data events. The corresponding activity record structure is CUpti_ActivityOpenAccData.
CUPTI_ACTIVITY_KIND_OPENACC_LAUNCH = 34
OpenACC launch events. The corresponding activity record structure is CUpti_ActivityOpenAccLaunch.
CUPTI_ACTIVITY_KIND_OPENACC_OTHER = 35
OpenACC other events. The corresponding activity record structure is CUpti_ActivityOpenAccOther.
CUPTI_ACTIVITY_KIND_CUDA_EVENT = 36
Information about a CUDA event. The corresponding activity record structure is CUpti_ActivityCudaEvent.
CUPTI_ACTIVITY_KIND_STREAM = 37
Information about a CUDA stream. The corresponding activity record structure is CUpti_ActivityStream.
CUPTI_ACTIVITY_KIND_SYNCHRONIZATION = 38
Records for synchronization management. The corresponding activity record structure is CUpti_ActivitySynchronization.
CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION = 39
Records for correlation of different programming APIs. The corresponding activity record structure is CUpti_ActivityExternalCorrelation.
CUPTI_ACTIVITY_KIND_NVLINK = 40
NVLink information. The corresponding activity record structure is CUpti_ActivityNvLink3.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT = 41
Instantaneous Event information. The corresponding activity record structure is CUpti_ActivityInstantaneousEvent.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT_INSTANCE = 42
Instantaneous Event information for a specific event domain instance. The corresponding activity record structure is CUpti_ActivityInstantaneousEventInstance.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC = 43
Instantaneous Metric information. The corresponding activity record structure is CUpti_ActivityInstantaneousMetric.
CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC_INSTANCE = 44
Instantaneous Metric information for a specific metric domain instance. The corresponding activity record structure is CUpti_ActivityInstantaneousMetricInstance.
CUPTI_ACTIVITY_KIND_MEMORY = 45
Memory activity tracking allocation and freeing of memory. The corresponding activity record structure is CUpti_ActivityMemory.
CUPTI_ACTIVITY_KIND_PCIE = 46
PCI devices information used for PCI topology. The corresponding activity record structure is CUpti_ActivityPcie.
CUPTI_ACTIVITY_KIND_OPENMP = 47
OpenMP parallel events. The corresponding activity record structure is CUpti_ActivityOpenMp.
CUPTI_ACTIVITY_KIND_INTERNAL_LAUNCH_API = 48
A CUDA driver kernel launch occurring outside of any public API function execution. Tools can handle these like records for driver API launch functions, although the cbid field is not used here. The corresponding activity record structure is CUpti_ActivityAPI.
CUPTI_ACTIVITY_KIND_COUNT = 49
CUPTI_ACTIVITY_KIND_FORCE_INT = 0x7fffffff
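
Two of the kinds above are documented as serializing kernel execution (CUPTI_ACTIVITY_KIND_KERNEL and CUPTI_ACTIVITY_KIND_PC_SAMPLING), which a tool may want to flag before enabling them. The sketch below encodes just that, plus a range check against CUPTI_ACTIVITY_KIND_COUNT. Both helper names are hypothetical, and the constants are local mirrors of this listing; the value of CUPTI_ACTIVITY_KIND_COUNT in particular should be taken from the installed header rather than hard-coded.

```c
#include <stdbool.h>

/* Local mirrors of the values listed above (assumption: this listing matches
   the installed header). When <cupti.h> is included, delete these. */
enum {
    CUPTI_ACTIVITY_KIND_INVALID           = 0,
    CUPTI_ACTIVITY_KIND_KERNEL            = 3,
    CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL = 10,
    CUPTI_ACTIVITY_KIND_PC_SAMPLING       = 30,
    CUPTI_ACTIVITY_KIND_COUNT             = 49
};

/* Hypothetical helper: true for kinds the text above says serialize kernels. */
static bool activity_kind_serializes_kernels(int kind)
{
    switch (kind) {
    case CUPTI_ACTIVITY_KIND_KERNEL:
    case CUPTI_ACTIVITY_KIND_PC_SAMPLING:
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: true for values strictly between INVALID and COUNT. */
static bool activity_kind_is_valid(int kind)
{
    return kind > CUPTI_ACTIVITY_KIND_INVALID && kind < CUPTI_ACTIVITY_KIND_COUNT;
}
```

A tool that wants kernel timing without losing concurrency would, under this reading, prefer CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL, for which the first helper returns false.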
enum CUpti_ActivityLaunchType

Values
CUPTI_ACTIVITY_LAUNCH_TYPE_REGULAR = 0
The kernel was launched via a regular kernel call.
CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_SINGLE_DEVICE = 1
The kernel was launched via the API cudaLaunchCooperativeKernel() or cuLaunchCooperativeKernel().
CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_MULTI_DEVICE = 2
The kernel was launched via the API cudaLaunchCooperativeKernelMultiDevice() or cuLaunchCooperativeKernelMultiDevice().
enum CUpti_ActivityMemcpyKind

Each kind represents the source and destination targets of a memory copy. Targets are host, device, and array.

Values
CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN = 0
The memory copy kind is not known.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOD = 1
A host to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOH = 2
A device to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOA = 3
A host to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOH = 4
A device array to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOA = 5
A device array to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_ATOD = 6
A device array to device memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOA = 7
A device to device array memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_DTOD = 8
A device to device memory copy on the same device.
CUPTI_ACTIVITY_MEMCPY_KIND_HTOH = 9
A host to host memory copy.
CUPTI_ACTIVITY_MEMCPY_KIND_PTOP = 10
A peer to peer memory copy across different devices.
CUPTI_ACTIVITY_MEMCPY_KIND_FORCE_INT = 0x7fffffff
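
Each memcpy kind above names a source and a destination target. As an illustration, the sketch below maps a kind value to that pair. `CopyTarget`, `CopyEndpoints`, and `memcpy_kind_endpoints` are hypothetical names that are not part of CUPTI, and the enum values are mirrored locally from this listing so the code compiles on its own. A peer-to-peer copy is treated here as device to device, which is an assumption based on its description.

```c
/* Local mirrors of the values listed above. When <cupti.h> is included,
   delete these definitions. */
enum {
    CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN = 0,
    CUPTI_ACTIVITY_MEMCPY_KIND_HTOD = 1,
    CUPTI_ACTIVITY_MEMCPY_KIND_DTOH = 2,
    CUPTI_ACTIVITY_MEMCPY_KIND_HTOA = 3,
    CUPTI_ACTIVITY_MEMCPY_KIND_ATOH = 4,
    CUPTI_ACTIVITY_MEMCPY_KIND_ATOA = 5,
    CUPTI_ACTIVITY_MEMCPY_KIND_ATOD = 6,
    CUPTI_ACTIVITY_MEMCPY_KIND_DTOA = 7,
    CUPTI_ACTIVITY_MEMCPY_KIND_DTOD = 8,
    CUPTI_ACTIVITY_MEMCPY_KIND_HTOH = 9,
    CUPTI_ACTIVITY_MEMCPY_KIND_PTOP = 10
};

/* Hypothetical types for this sketch only. */
typedef enum { TARGET_UNKNOWN = 0, TARGET_HOST, TARGET_DEVICE, TARGET_ARRAY } CopyTarget;
typedef struct { CopyTarget src; CopyTarget dst; } CopyEndpoints;

/* Hypothetical helper: source and destination targets for a memcpy kind. */
static CopyEndpoints memcpy_kind_endpoints(int kind)
{
    switch (kind) {
    case CUPTI_ACTIVITY_MEMCPY_KIND_HTOD: return (CopyEndpoints){TARGET_HOST, TARGET_DEVICE};
    case CUPTI_ACTIVITY_MEMCPY_KIND_DTOH: return (CopyEndpoints){TARGET_DEVICE, TARGET_HOST};
    case CUPTI_ACTIVITY_MEMCPY_KIND_HTOA: return (CopyEndpoints){TARGET_HOST, TARGET_ARRAY};
    case CUPTI_ACTIVITY_MEMCPY_KIND_ATOH: return (CopyEndpoints){TARGET_ARRAY, TARGET_HOST};
    case CUPTI_ACTIVITY_MEMCPY_KIND_ATOA: return (CopyEndpoints){TARGET_ARRAY, TARGET_ARRAY};
    case CUPTI_ACTIVITY_MEMCPY_KIND_ATOD: return (CopyEndpoints){TARGET_ARRAY, TARGET_DEVICE};
    case CUPTI_ACTIVITY_MEMCPY_KIND_DTOA: return (CopyEndpoints){TARGET_DEVICE, TARGET_ARRAY};
    case CUPTI_ACTIVITY_MEMCPY_KIND_DTOD: return (CopyEndpoints){TARGET_DEVICE, TARGET_DEVICE};
    case CUPTI_ACTIVITY_MEMCPY_KIND_HTOH: return (CopyEndpoints){TARGET_HOST, TARGET_HOST};
    case CUPTI_ACTIVITY_MEMCPY_KIND_PTOP: return (CopyEndpoints){TARGET_DEVICE, TARGET_DEVICE};
    default:                              return (CopyEndpoints){TARGET_UNKNOWN, TARGET_UNKNOWN};
    }
}
```

Note that DTOD and PTOP yield the same pair in this sketch; a caller that needs to tell same-device copies from cross-device copies still has to inspect the kind itself.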
enum CUpti_ActivityMemoryKind

Each kind represents the type of the memory accessed by a memory operation/copy.

Values
CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN = 0
The memory kind is unknown.
CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE = 1
The memory is pageable.
CUPTI_ACTIVITY_MEMORY_KIND_PINNED = 2
The memory is pinned.
CUPTI_ACTIVITY_MEMORY_KIND_DEVICE = 3
The memory is on the device.
CUPTI_ACTIVITY_MEMORY_KIND_ARRAY = 4
The memory is an array.
CUPTI_ACTIVITY_MEMORY_KIND_MANAGED = 5
The memory is managed
CUPTI_ACTIVITY_MEMORY_KIND_DEVICE_STATIC = 6
The memory is device static
CUPTI_ACTIVITY_MEMORY_KIND_MANAGED_STATIC = 7
The memory is managed static
CUPTI_ACTIVITY_MEMORY_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityObjectKind
Values
CUPTI_ACTIVITY_OBJECT_UNKNOWN = 0
The object kind is not known.
CUPTI_ACTIVITY_OBJECT_PROCESS = 1
A process.
CUPTI_ACTIVITY_OBJECT_THREAD = 2
A thread.
CUPTI_ACTIVITY_OBJECT_DEVICE = 3
A device.
CUPTI_ACTIVITY_OBJECT_CONTEXT = 4
A context.
CUPTI_ACTIVITY_OBJECT_STREAM = 5
A stream.
CUPTI_ACTIVITY_OBJECT_FORCE_INT = 0x7fffffff
enum CUpti_ActivityOverheadKind

Values
CUPTI_ACTIVITY_OVERHEAD_UNKNOWN = 0
The overhead kind is not known.
CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER = 1
Compiler(JIT) overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH = 1<<16
Activity buffer flush overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION = 2<<16
CUPTI instrumentation overhead.
CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE = 3<<16
CUPTI resource creation and destruction overhead.
CUPTI_ACTIVITY_OVERHEAD_FORCE_INT = 0x7fffffff
enum CUpti_ActivityPCSamplingPeriod

Values
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_INVALID = 0
The PC sampling period is not set.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MIN = 1
Minimum sampling period available on the device.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_LOW = 2
Sampling period in lower range.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MID = 3
Medium sampling period.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_HIGH = 4
Sampling period in higher range.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MAX = 5
Maximum sampling period available on the device.
CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_FORCE_INT = 0x7fffffff
enum CUpti_ActivityPCSamplingStallReason

Values
CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID = 0
Invalid reason
CUPTI_ACTIVITY_PC_SAMPLING_STALL_NONE = 1
No stall, instruction is selected for issue
CUPTI_ACTIVITY_PC_SAMPLING_STALL_INST_FETCH = 2
Warp is blocked because next instruction is not yet available, because of instruction cache miss, or because of branching effects
CUPTI_ACTIVITY_PC_SAMPLING_STALL_EXEC_DEPENDENCY = 3
Instruction is waiting on an arithmetic dependency
CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_DEPENDENCY = 4
Warp is blocked because it is waiting for a memory access to complete.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_TEXTURE = 5
Texture sub-system is fully utilized or has too many outstanding requests.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_SYNC = 6
Warp is blocked as it is waiting at __syncthreads() or at memory barrier.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_CONSTANT_MEMORY_DEPENDENCY = 7
Warp is blocked waiting for __constant__ memory and immediate memory access to complete.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_PIPE_BUSY = 8
Compute operation cannot be performed due to the required resources not being available.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_THROTTLE = 9
Warp is blocked because there are too many pending memory operations. In Kepler architecture it often indicates high number of memory replays.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_NOT_SELECTED = 10
Warp was ready to issue, but some other warp issued instead.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_OTHER = 11
Miscellaneous reasons
CUPTI_ACTIVITY_PC_SAMPLING_STALL_SLEEPING = 12
Sleeping.
CUPTI_ACTIVITY_PC_SAMPLING_STALL_FORCE_INT = 0x7fffffff
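
Stall reasons are typically consumed in aggregate. The following sketch, which is not taken from CUPTI, accumulates sample counts per stall reason and picks the most frequent actual stall, skipping INVALID and NONE. `StallSample` is a hypothetical stand-in for whatever fields a tool extracts from its PC sampling records, and `STALL_REASON_COUNT` assumes the thirteen values (0 through 12) shown in this listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of values listed above. When <cupti.h> is included,
   delete these definitions. */
enum {
    CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID    = 0,
    CUPTI_ACTIVITY_PC_SAMPLING_STALL_NONE       = 1,
    CUPTI_ACTIVITY_PC_SAMPLING_STALL_INST_FETCH = 2,
    STALL_REASON_COUNT = 13 /* hypothetical: reasons 0..12 in this listing */
};

/* Hypothetical record: one stall reason and how many samples carried it. */
typedef struct {
    uint32_t reason;
    uint32_t samples;
} StallSample;

/* Add each sample count to its reason's bucket; out-of-range reasons are ignored. */
static void accumulate_stalls(const StallSample *s, size_t n,
                              uint64_t hist[STALL_REASON_COUNT])
{
    for (size_t i = 0; i < n; ++i) {
        if (s[i].reason < STALL_REASON_COUNT)
            hist[s[i].reason] += s[i].samples;
    }
}

/* Return the stall reason with the most samples, ignoring INVALID and NONE.
   Returns CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID if no stall was recorded. */
static uint32_t dominant_stall(const uint64_t hist[STALL_REASON_COUNT])
{
    uint32_t best = CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID;
    uint64_t best_count = 0;
    for (uint32_t r = CUPTI_ACTIVITY_PC_SAMPLING_STALL_INST_FETCH;
         r < STALL_REASON_COUNT; ++r) {
        if (hist[r] > best_count) {
            best = r;
            best_count = hist[r];
        }
    }
    return best;
}
```

Ties resolve to the lower-numbered reason, which is an arbitrary choice made for this sketch.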
enum CUpti_ActivityPartitionedGlobalCacheConfig

Values
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_UNKNOWN = 0
Partitioned global cache config unknown.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_NOT_SUPPORTED = 1
Partitioned global cache not supported.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_OFF = 2
Partitioned global cache config off.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_ON = 3
Partitioned global cache config on.
CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_FORCE_INT = 0x7fffffff
enum CUpti_ActivityPreemptionKind

Values
CUPTI_ACTIVITY_PREEMPTION_KIND_UNKNOWN = 0
The preemption kind is not known.
CUPTI_ACTIVITY_PREEMPTION_KIND_SAVE = 1
Preemption to save CDP block.
CUPTI_ACTIVITY_PREEMPTION_KIND_RESTORE = 2
Preemption to restore CDP block.
CUPTI_ACTIVITY_PREEMPTION_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityStreamFlag

The types of stream to be used with CUpti_ActivityStream.

Values
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_UNKNOWN = 0
Unknown data.
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_DEFAULT = 1
Default stream.
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NON_BLOCKING = 2
Non-blocking stream.
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NULL = 3
Null stream.
CUPTI_ACTIVITY_STREAM_CREATE_MASK = 0xFFFF
Stream create Mask
CUPTI_ACTIVITY_STREAM_CREATE_FLAG_FORCE_INT = 0x7fffffff
enum CUpti_ActivitySynchronizationType

The types of synchronization to be used with CUpti_ActivitySynchronization.

Values
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_UNKNOWN = 0
Unknown data.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_EVENT_SYNCHRONIZE = 1
Event synchronize API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_WAIT_EVENT = 2
Stream wait event API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_SYNCHRONIZE = 3
Stream synchronize API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_CONTEXT_SYNCHRONIZE = 4
Context synchronize API.
CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_FORCE_INT = 0x7fffffff
enum CUpti_ActivityThreadIdType

CUPTI uses different methods to obtain the thread-id depending on the support and the underlying platform. This enum documents these methods for each type. APIs cuptiSetThreadIdType and cuptiGetThreadIdType can be used to set and get the thread-id type.

Values
CUPTI_ACTIVITY_THREAD_ID_TYPE_DEFAULT = 0
Default type. Windows uses the API GetCurrentThreadId(); Linux, Mac, Android, and QNX use the POSIX pthread API pthread_self().
CUPTI_ACTIVITY_THREAD_ID_TYPE_SYSTEM = 1
This type is based on the system API available on the underlying platform, and the thread-id obtained is supposed to be unique for the process lifetime. Windows uses the API GetCurrentThreadId(), Linux uses the syscall SYS_gettid, Mac uses the syscall SYS_thread_selfid, and Android and QNX use gettid().
CUPTI_ACTIVITY_THREAD_ID_TYPE_FORCE_INT = 0x7fffffff
enum CUpti_ActivityUnifiedMemoryAccessType
Values
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_UNKNOWN = 0
The unified memory access type is not known
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_READ = 1
The page fault was triggered by read memory instruction
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_WRITE = 2
The page fault was triggered by write memory instruction
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_ATOMIC = 3
The page fault was triggered by atomic memory instruction
CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_PREFETCH = 4
The page fault was triggered by memory prefetch operation
enum CUpti_ActivityUnifiedMemoryCounterKind

Many activities are associated with the Unified Memory mechanism; among them are transfers from host to device, transfers from device to host, and page faults on the host side.

Values
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_UNKNOWN = 0
The unified memory counter kind is not known.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD = 1
Number of bytes transferred from host to device
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH = 2
Number of bytes transferred from device to host
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT = 3
Number of CPU page faults. This is only supported on 64 bit Linux and Mac platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT = 4
Number of GPU page faults. This is only supported on devices with compute capability 6.0 and higher and on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING = 5
Thrashing occurs when data is frequently accessed by multiple processors and has to be constantly migrated around to achieve data locality. In this case the overhead of migration may exceed the benefits of locality. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING = 6
Throttling is a prevention technique used by the driver to avoid further thrashing. Here, the driver doesn't service the fault for one of the contending processors for a specific period of time, so that the other processor can run at full-speed. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP = 7
In case throttling does not help, the driver tries to pin the memory to a processor for a specific period of time. One of the contending processors will have slow access to the memory, while the other will have fast access. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOD = 8
Number of bytes transferred from one device to another device. This is only supported on 64 bit Linux platforms.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_COUNT
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_FORCE_INT = 0x7fffffff
enum CUpti_ActivityUnifiedMemoryCounterScope

Values
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_UNKNOWN = 0
The unified memory counter scope is not known.
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_SINGLE_DEVICE = 1
Collect unified memory counter for single process on one device
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_ALL_DEVICES = 2
Collect unified memory counter for single process across all devices
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_COUNT
CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_FORCE_INT = 0x7fffffff
enum CUpti_ActivityUnifiedMemoryMigrationCause
Values
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_UNKNOWN = 0
The unified memory migration cause is not known
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_USER = 1
The unified memory migrated due to an explicit call from the user e.g. cudaMemPrefetchAsync
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_COHERENCE = 2
The unified memory migrated to guarantee data coherence e.g. CPU/GPU faults on Pascal+ and kernel launch on pre-Pascal GPUs
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_PREFETCH = 3
The unified memory was speculatively migrated by the UVM driver before being accessed by the destination processor to improve performance
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_EVICTION = 4
The unified memory migrated to the CPU because it was evicted to make room for another block of memory on the GPU
CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_ACCESS_COUNTERS = 5
The unified memory migrated to another processor because of access counter notifications
enum CUpti_DevType

Values
CUPTI_DEV_TYPE_INVALID = 0
CUPTI_DEV_TYPE_GPU = 1
The device type is GPU.
CUPTI_DEV_TYPE_NPU = 2
The device type is NVLink processing unit in CPU.
CUPTI_DEV_TYPE_FORCE_INT = 0x7fffffff
enum CUpti_DeviceVirtualizationMode

Indicates the virtualization mode in which the CUDA device is running.

Values
CUPTI_DEVICE_VIRTUALIZATION_MODE_NONE = 0
No virtualization mode is associated with the device, i.e. it is a bare-metal GPU.
CUPTI_DEVICE_VIRTUALIZATION_MODE_PASS_THROUGH = 1
The device is associated with the pass-through GPU. In this mode, an entire physical GPU is directly assigned to one virtual machine (VM).
CUPTI_DEVICE_VIRTUALIZATION_MODE_VIRTUAL_GPU = 2
The device is associated with the virtual GPU (vGPU). In this mode multiple virtual machines (VMs) have simultaneous, direct access to a single physical GPU.
CUPTI_DEVICE_VIRTUALIZATION_MODE_FORCE_INT = 0x7fffffff
enum CUpti_EnvironmentClocksThrottleReason

The possible reasons that a clock can be throttled. There can be more than one reason that a clock is being throttled so these types can be combined by bitwise OR. These are used in the clocksThrottleReason field in the Environment Activity Record.

Values
CUPTI_CLOCKS_THROTTLE_REASON_GPU_IDLE = 0x00000001
Nothing is running on the GPU and the clocks are dropping to idle state.
CUPTI_CLOCKS_THROTTLE_REASON_USER_DEFINED_CLOCKS = 0x00000002
The GPU clocks are limited by a user specified limit.
CUPTI_CLOCKS_THROTTLE_REASON_SW_POWER_CAP = 0x00000004
A software power scaling algorithm is reducing the clocks below requested clocks.
CUPTI_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN = 0x00000008
Hardware slowdown to reduce the clock by a factor of two or more is engaged. This is an indicator of one of the following: 1) Temperature is too high, 2) External power brake assertion is being triggered (e.g. by the system power supply), 3) Change in power state.
CUPTI_CLOCKS_THROTTLE_REASON_UNKNOWN = 0x80000000
Some unspecified factor is reducing the clocks.
CUPTI_CLOCKS_THROTTLE_REASON_UNSUPPORTED = 0x40000000
Throttle reason is not supported for this GPU.
CUPTI_CLOCKS_THROTTLE_REASON_NONE = 0x00000000
No clock throttling.
CUPTI_CLOCKS_THROTTLE_REASON_FORCE_INT = 0x7fffffff
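
Because throttle reasons can be combined by bitwise OR, a consumer has to test each bit separately. The sketch below shows one way to turn a combined value into a readable string. `describe_throttle` and the name table are hypothetical, the bit values are copied from the listing above, and unsigned literals are used because 0x80000000 does not fit in a signed 32-bit int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit values copied from the listing above; the short names are chosen for
   this sketch and are not CUPTI identifiers. */
static const struct {
    uint32_t bit;
    const char *name;
} kThrottleNames[] = {
    {0x00000001u, "GPU_IDLE"},
    {0x00000002u, "USER_DEFINED_CLOCKS"},
    {0x00000004u, "SW_POWER_CAP"},
    {0x00000008u, "HW_SLOWDOWN"},
    {0x40000000u, "UNSUPPORTED"},
    {0x80000000u, "UNKNOWN"},
};

/* Hypothetical helper: write the names of all set reasons into out, separated
   by '|', and return how many were written in full. A value of 0 yields
   "NONE" and a count of 0. Output that does not fit in cap is truncated. */
static int describe_throttle(uint32_t reasons, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (reasons == 0u) {
        snprintf(out, cap, "NONE");
        return 0;
    }
    int count = 0;
    for (size_t i = 0; i < sizeof kThrottleNames / sizeof kThrottleNames[0]; ++i) {
        if ((reasons & kThrottleNames[i].bit) == 0u)
            continue;
        size_t len = strlen(out);
        int n = snprintf(out + len, cap - len, "%s%s",
                         len ? "|" : "", kThrottleNames[i].name);
        if (n < 0 || (size_t)n >= cap - len)
            break; /* buffer full: stop without counting the partial name */
        ++count;
    }
    return count;
}
```

With a sufficiently large buffer, the value 0x5 would be described as GPU_IDLE|SW_POWER_CAP.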
enum CUpti_ExternalCorrelationKind

Custom correlation kinds are reserved for usage in external tools.

See also:

CUpti_ActivityExternalCorrelation

Values
CUPTI_EXTERNAL_CORRELATION_KIND_INVALID = 0
CUPTI_EXTERNAL_CORRELATION_KIND_UNKNOWN = 1
The external API is unknown to CUPTI
CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC = 2
The external API is OpenACC
CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0 = 3
The external API is custom0
CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1 = 4
The external API is custom1
CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM2 = 5
The external API is custom2
CUPTI_EXTERNAL_CORRELATION_KIND_SIZE
Add new kinds before this line
CUPTI_EXTERNAL_CORRELATION_KIND_FORCE_INT = 0x7fffffff
enum CUpti_FuncShmemLimitConfig

Values
CUPTI_FUNC_SHMEM_LIMIT_DEFAULT = 0x00
CUPTI_FUNC_SHMEM_LIMIT_OPTIN = 0x01
CUPTI_FUNC_SHMEM_LIMIT_FORCE_INT = 0x7fffffff
enum CUpti_LinkFlag

Describes link properties, to be used with CUpti_ActivityNvLink.

Values
CUPTI_LINK_FLAG_INVALID = 0
CUPTI_LINK_FLAG_PEER_ACCESS = (1<<1)
Is peer to peer access supported by this link.
CUPTI_LINK_FLAG_SYSMEM_ACCESS = (1<<2)
Is system memory access supported by this link.
CUPTI_LINK_FLAG_PEER_ATOMICS = (1<<3)
Is peer atomic access supported by this link.
CUPTI_LINK_FLAG_SYSMEM_ATOMICS = (1<<4)
Is system memory atomic access supported by this link.
CUPTI_LINK_FLAG_FORCE_INT = 0x7fffffff
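
Note that the link flags above start at bit 1, not bit 0. A minimal sketch of testing for a set of capabilities follows; `link_has_all` is a hypothetical helper, and the constants are local mirrors of this listing rather than the real header definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the values listed above. When <cupti.h> is included,
   delete these definitions. */
enum {
    CUPTI_LINK_FLAG_INVALID        = 0,
    CUPTI_LINK_FLAG_PEER_ACCESS    = (1 << 1),
    CUPTI_LINK_FLAG_SYSMEM_ACCESS  = (1 << 2),
    CUPTI_LINK_FLAG_PEER_ATOMICS   = (1 << 3),
    CUPTI_LINK_FLAG_SYSMEM_ATOMICS = (1 << 4)
};

/* Hypothetical helper: true only if every bit in wanted is set in flags.
   An empty wanted set is rejected rather than treated as trivially true. */
static bool link_has_all(uint32_t flags, uint32_t wanted)
{
    return wanted != 0u && (flags & wanted) == wanted;
}
```

For instance, a flags value of 0x0A has peer access and peer atomics set but neither system memory capability.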
enum CUpti_OpenAccConstructKind

Values
CUPTI_OPENACC_CONSTRUCT_KIND_UNKNOWN = 0
CUPTI_OPENACC_CONSTRUCT_KIND_PARALLEL = 1
CUPTI_OPENACC_CONSTRUCT_KIND_KERNELS = 2
CUPTI_OPENACC_CONSTRUCT_KIND_LOOP = 3
CUPTI_OPENACC_CONSTRUCT_KIND_DATA = 4
CUPTI_OPENACC_CONSTRUCT_KIND_ENTER_DATA = 5
CUPTI_OPENACC_CONSTRUCT_KIND_EXIT_DATA = 6
CUPTI_OPENACC_CONSTRUCT_KIND_HOST_DATA = 7
CUPTI_OPENACC_CONSTRUCT_KIND_ATOMIC = 8
CUPTI_OPENACC_CONSTRUCT_KIND_DECLARE = 9
CUPTI_OPENACC_CONSTRUCT_KIND_INIT = 10
CUPTI_OPENACC_CONSTRUCT_KIND_SHUTDOWN = 11
CUPTI_OPENACC_CONSTRUCT_KIND_SET = 12
CUPTI_OPENACC_CONSTRUCT_KIND_UPDATE = 13
CUPTI_OPENACC_CONSTRUCT_KIND_ROUTINE = 14
CUPTI_OPENACC_CONSTRUCT_KIND_WAIT = 15
CUPTI_OPENACC_CONSTRUCT_KIND_RUNTIME_API = 16
CUPTI_OPENACC_CONSTRUCT_KIND_FORCE_INT = 0x7fffffff
enum CUpti_OpenAccEventKind

See also:

CUpti_ActivityKindOpenAcc

Values
CUPTI_OPENACC_EVENT_KIND_INVALID = 0
CUPTI_OPENACC_EVENT_KIND_DEVICE_INIT = 1
CUPTI_OPENACC_EVENT_KIND_DEVICE_SHUTDOWN = 2
CUPTI_OPENACC_EVENT_KIND_RUNTIME_SHUTDOWN = 3
CUPTI_OPENACC_EVENT_KIND_ENQUEUE_LAUNCH = 4
CUPTI_OPENACC_EVENT_KIND_ENQUEUE_UPLOAD = 5
CUPTI_OPENACC_EVENT_KIND_ENQUEUE_DOWNLOAD = 6
CUPTI_OPENACC_EVENT_KIND_WAIT = 7
CUPTI_OPENACC_EVENT_KIND_IMPLICIT_WAIT = 8
CUPTI_OPENACC_EVENT_KIND_COMPUTE_CONSTRUCT = 9
CUPTI_OPENACC_EVENT_KIND_UPDATE = 10
CUPTI_OPENACC_EVENT_KIND_ENTER_DATA = 11
CUPTI_OPENACC_EVENT_KIND_EXIT_DATA = 12
CUPTI_OPENACC_EVENT_KIND_CREATE = 13
CUPTI_OPENACC_EVENT_KIND_DELETE = 14
CUPTI_OPENACC_EVENT_KIND_ALLOC = 15
CUPTI_OPENACC_EVENT_KIND_FREE = 16
CUPTI_OPENACC_EVENT_KIND_FORCE_INT = 0x7fffffff
enum CUpti_PcieDeviceType

Field to differentiate whether a PCIE activity record describes a GPU or a PCI bridge.

Values
CUPTI_PCIE_DEVICE_TYPE_GPU = 0
PCIE GPU record
CUPTI_PCIE_DEVICE_TYPE_BRIDGE = 1
PCIE Bridge record
CUPTI_PCIE_DEVICE_TYPE_FORCE_INT = 0x7fffffff
enum CUpti_PcieGen

PCIE generation. Enumeration of PCIE generations for the pcieGeneration attribute of the PCIE activity record.

Values
CUPTI_PCIE_GEN_GEN1 = 1
PCIE Generation 1
CUPTI_PCIE_GEN_GEN2 = 2
PCIE Generation 2
CUPTI_PCIE_GEN_GEN3 = 3
PCIE Generation 3
CUPTI_PCIE_GEN_GEN4 = 4
PCIE Generation 4
CUPTI_PCIE_GEN_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiActivityConfigurePCSampling ( CUcontext ctx, CUpti_ActivityPCSamplingConfig* config )
Set PC sampling configuration.
Parameters
ctx
The context
config
A pointer to CUpti_ActivityPCSamplingConfig structure containing PC sampling configuration.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_OPERATION

    if this API is called while some valid event collection method is set.

  • CUPTI_ERROR_INVALID_PARAMETER

    if config is NULL or any parameter in the config structures is not a valid value

  • CUPTI_ERROR_NOT_SUPPORTED

    Indicates that the system/device does not support the unified memory counters

Description

CUptiResult cuptiActivityConfigureUnifiedMemoryCounter ( CUpti_ActivityUnifiedMemoryCounterConfig* config, uint32_t count )
Set Unified Memory Counter configuration.
Parameters
config
A pointer to CUpti_ActivityUnifiedMemoryCounterConfig structures containing Unified Memory counter configuration.
count
Number of Unified Memory counter configuration structures
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if config is NULL or any parameter in the config structures is not a valid value

  • CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED

    One potential reason is that platform (OS/arch) does not support the unified memory counters

  • CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_DEVICE

    Indicates that the device does not support the unified memory counters

  • CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_NON_P2P_DEVICES

    Indicates that multi-GPU configuration without P2P support between any pair of devices does not support the unified memory counters

Description

CUptiResult cuptiActivityDisable ( CUpti_ActivityKind kind )
Disable collection of a specific kind of activity record.
Parameters
kind
The kind of activity record to stop collecting
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_KIND

    if the activity kind is not supported

Description

Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default all activity kinds are disabled for collection.

CUptiResult cuptiActivityDisableContext ( CUcontext context, CUpti_ActivityKind kind )
Disable collection of a specific kind of activity record for a context.
Parameters
context
The context for which activity is to be disabled
kind
The kind of activity record to stop collecting
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_KIND

    if the activity kind is not supported

Description

Disable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records. Multiple kinds can be disabled by calling this function multiple times.

CUptiResult cuptiActivityEnable ( CUpti_ActivityKind kind )
Enable collection of a specific kind of activity record.
Parameters
kind
The kind of activity record to collect
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_NOT_COMPATIBLE

    if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND

    if the activity kind is not supported

Description

Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling this function multiple times. By default all activity kinds are disabled for collection.

CUptiResult cuptiActivityEnableContext ( CUcontext context, CUpti_ActivityKind kind )
Enable collection of a specific kind of activity record for a context.
Parameters
context
The context for which activity is to be enabled
kind
The kind of activity record to collect
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_NOT_COMPATIBLE

    if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND

    if the activity kind is not supported

Description

Enable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records enabled by cuptiActivityEnable. Multiple kinds can be enabled by calling this function multiple times.

CUptiResult cuptiActivityEnableLatencyTimestamps ( uint8_t enable )
Controls the collection of queued and submitted timestamps for kernels.
Parameters
enable
A boolean denoting whether these timestamps should be collected
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

Description

This API is used to control the collection of queued and submitted timestamps for kernels whose records are provided through the struct CUpti_ActivityKernel4. Default value is 0, i.e. these timestamps are not collected. This API needs to be called before initialization of CUDA and this setting should not be changed during the profiling session.

CUptiResult cuptiActivityFlush ( CUcontext context, uint32_t streamId, uint32_t flag )
Wait until all activity records are delivered via the completion callback.
Parameters
context
A valid CUcontext or NULL.
streamId
The stream ID.
flag
The flag can be set to indicate a forced flush. See CUpti_ActivityFlag
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION

    if not preceded by a successful call to cuptiActivityRegisterCallbacks

  • CUPTI_ERROR_UNKNOWN

    an internal error occurred

Description

This function does not return until all activity records associated with the specified context/stream are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized.

If context is NULL, the global activity records (i.e. those not associated with a particular stream) are flushed (in this case no streams are synchronized). If context is a valid CUcontext and streamId is 0, the buffers of all streams of this context are flushed. Otherwise, the buffers of the specified stream in this context are flushed.

Before calling this function, the buffer handling callback api must be activated by calling cuptiActivityRegisterCallbacks.

**DEPRECATED** This method is deprecated; context and streamId are ignored. Use cuptiActivityFlushAll to flush all data.

CUptiResult cuptiActivityFlushAll ( uint32_t flag )
Wait until all activity records are delivered via the completion callback.
Parameters
flag
The flag can be set to indicate a forced flush. See CUpti_ActivityFlag
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION

    if not preceded by a successful call to cuptiActivityRegisterCallbacks

  • CUPTI_ERROR_UNKNOWN

    an internal error occurred

Description

This function does not return until all activity records associated with all contexts/streams (and the global buffers not associated with any stream) are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized.

Before calling this function, the buffer handling callback api must be activated by calling cuptiActivityRegisterCallbacks.

CUptiResult cuptiActivityGetAttribute ( CUpti_ActivityAttribute attr, size_t* valueSize, void* value )
Read an activity API attribute.
Parameters
attr
The attribute to read
valueSize
On input, the size of the buffer pointed to by value; on return, the number of bytes written to value
value
Returns the value of the attribute
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attr is not an activity attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    Indicates that the value buffer is too small to hold the attribute value.

Description

Read an activity API attribute and return it in *value.

CUptiResult cuptiActivityGetNextRecord ( uint8_t* buffer, size_t validBufferSizeBytes, CUpti_Activity** record )
Iterate over the activity records in a buffer.
Parameters
buffer
The buffer containing activity records
validBufferSizeBytes
The number of valid bytes in the buffer.
record
Inputs the previous record returned by cuptiActivityGetNextRecord and returns the next activity record from the buffer. If input value is NULL, returns the first activity record in the buffer. Records of kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL may contain invalid (0) timestamps, indicating that no timing information could be collected for lack of device memory.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_MAX_LIMIT_REACHED

    if no more records in the buffer

  • CUPTI_ERROR_INVALID_PARAMETER

    if buffer is NULL.

Description

This is a helper function to iterate over the activity records in a buffer. A buffer of activity records is typically obtained by receiving a CUpti_BuffersCallbackCompleteFunc callback.

An example of typical usage:

    CUpti_Activity *record = NULL;
    CUptiResult status = CUPTI_SUCCESS;
    do {
        status = cuptiActivityGetNextRecord(buffer, validSize, &record);
        if (status == CUPTI_SUCCESS) {
            // Use record here...
        }
        else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED) {
            // No more records in the buffer
            break;
        }
        else {
            // Any other status is an error; Error is a label supplied by the caller
            goto Error;
        }
    } while (1);

CUptiResult cuptiActivityGetNumDroppedRecords ( CUcontext context, uint32_t streamId, size_t* dropped )
Get the number of activity records that were dropped because of insufficient buffer space.
Parameters
context
The context, or NULL to get dropped count from global queue
streamId
The stream ID
dropped
The number of records that were dropped since the last call to this function.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if dropped is NULL

Description

Get the number of records that were dropped because of insufficient buffer space. The dropped count includes records that could not be recorded because CUPTI did not have activity buffer space available for the record (because the CUpti_BuffersCallbackRequestFunc callback did not return an empty buffer of sufficient size) and also CDP records that could not be recorded because the device-side buffer was full (size is controlled by the CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP attribute). The dropped count maintained for the queue is reset to zero when this function is called.

CUptiResult cuptiActivityPopExternalCorrelationId ( CUpti_ExternalCorrelationKind kind, uint64_t* lastId )
Pop an external correlation id for the calling thread.
Parameters
kind
The kind of external API that activities should be correlated with.
lastId
If the function returns successfully, contains the last external correlation id for this kind. Can be NULL.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    The external API kind is invalid.

  • CUPTI_ERROR_QUEUE_EMPTY

    No external id is currently associated with kind.

Description

This function notifies CUPTI that the calling thread is leaving an external API region.

CUptiResult cuptiActivityPushExternalCorrelationId ( CUpti_ExternalCorrelationKind kind, uint64_t id )
Push an external correlation id for the calling thread.
Parameters
kind
The kind of external API that activities should be correlated with.
id
External correlation id.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    The external API kind is invalid

Description

This function notifies CUPTI that the calling thread is entering an external API region. When a CUPTI activity API record is created while within an external API region and CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION is enabled, the activity API record will be preceded by a CUpti_ActivityExternalCorrelation record for each CUpti_ExternalCorrelationKind.

CUptiResult cuptiActivityRegisterCallbacks ( CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted )
Registers callback functions with CUPTI for activity buffer handling.
Parameters
funcBufferRequested
callback which is invoked when an empty buffer is requested by CUPTI
funcBufferCompleted
callback which is invoked when a buffer containing activity records is available from CUPTI
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if either funcBufferRequested or funcBufferCompleted is NULL

Description

This function registers two callback functions to be used in asynchronous buffer handling. If registered, activity record buffers are handled using asynchronous requested/completed callbacks from CUPTI.

Registering these callbacks prevents the client from using CUPTI's blocking enqueue/dequeue functions.
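
The sketch below shows one possible shape for the two callbacks. It assumes the callback parameter lists declared in cupti_activity.h, which are not reproduced in this section. The names bufferRequested, bufferCompleted and BUF_SIZE, and the HAVE_CUPTI build macro, are hypothetical; when HAVE_CUPTI is not defined, small stand-in declarations are used so the sketch compiles without the CUDA toolkit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef HAVE_CUPTI                    /* hypothetical build macro: define it when the CUDA toolkit is available */
#include <cupti.h>
#else                                /* stand-ins so the sketch compiles on its own */
typedef struct CUctx_st *CUcontext;
#define CUPTIAPI
#endif

#define BUF_SIZE ((size_t)8 * 1024 * 1024)   /* hypothetical size, tune for the workload */

/* Invoked by CUPTI when it needs an empty buffer to fill with activity records. */
void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size, size_t *maxNumRecords)
{
    assert(buffer != NULL && size != NULL && maxNumRecords != NULL);
    *buffer = (uint8_t *)malloc(BUF_SIZE);    /* malloc alignment is assumed to be sufficient */
    *size = (*buffer != NULL) ? BUF_SIZE : 0; /* report an empty buffer if allocation failed */
    *maxNumRecords = 0;                       /* 0: no record limit, fill the buffer */
}

/* Invoked by CUPTI when a buffer is handed back; the first validSize bytes hold records. */
void CUPTIAPI bufferCompleted(CUcontext context, uint32_t streamId,
                              uint8_t *buffer, size_t size, size_t validSize)
{
    (void)context; (void)streamId; (void)size; (void)validSize;
    /* With the real header, walk the records here using
       cuptiActivityGetNextRecord(buffer, validSize, &record). */
    free(buffer);                             /* the client owns the buffer it allocated */
}
```

With the real header, the pair would typically be registered once, via cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted), before any activity kind is enabled.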

CUptiResult cuptiActivitySetAttribute ( CUpti_ActivityAttribute attr, size_t* valueSize, void* value )
Write an activity API attribute.
Parameters
attr
The attribute to write
valueSize
The size, in bytes, of the value
value
The attribute value to write
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attr is not an activity attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    Indicates that the value buffer is too small to hold the attribute value.

Description

Write an activity API attribute.

CUptiResult cuptiComputeCapabilitySupported ( int  major, int  minor, int* support )
Check support for a compute capability.
Parameters
major
The major revision number of the compute capability
minor
The minor revision number of the compute capability
support
Pointer to an integer to return the support status
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if support is NULL

Description

This function is used to check the support for a device based on its compute capability. It sets the support when the compute capability is supported by the current version of CUPTI, and clears it otherwise. This version of CUPTI might not support all GPUs sharing the same compute capability. It is suggested to use the API cuptiDeviceSupported, which provides correct information.

See also:

cuptiDeviceSupported

CUptiResult cuptiDeviceSupported ( CUdevice dev, int* support )
Check support for a compute device.
Parameters
dev
The device handle returned by CUDA Driver API cuDeviceGet
support
Pointer to an integer to return the support status
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if support is NULL

  • CUPTI_ERROR_INVALID_DEVICE

    if dev is not a valid device

Description

This function is used to check the support for a compute device. It sets the support when the device is supported by the current version of CUPTI, and clears it otherwise.

See also:

cuptiComputeCapabilitySupported

CUptiResult cuptiDeviceVirtualizationMode ( CUdevice dev, CUpti_DeviceVirtualizationMode* mode )
Query the virtualization mode of the device.
Parameters
dev
The device handle returned by CUDA Driver API cuDeviceGet
mode
Pointer to a CUpti_DeviceVirtualizationMode to return the virtualization mode
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_DEVICE

    if dev is not a valid device

  • CUPTI_ERROR_INVALID_PARAMETER

    if mode is NULL

Description

This function is used to query the virtualization mode of the CUDA device.

CUptiResult cuptiFinalize ( void )
Detach CUPTI from the running process.
Description

This API detaches CUPTI from the running process. It destroys and cleans up all the resources associated with CUPTI in the current process. After CUPTI detaches, the process keeps running with no CUPTI attached to it. For safe operation, it is recommended that this API be invoked from the exit callsite of any CUDA Driver or Runtime API. Otherwise the CUPTI client needs to make sure that the required CUDA synchronization and CUPTI activity buffer flush are done before calling the API. Sample code showing the usage of the API in the CUPTI callback handler code:

    void CUPTIAPI
    cuptiCallbackHandler(void *userdata, CUpti_CallbackDomain domain,
                         CUpti_CallbackId cbid, const void *cbdata)
    {
        const CUpti_CallbackData *cbInfo = (const CUpti_CallbackData *)cbdata;

        // Take this code path when CUPTI detach is requested
        // (detachCupti is a flag maintained by the client)
        if (detachCupti) {
            switch (domain)
            {
            case CUPTI_CB_DOMAIN_RUNTIME_API:
            case CUPTI_CB_DOMAIN_DRIVER_API:
                if (cbInfo->callbackSite == CUPTI_API_EXIT) {
                    // call the CUPTI detach API
                    cuptiFinalize();
                }
                break;
            default:
                break;
            }
        }
    }

CUptiResult cuptiGetAutoBoostState ( CUcontext context, CUpti_ActivityAutoBoostState* state )
Get auto boost state.
Parameters
context
A valid CUcontext.
state
A pointer to CUpti_ActivityAutoBoostState structure which contains the current state and the id of the process that has requested the current state
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if context or state is NULL

  • CUPTI_ERROR_NOT_SUPPORTED

    Indicates that the device does not support auto boost

  • CUPTI_ERROR_UNKNOWN

    an internal error occurred

Description

Profiling results can be inconsistent when auto boost is enabled. CUPTI tries to disable auto boost while profiling, but this can fail when the user does not have the required permissions or when the CUDA_AUTO_BOOST environment variable is set. This function can be used to query whether auto boost is enabled.

CUptiResult cuptiGetContextId ( CUcontext context, uint32_t* contextId )
Get the ID of a context.
Parameters
context
The context
contextId
Returns a process-unique ID for the context
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT

    The context is NULL or not valid.

  • CUPTI_ERROR_INVALID_PARAMETER

    if contextId is NULL

Description

Get the ID of a context.

CUptiResult cuptiGetDeviceId ( CUcontext context, uint32_t* deviceId )
Get the ID of a device.
Parameters
context
The context, or NULL to indicate the current context.
deviceId
Returns the ID of the device that is current for the calling thread.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

    if unable to get device ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if deviceId is NULL

Description

If context is NULL, returns the ID of the device that contains the currently active context. If context is non-NULL, returns the ID of the device which contains that context. Operates in a similar manner to cudaGetDevice() or cuCtxGetDevice() but may be called from within callback functions.

CUptiResult cuptiGetGraphNodeId ( CUgraphNode node, uint64_t* nodeId )
Get the unique ID of a graph node.
Parameters
node
The graph node.
nodeId
Returns the unique ID of the node
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if node is NULL

Description

Returns the unique ID of the CUDA graph node.

CUptiResult cuptiGetLastError ( void )
Returns the last error from a CUPTI call or callback.
Description

Returns the last error that has been produced by any of the CUPTI API calls or callbacks in the same host thread and resets it to CUPTI_SUCCESS.

CUptiResult cuptiGetStreamId ( CUcontext context, CUstream stream, uint32_t* streamId )
Get the ID of a stream.
Parameters
context
If non-NULL then the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.
stream
The stream
streamId
Returns a context-unique ID for the stream
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_STREAM

    if unable to get stream ID, or if context is non-NULL and stream does not belong to the context

  • CUPTI_ERROR_INVALID_PARAMETER

    if streamId is NULL

Description

Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

**DEPRECATED** This method is deprecated as of CUDA 8.0. Use method cuptiGetStreamIdEx instead.

CUptiResult cuptiGetStreamIdEx ( CUcontext context, CUstream stream, uint8_t perThreadStream, uint32_t* streamId )
Get the ID of a stream.
Parameters
context
If non-NULL then the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.
stream
The stream
perThreadStream
Flag to indicate if program is compiled for per-thread streams
streamId
Returns a context-unique ID for the stream
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_STREAM

    if unable to get stream ID, or if context is non-NULL and stream does not belong to the context

  • CUPTI_ERROR_INVALID_PARAMETER

    if streamId is NULL

Description

Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

CUptiResult cuptiGetThreadIdType ( CUpti_ActivityThreadIdType* type )
Get the thread-id type.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if type is NULL

Description

Returns the thread-id type used in CUPTI

CUptiResult cuptiGetTimestamp ( uint64_t* timestamp )
Get the CUPTI timestamp.
Parameters
timestamp
Returns the CUPTI timestamp
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if timestamp is NULL

Description

Returns a timestamp normalized to correspond with the start and end timestamps reported in the CUPTI activity records. The timestamp is reported in nanoseconds.
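
Because the timestamp is in nanoseconds and is normalized to the activity record timestamps, an interval can be obtained by plain subtraction. The following is a minimal sketch of that arithmetic; elapsed_ms is a hypothetical helper name and is not part of CUPTI.

```c
#include <stdint.h>

/* Elapsed time in milliseconds between two CUPTI timestamps (both in nanoseconds).
   Returns 0.0 when end precedes start, for example when one of the values
   is an invalid (0) timestamp. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns < start_ns) {
        return 0.0;
    }
    return (double)(end_ns - start_ns) / 1e6;
}
```

The two arguments could be values written by cuptiGetTimestamp, or the start and end fields of an activity record.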

CUptiResult cuptiSetThreadIdType ( CUpti_ActivityThreadIdType type )
Set the thread-id type.
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_SUPPORTED

    if type is not supported on the platform

Description

CUPTI uses the method corresponding to the set type to generate the thread-id. See enum CUpti_ActivityThreadIdType for the list of methods. Activity records having a thread-id field contain the same value. The thread-id type must not be changed during the profiling session to avoid thread-id value mismatch across activity records.

3.4. CUPTI Callback API

Functions, types, and enums that implement the CUPTI Callback API.

Classes

struct 
Data passed into a runtime or driver API callback function.
struct 
CUDA graphs data passed into a resource callback function.
struct 
Module data passed into a resource callback function.
struct 
Data passed into a NVTX callback function.
struct 
Data passed into a resource callback function.
struct 
Data passed into a synchronize callback function.

Typedefs

typedef void  ( *CUpti_CallbackFunc )( void*  userdata,  CUpti_CallbackDomain domain,  CUpti_CallbackId cbid, const void*  cbdata )
Function type for a callback.
typedef uint32_t  CUpti_CallbackId
An ID for a driver API, runtime API, resource or synchronization callback.
typedef CUpti_CallbackDomain* CUpti_DomainTable
Pointer to an array of callback domains.
typedef CUpti_Subscriber_st *  CUpti_SubscriberHandle
A callback subscriber.

Enumerations

enum CUpti_ApiCallbackSite
Specifies the point in an API call that a callback is issued.
enum CUpti_CallbackDomain
Callback domains.
enum CUpti_CallbackIdResource
Callback IDs for resource domain.
enum CUpti_CallbackIdSync
Callback IDs for synchronization domain.

Functions

CUptiResult cuptiEnableAllDomains ( uint32_t enable, CUpti_SubscriberHandle subscriber )
Enable or disable all callbacks in all domains.
CUptiResult cuptiEnableCallback ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )
Enable or disable callbacks for a specific domain and callback ID.
CUptiResult cuptiEnableDomain ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain )
Enable or disable all callbacks for a specific domain.
CUptiResult cuptiGetCallbackName ( CUpti_CallbackDomain domain, uint32_t cbid, const char** name )
Get the name of a callback for a specific domain and callback ID.
CUptiResult cuptiGetCallbackState ( uint32_t* enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )
Get the current enabled/disabled state of a callback for a specific domain and function ID.
CUptiResult cuptiSubscribe ( CUpti_SubscriberHandle* subscriber, CUpti_CallbackFunc callback, void* userdata )
Initialize a callback subscriber with a callback function and user data.
CUptiResult cuptiSupportedDomains ( size_t* domainCount, CUpti_DomainTable* domainTable )
Get the available callback domains.
CUptiResult cuptiUnsubscribe ( CUpti_SubscriberHandle subscriber )
Unregister a callback subscriber.

Typedefs

void ( *CUpti_CallbackFunc )( void*  userdata,  CUpti_CallbackDomain domain,  CUpti_CallbackId cbid, const void*  cbdata )

Function type for a callback. The type of the data passed to the callback in cbdata depends on the domain. If domain is CUPTI_CB_DOMAIN_DRIVER_API or CUPTI_CB_DOMAIN_RUNTIME_API, the type of cbdata will be CUpti_CallbackData. If domain is CUPTI_CB_DOMAIN_RESOURCE, the type of cbdata will be CUpti_ResourceData. If domain is CUPTI_CB_DOMAIN_SYNCHRONIZE, the type of cbdata will be CUpti_SynchronizeData. If domain is CUPTI_CB_DOMAIN_NVTX, the type of cbdata will be CUpti_NvtxData.

Parameters
userdata
User data supplied at subscription of the callback
domain
The domain of the callback
cbid
The ID of the callback
cbdata
Data passed to the callback.
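
The domain-to-type mapping described for cbdata can be sketched as a switch. cbdataTypeName and the HAVE_CUPTI build macro are hypothetical names used only for illustration; when HAVE_CUPTI is not defined, a stand-in enum carrying the values listed in this section replaces the real header so the sketch compiles on its own.

```c
#include <stddef.h>

#ifdef HAVE_CUPTI                    /* hypothetical build macro: define it when the CUDA toolkit is available */
#include <cupti.h>
#else                                /* stand-in with the enumerator values listed in this section */
typedef enum {
    CUPTI_CB_DOMAIN_INVALID     = 0,
    CUPTI_CB_DOMAIN_DRIVER_API  = 1,
    CUPTI_CB_DOMAIN_RUNTIME_API = 2,
    CUPTI_CB_DOMAIN_RESOURCE    = 3,
    CUPTI_CB_DOMAIN_SYNCHRONIZE = 4,
    CUPTI_CB_DOMAIN_NVTX        = 5
} CUpti_CallbackDomain;
#endif

/* Hypothetical helper: name of the structure that cbdata points to for a given
   domain, or NULL when the domain is not one of those described above. */
const char *cbdataTypeName(CUpti_CallbackDomain domain)
{
    switch (domain) {
    case CUPTI_CB_DOMAIN_DRIVER_API:
    case CUPTI_CB_DOMAIN_RUNTIME_API:
        return "CUpti_CallbackData";
    case CUPTI_CB_DOMAIN_RESOURCE:
        return "CUpti_ResourceData";
    case CUPTI_CB_DOMAIN_SYNCHRONIZE:
        return "CUpti_SynchronizeData";
    case CUPTI_CB_DOMAIN_NVTX:
        return "CUpti_NvtxData";
    default:
        return NULL;
    }
}
```

A real callback would use the same switch to choose the pointer type to cast cbdata to before reading any field.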
typedef uint32_t CUpti_CallbackId

An ID for a driver API, runtime API, resource or synchronization callback. Within a driver API callback this should be interpreted as a CUpti_driver_api_trace_cbid value (these values are defined in cupti_driver_cbid.h). Within a runtime API callback this should be interpreted as a CUpti_runtime_api_trace_cbid value (these values are defined in cupti_runtime_cbid.h). Within a resource API callback this should be interpreted as a CUpti_CallbackIdResource value. Within a synchronize API callback this should be interpreted as a CUpti_CallbackIdSync value.

typedef CUpti_CallbackDomain* CUpti_DomainTable

Pointer to an array of callback domains.

typedef CUpti_Subscriber_st * CUpti_SubscriberHandle

A callback subscriber.

Enumerations

enum CUpti_ApiCallbackSite

Specifies the point in an API call that a callback is issued. This value is communicated to the callback function via CUpti_CallbackData::callbackSite.

Values
CUPTI_API_ENTER = 0
The callback is at the entry of the API call.
CUPTI_API_EXIT = 1
The callback is at the exit of the API call.
CUPTI_API_CBSITE_FORCE_INT = 0x7fffffff
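
As an illustration of how the two callback sites pair up, the sketch below stores a timestamp at CUPTI_API_ENTER and computes the elapsed time at CUPTI_API_EXIT. trackApiCall, its slot parameter, and the HAVE_CUPTI build macro are hypothetical names. The slot stands for any 64-bit storage that survives from the entry callback to the matching exit callback (the correlationData field of CUpti_CallbackData is understood to serve this purpose, but this should be checked against the header), and the timestamps are assumed to come from cuptiGetTimestamp.

```c
#include <stdint.h>

#ifdef HAVE_CUPTI                    /* hypothetical build macro: define it when the CUDA toolkit is available */
#include <cupti.h>
#else                                /* stand-in with the values listed above */
typedef enum {
    CUPTI_API_ENTER = 0,
    CUPTI_API_EXIT  = 1
} CUpti_ApiCallbackSite;
#endif

/* Hypothetical helper, meant to be called from a driver or runtime API callback.
   At ENTER it stores the current timestamp in *slot and returns 0.
   At EXIT it returns the nanoseconds elapsed since the matching ENTER. */
uint64_t trackApiCall(CUpti_ApiCallbackSite site, uint64_t now_ns, uint64_t *slot)
{
    if (site == CUPTI_API_ENTER) {
        *slot = now_ns;
        return 0;
    }
    return now_ns - *slot;
}
```

Keeping the state in a per-invocation slot rather than in a global avoids mixing up calls that are in flight on different threads.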
enum CUpti_CallbackDomain

Callback domains. Each domain represents callback points for a group of related API functions or CUDA driver activity.

Values
CUPTI_CB_DOMAIN_INVALID = 0
Invalid domain.
CUPTI_CB_DOMAIN_DRIVER_API = 1
Domain containing callback points for all driver API functions.
CUPTI_CB_DOMAIN_RUNTIME_API = 2
Domain containing callback points for all runtime API functions.
CUPTI_CB_DOMAIN_RESOURCE = 3
Domain containing callback points for CUDA resource tracking.
CUPTI_CB_DOMAIN_SYNCHRONIZE = 4
Domain containing callback points for CUDA synchronization.
CUPTI_CB_DOMAIN_NVTX = 5
Domain containing callback points for NVTX API functions.
CUPTI_CB_DOMAIN_SIZE = 6
CUPTI_CB_DOMAIN_FORCE_INT = 0x7fffffff
enum CUpti_CallbackIdResource

Callback IDs for resource domain, CUPTI_CB_DOMAIN_RESOURCE. This value is communicated to the callback function via the cbid parameter.

Values
CUPTI_CBID_RESOURCE_INVALID = 0
Invalid resource callback ID.
CUPTI_CBID_RESOURCE_CONTEXT_CREATED = 1
A new context has been created.
CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING = 2
A context is about to be destroyed.
CUPTI_CBID_RESOURCE_STREAM_CREATED = 3
A new stream has been created.
CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING = 4
A stream is about to be destroyed.
CUPTI_CBID_RESOURCE_CU_INIT_FINISHED = 5
The driver has finished initializing.
CUPTI_CBID_RESOURCE_MODULE_LOADED = 6
A module has been loaded.
CUPTI_CBID_RESOURCE_MODULE_UNLOAD_STARTING = 7
A module is about to be unloaded.
CUPTI_CBID_RESOURCE_MODULE_PROFILED = 8
The current module which is being profiled.
CUPTI_CBID_RESOURCE_GRAPH_CREATED = 9
CUDA graph has been created.
CUPTI_CBID_RESOURCE_GRAPH_DESTROY_STARTING = 10
CUDA graph is about to be destroyed.
CUPTI_CBID_RESOURCE_GRAPH_CLONED = 11
CUDA graph is cloned.
CUPTI_CBID_RESOURCE_GRAPHNODE_CREATE_STARTING = 12
CUDA graph node is about to be created
CUPTI_CBID_RESOURCE_GRAPHNODE_CREATED = 13
CUDA graph node is created.
CUPTI_CBID_RESOURCE_GRAPHNODE_DESTROY_STARTING = 14
CUDA graph node is about to be destroyed.
CUPTI_CBID_RESOURCE_GRAPHNODE_DEPENDENCY_CREATED = 15
Dependency on a CUDA graph node is created.
CUPTI_CBID_RESOURCE_GRAPHNODE_DEPENDENCY_DESTROY_STARTING = 16
Dependency on a CUDA graph node is destroyed.
CUPTI_CBID_RESOURCE_GRAPHEXEC_CREATE_STARTING = 17
An executable CUDA graph is about to be created.
CUPTI_CBID_RESOURCE_GRAPHEXEC_CREATED = 18
An executable CUDA graph is created.
CUPTI_CBID_RESOURCE_GRAPHEXEC_DESTROY_STARTING = 19
An executable CUDA graph is about to be destroyed.
CUPTI_CBID_RESOURCE_GRAPHNODE_CLONED = 20
CUDA graph node is cloned.
CUPTI_CBID_RESOURCE_SIZE
CUPTI_CBID_RESOURCE_FORCE_INT = 0x7fffffff
enum CUpti_CallbackIdSync

Callback IDs for synchronization domain, CUPTI_CB_DOMAIN_SYNCHRONIZE. This value is communicated to the callback function via the cbid parameter.

Values
CUPTI_CBID_SYNCHRONIZE_INVALID = 0
Invalid synchronize callback ID.
CUPTI_CBID_SYNCHRONIZE_STREAM_SYNCHRONIZED = 1
Stream synchronization has completed for the stream.
CUPTI_CBID_SYNCHRONIZE_CONTEXT_SYNCHRONIZED = 2
Context synchronization has completed for the context.
CUPTI_CBID_SYNCHRONIZE_SIZE
CUPTI_CBID_SYNCHRONIZE_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiEnableAllDomains ( uint32_t enable, CUpti_SubscriberHandle subscriber )
Enable or disable all callbacks in all domains.
Parameters
enable
New enable state for all callbacks in all domains. Zero disables all callbacks, non-zero enables all callbacks.
subscriber
Handle to callback subscription
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_INVALID_PARAMETER

    if subscriber is invalid

Description

Enable or disable all callbacks in all domains.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, *) and cuptiEnableAllDomains(sub) are called concurrently, the results are undefined.

CUptiResult cuptiEnableCallback ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )
Enable or disable callbacks for a specific domain and callback ID.
Parameters
enable
New enable state for the callback. Zero disables the callback, non-zero enables the callback.
subscriber
Handle to callback subscription
domain
The domain of the callback
cbid
The ID of the callback
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_INVALID_PARAMETER

    if subscriber, domain or cbid is invalid.

Description

Enable or disable callbacks for a subscriber for a specific domain and callback ID.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are called concurrently, the results are undefined.

CUptiResult cuptiEnableDomain ( uint32_t enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain )
Enable or disable all callbacks for a specific domain.
Parameters
enable
New enable state for all callbacks in the domain. Zero disables all callbacks, non-zero enables all callbacks.
subscriber
Handle to callback subscription
domain
The domain of the callback
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_INVALID_PARAMETER

    if subscriber or domain is invalid

Description

Enable or disable all callbacks for a specific domain.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, *) and cuptiEnableDomain(sub, d) are called concurrently, the results are undefined.

CUptiResult cuptiGetCallbackName ( CUpti_CallbackDomain domain, uint32_t cbid, const char** name )
Get the name of a callback for a specific domain and callback ID.
Parameters
domain
The domain of the callback
cbid
The ID of the callback
name
Returns pointer to the name string on success, NULL otherwise
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_INVALID_PARAMETER

    if name is NULL, or if domain or cbid is invalid.

Description

Returns a pointer to the name C string in *name.

Note:

Names are available only for the DRIVER and RUNTIME domains.

CUptiResult cuptiGetCallbackState ( uint32_t* enable, CUpti_SubscriberHandle subscriber, CUpti_CallbackDomain domain, CUpti_CallbackId cbid )
Get the current enabled/disabled state of a callback for a specific domain and function ID.
Parameters
enable
Returns non-zero if callback enabled, zero if not enabled
subscriber
Handle to the initialized subscriber
domain
The domain of the callback
cbid
The ID of the callback
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_INVALID_PARAMETER

    if enable is NULL, or if subscriber, domain or cbid is invalid.

Description

Returns non-zero in *enable if the callback for a domain and callback ID is enabled, and zero if not enabled.

Note:

Thread-safety: a subscriber must serialize access to cuptiGetCallbackState, cuptiEnableCallback, cuptiEnableDomain, and cuptiEnableAllDomains. For example, if cuptiGetCallbackState(sub, d, c) and cuptiEnableCallback(sub, d, c) are called concurrently, the results are undefined.

CUptiResult cuptiSubscribe ( CUpti_SubscriberHandle* subscriber, CUpti_CallbackFunc callback, void* userdata )
Initialize a callback subscriber with a callback function and user data.
Parameters
subscriber
Returns handle to the initialized subscriber
callback
The callback function
userdata
A pointer to user data. This data will be passed to the callback function via the userdata parameter.
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED

    if there is already a CUPTI subscriber

  • CUPTI_ERROR_INVALID_PARAMETER

    if subscriber is NULL

Description

Initializes a callback subscriber with a callback function and (optionally) a pointer to user data. The returned subscriber handle can be used to enable and disable the callback for specific domains and callback IDs.

Note:
  • Only a single subscriber can be registered at a time. To ensure that no other CUPTI client interrupts the profiling session, every CUPTI client is responsible for calling this function before starting its profiling session. If a profiling session has already been started by another CUPTI client, this function returns the error code CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED. This function returns the same error when the application is launched under NVIDIA tools such as nvprof, Visual Profiler, Nsight Systems, Nsight Compute, cuda-gdb and cuda-memcheck.

  • This function does not enable any callbacks.

  • Thread-safety: this function is thread safe.

CUptiResult cuptiSupportedDomains ( size_t* domainCount, CUpti_DomainTable* domainTable )
Get the available callback domains.
Parameters
domainCount
Returns number of callback domains
domainTable
Returns pointer to array of available callback domains
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_INVALID_PARAMETER

    if domainCount or domainTable is NULL

Description

Returns in *domainTable an array of size *domainCount of all the available callback domains.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiUnsubscribe ( CUpti_SubscriberHandle subscriber )
Unregister a callback subscriber.
Parameters
subscriber
Handle to the initialized subscriber
Returns

  • CUPTI_SUCCESS

    on success

  • CUPTI_ERROR_NOT_INITIALIZED

    if unable to initialize CUPTI

  • CUPTI_ERROR_INVALID_PARAMETER

    if subscriber is NULL or not initialized

Description

Removes a callback subscriber so that no future callbacks will be issued to that subscriber.

Note:

Thread-safety: this function is thread safe.

3.5. CUPTI Event API

Functions, types, and enums that implement the CUPTI Event API.

Classes

struct CUpti_EventGroupSet
A set of event groups.
struct CUpti_EventGroupSets
A set of event group sets.

Defines

#define CUPTI_EVENT_INVALID
The value that indicates the event value is invalid.
#define CUPTI_EVENT_OVERFLOW
The overflow value for a CUPTI event.

Typedefs

typedef uint32_t  CUpti_EventDomainID
ID for an event domain.
typedef void *  CUpti_EventGroup
A group of events.
typedef uint32_t  CUpti_EventID
ID for an event.
typedef void  ( *CUpti_KernelReplayUpdateFunc )( const char*  kernelName,  int numReplaysDone, void*  customData )
Function type for getting updates on kernel replay.

Enumerations

enum CUpti_DeviceAttribute
Device attributes.
enum CUpti_DeviceAttributeDeviceClass
Device class.
enum CUpti_EventAttribute
Event attributes.
enum CUpti_EventCategory
An event category.
enum CUpti_EventCollectionMethod
The collection method used for an event.
enum CUpti_EventCollectionMode
Event collection modes.
enum CUpti_EventDomainAttribute
Event domain attributes.
enum CUpti_EventGroupAttribute
Event group attributes.
enum CUpti_EventProfilingScope
Profiling scope for event.
enum CUpti_ReadEventFlags
Flags for cuptiEventGroupReadEvent and cuptiEventGroupReadAllEvents.

Functions

CUptiResult cuptiDeviceEnumEventDomains ( CUdevice device, size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )
Get the event domains for a device.
CUptiResult cuptiDeviceGetAttribute ( CUdevice device, CUpti_DeviceAttribute attrib, size_t* valueSize, void* value )
Read a device attribute.
CUptiResult cuptiDeviceGetEventDomainAttribute ( CUdevice device, CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )
Read an event domain attribute.
CUptiResult cuptiDeviceGetNumEventDomains ( CUdevice device, uint32_t* numDomains )
Get the number of domains for a device.
CUptiResult cuptiDeviceGetTimestamp ( CUcontext context, uint64_t* timestamp )
Read a device timestamp.
CUptiResult cuptiDisableKernelReplayMode ( CUcontext context )
Disable kernel replay mode.
CUptiResult cuptiEnableKernelReplayMode ( CUcontext context )
Enable kernel replay mode.
CUptiResult cuptiEnumEventDomains ( size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )
Get the event domains available on any device.
CUptiResult cuptiEventDomainEnumEvents ( CUpti_EventDomainID eventDomain, size_t* arraySizeBytes, CUpti_EventID* eventArray )
Get the events in a domain.
CUptiResult cuptiEventDomainGetAttribute ( CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )
Read an event domain attribute.
CUptiResult cuptiEventDomainGetNumEvents ( CUpti_EventDomainID eventDomain, uint32_t* numEvents )
Get number of events in a domain.
CUptiResult cuptiEventGetAttribute ( CUpti_EventID event, CUpti_EventAttribute attrib, size_t* valueSize, void* value )
Get an event attribute.
CUptiResult cuptiEventGetIdFromName ( CUdevice device, const char* eventName, CUpti_EventID* event )
Find an event by name.
CUptiResult cuptiEventGroupAddEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )
Add an event to an event group.
CUptiResult cuptiEventGroupCreate ( CUcontext context, CUpti_EventGroup* eventGroup, uint32_t flags )
Create a new event group for a context.
CUptiResult cuptiEventGroupDestroy ( CUpti_EventGroup eventGroup )
Destroy an event group.
CUptiResult cuptiEventGroupDisable ( CUpti_EventGroup eventGroup )
Disable an event group.
CUptiResult cuptiEventGroupEnable ( CUpti_EventGroup eventGroup )
Enable an event group.
CUptiResult cuptiEventGroupGetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t* valueSize, void* value )
Read an event group attribute.
CUptiResult cuptiEventGroupReadAllEvents ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t* numEventIdsRead )
Read the values for all the events in an event group.
CUptiResult cuptiEventGroupReadEvent ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, CUpti_EventID event, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer )
Read the value for an event in an event group.
CUptiResult cuptiEventGroupRemoveAllEvents ( CUpti_EventGroup eventGroup )
Remove all events from an event group.
CUptiResult cuptiEventGroupRemoveEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )
Remove an event from an event group.
CUptiResult cuptiEventGroupResetAllEvents ( CUpti_EventGroup eventGroup )
Zero all the event counts in an event group.
CUptiResult cuptiEventGroupSetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t valueSize, void* value )
Write an event group attribute.
CUptiResult cuptiEventGroupSetDisable ( CUpti_EventGroupSet* eventGroupSet )
Disable an event group set.
CUptiResult cuptiEventGroupSetEnable ( CUpti_EventGroupSet* eventGroupSet )
Enable an event group set.
CUptiResult cuptiEventGroupSetsCreate ( CUcontext context, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, CUpti_EventGroupSets** eventGroupPasses )
For a set of events, get the grouping that indicates the number of passes and the event groups necessary to collect the events.
CUptiResult cuptiEventGroupSetsDestroy ( CUpti_EventGroupSets* eventGroupSets )
Destroy a CUpti_EventGroupSets object.
CUptiResult cuptiGetNumEventDomains ( uint32_t* numDomains )
Get the number of event domains available on any device.
CUptiResult cuptiKernelReplaySubscribeUpdate ( CUpti_KernelReplayUpdateFunc updateFunc, void* customData )
Subscribe to kernel replay updates.
CUptiResult cuptiSetEventCollectionMode ( CUcontext context, CUpti_EventCollectionMode mode )
Set the event collection mode.

Defines

#define CUPTI_EVENT_INVALID

Value

((uint64_t)0xFFFFFFFFFFFFFFFEULL)

#define CUPTI_EVENT_OVERFLOW

The CUPTI event value that indicates an overflow.

Value

((uint64_t)0xFFFFFFFFFFFFFFFFULL)
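
The two sentinel values above are not real counts, so code that aggregates event values probably wants to filter them out first. The following is a minimal sketch of that check, not CUPTI code: the macros are local mirrors of the documented values so the example builds without cupti.h, and EventValueStatus, classify_event_value, and sum_valid_event_values are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the documented values, so the sketch builds without cupti.h. */
#ifndef CUPTI_EVENT_OVERFLOW
#define CUPTI_EVENT_OVERFLOW ((uint64_t)0xFFFFFFFFFFFFFFFFULL)
#endif
#ifndef CUPTI_EVENT_INVALID
#define CUPTI_EVENT_INVALID ((uint64_t)0xFFFFFFFFFFFFFFFEULL)
#endif

/* Hypothetical helper type; not part of CUPTI. */
typedef enum {
    EVENT_VALUE_OK = 0,
    EVENT_VALUE_OVERFLOWED = 1,
    EVENT_VALUE_NOT_VALID = 2
} EventValueStatus;

/* Classify a raw event value against the two sentinel values. */
static EventValueStatus classify_event_value(uint64_t value)
{
    if (value == CUPTI_EVENT_OVERFLOW) return EVENT_VALUE_OVERFLOWED;
    if (value == CUPTI_EVENT_INVALID) return EVENT_VALUE_NOT_VALID;
    return EVENT_VALUE_OK;
}

/* Sum only the usable values; *skipped receives the number of sentinels seen. */
static uint64_t sum_valid_event_values(const uint64_t *values, size_t count,
                                       size_t *skipped)
{
    uint64_t total = 0;
    size_t i;

    assert(values != NULL || count == 0);
    assert(skipped != NULL);

    *skipped = 0;
    for (i = 0; i < count; ++i) {
        if (classify_event_value(values[i]) == EVENT_VALUE_OK) {
            total += values[i];
        } else {
            ++*skipped;
        }
    }
    return total;
}
```

A caller would typically run this over the buffer filled by one of the read functions and report the skipped count alongside the total.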

Typedefs

typedef uint32_t CUpti_EventDomainID

ID for an event domain. An event domain represents a group of related events. A device may have multiple instances of a domain, indicating that the device can simultaneously record multiple instances of each event within that domain.

typedef void * CUpti_EventGroup

A group of events. An event group is a collection of events that are managed together. All events in an event group must belong to the same domain.

typedef uint32_t CUpti_EventID

ID for an event. An event represents a countable activity, action, or occurrence on the device.

typedef void ( *CUpti_KernelReplayUpdateFunc )( const char* kernelName, int numReplaysDone, void* customData )

Function type for getting updates on kernel replay.

Parameters
kernelName
The mangled kernel name
numReplaysDone
Number of replays completed so far
customData
Pointer of any custom data passed in when subscribing

Enumerations

enum CUpti_DeviceAttribute

CUPTI device attributes. These attributes can be read using cuptiDeviceGetAttribute.

Values
CUPTI_DEVICE_ATTR_MAX_EVENT_ID = 1
Number of event IDs for a device. Value is a uint32_t.
CUPTI_DEVICE_ATTR_MAX_EVENT_DOMAIN_ID = 2
Number of event domain IDs for a device. Value is a uint32_t.
CUPTI_DEVICE_ATTR_GLOBAL_MEMORY_BANDWIDTH = 3
Get global memory bandwidth in Kbytes/sec. Value is a uint64_t.
CUPTI_DEVICE_ATTR_INSTRUCTION_PER_CYCLE = 4
Get theoretical maximum number of instructions per cycle. Value is a uint32_t.
CUPTI_DEVICE_ATTR_INSTRUCTION_THROUGHPUT_SINGLE_PRECISION = 5
Get theoretical maximum number of single precision instructions that can be executed per second. Value is a uint64_t.
CUPTI_DEVICE_ATTR_MAX_FRAME_BUFFERS = 6
Get number of frame buffers for device. Value is a uint64_t.
CUPTI_DEVICE_ATTR_PCIE_LINK_RATE = 7
Get PCIE link rate in Mega bits/sec for device. Return 0 if bus-type is non-PCIE. Value is a uint64_t.
CUPTI_DEVICE_ATTR_PCIE_LINK_WIDTH = 8
Get PCIE link width for device. Return 0 if bus-type is non-PCIE. Value is a uint64_t.
CUPTI_DEVICE_ATTR_PCIE_GEN = 9
Get PCIE generation for device. Return 0 if bus-type is non-PCIE. Value is a uint64_t.
CUPTI_DEVICE_ATTR_DEVICE_CLASS = 10
Get the class for the device. Value is a CUpti_DeviceAttributeDeviceClass.
CUPTI_DEVICE_ATTR_FLOP_SP_PER_CYCLE = 11
Get the peak single precision flop per cycle. Value is a uint64_t.
CUPTI_DEVICE_ATTR_FLOP_DP_PER_CYCLE = 12
Get the peak double precision flop per cycle. Value is a uint64_t.
CUPTI_DEVICE_ATTR_MAX_L2_UNITS = 13
Get number of L2 units. Value is a uint64_t.
CUPTI_DEVICE_ATTR_MAX_SHARED_MEMORY_CACHE_CONFIG_PREFER_SHARED = 14
Get the maximum shared memory for the CU_FUNC_CACHE_PREFER_SHARED preference. Value is a uint64_t.
CUPTI_DEVICE_ATTR_MAX_SHARED_MEMORY_CACHE_CONFIG_PREFER_L1 = 15
Get the maximum shared memory for the CU_FUNC_CACHE_PREFER_L1 preference. Value is a uint64_t.
CUPTI_DEVICE_ATTR_MAX_SHARED_MEMORY_CACHE_CONFIG_PREFER_EQUAL = 16
Get the maximum shared memory for the CU_FUNC_CACHE_PREFER_EQUAL preference. Value is a uint64_t.
CUPTI_DEVICE_ATTR_FLOP_HP_PER_CYCLE = 17
Get the peak half precision flop per cycle. Value is a uint64_t.
CUPTI_DEVICE_ATTR_NVLINK_PRESENT = 18
Check if Nvlink is connected to device. Returns 1, if at least one Nvlink is connected to the device, returns 0 otherwise. Value is a uint32_t.
CUPTI_DEVICE_ATTR_GPU_CPU_NVLINK_BW = 19
Check if Nvlink is present between GPU and CPU. Returns Bandwidth, in Bytes/sec, if Nvlink is present, returns 0 otherwise. Value is a uint64_t.
CUPTI_DEVICE_ATTR_NVSWITCH_PRESENT = 20
Check if NVSwitch is present in the underlying topology. Returns 1, if present, returns 0 otherwise. Value is a uint32_t.
CUPTI_DEVICE_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_DeviceAttributeDeviceClass

Enumeration of device classes for device attribute CUPTI_DEVICE_ATTR_DEVICE_CLASS.

Values
CUPTI_DEVICE_ATTR_DEVICE_CLASS_TESLA = 0
CUPTI_DEVICE_ATTR_DEVICE_CLASS_QUADRO = 1
CUPTI_DEVICE_ATTR_DEVICE_CLASS_GEFORCE = 2
CUPTI_DEVICE_ATTR_DEVICE_CLASS_TEGRA = 3
enum CUpti_EventAttribute

Event attributes. These attributes can be read using cuptiEventGetAttribute.

Values
CUPTI_EVENT_ATTR_NAME = 0
Event name. Value is a null terminated const c-string.
CUPTI_EVENT_ATTR_SHORT_DESCRIPTION = 1
Short description of event. Value is a null terminated const c-string.
CUPTI_EVENT_ATTR_LONG_DESCRIPTION = 2
Long description of event. Value is a null terminated const c-string.
CUPTI_EVENT_ATTR_CATEGORY = 3
Category of event. Value is CUpti_EventCategory.
CUPTI_EVENT_ATTR_PROFILING_SCOPE = 5
Profiling scope of the events. It can be either device or context or both. Value is a CUpti_EventProfilingScope.
CUPTI_EVENT_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventCategory

Each event is assigned to a category that represents the general type of the event. An event's category is accessed using cuptiEventGetAttribute and the CUPTI_EVENT_ATTR_CATEGORY attribute.

Values
CUPTI_EVENT_CATEGORY_INSTRUCTION = 0
An instruction related event.
CUPTI_EVENT_CATEGORY_MEMORY = 1
A memory related event.
CUPTI_EVENT_CATEGORY_CACHE = 2
A cache related event.
CUPTI_EVENT_CATEGORY_PROFILE_TRIGGER = 3
A profile-trigger event.
CUPTI_EVENT_CATEGORY_SYSTEM = 4
A system event.
CUPTI_EVENT_CATEGORY_FORCE_INT = 0x7fffffff
enum CUpti_EventCollectionMethod

The collection method indicates how an event is collected.

Values
CUPTI_EVENT_COLLECTION_METHOD_PM = 0
Event is collected using a hardware global performance monitor.
CUPTI_EVENT_COLLECTION_METHOD_SM = 1
Event is collected using a hardware SM performance monitor.
CUPTI_EVENT_COLLECTION_METHOD_INSTRUMENTED = 2
Event is collected using software instrumentation.
CUPTI_EVENT_COLLECTION_METHOD_NVLINK_TC = 3
Event is collected using NvLink throughput counter method.
CUPTI_EVENT_COLLECTION_METHOD_FORCE_INT = 0x7fffffff
enum CUpti_EventCollectionMode

The event collection mode determines the period over which the events within the enabled event groups will be collected.

Values
CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS = 0
Events are collected for the entire duration between the cuptiEventGroupEnable and cuptiEventGroupDisable calls. Event values are reset when the events are read. For CUDA toolkit v6.0 and older this was the default mode.
CUPTI_EVENT_COLLECTION_MODE_KERNEL = 1
Events are collected only for the durations of kernel executions that occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls. Event collection begins when a kernel execution begins, and stops when kernel execution completes. Event values are reset to zero when each kernel execution begins. If multiple kernel executions occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls, then the event values must be read after each kernel launch if those events need to be associated with the specific kernel launch. Note that collection in this mode may significantly change the overall performance characteristics of the application because kernel executions that occur between the cuptiEventGroupEnable and cuptiEventGroupDisable calls are serialized on the GPU. This is the default mode from CUDA toolkit v6.5.
CUPTI_EVENT_COLLECTION_MODE_FORCE_INT = 0x7fffffff
enum CUpti_EventDomainAttribute

Event domain attributes. Except where noted, all the attributes can be read using either cuptiDeviceGetEventDomainAttribute or cuptiEventDomainGetAttribute.

Values
CUPTI_EVENT_DOMAIN_ATTR_NAME = 0
Event domain name. Value is a null terminated const c-string.
CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT = 1
Number of instances of the domain for which event counts will be collected. The domain may have additional instances that cannot be profiled (see CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT). Can be read only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t.
CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT = 3
Total number of instances of the domain, including instances that cannot be profiled. Use CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT to get the number of instances that can be profiled. Can be read only with cuptiDeviceGetEventDomainAttribute. Value is a uint32_t.
CUPTI_EVENT_DOMAIN_ATTR_COLLECTION_METHOD = 4
Collection method used for events contained in the event domain. Value is a CUpti_EventCollectionMethod.
CUPTI_EVENT_DOMAIN_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventGroupAttribute

Event group attributes. These attributes can be read using cuptiEventGroupGetAttribute. Attributes marked [rw] can also be written using cuptiEventGroupSetAttribute.

Values
CUPTI_EVENT_GROUP_ATTR_EVENT_DOMAIN_ID = 0
The domain to which the event group is bound. This attribute is set when the first event is added to the group. Value is a CUpti_EventDomainID.
CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES = 1
[rw] Profile all the instances of the domain for this eventgroup. This feature can be used to get load balancing across all instances of a domain. Value is an integer.
CUPTI_EVENT_GROUP_ATTR_USER_DATA = 2
[rw] Reserved for user data.
CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS = 3
Number of events in the group. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_EVENTS = 4
Enumerates events in the group. Value is a pointer to buffer of size sizeof(CUpti_EventID) * num_of_events in the eventgroup. num_of_events can be queried using CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS.
CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT = 5
Number of instances of the domain bound to this event group that will be counted. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_PROFILING_SCOPE = 6
Profiling scope of the event group. Before any event is added, the scope can be set to CUPTI_EVENT_PROFILING_SCOPE_DEVICE or CUPTI_EVENT_PROFILING_SCOPE_CONTEXT; this setting applies when the events added to the group have scope CUPTI_EVENT_PROFILING_SCOPE_BOTH. If the profiling scope of an event is either CUPTI_EVENT_PROFILING_SCOPE_DEVICE or CUPTI_EVENT_PROFILING_SCOPE_CONTEXT, setting this attribute does not affect the default scope. Events of different scopes cannot be added to the same event group. Value is a uint32_t.
CUPTI_EVENT_GROUP_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_EventProfilingScope

The profiling scope of an event indicates whether the event can be collected at context scope, at device scope, or at either of the two.

Values
CUPTI_EVENT_PROFILING_SCOPE_CONTEXT = 0
Event is collected at context scope.
CUPTI_EVENT_PROFILING_SCOPE_DEVICE = 1
Event is collected at device scope.
CUPTI_EVENT_PROFILING_SCOPE_BOTH = 2
Event can be collected at device or context scope. The scope can be set using the cuptiEventGroupSetAttribute API.
CUPTI_EVENT_PROFILING_SCOPE_FORCE_INT = 0x7fffffff
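
The interaction between an event's own scope and the scope requested through CUPTI_EVENT_GROUP_ATTR_PROFILING_SCOPE can be summarized as a small decision function. This is only a sketch of one reading of the attribute description, not CUPTI code: SketchScope mirrors the three documented enum values so the example builds without cupti.h, and effective_group_scope is a hypothetical name.

```c
#include <assert.h>

/* Local mirror of the documented CUpti_EventProfilingScope values. */
typedef enum {
    SKETCH_SCOPE_CONTEXT = 0,
    SKETCH_SCOPE_DEVICE = 1,
    SKETCH_SCOPE_BOTH = 2
} SketchScope;

/* Scope at which an event would be collected, given the event's own scope
 * and the scope requested for the group before any event was added. */
static SketchScope effective_group_scope(SketchScope eventScope,
                                         SketchScope requestedScope)
{
    /* Only context or device scope can be requested for a group. */
    assert(requestedScope == SKETCH_SCOPE_CONTEXT ||
           requestedScope == SKETCH_SCOPE_DEVICE);

    if (eventScope == SKETCH_SCOPE_BOTH) {
        /* The event supports either scope, so the request decides. */
        return requestedScope;
    }
    /* The event has a fixed scope; the request does not change it. */
    return eventScope;
}
```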
enum CUpti_ReadEventFlags
Values
CUPTI_EVENT_READ_FLAG_NONE = 0
No flags.
CUPTI_EVENT_READ_FLAG_FORCE_INT = 0x7fffffff

Functions

CUptiResult cuptiDeviceEnumEventDomains ( CUdevice device, size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )
Get the event domains for a device.
Parameters
device
The CUDA device
arraySizeBytes
The size of domainArray in bytes, and returns the number of bytes written to domainArray
domainArray
Returns the IDs of the event domains for the device
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_PARAMETER

    if arraySizeBytes or domainArray is NULL

Description

Returns the event domain IDs in domainArray for a device. The size of the domainArray buffer is given by *arraySizeBytes. The size of the domainArray buffer must be at least numDomains * sizeof(CUpti_EventDomainID); otherwise not all domains will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in domainArray.

Note:

Thread-safety: this function is thread safe.
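
The sizing rule for domainArray is simple byte arithmetic, sketched below under a few assumptions. SketchEventDomainID is a local stand-in for CUpti_EventDomainID (documented above as a uint32_t typedef) so the example builds without cupti.h, and both function names are hypothetical. The domain count would come from cuptiDeviceGetNumEventDomains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CUpti_EventDomainID, which is documented as a uint32_t. */
typedef uint32_t SketchEventDomainID;

/* Bytes to allocate for domainArray and to pass in *arraySizeBytes. */
static size_t domain_array_bytes(uint32_t numDomains)
{
    return (size_t)numDomains * sizeof(SketchEventDomainID);
}

/* Number of domain IDs actually returned, derived from the byte count
 * that the call writes back into *arraySizeBytes. */
static size_t domains_returned(size_t bytesWritten)
{
    /* A whole number of IDs is expected. */
    assert(bytesWritten % sizeof(SketchEventDomainID) == 0);
    return bytesWritten / sizeof(SketchEventDomainID);
}
```

Comparing domains_returned against the queried domain count is one way to notice that the buffer was too small.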

CUptiResult cuptiDeviceGetAttribute ( CUdevice device, CUpti_DeviceAttribute attrib, size_t* valueSize, void* value )
Read a device attribute.
Parameters
device
The CUDA device
attrib
The attribute to read
valueSize
Size of the buffer pointed to by value, and returns the number of bytes written to value
value
Returns the value of the attribute
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not a device attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

Description

Read a device attribute and return it in *value.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiDeviceGetEventDomainAttribute ( CUdevice device, CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )
Read an event domain attribute.
Parameters
device
The CUDA device
eventDomain
ID of the event domain
attrib
The event domain attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not an event domain attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

Description

Returns an event domain attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Note:

Thread-safety: this function is thread safe.
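
Because a truncated c-string attribute is returned without a terminating null byte, a caller may want to reserve one byte of its buffer and terminate the string itself. The helper below is a sketch of that idea, not part of CUPTI: finish_attribute_string is a hypothetical name, and it assumes the caller passed capacity - 1 as *valueSize so that one spare byte is always available.

```c
#include <assert.h>
#include <stddef.h>

/* Terminate a c-string attribute after the attribute call has returned.
 *
 * buf           buffer that received the attribute value
 * capacity      total size of buf in bytes
 * bytesReturned value written back to *valueSize by the call
 *
 * Assumes the caller passed capacity - 1 as *valueSize, keeping one byte
 * in reserve. Returns 1 if the value filled all the space offered without
 * a terminator (so it may have been truncated), 0 otherwise. */
static int finish_attribute_string(char *buf, size_t capacity,
                                   size_t bytesReturned)
{
    assert(buf != NULL);
    assert(capacity > 0);
    assert(bytesReturned < capacity);

    /* Always safe: the reserved byte guarantees room for the terminator. */
    buf[bytesReturned] = '\0';

    if (bytesReturned > 0 &&
        bytesReturned == capacity - 1 &&
        buf[bytesReturned - 1] != '\0') {
        return 1;
    }
    return 0;
}
```

When the helper reports possible truncation, retrying with a larger buffer is the obvious follow-up.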

CUptiResult cuptiDeviceGetNumEventDomains ( CUdevice device, uint32_t* numDomains )
Get the number of domains for a device.
Parameters
device
The CUDA device
numDomains
Returns the number of domains
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_PARAMETER

    if numDomains is NULL

Description

Returns the number of domains in numDomains for a device.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiDeviceGetTimestamp ( CUcontext context, uint64_t* timestamp )
Read a device timestamp.
Parameters
context
A context on the device from which to get the timestamp
timestamp
Returns the device timestamp
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT

  • CUPTI_ERROR_INVALID_PARAMETER

    if timestamp is NULL

Description

Returns the device timestamp in *timestamp. The timestamp is reported in nanoseconds and indicates the time since the device was last reset.

Note:

Thread-safety: this function is thread safe.
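
Since the timestamp is reported in nanoseconds, an interval on the device can be measured by subtracting two readings. The sketch below shows only that arithmetic; the helper names are hypothetical, and the two readings are assumed to come from cuptiDeviceGetTimestamp on the same device with no device reset in between.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds elapsed between two device timestamps taken on the same
 * device. Assumes no device reset happened between the two readings. */
static uint64_t elapsed_ns(uint64_t startTimestamp, uint64_t endTimestamp)
{
    assert(endTimestamp >= startTimestamp);
    return endTimestamp - startTimestamp;
}

/* Convert a nanosecond interval to milliseconds for reporting. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}
```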

CUptiResult cuptiDisableKernelReplayMode ( CUcontext context )
Disable kernel replay mode.
Parameters
context
The context
Returns

  • CUPTI_SUCCESS

Description

Set profiling mode for the context to non-replay (default) mode. Event collection mode will be set to CUPTI_EVENT_COLLECTION_MODE_KERNEL. All previously enabled event groups and event group sets will be disabled.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEnableKernelReplayMode ( CUcontext context )
Enable kernel replay mode.
Parameters
context
The context
Returns

  • CUPTI_SUCCESS

Description

Set profiling mode for the context to replay mode. In this mode, any number of events can be collected in one run of the kernel. The event collection mode will automatically switch to CUPTI_EVENT_COLLECTION_MODE_KERNEL. In this mode, cuptiSetEventCollectionMode will return CUPTI_ERROR_INVALID_OPERATION.

Note:
  • Kernels might take longer to run if many events are enabled.

  • Thread-safety: this function is thread safe.

CUptiResult cuptiEnumEventDomains ( size_t* arraySizeBytes, CUpti_EventDomainID* domainArray )
Get the event domains available on any device.
Parameters
arraySizeBytes
The size of domainArray in bytes, and returns the number of bytes written to domainArray
domainArray
Returns all the event domains
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if arraySizeBytes or domainArray is NULL

Description

Returns all the event domains available on any CUDA-capable device. Event domain IDs are returned in domainArray. The size of the domainArray buffer is given by *arraySizeBytes. The size of the domainArray buffer must be at least numDomains * sizeof(CUpti_EventDomainID); otherwise not all domains will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in domainArray.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventDomainEnumEvents ( CUpti_EventDomainID eventDomain, size_t* arraySizeBytes, CUpti_EventID* eventArray )
Get the events in a domain.
Parameters
eventDomain
ID of the event domain
arraySizeBytes
The size of eventArray in bytes, and returns the number of bytes written to eventArray
eventArray
Returns the IDs of the events in the domain
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if arraySizeBytes or eventArray is NULL

Description

Returns the event IDs in eventArray for a domain. The size of the eventArray buffer is given by *arraySizeBytes. The size of the eventArray buffer must be at least numDomainEvents * sizeof(CUpti_EventID); otherwise not all events will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in eventArray.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventDomainGetAttribute ( CUpti_EventDomainID eventDomain, CUpti_EventDomainAttribute attrib, size_t* valueSize, void* value )
Read an event domain attribute.
Parameters
eventDomain
ID of the event domain
attrib
The event domain attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not an event domain attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

Description

Returns an event domain attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventDomainGetNumEvents ( CUpti_EventDomainID eventDomain, uint32_t* numEvents )
Get number of events in a domain.
Parameters
eventDomain
ID of the event domain
numEvents
Returns the number of events in the domain
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_DOMAIN_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if numEvents is NULL

Description

Returns the number of events in numEvents for a domain.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGetAttribute ( CUpti_EventID event, CUpti_EventAttribute attrib, size_t* valueSize, void* value )
Get an event attribute.
Parameters
event
ID of the event
attrib
The event attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not an event attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

Description

Returns an event attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGetIdFromName ( CUdevice device, const char* eventName, CUpti_EventID* event )
Find an event by name.
Parameters
device
The CUDA device
eventName
The name of the event to find
event
Returns the ID of the found event or undefined if unable to find the event
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_EVENT_NAME

    if unable to find an event with name eventName. In this case *event is undefined

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventName or event is NULL

Description

Find an event by name and return the event ID in *event.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupAddEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )
Add an event to an event group.
Parameters
eventGroup
The event group
event
The event to add to the group
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_ID

  • CUPTI_ERROR_OUT_OF_MEMORY

  • CUPTI_ERROR_INVALID_OPERATION

    if eventGroup is enabled

  • CUPTI_ERROR_NOT_COMPATIBLE

    if event belongs to a different event domain than the events already in eventGroup, or if a device limitation prevents event from being collected at the same time as the events already in eventGroup

  • CUPTI_ERROR_MAX_LIMIT_REACHED

    if eventGroup is full

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Add an event to an event group. The event add can fail for a number of reasons:

  • The event group is enabled

  • The event does not belong to the same event domain as the events that are already in the event group

  • Device limitations on the events that can belong to the same group

  • The event group is full

Note:

Thread-safety: this function is thread safe.
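
The failure reasons for cuptiEventGroupAddEvent map onto distinct result codes, so a caller can decide what to try next from the returned value. The following sketch illustrates one possible policy and is not CUPTI code: the enum constants are local mirrors of the CUptiResult values listed in section 3.1 so the example builds without cupti.h, and AddEventAction and next_action_for_add_event are hypothetical names. Placing the event in another group after CUPTI_ERROR_NOT_COMPATIBLE or CUPTI_ERROR_MAX_LIMIT_REACHED is an assumption about a reasonable recovery, not documented behavior.

```c
#include <assert.h>

/* Local mirrors of the CUptiResult values listed in section 3.1. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERROR_INVALID_OPERATION = 7,
    SKETCH_ERROR_MAX_LIMIT_REACHED = 12,
    SKETCH_ERROR_NOT_COMPATIBLE = 14
};

/* Hypothetical follow-up actions; not part of CUPTI. */
typedef enum {
    ADD_EVENT_DONE = 0,
    ADD_EVENT_DISABLE_GROUP_FIRST = 1,
    ADD_EVENT_USE_ANOTHER_GROUP = 2,
    ADD_EVENT_GIVE_UP = 3
} AddEventAction;

/* Choose a follow-up action from the result of an add-event call. */
static AddEventAction next_action_for_add_event(int result)
{
    /* The result codes listed in section 3.1 start at 0. */
    assert(result >= 0);

    switch (result) {
    case SKETCH_SUCCESS:
        return ADD_EVENT_DONE;
    case SKETCH_ERROR_INVALID_OPERATION:
        /* The group is enabled; it must be disabled before adding. */
        return ADD_EVENT_DISABLE_GROUP_FIRST;
    case SKETCH_ERROR_NOT_COMPATIBLE:
    case SKETCH_ERROR_MAX_LIMIT_REACHED:
        /* Different domain, device limitation, or full group. */
        return ADD_EVENT_USE_ANOTHER_GROUP;
    default:
        return ADD_EVENT_GIVE_UP;
    }
}
```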

CUptiResult cuptiEventGroupCreate ( CUcontext context, CUpti_EventGroup* eventGroup, uint32_t flags )
Create a new event group for a context.
Parameters
context
The context for the event group
eventGroup
Returns the new event group
flags
Reserved - must be zero
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT

  • CUPTI_ERROR_OUT_OF_MEMORY

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Creates a new event group for context and returns the new group in *eventGroup.

Note:
  • flags are reserved for future use and should be set to zero.

  • Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupDestroy ( CUpti_EventGroup eventGroup )
Destroy an event group.
Parameters
eventGroup
The event group to destroy
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION

    if the event group is enabled

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Destroy an eventGroup and free its resources. An event group cannot be destroyed if it is enabled.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupDisable ( CUpti_EventGroup eventGroup )
Disable an event group.
Parameters
eventGroup
The event group
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Disable an event group. Disabling an event group stops collection of events contained in the group.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupEnable ( CUpti_EventGroup eventGroup )
Enable an event group.
Parameters
eventGroup
The event group
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_NOT_READY

    if eventGroup does not contain any events

  • CUPTI_ERROR_NOT_COMPATIBLE

    if eventGroup cannot be enabled due to other already enabled event groups

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

  • CUPTI_ERROR_HARDWARE_BUSY

    if another client is profiling and hardware is busy

Description

Enable an event group. Enabling an event group zeros the value of all the events in the group and then starts collection of those events.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupGetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t* valueSize, void* value )
Read an event group attribute.
Parameters
eventGroup
The event group
attrib
The attribute to read
valueSize
The size of the buffer pointed to by value in bytes, and returns the number of bytes written to value
value
Returns the value of the attribute
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not an event group attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

Description

Read an event group attribute and return it in *value.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.).

CUptiResult cuptiEventGroupReadAllEvents ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t* numEventIdsRead )
Read the values for all the events in an event group.
Parameters
eventGroup
The event group
flags
Flags controlling the reading mode
eventValueBufferSizeBytes
The size of eventValueBuffer in bytes, and returns the number of bytes written to eventValueBuffer
eventValueBuffer
Returns the event values
eventIdArraySizeBytes
The size of eventIdArray in bytes, and returns the number of bytes written to eventIdArray
eventIdArray
Returns the IDs of the events in the same order as the values returned in eventValueBuffer.
numEventIdsRead
Returns the number of event IDs returned in eventIdArray
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_INVALID_OPERATION

    if eventGroup is disabled

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup, eventValueBufferSizeBytes, eventValueBuffer, eventIdArraySizeBytes, eventIdArray or numEventIdsRead is NULL

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    if size of eventValueBuffer or eventIdArray is not sufficient

Description

Read the values for all the events in an event group. The event values are returned in the eventValueBuffer buffer. eventValueBufferSizeBytes indicates the size of eventValueBuffer. The buffer must be at least (sizeof(uint64_t) * number of events in group) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set on the group containing the events. The buffer must be at least (sizeof(uint64_t) * number of domain instances * number of events in group) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set on the group.

The data format returned in eventValueBuffer is:

  • domain instance 0: event0 event1 ... eventN

  • domain instance 1: event0 event1 ... eventN

  • ...

  • domain instance M: event0 event1 ... eventN

The event order in eventValueBuffer is returned in eventIdArray. The size of eventIdArray is specified in eventIdArraySizeBytes. The size should be at least (sizeof(CUpti_EventID) * number of events in group).

If any instance of any event counter overflows, the value returned for that event instance will be CUPTI_EVENT_OVERFLOW.

The only allowed value for flags is CUPTI_EVENT_READ_FLAG_NONE.

Reading events from a disabled event group is not allowed. After being read, an event's value is reset to zero.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.). If cuptiEventGroupResetAllEvents is called simultaneously with this function, then returned event values are undefined.
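As a rough illustration of the eventValueBuffer layout described for cuptiEventGroupReadAllEvents, the following C sketch computes the minimum buffer size and the index of the value for a given domain instance and event. The helper names (event_value_buffer_bytes, event_value_index, sum_event_over_instances) are hypothetical and not part of CUPTI, and the instance and event counts are assumed to have been obtained separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum size in bytes of eventValueBuffer. Pass numInstances = 1 when
 * CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set. */
static size_t event_value_buffer_bytes(size_t numInstances, size_t numEvents)
{
    return sizeof(uint64_t) * numInstances * numEvents;
}

/* Index of the value for (instance, event). Values are stored one domain
 * instance after another, each instance holding event0 ... eventN. */
static size_t event_value_index(size_t instance, size_t event, size_t numEvents)
{
    assert(event < numEvents);
    return instance * numEvents + event;
}

/* Sum the values of one event across all domain instances. */
static uint64_t sum_event_over_instances(const uint64_t *values,
                                         size_t numInstances,
                                         size_t numEvents, size_t event)
{
    uint64_t total = 0;
    for (size_t i = 0; i < numInstances; ++i) {
        total += values[event_value_index(i, event, numEvents)];
    }
    return total;
}
```

The event position used here is the position of the event in eventIdArray, since that array reports the event order used in eventValueBuffer.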

CUptiResult cuptiEventGroupReadEvent ( CUpti_EventGroup eventGroup, CUpti_ReadEventFlags flags, CUpti_EventID event, size_t* eventValueBufferSizeBytes, uint64_t* eventValueBuffer )
Read the value for an event in an event group.
Parameters
eventGroup
The event group
flags
Flags controlling the reading mode
event
The event to read
eventValueBufferSizeBytes
The size of eventValueBuffer in bytes, and returns the number of bytes written to eventValueBuffer
eventValueBuffer
Returns the event value(s)
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_ID

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_INVALID_OPERATION

    if eventGroup is disabled

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup, eventValueBufferSizeBytes or eventValueBuffer is NULL

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    if size of eventValueBuffer is not sufficient

Description

Read the value for an event in an event group. The event value is returned in the eventValueBuffer buffer. eventValueBufferSizeBytes indicates the size of the eventValueBuffer buffer. The buffer must be at least sizeof(uint64_t) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is not set on the group containing the event. The buffer must be at least (sizeof(uint64_t) * number of domain instances) if CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES is set on the group.

If any instance of an event counter overflows, the value returned for that event instance will be CUPTI_EVENT_OVERFLOW.

The only allowed value for flags is CUPTI_EVENT_READ_FLAG_NONE.

Reading an event from a disabled event group is not allowed. After being read, an event's value is reset to zero.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.). If cuptiEventGroupResetAllEvents is called simultaneously with this function, then returned event values are undefined.
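When all domain instances are profiled, cuptiEventGroupReadEvent returns one value per instance, and any of them may be the overflow marker. The sketch below shows one possible way to total the per-instance values while refusing to produce a sum if an instance overflowed. The function name is hypothetical, and because the numeric value of CUPTI_EVENT_OVERFLOW is not given in this section, the sentinel is passed in as a parameter rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds up the per-instance values of one event. Returns false and leaves
 * *total untouched if any instance equals the overflow sentinel. */
static bool sum_instances_checked(const uint64_t *values, size_t numInstances,
                                  uint64_t overflowSentinel, uint64_t *total)
{
    assert(values != NULL && total != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < numInstances; ++i) {
        if (values[i] == overflowSentinel) {
            return false;   /* this instance's counter overflowed */
        }
        sum += values[i];
    }
    *total = sum;
    return true;
}
```

A caller would pass CUPTI_EVENT_OVERFLOW as overflowSentinel and treat a false return as "value unavailable for this read".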

CUptiResult cuptiEventGroupRemoveAllEvents ( CUpti_EventGroup eventGroup )
Remove all events from an event group.
Parameters
eventGroup
The event group
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION

    if eventGroup is enabled

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Remove all events from an event group. Events cannot be removed if the event group is enabled.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupRemoveEvent ( CUpti_EventGroup eventGroup, CUpti_EventID event )
Remove an event from an event group.
Parameters
eventGroup
The event group
event
The event to remove from the group
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_EVENT_ID

  • CUPTI_ERROR_INVALID_OPERATION

    if eventGroup is enabled

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Remove event from an event group. The event cannot be removed if the event group is enabled.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupResetAllEvents ( CUpti_EventGroup eventGroup )
Zero all the event counts in an event group.
Parameters
eventGroup
The event group
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroup is NULL

Description

Zero all the event counts in an event group.

Note:

Thread-safety: this function is thread safe but client must guard against simultaneous destruction or modification of eventGroup (for example, client must guard against simultaneous calls to cuptiEventGroupDestroy, cuptiEventGroupAddEvent, etc.), and must guard against simultaneous destruction of the context in which eventGroup was created (for example, client must guard against simultaneous calls to cudaDeviceReset, cuCtxDestroy, etc.).

CUptiResult cuptiEventGroupSetAttribute ( CUpti_EventGroup eventGroup, CUpti_EventGroupAttribute attrib, size_t valueSize, void* value )
Write an event group attribute.
Parameters
eventGroup
The event group
attrib
The attribute to write
valueSize
The size, in bytes, of the value
value
The attribute value to write
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not an event group attribute, or if attrib is not a writable attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    Indicates that the value buffer is too small to hold the attribute value.

Description

Write an event group attribute.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupSetDisable ( CUpti_EventGroupSet* eventGroupSet )
Disable an event group set.
Parameters
eventGroupSet
The pointer to the event group set
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroupSet is NULL

Description

Disable a set of event groups. Disabling a set of event groups stops collection of events contained in the groups.

Note:
  • Thread-safety: this function is thread safe.

  • If this call fails, some of the event groups in the set may be disabled and other event groups may remain enabled.

CUptiResult cuptiEventGroupSetEnable ( CUpti_EventGroupSet* eventGroupSet )
Enable an event group set.
Parameters
eventGroupSet
The pointer to the event group set
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_HARDWARE

  • CUPTI_ERROR_NOT_READY

    if an event group in eventGroupSet does not contain any events

  • CUPTI_ERROR_NOT_COMPATIBLE

    if an event group in eventGroupSet cannot be enabled due to other already enabled event groups

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroupSet is NULL

  • CUPTI_ERROR_HARDWARE_BUSY

    if another client is profiling and hardware is busy

Description

Enable a set of event groups. Enabling a set of event groups zeros the value of all the events in all the groups and then starts collection of those events.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiEventGroupSetsCreate ( CUcontext context, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, CUpti_EventGroupSets** eventGroupPasses )
For a set of events, get the grouping that indicates the number of passes and the event groups necessary to collect the events.
Parameters
context
The context for event collection
eventIdArraySizeBytes
Size of eventIdArray in bytes
eventIdArray
Array of event IDs that need to be grouped
eventGroupPasses
Returns a CUpti_EventGroupSets object that indicates the number of passes required to collect the events and the events to collect on each pass
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT

  • CUPTI_ERROR_INVALID_EVENT_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventIdArray or eventGroupPasses is NULL

Description

The number of events that can be collected simultaneously varies by device and by the type of the events. When events can be collected simultaneously, they may need to be grouped into multiple event groups because they are from different event domains. This function takes a set of events and determines how many passes are required to collect all those events, and which events can be collected simultaneously in each pass.

The numSets field of the CUpti_EventGroupSets returned in eventGroupPasses indicates how many passes are required to collect the events. The sets array holds one event group set per pass, and each set lists the event groups that should be collected on that pass.

Note:

Thread-safety: this function is thread safe, but client must guard against another thread simultaneously destroying context.
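The pass structure returned by cuptiEventGroupSetsCreate can be pictured with the following C sketch, which walks a plan pass by pass. The struct types here are hypothetical stand-ins defined only for this example: the field names numSets and sets follow the description above, while numEventGroups and both helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one event group set (one pass). */
typedef struct {
    uint32_t numEventGroups;   /* assumed: groups collected together in this pass */
} PassSketch;

/* Hypothetical stand-in for the object returned in eventGroupPasses. */
typedef struct {
    uint32_t numSets;          /* number of passes required */
    const PassSketch *sets;    /* one entry per pass */
} PassPlanSketch;

/* Total number of event groups across all passes. */
static uint32_t total_event_groups(const PassPlanSketch *plan)
{
    assert(plan != NULL);
    uint32_t total = 0;
    for (uint32_t pass = 0; pass < plan->numSets; ++pass) {
        total += plan->sets[pass].numEventGroups;
    }
    return total;
}

/* Largest number of event groups enabled at the same time in any pass. */
static uint32_t max_groups_in_one_pass(const PassPlanSketch *plan)
{
    assert(plan != NULL);
    uint32_t best = 0;
    for (uint32_t pass = 0; pass < plan->numSets; ++pass) {
        if (plan->sets[pass].numEventGroups > best) {
            best = plan->sets[pass].numEventGroups;
        }
    }
    return best;
}
```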

CUptiResult cuptiEventGroupSetsDestroy ( CUpti_EventGroupSets* eventGroupSets )
Destroy a CUpti_EventGroupSets object.
Parameters
eventGroupSets
The object to destroy
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION

    if any of the event groups contained in the sets is enabled

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventGroupSets is NULL

Description

Destroy a CUpti_EventGroupSets object.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiGetNumEventDomains ( uint32_t* numDomains )
Get the number of event domains available on any device.
Parameters
numDomains
Returns the number of domains
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if numDomains is NULL

Description

Returns the total number of event domains available on any CUDA-capable device.

Note:

Thread-safety: this function is thread safe.

CUptiResult cuptiKernelReplaySubscribeUpdate ( CUpti_KernelReplayUpdateFunc updateFunc, void* customData )
Subscribe to kernel replay updates.
Parameters
updateFunc
The update function pointer
customData
Pointer to any custom data
Returns

  • CUPTI_SUCCESS

Description

When subscribed, the function pointer passed in will be called each time a kernel run is finished during kernel replay. Any previously subscribed function pointer is replaced. Passing NULL as the function pointer unsubscribes from the updates.
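The subscription semantics of cuptiKernelReplaySubscribeUpdate (a single slot, replaced on each call, cleared by NULL) can be modeled with a small C sketch. Everything here is hypothetical and independent of CUPTI: the callback type ReplayUpdateSketch is an assumed shape, since the real CUpti_KernelReplayUpdateFunc signature is not shown in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for this sketch only. */
typedef void (*ReplayUpdateSketch)(int runsDone, void *customData);

/* A single subscription slot: at most one callback is registered. */
static ReplayUpdateSketch g_updateFunc = NULL;
static void *g_customData = NULL;

/* Replaces any previous subscription; NULL unsubscribes. */
static void subscribe_update(ReplayUpdateSketch updateFunc, void *customData)
{
    g_updateFunc = updateFunc;
    g_customData = (updateFunc != NULL) ? customData : NULL;
}

/* Called by the model each time a replayed kernel run finishes. */
static void notify_run_finished(int runsDone)
{
    if (g_updateFunc != NULL) {
        g_updateFunc(runsDone, g_customData);
    }
}

/* Example callback: counts notifications in the int that customData points to. */
static void count_runs(int runsDone, void *customData)
{
    (void)runsDone;
    assert(customData != NULL);
    *(int *)customData += 1;
}
```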

CUptiResult cuptiSetEventCollectionMode ( CUcontext context, CUpti_EventCollectionMode mode )
Set the event collection mode.
Parameters
context
The context
mode
The event collection mode
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT

  • CUPTI_ERROR_INVALID_OPERATION

    if called when replay mode is enabled

  • CUPTI_ERROR_NOT_SUPPORTED

    if mode is not supported on the device

Description

Set the event collection mode for a context. The mode controls the event collection behavior of all events in event groups created in the context. This API is invalid in kernel replay mode.

Note:

Thread-safety: this function is thread safe.

3.6. CUPTI Metric API

Functions, types, and enums that implement the CUPTI Metric API.

Classes

union CUpti_MetricValue
A metric value.

Typedefs

typedef uint32_t CUpti_MetricID
ID for a metric.

Enumerations

enum CUpti_MetricAttribute
Metric attributes.
enum CUpti_MetricCategory
A metric category.
enum CUpti_MetricEvaluationMode
A metric evaluation mode.
enum CUpti_MetricPropertyDeviceClass
Device class.
enum CUpti_MetricPropertyID
Metric device properties.
enum CUpti_MetricValueKind
Kinds of metric values.
enum CUpti_MetricValueUtilizationLevel
Enumeration of utilization levels for metric values of kind CUPTI_METRIC_VALUE_KIND_UTILIZATION_LEVEL. Utilization values can vary from IDLE (0) to MAX (10), but the enumeration only provides specific names for a few values.

Functions

CUptiResult cuptiDeviceEnumMetrics ( CUdevice device, size_t* arraySizeBytes, CUpti_MetricID* metricArray )
Get the metrics for a device.
CUptiResult cuptiDeviceGetNumMetrics ( CUdevice device, uint32_t* numMetrics )
Get the number of metrics for a device.
CUptiResult cuptiEnumMetrics ( size_t* arraySizeBytes, CUpti_MetricID* metricArray )
Get all the metrics available on any device.
CUptiResult cuptiGetNumMetrics ( uint32_t* numMetrics )
Get the total number of metrics available on any device.
CUptiResult cuptiMetricCreateEventGroupSets ( CUcontext context, size_t metricIdArraySizeBytes, CUpti_MetricID* metricIdArray, CUpti_EventGroupSets** eventGroupPasses )
For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics.
CUptiResult cuptiMetricEnumEvents ( CUpti_MetricID metric, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray )
Get the events required to calculate a metric.
CUptiResult cuptiMetricEnumProperties ( CUpti_MetricID metric, size_t* propIdArraySizeBytes, CUpti_MetricPropertyID* propIdArray )
Get the properties required to calculate a metric.
CUptiResult cuptiMetricGetAttribute ( CUpti_MetricID metric, CUpti_MetricAttribute attrib, size_t* valueSize, void* value )
Get a metric attribute.
CUptiResult cuptiMetricGetIdFromName ( CUdevice device, const char* metricName, CUpti_MetricID* metric )
Find a metric by name.
CUptiResult cuptiMetricGetNumEvents ( CUpti_MetricID metric, uint32_t* numEvents )
Get number of events required to calculate a metric.
CUptiResult cuptiMetricGetNumProperties ( CUpti_MetricID metric, uint32_t* numProp )
Get number of properties required to calculate a metric.
CUptiResult cuptiMetricGetRequiredEventGroupSets ( CUcontext context, CUpti_MetricID metric, CUpti_EventGroupSets** eventGroupSets )
For a metric, get the groups of events that must be collected in the same pass.
CUptiResult cuptiMetricGetValue ( CUdevice device, CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t eventValueArraySizeBytes, uint64_t* eventValueArray, uint64_t timeDuration, CUpti_MetricValue* metricValue )
Calculate the value for a metric.
CUptiResult cuptiMetricGetValue2 ( CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t eventValueArraySizeBytes, uint64_t* eventValueArray, size_t propIdArraySizeBytes, CUpti_MetricPropertyID* propIdArray, size_t propValueArraySizeBytes, uint64_t* propValueArray, CUpti_MetricValue* metricValue )
Calculate the value for a metric.

Typedefs

typedef uint32_t CUpti_MetricID

ID for a metric. A metric provides a measure of some aspect of the device.

Enumerations

enum CUpti_MetricAttribute

Metric attributes describe properties of a metric. These attributes can be read using cuptiMetricGetAttribute.

Values
CUPTI_METRIC_ATTR_NAME = 0
Metric name. Value is a null terminated const c-string.
CUPTI_METRIC_ATTR_SHORT_DESCRIPTION = 1
Short description of metric. Value is a null terminated const c-string.
CUPTI_METRIC_ATTR_LONG_DESCRIPTION = 2
Long description of metric. Value is a null terminated const c-string.
CUPTI_METRIC_ATTR_CATEGORY = 3
Category of the metric. Value is of type CUpti_MetricCategory.
CUPTI_METRIC_ATTR_VALUE_KIND = 4
Value type of the metric. Value is of type CUpti_MetricValueKind.
CUPTI_METRIC_ATTR_EVALUATION_MODE = 5
Metric evaluation mode. Value is of type CUpti_MetricEvaluationMode.
CUPTI_METRIC_ATTR_FORCE_INT = 0x7fffffff
enum CUpti_MetricCategory

Each metric is assigned to a category that represents the general type of the metric. A metric's category is accessed using cuptiMetricGetAttribute and the CUPTI_METRIC_ATTR_CATEGORY attribute.

Values
CUPTI_METRIC_CATEGORY_MEMORY = 0
A memory related metric.
CUPTI_METRIC_CATEGORY_INSTRUCTION = 1
An instruction related metric.
CUPTI_METRIC_CATEGORY_MULTIPROCESSOR = 2
A multiprocessor related metric.
CUPTI_METRIC_CATEGORY_CACHE = 3
A cache related metric.
CUPTI_METRIC_CATEGORY_TEXTURE = 4
A texture related metric.
CUPTI_METRIC_CATEGORY_NVLINK = 5
An NVLink related metric.
CUPTI_METRIC_CATEGORY_PCIE = 6
A PCIe related metric.
CUPTI_METRIC_CATEGORY_FORCE_INT = 0x7fffffff
enum CUpti_MetricEvaluationMode

A metric can be evaluated per hardware instance to show the load balancing across instances of a domain, or it can be evaluated in aggregate mode when the events involved in metric evaluation are from different event domains. It might be possible to evaluate some metrics in both modes for convenience. A metric's evaluation mode is accessed using cuptiMetricGetAttribute and the CUPTI_METRIC_ATTR_EVALUATION_MODE attribute.

Values
CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE = 1
If this bit is set, the metric can be profiled for each instance of the domain. The event values passed to cuptiMetricGetValue can contain values for one instance of the domain, and cuptiMetricGetValue can be called for each instance.
CUPTI_METRIC_EVALUATION_MODE_AGGREGATE = 1<<1
If this bit is set, the metric can be profiled over all instances. The event values passed to cuptiMetricGetValue can be aggregated values of events for all instances of the domain.
CUPTI_METRIC_EVALUATION_MODE_FORCE_INT = 0x7fffffff
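Because the evaluation mode values are bit flags, a metric that supports both modes reports both bits set. A minimal C sketch of testing the flags follows. The constants are local copies of the values listed for CUpti_MetricEvaluationMode, renamed so that they cannot clash with the CUPTI headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the bit values listed for CUpti_MetricEvaluationMode. */
enum {
    EVAL_MODE_PER_INSTANCE = 1,
    EVAL_MODE_AGGREGATE    = 1 << 1
};

/* True if the metric can be evaluated for each instance of the domain. */
static bool supports_per_instance(uint32_t mode)
{
    return (mode & EVAL_MODE_PER_INSTANCE) != 0;
}

/* True if the metric can be evaluated over aggregated event values. */
static bool supports_aggregate(uint32_t mode)
{
    return (mode & EVAL_MODE_AGGREGATE) != 0;
}
```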
enum CUpti_MetricPropertyDeviceClass

Enumeration of device classes for metric property CUPTI_METRIC_PROPERTY_DEVICE_CLASS.

Values
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_TESLA = 0
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_QUADRO = 1
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_GEFORCE = 2
CUPTI_METRIC_PROPERTY_DEVICE_CLASS_TEGRA = 3
enum CUpti_MetricPropertyID

Metric device properties describe device properties which are needed for a metric. Some of these properties can be collected using cuDeviceGetAttribute.

Values
CUPTI_METRIC_PROPERTY_MULTIPROCESSOR_COUNT
CUPTI_METRIC_PROPERTY_WARPS_PER_MULTIPROCESSOR
CUPTI_METRIC_PROPERTY_KERNEL_GPU_TIME
CUPTI_METRIC_PROPERTY_CLOCK_RATE
CUPTI_METRIC_PROPERTY_FRAME_BUFFER_COUNT
CUPTI_METRIC_PROPERTY_GLOBAL_MEMORY_BANDWIDTH
CUPTI_METRIC_PROPERTY_PCIE_LINK_RATE
CUPTI_METRIC_PROPERTY_PCIE_LINK_WIDTH
CUPTI_METRIC_PROPERTY_PCIE_GEN
CUPTI_METRIC_PROPERTY_DEVICE_CLASS
CUPTI_METRIC_PROPERTY_FLOP_SP_PER_CYCLE
CUPTI_METRIC_PROPERTY_FLOP_DP_PER_CYCLE
CUPTI_METRIC_PROPERTY_L2_UNITS
CUPTI_METRIC_PROPERTY_ECC_ENABLED
CUPTI_METRIC_PROPERTY_FLOP_HP_PER_CYCLE
CUPTI_METRIC_PROPERTY_GPU_CPU_NVLINK_BANDWIDTH
enum CUpti_MetricValueKind

Metric values can be one of several different kinds. Corresponding to each kind is a member of the CUpti_MetricValue union. The metric value returned by cuptiMetricGetValue should be accessed using the appropriate member of that union based on its value kind.

Values
CUPTI_METRIC_VALUE_KIND_DOUBLE = 0
The metric value is a 64-bit double.
CUPTI_METRIC_VALUE_KIND_UINT64 = 1
The metric value is a 64-bit unsigned integer.
CUPTI_METRIC_VALUE_KIND_PERCENT = 2
The metric value is a percentage represented by a 64-bit double. For example, 57.5% is represented by the value 57.5.
CUPTI_METRIC_VALUE_KIND_THROUGHPUT = 3
The metric value is a throughput represented by a 64-bit integer. The unit for throughput values is bytes/second.
CUPTI_METRIC_VALUE_KIND_INT64 = 4
The metric value is a 64-bit signed integer.
CUPTI_METRIC_VALUE_KIND_UTILIZATION_LEVEL = 5
The metric value is a utilization level, as represented by CUpti_MetricValueUtilizationLevel.
CUPTI_METRIC_VALUE_KIND_FORCE_INT = 0x7fffffff
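Selecting the union member that matches the value kind might look like the following C sketch. The enum and union are hypothetical stand-ins defined for this example: the kind values mirror the list for CUpti_MetricValueKind, but the member names of the real CUpti_MetricValue union are not shown in this section, so the names used here are assumptions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of the value kinds listed for CUpti_MetricValueKind. */
enum ValueKindSketch {
    KIND_DOUBLE = 0, KIND_UINT64 = 1, KIND_PERCENT = 2,
    KIND_THROUGHPUT = 3, KIND_INT64 = 4, KIND_UTILIZATION_LEVEL = 5
};

/* Hypothetical stand-in for the metric value union; member names are assumed. */
typedef union {
    double   asDouble;
    uint64_t asUint64;
    int64_t  asInt64;
    double   asPercent;
    uint64_t asThroughput;        /* bytes/second */
    int      asUtilizationLevel;  /* 0 (idle) to 10 (max) */
} ValueSketch;

/* Formats a value by reading the member that matches its kind.
 * Returns the snprintf result, or -1 for an unknown kind. */
static int format_value(enum ValueKindSketch kind, ValueSketch v,
                        char *out, size_t outSize)
{
    assert(out != NULL);
    switch (kind) {
    case KIND_DOUBLE:     return snprintf(out, outSize, "%g", v.asDouble);
    case KIND_UINT64:     return snprintf(out, outSize, "%" PRIu64, v.asUint64);
    case KIND_PERCENT:    return snprintf(out, outSize, "%g%%", v.asPercent);
    case KIND_THROUGHPUT: return snprintf(out, outSize, "%" PRIu64 " bytes/s", v.asThroughput);
    case KIND_INT64:      return snprintf(out, outSize, "%" PRId64, v.asInt64);
    case KIND_UTILIZATION_LEVEL:
        return snprintf(out, outSize, "level %d/10", v.asUtilizationLevel);
    }
    return -1;
}
```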
enum CUpti_MetricValueUtilizationLevel

Values
CUPTI_METRIC_VALUE_UTILIZATION_IDLE = 0
CUPTI_METRIC_VALUE_UTILIZATION_LOW = 2
CUPTI_METRIC_VALUE_UTILIZATION_MID = 5
CUPTI_METRIC_VALUE_UTILIZATION_HIGH = 8
CUPTI_METRIC_VALUE_UTILIZATION_MAX = 10
CUPTI_METRIC_VALUE_UTILIZATION_FORCE_INT = 0x7fffffff
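Since only five of the eleven possible utilization levels have names, code that prints a level has to handle the unnamed ones. One possible approach is sketched below in C: the named levels are looked up exactly, and any other level is printed as a bare number. The function names are hypothetical, and the numeric cases are local copies of the values listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns the name of a utilization level, or NULL if the level is unnamed. */
static const char *utilization_name(unsigned level)
{
    switch (level) {
    case 0:  return "IDLE";
    case 2:  return "LOW";
    case 5:  return "MID";
    case 8:  return "HIGH";
    case 10: return "MAX";
    default: return NULL;
    }
}

/* Writes "8 (HIGH)" for named levels and just "3" for unnamed ones. */
static int describe_utilization(unsigned level, char *out, size_t outSize)
{
    assert(out != NULL);
    const char *name = utilization_name(level);
    if (name != NULL) {
        return snprintf(out, outSize, "%u (%s)", level, name);
    }
    return snprintf(out, outSize, "%u", level);
}
```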

Functions

CUptiResult cuptiDeviceEnumMetrics ( CUdevice device, size_t* arraySizeBytes, CUpti_MetricID* metricArray )
Get the metrics for a device.
Parameters
device
The CUDA device
arraySizeBytes
The size of metricArray in bytes, and returns the number of bytes written to metricArray
metricArray
Returns the IDs of the metrics for the device
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_PARAMETER

    if arraySizeBytes or metricArray is NULL

Description

Returns the metric IDs in metricArray for a device. The size of the metricArray buffer is given by *arraySizeBytes. The size of the metricArray buffer must be at least numMetrics * sizeof(CUpti_MetricID); otherwise, not all metric IDs will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in metricArray.
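The in/out size convention used by the enumeration functions (buffer size in bytes on entry, bytes written on return) is sketched below in C. The function enum_ids_sketch is a hypothetical stand-in that models the documented behavior with an in-memory list; its truncation to whole IDs when the buffer is too small is an assumption of this sketch, not a statement about CUPTI internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t MetricIdSketch;   /* mirrors "typedef uint32_t CUpti_MetricID" */

/* Hypothetical stand-in for an enumeration call: copies as many whole IDs as
 * fit in the caller's buffer and reports the number of bytes written. */
static void enum_ids_sketch(const MetricIdSketch *all, size_t numAll,
                            size_t *arraySizeBytes, MetricIdSketch *out)
{
    assert(arraySizeBytes != NULL && out != NULL);
    size_t capacity = *arraySizeBytes / sizeof(MetricIdSketch);
    size_t n = capacity < numAll ? capacity : numAll;
    memcpy(out, all, n * sizeof(MetricIdSketch));
    *arraySizeBytes = n * sizeof(MetricIdSketch);
}

/* Converts the returned byte count back into a number of IDs. */
static size_t ids_returned(size_t bytesWritten)
{
    return bytesWritten / sizeof(MetricIdSketch);
}
```

With the real API, the ID count would come from cuptiDeviceGetNumMetrics, and the buffer would be sized as numMetrics * sizeof(CUpti_MetricID) before the enumeration call.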

CUptiResult cuptiDeviceGetNumMetrics ( CUdevice device, uint32_t* numMetrics )
Get the number of metrics for a device.
Parameters
device
The CUDA device
numMetrics
Returns the number of metrics available for the device
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_PARAMETER

    if numMetrics is NULL

Description

Returns the number of metrics available for a device.

CUptiResult cuptiEnumMetrics ( size_t* arraySizeBytes, CUpti_MetricID* metricArray )
Get all the metrics available on any device.
Parameters
arraySizeBytes
The size of metricArray in bytes, and returns the number of bytes written to metricArray
metricArray
Returns the IDs of the metrics
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if arraySizeBytes or metricArray is NULL

Description

Returns the metric IDs in metricArray for all CUDA-capable devices. The size of the metricArray buffer is given by *arraySizeBytes. The size of the metricArray buffer must be at least numMetrics * sizeof(CUpti_MetricID); otherwise, not all metric IDs will be returned. The value returned in *arraySizeBytes contains the number of bytes returned in metricArray.

CUptiResult cuptiGetNumMetrics ( uint32_t* numMetrics )
Get the total number of metrics available on any device.
Parameters
numMetrics
Returns the number of metrics
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER

    if numMetrics is NULL

Description

Returns the total number of metrics available on any CUDA-capable device.

CUptiResult cuptiMetricCreateEventGroupSets ( CUcontext context, size_t metricIdArraySizeBytes, CUpti_MetricID* metricIdArray, CUpti_EventGroupSets** eventGroupPasses )
For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics.
Parameters
context
The context for event collection
metricIdArraySizeBytes
Size of the metricIdArray in bytes
metricIdArray
Array of metric IDs
eventGroupPasses
Returns a CUpti_EventGroupSets object that indicates the number of passes required to collect the events and the events to collect on each pass
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if metricIdArray or eventGroupPasses is NULL

Description

For a set of metrics, get the grouping that indicates the number of passes and the event groups necessary to collect the events required for those metrics.

See also:

cuptiEventGroupSetsCreate for details on event group set creation.

CUptiResult cuptiMetricEnumEvents ( CUpti_MetricID metric, size_t* eventIdArraySizeBytes, CUpti_EventID* eventIdArray )
Get the events required to calculate a metric.
Parameters
metric
ID of the metric
eventIdArraySizeBytes
The size of eventIdArray in bytes, and returns the number of bytes written to eventIdArray
eventIdArray
Returns the IDs of the events required to calculate metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if eventIdArraySizeBytes or eventIdArray is NULL

Description

Gets the event IDs in eventIdArray required to calculate a metric. The size of the eventIdArray buffer is given by *eventIdArraySizeBytes and must be at least numEvents * sizeof(CUpti_EventID); otherwise, not all events will be returned. The value returned in *eventIdArraySizeBytes contains the number of bytes returned in eventIdArray.

CUptiResult cuptiMetricEnumProperties ( CUpti_MetricID metric, size_t* propIdArraySizeBytes, CUpti_MetricPropertyID* propIdArray )
Get the properties required to calculate a metric.
Parameters
metric
ID of the metric
propIdArraySizeBytes
The size of propIdArray in bytes, and returns the number of bytes written to propIdArray
propIdArray
Returns the IDs of the properties required to calculate metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if propIdArraySizeBytes or propIdArray is NULL

Description

Gets the property IDs in propIdArray required to calculate a metric. The size of the propIdArray buffer is given by *propIdArraySizeBytes and must be at least numProp * sizeof(CUpti_MetricPropertyID); otherwise, not all properties will be returned. The value returned in *propIdArraySizeBytes contains the number of bytes returned in propIdArray.

CUptiResult cuptiMetricGetAttribute ( CUpti_MetricID metric, CUpti_MetricAttribute attrib, size_t* valueSize, void* value )
Get a metric attribute.
Parameters
metric
ID of the metric
attrib
The metric attribute to read
valueSize
The size of the value buffer in bytes, and returns the number of bytes written to value
value
Returns the attribute's value
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if valueSize or value is NULL, or if attrib is not a metric attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    For non-c-string attribute values, indicates that the value buffer is too small to hold the attribute value.

Description

Returns a metric attribute in *value. The size of the value buffer is given by *valueSize. The value returned in *valueSize contains the number of bytes returned in value.

If the attribute value is a c-string that is longer than *valueSize, then only the first *valueSize characters will be returned and there will be no terminating null byte.
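Because a truncated c-string attribute comes back without a terminating null byte, callers may want to terminate the buffer themselves. The C sketch below shows one possible caller-side pattern: offer one byte less than the buffer holds, then write the terminator at the returned length. The function read_string_attr_sketch is a hypothetical stand-in that models the truncation rule described above; it is not a CUPTI call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a c-string attribute read: copies at most
 * *valueSize bytes, so a string that does not fit is left unterminated. */
static void read_string_attr_sketch(const char *attr, size_t *valueSize,
                                    char *value)
{
    size_t needed = strlen(attr) + 1;   /* string plus terminator */
    size_t n = needed < *valueSize ? needed : *valueSize;
    memcpy(value, attr, n);
    *valueSize = n;
}

/* Caller-side pattern: reserve the last byte of the buffer, then terminate
 * explicitly so the result is always a valid C string. */
static void read_string_attr_terminated(const char *attr, char *buf,
                                        size_t bufSize)
{
    assert(buf != NULL && bufSize > 0);
    size_t size = bufSize - 1;
    read_string_attr_sketch(attr, &size, buf);
    buf[size] = '\0';
}
```

With the real API, the same pattern would pass bufSize - 1 in *valueSize to cuptiMetricGetAttribute and write the terminator at the returned size.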

CUptiResult cuptiMetricGetIdFromName ( CUdevice device, const char* metricName, CUpti_MetricID* metric )
Find a metric by name.
Parameters
device
The CUDA device
metricName
The name of metric to find
metric
Returns the ID of the found metric or undefined if unable to find the metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE

  • CUPTI_ERROR_INVALID_METRIC_NAME

    if unable to find a metric with name metricName. In this case *metric is undefined

  • CUPTI_ERROR_INVALID_PARAMETER

    if metricName or metric is NULL

Description

Find a metric by name and return the metric ID in *metric.

CUptiResult cuptiMetricGetNumEvents ( CUpti_MetricID metric, uint32_t* numEvents )
Get number of events required to calculate a metric.
Parameters
metric
ID of the metric
numEvents
Returns the number of events required for the metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if numEvents is NULL

Description

Returns, in *numEvents, the number of events required to calculate a metric.

CUptiResult cuptiMetricGetNumProperties ( CUpti_MetricID metric, uint32_t* numProp )
Get number of properties required to calculate a metric.
Parameters
metric
ID of the metric
numProp
Returns the number of properties required for the metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_PARAMETER

    if numProp is NULL

Description

Returns, in *numProp, the number of properties required to calculate a metric.

CUptiResult cuptiMetricGetRequiredEventGroupSets ( CUcontext context, CUpti_MetricID metric, CUpti_EventGroupSets** eventGroupSets )
For a metric, get the groups of events that must be collected in the same pass.
Parameters
context
The context for event collection
metric
The metric ID
eventGroupSets
Returns a CUpti_EventGroupSets object that indicates the events that must be collected in the same pass to ensure the metric is calculated correctly. Returns NULL if no grouping is required for metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

Description

For a metric, get the groups of events that must be collected in the same pass to ensure that the metric is calculated correctly. If the events are not collected as specified, the metric value may be inaccurate.

*eventGroupSets is set to NULL if the metric does not have any required event group. In this case the events needed for the metric can be grouped in any manner for collection.

CUptiResult cuptiMetricGetValue ( CUdevice device, CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t eventValueArraySizeBytes, uint64_t* eventValueArray, uint64_t timeDuration, CUpti_MetricValue* metricValue )
Calculate the value for a metric.
Parameters
device
The CUDA device that the metric is being calculated for
metric
The metric ID
eventIdArraySizeBytes
The size of eventIdArray in bytes
eventIdArray
The event IDs required to calculate metric
eventValueArraySizeBytes
The size of eventValueArray in bytes
eventValueArray
The normalized event values required to calculate metric. The values must be ordered to match the order of events in eventIdArray
timeDuration
The duration over which the events were collected, in ns
metricValue
Returns the value for the metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_OPERATION

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    if the eventIdArray does not contain all the events needed for metric

  • CUPTI_ERROR_INVALID_EVENT_VALUE

    if any of the event values required for the metric is CUPTI_EVENT_OVERFLOW

  • CUPTI_ERROR_INVALID_METRIC_VALUE

    if the computed metric value cannot be represented in the metric's value type. For example, if the metric value type is unsigned and the computed metric value is negative

  • CUPTI_ERROR_INVALID_PARAMETER

    if metricValue, eventIdArray or eventValueArray is NULL

Description

Use the events collected for a metric to calculate the metric value. Metric value evaluation depends on the evaluation mode (CUpti_MetricEvaluationMode) that the metric supports. If the metric's evaluation mode is CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, the function assumes that the input event value is for one domain instance. If the metric's evaluation mode is CUPTI_METRIC_EVALUATION_MODE_AGGREGATE, the function assumes that the input event values are normalized to represent all domain instances on a device.

For the most accurate metric collection, the events required for the metric should be collected for all profiled domain instances. For example, to collect all instances of an event, set the CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES attribute on the group containing the event to 1. The normalized value for the event is then (sum_event_values * totalInstanceCount) / instanceCount, where sum_event_values is the sum of the event values across all profiled domain instances, totalInstanceCount is obtained by querying CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT, and instanceCount is obtained by querying CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT (or CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT).
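The normalization formula can be sketched in plain C as follows. This is an illustration only: `normalize_event_value` is a hypothetical helper, not a CUPTI function, and it assumes the per-instance event values and both instance counts have already been read through the event API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of CUPTI): compute
 *   (sum_event_values * totalInstanceCount) / instanceCount
 * for one event.
 *   instanceValues      one value per profiled domain instance
 *   instanceCount       number of profiled domain instances
 *   totalInstanceCount  number of domain instances on the device
 */
static uint64_t normalize_event_value(const uint64_t *instanceValues,
                                      size_t instanceCount,
                                      uint64_t totalInstanceCount)
{
    assert(instanceValues != NULL && instanceCount > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < instanceCount; ++i) {
        sum += instanceValues[i];
    }
    /* Integer division truncates; overflow of the product is not checked. */
    return (sum * totalInstanceCount) / instanceCount;
}
```

For instance, values 10, 20 and 30 from 3 profiled instances on a device with 6 instances in total normalize to (60 * 6) / 3 = 120.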

CUptiResult cuptiMetricGetValue2 ( CUpti_MetricID metric, size_t eventIdArraySizeBytes, CUpti_EventID* eventIdArray, size_t eventValueArraySizeBytes, uint64_t* eventValueArray, size_t propIdArraySizeBytes, CUpti_MetricPropertyID* propIdArray, size_t propValueArraySizeBytes, uint64_t* propValueArray, CUpti_MetricValue* metricValue )
Calculate the value for a metric.
Parameters
metric
The metric ID
eventIdArraySizeBytes
The size of eventIdArray in bytes
eventIdArray
The event IDs required to calculate metric
eventValueArraySizeBytes
The size of eventValueArray in bytes
eventValueArray
The normalized event values required to calculate metric. The values must be ordered to match the order of events in eventIdArray
propIdArraySizeBytes
The size of propIdArray in bytes
propIdArray
The metric property IDs required to calculate metric
propValueArraySizeBytes
The size of propValueArray in bytes
propValueArray
The metric property values required to calculate metric. The values must be ordered to match the order of metric properties in propIdArray
metricValue
Returns the value for the metric
Returns

  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_METRIC_ID

  • CUPTI_ERROR_INVALID_OPERATION

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT

    if the eventIdArray does not contain all the events needed for metric

  • CUPTI_ERROR_INVALID_EVENT_VALUE

    if any of the event values required for the metric is CUPTI_EVENT_OVERFLOW

  • CUPTI_ERROR_NOT_COMPATIBLE

    if the computed metric value cannot be represented in the metric's value type. For example, if the metric value type is unsigned and the computed metric value is negative

  • CUPTI_ERROR_INVALID_PARAMETER

    if metricValue, eventIdArray or eventValueArray is NULL

Description

Use the events and properties collected for a metric to calculate the metric value. Metric value evaluation depends on the evaluation mode (CUpti_MetricEvaluationMode) that the metric supports. If the metric's evaluation mode is CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, the function assumes that the input event value is for one domain instance. If the metric's evaluation mode is CUPTI_METRIC_EVALUATION_MODE_AGGREGATE, the function assumes that the input event values are normalized to represent all domain instances on a device.

For the most accurate metric collection, the events required for the metric should be collected for all profiled domain instances. For example, to collect all instances of an event, set the CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES attribute on the group containing the event to 1. The normalized value for the event is then (sum_event_values * totalInstanceCount) / instanceCount, where sum_event_values is the sum of the event values across all profiled domain instances, totalInstanceCount is obtained by querying CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT, and instanceCount is obtained by querying CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT (or CUPTI_EVENT_DOMAIN_ATTR_INSTANCE_COUNT).
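The value arrays must line up index by index with their ID arrays. When values are gathered in a different order than the IDs, they can be rearranged before the call. The sketch below shows one way to do this. `Sample` and `order_values_by_id` are hypothetical names, not CUPTI definitions, and `uint32_t` is used here as an assumed stand-in for the ID type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (ID, value) pair as it might be gathered during collection. */
typedef struct {
    uint32_t id;     /* assumed stand-in for an event or property ID */
    uint64_t value;  /* value collected for that ID */
} Sample;

/*
 * Fill valueArray so that valueArray[i] holds the value for idArray[i].
 * Returns false if any ID in idArray has no matching sample, in which
 * case the contents of valueArray are incomplete.
 */
static bool order_values_by_id(const uint32_t *idArray, size_t numIds,
                               const Sample *samples, size_t numSamples,
                               uint64_t *valueArray)
{
    assert(idArray != NULL && samples != NULL && valueArray != NULL);
    for (size_t i = 0; i < numIds; ++i) {
        bool found = false;
        for (size_t j = 0; j < numSamples; ++j) {
            if (samples[j].id == idArray[i]) {
                valueArray[i] = samples[j].value;
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```

The size arguments are byte counts, so for arrays built this way they would be the element count multiplied by the `sizeof` of the element type.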

3.7. CUPTI Profiler API

Functions, types, and enums that implement the CUPTI Profiler API.

Classes

struct 
Params for cuptiProfilerBeginPass.
struct 
Params for cuptiProfilerBeginSession.
struct 
Input parameter to define the counterDataImage.
struct 
Params for cuptiProfilerCounterDataImageCalculateScratchBufferSize.
struct 
Params for cuptiProfilerCounterDataImageCalculateSize.
struct 
Params for cuptiProfilerCounterDataImageInitializeScratchBuffer.
struct 
Params for cuptiProfilerCounterDataImageInitialize.
struct 
Default parameter for cuptiProfilerDeInitialize.
struct 
Params for cuptiProfilerDisableProfiling.
struct 
Params for cuptiProfilerEnableProfiling.
struct 
Params for cuptiProfilerEndPass.
struct 
Params for cuptiProfilerEndSession.
struct 
Params for cuptiProfilerFlushCounterData.
struct 
Params for cuptiProfilerGetCounterAvailability.
struct 
Default parameter for cuptiProfilerInitialize.
struct 
Params for cuptiProfilerIsPassCollected.
struct 
Params for cuptiProfilerSetConfig.
struct 
Params for cuptiProfilerUnsetConfig.

Enumerations

enum CUpti_ProfilerRange
Profiler range attribute.
enum CUpti_ProfilerReplayMode
Profiler replay attribute.

Functions

CUptiResult cuptiProfilerBeginPass ( CUpti_Profiler_BeginPass_Params* pParams )
Replay API: used for multipass collection.
CUptiResult cuptiProfilerBeginSession ( CUpti_Profiler_BeginSession_Params* pParams )
Begins a profiling session and sets up profiling on the device.
CUptiResult cuptiProfilerCounterDataImageCalculateScratchBufferSize ( CUpti_Profiler_CounterDataImage_CalculateScratchBufferSize_Params* pParams )
A temporary storage for CounterData image needed for internal operations.
CUptiResult cuptiProfilerCounterDataImageCalculateSize ( CUpti_Profiler_CounterDataImage_CalculateSize_Params* pParams )
A CounterData image allocates space for values for each counter for each range.
CUptiResult cuptiProfilerDeInitialize ( CUpti_Profiler_DeInitialize_Params* pParams )
DeInitializes the profiler interface.
CUptiResult cuptiProfilerDisableProfiling ( CUpti_Profiler_DisableProfiling_Params* pParams )
Disables profiling.
CUptiResult cuptiProfilerEnableProfiling ( CUpti_Profiler_EnableProfiling_Params* pParams )
Enables profiling.
CUptiResult cuptiProfilerEndPass ( CUpti_Profiler_EndPass_Params* pParams )
Replay API: used for multipass collection.
CUptiResult cuptiProfilerEndSession ( CUpti_Profiler_EndSession_Params* pParams )
Ends profiling session.
CUptiResult cuptiProfilerFlushCounterData ( CUpti_Profiler_FlushCounterData_Params* pParams )
Decode all the submitted passes.
CUptiResult cuptiProfilerGetCounterAvailability ( CUpti_Profiler_GetCounterAvailability_Params* pParams )
Query counter availability.
CUptiResult cuptiProfilerInitialize ( CUpti_Profiler_Initialize_Params* pParams )
Initializes the profiler interface.
CUptiResult cuptiProfilerIsPassCollected ( CUpti_Profiler_IsPassCollected_Params* pParams )
Asynchronous call to query if the submitted pass to GPU is collected.
CUptiResult cuptiProfilerPopRange ( CUpti_Profiler_PopRange_Params* pParams )
Range API: pop user range.
CUptiResult cuptiProfilerPushRange ( CUpti_Profiler_PushRange_Params* pParams )
Range API: push user range.
CUptiResult cuptiProfilerSetConfig ( CUpti_Profiler_SetConfig_Params* pParams )
Set metrics configuration to be profiled.
CUptiResult cuptiProfilerUnsetConfig ( CUpti_Profiler_UnsetConfig_Params* pParams )
Unset the metrics configuration being profiled.

Enumerations

enum CUpti_ProfilerRange

A metric enabled in the session's configuration is collected separately per unique range-stack in the pass. This attribute selects whether metrics are collected around each kernel in a profiling session or in a user-defined range.

Values
CUPTI_Range_INVALID
Invalid value
CUPTI_AutoRange
Ranges are automatically defined around each kernel in a profiling session
CUPTI_UserRange
The range in which metric data is collected is defined by the user
CUPTI_Range_COUNT
Range count
enum CUpti_ProfilerReplayMode

For metrics that require multipass collection, a replay of the GPU kernel(s) is required. This attribute specifies how the kernel(s) to be measured are replayed.

Values
CUPTI_Replay_INVALID
Invalid value
CUPTI_ApplicationReplay
Replay is done by the CUPTI user around the process
CUPTI_KernelReplay
Replay is done around kernel implicitly by CUPTI
CUPTI_UserReplay
Replay is done by the CUPTI user within a process
CUPTI_Replay_COUNT
Replay count

Functions

CUptiResult cuptiProfilerBeginPass ( CUpti_Profiler_BeginPass_Params* pParams )
Replay API: used for multipass collection.
Description

These APIs are used if the user chooses to perform the replay itself (CUPTI_UserReplay or CUPTI_ApplicationReplay) for multipass collection of the metrics configurations. It is a no-op in the case of CUPTI_KernelReplay.

CUptiResult cuptiProfilerBeginSession ( CUpti_Profiler_BeginSession_Params* pParams )
Begins a profiling session and sets up profiling on the device.
Description

Although this call does not start profiling, the GPU resources needed for profiling are allocated. Outside of a session, the GPU will return to its normal operating state.

CUptiResult cuptiProfilerCounterDataImageCalculateScratchBufferSize ( CUpti_Profiler_CounterDataImage_CalculateScratchBufferSize_Params* pParams )
A temporary storage for CounterData image needed for internal operations.
Description

Use these APIs to calculate the allocation size and initialize counterData image scratch buffer.

CUptiResult cuptiProfilerCounterDataImageCalculateSize ( CUpti_Profiler_CounterDataImage_CalculateSize_Params* pParams )
A CounterData image allocates space for values for each counter for each range.
Description

The user bears the responsibility for managing the counterDataImage allocations. CounterDataPrefix contains metadata about the metrics that will be stored in counterDataImage. Use these APIs to calculate the allocation size and initialize the counterData image.

CUptiResult cuptiProfilerDeInitialize ( CUpti_Profiler_DeInitialize_Params* pParams )
DeInitializes the profiler interface.
Description

CUptiResult cuptiProfilerDisableProfiling ( CUpti_Profiler_DisableProfiling_Params* pParams )
Disables profiling.
Description

In CUPTI_AutoRange, these APIs are used to enable/disable profiling for the kernels to be executed in a profiling session.

CUptiResult cuptiProfilerEnableProfiling ( CUpti_Profiler_EnableProfiling_Params* pParams )
Enables profiling.
Description

In CUPTI_AutoRange, these APIs are used to enable/disable profiling for the kernels to be executed in a profiling session.

CUptiResult cuptiProfilerEndPass ( CUpti_Profiler_EndPass_Params* pParams )
Replay API: used for multipass collection.
Description

These APIs are used if the user chooses to perform the replay itself (CUPTI_UserReplay or CUPTI_ApplicationReplay) for multipass collection of the metrics configurations. It is a no-op in the case of CUPTI_KernelReplay. Returns information for the next pass.
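The begin/end pass pairing for user-driven replay can be pictured as a loop that reruns the workload once per pass. The sketch below is a mock, not CUPTI code: `MockSession`, `mock_begin_pass`, `mock_end_pass`, `replay_workload` and `count_launch` are hypothetical stand-ins, and it assumes that the "information for the next pass" tells the caller whether further passes remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock session state; a hypothetical stand-in, not a CUPTI type. */
typedef struct {
    size_t passesRequired;  /* passes the metric configuration needs */
    size_t passesDone;      /* passes completed so far */
    bool   inPass;          /* true between the begin and end of a pass */
} MockSession;

/* Stand-in for beginning a pass. */
static void mock_begin_pass(MockSession *s)
{
    assert(!s->inPass);
    s->inPass = true;
}

/* Stand-in for ending a pass; returns true when no further pass is needed. */
static bool mock_end_pass(MockSession *s)
{
    assert(s->inPass);
    s->inPass = false;
    s->passesDone++;
    return s->passesDone >= s->passesRequired;
}

/* Rerun the workload once per pass; returns the number of runs. */
static size_t replay_workload(MockSession *s, void (*workload)(void *),
                              void *arg)
{
    size_t launches = 0;
    bool allPassesDone = false;
    do {
        mock_begin_pass(s);
        workload(arg);  /* the kernels being measured would run here */
        launches++;
        allPassesDone = mock_end_pass(s);
    } while (!allPassesDone);
    return launches;
}

/* Example workload: counts how many times it was run. */
static void count_launch(void *arg)
{
    ++*(int *)arg;
}
```

The workload has to be repeatable, since it runs once for every pass the configuration requires.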

CUptiResult cuptiProfilerEndSession ( CUpti_Profiler_EndSession_Params* pParams )
Ends profiling session.
Description

Frees up the GPU resources acquired for profiling. Outside of a session, the GPU will return to its normal operating state.

CUptiResult cuptiProfilerFlushCounterData ( CUpti_Profiler_FlushCounterData_Params* pParams )
Decode all the submitted passes.
Description

Flushes counter data to ensure that every pass is decoded into the counterDataImage passed at beginSession. This causes a CPU/GPU sync to collect all undecoded passes.

CUptiResult cuptiProfilerGetCounterAvailability ( CUpti_Profiler_GetCounterAvailability_Params* pParams )
Query counter availability.
Description

Use this API to query counter availability information into a buffer, which can be used to filter unavailable raw metrics on the host. Note: This API may fail if any profiling or sampling session is active on the specified context or its device.

CUptiResult cuptiProfilerInitialize ( CUpti_Profiler_Initialize_Params* pParams )
Initializes the profiler interface.
Description

Loads the required libraries in the process address space. Sets up the hooks with the CUDA driver.

CUptiResult cuptiProfilerIsPassCollected ( CUpti_Profiler_IsPassCollected_Params* pParams )
Asynchronous call to query if the submitted pass to GPU is collected.
Description

CUptiResult cuptiProfilerPopRange ( CUpti_Profiler_PopRange_Params* pParams )
Range API: pop user range.
Description

Counter data is collected per unique range-stack, identified by a string label passed by the user. It is an invalid operation in the case of CUPTI_AutoRange.

CUptiResult cuptiProfilerPushRange ( CUpti_Profiler_PushRange_Params* pParams )
Range API: push user range.
Description

Counter data is collected per unique range-stack, identified by a string label passed by the user. It is an invalid operation in the case of CUPTI_AutoRange.

CUptiResult cuptiProfilerSetConfig ( CUpti_Profiler_SetConfig_Params* pParams )
Set metrics configuration to be profiled.
Description

Use these APIs to set the config to profile in a session. They can be used for advanced cases, such as collecting multiple configurations into a single CounterData image on an as-needed basis without restarting the session.

CUptiResult cuptiProfilerUnsetConfig ( CUpti_Profiler_UnsetConfig_Params* pParams )
Unset the metrics configuration being profiled.
Description