6.1. CUPTI Activity API

Functions, types, and enums that implement the CUPTI Activity API.

6.1.1. Data Structures

CUpti_Activity

The base activity record.

CUpti_ActivityAPI

The activity record for a driver or runtime API invocation.

CUpti_ActivityAutoBoostState

Device auto boost state structure.

CUpti_ActivityBranch2

The activity record for source-level result branch.

CUpti_ActivityCdpKernel

The activity record for CDP (CUDA Dynamic Parallelism) kernel.

CUpti_ActivityComputeEngineCtxSwitch

The activity record for trace of CUDA context switch events.

CUpti_ActivityConfidentialComputeRotation

Event related to confidential compute encryption rotation.

CUpti_ActivityConfig

Activity configuration.

CUpti_ActivityContext4

The activity record for a context.

CUpti_ActivityCudaEvent2

The activity record for CUDA event.

CUpti_ActivityDevice6

The activity record for a device.

CUpti_ActivityDeviceAttribute

The activity record for a device attribute.

CUpti_ActivityDeviceGraphTrace

The activity record for trace of device graph execution.

CUpti_ActivityEnvironment

The activity record for CUPTI environmental data.

CUpti_ActivityEvent

The activity record for a CUPTI event.

CUpti_ActivityEventInstance

The activity record for a CUPTI event with instance information.

CUpti_ActivityExternalCorrelation

The activity record for correlation with external records.

CUpti_ActivityFieldLayoutEntry

CUpti_ActivityFieldSelection

CUpti_ActivityFunction

The activity record for global/device functions.

CUpti_ActivityGlobalAccess3

The activity record for source-level global access.

CUpti_ActivityGraphHostNode

CUpti_ActivityGraphTrace2

The activity record for trace of graph execution.

CUpti_ActivityGreenContext

The activity record for a green context.

CUpti_ActivityHostLaunch

The activity record for host launch functions.

CUpti_ActivityInstantaneousEvent

The activity record for an instantaneous CUPTI event.

CUpti_ActivityInstantaneousEventInstance

The activity record for an instantaneous CUPTI event with event domain instance information.

CUpti_ActivityInstantaneousMetric

The activity record for an instantaneous CUPTI metric.

CUpti_ActivityInstantaneousMetricInstance

The instantaneous activity record for a CUPTI metric with instance information.

CUpti_ActivityInstructionCorrelation

The activity record for source-level SASS/source line-by-line correlation.

CUpti_ActivityInstructionExecution

The activity record for source-level instruction execution.

CUpti_ActivityJit2

The activity record for JIT operations.

CUpti_ActivityKernel11

The activity record for kernel.

CUpti_ActivityMarker2

The activity record providing a marker which is an instantaneous point in time.

CUpti_ActivityMarkerData2

The activity record providing detailed information for a marker.

CUpti_ActivityMemDecompress

The activity record for trace of decompression operations.

CUpti_ActivityMemcpy6

The activity record for memory copies.

CUpti_ActivityMemcpyPtoP4

The activity record for peer-to-peer memory copies.

CUpti_ActivityMemory

The activity record for memory.

CUpti_ActivityMemory4

The activity record for memory.

CUpti_ActivityMemoryPool3

The activity record for memory pool.

CUpti_ActivityMemset4

The activity record for memset.

CUpti_ActivityMetric

The activity record for a CUPTI metric.

CUpti_ActivityMetricInstance

The activity record for a CUPTI metric with instance information.

CUpti_ActivityModule

The activity record for a CUDA module.

CUpti_ActivityName

The activity record providing a name.

CUpti_ActivityNvLink5

NVLink information.

CUpti_ActivityObjectKindId

Identifiers for object kinds as specified by CUpti_ActivityObjectKind.

CUpti_ActivityOpenAcc

The base activity record for OpenAcc records.

CUpti_ActivityOpenAccData

The activity record for OpenACC data.

CUpti_ActivityOpenAccLaunch

The activity record for OpenACC launch.

CUpti_ActivityOpenAccOther

The activity record for OpenACC other.

CUpti_ActivityOpenMp

The base activity record for OpenMp records.

CUpti_ActivityOverhead3

The activity record for CUPTI and driver overheads.

CUpti_ActivityOverheadCommandBufferFullData

The structure to provide additional data for CUPTI_ACTIVITY_OVERHEAD_COMMAND_BUFFER_FULL.

CUpti_ActivityPCSampling3

The activity record for PC sampling.

CUpti_ActivityPCSamplingConfig

PC sampling configuration structure.

CUpti_ActivityPCSamplingRecordInfo

The activity record for record status for PC sampling.

CUpti_ActivityPcie

PCI devices information required to construct topology.

CUpti_ActivityPreemption

The activity record for a preemption of a CDP kernel.

CUpti_ActivityRecordLayout

CUpti_ActivitySharedAccess

The activity record for source-level shared access.

CUpti_ActivitySourceLocator

The activity record for source locator.

CUpti_ActivityStream

The activity record for CUDA stream.

CUpti_ActivitySynchronization2

The activity record for synchronization management.

CUpti_ActivityUnifiedMemoryCounter3

The activity record for Unified Memory counters (CUDA 7.0 and beyond).

CUpti_ActivityUnifiedMemoryCounterConfig

Unified Memory counters configuration structure.

CUpti_BufferCallbackCompleteInfo

CUpti_BufferCallbackRequestInfo

CUpti_NvtxExtPayloadAttr

6.1.2. Macros

CUPTI_ACTIVITY_STRUCT_SIZE

CUPTI_AUTO_BOOST_INVALID_CLIENT_PID

An invalid/unknown process id.

CUPTI_CORRELATION_ID_UNKNOWN

An invalid/unknown correlation ID.

CUPTI_DECOMPRESSED_BYTES_UNKNOWN

An invalid/unknown value for decompressed bytes.

CUPTI_FUNCTION_INDEX_ID_INVALID

An invalid function index ID.

CUPTI_GRID_ID_UNKNOWN

An invalid/unknown grid ID.

CUPTI_MAX_GPUS

CUPTI_NVLINK_INVALID_PORT

Invalid/unknown NVLink port number.

CUPTI_SOURCE_LOCATOR_ID_UNKNOWN

The source-locator ID that indicates an unknown source location.

CUPTI_SYNCHRONIZATION_INVALID_VALUE

An invalid/unknown value.

CUPTI_TIMESTAMP_UNKNOWN

An invalid/unknown timestamp for a start, end, queued, submitted, or completed time.

CUpti_ActivityConfig_STRUCT_SIZE
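Struct-size macros such as CUPTI_ACTIVITY_STRUCT_SIZE(type_, lastfield_) give the size of a record up to and including a named field. The sketch below assumes the usual offsetof-plus-sizeof definition; the macro ACTIVITY_STRUCT_SIZE and the type ExampleApiRecord are hypothetical stand-ins (the record only mirrors the field order of CUpti_ActivityAPI, with enum-typed fields replaced by uint32_t), so check cupti_activity.h for the real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CUPTI_ACTIVITY_STRUCT_SIZE: offset of the last
 * field plus that field's size. The real macro lives in cupti_activity.h. */
#define ACTIVITY_STRUCT_SIZE(type_, lastfield_) \
    (offsetof(type_, lastfield_) + sizeof(((type_ *)0)->lastfield_))

/* Hypothetical record mirroring the field order of CUpti_ActivityAPI. */
typedef struct {
    uint32_t kind;          /* stands in for CUpti_ActivityKind */
    uint32_t cbid;          /* stands in for CUpti_CallbackId */
    uint64_t start;
    uint64_t end;
    uint32_t processId;
    uint32_t threadId;
    uint32_t correlationId;
    uint32_t returnValue;
} ExampleApiRecord;

/* Bytes covered by the record prefix ending at `end`. */
size_t example_prefix_size(void) {
    return ACTIVITY_STRUCT_SIZE(ExampleApiRecord, end);
}

/* Bytes covered by the whole record, ending at `returnValue`. */
size_t example_full_size(void) {
    return ACTIVITY_STRUCT_SIZE(ExampleApiRecord, returnValue);
}
```

With this layout the prefix ending at end covers 24 bytes and the full record covers 40 bytes.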

6.1.3. Enumerations

CUpti_ActivityApiFieldIds

Enum identifiers for fields in CUpti_ActivityAPI.

CUpti_ActivityAttribute

Activity attributes.

CUpti_ActivityComputeApiKind

The kind of a compute API.

CUpti_ActivityComputeEngineCtxSwitchFieldIds

Enum identifiers for fields in CUpti_ActivityComputeEngineCtxSwitch.

CUpti_ActivityConfidentialComputeRotationFieldIds

Enum identifiers for fields in CUpti_ActivityConfidentialComputeRotation.

CUpti_ActivityContextFieldIds

Enum identifiers for fields to trace context.

CUpti_ActivityCudaEventFieldIds

Enum identifiers for fields to trace CUDA Event operations.

CUpti_ActivityDeviceFieldIds

Enum identifiers for fields in CUpti_ActivityDevice6.

CUpti_ActivityDeviceGraphTraceFieldIds

Enum identifiers for fields in CUpti_ActivityDeviceGraphTrace.

CUpti_ActivityEnvironmentKind

The kind of environment data.

CUpti_ActivityExternalCorrelationFieldIds

Enum identifiers for fields in CUpti_ActivityExternalCorrelation.

CUpti_ActivityFlag

Flags associated with activity records.

CUpti_ActivityGraphHostNodeFieldIds

Enum identifiers for fields in CUpti_ActivityGraphHostNode.

CUpti_ActivityGraphTraceFieldIds

Enum identifiers for fields to trace graph execution.

CUpti_ActivityGreenContextFieldIds

Enum identifiers for fields to trace green context information.

CUpti_ActivityHostLaunchFieldIds

Enum identifiers for fields in CUpti_ActivityHostLaunch.

CUpti_ActivityInstructionClass

SASS instruction classification.

CUpti_ActivityJitEntryType

The types of JIT entry.

CUpti_ActivityJitFieldIds

Enum identifiers for fields to trace JIT operations.

CUpti_ActivityJitOperationType

The types of JIT compilation operations.

CUpti_ActivityKernelFieldIds

Enum identifiers for fields to trace kernel operations.

CUpti_ActivityKind

The kinds of activity records.

CUpti_ActivityLaunchType

The type of the CUDA kernel launch.

CUpti_ActivityMarkerDataFieldIds

Enum identifiers for fields to trace detailed information for a marker.

CUpti_ActivityMarkerFieldIds

Enum identifiers for fields to trace NVTX markers/ranges.

CUpti_ActivityMemDecompressFieldIds

Enum identifiers for fields to trace decompression operations.

CUpti_ActivityMemcpy2FieldIds

Enum identifiers for fields to trace peer-to-peer memory copies.

CUpti_ActivityMemcpyFieldIds

Enum identifiers for fields to trace memory copies.

CUpti_ActivityMemcpyKind

The kind of a memory copy, indicating the source and destination targets of the copy.

CUpti_ActivityMemoryFieldIds

Enum identifiers for fields to trace memory allocation and free operation.

CUpti_ActivityMemoryKind

The kinds of memory accessed by a memory operation/copy.

CUpti_ActivityMemoryOperationType

Memory operation types.

CUpti_ActivityMemoryPoolFieldIds

Enum identifiers for fields to trace CUDA memory pool creation, destruction and trimming.

CUpti_ActivityMemoryPoolOperationType

Memory pool operation types.

CUpti_ActivityMemoryPoolType

Memory pool types.

CUpti_ActivityMemsetFieldIds

Enum identifiers for fields to trace memset operations.

CUpti_ActivityNameFieldIds

Enum identifiers for fields in CUpti_ActivityName.

CUpti_ActivityObjectKind

The kinds of activity objects.

CUpti_ActivityOverheadFieldIds

Enum identifiers for fields to trace CUPTI and driver overhead information.

CUpti_ActivityOverheadKind

The kinds of activity overhead.

CUpti_ActivityPCSamplingPeriod

Sampling period for PC sampling method.

CUpti_ActivityPCSamplingStallReason

The stall reason for PC sampling activity.

CUpti_ActivityPartitionedGlobalCacheConfig

Partitioned global caching option.

CUpti_ActivityPreemptionKind

The kind of a preemption activity.

CUpti_ActivityStreamFieldIds

Enum identifiers for fields to trace CUDA Stream operations.

CUpti_ActivityStreamFlag

Stream type.

CUpti_ActivitySynchronizationFieldIds

Enum identifiers for fields to trace various CUDA synchronization APIs.

CUpti_ActivitySynchronizationType

Synchronization type.

CUpti_ActivityThreadIdType

Thread-Id types.

CUpti_ActivityUnifiedMemoryAccessType

Memory access type for unified memory page faults.

CUpti_ActivityUnifiedMemoryCounterKind

Kind of the Unified Memory counter.

CUpti_ActivityUnifiedMemoryCounterScope

Scope of the Unified Memory counter (deprecated in CUDA 7.0).

CUpti_ActivityUnifiedMemoryMigrationCause

Migration cause of the Unified Memory counter.

CUpti_ActivityUnifiedMemoryRemoteMapCause

Remote memory map cause of the Unified Memory counter.

CUpti_ActivityUvmCounterFieldIds

Enum identifiers for fields in CUpti_ActivityUnifiedMemoryCounter3.

CUpti_ChannelType

CUpti_ComputeEngineCtxSwitchOperationType

The operation type of CUDA context switch event records.

CUpti_ConfidentialComputeRotationEventType

CUpti_ContextCigMode

CIG (CUDA in Graphics) Modes.

CUpti_DevType

The device type for device connected to NVLink.

CUpti_DeviceGraphLaunchMode

The launch mode for device graph execution.

CUpti_DeviceVirtualizationMode

This indicates the virtualization mode in which the CUDA device is running.

CUpti_EnvironmentClocksThrottleReason

Reasons for clock throttling.

CUpti_ExternalCorrelationKind

The kind of external APIs supported for correlation.

CUpti_FuncShmemLimitConfig

The shared memory limit per block config for a kernel. This should be used to set the 'cudaOccFuncShmemConfig' field in the occupancy calculator API.

CUpti_LinkFlag

Link flags.

CUpti_NvtxExtPayloadType

CUpti_OpenAccConstructKind

The OpenAcc parent construct kind for OpenAcc activity records.

CUpti_OpenAccEventKind

The OpenAcc event kind for OpenAcc activity records.

CUpti_OpenMpEventKind

CUpti_PcieDeviceType

Field to differentiate whether a PCIe activity record is for a GPU or a PCI bridge.

CUpti_PcieGen

PCIE Generation.

6.1.4. Functions

CUptiResult cuptiActivityConfigurePCSampling(CUcontext ctx, CUpti_ActivityPCSamplingConfig *config)

Set PC sampling configuration.

CUptiResult cuptiActivityConfigureUnifiedMemoryCounter(CUpti_ActivityUnifiedMemoryCounterConfig *config, uint32_t count)

Set Unified Memory Counter configuration.

CUptiResult cuptiActivityDisable(CUpti_ActivityKind kind)

Disable collection of a specific kind of activity record.

CUptiResult cuptiActivityDisableContext(CUcontext context, CUpti_ActivityKind kind)

Disable collection of a specific kind of activity record for a context.

CUptiResult cuptiActivityDisable_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind kind, CUpti_ActivityConfig *pActivityConfig)

Disable collection of a specific kind of activity record for the subscriber.

CUptiResult cuptiActivityEnable(CUpti_ActivityKind kind)

Enable collection of a specific kind of activity record.

CUptiResult cuptiActivityEnableAllSyncRecords(uint8_t enable)

Enables collecting records for all synchronization operations.

CUptiResult cuptiActivityEnableAllocationSource(uint8_t enable)

Enables tracking the source library for memory allocation requests.

CUptiResult cuptiActivityEnableAndDump(CUpti_ActivityKind kind)

Enable collection of a specific kind of activity record.

CUptiResult cuptiActivityEnableAndDump_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind kind, CUpti_ActivityConfig *activityConfig)

Enable collection of a specific kind of activity record.

CUptiResult cuptiActivityEnableContext(CUcontext context, CUpti_ActivityKind kind)

Enable collection of a specific kind of activity record for a context.

CUptiResult cuptiActivityEnableCudaEventDeviceTimestamps(uint8_t enable)

Enable/Disable collecting device timestamp for CUPTI_ACTIVITY_KIND_CUDA_EVENT record.

CUptiResult cuptiActivityEnableDeviceGraph(uint8_t enable)

Controls the collection of records for device launched graphs.

CUptiResult cuptiActivityEnableDriverApi(CUpti_CallbackId cbid, uint8_t enable)

Controls the collection of activity records for specific CUDA Driver APIs.

CUptiResult cuptiActivityEnableHWTrace(uint8_t enable)

Enables CUDA kernel timestamp collection via Hardware Event System (HES).

CUptiResult cuptiActivityEnableLatencyTimestamps(uint8_t enable)

Controls the collection of queued and submitted timestamps for kernels.

CUptiResult cuptiActivityEnableLaunchAttributes(uint8_t enable)

Controls the collection of launch attributes for kernels.

CUptiResult cuptiActivityEnableRuntimeApi(CUpti_CallbackId cbid, uint8_t enable)

Controls the collection of activity records for specific CUDA Runtime APIs.

CUptiResult cuptiActivityEnable_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind kind, CUpti_ActivityConfig *pActivityConfig)

Enable collection of a specific kind of activity record for the subscriber.

CUptiResult cuptiActivityFlush(CUcontext context, uint32_t streamId, uint32_t flag)

Wait for all activity records to be delivered via the completion callback.

CUptiResult cuptiActivityFlushAll(uint32_t flag)

Request to deliver activity records via the buffer completion callback.

CUptiResult cuptiActivityFlushPeriod(uint32_t time)

Sets the flush period for the worker thread.

CUptiResult cuptiActivityGetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Read an activity API attribute.

CUptiResult cuptiActivityGetAttribute_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Read an activity API attribute.

CUptiResult cuptiActivityGetEnabledKinds(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind *buffer, uint32_t *bufferSize, uint32_t *enabledKindsCount)

Get the enabled activity kinds for a subscriber.

CUptiResult cuptiActivityGetNextRecord(uint8_t *buffer, size_t validBufferSizeBytes, CUpti_Activity **record)

Iterate over the activity records in a buffer.

CUptiResult cuptiActivityGetNumDroppedRecords(CUcontext context, uint32_t streamId, size_t *dropped)

Get the number of activity records that were dropped because of insufficient buffer space.

CUptiResult cuptiActivityGetStructSize(CUpti_ActivityKind activityKind, uint32_t version, size_t *activityStructSize)

Get the size of the activity struct for a given CUPTI version.

CUptiResult cuptiActivityPopExternalCorrelationId(CUpti_ExternalCorrelationKind kind, uint64_t *lastId)

Pop an external correlation id for the calling thread.

CUptiResult cuptiActivityPushExternalCorrelationId(CUpti_ExternalCorrelationKind kind, uint64_t id)

Push an external correlation id for the calling thread.
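Push and pop act on a per-thread stack of external IDs for each CUpti_ExternalCorrelationKind. The model below is a hypothetical illustration of that bookkeeping, not CUPTI code: CUPTI keeps this state internally and a tool only calls cuptiActivityPushExternalCorrelationId and cuptiActivityPopExternalCorrelationId. The names, the fixed depth, and the assumption that CUDA API calls made while IDs are pushed correlate with the top-most ID are all illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX_DEPTH 64  /* hypothetical nesting limit */

/* Hypothetical model of one thread's stack for a single correlation kind. */
typedef struct {
    uint64_t ids[EXAMPLE_MAX_DEPTH];
    int depth;
} ExampleCorrelationStack;

/* Models a push: returns 0 on success, -1 if the stack is full. */
int example_push(ExampleCorrelationStack *s, uint64_t id) {
    if (s->depth == EXAMPLE_MAX_DEPTH) return -1;
    s->ids[s->depth++] = id;
    return 0;
}

/* Models a pop: stores the most recently pushed ID in *lastId and returns 0,
 * or returns -1 when nothing has been pushed. */
int example_pop(ExampleCorrelationStack *s, uint64_t *lastId) {
    if (s->depth == 0) return -1;
    *lastId = s->ids[--s->depth];
    return 0;
}

/* Assumption: an API call issued now would be correlated with the top ID.
 * Returns 0 (a hypothetical "none" value) when the stack is empty. */
uint64_t example_current(const ExampleCorrelationStack *s) {
    return s->depth ? s->ids[s->depth - 1] : 0;
}
```

Nested pushes therefore unwind in LIFO order: after pushing 10 and then 20, the first pop returns 20 and the second returns 10.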

CUptiResult cuptiActivityRegisterCallbacks(CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted)

Registers callback functions with CUPTI for activity buffer handling.

CUptiResult cuptiActivityRegisterCallbacks_v2(CUpti_SubscriberHandle subscriber, CUpti_BuffersCallbackRequestFunc_v2 funcBufferRequested, CUpti_BuffersCallbackCompleteFunc_v2 funcBufferCompleted)

Registers callback functions with CUPTI for activity buffer handling for the subscriber.

CUptiResult cuptiActivityRegisterTimestampCallback(CUpti_TimestampCallbackFunc funcTimestamp)

Registers callback function with CUPTI for providing timestamp.

CUptiResult cuptiActivitySetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Write an activity API attribute.

CUptiResult cuptiActivitySetAttribute_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Write an activity API attribute.

CUptiResult cuptiComputeCapabilitySupported(int major, int minor, int *support)

Check support for a compute capability.

CUptiResult cuptiDeviceSupported(CUdevice dev, int *support)

Check support for a compute device.

CUptiResult cuptiDeviceVirtualizationMode(CUdevice dev, CUpti_DeviceVirtualizationMode *mode)

Query the virtualization mode of the device.

CUptiResult cuptiFinalize(void)

Detach CUPTI from the running process.

CUptiResult cuptiGetAutoBoostState(CUcontext context, CUpti_ActivityAutoBoostState *state)

Get auto boost state.

CUptiResult cuptiGetContextId(CUcontext context, uint32_t *contextId)

Get the ID of a context.

CUptiResult cuptiGetDeviceId(CUcontext context, uint32_t *deviceId)

Get the ID of a device.

CUptiResult cuptiGetGraphExecId(CUgraphExec graphExec, uint32_t *pId)

Get the unique ID of executable graph.

CUptiResult cuptiGetGraphId(CUgraph graph, uint32_t *pId)

Get the unique ID of graph.

CUptiResult cuptiGetGraphNodeId(CUgraphNode node, uint64_t *nodeId)

Get the unique ID of a graph node.

CUptiResult cuptiGetLastError(void)

Returns the last error from a CUPTI call or callback.

CUptiResult cuptiGetStreamId(CUcontext context, CUstream stream, uint32_t *streamId)

Get the ID of a stream.

CUptiResult cuptiGetStreamIdEx(CUcontext context, CUstream stream, uint8_t perThreadStream, uint32_t *streamId)

Get the ID of a stream.

CUptiResult cuptiGetThreadIdType(CUpti_ActivityThreadIdType *type)

Get the thread-id type.

CUptiResult cuptiGetTimestamp(uint64_t *timestamp)

Get the CUPTI timestamp.

CUptiResult cuptiIsTracingSessionRunning(uint8_t *isRunning)

Check whether a CUPTI tracing session is still running.

CUptiResult cuptiSetThreadIdType(CUpti_ActivityThreadIdType type)

Set the thread-id type.

6.1.5. Typedefs

CUpti_BuffersCallbackCompleteFunc

Function type for callback used by CUPTI to return a buffer of activity records.

CUpti_BuffersCallbackCompleteFunc_v2

Function type for callback used by CUPTI to return a buffer of activity records.

CUpti_BuffersCallbackRequestFunc

Function type for callback used by CUPTI to request an empty buffer for storing activity records.
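A buffer-request callback can be sketched in plain C. The function below has the same parameter list as CUpti_BuffersCallbackRequestFunc, but the names exampleBufferRequested and exampleBufferRelease and the 8 MiB size are hypothetical; the 8-byte alignment follows the convention used in the CUPTI samples. Setting *maxNumRecords to 0 lets CUPTI fill the buffer with as many records as fit. aligned_alloc requires C11.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes chosen for illustration. */
#define EXAMPLE_BUF_SIZE (8u * 1024u * 1024u)  /* 8 MiB per buffer */
#define EXAMPLE_ALIGN 8u                       /* 8-byte alignment, as in the CUPTI samples */

/* Same parameter list as CUpti_BuffersCallbackRequestFunc. */
void exampleBufferRequested(uint8_t **buffer, size_t *size, size_t *maxNumRecords) {
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    uint8_t *bfr = aligned_alloc(EXAMPLE_ALIGN, EXAMPLE_BUF_SIZE);
    *buffer = bfr;
    *size = bfr ? EXAMPLE_BUF_SIZE : 0;
    /* 0: fill the buffer with as many records as possible. */
    *maxNumRecords = 0;
}

/* Releases a buffer once its records have been consumed. A real completion
 * callback would first walk the first validSize bytes with
 * cuptiActivityGetNextRecord. */
void exampleBufferRelease(uint8_t *buffer) {
    free(buffer);
}
```

In a real tool, the request function would be passed as funcBufferRequested to cuptiActivityRegisterCallbacks, paired with a CUpti_BuffersCallbackCompleteFunc that processes and then frees each buffer.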

CUpti_BuffersCallbackRequestFunc_v2

Function type for callback used by CUPTI to request an empty buffer for storing activity records.

CUpti_TimestampCallbackFunc

Function type for callback used by CUPTI to request a timestamp to be used in activity records.

6.1.6. Macros

CUPTI_ACTIVITY_STRUCT_SIZE(type_, lastfield_)

CUPTI_AUTO_BOOST_INVALID_CLIENT_PID

An invalid/unknown process id.

CUPTI_CORRELATION_ID_UNKNOWN

An invalid/unknown correlation ID.

A correlation ID of this value indicates that there is no correlation for the activity record.

CUPTI_DECOMPRESSED_BYTES_UNKNOWN

An invalid/unknown value for decompressed bytes.

CUPTI_FUNCTION_INDEX_ID_INVALID

An invalid function index ID.

CUPTI_GRID_ID_UNKNOWN

An invalid/unknown grid ID.

CUPTI_MAX_GPUS

CUPTI_NVLINK_INVALID_PORT

Invalid/unknown NVLink port number.

CUPTI_SOURCE_LOCATOR_ID_UNKNOWN

The source-locator ID that indicates an unknown source location.

There is not an actual CUpti_ActivitySourceLocator object corresponding to this value.

CUPTI_SYNCHRONIZATION_INVALID_VALUE

An invalid/unknown value.

CUPTI_TIMESTAMP_UNKNOWN

An invalid/unknown timestamp for a start, end, queued, submitted, or completed time.
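Code that derives durations from start and end timestamps should skip records in which either value is CUPTI_TIMESTAMP_UNKNOWN. The sketch below uses a local stand-in macro that is assumed to equal 0, and a hypothetical helper name; production code should compare against the real macro from cupti_activity.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for CUPTI_TIMESTAMP_UNKNOWN, assumed to be 0 here. */
#define EXAMPLE_TIMESTAMP_UNKNOWN 0ULL

/* Stores end - start (nanoseconds) in *durationNs and returns true.
 * Returns false and leaves *durationNs untouched if either timestamp is
 * unknown or the pair is out of order. */
bool example_duration_ns(uint64_t start, uint64_t end, uint64_t *durationNs) {
    if (start == EXAMPLE_TIMESTAMP_UNKNOWN || end == EXAMPLE_TIMESTAMP_UNKNOWN || end < start)
        return false;
    *durationNs = end - start;
    return true;
}
```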

CUpti_ActivityConfig_STRUCT_SIZE

6.1.7. Enumerations

enum CUpti_ActivityApiFieldIds

Enum identifiers for fields in CUpti_ActivityAPI.

Each enum value corresponds to a field in CUpti_ActivityAPI and describes the data type and purpose of that field.

Values:

enumerator API_FIELD_KIND

CUpti_ActivityKind kind; Kind of this activity record.

Always either CUPTI_ACTIVITY_KIND_DRIVER or CUPTI_ACTIVITY_KIND_RUNTIME.

enumerator API_FIELD_CBID

CUpti_CallbackId cbid; Callback ID of the CUDA driver/runtime API function.

enumerator API_FIELD_START

uint64_t start; Start timestamp of the CUDA driver/runtime API call (in nanoseconds).

enumerator API_FIELD_END

uint64_t end; End timestamp of the CUDA driver/runtime API call (in nanoseconds).

enumerator API_FIELD_PROCESS_ID

uint32_t processId; ID of the process executing the CUDA driver/runtime API call.

enumerator API_FIELD_THREAD_ID

uint32_t threadId; ID of the thread executing the CUDA driver/runtime API call.

enumerator API_FIELD_CORRELATION_ID

uint32_t correlationId; Correlation ID assigned to this CUDA driver/runtime API invocation.

This ID matches the correlation ID of the associated kernel, memcpy, or memset activity record that the API call initiated.

enumerator API_FIELD_RETURN_VALUE

uint32_t returnValue; Return value of the CUDA driver/runtime API call (CUresult or cudaError_t).

enumerator API_FIELD_MAX

Total number of defined fields.
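Because the API record's correlation ID matches the correlation ID of the kernel, memcpy, or memset record the call initiated, a tool can join the two record streams on that field. The sketch below uses hypothetical trimmed-down record views (ExampleApiView, ExampleKernelView) instead of the real CUpti_ActivityAPI and kernel structs, and a stand-in for CUPTI_CORRELATION_ID_UNKNOWN that is assumed to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical views holding only the fields needed for the join. */
typedef struct { uint32_t correlationId; uint32_t cbid; } ExampleApiView;
typedef struct { uint32_t correlationId; uint64_t start, end; } ExampleKernelView;

/* Stand-in for CUPTI_CORRELATION_ID_UNKNOWN, assumed to be 0. */
#define EXAMPLE_CORRELATION_ID_UNKNOWN 0u

/* Returns the index of the API record that launched `kernel`, or -1 if the
 * kernel has no correlation or no API record carries the same ID.
 * Linear scan for clarity; a hash map keyed by correlation ID scales better. */
ptrdiff_t example_find_launching_api(const ExampleApiView *apis, size_t n,
                                     const ExampleKernelView *kernel) {
    if (kernel->correlationId == EXAMPLE_CORRELATION_ID_UNKNOWN) return -1;
    for (size_t i = 0; i < n; ++i)
        if (apis[i].correlationId == kernel->correlationId) return (ptrdiff_t)i;
    return -1;
}
```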

enum CUpti_ActivityAttribute

Activity attributes.

These attributes are used to control the behavior of the activity API.

Values:

enumerator CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE

The device memory size (in bytes) reserved for storing profiling data for concurrent kernels (activity kind CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL), memcopies and memsets for each buffer on a context.

The value is a size_t.

There is a limit on how many device buffers can be allocated per context. Users can query and set this limit using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT. CUPTI does not pre-allocate all the buffers; it pre-allocates only as many buffers as set by the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE. When all of the data in a buffer has been consumed, the buffer is added to the reuse pool, and CUPTI picks a buffer from this pool when a new buffer is needed. Thus the memory footprint does not scale with the kernel count. Applications with a high density of kernels, memcopies, and memsets may cause CUPTI to allocate more device buffers. CUPTI allocates another buffer only when it runs out of buffers in the reuse pool.

Since buffer allocation happens in the main application thread, it might cause stalls in the critical path. To mitigate this, CUPTI pre-allocates 3 buffers of the same size. Users can query and set the pre-allocation count using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE.

A larger buffer size leaves less device memory for the application. A smaller buffer size increases the risk of dropping timestamps for records if too many kernels, memcopies, or memsets are launched at one time.

This value only applies to new buffer allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for subsequent allocations.

The default value is 3200000 (~3 MB), which can accommodate profiling data for up to 100,000 kernels, memcopies, and memsets combined.

Note: Starting with the CUDA 12.0 Update 1 release, CUPTI allocates the profiling buffer in device memory by default, which may improve the performance of the tracing run. To change the preferred location to page-locked host memory, refer to the attribute CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED. The size of the memory and the maximum number of pools are still controlled by the attributes CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT.

Note: The actual amount of device memory per buffer reserved by CUPTI might be larger.
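The default implies roughly 32 bytes of device memory per kernel, memcopy, or memset (3,200,000 bytes / 100,000 activities). The sketch below uses that ratio, which is inferred from the documented default rather than specified by CUPTI, to estimate a buffer size for a given number of in-flight activities. Because CUPTI may reserve more memory than requested, treat the result as a starting point. The constants and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Inferred from the documented default (3200000 bytes for 100,000
 * activities); not a constant defined by CUPTI. */
#define EXAMPLE_BYTES_PER_ACTIVITY 32u
#define EXAMPLE_DEFAULT_BUFFER_SIZE 3200000u

/* Estimates a value for CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE, never
 * going below the default. */
size_t example_device_buffer_size(size_t inFlightActivities) {
    size_t bytes = inFlightActivities * EXAMPLE_BYTES_PER_ACTIVITY;
    return bytes < EXAMPLE_DEFAULT_BUFFER_SIZE ? EXAMPLE_DEFAULT_BUFFER_SIZE : bytes;
}
```

The resulting size_t value would then be written with cuptiActivitySetAttribute and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE before CUDA is initialized, as described above.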

enumerator CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP

The device memory size (in bytes) reserved for storing profiling data for CDP operations for each buffer on a context.

The value is a size_t.

A larger buffer size means fewer flush operations but consumes more device memory. This value only applies to new allocations.

Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations.

The default value is 8388608 (8MB).

Note: The actual amount of device memory per context reserved by CUPTI might be larger.

enumerator CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT

The maximum number of device memory buffers per context.

The value is a size_t.

For an application with a high rate of kernel launches, memcopies, and memsets, a bigger pool limit helps collect timestamps for all these activities at the expense of a larger memory footprint. Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE for more details.

Setting this value will not modify the number of memory buffers currently stored.

Set this value before initializing CUDA to ensure the limit is not exceeded.

The default value is 250.
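Because each context may grow its pool up to CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT buffers of CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE bytes, the product of the two gives a nominal peak footprint per context. With the defaults this is 3,200,000 × 250 = 800,000,000 bytes (about 800 MB). The helper below is a hypothetical illustration of that arithmetic; since CUPTI may reserve more per buffer than requested, the real figure can be higher.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal device memory per context when every buffer in the pool is in
 * use: requested buffer size times pool limit. CUPTI's actual per-buffer
 * reservation may be larger, so this is an approximation. */
uint64_t example_pool_footprint(uint64_t bufferSize, uint64_t poolLimit) {
    return bufferSize * poolLimit;
}
```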

enumerator CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE

This attribute is not supported starting with CUDA 12.3. CUPTI no longer uses the profiling semaphore pool to store profiling data.

There is a limit on how many semaphore pools can be allocated per context. Users can query and set this limit using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT. CUPTI does not pre-allocate all the semaphore pools; it pre-allocates only as many semaphore pools as set by the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE. When all of the data in a semaphore pool has been consumed, the pool is added to the reuse pool, and CUPTI picks a semaphore pool from the reuse pool when a new semaphore pool is needed. Thus the memory footprint does not scale with the kernel count. Applications with a high density of kernels may cause CUPTI to allocate more semaphore pools. CUPTI allocates another semaphore pool only when it runs out of semaphore pools in the reuse pool.

Since semaphore pool allocation happens in the main application thread, it might cause stalls in the critical path. To mitigate this, CUPTI pre-allocates 3 semaphore pools of the same size. Users can query and set the pre-allocation count using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE.

A larger semaphore pool size leaves less device memory for the application. A smaller semaphore pool size increases the risk of dropping timestamps for kernel records if too many kernels are issued or launched at one time.

This value only applies to new semaphore pool allocations. Set this value before initializing CUDA or before creating a context to ensure it is considered for subsequent allocations.

The default value is 25000, which can accommodate profiling data for up to 25,000 kernels.

enumerator CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT

This attribute is not supported starting with CUDA 12.3. CUPTI no longer uses the profiling semaphore pool to store profiling data.

The maximum number of profiling semaphore pools per context. The value is a size_t.

Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE for more details.

Set this value before initializing CUDA to ensure the limit is not exceeded.

The default value is 250.

enumerator CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER

Flag indicating whether the user provides zero-initialized activity buffers.

The value is a uint8_t.

If the value of this attribute is non-zero, the user should provide a zero-initialized buffer in the CUpti_BuffersCallbackRequestFunc. If the user does not provide a zero-initialized buffer after setting this to non-zero, the activity buffer may contain uninitialized values when CUPTI returns it in the CUpti_BuffersCallbackCompleteFunc.

If the value of this attribute is zero, CUPTI initializes the user buffer received in the CUpti_BuffersCallbackRequestFunc to zero before filling it. In this case, a few stalls may appear in the critical path, because CUPTI zeroes out the buffer in the main thread. Set this value before returning from the CUpti_BuffersCallbackRequestFunc to ensure it is considered for all subsequent user buffers.

The default value is 0.
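When this attribute is non-zero, the request callback itself must hand CUPTI zeroed memory. A minimal sketch, assuming calloc is an acceptable allocator for the tool: calloc zero-fills the block and returns memory aligned for any fundamental type, which covers the 8-byte alignment used in the CUPTI samples. The function name and the 4 MiB size are hypothetical; the parameter list matches CUpti_BuffersCallbackRequestFunc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_BUF_SIZE (4u * 1024u * 1024u)  /* hypothetical 4 MiB buffer */

/* Request callback for use when CUPTI_ACTIVITY_ATTR_ZEROED_OUT_ACTIVITY_BUFFER
 * is non-zero: the application, not CUPTI, provides the zeroed buffer. */
void exampleZeroedBufferRequested(uint8_t **buffer, size_t *size, size_t *maxNumRecords) {
    uint8_t *bfr = calloc(EXAMPLE_BUF_SIZE, 1);  /* zero-filled allocation */
    *buffer = bfr;
    *size = bfr ? EXAMPLE_BUF_SIZE : 0;
    *maxNumRecords = 0;  /* fill with as many records as fit */
}
```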

enumerator CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE

Number of device buffers to pre-allocate for a context during the initialization phase.

The value is a size_t.

Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE for details.

This value must be less than the maximum number of device buffers set using the attribute CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT.

Set this value before initializing CUDA or before creating a context to ensure it is considered by CUPTI.

The default value is 3, which allows CUPTI to ping-pong between these buffers (if possible).
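The constraint on this attribute can be checked before any attributes are set. This sketch bundles the three device-buffer attributes into a hypothetical ExampleBufferConfig struct, validates that the pre-allocation count is below the pool limit, and computes the nominal memory reserved at initialization (3 × 3,200,000 = 9,600,000 bytes with the defaults).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the device-buffer attributes. */
typedef struct {
    size_t bufferSize;   /* CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE */
    size_t poolLimit;    /* CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT */
    size_t preAllocate;  /* CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE */
} ExampleBufferConfig;

/* The pre-allocation count must be less than the pool limit. */
bool example_config_is_valid(const ExampleBufferConfig *cfg) {
    return cfg->preAllocate < cfg->poolLimit;
}

/* Nominal device memory reserved up front, before any buffer is reused. */
size_t example_preallocated_bytes(const ExampleBufferConfig *cfg) {
    return cfg->preAllocate * cfg->bufferSize;
}
```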

enumerator CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE

This attribute is not supported starting with CUDA 12.3. CUPTI no longer uses the profiling semaphore pool to store profiling data.

Number of profiling semaphore pools to pre-allocate for a context during the initialization phase. The value is a size_t.

Refer to the description of the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_SIZE for details.

This value must be less than the maximum number of profiling semaphore pools set using the attribute CUPTI_ACTIVITY_ATTR_PROFILING_SEMAPHORE_POOL_LIMIT.

Set this value before initializing CUDA or before creating a context to ensure it is considered by CUPTI.

The default value is 3, which allows CUPTI to ping-pong between these pools (if possible).

enumerator CUPTI_ACTIVITY_ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED

Allocate page-locked (pinned) host memory for storing profiling data for concurrent kernels, memcopies, and memsets for each buffer on a context.

The value is a uint8_t.

From CUDA 11.2 through the CUDA 12.0 GA release, CUPTI allocated the profiling buffer in pinned host memory by default. Allocating excessive amounts of pinned memory may degrade system performance, as it reduces the amount of memory available to the system for paging. For this reason, users may want to change the location from pinned host memory to device memory by setting the value of this attribute to 0.

Using page-locked (pinned) host memory buffers is not supported on confidential computing devices. If this attribute is set to 1, CUPTI will return error CUPTI_ERROR_NOT_SUPPORTED.

The default value is 0.

enumerator CUPTI_ACTIVITY_ATTR_PER_THREAD_ACTIVITY_BUFFER

Request per-thread activity buffers, so that CUPTI stores activity records in activity buffers on a per-thread basis.

The value is a uint8_t.

The attribute should be set before registering the buffer callbacks using cuptiActivityRegisterCallbacks API and before any of the CUPTI activity kinds are enabled. This makes sure that all the records are stored in activity buffers allocated per-thread. Changing this attribute in the middle of the profiling session will result in undefined behavior.

The default value is 1.

enumerator CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_DEVICE_GRAPHS#

The device memory size (in bytes) reserved for storing profiling data for device graph operations for each buffer on a context.

The value is a size_t.

A larger buffer size means fewer flush operations but consumes more device memory. This value only applies to new allocations.

Set this value before initializing CUDA or before creating a context to ensure it is considered for the following allocations.

The default value is 16777216 (16MB).

Note: The actual amount of device memory per context reserved by CUPTI might be larger.

enumerator CUPTI_ACTIVITY_ATTR_USER_DEFINED_RECORDS#

Enable user-defined activity records.

The value is a uint8_t.

When this attribute is enabled, CUPTI activity records will contain only the fields that are enabled for collection by the user. This can help to reduce the size of each activity record and decrease the overall memory consumption for activity collection.

This attribute must be set before enabling any CUPTI activity kinds. The activity kinds should be enabled/disabled using CUPTI API cuptiActivityEnable_v2 and cuptiActivityDisable_v2 respectively.

Minimum CUPTI Version: CUPTI_API_VERSION 130200 (CUDA 13.2). The CUPTI version can be queried using cuptiGetVersion() API.

Refer to the following online documentation. CUPTI Activity User-Defined Records: https://docs.nvidia.com/cupti/main/main.html#cupti-user-defined-activity-records Tutorial: https://docs.nvidia.com/cupti/tutorial/tutorial.html#tutorial-activity-user-defined-records Refer to the sample cupti_user_defined_records for usage.

NOTE: The CUPTI user-defined activity records feature is currently in beta. APIs and behavior may change in future releases.

The default value is 0, i.e., CUPTI provides records with all fields.
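The version requirement above can be checked before setting the attribute. Below is a minimal sketch that compares a version number, assumed to come from cuptiGetVersion(), against the documented minimum of 130200. The helper name and macro are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum CUPTI_API_VERSION documented for user-defined records (CUDA 13.2). */
#define MIN_VERSION_USER_DEFINED_RECORDS 130200u

/* Hypothetical helper: returns true if the version reported by
 * cuptiGetVersion() is new enough to enable
 * CUPTI_ACTIVITY_ATTR_USER_DEFINED_RECORDS. */
static bool user_defined_records_supported(uint32_t cuptiVersion)
{
    return cuptiVersion >= MIN_VERSION_USER_DEFINED_RECORDS;
}
```

In real code, the argument would be filled in by `cuptiGetVersion(&version)`. If the check passes, the attribute still has to be set before any activity kind is enabled.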

enumerator CUPTI_ACTIVITY_ATTR_MULTIPLE_SUBSCRIBER_STATE#

Allow multiple subscribers.

The value is a uint8_t.

If 0, only a single subscriber is allowed. If 1, multiple subscribers are allowed. If 2, this option has not been set by any prior subscriber (if there were any); in this case, the subscriber is free to set it during the cuptiSubscribe_v2 call. This attribute cannot be set using cuptiActivitySetAttribute or cuptiActivitySetAttribute_v2; however, it can be queried using cuptiActivityGetAttribute_v2.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.
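The three documented values of CUPTI_ACTIVITY_ATTR_MULTIPLE_SUBSCRIBER_STATE can be decoded as follows. This is a small sketch that assumes the uint8_t was already obtained with cuptiActivityGetAttribute_v2. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the documented values of
 * CUPTI_ACTIVITY_ATTR_MULTIPLE_SUBSCRIBER_STATE. */
typedef enum {
    SUBSCRIBER_STATE_SINGLE,    /* 0: only a single subscriber is allowed */
    SUBSCRIBER_STATE_MULTIPLE,  /* 1: multiple subscribers are allowed */
    SUBSCRIBER_STATE_UNSET,     /* 2: not set by any prior subscriber */
    SUBSCRIBER_STATE_UNKNOWN    /* any other value */
} SubscriberState;

static SubscriberState decode_subscriber_state(uint8_t value)
{
    switch (value) {
    case 0:  return SUBSCRIBER_STATE_SINGLE;
    case 1:  return SUBSCRIBER_STATE_MULTIPLE;
    case 2:  return SUBSCRIBER_STATE_UNSET;
    default: return SUBSCRIBER_STATE_UNKNOWN;
    }
}

/* Only the "unset" state leaves the choice to the new subscriber,
 * which makes it during the cuptiSubscribe_v2 call. */
static int subscriber_may_choose_mode(uint8_t value)
{
    return decode_subscriber_state(value) == SUBSCRIBER_STATE_UNSET;
}
```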

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_HES#

Get the HES (Hardware Events System) enabled flag.

The value is a uint8_t.

If 0, HES (Hardware Events System) is disabled. If 1, HES is enabled. This attribute can be set using cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2. It is not subscriber specific: the subscriber parameter is ignored, and all subscribers share the same HES enabled flag. The attribute must be set before any subscriber enables any activity kind, although it can still be queried using cuptiActivityGetAttribute_v2.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_ALLOCATION_SOURCE_TRACKING#

Get the allocation source library tracking enabled flag (for the specified subscriber).

The value is a uint8_t.

If 0, allocation source library tracking is disabled. If 1, allocation source library tracking is enabled. This attribute can be set using cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber. The cuptiActivityEnableAllocationSource API cannot be used when multiple subscribers are allowed.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_KERNEL_LATENCY_TIMESTAMPS#

Get the latency timestamp tracking enabled flag (for the specified subscriber).

The value is a uint8_t.

This can be used to toggle collecting latency timestamps for kernel records.

If 0, latency timestamp tracking is disabled. If 1, latency timestamp tracking is enabled. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_ALL_SYNC_RECORDS#

Get the all sync records enabled flag (for the specified subscriber).

The value is a uint8_t.

This can be used to toggle whether sync records are also collected for synchronization operations that return a non-zero CUDA status. By default, this flag is set to 0. If this flag is set to 1, sync records are collected for all synchronization operations, even those that return a non-zero CUDA status. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_CUDA_EVENT_DEVICE_TIMESTAMPS#

Get the event device timestamps enabled flag (for the specified subscriber).

The value is a uint8_t.

This can be used to toggle collecting event device timestamps for CUDA event records. By default, this flag is set to 0. If this flag is set to 1, event device timestamps are collected for CUDA event records. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_KERNEL_LAUNCH_ATTRIBUTES#

Get the kernel launch attributes enabled flag (for the specified subscriber).

The value is a uint8_t.

This can be used to toggle collecting launch attributes for kernel records. By default, this flag is set to 0. If this flag is set to 1, launch attributes are collected for kernel records. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_DEVICE_GRAPH_TRACE#

Get the device graph trace enabled flag (for the specified subscriber).

The value is a uint8_t.

This can be used to toggle collecting device graph trace for graph records. By default, this flag is set to 0. If this flag is set to 1, device graph trace is collected for graph records. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_ENABLE_MULTI_SUBSCRIBER_GRAPH_LEVEL_TRACE#

Get the multi-subscriber graph level trace enabled flag.

The value is a uint8_t.

This attribute is only relevant in multi-subscriber mode. CUPTI currently does not support different subscribers requesting node-level trace and graph-level trace concurrently. If this flag is disabled, the body nodes of a graph are reported under the normal concurrent kernel/memcpy/memset records. If this flag is enabled, the body nodes of a graph are not reported under the normal concurrent kernel/memcpy/memset records; instead, the whole graph is reported as a single graph trace record (if CUPTI_ACTIVITY_KIND_GRAPH_TRACE is enabled for the subscriber). This behavior is consistent across all subscribers, regardless of which activity kinds each subscriber enables.

If 0, multi-subscriber graph level trace is disabled. If 1, multi-subscriber graph level trace is enabled. The default value is 0. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber. However, this attribute must be set before enabling any activity kind by any subscriber. This attribute is not subscriber specific, and the subscriber parameter is ignored. All subscribers will share the same multi-subscriber graph level trace enabled flag.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.
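The reporting rule above can be modeled as a small decision function. This sketch reflects our reading of the description. In particular, it assumes that when the graph-level flag is on and a subscriber has not enabled CUPTI_ACTIVITY_KIND_GRAPH_TRACE, that subscriber receives no record for the body nodes. The type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of where a graph body node (kernel/memcpy/memset)
 * is reported in multi-subscriber mode. */
typedef enum {
    REPORTED_AS_NODE_RECORD,  /* normal concurrent kernel/memcpy/memset record */
    REPORTED_IN_GRAPH_TRACE,  /* folded into one CUPTI_ACTIVITY_KIND_GRAPH_TRACE record */
    NOT_REPORTED              /* graph-level trace on, GRAPH_TRACE not enabled */
} BodyNodeReport;

/* graphLevelTraceFlag is shared by all subscribers; the second argument
 * is per subscriber. */
static BodyNodeReport body_node_report(bool graphLevelTraceFlag,
                                       bool subscriberEnabledGraphTrace)
{
    if (!graphLevelTraceFlag)
        return REPORTED_AS_NODE_RECORD;
    return subscriberEnabledGraphTrace ? REPORTED_IN_GRAPH_TRACE : NOT_REPORTED;
}
```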

enumerator CUPTI_ACTIVITY_ATTR_THREAD_ID_TYPE#

Get or set the thread ID type (for the specified subscriber).

The value is a CUpti_ActivityThreadIdType.

This attribute specifies the method used by CUPTI to obtain thread IDs. The default value is CUPTI_ACTIVITY_THREAD_ID_TYPE_DEFAULT. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_TIMESTAMP_CALLBACK#

Get or set the timestamp callback function (for the specified subscriber).

The value is a CUpti_TimestampCallbackFunc.

This attribute allows registering a custom timestamp callback function that CUPTI will use instead of its default CPU timer. The value can be NULL to unregister the callback. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2 for the specified subscriber.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.
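Below is a sketch of a custom timer that could serve as a timestamp callback. It makes two assumptions. First, the callback takes no arguments and returns a uint64_t; the local typedef stands in for CUpti_TimestampCallbackFunc, so check it against cupti_activity.h. Second, it assumes nanoseconds, matching the other timestamps in this section. The registration call itself is not shown.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Local stand-in for CUpti_TimestampCallbackFunc (assumed signature). */
typedef uint64_t (*TimestampCallbackFunc)(void);

/* Custom timer: CLOCK_MONOTONIC converted to nanoseconds. */
static uint64_t monotonic_ns_timestamp(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0; /* timer unavailable */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static const TimestampCallbackFunc kTimestampCallback = monotonic_ns_timestamp;
```

A monotonic clock is chosen here so that timestamps never go backwards, which keeps start/end pairs in activity records consistent.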

enumerator CUPTI_ACTIVITY_ATTR_CIG_MODE#

Get or set the CIG (CUDA in Graphics) mode.

The value is a uint8_t.

This attribute allows enabling or disabling the CIG mode. Note that this attribute can be set by cuptiActivitySetAttribute_v2 and queried using cuptiActivityGetAttribute_v2.

Note

The CUPTI multiple subscribers feature is currently in beta. APIs and behavior may change in future releases.

enumerator CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_FORCE_INT#

enum CUpti_ActivityComputeApiKind#

The kind of a compute API.

Values:

enumerator CUPTI_ACTIVITY_COMPUTE_API_UNKNOWN#

The compute API is not known.

enumerator CUPTI_ACTIVITY_COMPUTE_API_CUDA#

The compute APIs are for CUDA.

enumerator CUPTI_ACTIVITY_COMPUTE_API_CUDA_MPS#

The compute APIs are for CUDA running in MPS (Multi-Process Service) environment.

enumerator CUPTI_ACTIVITY_COMPUTE_API_FORCE_INT#

enum CUpti_ActivityComputeEngineCtxSwitchFieldIds#

Enum identifiers for fields in CUpti_ActivityComputeEngineCtxSwitch.

Each enum value corresponds to a field in CUpti_ActivityComputeEngineCtxSwitch and describes the data type and purpose of that field.

Values:

enumerator COMPUTE_ENGINE_CTX_SWITCH_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_COMPUTE_ENGINE_CTX_SWITCH.

enumerator COMPUTE_ENGINE_CTX_SWITCH_FIELD_CONTEXT_ID#

uint32_t contextId; ID of the CUDA context.

enumerator COMPUTE_ENGINE_CTX_SWITCH_FIELD_TIMESTAMP#

uint64_t timestamp; Timestamp at which the context switch operation occurs.

enumerator COMPUTE_ENGINE_CTX_SWITCH_FIELD_OPERATION_TYPE#

CUpti_ComputeEngineCtxSwitchOperationType operationType; Type of the Compute Engine Context switch operation.

enumerator COMPUTE_ENGINE_CTX_SWITCH_FIELD_MAX#

Total number of defined fields.
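Because each *_FIELD_MAX enumerator gives the total number of defined fields, it can be used to size per-field tables when selecting fields dynamically. The sketch below makes three assumptions:

- It mirrors CUpti_ActivityComputeEngineCtxSwitchFieldIds locally.
- It assumes the enumerators start at 0 and follow the documented order. In real code, include the CUPTI headers instead.
- The selection struct and helper are hypothetical and are not the CUPTI field-selection API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of CUpti_ActivityComputeEngineCtxSwitchFieldIds
 * (assumed contiguous from 0, in documented order). */
typedef enum {
    COMPUTE_ENGINE_CTX_SWITCH_FIELD_KIND,
    COMPUTE_ENGINE_CTX_SWITCH_FIELD_CONTEXT_ID,
    COMPUTE_ENGINE_CTX_SWITCH_FIELD_TIMESTAMP,
    COMPUTE_ENGINE_CTX_SWITCH_FIELD_OPERATION_TYPE,
    COMPUTE_ENGINE_CTX_SWITCH_FIELD_MAX  /* total number of defined fields */
} CtxSwitchFieldId;

/* Hypothetical per-field selection table sized by the _MAX enumerator. */
typedef struct {
    bool selected[COMPUTE_ENGINE_CTX_SWITCH_FIELD_MAX];
} CtxSwitchFieldSelection;

static void select_fields(CtxSwitchFieldSelection *sel,
                          const CtxSwitchFieldId *ids, size_t count)
{
    for (size_t i = 0; i < (size_t)COMPUTE_ENGINE_CTX_SWITCH_FIELD_MAX; ++i)
        sel->selected[i] = false;
    for (size_t i = 0; i < count; ++i)
        if (ids[i] < COMPUTE_ENGINE_CTX_SWITCH_FIELD_MAX) /* skip out-of-range IDs */
            sel->selected[ids[i]] = true;
}
```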

enum CUpti_ActivityConfidentialComputeRotationFieldIds#

Enum identifiers for fields in CUpti_ActivityConfidentialComputeRotation.

Each enum value corresponds to a field in CUpti_ActivityConfidentialComputeRotation and describes the data type and purpose of that field.

Values:

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_CONFIDENTIAL_COMPUTE_ROTATION.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_EVENT_TYPE#

CUpti_ConfidentialComputeRotationEventType eventType; Type of event.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_DEVICE_ID#

uint32_t deviceId; Device ID.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_CONTEXT_ID#

uint32_t contextId; Context ID.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_CHANNEL_ID#

uint32_t channelId; Channel ID.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_CHANNEL_TYPE#

CUpti_ChannelType channelType; Channel Type.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_TIMESTAMP#

uint64_t timestamp; Timestamp in ns.

enumerator CONFIDENTIAL_COMPUTE_ROTATION_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityContextFieldIds#

Enum identifiers for fields to trace context.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_CONTEXT.

Values:

enumerator CONTEXT_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_CONTEXT.

enumerator CONTEXT_FIELD_CONTEXT_ID#

uint32_t contextId; The ID of the context.

enumerator CONTEXT_FIELD_DEVICE_ID#

uint32_t deviceId; The ID of the device associated with the context.

enumerator CONTEXT_FIELD_COMPUTE_API_KIND#

uint16_t computeApiKind; The compute API kind associated with the context.

See CUpti_ActivityComputeApiKind.

enumerator CONTEXT_FIELD_NULL_STREAM_ID#

uint32_t nullStreamId; The ID for the NULL stream in this context.

enumerator CONTEXT_FIELD_PARENT_CONTEXT_ID#

uint32_t parentContextId; The ID of the parent context.

It is 0 if the context does not have a parent.

enumerator CONTEXT_FIELD_IS_GREEN_CONTEXT#

uint8_t isGreenContext; Indicates whether the context is a green context.

enumerator CONTEXT_FIELD_NUM_MULTIPROCESSORS#

uint16_t numMultiprocessors; Number of multiprocessors assigned to the green context.

Invalid if the field ‘isGreenContext’ is 0.

enumerator CONTEXT_FIELD_CIG_MODE#

CUpti_ContextCigMode cigMode; The CIG mode of the context.

enumerator CONTEXT_FIELD_PROCESS_ID#

uint32_t processId; The ID of the process associated with the context.

enumerator CONTEXT_FIELD_MAX#

Total number of defined fields.
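When consuming context records, two validity rules above matter:

- parentContextId is 0 when there is no parent.
- numMultiprocessors is only meaningful when isGreenContext is non-zero.

The sketch below applies both rules. It uses a hypothetical struct holding a subset of the listed fields, not the real CUpti_ActivityContext4 layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the context record fields listed above. */
typedef struct {
    uint32_t contextId;
    uint32_t parentContextId;    /* 0 if the context has no parent */
    uint8_t  isGreenContext;
    uint16_t numMultiprocessors; /* valid only when isGreenContext != 0 */
} ContextInfo;

static bool has_parent(const ContextInfo *c) { return c->parentContextId != 0; }

/* Returns the SM count for green contexts, or -1 when the field is invalid. */
static int green_context_sm_count(const ContextInfo *c)
{
    return c->isGreenContext ? (int)c->numMultiprocessors : -1;
}

static void print_context(const ContextInfo *c)
{
    printf("context %u", (unsigned)c->contextId);
    if (has_parent(c))
        printf(" (parent %u)", (unsigned)c->parentContextId);
    if (green_context_sm_count(c) >= 0)
        printf(", green context with %d SMs", green_context_sm_count(c));
    printf("\n");
}
```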

enum CUpti_ActivityCudaEventFieldIds#

Enum identifiers for fields to trace CUDA Event operations.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_CUDA_EVENT.

Values:

enumerator CUDA_EVENT_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_CUDA_EVENT.

enumerator CUDA_EVENT_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this CUDA Event operation.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator CUDA_EVENT_FIELD_CONTEXT_ID#

uint32_t contextId; The ID of the context where the event was recorded.

enumerator CUDA_EVENT_FIELD_STREAM_ID#

uint32_t streamId; The compute stream where the event was recorded.

enumerator CUDA_EVENT_FIELD_EVENT_ID#

uint64_t eventId; A unique event ID to identify the event record.

enumerator CUDA_EVENT_FIELD_DEVICE_ID#

uint32_t deviceId; The ID of the device where the event was recorded.

enumerator CUDA_EVENT_FIELD_DEVICE_TIMESTAMP#

uint64_t deviceTimestamp; The device-side timestamp on CUDA event record (in nanoseconds).

Collection of this field is disabled by default. It can be enabled by calling the CUPTI API cuptiActivityEnableCudaEventDeviceTimestamps.

enumerator CUDA_EVENT_FIELD_CUDA_EVENT_SYNC_ID#

uint64_t cudaEventSyncId; A unique ID to associate event synchronization records with the latest CUDA Event record.

A similar field is present in CUpti_ActivitySynchronization2 to associate the CUDA Event record with the synchronization record.

Because the same CUDA event can be used multiple times, the event ID alone is not unique enough to correlate a synchronization record with the latest CUDA Event record. This field is unique and can be used for that correlation.

enumerator CUDA_EVENT_FIELD_MAX#

Total number of defined fields.
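The correlation described for cudaEventSyncId can be sketched as a lookup keyed on that ID rather than on eventId, since eventId repeats when an event is reused. The trimmed-down structs below are hypothetical. The sketch also assumes the counterpart field in CUpti_ActivitySynchronization2 carries the same value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down records. */
typedef struct {
    uint64_t eventId;          /* same for every reuse of a CUDA event */
    uint64_t cudaEventSyncId;  /* unique per event record */
    uint64_t correlationId;
} EventRecord;

typedef struct {
    uint64_t cudaEventSyncId;  /* assumed counterpart field in the sync record */
} SyncRecord;

/* Linear search for the event record a synchronization record refers to.
 * Returns NULL if that event record has not been seen yet. */
static const EventRecord *find_event_for_sync(const EventRecord *events,
                                              size_t count,
                                              const SyncRecord *sync)
{
    for (size_t i = 0; i < count; ++i)
        if (events[i].cudaEventSyncId == sync->cudaEventSyncId)
            return &events[i];
    return NULL;
}
```

For large traces, a hash map keyed on cudaEventSyncId would replace the linear search. The linear search keeps the sketch short.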

enum CUpti_ActivityDeviceFieldIds#

Enum identifiers for fields in CUpti_ActivityDevice6.

Each enum value corresponds to a field in CUpti_ActivityDevice6 and describes the data type and purpose of that field.

Values:

enumerator DEVICE_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_DEVICE.

enumerator DEVICE_FIELD_FLAGS#

CUpti_ActivityFlag flags; Flags associated with the device.

enumerator DEVICE_FIELD_GLOBAL_MEMORY_BANDWIDTH#

uint64_t globalMemoryBandwidth; Global memory bandwidth available on the device (in kBytes/sec).

enumerator DEVICE_FIELD_GLOBAL_MEMORY_SIZE#

uint64_t globalMemorySize; Amount of global memory on the device (in bytes).

enumerator DEVICE_FIELD_CONSTANT_MEMORY_SIZE#

uint32_t constantMemorySize; Amount of constant memory on the device (in bytes).

enumerator DEVICE_FIELD_L2_CACHE_SIZE#

uint32_t l2CacheSize; Size of the L2 cache on the device (in bytes).

enumerator DEVICE_FIELD_NUM_THREADS_PER_WARP#

uint32_t numThreadsPerWarp; Number of threads per warp on the device.

enumerator DEVICE_FIELD_CORE_CLOCK_RATE#

uint32_t coreClockRate; Core clock rate of the device (in kHz).

enumerator DEVICE_FIELD_NUM_MEMCPY_ENGINES#

uint32_t numMemcpyEngines; Number of memory copy engines on the device.

enumerator DEVICE_FIELD_NUM_MULTIPROCESSORS#

uint32_t numMultiprocessors; Number of multiprocessors on the device.

enumerator DEVICE_FIELD_MAX_IPC#

uint32_t maxIPC; Maximum instructions per cycle possible on each device multiprocessor.

enumerator DEVICE_FIELD_MAX_WARPS_PER_MULTIPROCESSOR#

uint32_t maxWarpsPerMultiprocessor; Maximum number of warps that can be present on a multiprocessor.

enumerator DEVICE_FIELD_MAX_BLOCKS_PER_MULTIPROCESSOR#

uint32_t maxBlocksPerMultiprocessor; Maximum number of blocks that can be present on a multiprocessor.

enumerator DEVICE_FIELD_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR#

uint32_t maxSharedMemoryPerMultiprocessor; Maximum amount of shared memory available per multiprocessor (in bytes).

enumerator DEVICE_FIELD_MAX_REGISTERS_PER_MULTIPROCESSOR#

uint32_t maxRegistersPerMultiprocessor; Maximum number of 32-bit registers available per multiprocessor.

enumerator DEVICE_FIELD_MAX_REGISTERS_PER_BLOCK#

uint32_t maxRegistersPerBlock; Maximum number of registers that can be allocated to a block.

enumerator DEVICE_FIELD_MAX_SHARED_MEMORY_PER_BLOCK#

uint32_t maxSharedMemoryPerBlock; Maximum amount of shared memory that can be assigned to a block (in bytes).

enumerator DEVICE_FIELD_MAX_THREADS_PER_BLOCK#

uint32_t maxThreadsPerBlock; Maximum number of threads allowed in a block.

enumerator DEVICE_FIELD_MAX_BLOCK_DIM_X#

uint32_t maxBlockDimX; Maximum allowed X dimension for a block.

enumerator DEVICE_FIELD_MAX_BLOCK_DIM_Y#

uint32_t maxBlockDimY; Maximum allowed Y dimension for a block.

enumerator DEVICE_FIELD_MAX_BLOCK_DIM_Z#

uint32_t maxBlockDimZ; Maximum allowed Z dimension for a block.

enumerator DEVICE_FIELD_MAX_GRID_DIM_X#

uint32_t maxGridDimX; Maximum allowed X dimension for a grid.

enumerator DEVICE_FIELD_MAX_GRID_DIM_Y#

uint32_t maxGridDimY; Maximum allowed Y dimension for a grid.

enumerator DEVICE_FIELD_MAX_GRID_DIM_Z#

uint32_t maxGridDimZ; Maximum allowed Z dimension for a grid.

enumerator DEVICE_FIELD_COMPUTE_CAPABILITY_MAJOR#

uint32_t computeCapabilityMajor; Compute capability for the device (major number).

enumerator DEVICE_FIELD_COMPUTE_CAPABILITY_MINOR#

uint32_t computeCapabilityMinor; Compute capability for the device (minor number).

enumerator DEVICE_FIELD_ID#

uint32_t id; Device ID.

enumerator DEVICE_FIELD_ECC_ENABLED#

uint32_t eccEnabled; ECC enabled flag for device.

enumerator DEVICE_FIELD_UUID#

CUuuid uuid; Device UUID.

enumerator DEVICE_FIELD_NAME#

const char *name; Device name.

enumerator DEVICE_FIELD_IS_CUDA_VISIBLE#

uint8_t isCudaVisible; Flag to indicate whether the device is visible to CUDA.

enumerator DEVICE_FIELD_IS_MIG_ENABLED#

uint8_t isMigEnabled; MIG enabled flag for device.

enumerator DEVICE_FIELD_GPU_INSTANCE_ID#

uint32_t gpuInstanceId; GPU instance ID for MIG enabled devices.

enumerator DEVICE_FIELD_COMPUTE_INSTANCE_ID#

uint32_t computeInstanceId; Compute instance ID for MIG enabled devices.

enumerator DEVICE_FIELD_MIG_UUID#

CUuuid migUuid; MIG UUID.

enumerator DEVICE_FIELD_IS_NUMA_NODE#

uint32_t isNumaNode; NUMA node flag for device.

enumerator DEVICE_FIELD_NUMA_ID#

uint32_t numaId; NUMA node ID of the GPU memory.

enumerator DEVICE_FIELD_NUM_TPCS#

uint32_t numTpcs; Number of TPCs on the device.

enumerator DEVICE_FIELD_MAX#

Total number of defined fields.
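Several device fields use non-obvious units: bandwidth in kBytes/sec, clock rate in kHz, and memory size in bytes. The hypothetical helpers below convert them to more readable units. They assume 1 kByte = 1000 bytes for bandwidth; adjust if your tooling treats it as 1024.

```c
#include <stdint.h>

/* kBytes/sec -> GB/s (assuming 1 kB = 1000 bytes). */
static double bandwidth_gb_per_s(uint64_t globalMemoryBandwidthKBps)
{
    return (double)globalMemoryBandwidthKBps / 1.0e6;
}

/* kHz -> GHz. */
static double clock_ghz(uint32_t coreClockRateKHz)
{
    return (double)coreClockRateKHz / 1.0e6;
}

/* bytes -> GiB. */
static double global_memory_gib(uint64_t globalMemorySizeBytes)
{
    return (double)globalMemorySizeBytes / (1024.0 * 1024.0 * 1024.0);
}
```

For example, a globalMemoryBandwidth of 900000000 corresponds to 900 GB/s, and a coreClockRate of 1410000 corresponds to 1.41 GHz.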

enum CUpti_ActivityDeviceGraphTraceFieldIds#

Enum identifiers for fields in CUpti_ActivityDeviceGraphTrace.

Each enum value corresponds to a field in CUpti_ActivityDeviceGraphTrace and describes the data type and purpose of that field.

Values:

enumerator DEVICE_GRAPH_TRACE_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_DEVICE_GRAPH_TRACE.

enumerator DEVICE_GRAPH_TRACE_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the device where the first node of the graph is executed.

enumerator DEVICE_GRAPH_TRACE_FIELD_START#

uint64_t start; Start timestamp for the graph execution (in ns).

enumerator DEVICE_GRAPH_TRACE_FIELD_END#

uint64_t end; End timestamp for the graph execution (in ns).

enumerator DEVICE_GRAPH_TRACE_FIELD_GRAPH_ID#

uint32_t graphId; Unique ID of the graph that is launched.

enumerator DEVICE_GRAPH_TRACE_FIELD_LAUNCHER_GRAPH_ID#

uint32_t launcherGraphId; Unique ID of the graph that has launched this graph.

enumerator DEVICE_GRAPH_TRACE_FIELD_DEVICE_LAUNCH_MODE#

uint32_t deviceLaunchMode; Type of launch.

enumerator DEVICE_GRAPH_TRACE_FIELD_CONTEXT_ID#

uint64_t contextId; ID of the context where the first node of the graph is executed.

enumerator DEVICE_GRAPH_TRACE_FIELD_STREAM_ID#

uint64_t streamId; ID of the stream where the graph is being launched.

This is the only record with a 64-bit streamId, to accommodate streams created on the device.

enumerator DEVICE_GRAPH_TRACE_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityEnvironmentKind#

The kind of environment data.

Used to indicate what type of data is being reported by an environment activity record.

Values:

enumerator CUPTI_ACTIVITY_ENVIRONMENT_UNKNOWN#

Unknown data.

enumerator CUPTI_ACTIVITY_ENVIRONMENT_SPEED#

The environment data is related to speed.

enumerator CUPTI_ACTIVITY_ENVIRONMENT_TEMPERATURE#

The environment data is related to temperature.

enumerator CUPTI_ACTIVITY_ENVIRONMENT_POWER#

The environment data is related to power.

enumerator CUPTI_ACTIVITY_ENVIRONMENT_COOLING#

The environment data is related to cooling.

enumerator CUPTI_ACTIVITY_ENVIRONMENT_COUNT#

enumerator CUPTI_ACTIVITY_ENVIRONMENT_KIND_FORCE_INT#

enum CUpti_ActivityExternalCorrelationFieldIds#

Enum identifiers for fields in CUpti_ActivityExternalCorrelation.

This activity record (CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION) correlates native CUDA records (e.g., CUDA Driver API calls, kernels, memcpys, …) with records from external APIs such as OpenACC.

Values:

enumerator EXTERNAL_CORRELATION_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION.

enumerator EXTERNAL_CORRELATION_FIELD_EXTERNAL_KIND#

CUpti_ExternalCorrelationKind externalKind; Kind of external API this record is correlated to.

enumerator EXTERNAL_CORRELATION_FIELD_EXTERNAL_ID#

uint64_t externalId; Correlation ID of the associated non-CUDA API record.

enumerator EXTERNAL_CORRELATION_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID of the associated CUDA driver or runtime API record.

enumerator EXTERNAL_CORRELATION_FIELD_MAX#

Total number of defined fields.
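An external correlation record links a CUDA driver or runtime API correlation ID to an ID from a non-CUDA API. Given a correlation ID, a tool can therefore recover the external ID it belongs to. The sketch below does this over a hypothetical array of records. It uses a plain uint32_t in place of CUpti_ExternalCorrelationKind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of an external correlation record. */
typedef struct {
    uint32_t externalKind;   /* stands in for CUpti_ExternalCorrelationKind */
    uint64_t externalId;     /* ID from the external API (e.g. OpenACC) */
    uint64_t correlationId;  /* CUDA driver/runtime API correlation ID */
} ExternalCorrelation;

/* Finds the external ID linked to a CUDA correlation ID.
 * Returns 1 and writes *externalIdOut on success, 0 if not found. */
static int lookup_external_id(const ExternalCorrelation *recs, size_t count,
                              uint64_t correlationId, uint64_t *externalIdOut)
{
    for (size_t i = 0; i < count; ++i) {
        if (recs[i].correlationId == correlationId) {
            *externalIdOut = recs[i].externalId;
            return 1;
        }
    }
    return 0;
}
```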

enum CUpti_ActivityFlag#

Flags associated with activity records.

Activity record flags. Flags can be combined by bitwise OR to associate multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below.

Values:

enumerator CUPTI_ACTIVITY_FLAG_NONE#

Indicates the activity record has no flags.

enumerator CUPTI_ACTIVITY_FLAG_DEVICE_CONCURRENT_KERNELS#

Indicates the activity represents a device that supports concurrent kernel execution.

Valid for CUPTI_ACTIVITY_KIND_DEVICE.

enumerator CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE#

Indicates if the activity represents a CUdevice_attribute value or a CUpti_DeviceAttribute value.

Valid for CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE.

enumerator CUPTI_ACTIVITY_FLAG_MEMCPY_ASYNC#

Indicates the activity represents an asynchronous memcpy operation.

Valid for CUPTI_ACTIVITY_KIND_MEMCPY.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_INSTANTANEOUS#

Indicates the activity represents an instantaneous marker.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_START#

Indicates the activity represents a region start marker.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_END#

Indicates the activity represents a region end marker.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE#

Indicates the activity represents an attempt to acquire a user defined synchronization object.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_SUCCESS#

Indicates the activity represents success in acquiring the user defined synchronization object.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_SYNC_ACQUIRE_FAILED#

Indicates the activity represents failure in acquiring the user defined synchronization object.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_SYNC_RELEASE#

Indicates the activity represents releasing a reservation on a user-defined synchronization object.

Valid for CUPTI_ACTIVITY_KIND_MARKER.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_COLOR_NONE#

Indicates the activity represents a marker that does not specify a color.

Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.

enumerator CUPTI_ACTIVITY_FLAG_MARKER_COLOR_ARGB#

Indicates the activity represents a marker that specifies a color in alpha-red-green-blue format.

Valid for CUPTI_ACTIVITY_KIND_MARKER_DATA.

enumerator CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_SIZE_MASK#

The number of bytes requested by each thread.

Valid for CUpti_ActivityGlobalAccess3.

enumerator CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_LOAD#

If this bit in the flag is set, the access was a load; otherwise, it was a store.

Valid for CUpti_ActivityGlobalAccess3.

enumerator CUPTI_ACTIVITY_FLAG_GLOBAL_ACCESS_KIND_CACHED#

If this bit in the flag is set, the load access was cached; otherwise, it was uncached.

Valid for CUpti_ActivityGlobalAccess3.

enumerator CUPTI_ACTIVITY_FLAG_METRIC_OVERFLOWED#

If this bit in the flag is set, the metric value overflowed.

Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance.

enumerator CUPTI_ACTIVITY_FLAG_METRIC_VALUE_INVALID#

If this bit in the flag is set, the metric value could not be calculated.

This occurs when one or more values required to calculate the metric are missing. Valid for CUpti_ActivityMetric and CUpti_ActivityMetricInstance.

enumerator CUPTI_ACTIVITY_FLAG_INSTRUCTION_VALUE_INVALID#

If this bit in the flag is set, the source-level metric value could not be calculated.

This occurs when one or more values required to calculate the source-level metric cannot be evaluated. Valid for CUpti_ActivityInstructionExecution.

enumerator CUPTI_ACTIVITY_FLAG_INSTRUCTION_CLASS_MASK#

The mask for the instruction class (CUpti_ActivityInstructionClass).

Valid for CUpti_ActivityInstructionExecution and CUpti_ActivityInstructionCorrelation.

enumerator CUPTI_ACTIVITY_FLAG_FLUSH_FORCED#

When calling cuptiActivityFlushAll, this flag can be set to force CUPTI to flush all records in the buffer, whether finished or not.

enumerator CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_SIZE_MASK#

The number of bytes requested by each thread.

Valid for CUpti_ActivitySharedAccess.

enumerator CUPTI_ACTIVITY_FLAG_SHARED_ACCESS_KIND_LOAD#

If this bit in the flag is set, the access was a load; otherwise, it was a store.

Valid for CUpti_ActivitySharedAccess.

enumerator CUPTI_ACTIVITY_FLAG_MEMSET_ASYNC#

Indicates the activity represents an asynchronous memset operation.

Valid for CUPTI_ACTIVITY_KIND_MEMSET.

enumerator CUPTI_ACTIVITY_FLAG_THRASHING_IN_CPU#

Indicates the activity represents thrashing in CPU.

Valid for counters of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.

enumerator CUPTI_ACTIVITY_FLAG_THROTTLING_IN_CPU#

Indicates the activity represents page throttling in CPU.

Valid for counters of kind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING in CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.

enumerator CUPTI_ACTIVITY_FLAG_FORCE_INT#
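Because flags are combined with bitwise OR, a consumer tests each bit with a bitwise AND. The sketch below decodes the global access flags described above. It makes two assumptions:

- The local macro values are assumed to match the CUpti_ActivityFlag enumerators in cupti_activity.h: size mask 0xFF, load bit 1 << 8, cached bit 1 << 9. In real code, include the CUPTI headers and use the enumerators directly.
- The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the CUpti_ActivityFlag bits used by
 * CUpti_ActivityGlobalAccess3 (values assumed, see lead-in). */
#define GLOBAL_ACCESS_SIZE_MASK 0xFFu       /* bytes requested per thread */
#define GLOBAL_ACCESS_LOAD      (1u << 8)   /* set: load, clear: store */
#define GLOBAL_ACCESS_CACHED    (1u << 9)   /* set: cached load */

typedef struct {
    unsigned bytesPerThread;
    bool     isLoad;
    bool     isCached;  /* only meaningful for loads */
} GlobalAccessInfo;

static GlobalAccessInfo decode_global_access_flags(uint32_t flags)
{
    GlobalAccessInfo info;
    info.bytesPerThread = flags & GLOBAL_ACCESS_SIZE_MASK;
    info.isLoad = (flags & GLOBAL_ACCESS_LOAD) != 0;
    info.isCached = info.isLoad && (flags & GLOBAL_ACCESS_CACHED) != 0;
    return info;
}
```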
enum CUpti_ActivityGraphHostNodeFieldIds#

Enum identifiers for fields in CUpti_ActivityGraphHostNode.

Each enum value corresponds to a field in CUpti_ActivityGraphHostNode and describes the data type and purpose of that field.

Values:

enumerator GRAPH_HOST_NODE_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_GRAPH_HOST_NODE.

enumerator GRAPH_HOST_NODE_FIELD_STREAM_ID#

uint32_t streamId; ID of the stream that waits for the graph host node to finish.

enumerator GRAPH_HOST_NODE_FIELD_CONTEXT_ID#

uint32_t contextId; ID of the CUDA context to which the waiting CUDA stream belongs.

enumerator GRAPH_HOST_NODE_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the CUDA device to which the CUDA context and stream belong.

enumerator GRAPH_HOST_NODE_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID of the graph host node operation.

enumerator GRAPH_HOST_NODE_FIELD_GRAPH_ID#

uint32_t graphId; Unique ID of the graph that executed this host node through graph launch.

enumerator GRAPH_HOST_NODE_FIELD_GRAPH_NODE_ID#

uint64_t graphNodeId; Unique ID of the graph node that executed this host node through graph launch.

enumerator GRAPH_HOST_NODE_FIELD_PROCESS_ID#

uint32_t processId; ID of the process where the host node is executing.

enumerator GRAPH_HOST_NODE_FIELD_THREAD_ID#

uint64_t threadId; ID of the thread where the host node is executing.

enumerator GRAPH_HOST_NODE_FIELD_START#

uint64_t start; Start timestamp.

enumerator GRAPH_HOST_NODE_FIELD_END#

uint64_t end; End timestamp.

enumerator GRAPH_HOST_NODE_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityGraphTraceFieldIds#

Enum identifiers for fields to trace graph execution.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_GRAPH_TRACE.

Values:

enumerator GRAPH_TRACE_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_GRAPH_TRACE.

enumerator GRAPH_TRACE_FIELD_CORRELATION_ID#

uint64_t correlationId; The correlation ID of the graph launch.

enumerator GRAPH_TRACE_FIELD_START#

uint64_t start; The start timestamp for the graph execution, in ns.

enumerator GRAPH_TRACE_FIELD_END#

uint64_t end; The end timestamp for the graph execution, in ns.

enumerator GRAPH_TRACE_FIELD_DEVICE_ID#

uint32_t deviceId; The ID of the device where the first node of the graph is executed.

If this is INT_MAX, then the start is on the host.

enumerator GRAPH_TRACE_FIELD_GRAPH_ID#

uint32_t graphId; The unique ID of the graph that is launched.

enumerator GRAPH_TRACE_FIELD_CONTEXT_ID#

uint32_t contextId; The ID of the context where the first node of the graph is executed.

If this is INT_MAX, then the start is on the host.

enumerator GRAPH_TRACE_FIELD_STREAM_ID#

uint32_t streamId; The ID of the stream where the graph is being launched.

enumerator GRAPH_TRACE_FIELD_END_DEVICE_ID#

uint32_t endDeviceId; The ID of the device where the last node of the graph is executed.

enumerator GRAPH_TRACE_FIELD_END_CONTEXT_ID#

uint32_t endContextId; The ID of the context where the last node of the graph is executed.

enumerator GRAPH_TRACE_FIELD_MAX#

Total number of defined fields.
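The INT_MAX convention for deviceId and contextId can be checked when post-processing graph trace records. The C sketch below uses a hypothetical struct, graph_trace_t, that holds only the fields it needs; it is not the layout of CUpti_ActivityGraphTrace2, and in a real tool the values would be copied out of the CUPTI record.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the graph trace fields used below.
 * This is NOT the layout of CUpti_ActivityGraphTrace2. */
typedef struct {
    uint64_t start;     /* ns */
    uint64_t end;       /* ns */
    uint32_t deviceId;  /* INT_MAX when the graph starts on the host */
    uint32_t contextId; /* same INT_MAX convention as deviceId */
} graph_trace_t;

/* Value that marks a host-side start, per the field descriptions above. */
#define GRAPH_HOST_START ((uint32_t)INT_MAX)

/* True when the first node of the graph executed on the host. */
bool graph_starts_on_host(const graph_trace_t *r)
{
    assert(r != NULL);
    return r->deviceId == GRAPH_HOST_START;
}

/* Duration of the whole graph execution in ns, or 0 if end precedes start. */
uint64_t graph_duration_ns(const graph_trace_t *r)
{
    assert(r != NULL);
    return r->end >= r->start ? r->end - r->start : 0;
}
```

Checking deviceId alone is a design choice in this sketch; contextId follows the same convention and could be checked instead.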

enum CUpti_ActivityGreenContextFieldIds#

Enum identifiers for fields to trace green context information.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_GREEN_CONTEXT.

Values:

enumerator GREEN_CONTEXT_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_GREEN_CONTEXT.

enumerator GREEN_CONTEXT_FIELD_CONTEXT_ID#

uint32_t contextId; The context ID of the green context.

enumerator GREEN_CONTEXT_FIELD_PARENT_CONTEXT_ID#

uint32_t parentContextId; The ID of the parent context.

enumerator GREEN_CONTEXT_FIELD_DEVICE_ID#

uint32_t deviceId; The device ID associated with the green context.

enumerator GREEN_CONTEXT_FIELD_NUM_MULTIPROCESSORS#

uint16_t numMultiprocessors; The number of multiprocessors (SMs) allocated to the green context.

enumerator GREEN_CONTEXT_FIELD_NUM_TPCS#

uint32_t numTpcs; The number of TPCs allocated to the green context.

enumerator GREEN_CONTEXT_FIELD_LOGICAL_TPC_MASK_SIZE#

uint8_t logicalTpcMaskSize; The size (in 32-bit words) of the logical TPC mask.

enumerator GREEN_CONTEXT_FIELD_LOGICAL_TPC_MASK#

uint32_t logicalTpcMask[32]; The logical TPC mask for the green context.

enumerator GREEN_CONTEXT_FIELD_MAX#

Total number of defined fields.
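Because logicalTpcMaskSize counts 32-bit words, presumably only the first logicalTpcMaskSize entries of logicalTpcMask carry meaningful bits. The C sketch below counts and tests logical TPCs in such a mask. The struct green_ctx_mask_t is a hypothetical stand-in rather than CUpti_ActivityGreenContext, and the bit layout (bit tpc % 32 of word tpc / 32 represents logical TPC tpc) is an assumption that this reference does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the mask fields of a green context record;
 * NOT the layout of CUpti_ActivityGreenContext. */
typedef struct {
    uint8_t  logicalTpcMaskSize;  /* number of valid 32-bit words */
    uint32_t logicalTpcMask[32];
} green_ctx_mask_t;

/* Counts the set bits in the valid words of the logical TPC mask. */
unsigned count_logical_tpcs(const green_ctx_mask_t *g)
{
    assert(g->logicalTpcMaskSize <= 32);
    unsigned count = 0;
    for (unsigned w = 0; w < g->logicalTpcMaskSize; ++w) {
        uint32_t word = g->logicalTpcMask[w];
        while (word) {          /* clear the lowest set bit each pass */
            word &= word - 1;
            ++count;
        }
    }
    return count;
}

/* Returns 1 if logical TPC `tpc` is set, 0 otherwise.
 * Assumed layout: bit (tpc % 32) of word (tpc / 32). */
int logical_tpc_is_set(const green_ctx_mask_t *g, unsigned tpc)
{
    unsigned w = tpc / 32;
    if (w >= g->logicalTpcMaskSize)
        return 0;               /* beyond the valid part of the mask */
    return (int)((g->logicalTpcMask[w] >> (tpc % 32)) & 1u);
}
```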

enum CUpti_ActivityHostLaunchFieldIds#

Enum identifiers for fields in CUpti_ActivityHostLaunch.

Each enum value corresponds to a field in CUpti_ActivityHostLaunch and describes the data type and purpose of that field.

Values:

enumerator HOST_LAUNCH_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_HOST_LAUNCH.

enumerator HOST_LAUNCH_FIELD_STREAM_ID#

uint32_t streamId; ID of the CUDA stream on which the host function was launched.

enumerator HOST_LAUNCH_FIELD_CONTEXT_ID#

uint32_t contextId; ID of the CUDA context to which that CUDA stream belongs.

enumerator HOST_LAUNCH_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the CUDA device to which the CUDA context and stream belong.

enumerator HOST_LAUNCH_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID of the host launch operation.

enumerator HOST_LAUNCH_FIELD_PROCESS_ID#

uint32_t processId; ID of the process where the host function is executing.

enumerator HOST_LAUNCH_FIELD_THREAD_ID#

uint64_t threadId; ID of the thread where the host function is executing.

enumerator HOST_LAUNCH_FIELD_START#

uint64_t start; Start timestamp.

enumerator HOST_LAUNCH_FIELD_END#

uint64_t end; End timestamp.

enumerator HOST_LAUNCH_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityInstructionClass#

SASS instruction classification.

SASS instructions are broadly divided into classes. Each enumerator represents one class.

Values:

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNKNOWN#

The instruction class is not known.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_32#

Represents a 32-bit floating point operation.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_64#

Represents a 64-bit floating point operation.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTEGER#

Represents an integer operation.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_BIT_CONVERSION#

Represents a bit conversion operation.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONTROL_FLOW#

Represents a control flow instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL#

Represents a global load-store instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED#

Represents a shared load-store instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_LOCAL#

Represents a local load-store instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_GENERIC#

Represents a generic load-store instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE#

Represents a surface load-store instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_CONSTANT#

Represents a constant load instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_TEXTURE#

Represents a texture load-store instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_GLOBAL_ATOMIC#

Represents a global atomic instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_SHARED_ATOMIC#

Represents a shared atomic instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_SURFACE_ATOMIC#

Represents a surface atomic instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_INTER_THREAD_COMMUNICATION#

Represents an inter-thread communication instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_BARRIER#

Represents a barrier instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_MISCELLANEOUS#

Represents miscellaneous instructions that do not fit any of the above classes.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_FP_16#

Represents a 16-bit floating point operation.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_UNIFORM#

Represents a uniform instruction.

enumerator CUPTI_ACTIVITY_INSTRUCTION_CLASS_KIND_FORCE_INT#

enum CUpti_ActivityJitEntryType#

The types of JIT entry.

To be used in CUpti_ActivityJit.

Values:

enumerator CUPTI_ACTIVITY_JIT_ENTRY_INVALID#

enumerator CUPTI_ACTIVITY_JIT_ENTRY_PTX_TO_CUBIN#

PTX to CUBIN.

enumerator CUPTI_ACTIVITY_JIT_ENTRY_NVVM_IR_TO_PTX#

NVVM-IR to PTX.

enumerator CUPTI_ACTIVITY_JIT_ENTRY_TYPE_FORCE_INT#

enum CUpti_ActivityJitFieldIds#

Enum identifiers for fields to trace JIT operations.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_JIT.

Values:

enumerator JIT_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_JIT.

enumerator JIT_FIELD_ENTRY_TYPE#

CUpti_ActivityJitEntryType jitEntryType; The JIT entry type.

See CUpti_ActivityJitEntryType.

enumerator JIT_FIELD_OPERATION_TYPE#

CUpti_ActivityJitOperationType jitOperationType; The JIT operation type.

See CUpti_ActivityJitOperationType.

enumerator JIT_FIELD_DEVICE_ID#

uint32_t deviceId; The device ID.

enumerator JIT_FIELD_START#

uint64_t start; The start timestamp for the JIT operation (in nanoseconds).

enumerator JIT_FIELD_END#

uint64_t end; The end timestamp for the JIT operation (in nanoseconds).

enumerator JIT_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this JIT operation.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator JIT_FIELD_OPERATION_CORRELATION_ID#

uint64_t jitOperationCorrelationId; The correlation ID to correlate JIT compilation, load and store operations.

Each JIT compilation unit is assigned a unique correlation ID at the time of the JIT compilation. This correlation ID can be used to find the matching JIT cache load/store records.

enumerator JIT_FIELD_CACHE_SIZE#

uint64_t cacheSize; The size of the compute cache.

enumerator JIT_FIELD_CACHE_PATH#

const char* cachePath; The path where the fat binary is cached.

enumerator JIT_FIELD_PROCESS_ID#

uint32_t processId; The ID of the process where the JIT operation is executing.

enumerator JIT_FIELD_THREAD_ID#

uint64_t threadId; The ID of the thread where the JIT operation is executing.

enumerator JIT_FIELD_MAX#

Total number of defined fields.
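Because every record produced for one JIT compilation unit shares the same jitOperationCorrelationId, a tool can aggregate compile and cache time per unit. The C sketch below uses hypothetical local types, jit_op_t and jit_record_t, as stand-ins for CUpti_ActivityJitOperationType and the JIT activity record; their values and layout are not those of cupti_activity.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the JIT operation types; the numeric values
 * are NOT those of CUpti_ActivityJitOperationType. */
typedef enum { JIT_OP_CACHE_LOAD, JIT_OP_CACHE_STORE, JIT_OP_COMPILE } jit_op_t;

/* Hypothetical stand-in for the JIT record fields used here. */
typedef struct {
    jit_op_t op;
    uint64_t jitOperationCorrelationId;
    uint64_t start;  /* ns */
    uint64_t end;    /* ns */
} jit_record_t;

/* Sums the time (ns) of all records sharing `corrId`, per operation type.
 * out[] is indexed by jit_op_t. */
void jit_time_by_op(const jit_record_t *recs, size_t n, uint64_t corrId,
                    uint64_t out[3])
{
    assert(recs != NULL || n == 0);
    out[0] = out[1] = out[2] = 0;
    for (size_t i = 0; i < n; ++i) {
        const jit_record_t *r = &recs[i];
        if (r->jitOperationCorrelationId == corrId && r->end >= r->start)
            out[r->op] += r->end - r->start;
    }
}
```

For example, a unit that was compiled and then stored in the compute cache yields nonzero JIT_OP_COMPILE and JIT_OP_CACHE_STORE totals under the same correlation ID.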

enum CUpti_ActivityJitOperationType#

The types of JIT compilation operations.

To be used in CUpti_ActivityJit.

Values:

enumerator CUPTI_ACTIVITY_JIT_OPERATION_INVALID#

enumerator CUPTI_ACTIVITY_JIT_OPERATION_CACHE_LOAD#

Loaded from the compute cache.

enumerator CUPTI_ACTIVITY_JIT_OPERATION_CACHE_STORE#

Stored in the compute cache.

enumerator CUPTI_ACTIVITY_JIT_OPERATION_COMPILE#

JIT compilation.

enumerator CUPTI_ACTIVITY_JIT_OPERATION_TYPE_FORCE_INT#

enum CUpti_ActivityKernelFieldIds#

Enum identifiers for fields to trace kernel operations.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.

Values:

enumerator KERNEL_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always either CUPTI_ACTIVITY_KIND_KERNEL or CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.

enumerator KERNEL_FIELD_CACHE_CONFIG_REQUESTED#

uint8_t requested; Indicates the cache configuration requested by the kernel.

Values are from the CUfunc_cache enumeration.

enumerator KERNEL_FIELD_CACHE_CONFIG_EXECUTED#

uint8_t executed; Indicates the cache configuration actually used during kernel execution.

Values are from the CUfunc_cache enumeration.

enumerator KERNEL_FIELD_SHARED_MEMORY_CONFIG#

uint8_t sharedMemoryConfig; Specifies the shared memory configuration used for the kernel.

Values are from the CUsharedconfig enumeration.

enumerator KERNEL_FIELD_REGISTERS_PER_THREAD#

uint16_t registersPerThread; Number of registers required for each thread executing the kernel.

enumerator KERNEL_FIELD_PARTITIONED_GLOBAL_CACHE_REQUESTED#

CUpti_ActivityPartitionedGlobalCacheConfig partitionedGlobalCacheRequested; Partitioned global caching requested for the kernel.

Relevant for devices requiring this to enable caching.

enumerator KERNEL_FIELD_PARTITIONED_GLOBAL_CACHE_EXECUTED#

CUpti_ActivityPartitionedGlobalCacheConfig partitionedGlobalCacheExecuted; Partitioned global caching actually used during kernel execution.

enumerator KERNEL_FIELD_START#

uint64_t start; Start timestamp for the kernel execution (in nanoseconds).

A value of 0 indicates that the timestamp could not be collected.

enumerator KERNEL_FIELD_END#

uint64_t end; End timestamp for the kernel execution (in nanoseconds).

A value of 0 indicates that the timestamp could not be collected.

enumerator KERNEL_FIELD_COMPLETED#

uint64_t completed; Completion timestamp for the kernel execution, including all child kernels (in nanoseconds).

CUPTI_TIMESTAMP_UNKNOWN if unknown.

enumerator KERNEL_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the device where the kernel is executing.

enumerator KERNEL_FIELD_CONTEXT_ID#

uint64_t contextId; ID of the context where the kernel is executing.

enumerator KERNEL_FIELD_STREAM_ID#

uint64_t streamId; ID of the stream where the kernel is executing.

enumerator KERNEL_FIELD_GRID_X#

int32_t gridX; X-dimension of the grid size for the kernel.

enumerator KERNEL_FIELD_GRID_Y#

int32_t gridY; Y-dimension of the grid size for the kernel.

enumerator KERNEL_FIELD_GRID_Z#

int32_t gridZ; Z-dimension of the grid size for the kernel.

enumerator KERNEL_FIELD_BLOCK_X#

int32_t blockX; X-dimension of the block size for the kernel.

enumerator KERNEL_FIELD_BLOCK_Y#

int32_t blockY; Y-dimension of the block size for the kernel.

enumerator KERNEL_FIELD_BLOCK_Z#

int32_t blockZ; Z-dimension of the block size for the kernel.

enumerator KERNEL_FIELD_STATIC_SHARED_MEMORY#

int32_t staticSharedMemory; Amount of static shared memory allocated for the kernel, in bytes.

enumerator KERNEL_FIELD_DYNAMIC_SHARED_MEMORY#

int32_t dynamicSharedMemory; Amount of dynamic shared memory reserved for the kernel, in bytes.

enumerator KERNEL_FIELD_LOCAL_MEMORY_PER_THREAD#

uint32_t localMemoryPerThread; Amount of local memory reserved per thread, in bytes.

enumerator KERNEL_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID of the kernel. Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator KERNEL_FIELD_GRID_ID#

int64_t gridId; Unique grid ID assigned to the kernel at runtime.

enumerator KERNEL_FIELD_NAME#

const char* name; Name of the kernel.

Shared across all activity records for the same kernel.

enumerator KERNEL_FIELD_QUEUED#

uint64_t queued; Timestamp when the kernel was queued in the command buffer (in nanoseconds).

CUPTI_TIMESTAMP_UNKNOWN if not collected.

enumerator KERNEL_FIELD_SUBMITTED#

uint64_t submitted; Timestamp when the command buffer containing the kernel launch was submitted to the GPU (in nanoseconds).

CUPTI_TIMESTAMP_UNKNOWN if not collected.

enumerator KERNEL_FIELD_LAUNCH_TYPE#

uint8_t launchType; Indicates whether the kernel was executed via a regular launch or a cooperative launch.

See CUpti_ActivityLaunchType.

enumerator KERNEL_FIELD_IS_SHARED_MEMORY_CARVEOUT_REQUESTED#

uint8_t isSharedMemoryCarveoutRequested; Indicates if CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT was updated for the kernel launch.

enumerator KERNEL_FIELD_SHARED_MEMORY_CARVEOUT_REQUESTED#

uint8_t sharedMemoryCarveoutRequested; Shared memory carveout value requested for the function, as a percentage of the total resource.

enumerator KERNEL_FIELD_SHARED_MEMORY_EXECUTED#

uint32_t sharedMemoryExecuted; Shared memory size set by the driver, in bytes.

enumerator KERNEL_FIELD_GRAPH_NODE_ID#

uint64_t graphNodeId; Unique ID of the graph node that launched this kernel through graph launch APIs.

0 if not launched through graph APIs.

enumerator KERNEL_FIELD_SHMEM_LIMIT_CONFIG#

CUpti_FuncShmemLimitConfig shmemLimitConfig; Shared memory limit configuration for the kernel, indicating whether a higher per-block dynamic shared memory limit was opted into.

enumerator KERNEL_FIELD_GRAPH_ID#

uint32_t graphId; Unique ID of the graph that launched this kernel through graph launch APIs.

0 if not launched through graph APIs.

enumerator KERNEL_FIELD_ACCESS_POLICY_WINDOW#

CUaccessPolicyWindow* pAccessPolicyWindow; Pointer to the access policy window structure.

Defined in cuda.h.

enumerator KERNEL_FIELD_CHANNEL_ID#

uint32_t channelID; ID of the hardware channel on which the kernel is launched.

enumerator KERNEL_FIELD_CHANNEL_TYPE#

CUpti_ChannelType channelType; Type of the channel used for kernel launch.

enumerator KERNEL_FIELD_CLUSTER_X#

int32_t clusterX; X-dimension of the cluster size for the kernel.

Valid for devices with compute capability 9.0 and higher.

enumerator KERNEL_FIELD_CLUSTER_Y#

int32_t clusterY; Y-dimension of the cluster size for the kernel.

Valid for devices with compute capability 9.0 and higher.

enumerator KERNEL_FIELD_CLUSTER_Z#

int32_t clusterZ; Z-dimension of the cluster size for the kernel.

Valid for devices with compute capability 9.0 and higher.

enumerator KERNEL_FIELD_CLUSTER_SCHEDULING_POLICY#

uint32_t clusterSchedulingPolicy; Cluster scheduling policy for the kernel.

Refer to CUclusterSchedulingPolicy. Valid for devices with compute capability 9.0 and higher.

enumerator KERNEL_FIELD_LOCAL_MEMORY_TOTAL#

uint64_t localMemoryTotal_v2; Total amount of local memory reserved for the kernel, in bytes.

enumerator KERNEL_FIELD_MAX_POTENTIAL_CLUSTER_SIZE#

uint32_t maxPotentialClusterSize; Maximum cluster size for the kernel.

enumerator KERNEL_FIELD_MAX_ACTIVE_CLUSTERS#

uint32_t maxActiveClusters; Maximum number of clusters that could co-exist on the target device for the kernel.

enumerator KERNEL_FIELD_IS_DEVICE_LAUNCHED#

uint8_t isDeviceLaunched; This field is set to 1 if the kernel is part of a device launched graph.

enumerator KERNEL_FIELD_LAUNCH_PRIORITY#

int32_t priority; The launch priority of the kernel.

enumerator KERNEL_FIELD_MAX#

Total number of defined fields.
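The timestamp and launch-geometry fields above are often combined when analyzing kernel records. This C sketch uses a hypothetical kernel_record_t that mirrors a few fields (it is not CUpti_ActivityKernel11) and assumes CUPTI_TIMESTAMP_UNKNOWN equals 0, consistent with the documented meaning of a 0 start or end value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CUPTI_TIMESTAMP_UNKNOWN is 0; redefined locally so this
 * sketch compiles without the CUPTI headers. */
#define TIMESTAMP_UNKNOWN 0ULL

/* Hypothetical stand-in for the kernel record fields used here. */
typedef struct {
    uint64_t queued, submitted, start, end;  /* ns */
    int32_t gridX, gridY, gridZ;
    int32_t blockX, blockY, blockZ;
} kernel_record_t;

/* Kernel execution time in ns, or -1 if start/end were not collected. */
int64_t kernel_exec_ns(const kernel_record_t *k)
{
    if (k->start == TIMESTAMP_UNKNOWN || k->end == TIMESTAMP_UNKNOWN ||
        k->end < k->start)
        return -1;
    return (int64_t)(k->end - k->start);
}

/* Time from submission to the GPU until execution began, or -1 if unknown. */
int64_t kernel_launch_latency_ns(const kernel_record_t *k)
{
    if (k->submitted == TIMESTAMP_UNKNOWN || k->start == TIMESTAMP_UNKNOWN ||
        k->start < k->submitted)
        return -1;
    return (int64_t)(k->start - k->submitted);
}

/* Total threads launched: product of the grid and block dimensions. */
uint64_t kernel_total_threads(const kernel_record_t *k)
{
    assert(k->gridX > 0 && k->gridY > 0 && k->gridZ > 0);
    assert(k->blockX > 0 && k->blockY > 0 && k->blockZ > 0);
    uint64_t blocks  = (uint64_t)k->gridX * (uint64_t)k->gridY * (uint64_t)k->gridZ;
    uint64_t threads = (uint64_t)k->blockX * (uint64_t)k->blockY * (uint64_t)k->blockZ;
    return blocks * threads;
}
```

Returning -1 rather than 0 keeps a missing timestamp distinguishable from a genuinely zero-length interval.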

enum CUpti_ActivityKind#

The kinds of activity records.

Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with an activity record structure that holds the information associated with the kind.

See also

CUpti_Activity

Values:

enumerator CUPTI_ACTIVITY_KIND_INVALID#

The activity record is invalid.

enumerator CUPTI_ACTIVITY_KIND_MEMCPY#

A host<->host, host<->device, or device<->device memory copy.

For peer-to-peer memory copies, use the kind CUPTI_ACTIVITY_KIND_MEMCPY2. The corresponding activity record structure is CUpti_ActivityMemcpy6.

enumerator CUPTI_ACTIVITY_KIND_MEMSET#

A memory set executing on the GPU.

The corresponding activity record structure is CUpti_ActivityMemset4.

enumerator CUPTI_ACTIVITY_KIND_KERNEL#

A kernel executing on the GPU.

This activity kind may significantly change the overall performance characteristics of the application because all kernel executions are serialized on the GPU. The other kernel activity kind, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL, does not break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel11.

enumerator CUPTI_ACTIVITY_KIND_DRIVER#

A CUDA driver API function execution.

The corresponding activity record structure is CUpti_ActivityAPI.

enumerator CUPTI_ACTIVITY_KIND_RUNTIME#

A CUDA runtime API function execution.

The corresponding activity record structure is CUpti_ActivityAPI.

enumerator CUPTI_ACTIVITY_KIND_EVENT#

A performance counter (also known as event) value.

The corresponding activity record structure is CUpti_ActivityEvent. This activity cannot be directly enabled or disabled. Information collected using the Event API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_METRIC#

A performance metric value.

The corresponding activity record structure is CUpti_ActivityMetric. This activity cannot be directly enabled or disabled. Information collected using the Metric API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_DEVICE#

Information about a CUDA device.

The corresponding activity record structure is CUpti_ActivityDevice6.

enumerator CUPTI_ACTIVITY_KIND_CONTEXT#

Information about a CUDA context.

The corresponding activity record structure is CUpti_ActivityContext4.

enumerator CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL#

A kernel executing on the GPU.

This activity kind doesn’t break kernel concurrency. The corresponding activity record structure is CUpti_ActivityKernel11.

enumerator CUPTI_ACTIVITY_KIND_NAME#

Resource naming done via NVTX APIs for thread, device, context, etc.

The corresponding activity record structure is CUpti_ActivityName.

enumerator CUPTI_ACTIVITY_KIND_MARKER#

Instantaneous, start, or end NVTX marker.

The corresponding activity record structure is CUpti_ActivityMarker2.

enumerator CUPTI_ACTIVITY_KIND_MARKER_DATA#

Extended optional data about an NVTX marker.

User must enable CUPTI_ACTIVITY_KIND_MARKER as well to get records for marker data. The corresponding activity record structure is CUpti_ActivityMarkerData2.

enumerator CUPTI_ACTIVITY_KIND_SOURCE_LOCATOR#

Source information about source-level results.

The corresponding activity record structure is CUpti_ActivitySourceLocator. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the SASS Metric APIs from the cupti_sass_metrics.h header.

enumerator CUPTI_ACTIVITY_KIND_GLOBAL_ACCESS#

Results for source-level global access.

The corresponding activity record structure is CUpti_ActivityGlobalAccess3. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the SASS Metric APIs from the cupti_sass_metrics.h header.

enumerator CUPTI_ACTIVITY_KIND_BRANCH#

Results for source-level branch.

The corresponding activity record structure is CUpti_ActivityBranch2. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the SASS Metric APIs from the cupti_sass_metrics.h header.

enumerator CUPTI_ACTIVITY_KIND_OVERHEAD#

Overhead added by CUPTI, the compiler, the CUDA driver, etc.

The corresponding activity record structure is CUpti_ActivityOverhead3.

enumerator CUPTI_ACTIVITY_KIND_CDP_KERNEL#

A CDP (CUDA Dynamic Parallelism) kernel executing on the GPU.

The corresponding activity record structure is CUpti_ActivityCdpKernel. This activity cannot be directly enabled or disabled. It is enabled and disabled through the concurrent kernel activity, i.e. CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.

enumerator CUPTI_ACTIVITY_KIND_PREEMPTION#

Preemption activity record indicating a preemption of a CDP (CUDA Dynamic Parallelism) kernel executing on the GPU.

The corresponding activity record structure is CUpti_ActivityPreemption.

enumerator CUPTI_ACTIVITY_KIND_ENVIRONMENT#

Environment activity records indicating power, clock, thermal, etc. levels of the GPU.

The corresponding activity record structure is CUpti_ActivityEnvironment.

enumerator CUPTI_ACTIVITY_KIND_EVENT_INSTANCE#

A performance counter value associated with a specific event domain instance.

The corresponding activity record structure is CUpti_ActivityEventInstance. This activity cannot be directly enabled or disabled. Information collected using the Event API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_MEMCPY2#

A peer to peer memory copy.

The corresponding activity record structure is CUpti_ActivityMemcpyPtoP4.

enumerator CUPTI_ACTIVITY_KIND_METRIC_INSTANCE#

A performance metric value associated with a specific metric domain instance.

The corresponding activity record structure is CUpti_ActivityMetricInstance. This activity cannot be directly enabled or disabled. Information collected using the Metric API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_INSTRUCTION_EXECUTION#

Results for source-level instruction execution.

The corresponding activity record structure is CUpti_ActivityInstructionExecution. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the SASS Metric APIs from the cupti_sass_metrics.h header.

enumerator CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER#

Unified Memory counter record.

The corresponding activity record structure is CUpti_ActivityUnifiedMemoryCounter3.

enumerator CUPTI_ACTIVITY_KIND_FUNCTION#

Device global/function record.

The corresponding activity record structure is CUpti_ActivityFunction.

enumerator CUPTI_ACTIVITY_KIND_MODULE#

CUDA Module record.

The corresponding activity record structure is CUpti_ActivityModule. This activity cannot be directly enabled or disabled. Information collected using the module callback can be stored in the corresponding activity record.

enumerator CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE#

A device attribute value.

The corresponding activity record structure is CUpti_ActivityDeviceAttribute. This activity cannot be directly enabled or disabled. Information collected using attributes CUpti_DeviceAttribute or CUdevice_attribute can be stored in the corresponding activity record.

enumerator CUPTI_ACTIVITY_KIND_SHARED_ACCESS#

Results for source-level shared access.

The corresponding activity record structure is CUpti_ActivitySharedAccess. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the SASS Metric APIs from the cupti_sass_metrics.h header.

enumerator CUPTI_ACTIVITY_KIND_PC_SAMPLING#

PC sampling information for kernels.

This will serialize kernels. The corresponding activity record structure is CUpti_ActivityPCSampling3. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the PC Sampling API from the cupti_pcsampling.h header, which allows concurrent kernel execution.

enumerator CUPTI_ACTIVITY_KIND_PC_SAMPLING_RECORD_INFO#

Summary information about PC sampling records.

The corresponding activity record structure is CUpti_ActivityPCSamplingRecordInfo. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the PC Sampling API from the cupti_pcsampling.h header.

enumerator CUPTI_ACTIVITY_KIND_INSTRUCTION_CORRELATION#

SASS/Source line-by-line correlation record.

This generates SASS/source correlation for functions that have source-level analysis or PC sampling results. The records are generated only when either source-level analysis or PC sampling activity is enabled. The corresponding activity record structure is CUpti_ActivityInstructionCorrelation. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used. Enabling it will return the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED. Instead, use the SASS Metric APIs from the cupti_sass_metrics.h header.

enumerator CUPTI_ACTIVITY_KIND_OPENACC_DATA#

OpenACC data events.

The corresponding activity record structure is CUpti_ActivityOpenAccData.

enumerator CUPTI_ACTIVITY_KIND_OPENACC_LAUNCH#

OpenACC launch events.

The corresponding activity record structure is CUpti_ActivityOpenAccLaunch.

enumerator CUPTI_ACTIVITY_KIND_OPENACC_OTHER#

OpenACC other events.

The corresponding activity record structure is CUpti_ActivityOpenAccOther.

enumerator CUPTI_ACTIVITY_KIND_CUDA_EVENT#

Information about a CUDA event (cudaEvent).

The corresponding activity record structure is CUpti_ActivityCudaEvent2.

enumerator CUPTI_ACTIVITY_KIND_STREAM#

Information about a CUDA stream.

The corresponding activity record structure is CUpti_ActivityStream.

enumerator CUPTI_ACTIVITY_KIND_SYNCHRONIZATION#

Records for CUDA synchronization primitives.

The corresponding activity record structure is CUpti_ActivitySynchronization2.

enumerator CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION#

Records for correlation of different programming APIs.

The corresponding activity record structure is CUpti_ActivityExternalCorrelation.

enumerator CUPTI_ACTIVITY_KIND_NVLINK#

NVLink topology information.

The corresponding activity record structure is CUpti_ActivityNvLink5.

enumerator CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT#

Instantaneous Event information.

The corresponding activity record structure is CUpti_ActivityInstantaneousEvent. This activity cannot be directly enabled or disabled. Information collected using the Event API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_INSTANTANEOUS_EVENT_INSTANCE#

Instantaneous Event information for a specific event domain instance.

The corresponding activity record structure is CUpti_ActivityInstantaneousEventInstance. This activity cannot be directly enabled or disabled. Information collected using the Event API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC#

Instantaneous Metric information.

The corresponding activity record structure is CUpti_ActivityInstantaneousMetric. This activity cannot be directly enabled or disabled. Information collected using the Metric API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_INSTANTANEOUS_METRIC_INSTANCE#

Instantaneous Metric information for a specific metric domain instance.

The corresponding activity record structure is CUpti_ActivityInstantaneousMetricInstance. This activity cannot be directly enabled or disabled. Information collected using the Metric API can be stored in the corresponding activity record. Starting with the CUDA 13.0 release, this enum is unsupported and should no longer be used.

enumerator CUPTI_ACTIVITY_KIND_MEMORY#

Memory activity tracking allocation and freeing of the memory.

The corresponding activity record structure is CUpti_ActivityMemory.

enumerator CUPTI_ACTIVITY_KIND_PCIE#

PCI device information used for PCI topology.

The corresponding activity record structure is CUpti_ActivityPcie.

enumerator CUPTI_ACTIVITY_KIND_OPENMP#

OpenMP parallel events.

The corresponding activity record structure is CUpti_ActivityOpenMp.

enumerator CUPTI_ACTIVITY_KIND_INTERNAL_LAUNCH_API#

A CUDA driver kernel launch occurring outside of any public API function execution.

Tools can handle these like records for driver API launch functions, although the cbid field is not used here. The corresponding activity record structure is CUpti_ActivityAPI.

enumerator CUPTI_ACTIVITY_KIND_MEMORY2#

Memory activity tracking allocation and freeing of the memory.

The corresponding activity record structure is CUpti_ActivityMemory4.

enumerator CUPTI_ACTIVITY_KIND_MEMORY_POOL#

Memory pool activity tracking creation, destruction and trimming of the memory pool.

The corresponding activity record structure is CUpti_ActivityMemoryPool3.

enumerator CUPTI_ACTIVITY_KIND_GRAPH_TRACE#

Activity record for graph-level information.

The corresponding activity record structure is CUpti_ActivityGraphTrace2.

enumerator CUPTI_ACTIVITY_KIND_JIT#

JIT (Just-in-time) operation tracking.

The corresponding activity record structure is CUpti_ActivityJit2.

enumerator CUPTI_ACTIVITY_KIND_DEVICE_GRAPH_TRACE#

This activity cannot be directly enabled or disabled.

It is enabled when CUPTI_ACTIVITY_KIND_GRAPH_TRACE is enabled and device graph trace is enabled through the API cuptiActivityEnableDeviceGraph(). The corresponding activity record structure is CUpti_ActivityDeviceGraphTrace.

enumerator CUPTI_ACTIVITY_KIND_MEM_DECOMPRESS#

Tracing batches of copies that are to be decompressed.

The corresponding activity record structure is CUpti_ActivityMemDecompress.

enumerator CUPTI_ACTIVITY_KIND_CONFIDENTIAL_COMPUTE_ROTATION#

Tracing of new overheads introduced on some hardware when confidential computing is enabled.

The corresponding activity record structure is CUpti_ActivityConfidentialComputeRotation.

enumerator CUPTI_ACTIVITY_KIND_GRAPH_HOST_NODE#

Tracing of host execution nodes of the CUDA graph, i.e. nodes of type CU_GRAPH_NODE_TYPE_HOST.

The corresponding activity record structure is CUpti_ActivityGraphHostNode.

enumerator CUPTI_ACTIVITY_KIND_COMPUTE_ENGINE_CTX_SWITCH#

An activity denoting the switching of Compute Engine contexts in/out of the GPU.

The corresponding activity record structure is CUpti_ActivityComputeEngineCtxSwitch.

enumerator CUPTI_ACTIVITY_KIND_HOST_LAUNCH#

An activity kind denoting the launch of a host function through the cuLaunchHostFunc() or cudaLaunchHostFunc() API.

The corresponding activity record structure is CUpti_ActivityHostLaunch.

enumerator CUPTI_ACTIVITY_KIND_GREEN_CONTEXT#

An activity kind denoting the allocation of a green context.

The corresponding activity record structure is CUpti_ActivityGreenContext.

enumerator CUPTI_ACTIVITY_KIND_COUNT#

Count of supported activity kinds.

enumerator CUPTI_ACTIVITY_KIND_FORCE_INT#
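
Several kinds above depend on other kinds: CUPTI_ACTIVITY_KIND_MARKER_DATA records require CUPTI_ACTIVITY_KIND_MARKER, CUPTI_ACTIVITY_KIND_CDP_KERNEL follows CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL, CUPTI_ACTIVITY_KIND_DEVICE_GRAPH_TRACE needs CUPTI_ACTIVITY_KIND_GRAPH_TRACE, and CUPTI_ACTIVITY_KIND_KERNEL serializes kernels. A tool might therefore resolve a requested set of kinds before enabling them. The C sketch below is a minimal illustration using a hypothetical local enum (kind_t) and bitmask in place of the real CUpti_ActivityKind values; in a real tool, each kind in the resolved set would then be passed to cuptiActivityEnable().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local subset of activity kinds; the numeric values are NOT
 * those of the real CUpti_ActivityKind enum. */
typedef enum {
    K_KERNEL, K_CONCURRENT_KERNEL, K_CDP_KERNEL,
    K_MARKER, K_MARKER_DATA,
    K_GRAPH_TRACE, K_DEVICE_GRAPH_TRACE,
    K_COUNT
} kind_t;

#define BIT(k) (UINT32_C(1) << (k))

/* Rewrites a requested set of kinds into a set that can be enabled directly. */
uint32_t resolve_kinds(uint32_t requested)
{
    assert((requested & ~(BIT(K_COUNT) - 1u)) == 0);
    uint32_t out = requested;

    /* Policy choice: prefer CONCURRENT_KERNEL so kernels are not serialized. */
    if (out & BIT(K_KERNEL)) {
        out &= ~BIT(K_KERNEL);
        out |= BIT(K_CONCURRENT_KERNEL);
    }
    /* CDP_KERNEL cannot be enabled directly; it follows CONCURRENT_KERNEL. */
    if (out & BIT(K_CDP_KERNEL)) {
        out &= ~BIT(K_CDP_KERNEL);
        out |= BIT(K_CONCURRENT_KERNEL);
    }
    /* MARKER_DATA records are only produced when MARKER is enabled too. */
    if (out & BIT(K_MARKER_DATA))
        out |= BIT(K_MARKER);
    /* DEVICE_GRAPH_TRACE cannot be enabled directly; it needs GRAPH_TRACE
     * plus cuptiActivityEnableDeviceGraph(), which is not modeled here. */
    if (out & BIT(K_DEVICE_GRAPH_TRACE)) {
        out &= ~BIT(K_DEVICE_GRAPH_TRACE);
        out |= BIT(K_GRAPH_TRACE);
    }
    return out;
}
```

Replacing CUPTI_ACTIVITY_KIND_KERNEL is a policy decision in this sketch; a tool that intentionally wants serialized kernels would skip that step.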
enum CUpti_ActivityLaunchType#

The type of the CUDA kernel launch.

Values:

enumerator CUPTI_ACTIVITY_LAUNCH_TYPE_REGULAR#

The kernel was launched via a regular kernel call.

enumerator CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_SINGLE_DEVICE#

The kernel was launched via the cudaLaunchCooperativeKernel() or cuLaunchCooperativeKernel() API.

enumerator CUPTI_ACTIVITY_LAUNCH_TYPE_COOPERATIVE_MULTI_DEVICE#

The kernel was launched via the cudaLaunchCooperativeKernelMultiDevice() or cuLaunchCooperativeKernelMultiDevice() API.

enumerator CUPTI_ACTIVITY_LAUNCH_TYPE_CBL_COMMANDLIST#

The kernel was launched as a CBL commandlist.

enum CUpti_ActivityMarkerDataFieldIds#

Enum identifiers for fields to trace detailed information for a marker.

User must enable CUPTI_ACTIVITY_KIND_MARKER as well to get records for marker data. These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MARKER_DATA.

Values:

enumerator MARKER_DATA_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MARKER_DATA.

enumerator MARKER_DATA_FIELD_FLAGS#

CUpti_ActivityFlag flags; Flags associated with the marker.

enumerator MARKER_DATA_FIELD_ID#

uint32_t id; Marker ID.

enumerator MARKER_DATA_FIELD_PAYLOAD_KIND#

CUpti_MetricValueKind payloadKind; Defines the payload format for the value associated with the marker.

enumerator MARKER_DATA_FIELD_PAYLOAD#

CUpti_MetricValue payload; The payload value.

enumerator MARKER_DATA_FIELD_COLOR#

uint32_t color; The color for the marker.

enumerator MARKER_DATA_FIELD_CATEGORY#

uint32_t category; The category for the marker.

enumerator MARKER_DATA_FIELD_CUPTI_DOMAIN_ID#

uint32_t cuptiDomainId; CUPTI-maintained domain ID, required for NVTX extended payloads.

enumerator MARKER_DATA_FIELD_MAX#

Total number of defined fields.
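Because marker payload, color, and category arrive in separate CUPTI_ACTIVITY_KIND_MARKER_DATA records, a consumer typically pairs each marker with its data by marker ID. The following is a minimal sketch, assuming both record kinds have already been copied into simplified local structs; MarkerView, MarkerDataView, and find_marker_data are hypothetical names, not CUPTI types or functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down copies of the two record kinds. Field names
 * follow the MARKER_FIELD_* and MARKER_DATA_FIELD_* descriptions; the real
 * CUPTI record structures carry more fields. */
typedef struct {
    uint32_t id;        /* marker ID (MARKER_FIELD_ID) */
    const char *name;   /* marker name (MARKER_FIELD_NAME) */
} MarkerView;

typedef struct {
    uint32_t id;        /* marker ID (MARKER_DATA_FIELD_ID) */
    uint32_t color;     /* MARKER_DATA_FIELD_COLOR */
    uint32_t category;  /* MARKER_DATA_FIELD_CATEGORY */
} MarkerDataView;

/* Returns the marker-data entry whose ID matches the marker, or NULL when no
 * marker data was collected for it. A linear scan keeps the sketch short;
 * large traces would use a hash table keyed by ID. */
const MarkerDataView *find_marker_data(const MarkerView *marker,
                                       const MarkerDataView *data,
                                       size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (data[i].id == marker->id) {
            return &data[i];
        }
    }
    return NULL;
}
```

A NULL result is expected for markers that carry no extra data, so callers should treat it as "no color/category" rather than as an error.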

enum CUpti_ActivityMarkerFieldIds#

Enum identifiers for fields to trace NVTX markers/ranges.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MARKER.

Values:

enumerator MARKER_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MARKER.

enumerator MARKER_FIELD_FLAGS#

CUpti_ActivityFlag flags; Flags associated with the marker.

enumerator MARKER_FIELD_TIMESTAMP#

uint64_t timestamp; Timestamp for the marker (in ns).

enumerator MARKER_FIELD_ID#

uint32_t id; Marker ID.

enumerator MARKER_FIELD_PROCESS_ID#

uint32_t processId; The ID of the process in which the marker was recorded.

enumerator MARKER_FIELD_THREAD_ID#

uint64_t threadId; The ID of the thread in which the marker was recorded.

enumerator MARKER_FIELD_NAME#

const char *name; Marker name for an instantaneous or start marker.

enumerator MARKER_FIELD_DOMAIN#

const char *domain; Name of the domain to which this marker belongs.

enumerator MARKER_FIELD_MAX#

Total number of defined fields.
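Each *_MAX enumerator gives the number of defined fields, which is what a consumer needs to size per-field tables or bitmasks. The sketch below illustrates that idea only; it does not reproduce CUPTI's own selection structures (such as CUpti_ActivityFieldSelection), whose layout is not described in this section. DemoMarkerFieldId is a hypothetical local stand-in for CUpti_ActivityMarkerFieldIds, and its numeric values are local to the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CUpti_ActivityMarkerFieldIds. Real code should
 * use the enum from the CUPTI header; these values are local to the sketch. */
typedef enum {
    DEMO_MARKER_FIELD_KIND,
    DEMO_MARKER_FIELD_FLAGS,
    DEMO_MARKER_FIELD_TIMESTAMP,
    DEMO_MARKER_FIELD_ID,
    DEMO_MARKER_FIELD_PROCESS_ID,
    DEMO_MARKER_FIELD_THREAD_ID,
    DEMO_MARKER_FIELD_NAME,
    DEMO_MARKER_FIELD_DOMAIN,
    DEMO_MARKER_FIELD_MAX   /* number of fields, mirrors MARKER_FIELD_MAX */
} DemoMarkerFieldId;

/* One bit per field ID; the _MAX entry bounds how many bits are needed. */
typedef uint32_t DemoFieldMask;
_Static_assert(DEMO_MARKER_FIELD_MAX <= 32, "mask too narrow for field count");

DemoFieldMask demo_mask_add(DemoFieldMask mask, DemoMarkerFieldId id)
{
    return mask | ((DemoFieldMask)1u << id);
}

bool demo_mask_has(DemoFieldMask mask, DemoMarkerFieldId id)
{
    return ((mask >> id) & 1u) != 0;
}

/* Counts selected fields by walking every ID below the _MAX sentinel. */
int demo_mask_count(DemoFieldMask mask)
{
    int n = 0;
    for (int id = 0; id < DEMO_MARKER_FIELD_MAX; ++id) {
        if (demo_mask_has(mask, (DemoMarkerFieldId)id)) {
            ++n;
        }
    }
    return n;
}
```

The _Static_assert guards against a future enum growing past the width of the mask.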

enum CUpti_ActivityMemDecompressFieldIds#

Enum identifiers for fields to trace decompression operations.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MEM_DECOMPRESS.

Values:

enumerator MEM_DECOMPRESS_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MEM_DECOMPRESS.

enumerator MEM_DECOMPRESS_FIELD_DEVICE_ID#

uint32_t deviceId; The ID of the device where the decompression operation was recorded.

enumerator MEM_DECOMPRESS_FIELD_CONTEXT_ID#

uint32_t contextId; The ID of the context where the decompression operation was recorded.

enumerator MEM_DECOMPRESS_FIELD_STREAM_ID#

uint32_t streamId; The compute stream where the decompression operation was recorded.

enumerator MEM_DECOMPRESS_FIELD_CHANNEL_ID#

uint32_t channelId; The ID of the hardware channel on which the decompression operation was recorded.

enumerator MEM_DECOMPRESS_FIELD_CHANNEL_TYPE#

CUpti_ChannelType channelType; The type of the channel on which the decompression operation was recorded.

enumerator MEM_DECOMPRESS_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this decompression operation.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator MEM_DECOMPRESS_FIELD_NUMBER_OF_OPERATIONS#

uint32_t numberOfOperations; The number of decompression operations in this batch.

enumerator MEM_DECOMPRESS_FIELD_SOURCE_BYTES#

uint64_t sourceBytes; The total number of bytes to be read and decompressed in the batch operation.

enumerator MEM_DECOMPRESS_FIELD_START#

uint64_t start; Start timestamp of the decompression operation, in nanoseconds.

enumerator MEM_DECOMPRESS_FIELD_END#

uint64_t end; End timestamp of the decompression operation, in nanoseconds.

enumerator MEM_DECOMPRESS_FIELD_MAX#

Total number of defined fields.
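The sourceBytes, start, and end fields are enough to estimate the input throughput of a decompression batch. A minimal sketch follows; decompress_throughput_gbps is a hypothetical helper, and treating a zero start timestamp as "not collected" is an assumption borrowed from the convention documented for memcpy records.

```c
#include <stdint.h>

/* Returns input throughput in GB/s (1 GB = 1e9 bytes) for a decompression
 * batch, or a negative value when the interval is empty or the start
 * timestamp looks missing (assumed convention: 0 means not collected). */
double decompress_throughput_gbps(uint64_t sourceBytes,
                                  uint64_t start, uint64_t end)
{
    if (start == 0 || end <= start) {
        return -1.0;
    }
    /* Timestamps are in nanoseconds, so bytes per ns equals GB per s. */
    return (double)sourceBytes / (double)(end - start);
}
```

For example, 4,000,000 source bytes decompressed over 1,000,000 ns corresponds to 4 GB/s.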

enum CUpti_ActivityMemcpy2FieldIds#

Enum identifiers for fields to trace peer-to-peer memory copies.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MEMCPY2.

Values:

enumerator MEMCPY2_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MEMCPY2.

enumerator MEMCPY2_FIELD_COPY_KIND#

uint8_t copyKind; Type of memory copy.

Always CUPTI_ACTIVITY_MEMCPY_KIND_P2P.

enumerator MEMCPY2_FIELD_SRC_KIND#

uint8_t srcKind; Kind of source memory used in the copy.

enumerator MEMCPY2_FIELD_DST_KIND#

uint8_t dstKind; Kind of destination memory used in the copy.

enumerator MEMCPY2_FIELD_FLAGS#

uint8_t flags; Flags associated with the memory copy.

enumerator MEMCPY2_FIELD_BYTES#

uint64_t bytes; Number of bytes transferred during the memory copy.

enumerator MEMCPY2_FIELD_START#

uint64_t start; Start timestamp for the memory copy (in nanoseconds).

A value of 0 indicates that the timestamp could not be collected.

enumerator MEMCPY2_FIELD_END#

uint64_t end; End timestamp for the memory copy (in nanoseconds).

A value of 0 indicates that the timestamp could not be collected.

enumerator MEMCPY2_FIELD_DEVICE_ID#

uint32_t deviceId; Device where the memory copy is occurring.

enumerator MEMCPY2_FIELD_CONTEXT_ID#

uint32_t contextId; CUDA context where the memory copy is occurring.

enumerator MEMCPY2_FIELD_STREAM_ID#

uint32_t streamId; Stream in which the memory copy is occurring.

enumerator MEMCPY2_FIELD_SRC_DEVICE_ID#

uint32_t srcDeviceId; Device from which memory is being copied.

enumerator MEMCPY2_FIELD_SRC_CONTEXT_ID#

uint32_t srcContextId; CUDA context owning the source memory.

enumerator MEMCPY2_FIELD_DST_DEVICE_ID#

uint32_t dstDeviceId; Device to which memory is being copied.

enumerator MEMCPY2_FIELD_DST_CONTEXT_ID#

uint32_t dstContextId; CUDA context owning the destination memory.

enumerator MEMCPY2_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this memory copy.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator MEMCPY2_FIELD_GRAPH_NODE_ID#

uint64_t graphNodeId; ID of the graph node that launched this memcpy.

0 if not launched via a CUDA Graph.

enumerator MEMCPY2_FIELD_GRAPH_ID#

uint32_t graphId; ID of the CUDA Graph associated with this memory copy.

0 if not launched via a CUDA Graph.

enumerator MEMCPY2_FIELD_CHANNEL_ID#

uint32_t channelID; Hardware channel ID used for the memory copy.

enumerator MEMCPY2_FIELD_CHANNEL_TYPE#

CUpti_ChannelType channelType; Type of the hardware channel.

enumerator MEMCPY2_FIELD_MAX#

Total number of defined fields.
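The correlation ID is what links a peer-to-peer copy record back to the driver or runtime API call that issued it. The sketch below shows that lookup over simplified local structs; ApiCallView, PeerCopyView, api_for_copy, and the apiName label are hypothetical and are not CUPTI fields or functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal view of a driver/runtime API activity record. */
typedef struct {
    uint64_t correlationId;
    const char *apiName;   /* illustrative label, not a CUPTI field */
} ApiCallView;

/* Hypothetical minimal view of a MEMCPY2 (peer-to-peer) record. */
typedef struct {
    uint64_t correlationId;
    uint32_t srcDeviceId;
    uint32_t dstDeviceId;
    uint64_t bytes;
} PeerCopyView;

/* Returns the API call that issued the copy, or NULL if none matches
 * (for example, when API tracing was not enabled). */
const ApiCallView *api_for_copy(const PeerCopyView *copy,
                                const ApiCallView *calls, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (calls[i].correlationId == copy->correlationId) {
            return &calls[i];
        }
    }
    return NULL;
}
```

The same lookup applies to any record kind whose description says its correlation ID matches the launching API record.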

enum CUpti_ActivityMemcpyFieldIds#

Enum identifiers for fields to trace memory copies.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MEMCPY.

Values:

enumerator MEMCPY_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MEMCPY.

enumerator MEMCPY_FIELD_COPY_KIND#

uint8_t copyKind; Type of memory copy operation.

enumerator MEMCPY_FIELD_SRC_KIND#

uint8_t srcKind; Memory kind of the source location.

enumerator MEMCPY_FIELD_DST_KIND#

uint8_t dstKind; Memory kind of the destination location.

enumerator MEMCPY_FIELD_FLAGS#

uint8_t flags; Flags associated with the memory copy.

enumerator MEMCPY_FIELD_BYTES#

uint64_t bytes; Number of bytes copied during the memory operation.

enumerator MEMCPY_FIELD_START#

uint64_t start; Start timestamp of the memory copy, in nanoseconds.

A value of 0 indicates that the timestamp could not be collected.

enumerator MEMCPY_FIELD_END#

uint64_t end; End timestamp of the memory copy, in nanoseconds.

A value of 0 indicates that the timestamp could not be collected.

enumerator MEMCPY_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the device where the memory copy occurred.

enumerator MEMCPY_FIELD_CONTEXT_ID#

uint32_t contextId; ID of the CUDA context where the memory copy occurred.

enumerator MEMCPY_FIELD_STREAM_ID#

uint32_t streamId; ID of the stream where the memory copy occurred.

enumerator MEMCPY_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this memory copy.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator MEMCPY_FIELD_GRAPH_NODE_ID#

uint64_t graphNodeId; ID of the CUDA Graph node that launched this memory copy.

0 if not launched via a CUDA Graph.

enumerator MEMCPY_FIELD_GRAPH_ID#

uint32_t graphId; ID of the CUDA Graph that launched this memory copy.

0 if not launched via a CUDA Graph.

enumerator MEMCPY_FIELD_CHANNEL_ID#

uint32_t channelID; ID of the hardware channel on which this memory copy was executed.

enumerator MEMCPY_FIELD_CHANNEL_TYPE#

CUpti_ChannelType channelType; Type of the hardware channel used.

enumerator MEMCPY_FIELD_IS_DEVICE_LAUNCHED#

uint8_t isDeviceLaunched; This field is set to 1 if the memory copy is part of a device-launched graph.

enumerator MEMCPY_FIELD_COPY_COUNT#

uint64_t copyCount; Number of individual memory copy operations traced in this record.

Greater than 1 if the copies were batched using cudaMemcpyBatchAsync or cuMemcpyBatchAsync; otherwise 1.

enumerator MEMCPY_FIELD_MAX#

Total number of defined fields.
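Two details above affect any analysis of memcpy records: a timestamp of 0 means it was not collected, and a single record may cover several batched copies. A minimal sketch that respects both, assuming bytes is the total for the whole batch (an assumption, since the field description does not say so explicitly); MemcpyView and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal view of a MEMCPY record. */
typedef struct {
    uint64_t bytes;      /* assumed: total bytes for all copies in the record */
    uint64_t start;      /* ns; 0 means not collected */
    uint64_t end;        /* ns; 0 means not collected */
    uint64_t copyCount;  /* > 1 for batched copies */
} MemcpyView;

/* Writes the duration in ns to *out and returns true only when both
 * timestamps were collected and form a valid interval. */
bool memcpy_duration_ns(const MemcpyView *r, uint64_t *out)
{
    if (r->start == 0 || r->end == 0 || r->end < r->start) {
        return false;
    }
    *out = r->end - r->start;
    return true;
}

/* Average bytes per individual copy within the record. */
uint64_t memcpy_avg_bytes_per_copy(const MemcpyView *r)
{
    return r->copyCount > 0 ? r->bytes / r->copyCount : r->bytes;
}
```

Skipping records whose duration cannot be computed avoids the large bogus durations that arise from subtracting a zero start timestamp.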

enum CUpti_ActivityMemcpyKind#

The kind of a memory copy, indicating the source and destination targets of the copy.

The possible targets are host, device, and array memory.

Values:

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_UNKNOWN#

The memory copy kind is not known.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_HTOD#

A host to device memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_DTOH#

A device to host memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_HTOA#

A host to device array memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_ATOH#

A device array to host memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_ATOA#

A device array to device array memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_ATOD#

A device array to device memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_DTOA#

A device to device array memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_DTOD#

A device to device memory copy on the same device.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_HTOH#

A host to host memory copy.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_PTOP#

A peer to peer memory copy across different devices.

enumerator CUPTI_ACTIVITY_MEMCPY_KIND_FORCE_INT#
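A common reporting step is to separate copies that involve host memory from copies that stay on the GPU side. The sketch below does that with a hypothetical local enum, DemoMemcpyKind, whose names mirror the values listed above; its numeric values are local to the sketch, so production code should switch on the real CUpti_ActivityMemcpyKind.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CUpti_ActivityMemcpyKind; the numeric values are
 * local to this sketch, so switch on the real enum in production code. */
typedef enum {
    DEMO_MEMCPY_UNKNOWN,
    DEMO_MEMCPY_HTOD, DEMO_MEMCPY_DTOH, DEMO_MEMCPY_HTOA, DEMO_MEMCPY_ATOH,
    DEMO_MEMCPY_ATOA, DEMO_MEMCPY_ATOD, DEMO_MEMCPY_DTOA, DEMO_MEMCPY_DTOD,
    DEMO_MEMCPY_HTOH, DEMO_MEMCPY_PTOP
} DemoMemcpyKind;

/* True when host memory is the source or destination of the copy. */
bool memcpy_kind_touches_host(DemoMemcpyKind kind)
{
    switch (kind) {
    case DEMO_MEMCPY_HTOD:
    case DEMO_MEMCPY_DTOH:
    case DEMO_MEMCPY_HTOA:
    case DEMO_MEMCPY_ATOH:
    case DEMO_MEMCPY_HTOH:
        return true;
    default:
        /* device/array-only copies, peer copies, and unknown kinds */
        return false;
    }
}
```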

enum CUpti_ActivityMemoryFieldIds#

Enum identifiers for fields to trace memory allocation and free operations.

This activity record provides separate records for memory allocation and memory release operations. This allows the corresponding driver and runtime API activity records to be correlated with the memory operation.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MEMORY2.

Values:

enumerator MEMORY_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MEMORY2.

enumerator MEMORY_FIELD_OPERATION_TYPE#

CUpti_ActivityMemoryOperationType memoryOperationType; The memory operation requested by the user.

enumerator MEMORY_FIELD_MEMORY_KIND#

CUpti_ActivityMemoryKind memoryKind; The memory kind requested by the user.

enumerator MEMORY_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID of the memory operation.

enumerator MEMORY_FIELD_ADDRESS#

uint64_t address; Virtual address of the allocation.

enumerator MEMORY_FIELD_BYTES#

uint64_t bytes; Number of bytes of memory allocated.

enumerator MEMORY_FIELD_TIMESTAMP#

uint64_t timestamp; Start timestamp for the memory operation (in ns).

enumerator MEMORY_FIELD_PROCESS_ID#

uint32_t processId; ID of the process to which this record belongs.

enumerator MEMORY_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the device where the memory operation is taking place.

enumerator MEMORY_FIELD_CONTEXT_ID#

uint32_t contextId; ID of the context.

enumerator MEMORY_FIELD_STREAM_ID#

uint32_t streamId; ID of the stream.

enumerator MEMORY_FIELD_IS_ASYNC#

uint32_t isAsync; Whether the memory operation happens through async memory APIs.

enumerator MEMORY_FIELD_POOL_TYPE#

CUpti_ActivityMemoryPoolType memoryPoolConfig.memoryPoolType; Type of the memory pool.

enumerator MEMORY_FIELD_POOL_ADDRESS#

uint64_t memoryPoolConfig.address; Base address of the memory pool.

enumerator MEMORY_FIELD_POOL_RELEASE_THRESHOLD#

uint64_t memoryPoolConfig.releaseThreshold; Release threshold of the memory pool in bytes.

enumerator MEMORY_FIELD_POOL_SIZE#

uint64_t memoryPoolConfig.pool.size; Size of the memory pool in bytes.

enumerator MEMORY_FIELD_POOL_PROCESS_ID#

uint64_t memoryPoolConfig.pool.processId; Process ID of the memory pool.

enumerator MEMORY_FIELD_POOL_UTILIZED_SIZE#

uint64_t memoryPoolConfig.utilizedSize; Utilized size of the memory pool.

enumerator MEMORY_FIELD_SOURCE#

const char* source; Shared object or binary that the memory allocation request comes from.

enumerator MEMORY_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityMemoryKind#

The kinds of memory accessed by a memory operation/copy.

Each kind represents the type of the memory accessed by a memory operation/copy.

Values:

enumerator CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN#

The memory kind is unknown.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE#

The memory is pageable.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_PINNED#

The memory is pinned.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_DEVICE#

The memory is on the device.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_ARRAY#

The memory is an array.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_MANAGED#

The memory is managed.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_DEVICE_STATIC#

The memory is device static.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_MANAGED_STATIC#

The memory is managed static.

enumerator CUPTI_ACTIVITY_MEMORY_KIND_FORCE_INT#

enum CUpti_ActivityMemoryOperationType#

Memory operation types.

Describes the type of memory operation, to be used with CUpti_ActivityMemory4.

Values:

enumerator CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_INVALID#

The operation is invalid.

enumerator CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_ALLOCATION#

Memory is allocated.

enumerator CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_RELEASE#

Memory is released.

enumerator CUPTI_ACTIVITY_MEMORY_OPERATION_TYPE_FORCE_INT#
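Because allocations and releases are reported as separate records, current and peak memory usage can be reconstructed by pairing them. The sketch below pairs records by address, assuming a release record carries the same address as the allocation it frees; LiveTable, MemoryView, DemoMemoryOp, and live_table_apply are hypothetical names, and the fixed table size is only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CUpti_ActivityMemoryOperationType. */
typedef enum { DEMO_OP_INVALID, DEMO_OP_ALLOCATION, DEMO_OP_RELEASE } DemoMemoryOp;

/* Hypothetical minimal view of a MEMORY2 record. */
typedef struct {
    DemoMemoryOp op;
    uint64_t address;
    uint64_t bytes;
} MemoryView;

#define DEMO_MAX_LIVE 64

typedef struct {
    uint64_t address[DEMO_MAX_LIVE];
    uint64_t bytes[DEMO_MAX_LIVE];
    size_t count;
    uint64_t current;  /* bytes currently allocated */
    uint64_t peak;     /* highest value of current seen so far */
} LiveTable;

/* Returns 0 on success, -1 if the table is full, the operation is invalid,
 * or a release has no matching allocation. */
int live_table_apply(LiveTable *t, const MemoryView *r)
{
    if (r->op == DEMO_OP_ALLOCATION) {
        if (t->count == DEMO_MAX_LIVE) {
            return -1;
        }
        t->address[t->count] = r->address;
        t->bytes[t->count] = r->bytes;
        t->count++;
        t->current += r->bytes;
        if (t->current > t->peak) {
            t->peak = t->current;
        }
        return 0;
    }
    if (r->op != DEMO_OP_RELEASE) {
        return -1;
    }
    for (size_t i = 0; i < t->count; ++i) {
        if (t->address[i] == r->address) {
            t->current -= t->bytes[i];
            /* Swap-remove: move the last entry into the freed slot. */
            t->count--;
            t->address[i] = t->address[t->count];
            t->bytes[i] = t->bytes[t->count];
            return 0;
        }
    }
    return -1;
}
```

Using the allocation's recorded size on release avoids depending on the byte count reported in the release record itself.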

enum CUpti_ActivityMemoryPoolFieldIds#

Enum identifiers for fields to trace CUDA memory pool creation, destruction and trimming.

This activity record provides separate records for memory pool creation, destruction and trimming operations. This allows the corresponding driver and runtime API activity records to be correlated with the memory pool operation.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MEMORY_POOL.

Values:

enumerator MEMORY_POOL_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MEMORY_POOL.

enumerator MEMORY_POOL_FIELD_OPERATION_TYPE#

CUpti_ActivityMemoryPoolOperationType memoryPoolOperationType; The memory pool operation requested by the user.

enumerator MEMORY_POOL_FIELD_POOL_TYPE#

CUpti_ActivityMemoryPoolType memoryPoolType; The type of the memory pool.

enumerator MEMORY_POOL_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this memory pool operation.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator MEMORY_POOL_FIELD_PROCESS_ID#

uint32_t processId; ID of the process to which this record belongs.

enumerator MEMORY_POOL_FIELD_DEVICE_ID#

uint32_t deviceId; ID of the device where the memory pool is created.

enumerator MEMORY_POOL_FIELD_MIN_BYTES_TO_KEEP#

size_t minBytesToKeep; Minimum number of bytes to keep in the memory pool.

enumerator MEMORY_POOL_FIELD_ADDRESS#

uint64_t address; Base virtual address of the memory pool.

enumerator MEMORY_POOL_FIELD_SIZE#

uint64_t size; Size of the memory pool operation in bytes.

enumerator MEMORY_POOL_FIELD_RELEASE_THRESHOLD#

uint64_t releaseThreshold; Release threshold of the memory pool.

enumerator MEMORY_POOL_FIELD_TIMESTAMP#

uint64_t timestamp; Start timestamp for the memory pool operation (in ns).

enumerator MEMORY_POOL_FIELD_UTILIZED_SIZE#

uint64_t utilizedSize; Utilized size of the memory pool.

enumerator MEMORY_POOL_FIELD_IS_MANAGED_POOL#

uint8_t isManagedPool; Whether the pool is used for managed memory allocations or for pinned memory allocations.

enumerator MEMORY_POOL_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityMemoryPoolOperationType#

Memory pool operation types.

Describes the type of memory pool operation, to be used with CUpti_ActivityMemoryPool2.

Values:

enumerator CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_INVALID#

The operation is invalid.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_CREATED#

Memory pool is created.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_DESTROYED#

Memory pool is destroyed.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_TRIMMED#

Memory pool is trimmed.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_FORCE_INT#

enum CUpti_ActivityMemoryPoolType#

Memory pool types.

Describes the type of memory pool, to be used with CUpti_ActivityMemory4.

Values:

enumerator CUPTI_ACTIVITY_MEMORY_POOL_TYPE_INVALID#

The memory pool type is invalid.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL#

Memory pool is local to the process.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_TYPE_IMPORTED#

Memory pool is imported by the process.

enumerator CUPTI_ACTIVITY_MEMORY_POOL_TYPE_FORCE_INT#

enum CUpti_ActivityMemsetFieldIds#

Enum identifiers for fields to trace memset operations.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_MEMSET.

Values:

enumerator MEMSET_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_MEMSET.

enumerator MEMSET_FIELD_VALUE#

uint32_t value; Value being written to memory by the memset.

enumerator MEMSET_FIELD_BYTES#

uint64_t bytes; Number of bytes being set.

enumerator MEMSET_FIELD_START#

uint64_t start; Start timestamp for the memset operation (in nanoseconds).

A value of 0 indicates that the timestamp could not be collected.

enumerator MEMSET_FIELD_END#

uint64_t end; End timestamp for the memset operation (in nanoseconds).

A value of 0 indicates that the timestamp could not be collected.

enumerator MEMSET_FIELD_DEVICE_ID#

uint32_t deviceId; Device on which the memset is occurring.

enumerator MEMSET_FIELD_CONTEXT_ID#

uint32_t contextId; CUDA context where the memset is occurring.

enumerator MEMSET_FIELD_STREAM_ID#

uint32_t streamId; Stream where the memset is occurring.

enumerator MEMSET_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this memset operation.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator MEMSET_FIELD_FLAGS#

uint16_t flags; Flags associated with the memset.

enumerator MEMSET_FIELD_MEMORY_KIND#

uint16_t memoryKind; Kind of memory being set.

enumerator MEMSET_FIELD_GRAPH_NODE_ID#

uint64_t graphNodeId; CUDA Graph node ID that executed this memset.

enumerator MEMSET_FIELD_GRAPH_ID#

uint32_t graphId; CUDA Graph ID that executed this memset.

enumerator MEMSET_FIELD_CHANNEL_ID#

uint32_t channelID; Hardware channel ID used by the memset.

enumerator MEMSET_FIELD_CHANNEL_TYPE#

CUpti_ChannelType channelType; Type of hardware channel used by the memset.

enumerator MEMSET_FIELD_IS_DEVICE_LAUNCHED#

uint8_t isDeviceLaunched; This field is set to 1 if the memset is part of a device-launched graph.

enumerator MEMSET_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityNameFieldIds#

Enum identifiers for fields in CUpti_ActivityName.

Each enum value corresponds to a field in CUpti_ActivityName and describes the data type and purpose of that field.

Values:

enumerator NAME_FIELD_KIND#

CUpti_ActivityKind kind; Kind of activity record: CUPTI_ACTIVITY_KIND_NAME.

enumerator NAME_FIELD_OBJECT_KIND#

CUpti_ActivityObjectKind objectKind; Kind of activity object being named.

enumerator NAME_FIELD_OBJECT_ID#

CUpti_ActivityObjectKindId objectId; Identifier for the activity object being named.

enumerator NAME_FIELD_NAME#

const char *name; The name.

enumerator NAME_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityObjectKind#

The kinds of activity objects.

Values:

enumerator CUPTI_ACTIVITY_OBJECT_UNKNOWN#

The object kind is not known.

enumerator CUPTI_ACTIVITY_OBJECT_PROCESS#

A process.

enumerator CUPTI_ACTIVITY_OBJECT_THREAD#

A thread.

enumerator CUPTI_ACTIVITY_OBJECT_DEVICE#

A device.

enumerator CUPTI_ACTIVITY_OBJECT_CONTEXT#

A context.

enumerator CUPTI_ACTIVITY_OBJECT_STREAM#

A stream.

enumerator CUPTI_ACTIVITY_OBJECT_FORCE_INT#

enum CUpti_ActivityOverheadFieldIds#

Enum identifiers for fields to trace CUPTI and driver overhead information.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_OVERHEAD.

Values:

enumerator OVERHEAD_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_OVERHEAD.

enumerator OVERHEAD_FIELD_OVERHEAD_KIND#

CUpti_ActivityOverheadKind overheadKind; The kind of overhead, e.g. CUPTI, driver, or compiler overhead.

enumerator OVERHEAD_FIELD_PROCESS_ID#

uint32_t processId; The process ID of the process where the overhead is occurring.

enumerator OVERHEAD_FIELD_THREAD_ID#

uint64_t threadId; The thread ID of the process where the overhead is occurring.

enumerator OVERHEAD_FIELD_START#

uint64_t start; The start timestamp for the overhead, in nanoseconds.

enumerator OVERHEAD_FIELD_END#

uint64_t end; The end timestamp for the overhead, in nanoseconds.

enumerator OVERHEAD_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this overhead record.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator OVERHEAD_FIELD_OVERHEAD_DATA#

void *overheadData; Pointer to the struct with additional details about the overhead.

Refer to the CUpti_ActivityOverheadKind enum and its corresponding structure to cast and access the additional overhead data. The client is responsible for freeing this memory with free() when done.

enumerator OVERHEAD_FIELD_MAX#

Total number of defined fields.
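A consumer of overhead records usually does two things per record: accumulate the overhead time by kind and release overheadData, which the description above says the client must free(). The sketch below keys totals by the raw kind value rather than using it as an array index, because the enum values are not guaranteed to be small or contiguous. OverheadView, OverheadTotals, and consume_overhead are hypothetical names, and inspecting the kind-specific structure before freeing is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal view of an OVERHEAD record. */
typedef struct {
    uint32_t overheadKind;  /* raw CUpti_ActivityOverheadKind value */
    uint64_t start;         /* ns */
    uint64_t end;           /* ns */
    void *overheadData;     /* client-owned extra data, may be NULL */
} OverheadView;

#define DEMO_MAX_KINDS 16

typedef struct {
    uint32_t kind[DEMO_MAX_KINDS];
    uint64_t totalNs[DEMO_MAX_KINDS];
    size_t used;
} OverheadTotals;

/* Adds ns to the bucket for kind, creating the bucket if needed. */
static void add_total(OverheadTotals *t, uint32_t kind, uint64_t ns)
{
    for (size_t i = 0; i < t->used; ++i) {
        if (t->kind[i] == kind) {
            t->totalNs[i] += ns;
            return;
        }
    }
    if (t->used < DEMO_MAX_KINDS) {
        t->kind[t->used] = kind;
        t->totalNs[t->used] = ns;
        t->used++;
    }
}

/* Accumulates the record's duration, then frees its extra data. */
void consume_overhead(OverheadView *r, OverheadTotals *t)
{
    if (r->end >= r->start) {
        add_total(t, r->overheadKind, r->end - r->start);
    }
    free(r->overheadData);  /* free(NULL) is a no-op */
    r->overheadData = NULL;
}
```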

enum CUpti_ActivityOverheadKind#

The kinds of activity overhead.

Values:

enumerator CUPTI_ACTIVITY_OVERHEAD_UNKNOWN#

The overhead kind is not known.

enumerator CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER#

Compiler overhead.

enumerator CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH#

Activity buffer flush overhead.

enumerator CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION#

CUPTI instrumentation overhead.

enumerator CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE#

CUPTI resource creation and destruction overhead.

enumerator CUPTI_ACTIVITY_OVERHEAD_RUNTIME_TRIGGERED_MODULE_LOADING#

CUDA Runtime triggered module loading overhead.

enumerator CUPTI_ACTIVITY_OVERHEAD_LAZY_FUNCTION_LOADING#

Lazy function loading overhead.

enumerator CUPTI_ACTIVITY_OVERHEAD_COMMAND_BUFFER_FULL#

Overhead due to lack of command buffer space.

Refer to CUpti_ActivityOverheadCommandBufferFullData for more details.

enumerator CUPTI_ACTIVITY_OVERHEAD_ACTIVITY_BUFFER_REQUEST#

Overhead due to activity buffer request.

enumerator CUPTI_ACTIVITY_OVERHEAD_UVM_ACTIVITY_INIT#

Overhead due to UVM activity initialization.

enumerator CUPTI_ACTIVITY_OVERHEAD_FORCE_INT#

enum CUpti_ActivityPCSamplingPeriod#

Sampling period for PC sampling method.

The sampling period can be set using cuptiActivityConfigurePCSampling.

Values:

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_INVALID#

The PC sampling period is not set.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MIN#

Minimum sampling period available on the device.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_LOW#

Sampling period in lower range.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MID#

Medium sampling period.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_HIGH#

Sampling period in higher range.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_MAX#

Maximum sampling period available on the device.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_PERIOD_FORCE_INT#

enum CUpti_ActivityPCSamplingStallReason#

The stall reason for PC sampling activity.

Values:

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_INVALID#

Invalid reason.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_NONE#

No stall; the instruction is selected for issue.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_INST_FETCH#

Warp is blocked because the next instruction is not yet available, due to an instruction cache miss or to branching effects.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_EXEC_DEPENDENCY#

Instruction is waiting on an arithmetic dependency.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_DEPENDENCY#

Warp is blocked because it is waiting for a memory access to complete.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_TEXTURE#

Texture sub-system is fully utilized or has too many outstanding requests.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_SYNC#

Warp is blocked because it is waiting at __syncthreads() or at a memory barrier.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_CONSTANT_MEMORY_DEPENDENCY#

Warp is blocked waiting for constant memory and immediate memory access to complete.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_PIPE_BUSY#

Compute operation cannot be performed due to the required resources not being available.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_MEMORY_THROTTLE#

Warp is blocked because there are too many pending memory operations.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_NOT_SELECTED#

Warp was ready to issue, but some other warp issued instead.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_OTHER#

Miscellaneous reasons.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_SLEEPING#

Sleeping.

enumerator CUPTI_ACTIVITY_PC_SAMPLING_STALL_FORCE_INT#
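A typical first step with PC sampling data is to aggregate sample counts per stall reason and report the dominant one, excluding the "no stall" reason since it means the instruction was issued. The sketch below uses a hypothetical stand-in enum covering only a subset of the reasons listed above, with values local to the sketch; PcSampleView and dominant_stall are also hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset stand-in for CUpti_ActivityPCSamplingStallReason;
 * numeric values are local to this sketch. */
typedef enum {
    DEMO_STALL_NONE,
    DEMO_STALL_INST_FETCH,
    DEMO_STALL_EXEC_DEPENDENCY,
    DEMO_STALL_MEMORY_DEPENDENCY,
    DEMO_STALL_SYNC,
    DEMO_STALL_OTHER,
    DEMO_STALL_COUNT
} DemoStallReason;

/* Hypothetical minimal view of one PC sample record. */
typedef struct {
    DemoStallReason reason;
    uint32_t samples;
} PcSampleView;

/* Returns the stall reason with the most samples, ignoring NONE; returns
 * DEMO_STALL_NONE when no stalled samples were seen. */
DemoStallReason dominant_stall(const PcSampleView *s, size_t n)
{
    uint64_t hist[DEMO_STALL_COUNT] = {0};
    for (size_t i = 0; i < n; ++i) {
        if ((unsigned)s[i].reason < (unsigned)DEMO_STALL_COUNT) {
            hist[s[i].reason] += s[i].samples;
        }
    }
    DemoStallReason best = DEMO_STALL_NONE;
    uint64_t bestCount = 0;
    for (int r = 0; r < DEMO_STALL_COUNT; ++r) {
        if (r != DEMO_STALL_NONE && hist[r] > bestCount) {
            bestCount = hist[r];
            best = (DemoStallReason)r;
        }
    }
    return best;
}
```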

enum CUpti_ActivityPartitionedGlobalCacheConfig#

Partitioned global caching option.

Values:

enumerator CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_UNKNOWN#

Partitioned global cache config unknown.

enumerator CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_NOT_SUPPORTED#

Partitioned global cache not supported.

enumerator CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_OFF#

Partitioned global cache config off.

enumerator CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_ON#

Partitioned global cache config on.

enumerator CUPTI_ACTIVITY_PARTITIONED_GLOBAL_CACHE_CONFIG_FORCE_INT#

enum CUpti_ActivityPreemptionKind#

The kind of a preemption activity.

Values:

enumerator CUPTI_ACTIVITY_PREEMPTION_KIND_UNKNOWN#

The preemption kind is not known.

enumerator CUPTI_ACTIVITY_PREEMPTION_KIND_SAVE#

Preemption to save CDP block.

enumerator CUPTI_ACTIVITY_PREEMPTION_KIND_RESTORE#

Preemption to restore CDP block.

enumerator CUPTI_ACTIVITY_PREEMPTION_KIND_FORCE_INT#

enum CUpti_ActivityStreamFieldIds#

Enum identifiers for fields to trace CUDA Stream operations.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_STREAM.

Values:

enumerator STREAM_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_STREAM.

enumerator STREAM_FIELD_CONTEXT_ID#

uint32_t contextId; The ID of the context where the stream was created.

enumerator STREAM_FIELD_STREAM_ID#

uint32_t streamId; A unique stream ID to identify the stream.

enumerator STREAM_FIELD_PRIORITY#

int32_t priority; The clamped priority for the stream.

enumerator STREAM_FIELD_FLAG#

CUpti_ActivityStreamFlag flag; Flags associated with the stream.

enumerator STREAM_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this CUDA Stream operation.

enumerator STREAM_FIELD_MAX#

Total number of defined fields.

enum CUpti_ActivityStreamFlag#

Stream type.

The types of stream to be used with CUpti_ActivityStream.

Values:

enumerator CUPTI_ACTIVITY_STREAM_CREATE_FLAG_UNKNOWN#

Unknown data.

enumerator CUPTI_ACTIVITY_STREAM_CREATE_FLAG_DEFAULT#

Default stream.

enumerator CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NON_BLOCKING#

Non-blocking stream.

enumerator CUPTI_ACTIVITY_STREAM_CREATE_FLAG_NULL#

Null stream.

enumerator CUPTI_ACTIVITY_STREAM_CREATE_MASK#

Stream create mask.

enumerator CUPTI_ACTIVITY_STREAM_CREATE_FLAG_FORCE_INT#

enum CUpti_ActivitySynchronizationFieldIds#

Enum identifiers for fields to trace various CUDA synchronization APIs.

These field IDs are used to reference specific fields when dynamically selecting or parsing activity records of CUPTI_ACTIVITY_KIND_SYNCHRONIZATION.

Values:

enumerator SYNCHRONIZATION_FIELD_KIND#

CUpti_ActivityKind kind; Kind of this activity record.

Always CUPTI_ACTIVITY_KIND_SYNCHRONIZATION.

enumerator SYNCHRONIZATION_FIELD_TYPE#

CUpti_ActivitySynchronizationType type; The type of synchronization operation.

enumerator SYNCHRONIZATION_FIELD_START#

uint64_t start; The start timestamp for the synchronization operation (in nanoseconds).

enumerator SYNCHRONIZATION_FIELD_END#

uint64_t end; The end timestamp for the synchronization operation (in nanoseconds).

enumerator SYNCHRONIZATION_FIELD_CORRELATION_ID#

uint64_t correlationId; Correlation ID assigned to this synchronization operation.

Matches the correlation ID in the CUDA driver/runtime API activity record that launched it.

enumerator SYNCHRONIZATION_FIELD_CONTEXT_ID#

uint32_t contextId; CUDA context where the operation is occurring.

enumerator SYNCHRONIZATION_FIELD_STREAM_ID#

uint32_t streamId; The compute stream where the synchronization operation is occurring.

CUPTI_SYNCHRONIZATION_INVALID_VALUE indicates that the field is not applicable for this record.

enumerator SYNCHRONIZATION_FIELD_CUDA_EVENT_ID#

uint64_t cudaEventId; The CUDA event ID for which the synchronization operation is called.

CUPTI_SYNCHRONIZATION_INVALID_VALUE indicates that the field is not applicable for this record.

enumerator SYNCHRONIZATION_FIELD_CUDA_EVENT_SYNC_ID#

uint64_t cudaEventSyncId; A unique ID to associate event synchronization records with the latest CUDA Event record.

A similar field is present in CUpti_ActivityCudaEventFieldIds to associate the synchronization record with the CUDA Event record.

Because the same CUDA event can be used multiple times, the event ID alone cannot correlate a synchronization record with the latest CUDA Event record. This field is unique and can be used for that correlation.

A CUPTI_SYNCHRONIZATION_INVALID_VALUE value indicates that the field is not applicable for this record. Valid only for synchronization records related to CUDA Events.

enumerator SYNCHRONIZATION_FIELD_RETURN_VALUE#

uint32_t returnValue; The return value for the synchronization operation.

This is a CUresult value. Use the cuptiActivityEnableAllSyncRecords API to enable or disable collection of synchronization records whose return value is non-zero.

enumerator SYNCHRONIZATION_FIELD_MAX#

Total number of defined fields.
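The cudaEventSyncId description above explains why the event ID alone is ambiguous: one event may be recorded many times. The sketch below matches a synchronization record to its CUDA event record by cudaEventSyncId and skips records where the field is not applicable. EventRecordView, SyncRecordView, and event_record_for_sync are hypothetical, and DEMO_SYNC_INVALID_VALUE is a local stand-in whose value (UINT64_MAX) is an assumption; use CUPTI_SYNCHRONIZATION_INVALID_VALUE from the CUPTI header in real code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for CUPTI_SYNCHRONIZATION_INVALID_VALUE. Take the
 * real macro from the CUPTI header instead of this local value. */
#define DEMO_SYNC_INVALID_VALUE UINT64_MAX

/* Hypothetical minimal view of a CUDA event record. */
typedef struct {
    uint64_t cudaEventId;      /* same event may be recorded many times */
    uint64_t cudaEventSyncId;  /* unique per recording */
} EventRecordView;

/* Hypothetical minimal view of a synchronization record. */
typedef struct {
    uint64_t cudaEventId;
    uint64_t cudaEventSyncId;
} SyncRecordView;

/* Returns the event record this synchronization waited on, or NULL when the
 * record is not event-related or no matching event record exists. */
const EventRecordView *event_record_for_sync(const SyncRecordView *sync,
                                             const EventRecordView *events,
                                             size_t count)
{
    if (sync->cudaEventSyncId == DEMO_SYNC_INVALID_VALUE) {
        return NULL;
    }
    for (size_t i = 0; i < count; ++i) {
        if (events[i].cudaEventSyncId == sync->cudaEventSyncId) {
            return &events[i];
        }
    }
    return NULL;
}
```

Matching on cudaEventId instead would return the first recording of the event, which is exactly the mismatch the unique sync ID avoids.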

enum CUpti_ActivitySynchronizationType#

Synchronization type.

The types of synchronization to be used with CUpti_ActivitySynchronization2.

Values:

enumerator CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_UNKNOWN#

Unknown data.

enumerator CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_EVENT_SYNCHRONIZE#

Event synchronize API.

enumerator CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_WAIT_EVENT#

Stream wait event API.

enumerator CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_STREAM_SYNCHRONIZE#

Stream synchronize API.

enumerator CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_CONTEXT_SYNCHRONIZE#

Context synchronize API.

enumerator CUPTI_ACTIVITY_SYNCHRONIZATION_TYPE_FORCE_INT#

enum CUpti_ActivityThreadIdType#

Thread-Id types.

CUPTI uses different methods to obtain the thread-id depending on platform support. This enum documents the method used for each type. The cuptiSetThreadIdType and cuptiGetThreadIdType APIs can be used to set and get the thread-id type.

Values:

enumerator CUPTI_ACTIVITY_THREAD_ID_TYPE_DEFAULT#

Default type. Windows uses the API GetCurrentThreadId(); Linux, Mac, Android, and QNX use the POSIX pthread API pthread_self().

enumerator CUPTI_ACTIVITY_THREAD_ID_TYPE_SYSTEM#

This type is based on the system API available on the underlying platform, and the thread-id obtained is expected to be unique for the lifetime of the process.

Windows uses the API GetCurrentThreadId(); Linux uses the syscall SYS_gettid; Mac uses the syscall SYS_thread_selfid; Android and QNX use gettid().

enumerator CUPTI_ACTIVITY_THREAD_ID_TYPE_SIZE#

Add new enums before this field.

enumerator CUPTI_ACTIVITY_THREAD_ID_TYPE_FORCE_INT#
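On Linux, the two thread-id types above yield different values: pthread_self() returns an opaque pthread_t handle, while SYS_gettid returns the kernel thread ID shown by tools such as `ps -L`. The following Linux-only C sketch obtains both values; the cast of pthread_t to an integer is an assumption that holds on glibc but is not guaranteed by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value behind CUPTI_ACTIVITY_THREAD_ID_TYPE_DEFAULT on Linux.
 * pthread_t is opaque; casting it to an integer works on glibc
 * (where it is an unsigned long) but is not portable. */
uint64_t thread_id_default(void)
{
    return (uint64_t)pthread_self();
}

/* Value behind CUPTI_ACTIVITY_THREAD_ID_TYPE_SYSTEM on Linux:
 * the kernel thread ID returned by the SYS_gettid syscall. */
uint64_t thread_id_system(void)
{
    long tid = syscall(SYS_gettid);
    assert(tid > 0);
    return (uint64_t)tid;
}
```

For the main thread, the kernel thread ID equals the process ID, which makes SYSTEM-type IDs easy to line up with OS-level tools when correlating CUPTI records.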
enum CUpti_ActivityUnifiedMemoryAccessType#

Memory access type for unified memory page faults.

This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT.

Values:

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_UNKNOWN#

The unified memory access type is not known.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_READ#

The page fault was triggered by a memory read instruction.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_WRITE#

The page fault was triggered by a memory write instruction.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_ATOMIC#

The page fault was triggered by an atomic memory instruction.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_ACCESS_TYPE_PREFETCH#

The page fault was triggered by a memory prefetch operation.

enum CUpti_ActivityUnifiedMemoryCounterKind#

Kind of the Unified Memory counter.

Many activities are associated with the Unified Memory mechanism, among them host-to-device transfers, device-to-host transfers, and page faults on the host side.

Values:

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_UNKNOWN#

The unified memory counter kind is not known.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD#

Number of bytes transferred from host to device.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH#

Number of bytes transferred from device to host.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT#

Number of CPU page faults. This is only supported on 64-bit Linux and Mac platforms.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT#

Number of GPU page faults. This is only supported on devices with compute capability 6.0 and higher, on 64-bit Linux platforms.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING#

Thrashing occurs when data is frequently accessed by multiple processors and has to be constantly migrated to achieve data locality.

In this case, the overhead of migration may exceed the benefits of locality. This is only supported on 64-bit Linux platforms.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING#

Throttling is a prevention technique used by the driver to avoid further thrashing.

Here, the driver doesn’t service the fault for one of the contending processors for a specific period of time, so that the other processor can run at full speed. This is only supported on 64-bit Linux platforms.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP#

If throttling does not help, the driver tries to pin the memory to a processor for a specific period of time.

One of the contending processors will have slow access to the memory, while the other will have fast access. This is only supported on 64-bit Linux platforms.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOD#

Number of bytes transferred from one device to another device.

This is only supported on 64-bit Linux platforms.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_COUNT#
enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_FORCE_INT#
enum CUpti_ActivityUnifiedMemoryCounterScope#

Scope of the unified memory counter (deprecated in CUDA 7.0)

Values:

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_UNKNOWN#

The unified memory counter scope is not known.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_SINGLE_DEVICE#

Collect the unified memory counter for a single process on one device.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_PROCESS_ALL_DEVICES#

Collect the unified memory counter for a single process across all devices.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_COUNT#
enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_SCOPE_FORCE_INT#
enum CUpti_ActivityUnifiedMemoryMigrationCause#

Migration cause of the Unified Memory counter.

This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH.

Values:

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_UNKNOWN#

The unified memory migration cause is not known.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_USER#

The unified memory migrated due to an explicit call from the user, e.g. cudaMemPrefetchAsync.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_COHERENCE#

The unified memory migrated to guarantee data coherence, e.g. CPU/GPU faults on Pascal+ GPUs and kernel launches on pre-Pascal GPUs.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_PREFETCH#

The unified memory was speculatively migrated by the UVM driver before being accessed by the destination processor to improve performance.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_EVICTION#

The unified memory migrated to the CPU because it was evicted to make room for another block of memory on the GPU.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_MIGRATION_CAUSE_ACCESS_COUNTERS#

The unified memory migrated to another processor because of access counter notifications.

Only frequently accessed pages are migrated between CPU and GPU, or between peer GPUs.

enum CUpti_ActivityUnifiedMemoryRemoteMapCause#

Remote memory map cause of the Unified Memory counter.

This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP.

Values:

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_UNKNOWN#

The cause of mapping to remote memory was unknown.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_COHERENCE#

Mapping to remote memory was added to maintain data coherence.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_THRASHING#

Mapping to remote memory was added to prevent further thrashing.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_POLICY#

Mapping to remote memory was added to enforce the hints specified by the programmer or by performance heuristics of the UVM driver.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_OUT_OF_MEMORY#

Mapping to remote memory was added because there is no more memory available on the processor and eviction was not possible.

enumerator CUPTI_ACTIVITY_UNIFIED_MEMORY_REMOTE_MAP_CAUSE_EVICTION#

Mapping to remote memory was added after the memory was evicted to make room for another block of memory on the GPU.

enum CUpti_ActivityUvmCounterFieldIds#

Enum identifiers for fields in CUpti_ActivityUnifiedMemoryCounter3.

Each enum value corresponds to a field in CUpti_ActivityUnifiedMemoryCounter3 and describes the data type and purpose of that field.

Values:

enumerator UVM_COUNTER_FIELD_KIND#

CUpti_ActivityKind kind; Kind of activity record: CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER.

enumerator UVM_COUNTER_FIELD_COUNTER_KIND#

CUpti_ActivityUnifiedMemoryCounterKind counterKind; The Unified Memory counter kind.

enumerator UVM_COUNTER_FIELD_VALUE#

uint64_t value; Value of the counter.

enumerator UVM_COUNTER_FIELD_START#

uint64_t start; Start timestamp of the counter (in ns).

enumerator UVM_COUNTER_FIELD_END#

uint64_t end; End timestamp of the counter (in ns).

enumerator UVM_COUNTER_FIELD_ADDRESS#

uint64_t address; Virtual base address of the page(s) being transferred.

enumerator UVM_COUNTER_FIELD_SRC_ID#

uint32_t srcId; ID of the source CPU/device.

enumerator UVM_COUNTER_FIELD_DST_ID#

uint32_t dstId; ID of the destination CPU/device.

enumerator UVM_COUNTER_FIELD_STREAM_ID#

uint32_t streamId; ID of the stream causing the transfer.

enumerator UVM_COUNTER_FIELD_PROCESS_ID#

uint32_t processId; ID of the process to which this record belongs.

enumerator UVM_COUNTER_FIELD_FLAGS#

uint32_t flags; Flags associated with this record.

enumerator UVM_COUNTER_FIELD_PROCESSORS#

uint64_t processors[5]; Bitmask of devices involved in the operation.

enumerator UVM_COUNTER_FIELD_MAX#

Total number of defined fields.
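Because the BYTES_TRANSFER_* counter kinds report a byte count in `value` and each record carries `start` and `end` timestamps in ns, an effective transfer bandwidth can be derived from the records. The C sketch below uses a hypothetical stand-in struct and a placeholder kind enum rather than the real CUpti_ActivityUnifiedMemoryCounter3 and CUpti_ActivityUnifiedMemoryCounterKind definitions. It sums per-record durations, so overlapping transfers would lower the reported figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder kinds; these are NOT the header's enumerator values. */
typedef enum { UM_KIND_HTOD, UM_KIND_DTOH, UM_KIND_OTHER } UmKind;

/* Hypothetical subset of the unified memory counter record fields. */
typedef struct {
    UmKind   counterKind;
    uint64_t value;  /* bytes, for the BYTES_TRANSFER_* kinds */
    uint64_t start;  /* ns */
    uint64_t end;    /* ns */
} UmRecord;

/* Bytes per ns (numerically equal to GB/s) over all records of `kind`.
 * Returns 0.0 when no time was spent in matching transfers. */
double um_bandwidth_gbps(const UmRecord *recs, size_t n, UmKind kind)
{
    uint64_t bytes = 0, ns = 0;
    for (size_t i = 0; i < n; ++i) {
        if (recs[i].counterKind != kind)
            continue;
        assert(recs[i].end >= recs[i].start);
        bytes += recs[i].value;
        ns += recs[i].end - recs[i].start;
    }
    return ns ? (double)bytes / (double)ns : 0.0;
}
```

For example, two 4096-byte host-to-device transfers lasting 1000 ns each give 8192 / 2000, or about 4.1 GB/s.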

enum CUpti_ChannelType#

Values:

enumerator CUPTI_CHANNEL_TYPE_INVALID#
enumerator CUPTI_CHANNEL_TYPE_COMPUTE#

Channel is used for standard work launch and tracking.

enumerator CUPTI_CHANNEL_TYPE_ASYNC_MEMCPY#

Channel is used by an asynchronous copy engine. For confidential compute configurations, work launch and completion are done using the copy engines.

enumerator CUPTI_CHANNEL_TYPE_DECOMP#

Channel is used for memory decompression operations.

enumerator CUPTI_CHANNEL_TYPE_FORCE_INT#
enum CUpti_ComputeEngineCtxSwitchOperationType#

The operation type of CUDA context switch event records.

Values:

enumerator CUPTI_COMPUTE_ENGINE_CTX_SWITCH_OPERATION_INVALID#
enumerator CUPTI_COMPUTE_ENGINE_CTX_SWITCH_OPERATION_START#

The start of the CUDA context switch operation.

enumerator CUPTI_COMPUTE_ENGINE_CTX_SWITCH_OPERATION_END#

The end of the CUDA context switch operation.

enumerator CUPTI_COMPUTE_ENGINE_CTX_SWITCH_OPERATION_COUNT#
enum CUpti_ConfidentialComputeRotationEventType#

Values:

enumerator CUPTI_CONFIDENTIAL_COMPUTE_INVALID_ROTATION_EVENT#
enumerator CUPTI_CONFIDENTIAL_COMPUTE_KEY_ROTATION_CHANNEL_BLOCKED#

This channel has been blocked from accepting new CUDA work so a key rotation can be done.

enumerator CUPTI_CONFIDENTIAL_COMPUTE_KEY_ROTATION_CHANNEL_DRAINED#

This channel remains blocked and all queued CUDA work has completed.

Other clients or channels may cause delays in starting the key rotation.

enumerator CUPTI_CONFIDENTIAL_COMPUTE_KEY_ROTATION_CHANNEL_UNBLOCKED#

Key rotations have completed and this channel is unblocked.

enumerator CUPTI_CONFIDENTIAL_COMPUTE_EVENT_TYPE_FORCE_INT#
enum CUpti_ContextCigMode#

CIG (CUDA in Graphics) Modes.

Describes the CIG modes associated with the CUDA context.

Values:

enumerator CUPTI_CONTEXT_CIG_MODE_NONE#

Regular (non-CIG) mode.

enumerator CUPTI_CONTEXT_CIG_MODE_CIG#

CIG mode.

enumerator CUPTI_CONTEXT_CIG_MODE_CIG_FALLBACK#

CIG fallback mode.

enumerator CUPTI_CONTEXT_CIG_MODE_FORCE_INT#
enum CUpti_DevType#

The type of a device connected to NVLink.

Values:

enumerator CUPTI_DEV_TYPE_INVALID#
enumerator CUPTI_DEV_TYPE_GPU#

The device type is GPU.

enumerator CUPTI_DEV_TYPE_NPU#

The device type is an NVLink processing unit in the CPU.

enumerator CUPTI_DEV_TYPE_FORCE_INT#
enum CUpti_DeviceGraphLaunchMode#

The launch mode for device graph execution.

Values:

enumerator CUPTI_DEVICE_GRAPH_LAUNCH_MODE_INVALID#
enumerator CUPTI_DEVICE_GRAPH_LAUNCH_MODE_FIRE_AND_FORGET#
enumerator CUPTI_DEVICE_GRAPH_LAUNCH_MODE_TAIL#
enumerator CUPTI_DEVICE_GRAPH_LAUNCH_MODE_FIRE_AND_FORGET_AS_SIBLING#
enum CUpti_DeviceVirtualizationMode#

This indicates the virtualization mode in which the CUDA device is running.

Values:

enumerator CUPTI_DEVICE_VIRTUALIZATION_MODE_NONE#

No virtualization mode is associated with the device, i.e. it is a bare-metal GPU.

enumerator CUPTI_DEVICE_VIRTUALIZATION_MODE_PASS_THROUGH#

The device is associated with a pass-through GPU.

In this mode, an entire physical GPU is directly assigned to one virtual machine (VM).

enumerator CUPTI_DEVICE_VIRTUALIZATION_MODE_VIRTUAL_GPU#

The device is associated with a virtual GPU (vGPU).

In this mode, multiple virtual machines (VMs) have simultaneous, direct access to a single physical GPU.

enumerator CUPTI_DEVICE_VIRTUALIZATION_MODE_FORCE_INT#
enum CUpti_EnvironmentClocksThrottleReason#

Reasons for clock throttling.

The possible reasons that a clock can be throttled. A clock can be throttled for more than one reason at a time, so these types can be combined by bitwise OR. They are used in the clocksThrottleReason field of the Environment Activity Record.

Values:

enumerator CUPTI_CLOCKS_THROTTLE_REASON_GPU_IDLE#

Nothing is running on the GPU and the clocks are dropping to idle state.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_USER_DEFINED_CLOCKS#

The GPU clocks are limited by a user-specified limit.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_SW_POWER_CAP#

A software power scaling algorithm is reducing the clocks below requested clocks.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_HW_SLOWDOWN#

Hardware slowdown to reduce the clock by a factor of two or more is engaged.

This is an indicator of one of the following: 1) Temperature is too high, 2) External power brake assertion is being triggered (e.g. by the system power supply), 3) Change in power state.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_UNKNOWN#

Some unspecified factor is reducing the clocks.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_UNSUPPORTED#

Throttle reason is not supported for this GPU.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_NONE#

No clock throttling.

enumerator CUPTI_CLOCKS_THROTTLE_REASON_FORCE_INT#
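Since the throttle reasons are combined by bitwise OR, a consumer of the clocksThrottleReason field has to test each bit separately. The following C sketch decodes such a mask into readable names. The numeric values are assumed to mirror CUpti_EnvironmentClocksThrottleReason in cupti_activity.h; when the CUPTI headers are available, use the CUPTI_CLOCKS_THROTTLE_REASON_* enumerators instead of these local macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror CUpti_EnvironmentClocksThrottleReason; prefer the
 * CUPTI_CLOCKS_THROTTLE_REASON_* enumerators from the real header. */
#define THROTTLE_GPU_IDLE            0x00000001u
#define THROTTLE_USER_DEFINED_CLOCKS 0x00000002u
#define THROTTLE_SW_POWER_CAP        0x00000004u
#define THROTTLE_HW_SLOWDOWN         0x00000008u
#define THROTTLE_UNSUPPORTED         0x40000000u
#define THROTTLE_UNKNOWN             0x80000000u

/* Writes the names of all reasons set in `mask`, separated by '|', into
 * `buf` (truncated to fit). A mask of 0 (no throttling) gives "NONE". */
void describe_throttle(uint32_t mask, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } reasons[] = {
        {THROTTLE_GPU_IDLE, "GPU_IDLE"},
        {THROTTLE_USER_DEFINED_CLOCKS, "USER_DEFINED_CLOCKS"},
        {THROTTLE_SW_POWER_CAP, "SW_POWER_CAP"},
        {THROTTLE_HW_SLOWDOWN, "HW_SLOWDOWN"},
        {THROTTLE_UNSUPPORTED, "UNSUPPORTED"},
        {THROTTLE_UNKNOWN, "UNKNOWN"},
    };
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (mask == 0) {
        snprintf(buf, len, "NONE");
        return;
    }
    for (size_t i = 0; i < sizeof reasons / sizeof reasons[0]; ++i) {
        if ((mask & reasons[i].bit) == 0)
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, reasons[i].name, len - strlen(buf) - 1);
    }
}
```

For example, a mask of THROTTLE_GPU_IDLE | THROTTLE_SW_POWER_CAP is rendered as "GPU_IDLE|SW_POWER_CAP".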
enum CUpti_ExternalCorrelationKind#

The kind of external APIs supported for correlation.

Custom correlation kinds are reserved for usage in external tools.

Values:

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_INVALID#
enumerator CUPTI_EXTERNAL_CORRELATION_KIND_UNKNOWN#

The external API is unknown to CUPTI.

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC#

The external API is OpenACC.

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0#

The external API is custom0.

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1#

The external API is custom1.

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM2#

The external API is custom2.

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_SIZE#

Add new kinds before this line.

enumerator CUPTI_EXTERNAL_CORRELATION_KIND_FORCE_INT#
enum CUpti_FuncShmemLimitConfig#

The per-block shared memory limit configuration for a kernel. This should be used to set the ‘cudaOccFuncShmemConfig’ field in the occupancy calculator API.

Values:

enumerator CUPTI_FUNC_SHMEM_LIMIT_DEFAULT#

The shared memory limit config is default.

enumerator CUPTI_FUNC_SHMEM_LIMIT_OPTIN#

The user has opted in to a higher dynamic shared memory limit using the function attribute ‘cudaFuncAttributeMaxDynamicSharedMemorySize’ for the runtime API or CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES for the driver API.

enumerator CUPTI_FUNC_SHMEM_LIMIT_FORCE_INT#
enum CUpti_LinkFlag#

Link flags.

Describes link properties, to be used with CUpti_ActivityNvLink.

Values:

The flag is invalid.

Whether peer-to-peer access is supported by this link.

Whether system memory access is supported by this link.

Whether peer atomic access is supported by this link.

Whether system memory atomic access is supported by this link.

enum CUpti_NvtxExtPayloadType#

Values:

enumerator CUPTI_NVTX_EXT_PAYLOAD_TYPE_UNKNOWN#

The payload type is not known.

enumerator CUPTI_NVTX_EXT_PAYLOAD_TYPE_SCHEMA#

The payload type is a schema.

enumerator CUPTI_NVTX_EXT_PAYLOAD_TYPE_ENUM#

The payload type is an enum.

enumerator CUPTI_NVTX_EXT_PAYLOAD_TYPE_FORCE_INT#
enum CUpti_OpenAccConstructKind#

The OpenACC parent construct kind for OpenACC activity records.

Values:

enumerator CUPTI_OPENACC_CONSTRUCT_KIND_UNKNOWN#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_PARALLEL#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_KERNELS#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_LOOP#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_DATA#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_ENTER_DATA#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_EXIT_DATA#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_HOST_DATA#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_ATOMIC#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_DECLARE#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_INIT#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_SHUTDOWN#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_SET#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_UPDATE#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_ROUTINE#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_WAIT#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_RUNTIME_API#
enumerator CUPTI_OPENACC_CONSTRUCT_KIND_FORCE_INT#
enum CUpti_OpenAccEventKind#

The OpenACC event kind for OpenACC activity records.

See also

CUpti_ActivityKindOpenAcc

Values:

enumerator CUPTI_OPENACC_EVENT_KIND_INVALID#
enumerator CUPTI_OPENACC_EVENT_KIND_DEVICE_INIT#
enumerator CUPTI_OPENACC_EVENT_KIND_DEVICE_SHUTDOWN#
enumerator CUPTI_OPENACC_EVENT_KIND_RUNTIME_SHUTDOWN#
enumerator CUPTI_OPENACC_EVENT_KIND_ENQUEUE_LAUNCH#
enumerator CUPTI_OPENACC_EVENT_KIND_ENQUEUE_UPLOAD#
enumerator CUPTI_OPENACC_EVENT_KIND_ENQUEUE_DOWNLOAD#
enumerator CUPTI_OPENACC_EVENT_KIND_WAIT#
enumerator CUPTI_OPENACC_EVENT_KIND_IMPLICIT_WAIT#
enumerator CUPTI_OPENACC_EVENT_KIND_COMPUTE_CONSTRUCT#
enumerator CUPTI_OPENACC_EVENT_KIND_UPDATE#
enumerator CUPTI_OPENACC_EVENT_KIND_ENTER_DATA#
enumerator CUPTI_OPENACC_EVENT_KIND_EXIT_DATA#
enumerator CUPTI_OPENACC_EVENT_KIND_CREATE#
enumerator CUPTI_OPENACC_EVENT_KIND_DELETE#
enumerator CUPTI_OPENACC_EVENT_KIND_ALLOC#
enumerator CUPTI_OPENACC_EVENT_KIND_FREE#
enumerator CUPTI_OPENACC_EVENT_KIND_FORCE_INT#
enum CUpti_OpenMpEventKind#

Values:

enumerator CUPTI_OPENMP_EVENT_KIND_INVALID#
enumerator CUPTI_OPENMP_EVENT_KIND_PARALLEL#
enumerator CUPTI_OPENMP_EVENT_KIND_TASK#
enumerator CUPTI_OPENMP_EVENT_KIND_THREAD#
enumerator CUPTI_OPENMP_EVENT_KIND_IDLE#
enumerator CUPTI_OPENMP_EVENT_KIND_WAIT_BARRIER#
enumerator CUPTI_OPENMP_EVENT_KIND_WAIT_TASKWAIT#
enumerator CUPTI_OPENMP_EVENT_KIND_FORCE_INT#
enum CUpti_PcieDeviceType#

Field to differentiate whether a PCIE activity record is for a GPU or a PCI bridge.

Values:

enumerator CUPTI_PCIE_DEVICE_TYPE_GPU#

PCIE GPU record.

enumerator CUPTI_PCIE_DEVICE_TYPE_BRIDGE#

PCIE Bridge record.

enumerator CUPTI_PCIE_DEVICE_TYPE_FORCE_INT#
enum CUpti_PcieGen#

PCIE Generation.

Enumeration of the PCIE generation for the PCIE activity attribute pcieGeneration.

Values:

enumerator CUPTI_PCIE_GEN_GEN1#

PCIE Generation 1.

enumerator CUPTI_PCIE_GEN_GEN2#

PCIE Generation 2.

enumerator CUPTI_PCIE_GEN_GEN3#

PCIE Generation 3.

enumerator CUPTI_PCIE_GEN_GEN4#

PCIE Generation 4.

enumerator CUPTI_PCIE_GEN_GEN5#

PCIE Generation 5.

enumerator CUPTI_PCIE_GEN_GEN6#

PCIE Generation 6.

enumerator CUPTI_PCIE_GEN_FORCE_INT#

6.1.8. Functions#

CUptiResult cuptiActivityConfigurePCSampling(CUcontext ctx, CUpti_ActivityPCSamplingConfig *config)#

Set PC sampling configuration.

For Pascal and older GPU architectures, this API must be called before enabling the activity kind CUPTI_ACTIVITY_KIND_PC_SAMPLING. There is no such requirement for Volta and newer GPU architectures.

For Volta and newer GPU architectures, if this API is called in the middle of execution, the PC sampling configuration is updated for subsequent kernel launches.

Starting with CUDA 13.0, this function is unsupported and should not be used. It always returns the error code CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED.

Parameters:
Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_OPERATION – if this API is called while some valid event collection method is set.

  • CUPTI_ERROR_INVALID_PARAMETER – if config is NULL or any parameter in the config structures is not a valid value

  • CUPTI_ERROR_NOT_SUPPORTED – Indicates that the system/device does not support PC sampling

CUptiResult cuptiActivityConfigureUnifiedMemoryCounter(CUpti_ActivityUnifiedMemoryCounterConfig *config, uint32_t count)#

Set Unified Memory Counter configuration.

Set the configuration before enabling the corresponding activity kind CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER. The API should be called after CUDA driver initialization.

Parameters:
Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if config is NULL or any parameter in the config structures is not a valid value

  • CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED – One potential reason is that the platform (OS/arch) does not support the unified memory counters

  • CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_DEVICE – Indicates that the device does not support the unified memory counters

  • CUPTI_ERROR_UM_PROFILING_NOT_SUPPORTED_ON_NON_P2P_DEVICES – Indicates that a multi-GPU configuration without P2P support between any pair of devices does not support the unified memory counters

CUptiResult cuptiActivityDisable(CUpti_ActivityKind kind)#

Disable collection of a specific kind of activity record.

Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters:

kind – The kind of activity record to stop collecting

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityDisableContext(CUcontext context, CUpti_ActivityKind kind)#

Disable collection of a specific kind of activity record for a context.

Disable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records. Multiple kinds can be disabled by calling this function multiple times.

Parameters:
  • context – The context for which activity is to be disabled

  • kind – The kind of activity record to stop collecting

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityDisable_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind kind, CUpti_ActivityConfig *pActivityConfig)#

Disable collection of a specific kind of activity record for the subscriber.

Disable collection of a specific kind of activity record. Multiple kinds can be disabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters:
  • subscriber – The subscriber handle.

  • kind – The kind of activity record to stop collecting

  • pActivityConfig – The activity config. It can be NULL.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityEnable(CUpti_ActivityKind kind)#

Enable collection of a specific kind of activity record.

Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters:

kind – The kind of activity record to collect

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_NOT_COMPATIBLE – if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityEnableAllSyncRecords(uint8_t enable)#

Enables collecting records for all synchronization operations.

CUPTI provides CUDA event query and stream query records via CUPTI_ACTIVITY_KIND_SYNCHRONIZATION. Using this API, a CUPTI client can disable recording of CUDA event query and stream query records for queries whose operations have not yet completed on the CUDA event/stream.

By default, records are generated for all CUDA event and stream queries, irrespective of whether the operations have completed on the CUDA event/stream.

Parameters:

enable – is a boolean, denoting whether to enable or disable the collection of all CUDA event query and stream query records

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnableAllocationSource(uint8_t enable)#

Enables tracking the source library for memory allocation requests.

This API controls whether CUPTI tracks the source library of memory allocation requests. The default value is 0, i.e. it is not tracked. The activity kind CUPTI_ACTIVITY_KIND_MEMORY2 must be enabled; if this flag is also set, the full path of the shared object responsible for the GPU memory allocation request is reported in the member source of the CUpti_ActivityMemory4 records. Note that this feature adds runtime overhead.

Parameters:

enable – is a boolean, denoting whether the source library of the memory allocation request needs to be tracked

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnableAndDump(CUpti_ActivityKind kind)#

Enable collection of a specific kind of activity record.

For certain activity kinds it dumps existing records.

In general, this API behaves like cuptiActivityEnable, i.e. it enables the collection of a specific kind of activity record. Additionally, it can dump records for activities that happened before the corresponding activity kind was enabled, providing records for the resources currently allocated in CUDA:

  • For CUPTI_ACTIVITY_KIND_DEVICE, existing device records are dumped.

  • For CUPTI_ACTIVITY_KIND_CONTEXT, existing context records are dumped.

  • For CUPTI_ACTIVITY_KIND_STREAM, existing stream records are dumped.

  • For CUPTI_ACTIVITY_KIND_NVLINK, existing NVLINK records are dumped.

  • For CUPTI_ACTIVITY_KIND_PCIE, existing PCIE records are dumped.

  • For other activity kinds, the behavior is similar to the API cuptiActivityEnable.

Device records are emitted by CUPTI on CUDA driver initialization; they can only be retrieved if CUPTI is attached before CUDA is initialized. Context and stream records are emitted on context and stream creation. This API is intended to provide records for the CUDA resources (contexts, streams, devices) that are currently active when the user attaches CUPTI late.

Before calling this function, the user must register buffer callbacks by calling cuptiActivityRegisterCallbacks to receive the activity records. If the user calls cuptiActivityEnableAndDump without registering the buffer callbacks, CUPTI enables the activity kind but does not provide any records for it.

Parameters:

kind – The kind of activity record to collect

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_UNKNOWN – if buffer is not initialized.

  • CUPTI_ERROR_NOT_COMPATIBLE – if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityEnableAndDump_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind kind, CUpti_ActivityConfig *activityConfig)#

Enable collection of a specific kind of activity record.

For certain activity kinds it dumps existing records.

In general, this API behaves like cuptiActivityEnable_v2, i.e. it enables the collection of a specific kind of activity record. Additionally, it can dump records for activities that happened before the corresponding activity kind was enabled, providing records for the resources currently allocated in CUDA:

  • For CUPTI_ACTIVITY_KIND_DEVICE, existing device records are dumped.

  • For CUPTI_ACTIVITY_KIND_CONTEXT, existing context records are dumped.

  • For CUPTI_ACTIVITY_KIND_STREAM, existing stream records are dumped.

  • For CUPTI_ACTIVITY_KIND_NVLINK, existing NVLINK records are dumped.

  • For CUPTI_ACTIVITY_KIND_PCIE, existing PCIE records are dumped.

  • For other activity kinds, the behavior is similar to the API cuptiActivityEnable.

Device records are emitted by CUPTI on CUDA driver initialization; they can only be retrieved if CUPTI is attached before CUDA is initialized. Context and stream records are emitted on context and stream creation. This API is intended to provide records for the CUDA resources (contexts, streams, devices) that are currently active when the user attaches CUPTI late.

Before calling this function, the user must register buffer callbacks by calling cuptiActivityRegisterCallbacks_v2 to receive the activity records. If the user calls cuptiActivityEnableAndDump_v2 without registering the buffer callbacks, CUPTI enables the activity kind but does not provide any records for it.

Parameters:
  • subscriber – The subscriber handle.

  • kind – The kind of activity record to collect

  • activityConfig – The activity config. It can be NULL.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_UNKNOWN – if buffer is not initialized.

  • CUPTI_ERROR_NOT_COMPATIBLE – if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityEnableContext(CUcontext context, CUpti_ActivityKind kind)#

Enable collection of a specific kind of activity record for a context.

Enable collection of a specific kind of activity record for a context. The setting made by this API supersedes the global settings for activity records enabled by cuptiActivityEnable. Multiple kinds can be enabled by calling this function multiple times.

Parameters:
  • context – The context for which activity is to be enabled

  • kind – The kind of activity record to collect

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_NOT_COMPATIBLE – if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityEnableCudaEventDeviceTimestamps(uint8_t enable)#

Enable/Disable collecting device timestamp for CUPTI_ACTIVITY_KIND_CUDA_EVENT record.

CUPTI provides device timestamps via the ‘deviceTimestamp’ field in CUPTI_ACTIVITY_KIND_CUDA_EVENT records. Using this API, a CUPTI client can enable or disable the collection of CUDA event device timestamps. By default, the collection of CUDA event device timestamps is disabled.

Parameters:

enable – is a boolean, denoting whether to enable or disable the collection of CUDA event device timestamps

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnableDeviceGraph(uint8_t enable)#

Controls the collection of records for device-launched graphs.

This API controls the collection of records for device-launched graphs. The default value is 0, i.e. these records are not collected; the default is 1 if HW trace is enabled using the API cuptiActivityEnableHWTrace. This API must be called before CUDA is initialized, and this setting should not be changed during the profiling session.

Parameters:

enable – is a boolean, denoting whether these records should be collected

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnableDriverApi(CUpti_CallbackId cbid, uint8_t enable)#

Controls the collection of activity records for specific CUDA Driver APIs.

Activity kind CUPTI_ACTIVITY_KIND_DRIVER controls the collection of either all CUDA Driver APIs or none. The API cuptiActivityEnableDriverApi provides fine-grained control: it allows enabling or disabling tracing of a specific set of CUDA Driver APIs. To disable collection of a small set of CUDA Driver APIs, the user can first enable collection of all Driver APIs using the activity kind CUPTI_ACTIVITY_KIND_DRIVER and then call this API to disable the specific Driver APIs. To enable collection of only a small set of CUDA Driver APIs, the user can call this API without enabling the activity kind CUPTI_ACTIVITY_KIND_DRIVER.

Note: If the activity kind CUPTI_ACTIVITY_KIND_DRIVER is enabled after this API is called, it overrides the settings made by this API.

Parameters:
  • cbid – callback id of the CUDA Driver API. This can be found in the header cupti_driver_cbid.h.

  • enable – is a boolean, denoting whether to enable or disable the collection

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnableHWTrace(uint8_t enable)#

Enables CUDA kernel timestamp collection via Hardware Event System (HES).

This API enables hardware-based collection of CUDA kernel timestamps as an alternative to the traditional software instrumentation and semaphore-based approach. HES-based collection provides lower overhead and more accurate timing measurements.

This API must be called after CUDA driver initialization but before the CUDA context is created. A general way to satisfy this requirement is to call it from the CUPTI_CBID_RESOURCE_CU_INIT_FINISHED callback. Once enabled, HES-based tracing persists for the entire profiling session and cannot be disabled.

Note

This feature is only available on NVIDIA Blackwell architecture and later.

Parameters:

enable – is a boolean flag to enable (true) HES-based timestamp collection.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED – if CUPTI is not initialized or the CUDA driver is not initialized

  • CUPTI_ERROR_NOT_SUPPORTED – if HW trace cannot be enabled on the current platform

  • CUPTI_ERROR_VIRTUALIZED_DEVICE_NOT_SUPPORTED

  • CUPTI_ERROR_CONFIDENTIAL_COMPUTING_NOT_SUPPORTED

  • CUPTI_ERROR_CMP_DEVICE_NOT_SUPPORTED

  • CUPTI_ERROR_MIG_DEVICE_NOT_SUPPORTED

  • CUPTI_ERROR_SLI_DEVICE_NOT_SUPPORTED

  • CUPTI_ERROR_WSL_DEVICE_NOT_SUPPORTED

  • CUPTI_ERROR_HES_TRACE_NOT_SUPPORTED_ON_MPS

CUptiResult cuptiActivityEnableLatencyTimestamps(uint8_t enable)

Controls the collection of queued and submitted timestamps for kernels.

This API controls the collection of queued and submitted timestamps for kernels whose records are provided through the struct CUpti_ActivityKernel11. The default value is 0, i.e. these timestamps are not collected. This API must be called before CUDA is initialized, and the setting should not be changed during the profiling session.

This API is not supported if the HW trace is enabled through the API cuptiActivityEnableHWTrace.

Parameters:

enable – is a boolean, denoting whether these timestamps should be collected

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnableLaunchAttributes(uint8_t enable)

Controls the collection of launch attributes for kernels.

This API is used to control the collection of launch attributes for kernels whose records are provided through the struct CUpti_ActivityKernel11. Default value is 0, i.e. these attributes are not collected.

Parameters:

enable – is a boolean denoting whether these launch attributes should be collected

CUptiResult cuptiActivityEnableRuntimeApi(CUpti_CallbackId cbid, uint8_t enable)

Controls the collection of activity records for specific CUDA Runtime APIs.

Activity kind CUPTI_ACTIVITY_KIND_RUNTIME controls the collection of either all CUDA Runtime APIs or none. cuptiActivityEnableRuntimeApi provides fine-grained control: it enables or disables tracing of a specific set of CUDA Runtime APIs. To exclude a small set of Runtime APIs, first enable the collection of all Runtime APIs using the activity kind CUPTI_ACTIVITY_KIND_RUNTIME, then call this API to disable the specific ones. To collect only a small set of Runtime APIs, call this API for each of them without enabling the activity kind CUPTI_ACTIVITY_KIND_RUNTIME.

Note: If the activity kind CUPTI_ACTIVITY_KIND_RUNTIME is enabled after this API is called, it overrides the settings made by this API.

Parameters:
  • cbid – callback id of the CUDA Runtime API. This can be found in the header cupti_runtime_cbid.h.

  • enable – is a boolean, denoting whether to enable or disable the collection

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityEnable_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind kind, CUpti_ActivityConfig *pActivityConfig)

Enable collection of a specific kind of activity record for the subscriber.

Enable collection of a specific kind of activity record. Multiple kinds can be enabled by calling this function multiple times. By default all activity kinds are disabled for collection.

Parameters:
  • subscriber – The subscriber handle.

  • kind – The kind of activity record to collect

  • pActivityConfig – The activity config. It can be NULL.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_NOT_COMPATIBLE – if the activity kind cannot be enabled

  • CUPTI_ERROR_INVALID_KIND – if the activity kind is not supported

CUptiResult cuptiActivityFlush(CUcontext context, uint32_t streamId, uint32_t flag)

Wait for all activity records to be delivered via the completion callback.

This function does not return until all activity records associated with the specified context/stream are returned to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks. To ensure that all activity records are complete, the requested stream(s), if any, are synchronized.

If context is NULL, the global activity records (i.e. those not associated with a particular stream) are flushed (in this case no streams are synchronized). If context is a valid CUcontext and streamId is 0, the buffers of all streams of this context are flushed. Otherwise, the buffers of the specified stream in this context are flushed.

Before calling this function, the buffer handling callback API must be activated by calling cuptiActivityRegisterCallbacks.

DEPRECATED: This method is deprecated; the context and streamId arguments are ignored. Use cuptiActivityFlushAll to flush all data.

Parameters:
  • context – A valid CUcontext or NULL.

  • streamId – The stream ID.

  • flag – The flag can be set to indicate a forced flush. See CUpti_ActivityFlag

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION – if not preceded by a successful call to cuptiActivityRegisterCallbacks

  • CUPTI_ERROR_UNKNOWN – an internal error occurred

CUptiResult cuptiActivityFlushAll(uint32_t flag)

Request to deliver activity records via the buffer completion callback.

This function returns the activity records associated with all contexts/streams (and the global buffers not associated with any stream) to the CUPTI client using the callback registered in cuptiActivityRegisterCallbacks.

This is a blocking call, but it does not implicitly issue any CUDA synchronization calls, so it is not guaranteed that all activities have completed on the underlying devices. An activity record is considered complete when all of its information, including any timestamps, has been filled in. If the client expects all activity records to be delivered with complete information, it is the client's responsibility to issue the necessary CUDA synchronization calls before calling this function.

Behavior of the function based on the input flag:
  • Default flush (flag set to 0): returns all activity buffers whose activity records are all complete; the buffers need not be full. Buffers that contain one or more incomplete records are not returned. A default flush can be done at regular intervals from a separate thread.

  • Forced flush (flag set to CUPTI_ACTIVITY_FLAG_FLUSH_FORCED): returns all activity buffers, including those that contain one or more incomplete activity records. Clients are advised to do a forced flush before the profiling session terminates so that the remaining buffers are delivered; in general, this can be done in an at-exit handler.
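The difference between the two flush modes can be modelled as a predicate on a buffer's state. This is a hypothetical sketch, not CUPTI's implementation: BufferState and is_delivered_by_flush are illustrative names, and forced == true corresponds to passing CUPTI_ACTIVITY_FLAG_FLUSH_FORCED.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a non-empty activity buffer; CUPTI exposes no
 * such structure. */
typedef struct {
    size_t recordCount;      /* records written into the buffer */
    size_t completedRecords; /* records whose fields, including timestamps, are filled in */
} BufferState;

/* Returns true if a flush would deliver the buffer:
 *  - default flush (forced == false): only if every record is complete,
 *    regardless of whether the buffer is full;
 *  - forced flush (forced == true): always, even with incomplete records. */
static bool is_delivered_by_flush(const BufferState *b, bool forced)
{
    if (forced)
        return true;
    return b->completedRecords == b->recordCount;
}
```

A buffer holding 10 records of which 7 are complete stays with CUPTI on a default flush but is handed over on a forced flush, which is why a forced flush at exit is recommended.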

Before calling this function, the buffer handling callback API must be activated by calling cuptiActivityRegisterCallbacks.

Parameters:

flag – The flag can be set to indicate a forced flush. See CUpti_ActivityFlag

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_OPERATION – if not preceded by a successful call to cuptiActivityRegisterCallbacks

  • CUPTI_ERROR_UNKNOWN – an internal error occurred

CUptiResult cuptiActivityFlushPeriod(uint32_t time)

Sets the flush period for the worker thread.

CUPTI creates a worker thread to minimize perturbation of application-created threads. CUPTI offloads certain operations from the application threads to the worker thread, including synchronization of profiling resources between host and device and delivery of activity buffers to the client through the callback registered in cuptiActivityRegisterCallbacks. For performance reasons, CUPTI wakes up the worker thread based on certain heuristics.

This API is used to control the flush period of the worker thread. This setting will override the CUPTI heuristics. Setting time to zero disables the periodic flush and restores the default behavior.

Periodic flush can return only those activity buffers which are full and have all the activity records completed.

cuptiActivityFlushAll can still be used to flush data on demand, even when the client has set a periodic flush.

Parameters:

time – flush period in milliseconds (ms)

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivityGetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Read an activity API attribute.

Read an activity API attribute and return it in *value. In case of multiple subscribers, use cuptiActivityGetAttribute_v2 to read the attribute for the subscriber.

Parameters:
  • attr – The attribute to read

  • valueSize – Size of the buffer pointed to by value; on return, contains the number of bytes written to value

  • value – Returns the value of the attribute

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if valueSize or value is NULL, or if attr is not an activity attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT – Indicates that the value buffer is too small to hold the attribute value.

CUptiResult cuptiActivityGetAttribute_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Read an activity API attribute.

Read an activity API attribute for the subscriber and return it in *value.

This function is similar to cuptiActivityGetAttribute. Note that in case of multiple subscribers, only cuptiActivityGetAttribute_v2 can be used to read the attribute for a subscriber.

Parameters:
  • subscriber – The subscriber handle.

  • attr – The attribute to read

  • valueSize – Size of the buffer pointed to by value; on return, contains the number of bytes written to value

  • value – Returns the value of the attribute

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if valueSize or value is NULL, or if attr is not an activity attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT – Indicates that the value buffer is too small to hold the attribute value.

CUptiResult cuptiActivityGetEnabledKinds(CUpti_SubscriberHandle subscriber, CUpti_ActivityKind *buffer, uint32_t *bufferSize, uint32_t *enabledKindsCount)

Get the enabled activity kinds for a subscriber.

Note

If the provided buffer is not large enough to store all the enabled activity kinds, CUPTI fills it with as many kinds as fit, but still returns the total number of enabled activity kinds in enabledKindsCount.

Parameters:
  • subscriber – The subscriber handle. If NULL, the union of activity kinds enabled across all subscribers is returned.

  • buffer – The buffer to store the enabled activity kinds. If NULL, the number of enabled activity kinds is returned in enabledKindsCount.

  • bufferSize – The size of the buffer. If NULL, only the number of enabled activity kinds is returned in enabledKindsCount. If NULL and buffer is not NULL, CUPTI_ERROR_INVALID_PARAMETER is returned.

  • enabledKindsCount – The number of enabled activity kinds. If NULL, CUPTI_ERROR_INVALID_PARAMETER is returned.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if enabledKindsCount is NULL or if subscriber is invalid.
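A common way to use this contract is to call the function twice: once with a NULL buffer to learn the count, then again with a buffer of that size. The sketch below demonstrates the documented semantics with a mock rather than the real API. mock_get_enabled_kinds, fetch_enabled_kinds and kEnabledKinds are hypothetical, the subscriber argument is omitted, int stands in for CUpti_ActivityKind, and the sketch assumes *bufferSize is a capacity counted in entries, which the description above does not state.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical set of kinds enabled for some subscriber (int stands in
 * for CUpti_ActivityKind). */
static const int kEnabledKinds[] = {3, 5, 10};
static const uint32_t kNumEnabled = sizeof kEnabledKinds / sizeof kEnabledKinds[0];

/* Mock following the documented contract of cuptiActivityGetEnabledKinds.
 * Assumes *bufferSize counts entries. Returns 0 on success, -1 where
 * CUPTI would return CUPTI_ERROR_INVALID_PARAMETER. */
static int mock_get_enabled_kinds(int *buffer, uint32_t *bufferSize,
                                  uint32_t *enabledKindsCount)
{
    if (enabledKindsCount == NULL)
        return -1;
    if (buffer != NULL && bufferSize == NULL)
        return -1;
    if (buffer != NULL) {
        /* Fill as many entries as fit; the count below is always the true total. */
        uint32_t n = *bufferSize < kNumEnabled ? *bufferSize : kNumEnabled;
        for (uint32_t i = 0; i < n; ++i)
            buffer[i] = kEnabledKinds[i];
    }
    *enabledKindsCount = kNumEnabled;
    return 0;
}

/* Two-call pattern: query the count with a NULL buffer, then allocate and
 * fetch. The caller frees the returned array. */
static int *fetch_enabled_kinds(uint32_t *countOut)
{
    uint32_t count = 0;
    if (mock_get_enabled_kinds(NULL, NULL, &count) != 0)
        return NULL;
    int *kinds = malloc((count > 0 ? count : 1) * sizeof *kinds);
    if (kinds == NULL)
        return NULL;
    uint32_t capacity = count;
    if (mock_get_enabled_kinds(kinds, &capacity, &count) != 0) {
        free(kinds);
        return NULL;
    }
    *countOut = count;
    return kinds;
}
```

If kinds can be enabled concurrently, the count returned by the second call may exceed the capacity requested after the first; comparing the two values tells the caller whether it received every kind.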

CUptiResult cuptiActivityGetNextRecord(uint8_t *buffer, size_t validBufferSizeBytes, CUpti_Activity **record)

Iterate over the activity records in a buffer.

This is a helper function to iterate over the activity records in a buffer. A buffer of activity records is typically obtained by receiving a CUpti_BuffersCallbackCompleteFunc callback. Stop iterating the buffer when an error occurs.

An example of typical usage:

CUpti_Activity *record = NULL;
CUptiResult status = CUPTI_SUCCESS;
do {
    status = cuptiActivityGetNextRecord(buffer, validSize, &record);
    if (status == CUPTI_SUCCESS) {
        // Use record here...
    }
    else if (status == CUPTI_ERROR_MAX_LIMIT_REACHED) {
        // No more records in the buffer
        break;
    }
    else if (status == CUPTI_ERROR_INVALID_KIND) {
        // Incomplete or invalid record: stop iterating
        break;
    }
    else {
        // Any other status is an error (the Error label is defined by the caller)
        goto Error;
    }
} while (1);

Parameters:
  • buffer – The buffer containing activity records

  • record – Inputs the previous record returned by cuptiActivityGetNextRecord and returns the next activity record from the buffer. If input value is NULL, returns the first activity record in the buffer. Records of certain kinds like CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL may contain invalid (0) timestamps, indicating that no timing information could be collected for lack of device memory.

  • validBufferSizeBytes – The number of valid bytes in the buffer.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_MAX_LIMIT_REACHED – if no more records in the buffer

  • CUPTI_ERROR_INVALID_PARAMETER – if buffer is NULL.

  • CUPTI_ERROR_INVALID_KIND – if activity record is either incomplete or invalid

CUptiResult cuptiActivityGetNumDroppedRecords(CUcontext context, uint32_t streamId, size_t *dropped)

Get the number of activity records that were dropped because of insufficient buffer space.

Get the number of records that were dropped because of insufficient buffer space. The dropped count includes records that could not be recorded because CUPTI had no activity buffer space available for them (because the CUpti_BuffersCallbackRequestFunc callback did not return an empty buffer of sufficient size), as well as CDP records that could not be recorded because the device-side buffer was full (its size is controlled by the CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE_CDP attribute). The dropped count maintained for the queue is reset to zero when this function is called.

Parameters:
  • context – The context, or NULL to get dropped count from global queue

  • streamId – The stream ID

  • dropped – The number of records that were dropped since the last call to this function.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if dropped is NULL

CUptiResult cuptiActivityGetStructSize(CUpti_ActivityKind activityKind, uint32_t version, size_t *activityStructSize)

Get the size of the activity struct for a given CUPTI version.

This API is applicable only for predefined activity structs.

Parameters:
  • activityKind – The activity kind to get the size for.

  • activityStructSize – The size of the activity struct.

  • version – The version of the CUPTI API. If 0, returns size for current version. If non-zero, returns size for the specified version. The version format is xxyyzz, e.g., 130000 for CUPTI 13.0, 130100 for CUPTI 13.1. The version of CUPTI can be queried using cuptiGetVersion API. Only versions >= 130000 are supported.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if activityStructSize is NULL

  • CUPTI_ERROR_INVALID_KIND – if activityKind is invalid.

  • CUPTI_ERROR_NOT_SUPPORTED – if version is non-zero and is less than 130000.
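The xxyyzz version format is plain decimal packing. The following sketch builds and checks such values; the helper names are hypothetical, and treating the zz field as a patch number is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a version in the xxyyzz format, e.g. 13.0 -> 130000, 13.1 -> 130100.
 * Treating zz as a patch number is an assumption of this sketch. */
static uint32_t cupti_version_encode(uint32_t major, uint32_t minor, uint32_t patch)
{
    return major * 10000u + minor * 100u + patch;
}

/* Split an xxyyzz value back into its fields. */
static void cupti_version_decode(uint32_t version, uint32_t *major,
                                 uint32_t *minor, uint32_t *patch)
{
    *major = version / 10000u;
    *minor = (version / 100u) % 100u;
    *patch = version % 100u;
}

/* Mirrors the documented rule: 0 means "current version"; any other value
 * below 130000 is rejected with CUPTI_ERROR_NOT_SUPPORTED. */
static bool struct_size_version_accepted(uint32_t version)
{
    return version == 0 || version >= 130000u;
}
```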

CUptiResult cuptiActivityPopExternalCorrelationId(CUpti_ExternalCorrelationKind kind, uint64_t *lastId)

Pop an external correlation id for the calling thread.

This function notifies CUPTI that the calling thread is leaving an external API region.

Parameters:
  • kind – The kind of external API that activities should be correlated with.

  • lastId – If the function returns successfully, contains the last external correlation id for this kind. Can be NULL.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – The external API kind is invalid.

  • CUPTI_ERROR_QUEUE_EMPTY – No external id is currently associated with kind.

CUptiResult cuptiActivityPushExternalCorrelationId(CUpti_ExternalCorrelationKind kind, uint64_t id)

Push an external correlation id for the calling thread.

This function notifies CUPTI that the calling thread is entering an external API region. When a CUPTI activity API record is created while within an external API region and CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION is enabled, the activity API record will be preceded by a CUpti_ActivityExternalCorrelation record for each CUpti_ExternalCorrelationKind.

Parameters:
  • kind – The kind of external API that activities should be correlated with.

  • id – External correlation id.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – The external API kind is invalid
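Push and pop are meant to be paired around an external API region. The following single-threaded sketch models the per-kind bookkeeping as a stack, as the push/pop naming suggests; in CUPTI this state is kept for the calling thread. CorrStacks, corr_push, corr_pop, NUM_KINDS and MAX_DEPTH are hypothetical, and the fixed depth limit belongs to this model only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits of this model, not CUPTI constants. */
#define NUM_KINDS 4
#define MAX_DEPTH 16

typedef enum { CORR_OK, CORR_INVALID_KIND, CORR_QUEUE_EMPTY, CORR_FULL } CorrStatus;

/* One stack of external correlation ids per kind. */
typedef struct {
    uint64_t ids[NUM_KINDS][MAX_DEPTH];
    size_t depth[NUM_KINDS];
} CorrStacks;

/* Entering an external API region (cf. cuptiActivityPushExternalCorrelationId). */
static CorrStatus corr_push(CorrStacks *s, int kind, uint64_t id)
{
    if (kind < 0 || kind >= NUM_KINDS)
        return CORR_INVALID_KIND;
    if (s->depth[kind] == MAX_DEPTH)
        return CORR_FULL; /* limit of this model only */
    s->ids[kind][s->depth[kind]++] = id;
    return CORR_OK;
}

/* Leaving the region (cf. cuptiActivityPopExternalCorrelationId); lastId may be NULL. */
static CorrStatus corr_pop(CorrStacks *s, int kind, uint64_t *lastId)
{
    if (kind < 0 || kind >= NUM_KINDS)
        return CORR_INVALID_KIND;
    if (s->depth[kind] == 0)
        return CORR_QUEUE_EMPTY;
    uint64_t id = s->ids[kind][--s->depth[kind]];
    if (lastId != NULL)
        *lastId = id;
    return CORR_OK;
}
```

With nested regions, the innermost id is popped first; popping more times than pushing yields the equivalent of CUPTI_ERROR_QUEUE_EMPTY.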

CUptiResult cuptiActivityRegisterCallbacks(CUpti_BuffersCallbackRequestFunc funcBufferRequested, CUpti_BuffersCallbackCompleteFunc funcBufferCompleted)

Registers callback functions with CUPTI for activity buffer handling.

This function registers two callback functions to be used in asynchronous buffer handling. If registered, activity record buffers are handled using asynchronous requested/completed callbacks from CUPTI.

Registering these callbacks prevents the client from using CUPTI’s blocking enqueue/dequeue functions.

Parameters:
  • funcBufferRequested – callback which is invoked when an empty buffer is requested by CUPTI

  • funcBufferCompleted – callback which is invoked when a buffer containing activity records is available from CUPTI

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if either funcBufferRequested or funcBufferCompleted is NULL

CUptiResult cuptiActivityRegisterCallbacks_v2(CUpti_SubscriberHandle subscriber, CUpti_BuffersCallbackRequestFunc_v2 funcBufferRequested, CUpti_BuffersCallbackCompleteFunc_v2 funcBufferCompleted)

Registers callback functions with CUPTI for activity buffer handling for the subscriber.

This function registers two callback functions to be used in asynchronous buffer handling. If registered, activity record buffers are handled using asynchronous requested/completed callbacks from CUPTI.

Registering these callbacks prevents the client from using CUPTI’s blocking enqueue/dequeue functions.

Parameters:
  • subscriber – The subscriber handle.

  • funcBufferRequested – callback which is invoked when an empty buffer is requested by CUPTI

  • funcBufferCompleted – callback which is invoked when a buffer containing activity records is available from CUPTI

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if either funcBufferRequested or funcBufferCompleted is NULL

CUptiResult cuptiActivityRegisterTimestampCallback(CUpti_TimestampCallbackFunc funcTimestamp)

Registers callback function with CUPTI for providing timestamp.

This function registers a callback function that supplies timestamps of the user's choice instead of the CUPTI-provided timestamp. By default, CUPTI retrieves timestamps using a platform-dependent method:
  • Linux (x86_64, aarch64 sbsa, aarch64): clock_gettime(CLOCK_REALTIME)

  • Windows: QueryPerformanceCounter()

  • WSL (Windows Subsystem for Linux): clock_gettime(CLOCK_MONOTONIC_RAW), because CLOCK_REALTIME can jump backward

  • QNX: ClockCycles()

Timestamps retrieved using these methods are converted to nanoseconds, if needed, before use.

Timestamps for GPU activities such as kernels, memory copies and memset operations are recorded directly on the GPU. To provide a unified and normalized view of these timestamps in relation to CPU time, CUPTI performs a linear interpolation to convert GPU timestamps into CPU timestamps during post-processing. For activities where timestamps are captured on the GPU, the timestamp callback is invoked during the post-processing phase, while converting GPU timestamps into CPU timestamps. For activities for which timestamps are captured directly on the CPU, the timestamp callback is invoked immediately at the time of the activity.
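As an illustration of the linear interpolation described above, the following sketch maps a GPU timestamp onto the CPU timeline from two calibration points, each a pair of GPU and CPU timestamps taken at about the same moment. It is a generic two-point interpolation, not CUPTI's actual conversion code; CalibrationPoint and gpu_to_cpu_ns are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical calibration point: a GPU timestamp and the CPU timestamp
 * sampled at (approximately) the same instant, both in nanoseconds. */
typedef struct {
    uint64_t gpuNs;
    uint64_t cpuNs;
} CalibrationPoint;

/* Linear interpolation between p0 and p1 (requires p1.gpuNs > p0.gpuNs and
 * p1.cpuNs >= p0.cpuNs). The double-precision scale is applied only to the
 * offset from p0, so large absolute timestamps keep full integer precision. */
static uint64_t gpu_to_cpu_ns(uint64_t gpuNs, CalibrationPoint p0, CalibrationPoint p1)
{
    double scale = (double)(p1.cpuNs - p0.cpuNs) / (double)(p1.gpuNs - p0.gpuNs);
    int64_t gpuDelta = (int64_t)(gpuNs - p0.gpuNs);     /* may be negative */
    int64_t cpuDelta = (int64_t)((double)gpuDelta * scale);
    return p0.cpuNs + (uint64_t)cpuDelta;               /* wraps correctly for negative deltas */
}
```

With p0 = (1000, 5000) and p1 = (2000, 7000), the CPU clock advances 2 ns per GPU ns, so a GPU timestamp of 1500 maps to 6000 and one of 500 maps to 4000.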

The timestamp callback should be registered before any CUPTI activity kind is enabled, so that all records report timestamps obtained through the callback function registered with cuptiActivityRegisterTimestampCallback.

Changing the timestamp callback function through cuptiActivityRegisterTimestampCallback in the middle of the profiling session can result in records generated before the change reporting timestamps obtained through the previous timestamp method.
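A minimal sketch of a timestamp function that could be registered with cuptiActivityRegisterTimestampCallback on Linux. It assumes CUpti_TimestampCallbackFunc takes no arguments and returns a uint64_t timestamp in nanoseconds; check cupti_activity.h for the exact typedef. The function name myTimestampNs is hypothetical, and the choice of a monotonic clock is only an example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime in strict C modes */
#endif
#include <stdint.h>
#include <time.h>

/* Returns a monotonic timestamp in nanoseconds. Uses CLOCK_MONOTONIC_RAW
 * where available (as CUPTI does on WSL), else CLOCK_MONOTONIC. */
static uint64_t myTimestampNs(void)
{
    struct timespec ts;
#ifdef CLOCK_MONOTONIC_RAW
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
#else
    clock_gettime(CLOCK_MONOTONIC, &ts);
#endif
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Registration, before enabling any activity kind (requires cupti.h):
 *     cuptiActivityRegisterTimestampCallback(myTimestampNs);
 */
```

Whatever clock the callback uses, it must be the same one the client later compares against, because CPU-side and converted GPU-side timestamps in the records will both come from it.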

Parameters:

funcTimestamp – callback which is invoked when a timestamp is needed by CUPTI

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if funcTimestamp is NULL

  • CUPTI_ERROR_NOT_INITIALIZED

CUptiResult cuptiActivitySetAttribute(CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Write an activity API attribute.

Write an activity API attribute. In case of multiple subscribers, use cuptiActivitySetAttribute_v2 to write the attribute for the subscriber.

Parameters:
  • attr – The attribute to write

  • valueSize – The size, in bytes, of the value

  • value – The attribute value to write

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_NOT_SUPPORTED – if multiple subscribers are not allowed.

  • CUPTI_ERROR_INVALID_PARAMETER – if valueSize or value is NULL, or if attr is not an activity attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT – Indicates that the value buffer is too small to hold the attribute value.

CUptiResult cuptiActivitySetAttribute_v2(CUpti_SubscriberHandle subscriber, CUpti_ActivityAttribute attr, size_t *valueSize, void *value)

Write an activity API attribute.

This function is similar to cuptiActivitySetAttribute. In case of multiple subscribers, only cuptiActivitySetAttribute_v2 can be used to write the attribute for a subscriber.

Parameters:
  • subscriber – The subscriber handle.

  • attr – The attribute to write

  • valueSize – The size, in bytes, of the value

  • value – The attribute value to write

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if valueSize or value is NULL, or if attr is not an activity attribute

  • CUPTI_ERROR_PARAMETER_SIZE_NOT_SUFFICIENT – Indicates that the value buffer is too small to hold the attribute value.

CUptiResult cuptiComputeCapabilitySupported(int major, int minor, int *support)

Check support for a compute capability.

This function checks support for a device based on its compute capability. It sets support when the compute capability is supported by the current version of CUPTI, and clears it otherwise. This version of CUPTI might not support all GPUs that share the same compute capability, so it is recommended to use cuptiDeviceSupported instead, which provides accurate information for a specific device.

Parameters:
  • major – The major revision number of the compute capability

  • minor – The minor revision number of the compute capability

  • support – Pointer to an integer to return the support status

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if support is NULL

CUptiResult cuptiDeviceSupported(CUdevice dev, int *support)

Check support for a compute device.

This function is used to check the support for a compute device. It sets the support when the device is supported by the current version of CUPTI, and clears it otherwise.

Parameters:
  • dev – The device handle returned by CUDA Driver API cuDeviceGet

  • support – Pointer to an integer to return the support status

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if support is NULL

  • CUPTI_ERROR_INVALID_DEVICE – if dev is not a valid device

CUptiResult cuptiDeviceVirtualizationMode(CUdevice dev, CUpti_DeviceVirtualizationMode *mode)

Query the virtualization mode of the device.

This function is used to query the virtualization mode of the CUDA device.

Parameters:
  • dev – The device handle returned by CUDA Driver API cuDeviceGet

  • mode – Pointer to a CUpti_DeviceVirtualizationMode to return the virtualization mode

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_DEVICE – if dev is not a valid device

  • CUPTI_ERROR_INVALID_PARAMETER – if mode is NULL

CUptiResult cuptiFinalize(void)

Detach CUPTI from the running process.

This API detaches CUPTI from the running process. It destroys and cleans up all resources associated with CUPTI in the current process. After CUPTI detaches, the process keeps running with no CUPTI attached to it. For safe operation, it is recommended to invoke this API from the exit callsite of a CUDA Driver or Runtime API. Otherwise, the CUPTI client must ensure that the required CUDA synchronization and CUPTI activity buffer flush are done before calling this API. The following sample code shows how to use this API in a CUPTI callback handler:

void CUPTIAPI
cuptiCallbackHandler(void *userdata, CUpti_CallbackDomain domain,
    CUpti_CallbackId cbid, void *cbdata)
{
  const CUpti_CallbackData *cbInfo = (const CUpti_CallbackData *)cbdata;

  // Take this code path when CUPTI detach is requested
  // (detachCupti is a flag set by the application)
  if (detachCupti) {
    switch(domain)
    {
      case CUPTI_CB_DOMAIN_RUNTIME_API:
      case CUPTI_CB_DOMAIN_DRIVER_API:
        if (cbInfo->callbackSite == CUPTI_API_EXIT) {
            // call the CUPTI detach API
            cuptiFinalize();
        }
        break;
      default:
        break;
    }
  }
}

CUptiResult cuptiGetAutoBoostState(CUcontext context, CUpti_ActivityAutoBoostState *state)

Get auto boost state.

Profiling results can be inconsistent when auto boost is enabled. CUPTI tries to disable auto boost while profiling, but this can fail if the user does not have the required permissions or if the CUDA_AUTO_BOOST environment variable is set. This function can be used to query whether auto boost is enabled.

Parameters:
  • context – A valid CUcontext.

  • state – A pointer to CUpti_ActivityAutoBoostState structure which contains the current state and the id of the process that has requested the current state

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if context or state is NULL

  • CUPTI_ERROR_NOT_SUPPORTED – Indicates that the device does not support auto boost

  • CUPTI_ERROR_UNKNOWN – an internal error occurred

CUptiResult cuptiGetContextId(CUcontext context, uint32_t *contextId)

Get the ID of a context.

Parameters:
  • context – The context

  • contextId – Returns a process-unique ID for the context

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_CONTEXT – The context is NULL or not valid.

  • CUPTI_ERROR_INVALID_PARAMETER – if contextId is NULL

CUptiResult cuptiGetDeviceId(CUcontext context, uint32_t *deviceId)

Get the ID of a device.

If context is NULL, returns the ID of the device that contains the currently active context. If context is non-NULL, returns the ID of the device which contains that context. Operates in a similar manner to cudaGetDevice() or cuCtxGetDevice() but may be called from within callback functions.

Parameters:
  • context – The context, or NULL to indicate the current context.

  • deviceId – Returns the ID of the device that contains the given context, or of the device that is current for the calling thread if context is NULL.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_DEVICE – if unable to get device ID

  • CUPTI_ERROR_INVALID_PARAMETER – if deviceId is NULL

CUptiResult cuptiGetGraphExecId(CUgraphExec graphExec, uint32_t *pId)

Get the unique ID of an executable graph.

Returns the unique ID of an executable CUDA graph.

Parameters:
  • graphExec – The executable graph.

  • pId – Returns the unique ID of the executable graph

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if graphExec is NULL

CUptiResult cuptiGetGraphId(CUgraph graph, uint32_t *pId)

Get the unique ID of a graph.

Returns the unique ID of a CUDA graph.

Parameters:
  • graph – The graph.

  • pId – Returns the unique ID of the graph

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if graph is NULL

CUptiResult cuptiGetGraphNodeId(CUgraphNode node, uint64_t *nodeId)

Get the unique ID of a graph node.

Returns the unique ID of the CUDA graph node.

Parameters:
  • node – The graph node.

  • nodeId – Returns the unique ID of the node

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_PARAMETER – if node is NULL

CUptiResult cuptiGetLastError(void)

Returns the last error from a CUPTI call or callback.

Returns the last error produced by any CUPTI API call or callback in the same host thread, and resets it to CUPTI_SUCCESS.
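The per-thread, reset-on-read behavior can be pictured with C11 thread-local storage. This is a hypothetical model, not CUPTI's implementation: record_error and get_last_error are illustrative names, 0 stands in for CUPTI_SUCCESS, and the model assumes a successful call does not overwrite a pending error.

```c
/* One "last error" slot per host thread (C11 _Thread_local).
 * 0 stands in for CUPTI_SUCCESS in this model. */
static _Thread_local int lastError = 0;

/* Called by the modelled API on every return; only failures are kept
 * (an assumption of this sketch). */
static void record_error(int status)
{
    if (status != 0)
        lastError = status;
}

/* Returns the pending error for the calling thread and resets it, as
 * cuptiGetLastError resets to CUPTI_SUCCESS. */
static int get_last_error(void)
{
    int err = lastError;
    lastError = 0;
    return err;
}
```

Because the slot is per thread, an error raised inside a callback running on one thread is not visible to cuptiGetLastError called from another.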

CUptiResult cuptiGetStreamId(CUcontext context, CUstream stream, uint32_t *streamId)

Get the ID of a stream.

Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

DEPRECATED: This method is deprecated as of CUDA 8.0. Use cuptiGetStreamIdEx instead.

Parameters:
  • context – If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.

  • stream – The stream

  • streamId – Returns a context-unique ID for the stream

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_STREAM – if unable to get stream ID, or if context is non-NULL and stream does not belong to the context

  • CUPTI_ERROR_INVALID_PARAMETER – if streamId is NULL

CUptiResult cuptiGetStreamIdEx(CUcontext context, CUstream stream, uint8_t perThreadStream, uint32_t *streamId)

Get the ID of a stream.

Get the ID of a stream. The stream ID is unique within a context (i.e. all streams within a context will have unique stream IDs).

Parameters:
  • context – If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.

  • stream – The stream

  • perThreadStream – Flag indicating whether the program is compiled for per-thread streams

  • streamId – Returns a context-unique ID for the stream

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_INITIALIZED

  • CUPTI_ERROR_INVALID_STREAM – if unable to get stream ID, or if context is non-NULL and stream does not belong to the context

  • CUPTI_ERROR_INVALID_PARAMETER – if streamId is NULL

CUptiResult cuptiGetThreadIdType(CUpti_ActivityThreadIdType *type)

Get the thread-id type.

Returns the thread-id type used in CUPTI.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if type is NULL

CUptiResult cuptiGetTimestamp(uint64_t *timestamp)

Get the CUPTI timestamp.

Returns a timestamp normalized to correspond with the start and end timestamps reported in the CUPTI activity records. The timestamp is reported in nanoseconds.

Parameters:

timestamp – Returns the CUPTI timestamp

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if timestamp is NULL

CUptiResult cuptiIsTracingSessionRunning(uint8_t *isRunning)

Check whether a CUPTI tracing session is still running.

Can be used to determine whether it is safe to unload your CUPTI-based tool.

Parameters:

isRunning – Returns whether the tracing session is still running.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_INVALID_PARAMETER – if isRunning is NULL.

  • CUPTI_ERROR_UNKNOWN – if an unknown error occurred.

CUptiResult cuptiSetThreadIdType(CUpti_ActivityThreadIdType type)

Set the thread-id type.

CUPTI uses the method corresponding to the given type to generate thread IDs; see enum CUpti_ActivityThreadIdType for the list of methods. Activity records that have a thread-id field contain the value generated by this method. The thread-id type must not be changed during the profiling session, to avoid mismatched thread-id values across activity records.

Return values:
  • CUPTI_SUCCESS

  • CUPTI_ERROR_NOT_SUPPORTED – if type is not supported on the platform

6.1.9. Typedefs

typedef void (*CUpti_BuffersCallbackCompleteFunc)(CUcontext context, uint32_t streamId, uint8_t *buffer, size_t size, size_t validSize)

Function type for callback used by CUPTI to return a buffer of activity records.

This callback function returns to the CUPTI client a buffer containing activity records. The buffer contains validSize bytes of activity records, which should be read using cuptiActivityGetNextRecord. The number of dropped records can be read using cuptiActivityGetNumDroppedRecords. After this call, CUPTI relinquishes ownership of the buffer and will not use it again. The client may return the buffer to CUPTI through the CUpti_BuffersCallbackRequestFunc callback. Note: from CUDA 6.0 onwards, all buffers returned by this callback are global buffers, i.e., there are no context- or stream-specific buffers. The client needs to parse the global buffer to extract context- or stream-specific activity records.

Param context:

The context this buffer is associated with. If NULL, the buffer is associated with the global activities. This field is deprecated as of CUDA 6.0 and will always be NULL.

Param streamId:

The stream ID this buffer is associated with. This field is deprecated as of CUDA 6.0 and will always be 0.

Param buffer:

The activity record buffer.

Param size:

The total size of the buffer in bytes as set in CUpti_BuffersCallbackRequestFunc.

Param validSize:

The number of valid bytes in the buffer.
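Given the ownership rules above, a completion callback typically consumes the valid bytes and then releases the buffer. The following is a minimal sketch, assuming the buffer was allocated with a malloc-compatible allocator and is not recycled. The CUcontext typedef is a stand-in for the one in cuda.h so the sketch compiles without CUDA headers, and g_bytesProcessed/g_buffersCompleted are hypothetical counters; record parsing is left as a comment rather than calling CUPTI.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the CUcontext handle declared in cuda.h, so this sketch
   compiles without the CUDA headers. Drop it when including cupti.h. */
typedef struct CUctx_st *CUcontext;

/* Hypothetical statistics a tool might keep across completed buffers. */
static size_t g_bytesProcessed = 0;
static size_t g_buffersCompleted = 0;

/* Same signature as CUpti_BuffersCallbackCompleteFunc. */
static void bufferCompleted(CUcontext context, uint32_t streamId,
                            uint8_t *buffer, size_t size, size_t validSize)
{
    (void)context;   /* deprecated since CUDA 6.0; always NULL */
    (void)streamId;  /* deprecated since CUDA 6.0 */
    assert(validSize <= size);
    (void)size;      /* only used by the assert above */

    /* A real tool would walk the first validSize bytes here with
       cuptiActivityGetNextRecord() and check
       cuptiActivityGetNumDroppedRecords(); this sketch only counts bytes. */
    g_bytesProcessed += validSize;
    g_buffersCompleted++;

    /* CUPTI no longer owns the buffer, so the client must release it
       (or keep it to hand back on a later buffer request). */
    free(buffer);
}
```

In a real tool, this function would be passed as the second argument to cuptiActivityRegisterCallbacks, alongside a request callback.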

typedef void (*CUpti_BuffersCallbackCompleteFunc_v2)(uint8_t *buffer, size_t size, size_t validSize, CUpti_BufferCallbackCompleteInfo *pBufferCompleteInfo)

Function type for callback used by CUPTI to return a buffer of activity records.

This callback function returns to the CUPTI client a buffer containing activity records. The buffer contains validSize bytes of activity records, which should be read using cuptiActivityGetNextRecord. The number of dropped records can be read using cuptiActivityGetNumDroppedRecords. After this call, CUPTI relinquishes ownership of the buffer and will not use it again. The client may return the buffer to CUPTI through the CUpti_BuffersCallbackRequestFunc callback. Note: from CUDA 6.0 onwards, all buffers returned by this callback are global buffers, i.e., there are no context- or stream-specific buffers. The client needs to parse the global buffer to extract context- or stream-specific activity records.

Param buffer:

The activity record buffer.

Param size:

The total size of the buffer in bytes as set in CUpti_BuffersCallbackRequestFunc.

Param validSize:

The number of valid bytes in the buffer.

Param pBufferCompleteInfo:

Additional information about the completed buffer.

typedef void (*CUpti_BuffersCallbackRequestFunc)(uint8_t **buffer, size_t *size, size_t *maxNumRecords)

Function type for callback used by CUPTI to request an empty buffer for storing activity records.

This callback function signals to the CUPTI client that CUPTI needs an activity buffer, which CUPTI uses to store activity records. The callback function can decline the request by setting *buffer to NULL; in this case, CUPTI may drop activity records.

Param buffer:

Returns the new buffer. If set to NULL then no buffer is returned.

Param size:

Returns the size of the returned buffer.

Param maxNumRecords:

Returns the maximum number of records that should be placed in the buffer. If 0, the buffer is filled with as many records as possible. If greater than 0, the buffer is filled with at most that many records before it is returned.
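A request callback commonly allocates a fresh buffer on each call. The sketch below is illustrative, not CUPTI's reference implementation. The 8 MB size is an arbitrary assumption, and the 8-byte alignment mirrors what the CUPTI samples commonly use; check the alignment requirement for your CUPTI version. ACTIVITY_BUFFER_SIZE and ACTIVITY_BUFFER_ALIGN are hypothetical names. C11 aligned_alloc is used so the buffer can later be released with free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for this sketch; tune them for your workload. */
#define ACTIVITY_BUFFER_SIZE  ((size_t)8 * 1024 * 1024)  /* 8 MB, arbitrary */
#define ACTIVITY_BUFFER_ALIGN ((size_t)8)                /* 8-byte alignment */

/* Same signature as CUpti_BuffersCallbackRequestFunc. */
static void bufferRequested(uint8_t **buffer, size_t *size, size_t *maxNumRecords)
{
    /* aligned_alloc (C11) requires the size to be a multiple of the alignment. */
    uint8_t *buf = aligned_alloc(ACTIVITY_BUFFER_ALIGN, ACTIVITY_BUFFER_SIZE);
    assert(((uintptr_t)buf % ACTIVITY_BUFFER_ALIGN) == 0);

    /* Returning NULL declines the request; CUPTI may then drop records. */
    *buffer = buf;
    *size = (buf != NULL) ? ACTIVITY_BUFFER_SIZE : 0;

    /* 0 asks CUPTI to fill the buffer with as many records as fit. */
    *maxNumRecords = 0;
}
```

Allocating per request keeps the sketch simple. A tool that processes many buffers might instead keep a small pool of buffers released by its completion callback and hand those back here, which avoids repeated large allocations.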

typedef void (*CUpti_BuffersCallbackRequestFunc_v2)(uint8_t **buffer, size_t *size, size_t *maxNumRecords, CUpti_BufferCallbackRequestInfo *pBufferRequestInfo)

Function type for callback used by CUPTI to request an empty buffer for storing activity records.

This callback function signals to the CUPTI client that CUPTI needs an activity buffer, which CUPTI uses to store activity records. The callback function can decline the request by setting *buffer to NULL; in this case, CUPTI may drop activity records.

Param buffer:

Returns the new buffer. If set to NULL then no buffer is returned.

Param size:

Returns the size of the returned buffer.

Param maxNumRecords:

Returns the maximum number of records that should be placed in the buffer. If 0, the buffer is filled with as many records as possible. If greater than 0, the buffer is filled with at most that many records before it is returned.

Param pBufferRequestInfo:

Additional information about the requested buffer.

typedef uint64_t (*CUpti_TimestampCallbackFunc)(void)

Function type for callback used by CUPTI to request a timestamp to be used in activity records.

This callback function signals to the CUPTI client that a timestamp needs to be returned. CUPTI treats the returned value as a normalized timestamp and uses it for various purposes, for example to store the start and end timestamps reported in CUPTI activity records. The returned timestamp must be in nanoseconds.
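A tool that wants activity timestamps on a clock of its own choosing can supply a function of this type. The sketch below assumes a POSIX system and uses CLOCK_MONOTONIC as the time source. That clock choice is an assumption; any clock works as long as it returns nanoseconds and is used consistently for the whole session. The name monotonicTimestampNs is hypothetical. In recent CUPTI releases, such a callback is registered with cuptiActivityRegisterTimestampCallback; verify this against the header for your CUPTI version.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict C modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same signature as CUpti_TimestampCallbackFunc: returns nanoseconds. */
static uint64_t monotonicTimestampNs(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);  /* CLOCK_MONOTONIC is expected to be supported */
    (void)rc;         /* avoid an unused-variable warning under NDEBUG */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

CUPTI may call this function often, so it is kept cheap and free of locks or allocation.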