bindings#

The cupti.cupti module exposes 1:1 Python bindings for the CUPTI C API. For most users, the higher-level cupti.pm_sampling and cupti.profiler_host modules are the recommended entry points. The bindings on this page are intended for the Activity and Callback APIs (which do not yet have a Pythonic wrapper) and as raw-access fallbacks for PM Sampling and Profiler Host.

Known limitations#

  • The members of the inner struct/union classes (_py_anon_pod*) are not adequately documented. Refer to the CUPTI C documentation for member-level details.

  • The kind member of Python classes has type int. Use cupti.cupti.ActivityKind to interpret its value.

  • Some parts of the API still cite C enums / data structures by their C names instead of mapping them to their Python counterparts.
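The second limitation above can be handled with a small lookup helper. The following is a minimal sketch, not part of the bindings: `ActivityKindSubset` and `kind_name` are hypothetical names, and the enum is a local stand-in that copies a few values from the ActivityKind listing on this page so the snippet runs without CUPTI installed. With the real bindings, cupti.cupti.ActivityKind would take its place.

```python
from enum import IntEnum


class ActivityKindSubset(IntEnum):
    # Stand-in for cupti.cupti.ActivityKind; values copied from this page.
    MEMCPY = 1
    MEMSET = 2
    KERNEL = 3
    DRIVER = 4
    RUNTIME = 5
    CONCURRENT_KERNEL = 10


def kind_name(kind: int) -> str:
    """Map the integer `kind` member of a record to a readable name."""
    try:
        return ActivityKindSubset(kind).name
    except ValueError:
        # Values missing from the enum are reported instead of raising.
        return f"UNRECOGNIZED({kind})"


print(kind_name(10))
print(kind_name(999))
```

Catching ValueError keeps a record-processing loop running when a newer CUPTI release reports a kind the enum does not know about.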

Activity API#

Functions#

cupti.cupti.activity_configure_unified_memory_counter(config: int, count: int)#

Set Unified Memory Counter configuration.

Parameters:
  • config (intptr_t) – A pointer to CUpti_ActivityUnifiedMemoryCounterConfig structures containing Unified Memory counter configuration.

  • count (uint32_t) – Number of Unified Memory counter configuration structures.

cupti.cupti.activity_disable(kind: int)#

Disable collection of a specific kind of activity record.

Parameters:

kind (ActivityKind) – The kind of activity record to stop collecting.

cupti.cupti.activity_disable_context(context: int, kind: int)#

Disable collection of a specific kind of activity record for a context.

Parameters:
  • context (intptr_t) – The context for which activity is to be disabled.

  • kind (ActivityKind) – The kind of activity record to stop collecting.

cupti.cupti.activity_enable(kind: int)#

Enable collection of a specific kind of activity record.

Parameters:

kind (ActivityKind) – The kind of activity record to collect.
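Because activity_enable and activity_disable come in pairs, one possible pattern is to wrap them in a context manager so that collection is always switched off again. This is a sketch under assumptions: `collecting` and `FakeApi` are hypothetical names, and the fake object only records calls so the example runs without a GPU. With the real bindings, the cupti.cupti module itself would presumably be passed as `api`.

```python
from contextlib import contextmanager


@contextmanager
def collecting(api, kinds):
    """Enable each activity kind on entry and disable them on exit."""
    enabled = []
    try:
        for kind in kinds:
            api.activity_enable(kind)
            enabled.append(kind)
        yield
    finally:
        # Disable only what was actually enabled, in reverse order.
        for kind in reversed(enabled):
            api.activity_disable(kind)


class FakeApi:
    """Hypothetical stand-in that records calls instead of talking to CUPTI."""

    def __init__(self):
        self.calls = []

    def activity_enable(self, kind):
        self.calls.append(("enable", kind))

    def activity_disable(self, kind):
        self.calls.append(("disable", kind))


fake = FakeApi()
with collecting(fake, [10, 1]):  # ActivityKind.CONCURRENT_KERNEL, ActivityKind.MEMCPY
    pass  # the workload to be traced would run here
print(fake.calls)
```

Records are still delivered through the buffer callbacks registered with activity_register_callbacks; this helper only manages which kinds are collected.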

cupti.cupti.activity_enable_all_sync_records(enable: int)#

Enables collecting records for all synchronization operations.

Parameters:

enable (uint8_t) – Boolean flag denoting whether to enable or disable the collection of all CUDA event query and stream query records.

cupti.cupti.activity_enable_allocation_source(enable: int)#

Enables tracking the source library for memory allocation requests.

Parameters:

enable (uint8_t) – Boolean flag denoting whether the source library of the memory allocation request needs to be tracked.

cupti.cupti.activity_enable_and_dump(kind: int)#

Enable collection of a specific kind of activity record. For certain activity kinds it dumps existing records.

Parameters:

kind (ActivityKind) – The kind of activity record to collect.

cupti.cupti.activity_enable_context(context: int, kind: int)#

Enable collection of a specific kind of activity record for a context.

Parameters:
  • context (intptr_t) – The context for which activity is to be enabled.

  • kind (ActivityKind) – The kind of activity record to collect.

cupti.cupti.activity_enable_cuda_event_device_timestamps(enable: int)#

Enable or disable the collection of device timestamps for CUPTI_ACTIVITY_KIND_CUDA_EVENT records.

Parameters:

enable (uint8_t) – Boolean flag denoting whether to enable or disable the collection of CUDA event device timestamps.

cupti.cupti.activity_enable_device_graph(enable: int)#

Controls the collection of records for device launched graphs.

Parameters:

enable (uint8_t) – Boolean flag denoting whether these records should be collected.

cupti.cupti.activity_enable_driver_api(cbid: int, enable: int)#

Controls the collection of activity records for specific CUDA Driver APIs.

Parameters:
  • cbid (uint32_t) – Callback ID of the CUDA Driver API. This can be found in the header cupti_driver_cbid.h.

  • enable (uint8_t) – Boolean flag denoting whether to enable or disable the collection.

cupti.cupti.activity_enable_hw_trace(enable: int)#

Enables CUDA kernel timestamp collection via Hardware Event System (HES).

Parameters:

enable (uint8_t) – Boolean flag; a true value enables HES-based timestamp collection.

cupti.cupti.activity_enable_latency_timestamps(enable: int)#

Controls the collection of queued and submitted timestamps for kernels.

Parameters:

enable (uint8_t) – Boolean flag denoting whether these timestamps should be collected.

cupti.cupti.activity_enable_launch_attributes(enable: int)#

Controls the collection of launch attributes for kernels.

Parameters:

enable (uint8_t) – Boolean flag denoting whether these launch attributes should be collected.

cupti.cupti.activity_enable_runtime_api(cbid: int, enable: int)#

Controls the collection of activity records for specific CUDA Runtime APIs.

Parameters:
  • cbid (uint32_t) – Callback ID of the CUDA Runtime API. This can be found in the header cupti_runtime_cbid.h.

  • enable (uint8_t) – Boolean flag denoting whether to enable or disable the collection.

cupti.cupti.activity_flush_all(flag: int)#

Request to deliver activity records via the buffer completion callback.

Parameters:

flag (uint32_t) – The flag can be set to indicate a forced flush. See CUpti_ActivityFlag.

cupti.cupti.activity_flush_period(time: int)#

Sets the flush period for the worker thread.

Parameters:

time (uint32_t) – Flush period in milliseconds (ms).

cupti.cupti.activity_get_attribute(attr: int, value_size: int, value: int)#

Read an activity API attribute.

Parameters:
  • attr (ActivityAttribute) – The attribute to read.

  • value_size (intptr_t) – Pointer to the size, in bytes, of the buffer pointed to by value. On return, it holds the number of bytes written to value.

  • value (intptr_t) – Returns the value of the attribute.
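Both value_size and value are typed intptr_t, which suggests the binding expects raw memory addresses passed as Python integers. One way to produce such addresses is with ctypes, as in the sketch below. The assumptions are that the attribute being read is size_t-valued (as ATTR_DEVICE_BUFFER_SIZE is in the CUPTI C documentation) and that the binding accepts plain integer addresses. `read_size_attribute` and `fake_get_attribute` are hypothetical names, and the fake only mimics what the C function would write so the snippet runs without CUPTI.

```python
import ctypes


def read_size_attribute(get_attribute, attr: int) -> int:
    """Read a size_t-valued attribute through a pointer-based getter."""
    value = ctypes.c_size_t(0)
    value_size = ctypes.c_size_t(ctypes.sizeof(value))
    # Pass the addresses of both buffers as plain Python ints.
    get_attribute(attr, ctypes.addressof(value_size), ctypes.addressof(value))
    return value.value


def fake_get_attribute(attr, value_size_ptr, value_ptr):
    """Hypothetical stand-in that writes through the pointers like C would."""
    ctypes.c_size_t.from_address(value_ptr).value = 8 * 1024 * 1024
    ctypes.c_size_t.from_address(value_size_ptr).value = ctypes.sizeof(ctypes.c_size_t)


# 0 is ActivityAttribute.ATTR_DEVICE_BUFFER_SIZE in the listing on this page.
size = read_size_attribute(fake_get_attribute, 0)
print(size)
```

With the real bindings, cupti.cupti.activity_get_attribute would be passed in place of the fake. The ctypes objects must stay alive until the call returns, which the local variables in the helper ensure.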

cupti.cupti.activity_get_num_dropped_records(
context: int,
stream_id: int,
dropped: int,
)#

Get the number of activity records that were dropped because of insufficient buffer space.

Parameters:
  • context (intptr_t) – The context, or NULL to get dropped count from global queue.

  • stream_id (uint32_t) – The stream ID.

  • dropped (intptr_t) – The number of records that were dropped since the last call to this function.

cupti.cupti.activity_pop_external_correlation_id(kind: int) int#

Pop an external correlation id for the calling thread.

Parameters:

kind (ExternalCorrelationKind) – The kind of external API activities should be correlated with.

Returns:

On success, the last external correlation id for this kind. In the C API this is an output pointer, which can be NULL.

Return type:

uint64_t

cupti.cupti.activity_push_external_correlation_id(kind: int, id: int)#

Push an external correlation id for the calling thread.

Parameters:
  • kind (ExternalCorrelationKind) – The kind of external API activities should be correlated with.

  • id (uint64_t) – External correlation id.
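Since every push should be matched by a pop on the same thread, the pair lends itself to a context manager. The sketch below makes that pairing explicit. `external_correlation` and `FakeApi` are hypothetical names, and the fake keeps a simple stack so the snippet can run without CUPTI. With the real bindings, the cupti.cupti module would presumably be passed as `api`.

```python
from contextlib import contextmanager


@contextmanager
def external_correlation(api, kind, correlation_id):
    """Push an external correlation id and pop it again on exit."""
    api.activity_push_external_correlation_id(kind, correlation_id)
    try:
        yield
    finally:
        api.activity_pop_external_correlation_id(kind)


class FakeApi:
    """Hypothetical stand-in that models the per-thread id stack."""

    def __init__(self):
        self.stack = []
        self.popped = []

    def activity_push_external_correlation_id(self, kind, id):
        self.stack.append((kind, id))

    def activity_pop_external_correlation_id(self, kind):
        _, last_id = self.stack.pop()
        self.popped.append(last_id)
        return last_id


fake = FakeApi()
with external_correlation(fake, 3, 42):  # 3 is ExternalCorrelationKind.CUSTOM0
    depth_inside = len(fake.stack)  # CUDA calls to correlate would go here
print(depth_inside, fake.popped)
```

The try/finally ensures the id is popped even if the wrapped code raises, so later activity on the thread is not attributed to a stale id.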

cupti.cupti.activity_register_callbacks(
func_buffer_requested,
func_buffer_completed,
)#

Registers callback functions with CUPTI for activity buffer handling.

Parameters:
  • func_buffer_requested (object) – Callback invoked when CUPTI requests an empty buffer.

  • func_buffer_completed (object) – Callback invoked when a buffer containing activity records is available from CUPTI.

cupti.cupti.activity_register_timestamp_callback(func_timestamp)#

Registers a callback function with CUPTI for providing timestamps.

Parameters:

func_timestamp (object) – Callback invoked when CUPTI needs a timestamp.

cupti.cupti.activity_set_attribute(attr: int, value_size: int, value: int)#

Write an activity API attribute.

Parameters:
  • attr (ActivityAttribute) – The attribute to write.

  • value_size (intptr_t) – The size, in bytes, of the value.

  • value (intptr_t) – The attribute value to write.

cupti.cupti.compute_capability_supported(major: int, minor: int) int#

Check support for a compute capability.

Parameters:
  • major (int) – The major revision number of the compute capability.

  • minor (int) – The minor revision number of the compute capability.

Returns:

The support status. In the C API this is returned through a pointer to an integer.

Return type:

int

cupti.cupti.device_supported(dev: int) int#

Check support for a compute device.

Parameters:

dev (int) – The device handle returned by CUDA Driver API cuDeviceGet.

Returns:

The support status. In the C API this is returned through a pointer to an integer.

Return type:

int

cupti.cupti.device_virtualization_mode(dev: int) int#

Query the virtualization mode of the device.

Parameters:

dev (int) – The device handle returned by CUDA Driver API cuDeviceGet.

Returns:

The virtualization mode, as a CUpti_DeviceVirtualizationMode value. In the C API this is returned through a pointer.

Return type:

int

cupti.cupti.finalize()#

Detach CUPTI from the running process.

See also

cuptiFinalize

cupti.cupti.get_auto_boost_state(context: int, state: int)#

Get auto boost state.

Parameters:
  • context (intptr_t) – A valid CUcontext.

  • state (intptr_t) – A pointer to CUpti_ActivityAutoBoostState structure which contains the current state and the id of the process that has requested the current state.

cupti.cupti.get_context_id(context: int) int#

Get the ID of a context.

Parameters:

context (intptr_t) – The context.

Returns:

Returns a process-unique ID for the context.

Return type:

uint32_t

cupti.cupti.get_device_id(context: int) int#

Get the ID of a device.

Parameters:

context (intptr_t) – The context, or NULL to indicate the current context.

Returns:

Returns the ID of the device that is current for the calling thread.

Return type:

uint32_t

See also

cuptiGetDeviceId

cupti.cupti.get_graph_exec_id(graph_exec: int) int#

Get the unique ID of an executable graph.

Parameters:

graph_exec (intptr_t) – The executable graph.

Returns:

Returns the unique ID of the executable graph.

Return type:

uint32_t

cupti.cupti.get_graph_id(graph: int) int#

Get the unique ID of a graph.

Parameters:

graph (intptr_t) – The graph.

Returns:

Returns the unique ID of the graph.

Return type:

uint32_t

See also

cuptiGetGraphId

cupti.cupti.get_graph_node_id(node: int) int#

Get the unique ID of a graph node.

Parameters:

node (intptr_t) – The graph node.

Returns:

Returns the unique ID of the node.

Return type:

uint64_t

cupti.cupti.get_last_error() int#

Returns the last error from a cupti call or callback.

cupti.cupti.get_stream_id_ex(
context: int,
stream: int,
per_thread_stream: int,
) int#

Get the ID of a stream.

Parameters:
  • context (intptr_t) – If non-NULL, the stream is checked to ensure that it belongs to this context. Typically this parameter should be NULL.

  • stream (intptr_t) – The stream.

  • per_thread_stream (uint8_t) – Flag to indicate if program is compiled for per-thread streams.

Returns:

Returns a context-unique ID for the stream.

Return type:

uint32_t

cupti.cupti.get_thread_id_type() int#

Get the thread-id type.

Returns:

The current thread-id type. See ActivityThreadIdType.

Return type:

int

cupti.cupti.get_timestamp() int#

Get the CUPTI timestamp.

Returns:

Returns the CUPTI timestamp.

Return type:

uint64_t

cupti.cupti.is_tracing_session_running() int#

Check whether a CUPTI tracing session is still running. This API returns true if a CUPTI library has already been loaded by another CUPTI user through any CUPTI API, and false when CUPTI is finalized and no CUPTI library is currently loaded and active. It can be used to determine whether it is safe to unload your CUPTI-based tool. Note that this API does not itself load the CUPTI library; it only checks whether one has already been loaded by another CUPTI user.

Returns:

Returns whether the tracing session is still running.

Return type:

uint8_t

cupti.cupti.set_thread_id_type(type: int)#

Set the thread-id type.

Parameters:

type (ActivityThreadIdType) – The thread-id type to set.

Enums#

class cupti.cupti.ActivityAttribute(value)#

Bases: IntEnum

Activity attributes. These attributes are used to control the behavior of the activity API.

See CUpti_ActivityAttribute.

ATTR_CIG_MODE = 2147483649#
ATTR_DEVICE_BUFFER_FORCE_INT = 2147483647#
ATTR_DEVICE_BUFFER_POOL_LIMIT = 2#
ATTR_DEVICE_BUFFER_PRE_ALLOCATE_VALUE = 6#
ATTR_DEVICE_BUFFER_SIZE = 0#
ATTR_DEVICE_BUFFER_SIZE_CDP = 1#
ATTR_DEVICE_BUFFER_SIZE_DEVICE_GRAPHS = 10#
ATTR_ENABLE_ALLOCATION_SOURCE_TRACKING = 14#
ATTR_ENABLE_ALL_SYNC_RECORDS = 16#
ATTR_ENABLE_CIG_MODE = 23#
ATTR_ENABLE_CUDA_EVENT_DEVICE_TIMESTAMPS = 17#
ATTR_ENABLE_DEVICE_GRAPH_TRACE = 19#
ATTR_ENABLE_HES = 13#
ATTR_ENABLE_KERNEL_LATENCY_TIMESTAMPS = 15#
ATTR_ENABLE_KERNEL_LAUNCH_ATTRIBUTES = 18#
ATTR_ENABLE_MULTI_SUBSCRIBER_GRAPH_LEVEL_TRACE = 2147483648#
ATTR_ENABLE_MULTI_SUBSCRIBER_GRAPH_TRACE = 20#
ATTR_MEM_ALLOCATION_TYPE_HOST_PINNED = 8#
ATTR_MULTIPLE_SUBSCRIBER_STATE = 12#
ATTR_PER_THREAD_BUFFER = 9#
ATTR_PROFILING_SEMAPHORE_POOL_LIMIT = 4#
ATTR_PROFILING_SEMAPHORE_POOL_SIZE = 3#
ATTR_PROFILING_SEMAPHORE_PRE_ALLOCATE_VALUE = 7#
ATTR_THREAD_ID_TYPE = 21#
ATTR_TIMESTAMP_CALLBACK = 22#
ATTR_USER_DEFINED_RECORDS = 11#
ATTR_ZEROED_OUT_BUFFER = 5#
class cupti.cupti.ActivityComputeApiKind(value)#

Bases: IntEnum

The kind of a compute API.

See CUpti_ActivityComputeApiKind.

CUDA = 1#
CUDA_MPS = 2#
FORCE_INT = 2147483647#
UNKNOWN = 0#
class cupti.cupti.ActivityEnvironmentKind(value)#

Bases: IntEnum

The kind of environment data. Used to indicate what type of data is being reported by an environment activity record.

See CUpti_ActivityEnvironmentKind.

COOLING = 4#
COUNT = 5#
FORCE_INT = 2147483647#
POWER = 3#
SPEED = 1#
TEMPERATURE = 2#
UNKNOWN = 0#
class cupti.cupti.ActivityFlag(value)#

Bases: IntEnum

Flags associated with activity records. Flags can be combined by bitwise OR to associate multiple flags with an activity record. Each flag is specific to a certain activity kind, as noted below.

See CUpti_ActivityFlag.

DEVICE_ATTRIBUTE_CUDEVICE = 1#
DEVICE_CONCURRENT_KERNELS = 1#
FLUSH_FORCED = 1#
FORCE_INT = 2147483647#
GLOBAL_ACCESS_KIND_CACHED = 512#
GLOBAL_ACCESS_KIND_LOAD = 256#
GLOBAL_ACCESS_KIND_SIZE_MASK = 255#
INSTRUCTION_CLASS_MASK = 510#
INSTRUCTION_VALUE_INVALID = 1#
MARKER_COLOR_ARGB = 2#
MARKER_COLOR_NONE = 1#
MARKER_INSTANTANEOUS = 1#
MARKER_START = 2#
MARKER_SYNC_ACQUIRE = 8#
MARKER_SYNC_ACQUIRE_FAILED = 32#
MARKER_SYNC_ACQUIRE_SUCCESS = 16#
MARKER_SYNC_RELEASE = 64#
MEMCPY_ASYNC = 1#
MEMSET_ASYNC = 1#
METRIC_OVERFLOWED = 1#
METRIC_VALUE_INVALID = 2#
NONE = 0#
SHARED_ACCESS_KIND_LOAD = 256#
SHARED_ACCESS_KIND_SIZE_MASK = 255#
THRASHING_IN_CPU = 1#
THROTTLING_IN_CPU = 1#
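Several ActivityFlag members share a numeric value (for example, MEMCPY_ASYNC and MARKER_INSTANTANEOUS are both 1). In a Python IntEnum, members that repeat a value become aliases of the first member defined with it, so a lookup by value alone cannot tell which flag was meant. The flag has to be interpreted in the context of the record kind. The sketch below illustrates this with a local stand-in enum. `FlagSubset`, `MARKER_FLAGS`, and `decode_marker_flags` are hypothetical names, and the definition order here is an assumption that need not match the real bindings.

```python
from enum import IntEnum


class FlagSubset(IntEnum):
    # Stand-in for cupti.cupti.ActivityFlag; values copied from this page.
    MARKER_INSTANTANEOUS = 1
    MEMCPY_ASYNC = 1  # same value, so Python treats this as an alias
    MARKER_START = 2
    MARKER_SYNC_ACQUIRE = 8
    MARKER_SYNC_ACQUIRE_SUCCESS = 16


# Flags that are meaningful for marker records only.
MARKER_FLAGS = (
    "MARKER_INSTANTANEOUS",
    "MARKER_START",
    "MARKER_SYNC_ACQUIRE",
    "MARKER_SYNC_ACQUIRE_SUCCESS",
)


def decode_marker_flags(flags: int) -> list:
    """Return the names of the marker flags set in a marker record's flags."""
    return [name for name in MARKER_FLAGS if flags & FlagSubset[name]]


print(FlagSubset(1).name)
print(FlagSubset.MEMCPY_ASYNC is FlagSubset.MARKER_INSTANTANEOUS)
print(decode_marker_flags(2 | 8))
```

Testing bits against a per-kind list of names, rather than calling the enum on the raw value, sidesteps the aliasing problem.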
class cupti.cupti.ActivityInstructionClass(value)#

Bases: IntEnum

SASS instruction classification. SASS instructions are broadly divided into different classes. Each enum value represents one classification.

See CUpti_ActivityInstructionClass.

BARRIER = 17#
BIT_CONVERSION = 4#
CONSTANT = 11#
CONTROL_FLOW = 5#
FP_16 = 19#
FP_32 = 1#
FP_64 = 2#
GENERIC = 9#
GLOBAL = 6#
GLOBAL_ATOMIC = 13#
INTEGER = 3#
INTER_THREAD_COMMUNICATION = 16#
KIND_FORCE_INT = 2147483647#
LOCAL = 8#
MISCELLANEOUS = 18#
SHARED = 7#
SHARED_ATOMIC = 14#
SURFACE = 10#
SURFACE_ATOMIC = 15#
TEXTURE = 12#
UNIFORM = 20#
UNKNOWN = 0#
class cupti.cupti.ActivityJitEntryType(value)#

Bases: IntEnum

The types of JIT entry. To be used in CUpti_ActivityJit.

See CUpti_ActivityJitEntryType.

FORCE_INT = 2147483647#
INVALID = 0#
NVVM_IR_TO_PTX = 2#
PTX_TO_CUBIN = 1#
class cupti.cupti.ActivityJitOperationType(value)#

Bases: IntEnum

The types of JIT compilation operations. To be used in CUpti_ActivityJit.

See CUpti_ActivityJitOperationType.

CACHE_LOAD = 1#
CACHE_STORE = 2#
COMPILE = 3#
FORCE_INT = 2147483647#
INVALID = 0#
class cupti.cupti.ActivityKind(value)#

Bases: IntEnum

The kinds of activity records. Each activity record kind represents information about a GPU or an activity occurring on a CPU or GPU. Each kind is associated with an activity record structure that holds the information associated with the kind.

See CUpti_ActivityKind.

BRANCH = 16#
CDP_KERNEL = 18#
COMPUTE_ENGINE_CTX_SWITCH = 57#
CONCURRENT_KERNEL = 10#
CONTEXT = 9#
COUNT = 60#
CUDA_EVENT = 36#
DEVICE = 8#
DEVICE_ATTRIBUTE = 28#
DEVICE_GRAPH_TRACE = 53#
DRIVER = 4#
ENVIRONMENT = 20#
EVENT = 6#
EVENT_INSTANCE = 21#
EXTERNAL_CORRELATION = 39#
FORCE_INT = 2147483647#
FUNCTION = 26#
GLOBAL_ACCESS = 15#
GRAPH_HOST_NODE = 56#
GRAPH_TRACE = 51#
GREEN_CONTEXT = 59#
HOST_LAUNCH = 58#
INSTANTANEOUS_EVENT = 41#
INSTANTANEOUS_EVENT_INSTANCE = 42#
INSTANTANEOUS_METRIC = 43#
INSTANTANEOUS_METRIC_INSTANCE = 44#
INSTRUCTION_CORRELATION = 32#
INSTRUCTION_EXECUTION = 24#
INTERNAL_LAUNCH_API = 48#
INVALID = 0#
JIT = 52#
KERNEL = 3#
MARKER = 12#
MARKER_DATA = 13#
MEMCPY = 1#
MEMCPY2 = 22#
MEMORY = 45#
MEMORY2 = 49#
MEMORY_POOL = 50#
MEMSET = 2#
MEM_DECOMPRESS = 54#
METRIC = 7#
METRIC_INSTANCE = 23#
MODULE = 27#
NAME = 11#
OPENACC_DATA = 33#
OPENACC_LAUNCH = 34#
OPENACC_OTHER = 35#
OPENMP = 47#
OVERHEAD = 17#
PCIE = 46#
PC_SAMPLING = 30#
PC_SAMPLING_RECORD_INFO = 31#
PREEMPTION = 19#
ROTATION = 55#
RUNTIME = 5#
SHARED_ACCESS = 29#
SOURCE_LOCATOR = 14#
STREAM = 37#
SYNCHRONIZATION = 38#
UNIFIED_MEMORY_COUNTER = 25#
class cupti.cupti.ActivityLaunchType(value)#

Bases: IntEnum

The type of the CUDA kernel launch.

See CUpti_ActivityLaunchType.

CBL_COMMANDLIST = 3#
COOPERATIVE_MULTI_DEVICE = 2#
COOPERATIVE_SINGLE_DEVICE = 1#
REGULAR = 0#
class cupti.cupti.ActivityMemcpyKind(value)#

Bases: IntEnum

The kind of a memory copy, indicating the source and destination targets of the copy. Each kind represents the source and destination targets of a memory copy. Targets are host, device, and array.

See CUpti_ActivityMemcpyKind.

ATOA = 5#
ATOD = 6#
ATOH = 4#
DTOA = 7#
DTOD = 8#
DTOH = 2#
FORCE_INT = 2147483647#
HTOA = 3#
HTOD = 1#
HTOH = 9#
PTOP = 10#
UNKNOWN = 0#
class cupti.cupti.ActivityMemoryKind(value)#

Bases: IntEnum

The kinds of memory accessed by a memory operation/copy. Each kind represents the type of the memory accessed by a memory operation/copy.

See CUpti_ActivityMemoryKind.

ARRAY = 4#
DEVICE = 3#
DEVICE_STATIC = 6#
FORCE_INT = 2147483647#
MANAGED = 5#
MANAGED_STATIC = 7#
PAGEABLE = 1#
PINNED = 2#
UNKNOWN = 0#
class cupti.cupti.ActivityMemoryOperationType(value)#

Bases: IntEnum

Memory operation types. Describes the type of memory operation, to be used with CUpti_ActivityMemory4.

See CUpti_ActivityMemoryOperationType.

ALLOCATION = 1#
FORCE_INT = 2147483647#
INVALID = 0#
RELEASE = 2#
class cupti.cupti.ActivityMemoryPoolOperationType(value)#

Bases: IntEnum

Memory pool operation types. Describes the type of memory pool operation, to be used with CUpti_ActivityMemoryPool2.

See CUpti_ActivityMemoryPoolOperationType.

CREATED = 1#
DESTROYED = 2#
FORCE_INT = 2147483647#
INVALID = 0#
TRIMMED = 3#
class cupti.cupti.ActivityMemoryPoolType(value)#

Bases: IntEnum

Memory pool types. Describes the type of memory pool, to be used with CUpti_ActivityMemory4.

See CUpti_ActivityMemoryPoolType.

FORCE_INT = 2147483647#
IMPORTED = 2#
INVALID = 0#
LOCAL = 1#
class cupti.cupti.ActivityObjectKind(value)#

Bases: IntEnum

The kinds of activity objects.

See CUpti_ActivityObjectKind.

CONTEXT = 4#
DEVICE = 3#
FORCE_INT = 2147483647#
PROCESS = 1#
STREAM = 5#
THREAD = 2#
UNKNOWN = 0#
class cupti.cupti.ActivityOverheadKind(value)#

Bases: IntEnum

The kinds of activity overhead.

See CUpti_ActivityOverheadKind.

ACTIVITY_BUFFER_REQUEST = 458752#
COMMAND_BUFFER_FULL = 393216#
CUPTI_BUFFER_FLUSH = 65536#
CUPTI_INSTRUMENTATION = 131072#
CUPTI_RESOURCE = 196608#
DRIVER_COMPILER = 1#
FORCE_INT = 2147483647#
LAZY_FUNCTION_LOADING = 327680#
RUNTIME_TRIGGERED_MODULE_LOADING = 262144#
UNKNOWN = 0#
UVM_ACTIVITY_INIT = 524288#
class cupti.cupti.ActivityPCSamplingPeriod(value)#

Bases: IntEnum

Sampling period for PC sampling method. Sampling period can be set using cuptiActivityConfigurePCSampling.

See CUpti_ActivityPCSamplingPeriod.

FORCE_INT = 2147483647#
HIGH = 4#
INVALID = 0#
LOW = 2#
MAX = 5#
MID = 3#
MIN = 1#
class cupti.cupti.ActivityPCSamplingStallReason(value)#

Bases: IntEnum

The stall reason for PC sampling activity.

See CUpti_ActivityPCSamplingStallReason.

CONSTANT_MEMORY_DEPENDENCY = 7#
EXEC_DEPENDENCY = 3#
FORCE_INT = 2147483647#
INST_FETCH = 2#
INVALID = 0#
MEMORY_DEPENDENCY = 4#
MEMORY_THROTTLE = 9#
NONE = 1#
NOT_SELECTED = 10#
OTHER = 11#
PIPE_BUSY = 8#
SLEEPING = 12#
SYNC = 6#
TEXTURE = 5#
class cupti.cupti.ActivityPartitionedGlobalCacheConfig(value)#

Bases: IntEnum

Partitioned global caching option.

See CUpti_ActivityPartitionedGlobalCacheConfig.

FORCE_INT = 2147483647#
NOT_SUPPORTED = 1#
OFF = 2#
ON = 3#
UNKNOWN = 0#
class cupti.cupti.ActivityPreemptionKind(value)#

Bases: IntEnum

The kind of a preemption activity.

See CUpti_ActivityPreemptionKind.

FORCE_INT = 2147483647#
RESTORE = 2#
SAVE = 1#
UNKNOWN = 0#
class cupti.cupti.ActivityStreamFlag(value)#

Bases: IntEnum

Stream type. The types of stream to be used with CUpti_ActivityStream.

See CUpti_ActivityStreamFlag.

FLAG_DEFAULT = 1#
FLAG_FORCE_INT = 2147483647#
FLAG_NON_BLOCKING = 2#
FLAG_NULL = 3#
FLAG_UNKNOWN = 0#
MASK = 65535#
class cupti.cupti.ActivitySynchronizationType(value)#

Bases: IntEnum

Synchronization type. The types of synchronization to be used with CUpti_ActivitySynchronization2.

See CUpti_ActivitySynchronizationType.

CONTEXT_SYNCHRONIZE = 4#
EVENT_SYNCHRONIZE = 1#
FORCE_INT = 2147483647#
STREAM_SYNCHRONIZE = 3#
STREAM_WAIT_EVENT = 2#
UNKNOWN = 0#
class cupti.cupti.ActivityThreadIdType(value)#

Bases: IntEnum

Thread-Id types. CUPTI uses different methods to obtain the thread-id depending on the support and the underlying platform. This enum documents these methods for each type. APIs cuptiSetThreadIdType and cuptiGetThreadIdType can be used to set and get the thread-id type.

See CUpti_ActivityThreadIdType.

DEFAULT = 0#
FORCE_INT = 2147483647#
SIZE = 2#
SYSTEM = 1#
class cupti.cupti.ActivityUnifiedMemoryAccessType(value)#

Bases: IntEnum

Memory access type for unified memory page faults. This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT.

See CUpti_ActivityUnifiedMemoryAccessType.

ATOMIC = 3#
PREFETCH = 4#
READ = 1#
UNKNOWN = 0#
WRITE = 2#
class cupti.cupti.ActivityUnifiedMemoryCounterKind(value)#

Bases: IntEnum

Kind of the Unified Memory counter. Many activities are associated with the Unified Memory mechanism; among them are transfers from host to device, transfers from device to host, and page faults on the host side.

See CUpti_ActivityUnifiedMemoryCounterKind.

BYTES_TRANSFER_DTOD = 8#
BYTES_TRANSFER_DTOH = 2#
BYTES_TRANSFER_HTOD = 1#
COUNT = 9#
CPU_PAGE_FAULT_COUNT = 3#
FORCE_INT = 2147483647#
GPU_PAGE_FAULT = 4#
REMOTE_MAP = 7#
THRASHING = 5#
THROTTLING = 6#
UNKNOWN = 0#
class cupti.cupti.ActivityUnifiedMemoryCounterScope(value)#

Bases: IntEnum

Scope of the unified memory counter (deprecated in CUDA 7.0).

See CUpti_ActivityUnifiedMemoryCounterScope.

COUNT = 3#
FORCE_INT = 2147483647#
PROCESS_ALL_DEVICES = 2#
PROCESS_SINGLE_DEVICE = 1#
UNKNOWN = 0#
class cupti.cupti.ActivityUnifiedMemoryMigrationCause(value)#

Bases: IntEnum

Migration cause of the Unified Memory counter. This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH, and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOD.

See CUpti_ActivityUnifiedMemoryMigrationCause.

ACCESS_COUNTERS = 5#
COHERENCE = 2#
EVICTION = 4#
PREFETCH = 3#
UNKNOWN = 0#
USER = 1#
class cupti.cupti.ActivityUnifiedMemoryRemoteMapCause(value)#

Bases: IntEnum

Remote memory map cause of the Unified Memory counter. This is valid for CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP.

See CUpti_ActivityUnifiedMemoryRemoteMapCause.

COHERENCE = 1#
EVICTION = 5#
OUT_OF_MEMORY = 4#
POLICY = 3#
THRASHING = 2#
UNKNOWN = 0#
class cupti.cupti.ChannelType(value)#

Bases: IntEnum

See CUpti_ChannelType.

ASYNC_MEMCPY = 2#
COMPUTE = 1#
DECOMP = 3#
FORCE_INT = 2147483647#
INVALID = 0#
class cupti.cupti.ComputeEngineCtxSwitchOperationType(value)#

Bases: IntEnum

The operation type of CUDA context switch event records.

See CUpti_ComputeEngineCtxSwitchOperationType.

COUNT = 2147483647#
INVALID = 0#
START = 1#
class cupti.cupti.ConfidentialComputeRotationEventType(value)#

Bases: IntEnum

See CUpti_ConfidentialComputeRotationEventType.

EVENT_TYPE_FORCE_INT = 2147483647#
INVALID_ROTATION_EVENT = 0#
KEY_ROTATION_ACKNOWLEGED = 2#
KEY_ROTATION_CHANNEL_BLOCKED = 1#
KEY_ROTATION_CHANNEL_DRAINED = 2#
KEY_ROTATION_CHANNEL_UNBLOCKED = 3#
KEY_ROTATION_COMPLETED = 4#
KEY_ROTATION_REQUESTED = 1#
KEY_ROTATION_STARTED = 3#
class cupti.cupti.ContextCigMode(value)#

Bases: IntEnum

CIG (CUDA in Graphics) Modes. Describes the CIG modes associated with the CUDA context.

See CUpti_ContextCigMode.

CIG = 1#
CIG_FALLBACK = 2#
FORCE_INT = 2147483647#
NONE = 0#
class cupti.cupti.DevType(value)#

Bases: IntEnum

The device type for device connected to NVLink.

See CUpti_DevType.

FORCE_INT = 2147483647#
GPU = 1#
INVALID = 0#
NPU = 2#
class cupti.cupti.DeviceAttribute(value)#

Bases: IntEnum

Device attributes. CUPTI device attributes. These attributes can be read using cuptiDeviceGetAttribute.

See CUpti_DeviceAttribute.

ATTR_CLASS = 10#
ATTR_FLOP_DP_PER_CYCLE = 12#
ATTR_FLOP_HP_PER_CYCLE = 17#
ATTR_FLOP_SP_PER_CYCLE = 11#
ATTR_FORCE_INT = 2147483647#
ATTR_GLOBAL_MEMORY_BANDWIDTH = 3#
ATTR_INSTRUCTION_PER_CYCLE = 4#
ATTR_INSTRUCTION_THROUGHPUT_SINGLE_PRECISION = 5#
ATTR_MAX_EVENT_DOMAIN_ID = 2#
ATTR_MAX_EVENT_ID = 1#
ATTR_MAX_FRAME_BUFFERS = 6#
ATTR_MAX_L2_UNITS = 13#
ATTR_MAX_SHARED_MEMORY_CACHE_CONFIG_PREFER_EQUAL = 16#
ATTR_MAX_SHARED_MEMORY_CACHE_CONFIG_PREFER_L1 = 15#
ATTR_MAX_SHARED_MEMORY_CACHE_CONFIG_PREFER_SHARED = 14#
ATTR_NVSWITCH_PRESENT = 20#
ATTR_PCIE_GEN = 9#
class cupti.cupti.DeviceVirtualizationMode(value)#

Bases: IntEnum

This indicates the virtualization mode in which the CUDA device is running.

See CUpti_DeviceVirtualizationMode.

FORCE_INT = 2147483647#
NONE = 0#
PASS_THROUGH = 1#
VIRTUAL_GPU = 2#
class cupti.cupti.EnvironmentClocksThrottleReason(value)#

Bases: IntEnum

Reasons for clock throttling. The possible reasons that a clock can be throttled. There can be more than one reason that a clock is being throttled, so these types can be combined by bitwise OR. These are used in the clocksThrottleReason field in the Environment Activity Record.

See CUpti_EnvironmentClocksThrottleReason.

FORCE_INT = 2147483647#
GPU_IDLE = 1#
HW_SLOWDOWN = 8#
NONE = 0#
SW_POWER_CAP = 4#
UNKNOWN = 2147483648#
UNSUPPORTED = 1073741824#
USER_DEFINED_CLOCKS = 2#
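Because throttle reasons are combined by bitwise OR, a single clocksThrottleReason value may need to be split back into its individual reasons. The following is a minimal sketch using a local IntFlag stand-in with values copied from the listing above. `ThrottleReason` and `decode_throttle_reasons` are hypothetical names, and FORCE_INT is left out because it is a padding value rather than a reason.

```python
from enum import IntFlag


class ThrottleReason(IntFlag):
    # Stand-in for cupti.cupti.EnvironmentClocksThrottleReason.
    NONE = 0
    GPU_IDLE = 1
    USER_DEFINED_CLOCKS = 2
    SW_POWER_CAP = 4
    HW_SLOWDOWN = 8
    UNSUPPORTED = 0x40000000
    UNKNOWN = 0x80000000


def decode_throttle_reasons(mask: int) -> list:
    """Return the names of all throttle reasons whose bit is set in mask."""
    return [
        member.name
        for member in ThrottleReason.__members__.values()
        if member.value != 0 and (mask & member.value) == member.value
    ]


print(decode_throttle_reasons(4 | 8))
print(decode_throttle_reasons(0))
```

An empty list corresponds to NONE, meaning that no throttle reason bit is set.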
class cupti.cupti.ExternalCorrelationKind(value)#

Bases: IntEnum

The kind of external APIs supported for correlation. Custom correlation kinds are reserved for usage in external tools.

See CUpti_ExternalCorrelationKind.

CUSTOM0 = 3#
CUSTOM1 = 4#
CUSTOM2 = 5#
FORCE_INT = 2147483647#
INVALID = 0#
OPENACC = 2#
SIZE = 6#
UNKNOWN = 1#
class cupti.cupti.FuncExecutionModel(value)#

Bases: IntEnum

The execution model of a kernel function. This should be used to set the executionModel field in kernel records.

See CUpti_FuncExecutionModel.

FORCE_INT = 2147483647#
SIMT = 1#
SIZE = 3#
TILE = 2#
UNKNOWN = 0#
class cupti.cupti.FuncShmemLimitConfig(value)#

Bases: IntEnum

The shared memory limit per block config for a kernel. This should be used to set the ‘cudaOccFuncShmemConfig’ field in the occupancy calculator API.

See CUpti_FuncShmemLimitConfig.

DEFAULT = 0#
FORCE_INT = 2147483647#
OPTIN = 1#
class cupti.cupti.OpenAccConstructKind(value)#

Bases: IntEnum

The OpenAcc parent construct kind for OpenAcc activity records.

See CUpti_OpenAccConstructKind.

ATOMIC = 8#
DATA = 4#
DECLARE = 9#
ENTER_DATA = 5#
EXIT_DATA = 6#
FORCE_INT = 2147483647#
HOST_DATA = 7#
INIT = 10#
KERNELS = 2#
LOOP = 3#
PARALLEL = 1#
ROUTINE = 14#
RUNTIME_API = 16#
SET = 12#
SHUTDOWN = 11#
UNKNOWN = 0#
UPDATE = 13#
WAIT = 15#
class cupti.cupti.OpenAccEventKind(value)#

Bases: IntEnum

The OpenAcc event kind for OpenAcc activity records.

See CUpti_OpenAccEventKind.

ALLOC = 15#
COMPUTE_CONSTRUCT = 9#
CREATE = 13#
DELETE = 14#
DEVICE_INIT = 1#
DEVICE_SHUTDOWN = 2#
ENQUEUE_DOWNLOAD = 6#
ENQUEUE_LAUNCH = 4#
ENQUEUE_UPLOAD = 5#
ENTER_DATA = 11#
EXIT_DATA = 12#
FORCE_INT = 2147483647#
FREE = 16#
IMPLICIT_WAIT = 8#
INVALID = 0#
RUNTIME_SHUTDOWN = 3#
UPDATE = 10#
WAIT = 7#
class cupti.cupti.OpenMpEventKind(value)#

Bases: IntEnum

See CUpti_OpenMpEventKind.

FORCE_INT = 2147483647#
IDLE = 4#
INVALID = 0#
PARALLEL = 1#
TASK = 2#
THREAD = 3#
WAIT_BARRIER = 5#
WAIT_TASKWAIT = 6#
class cupti.cupti.PcieDeviceType(value)#

Bases: IntEnum

Field to differentiate whether a PCIE Activity record is of a GPU or a PCI Bridge.

See CUpti_PcieDeviceType.

BRIDGE = 1#
FORCE_INT = 2147483647#
GPU = 0#

Classes#

class cupti.cupti.ActivityAPI#

Bases: object

Empty-initialize an instance of CUpti_ActivityAPI.

cbid#

The ID of the driver or runtime function.

Type:

int

correlation_id#

The correlation ID of the driver or runtime CUDA function. Each function invocation is assigned a unique correlation ID that is identical to the correlation ID in the memcpy, memset, or kernel activity record that is associated with this function.

Type:

int

end#

The end timestamp for the function, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the function.

Type:

int

static from_buffer(buffer)#

Create an ActivityAPI instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_DRIVER, CUPTI_ACTIVITY_KIND_RUNTIME, or CUPTI_ACTIVITY_KIND_INTERNAL_LAUNCH_API.

Type:

int

process_id#

The ID of the process where the driver or runtime CUDA function is executing.

Type:

int

ptr#

Get the pointer address to the data as Python int.

return_value#

The return value for the function. For a CUDA driver function this will be a CUresult value, and for a CUDA runtime function this will be a cudaError_t value.

Type:

int

start#

The start timestamp for the function, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the function.

Type:

int

thread_id#

The ID of the thread where the driver or runtime CUDA function is executing.

Type:

int
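The start and end members described above are enough to compute how long an API call took, provided the "both zero" case is handled. The sketch below is one way to do that. `api_duration_ns` is a hypothetical helper, and SimpleNamespace objects stand in for ActivityAPI records so the snippet runs without CUPTI.

```python
from types import SimpleNamespace


def api_duration_ns(record):
    """Return end - start in ns, or None if no timestamps were collected."""
    if record.start == 0 and record.end == 0:
        # Documented sentinel: timestamp information could not be collected.
        return None
    return record.end - record.start


# Stand-ins exposing only the two members the helper reads.
timed = SimpleNamespace(start=1_000, end=4_500)
untimed = SimpleNamespace(start=0, end=0)

print(api_duration_ns(timed))
print(api_duration_ns(untimed))
```

Returning None rather than 0 keeps records without timestamps from being mistaken for zero-length calls when durations are aggregated.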

class cupti.cupti.ActivityAutoBoostState#

Bases: object

Empty-initialize an instance of CUpti_ActivityAutoBoostState.

enabled#

Returned auto boost state. 1 is returned if auto boost is enabled, 0 otherwise.

Type:

int

static from_buffer(buffer)#

Create an ActivityAutoBoostState instance with the memory from the given buffer.

pid#

ID of the process that has set the current boost state. The value will be CUPTI_AUTO_BOOST_INVALID_CLIENT_PID if the user does not have permission to query process IDs or if there is an error in querying the process ID.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityCdpKernel#

Bases: object

Empty-initialize an instance of CUpti_ActivityCdpKernel.

block_x#

The X-dimension block size for the kernel.

Type:

int

block_y#

The Y-dimension block size for the kernel.

Type:

int

block_z#

The Z-dimension block size for the kernel.

Type:

int

cache_config#

_py_anon_pod7:

completed#

The timestamp when the kernel is marked as completed, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the completion time is unknown.

Type:

int

context_id#

The ID of the context where the kernel is executing.

Type:

int

correlation_id#

The correlation ID of the kernel. Each kernel execution is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the kernel.

Type:

int

device_id#

The ID of the device where the kernel is executing.

Type:

int

dynamic_shared_memory#

The dynamic shared memory reserved for the kernel, in bytes.

Type:

int

end#

The end timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.

Type:

int

static from_buffer(buffer)#

Create an ActivityCdpKernel instance with the memory from the given buffer.

grid_id#

The grid ID of the kernel. Each kernel execution is assigned a unique grid ID.

Type:

int

grid_x#

The X-dimension grid size for the kernel.

Type:

int

grid_y#

The Y-dimension grid size for the kernel.

Type:

int

grid_z#

The Z-dimension grid size for the kernel.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_CDP_KERNEL

Type:

int

local_memory_per_thread#

The amount of local memory reserved for each thread, in bytes.

Type:

int

local_memory_total#

The total amount of local memory reserved for the kernel, in bytes.

Type:

int

name#

The name of the kernel. This name is shared across all activity records representing the same kernel, and so should not be modified.

Type:

str

pad#

Undefined. Reserved for internal use.

Type:

int

parent_block_x#

The X-dimension of the parent block.

Type:

int

parent_block_y#

The Y-dimension of the parent block.

Type:

int

parent_block_z#

The Z-dimension of the parent block.

Type:

int

parent_grid_id#

The grid ID of the parent kernel.

Type:

int

ptr#

Get the pointer address to the data as Python int.

queued#

The timestamp when kernel is queued up, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the queued time is unknown.

Type:

int

registers_per_thread#

The number of registers required for each thread executing the kernel.

Type:

int

shared_memory_config#

The shared memory configuration used for the kernel. The value is one of the CUsharedconfig enumeration values from cuda.h.

Type:

int

start#

The start timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.

Type:

int
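The start/end convention above (0 for both means no timestamp data) can be handled with a small helper. This is a minimal sketch that takes plain integers rather than a record object; the function name kernel_duration_ns is hypothetical and not part of cupti.cupti.

```python
from typing import Optional


def kernel_duration_ns(start: int, end: int) -> Optional[int]:
    """Return the kernel duration in ns, or None if timestamps were not collected."""
    # Per the field descriptions, 0 for both start and end means
    # that timestamp information could not be collected.
    if start == 0 and end == 0:
        return None
    return end - start


print(kernel_duration_ns(1_000, 4_500))
print(kernel_duration_ns(0, 0))
```

With a real record, the call would presumably be kernel_duration_ns(record.start, record.end).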

static_shared_memory#

The static shared memory allocated for the kernel, in bytes.

Type:

int

stream_id#

The ID of the stream where the kernel is executing.

Type:

int

submitted#

The timestamp when kernel is submitted to the gpu, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the submission time is unknown.

Type:

int

class cupti.cupti.ActivityComputeEngineCtxSwitch#

Bases: object

Empty-initialize an instance of CUpti_ActivityComputeEngineCtxSwitch.

context_id#

The ID of the CUDA context.

Type:

int

static from_buffer(buffer)#

Create an ActivityComputeEngineCtxSwitch instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_COMPUTE_ENGINE_CTX_SWITCH

Type:

int

operation_type#

The type of the Compute Engine Context switch operation. CUPTI_COMPUTE_ENGINE_CTX_SWITCH_OPERATION_START indicates the start of the context switch operation. CUPTI_COMPUTE_ENGINE_CTX_SWITCH_OPERATION_END indicates the end of the context switch operation.

Type:

int

padding#

Type:

int

ptr#

Get the pointer address to the data as Python int.

timestamp#

The timestamp at which the CUpti_ComputeEngineCtxSwitchOperationType occurs.

Type:

int

class cupti.cupti.ActivityConfidentialComputeRotation#

Bases: object

Empty-initialize an instance of CUpti_ActivityConfidentialComputeRotation.

channel_id#

Channel ID

Type:

int

channel_type#

Channel type. See CUpti_ChannelType.

Type:

int

context_id#

Context ID

Type:

int

device_id#

Device ID

Type:

int

event_type#

Type of event. See CUpti_ConfidentialComputeRotationEventType.

Type:

int

static from_buffer(buffer)#

Create an ActivityConfidentialComputeRotation instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_CONFIDENTIAL_COMPUTE_ROTATION.

Type:

int

ptr#

Get the pointer address to the data as Python int.

timestamp#

Timestamp in ns

Type:

int

class cupti.cupti.ActivityContext4#

Bases: object

Empty-initialize an instance of CUpti_ActivityContext4.

cig_mode#

This field indicates the CIG mode

Type:

int

compute_api_kind#

The compute API kind. See CUpti_ActivityComputeApiKind.

Type:

int

context_id#

The context ID.

Type:

int

device_id#

The device ID.

Type:

int

static from_buffer(buffer)#

Create an ActivityContext4 instance with the memory from the given buffer.

is_green_context#

This field indicates whether the context is a green context

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_CONTEXT.

Type:

int

null_stream_id#

The ID for the NULL stream in this context

Type:

int

num_multiprocessors#

Number of multiprocessors assigned to the green context. Invalid if the field ‘isGreenContext’ is 0.

Type:

int

padding#

Undefined. Reserved for internal use.

Type:

int

padding2#

Undefined. Reserved for internal use.

Type:

int

parent_context_id#

The ID of the parent context. It is 0 if the context does not have a parent.

Type:

int

process_id#

The ID of the process associated with the context.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityCudaEvent2#

Bases: object

Empty-initialize an instance of CUpti_ActivityCudaEvent2.

context_id#

The ID of the context where the event was recorded.

Type:

int

correlation_id#

The correlation ID of the API to which this result is associated.

Type:

int

cuda_event_sync_id#

A unique ID that associates event synchronization records with the latest CUDA Event record. A similar field exists in CUpti_ActivitySynchronization2 to associate the CUDA Event record with the synchronization record. Because the same CUDA event can be used multiple times, the event ID alone is not unique enough to correlate a synchronization record with the latest CUDA Event record. This field is unique and can be used for that correlation.

Type:

int
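The correlation described for cuda_event_sync_id amounts to a dictionary lookup. The sketch below uses SimpleNamespace objects as hypothetical stand-ins for event and synchronization records. It assumes the synchronization record exposes the matching field under the same Python name, cuda_event_sync_id, which this page does not confirm.

```python
from types import SimpleNamespace


def match_syncs_to_events(sync_records, event_records):
    """Pair each synchronization record with the event record sharing its sync ID."""
    # Index event records by their unique sync ID.
    by_sync_id = {ev.cuda_event_sync_id: ev for ev in event_records}
    # A sync record without a matching event record is paired with None.
    return [(s, by_sync_id.get(s.cuda_event_sync_id)) for s in sync_records]


# Hypothetical stand-in records: the same event_id is reused with two sync IDs.
events = [
    SimpleNamespace(event_id=7, cuda_event_sync_id=100),
    SimpleNamespace(event_id=7, cuda_event_sync_id=101),
]
syncs = [SimpleNamespace(cuda_event_sync_id=101)]

pairs = match_syncs_to_events(syncs, events)
print(pairs[0][1].cuda_event_sync_id)
```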

device_id#

The ID of the device where the event was recorded.

Type:

int

device_timestamp#

The device-side timestamp of the CUDA event record, in nanoseconds. Collection of this field is disabled by default. It can be enabled by calling the CUPTI API cuptiActivityEnableCudaEventDeviceTimestamps.

Type:

int

event_id#

A unique event ID to identify the event record.

Type:

int

static from_buffer(buffer)#

Create an ActivityCudaEvent2 instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_CUDA_EVENT.

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

pad2#

Undefined. Reserved for internal use.

Type:

int

ptr#

Get the pointer address to the data as Python int.

reserved0#

Undefined. Reserved for internal use.

Type:

int

stream_id#

The compute stream where the event was recorded.

Type:

int

class cupti.cupti.ActivityDevice6#

Bases: object

Empty-initialize an instance of CUpti_ActivityDevice6.

compute_capability_major#

Compute capability for the device, major number.

Type:

int

compute_capability_minor#

Compute capability for the device, minor number.

Type:

int

compute_instance_id#

Compute instance ID for MIG-enabled devices. If MIG mode is disabled, the value is set to UINT32_MAX.

Type:

int

constant_memory_size#

The amount of constant memory on the device, in bytes.

Type:

int

core_clock_rate#

The core clock rate of the device, in kHz.

Type:

int

ecc_enabled#

ECC enabled flag for device

Type:

int

flags_#

The flags associated with the device. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityDevice6 instance with the memory from the given buffer.

global_memory_bandwidth#

The global memory bandwidth available on the device, in kBytes/sec.

Type:

int

global_memory_size#

The amount of global memory on the device, in bytes.

Type:

int

gpu_instance_id#

GPU instance ID for MIG-enabled devices. If MIG mode is disabled, the value is set to UINT32_MAX.

Type:

int
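The UINT32_MAX sentinel used by gpu_instance_id and compute_instance_id can be checked as follows. This is a minimal sketch on plain integers; the helper name mig_instance_ids is hypothetical.

```python
from typing import Optional, Tuple

UINT32_MAX = 0xFFFFFFFF  # largest value of an unsigned 32-bit integer


def mig_instance_ids(gpu_instance_id: int,
                     compute_instance_id: int) -> Optional[Tuple[int, int]]:
    """Return (gpu_instance_id, compute_instance_id), or None when MIG is disabled."""
    # Per the field descriptions, both IDs are UINT32_MAX when MIG mode is disabled.
    if gpu_instance_id == UINT32_MAX or compute_instance_id == UINT32_MAX:
        return None
    return (gpu_instance_id, compute_instance_id)


print(mig_instance_ids(1, 0))
print(mig_instance_ids(UINT32_MAX, UINT32_MAX))
```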

id#

The device ID.

Type:

int

is_cuda_visible#

Flag to indicate whether the device is visible to CUDA. Users can set the device visibility using the CUDA_VISIBLE_DEVICES environment variable.

Type:

int

is_mig_enabled#

MIG enabled flag for device

Type:

int

is_numa_node#

NUMA (non-uniform memory access) information for the device: whether or not the GPU is a NUMA node.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_DEVICE.

Type:

int

l2cache_size#

The size of the L2 cache on the device, in bytes.

Type:

int

max_block_dim_x#

Maximum allowed X dimension for a block.

Type:

int

max_block_dim_y#

Maximum allowed Y dimension for a block.

Type:

int

max_block_dim_z#

Maximum allowed Z dimension for a block.

Type:

int

max_blocks_per_multiprocessor#

Maximum number of blocks that can be present on a multiprocessor at any given time.

Type:

int

max_grid_dim_x#

Maximum allowed X dimension for a grid.

Type:

int

max_grid_dim_y#

Maximum allowed Y dimension for a grid.

Type:

int

max_grid_dim_z#

Maximum allowed Z dimension for a grid.

Type:

int

max_ipc#

The maximum “instructions per cycle” possible on each device multiprocessor.

Type:

int

max_registers_per_block#

Maximum number of registers that can be allocated to a block.

Type:

int

max_registers_per_multiprocessor#

Maximum number of 32-bit registers available per multiprocessor.

Type:

int

max_shared_memory_per_block#

Maximum amount of shared memory that can be assigned to a block, in bytes.

Type:

int

max_shared_memory_per_multiprocessor#

Maximum amount of shared memory available per multiprocessor, in bytes.

Type:

int

max_threads_per_block#

Maximum number of threads allowed in a block.

Type:

int

max_warps_per_multiprocessor#

Maximum number of warps that can be present on a multiprocessor at any given time.

Type:

int

mig_uuid#

The MIG UUID. This value is the globally unique immutable alphanumeric identifier of the device.

Type:

cuda.bindings.driver.CUuuid

name#

The device name. Client is responsible for freeing this memory using the free function when done.

Type:

str

num_memcpy_engines#

Number of memory copy engines on the device.

Type:

int

num_multiprocessors#

Number of multiprocessors on the device.

Type:

int

num_threads_per_warp#

The number of threads per warp on the device.

Type:

int
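Several of the device limits above combine into derived quantities. The sketch below computes the maximum number of resident threads per multiprocessor and per device from max_warps_per_multiprocessor, num_threads_per_warp, and num_multiprocessors. The function name and the sample values are hypothetical and do not describe any particular GPU.

```python
def max_resident_threads(num_multiprocessors: int,
                         max_warps_per_multiprocessor: int,
                         num_threads_per_warp: int) -> dict:
    """Derive resident-thread limits from the device record fields."""
    # Threads resident on one multiprocessor = warps per SM * threads per warp.
    per_sm = max_warps_per_multiprocessor * num_threads_per_warp
    return {
        "per_multiprocessor": per_sm,
        "per_device": per_sm * num_multiprocessors,
    }


# Hypothetical sample values, for illustration only.
limits = max_resident_threads(num_multiprocessors=80,
                              max_warps_per_multiprocessor=64,
                              num_threads_per_warp=32)
print(limits)
```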

num_tpcs#

Number of TPCs on the device.

Type:

int

numa_id#

NUMA (non-uniform memory access) information for the device: the NUMA node ID of the GPU memory. If the GPU is not a NUMA node, the value is invalidNumaId.

Type:

int

ptr#

Get the pointer address to the data as Python int.

reserved0#

Undefined. Reserved for internal use to maintain 8-byte alignment.

Type:

int

uuid#

The device UUID. This value is the globally unique immutable alphanumeric identifier of the device.

Type:

cuda.bindings.driver.CUuuid

class cupti.cupti.ActivityDeviceAttribute#

Bases: object

Empty-initialize an instance of CUpti_ActivityDeviceAttribute.

attribute#

The attribute, either a CUpti_DeviceAttribute or a CUdevice_attribute. The flag CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE indicates which kind of attribute this is: if CUPTI_ACTIVITY_FLAG_DEVICE_ATTRIBUTE_CUDEVICE is 1, the CUdevice_attribute field is valid; otherwise the CUpti_DeviceAttribute field is valid.

Type:

_py_anon_pod9

device_id#

The ID of the device that this attribute applies to.

Type:

int

flags_#

The flags associated with the device. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityDeviceAttribute instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_DEVICE_ATTRIBUTE.

Type:

int

ptr#

Get the pointer address to the data as Python int.

value#

The value for the attribute. See CUpti_DeviceAttribute and CUdevice_attribute for the type of the value for a given attribute.

Type:

_py_anon_pod10

class cupti.cupti.ActivityDeviceGraphTrace#

Bases: object

Empty-initialize an instance of CUpti_ActivityDeviceGraphTrace.

context_id#

The ID of the context where the first node of the graph is executed.

Type:

int

device_id#

The ID of the device where the first node of the graph is executed.

Type:

int

device_launch_mode#

The type of launch. See CUpti_DeviceGraphLaunchMode

Type:

int

end#

The end timestamp for the graph execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the graph.

Type:

int

static from_buffer(buffer)#

Create an ActivityDeviceGraphTrace instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that is launched.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_DEVICE_GRAPH_TRACE

Type:

int

launcher_graph_id#

The unique ID of the graph that has launched this graph.

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

The start timestamp for the graph execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the graph.

Type:

int

stream_id#

The ID of the stream where the graph is being launched.

Type:

int

class cupti.cupti.ActivityEnvironment#

Bases: object

Empty-initialize an instance of CUpti_ActivityEnvironment.

data_#

Type:

_py_anon_pod11

device_id#

The ID of the device

Type:

int

environment_kind#

The kind of data reported in this record.

Type:

int

static from_buffer(buffer)#

Create an ActivityEnvironment instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_ENVIRONMENT.

Type:

int

ptr#

Get the pointer address to the data as Python int.

timestamp#

The timestamp when this sample was retrieved, in ns. A value of 0 indicates that timestamp information could not be collected for the sample.

Type:

int

class cupti.cupti.ActivityEnvironmentCooling#

Bases: object

Empty-initialize an instance of CUpti_ActivityEnvironmentCooling.

fan_speed#

The fan speed as percentage of maximum.

Type:

int

static from_buffer(buffer)#

Create an ActivityEnvironmentCooling instance with the memory from the given buffer.

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityEnvironmentPower#

Bases: object

Empty-initialize an instance of CUpti_ActivityEnvironmentPower.

static from_buffer(buffer)#

Create an ActivityEnvironmentPower instance with the memory from the given buffer.

power#

Type:

int

power_limit#

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityEnvironmentSpeed#

Bases: object

Empty-initialize an instance of CUpti_ActivityEnvironmentSpeed.

clocks_throttle_reasons#

The clocks throttle reasons.

Type:

int

static from_buffer(buffer)#

Create an ActivityEnvironmentSpeed instance with the memory from the given buffer.

memory_clock#

The memory frequency in MHz

Type:

int

The PCIe link generation.

Type:

int

The PCIe link width.

Type:

int

ptr#

Get the pointer address to the data as Python int.

sm_clock#

The SM frequency in MHz

Type:

int

class cupti.cupti.ActivityEnvironmentTemperature#

Bases: object

Empty-initialize an instance of CUpti_ActivityEnvironmentTemperature.

static from_buffer(buffer)#

Create an ActivityEnvironmentTemperature instance with the memory from the given buffer.

gpu_temperature#

The GPU temperature in degrees C.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityExternalCorrelation#

Bases: object

Empty-initialize an instance of CUpti_ActivityExternalCorrelation.

correlation_id#

The correlation ID of the associated CUDA driver or runtime API record.

Type:

int

external_id#

The correlation ID of the associated non-CUDA API record. The exact field in the associated external record depends on that record’s activity kind (externalKind).

Type:

int

external_kind#

The kind of external API this record correlated to.

Type:

int

static from_buffer(buffer)#

Create an ActivityExternalCorrelation instance with the memory from the given buffer.

kind#

The kind of this activity.

Type:

int

ptr#

Get the pointer address to the data as Python int.
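External correlation records are typically used to look up which external (non-CUDA) IDs belong to a given CUDA API call. The sketch below builds that lookup from objects exposing correlation_id, external_kind, and external_id. The SimpleNamespace records and the function name are hypothetical stand-ins, not cupti.cupti objects.

```python
from types import SimpleNamespace


def build_external_map(records):
    """Map each CUDA correlation ID to its (external_kind, external_id) pairs."""
    mapping = {}
    for rec in records:
        # One CUDA API call may be associated with several external records.
        mapping.setdefault(rec.correlation_id, []).append(
            (rec.external_kind, rec.external_id)
        )
    return mapping


# Hypothetical stand-in records; the external_kind values are arbitrary integers.
records = [
    SimpleNamespace(correlation_id=42, external_kind=1, external_id=9001),
    SimpleNamespace(correlation_id=42, external_kind=2, external_id=77),
    SimpleNamespace(correlation_id=43, external_kind=1, external_id=9002),
]
external_map = build_external_map(records)
print(external_map[42])
```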

class cupti.cupti.ActivityFunction#

Bases: object

Empty-initialize an instance of CUpti_ActivityFunction.

context_id#

The ID of the context where the function is launched.

Type:

int

static from_buffer(buffer)#

Create an ActivityFunction instance with the memory from the given buffer.

function_index#

The function’s unique symbol index in the module.

Type:

int

id#

ID to uniquely identify the record

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_FUNCTION.

Type:

int

module_id#

The module ID in which this global/device function is present.

Type:

int

name#

The name of the function. This name is shared across all activity records representing the same kernel, and so should not be modified.

Type:

str

pad#

Undefined. Reserved for internal use.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityGraphHostNode#

Bases: object

Empty-initialize an instance of CUpti_ActivityGraphHostNode.

context_id#

The ID of the CUDA context to which the waiting CUDA stream belongs.

Type:

int

correlation_id#

The correlation ID of the graph host node operation. Each operation is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the operation.

Type:

int

device_id#

The ID of the CUDA device to which the CUDA context and stream belong. The graph host node is executing on the CPU, but it is associated with the CUDA device through the CUDA context and stream.

Type:

int

end#

The end timestamp. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the end time is unknown.

Type:

int

static from_buffer(buffer)#

Create an ActivityGraphHostNode instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that executed this host node through graph launch.

Type:

int

graph_node_id#

The unique ID of the graph node that executed this host node through graph launch.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_GRAPH_HOST_NODE

Type:

int

process_id#

The ID of the process where the host node is executing.

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

The start timestamp. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the start time is unknown.

Type:

int

stream_id#

The ID of the stream that waits for the graph host node to finish.

Type:

int

thread_id#

The ID of the thread where the host node is executing.

Type:

int

class cupti.cupti.ActivityGraphTrace2#

Bases: object

Empty-initialize an instance of CUpti_ActivityGraphTrace2.

context_id#

The ID of the context where the first node of the graph is executed. If this is INT_MAX, then the start is on the host.

Type:

int

correlation_id#

The correlation ID of the graph launch. Each graph launch is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the graph.

Type:

int

device_id#

The ID of the device where the first node of the graph is executed. If this is INT_MAX, then the start is on the host.

Type:

int
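The INT_MAX sentinel on context_id and device_id can be tested directly. This minimal sketch assumes INT_MAX is the usual 32-bit C value (2**31 - 1); the helper name is hypothetical.

```python
INT_MAX = 2**31 - 1  # assumption: 32-bit C int, as on common platforms


def graph_starts_on_host(context_id: int, device_id: int) -> bool:
    """Return True when the first graph node ran on the host."""
    # Per the field descriptions, INT_MAX in either field marks a host-side start.
    return context_id == INT_MAX or device_id == INT_MAX


print(graph_starts_on_host(INT_MAX, INT_MAX))
print(graph_starts_on_host(1, 0))
```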

end#

The end timestamp for the graph execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the graph.

Type:

int

end_context_id#

The ID of the context where the last node of the graph is executed.

Type:

int

end_device_id#

The ID of the device where last node of the graph is executed

Type:

int

static from_buffer(buffer)#

Create an ActivityGraphTrace2 instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that is launched.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_GRAPH_TRACE

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

The start timestamp for the graph execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the graph.

Type:

int

stream_id#

The ID of the stream where the graph is being launched.

Type:

int

class cupti.cupti.ActivityGreenContext2#

Bases: object

Empty-initialize an instance of CUpti_ActivityGreenContext2.

context_id#

The context ID of the green context.

Type:

int

device_id#

The device ID associated with the green context.

Type:

int

static from_buffer(buffer)#

Create an ActivityGreenContext2 instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_GREEN_CONTEXT.

Type:

int

logical_tpc_mask#

(array of length 32). Bitset of allocated TPC IDs represented as 32-bit words. Valid words are specified by logicalTpcMaskSize; unused entries are zeroed. The array supports up to 1024 TPCs (32 words × 32 bits). Each bit represents a logical TPC ID assigned to the green context. For example, if bit k is set, logical TPC ID k is assigned to the context.

Example: interpreting the TPC mask. For a green context with numTpcs=3, logicalTpcMaskSize=8, and logicalTpcMask={66,8,0,0,0,0,0,0}:

  • Word 0: value 66 (binary …01000010, showing the lowest 8 bits of the 32-bit word) indicates TPCs 1 and 6.

  • Word 1: value 8 (binary …00001000) indicates TPC 35 (bit 3 of word 1, i.e., global TPC ID 32+3).

  • Each word is 32 bits, covering TPCs 0-31 (word 0), 32-63 (word 1), etc.

  • To check whether TPC N is allocated: (logicalTpcMask[N/32] & (1U << (N%32))).

  • logicalTpcMaskSize indicates how many words in the array contain valid data.

Type:

uint32
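The bit test described for logical_tpc_mask translates directly to Python. The sketch below works on a plain list of integers and reproduces the documented example mask {66,8,0,0,0,0,0,0}. The function names are hypothetical; how the mask array is exposed on a real record is not shown here.

```python
def allocated_tpcs(mask_words, mask_size):
    """Return the sorted list of logical TPC IDs whose bits are set."""
    tpcs = []
    # Only the first mask_size words contain valid data.
    for word_index in range(mask_size):
        word = mask_words[word_index]
        for bit in range(32):
            if word & (1 << bit):
                tpcs.append(word_index * 32 + bit)
    return tpcs


def is_tpc_allocated(mask_words, n):
    """Equivalent of (logicalTpcMask[N/32] & (1U << (N%32))) != 0."""
    return bool(mask_words[n // 32] & (1 << (n % 32)))


mask = [66, 8, 0, 0, 0, 0, 0, 0]
print(allocated_tpcs(mask, 8))    # [1, 6, 35]
print(is_tpc_allocated(mask, 35)) # True
```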

logical_tpc_mask_size#

The size (in 32-bit words) of the logical TPC mask stored in logicalTpcMask[].

Type:

int

num_multiprocessors#

The number of multiprocessors (SMs) allocated to the green context.

Type:

int

num_tpcs#

The number of TPCs allocated to the green context.

Type:

int

padding#

Undefined. Reserved for internal use.

Type:

int

parent_context_id#

The ID of the parent context.

Type:

int

ptr#

Get the pointer address to the data as Python int.

workqueue_concurrency_limit#

The concurrency limit for the work queue associated with the green context. This defines the maximum number of concurrent operations allowed on this work queue.

Type:

int

workqueue_resource_id#

The work queue resource ID associated with the green context. This is a unique identifier for the work queue resource used by this green context.

Type:

int

workqueue_sharing_scope#

The sharing scope for the work queue associated with the green context. This defines how the work queue can be shared across different contexts or processes. Refer to CUdevWorkqueueConfigScope in cuda.h for possible values.

Type:

int

class cupti.cupti.ActivityHostLaunch#

Bases: object

Empty-initialize an instance of CUpti_ActivityHostLaunch.

context_id#

The ID of the CUDA context to which the waiting CUDA stream belongs.

Type:

int

correlation_id#

The correlation ID of the host launch operation. Each operation is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the operation.

Type:

int

device_id#

The ID of the CUDA device to which the CUDA context and stream belong. The host function is executing on the CPU, but it is associated with the CUDA device through the CUDA context and stream.

Type:

int

end#

The end timestamp. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the end time is unknown.

Type:

int

static from_buffer(buffer)#

Create an ActivityHostLaunch instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_HOST_LAUNCH

Type:

int

padding#

Type:

int

process_id#

The ID of the process where the host function is executing.

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

The start timestamp. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the start time is unknown.

Type:

int

stream_id#

The ID of the CUDA stream that waits for the host function to finish.

Type:

int

thread_id#

The ID of the thread where the host function is executing.

Type:

int

class cupti.cupti.ActivityJit2#

Bases: object

Empty-initialize an instance of CUpti_ActivityJit2.

cache_path#

The path where the fat binary is cached.

Type:

str

cache_size#

The size of compute cache.

Type:

int

correlation_id#

The correlation ID of the JIT operation to which the records belong. Each JIT operation is assigned a unique correlation ID that is identical to the correlation ID in the driver or runtime API activity record that launched the JIT operation.

Type:

int

device_id#

The device ID.

Type:

int

end#

The end timestamp for the JIT operation, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the JIT operation.

Type:

int

static from_buffer(buffer)#

Create an ActivityJit2 instance with the memory from the given buffer.

jit_entry_type#

The JIT entry type.

Type:

int

jit_operation_correlation_id#

The correlation ID to correlate JIT compilation, load and store operations. Each JIT compilation unit is assigned a unique correlation ID at the time of the JIT compilation. This correlation id can be used to find the matching JIT cache load/store records.

Type:

int

jit_operation_type#

The JIT operation type.

Type:

int

kind#

The activity record kind must be CUPTI_ACTIVITY_KIND_JIT.

Type:

int

padding#

Internal use.

Type:

int

process_id#

The ID of the process where the JIT operation is executing.

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

The start timestamp for the JIT operation, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the JIT operation.

Type:

int

thread_id#

The ID of the thread where the JIT operation is executing.

Type:

int

class cupti.cupti.ActivityKernel12#

Bases: object

Empty-initialize an instance of CUpti_ActivityKernel12.

block_x#

The X-dimension block size for the kernel.

Type:

int

block_y#

The Y-dimension block size for the kernel.

Type:

int

block_z#

The Z-dimension block size for the kernel.

Type:

int

cache_config#

For devices with compute capability 7.5 and higher, the cacheConfig values are not updated if the field isSharedMemoryCarveoutRequested is set.

Type:

_py_anon_pod16

channel_id#

The ID of the HW channel on which the kernel is launched.

Type:

int

channel_type#

The type of the channel

Type:

int

cluster_scheduling_policy#

The cluster scheduling policy for the kernel. Refer to CUclusterSchedulingPolicy. This field is valid for devices with compute capability 9.0 and higher.

Type:

int

cluster_x#

The X-dimension cluster size for the kernel. This field is valid for devices with compute capability 9.0 and higher.

Type:

int

cluster_y#

The Y-dimension cluster size for the kernel. This field is valid for devices with compute capability 9.0 and higher.

Type:

int

cluster_z#

The Z-dimension cluster size for the kernel. This field is valid for devices with compute capability 9.0 and higher.

Type:

int

completed#

The completed timestamp for the kernel execution, in ns. It represents the completion of all its child kernels and the kernel itself. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the completion time is unknown.

Type:

int

context_id#

The ID of the context where the kernel is executing.

Type:

int

correlation_id#

The correlation ID of the kernel. Each kernel execution is assigned a unique correlation ID that is identical to the correlation ID in the driver or runtime API activity record that launched the kernel.

Type:

int

device_id#

The ID of the device where the kernel is executing.

Type:

int

dynamic_shared_memory#

The dynamic shared memory reserved for the kernel, in bytes.

Type:

int

end#

The end timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.

Type:

int

execution_model#

The execution model of the kernel function. See CUpti_FuncExecutionModel.

Type:

int

static from_buffer(buffer)#

Create an ActivityKernel12 instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that launched this kernel through graph launch APIs. This field will be 0 if the kernel is not launched through graph launch APIs.

Type:

int

graph_node_id#

The unique ID of the graph node that launched this kernel through graph launch APIs. This field will be 0 if the kernel is not launched through graph launch APIs.

Type:

int

grid_id#

The grid ID of the kernel. Each kernel is assigned a unique grid ID at runtime.

Type:

int

grid_x#

The X-dimension grid size for the kernel.

Type:

int

grid_y#

The Y-dimension grid size for the kernel.

Type:

int

grid_z#

The Z-dimension grid size for the kernel.

Type:

int
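The grid and block dimensions together determine how many threads a launch creates. The sketch below derives the block count, threads per block, and total threads from the six dimension fields. The function name is hypothetical and the sample launch shape is for illustration only.

```python
def launch_geometry(grid, block):
    """Return (num_blocks, threads_per_block, total_threads) for a launch.

    grid  -- (grid_x, grid_y, grid_z)
    block -- (block_x, block_y, block_z)
    """
    num_blocks = grid[0] * grid[1] * grid[2]
    threads_per_block = block[0] * block[1] * block[2]
    return num_blocks, threads_per_block, num_blocks * threads_per_block


# Hypothetical launch: a 128 x 2 x 1 grid of 16 x 16 x 1 blocks.
print(launch_geometry((128, 2, 1), (16, 16, 1)))
```

With a real record, the tuples would presumably be built from record.grid_x, record.grid_y, record.grid_z and record.block_x, record.block_y, record.block_z.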

is_device_launched#

This field is set to 1 if the kernel is part of a device launched graph.

Type:

int

is_shared_memory_carveout_requested#

This indicates if CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT was updated for the kernel launch

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_KERNEL or CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL.

Type:

int

launch_type#

This indicates whether the kernel was executed via a regular launch or via a single-device or multi-device cooperative launch. See CUpti_ActivityLaunchType.

Type:

int

local_memory_per_thread#

The amount of local memory reserved for each thread, in bytes.

Type:

int

local_memory_total#

The total amount of local memory reserved for the kernel, in bytes (deprecated in CUDA 11.8). Refer to the field localMemoryTotal_v2.

Type:

int

local_memory_total_v2#

The total amount of local memory reserved for the kernel, in bytes.

Type:

int

max_active_clusters#

The maximum clusters that could co-exist on the target device for the kernel

Type:

int

max_potential_cluster_size#

The maximum cluster size for the kernel

Type:

int

name#

The name of the kernel. This name is shared across all activity records representing the same kernel, and so should not be modified.

Type:

str

p_access_policy_window#

The pointer to the access policy window. The structure CUaccessPolicyWindow is defined in cuda.h.

Type:

int

padding#

Undefined. Reserved for internal use.

Type:

int

padding3#

(array of length 7).

Type:

uint8

padding4#

Type:

int

padding5#

Type:

int

partitioned_global_cache_executed#

The partitioned global caching executed for the kernel. Partitioned global caching is required to enable caching on certain chips, such as devices with compute capability 5.2. Partitioned global caching can be automatically disabled if the occupancy requirement of the launch cannot support caching.

Type:

int

partitioned_global_cache_requested#

The partitioned global caching requested for the kernel. Partitioned global caching is required to enable caching on certain chips, such as devices with compute capability 5.2.

Type:

int

priority#

The launch priority of the kernel.

Type:

int

ptr#

Get the pointer address to the data as Python int.

queued#

The timestamp when the kernel is queued up in the command buffer, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the queued time could not be collected for the kernel. This timestamp is not collected by default. Use API cuptiActivityEnableLatencyTimestamps() to enable collection. Command buffer is a buffer written by CUDA driver to send commands like kernel launch, memory copy etc to the GPU. All launches of CUDA kernels are asynchronous with respect to the host, the host requests the launch by writing commands into the command buffer, then returns without checking the GPU’s progress.

Type:

int

registers_per_thread#

The number of registers required for each thread executing the kernel.

Type:

int

reserved0#

Undefined. Reserved for internal use.

Type:

int

shared_memory_carveout_requested#

Shared memory carveout value requested for the function in percentage of the total resource. The value will be updated only if field isSharedMemoryCarveoutRequested is set.

Type:

int

shared_memory_config#

The shared memory configuration used for the kernel. The value is one of the CUsharedconfig enumeration values from cuda.h.

Type:

int

shared_memory_executed#

Shared memory size set by the driver.

Type:

int

shmem_limit_config#

The shared memory limit config for the kernel. This field shows whether user has opted for a higher per block limit of dynamic shared memory.

Type:

int

start#

The start timestamp for the kernel execution, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the kernel.

Type:

int

static_shared_memory#

The static shared memory allocated for the kernel, in bytes.

Type:

int

stream_id#

The ID of the stream where the kernel is executing.

Type:

int

submitted#

The timestamp when the command buffer containing the kernel launch is submitted to the GPU, in ns. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the submitted time could not be collected for the kernel. This timestamp is not collected by default; call cuptiActivityEnableLatencyTimestamps() to enable collection.

Type:

int
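The queued, submitted, start and end timestamps described above can be combined into a rough latency breakdown. The following is a minimal sketch, not part of the bindings: `kernel_latency_ns` is a hypothetical helper, the record is a stand-in object with the same attribute names, and CUPTI_TIMESTAMP_UNKNOWN is assumed to be 0 (check the C header for your CUPTI version).

```python
from types import SimpleNamespace

# Assumption: CUPTI_TIMESTAMP_UNKNOWN is 0 in the C header; verify for your version.
TIMESTAMP_UNKNOWN = 0


def kernel_latency_ns(record, unknown=TIMESTAMP_UNKNOWN):
    """Return durations in ns; an entry is None when its timestamps are missing."""
    # start == end == 0 means timestamps could not be collected for the kernel.
    has_exec = not (record.start == 0 and record.end == 0)
    has_queued = record.queued != unknown
    has_submitted = record.submitted != unknown
    return {
        "execution": record.end - record.start if has_exec else None,
        "queue_to_submit": (
            record.submitted - record.queued
            if has_queued and has_submitted else None
        ),
        "submit_to_start": (
            record.start - record.submitted
            if has_submitted and has_exec else None
        ),
    }


# Stand-in for a kernel activity record; real records come from an activity buffer.
rec = SimpleNamespace(queued=1_000, submitted=1_400, start=2_000, end=5_000)
print(kernel_latency_ns(rec))
```

Returning None instead of a negative or zero duration keeps records without latency timestamps from skewing aggregate statistics.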

class cupti.cupti.ActivityMarker2#

Bases: object

Empty-initialize an instance of CUpti_ActivityMarker2.

domain#

The name of the domain to which this marker belongs. This will be NULL for the default domain.

Type:

str

flags_#

The flags associated with the marker. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityMarker2 instance with the memory from the given buffer.

id#

The marker ID.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MARKER.

Type:

int

name#

The marker name for an instantaneous or start marker. This will be NULL for an end marker.

Type:

str

object_id#

The identifier for the activity object associated with this marker. ‘objectKind’ indicates which ID is valid for this record.

Type:

ActivityObjectKindId

object_kind#

The kind of activity object associated with this marker.

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

ptr#

Get the pointer address to the data as Python int.

timestamp#

The timestamp for the marker, in ns. A value of 0 indicates that timestamp information could not be collected for the marker.

Type:

int
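Because an end marker carries no name, a range duration has to be reconstructed by pairing markers. The sketch below is one way to do that under stated assumptions: start and end markers of the same range are assumed to share the same `id`, the flag bit values are assumed to mirror CUpti_ActivityFlag in the C header, and `marker_ranges` is a hypothetical helper operating on stand-in objects.

```python
from types import SimpleNamespace

# Assumption: bit values mirror CUPTI_ACTIVITY_FLAG_MARKER_START / _END
# in the C header; verify them before relying on this.
FLAG_MARKER_START = 1 << 1
FLAG_MARKER_END = 1 << 2


def marker_ranges(markers):
    """Pair start and end markers that share an id; return (name, duration_ns)."""
    open_ranges = {}  # marker id -> (name, start timestamp)
    ranges = []
    for m in markers:
        if m.flags_ & FLAG_MARKER_START:
            open_ranges[m.id] = (m.name, m.timestamp)
        elif m.flags_ & FLAG_MARKER_END and m.id in open_ranges:
            name, started = open_ranges.pop(m.id)
            # A timestamp of 0 means the time could not be collected.
            duration = m.timestamp - started if started and m.timestamp else None
            ranges.append((name, duration))
    return ranges


# Stand-ins for two marker records describing one range.
markers = [
    SimpleNamespace(id=7, name="load", flags_=FLAG_MARKER_START, timestamp=100),
    SimpleNamespace(id=7, name=None, flags_=FLAG_MARKER_END, timestamp=350),
]
print(marker_ranges(markers))
```

End markers without a matching start are ignored, and instantaneous markers are skipped because they set neither of the two bits tested here.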

class cupti.cupti.ActivityMarkerData2#

Bases: object

Empty-initialize an instance of CUpti_ActivityMarkerData2.

category#

The category for the marker.

Type:

int

color#

The color for the marker.

Type:

int

cupti_domain_id#

CUPTI maintained domain id required for NVTX extended payloads. To parse the payload correctly, the domain id must be used to identify the payload attributes as they are domain specific.

Type:

int

flags_#

The flags associated with the marker. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityMarkerData2 instance with the memory from the given buffer.

id#

The marker ID.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MARKER_DATA.

Type:

int

padding#

Reserved for internal use.

Type:

int

payload#

The payload value.

Type:

MetricValue

payload_kind#

Defines the payload format for the value associated with the marker.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityMemDecompress#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemDecompress.

channel_id#

The ID of the HW channel on which the memory copy is occurring.

Type:

int

channel_type#

The type of the channel

Type:

int

context_id#

The ID of the context.

Type:

int

correlation_id#

The correlation ID of the decompression operations. Each operation is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the operation.

Type:

int

device_id#

The ID of the device.

Type:

int

end#

The end timestamp. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the end time is unknown.

Type:

int

static from_buffer(buffer)#

Create an ActivityMemDecompress instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEM_DECOMPRESS

Type:

int

number_of_operations#

The number of operations in the batch.

Type:

int

ptr#

Get the pointer address to the data as Python int.

reserved0#

This field is reserved for internal use

Type:

int

source_bytes#

The number of bytes to be read and decompressed in the batch operation.

Type:

int

start#

The start timestamp. A value of CUPTI_TIMESTAMP_UNKNOWN indicates that the start time is unknown.

Type:

int

stream_id#

The ID of the stream.

Type:

int

class cupti.cupti.ActivityMemcpy6#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemcpy6.

bytes#

The number of bytes transferred by the memory copy.

Type:

int

channel_id#

The ID of the HW channel on which the memory copy is occurring.

Type:

int

channel_type#

The type of the channel

Type:

int

context_id#

The ID of the context where the memory copy is occurring.

Type:

int

copy_count#

The total number of memcopy operations traced in this record. This field is valid for memcpy operations happening using MemcpyBatchAsync APIs in CUDA. In MemcpyBatchAsync APIs, multiple memcpy operations are batched together for optimization purposes based on certain heuristics. For other memcpy operations, this field will be 1.

Type:

int

copy_kind#

The kind of the memory copy, stored as a byte to reduce record size. See CUpti_ActivityMemcpyKind.

Type:

int

correlation_id#

The correlation ID of the memory copy. Each memory copy is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the memory copy.

Type:

int

device_id#

The ID of the device where the memory copy is occurring.

Type:

int

dst_kind#

The destination memory kind written by the memory copy, stored as a byte to reduce record size. See CUpti_ActivityMemoryKind.

Type:

int

end#

The end timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy.

Type:

int

flags_#

The flags associated with the memory copy. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityMemcpy6 instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that executed this memcpy through graph launch. This field will be 0 if the memcpy is not done through graph launch.

Type:

int

graph_node_id#

The unique ID of the graph node that executed this memcpy through graph launch. This field will be 0 if the memcpy is not done through graph launch.

Type:

int

is_device_launched#

This field is used to indicate if the memcpy operation is part of a device graph launch.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMCPY.

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

pad2#

(array of length 3). Reserved for internal use.

Type:

uint8

ptr#

Get the pointer address to the data as Python int.

reserved0#

Undefined. Reserved for internal use.

Type:

int

runtime_correlation_id#

The runtime correlation ID of the memory copy. Each memory copy is assigned a unique runtime correlation ID that is identical to the correlation ID in the runtime API activity record that launched the memory copy.

Type:

int

src_kind#

The source memory kind read by the memory copy, stored as a byte to reduce record size. See CUpti_ActivityMemoryKind.

Type:

int

start#

The start timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy.

Type:

int

stream_id#

The ID of the stream where the memory copy is occurring.

Type:

int
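The `bytes`, `start` and `end` fields are enough to estimate the effective throughput of a copy. A minimal sketch, assuming a record object with those attributes (`memcpy_throughput_gbps` is a hypothetical helper, and the record below is a stand-in rather than a real ActivityMemcpy6 instance):

```python
from types import SimpleNamespace


def memcpy_throughput_gbps(record):
    """Return throughput in GB/s (1e9 bytes/s), or None if timing is unavailable."""
    # start == end == 0 means timestamps could not be collected.
    if record.start == 0 and record.end == 0:
        return None
    elapsed_ns = record.end - record.start
    if elapsed_ns <= 0:
        return None
    # Bytes per nanosecond is numerically equal to GB/s.
    return record.bytes / elapsed_ns


# Stand-in record: 4 MB copied in 1 ms.
rec = SimpleNamespace(bytes=4_000_000, start=1_000, end=1_001_000)
print(memcpy_throughput_gbps(rec))
```

For records produced by batched copies (`copy_count` greater than 1), the result is the aggregate rate over the whole batch, not the rate of an individual copy.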

class cupti.cupti.ActivityMemcpyPtoP4#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemcpyPtoP4.

bytes#

The number of bytes transferred by the memory copy.

Type:

int

channel_id#

The ID of the HW channel on which the memory copy is occurring.

Type:

int

channel_type#

The type of the channel

Type:

int

context_id#

The ID of the context where the memory copy is occurring.

Type:

int

copy_kind#

The kind of the memory copy, stored as a byte to reduce record size. See CUpti_ActivityMemcpyKind.

Type:

int

correlation_id#

The correlation ID of the memory copy. Each memory copy is assigned a unique correlation ID that is identical to the correlation ID in the driver and runtime API activity record that launched the memory copy.

Type:

int

device_id#

The ID of the device where the memory copy is occurring.

Type:

int

dst_context_id#

The ID of the context owning the memory being copied to.

Type:

int

dst_device_id#

The ID of the device where memory is being copied to.

Type:

int

dst_kind#

The destination memory kind written by the memory copy, stored as a byte to reduce record size. See CUpti_ActivityMemoryKind.

Type:

int

end#

The end timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy.

Type:

int

flags_#

The flags associated with the memory copy. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityMemcpyPtoP4 instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that executed this memcpy through graph launch. This field will be 0 if the memcpy is not done through graph launch.

Type:

int

graph_node_id#

The unique ID of the graph node that executed the memcpy through graph launch. This field will be 0 if memcpy is not done using graph launch.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMCPY2.

Type:

int

ptr#

Get the pointer address to the data as Python int.

reserved0#

Undefined. Reserved for internal use.

Type:

int

src_context_id#

The ID of the context owning the memory being copied from.

Type:

int

src_device_id#

The ID of the device where memory is being copied from.

Type:

int

src_kind#

The source memory kind read by the memory copy, stored as a byte to reduce record size. See CUpti_ActivityMemoryKind.

Type:

int

start#

The start timestamp for the memory copy, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory copy.

Type:

int

stream_id#

The ID of the stream where the memory copy is occurring.

Type:

int

class cupti.cupti.ActivityMemory#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemory.

address#

The virtual address of the allocation

Type:

int

alloc_pc#

The program counter of the allocation of memory

Type:

int

bytes#

The number of bytes of memory allocated.

Type:

int

context_id#

The ID of the context. If context is NULL, `contextId` is set to CUPTI_INVALID_CONTEXT_ID.

Type:

int

device_id#

The ID of the device where the memory allocation is taking place.

Type:

int

end#

The end timestamp for the memory operation, i.e. the time when memory was freed, in ns. This will be 0 if memory is not freed in the application

Type:

int

free_pc#

The program counter of the freeing of memory. This will be 0 if memory is not freed in the application

Type:

int

static from_buffer(buffer)#

Create an ActivityMemory instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMORY

Type:

int

memory_kind#

The memory kind requested by the user

Type:

int

name#

Variable name. This name is shared across all activity records representing the same symbol, and so should not be modified.

Type:

str

pad#

Undefined. Reserved for internal use.

Type:

int

process_id#

The ID of the process to which this record belongs.

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

The start timestamp for the memory operation, i.e. the time when memory was allocated, in ns.

Type:

int
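Since `end` is 0 for memory that the application never frees, these records can be used to spot allocations that outlive the profiled run. A minimal sketch under that reading of the fields; `allocation_lifetime_ns` and `unfreed_bytes` are hypothetical helpers, and the records are stand-in objects:

```python
from types import SimpleNamespace


def allocation_lifetime_ns(record):
    """Return how long the allocation lived in ns, or None if it was never freed."""
    if record.end == 0:  # 0 means the memory was not freed in the application
        return None
    return record.end - record.start


def unfreed_bytes(records):
    """Sum the sizes of all allocations that were never freed."""
    return sum(r.bytes for r in records if r.end == 0)


# Stand-ins for two memory activity records.
records = [
    SimpleNamespace(bytes=4096, start=100, end=900),
    SimpleNamespace(bytes=1 << 20, start=200, end=0),
]
print([allocation_lifetime_ns(r) for r in records], unfreed_bytes(records))
```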

class cupti.cupti.ActivityMemory4#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemory4.

address#

The virtual address of the allocation. The base address of the memory pool.

Type:

int

bytes#

The number of bytes of memory allocated.

Type:

int

context_id#

The ID of the context. If context is NULL, `contextId` is set to CUPTI_INVALID_CONTEXT_ID.

Type:

int

correlation_id#

The correlation ID of the memory operation. Each memory operation is assigned a unique correlation ID that is identical to the correlation ID in the driver and runtime API activity record that launched the memory operation.

Type:

int

device_id#

The ID of the device where the memory operation is taking place.

Type:

int

static from_buffer(buffer)#

Create an ActivityMemory4 instance with the memory from the given buffer.

is_async#

`isAsync` is set if memory operation happens through async memory APIs.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMORY2

Type:

int

memory_kind#

The memory kind requested by the user, CUpti_ActivityMemoryKind.

Type:

int

memory_operation_type#

The memory operation requested by the user, CUpti_ActivityMemoryOperationType.

Type:

int

memory_pool_config#

The memory pool configuration used for the memory operations.

Type:

_py_anon_pod5

name#

Variable name. This name is shared across all activity records representing the same symbol, and so should not be modified.

Type:

str

pad1#

Undefined. Reserved for internal use.

Type:

int

pc#

The program counter of the memory operation.

Type:

int

process_id#

int:

ptr#

Get the pointer address to the data as Python int.

source#

The shared object or binary that the memory allocation request comes from.

Type:

str

stream_id#

The ID of the stream. If memory operation is not async, `streamId` is set to CUPTI_INVALID_STREAM_ID.

Type:

int

timestamp#

The start timestamp for the memory operation, in ns.

Type:

int

class cupti.cupti.ActivityMemoryPool3#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemoryPool3.

address#

The virtual address of the allocation.

Type:

int

correlation_id#

The correlation ID of the memory pool operation. Each memory pool operation is assigned a unique correlation ID that is identical to the correlation ID in the driver and runtime API activity record that launched the memory operation.

Type:

int

device_id#

The ID of the device where the memory pool is created.

Type:

int

static from_buffer(buffer)#

Create an ActivityMemoryPool3 instance with the memory from the given buffer.

is_managed_pool#

Whether the pool holds managed or pinned memory allocations: 0 means pinned memory, 1 means managed memory.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMORY_POOL

Type:

int

memory_pool_operation_type#

The memory operation requested by the user, CUpti_ActivityMemoryPoolOperationType.

Type:

int

memory_pool_type#

The type of the memory pool, CUpti_ActivityMemoryPoolType

Type:

int

min_bytes_to_keep#

The minimum number of bytes to keep in the memory pool. `minBytesToKeep` is valid for CUPTI_ACTIVITY_MEMORY_POOL_OPERATION_TYPE_TRIMMED (see CUpti_ActivityMemoryPoolOperationType).

Type:

int

pad2#

(array of length 7). Undefined. Reserved for internal use.

Type:

uint8

process_id#

The ID of the process to which this record belongs.

Type:

int

ptr#

Get the pointer address to the data as Python int.

release_threshold#

The release threshold of the memory pool. `releaseThreshold` is valid for CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL, CUpti_ActivityMemoryPoolType.

Type:

int

size_#

The size of the memory pool operation in bytes. `size` is valid for CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL, CUpti_ActivityMemoryPoolType.

Type:

int

timestamp#

The start timestamp for the memory operation, in ns.

Type:

int

utilized_size#

The utilized size of the memory pool. `utilizedSize` is valid for CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL, CUpti_ActivityMemoryPoolType.

Type:

int
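Because `size_` and `utilized_size` are only meaningful for local pools, any derived metric should check `memory_pool_type` first. The sketch below assumes the caller supplies the integer value of CUPTI_ACTIVITY_MEMORY_POOL_TYPE_LOCAL (it is deliberately not hard-coded here); `pool_utilization` is a hypothetical helper and the record is a stand-in object.

```python
from types import SimpleNamespace


def pool_utilization(record, local_pool_type):
    """Return utilized_size / size_ for a local pool record, else None."""
    # size_ and utilized_size are documented as valid only for local pools.
    if record.memory_pool_type != local_pool_type or record.size_ == 0:
        return None
    return record.utilized_size / record.size_


# Assumption for this demo only: 0 stands in for the local pool type value.
LOCAL = 0
rec = SimpleNamespace(memory_pool_type=LOCAL, size_=1024, utilized_size=256)
print(pool_utilization(rec, LOCAL))
```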

class cupti.cupti.ActivityMemset4#

Bases: object

Empty-initialize an instance of CUpti_ActivityMemset4.

bytes#

The number of bytes being set by the memory set.

Type:

int

channel_id#

The ID of the HW channel on which the memory set is occurring.

Type:

int

channel_type#

The type of the channel

Type:

int

context_id#

The ID of the context where the memory set is occurring.

Type:

int

correlation_id#

The correlation ID of the memory set. Each memory set is assigned a unique correlation ID that is identical to the correlation ID in the driver API activity record that launched the memory set.

Type:

int

device_id#

The ID of the device where the memory set is occurring.

Type:

int

end#

The end timestamp for the memory set, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory set.

Type:

int

flags_#

The flags associated with the memset. See CUpti_ActivityFlag.

Type:

int

static from_buffer(buffer)#

Create an ActivityMemset4 instance with the memory from the given buffer.

graph_id#

The unique ID of the graph that executed this memset through graph launch. This field will be 0 if the memset is not executed through graph launch.

Type:

int

graph_node_id#

The unique ID of the graph node that executed this memset through graph launch. This field will be 0 if the memset is not executed through graph launch.

Type:

int

is_device_launched#

This field is used to indicate if the memset operation is part of a device graph launch.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MEMSET.

Type:

int

memory_kind#

The memory kind of the memory set. See CUpti_ActivityMemoryKind.

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

pad2#

(array of length 3). Undefined. Reserved for internal use.

Type:

uint8

ptr#

Get the pointer address to the data as Python int.

reserved0#

Undefined. Reserved for internal use.

Type:

int

start#

The start timestamp for the memory set, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the memory set.

Type:

int

stream_id#

The ID of the stream where the memory set is occurring.

Type:

int

value#

The value being assigned to memory by the memory set.

Type:

int

class cupti.cupti.ActivityModule#

Bases: object

Empty-initialize an instance of CUpti_ActivityModule.

context_id#

The ID of the context where the module is loaded.

Type:

int

cubin#

The pointer to cubin.

Type:

int

cubin_size#

The cubin size.

Type:

int

static from_buffer(buffer)#

Create an ActivityModule instance with the memory from the given buffer.

id#

The module ID.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_MODULE.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityName#

Bases: object

Empty-initialize an instance of CUpti_ActivityName.

static from_buffer(buffer)#

Create an ActivityName instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_NAME.

Type:

int

name#

The name.

Type:

str

object_id#

The identifier for the activity object. ‘objectKind’ indicates which ID is valid for this record.

Type:

ActivityObjectKindId

object_kind#

The kind of activity object being named.

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityObjectKindId#

Bases: object

Empty-initialize an instance of CUpti_ActivityObjectKindId.

dcs#

A device object requires that we identify the device ID. A context object requires that we identify both the device and context ID. A stream object requires that we identify device, context, and stream ID.

Type:

_py_anon_pod4

static from_buffer(buffer)#

Create an ActivityObjectKindId instance with the memory from the given buffer.

pt#

A process object requires that we identify the process ID. A thread object requires that we identify both the process and thread ID.

Type:

_py_anon_pod3

ptr#

Get the pointer address to the data as Python int.
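Since `pt` and `dcs` overlay the same storage, only the member matching the record's `object_kind` should be read. The following sketch illustrates that dispatch. It rests on several assumptions: the numeric kind values are taken to mirror CUpti_ActivityObjectKind in the C header, the inner attribute names (`process_id`, `thread_id`, `device_id`, `context_id`, `stream_id`) are guesses at how the undocumented inner classes expose their fields, and `describe_object` is a hypothetical helper run against stand-in objects.

```python
from types import SimpleNamespace

# Assumption: these values mirror CUpti_ActivityObjectKind in the C header;
# confirm them against your CUPTI version before relying on them.
OBJECT_PROCESS, OBJECT_THREAD = 1, 2
OBJECT_DEVICE, OBJECT_CONTEXT, OBJECT_STREAM = 3, 4, 5


def describe_object(object_kind, object_id):
    """Return only the IDs that are valid for the given object kind."""
    if object_kind in (OBJECT_PROCESS, OBJECT_THREAD):
        ids = {"process_id": object_id.pt.process_id}
        if object_kind == OBJECT_THREAD:
            ids["thread_id"] = object_id.pt.thread_id
        return ids
    if object_kind in (OBJECT_DEVICE, OBJECT_CONTEXT, OBJECT_STREAM):
        ids = {"device_id": object_id.dcs.device_id}
        if object_kind != OBJECT_DEVICE:
            ids["context_id"] = object_id.dcs.context_id
        if object_kind == OBJECT_STREAM:
            ids["stream_id"] = object_id.dcs.stream_id
        return ids
    return {}  # unknown kind: no member can be trusted


# Stand-in for an ActivityObjectKindId holding a stream identity.
stream_obj = SimpleNamespace(
    dcs=SimpleNamespace(device_id=0, context_id=1, stream_id=14)
)
print(describe_object(OBJECT_STREAM, stream_obj))
```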

class cupti.cupti.ActivityOpenAccData#

Bases: object

Empty-initialize an instance of CUpti_ActivityOpenAccData.

async_#

int:

async_map#

int:

bytes#

Number of bytes

Type:

int

cu_context_id#

The CUDA context ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_device_id#

The CUDA device ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_process_id#

The ID of the process where the OpenACC activity is executing.

Type:

int

cu_stream_id#

The CUDA stream ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_thread_id#

The ID of the thread where the OpenACC activity is executing.

Type:

int

device_number#

int:

device_ptr#

Device pointer if available

Type:

int

device_type#

int:

end#

CUPTI end timestamp

Type:

int

end_line_no#

int:

event_kind#

CUPTI OpenACC event kind (CUpti_OpenAccEventKind)

Type:

int

external_id#

The OpenACC correlation ID. Valid only if deviceType is acc_device_nvidia. If not 0, it uniquely identifies this record. It is identical to the externalId in the preceding external correlation record of type CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC.

Type:

int

static from_buffer(buffer)#

Create an ActivityOpenAccData instance with the memory from the given buffer.

func_end_line_no#

int:

func_line_no#

int:

func_name#

str:

host_ptr#

Host pointer if available

Type:

int

implicit#

int:

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_OPENACC_DATA.

Type:

int

line_no#

int:

parent_construct#

int:

ptr#

Get the pointer address to the data as Python int.

src_file#

str:

start#

CUPTI start timestamp

Type:

int

thread_id#

ThreadId

Type:

int

var_name#

str:

version#

int:

class cupti.cupti.ActivityOpenAccLaunch#

Bases: object

Empty-initialize an instance of CUpti_ActivityOpenAccLaunch.

async_#

Value of async() clause of the corresponding directive

Type:

int

async_map#

Internal asynchronous queue number used

Type:

int

cu_context_id#

The CUDA context ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_device_id#

The CUDA device ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_process_id#

The ID of the process where the OpenACC activity is executing.

Type:

int

cu_stream_id#

The CUDA stream ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_thread_id#

The ID of the thread where the OpenACC activity is executing.

Type:

int

device_number#

Device number

Type:

int

device_type#

Device type

Type:

int

end#

CUPTI end timestamp

Type:

int

end_line_no#

For an OpenACC construct, this contains the line number of the end of the construct. A negative or zero value means the line number is not known.

Type:

int

event_kind#

CUPTI OpenACC event kind (CUpti_OpenAccEventKind)

Type:

int

external_id#

The OpenACC correlation ID. Valid only if deviceType is acc_device_nvidia. If not 0, it uniquely identifies this record. It is identical to the externalId in the preceding external correlation record of type CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC.

Type:

int

static from_buffer(buffer)#

Create an ActivityOpenAccLaunch instance with the memory from the given buffer.

func_end_line_no#

The last line number of the function named in func_name. A negative or zero value means the line number is not known.

Type:

int

func_line_no#

The line number of the first line of the function named in func_name. A negative or zero value means the line number is not known.

Type:

int

func_name#

A pointer to a null-terminated string containing the name of the function in which the event occurred.

Type:

str

implicit#

1 for any implicit event, such as an implicit wait at a synchronous data construct; 0 otherwise.

Type:

int

kernel_name#

A pointer to null-terminated string containing the name of the kernel being launched, if known, or a null pointer if not.

Type:

str

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_OPENACC_LAUNCH.

Type:

int

line_no#

The line number of the directive or program construct or the starting line number of the OpenACC construct corresponding to the event. A negative or zero value means the line number is not known.

Type:

int

num_gangs#

The number of gangs created for this kernel launch

Type:

int

num_workers#

The number of workers created for this kernel launch

Type:

int

parent_construct#

CUPTI OpenACC parent construct kind (CUpti_OpenAccConstructKind). Note that for applications using a PGI OpenACC runtime earlier than 16.1, this will always be CUPTI_OPENACC_CONSTRUCT_KIND_UNKNOWN.

Type:

int

ptr#

Get the pointer address to the data as Python int.

src_file#

A pointer to null-terminated string containing the name of or path to the source file, if known, or a null pointer if not.

Type:

str

start#

CUPTI start timestamp

Type:

int

thread_id#

ThreadId

Type:

int

vector_length#

The number of vector lanes created for this kernel launch

Type:

int

version#

Version number

Type:

int
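The source-location fields (`src_file`, `func_name`, `line_no`) may each be unavailable, so code that prints them should handle the "not known" cases described above. A minimal sketch, assuming a record object with those attributes; `source_location` is a hypothetical helper, and a missing string is assumed to surface as None or an empty string:

```python
from types import SimpleNamespace


def source_location(record):
    """Format 'file:line (function)' with placeholders for unknown parts."""
    src = record.src_file or "<unknown file>"
    func = record.func_name or "<unknown function>"
    # A negative or zero line number means the line is not known.
    line = str(record.line_no) if record.line_no > 0 else "?"
    return f"{src}:{line} ({func})"


# Stand-ins for OpenACC launch records.
known = SimpleNamespace(src_file="saxpy.c", func_name="saxpy", line_no=42)
unknown = SimpleNamespace(src_file=None, func_name="", line_no=0)
print(source_location(known))
print(source_location(unknown))
```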

class cupti.cupti.ActivityOpenAccOther#

Bases: object

Empty-initialize an instance of CUpti_ActivityOpenAccOther.

async_#

Value of async() clause of the corresponding directive

Type:

int

async_map#

Internal asynchronous queue number used

Type:

int

cu_context_id#

The CUDA context ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_device_id#

The CUDA device ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_process_id#

The ID of the process where the OpenACC activity is executing.

Type:

int

cu_stream_id#

The CUDA stream ID. Valid only if deviceType is acc_device_nvidia.

Type:

int

cu_thread_id#

The ID of the thread where the OpenACC activity is executing.

Type:

int

device_number#

Device number

Type:

int

device_type#

Device type

Type:

int

end#

CUPTI end timestamp

Type:

int

end_line_no#

For an OpenACC construct, this contains the line number of the end of the construct. A negative or zero value means the line number is not known.

Type:

int

event_kind#

CUPTI OpenACC event kind (CUpti_OpenAccEventKind)

Type:

int

external_id#

The OpenACC correlation ID. Valid only if deviceType is acc_device_nvidia. If not 0, it uniquely identifies this record. It is identical to the externalId in the preceding external correlation record of type CUPTI_EXTERNAL_CORRELATION_KIND_OPENACC.

Type:

int

static from_buffer(buffer)#

Create an ActivityOpenAccOther instance with the memory from the given buffer.

func_end_line_no#

The last line number of the function named in func_name. A negative or zero value means the line number is not known.

Type:

int

func_line_no#

The line number of the first line of the function named in func_name. A negative or zero value means the line number is not known.

Type:

int

func_name#

A pointer to a null-terminated string containing the name of the function in which the event occurred.

Type:

str

implicit#

1 for any implicit event, such as an implicit wait at a synchronous data construct; 0 otherwise.

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_OPENACC_OTHER.

Type:

int

line_no#

The line number of the directive or program construct or the starting line number of the OpenACC construct corresponding to the event. A negative or zero value means the line number is not known.

Type:

int

parent_construct#

CUPTI OpenACC parent construct kind (CUpti_OpenAccConstructKind). Note that for applications using a PGI OpenACC runtime earlier than 16.1, this will always be CUPTI_OPENACC_CONSTRUCT_KIND_UNKNOWN.

Type:

int

ptr#

Get the pointer address to the data as Python int.

src_file#

A pointer to null-terminated string containing the name of or path to the source file, if known, or a null pointer if not.

Type:

str

start#

CUPTI start timestamp

Type:

int

thread_id#

ThreadId

Type:

int

version#

Version number

Type:

int

class cupti.cupti.ActivityOpenMp#

Bases: object

Empty-initialize an instance of CUpti_ActivityOpenMp.

cu_process_id#

The ID of the process where the OpenMP activity is executing.

Type:

int

cu_thread_id#

The ID of the thread where the OpenMP activity is executing.

Type:

int

end#

CUPTI end timestamp

Type:

int

event_kind#

CUPTI OpenMP event kind (CUpti_OpenMpEventKind)

Type:

int

static from_buffer(buffer)#

Create an ActivityOpenMp instance with the memory from the given buffer.

kind#

The kind of this activity.

Type:

int

ptr#

Get the pointer address to the data as Python int.

start#

CUPTI start timestamp

Type:

int

thread_id#

ThreadId

Type:

int

version#

Version number

Type:

int

class cupti.cupti.ActivityOverhead3#

Bases: object

Empty-initialize an instance of CUpti_ActivityOverhead3.

correlation_id#

The correlation ID of the overhead operation to which the records belong. This ID is identical to the correlation ID in the driver or runtime API activity record that launched the overhead operation. In some cases, it can be zero, such as for CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH records.

Type:

int

end#

The end timestamp for the overhead, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the overhead.

Type:

int

static from_buffer(buffer)#

Create an ActivityOverhead3 instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_OVERHEAD.

Type:

int

object_id#

The identifier for the activity object. ‘objectKind’ indicates which ID is valid for this record.

Type:

ActivityObjectKindId

object_kind#

The kind of activity object that the overhead is associated with.

Type:

int

overhead_data#

Pointer to the struct with additional details about the overhead. Refer to the CUpti_ActivityOverheadKind enum and the corresponding structure to typecast and access the additional overhead data. The client is responsible for freeing this memory using the free function when done.

Type:

int

overhead_kind#

The kind of overhead, CUPTI, DRIVER, COMPILER etc.

Type:

int

ptr#

Get the pointer address to the data as Python int.

reserved0#

Reserved for internal use.

Type:

int

start#

The start timestamp for the overhead, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the overhead.

Type:

int
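A common use of overhead records is to total the time attributed to each overhead kind. The sketch below shows one way to aggregate them, assuming records expose `overhead_kind`, `start` and `end` as documented above; `overhead_by_kind` is a hypothetical helper, and the integer kinds in the demo are arbitrary stand-ins rather than real CUpti_ActivityOverheadKind values.

```python
from collections import defaultdict
from types import SimpleNamespace


def overhead_by_kind(records):
    """Sum overhead durations in ns, grouped by overhead_kind."""
    totals = defaultdict(int)
    for r in records:
        # start == end == 0 means timestamps could not be collected.
        if r.start == 0 and r.end == 0:
            continue
        totals[r.overhead_kind] += r.end - r.start
    return dict(totals)


# Stand-in overhead records; the kind values 1 and 2 are placeholders.
records = [
    SimpleNamespace(overhead_kind=1, start=100, end=160),
    SimpleNamespace(overhead_kind=2, start=200, end=230),
    SimpleNamespace(overhead_kind=1, start=300, end=340),
    SimpleNamespace(overhead_kind=2, start=0, end=0),
]
print(overhead_by_kind(records))
```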

class cupti.cupti.ActivityOverheadCommandBufferFullData#

Bases: object

Empty-initialize an instance of CUpti_ActivityOverheadCommandBufferFullData.

channel_id#

The channel ID of the command buffer.

Type:

int

channel_type#

The channel type of the command buffer.

Type:

int

command_buffer_length#

The remaining space in the command buffer. This field will always be zero when the command buffer is full, making it not useful in such cases.

Type:

int

static from_buffer(buffer)#

Create an ActivityOverheadCommandBufferFullData instance with the memory from the given buffer.

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ActivityPreemption#

Bases: object

Empty-initialize an instance of CUpti_ActivityPreemption.

block_x#

The X-dimension of the block that is preempted

Type:

int

block_y#

The Y-dimension of the block that is preempted

Type:

int

block_z#

The Z-dimension of the block that is preempted

Type:

int

static from_buffer(buffer)#

Create an ActivityPreemption instance with the memory from the given buffer.

grid_id#

The grid-id of the block that is preempted

Type:

int

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_PREEMPTION

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

preemption_kind#

The kind of the preemption.

Type:

int

ptr#

Get the pointer address to the data as Python int.

timestamp#

The timestamp of the preemption, in ns. A value of 0 indicates that timestamp information could not be collected for the preemption.

Type:

int

class cupti.cupti.ActivityStream#

Bases: object

Empty-initialize an instance of CUpti_ActivityStream.

context_id#

The ID of the context where the stream was created.

Type:

int

correlation_id#

The correlation ID of the API to which this result is associated.

Type:

int

flag#

Flags associated with the stream.

Type:

int

static from_buffer(buffer)#

Create an ActivityStream instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_STREAM.

Type:

int

priority#

The clamped priority for the stream.

Type:

int

ptr#

Get the pointer address to the data as Python int.

stream_id#

A unique stream ID to identify the stream.

Type:

int

class cupti.cupti.ActivitySynchronization2#

Bases: object

Empty-initialize an instance of CUpti_ActivitySynchronization2.

context_id#

The ID of the context for which the synchronization API is called. For the context synchronization API, it is the ID of the context for which the API is called. For stream/event synchronization, it is the ID of the context where the stream/event was created.

Type:

int

correlation_id#

The correlation ID of the API to which this result is associated.

Type:

int

cuda_event_id#

The event ID for which the synchronization API is called. A CUPTI_SYNCHRONIZATION_INVALID_VALUE value indicates that the field is not applicable for this record. Not valid for cuCtxSynchronize, cuStreamSynchronize.

Type:

int

cuda_event_sync_id#

A unique ID to associate event synchronization records with the latest CUDA Event record. A similar field is added in CUpti_ActivityCudaEvent2 to associate the synchronization record with the CUDA Event record. Because the same CUDA event can be used multiple times, the event ID alone is not unique enough to correlate the synchronization record with the latest CUDA Event record; this field is unique and can be used for that correlation. A CUPTI_SYNCHRONIZATION_INVALID_VALUE value indicates that the field is not applicable for this record. Valid only for synchronization records related to CUDA Events.

Type:

int

end#

The end timestamp for the function, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the function.

Type:

int

static from_buffer(buffer)#

Create an ActivitySynchronization2 instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_SYNCHRONIZATION.

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

ptr#

Get the pointer address to the data as Python int.

return_value#

The return value for the synchronization record, as a CUresult value. Use the cuptiActivityEnableAllSyncRecords API (activity_enable_all_sync_records in these bindings) to enable or disable collection of synchronization records with a non-zero return value.

Type:

int

start#

The start timestamp for the function, in ns. A value of 0 for both the start and end timestamps indicates that timestamp information could not be collected for the function.

Type:

int

stream_id#

The compute stream for which the synchronization API is called. A CUPTI_SYNCHRONIZATION_INVALID_VALUE value indicates that the field is not applicable for this record. Not valid for cuCtxSynchronize, cuEventSynchronize.

Type:

int

type#

The type of record.

Type:

int
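
The timestamp and sentinel conventions described for ActivitySynchronization2 can be sketched as follows. This is a minimal illustration, not part of the bindings: summarize_sync is a hypothetical helper, the record is a SimpleNamespace stand-in so the code runs without a GPU, and the numeric value used for CUPTI_SYNCHRONIZATION_INVALID_VALUE is an assumption that should be checked against the CUPTI C headers.

```python
from types import SimpleNamespace

# Assumption: the C header defines CUPTI_SYNCHRONIZATION_INVALID_VALUE as
# (uint32_t)-1; verify this against the CUPTI headers you build against.
SYNC_INVALID_VALUE = 0xFFFFFFFF


def summarize_sync(record, invalid=SYNC_INVALID_VALUE):
    """Return the duration and the applicable IDs of a synchronization record."""
    # start == end == 0 means timestamps could not be collected.
    if record.start == 0 and record.end == 0:
        duration_ns = None
    else:
        duration_ns = record.end - record.start

    def applicable(value):
        # Map the "not applicable" sentinel to None.
        return None if value == invalid else value

    return {
        "duration_ns": duration_ns,
        "stream_id": applicable(record.stream_id),
        "cuda_event_id": applicable(record.cuda_event_id),
        "ok": record.return_value == 0,  # CUresult 0 is CUDA_SUCCESS
    }


# Stand-in for a stream synchronization record (no event involved).
record = SimpleNamespace(
    start=1_000, end=4_500, stream_id=7,
    cuda_event_id=SYNC_INVALID_VALUE, return_value=0,
)
summary = summarize_sync(record)
```

A real record exposes the same attribute names, so the helper should apply unchanged once the sentinel value is confirmed.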

class cupti.cupti.ActivityUnifiedMemoryCounter3#

Bases: object

Empty-initialize an instance of CUpti_ActivityUnifiedMemoryCounter3.

address#

The virtual base address of the page or pages being transferred. For CPU and GPU faults, it is the virtual address of the page that faulted.

Type:

int

counter_kind#

The Unified Memory counter kind

Type:

int

dst_id#

The ID of the destination CPU/device involved in the memory transfer or remote map operation. Ignore this field if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING, or CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING.

Type:

int

end#

The end timestamp of the counter, in ns. Ignore this field if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING, or CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH, the timestamp is captured when the activity finishes on the GPU. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT, the timestamp is captured when the CUDA driver queues the replay of faulting memory accesses on the GPU. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING, the timestamp is captured when the throttling operation was finished by the CUDA driver.

Type:

int

flags_#

The flags associated with this record. See the enum CUpti_ActivityUnifiedMemoryAccessType if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT; CUpti_ActivityUnifiedMemoryMigrationCause if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD or CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH; CUpti_ActivityUnifiedMemoryRemoteMapCause if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP; and CUpti_ActivityFlag if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING or CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING.

Type:

int

static from_buffer(buffer)#

Create an ActivityUnifiedMemoryCounter3 instance with the memory from the given buffer.

kind#

The activity record kind, must be CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER

Type:

int

pad#

Undefined. Reserved for internal use.

Type:

int

process_id#

The ID of the process to which this record belongs.

Type:

int

processors#

(array of length 5) The bitmask of devices involved in the operation. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING, it is a bitwise OR of the device IDs fighting for the memory region. processors[0] covers device 0 to device 63, processors[1] covers device 64 to device 127, and so on. Ignore this field if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_DTOD, or CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_FAULT_REPLAY.

Type:

uint64
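
The bitmask layout of processors (word i covering devices 64*i to 64*i + 63) can be decoded as in the sketch below. devices_from_processors is a hypothetical helper, and the input list is a stand-in for the five-element processors array of a real record.

```python
def devices_from_processors(processors):
    """Expand the per-word bitmasks into a sorted list of device IDs."""
    device_ids = []
    for word_index, word in enumerate(processors):
        word = int(word)  # accept NumPy unsigned integers as well as Python ints
        bit = 0
        while word:
            if word & 1:
                # Word i covers devices 64*i .. 64*i + 63.
                device_ids.append(64 * word_index + bit)
            word >>= 1
            bit += 1
    return device_ids


# Stand-in for record.processors: devices 0 and 3 in word 0, device 65 in word 1.
example = [0b1001, 0b10, 0, 0, 0]
thrashing_devices = devices_from_processors(example)
```

As stated in the field description, the result is meaningful only for thrashing records.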

ptr#

Get the pointer address to the data as Python int.

src_id#

The ID of the source CPU/device involved in the memory transfer, page fault, thrashing, throttling or remote map operation. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING, it is a bitwise OR of the device IDs fighting for the memory region, ONLY if there are fewer than 32 devices. Ignore this field if counterKind is CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT.

Type:

int

start#

The start timestamp of the counter, in ns. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH, timestamp is captured when activity starts on GPU. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT, timestamp is captured when CUDA driver started processing the fault. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING, timestamp is captured when CUDA driver detected thrashing of memory region. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THROTTLING, timestamp is captured when throttling operation was started by CUDA driver. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP, timestamp is captured when CUDA driver has pushed all required operations to the processor specified by dstId.

Type:

int

stream_id#

The ID of the stream causing the transfer. The value of this field is invalid.

Type:

int

value#

Value of the counter. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_HTOD, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_BYTES_TRANSFER_DTOH, CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_THRASHING and CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_REMOTE_MAP, it is the size of the memory region in bytes. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_GPU_PAGE_FAULT, it is the number of page fault groups for the same page. For counterKind CUPTI_ACTIVITY_UNIFIED_MEMORY_COUNTER_KIND_CPU_PAGE_FAULT_COUNT, it is the program counter for the instruction that caused the fault.

Type:

int
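
Because the meaning of value depends on the counter kind, a consumer usually needs a small dispatch step. The sketch below shows one way to do that. describe_value is a hypothetical helper, and the short kind names are illustrative strings, not identifiers from the bindings; in real code they would be derived from counter_kind.

```python
# Hypothetical short names for the counter kinds described above.
SIZE_KINDS = {"BYTES_TRANSFER_HTOD", "BYTES_TRANSFER_DTOH", "THRASHING", "REMOTE_MAP"}


def describe_value(kind_name, value):
    """Return a human-readable description of `value` for a counter kind."""
    if kind_name in SIZE_KINDS:
        return f"{value} bytes"
    if kind_name == "GPU_PAGE_FAULT":
        return f"{value} page fault groups"
    if kind_name == "CPU_PAGE_FAULT_COUNT":
        return f"faulting instruction at PC {value:#x}"
    return f"{value} (no documented interpretation)"


described = describe_value("CPU_PAGE_FAULT_COUNT", 0x4005D0)
```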

class cupti.cupti.ActivityUnifiedMemoryCounterConfig#

Bases: object

Empty-initialize an array of CUpti_ActivityUnifiedMemoryCounterConfig. The resulting object is of length size and of dtype activity_unified_memory_counter_config_dtype. If default-constructed, the instance represents a single struct.

Parameters:

size (int) – number of structs, default=1.

device_id#

Device id of the target device. This is relevant only for single device scopes. (deprecated in CUDA 7.0)

Type:

Union[uint32, int]

enable#

Control to enable/disable the counter. Set it to a non-zero value to enable the counter and to zero to disable it.

Type:

Union[uint32, int]

static from_buffer(buffer)#

Create an ActivityUnifiedMemoryCounterConfig instance with the memory from the given buffer.

kind#

Unified Memory counter kind.

Type:

Union[int32, int]

ptr#

Get the pointer address to the data as Python int.

scope#

Unified Memory counter scope. (deprecated in CUDA 7.0)

Type:

Union[int32, int]

class cupti.cupti.MetricValue#

Bases: object

Empty-initialize an instance of CUpti_MetricValue.

See also

CUpti_MetricValue

static from_buffer(buffer)#

Create a MetricValue instance with the memory from the given buffer.

metric_value_double#

Type:

float

metric_value_int64#

Type:

int

metric_value_nvtx_extended_payload#

Value for CUPTI_METRIC_VALUE_KIND_NVTX_EXTENDED_PAYLOAD.

Type:

int

metric_value_percent#

Type:

float

metric_value_throughput#

Type:

int

metric_value_uint64#

Type:

int

metric_value_utilization_level#

Type:

int

ptr#

Get the pointer address to the data as Python int.

Callback API#

Functions#

cupti.cupti.enable_all_domains(enable: int, subscriber: int)#

Enable or disable all callbacks in all domains.

Parameters:
  • enable (uint32_t) – New enable state for all callbacks in all domains. Zero disables all callbacks, non-zero enables all callbacks.

  • subscriber (intptr_t) – Handle to callback subscription.

cupti.cupti.enable_callback(enable: int, subscriber: int, domain: int, cbid: int)#

Enable or disable callbacks for a specific domain and callback ID.

Parameters:
  • enable (uint32_t) – New enable state for the callback. Zero disables the callback, non-zero enables the callback.

  • subscriber (intptr_t) – Handle to callback subscription.

  • domain (CallbackDomain) – The domain of the callback.

  • cbid (uint32_t) – The ID of the callback.

cupti.cupti.enable_domain(enable: int, subscriber: int, domain: int)#

Enable or disable all callbacks for a specific domain.

Parameters:
  • enable (uint32_t) – New enable state for all callbacks in the domain. Zero disables all callbacks, non-zero enables all callbacks.

  • subscriber (intptr_t) – Handle to callback subscription.

  • domain (CallbackDomain) – The domain of the callback.

cupti.cupti.get_callback_name(domain: int, cbid: int)#

Get the name of a callback for a specific domain and callback ID.

Parameters:
  • domain (CallbackDomain) – The domain of the callback.

  • cbid (uint32_t) – The ID of the callback.

Returns:

The name of the callback for the specified domain and callback ID.

Return type:

str

cupti.cupti.get_callback_state(subscriber: int, domain: int, cbid: int) int#

Get the current enabled/disabled state of a callback for a specific domain and function ID.

Parameters:
  • subscriber (intptr_t) – Handle to the initialized subscriber.

  • domain (CallbackDomain) – The domain of the callback.

  • cbid (uint32_t) – The ID of the callback.

Returns:

Non-zero if the callback is enabled, zero if it is not.

Return type:

uint32_t

cupti.cupti.subscribe(callback, userdata) int#

Initialize a callback subscriber with a callback function and user data.

Parameters:
  • callback (object) – The callback function.

  • userdata (object) – A pointer to user data. This data will be passed to the callback function via the userdata parameter.

Returns:

Returns a handle to the initialized subscriber.

Return type:

intptr_t

See also

cuptiSubscribe

cupti.cupti.subscribe_v2(callback, userdata, p_params: int) int#

Initialize a callback subscriber with a callback function and user data.

Parameters:
  • callback (object) – The callback function.

  • userdata (object) – A pointer to user data. This data will be passed to the callback function via the userdata parameter.

  • p_params (intptr_t) – A pointer to CUpti_SubscriberParams. Can be NULL.

Returns:

Returns a handle to the initialized subscriber.

Return type:

intptr_t

cupti.cupti.supported_domains()#

Get the available callback domains.

Returns:

List of all available callback domains

Return type:

list[CallbackDomain]

cupti.cupti.unsubscribe(subscriber: int)#

Unregister a callback subscriber.

Parameters:

subscriber (intptr_t) – Handle to the initialized subscriber.

See also

cuptiUnsubscribe
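
The subscribe, enable_domain and unsubscribe functions above are typically used together. The following sketch wraps them in a context manager so the subscriber is always released. It is an illustration under stated assumptions: callback_subscription and RecordingApi are hypothetical names, the api argument stands for the cupti.cupti module, and the stand-in class only records calls so the example runs without a GPU. The arguments the bindings pass to the callback are not described on this page, so the callback here accepts anything.

```python
from contextlib import contextmanager


@contextmanager
def callback_subscription(api, callback, domain, userdata=None):
    """Subscribe, enable one domain, and always unsubscribe on exit."""
    subscriber = api.subscribe(callback, userdata)
    try:
        api.enable_domain(1, subscriber, domain)  # non-zero enables
        yield subscriber
    finally:
        api.enable_domain(0, subscriber, domain)  # zero disables
        api.unsubscribe(subscriber)


class RecordingApi:
    """Hypothetical stand-in that records calls instead of touching CUPTI."""

    def __init__(self):
        self.calls = []

    def subscribe(self, callback, userdata):
        self.calls.append("subscribe")
        return 0x1234  # pretend subscriber handle

    def enable_domain(self, enable, subscriber, domain):
        self.calls.append(("enable_domain", enable, domain))

    def unsubscribe(self, subscriber):
        self.calls.append("unsubscribe")


RUNTIME_API = 2  # numeric value of CallbackDomain.RUNTIME_API

api = RecordingApi()
with callback_subscription(api, lambda *args: None, RUNTIME_API):
    pass  # the profiled workload would run here
```

Disabling the domain before unsubscribing is a design choice of this sketch, not a documented requirement.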

Enums#

class cupti.cupti.ApiCallbackSite(value)#

Bases: IntEnum

Specifies the point in an API call that a callback is issued. This value is communicated to the callback function via CUpti_CallbackData.callbackSite.

See CUpti_ApiCallbackSite.

API_CBSITE_FORCE_INT = 2147483647#
API_ENTER = 0#
API_EXIT = 1#
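
A common use of the callback site is to pair the API_ENTER and API_EXIT callbacks of one call in order to time it. The sketch below illustrates the idea with a local mirror of the enum values listed above. api_durations and the (site, correlation_id, timestamp) tuple layout are hypothetical; how a real callback obtains a correlation ID and a timestamp is not covered here.

```python
from enum import IntEnum


class ApiCallbackSite(IntEnum):
    # Local mirror of the documented values, so the sketch runs without CUPTI.
    API_ENTER = 0
    API_EXIT = 1


def api_durations(events):
    """Pair ENTER/EXIT events by correlation ID and return durations in ns."""
    entered = {}
    durations = {}
    for site, correlation_id, timestamp_ns in events:
        if site == ApiCallbackSite.API_ENTER:
            entered[correlation_id] = timestamp_ns
        elif site == ApiCallbackSite.API_EXIT and correlation_id in entered:
            durations[correlation_id] = timestamp_ns - entered.pop(correlation_id)
    return durations


# Hypothetical (site, correlation_id, timestamp) tuples a callback might log.
events = [
    (ApiCallbackSite.API_ENTER, 1, 100),
    (ApiCallbackSite.API_ENTER, 2, 120),
    (ApiCallbackSite.API_EXIT, 2, 150),
    (ApiCallbackSite.API_EXIT, 1, 400),
]
durations = api_durations(events)
```
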
class cupti.cupti.CallbackDomain(value)#

Bases: IntEnum

Callback domains. Each domain represents callback points for a group of related API functions or CUDA driver activity.

See CUpti_CallbackDomain.

DRIVER_API = 1#
FORCE_INT = 2147483647#
INVALID = 0#
NVTX = 5#
RESOURCE = 3#
RUNTIME_API = 2#
SIZE = 7#
STATE = 6#
SYNCHRONIZE = 4#
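
When iterating over CallbackDomain, it is often useful to skip the members that appear to be bookkeeping values instead of real domains. The sketch below uses a local mirror of the values listed above. Treating INVALID, SIZE and FORCE_INT as sentinels is an assumption based on common C enum conventions, not something this page states.

```python
from enum import IntEnum


class CallbackDomain(IntEnum):
    # Local mirror of the documented values, so the sketch runs without CUPTI.
    INVALID = 0
    DRIVER_API = 1
    RUNTIME_API = 2
    RESOURCE = 3
    SYNCHRONIZE = 4
    NVTX = 5
    STATE = 6
    SIZE = 7
    FORCE_INT = 0x7FFFFFFF


# Assumed bookkeeping members that do not name a real callback domain.
SENTINELS = {CallbackDomain.INVALID, CallbackDomain.SIZE, CallbackDomain.FORCE_INT}

real_domains = [d for d in CallbackDomain if d not in SENTINELS]
names = [d.name for d in real_domains]
```
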
class cupti.cupti.CallbackIdResource(value)#

Bases: IntEnum

Callback IDs for resource domain, CUPTI_CB_DOMAIN_RESOURCE. This value is communicated to the callback function via the cbid parameter.

See CUpti_CallbackIdResource.

CONTEXT_CREATED = 1#
CONTEXT_DESTROY_STARTING = 2#
CU_INIT_FINISHED = 5#
FORCE_INT = 2147483647#
GRAPHEXEC_CREATED = 18#
GRAPHEXEC_CREATE_STARTING = 17#
GRAPHEXEC_DESTROY_STARTING = 19#
GRAPHNODE_CLONED = 20#
GRAPHNODE_CREATED = 13#
GRAPHNODE_CREATE_STARTING = 12#
GRAPHNODE_DEPENDENCY_CREATED = 15#
GRAPHNODE_DEPENDENCY_DESTROY_STARTING = 16#
GRAPHNODE_DESTROY_STARTING = 14#
GRAPH_CLONED = 11#
GRAPH_CREATED = 9#
GRAPH_DESTROY_STARTING = 10#
GRAPH_NODE_SET_PARAMS = 23#
GRAPH_NODE_UPDATED = 22#
INVALID = 0#
MODULE_LOADED = 6#
MODULE_PROFILED = 8#
MODULE_UNLOAD_STARTING = 7#
SIZE = 24#
STREAM_ATTRIBUTE_CHANGED = 21#
STREAM_CREATED = 3#
STREAM_DESTROY_STARTING = 4#
class cupti.cupti.CallbackIdState(value)#

Bases: IntEnum

Callback IDs for state domain, CUPTI_CB_DOMAIN_STATE. This value is communicated to the callback function via the cbid parameter.

See CUpti_CallbackIdState.

ERROR = 2#
FATAL_ERROR = 1#
FORCE_INT = 2147483647#
INVALID = 0#
SIZE = 4#
WARNING = 3#
class cupti.cupti.CallbackIdSync(value)#

Bases: IntEnum

Callback IDs for synchronization domain, CUPTI_CB_DOMAIN_SYNCHRONIZE. This value is communicated to the callback function via the cbid parameter.

See CUpti_CallbackIdSync.

CONTEXT_SYNCHRONIZED = 2#
FORCE_INT = 2147483647#
INVALID = 0#
SIZE = 3#
STREAM_SYNCHRONIZED = 1#
class cupti.cupti.Driver_api_trace_cbid(value)#

Bases: IntEnum

See CUpti_driver_api_trace_cbid.

FORCE_INT = 2147483647#
INVALID = 0#
SIZE = 860#
cu64Array3DCreate = 230#
cu64Array3DGetDescriptor = 231#
cu64ArrayCreate = 228#
cu64ArrayGetDescriptor = 229#
cu64D3D10ResourceGetMappedPitch = 200#
cu64D3D10ResourceGetMappedPointer = 198#
cu64D3D10ResourceGetMappedSize = 199#
cu64D3D10ResourceGetSurfaceDimensions = 201#
cu64D3D9MapVertexBuffer = 206#
cu64D3D9ResourceGetMappedPitch = 205#
cu64D3D9ResourceGetMappedPointer = 203#
cu64D3D9ResourceGetMappedSize = 204#
cu64D3D9ResourceGetSurfaceDimensions = 202#
cu64DeviceTotalMem = 197#
cu64GLMapBufferObject = 207#
cu64GLMapBufferObjectAsync = 208#
cu64GraphicsResourceGetMappedPointer = 131#
cu64MemAlloc = 30#
cu64MemAllocPitch = 32#
cu64MemFree = 34#
cu64MemGetAddressRange = 36#
cu64MemGetInfo = 28#
cu64MemHostAlloc = 215#
cu64MemHostGetDevicePointer = 41#
cu64Memcpy2D = 232#
cu64Memcpy2DAsync = 234#
cu64Memcpy2DUnaligned = 233#
cu64Memcpy3D = 59#
cu64Memcpy3DAsync = 70#
cu64MemcpyAtoD = 52#
cu64MemcpyDtoA = 50#
cu64MemcpyDtoD = 48#
cu64MemcpyDtoDAsync = 65#
cu64MemcpyDtoH = 46#
cu64MemcpyDtoHAsync = 63#
cu64MemcpyHtoD = 44#
cu64MemcpyHtoDAsync = 61#
cu64MemsetD16 = 74#
cu64MemsetD16Async = 219#
cu64MemsetD2D16 = 80#
cu64MemsetD2D16Async = 225#
cu64MemsetD2D32 = 82#
cu64MemsetD2D32Async = 227#
cu64MemsetD2D8 = 78#
cu64MemsetD2D8Async = 223#
cu64MemsetD32 = 76#
cu64MemsetD32Async = 221#
cu64MemsetD8 = 72#
cu64MemsetD8Async = 217#
cu64ModuleGetGlobal = 25#
cu64TexRefGetAddress = 104#
cu64TexRefSetAddress = 96#
cu64TexRefSetAddress2D = 98#
cuArray3DCreate = 90#
cuArray3DCreate_v2 = 274#
cuArray3DGetDescriptor = 91#
cuArray3DGetDescriptor_v2 = 275#
cuArrayCreate = 87#
cuArrayCreate_v2 = 272#
cuArrayDestroy = 89#
cuArrayGetDescriptor = 88#
cuArrayGetDescriptor_v2 = 273#
cuArrayGetMemoryRequirements = 654#
cuArrayGetPlane = 597#
cuArrayGetSparseProperties = 582#
cuBinaryFree = 376#
cuCheckpointProcessCheckpoint = 771#
cuCheckpointProcessGetRestoreThreadId = 768#
cuCheckpointProcessGetState = 769#
cuCheckpointProcessLock = 770#
cuCheckpointProcessRestore = 772#
cuCheckpointProcessUnlock = 773#
cuCompilePtx = 375#
cuCoredumpDeregisterCompleteCallback = 845#
cuCoredumpDeregisterStartCallback = 844#
cuCoredumpGetAttribute = 701#
cuCoredumpGetAttributeGlobal = 702#
cuCoredumpRegisterCompleteCallback = 843#
cuCoredumpRegisterStartCallback = 842#
cuCoredumpSetAttribute = 703#
cuCoredumpSetAttributeGlobal = 704#
cuCtxAttach = 12#
cuCtxCreate = 10#
cuCtxCreate_v2 = 235#
cuCtxCreate_v3 = 645#
cuCtxCreate_v4 = 757#
cuCtxDestroy = 11#
cuCtxDestroy_v2 = 322#
cuCtxDetach = 13#
cuCtxDisablePeerAccess = 314#
cuCtxEnablePeerAccess = 313#
cuCtxFromGreenCtx = 753#
cuCtxGetApiVersion = 296#
cuCtxGetCacheConfig = 299#
cuCtxGetCurrent = 304#
cuCtxGetDevResource = 746#
cuCtxGetDevice = 16#
cuCtxGetDevice_v2 = 795#
cuCtxGetExecAffinity = 646#
cuCtxGetFlags = 391#
cuCtxGetId = 695#
cuCtxGetLimit = 137#
cuCtxGetSharedMemConfig = 337#
cuCtxGetStreamPriorityRange = 370#
cuCtxPopCurrent = 15#
cuCtxPopCurrent_v2 = 324#
cuCtxPushCurrent = 14#
cuCtxPushCurrent_v2 = 323#
cuCtxRecordEvent = 755#
cuCtxResetPersistingL2Cache = 568#
cuCtxSetCacheConfig = 300#
cuCtxSetCurrent = 303#
cuCtxSetFlags = 705#
cuCtxSetLimit = 136#
cuCtxSetSharedMemConfig = 336#
cuCtxSynchronize = 17#
cuCtxSynchronize_v2 = 800#
cuCtxWaitEvent = 756#
cuD3D10CtxCreate = 139#
cuD3D10CtxCreateOnDevice = 212#
cuD3D10CtxCreate_v2 = 236#
cuD3D10GetDevice = 138#
cuD3D10GetDevices = 211#
cuD3D10GetDirect3DDevice = 297#
cuD3D10MapResources = 143#
cuD3D10RegisterResource = 141#
cuD3D10ResourceGetMappedArray = 146#
cuD3D10ResourceGetMappedPitch = 149#
cuD3D10ResourceGetMappedPitch_v2 = 262#
cuD3D10ResourceGetMappedPointer = 147#
cuD3D10ResourceGetMappedPointer_v2 = 260#
cuD3D10ResourceGetMappedSize = 148#
cuD3D10ResourceGetMappedSize_v2 = 261#
cuD3D10ResourceGetSurfaceDimensions = 150#
cuD3D10ResourceGetSurfaceDimensions_v2 = 263#
cuD3D10ResourceSetMapFlags = 145#
cuD3D10UnmapResources = 144#
cuD3D10UnregisterResource = 142#
cuD3D11CtxCreate = 152#
cuD3D11CtxCreateOnDevice = 210#
cuD3D11CtxCreate_v2 = 237#
cuD3D11GetDevice = 151#
cuD3D11GetDevices = 209#
cuD3D11GetDirect3DDevice = 298#
cuD3D9Begin = 168#
cuD3D9CtxCreate = 155#
cuD3D9CtxCreateOnDevice = 214#
cuD3D9CtxCreate_v2 = 238#
cuD3D9End = 169#
cuD3D9GetDevice = 154#
cuD3D9GetDevices = 213#
cuD3D9GetDirect3DDevice = 157#
cuD3D9MapResources = 160#
cuD3D9MapVertexBuffer = 171#
cuD3D9MapVertexBuffer_v2 = 268#
cuD3D9RegisterResource = 158#
cuD3D9RegisterVertexBuffer = 170#
cuD3D9ResourceGetMappedArray = 164#
cuD3D9ResourceGetMappedPitch = 167#
cuD3D9ResourceGetMappedPitch_v2 = 267#
cuD3D9ResourceGetMappedPointer = 165#
cuD3D9ResourceGetMappedPointer_v2 = 265#
cuD3D9ResourceGetMappedSize = 166#
cuD3D9ResourceGetMappedSize_v2 = 266#
cuD3D9ResourceGetSurfaceDimensions = 163#
cuD3D9ResourceGetSurfaceDimensions_v2 = 264#
cuD3D9ResourceSetMapFlags = 162#
cuD3D9UnmapResources = 161#
cuD3D9UnmapVertexBuffer = 172#
cuD3D9UnregisterResource = 159#
cuD3D9UnregisterVertexBuffer = 173#
cuDestroyExternalMemory = 488#
cuDestroyExternalSemaphore = 494#
cuDevResourceGenerateDesc = 748#
cuDevSmResourceSplit = 822#
cuDevSmResourceSplitByCount = 751#
cuDeviceCanAccessPeer = 312#
cuDeviceComputeCapability = 6#
cuDeviceGet = 3#
cuDeviceGetAttribute = 9#
cuDeviceGetByPCIBusId = 331#
cuDeviceGetCount = 4#
cuDeviceGetDefaultMemPool = 606#
cuDeviceGetDevResource = 745#
cuDeviceGetExecAffinitySupport = 644#
cuDeviceGetGraphMemAttribute = 641#
cuDeviceGetHostAtomicCapabilities = 805#
cuDeviceGetLuid = 532#
cuDeviceGetMemPool = 610#
cuDeviceGetName = 5#
cuDeviceGetNvSciSyncAttributes = 542#
cuDeviceGetP2PAtomicCapabilities = 804#
cuDeviceGetP2PAttribute = 454#
cuDeviceGetPCIBusId = 332#
cuDeviceGetProperties = 8#
cuDeviceGetTexture1DLinearMaxWidth = 579#
cuDeviceGetUuid = 482#
cuDeviceGetUuid_v2 = 647#
cuDeviceGraphMemTrim = 640#
cuDevicePrimaryCtxGetState = 392#
cuDevicePrimaryCtxRelease = 387#
cuDevicePrimaryCtxRelease_v2 = 544#
cuDevicePrimaryCtxReset = 389#
cuDevicePrimaryCtxReset_v2 = 545#
cuDevicePrimaryCtxRetain = 386#
cuDevicePrimaryCtxSetFlags = 388#
cuDevicePrimaryCtxSetFlags_v2 = 546#
cuDeviceRegisterAsyncNotification = 735#
cuDeviceSetGraphMemAttribute = 642#
cuDeviceSetMemPool = 609#
cuDeviceTotalMem = 7#
cuDeviceTotalMem_v2 = 259#
cuDeviceUnregisterAsyncNotification = 736#
cuDriverGetGpuCodeIsaVersion = 806#
cuDriverGetVersion = 2#
cuEGLStreamConsumerAcquireFrame = 395#
cuEGLStreamConsumerConnect = 393#
cuEGLStreamConsumerConnectWithFlags = 470#
cuEGLStreamConsumerDisconnect = 394#
cuEGLStreamConsumerReleaseFrame = 396#
cuEGLStreamProducerConnect = 446#
cuEGLStreamProducerDisconnect = 447#
cuEGLStreamProducerPresentFrame = 448#
cuEGLStreamProducerReturnFrame = 453#
cuEventCreate = 118#
cuEventCreateFromEGLSync = 479#
cuEventDestroy = 122#
cuEventDestroy_v2 = 325#
cuEventElapsedTime = 123#
cuEventElapsedTime_v2 = 780#
cuEventQuery = 120#
cuEventRecord = 119#
cuEventRecordWithFlags = 587#
cuEventRecordWithFlags_ptsz = 588#
cuEventRecord_ptsz = 441#
cuEventSynchronize = 121#
cuExternalMemoryGetMappedBuffer = 486#
cuExternalMemoryGetMappedMipmappedArray = 487#
cuFlushGPUDirectRDMAWrites = 627#
cuFuncGetAttribute = 85#
cuFuncGetModule = 566#
cuFuncGetName = 718#
cuFuncGetParamCount = 835#
cuFuncGetParamInfo = 733#
cuFuncIsLoaded = 741#
cuFuncLoad = 742#
cuFuncSetAttribute = 481#
cuFuncSetBlockShape = 83#
cuFuncSetCacheConfig = 86#
cuFuncSetSharedMemConfig = 338#
cuFuncSetSharedSize = 84#
cuGLCtxCreate = 174#
cuGLCtxCreate_v2 = 239#
cuGLGetDevices = 333#
cuGLGetDevices_v2 = 385#
cuGLInit = 178#
cuGLMapBufferObject = 180#
cuGLMapBufferObjectAsync = 184#
cuGLMapBufferObjectAsync_v2 = 270#
cuGLMapBufferObjectAsync_v2_ptsz = 445#
cuGLMapBufferObject_v2 = 269#
cuGLMapBufferObject_v2_ptds = 417#
cuGLRegisterBufferObject = 179#
cuGLSetBufferObjectMapFlags = 183#
cuGLUnmapBufferObject = 181#
cuGLUnmapBufferObjectAsync = 185#
cuGLUnregisterBufferObject = 182#
cuGetErrorName = 373#
cuGetErrorString = 372#
cuGetExportTable = 135#
cuGetProcAddress = 626#
cuGetProcAddress_v2 = 677#
cuGraphAddBatchMemOpNode = 669#
cuGraphAddChildGraphNode = 525#
cuGraphAddDependencies = 518#
cuGraphAddDependencies_v2 = 727#
cuGraphAddEmptyNode = 526#
cuGraphAddEventRecordNode = 589#
cuGraphAddEventWaitNode = 590#
cuGraphAddExternalSemaphoresSignalNode = 618#
cuGraphAddExternalSemaphoresWaitNode = 621#
cuGraphAddHostNode = 530#
cuGraphAddKernelNode = 502#
cuGraphAddKernelNode_v2 = 689#
cuGraphAddMemAllocNode = 638#
cuGraphAddMemFreeNode = 639#
cuGraphAddMemcpyNode = 504#
cuGraphAddMemsetNode = 506#
cuGraphAddNode = 712#
cuGraphAddNode_v2 = 723#
cuGraphBatchMemOpNodeGetParams = 670#
cuGraphBatchMemOpNodeSetParams = 671#
cuGraphChildGraphNodeGetGraph = 529#
cuGraphClone = 523#
cuGraphConditionalHandleCreate = 722#
cuGraphCreate = 501#
cuGraphDebugDotPrint = 628#
cuGraphDestroy = 517#
cuGraphDestroyNode = 522#
cuGraphEventRecordNodeGetEvent = 591#
cuGraphEventRecordNodeSetEvent = 593#
cuGraphEventWaitNodeGetEvent = 592#
cuGraphEventWaitNodeSetEvent = 594#
cuGraphExecBatchMemOpNodeSetParams = 672#
cuGraphExecChildGraphNodeSetParams = 586#
cuGraphExecDestroy = 516#
cuGraphExecEventRecordNodeSetEvent = 595#
cuGraphExecEventWaitNodeSetEvent = 596#
cuGraphExecExternalSemaphoresSignalNodeSetParams = 624#
cuGraphExecExternalSemaphoresWaitNodeSetParams = 625#
cuGraphExecGetFlags = 658#
cuGraphExecGetId = 813#
cuGraphExecHostNodeSetParams = 564#
cuGraphExecKernelNodeSetParams = 538#
cuGraphExecKernelNodeSetParams_v2 = 692#
cuGraphExecMemcpyNodeSetParams = 562#
cuGraphExecMemsetNodeSetParams = 563#
cuGraphExecNodeSetParams = 714#
cuGraphExecUpdate = 561#
cuGraphExecUpdate_v2 = 696#
cuGraphExternalSemaphoresSignalNodeGetParams = 619#
cuGraphExternalSemaphoresSignalNodeSetParams = 620#
cuGraphExternalSemaphoresWaitNodeGetParams = 622#
cuGraphExternalSemaphoresWaitNodeSetParams = 623#
cuGraphGetEdges = 535#
cuGraphGetEdges_v2 = 724#
cuGraphGetId = 812#
cuGraphGetNodes = 534#
cuGraphGetRootNodes = 510#
cuGraphHostNodeGetParams = 531#
cuGraphHostNodeSetParams = 533#
cuGraphInstantiate = 513#
cuGraphInstantiateWithFlags = 643#
cuGraphInstantiateWithParams = 656#
cuGraphInstantiateWithParams_ptsz = 657#
cuGraphInstantiate_v2 = 578#
cuGraphKernelNodeCopyAttributes = 569#
cuGraphKernelNodeGetAttribute = 570#
cuGraphKernelNodeGetParams = 503#
cuGraphKernelNodeGetParams_v2 = 690#
cuGraphKernelNodeSetAttribute = 571#
cuGraphKernelNodeSetParams = 521#
cuGraphKernelNodeSetParams_v2 = 691#
cuGraphLaunch = 514#
cuGraphLaunch_ptsz = 515#
cuGraphMemAllocNodeGetParams = 648#
cuGraphMemFreeNodeGetParams = 649#
cuGraphMemcpyNodeGetParams = 505#
cuGraphMemcpyNodeSetParams = 520#
cuGraphMemsetNodeGetParams = 507#
cuGraphMemsetNodeSetParams = 508#
cuGraphNodeFindInClone = 524#
cuGraphNodeGetContainingGraph = 809#
cuGraphNodeGetDependencies = 511#
cuGraphNodeGetDependencies_v2 = 725#
cuGraphNodeGetDependentNodes = 512#
cuGraphNodeGetDependentNodes_v2 = 726#
cuGraphNodeGetEnabled = 651#
cuGraphNodeGetLocalId = 810#
cuGraphNodeGetParams = 841#
cuGraphNodeGetToolsId = 811#
cuGraphNodeGetType = 509#
cuGraphNodeSetEnabled = 650#
cuGraphNodeSetParams = 713#
cuGraphReleaseUserObject = 637#
cuGraphRemoveDependencies = 519#
cuGraphRemoveDependencies_v2 = 728#
cuGraphRetainUserObject = 636#
cuGraphUpload = 580#
cuGraphUpload_ptsz = 581#
cuGraphicsD3D10RegisterResource = 140#
cuGraphicsD3D11RegisterResource = 153#
cuGraphicsD3D9RegisterResource = 156#
cuGraphicsEGLRegisterImage = 390#
cuGraphicsGLRegisterBuffer = 175#
cuGraphicsGLRegisterImage = 176#
cuGraphicsMapResources = 133#
cuGraphicsMapResources_ptsz = 443#
cuGraphicsResourceGetMappedEglFrame = 449#
cuGraphicsResourceGetMappedMipmappedArray = 360#
cuGraphicsResourceGetMappedPointer = 130#
cuGraphicsResourceGetMappedPointer_v2 = 258#
cuGraphicsResourceSetMapFlags = 132#
cuGraphicsResourceSetMapFlags_v2 = 380#
cuGraphicsSubResourceGetMappedArray = 129#
cuGraphicsUnmapResources = 134#
cuGraphicsUnmapResources_ptsz = 444#
cuGraphicsUnregisterResource = 128#
cuGraphicsVDPAURegisterOutputSurface = 189#
cuGraphicsVDPAURegisterVideoSurface = 188#
cuGreenCtxCreate = 743#
cuGreenCtxDestroy = 744#
cuGreenCtxGetDevResource = 747#
cuGreenCtxGetId = 782#
cuGreenCtxRecordEvent = 749#
cuGreenCtxStreamCreate = 758#
cuGreenCtxWaitEvent = 750#
cuImportExternalMemory = 485#
cuImportExternalSemaphore = 489#
cuInit = 1#
cuIpcCloseMemHandle = 330#
cuIpcGetEventHandle = 334#
cuIpcGetMemHandle = 328#
cuIpcOpenEventHandle = 335#
cuIpcOpenMemHandle = 329#
cuIpcOpenMemHandle_v2 = 567#
cuKernelGetAttribute = 686#
cuKernelGetFunction = 683#
cuKernelGetLibrary = 754#
cuKernelGetName = 719#
cuKernelGetParamCount = 836#
cuKernelGetParamInfo = 734#
cuKernelSetAttribute = 687#
cuKernelSetCacheConfig = 688#
cuLaunch = 115#
cuLaunchCooperativeKernel = 477#
cuLaunchCooperativeKernelMultiDevice = 480#
cuLaunchCooperativeKernel_ptsz = 478#
cuLaunchGrid = 116#
cuLaunchGridAsync = 117#
cuLaunchHostFunc = 527#
cuLaunchHostFunc_ptsz = 528#
cuLaunchHostFunc_v2 = 831#
cuLaunchHostFunc_v2_ptsz = 832#
cuLaunchKernel = 307#
cuLaunchKernelEx = 652#
cuLaunchKernelEx_ptsz = 653#
cuLaunchKernel_ptsz = 442#
cuLibraryEnumerateKernels = 740#
cuLibraryGetGlobal = 684#
cuLibraryGetKernel = 681#
cuLibraryGetKernelCount = 739#
cuLibraryGetManaged = 685#
cuLibraryGetModule = 682#
cuLibraryGetUnifiedFunction = 700#
cuLibraryLoadData = 678#
cuLibraryLoadFromFile = 679#
cuLibraryUnload = 680#
cuLinkAddData = 363#
cuLinkAddData_v2 = 382#
cuLinkAddFile = 364#
cuLinkAddFile_v2 = 383#
cuLinkComplete = 365#
cuLinkCreate = 362#
cuLinkCreate_v2 = 381#
cuLinkDestroy = 366#
cuLogsCurrent = 765#
cuLogsDumpToFile = 766#
cuLogsDumpToMemory = 767#
cuLogsRegisterCallback = 763#
cuLogsUnregisterCallback = 764#
cuMemAddressFree = 548#
cuMemAddressReserve = 547#
cuMemAdvise = 457#
cuMemAdvise_v2 = 715#
cuMemAlloc = 29#
cuMemAllocAsync = 598#
cuMemAllocAsync_ptsz = 599#
cuMemAllocFromPoolAsync = 611#
cuMemAllocFromPoolAsync_ptsz = 612#
cuMemAllocHost = 37#
cuMemAllocHost_v2 = 294#
cuMemAllocManaged = 371#
cuMemAllocPitch = 31#
cuMemAllocPitch_v2 = 244#
cuMemAlloc_v2 = 243#
cuMemBatchDecompressAsync = 761#
cuMemBatchDecompressAsync_ptsz = 762#
cuMemCreate = 549#
cuMemDiscardAndPrefetchBatchAsync = 791#
cuMemDiscardAndPrefetchBatchAsync_ptsz = 792#
cuMemDiscardBatchAsync = 789#
cuMemDiscardBatchAsync_ptsz = 790#
cuMemExportToShareableHandle = 554#
cuMemFree = 33#
cuMemFreeAsync = 600#
cuMemFreeAsync_ptsz = 601#
cuMemFreeHost = 38#
cuMemFree_v2 = 245#
cuMemGetAccess = 558#
cuMemGetAddressRange = 35#
cuMemGetAddressRange_v2 = 246#
cuMemGetAllocationGranularity = 556#
cuMemGetAllocationPropertiesFromHandle = 557#
cuMemGetDefaultMemPool = 801#
cuMemGetHandleForAddressRange = 674#
cuMemGetInfo = 27#
cuMemGetInfo_v2 = 242#
cuMemGetMemPool = 802#
cuMemHostAlloc = 39#
cuMemHostAlloc_v2 = 271#
cuMemHostGetDevicePointer = 40#
cuMemHostGetDevicePointer_v2 = 247#
cuMemHostGetFlags = 42#
cuMemHostRegister = 301#
cuMemHostRegister_v2 = 379#
cuMemHostUnregister = 302#
cuMemImportFromShareableHandle = 555#
cuMemMap = 551#
cuMemMapArrayAsync = 584#
cuMemMapArrayAsync_ptsz = 585#
cuMemPeerGetDevicePointer = 317#
cuMemPeerRegister = 315#
cuMemPeerUnregister = 316#
cuMemPoolCreate = 607#
cuMemPoolDestroy = 608#
cuMemPoolExportPointer = 615#
cuMemPoolExportToShareableHandle = 613#
cuMemPoolGetAccess = 617#
cuMemPoolGetAttribute = 604#
cuMemPoolImportFromShareableHandle = 614#
cuMemPoolImportPointer = 616#
cuMemPoolSetAccess = 605#
cuMemPoolSetAttribute = 603#
cuMemPoolTrimTo = 602#
cuMemPrefetchAsync = 467#
cuMemPrefetchAsync_ptsz = 468#
cuMemPrefetchAsync_v2 = 716#
cuMemPrefetchAsync_v2_ptsz = 717#
cuMemPrefetchBatchAsync = 784#
cuMemPrefetchBatchAsync_ptsz = 785#
cuMemRangeGetAttribute = 471#
cuMemRangeGetAttributes = 472#
cuMemRelease = 550#
cuMemRetainAllocationHandle = 565#
cuMemSetAccess = 553#
cuMemSetMemPool = 803#
cuMemUnmap = 552#
cuMemcpy = 305#
cuMemcpy2D = 56#
cuMemcpy2DAsync = 68#
cuMemcpy2DAsync_v2 = 289#
cuMemcpy2DAsync_v2_ptsz = 424#
cuMemcpy2DUnaligned = 57#
cuMemcpy2DUnaligned_v2 = 288#
cuMemcpy2DUnaligned_v2_ptds = 406#
cuMemcpy2D_v2 = 287#
cuMemcpy2D_v2_ptds = 405#
cuMemcpy3D = 58#
cuMemcpy3DAsync = 69#
cuMemcpy3DAsync_v2 = 291#
cuMemcpy3DAsync_v2_ptsz = 425#
cuMemcpy3DBatchAsync = 778#
cuMemcpy3DBatchAsync_ptsz = 779#
cuMemcpy3DBatchAsync_v2 = 798#
cuMemcpy3DBatchAsync_v2_ptsz = 799#
cuMemcpy3DPeer = 320#
cuMemcpy3DPeerAsync = 321#
cuMemcpy3DPeerAsync_ptsz = 427#
cuMemcpy3DPeer_ptds = 410#
cuMemcpy3DWithAttributesAsync = 839#
cuMemcpy3DWithAttributesAsync_ptsz = 840#
cuMemcpy3D_v2 = 290#
cuMemcpy3D_v2_ptds = 407#
cuMemcpyAsync = 306#
cuMemcpyAsync_ptsz = 418#
cuMemcpyAtoA = 55#
cuMemcpyAtoA_v2 = 286#
cuMemcpyAtoA_v2_ptds = 404#
cuMemcpyAtoD = 51#
cuMemcpyAtoD_v2 = 284#
cuMemcpyAtoD_v2_ptds = 401#
cuMemcpyAtoH = 54#
cuMemcpyAtoHAsync = 67#
cuMemcpyAtoHAsync_v2 = 283#
cuMemcpyAtoHAsync_v2_ptsz = 420#
cuMemcpyAtoH_v2 = 282#
cuMemcpyAtoH_v2_ptds = 403#
cuMemcpyBatchAsync = 776#
cuMemcpyBatchAsync_ptsz = 777#
cuMemcpyBatchAsync_v2 = 796#
cuMemcpyBatchAsync_v2_ptsz = 797#
cuMemcpyDtoA = 49#
cuMemcpyDtoA_v2 = 285#
cuMemcpyDtoA_v2_ptds = 400#
cuMemcpyDtoD = 47#
cuMemcpyDtoDAsync = 64#
cuMemcpyDtoDAsync_v2 = 281#
cuMemcpyDtoDAsync_v2_ptsz = 423#
cuMemcpyDtoD_v2 = 280#
cuMemcpyDtoD_v2_ptds = 399#
cuMemcpyDtoH = 45#
cuMemcpyDtoHAsync = 62#
cuMemcpyDtoHAsync_v2 = 279#
cuMemcpyDtoHAsync_v2_ptsz = 422#
cuMemcpyDtoH_v2 = 278#
cuMemcpyDtoH_v2_ptds = 398#
cuMemcpyHtoA = 53#
cuMemcpyHtoAAsync = 66#
cuMemcpyHtoAAsync_v2 = 293#
cuMemcpyHtoAAsync_v2_ptsz = 419#
cuMemcpyHtoA_v2 = 292#
cuMemcpyHtoA_v2_ptds = 402#
cuMemcpyHtoD = 43#
cuMemcpyHtoDAsync = 60#
cuMemcpyHtoDAsync_v2 = 277#
cuMemcpyHtoDAsync_v2_ptsz = 421#
cuMemcpyHtoD_v2 = 276#
cuMemcpyHtoD_v2_ptds = 397#
cuMemcpyPeer = 318#
cuMemcpyPeerAsync = 319#
cuMemcpyPeerAsync_ptsz = 426#
cuMemcpyPeer_ptds = 409#
cuMemcpyWithAttributesAsync = 837#
cuMemcpyWithAttributesAsync_ptsz = 838#
cuMemcpy_ptds = 408#
cuMemcpy_v2 = 248#
cuMemsetD16 = 73#
cuMemsetD16Async = 218#
cuMemsetD16Async_ptsz = 429#
cuMemsetD16_v2 = 250#
cuMemsetD16_v2_ptds = 412#
cuMemsetD2D16 = 79#
cuMemsetD2D16Async = 224#
cuMemsetD2D16Async_ptsz = 432#
cuMemsetD2D16_v2 = 253#
cuMemsetD2D16_v2_ptds = 415#
cuMemsetD2D32 = 81#
cuMemsetD2D32Async = 226#
cuMemsetD2D32Async_ptsz = 433#
cuMemsetD2D32_v2 = 254#
cuMemsetD2D32_v2_ptds = 416#
cuMemsetD2D8 = 77#
cuMemsetD2D8Async = 222#
cuMemsetD2D8Async_ptsz = 431#
cuMemsetD2D8_v2 = 252#
cuMemsetD2D8_v2_ptds = 414#
cuMemsetD32 = 75#
cuMemsetD32Async = 220#
cuMemsetD32Async_ptsz = 430#
cuMemsetD32_v2 = 251#
cuMemsetD32_v2_ptds = 413#
cuMemsetD8 = 71#
cuMemsetD8Async = 216#
cuMemsetD8Async_ptsz = 428#
cuMemsetD8_v2 = 249#
cuMemsetD8_v2_ptds = 411#
cuMipmappedArrayCreate = 347#
cuMipmappedArrayDestroy = 349#
cuMipmappedArrayGetLevel = 348#
cuMipmappedArrayGetMemoryRequirements = 655#
cuMipmappedArrayGetSparseProperties = 583#
cuModuleEnumerateFunctions = 738#
cuModuleGetFunction = 23#
cuModuleGetFunctionCount = 737#
cuModuleGetGlobal = 24#
cuModuleGetGlobal_v2 = 241#
cuModuleGetLoadingMode = 673#
cuModuleGetSurfRef = 190#
cuModuleGetTexRef = 26#
cuModuleLoad = 18#
cuModuleLoadData = 19#
cuModuleLoadDataEx = 20#
cuModuleLoadFatBinary = 21#
cuModuleUnload = 22#
cuMulticastAddDevice = 707#
cuMulticastBindAddr = 709#
cuMulticastBindAddr_v2 = 821#
cuMulticastBindMem = 708#
cuMulticastBindMem_v2 = 820#
cuMulticastCreate = 706#
cuMulticastGetGranularity = 711#
cuMulticastUnbind = 710#
cuNNSetAllocator = 466#
cuOccupancyAvailableDynamicSMemPerBlock = 543#
cuOccupancyMaxActiveBlocksPerMultiprocessor = 374#
cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags = 451#
cuOccupancyMaxActiveClusters = 676#
cuOccupancyMaxPotentialBlockSize = 384#
cuOccupancyMaxPotentialBlockSizeWithFlags = 452#
cuOccupancyMaxPotentialClusterSize = 675#
cuParamSetSize = 110#
cuParamSetTexRef = 114#
cuParamSetf = 112#
cuParamSeti = 111#
cuParamSetv = 113#
cuPointerGetAttribute = 310#
cuPointerGetAttributes = 450#
cuPointerSetAttribute = 378#
cuProfilerInitialize = 311#
cuProfilerStart = 308#
cuProfilerStop = 309#
cuSignalExternalSemaphoresAsync = 490#
cuSignalExternalSemaphoresAsync_ptsz = 491#
cuStreamAddCallback = 346#
cuStreamAddCallback_ptsz = 437#
cuStreamAttachMemAsync = 377#
cuStreamAttachMemAsync_ptsz = 438#
cuStreamBatchMemOp = 462#
cuStreamBatchMemOp_ptsz = 463#
cuStreamBatchMemOp_v2 = 667#
cuStreamBatchMemOp_v2_ptsz = 668#
cuStreamBeginCapture = 495#
cuStreamBeginCaptureToCig = 783#
cuStreamBeginCaptureToCig_ptsz = 814#
cuStreamBeginCaptureToGraph = 720#
cuStreamBeginCaptureToGraph_ptsz = 721#
cuStreamBeginCapture_ptsz = 496#
cuStreamBeginCapture_v2 = 539#
cuStreamBeginCapture_v2_ptsz = 540#
cuStreamBeginRecaptureToGraph = 846#
cuStreamBeginRecaptureToGraph_ptsz = 847#
cuStreamCopyAttributes = 572#
cuStreamCopyAttributes_ptsz = 573#
cuStreamCreate = 124#
cuStreamCreateWithPriority = 367#
cuStreamDestroy = 127#
cuStreamDestroy_v2 = 326#
cuStreamEndCapture = 497#
cuStreamEndCaptureToCig = 815#
cuStreamEndCaptureToCig_ptsz = 816#
cuStreamEndCapture_ptsz = 498#
cuStreamGetAttribute = 574#
cuStreamGetAttribute_ptsz = 575#
cuStreamGetCaptureInfo = 536#
cuStreamGetCaptureInfo_ptsz = 537#
cuStreamGetCaptureInfo_v2 = 629#
cuStreamGetCaptureInfo_v2_ptsz = 630#
cuStreamGetCaptureInfo_v3 = 729#
cuStreamGetCaptureInfo_v3_ptsz = 730#
cuStreamGetCtx = 483#
cuStreamGetCtx_ptsz = 484#
cuStreamGetCtx_v2 = 759#
cuStreamGetCtx_v2_ptsz = 760#
cuStreamGetDevResource = 807#
cuStreamGetDevResource_ptsz = 808#
cuStreamGetDevice = 774#
cuStreamGetDevice_ptsz = 775#
cuStreamGetFlags = 369#
cuStreamGetFlags_ptsz = 435#
cuStreamGetGreenCtx = 752#
cuStreamGetId = 693#
cuStreamGetId_ptsz = 694#
cuStreamGetPriority = 368#
cuStreamGetPriority_ptsz = 434#
cuStreamIsCapturing = 499#
cuStreamIsCapturing_ptsz = 500#
cuStreamQuery = 125#
cuStreamQuery_ptsz = 439#
cuStreamSetAttribute = 576#
cuStreamSetAttribute_ptsz = 577#
cuStreamSetFlags = 559#
cuStreamSetFlags_ptsz = 560#
cuStreamSynchronize = 126#
cuStreamSynchronize_ptsz = 440#
cuStreamUpdateCaptureDependencies = 631#
cuStreamUpdateCaptureDependencies_ptsz = 632#
cuStreamUpdateCaptureDependencies_v2 = 731#
cuStreamUpdateCaptureDependencies_v2_ptsz = 732#
cuStreamWaitEvent = 295#
cuStreamWaitEvent_ptsz = 436#
cuStreamWaitValue32 = 458#
cuStreamWaitValue32_ptsz = 459#
cuStreamWaitValue32_v2 = 659#
cuStreamWaitValue32_v2_ptsz = 660#
cuStreamWaitValue64 = 473#
cuStreamWaitValue64_ptsz = 474#
cuStreamWaitValue64_v2 = 661#
cuStreamWaitValue64_v2_ptsz = 662#
cuStreamWriteValue32 = 460#
cuStreamWriteValue32_ptsz = 461#
cuStreamWriteValue32_v2 = 663#
cuStreamWriteValue32_v2_ptsz = 664#
cuStreamWriteValue64 = 475#
cuStreamWriteValue64_ptsz = 476#
cuStreamWriteValue64_v2 = 665#
cuStreamWriteValue64_v2_ptsz = 666#
cuSurfObjectCreate = 343#
cuSurfObjectDestroy = 344#
cuSurfObjectGetResourceDesc = 345#
cuSurfRefCreate = 191#
cuSurfRefDestroy = 192#
cuSurfRefGetArray = 196#
cuSurfRefGetFormat = 195#
cuSurfRefSetArray = 194#
cuSurfRefSetFormat = 193#
cuTensorMapEncodeIm2col = 698#
cuTensorMapEncodeIm2colWide = 781#
cuTensorMapEncodeTiled = 697#
cuTensorMapReplaceAddress = 699#
cuTexObjectCreate = 339#
cuTexObjectDestroy = 340#
cuTexObjectGetResourceDesc = 341#
cuTexObjectGetResourceViewDesc = 361#
cuTexObjectGetTextureDesc = 342#
cuTexRefCreate = 92#
cuTexRefDestroy = 93#
cuTexRefGetAddress = 103#
cuTexRefGetAddressMode = 106#
cuTexRefGetAddress_v2 = 257#
cuTexRefGetArray = 105#
cuTexRefGetBorderColor = 456#
cuTexRefGetFilterMode = 107#
cuTexRefGetFlags = 109#
cuTexRefGetFormat = 108#
cuTexRefGetMaxAnisotropy = 359#
cuTexRefGetMipmapFilterMode = 356#
cuTexRefGetMipmapLevelBias = 357#
cuTexRefGetMipmapLevelClamp = 358#
cuTexRefGetMipmappedArray = 355#
cuTexRefSetAddress = 95#
cuTexRefSetAddress2D = 97#
cuTexRefSetAddress2D_v2 = 256#
cuTexRefSetAddress2D_v3 = 327#
cuTexRefSetAddressMode = 100#
cuTexRefSetAddress_v2 = 255#
cuTexRefSetArray = 94#
cuTexRefSetBorderColor = 455#
cuTexRefSetFilterMode = 101#
cuTexRefSetFlags = 102#
cuTexRefSetFormat = 99#
cuTexRefSetMaxAnisotropy = 354#
cuTexRefSetMipmapFilterMode = 351#
cuTexRefSetMipmapLevelBias = 352#
cuTexRefSetMipmapLevelClamp = 353#
cuTexRefSetMipmappedArray = 350#
cuThreadExchangeStreamCaptureMode = 541#
cuUserObjectCreate = 633#
cuUserObjectRelease = 635#
cuUserObjectRetain = 634#
cuVDPAUCtxCreate = 187#
cuVDPAUCtxCreate_v2 = 240#
cuVDPAUGetDevice = 186#
cuWGLGetDevice = 177#
cuWaitExternalSemaphoresAsync = 492#
cuWaitExternalSemaphoresAsync_ptsz = 493#
reserved464 = 464#
reserved465 = 465#
reserved469 = 469#
reserved786 = 786#
reserved787 = 787#
reserved788 = 788#
reserved793 = 793#
reserved794 = 794#
reserved817 = 817#
reserved818 = 818#
reserved819 = 819#
reserved823 = 823#
reserved824 = 824#
reserved825 = 825#
reserved826 = 826#
reserved827 = 827#
reserved828 = 828#
reserved829 = 829#
reserved830 = 830#
reserved833 = 833#
reserved834 = 834#
reserved848 = 848#
reserved849 = 849#
reserved850 = 850#
reserved851 = 851#
reserved852 = 852#
reserved853 = 853#
reserved854 = 854#
reserved855 = 855#
reserved856 = 856#
reserved857 = 857#
reserved858 = 858#
reserved859 = 859#
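Many driver callback IDs in the listing above are variants of one function: a `_vN` suffix marks a later version of the entry point, and the `_ptds` / `_ptsz` suffixes mark the per-thread default stream builds. When filtering callbacks it is often convenient to treat these as one family. The following is a minimal sketch of that grouping. `DriverCbid` is a hypothetical stand-in that copies a few values from the listing so the example runs without CUPTI installed; `base_name` and `group_by_base` are illustrative helpers, not part of the bindings.

```python
import re
from collections import defaultdict
from enum import IntEnum


class DriverCbid(IntEnum):
    """Hypothetical stand-in: a few values copied from the listing above."""
    cuMemcpyHtoD = 43
    cuMemcpyHtoD_v2 = 276
    cuMemcpyHtoD_v2_ptds = 397
    cuMemcpyHtoDAsync = 60
    cuMemcpyHtoDAsync_v2 = 277
    cuMemcpyHtoDAsync_v2_ptsz = 421
    cuStreamSynchronize = 126
    cuStreamSynchronize_ptsz = 440
    reserved464 = 464


# Optional "_vN" version suffix followed by an optional "_ptds"/"_ptsz" suffix.
_VARIANT_SUFFIX = re.compile(r"(_v\d+)?(_ptds|_ptsz)?$")


def base_name(cbid):
    """Return the member name with version and per-thread suffixes removed."""
    return _VARIANT_SUFFIX.sub("", cbid.name)


def group_by_base(members):
    """Map each base function name to the sorted callback IDs of its variants."""
    groups = defaultdict(list)
    for member in members:
        if member.name.startswith("reserved"):
            continue  # reserved slots do not name a driver function
        groups[base_name(member)].append(int(member))
    return {name: sorted(ids) for name, ids in groups.items()}


families = group_by_base(DriverCbid)
```

With the real enum, the same helpers could be applied by passing the enum class in place of the stand-in, assuming its member names follow the pattern shown in the listing.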
class cupti.cupti.Nvtx_api_trace_cbid(value)#

Bases: IntEnum

See CUpti_nvtx_api_trace_cbid.

FORCE_INT = 2147483647#
INVALID = 0#
SIZE = 50#
nvtxDomainCreateA = 41#
nvtxDomainCreateW = 42#
nvtxDomainDestroy = 43#
nvtxDomainMarkEx = 30#
nvtxDomainNameCategoryA = 37#
nvtxDomainNameCategoryW = 38#
nvtxDomainRangeEnd = 32#
nvtxDomainRangePop = 34#
nvtxDomainRangePushEx = 33#
nvtxDomainRangeStartEx = 31#
nvtxDomainRegisterStringA = 39#
nvtxDomainRegisterStringW = 40#
nvtxDomainResourceCreate = 35#
nvtxDomainResourceDestroy = 36#
nvtxDomainSyncUserAcquireFailed = 47#
nvtxDomainSyncUserAcquireStart = 46#
nvtxDomainSyncUserAcquireSuccess = 48#
nvtxDomainSyncUserCreate = 44#
nvtxDomainSyncUserDestroy = 45#
nvtxDomainSyncUserReleasing = 49#
nvtxMarkA = 1#
nvtxMarkEx = 3#
nvtxMarkW = 2#
nvtxNameCategoryA = 12#
nvtxNameCategoryW = 13#
nvtxNameCuContextA = 18#
nvtxNameCuContextW = 19#
nvtxNameCuDeviceA = 16#
nvtxNameCuDeviceW = 17#
nvtxNameCuEventA = 22#
nvtxNameCuEventW = 23#
nvtxNameCuStreamA = 20#
nvtxNameCuStreamW = 21#
nvtxNameCudaDeviceA = 24#
nvtxNameCudaDeviceW = 25#
nvtxNameCudaEventA = 28#
nvtxNameCudaEventW = 29#
nvtxNameCudaStreamA = 26#
nvtxNameCudaStreamW = 27#
nvtxNameOsThreadA = 14#
nvtxNameOsThreadW = 15#
nvtxRangeEnd = 7#
nvtxRangePop = 11#
nvtxRangePushA = 8#
nvtxRangePushEx = 10#
nvtxRangePushW = 9#
nvtxRangeStartA = 4#
nvtxRangeStartEx = 6#
nvtxRangeStartW = 5#
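In the listing above, `INVALID`, `SIZE` and `FORCE_INT` are sentinels rather than NVTX functions, and `SIZE` is one past the largest function ID. A raw integer callback ID can therefore be range-checked before it is converted to an enum member. The sketch below shows one way to do that. `NvtxCbid` is a hypothetical, deliberately partial stand-in with values copied from the listing, and `describe_cbid` is an illustrative helper rather than a function of the bindings.

```python
from enum import IntEnum


class NvtxCbid(IntEnum):
    """Hypothetical stand-in: a subset of Nvtx_api_trace_cbid values."""
    INVALID = 0
    nvtxMarkA = 1
    nvtxRangePushA = 8
    nvtxRangePop = 11
    nvtxDomainRangePushEx = 33
    nvtxDomainRangePop = 34
    SIZE = 50
    FORCE_INT = 2147483647


def describe_cbid(raw):
    """Turn a raw callback ID into a readable label without raising."""
    # Valid function IDs lie strictly between INVALID and SIZE.
    if not (NvtxCbid.INVALID < raw < NvtxCbid.SIZE):
        return f"<out of range: {raw}>"
    try:
        return NvtxCbid(raw).name
    except ValueError:
        # In range, but missing from this partial stand-in.
        return f"<unknown: {raw}>"


label = describe_cbid(8)
```

Against the full enum the `ValueError` branch should only trigger for IDs that a newer CUPTI release added after the bindings were generated.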
class cupti.cupti.Runtime_api_trace_cbid(value)#

Bases: IntEnum

See CUpti_runtime_api_trace_cbid.

FORCE_INT = 2147483647#
INVALID = 0#
SIZE = 559#
cuda470_v12060 = 470#
cuda471_v12060 = 471#
cuda472_v12060 = 472#
cuda473_v12060 = 473#
cuda474_v12060 = 474#
cuda475_v12060 = 475#
cuda476_v12060 = 476#
cuda477_v12060 = 477#
cuda478_v12060 = 478#
cuda479_v12060 = 479#
cuda531_v13010 = 531#
cuda532_v13010 = 532#
cuda545_v13010 = 545#
cuda546_v13010 = 546#
cuda547_v13010 = 547#
cuda553_v13020 = 553#
cudaArrayGetInfo_v4010 = 181#
cudaArrayGetMemoryRequirements_v11060 = 428#
cudaArrayGetPlane_v11020 = 381#
cudaArrayGetSparseProperties_v11010 = 359#
cudaBindSurfaceToArray_v3020 = 61#
cudaBindTexture2D_v3020 = 56#
cudaBindTextureToArray_v3020 = 57#
cudaBindTextureToMipmappedArray_v5000 = 195#
cudaBindTexture_v3020 = 55#
cudaChooseDevice_v3020 = 5#
cudaConfigureCall_v3020 = 8#
cudaCreateChannelDesc_v3020 = 7#
cudaCreateSurfaceObject_v5000 = 189#
cudaCreateTextureObject_v2_v11080 = 434#
cudaCreateTextureObject_v5000 = 185#
cudaCtxResetPersistingL2Cache_v11000 = 337#
cudaD3D10GetDevice_v3020 = 88#
cudaD3D10GetDevices_v3020 = 89#
cudaD3D10GetDirect3DDevice_v3020 = 149#
cudaD3D10MapResources_v3020 = 94#
cudaD3D10RegisterResource_v3020 = 92#
cudaD3D10ResourceGetMappedArray_v3020 = 98#
cudaD3D10ResourceGetMappedPitch_v3020 = 101#
cudaD3D10ResourceGetMappedPointer_v3020 = 99#
cudaD3D10ResourceGetMappedSize_v3020 = 100#
cudaD3D10ResourceGetSurfaceDimensions_v3020 = 97#
cudaD3D10ResourceSetMapFlags_v3020 = 96#
cudaD3D10SetDirect3DDevice_v3020 = 90#
cudaD3D10UnmapResources_v3020 = 95#
cudaD3D10UnregisterResource_v3020 = 93#
cudaD3D11GetDevice_v3020 = 84#
cudaD3D11GetDevices_v3020 = 85#
cudaD3D11GetDirect3DDevice_v3020 = 148#
cudaD3D11SetDirect3DDevice_v3020 = 86#
cudaD3D9Begin_v3020 = 117#
cudaD3D9End_v3020 = 118#
cudaD3D9GetDevice_v3020 = 102#
cudaD3D9GetDevices_v3020 = 103#
cudaD3D9GetDirect3DDevice_v3020 = 105#
cudaD3D9MapResources_v3020 = 109#
cudaD3D9MapVertexBuffer_v3020 = 121#
cudaD3D9RegisterResource_v3020 = 107#
cudaD3D9RegisterVertexBuffer_v3020 = 119#
cudaD3D9ResourceGetMappedArray_v3020 = 113#
cudaD3D9ResourceGetMappedPitch_v3020 = 116#
cudaD3D9ResourceGetMappedPointer_v3020 = 114#
cudaD3D9ResourceGetMappedSize_v3020 = 115#
cudaD3D9ResourceGetSurfaceDimensions_v3020 = 112#
cudaD3D9ResourceSetMapFlags_v3020 = 111#
cudaD3D9SetDirect3DDevice_v3020 = 104#
cudaD3D9UnmapResources_v3020 = 110#
cudaD3D9UnmapVertexBuffer_v3020 = 122#
cudaD3D9UnregisterResource_v3020 = 108#
cudaD3D9UnregisterVertexBuffer_v3020 = 120#
cudaDestroyExternalMemory_v10000 = 277#
cudaDestroyExternalSemaphore_v10000 = 283#
cudaDestroySurfaceObject_v5000 = 190#
cudaDestroyTextureObject_v5000 = 186#
cudaDevResourceGenerateDesc_v13010 = 525#
cudaDevSmResourceSplitByCount_v13010 = 524#
cudaDevSmResourceSplit_v13010 = 549#
cudaDeviceCanAccessPeer_v4000 = 154#
cudaDeviceDisablePeerAccess_v4000 = 156#
cudaDeviceEnablePeerAccess_v4000 = 155#
cudaDeviceFlushGPUDirectRDMAWrites_v11030 = 405#
cudaDeviceGetAttribute_v5000 = 200#
cudaDeviceGetByPCIBusId_v4010 = 173#
cudaDeviceGetCacheConfig_v3020 = 168#
cudaDeviceGetDefaultMemPool_v11020 = 372#
cudaDeviceGetDevResource_v13010 = 523#
cudaDeviceGetExecutionCtx_v13010 = 548#
cudaDeviceGetGraphMemAttribute_v11040 = 424#
cudaDeviceGetHostAtomicCapabilities_v13000 = 521#
cudaDeviceGetLimit_v3020 = 166#
cudaDeviceGetMemPool_v11020 = 386#
cudaDeviceGetNvSciSyncAttributes_v10020 = 328#
cudaDeviceGetP2PAtomicCapabilities_v13000 = 522#
cudaDeviceGetP2PAttribute_v8000 = 255#
cudaDeviceGetPCIBusId_v4010 = 174#
cudaDeviceGetSharedMemConfig_v4020 = 183#
cudaDeviceGetStreamPriorityRange_v5050 = 205#
cudaDeviceGetTexture1DLinearMaxWidth_v11010 = 347#
cudaDeviceGraphMemTrim_v11040 = 423#
cudaDeviceRegisterAsyncNotification_v12040 = 465#
cudaDeviceReset_v3020 = 164#
cudaDeviceSetCacheConfig_v3020 = 169#
cudaDeviceSetGraphMemAttribute_v11040 = 425#
cudaDeviceSetLimit_v3020 = 167#
cudaDeviceSetMemPool_v11020 = 385#
cudaDeviceSetSharedMemConfig_v4020 = 184#
cudaDeviceSynchronize_v3020 = 165#
cudaDeviceUnregisterAsyncNotification_v12040 = 466#
cudaDriverGetVersion_v3020 = 1#
cudaEGLStreamConsumerAcquireFrame_v7000 = 259#
cudaEGLStreamConsumerConnectWithFlags_v7000 = 268#
cudaEGLStreamConsumerConnect_v7000 = 257#
cudaEGLStreamConsumerDisconnect_v7000 = 258#
cudaEGLStreamConsumerReleaseFrame_v7000 = 260#
cudaEGLStreamProducerConnect_v7000 = 261#
cudaEGLStreamProducerDisconnect_v7000 = 262#
cudaEGLStreamProducerPresentFrame_v7000 = 263#
cudaEGLStreamProducerReturnFrame_v7000 = 264#
cudaEventCreateFromEGLSync_v9000 = 271#
cudaEventCreateWithFlags_v3020 = 134#
cudaEventCreate_v3020 = 133#
cudaEventDestroy_v3020 = 136#
cudaEventElapsedTime_v12080 = 486#
cudaEventElapsedTime_v2_v12080 = 486#
cudaEventElapsedTime_v3020 = 139#
cudaEventQuery_v3020 = 138#
cudaEventRecordWithFlags_ptsz_v11010 = 371#
cudaEventRecordWithFlags_v11010 = 370#
cudaEventRecord_ptsz_v7000 = 242#
cudaEventRecord_v3020 = 135#
cudaEventSynchronize_v3020 = 137#
cudaExecutionCtxDestroy_v13010 = 527#
cudaExecutionCtxGetDevResource_v13010 = 528#
cudaExecutionCtxGetDevice_v13010 = 529#
cudaExecutionCtxGetId_v13010 = 530#
cudaExecutionCtxRecordEvent_v13010 = 543#
cudaExecutionCtxStreamCreate_v13010 = 533#
cudaExecutionCtxSynchronize_v13010 = 534#
cudaExecutionCtxWaitEvent_v13010 = 544#
cudaExternalMemoryGetMappedBuffer_v10000 = 275#
cudaExternalMemoryGetMappedMipmappedArray_v10000 = 276#
cudaFreeArray_v3020 = 24#
cudaFreeAsync_ptsz_v11020 = 376#
cudaFreeAsync_v11020 = 375#
cudaFreeHost_v3020 = 26#
cudaFreeMipmappedArray_v5000 = 194#
cudaFree_v3020 = 22#
cudaFuncGetAttributes_v3020 = 15#
cudaFuncGetName_v12030 = 451#
cudaFuncGetParamCount_v13020 = 552#
cudaFuncGetParamInfo_v12040 = 467#
cudaFuncSetAttribute_v9000 = 273#
cudaFuncSetCacheConfig_v3020 = 14#
cudaFuncSetSharedMemConfig_v4020 = 182#
cudaGLGetDevices_v4010 = 175#
cudaGLMapBufferObjectAsync_v3020 = 69#
cudaGLMapBufferObject_v3020 = 65#
cudaGLRegisterBufferObject_v3020 = 64#
cudaGLSetBufferObjectMapFlags_v3020 = 68#
cudaGLSetGLDevice_v3020 = 63#
cudaGLUnmapBufferObjectAsync_v3020 = 70#
cudaGLUnmapBufferObject_v3020 = 66#
cudaGLUnregisterBufferObject_v3020 = 67#
cudaGetChannelDesc_v3020 = 6#
cudaGetDeviceCount_v3020 = 3#
cudaGetDeviceFlags_v7000 = 212#
cudaGetDeviceProperties_v12000 = 440#
cudaGetDeviceProperties_v2_v12000 = 440#
cudaGetDeviceProperties_v3020 = 4#
cudaGetDevice_v3020 = 17#
cudaGetDriverEntryPointByVersion_ptsz_v12050 = 469#
cudaGetDriverEntryPointByVersion_v12050 = 468#
cudaGetDriverEntryPoint_ptsz_v11030 = 407#
cudaGetDriverEntryPoint_v11030 = 406#
cudaGetErrorName_v6050 = 209#
cudaGetErrorString_v3020 = 12#
cudaGetExportTable_v13000 = 493#
cudaGetFuncBySymbol_v11000 = 336#
cudaGetKernel_v12000 = 439#
cudaGetLastError_v3020 = 10#
cudaGetMipmappedArrayLevel_v5000 = 193#
cudaGetSurfaceObjectResourceDesc_v5000 = 191#
cudaGetSurfaceReference_v3020 = 62#
cudaGetSymbolAddress_v3020 = 53#
cudaGetSymbolSize_v3020 = 54#
cudaGetTextureAlignmentOffset_v3020 = 59#
cudaGetTextureObjectResourceDesc_v5000 = 187#
cudaGetTextureObjectResourceViewDesc_v5000 = 199#
cudaGetTextureObjectTextureDesc_v2_v11080 = 435#
cudaGetTextureObjectTextureDesc_v5000 = 188#
cudaGetTextureReference_v3020 = 60#
cudaGraphAddChildGraphNode_v10000 = 298#
cudaGraphAddDependencies_v10000 = 307#
cudaGraphAddDependencies_v12030 = 458#
cudaGraphAddDependencies_v2_v12030 = 458#
cudaGraphAddEmptyNode_v10000 = 300#
cudaGraphAddEventRecordNode_v11010 = 362#
cudaGraphAddEventWaitNode_v11010 = 365#
cudaGraphAddExternalSemaphoresSignalNode_v11020 = 397#
cudaGraphAddExternalSemaphoresWaitNode_v11020 = 400#
cudaGraphAddHostNode_v10000 = 296#
cudaGraphAddKernelNode_v10000 = 289#
cudaGraphAddMemAllocNode_v11040 = 419#
cudaGraphAddMemFreeNode_v11040 = 421#
cudaGraphAddMemcpyNode1D_v11010 = 352#
cudaGraphAddMemcpyNodeFromSymbol_v11010 = 351#
cudaGraphAddMemcpyNodeToSymbol_v11010 = 350#
cudaGraphAddMemcpyNode_v10000 = 290#
cudaGraphAddMemsetNode_v10000 = 293#
cudaGraphAddNode_v12020 = 445#
cudaGraphAddNode_v12030 = 460#
cudaGraphAddNode_v2_v12030 = 460#
cudaGraphChildGraphNodeGetGraph_v10000 = 299#
cudaGraphClone_v10000 = 301#
cudaGraphConditionalHandleCreate_v12030 = 454#
cudaGraphConditionalHandleCreate_v2_v13010 = 535#
cudaGraphCreate_v10000 = 286#
cudaGraphDebugDotPrint_v11030 = 408#
cudaGraphDestroyNode_v10000 = 309#
cudaGraphDestroy_v10000 = 314#
cudaGraphEventRecordNodeGetEvent_v11010 = 363#
cudaGraphEventRecordNodeSetEvent_v11010 = 364#
cudaGraphEventWaitNodeGetEvent_v11010 = 366#
cudaGraphEventWaitNodeSetEvent_v11010 = 367#
cudaGraphExecChildGraphNodeSetParams_v11010 = 361#
cudaGraphExecDestroy_v10000 = 313#
cudaGraphExecEventRecordNodeSetEvent_v11010 = 368#
cudaGraphExecEventWaitNodeSetEvent_v11010 = 369#
cudaGraphExecExternalSemaphoresSignalNodeSetParams_v11020 = 403#
cudaGraphExecExternalSemaphoresWaitNodeSetParams_v11020 = 404#
cudaGraphExecGetFlags_v12000 = 438#
cudaGraphExecGetId_v13010 = 542#
cudaGraphExecHostNodeSetParams_v10020 = 334#
cudaGraphExecKernelNodeSetParams_v10010 = 326#
cudaGraphExecMemcpyNodeSetParams1D_v11010 = 358#
cudaGraphExecMemcpyNodeSetParamsFromSymbol_v11010 = 357#
cudaGraphExecMemcpyNodeSetParamsToSymbol_v11010 = 356#
cudaGraphExecMemcpyNodeSetParams_v10020 = 332#
cudaGraphExecMemsetNodeSetParams_v10020 = 333#
cudaGraphExecNodeSetParams_v12020 = 447#
cudaGraphExecUpdate_v10020 = 335#
cudaGraphExternalSemaphoresSignalNodeGetParams_v11020 = 398#
cudaGraphExternalSemaphoresSignalNodeSetParams_v11020 = 399#
cudaGraphExternalSemaphoresWaitNodeGetParams_v11020 = 401#
cudaGraphExternalSemaphoresWaitNodeSetParams_v11020 = 402#
cudaGraphGetEdges_v10000 = 323#
cudaGraphGetEdges_v12030 = 455#
cudaGraphGetEdges_v2_v12030 = 455#
cudaGraphGetId_v13010 = 541#
cudaGraphGetNodes_v10000 = 322#
cudaGraphGetRootNodes_v10000 = 304#
cudaGraphHostNodeGetParams_v10000 = 297#
cudaGraphHostNodeSetParams_v10000 = 321#
cudaGraphInstantiateWithFlags_v11040 = 418#
cudaGraphInstantiateWithParams_ptsz_v12000 = 437#
cudaGraphInstantiateWithParams_v12000 = 436#
cudaGraphInstantiate_v10000 = 310#
cudaGraphInstantiate_v12000 = 443#
cudaGraphKernelNodeCopyAttributes_v11000 = 338#
cudaGraphKernelNodeGetAttribute_v11000 = 339#
cudaGraphKernelNodeGetParams_v10000 = 287#
cudaGraphKernelNodeSetAttribute_v11000 = 340#
cudaGraphKernelNodeSetParams_v10000 = 288#
cudaGraphLaunch_ptsz_v10000 = 312#
cudaGraphLaunch_v10000 = 311#
cudaGraphMemAllocNodeGetParams_v11040 = 420#
cudaGraphMemFreeNodeGetParams_v11040 = 422#
cudaGraphMemcpyNodeGetParams_v10000 = 291#
cudaGraphMemcpyNodeSetParams1D_v11010 = 355#
cudaGraphMemcpyNodeSetParamsFromSymbol_v11010 = 354#
cudaGraphMemcpyNodeSetParamsToSymbol_v11010 = 353#
cudaGraphMemcpyNodeSetParams_v10000 = 292#
cudaGraphMemsetNodeGetParams_v10000 = 294#
cudaGraphMemsetNodeSetParams_v10000 = 295#
cudaGraphNodeFindInClone_v10000 = 302#
cudaGraphNodeGetContainingGraph_v13010 = 538#
cudaGraphNodeGetDependencies_v10000 = 305#
cudaGraphNodeGetDependencies_v12030 = 456#
cudaGraphNodeGetDependencies_v2_v12030 = 456#
cudaGraphNodeGetDependentNodes_v10000 = 306#
cudaGraphNodeGetDependentNodes_v12030 = 457#
cudaGraphNodeGetDependentNodes_v2_v12030 = 457#
cudaGraphNodeGetEnabled_v11060 = 427#
cudaGraphNodeGetLocalId_v13010 = 539#
cudaGraphNodeGetParams_v13020 = 558#
cudaGraphNodeGetToolsId_v13010 = 540#
cudaGraphNodeGetType_v10000 = 303#
cudaGraphNodeSetEnabled_v11060 = 426#
cudaGraphNodeSetParams_v12020 = 446#
cudaGraphReleaseUserObject_v11030 = 417#
cudaGraphRemoveDependencies_v10000 = 308#
cudaGraphRemoveDependencies_v12030 = 459#
cudaGraphRemoveDependencies_v2_v12030 = 459#
cudaGraphRetainUserObject_v11030 = 416#
cudaGraphUpload_ptsz_v10000 = 349#
cudaGraphUpload_v10000 = 348#
cudaGraphicsD3D10RegisterResource_v3020 = 91#
cudaGraphicsD3D11RegisterResource_v3020 = 87#
cudaGraphicsD3D9RegisterResource_v3020 = 106#
cudaGraphicsEGLRegisterImage_v7000 = 256#
cudaGraphicsGLRegisterBuffer_v3020 = 73#
cudaGraphicsGLRegisterImage_v3020 = 72#
cudaGraphicsMapResources_v3020 = 76#
cudaGraphicsResourceGetMappedEglFrame_v7000 = 265#
cudaGraphicsResourceGetMappedMipmappedArray_v5000 = 196#
cudaGraphicsResourceGetMappedPointer_v3020 = 78#
cudaGraphicsResourceSetMapFlags_v3020 = 75#
cudaGraphicsSubResourceGetMappedArray_v3020 = 79#
cudaGraphicsUnmapResources_v3020 = 77#
cudaGraphicsUnregisterResource_v3020 = 74#
cudaGraphicsVDPAURegisterOutputSurface_v3020 = 83#
cudaGraphicsVDPAURegisterVideoSurface_v3020 = 82#
cudaGreenCtxCreate_v13010 = 526#
cudaHostAlloc_v3020 = 27#
cudaHostGetDevicePointer_v3020 = 28#
cudaHostGetFlags_v3020 = 29#
cudaHostRegister_v4000 = 152#
cudaHostUnregister_v4000 = 153#
cudaImportExternalMemory_v10000 = 274#
cudaImportExternalSemaphore_v10000 = 278#
cudaInitDevice_v12000 = 444#
cudaIpcCloseMemHandle_v4010 = 180#
cudaIpcGetEventHandle_v4010 = 176#
cudaIpcGetMemHandle_v4010 = 178#
cudaIpcOpenEventHandle_v4010 = 177#
cudaIpcOpenMemHandle_v4010 = 179#
cudaKernelSetAttributeForDevice_v12060 = 479#
cudaLaunchCooperativeKernelMultiDevice_v9000 = 272#
cudaLaunchCooperativeKernel_ptsz_v9000 = 270#
cudaLaunchCooperativeKernel_v9000 = 269#
cudaLaunchHostFunc_ptsz_v10000 = 285#
cudaLaunchHostFunc_v10000 = 284#
cudaLaunchHostFunc_v2_ptsz_v13020 = 551#
cudaLaunchHostFunc_v2_v13020 = 550#
cudaLaunchKernelExC_ptsz_v11060 = 431#
cudaLaunchKernelExC_v11060 = 430#
cudaLaunchKernel_ptsz_v7000 = 214#
cudaLaunchKernel_v7000 = 211#
cudaLaunch_ptsz_v7000 = 213#
cudaLaunch_v3020 = 13#
cudaLibraryEnumerateKernels_v12060 = 478#
cudaLibraryGetGlobal_v12060 = 474#
cudaLibraryGetKernelCount_v12060 = 477#
cudaLibraryGetKernel_v12060 = 473#
cudaLibraryGetManaged_v12060 = 475#
cudaLibraryGetUnifiedFunction_v12060 = 476#
cudaLibraryLoadData_v12060 = 470#
cudaLibraryLoadFromFile_v12060 = 471#
cudaLibraryUnload_v12060 = 472#
cudaLogsCurrent_v13000 = 515#
cudaLogsDumpToFile_v13000 = 516#
cudaLogsDumpToMemory_v13000 = 517#
cudaLogsRegisterCallback_v13000 = 513#
cudaLogsUnregisterCallback_v13000 = 514#
cudaMalloc3DArray_v3020 = 141#
cudaMalloc3D_v3020 = 140#
cudaMallocArray_v3020 = 23#
cudaMallocAsync_ptsz_v11020 = 374#
cudaMallocAsync_v11020 = 373#
cudaMallocFromPoolAsync_ptsz_v11020 = 392#
cudaMallocFromPoolAsync_v11020 = 391#
cudaMallocHost_v3020 = 25#
cudaMallocManaged_v6000 = 206#
cudaMallocMipmappedArray_v5000 = 192#
cudaMallocPitch_v3020 = 21#
cudaMalloc_v3020 = 20#
cudaMemAdvise_v12020 = 448#
cudaMemAdvise_v2_v12020 = 448#
cudaMemAdvise_v8000 = 254#
cudaMemDiscardAndPrefetchBatchAsync_ptsz_v13000 = 492#
cudaMemDiscardAndPrefetchBatchAsync_v13000 = 491#
cudaMemDiscardBatchAsync_ptsz_v13000 = 490#
cudaMemDiscardBatchAsync_v13000 = 489#
cudaMemGetDefaultMemPool_v13000 = 518#
cudaMemGetInfo_v3020 = 30#
cudaMemGetMemPool_v13000 = 519#
cudaMemPoolCreate_v11020 = 383#
cudaMemPoolDestroy_v11020 = 384#
cudaMemPoolExportPointer_v11020 = 389#
cudaMemPoolExportToShareableHandle_v11020 = 387#
cudaMemPoolGetAccess_v11020 = 382#
cudaMemPoolGetAttribute_v11020 = 379#
cudaMemPoolImportFromShareableHandle_v11020 = 388#
cudaMemPoolImportPointer_v11020 = 390#
cudaMemPoolSetAccess_v11020 = 380#
cudaMemPoolSetAttribute_v11020 = 378#
cudaMemPoolTrimTo_v11020 = 377#
cudaMemPrefetchAsync_ptsz_v12020 = 450#
cudaMemPrefetchAsync_ptsz_v8000 = 253#
cudaMemPrefetchAsync_v12020 = 449#
cudaMemPrefetchAsync_v2_ptsz_v12020 = 450#
cudaMemPrefetchAsync_v2_v12020 = 449#
cudaMemPrefetchAsync_v8000 = 252#
cudaMemPrefetchBatchAsync_ptsz_v13000 = 488#
cudaMemPrefetchBatchAsync_v13000 = 487#
cudaMemRangeGetAttribute_v8000 = 266#
cudaMemRangeGetAttributes_v8000 = 267#
cudaMemSetMemPool_v13000 = 520#
cudaMemcpy2DArrayToArray_ptds_v7000 = 222#
cudaMemcpy2DArrayToArray_v3020 = 38#
cudaMemcpy2DAsync_ptsz_v7000 = 228#
cudaMemcpy2DAsync_v3020 = 44#
cudaMemcpy2DFromArrayAsync_ptsz_v7000 = 230#
cudaMemcpy2DFromArrayAsync_v3020 = 46#
cudaMemcpy2DFromArray_ptds_v7000 = 220#
cudaMemcpy2DFromArray_v3020 = 36#
cudaMemcpy2DToArrayAsync_ptsz_v7000 = 229#
cudaMemcpy2DToArrayAsync_v3020 = 45#
cudaMemcpy2DToArray_ptds_v7000 = 218#
cudaMemcpy2DToArray_v3020 = 34#
cudaMemcpy2D_ptds_v7000 = 216#
cudaMemcpy2D_v3020 = 32#
cudaMemcpy3DAsync_ptsz_v7000 = 246#
cudaMemcpy3DAsync_v3020 = 145#
cudaMemcpy3DBatchAsync_ptsz_v12080 = 485#
cudaMemcpy3DBatchAsync_ptsz_v13000 = 512#
cudaMemcpy3DBatchAsync_v12080 = 484#
cudaMemcpy3DBatchAsync_v13000 = 511#
cudaMemcpy3DPeerAsync_ptsz_v7000 = 250#
cudaMemcpy3DPeerAsync_v4000 = 163#
cudaMemcpy3DPeer_ptds_v7000 = 249#
cudaMemcpy3DPeer_v4000 = 162#
cudaMemcpy3DWithAttributesAsync_ptsz_v13020 = 557#
cudaMemcpy3DWithAttributesAsync_v13020 = 556#
cudaMemcpy3D_ptds_v7000 = 245#
cudaMemcpy3D_v3020 = 144#
cudaMemcpyArrayToArray_ptds_v7000 = 221#
cudaMemcpyArrayToArray_v3020 = 37#
cudaMemcpyAsync_ptsz_v7000 = 225#
cudaMemcpyAsync_v3020 = 41#
cudaMemcpyBatchAsync_ptsz_v12080 = 483#
cudaMemcpyBatchAsync_ptsz_v13000 = 510#
cudaMemcpyBatchAsync_v12080 = 482#
cudaMemcpyBatchAsync_v13000 = 509#
cudaMemcpyFromArrayAsync_ptsz_v7000 = 227#
cudaMemcpyFromArrayAsync_v3020 = 43#
cudaMemcpyFromArray_ptds_v7000 = 219#
cudaMemcpyFromArray_v3020 = 35#
cudaMemcpyFromSymbolAsync_ptsz_v7000 = 232#
cudaMemcpyFromSymbolAsync_v3020 = 48#
cudaMemcpyFromSymbol_ptds_v7000 = 224#
cudaMemcpyFromSymbol_v3020 = 40#
cudaMemcpyPeerAsync_v4000 = 161#
cudaMemcpyPeer_v4000 = 160#
cudaMemcpyToArrayAsync_ptsz_v7000 = 226#
cudaMemcpyToArrayAsync_v3020 = 42#
cudaMemcpyToArray_ptds_v7000 = 217#
cudaMemcpyToArray_v3020 = 33#
cudaMemcpyToSymbolAsync_ptsz_v7000 = 231#
cudaMemcpyToSymbolAsync_v3020 = 47#
cudaMemcpyToSymbol_ptds_v7000 = 223#
cudaMemcpyToSymbol_v3020 = 39#
cudaMemcpyWithAttributesAsync_ptsz_v13020 = 555#
cudaMemcpyWithAttributesAsync_v13020 = 554#
cudaMemcpy_ptds_v7000 = 215#
cudaMemcpy_v3020 = 31#
cudaMemset2DAsync_ptsz_v7000 = 236#
cudaMemset2DAsync_v3020 = 52#
cudaMemset2D_ptds_v7000 = 234#
cudaMemset2D_v3020 = 50#
cudaMemset3DAsync_ptsz_v7000 = 244#
cudaMemset3DAsync_v3020 = 143#
cudaMemset3D_ptds_v7000 = 243#
cudaMemset3D_v3020 = 142#
cudaMemsetAsync_ptsz_v7000 = 235#
cudaMemsetAsync_v3020 = 51#
cudaMemset_ptds_v7000 = 233#
cudaMemset_v3020 = 49#
cudaMipmappedArrayGetMemoryRequirements_v11060 = 429#
cudaMipmappedArrayGetSparseProperties_v11010 = 360#
cudaOccupancyAvailableDynamicSMemPerBlock_v10200 = 329#
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags_v7000 = 251#
cudaOccupancyMaxActiveBlocksPerMultiprocessor_v6000 = 207#
cudaOccupancyMaxActiveBlocksPerMultiprocessor_v6050 = 210#
cudaOccupancyMaxActiveClusters_v11070 = 433#
cudaOccupancyMaxPotentialClusterSize_v11070 = 432#
cudaPeekAtLastError_v3020 = 11#
cudaPeerGetDevicePointer_v4000 = 159#
cudaPeerRegister_v4000 = 157#
cudaPeerUnregister_v4000 = 158#
cudaPointerGetAttributes_v4000 = 151#
cudaProfilerInitialize_v4000 = 170#
cudaProfilerStart_v4000 = 171#
cudaProfilerStop_v4000 = 172#
cudaRuntimeGetVersion_v3020 = 2#
cudaSetDeviceFlags_v3020 = 19#
cudaSetDevice_v3020 = 16#
cudaSetDoubleForDevice_v3020 = 124#
cudaSetDoubleForHost_v3020 = 125#
cudaSetValidDevices_v3020 = 18#
cudaSetupArgument_v3020 = 9#
cudaSignalExternalSemaphoresAsync_ptsz_v10000 = 280#
cudaSignalExternalSemaphoresAsync_ptsz_v11020 = 394#
cudaSignalExternalSemaphoresAsync_v10000 = 279#
cudaSignalExternalSemaphoresAsync_v11020 = 393#
cudaSignalExternalSemaphoresAsync_v2_ptsz_v11020 = 394#
cudaSignalExternalSemaphoresAsync_v2_v11020 = 393#
cudaStreamAddCallback_ptsz_v7000 = 248#
cudaStreamAddCallback_v5000 = 197#
cudaStreamAttachMemAsync_ptsz_v7000 = 241#
cudaStreamAttachMemAsync_v6000 = 208#
cudaStreamBeginCaptureToGraph_ptsz_v12030 = 453#
cudaStreamBeginCaptureToGraph_v12030 = 452#
cudaStreamBeginCapture_ptsz_v10000 = 316#
cudaStreamBeginCapture_v10000 = 315#
cudaStreamCopyAttributes_ptsz_v11000 = 342#
cudaStreamCopyAttributes_v11000 = 341#
cudaStreamCreateWithFlags_v5000 = 198#
cudaStreamCreateWithPriority_v5050 = 202#
cudaStreamCreate_v3020 = 129#
cudaStreamDestroy_v3020 = 130#
cudaStreamDestroy_v5050 = 201#
cudaStreamEndCapture_ptsz_v10000 = 320#
cudaStreamEndCapture_v10000 = 319#
cudaStreamGetAttribute_ptsz_v11000 = 344#
cudaStreamGetAttribute_v11000 = 343#
cudaStreamGetCaptureInfo_ptsz_v10010 = 325#
cudaStreamGetCaptureInfo_ptsz_v12030 = 462#
cudaStreamGetCaptureInfo_v10010 = 324#
cudaStreamGetCaptureInfo_v12030 = 461#
cudaStreamGetCaptureInfo_v2_ptsz_v11030 = 410#
cudaStreamGetCaptureInfo_v2_v11030 = 409#
cudaStreamGetCaptureInfo_v3_ptsz_v12030 = 462#
cudaStreamGetCaptureInfo_v3_v12030 = 461#
cudaStreamGetDevResource_ptsz_v13010 = 537#
cudaStreamGetDevResource_v13010 = 536#
cudaStreamGetDevice_ptsz_v12080 = 481#
cudaStreamGetDevice_v12080 = 480#
cudaStreamGetFlags_ptsz_v7000 = 238#
cudaStreamGetFlags_v5050 = 204#
cudaStreamGetId_ptsz_v12000 = 442#
cudaStreamGetId_v12000 = 441#
cudaStreamGetPriority_ptsz_v7000 = 237#
cudaStreamGetPriority_v5050 = 203#
cudaStreamIsCapturing_ptsz_v10000 = 318#
cudaStreamIsCapturing_v10000 = 317#
cudaStreamQuery_ptsz_v7000 = 240#
cudaStreamQuery_v3020 = 132#
cudaStreamSetAttribute_ptsz_v11000 = 346#
cudaStreamSetAttribute_v11000 = 345#
cudaStreamSetFlags_ptsz_v10200 = 331#
cudaStreamSetFlags_v10200 = 330#
cudaStreamSynchronize_ptsz_v7000 = 239#
cudaStreamSynchronize_v3020 = 131#
cudaStreamUpdateCaptureDependencies_ptsz_v11030 = 412#
cudaStreamUpdateCaptureDependencies_ptsz_v12030 = 464#
cudaStreamUpdateCaptureDependencies_v11030 = 411#
cudaStreamUpdateCaptureDependencies_v12030 = 463#
cudaStreamUpdateCaptureDependencies_v2_ptsz_v12030 = 464#
cudaStreamUpdateCaptureDependencies_v2_v12030 = 463#
cudaStreamWaitEvent_ptsz_v7000 = 247#
cudaStreamWaitEvent_v3020 = 147#
cudaThreadExchangeStreamCaptureMode_v10010 = 327#
cudaThreadExit_v3020 = 123#
cudaThreadGetCacheConfig_v3020 = 150#
cudaThreadGetLimit_v3020 = 127#
cudaThreadSetCacheConfig_v3020 = 146#
cudaThreadSetLimit_v3020 = 128#
cudaThreadSynchronize_v3020 = 126#
cudaUnbindTexture_v3020 = 58#
cudaUserObjectCreate_v11030 = 413#
cudaUserObjectRelease_v11030 = 415#
cudaUserObjectRetain_v11030 = 414#
cudaVDPAUGetDevice_v3020 = 80#
cudaVDPAUSetVDPAUDevice_v3020 = 81#
cudaWGLGetDevice_v3020 = 71#
cudaWaitExternalSemaphoresAsync_ptsz_v10000 = 282#
cudaWaitExternalSemaphoresAsync_ptsz_v11020 = 396#
cudaWaitExternalSemaphoresAsync_v10000 = 281#
cudaWaitExternalSemaphoresAsync_v11020 = 395#
cudaWaitExternalSemaphoresAsync_v2_ptsz_v11020 = 396#
cudaWaitExternalSemaphoresAsync_v2_v11020 = 395#

cupti.cupti.driver_api_trace_cbid#

alias of Driver_api_trace_cbid

cupti.cupti.runtime_api_trace_cbid#

alias of Runtime_api_trace_cbid
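The member names listed above appear to follow the usual CUDA callback-ID naming scheme: an API name, an optional `_ptsz`/`_ptds` marker for the per-thread default stream variant, and a trailing `_vNNNN` version tag. The sketch below splits a name into those parts. It assumes the version tag uses the same encoding as `CUDA_VERSION` (major * 1000 + minor * 10). `DemoCbid` and `split_cbid_name` are illustrative names, not part of cupti; `DemoCbid` copies a few members from the listing so the example runs without the library.

```python
import re
from enum import IntEnum


# Illustrative subset copied from the listing above; the real enum is
# cupti.cupti.runtime_api_trace_cbid.
class DemoCbid(IntEnum):
    cudaStreamGetCaptureInfo_ptsz_v12030 = 462
    cudaStreamGetCaptureInfo_v3_ptsz_v12030 = 462  # same value: becomes an alias
    cudaStreamSynchronize_v3020 = 131
    cudaStreamSynchronize_ptsz_v7000 = 239


# Lazy API name, optional per-thread marker, then the version tag.
_CBID_RE = re.compile(r"^(?P<api>.+?)(?P<pt>_pt[sd]z)?_v(?P<ver>\d+)$")


def split_cbid_name(name):
    """Split a cbid member name into (api, per_thread_variant, (major, minor))."""
    m = _CBID_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized cbid name: {name}")
    ver = int(m.group("ver"))
    # Assumption: same encoding as CUDA_VERSION (major * 1000 + minor * 10).
    version = (ver // 1000, (ver % 1000) // 10)
    return m.group("api"), m.group("pt") is not None, version
```

Several members in the listing share a value (for example 462). In a Python `IntEnum`, later members with a repeated value become aliases of the first one defined, so a lookup by value returns only one of the names. Whether the real enum behaves this way, and which name it reports, depends on how the binding defines it.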

Classes#

class cupti.cupti.CallbackData#

Bases: object

Empty-initialize an instance of CUpti_CallbackData.

callback_site#

Point in the runtime or driver function from where the callback was issued.

Type:

int

context#

Driver context current to the thread, or null if no context is current. This value can change from the entry to exit callback of a runtime API function if the runtime initializes a context.

Type:

int

context_uid#

Unique ID for the CUDA context associated with the thread. The UIDs are assigned sequentially as contexts are created and are unique within a process.

Type:

int

correlation_data#

Pointer to data shared between the entry and exit callbacks of a given runtime or driver API function invocation. This field can be used to pass 64-bit values from the entry callback to the corresponding exit callback.

Type:

int
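As a minimal sketch of passing a 64-bit value from the entry callback to the exit callback, the example below treats `correlation_data` as the address of a `uint64_t` slot (as in the C struct) and accesses it with `ctypes`. The functions `on_entry` and `on_exit` are hypothetical helpers, and `demo_slot` stands in for the slot that CUPTI would provide, so the code runs without a GPU.

```python
import ctypes
import time


def on_entry(correlation_data_addr):
    """Entry callback side: store a start timestamp in the shared 64-bit slot."""
    slot = ctypes.c_uint64.from_address(correlation_data_addr)
    slot.value = time.perf_counter_ns()


def on_exit(correlation_data_addr):
    """Exit callback side: read the timestamp back and return the elapsed ns."""
    slot = ctypes.c_uint64.from_address(correlation_data_addr)
    return time.perf_counter_ns() - slot.value


# Stand-in for the slot CUPTI would provide; in a real callback the address
# would come from CallbackData.correlation_data.
demo_slot = ctypes.c_uint64(0)
addr = ctypes.addressof(demo_slot)
on_entry(addr)
elapsed_ns = on_exit(addr)
```

The same pattern works for any value that fits in 64 bits, such as an index into a Python-side table of per-call state.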

correlation_id#

The activity record correlation ID for this callback. For a driver domain callback (i.e. `domain` is CUPTI_CB_DOMAIN_DRIVER_API), this ID will equal the correlation ID in the CUpti_ActivityAPI record corresponding to the CUDA driver function call. For a runtime domain callback (i.e. `domain` is CUPTI_CB_DOMAIN_RUNTIME_API), this ID will equal the correlation ID in the CUpti_ActivityAPI record corresponding to the CUDA runtime function call. Within the callback, this ID can be recorded to correlate user data with the activity record. This field is new in 4.1.

Type:

int

static from_buffer(buffer)#

Create a CallbackData instance with the memory from the given buffer.

function_name#

Name of the runtime or driver API function which issued the callback. This string is a global constant and so may be accessed outside of the callback.

Type:

str

function_params#

Pointer to the arguments passed to the runtime or driver API call. See generated_cuda_runtime_api_meta.h and generated_cuda_meta.h for structure definitions for the parameters for each runtime and driver API function.

Type:

int

function_return_value#

Pointer to the return value of the runtime or driver API call. This field is only valid within the exit (CUPTI_API_EXIT) callback. For a runtime API call, `functionReturnValue` points to a `cudaError_t`. For a driver API call, `functionReturnValue` points to a `CUresult`.

Type:

int

ptr#

Get the pointer address to the data as Python int.

symbol_name#

Name of the symbol operated on by the runtime or driver API function which issued the callback. This entry is valid only for driver and runtime launch callbacks, where it returns the name of the kernel.

Type:

str

class cupti.cupti.GraphData#

Bases: object

Empty-initialize an instance of CUpti_GraphData.

See also

CUpti_GraphData

dependency#

The dependent graph node

Type:

int

static from_buffer(buffer)#

Create a GraphData instance with the memory from the given buffer.

graph#

CUDA graph

Type:

int

graph_exec#

CUDA executable graph

Type:

int

node#

CUDA graph node

Type:

int

node_type#

Type of the node

Type:

int

original_graph#

The original CUDA graph from which graph is cloned

Type:

int

original_node#

The original CUDA graph node from which node is cloned

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ModuleResourceData#

Bases: object

Empty-initialize an instance of CUpti_ModuleResourceData.

cubin_size#

The size of the cubin.

Type:

int

static from_buffer(buffer)#

Create a ModuleResourceData instance with the memory from the given buffer.

module_id#

Identifier to associate with the CUDA module.

Type:

int

p_cubin#

Pointer to the associated cubin.

Type:

str

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.NvtxData#

Bases: object

Empty-initialize an instance of CUpti_NvtxData.

See also

CUpti_NvtxData

static from_buffer(buffer)#

Create an NvtxData instance with the memory from the given buffer.

function_name#

Name of the NVTX API function which issued the callback. This string is a global constant and so may be accessed outside of the callback.

Type:

str

function_params#

Pointer to the arguments passed to the NVTX API call. See generated_nvtx_meta.h for structure definitions for the parameters for each NVTX API function.

Type:

int

function_return_value#

Pointer to the return value of the NVTX API call. See nvToolsExt.h for each NVTX API function’s return value.

Type:

int

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.ResourceData#

Bases: object

Empty-initialize an instance of CUpti_ResourceData.

context#

For CUPTI_CBID_RESOURCE_CONTEXT_CREATED and CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING, the context being created or destroyed. For CUPTI_CBID_RESOURCE_STREAM_CREATED and CUPTI_CBID_RESOURCE_STREAM_DESTROY_STARTING, the context containing the stream being created or destroyed.

Type:

int

static from_buffer(buffer)#

Create a ResourceData instance with the memory from the given buffer.

ptr#

Get the pointer address to the data as Python int.

resource_descriptor#

Reserved for future use.

Type:

int

resource_handle#

Type:

_py_anon_pod0

class cupti.cupti.StateData#

Bases: object

Empty-initialize an instance of CUpti_StateData.

See also

CUpti_StateData

static from_buffer(buffer)#

Create a StateData instance with the memory from the given buffer.

ptr#

Get the pointer address to the data as Python int.

class cupti.cupti.StreamAttrData#

Bases: object

Empty-initialize an instance of CUpti_StreamAttrData.

attr#

The type of the CUDA stream attribute

Type:

int

static from_buffer(buffer)#

Create a StreamAttrData instance with the memory from the given buffer.

ptr#

Get the pointer address to the data as Python int.

stream#

The CUDA stream handle for the attribute

Type:

int

value#

The value of the CUDA stream attribute

Type:

int

class cupti.cupti.SubscriberParams#

Bases: _SubscriberParams

Empty-initialize an instance of CUpti_SubscriberParams.

class cupti.cupti.SynchronizeData#

Bases: object

Empty-initialize an instance of CUpti_SynchronizeData.

context#

The context of the stream being synchronized.

Type:

int

static from_buffer(buffer)#

Create a SynchronizeData instance with the memory from the given buffer.

ptr#

Get the pointer address to the data as Python int.

stream#

The stream being synchronized.

Type:

int

class cupti.cupti._SubscriberParams#

Bases: object

Empty-initialize an instance of CUpti_SubscriberParams.

allow_multiple_subscribers#

CUPTI Python does not support multiple subscribers; this variable is fixed at 0 and cannot be changed.

static from_buffer(buffer)#

Create a _SubscriberParams instance with the memory from the given buffer.

old_subscriber_name#

If multiple subscribers are not allowed and CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED is returned, the name of the incompatible tool or the existing CUPTI subscriber will be written to this location. Size should be greater than or equal to CUPTI_OLD_SUBSCRIBER_NAME_MIN_LEN to avoid truncation. Can be NULL. If multiple subscribers are allowed, this will be the name of the first subscriber, but CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED will not be returned.

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

Size of the data structure. CUPTI client should set the size of the structure. It will be used in CUPTI to check what fields are available in the structure. Used to preserve backward compatibility.

Type:

int

subscriber_name#

Name given to the subscriber. The subscriber name need not include the “CUPTI” prefix, as the CUPTI library automatically adds it as “CUPTI for <subscriberName>”. Can be NULL. An internal copy is created. Size must not exceed CUPTI_SUBSCRIBER_NAME_MAX_LEN to avoid truncation.

Type:

str

PM Sampling API#

Functions#

cupti.cupti.pm_sampling_counter_data_get_sample_info(p_params: int)#

Get the sample info (start and end time stamp) for the given sample index. Each sample is distinguished by the start and end time stamp.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_CounterData_GetSampleInfo_Params.

cupti.cupti.pm_sampling_counter_data_image_initialize(p_params: int)#

Initialize the counter data to CUPTI record format for storing the metric data.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_CounterDataImage_Initialize_Params.

cupti.cupti.pm_sampling_decode_data(p_params: int)#

Decode the metrics data stored in the hardware buffer to the counter data image.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_DecodeData_Params.

cupti.cupti.pm_sampling_disable(p_params: int)#

Disable PM sampling on the CUDA device and destroy the PM sampling object.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_Disable_Params.

cupti.cupti.pm_sampling_enable(p_params: int)#

Create a PM sampling object and enable PM sampling on the CUDA device.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_Enable_Params.

cupti.cupti.pm_sampling_get_counter_availability(p_params: int)#

Query counter availability information in a buffer which can be used to filter unavailable raw metrics on host. Note: This API may fail if any profiling or sampling session is active on the specified device.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_GetCounterAvailability_Params.

cupti.cupti.pm_sampling_get_counter_data_info(p_params: int)#

Get the counter data info like number of samples, number of populated samples and number of completed samples in a counter data image.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_GetCounterDataInfo_Params.

cupti.cupti.pm_sampling_get_counter_data_size(p_params: int)#

Query the size of the counter data image which will be used to store the metrics data. The user needs to allocate the memory for the counter data image based on the size returned by this API.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_GetCounterDataSize_Params.
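Because the pointer fields in these bindings are plain Python ints, one way to provide a caller-allocated image is to allocate a `ctypes` byte array and pass its address. The following is a minimal sketch under that assumption. `alloc_image` is a hypothetical helper, and the size 4096 is a placeholder for the value that would be read from `counter_data_size` after the size query.

```python
import ctypes


def alloc_image(size):
    """Allocate a zero-filled byte buffer and return (buffer, address)."""
    buf = (ctypes.c_uint8 * size)()
    return buf, ctypes.addressof(buf)


# 4096 is a placeholder; the real size would be read from
# PmSampling_GetCounterDataSize_Params.counter_data_size after the call.
counter_data, counter_data_addr = alloc_image(4096)
```

The buffer object must stay referenced for as long as CUPTI may read or write the address; if `counter_data` is garbage collected, `counter_data_addr` becomes a dangling pointer.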

cupti.cupti.pm_sampling_set_config(p_params: int)#

Set the configuration for PM sampling like sampling interval, maximum number of samples filled in HW buffer, trigger mode and the config image which has scheduling info for metric collection.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_SetConfig_Params.

cupti.cupti.pm_sampling_start(p_params: int)#

Start the PM sampling. The GPU will start collecting the metrics data periodically based on trigger type and sampling interval passed in CUpti_PmSampling_SetConfig_Params. The collected data will be stored in the hardware buffer.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_Start_Params.

cupti.cupti.pm_sampling_stop(p_params: int)#

Stop the PM sampling. The GPU will stop collecting the metrics data.

Parameters:

p_params (intptr_t) – A pointer to CUpti_PmSampling_Stop_Params.

Enums#

cupti.cupti.PmSampling_DecodeStopReason#

alias of DecodeStopReason

cupti.cupti.PmSampling_HardwareBuffer_AppendMode#

alias of HardwareBuffer_AppendMode

cupti.cupti.PmSampling_TriggerMode#

alias of TriggerMode

Classes#

class cupti.cupti.PmSampling_CounterDataImage_Initialize_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_CounterDataImage_Initialize_Params.

counter_data_size#

[in] Size of the counter data image.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_CounterDataImage_Initialize_Params instance with the memory from the given buffer.

p_counter_data#

[in] Counter data image.

Type:

int

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_CounterData_GetSampleInfo_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_CounterData_GetSampleInfo_Params.

counter_data_image_size#

[in] Size of the counter data image.

Type:

int

end_timestamp#

[out] End time of the sample.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_CounterData_GetSampleInfo_Params instance with the memory from the given buffer.

p_counter_data_image#

[in] Counter data image.

Type:

int

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

sample_index#

[in] Index of the sample.

Type:

int

start_timestamp#

[out] Start time of the sample.

Type:

int

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_DecodeData_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_DecodeData_Params.

counter_data_image_size#

[in] Size of the counter data image.

Type:

int

decode_stop_reason#

[out] decode stop reason

Type:

int

static from_buffer(buffer)#

Create a PmSampling_DecodeData_Params instance with the memory from the given buffer.

overflow#

[out] Overflow status for the hardware buffer. To avoid overflow, either increase the maxSamples value in CUpti_PmSampling_SetConfig_Params or reduce the sampling interval.

Type:

int

p_counter_data_image#

[in] Counter data image.

Type:

int

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_Disable_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_Disable_Params.

static from_buffer(buffer)#

Create a PmSampling_Disable_Params instance with the memory from the given buffer.

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_Enable_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_Enable_Params.

device_index#

[in] Device index.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_Enable_Params instance with the memory from the given buffer.

p_pm_sampling_object#

[out] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_GetCounterAvailability_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_GetCounterAvailability_Params.

counter_availability_image_size#

[inout] Size of the counter availability image. When pCounterAvailabilityImage is NULL, this field is used to return the size of the counter availability image.

Type:

int

device_index#

[in] Device index.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_GetCounterAvailability_Params instance with the memory from the given buffer.

p_counter_availability_image#

[out] Counter availability image.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int
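The `[inout]` size field above implies a two-call pattern: call once with a NULL image to learn the size, allocate, then call again to fill the buffer. The sketch below shows that control flow only. `query_then_fill` is a hypothetical helper, and `fake_get_counter_availability` together with the `SimpleNamespace` object are stand-ins so the example runs without a GPU; they are not the real binding or params class.

```python
import ctypes
from types import SimpleNamespace


def query_then_fill(call, params):
    """Run the size-query call, allocate, then run the retrieval call."""
    params.p_counter_availability_image = 0  # NULL -> size query
    call(params)
    size = params.counter_availability_image_size
    image = (ctypes.c_uint8 * size)()
    params.p_counter_availability_image = ctypes.addressof(image)
    call(params)  # second call fills the buffer
    return image


# Hypothetical stand-in for the binding so the sketch runs without a GPU.
def fake_get_counter_availability(params):
    if params.p_counter_availability_image == 0:
        params.counter_availability_image_size = 8
    else:
        ctypes.memset(params.p_counter_availability_image, 0xAB,
                      params.counter_availability_image_size)


demo_params = SimpleNamespace(p_counter_availability_image=0,
                              counter_availability_image_size=0)
image = query_then_fill(fake_get_counter_availability, demo_params)
```

With the real bindings, the function takes a pointer rather than the object, so `call` would presumably wrap something like passing the params object's `ptr` to `pm_sampling_get_counter_availability`, after setting `struct_size`, `device_index`, and `p_priv`.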

class cupti.cupti.PmSampling_GetCounterDataInfo_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_GetCounterDataInfo_Params.

counter_data_image_size#

[in] Size of the counter data image.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_GetCounterDataInfo_Params instance with the memory from the given buffer.

num_completed_samples#

[out] Number of samples that have been completed.

Type:

int

num_populated_samples#

[out] Number of populated samples.

Type:

int

num_total_samples#

[out] Number of samples in the counter data image.

Type:

int

p_counter_data_image#

[in] Counter data image.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_GetCounterDataSize_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_GetCounterDataSize_Params.

counter_data_size#

[out] Size of the counter data image.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_GetCounterDataSize_Params instance with the memory from the given buffer.

max_samples#

[in] Maximum number of samples to be stored in the counter data image.

Type:

int

num_metrics#

[in] Number of metrics to be collected.

Type:

int

p_metric_names#

[in] Names of the metrics to be collected.

Type:

str

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_SetConfig_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_SetConfig_Params.

config_size#

[in] Size of the config image.

Type:

int

static from_buffer(buffer)#

Create a PmSampling_SetConfig_Params instance with the memory from the given buffer.

hardware_buffer_size#

[in] The hardware buffer size in which raw PM sampling data will be stored. These samples will be decoded to counter data image with cuptiPmSamplingDecodeData call.

Type:

int

hw_buffer_append_mode#

[in] Append mode for the records in the hardware buffer. For KEEP_OLDEST mode, all the records are kept in the buffer, and if the hardware buffer fills up, overflow will be set to 1 in CUpti_PmSampling_DecodeData_Params. For KEEP_LATEST mode, new records overwrite the oldest records in the buffer when the buffer is full.

Type:

int
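The difference between the two append modes can be illustrated with a toy model of a fixed-capacity buffer. This is only a conceptual sketch, not a description of how the hardware buffer works. `ToyHardwareBuffer` is a hypothetical class, and two behaviours are assumptions made for the illustration: in keep-oldest mode a record arriving at a full buffer is dropped, and in keep-latest mode the overflow flag is never set.

```python
from collections import deque


class ToyHardwareBuffer:
    """Toy model of a fixed-capacity record buffer with two append modes."""

    def __init__(self, capacity, keep_latest):
        self.capacity = capacity
        self.keep_latest = keep_latest
        self.overflow = 0
        # deque(maxlen=...) silently discards the oldest item when full.
        self.records = deque(maxlen=capacity if keep_latest else None)

    def append(self, record):
        if not self.keep_latest and len(self.records) >= self.capacity:
            # Assumption: keep-oldest drops the new record and flags overflow.
            self.overflow = 1
            return
        self.records.append(record)


oldest = ToyHardwareBuffer(3, keep_latest=False)
latest = ToyHardwareBuffer(3, keep_latest=True)
for i in range(5):
    oldest.append(i)
    latest.append(i)
```

After five appends into a capacity of three, the keep-oldest buffer holds the first three records with `overflow` set, while the keep-latest buffer holds the last three.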

p_config#

[in] Config image.

Type:

int

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

sampling_interval#

[in] For the trigger mode `CUPTI_PM_SAMPLING_TRIGGER_MODE_GPU_SYSCLK_INTERVAL`, sampling interval is the number of sys clock cycles. For the trigger mode `CUPTI_PM_SAMPLING_TRIGGER_MODE_GPU_TIME_INTERVAL`, sampling interval is in nanoseconds.

Type:

int
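Since the unit of `sampling_interval` depends on the trigger mode, a small conversion from a desired sampling rate can help avoid unit mistakes. The helpers below are a hypothetical sketch; `sysclk_hz` is an assumed input that would have to be obtained elsewhere, as it is not provided by the parameters on this page.

```python
def interval_ns_for_rate(samples_per_second):
    """Sampling interval in nanoseconds for the GPU time-interval trigger mode."""
    return round(1_000_000_000 / samples_per_second)


def interval_cycles_for_rate(samples_per_second, sysclk_hz):
    """Sampling interval in sys clock cycles for the sysclk-interval trigger mode."""
    return round(sysclk_hz / samples_per_second)
```

For example, a target of 10,000 samples per second corresponds to an interval of 100,000 ns in time-interval mode, or 150,000 cycles in sysclk mode on an assumed 1.5 GHz sys clock.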

struct_size#

[in] Size of the data structure.

Type:

int

trigger_mode#

[in] Trigger mode. Note: CUPTI_PM_SAMPLING_TRIGGER_MODE_GPU_TIME_INTERVAL is not supported in Turing and GA100. Supported from GA10x onwards.

Type:

int

class cupti.cupti.PmSampling_Start_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_Start_Params.

static from_buffer(buffer)#

Create a PmSampling_Start_Params instance with the memory from the given buffer.

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.PmSampling_Stop_Params#

Bases: object

Empty-initialize an instance of CUpti_PmSampling_Stop_Params.

static from_buffer(buffer)#

Create a PmSampling_Stop_Params instance with the memory from the given buffer.

p_pm_sampling_object#

[in] PM sampling object.

Type:

int

p_priv#

[in] Set to NULL.

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

Profiler Host API#

Functions#

cupti.cupti.profiler_host_config_add_metrics(p_params: int)#

Add the metrics to the profiler host object for generating the config image. The config image will have the required information to schedule the metrics for collecting the profiling data.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_ConfigAddMetrics_Params.

cupti.cupti.profiler_host_deinitialize(p_params: int)#

Deinitialize and destroy the profiler host object (CUpti_Profiler_Host_Object).

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_Deinitialize_Params.

cupti.cupti.profiler_host_evaluate_to_gpu_values(p_params: int)#

Evaluate the metric values for the range index stored in the counter data.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_EvaluateToGpuValues_Params.

cupti.cupti.profiler_host_get_base_metrics(p_params: int)#

Get the list of supported base metrics for the chip.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetBaseMetrics_Params.

cupti.cupti.profiler_host_get_config_image(p_params: int)#

Get the config image for the metrics added to the profiler host object. User will pass the allocated buffer to store the config image.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetConfigImage_Params.

cupti.cupti.profiler_host_get_config_image_size(p_params: int)#

Get the size of the config image for the metrics added to the profiler host object. Users need to allocate the buffer for storing the config image.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetConfigImageSize_Params.

cupti.cupti.profiler_host_get_max_num_hardware_metrics_per_pass(p_params: int)#

Get the maximum number of hardware metrics (metric names that do not include the sass keyword) that can be scheduled in a single pass for a chip. While this represents a theoretical upper limit, practical constraints may prevent reaching this threshold for a specific set of metrics. Furthermore, the maximum achievable value is contingent upon the characteristics and architecture of the chip in question.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetMaxNumHardwareMetricsPerPass_Params.
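Because the per-pass maximum is a theoretical upper limit, it can only give a lower bound on the number of passes for a set of hardware metrics. The sketch below computes that bound; `min_passes` is a hypothetical helper, and the authoritative count for a config image comes from `profiler_host_get_num_of_passes`.

```python
import math


def min_passes(num_hardware_metrics, max_metrics_per_pass):
    """Lower bound on the number of passes; the real count may be higher."""
    if max_metrics_per_pass <= 0:
        raise ValueError("max_metrics_per_pass must be positive")
    return math.ceil(num_hardware_metrics / max_metrics_per_pass)
```

For instance, 10 hardware metrics with a per-pass maximum of 4 need at least 3 passes.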

cupti.cupti.profiler_host_get_metric_properties(p_params: int)#

Get the properties of the metric.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetMetricProperties_Params.

cupti.cupti.profiler_host_get_metrics_in_single_pass_set(p_params: int)#

Get all the metrics defined in the single pass metric set. Profiling data for the metrics in a single pass metric set can be collected in a single pass.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetMetricsInSinglePassSet_Params.

cupti.cupti.profiler_host_get_num_of_passes(p_params: int)#

Get the number of passes required for profiling the scheduled metrics in the config image.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetNumOfPasses_Params.

cupti.cupti.profiler_host_get_range_name(p_params: int)#

Get the range name for the range index stored in the counter data. In the Range Profiler, for auto range mode the range name is a numeric value assigned to the kernel based on execution order. For user range mode, the range name is the one provided by the user through the Push range API.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetRangeName_Params.

cupti.cupti.profiler_host_get_single_pass_sets(p_params: int)#

Get the single pass metric sets defined in the metric config file. Profiling data for the metrics in a single pass metric set can be collected in a single pass.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetSinglePassSets_Params. When ppSinglePassSets is NULL, the function returns the number of single pass metric sets; the caller then needs to allocate the buffer for the single pass metric sets (i.e. ppSinglePassSets) based on that number.

cupti.cupti.profiler_host_get_sub_metrics(p_params: int)#

Get the list of supported sub-metrics for the metric.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetSubMetrics_Params.

cupti.cupti.profiler_host_get_supported_chips(p_params: int)#

Get the list of supported chips.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_GetSupportedChips_Params.

cupti.cupti.profiler_host_initialize(p_params: int)#

Create and initialize the profiler host object (CUpti_Profiler_Host_Object).

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Host_Initialize_Params.

Enums#

class cupti.cupti.MetricCollectionScope(value)#

Bases: IntEnum

See CUpti_MetricCollectionScope.

CONTEXT = 0#
DEVICE = 1#
INVALID = 2#

class cupti.cupti.MetricType(value)#

Bases: IntEnum

See CUpti_MetricType.

COUNTER = 0#
RATIO = 1#
THROUGHPUT = 2#

class cupti.cupti.MetricValueKind(value)#

Bases: IntEnum

Kinds of metric values. Metric values can be one of several different kinds. Corresponding to each kind is a member of the CUpti_MetricValue union. The metric value returned by cuptiMetricGetValue should be accessed using the appropriate member of that union based on its value kind.

See CUpti_MetricValueKind.

DOUBLE = 0#
FORCE_INT = 2147483647#
INT64 = 4#
NVTX_EXTENDED_PAYLOAD = 6#
PERCENT = 2#
THROUGHPUT = 3#
UINT64 = 1#
UTILIZATION_LEVEL = 5#

class cupti.cupti.MetricValueUtilizationLevel(value)#

Bases: IntEnum

Enumeration of utilization levels for metrics values of kind CUPTI_METRIC_VALUE_KIND_UTILIZATION_LEVEL. Utilization values can vary from IDLE (0) to MAX (10) but the enumeration only provides specific names for a few values.

See CUpti_MetricValueUtilizationLevel.

FORCE_INT = 2147483647#
HIGH = 8#
IDLE = 0#
LOW = 2#
MAX = 10#
MID = 5#
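Since utilization values range from 0 to 10 but only a few have names, converting an arbitrary raw value with the enum constructor raises `ValueError` for the unnamed ones. The sketch below shows one way to handle that. `UtilizationLevel` is a local mirror of the values listed above (without `FORCE_INT`) so the example runs without cupti, and `describe_utilization` is a hypothetical helper.

```python
from enum import IntEnum


# Local mirror of the values listed above, so the sketch runs without cupti.
class UtilizationLevel(IntEnum):
    IDLE = 0
    LOW = 2
    MID = 5
    HIGH = 8
    MAX = 10


def describe_utilization(raw):
    """Return the exact member name if one exists, else the nearest one below."""
    if not 0 <= raw <= 10:
        raise ValueError(f"utilization level out of range: {raw}")
    try:
        return UtilizationLevel(raw).name
    except ValueError:
        # No exact name: report the closest named level below plus the offset.
        floor = max(m for m in UtilizationLevel if m <= raw)
        return f"{floor.name}+{raw - floor}"
```

A raw value of 8 maps directly to `HIGH`, while 7 has no name and is reported relative to `MID`.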

Classes#

class cupti.cupti.Profiler_Host_ConfigAddMetrics_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_ConfigAddMetrics_Params.

static from_buffer(buffer)#

Create a Profiler_Host_ConfigAddMetrics_Params instance with the memory from the given buffer.

num_metrics#

[in] number of metrics

Type:

int

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

pp_metric_names#

[in] metric names for which config image will be generated

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_Deinitialize_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_Deinitialize_Params.

static from_buffer(buffer)#

Create a Profiler_Host_Deinitialize_Params instance with the memory from the given buffer.

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure. CUPTI client should set the size of the structure. It will be used in CUPTI to check what fields are available in the structure. Used to preserve backward compatibility.

Type:

int

class cupti.cupti.Profiler_Host_EvaluateToGpuValues_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_EvaluateToGpuValues_Params.

counter_data_image_size#

[in] size of counter data image

Type:

int

static from_buffer(buffer)#

Create a Profiler_Host_EvaluateToGpuValues_Params instance with the memory from the given buffer.

num_metrics#

[in] number of metrics

Type:

int

p_counter_data_image#

[in] the counter data image where profiling data has been decoded

Type:

int

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_metric_values#

[out] output value for given metric and range index

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

pp_metric_names#

[in] the metrics for which GPU values will be evaluated for the range

Type:

str

ptr#

Get the pointer address to the data as Python int.

range_index#

[in] range index for which the metric values will be evaluated

Type:

int

struct_size#

[in] Size of the data structure.

Type:

int
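The output field `p_metric_values` is an int address, so the caller presumably supplies a buffer for the results. The sketch below assumes it points to an array of `num_metrics` C doubles, as the C struct declares a `double*`, and allocates that array with `ctypes`. `alloc_metric_values` is a hypothetical helper and the metric names are placeholders.

```python
import ctypes


def alloc_metric_values(num_metrics):
    """Allocate a double array and return (array, address) for p_metric_values."""
    values = (ctypes.c_double * num_metrics)()
    return values, ctypes.addressof(values)


metric_names = ["metric_a", "metric_b", "metric_c"]  # placeholder names
values, values_addr = alloc_metric_values(len(metric_names))

# After profiler_host_evaluate_to_gpu_values returns, results would be read
# positionally; here the array still holds its zero initial values.
results = dict(zip(metric_names, values))
```

Pairing names and values by position relies on the output order matching the order of the names passed in `pp_metric_names`, which is an assumption to verify against the CUPTI C documentation.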

class cupti.cupti.Profiler_Host_GetBaseMetrics_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetBaseMetrics_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetBaseMetrics_Params instance with the memory from the given buffer.

metric_type#

[in] metric type (counter, ratio, throughput)

Type:

int

num_metrics#

[out] number of metrics

Type:

int

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

pp_metric_names#

[out] list of base metrics supported of queried metric type for the chip

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetConfigImageSize_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetConfigImageSize_Params.

config_image_size#

[out] the size of the config image; users need to allocate a buffer of this size for storing the config image

Type:

int

static from_buffer(buffer)#

Create a Profiler_Host_GetConfigImageSize_Params instance with the memory from the given buffer.

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetConfigImage_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetConfigImage_Params.

config_image_size#

[in] Number of bytes allocated for pConfigImage

Type:

int

static from_buffer(buffer)#

Create a Profiler_Host_GetConfigImage_Params instance with the memory from the given buffer.

p_config_image#

[out] Buffer receiving the config image

Type:

int

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetMaxNumHardwareMetricsPerPass_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetMaxNumHardwareMetricsPerPass_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetMaxNumHardwareMetricsPerPass_Params instance with the memory from the given buffer.

max_metrics_per_pass#

[out] maximum number of metrics that can be scheduled in a pass

Type:

int

p_chip_name#

[in] chip name; accepted for chips supported at the time of release.

Type:

str

p_counter_availability_image#

[in] buffer with counter availability image - required for future chip support

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

profiler_type#

[in] the profiler kind, one of CUpti_ProfilerType

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetMetricProperties_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetMetricProperties_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetMetricProperties_Params instance with the memory from the given buffer.

metric_collection_scope#

[out] the metric collection scope (context, device)

Type:

int

metric_type#

[out] the metric type (counter, ratio or throughput)

Type:

int

p_description#

[out] a short description about the metric

Type:

str

p_dim_unit#

[out] the dimension of the metric values

Type:

str

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_hw_unit#

[out] associated hw unit for the metric

Type:

str

p_metric_name#

[in] metric name whose properties will be listed. The metric name can be given with or without an extension (rollup or submetric)

Type:

str

p_priv#

[in] Assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetMetricsInSinglePassSet_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetMetricsInSinglePassSet_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetMetricsInSinglePassSet_Params instance with the memory from the given buffer.

metrics_buffer_size#

[inout] In query mode, this is returned as the buffer size needed for the metrics in the single pass metric set. In data retrieval mode, this must be set to the value returned in query mode.

Type:

int

num_of_metrics_in_single_pass_set#

[inout] In query mode, this is returned as the number of metrics in the single pass metric set. In data retrieval mode, this must be set to the value returned in query mode.

Type:

int

p_chip_name#

[in] the chip name for which the metrics in the single pass metric set will be queried

Type:

str

p_metrics_buffer#

[out] When set to NULL, the call runs in query mode; otherwise it runs in data retrieval mode. The user needs to allocate the buffer for the metrics in the single pass metric set. The buffer size should be metricsBufferSize bytes.

Type:

int

p_metrics_indices_buffer#

[out] When set to NULL, the call runs in query mode; otherwise it runs in data retrieval mode. The user needs to allocate the buffer for the metrics indices in the single pass metric set. The buffer size should be numOfMetricsInSinglePassSet * sizeof(size_t).

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

p_single_pass_set_name#

[in] the single pass metric set name for which the metrics will be queried

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int
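The member descriptions above imply a two-call sequence: a query call with NULL buffers that reports sizes, then a retrieval call with buffers of those sizes. A minimal sketch of that sequence follows. The helper name fetch_single_pass_metrics and its `call` argument are hypothetical; `call` stands for whichever binding accepts the address of this params struct, which is not named in this section. The layout of the returned metrics buffer is also not specified here, so the sketch hands back the raw buffers.

```python
import ctypes


def fetch_single_pass_metrics(call, params):
    """Run the query-then-retrieve sequence described for the params fields.

    `call` is a hypothetical callable taking the params address (an int).
    `params` must already have its other input fields filled in.
    """
    # Query mode: both buffer pointers are NULL (0), so the call is expected
    # to fill in only the size fields.
    params.p_metrics_buffer = 0
    params.p_metrics_indices_buffer = 0
    call(params.ptr)

    # Allocate buffers of the sizes reported by the query.
    names = ctypes.create_string_buffer(params.metrics_buffer_size)
    indices = (ctypes.c_size_t * params.num_of_metrics_in_single_pass_set)()

    # Data retrieval mode: same struct, now with non-NULL buffers. The size
    # fields keep the values returned by the query.
    params.p_metrics_buffer = ctypes.addressof(names)
    params.p_metrics_indices_buffer = ctypes.addressof(indices)
    call(params.ptr)

    # Return the buffers so they stay alive while the caller reads them.
    return names, indices
```

Reusing one params object for both calls keeps the size fields set to the query results, which is what the data retrieval mode requires.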

class cupti.cupti.Profiler_Host_GetNumOfPasses_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetNumOfPasses_Params.

config_image_size#

[in] Number of bytes allocated for pConfigImage

Type:

int

static from_buffer(buffer)#

Create a Profiler_Host_GetNumOfPasses_Params instance with the memory from the given buffer.

num_of_passes#

[out] number of passes required for profiling scheduled metrics in the config image

Type:

int

p_config_image#

[in] the config image buffer

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetRangeName_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetRangeName_Params.

counter_data_image_size#

[in] size of counter data image

Type:

int

delimiter#

[in] used in case of nested ranges, default="/". Range1<delimiter>Range2

Type:

str

static from_buffer(buffer)#

Create a Profiler_Host_GetRangeName_Params instance with the memory from the given buffer.

p_counter_data_image#

[in] the counter data image where profiling data has been decoded

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

p_range_name#

[out] the range name. Note that CUPTI allocates the memory internally and it is the user's responsibility to free the allocated memory.

Type:

str

ptr#

Get the pointer address to the data as Python int.

range_index#

[in] range index for which the range name will be queried

Type:

int

struct_size#

[in] Size of the data structure.

Type:

int
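Because nested ranges are reported as Range1<delimiter>Range2, a returned range name can be broken back into its nesting levels by splitting on the same delimiter that was passed in. A small sketch, with the hypothetical helper name split_range_name:

```python
def split_range_name(range_name, delimiter="/"):
    """Split a nested range name into its levels, outermost first.

    Hypothetical helper. `delimiter` should match the value given in the
    `delimiter` member when the name was queried.
    """
    return range_name.split(delimiter)
```

This only works cleanly when the delimiter does not also occur inside an individual range name, so a delimiter that cannot appear in user-chosen names is the safer choice.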

class cupti.cupti.Profiler_Host_GetSinglePassSets_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetSinglePassSets_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetSinglePassSets_Params instance with the memory from the given buffer.

num_of_single_pass_sets#

[out] number of single pass metric sets

Type:

int

p_chip_name#

[in] the chip name for which the single pass metric sets will be queried

Type:

str

p_priv#

[in] Assign to NULL

Type:

int

pp_single_pass_sets#

[out] list of single pass metric sets.

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetSubMetrics_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetSubMetrics_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetSubMetrics_Params instance with the memory from the given buffer.

metric_type#

[in] the metric type for queried metric

Type:

int

num_of_submetrics#

[out] number of submetrics supported

Type:

int

p_host_object#

[in] reference to the profiler host object allocated by CUPTI in cuptiProfilerHostInitialize

Type:

int

p_metric_name#

[in] metric name for which submetrics will be listed. The metric name can be given with or without an extension (rollup or submetric)

Type:

str

p_priv#

[in] Assign to NULL

Type:

int

pp_sub_metrics#

[out] list of submetrics supported for the metric.

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_GetSupportedChips_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_GetSupportedChips_Params.

static from_buffer(buffer)#

Create a Profiler_Host_GetSupportedChips_Params instance with the memory from the given buffer.

num_chips#

[out] number of supported chips

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

pp_chip_names#

[out] list of supported chips

Type:

str

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure.

Type:

int

class cupti.cupti.Profiler_Host_Initialize_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Host_Initialize_Params.

static from_buffer(buffer)#

Create a Profiler_Host_Initialize_Params instance with the memory from the given buffer.

p_chip_name#

[in] the chip name; accepted for chips supported at the time of release.

Type:

str

p_counter_availability_image#

[in] buffer with counter availability image - required for future chip support

Type:

int

p_host_object#

[out] binary blob allocated by CUPTI and operations associated with this object.

Type:

int

p_priv#

[in] Assign to NULL

Type:

int

p_single_pass_metric_set_name#

[in] the single pass metric set name. The single pass metric sets supported for a chip can be found using the cuptiProfilerHostGetSinglePassSets API. Only valid for PM sampling; this can be set to NULL for the range profiler.

Type:

str

profiler_type#

[in] the profiler kind, one of the CUpti_ProfilerType values (see ProfilerType)

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] Size of the data structure. CUPTI client should set the size of the structure. It will be used in CUPTI to check what fields are available in the structure. Used to preserve backward compatibility.

Type:

int

Profiling API#

Functions#

cupti.cupti.profiler_deinitialize(p_params: int)#

Deinitializes the profiler interface.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_DeInitialize_Params.

cupti.cupti.profiler_initialize(p_params: int)#

Initializes the profiler interface.

Parameters:

p_params (intptr_t) – A pointer to CUpti_Profiler_Initialize_Params.
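Since both functions take the address of a params struct, one way to pair them is a context manager that always deinitializes. The sketch below is an illustration, not part of the bindings: profiler_session is a hypothetical name, `bindings` stands for the cupti.cupti module (passed in so the sketch can be exercised without a GPU), and both params objects are assumed to be fully prepared by the caller, including struct_size and p_priv.

```python
from contextlib import contextmanager


@contextmanager
def profiler_session(bindings, init_params, deinit_params):
    """Initialize the profiler interface, then always deinitialize it.

    Hypothetical helper. `init_params` and `deinit_params` are instances of
    the corresponding *_Params classes, prepared by the caller.
    """
    # The functions expect the struct address, exposed as `.ptr`.
    bindings.profiler_initialize(init_params.ptr)
    try:
        yield
    finally:
        # Runs even if the body of the `with` block raises.
        bindings.profiler_deinitialize(deinit_params.ptr)
```

A caller would wrap the profiled region in `with profiler_session(cupti.cupti, init_params, deinit_params):`, assuming the module is imported under that name.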

Enums#

class cupti.cupti.ProfilerType(value)#

Bases: IntEnum

See CUpti_ProfilerType.

PM_SAMPLING = 1#
PROFILER_INVALID = 2#
RANGE_PROFILER = 0#

Classes#

class cupti.cupti.Profiler_DeInitialize_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_DeInitialize_Params.

static from_buffer(buffer)#

Create a Profiler_DeInitialize_Params instance with the memory from the given buffer.

p_priv#

[in] assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] CUpti_Profiler_DeInitialize_Params_STRUCT_SIZE

Type:

int

class cupti.cupti.Profiler_Initialize_Params#

Bases: object

Empty-initialize an instance of CUpti_Profiler_Initialize_Params.

static from_buffer(buffer)#

Create a Profiler_Initialize_Params instance with the memory from the given buffer.

p_priv#

[in] assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in] CUpti_Profiler_Initialize_Params_STRUCT_SIZE

Type:

int

Device query#

Functions#

cupti.cupti.device_get_chip_name(p_params: int)#

Classes#

class cupti.cupti.Device_GetChipName_Params#

Bases: object

Empty-initialize an instance of CUpti_Device_GetChipName_Params.

See also

CUpti_Device_GetChipName_Params

device_index#

[in]

Type:

int

static from_buffer(buffer)#

Create a Device_GetChipName_Params instance with the memory from the given buffer.

p_chip_name#

[out]

Type:

str

p_priv#

[in] assign to NULL

Type:

int

ptr#

Get the pointer address to the data as Python int.

struct_size#

[in]

Type:

int
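The [in] and [out] tags on the members suggest the usual flow: set the inputs, pass the struct address to the function, then read the outputs from the same object. A minimal sketch under that assumption follows. query_chip_name is a hypothetical helper, `bindings` stands for the cupti.cupti module, and `params` is assumed to be a Device_GetChipName_Params instance whose struct_size and p_priv the caller has already set.

```python
def query_chip_name(bindings, params, device_index):
    """Fill in the device index, run the query, and return the chip name.

    Hypothetical helper. `bindings` is the module exposing
    device_get_chip_name; `params` is a prepared params instance.
    """
    # [in] member: which device to ask about.
    params.device_index = device_index
    # The function takes the struct address, exposed as `.ptr`.
    bindings.device_get_chip_name(params.ptr)
    # [out] member: read back from the same params object.
    return params.p_chip_name
```

The returned string is the kind of value the p_chip_name inputs of the Profiler Host params classes expect.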

Result Codes and Exceptions#

class cupti.cupti.Result(value)#

Bases: IntEnum

CUPTI result codes. Error and result codes returned by CUPTI functions.

See CUptiResult.

ERROR_API_NOT_IMPLEMENTED = 11#
ERROR_CDP_TRACING_NOT_SUPPORTED = 32#
ERROR_CMP_DEVICE_NOT_SUPPORTED = 42#
ERROR_CONFIDENTIAL_COMPUTING_NOT_SUPPORTED = 41#
ERROR_CUDA_COMPILER_NOT_COMPATIBLE = 34#
ERROR_DISABLED = 23#
ERROR_FORCE_INT = 2147483647#
ERROR_HARDWARE = 9#
ERROR_HARDWARE_BUSY = 26#
ERROR_HES_TRACE_NOT_SUPPORTED_ON_MPS = 47#
ERROR_INSUFFICIENT_PRIVILEGES = 35#
ERROR_INVALID_CHIP_NAME = 46#
ERROR_INVALID_CONTEXT = 3#
ERROR_INVALID_DEVICE = 2#
ERROR_INVALID_EVENT_DOMAIN_ID = 4#
ERROR_INVALID_EVENT_ID = 5#
ERROR_INVALID_EVENT_NAME = 6#
ERROR_INVALID_EVENT_VALUE = 22#
ERROR_INVALID_HANDLE = 19#
ERROR_INVALID_KIND = 21#
ERROR_INVALID_METRIC_ID = 16#
ERROR_INVALID_METRIC_NAME = 17#
ERROR_INVALID_METRIC_VALUE = 25#
ERROR_INVALID_MODULE = 24#
ERROR_INVALID_OPERATION = 7#
ERROR_INVALID_PARAMETER = 1#
ERROR_INVALID_STREAM = 20#
ERROR_LEGACY_PROFILER_NOT_SUPPORTED = 38#
ERROR_MAX_LIMIT_REACHED = 12#
ERROR_MIG_DEVICE_NOT_SUPPORTED = 43#
ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED = 39#
ERROR_NOT_COMPATIBLE = 14#
ERROR_NOT_INITIALIZED = 15#
ERROR_NOT_READY = 13#
ERROR_NOT_SUPPORTED = 27#
ERROR_OLD_PROFILER_API_INITIALIZED = 36#
ERROR_OPENACC_UNDEFINED_ROUTINE = 37#
ERROR_OUT_OF_MEMORY = 8#
ERROR_PARAMETER_SIZE_NOT_SUFFICIENT = 10#
ERROR_QUEUE_EMPTY = 18#
ERROR_SLI_DEVICE_NOT_SUPPORTED = 44#
ERROR_UM_PROFILING_NOT_SUPPORTED = 28#
ERROR_UM_PROFILING_NOT_SUPPORTED_ON_DEVICE = 29#
ERROR_UM_PROFILING_NOT_SUPPORTED_ON_NON_P2P_DEVICES = 30#
ERROR_UM_PROFILING_NOT_SUPPORTED_WITH_MPS = 31#
ERROR_UNKNOWN = 999#
ERROR_VIRTUALIZED_DEVICE_INSUFFICIENT_PRIVILEGES = 40#
ERROR_VIRTUALIZED_DEVICE_NOT_SUPPORTED = 33#
ERROR_WSL_DEVICE_NOT_SUPPORTED = 45#
SUCCESS = 0#
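
Because Result is an IntEnum, a raw integer status can be turned into a readable name by calling the enum with the value. The sketch below uses a stand-in enum, ResultSubset, holding a few values copied from the listing above so that it runs without the bindings installed; real code would presumably use cupti.cupti.Result instead. The helper name status_name and the fallback string are illustrative choices.

```python
from enum import IntEnum


class ResultSubset(IntEnum):
    """Stand-in with a few values copied from the Result listing."""
    SUCCESS = 0
    ERROR_INVALID_PARAMETER = 1
    ERROR_NOT_INITIALIZED = 15
    ERROR_INSUFFICIENT_PRIVILEGES = 35
    ERROR_UNKNOWN = 999


def status_name(status):
    """Return the enum member name for an int status code.

    Values missing from the enum fall back to a generic label instead of
    raising ValueError.
    """
    try:
        return ResultSubset(status).name
    except ValueError:
        return "UNRECOGNIZED_STATUS_%d" % status
```

The fallback matters when the installed CUPTI library is newer than the enum definition and returns a code the enum does not know about.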

class cupti.cupti.cuptiError(status: int)#

Bases: Exception