CUPTI

What's New

CUPTI contains the following changes as part of the CUDA Toolkit 9.2 release.
  • Added support for querying PCI device information, which can be used to construct the PCIe topology. See the activity kind CUPTI_ACTIVITY_KIND_PCIE and the related activity record CUpti_ActivityPcie.
  • Added a new set of metrics that collect the total data bytes transmitted and received through PCIe, for viewing and analyzing the bandwidth of memory transfers over PCIe topologies. These metrics give an accumulated count for all devices in the system, are collected at the device level for the entire application, and are available for devices with compute capability 5.2 and higher.
  • CUPTI added support for the following new metrics:
    • Instructions executed for different types of loads and stores
    • Total number of cached global/local load requests from SM to texture cache
    • Global atomic/non-atomic/reduction bytes written to L2 cache from texture cache
    • Surface atomic/non-atomic/reduction bytes written to L2 cache from texture cache
    • Hit rate at L2 cache for all requests from texture cache
    • Device memory (DRAM) read and write bytes
    • The utilization level of the multiprocessor function units that execute tensor core instructions for devices with compute capability 7.0
  • A new attribute, CUPTI_EVENT_ATTR_PROFILING_SCOPE, is added to the enum CUpti_EventAttribute to query the profiling scope of an event. The profiling scope indicates whether the event can be collected at the context level, the device level, or both. See the enum CUpti_EventProfilingScope for the available profiling scopes.
  • A new error code, CUPTI_ERROR_VIRTUALIZED_DEVICE_NOT_SUPPORTED, is added to indicate that tracing and profiling on a virtualized GPU are not supported.
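Because the PCIe transmit and receive metrics report accumulated byte counts for the entire application, turning them into a bandwidth figure is a matter of dividing by the elapsed time. The helper below is a hypothetical sketch: the metric names are not given here, so it simply takes the two byte totals as inputs, and it assumes elapsed_s spans the same application run the metrics cover.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average PCIe throughput in GB/s (1 GB = 1e9 bytes).
 * tx_bytes and rx_bytes are the totals reported by the PCIe transmit and
 * receive metrics. Since CUPTI accumulates them across all devices in the
 * system, the result is a system-wide figure, not a per-GPU one. */
double pcie_throughput_gb_per_s(uint64_t tx_bytes, uint64_t rx_bytes,
                                double elapsed_s) {
    assert(elapsed_s > 0.0); /* elapsed time must be positive */
    return (double)(tx_bytes + rx_bytes) / elapsed_s / 1e9;
}
```

For example, 1e9 bytes transmitted plus 1e9 bytes received over one second gives 2.0 GB/s of combined PCIe traffic.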
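In real code, the profiling scope of an event is read with cuptiEventGetAttribute(eventId, CUPTI_EVENT_ATTR_PROFILING_SCOPE, &size, &scope), where scope is a CUpti_EventProfilingScope. The sketch below shows one way a tool might act on that value when choosing between context-level and device-level collection. So that it compiles without the CUPTI headers, it uses a hypothetical local mirror of the enum; check cupti_events.h for the exact enumerators.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local mirror of CUpti_EventProfilingScope. In real code, use
 * the enum from cupti_events.h instead of this one. */
typedef enum {
    SCOPE_CONTEXT, /* event can be collected only at the context level */
    SCOPE_DEVICE,  /* event can be collected only at the device level */
    SCOPE_BOTH     /* event can be collected at either level */
} profiling_scope;

/* True if an event with this scope can be collected per context. */
bool collectable_per_context(profiling_scope s) {
    switch (s) {
    case SCOPE_CONTEXT:
    case SCOPE_BOTH:
        return true;
    case SCOPE_DEVICE:
        return false;
    }
    assert(!"unknown profiling scope");
    return false;
}

/* True if an event with this scope can be collected per device. */
bool collectable_per_device(profiling_scope s) {
    switch (s) {
    case SCOPE_DEVICE:
    case SCOPE_BOTH:
        return true;
    case SCOPE_CONTEXT:
        return false;
    }
    assert(!"unknown profiling scope");
    return false;
}
```

Using an exhaustive switch lets the compiler warn if a new scope value is added to the mirror enum but not handled.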
