CUPTI

What's New

CUPTI contains a number of enhancements as part of the CUDA Toolkit 6.5 release.
  • Instruction classification is done for source-correlated Instruction Execution activity CUpti_ActivityInstructionExecution. See CUpti_ActivityInstructionClass for instruction classes.
  • Two new device attributes are added to the activity CUpti_DeviceAttribute:
    • CUPTI_DEVICE_ATTR_FLOP_SP_PER_CYCLE gives peak single precision flop per cycle for the GPU.
    • CUPTI_DEVICE_ATTR_FLOP_DP_PER_CYCLE gives peak double precision flop per cycle for the GPU.
  • Two new metric properties are added:
    • CUPTI_METRIC_PROPERTY_FLOP_SP_PER_CYCLE gives peak single precision flop per cycle for the GPU.
    • CUPTI_METRIC_PROPERTY_FLOP_DP_PER_CYCLE gives peak double precision flop per cycle for the GPU.
  • Activity record CUpti_ActivityGlobalAccess for source level global access information has been deprecated and replaced by new activity record CUpti_ActivityGlobalAccess2. New record additionally gives information needed to map SASS assembly instructions to CUDA C source code. And it also provides ideal L2 transactions count based on the access pattern.
  • Activity record CUpti_ActivityBranch for source level branch information has been deprecated and replaced by new activity record CUpti_ActivityBranch2. New record additionally gives information needed to map SASS assembly instructions to CUDA C source code.
  • Sample sass_source_map is added to demonstrate the mapping of SASS assembly instructions to CUDA C source code.
  • Default event collection mode is changed to Kernel (CUPTI_EVENT_COLLECTION_MODE_KERNEL) from Continuous (CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS). Also Continuous mode is now supported only on Tesla devices.
  • Profiling results might be inconsistent when auto boost is enabled. Profiler tries to disable auto boost by default, it might fail to do so in some conditions, but profiling will continue. A new API cuptiGetAutoBoostState is added to query the auto boost state of the device. This API returns error CUPTI_ERROR_NOT_SUPPORTED on devices that don't support auto boost. Note that auto boost is supported only on certain Tesla devices from the Kepler+ family.
  • Activity record CUpti_ActivityKernel2 for kernel execution has been deprecated and replaced by new activity record CUpti_ActivityKernel3. New record additionally gives information about Global Partitioned Cache Configuration requested and executed. The new fields apply for devices with 5.2 Compute Capability.

Table of Contents