Overview

The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides the following APIs: the Activity API, the Callback API, the Event API, the Metric API, the Profiling API, the PC Sampling API, the PM Sampling API, the SASS Metric API and the Checkpoint API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.

In this CUPTI document, Tracing refers to the collection of timestamps and additional information for CUDA activities, such as CUDA API calls, kernel launches and memory copies, during the execution of a CUDA application. Tracing helps identify performance issues in CUDA code by showing which parts of a program take the most time. Tracing information can be collected using the Activity and Callback APIs.
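As a rough illustration of how a tool might use trace data to find where time goes, the sketch below sums durations per activity name. It is a minimal post-processing example, not CUPTI code: the `TraceRecord` type, its fields, the `total_time_by_name` helper and the sample timestamps are all hypothetical and do not correspond to any CUPTI structure or function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical, simplified trace record (not a CUPTI structure)."""
    kind: str      # e.g. "kernel", "memcpy", "runtime_api"
    name: str      # activity name, e.g. a kernel or API function name
    start_ns: int  # start timestamp in nanoseconds
    end_ns: int    # end timestamp in nanoseconds


def total_time_by_name(records):
    """Sum the duration of every record, grouped by activity name."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec.name] += rec.end_ns - rec.start_ns
    # Sort so the most expensive activity comes first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
sample_trace = [
    TraceRecord("kernel", "vecAdd", 1_000, 4_000),
    TraceRecord("memcpy", "HtoD copy", 4_500, 9_500),
    TraceRecord("kernel", "vecAdd", 10_000, 12_500),
    TraceRecord("runtime_api", "cudaMalloc", 0, 800),
]

for name, total_ns in total_time_by_name(sample_trace):
    print(f"{name}: {total_ns} ns")
```

In a real tool the records would come from the Activity or Callback APIs rather than a hand-written list; only the aggregation step is sketched here.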

In this CUPTI document, Profiling refers to the collection of GPU performance metrics for a single kernel or a set of kernels in isolation. Profiling might involve multiple replays of the kernel(s) or of the entire application to collect GPU performance metrics. For Volta and earlier GPU architectures, these metrics can be collected using the CUPTI Event and Metric APIs. For Volta and later GPU architectures, the low-overhead CUPTI Profiling and Perfworks Metric APIs replace this functionality, and a new CUPTI PC Sampling API is supported.
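One common reason replays are needed is that only a limited number of metrics can be collected in a single pass. The sketch below shows the idea under a deliberately simplified assumption: every metric occupies exactly one slot and a fixed number of slots is available per pass. The `plan_replay_passes` helper, the placeholder metric names and the per-pass limit are hypothetical. The actual grouping of metrics into passes is decided by CUPTI and the hardware, and is more involved than this.

```python
def plan_replay_passes(metric_names, slots_per_pass):
    """Split metrics into groups that each fit in one replay pass.

    Simplifying assumption: each metric needs exactly one slot.
    """
    if slots_per_pass < 1:
        raise ValueError("slots_per_pass must be at least 1")
    return [
        list(metric_names[i:i + slots_per_pass])
        for i in range(0, len(metric_names), slots_per_pass)
    ]


# Placeholder metric names and a made-up limit of 2 metrics per pass.
requested = ["metric_a", "metric_b", "metric_c", "metric_d", "metric_e"]
passes = plan_replay_passes(requested, slots_per_pass=2)

for index, group in enumerate(passes, start=1):
    print(f"pass {index}: {', '.join(group)}")
```

Under this model, requesting more metrics than fit in one pass means the kernel or application runs once per group, which is why profiling can cost more than tracing.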

Table 1. Description of CUPTI APIs

| CUPTI API | Feature Description |
|---|---|
| Activity | Asynchronously record CUDA activities, e.g. CUDA API, Kernel, memory copy |
| Callback | CUDA event callback mechanism to notify subscriber that a specific CUDA event executed e.g. “Entering CUDA runtime memory copy” |
| Event | Collect kernel performance counters for a kernel execution |
| Metric | Collect kernel performance metrics for a kernel execution |
| Profiling | Collect performance metrics for a range of execution |
| PC Sampling | Collect continuous mode PC Sampling data without serializing kernel execution |
| PM Sampling | Collect hardware metrics by sampling the GPU performance monitors (PM) periodically at fixed intervals |
| SASS Metrics | Collect kernel performance metrics at the source level using SASS patching |
| Checkpoint | Provides support for automatically saving and restoring the functional state of the CUDA device |

CUPTI Profiling API vs. NVIDIA Nsight Perf SDK

The CUPTI Profiling API supports profiling of CUDA kernels. It allows collection of GPU performance metrics for a particular kernel or range of kernels at the CUDA context level. The NVIDIA Nsight Perf SDK supports graphics APIs (i.e. DirectX, Vulkan, OpenGL), allowing collection of GPU performance metrics at the graphics device, context and queue levels. The NVIDIA Nsight Perf SDK and the CUPTI Profiling API share the same host APIs (i.e. metrics enumeration, configuration and evaluation) but differ in which GPU APIs they target on the device.