CUPTI Python Samples
Once CUPTI Python is installed, the CUPTI samples are located in the site-packages/cupti-python-samples directory. You can find the location of your site-packages directory by running the following command:
$ python3 -m site
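You can also do the same lookup from Python. The following is a minimal sketch. It assumes the samples directory is named cupti-python-samples, as stated above, and that it sits directly inside one of the directories reported by site.getsitepackages() or site.getusersitepackages(). The helper find_cupti_samples_dir is hypothetical and is not shipped with CUPTI Python.

```python
import os
import site


def find_cupti_samples_dir(dirname="cupti-python-samples", bases=None):
    """Return the first <site-packages>/<dirname> directory that exists, else None.

    Hypothetical helper; the directory name is taken from the text above.
    """
    if bases is None:
        # Some older virtualenv layouts do not provide site.getsitepackages().
        bases = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
        # Packages installed with `pip install --user` live in the user site directory.
        bases.append(site.getusersitepackages())
    for base in bases:
        candidate = os.path.join(base, dirname)
        if os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_cupti_samples_dir() or "cupti-python-samples directory not found")
```

Passing an explicit bases list lets you search a non-standard install prefix.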
- Before running the samples, install the following:
  - numba
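To confirm that the prerequisite is visible to the interpreter you will run the samples with, you can use a check along these lines. It is only a sketch: it tests whether the module can be found, not whether Numba can reach a GPU. The names REQUIRED_MODULES and missing_modules are illustrative.

```python
import importlib.util

REQUIRED_MODULES = ["numba"]  # prerequisite listed above


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with: python3 -m pip install " + " ".join(missing))
    else:
        print("All prerequisites found.")
```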
Samples
The CuptiVectorAdd* samples contain simple code that performs element-by-element vector addition.
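The sample output shown later on this page reports vector_length 1048576, threads_per_block 128 and blocks_per_grid 8192. These values fit the usual ceiling-division rule for a kernel that assigns one thread to each element. The CPU-only sketch below illustrates that arithmetic and the element-by-element addition. The function names are hypothetical, and the code is not taken from the samples.

```python
def blocks_per_grid(vector_length, threads_per_block):
    """Smallest number of blocks so that every element gets one thread."""
    return (vector_length + threads_per_block - 1) // threads_per_block


def vector_add_reference(a, b):
    """CPU reference for c[i] = a[i] + b[i]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def simulate_kernel(a, b, threads_per_block):
    """Walk the 1-D (block, thread) grid; each 'thread' computes one element."""
    n = len(a)
    c = [0] * n
    for block_idx in range(blocks_per_grid(n, threads_per_block)):
        for thread_idx in range(threads_per_block):
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # bounds guard for a partially filled last block
                c[i] = a[i] + b[i]
    return c


if __name__ == "__main__":
    print(blocks_per_grid(1048576, 128))  # 8192, matching the sample output
    a = list(range(1000))
    b = [2 * x for x in a]
    print(simulate_kernel(a, b, 128) == vector_add_reference(a, b))  # True
```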
CuptiVectorAddNumba.py
CUPTI Python sample that shows how to use the CUPTI Activity APIs. This sample uses CUDA Python with Numba.
- Command line options:
  - --profile, -p
    Enable CUPTI-based profiling. Default: OFF
  - --output, -o OUTPUT_TYPE
    Select the profiler output format. OUTPUT_TYPE can be brief, detailed, or none. Default: brief
  - --help, -h
    Shows the usage.
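These two options map naturally onto Python's argparse. The sketch below mirrors the documented flags and defaults. The ProfOutput enum is named after the prof_output: ProfOutput.BRIEF line that the sample prints, but its definition here is an assumption and is not the sample's source.

```python
import argparse
from enum import Enum


class ProfOutput(Enum):
    # Hypothetical enum mirroring the documented OUTPUT_TYPE values.
    BRIEF = "brief"
    DETAILED = "detailed"
    NONE = "none"


def parse_args(argv=None):
    """Parse the documented --profile/-p and --output/-o options."""
    parser = argparse.ArgumentParser(description="Vector addition with optional CUPTI profiling")
    parser.add_argument("-p", "--profile", action="store_true",
                        help="enable CUPTI based profiling (default: off)")
    # type=ProfOutput turns "brief" into ProfOutput.BRIEF; bad values exit with status 2.
    parser.add_argument("-o", "--output", type=ProfOutput,
                        choices=list(ProfOutput), default=ProfOutput.BRIEF,
                        metavar="OUTPUT_TYPE",
                        help="brief, detailed, or none (default: brief)")
    return parser.parse_args(argv)
```

For example, parse_args(["-p", "-o", "detailed"]) returns profile=True and output=ProfOutput.DETAILED. Calling parse_args() with no argument reads sys.argv, as a script would.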
CuptiVectorAddNumbaCallback.py
CUPTI Python sample that shows how to use the CUPTI Callback APIs. This sample uses CUDA Python with Numba.
- Command line options:
  - --profile, -p
    Enable CUPTI-based profiling. Default: OFF
  - --output, -o OUTPUT_TYPE
    Select the profiler output format. OUTPUT_TYPE can be brief, detailed, or none. Default: brief
  - --help, -h
    Shows the usage.
CuptiVectorAddDrv.py
CUPTI Python sample that shows how to use the CUPTI Activity APIs. This sample uses the CUDA Python Driver APIs. It also shows how to use the CUDA profiler start and stop APIs to define the range of code to be profiled.
- Command line options:
  - --profile, -p
    Enable CUPTI-based profiling. Default: OFF
  - --define-profile-range, -r
    Include calls to the CUDA profiler start and stop APIs to define the range of code to be profiled. Default: OFF
  - --output, -o OUTPUT_TYPE
    Select the profiler output format. OUTPUT_TYPE can be brief, detailed, or none. Default: brief
  - --help, -h
    Shows the usage.
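The --define-profile-range option brackets the code of interest with the CUDA profiler start and stop calls. A context manager is one way to keep that pair balanced even when the profiled code raises an exception. In this sketch, the start and stop callables are injected so that it runs without a GPU. In a real program they would wrap cuProfilerStart and cuProfilerStop as exposed by CUDA Python, with their returned status checked. How you import those functions depends on your cuda-python version, so that wiring is left as an assumption. The helper profile_range is hypothetical and is not part of the sample.

```python
from contextlib import contextmanager


@contextmanager
def profile_range(start, stop):
    """Call start() on entry and stop() on exit, even if the body raises.

    Hypothetical helper: start/stop stand in for cuProfilerStart/cuProfilerStop.
    """
    start()
    try:
        yield
    finally:
        stop()


if __name__ == "__main__":
    calls = []
    with profile_range(lambda: calls.append("start"), lambda: calls.append("stop")):
        calls.append("vector_add")  # only this work falls inside the range
    print(calls)  # ['start', 'vector_add', 'stop']
```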
cupyprof.py
CUPTI Python sample that shows how to profile a CUDA Python application with the CUPTI Python APIs without modifying the application's code. It uses both the CUPTI Activity APIs and the Callback APIs. It also shows how to profile only a range of code in a CUDA Python application that calls the CUDA profiler start and stop APIs.
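A profiler can leave the target application untouched in the following way, which is an assumption and is not taken from cupyprof.py. It first sets up profiling, then runs the target script in the same interpreter with runpy, with sys.argv rewritten so that the script sees its own arguments. In the sketch below, enable and flush are hypothetical placeholders for the CUPTI setup and teardown.

```python
import runpy
import sys


def run_under_profiler(script_path, script_args, enable=lambda: None, flush=lambda: None):
    """Run script_path as __main__, unmodified, between enable() and flush().

    enable and flush are hypothetical hooks: a real profiler would enable
    CUPTI activities or callbacks in enable() and flush and print the
    collected records in flush().
    """
    saved_argv = sys.argv
    # The target script sees itself as argv[0] and its own arguments after it.
    sys.argv = [script_path] + list(script_args)
    enable()
    try:
        runpy.run_path(script_path, run_name="__main__")
    finally:
        flush()
        sys.argv = saved_argv
```

For example, run_under_profiler("./CuptiVectorAddDrv.py", ["--define-profile-range"]) runs the application as though it had been launched directly with that option.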
usage: cupyprof.py [-h] [-p {from_start|range}] [-a <activities>] [-o {brief|detailed|none}] <python_file_path> [args]
- Command line options:
  - --help, -h
    Shows the usage.
  - --profile, -p PROFILING_TYPE
    Enable profiling for the entire CUDA Python program, or only for the section between cuProfilerStart and cuProfilerStop. PROFILING_TYPE can be from_start or range. Default: from_start
  - --activity, -a <comma separated list of activities>
    Use --help to view the list of supported activities. To see which activities are enabled by default, see default_activity_choices in cupyprof.py.
  - --output, -o OUTPUT_TYPE
    Select the profiler output format. OUTPUT_TYPE can be brief, detailed, or none. Default: brief
python_file_path is the path to the CUDA Python application, and args are the arguments for that application.
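The usage line combines fixed options with a target script and its pass-through arguments. The following argparse sketch has that shape. It assumes that everything after python_file_path is forwarded unchanged (argparse.REMAINDER) and that --activity takes a comma-separated string. The real activity names and defaults come from cupyprof.py --help and default_activity_choices, and they are not reproduced here.

```python
import argparse


def split_activities(value):
    """Turn 'a,b, c' into ['a', 'b', 'c'] (comma separated, blanks dropped)."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_cupyprof_args(argv=None):
    """Hypothetical parser mirroring the documented cupyprof.py usage line."""
    parser = argparse.ArgumentParser(prog="cupyprof.py")
    parser.add_argument("-p", "--profile", choices=["from_start", "range"],
                        default="from_start", metavar="PROFILING_TYPE")
    # None here stands for "use the built-in default activity set".
    parser.add_argument("-a", "--activity", type=split_activities, default=None)
    parser.add_argument("-o", "--output", choices=["brief", "detailed", "none"],
                        default="brief", metavar="OUTPUT_TYPE")
    parser.add_argument("python_file_path", help="CUDA Python application to profile")
    # Everything after the script path is handed to the application untouched.
    parser.add_argument("args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)
```

With the command from the examples below, --define-profile-range ends up in args rather than being rejected as an unknown cupyprof.py option.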
Examples of running samples
Run the sample without profiling:
$ python3 CuptiVectorAddNumba.py
Run the sample with profiling enabled and the default output format:
$ python3 CuptiVectorAddNumba.py --profile
profiling_enabled: True
prof_output: ProfOutput.BRIEF
vector_length: 1048576
threads_per_block: 128
blocks_per_grid: 8192
Activity Kind Start Duration correlationId Name
DRIVER 1714136661470990409 1834876 1 cuCtxGetCurrent
DRIVER 1714136661472854473 213 2 cuDeviceGetCount
DRIVER 1714136661472869777 87 3 cuDeviceGet
DRIVER 1714136661472880942 566 4 cuDeviceGetAttribute
DRIVER 1714136661472883507 69 5 cuDeviceGetAttribute
DRIVER 1714136661472906825 3702 6 cuDeviceGetName
DRIVER 1714136661472969577 87 7 cuDeviceGetUuid_v2
DRIVER 1714136661472991812 140587104 8 cuDevicePrimaryCtxRetain
.
.
.
DRIVER 1714136661714686225 218 88 cuCtxGetCurrent
DRIVER 1714136661714688211 55 89 cuCtxGetDevice
DRIVER 1714136661714702981 2080 90 cuCtxSynchronize
verify_result: PASS
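To post-process the brief output, note that each record line has five whitespace-separated fields matching the header: activity kind, start timestamp, duration, correlation ID and name. The sketch below sums the durations for each name. It assumes the line format shown above and that timestamps and durations are in nanoseconds, which is the unit CUPTI uses for activity timestamps. The helper names are hypothetical.

```python
from collections import defaultdict


def parse_brief_records(lines):
    """Yield (kind, start_ns, duration_ns, correlation_id, name) tuples.

    Lines that do not split into five fields (the header, '.' elision lines,
    'verify_result: PASS', etc.) are skipped, as is any five-field line whose
    numeric columns are not all digits.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        kind, start, duration, corr_id, name = fields
        if not (start.isdigit() and duration.isdigit() and corr_id.isdigit()):
            continue
        yield kind, int(start), int(duration), int(corr_id), name


def total_duration_by_name(records):
    """Sum durations (ns) per API or activity name."""
    totals = defaultdict(int)
    for _kind, _start, duration, _corr_id, name in records:
        totals[name] += duration
    return dict(totals)


# A few lines copied from the brief output above.
SAMPLE = """\
Activity Kind Start Duration correlationId Name
DRIVER 1714136661472880942 566 4 cuDeviceGetAttribute
DRIVER 1714136661472883507 69 5 cuDeviceGetAttribute
.
DRIVER 1714136661714702981 2080 90 cuCtxSynchronize
verify_result: PASS
"""

if __name__ == "__main__":
    totals = total_duration_by_name(parse_brief_records(SAMPLE.splitlines()))
    for name, ns in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:24s} {ns:>10d} ns")
```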
Use the cupyprof.py sample to profile a CUDA Python application with a profiling range defined and with detailed output:
$ python3 cupyprof.py --profile range --output detailed ./CuptiVectorAddDrv.py --define-profile-range
profiling_enabled: False
prof_output: ProfOutput.BRIEF
profile_range: True
vector_length: 1048576
threads_per_block: 128
blocks_per_grid: 8192
MEMCPY "HTOD" [ 1726060107808115285, 1726060107808868469 ] duration 753184, size 4194304, src_kind 1, dst_kind 3, correlation_id 2
device_id 0, context_id 1, stream_id 13, graph_id 0, graph_node_id 0, channel_id 10, channel_type ASYNC_MEMCPY
.
.
.
CONCURRENT_KERNEL [ 1737454707775744135, 1737454707775763143 ] duration 19008, "vector_add", correlation_id 5, cache_config_requested 0, cache_config_executed 0
grid [8192, 1, 1], block [128, 1, 1], cluster [0, 0, 0], shared_memory (0, 0)
device_id 0, context_id 1, stream_id 13, graph_id 0, graph_node_id 0, channel_id 1, channel_type COMPUTE
.
.
.
MEMCPY "DTOH" [ 1737455038429091384, 1737455038429825494 ] duration 734110, size 4194304, src_kind 3, dst_kind 1, correlation_id 15
device_id 0, context_id 1, stream_id 13, graph_id 0, graph_node_id 0, channel_id 12, channel_type ASYNC_MEMCPY
verify_result: PASS
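In the detailed output, each record gives its interval as [ start, end ] followed by its duration, so duration = end - start. Dividing a MEMCPY record's size by its duration gives an effective transfer rate. For the HTOD record above, 1726060107808868469 - 1726060107808115285 = 753184 ns. The transfer is 4194304 bytes, which is 4 bytes for each of the 1048576 elements, and 4194304 / 753184 is about 5.57 bytes/ns, or roughly 5.57 GB/s. The sketch below performs that check. It assumes the record layout shown above, and the regular expression and function names are illustrative rather than part of cupyprof.py.

```python
import re

# First line of a detailed record, for example:
#   MEMCPY "HTOD" [ <start>, <end> ] duration <ns>, size <bytes>, ...
#   CONCURRENT_KERNEL [ <start>, <end> ] duration <ns>, "vector_add", ...
RECORD = re.compile(
    r"^(?P<kind>\w+)\s+.*?\[\s*(?P<start>\d+),\s*(?P<end>\d+)\s*\]"
    r"\s+duration\s+(?P<duration>\d+)"
    r"(?:,\s+size\s+(?P<size>\d+))?"
)


def parse_detailed_record(line):
    """Return a dict for a record's first line, or None for other lines."""
    m = RECORD.match(line)
    if m is None:
        return None
    return {
        "kind": m["kind"],
        "start": int(m["start"]),
        "end": int(m["end"]),
        "duration": int(m["duration"]),
        "size": int(m["size"]) if m["size"] is not None else None,
    }


def is_consistent(rec):
    """The printed duration should equal end - start."""
    return rec["end"] - rec["start"] == rec["duration"]


def bandwidth_gb_per_s(size_bytes, duration_ns):
    """Bytes per nanosecond equals GB/s when 1 GB = 1e9 bytes."""
    return size_bytes / duration_ns


if __name__ == "__main__":
    line = ('MEMCPY "HTOD" [ 1726060107808115285, 1726060107808868469 ] '
            "duration 753184, size 4194304, src_kind 1, dst_kind 3, correlation_id 2")
    rec = parse_detailed_record(line)
    print(is_consistent(rec))  # True
    print(f"{bandwidth_gb_per_s(rec['size'], rec['duration']):.2f} GB/s")  # 5.57 GB/s
```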