NVIDIA Nsight Compute Collector
Collection utilities for profiling Nsight Python runs using NVIDIA Nsight Compute (ncu).
This module contains logic for launching NVIDIA Nsight Compute with appropriate settings. NCU is instructed to profile specific code sections marked by NVTX ranges - the Nsight Python annotations.
- class nsight.collection.ncu.NCUCollector(
- metric: str = 'gpu__time_duration.sum',
- ignore_kernel_list: Sequence[str] | None = None,
- combine_kernel_metrics: Callable[[float, float], float] | None = None,
- clock_control: Literal['base', 'none'] = 'none',
- cache_control: Literal['all', 'none'] = 'all',
- replay_mode: Literal['kernel', 'range'] = 'kernel',
Bases: NsightCollector
NCU collector for Nsight Python.
- Parameters:
metric (str) – Metric to collect from NVIDIA Nsight Compute. By default we collect kernel runtimes in nanoseconds. A list of supported metrics can be found with ncu --list-metrics.
ignore_kernel_list (Optional[Sequence[str]]) – List of kernel names to ignore. If you call a library within an annotation context, you might not have precise control over which kernels are launched, or how many. If some of these kernels should be excluded from the Nsight Python profile, their names can be listed here. Default: None
combine_kernel_metrics (Optional[Callable[[float, float], float]]) – By default, Nsight Python expects one kernel launch per annotation. If an annotated region launches multiple kernels, you can specify how to summarize the collected metrics into a single number instead of letting the profiling run fail. For example, to profile runtime and sum the times of all kernels, specify combine_kernel_metrics = lambda x, y: x + y. The function should take two arguments and return a single value. Default: None
clock_control (Literal['base', 'none']) – Selects the clock control option in NVIDIA Nsight Compute. If 'none', we launch ncu --clock-control none .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#clock-control Default: none
cache_control (Literal['all', 'none']) – Selects the cache control option in NVIDIA Nsight Compute. If 'none', we launch ncu --cache-control none .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#cache-control Default: all
replay_mode (Literal['kernel', 'range']) – Selects the replay mode option in NVIDIA Nsight Compute. With the default 'kernel', we launch ncu --replay-mode kernel .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#replay Default: kernel
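The interaction between ignore_kernel_list and combine_kernel_metrics can be illustrated with a small standalone sketch. It does not use Nsight Python itself: summarize_kernels and the kernel names are hypothetical, and the assumption that the two-argument function is applied as a left fold over the remaining kernels is ours, not something the documentation above states.

```python
from functools import reduce
from typing import Callable, Optional, Sequence


def summarize_kernels(
    measurements: Sequence[tuple[str, float]],
    ignore_kernel_list: Optional[Sequence[str]] = None,
    combine_kernel_metrics: Optional[Callable[[float, float], float]] = None,
) -> float:
    """Hypothetical helper: reduce per-kernel metric values to one number."""
    # Drop every kernel whose name appears in the ignore list.
    ignored = set(ignore_kernel_list or ())
    values = [value for name, value in measurements if name not in ignored]
    if not values:
        raise ValueError("no kernels left after filtering")
    # Without a combine function, exactly one kernel is expected.
    if combine_kernel_metrics is None:
        if len(values) > 1:
            raise ValueError("multiple kernels launched but no combine function given")
        return values[0]
    # Assumed behavior: fold the two-argument function over all values.
    return reduce(combine_kernel_metrics, values)


# Made-up (kernel name, runtime in ns) pairs for one annotated region.
measurements = [
    ("gemm_kernel", 1200.0),
    ("memset_kernel", 50.0),
    ("reduce_kernel", 300.0),
]

# Sum the runtimes while ignoring the memset kernel.
total = summarize_kernels(measurements, ["memset_kernel"], lambda x, y: x + y)

# Any two-argument callable works, for example max to keep the slowest kernel.
slowest = summarize_kernels(measurements, None, max)
```

Because the function only ever sees two values at a time, associative operations such as addition, min, or max are the safest choices; an average, for instance, cannot be expressed correctly as a pairwise fold without extra state.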
- collect()
Collects profiling data using NVIDIA Nsight Compute.
- nsight.collection.ncu.launch_ncu(
- report_path: str,
- name: str,
- metric: str,
- cache_control: Literal['none', 'all'],
- clock_control: Literal['none', 'base'],
- replay_mode: Literal['kernel', 'range'],
- verbose: bool,
Launch NVIDIA Nsight Compute to profile the current script with specified options.
- Parameters:
report_path (str) – Path to write the report file to.
metric (str) – Specific metric to collect.
cache_control (Literal['none', 'all']) – Select cache control option.
clock_control (Literal['none', 'base']) – Select clock control option.
replay_mode (Literal['kernel', 'range']) – Select replay mode option.
verbose (bool) – If False, the log is written to a file (ncu_log.txt).
name (str)
- Raises:
NCUNotAvailableError – If NCU is not available on the system.
SystemExit – If profiling fails due to an error from NVIDIA Nsight Compute.
- Return type:
- Returns:
Path to the NVIDIA Nsight Compute log file. As a side effect, produces an NVIDIA Nsight Compute report file with profiling data.
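To make the mapping from these parameters to the ncu command line more concrete, here is a minimal sketch of how such a command might be assembled. It is not the actual implementation of launch_ncu: build_ncu_command, ncu_available, and my_script.py are hypothetical names. The --clock-control, --cache-control, and --replay-mode flags are the ones mentioned in the parameter descriptions above; the use of --metrics and --export for the metric and report path is our assumption about how the function behaves.

```python
import shutil
from typing import Literal, Sequence


def ncu_available() -> bool:
    """Return True if an ncu executable is found on PATH."""
    return shutil.which("ncu") is not None


def build_ncu_command(
    report_path: str,
    metric: str,
    cache_control: Literal["none", "all"],
    clock_control: Literal["none", "base"],
    replay_mode: Literal["kernel", "range"],
    script_args: Sequence[str],
) -> list[str]:
    """Hypothetical helper: assemble an ncu invocation as an argument list."""
    return [
        "ncu",
        "--metrics", metric,                # assumed flag for the metric
        "--cache-control", cache_control,
        "--clock-control", clock_control,
        "--replay-mode", replay_mode,
        "--export", report_path,            # assumed flag for the report file
        *script_args,                       # the script to profile
    ]


cmd = build_ncu_command(
    report_path="report",
    metric="gpu__time_duration.sum",
    cache_control="all",
    clock_control="none",
    replay_mode="kernel",
    script_args=["python", "my_script.py"],  # placeholder script name
)
```

A caller could check ncu_available() before running the command, which roughly corresponds to the situation in which NCUNotAvailableError is raised. Building the command as a list rather than a single string avoids shell quoting issues when it is passed to subprocess.run.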