NVIDIA Nsight Compute Collector

Collection utilities for profiling Nsight Python runs using NVIDIA Nsight Compute (ncu).

This module contains logic for launching NVIDIA Nsight Compute with appropriate settings. NCU is instructed to profile only the code sections marked by NVTX ranges, which correspond to the Nsight Python annotations.

class nsight.collection.ncu.NCUCollector( metrics: Sequence[str] = ['gpu__time_duration.sum'], ignore_kernel_list: Sequence[str] | None = None, combine_kernel_metrics: Callable[[float, float], float] | None = None, clock_control: Literal['base', 'none'] = 'none', cache_control: Literal['all', 'none'] = 'all', replay_mode: Literal['kernel', 'range'] = 'kernel', )

Bases: NsightCollector

NCU collector for Nsight Python.

Parameters:

metrics (Sequence[str]) – Metrics to collect from NVIDIA Nsight Compute. By default, kernel runtimes are collected in nanoseconds. The metrics available on the system can be queried with ncu --query-metrics.
ignore_kernel_list (Optional[Sequence[str]]) – List of kernel names to ignore. If you call a library within an annotation context, you might not have precise control over which and how many kernels are launched. If some of these kernels should be excluded from the Nsight Python profile, their names can be listed here. Default: None
combine_kernel_metrics (Optional[Callable[[float, float], float]]) – By default, Nsight Python expects one kernel launch per annotation. If an annotated region launches multiple kernels, you can specify how to combine the collected metrics into a single number instead of letting the profiling run fail. For example, to profile runtime and sum the times of all kernels, specify combine_kernel_metrics = lambda x, y: x + y. The function should take two arguments and return a single value. Default: None
clock_control (Literal['base', 'none']) – Select the clock control option in NVIDIA Nsight Compute. If 'none', we launch ncu --clock-control none .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#clock-control Default: none
cache_control (Literal['all', 'none']) – Select the cache control option in NVIDIA Nsight Compute. If 'none', we launch ncu --cache-control none .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#cache-control Default: all
replay_mode (Literal['kernel', 'range']) – Select the replay mode option in NVIDIA Nsight Compute. If 'kernel', we launch ncu --replay-mode kernel .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#replay Default: kernel
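Because combine_kernel_metrics takes two values and returns one, it can be read as a pairwise reduction over the kernels launched inside one annotation. The sketch below illustrates that idea together with ignore_kernel_list. It is not Nsight Python's implementation: the helper names drop_ignored and combine, the kernel names, the timings, the exact-name matching, and the left-to-right folding order are all assumptions made for illustration.

```python
from functools import reduce
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical per-kernel measurements for one annotated region:
# (kernel name, gpu__time_duration.sum in nanoseconds). Illustrative values only.
launches: List[Tuple[str, float]] = [
    ("gemm_kernel", 1200.0),
    ("memset_kernel", 50.0),
    ("reduce_kernel", 300.0),
]


def drop_ignored(
    launches: Sequence[Tuple[str, float]],
    ignore_kernel_list: Optional[Sequence[str]],
) -> List[Tuple[str, float]]:
    """Keep only launches whose kernel name is not in the ignore list.

    Exact string matching is an assumption of this sketch.
    """
    ignored = set(ignore_kernel_list or ())
    return [(name, value) for name, value in launches if name not in ignored]


def combine(
    values: Sequence[float],
    combine_kernel_metrics: Callable[[float, float], float],
) -> float:
    """Fold the values pairwise into one number (left to right, by assumption)."""
    return reduce(combine_kernel_metrics, values)


kept = drop_ignored(launches, ignore_kernel_list=["memset_kernel"])
values = [value for _, value in kept]

total_ns = combine(values, lambda x, y: x + y)  # sum of the kernel runtimes
slowest_ns = combine(values, max)               # runtime of the slowest kernel
print(total_ns, slowest_ns)
```

Any associative two-argument function (sum, max, min) gives the same result regardless of the order in which values are paired, which makes such functions a safe choice when the folding order is not documented.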

collect( func: Callable[[...], None], configs: Iterable[Sequence[Any]], settings: ProfileSettings, )

Collects profiling data using NVIDIA Nsight Compute.

Parameters:

func (Callable[..., None]) – The function to profile.
configs (Iterable[Sequence[Any]]) – Iterable of configurations to run the function with.
settings (ProfileSettings) – Profiling settings.

Return type:

DataFrame | None

Returns:

Collected profiling data.
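The shape of configs (an iterable of sequences) suggests that each configuration is one set of arguments for func. The sketch below shows one way to build such an iterable and run a function over it. It assumes each configuration is unpacked as positional arguments, which this page does not state; run_configs and benchmark are hypothetical names, and no profiling takes place here.

```python
from itertools import product
from typing import Any, Callable, Iterable, List, Sequence, Tuple


def run_configs(func: Callable[..., None], configs: Iterable[Sequence[Any]]) -> int:
    """Call func once per configuration and return the number of calls.

    Unpacking each configuration as positional arguments is an assumption.
    """
    calls = 0
    for config in configs:
        func(*config)
        calls += 1
    return calls


seen: List[Tuple[int, str]] = []


def benchmark(n: int, dtype: str) -> None:
    """Stand-in for the function to profile; it only records its arguments."""
    seen.append((n, dtype))


# Cartesian product of problem sizes and data types: four configurations.
configs = list(product([1024, 2048], ["float16", "float32"]))
n_calls = run_configs(benchmark, configs)
print(n_calls, seen)
```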

nsight.collection.ncu.launch_ncu( report_path: str, name: str, metrics: Sequence[str], cache_control: Literal['none', 'all'], clock_control: Literal['none', 'base'], replay_mode: Literal['kernel', 'range'], verbose: bool, )

Launch NVIDIA Nsight Compute to profile the current script with specified options.

Parameters:

report_path (str) – Path to write report file to.
metrics (Sequence[str]) – Specific metrics to collect.
cache_control (Literal['none', 'all']) – Cache control option passed to NVIDIA Nsight Compute.
clock_control (Literal['none', 'base']) – Clock control option passed to NVIDIA Nsight Compute.
replay_mode (Literal['kernel', 'range']) – Replay mode option passed to NVIDIA Nsight Compute.
verbose (bool) – If False, the log is written to a file (ncu_log.txt).
name (str)

Raises:

NCUNotAvailableError – If NCU is not available on the system.
ProfilerException – If profiling fails due to an error from NVIDIA Nsight Compute.
ValueError – If invalid values are provided for cache_control, clock_control, or replay_mode.

Return type:

str | None

Returns:

Path to the NVIDIA Nsight Compute log file. As a side effect, an NVIDIA Nsight Compute report file with the profiling data is produced.
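To make the options above concrete, the following sketch validates the three Literal-typed parameters and assembles an ncu command line from them. It is a rough illustration under stated assumptions, not the code launch_ncu runs: build_ncu_command and script_argv are hypothetical names, the NVTX filtering and log handling are left out, and the only ncu flags used are --force-overwrite, --export, --metrics, --cache-control, --clock-control, and --replay-mode.

```python
import sys
from typing import Dict, List, Sequence, Tuple

# Allowed values, taken from the Literal types in the signature above.
ALLOWED: Dict[str, Tuple[str, ...]] = {
    "cache_control": ("none", "all"),
    "clock_control": ("none", "base"),
    "replay_mode": ("kernel", "range"),
}


def build_ncu_command(
    report_path: str,
    metrics: Sequence[str],
    cache_control: str,
    clock_control: str,
    replay_mode: str,
    script_argv: Sequence[str],
) -> List[str]:
    """Return an argument list that runs a Python script under ncu.

    Hypothetical helper: raises ValueError for option values outside ALLOWED.
    """
    chosen = {
        "cache_control": cache_control,
        "clock_control": clock_control,
        "replay_mode": replay_mode,
    }
    for option, value in chosen.items():
        if value not in ALLOWED[option]:
            raise ValueError(
                f"{option} must be one of {ALLOWED[option]}, got {value!r}"
            )
    return [
        "ncu",
        "--force-overwrite",
        "--export", report_path,
        "--metrics", ",".join(metrics),  # ncu takes a comma-separated list
        "--cache-control", cache_control,
        "--clock-control", clock_control,
        "--replay-mode", replay_mode,
        sys.executable, *script_argv,    # re-run the script under the profiler
    ]


cmd = build_ncu_command(
    report_path="report",
    metrics=["gpu__time_duration.sum"],
    cache_control="all",
    clock_control="none",
    replay_mode="kernel",
    script_argv=["my_script.py"],
)
print(" ".join(cmd))
```

An argument list like this could be passed to subprocess.run; building it as a list rather than a single string avoids shell quoting problems in paths and metric names.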