NVIDIA Nsight Compute Collector

Collection utilities for profiling Nsight Python runs using NVIDIA Nsight Compute (ncu).

This module contains logic for launching NVIDIA Nsight Compute (ncu) with appropriate settings. ncu is instructed to profile only the code sections marked by NVTX ranges (the Nsight Python annotations).

class nsight.collection.ncu.NCUCollector(
metric: str = 'gpu__time_duration.sum',
ignore_kernel_list: Sequence[str] | None = None,
combine_kernel_metrics: Callable[[float, float], float] | None = None,
clock_control: Literal['base', 'none'] = 'none',
cache_control: Literal['all', 'none'] = 'all',
replay_mode: Literal['kernel', 'range'] = 'kernel',
)

Bases: NsightCollector

NCU collector for Nsight Python.

Parameters:
  • metric (str) – Metric to collect from NVIDIA Nsight Compute. By default we collect kernel runtimes in nanoseconds. A list of supported metrics can be found with ncu --list-metrics.

  • ignore_kernel_list (Optional[Sequence[str]]) – List of kernel names to ignore. If you call a library within an annotation context, you might not have precise control over which and how many kernels are launched. If some of these kernels should be excluded from the Nsight Python profile, their names can be listed here. Default: None

  • combine_kernel_metrics (Optional[Callable[[float, float], float]]) – By default, Nsight Python expects one kernel launch per annotation. In case an annotated region launches multiple kernels, instead of failing the profiling run, you can specify how to summarize the collected metrics into a single number. For example, if we profile runtime and want to sum the times of all kernels we can specify combine_kernel_metrics = lambda x, y: x + y. The function should take two arguments and return a single value. Default: None.

  • clock_control (Literal['base', 'none']) – Clock control option passed to NVIDIA Nsight Compute. If 'none', we launch ncu --clock-control none .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#clock-control Default: none

  • cache_control (Literal['all', 'none']) – Cache control option passed to NVIDIA Nsight Compute. If 'none', we launch ncu --cache-control none .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#cache-control Default: all

  • replay_mode (Literal['kernel', 'range']) – Replay mode option passed to NVIDIA Nsight Compute. If 'kernel', we launch ncu --replay-mode kernel .... For more details, see the NVIDIA Nsight Compute Profiling Guide: https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#replay Default: kernel
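
To illustrate how ignore_kernel_list and combine_kernel_metrics interact, the sketch below filters hypothetical per-kernel measurements and folds the remainder into one number. The kernel names, the values, and the summarize helper are invented for illustration. The left-to-right pairwise fold is an assumption about how a two-argument combiner would be applied, not a description of Nsight Python's internals.

```python
from functools import reduce
from typing import Callable, Optional, Sequence


def summarize(
    measurements: Sequence[tuple[str, float]],
    ignore_kernel_list: Optional[Sequence[str]] = None,
    combine_kernel_metrics: Optional[Callable[[float, float], float]] = None,
) -> float:
    """Hypothetical helper: reduce per-kernel metric values to one number."""
    ignored = set(ignore_kernel_list or ())
    values = [value for name, value in measurements if name not in ignored]
    if not values:
        raise ValueError("no kernels left after filtering")
    if len(values) > 1 and combine_kernel_metrics is None:
        # Mirrors the documented default: one kernel launch per annotation.
        raise ValueError("multiple kernels launched but no combiner given")
    if combine_kernel_metrics is None:
        return values[0]
    # Assumed behaviour: fold the combiner pairwise from left to right.
    return reduce(combine_kernel_metrics, values)


# Invented measurements: (kernel name, gpu__time_duration.sum in nanoseconds).
measurements = [
    ("gemm_kernel", 1200.0),
    ("memset_kernel", 50.0),
    ("reduce_kernel", 300.0),
]
total = summarize(
    measurements,
    ignore_kernel_list=["memset_kernel"],
    combine_kernel_metrics=lambda x, y: x + y,
)
print(total)
```

Passing max instead of the lambda would report the slowest remaining kernel rather than the sum. Any function taking two floats and returning one fits the documented signature.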

collect(
func: Callable[[...], Any],
configs: Sequence[Sequence[Any]],
settings: ProfileSettings,
)

Collects profiling data using NVIDIA Nsight Compute.

Parameters:
  • func (Callable[[...], Any])

  • configs (Sequence[Sequence[Any]])

  • settings (ProfileSettings)

Return type:

DataFrame | None

Returns:

Collected profiling data.
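
The configs argument is typed as a sequence of sequences. One plausible reading, assumed here rather than confirmed by this page, is that each inner sequence holds the positional arguments for one invocation of func. Under that assumption, a parameter sweep can be built with itertools.product. The function matmul_benchmark is a hypothetical stand-in for the profiled function.

```python
from itertools import product


def matmul_benchmark(n: int, dtype: str) -> str:
    """Hypothetical stand-in for a function that would launch a GPU kernel."""
    return f"matmul n={n} dtype={dtype}"


sizes = [1024, 2048]
dtypes = ["float16", "float32"]

# One inner list per configuration: the Cartesian product of both axes.
configs = [list(cfg) for cfg in product(sizes, dtypes)]

# Assumed calling convention: each config is unpacked as positional arguments.
results = [matmul_benchmark(*cfg) for cfg in configs]
for line in results:
    print(line)
```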

nsight.collection.ncu.launch_ncu(
report_path: str,
name: str,
metric: str,
cache_control: Literal['none', 'all'],
clock_control: Literal['none', 'base'],
replay_mode: Literal['kernel', 'range'],
verbose: bool,
)

Launch NVIDIA Nsight Compute to profile the current script with specified options.

Parameters:
  • report_path (str) – Path to write report file to.

  • metric (str) – Specific metric to collect.

  • cache_control (Literal['none', 'all']) – Select cache control option

  • clock_control (Literal['none', 'base']) – Select clock control option

  • replay_mode (Literal['kernel', 'range']) – Select replay mode option

  • verbose (bool) – If False, log is written to a file (ncu_log.txt)

  • name (str)

Raises:
  • NCUNotAvailableError – If NCU is not available on the system.

  • SystemExit – If profiling fails due to an error from NVIDIA Nsight Compute.

Return type:

str | None

Returns:

Path to the NVIDIA Nsight Compute log file. As a side effect, an NVIDIA Nsight Compute report file with the profiling data is produced.
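
As a rough picture of what launching ncu with these options involves, the sketch below assembles a command line from the same parameters. It is not the library's implementation: build_ncu_command and ncu_available are hypothetical helpers, and combining --export, --metrics, and --nvtx with the three control flags documented above is an assumption about which ncu options are passed.

```python
import shutil
import sys
from typing import Literal


def ncu_available() -> bool:
    """Return True if an `ncu` executable is found on PATH."""
    return shutil.which("ncu") is not None


def build_ncu_command(
    report_path: str,
    metric: str,
    cache_control: Literal["none", "all"],
    clock_control: Literal["none", "base"],
    replay_mode: Literal["kernel", "range"],
    script_args: list[str],
) -> list[str]:
    """Hypothetical helper: assemble an ncu invocation for a Python script."""
    return [
        "ncu",
        "--export", report_path,
        "--metrics", metric,
        "--cache-control", cache_control,
        "--clock-control", clock_control,
        "--replay-mode", replay_mode,
        "--nvtx",  # enable NVTX support so annotated ranges are visible to ncu
        sys.executable, *script_args,
    ]


cmd = build_ncu_command(
    "report", "gpu__time_duration.sum", "all", "none", "kernel", ["bench.py"]
)
print(" ".join(cmd))
```

A caller could check ncu_available() first and then pass the list to subprocess.run(cmd, check=True). Building the command as a list rather than a single string avoids shell quoting issues in paths and metric names.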