Nsight Compute

The user manual for NVIDIA Nsight Compute.

NVIDIA Nsight Compute (UI) user manual. Information on all views, controls and workflows within the tool. Description of PC sampling metrics and shipped section files.

1. Introduction

For users migrating from Visual Profiler to NVIDIA Nsight Compute, please see the Visual Profiler Transition Guide for comparison of features and workflows.

1.1. Overview

This document is a user guide to the next-generation NVIDIA Nsight Compute profiling tools. NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and command line tool. In addition, its baseline feature allows users to compare results within the tool. NVIDIA Nsight Compute provides a customizable and data-driven user interface and metric collection and can be extended with analysis scripts for post-processing results.

Important Features
  • Interactive kernel profiler and API debugger
  • Graphical profile report
  • Result comparison across one or multiple reports within the tool
  • Fast data collection
  • UI and command line interface
  • Fully customizable reports and analysis rules

2. Quickstart

The following sections provide brief step-by-step guides on how to set up and run NVIDIA Nsight Compute to collect profile information. All directories are relative to the base directory of NVIDIA Nsight Compute, unless specified otherwise.

The UI executable is called ncu-ui. A shortcut with this name is located in the base directory of the NVIDIA Nsight Compute installation. The actual executable is located in the folder host\windows-desktop-win7-x64 on Windows or host/linux-desktop-glibc_2_11_3-x64 on Linux. By default, when installing from a Linux .run file, NVIDIA Nsight Compute is located in /usr/local/cuda-<cuda-version>/nsight-compute-<version>. When installing from a .deb or .rpm package, it is located in /opt/nvidia/nsight-compute/<version> to be consistent with Nsight Systems. On Windows, the default path is C:\Program Files\NVIDIA Corporation\Nsight Compute <version>.

After starting NVIDIA Nsight Compute, the Welcome Page opens by default. It provides links to recently opened reports and projects, as well as quick access to the Connection Dialog and the Projects dialogs. To immediately start a profile run, select Continue under Quick Launch. See Environment on how to change the start-up action.

Welcome Page



2.1. Interactive Profile Activity

  1. Launch the target application from NVIDIA Nsight Compute

    When starting NVIDIA Nsight Compute, the Welcome Page will appear. Click on Quick Launch to open the Connection dialog. If the Connection dialog doesn't appear, you can open it using the Connect button from the main toolbar, as long as you are not currently connected. Select your target platform on the left-hand side and your connection target (machine) from the Connection drop down. If you have your local target platform selected, localhost will become available as a connection. Use + to add a new connection target. Then, fill in the launch details and select Launch. In the Activity panel, select the Interactive Profile activity to initiate a session that allows controlling the execution of the target application and selecting the kernels of interest interactively. Press Launch to start the session.





  2. Launch the target application with tools instrumentation from the command line
    The ncu command line tool can act as a simple wrapper that forces the target application to load the necessary libraries for tools instrumentation. The parameter --mode=launch specifies that the target application should be launched and suspended before the first instrumented API call. That way, the application waits until you connect with the UI.
    $ ncu --mode=launch CuVectorAddDrv.exe
  3. Launch NVIDIA Nsight Compute and connect to target application




    Select the target machine at the top of the dialog to connect and update the list of attachable applications. By default, localhost is pre-selected if the target matches your current local platform. Select the Attach tab and the target application of interest and press Attach. Once connected, the layout of NVIDIA Nsight Compute changes into stepping mode that allows you to control the execution of any calls into the instrumented API. When connected, the API Stream window indicates that the target application waits before the very first API call.





  4. Control application execution

    Use the API Stream window to step through the calls into the instrumented API. The dropdown at the top allows switching between different CPU threads of the application. Step In (F11), Step Over (F10), and Step Out (Shift + F11) are available from the Debug menu or the corresponding toolbar buttons. While stepping, function return values and function parameters are captured.





    Use Resume (F5) and Pause to allow the program to run freely. Freeze control is available to define the behavior of threads currently not in focus, i.e. not selected in the thread drop down. By default, the API Stream stops on any API call that returns an error code. This can be toggled in the Debug menu using Break On API Error.

  5. Isolate a kernel launch

    To quickly isolate a kernel launch for profiling, use the Next API Launch button in the toolbar of the API Stream window to jump to the next kernel launch. The execution will stop before the kernel launch is executed.





  6. Profile a kernel launch

    Once the execution of the target application is suspended at a kernel launch, additional actions become available in the UI. These actions are available from either the menu or the toolbar. Note that the actions are disabled if the API stream is not in a qualifying state (not at a kernel launch or launching on an unsupported GPU). To profile, press Profile and wait until the result is shown in the Profiler Report. Profiling progress is reported in the lower right corner status bar.

    Instead of manually selecting Profile, it is also possible to enable Auto Profile from the Profile menu. If enabled, each kernel matching the current kernel filter (if any) will be profiled using the current section configuration. This is especially useful if an application is to be profiled unattended, or the number of kernel launches to be profiled is very large. Sections can be enabled or disabled using the Sections/Rules Info tool window.

For a detailed description of the options available in this activity, see Interactive Profile Activity.
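
Step 2 above can also be scripted. The following is a minimal sketch, not part of the tool, that assembles the same ncu --mode=launch command line as an argument list. The helper name build_launch_command is hypothetical, and the example only prints the command instead of starting it.

```python
def build_launch_command(app, app_args=()):
    """Return the argument list for starting `app` suspended under ncu."""
    # --mode=launch: launch the application and suspend it before the
    # first instrumented API call, so that the UI can attach to it.
    return ["ncu", "--mode=launch", app, *app_args]


command = build_launch_command("CuVectorAddDrv.exe")
print(" ".join(command))  # prints: ncu --mode=launch CuVectorAddDrv.exe
```

On a machine where ncu is installed, the returned list could be passed to subprocess.Popen, after which the process should appear in the Attach tab.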

2.2. Non-Interactive Profile Activity

  1. Launch the target application from NVIDIA Nsight Compute

    When starting NVIDIA Nsight Compute, the Welcome Page will appear. Click on Quick Launch to open the Connection dialog. If the Connection dialog doesn’t appear, you can open it using the Connect button from the main toolbar, as long as you are not currently connected. Select your target platform on the left-hand side and localhost from the Connection drop down. Then, fill in the launch details and select Launch. In the Activity panel, select the Profile activity to initiate a session that pre-configures the profile session and launches the command line profiler to collect the data. Provide the Output File name to enable starting the session with the Launch button.





  2. Additional Launch Options

    The options are grouped into tabs. The Filter tab exposes the options to specify which kernels should be profiled. Options include the kernel regex filter, the number of launches to skip, and the total number of launches to profile. The Sampling tab allows you to configure sampling options for each kernel launch. The Other tab includes the option to collect NVTX information or custom metrics via the --metrics option. For more details on these options, see the Command Line Options of the command line profiler.

    The Section tab allows you to select which sections should be collected for each kernel launch. Hover over a section to see its description as a tool-tip. To change the sections that are enabled by default, use the Sections/Rules Info tool window.





For a detailed description of the options available in this activity, see Profile Activity.
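
Because this activity launches the command line profiler, the tab settings correspond to a command line. The sketch below shows roughly how such a command might be assembled. The flags --export, --metrics and --nvtx are named in this guide; --launch-skip, --launch-count and --section are taken from the NVIDIA Nsight Compute CLI and should be checked against the Command Line Options of your version. The helper name, the section identifier and the metric name are illustrative assumptions only.

```python
def build_profile_command(app, output, skip=0, count=None,
                          sections=(), metrics=(), nvtx=False):
    """Assemble an ncu command line from Profile activity settings."""
    cmd = ["ncu", "--export", output]          # Output File
    if skip:
        cmd += ["--launch-skip", str(skip)]    # Filter tab: launches to skip
    if count is not None:
        cmd += ["--launch-count", str(count)]  # Filter tab: launches to profile
    for section in sections:
        cmd += ["--section", section]          # Section tab
    if metrics:
        cmd += ["--metrics", ",".join(metrics)]  # Other tab: custom metrics
    if nvtx:
        cmd.append("--nvtx")                   # Other tab: NVTX information
    cmd.append(app)
    return cmd


example = build_profile_command(
    "./my_app",                      # hypothetical application
    "report",
    skip=2,
    count=10,
    sections=["SpeedOfLight"],       # example section identifier
    metrics=["sm__cycles_elapsed.avg"],  # example metric name
    nvtx=True,
)
print(" ".join(example))
```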

2.3. Navigate the Report

  1. Navigate the report

    The profile report comes up by default on the Details page. You can switch between different Report Pages of the report with the dropdown labeled Page on the top-left of the report. A report can contain any number of results from kernel launches. The Launch dropdown allows switching between the different results in a report.





  2. Diffing multiple results

    On the Details page, press the Add Baseline button to promote the current result in focus to become the baseline against which all other results from this report, and from any other report opened in the same instance of NVIDIA Nsight Compute, are compared. If a baseline is set, every element on the Details page shows two values: the current value of the result in focus, and either the corresponding value of the baseline or the percentage of change from the corresponding baseline value.





    Use the Clear Baselines entry from the dropdown button, the Profile menu or the corresponding toolbar button to remove all baselines. For more information see Baselines.

  3. Executing rules

    On the Details page some sections may provide rules. Press the Apply button to execute an individual rule. The Apply Rules button on the top executes all available rules for the current result in focus. Rules can be user-defined too. For more information see the Customization Guide.
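
The baseline comparison from step 2 comes down to a relative difference per metric. The following sketch assumes the usual definition of percentage change, (current - baseline) / baseline; the exact formula and rounding used by the tool are not stated in this guide, and the metric names and values are made up for illustration.

```python
def percent_change(current, baseline):
    """Return the change of `current` relative to `baseline` in percent.

    Returns None when the baseline is zero, because the relative
    change is undefined in that case.
    """
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100.0


# Hypothetical metric values for a baseline result and the result in focus.
baseline_result = {"duration_us": 100.0, "registers_per_thread": 32.0}
current_result = {"duration_us": 120.0, "registers_per_thread": 32.0}

for name, value in current_result.items():
    print(name, value, percent_change(value, baseline_result[name]))
```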





3. Connection Dialog

Use the Connection Dialog to launch and attach to applications on your local and remote platforms. Start by selecting the Target Platform for profiling. By default (and if supported) your local platform will be selected. Select the platform on which you would like to start the target application or connect to a running process.

Connection Dialog



When using a remote platform, you will be asked to select or create a Connection in the top drop down. To create a new connection, select + and enter your connection details. When using the local platform, localhost will be selected as the default and no further connection settings are required. You can still create or select a remote connection, if profiling will be on a remote system of the same platform.

Depending on your target platform, select either Launch or Remote Launch to launch an application for profiling on the target. Note that Remote Launch will only be available if supported on the target platform.

Fill in the following launch details for the application:
  • Application Executable: Specifies the root application to launch. Note that this may not be the final application that you wish to profile. It can be a script or launcher that creates other processes.
  • Working Directory: The directory in which the application will be launched.
  • Command Line Arguments: Specify the arguments to pass to the application executable.
  • Environment: The environment variables to set for the launched application.

Select Attach to attach the profiler to an application already running on the target platform. This application must have been started using another NVIDIA Nsight Compute CLI instance. The list will show all application processes running on the target system which can be attached. Select the refresh button to re-create this list.

Finally, select the Activity to be run on the target for the launched or attached application. Note that not all activities are necessarily compatible with all targets and connection options. Currently, the following activities exist:
  • Interactive Profile Activity
  • Profile Activity

3.1. Remote Connections

Remote devices that support SSH can also be configured as a target in the Connection Dialog. To configure a remote device, ensure an SSH-capable Target Platform is selected, then press the + button. The following configuration dialog will be presented.





NVIDIA Nsight Compute supports both password and private key authentication methods. In this dialog, select the authentication method and enter the following information:

  • Password
    • IP/Host Name: The IP address or host name of the target device.
    • User Name: The user name to be used for the SSH connection.
    • Password: The user password to be used for the SSH connection.
    • Port: The port to be used for the SSH connection. (The default value is 22.)
    • Deployment Directory: The directory to use on the target device to deploy supporting files. The specified user must have write permissions to this location.
  • Private Key




    • IP/Host Name: The IP address or host name of the target device.
    • User Name: The user name to be used for the SSH connection.
    • SSH Private Key: The private key that is used to authenticate to the SSH server.
    • SSH Key Passphrase: The passphrase for your private key.
    • Deployment Directory: The directory to use on the target device to deploy supporting files. The specified user must have write permissions to this location.

When all information is entered, click the Add button to make use of this new connection.

When a remote connection is selected in the Connection Dialog, the Application Executable file browser will browse the remote file system using the configured SSH connection, allowing the user to select the target application on the remote device.

When an activity is launched on a remote device, the following steps are taken:
  1. The command line profiler and supporting files are copied into the Deployment Directory on the remote device. (Only files that do not exist or are out of date are copied.)
  2. The Application Executable is executed on the remote device.
    • For Interactive Profile activities, a connection is established to the remote application and the profiling session begins.
    • For Non-Interactive Profile activities, the remote application is executed under the command line profiler and the specified report file is generated.
  3. For non-interactive profiling activities, the generated report file is copied back to the host, and opened.

The progress of each of these steps is presented in the Progress Log.

Progress Log



Note that once either activity type has been launched remotely, the tools necessary for further profiling sessions can be found in the Deployment Directory on the remote device.
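
The deployment rule from step 1 (copy only files that do not exist or are out of date) can be illustrated with a small sketch. It uses two local directories as a stand-in for the host and the remote Deployment Directory, and it assumes that "out of date" means an older modification time; how the tool actually detects stale files is not specified here. The function name is hypothetical.

```python
import os


def files_to_deploy(source_dir, deploy_dir):
    """Return names of files that are missing or older in deploy_dir."""
    pending = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch only handles regular files
        dst = os.path.join(deploy_dir, name)
        # Copy when the target file is absent or older than the source.
        if not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src):
            pending.append(name)
    return pending
```

Running the function a second time after copying the returned files would yield an empty list, which mirrors why repeated remote launches deploy faster than the first one.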

3.2. Interactive Profile Activity

The Interactive Profile activity allows you to initiate a session that controls the execution of the target application, similar to a debugger. You can step API calls and workloads (CUDA kernels), pause and resume, and interactively select the kernels of interest and which metrics to collect.

This activity does not currently support profiling or attaching to child processes.

  • Enable NVTX Support

    Collect NVTX information provided by the application or its libraries. Required to support stepping to specific NVTX contexts.

  • Disable Profiling Start/Stop

    Ignore calls to cu(da)ProfilerStart or cu(da)ProfilerStop made by the application.

  • Enable Profiling From Start

    Enables profiling from the application start. Disabling this is useful if the application calls cu(da)ProfilerStart and kernels before the first call to this API should not be profiled. Note that disabling this does not prevent you from manually profiling kernels.

  • Cache Control

    Control the behavior of the GPU caches during profiling. Allowed values: For Flush All, all GPU caches are flushed before each kernel replay iteration during profiling. While metric values in the execution environment of the application might be slightly different without invalidating the caches, this mode offers the most reproducible metric results across the replay passes and also across multiple runs of the target application.

    For Flush None, no GPU caches are flushed during profiling. This can improve performance and better replicates the application behavior if only a single kernel replay pass is necessary for metric collection. However, some metric results will vary depending on prior GPU work, and between replay iterations. This can lead to inconsistent and out-of-bounds metric values.

  • Clock Control

    Control the behavior of the GPU clocks during profiling. Allowed values: For Base, GPC and memory clocks are locked to their respective base frequency during profiling. This has no impact on thermal throttling. For None, no GPC or memory frequencies are changed during profiling.
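
The interaction between Enable Profiling From Start and Disable Profiling Start/Stop can be modeled in a few lines. This is a simplified sketch of the behavior described above, not the tool's implementation: the event tuples and the function name are hypothetical, and manual profiling in the UI is not modeled.

```python
def profiled_kernels(events, profile_from_start=True, disable_start_stop=False):
    """Return the kernels that fall inside an enabled profiling range.

    `events` is a list of (kind, name) tuples, where kind is "kernel",
    "start" (cu(da)ProfilerStart) or "stop" (cu(da)ProfilerStop).
    """
    enabled = profile_from_start
    result = []
    for kind, name in events:
        if kind == "start" and not disable_start_stop:
            enabled = True
        elif kind == "stop" and not disable_start_stop:
            enabled = False
        elif kind == "kernel" and enabled:
            result.append(name)
    return result


events = [
    ("kernel", "init"),
    ("start", None),
    ("kernel", "compute"),
    ("stop", None),
    ("kernel", "cleanup"),
]
print(profiled_kernels(events))                            # ['init', 'compute']
print(profiled_kernels(events, profile_from_start=False))  # ['compute']
print(profiled_kernels(events, disable_start_stop=True))   # ['init', 'compute', 'cleanup']
```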

3.3. Profile Activity

The Profile activity provides a traditional, pre-configurable profiler. After configuring which kernels to profile, which metrics to collect, etc., the application is run under the profiler without interactive control. The activity completes once the application terminates. For applications that normally do not terminate on their own, e.g. interactive user interfaces, you can cancel the activity once all expected kernels are profiled.

This activity does not support attaching to processes previously launched via NVIDIA Nsight Compute. These processes will be shown grayed out in the Attach tab.

  • Output File

    Path to report file where the collected profile should be stored. If not present, the report extension .ncu-rep is added automatically. The placeholder %i is supported for the filename component. It is replaced by a sequentially increasing number to create a unique filename. This maps to the --export command line option.

  • Force Overwrite

    If set, existing report files are overwritten. This maps to the --force-overwrite command line option.

  • Target Processes

    Select the processes you want to profile. In Application Only mode, only the root application process is profiled. In All mode, the root application process and all its child processes are profiled. This maps to the --target-processes command line option.

  • Additional Options

    All remaining options map to their command line profiler equivalents. See the Command Line Options section in the NVIDIA Nsight Compute CLI documentation for details.
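
The Output File rules (automatic .ncu-rep extension and the %i placeholder) can be sketched as follows. The sketch assumes that numbering starts at 1 and that the lowest unused number is chosen; the guide only states that the number increases sequentially until the filename is unique. The function name and the set of existing files are hypothetical.

```python
def resolve_report_path(template, existing, first_index=1):
    """Resolve an Output File template to a concrete report filename.

    `existing` is the set of filenames that are already taken.
    """
    # The report extension is added automatically if not present.
    if not template.endswith(".ncu-rep"):
        template += ".ncu-rep"
    if "%i" not in template:
        return template
    # Replace %i by an increasing number until the name is unique.
    index = first_index
    while True:
        candidate = template.replace("%i", str(index))
        if candidate not in existing:
            return candidate
        index += 1


taken = {"run_1.ncu-rep", "run_2.ncu-rep"}
print(resolve_report_path("run_%i", taken))  # run_3.ncu-rep
```

Without %i and without Force Overwrite, a second run with the same Output File would collide with the existing report, which is the case the placeholder is meant to avoid.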

3.4. Reset

Entries in the connection dialog are saved as part of the current project. When working in a custom project, simply close the project to reset the dialog.

When not working in a custom project, entries are stored as part of the default project. You can delete all information from the default project by closing NVIDIA Nsight Compute and then deleting the project file from disk.

5. Tool Windows

5.1. API Statistics

The API Statistics window is available when NVIDIA Nsight Compute is connected to a target application. It opens by default as soon as the connection is established. It can be re-opened using Debug > API Statistics from the main menu.





Whenever the target application is suspended, it shows a summary of tracked API calls with some statistical information, such as the number of calls, their total, average, minimum and maximum duration. Note that this view cannot be used as a replacement for Nsight Systems when trying to optimize CPU performance of your application.

The Reset button deletes all statistics collected to the current point and starts a new collection. Use the Export to CSV button to export the current statistics to a CSV file.
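
The summary shown in this window can be reproduced from a list of traced calls. The sketch below computes the same kinds of columns (number of calls, total, average, minimum and maximum duration) and writes them as CSV. The column names, the duration unit and the sample values are assumptions for illustration; the exact layout of the tool's CSV export may differ.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["name", "calls", "total", "average", "min", "max"]


def summarize_calls(calls):
    """Aggregate (api_name, duration) pairs into one row per API call."""
    durations = defaultdict(list)
    for name, duration in calls:
        durations[name].append(duration)
    rows = []
    for name in sorted(durations):
        values = durations[name]
        rows.append({
            "name": name,
            "calls": len(values),
            "total": sum(values),
            "average": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        })
    return rows


def to_csv(rows):
    """Render the summary rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up durations for two CUDA driver API calls.
trace = [("cuLaunchKernel", 4.0), ("cuMemAlloc", 10.0), ("cuLaunchKernel", 6.0)]
print(to_csv(summarize_calls(trace)))
```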

5.2. API Stream

The API Stream window is available when NVIDIA Nsight Compute is connected to a target application. It opens by default as soon as the connection is established. It can be re-opened using Debug > API Stream from the main menu.





Whenever the target application is suspended, the window shows the history of API calls and traced kernel launches. The currently suspended API call or kernel launch (activity) is marked with a yellow arrow. If the suspension is at a subcall, the parent call is marked with a green arrow. The API call or kernel is suspended before being executed.

For each activity, further information is shown such as the kernel name or the function parameters (Func Parameters) and return value (Func Return). Note that the function return value will only become available once you step out or over the API call.

Use the Current Thread dropdown to switch between the active threads. The dropdown shows the thread ID followed by the current API name. One of several options can be chosen in the trigger dropdown, which are executed by the adjacent >> button:
  • Next Kernel Launch resumes execution until the next kernel launch is found in any enabled thread.
  • Next API Call resumes execution until the next API call matching Next Trigger is found in any enabled thread.
  • Next Range Start resumes execution until the next start of an active profiler range is found. Profiler ranges are defined by using the cu(da)ProfilerStart/Stop API calls.
  • Next Range Stop resumes execution until the next stop of an active profiler range is found.

The API Level dropdown changes which API levels are shown in the stream. The Export to CSV button exports the currently visible stream to a CSV file.

5.3. NVTX

The NVTX window is available when NVIDIA Nsight Compute is connected to a target application. If closed, it can be re-opened using Debug > NVTX from the main menu. Whenever the target application is suspended, the window shows the state of all active NVTX domains and ranges in the currently selected thread. Note that NVTX information is only tracked if the launching command line profiler instance was started with --nvtx or NVTX was enabled in the NVIDIA Nsight Compute launch dialog.





Use the Current Thread dropdown in the API Stream window to change the currently selected thread. NVIDIA Nsight Compute supports NVTX named resources, such as threads, CUDA devices, CUDA contexts, etc. If a resource is named using NVTX, the appropriate UI elements will be updated.





5.4. Resources

The Resources window is available when NVIDIA Nsight Compute is connected to a target application. It shows information about the currently known resources, such as CUDA devices, CUDA streams or kernels. The window is updated every time the target application is suspended. If closed, it can be re-opened using Debug > Resources from the main menu.





Using the dropdown on the top, different views can be selected, where each view is specific to one kind of resource (context, stream, kernel, …). The Filter edit allows you to create filter expressions using the column headers of the currently selected resource.

The resource table shows all information for each resource instance. Each instance has a unique ID, the ID of the API call that created the resource, its handle, associated handles, and further parameters. When a resource is destroyed, it is removed from its table.

5.5. Sections/Rules Info

The Sections/Rules Info window can be opened from the main menu using Profile > Sections/Rules Info. It tracks all section sets, sections and rules currently loaded in NVIDIA Nsight Compute, independent of a specific connection or report. The directory to load those files from can be configured in the Profile options dialog. The window is used to inspect available sets, sections and rules, as well as to configure which sections should be collected and which rules should be applied. The window has two views, which can be selected using the dropdown in its header.

The Section Sets view shows all available section sets. Each set is associated with a number of sections. You can choose a set appropriate to the level of detail for which you want to collect performance metrics. Sets which collect more detailed information normally incur higher runtime overhead during profiling.





When enabling a set in this view, the associated sections are enabled in the Sections/Rules view. When disabling a set in this view, the associated sections in the Sections/Rules view are disabled. If no set is enabled, or if sections are manually enabled/disabled in the Sections/Rules view, the <custom> entry is marked active to represent that no section set is currently enabled. Note that the default set is enabled by default.

Whenever a kernel is profiled manually, or when auto-profiling is enabled, only sections enabled in the Sections/Rules view are collected. Similarly, whenever rules are applied, only rules enabled in this view are active.
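
The relationship between section sets, individual sections and the <custom> entry can be modeled as a small state object. This is a sketch of the behavior described above under simplifying assumptions: the class name, the set names and the section names are hypothetical, and the handling of sections shared between several enabled sets is a guess.

```python
class SectionState:
    """Track which section sets and sections are enabled."""

    def __init__(self, section_sets):
        self.section_sets = section_sets   # set name -> list of sections
        self.enabled_sets = set()
        self.enabled_sections = set()

    @property
    def custom(self):
        # <custom> is active when no section set is currently enabled.
        return not self.enabled_sets

    def enable_set(self, name):
        self.enabled_sets.add(name)
        self.enabled_sections |= set(self.section_sets[name])

    def disable_set(self, name):
        self.enabled_sets.discard(name)
        self.enabled_sections -= set(self.section_sets[name])

    def toggle_section(self, section):
        # A manual change means that no set describes the selection anymore.
        self.enabled_sections ^= {section}
        self.enabled_sets.clear()


state = SectionState({"basic": ["A", "B"], "detailed": ["A", "B", "C"]})
state.enable_set("basic")
print(sorted(state.enabled_sections), state.custom)  # ['A', 'B'] False
state.toggle_section("C")
print(sorted(state.enabled_sections), state.custom)  # ['A', 'B', 'C'] True
```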





The enabled states of sections and rules are persisted across NVIDIA Nsight Compute launches. The Reload button reloads all sections and rules from disk again. If a new section or rule is found, it will be enabled if possible. If any errors occur while loading a rule, they will be listed in an extra entry with a warning icon and a description of the error.

Use the Enable All and Disable All checkboxes to enable or disable all sections and rules at once. The Filter text box can be used to filter what is currently shown in the view. It does not alter activation of any entry.

The table shows sections and rules with their activation status, their relationship and further parameters, such as associated metrics or the original file on disk. Rules associated with a section are shown as children of their section entry. Rules independent of any section are shown under an additional Independent Rules entry.

Double-clicking an entry in the table's Filename column opens this file as a document. It can be edited and saved directly in NVIDIA Nsight Compute. After editing the file, Reload must be selected to apply those changes.

See the Kernel Profiling Guide for the list of default sections for NVIDIA Nsight Compute.

6. Profiler Report

The profiler report contains all the information collected during profiling for each kernel launch. In the user interface, it consists of a header with general information, as well as controls to switch between report pages or individual collected launches. By default, the report starts with the Details page selected.

6.1. Header

The Page dropdown can be used to switch between the available report pages, which are explained in detail in the next section.

Profiler report header



The Launch dropdown can be used to switch between all collected kernel launches. The information displayed in each page commonly represents the selected launch instance. On some pages (e.g. Raw), information for all launches is shown and the selected instance is highlighted. You can type in this dropdown to quickly filter and find a kernel launch.

The Apply Filters button opens the filter dialog. You can use more than one filter to narrow down your results. In the filter dialog, enter your filter parameters and press the OK button. The Launch dropdown will be filtered accordingly. Select the arrow dropdown to access the Clear Filters button, which removes all filters.

Filter Dialog



The Add Baseline button promotes the current result in focus to become the baseline of all other results from this report and any other report opened in the same instance of NVIDIA Nsight Compute. Select the arrow dropdown to access the Clear Baselines button, which removes all currently active baselines.

The Apply Rules button applies all rules available for this report. If rules had been applied previously, those results will be replaced. By default, rules are applied immediately once the kernel launch has been profiled. This can be changed in the options under Tools > Options > Profile > Report UI > Apply Applicable Rules Automatically.

A button on the right-hand side offers multiple operations that may be performed on the page. Available operations include:
  • Copy as Image - Copies the contents of the page to the clipboard as an image.
  • Save as Image - Saves the contents of the page to a file as an image.
  • Save as PDF - Saves the contents of the page to a file as a PDF.
  • Export to CSV - Exports the contents of the page to CSV format.
  • Reset to Default - Resets the page to a default state by removing any persisted settings.

Note that not all functions are available on all pages.

Information about the selected kernel is shown as Current. [+] and [-] buttons can be used to show or hide the section body content. The info toggle button i changes the section description's visibility.

6.2. Report Pages

Use the Page dropdown in the header to switch between the report pages.

6.2.1. Session Page

The Session page contains basic information about the report and the machine, as well as device attributes of all devices for which launches were profiled. When switching between launch instances, the respective device attributes are highlighted.

6.2.2. Summary Page

The Summary page shows a list of all collected results in this report, with selected important summary metrics. It gives you a quick comparison overview across all profiled kernel launches. You can transpose the table of kernels and metrics by using the Transpose button.

6.2.3. Details Page

The Details page is the main page for all metric data collected during a kernel launch. The page is split into individual sections. Each section consists of a header table and an optional body that can be expanded. The sections are completely user defined and can be changed easily by updating their respective files. For more information on customizing sections, see the Customization Guide. For a list of sections shipped with NVIDIA Nsight Compute, see the Kernel Profiling Guide.

By default, once a new profile result is collected, all applicable rules are applied. Any rule results will be shown as Recommendations on this page. Most rule results will be purely informative or have a warning icon to indicate some performance problem. Results with error icons typically indicate an error while applying the rule.

Rule results often point out performance problems and guide through the analysis process.



If a rule result references another report section, it will appear as a link in the recommendation. Select the link to scroll to the respective section. If the section was not collected in the same profile result, enable it in the Sections/Rules Info tool window.

You can add or edit comments in each section of the Details view by clicking on the comment button (speech bubble). The comment icon will be highlighted in sections that contain a comment. Comments are persisted in the report and are summarized in the Comments Page.

Use the Comments button to annotate sections.



Besides their header, sections typically have one or more bodies with additional charts or tables. Click the triangle Expander icon in the top-left corner of each section to show or hide those. If a section has multiple bodies, a dropdown in their top-right corner allows you to switch between them.

Sections with multiple bodies have a dropdown to switch between them.



If enabled, the SOL Rooflines section contains a Roofline chart that is particularly helpful for visualizing kernel performance at a glance. (To enable roofline charts in the report, ensure that the GPU Speed of Light Roofline Chart section is selected when profiling.) More information on how to use and read this chart can be found in the Kernel Profiling Guide.

Sample roofline chart.



Sections such as Source Counters can contain source hot spot tables. These tables indicate the N highest or lowest values of one or more metrics in your kernel source code. Select the location links to navigate directly to this location in the Source Page. Hover the mouse over a value to see which metrics contribute to it.

Hot spot tables point out performance problems in your source.
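
Selecting the N highest or lowest values of a metric, as a hot spot table does, can be sketched with the standard library. The source locations and sample counts below are made up, and the function name is hypothetical; the sketch only shows the selection step, not how the tool attributes metrics to source lines.

```python
import heapq


def hot_spots(per_line_values, n=3, highest=True):
    """Return the n (location, value) pairs with the highest or lowest values."""
    pick = heapq.nlargest if highest else heapq.nsmallest
    return pick(n, per_line_values.items(), key=lambda item: item[1])


# Hypothetical per-line values of one metric.
samples = {
    "kernel.cu:12": 40,
    "kernel.cu:27": 95,
    "kernel.cu:31": 5,
    "kernel.cu:44": 60,
}
print(hot_spots(samples, n=2))                 # [('kernel.cu:27', 95), ('kernel.cu:44', 60)]
print(hot_spots(samples, n=1, highest=False))  # [('kernel.cu:31', 5)]
```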



6.2.4. Source Page

The Source page correlates assembly (SASS) with high-level code and PTX. In addition, it displays metrics that can be correlated with source code. It is filtered to only show (SASS) functions that were executed in the kernel launch.

Profiler report Source page



The View dropdown can be used to select different code (correlation) options. This includes SASS, PTX and Source (CUDA-C), as well as their combinations. Which options are available depends on the source information embedded into the executable.

If the application was built with the -lineinfo or --generate-line-info nvcc flag to correlate SASS and source, the CUDA-C view is available. If source files are not found locally in their original path, only their filenames are shown in the view together with a File Not Found error. Select a filename and click the Resolve button above to select where this source can be found on the local filesystem. If a file is found in its original or any source lookup location, but its attributes don't match, a File Mismatch error is shown. See the Source Lookup options for changing file lookup behavior.

If the report was collected using remote profiling, and automatic resolution of remote files is enabled in the Profile options, NVIDIA Nsight Compute will attempt to load the source from the remote target. If the connection credentials are not yet available in the current NVIDIA Nsight Compute instance, you are prompted for them in a dialog. Loading from a remote target is currently only available for Linux x86_64 targets and Linux and Windows hosts.

CUDA-C source Resolve button



The heatmap on the right-hand side of each view can be used to quickly identify locations with high values of the metric currently selected in the dropdown. The heatmap uses a black-body radiation color scale in which black denotes the lowest mapped value and white the highest. The current scale is shown when clicking and holding the heatmap with the right mouse button.
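
To make the idea of a black-body color scale concrete, here is a minimal sketch that maps a metric value to a color running from black through red and yellow to white. The exact color ramp used by the tool is not documented here, so the piecewise-linear ramp and the `black_body_color` helper are assumptions for illustration only.

```python
def black_body_color(value, lowest, highest):
    """Map value in [lowest, highest] to an (r, g, b) tuple of 0..255 ints.

    Assumed ramp: black -> red -> yellow -> white, each third of the
    range saturating one more color channel.
    """
    if highest <= lowest:
        return (0, 0, 0)
    # Normalize to 0..1 and clamp values outside the mapped range.
    t = max(0.0, min(1.0, (value - lowest) / (highest - lowest)))

    def channel(x):
        return round(255 * max(0.0, min(1.0, x)))

    return (channel(3 * t), channel(3 * t - 1), channel(3 * t - 2))


low = black_body_color(0, 0, 100)     # lowest mapped value: black
mid = black_body_color(50, 0, 100)    # somewhere in the orange range
high = black_body_color(100, 0, 100)  # highest mapped value: white
print(low, mid, high)
```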

If a view contains multiple source files or functions, [+] and [-] buttons are shown. These can be used to expand or collapse the view, thereby showing or hiding the file or function content except for its header. If collapsed, all metrics are shown aggregated to provide a quick overview.

Source view heatmap color scale



Views allow you to fix columns so that they do not move out of view when scrolling horizontally. By default, the Source column is fixed to the left, enabling easy inspection of all metrics correlated to a source line. To change fixing of columns, select the triangle icon in the respective column header.

Fix Source column icon in the header



Pre-Defined Source Metrics
  • Live Registers

    Number of registers that need to be kept valid by the compiler. A high value indicates that many registers are required at this code location, potentially increasing the register pressure and the maximum number of registers required by the kernel.

  • Sampling Data (All)

    The number of samples from the Statistical Sampler at this program location.

  • Sampling Data (Not Issued)

    The number of samples from the Statistical Sampler at this program location on cycles the warp scheduler issued no instructions. Note that (Not Issued) samples may be taken on a different profiling pass than (All) samples mentioned above, so their values do not strictly correlate.

    This metric is only available on devices with compute capability 7.0 or higher.

  • Instructions Executed

    Number of times the source (instruction) was executed by any warp.

  • Predicated-On Thread Instructions Executed

    Number of times the source (instruction) was executed by any active, predicated-on thread. For instructions that are executed unconditionally (i.e. without predicate), this is the number of active threads in the warp, multiplied by the respective Instructions Executed value.

  • Information on Memory Operation

    This includes Memory Address Space, Memory Access Operation, Memory Access Size, Memory L1 Transactions Shared, Memory L2 Transactions Global and Memory L2 Transactions Local.

  • Individual Sampling Data Metrics

    All stall_* metrics show the information combined in Sampling Data individually. See Statistical Sampler in the Kernel Profiling Guide for their descriptions.

  • See the Customization Guide on how to add additional metrics for this view.
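
The relationship between Instructions Executed and Predicated-On Thread Instructions Executed for unpredicated instructions can be checked with a little arithmetic. The sketch below assumes that every execution of the instruction sees the same number of active threads; the helper name and the example numbers are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def unpredicated_thread_instructions(instructions_executed, active_threads=WARP_SIZE):
    """Thread-level count for an instruction that carries no predicate.

    Assumes the same number of active threads on every execution.
    """
    return active_threads * instructions_executed


# An instruction executed 100 times by fully active warps ...
full = unpredicated_thread_instructions(100)
# ... versus 100 times by warps with only 16 active threads.
half = unpredicated_thread_instructions(100, active_threads=16)
print(full, half)
```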

6.2.5. Comments Page

The Comments page aggregates all section comments in a single view and allows the user to edit those comments on any launch instance or section, as well as on the overall report. Comments are persisted with the report. If a section comment is added, the comment icon of the respective section in the Details Page will be highlighted.

6.2.6. NVTX Page

The NVTX page shows the NVTX context when the kernel was launched. All thread-specific information is with respect to the thread of the kernel's launch API call. Note that NVTX information is only collected if the profiler is started with NVTX support enabled, either in the Connection Dialog or using the --nvtx parameter of the NVIDIA Nsight Compute CLI.





6.2.7. Raw Page

The Raw page shows a list of all collected metrics with their units per profiled kernel launch. It can be exported, for example, to CSV format for further analysis. The page features a filter edit to quickly find specific metrics. You can transpose the table of kernels and metrics by using the Transpose button.
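
Once exported, the data can be processed with any CSV-capable tool. The sketch below reads a small, made-up export with Python's standard csv module. The column names (`Kernel Name`, `Metric Name`, `Metric Unit`, `Metric Value`) and all values are assumptions for illustration; check the header row of your own export before adapting it.

```python
import csv
import io

# Made-up stand-in for an exported file; real exports may use other columns.
EXPORT = """Kernel Name,Metric Name,Metric Unit,Metric Value
vecAdd,duration,usecond,12.5
vecAdd,registers_per_thread,register/thread,16
matMul,duration,usecond,830.0
"""


def metric_by_kernel(csv_text, metric_name):
    """Collect one metric per kernel from a long-format CSV export."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Metric Name"] == metric_name:
            result[row["Kernel Name"]] = float(row["Metric Value"])
    return result


durations = metric_by_kernel(EXPORT, "duration")
print(durations)
```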

6.3. Metrics and Units

Numeric metric values are shown in various places in the report, including the header and tables and charts on most pages. NVIDIA Nsight Compute supports various ways to display those metrics and their values.

When available and applicable to the UI component, metrics are shown along with their unit. This is to make it apparent if a metric represents cycles, threads, bytes/s, and so on. The unit will normally be shown in square brackets, e.g. Metric Name [bytes] 128.

By default, units are scaled automatically so that metric values are shown with a reasonable order of magnitude. Units are scaled using their SI-factors, i.e. byte-based units are scaled using a factor of 1000 and the prefixes K, M, G, etc. Time-based units are also scaled using a factor of 1000, with the prefixes n, u and m. This scaling can be disabled in the Profile options.
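
The scaling rule described above can be sketched as follows. This is only an approximation of the behavior, assuming that values are scaled until they fall between 1 and 1000; the helper names and the exact output formatting are hypothetical.

```python
def scale_bytes(value, unit="byte"):
    """Scale a byte-based value up by factors of 1000 (K, M, G, ...)."""
    prefixes = ["", "K", "M", "G", "T"]
    index = 0
    while abs(value) >= 1000 and index < len(prefixes) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {prefixes[index]}{unit}"


def scale_seconds(value):
    """Scale a time in seconds down by factors of 1000 (m, u, n)."""
    prefixes = ["", "m", "u", "n"]
    index = 0
    while value != 0 and abs(value) < 1 and index < len(prefixes) - 1:
        value *= 1000
        index += 1
    return f"{value:g} {prefixes[index]}second"


print(scale_bytes(128))             # small values stay unscaled
print(scale_bytes(2_500_000))       # scaled twice by 1000
print(scale_bytes(1500, "byte/s"))  # works for rates as well
print(scale_seconds(0.0000125))     # scaled to microseconds
```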

Metrics which could not be collected are shown as n/a and assigned a warning icon. If a metric's floating-point value is out of the regular range (i.e. nan (Not a number) or inf (infinite)), it is also assigned a warning icon. The exceptions are metrics for which these values are expected and which are white-listed internally.
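
When post-processing metric values yourself, a similar check is easy to write. The sketch below assumes that an uncollected metric is represented as `None`; it does not model the internal list of metrics for which nan or inf is expected, and the `needs_warning` helper is hypothetical.

```python
import math


def needs_warning(value):
    """Return True for values that the rules above would flag."""
    if value is None:  # assumed stand-in for a metric shown as n/a
        return True
    return math.isnan(value) or math.isinf(value)


flags = [needs_warning(v) for v in (None, float("nan"), float("inf"), 42.0)]
print(flags)
```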

7. Baselines

NVIDIA Nsight Compute supports diffing collected results across one or multiple reports using Baselines. Each result in any report can be promoted to a baseline. This causes metric values from all results in all reports to show the difference to the baseline. If multiple baselines are selected simultaneously, metric values are compared to the average across all current baselines. Note that currently, baselines are not stored with a report and are only available as long as the same NVIDIA Nsight Compute instance is open.

Profiler report with one baseline



Select Add Baseline to promote the current result in focus to become a baseline. If a baseline is set, most metrics on the Details Page, Raw Page and Summary Page show two values: the current value of the result in focus, and the corresponding value of the baseline or the percentage of change from the corresponding baseline value. (Note that an infinite percentage gain, inf%, may be displayed when the baseline value for the metric is zero, while the focus value is not.)
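
The percentage of change, including the inf% case, can be sketched as follows. How the tool treats a zero baseline with a zero focus value, or negative baseline values, is not stated here, so those branches and the `percent_change` helper are assumptions.

```python
import math


def percent_change(focus, baseline):
    """Percentage change of the focus value relative to the baseline value."""
    if baseline == 0:
        if focus == 0:
            return 0.0  # assumption: no change when both values are zero
        # Non-zero focus value against a zero baseline: infinite change.
        return math.copysign(math.inf, focus)
    return (focus - baseline) / abs(baseline) * 100.0


print(percent_change(150, 100))  # 50% above the baseline
print(percent_change(80, 100))   # 20% below the baseline
print(percent_change(5, 0))      # displayed as inf%
```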

If multiple baselines are selected, each metric will show the following notation:

<focus value> (<difference to baselines average [%]>, z=<standard score>@<number of values>)

The standard score is the difference between the current value and the average across all baselines, normalized by the standard deviation. If the number of metric values contributing to the standard score equals the number of results (current and all baselines), the @<number of values> notation is omitted.

Profiler report with multiple baselines
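
The multi-baseline comparison can be approximated with Python's statistics module. The text does not say whether a population or sample standard deviation is used, or whether the focus value is included in it; this sketch assumes a population standard deviation over the baseline values only and a non-zero baseline average. The helper name and the numbers are hypothetical.

```python
import statistics


def compare_to_baselines(focus, baselines):
    """Return (difference to baseline average in %, standard score)."""
    mean = statistics.fmean(baselines)
    stdev = statistics.pstdev(baselines)  # assumption: population std. dev.
    diff_percent = (focus - mean) / mean * 100.0  # assumes mean != 0
    z = (focus - mean) / stdev if stdev else float("nan")
    return diff_percent, z


diff_percent, z = compare_to_baselines(120.0, [100.0, 110.0, 90.0])
print(f"120 ({diff_percent:+.2f}%, z={z:.2f})")
```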



Hovering the mouse over a baseline name allows the user to edit the displayed name. Hovering over the baseline color icon allows the user to remove this specific baseline from the list.

Use the Clear Baselines entry from the dropdown button, the Profile menu, or the corresponding toolbar button to remove all baselines.

8. Options

NVIDIA Nsight Compute options can be accessed via the main menu under Tools > Options. All options are persisted on disk and available the next time NVIDIA Nsight Compute is launched. When an option is changed from its default setting, its label will become bold. You can use the Restore Defaults button to restore all options to their default values.

Profile options



8.1. Profile

Table 1. NVIDIA Nsight Compute Profile Options
Name | Description | Values
Sections Directory | Directory from which to import section files and rules. Relative paths are with respect to the NVIDIA Nsight Compute installation directory. |
Include Sub-Directories | Recursively include section files and rules from sub-directories. | Yes (Default)/No
Apply Applicable Rules Automatically | Automatically apply active and applicable rules. | Yes (Default)/No
Reload Rules Before Applying | Force a rule reload before applying the rule to ensure changes in the rule script are recognized. | Yes/No (Default)
Auto-Convert Metric Units | Auto-adjust displayed metric units and values (e.g. Bytes to KBytes). | Yes (Default)/No
Default Report Page | The report page to show when a report is generated or opened. |
  • Session
  • Summary
  • Details (Default)
  • Source
  • Comments
  • Raw
  • Nvtx
Delay Load Source Page | Delays loading the content of the Source report page until the page becomes visible. Avoids processing costs and memory overhead until the page is opened. | Yes/No (Default)
Function Name Mode | Determines how function/kernel names are shown. |
  • Auto (Default): each component uses its preferred mode
  • Demangled: kernel names are shown demangled with all parameters
  • Function: kernel names are shown with their demangled function name without parameters
  • Mangled: kernel names are shown with their mangled name, if applicable
Maximum Baseline Name Length | The maximum length of baseline names. | 1..N
Number of Full Baselines to Display | Number of baselines to display in the report header with all details in addition to the current result. | 0..N
Show Instanced Metric Values | Show the individual values of instanced metrics in tables. | Yes/No (Default)
Show Metrics As Floating Point | Show all numeric metrics as floating-point numbers. | Yes/No (Default)
Show Single File For Multi-File Sources | Shows a single file in each Source page view, even for multi-file sources. | Yes/No (Default)
Show Only Executed Functions | Shows only executed functions in the source page views. Disabling this can impact performance. | Yes (Default)/No
Auto-Resolve Remote Source Files | Automatically try to resolve remote source files on the source page (e.g. via SSH) if the connection is still registered. | Yes/No (Default)

8.2. Environment

Table 2. NVIDIA Nsight Compute Environment Options
Name | Description | Values
Color Theme | The currently selected UI color theme. |
  • Dark (Default)
  • Light
Mixed DPI Scaling | Disable Mixed DPI Scaling if unwanted artifacts are detected when using monitors with different DPIs. |
  • Auto (Default)
  • Off
Default Document Folder | Directory where documents unassociated with a project will be saved. |
At Startup | What to do when NVIDIA Nsight Compute is launched. |
  • Show welcome page (Default)
  • Show quick launch dialog
  • Load last project
  • Show empty environment
Show version update notifications | Show notifications when a new version of this product is available. |
  • Yes (Default)
  • No

8.3. Connection

Connection properties are grouped into Target Connection Properties and Host Connection Properties.

Target Connection Properties

The Target Connection Properties determine how the host connects to the target application during an Interactive Profile Activity. This connection is used to transfer profile information to the host during the profile session.

Table 3. NVIDIA Nsight Compute Target Connection Properties
Name | Description | Values
Base Port | Base port used to establish a connection from the host to the target application during an Interactive Profile activity (both local and remote). | 1-65535 (Default: 49152)
Maximum Ports | Maximum number of ports to try (starting from Base Port) when attempting to connect to the target application. | 2-65534 (Default: 64)
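
To see which ports a given combination of Base Port and Maximum Ports covers, consider the sketch below. It assumes that ports are tried consecutively, starting at the base port and never exceeding 65535; the `candidate_ports` helper is hypothetical and only illustrates the arithmetic.

```python
def candidate_ports(base_port=49152, maximum_ports=64):
    """List the ports that would be tried, assuming consecutive ports."""
    last_valid = 65535  # highest valid TCP port number
    return [
        port
        for port in range(base_port, base_port + maximum_ports)
        if port <= last_valid
    ]


ports = candidate_ports()  # target connection defaults from the table
print(ports[0], ports[-1], len(ports))
```

With the default target settings this covers ports 49152 through 49215, which is the range a firewall rule would need to allow under the stated assumption.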

Host Connection Properties

The Host Connection Properties determine how the command line profiler will connect to the host application during a Profile Activity. This connection is used to transfer profile information to the host during the profile session.

Table 4. NVIDIA Nsight Compute Host Connection Options
Name | Description | Values
Base Port | Base port used to establish a connection from the command line profiler to the host application during a Profile activity (both local and remote). | 1-65535 (Default: 50152)
Maximum Ports | Maximum number of ports to try (starting from Base Port) when attempting to connect to the host application. | 1-100 (Default: 10)

8.4. Source Lookup

Table 5. NVIDIA Nsight Compute Source Lookup Options
Name | Description | Values
Program Source Locations | Set program source search paths. These paths are used to resolve CUDA-C source files on the Source page if the respective file cannot be found in its original location. Files which cannot be found are marked with a File Not Found error. See the Ignore File Properties option for files that are found but don't match. |
Ignore File Properties | Ignore file properties (e.g. timestamp, size) for source resolution. If this is disabled, all file properties like modification timestamp and file size are checked against the information stored by the compiler in the application during compilation. If a file with the same name exists on a source lookup path, but not all properties match, it won't be used for resolution (and a File Mismatch error will be shown). | Yes/No (Default)
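
The interplay of the two options can be sketched as a small lookup routine. This is not the tool's implementation: the `resolve_source` helper, the choice of size and modification time as the only checked properties, and the search order are assumptions made for illustration.

```python
import os
import tempfile


def resolve_source(original_path, lookup_dirs, expected_size, expected_mtime,
                   ignore_file_properties=False):
    """Return (path, status); status is OK, File Mismatch or File Not Found."""
    name = os.path.basename(original_path)
    # Try the original location first, then each source lookup directory.
    candidates = [original_path] + [os.path.join(d, name) for d in lookup_dirs]
    found_any = False
    for path in candidates:
        if not os.path.isfile(path):
            continue
        found_any = True
        info = os.stat(path)
        matches = (info.st_size == expected_size
                   and int(info.st_mtime) == int(expected_mtime))
        if ignore_file_properties or matches:
            return path, "OK"
    return None, ("File Mismatch" if found_any else "File Not Found")


# Demo with a temporary file standing in for a relocated source file.
with tempfile.TemporaryDirectory() as lookup_dir:
    path = os.path.join(lookup_dir, "kernel.cu")
    with open(path, "w") as handle:
        handle.write("// demo\n")
    info = os.stat(path)
    original = os.path.join(lookup_dir, "missing", "kernel.cu")
    ok = resolve_source(original, [lookup_dir], info.st_size, info.st_mtime)
    mismatch = resolve_source(original, [lookup_dir], info.st_size + 1, info.st_mtime)
    ignored = resolve_source(original, [lookup_dir], info.st_size + 1, info.st_mtime,
                             ignore_file_properties=True)
    not_found = resolve_source(original, [], info.st_size, info.st_mtime)

print(ok[1], mismatch[1], ignored[1], not_found[1])
```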

8.5. Send Feedback

Table 6. NVIDIA Nsight Compute Send Feedback Options
Name | Description | Values
Collect Usage and Platform Data | Choose whether or not you wish to allow NVIDIA Nsight Compute to collect usage and platform data. |
  • Yes (Default)
  • No

9. Projects

NVIDIA Nsight Compute uses Project Files to group and organize profiling reports. At any given time, only one project can be open in NVIDIA Nsight Compute. Collected reports are automatically assigned to the current project. Reports stored on disk can be assigned to a project at any time. In addition to profiling reports, related files such as notes or source code can be associated with the project for future reference.

Note that only references to reports or other files are saved in the project file. Those references can become invalid, for example when associated files are deleted, moved or not available on the current system, or when the project file itself was moved.

NVIDIA Nsight Compute uses the ncu-proj file extension for project files.

When no custom project is open, a default project is used to store e.g. the current Connection Dialog entries. To remove all information from the default project, you must close NVIDIA Nsight Compute and then delete the file from disk.
  • On Windows, the file is located at <USER>\AppData\Local\NVIDIA Corporation\NVIDIA Nsight Compute\
  • On Linux, the file is located at <USER>/.local/share/NVIDIA Corporation/NVIDIA Nsight Compute/
  • On MacOSX, the file is located at <USER>/Library/Application Support/NVIDIA Corporation/NVIDIA Nsight Compute/
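
The per-platform locations listed above can be assembled programmatically, for example in a cleanup script. The sketch below assumes that `<USER>` stands for the user's home or profile directory. It only builds the directory, because the name of the default project file is not given here. The `default_project_dir` helper and the example user directories are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

VENDOR_DIRS = ("NVIDIA Corporation", "NVIDIA Nsight Compute")


def default_project_dir(platform, user_dir):
    """Build the directory that holds the default project file."""
    if platform == "windows":
        return PureWindowsPath(user_dir, "AppData", "Local", *VENDOR_DIRS)
    if platform == "linux":
        return PurePosixPath(user_dir, ".local", "share", *VENDOR_DIRS)
    if platform == "macos":
        return PurePosixPath(user_dir, "Library", "Application Support", *VENDOR_DIRS)
    raise ValueError(f"unsupported platform: {platform}")


linux_dir = default_project_dir("linux", "/home/alice")
windows_dir = default_project_dir("windows", "C:\\Users\\alice")
print(linux_dir)
print(windows_dir)
```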

9.1. Project Dialogs

  • New Project

    Creates a new project. The project must be given a name, which will also be used for the project file. You can select the location where the project file should be saved on disk. Select whether a new directory with the project name should be created in that location.

9.2. Project Explorer

The Project Explorer window allows you to inspect and manage the current project. It shows the project name as well as all Items (profile reports and other files) associated with it. Right-click on any entry to see further actions, such as adding, removing or grouping items. Type in the Search project toolbar at the top to filter the currently shown entries.

Project Explorer



10. Visual Profiler Transition Guide

This guide provides tips for moving from Visual Profiler to NVIDIA Nsight Compute. NVIDIA Nsight Compute tries to provide as much parity as possible with Visual Profiler's kernel profiling features, but some functionality is now covered by different tools.

10.1. Trace

NVIDIA Nsight Compute does not support tracing GPU or API activities on an accurate timeline. This functionality is covered by NVIDIA Nsight Systems. In the Interactive Profile Activity, the API Stream tool window provides a stream of recent API calls on each thread. However, since all tracked API calls are serialized by default, it does not collect accurate timestamps.

10.2. Sessions

Instead of sessions, NVIDIA Nsight Compute uses Projects to launch and gather connection details and collected reports.

  • Executable and Import Sessions

    Use the Project Explorer or the Main Menu to create a new project. Reports collected from the command line, i.e. using NVIDIA Nsight Compute CLI, can be opened directly using the main menu. In addition, you can use the Project Explorer to associate existing reports as well as any other artifacts such as executables, notes, etc., with the project. Note that those associations are only references; in other words, moving or deleting the project file on disk will not update its artifacts.

    nvprof or command-line profiler output files, as well as Visual Profiler sessions, cannot be imported into NVIDIA Nsight Compute.

10.3. Timeline

Since trace analysis is now covered by Nsight Systems, NVIDIA Nsight Compute does not provide views of the application timeline. The API Stream tool window does show a per-thread stream of the last captured CUDA API calls. However, those are serialized and do not maintain runtime concurrency or provide accurate timing information.

10.4. Analysis

  • Guided Analysis

    All trace-based analysis is now covered by NVIDIA Nsight Systems. This means that NVIDIA Nsight Compute does not include analysis regarding concurrent CUDA streams or (for example) UVM events. For per-kernel analysis, NVIDIA Nsight Compute provides recommendations based on collected performance data on the Details Page. These rules currently require you to collect the required metrics via their sections up front, and do not support partial on-demand profiling.

    To use the rule-based recommendations, enable the respective rules in the Sections/Rules Info. Before profiling, enable Apply Rules in the Profile Options, or click the Apply Rules button in the report afterward.

  • Unguided Analysis

    All trace-based analysis is now covered by Nsight Systems. For per-kernel analysis, Python-based rules provide analysis and recommendations. See Guided Analysis above for more details.

  • PC Sampling View

    Source-correlated PC sampling information can now be viewed in the Source Page. Aggregated warp states are shown on the Details Page in the Warp State Statistics section.

  • Memory Statistics

    Memory Statistics are located on the Details Page. Enable the Memory Workload Analysis sections to collect the respective information.

  • NVLink View

    NVIDIA Nsight Compute does not currently support NVLink metrics or topology information.

  • Source-Disassembly View

    Source correlated with PTX and SASS disassembly is shown on the Source Page. Which information is available depends on your application's compilation/JIT flags.

  • GPU Details View

    NVIDIA Nsight Compute does not automatically collect data for each executed kernel, and it does not collect any data for device-side memory copies. Summary information for all profiled kernel launches is shown on the Summary Page. Comprehensive information on all collected metrics for all profiled kernel launches is shown on the Raw Page.

  • CPU Details View

    CPU callstack sampling is now covered by NVIDIA Nsight Systems.

  • OpenACC Details View

    OpenACC performance analysis is not supported by NVIDIA Nsight Compute. See the NVIDIA Nsight Systems release notes to check its latest support status.

  • OpenMP Details View

    OpenMP performance analysis is not supported by NVIDIA Nsight Compute. See the NVIDIA Nsight Systems release notes to check its latest support status.

  • Properties View

    NVIDIA Nsight Compute does not collect CUDA API and GPU activities and their properties. Performance data for profiled kernel launches is reported (for example) on the Details Page.

  • Console View

    NVIDIA Nsight Compute does not currently collect stdout/stderr application output.

  • Settings View

    Application launch settings are specified in the Connection Dialog. For reports collected from the UI, launch settings can be inspected on the Session Page after profiling.

  • CPU Source View

    Source for CPU-only APIs is not available. Source for profiled GPU kernel launches is shown on the Source Page.

10.5. Command Line Arguments

Please execute ncu-ui with the -h parameter within a shell window to see the currently supported command line arguments for the NVIDIA Nsight Compute UI.

To open a collected profile report with ncu-ui, simply pass the path to the report file as a parameter to the shell command.

11. Visual Studio Integration Guide

This guide provides information on using NVIDIA Nsight Compute within Microsoft Visual Studio, using the NVIDIA Nsight Integration Visual Studio extension, allowing for a seamless development workflow.

11.1. Visual Studio Integration Overview

NVIDIA Nsight Integration is a Visual Studio extension that allows you to access the power of NVIDIA Nsight Compute from within Visual Studio.

When NVIDIA Nsight Compute is installed along with NVIDIA Nsight Integration, NVIDIA Nsight Compute activities will appear under the NVIDIA 'Nsight' menu in the Visual Studio menu bar. These activities launch NVIDIA Nsight Compute with the current project settings and executable.

For more information about using NVIDIA Nsight Compute from within Visual Studio, please visit

12. Library Support

NVIDIA Nsight Compute can be used to profile CUDA applications, as well as applications that use CUDA via NVIDIA or third-party libraries. For most such libraries, the behavior is expected to be identical to applications using CUDA directly. However, for certain libraries, NVIDIA Nsight Compute has certain restrictions, alternate behavior, or requires non-default setup steps prior to profiling.

12.1. OptiX

NVIDIA Nsight Compute supports profiling of OptiX applications, but with certain restrictions.

  • Internal Kernels

    Kernels launched by OptiX that contain no user-defined code are given the generic name NVIDIA internal. These kernels show up on the API Stream in the NVIDIA Nsight Compute UI, and can be profiled in both the UI as well as the NVIDIA Nsight Compute CLI. However, no CUDA-C source, PTX or SASS is available for them.

  • User Kernels

    Kernels launched by OptiX can contain user-defined code. OptiX identifies these kernels in the API Stream with a custom name. This name starts with raygen__ (for "ray generation"). These kernels show up on the API Stream and can be profiled in the UI as well as the NVIDIA Nsight Compute CLI. The Source page displays CUDA-C source, PTX and SASS defined by the user. Certain parts of the kernel, including device functions that contain OptiX-internal code, will not be available in the Source page.

  • SASS

    When SASS information is available in the profile report, certain instructions might not be available in the Source page and shown as N/A.
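
When post-processing a list of kernel names, for example from an exported report, the naming rules above can be used to separate OptiX-internal kernels from user kernels. The `classify_optix_kernel` helper and the example names other than NVIDIA internal are hypothetical.

```python
def classify_optix_kernel(name):
    """Classify a kernel name according to the OptiX naming rules above."""
    if name == "NVIDIA internal":
        return "internal"  # no CUDA-C source, PTX or SASS available
    if name.startswith("raygen__"):
        return "user"      # contains user-defined code
    return "other"         # not identifiable as an OptiX kernel by name


names = ["NVIDIA internal", "raygen__render", "vecAdd"]
kinds = [classify_optix_kernel(n) for n in names]
print(kinds)
```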

Notices

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.