Updates in 2022.1

General

  • Added support for the CUDA toolkit 11.6.

  • Added support for GA103 chips.

  • Added a new Range Replay mode to profile ranges of multiple, concurrent kernels. Range replay is available in the NVIDIA Nsight Compute CLI and the non-interactive Profile activity.

  • Added a new rule to detect non-fused floating-point instructions.

  • The Uncoalesced Memory access rules now show results in a dynamic table.

  • Unix Domain Sockets and Windows Named Pipes are used for local connections between the host and target processes on x86_64 Linux and Windows, respectively.

  • The NvRules API now supports querying action names using different function name bases (e.g. demangled).
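
As background for the non-fused floating-point rule listed above, the following is a minimal sketch of what "fused" means for a multiply-add: a fused operation rounds once, while a separate multiply followed by an add rounds twice. It does not use Nsight Compute or the GPU. The function names are hypothetical, Python doubles stand in for whichever precision the rule inspects, and the fused result is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    # Two roundings: the product is rounded to a float, then the sum is rounded.
    return a * b + c


def mul_add_fused(a, b, c):
    # One rounding: evaluate a*b + c exactly with rationals, then round once.
    # This emulates fused multiply-add semantics; it is not a hardware FMA.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so that the exact product (1 - 2**-60) is not representable
# as a double and rounds to 1.0 when the multiply is performed separately.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("non-fused:", mul_add_unfused(a, b, c))
print("fused:    ", mul_add_fused(a, b, c))
```

Because the two forms can differ in their last bits, fusing is not always applied automatically, which is one reason non-fused instructions may remain in a kernel.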

NVIDIA Nsight Compute

  • The default report page is now chosen automatically when opening a report.

  • Added coverage for ECC (Error Correction Code) operations in the L2 Cache table of the Memory Analysis section.

  • Added a new L2 Evict Policies table to the Memory Analysis section.

  • The Occupancy Calculator now updates automatically when the input changes.

  • Added a new metric, Thread Instructions Executed, to the Source page.

  • Added tooltips to the Register Dependency columns in the Source page to identify the associated register more conveniently.

  • Improved the selection of Sections and Sets in the Profile activity connection dialog.

  • NVLink utilization is shown in the NVLink Tables section.

  • NVLink links are colored according to the measured throughput.

NVIDIA Nsight Compute CLI

  • The --kernel-regex and --kernel-regex-base options are no longer supported. Use --kernel-name and --kernel-name-base, respectively, which were added in 2021.1.0.

  • Added support for resolving CUDA source files in the --page source output with the new --resolve-source-file command line option.

  • Added a new option, --target-processes-filter, to filter the processes being profiled by name.

  • The CPU Stack Trace is shown in the NVIDIA Nsight Compute CLI output.
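
Scripts that still pass the removed --kernel-regex options need updating. The sketch below shows one way to rewrite an argument list. The helper name migrate_ncu_args is hypothetical, and the "regex:" value prefix is an assumption about how --kernel-name accepts regular expressions that should be checked against the CLI documentation for your version. Both the "--opt value" and "--opt=value" spellings are handled in case a script uses either.

```python
def migrate_ncu_args(args):
    """Return a copy of args with the removed options replaced.

    Hypothetical helper: rewrites --kernel-regex / --kernel-regex-base
    to --kernel-name / --kernel-name-base.
    """
    migrated = []
    prefix_next = False  # True when the next token is a --kernel-regex value
    for arg in args:
        if prefix_next:
            # Assumption: --kernel-name takes regexes as "regex:<expression>".
            migrated.append("regex:" + arg)
            prefix_next = False
            continue
        name, sep, value = arg.partition("=")
        if name == "--kernel-regex":
            if sep:
                migrated.append("--kernel-name=regex:" + value)
            else:
                migrated.append("--kernel-name")
                prefix_next = True
        elif name == "--kernel-regex-base":
            # The base value itself is passed through unchanged.
            migrated.append("--kernel-name-base" + sep + value)
        else:
            migrated.append(arg)
    return migrated


old = ["ncu", "--kernel-regex", "gemm.*", "--kernel-regex-base", "demangled", "./app"]
print(migrate_ncu_args(old))
```

The application name ./app and the pattern gemm.* are placeholders. The input list is left unmodified so the original command can still be logged alongside the migrated one.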

Resolved Issues

  • Fixed the calculation of aggregated average instruction execution metrics in non-SASS views on the Source page.

  • Fixed that atomic instructions are counted as both loads and stores in the Memory Analysis tables.