Updates in 2022.1
General
Added support for the CUDA toolkit 11.6.
Added support for GA103 chips.
Added a new Range Replay mode to profile ranges of multiple, concurrent kernels. Range replay is available in the NVIDIA Nsight Compute CLI and the non-interactive Profile activity.
Added a new rule to detect non-fused floating-point instructions.
The Uncoalesced Memory access rules now show results in a dynamic table.
Unix Domain Sockets and Windows Named Pipes are used for local connection between the host and target processes on x86_64 Linux and Windows, respectively.
The NvRules API now supports querying action names using different function name bases (e.g. demangled).
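As a rough illustration of the Range Replay mode listed above, the sketch below assembles an NVIDIA Nsight Compute CLI invocation as an argument list without running it. It assumes the `--replay-mode range` and `--export` options as described in the CLI documentation, since these notes do not name the flag. The helper name, application path, and report name are hypothetical. The application is also assumed to define its ranges itself (for example with the CUDA profiler start/stop API).

```python
import shlex


def build_range_replay_command(app_path, report_name):
    """Build an ncu argument list that requests range replay.

    Assumption: "--replay-mode range" selects Range Replay and
    "--export" names the output report, per the CLI documentation.
    The function itself is a hypothetical helper, not part of any NVIDIA API.
    """
    return [
        "ncu",
        "--replay-mode", "range",   # profile whole ranges instead of single kernels
        "--export", report_name,    # write results to a report file
        app_path,                   # application to launch (hypothetical path)
    ]


if __name__ == "__main__":
    cmd = build_range_replay_command("./my_app", "range_report")
    # Print a shell-quoted version for copy/paste; nothing is executed here.
    print(shlex.join(cmd))
```

Building the list rather than a single string keeps the arguments safe to pass to `subprocess.run` later without additional quoting.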
NVIDIA Nsight Compute
The default report page is now chosen automatically when opening a report.
Added coverage for ECC (Error Correction Code) operations in the L2 Cache table of the Memory Analysis section.
Added a new L2 Evict Policies table to the Memory Analysis section.
The Occupancy Calculator now updates automatically when the input changes.
Added a new metric, Thread Instructions Executed, to the Source page.
Added tooltips to the Register Dependency columns in the Source page to identify the associated register more conveniently.
Improved the selection of Sections and Sets in the Profile activity connection dialog.
NVLink utilization is shown in the NVLink Tables section.
NVLink links are colored according to the measured throughput.
NVIDIA Nsight Compute CLI
The --kernel-regex and --kernel-regex-base options are no longer supported. Alternate options are --kernel-name and --kernel-name-base respectively, added in 2021.1.0.
Added support to resolve CUDA source files in the --page source output with the new --resolve-source-file command line option.
Added new option --target-processes-filter to filter the processes being profiled by name.
The CPU Stack Trace is shown in the NVIDIA Nsight Compute CLI output.
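Scripts that still pass the removed --kernel-regex options need updating. The following is a minimal sketch of one way to rewrite an existing argument list. It assumes that --kernel-name accepts a regular expression when the value carries a `regex:` prefix, which comes from the CLI documentation rather than from these notes. The function name and the sample arguments are hypothetical.

```python
def migrate_kernel_filter_args(argv):
    """Rewrite removed kernel filter options into their replacements.

    --kernel-regex PATTERN     -> --kernel-name regex:PATTERN  (prefix is an assumption)
    --kernel-regex-base BASE   -> --kernel-name-base BASE
    Both "--opt value" and "--opt=value" spellings are handled.
    """
    renamed = {
        "--kernel-regex": "--kernel-name",
        "--kernel-regex-base": "--kernel-name-base",
    }
    out = []
    i = 0
    while i < len(argv):
        token = argv[i]
        name, sep, value = token.partition("=")
        if name in renamed:
            if not sep:
                # The value is the next token: "--kernel-regex PATTERN".
                i += 1
                if i >= len(argv):
                    raise ValueError(f"{name} is missing its value")
                value = argv[i]
            if name == "--kernel-regex":
                # Assumed syntax for regex matching with --kernel-name.
                value = "regex:" + value
            out.extend([renamed[name], value])
        else:
            # Unrelated arguments pass through unchanged.
            out.append(token)
        i += 1
    return out


if __name__ == "__main__":
    old = ["ncu", "--kernel-regex", "gemm.*", "--kernel-regex-base", "demangled", "./my_app"]
    print(migrate_kernel_filter_args(old))
```

The rewrite only touches the two removed options, so other arguments, including the application's own, are left exactly as given.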
Resolved Issues
Fixed the calculation of aggregated average instruction execution metrics in non-SASS views on the Source page.
Fixed an issue where atomic instructions were counted as both loads and stores in the Memory Analysis tables.