Updates in 2020.1

General

Added support for the NVIDIA GA100/SM 8.x GPU architecture
Removed support for the Pascal SM 6.x GPU architecture
Windows 7 is no longer a supported host or target platform
Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section
Added support for report name placeholders %p, %q, %i and %h
The Kernel Profiling Guide was added to the documentation
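
The report name placeholders listed above can be illustrated with a small Python sketch. It is not the tool's implementation. It assumes the meanings given in the CLI documentation: %h is the hostname, %p the process ID, %q{VAR} the value of environment variable VAR, and %i the lowest unused positive integer that makes the file name unique. The function name expand_report_name and its exists parameter are hypothetical, and treating an unset variable as an empty string is an assumption.

```python
import os
import re
import socket


def expand_report_name(template, exists=os.path.exists):
    """Expand %h, %p, %q{VAR} and %i in a report name template (illustrative only)."""

    def substitute(match):
        token = match.group(0)
        if token == "%h":
            return socket.gethostname()
        if token == "%p":
            return str(os.getpid())
        # %q{VAR}: value of the environment variable VAR
        # (empty string when unset, which is an assumption of this sketch)
        return os.environ.get(match.group(1), "")

    name = re.sub(r"%h|%p|%q\{(\w+)\}", substitute, template)

    if "%i" in name:
        # %i: lowest positive integer for which the resulting name is not in use yet
        i = 1
        while exists(name.replace("%i", str(i))):
            i += 1
        name = name.replace("%i", str(i))
    return name
```

A template such as report_%q{OMPI_COMM_WORLD_RANK}_%i would then yield one distinct report name per MPI rank, with the trailing counter increasing on repeated runs.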

NVIDIA Nsight Compute

The UI command was renamed from nv-nsight-cu to ncu-ui. The old name remains available for backwards compatibility.
Added support for roofline analysis charts
Added linked hot spot tables in section bodies to indicate performance problems in the source code
Added section navigation links in rule results to quickly jump to the referenced section
Added a new option to select how kernel names are shown in the UI
Added new memory tables for the L1/TEX cache and the L2 cache. The old tables remain available for backwards compatibility and have been moved to a new section containing deprecated UI elements.
Memory tables now show the metric name as a tooltip
Source resolution now takes into account file properties when selecting a file from disk
Results in the profile report can now be filtered by NVTX range
The Source page now supports collapsing views even for single files
The UI shows profiler error messages as dismissible banners for increased visibility
Improved the baseline name control in the profiler report header
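
As background for the roofline analysis charts mentioned above, the following Python sketch shows the standard roofline bound: attainable performance is the smaller of peak compute throughput and arithmetic intensity multiplied by peak memory bandwidth. It illustrates the general model only, not how the tool computes its charts. The function names and the peak values in the usage note are hypothetical.

```python
def attainable_performance(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Roofline bound in FLOP/s.

    arithmetic_intensity: FLOP per byte moved to or from memory
    peak_flops:           peak compute throughput in FLOP/s
    peak_bandwidth:       peak memory bandwidth in byte/s
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity at which a kernel stops being memory bound."""
    return peak_flops / peak_bandwidth
```

With hypothetical peaks of 10e12 FLOP/s and 1e12 byte/s, the ridge point lies at 10 FLOP/byte. A kernel at 2 FLOP/byte sits under the memory-bound slope, while one at 50 FLOP/byte is capped by the compute ceiling.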

NVIDIA Nsight Compute CLI

The CLI command was renamed from nv-nsight-cu-cli to ncu. The old name remains available for backwards compatibility.
Queried metrics on GV100 and newer chips are sorted alphabetically
Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
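
The idea of serializing work across processes with a system-wide file lock can be sketched in Python as follows. This is a minimal illustration of the technique on POSIX systems using fcntl.flock, not the tool's actual locking code. The lock file path and the function names are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file location; any path shared by all processes would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example-profiler.lock")


@contextmanager
def system_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as lock_file:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def profile_kernel(name):
    """Stand-in for the per-kernel work that must not overlap across processes."""
    with system_wide_lock():
        return f"profiled {name}"
```

Each process that calls profile_kernel waits for the lock, so the guarded sections run one at a time even when many ranks are started concurrently.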

Resolved Issues

More C++ kernel names can be properly demangled
Fixed a free(): invalid pointer error when profiling applications using PyTorch > 19.07
Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode
Reduced surplus DRAM write traffic created from flushing caches during kernel replay
The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs
Profile reports now scroll properly on macOS when using a trackpad
Relative output filenames for the Profile activity are now resolved against the document directory instead of the current working directory
Fixed path expansion of ~ on Windows
Memory access information is now shown properly for RED assembly instructions on the Source page
Fixed an issue where user PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues