Updates in 2020.1

General

  • Added support for the NVIDIA GA100/SM 8.x GPU architecture

  • Removed support for the Pascal SM 6.x GPU architecture

  • Windows 7 is no longer supported as a host or target platform

  • Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section

  • Added support for report name placeholders %p, %q, %i and %h

  • The Kernel Profiling Guide was added to the documentation
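
The report name placeholders listed above can be pictured with a small stand-alone sketch. It is not the tool's implementation. It assumes that %h expands to the host name, %p to the process ID, %q{VAR} to the value of the environment variable VAR, and %i to the lowest positive integer that yields an unused file name; the CLI documentation remains the authoritative source for the exact semantics. The function name expand_report_name and its exists parameter are hypothetical and used for illustration only.

```python
import os
import re
import socket


def expand_report_name(template, exists=os.path.exists):
    """Illustrative expansion of %h, %p, %q{VAR} and %i (assumed semantics)."""

    def _expand(match):
        token = match.group(0)
        if token == "%h":
            return socket.gethostname()   # assumed: host name
        if token == "%p":
            return str(os.getpid())       # assumed: process ID
        # assumed: %q{VAR} is the value of environment variable VAR
        return os.environ.get(match.group(1), "")

    name = re.sub(r"%h|%p|%q\{(\w+)\}", _expand, template)

    if "%i" in name:
        # assumed: lowest positive integer that gives an unused name
        counter = 1
        while exists(name.replace("%i", str(counter))):
            counter += 1
        name = name.replace("%i", str(counter))
    return name


if __name__ == "__main__":
    print(expand_report_name("report_%h_%p_%i"))
```

Under these assumptions, a template such as report_%h_%p yields a distinct name per host and process, which keeps concurrently written reports from overwriting each other.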

NVIDIA Nsight Compute

  • The UI command was renamed from nv-nsight-cu to ncu-ui. The old name remains available for backwards compatibility.

  • Added support for roofline analysis charts

  • Added linked hot spot tables in section bodies to indicate performance problems in the source code

  • Added section navigation links in rule results to quickly jump to the referenced section

  • Added a new option to select how kernel names are shown in the UI

  • Added new memory tables for the L1/TEX cache and the L2 cache. The old tables remain available for backwards compatibility and have been moved to a new section containing deprecated UI elements.

  • Memory tables now show the metric name as a tooltip

  • Source resolution now takes into account file properties when selecting a file from disk

  • Results in the profile report can now be filtered by NVTX range

  • The Source page now supports collapsing views even for single files

  • The UI shows profiler error messages as dismissible banners for increased visibility

  • Improved the baseline name control in the profiler report header

NVIDIA Nsight Compute CLI

  • The CLI command was renamed from nv-nsight-cu-cli to ncu. The old name remains available for backwards compatibility.

  • Queried metrics on GV100 and newer chips are sorted alphabetically

  • Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
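
The serialization described in the last item can be sketched as follows. This is a minimal illustration of the general technique (an exclusive, advisory lock on a shared file) and not the profiler's actual code. It assumes a POSIX system where fcntl.flock is available; the lock file path and the names system_wide_lock and profile_kernel are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; the real tool's lock file is not documented here.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "profiler-sketch.lock")


@contextmanager
def system_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on a file shared by all processes."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)      # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)  # let the next process proceed


def profile_kernel(name, log_path):
    """Stand-in for profiling one kernel; only one process runs this at a time."""
    with system_wide_lock():
        with open(log_path, "a") as log:
            log.write(f"begin {name} pid={os.getpid()}\n")
            log.write(f"end {name} pid={os.getpid()}\n")


if __name__ == "__main__":
    profile_kernel("example_kernel", os.path.join(tempfile.gettempdir(), "sketch.log"))
```

If several processes (for example, one per MPI rank) call profile_kernel concurrently, each begin/end pair is written while the lock is held, so the pairs from different processes do not interleave.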

Resolved Issues

  • More C++ kernel names can be properly demangled

  • Fixed a free(): invalid pointer error when profiling applications using PyTorch > 19.07

  • Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")

  • Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode

  • Reduced surplus DRAM write traffic created from flushing caches during kernel replay

  • The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs

  • Profile reports now scroll properly on macOS when using a trackpad

  • Relative output filenames for the Profile activity are now resolved against the document directory instead of the current working directory

  • Fixed path expansion of ~ on Windows

  • Memory access information is now shown properly for RED assembly instructions on the Source page

  • Fixed an issue where user-set PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues