Updates in 2020.1
General
Added support for the NVIDIA GA100/SM 8.x GPU architecture
Removed support for the Pascal SM 6.x GPU architecture
Windows 7 is no longer a supported host or target platform
Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section
Added support for report name placeholders %p, %q, %i and %h
The Kernel Profiling Guide was added to the documentation
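To illustrate the report name placeholders listed above, the following Python sketch expands a report name template. It assumes that %p is the process ID, %h the hostname, %q{VAR} the value of an environment variable, and %i the lowest positive integer that yields an unused file name. These meanings and the helper name expand_report_name are assumptions made for illustration, not the tool's implementation.

```python
import os
import re
import socket


def expand_report_name(template, pid=None, hostname=None, env=None,
                       exists=os.path.exists):
    """Expand %p, %h, %q{VAR} and %i in a report name (illustrative only)."""
    pid = os.getpid() if pid is None else pid
    hostname = socket.gethostname() if hostname is None else hostname
    env = os.environ if env is None else env

    def substitute(match):
        token = match.group(0)
        if token == "%p":
            return str(pid)
        if token == "%h":
            return hostname
        if token == "%i":
            return token  # resolved below, once the rest of the name is known
        # Remaining case is %q{VAR}; assumed to expand to "" when VAR is unset.
        return env.get(match.group(1), "")

    partial = re.sub(r"%p|%h|%i|%q\{(\w+)\}", substitute, template)
    if "%i" not in partial:
        return partial

    # Assumed %i semantics: pick the lowest positive integer whose
    # resulting name does not exist yet.
    n = 1
    while exists(partial.replace("%i", str(n))):
        n += 1
    return partial.replace("%i", str(n))
```

For example, a template such as report_%h_%q{RANK} gives each MPI rank its own report name, provided the launcher exports a RANK-like variable (the variable name here is hypothetical).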
NVIDIA Nsight Compute
The UI command was renamed from nv-nsight-cu to ncu-ui. Old names remain for backwards compatibility.
Added support for roofline analysis charts
Added linked hot spot tables in section bodies to indicate performance problems in the source code
Added section navigation links in rule results to quickly jump to the referenced section
Added a new option to select how kernel names are shown in the UI
Added new memory tables for the L1/TEX cache and the L2 cache. The old tables are still available for backwards compatibility and moved to a new section containing deprecated UI elements.
Memory tables now show the metric name as a tooltip
Source resolution now takes into account file properties when selecting a file from disk
Results in the profile report can now be filtered by NVTX range
The Source page now supports collapsing views even for single files
The UI shows profiler error messages as dismissible banners for increased visibility
Improved the baseline name control in the profiler report header
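As background for the roofline analysis charts mentioned above, the following Python sketch shows the standard roofline bound: attainable throughput is the smaller of the compute peak and the product of arithmetic intensity and memory bandwidth. The function names and any numbers passed to them are hypothetical; this is the textbook model, not a description of how the tool builds its charts.

```python
def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline bound in FLOP/s for a kernel with the given FLOP/byte intensity."""
    return min(peak_flops, intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_flops / peak_bandwidth


def classify(intensity, peak_flops, peak_bandwidth):
    """Label a kernel by which roof limits it under this simple model."""
    if intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory-bound"
    return "compute-bound"
```

A kernel whose measured intensity lies left of the ridge point is limited by memory bandwidth in this model, so raising its intensity (for example through data reuse) moves it toward the compute roof.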
NVIDIA Nsight Compute CLI
The CLI command was renamed from nv-nsight-cu-cli to ncu. Old names remain for backwards compatibility.
Queried metrics on GV100 and newer chips are sorted alphabetically
Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock.
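The serialization described above can be pictured with a generic advisory file lock. The Python sketch below is POSIX-only and uses fcntl.flock; the lock file path and the helper names are hypothetical, and the actual lock location and mechanism used by NVIDIA Nsight Compute CLI are not specified here.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file; the real tool's lock path is not documented here.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example-profiler.lock")


@contextmanager
def system_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def profile_kernel(name, log, path=LOCK_PATH):
    """Stand-in for profiling one kernel; only one lock holder runs at a time."""
    with system_wide_lock(path):
        log.append(name)
```

Because every process opens the same file, only one of them can be inside the locked block at any moment, so concurrently launched ranks take turns one kernel at a time.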
Resolved Issues
More C++ kernel names can be properly demangled
Fixed a free(): invalid pointer error when profiling applications using pytorch > 19.07
Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
Fixed that the first kernel instruction was missed when computing sass__inst_executed_per_opcode
Reduced surplus DRAM write traffic created from flushing caches during kernel replay
The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs
Profile reports now scroll properly on macOS when using a trackpad
Relative output filenames for the Profile activity now use the document directory, instead of the current working directory
Fixed path expansion of ~ on Windows
Memory access information is now shown properly for RED assembly instructions on the Source page
Fixed an issue where the user's PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, resulting in locale encoding issues