Updates in 2021.3
General
Added support for the CUDA toolkit 11.5.
Added a new rule for detecting inefficient memory access patterns in the L1TEX cache and L2 cache.
Added a new rule for detecting high usage of system or peer memory.
Added new IAction::sass_by_pc function to the NvRules API.
The Python-based report interface is now available for Windows and macOS hosts, too.
Added Hierarchical Roofline section files in a new “roofline” section set.
Added support for collecting CPU call stack information.
NVIDIA Nsight Compute
Added support for new remote profiling SSH connection and authentication options as well as local SSH configuration files.
Added an Occupancy Calculator, which can be opened directly from a profile report or as a new activity. It offers feature parity with the CUDA Occupancy Calculator spreadsheet.
Added new Baselines tool window to manage (hide, update, re-order, save/load) baseline selections.
The Source page views now support multi-line/cell selection and copy/paste. Different colors are used for highlighting selections and correlated lines.
The search edit on the Source page now supports Shift+Enter to search in the reverse direction.
The Memory Workload Analysis Chart can be configured to show throughput values instead of transferred bytes.
The Profile activity now supports the --devices option.
The NVLink Topology diagram displays per-NVLink metrics.
Added a new tool window showing the CPU call stack at the location where the current thread was suspended during interactive profiling activities.
If enabled, the Call Stack / NVTX page of the profile report shows the captured CPU call stack for the selected kernel launch.
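To illustrate the kind of estimate the Occupancy Calculator listed above produces, the following is a simplified sketch of a theoretical occupancy calculation. It is not the tool's implementation: the function name, parameter names, and default per-SM limits are hypothetical and not tied to any particular GPU, and the hardware allocation granularities that the real calculator accounts for are ignored.

```python
import math


def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_warps_per_sm=64, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=49152, warp_size=32):
    """Estimate the fraction of an SM's warp slots a kernel launch can fill.

    All per-SM limits are illustrative assumptions; look up the real values
    for the target device before relying on the result.
    """
    warps_per_block = math.ceil(threads_per_block / warp_size)

    # Each resource caps how many blocks can be resident on one SM.
    limits = [max_blocks_per_sm, max_warps_per_sm // warps_per_block]
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * warp_size
        limits.append(regs_per_sm // regs_per_block)
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)

    # The tightest limit determines the number of resident blocks.
    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    # 256 threads per block, 64 registers per thread, no shared memory.
    print(theoretical_occupancy(256, 64, 0))
```

Under these assumed limits, doubling the register count per thread from 32 to 64 halves the number of resident blocks, which is the kind of trade-off the calculator is meant to make visible.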
NVIDIA Nsight Compute CLI
Added support for printing source/metric content with the new --page source and --print-source command line options.
Added new option --call-stack to enable collecting the CPU call stack for every profiled kernel launch.
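As a rough sketch of how the new command line options might be combined, the helper functions below assemble two ncu invocations: one that profiles an application with CPU call stack collection, and one that prints the source page of the resulting report. The helper names, the application path, and the report name are hypothetical, and the value sass passed to --print-source is an assumption; check ncu --help for the values your version accepts. The commands are only built and printed, not executed.

```python
import shlex


def build_profile_command(app, report_name, call_stack=True):
    """Build an ncu command that profiles `app` and writes a report file."""
    cmd = ["ncu"]
    if call_stack:
        # Collect the CPU call stack for every profiled kernel launch.
        cmd.append("--call-stack")
    cmd += ["-o", report_name, app]
    return cmd


def build_source_print_command(report_file, view="sass"):
    """Build an ncu command that prints the source page of a saved report.

    The default view "sass" is an assumed value for --print-source.
    """
    return ["ncu", "--import", report_file,
            "--page", "source", "--print-source", view]


if __name__ == "__main__":
    # Hypothetical application and report names, shown for illustration only.
    print(shlex.join(build_profile_command("./my_app", "my_report")))
    print(shlex.join(build_source_print_command("my_report.ncu-rep")))
```

Building the argument list separately from running it keeps the example safe to execute on a machine without the tool installed; pass the list to subprocess.run to launch ncu.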
Resolved Issues
Fixed that memory_* metrics could not be collected with the --metrics option.
Fixed that selection and copy/paste were not supported for section header tables on the Details page.
Fixed issues with the Source page when collapsing the content.
Fixed that the UI could crash when applying rules to a new profile result.
Fixed that PC Sampling metrics were not available for Profile Series.
Fixed that local profiling did not work if no non-loopback address was configured for the system.
Fixed termination of remote-launched applications. On QNX, terminating an application profiled via Remote Launch is now supported. Canceling remote-launched Profile activities is now supported.