Updates in 2021.3

General

  • Added support for the CUDA toolkit 11.5.

  • Added a new rule for detecting inefficient memory access patterns in the L1TEX cache and L2 cache.

  • Added a new rule for detecting high usage of system or peer memory.

  • Added new IAction::sass_by_pc function to the NvRules API.

  • The Python-based report interface is now also available for Windows and macOS hosts.

  • Added Hierarchical Roofline section files in a new “roofline” section set.

  • Added support for collecting CPU call stack information.
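
As a rough illustration of what a hierarchical roofline expresses, the sketch below applies the standard roofline bound, min(peak compute, arithmetic intensity x bandwidth), once per memory level. The function name, the level names, and all numbers are hypothetical and chosen only for illustration; this is not how the section files themselves are defined.

```python
def roofline_bounds(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Return the attainable GFLOP/s per memory level under the roofline model.

    bandwidth_gbs:            dict of level name -> bandwidth in GB/s
    intensity_flop_per_byte:  dict of level name -> FLOPs per byte moved at that level
    """
    bounds = {}
    for level, bandwidth in bandwidth_gbs.items():
        memory_bound = intensity_flop_per_byte[level] * bandwidth
        bounds[level] = min(peak_gflops, memory_bound)
    return bounds


# Hypothetical device and kernel numbers (assumptions, not measured values).
peak = 10000.0
bandwidth = {"L1": 8000.0, "L2": 3000.0, "DRAM": 900.0}
intensity = {"L1": 0.5, "L2": 1.0, "DRAM": 4.0}

bounds = roofline_bounds(peak, bandwidth, intensity)
# The level with the lowest bound is the one limiting the kernel in this model.
limiting_level = min(bounds, key=bounds.get)
```

With one roofline per memory level, the lowest bound indicates which level of the hierarchy is most likely to limit the kernel.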

NVIDIA Nsight Compute

  • Added support for new remote profiling SSH connection and authentication options as well as local SSH configuration files.

  • Added an Occupancy Calculator which can be opened directly from a profile report or as a new activity. It offers feature parity with the CUDA Occupancy Calculator spreadsheet.

  • Added new Baselines tool window to manage (hide, update, re-order, save/load) baseline selections.

  • The Source page views now support multi-line/cell selection and copy/paste. Different colors are used for highlighting selections and correlated lines.

  • The search edit on the Source page now supports Shift+Enter to search in the reverse direction.

  • The Memory Workload Analysis Chart can be configured to show throughput values instead of transferred bytes.

  • The Profile activity now supports the --devices option.

  • The NVLink Topology diagram displays per-NVLink metrics.

  • Added a new tool window showing the CPU call stack at the location where the current thread was suspended during interactive profiling activities.

  • If enabled, the Call Stack / NVTX page of the profile report shows the captured CPU call stack for the selected kernel launch.
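
For readers unfamiliar with what an occupancy calculator computes, the following is a deliberately simplified sketch of theoretical occupancy: the number of resident warps per SM, limited by warp slots, registers, and shared memory, divided by the maximum number of warps. The per-SM limits used as defaults are assumptions for illustration only, and the sketch ignores allocation granularity and other details that the actual tool accounts for.

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_warps_per_sm=64, max_blocks_per_sm=32,
                          regs_per_sm=65536, smem_per_sm=65536, warp_size=32):
    """Simplified theoretical occupancy in [0, 1]; hardware limits are assumed values."""
    # Warps needed by one block (rounded up).
    warps_per_block = -(-threads_per_block // warp_size)

    # How many blocks fit on one SM under each resource limit.
    limit_warps = max_warps_per_sm // warps_per_block
    regs_per_block = regs_per_thread * warp_size * warps_per_block
    limit_regs = regs_per_sm // regs_per_block if regs_per_block else max_blocks_per_sm
    limit_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm

    blocks = min(max_blocks_per_sm, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / max_warps_per_sm


# Hypothetical kernel: 256 threads per block, 64 registers per thread, 16 KiB shared memory.
occupancy = theoretical_occupancy(256, 64, 16384)
```

In this hypothetical configuration, registers and shared memory each allow four resident blocks of eight warps, so half of the assumed 64 warp slots are occupied.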

NVIDIA Nsight Compute CLI

  • Added support for printing source/metric content with the new --page source and --print-source command line options.

  • Added new option --call-stack to enable collecting the CPU call stack for every profiled kernel launch.
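
The sketch below shows one way the options above might be combined when scripting the CLI. It only builds argument lists and does not run anything. The helper names, the application path, and the report name are hypothetical, and the "sass" value passed to --print-source is an assumption about the accepted views; consult ncu --help for the values your version supports.

```python
import shlex


def profile_command(app, report_name, call_stack=True):
    """Build an ncu command that profiles `app` and writes a report file."""
    cmd = ["ncu", "-o", report_name]
    if call_stack:
        # Collect the CPU call stack for every profiled kernel launch.
        cmd.append("--call-stack")
    cmd.append(app)
    return cmd


def print_source_command(report_file, view="sass"):
    """Build an ncu command that prints the source page of an existing report."""
    # The default view value is an assumption, not taken from these notes.
    return ["ncu", "--import", report_file,
            "--page", "source", "--print-source", view]


# Hypothetical application and report names.
profile = profile_command("./my_app", "my_report")
show = print_source_command("my_report.ncu-rep")
print(shlex.join(profile))
print(shlex.join(show))
```

Keeping the commands as argument lists makes them easy to pass to subprocess.run without shell quoting issues.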

Resolved Issues

  • Fixed that memory_* metrics could not be collected with the --metrics option.

  • Fixed that selection and copy/paste were not supported for section header tables on the Details page.

  • Fixed issues with the Source page when collapsing the content.

  • Fixed that the UI could crash when applying rules to a new profile result.

  • Fixed that PC Sampling metrics were not available for Profile Series.

  • Fixed that local profiling did not work if no non-loopback address was configured for the system.

  • Fixed termination of remote-launched applications. On QNX, terminating an application profiled via Remote Launch is now supported. Canceling remote-launched Profile activities is now supported.