Updates in 2024.1

General

  • Switched to using OpenSSL version 3.0.10.

  • Added new metrics available when profiling on CUDA Green Contexts.

  • Reduced the number of passes required for collecting PM sampling sections.

  • Counter domains can now be specified for PM sampling metrics in section files.

  • PM sampling metrics can now be queried on the command line and in the Metric Details window by specifying the respective collection option.

  • Added a new optional PmSampling_WarpStates section for understanding warp stall reasons over the workload duration.

  • Added a new rule for detecting load imbalances.

  • Improved the performance of graph-level profiling on new drivers.

  • Updated the metrics compatibility table for OptiX cmdlists and instruction-level SASS metrics.

  • Improved PM sampling results by dynamically recollecting metrics to avoid outlier pass groups. Use the new pm-sampling-max-passes option to control the maximum number of dynamically replayed passes.

  • Added the interKernelCommunication sample CUDA application, which shows how to use NVIDIA Nsight Compute to profile kernels that depend on each other and must be launched concurrently. Refer to the README.TXT file and sample code under extras/samples/interKernelCommunication.

NVIDIA Nsight Compute

  • Added SASS view and Source Markers support in Source Comparison.

  • Improved Source Comparison diff visualization by adding empty lines on the other side of inserted/deleted lines.

  • The Source page column chooser can now be opened directly from the Navigation drop-down.

  • Added support to update a source-page profile.

  • Added a Launch Details tool window for showing information about individual launches within larger workloads like OptiX command lists.

  • Added support for CUDA Green Contexts in the Resources tool window, the Launch Statistics section and the report header.

  • Property metrics can now be queried in the Metric Details window.

  • The Resources tool window now shows whether a CUDA Graph kernel node is device-side updatable.

NVIDIA Nsight Compute CLI

  • Improved the documentation on NVTX expressions, as well as the command line output shown when a potentially incorrect expression leads to no workloads being profiled.

  • Improved checking for invalid expressions when using the --target-processes-filter option.

Resolved Issues

  • Fixed an issue where the L1 cache achieved roofline value was missing when profiling on GH100.

  • Fixed several “Launch Failed” errors when collecting instruction-level SASS metrics.

  • Fixed an issue where Live Register values were too high for some workloads.

  • Fixed a scrolling issue on the Source page when collapsing a multi-file view.

  • Fixed an issue where no PM sampling data was shown in the timeline when context switch trace was not available.

  • Fixed a display issue in the memory chart when adding baselines.

  • Fixed a crash when adding baselines.

  • Fixed a crash in timeline views when not all configured data was available.

  • Fixed an issue where the application history was not always deleted when selecting Reset Application Data.

  • Fixed an error in the metric compatibility documentation.