Updates in 2025.1

General

  • All roofline sections are now included in the full section set.

  • Range replay and app-range replay now support the collection of instruction-level source metrics.

  • Rules are now supported for range replays.

  • Improved the set of launch metrics available for ranges.

  • Added a new launch__stack_size metric in the Launch Statistics section to report the configured stack size.

  • Added a new sass__inst_executed_register_spilling metric which counts the number of load and store instructions that were created by the compiler due to register spilling.

  • Nsight Compute now natively supports macOS arm64.
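As a rough illustration of how the items above might be combined on the command line, the following Python sketch assembles an ncu invocation that selects the full section set, picks a replay mode, and requests the two new metrics by name. The helper build_ncu_command, the report name, and the ./my_app path are hypothetical and exist only for this example. The flags used (--set, --replay-mode, --metrics, -o) are existing ncu options, but whether both metrics can be requested through --metrics on a particular GPU and driver is an assumption, and range-based replay modes additionally require the application to define ranges.

```python
import shlex

# Metric names as given in the notes above. Requesting them explicitly via
# --metrics is an assumption; they may already be covered by the selected set.
NEW_METRICS = (
    "launch__stack_size",
    "sass__inst_executed_register_spilling",
)


def build_ncu_command(app, report="report", replay_mode="kernel",
                      extra_metrics=NEW_METRICS):
    """Return an argv list for an ncu run.

    Hypothetical helper for illustration only; not part of Nsight Compute.
    `app` is the target application's argv as a list of strings.
    """
    cmd = ["ncu", "--set", "full", "--replay-mode", replay_mode]
    if extra_metrics:
        # ncu accepts a comma-separated list of metric names.
        cmd += ["--metrics", ",".join(extra_metrics)]
    cmd += ["-o", report]
    cmd += list(app)
    return cmd


if __name__ == "__main__":
    # ./my_app is a placeholder for an application that defines profiling ranges.
    argv = build_ncu_command(["./my_app"], replay_mode="range")
    print(shlex.join(argv))
```

Passing the resulting list to subprocess.run would launch the profiler. Building the arguments as a list rather than a single string avoids shell quoting problems when the application path or its arguments contain spaces.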

NVIDIA Nsight Compute

  • Added interactive tooltips to Details and Source pages. An interactive tooltip can be used to compare different baselines. Its content can be copied to the clipboard using the copy icon button.

  • CUDA Green Contexts support is improved by showing TPC mask information in the Launch Statistics section, the Resources tool window, and on the Session page.

  • Added a heatmap to the Source Comparison document to visualize source code differences.

  • Added a Diff By drop-down menu to the SASS view of the Source Comparison document. It lets you choose whether differences are computed based on Opcode or Full Instruction.

  • Improved the performance of the SASS view.

  • The Resources View for CUDA Graphs can now visualize the graph structure directly in a new Chart mode.

  • The Memory Chart now supports zoom and pan.

  • The Metric Details tool window now shows PM sampling metrics from the timeline as context switched.

  • Improved the performance for deploying to target systems over remote connections.

NVIDIA Nsight Compute CLI

Resolved Issues

  • Fixed an issue where, on some systems, not all free GPU memory was considered when saving context memory for multi-pass data collection.

  • Fixed an incorrect multiplier in the calculation of non-tensor FP16 rooflines.

  • Fixed the metric Avg. Threads Executed for inlined functions with control flow.

  • Fixed an issue where, in some situations, no average was shown in the Source Statistics table for Warp Stall sampling metrics.

  • Fixed several SASS syntax highlighting issues.

  • Fixed an issue where the SM count wasn’t shown correctly in the report header when loading older reports.

  • Improved interactions between the Metric Details tool window and the memory chart.