Known Issues
Installation
The installer might not show all patch-level version numbers during installation.
Some command line options listed in the help of a .run installer of NVIDIA Nsight Compute affect only the archive extraction, not the installation stage. To pass command line options to the embedded installer script, specify those options after --, in the form -- -<option>. The available options for the installer script are:

-help : Print help message
-targetpath=<PATH> : Specify install path
-noprompt : No prompts. Implies acceptance of the EULA
For example, specifying only the option --quiet extracts the installer archive without any output to the console, but still prompts for user interaction during the installation. To install NVIDIA Nsight Compute without any console output or user interaction, specify --quiet -- -noprompt.
After using the SDK Manager to install the NVIDIA Nsight Compute tools, their binary path needs to be added manually to your PATH environment variable.

See also the System Requirements for more installation instructions.
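For example, assuming the tools were installed under /opt/nvidia/nsight-compute/2024.3 (an illustrative path; check where SDK Manager placed them on your system), the PATH update could look like:

```shell
# Illustrative install location; adjust to where SDK Manager placed the tools.
NSIGHT_COMPUTE_DIR="/opt/nvidia/nsight-compute/2024.3"

# Prepend to PATH for the current shell; add this line to your shell
# profile (e.g. ~/.bashrc) to make it persistent.
export PATH="$NSIGHT_COMPUTE_DIR:$PATH"
```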
Launch and Connection
Launching applications on remote targets/platforms is not supported for several combinations. See Platform Support for details. Manually launch the application using the command line ncu --mode=launch on the remote system and connect using the UI or CLI afterwards.

In the NVIDIA Nsight Compute connection dialog, a remote system can only be specified for one target platform. Remove a connection from its current target platform to be able to add it to another.
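A sketch of this manual workflow, assuming an application ./my_app on a remote host my-machine.my-domain.com (both names are illustrative). The commands are built and printed here rather than executed:

```shell
# On the remote target: start the application under the profiler in
# launch mode; it suspends and waits for a connection.
remote_cmd="ncu --mode=launch ./my_app"

# On the host: attach with the CLI (or use the UI connection dialog).
host_cmd="ncu --mode=attach --hostname my-machine.my-domain.com"

printf '%s\n%s\n' "$remote_cmd" "$host_cmd"
```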
Loading of CUDA sources via SSH requires that the remote connection is configured, and that the hostname/IP address of the connection matches the target (as seen in the report session details). For example, prefer my-machine.my-domain.com over my-machine, even if the latter resolves to the same address.
Other issues concerning remote connections are discussed in the documentation for remote connections.
Local connections between NVIDIA Nsight Compute and the launched target application might not work on some ppc64le or aarch64 (sbsa) systems configured to only support IPv6. On these platforms, the NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE=uds environment variable can be set to use Unix Domain Sockets instead of TCP for local connections to work around the problem. On x86_64 Linux, Unix Domain Sockets are used by default, but local TCP connections can be forced using NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE=tcp.
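For example, on an affected IPv6-only system the override can be set in the environment before launching the profiler (the ./my_app invocation in the comment is a placeholder):

```shell
# Force Unix Domain Sockets for the local profiler connection.
export NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE=uds

# Any subsequent local profile run then uses UDS instead of TCP, e.g.:
#   ncu ./my_app
echo "$NV_COMPUTE_PROFILER_LOCAL_CONNECTION_OVERRIDE"
```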
Profiling and Metrics
Profiling of 32-bit processes is not supported.
Profiling kernels executed on a device that is part of an SLI group is not supported. An “Unsupported GPU” error is shown in this case.
Profiling a kernel while other contexts are active on the same device (e.g. X server, or secondary CUDA or graphics application) can result in varying metric values for L2/FB (Device Memory) related metrics. Specifically, L2/FB traffic from non-profiled contexts cannot be excluded from the metric results. To completely avoid this issue, profile the application on a GPU without secondary contexts accessing the same device (e.g. no X server on Linux).
In the current release, profiling a kernel while any other GPU work is executing on the same MIG compute instance can result in varying metric values for all units. NVIDIA Nsight Compute enforces serialization of the CUDA launches within the target application to ensure those kernels do not influence each other. See Serialization for more details. However, GPU work issued through other APIs in the target process or workloads created by non-target processes running simultaneously in the same MIG compute instance will influence the collected metrics. Note that it is acceptable to run CUDA processes in other MIG compute instances as they will not influence the profiled MIG compute instance.
On Linux systems with the kernel setting fs.protected_regular=1 (e.g. some Ubuntu 20.04 cloud service provider instances), root users may not be able to access the inter-process lock file. See the FAQ for workarounds.

Profiling only supports up to 32 device instances, including instances of MIG partitions. Profiling the 33rd or higher device instance will result in indeterminate data.
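To check whether this hardening setting is active on a given system (reading the value is harmless; changing it requires root and relaxes a kernel hardening, so consult the FAQ first):

```shell
# Read the current setting; the file exists on Linux 4.19+ kernels.
val=$(cat /proc/sys/fs/protected_regular 2>/dev/null || echo unknown)
echo "fs.protected_regular=$val"

# One possible workaround (requires root; weakens a hardening setting):
#   sudo sysctl fs.protected_regular=0
```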
Enabling certain metrics can cause GPU kernels to run longer than the driver’s watchdog time-out limit. In these cases, the driver will terminate the GPU kernel, resulting in an application error, and profiling data will not be available. Please disable the driver watchdog time-out before profiling such long-running CUDA kernels.
On Linux, setting the X Config option Interactive to false is recommended.
For Windows, detailed information on disabling the Windows TDR is available at https://docs.microsoft.com/en-us/windows-hardware/drivers/display/timeout-detection-and-recovery
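As a sketch, the X config option mentioned above can be set in the Device section of /etc/X11/xorg.conf; the Identifier value and surrounding layout will vary by system:

```
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "Interactive" "False"
EndSection
```

Restart the X server after editing the file for the change to take effect.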
Collecting device-level metrics, such as the NVLink metrics (nvl*), is not supported on NVIDIA virtual GPUs (vGPUs).

As of the CUDA 11.4 and R470 TRD1 driver release, NVIDIA Nsight Compute is supported in a vGPU environment, which requires a vGPU license. If the license is not obtained after 20 minutes, the performance metrics data reported from the GPU will be inaccurate. This is because of a feature in the vGPU environment that reduces performance but retains functionality, as specified here.
Profiling on NVIDIA live-migrated virtual machines is not supported and can result in undefined behavior.
Profiling with the Multi-Process Service (MPS) enabled is not supported.
Profiling is not supported while the target GPU is configured to run in any Confidential Computing mode.
Profiling most metrics with CU_FORCE_PTX_JIT=1 set is only supported with CUDA 12.7 drivers or newer.

When profiling using Range Replay or Application Range Replay with multiple CUDA Green Contexts active that belong to the same device context, the range result will contain counter values aggregated over all Green Contexts. If the Green Contexts use overlapping SM masks, this even applies to Green-Context-attributable metrics.
The NVLink Topology section is not supported for a configuration using NVSwitch.
NVIDIA Nsight Compute does not support per-NVLink metrics.
NVIDIA Nsight Compute does not support the Logical NVLink Throughput table.
Setting a reduced NVLink Bandwidth mode does not impact the reported peak values for NVLink metrics. All peak values and corresponding percentages are calculated from the non-reduced NVLink bandwidth. Reconfiguring the NVLink Bandwidth mode using nvidia-smi while profiling may lead to undefined tool behavior.
Profiling CUDA graph kernel nodes that can launch device graphs or are part of device-launchable graphs is not supported. Use Graph Profiling mode instead.
Profiling in Graph Profiling mode is performed on the context that is specified by the stream handle for the graph launch. Only kernel nodes executing on this context are profiled.
On CUDA drivers older than 530.x, profiling on Windows Subsystem for Linux (WSL) is not supported if the system has multiple physical NVIDIA GPUs. This is not affected by setting CUDA_VISIBLE_DEVICES.

Collecting software counters through PerfWorks currently forces all functions in the module of the profiled kernel to be loaded. This increases the host and device memory footprint of the target application for the remainder of the process lifetime.
PM Sampling is not supported when collecting a Profile Series.
Data collected using PM Sampling across multiple passes may not align perfectly in the timeline, even with context switch filtering applied.
For results collected with the Work ID feature, the number of clusters, blocks, warps, and threads launched on the device can be lower than configured. Metrics that depend on such counts are affected accordingly.
On Windows, when in MCDM mode, changing access to GPU performance counters is not supported through the NVIDIA Control Panel. See ERR_NVGPUCTRPERM for further details.
Collecting NVLink Chip-2-Chip (C2C) metrics is not supported on Blackwell GPUs.
Compatibility
Applications calling blocking functions on standard input/output streams can cause the profiler to stall until the blocking function call is resolved.
NVIDIA Nsight Compute can hang on applications using RAPIDS in versions 0.6 and 0.7, due to an issue in cuDF.
Profiling child processes launched via clone() is not supported.

Profiling of Cooperative Groups kernels launched with cuLaunchCooperativeKernelMultiDevice is not yet supported.

On Linux systems, when profiling bsd-csh scripts, the original application output will not be printed. As a workaround, use a different C shell, e.g. tcsh.
Attempting to use the --clock-control option to set the GPU clocks will fail when profiling on a GPU partition. Please use nvidia-smi (installed with the NVIDIA display driver) to control the clocks for the entire GPU. This will require administrative privileges when the GPU is partitioned.

On Linux aarch64, NVIDIA Nsight Compute does not work if the HOME environment variable is not set.
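For example, nvidia-smi can lock and later reset the clocks for the whole GPU. The 1410 MHz value is purely illustrative; query the supported clocks for your device first (e.g. with nvidia-smi -q -d SUPPORTED_CLOCKS). The commands are built and printed here rather than executed, since they require root privileges:

```shell
# Lock GPU core clocks to a fixed min,max pair for stable measurements.
lock_cmd="sudo nvidia-smi --lock-gpu-clocks=1410,1410"

# Reset the clocks to their defaults once profiling is done.
reset_cmd="sudo nvidia-smi --reset-gpu-clocks"

printf '%s\n%s\n' "$lock_cmd" "$reset_cmd"
```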
NVIDIA Nsight Compute versions 2020.1.0 to 2020.2.1 are not compatible with CUDA driver version 460+ if the application launches Cooperative Groups kernels. Profiling will fail with error “UnknownError”.
Collecting CPU call stack information on Windows Server 2016 can hang NVIDIA Nsight Compute in some cases. Currently, the only workaround is to skip CPU call stack collection on such systems by not specifying the --call-stack option.

When profiling a script, --target-processes all may target utility executables such as xargs, uname, or ls. To avoid profiling these, use the --target-processes-filter option accordingly.

On mobile platforms, the --kill option is not supported with application replay mode.

NVIDIA Nsight Compute might show invalid characters for Unicode names and paths on Windows 10. As a workaround, use a third-party terminal emulator, e.g. Git bash.
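The --target-processes-filter workaround mentioned above can be sketched as follows; the script and application names are placeholders, and the filter option supports further forms (see ncu --help for the exact syntax). The command is built and echoed here rather than executed:

```shell
# Profile all child processes of the script, but restrict profiling to
# processes whose name matches "my_app", skipping utilities like xargs.
cmd="ncu --target-processes all --target-processes-filter my_app ./run_benchmarks.sh"
echo "$cmd"
```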
CPU call stack collection may lead to crashes for CUDA applications with device functions compiled with gcc 13.2.
User Interface
The API Statistics filter in NVIDIA Nsight Compute does not support units.
File size is the only property considered when resolving source files. Timestamps are currently ignored.
Terminating or disconnecting an application in the Interactive Profiling activity while the API Stream View is updated can lead to a crash.
See the OptiX library support section for limitations concerning the Acceleration Structure Viewer.
After updating from a previous version of NVIDIA Nsight Compute on Linux, the file load dialog may not allow column resizing and sorting. As a workaround, the ~/.config/QtProject.conf file can be edited to remove the treeViewHeader entry from the [FileDialog] section.