NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition 5.3 User Guide
Knowing the number of executed integer operations and the instruction mix is a useful input for arriving at sound performance expectations for a kernel. It also serves as a basis for better understanding many other metrics and experiments. Similar to the Achieved FLOPS experiment, its primary benefit is tracking and evaluating performance differences across code changes, rather than deriving the actual cause of a performance limitation.
Some operations may be implemented with multiple instructions (possibly of different native types) for efficiency. For example, 24-bit intrinsic operations may be implemented using 32-bit integer instructions, and integer instructions may be implemented using the mantissa of multiple floating-point instructions.
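The following Python sketch illustrates one reason a floating-point mantissa can carry integer work: a single-precision value has 24 bits of significand precision, so integers below 2^24 (and products that stay below that bound) are represented exactly. It is only an arithmetic illustration under that assumption, not a description of how any particular GPU lowers integer instructions; the helper `to_float32` and the sample operands are hypothetical.

```python
import struct

def to_float32(value):
    """Round a Python number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# float32 significand precision, including the implicit leading bit.
MANTISSA_BITS = 24

# The largest 24-bit integer survives a round trip through float32 unchanged.
largest_exact = 2**MANTISSA_BITS - 1
exact_roundtrip = to_float32(largest_exact) == largest_exact

# One past 2**24 no longer fits in the mantissa and is rounded.
too_wide = 2**MANTISSA_BITS + 1
inexact_roundtrip = to_float32(too_wide) == too_wide

# A product that stays below 2**24 is also exact when carried in float32.
a, b = 4095, 4095  # hypothetical operands; a * b = 16769025 < 2**24
product_via_float = int(to_float32(to_float32(a) * to_float32(b)))

print(exact_roundtrip, inexact_roundtrip, product_via_float == a * b)
```

Wider integer results do not fit in one mantissa, which is consistent with the note above that several instructions may be needed for a single operation.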
References to specific assembly instructions in this document are made with regard to the actual instruction set architecture (ISA) of the hardware, called SASS. For a description and a full list of SASS assembly instructions for the different CUDA compute architectures, see the documentation of the NVIDIA CUDA tool cuobjdump.
Integer Operations
Reports the weighted sum of hit counts for executed integer instructions, grouped by instruction class. The applied weights can be customized prior to data collection using the Experiment Configuration on the Activity Page. Depending on the actual operation and the compiler flags used, a single integer instruction written in CUDA C may result in multiple instructions in assembly code. The reported values refer to the executed assembly instructions; therefore, the numbers may differ from expectations derived exclusively from the CUDA C code. Use the Source View page to investigate the mapping between high-level code and assembly code.
Metrics
ADD: Weighted sum of all executed integer additions (
MUL: Weighted sum of all executed integer multiplications (
MAD: Weighted sum of all executed integer multiply-add (
SAD: Weighted sum of all executed sum-of-absolute-differences (
SCADD: Weighted sum of all executed shift-and-add (
Shifts: Weighted sum of all executed shift instructions, covering shift-right (
Bit Ops: Weighted sum of all executed integer bit operations. Specifically accounting for bit-field-extract (
Integer Operations per Second
Reports the weighted sum of hit counts for executed integer instructions per second. The chart is a stacked bar graph using the very same instruction classes as the Integer Operations chart.
Metrics
Math: Weighted sum of all executed integer arithmetic instructions per second. Combines the individual contributions of ADD, MUL, MAD, SAD, and SCADD as reported in the Integer Operations chart.
Other: Weighted sum of shift instructions and bit operations per second.
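The arithmetic behind the two charts can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the hit counts, the kernel duration, and the weights (including the weight of 2 for MAD) are made-up values chosen for the example, and the function names are hypothetical.

```python
# Hypothetical hit counts per instruction class (not from a real profiler run).
hit_counts = {
    "ADD": 1200, "MUL": 300, "MAD": 450, "SAD": 0, "SCADD": 80,
    "Shifts": 220, "Bit Ops": 150,
}

# Assumed weights for illustration only; the real weights are whatever is
# set in the Experiment Configuration before data collection.
weights = {name: 1 for name in hit_counts}
weights["MAD"] = 2  # assumption: count a multiply-add as two operations

# Grouping used by the per-second chart, as described in the text above.
MATH_CLASSES = ("ADD", "MUL", "MAD", "SAD", "SCADD")
OTHER_CLASSES = ("Shifts", "Bit Ops")

def weighted_ops(hit_counts, weights):
    """Weighted sum of hit counts, kept separate per instruction class."""
    return {name: hit_counts[name] * weights[name] for name in hit_counts}

def ops_per_second(ops, duration_s):
    """Collapse the classes into Math and Other and divide by elapsed time."""
    math_total = sum(ops[name] for name in MATH_CLASSES)
    other_total = sum(ops[name] for name in OTHER_CLASSES)
    return {"Math": math_total / duration_s, "Other": other_total / duration_s}

ops = weighted_ops(hit_counts, weights)
rates = ops_per_second(ops, duration_s=0.5)  # hypothetical kernel duration

for name, value in ops.items():
    print(f"{name:8s} {value}")
print(rates)
```

Because the weights enter as a plain multiplication, changing them rescales the bars without changing the underlying hit counts, which is worth keeping in mind when comparing runs collected with different configurations.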
General hints for optimizing integer Arithmetic Instructions are given in the CUDA C Best Practices Guide.
Also try minimizing the number of executed arithmetic instructions with low throughput. The CUDA C Programming Guide provides a list of the expected throughputs for all native Arithmetic Instructions per compute capability.
NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition User Guide Rev. 5.3.170616 ©2009-2017. NVIDIA Corporation. All Rights Reserved.