nvcc Compiler Switches
nvcc
The NVIDIA nvcc compiler driver converts .cu files into C++ for the host system and CUDA assembly or binary instructions for the device. It supports a number of command-line parameters, of which the following are especially useful for optimization and related best practices:
-maxrregcount=N specifies the maximum number of registers kernels can use at a per-file level. See Register Pressure. (See also the __launch_bounds__ qualifier discussed in Execution Configuration of the CUDA C++ Programming Guide to control the number of registers used on a per-kernel basis.)
--ptxas-options=-v or -Xptxas=-v lists per-kernel register, shared, and constant memory usage.
-ftz=true (denormalized numbers are flushed to zero)
-prec-div=false (less precise division)
-prec-sqrt=false (less precise square root)
The -use_fast_math compiler option of nvcc coerces every functionName() call to the equivalent __functionName() call. This makes the code run faster at the cost of diminished precision and accuracy. See Math Libraries.
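
As a rough illustration of how these switches interact with source code, the sketch below pairs a kernel that carries a __launch_bounds__ qualifier with a direct comparison of sinf() against its intrinsic counterpart __sinf(), which is the kind of substitution -use_fast_math performs. The file name fastmath_demo.cu, the kernel name compareSin, the constant kThreads, and the input range are all hypothetical choices made for this example and are not taken from the guide. The build lines in the leading comment use only the options listed above.

```cuda
// fastmath_demo.cu (hypothetical file name, for illustration only)
//
// Possible builds, using only the switches discussed above:
//   nvcc -Xptxas=-v       fastmath_demo.cu -o demo   (report per-kernel register/shared/constant usage)
//   nvcc -maxrregcount=32 fastmath_demo.cu -o demo   (cap registers for every kernel in this file)
//   nvcc -use_fast_math   fastmath_demo.cu -o demo   (sinf() below is compiled as __sinf())

#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // illustrative block size (assumption)

// __launch_bounds__ tells the compiler this kernel is never launched with
// more than kThreads threads per block, which lets it budget registers for
// this kernel alone instead of for the whole file.
__global__ void __launch_bounds__(kThreads)
compareSin(const float* x, float* precise, float* fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        precise[i] = sinf(x[i]);    // standard single-precision math function
        fast[i]    = __sinf(x[i]);  // device intrinsic: faster, less accurate
    }
}

int main()
{
    const int n = 1 << 16;
    float *x = nullptr, *precise = nullptr, *fast = nullptr;

    // Managed memory keeps the sketch short; explicit copies work equally well.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&precise, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&fast, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        x[i] = 0.001f * static_cast<float>(i);  // arbitrary sample inputs
    }

    const int blocks = (n + kThreads - 1) / kThreads;
    compareSin<<<blocks, kThreads>>>(x, precise, fast, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Largest absolute difference between the two results on the host side.
    float maxDiff = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxDiff = std::fmax(maxDiff, std::fabs(precise[i] - fast[i]));
    }
    std::printf("max |sinf - __sinf| = %g\n", maxDiff);

    cudaFree(x);
    cudaFree(precise);
    cudaFree(fast);
    return 0;
}
```

Built without -use_fast_math, the printed difference gives a feel for the accuracy traded away by the intrinsic on this particular input range. Built with -use_fast_math, both assignments should compile to the intrinsic, so the two arrays are expected to match. Comparing the -Xptxas=-v output with and without -maxrregcount shows whether the per-file cap changes register usage or introduces spills for this kernel.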