nvcc Compiler Switches
nvcc
The NVIDIA nvcc compiler driver converts .cu files into C++ for the host system and CUDA assembly or binary instructions for the device. It supports a number of command-line parameters, of which the following are especially useful for optimization and related best practices:
* -maxrregcount=N specifies the maximum number of registers kernels can use at a per-file level. See Register Pressure. (See also the __launch_bounds__ qualifier discussed in Execution Configuration of the CUDA C++ Programming Guide to control the number of registers used on a per-kernel basis.)
* --ptxas-options=-v or -Xptxas=-v lists per-kernel register, shared, and constant memory usage.
* -ftz=true (denormalized numbers are flushed to zero)
* -prec-div=false (less precise division)
* -prec-sqrt=false (less precise square root)
* -use_fast_math compiler option of nvcc coerces every functionName() call to the equivalent __functionName() call. This makes the code run faster at the cost of diminished precision and accuracy. See Math Libraries.
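As a rough illustration of how these switches interact with source code, the following sketch pairs a per-kernel __launch_bounds__ qualifier with a comparison of sinf() against the __sinf() intrinsic. The kernel name sineKernel, the helper maxSineDifference, the file name fastmath_demo.cu, the limit of 256 threads per block, and the input range are all assumptions chosen for this example; none of them come from the guide.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel. __launch_bounds__(256) promises the compiler that the
// kernel is never launched with more than 256 threads per block, which lets
// it budget registers for this kernel alone (unlike the per-file
// -maxrregcount=N switch).
__global__ void __launch_bounds__(256)
sineKernel(const float* in, float* precise, float* fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        precise[i] = sinf(in[i]);    // replaced by __sinf() under -use_fast_math
        fast[i]    = __sinf(in[i]);  // always the faster, less accurate intrinsic
    }
}

// Hypothetical helper: returns the largest |sinf(x) - __sinf(x)| observed on
// the device for x = 0, 0.01, 0.02, ..., or -1.0f if a CUDA call fails.
float maxSineDifference(int n)
{
    float *in = nullptr, *precise = nullptr, *fast = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMallocManaged(&in, bytes) != cudaSuccess ||
        cudaMallocManaged(&precise, bytes) != cudaSuccess ||
        cudaMallocManaged(&fast, bytes) != cudaSuccess) {
        cudaFree(in); cudaFree(precise); cudaFree(fast);
        return -1.0f;
    }
    for (int i = 0; i < n; ++i) in[i] = 0.01f * i;

    const int threads = 256;  // must not exceed the __launch_bounds__ limit
    const int blocks = (n + threads - 1) / threads;
    sineKernel<<<blocks, threads>>>(in, precise, fast, n);

    float maxDiff = -1.0f;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        maxDiff = 0.0f;
        for (int i = 0; i < n; ++i)
            maxDiff = fmaxf(maxDiff, fabsf(precise[i] - fast[i]));
    }
    cudaFree(in); cudaFree(precise); cudaFree(fast);
    return maxDiff;
}

int main()
{
    float d = maxSineDifference(1024);
    if (d < 0.0f) { printf("CUDA error\n"); return 1; }
    printf("max |sinf - __sinf| = %g\n", d);
    return 0;
}
```

Building this with `nvcc -Xptxas=-v fastmath_demo.cu -o fastmath_demo` should print the register, shared, and constant memory usage of sineKernel at compile time. Rebuilding with `-use_fast_math` added should drive the reported difference to zero, since both lines of the kernel then call the same intrinsic.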