nvcc Compiler Switches

nvcc

The NVIDIA nvcc compiler driver converts .cu files into C++ for the host system and CUDA assembly or binary instructions for the device. It supports a number of command-line parameters, of which the following are especially useful for optimization and related best practices:

-maxrregcount=N caps the number of registers each kernel may use; the limit applies to every kernel in the file. See Register Pressure. (See also the __launch_bounds__ qualifier, discussed in Execution Configuration of the CUDA C++ Programming Guide, to control the number of registers used on a per-kernel basis.)
--ptxas-options=-v or -Xptxas=-v lists per-kernel register, shared, and constant memory usage.
-ftz=true (single-precision denormalized numbers are flushed to zero)
-prec-div=false (faster, less precise single-precision division)
-prec-sqrt=false (faster, less precise single-precision square root)
-use_fast_math coerces every functionName() call to the equivalent __functionName() intrinsic call, where such an intrinsic exists. This makes the code run faster at the cost of diminished precision and accuracy. It also implies -ftz=true, -prec-div=false, and -prec-sqrt=false. See Math Libraries.
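
The following is a minimal sketch of how some of these switches might be exercised together; it is not taken from the guide. The file name flags_demo.cu, the kernel sineKernel, the helper maxSineDeviation, and the 256-thread bound are all assumptions chosen for illustration. The kernel uses __launch_bounds__ as the per-kernel alternative to -maxrregcount, and calls both sinf() and __sinf() so that the effect of -use_fast_math can be observed: without the switch the two results may differ slightly, and with it both calls should compile to the same intrinsic.

```cuda
// flags_demo.cu: hypothetical file name used only for this sketch.
// Possible builds, using only the switches listed above:
//   nvcc -Xptxas=-v flags_demo.cu -o demo_default
//   nvcc -Xptxas=-v -use_fast_math flags_demo.cu -o demo_fast
//   nvcc -Xptxas=-v -maxrregcount=32 flags_demo.cu -o demo_capped
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Per-kernel alternative to -maxrregcount: tell the compiler that this
// kernel is never launched with more than 256 threads per block
// (256 is an illustrative assumption, not a recommendation).
__global__ void __launch_bounds__(256)
sineKernel(const float* in, float* viaLibrary, float* viaIntrinsic, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        viaLibrary[i]   = sinf(in[i]);    // replaced by __sinf() under -use_fast_math
        viaIntrinsic[i] = __sinf(in[i]);  // always the fast, less accurate intrinsic
    }
}

// Returns the largest |sinf(x) - __sinf(x)| over n sample points,
// or -1.0f if any CUDA call fails.
float maxSineDeviation(int n)
{
    float *in = nullptr, *viaLibrary = nullptr, *viaIntrinsic = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMallocManaged(&in, bytes) != cudaSuccess ||
        cudaMallocManaged(&viaLibrary, bytes) != cudaSuccess ||
        cudaMallocManaged(&viaIntrinsic, bytes) != cudaSuccess) {
        cudaFree(in); cudaFree(viaLibrary); cudaFree(viaIntrinsic);
        return -1.0f;
    }

    for (int i = 0; i < n; ++i) in[i] = 0.01f * i;

    const int threads = 256;  // must not exceed the __launch_bounds__ value
    const int blocks  = (n + threads - 1) / threads;
    sineKernel<<<blocks, threads>>>(in, viaLibrary, viaIntrinsic, n);

    float maxDiff = -1.0f;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        maxDiff = 0.0f;
        for (int i = 0; i < n; ++i)
            maxDiff = fmaxf(maxDiff, fabsf(viaLibrary[i] - viaIntrinsic[i]));
    }

    cudaFree(in); cudaFree(viaLibrary); cudaFree(viaIntrinsic);
    return maxDiff;
}

int main()
{
    float d = maxSineDeviation(1024);
    if (d < 0.0f) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    printf("max |sinf - __sinf| = %g\n", d);
    return 0;
}
```

Building with -Xptxas=-v prints the register, shared, and constant memory usage of sineKernel at compile time, which makes it possible to check whether -maxrregcount or __launch_bounds__ changed the register allocation before measuring run time.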