nvcc Compiler Switches

nvcc

The NVIDIA nvcc compiler driver converts .cu files into C++ for the host system and CUDA assembly or binary instructions for the device. It supports a number of command-line parameters, of which the following are especially useful for optimization and related best practices:

  • -maxrregcount=N specifies the maximum number of registers that kernels can use, applied at a per-file level. See Register Pressure. (See also the __launch_bounds__ qualifier, discussed in Execution Configuration of the CUDA C++ Programming Guide, which controls the number of registers used on a per-kernel basis.)

  • --ptxas-options=-v or -Xptxas=-v lists per-kernel register, shared, and constant memory usage.

  • -ftz=true flushes denormalized numbers to zero.

  • -prec-div=false selects faster, less precise division.

  • -prec-sqrt=false selects a faster, less precise square root.

  • -use_fast_math coerces every functionName() call that has an intrinsic counterpart to the equivalent __functionName() call. This makes the code run faster at the cost of diminished precision and accuracy. It also implies -ftz=true, -prec-div=false, and -prec-sqrt=false. See Math Libraries.
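
As an illustration of how these switches interact with source code, the following is a minimal sketch rather than a reference implementation. The file name fast_math_demo.cu, the kernel name sineKernel, the block size of 256, and the register cap of 32 are all hypothetical choices made for this example. The kernel computes the same sine with the library call sinf() and with the intrinsic __sinf(), so that the effect of -use_fast_math can be observed, and it uses __launch_bounds__ as the per-kernel alternative to -maxrregcount.

```cuda
// fast_math_demo.cu (hypothetical file name)
//
// Possible build lines, using only the switches listed above:
//   nvcc -Xptxas=-v fast_math_demo.cu -o demo_precise
//   nvcc -Xptxas=-v -use_fast_math fast_math_demo.cu -o demo_fast
//   nvcc -Xptxas=-v -maxrregcount=32 fast_math_demo.cu -o demo_capped
// -Xptxas=-v makes ptxas report per-kernel register and memory usage at
// compile time, which shows what the other switches changed.
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// __launch_bounds__(256) promises that this kernel is never launched with
// more than 256 threads per block, so the compiler can budget registers for
// this kernel alone (-maxrregcount, by contrast, applies to the whole file).
__global__ void __launch_bounds__(256)
sineKernel(const float* in, float* outLib, float* outIntrinsic, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        outLib[i]       = sinf(in[i]);    // mapped to __sinf() under -use_fast_math
        outIntrinsic[i] = __sinf(in[i]);  // always the fast, less accurate intrinsic
    }
}

int main()
{
    const int n = 1024;
    const int threadsPerBlock = 256;  // must not exceed the launch bound above
    float *in = nullptr, *outLib = nullptr, *outIntrinsic = nullptr;

    // Managed memory keeps the sketch short; explicit copies would also work.
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&outLib, n * sizeof(float));
    cudaMallocManaged(&outIntrinsic, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        in[i] = 0.01f * static_cast<float>(i);
    }

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    sineKernel<<<blocks, threadsPerBlock>>>(in, outLib, outIntrinsic, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Largest difference between the library result and the intrinsic result.
    float maxDiff = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxDiff = fmaxf(maxDiff, fabsf(outLib[i] - outIntrinsic[i]));
    }
    std::printf("max |sinf(x) - __sinf(x)| = %g\n", maxDiff);

    cudaFree(in);
    cudaFree(outLib);
    cudaFree(outIntrinsic);
    return 0;
}
```

Built without -use_fast_math, the reported difference reflects the accuracy gap between sinf() and __sinf() on the target device. Built with -use_fast_math, both lines in the kernel should compile to the same intrinsic, so the difference would be expected to drop to zero. Comparing the -Xptxas=-v output across the three builds is one way to check whether a register cap or a launch bound changed the per-kernel register count.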