NVIDIA CUDA Compiler Driver NVCC

The documentation for nvcc, the CUDA compiler driver.

1. Introduction

1.1. Overview

1.1.1. CUDA Programming Model

The CUDA Toolkit targets a class of applications whose control part runs as a process on a general purpose computing device, and which use one or more NVIDIA GPUs as coprocessors for accelerating single program, multiple data (SPMD) parallel jobs. Such jobs are self-contained, in the sense that they can be executed and completed by a batch of GPU threads entirely without intervention by the host process, thereby gaining optimal benefit from the parallel graphics hardware.

The GPU code is implemented as a collection of functions in a language that is essentially C++, but with some annotations for distinguishing them from the host code, plus annotations for distinguishing the different types of data memory that exist on the GPU. Such functions may have parameters, and they can be called using a syntax that is very similar to regular C function calling, but slightly extended to specify the matrix of GPU threads that must execute the called function. During its lifetime, the host process may dispatch many parallel GPU tasks.

For more information on the CUDA programming model, consult the CUDA C++ Programming Guide.

1.1.2. CUDA Sources

Source files for CUDA applications consist of a mixture of conventional C++ host code, plus GPU device functions. The CUDA compilation trajectory separates the device functions from the host code, compiles the device functions using the proprietary NVIDIA compilers and assembler, compiles the host code using an available C++ host compiler, and afterwards embeds the compiled GPU functions as fatbinary images in the host object file. In the linking stage, specific CUDA runtime libraries are added for supporting remote SPMD procedure calling and for providing explicit GPU manipulation such as allocation of GPU memory buffers and host-GPU data transfer.

1.1.3. Purpose of NVCC

The compilation trajectory involves several splitting, compilation, preprocessing, and merging steps for each CUDA source file. It is the purpose of nvcc, the CUDA compiler driver, to hide the intricate details of CUDA compilation from developers. It accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. All non-CUDA compilation steps are forwarded to a C++ host compiler that is supported by nvcc, and nvcc translates its options to appropriate host compiler command line options.

1.2. Supported Host Compilers

A general purpose C++ host compiler is needed by nvcc in the following situations:

During non-CUDA phases (except the run phase), because these phases will be forwarded by nvcc to this compiler.
During CUDA phases, for several preprocessing stages and host code compilation (see also The CUDA Compilation Trajectory).

nvcc assumes that the host compiler is installed with the standard method designed by the compiler provider. If the host compiler installation is non-standard, the user must make sure that the environment is set appropriately and use relevant nvcc compile options.

The following documents provide detailed information about supported host compilers:

On all platforms, the default host compiler executable (gcc and g++ on Linux and cl.exe on Windows) found in the current execution search path will be used, unless specified otherwise with appropriate options (see File and Path Specifications).

Note that nvcc does not support compiling files whose paths exceed the maximum path length of the host system. To compile files with long paths, refer to the documentation for your system.

2. Compilation Phases

2.1. NVCC Identification Macro

nvcc predefines the following macros:

__NVCC__: Defined when compiling C/C++/CUDA source files.
__CUDACC__: Defined when compiling CUDA source files.
__CUDACC_RDC__: Defined when compiling CUDA source files in relocatable device code mode (see NVCC Options for Separate Compilation).
__CUDACC_EWP__: Defined when compiling CUDA source files in extensible whole program mode (see Options for Specifying Behavior of Compiler/Linker).
__CUDACC_TILE__: Defined when Tile compilation support is enabled, with the --enable-tile, --tile-only or --simt-only flags.
__CUDACC_DEBUG__: Defined when compiling CUDA source files in the device-debug mode (see Options for Specifying Behavior of Compiler/Linker).
__CUDACC_RELAXED_CONSTEXPR__: Defined when the --expt-relaxed-constexpr flag is specified on the command line. Refer to the CUDA C++ Programming Guide for more details.
__CUDACC_EXTENDED_LAMBDA__: Defined when the --expt-extended-lambda or --extended-lambda flag is specified on the command line. Refer to the CUDA C++ Programming Guide for more details.
__CUDACC_VER_MAJOR__: Defined with the major version number of nvcc.
__CUDACC_VER_MINOR__: Defined with the minor version number of nvcc.
__CUDACC_VER_BUILD__: Defined with the build version number of nvcc.
__NVCC_DIAG_PRAGMA_SUPPORT__: Defined when the CUDA frontend compiler supports diagnostic control with the nv_diag_suppress, nv_diag_error, nv_diag_warning, nv_diag_default, nv_diag_once, and nv_diagnostic pragmas.
__CUDACC_DEVICE_ATOMIC_BUILTINS__: Defined when the CUDA frontend compiler supports device atomic compiler builtins. Refer to the CUDA C++ Programming Guide for more details.
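As a minimal sketch of how these macros are typically consumed, the following C++ (also valid CUDA C++) helpers query only the identification macros listed above to describe how the current translation unit is being compiled. The functions `compilation_context`, `compiled_by_nvcc`, and `nvcc_at_least` are hypothetical names chosen for illustration. When the file is compiled by a plain host compiler without nvcc, none of these macros is defined, so the fallback branches are taken.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: describes how this translation unit is compiled,
// based only on the nvcc identification macros.
const char* compilation_context() {
#if defined(__CUDACC__)
#  if defined(__CUDACC_RDC__)
    return "CUDA source, relocatable device code";
#  else
    return "CUDA source, whole program";
#  endif
#elif defined(__NVCC__)
    return "C/C++ source compiled by nvcc";
#else
    return "not compiled by nvcc";
#endif
}

// Hypothetical helper: true when any nvcc mode was detected above.
bool compiled_by_nvcc() {
    return std::strcmp(compilation_context(), "not compiled by nvcc") != 0;
}

// Hypothetical helper: true when the compiling nvcc is at least major.minor.
// Always false when the file is not compiled by nvcc.
bool nvcc_at_least(int major, int minor) {
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
    return __CUDACC_VER_MAJOR__ > major ||
           (__CUDACC_VER_MAJOR__ == major && __CUDACC_VER_MINOR__ >= minor);
#else
    (void)major;  // silence unused-parameter warnings
    (void)minor;
    return false;
#endif
}
```

Because the checks are resolved by the preprocessor, the same source can be shared between nvcc and a host-only build without any runtime cost.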

2.2. NVCC Phases

A compilation phase is a logical translation step that can be selected by command line options to nvcc. A single compilation phase can still be broken up by nvcc into smaller steps, but these smaller steps are just implementations of the phase: they depend on seemingly arbitrary capabilities of the internal tools that nvcc uses, and all of these internals may change with a new release of the CUDA Toolkit. Hence, only compilation phases are stable across releases, and although nvcc provides options to display the compilation steps that it executes, these are for debugging purposes only and must not be copied and used in build scripts.

nvcc phases are selected by a combination of command line options and input file name suffixes, and the execution of these phases may be modified by other command line options. In phase selection, the input file suffix defines the phase input, while the command line option defines the required output of the phase.

The following paragraphs list the recognized file name suffixes and the supported compilation phases. A full explanation of the nvcc command line options can be found in NVCC Command Options.

2.3. Supported Input File Suffixes

The following table defines how nvcc interprets its input files:

Input File Suffix	Description
`.cu`	CUDA source file, containing host code and device functions
`.c`	C source file
`.cc`, `.cxx`, `.cpp`	C++ source file
`.ptx`	PTX intermediate assembly file (see Figure 1)
`.cubin`	CUDA device code binary file (CUBIN) for a single GPU architecture (see Figure 1)
`.fatbin`	CUDA fat binary file that may contain multiple PTX and CUBIN files (see Figure 1)
`.o`, `.obj`	Object file
`.a`, `.lib`	Library file
`.res`	Resource file
`.so`	Shared object file

Note that nvcc does not make any distinction between object, library or resource files. It just passes files of these types to the linker when the linking phase is executed.
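The suffix rules in the table above can be sketched as a simple lookup. The enum `InputKind` and the functions `suffix_of` and `classify_input` are hypothetical illustrations, not part of nvcc. The sketch assumes exact, case-sensitive suffix matching. It reflects the note above by folding object, library, resource, and shared object files into one "passed to the linker" category.

```cpp
#include <cassert>
#include <string>

// Hypothetical categories mirroring the input-suffix table.
enum class InputKind { CudaSource, CSource, CppSource, Ptx, Cubin, Fatbin, LinkerInput, Unknown };

// Returns the text after the last '.' in the file name, or "" if there is none.
// A '.' that belongs to a directory component is ignored.
std::string suffix_of(const std::string& path) {
    const auto dot = path.find_last_of('.');
    const auto slash = path.find_last_of("/\\");
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) return "";
    return path.substr(dot + 1);
}

InputKind classify_input(const std::string& path) {
    const std::string s = suffix_of(path);
    if (s == "cu") return InputKind::CudaSource;
    if (s == "c") return InputKind::CSource;
    if (s == "cc" || s == "cxx" || s == "cpp") return InputKind::CppSource;
    if (s == "ptx") return InputKind::Ptx;
    if (s == "cubin") return InputKind::Cubin;
    if (s == "fatbin") return InputKind::Fatbin;
    // Object, library, resource and shared object files are not distinguished:
    // all of them are simply handed to the linker.
    if (s == "o" || s == "obj" || s == "a" || s == "lib" || s == "res" || s == "so")
        return InputKind::LinkerInput;
    return InputKind::Unknown;
}
```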

2.4. Supported Phases

The following table specifies the supported compilation phases, plus the option to nvcc that enables the execution of each phase. It also lists the default name of the output file generated by each phase, which takes effect when no explicit output file name is specified using the option --output-file:

Phase	`nvcc` Option (Long Name)	`nvcc` Option (Short Name)	Default Output File Name
CUDA compilation to C/C++ source file	`--cuda`	`-cuda`	`.cpp.ii` appended to source file name, as in `x.cu.cpp.ii`. This output file can be compiled by the host compiler that was used by `nvcc` to preprocess the `.cu` file.
C/C++ preprocessing	`--preprocess`	`-E`	<result on standard output>
C/C++ compilation to object file	`--compile`	`-c`	Source file name with suffix replaced by `o` on Linux or `obj` on Windows
SIMT Cubin generation from CUDA source files	`--cubin`	`-cubin`	Source file name with suffix replaced by `cubin`
Cubin generation from PTX intermediate files	`--cubin`	`-cubin`	Source file name with suffix replaced by `cubin`
PTX generation from CUDA source files	`--ptx`	`-ptx`	Source file name with suffix replaced by `ptx`
SIMT Fatbinary generation from source, PTX or cubin files	`--fatbin`	`-fatbin`	Source file name with suffix replaced by `fatbin`
Tile cubin generation from CUDA source files	`--tilecubin`	`-tilecubin`	Source file name with suffix replaced by `.tile.cubin`
Tile fatbinary generation from CUDA source files	`--tilefatbin`	`-tilefatbin`	Source file name with suffix replaced by `.tile.fatbin`
cuda_tile generation from CUDA source files	`--tilebc`	`-tilebc`	Source file name with suffix replaced by `.tilebc`
Linking relocatable device code	`--device-link`	`-dlink`	`a_dlink.obj` on Windows or `a_dlink.o` on other platforms
Cubin generation from linked relocatable device code	`--device-link --cubin`	`-dlink -cubin`	`a_dlink.cubin`
Fatbinary generation from linked relocatable device code	`--device-link --fatbin`	`-dlink -fatbin`	`a_dlink.fatbin`
Linking an executable	<no phase option>		`a.exe` on Windows or `a.out` on other platforms
Constructing an object file archive, or library	`--lib`	`-lib`	`a.lib` on Windows or `a.a` on other platforms
`make` dependency generation	`--generate-dependencies`	`-M`	<result on standard output>
`make` dependency generation without headers in system paths	`--generate-nonsystem-dependencies`	`-MM`	<result on standard output>
Compile CUDA source to OptiX IR output	`--optix-ir`	`-optix-ir`	Source file name with suffix replaced by `optixir`
Compile CUDA source to LTO IR output	`--ltoir`	`-ltoir`	Source file name with suffix replaced by `ltoir`
Running an executable	`--run`	`-run`

Notes:

The last phase in this list is more of a convenience phase. It allows running the compiled and linked executable without having to explicitly set the library path to the CUDA dynamic libraries.
Unless a phase option is specified, nvcc will compile and link all its input files.
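The default output names in the phase table follow a few simple patterns: append a suffix, replace the suffix, or use a fixed name. The following sketch reproduces those patterns for a subset of the phases. The enum `Phase` and the functions `replace_suffix` and `default_output_name` are hypothetical, and the sketch assumes bare file names without directories. It also assumes that a name without a suffix simply gets one appended. The table does not cover that case, and this is not nvcc's own code.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the phases listed in the table.
enum class Phase { Cuda, Compile, Ptx, Cubin, Fatbin, DeviceLink, Link, Lib };

// Replaces the suffix after the last '.' with `suffix` (assumption: a name
// with no '.' just gets the suffix appended).
std::string replace_suffix(const std::string& name, const std::string& suffix) {
    const auto dot = name.find_last_of('.');
    return (dot == std::string::npos ? name : name.substr(0, dot)) + "." + suffix;
}

std::string default_output_name(const std::string& source, Phase phase, bool windows) {
    switch (phase) {
        case Phase::Cuda:       return source + ".cpp.ii";  // x.cu -> x.cu.cpp.ii
        case Phase::Compile:    return replace_suffix(source, windows ? "obj" : "o");
        case Phase::Ptx:        return replace_suffix(source, "ptx");
        case Phase::Cubin:      return replace_suffix(source, "cubin");
        case Phase::Fatbin:     return replace_suffix(source, "fatbin");
        case Phase::DeviceLink: return windows ? "a_dlink.obj" : "a_dlink.o";
        case Phase::Link:       return windows ? "a.exe" : "a.out";
        case Phase::Lib:        return windows ? "a.lib" : "a.a";
    }
    return "";  // not reached for valid Phase values
}
```

For example, `default_output_name("x.cu", Phase::Compile, true)` yields `x.obj`, matching the table entry for `--compile` on Windows.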

3. The CUDA Compilation Trajectory

CUDA compilation works as follows: the input program is preprocessed for device compilation and is compiled to CUDA binary (cubin) and/or PTX or cuda_tile intermediate code, which are placed in a fatbinary. The input program is preprocessed once again for host compilation and is synthesized to embed the fatbinary and transform CUDA specific C++ extensions into standard C++ constructs. Then the C++ host compiler compiles the synthesized host code with the embedded fatbinary into a host object. The exact steps that are followed to achieve this are displayed in Figure 1.

Whenever the host program launches device code, the CUDA runtime system inspects the embedded fatbinary to obtain an appropriate fatbinary image for the current GPU.

CUDA programs are compiled in the whole program compilation mode by default, i.e., the device code cannot reference an entity from a separate file. In the whole program compilation mode, device link steps have no effect. For more information on separate compilation and whole program compilation, refer to Using Separate Compilation in CUDA.

4. NVCC Command Options

4.1. Command Option Types and Notation

Each nvcc option has a long name and a short name, which are interchangeable with each other. These two variants are distinguished by the number of hyphens that must precede the option name: long names must be preceded by two hyphens, while short names must be preceded by a single hyphen. For example, -I is the short name of --include-path. Long options are intended for use in build scripts, where the size of the option is less important than the descriptive value. In contrast, short options are intended for interactive use.

nvcc recognizes three types of command options: boolean options, single value options, and list options.

Boolean options do not have an argument; they are either specified on the command line or not. Single value options must be specified at most once, but list options may be repeated. Examples of each of these option types are, respectively: --verbose (switch to verbose mode), --output-file (specify output file), and --include-path (specify include path).

Single value options and list options must have arguments, which must follow the name of the option itself by either one or more spaces or an equals character. When a one-character short name such as -I, -l, or -L is used, the value of the option may also immediately follow the option itself, without being separated by spaces or an equals character. The individual values of list options may be separated by commas in a single instance of the option, the option may be repeated, or any combination of the two may be used.

Hence, for the two sample options mentioned above that may take values, the following notations are legal:

-o file

-o=file

-Idir1,dir2 -I=dir3 -I dir4,dir5
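To make the notation rules concrete, here is a minimal sketch that collects the values of the include-path list option from an already-split argument vector. It accepts the forms described above: attached (`-Idir`), equals (`-I=dir`, `--include-path=dir`), space-separated (`-I dir`, `--include-path dir`), and comma-separated lists. The functions `split_commas` and `collect_include_paths` are hypothetical helpers, not nvcc internals. Error handling, such as a missing value at the end of the command line, is deliberately simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits "a,b,c" into {"a", "b", "c"}, as for the values of a list option.
std::vector<std::string> split_commas(const std::string& value) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const auto comma = value.find(',', start);
        parts.push_back(value.substr(start, comma - start));  // npos count = rest
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return parts;
}

// Hypothetical helper: gathers include paths from all notations of -I/--include-path.
std::vector<std::string> collect_include_paths(const std::vector<std::string>& args) {
    std::vector<std::string> paths;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        std::string value;
        if (arg == "-I" || arg == "--include-path") {
            if (i + 1 >= args.size()) break;  // simplified: a missing value is ignored
            value = args[++i];                // value separated by spaces
        } else if (arg.rfind("--include-path=", 0) == 0) {
            value = arg.substr(15);           // 15 == length of "--include-path="
        } else if (arg.rfind("-I", 0) == 0) {
            value = arg.substr(2);            // -Idir or -I=dir
            if (!value.empty() && value[0] == '=') value.erase(0, 1);
        } else {
            continue;                         // not an include-path option
        }
        for (const auto& p : split_commas(value)) paths.push_back(p);
    }
    return paths;
}
```

Applied to the sample notation `-Idir1,dir2 -I=dir3 -I dir4,dir5`, this yields `dir1` through `dir5` in order.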

Unless otherwise specified, long option names are used throughout this document. However, short names can be used instead of long names for the same effect.

4.2. Command Option Description

This section presents tables of nvcc options. The option type in the tables can be recognized as follows: Boolean options do not have arguments specified in the first column, while the other two types do. List options can be recognized by the repeat indicator ,... at the end of the argument.

Long options are described in the first column of the options table, and short options occupy the second column.

4.2.1. File and Path Specifications

4.2.1.1. `--output-file file` (`-o`)

Specify name and location of the output file.

4.2.1.2. `--objdir-as-tempdir` (`-objtemp`)

Create all intermediate files in the same directory as the object file. These intermediate files are deleted when the compilation is finished. This option takes effect only if -c, -dc, or -dw is also used. Using this option ensures that the intermediate file name that is embedded in the object file does not change across multiple compiles of the same file. However, this is not guaranteed if the input is stdin. If the same file is compiled with two different sets of options, for example `nvcc -c t.cu` and `nvcc -c -ptx t.cu`, the compilations should be done in different directories. Compiling them in the same directory can cause the compilation to fail or produce incorrect results.

4.2.1.3. `--pre-include file,...` (`-include`)

Specify header files that must be pre-included during preprocessing.

4.2.1.4. `--library library,...` (`-l`)

Specify libraries to be used in the linking stage. Library names are given without the library file extension.

The libraries are searched for on the library search paths that have been specified using option --library-path (see Libraries).

4.2.1.5. `--define-macro def,...` (`-D`)

Define macros to be used during preprocessing.

def can be either name or name=definition.

name - Predefine name as a macro.
name=definition - The contents of definition are tokenized and preprocessed as if they appeared during translation phase three in a #define directive. The definition will be truncated by embedded newline characters.
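The two forms of `def` can be sketched as a small parser. The struct `MacroDef` and the function `parse_define` are hypothetical names for illustration. The sketch assumes that the name ends at the first `=`, which is common compiler practice but not stated above. It applies the documented newline truncation to the definition.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of parsing one --define-macro value.
struct MacroDef {
    std::string name;
    bool has_definition;     // false for the plain "name" form
    std::string definition;  // empty when has_definition is false
};

MacroDef parse_define(const std::string& def) {
    const auto eq = def.find('=');  // assumption: split at the first '='
    if (eq == std::string::npos) return {def, false, ""};
    std::string definition = def.substr(eq + 1);
    const auto nl = definition.find('\n');
    if (nl != std::string::npos) definition.erase(nl);  // truncate at embedded newline
    return {def.substr(0, eq), true, definition};
}
```

The text above does not say which value a macro given in the plain `name` form receives. For that reason, the sketch only records that no definition was supplied.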

4.2.1.6. `--undefine-macro def,...` (`-U`)

Undefine an existing macro during preprocessing or compilation.

4.2.1.7. `--include-path path,...` (`-I`)

Specify include search paths.

4.2.1.8. `--system-include path,...` (`-isystem`)

Specify system include search paths.

4.2.1.9. `--library-path path,...` (`-L`)

Specify library search paths (see Libraries).

4.2.1.10. `--output-directory directory` (`-odir`)

Specify the directory of the output file.

This option is intended for letting the dependency generation step (see --generate-dependencies) generate a rule that defines the target object file in the proper directory.

4.2.1.11. `--dependency-output file` (`-MF`)

Specify the dependency output file.

This option specifies the output file for the dependency generation step (see --generate-dependencies). The option --generate-dependencies or --generate-nonsystem-dependencies must be specified if a dependency output file is set.

4.2.1.12. `--generate-dependency-targets` (`-MP`)

Add an empty target for each dependency.

This option adds phony targets to the dependency generation step (see --generate-dependencies) intended to avoid makefile errors if old dependencies are deleted. The input files are not emitted as phony targets.

4.2.1.13. `--compiler-bindir directory` (`-ccbin`)

Specify the directory in which the default host compiler executable resides.

The host compiler executable name can also be specified to ensure that the correct host compiler is selected. In addition, drive prefix options (--input-drive-prefix, --dependency-drive-prefix, or --drive-prefix) may need to be specified if nvcc is executed in a Cygwin shell or a MinGW shell on Windows.

4.2.1.14. `--allow-unsupported-compiler` (`-allow-unsupported-compiler`)

Disable nvcc check for supported host compiler versions.

Using an unsupported host compiler may cause compilation failure or incorrect runtime execution. Use at your own risk. This option has no effect on macOS.

4.2.1.15. `--archiver-binary executable` (`-arbin`)

Specify the path of the archiver tool used to create a static library with --lib.

4.2.1.16. `--cudart` {`none`|`shared`|`static`|`hybrid`} (`-cudart`)

Specify the type of CUDA runtime library to be used: no CUDA runtime library, shared/dynamic CUDA runtime library, or static CUDA runtime library. On Windows, the shared option has been replaced by a hybrid option, in which a small loader library is linked statically and loads the runtime dynamically from the Display Driver.

Allowed Values

none
shared (non-Windows)
static
hybrid (Windows)

Default

The static CUDA runtime library is used by default except on Windows, where the hybrid approach is the default instead.

4.2.1.17. `--cudadevrt` {`none`|`static`} (`-cudadevrt`)

Specify the type of CUDA device runtime library to be used: no CUDA device runtime library, or static CUDA device runtime library.

Allowed Values

none
static

Default

The static CUDA device runtime library is used by default.

4.2.1.18. `--libdevice-directory directory` (`-ldir`)

Specify the directory that contains the libdevice library files.

Libdevice library files are located in the nvvm/libdevice directory in the CUDA Toolkit.

4.2.1.19. `--target-directory string` (`-target-dir`)

Specify the subfolder name in the targets directory where the default include and library paths are located.

4.2.1.20. `--apply-controls` (`-apply-controls`)

Specify the Advanced Controls File (ACF) that is passed to nvcc and ptxas. ACFs override default compilation behavior with the goal of providing specific optimizations for a specific workload.

Experimental flag: Treat ACFs as per-kernel and per-environment configurations. Using an ACF may cause compilation failure or incorrect runtime execution. Expect failures, and add timeouts, correctness checks, and good logging.

Note: Support for passing an ACF to nvcc is limited to Blackwell and later architectures.

For more information about Advanced Controls Files consult the CompileIQ documentation.

4.2.2. Options for Specifying the Compilation Phase

Options of this category specify up to which stage the input files must be compiled.

4.2.2.1. `--link` (`-link`)

Specify the default behavior: compile and link all input files.

Default Output File Name

a.exe on Windows or a.out on other platforms is used as the default output file name.

4.2.2.2. `--lib` (`-lib`)

Compile all input files into object files, if necessary, and add the results to the specified library output file.

Default Output File Name

a.lib on Windows or a.a on other platforms is used as the default output file name.

4.2.2.3. `--device-link` (`-dlink`)

Link object files with relocatable device code and .ptx, .cubin, and .fatbin files into an object file with executable device code, which can be passed to the host linker.

Default Output File Name

a_dlink.obj on Windows or a_dlink.o on other platforms is used as the default output file name. When this option is used in conjunction with --fatbin, a_dlink.fatbin is used as the default output file name. When this option is used in conjunction with --cubin, a_dlink.cubin is used as the default output file name.

4.2.2.4. `--device-c` (`-dc`)

Compile each .c, .cc, .cpp, .cxx, and .cu input file into an object file that contains relocatable device code.

It is equivalent to --relocatable-device-code=true --compile.

Default Output File Name

The source file name extension is replaced by .obj on Windows and .o on other platforms to create the default output file name. For example, the default output file name for x.cu is x.obj on Windows and x.o on other platforms.

4.2.2.5. `--device-w` (`-dw`)

Compile each .c, .cc, .cpp, .cxx, and .cu input file into an object file that contains executable device code.

It is equivalent to --relocatable-device-code=false --compile.

Default Output File Name

The source file name extension is replaced by .obj on Windows and .o on other platforms to create the default output file name. For example, the default output file name for x.cu is x.obj on Windows and x.o on other platforms.

4.2.2.6. `--cuda` (`-cuda`)

Compile each .cu input file to a .cu.cpp.ii file.

Default Output File Name

.cu.cpp.ii is appended to the basename of the source file name to create the default output file name. For example, the default output file name for x.cu is x.cu.cpp.ii.

4.2.2.7. `--compile` (`-c`)

Compile each .c, .cc, .cpp, .cxx, and .cu input file into an object file.

Default Output File Name

The source file name extension is replaced by .obj on Windows and .o on other platforms to create the default output file name. For example, the default output file name for x.cu is x.obj on Windows and x.o on other platforms.

4.2.2.8. `--fatbin` (`-fatbin`)

Compile all .cu, .ptx, and .cubin input files to device-only .fatbin files. Note that when the input is a CUDA source file, the Tile code will not be part of the fatbin.