2. Basic Usage

2.1. Linking with libdevice

The libdevice library ships as an LLVM bitcode library and is meant to be linked with the target module early in the compilation process. The standard process is to first link libdevice with the target module, then run the standard LLVM optimization and code generation passes. This allows the optimizers to inline and analyze the library functions that are used, and to eliminate any unused functions as dead code.

Users of libnvvm can link with libdevice by adding the appropriate libdevice module to the nvvmProgram object being compiled. In addition, the following options for nvvmCompileProgram affect the behavior of libdevice functions:

Table 1. Supported Reflection Parameters
Parameter   Value        Description
-ftz        0 (default)  Preserve denormal values when performing single-precision floating-point operations
-ftz        1            Flush denormal values to zero when performing single-precision floating-point operations
-prec-div   0            Use a faster approximation for single-precision floating-point division and reciprocals
-prec-div   1 (default)  Use IEEE round-to-nearest mode for single-precision floating-point division and reciprocals
-prec-sqrt  0            Use a faster approximation for single-precision floating-point square root
-prec-sqrt  1 (default)  Use IEEE round-to-nearest mode for single-precision floating-point square root
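Each parameter in Table 1 is passed to nvvmCompileProgram as a string of the form -name=value, like the "-ftz=1" option used in this section's example. The following is a minimal sketch of how a client might assemble all three option strings from integer settings. The helper buildReflectionOptions and the constant NUM_REFLECTION_OPTIONS are hypothetical names chosen for illustration and are not part of libnvvm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_REFLECTION_OPTIONS 3

/* Hypothetical helper (not part of libnvvm): fills options[] with one
 * string per Table 1 parameter. Each setting must be 0 or 1.
 * Returns the number of options written, or -1 on an invalid setting. */
int buildReflectionOptions(int ftz, int precDiv, int precSqrt,
                           const char *options[NUM_REFLECTION_OPTIONS])
{
    assert(options != NULL);

    if ((ftz != 0 && ftz != 1) ||
        (precDiv != 0 && precDiv != 1) ||
        (precSqrt != 0 && precSqrt != 1))
        return -1;

    /* String literals have static storage, so the pointers stay valid
     * after this function returns. */
    options[0] = ftz      ? "-ftz=1"       : "-ftz=0";
    options[1] = precDiv  ? "-prec-div=1"  : "-prec-div=0";
    options[2] = precSqrt ? "-prec-sqrt=1" : "-prec-sqrt=0";
    return NUM_REFLECTION_OPTIONS;
}
```

The returned count and the filled array can then be passed as the second and third arguments of nvvmCompileProgram.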

The following pseudo-code shows an example of linking an NVVM IR module with the libdevice library using libnvvm:

nvvmProgram prog;
size_t libdeviceModSize;

const char *libdeviceMod = loadFile("/path/to/libdevice.*.bc",
                                    &libdeviceModSize);
const char *myIr = /* NVVM IR in text or binary format */;
size_t myIrSize = /* size of myIr in bytes */;

// Create NVVM program object
nvvmCreateProgram(&prog);

// Add libdevice module to program (the last argument is a module name
// chosen by the caller)
nvvmAddModuleToProgram(prog, libdeviceMod, libdeviceModSize, "libdevice");

// Add custom IR to program
nvvmAddModuleToProgram(prog, myIr, myIrSize, "myIr");

// Declare compile options
const char *options[] = { "-ftz=1" };

// Compile the program
nvvmCompileProgram(prog, 1, options);

It is the responsibility of the client program to locate and read the libdevice library binary (represented by the loadFile function in the example).
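One possible way to write such a loader is sketched below, using only the C standard library. The name loadFile matches the placeholder in the example, but the body is an assumption about what a client might write and is not supplied by libdevice or libnvvm. The file is opened in binary mode because the library is LLVM bitcode.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
 * On success, returns the buffer and stores its length in *size.
 * On failure, returns NULL and leaves *size untouched.
 * The caller owns the buffer and must free() it. */
char *loadFile(const char *path, size_t *size)
{
    assert(path != NULL && size != NULL);

    FILE *fp = fopen(path, "rb");   /* bitcode is binary data */
    if (fp == NULL)
        return NULL;

    char *buf = NULL;
    long len = -1;

    /* Find the file length by seeking to the end, then rewind. */
    if (fseek(fp, 0, SEEK_END) == 0 &&
        (len = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        /* +1 so that an empty file still yields a non-NULL buffer */
        buf = malloc((size_t)len + 1);
        if (buf != NULL && fread(buf, 1, (size_t)len, fp) != (size_t)len) {
            free(buf);   /* short read: treat as failure */
            buf = NULL;
        }
    }
    fclose(fp);

    if (buf != NULL)
        *size = (size_t)len;
    return buf;
}
```

Note that fopen does not expand wildcards, so the client must resolve a pattern such as libdevice.*.bc to a concrete file name before calling this helper.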

2.2. Selecting Library Version

The libdevice library ships in several versions, each tuned for optimal performance on a particular device architecture. The following table provides a guideline for choosing the best libdevice version for the target architecture. All versions can be found in the CUDA Toolkit under nvvm/libdevice/<library-name>.

Table 2. Library version selection guidelines
Compute Capability   Library
2.0 ≤ Arch < 3.0     libdevice.compute_20.XX.bc
Arch = 3.0           libdevice.compute_30.XX.bc
3.1 ≤ Arch < 3.5     libdevice.compute_20.XX.bc
3.5 ≤ Arch ≤ 3.7     libdevice.compute_35.XX.bc
3.7 < Arch < 5.0     libdevice.compute_30.XX.bc
5.0 ≤ Arch ≤ 5.3     libdevice.compute_50.XX.bc
Arch > 5.3           libdevice.compute_30.XX.bc

The XX in the library name corresponds to the libdevice library version number.
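The guidelines in Table 2 can be encoded as a small lookup, sketched below. The helpers libdeviceArchTag and libdeviceFileName are hypothetical names introduced for illustration. The sketch assumes the compute capability is given as separate major and minor integers, and the caller supplies the XX version part as a string because this section does not specify it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps a compute capability (major.minor) to the
 * architecture tag used in the library name, following Table 2.
 * Returns NULL below 2.0, which the table does not cover. */
const char *libdeviceArchTag(int major, int minor)
{
    assert(minor >= 0 && minor <= 9);
    int arch = major * 10 + minor;   /* e.g. 3.5 becomes 35 */

    if (arch < 20)  return NULL;
    if (arch < 30)  return "compute_20";   /* 2.0 <= Arch < 3.0  */
    if (arch == 30) return "compute_30";   /* Arch = 3.0         */
    if (arch < 35)  return "compute_20";   /* 3.1 <= Arch < 3.5  */
    if (arch <= 37) return "compute_35";   /* 3.5 <= Arch <= 3.7 */
    if (arch < 50)  return "compute_30";   /* 3.7 < Arch < 5.0   */
    if (arch <= 53) return "compute_50";   /* 5.0 <= Arch <= 5.3 */
    return "compute_30";                   /* Arch > 5.3         */
}

/* Hypothetical helper: writes "libdevice.<tag>.<version>.bc" into out.
 * Returns 0 on success, or -1 if the architecture is not covered, the
 * version string is empty, or the buffer is too small. */
int libdeviceFileName(char *out, size_t outSize,
                      int major, int minor, const char *version)
{
    assert(out != NULL && version != NULL);

    const char *tag = libdeviceArchTag(major, minor);
    if (tag == NULL || strlen(version) == 0)
        return -1;

    int n = snprintf(out, outSize, "libdevice.%s.%s.bc", tag, version);
    return (n >= 0 && (size_t)n < outSize) ? 0 : -1;
}
```

For example, calling libdeviceFileName with major 3, minor 5, and the placeholder version "XX" writes libdevice.compute_35.XX.bc into the buffer.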