CUDA Toolkit Documentation - v6.5 (older) - Last updated August 1, 2014 - Send Feedback -

CUDA Toolkit Documentation v6.5


Release Notes
The Release Notes for the CUDA Toolkit.
EULA
The End User License Agreements for the NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA Display Driver, and NVIDIA NSight (Visual Studio Edition).

Getting Started Guides


Getting Started Linux
This guide discusses how to install and check for correct operation of the CUDA Development Tools on GNU/Linux systems.
Getting Started Mac OS X
This guide discusses how to install and check for correct operation of the CUDA Development Tools on Mac OS X systems.
Getting Started Windows
This guide discusses how to install and check for correct operation of the CUDA Development Tools on Microsoft Windows systems.

Programming Guides


Programming Guide
This guide provides a detailed discussion of the CUDA programming model and programming interface. It then describes the hardware implementation, and provides guidance on how to achieve maximum performance. The Appendixes include a list of all CUDA-enabled devices, detailed description of all extensions to the C language, listings of supported mathematical functions, C++ features supported in host and device code, details on texture fetching, technical specifications of various devices, and concludes by introducing the low-level driver API.
Best Practices Guide
This guide presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures. The intent is to provide guidelines for obtaining the best performance from NVIDIA GPUs using the CUDA Toolkit.
Maxwell Compatibility Guide
This application note is intended to help developers ensure that their NVIDIA CUDA applications will run properly on GPUs based on the NVIDIA Maxwell Architecture. This document provides guidance to ensure that your software applications are compatible with Maxwell.
Kepler Tuning Guide
Kepler is NVIDIA's 3rd-generation architecture for CUDA compute applications. Applications that follow the best practices for the Fermi architecture should typically see speedups on the Kepler architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Kepler architectural features.
Maxwell Tuning Guide
Maxwell is NVIDIA's 4th-generation architecture for CUDA compute applications. Applications that follow the best practices for the Kepler architecture should typically see speedups on the Maxwell architecture without any code changes. This guide summarizes the ways that applications can be fine-tuned to gain additional speedups by leveraging Maxwell architectural features.
PTX ISA
This guide provides detailed instructions on the use of PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.
Developer Guide for Optimus
This document explains how CUDA APIs can be used to query for GPU capabilities in NVIDIA Optimus systems.
Video Decoder
This document provides the video decoder API specification and the format conversion and display using DirectX or OpenGL following decode.
Inline PTX Assembly
This document shows how to inline PTX (parallel thread execution) assembly language statements into CUDA code. It describes available assembler statement parameters and constraints, and the document also provides a list of some pitfalls that you may encounter.

CUDA API References


CUDA Runtime API
The CUDA runtime API.
CUDA Driver API
The CUDA driver API.
CUDA Math API
The CUDA math API.
cuBLAS
The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of NVIDIA Graphical Processing Unit (GPU), but does not auto-parallelize across multiple GPUs.
NVBLAS
The NVBLAS library is a multi-GPUs accelerated drop-in BLAS (Basic Linear Algebra Subprograms) built on top of the NVIDIA cuBLAS Library.
cuFFT
The cuFFT library user guide.
cuRAND
The cuRAND library user guide.
cuSPARSE
The cuSPARSE library user guide.
NPP
NVIDIA NPP is a library of functions for performing CUDA accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility, while maintaining high performance.
Thrust
The Thrust getting started guide.
CUDA Samples
This document contains a complete listing of the code samples that are included with the NVIDIA CUDA Toolkit. It describes each code sample, lists the minimum GPU specification, and provides links to the source code and white papers if available.

Tools


NVCC
This document is a reference guide on the use of the CUDA compiler driver nvcc. Instead of being a specific CUDA compilation driver, nvcc mimics the behavior of the GNU compiler gcc, accepting a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.
CUDA-GDB
The NVIDIA tool for debugging CUDA applications running on Linux and Mac, providing developers with a mechanism for debugging CUDA applications running on actual hardware. CUDA-GDB is an extension to the x86-64 port of GDB, the GNU Project debugger.
CUDA-MEMCHECK
CUDA-MEMCHECK is a suite of run time tools capable of precisely detecting out of bounds and misaligned memory access errors, checking device allocation leaks, reporting hardware errors and identifying shared memory data access hazards.
Nsight Eclipse Edition
Nsight Eclipse Edition getting started guide
Profiler
This is the guide to the Profiler.
CUDA Binary Utilities
The application notes for cuobjdump, nvdisasm, and nvprune.

White Papers


Floating Point and IEEE 754
A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.
Incomplete-LU and Cholesky Preconditioned Iterative Methods
In this white paper we show how to use the cuSPARSE and cuBLAS libraries to achieve a 2x speedup over CPU in the incomplete-LU and Cholesky preconditioned iterative methods. We focus on the Bi-Conjugate Gradient Stabilized and Conjugate Gradient iterative methods, that can be used to solve large sparse nonsymmetric and symmetric positive definite linear systems, respectively. Also, we comment on the parallel sparse triangular solve, which is an essential building block in these algorithms.

Compiler SDK


libNVVM API
The libNVVM API.
libdevice User's Guide
The libdevice library is an LLVM bitcode library that implements common functions for GPU kernels.
NVVM IR
NVVM IR is a compiler IR (internal representation) based on the LLVM IR. The NVVM IR is designed to represent GPU compute kernels (for example, CUDA kernels). High-level language front-ends, like the CUDA C compiler front-end, can generate NVVM IR.

Miscellaneous


CUPTI
The CUPTI API.
Debugger API
The CUDA debugger API.
GPUDirect RDMA
A technology introduced in Kepler-class GPUs and CUDA 5.0, enabling a direct path for communication between the GPU and a third-party peer device on the PCI Express bus when the devices share the same upstream root complex using standard features of PCI Express. This document introduces the technology and describes the steps necessary to enable a GPUDirect RDMA connection to NVIDIA GPUs within the Linux device driver model.