NVIDIA Tegra
DRIVE 5.0 Linux Open Source Software

Development Guide
5.0.10.3 Release


 
Installing CUDA
 
Installing the CUDA Toolkit and Packages on the Host
Installing the CUDA Toolkit on the Target
Tips on Running CUDA Samples
Tips for Increasing the Number of File Descriptors
CUDA Links
This topic explains how to install NVIDIA® CUDA® on your host and target systems. It also provides tips on running CUDA sample applications.
CUDA technology provides a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
Types of CUDA Tools Packages (Host vs. Target)
NVIDIA Linux includes the following two separate and distinct CUDA tools packages:
Host: A Debian package (.deb) that is installed on the amd64 Ubuntu 16.04 Linux host system. It provides all the essential tools (compiler, linker, libraries, samples, and documentation) for building CUDA applications by cross-compiling for arm64 target systems. This is the primary CUDA tools installation for all production-level software to be run on the target system, and it is the required and recommended CUDA tools package for the Linux release. In addition, production CUDA applications should be statically linked with supplementary CUDA libraries (cuBLAS, cuFFT, etc.) into a single binary for compatibility with both Ubuntu-based and Yocto-based target systems.
Target: A run file (.run) that provides a native-only, arm64-based CUDA toolkit for the arm64 Ubuntu 16.04 root file system on the target. Note that this toolkit installs only on the arm64 Ubuntu 16.04 target root file system; it does not install on the Yocto GENIVI reference target root file system. It is provided as a convenience for developing CUDA applications directly on the Ubuntu 16.04 target system, and it supplies the full set of arm64 libraries needed for cross-compiling with the host CUDA toolkit. Building CUDA applications directly on the target with this toolkit is not recommended for production-level CUDA applications.
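As an illustration of the host flow, a cross-compile invocation might look like the following sketch. The CUDA version (9.0), the sample source file name (vectorAdd.cu), and the use of the Ubuntu aarch64-linux-gnu-g++ cross compiler are illustrative assumptions, not values from this release; substitute the version given in the Release Notes.

```shell
# Hedged sketch of cross-compiling a CUDA application on the host for an
# arm64 target. CUDA_VER and the source file name are placeholders.
CUDA_VER="9.0"   # example value; use the CUDA version from the Release Notes
CUDA_HOME="/usr/local/cuda-${CUDA_VER}"

# Build only when the toolkit and cross compiler are actually present, so
# this sketch is safe to run on a host that is not yet set up.
if [ -x "${CUDA_HOME}/bin/nvcc" ] && command -v aarch64-linux-gnu-g++ >/dev/null; then
    "${CUDA_HOME}/bin/nvcc" \
        -ccbin aarch64-linux-gnu-g++ \
        --cudart static \
        -o vectorAdd vectorAdd.cu
    # Supplementary libraries (cuBLAS, cuFFT, ...) can be linked statically
    # via their _static variants, per the guidance above.
else
    echo "CUDA cross toolchain not found; install the host packages first"
fi
```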
Prerequisites
Before installing CUDA, remove all previously installed CUDA packages on your host system.
CUDA packages require approximately 2 GB of disk space.
Installing the CUDA Toolkit and Packages on the Host
In this step, you install CUDA toolkit and installation packages to your amd64 Ubuntu 16.04 host system.
To install the CUDA toolkit and packages on the host
1. Copy the provided CUDA toolkit Debian package, cuda-repo-ubuntu-<deb_ver>_<dist_ver>-amd64.deb, from the separate CUDA folder included with the release.
2. Install the CUDA toolkit on the host by entering the following commands:
sudo dpkg -i cuda-repo-ubuntu-<deb_ver>_<dist_ver>-amd64.deb
sudo dpkg --add-architecture arm64
sudo apt-get update
sudo apt-get install cuda-toolkit-<ver>
sudo apt-get install cuda-cross-aarch64-<ver>
Where
<deb_ver> is the version of the CUDA Debian package.
<dist_ver> is your 64-bit host Ubuntu distribution version, 14.04 or 16.04.
<ver> is the CUDA version as specified in the Release Notes.
This installs CUDA in the following location:
/usr/local/cuda-<ver>
 
Note:
Cross-compile libraries required for some applications may not be installed by the steps above. The complete set of libraries must be copied from the target CUDA installation directory.
 
3. After installing the CUDA toolkit on the Ubuntu 16.04 target, perform the following steps on the host:
Copy the contents of /usr/local/cuda-<ver>/targets/aarch64-linux/lib/ on the target to the same location on the host system.
It is also possible to remote-mount this directory from the target to the host.
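The copy in step 3 might be performed with a sketch like the following, run on the host. The target address here is a hypothetical placeholder (an RFC 5737 documentation address) and rsync/ssh connectivity to the booted target is assumed; adjust both to your setup.

```shell
# Hedged sketch: pull the arm64 CUDA libraries from the booted target back
# to the same path on the host. TARGET and CUDA_VER are placeholders.
TARGET="ubuntu@192.0.2.1"   # hypothetical address; replace with your target's
CUDA_VER="9.0"              # example value; use the version from the Release Notes
LIBDIR="/usr/local/cuda-${CUDA_VER}/targets/aarch64-linux/lib"

# Skip quietly when no target is reachable, so the sketch is safe to run
# on a host without a connected board.
if command -v rsync >/dev/null && ping -c 1 -W 1 "${TARGET#*@}" >/dev/null 2>&1; then
    rsync -av "${TARGET}:${LIBDIR}/" "${LIBDIR}/"
else
    echo "target not reachable; skipping library copy"
fi
```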
For more information on host CUDA installation for cross-compilation, see the NVIDIA CUDA Getting Started Guide for Linux.
Installing the CUDA Toolkit on the Target
In this step, you run a script that places a run-once CUDA installation package on the Ubuntu 16.04 target system. When you next flash and boot the target, the package installs CUDA on the Ubuntu 16.04 target file system.
Important:
Run the CUDA installer before flashing and booting the target so that the target picks up the target-side installs. The CUDA installer can be run after first boot if the rootfs is NFS-mounted. If the rootfs is mounted from local target memory, the target must be re-flashed with the updated reference target file system.
 
Note:
The CUDA EGL Stream interoperability has the following known issues:
The API is not finalized and is subject to change.
The header file, cudaEGL.h, is not part of the host CUDA toolkit. The file must be copied from the target to the host to support cross-development of CUDA applications that use EGL Stream interoperability.
 
Prerequisites
You have installed the CUDA toolkit and packages to the host system.
The target is connected to the host system.
If you are installing CUDA to a USB-mounted root file system, you have prepared that root file system.
To install the CUDA toolkit on the target
Before setting up the rootfs, and before the first boot of the target, execute the CUDA installation script on the host:
bash drive-t186ref-cuda-5.0.10.3.run
The run script installs the following self-deleting run-once package on the target:
drive-setup.sh install-run-once-pkgs
The next time the target is flashed, the run-once package executes and installs the CUDA toolkit on the target.
Tips on Running CUDA Samples
This topic provides tips on running CUDA samples.
Some CUDA samples must be run from a desktop manager, such as xfce4.
The following warning message may appear when launching CUDA applications. The message can safely be ignored:
libkmod: ERROR ../libkmod/libkmod.c:554 kmod_search_moddep:
could not open moddep file '/lib/modules/3.10.17-g3850ea5/modules.dep.bin'
The samples are installed in the following directory: NVIDIA_CUDA-<ver>_Samples
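As a hedged example, building and running one sample natively on the Ubuntu target might look like this. The samples path, the CUDA version, and the choice of the deviceQuery sample are illustrative assumptions; adjust them to your install.

```shell
# Hedged sketch: build and run a single CUDA sample on the Ubuntu target.
# SAMPLES_DIR and the version in its name are assumptions for illustration.
SAMPLES_DIR="${HOME}/NVIDIA_CUDA-9.0_Samples"

# Only build when the samples are actually installed, so the sketch is
# safe to run on a system without the target CUDA toolkit.
if [ -d "${SAMPLES_DIR}/1_Utilities/deviceQuery" ]; then
    make -C "${SAMPLES_DIR}/1_Utilities/deviceQuery"
    "${SAMPLES_DIR}/1_Utilities/deviceQuery/deviceQuery"
else
    echo "CUDA samples not found; install the target CUDA toolkit first"
fi
```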
Tips for Increasing the Number of File Descriptors
Each system has a default limit on the number of file descriptors available to processes. A process that needs more file descriptors than the limit must request the increase itself. There are three ways to increase the number of file descriptors:
Calling setrlimit (declared in sys/resource.h) from within the program. This can raise the soft limit up to the hard limit; raising the hard limit itself, up to a maximum of 1024*1024, requires sudo or root permissions.
Setting ulimit -n <value> in the shell (parent process), so that any child process it starts uses the raised limit. An unprivileged shell can make this change once without sudo or root permissions; because ulimit -n sets both the soft and hard limits, the value cannot be raised again afterwards without them.
Setting the LimitNOFILE property in the [Service] section of the systemd unit file. The service must be started with root privileges.
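The shell option above can be illustrated with a small sketch; the value 1024 is only an example:

```shell
# Illustration of the ulimit option: change the file-descriptor limit in a
# subshell so the current shell is unaffected. Raising the soft limit above
# the hard limit (see `ulimit -Hn`) fails without root; setting it at or
# below the hard limit does not.
(
    ulimit -n 1024   # example value; children of this subshell inherit it
    ulimit -n        # print the new soft limit
)
```

For the systemd option, the equivalent would be a line such as LimitNOFILE=1024 in the unit's [Service] section.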
 
CUDA Links
More information on getting started with CUDA is available in NVIDIA CUDA Getting Started Guide for Linux:
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html
Instructions on cross-compiling are available in NVIDIA CUDA Compiler Driver NVCC:
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
More information on CUDA is also available at:
http://docs.nvidia.com/cuda