Abstract

The Caffe User Guide provides a detailed overview of using and customizing the Caffe deep learning framework. This guide also documents the NVIDIA Caffe parameters that you can use to apply the optimizations of the container in your environment.

1. Overview of Caffe

Caffe is a deep-learning framework made with flexibility, speed, and modularity in mind. It was originally developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

NVIDIA® Caffe is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations. NVIDIA Caffe includes:
  • Support for 16-bit (half) floating-point training and inference.
  • Mixed-precision support. Data can be stored and/or computed in 64-, 32-, or 16-bit formats. Precision can be defined for each layer (the forward and backward phases can differ too), or it can be set to a default for the whole Net.
  • Integration with cuDNN® v6.
  • Automatic selection of the best cuDNN convolution algorithm.
  • Integration with v1.3.4 of NVIDIA Collective Communications Library (NCCL®) for improved multi-GPU scaling.
  • Optimized GPU memory management for data and parameters storage, I/O buffers and workspace for convolutional layers.
  • Parallel data parser and transformer for improved I/O performance.
  • Parallel back-propagation and gradient reduction on multi-GPU systems.
  • Fast solvers implementation with fused CUDA® kernels for weights and history update.
  • Multi-GPU test phase for even memory load across multiple GPUs.
  • Backward compatibility with BVLC Caffe and NVIDIA Caffe 0.15.
  • Extended set of optimized models (including 16-bit floating point examples).

1.1. Contents of the NVIDIA Caffe Container

This image contains source and binaries for NVIDIA® Caffe. The pre-built and installed version of NVIDIA Caffe is located in the /usr/local/[bin,share,lib] directories. The complete source code is located in the /opt/caffe directory.

This container image also includes pycaffe, which makes the Caffe interfaces available for use through Python.

The NVIDIA Collective Communications Library (NCCL®) and the NVIDIA Caffe bindings for NCCL are installed in this container, and models using multiple GPUs automatically leverage this library for fast parallel training.

2. Pulling NVIDIA Caffe

You can pull (download) an NVIDIA® container that is already built, tuned, tested, and ready to run. Each NVIDIA deep learning container includes the code required to build the framework so that you can make changes to the internals. The containers do not contain sample data-sets or sample model definitions unless they are included with the source for the framework.

Containers are available for download from the DGX™ Container Registry (nvcr.io), where NVIDIA provides a number of containers. If your organization has provided you with access to any custom containers, you can download them as well.

The location of the framework source is in /opt/<framework> in each container.

Before pulling an NVIDIA Docker container, ensure that the following prerequisites are met:
  • You have read access to the registry space that contains the container.
  • You are logged into DGX™ Container Registry. For more information, see the NVIDIA Docker Container for Deep Learning Frameworks: Quick Start Guide.
  • You are a member of the docker group, which enables you to use docker commands.
Tip: To browse the available containers in the DGX™ Container Registry, use a web browser to log in to your NVIDIA® DGX™ Cloud Services account on the DGX Cloud Services website.

Use the docker pull command to pull images from the NVIDIA DGX Container Registry or go to GitHub and download the source.

For step-by-step instructions on how to pull a container, see the NVIDIA Docker Containers for Deep Learning Frameworks: Quick Start Guide.

After pulling a container, you can run jobs in the container to run neural networks, deploy deep learning models, and perform AI analytics.

3. Verifying NVIDIA Caffe

After you run NVIDIA Caffe, it is a good idea to verify that the container image is running correctly. To do this, issue the following commands from within the container:
# cd /opt/caffe
# data/mnist/get_mnist.sh
# examples/mnist/create_mnist.sh
# examples/mnist/train_lenet.sh
If everything is running correctly, Caffe should download and create a data set, and then start training LeNet. If the training is successful, you will see output similar to the following towards the end:
I0402 15:08:01.016016 33 solver.cpp:431] Iteration 10000, loss = 0.0342847
I0402 15:08:01.016043 33 solver.cpp:453] Iteration 10000, Testing net (#0)
I0402 15:08:01.085050 38 data_reader.cpp:128] Restarting data pre-fetching
I0402 15:08:01.087720 33 solver.cpp:543] Test net output #0: accuracy = 0.9587
I0402 15:08:01.087751 33 solver.cpp:543] Test net output #1: loss = 0.130223 (* 1 = 0.130223 loss)
I0402 15:08:01.087767 33 caffe.cpp:239] Solver performance on device 0: 498.3 * 64 = 3.189e+04 img/sec
I0402 15:08:01.087780 33 caffe.cpp:242] Optimization Done in 24s

If Caffe is not running properly, or failed during the pulling phase, check your internet connection.
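
The check above can also be scripted. The following is a minimal sketch in Python, assuming the training output has been captured as text. The names final_accuracy and training_finished are illustrative helpers and are not part of Caffe; they only look for the accuracy and Optimization Done lines shown in the sample output.

```python
import re

# Matches lines such as:
#   I0402 15:08:01.087720 33 solver.cpp:543] Test net output #0: accuracy = 0.9587
ACCURACY_RE = re.compile(r"Test net output #\d+: accuracy = ([0-9]*\.?[0-9]+)")


def final_accuracy(log_text):
    """Return the last reported test accuracy, or None if none was logged."""
    matches = ACCURACY_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def training_finished(log_text):
    """Return True if the log contains the 'Optimization Done' marker."""
    return "Optimization Done" in log_text


# Two lines copied from the sample output in this section.
sample_log = (
    "I0402 15:08:01.087720 33 solver.cpp:543] Test net output #0: accuracy = 0.9587\n"
    "I0402 15:08:01.087780 33 caffe.cpp:242] Optimization Done in 24s\n"
)

if __name__ == "__main__":
    print(final_accuracy(sample_log), training_finished(sample_log))
```

A missing accuracy line (a None result) suggests that training did not reach the test phase.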

4. Running NVIDIA Caffe

To run a container, you must issue the nvidia-docker run command, specifying the registry, repository, and tags.

Before you can run an NVIDIA Docker deep learning framework container, you must have nvidia-docker installed. For more information, see Installing Docker and NVIDIA Docker in the Quick Start Guide.
  1. As a user, run the container interactively.
    $ nvidia-docker run --rm -ti nvcr.io/nvidia/<framework>

    The following example runs the December 2016 release (16.12) of the NVIDIA Caffe container in interactive mode. The container is automatically removed when the user exits the container.

    $ nvidia-docker run --rm -ti nvcr.io/nvidia/caffe:16.12
    
    ===========
    == Caffe ==
    ===========
    
    NVIDIA Release 16.12 (build 6217)
    
    Container image Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    root@df57eb8e0100:/workspace#
    Note: You are now the root user in the container.
  2. From within the container, start the job that you want to run. The precise command to run depends on the deep learning framework in the container that you are running and the job that you want to run. For details see the /workspace/README.md file for the container.

    The following example runs the caffe time command on one GPU to measure the execution time of the deploy.prototxt model.

    # caffe time -model models/bvlc_alexnet/deploy.prototxt -gpu=0
  3. Optional: Run the December 2016 release (16.12) of the same NVIDIA Caffe container but in non-interactive mode.
    $ nvidia-docker run --rm nvcr.io/nvidia/caffe:16.12 caffe time -model \
          /workspace/models/bvlc_alexnet/deploy.prototxt -gpu=0

5. Customizing and Extending NVIDIA Caffe

NVIDIA Docker images come prepackaged, tuned, and ready to run; however, you may want to build a new image from scratch or augment an existing image with custom code, libraries, data, or settings for your corporate infrastructure. This section will guide you through exercises that will highlight how to create a container from scratch, customize a container, extend a deep learning framework to add features, develop some code using that extended framework from the developer environment, then package that code as a versioned release.

By default, you do not need to build a container. The DGX-1 container repository from NVIDIA, nvcr.io, has a number of containers that can be used immediately. These include containers for deep learning as well as containers with just the CUDA Toolkit.

One of the great things about containers is that they can be used as starting points for creating new containers. This can be referred to as “customizing” or “extending” a container. You can create a container completely from scratch; however, because these containers are likely to run on the DGX-1, it is recommended that you at least start with an nvcr.io container that contains the OS and CUDA. You are not limited to this approach: you can also create a container that runs only on the CPUs in the DGX-1 and does not use the GPUs, in which case you can start with a bare OS container from Docker Hub. To make development easier, you can still start with a container that includes CUDA; CUDA is simply not used when the container runs.

The customized or extended containers can be saved to a user’s private container repository. They can also be shared with other users of the DGX-1 but this requires some administrator help.

It is important to note that all NVIDIA Docker deep learning framework images include the source to build the framework itself as well as all of the prerequisites.
Attention: Do not install an NVIDIA driver into the docker image at docker build time. nvidia-docker is essentially a wrapper around docker that transparently provisions a container with the necessary components to execute code on the GPU.

A best practice is to avoid using docker commit for developing new Docker images, and to use Dockerfiles instead. The Dockerfile method provides visibility into, and efficient version control of, the changes made during development of a Docker image. The docker commit method is appropriate for short-lived, disposable images only (see Example 2: Customizing NVIDIA Caffe using docker commit for an example).

For more information on writing a Dockerfile, see the best practices documentation.

5.1. Benefits and Limitations to Customizing NVIDIA Caffe

You can customize a container to fit your specific needs for numerous reasons; for example, you depend upon specific software that is not included in the container that NVIDIA provides. No matter your reasons, you can customize a container.

The container images do not contain sample data-sets or sample model definitions unless they are included with the framework source. Be sure to check the container for sample data-sets or models.

5.2. Example 1: Customizing NVIDIA Caffe using Dockerfile

This example uses a Dockerfile to customize the caffe container in nvcr.io. Before customizing the container, ensure that the caffe 17.03 container has been pulled from the registry using the docker pull command.
$ docker pull nvcr.io/nvidia/caffe:17.03

As mentioned earlier in this document, the Docker containers on nvcr.io also provide a sample Dockerfile that explains how to patch a framework and rebuild the Docker image. In the directory /workspace/docker-examples, there are two sample Dockerfiles. For this example, we will use the Dockerfile.customcaffe file as a template for customizing a container.

  1. Create a working directory called my_docker_images on your local hard drive.
  2. Open a text editor and create a file called Dockerfile. Save the file to your working directory.
  3. Open your Dockerfile again and include the following lines in the file:
    FROM nvcr.io/nvidia/caffe:17.03
    # APPLY CUSTOMER PATCHES TO CAFFE
    # Bring in changes from outside container to /tmp
    # (assumes my-caffe-modifications.patch is in same directory as Dockerfile)
    #COPY my-caffe-modifications.patch /tmp
    
    # Change working directory to NVCaffe source path
    WORKDIR /opt/caffe
    
    # Apply modifications
    #RUN patch -p1 < /tmp/my-caffe-modifications.patch
    
    # Note that the default workspace for caffe is /workspace
    RUN mkdir build && cd build && \
      cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local -DUSE_NCCL=ON \
        -DUSE_CUDNN=ON -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="35 52 60 61" \
        -DCUDA_ARCH_PTX="61" .. && \
      make -j"$(nproc)" install && \
      make clean && \
      cd .. && rm -rf build
    
    # Reset default working directory
    WORKDIR /workspace
    Save the file.
  4. Build the image using the docker build command and specify the repository name and tag. In the following example, the repository name is corp/caffe and the tag is 17.03.1PlusChanges. In this case, the command is:
    $ docker build -t corp/caffe:17.03.1PlusChanges .
  5. Run the Docker image using the nvidia-docker run command. For example:
    $ nvidia-docker run -ti --rm corp/caffe:17.03.1PlusChanges

5.3. Example 2: Customizing NVIDIA Caffe using docker commit

This example uses the docker commit command to flush the current state of the container to a Docker image. This is not a recommended best practice; however, it is useful when you have made changes to a running container and want to save them. In this example, we use the apt-get command to install packages, which requires that the user run as root.
Note:
  • The Caffe image release 17.04 is used in the example instructions for illustrative purposes.
  • Do not use the --rm flag when running the container. If you use the --rm flag when running the container, your changes will be lost when exiting the container.
  1. Pull the Docker container from the nvcr.io repository to the DGX-1 system. For example, the following command will pull the Caffe container:
    $ docker pull nvcr.io/nvidia/caffe:17.04
  2. Run the container on the DGX-1 using nvidia-docker.
    $ nvidia-docker run -ti nvcr.io/nvidia/caffe:17.04
    ==================
    == NVIDIA Caffe ==
    ==================
    
    NVIDIA Release 17.04 (build 26740)
    
    Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for NVIDIA Caffe.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    
    root@1fe228556a97:/workspace#
  3. You should now be the root user in the container (notice the prompt). You can use the command apt to pull down a package and put it in the container.
    Note: The NVIDIA containers are built using Ubuntu, which uses the apt-get package manager. Check the container release notes in the Deep Learning Documentation for details on the specific container you are using.
    In this example, we will install Octave, the GNU clone of MATLAB, into the container.
    # apt-get update
    # apt install octave
    Note: You have to issue apt-get update before you install Octave using apt.
  4. Exit the workspace.
    # exit
  5. Display the list of containers using docker ps -a. As an example, here is some of the output from the docker ps -a command:
    $ docker ps -a
    CONTAINER ID    IMAGE                        CREATED       ...
    1fe228556a97    nvcr.io/nvidia/caffe:17.04   3 minutes ago ...
  6. Now you can create a new image from the container in which you installed Octave. Commit the container with the following command.
    $ docker commit 1fe228556a97 nvcr.io/nvidian_sas/caffe_octave:17.04
    sha256:0248470f46e22af7e6cd90b65fdee6b4c6362d08779a0bc84f45de53a6ce9294
    
  7. Display the list of images.
    $ docker images
    REPOSITORY                         TAG      IMAGE ID     ...
    nvcr.io/nvidian_sas/caffe_octave   17.04    75211f8ec225 ...
  8. To verify, let's run the container again and see if Octave is actually there.
    $ nvidia-docker run -ti nvcr.io/nvidian_sas/caffe_octave:17.04
    ==================
    == NVIDIA Caffe ==
    ==================
    
    NVIDIA Release 17.04 (build 26740)
    
    Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved. Copyright (c) 2014, 2015, The Regents of the University of California (Regents) All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved. NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for NVIDIA Caffe.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    
    root@2fc3608ad9d8:/workspace# octave
    octave: X11 DISPLAY environment variable not set
    octave: disabling GUI features
    GNU Octave, version 4.0.0
    Copyright (C) 2015 John W. Eaton and others.
    This is free software; see the source code for copying conditions.
    There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.  For details, type 'warranty'.
    
    Octave was configured for "x86_64-pc-linux-gnu".
    
    Additional information about Octave is available at http://www.octave.org.
    
    Please contribute if you find this software useful.
    For more information, visit http://www.octave.org/get-involved.html
    
    Read http://www.octave.org/bugs.html to learn how to submit bug reports.
    For information about changes from previous versions, type 'news'.
    
    octave:1>

    Because the Octave prompt is displayed, Octave is installed.

  9. If you want to save the container into your private repository (Docker uses the phrase “push”), then you can use the command docker push ....
    $ docker push nvcr.io/nvidian_sas/caffe_octave:17.04

The new Docker image is now available for use. You can check your local Docker repository for it.

6. NVIDIA Caffe Parameters

Within the Caffe container, there is a caffe.proto file that NVIDIA has updated. The modifications that NVIDIA made are described in the following sections. The added parameters help you apply the optimizations of the container in your environment.

6.1. Parameter Definitions

Boolean
A boolean is a data type with two possible values: true and false. In a prototxt file, write the value as the literal true or false, without quotation marks.
Enumerated
There are two types of enumerated values:
  • Type affects the math and storage precision. The acceptable values are:
    DOUBLE
    64-bit (also referred to as double precision) floating point type.
    FLOAT
    32-bit floating point type. This is the most common and default one.
    FLOAT16
    16-bit floating point type.
  • Engine affects the compute engine. The acceptable values are:
    DEFAULT
    Default implementation of algorithms and routines. Usually equal to CAFFE or CUDNN.
    CAFFE
    Basic CPU or GPU based implementation.
    CUDNN
    Advanced implementation based on highly optimized cuDNN library.
Floating Point Number
A floating point number has no fixed number of digits before or after the decimal point; that is, the decimal point can float and can be placed anywhere.
Integer
An integer is any whole number that is positive, negative, or zero.
String
A string is a sequence of characters of arbitrary length.

6.2. Added and Modified Parameters

In addition to the parameters within the caffe.proto file included in the BVLC Caffe container, the following parameters have either been added or modified in the NVIDIA Caffe version.

For parameters not mentioned in this guide, see BVLC.

6.2.1. SolverParameter

The SolverParameter sets the solver's parameters.
Setting Value
Type enum
Required yes
Default value FLOAT
Level solver

Usage Example

net: "train_val_fp16.prototxt"
test_iter: 1042
test_interval: 5000
base_lr: 0.03
lr_policy: "poly"
power: 2
display: 100
max_iter: 75000
momentum: 0.9
weight_decay: 0.0005
snapshot: 150000
snapshot_prefix: "snapshots/alexnet_fp16"
solver_mode: GPU
random_seed: 1371
snapshot_after_train: false
solver_data_type: FLOAT16

6.2.1.1. solver_data_type

The solver_data_type parameter is the type used for storing weights and history.
Setting Value
Type enum
Required no
Default value FLOAT
Level solver
Usage Example
solver_data_type: FLOAT16

6.2.1.2. min_lr

The min_lr parameter sets a lower threshold for the learning rate (lr). Setting it to a positive value ensures that the learning rate stays larger than 0.
Setting Value
Type float
Required no
Default value 0
Level solver
Usage Example
net: "train_val_fp16.prototxt"
test_iter: 1042
test_interval: 5000
base_lr: 0.03
min_lr: 1e-5
lr_policy: "poly"
...
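
To illustrate how a lower threshold interacts with a decaying schedule, the following Python sketch evaluates the poly policy, base_lr * (1 - iter / max_iter) ^ power, and then applies min_lr as a floor. Treating min_lr as a simple clamp is an assumption made for illustration, and the function name poly_lr is hypothetical; the numeric values come from the usage examples in this guide.

```python
def poly_lr(iteration, base_lr, max_iter, power, min_lr=0.0):
    """Polynomial learning-rate decay with a floor.

    The floor (max with min_lr) is an assumed interpretation of min_lr,
    not a copy of the solver implementation.
    """
    rate = base_lr * (1.0 - iteration / max_iter) ** power
    return max(rate, min_lr)


if __name__ == "__main__":
    for it in (0, 37500, 74000, 75000):
        print(it, poly_lr(it, base_lr=0.03, max_iter=75000, power=2, min_lr=1e-5))
```

Without the floor, the poly schedule reaches exactly 0 at max_iter; with min_lr: 1e-5, the final iterations in this sketch still use a small, non-zero rate.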

6.2.1.3. store_blobs_in_old_format

If set to true, the store_blobs_in_old_format parameter stores blobs in an old, less efficient BVLC-compatible format:
  • FP16 blobs are converted to FP32 and stored in the data container.
  • FP32 blobs are stored in the data container.
  • FP64 blobs are stored in the double_data container.
In rare cases, when the model is trained in NVCaffe but deployed to BVLC Caffe, this parameter ensures there is BVLC compatibility.
Setting Value
Type boolean
Required no
Default value false
Level solver
Usage Example
store_blobs_in_old_format: true

6.2.2. NetParameter

The NetParameter parameter controls the layers that make up the net. If NetParameter is set, it controls all of the layers within the LayerParameter. Each of the configurations, including connectivity and behavior, is specified as a LayerParameter.
Setting Value
Type type
Required no
Default value FLOAT
Level layer

Usage Example

name: "AlexNet-fp16"

default_forward_type: FLOAT16
default_backward_type: FLOAT16

default_forward_math: FLOAT
default_backward_math: FLOAT

6.2.2.1. default_forward_type

The default_forward_type parameter is the default data storage type used in the forward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_forward_type: FLOAT16

6.2.2.2. default_backward_type

The default_backward_type parameter is the default data storage type used in the backward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_backward_type: FLOAT16

6.2.2.3. default_forward_math

The default_forward_math parameter is the default data compute type used in the forward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_forward_math: FLOAT16

6.2.2.4. default_backward_math

The default_backward_math parameter is the default data compute type used in the backward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_backward_math: FLOAT16

6.2.2.5. reduce_buckets

The reduce_buckets parameter sets the approximate number of buckets to combine layers into. While using multiple GPUs, a reduction process is run after every iteration. For better performance, multiple layers are unified in buckets. The default value should work for the majority of nets.
Setting Value
Type integer
Required no
Default value 6
Level net
Usage Example
reduce_buckets: 10

6.2.2.6. conv_algos_override

The conv_algos_override parameter overrides the convolution algorithms with values specified by the user rather than the ones suggested by the seeker. For example, if a value is non-negative, it enforces the use of the algorithm with that index. It has priority over CuDNNConvolutionAlgorithmSeeker and essentially disables seeking. Each index should correspond to the ordinal in the following enumerations:
  • cudnnConvolutionFwdAlgo_t
  • cudnnConvolutionBwdDataAlgo_t
  • cudnnConvolutionBwdFilterAlgo_t
Setting Value
Type string
Required no
Default value "-1,-1,-1"
Level layer
Usage Example
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
	lr_mult: 1
	decay_mult: 1
  }
  param {
	lr_mult: 2
	decay_mult: 0
  }
  convolution_param {
	num_output: 96
	kernel_size: 11
	stride: 4
	weight_filler {
  	type: "gaussian"
  	std: 0.01
	}
	bias_filler {
  	type: "constant"
  	value: 0
	}
	cudnn_convolution_algo_seeker: FINDEX
	conv_algos_override: "1,-1,-1" # Use implicit GEMM on the forward pass and whatever the seeker decides on the backward pass
  }
}
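
As a reading aid for the override string, the following Python sketch splits a value such as "1,-1,-1" into one entry per pass. The function name parse_conv_algos_override is hypothetical, and the mapping of the three positions to forward, backward data, and backward filter is an assumption based on the order in which the enumerations are listed above; it is not taken from the NVIDIA Caffe source.

```python
def parse_conv_algos_override(value="-1,-1,-1"):
    """Split the override string into three per-pass algorithm indices.

    A negative entry means "no override" and is returned as None, so the
    seeker would remain responsible for that pass.
    """
    parts = [int(p) for p in value.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated integers")
    # Assumed position order, following the enumeration list in this section.
    names = ("forward", "backward_data", "backward_filter")
    return {name: (idx if idx >= 0 else None) for name, idx in zip(names, parts)}


if __name__ == "__main__":
    print(parse_conv_algos_override("1,-1,-1"))
```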

6.2.2.7. global_grad_scale

The global_grad_scale parameter defines the constant C used to improve the precision of backpropagation for float16 data storage. Gradients of the loss function are multiplied by C before backpropagation starts; the gradients with respect to the weights are then divided by C before they are used for the weight update.
Setting Value
Type float
Required no
Default value 1
Level net
Usage Example
global_grad_scale: 15
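
The effect of the scaling constant can be seen with a small numeric experiment. The following Python sketch uses the half-precision format of the standard struct module to mimic float16 storage. The helper names to_fp16 and backprop_fp16, the gradient value 1e-8, and the scale 1024 are illustrative assumptions; the sketch also assumes that the division by C happens in a higher-precision type.

```python
import struct


def to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def backprop_fp16(grad, scale=1.0):
    """Store a scaled gradient in float16, then unscale it.

    The unscaling is done in Python's double precision, standing in for a
    higher-precision weight update (an assumption of this sketch).
    """
    stored = to_fp16(grad * scale)  # value that float16 storage would keep
    return stored / scale


if __name__ == "__main__":
    tiny = 1e-8
    print(backprop_fp16(tiny))                # underflows to zero in float16
    print(backprop_fp16(tiny, scale=1024.0))  # survives thanks to the scaling
```

Without scaling, the small gradient is below the smallest float16 subnormal and is lost; with scaling, it is recovered to within about one percent in this example.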

6.2.3. LayerParameter

The LayerParameter parameter consists of the following memory storage types:
  • forward_type
  • backward_type
  • forward_math
  • backward_math
The internal math types apply to layers where the internal math type can differ from the forward or backward storage type, for example, the pseudo-FP32 mode in convolution layers.
Setting Value
Type type
Required no
Default value FLOAT
Level layer

Usage Example

layer {
  .....
  forward_type: FLOAT
  backward_type: FLOAT
  .....
}

6.2.3.1. forward_type

The forward_type parameter is the output data storage type used by this layer in the forward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
forward_type: FLOAT16

6.2.3.2. backward_type

The backward_type parameter is the output data storage type used by this layer in the backward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
backward_type: FLOAT16

6.2.3.3. forward_math

The forward_math parameter is the compute precision type used by this layer in the forward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
forward_math: FLOAT16

6.2.3.4. backward_math

The backward_math parameter is the compute precision type used by this layer in the backward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
backward_math: FLOAT16

6.2.3.5. cudnn_math_override

The cudnn_math_override parameter sets the default cudnnMathType_t value for all cuDNN-based computations in the current layer, if applicable; otherwise, it is ignored. If negative or omitted, it assumes the implicit default and allows optimizers like cudnnFindConvolution*AlgorithmEx to choose the best type. If set to zero, it enforces using CUDNN_DEFAULT_MATH everywhere in the current layer. If set to one, it enforces using CUDNN_TENSOR_OP_MATH everywhere in the current layer.
Setting Value
Type integer
Required no
Default value -1
Level layer
Usage Example
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
  cudnn_math_override: 1
}
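
The three cases described for cudnn_math_override can be summarized as a small lookup. This Python sketch is only a reading aid; the function name describe_cudnn_math_override is hypothetical, and the treatment of values greater than one as errors is an assumption, because the text above does not describe them.

```python
def describe_cudnn_math_override(value=-1):
    """Map a cudnn_math_override value to the behavior described in this guide."""
    if value < 0:
        return "implicit default (the algorithm finder chooses the math type)"
    if value == 0:
        return "CUDNN_DEFAULT_MATH"
    if value == 1:
        return "CUDNN_TENSOR_OP_MATH"
    # Values above 1 are not covered by the description (assumed invalid here).
    raise ValueError("unsupported cudnn_math_override value: %d" % value)


if __name__ == "__main__":
    for v in (-1, 0, 1):
        print(v, "->", describe_cudnn_math_override(v))
```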

6.2.4. TransformationParameter

The TransformationParameter parameter consists of settings that can be used for data pre-processing. It stores parameters that are used to apply transformations to the data layer's data.

Usage Example

transform_param {
    mirror: true
    crop_size: 227
    use_gpu_transform: true
    mean_file: ".../imagenet_lmdb/imagenet_mean.binaryproto"
  }

6.2.4.1. use_gpu_transform

The use_gpu_transform parameter runs the transform, synchronously, on the GPU.
Setting Value
Type boolean
Required no
Default value false
Level layer > transform_param
Usage Example
use_gpu_transform: true

6.2.4.2. img_rand_resize_lower

The img_rand_resize_lower parameter specifies that the variable-sized input image should be randomly resized, and sets the lower bound of the resize range. The aspect ratio of the resized image is preserved, but the shortest side of the resized image is uniformly sampled from the closed interval between img_rand_resize_lower and img_rand_resize_upper.
Note: This parameter is currently incompatible with mean_file.
Setting Value
Type integer
Required no
Default value 0
Level layer > transform_param
Usage Example
img_rand_resize_lower: 256

6.2.4.3. img_rand_resize_upper

The img_rand_resize_upper parameter specifies that the variable-sized input image should be randomly resized, and sets the upper bound of the resize range. The aspect ratio of the resized image is preserved, but the shortest side of the resized image is uniformly sampled from the closed interval between img_rand_resize_lower and img_rand_resize_upper.
Note: This parameter is currently incompatible with mean_file.
Setting Value
Type integer
Required no
Default value 0
Level layer > transform_param
Usage Example
img_rand_resize_upper: 480
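
The resize rule described for img_rand_resize_lower and img_rand_resize_upper can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the NVIDIA Caffe implementation; the function name rand_resize_dims is hypothetical, and rounding the longer side to the nearest integer is an assumption.

```python
import random


def rand_resize_dims(width, height, lower, upper, rng=random):
    """Return (new_width, new_height) with the shortest side in [lower, upper].

    The shortest side is sampled uniformly from the closed interval, and the
    other side is scaled by the same factor to preserve the aspect ratio.
    """
    shortest = rng.randint(lower, upper)  # randint includes both endpoints
    if width <= height:
        return shortest, round(height * shortest / width)
    return round(width * shortest / height), shortest


if __name__ == "__main__":
    # A 640x480 input with the bounds used in the usage examples.
    print(rand_resize_dims(640, 480, lower=256, upper=480))
```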

6.2.4.4. img_rand_crop

The img_rand_crop parameter specifies that the variable-sized input image should be randomly cropped to a square with the specified size. The crop is taken from a randomly chosen location in the image.
Note: This parameter is currently incompatible with mean_file.
Setting Value
Type integer
Required no
Default value 0
Level layer > transform_param
Usage Example
img_rand_crop: 224

6.2.4.5. img_center_crop

The img_center_crop parameter specifies that the variable-sized input image should be cropped to a square with the specified size. The crop is taken from the center of the input image.
Note: This parameter is currently incompatible with mean_file.
Setting Value
Type integer
Required no
Default value 0
Level layer > transform_param
Usage Example
img_center_crop: 224
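
The difference between img_rand_crop and img_center_crop is only where the square is placed. The following Python sketch computes the crop box for both cases. The function names are hypothetical, boxes are returned as (left, top, right, bottom) with an exclusive right and bottom edge, and rounding the center offset down is an assumption.

```python
import random


def center_crop_box(width, height, size):
    """Return the box of a size x size crop taken from the image center."""
    if size > width or size > height:
        raise ValueError("crop size exceeds image dimensions")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def rand_crop_box(width, height, size, rng=random):
    """Return the box of a size x size crop at a uniformly random location."""
    if size > width or size > height:
        raise ValueError("crop size exceeds image dimensions")
    left = rng.randint(0, width - size)
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size


if __name__ == "__main__":
    print(center_crop_box(256, 341, 224))
    print(rand_crop_box(256, 341, 224))
```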

6.2.5. BatchNormParameter

In NVIDIA Caffe version 0.15, it was required to explicitly set lr_mult: 0 and decay_mult: 0 for certain BatchNormParameter parameters (global_mean and global variance) to prevent their modification by gradient solvers. In version 0.16, this is done automatically; therefore, these settings are no longer needed.

In NVIDIA Caffe version 0.15, it was also required that bottom and top contain different values. Although it is recommended that they remain different, this is no longer required.

Usage Example

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  batch_norm_param {
	moving_average_fraction: 0.9
	eps: 0.0001
	scale_bias: true
  }
}

6.2.5.1. scale_bias

The scale_bias parameter allows you to fuse batch normalization and scale layers. Beginning in version 0.16, batch normalization supports both NVIDIA Caffe and BVLC Caffe.
Setting Value
Type boolean
Required no
Default value false
Level layer
Usage Example
layer {
  name: "bn"
  type: "BatchNorm"
  bottom: "conv"
  top: "bn"
  batch_norm_param {
    moving_average_fraction: 0.9
    eps: 0.0001
    scale_bias: true
  }
}
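
The guide does not spell out the arithmetic of the fused operation. As a rough sketch, assuming the standard formulation in which batch normalization is followed by a learned per-channel scale and bias, fusing the two layers amounts to the single expression below. The names gamma and beta are illustrative and do not come from the NVIDIA Caffe source.

```python
import math


def batch_norm_scale(x, mean, var, gamma=1.0, beta=0.0, eps=1e-4):
    """Normalize x with the given statistics, then apply scale (gamma) and bias (beta)."""
    # eps keeps the denominator away from zero; 1e-4 mirrors the eps in the usage example.
    normalized = (x - mean) / math.sqrt(var + eps)
    return gamma * normalized + beta


# With gamma=1 and beta=0 this reduces to plain batch normalization.
print(batch_norm_scale(3.0, mean=1.0, var=4.0, gamma=2.0, beta=0.5, eps=0.0))  # 2.5
```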

6.2.6. ConvolutionParameter

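The ConvolutionParameter belongs to the convolution layer's LayerParameter settings. Besides the regular BVLC settings, it contains cudnn_convolution_algo_seeker, which specifies which cuDNN routine should be used to find the best convolution algorithm.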
Setting Value
Type CuDNNConvolutionAlgorithmSeeker
Required no
Default value FINDEX
Level LayerParameter

Usage Example

convolution_param {
  num_output: 96
  kernel_size: 11
  stride: 4
  weight_filler {
    type: "gaussian"
    std: 0.01
  }
  bias_filler {
    type: "constant"
    value: 0
  }
  cudnn_convolution_algo_seeker: FINDEX
}

6.2.6.1. cudnn_convolution_algo_seeker

The cudnn_convolution_algo_seeker parameter specifies which cuDNN routine should be used to find the best convolution algorithm.

The most common use case for Caffe is image recognition, where the convolution layers do most of the image processing. cuDNN offers several algorithms for each convolution, and the algorithm seeker chooses among them using one of two engines:
GET
GET is the heuristic engine.
FINDEX
FINDEX makes real calls and real assessments and takes a few seconds to assess all possible algorithms for each and every convolutional layer.
Setting Value
Type enum CuDNNConvolutionAlgorithmSeeker { GET, FINDEX }
Required no
Default value FINDEX
Level layer
Usage Example
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
	lr_mult: 1
	decay_mult: 1
  }
  param {
	lr_mult: 2
	decay_mult: 0
  }
  convolution_param {
	num_output: 96
	kernel_size: 11
	stride: 4
	weight_filler {
  	type: "gaussian"
  	std: 0.01
	}
	bias_filler {
  	type: "constant"
  	value: 0
	}
	cudnn_convolution_algo_seeker: FINDEX
  }
}
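
The trade-off between the two engines can be pictured with a small Python sketch. It does not call cuDNN. The candidate table, the preference order, and both function names are hypothetical and only illustrate the difference between picking by heuristic and picking by measurement.

```python
import time


def seek_by_heuristic(candidates, preferred_order):
    """GET-style choice: return the first preferred name without running anything."""
    for name in preferred_order:
        if name in candidates:
            return name
    raise ValueError("no candidate matches the heuristic order")


def time_call(fn):
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def seek_by_measurement(candidates, measure=time_call):
    """FINDEX-style choice: run every candidate and keep the fastest one."""
    timings = {name: measure(fn) for name, fn in candidates.items()}
    return min(timings, key=timings.get)


# Hypothetical candidates: the second one is deliberately slower.
candidates = {
    "algo_a": lambda: sum(range(1000)),
    "algo_b": lambda: time.sleep(0.01),
}
print(seek_by_heuristic(candidates, ["algo_b", "algo_a"]))
print(seek_by_measurement(candidates))
```

The heuristic choice costs nothing up front but may pick a slower algorithm. The measured choice pays a one-time cost per layer, which matches the few seconds of startup described for FINDEX.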

6.2.7. DataParameter

The DataParameter belongs to the data layer's LayerParameter settings. Besides the regular BVLC settings, it contains the following performance-related settings: threads, parser_threads, cache, and shuffle.
Setting Value
Type enum DB { LEVELDB, LMDB }
Required no
Default value LEVELDB
Level layer

Usage Example

data_param {
  source: "/raid/caffe_imagenet_lmdb/ilsvrc12_train_lmdb"
  batch_size: 1024
  backend: LMDB
}

6.2.7.1. threads

The threads parameter is the number of Data Transformer threads per GPU. Prior to 17.04, the default was 3, which is the optimal value for the majority of nets.

The Data Transformer is the component that converts source data. It is compute intensive; therefore, if you think that the Data Layer underperforms, set the value to 4.

In 17.04, the default is 0. If set to 0, Caffe optimizes it automatically.
Setting Value
Type unsigned integer
Required no
Default value 0
Level DataParameter of data layer
Usage Example
threads: 4

6.2.7.2. parser_threads

The parser_threads parameter is the number of Data Reader and Parser threads per GPU. Prior to 17.04, the default was 2, which is the optimal value for the majority of nets.

The Asynchronous Data Reader is an NVIDIA Caffe component that dramatically increases read speed. The Google Protocol Buffers parser is a component that deserializes the raw data read by the Reader into a structure called Datum. If you observe messages like Waiting for Datum, increase the setting value to 4 or higher.

In 17.04, the default is 0. If set to 0, Caffe optimizes it automatically.
Setting Value
Type unsigned integer
Required no
Default value 0
Level DataParameter of data layer
Usage Example
parser_threads: 4

6.2.7.3. cache

The cache parameter ensures that the data is read once and put into the host memory. If the data does not fit in the host memory, the cache data is dropped and the Caffe model reads the data from the database.
Setting Value
Type boolean
Required no
Default value false
Level DataParameter of data layer
Usage Example
cache: true

6.2.7.4. shuffle

The shuffle parameter enables shuffling of the cached data. Shuffling randomizes the order of the training data, which can improve the accuracy of the trained network. The parameter is ignored if the cache parameter is set to false. If the cache does not fit in the host memory, shuffling is cancelled.
Setting Value
Type boolean
Required no
Default value false
Level DataParameter of data layer
Usage Example
shuffle: true
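
The interaction between cache and shuffle can be summarized with a toy reader in Python. This is a sketch of the behavior described above, not NVIDIA Caffe code. The class name, the capacity argument that stands in for available host memory, and the db_reads counter are all illustrative assumptions.

```python
import random


class CachedReader:
    """Toy model of a data reader with optional caching and shuffling."""

    def __init__(self, read_db, capacity, cache=False, shuffle=False, seed=0):
        self.read_db = read_db    # callable returning the records in database order
        self.capacity = capacity  # number of records assumed to fit in host memory
        self.cache = cache
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.cached = None
        self.db_reads = 0

    def epoch(self):
        """Return the records for one pass over the data."""
        if self.cached is None:
            records = list(self.read_db())
            self.db_reads += 1
            if self.cache and len(records) <= self.capacity:
                self.cached = records
            else:
                # Cache disabled or dropped: database order, shuffle has no effect.
                return records
        if self.shuffle:
            self.rng.shuffle(self.cached)
        return list(self.cached)


data = list(range(10))
reader = CachedReader(lambda: data, capacity=100, cache=True, shuffle=True)
first, second = reader.epoch(), reader.epoch()
print(reader.db_reads)  # 1
```

With a capacity smaller than the dataset, the same reader falls back to reading the database on every pass and returns the records unshuffled.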

6.2.8. ELUParameter

The ELUParameter stores parameters used by ELULayer.
Setting Value
Type structure
Required no
Default value 1.
Level layer

Usage Example

layer {
  name: "selu"
  type: "ELU"
  bottom: "bottom"
  top: "top"
  elu_param {
    alpha: 1.6733
    lambda: 1.0507
  }
}

6.2.8.1. lambda

The lambda parameter is used for the Scaled Exponential Linear Unit (SELU). SELU is a non-linear activation layer, which is defined as follows:
  • If input x >= 0 then output = lambda * x
  • If input x < 0 then output = lambda * alpha * (exp(x) - 1)
Figure 1. Scaled Exponential Linear Unit (SELU).
Setting Value
Type float
Required no
Default value 1.
Level layer
Usage Example
layer {
  name: "selu"
  type: "ELU"
  bottom: "bottom"
  top: "top"
  elu_param {
    alpha: 1.6733
    lambda: 1.0507
  }
}
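
For reference, the standard SELU function can be written in a few lines of Python. This is a scalar sketch for checking values by hand, not the layer's GPU implementation. The defaults reuse the alpha and lambda values from the usage example, and the argument is named lam because lambda is a reserved word in Python.

```python
import math


def selu(x, alpha=1.6733, lam=1.0507):
    """Scaled ELU: lam * x for x >= 0, lam * alpha * (exp(x) - 1) otherwise."""
    if x >= 0:
        return lam * x
    return lam * alpha * (math.exp(x) - 1.0)


for value in (-2.0, -1.0, 0.0, 1.0):
    print(value, selu(value))
```

Setting lam to 1 reduces the function to a plain ELU with the given alpha.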

7. Troubleshooting

For more information about Caffe, including tutorials, documentation, and examples, see the Caffe website.

NVIDIA Caffe typically uses the same input formats and configuration parameters as Caffe; therefore, community-authored materials and pre-trained models for Caffe can usually be applied to NVIDIA Caffe as well.

For the latest Caffe Release Notes, see the Deep Learning Documentation website.

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, Jetson, Kepler, NVIDIA Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, and Tesla are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.