Abstract

The Caffe User Guide provides a detailed overview of using and customizing the Caffe deep learning framework. This guide also documents the NVIDIA Caffe parameters that help you take advantage of the container's optimizations in your environment.

1. Overview of Caffe

Caffe is a deep-learning framework made with flexibility, speed, and modularity in mind. It was originally developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

NVIDIA® Caffe is an NVIDIA-maintained fork of BVLC Caffe tuned for NVIDIA GPUs, particularly in multi-GPU configurations. NVIDIA Caffe includes:
  • Support for 16-bit (half-precision) floating point training and inference.
  • Mixed-precision support, which allows data to be stored and/or computed in 64-, 32-, or 16-bit formats. Precision can be defined per layer (the forward and backward phases may differ), or set as a default for the whole net; see the sketch after this list.
  • Integration with cuDNN® v6.
  • Automatic selection of the best cuDNN convolution algorithm.
  • Integration with v1.3.4 of NVIDIA Collective Communications Library (NCCL®) for improved multi-GPU scaling.
  • Optimized GPU memory management for data and parameters storage, I/O buffers and workspace for convolutional layers.
  • Parallel data parser and transformer for improved I/O performance.
  • Parallel back-propagation and gradient reduction on multi-GPU systems.
  • Fast solver implementations with fused CUDA® kernels for weight and history updates.
  • Multi-GPU test phase for even memory load across multiple GPUs.
  • Backward compatibility with BVLC Caffe and NVIDIA Caffe 0.15.
  • Extended set of optimized models (including 16-bit floating point examples).
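
For illustration, here is a hedged sketch of what per-net and per-layer precision control looks like in a prototxt file. The parameter names are the ones documented in the NVIDIA Caffe Parameters section below; the net name and layer shown are purely illustrative:

# Net-wide defaults: store data in FP16, compute in FP32.
name: "ExampleNet-fp16"
default_forward_type: FLOAT16
default_backward_type: FLOAT16
default_forward_math: FLOAT
default_backward_math: FLOAT

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # Per-layer override: keep this layer entirely in FP32.
  forward_type: FLOAT
  backward_type: FLOAT
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}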

1.1. Contents of the NVIDIA Caffe Container

This image contains the source and binaries for NVIDIA® Caffe. The pre-built and installed version of NVIDIA Caffe is located in the /usr/local/[bin,share,lib] directories. The complete source code is located in the /opt/caffe directory.

This container image also includes pycaffe, which makes the Caffe interfaces available for use through Python.

The NVIDIA Collective Communications Library (NCCL®) and the NVIDIA Caffe bindings for NCCL are installed in this container, and models using multiple GPUs will automatically leverage this library for fast parallel training.
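
As a quick sanity check from inside the container, you can confirm that the pre-built binary and the pycaffe bindings are on the expected paths. This is a hedged sketch; exact paths may vary by release:

# which caffe
/usr/local/bin/caffe
# python -c "import caffe"

If the import returns without error, pycaffe is installed correctly.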

2. Pulling NVIDIA Caffe

You can pull (download) an NVIDIA® container that is already built, tuned, tested, and ready to run. Each NVIDIA deep learning container includes the code required to build the framework so that you can make changes to the internals. The containers do not contain sample datasets or sample model definitions unless they are included with the source for the framework.

Containers are available for download from the DGX™ Container Registry (nvcr.io). NVIDIA provides a number of containers in the DGX Container Registry. If your organization has provided you with access to any custom containers, you can download them as well.

The framework source is located in /opt/<framework> in each container.

Before pulling an NVIDIA Docker container, ensure that the following prerequisites are met:
  • You have read access to the registry space that contains the container.
  • You are logged into DGX™ Container Registry. For more information, see the NVIDIA Docker Container for Deep Learning Frameworks: Quick Start Guide.
  • You are a member of the docker group, which enables you to use docker commands.
Tip: To browse the available containers in the DGX™ Container Registry, use a web browser to log in to your NVIDIA® DGX™ Cloud Services account on the DGX Cloud Services website.

Use the docker pull command to pull images from the NVIDIA DGX Container Registry or go to GitHub and download the source.
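
For example, to pull a specific Caffe release (the tag shown here is illustrative; substitute the release you need):
$ docker pull nvcr.io/nvidia/caffe:17.03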

For step-by-step instructions on how to pull a container, see the NVIDIA Docker Containers for Deep Learning Frameworks: Quick Start Guide.

After pulling a container, you can run jobs in the container to run neural networks, deploy deep learning models, and perform AI analytics.

3. Verifying NVIDIA Caffe

After you run NVIDIA Caffe, it is a good idea to verify that the container image is running correctly. To do this, issue the following commands from within the container:
# cd /opt/caffe
# data/mnist/get_mnist.sh
# examples/mnist/create_mnist.sh
# examples/mnist/train_lenet.sh
If everything is running correctly, Caffe downloads and creates a data set, and then starts training LeNet. If the training is successful, you will see output similar to the following toward the end of the log:
I0402 15:08:01.016016 33 solver.cpp:431] Iteration 10000, loss = 0.0342847
I0402 15:08:01.016043 33 solver.cpp:453] Iteration 10000, Testing net (#0)
I0402 15:08:01.085050 38 data_reader.cpp:128] Restarting data pre-fetching
I0402 15:08:01.087720 33 solver.cpp:543] Test net output #0: accuracy = 0.9587
I0402 15:08:01.087751 33 solver.cpp:543] Test net output #1: loss = 0.130223 (* 1 = 0.130223 loss)
I0402 15:08:01.087767 33 caffe.cpp:239] Solver performance on device 0: 498.3 * 64 = 3.189e+04 img/sec
I0402 15:08:01.087780 33 caffe.cpp:242] Optimization Done in 24s

If Caffe is not running properly, or the pull failed, check your Internet connection.

4. Running NVIDIA Caffe

To run a container, you must issue the nvidia-docker run command, specifying the registry, repository, and tags.

Before you can run an NVIDIA Docker deep learning framework container, you must have nvidia-docker installed. For more information, see Installing Docker and NVIDIA Docker in the Quick Start Guide.
  1. As a user, run the container interactively.
    $ nvidia-docker run --rm -ti nvcr.io/nvidia/<framework>

    The following example runs the December 2016 release (16.12) of the NVIDIA Caffe container in interactive mode. The container is automatically removed when the user exits the container.

    $ nvidia-docker run --rm -ti nvcr.io/nvidia/caffe:16.12
    
    ===========
    == Caffe ==
    ===========
    
    NVIDIA Release 16.12 (build 6217)
    
    Container image Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    root@df57eb8e0100:/workspace#
    Note: You are now the root user in the container.
  2. From within the container, start the job that you want to run. The precise command to run depends on the deep learning framework in the container that you are running and the job that you want to run. For details see the /workspace/README.md file for the container.

    The following example runs the caffe time command on one GPU to measure the execution time of the deploy.prototxt model.

    # caffe time -model models/bvlc_alexnet/deploy.prototxt -gpu 0
  3. Optional: Run the December 2016 release (16.12) of the same NVIDIA Caffe container but in non-interactive mode.
    % nvidia-docker run --rm nvcr.io/nvidia/caffe:16.12 caffe time \
          -model /workspace/models/bvlc_alexnet/deploy.prototxt -gpu 0
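
If your datasets or model definitions live on the host, you can make them visible inside the container with Docker's -v option. This is a hedged example; the host path is illustrative:

$ nvidia-docker run --rm -ti -v /raid/datasets:/datasets nvcr.io/nvidia/caffe:16.12

Files under /raid/datasets on the host then appear under /datasets inside the container.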

5. Customizing and Extending NVIDIA Caffe

NVIDIA Docker images come prepackaged, tuned, and ready to run; however, you may want to build a new image from scratch or augment an existing image with custom code, libraries, data, or settings for your corporate infrastructure. This section guides you through exercises that highlight how to create a container from scratch, customize a container, extend a deep learning framework to add features, develop code using that extended framework from the developer environment, and then package that code as a versioned release.

By default, you do not need to build a container. The DGX-1 container repository from NVIDIA, nvcr.io, has a number of containers that can be used immediately. These include containers for deep learning as well as containers with just the CUDA Toolkit.

One of the great things about containers is that they can be used as starting points for creating new containers. This can be referred to as “customizing” or “extending” a container. You can create a container completely from scratch; however, since these containers are likely to run on the DGX-1, it is recommended that you at least start with an nvcr.io container that contains the OS and CUDA. You are not limited to this, though: you can create a container that runs on the CPUs of the DGX-1 and does not use the GPUs. In that case, you can start with a bare OS container from Docker Hub; however, to make development easier, you can still start with a container that includes CUDA, which is simply not used when the container runs.

The customized or extended containers can be saved to a user’s private container repository. They can also be shared with other users of the DGX-1 but this requires some administrator help.

It is important to note that all NVIDIA Docker deep learning framework images include the source to build the framework itself as well as all of the prerequisites.
Attention: Do not install an NVIDIA driver into the docker image at docker build time. nvidia-docker is essentially a wrapper around docker that transparently provisions a container with the necessary components to execute code on the GPU.

A best practice is to avoid docker commit usage for developing new Docker images, and to use Dockerfiles instead. The Dockerfile method provides visibility and the ability to efficiently version-control changes made during development of a Docker image. The docker commit method is appropriate for short-lived, disposable images only (see Example 2: Customizing NVIDIA Caffe using docker commit for an example).

For more information on writing a Dockerfile, see the best practices documentation.

5.1. Benefits and Limitations to Customizing NVIDIA Caffe

You can customize a container to fit your specific needs for numerous reasons; for example, you may depend upon specific software that is not included in the container that NVIDIA provides.

The container images do not contain sample datasets or sample model definitions unless they are included with the framework source. Be sure to check the container for sample datasets or models.

5.2. Example 1: Customizing NVIDIA Caffe using Dockerfile

This example uses a Dockerfile to customize the Caffe container in nvcr.io. Before customizing the container, ensure that the Caffe 17.03 container has been pulled to your system using the docker pull command:
$ docker pull nvcr.io/nvidia/caffe:17.03

As mentioned earlier in this document, the Docker containers on nvcr.io also provide a sample Dockerfile that explains how to patch a framework and rebuild the Docker image. In the directory /workspace/docker-examples, there are two sample Dockerfiles. For this example, we will use the Dockerfile.customcaffe file as a template for customizing a container.

  1. Create a working directory called my_docker_images on your local hard drive.
  2. Open a text editor and create a file called Dockerfile. Save the file to your working directory.
  3. Open your Dockerfile again and include the following lines in the file:
    FROM nvcr.io/nvidia/caffe:17.03
    # APPLY CUSTOMER PATCHES TO CAFFE
    # Bring in changes from outside container to /tmp
    # (assumes my-caffe-modifications.patch is in same directory as Dockerfile)
    #COPY my-caffe-modifications.patch /tmp
    
    # Change working directory to NVCaffe source path
    WORKDIR /opt/caffe
    
    # Apply modifications
    #RUN patch -p1 < /tmp/my-caffe-modifications.patch
    
    # Note that the default workspace for caffe is /workspace
    RUN mkdir build && cd build && \
      cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local -DUSE_NCCL=ON \
            -DUSE_CUDNN=ON -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="35 52 60 61" \
            -DCUDA_ARCH_PTX="61" .. && \
      make -j"$(nproc)" install && \
      make clean && \
      cd .. && rm -rf build
    
    # Reset default working directory
    WORKDIR /workspace
    Save the file.
  4. Build the image using the docker build command and specify the repository name and tag. In the following example, the repository name is corp/caffe and the tag is 17.03.1PlusChanges. In this case, the command would be the following:
    $ docker build -t corp/caffe:17.03.1PlusChanges .
  5. Run the Docker image using the nvidia-docker run command. For example:
    $ nvidia-docker run -ti --rm corp/caffe:17.03.1PlusChanges

5.3. Example 2: Customizing NVIDIA Caffe using docker commit

This example uses the docker commit command to flush the current state of the container to a Docker image. This is not a recommended best practice; however, it is useful when you have a running container to which you have made changes and want to save them. In this example, we use the apt-get command to install packages, which requires that you run as the root user.
Note:
  • The Caffe image release 17.04 is used in the example instructions for illustrative purposes.
  • Do not use the --rm flag when running the container. If you use the --rm flag when running the container, your changes will be lost when exiting the container.
  1. Pull the Docker container from the nvcr.io repository to the DGX-1 system. For example, the following command will pull the Caffe container:
    $ docker pull nvcr.io/nvidia/caffe:17.04
  2. Run the container on the DGX-1 using nvidia-docker.
    $ nvidia-docker run -ti nvcr.io/nvidia/caffe:17.04
    ==================
    == NVIDIA Caffe ==
    ==================
    
    NVIDIA Release 17.04 (build 26740)
    
    Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved.
    Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
    All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
    NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for NVIDIA Caffe.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    
    root@1fe228556a97:/workspace#
  3. You should now be the root user in the container (notice the prompt). You can use the apt command to pull down a package and put it in the container.
    Note: The NVIDIA containers are built on Ubuntu, which uses the apt-get package manager. Check the container release notes in the Deep Learning Documentation for details on the specific container you are using.
    In this example, we will install Octave, the GNU clone of MATLAB, into the container.
    # apt-get update
    # apt install octave
    Note: You have to issue apt-get update before you install Octave using apt.
  4. Exit the workspace.
    # exit
  5. Display the list of containers using docker ps -a. As an example, here is some of the output from the docker ps -a command:
    $ docker ps -a
    CONTAINER ID    IMAGE                        CREATED       ...
    1fe228556a97    nvcr.io/nvidia/caffe:17.04   3 minutes ago ...
  6. Now you can create a new image from the running container in which you installed Octave. You can commit the container with the following command.
    $ docker commit 1fe228556a97 nvcr.io/nvidian_sas/caffe_octave:17.04
    sha256:0248470f46e22af7e6cd90b65fdee6b4c6362d08779a0bc84f45de53a6ce9294
    
  7. Display the list of images.
    $ docker images
    REPOSITORY                 	TAG             	IMAGE ID     ...
    nvidian_sas/caffe_octave   	17.04           	75211f8ec225 ...
  8. To verify, let's run the container again and see if Octave is actually there.
    $ nvidia-docker run -ti nvidian_sas/caffe_octave:17.04
    ==================
    == NVIDIA Caffe ==
    ==================
    
    NVIDIA Release 17.04 (build 26740)
    
    Container image Copyright (c) 2017, NVIDIA CORPORATION.  All rights reserved. Copyright (c) 2014, 2015, The Regents of the University of California (Regents) All rights reserved.
    
    Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved. NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
    
    NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be insufficient for NVIDIA Caffe.  NVIDIA recommends the use of the following flags:
       nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
    
    root@2fc3608ad9d8:/workspace# octave
    octave: X11 DISPLAY environment variable not set
    octave: disabling GUI features
    GNU Octave, version 4.0.0
    Copyright (C) 2015 John W. Eaton and others.
    This is free software; see the source code for copying conditions.
    There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
    FITNESS FOR A PARTICULAR PURPOSE.  For details, type 'warranty'.
    
    Octave was configured for "x86_64-pc-linux-gnu".
    
    Additional information about Octave is available at http://www.octave.org.
    
    Please contribute if you find this software useful.
    For more information, visit http://www.octave.org/get-involved.html
    
    Read http://www.octave.org/bugs.html to learn how to submit bug reports.
    For information about changes from previous versions, type 'news'.
    
    octave:1>

    Since the octave prompt displayed, Octave is installed.

  9. If you want to save the container into your private repository (Docker uses the term “push”), use the docker push command.
    $ docker push nvcr.io/nvidian_sas/caffe_octave:17.04

The new Docker image is now available for use. You can check your local Docker repository for it.

6. NVIDIA Caffe Parameters

Within the Caffe container, there is a caffe.proto file that NVIDIA has updated. The modifications that NVIDIA made are described in the following sections. These added parameters are to help implement the optimizations of the container into your environment.

6.1. Parameter Definitions

The NVIDIA Caffe parameters use the following value types:
Boolean
A boolean is a data type with two possible values: true and false.
Enumerated
There are two types of enumerated values:
  • Type affects the math and storage precision. The values acceptable are:
    DOUBLE
    64-bit (also referred to as double precision) floating point type.
    FLOAT
    32-bit floating point type. This is the most common type and the default.
    FLOAT16
    16-bit floating point type.
  • Engine affects the compute engine. The values acceptable are:
    DEFAULT
    Default implementation of algorithms and routines. Usually equivalent to CAFFE or CUDNN.
    CAFFE
    Basic CPU or GPU based implementation.
    CUDNN
    Advanced implementation based on highly optimized cuDNN library.
Floating Point Number
A floating point number has no fixed number of digits before or after the decimal point; the decimal point can be placed anywhere.
Integer
An integer is any whole number that is positive, negative, or zero.
String
A string is a sequence of characters of arbitrary length.

6.2. Added and Modified Parameters

In addition to the parameters within the caffe.proto file included in the BVLC Caffe container, the following parameters have been added or modified in the NVIDIA Caffe version.

For parameters not mentioned in this guide, see the BVLC Caffe documentation.

6.2.1. SolverParameter

The SolverParameter sets the solver's parameters.
Setting Value
Type enum
Required yes
Default value FLOAT
Level solver

Usage Example

net: "train_val_fp16.prototxt"
test_iter: 1042
test_interval: 5000
base_lr: 0.03
lr_policy: "poly"
power: 2
display: 100
max_iter: 75000
momentum: 0.9
weight_decay: 0.0005
snapshot: 150000
snapshot_prefix: "snapshots/alexnet_fp16"
solver_mode: GPU
random_seed: 1371
snapshot_after_train: false
solver_data_type: FLOAT16

6.2.1.1. solver_data_type

The solver_data_type parameter is the type used for storing weights and history.
Setting Value
Type enum
Required no
Default value FLOAT
Level solver
Usage Example
solver_data_type: FLOAT16

6.2.1.2. min_lr

The min_lr parameter sets a lower bound on the learning rate (lr), ensuring that decay policies never drive it below the given threshold.
Setting Value
Type float
Required no
Default value 0
Level solver
Usage Example
net: "train_val_fp16.prototxt"
test_iter: 1042
test_interval: 5000
base_lr: 0.03
min_lr: 1e-5
lr_policy: "poly"
...

6.2.2. NetParameter

The NetParameter parameter controls the net as a whole; settings made at the net level apply as defaults to all of its layers. Each layer's configuration, including connectivity and behavior, is specified as a LayerParameter.
Setting Value
Type type
Required no
Default value FLOAT
Level layer

Usage Example

name: "AlexNet-fp16"

default_forward_type: FLOAT16
default_backward_type: FLOAT16

default_forward_math: FLOAT
default_backward_math: FLOAT

6.2.2.1. default_forward_type

The default_forward_type parameter is the default data storage type used in the forward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_forward_type: FLOAT16

6.2.2.2. default_backward_type

The default_backward_type parameter is the default data storage type used in the backward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_backward_type: FLOAT16

6.2.2.3. default_forward_math

The default_forward_math parameter is the default compute precision type used in the forward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_forward_math: FLOAT16

6.2.2.4. default_backward_math

The default_backward_math parameter is the default compute precision type used in the backward pass for all layers.
Setting Value
Type type
Required no
Default value FLOAT
Level net
Usage Example
default_backward_math: FLOAT16

6.2.2.5. reduce_buckets

The reduce_buckets parameter sets the approximate number of buckets to combine layers into. While using multiple GPUs, a reduction process is run after every iteration. For better performance, multiple layers are unified in buckets. The default value should work for the majority of nets.
Setting Value
Type integer
Required no
Default value 6
Level net
Usage Example
reduce_buckets: 10

6.2.2.6. conv_algos_override

The conv_algos_override parameter overrides the convolution algorithms with values specified by the user rather than those suggested by the seeker. If an entry is set to a non-negative value, the algorithm with that index is enforced. It has priority over CuDNNConvolutionAlgorithmSeeker and essentially disables seeking. Each index should correspond to the ordinal in one of the following cuDNN enumerations:
  • cudnnConvolutionFwdAlgo_t
  • cudnnConvolutionBwdDataAlgo_t
  • cudnnConvolutionBwdFilterAlgo_t
Setting Value
Type string
Required no
Default value "-1,-1,-1"
Level layer
Usage Example
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
	lr_mult: 1
	decay_mult: 1
  }
  param {
	lr_mult: 2
	decay_mult: 0
  }
  convolution_param {
	num_output: 96
	kernel_size: 11
	stride: 4
	weight_filler {
  	type: "gaussian"
  	std: 0.01
	}
	bias_filler {
  	type: "constant"
  	value: 0
	}
	cudnn_convolution_algo_seeker: FINDEX
	conv_algos_override: "1,-1,-1" # use implicit GEMM on the forward pass; let the seeker decide for backward
  }
}

6.2.3. LayerParameter

The LayerParameter parameter consists of the following memory storage types:
  • forward_type
  • backward_type
  • forward_math
  • backward_math
The internal math types apply to layers where the internal compute type can differ from the forward or backward storage type; for example, pseudo-FP32 mode in convolution layers.
Setting Value
Type type
Required no
Default value FLOAT
Level layer

Usage Example

layer {
  .....
  forward_type: FLOAT
  backward_type: FLOAT
  .....
}
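
For instance, here is a hedged sketch of the pseudo-FP32 pattern mentioned above: activations and gradients are stored in FP16 while the internal math runs in FP32. The layer shown is illustrative:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # FP16 storage for forward and backward data
  forward_type: FLOAT16
  backward_type: FLOAT16
  # FP32 internal math (pseudo-FP32 mode)
  forward_math: FLOAT
  backward_math: FLOAT
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}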

6.2.3.1. forward_type

The forward_type parameter is the output data storage type used by this layer in the forward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
forward_type: FLOAT16

6.2.3.2. backward_type

The backward_type parameter is the output data storage type used by this layer in the backward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
backward_type: FLOAT16

6.2.3.3. forward_math

The forward_math parameter is the compute precision type used by this layer in the forward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
forward_math: FLOAT16

6.2.3.4. backward_math

The backward_math parameter is the compute precision type used by this layer in the backward pass.
Setting Value
Type type
Required no
Default value FLOAT
Level layer
Usage Example
backward_math: FLOAT16

6.2.4. TransformationParameter

The TransformationParameter parameter consists of settings that can be used for data pre-processing. It stores the parameters used to apply transformations to the data layer's data.

Usage Example

transform_param {
    mirror: true
    crop_size: 227
    use_gpu_transform: true
    mean_file: ".../imagenet_lmdb/imagenet_mean.binaryproto"
  }

6.2.4.1. use_gpu_transform

The use_gpu_transform parameter runs the transform synchronously on the GPU.
Setting Value
Type boolean
Required no
Default value false
Level layer > transform_param
Usage Example
use_gpu_transform: true

6.2.5. BatchNormParameter

In NVIDIA Caffe version 0.15, it was required to explicitly set lr_mult: 0 and decay_mult: 0 for certain BatchNormParameter parameters (global mean and global variance) to prevent their modification by gradient solvers. In version 0.16, this is done automatically; therefore, these parameters are no longer needed.

In NVIDIA Caffe version 0.15, it was also required that bottom and top contain different values. Although it is recommended that they remain different, this requirement is now optional.

Usage Example

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  batch_norm_param {
	moving_average_fraction: 0.9
	eps: 0.0001
	scale_bias: true
  }
}

6.2.5.1. scale_bias

The scale_bias parameter allows you to fuse the batch normalization and scale layers. Beginning in version 0.16, the BatchNorm layer supports both the NVIDIA Caffe and BVLC Caffe formats.
Setting Value
Type boolean
Required no
Default value false
Level layer
Usage Example
 layer {
  name: "bn"
  type: "BatchNorm"
  bottom: "conv"
  top: "bn"
  batch_norm_param {
	moving_average_fraction: 0.9
	eps: 0.0001
	scale_bias: true
  }
}

6.2.6. ConvolutionParameter

The ConvolutionParameter parameter specifies which cuDNN routine should be used to find the best convolution algorithm.
Setting Value
Type CuDNNConvolutionAlgorithmSeeker
Required no
Default value FINDEX
Level LayerParameter

Usage Example

convolution_param {
	num_output: 96
	kernel_size: 11
	stride: 4
	weight_filler {
  	type: "gaussian"
  	std: 0.01
	}
	bias_filler {
  	type: "constant"
  	value: 0
	}
	cudnn_convolution_algo_seeker: FINDEX
  }

6.2.6.1. cudnn_convolution_algo_seeker

The cudnn_convolution_algo_seeker parameter specifies which cuDNN routine should be used to find the best convolution algorithm.

The most common use case for Caffe is image recognition. The convolution layer is the layer that stores the algorithms that process the images. The algorithm seeker has two engines:
GET
GET is the heuristic engine.
FINDEX
FINDEX runs and times the actual algorithms; it takes a few seconds to assess all possible algorithms for each and every convolutional layer.
Setting Value
Type enum CuDNNConvolutionAlgorithmSeeker { GET, FINDEX }
Required no
Default value FINDEX
Level layer
Usage Example
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
	lr_mult: 1
	decay_mult: 1
  }
  param {
	lr_mult: 2
	decay_mult: 0
  }
  convolution_param {
	num_output: 96
	kernel_size: 11
	stride: 4
	weight_filler {
  	type: "gaussian"
  	std: 0.01
	}
	bias_filler {
  	type: "constant"
  	value: 0
	}
	cudnn_convolution_algo_seeker: FINDEX
  }
}

6.2.7. DataParameter

The DataParameter belongs to the data layer's LayerParameter settings. Besides the regular BVLC settings, it contains the following performance-related settings: threads and parser_threads.
Setting Value
Type enum DB { LEVELDB, LMDB }
Required no
Default value LEVELDB
Level layer

Usage Example

data_param {
  source: "/raid/caffe_imagenet_lmdb/ilsvrc12_train_lmdb"
  batch_size: 1024
  backend: LMDB
}

6.2.7.1. threads

The threads parameter is the number of Data Transformer threads per GPU. Prior to 17.04, the default is 3, which is the optimal value for the majority of nets.

The Data Transformer is a component that converts source data. It is compute intensive; therefore, if you think the data layer under-performs, set the value to 4.

In 17.04, the default is 0. If set to 0, Caffe optimizes it automatically.
Setting Value
Type unsigned integer
Required no
Default value 0
Level DataParameter of data layer
Usage Example
threads: 4

6.2.7.2. parser_threads

The parser_threads parameter is the number of Data Reader and Parser threads per GPU. Prior to 17.04, the default is 2, which is the optimal value for the majority of nets.

The asynchronous Data Reader is an NVIDIA Caffe component that dramatically increases read speed. The Google Protocol Buffers parser is a component that de-serializes the raw data read by the Reader into a structure called Datum. If you observe messages like Waiting for Datum, increase the setting to 4 or higher.

In 17.04, the default is 0. If set to 0, Caffe optimizes it automatically.
Setting Value
Type unsigned integer
Required no
Default value 0
Level DataParameter of data layer
Usage Example
parser_threads: 4

6.2.7.3. cache

The cache parameter ensures that the data is read once and then kept in host memory. If the data does not fit in host memory, the cache is dropped and Caffe reads the data from the database instead.
Setting Value
Type boolean
Required no
Default value false
Level DataParameter of data layer
Usage Example
cache: true

6.2.7.4. shuffle

The shuffle parameter is ignored if the cache parameter is set to false. Shuffling is a data augmentation technique that improves the accuracy of training your network. If the cache does not fit in host memory, shuffling is cancelled.
Setting Value
Type boolean
Required no
Default value false
Level DataParameter of data layer
Usage Example
shuffle: true
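
Putting these DataParameter settings together, here is a hedged sketch of a data layer that caches and shuffles an LMDB source with explicit thread counts (the source path is illustrative):

data_param {
  source: "/raid/caffe_imagenet_lmdb/ilsvrc12_train_lmdb"
  batch_size: 1024
  backend: LMDB
  threads: 4
  parser_threads: 4
  cache: true
  shuffle: true
}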

7. Caffe for DIGITS

The DIGITS container in the NVIDIA Docker repository, nvcr.io, comes not only with DIGITS but also with Caffe and Torch. You can read the details in the container release notes at http://docs.nvidia.com/deeplearning/dgx/index.html. For example, the 17.04 release of DIGITS includes the 17.04 releases of Caffe and Torch.

DIGITS is a training platform that can be used with NVIDIA Caffe and Torch deep learning frameworks. Using either of these frameworks, DIGITS will train your deep learning models on your dataset.

The following section includes examples using DIGITS with a Caffe backend.

7.1. Example 1: MNIST

  1. The first step in training a model with DIGITS and Caffe on a DGX-1 is to pull the DIGITS application from the nvcr.io registry (be sure you are logged into the DGX-1).
    $ docker pull nvcr.io/nvidia/digits:17.04
  2. After the application has been pulled, you can start DIGITS on the DGX-1. Because DIGITS is a web-based frontend for Caffe and Torch, we will run the DIGITS application in a non-interactive way using the following command.
    $ nvidia-docker run -d --name digits-17.04 -p 8888:5000
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864
    nvcr.io/nvidia/digits:17.04
    There are a number of options in this command.
    • The first option, -d, tells nvidia-docker to run the application in “daemon” mode.
    • The --name option names the running application (we will need this later).
    • The two --ulimit options and the --shm-size option increase the amount of shared memory available to Caffe, since it shares data across GPUs using shared memory.
    • The -p 8888:5000 option maps the DIGITS port 5000 to host port 8888 (you will see how this is used below).
    After you run this command you need to find the IP address of the DIGITS node. This can be found by running the command ifconfig as shown below.
    $ ifconfig
    docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.99.1  netmask 255.255.255.0  broadcast 0.0.0.0     
         inet6 fe80::42:5cff:fefb:1c30  prefixlen 64  scopeid 0x20<link>     
         ether 02:42:5c:fb:1c:30  txqueuelen 0  (Ethernet)     
         RX packets 22649  bytes 5171804 (4.9 MiB)     
         RX errors 0  dropped 0  overruns 0  frame 0     
         TX packets 29088  bytes 123439479 (117.7 MiB)     
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    enp1s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500     
         inet 10.31.229.99  netmask 255.255.255.128  broadcast 10.31.229.127     
         inet6 fe80::56ab:3aff:fed6:614f  prefixlen 64  scopeid 0x20<link>     
         ether 54:ab:3a:d6:61:4f  txqueuelen 1000  (Ethernet)     
         RX packets 8116350  bytes 11069954019 (10.3 GiB)     
         RX errors 0  dropped 9  overruns 0  frame 0     
         TX packets 1504305  bytes 162349141 (154.8 MiB)     
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    ...

    In this case, we want the Ethernet IP address since that is the address of the web server for DIGITS (10.31.229.99 for this example). Your IP address will be different.
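
    As a quick check that DIGITS is up (a hedged sketch, assuming curl is available on the system), request the home page on the mapped port:
    $ curl -s http://10.31.229.99:8888/
    An HTML response indicates that the web server is running.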

  3. We now need to download the MNIST data set into the application. The DIGITS application has a simple script for downloading the data set into the application. As a check, run the following command to make sure the application is running.
    $ docker ps -a
    CONTAINER ID    IMAGE                       ...  NAMES
    c930962b9636    nvcr.io/nvidia/digits:17.04 ...  digits-17.04

    The application is running and has the name that we gave it (digits-17.04).

    Next you need to “shell” into the running application from another terminal on the DGX-1.
    $ docker exec -it digits-17.04 bash
    root@XXXXXXXXXXXX:/workspace#
    We want to put the data into the directory /data/mnist. There is a simple Python script in the application that will do this for us. It downloads the data in the correct format as well.
    # python -m digits.download_data mnist /data/mnist
    Downloading url=http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
    Downloading url=http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz ...
    Downloading url=http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz ...
    Downloading url=http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz ...
    Uncompressing file=train-images-idx3-ubyte.gz ...
    Uncompressing file=train-labels-idx1-ubyte.gz ...
    Uncompressing file=t10k-images-idx3-ubyte.gz ...
    Uncompressing file=t10k-labels-idx1-ubyte.gz ...
    Reading labels from /data/mnist/train-labels.bin ...
    Reading images from /data/mnist/train-images.bin ...
    Reading labels from /data/mnist/test-labels.bin ...
    Reading images from /data/mnist/test-images.bin ...
    Dataset directory is created successfully at '/data/mnist'
    Done after 13.4188599586 seconds.
    
  4. You can now open a web browser to the IP address from the previous step. Be sure to use port 8888 since we mapped the DIGITS port from 5000 to port 8888. For this example, the URL would be the following.
    10.31.229.99:8888
    On the home page of DIGITS, in the top right corner it says that there are 8 of 8 GPUs available on this DGX-1.
    Figure 1. DIGITS home page
  5. Load a dataset. We are going to use the MNIST dataset as an example since it comes with the application.
    1. Click the Datasets tab.
    2. Click the Images drop down menu and select Classification. If DIGITS asks for a user name, you can enter anything you want. The New Image Classification Dataset window displays. After filling in the fields, your screen should look like the following.
      Figure 2. New Image Classification Dataset
    3. Provide values for the Image Type and the Image size as shown in the above image.
    4. Give your dataset a name in the Dataset Name field. You can name the dataset anything you like. In this case the name is just “mnist”.
    5. Click Create. This tells DIGITS to tell Caffe to load the datasets. After the datasets are loaded, your screen should look similar to the following.
      Note: This screen capture has been truncated because the web page is very long.
      Figure 3. MNIST top level
      Figure 4. MNIST lower level
      Note: There are two sections that allow you to “explore” the db (database). The Create DB (train) is for training data and Create DB (val) is for validating data. In either of these displays, you can click Explore the db for the training set.
  6. Train a model. We are going to use Yann Lecun’s LeNet model as an example since it comes with the application.
    1. Define the model. Click DIGITS in the upper left corner to be taken back to the home page.
    2. Click the Models tab.
    3. Click the Images drop down menu and select Classification. The New Image Classification Model window displays.
    4. Provide values for the Select Dataset and the training parameter fields.
    5. In the Standard Networks tab, click Caffe and select the LeNet radio button.
      Note: DIGITS allows you to use previous networks, pre-trained networks, and custom networks if you want.
    6. Click Create. The training of the LeNet model starts.
      Note: This screen capture has been truncated because the web page is very long.
      Figure 5. New Image Classification Model top level
      Figure 6. New Image Classification Model lower level
      During the training, DIGITS displays the history of the training parameters, specifically, the loss function for the training data, the accuracy from the validation data set, and the loss function for the validation data. After the training completes, (all 30 epochs are trained), your screen should look similar to the following.
      Note: This screen capture has been truncated because the web page is very long.
      Figure 7. Image Classification Model top level
      Figure 8. Image Classification Model lower level
  7. Optional: You can test some images (inference) against the trained model by scrolling to the bottom of the web page. For illustrative purposes, a single image is input from the test data set. You can always upload an image if you like, and you can also input a list of “test” images if you want. The screen below does inference against a test image called /data/mnist/test/5/06206.png. Also, select the Statistics and Visualizations checkbox so that you can see all of the details from the network as well as the network prediction.

    Figure 9. Trained Models
    Note: You can select a model from any of the epochs if you want. To do so, click the Select Model drop down arrow and select a different epoch.
  8. Click Classify One. This opens another browser tab and displays predictions. The screen below is the output for the test image that is the number “5”.

    Figure 10. Classify One Image

8. Troubleshooting

For more information about Caffe, including tutorials, documentation, and examples, see the Caffe website.

NVIDIA Caffe typically utilizes the same input formats and configuration parameters as BVLC Caffe; therefore, community-authored materials and pre-trained models for Caffe can usually be applied to NVIDIA Caffe as well.

For the latest Caffe Release Notes, see the Deep Learning Documentation website.

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, Jetson, Kepler, NVIDIA Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, and Tesla are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.